Maria Economou, Shauni Van Herck, Femke Vanden Bempt, Toivo Glatz, Jan Wouters, Pol Ghesquière, Jolijn Vanderauwera, Maaike Vandermosten, Investigating the impact of early literacy training on white matter structure in prereaders at risk for dyslexia, Cerebral Cortex, Volume 32, Issue 21, 1 November 2022, Pages 4684–4697, https://doi.org/10.1093/cercor/bhab510
Abstract
Recent prereading evidence demonstrates that white matter alterations are associated with dyslexia even before the onset of reading instruction. At the same time, remediation of reading difficulties is suggested to be most effective when provided as early as kindergarten, yet evidence is currently lacking on the early neuroanatomical changes associated with such preventive interventions. To address this open question, we investigated white matter changes following early literacy intervention in Dutch-speaking prereaders (aged 5–6 years) with an increased cognitive risk for developing dyslexia. Diffusion-weighted images were acquired before and after a 12-week digital intervention in three groups: (i) at-risk children receiving phonics-based training (n = 31); (ii) at-risk children engaging with active control training (n = 25); and (iii) typically developing children (n = 27) receiving no intervention. Following automated quantification of white matter tracts relevant for reading, we first examined baseline differences between at-risk and typically developing children, revealing bilateral dorsal and ventral differences. Longitudinal analyses showed that white matter properties changed within the course of the training; however, the absence of intervention-specific results suggests that these changes rather reflect developmental effects. This study contributes important first insights on the neurocognitive mechanisms of intervention that precedes formal reading onset.
Introduction
Learning to read is one of the most important educational achievements of early childhood, setting the foundation for long-term learning. Reading development relies on the reorganization of existing brain circuits including dorsal temporoparietal, ventral occipitotemporal, and inferior frontal regions, that facilitate the processing and integration of orthographic, phonological, and semantic information (Pugh et al. 2001; Schlaggar and McCandliss 2007). Although the majority of children learn to read with relative ease during the first school years, around 7% are diagnosed with developmental dyslexia, which means they experience severe and persistent reading difficulties that occur despite adequate opportunity and instruction (Peterson and Pennington 2015). In most educational settings, dyslexia is typically diagnosed after years of demonstrated reading failure, which has a considerable impact on socioemotional well-being (Valas 1999; Grills-Taquechel et al. 2012) and future educational attainment (Dougherty 2003). Importantly, a gap in reading ability between dyslexic and typical readers is shown to be present as early as first grade (Ferrer et al. 2015), and is likely to persist throughout the school years and potentially widen over time if not addressed properly (Stanovich 1986; Ferrer et al. 2015). Research has shown that extensive reading interventions (comprising at least 100 sessions) are more effective if provided during kindergarten and first grade, compared with later grades (Wanzek and Vaughn 2007). Yet, in current practice, struggling readers are likely to receive the necessary support only after years of reading failure and thus after the most sensitive period for intervention has passed. This has been described as the “dyslexia paradox” (Ozernov-Palchik and Gaab 2016). 
Preventive interventions (i.e., interventions that take place before the onset of reading acquisition) offer promising potential to overcome the consequences of this paradox and to ensure that reducing or even closing the gap between struggling and typical readers does not become unattainable. To date, however, no studies have investigated the neuroanatomical changes associated with preventive interventions in young children at risk for dyslexia. Hence, the goal of this study is to characterize white matter microstructural changes during such an early intervention.
In the last two decades, an abundance of neuroimaging studies provided evidence of atypical brain structure in dyslexia, and especially in white matter. Long-range white matter pathways carry signals between cortical regions and therefore constitute a key part of the reading circuitry. Diffusion-weighted imaging (DWI) studies, which enable the identification and quantification of white matter connections, have identified both dorsal and ventral pathways linked to reading: the arcuate fasciculus (AF), the inferior longitudinal fasciculus (ILF), and the inferior frontooccipital fasciculus (IFOF) (Ben-Shachar et al. 2007; Vandermosten, Boets, Wouters, et al. 2012; Wandell and Yeatman 2013; Zhang et al. 2014). The AF dorsally connects inferior frontal to superior temporal and inferior parietal cortex (Catani et al. 2005). The left AF is thought to constitute a key region for sensory-motor mapping of sound to articulation (Saur et al. 2008) as well as grapheme–phoneme mapping (Gullick and Booth 2014). Indeed, several studies support the role of the left AF sustaining phonological aspects of reading (Yeatman et al. 2011; Saygin et al. 2013; Vanderauwera et al. 2015; Vandermosten et al. 2015), while reduced fractional anisotropy (FA) has been reported in both children (Vanderauwera et al. 2017; Van Der Auwera et al. 2021) and adults (Vandermosten, Boets, Poelmans, et al. 2012) with dyslexia. The IFOF and ILF are ventral association tracts connecting the occipital to the inferior frontal and to the anterior temporal cortex, respectively (Saur et al. 2008; Forkel et al. 2014). The left IFOF has been linked to orthographic aspects of reading in both adults and school-aged children (Vandermosten, Boets, Poelmans, et al. 2012; Vanderauwera et al. 2018; Banfi et al. 2019). A different pattern seems to emerge in prereaders, however, with evidence suggesting associations between bilateral IFOF and early phonological ability (Walton et al. 2018) as well as early reading-related cognitive measures (Vanderauwera et al. 2018). Properties of bilateral ILF have been shown to correlate with phonological decoding in school-aged children (Lebel et al. 2013), while in prereading children, left ILF properties were related to emergent literacy skills including both phonological and orthographic aspects (Vanderauwera et al. 2018). In terms of anatomy, both the IFOF and ILF likely contain projections adjacent to the posterior occipitotemporal cortex around the visual word form area (Yeatman et al. 2013), a region specialized in visual letter recognition, and thus important for word recognition and consequently reading (Dehaene and Cohen 2011).
Most strikingly, recent studies in prereaders suggest that these white matter alterations associated with dyslexia are evident even before the onset of reading acquisition. More specifically, properties of the left frontotemporal and frontoparietal AF segments and left IFOF were shown to differ between prereaders with and without a familial risk for dyslexia (Vandermosten et al. 2015; Langer et al. 2017; Wang et al. 2017). In addition to a link with familial risk, FA differences in bilateral AF were already present in prereading children who later developed dyslexia compared with children who later developed typical reading skills (Vanderauwera et al. 2017). Longitudinal evidence of children throughout reading development also suggests that white matter properties of the left AFdirect in kindergarten were predictive of later reading outcomes (Vanderauwera et al. 2017; Borchers et al. 2019; Van Der Auwera et al. 2021), while the rate of white matter development was significantly associated with subsequent reading skill (Yeatman, Dougherty, Ben-Shachar, et al. 2012; Wang et al. 2017). Moreover, recent evidence points out that early deficits in the left AFdirect of dyslexic compared with typical readers remain present throughout primary school (Van Der Auwera et al. 2021). Taken together, these findings provide converging evidence that early differences in white matter structure predate the formal onset of reading, suggesting a central role in determining (a)typical trajectories of reading development.
In light of these findings, the question arises whether preventive intervention can target these early white matter deviances and potentially have an impact on subsequent reading development. Previous research provides convincing evidence that white matter is malleable and subject to learning- and experience-dependent changes in controlled designs. Studies in school-aged children demonstrated that explicit training in high-level cognitive domains such as spelling and math, resulted in behavioral improvements accompanied by training-induced white matter plasticity (Gebauer et al. 2012; Jolles et al. 2016). Such training-induced plasticity was also demonstrated in a younger age group, where Ekerdt et al. (2020) showed white matter changes in typically developing 4-year-old preliterate children following three weeks of oral word learning. More specifically, children in the oral word training group had increased FA in the left dorsal precentral gyrus, when compared to both active and passive control groups.
Only a few studies have directly investigated structural plasticity following targeted reading intervention. Following a reading intervention in school-aged children between 6 and 9 years old, increased cortical thickness was reported in treatment responders, when compared with treatment nonresponders and a waiting control group (Romeo et al. 2018). With respect to white matter studies, Keller and Just (2009) demonstrated for the first time that poor readers (aged 8–12 years) who received intensive reading instruction over the course of 6 months, showed increased FA in the left centrum semiovale in contrast to both poor readers who received no training and good readers. Using a longitudinal design with several imaging assessments, Huber et al. (2018) provide additional evidence for reading intervention-related white matter plasticity in children with reading difficulties. The authors reported widespread changes in diffusion properties together with significant growth in reading skill, after a training period of eight weeks. On the other hand, Partanen et al. (2021) did not find any evidence of intervention-related white matter changes following a three-month intervention in dyslexic children. Overall, these studies provide important insights on how behavioral interventions targeting reading skills can induce measurable changes in white matter properties of school-aged children, although evidence is lacking on the impact of interventions during the sensitive period for learning to read. Moreover, the absence of adequate inclusion of control groups in many training studies makes it difficult to separate maturation from learning or experience effects. As a result, very little is known about the mechanisms of early intervention, particularly when it takes place prior to formal reading instruction.
The present study aimed to address these remaining gaps in knowledge by investigating the impact of a preventive intervention in 5-year-old prereaders with an increased cognitive risk for developing dyslexia. Our objectives were twofold. First, we aimed to examine cognitive risk-related white matter differences prior to intervention by comparing at-risk with typically developing prereaders. Second, by means of a controlled intervention design, we sought to characterize intervention-related white matter changes following 12 weeks of digital at-home training. In order to comprehensively address these goals, we assessed white matter microstructure longitudinally in (i) at-risk children who received phonics-based training adapted for kindergarten, (ii) at-risk children engaging with another immersive game that did not train literacy skills, and (iii) typically developing children receiving no intervention. We hypothesized phonics-based training-related FA changes in white matter connections sustaining early phonological and orthographic processing, therefore focusing on four bilateral pathways: the frontotemporal and frontoparietal segments of the dorsal AF (AFdirect and AFanterior, respectively) and the ventral ILF and IFOF.
Materials and Methods
Study Design
Diffusion-weighted images were collected before and after a 12-week training period to evaluate changes in white matter structure. These data were collected as part of a larger longitudinal intervention project (n = 149) in which additional behavioral and neurophysiological (Vanden Bempt et al. 2021; Van Herck et al. 2021) assessments were carried out. The current study focuses on the direct effects of phonics-based training on white matter properties.
Participants
The participants of this study were 90 monolingual Dutch-speaking children in their third year of kindergarten, which is the last year before starting primary school. Participants had no prior known hearing or neurological problems, no formal diagnosis or risk for attention deficit hyperactivity disorder (Smidts and Oosterlaan 2007), and no history of speech and language therapy. All children had received comparable kindergarten education up until the beginning of the study and had not received any formal reading or writing instruction, in accordance with Flemish government guidelines (https://onderwijsdoelen.be/). The data reported in this study were collected between December 2018 and June 2019, thus before the start of the COVID-19 pandemic.
Children with and without an increased cognitive risk for developing dyslexia were selected based on a screening conducted at school at the beginning of their last year of kindergarten. Figure 1 provides a detailed description of participant flow in the study. Over 1900 families responded to a first questionnaire, 1225 of whom met the inclusion criteria described above and were selected to participate in a risk screening. Nonverbal reasoning and subsequent intelligence estimates were calculated using a tablet-based version of the Raven’s Color Progressive Matrices (Raven et al. 1984). Three early literacy predictors were assessed to define risk criteria: letter knowledge, phonological awareness, and rapid automatized naming (Puolakanaho et al. 2007; Caravolas et al. 2012). After excluding incomplete or unreliable scores (n = 21) and children with the lowest 10% of nonverbal intelligence estimates (n = 113), percentile scores for the early literacy measures were calculated in a sample of 1091 Flemish kindergarteners. Children were classified as at-risk if they performed below the 30th percentile on two out of the three measures and below the 40th percentile on letter knowledge. They were classified as typically developing if they performed above the 40th percentile on all measures. For more details on the screening procedure and behavioral assessments, see Verwimp et al. (2020).
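The classification rule described above can be illustrated with a short sketch. This is not the screening code used in the study: the class and function names are hypothetical, "two out of the three measures" is read here as "at least two", and the label for children who meet neither criterion is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ScreeningPercentiles:
    """Percentile scores (0-100) on the three early literacy predictors."""
    letter_knowledge: float
    phonological_awareness: float
    rapid_naming: float


def classify(p: ScreeningPercentiles) -> str:
    """Apply the at-risk / typically developing criteria to one child."""
    scores = (p.letter_knowledge, p.phonological_awareness, p.rapid_naming)
    # Assumption: "two out of the three measures" means at least two.
    n_below_30 = sum(score < 30 for score in scores)
    if n_below_30 >= 2 and p.letter_knowledge < 40:
        return "at-risk"
    if all(score > 40 for score in scores):
        return "typically developing"
    # Hypothetical label for children meeting neither criterion.
    return "unclassified"
```

Under this reading, a child scoring at the 35th, 20th, and 10th percentiles would be classified as at-risk, whereas the same child with letter knowledge at the 45th percentile would meet neither criterion.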

According to the criteria described above, 60 children with and 30 children without a cognitive risk were selected to take part in the intervention study. Of the 60 at-risk children, 31 received a tablet-based intervention that aimed at training early literacy skills, while the other 29 children engaged with a comparably immersive control intervention. Group assignment for the at-risk children was done using a block randomization procedure (R package randomizr; Coppock 2019), to balance the covariates of sex and birth trimester across groups. The group of typically developing children (n = 30) was matched for sex, age, and nonverbal intelligence and participated in the behavioral and magnetic resonance imaging (MRI) assessments but not in any intervention.
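For readers unfamiliar with block randomization, the general idea can be sketched as follows. This is an illustrative Python version and not the randomizr procedure used in the study: the function name, the tuple layout of `children`, and the alternating assignment within each block are assumptions.

```python
import random
from collections import defaultdict


def block_randomize(children, arms=("GG-FL", "AC"), seed=0):
    """Assign children to arms within blocks defined by sex and birth trimester.

    children: iterable of (child_id, sex, birth_trimester) tuples.
    Returns a dict mapping child_id to an arm label.
    """
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for child_id, sex, trimester in children:
        blocks[(sex, trimester)].append(child_id)

    assignment = {}
    for key in sorted(blocks):
        ids = blocks[key]
        rng.shuffle(ids)
        # A random starting arm avoids always favoring the first arm
        # in blocks with an odd number of children.
        offset = rng.randrange(len(arms))
        for i, child_id in enumerate(ids):
            assignment[child_id] = arms[(i + offset) % len(arms)]
    return assignment
```

Within each sex-by-trimester block the two arms differ by at most one child, which is what keeps the covariates balanced across groups.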
For all cognitive and neuroimaging assessments, written informed consent and verbal assent were obtained from all primary caregivers and children, respectively. This study was approved by the Medical Ethics Committee of the Leuven University Hospital.
Tablet-Based Intervention
In the present study, early literacy intervention focused on the systematic training of grapheme–phoneme coupling (how letters correspond to sounds), a fundamental aspect of learning to read in alphabetic scripts. Interventions that involve this type of training were shown to be the most beneficial in ameliorating reading deficits in poor readers (Galuschka et al. 2014) and were also suggested to be beneficial for young children at risk of reading delay (Hatcher et al. 2004; Snowling and Hulme 2011, 2012; Lonigan et al. 2013). For the specific purposes of this study, we adapted the GraphoGame interface (Richardson and Lyytinen 2014) to create a new tablet-based version (GraphoGame-Flemish; GG-FL) which is appropriate for Flemish kindergarteners without formal reading instruction. Briefly, the training started by introducing the graphemes, providing exercises of visual and auditory discrimination of graphemes and phonemes, while gradually progressing to exercises of grapheme–phoneme coupling, phoneme blending and counting, and early reading. For a detailed description of the game content and level progression, see Glatz et al. (2022).
Children in the active control (AC) group received a different type of training. For this purpose, six commercially available tablet-based games (LEGO City My City, LEGO DUPLO Town, LEGO DUPLO Connected Train, LEGO Friends: Heartlake Rush, PLAYMOBIL Horse Farm, and PLAYMOBIL Police) were implemented within a single custom-made application. The games were selected because they were age-appropriate, were motivating enough for a training period of 12 weeks, and did not train early literacy skills. These characteristics were confirmed by pilot data of 5-year-old kindergarteners collected before the start of the study.
The intervention was intended for at-home use, and all participating families received the necessary equipment (Samsung Galaxy Tab E 9.6 tablet and Audio-Technica ATH-M20x headphones). The games were installed on a child-friendly profile and the app icons were modified into colored stars to conceal the group assignment. Children were asked to play their game (GG-FL or AC) for 15 min a day, six days a week, for a total duration of 12 weeks. As part of the larger longitudinal project, both intervention groups received an additional story listening game, which they were asked to use for 10 min a day in addition to their normal training schedule. The effects of story listening fall beyond the scope of the current study and will not be discussed further. Note, however, that the two groups (GG-FL and AC) did not differ in the amount of story listening (W = 378.5, P = 0.805) or the total amount of intervention-related tablet exposure (W = 361, P = 0.480). All families received additional support during the intervention period in the form of a detailed instruction manual, an intervention calendar to monitor progress, and stickers and small rewards for reaching milestones during the training period. Data logs were transmitted daily to a central university server and contact with the parents was maintained throughout the intervention in case of technical or motivational/compliance issues.
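The instructed training dose follows directly from this schedule. The sketch below works through the arithmetic; the function names are hypothetical and the story listening game is deliberately left out.

```python
MINUTES_PER_DAY = 15
DAYS_PER_WEEK = 6
WEEKS = 12


def instructed_hours() -> float:
    """Total instructed playing time over the full training period, in hours."""
    return MINUTES_PER_DAY * DAYS_PER_WEEK * WEEKS / 60


def completion_percent(hours_on_task: float) -> float:
    """Time on task expressed as a percentage of the instructed dose."""
    return 100 * hours_on_task / instructed_hours()
```

Here `instructed_hours()` returns 18.0, so a child who played for 18 h completed 100% of the intervention, and values above 100% indicate more play than instructed.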
MRI Data Acquisition
Participants were invited for two MRI sessions, one before and one shortly after the intervention phase. All images were acquired on a 3 T Philips Achieva MRI scanner (Philips, Best, The Netherlands) using a 32-channel head coil and identical acquisition parameters. The sequence consisted of seven nondiffusion-weighted volumes, followed by 20, 32, and 60 diffusion-weighted volumes with b-values of 700, 1000, and 2000 s/mm², respectively. Additional scanning parameters were as follows: 62 transverse slices, slice thickness = 2.2 mm, voxel size = 2.2 × 2.3 × 2.2 mm³, TR/TE = 3593/88 ms, flip angle = 90°, multiband factor MB = 2, EPI factor = 43, and acquisition time = 7:38 min. In the same session, a reversed phase-encoding direction (posterior–anterior) sequence was acquired for subsequent distortion correction during processing. Anatomical 3D T1-weighted images were acquired using a CS-SENSE TFE sequence with following parameters: 240 sagittal slices, isotropic voxel size 0.9 mm³, TR/TE = 9.1/4.2 ms, and acquisition time = 3:30 min. In line with previous MRI studies of young children (Raschle et al. 2009; Theys et al. 2014; Vanderauwera et al. 2017), an age-appropriate preparation protocol was used to familiarize the children with the scanning procedure and minimize motion artifacts. In order to make the experience more pleasant, the scanner room was decorated in the protocol story theme and the children were given time to acclimatize with the surrounding environment. Communication was maintained during the scan via a built-in microphone system and close monitoring was possible using a camera system. To limit head movement, custom-fit inflatable pads were used for head positioning, as well as positive encouragement during scanning. The children watched a cartoon of their choice during scan acquisition. All participants received a small nonmonetary reward at the end of each visit.
Of the total sample of 90 children reported in the present study, MRI data are not available for six individuals at pretest who either did not wish to participate in MRI assessments (n = 2), or were not able to complete any of the MRI examinations (n = 4). At posttest, MRI data are not available for four additional participants who did not plan a second visit and one dataset that was compromised due to technical issues (see Fig. 1).
MRI Data Preprocessing
Diffusion MRI data were preprocessed using the FMRIB Software Library v6.0 (FSL; Jenkinson et al. 2012). A pair of nondiffusion-weighted images in forward- and reversed-encoding directions was used for estimating the off-resonance field (Andersson et al. 2003) and correcting for susceptibility distortions (Smith et al. 2004). The volumes of all shells of the DWI series were subsequently corrected for motion and eddy current distortions (Andersson and Sotiropoulos 2016), followed by brain mask estimation (Smith 2002). Scalar FA maps were produced using FSL’s DTIFIT, which fits the tensor model to the data. The amount of head motion in each subject was quantified as the mean volume displacement relative to the first volume of the DWI series. This was done by calculating the root mean square of the three translation parameters (x, y, z), similar to Yendiki et al. (2014). Eleven datasets (n = 5 at pretest and n = 6 at posttest) with mean translational movement exceeding 2.2 mm (acquisition voxel size) were excluded from further analyses.
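The motion criterion can be made concrete with a small sketch. It is not the processing code used in the study: the function names are hypothetical, the translations are assumed to be given in millimeters relative to the first volume, and "root mean square" is read literally as the square root of the mean of the three squared translations (a Euclidean norm would be larger by a factor of √3).

```python
import math

VOXEL_SIZE_MM = 2.2  # acquisition voxel size used as the exclusion threshold


def rms_translation(tx: float, ty: float, tz: float) -> float:
    """Root mean square of the x, y, z translations for one volume (mm)."""
    return math.sqrt((tx * tx + ty * ty + tz * tz) / 3.0)


def mean_displacement(translations) -> float:
    """Mean RMS translation across volumes.

    translations: list of (x, y, z) tuples in mm, relative to the first volume.
    """
    if not translations:
        raise ValueError("at least one volume is required")
    total = sum(rms_translation(*t) for t in translations)
    return total / len(translations)


def exceeds_motion_threshold(translations, threshold=VOXEL_SIZE_MM) -> bool:
    """True if the dataset would be excluded under the motion criterion."""
    return mean_displacement(translations) > threshold
```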
Fiber Tract Identification and Quantification
White matter tract identification and quantification was done using the open-source tool automated fiber quantification (AFQ; Yeatman, Dougherty, Myall, et al. 2012) implemented in MATLAB 2016a. First, whole-brain deterministic tractography was performed using a streamlines tracking algorithm with a fourth-order Runge–Kutta path integration method and the following parameters: step size = 1, FA threshold = 0.2, minimum length = 50 mm, maximum length = 250 mm, and angle = 30°. This and subsequent steps were performed using all acquired diffusion-weighted volumes from all shells. Next, a waypoint region of interest (ROI)-based tract segmentation was used to define white matter tracts (as described in Wakana et al. 2007), followed by tract refinement and cleaning (removal of fibers deviating more than 4 SD above mean fiber length or more than 5 SD from the fiber core). For a detailed description of the AFQ procedure, see Yeatman, Dougherty, Myall, et al. (2012). The resulting reconstructed tract was then resampled to 100 equidistant nodes spanning the full length of the tract and a weighted FA average was calculated at each node to create a tract profile (ordered anterior to posterior). The weighting procedure was such that fibers deviating from the fiber core count less than those closer to the fiber core. Because the first and the last nodes could not be reliably estimated in all tracts and subjects, they were not included in further analyses. Both the tract profile and the summarized tract FA measures were analyzed.
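The construction of a tract profile can be illustrated with a simplified sketch. It does not reproduce the AFQ implementation: the function names are hypothetical, and the Gaussian weight on each fiber's distance from the core (expressed in SD units) is an assumption chosen only to show that fibers far from the core contribute less.

```python
import math


def tract_profile(fa, dist):
    """Weighted mean FA at each node of a tract.

    fa[f][n]:   FA of fiber f at node n.
    dist[f][n]: distance of fiber f from the fiber core at node n (SD units).
    """
    n_fibers = len(fa)
    n_nodes = len(fa[0])
    profile = []
    for n in range(n_nodes):
        # Assumed weighting: fibers close to the core get weights near 1,
        # distant fibers get weights near 0.
        weights = [math.exp(-0.5 * dist[f][n] ** 2) for f in range(n_fibers)]
        weighted_sum = sum(w * fa[f][n] for f, w in enumerate(weights))
        profile.append(weighted_sum / sum(weights))
    return profile


def trim_end_nodes(profile):
    """Drop the first and last node, e.g. 100 nodes become 98."""
    return profile[1:-1]
```

With 100 nodes per fiber, `trim_end_nodes(tract_profile(fa, dist))` yields the 98-node profile that enters the along-tract analyses.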
The analysis focused on four bilateral pathways of interest defined a priori: the direct frontotemporal and anterior frontoparietal segments of the AF (AFdirect and AFanterior, respectively), the IFOF and the ILF (Vandermosten, Boets, Wouters, et al. 2012; Wandell and Yeatman 2013; Zhang et al. 2014). The investigation of both left- and right-hemispheric pathways is warranted by evidence that (1) reveals bilateral associations between white matter and early language and phonological skills (Vandermosten et al. 2015; Vanderauwera et al. 2018; Walton et al. 2018; Farah et al. 2020), and (2) suggests a potential role of right-hemispheric pathways in promoting better reading outcomes, by means of either compensatory (Hoeft et al. 2011) or protective mechanisms (Wang et al. 2017; Zuk et al. 2021). Here, it is important to clarify that the pathway corresponding to AFanterior in the present study, is referred to as SLF in AFQ, which is consistent with the terminology used in Wakana et al. (2007). Tract identification was performed in 150 datasets from a total sample of 83 subjects. The following tracts could not be identified in certain datasets using the chosen tensor-based tractography and AFQ procedure: left AFdirect (n = 7), right AFdirect (n = 34), left AFanterior (n = 3), right AFanterior (n = 2), left IFOF (n = 2), right IFOF (n = 4), left ILF (n = 0), and right ILF (n = 1). Notably, a higher number is observed for the right AFdirect (frontotemporal segment), which is largely consistent with previous research showing that this pathway may not be detectable in certain individuals, including studies in both children (Eluvathingal et al. 2007; Van Der Auwera et al. 2021) and adults (Catani et al. 2007; Lebel and Beaulieu 2009; Allendorfer et al. 2016). 
This inability to detect the right AFdirect has been attributed to limitations of deterministic algorithms in regions of crossing fibers (as is the case in the temporoparietal junction), rather than the anatomical absence of the tract from the brain (Yeatman et al. 2011).
Statistical Analyses
All reported analyses were performed in R (version 4.0.0) (R Core Team 2020). For all analyses, the alpha level was set at 0.05. First, we examined white matter risk-related differences prior to intervention. Note that in these pretest FA comparisons, for the purpose of assessing the effect of cognitive risk, the two risk groups were merged (i.e., GG-FL and AC groups, n = 52) and compared with the typically developing group (n = 26). To this end, average tract FA measures (i.e., tract average) and FA values at each node of the tract (i.e., tract profiles consisting of 98 nodes) at pretest were compared between the at-risk children and typically developing children using a linear model, while controlling for in-scanner subject motion.
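The baseline comparison corresponds to a linear model of the form FA = b0 + b1 · risk + b2 · motion, fitted either to the tract average or separately at each node. The analyses themselves were run in R; the following Python sketch only illustrates the model structure by ordinary least squares, with hypothetical function names and a binary risk indicator (1 = at-risk, 0 = typically developing) as an assumed coding.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_risk_model(fa, at_risk, motion):
    """Least-squares fit of FA on risk group and head motion.

    Returns [intercept, risk_effect, motion_effect].
    """
    design = [[1.0, float(g), float(mo)] for g, mo in zip(at_risk, motion)]
    k = 3
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, fa)) for i in range(k)]
    return solve_linear_system(xtx, xty)
```

The second coefficient is the risk-related FA difference adjusted for motion; its test statistic is what the group comparison evaluates.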
Given the inherently correlated node structure within each tract profile, Bonferroni correction for multiple comparisons can be considered overly conservative and may lead to type-II errors. Therefore, a nonparametric permutation-based multiple testing correction was applied for along-tract analyses, as described in Nichols and Holmes (2002) and implemented in several tractometry papers (Yeatman, Dougherty, Myall, et al. 2012; Travis et al. 2017; Farah et al. 2020; Jossinger et al. 2021; Wasserthal et al. 2021). This approach provides the family-wise error (FWE) corrected alpha value for pointwise tests at a given tract, as well as an FWE-corrected cluster size threshold at the user-defined alpha of 0.05. Additional details on the implementation of this procedure are provided in the Supplementary Material.
Group differences were considered significant if they occurred at nodes where the P-value was lower than the FWE-corrected alpha, or if they occurred in a sufficient number of adjacent nodes as determined by the FWE-corrected cluster size, without the need for further P-value adjustment. For the baseline risk-related analysis, corrected alpha thresholds for the different tracts ranged from P < 0.001 to P < 0.003, while cluster-based thresholds ranged from 11 to 14 consecutive nodes.
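The logic of the permutation-based correction can be sketched as follows. This is a simplified illustration, not the procedure from the Supplementary Material: it uses a Welch t statistic without the motion covariate, expresses the pointwise threshold as a critical |t| rather than a corrected alpha, and takes the cluster-forming cutoff `cluster_t` as a user-chosen value. All function names are hypothetical.

```python
import math
import random


def welch_t(a, b):
    """Welch t statistic for two independent samples."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    se = math.sqrt(va / len(a) + vb / len(b))
    return (ma - mb) / se if se > 0 else 0.0


def node_stats(profiles, labels):
    """Absolute t statistic at every node for a given group labeling."""
    stats = []
    for n in range(len(profiles[0])):
        a = [p[n] for p, g in zip(profiles, labels) if g]
        b = [p[n] for p, g in zip(profiles, labels) if not g]
        stats.append(abs(welch_t(a, b)))
    return stats


def longest_run(flags):
    """Length of the longest run of consecutive True values."""
    best = run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best


def permutation_thresholds(profiles, labels, cluster_t=2.0,
                           n_perm=1000, alpha=0.05, seed=0):
    """FWE-corrected pointwise |t| threshold and cluster size threshold.

    For each random relabeling, the maximum statistic across nodes and the
    largest cluster of adjacent supra-threshold nodes are recorded; the
    (1 - alpha) quantiles of these null distributions are the thresholds.
    """
    rng = random.Random(seed)
    shuffled = list(labels)
    max_stats, max_clusters = [], []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        stats = node_stats(profiles, shuffled)
        max_stats.append(max(stats))
        max_clusters.append(longest_run(s > cluster_t for s in stats))
    max_stats.sort()
    max_clusters.sort()
    k = min(n_perm - 1, int((1 - alpha) * n_perm))
    return max_stats[k], max_clusters[k]
```

Because neighboring nodes are correlated, the null distribution of the maximum statistic is narrower than a Bonferroni bound over 98 independent tests would imply, which is why this approach is less conservative.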
Variables | GraphoGame-Flemish (n = 31)ᵃ | Active control (n = 25)ᵃ | Typical control (n = 27)ᵃ | P-valueᵇ |
---|---|---|---|---|
Sex (female) | 16/15 | 11/14 | 15/12 | 0.700 |
Familial risk (n) | 5 | 1 | 0 | 0.035 |
Nonverbal intelligence | 102 (78–133) | 96 (79–126) | 101 (78–133) | 0.394 |
Handednessᵈ (left/right/ambidextrous) | 2/28/1 | 2/21/2 | 2/22/3 | 0.857 |
Age at pretest scan (months) | 64 (60–71) | 66 (60–72) | 66 (62–73) | 0.075 |
Age at posttest scan (months) | 69 (65–76) | 70 (65–76) | 70 (65–77) | 0.734 |
Socioeconomic statusᶜ (low/middle/high/unknown) | 7/12/12/0 | 7/13/4/1 | 3/10/14/0 | 0.112 |

ᵃ Median (range) or occurrence (n).
ᵇ Group differences were assessed using a Kruskal–Wallis test for age and nonverbal intelligence, a Pearson’s Chi-squared test for sex, handedness and maternal education, and a Fisher’s exact test for familial risk.
ᶜ Assessment based on maternal educational level (low = no extra degree after secondary school; middle = professional bachelor/academic bachelor; high = Master or PhD).
ᵈ Based on a subset of questions adapted from the Edinburgh Handedness Inventory (Oldfield 1971).
Variable | GraphoGame-Flemish (n = 31)ᵃ | Active control (n = 25)ᵃ | P-valueᵇ |
---|---|---|---|
Training exposure (hours) | 16 (3–25) | 18 (6–28) | 0.315 |
Training period (weeks) | 12 (6–19) | 13 (10–20) | 0.483 |
Proportion of completed interventionᶜ (%) | 91 (15–137) | 101 (32–158) | 0.315 |
ᵃ Median (range).
ᵇ Group differences were assessed using a Wilcoxon rank sum test.
ᶜ The proportion of completed intervention is calculated as the number of hours spent on-task divided by the total number of hours the children were instructed to play during 12 weeks (estimated at 18 h).
To assess changes in white matter over time and intervention effects, average tract FA values (i.e., tract average) were entered in linear mixed-effect models (LMMs) with fixed effects of group (GG-FL, AC, typical control), session (pretest, post-test), and their interaction, as well as by-subject random intercepts. The same model specification was then used to assess these factors at each node (i.e., tract profiles) to identify clusters of nodes with similar effects. In all reported LMMs, in-scanner subject motion was included as a nuisance predictor. All LMMs were fitted using lmer via the R package lme4 (Bates et al. 2015), and F-statistics and P-values were obtained by calculating type-III analysis of variance tables using Satterthwaite’s degrees-of-freedom approximation (R package lmerTest, Kuznetsova et al. 2017). To determine the appropriate P-value or cluster size for the longitudinal analysis, the same permutation-based approach as the one described above was applied. For the intervention analysis, corrected alpha thresholds for the different tracts ranged from P < 0.001 to P < 0.002, while cluster-based thresholds ranged from 8 to 11 consecutive nodes.
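For readers who prefer a formula, the model described in this paragraph can be written out as below. The notation is an assumption introduced here for illustration and does not come from the original text: |$i$| indexes subjects, |$j$| indexes sessions, |$g(i)$| is the group of subject |$i$|, and |$\mathrm{motion}_{ij}$| is the in-scanner motion estimate.

```latex
\mathrm{FA}_{ij} = \beta_0
  + \alpha_{g(i)}                       % fixed effect of group
  + \gamma_{j}                          % fixed effect of session
  + (\alpha\gamma)_{g(i)\,j}            % group-by-session interaction
  + \beta_M\,\mathrm{motion}_{ij}       % nuisance predictor
  + u_i + \varepsilon_{ij},
\qquad
u_i \sim \mathcal{N}(0, \sigma_u^2),\quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

In lme4 formula syntax this would correspond to something like `FA ~ group * session + motion + (1 | subject)`, where the column names are hypothetical. An intervention-specific effect would appear in the interaction term |$(\alpha\gamma)$|, whereas change shared by all groups would appear in the session term |$\gamma$|.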
Effect sizes for all pretest comparisons are reported as Cohen’s d values, while effect sizes for LMM main effects and interactions are reported as Cohen’s |$f$| values (Cohen 1988).
Data Availability
Anonymized data and code to reproduce the figures and findings of this study are publicly available at https://github.com/meconomou/Economou2021_wmintervention.
Results
Participant and group characteristics
Descriptive group statistics are summarized in Table 1. The three groups did not differ significantly with respect to nonverbal intelligence, sex, handedness, socioeconomic status, or age at either MRI visit. More children in the GG-FL group had a familial risk for dyslexia; however, this was not controlled for further given the very low number of occurrences in our sample. A Kruskal–Wallis test revealed no group differences in head motion at either timepoint (pretest: H(2) = 0.29, P = 0.865; post-test: H(2) = 0.96, P = 0.618). There was, however, a main effect of session on head motion (F(1,74) = 11.12, P = 0.001), such that less head movement was observed at post-test than at pretest. As summarized in Table 2, the two intervention groups were comparable overall in terms of training duration and total game exposure. Most children played at least 90% of the intended amount (estimated at 18 h), and successful completion of the intervention took slightly longer than the intended 12-week period.

Figure 2. Tract FA profiles before the intervention. Each plot shows the mean tract FA profile ±1 standard error of the mean for the two groups (at-risk and typical control). Bold black panel borders indicate the tracts where global tract FA differences were found. Localized risk-related differences are indicated by gray-shaded regions on the tract profile. The red arrows indicate the position of the corresponding anatomical locations, which are visualized in black on three-dimensional tract renderings next to each plot.
Risk-Related Differences in White Matter Properties Before the Intervention
Tract Average
Here we examined the differences in FA between at-risk and typically developing children (Fig. 2). Overall, the at-risk group had significantly lower FA in the left AFdirect (β = −0.01, t = −2.02, P = 0.047, d = 0.53), left IFOF (β = −0.01, t = −2.15, P = 0.035, d = 0.62), and right IFOF (β = −0.01, t = −2.23, P = 0.029, d = 0.59), but this difference did not reach significance in the right AFdirect (β = −0.02, t = −1.97, P = 0.054, d = 0.55). We did not find any evidence for risk-related differences in global tract FA in bilateral AFanterior (left: β = −0.01, t = −1.52, P = 0.132, d = 0.38; right: β = −0.01, t = −1.13, P = 0.262, d = 0.32) or bilateral ILF (left: β = −0.01, t = −1.56, P = 0.123, d = 0.40; right: β = −0.01, t = −0.97, P = 0.335, d = 0.23). Tracts for which group differences were found in the tract average are depicted with bold black panel borders in Figure 2.
Tract Profiles
The results of the tract profile analysis are shown in Figure 2. Based on the permutation-based multiple testing correction described in the Methods section, significant group differences between at-risk and typically developing children were observed in localized node clusters of both dorsal and ventral pathways. Lower FA for the at-risk group was found in the right AFdirect (nodes 11–13, average d = 0.47), right ILF (nodes 8–9, average d = 0.30), and right IFOF (node 37, d = 0.58). These locations are visualized as red bands on individual tract renderings and as gray-shaded bars on the tract profile plots in Figure 2. No further localized group differences were found in the left AFdirect, left ILF, left IFOF, or bilateral AFanterior. Along-tract effect sizes for these group comparisons are visualized in Supplementary Figure 1. Note that the analyses of the right AFdirect were conducted with a considerably smaller sample than those of the other tracts, owing to the high number of failed reconstructions. As a result, findings concerning the right AFdirect may be less reliable.
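The localized results above are reported as runs of adjacent nodes (for example, nodes 11–13). As a minimal sketch of how such runs can be extracted from node-wise P-values, the Python snippet below groups consecutive nodes that fall under a threshold and optionally filters them by a minimum cluster size. The function names and the P-values are hypothetical and purely illustrative; this is not the authors' pipeline, and the thresholds themselves would come from the permutation procedure described in the Methods.

```python
from typing import List, Optional, Tuple


def find_clusters(p_values: List[float], alpha: float) -> List[Tuple[int, int]]:
    """Return (first_node, last_node) for every run of consecutive nodes
    whose p-value is below alpha. Nodes are numbered from 1."""
    clusters: List[Tuple[int, int]] = []
    start: Optional[int] = None  # first node of the run currently being tracked
    for node, p in enumerate(p_values, start=1):
        if p < alpha:
            if start is None:
                start = node
        elif start is not None:
            clusters.append((start, node - 1))
            start = None
    if start is not None:  # a run that extends to the last node
        clusters.append((start, len(p_values)))
    return clusters


def keep_large_clusters(
    clusters: List[Tuple[int, int]], min_size: int
) -> List[Tuple[int, int]]:
    """Keep only clusters spanning at least min_size consecutive nodes."""
    return [(a, b) for a, b in clusters if b - a + 1 >= min_size]


# Toy p-values for a 12-node profile (made-up numbers, not study data).
toy_p = [0.40, 0.03, 0.02, 0.01, 0.30, 0.20, 0.04, 0.60, 0.01, 0.02, 0.03, 0.04]
clusters = find_clusters(toy_p, alpha=0.05)
print(clusters)                          # [(2, 4), (7, 7), (9, 12)]
print(keep_large_clusters(clusters, 3))  # [(2, 4), (9, 12)]
```

In a real analysis, `alpha` and `min_size` would be set per tract from the permutation-derived values rather than fixed by hand.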
Training Effects on White Matter Properties
Tract Average
A main effect of session (pretest, post-test) was present in several pathways and was always positive, indicating a systematic increase in FA over time. This effect was significant in the left AFdirect (F(1,63) = 8.09, P = 0.006, |$f$| = 0.36), left AFanterior (F(1,68) = 5.34, P = 0.024, |$f$| = 0.28), left IFOF (F(1,69) = 4.75, P = 0.033, |$f$| = 0.26), and left ILF (F(1,76) = 5.86, P = 0.018, |$f$| = 0.28), but did not reach significance in the right AFdirect (F(1,51) = 3.96, P = 0.052, |$f$| = 0.28), right AFanterior (F(1,70) = 2.16, P = 0.146, |$f$| = 0.18), right IFOF (F(1,70) = 3.86, P = 0.053, |$f$| = 0.23), or right ILF (F(1,72) = 2.04, P = 0.158, |$f$| = 0.17). This analysis revealed no evidence of intervention-related group differences in global tract FA, as indicated by the absence of significant group-by-session interactions for any of the tracts (right AFanterior: F(2,67) = 2.65, P = 0.078; all other P > 0.213). The individual tract average FA changes across sessions are shown in Supplementary Figure 2.
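The text does not state how the Cohen's |$f$| values were derived from the models. One common route, assumed here, goes through partial eta squared, which for a single effect simplifies to |$f = \sqrt{F \cdot df_1 / df_2}$|. The short Python check below applies that conversion to the session effects reported above; the function name is hypothetical. Under this assumption, the reported values are reproduced to two decimals.

```python
from math import sqrt


def cohens_f_from_F(F: float, df_effect: float, df_error: float) -> float:
    """Convert an F-statistic to Cohen's f via partial eta squared.

    eta_p^2 = F * df_effect / (F * df_effect + df_error)
    f       = sqrt(eta_p^2 / (1 - eta_p^2)) = sqrt(F * df_effect / df_error)
    """
    eta_p2 = (F * df_effect) / (F * df_effect + df_error)
    return sqrt(eta_p2 / (1.0 - eta_p2))


# Session main effects as reported in the text:
# tract -> (F, df_effect, df_error, reported f)
reported = {
    "left AFdirect": (8.09, 1, 63, 0.36),
    "left AFanterior": (5.34, 1, 68, 0.28),
    "left IFOF": (4.75, 1, 69, 0.26),
    "left ILF": (5.86, 1, 76, 0.28),
    "right AFdirect": (3.96, 1, 51, 0.28),
    "right AFanterior": (2.16, 1, 70, 0.18),
    "right IFOF": (3.86, 1, 70, 0.23),
    "right ILF": (2.04, 1, 72, 0.17),
}

for tract, (F, df1, df2, f_reported) in reported.items():
    f_computed = cohens_f_from_F(F, df1, df2)
    print(f"{tract}: computed f = {f_computed:.2f}, reported f = {f_reported:.2f}")
```

By Cohen's (1988) conventional benchmarks (0.10 small, 0.25 medium, 0.40 large), the significant left-hemisphere session effects fall in the medium range.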
Tract Profiles
Longitudinal changes in FA were subsequently examined at each node along the pathways, comparing the three groups. A main effect of session was present in the left AFanterior (nodes 44–57, average |$f$| = 0.10), such that an increase in FA was observed from pretest to post-test. As in the global tract average analysis, no significant group-by-session interactions were found. Along-tract effect sizes for the session main effect and the group-by-session interaction effect are visualized in Supplementary Figures 3 and 4, respectively.
Discussion
The present study examined white matter properties in prereading kindergarten children with a cognitive risk for developing dyslexia during an early literacy intervention. The main aim of this study was to investigate the effects of preventive training on white matter development. To this end, at-risk children were assigned either to a digital phonics-based training (GG-FL group) or to games unrelated to reading (AC group). Before the intervention, we found evidence of global and local risk-related differences in bilateral dorsal and ventral tracts, suggesting that white matter alterations related to dyslexia predate the formal onset of reading. Our longitudinal analysis revealed that while white matter properties changed from pre- to post-test, no evidence was found to support literacy intervention-specific effects. We discuss these findings in more detail in the following paragraphs.
Concerning risk-related white matter differences before the intervention, we observed lower FA in bilateral dorsal and ventral pathways in children who performed poorly on reading-related precursors, when compared to typically developing children. More specifically, global average FA differences were found in the left AFdirect, left IFOF, and right IFOF, as well as more localized differences in the right AFdirect, right IFOF, and right ILF. Our findings are consistent with recent evidence reporting prereading differences in children with (a familial risk for) dyslexia (Langer et al. 2017; Vanderauwera et al. 2017; Wang et al. 2017). Together, these observations support the view that early underlying anatomical differences represent a neurobiological predisposition for struggling to read and are not purely driven by reading difficulties.
With respect to location, previous studies in adults and school-aged children have predominantly emphasized the role of left-hemispheric pathways in skilled reading and dyslexia (Yeatman et al. 2011; Vandermosten, Boets, Poelmans, et al. 2012; Zhang et al. 2014). In the present study, our analyses indicate additional risk-related differences in the right homologues, revealing a more bilateral pattern. This is less commonly reported in older children and adults, but is largely in line with prereading evidence supporting more bilateral involvement in younger children (for a review, see Vandermosten et al. 2016). This bilateral involvement might shift toward a left-lateralized pattern within the first few years of formal reading instruction and/or once dyslexia can be diagnosed; however, additional follow-up assessments would be needed to specifically investigate this hemispheric specialization within our sample.
In examining these prereading differences, a tractometry approach was utilized to map FA along the trajectory of white matter tracts (Bells et al. 2011; Yeatman, Dougherty, Myall, et al. 2012; Chamberland et al. 2019; Chandio et al. 2020). Such a fine-grained approach may be more appropriate in investigating regional variations in FA than using global average tract measures, especially given the underlying influence of local anatomical features on FA estimation. Indeed, studies have demonstrated that FA varies significantly along the course of white matter tracts, with these variation patterns remaining relatively consistent across subjects (Yeatman, Dougherty, Myall, et al. 2012; Johnson et al. 2014; Farah et al. 2020). This observation is also confirmed by the present data (see tract profile visualization in Fig. 2).
Notably, both global tract FA differences (in left AFdirect and bilateral IFOF) and localized FA differences (in right AFdirect, right ILF and right IFOF) were associated with the presence of a cognitive risk for dyslexia in our study. These localized group differences were identified following a rigorous permutation-based approach to control the FWE rate (Nichols and Holmes 2002) and were observed only in small clusters, ranging from one to three nodes. It is important to clarify here that the number of nodes analyzed, as well as the approach for determining the portion of the tract that will be analyzed, varies considerably among studies. As a result, the size of reported clusters is often not directly comparable. With this in mind, our results are in agreement with previously reported cluster sizes in the literature (see, e.g., Wang et al. 2017; Banfi et al. 2019), although it remains uncertain how anatomically meaningful these small clusters are.
In order to aid interpretation of our findings, effect sizes are provided for the risk-related comparisons as Cohen’s d values (see Supplementary Fig. 1). Based on suggested thresholds (Cohen 1988), a number of along-tract moderate effects are observed in bilateral AFdirect, bilateral ILF, and right IFOF. Despite these moderate effects, the localized group differences in the left AFdirect and left ILF were not identified as significant following correction for multiple comparisons. It is important not to entirely dismiss these results, as they provide relevant insights on the practical implications of our findings, which are independent of significance and sample size (Sullivan and Feinn 2012). Moreover, reporting effect sizes enables comparison of similar effects across studies, which is preferred over solely relying on arbitrary thresholds for interpretation (Lakens 2013).
After 12 weeks of training, an effect of session was observed in several tracts, indicating an increase in FA over the course of the intervention. The absence of any group-by-session interaction, however, suggests that this increase is not uniquely related to the early literacy training and likely reflects developmental effects. This finding is consistent with previous reports of significant FA development in early childhood (Reynolds et al. 2019; Dimond et al. 2020). Although the available studies capture these developmental FA changes over the course of years, our results indicate that changes over shorter timescales (i.e., a few months) may also be detectable at this age.
The absence of longitudinal GraphoGame (GG)-specific changes in FA may suggest that the training provided in this study is not associated with measurable changes in white matter. This interpretation, however, warrants caution. Currently available evidence from reading intervention studies is limited and findings are mixed regarding the location and magnitude of training-induced white matter changes. Following intensive reading instruction, Keller and Just (2009) reported a localized FA increase in the left anterior centrum semiovale, whereas Huber et al. (2018) described widespread white matter growth during an intensive reading intervention, including changes in the reading network that we investigated. In contrast to both of these studies and therefore more in line with the findings of the current study, Partanen et al. (2021) reported no measurable effects of a three-month reading intervention on white matter properties of dyslexic readers.
Several factors contribute to the observed variability among studies, including intervention-related aspects or divergence in study design and analysis. For instance, interventions that target specific aspects of reading acquisition may lead to more regional changes, as was the case in the study of Keller and Just (2009) (and might be less straightforward to detect with a region-of-interest analysis). On the other hand, widespread plasticity may be a reflection of the general learning experience resulting from receiving multicomponential training, as was the case in Huber et al. (2018). Similarly, the inclusion of an AC group (such as in our study and that of Keller and Just 2009) or lack thereof (such as in Huber et al. 2018; Partanen et al. 2021) might yield different patterns of intervention effects. A recent meta-analysis has shown that the lack of randomization or comparison with a business-as-usual group instead of an AC group has an impact on estimated effect sizes in early literacy intervention research (Verhoeven et al. 2020). As such, research design is a very relevant factor to consider when interpreting (null) intervention effects. Intervention characteristics such as intensity and exposure are also likely to play a key role in explaining the findings of the current study. Both Keller and Just (2009) and Huber et al. (2018) report more than 100 h of intervention received by the participants, which is markedly higher than the total intervention exposure in our study and could explain why we did not find an effect on white matter properties. The intervention reported in the current study led to improvements in letter sound knowledge and word decoding (Vanden Bempt et al. 2021). Yet, given that extensive interventions comprising 100 or more sessions are put forward as the standard for effective remediation (Wanzek and Vaughn 2007), the training provided in our study might have only induced behavioral effects.
Lastly, the role of the statistical framework used in the present study needs to be addressed when drawing conclusions about the findings. Other methodologically similar studies often use less conservative approaches to adjust for multiple along-tract comparisons, compared to the approach described in the current study. The methodological choice can have an influence on both the results and the conclusions and therefore should be factored in when interpreting findings within and across studies. For completeness, uncorrected P-values for the analyses are shown in Supplementary Figures 5–7.
A final consideration when discussing the present findings and the variation among reported intervention results concerns developmental effects. While some evidence exists to support plasticity in school-aged children following reading interventions, less is known about white matter changes during early (preschool) interventions. In the present study, we specifically investigate the prereading brain and therefore focus on structural changes during the sensitive period of learning to read (Ozernov-Palchik and Gaab 2016), whereas the studies described above examine intervention mechanisms in later stages of reading development. Hence, it might be that the impact of such intervention on white matter is more profound after the onset of primary school compared to the prereading stage. Another possibility is that the timing of the intervention provided in our study might be associated with other structural changes besides white matter. For instance, recent evidence shows that cortical gray matter volume in left temporal and temporoparietal regions linked to reading is still increasing within the first few years of primary school (Phan et al. 2021). This provides an indication that gray matter structure is amenable to (intervention-induced) plasticity during early childhood.
A number of limitations are recognized in the present study. First, while FA has been extensively used in studies of reading- and dyslexia-related differences (Ben-Shachar et al. 2007; Vandermosten, Boets, Wouters, et al. 2012; Vandermosten, Boets, Poelmans, et al. 2012) and has been established as a valuable index of neuroplasticity (Sagi et al. 2012; Zatorre et al. 2012), its biological interpretation remains limited. Due to the demonstrably low specificity of FA for a single neurobiological process, the underlying neural changes that might influence FA change cannot be understood based on this index alone. Access to complementary information about underlying tissue architecture, ideally from multimodal investigations, would be an advantage of future studies and could uncover a different pattern of results. Recent evidence mapping intervention-related white matter changes to more specific neurobiological processes (Huber et al. 2021) further corroborates this view. A second consideration pertains to the cohort sample size, which remained modest despite best efforts and a large-scale screening of more than 1000 prereaders. This is especially relevant for our analyses in the right AFdirect, where lower FA was observed in at-risk children. Given that the right AFdirect was not successfully reconstructed in a relatively large subset of participants (30%), this finding is potentially less robust and should be interpreted with caution. Owing to similar constraints, exploring individual differences was not possible and we thus report only intervention effects at the group level. Sample size, dropout and data quality are undoubtedly major practical challenges of longitudinal neuroimaging studies in young children up to the age of five, such as the present one (Turesky et al. 2021). Such studies nevertheless advance our understanding of intervention mechanisms during this sensitive period of early reading acquisition.
To summarize, here we report on a longitudinal investigation of white matter properties in 5-year-old kindergarten children following an early literacy training. Our findings extend previous research supporting an early link between structural white matter alterations and a cognitive risk for developing reading difficulties. The results provide evidence of an overall change in white matter properties within the course of the intervention; however, no specific training-related effects were observed. The present study is one of the first to investigate the neurocognitive mechanisms of preventive intervention in young prereading children. Longitudinal follow-up of the children’s reading skills will help clarify the long-lasting impact of preventive training, which, in turn, has the potential to inform future remediation strategies for at-risk readers at a time when they are most beneficial.
Funding
Research Council of KU Leuven (C14/17/046). J.V. was a postdoctoral fellow of the Research Foundation Flanders (12T4818N).
Notes
We would like to thank Cara Verwimp, Fran Vanfleteren, Klara Schevenels, Lauren Blockmans, Justine Soetaert, Caitlan Nys, Valerie Geerits, and Aymara Taillieu for their assistance with MRI protocol preparation and data collection. We are grateful to Ron Peeters for technical MRI support. We are most thankful to all the participating families and children without whom this research would not be possible. Conflict of Interest: None declared.
References
Oldfield RC.
Wasserthal J, Maier-Hein KH, Neher PF, Wolf RC, Northoff G, Waddington JL, Kubera KM, Fritze S, Harneit A, Geiger LS, et al.