Fabio Richlan, Benjamin Gagl, Stefan Hawelka, Mario Braun, Matthias Schurz, Martin Kronbichler, Florian Hutzler, Fixation-Related fMRI Analysis in the Domain of Reading Research: Using Self-Paced Eye Movements as Markers for Hemodynamic Brain Responses During Visual Letter String Processing, Cerebral Cortex, Volume 24, Issue 10, October 2014, Pages 2647–2656, https://doi.org/10.1093/cercor/bht117
Abstract
The present study investigated the feasibility of using self-paced eye movements during reading (measured by an eye tracker) as markers for calculating hemodynamic brain responses measured by functional magnetic resonance imaging (fMRI). Specifically, we were interested in whether the fixation-related fMRI analysis approach was sensitive enough to detect activation differences between reading material (words and pseudowords) and nonreading material (line and unfamiliar Hebrew strings). Reliable reading-related activation was identified in left hemisphere superior temporal, middle temporal, and occipito-temporal regions including the visual word form area (VWFA). The results of the present study are encouraging insofar as fixation-related analysis could be used in future fMRI studies to clarify some of the inconsistent findings in the literature regarding the VWFA. Our study is the first step in investigating specific visual word recognition processes during self-paced natural sentence reading via simultaneous eye tracking and fMRI, thus aiming at an ecologically valid measurement of reading processes. We provided the proof of concept and methodological framework for the analysis of fixation-related fMRI activation in the domain of reading research.
Introduction
Reading is a complex activity involving numerous cognitive processes. In a typical everyday reading situation, a reader silently scans sentences and texts at his or her own pace with the intention of extracting information. In contrast, most neurocognitive studies on reading rely on an ecologically invalid reading situation. Specifically, in typical electroencephalography (EEG), magnetoencephalography (MEG), and functional magnetic resonance imaging (fMRI) studies, participants are presented with isolated letter strings (e.g., words, pseudowords) without context and asked to perform a more or less artificial task (e.g., reading aloud, rhyme judgment, lexical decision). Furthermore, stimulus exposure duration is chosen rather arbitrarily and controlled by the experimenter. This is problematic, because exposure time and task demands are known to alter the brain response (e.g., via top-down activation) to the point that it no longer reflects the processing of the stimulus material per se (e.g., Dehaene and Cohen 2011).
Apart from single-word presentation, other common paradigms lack ecological validity as well. For example, in studies based on rapid serial visual presentation (RSVP) of words or parts of sentences, the rate of presentation (i.e., which information is presented and when it is attended to) is determined externally by the experimenter rather than internally by the subject. Recently, it has been reported that the presentation rate used in RSVP experiments can substantially affect electrophysiological brain responses (Dambacher et al. 2012). Hence, such studies are probably of limited value for advancing our understanding of neurocognitive processes during natural reading. Rather, they inform us about the neural correlates of visual word processing in certain reading-related situations or tasks.
Paradigms in which whole sentences or text passages are presented and participants are unconstrained with respect to the timing and target location of their eye movements offer more natural reading situations. On the downside, any step toward ecological validity usually implies a loss of experimental control. For example, the complex pattern of fixations during natural reading (e.g., refixations, regressions, word skippings) varies across trials and participants. Consequently, the experimental situation is more natural, but data analyses are potentially less reliable and more difficult to interpret. Moreover, typical fMRI studies treat sentences (consisting of several words) as unitary events starting with the initial appearance of the text on the screen. The measured signal thus represents an average across multiple distinct neurocognitive processes during reading. Accordingly, to investigate processes associated with visual word processing during natural reading, one has to present sentences but treat the processing of their different parts (i.e., words) as separate events.
In the domain of EEG, this was realized by the use of fixation-related potentials (FRPs; Hutzler et al. 2007; Dimigen et al. 2011). In this approach, an eye tracker is used to measure the subject's eye movements (i.e., saccades) during reading and the resting periods on words (i.e., fixations) are used as markers for calculating electrophysiological brain potentials. The reasoning is that the point of time when a word is fixated for the first time (compared with the point of time when a word appears on the screen) is a more valid indicator for the beginning of cognitive processes that depend upon foveal perception of that word. Note that in natural reading there is also a significant amount of parafoveal preprocessing of upcoming, not yet fixated words, which—by definition—starts before foveal processing (e.g., Rayner 1998). Furthermore, eye movements represent internally (by the subject) generated shifts of attention, whereas in typical paradigms it is determined externally (by the experimenter) when the next stimulus is presented (and attended to).
The feasibility of the FRP approach in reading research is well established. Prominent EEG effects have been replicated and extended, for example the old–new effect (Hutzler et al. 2007) and the word-predictability effect (Dimigen et al. 2011). The benefit of FRPs over traditional ERPs is that they allow experimental paradigms that resemble natural reading conditions. Evidence for differing effects between artificial and natural reading situations is only beginning to emerge, but first studies point to an earlier onset of electrophysiological components under natural conditions (Dimigen et al. 2011), most likely resulting from parafoveal lexical preprocessing (Reingold et al. 2012). This is important because the latency of specific effects is one of the most valuable pieces of information to be gained from EEG studies.
Complementary to EEG, fMRI offers the possibility of studying human brain function at a high spatial resolution. Recently, Marsman et al. (2012) transferred the logic of the FRP approach to fMRI. In a visual object processing task, they demonstrated the feasibility of using fixations as onsets for calculating hemodynamic responses. The success of the fixation-related analysis approach is remarkable, given the low temporal resolution of fMRI compared with EEG. Specifically, concerns about using fixations as onsets in the analysis of fMRI data relate to two temporal properties of fixations: first, they are relatively short (∼200–300 ms) and, second, they usually occur at a relatively high rate (∼3–4 per second), separated only by short saccades (∼20–30 ms) during which no visual information is picked up.
The results of Marsman et al. (2012) are encouraging in that they suggest the fixation-related analysis approach may also be applicable to reading material. Specifically, Marsman et al. (2012) presented screens with a circular array of visual objects (pictures of faces and houses), and participants were free to explore these screens, with a memory task following stimulus presentation. The participants' eye movement behavior corresponded to typical viewing behavior during reading, with brief fixations, a high frequency of saccades, and a number of refixations and regressions. Importantly, fixations corresponded to an fMRI response reliably enough to identify activation differences between faces and houses. Thus, this study provides evidence for the fundamental feasibility of using fixations as markers for the onset of hemodynamic events.
The present study investigated whether fixation-related fMRI analysis may also be applied to the domain of reading research. However, before applying this method to natural reading situations (i.e., sentence reading), it must be shown that analysis of fMRI responses from self-paced fixations is sensitive enough to identify activation differences between reading material (e.g., words) and nonreading material (e.g., line strings). To provide this proof of concept, we adapted the Marsman et al. (2012) study and, instead of faces and houses, presented familiar letter strings (words), unfamiliar letter strings (pseudowords), simple nonletter strings (line strings), and complex unfamiliar character strings (Hebrew strings). Note that the circular array of stimuli and the judgment task used in our study are very different from natural reading. The main purpose of our study, however, was to investigate whether the fixation-related fMRI approach as implemented by Marsman et al. (2012) for visual object processing would also be suitable for visual letter string processing.
We expected to find activation of the typical task-positive bilateral network for visual processing of character strings (independent of stimulus type). This is important for verifying general data quality and suitability for further analysis. However, the critical expectation concerned the comparison between the different stimulus categories. We expected higher activation for reading material (words and pseudowords) compared with nonreading material (line and Hebrew strings) in left hemisphere language regions. Specifically, reading-related activation was expected in left posterior temporal, left occipito-temporal, and left inferior frontal regions. These regions are generally accepted as important reading-related regions (see reviews by Jobard et al. 2003; Démonet et al. 2004; Schlaggar and McCandliss 2007; Shaywitz and Shaywitz 2008; Price 2012; Richlan 2012). The present study is the first step in the application of the fixation-related fMRI approach to the domain of reading research. It paves the way for subsequent fMRI studies utilizing self-paced natural sentence reading.
Materials and Methods
Participants
Eighteen (13 females, 5 males) German-speaking adults in the age range of 17–35 years (M = 24.39 years; SD = 4.41 years) participated in the study. All participants had normal or corrected-to-normal vision and reported no history of neurological disease or reading difficulties. None of the participants were familiar with Hebrew characters. Participants gave written informed consent and were paid for their participation.
Stimuli and Task
The stimulus set consisted of 108 words, 108 pseudowords, 108 line strings, and 108 Hebrew strings. Each item consisted of 5 characters of mono-spaced Courier New font, with single character widths not exceeding 0.3° of visual angle (total item width ∼1.7° of visual angle, which is equivalent to typical reading situations). As evident from Table 1, words (with a log frequency of 0.96; CELEX database; Baayen et al. 1993) were matched to pseudowords on the following variables: number of syllables, number of Coltheart's orthographic neighbors (i.e., same-length words differing by one letter), log frequency of the highest frequency neighbor, initial bigram frequency, final bigram frequency, and summated bigram frequency. Line strings—serving as simple visual control stimuli—consisted of forward and backward slashes. Hebrew strings—serving as more complex visual control stimuli—were built of Hebrew characters, with which our participants were unfamiliar. Compared with the reading material (i.e., words and pseudowords), the line strings had fewer visual features (e.g., line length, number of line junctions, number of brushstrokes), while the visual characteristics of the Hebrew strings were similar to those of the Latin letter strings. All stimuli were presented in the same font size and—due to the mono-spaced font—were equal in width. Ten stimuli from each of the four categories included a sequence with two identical symbols in immediate succession.
Table 1. Characteristics of words and pseudowords, M (SD)

Characteristics | Words | Pseudowords |
---|---|---|
Log frequency | 0.96 (0.52) | — |
Number of syllables | 1.87 (0.46) | 1.88 (0.45) |
Number of Coltheart's orthographic neighbors | 2.35 (2.02) | 2.33 (1.59) |
Log frequency of the highest frequency neighbor | 1.00 (0.94) | 1.02 (0.93) |
Initial bigram frequency | 437 (364) | 437 (366) |
Final bigram frequency | 1046 (1579) | 1381 (2449) |
Summated bigram frequency | 13475 (6882) | 13428 (6722) |
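The matching variables in Table 1 can be illustrated with a minimal Python sketch. It assumes a toy lexicon that maps lowercase word forms to hypothetical corpus counts (not CELEX values), applies the Coltheart's N definition given above, and treats bigram frequency as a token-weighted, position-independent count; that last choice is an assumption, because the exact bigram definition is not reported here.

```python
import math
from collections import Counter


def is_neighbor(a: str, b: str) -> bool:
    """Coltheart's N criterion: same length, exactly one substituted letter."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def bigram_counts(lexicon):
    """Token-weighted, position-independent bigram counts (assumed definition)."""
    counts = Counter()
    for word, freq in lexicon.items():
        for i in range(len(word) - 1):
            counts[word[i:i + 2]] += freq
    return counts


def describe(item, lexicon):
    """Neighborhood and bigram statistics for one word or pseudoword."""
    bigrams = bigram_counts(lexicon)
    nbrs = [w for w in lexicon if is_neighbor(item, w)]
    pairs = [item[i:i + 2] for i in range(len(item) - 1)]
    return {
        "N": len(nbrs),
        # log10 of the most frequent neighbor's count; None if there are no neighbors
        "max_neighbor_log_freq": max((math.log10(lexicon[w]) for w in nbrs), default=None),
        "initial_bigram": bigrams[pairs[0]],
        "final_bigram": bigrams[pairs[-1]],
        "summated_bigram": sum(bigrams[p] for p in pairs),
    }
```

For the toy lexicon `{"cat": 50, "cot": 10, "cut": 30, "dog": 40}`, `describe("cat", ...)` finds two neighbors (cot, cut), whereas the pseudoword-like `"cet"` finds three and has a summated bigram count of 0, since none of its bigrams occur in the lexicon.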
As illustrated in Figure 1, six stimuli were arranged equidistantly at six fixed locations and presented simultaneously via a mirror on an MR-compatible LCD screen (NordicNeuroLab, Bergen, Norway). This design was borrowed from Marsman et al. (2012). The LCD screen was set to a refresh rate of 60 Hz and a resolution of 1024 × 768 pixels. A total of 72 screens (each containing 6 stimuli) were presented with the Experiment Builder software (SR-Research Ltd., ON, Canada). In addition, 18 null-events were included in which a fixation cross was presented in the center of the screen.
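One way to picture the six fixed, equidistant locations is as evenly spaced points on a circle around the screen center. The sketch below assumes this circular geometry (as in the Marsman et al. (2012) design); the radius and starting angle are hypothetical placeholders because the text does not report them.

```python
import math

SCREEN_W, SCREEN_H = 1024, 768  # resolution reported in the text


def circular_positions(n=6, radius_px=250.0, start_deg=90.0):
    """Pixel centers of n items spaced equally on a circle around the screen center.

    radius_px and start_deg are hypothetical; the text does not report them.
    Screen y grows downward, so y is flipped to keep angles counter-clockwise.
    """
    cx, cy = SCREEN_W / 2, SCREEN_H / 2
    positions = []
    for i in range(n):
        theta = math.radians(start_deg + i * 360.0 / n)
        positions.append((cx + radius_px * math.cos(theta), cy - radius_px * math.sin(theta)))
    return positions
```

With six items, neighboring positions are 60° apart, so each item is exactly one radius away from its two neighbors.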

In contrast to the Marsman et al. (2012) study, which used a Sternberg memory task, we chose an implicit reading task that imposed similar attentional and task demands on all four stimulus categories: participants counted the number of items containing two identical consecutive characters. The rationale behind this task was to ensure comparable processing demands for reading and nonreading material, as well as for words and pseudowords, with as little top-down influence as possible. Nevertheless, the task should result in implicit, automatic reading of words and pseudowords upon fixation. For example, on a screen with “Komet,” “רוצטא,” “Hobby,” “/ \ / \ /,” “Valer,” and “/ \ / / \” the correct answer was “two,” as “Hobby” and “/ \ / / \” each contained two identical characters in immediate succession. Thirty-six screens contained no such string, 32 screens contained one, and 4 screens contained two. Each trial started with a central fixation cross, which was presented pseudo-randomly for 3000, 3500, 4000, 4500, or 5000 ms. The fixation cross was followed by a drift correction performed with the eye tracker (details below). After the drift correction, the stimulus screen was presented for 4000, 4500, 5000, 5500, or 6000 ms, depending on the duration of the fixation cross (i.e., when the fixation cross was presented for 3000 ms, the stimulus screen was presented for 6000 ms, and so on). The temporal jittering of the fixation cross and stimulus screen was intended to reduce participants' expectancy effects. In contrast to Marsman et al. (2012), who presented the stimuli for up to 18 s, we chose a shorter duration to reduce the number of regressions to previously fixated items, which may lead to unwanted repetition priming effects. In summary, the combined duration of the fixation cross and the stimulus screen was 9000 ms plus a variable duration resulting from the drift correction.
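The pairing rule for the jittered durations amounts to a fixed 9000 ms budget split between fixation cross and stimulus screen. The sketch below draws fixation durations uniformly at random, which is an assumption: the exact pseudo-randomization constraints are not described. Drift-correction time is left out, as in the summary above.

```python
import random

FIXATION_MS = (3000, 3500, 4000, 4500, 5000)  # fixation-cross durations from the text
TOTAL_MS = 9000  # fixation cross + stimulus screen, excluding drift correction


def trial_timing(rng):
    """Draw a fixation-cross duration and pair it with the complementary stimulus duration."""
    fix = rng.choice(FIXATION_MS)
    return fix, TOTAL_MS - fix


def schedule(n_trials=72, seed=0):
    """(fixation_ms, stimulus_ms) pairs for all stimulus screens; uniform draws are an assumption."""
    rng = random.Random(seed)
    return [trial_timing(rng) for _ in range(n_trials)]
```

Because each stimulus duration is derived from its fixation duration, every trial lasts exactly 9000 ms before drift correction, while the onset of the stimulus screen within that window still varies.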
After each stimulus screen, a central question mark appeared for 3000 ms, and participants responded by pressing one of three keys on a response pad with their thumb (corresponding to 0, 1, and 2 strings containing two identical consecutive characters, respectively). Participants were familiarized with the procedure in a training session outside the scanner before scanning.
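The counting task and its response mapping can be expressed as a short scoring function. The sketch assumes that the spaces shown in the line-string example are typographic only, so each line string is written as five adjacent characters; `has_double` and `correct_key` are illustrative names, not the experiment's software.

```python
from typing import Sequence


def has_double(item: str) -> bool:
    """True if any two adjacent characters are identical (e.g., 'bb' in 'Hobby')."""
    return any(a == b for a, b in zip(item, item[1:]))


def correct_key(screen: Sequence[str]) -> int:
    """Correct response key (0, 1, or 2): number of items on the screen containing a double."""
    n = sum(has_double(item) for item in screen)
    if n > 2:
        raise ValueError("screens in this design contain at most two such items")
    return n
```

Applied to the example screen from the task description, written as `["Komet", "רוצטא", "Hobby", "/\\/\\/", "Valer", "/\\//\\"]`, the function returns 2.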
Data Acquisition and Analysis
Eye movements were recorded with an EyeLink CL system (SR-Research Ltd., ON, Canada) in the long-range setup. The camera was mounted on the head side of the scanner bore, nearest to the LCD screen. Movements of the right eye were recorded at a sampling rate of 1000 Hz. During recording, the head was stabilized in the head coil ∼90 cm from the screen. The eye tracker was calibrated with a 9-point calibration routine before each run and whenever the drift correction failed. Before each trial, a drift correction procedure was used to adapt the calibration to minor drifts.
Fixations were attributed to an item when they fell within a rectangle of 200 × 150 pixels around the center of an item (corresponding to a width of ∼4.5°). Fixations shorter than 80 ms were discarded from analysis.
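The fixation-to-item assignment rule can be sketched as follows. It assumes that each 200 × 150 pixel rectangle is centered on the item center and that fixations falling outside every rectangle are kept but left unassigned; the `Fixation` record and the item centers used in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

AOI_W, AOI_H = 200, 150  # rectangle size in pixels, from the text
MIN_DUR_MS = 80          # fixations shorter than this are discarded


@dataclass
class Fixation:
    x: float         # horizontal screen position (px)
    y: float         # vertical screen position (px)
    start_ms: float  # fixation onset
    dur_ms: float    # fixation duration


def assign_item(fix: Fixation, centers: Sequence[Tuple[float, float]]) -> Optional[int]:
    """Index of the item whose rectangle contains the fixation, or None.

    If rectangles overlapped, the first matching item would win; the layout is
    assumed to keep them apart.
    """
    for i, (cx, cy) in enumerate(centers):
        if abs(fix.x - cx) <= AOI_W / 2 and abs(fix.y - cy) <= AOI_H / 2:
            return i
    return None


def label_fixations(fixations: Sequence[Fixation], centers) -> List[Tuple[Fixation, Optional[int]]]:
    """Drop fixations shorter than 80 ms, then pair each remaining one with its item (or None)."""
    return [(f, assign_item(f, centers)) for f in fixations if f.dur_ms >= MIN_DUR_MS]
```

The first labeled fixation on each item, in temporal order, would then supply the onset used in the fMRI model.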
Functional imaging data were acquired with a Siemens Magnetom Trio 3 Tesla scanner (Siemens AG, Erlangen, Germany) equipped with a 12-channel head coil. Functional images sensitive to blood oxygen level-dependent (BOLD) contrast were acquired with a T2*-weighted gradient echo EPI sequence (TR 2000 ms, TE 30 ms, matrix 64 × 64, FOV 192 mm, flip angle 80°). Thirty-six slices with a slice thickness of 3 mm and a slice gap of 0.3 mm were acquired within the TR. Scanning proceeded in 3 runs with a variable number of scans per run. The exact number of scans depended on the participants' viewing behavior and the calibration procedure, and ranged from 197 to 359 scans (M = 224 scans, SD = 27 scans). In addition to the functional images, a gradient echo field map (TR 488 ms, TE 1 = 4.49 ms, TE 2 = 6.95 ms) and a high-resolution (1 × 1 × 1.2 mm) structural scan with a T1-weighted MPRAGE sequence were acquired from each participant.
Preprocessing and statistical analysis were performed with SPM8 (http://www.fil.ion.ucl.ac.uk/spm/) running in MATLAB 7.6 (Mathworks, Inc., Natick, MA, USA). Functional images were corrected for geometric distortions with the FieldMap toolbox, realigned and unwarped, and then coregistered to the high-resolution structural image. The structural image was normalized to the MNI T1 template image, and the resulting parameters were used to normalize the functional images, which were resampled to isotropic 3 × 3 × 3 mm voxels and smoothed with a 6 mm FWHM Gaussian kernel. No slice timing correction was applied.
Statistical analysis was performed with a two-stage mixed-effects model. The crucial step of the fixation-related approach was implemented in the subject-specific first-level model specification. In contrast to traditional event-related analysis, where stimulus onsets are modeled, in the fixation-related analysis each first fixation on an item was modeled with an informed basis set comprising a canonical hemodynamic response function and its time and dispersion derivatives. The movement parameters derived from the realignment step during preprocessing were modeled as covariates of no interest. The functional data in these first-level models were high-pass filtered with a cut-off of 128 s and corrected for autocorrelation with an AR(1) model (Friston et al. 2002). In these first-level models, the parameter estimates reflecting signal change for words versus baseline (which consisted of the inter-stimulus interval, the null-events, and the eye tracker drift correction/recalibration procedure), pseudowords versus baseline, line strings versus baseline, and Hebrew strings versus baseline were calculated in the context of a GLM (Henson 2004). These subject-specific contrast images were entered into the second-level random effects analysis.
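The core modeling step can be sketched in NumPy: first-fixation onsets become stick functions on a fine time grid, are convolved with a double-gamma HRF, and are sampled once per TR. This is a re-implementation under stated assumptions, not SPM8's code. The HRF uses SPM-style default parameters (response gamma shape 6, undershoot gamma shape 16, undershoot ratio 1/6, 32 s kernel), the grid uses 16 time bins per TR, each scan is sampled at its first bin (a simplification), onsets are assumed to be in seconds from the first scan, and the time and dispersion derivatives of the informed basis set are omitted.

```python
import math

import numpy as np


def gamma_pdf(t, shape, scale=1.0):
    """Gamma probability density evaluated at times t (seconds); zero for t <= 0."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    pos = t > 0
    out[pos] = (t[pos] ** (shape - 1) * np.exp(-t[pos] / scale)
                / (math.gamma(shape) * scale ** shape))
    return out


def canonical_hrf(dt, length_s=32.0):
    """Double-gamma HRF sampled every dt seconds, scaled to unit sum."""
    t = np.arange(0.0, length_s + dt, dt)
    h = gamma_pdf(t, 6.0) - gamma_pdf(t, 16.0) / 6.0
    return h / h.sum()


def fixation_regressor(onsets_s, n_scans, tr=2.0, microtime=16):
    """Convolve stick functions at fixation onsets with the HRF; sample once per TR."""
    dt = tr / microtime
    n_fine = n_scans * microtime
    sticks = np.zeros(n_fine)
    for onset in onsets_s:
        idx = int(round(onset / dt))
        if 0 <= idx < n_fine:
            sticks[idx] += 1.0  # one unit impulse per first fixation
    conv = np.convolve(sticks, canonical_hrf(dt))[:n_fine]
    return conv[::microtime]  # value at the first microtime bin of each scan
```

Calling `fixation_regressor` once per stimulus category, with that category's first-fixation onsets, would give the four condition columns of a first-level design matrix; the movement covariates, high-pass filter, and AR(1) correction described above are not included.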
Activation for each of the four baseline contrasts was examined by t-tests thresholded at a voxel-level (height) of P < 0.005 (uncorrected) and a cluster-level (extent) of P < 0.05 (corrected for multiple comparisons using the family-wise error rate). The resulting activation maps were combined and used as a mask to search for differences between reading material (words and pseudowords) and nonreading material (line and Hebrew strings). These analyses were thresholded at the same voxel-level and cluster-level threshold used for the baseline contrasts.
Results
Behavioral and Eye Tracking Results
Task accuracy was close to ceiling, with 96.14% correct responses on average (SD = 4.17%). As evident from Table 2, Hebrew strings elicited longer looking times and higher numbers of fixations and regressions than the other stimulus types, Fs(3, 51) > 11.96, Ps < 0.001. In addition, compared with pseudowords, words elicited shorter first fixation durations, t(17) = 2.79, P < 0.05, shorter first pass reading times, t(17) = 5.38, P < 0.001, shorter total reading times, t(17) = 7.16, P < 0.001, and a lower total number of fixations, Wilcoxon W = 2.11, P < 0.05.
Table 2. Eye-tracking measures by stimulus type, M (SD)

Measure | Words | Pseudowords | Line strings | Hebrew strings |
---|---|---|---|---|
First fixation duration (ms) | 295 (38) | 314 (55) | 313 (54) | 357 (66) |
First pass reading time (ms) | 431 (67) | 465 (75) | 457 (103) | 625 (141) |
Total reading time (ms) | 532 (61) | 596 (69) | 571 (116) | 891 (179) |
First pass number of fixations | 1.48 (0.26) | 1.50 (0.28) | 1.51 (0.25) | 1.83 (0.39) |
Total number of fixations | 1.87 (0.29) | 1.96 (0.33) | 1.94 (0.33) | 2.67 (0.47) |
Number of regressions | 0.27 (0.10) | 0.30 (0.13) | 0.29 (0.11) | 0.45 (0.19) |
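The measures in Table 2 follow common eye-tracking definitions, sketched below for a single item. The definitions are assumptions, since the text does not spell them out. First pass is taken as the first run of consecutive fixations on the item, and a regression is counted each time the item is re-entered after the gaze has left it, including departures to fixations that fell outside all items.

```python
from typing import Dict, List, Optional, Tuple


def item_measures(sequence: List[Tuple[Optional[int], float]], item: int) -> Dict[str, float]:
    """Eye-movement measures for one item from a trial's fixation sequence.

    sequence: (item index or None, duration in ms) per fixation, in temporal order.
    Returns an empty dict if the item was never fixated.
    """
    visits: List[List[float]] = []  # each visit = durations of consecutive fixations on the item
    prev: Optional[int] = None
    for idx, dur in sequence:
        if idx == item:
            if prev == item:
                visits[-1].append(dur)  # still inside the same visit
            else:
                visits.append([dur])    # first entry or re-entry into the item
        prev = idx
    if not visits:
        return {}
    first = visits[0]
    return {
        "first_fixation_duration": first[0],
        "first_pass_reading_time": sum(first),
        "total_reading_time": sum(sum(v) for v in visits),
        "first_pass_fixations": len(first),
        "total_fixations": sum(len(v) for v in visits),
        "regressions": len(visits) - 1,  # assumed: number of re-entries
    }
```

For the sequence `[(0, 250), (0, 150), (1, 300), (None, 120), (0, 200)]`, item 0 has a first fixation duration of 250 ms, a first pass reading time of 400 ms, a total reading time of 600 ms, three fixations in total, and one regression.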
fMRI Results
As shown in Figure 2, each of the four baseline contrasts (words > fixation, pseudowords > fixation, line strings > fixation, Hebrew strings > fixation) resulted in activation of a similar bilateral task-positive network. This network included bilateral occipital regions extending ventrally into posterior temporal regions (inferior, middle, and superior temporal gyri) and dorsally into superior parietal and postcentral regions. Furthermore, activation was identified in bilateral precentral, inferior temporal, middle temporal, and supplementary motor regions, as well as in the cerebellum and in subcortical regions (putamen, pallidum, caudate nuclei, thalamus, and middle cingulum). Words and pseudowords showed a slight left-lateralization (especially in temporal regions), whereas line and Hebrew strings showed higher bilateral occipito-parietal activation.

Figure 2. Surface rendering of the baseline contrasts (left, right, and ventral views; the ventral view is shown with the cerebellum removed).
The results from the four separate baseline contrasts were inclusively combined into a disjunction mask, which was used to search for differences between the stimulus categories. That is, the analysis was restricted to voxels that showed reliable activation in at least one of the four baseline contrasts.
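Restricting the search to the disjunction mask amounts to a voxel-wise logical OR across the thresholded baseline maps. The sketch assumes each map has already been thresholded (voxel-level P < 0.005 plus cluster-level FWE correction, which SPM performs and which is not reproduced here) into a boolean array on a common voxel grid.

```python
import numpy as np


def disjunction_mask(maps):
    """Voxels significant in at least one thresholded baseline map (logical OR)."""
    return np.logical_or.reduce([np.asarray(m, dtype=bool) for m in maps])


def masked(stat_map, mask):
    """Keep statistics inside the mask; set all other voxels to NaN."""
    out = np.asarray(stat_map, dtype=float).copy()
    out[~mask] = np.nan
    return out
```

The category comparisons (e.g., words > line strings) would then be evaluated only where the mask is true, which shrinks the search volume used for multiple-comparison correction.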
Figure 3 and Table 3 show the regions with higher activation for reading material (words and pseudowords) compared with nonreading material (line and Hebrew strings). As expected, reading material compared with nonreading material led to higher activation in left posterior temporal regions. Specifically, words, compared with line strings, showed higher activation in the left posterior middle temporal gyrus (MTG) and in the left superior temporal sulcus (STS). Compared with Hebrew strings, words showed higher activation in similar regions, with an additional small extension in the left supramarginal gyrus (SMG).
Table 3. Regions with higher activation for reading material (words and pseudowords) compared with nonreading material (line and Hebrew strings)

Region | MNI x | MNI y | MNI z | Z | Extent (voxels) |
---|---|---|---|---|---|
Words > line strings | | | | | |
L posterior MTG | −57 | −55 | 4 | 4.37 | 118 |
L STS | −51 | −34 | 10 | 3.77 | |
L MTG | −48 | −49 | 4 | 3.34 | |
L STG | −48 | −16 | −16 | 3.22 | |
Words > Hebrew strings | | | | | |
L posterior MTG | −60 | −58 | 7 | 4.60 | 124 |
L SMG | −60 | −37 | 25 | 4.15 | |
L STS | −51 | −37 | 7 | 4.11 | |
L STG | −48 | −46 | 19 | 3.54 | |
Pseudowords > line strings | | | | | |
L anterior OTS | −42 | −43 | −11 | 3.83 | 108 |
L MTG | −48 | −49 | 4 | 3.73 | |
L STS | −57 | −37 | 4 | 3.51 | |
L posterior MTG | −57 | −58 | 4 | 3.45 | |
Pseudowords > Hebrew strings | | | | | |
L STS | −66 | −34 | 7 | 4.09 | 118 |
L posterior MTG | −60 | −55 | 7 | 3.95 | |
L anterior OTS | −45 | −43 | −14 | 3.84 | |
L MTG | −54 | −37 | 4 | 3.56 | |
L, left; MTG, middle temporal gyrus; OTS, occipito-temporal sulcus; SMG, supramarginal gyrus; STG, superior temporal gyrus; STS, superior temporal sulcus.

The activation pattern for pseudowords was slightly different. Compared with line strings, the maximum of the activation difference was located not in middle temporal regions but in the left occipito-temporal sulcus (OTS). However, submaxima were similar to those for words and were located in the left MTG and the left STS. Pseudowords compared with Hebrew strings led to higher activation at similar coordinates, but the maximum was located not in the OTS but in the left STS.
Unexpectedly, our analysis failed to identify higher activation for reading material compared with nonreading material in left inferior frontal language regions. However, after omitting the cluster-level correction, reading-related activation was identified in the opercular and triangular parts of the left inferior frontal gyrus (IFG), as well as in the left insula. This tendency was also evident in the stronger and more extended activation for words and pseudowords compared with line and Hebrew strings, as illustrated in Figure 2.
The finding of higher activation for pseudowords (but not for words) compared with nonreading material in the left OTS was of specific interest. For closer inspection of this effect, we directly compared activation elicited by pseudowords with activation elicited by words. This analysis revealed a tendency toward higher activation for pseudowords in an anterior aspect of the left OTS (statistically significant only without the cluster-level correction). To investigate the activation pattern in the OTS more precisely, we conducted a region of interest (ROI) analysis. This analysis focused on the ventral visual stream of both hemispheres and was similar to ROI analyses from recent fMRI studies on visual word recognition (Brem et al. 2006; Vinckier et al. 2007; Van der Mark et al. 2009, 2011; Richlan et al. 2010; Schurz et al. 2010; Szwed et al. 2011). Signal change estimates (in arbitrary units) for each of the four stimulus categories versus baseline were extracted from four left hemisphere spheres (6 mm radius) and from four homologous right hemisphere spheres. Along the y-axis (anterior to posterior gradient), the centers of the spheres were spaced equidistantly by 12 mm so that the spheres did not overlap. The location of the spheres and the results of the ROI analyses are provided in Figure 4. Differences between stimulus categories were analyzed by ANOVAs, and significant post hoc pairwise comparisons are indicated by asterisks.

Figure 4. ROI-based analysis of the ventral visual stream. Bar plots represent signal change estimates (in arbitrary units) and standard errors of the mean (SEM). Statistically significant post hoc pairwise comparisons are indicated by asterisks: *P < 0.05, **P < 0.01, ***P < 0.001. W, words; PW, pseudowords; LS, line strings; HS, Hebrew strings.
The only region that showed a lexicality effect, that is, higher activation for pseudowords compared with words, was the most anterior ROI in the left OTS, centered around x = −42, y = −40, z = −14. In addition, this region showed substantially higher activation for reading material compared with nonreading material. The adjacent, slightly more posterior left ROI (around x = −42, y = −52, z = −14) also showed significantly higher activation for words and pseudowords compared with line strings. However, there was no difference compared with the visually more letter-like Hebrew strings. In the two most posterior left ROIs, Hebrew strings exhibited the highest activation levels, followed by line strings. The same pattern was observed in the right hemisphere ROIs; none of the right hemisphere regions showed higher activation for words or pseudowords compared with line or Hebrew strings.
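The sphere layout described above can be checked with a few lines of geometry. The sketch below is illustrative only: it takes the most anterior left center reported in the text (x = −42, y = −40, z = −14), derives the remaining centers from the stated 12 mm spacing (so y = −64 and y = −76 for the two most posterior spheres are inferred, not reported), mirrors them to the right hemisphere by flipping the sign of x (an assumption about how the homologous spheres were placed), and builds masks on a hypothetical 2 mm voxel grid. Because 12 mm is exactly twice the 6 mm radius, neighboring spheres are tangent, so whether adjacent masks share a voxel depends on whether voxels lying exactly on the sphere surface are counted.

```python
import itertools
import math

RADIUS_MM = 6.0    # sphere radius stated in the text
SPACING_MM = 12.0  # center-to-center spacing along y stated in the text
# Most anterior left-hemisphere center reported in the text (mm).
ANTERIOR_LEFT = (-42.0, -40.0, -14.0)


def left_centers(anterior=ANTERIOR_LEFT, n=4, spacing=SPACING_MM):
    """Equidistant centers running posteriorly (decreasing y); posterior ones are inferred."""
    x, y, z = anterior
    return [(x, y - i * spacing, z) for i in range(n)]


def mirror(center):
    """Assumed homologous right-hemisphere center: flip the sign of x."""
    x, y, z = center
    return (-x, y, z)


def sphere_voxels(center, radius=RADIUS_MM, voxel_mm=2.0, include_surface=False):
    """Voxel centers inside the sphere on a hypothetical grid passing through `center`.

    Centers 12 mm apart share the same 2 mm grid, so masks can be intersected directly.
    """
    steps = int(radius // voxel_mm) + 1
    voxels = set()
    for i, j, k in itertools.product(range(-steps, steps + 1), repeat=3):
        p = (center[0] + i * voxel_mm, center[1] + j * voxel_mm, center[2] + k * voxel_mm)
        d = math.dist(p, center)
        if d < radius or (include_surface and d == radius):
            voxels.add(p)
    return voxels


left = left_centers()
right = [mirror(c) for c in left]
gaps = [math.dist(a, b) for a, b in zip(left, left[1:])]
print(gaps)  # [12.0, 12.0, 12.0], i.e. exactly 2 * RADIUS_MM: the spheres are tangent

strict = sphere_voxels(left[0]) & sphere_voxels(left[1])
inclusive = (sphere_voxels(left[0], include_surface=True)
             & sphere_voxels(left[1], include_surface=True))
print(len(strict), len(inclusive))  # 0 1 (only the tangent point is shared)
```

With a strict radius test the masks are disjoint; counting surface voxels makes adjacent masks share the single voxel at the tangent point (y = −46).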
Discussion
The present study investigated the feasibility of fixation-related analysis of fMRI data in the domain of reading research. Before applying the approach to natural reading (i.e., in the context of sentences), we provided the proof of concept that analysis of fMRI responses from self-paced fixations is sensitive enough to identify activation differences between reading material and nonreading material. This is a prerequisite for subsequent fMRI studies using self-paced natural sentence reading. In the following, we discuss the fMRI findings as well as some methodological considerations and challenges for future studies.
Baseline Contrasts
As expected, fixation of each of the four stimulus types (words, pseudowords, line strings, Hebrew strings) resulted in activation of a similar bilateral network including occipital, temporal, parietal, frontal, cerebellar, and subcortical regions. This network is typically associated with task-positive activation (Fox et al. 2005; Fox and Raichle 2007; Power et al. 2011) and is thought to underlie perceptual processes that operate on external information, as is the case during visual string processing (Binder et al. 2009). Activation of this network was an important indicator of data quality, because it replicates findings from previous fMRI studies with reading material that used conventional analyses (e.g., Carreiras et al. 2007; Fiebach et al. 2007; Jobard et al. 2007; Cohen et al. 2008; Kronbichler et al. 2009; Van der Mark et al. 2009; Schurz et al. 2010; Twomey et al. 2011). We thus demonstrated that fixation-related fMRI analysis worked at a fundamental level (i.e., for baseline contrasts) in the present dataset, an important prerequisite for further comparisons between stimulus categories.
Differences Between Reading Material and Nonreading Material
The critical expectation concerned the comparisons between reading material (words and pseudowords) and nonreading material (line and Hebrew strings). We expected higher activation for reading material in three regions typically associated with reading or reading-related tasks: the left posterior temporal region, the left ventral occipito-temporal region, and the left inferior frontal region. Numerous functional neuroimaging studies have identified these regions as core regions for reading and visual letter string processing (see reviews by Jobard et al. 2003; Démonet et al. 2004; Schlaggar and McCandliss 2007; Shaywitz and Shaywitz 2008; Price 2012; Richlan 2012). Although we used an implicit reading task, we expected that fixating words and pseudowords would trigger automatic reading.
In line with our expectations, we found higher activation for words compared with either type of nonreading string (line and Hebrew strings) in left posterior middle and superior temporal regions. For pseudowords compared with nonreading strings, higher activation was found in both left posterior middle/superior temporal regions and left ventral occipito-temporal regions.
Notably, in left inferior frontal language regions, activation differences between reading material and nonreading material were identified only without the cluster-level correction, in small clusters in the opercular and triangular parts of the left IFG and in the left insula. Overall, the consistency of these results with findings from previous studies using explicit reading tasks indicates that our implicit reading task, although it did not require reading the letter strings, resulted in reliable automatic word and pseudoword processing.
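The comparisons discussed above can be written as contrast weights over the four stimulus conditions. The sketch below assumes a general linear model with one fixation-related regressor per condition and an implicit baseline; this section does not specify the design matrix, so the ordering, names, and the pooled "reading > nonreading" coding are illustrative assumptions. The beta values are made up solely to show the arithmetic and are not results of the study.

```python
# Condition order assumed for the weight vectors below.
CONDITIONS = ("W", "PW", "LS", "HS")  # words, pseudowords, line strings, Hebrew strings

CONTRASTS = {
    # Baseline contrasts: one condition against the implicit baseline.
    "W > baseline": (1, 0, 0, 0),
    "PW > baseline": (0, 1, 0, 0),
    "LS > baseline": (0, 0, 1, 0),
    "HS > baseline": (0, 0, 0, 1),
    # Pairwise reading vs. nonreading comparisons named in the text.
    "W > LS": (1, 0, -1, 0),
    "W > HS": (1, 0, 0, -1),
    "PW > LS": (0, 1, -1, 0),
    "PW > HS": (0, 1, 0, -1),
    # Pooled reading vs. nonreading comparison (illustrative coding).
    "reading > nonreading": (1, 1, -1, -1),
    # Lexicality comparison.
    "PW > W": (-1, 1, 0, 0),
}


def contrast_value(weights, betas):
    """Weighted sum of condition estimates (betas ordered as CONDITIONS)."""
    if len(weights) != len(CONDITIONS) or len(betas) != len(CONDITIONS):
        raise ValueError("expected one weight and one beta per condition")
    return sum(w * b for w, b in zip(weights, betas))


def is_differential(weights):
    """Weights summing to zero cancel any offset shared by all conditions."""
    return sum(weights) == 0


# Made-up betas for illustration only (not data from the study).
toy_betas = (2.0, 2.5, 1.0, 1.5)
for name, w in CONTRASTS.items():
    print(name, is_differential(w), contrast_value(w, toy_betas))
```

Keeping differential contrasts balanced (weights summing to zero) is what distinguishes the category comparisons from the baseline contrasts.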
Left Middle and Superior Temporal Regions
The most consistent activation across all four comparisons (words > line strings, words > Hebrew strings, pseudowords > line strings, pseudowords > Hebrew strings) was found in the left posterior MTG. This region is typically associated with semantic processing, and its important role in reading has been extensively documented not only by fMRI studies (e.g., Jobard et al. 2003; Vigneau et al. 2006; Binder and Desai 2011; Price 2012) but also by EEG and MEG studies (e.g., Simos et al. 2002; Lau et al. 2008; Vartiainen et al. 2011).
In addition to the left posterior MTG, the present study identified higher activation for reading material (words and pseudowords) compared with nonreading material (line and Hebrew strings) in more dorsal regions, including the left STS, the left superior temporal gyrus (STG), and an inferior part of the left SMG. Taken together, these regions, classically described as Wernicke's area, are thought to play a central role in integrating auditory and visual information (e.g., van Atteveldt et al. 2004). They are involved in both the perception and production of speech (Hickok and Poeppel 2007; Price 2012). During reading, their main function is related to grapheme–phoneme conversion (Jobard et al. 2003; Vigneau et al. 2006). The identification of reliable activation in these well-established reading-related regions (middle and superior temporal regions) is an important manipulation check, necessary for demonstrating the validity of the fixation-related fMRI approach for reading research.
Left Occipito-Temporal Regions
The whole-brain analysis identified higher activation in the left OTS for pseudowords compared with nonreading material, but not for words compared with nonreading material. The direct comparison between words and pseudowords was statistically significant only without the cluster-level correction, revealing a small cluster with higher activation for pseudowords than for words in an anterior portion of the left OTS. The left OTS, located between the fusiform gyrus and the inferior temporal gyrus, is considered one of the core parts of the reading network. We therefore examined the activation pattern in this region more closely. Specifically, to provide further evidence for the plausibility of the present results obtained with the fixation-related analysis approach, we conducted an ROI analysis based on coordinates from the literature (Brem et al. 2006; Vinckier et al. 2007; Van der Mark et al. 2009, 2011; Richlan et al. 2010; Schurz et al. 2010; Szwed et al. 2011).
Higher activation for pseudowords compared with words was limited to the most anterior ROI in the left hemisphere. In addition, as expected from its functional role in reading, this region also showed higher activation for reading material compared with nonreading material. The location of this ROI corresponds to the most anterior part of the visual word form system (Vinckier et al. 2007; Van der Mark et al. 2009; Richlan et al. 2010), whose exact function is still the subject of considerable debate (e.g., Dehaene and Cohen 2011; Price and Devlin 2011). The most anterior part of this system has been proposed to be involved in the processing of morphemes and short whole words (Kronbichler et al. 2004, 2007; Dehaene et al. 2005; Schurz et al. 2010). It has also been proposed to play a role in lexico-semantic reading (Price 2012), and it is situated just posterior to the basal temporal language area associated with heteromodal semantic processing (Jobard et al. 2003; Binder and Desai 2011).
Higher activation for words and pseudowords compared with line strings was also found in the adjacent, slightly more posterior left ROI, which corresponds to the more classical location of the visual word form area (VWFA) (Cohen et al. 2000; Dehaene et al. 2002; Jobard et al. 2003). In line with the original formulation of the VWFA's functional role, that is, prelexical processing during reading, we found no difference between words and pseudowords but higher activation for these letter strings compared with simple line strings. The high activation level for Hebrew strings in this ROI may have resulted from their greater visual complexity and more letter-like appearance relative to line strings.
The high processing demands of Hebrew strings (as evidenced by the eye tracking findings) may also explain the high activation levels they elicited in the two most posterior left ROIs. The right hemisphere ROIs showed similar activation patterns, with the highest activation levels for line and Hebrew strings throughout the ventral stream. This finding is in line with the notion that the right ventral visual cortex is engaged in the processing of nonlinguistic stimuli (e.g., Kanwisher 2010). In summary, the results of the ROI analysis are largely consistent with findings from the literature, thus providing further evidence for the feasibility of the fixation-related analysis approach.
The present study was not explicitly designed to investigate lexicality effects in the left OTS and probably lacked both a task emphasizing such effects and the statistical power to identify reliable differences at a corrected threshold on the whole-brain level. However, as explained below, the fixation-related analysis approach may be a promising method for clarifying some of the inconsistent findings in the literature regarding lexicality effects in the left OTS. To illustrate, previous studies found higher activation for pseudowords compared with words (e.g., Mechelli et al. 2003; Kronbichler et al. 2004, 2007, 2009; Bruno et al. 2008), no activation differences between words and pseudowords (e.g., Dehaene et al. 2002; Carreiras et al. 2007; Vinckier et al. 2007), and even higher activation for words compared with pseudowords (Fiebach et al. 2002; Diaz and McCarthy 2007).
The present study is the first step in using self-paced fixations during reading as markers for calculating hemodynamic responses of the VWFA. Future fMRI studies using this approach may enable novel, more natural reading situations, thereby avoiding some of the methodological problems of previous studies (e.g., uncontrolled top-down influences on activation, artificial tasks, long presentation times) that are likely to underlie the inconsistent findings regarding the VWFA. As reviewed by Dehaene and Cohen (2011), the type of in-scanner task used in an experiment can strongly influence activation patterns. The same is true for unnaturally long or short exposure times.
A further benefit of combined eye tracking and fMRI recording is that it is no longer necessary to rely on a task (e.g., one-back, target detection) to ensure that participants attend to the stimulus material. Instead, eye movement measures can be used directly to infer participants' attention. This makes silent reading paradigms feasible during scanning. Such paradigms used to be problematic because, without eye tracking, the experimenter had no means of observing the participants' behavior during scanning.
Left Inferior Frontal Regions
At first sight, it is surprising that the present study identified the left IFG in the comparison between reading material (words and pseudowords) and nonreading material (line and Hebrew strings) only after omitting the cluster-level correction. However, although left IFG activation is typically found in fMRI reading studies, it is still unclear how specifically this region is related to reading processes per se. While some authors argue for a specific role in reading, for example, related to grapheme–phoneme conversion (e.g., Jobard et al. 2003) or lexical access (e.g., Fiebach et al. 2002; Heim et al. in press), other findings point to a range of different processes supported by the IFG, for example, language processes related to speech planning and comprehension (e.g., Price 2012), semantics (e.g., Binder and Desai 2011), and executive, affective, and interoceptive processes, including working memory, reasoning, decision-making, inhibition, attention, and emotion (Laird et al. 2011).
The use of an implicit reading task, which did not demand extensive phonological or lexical processing of the words and pseudowords, may have been responsible for the weak left IFG difference between reading material and nonreading material in the present study. Furthermore, the delayed response paradigm temporally decoupled stimulus processing from decision-making, which may have suppressed response-related decision processes during fixation. Likewise, the high rate (i.e., fast succession) of fixations may have left little time for later, higher order top-down processes. It is therefore plausible to assume that the fixation-related fMRI signal is primarily driven by earlier, bottom-up processes. However, this interpretation is highly speculative and certainly requires further investigation.
Methodological Considerations
We replicated the successful use of fixations as markers for calculating hemodynamic brain responses, an approach developed by Marsman et al. (2012), and extended it to the domain of visual letter string processing. At first sight, given the temporal properties of the fMRI signal (imprecise and sluggish), it is surprising that fixation-related analysis of fMRI data works not only for categories with well-established, vastly distinct neural signatures (faces and houses) but also for categories that differ more subtly (visual character strings). Specifically, we showed that initial fixations on items (about 300 ms) were long enough to elicit reliable fMRI responses. Moreover, although fixations occurred at a high rate (about 3 per second), the responses corresponding to these fixations could be temporally separated.
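The core idea of fixation-related analysis, treating each fixation onset as an event and letting the overlapping hemodynamic responses add up, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes a double-gamma HRF resembling SPM's canonical shape (gamma densities with shapes 6 and 16, undershoot scaled by 1/6), models each fixation as an instantaneous event because a fixation of about 300 ms is short relative to the HRF, assumes linear summation of responses, and evaluates the regressors directly at scan times. The function names and the fixation list are hypothetical.

```python
import math

HRF_WINDOW_S = 32.0  # assumed length of the HRF support, in seconds


def hrf(t):
    """Double-gamma HRF (t in s): gamma(6) response minus gamma(16)/6 undershoot."""
    if t <= 0.0:
        return 0.0

    def gamma_pdf(shape):
        return t ** (shape - 1) * math.exp(-t) / math.gamma(shape)

    return gamma_pdf(6) - gamma_pdf(16) / 6.0


def fixation_regressors(fixations, conditions, n_scans, tr):
    """Build one regressor per condition from (onset_s, condition) fixation events.

    Each fixation is an impulse at its onset; responses to rapidly succeeding
    fixations are assumed to add linearly.
    """
    regs = {c: [0.0] * n_scans for c in conditions}
    for onset, cond in fixations:
        for k in range(n_scans):
            lag = k * tr - onset
            if 0.0 < lag < HRF_WINDOW_S:
                regs[cond][k] += hrf(lag)
    return regs


# Hypothetical fixations at roughly 3 per second: (onset in s, condition label).
fixations = [(0.0, "W"), (0.3, "PW"), (0.65, "W"), (1.0, "HS"), (1.3, "LS")]
regs = fixation_regressors(fixations, ("W", "PW", "LS", "HS"), n_scans=20, tr=1.0)
for cond, values in regs.items():
    peak = max(range(len(values)), key=values.__getitem__)
    print(cond, "peak at scan", peak, "value", round(values[peak], 3))
```

Each condition's regressor peaks roughly 5 s after its fixations. Fixations of different categories can follow each other within a fraction of a second, and the GLM still assigns their summed responses to separate columns.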
The first finding, concerning brief exposure times, is not surprising, as studies employing rapid event-related designs often use similar or even shorter presentation times (e.g., Wheatley et al. 2005; Yarkoni et al. 2008). Even extremely brief, masked presentation of letter strings (<50 ms) has been shown to produce reliable fMRI responses (e.g., Dehaene et al. 1998, 2001; Diaz and McCarthy 2007; Nakamura et al. 2007). Therefore, the time spent looking at an item in the present study is clearly sufficient to elicit reliable fMRI responses.
The second issue, temporal separation of rapidly succeeding fixations, deserves closer attention. It has been shown that the detectability of individual responses in a time course of overlapping signals crucially depends on the temporal randomness of the individual events. Specifically, Dale (1999) found that the statistical efficiency of event-related fMRI designs with fixed inter-stimulus intervals drops dramatically as the intervals get shorter, but that this loss can be avoided by temporal jittering (i.e., randomly spaced trials). Normally, temporal jitter is built into the design of rapid event-related fMRI experiments. In fixation-related fMRI experiments, by contrast, temporal jitter cannot be planned but results from the viewing behavior of the participants. However, it is plausible to assume (as confirmed by our results) that the onset times of fixations are sufficiently random to achieve reasonable statistical efficiency.
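Dale's (1999) point can be illustrated numerically. The sketch below compares two hypothetical runs with the same mean rate of about 3 fixations per second: one with perfectly regular onsets and one whose intervals are jittered (a 100 ms minimum plus an exponential component, a distribution assumed purely for illustration). For a model with an intercept and a single regressor x, the variance of the estimated response amplitude is proportional to the inverse of the sum of squared deviations of x from its mean, so that sum serves as a simple efficiency measure. The HRF shape, the TR of 2 s, and the 600 s run length are all assumptions.

```python
import math
import random


def hrf(t):
    """Double-gamma HRF (assumed shape): gamma(6) response minus gamma(16)/6 undershoot."""
    if t <= 0.0:
        return 0.0

    def gamma_pdf(shape):
        return t ** (shape - 1) * math.exp(-t) / math.gamma(shape)

    return gamma_pdf(6) - gamma_pdf(16) / 6.0


def regressor(onsets, n_scans, tr, window=32.0):
    """Sum of impulse responses to all onsets, sampled at each scan time."""
    return [
        sum(hrf(k * tr - o) for o in onsets if 0.0 < k * tr - o < window)
        for k in range(n_scans)
    ]


def efficiency(x):
    """Sum of squared deviations from the mean; Var(beta) is proportional to its inverse."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x)


def regular_onsets(isi, duration):
    """Onsets at a fixed interval (no jitter)."""
    return [i * isi for i in range(int(duration / isi))]


def jittered_onsets(mean_isi, min_isi, duration, seed=1):
    """Intervals = min_isi + exponential component, so the mean interval is mean_isi."""
    rng = random.Random(seed)
    onsets, t = [], 0.0
    while t < duration:
        onsets.append(t)
        t += min_isi + rng.expovariate(1.0 / (mean_isi - min_isi))
    return onsets


TR, RUN_S = 2.0, 600.0
N_SCANS = int(RUN_S / TR)
eff_regular = efficiency(regressor(regular_onsets(1 / 3, RUN_S), N_SCANS, TR))
eff_jittered = efficiency(regressor(jittered_onsets(1 / 3, 0.1, RUN_S), N_SCANS, TR))
print(f"regular: {eff_regular:.1f}  jittered: {eff_jittered:.1f}")
```

With regular onsets every 1/3 s, the summed response settles within a few scans into a near-constant plateau that carries little information about response amplitude. Jittered onsets keep the predicted signal fluctuating, so the same number of events yields a markedly more efficient regressor.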
One difference between the pioneering study by Marsman et al. (2012) and ours is that Marsman et al. based their analysis strategy exclusively on ROIs, whereas we searched the whole brain for reliable activation differences between stimulus categories. Although the ROI-based approach facilitates statistical testing by reducing the multiple comparisons problem, it may miss potentially interesting activation in regions that were not analyzed. In contrast, an unconstrained whole-brain search strategy can discover activation differences anywhere in the brain; it is not limited by a priori assumptions and may lead to novel, unexpected findings. On the downside, a large number of statistical tests must be computed (about 45 000 in our case), and the statistical thresholds must be adjusted for the number of these tests. Therefore, high statistical power is necessary to detect reliable effects. However, as our results show, the effects of interest (reading material vs. nonreading material) were robust enough to survive thresholds corrected for multiple comparisons. This is particularly encouraging given that, before the present study, the feasibility of the fixation-related fMRI approach for reading research was unclear. We expect that future experiments can be optimized to detect even more subtle effects (e.g., lexicality effects, word frequency effects, semantic effects) using corrected tests on the whole-brain level.
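To give a sense of scale for the roughly 45 000 tests mentioned above, the sketch below computes the per-test thresholds implied by the two simplest family-wise corrections, Bonferroni and Šidák. These are not the corrections used in this study, which reported a cluster-level correction. Because neighboring voxels are spatially correlated, both are also more conservative than methods that account for image smoothness. The numbers only illustrate how steeply the threshold rises with the number of tests.

```python
from statistics import NormalDist


def bonferroni_threshold(alpha, n_tests):
    """Per-test p threshold controlling family-wise error at alpha (Bonferroni)."""
    return alpha / n_tests


def sidak_threshold(alpha, n_tests):
    """Per-test p threshold under the Sidak correction (assumes independent tests)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_tests)


def one_sided_z(p):
    """z value whose upper-tail probability under N(0, 1) equals p."""
    return NormalDist().inv_cdf(1.0 - p)


N_TESTS = 45_000  # approximate number of voxel-wise tests stated in the text
ALPHA = 0.05
for name, p in (("uncorrected", ALPHA),
                ("Bonferroni", bonferroni_threshold(ALPHA, N_TESTS)),
                ("Sidak", sidak_threshold(ALPHA, N_TESTS))):
    print(f"{name:12s} p < {p:.3g}  (one-sided z > {one_sided_z(p):.2f})")
```

For 45 000 tests, the Bonferroni threshold is 0.05 / 45 000 ≈ 1.1 × 10⁻⁶, which corresponds to a one-sided z of about 4.73, compared with about 1.64 without correction.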
The next step would be to apply the fixation-related fMRI approach to more natural reading situations. For such an enterprise, one would abandon the circular array of stimuli and instead present lists of words or even whole sentences on a single line during a silent reading task. This would lead to smaller horizontal distances between items (hence shorter saccades) and, owing to parafoveal preprocessing, to more fluent processing, expressed in shorter fixation durations. A crucial feature of the present approach, which allowed the separation of overlapping fixation-related fMRI responses, was the temporal jitter of fixations resulting from the participants' eye movement behavior. It is unclear whether the onset times of fixations in a sentence reading task would be sufficiently random to achieve reasonable statistical efficiency. Furthermore, because a line array allows parafoveal preprocessing of upcoming, not yet fixated words, it remains an open question whether the fixation-related fMRI approach can be realized given the complex interplay of foveal and parafoveal processes and the faster timing of natural reading. As mentioned in the Introduction, a further challenge for studies attempting to employ ecologically valid self-paced reading paradigms is a certain loss of experimental control. Specifically, eye movement behavior (e.g., number of fixations, refixations, regressions, word skips) would vary more across trials and participants than in tightly controlled, rather artificial experimental situations. It is up to future studies to address these challenges and to refine the fixation-related fMRI approach where necessary, so as to allow the investigation of specific visual word recognition processes during self-paced natural sentence reading.
Conclusion
The present study showed the feasibility of fixation-related fMRI analysis in the domain of reading research. Using self-paced eye movements as markers for hemodynamic brain responses, we found reliable reading-related activation in important left hemisphere reading regions. Specifically, statistical power was sufficient to identify reliable differences between reading material and nonreading material on the whole-brain level using thresholds corrected for multiple comparisons. Additional ROI analyses showed that the approach is sensitive enough to detect even more subtle (i.e., lexicality) effects in the left occipito-temporal cortex. We provided the proof of concept and analysis framework for future combined eye tracking and fMRI studies of reading using the fixation-related analysis approach. This approach may enable the investigation of specific visual word recognition processes during self-paced natural sentence reading (e.g., parafoveal preprocessing) that were previously inaccessible with fMRI.
Funding
This work was supported by the Austrian Science Fund (FWF P 25799-B23). Funding to pay the Open Access publication charges for this article was provided by the University of Salzburg.
Notes
We would like to thank Lisa Mayrhauser and Sarah Schuster for help with data acquisition. Conflict of Interest: None declared.