Abstract

The ability to represent the emerging regularity of sensory information from the external environment has been thought to allow one to probabilistically infer future sensory occurrences and thus optimize behavior. However, the underlying neural implementation of this process is still not comprehensively understood. Through a convergence of behavioral and neurophysiological evidence, we establish that the probabilistic inference of future events is critically linked to people's ability to maintain the recent past in working memory. Magnetoencephalography recordings demonstrated that when visual stimuli occurring over an extended time series had a greater statistical regularity, individuals with higher working-memory capacity (WMC) displayed enhanced slow-wave neural oscillations in the θ frequency band (4–8 Hz) prior to, but not during, stimulus appearance. This prestimulus neural activity was specifically linked to contexts where information could be anticipated and influenced the preferential sensory processing for this visual information after its appearance. A separate behavioral study demonstrated that this process intrinsically emerges during continuous perception and underpins a realistic advantage for efficient behavioral responses. In this way, WMC optimizes the anticipation of higher level semantic concepts expected to occur in the near future.

Introduction

The human brain is particularly sensitive to the ongoing complexity of sensory information from the external environment (Pouget et al. 2013; Karuza et al. 2014). It has been thought that the ability to represent the statistical relationships of this information over time allows one to anticipate or predict future sensory inputs (e.g., Harrison et al. 2011; Bornstein and Daw 2012; Pouget et al. 2013), a process known as probabilistic inference. Previous behavioral research has established that individuals are exceptionally sensitive to stochastic regularities (Lewicki et al. 1992; Smithson 1997; Stephen and Dixon 2011; Emberson et al. 2015) that by their very nature reduce the uncertainty about future events, even if they occur nondeterministically within the environment. Recent neurobiological research has also identified neural systems whose activity tracks the stochastic regularity of incoming sensory information, and that may implement this process (Turk-Browne et al. 2009; Tobia, Iacovella, Davis, et al. 2012; Tobia, Iacovella, and Hasson 2012; Karuza et al. 2014; Nastase et al. 2014; Emberson et al. 2015).

While the aforementioned work shows that there exists a functional capacity to recognize and capitalize on such forms of regularity, the underlying neural implementation of this predictive process as naturally induced by the statistical properties of the input is still largely unknown. Thus far, the canonical finding is that stimuli that are less expected to occur within a sequence of events evoke stronger neural responses after stimulus presentation, for example, a within-sequence “prediction error” response to deviants (e.g., Huettel et al. 2005; Strange et al. 2005; Bubic et al. 2009, 2011; Vossel et al. 2009; Karuza et al. 2014). This is thought to be based on the presence of patterns in the past triggering a match–mismatch evaluation after target presentation (Kumaran and Maguire 2009). Furthermore, statistical regularities between recently encountered stimuli can be maintained either via a simple consolidation-related process that chunks together frequently co-occurring stimulus categories (Perruchet and Pacton 2006) or via statistical learning mechanisms (Bornstein and Daw 2012) to create a “statistical model” of previous information.

Our working assumption in the current study is that the regularities reflected within this statistical model can promote anticipatory or predictive neural activity that then further affects how stimuli are processed once they occur. It has been previously demonstrated that experiencing a deterministic regularity of ongoing sensory information can spontaneously promote a heightened processing of the predicted stimuli (Turk-Browne et al. 2010; Reddy et al. 2015). For instance, a study (Turk-Browne et al. 2010) in which participants viewed a continuous stream of repeated images that contained sequential contingencies (i.e., some images were deterministically predictive of the next image) showed that participants spontaneously coded these relations, as seen in greater hippocampal activity for predictive images. A similar study using depth electrode recordings from the human temporal lobe also demonstrated anticipatory neural firing prior to stimulus presentation for learned stimulus associations within a deterministic series of images (Reddy et al. 2015). The relation between prestimulus neural state and poststimulus processing has been repeatedly demonstrated in recent MEG, EEG, and fMRI work (van Dijk et al. 2008; Mathewson et al. 2009; de Lange et al. 2013; Stokes et al. 2014). It has often been shown that differences in endogenous slow-wave neural oscillations prior to stimulus presentation can influence the ability to detect near-threshold visual targets once they appear (van Dijk et al. 2008; Mathewson et al. 2009), and the expectation of a given stimulus category (via a deterministically informative cue) has been shown to differentially engage such prestimulus activity compared with when the category type could not be anticipated (Bollinger et al. 2010). Similarly, cues that are informative of a behaviorally relevant target have also been shown to invoke specific patterns of post-cue activity (de Lange et al. 2013; Stokes et al. 2014).
Because regular contexts are very likely to be associated with unique patterns of prestimulus activity compared with random ones, comparing anticipatory activity in random series and in stochastically regular series provides an opportunity for identifying the predictive mechanisms potentiated by the underlying maintenance of statistical models. Furthermore, if this anticipatory activity is related to the probabilistic inference about future events, then there should be a strong relation between a person's degree of prestimulus neural activity and the extent to which they show differences in poststimulus activity for expected versus surprising stimuli.

A major goal of our current work was also to address the fundamental issue of whether probabilities are sufficient for triggering anticipatory activity or whether this depends on one's working-memory capacity (WMC), which serves to transiently maintain this information over brief periods of time. Recent behavioral work has identified the important explanatory role of interindividual differences in the ability to detect regularities within the external environment (Misyak et al. 2010; Frost et al. 2015; Siegelman and Frost 2015); however, neurobiological work has yet to examine this potential role. While initial investigations of the brain's sensitivity to statistical structure of sensory inputs (Strange et al. 2005; Harrison et al. 2006; Bestmann et al. 2008; Mars et al. 2008) assumed that observers equally weigh all previous events (an “ideal Bayesian observer” model; Kersten and Yuille 2003; Geisler 2011), it is well established that both the ability to retain observations from the recent past (Luck and Vogel 2013; Ma et al. 2014) and the fidelity of these representations (Ma et al. 2014) are capacity limited and subject to decay. Indeed, some work (e.g., Harrison et al. 2011; Bornstein and Daw 2012) has shown that neural and behavioral responses to inputs are better modeled by formalisms in which recent observations are more strongly weighted than distant ones, for example by implementing a decay function as a proxy for forgetting (Harrison et al. 2011). Given the potential capacity limit for representing recent events, we hypothesized that individuals with a higher WMC, allowing the maintenance of more information from the recent past, will also show stronger anticipatory signatures in regular contexts than in random ones.

To investigate whether anticipatory activity reflects an interaction between nondeterministic statistical regularities in the environment and an individual's WMC, we used magnetoencephalography (MEG) to study neural activity when individuals observed series of unique visual images. These series were identical in all aspects apart from the extent to which their statistical structure (as quantified by a first-order Markov process) licensed predictions about future events. Using this approach, we establish that 1) statistically regular contexts evoke regularity-related prestimulus anticipatory signatures in the θ (4–8 Hz) and α (8–12 Hz) frequency bands that are not found for nonpredictable contexts; 2) these signatures are linked to differences in poststimulus processing of predictable versus unpredictable visual stimuli; and 3) both pre- and poststimulus regularity-related processing were also strongly correlated with an individual's WMC, thus suggesting a strong relation between WMC and predictive capacity.

Materials and Methods

Participants

Twenty participants took part in the MEG study. Data from 2 participants were excluded due to technical errors during data collection, resulting in 18 participants with usable data (age range = 19–35 years, M = 26.35, SD = 3.94; 13 female). All participants reported normal or corrected-to-normal vision and were not using any medications known to affect cognitive functioning. Participants provided written informed consent, and the study was approved by the University of Trento's Ethical Review Board for human-based research. All participants were compensated at a rate of 12€ per hour for their time (2 h).

Change Detection Task (Visual WMC)

Prior to both Experiment 1 (MEG) and Experiment 2 (behavioral study), participants first performed a change detection task (Vogel et al. 2005; Fukuda and Vogel 2009) used to assess an individual's visual WMC (Cowan 2001). The test–retest reliability of this task is well documented (Kyllingsbaek and Bundesen 2009; Luck and Vogel 2013; Ma et al. 2014), and we confirmed high split-half reliability of this measure within our own data in both experiments (see Supplementary Results). This task consisted of stimulus arrays of 2, 4, 6, or 8 colored squares presented briefly (250 ms) on the computer screen. Participants were instructed to remember the stimulus array over a retention interval of 1000 ms while fixating on a gray crosshair. A single colored square was then presented in the same location as one of the previous items from the stimulus array, and participants indicated via a button press whether the color of the square was the same as or different from the original item in that location. On half of the trials, the color of the square was the same as the original item in that location, and on the other half of the trials the colored square did not match the original item. Individual accuracy on each array size was transformed into a K estimate using a standardized formula (Cowan 2001) considered to be a robust measurement of an individual's visual WMC. In this formula, K = S × (H − F), where K is the memory capacity, S is the size of the stimulus array, H is the observed hit rate, and F is the false alarm rate (Cowan 2001; Awh et al. 2007; Fukuda and Vogel 2009; Ma et al. 2014). Using this formula, we calculated the mean K for each array size (S) for each individual, and these were averaged, resulting in a single visual WMC measure for each participant (following Cowan 2001). WMC was the only covariate obtained for the participants in this study, given our a priori hypothesis on the relation between WMC and statistically invoked predictive mechanisms.
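The K computation described above can be illustrated with a minimal sketch. The function names and the hit and false alarm rates below are hypothetical and serve only to show the arithmetic of K = S × (H − F) and the averaging across array sizes; they are not data from this study.

```python
def cowan_k(set_size, hit_rate, false_alarm_rate):
    """Cowan's K for one array size: K = S * (H - F)."""
    return set_size * (hit_rate - false_alarm_rate)


def mean_k(rates_by_set_size):
    """Average K across array sizes.

    rates_by_set_size maps array size S -> (hit rate H, false alarm rate F).
    """
    ks = [cowan_k(s, h, f) for s, (h, f) in rates_by_set_size.items()]
    return sum(ks) / len(ks)


# Hypothetical rates for one participant (illustrative values only).
example = {2: (0.95, 0.05), 4: (0.85, 0.10), 6: (0.70, 0.20), 8: (0.60, 0.25)}
print(round(mean_k(example), 2))
```

With these illustrative rates, the per-array estimates are 1.8, 3.0, 3.0, and 2.8, which average to a single WMC score for the participant.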

Stimuli

Stimuli consisted of 2825 unique gray-scale photographs from 4 distinct visual categories (animals, human faces, houses, and tools). All photographs were normalized to a mean gray value of 127 and a SD of 75, set at 300 × 300 pixels, matched for luminance, contrast and spatial variance using the SHINE package (Willenbockel et al. 2010), and presented upon a gray background (127 value). Furthermore, a subset of 150 composite images were created by randomly selecting 1 picture from each visual category and averaging them so that all aforementioned stimulus features were preserved, yet any distinct visual characteristics of the individual category features were indiscernible. In this way, the composite images maintained the essential low-level features of the discernible images, but did not contain any perceptual/semantic information. We presented these composite images during the interstimulus interval (rather than a simple monochrome screen or crosshair fixation) to minimize any effects of major changes in contrast, luminance, and spatial frequency between displays.

Design and Manipulation of Statistical Structure

The regular and “random” series were generated via 2 separate Markov processes that differed in their level of statistical uncertainty (i.e., entropy; see Fig. 1). Markov entropy (ME) can be used to quantify the regularity of an input stream by calculating the mean level of transition constraints between each item within a series of information (i.e., given the current input “1,” what is the probability that the following input would be “2,” based on all previous occurrences of the input “1”) and is derived by the following formula:

$$ME = -\sum_{i=1}^{n} P(i) \sum_{j=1}^{n} P(j \mid i)\, \log_2 P(j \mid i)$$

in which P(i) is the probability of an event i, P(j|i) is the probability of event j given the current event i, and n indicates the number of different event types. For instance, the transition constraints of a continuous series of 2 repeating numbers that are ordered deterministically (i.e., “1,2,1,2,1,2…”) will have a first-order ME of 0, while a completely random series of 2 numbers will have a first-order ME of 1.0 (stationary distribution of 1 bit). Likewise, given a completely random series of stimuli consisting of 4 items, the stationary distribution would be 2 bits and the ME would equal 2.0.
Figure 1.

Manipulation of statistical structure, task design, and procedure. Markov entropy (ME) is a measure that quantifies the regularity of a continuous sequential input. This task implemented 2 types of transition matrices where the relational constraints between 4 items within a series followed 2 distinct levels of ME: (A) An example of an Ordered series (low entropy) where given “1” there is a 75% probability that the following input would be “2” and a 25% probability that the following input would be “4.” (B) High-entropy (“Random”) series where no transition constraints exist apart from the absence of repetitions. After these “4 Category” series were created via a string of numbers (1, 2, 3, 4), each number was assigned a stimulus category (i.e., Animals = 1, Human faces = 2, Houses = 3, and Tools = 4). These assignments were changed for each series to ensure that statistical associations would be relearned in each series (see 2nd matrices with “A” “F” “H” “T” assignments). (C) A short example of an Ordered series (48 stimuli per series in total). Participants responded with a button press if an image appeared upside down (catch trials). The same picture was never presented twice during the experiment, so that only the abstract, semantically related statistical associations between categories could be learned.

Using this approach, we constructed 2 types of transition matrices specifying the transition constraints between the 4 categories described above. These transition profiles allowed for 2 levels of ME between the 4 categories based on the transition matrices shown in Figure 1A,B. In the high-entropy (“Random”) condition, there were no transition constraints except that the same stimulus category could not be immediately repeated (i.e., given the current input “1,” all subsequent possibilities were equally probable [33.3%], and therefore the likelihood of any particular category occurring after the previous one was low), and so the ME of the stationary distribution was 1.57 bit. The low-entropy (more ordered) condition was one where each category could transition to only 2 of the other categories in the series, with a 75% probability for one transition and 25% for the other. Consequently, the ME of this matrix was 0.81 (Fig. 1A). To illustrate, in the low-entropy series, given the current input “1,” there was a 75% probability that the following input would be “2” (predictable) and a 25% probability that the following input would be “4” (surprising). This low-entropy series contains statistical regularities (i.e., transition constraints) and therefore offers a basis for learning these associations of occurrence between visual categories, whereas the high-entropy series does not contain any statistical regularities and thus does not allow for any associative learning. We note that even the more regular series are essentially nondeterministic in that at no point is there certainty regarding the next category that could appear (that is, no cell in the transition matrix is 100% diagnostic about the next category that would appear). In addition, all constraints were between adjacent stimuli, and we did not manipulate the strength of nonadjacent constraints.
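As a rough check on these entropy values, the first-order ME can be computed directly from a transition matrix and its stationary distribution. The sketch below makes two assumptions: only the first row of the Ordered matrix (1→2 at 75%, 1→4 at 25%) is taken from the text, and the remaining rows are a hypothetical completion chosen so that all marginal probabilities stay at 25%. The Random matrix is taken to be exactly uniform over the 3 non-repeating categories.

```python
import math


def markov_entropy(stationary, transition):
    """First-order Markov entropy in bits:
    ME = -sum_i P(i) * sum_j P(j|i) * log2 P(j|i)."""
    total = 0.0
    for p_i, row in zip(stationary, transition):
        for p_ji in row:
            if p_ji > 0:  # 0 * log2(0) is taken as 0
                total -= p_i * p_ji * math.log2(p_ji)
    return total


UNIFORM = [0.25] * 4  # marginal probability of each category

# Assumed Ordered matrix: row 1 follows the text; rows 2-4 are hypothetical.
ORDERED = [
    [0.00, 0.75, 0.00, 0.25],
    [0.25, 0.00, 0.75, 0.00],
    [0.00, 0.25, 0.00, 0.75],
    [0.75, 0.00, 0.25, 0.00],
]

# Random matrix: no repetitions, all other transitions equally likely.
RANDOM = [[0.0 if i == j else 1 / 3 for j in range(4)] for i in range(4)]

print(round(markov_entropy(UNIFORM, ORDERED), 2))
print(round(markov_entropy(UNIFORM, RANDOM), 2))
```

Under these assumptions the Ordered matrix yields approximately 0.81 bit, matching the value above, and an exactly uniform no-repetition matrix yields log2(3), approximately 1.58 bit, close to the 1.57 bit reported for the Random condition.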

From these 2 transition matrices (high and low entropy), we generated series of 48 items in length. For each individual series, one of 4 visual categories was assigned to a number within the generated series (i.e., Animals = 1, Human faces = 2, Houses = 3, and Tools = 4) resulting in visual series with distinct categorical levels of ME (see example of low ME series Fig. 1C). We refer to these series assignments with low ME as the Ordered condition and to series with high ME as the Random condition. Importantly, our design assured that in both these conditions, the marginal probabilities of each category were identical in all cases and set to 25%. Thus, only transition probabilities differed between Ordered and Random series.
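Generating a 48-item series from a transition matrix can be sketched as sampling from a first-order Markov chain. This is an illustration only: the function name, the uniform choice of the first item, and rows 2 to 4 of the Ordered matrix are assumptions, and simple sampling of this kind yields 25% marginals only in expectation rather than exactly within each series.

```python
import random

# Row 1 follows the text; rows 2-4 are a hypothetical completion.
ORDERED = [
    [0.00, 0.75, 0.00, 0.25],
    [0.25, 0.00, 0.75, 0.00],
    [0.00, 0.25, 0.00, 0.75],
    [0.75, 0.00, 0.25, 0.00],
]
# No repetitions; the other 3 categories are equally likely.
RANDOM = [[0.0 if i == j else 1 / 3 for j in range(4)] for i in range(4)]


def generate_series(transition, length=48, rng=None):
    """Sample a sequence of category labels (1-4) from a first-order Markov chain."""
    rng = rng or random.Random()
    n = len(transition)
    state = rng.randrange(n)  # assumed: first item drawn uniformly
    series = [state + 1]
    while len(series) < length:
        # Draw the next category according to the current row of the matrix.
        state = rng.choices(range(n), weights=transition[state])[0]
        series.append(state + 1)
    return series


ordered_series = generate_series(ORDERED, rng=random.Random(1))
random_series = generate_series(RANDOM, rng=random.Random(2))
print(ordered_series[:12])
```

Assigning a visual category to each numeric label then turns such a sequence into a stimulus series.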

For purposes of the current design, it was important that these 4-category transition constraints be relearned within each series in the Ordered condition so as to rule out any longer term transfer of statistical learning from one series to the next. For this reason, the assignment of categories to the generated numeric labels (1, 2, 3, 4) was permuted across series until every possible number/category combination within these constraints was achieved for each entropy condition (16 series per entropy condition). We did this to make sure participants would be continuously engaged in a learning process within the regular series. Had we maintained the exact same transition mappings for all the regular series in the experiment, participants could then rely on a simple recognition strategy of a single pattern across the entire experiment (this could be reduced to a relatively simple associative recognition process; Davachi and Wagner 2002; Bergmann et al. 2012) instead of learning the new sequential relationship between categories within each series.

Experimental Procedure (Experiment 1; MEG Study)

The MEG experiment consisted of 10 recording blocks. Each block consisted of 2 Random and 2 Ordered series conditions (48 stimuli per series), and the presentation of conditions was randomized within each testing block (16 series per condition). Each series was presented as follows (see Fig. 1C): Prior to the start of each series, participants pressed a button to indicate that they were prepared to view the series of pictures. A black crosshair was then presented centrally on a gray screen for 3000 ms, followed by a red crosshair (1000 ms) indicating the start of a series. Participants then continuously viewed 48 novel pictures per series (stimulus presentation of 500 ms and an interstimulus interval [ISI] of 1000 ms). No picture was presented twice during the experiment. This therefore only allowed for the construction and evaluation of relatively abstract semantically related statistical associations between categories and ruled out any lower level associative memory effects that could hold between specific stimuli. During the ISI, a single composite image was presented for 1000 ms throughout an entire series and was changed prior to each series presentation by random assignment (see Fig. 1C).

To ensure alertness during the study and reduce motor-related MEG artifacts, we implemented catch trials on 5% of trials where participants were instructed to press a button when an image was presented upside down (Fig. 1C). MEG recordings during catch-trials (and false alarms) were removed prior to analysis.

At the conclusion of each series, a new composite image was presented centrally on the screen and participants were then instructed, “Indicate if you find this image ‘Pleasant’ (right button) or ‘Un-Pleasant’ (left button).” We included this judgment to briefly engage participants in an alternate task involving a subjective discrimination to disrupt any short-term retention of prior associative learning from one series to the next (in lieu of simple arithmetic or alphabetizing tasks often used for this purpose, which have an inherent ordering component involved in the completion of these tasks).

Trial Selection (Experiments 1 and 2)

Trials within a series were identified for analysis based on their transition probability status (Fig. 1A,B). The Random series contained only random trials, with a transition probability of 33%. The Ordered series contained both “surprising” trials, which were those with a 25% transition probability, and associatively “predictable” trials, which were those with a 75% transition probability (Ord_25, Ord_75 henceforth; see Fig. 2A). Prior to analysis in both experiments, we excluded the first 8 trials from each series of 48 trials to avoid any initial processes related to beginning a new series (and potential unlearning of the prior one) that could add noise to our experimental data. We isolated all Ord_25 trials, and to equate the number of Ord_75 trials (which were by definition more frequent than the surprising ones), we selected only predictable trials within an Ordered series that were preceded by 4 prior Ord_75 trials. Therefore, by the ≥5th presentation of an Ord_75 stimulus, participants had previously viewed at least one full cycle of the Markov transition matrix with no interfering surprising trial. This was done to minimize the recent occurrence of prediction errors in the analysis of Ord_75 trials, recovery from which could introduce noise into the trials we were interested in, thereby focusing our analyses on trials where participants would be in a state of an unimpeded process (i.e., “local streak”) of successful prediction. Furthermore, to pseudo-match the number of trials extracted from the Random series to those extracted for Ord_25 and Ord_75 trials, we only extracted every 4th trial from the Random series (Rand trials; Fig. 2B). It was important to equate the number of trials extracted for each condition, as selecting unequal numbers could bias the signal-to-noise of specific conditions. For instance, had we selected all possible trials, there would be a much larger number of Rand and Ord_75 trials extracted per participant compared with Ord_25 trials. All else being equal, this would make the contrast between the former 2 conditions more sensitive than contrasts against the Ord_25 condition. The issue of unequal trials is of particular concern for correlational analyses where a behavioral measure is correlated against the mean condition estimate calculated per participant; here, including different numbers of trials per condition could bias the precision across conditions and confound interpretation of the correlation values.
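The selection rules above can be sketched as follows. The function names are hypothetical, and two details are assumptions not stated in the text: the 4 preceding Ord_75 trials are allowed to fall within the excluded first 8 trials, and the "every 4th trial" rule for Random series starts counting at the first retained trial.

```python
def label_trials(series, transition):
    """Transition probability of each trial given its predecessor (None for the first)."""
    labels = [None]
    for prev, cur in zip(series, series[1:]):
        labels.append(transition[prev - 1][cur - 1])
    return labels


def select_ordered(labels, skip=8, streak=4):
    """Indices of all Ord_25 trials and of Ord_75 trials preceded by `streak` Ord_75 trials."""
    ord_25, ord_75 = [], []
    run = 0  # consecutive 75% trials immediately before the current trial
    for idx, p in enumerate(labels):
        if idx >= skip:  # drop the first `skip` trials of the series
            if p == 0.25:
                ord_25.append(idx)
            elif p == 0.75 and run >= streak:
                ord_75.append(idx)
        run = run + 1 if p == 0.75 else 0
    return ord_25, ord_75


def select_random(n_trials=48, skip=8, step=4):
    """Indices of every `step`-th trial of a Random series after the excluded trials."""
    return list(range(skip, n_trials, step))


# Illustrative labels: ten predictable trials, one surprising trial, six predictable trials.
labels = [None] + [0.75] * 10 + [0.25] + [0.75] * 6
print(select_ordered(labels))
print(select_random())
```

In the illustrative sequence, the surprising trial resets the streak, so the next 4 predictable trials are skipped before Ord_75 selection resumes.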

Figure 2.

Trial selection procedure and event-related neural responses to stimuli. To isolate trials associated with different transition probabilities, stimuli within a series were extracted for analysis based on the pre-established Markov transition matrices. (A) “Ord_75” are associatively “predictable” trials, which were those with a 75% transition probability within an ordered series (red). Trials were selected for subsequent analysis only if they were preceded by 4 prior ordered stimuli. All associatively “surprising” trials with a 25% transition probability within an ordered series (“Ord_25” in blue) were selected for analysis. (B) To pseudo-match the number of trials used for analysis in the ordered series, only every 4th trial from the Random series was extracted (“Rand” in green). The neural responses related to the presentation of these different trial types (C) were analyzed during a 200–400 ms poststimulus presentation window to examine the commonly reported novelty and “oddball” components (*P < 0.05; fT, femtotesla). Within this window, participants displayed a heightened sensitivity to surprising (Ord_25) compared with predictable trials (Ord_75) on magnetometer sensors.

For both experiments, we report all of the results in this manner (Ord_25, Ord_75, and Rand), where Ord_25 and Ord_75 reflect associatively “surprising” or associatively “predictable” trials within the Ordered series, respectively (Fig. 2A,B). Using this approach of pseudo-matching our trial selection, Rand trials formed 25% of all the trials from the Random series, Ord_25 trials formed 25% of all the trials from the Ordered series, and Ord_75 trials formed 24% of all the trials from the Ordered series (by trial selection of the ≥5th presentation, this probability is 0.75^5 ≈ 24%). Note that there were no statistical differences in the mean number of trials included in the MEG analysis after artifact rejection (all Ps > 0.11), and also, the mean number of trials for each participant did not significantly correlate with individual K-scores (all Ps > 0.10), demonstrating that the number of trials included did not influence the MEG results.

MEG Recording and Preprocessing

MEG data were recorded in an electromagnetically shielded room (Vacuumschmelze, Hanau, Germany) using a 306-channel MEG (Vectorview, Elekta-Neuromag Oy, Helsinki, Finland) comprising 204 orthogonal planar gradiometers and 102 magnetometers combined in 102 locations above the participant's head. Prior to the MEG recording session, cardinal points at the nasion and left and right preauricular points were digitized using a Polhemus FASTRAK 3D digitizer. During recording blocks, the position of the participant's head was quasi-continuously measured using 5 head position indicator coils. The MEG acquisition threshold for head movements was <2 mm between recording blocks. Data were recorded at a 1000 Hz sampling rate and 0.01 Hz high pass filtering.

All MEG data were preprocessed and statistically analyzed using the Fieldtrip toolbox (Oostenveld et al. 2011). A discrete Fourier transform filter was applied to remove line noise (default values of 50, 100, 150 Hz), and data were epoched from −1.5 to 1.5 s relative to stimulus onset and down-sampled to 250 Hz. All data were visually inspected to remove noisy trials and channels prior to an independent component analysis (ICA; Bell and Sejnowski 1995). Components capturing ocular and cardiac artifacts were removed and the raw data reconstructed. After ICA, missing channels were interpolated using a nearest-neighbor approach.

Event-Related Field Preprocessing

Epochs were bandpass filtered between 1 and 35 Hz and then averaged from −200 to 600 ms relative to stimulus onset per trial. The 200 ms prior to stimulus presentation was used as a prestimulus epoch for baseline correction. Statistical comparisons between trial types were conducted separately for magnetometers (102 sensors) and gradiometers (102 combined sensors).

Prestimulus Time–Frequency Preprocessing

For spectral analysis, epochs were high pass filtered (1 Hz), and no baseline normalization was applied because we had an a priori hypothesis concerning ongoing prestimulus differences between conditions, which were the focus of this current investigation. Trials were selected with a restriction allowing only for trials where individuals had experienced a succession of 4 standard (Ord_75) stimuli in Ordered series and a matched number of Rand trials from the Random series. The time–frequency distributions of prestimulus activity types were compared separately for the magnetometer and gradiometer sensors. Condition-related differences in oscillatory power were estimated using a multitaper FFT time–frequency transformation with frequency-dependent Hanning tapers (time window: Δt = 4/f, sliding in 50 ms steps). We calculated power from 2 to 30 Hz in steps of 2 Hz, separately for each series type (Ord_75 and Rand). The type-I error rate for the complete set of sensors (analysis was conducted separately for magnetometers [102 sensors] and gradiometers [102 combined sensors]) was controlled for multiple comparisons using cluster-extent family-wise error control (Maris and Oostenveld 2007) (P < 0.05 on the cluster level) implemented in the Fieldtrip software. While we were primarily interested in the 1 s prestimulus ISI time window, we also included the 0.5 s window in which the stimulus was on the screen to determine whether any differences in oscillatory activity were specific to the timing of stimulus onset and offset or were due to more continuous tonic differences between trial types that were unrelated to stimulus timing.
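The frequency-dependent window Δt = 4/f means that each frequency is estimated from 4 cycles of data, so lower frequencies use longer windows (2 s at 2 Hz, about 0.13 s at 30 Hz). The following is a minimal standard-library sketch of a Hanning-tapered power estimate at a single time point; it is not the Fieldtrip implementation, and the function name, the normalization, and the test signal are assumptions made for illustration.

```python
import cmath
import math


def hann_power(signal, sfreq, freq, center, cycles=4):
    """Power at `freq` (Hz) in a Hanning-tapered window of cycles/freq seconds
    centred at `center` (s from the start of the epoch)."""
    n = int(round(cycles / freq * sfreq))  # window length in samples
    start = int(round(center * sfreq)) - n // 2
    if start < 0 or start + n > len(signal):
        raise ValueError("window exceeds the epoch")
    coef = 0j
    for k in range(n):
        taper = 0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1))  # Hanning taper
        coef += signal[start + k] * taper * cmath.exp(-2j * math.pi * freq * k / sfreq)
    return abs(coef / n) ** 2


SFREQ = 250  # Hz, the down-sampled rate reported in the preprocessing section
# Synthetic 3 s epoch containing a 6 Hz (theta-band) oscillation.
epoch = [math.sin(2 * math.pi * 6 * t / SFREQ) for t in range(3 * SFREQ)]

freqs = range(2, 31, 2)  # 2 to 30 Hz in steps of 2 Hz
power = {f: hann_power(epoch, SFREQ, f, center=1.5) for f in freqs}
peak = max(power, key=power.get)
print(peak)
```

Sliding `center` in 0.05 s steps over the epoch would give the full time–frequency representation for one sensor and trial.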

MEG Event-Related Fields Analysis

To investigate differences in the event-related neural response to trial types within Ordered series (Ord_75, Ord_25 trials), we calculated differences in event-related fields (ERFs) between trial types using an a priori time window of interest from 200 to 400 ms after stimulus onset based on well-established event-related components common for novelty and “oddball” detection (Polich and Comerchero 2003; Gonsalves et al. 2005; Cycowicz and Friedman 2007; for review, see Polich 2007). We then conducted paired-sample t-tests (n = 18, 2-tailed, thresholded at P < 0.05, FDR corrected for multiple comparisons) within this averaged time window of interest to evaluate differences between trial types across magnetometer and gradiometer sensors separately. Pairwise differences were only considered significant for clusters of 4 or more neighboring sensors (same threshold as time–frequency analysis below).
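The text states only that the sensor-wise tests were FDR corrected. As a hedged illustration, the sketch below applies the Benjamini–Hochberg step-up procedure, which is one common FDR method and is assumed here rather than confirmed by the source. The function name and the input (a plain list of per-sensor P values) are hypothetical.

```python
def benjamini_hochberg(pvals, q=0.05):
    """Return a list of booleans, True where the null is rejected at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k whose P value is at or below q * k / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= q * rank / m:
            k_max = rank
    # Step-up rule: reject every hypothesis ranked at or below k_max.
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            reject[idx] = True
    return reject
```

Note the step-up behavior: a P value that misses its own rank threshold is still rejected if a larger P value further down the ranking meets its threshold.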

MEG Time–Frequency Analysis

Time–Frequency Differences Between Ordered and Random Trials

The power differences (2–30 Hz) during ISIs in the Ordered and Random series were compared using a cluster-based nonparametric, permutation-based statistic that controls for type-I errors with respect to multiple comparisons (Nichols and Holmes 2002; Maris and Oostenveld 2007). First, Student's t statistics for the Ord_75 versus Rand contrast were calculated. The cluster-finding algorithm identified clusters of neighboring sensors (minimum cluster of ≥4 neighboring sensors) and frequency bins where the t statistics for the contrast exceeded a significance level of P < 0.05. The cluster-level test statistic was a cluster-mass measure defined as the sum of the t statistics of the sensors in a cluster. The significance of this cluster-level test statistic was assessed nonparametrically against a null distribution, obtained by randomly permuting the data between the 2 trial types within every participant. A reference distribution was created from 500 random sets of permutations, and an observed cluster was considered significant when its statistic exceeded the 95th percentile of this randomization null distribution. In summary, this permutation procedure identifies clusters in the data where the contrast at single sensors exceeds the P < 0.05 level, and the number of adjacent sensors showing this effect exceeds that likely to be found by chance.
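The logic of this procedure can be sketched in a few lines of Python. This is a simplified illustration, not the FieldTrip implementation: it clusters over sensors only (not over frequency bins), takes the sensor neighborhood as a hypothetical adjacency mapping, and expects the critical t value to be supplied by the caller (2.110 for 17 degrees of freedom, 2-tailed P < 0.05). For a paired contrast, swapping the 2 condition labels within a participant is equivalent to flipping the sign of that participant's difference, which is what the sketch does.

```python
import math
import random


def paired_t(diffs):
    """One-sample t statistic on per-participant condition differences."""
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n) if var > 0 else 0.0


def find_clusters(tvals, neighbours, t_crit, min_size=4):
    """Return the mass (sum of t) of each same-sign supra-threshold cluster."""
    masses, seen = [], set()
    for seed in range(len(tvals)):
        if seed in seen or abs(tvals[seed]) < t_crit:
            continue
        sign = 1 if tvals[seed] > 0 else -1
        stack, members = [seed], []
        seen.add(seed)
        while stack:
            s = stack.pop()
            members.append(s)
            for nb in neighbours[s]:
                if nb not in seen and sign * tvals[nb] >= t_crit:
                    seen.add(nb)
                    stack.append(nb)
        if len(members) >= min_size:
            masses.append(sum(tvals[m] for m in members))
    return masses


def cluster_permutation_test(diff, neighbours, t_crit, n_perm=500, seed=0):
    """diff[p][s]: condition difference for participant p at sensor s."""
    rng = random.Random(seed)
    n_sens = len(diff[0])

    def max_mass(data):
        tvals = [paired_t([row[s] for row in data]) for s in range(n_sens)]
        masses = find_clusters(tvals, neighbours, t_crit)
        return max((abs(m) for m in masses), default=0.0)

    observed = max_mass(diff)
    null = []
    for _ in range(n_perm):
        # Swap condition labels within each participant (sign flip of the difference).
        flipped = []
        for row in diff:
            sgn = rng.choice((-1, 1))
            flipped.append([sgn * v for v in row])
        null.append(max_mass(flipped))
    p_value = (1 + sum(m >= observed for m in null)) / (n_perm + 1)
    return observed, p_value
```

Taking the maximum cluster mass in each permutation is what provides family-wise control: an observed cluster is compared against the largest cluster that chance alone produces anywhere across the sensor array.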

Prestimulus Time–Frequency Correlation with WMC

The mean power difference between both trial types (Ord_75–Rand) was calculated for each participant and averaged over the entire 1.5 s time window, in the same manner as the group analysis above (ISI: −1.0 to 0 s and Stimulus presentation: 0–0.5 s). Note this covers the entire total time period of a series (split trial-by-trial) and allows investigating the impact of the experimental design both on the prestimulus period as well as for the period during which the stimulus was present on the screen. We then conducted a sensor-wise Pearson correlation test to assess whether individual differences in K-score correlated with (Ord_75–Rand) power differences during the ISI, for each of the frequencies and time bins of interest. Using this approach, we imposed no a priori assumptions of sensor location or specific frequency band between 2 and 30 Hz. Control for family-wise error was implemented via the cluster-level correction as described above.
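A minimal sketch of the sensor-wise correlation step follows, assuming each participant contributes one K-score and one (Ord_75 − Rand) power difference per sensor for a given time and frequency bin. The function names and the nested-list data layout are hypothetical. The conversion from r to t is included because a t-valued map is a convenient input for cluster thresholding.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_to_t(r, n):
    """t statistic with n - 2 degrees of freedom for a Pearson r from n pairs."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def sensorwise_correlation(k_scores, power_diff):
    """power_diff[p][s]: (Ord_75 - Rand) power for participant p at sensor s."""
    n_sens = len(power_diff[0])
    return [pearson_r(k_scores, [row[s] for row in power_diff]) for s in range(n_sens)]
```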

Prestimulus Time–Frequency Correlation with Poststimulus ERF

For each participant (n = 18), we identified the sensor showing the largest power difference (within the θ and α bands separately) between the Ord_75 and Rand trial types, averaged only within the prestimulus time window (ISI period: −1.0 to 0 s prior to stimulus onset). Next we determined whether these individual-level prestimulus power differences correlated with individual ERF differences between conditions (Ord_75, Ord_25, and Rand) within 2 a priori time windows of interest based on well-established event-related components (Luck et al. 2000; Polich 2007). To isolate early attentional processing, we selected an early time window from 0 to 200 ms after stimulus presentation (Heinze et al. 1994; Valdes-Sosa et al. 1998; Luck et al. 2000). We also examined a later time window from 200 to 400 ms after stimulus onset, commonly found for novelty and “oddball” detection ERF components (Polich and Comerchero 2003; Gonsalves et al. 2005; Cycowicz and Friedman 2007). We then conducted a sensor-wise permutation-based (described above) Pearson correlation test to evaluate these relations (based on our previous ERF and Time–Frequency results this analysis was only conducted on magnetometer sensors).

It is important to note that we imposed no a priori constraints on sensor location for either the maximal prestimulus power difference calculation or for significant clusters reflecting a correlation with poststimulus onset ERF difference between trial types. This analysis was therefore statistically independent of the prior analyses. Significant clusters reflecting a correlation with ERF differences between trial types were identified by groupings of 4 or more neighboring channels (identical to all prior analyses).

Behavioral Experiment 2

Participants

Twenty healthy participants took part in the study (19–31 y.o.a., mean = 23.95, SD 3.62; Female 12). This number of participants is within the range of previous investigations using the same change detection task as a covariate for a secondary task to assess individual behavioral variance within a population (Luck and Vogel 2013; Ma et al. 2014). All participants reported normal or corrected vision and were not using any medications known to affect cognitive functioning. Participants provided written informed consent, and the study was approved by the University of Trento's Ethical Review Board. All participants were compensated at a rate of 10€ per hour for their time.

Design and Procedure

In this study, participants underwent a slightly modified variant of the MEG design where they were asked to make a “Living/NonLiving” judgment for each stimulus presented (see Supplementary Methods and Fig. S1).

Results

During MEG recordings, we presented participants with continuous series of unique visual stimuli drawn from 4 distinct categories (Animals, Houses, Faces, Tools) where the probabilistic likelihood of category type transitioning to another type was systematically varied (Fig. 1). In the Random condition, there were no transition constraints between categories apart from the fact that a category could not appear twice in a row (Fig. 1B), and the probability of each category (the marginal frequency) was set at 33.3% (Rand trials). In the Ordered condition, the appearance of a particular category could be predicted with a 75% probability (Ord_75 trials) and deviants (Ord_25 trials; Fig. 1A) occurred on 25% of trials.
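To make the 2 series types concrete, the sketch below generates category sequences under a first-order transition rule. The exact transition structure is given in Fig. 1 and is not reproduced here, so the rule used for Ordered series (each category predicts the next one in list order with probability 0.75, with the remaining 0.25 split equally between the other 2 nonrepeating categories) is purely hypothetical, as are the function names.

```python
import random

CATEGORIES = ["Animals", "Houses", "Faces", "Tools"]


def transition_probs(current, ordered):
    """Return {next_category: probability} given the current category."""
    others = [c for c in CATEGORIES if c != current]  # no immediate repeats
    if not ordered:
        return {c: 1.0 / len(others) for c in others}  # 33.3% each
    # Hypothetical rule: each category predicts the next one in list order.
    predicted = CATEGORIES[(CATEGORIES.index(current) + 1) % len(CATEGORIES)]
    deviants = [c for c in others if c != predicted]
    probs = {c: 0.25 / len(deviants) for c in deviants}
    probs[predicted] = 0.75
    return probs


def generate_series(n_trials, ordered, seed=0):
    """Draw a category sequence of length n_trials from the transition rule."""
    rng = random.Random(seed)
    series = [rng.choice(CATEGORIES)]
    while len(series) < n_trials:
        probs = transition_probs(series[-1], ordered)
        names = list(probs)
        series.append(rng.choices(names, weights=[probs[c] for c in names])[0])
    return series
```

Under this rule, a trial in an Ordered series counts as Ord_75 when it matches the predicted category and as Ord_25 otherwise, while every trial in a Random series is a Rand trial.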

WMC and Behavior During MEG Study

As expected, participants' accuracy during the change detection task (used to assess WMC, see Materials and Methods) decreased with the size of the target array to be encoded (Pairwise t-tests, 2-tailed, mean ± SD, 2-items: 95.8 ± 3.64% vs. 4-items: 81.5 ± 10.9%; t(17) = 7.23, P = 0.0001; 4-items vs. 6-items: 73.5 ± 10.3%; t(17) = 3.77, P = 0.002; and 6-items vs. 8-items: 67.2 ± 9.77%; t(17) = 2.79, P = 0.013). When calculating participants' individual visual working-memory capacity (K-score), this resulted in a mean K of 2.70 and median K of 2.35 (±1.02), which is similar to previous reports using this measure (see review, Luck and Vogel 2013). The split-half reliability of this behavioral test (Spearman–Brown corrected) was r = 0.83 (see Supplementary Results). During the MEG study, participants were accurate in identifying catch trials (80.6 ± 8.4%), a level of performance similar to that in previous reports of stimulus detection for inverted pictures (Scapinello and Yamey 1970; Diamond and Carey 1986). Furthermore, there were no differences in participants' performance between conditions (Pairwise t-tests, 2-tailed, mean ± SD, Order: 80.4 ± 11.1% vs. Random: 80.9 ± 9.2%, P = 0.88), consistent with comparable levels of attention for both conditions. Although participants displayed a very low mean false alarm rate of 2.4% (i.e., incorrectly responding to a stimulus within a series that was not upside down), we also calculated a corrected hit-rate (Hits − False Alarms). Pairwise t-tests confirmed that there were no differences in response bias between conditions (2-tailed, mean ± SD, corrected hit-rate Order: 77.8 ± 12.0% vs. Random: 78.7 ± 9.7%, P = 0.78).

Event-Related Fields

We computed the orthogonal pairwise contrasts between ERFs in the 3 trial types (Ord_75, Ord_25, and Rand) during a 200–400 ms poststimulus onset window to target the commonly reported novelty and “oddball” components (Polich and Comerchero 2003; Gonsalves et al. 2005; Cycowicz and Friedman 2007) (for review, see Polich 2007; see Fig. 2A,B for trial selection procedure). This analysis revealed a pattern consistent with statistical learning of the transition structure, seen in a greater amplitude for Ord_25 trials (7.35 ± 1.82 fT) relative to Ord_75 trials (−0.57 ± 2.15 fT) (Fig. 2C) on the magnetometer sensors (average over cluster: t(17) = 2.92, P= 0.01). The contrast between the Ord_25 and Rand conditions (Rand = 5.02 ± 2.13 fT) was not statistically significant (t(17) = 0.49, P= 0.63). No differences were found for combined gradiometer sensors.

Prestimulus Time–Frequency Effects in Relation to WMC

We compared Time–Frequency power signatures during prestimulus intervals for the Ordered and Random series (range 2–30 Hz, separately for the magnetometer and gradiometer sensors). An initial analysis that was independent of the WMC measure revealed no statistically significant differences in prestimulus activity in these series (see Materials and Methods). However, power differences during the prestimulus intervals showed topologically widespread and statistically significant correlations with participants' WMC, within clusters including primarily frontal and central magnetometer sensors (see Supplementary Fig. 2). While this result speaks very strongly to the importance of examining WMC in relation to statistical contexts, it contains a very large amount of data, with significant effects found in different frequency bands and brain regions. Specifically, when examined over the entire prestimulus time window, the frequency distribution of this significant cluster encompassed the entire frequency range between 2 and 10 Hz. Because this range includes both α- and θ-bands, which are often associated with different cognitive functions, and in accordance with previous investigations of prestimulus oscillatory activity (α: van Dijk et al. 2008; Mathewson et al. 2009; θ: Addante et al. 2011; Jutras et al. 2013), we then filtered for results specifically within θ (4–8 Hz) and α (8–12 Hz) separately to better isolate and to interpret these processes. To summarize our analysis approach, the first part of the analysis (multiplot results presented in Supplementary Fig. 2) is completely independent of any prior analysis, whereas the follow-up “drill down” into the α and θ band is based on a follow-up descriptive procedure whose purpose is to meaningfully describe the core features of these data patterns (Fig. 3). This includes generating scatter plots which are necessary for understanding the direction of the correlation and the range of values it subsumes (Fig. 3C,F).

Figure 3.

Prestimulus oscillations and WMC. The mean power differences between Ord_75 and Rand trials (ISI: −1.0 to 0 s and stimulus presentation: 0–0.5 s) positively correlated with participants' individual WMC (K-score) within clusters including frontal and central magnetometer sensors (**P < 0.01). Specifically, (A) prestimulus power differences in the θ band (4–8 Hz) between Ord_75 and Rand trials positively correlated with individuals' K-scores within a cluster of central sensors (C) and continued throughout the entire prestimulus period but terminated with the onset of the stimulus (B). Similarly, within a cluster of frontal sensors (D) prestimulus power differences in the α band (8–12 Hz) between Ord_75 and Rand trials also positively correlated with individuals' K-scores (F), yet this pattern continued throughout stimulus presentation (E).

In a cluster of central sensors, [Ord_75–Rand] prestimulus power differences in the θ band (M = 0.045 ± 0.12 fT) were positively correlated with individuals' WMC estimates (collapsing across sensors in the significant cluster r(18) = 0.59, P= 0.009; Fig. 3A,C). This θ band pattern held throughout the prestimulus period but terminated just prior to stimulus onset (Fig. 3B) indicating this relationship was temporally synchronized with the stimulus timing of the presented visual series.

In a cluster of frontal sensors, [Ord_75–Rand] power differences in the α band (M = −0.034 ± 0.072 fT) were also positively correlated with individuals' WMC (average of all sensors over time within the significant cluster r(18) = 0.61, P = 0.008). Notably for the α band, the correlation held throughout both the interstimulus interval and during stimulus presentation (Fig. 3D–F). This suggests that for α, the relation between WMC and prestimulus power (Ord_75–Rand) was not modulated by the timing of stimulus onset and offset. We found no statistically significant correlations for gradiometers. Taken together, these analyses indicate that WMC is associated with anticipatory patterns of neural activity that are reflected in greater prestimulus power during Ordered compared with Random series.

Prestimulus Time–Frequency and Poststimulus ERFs

The analyses above suggest that individuals with higher WMC show stronger prestimulus preparatory activity in statistically regular contexts, consistent with recent work suggesting a role for WMC in capitalizing on statistical information (as outlined in the Introduction). However, the question still remains of how this activity relates to neural processing during stimulus appearance. Several prior MEG studies have linked prestimulus oscillations to poststimulus ERFs (van Dijk et al. 2008; Lange et al. 2012; Wutz et al. 2014). We conducted a similar analysis to determine whether individuals who more strongly differentiated Ordered from Random series during the prestimulus interval show stronger poststimulus ERF differences between stimuli drawn from Ordered versus Random series.

For each participant, we identified the sensor showing the maximal power difference (within the θ and α bands separately) between the Ord_75 and Rand trial types, during the interstimulus interval. Next, we assessed whether these difference magnitudes correlated with interindividual differences in ERFs for the different trial types (Ord_75, Ord_25, and Rand). We examined differences in ERFs in 2 time windows after stimulus onset: 0–200 ms, an epoch that has been linked to early attentional processes (Heinze et al. 1994; Valdes-Sosa et al. 1998; Luck et al. 2000) and 200–400 ms, an epoch linked to odd-ball-detection effects (Polich and Comerchero 2003; Gonsalves et al. 2005; Cycowicz and Friedman 2007; Polich 2007). In summary, this sensor-wise analysis identified clusters where ERF differences were explained by differential prestimulus activity between Ordered and Random series. (Note that by identifying the sensor with the strongest prestimulus effect separately for each participant, we imposed no a priori assumption on sensor location, thus maintaining independence from the prior analyses.)

We found that prestimulus Ord_75–Rand differences in the θ band positively correlated with the (Ord_75–Rand) ERF during the 0–200 ms window, within a cluster of occipital sensors (“Preθ/PostERF”; Fig. 4A). The average correlation over this sensor cluster was also statistically significant, r(18) = 0.53, P = 0.024. This indicates that enhanced early poststimulus processing of a predictable stimulus (Ord_75–Rand) is found for individuals for whom Ordered series were associated with greater prestimulus θ band power compared with Random series. For the θ band, we found no other statistically significant correlation between prestimulus power and poststimulus ERF differences in any other comparison, within the early or late time windows.

Figure 4.

Prestimulus oscillations and poststimulus neural responses. For each participant, we identified the sensor with the largest absolute prestimulus power difference (Ord_75–Rand), within the θ (Preθ) and α (Preα) frequency bands separately (−1.0 to 0 s prior to stimulus onset). These differences were subjected to a sensor-wise Pearson correlation analysis between prestimulus power differences and poststimulus onset amplitude differences between trial types (PostERF: Ord_75, Ord_25, and Rand). We found that differences in prestimulus slow-wave oscillations significantly correlated with the differential processing of trial types once the stimulus appears (*P < 0.05; fT, femtotesla). (A) Prestimulus Ord_75–Rand power differences in the θ frequency band (4–8 Hz) positively correlated with enhanced early processing (0–200 ms after stimulus onset) of Ord_75 compared with Rand trials (Preθ/PostERF). (B) Prestimulus α frequency band (8–12 Hz) power differences (Ord_75–Rand) correlated with greater neural sensitivity to Ord_25 compared with Ord_75 during the later poststimulus onset time window (200–400 ms after stimulus onset, Preα/PostERF).

We conducted a similar analysis for the α band power. After calculating the maximal prestimulus power difference within the α frequency band, 1 participant was found to be a statistical outlier (more than 2.5 SD from the mean) on this measure and was excluded from the subsequent correlation analysis (n = 17). Prestimulus α differences between Ord_75 and Rand trials positively correlated with the (Ord_25–Ord_75) ERF during the 200–400 ms window, within a cluster of parieto-occipital sensors (“Preα/PostERF”; Fig. 4B). The average correlation in this sensor cluster was statistically significant, r(17) = 0.57, P = 0.017. This indicates that a stronger “odd-ball” ERF (Ord_25–Ord_75) in this time window was found for individuals for whom Ordered series invoked greater prestimulus α band power compared with Random series. (In Supplementary Results [Section 1.2.2], we report a similar sensor-wise analysis investigating clusters where poststimulus ERF differences could be explained by differential time–frequency activity between Ordered and Random series during processing of the previous stimulus, which returned a null result, all P > 0.05 after cluster-correction.)
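The per-participant sensor selection and the outlier screen described above can be sketched as follows. Several details are assumptions made for illustration: that the signed value at the sensor with the largest absolute difference is carried forward, that the SD is the sample SD (n − 1 denominator), and that the function names and list inputs match nothing in the original analysis code.

```python
import math


def peak_difference(power_diff_by_sensor):
    """Signed (Ord_75 - Rand) value at the sensor with the largest absolute difference."""
    return max(power_diff_by_sensor, key=abs)


def exclude_outliers(values, n_sd=2.5):
    """Return indices of participants within n_sd sample SDs of the group mean."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [i for i, v in enumerate(values) if abs(v - mean) <= n_sd * sd]
```

The indices returned by `exclude_outliers` can then be used to subset both the prestimulus measure and the ERF differences before the correlation is computed, so that the 2 variables stay aligned by participant.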

To ensure that the reported ERF waveform differences in this analysis (Fig. 4) and the prior analysis (Fig. 2C) are not influenced by possible baseline shifts (ERF baseline: the 200 ms prior to stimulus presentation) arising from the reported time–frequency correlations during the prestimulus period (the 1000 ms prior to stimulus presentation), we performed the following post hoc analyses. First, we repeated the same analysis (Fig. 2C) of orthogonal pairwise contrasts between ERFs for the 3 trial types (Ord_75, Ord_25, and Rand) during the 200 ms prestimulus time window (baseline period) and found no significant differences between any trial types even without cluster-correction (all Ps > 0.05). Next, we repeated the same analysis reported above (Fig. 4) of maximal time–frequency power differences (within the θ and α bands separately) between the Ord_75 and Rand trial types, during the interstimulus interval and found no significant correlations of these difference magnitudes with interindividual differences in ERFs for the different trial types (Ord_75, Ord_25, and Rand) during the 200 ms prestimulus time window used as a baseline period (all Ps > 0.05). Therefore, the reported ERF waveform differences are not due to baseline shifts during the interstimulus interval.

Moderation Analysis of Prestimulus and Poststimulus Processing with WMC

The relationship presented in the prior section between prestimulus power differences for Ordered and Random series, and poststimulus onset ERFs (“Preθ/PostERF” and “Preα/PostERF”; Fig. 4), was established independently from any relationship with WMC. Our final analysis examined whether the relation between (regularity-related) prestimulus power differences and poststimulus ERF patterns is itself moderated by WMC.

To examine this issue, we conducted a moderation analysis (implemented via regression models; Baron and Kenny 1986) to determine the potential role of WMC, operationalized via K-scores, in moderating prestimulus to poststimulus relationships (see Supplementary Methods). Here, we report this analysis for the θ band only, as the data for the α band did not satisfy the typical requirements for a moderation analysis (the 2 nondirect pathways were not associated with a significant correlation or approaching significance). The moderation analysis showed that the relationship between the Preθ and PostERF variables was moderated by WMC (see Supplementary Fig. 3). Specifically, after controlling for WMC, the statistically significant correlation between Preθ and PostERF (r(18) = 0.53, P = 0.024) was no longer significant. We note, however, that we did not identify a significant difference in the magnitude of this direct pathway (a post hoc Sobel test was nonsignificant, P > 0.05).
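One simple way to picture what accounting for WMC does to the Preθ/PostERF relationship is a first-order partial correlation, sketched below. This is offered only as an illustration under the assumption that a partial correlation captures the spirit of the regression-based analysis; the actual models are described in the Supplementary Methods and may differ. The function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation between x and y after removing the linear contribution of z."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
```

Here x would stand for the prestimulus θ difference, y for the poststimulus ERF difference, and z for the K-score. A large zero-order correlation that shrinks toward zero in `partial_r` indicates that the shared variance between x and y is largely carried by z.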

Behavioral Evaluation (Experiment 2)

The MEG findings corroborated our hypothesis that statistically regular series promote a specific anticipatory prestimulus processing that scales with WMC, and that these anticipatory processes impact poststimulus processing. To evaluate whether WMC is related to the cognitive ability to use this statistical information, we conducted an additional behavioral study (Experiment 2: N = 20) with the same design as the MEG study, but which required participants to indicate by button press whether each picture presented was a “living” or “nonliving” item (see Supplementary Methods and Fig. 1). This study also included a WMC assessment, exactly as detailed for Experiment 1. The split-half reliability of this behavioral test (Spearman–Brown corrected) was r = 0.77 (see Supplementary Results).

Pairwise t-tests demonstrated that participants' accuracy (correctly classifying pictures as “living” or “nonliving”) was significantly better on Ord_75 trials (91.64 ± 4.23%) compared with Rand trials (90.69 ± 4.26%; t(18) = −2.40, P = 0.028), demonstrating a behavioral facilitation of visual target identification when category information was associatively predictable. In addition, participants showed significantly better accuracy for Ord_75 trials compared with trials that were associatively “surprising” within an Ordered series (Ord_25: 90.00 ± 3.92%; t(18) = −3.06, P = 0.007). This pattern of results is important, as it indicates that the behavioral facilitation of responses for Ord_75 trials is not a generalized effect of those trials appearing in low-entropy series per se, but instead a differentiation of performance based on the predictability of each specific trial type within a series. In accordance with this notion, there was no difference in accuracy between Rand trials and Ord_25 trials (t(18) = 1.17, P = 0.259). Subsequent pairwise t-tests showed that participants' speed of response (RT: reaction time) did not significantly differ between any of the trial types (all P > 0.05), yet there was a trend for slower responses on Ord_75 trials (453.14 ± 13.9 ms) compared with Rand trials (448.17 ± 13.7 ms; t(18) = 1.97, P = 0.065), suggesting that a beneficial increase in RT might be related to the higher accuracy in Ord_75 trials. To summarize, these patterns strongly suggest that in Ordered series, the participants as a group were sensitive to the statistics of the series and used this information to anticipate the more predictable category.

We were, however, also interested in how WMC could be related to the benefit provided by statistically regular series. We derived a dependent measure of behavioral efficiency (reaction time/accuracy, known as the Inverse Efficiency Score [IES]; see Supplementary Results) for each trial type, where a lower score reflects more efficient behavior (Townsend and Ashby 1978, 1983). Using this approach, we found that participants with higher WMC more strongly benefited from trial predictability. Consistent with what could be expected from the MEG results, participants' WMC estimates were negatively correlated with the (Ord_75–Rand) differences in IES, r(19) = −0.534, P = 0.019. That is, participants with higher WMC showed more efficient response behavior. There was no correlation between WMC and the (Ord_25–Rand) difference in IES (r(19) = −0.018, P = 0.942). To summarize, this study showed that participants with greater WMC made better use of prior statistical information, resulting in more efficient processing of predictable trials.
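The efficiency measure can be written out directly. The sketch below assumes that accuracy enters as a proportion between 0 and 1 and that RT is a per-participant mean in milliseconds; whether RT was taken from correct trials only is not stated here and is left to the caller. The function names are hypothetical.

```python
def inverse_efficiency(mean_rt_ms, proportion_correct):
    """IES = mean RT / proportion correct; lower values mean more efficient responding."""
    if proportion_correct <= 0:
        raise ValueError("accuracy must be positive")
    return mean_rt_ms / proportion_correct


def ies_benefit(rt_ord, acc_ord, rt_rand, acc_rand):
    """(Ord_75 - Rand) IES difference; negative values indicate a benefit of predictability."""
    return inverse_efficiency(rt_ord, acc_ord) - inverse_efficiency(rt_rand, acc_rand)
```

Dividing RT by accuracy penalizes fast but error-prone responding, so a participant who is slightly slower yet more accurate on predictable trials can still obtain a lower (better) score. A negative correlation between K-scores and `ies_benefit` across participants therefore corresponds to a larger predictability benefit at higher WMC.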

Discussion

Current neurobiological models attribute a fundamental role to people's ability to assess the uncertainty of sensory events from the environment as a foundation for perception and cognition (Friston 2009). This statistical knowledge about the relative probabilities of sensory occurrences can be used to optimize sensory processing either by mechanisms engaged after stimulus presentation (Bar et al. 2006) and/or via the construction of anticipatory predictions prior to stimulus appearance. People's sensitivity to the statistical structure of temporally extended events (Lewicki et al. 1992; Smithson 1997; Stephen and Dixon 2011; Karuza et al. 2014; Emberson et al. 2015) would suggest that such probabilities could be used for constructing anticipatory predictions about forthcoming events, a process known as probabilistic inference.

However, to date, the neurobiological mechanisms that allow using statistical information to optimize processing have not been sufficiently delineated. Most importantly, the potential role of working memory ability has been virtually absent in discussions of the neurobiology of statistical learning more generally, and prediction specifically (but see Misyak et al. 2010; Frost et al. 2015; Siegelman and Frost 2015; Huettig and Janse 2016 for examinations in the behavioral literature). One reason may be that initial neurobiological models emphasized the idea that individuals aggregate statistical knowledge over large temporal constants, which was taken to reflect an “ideal Bayesian observer” mechanism (Strange et al. 2005; Harrison et al. 2006). However, subsequent modeling of these experimental data was highly suggestive of a role for WMC, showing that probabilistic inference is more accurately potentiated from a very limited number of recent events (Harrison et al. 2011). Here we find that individuals display a specific sensitivity to the nondeterministic associative regularities of occurrence between semantic categories and, most importantly, we establish that the probabilistic inference of future events is critically influenced by differences in people's ability to maintain the recent past in short-term working memory. Specifically, visual WMC was related to differences in slow-wave neural oscillations prior to the stimulus appearing, primarily in the θ frequency band (Fig. 3A,C), and this enhancement of prestimulus activity was further linked to preferential early attentional processing after stimulus presentation in statistically regular contexts (Fig. 4A). This therefore suggests that one's ability to represent statistical relationships within the recent past helps anticipate events in the near future (via preparatory slow-wave neural oscillations) and thus facilitates early processing of future sensory inputs.

Signatures of Statistical Learning

Our results highlight how the statistical structure of sensory information from the environment is related to anticipatory prestimulus activity, the role of working memory in this anticipatory process, and the relation of prestimulus activity to poststimulus processing. In both our experiments, we demonstrate that this process occurs spontaneously, since the predictability of future events was manipulated orthogonally to the task demands and neither afforded participants any explicit benefit nor imposed any explicit memory requirements. Our MEG results demonstrated a component (Fig. 2C) highly similar to the commonly reported novelty and “odd-ball” components (Polich and Comerchero 2003; Gonsalves et al. 2005; Cycowicz and Friedman 2007; Polich 2007). These findings are completely consistent with prior work showing spontaneous learning in image streams that contain deterministic associations between image pairs (Turk-Browne et al. 2010; Reddy et al. 2015) and with studies that documented statistical learning of the more and less probable transitions (Bornstein and Daw 2012; Tobia, Iacovella, and Hasson 2012). In the behavioral study, we also found facilitated responses to stimuli that satisfied the more likely transition probability and could therefore be anticipated (see Results).

The Role of Working Memory

Importantly, we found a convergence of evidence for the role of working memory in probabilistic inference. WMC was related to differences in neural processing during both the pre- and poststimulus onset periods. Because our MEG experiment was designed to contrast prestimulus activity epochs between statistically regular and random series, we could directly link anticipatory processes to the statistical structure of event sequences (such a contrast partials out any general anticipatory mechanisms). We found that during the prestimulus period, individuals with higher WMC showed greater differentiation in the θ frequency band, which held throughout the prestimulus period, but terminated just prior to the onset of the next stimulus (Fig. 3B). During prestimulus periods, higher WMC also correlated with greater differentiation in the α band, but this pattern held throughout the prestimulus period, as well as during stimulus presentation (Fig. 3E).

We also independently identified posterior MEG sensors where differential prestimulus θ power correlated with the difference in response to predictable versus nonpredictable stimuli just after stimulus presentation (Fig. 4A). This relationship was also found to be moderated by participants' working memory (see Supplementary Fig. 3), indicating that the relation between statistically related anticipatory activity and statistically influenced stimulus processing is linked to one's working memory abilities. Including a WMC mediator was important for determining whether the direct link between pre- and poststimulus activity is itself significant. In the absence of the mediator variable, one would have concluded that prestimulus activity directly influences poststimulus sensitivity to predictability.
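To make the logic of a moderation test of this kind concrete, the following is a minimal Python sketch; it is not the analysis pipeline used in this study. It assumes one value per participant for the prestimulus θ difference (`x`), WMC (`m`), and the poststimulus predictable-versus-nonpredictable difference (`y`). The function names `fit_moderation` and `simple_slope` are hypothetical, and the demonstration data are synthetic. Moderation is expressed here as the coefficient on the product term in an ordinary least squares regression.

```python
import numpy as np


def fit_moderation(x, m, y):
    """OLS fit of y = b0 + b1*x + b2*m + b3*(x*m) on mean-centered x and m.

    Hypothetical helper for illustration: x is the predictor, m the candidate
    moderator, and y the outcome. Returns the array [b0, b1, b2, b3].
    """
    x = np.asarray(x, dtype=float)
    m = np.asarray(m, dtype=float)
    y = np.asarray(y, dtype=float)
    # Centering makes b1 the slope of y on x at the average moderator value.
    xc = x - x.mean()
    mc = m - m.mean()
    design = np.column_stack([np.ones_like(xc), xc, mc, xc * mc])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def simple_slope(coef, m_centered):
    """Slope of y on x at a given mean-centered moderator value."""
    return coef[1] + coef[3] * m_centered


if __name__ == "__main__":
    # Synthetic data only: the x-y relation grows with m by construction.
    rng = np.random.default_rng(0)
    x = rng.normal(size=40)
    m = rng.normal(size=40)
    y = 0.1 * x + 0.5 * x * m + rng.normal(scale=0.1, size=40)
    coef = fit_moderation(x, m, y)
    print("interaction coefficient:", coef[3])
    print("slope at +1 on centered m:", simple_slope(coef, 1.0))
    print("slope at -1 on centered m:", simple_slope(coef, -1.0))
```

A nonzero interaction coefficient means that the slope relating `x` to `y` changes with `m`. A real analysis would also need a significance test for that coefficient, which this sketch omits.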

Taken together, our findings suggest that one's ability to represent the recent past helps anticipate events in the near future (via preparatory slow-wave neural oscillations) and thus facilitates early processing of future sensory inputs in a way that could benefit behavior. In accordance with this notion, Experiment 2 demonstrated that individuals with greater WMC also displayed more efficient behavioral responses to predictable than to nonpredictable stimuli. In fact, WMC played such an important moderating role that, when interindividual differences in WMC were not taken into account, differences between predictable and nonpredictable stimuli were greatly reduced in the behavioral data and were virtually absent in the prestimulus MEG findings. This result not only shows the relation between statistical learning processes and WMC, but also suggests that future work on statistical learning may benefit from considering this source of interindividual differences (see also Frost et al. 2015).

The WMC-related findings that we have identified, and the account we outlined, do however raise the question of how young children are able to achieve statistical learning (Saffran et al. 1999; Emberson et al. 2015) given their lower WMC. In addressing this issue, we consider 3 themes that should be evaluated conjointly: 1) what are the temporal integration windows over which statistical information appears to be integrated in adults, 2) is there any indication that children's WMC is sufficient to support integration on this scale, and 3) how robust is children's statistical learning, and could differences in robustness be related to WMC. Concerning integration time windows in adults, as mentioned above, it is becoming increasingly clear that adults do not integrate statistical information in a way that equally weights all prior instances in these types of paradigms. For instance, a study by Harrison et al. (2011) modeled the surprise response (prediction error) to stimuli and found that the best model assumed sensitivity to only the last 4 items. They concluded, “no matter how many samples are presented, observers have a threshold on the effective number of past observations that guide their behavior. This provides evidence that observers discount distant information when making inference about statistical regularities in their environment.” Similarly, Bornstein and Daw (2012) implemented a 4 × 4 transition matrix similar to ours (but with transitions holding between specific exemplars rather than categories). They found that the predictive power of stimulus A on stimulus B depended on how much time had passed since the last AB combination had been presented, with a very sharp decay function (over the last 4 trials), and noted that this pattern minimizes the efficacy of any model that does not incorporate a forgetting process. Finally, in our own work (Tobia, Iacovella, and Hasson 2012), we implemented a paradigm where the Markov entropy changed gradually over short time scales. We identified brain regions sensitive to these entropy levels in the very recent past (within the prior 10 s), but also other areas that tracked the trajectory of changes in regularity over greater time windows (direction of gradual increase/decrease in regularity over longer time periods). To conclude this point, in stochastic (i.e., nondeterministic) contexts, there is evidence that even adults rely on information from only a relatively recent window.
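As an illustration of the quantities discussed in this paragraph, the sketch below computes the Markov entropy (entropy rate) of a first-order transition matrix and tallies transitions over only a recent window of a sequence. It is a minimal Python sketch under stated assumptions, not a reproduction of any of the cited models: the 4 × 4 matrix in the demonstration is invented for illustration (each category is followed by one category with probability 0.75 and by another with probability 0.25), the chain is assumed to be irreducible so that its stationary distribution is unique, and the function names are hypothetical.

```python
import numpy as np


def stationary_distribution(P):
    """Stationary distribution pi of a row-stochastic matrix P (pi @ P == pi).

    Assumes an irreducible chain, so that the solution is unique.
    """
    n = P.shape[0]
    # Solve pi (P - I) = 0 together with sum(pi) = 1 by least squares.
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.concatenate([np.zeros(n), [1.0]])
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


def markov_entropy(P):
    """Entropy rate in bits: -sum_i pi_i * sum_j P_ij * log2(P_ij)."""
    P = np.asarray(P, dtype=float)
    pi = stationary_distribution(P)
    # Treat 0 * log2(0) as 0 by taking logs of positive entries only.
    log_p = np.zeros_like(P)
    np.log2(P, out=log_p, where=P > 0)
    return float(-np.sum(pi[:, None] * P * log_p))


def windowed_transition_counts(seq, n_states, window):
    """Count transitions using only the last `window` (>= 1) transitions."""
    counts = np.zeros((n_states, n_states))
    pairs = list(zip(seq[:-1], seq[1:]))[-window:]
    for a, b in pairs:
        counts[a, b] += 1
    return counts


if __name__ == "__main__":
    # Illustrative matrix (an assumption, not the design of any cited study).
    regular = np.zeros((4, 4))
    for i in range(4):
        regular[i, (i + 1) % 4] = 0.75
        regular[i, (i + 2) % 4] = 0.25
    random_series = np.full((4, 4), 0.25)
    print("regular series entropy (bits):", markov_entropy(regular))
    print("random series entropy (bits):", markov_entropy(random_series))
    print(windowed_transition_counts([0, 1, 2, 3, 0, 1], 4, window=4))
```

Lower entropy corresponds to a more predictable series. The hard window is the simplest way to express limited integration; a graded forgetting process could instead weight older transitions by a decaying factor.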

We note this might not be the case for deterministic contexts, that is, ones where a stimulus is completely predictive of another stimulus (Turk-Browne et al. 2010), an operationalization of statistical learning often used in children's studies (where “words” reflect a substring of stimuli that are completely predictive; Saffran et al. 1996). Can children's working memory support such temporal windows? Studies using a WMC assessment similar to the one we used here suggest that young children could most likely approach the adult WMC of 3–4 items in memory within the first year of life (Rose et al. 2001; Ross-Sheehy et al. 2003), and paradigms using object occlusion point to the ability to maintain 2–3 items in memory even at the age of 7 months (Moher et al. 2012). Yet comparisons of WMC between adults and children at different stages of development can be quite tenuous, because the task designs used to assess this capacity in different age groups may not be directly comparable; such comparisons should be taken with some degree of caution (Simmering 2012). To summarize, if behavior in stochastic contexts depends mainly on the very recent past, it might be that children could use their WMC to support this process in a similar way to adults. It is important to note that while children's ability is considered robust, commonly used measures to assess these capacities in children (e.g., preferential looking or preferential listening; Saffran et al. 1999; Kirkham et al. 2007) make it difficult to say just how robust learning actually is within and across individuals, especially when comparing these findings to those in adults. Finally, recent work on the relation between attention and statistical learning suggests that some statistical learning may take place “under the radar” with little cognitive control. For instance, automatic learning of statistical features of an unattended stream can take place when 2 information streams are presented in parallel (Musz et al. 2015). This may allow children, who have less developed control mechanisms, to achieve statistical learning.

Neural Oscillations in the θ- and α-Frequency Bands

Recent work has shown that neural activity occurring prior to the presentation of a visual stimulus can strongly impact subsequent stimulus processing. Particularly relevant to our study, prestimulus increases in θ power have been shown to facilitate memory encoding (Jutras et al. 2013) and delayed retrieval (Addante et al. 2011) of visual items committed to memory. Neural modulations within the θ band have also been associated with the active maintenance of complex visual stimuli, similar to those used here, during the retention intervals of visual working memory tasks (Cashdollar et al. 2009, 2013). Furthermore, θ activity has been known to support the maintenance of temporally separate events (Hsieh et al. 2011; Roberts et al. 2013) and has been proposed as a likely neural mechanism supporting the process of pattern completion (Hasselmo et al. 1995) to instantiate the next event when only fragmentary information is available.

On the basis of these findings, we suggest that in the current study, θ band activity during prestimulus periods may not solely indicate the construction of a prediction, but perhaps fulfills a dual role within statistically regular sequences: that of consolidation and prediction. Consider the schematic series A, B, C that stands for a subsection of a statistically regular series. On the one hand, θ band activity may be related to forming retrospective associative links between the current and prior event; that is, after events A and B have been presented, the link AB is reinforced, and the strength of this link reflects the transition strengths between the categories. Such an operation would be consistent with evidence showing that individuals are sensitive to the probability of past events given the present (reverse transition probability; Perruchet and Desaulty 2008; Pelucchi et al. 2009). On the other hand, activity in the θ band may also be related to constructing a prospective association; that is, given the established link BC, the presentation of B will increase the accessibility of C. Therefore, the prestimulus θ activity found here may be related to both backward- and forward-oriented operations. The forward-looking operation may be described from the perspective of prediction making (minimizing uncertainty or free energy; Friston 2009). Both the forward- and backward-linking operations are compatible with the perspective of automatic pattern completion as described by “chunking” approaches (Perruchet and Pacton 2006; Kumaran and Maguire 2009). Indeed, our own work (Tremblay et al. 2013) suggests that statistically regular auditory series are related to perceptual grouping. In that study, participants were presented with regular or random series that always contained 4 items. Yet, after hearing these series, participants reported hearing fewer distinct items in the regular series, suggesting that associative binding occurred during the perception of these series (this effect was absent in random series).
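The distinction between forward and backward (reverse) transition probabilities drawn above can be made explicit with a short Python sketch. It is illustrative only: the function name `transitional_probabilities` and the toy letter sequence are assumptions, not materials from the cited studies. For an adjacent pair AB, the forward probability is the proportion of occurrences of A that are followed by B, and the backward probability is the proportion of occurrences of B that are preceded by A.

```python
from collections import Counter


def transitional_probabilities(seq):
    """Forward P(next = b | current = a) and backward P(previous = a | current = b).

    Hypothetical helper: `seq` is a list or string of category labels.
    Returns two dicts keyed by the ordered pair (a, b).
    """
    pair_counts = Counter(zip(seq[:-1], seq[1:]))
    first_counts = Counter(seq[:-1])   # how often each label starts a pair
    second_counts = Counter(seq[1:])   # how often each label ends a pair
    forward = {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}
    backward = {(a, b): n / second_counts[b] for (a, b), n in pair_counts.items()}
    return forward, backward


if __name__ == "__main__":
    forward, backward = transitional_probabilities("ABCABD")
    for pair in sorted(forward):
        print(pair, "forward:", forward[pair], "backward:", backward[pair])
```

In this toy sequence, B is followed by C only half of the time, yet C is always preceded by B, so the same pair can be weakly predictive in the forward direction and fully predictive in the backward direction.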

Prestimulus α band differences between statistically regular and random series were also related to WMC, but these differences also encompassed the entire epoch in which the stimuli were on the screen, which suggests a general, tonic modulation of attention between series. It has been previously shown that modulations of prestimulus α oscillations enhance the detection of low-level visual features (van Dijk et al. 2008; Mathewson et al. 2009; Spaak et al. 2014). Here we demonstrate a similar effect for a higher level cognitive process, where prestimulus α differences between statistically regular and random series enhanced the poststimulus sensitivity to surprising (Ord_25) categories when compared with predictable (Ord_75) categories (Fig. 4B). However, this correlative relationship was not moderated by WMC (see Supplementary Methods), suggesting that WMC may not be uniquely responsible for this relationship as measured here.

It has recently been suggested that visual WMC, as assessed by a change detection task, is related to the ability to probabilistically infer future sensory information (Ma et al. 2014). For instance, in cases where sensory features are probabilistically varied in such tasks (i.e., likelihood of cue to target match), observers have been shown to retain the corresponding probabilities over time to optimize their behavioral precision (Najemnik and Geisler 2005; Ma et al. 2011). Furthermore, it has also been found that associatively binding items within a static visual display into patterns of occurrence will affect the precision of subsequent change detection estimates (Brady and Tenenbaum 2013). However, these studies manipulated low-level visual features within a static visual search display (Hollingworth et al. 2008; Carlisle et al. 2011), or target probabilities in the context of the change detection task itself (Najemnik and Geisler 2005; Ma et al. 2011), and did not address the process of probabilistic inference in a more naturalistic setting where events occur over an extended time series and where the specific visual features of subsequent items could not be known with certainty. Our work suggests that such WMC-driven predictions occur spontaneously during perception, since the predictability of yet unseen visual categories was manipulated orthogonally to the task demands and afforded participants no explicit benefit. Overall, the notion that one's ability to retain the recent past allows one to predict the near future is intuitively satisfying, and our results are consistent with work showing that when modeling the statistical features of a sensory input, individuals tend to be influenced primarily by events in the very recent past (Harrison et al. 2011).

In summary, our study provides a convergence of evidence establishing the vital role of working memory in the probabilistic inference of future events. This capacity to represent the recent past specifically facilitates processing within statistically regular contexts, in that it boosts anticipatory slow-wave neural oscillations prior to the appearance of visual information and preferential sensory processing of these anticipated events once they are presented. This process, which is moderated by one's working memory ability, appears to occur intrinsically during continuous perception and confers a realistic advantage for human behavior by allowing the anticipation of higher level semantic concepts that are expected to occur in the near future.

Supplementary Material

Supplementary material can be found at http://www.cercor.oxfordjournals.org/online.

Funding

This work was supported by a European Research Council Starting Grant (ERC-STG 263318; NeuroInt) to U.H. and a European Research Council Starting Grant (ERC-STG 283404; WIN2CON) to N.W. The authors declare no competing financial interests.

Notes

Conflict of Interest: None declared.

References

Addante RJ, Watrous AJ, Yonelinas AP, Ekstrom AD, Ranganath C. 2011. Prestimulus theta activity predicts correct source memory retrieval. Proc Natl Acad Sci USA. 108:10702–10707.

Awh E, Barton B, Vogel EK. 2007. Visual working memory represents a fixed number of items regardless of complexity. Psychol Sci. 18:622–628.

Bar M, Kassam KS, Ghuman AS, Boshyan J, Schmid AM, Dale AM, Hamalainen MS, Marinkovic K, Schacter DL, Rosen BR et al. 2006. Top-down facilitation of visual recognition. Proc Natl Acad Sci USA. 103:449–454.

Baron RM, Kenny DA. 1986. The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. J Pers Soc Psychol. 51:1173–1182.

Bell AJ, Sejnowski TJ. 1995. An information-maximization approach to blind separation and blind deconvolution. Neural Comput. 7:1129–1159.

Bergmann HC, Rijpkema M, Fernandez G, Kessels RP. 2012. Distinct neural correlates of associative working memory and long-term memory encoding in the medial temporal lobe. Neuroimage. 63:989–997.

Bestmann S, Harrison LM, Blankenburg F, Mars RB, Haggard P, Friston KJ, Rothwell JC. 2008. Influence of uncertainty and surprise on human corticospinal excitability during preparation for action. Curr Biol. 18:775–780.

Bollinger J, Rubens MT, Zanto TP, Gazzaley A. 2010. Expectation-driven changes in cortical functional connectivity influence working memory and long-term memory performance. J Neurosci. 30:14399–14410.

Bornstein AM, Daw ND. 2012. Dissociating hippocampal and striatal contributions to sequential prediction learning. Eur J Neurosci. 35:1011–1023.

Brady TF, Tenenbaum JB. 2013. A probabilistic model of visual working memory: Incorporating higher order regularities into working memory capacity estimates. Psychol Rev. 120:85–109.

Bubic A, von Cramon DY, Jacobsen T, Schroger E, Schubotz RI. 2009. Violation of expectation: neural correlates reflect bases of prediction. J Cogn Neurosci. 21:155–168.

Bubic A, von Cramon DY, Schubotz RI. 2011. Exploring the detection of associatively novel events using fMRI. Hum Brain Mapp. 32:370–381.

Carlisle NB, Arita JT, Pardo D, Woodman GF. 2011. Attentional templates in visual working memory. J Neurosci. 31:9315–9322.

Cashdollar N, Lavie N, Duzel E. 2013. Alleviating memory impairment through distraction. J Neurosci. 33:19012–19022.

Cashdollar N, Malecki U, Rugg-Gunn FJ, Duncan JS, Lavie N, Duzel E. 2009. Hippocampus-dependent and -independent theta-networks of active maintenance. Proc Natl Acad Sci USA. 106:20493–20498.

Cowan N. 2001. The magical number 4 in short-term memory: a reconsideration of mental storage capacity. Behav Brain Sci. 24:87–114; discussion 114–185.

Cycowicz YM, Friedman D. 2007. Visual novel stimuli in an ERP novelty oddball paradigm: effects of familiarity on repetition and recognition memory. Psychophysiology. 44:11–29.

Davachi L, Wagner AD. 2002. Hippocampal contributions to episodic encoding: insights from relational and item-based learning. J Neurophysiol. 88:982–990.

de Lange FP, Rahnev DA, Donner TH, Lau H. 2013. Prestimulus oscillatory activity over motor cortex reflects perceptual expectations. J Neurosci. 33:1400–1410.

Diamond R, Carey S. 1986. Why faces are and are not special: an effect of expertise. J Exp Psychol. 115:107–117.

Emberson LL, Richards JE, Aslin RN. 2015. Top-down modulation in the infant brain: Learning-induced expectations rapidly affect the sensory cortex at 6 months. Proc Natl Acad Sci USA. 112:9585–9590.

Friston K. 2009. The free-energy principle: a rough guide to the brain? Trends Cogn Sci. 13:293–301.

Frost R, Armstrong BC, Siegelman N, Christiansen MH. 2015. Domain generality versus modality specificity: the paradox of statistical learning. Trends Cogn Sci. 19:117–125.

Fukuda K, Vogel EK. 2009. Human variation in overriding attentional capture. J Neurosci. 29:8726–8733.

Geisler WS. 2011. Contributions of ideal observer theory to vision research. Vision Res. 51:771–781.

Gonsalves BD, Kahn I, Curran T, Norman KA, Wagner AD. 2005. Memory strength and repetition suppression: multimodal imaging of medial temporal cortical contributions to recognition. Neuron. 47:751–761.

Harrison LM, Bestmann S, Rosa MJ, Penny W, Green GG. 2011. Time scales of representation in the human brain: weighing past information to predict future events. Front Hum Neurosci. 5:37.

Harrison LM, Duggins A, Friston KJ. 2006. Encoding uncertainty in the hippocampus. Neural Netw. 19:535–546.

Hasselmo ME, Schnell E, Barkai E. 1995. Dynamics of learning and recall at excitatory recurrent synapses and cholinergic modulation in rat hippocampal region CA3. J Neurosci. 15:5249–5262.

Heinze HJ, Mangun GR, Burchert W, Hinrichs H, Scholz M, Munte TF, Gos A, Scherg M, Johannes S, Hundeshagen H et al. 1994. Combined spatial and temporal imaging of brain activity during visual selective attention in humans. Nature. 372:543–546.

Hollingworth A, Richard AM, Luck SJ. 2008. Understanding the function of visual short-term memory: transsaccadic memory, object correspondence, and gaze correction. J Exp Psychol Gen. 137:163–181.

Hsieh LT, Ekstrom AD, Ranganath C. 2011. Neural oscillations associated with item and temporal order maintenance in working memory. J Neurosci. 31:10803–10810.

Huettel SA, Song AW, McCarthy G. 2005. Decisions under uncertainty: probabilistic context influences activation of prefrontal and parietal cortices. J Neurosci. 25:3304–3311.

Huettig F, Janse E. 2016. Individual differences in working memory and processing speed predict anticipatory spoken language processing in the visual world. Lang Cogn Neurosci. 31(1):80–93.

Jutras MJ, Fries P, Buffalo EA. 2013. Oscillatory activity in the monkey hippocampus during visual exploration and memory formation. Proc Natl Acad Sci USA. 110:13144–13149.

Karuza EA, Emberson LL, Aslin RN. 2014. Combining fMRI and behavioral measures to examine the process of human learning. Neurobiol Learn Mem. 109:193–206.

Kersten D, Yuille A. 2003. Bayesian models of object perception. Curr Opin Neurobiol. 13:150–158.

Kirkham NZ, Slemmer JA, Richardson DC, Johnson SP. 2007. Location, location, location: development of spatiotemporal sequence learning in infancy. Child Dev. 78:1559–1571.

Kumaran D, Maguire EA. 2009. Novelty signals: a window into hippocampal information processing. Trends Cogn Sci. 13:47–54.

Kyllingsbaek S, Bundesen C. 2009. Changing change detection: improving the reliability of measures of visual short-term memory capacity. Psychon Bull Rev. 16:1000–1010.

Lange J, Halacz J, van Dijk H, Kahlbrock N, Schnitzler A. 2012. Fluctuations of prestimulus oscillatory power predict subjective perception of tactile simultaneity. Cereb Cortex. 22:2564–2574.

Lewicki P, Hill T, Czyzewska M. 1992. Nonconscious acquisition of information. Am Psychol. 47:796–801.

Luck SJ, Vogel EK. 2013. Visual working memory capacity: from psychophysics and neurobiology to individual differences. Trends Cogn Sci. 17:391–400.

Luck SJ, Woodman GF, Vogel EK. 2000. Event-related potential studies of attention. Trends Cogn Sci. 4:432–440.

Ma WJ, Husain M, Bays PM. 2014. Changing concepts of working memory. Nat Neurosci. 17:347–356.

Ma WJ, Navalpakkam V, Beck JM, Berg R, Pouget A. 2011. Behavior and neural basis of near-optimal visual search. Nat Neurosci. 14:783–790.

Maris E, Oostenveld R. 2007. Nonparametric statistical testing of EEG- and MEG-data. J Neurosci Methods. 164:177–190.

Mars RB, Debener S, Gladwin TE, Harrison LM, Haggard P, Rothwell JC, Bestmann S. 2008. Trial-by-trial fluctuations in the event-related electroencephalogram reflect dynamic changes in the degree of surprise. J Neurosci. 28:12539–12545.

Mathewson KE, Gratton G, Fabiani M, Beck DM, Ro T. 2009. To see or not to see: prestimulus alpha phase predicts visual awareness. J Neurosci. 29:2725–2732.

Misyak JB, Christiansen MH, Tomblin JB. 2010. On-line individual differences in statistical learning predict language processing. Front Psychol. 1:31.

Moher M, Tuerk AS, Feigenson L. 2012. Seven-month-old infants chunk items in memory. J Exp Child Psychol. 112:361–377.

Musz E, Weber MJ, Thompson-Schill SL. 2015. Visual statistical learning is not reliably modulated by selective attention to isolated events. Attention Perception Psychophysics. 77:78–96.

Najemnik J, Geisler WS. 2005. Optimal eye movement strategies in visual search. Nature. 434:387–391.

Nastase S, Iacovella V, Hasson U. 2014. Uncertainty in visual and auditory series is coded by modality-general and modality-specific neural systems. Hum Brain Mapp. 35:1111–1128.

Nichols TE, Holmes AP. 2002. Nonparametric permutation tests for functional neuroimaging: a primer with examples. Hum Brain Mapp. 15:1–25.

Oostenveld R, Fries P, Maris E, Schoffelen JM. 2011. FieldTrip: open source software for advanced analysis of MEG, EEG, and invasive electrophysiological data. Comput Intelligence Neurosci. 2011:156869.

Pelucchi B, Hay JF, Saffran JR. 2009. Learning in reverse: eight-month-old infants track backward transitional probabilities. Cognition. 113:244–247.

Perruchet P, Desaulty S. 2008. A role for backward transitional probabilities in word segmentation? Memory Cognition. 36:1299–1305.

Perruchet P, Pacton S. 2006. Implicit learning and statistical learning: one phenomenon, two approaches. Trends Cogn Sci. 10:233–238.

Polich J. 2007. Updating P300: an integrative theory of P3a and P3b. Clin Neurophysiol. 118:2128–2148.

Polich J, Comerchero MD. 2003. P3a from visual stimuli: typicality, task, and topography. Brain Topogr. 15:141–152.

Pouget A, Beck JM, Ma WJ, Latham PE. 2013. Probabilistic brains: knowns and unknowns. Nat Neurosci. 16:1170–1178.

Reddy L, Poncet M, Self MW, Peters JC, Douw L, van Dellen E, Claus S, Reijneveld JC, Baayen JC, Roelfsema PR. 2015. Learning of anticipatory responses in single neurons of the human medial temporal lobe. Nat Commun. 6:8556.

Roberts BM, Hsieh LT, Ranganath C. 2013. Oscillatory activity during maintenance of spatial and temporal information in working memory. Neuropsychologia. 51:349–357.

Rose SA, Feldman JF, Jankowski JJ. 2001. Visual short-term memory in the first year of life: capacity and recency effects. Dev Psychol. 37:539–549.

Ross-Sheehy S, Oakes LM, Luck SJ. 2003. The development of visual short-term memory capacity in infants. Child Dev. 74:1807–1822.

Saffran JR, Aslin RN, Newport EL. 1996. Statistical learning by 8-month-old infants. Science. 274:1926–1928.

Saffran JR, Johnson EK, Aslin RN, Newport EL. 1999. Statistical learning of tone sequences by human infants and adults. Cognition. 70:27–52.

Scapinello KF, Yamey AD. 1970. The role of familiarity and orientation in immediate and delayed recognition of pictorial stimuli. Psychonomic Sci. 21:329–330.

Siegelman N, Frost R. 2015. Statistical learning as an individual ability: theoretical perspectives and empirical evidence. J Memory Lang. 81:105–120.

Simmering VR. 2012. The development of visual working memory capacity during early childhood. J Exp Child Psychol. 111:695–707.

Smithson M. 1997. Judgment under chaos. Org Behav Hum Decision Processes. 69:58–66.

Spaak E, de Lange FP, Jensen O. 2014. Local entrainment of alpha oscillations by visual stimuli causes cyclic modulation of perception. J Neurosci. 34:3536–3544.

Stephen DG, Dixon JA. 2011. Strong anticipation: multifractal cascade dynamics modulate scaling in synchronization behaviors. Chaos Solitons Fractals. 44:160–168.

Stokes MG, Myers NE, Turnbull J, Nobre AC. 2014. Preferential encoding of behaviorally relevant predictions revealed by EEG. Front Hum Neurosci. 8:687.

Strange BA, Duggins A, Penny W, Dolan RJ, Friston KJ. 2005. Information theory, novelty and hippocampal responses: unpredicted or unpredictable? Neural Netw. 18:225–230.

Tobia MJ, Iacovella V, Davis B, Hasson U. 2012. Neural systems mediating recognition of changes in statistical regularities. Neuroimage. 63:1730–1742.

Tobia MJ, Iacovella V, Hasson U. 2012. Multiple sensitivity profiles to diversity and transition structure in non-stationary input. Neuroimage. 60:991–1005.

Townsend JT, Ashby FG. 1978. Methods of modeling capacity in simple processing systems. In: Castellan J, Restle F, editors. Cognitive theory. Hillsdale (NJ): Erlbaum. p. 200–239.

Townsend JT, Ashby FG. 1983. Stochastic modeling of elementary psychological processes. Cambridge: Cambridge University Press.

Tremblay P, Baroni M, Hasson U. 2013. Processing of speech and non-speech sounds in the supratemporal plane: auditory input preference does not predict sensitivity to statistical structure. Neuroimage. 66:318–332.

Turk-Browne NB, Scholl BJ, Chun MM, Johnson MK. 2009. Neural evidence of statistical learning: efficient detection of visual regularities without awareness. J Cogn Neurosci. 21:1934–1945.

Turk-Browne NB, Scholl BJ, Johnson MK, Chun MM. 2010. Implicit perceptual anticipation triggered by statistical learning. J Neurosci. 30:11177–11187.

Valdes-Sosa M, Bobes MA, Rodriguez V, Pinilla T. 1998. Switching attention without shifting the spotlight object-based attentional modulation of brain potentials. J Cogn Neurosci. 10:137–151.

van Dijk H, Schoffelen JM, Oostenveld R, Jensen O. 2008. Prestimulus oscillatory activity in the alpha band predicts visual discrimination ability. J Neurosci. 28:1816–1823.

Vogel EK, McCollough AW, Machizawa MG. 2005. Neural measures reveal individual differences in controlling access to working memory. Nature. 438:500–503.

Vossel S, Weidner R, Thiel CM, Fink GR. 2009. What is “odd” in Posner's location-cueing paradigm? Neural responses to unexpected location and feature changes compared. J Cogn Neurosci. 21:30–41.

Willenbockel V, Sadr J, Fiset D, Horne GO, Gosselin F, Tanaka JW. 2010. Controlling low-level image properties: the SHINE toolbox. Behav Res Methods. 42:671–684.

Wutz A, Weisz N, Braun C, Melcher D. 2014. Temporal windows in visual processing: “prestimulus brain state” and “poststimulus phase reset” segregate visual transients on different temporal scales. J Neurosci. 34:1554–1565.
