Eduardo Valero Cano, Andreas Fichtner, Daniel Peter, P Martin Mai, The impact of ambient noise sources in subsurface models estimated from noise correlation waveforms, Geophysical Journal International, Volume 239, Issue 1, October 2024, Pages 85–98, https://doi.org/10.1093/gji/ggae259
SUMMARY
Cross-correlations of seismic ambient noise are frequently used to image Earth structure. Usually, tomographic studies assume that noise sources are uniformly distributed and interpret noise correlations as empirical Green’s functions. However, previous research suggests that this assumption can introduce errors in the estimated models, especially when noise correlation waveforms are inverted. In this paper, we investigate changes in subsurface models inferred from noise correlation waveforms depending on whether the noise source distribution is considered to be uniform. To this end, we set up numerical experiments that mimic a tomographic study in Southern California exploiting ambient noise generated in the Pacific Ocean. Our results show that if the distribution of noise sources is deemed uniform instead of being numerically represented in the wave simulations, the misfit of the estimated models increases. In our experiments, the model misfit increase ranges between 5 and 21 per cent, depending on the heterogeneity of the noise source distribution. This indicates that assuming uniform noise sources introduces source-dependent model errors. Since the location of noise sources may change over time, these errors are also time-dependent. In order to mitigate these errors, it is necessary to account for the noise source distribution. The spatial extent to which noise sources must be considered depends on the propagation distance of the ambient noise wavefield. If only sources near the study area are considered, model errors may arise.
1 INTRODUCTION
Seismic ambient noise contains valuable information about Earth’s structure. Previously, ambient noise recordings were considered unusable, as they consist of weak, pseudo-random oscillations difficult to interpret. However, their correlations are now regarded as valuable signals. One of the first works to acknowledge the utility of ambient noise is by Aki (1957), who studied the phase speed of surface waves using the spatial autocorrelation method. Another early study that discovered the value of noise correlations is by Claerbout (1968), who determined that the autocorrelation of seismic waves generated at depth and recorded at the surface yields the reflectivity below the recording point. Despite these notable contributions, interest in ambient noise only increased much later, following the seminal publication of Lobkis & Weaver (2001). In this work, they show that the cross-correlation of a diffuse field recorded at two receivers is proportional to the impulse response of the medium, that is, the Green’s function. This finding motivated studies proving that Green’s functions and correlations of waves like earthquake-coda (Campillo & Paul 2003; Snieder 2004) and ambient noise (Shapiro & Campillo 2004; Roux et al. 2005) are proportional under certain conditions, eventually leading to the first noise tomography applications (Shapiro et al. 2005; Sabra et al. 2005).
For noise correlations to approximate Green’s functions, the noise wavefield must be isotropic near the receivers (Snieder 2004) or the source distribution must be uniform (Roux et al. 2005). Nowadays, many tomographic studies assume that the noise source distribution is uniform and interpret noise correlations as empirical Green’s functions. Most such studies exploit ambient noise generated by the interaction of the oceans with the solid Earth due to its relatively high energy compared to other types of ambient noise (Nakata et al. 2019). Also, they often focus on the analysis of surface waves as they are usually the strongest signals. A typical noise tomography workflow consists of estimating 2-D wave speed maps using surface wave dispersion measurements and ray theory (Yang et al. 2007; Bensen et al. 2008; Lin et al. 2008; Saygin & Kennett 2010), followed by a 1-D inversion to obtain a 3-D Earth structure model (Brenguier et al. 2007; Stehly et al. 2009; Spica et al. 2017). Another option is to invert noise correlations as empirical Green’s functions using full-waveform inversion, which yields a 3-D subsurface model without intermediate steps (Chen et al. 2014; Gao & Shen 2015; Liu et al. 2017; Zhang et al. 2018; Lu et al. 2020; Wang et al. 2021).
Assuming that noise sources are uniformly distributed is a common practice; however, this assumption is not generally correct. For instance, ocean noise sources are often localized rather than being evenly spread over a large area. Moreover, ocean-wave properties and hence ocean noise sources vary over time, particularly with the season (Ardhuin et al. 2015; Gualtieri et al. 2019). As a result, noise correlations represent only first-order approximations of Green’s functions, containing biases in amplitude (Prieto et al. 2011; Tsai 2011; Hanasoge 2013) and traveltime (Tsai 2009; Weaver et al. 2009; Yao & van der Hilst 2009), as well as spurious arrivals (Snieder et al. 2006; Retailleau et al. 2017; Li et al. 2020). If not accounted for properly, these biases can be mapped into the estimated Earth structure, resulting in fictitious heterogeneities.
Although current theory (Tromp et al. 2010; Hanasoge 2013, 2013b; Sager et al. 2018b) and tools (Ermert et al. 2020; Igel et al. 2023) allow the inversion of noise correlations considering uneven noise sources, the assumption of uniform noise sources still prevails. Treating noise correlations as empirical Green’s functions is convenient, as this transforms seismic stations into virtual point sources. Besides, multiple studies suggest that the noise source distribution has a small impact on narrow-band traveltime measurements (Tsai 2009; Yao & van der Hilst 2009; Weaver et al. 2009; Froment et al. 2010), supporting the use of noise correlations as empirical Green’s functions in traveltime tomography. Nevertheless, other works observe that Earth structure models obtained in this manner struggle to fit seismic waveforms not used during their inversion, suggesting that they may not be accurate. For instance, Lu & Ben-Zion (2022) use data sets of noise correlations and earthquake waveforms to validate crustal models of Southern California (Small et al. 2017; Berg et al. 2018) and observe that models inferred from empirical Green’s functions have the lowest data fit. Similarly, Rodgers et al. (2022) note that 3-D subsurface models of the Western United States (Schmandt et al. 2015; Shen & Ritzwoller 2016) estimated from empirical Green’s functions poorly fit earthquake waveforms. Moreover, these subsurface models fit the data with different accuracy, indicating that they are distinct despite using similar data and imaging the same structure.
Multiple aspects, such as data coverage, forward modelling and inversion method, affect the accuracy of seismic tomography. However, one factor specific to noise tomography is the treatment of the noise source distribution. The assumption of uniform noise sources may decrease the accuracy of subsurface models obtained from noise tomography compared to those inferred from earthquake tomography. Also, it may promote differences between subsurface models of the same region, especially if obtained from ambient noise generated by distinct noise source distributions. This is particularly relevant for time-lapse studies using ambient noise (Brenguier et al. 2008; de Ridder et al. 2014), as they analyse data recorded during different time periods and thus different noise source conditions. The influence of source-related errors increases when using finite-frequency imaging methods like full-waveform inversion. Several studies indicate that this technique is more affected by the noise source distribution than ray tomography. For example, Basini et al. (2013) show that the location of noise sources directly affects the sensitivity kernels that determine the estimated Earth structure. Also, Fichtner (2014) concludes that noise correlation waveforms are considerably affected by the noise source distribution, thereby changing misfit measurements commonly used in full-waveform inversion.
To relax the assumption of uniform noise sources, it is necessary to invert noise correlations without requiring them to approximate Green’s functions with an error that is smaller than the data errors. Based on ideas from helioseismology (Duvall et al. 1993; Gizon & Birch 2002), Tromp et al. (2010) introduced forward and adjoint modelling theory to use noise correlations as self-consistent observables in full-waveform inversion. This method includes any noise source distribution in the inversion framework, bringing several benefits. For instance, traveltime and amplitude changes induced by the distribution of noise sources are now assimilated as information during the inversion. Also, the noise source distribution can be estimated, facilitating the study of different phenomena such as the Earth’s hum (Ermert et al. 2017). The downside is that a noise source model is required to invert for Earth structure and vice versa, similar to earthquake tomography (Fichtner et al. 2017; Sager et al. 2018b). This introduces source-structure trade-offs that complicate the inversion (Fichtner 2015; Sager et al. 2018b). Additionally, regional and local tomography studies are challenging, especially those using ocean noise. Due to the global distribution of ocean noise sources, the sensitivity of noise correlations can extend hundreds of kilometres away from the receiver–receiver line (Tromp et al. 2010; Fichtner 2015; Sager et al. 2018a). Consequently, non-global applications cannot account for distant noise sources if the computational domain is limited, potentially affecting the estimated models to an unexplored extent. Despite these inconveniences, multiple works successfully use noise correlations to infer noise sources (Ermert et al. 2017, 2021; Datta et al. 2019; Igel et al. 2021) and subsurface structure (Sager et al. 2020) at global to continental scales, demonstrating the usefulness of this technique.
Understanding how the treatment of the noise source distribution affects subsurface models is important, as it can shed light on practical consequences. Still, only a few works investigate the differences between subsurface models obtained under the assumption of uniform noise sources and those inferred considering the noise source distribution. Studies that address this matter include Hanasoge (2013) and Sager et al. (2018b), who show that interpreting noise correlations as empirical Green’s functions, that is, assuming uniform noise sources, reduces the accuracy of the estimated subsurface models. Although valuable, the experiments in these studies do not represent conditions found in nature, as they use simplistic Earth structure models and unrealistic noise source distributions (e.g. rings of noise sources or Gaussian blobs). Furthermore, they do not investigate how temporal changes in the noise source distribution affect subsurface structure models. Thus, there is room for improvement by conducting numerical experiments in realistic settings and studying changes in Earth structure models due to variations in the noise source.
In this paper, we investigate how the treatment of the noise source distribution changes Earth structure models obtained from full-waveform inversion of noise correlations. Our main focus is to study how subsurface models change if a heterogeneous noise source distribution is either treated as homogeneous or properly taken into account. Additionally, we explore the consequences of partially considering the noise source distribution (i.e. ignoring distant noise sources) since this is relevant for regional studies. For this, we conduct a series of 2-D structure inversions using synthetic data. Due to the frequent interest in regional subsurface structure, we imitate a regional-scale experiment exploiting ocean noise.
In the following, we briefly outline the full-waveform inversion method and describe the numerical modelling of noise correlations and empirical Green’s functions. Then, we detail the computation of the synthetic data sets in Section 3, followed by the description of the experiments and their outcomes in Section 4. We discuss the results and state limitations in Section 5, before finishing the paper with our conclusions.
2 THEORY
Full-waveform inversion is mostly formulated as an iterative seismic tomography method that uses finite-frequency wave propagation and adjoint (Tarantola 1984, 1988; Mora 1987; Tromp et al. 2005) or scattering-integral techniques (Zhao et al. 2005; Chen et al. 2007) to obtain models of Earth structure and seismic sources. Its objective is to find a model that matches numerically simulated waveforms and observations within measurement errors. This task is achieved by using the differences between synthetic and observed waveforms to compute sensitivity kernels that indicate which model parameters to update. The theory of this method is extensively reviewed in the literature (e.g. Tromp et al. 2005, 2010; Fichtner 2010; Virieux et al. 2017). Therefore, we only briefly summarize the forward modelling of the observables used in our study, that is, noise correlations and empirical Green’s functions.
2.1 Noise correlations
Noise correlations are not conventional seismic signals as they represent the interference between seismic waves recorded at two distinct locations. To express noise correlations in the frequency domain, we define the
where
with
Eq. (3) defines time-averaged noise correlations. However, its numerical evaluation is computationally expensive, because it involves a double integral and requires multiple realizations of the noise sources. To decrease computational costs, we assume that noise sources are spatially uncorrelated, that is,
This approximation simplifies eq. (3) to
where
also known as the generating wavefield. Thus, eq. (5) enables the forward modelling of noise correlations between a reference station
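As a rough illustration of forward modelling with spatially uncorrelated sources, the sketch below evaluates a frequency-domain correlation as a discrete sum over source locations. It does not reproduce eq. (5), which is not shown here. It assumes the textbook form in which the correlation is the source-weighted product of one Green's function with the complex conjugate of the other, a homogeneous medium, a far-field 2-D Green's function shape, and a source term separated into spatial weights and a frequency-dependent PSD. All function and parameter names are hypothetical.

```python
import numpy as np


def greens_2d_farfield(r, omega, c):
    """Far-field shape of a 2-D Green's function in a homogeneous medium.

    Assumption: phase exp(-i*omega*r/c) and 1/sqrt(r) geometrical spreading;
    constant factors and the sign convention are illustrative only.
    """
    r = np.maximum(r, 1e-9)  # avoid division by zero at the source position
    return np.exp(-1j * omega * r / c) / np.sqrt(r)


def noise_correlation_spectrum(x1, x2, src_xy, src_weight, psd, omega, c):
    """Correlation spectrum between receivers x1 and x2.

    src_xy     : (n_src, 2) source coordinates in metres
    src_weight : (n_src,) non-negative spatial source strength
    psd        : (n_freq,) source power spectral density
    omega      : (n_freq,) angular frequencies in rad/s
    c          : wave speed in m/s
    """
    r1 = np.linalg.norm(src_xy - x1, axis=1)[:, None]  # (n_src, 1)
    r2 = np.linalg.norm(src_xy - x2, axis=1)[:, None]
    g1 = greens_2d_farfield(r1, omega[None, :], c)      # (n_src, n_freq)
    g2 = greens_2d_farfield(r2, omega[None, :], c)
    # Uncorrelated sources: a single sum over source positions replaces
    # the double integral over pairs of source positions.
    return psd * np.sum(g1 * np.conj(g2) * src_weight[:, None], axis=0)
```

With this form, exchanging the two receivers yields the complex conjugate spectrum, which corresponds to time reversal of the correlation. A non-uniform `src_weight` therefore produces the amplitude asymmetry between the two correlation branches.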
2.2 Empirical Green’s functions
Eq. (3) highlights that noise correlations are not generally Green’s functions. Nonetheless, multiple representation theorems prove that these two signals are proportional if the propagation modes of the noise wavefield are equally strong and uncorrelated (Nakata et al. 2019). A noise wavefield with such characteristics can emerge through multiple scattering (Snieder 2004) or uniformly distributed noise sources (Roux et al. 2005). However, these conditions do not generally occur in nature, demanding processing techniques such as temporal normalization and spectral whitening to increase the noise wavefield isotropy (Bensen et al. 2007). After the noise recordings are processed, correlated and stacked, the following relation is often used to approximate empirical Green’s functions in the time domain (e.g. Snieder 2004; Roux et al. 2005):
where
3 SYNTHETIC DATA SETS
We compute two synthetic data sets that imitate noise correlations acquired in a real study during two time periods with different distributions of noise sources. Since regional applications are often of interest, we set up a regional-scale experiment. Among the different types of ambient noise, ocean noise is one of the most energetic, making it a frequent choice to conduct tomography. Therefore, we model correlations of intermediate-period ocean noise (11–50 s). To obtain practically relevant results, we locate our synthetic experiment in Southern California, USA, a region frequently studied because of its geological complexity, seismic activity and dense population that leads to strong scientific interest. Below, we describe the computation of the data sets in detail.
3.1 Experimental setup
To avoid the high computational cost of 3-D simulations, we conduct 2-D elastic wave simulations in the time domain using the spectral-element solver SPECFEM2D (Komatitsch & Vilotte 1998; Tape et al. 2007). The propagated waves are analogous to vertical-component, single-mode Rayleigh waves travelling on a membrane at a fixed depth (Tanimoto 1990; Peter et al. 2007). Therefore, we refer to them as surface waves in the rest of this paper. Following eq. (5), we simulate vertical-component noise correlations between 1653 unique seismic-receiver pairs using 58 receivers situated in Southern California, USA. This region is ideal for evaluating the impact of the noise source distribution on the estimation of Earth structure, as multiple noise tomography studies yield subsurface models with different features (Schmandt et al. 2015; Shen & Ritzwoller 2016; Berg et al. 2018). As illustrated by the magenta dots in Fig. 1(a), the receivers follow the geometry of the NASA MT-TA array (Adam et al. 2019) since its spatial coverage is ideal for tomography.

Figure 1. (a) Target structure model. The model consists of the superposition of the CSEM (Fichtner et al. 2018) and random anomalies located in a radius of
Unlike most regional studies, we define a computational domain larger than the target area to use a large-scale model of ocean noise sources. Since seismic attenuation decreases the energy emitted by far-away sources, we limit the domain to a radius of
3.2 Earth structure and noise source models
Multiple crustal structure models of Southern California are available in the Unified Community Velocity Model framework (Small et al. 2017). However, our target Earth structure model must include the seismic wave speeds of other regions due to the extension of our computational boundaries. Therefore, we use a horizontal slice of shear wave speed from the Collaborative Seismic Earth Model (CSEM; Fichtner et al. 2018). We take this slice at a depth of 24 km, which is within the range of depths sampled by Rayleigh waves with a period equal to the dominant period of our noise source model (T = 16 s; see the next paragraph). The CSEM is a global multiscale model, meaning that the spatial resolution changes depending on the data coverage. To mimic the subsurface structure not resolved in the CSEM, we add random perturbations of shear wave speed around Southern California. We define the perturbations using a von Karman correlation function, a model commonly used to represent Earth structure heterogeneities (Imperatori & Mai 2013; Sato 2019; Vyas et al. 2021). Thereby, the wave speed perturbations are characterized by the correlation length, standard deviation and Hurst exponent. Based on the shortest wavelength propagated in our simulations, we use a correlation length of 40 km. Moreover, we use a standard deviation of
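One common way to generate such perturbations is to filter white noise with the square root of a von Karman power spectrum. The sketch below follows that approach in 2-D, assuming a spectrum proportional to (1 + k²a²)^-(H+1), where a is the correlation length and H the Hurst exponent. Only the 40 km correlation length comes from the text. The grid size, grid spacing, Hurst exponent and standard deviation in the usage example are placeholders, and the function name is hypothetical.

```python
import numpy as np


def von_karman_field_2d(nx, ny, dx, corr_len, hurst, std, seed=0):
    """Random 2-D field with an (assumed) von Karman power spectrum.

    nx, ny   : grid size
    dx       : grid spacing (same units as corr_len)
    corr_len : correlation length a
    hurst    : Hurst exponent H
    std      : target standard deviation of the returned field
    """
    rng = np.random.default_rng(seed)
    kx = 2.0 * np.pi * np.fft.fftfreq(nx, d=dx)
    ky = 2.0 * np.pi * np.fft.fftfreq(ny, d=dx)
    kxx, kyy = np.meshgrid(kx, ky, indexing="ij")
    k2 = kxx**2 + kyy**2
    # 2-D von Karman spectrum up to a constant; the constant is irrelevant
    # because the field is rescaled to the target standard deviation below.
    psd = (1.0 + k2 * corr_len**2) ** (-(hurst + 1.0))
    white = rng.standard_normal((nx, ny))
    field = np.fft.ifft2(np.fft.fft2(white) * np.sqrt(psd)).real
    field -= field.mean()
    field *= std / field.std()
    return field


# Usage with placeholder values (only corr_len = 40 km is from the text):
# relative perturbation of shear wave speed on a 5 km grid.
perturbation = von_karman_field_2d(
    nx=256, ny=256, dx=5.0e3, corr_len=40.0e3, hurst=0.5, std=0.05
)
```

The resulting relative perturbation can then be multiplied by the background shear wave speed and added to it inside the region of interest.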
We illustrate the noise source models in Fig. 2. For simplicity, we only consider noise sources acting on the vertical component and separate their spatial and frequency dependence, that is,

Figure 2. (a) Noise-source PSD
Regarding the location of noise sources, we define two noise source distributions
Using the setup described above, we compute two data sets of noise correlations, A and B, where the only difference between them is the noise source distribution. Estimating the target Earth structure model from both data sets will show how much the inferred models change due to temporal variations in the noise source.
4 INVERSIONS FOR EARTH STRUCTURE
We consider data sets A and B as observations to estimate the target subsurface model using three inversions based on the cross-correlation traveltime misfit function (Luo & Schuster 1991). The inversions differ in how they treat the noise source. Inversion I assumes uniform noise sources and thus interprets noise correlations as empirical Green’s functions. In contrast, inversion II considers the distribution of noise sources and uses noise correlations as self-consistent signals. Lastly, inversion III replicates inversion II on a smaller mesh, ignoring noise sources outside the red square in Figs 2(c) and (d). We perform the inversions using a modified version of the open-source package SeisFlows (Chow et al. 2020). In all inversions, the initial Earth structure model is homogeneous, with a shear wave speed of 3894 m s⁻¹. This value equals the average of the target model inside the array limits without the random variations of shear wave speed. Also, the noise source (power spectrum and distribution) is assumed to be known to simplify the comparison of the results. In the following, we detail the inversions and the corresponding outcomes. For conciseness, we name the inversions and their results after the inversion number and data set letter, for example, I-A.
4.1 Inversion I: uniform noise source distribution
Assuming a uniform noise source distribution, we treat noise correlations as approximations of Green’s functions. Following eq. (7), we obtain empirical Green’s functions by averaging the branches of the observed noise correlations and taking the negative time derivative of the result. Then, we whiten the spectrum of the empirical Green’s functions by dividing it by the noise-source power spectrum. This operation deconvolves the signature of the noise-source autocorrelation function but does not remove the imprint of the noise source distribution. As for the synthetic data, we model Green’s functions using vertical point sources with a Gaussian source time function of 10 s half-duration (Gao & Shen 2015; Wang et al. 2021). The half-duration determines the lowest period for which the spectral amplitude of the source time function is non-zero. Thus, we define it based on the minimum period resolved by our mesh. Because the involved wave simulations do not require a noise source model, there is no need to use the entire mesh. Thus, we conduct the inversion in the blue region of the mesh shown in Fig. 1(b).
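The processing steps described above (branch averaging, negative time derivative, division by the source power spectrum) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that the correlation is sampled on a symmetric lag axis with zero lag at the centre sample, and it adds a water level to stabilize the spectral division, which is not mentioned in the text. Function names and the `psd_func` callback are hypothetical.

```python
import numpy as np


def empirical_greens_function(corr, dt):
    """Average both correlation branches and take the negative time derivative.

    corr : 1-D array on a symmetric lag axis (odd length, zero lag in the middle)
    dt   : sampling interval in seconds
    Returns the result on the non-negative lag axis.
    """
    n = corr.size
    if n % 2 == 0:
        raise ValueError("expected an odd number of samples (symmetric lags)")
    mid = n // 2
    causal = corr[mid:]        # lags 0 ... +T
    acausal = corr[mid::-1]    # lags 0 ... -T, flipped onto positive lags
    symmetric = 0.5 * (causal + acausal)
    return -np.gradient(symmetric, dt)


def whiten_by_source_psd(signal, dt, psd_func, water_level=1e-3):
    """Divide the spectrum of `signal` by the noise-source power spectrum.

    psd_func    : callable returning the PSD at frequencies in Hz (assumed known)
    water_level : fraction of the PSD maximum used as a lower bound (assumption)
    """
    spec = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(signal.size, d=dt)
    psd = psd_func(freqs)
    denom = np.maximum(psd, water_level * psd.max())
    return np.fft.irfft(spec / denom, n=signal.size)
```

Note that the division only removes the frequency signature of the sources. Any asymmetry or traveltime bias caused by their spatial distribution remains in the averaged waveform.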
We fit the surface wave part of the observed waveforms using the cross-correlation traveltime misfit function (Luo & Schuster 1991). This function measures traveltime differences between observed and synthetic waveforms. As illustrated in Fig. 3, measurements are made on windows of 100 s duration centred at the maximum value of the synthetic-waveform envelope (Zhang et al. 2018). To guarantee the assimilation of meaningful information, a minimum correlation coefficient of CC = 0.7 and a traveltime difference between
where N indicates the number of measurement windows and
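A single cross-correlation traveltime measurement of the kind described above can be sketched as follows. The 100 s window centred on the maximum of the synthetic envelope and the minimum correlation coefficient of 0.7 are taken from the text. The bound on the traveltime difference is left as an optional parameter because its value is not given here. The sign convention (positive when the observed waveform arrives later) and all function names are assumptions of this sketch.

```python
import numpy as np


def envelope(x):
    """Envelope from the analytic signal, computed with an FFT."""
    n = x.size
    spec = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spec * h))


def measure_traveltime(obs, syn, dt, win_len=100.0, cc_min=0.7, max_shift=None):
    """Return (traveltime shift in s, correlation coefficient, accepted flag)."""
    half = int(round(0.5 * win_len / dt))
    centre = int(np.argmax(envelope(syn)))
    i0 = max(centre - half, 0)
    i1 = min(centre + half + 1, syn.size)
    o, s = obs[i0:i1], syn[i0:i1]
    # Full cross-correlation; index (s.size - 1) corresponds to zero lag.
    xc = np.correlate(o, s, mode="full")
    lag = int(np.argmax(xc)) - (s.size - 1)
    norm = np.linalg.norm(o) * np.linalg.norm(s)
    cc = float(xc.max() / norm) if norm > 0.0 else 0.0
    shift = lag * dt  # positive: observed arrival is later than synthetic
    accepted = bool(cc >= cc_min and (max_shift is None or abs(shift) <= max_shift))
    return shift, cc, accepted
```

For noise correlations, the same measurement would be applied separately to each branch, with the window centred on the corresponding branch of the synthetic waveform.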

Figure 3. Selection of measurement windows (grey rectangles) for (a) empirical Green’s functions and (b) noise correlations. The observations are coloured in black and the synthetic waveforms in blue. The involved stations are named in the upper-left corner and their locations are shown in Fig. 7(a). Abbreviations: CC, correlation coefficient;

Figure 4. Data misfit evolution of the inversions from data sets (a) A and (b) B. The inversions are stopped once the line search does not find a step length that decreases the data misfit after 10 trial steps.

Figure 5. (a) Zoom-in of the target structure shown in Fig. 1(a). (b)–(g) Estimated structure models. The magenta dots indicate the position of the receivers. A mask is applied to cover regions with low data coverage. The letters A–G and dashed black lines indicate areas of interest as stated in the main text.
As Fig. 5(b) shows, the model estimated from data set A (model I-A) recovers the large-scale structure of the target model. Nonetheless, in some areas, the model is smeared and small heterogeneities are not resolved. For instance, the structure inside area F is blurred, resulting in a fictitious lineament with a northwest strike. Also, the low-speed heterogeneity inside area D is not recovered. The model I-B in Fig. 5(c) presents similar features. However, the shear wave speed is overestimated in some areas such as B and G. To analyse the accuracy of the models, we show their root-mean-square error (hereafter model misfit) in Fig. 6. This figure indicates that the misfit of model I-B is 11 per cent larger than the misfit of model I-A. Remarkably, it also shows that during the estimation of both models, the model misfit increases after a given iteration even though the data misfit decreases (Fig. 4).

Figure 6. Model misfit evolution of the inversions from data sets (a) A and (b) B. The model misfit is defined as the root-mean-square error between the target and estimated models. It is computed using all GLL points inside the region marked by the mask in Fig. 5. The model misfit is normalized by the misfit of the initial model in order to facilitate its comparison.
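The normalized model misfit defined in this caption can be computed as in the short sketch below. It assumes that the target, estimated and initial models are stored as arrays of shear wave speed at the same points and that the mask is a boolean array selecting the well-covered region. The function name is hypothetical.

```python
import numpy as np


def normalized_model_misfit(estimated, target, initial, mask):
    """RMS error of `estimated` inside `mask`, divided by that of `initial`."""
    def rms_error(model):
        return np.sqrt(np.mean((model[mask] - target[mask]) ** 2))

    return rms_error(estimated) / rms_error(initial)
```

By construction, the initial model has a normalized misfit of 1, so values below 1 indicate an improvement over the starting model.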
Figs 7 and 8 illustrate the data fit evolution of the inversions from data sets A and B, respectively. As the histograms of traveltime measurements in Figs 7(b) and 8(b) show, model I-A reduces the standard deviation of the measurements 0.24 s more than model I-B, and fits 302 more measurement windows. An important observation is that these histograms are skewed to the left, meaning that most traveltime differences are positive (observed traveltimes are greater than synthetic traveltimes). Because the initial model is close to the average of the target model, we expect a similar number of positive and negative traveltime differences and hence symmetric histograms. The fact that the histograms are skewed therefore suggests that interpreting noise correlations as empirical Green’s functions results in a positive traveltime bias.

Figure 7. Data fit evolution of inversions I-A to III-A. (a) Map of selected receivers. The outer ring shows the azimuthal integration of noise source distribution A. (b)–(d) Histograms of traveltime measurements from the initial (blue) and estimated (red) models. (e)–(g) Comparison between observations (black) and waveforms from the initial (blue) and estimated (red) models. The waveforms are normalized by their maximum value. The grey rectangles indicate measurement windows. Abbreviations: N, measurement windows;

Figure 8. Data fit evolution of inversions I-B to III-B. (a) Map of selected receivers. The outer ring shows the azimuthal integration of noise source distribution B. (b)–(d) Histograms of traveltime measurements from the initial (blue) and estimated (red) models. (e)–(g) Comparison between observations (black) and waveforms from the initial (blue) and estimated (red) models. The waveforms are normalized by their maximum value. The grey rectangles indicate measurement windows. Abbreviations as indicated in Fig. 7.
4.2 Inversion II: true noise source distribution
Here, we account for the noise source distribution and use noise correlations as observables. We apply the same inversion parameters as in inversion I. The windowing criteria, gradient post-processing, optimization method and stopping criteria are the same. However, we conduct wave simulations on the complete mesh to include the noise source distributions shown in Fig. 2. Also, as illustrated in Fig. 3(b), we compute measurement windows in both branches of the noise correlations. Whenever the windows overlap, they are accepted if they extend for at least half their duration (50 s) on the corresponding branch. This avoids discarding measurements from receiver pairs with short inter-receiver distances. Moreover, if one branch does not contain a measurable arrival, that branch is ignored.
Fitting data set A while considering the noise source distribution yields similar results to those obtained under the assumption of uniform noise sources. As shown in Figs 5(b) and (d), both models I-A and II-A recover large-scale heterogeneities while failing to image small features. The largest difference between these models is that the fictitious lineament inside area F in model I-A disappears in model II-A. In contrast, the repercussions of assuming uniform noise sources are more notable in the inversion of data set B. While model I-B overestimates the target shear wave speed in some areas, model II-B does not. This is observed in Figs 5(c) and (e), specifically in area C and the northwest of area G. Moreover, Fig. 6(b) shows that the misfit of model II-B is 21 per cent lower than that of model I-B.
The data misfit curves in Fig. 4 show that models II-A and II-B fit the observed data to a similar degree. This can also be observed in the histograms of traveltime measurements in Figs 7(c) and 8(c). In contrast to inversions I-A and I-B, the number of negative and positive traveltime measurements is similar, resulting in symmetric histograms. This shows that treating noise correlations as self-consistent observables avoids source-related biases. Also, the number of measurement windows is higher for inversions II-A and II-B, indicating a better match between observed and synthetic waveforms when fitting the noise correlation functions.
4.3 Inversion III: partial noise source distribution
A large computational domain, like the one of inversion II, allows for the representation of a large-scale model of ocean noise sources. However, using a computational domain much larger than the study area increases the computational cost several times, especially at periods used in regional and local studies. Therefore, extending the computational boundaries far from the region of interest is impractical, especially when conducting 3-D wave simulations. On the other hand, a domain bounded around the target area does not permit the inclusion of distant noise sources, potentially affecting the estimated model. To better understand the consequences of ignoring the distribution of far-away noise sources, we repeat inversion II using the mesh of inversion I. The inversion setup remains the same, with the sole difference being that only noise sources inside the red square in Figs 2(c) and (d) are considered.
According to Fig. 5(f), model III-A recovers the target structure with an accuracy close to that of model II-A. This is further illustrated by the model misfit in Fig. 6(a). Nonetheless, model III-A contains an artificial low-speed heterogeneity in the east of area G. The consequences of ignoring distant noise sources are easier to observe in models estimated from data set B. Fig. 5(g) shows that model III-B overestimates the shear wave speed of the target structure in areas A and E. As a result, the misfit of model III-B is 9 per cent larger than the misfit of model III-A. Also, the structure estimated outside the array changes if distant noise sources are ignored. This can be observed by comparing Figs 5(e) and (g). However, the data coverage outside the array is insufficient to reliably estimate the structure in this area. Similar to inversions I-A and I-B, the model misfit of inversions III-A and III-B (Fig. 6) increases at some point while the data misfit keeps decreasing (Fig. 4).
Fig. 4 shows that the data misfit of models III-A and III-B is close to the data misfit of models II-A and II-B. However, the histograms of traveltime measurements in Figs 7(d) and 8(d) indicate that the number of measurement windows is lower for models III-A and III-B. As observed in Figs 7(g) and 8(g), ignoring distant noise sources affects the synthetic noise correlation waveforms, especially the amplitude asymmetry. This reduces the similarity between synthetic and observed waveforms, resulting in fewer observations that satisfy our requirements to make a traveltime measurement.
5 DISCUSSION
Based on the experiments in Section 4, we discuss how different approaches to account for the distribution of noise sources affect subsurface models estimated from noise correlations. Also, we comment on the challenges and benefits of considering the noise source distribution during the inversion of noise correlations and state the assumptions and limitations of our experiments.
5.1 The impact of the noise source distribution on subsurface models
Our experiments investigate changes in subsurface models inferred from ambient noise correlations depending on whether the noise source distribution is deemed uniform, fully represented, or partially considered. The panels in the left-hand column of Fig. 5 show that all three approaches yield similar models when data set A is inverted. Despite minor discrepancies, these models recover similar structural heterogeneities and have comparable model misfits, with differences between 4 and 8 per cent as indicated in Fig. 6(a). In contrast, the right-hand column of Fig. 5 shows that models estimated from data set B differ more. According to Fig. 6(b), model II-B has the lowest model misfit, followed by model III-B with a 15 per cent misfit increase and model I-B with a 21 per cent misfit increase. Because the only difference between data sets A and B is the noise source distribution, our results suggest that assuming a uniform noise source distribution or ignoring distant noise sources can lead to substantial source-dependent model errors. The magnitude of these errors depends on the heterogeneity of the noise source distribution. In our experiments, noise source distribution B is more heterogeneous than noise source distribution A. Thus, models inverted from data set B present higher errors than models inverted from data set A. Since the location of noise sources changes over time, model errors are also time-dependent. Hence, subsurface models of the same region may differ if estimated from data recorded during different time intervals without considering the noise source distribution. This has important implications for time-lapse studies, as differences between Earth structure models caused by temporal variations in the noise source distribution can be confused with changes in Earth structure properties.
Another important remark is that in inversions I and III, the model misfit increases even though the data misfit decreases. This suggests that the estimated models compensate for traveltime biases due to imperfections in the forward modelling, that is, the assumption of uniform noise sources in inversion I and the omission of far-away noise sources in inversion III. Thus, if the noise source distribution is deemed uniform or distant noise sources are ignored, there is a risk of overfitting the observed waveforms and introducing errors into the subsurface models. In inversion I-B, the data misfit is reduced to 0.68 s² (iteration 8) before the model misfit increases. Meanwhile, inversion I-A decreases the data misfit to 0.4 s² (iteration 16) before data overfitting starts. This indicates that the point at which data overfitting begins depends on the noise source distribution.
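The onset of data overfitting described above can be located programmatically in a synthetic test, where the target model is known and the model misfit can therefore be tracked per iteration. The following is a minimal sketch, not the procedure used in this study: the function name `overfitting_onset` and the misfit histories are hypothetical and serve only to illustrate the criterion (data misfit still decreasing while model misfit increases).

```python
def overfitting_onset(data_misfit, model_misfit):
    """Return the first iteration index at which the model misfit rises
    while the data misfit keeps falling, or None if that never happens."""
    if len(data_misfit) != len(model_misfit):
        raise ValueError("misfit histories must have equal length")
    for k in range(1, len(data_misfit)):
        data_improves = data_misfit[k] < data_misfit[k - 1]
        model_degrades = model_misfit[k] > model_misfit[k - 1]
        if data_improves and model_degrades:
            return k
    return None


# Made-up misfit histories, for illustration only (not results from the study).
data_hist = [2.0, 1.2, 0.9, 0.7, 0.6]
model_hist = [1.0, 0.8, 0.7, 0.75, 0.8]
onset = overfitting_onset(data_hist, model_hist)
```

With field data the model misfit is unavailable, so such a check is only possible in synthetic experiments like the ones presented here.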
5.2 Challenges and benefits of accounting for the noise source distribution
Inversion II illustrates that accounting for the noise source distribution reduces errors in the estimated Earth structure. Meanwhile, inversion III shows that the estimated subsurface structure can be biased if the computational domain is limited and distant noise sources are omitted. Hence, all noise sources located within the propagation distance of the ambient noise wavefield should be considered to decrease model errors. This distance is determined by several factors (e.g. wave frequency, geometrical spreading and anelastic attenuation) and is specific to each study. Unfortunately, ocean-generated seismic noise may travel continental distances, complicating the representation of distant noise sources acting in a local region. Inversion II addresses this task using a mesh several times larger than the study region. However, this strategy significantly increases computational costs, making it infeasible for real applications. The development of methods to account for noise sources located outside a limited spatial domain is a topic that requires further research. A possible solution to this problem is to propagate the wavefield generated by distant noise sources into the domain of interest using wavefield injection techniques (Monteiller et al. 2012; Pienkowska et al. 2020).
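To illustrate how frequency, geometrical spreading and anelastic attenuation jointly control the propagation distance, the sketch below assumes a simple surface wave amplitude decay, A(r) ∝ r^(-1/2) exp(-π f r / (Q c)), and searches for the distance at which the amplitude falls below a chosen fraction of its reference value. This decay law, the function names, the threshold and the values of Q and c are assumptions made for illustration; they are not taken from the experiments in this paper.

```python
import math


def relative_amplitude(r_km, freq_hz, q, c_km_s, r_ref_km=1.0):
    """Amplitude at distance r_km relative to the amplitude at r_ref_km.

    Assumes cylindrical (surface wave) spreading, r**-0.5, and anelastic
    decay exp(-pi * f * r / (Q * c)).
    """
    spreading = math.sqrt(r_ref_km / r_km)
    attenuation = math.exp(-math.pi * freq_hz * (r_km - r_ref_km) / (q * c_km_s))
    return spreading * attenuation


def propagation_distance(freq_hz, q, c_km_s, threshold=0.01,
                         r_ref_km=1.0, r_max_km=1.0e5):
    """Distance (km) at which the relative amplitude drops to `threshold`.

    Found by bisection; the amplitude decreases monotonically with distance.
    Returns r_max_km if the threshold is not reached within the search range.
    """
    if relative_amplitude(r_max_km, freq_hz, q, c_km_s, r_ref_km) > threshold:
        return r_max_km
    lo, hi = r_ref_km, r_max_km
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if relative_amplitude(mid, freq_hz, q, c_km_s, r_ref_km) > threshold:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical values: Q = 200 and c = 3 km/s are placeholders, not from the study.
d_low = propagation_distance(freq_hz=0.05, q=200.0, c_km_s=3.0)
d_high = propagation_distance(freq_hz=0.2, q=200.0, c_km_s=3.0)
```

Under these assumptions, lower frequencies reach larger distances than higher frequencies, so the region over which noise sources must be represented grows as the frequency band of interest decreases.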
Besides decreasing biases in subsurface models, accounting for the noise source distribution permits the assimilation of more information from noise correlations. Using a noise source model facilitates the simulation of noise correlation wavefields. Hence, it allows forward modelling and inversion of arrivals specific to noise correlations, for example, spurious arrivals. Additionally, if the noise source distribution is considered and the data are carefully processed, the amplitude and traveltime of the noise correlations can be assimilated using waveform-based measurements like the
5.3 Assumptions and limitations of the conducted experiments
The difficulty of the inversions is reduced by assuming noise-free data and complete knowledge of the noise source distribution and spectrum. We consider that using noise-free data is a minor simplification, as data noise can be accounted for in the definition of the misfit functional (e.g. by incorporating a data covariance matrix). In contrast, assuming that the noise source is known has considerable implications. Studies using field data must estimate the noise-source properties and distribution to invert noise correlations, which introduces additional errors to the estimated structure model if not done correctly. Moreover, the imprint of the noise-source time function, which we remove during spectral whitening, decreases the similarity between empirical and synthetic Green’s functions and, thus, the number of measured windows. Therefore, our synthetic experiments represent a best-case scenario for the inversion of noise correlations as observables and as empirical Green’s functions.
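As an illustration of how data noise could enter the misfit functional through a data covariance matrix, the following sketch assumes a least-squares traveltime misfit and uncorrelated data errors, so that the covariance matrix is diagonal with entries sigma_i². The function name, the argument names and the numerical values are hypothetical and do not reproduce the misfit implementation used in our experiments.

```python
def weighted_traveltime_misfit(t_obs, t_syn, sigma):
    """Least-squares traveltime misfit weighted by a diagonal data covariance.

    Computes 0.5 * sum(((t_syn - t_obs) / sigma)**2), where sigma holds the
    assumed standard deviation (in seconds) of each traveltime measurement.
    """
    if not (len(t_obs) == len(t_syn) == len(sigma)):
        raise ValueError("inputs must have equal length")
    total = 0.0
    for obs, syn, s in zip(t_obs, t_syn, sigma):
        if s <= 0.0:
            raise ValueError("standard deviations must be positive")
        total += ((syn - obs) / s) ** 2
    return 0.5 * total


# Hypothetical traveltimes (s) and uncertainties (s), for illustration only.
t_obs = [10.0, 12.0]
t_syn = [10.5, 11.0]
sigma = [0.5, 1.0]
misfit = weighted_traveltime_misfit(t_obs, t_syn, sigma)
```

Measurements with larger uncertainties contribute less to the misfit, which reduces the tendency of the inversion to fit noisy observations. Correlated data errors would require the full inverse covariance matrix instead of the diagonal weights used here.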
Due to computational constraints, we conduct wave simulations in 2-D with shear wave speed as the only model parameter. As a result, 3-D propagation effects, anisotropy and anelasticity are disregarded. Additionally, we ignore the spatial correlation of ambient noise sources. Studies indicate that spatial and temporal correlation of noise sources may hinder the observation of higher order surface wave modes (Kimman & Trampert 2010) and affect the phase of noise correlations (Ayala-Garcia et al. 2021). However, since the wavelengths of ambient noise generated by ocean waves are usually larger than the ocean waves, we do not expect the correlation of ambient noise sources to significantly change our results. Because of the aforementioned simplifications, our experiments serve as a guideline rather than an exact replication of reality. Also, we acknowledge that the specific setup of our synthetic experiments does not capture the wide range of situations found in real studies. None the less, the physics of the problem remains general.
6 CONCLUSIONS
Using numerical experiments, we investigated how different ways to account for the noise source distribution affect subsurface models estimated from noise correlation waveforms. Our results show that models inferred under the assumption of uniform noise sources have a larger model misfit than models estimated considering the noise source distribution. Thus, assuming uniform noise sources and interpreting noise correlations as empirical Green’s functions introduces errors in subsurface models. The magnitude of these errors depends on the heterogeneity of the noise source distribution. In our experiments, source-related errors increase the model misfit by between 5 and 21 per cent. Since the noise source distribution changes over time, this implies that source-related errors are also time-dependent. To mitigate these errors, it is necessary to consider the noise source distribution during the inversion of noise correlations. Depending on the propagation distance of the ambient noise wavefield, noise sources located away from the area of interest must be considered. Otherwise, the estimated model can contain biases, which, in our case, increased the model misfit by up to 15 per cent.
ACKNOWLEDGEMENTS
The research presented in this article is supported by King Abdullah University of Science and Technology (KAUST) in Thuwal, Saudi Arabia, grant BAS/1/1339-01-01. Simulations have been carried out on the Ibex cluster of the KAUST Supercomputing Laboratory (KSL). All authors thank the editor Carl Tape for his work, Nori Nakata and an anonymous reviewer for their constructive comments, and the KSL staff for their support. EVC acknowledges the hospitality of the Seismic and Waves Physics group at ETH Zürich, where a big part of this work was conducted, and gratefully thanks Andrea Zunino and Scott Keating for fruitful discussions.
DATA AVAILABILITY
Scripts used in this work are available at https://github.com/evcano/cano_etal_2024_gji.