Huachen Yang, Pan Li, Fei Ma, Jianzhong Zhang, Building near-surface velocity models by integrating the first-arrival traveltime tomography and supervised deep learning, Geophysical Journal International, Volume 235, Issue 1, October 2023, Pages 326–341, https://doi.org/10.1093/gji/ggad223
SUMMARY
Accurate near-surface velocity models are necessary for land seismic imaging. First-arrival traveltime tomography (FTT), which is routinely used for estimating near-surface velocity models, may fail in geologically complex areas. Supervised deep learning (SDL) is capable of building accurate velocity models, but it relies on tens of thousands of training pairs of velocity models and shot gathers. Generating and training on such data sets takes a great deal of time and memory, which may be unaffordable for practical applications. We propose integrating the FTT and SDL to build near-surface velocity models. During the neural network training, the FTT-inverted models rather than the original seismic data are used as the network inputs and the corresponding true models are the outputs. The FTT-inverted and true models are the same physical quantity and have the same dimensions, so their relationship is less non-linear than that between shot gathers and true models. Thus, the neural network of the proposed method can be trained well using only a small number of training samples, dramatically reducing the time and memory costs. Numerical tests demonstrate the feasibility and effectiveness of the proposed method. We applied the proposed method to a land data set acquired in mountainous areas in the west of China and obtained satisfactory near-surface velocity models and stacking images.
1 INTRODUCTION
In land seismic exploration, accurate near-surface velocity models are essential for static correction and pre-stack depth-domain migration (Adamczyk et al. 2014; Jiang & Zhang 2017; Law & Trad 2018; Shao et al. 2022; Yang et al. 2022). First-arrival traveltime tomography (FTT) is a robust tool for near-surface velocity estimation that minimizes the misfit between calculated and observed first-arrival traveltimes (Senet et al. 2001; Taillandier et al. 2009; Zhang et al. 2014; Yang et al. 2018). We applied the FTT to a field seismic data set acquired in mountainous areas in the west of China (Yang et al. 2021). The static correction problem was well solved for the seismic data recorded at the foot of the mountains but remained unsolved for the data recorded on the mountains. In the mountains, the velocity varies strongly in the lateral direction and the topography is rugged, preventing us from obtaining an accurate near-surface velocity model using the FTT. To further address the static correction problem of this field data set, new methods are needed for building more accurate near-surface velocity models.
In recent years, building velocity models via supervised deep learning (SDL) has attracted increasing attention (Araya-Polo et al. 2018; Guo et al. 2021; Li et al. 2021a; Luo et al. 2022; Wamriew et al. 2022). Unlike the FTT, which is based on the laws of physics, SDL methods are based on big-data training (Rouet-Leduc et al. 2020; Barfod et al. 2021; Zhang et al. 2021b; Vantassel et al. 2022; Muller et al. 2023). During the training stage, the network establishes a non-linear mapping from the seismic data to the corresponding velocity models. During the prediction stage, the trained network is used to estimate the velocity model from the input seismic data. Yang & Ma (2019) proposed using a fully convolutional neural network (CNN) to establish the non-linear projection from multishot seismic data to the corresponding velocity models. They verified the feasibility of SDL methods by comparing the velocity models estimated by deep learning with those estimated by full-waveform inversion (FWI). Zhang et al. (2021a) proposed a two-step adjoint-driven deep-learning FWI approach, which used the SDL to estimate the low- and high-wavenumber velocity models independently from the given initial models and common-source gradients. A significant innovation of this method is that both the network inputs and outputs are depth-domain physical quantities. Subsequently, Zhang & Gao (2022) proposed using the shot-domain common imaging gathers as network inputs to avoid the loss of seismic information caused by the adjoint operators. Similarly, Geng et al. (2022), Wu et al. (2022) and Muller et al. (2023) proposed predicting the true velocity models from the smooth starting velocity models and migration images via SDL methods. However, applying the trained neural networks to field land data sets may be difficult because of their limited generalization ability and their sensitivity to the accuracy of the starting models and to seismic noise (Wang & Ma 2020; Dong et al. 2022; Qian et al. 2022).
In order to improve the generalization ability of neural networks, researchers try to construct massive training data sets with different characteristics (Araya-Polo et al. 2019; Dong et al. 2022; Qian et al. 2022). Araya-Polo et al. (2019) proposed a deep learning-driven velocity model building method to generate abundant training data sets. In their method, generative adversarial networks (Goodfellow et al. 2014) are used to learn a geologic representation from a finite number of model examples built beforehand, and then a large number of true models are sampled from the learned distribution. Wang & Ma (2020) modified massive natural images collected from an online repository as training labels. Liu et al. (2021) proposed a multi-step velocity model building method based on some simple assumptions for constructing extensive dense-layer/fault/salt body models without much human effort.
For 2-D and 3-D surveys of realistic size, generating large quantities of model-gather pairs for training neural networks is still challenging (Simon et al. 2021). To deal with this problem, transfer learning algorithms were proposed, in which the network was first trained on a data set similar to the target one and then used as the starting solution for the target data set (Hu et al. 2021; Wang et al. 2021; Jin et al. 2022). Li et al. (2022) trained the networks first on specific data sets in which the seismic data and velocity models have the same blurring levels and then on the extensive data sets. By doing so, acceptable results can be obtained using a small number of model-gather training pairs. An alternative strategy is to establish the projection from seismic data to 1-D vertical velocity curves via neural networks and then construct 2-D or 3-D velocity models based on these predicted 1-D curves. Kazei et al. (2020) trained a CNN to map CMP gathers to 1-D vertical velocity curves, simplifying the training phase. Similarly, Fabien-Ouellet & Sarkar (2020) trained a deep recurrent neural network (RNN) to map CMP gathers to 1-D root-mean-square and interval velocity models in time. They applied the trained RNN to a real 2-D marine survey and demonstrated that good performance could be achieved on real data even if the training was based on synthetic data sets.
The studies above have shown that deep-learning methods have great potential and good application prospects in velocity model building. However, the neural networks established in these studies are purely mathematical relationships that ignore the physical laws linking the seismic data and the corresponding velocity models. This drawback may restrict the application of deep-learning methods to field seismic data, because neural networks trained with a limited number of training samples generalize poorly. Besides, when the velocity distribution of the actual underground medium is complex, the relationships between seismic data sets and velocity models are highly non-linear, which hinders the application of SDL to field data sets.
To the best of our knowledge, no published studies have built near-surface velocity models via SDL from seismic data sets acquired on a rugged observation surface. When the topography is undulating, the neural network needs to learn not only the subsurface velocity distributions but also the observation-surface positions. Consequently, an extremely large number of training samples is necessary for networks to establish the highly non-linear mapping from seismic data sets to velocity models with undulating topography.
To solve the static correction problem described above at an affordable cost in time and computer memory, we propose a joint inversion method that integrates the FTT and SDL to build a more accurate near-surface velocity model with a small number of training samples. First, the FTT is used to build a rough near-surface velocity model. Then the SDL is used to improve its accuracy. We applied the proposed method to synthetic data sets to verify its correctness and effectiveness, and then to field land seismic data to obtain better stacking images.
2 METHODOLOGY
2.1 Joint inversion
Most modern SDL methods aim to establish the mapping from seismic data to velocity models directly. The corresponding loss function of the neural network training is expressed as (Yang & Ma 2019)
where Θ is the set of network parameters, N is the total training sample number, vtru is the true velocity model, D is the seismic data set and Net(;) indicates the output of the network.
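For reference, a loss of this kind is commonly written as a mean squared misfit over the training set. The form below is only a sketch consistent with the symbol definitions above. The 1/N normalization, the squared L2 norm and the sample index i are assumptions introduced here for illustration, not a transcription of eq. (1).

```latex
L(\Theta) = \frac{1}{N} \sum_{i=1}^{N}
  \left\| v_{\mathrm{tru}}^{(i)} - \mathrm{Net}\!\left(D^{(i)}; \Theta\right) \right\|_2^2
```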
The mapping between seismic data sets and true velocity models is highly non-linear because they differ greatly in physical quantity, dimensions, variation range and so on. Consequently, numerous training samples are needed for training the neural networks, especially for geologically complex models with undulating topography (Araya-Polo et al. 2019; Wang & Ma 2020). Moreover, to the best of our knowledge, the number of training samples needed for a given field survey is unknown. As a result, it is uncertain whether the trained network can be successfully applied to field seismic data.
To ensure that a neural network trained with a small number of training samples can be applied to field seismic data sets successfully, we embed the FTT into SDL to make full use of the physical laws relating first-arrival traveltimes to near-surface velocity models. We replace the shot gathers used as network inputs in conventional SDL (Yang & Ma 2019) with FTT-inverted models, and use the corresponding true models as outputs. By doing so, both the inputs and outputs of the networks are velocity models with the same dimensions and variation ranges. They have the same topography and similar velocity structures. As a result, the non-linearity of the mapping that the network must learn decreases dramatically, mitigating the need for a large number of training samples (Wang & Ma 2020). Moreover, the predicted models can be compared with the FTT-inverted ones to judge their correctness, which is important for real surveys.
To establish the relationship between the FTT-inverted and true near-surface velocity models, the loss function of the neural networks training of the proposed joint inversion is set up as follows
where vtom is the FTT-inverted model, λ is the maximum value of the FTT-inverted model used for normalizing the velocity models. To solve the optimization problem indicated by eq. (2), back propagation and stochastic gradient-descent (SGD) algorithms are used (Wang & Ma 2020). The updating of the network parameters is expressed as (Yang & Ma 2019)
where k is the iteration number, η is the step length (learning rate) and ∇L is the gradient of the loss function with respect to the network parameters. Computing the gradient from all training samples at each iteration is not feasible with our GPU memory. Therefore, only a randomly selected subset of the samples is used for calculating the gradient at each iteration. In this paper, the batch size is set to 5.
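The normalization by λ and the mini-batch update can be illustrated with a minimal Python sketch. It does not reproduce the authors' network: a toy two-parameter model `a * x + b` stands in for the U-net so that the gradient can be written by hand, λ is assumed to be taken per sample, the loss is assumed to be a mean squared misfit, and the step length, grid size and synthetic "FTT-inverted" models are hypothetical. Only the batch size of 5 is taken from the text.

```python
import numpy as np

def normalise_pairs(v_tom, v_tru):
    """Divide each pair by lambda, the maximum of its FTT-inverted model
    (assumed here to be computed per sample)."""
    lam = v_tom.max(axis=(1, 2), keepdims=True)
    return v_tom / lam, v_tru / lam

def loss_and_grad(theta, x, y):
    """Mean squared misfit of the toy model a*x + b and its gradient."""
    a, b = theta
    r = a * x + b - y                      # residual: prediction minus label
    loss = np.mean(r ** 2)
    grad = np.array([2.0 * np.mean(r * x), 2.0 * np.mean(r)])
    return loss, grad

def sgd(theta, x, y, eta=0.3, batch_size=5, n_iter=300, seed=0):
    """theta_{k+1} = theta_k - eta * grad, with the gradient evaluated on a
    randomly selected mini-batch at every iteration."""
    rng = np.random.default_rng(seed)
    for _ in range(n_iter):
        idx = rng.choice(x.shape[0], size=batch_size, replace=False)
        _, grad = loss_and_grad(theta, x[idx], y[idx])
        theta = theta - eta * grad
    return theta

# Hypothetical data: 40 small "true" models and biased "FTT-inverted" copies.
rng = np.random.default_rng(1)
v_tru = rng.uniform(1.0, 4.0, size=(40, 8, 4))   # km/s
v_tom = 0.8 * v_tru + 0.3
x, y = normalise_pairs(v_tom, v_tru)

theta0 = np.zeros(2)
theta = sgd(theta0, x, y)
loss_before, _ = loss_and_grad(theta0, x, y)
loss_after, _ = loss_and_grad(theta, x, y)
print(loss_before, loss_after)
```

Because inputs and labels are divided by the same λ, both stay of order one, which is one reason a single step length can serve models with different velocity ranges.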
2.2 Architecture of the network
To establish the mapping from FTT-inverted near-surface velocity models to the true ones, we adopted and modified the U-net (Ronneberger et al. 2015; Li et al. 2021b; Gao et al. 2022) architecture as shown in Fig. 1. U-net architectures have been adopted successfully by several researchers for building velocity models (Yang & Ma 2019; Zhang et al. 2021a; Zhang & Gao 2022). The network consists of a contracting path (left-hand panel) and an expanding path (right-hand panel), which are used for capturing the features of the input FTT-inverted models and for precisely localizing these features with respect to the expected true models. The architecture of the network is an encoder-decoder structure based on max pooling and transposed convolution. Each convolution module in the encoder includes convolution and activation operations, in which the convolution kernel size is fixed at 3 × 3 and the stride is 1. The activation function is the rectified linear unit (ReLU). The kernel size of the max pooling is 2 × 2 and the stride is 2. The effective receptive field of the network increases from the first to the fourth layer. The numbers of channels in the left path are 64, 128, 256 and 512 as the network depth increases. The channel numbers in the right path follow the reverse order of those in the left path, as shown in Fig. 1. Skip layers are used to combine the global deep-feature maps in the left path with the local shallow-feature maps in the right path. The coefficient in the dropout layer is set to 0.5 to improve the robustness of the network. The network is built with the deep-learning toolboxes of MATLAB. All the training and testing processes are implemented on the same workstation (GPU: NVIDIA Quadro P620).
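The effect of the stated kernel sizes and strides on the feature maps can be checked with a short Python sketch. It assumes a 256 × 64 input (the size used later in the synthetic tests), 'same' padding for the 3 × 3 convolutions, one convolution per level and one pooling step between consecutive levels. None of these details is confirmed by the text, and the function name is hypothetical.

```python
def encoder_levels(height, width, channels=(64, 128, 256, 512), convs_per_level=1):
    """Track (channels, height, width, receptive_field) down the contracting path."""
    levels = []
    rf, jump = 1, 1   # receptive field and feature spacing, in input samples
    for i, c in enumerate(channels):
        if i > 0:
            # 2 x 2 max pooling with stride 2 halves each spatial dimension
            height, width = height // 2, width // 2
            rf += (2 - 1) * jump
            jump *= 2
        for _ in range(convs_per_level):
            # 3 x 3 convolution with stride 1; 'same' padding keeps the size
            rf += (3 - 1) * jump
        levels.append((c, height, width, rf))
    return levels

levels = encoder_levels(256, 64)
for c, h, w, rf in levels:
    print(f"{c:4d} channels, {h:3d} x {w:2d} feature map, receptive field {rf} samples")
```

Under these assumptions the feature map shrinks from 256 × 64 to 32 × 8 over the four levels while the receptive field grows, which is the behaviour described for the contracting path.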

2.3 Generating training samples
The key for SDL methods to achieve good results from practical seismic data lies in the design of training samples that fit the actual situation. If the designed training samples deviate too much from the actual situation, the neural network cannot be guaranteed to learn effective features, resulting in inconsistency between the network-predicted models and the true velocity distributions. For example, if all the designed training samples consist of horizontally layered velocity models (outputs) and their corresponding seismic records (inputs), the neural network cannot learn the characteristics of the undulating observation surface from the input seismic records. As a result, the trained neural network is difficult to apply to field seismic data observed on rugged topography. To fit the actual situation, the topographies of the designed true velocity models are set to be the same as the true ones. The geometry for generating first-arrival traveltimes using the true models is also set to be the same as the practical one.
The FTT is used to invert the observed first-arrival traveltimes, and a rough near-surface velocity model is obtained. According to the velocity structures and variation ranges of the FTT-inverted model, a small number of velocity models are constructed as label models (Liu et al. 2021), which are consistent with the actual topography. Using the observation system of the field seismic data, the first-arrival traveltimes of these velocity models are computed through forward modelling of the eikonal equation. Then the velocity models inverted from the simulated first-arrival traveltimes are obtained using the same FTT and parameters as in the inversion of the observed first-arrival traveltimes. The models inverted using the simulated data and the corresponding theoretical models are used as the network inputs and outputs, respectively. By doing so, the features learned by the neural network conform to the actual situation of the survey area, reducing the demand for a large number of training samples and improving the efficiency of velocity model building.
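The sample-generation loop described above can be sketched in Python as follows. The eikonal solver and the FTT are passed in as callables because their implementations are not given here. The two `toy_*` functions are deliberately crude stand-ins (vertical traveltime per column, and a column-average velocity) that exist only to make the sketch runnable; all names and the grid spacing are hypothetical.

```python
import numpy as np

def build_training_pairs(label_models, forward, invert):
    """For each label model: simulate traveltimes, invert them with the same
    tomography settings, and pair the inverted model (input) with the label
    (expected output)."""
    pairs = []
    for v_true in label_models:
        t_syn = forward(v_true)        # stand-in for eikonal forward modelling
        v_tom = invert(t_syn)          # stand-in for the FTT
        pairs.append((v_tom, v_true))
    return pairs

DZ = 0.01  # vertical cell size in km (hypothetical)

def toy_forward(v):
    """Vertical one-way traveltime through each column; v has shape (nz, nx)."""
    return np.sum(DZ / v, axis=0)

def toy_invert(t, nz=64):
    """Column-average velocity repeated over depth: a smooth, inexact model."""
    v_avg = nz * DZ / t
    return np.tile(v_avg, (nz, 1))

rng = np.random.default_rng(0)
labels = [rng.uniform(1.0, 4.0, size=(64, 256)) for _ in range(3)]
pairs = build_training_pairs(labels, toy_forward, toy_invert)
print(len(pairs), pairs[0][0].shape, pairs[0][1].shape)
```

The point of the structure is that network inputs and outputs live on the same grid, which is what allows the inverted and label models to be compared cell by cell.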
2.4 Velocity model building steps
The flow chart of the proposed method is shown in Fig. 2.

3 TESTS ON SYNTHETIC DATA
In this section, we use synthetic data to verify the effectiveness of the proposed method and show its superiority by comparing it with conventional methods. We first describe how the training samples are generated. Secondly, we give the details of training and testing the neural networks. Thirdly, we show the influence of the number of training samples on the accuracy of the network-predicted models. Finally, we test the generalization ability of the trained neural networks.
3.1 Generating training samples
The training samples of the proposed method are true (network expected outputs) and FTT-inverted (network inputs) velocity model pairs. We build true models at first, then compute the first-arrival traveltimes via ray tracing methods using the constructed true models (Li et al. 2019), and finally obtain the FTT-inverted models from the calculated traveltimes. In total, 300 training samples are constructed for numerical tests.
The true models are generated based on the method proposed by Liu et al. (2021). Every true model is composed of a weathering layer and a high-velocity layer. The top and bottom interfaces of the weathering layers of all true models are determined by
where ztop and zbot are the depths of the top and bottom interfaces of the weathering layers, respectively, x is the horizontal distance, xmax is the maximum value of the horizontal distance, and α1, α2 and α3 are random numbers between 0 and 1 that vary from one true model to another. For all true models, xmax is set to 6.4 km.
The top interfaces of the weathering layers of all true models are the same since the actual observation surface of the survey line is known and fixed. The bottom interfaces of the weathering layers vary around a reference interface, which is taken here to be a horizontal interface at a constant depth of 400 m.
The velocities of the weathering layers of all true models are given by
where vw is the weathering layer velocity, zmax is the maximum depth of the true models, α4, α5, α6 and α7 are random numbers between 0 and 1 varying with different true models. For all models, zmax is 0.7 km. The weathering layers are divided into two parts by the interface,
The velocities of the top part increase with depth faster than those of the bottom part. On the whole, the velocities of the weathering layers increase with depth and vary sinusoidally in the horizontal direction.
After determining the weathering layers, the velocities of high-velocity layers are set to constant, expressed as
where α8 is a random number between 0 and 1 varying with different true models.
Using eqs (4)–(8), all true models are generated and they are unique. The left-hand panels in Fig. 3 show five representative true models. They have the same topographies but different velocity variations.

Figure 3. Five representative training samples. The left-hand (output) and right-hand (input) panels are true and FTT-inverted near-surface velocity models, respectively.
Then a fixed-spread land acquisition system is used for computing the first-arrival traveltimes via the ray tracing method (Li et al. 2019). The acquisition system is composed of 64 sources and 256 receivers evenly deployed on the observation surface. All true models are discretized with 25 × 10 m2 rectangular cells, giving grids of 256 × 64 cells.
FTT-inverted models are obtained from a 4 km s−1 constant initial model after 30 iterations. The right-hand panels in Fig. 3 show five FTT-inverted models corresponding to the true models shown in the left-hand panels. As Fig. 3 shows, the FTT-inverted models differ clearly from the corresponding true ones even though the FTT-loss functions converge to very small values after 30 iterations, as shown in Fig. 4(a). The FTT-loss function is expressed as (Jiang & Zhang 2017; Yang et al. 2021)
where E is the loss function value, v is the near-surface velocity model, M is the number of first-arrival traveltimes, T is the first-arrival traveltime and superscript * indicates the observed first-arrival traveltimes.
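A least-squares traveltime misfit consistent with these definitions can be written as below. This is a sketch only: the squared-difference form follows from the description of the FTT as minimizing the misfit between calculated and observed traveltimes, whereas the 1/(2M) prefactor and the index i are assumptions and may differ from the normalization used in eq. (9).

```latex
E(v) = \frac{1}{2M} \sum_{i=1}^{M} \left[ T_i(v) - T_i^{*} \right]^2
```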

Figure 4. The normalized loss functions of the FTT (a) and SDL using FTT-inverted models (b) and first-arrival traveltimes (c) as inputs, respectively.
3.2 Training and testing neural networks
The FTT-inverted and corresponding true models are used as the network inputs and expected outputs, respectively. The dimensions of the inputs and outputs are 256 × 64. The numbers of samples for network training, validation and testing are 210, 60 and 30, respectively. During the training, the batch size is set to 5 and the maximum number of epochs to 30, with 42 iterations per epoch. The network validation is implemented every eight iterations. The training process is stopped when the maximum iteration number is reached or when the last five validation loss values are all larger than the smallest validation loss obtained so far. The parameters of the networks are updated using the Adam algorithm (Kingma & Ba 2014).
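The iteration counts and the termination rule can be made concrete with a small Python sketch. The reading of the stopping rule used here (the five most recent validation losses all exceed the best earlier one) is an interpretation of the sentence above, and the function names are hypothetical.

```python
def iterations_per_epoch(n_train, batch_size):
    """Number of mini-batches needed to pass once over the training set."""
    return n_train // batch_size

def should_stop(val_losses, iteration, max_iterations, patience=5):
    """Stop at the iteration limit, or when the last `patience` validation
    losses are all larger than the smallest loss recorded before them."""
    if iteration >= max_iterations:
        return True
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    return min(val_losses[-patience:]) > best_before

per_epoch = iterations_per_epoch(210, 5)     # 210 samples / batch of 5
max_iterations = 30 * per_epoch              # 30 epochs
print(per_epoch, max_iterations)
```

With 210 training samples and a batch size of 5 this gives 42 iterations per epoch and at most 1260 iterations, so a run of 990 iterations corresponds to about 24 epochs.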
After 990 iterations, about 24 epochs, the training process meets the termination conditions, costing 904 s. The mean-squared errors (MSEs) of the network training and validation between the predicted and true velocity models are indicated by the black solid line and red circles in Fig. 4(b). The training MSE decreases sharply during the first epoch and then declines gradually, with fluctuations, to about 2.86 per cent of the maximum MSE.
After training the network, the 30 selected testing samples are used to examine its performance. Fig. 5 shows three representative testing samples. The first, second and third rows in Fig. 5 are the true, FTT-inverted and network-predicted models. Compared with the FTT-inverted models, the predicted models are much closer to the true models. The accuracy and resolution of the predicted models in both the lateral and vertical directions are dramatically improved. This test shows that the proposed neural networks are able to learn the error distributions of the FTT-inverted models from the training samples and act as an error-correcting operator during prediction. Meanwhile, the trained networks act as a resolution-improving operator, since the predicted models are more accurate than the FTT-inverted ones.

Figure 5. Three representative testing samples with 210 training samples. The first and second rows are true and FTT-inverted models, respectively. The third and fourth rows are the network predicted models using FTT-inverted models and first-arrival traveltimes as network inputs, respectively.
To reveal the influence of different network inputs on the network predictions, we use the first-arrival traveltimes computed with the true models as network inputs to train the proposed network. The first-arrival traveltimes are arranged by shot and receiver horizontal distance as shown in Fig. 6. They are intentionally discretized to the same dimensions as the true models, namely, 256 × 64.

Figure 6. First-arrival traveltimes corresponding to the true models shown in Fig. 3.
With the same training parameters as those used for the FTT-inverted inputs, the training process using first-arrival traveltimes as inputs meets the termination conditions after 1170 iterations, about 28 epochs, costing 1066 s. The mean-squared errors (MSEs) of the network training and validation between the predicted and true velocity models are indicated by the black solid line and red circles in Fig. 4(c). The validation MSEs are higher than the training MSEs after the initial epochs, indicating that the trained network is overfitted. By contrast, no overfitting is visible in Fig. 4(b) for the network trained using the FTT-inverted models as inputs. On the whole, the MSEs of the velocity models predicted from first-arrival traveltimes are larger than those from FTT-inverted models. The final MSE of the network trained using first-arrival traveltimes is about 5.74 per cent of the maximum MSE, about twice that obtained using FTT-inverted models.
After training the neural network using the first-arrival traveltimes as inputs, the same testing samples are used to examine its performance. The fourth row in Fig. 5 shows three representative testing results. Fig. 5 clearly shows that the models predicted from first-arrival traveltimes still differ from the true models and are worse than those predicted from FTT-inverted models.
To quantitatively analyse the accuracy of the velocity models predicted from different kinds of data sets, the histograms of the velocity errors of the 30 testing samples are shown in Fig. 7. Figs 7(a)–(c) correspond to the models obtained by FTT and by SDL using FTT-inverted models and first-arrival traveltimes as inputs, respectively. The velocity errors of the FTT-inverted models follow a normal distribution with a mean value of −0.1 km s−1, and about 22.7 per cent of them are between −0.1 and 0.1 km s−1. About 88.5 and 29.4 per cent of the velocity errors of the models predicted from FTT-inverted models and from first-arrival traveltimes are between −0.1 and 0.1 km s−1, respectively. This test shows that using FTT-inverted models rather than first-arrival traveltimes as network inputs clearly improves the accuracy of the network outputs.
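The percentages quoted above are fractions of grid cells whose velocity error falls inside a tolerance band. A minimal Python sketch of that statistic is given below; the sign convention (predicted minus true), the inclusive bounds and the function name are assumptions, and the numbers in the example are made up.

```python
import numpy as np

def fraction_within(v_pred, v_true, tol=0.1):
    """Fraction of cells whose velocity error (predicted minus true, km/s)
    lies between -tol and +tol."""
    err = np.asarray(v_pred, dtype=float) - np.asarray(v_true, dtype=float)
    return float(np.mean(np.abs(err) <= tol))

# Made-up example: two of the four cells are within 0.1 km/s of the truth.
v_true = np.array([2.0, 2.0, 2.0, 2.0])
v_pred = np.array([2.05, 1.95, 2.3, 1.5])
print(fraction_within(v_pred, v_true))
```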

Figure 7. Histograms of the velocity errors of the 30 selected testing samples. Panels (a), (b) and (c) indicate the models obtained by FTT, predicted by networks using FTT-inverted models and first-arrival traveltimes as inputs, respectively.
3.3 Influence of training sample number
Besides the type of input data sets, the performance of SDL methods also relies on the number of training samples (Araya-Polo et al. 2019; Wang & Ma 2020). To reveal the influence of the number of training samples on the accuracy of the network-predicted models, we trained the proposed network using 35, 70, 105, 140, 175 and 210 training samples, individually. The numbers of validation and testing samples are 2/7 and 1/7 of the number of training samples, respectively. For each training process, the training samples are randomly selected from the total of 300 samples. The validation and testing samples are randomly selected from the remaining samples for each training and testing process.
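The split sizes implied by the 2/7 and 1/7 ratios can be tabulated with a short Python sketch. The random selection is implemented here as a single permutation of the 300 sample indices, which is one way (not necessarily the authors' way) of drawing the training set first and the validation and testing sets from the remainder; the function name and seed are hypothetical.

```python
import numpy as np

def split_samples(n_train, n_total=300, seed=0):
    """Randomly pick training indices, then validation (2/7 of n_train) and
    testing (1/7 of n_train) indices from the samples that remain."""
    n_val = 2 * n_train // 7
    n_test = n_train // 7
    if n_train + n_val + n_test > n_total:
        raise ValueError("not enough samples for the requested split")
    order = np.random.default_rng(seed).permutation(n_total)
    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    test = order[n_train + n_val:n_train + n_val + n_test]
    return train, val, test

sizes = {n: tuple(len(part) for part in split_samples(n))
         for n in (35, 70, 105, 140, 175, 210)}
for n, s in sizes.items():
    print(n, s)
```

For 210 training samples this reproduces the 210/60/30 split used in Section 3.2, which uses all 300 samples.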
Fig. 8 shows the average RMSEs of the velocity models predicted by the networks trained using different sample numbers. The solid and dashed lines in Fig. 8 correspond to the networks trained and applied using first-arrival traveltimes and FTT-inverted models as network inputs, respectively. In general, the RMSEs of the velocity models predicted from first-arrival traveltimes vary strongly as the number of training samples increases, whereas those from FTT-inverted models decrease relatively steadily from about 0.18 to 0.11 km s−1. The RMSEs of the models predicted from first-arrival traveltimes are clearly larger than those from FTT-inverted models, even when more training samples are used.
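An average RMSE of this kind can be computed as in the following Python sketch. It assumes that the RMSE is evaluated per testing model over all grid cells and then averaged over the testing models; the text does not state the order of averaging, and the example values are made up.

```python
import numpy as np

def average_rmse(v_pred, v_true):
    """Mean over models of the per-model RMSE; inputs have shape (n, nx, nz)."""
    v_pred = np.asarray(v_pred, dtype=float)
    v_true = np.asarray(v_true, dtype=float)
    per_model = np.sqrt(np.mean((v_pred - v_true) ** 2, axis=(1, 2)))
    return float(per_model.mean())

# Made-up example: one model is off by 0.1 km/s everywhere, the other by 0.3.
v_true = np.zeros((2, 2, 2))
v_pred = np.stack([np.full((2, 2), 0.1), np.full((2, 2), 0.3)])
print(average_rmse(v_pred, v_true))
```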

Figure 8. Average RMSEs of the velocity models predicted from the networks trained using different sample numbers. Solid and dashed lines indicate the SDL using first-arrival traveltimes and FTT-inverted models as network inputs, respectively.
Fig. 9 shows the histograms of the velocity errors of the models predicted by the networks trained using different sample numbers and network inputs. The top and bottom rows indicate the models predicted from the first-arrival traveltimes and FTT-inverted models, respectively. The left-hand and right-hand columns indicate the models predicted via the networks trained using 35 and 105 samples, respectively. Figs 7 and 9 show that, on the whole, the velocity errors of the models predicted from the FTT-inverted models are much closer to zero than those predicted from the first-arrival traveltimes. Comparing Figs 7(b), 9(c) and (d), the performance of the networks trained with FTT-inverted models clearly improves when the number of training samples increases from 35 to 105, whereas the improvement is limited when it increases from 105 to 210.

Figure 9. Histograms of the velocity errors. The top and bottom rows indicate the models predicted from the first-arrival traveltimes and FTT-inverted models, respectively. The left- and right-hand columns indicate the models predicted via the networks trained using 35 and 105 samples, respectively.
To compare the performance of neural networks trained using 70 FTT-inverted models and 210 first-arrival traveltime data sets, three testing samples for the network trained with 70 FTT-inverted models are shown in Fig. 10. The top, middle and bottom rows in Fig. 10 indicate the true, FTT-inverted and predicted velocity models. The accuracy of the predicted models is significantly improved compared with the FTT-inverted models, indicating that good performance of the proposed method can be achieved even using only a small number of training samples. Comparing the predicted models shown in Fig. 10 with those in the fourth row of Fig. 5, it is apparent that the neural network trained using 70 FTT-inverted models performs much better than the one trained using 210 first-arrival traveltime data sets. The primary reason is that the FTT-inverted models are much closer to the true ones than the first-arrival traveltime data sets are. The training time using 70 FTT-inverted models is 325 s, about 1/3 of that using 210 first-arrival traveltime data sets.

Figure 10. Three representative testing samples with 70 training samples using FTT-inverted models as network inputs. The top, middle and bottom rows indicate the true, FTT-inverted and predicted velocity models, respectively.
3.4 Generalization ability test
To examine the generalization ability of the proposed neural network (trained with 210 samples using FTT-inverted models as inputs), four additional testing samples different from the previous samples are constructed as shown in Fig. 11. The first and second samples have smaller and larger topographic reliefs, respectively. The third and fourth samples have lower and higher weathering-layer velocities, respectively. Each of the last three samples contains a Gaussian anomaly in the middle of the weathering layer, as shown in Figs 11(b)–(d). The velocity difference between the Gaussian anomaly and its surroundings is small in the second sample, whereas it is obvious in the third and fourth samples. The top, middle and bottom rows in Fig. 11 indicate the true, FTT-inverted and predicted velocity models, respectively.

Figure 11. Four testing samples different from the training samples. The top, middle and bottom rows indicate the true, FTT-inverted and predicted velocity models, respectively.
In general, the trained neural network performs well on the first three testing samples but poorly on the last one, as shown in Fig. 11. The first two testing samples indicate that the trained neural network can be applied successfully to FTT-inverted models with similar observation surfaces and velocity variations.
For the third testing sample, although the velocities of the weathering layer are lower than those of the training samples, they are still larger than 0 km s−1 and within the velocity ranges of the training samples. However, the velocities of the weathering layer of the fourth testing sample are larger than those of the training samples and fall outside the corresponding velocity ranges. Consequently, the predicted model of the third testing sample is satisfactory whereas that of the fourth testing sample is disappointing.
To overcome this problem, let us return to the network-training loss function, eq. (2). The maximum values of the FTT-inverted models, λ in eq. (2), are used for normalizing the velocity models. If the velocity ranges of the weathering layers of the testing samples are larger than those of the training samples, using the maximum values of the FTT-inverted models as the normalization factors no longer seems appropriate. Thus, we use 1.2 times the maximum velocity of the model shown in Fig. 11(h) as the new normalization factor, and obtain the newly predicted model shown in Fig. 12. Compared with the originally predicted model shown in Fig. 11(l), the newly predicted model is closer to the true one shown in Fig. 11(d). This test reveals that if the velocity ranges of the testing samples are larger than those of the training samples, larger normalization factors are needed for the trained neural networks to be applicable.
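The effect of enlarging the normalization factor can be sketched in Python as follows. The sketch assumes that the network output is rescaled by the same factor that was used to normalize its input, which the text does not state explicitly. `net` is any callable standing in for the trained network; an identity function is used here purely so that the example runs, and the velocity values are made up.

```python
import numpy as np

def predict_velocity(net, v_tom, factor=1.0):
    """Normalize the FTT-inverted model by factor * max(v_tom), apply the
    network, and map the output back to velocity units."""
    lam = factor * float(np.max(v_tom))
    return lam * net(v_tom / lam)

identity_net = lambda x: x                   # hypothetical stand-in network
v_tom = np.array([[2.0, 3.0], [4.0, 5.5]])   # km/s, made-up values

# With factor = 1.2 the largest normalized input drops from 1 to 1/1.2, so
# high velocities are presented to the network at smaller normalized values.
print((v_tom / (1.2 * v_tom.max())).max())
print(predict_velocity(identity_net, v_tom, factor=1.2))
```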

Predicted model corresponding to the FTT-inverted model shown in Fig. 11(h), using a normalization factor of 1.2 times the maximum velocity of the FTT-inverted model.
On the whole, the Gaussian anomalies are invisible in the predicted models of the last two testing samples, as shown in Figs 11(k) and 12. The results indicate that the trained neural networks fail to predict velocity variations that are clearly different from those in the training samples.
4 FIELD DATA APPLICATION
We apply the proposed method to a field seismic data set obtained in mountainous areas in the west of China. The seismic line is about 20.48 km long. The surface elevation is shown in Fig. 13. A high mountain is located between about 10 and 16 km in the horizontal direction. A total of 228 shots and 512 geophones are placed along the survey line at intervals of about 100 and 40 m, respectively. For each shot, 240 geophones record the seismic wavefield. The sampling interval of the seismic data is 1 ms and the record length is 6 s. A 15–160 Hz bandpass filter and an automatic gain control procedure with a 400 ms time window are applied to the field seismic data to suppress the surface waves and to enhance the reflected waves.

Velocity models obtained by the FTT (a) and the proposed method (b). The black lines indicate the bottom interfaces of the weathering layers.
4.1 Model inverted by the FTT
First-arrival traveltimes are picked from the processed field seismic data and used to invert for the near-surface velocity model. The initial model for the FTT has a constant velocity of 5 km s−1 and is discretized into rectangular cells of 40 m × 10 m. Fig. 13(a) shows the near-surface velocity model inverted by the FTT after 30 iterations.
The velocities of the weathering layer shown in Fig. 13(a) vary gradually in the vertical direction but strongly in the lateral direction, especially between about 11 and 15 km. The velocities of the underlying high-velocity layer are about 4.5 km s−1. According to the velocity structure of the inverted model and previous research (Yang et al. 2021), we determine the bottom interface of the weathering layer from the 4 km s−1 constant-velocity line, indicated by the black line in Fig. 13(a).
Fig. 14 shows the normalized objective function values of the FTT versus iteration number. The value at the 30th iteration is very close to zero, indicating that the first-arrival traveltimes calculated with the inverted model are very close to the picked ones.
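Picking the base of the weathering layer from a constant-velocity line can be illustrated as follows. This is a minimal sketch rather than the procedure used in the paper: it assumes that each column of the gridded model lists velocities from the surface downward, that the values sit at cell centres, and that the 10 m cell dimension is the vertical one. The function name is hypothetical.

```python
def isovelocity_depth(column, dz, threshold=4.0):
    """Depth below the surface where velocity first reaches `threshold`.

    `column` lists velocities (km/s) from the surface downward, one value per
    cell centre at depths 0.5*dz, 1.5*dz, ...  The result has the units of dz.
    Returns None if the threshold is never reached in the column.
    """
    for k, v in enumerate(column):
        if v >= threshold:
            if k == 0:
                return 0.5 * dz
            v_above = column[k - 1]
            # Linear interpolation between the two bracketing cell centres.
            frac = (threshold - v_above) / (v - v_above)
            return (k - 0.5 + frac) * dz
    return None


# Toy model stored column by column (hypothetical velocities in km/s).
columns = [[2.0, 3.0, 5.0],
           [4.5, 4.5, 4.5],
           [1.0, 2.0, 3.0]]
base_depths = [isovelocity_depth(col, dz=10.0) for col in columns]
```

Here `base_depths` holds one depth in metres per column, or `None` where the model never reaches 4 km/s. Subtracting these depths from the surface elevation would give the interface drawn as the black line.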

4.2 Model improved by the deep learning
4.2.1 Generating training samples
According to the characteristics of the FTT-inverted model shown in Fig. 13(a), 200 true velocity models are generated using the method proposed in Section 3.1. Figs 15(a)–(h) show eight representative true models. The topography of every true model is the same as that of the field survey. All true models consist of a weathering layer and a high-velocity layer. The velocities of the weathering layer vary in both the lateral and vertical directions, whereas the high-velocity layer has a constant velocity. The velocity ranges of the weathering and high-velocity layers of the true models are wider than those of the FTT-inverted model. All the true models are unique.

Eight representative training samples. The left-hand (output) and right-hand (input) panels are the true and FTT-inverted near-surface velocity models, respectively.
Based on the acquisition geometry of the field seismic data, the first-arrival traveltimes corresponding to the generated true models are calculated by ray tracing (Li et al. 2019). Using the same FTT and inversion parameters as those for the field seismic data, inverted velocity models are obtained from the calculated first-arrival traveltimes. Figs 15(i)–(p) show the eight inverted models corresponding to the true models in Figs 15(a)–(h), respectively. On the whole, the weathering layers of the inverted models have higher velocities and larger thicknesses than those of the true models. The differences between the FTT-inverted and true models are obvious, especially when the weathering layers are very thin or when large velocity differences exist between the weathering and high-velocity layers.
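The sample-generation workflow described above (true model, then synthetic traveltimes, then FTT inversion) can be summarized by the following skeleton. It is only a sketch: the ray tracer and the FTT are passed in as callables, and the two toy stand-ins below are hypothetical placeholders with no physical meaning, used solely so that the example runs.

```python
def build_training_pairs(true_models, forward_traveltimes, invert_traveltimes):
    """Return (input, output) pairs: FTT-inverted model in, true model out.

    forward_traveltimes(model) -> traveltimes   (placeholder for ray tracing)
    invert_traveltimes(traveltimes) -> model    (placeholder for the FTT)
    """
    pairs = []
    for true_model in true_models:
        traveltimes = forward_traveltimes(true_model)
        inverted_model = invert_traveltimes(traveltimes)
        pairs.append((inverted_model, true_model))
    return pairs


def toy_forward(model):
    # Stand-in for ray tracing: one "traveltime" per cell over a unit path.
    return [1.0 / v for v in model]


def toy_invert(traveltimes):
    # Stand-in for the FTT: recover velocities, then smooth with neighbours
    # to mimic the limited resolution of a tomographic result.
    velocities = [1.0 / t for t in traveltimes]
    smoothed = []
    for i in range(len(velocities)):
        window = velocities[max(0, i - 1):i + 2]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Two toy 1-D "true models" in km/s (hypothetical values).
true_models = [[1.0, 2.0, 4.0], [2.0, 2.0, 4.0]]
pairs = build_training_pairs(true_models, toy_forward, toy_invert)
```

Keeping the forward and inverse operators as arguments mirrors the requirement that the synthetic samples use the same acquisition geometry and inversion parameters as the field data: the same two callables would be reused for every true model.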
4.2.2 Training neural network
The FTT-inverted and corresponding true models are used as the network inputs and outputs, respectively. The dimensions of the inputs and outputs are 512 × 128. The numbers of samples for network training, validation and testing are 190, 5 and 5, respectively. During the training, the batch size is set to 5 and the maximum number of epochs to 30, with 38 iterations per epoch. The network is validated every eight iterations. Training stops when the maximum iteration number is reached or when the last five validation loss values of the SDL are all larger than the smallest one.
After 744 iterations (about 20 epochs), the training process meets the termination conditions. The mean-squared errors (MSEs) between the predicted and true velocity models for network training and validation are indicated by the black solid line and red circles in Fig. 16, respectively. The MSE decreases dramatically in the first epoch and then keeps declining to about 3 per cent of the maximum MSE.
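The training schedule and the stopping rule can be checked with a short sketch. The stopping function reflects one reading of the rule stated above (stop once the five most recent validation losses all exceed the smallest validation loss recorded so far); its name and the `patience` parameter are assumptions, not identifiers from the paper.

```python
def should_stop(validation_losses, iteration, max_iterations, patience=5):
    """Assumed stopping rule: iteration cap, or no new best in `patience` checks."""
    if iteration >= max_iterations:
        return True
    if len(validation_losses) <= patience:
        return False
    best = min(validation_losses)
    # True only if the best loss occurred before the last `patience` checks.
    return all(loss > best for loss in validation_losses[-patience:])


# Schedule implied by the numbers quoted in the text.
n_train, batch_size, max_epochs, validation_every = 190, 5, 30, 8
iterations_per_epoch = n_train // batch_size        # 190 / 5 iterations per epoch
max_iterations = iterations_per_epoch * max_epochs  # upper bound on iterations
epochs_at_stop = 744 / iterations_per_epoch         # epochs completed at iteration 744
validations_at_stop = 744 // validation_every       # validation checks by iteration 744

# Hypothetical validation history: the best value is followed by five worse ones.
history = [5.0, 4.0, 3.0, 3.1, 3.2, 3.3, 3.4, 3.5]
stop_now = should_stop(history, iteration=64, max_iterations=max_iterations)
stop_earlier = should_stop(history[:-1], iteration=56, max_iterations=max_iterations)
```

The arithmetic is consistent with the text: 190 samples in batches of 5 give 38 iterations per epoch, and 744 iterations correspond to roughly 19.6 epochs, well short of the 1140-iteration cap, so under this reading the run ended on the validation criterion.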

After training the network, the five testing samples are used to examine its performance. Fig. 17 shows two representative samples. The top, middle and bottom rows show the FTT-inverted, predicted and true models, respectively. In Fig. 17, the predicted models are obviously closer to the true ones than the FTT-inverted models are, especially in the weathering layers. The vertical resolution of the predicted models is obviously higher than that of the FTT-inverted models, making the bottom interfaces of the weathering layers clearer.

Two representative testing samples. (a) and (d) are the FTT-inverted models (inputs). (b) and (e) are the predicted models (outputs). (c) and (f) are the true models (expected outputs). The black lines in (a) and (d) indicate the positions of the velocity curves shown in Figs 18(a) and (b), respectively.
To clearly show the velocity differences between the FTT-inverted, predicted and true models, the velocities along the black dashed lines in Figs 17(a) and (d) are shown in Figs 18(a) and (b), respectively. The solid, dashed and dotted lines in Fig. 18 indicate the velocities of the true, FTT-inverted and predicted models in Fig. 17, respectively. The velocity curves of the FTT-inverted models are clearly smoother than those of the predicted models, and the velocities of the predicted models are closer to the true ones. Meanwhile, the predicted models are laterally smoother than those inverted by the FTT. This test implies that the trained network is capable of improving the accuracy of the FTT-inverted near-surface velocity model.

Velocity curves. The positions of (a) and (b) are indicated by the black lines in Figs 17(a) and (d), respectively. The solid, dashed and dotted lines indicate the velocities of the true, FTT-inverted and predicted models, respectively.
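The visual comparison of the velocity curves could be complemented by a simple misfit measure such as the root-mean-square (RMS) error along each profile. The sketch below is illustrative only: the paper reports no such numbers for Fig. 18, and the three toy profiles are hypothetical values chosen to mimic a smooth FTT result and a sharper prediction.

```python
import math


def rms_error(profile, reference):
    """Root-mean-square difference between two equally sampled velocity profiles."""
    if len(profile) != len(reference):
        raise ValueError("profiles must have the same number of samples")
    squared = [(p - r) ** 2 for p, r in zip(profile, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical vertical velocity profiles in km/s (not read from Fig. 18).
true_profile = [1.0, 1.5, 2.0, 4.5, 4.5]
ftt_profile = [1.5, 2.0, 3.0, 4.0, 4.5]        # smooth transition
predicted_profile = [1.1, 1.5, 2.2, 4.4, 4.5]  # sharper interface

rms_ftt = rms_error(ftt_profile, true_profile)
rms_predicted = rms_error(predicted_profile, true_profile)
```

For synthetic testing samples, where the true model is known, a smaller `rms_predicted` than `rms_ftt` would quantify the statement that the predicted velocities are closer to the true ones.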
4.2.3 Predicting using the trained network
Fig. 13(b) shows the velocity model predicted from the FTT-inverted model shown in Fig. 13(a). The velocities near the observation surface are smaller in the predicted model. The bottom interface of the weathering layer of the predicted model is determined from the 4 km s−1 constant-velocity line, indicated by the black line in Fig. 13(b), in the same way as in Fig. 13(a).
To examine the accuracy of the FTT-inverted and predicted near-surface velocity models shown in Fig. 13, long-wavelength statics are calculated using a 4 km s−1 replacement velocity and a 1-km datum (Yang et al. 2021). Figs 19(a) and (b) are the stacking profiles obtained from the seismic data after long-wavelength static correction using the FTT-inverted and predicted models, respectively. The imaging events in Fig. 19(b) are more continuous and their stacking energy is stronger than in Fig. 19(a), especially in the red box. For a clearer comparison, the events in the red box are enlarged in Fig. 20, where the red arrows indicate the apparent improvements. The geological structures in Fig. 20(b) are clearer than those in Fig. 20(a). This test shows that the near-surface velocity model obtained using the proposed method is more accurate than that acquired using the FTT, and that the static correction problems of the land seismic data recorded on the mountains can be well solved.
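The paper does not spell out how the long-wavelength statics are computed, so the following is only a sketch of a conventional datum static at one station: the vertical traveltime through the weathering layer is removed, and the interval between the base of weathering and the datum is filled with the replacement velocity. The sign convention (negative values shift traces earlier), the treatment of the 1-km datum as an elevation, the use of the replacement velocity to mark the base of weathering, and the function name are all assumptions.

```python
def datum_static(surface_elevation, column, dz, datum_elevation=1.0,
                 replacement_velocity=4.0):
    """Datum static in seconds at one station (elevations in km, velocities in km/s).

    `column` lists the velocities of cells of thickness `dz` from the surface
    downward. The weathering layer is taken as all cells above the first one
    whose velocity reaches the replacement velocity.
    """
    weathering_time = 0.0
    thickness = 0.0
    for v in column:
        if v >= replacement_velocity:
            break
        weathering_time += dz / v   # vertical traveltime through this cell
        thickness += dz
    base_elevation = surface_elevation - thickness
    # Time from the base of weathering to the datum at the replacement velocity.
    elevation_term = (datum_elevation - base_elevation) / replacement_velocity
    return -weathering_time + elevation_term


# Toy station: surface at 1.5 km, two slow 10 m cells above fast rock
# (hypothetical values).
static = datum_static(surface_elevation=1.5, column=[1.0, 2.0, 4.5], dz=0.01)
```

In this toy case the weathering layer contributes 0.015 s and the elevation term is -0.12 s, so the static is -0.135 s. Applying the same function to every station of the FTT-inverted and predicted models would give two sets of statics whose effect on the stack can then be compared.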

Stacking profiles. Panels (a) and (b) are obtained from the seismic data after long-wavelength static correction using the FTT-inverted and predicted models, respectively.

Enlarged stacking profiles. Panels (a) and (b) correspond to Figs 19(a) and (b), respectively. The improved events are indicated by red arrows.
5 CONCLUSIONS
Considering the advantages and disadvantages of the traditional FTT and the newly developed SDL, we propose a joint inversion method that embeds the FTT into SDL. The mapping from first-arrival traveltimes to accurate near-surface velocity models is divided into two parts, solved by the FTT and SDL, respectively. The FTT establishes the relationship from first-arrival traveltimes to rough near-surface velocity models. The SDL builds the relationship from rough to accurate models.
Unlike the existing deep-learning velocity-model building methods, which use seismic records and true velocity models as network inputs and outputs, respectively, we have chosen an approach closer to the works that use migrated images as network inputs. The inputs and outputs of the neural network proposed in this paper are the FTT-inverted and true near-surface velocity models, respectively. The FTT-inverted and true models have the same size, the same topography and similar velocity distributions. The relationship between them is simple. Therefore, only a small number of training samples is needed. At the same time, the proposed method makes it convenient to judge whether the network-predicted models are reliable, which is important for seismic exploration.
The effectiveness and superiority of the proposed method are verified by the application to synthetic and field mountainous seismic data. The proposed method provides a new perspective and strategy for applying the deep learning methods to velocity model building from field seismic data.
ACKNOWLEDGEMENTS
We thank editors Prof Herve Chauris and Dr Louise Alexander, and three anonymous reviewers for valuable comments which have improved this paper. This work was supported by the National Natural Science Foundation of China (grant no. 42174154) and Postdoctoral Application Research Project of Qingdao, China (grant no. QDBSH20220202126).
DATA AVAILABILITY
No data or codes are available.
CONFLICT OF INTEREST
The authors declare no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.