SUMMARY

Signal and noise classification can add an extra level of constraint for earthquake phase picking by pinpointing the signal waveforms in continuous seismic data, enabling more accurate arrival picking. However, the ever-growing volume of data collected by worldwide stations exceeds the capacity of manual analysis. Moreover, manual earthquake data analysis depends on seismologists' expert knowledge, resulting in inconsistent analysis results. To address this, we propose a generalized deep learning (DL) network architecture to discriminate between earthquake signal and noise waveforms. The proposed DL framework is a novel architecture comprising a feature extractor, a classifier and two hybrid attention modules. It utilizes different kernel sizes for more detailed feature extraction, and the hybrid attention modules guide the network to focus on waveform characteristics. To illustrate the power of the proposed DL network, we applied it to classify the earthquake signals and noise of the 3-C Texas Earthquake Dataset. The results demonstrate that the accuracy of the proposed method on the testing set reaches 99.83 per cent. We further utilize a transfer learning strategy to demonstrate the transferability of the proposed network to the STanford EArthquake Dataset, achieving an encouraging classification accuracy of 95.03 per cent. Additionally, we conducted an experiment on arrival picking by integrating decoder blocks into the classification network, which achieves remarkable P- and S-wave arrival picking accuracy.

1 INTRODUCTION

Classification between earthquake signal and noise is an inevitable problem in real-time earthquake monitoring. The seismic records obtained from continuous monitoring over long periods contain various information, which supports passive source location, near-surface ground-motion prediction and ground-motion metric inversion. However, these continuous monitoring records contain numerous interferences, degrading the precision of P- and S-wave phase picking, magnitude estimation and passive source location. Furthermore, the sheer volume of continuous seismograms obtained from thousands of stations surpasses the capacity for manual analysis. Hence, the development of automated detection and classification methods employing machine-learning algorithms is crucial. Nonetheless, microearthquake events, characterized by their low magnitude and high attenuation, pose significant challenges to detection and classification tasks.

Prior work on earthquake classification and detection can be divided into conventional physics-driven approaches and data-driven methods represented by deep learning. Physics-driven approaches employ specific mathematical equations to represent signal and noise waveforms and classify them within the physical domain. Among the many seismicity detection and classification techniques, the short-term average/long-term average (STA/LTA) method (Vaezi & Van der Baan 2015), which computes the ratio between the short-term and long-term averages of time-series earthquake data, is a prominent physics-driven approach to earthquake signal identification. However, it cannot effectively separate signal from noise when the seismic records have weak amplitudes (Seydoux et al. 2020). Beyreuther & Wassermann (2008) investigated earthquake detection and classification in continuous 3-C records using discrete hidden Markov models (DHMMs); their single-station experiments showed promising classification results and demonstrated the superiority of DHMMs over benchmark methods. The Stockwell transform (Cheng et al. 2019; Sharma & Nanda 2020; Saad et al. 2023a) can extract time-frequency features from signals, thereby minimizing classification errors; its high time-frequency resolution and effective time-frequency clustering contribute to improved classification accuracy. However, the aforementioned conventional methods struggle to handle seismic records from multiple stations, often yielding incorrect predictions, especially when the earthquake records exhibit large amplitude variations or low signal-to-noise ratios.
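To make the STA/LTA criterion concrete, a minimal numpy sketch is given below; the window lengths and trigger threshold are illustrative assumptions, not values from the cited studies.

```python
import numpy as np

def sta_lta(trace, sta_win=50, lta_win=500):
    """Characteristic function: ratio of short- to long-term average energy.

    Window lengths are in samples (e.g. 0.5 s and 5 s at 100 Hz)."""
    energy = trace ** 2
    sta = np.convolve(energy, np.ones(sta_win) / sta_win, mode="same")
    lta = np.convolve(energy, np.ones(lta_win) / lta_win, mode="same")
    return sta / (lta + 1e-12)   # guard against division by zero in quiet segments

# An event is declared wherever the ratio exceeds a user-chosen threshold.
cf = sta_lta(np.random.randn(6000))
onsets = np.where(cf > 3.0)[0]
```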

Recent years have witnessed the emergence of data-driven machine-learning algorithms, offering a novel approach to processing vast amounts of data with minimal prior knowledge (Seydoux et al. 2020). Machine-learning algorithms, for example decision trees (Chin et al. 2019), multilayer perceptrons (Giacco et al. 2009), random forests (Provost et al. 2017; Wenner et al. 2021), logistic regression classifiers (Wang et al. 2023) and support vector machines (Kim et al. 2020), have found widespread application in earthquake classification and other seismological tasks. These trained algorithms can implicitly discern between signal and noise waveforms (Kong et al. 2016). However, in both the physics-driven and classical machine-learning approaches, features are manually extracted and subjectively chosen in an attempt to characterize the waveforms comprehensively.

Deep learning (DL) methods offer a novel data-driven approach to automatic feature extraction, leading to notable advancements in seismological data processing and interpretation. These models are trained on manual annotations to automatically extract relevant features and decision rules from extensive labelled data sets, facilitating comprehensive analysis across a broad spectrum of input data. Recently, a group of neural networks has found promising applications in the seismology community, enabling more precise and efficient feature extraction, for example in event detection (Zhu & Beroza 2019; Saad et al. 2023b, a), event classification (Gulia & Wiemer 2019; Mousavi et al. 2019b; Saad et al. 2020; Ma et al. 2023), data reconstruction (Wang et al. 2022), first-motion polarity classification (Mousavi et al. 2019b), magnitude estimation (McBrearty & Beroza 2022), P- and S-wave phase picking (Zhu & Beroza 2019; Saad et al. 2023b), single-station source location (Zhang et al. 2019), earthquake correlation (Zhang et al. 2019) and focal mechanism solutions (Zhang et al. 2022).

Minimizing the false negative and false positive rates is the main goal in earthquake classification (Mousavi et al. 2020). DL can learn the essential features of earthquake waveforms and provide high-order feature extraction through complex nonlinear mapping, which makes deep neural networks well suited to applications in seismology, that is event detection, phase picking and earthquake location. Seydoux et al. (2020) proposed an unsupervised machine-learning framework for real-time earthquake signal/noise detection and clustering. They utilized a deep scattering network for automatic feature extraction and a Gaussian mixture model to cluster signal and noise in the feature domain. Classification performance on continuous seismic records and blind detection show that this approach can achieve relatively high-precision detection in seismogenic regions in a ground-truth-free way. Saad et al. (2022b) introduced a capsule neural network to discriminate earthquakes from quarry blasts. Because the time-series of earthquakes and quarry explosions are highly similar, manual identification can easily lead to misclassification; their capsule neural network learns the dependencies between the feature maps extracted by the network to classify scalograms of the two event types, and it achieves remarkable discrimination results from limited training samples. Jiang et al. (2023) built a multiclassifier architecture that utilizes a convolutional neural network (CNN) to classify earthquakes, rockfalls and quakes with weak signals. Inspired by the Visual Geometry Group network (VGGNet) (Simonyan & Zisserman 2014), they exploited 6-C time-series, short-time Fourier transform (STFT) maps and continuous wavelet transform (CWT) maps as input data and explored the network's discrimination performance on each. The results show that the CNN achieves the highest accuracy when using CWT maps as input. However, the CWT-based training set generation consumes more time than simply using the time-series as input.

Supervised deep learning methods require a large amount of labelled data for robust model training. Our investigation considers two approaches for generating input data in earthquake waveform classification and first-arrival picking tasks. One method transforms 1-D time-series into 2-D spectrograms using techniques such as the CWT (Rioul & Flandrin 1992) and STFT (Griffin & Lim 1984; Liu et al. 2020). These 2-D training samples provide enhanced frequency resolution and improved classification accuracy compared to 1-D time-series input. However, utilizing 2-D convolutional networks increases the time and memory requirements for label generation and network parametrization. Alternatively, employing the original 3-C time-series as input preserves the fundamental characteristics of seismic records and reduces training time. In this paper, we leverage a vast data set collected by TexNet and adopt 3-C time-series as input, which mitigates the computational burden compared to using 2-D spectrograms. Moreover, using time-series as input enhances the model's adaptability for real-time classification and first-arrival picking tasks.

Bearing in mind the above advantages and disadvantages, we propose a novel network incorporating multiscale feature fusion and hybrid attention modules for 3-C time-series earthquake signal and noise classification. The proposed network comprises three blocks: the feature extractor, the classifier and the hybrid attention blocks. To ensure the reproducibility of our work, we trained and tested the proposed DL framework on publicly available data sets: the Texas Earthquake Dataset (TXED) (Chen et al. 2024) and the STanford EArthquake Dataset (STEAD) (Mousavi et al. 2019a). A transfer learning strategy (Torrey & Shavlik 2010) is adopted to showcase the network's generalizability on STEAD. Moreover, based on the above classification network, a multitask network was built to perform both classification and phase picking. Compared with previous pioneering works, our main contributions to DL-based earthquake signal and noise classification are as follows:

  • We propose a novel DL framework for high-precision signal and noise discrimination in the TXED. The proposed method can still achieve high discrimination accuracy when faced with few training samples and unbalanced training data.

  • We utilize convolutional layers with different kernel sizes to extract multiscale features from the input 3-C time-series. The extracted feature maps are fused (along the last dimension) to enable the network to balance waveform details and overall characteristics (a minimal sketch of this module follows the list below).

  • We introduce a hybrid attention module (Woo et al. 2018), in which the channel attention mechanism allows the network to focus on the waveform characteristics of a single component, while the spatial attention combines waveform information across the three components, enabling better dimensionality reduction and feature extraction.

  • In the classifier module, we employ a convolutional neural network-gated recurrent unit (CNN-GRU) module (Chung et al. 2014) to enhance the model’s temporal modelling capabilities, thereby achieving high-precision earthquake signal and noise classification.

  • The transfer learning strategy is introduced to test the proposed DL network’s generalizability by applying it to the STEAD classification task.
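As referenced above, here is a minimal Keras sketch of one multiscale feature fusion (MSFF) module; the kernel sizes, filter counts and pooling are illustrative assumptions, not the exact configuration of Fig. 2.

```python
import tensorflow as tf
from tensorflow.keras import layers

def msff_block(x, filters=16, kernel_sizes=(3, 7, 11), linear=False):
    """One MSFF module: parallel Conv1D branches with different kernel
    sizes, fused (concatenated) along the channel (last) dimension."""
    branches = [layers.Conv1D(filters, k, padding="same")(x) for k in kernel_sizes]
    x = layers.Concatenate(axis=-1)(branches)
    # The first module keeps a linear activation so that negative-valued
    # samples of the raw waveform are not suppressed.
    if not linear:
        x = layers.LeakyReLU()(x)
    return layers.MaxPooling1D(pool_size=2)(x)   # downsample in time

inputs = layers.Input(shape=(6000, 3))           # 60 s at 100 Hz, 3 components
x = msff_block(inputs, linear=True)              # first module: linear activation
for _ in range(3):                               # three further MSFF modules
    x = msff_block(x)
```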

The remaining part of this paper is arranged as follows: we first review relevant prior works on earthquake classification using deep learning methods. Next, we detail the proposed deep learning network structure and provide insights into the training and testing procedures. Afterward, we apply the trained model to discriminate between earthquake signals and noise on the TXED. Additionally, we introduce a multitask deep learning network that integrates the classification network with additional decoder and output blocks to achieve both classification and arrival-picking tasks. Furthermore, we implement a transfer learning strategy for classifying STEAD signals and noise (M $>$ 3.0) and delve into the specific architecture of the proposed network in the "DISCUSSION" section. Finally, we present several key conclusions drawn from this study.

2 RELATED WORKS

In Table 1, we summarize previous pioneering applications of DL methods to earthquake signal and noise classification, analysing the discrimination and detection performance of different methods in terms of network structure, data types, special modules and their respective advantages.

Table 1. Prior work on earthquake detection and classification across different DL architectures and data sets.

| Reference | Objective | Network architecture | Data format | Data set |
| --- | --- | --- | --- | --- |
| Seydoux et al. (2020) | Earthquake signal and noise classification | Scattering network with Gaussian mixture model | Time-series | Nuugaatsiaq landslide |
| Linville et al. (2019) | Earthquake and quarry blast classification | CNN with long short-term memory | 3-C spectrograms | UUSS^a |
| Tibi et al. (2019) | Tectonic, mining-induced and mining earthquake classification | CNN | 3-C spectrograms | UUSS and UUEB^b |
| Mousavi et al. (2019c) | Earthquake event detection | Residual network | 3-C spectrograms | North California |
| Ku et al. (2020) | Earthquake detection, microearthquake and noise classification | Attention-based CNN | 3-C time-series | NECIS^c and IRIS^d |
| Mousavi et al. (2020) | Seismic detection and phase picking | Multitask learning | 3-C time-series | STEAD^e |
| Shakeel et al. (2021) | Earthquake magnitude estimation | 3D-CNN and RNN | 1-C log-Mel spectrograms | STEAD |
| Duan et al. (2021) | Microseismic event classification | CNN | 22-C seismograms | Synthetic data |
| Saad et al. (2021) | Earthquake event detection | U-net | 3-C CWT spectrograms | Texas, North California, Japan and Egypt |
| Liu et al. (2021) | Microseismic classification | CNN | 6-C seismograms | Underground coal mine |
| Saad et al. (2022b) | Earthquake and quarry blast classification | Capsule neural network | Seismograms | ENSN^f |
| Trani et al. (2022) | Earthquake, other event and noise classification | CNN | 3-C time-series and seismograms | KNMI^g |
| Saad et al. (2022a) | Event detection and magnitude estimation | Vision Transformer | 3-C time-series | STEAD |
| Jiang et al. (2023) | Earthquake, quake, rockfall and noise classification | CNN | 3-C time-series, STFT maps and CWT maps | Résif |
| Ma et al. (2023) | Microseismic, blasting and noise classification | Modified Visual Geometry Group 13-layer network (VGG13) with attention | 1-C STFT maps | — |
| Our model | Earthquake signal and noise classification, and arrival picking | MSFF with hybrid attention | 3-C time-series | TXED^h |
a. UUSS: University of Utah Seismograph Stations.
b. UUEB: Unconstrained Utah Event Bulletin.
c. NECIS: National Earthquake Comprehensive Information System.
d. IRIS: Incorporated Research Institutions for Seismology.
e. STEAD: STanford EArthquake Dataset.
f. ENSN: Egyptian National Seismic Network.
g. KNMI: Royal Netherlands Meteorological Institute.
h. TXED: Texas Earthquake Dataset for AI.


Prior works (Ku et al. 2020; Mousavi et al. 2020; Saad et al. 2022a; Trani et al. 2022; Jiang et al. 2023) utilized 3-C time-series as model input, which exploits most of the waveform characteristics in the time domain. Other methods (Linville et al. 2019; Tibi et al. 2019; Mousavi et al. 2019c; Shakeel et al. 2021; Saad et al. 2021; Jiang et al. 2023) used spectrograms generated by the STFT, CWT and other transforms as input, which provides the network with more frequency-resolution information and hence high precision in detection and classification tasks. However, transform-based data generation consumes more time than taking the time-series directly as input. Compared with the above pioneering works, we introduce a multiscale feature fusion model for automatic dimensionality reduction and feature extraction, which enables 1-D convolutional (Conv1-D) layers to take into account both local details and global features; we choose the 3-C time-series as the input to our DL architecture. Seydoux et al. (2020) used unsupervised learning for real-time earthquake classification and detection. Unlike supervised methods, it does not require labelled training pairs, which avoids the bias of manual training set annotation. However, unsupervised learning methods depend strongly on the specific network structure and carefully chosen hyperparameters, which limits their further application to diverse data sets. Earthquake event detection and phase picking share similarities, but their objectives differ; therefore, Mousavi et al. (2020) proposed a multitask DL architecture that trains a single neural network for seismic event detection and P- and S-wave picking. Compared to using only a CNN, attention mechanisms can guide the network to focus on essential waveform characteristics, while recurrent neural networks (RNNs) enhance the network's ability to model time-series data (Ku et al. 2020; Shakeel et al. 2021).

3 METHODOLOGY

This section is divided into three subsections. The first subsection details the proposed DL architecture. The second subsection describes the training and testing of the proposed method. Finally, we introduce objective evaluation metrics to assess the network's classification performance.

3.1 Network architecture

We propose a novel DL framework (see Figs 1 and 2) for discriminating signal and noise in 3-C earthquake time-series. The network primarily consists of a feature extractor, a classifier operating in the feature-space domain and two hybrid attention blocks. The feature extractor, composed of four multiscale feature fusion (MSFF) modules, enables automatic dimensionality reduction and feature extraction; we explore the impact of the number of MSFF modules on classification accuracy in the "DISCUSSION" section. The classifier estimates the noise and signal probabilities of 3-C waveforms within the feature space. The convolutional block attention module (CBAM) enhances the network's focus on waveform characteristics, thereby further improving the feature mapping. The feature extractor contains four submodules, each with three Conv1-D layers whose different kernel sizes extract feature information at different scales simultaneously. The feature maps with different receptive field sizes are then concatenated (Huang et al. 2017) to enhance the model's ability to extract features at various scales: the different kernel sizes capture both local (detailed) and global (overall) characteristics of earthquake signal and noise. Note that the feature fusion modules shown in Figs 1 and 2 share a common structure except for the activation function. Given that the input time-series consists of 1-D earthquake data with both positive and negative values, employing nonlinear activation functions directly may suppress neurons with negative values, potentially losing key waveform characteristics; consequently, we employ a linear activation function within the initial feature fusion module to maintain robustness. Because of the temporal dependencies in time-series earthquake data, we introduce gated recurrent units (GRUs) with different hidden sizes in the classifier block to enhance the network's temporal modelling capabilities. Finally, we employ a flatten layer and a fully connected layer to reshape the network output, and a softmax activation function ensures that the output conforms to a probability distribution.
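As a concrete illustration, below is a minimal Keras sketch of a 1-D CBAM-style hybrid attention block in the spirit of Woo et al. (2018); the reduction ratio and spatial kernel size are illustrative assumptions, not the exact settings of our network.

```python
import tensorflow as tf
from tensorflow.keras import layers

def cbam_1d(x, reduction=4):
    """Hybrid (channel + spatial) attention for 1-D feature maps."""
    c = x.shape[-1]
    # Channel attention: squeeze the time axis, re-weight each channel.
    shared_mlp = tf.keras.Sequential([layers.Dense(c // reduction, activation="relu"),
                                      layers.Dense(c)])
    avg_pool = layers.GlobalAveragePooling1D()(x)
    max_pool = layers.GlobalMaxPooling1D()(x)
    ca = layers.Activation("sigmoid")(layers.Add()([shared_mlp(avg_pool),
                                                    shared_mlp(max_pool)]))
    x = layers.Multiply()([x, layers.Reshape((1, c))(ca)])
    # Spatial attention: pool across channels, re-weight each time step.
    stats = layers.Lambda(lambda t: tf.concat(
        [tf.reduce_mean(t, axis=-1, keepdims=True),
         tf.reduce_max(t, axis=-1, keepdims=True)], axis=-1))(x)
    sa = layers.Conv1D(1, kernel_size=7, padding="same", activation="sigmoid")(stats)
    return layers.Multiply()([x, sa])
```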

Figure 1. The diagram of the proposed classification network. It includes a feature extractor, a classifier and a hybrid attention module. 3-C waveforms are the input data, and the output data are the probability values corresponding to signal and noise waveforms.

Figure 2. Details of the proposed DL architecture shown in Fig. 1. Each feature fusion module contains one fully connected layer with a ReLU activation function, three Conv1-D layers with different kernel sizes, one feature-map concatenation layer and one activation function. The classifier module has two GRU layers, each followed by a tanh activation function. Moreover, we add a flatten layer and a fully connected layer to reshape the output into a probability distribution. CBAM consists of two main parts: the spatial and channel attention mechanisms. The Add layer fuses the feature maps from these two blocks.

Additionally, because of the interdependence between earthquake classification and arrival picking, we developed an extended multitask network to address both tasks effectively. The multitask network is illustrated in Fig. 3. Based on the earthquake classification network shown in Fig. 1, we added four decoder blocks and one output block. As shown in Fig. 3, each decoder block consists of a fully connected layer with a rectified linear unit (ReLU) activation function, an MSFF module with a LeakyReLU activation function, a batch normalization (BN) layer and an upsampling layer. Notably, we employ skip connections to combine the output of each hybrid attention module with the output of the BN layer in each decoder block. These skip connections allow the network to merge features from deeper layers with those from shallower layers, enhancing the network's generalizability and mitigating overfitting during training. Moreover, we used a Conv1-D layer with a sigmoid activation function to reshape the feature maps in the output block.
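A minimal sketch of one such decoder block follows; the filter counts and kernel sizes are illustrative assumptions, and the skip tensor (output of the matching hybrid attention block) must share the shape of the BN output.

```python
from tensorflow.keras import layers

def decoder_block(x, skip, filters=16, kernel_sizes=(3, 7, 11)):
    """Dense + multiscale Conv1D fusion + BN, a skip connection from the
    matching hybrid attention block, then upsampling."""
    x = layers.Dense(filters, activation="relu")(x)
    branches = [layers.Conv1D(filters, k, padding="same")(x) for k in kernel_sizes]
    x = layers.LeakyReLU()(layers.Concatenate(axis=-1)(branches))
    x = layers.BatchNormalization()(x)
    x = layers.Add()([x, skip])        # skip and x must have matching shapes
    return layers.UpSampling1D(size=2)(x)
```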

Figure 3. An illustration of the proposed multitask deep neural network workflow. The network contains four encoders/decoders, two output blocks and four hybrid attention blocks. 3-C time-series act as the input of the multitask network, and the outputs are the probability of signal/noise waveforms and the arrival times of the P and S waves, respectively.

3.2 Training details

As we can see from Fig. 1, our proposed DL network has three main blocks. The feature extractor, which is followed by a classifier, uses four MSFF modules to capture nonlinear representations in the feature space. The classifier uses the feature representations to discriminate between earthquake signal and noise. During the training process, we use the gradient information of a hybrid loss function (L) in eq. (1) to determine the direction for updating model parameters, thereby gradually steering the model toward the optimal solution.

$L = L_{\mathrm{bce}} + \lambda L_{\mathrm{kld}},$  (1)

where $L_{\mathrm{bce}}$ represents the binary cross-entropy (BCE) loss function, which is commonly used in binary classification problems to measure the discrepancy between the probability distribution predicted by the model and the true distribution. $L_{\mathrm{kld}}$ stands for the Kullback–Leibler divergence (KLD) loss, which compares two discrete probability distributions and can alleviate the vanishing-gradient issue that arises with cross-entropy loss functions as the number of network layers increases. $\lambda \in [0,1]$ is a hyperparameter that balances $L_{\mathrm{bce}}$ and $L_{\mathrm{kld}}$. In our work, feature extraction and classification in the feature domain are jointly performed by setting the balance factor to 0.1, following the numerical setting of Xie et al. (2016). The KLD loss $L_{\mathrm{kld}}$ in eq. (1) is defined as:

$\mathrm{KL}(P||Q) = \sum_{i} P(i) \log \frac{P(i)}{Q(i)},$  (2)

where $P$ and $Q$ represent two probability distributions, and $i$ indexes the elements within these distributions. A smaller value of $\mathrm{KL}(P||Q)$ indicates a closer similarity between the two probability distributions. The BCE loss $L_{\mathrm{bce}}$ in eq. (1) is defined as:

$L_{\mathrm{bce}} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right],$  (3)

where $y$ represents the true labels, $\hat{y}$ stands for the probability values predicted by the model and $N$ denotes the number of samples. The BCE loss is primarily used for binary classification problems, where the label $y$ is either 0 or 1 and $\hat{y}$ is the predicted probability of the positive class.
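A minimal Keras sketch of the hybrid loss, assuming the weighted-sum form of eq. (1) with balance factor $\lambda = 0.1$:

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()
kld = tf.keras.losses.KLDivergence()

def hybrid_loss(y_true, y_pred, lam=0.1):
    # Eq. (1): BCE plus lambda-weighted KL divergence between the
    # one-hot labels and the softmax output.
    return bce(y_true, y_pred) + lam * kld(y_true, y_pred)

# Usage with Keras: model.compile(optimizer="adam", loss=hybrid_loss)
```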

3.3 Evaluation metrics

We objectively evaluate the classification performance of the proposed DL architecture with the accuracy (Acc), precision, recall and F1-score, which are defined as follows:

$\mathrm{Acc} = \frac{\mathrm{TP} + \mathrm{TN}}{\mathrm{TP} + \mathrm{TN} + \mathrm{FP} + \mathrm{FN}},$  (4)

where TP, TN, FP and FN are true positive, true negative, false positive and false negative, respectively.

$\mathrm{Precision} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}, \quad \mathrm{Recall} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}, \quad \mathrm{F1} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}},$  (5)

where Precision represents the proportion of correctly identified positive instances out of all predicted positive instances. Recall measures the proportion of samples that are correctly predicted to be positive among all samples that are actually positive. The F1-score is the harmonic mean of precision and recall, providing a single value to evaluate the model’s performance by balancing both metrics.
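For concreteness, a small Python sketch computing eqs (4) and (5) from confusion-matrix counts; the example counts are hypothetical, only loosely in the range of one session in Fig. 7.

```python
def classification_metrics(tp, tn, fp, fn):
    """Evaluation metrics of eqs (4) and (5) from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return acc, precision, recall, f1

# Hypothetical counts for one testing session.
print(classification_metrics(tp=992, tn=996, fp=4, fn=8))
```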

To ensure that we obtain a relatively stable and convincing classification result for the 3-C time-series, we trained the proposed network five times and calculated the average accuracy of the five predictions as the final output. As shown in eq. (6):

$\mathrm{Acc}_{\mathrm{avg}} = \frac{1}{5} \sum_{i=1}^{5} \mathrm{Acc}_i,$  (6)

where $\mathrm{Acc}_i$ denotes the accuracy of the $i$-th training run.

Furthermore, we introduce kurtosis, a statistical measure that describes the sharpness or peakedness of the input sample distribution. Typically, a high kurtosis value suggests a sharp peak, while a low one indicates less dramatic fluctuations in the input time-series. Given a 3-C time-series, we calculate the kurtosis value of each channel in the "Numerical experiment results" section.
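A minimal scipy sketch of the per-channel kurtosis computation; we assume scipy's default (Fisher) convention, which scores a Gaussian near 0 and appears consistent with the near-zero noise kurtoses reported in Fig. 9.

```python
import numpy as np
from scipy.stats import kurtosis

def channel_kurtosis(waveform_3c):
    """Kurtosis of each component of a (samples, 3) array. Under the Fisher
    convention a Gaussian scores 0, so impulsive earthquake onsets stand
    out as strongly positive values."""
    return [kurtosis(waveform_3c[:, i]) for i in range(3)]

noise = np.random.randn(6000, 3)            # stationary noise: kurtosis near 0
print(np.mean(channel_kurtosis(noise)))
```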

4 DATA GENERATION

In this paper, we utilize two representative data sets to demonstrate the network's performance across data from different regions: TXED is used to test the earthquake discrimination and arrival picking tasks, and STEAD is used in the transfer learning experiments. TXED is a high-quality data set with 519 689 sixty-second 3-C seismograms, consisting of 312 231 earthquake 3-C waveforms and 207 458 noise 3-C waveforms. It is worth noting that the signal waveforms of TXED all go through a robust manual picking process and are of relatively high quality. We randomly select 10 000 signal waveforms and 10 000 noise waveforms from TXED as the training data. The signal waveforms were pre-processed with several common stages, for example detrending, instrument response removal, resampling to 100 Hz, band-pass filtering and interpolation of data gaps. We use band-pass filtering between 1 and 45 Hz to remove high- and low-frequency noise, for example ambient noise, random noise, heavy-machinery noise and vehicle noise. Each 3-C waveform has a fixed window size of 6000 samples, and most of the earthquake signals were segmented such that the P-wave arrival falls within 0–10 s of the window start. Fig. 4 shows several representative signal waveforms: Figs 4(a) and (b) display microearthquake waveforms with different signal-to-noise ratios (SNRs; Chen & Fomel 2015), Fig. 4(c) shows a very minor earthquake with Ml = 2.5 whose waveform has a relatively low SNR, and Fig. 4(d) is a moderate earthquake with Ml = 5.4. In Fig. 5, several representative noise waveforms are shown: random, unknown and vehicle noise. TXED includes various types of waveforms, covering the range of cases the classifier may encounter. STEAD is a global earthquake data set containing 1.2 million labelled earthquake records from 2613 stations worldwide. Both TXED and STEAD contain enough labelled data to train a robust network model. However, transferring a trained model between the two data sets remains a challenge because of differences in regions and acquisition periods and variations in ambient noise, amplitude and frequency.
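A minimal ObsPy sketch of this pre-processing chain; the file name is a placeholder, and response removal (commented out) requires station metadata not shown here.

```python
from obspy import read

st = read("waveform.mseed")          # placeholder 3-C record
st.detrend("linear")
# st.remove_response(inventory=inv, output="VEL")  # 'inv': station metadata
st.resample(100.0)                   # resample to 100 Hz
st.filter("bandpass", freqmin=1.0, freqmax=45.0)
# Trim to a 60-s (6000-sample) window starting at the record onset.
st.trim(st[0].stats.starttime, st[0].stats.starttime + 60)
```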

Figure 4. Example of signal waveforms with different magnitudes and SNR in TXED. (a) The waveform of a microearthquake with high SNR. (b) A microearthquake waveform with relatively low SNR. (c) A moderate earthquake waveform with relatively high SNR. (d) The signal waveform of one of the largest earthquakes in TXED.

Figure 5. Example of different types of noise waveforms in TXED. (a) Regular noise with unknown cause. (b) and (c) Random noise with low and high amplitudes and different frequencies, respectively. (d) Ambient noise.

5 RESULTS

This section begins with a hyperparameter setup. Then, we utilize two data sets from TXED to demonstrate the effectiveness of the proposed classification network. Finally, we present the results of P- and S-wave arrival picking using the multitask deep neural network.

5.1 Hyperparameter setup

The goal of the proposed classification network is to classify the signal and noise waveforms using the 3-C time-series as input. The network's output consists of two columns: the first column represents the probability of the signal waveform and the second the probability of the noise waveform. We introduce a threshold strategy in eq. (7) to evaluate the performance: when the probability $y_{\mathrm{ph}}$ of the input signal or noise is greater than 0.5, the value is assigned to 1; otherwise, the value is assigned to 0.

$y = \begin{cases} 1, & y_{\mathrm{ph}} > 0.5 \\ 0, & \text{otherwise} \end{cases}$  (7)

To ensure the reproducibility of the experimental results, we set the same hyperparameters (shown in Table 2) for each numerical experiment. We use 50 epochs for each training run of the classification tasks. The Adam optimizer (Kingma & Ba 2014) is applied to train the network, employing a learning-rate step-down strategy that enables the network to reach a more accurate optimal solution. The linear activation function is used in the first MSFF block to avoid the partial loss of neurons with negative values, and LeakyReLU (Chen et al. 2019) is employed for nonlinear feature representations. Additionally, to prevent overfitting, we employed an early stopping strategy that monitors the loss on the validation set: if the validation loss does not decrease significantly over five consecutive epochs, we halt the training.
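A minimal Keras sketch of this training configuration; the stand-in model, the ReduceLROnPlateau settings and restore_best_weights are assumptions made to keep the example self-contained, and the data arrays (x_train, etc.) come from the split described in Section 5.2.

```python
import tensorflow as tf

# Stand-in model; the actual network is the one detailed in Figs 1 and 2.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(6000, 3)),
    tf.keras.layers.Conv1D(8, 7, padding="same", activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="binary_crossentropy", metrics=["accuracy"])  # or hybrid_loss

callbacks = [
    # Halt training once the validation loss stalls for 5 consecutive epochs.
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                     restore_best_weights=True),
    # Step the learning rate down towards 2e-4 when progress plateaus.
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5,
                                         patience=2, min_lr=2e-4),
]
model.fit(x_train, y_train, validation_data=(x_val, y_val),
          batch_size=1024, epochs=50, callbacks=callbacks)
```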

Table 2. Hyperparameters of the proposed DL network.

| Hyperparameter | Specification |
| --- | --- |
| Input size | (None, 6000, 3) |
| Output size | (None, 2) |
| Parameters | 125 394 |
| Optimizer | Adam |
| Loss | KLD & binary cross-entropy |
| Activation function | Linear & LeakyReLU |
| Batch size | 1024 |
| Epochs | 50 |
| Learning rate | [2e-04, 1e-03] |

5.2 Numerical experiment results

After obtaining training samples randomly selected from TXED, we divided them into training, testing and validation sets with proportions of 0.85:0.1:0.05. To ensure we select the same data in each training session, we set a fixed random seed when selecting the training data; we then set a different random seed to choose a new data set for model inference. Similarly, we introduced an extensive testing data set extracted from a specific part of TXED to further evaluate the generalizability of the classification network. We performed five training sessions for each testing data set to make the results reliable and calculated the corresponding confusion matrix using the average values. Additionally, we compared the evaluation metrics of our proposed method with a state-of-the-art approach (Jiang et al. 2023) on the above data sets.
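A minimal numpy sketch of such a seed-controlled split; the seed value and placeholder arrays are arbitrary.

```python
import numpy as np

def seeded_split(x, y, seed=42, ratios=(0.85, 0.10, 0.05)):
    """Reproducible train/test/validation split driven by a fixed seed."""
    rng = np.random.default_rng(seed)       # fixed seed -> identical split
    idx = rng.permutation(len(x))
    n_train = int(ratios[0] * len(x))
    n_test = n_train + int(ratios[1] * len(x))
    train, test, val = idx[:n_train], idx[n_train:n_test], idx[n_test:]
    return (x[train], y[train]), (x[test], y[test]), (x[val], y[val])

x = np.random.randn(200, 6000, 3).astype("float32")  # placeholder waveforms
y = np.random.randint(0, 2, 200)                     # placeholder labels
(x_train, y_train), (x_test, y_test), (x_val, y_val) = seeded_split(x, y)
```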

Fig. 6 shows the accuracy and loss curves of one training session of the proposed DL network. From epoch 0 to epoch 10, the loss curve decreases rapidly and the Acc curve rises to around 98 per cent; after epoch 10, both curves converge gradually. We compare the classification outcomes of the proposed method with the benchmark method on the two data sets mentioned earlier, as shown in Table 3. All evaluation indicators show that our DL framework achieves more than 99 per cent classification accuracy both on randomly selected data and on the data set extracted from a fixed part of TXED, exceeding the benchmark method in both experiments. Moreover, the proposed method exhibits lower standard deviations in Acc, precision and F1-score across the multiple tests, indicating that it provides more stable and generalized predictions when the trained model is applied to unseen data.

Figure 6. Acc and loss curves of the training process. The network is trained with 10 000 signal waveforms and 10 000 noise waveforms randomly selected from the TXED. The ratio of training, testing and validation samples is 0.8:0.15:0.05.

Table 3. Classification result comparisons of different data sets using the proposed and benchmark methods. All values are in per cent.

| Method | Data set | Accuracy (mean, σ) | Precision (mean, σ) | Recall (mean, σ) | F1-score (mean, σ) |
| --- | --- | --- | --- | --- | --- |
| Proposed method | Part of TXED data | 99.78, 0.27 | 99.93, 0.03 | 99.90, 0.04 | 99.91, 0.03 |
| Proposed method | Randomly selected data | 99.83, 0.05 | 99.81, 0.03 | 99.85, 0.03 | 99.83, 0.05 |
| Benchmark method^a | Part of TXED data | 98.03, 1.21 | 96.50, 4.25 | 99.72, 0.02 | 98.07, 1.10 |
| Benchmark method^a | Randomly selected data | 98.42, 0.50 | 97.00, 1.76 | 99.94, 0.00 | 98.44, 0.47 |

a. Benchmark method of Jiang et al. (2023).

Fig. 7 shows the confusion matrices of five training sessions on the randomly selected data from TXED, which contain 10 000 signal and 10 000 noise waveforms. The number of wrong predictions on signal waveforms ranges from 3 to 14 (false negatives), and on noise waveforms from 0 to 7 (false positives), indicating that the proposed DL framework achieves high precision on TXED discrimination. We calculated the distributions of these predictions (shown in Fig. 7): the probabilities of true negatives (TN) and true positives (TP) exceed 0.98 in the model's predictions, demonstrating that the proposed method yields promising outcomes for earthquake classification. In Figs 8 and 9, we visualize eight incorrect predictions each for signal and noise waveforms (drawn from Fig. 7). For comparison, we calculated the kurtosis coefficient of each component of every 3-C time-series predicted incorrectly by the trained model; kurtosis characterizes the peakedness of a curve about its mean value. Typically, when the kurtosis value is greater than 3, the component is more likely to represent a signal; conversely, if the kurtosis value is less than 3, the component is more likely noise. By calculating the kurtosis value, we can assess whether a waveform mispredicted by the network model is trustworthy. Figs 8(a), (c), (g) and (h) show relatively low kurtosis coefficients, indicating that they suffer from noise interference; as a result, it is challenging to determine noise or signal with the trained model. The errors in Figs 8(c) and (h) are caused by label errors, whereas the incorrect predictions in Figs 8(b), (d), (e) and (f) are attributed to the limitations of the network model. For the wrong predictions on the noise 3-C time-series shown in Fig. 9, all mean kurtoses are below 3, indicating that our proposed network fails to predict these waveforms correctly. In summary, the proposed method demonstrates encouraging performance on the TXED data.

Figure 7. Confusion matrices of the discrimination results on the TXED. (a)–(e) The confusion matrices corresponding to five training sessions.

Figure 8. Visualization of incorrect signal predictions (signal predicted wrongly as noise). Note that we calculated each component's kurtosis in the 3-C waveforms and displayed them in the subfigures. From (a)–(h), the mean kurtoses of the 3-C time-series are 3.359, 28.481, 0.086, 8.511, 14.485, 26.832, 4.922 and 0.412, respectively.

Figure 9. Visualization of incorrect noise predictions (noise predicted wrongly as signal). Note that we calculated each component's kurtosis in the 3-C waveforms and displayed them in the subfigures. From (a)–(h), the mean kurtoses of the 3-C time-series are 0.072, 0.131, 0.253, 0.186, 0.208, 0.107, 0.308 and 0.093, respectively.

5.3 Multitask neural network for seismic arrival picking and classification

The tasks of event discrimination and arrival picking in earthquake data are closely interdependent: a more accurate classification model typically enhances the efficiency of P- and S-wave arrival picking. Since classification models are generally easier to develop than arrival-picking models, we designed a multitask neural network based on the classification network shown in Fig. 1 for event classification and first-arrival picking, utilizing the modified network shown in Fig. 3 to perform the P- and S-wave arrival picking. The proposed network is trained using multistation 3-C signal waveforms with a 60-s time window. All signal waveforms are band-pass filtered from 1 to 45 Hz, and each training sample is normalized by subtracting its mean. After obtaining the arrival times of the P and S waves, we use a Gaussian function to generate the labels for the P- and S-wave arrivals. We then fed 100 000 labelled training samples into the proposed network, allocating 85 per cent to the training set, 10 per cent to the testing set and 5 per cent to the validation set. The network is trained for 200 epochs using a BCE loss function, with an early stopping strategy that halts the training if the validation loss does not decrease for five consecutive epochs, and the Adam optimizer with a learning rate of 0.001. Once trained, the model is applied to predict the arrivals of the P and S waves simultaneously on the testing set. As illustrated in Fig. 10, the mean absolute error (MAE) and standard deviation ($\sigma$) for P-wave picking are 0.19 and 0.53 s, respectively; the corresponding MAE and $\sigma$ for S-wave picking are 0.23 and 0.50 s. Fig. 10 shows that the majority of prediction differences fall within ±1 s (i.e. 35 samples) for both P- and S-wave arrivals. Further evaluation demonstrates that the proposed method predicts P and S arrivals with accuracies of 95.07 and 94.54 per cent, respectively. We visualize representative picking results for the training and testing sets in Figs 11 and 12, where we take the index of the maximum of the predicted P- and S-wave probabilities as the arrival time. The dashed lines of different colours in the 3-C waveforms represent the predicted P- and S-wave arrivals, while the solid lines indicate the actual arrivals. The visualizations show that the trained model predicts the P and S waves with negligible error using the multistation waveforms.
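A minimal numpy sketch of the Gaussian label generation; the width sigma (in samples) and the example arrival indices are illustrative assumptions.

```python
import numpy as np

def gaussian_label(arrival_idx, n_samples=6000, sigma=20):
    """Gaussian-shaped picking target centred on a phase arrival
    (sample index); sigma in samples is an illustrative assumption."""
    t = np.arange(n_samples)
    return np.exp(-0.5 * ((t - arrival_idx) / sigma) ** 2)

p_label = gaussian_label(850)     # hypothetical P arrival at 8.5 s (100 Hz)
s_label = gaussian_label(1420)    # hypothetical S arrival at 14.2 s
# At inference, the arrival is recovered as np.argmax of the predicted curve.
```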

Figure 10. Error distributions for the P- and S-wave arrival picking using the testing set. (a) The error distribution of P-wave arrival picking. (b) The error distribution of S-wave arrival picking.

Figure 11. Visualization of the P- and S-wave arrival picking results on the training set. (a)–(d) Randomly selected samples from the predictions of the training set. Note that from top to bottom, each subfigure shows the 3-C waveform, the predicted probabilities of the P and S waves, and the labelled arrivals of the P and S waves, respectively.

Figure 12. Visualization of the P- and S-wave arrival picking results on the testing set. (a)–(d) Randomly selected samples from the predictions of the testing set. Note that from top to bottom, each subfigure shows the 3-C waveform, the predicted probabilities of the P and S waves, and the labelled arrivals of the P and S waves, respectively.

6 DISCUSSION

This section provides a deeper evaluation of the proposed DL model across various experiments. First, we investigate the impact of the number of MSFF modules and the composition of the training data through control experiments. Then, we explore the stability of the network when trained with fewer samples and unbalanced signal and noise waveforms. Finally, STEAD is introduced to gain further insight into the transferability of the classification network.

6.1 The influence of the number of feature fusion modules on network performance

To guide the choice of the number of MSFF modules, we compare in Fig. 13 the loss and accuracy curves of training runs with different numbers of MSFF modules, each trained for 50 epochs. As the number of epochs increases, the loss curves of all training sessions quickly converge and the corresponding training Acc curves continue to increase; both change rapidly between epochs 0 and 8 and then flatten out. We zoom in on epochs 5 to 20 for a clearer comparison. Comparing the zoomed sections of each subfigure, there is no significant difference in the convergence speed of the loss curves for different numbers of MSFF modules; we ultimately chose four MSFF modules as a good balance between convergence speed and accuracy.

Figure 13. A comparison of classification performance using different numbers of MSFF modules. A zoomed-in section is provided in each subfigure for better comparison.

6.2 Training with unbalanced data

We have demonstrated the effectiveness of the proposed method by training with balanced data sets. Here, we perform several experiments on unbalanced data sets to explore the robustness of the network. Additionally, all hyperparameters are consistent with those used for the aforementioned classification experiments shown in Table 2.

The first experiment was conducted with 10 000 signal waveforms and fewer noise waveforms (see Fig. 14a), with signal-to-noise waveform proportions of 1:1, 2:1, 3:1, 4:1 and 5:1. The Acc and loss curves of the five training processes converge quickly within the first 20 epochs. However, when the signal-to-noise waveform ratio is 5:1, the Acc curve rises from approximately 0.6 to 0.9 and the loss curve drops from 0.07 to 0.005 over roughly 15 epochs; the other ratios converge more quickly. The descent rate of the loss curve matches the ascent rate of the accuracy curve, indicating the model's effectiveness during training. Comparing convergence times, the 1:1 ratio of signal and noise waveforms converges fastest. Figs 15(a)–(d) show the confusion matrices for different ratios of signal and noise waveforms in the validation sets. The proposed network achieves an accuracy above 97 per cent even when the data set is very unbalanced.

Figure 14. Classification performance comparison with unbalanced training samples. (a) The impact on Acc and loss curves using different proportions of signal and noise waveforms (num. of signal waveforms $>$ num. of noise waveforms). (b) The impact on Acc and loss curves using different proportions of signal and noise waveforms (num. of signal waveforms $<$ num. of noise waveforms).

Figure 15. Confusion matrices of the discrimination results (tested in Fig. 14) with unbalanced training samples. (a)–(d) Confusion matrices for signal-to-noise waveform ratios of 1:5, 2:5, 3:5 and 4:5 in the training samples, respectively. (e)–(h) Confusion matrices for signal-to-noise waveform ratios of 5:1, 5:2, 5:3 and 5:4 in the training samples, respectively.

Fig. 14(b) shows the Acc and loss curves using 10 000 noise waveforms and different numbers of signal waveforms as training samples, with the same set of ratios as in the previous experiment. When the ratio is 1:1, the Acc and loss curves converge fastest. However, the network obtains similar evaluation curves at a proportion of 1:2, which differs from the previous experiment. At a ratio of 1:5, the Acc and loss curves are worse than in the other runs and take the longest to converge. Across these experiments, only a few wrong predictions appear (see Figs 15e–h), demonstrating that the proposed network performs well on such challenging tasks.

We further explore the network's robustness by training it with fewer samples and with unbalanced data. The results are shown in Figs 16(a) and (b). When the numbers of signal and noise waveforms are 4000 and 5000, respectively, the Acc and loss curves are almost identical, indicating that the proposed method can achieve strong performance on balanced data with 8000 training samples. However, the network's performance degrades as the number of training samples drops from 6000 to 2000 in the balanced setting. Fig. 16(b) demonstrates that the trained model still achieves an accuracy of 95.43 per cent at a signal-to-noise waveform ratio of 5:1, despite the small and extremely unbalanced training set.

Figure 16. Comparisons of Acc and loss curves using different training samples. (a) The impact on the Acc and loss curves of different numbers of training data, where ts stands for training samples. (b) With fewer waveforms in the training set, the impact on the Acc and loss curves of different proportions of signal and noise waveforms (number of signal waveforms > number of noise waveforms).

6.3 Transfer learning to the STEAD

We have trained the proposed DL model on the TXED data set, achieving high-precision classification. For transfer learning, we froze all layers of the pre-trained model except the last two and then trained the remaining fully connected layers on the STEAD. The hyperparameters in the training process are set according to Table 2. To compare waveform classification performance before and after transfer learning, we first applied the TXED pre-trained model directly to the STEAD signal detection task: the accuracy, precision, recall and F1-score are 90.63 per cent, 98.14 per cent, 90.61 per cent and 94.20 per cent, respectively. After transfer learning, the average classification metrics on the STEAD improve to 95.03 per cent, 96.53 per cent, 93.70 per cent and 95.14 per cent, respectively. These results indicate that, with transfer learning, the proposed network achieves higher signal and noise classification accuracy, demonstrating that the DL architecture generalizes better once the transfer learning strategy is introduced.
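
The freezing strategy itself is straightforward to express in code. Below is a minimal sketch assuming TensorFlow/Keras; the model file name, the `stead_x`/`stead_y` arrays and the optimizer settings are illustrative placeholders, not part of the released code.

```python
import tensorflow as tf

# Load the model pre-trained on TXED (file name is illustrative).
model = tf.keras.models.load_model('txed_classifier.h5')

# Freeze every layer except the last two, keeping the TXED feature
# extractor fixed, as described above.
for layer in model.layers[:-2]:
    layer.trainable = False

# Fine-tune the remaining layers on STEAD waveforms and labels.
# stead_x and stead_y are assumed NumPy arrays; the hyperparameter
# values here are placeholders standing in for Table 2.
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(stead_x, stead_y, validation_split=0.1, epochs=50, batch_size=64)
```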

6.4 Weight matrix visualization

Here, we present visualizations of feature maps from different Conv1-D and Add layers. Visualizing these feature maps gives a deeper view of the feature information that a specific filter extracts from the input 3-C waveform data. In Fig. 17, each subfigure shows 14 feature maps corresponding to the Conv1-D layers in the MSFF blocks and the outputs of the hybrid attention modules of the proposed DL architecture. In the first MSFF block, the input 3-C time-series is transformed into earthquake-like waveforms by the three Conv1-D layers with different kernel sizes. The feature maps in the second MSFF module capture the waveform features in more detail. The feature maps in the third MSFF block and the second hybrid attention module contain more pronounced first-arrival waveform information, which helps the classifier achieve better waveform detection performance. Figs 17(a) and (b) show that the proposed DL architecture can capture decisive waveform characteristics even from a noisy input 3-C record, while Figs 17(c) and (d) demonstrate that the network easily extracts waveform features from a record with a relatively high SNR.
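
Such feature-map visualizations can be generated with a probe model that exposes intermediate outputs. The sketch below assumes a trained Keras `model` in scope and uses illustrative layer names (the real names can be listed with `model.summary()`); it is not the plotting code used for Fig. 17.

```python
import matplotlib.pyplot as plt
import tensorflow as tf

# Build a probe model that returns the outputs of selected layers.
layer_names = ['conv1d_2', 'add_1']          # illustrative layer names
probe = tf.keras.Model(inputs=model.input,
                       outputs=[model.get_layer(n).output for n in layer_names])

# waveform is one 3-C record shaped (n_samples, 3); add a batch axis.
feature_maps = probe.predict(waveform[None, ...])

for fmap, name in zip(feature_maps, layer_names):
    plt.figure()
    plt.title(name)
    # Plot a few channels of the feature map against sample index.
    for ch in range(min(4, fmap.shape[-1])):
        plt.plot(fmap[0, :, ch])
plt.show()
```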

Figure 17. Feature maps of different Conv1-D and Add layers for randomly selected waveforms. (a) and (b) show the feature maps of the 574th and 5336th noise waveforms in the testing set. (c) and (d) display the feature maps of the 742nd and 2867th signal waveforms in the testing set. Note that the length of the feature maps of each layer is plotted to the left of each subfigure.

6.5 The role of hybrid attention module

The hybrid attention module, which combines spatial and channel attention to enhance the feature extraction capability of the convolutional neural network, is employed by the proposed method to improve the classification and arrival-picking accuracy for 3-C earthquake data. To quantify its role in the earthquake classification task, we conducted an ablation experiment. Table 4 presents the classification results with and without the attention blocks. The results indicate that incorporating the hybrid attention modules improves the accuracy, precision and F1-score of the proposed method, owing to the enhanced feature extraction capability of the attention mechanism.
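
The (mean, σ) entries in Table 4 come from repeating each training run and aggregating the metrics. A minimal sketch of that bookkeeping, assuming scikit-learn, is given below; the function and variable names are ours, not from the released code.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

def summarize_runs(y_true_runs, y_pred_runs):
    """Report mean and standard deviation of the four metrics over
    repeated training runs (inputs are lists of 0/1 label arrays)."""
    metrics = {'Accuracy': accuracy_score, 'Precision': precision_score,
               'Recall': recall_score, 'F1-score': f1_score}
    for name, fn in metrics.items():
        vals = [fn(t, p) for t, p in zip(y_true_runs, y_pred_runs)]
        print(f'{name}: mean={100 * np.mean(vals):.2f} per cent, '
              f'sigma={100 * np.std(vals):.2f} per cent')
```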

Table 4. Ablation experiment on the role of the hybrid attention module. Metrics are reported as (mean, σ) in per cent.

                    Data set                  Accuracy       Precision      Recall         F1-score
With attention      Randomly selected data    99.83, 0.05    99.81, 0.03    99.85, 0.03    99.83, 0.05
Without attention   Randomly selected data    99.01, 0.01    98.17, 0.03    99.89, 0.00    99.23, 0.10

7 CONCLUSION

In this paper, we proposed a generalized neural network for automatic earthquake detection. The proposed method contains a feature extractor, a classifier and two hybrid attention modules. The input of the network is a 3-C time-series, and the output consists of probabilities corresponding to signal and noise. Additionally, by incorporating decoder layers into the classification network, we built a multitask deep learning framework for P- and S-wave arrival picking. We explored how the number of MSFF modules affects the network's training performance and settled on four MSFF modules as the feature extractor. The classifier accurately separates the nonlinear features extracted by the feature extractor in the feature space. The hybrid attention mechanism effectively guides the network to focus on waveform features of the time-series, with only a slight increase in parameters. Extensive numerical experiments on TXED demonstrate that our network achieves classification accuracy exceeding 99 per cent for both signal and noise with balanced training samples. Furthermore, even with a limited number of highly unbalanced training samples, it still discriminates earthquake signal and noise waveforms well. We successfully transferred the pre-trained model to the STEAD by employing a transfer learning strategy, achieving commendable results. In summary, our proposed DL framework leverages multiscale feature extraction and hybrid attention mechanisms to achieve high-precision classification between earthquake signals and noise, exhibiting strong generalization across diverse data sets.

ACKNOWLEDGMENTS

We would like to express our gratitude to four reviewers and the editor for their constructive suggestions, which have greatly helped to improve the quality of this paper. We also extend our thanks to the providers of the open-source data. This research was supported by the National Natural Science Foundation of China (grant nos. 42174159 and 41904110), the Natural Science Foundation of Hubei Province (grant no. 2021CFB498), and the Open Fund of Key Laboratory of Exploration Technologies for Oil and Gas Resources (Yangtze University), Ministry of Education (grant no. KPI2021-01).

DATA AVAILABILITY

The codes related to this paper are available via https://github.com/cuiyang512/Multi-task-EQDetection. The TXED data set can be downloaded from https://github.com/chenyk1990/txed. The DEMO scripts to extract signal and noise waveforms can be found at https://github.com/chenyk1990/txed/tree/main/demos. The STEAD data set is available from https://github.com/smousavi05/STEAD.

REFERENCES

Beyreuther M., Wassermann J., 2008. Continuous earthquake detection and classification using discrete hidden Markov models, Geophys. J. Int., 175(3), 1055–1066.
Chen Y., Fomel S., 2015. Random noise attenuation using local signal-and-noise orthogonalization, Geophysics, 80(6), WD1–WD9.
Chen Y., Mai Y., Xiao J., Zhang L., 2019. Improving the antinoise ability of DNNs via a bio-inspired noise adaptive activation function rand softplus, Neural Comput., 31(6), 1215–1233.
Chen Y. et al., 2024. TXED: the Texas Earthquake Dataset for AI, Seismol. Res. Lett., 95(3), 2013–2022.
Cheng Y., Li Y., Zhang C., 2019. First-break picking for microseismic data based on cascading use of shearlet and Stockwell transforms, Geophys. Prospect., 67(1), 85–96.
Chin T.-L., Huang C.-Y., Shen S.-H., Tsai Y.-C., Hu Y.H., Wu Y.-M., 2019. Learn to detect: improving the accuracy of earthquake detection, IEEE Trans. Geosci. Remote Sens., 57(11), 8867–8878.
Chung J., Gulcehre C., Cho K., Bengio Y., 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling, preprint (arXiv:1412.3555).
Duan Y., Shen Y., Canbulat I., Luo X., Si G., 2021. Classification of clustered microseismic events in a coal mine using machine learning, J. Rock Mech. Geotech. Eng., 13(6), 1256–1273.
Giacco F., Esposito A., Scarpetta S., Giudicepietro F., Marinaro M., 2009. Support vector machines and MLP for automatic classification of seismic signals at Stromboli volcano, in Proc. 19th Italian Workshop on Neural Nets, pp. 116–123, eds Apolloni B., Bassis S., Morabito C.F., IOS Press.
Griffin D., Lim J., 1984. Signal estimation from modified short-time Fourier transform, IEEE Trans. Acoust. Speech Signal Process., 32(2), 236–243.
Gulia L., Wiemer S., 2019. Real-time discrimination of earthquake foreshocks and aftershocks, Nature, 574(7777), 193–199.
Huang G., Liu Z., Van Der Maaten L., Weinberger K.Q., 2017. Densely connected convolutional networks, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Honolulu, HI, USA, pp. 4700–4708.
Jiang J., Stankovic V., Stankovic L., Parastatidis E., Pytharouli S., 2023. Microseismic event classification with time-, frequency-, and wavelet-domain convolutional neural networks, IEEE Trans. Geosci. Remote Sens., 61, 1–14.
Kim S., Lee K., You K., 2020. Seismic discrimination between earthquakes and explosions using support vector machine, Sensors, 20(7), 1879, doi:10.3390/s20071879.
Kingma D.P., Ba J., 2014. Adam: a method for stochastic optimization, preprint (arXiv:1412.6980).
Kong Q., Allen R.M., Schreier L., Kwon Y.-W., 2016. MyShake: a smartphone seismic network for earthquake early warning and beyond, Sci. Adv., 2(2), e1501055, doi:10.1126/sciadv.1501055.
Ku B., Min J., Ahn J.-K., Lee J., Ko H., 2020. Earthquake event classification using multitasking deep learning, IEEE Geosci. Remote Sens. Lett., 18(7), 1149–1153.
Linville L., Pankow K., Draelos T., 2019. Deep learning models augment analyst decisions for event discrimination, Geophys. Res. Lett., 46(7), 3643–3651.
Liu G., Li C., Rao Y., Chen X., 2020. Oriented pre-stack inverse Q filtering for resolution enhancements of seismic data, Geophys. J. Int., 223(1), 488–501.
Liu L., Song W., Zeng C., Yang X., 2021. Microseismic event detection and classification based on convolutional neural network, J. Appl. Geophys., 192, 104 380, doi:10.1016/j.jappgeo.2021.104380.
Ma C. et al., 2023. Fine classification method for massive microseismic signals based on short-time Fourier transform and deep learning, Remote Sens., 15(2), 502, doi:10.3390/rs15020502.
McBrearty I.W., Beroza G.C., 2022. Earthquake location and magnitude estimation with graph neural networks, in 2022 IEEE International Conference on Image Processing (ICIP), IEEE, pp. 3858–3862.
Mousavi S.M., Sheng Y., Zhu W., Beroza G.C., 2019a. STanford EArthquake Dataset (STEAD): a global data set of seismic signals for AI, IEEE Access, 7, 179 464–179 476.
Mousavi S.M., Zhu W., Ellsworth W., Beroza G., 2019b. Unsupervised clustering of seismic signals using deep convolutional autoencoders, IEEE Geosci. Remote Sens. Lett., 16(11), 1693–1697.
Mousavi S.M., Zhu W., Sheng Y., Beroza G.C., 2019c. CRED: a deep residual network of convolutional and recurrent units for earthquake signal detection, Sci. Rep., 9(1), 10267, doi:10.1038/s41598-019-45748-1.
Mousavi S.M., Ellsworth W.L., Zhu W., Chuang L.Y., Beroza G.C., 2020. Earthquake transformer—an attentive deep-learning model for simultaneous earthquake detection and phase picking, Nat. Commun., 11(1), 3952, doi:10.1038/s41467-020-17591-w.
Provost F., Hibert C., Malet J.-P., 2017. Automatic classification of endogenous landslide seismicity using the random forest supervised classifier, Geophys. Res. Lett., 44(1), 113–120.
Rioul O., Flandrin P., 1992. Time-scale energy distributions: a general class extending wavelet transforms, IEEE Trans. Signal Process., 40(7), 1746–1757.
Saad O.M., Hafez A.G., Soliman M.S., 2020. Deep learning approach for earthquake parameters classification in earthquake early warning system, IEEE Geosci. Remote Sens. Lett., 18(7), 1293–1297.
Saad O.M., Huang G., Chen Y., Savvaidis A., Fomel S., Pham N., Chen Y., 2021. SCALODEEP: a highly generalized deep learning framework for real-time earthquake detection, J. Geophys. Res.: Solid Earth, 126(4), e2020JB021473, doi:10.1029/2020JB021473.
Saad O.M., Chen Y., Savvaidis A., Fomel S., Chen Y., 2022a. Real-time earthquake detection and magnitude estimation using vision transformer, J. Geophys. Res.: Solid Earth, 127(5), e2021JB023657, doi:10.1029/2021JB023657.
Saad O.M., Soliman M.S., Chen Y., Amin A.A., Abdelhafiez H., 2022b. Discriminating earthquakes from quarry blasts using capsule neural network, IEEE Geosci. Remote Sens. Lett., 19, 1–5.
Saad O.M. et al., 2023a. Earthquake forecasting using big data and artificial intelligence: a 30-week real-time case study in China, Bull. seism. Soc. Am., 113(6), 2461–2478.
Saad O.M. et al., 2023b. EQCCT: a production-ready earthquake detection and phase-picking method using the compact convolutional transformer, IEEE Trans. Geosci. Remote Sens., 61, 1–15.
Seydoux L., Balestriero R., Poli P., Hoop M.D., Campillo M., Baraniuk R., 2020. Clustering earthquake signals and background noises in continuous seismic data with unsupervised deep learning, Nat. Commun., 11(1), 3972, doi:10.1038/s41467-020-17841-x.
Shakeel M., Itoyama K., Nishida K., Nakadai K., 2021. EMC: earthquake magnitudes classification on seismic signals via convolutional recurrent networks, in 2021 IEEE/SICE International Symposium on System Integration (SII), IEEE, pp. 388–393.
Sharma A., Nanda S.J., 2020. Timely detection of seismic waves in ground motion data using improved S-transform, in 2020 5th IEEE International Conference on Recent Advances and Innovations in Engineering (ICRAIE), IEEE, pp. 1–9.
Simonyan K., Zisserman A., 2014. Very deep convolutional networks for large-scale image recognition, preprint (arXiv:1409.1556).
Tibi R., Linville L., Young C., Brogan R., 2019. Classification of local seismic events in the Utah region: a comparison of amplitude ratio methods with a spectrogram-based machine learning approach, Bull. seism. Soc. Am., 109(6), 2532–2544.
Torrey L., Shavlik J., 2010. Transfer learning, in Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques, IGI Global, pp. 242–264.
Trani L., Pagani G.A., Zanetti J.P.P., Chapeland C., Evers L., 2022. DeepQuake—an application of CNN for seismo-acoustic event classification in The Netherlands, Comput. Geosci., 159, 104 980.
Vaezi Y., Van der Baan M., 2015. Comparison of the STA/LTA and power spectral density methods for microseismic event detection, Geophys. Suppl. MNRAS, 203(3), 1896–1908.
Wang T., Bian Y., Zhang Y., Hou X., 2023. Classification of earthquakes, explosions and mining-induced earthquakes based on XGBoost algorithm, Comput. Geosci., 170, 105 242, doi:10.1016/j.cageo.2022.105242.
Wang Z., Liu G., Du J., Li C., Qi J., 2022. Low-frequency extrapolation of prestack viscoacoustic seismic data based on dense convolutional network, IEEE Trans. Geosci. Remote Sens., 60, 1–13.
Wenner M., Hibert C., van Herwijnen A., Meier L., Walter F., 2021. Near-real-time automated classification of seismic signals of slope failures with continuous random forests, Nat. Hazards Earth Syst. Sci., 21(1), 339–361.
Woo S., Park J., Lee J.-Y., Kweon I.S., 2018. CBAM: convolutional block attention module, in Proceedings of the European Conference on Computer Vision (ECCV), pp. 3–19.
Xie J., Girshick R., Farhadi A., 2016. Unsupervised deep embedding for clustering analysis, in International Conference on Machine Learning, PMLR, pp. 478–487.
Zhang M., Ellsworth W.L., Beroza G.C., 2019. Rapid earthquake association and location, Seismol. Res. Lett., 90(6), 2276–2284.
Zhang X., Reichard-Flynn W., Zhang M., Hirn M., Lin Y., 2022. Spatiotemporal graph convolutional networks for earthquake source characterization, J. Geophys. Res.: Solid Earth, 127(11), e2022JB024401, doi:10.1029/2022JB024401.
Zhu W., Beroza G.C., 2019. PhaseNet: a deep-neural-network-based seismic arrival-time picking method, Geophys. J. Int., 216(1), 261–273.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.