ABSTRACT

The Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) has acquired tens of millions of low-resolution stellar spectra. This paper investigates the parameter estimation problem for these spectra. To this end, we propose the deep learning model StarGRU network (StarGRUNet). This network is applied to estimate the stellar atmospheric physical parameters and 13 elemental abundances from LAMOST low-resolution spectra. On the spectra with signal-to-noise ratios greater than or equal to 5, the estimation precisions are 94 K and 0.16 dex on $T_\texttt{eff}$ and log g respectively, 0.07 to 0.10 dex on [C/H], [Mg/H], [Al/H], [Si/H], [Ca/H], [Ni/H] and [Fe/H], 0.10 to 0.16 dex on [O/H], [S/H], [K/H], [Ti/H] and [Mn/H], and 0.18 and 0.22 dex on [N/H] and [Cr/H]. The model shows advantages over other available models and high consistency with high-resolution surveys. We released the estimated catalogue computed from about 8.21 million low-resolution spectra in LAMOST DR8, together with the code, trained models, and experimental data, for astronomical science exploration and data processing algorithm research.

1 INTRODUCTION

In recent years, a series of large-scale sky survey programs have been conducted to acquire large numbers of stellar spectra, such as the Apache Point Observatory Galactic Evolution Experiment (APOGEE; Prieto et al. 2010), the Galactic Archaeology with HERMES Survey (GALAH; De Silva et al. 2015), the Large Sky Area Multi-Object Fibre Spectroscopic Telescope (LAMOST) Experiment for Galactic Understanding and Exploration (LEGUE; Deng et al. 2012; Zhao et al. 2012), the Gaia-ESO Public Spectroscopic Survey (Gaia-ESO; Gilmore et al. 2012), the Sloan Extension for Galactic Understanding and Exploration (SEGUE; Yanny et al. 2009), and the RAdial Velocity Experiment (RAVE; Steinmetz et al. 2006). Stellar spectra carry rich information about celestial objects, such as their motions, atmospheric physical parameters, and elemental abundances, which can be used to explore stellar evolution, galaxy dynamics, and related questions. Therefore, the estimation of stellar atmospheric physical parameters and elemental abundances from spectra is vital in large-scale spectroscopic surveys.

Large-scale low-resolution and medium-resolution surveys are typically characterized by a massive amount of data, a large fraction of spectra with relatively low signal-to-noise ratios, and an extensive range of data quality. These difficulties challenge the computational efficiency of traditional spectral parameter estimation methods and their robustness to spectral quality. Therefore, spectral parameter estimation research based on machine learning has attracted much attention (Li et al. 2015; Xiang et al. 2016; Bu & Pan 2018; Zhang, Liu & Deng 2020; Xiang et al. 2021). The basic idea of such methods is to represent the parameter estimation problem as a mapping from spectral feature information to the parameters being estimated. The model parameters of this mapping are determined from a batch of empirical data: observed spectra whose parameters are known. These parameters are usually derived from high-quality, high-resolution spectra with the equivalent-width method or by analysing the absorption lines of chemical elements (Jofré, Heiter & Soubiran 2019).

Traditional machine learning methods for estimating stellar spectral parameters usually consist of two key procedures: feature extraction and mapping learning. The feature extraction procedure learns an appropriate representation for stellar spectra. The spectral feature representation not only determines the interpretability and accuracy limits of the parameter estimation model, but also affects the learning difficulty of the mapping relationship (Li et al. 2014, 2015). The mapping learning procedure provides the mapping relationship from the spectral information to the parameters to be estimated. Typical feature extraction methods are wavelet analysis and wavelet packet decomposition (Li et al. 2015), auto-encoder neural networks (Yang & Li 2015), the Least Absolute Shrinkage and Selection Operator (LASSO; Li et al. 2014), principal component analysis (PCA; Bu & Pan 2018), and kernel-based principal component analysis (KPCA; Xiang et al. 2016). The commonly used machine learning methods in stellar spectral parameter estimation are support vector machines (Li et al. 2014; Zhang et al. 2020), linear regression (Li et al. 2015), Gaussian process regression (Bu & Pan 2018), and neural networks (Li et al. 2014). The limitation of the traditional machine learning scheme is that the feature learning and mapping learning are performed as two separate procedures. This separation complicates the design of such schemes and leaves room for improvement in parameter estimation performance.

With the advent of artificial intelligence and the big data era, deep learning methods have become the dominant methods for estimating parameters from stellar spectra, such as StarNet (Fabbro et al. 2018; Zhang et al. 2019), AstroNN (Leung & Bovy 2018), SPCANet (Wang et al. 2020), and so on. These methods combine feature learning and mapping learning into a single procedure by utilizing neural networks. The procedure combination simplifies the design of the parameter estimation scheme and improves prediction performance. Therefore, neural networks promote research into and the application of the spectral parameter estimation of stars.

This paper investigates the problem of estimating the atmospheric physical parameters and elemental abundances of stars from LAMOST low-resolution spectra. LAMOST, also referred to as the Guo Shoujing Telescope, is located at the Xinglong National Astronomical Observatory in Hebei, China. It is a distinctive reflecting Schmidt telescope with 4000 optical fibres on the focal plane, which can simultaneously observe up to 4000 targets in a field of view of 20 square degrees. Since 2015, LAMOST has released several versions of data, from DR1 to DR8, of which DR8 is the latest. LAMOST DR8 consists of 11 214 076 low-resolution stellar spectra covering a wavelength range of 3690–9100 Å, with a resolution of about 1800 at 5500 Å.

In order to estimate the parameters from LAMOST low-resolution stellar spectra, a series of studies have been carried out, ranging from traditional machine learning schemes to deep learning solutions. Some representative studies based on traditional machine learning schemes are KPCA (Xiang et al. 2016), The Cannon (Ting et al. 2017; Ho et al. 2017), SLAM (Zhang et al. 2020), SCDD (Xiang et al. 2021), and LASSO-MLPNet (Li et al. 2022a,b). Some typical investigations of deep learning methods are GSN (Rui et al. 2019a), StarNet (Zhang et al. 2019), DD-Payne (Xiang et al. 2019), HotPayne (Xiang et al. 2022), astroNN (Li et al. 2022c), and Coord-DenseNet (Cai et al. 2023). With the increase of data volume and the development of artificial intelligence methods, deep learning methods have been more and more widely applied to estimate stellar parameters from low-resolution spectra. However, among the methods for estimating stellar parameters from the low-resolution spectra of LAMOST DR8, Wang et al. (2022) focused only on spectra with higher signal-to-noise ratios (S/NLAMOST > 80 and S/NAPOGEE > 70), and Li et al. (2022a) and Li et al. (2022b) estimated $T_\texttt{eff}$, log g, and [Fe/H] from spectra with 20 ≤ S/NLAMOST ≤ 30 and 5 ≤ S/NLAMOST ≤ 80, respectively. These constraints led to a very limited number of samples in the reference set. On the other hand, Li et al. (2022c) estimated stellar parameters only for giant stars, and Cai et al. (2023) predicted only the lithium abundance for some giant-star spectra; these works therefore estimate only a few parameters, and the processed spectra are only a small fraction of the observed data. In contrast, our study covers a broader range of spectral signal-to-noise ratios (S/NLAMOST ≥ 5), estimates a wider variety of parameters (16), and covers a larger number of stellar spectra (about 8.21 million).

To determine the stellar parameters (effective temperature, surface gravity, and metal abundance) for the vast amount of LAMOST spectral data, researchers developed the LAMOST Stellar Parameter Pipeline (LASP; Luo et al. 2015). LASP provides parameter estimates for LAMOST DR8 low-resolution spectra using the ELODIE spectral library as templates and a χ2 minimization method based on the University of Lyon Spectroscopic analysis Software (ULySS) procedure (Wu et al. 2011). However, our preliminary study shows that the precision of the LASP estimates decreases rapidly with the decline of the spectral signal-to-noise ratio (S/N). In the cases of 5 ≤ S/Ng < 8, 8 ≤ S/Ng < 10, 10 ≤ S/Ng < 20, and 20 ≤ S/Ng < 30, the mean absolute errors (MAEs) of LASP are 178.5, 179.0, 146.3, and 135.3 K on $T_\texttt{eff}$; 0.446, 0.356, 0.256, and 0.217 dex on log g; and 0.148, 0.150, 0.107, and 0.087 dex on [Fe/H], while the standard deviations of error (σ) are 257.8, 292.2, 200.6, and 176.5 K on $T_\texttt{eff}$; 0.635, 0.534, 0.372, and 0.325 dex on log g; and 0.194, 0.211, 0.157, and 0.125 dex on [Fe/H]. Li et al. (2022a,b) conducted some investigations and improved the precision of the parameter estimates compared with LASP (Fig. 1). However, Li et al. (2022a,b) and LASP are limited to estimating three stellar atmospheric physical parameters, namely $T_\texttt{eff}$, log g, and [Fe/H]. Therefore, this paper focuses on further improving the precision of stellar atmospheric physical parameter estimation while also investigating the measurement of 13 more elemental abundances ([C/H], [Mg/H], [Al/H], [Si/H], [Ca/H], [N/H], [O/H], [S/H], [Ti/H], [Cr/H], [Mn/H], [Ni/H], and [K/H]).

Figure 1. Parameter estimation situations of LAMOST low-resolution spectra: the dependences of the MAE of the estimations from LASP (Luo et al. 2015) and LASSO-MLPNet (Li et al. 2022b) on the SNR. MAE, mean absolute error; σ, standard deviation of error; LASP, LAMOST Stellar Parameter Pipeline.

The neural networks proposed in this paper are implemented in TensorFlow. The full project and its documentation are available at http://doi.org/10.12149/101216. This project includes the estimated catalogue computed from about 8.21 million low-resolution spectra in LAMOST DR8, code, trained models, and experimental data for astronomical science exploration and data processing algorithm research. The project documentation provides a detailed description of the overall project architecture.

The remainder of this paper is organized as follows. Section 2 presents the data used in this paper; Section 3 describes the proposed methodology, its evaluation, and model uncertainty analysis; Section 4 gives our application results on approximately 8.21 million low-resolution spectra from LAMOST; Section 5 provides some conclusions.

2 REFERENCE DATA SETS AND THEIR PRE-PROCESSING

The proposed scheme in this paper is a machine learning method. This kind of method needs a reference data set (referred to as a reference set). The reference set is used for learning the model parameters of the mapping from the spectral information to the stellar parameters to be estimated. The model parameters refer to the configurations of a machine learning model, for example, the connection weights in a neural network or a deep learning model; learning the model parameters means estimating them from the reference set. Therefore, the reference set is a knowledge carrier for the stellar parameter estimation problem, consisting of the observed spectra and their stellar atmospheric physical parameters and elemental abundances. The observed spectra in the reference set are obtained from the LAMOST DR8 low-resolution spectral library. The stellar atmospheric parameters and elemental abundances of the observed spectra are obtained from the APOGEE DR17 catalogue. The spectral parameters estimated in this work include the effective temperature $T_\texttt{eff}$, surface gravity log g, and 14 elemental abundances [X/H] (where X refers to C, N, O, Mg, Al, Si, S, K, Ca, Ti, Cr, Mn, Fe, Ni).

2.1 APOGEE and APOGEE DR17 catalogue

LAMOST spectra have a low resolution, and the SNR of a large fraction of them is below 30. These characteristics result in there being considerable room for improvement in the estimation precision of the LASP estimation from LAMOST spectra. Moreover, LASP does not give abundance estimates for elements other than [Fe/H]. One possible solution is to transfer parameter information from other high-resolution and high-quality survey spectral libraries to the LAMOST spectral library based on the spectra from common sources.

The Apache Point Observatory Galactic Evolution Experiment (APOGEE; Prieto et al. 2010) is a high-resolution infrared sky survey based on the Sloan telescope, with a band coverage from 1.51 to 1.70 μm. ASPCAP (the APOGEE Stellar Parameter and Chemical Abundances Pipeline) gives estimates of $T_\texttt{eff}$, log g, and chemical elemental abundances for APOGEE spectra. The APOGEE DR17 catalogue provides the atmospheric parameters ($T_\texttt{eff}$, log g, [Fe/H]) and elemental abundances for 475 144 stars. The ranges of the stellar atmospheric parameters in the APOGEE DR17 catalogue are [3500, 7000] K for $T_\texttt{eff}$, [−0.5, 5] dex for log g, and [−2.0, 0.5] dex for [Fe/H].

This paper thus builds a reference data set by cross-matching the APOGEE DR17 catalogue and the LAMOST DR8 low-resolution spectral library. Each sample in this data set consists of one LAMOST low-resolution spectrum and the parameter estimates from the common-source observation in APOGEE DR17. The final reference set consists of 240 448 observed spectra and their corresponding stellar parameters. The spectral parameters explored in this paper include the stellar atmospheric physical parameters $T_\texttt{eff}$, log g, [Fe/H], and 13 elemental abundances [X/H], where X refers to C, N, O, Mg, Al, Si, S, K, Ca, Ti, Cr, Mn, and Ni.

It is shown that the effective features for parameter estimation differ between low-S/N spectra and high-S/N spectra. In order to increase the parameter estimation performance, the reference set is further divided into two subsets, SlS/N and ShS/N, based on the signal-to-noise ratio criteria 5 ≤ S/Ng ≤ 50 and S/Ng > 50. The sample sizes of these two reference subsets are 96 200 and 144 248, respectively. We randomly divide SlS/N into a training set $S^{\rm lS/N}_{\rm tr}$, a validation set $S^{\rm lS/N}_{\rm val}$, and a test set $S^{\rm lS/N}_{\rm te}$ in the ratio 7:1:2, with sample sizes of 67 340, 9620, and 19 240, respectively. These three sets are used for training, hyperparameter selection, and performance evaluation of the parameter estimation model for the spectra with low S/Ng. Similarly, we randomly divide ShS/N into three subsets $S^{\rm hS/N}_{\rm tr}$, $S^{\rm hS/N}_{\rm val}$, and $S^{\rm hS/N}_{\rm te}$, with sample sizes of 100 973, 14 425, and 28 850, used for training, hyperparameter selection, and performance evaluation of the model for the spectra with high S/Ng. Using these data sets, we can detect the appropriate spectral features for low-S/N and high-S/N spectra, respectively.
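For concreteness, the 7:1:2 split can be sketched as below; the random seed is arbitrary, and the resulting subset sizes (67 340, 9620, and 19 240) match those quoted above.

```python
import numpy as np

# Randomly split the 96 200 low-S/N reference samples into
# training, validation, and test indices in the ratio 7:1:2.
n = 96_200
rng = np.random.default_rng(0)          # arbitrary seed for reproducibility
idx = rng.permutation(n)
n_tr, n_val = n * 7 // 10, n // 10      # 67 340 and 9620 samples
idx_tr = idx[:n_tr]
idx_val = idx[n_tr:n_tr + n_val]
idx_te = idx[n_tr + n_val:]             # the remaining 19 240 samples
```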

2.2 Data pre-processing

The observed spectra are negatively affected by many factors, such as redshift, noise, and skylight. These factors can decrease the precision and stability of parameter estimation and increase the demand for reference data (Xiong, Li & Liao 2022). Therefore, the stellar spectral data must be pre-processed before being input into the parameter estimation model. The specific pre-processing steps are as follows.

2.2.1 Wavelength correction

We used the radial velocity (RV) for wavelength correction to move each spectrum to its rest frame:

$\lambda^{\prime} = \dfrac{\lambda}{1 + RV/c}$,  (1)

where λ′, λ, c, and RV respectively denote the corrected wavelength, the original wavelength, the speed of light, and the radial velocity. In this paper, the wavelength correction is performed using the radial velocity estimates given by the official LAMOST stellar parameter estimation pipeline (LASP).
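As a sketch, equation (1) can be applied per spectrum as follows, assuming the LASP radial velocity is given in km s−1:

```python
import numpy as np

C_KM_S = 299_792.458  # speed of light in km s^-1

def to_rest_frame(wavelength, rv_km_s):
    """Shift observed wavelengths to the rest frame via equation (1):
    lambda' = lambda / (1 + RV/c)."""
    return np.asarray(wavelength, dtype=float) / (1.0 + rv_km_s / C_KM_S)
```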

2.2.2 Linear interpolation resampling

We utilized the maximum common wavelength ranges [3841 Å, 5699 Å] and [5901 Å, 8798 Å] for the blue end and red end of all spectra, respectively. Based on these common wavelength ranges, we resampled each spectrum using linear interpolation with a resampling step size of 0.0001 dex in logarithmic wavelength space.
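A sketch of this resampling step with NumPy; the grid endpoints in the usage comment are the common-range limits quoted above.

```python
import numpy as np

def resample_log_linear(wavelength, flux, lam_min, lam_max, step_dex=1e-4):
    """Linearly interpolate a spectrum onto a grid that is uniform in
    log10(wavelength) with a step of 0.0001 dex."""
    log_grid = np.arange(np.log10(lam_min), np.log10(lam_max), step_dex)
    new_wavelength = 10.0 ** log_grid
    return new_wavelength, np.interp(new_wavelength, wavelength, flux)

# Applied separately to the two common ranges, e.g. the blue end:
# wave_b, flux_b = resample_log_linear(wave, flux, 3841.0, 5699.0)
```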

2.2.3 Denoising

The observed spectra are usually contaminated with bad pixels and impulse noise, which can negatively affect the mapping learning of the model. Therefore, the observed spectra need to be denoised. To this end, we used the median filtering method to reduce the spectral noise. The size of the filtering window is 3 pixels.
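This step is a standard median filter; a sketch with SciPy, using a synthetic flux array as a stand-in for a resampled spectrum:

```python
import numpy as np
from scipy.signal import medfilt

# Stand-in for a resampled flux vector from the previous step.
flux_resampled = np.random.default_rng(0).normal(1.0, 0.05, 2048)
# A 3-pixel window suppresses bad pixels and impulse noise.
flux_denoised = medfilt(flux_resampled, kernel_size=3)
```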

2.2.4 Continuum normalization

Because the spectrophotometric correction applied to these spectra is only an approximation, the observed fluxes at different wavelengths are not accurate in an absolute sense. Therefore, continuum normalization (Fiorentin et al. 2007; Wang et al. 2020; Li et al. 2022b) is required prior to parameter estimation. The basic step of continuum normalization is to estimate the continuum of every spectrum by curve fitting first. This estimated continuum is referred to as a pseudo-continuum. Then, each pixel of a spectrum is divided by the flux of the corresponding pseudo-continuum. The pseudo-continuum is an estimation of the trend in the dependences of the spectral fluxes on wavelength (Fig. 2b). The continuum is generally estimated by a polynomial fitting method (Fiorentin et al. 2007; Wang et al. 2020; Li et al. 2022b). In this paper, the pseudo-continua are estimated separately for the blue-end and red-end spectra using a fifth-order polynomial fitting.
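A sketch of the pseudo-continuum estimation and division, assuming each segment (the blue end or the red end) is passed separately:

```python
import numpy as np

def normalize_continuum(wavelength, flux, degree=5):
    """Estimate the pseudo-continuum by fifth-order polynomial fitting
    and divide it out of the flux."""
    coeffs = np.polyfit(wavelength, flux, degree)
    pseudo_continuum = np.polyval(coeffs, wavelength)
    return flux / pseudo_continuum
```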

Figure 2. A LAMOST DR8 low-resolution spectrum (spec-55863-M31_011N40_B1_sp08-198) and its pre-processing results. The horizontal and vertical coordinates characterize the wavelength and flux, respectively.

2.2.5 Secondary denoising and spectrum-wise normalization

After continuum normalization, two negative effects remain: the flux variation ranges of different spectra can differ aberrantly, and non-impulse noise interferes with the fluxes. The presence of non-impulse noise reduces the sensitivity of the algorithm to weak spectral features. Therefore, each continuum-normalized spectrum x = (x1, ⋅⋅⋅, xD)T is further processed as follows. Any flux smaller than μ − 3σ or larger than μ + 3σ is replaced by μ, and each spectral flux xi is transformed as

$x_i^{\prime} = \dfrac{x_i - \mu}{\sigma}$,  (2)

where $\mu = \sum_{i=1}^{D} x_i / D$ and $\sigma = \sqrt{\sum_{i=1}^{D} (x_i - \mu)^2 / D}$.
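A sketch of this secondary denoising and normalization; whether μ and σ are recomputed after the 3σ replacement is not stated above, so this sketch reuses the pre-clipping statistics.

```python
import numpy as np

def clip_and_standardize(flux):
    """Replace 3-sigma outlier fluxes by the mean, then apply the
    zero-mean, unit-variance transform of equation (2)."""
    mu, sigma = flux.mean(), flux.std()
    clipped = np.where(np.abs(flux - mu) > 3.0 * sigma, mu, flux)
    return (clipped - mu) / sigma
```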

Fig. 2 shows a spectrum and its pre-processing results. It can be seen that the spectral features are significantly enhanced after pre-processing.

3 STELLAR SPECTRAL PARAMETER ESTIMATION METHOD STARGRUNET AND ITS EVALUATION

3.1 StarGRUNet

The proposed stellar spectral parameter estimation scheme is an artificial neural network (NN). An NN is a hierarchically organized computational model; more information about NNs can be found in Goodfellow, Bengio & Courville (2016) and Li et al. (2022b). The proposed NN is presented in Table 1. Compared with the previous work (Li et al. 2022b), the model is further equipped with some Bidirectional Gated Recurrent Unit (Bi-GRU) learning layers and a self-attention learning layer, and is named BGANet, an abbreviation of Bi-GRU-Attention Network. The Bi-GRU learning exploits the correlation information between various wavelength subbands, and the self-attention learning module automatically discovers parameter-sensitive features of different types of spectra. For more information about Bi-GRU and self-attention learning, please refer to Niu, Zhong & Yu (2021).

Table 1. The BGANet network. In step (1), there is a model parameter t, which indicates the number of wavelength subbands; in step (2), there are parameters n and l1, ⋅⋅⋅, ln, which indicate the number of Bi-GRU layers and the dimension of the features of interest of each Bi-GRU learning layer, respectively.

Step     Calculation
Input    Pre-processed spectra
(1)      Dividing each spectrum into t subbands with equal wavelength width
(2)      A series of Bi-GRU learning layers
(3)      A Self-Attention learning layer
(4)      A fully connected learning layer
Output   An estimated spectral parameter
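As a concrete illustration of Table 1, the following is a minimal TensorFlow/Keras sketch of a BGANet-style network. The reshape-based subband splitting, dropout rate, and default layer sizes are assumptions made for illustration (cf. Table 2 for the configurations suggested in this paper), not the exact implementation.

```python
import tensorflow as tf

def build_bganet(n_pixels, t=10, gru_units=(128, 64, 32)):
    """A BGANet-style sketch: subband splitting, stacked Bi-GRU layers,
    self-attention, and a fully connected output for one parameter."""
    assert n_pixels % t == 0, 'spectrum length must divide into t subbands'
    inputs = tf.keras.Input(shape=(n_pixels,))
    # Step (1): view the spectrum as a length-t sequence of subband vectors.
    x = tf.keras.layers.Reshape((t, n_pixels // t))(inputs)
    # Step (2): Bi-GRU layers exploiting cross-subband correlations.
    for units in gru_units:
        x = tf.keras.layers.Bidirectional(
            tf.keras.layers.GRU(units, return_sequences=True))(x)
        x = tf.keras.layers.Dropout(0.2)(x)  # dropout also enables the MC
                                             # uncertainty of Section 3.4
    # Step (3): self-attention re-weights the parameter-sensitive subbands.
    x = tf.keras.layers.Attention()([x, x])
    # Step (4): a fully connected layer outputs the estimated parameter.
    x = tf.keras.layers.Flatten()(x)
    outputs = tf.keras.layers.Dense(1)(x)
    return tf.keras.Model(inputs, outputs)

model = build_bganet(n_pixels=3800, t=10)  # hypothetical pixel count
model.compile(optimizer='adam', loss='mae')
```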

Owing to the influence of random factors in model initialization and the learning process, the generalization ability of an individual BGANet can usually be improved further. One approach is to employ an ensemble learning strategy to combine the learning results of several BGANet models. The fundamental idea of ensemble learning is to improve prediction performance by training multiple BGANet learners and exploiting their complementary capabilities. Typical methods for combining the regression predictions of several learners are the simple average, the weighted average, and learning-based techniques.

The distinctive characteristics of the stellar spectral parameter estimation problem studied in this paper are the large size of the reference data set and the large amount of model parameters. These characteristics require that the ensemble learning strategy should be easy to implement, efficient, and stable. Therefore, we adopted the Blending learning strategy – a simplified version of the Stacking learning method (Wolpert 1992) – and formed the StarGRUNet method, which is illustrated in Fig. 3.

Figure 3. The principles of the proposed StarGRUNet.

Taking the estimation of the parameter $T_\texttt{eff}$ as an example, the training steps of StarGRUNet are as follows. Suppose $S_{\rm val} = \{(x_i, y_i), i = 1, \cdots, s\}$ is a validation set. First, for each spectrum $x_i \in S_{\rm val}$, we estimate its $T_\texttt{eff}$ using the n trained BGANet models and assemble the estimates into a vector $z_i = (z_i^1, \cdots, z_i^n)^{\rm T}$. Second, we treat $S'_{\rm val} = \{(z_i, y_i), i = 1, \cdots, s\}$ as a training set to train the secondary learner, a multiple linear regressor. The secondary learner fuses the estimations from the n BGANet models. The models for estimating other stellar parameters are trained similarly.
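Since the secondary learner is a multiple linear regression over the n primary estimates, it can be fitted by ordinary least squares. A self-contained sketch, with synthetic stand-ins for the BGANet outputs z and the APOGEE labels y (both hypothetical here):

```python
import numpy as np

rng = np.random.default_rng(0)
s, n = 1000, 3                              # validation size, no. of BGANets
z = rng.normal(5500.0, 100.0, (s, n))       # stand-in primary T_eff estimates
y = z.mean(axis=1) + rng.normal(0, 50, s)   # stand-in APOGEE labels

A = np.column_stack([z, np.ones(s)])        # design matrix with intercept
w, *_ = np.linalg.lstsq(A, y, rcond=None)   # fit the secondary linear learner
y_fused = A @ w                             # blended T_eff estimates
```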

3.2 Model selection and model training

Model hyperparameters can significantly affect predictive performance. There are two sets of hyperparameters in the BGANet model. The first set consists of the number of wavelength subbands t and the number of Bi-GRU layers n. In the Bi-GRU module, we index the subbands with i = 1, 2, 3, ..., t from left to right. In the case of a small t, there is less communication between different wavelength subbands, and it is necessary to take a smaller value for n to reduce the risk of overfitting. In the case of a large t, more communication and more complex interdependences between the wavelength subbands can be investigated, so larger values of n are needed to enhance the model complexity by exploiting more complex cross-band correlations and complementarities. In addition, the choice of t and n is theoretically related to the size of the training set: the larger t and n are, the greater the model complexity and the more training data are needed for model learning. Based on experimental experience, this work suggests configurations of 2 or 3 for n and of 5, 10, or 15 for t (Table 2).

Table 2. The proposed configuration for the hyperparameters of the three BGANets in the proposed StarGRUNet.

Model     n   t    l1    l2   l3
BGANet1   2   5    64    32   ...
BGANet2   3   10   128   64   32
BGANet3   3   15   128   64   32

The second set of hyperparameters consists of the dimensions {lj, j = 1, ⋅⋅⋅, n} of the features extracted from various Bi-GRU layers, where j is the index of a Bi-GRU layer. A negative correlation should be maintained between lj and j. A small j indicates that the corresponding Bi-GRU layer is close to the input end of the BGANet, and a large lj should be set in this case in order to extract the spectral features as effectively as possible. Similarly, a large j index indicates that the corresponding Bi-GRU layer is close to the output end of the BGANet, and a small lj should be set to reduce noise, redundancies, and the risk of overfitting in mapping learning. In addition, the parameters {lj, j = 1, ⋅⋅⋅, n} also determine the complexity of the BGANet model. A BGANet with a small lj has relatively few model parameters and a low model complexity; in contrast, a BGANet with more model parameters is more complex.

Based on the above-mentioned principles and some experimental experiences, we selected three BGANet models with excellent prediction results on the validation set (Table 2). We took these models as primary learners for StarGRUNet. To estimate each spectral parameter, we built a StarGRUNet model respectively for the spectra with a low SNR and a high SNR.

3.3 Model evaluation

In this subsection, we evaluate the performance of StarGRUNet on the test set. The evaluations are conducted based on the following metrics: μ, the mean of the difference between the StarGRUNet predictions and the APOGEE DR17 catalogue; σ, the standard deviation of the difference; and MAE, the mean absolute error. Here, μ indicates the deviation or inconsistency between the prediction results and the reference; σ measures the degree of dispersion or instability of the consistency between the prediction results and the reference; and MAE is a cumulative measure of the difference over all test samples and describes the overall inconsistency.

To evaluate the performance of StarGRUNet, we compared its estimation results with the APOGEE DR17 catalogue in $T_\texttt{eff}$–log g space (Fig. 4). For ease of comparison, three Mesa Isochrones and Stellar Tracks (MIST) stellar isochrones with stellar ages of 7 Gyr are presented in this figure. The StarGRUNet predictions not only reconstruct the APOGEE DR17 $T_\texttt{eff}$ and log g nicely but also match the MIST stellar isochrones well. These phenomena indicate a strong consistency between the stellar atmospheric parameter predictions from StarGRUNet and the APOGEE DR17 catalogue.

Figure 4. Comparison between the StarGRUNet predictions and the APOGEE DR17 catalogue on the test set. The left-hand panel shows the results from the APOGEE DR17 catalogue, and the right-hand panel shows the estimated results from StarGRUNet. The colours indicate the [Fe/H] abundances. The solid, dashed, and dotted lines indicate three MIST stellar isochrones with stellar ages of 7 Gyr.

The prediction performance of StarGRUNet can also be measured by the dependence of the difference between its predictions and the APOGEE DR17 catalogue on the SNR (Fig. 5). The experiments in Fig. 5 investigate the dependence of the prediction error of StarGRUNet on the SNR for the atmospheric parameters and abundances of 13 elements. The results show that the increase of S/Ng can effectively reduce the MAE and σ of the StarGRUNet prediction, but μ is almost always stable at 0. These phenomena indicate that improving data quality can effectively reduce the error of StarGRUNet without affecting the overall consistency between StarGRUNet and the APOGEE DR17 catalogue. Therefore, the prediction results of StarGRUNet are very robust. In conclusion, the results of Figs 4 and 5 demonstrate the excellent prediction performance of StarGRUNet for all stellar parameters.

Figure 5. The dependence of the consistency between the StarGRUNet predictions and the APOGEE DR17 catalogue on the spectral signal-to-noise ratio. Triangles, circles, and squares represent the σ, MAE, and μ of the prediction uncertainty.

Finally, we compared the prediction results of StarGRUNet, StarNet, and ResNet (He et al. 2016). StarNet consists of several convolutional layers and several fully connected layers; it is a typical convolutional neural network for stellar spectral parameter estimation. ResNet can mine the deep, longitudinal features of the spectrum, in sharp contrast to the cross-wavelength-subband feature extraction capability of StarGRUNet. Therefore, comparing these two methods with StarGRUNet is useful for assessing the advantages of cross-wavelength-subband information extraction and fusion. Table 3 presents the experimental results of the three models; for a fair comparison, they share the training set, validation set, and test set. The experimental results in Table 3 indicate that StarGRUNet has a clear advantage. The performance of StarNet is much inferior to that of StarGRUNet because the network structure of StarNet is relatively simple, making it difficult to achieve better parameter estimation performance on LAMOST low-resolution spectra. Although ResNet can longitudinally exploit the weak features of the spectrum, it fails to extract the cross-band information horizontally, which makes it sensitive to noise. In contrast, the prediction results of StarGRUNet are more accurate and robust. These results are due not only to the extraction and fusion of information from various cross-band features by the BGANet model but also to the advantageous integration of the mapping results under multiple feature expressions by StarGRUNet.

Table 3. Comparisons: StarGRUNet, StarNet, and ResNet.

Model             StarGRUNet                  StarNet                    ResNet
Error           μ       σ      MAE       μ        σ       MAE       μ        σ       MAE
T_eff (K)       0.93    93.77  49.28    −33.41   518.66   416.97   −16.80   150.04   87.69
log g (dex)     0.000   0.162  0.084    −0.014   0.984    0.877     0.023   0.250    0.146
[Fe/H] (dex)    0.001   0.070  0.041    −0.003   0.294    0.224    −0.008   0.098    0.058
[C/H] (dex)     0.001   0.090  0.055    −0.002   0.312    0.218     0.003   0.115    0.070
[N/H] (dex)     0.000   0.182  0.109    −0.002   0.375    0.286     0.008   0.200    0.122
[O/H] (dex)     0.000   0.104  0.068    −0.002   0.239    0.176    −0.002   0.116    0.075
[Mg/H] (dex)    0.001   0.073  0.045    −0.001   0.246    0.178    −0.004   0.094    0.059
[Al/H] (dex)    0.001   0.089  0.052    −0.002   0.311    0.213     0.009   0.131    0.079
[Si/H] (dex)    0.001   0.074  0.045    −0.002   0.258    0.192    −0.004   0.095    0.059
[S/H] (dex)     0.002   0.121  0.080    −0.001   0.236    0.174     0.005   0.139    0.088
[K/H] (dex)     0.002   0.141  0.082     0.000   0.276    0.194    −0.031   0.236    0.115
[Ca/H] (dex)    0.000   0.081  0.050    −0.002   0.255    0.191     0.003   0.094    0.058
[Ti/H] (dex)    0.002   0.161  0.101    −0.001   0.329    0.246     0.028   0.226    0.125
[Cr/H] (dex)    0.003   0.215  0.126    −0.002   0.384    0.281     0.013   0.226    0.134
[Mn/H] (dex)    0.001   0.101  0.060    −0.003   0.372    0.280    −0.006   0.126    0.077
[Ni/H] (dex)    0.000   0.082  0.050     0.000   0.326    0.241     0.002   0.108    0.064

3.4 Model uncertainty

Model uncertainty analysis is another way to test the prediction performance of StarGRUNet. Gal & Ghahramani (2016) demonstrated that a neural network with a dropout mechanism is an approximation to a Bayesian neural network with a Gaussian distribution and can be employed to estimate prediction uncertainty. Leung & Bovy (2018) introduced this idea into a stellar spectral parameter estimation model, AstroNN, to estimate the uncertainty. Because BGANet applies a dropout mechanism to each of the hidden feature vectors, it supports our assessment of the model uncertainty. We repeatedly estimated each spectral parameter five times from each stellar spectrum and computed the standard deviation of the five predictions as the model uncertainty of StarGRUNet.
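A minimal sketch of this Monte Carlo dropout procedure, assuming a Keras model (such as the BGANet sketch in Section 3.1) whose dropout layers are activated by calling it with training=True:

```python
import numpy as np

def mc_dropout_uncertainty(model, spectra, n_repeats=5):
    """Predict n_repeats times with dropout active and report the mean
    prediction and its standard deviation (the model uncertainty)."""
    preds = np.stack([np.asarray(model(spectra, training=True))
                      for _ in range(n_repeats)])
    return preds.mean(axis=0), preds.std(axis=0)

# Usage (hypothetical): teff_mean, teff_sigma = mc_dropout_uncertainty(model, x_test)
```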

Figs 6 and 7 present the dependence of the StarGRUNet uncertainty on $T_\texttt{eff}$ and [Fe/H], respectively. The parameter estimation uncertainty of StarGRUNet on the spectra of metal-poor stars ([Fe/H] < −1.00 dex), cool stars ($T_\texttt{eff}$ < 4000 K), and hot stars ($T_\texttt{eff}$ > 6000 K) is generally greater than that on other types of spectra. These trends are in general consistent with Leung & Bovy (2018). The reason for this phenomenon is the small number of reference samples and the weak spectral features of metal-poor, cool, and hot stars, which reduce the performance of the established model on the spectra of these stars. Therefore, we recommend using the results in the above-mentioned ranges with caution.

Figure 6. The dependences of StarGRUNet prediction uncertainty on $T_\texttt{eff}$, presented as box plots. The black dots inside the boxes represent the mean of the prediction uncertainties. The dashed line inside each box represents the second quartile Q2 (the median). The bottom and top of each box represent the first and third quartiles Q1 and Q3, whose difference is the interquartile range, IQR = Q3 − Q1. The lines above and below each box, the upper and lower limits, correspond to Q3 + 1.5 IQR and Q1 − 1.5 IQR, respectively. The height of the box and the distance between the upper and lower limits reflect the degree of uncertainty dispersion to some extent.

Figure 7. Same as Fig. 6, but for the dependence of StarGRUNet prediction uncertainty on [Fe/H].

4 APPLICATIONS ON LAMOST DR8 LOW-RESOLUTION SPECTRA AND VALIDATIONS ON OTHER SURVEYS

4.1 Applications on LAMOST DR8 low-resolution spectra

In Section 3, we performed a comprehensive evaluation of the performance of StarGRUNet, and a series of experimental results indicated its effectiveness and robustness. Therefore, we utilized the trained StarGRUNet models to estimate the stellar parameters $T_\texttt{eff}$ and log g, 14 chemical elemental abundances, and 1σ uncertainties for about 8.21 million LAMOST low-resolution spectra with S/Ng ≥ 5, and generated the StarGRUNet-LAMOST catalogue. In Sections 4.2, 4.3, and 4.4, we evaluate the reliability of the StarGRUNet-LAMOST catalogue.

4.2 Consistencies with the GALAH Survey

An effective way to verify the reliability of the computed StarGRUNet-LAMOST catalogue is to investigate its consistency with a high-resolution catalogue. GALAH DR3 (Buder et al. 2021) provides reliable stellar parameters and elemental abundances for 588 571 stars, including 383 088 dwarfs, 200 927 giant stars, and 4556 unclassified stars. We cross-matched the GALAH DR3 catalogue with the StarGRUNet-LAMOST catalogue and obtained 27 527 common sources. Based on these common sources, we computed the consistency between the StarGRUNet-LAMOST catalogue and the GALAH DR3 catalogue (Fig. 8). A high consistency was found between the StarGRUNet-LAMOST catalogue and the GALAH DR3 catalogue. The systematic biases on |$T_\texttt {eff}$|⁠, log  g, and [Fe/H] are −44.48 K, 0.004 dex, and 0.037 dex, respectively. The corresponding dispersions are 224.26 K, 0.222 dex, and 0.149 dex, respectively. The corresponding MAEs are 109.39 K, 0.128 dex, and 0.095 dex, respectively. The biases, the deviations, and the MAEs of other elemental abundances are also similarly small. These experimental results show excellent consistency between the StarGRUNet-LAMOST catalogue and the GALAH DR3 catalogue, and indicate the reliability of the StarGRUNet-LAMOST catalogue.
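Such a cross-match is typically performed by sky position. A sketch using astropy, with hypothetical coordinate arrays and a matching radius of our choosing (the radius used in this work is not stated here):

```python
import numpy as np
import astropy.units as u
from astropy.coordinates import SkyCoord

# Hypothetical RA/Dec arrays (degrees) standing in for the two catalogues.
ra_l, dec_l = np.array([10.68, 56.75]), np.array([41.27, 24.12])
ra_g, dec_g = np.array([10.68, 83.82]), np.array([41.27, -5.39])

lamost = SkyCoord(ra=ra_l * u.deg, dec=dec_l * u.deg)
galah = SkyCoord(ra=ra_g * u.deg, dec=dec_g * u.deg)
idx, sep2d, _ = lamost.match_to_catalog_sky(galah)
matched = sep2d < 3.0 * u.arcsec  # 3 arcsec radius is an assumption here
```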

Figure 8. Consistency between the StarGRUNet-LAMOST catalogue and the GALAH DR3 catalogue. In each subplot, the horizontal axis indicates the results provided by the GALAH DR3 catalogue, and the vertical axis indicates the difference between the StarGRUNet-LAMOST catalogue and the GALAH DR3 catalogue. The dashed line corresponds to μ = 0, indicating theoretical consistency. The box in the lower left-hand corner provides the bias and dispersion. The colour characterizes the density of the samples.

Figs 9 and 10 show the distribution of dwarf stars and giant stars in [X/Fe]–[Fe/H] space, respectively. [X/Fe] represents the abundance of element X relative to Fe, and is computed as [X/Fe]  = [X/H] – [Fe/H]. In general, the elemental abundances of the StarGRUNet-LAMOST catalogue are relatively tight, and most of the StarGRUNet-LAMOST elemental abundances are consistent with the GALAH DR3 catalogue. However, there are still some evident differences between the StarGRUNet-LAMOST catalogue and the GALAH DR3 catalogue on some elemental abundances, such as [Ti/H] for dwarfs. Such differences may be due to the severe lack of metal lines of these elements in the low-resolution, blue-end spectra of LAMOST. Therefore, the precision of the Ti abundance of dwarfs in the StarGRUNet-LAMOST catalogue may be inferior to that in the GALAH DR3 catalogue and should be used with caution.

Figure 9. Distribution of dwarfs (log g > 4) in [X/Fe]–[Fe/H] space. The two left-hand columns are the estimation results of the GALAH catalogue, and the two right-hand columns are the estimation results of the StarGRUNet-LAMOST catalogue. The colour intensity characterizes the density of the sample distribution.

Figure 10. Distribution of giants (log g < 4) in [X/Fe]–[Fe/H] space. The two left-hand columns are the estimation results of the GALAH catalogue, and the two right-hand columns are the estimation results of the StarGRUNet-LAMOST catalogue. The colour intensity characterizes the density of the sample distribution.

4.3 Comparisons with other catalogues based on LAMOST low-resolution spectra

To evaluate the effectiveness of the StarGRUNet-LAMOST catalogue, we compared it with three catalogues based on LAMOST low-resolution spectra: the LASP catalogue (Luo et al. 2015), the GSN catalogue (Rui et al. 2019b), and the LASSO-MLPNet catalogue (Li et al. 2022b). The LASP catalogue is computed by the LAMOST official pipeline and consists of estimates of the stellar atmospheric parameters $T_\texttt{eff}$, log g, and [Fe/H]. The GSN catalogue is a set of estimates of $T_\texttt{eff}$, log g, [Fe/H], and [α/Fe]. The LASSO-MLPNet catalogue consists of estimates of $T_\texttt{eff}$, log g, and [Fe/H] from 4 828 190 LAMOST DR8 low-resolution stellar spectra with 5 ≤ S/Ng ≤ 80 and 3500 K ≤ $T_\texttt{eff}$ ≤ 6500 K. Because these catalogues are computed from LAMOST low-resolution spectra, they are easily comparable with the StarGRUNet-LAMOST catalogue.

In the experiment portrayed in Fig. 11, we compared the StarGRUNet-LAMOST catalogue with the LASP catalogue, the GSN catalogue (Rui et al. 2019b), and the LASSO-MLPNet catalogue (Li et al. 2022b), using the mean of the error, the standard deviation of the error, and the MAE of the prediction error. For each stellar atmospheric parameter, the magnitudes of μ, σ, and MAE for StarGRUNet are evidently lower on the whole than those for LASP, GSN, and LASSO-MLPNet. These results indicate that the error between the StarGRUNet-LAMOST catalogue and the APOGEE DR17 catalogue is smaller than the errors between the other catalogues and APOGEE DR17. Therefore, the StarGRUNet-LAMOST catalogue more accurately recovers the stellar atmospheric parameters from LAMOST low-resolution spectra.

Figure 11. Dependences of the prediction errors on the spectral signal-to-noise ratio for the StarGRUNet-LAMOST catalogue, the LASP catalogue (Luo et al. 2015), the GSN catalogue (Rui et al. 2019b), and the LASSO-MLPNet catalogue (Li et al. 2022b). The horizontal coordinates represent the S/Ng intervals [5,8), [8,10), [10,20), [20,30), [30,40), [40,50), [50,80), [80,100), and [100, +∞). The first, second, and third columns represent $T_\texttt{eff}$, log g, and [Fe/H], respectively. The first, second, and third rows respectively represent the mean μ, the standard deviation σ, and the mean absolute error MAE of the difference between each catalogue and the APOGEE DR17 catalogue. Triangles, squares, crosses, and circles indicate the evaluation results for the LASP, GSN, LASSO-MLPNet, and StarGRUNet-LAMOST catalogues, respectively. Note that the LASSO-MLPNet catalogue only gives estimates for spectra with 5 ≤ S/Ng ≤ 80, so its curves are absent from the last two S/Ng intervals.

4.4 Uncertainty analysis based on repeated observation: Observation uncertainty

We explored the model uncertainty of StarGRUNet based on the dropout technique in Section 3.4. In addition, LAMOST produced some repeated observations by observing some stars multiple times at various epochs and under different conditions. The parameters of such common-source spectra can be assumed to be constant over the timespan of the observations. Therefore, these repeated observations provide an alternative option for analysing the uncertainty of the StarGRUNet-LAMOST catalogue; for convenience, we name this the observation uncertainty. Suppose the number of repeated observations of a star is $n_{\rm obs}$, and the corresponding repeated spectra are $\{x_1, x_2, \ldots, x_{n_{\rm obs}}\}$. For any one stellar parameter, StarGRUNet then computes $n_{\rm obs}$ estimates, and the observation uncertainty is measured in this work as the standard deviation of these estimates. To ensure the reliability of the estimated uncertainty, we keep only the target stars with more than six repeated observations (26 459 in total).
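This per-star statistic amounts to a group-by standard deviation. A sketch with pandas, using a toy catalogue (the column names and values are hypothetical):

```python
import pandas as pd

# Toy catalogue: one row per spectrum, with a source identifier and a
# StarGRUNet estimate (here T_eff) per repeat visit.
cat = pd.DataFrame({
    'star_id': ['a'] * 7 + ['b'] * 8,
    'teff': [5510, 5490, 5505, 5520, 5498, 5503, 5511,
             4805, 4790, 4810, 4798, 4801, 4795, 4808, 4800],
})
grouped = cat.groupby('star_id')['teff']
keep = grouped.count() > 6              # stars with more than six visits
obs_uncertainty = grouped.std()[keep]   # per-star observation uncertainty
```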

Fig. 12 demonstrates the dependence of the observation uncertainty on the SNR. Overall, the observation uncertainty of the StarGRUNet-LAMOST catalogue is low and shows a clear decreasing trend with increasing spectral quality. In the case of S/Ng ∈ [5, 10), the uncertainties of the $T_\texttt{eff}$ and log g estimates are 182 K and 0.34 dex, respectively, and the uncertainties of the elemental abundance estimates are 0.07–0.17 dex. In the case of S/Ng ≥ 150, the uncertainties of the $T_\texttt{eff}$ and log g estimates decrease to 139 K and 0.29 dex, respectively, and the uncertainties of the elemental abundance estimates decrease to 0.06–0.15 dex. These phenomena indicate that the results of the StarGRUNet-LAMOST catalogue are very robust.

Figure 12. Observation uncertainty of StarGRUNet-LAMOST. The horizontal axis represents the signal-to-noise ratio of the spectra. Each subplot is labelled with the name of the corresponding stellar parameter or elemental abundance in the upper right-hand corner.

5 CONCLUSION

In this paper, a novel spectral parameter estimation neural network, BGANet, was designed based on the Bi-GRU and self-attention mechanisms. The parameter estimation performance was further improved by introducing StarGRUNet, an ensemble learning method based on BGANet. The competitiveness of the proposed method was evaluated by comparing it with the typical methods RNN, GRU, Bi-GRU, and StarNet.

By cross-matching the LAMOST DR8 low-resolution spectral library with the APOGEE DR17 catalogue, we established a training set, a validation set, and a test set. These data sets are released for interested readers' algorithm research. On the spectra with S/Ng ≥ 5, the precisions of StarGRUNet for $T_\texttt{eff}$ and log g are 94 K and 0.16 dex, respectively. The precisions of the elemental abundances [X/H] are 0.07–0.16 dex (except for 0.18 dex for [N/H] and 0.22 dex for [Cr/H]). The test results show that, on the whole, StarGRUNet has higher accuracy and robustness than other catalogues and neural networks.

To facilitate use in astronomical science research, we applied the trained StarGRUNet model to 8 208 332 LAMOST DR8 low-resolution spectra and computed estimates of $T_\texttt{eff}$, log g, and 14 elemental abundances ([C/H], [Mg/H], [Al/H], [Si/H], [Ca/H], [Fe/H], [N/H], [O/H], [S/H], [Ti/H], [Cr/H], [Mn/H], [Ni/H], and [K/H]). The estimates have been publicly released; the URL is given in the Data and Code Availability section.

DATA AND CODE AVAILABILITY

The experimental data set, estimated catalogue, experimental code, and trained models are available at http://doi.org/10.12149/101216.

ACKNOWLEDGEMENTS

This work is supported by the National Natural Science Foundation of China (grant no. 11973022), the Natural Science Foundation of Guangdong Province (grant no. 2020A1515010710), the Major projects of the joint fund of Guangdong and the National Natural Science Foundation (grant no. U1811464). The authors are deeply grateful to Yu Lu, Jinqu Zhang, and Hui Li for discussions when polishing this paper.

LAMOST, a multi-target optical fibre spectroscopic telescope in the large sky area, is a major national engineering project built by the Chinese Academy of Sciences. Funding for the project is provided by the National Development and Reform Commission. LAMOST is operated and managed by the National Astronomical Observatory of the Chinese Academy of Sciences.

References

Bu Y., Pan J., 2018, MNRAS, 447, 256
Buder S. et al., 2021, MNRAS, 506, 150
Cai B., Kong X., Shi J., Gao Q., Bu Y., Yi Z., 2023, AJ, 165, 52
Deng L.-C. et al., 2012, Res. Astron. Astrophys., 12, 735
De Silva G. M. et al., 2015, MNRAS, 449, 2604
Fabbro S., Venn K., O'Briain T., Bialek S., Kielty C. L., Jahandar F., Monty S., 2018, MNRAS, 475, 2978
Fiorentin P. R., Bailer-Jones C., Lee Y. S., Beers T. C., Sivarani T., Wilhelm R., Prieto C. A., Norris J., 2007, A&A, 467, 1373
Gal Y., Ghahramani Z., 2016, in Balcan M. F., Weinberger K. Q., eds, Proceedings of Machine Learning Research, Vol. 48, Proc. 33rd Int. Conf. Machine Learning. PMLR, New York, USA, p. 1050
Gilmore G. et al., 2012, Messenger, 147, 25
Goodfellow I., Bengio Y., Courville A., 2016, Deep Learning. MIT Press, Cambridge, MA
He K., Zhang X., Ren S., Sun J., 2016, in David F., Philip T., Andrew Z., eds, Proc. IEEE Conf. Computer Vision and Pattern Recognition. IEEE, Las Vegas, NV, p. 770
Ho A. Y. Q. et al., 2017, ApJ, 836, 5
Jofré P., Heiter U., Soubiran C., 2019, ARA&A, 57, 571
Leung H. W., Bovy J., 2018, MNRAS, 483, 3255
Li X., Wu Q. M. J., Luo A., Zhao Y., Lu Y., Zuo F., Yang T., Wang Y., 2014, ApJ, 790, 105
Li X., Lu Y., Comte G., Luo A., Zhao Y., Wang Y., 2015, ApJS, 218, 3
Li X., Wang Z., Zeng S., Liao C., Du B., Kong X., Li H., 2022a, Res. Astron. Astrophys., 22, 065018
Li X., Zeng S., Wang Z., Du B., Kong X., Liao C., 2022b, MNRAS, 514, 4588
Li Z., Zhao G., Chen Y., Liang X., Zhao J., 2022c, MNRAS, 517, 4875
Luo A.-L. et al., 2015, Res. Astron. Astrophys., 15, 1095
Niu Z., Zhong G., Yu H., 2021, Neurocomputing, 452, 48
Prieto C. A., Majewski S. R., Schiavon R., Cunha K., Wilson J., 2010, Astron. Nachrichten, 5, 428
Rui W. et al., 2019a, PASP, 131, 024505
Rui W. et al., 2019b, PASP, 131, 024505
Steinmetz M. et al., 2006, AJ, 132, 1645
Ting Y.-S., Rix H.-W., Conroy C., Ho A. Y. Q., Lin J., 2017, ApJ, 849, L9
Wang R., Luo A.-L., Chen J.-J., Hou W., Zhang S., Zhao Y.-H., Li X.-R., Hou Y.-H., 2020, ApJ, 891, 23
Wang C., Huang Y., Yuan H., Zhang H., Xiang M., Liu X., 2022, ApJS, 259, 51
Wolpert D. H., 1992, Neural Netw., 5, 241
Wu Y. et al., 2011, Res. Astron. Astrophys., 11, 924
Xiang M. et al., 2019, ApJS, 245, 34
Xiang G., Chen J., Qiu B., Lu Y., 2021, PASP, 133, 024504
Xiang M. et al., 2022, A&A, 662, A66
Xiang M.-S. et al., 2016, MNRAS, 464, 3657
Xiong S., Li X., Liao C., 2022, ApJS, 261, 36
Yang T., Li X., 2015, MNRAS, 452, 158
Yanny B., Rockosi C., Newberg H. J., Knapp G. R., Wadadekar Y., 2009, AJ, 137, 4377
Zhang X., Zhao G., Yang C. Q., Wang Q. X., Zuo W. B., 2019, PASP, 131, 094202
Zhang B., Liu C., Deng L.-C., 2020, ApJS, 246, 9
Zhao G., Zhao Y.-H., Chu Y.-Q., Jing Y.-P., Deng L.-C., 2012, Res. Astron. Astrophys., 12, 723