ABSTRACT

The Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) has acquired tens of millions of low-resolution stellar spectra. The sheer volume of these spectra makes it urgent to explore automatic atmospheric parameter estimation methods. Many LAMOST spectra have low signal-to-noise ratios (SNR), which results in a sharp degradation in the accuracy of their estimations. Therefore, it is necessary to explore better estimation methods for low-SNR spectra. This paper proposes a neural network-based scheme, LASSO-MLPNet, to deliver atmospheric parameters. Firstly, we adopt a polynomial fitting method to estimate and remove the pseudo-continuum. Then, some parameter-sensitive features are detected in the presence of high noise using the Least Absolute Shrinkage and Selection Operator (LASSO). Finally, LASSO-MLPNet uses a Multilayer Perceptron network (MLPNet) to estimate the atmospheric parameters Teff, log g, and [Fe/H]. The effectiveness of LASSO-MLPNet was evaluated on LAMOST stellar spectra of the stars in common between the Apache Point Observatory Galactic Evolution Experiment (APOGEE) and LAMOST. It is shown that the estimation accuracy is significantly improved on the stellar spectra with 10 < SNR ≤ 80. In particular, LASSO-MLPNet reduces the mean absolute error (MAE) of the estimation of Teff, log g, and [Fe/H] from (144.59 K, 0.236 dex, 0.108 dex) for the LAMOST Stellar Parameter Pipeline (LASP) to (90.29 K, 0.152 dex, 0.064 dex) for LASSO-MLPNet on the stellar spectra with 10 < SNR ≤ 20. To facilitate reference, we release the LASSO-MLPNet estimates from more than 4.82 million stellar spectra with 10 < SNR ≤ 80 and 3500 K < Teff ≤ 6500 K as a value-added output.

1 INTRODUCTION

Stellar spectra contain much basic information about stars, e.g. the effective temperature (Teff), surface gravity (log g), metallicity ([Fe/H]), and so on (Liu et al. 2014). This information is a basis for research on stellar evolution and the history of the Milky Way. Therefore, many large-scale survey telescopes have been built, a series of spectral surveys have been carried out, and the amount of stellar spectra has greatly increased. As a result, it is urgent to explore automatic atmospheric parameter estimation methods.

The Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST; Cui et al. 2012; Liu, Zhao & Hou 2015) is a major scientific and technological infrastructure that can simultaneously collect spectra from up to 4000 targets. After several years of sky surveys, LAMOST has obtained more than 10 million stellar spectra. The stellar parameters of these spectra are derived using the LAMOST Stellar Parameter Pipeline (LASP; Wu et al. 2011; Luo et al. 2015). The LASP is designed based on the University of Lyon Spectroscopic Analysis Software (ULySS; Koleva et al. 2009). It delivers stellar parameters by minimizing the χ2 between observed spectra and template spectra from the ELODIE library (Prugniel & Soubiran 2001; Prugniel et al. 2007). However, the ELODIE library lacks spectral samples of K giants and subgiants, which causes some deviation of the LASP parameter estimations from their theoretical values. In low-resolution stellar spectra, especially, the metal lines are relatively weak and easily contaminated by noise. Therefore, the accuracy of the parameters estimated from the LAMOST low-resolution stellar spectra is relatively low and needs to be improved.

Accordingly, experts have carried out a series of studies on the parameter estimation problem for LAMOST low-resolution stellar spectra. For LAMOST spectra with low resolution and high signal-to-noise ratio (SNR), Xiang et al. (2017) proposed a method based on Kernel Principal Component Analysis (Schölkopf, Smola & Müller 1997) to estimate stellar atmospheric physical parameters, absolute magnitudes, and element abundances. This study mainly focused on LAMOST spectra with SNR > 50. Ting et al. (2017) studied the estimation of element abundances for LAMOST spectra with an SNR higher than 30 per pixel based on a single-hidden-layer neural network (NN) model. Xiang et al. (2019) used the data-driven Payne (DD-Payne) method to determine stellar atmospheric parameters and 16 element abundances for 6 million stars with an SNR higher than 30 per pixel. Zhang, Liu & Deng (2020) proposed a spectral parameter estimation scheme, Stellar LAbel Machine (SLAM), based on the support vector regression (SVR) method, and predicted the parameters Teff, log g, and [Fe/H] with uncertainties of 50 K, 0.09 dex, and 0.07 dex, respectively, for LAMOST spectra with SNRg > 100. Wang et al. (2020) used a generative spectral network to estimate the atmospheric physical parameters for LAMOST stellar spectra with SNRg > 30, and found that for stellar spectra with SNRg ≥ 50, the accuracy of Teff, log g, [Fe/H], and [α/Fe] is 80 K, 0.14 dex, 0.07 dex, and 0.168 dex, respectively; these results show good consistency with other studies. The LAMOST collaboration (Luo et al. 2015; Wu et al. 2014) uses LASP – a method based on minimizing the χ2 between the measured spectra and the synthetic spectra of the ELODIE library – to determine the stellar spectral parameters and radial velocity for stellar spectra with SNRg > 6 and SNRg > 15 for dark and bright nights, respectively (Luo et al. 2015).

Current related research mainly focuses on the parameter estimation problem for LAMOST spectra with higher SNR, and most studies used NN models. Although the official LAMOST data provide parameter estimates for spectra with SNR higher than 6, their accuracy degrades sharply as the SNR decreases (Fig. 1), and the scale of such spectra is huge: there are more than 5 million spectra with SNR less than 80, and more than 2 million low-resolution stellar spectra with 10 < SNRg < 30 (Fig. 2). These spectra are affected by a lot of noise, and their metal lines are badly blended. As a result, the spectral physical properties are not obvious, and the accuracy of each parameter drops sharply with decreasing SNRg (Fig. 1). Therefore, it is necessary to conduct special research on stellar spectra with low SNR to further enhance the scientific value of the LAMOST observation data. In this paper, we focus on designing a machine-learning scheme based on the Least Absolute Shrinkage and Selection Operator (LASSO) and an NN to improve the accuracy of atmospheric physical parameters for LAMOST low-SNR stellar spectra.

Figure 1. The dependencies of the error of the stellar atmospheric parameters (Teff, log g, and [Fe/H]) on SNRg. The spectral data are from the 115 942 common stars between LAMOST and APOGEE. The error refers to the MAE between the APOGEE_payne labels and the LASP estimations.

Figure 2. The distribution of the LAMOST DR8 low-resolution stellar spectra on SNRg.

Our scheme consists of three procedures. First, we used a polynomial fitting method to estimate the pseudo-continuum of every spectrum and removed it. Secondly, we detected some parameter-sensitive features in the presence of high noise using LASSO. The LASSO evaluated the effectiveness of spectral features based on the combined effects of the spectral fluxes and the noise overlapped on them. Finally, we proposed a Multilayer Perceptron neural network-based method to estimate the stellar atmospheric parameters Teff, log g, and [Fe/H].

This paper is organized as follows. Section 2 describes the data used in this work and some data pre-processing procedures. In Section 3, we introduce the stellar atmospheric parameter estimation method LASSO-Multilayer Perceptron neural network (MLPNet). Section 4 describes performance evaluation results. Section 5 presents the application of our model to the LAMOST DR8 low-resolution stellar spectra. Finally, a brief summary is made in Section 6.

2 DATA AND THEIR PRE-PROCESSINGS

In stellar parameter estimation, we usually use synthetic or empirical spectra as spectral templates or training sets (reference sets) to establish a pipeline. The synthetic spectra are computed from a stellar atmospheric physical model. Therefore, the parameter coverage and wavelength range of synthetic spectral libraries can be set according to requirements. However, this scheme depends on the limited physical information we currently have about stellar objects, which makes a certain difference between the synthetic spectra and the observed spectra, and can even introduce large biases in certain parameter ranges: for example, uncertainties in the opacity of hydrogen atoms, metal atoms, and molecules, and large discrepancies between the modelling of hot and cold stars and the actual celestial bodies (Plez, Brett & Nordlund 1992; Masseron et al. 2014). These differences and biases can introduce systematic uncertainties into the establishment of the parameter estimation model.

The empirical spectral libraries are generally composed of observed spectra that have been strictly and accurately parameter-calibrated. However, the empirical spectral libraries are generally limited by their numbers of samples and the narrow coverages of their parameters and wavelengths. For example, the ELODIE spectral library (Prugniel et al. 2007) lacks K-giant and subgiant spectral samples, and its wavelength range is between 4000 and 4800 Å. In particular, if the empirical spectral library and the spectra to be parametrized come from different astronomical telescopes, instrumental effects will have a significant impact on model establishment and model accuracy. The instrumental effect means that, even for observations of the same celestial body, there are certain biases between the observed spectra from different telescopes.

Based on the above considerations, the reference set in this paper adopts a label-transfer empirical spectral library. This kind of spectral library consists of a series of LAMOST observed spectra whose reference labels are transferred from the parameter estimation results of high-resolution and high-SNR spectra of common sources. It is well known that atmospheric parameter estimates from stellar spectra with high resolution and high SNR are usually more accurate. For example, the Apache Point Observatory Galactic Evolution Experiment (APOGEE; Majewski et al. 2017) adopts the APOGEE Stellar Parameter and Chemical Abundances Pipeline (ASPCAP; Pérez et al. 2016) to determine the stellar atmospheric physical parameters and chemical element abundances, and the parameter accuracy for giant stars is more reliable than that for dwarf stars. The three atmospheric physical parameters (Teff, log g, and [Fe/H]) are accurate to 2 per cent, 0.1 dex, and 0.05 dex, respectively, and the accuracy of 15 chemical element abundances is typically under 0.1 dex (Pérez et al. 2016). Ting et al. (2019) used an improved Kurucz line list to train The Payne model, estimated atmospheric parameters and 15 element abundances for approximately 230 000 APOGEE high-resolution stellar spectra, and released the APOGEE_payne catalog based on the estimated results. This catalog includes both giant stars and dwarf stars. The verification in that paper shows that the accuracy of its parameter estimates is significantly improved over ASPCAP: the accuracy of Teff is under 30 K, the accuracy of log g is 0.05 dex, and the accuracy of the chemical element abundances is under 0.05 dex. Therefore, this work used the LAMOST spectra of the stars in common between LAMOST and APOGEE as the reference spectral library. The reference parameters of these LAMOST spectra are transferred from the APOGEE_payne catalog. The essence of this scheme is to use the common stars between LAMOST and APOGEE as calibration stars.

2.1 LAMOST survey and APOGEE survey

LAMOST (Zhao et al. 2012), also referred to as the 'Guo Shoujing Telescope', is a new type of telescope with a large aperture (1.75 m) and a large field of view (5 degrees). It can observe 4000 targets simultaneously in a field of view of 20 deg2 in the sky, which effectively improves the spectral collection rate. LAMOST DR8 released more than 11 million low-resolution spectra, which cover the wavelength range of 3690–9100 Å with a resolution of 1800. There are 10 388 423 stellar spectra; the rest are spectra of galaxies, quasars, or unknown celestial bodies. At the same time, based on LASP, LAMOST determined the stellar spectral parameters (Teff, log g, [Fe/H], and radial velocity) for spectra with SNRg > 6 and SNRg > 15 for dark and bright nights, respectively, and provided three stellar parameter catalogs, in which the LAMOST low-resolution A, F, G, and K-type catalog contains 6 478 063 stellar spectral parameter estimates.

APOGEE is one of the projects in the Sloan Digital Sky Survey. The project uses the 2.5-m Sloan telescope in a 3-yr observation campaign, obtaining high-resolution (R ∼ 22 500), high-SNR (>100), near-infrared (1.51–1.70 μm) spectra, and uses these spectra to systematically study the stellar composition of the Milky Way. APOGEE DR14 released the spectra, atmospheric parameters, and elemental abundances for approximately 260 000 stars (Holtzman et al. 2018). These stellar parameters were estimated by ASPCAP (Pérez et al. 2016). For dwarf stars, however, log g from ASPCAP does not follow the main sequence expected from the isochrones (Jönsson et al. 2018). Therefore, Ting et al. (2019) used the Payne algorithm to further improve the accuracy of the effective temperature, surface gravity, and 15 element abundances for the APOGEE stellar spectra, and released a catalog for reference, APOGEE_payne. This catalog contains approximately 230 000 stellar spectra, and the parameter ranges of Teff, log g, and [Fe/H] are [3050, 7950] K, [0, 5] dex, and [−1.45, 0.45] dex, respectively.

2.2 Reference set

This work obtained a reference set by cross-matching the low-resolution stellar spectra from LAMOST DR8 with the APOGEE_payne catalog using the Tool for OPerations on Catalogues and Tables (TOPCAT; Taylor 2017). In TOPCAT, we limited the maximum angular separation (max error) between two stellar position coordinates (RA, Dec.) to 3 arcsec. The match selection was set so that each LAMOST spectrum has the best-matching parameters from the APOGEE_payne catalog. We obtained 115 942 LAMOST stellar spectra with APOGEE_payne labels. Fig. 1 shows the inconsistencies between the LASP estimates of (Teff, log g, and [Fe/H]) and the APOGEE_payne catalog for these spectra. It is shown that when SNRg ≤ 20, the parameter estimation accuracy drops sharply with decreasing SNR. Therefore, there is much room for improvement in the atmospheric parameters (Teff, log g, and [Fe/H]) estimated from these low-SNR stellar spectra.

Therefore, this paper focused on improving the parameter estimation accuracy for spectra with 10 < SNRg ≤ 20, and limited the SNR of the reference spectra to between 10 and 20. To ensure the quality of the reference labels, we further limited the label quality of APOGEE_payne to 'good', and ended up with 9873 spectra of common stars for this study. On these spectra, the ranges of the atmospheric parameters (Teff, log g, and [Fe/H]) are [3692, 7405] K, [0.679, 4.991] dex, and [−1.448, 0.439] dex, respectively. We divided these spectral objects into a training set and a test set (test set 1) with a ratio of 8:2. Finally, 7898 spectra were used to train the model and 1975 spectra were used to test its performance.
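The 8:2 split above can be reproduced with scikit-learn; this is a sketch on random stand-in arrays (the real X would hold the pre-processed spectra and y the APOGEE_payne labels), and the `random_state` is arbitrary.

```python
# A sketch of the 8:2 train/test split described above. With 9873 samples
# and test_size=0.2, scikit-learn yields 7898 training and 1975 test
# spectra, matching the counts quoted in the text.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(9873, 100))   # 9873 spectra (as in the text), toy fluxes
y = rng.normal(size=9873)          # one label per spectrum, e.g. Teff

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```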

Some experiments show that the model learned from the above-mentioned data also has application advantages on a wider range of stellar spectra. Therefore, in Section 4.3, we used 56 198 spectra with 5 < SNRg ≤ 10 or 20 < SNRg ≤ 80 from the common stars as another test set (test set 2) to further evaluate the applicability of our model to a wider range of stellar spectra. Here, the LAMOST low-resolution stellar spectra are used as the reference spectra, and the APOGEE_payne labels are used as the reference labels.

2.3 Pre-processing the spectra

The observed stellar spectra suffer from extinction, reddening, stray-light pollution, instrumental noise, and so on. Therefore, the metal lines of a spectrum are not obvious, and we must properly pre-process the stellar spectral data to enhance the usability of the spectral characteristic information before extracting features and estimating parameters. The specific pre-processing steps are as follows:

  • Removing redshift: we shifted each spectrum to its rest frame using the redshift estimated from the LASP.

  • Resampling: we calculated the common wavelength range 3881–8890 Å for all stellar spectra, and resampled all spectra by a linear interpolation method with a step of 0.0001 in logarithmic wavelength coordinates.

  • Denoising spectral data: the observed spectra usually have some bad pixels or pixels polluted by noise, which may negatively affect the parameter estimation. Therefore, it is necessary to denoise the spectral data. In this paper, we used a median filter to reduce spectral noise. In median filtering, as the width of the filtering window increases, the output signal becomes smoother and smoother, which may result in some losses of effective spectral features. Therefore, the width of the filtering window should be set according to the actual situation. In this paper, we experimentally evaluated window widths of 1, 3, 5, 7, and 9. The experimental results show that the median filter with a width of 3 achieves the best performance (Table 1).

  • Normalizing spectra: due to the uncertainty of the flux calibration in LAMOST stellar spectra and the unknown extinction values for most observed targets (especially those in the Galactic disc), it is necessary to normalize the stellar spectra to improve the reliability of the parameter estimations. This work obtained the pseudo-continuum by iteratively fitting a sixth-order polynomial, and normalized each spectrum by dividing its fluxes by the pseudo-continuum.
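The denoising and normalization steps above can be sketched in a few lines. This is a simplified illustration, not the paper's exact pipeline: it applies a width-3 median filter and a single-pass sixth-order polynomial fit, whereas the paper fits the pseudo-continuum iteratively.

```python
# A condensed sketch of the denoising and continuum-normalization steps:
# width-3 median filter, then a sixth-order polynomial pseudo-continuum.
import numpy as np
from scipy.signal import medfilt

def preprocess(wave, flux, window=3, degree=6):
    smoothed = medfilt(flux, kernel_size=window)        # suppress bad pixels
    # fit in a scaled coordinate to keep the polynomial well conditioned
    x = (wave - wave.mean()) / (wave.max() - wave.min())
    coeffs = np.polyfit(x, smoothed, degree)             # pseudo-continuum fit
    continuum = np.polyval(coeffs, x)
    return smoothed / continuum                          # normalized flux

# toy spectrum over the paper's common wavelength range: a smooth
# continuum plus noise
wave = np.linspace(3881.0, 8890.0, 2000)
flux = 1e-3 * wave + np.random.default_rng(1).normal(0, 0.1, wave.size)
norm = preprocess(wave, flux)
```

After division by the fitted pseudo-continuum, the normalized flux fluctuates around 1, which is the form the feature selection and parameter estimation stages operate on.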

Table 1. The dependencies of parameter estimation performance on the window width (WW) of the median filter.

WW | Teff MAE | Teff μ | Teff σ | log g MAE | log g μ | log g σ | [Fe/H] MAE | [Fe/H] μ | [Fe/H] σ
1 | 94.07 | 9.856 | 164.1 | 0.162 | 0.014 | 0.272 | 0.068 | −0.002 | 0.100
3 | 90.29 | −2.892 | 152.6 | 0.152 | 0.017 | 0.258 | 0.064 | 0.004 | 0.096
5 | 97.26 | −0.438 | 164.9 | 0.154 | 0.018 | 0.264 | 0.067 | 0.006 | 0.099
7 | 93.37 | 5.433 | 162.6 | 0.156 | 0.011 | 0.258 | 0.067 | 0.005 | 0.097
9 | 94.23 | −8.180 | 159.0 | 0.158 | 0.013 | 0.257 | 0.068 | −0.004 | 0.097

After the above procedures, we obtained the normalized spectra, and the subsequent feature selection and parameter estimation are all conducted on the normalized spectra. An example of continuum normalization is presented in Fig. 3.

Figure 3. A LAMOST observation (spec-56213-EG000023N024031B01_sp05-024) and its normalization result. The upper panel and the lower panel respectively present the original LAMOST spectrum and the continuum-normalized spectrum. The proposed parameter estimation model operates on the continuum-normalized spectrum in the bottom panel.

3 THE LASSO-MLPNET

Machine-learning algorithms can automatically learn a specific rule from reference data, and this learnt rule can be used to make predictions on new data. In particular, the neural network (NN) is a typical machine-learning algorithm that has received wide attention in spectral parameter estimation. The NN can automatically obtain an approximation of the mapping from spectral features to atmospheric parameters by learning from the given reference data. When a stellar spectrum is input into the learned NN model, an estimate of an atmospheric parameter can be computed.

This work proposed a Multilayer Perceptron neural network (MLPNet) based on LASSO features for estimating stellar atmospheric parameters. The scheme first uses the LASSO algorithm to adaptively evaluate the effectiveness of the fluxes in estimating atmospheric parameters in the presence of high-level noise, and selects the parameter-sensitive spectral features accordingly. Based on the selected spectral features, a Multilayer Perceptron neural network is used to estimate the atmospheric parameters.

3.1 LASSO

LASSO (Tibshirani 1996) is a constrained biased estimator used for feature selection. The purpose of LASSO feature selection is to improve the accuracy and interpretability of the subsequent parameter estimation (Li et al. 2014; Luo et al. 2015). The LASSO can detect the spectral features sensitive to stellar atmospheric parameters and eliminate the redundant and invalid data components by constructing a penalty function. This penalty function fuses the objectives of parameter estimation accuracy and the sparseness of the selected features. Suppose that we have a reference set $\lbrace (\boldsymbol{x}_i, y_i), i = 1, \cdots, N\rbrace$, where $\boldsymbol{x}_i = (x_{i1}, x_{i2}, \ldots, x_{in})^T$ is the i-th input variable (a spectrum), and $y_i$ is the output parameter (the atmospheric parameter to be estimated). We assume that the $x_{ij}$ are standardized, i.e. $\sum_{i} x_{ij} = 0$ and $\sum_{i} x_{ij}^2/N = 1$. The penalty function of the LASSO algorithm is:
$$\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}} \sum_{i=1}^{N} \Big( y_i - \sum_{j=1}^{n} \beta_j x_{ij} \Big)^2 \quad \text{subject to} \quad \sum_{j=1}^{n} |\beta_j| \le t, \tag{1}$$
where $\beta_j$ is the coefficient of the j-th input variable and t is a non-negative parameter that controls the sparsity of the model. Due to the differences in the correlations between the three atmospheric physical parameters and the stellar spectra, this work extracted the spectral features using the LASSO independently for Teff, log g, and [Fe/H].
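The feature-selection mechanism of equation (1) can be illustrated with scikit-learn's Lasso estimator. This is a synthetic sketch, not the paper's configuration: only 5 of 50 toy "fluxes" carry signal, and the regularization strength `alpha` (the Lagrangian counterpart of the constraint t) is chosen arbitrarily.

```python
# A minimal sketch of LASSO-based feature selection: fluxes whose
# coefficients are driven exactly to zero are discarded, and the
# non-zero positions are the "parameter-sensitive" features.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))                 # 500 toy spectra, 50 fluxes
true_coef = np.zeros(50)
true_coef[:5] = [3.0, -2.0, 1.5, 2.5, -1.0]    # only the first 5 are informative
y = X @ true_coef + rng.normal(0, 0.1, 500)    # a Teff-like label

lasso = Lasso(alpha=0.05).fit(X, y)
selected = np.flatnonzero(lasso.coef_)          # indices of retained features
```

The sparsity pattern depends directly on the regularization strength, which is why the paper tunes it by cross-validation separately for each atmospheric parameter.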

3.2 Neural network

The estimation of stellar atmospheric parameters based on the artificial neural network (ANN) method was investigated in the pioneering work of Bailer-Jones et al. (1997). An ANN is a computational model (Fig. 4) composed of a series of hierarchically organized computational units, each of which is referred to as a neuron in the related literature. This work proposed a Multilayer Perceptron neural network (MLPNet) for estimating atmospheric parameters. This MLPNet consists of an input layer, two hidden layers, and an output layer. The input to the NN is the LASSO features of a LAMOST stellar spectrum, and the output is the atmospheric parameter to be estimated. Suppose that $y_i^k$ and $y_j^{k+1}$, respectively, represent the outputs of the i-th node in the k-th layer and the j-th node in the (k + 1)-th layer of the NN, and $w_{ij}^k$ represents the weight on the connection between these two nodes. If the (k + 1)-th layer is a hidden layer, the relationship between the node outputs of the two network layers is:
$$y_j^{k+1} = g\Big( \sum_i w_{ij}^k y_i^k + b_j^k \Big),$$
where $b_j^k$ represents the bias term related to the j-th node in the (k + 1)-th layer and g is an activation function. This work used the sigmoid activation function:
$$g(x) = \frac{1}{1 + e^{-x}}.$$
The activation function improves the ability of the network model to learn non-linear relationships. If the (k + 1)-th layer is the output layer of the parameter estimation network, the relationship between the nodes of the two layers is:
$$y_j^{k+1} = \sum_i w_{ij}^k y_i^k + b_j^k.$$
Suppose W represents the set of NN weights, B the set of NN biases, and $\hat{Y}$ and Y represent the parameter predictions and parameter labels of the reference spectra, respectively. The objective function driving the learning of the NN model is:
$$L(W, B) = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2 + \alpha \Vert W \Vert_2^2,$$
where α is the regularization term coefficient, which needs to be specified empirically.
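The layer equations above can be transcribed directly into numpy. This sketch uses random weights and assumed layer widths (20 inputs, 8 hidden units) purely for illustration; in the paper the weights are learnt by back-propagation.

```python
# A direct numpy transcription of the layer equations: a hidden layer
# with the sigmoid activation g, followed by a linear output layer.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
x = rng.normal(size=20)                           # LASSO features of one spectrum
W1, b1 = rng.normal(size=(20, 8)), np.zeros(8)    # input -> hidden weights/biases
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)     # hidden -> output (linear)

h = sigmoid(x @ W1 + b1)                          # hidden-layer outputs y^{k+1}
y_hat = h @ W2 + b2                               # the estimated parameter
```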
Figure 4. This is a diagram of an NN, which consists of an input layer, a hidden layer, and an output layer.

In evaluating the performance of the MLPNet model, this work uses the mean absolute error (MAE), the average error (μ), and the dispersion (σ, the standard deviation of the differences):
$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} |\hat{y}_i - y_i|, \tag{2}$$
$$\mu = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i), \tag{3}$$
$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i - \mu)^2}, \tag{4}$$
where $\hat{y}_i$ and $y_i$ are respectively the model prediction and the reference label.
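Equations (2)–(4) map directly onto numpy reductions; the toy Teff values below are illustrative only.

```python
# The three evaluation statistics (equations 2-4) in numpy: MAE, mean
# difference mu, and the standard deviation sigma of the differences.
import numpy as np

def evaluate(y_pred, y_true):
    diff = y_pred - y_true
    mae = np.mean(np.abs(diff))     # equation (2)
    mu = np.mean(diff)              # equation (3)
    sigma = np.std(diff)            # equation (4): std about the mean mu
    return mae, mu, sigma

# toy example with three Teff predictions vs. labels (in K)
mae, mu, sigma = evaluate(np.array([5000.0, 5100.0, 4900.0]),
                          np.array([5050.0, 5050.0, 4950.0]))
```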

3.3 Training the model LASSO-MLPNet

In training the LASSO-MLPNet, we used the spectral data from the common stars between LAMOST and APOGEE (more in Section 2). Each reference sample consists of a spectrum from LAMOST DR8 and the corresponding labels from the APOGEE_payne catalog. Because the LASSO and the NN are very sensitive to the scales of the spectral features, it is necessary to normalize each spectrum by projecting its fluxes into the range from 0 to 1 before feature selection.

The observed fluxes of the LAMOST low-SNR, low-resolution spectra contain a series of noises and distortions, and the dimension of the spectra is up to 3600. These factors can negatively affect the computational efficiency and estimation accuracy. Therefore, this work selected the effective spectral features and rejected the redundant and ineffective data components by evaluating the usefulness of the fluxes in the presence of the above-mentioned factors using the LASSO method. The regularization coefficient t in the LASSO controls the sparsity of the model; this paper used a 10-fold cross-validation method to find its optimal configuration, implemented with the LASSO package of scikit-learn (Pedregosa et al. 2011). Table 2 presents the numbers of the selected features and the optimal regularization coefficients for Teff, log g, and [Fe/H].

Table 2. The numbers of the selected features (d) and the optimal regularization coefficients (t) for Teff, log g, and [Fe/H] in LASSO.

Parameter | t | d
Teff | 0.0002 | 643
log g | 0.0034 | 775
[Fe/H] | 0.0013 | 930

Then, we trained an MLPNet using the extracted spectral features. The model parameters of the NN, such as the weights and biases, are learnt using a back-propagation algorithm. In this work, a four-layer network model was trained separately for each stellar atmospheric parameter. These four layers are an input layer, two hidden layers, and an output layer. The network takes the stellar spectral features extracted by LASSO as input, and the output layer has only one node, which represents the parameter to be estimated. During training, we conducted 2000 iterations and adopted an early-stopping method to prevent the model from overfitting.
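The training stage above can be sketched with scikit-learn's MLPRegressor. This is an illustrative configuration, not the paper's: the hidden-layer widths and the toy data are assumptions (the paper does not state exact layer sizes), while the two hidden layers, sigmoid ('logistic') activation, 2000-iteration budget, and early stopping follow the text.

```python
# A hedged sketch of training one MLPNet per atmospheric parameter:
# two hidden layers, sigmoid activation, early stopping.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 30))                          # LASSO-selected features
y = X[:, 0] * 2.0 + X[:, 1] + rng.normal(0, 0.1, 1000)   # toy label, e.g. Teff

model = MLPRegressor(hidden_layer_sizes=(64, 32),  # two hidden layers (widths assumed)
                     activation='logistic',        # sigmoid, as in the text
                     max_iter=2000,                # iteration budget from the text
                     early_stopping=True,          # stop on validation plateau
                     random_state=0)
model.fit(X, y)
pred = model.predict(X[:5])
```

With `early_stopping=True`, scikit-learn holds out a validation fraction internally and halts when the validation score stops improving, which plays the role of the early-stop method described above.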

In conclusion, the training of the LASSO and the MLPNet, and the determination of the hyper-parameters in the LASSO, the MLPNet, and the pre-processing procedures, are based on the training set (Section 2).

4 EXPERIMENTS

4.1 Consistencies with the APOGEE_payne catalog

The LASSO-MLPNet model is learnt from the training set (Section 2) for predicting (Teff, log g, and [Fe/H]) from LAMOST spectra. The consistency between the LASSO-MLPNet predictions and the APOGEE_payne catalog can be investigated using scatter diagrams on the test set (Fig. 5). In Fig. 5, the upper panels present the scatter diagrams for the stellar atmospheric parameters Teff, log g, and [Fe/H]. The lower panels present the histograms of the differences between the LASSO-MLPNet estimates and the APOGEE_payne catalog for each parameter. It is shown that the scatter points of the three parameters lie near the theoretical consistency line. Therefore, there is excellent consistency between the LASSO-MLPNet estimates and the APOGEE_payne catalog.

Figure 5. Consistencies between the predictions from MLPNet and the APOGEE_payne catalog. This figure presents the results on the 1975 test spectra (test set 1) of common objects between LAMOST and APOGEE. The upper panels are the scatter diagrams with the theoretical consistency shown as a solid line. The lower panels present the distribution of the residual errors on test set 1. The red dashed curve is a Gaussian fit.

The consistency between the LASSO-MLPNet predictions and the APOGEE_payne catalog can also be reflected by the statistics MAE, μ, and σ (equations 2–4) on the differences between the LASSO-MLPNet predictions and the APOGEE_payne catalog (Table 3). The μ is the mean of the differences between the LASSO-MLPNet predictions and the APOGEE_payne catalog, reflecting the systematic offset between them; the MAE is the cumulative measurement of the differences on the test sample, depicting the overall inconsistency; the σ is the standard deviation of the differences between the LASSO-MLPNet predictions and the APOGEE_payne catalog, measuring the stability of this consistency. The results in Table 3 show that the LASSO-MLPNet model performs well in terms of systematic bias, overall consistency, and stability. However, it is worth noting that Teff is mainly distributed on [4000, 6500] K and [Fe/H] on [−1.2, 0.439] dex, and APOGEE lacks spectral data for cold stars, hot stars, and metal-poor stars. These factors may affect the generalization ability of the model in these cases. Therefore, we should be careful when using the results of the LASSO-MLPNet outside the above-mentioned ranges.

Table 3. The consistencies between the LASSO-MLPNet predictions and the APOGEE_payne catalog. These experimental results are computed from test set 1.

Parameter | MAE | μ | σ
Teff (K) | 90.29 | −2.892 | 152.6
log g (dex) | 0.152 | 0.0171 | 0.258
[Fe/H] (dex) | 0.064 | 0.0044 | 0.096

The consistencies between the MLPNet predictions and the APOGEE_payne catalog can also be studied by exploring the dependence of the differences between the two results on the SNR and on the parameters to be estimated (Fig. 6). The left-hand panels show the differences as a function of SNRg. They do not show any visible correlation between the differences and the SNRg. On the whole, the differences between the LASSO-MLPNet predictions and the APOGEE_payne catalog are around 0, which indicates strong consistency between them. The right-hand panels show the differences as a function of the stellar atmospheric parameters. For Teff < 6000 K, the differences are uniformly distributed near 0, while for Teff > 6000 K, the distribution of the differences tends to shift downward, which means that Teff derived from the stellar spectra is underestimated. This situation results from the scarcity of spectral data in this range, which makes it difficult to train the model sufficiently. For log g and [Fe/H], although there are several samples with relatively large differences, the differences as a whole are uniformly distributed near 0. In summary, these experimental results show that there is good agreement between the LASSO-MLPNet predictions and the APOGEE_payne catalog.

Figure 6. The dependencies of the consistency between the LASSO-MLPNet predictions and the APOGEE_payne catalog on SNRg and the atmospheric parameters. These experimental results are computed from test set 1. The left-hand panel presents the dependence of the consistency on SNRg. The right-hand panel presents the dependencies of the consistency on Teff, log g, and [Fe/H]. The mean and standard deviation of the differences are shown in the figure.

4.2 Comparisons with other typical regression methods

To evaluate the proposed scheme, this work also compared LASSO-MLPNet with the following typical regression methods: linear regression (LinearR), ElasticNet, support vector regression (SVR), random forest regression (RandomForest), and extremely randomized trees regression (ExtraTrees). In these comparisons, we replaced the MLPNet with LinearR, ElasticNet, SVR, RandomForest, or ExtraTrees, and kept the other configurations, such as the training set and test set 1, the spectral pre-processing procedures, and the LASSO features, the same as in the experiments in Section 4.1 (Table 2). The hyperparameters of each method were optimized through a grid search, and the experimental results are presented in Table 4.
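The baseline comparison with grid-searched hyperparameters can be sketched with scikit-learn as below. The synthetic data and the parameter grids are illustrative assumptions, not the grids used in the paper; MAE is used as the selection score to match the evaluation metric of Table 4.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, ElasticNet
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor, ExtraTreesRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))  # stand-in for LASSO-selected spectral features
y = X[:, 0] - 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=200)  # stand-in label

models = {
    "LinearR": (LinearRegression(), {}),
    "ElasticNet": (ElasticNet(max_iter=5000), {"alpha": [1e-3, 1e-2, 1e-1]}),
    "SVR(rbf)": (SVR(kernel="rbf"), {"C": [1, 10]}),
    "RandomForest": (RandomForestRegressor(random_state=0), {"n_estimators": [100]}),
    "ExtraTrees": (ExtraTreesRegressor(random_state=0), {"n_estimators": [100]}),
}

scores = {}
for name, (est, grid) in models.items():
    gs = GridSearchCV(est, grid, cv=3, scoring="neg_mean_absolute_error")
    gs.fit(X, y)
    scores[name] = -gs.best_score_  # cross-validated MAE of the best grid point
```

On this toy non-linear target, the tree ensembles typically beat the linear models, mirroring the qualitative ordering discussed below.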

Table 4. Comparing LASSO-MLPNet with other typical regression methods. LinearR: linear regression; ElasticNet; SVR(rbf): support vector regression with an RBF kernel; W-SVR(rbf): weighted SVR(rbf); RandomForest; ExtraTrees; MLPNet: the method proposed in this work.

Method        |        Teff (K)         |       log g (dex)       |      [Fe/H] (dex)
              | MAE     μ       σ       | MAE    μ        σ       | MAE    μ        σ
LinearR       | 113.55  −0.471  202.76  | 0.270  −0.055   2.960   | 0.099  0.0051   0.377
ElasticNet    | 109.73  −1.970  190.68  | 0.254  −0.0431  2.424   | 0.096  −0.0051  0.388
SVR(rbf)      | 375.23  198.71  414.44  | 0.228  0.0070   0.356   | 0.080  −0.0046  0.118
W-SVR(rbf)    | 323.73  195.86  343.18  | 0.175  −0.0041  0.269   | 0.075  −0.0036  0.110
RandomForest  | 107.71  −2.083  170.41  | 0.207  −0.0051  0.317   | 0.105  −0.0054  0.143
ExtraTrees    | 102.60  −2.112  163.67  | 0.191  −0.0070  0.287   | 0.099  −0.0059  0.137
MLPNet        | 90.29   −2.892  152.65  | 0.152  0.0171   0.258   | 0.064  0.0044   0.096

The performance of the above-mentioned methods in parameter estimation can be investigated using the differences between their predictions and the APOGEE_payne catalog. Table 4 presents the estimation differences of these methods on test set 1. On the whole, the MLPNet method outperforms the other methods. For the linear methods, the estimation performance of ElasticNet is better than that of LinearR. This is because there are certain noises and calibration defects in the observed spectra, and LinearR, without regularization, is more sensitive to such noises and defects. Moreover, the non-linear methods (ExtraTrees, RandomForest, and MLPNet) perform better than the linear methods (LinearR and ElasticNet) in estimating stellar atmospheric parameters. These results indicate that there exist non-linear relationships between the spectral features and the three stellar atmospheric parameters (Teff, log g, and [Fe/H]). Therefore, it is more suitable to derive the atmospheric parameters using a non-linear method. ExtraTrees and RandomForest are highly parallel algorithms. RandomForest learns by randomly sampling the training set and selects the best splitting attribute from a random subset, while ExtraTrees uses the entire training set and randomly selects a splitting subset of attributes to construct each decision tree. These characteristics can give the ExtraTrees model stronger randomness, a smaller model variance (σ2), and better generalization performance. However, the offset (μ) of ExtraTrees may increase (Table 4). For [Fe/H], the errors from ExtraTrees and RandomForest are higher than those of the linear models. The possible reason is that the spectra with low SNR contain more noise, while the tolerance of these methods for noise is relatively low, which can result in overfitting.

The estimation performance of SVR with an RBF kernel is better than that of the linear models (LinearR and ElasticNet) on log g and [Fe/H]. However, on the Teff estimation, the error from SVR(rbf) is too large. It is known that the SVR method treats each input feature equally, but in reality the features may differ from each other in their contributions to the parameter estimation. This inappropriate treatment may cause masking effects on features. Therefore, we investigated a reweighting scheme on the spectral features, W-SVR(rbf). The feature weights are estimated by the LASSO method in selecting the features. The W-SVR(rbf) scheme trains and tests the parameter estimation model by inputting the reweighted features into SVR(rbf). The experimental results show that the estimation performance is greatly improved after weighting the spectral features. Furthermore, the performance of W-SVR(rbf) is better than that of ExtraTrees and RandomForest for log g and [Fe/H] (Table 4). However, for Teff, the estimation error of W-SVR(rbf) is still large. The possible reason is that, although the compactness of the spectral features extracted by LASSO is very good, there exist some redundancies and insignificant data components in the selected spectral features for estimating Teff. These factors lead to a large variance in the estimation results of the model.
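The W-SVR(rbf) idea can be sketched as follows: fit a LASSO to obtain per-feature weights, scale the selected features by those weights, and train an RBF-kernel SVR on the reweighted features. The synthetic data, the regularization strength, and the use of the absolute LASSO coefficient as the weight are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.svm import SVR

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 50))  # stand-in for continuum-removed spectral fluxes
y = 2.0 * X[:, 3] - 1.5 * X[:, 17] + rng.normal(scale=0.05, size=300)

# Step 1: LASSO selects parameter-sensitive features and supplies the weights.
lasso = Lasso(alpha=0.05).fit(X, y)
weights = np.abs(lasso.coef_)  # assumption: feature weight = |LASSO coefficient|
selected = weights > 0

# Step 2: reweight the selected features and train SVR(rbf) on them.
Xw = X[:, selected] * weights[selected]
wsvr = SVR(kernel="rbf", C=10.0).fit(Xw, y)
pred = wsvr.predict(Xw)
```

Scaling each feature by its weight stretches the RBF kernel along informative directions, so an influential feature contributes more to the kernel distance than a weakly informative one.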

In conclusion, the estimation error of the MLPNet model for each atmospheric parameter is smaller than those of other six models. This indicates that the MLPNet has good parameter estimation performance and good predictive capability for low-SNR stellar spectra from LAMOST DR8.

4.3 Comparison with LASP

To evaluate the accuracy and stability of LASSO-MLPNet, we computed and analysed the consistencies of the LASSO-MLPNet predictions and the LASP predictions (the LAMOST official catalog) with the APOGEE_payne catalog. This evaluation was conducted on the 1975 test spectra of test set 1. LASP (Luo et al. 2015) implements parameter estimation based on the ULySS method and a stepwise refinement strategy (Wu et al. 2011). LASP first classifies stars into late A-type and FGK-type stars, then uses the Correlation Function Initial (CFI) method to obtain initial parameter estimates, and finally generates the final atmospheric parameter estimations using ULySS.

The consistencies between the LASP estimations and the APOGEE_payne catalog are investigated using a scatter diagram and the distribution of the differences between the two results (Fig. 7). The upper subplots in Fig. 7 show the scatter plots for the 1975 stellar spectra (test set 1). Comparing the experimental results in Figs 5 and 7, it is found that there are more observations with evident inconsistencies between the LASP estimations and the APOGEE_payne catalog, and that there exist obvious systematic deviations of the LASP estimations from the APOGEE_payne catalog. Therefore, the LASSO-MLPNet predictions are more consistent with the APOGEE_payne catalog than the LASP predictions. The lower subplots in Fig. 7 show the histograms of the differences between the LASP estimations and the APOGEE_payne catalog.

Figure 7. Consistencies between the LASP predictions and the APOGEE_payne catalog. The upper three subplots are the scatter diagrams, with the theoretical consistency shown as a solid line. The lower three subplots present the differences between the LASP predictions and the APOGEE_payne catalog. The red dashed curve is a Gaussian fitting curve. The evaluation results are calculated on test set 1.

The inconsistencies between the LASP estimations, the LASSO-MLPNet predictions, and the APOGEE_payne catalog can also be evaluated by the statistical measures MAE, μ, and σ (equations 2, 3, and 4; Table 5). For each atmospheric parameter, the mean difference μ and the dispersion σ from LASP are much more evident than the corresponding statistics from LASSO-MLPNet. Compared with the APOGEE_payne catalog, LASP overestimates Teff by 54.89 K and log g by 0.060 dex, and underestimates [Fe/H] by 0.018 dex. In contrast, the mean differences of the LASSO-MLPNet predictions (−2.892 K for Teff, 0.0171 dex for log g, and 0.0044 dex for [Fe/H]) are significantly smaller than those of LASP. Moreover, the dispersion σ and the MAE of the LASSO-MLPNet predictions are also much smaller than those of LASP. Therefore, the LASSO-MLPNet model has good parameter estimation capability on the LAMOST low-SNR stellar spectra.

Table 5. The consistencies between the LASP predictions, the LASSO-MLPNet predictions, and the APOGEE_payne catalog. The left-hand side of the table gives the consistency between the LASP estimations and the APOGEE_payne catalog (LASP, APOGEE_payne); the right-hand side gives the consistency between the LASSO-MLPNet predictions and the APOGEE_payne catalog (LASSO-MLPNet, APOGEE_payne).

Parameter      | (LASP, APOGEE_payne)    | (LASSO-MLPNet, APOGEE_payne)
               | MAE     μ       σ       | MAE    μ        σ
Teff (K)       | 144.59  54.89   200.2   | 90.29  −2.892   152.6
log g (dex)    | 0.236   0.060   0.325   | 0.152  0.0171   0.258
[Fe/H] (dex)   | 0.108   −0.018  0.162   | 0.064  0.0044   0.096

To evaluate the applicability of the LASSO-MLPNet model on a wider range of spectra, we constructed a second test set (test set 2) for the cases 5 < SNRg ≤ 10 and 20 < SNRg ≤ 80. This test set contains 56 198 LAMOST stellar spectra. We evaluated the consistency between the predictions of the aforementioned LASSO-MLPNet and the APOGEE_payne catalog on test set 1 and test set 2 (Fig. 8). The results show that LASSO-MLPNet improves the accuracy significantly in the case of 5 < SNRg ≤ 80.

Figure 8. The consistencies between the LASP estimations, the LASSO-MLPNet predictions, and the APOGEE_payne catalog. The red solid line and the green dashed line show, as functions of SNRg, the MAEs between the APOGEE_payne catalog and the LASSO-MLPNet predictions, and between the APOGEE_payne catalog and the LASP estimations, respectively. The evaluation results of this experiment are calculated on test set 1 and test set 2.

4.4 Uncertainty analysis

The uncertainty of the atmospheric parameter estimation is the instability of the estimates caused by observational noise, instrumental effects, and parameter estimation model effects. This work investigated the uncertainty from the following two aspects: the integrated uncertainty and the model uncertainty. These uncertainty evaluation results are calculated on test set 1 and test set 2.

The integrated uncertainty is measured by the standard deviation of the differences between the model estimations and the reference labels. This standard deviation describes the robustness of the parameter estimation system to factors such as noise, instrumental effects, and parameter estimation model effects. Fig. 9 shows the dependencies of the standard deviation/dispersion σ of the stellar atmospheric parameters on SNRg. The experimental results show that, for all three atmospheric parameters, the dispersions σ of the LASSO-MLPNet predictions with respect to the APOGEE_payne catalog are smaller than those of the LASP estimations over a wide SNR range. In particular, in the case of SNRg ≤ 40, the dispersion of LASSO-MLPNet decreases sharply with increasing SNRg; in the case of SNRg > 40, the trend of the dispersion flattens out. In contrast, the dispersion of LASP keeps decreasing over a much wider SNR range. Therefore, the experimental results also show that the dispersion of LASSO-MLPNet has a weaker correlation with SNRg than that of LASP. This weaker correlation means that the LASSO-MLPNet model is more robust to the above-mentioned factors. In summary, these experimental results indicate that the parameter estimation of the LASSO-MLPNet model is more robust and more certain.
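The dispersion-versus-SNRg curves of Fig. 9 are binned statistics: the residuals are grouped by SNRg and the standard deviation is taken inside each bin. A minimal sketch of that computation, with illustrative bin edges and a synthetic residual model in which noisier spectra scatter more, is:

```python
import numpy as np

def dispersion_vs_snr(residual, snr, edges):
    """Standard deviation of the residuals inside each SNR bin (lo, hi]."""
    residual, snr = np.asarray(residual), np.asarray(snr)
    sig = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (snr > lo) & (snr <= hi)
        sig.append(np.std(residual[mask]) if mask.any() else np.nan)
    return np.array(sig)

rng = np.random.default_rng(1)
snr = rng.uniform(5, 80, size=2000)
residual = rng.normal(scale=300.0 / snr)  # noisier spectra -> larger scatter
sigma = dispersion_vs_snr(residual, snr, edges=[5, 10, 20, 40, 80])
```

With this toy residual model the dispersion falls steeply over the low-SNR bins and flattens at high SNRg, qualitatively reproducing the behaviour described above.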

Figure 9. The dispersion σ of the differences between the APOGEE_payne catalog and the LASP estimations or the LASSO-MLPNet predictions. The red solid line and the green dashed line show, as functions of SNRg, the dispersions of the differences between the APOGEE_payne catalog and the LASSO-MLPNet predictions, and between the APOGEE_payne catalog and the LASP estimations, respectively. The evaluation results of this experiment are calculated on test set 1 and test set 2.

As for Teff, however, the dispersion of the differences between the LASSO-MLPNet predictions and the APOGEE_payne catalog increases slightly in the high-SNRg range. This phenomenon is probably caused by the difference in noise level between the training spectra and the test spectra. The SNRg of the training data ranges from 10 to 20, so these training spectra are disturbed by a lot of noise components. As the quality of the test data (test set 2) increases, the difference in noise level between the training data and the test data becomes more and more significant. Therefore, the dispersion of the differences increases slightly for test spectra with very high quality (high SNRg). This phenomenon also indicates that constructing an appropriate training set and a corresponding parameter estimation model for each SNR range separately is a potential direction for exploration. On the other hand, this work utilized the spectral fluxes over a wider wavelength range than LASP. Although a wide wavelength range in theory provides more spectral information for parameter estimation, there are also some potential risks, for example, the inclusion of a lot of invalid data (e.g. 5700–5900 Å). These risks can also increase the uncertainty.

The model uncertainty results from the conditions of the quantity and parameter coverage of the training spectra, the randomness of the obtained model parameters, and so on. This work used the Dropout technique (Hron, Matthews & Ghahramani 2018) to estimate the model uncertainty of LASSO-MLPNet. Related research shows that the Dropout technique is a Bayesian approximation of the model uncertainty (Gal & Ghahramani 2016). We trained 10 different LASSO-MLPNet models, which give 10 different predictions for each test spectrum. The standard deviation of these 10 predictions is taken as the model uncertainty measure (the red dotted line in Fig. 10) of the LASSO-MLPNet predictions on that spectrum. The experimental results show that the model uncertainty decreases with increasing SNRg.
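The mechanism behind dropout-based uncertainty is to keep the dropout masks active at prediction time and take the spread of repeated stochastic forward passes as the uncertainty estimate. The tiny one-hidden-layer network below, with random weights and hypothetical layer sizes, is only a sketch of that mechanism, not the trained LASSO-MLPNet (which averages over 10 trained dropout models rather than repeated passes through one network).

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(64, 20)), np.zeros(64)  # hidden layer (hypothetical sizes)
W2, b2 = rng.normal(size=64), 0.0                 # linear output layer

def forward_with_dropout(x, p=0.2):
    """One stochastic forward pass: ReLU hidden layer with an active dropout mask."""
    h = np.maximum(W1 @ x + b1, 0.0)
    mask = rng.random(h.shape) >= p   # drop each hidden unit with probability p
    h = h * mask / (1.0 - p)          # inverted-dropout scaling
    return W2 @ h + b2

x = rng.normal(size=20)               # stand-in for the selected spectral features
samples = np.array([forward_with_dropout(x) for _ in range(100)])
prediction, model_uncertainty = samples.mean(), samples.std()
```

Each pass samples a different thinned network, so the standard deviation of the passes approximates the posterior predictive spread of the model.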

Figure 10. The dependencies of the errors (the residual, the residual standard deviation, the absolute residual, and the model uncertainty) of the LASSO-MLPNet predictions on SNRg. The residual, the residual standard deviation, and the absolute residual are computed from the estimations of the LASSO-MLPNet method without dropout. The residual standard deviation is referred to as the integrated uncertainty in this work. The model uncertainty is obtained from the LASSO-MLPNet method with dropout. The evaluation results of this experiment are calculated on test set 1 and test set 2.

To further investigate the rationality of the uncertainty measures, this work studies the dependencies of the estimation residual, the MAE (absolute residual), the residual standard deviation (integrated uncertainty), and the model uncertainty on SNRg (Fig. 10). It is shown that, on the whole, there exist positive correlations between the proposed uncertainty measures (the integrated uncertainty and the model uncertainty) and the estimation inconsistency (the residual and the MAE). These positive correlations indicate the rationality of the uncertainty measures. However, some of the residuals are positive and the others are negative, and this changing sign makes the positive correlation harder to read. Therefore, we also plotted the dependence of the absolute residual on SNRg (Fig. 10). With increasing SNRg, the residual standard deviation, the absolute residual, and the model uncertainty gradually decrease in general. Specifically, they decrease sharply with increasing SNRg in the case of SNRg < 20, while in the case of SNRg > 20 the downward trend flattens out. Therefore, the model uncertainty correctly indicates the reliability of the estimated parameters.

We also investigated the sensitivity of the proposed uncertainty measures to extrapolation (Fig. 11). Based on the characteristics of the LAMOST observations, this work focused on the parameter estimation of spectra with 3500 K ≤ Teff ≤ 6500 K. It is shown that the model uncertainty is slightly sensitive to extrapolation. In the case of Teff ≤ 6500 K, the model uncertainty and the absolute parameter residual show no obvious trend with increasing Teff. However, in the case of Teff > 6500 K, the model uncertainty shows a slight upward trend. This indicates that the model uncertainty can signal cases of parameter estimation performance degradation.

Figure 11. The sensitivity of the uncertainty to extrapolation. Based on the statistical characteristics of the LAMOST spectra, this work focuses on the parameter estimation of spectra with 3500 K ≤ Teff ≤ 6500 K. It is shown that the model uncertainty is slightly sensitive to extrapolation. The evaluation results of this experiment are calculated on test set 1 and test set 2.

Some uncertainties come from noise and from variations in the reference label quality. This work studied the evaluation of the uncertainty of the proposed scheme, but did not try to reduce the negative influences of the uncertainties from noise and reference label quality variations. Leung & Bovy (2019) designed a novel objective function to reduce these kinds of negative influences from label noise and observation noise. This is an interesting and valuable direction, and we will study it in the next step.

5 APPLICATION TO LAMOST STELLAR SPECTRA

In Section 4, we evaluated the LASSO-MLPNet model on test set 1 and test set 2, and conducted a series of comparisons and analysis on the performance of this model. It is shown that the proposed model is robust and applicable to a wide range of stellar spectra. Therefore, this work applied the LASSO-MLPNet model to estimating atmospheric parameters from LAMOST low-resolution stellar spectra with 5 < SNRg ≤ 80. In this SNR range, the spectra without the estimations of redshift and atmospheric parameters in the LAMOST DR8 catalog are excluded.

The APOGEE survey mainly contains G- and K-type stars, and lacks hot and cool stars. Therefore, there are very few observations with Teff < 3500 K or Teff > 6500 K among the common stars between LAMOST and the APOGEE_payne catalog. As a result, the reference data in these parameter ranges are insufficient to train a LASSO-MLPNet model, so the spectra in these ranges are also excluded from the processing of the LASSO-MLPNet model. Therefore, we estimated the stellar atmospheric parameters for 4 828 190 LAMOST DR8 stellar spectra with 3500 K ≤ Teff ≤ 6500 K and 5 < SNRg ≤ 80. The estimations are released as a value-added catalog. The distribution of the predictions is presented in Fig. 12. The isochrones in this figure are the stellar evolution tracks from the MESA Isochrones and Stellar Tracks (MIST) with a stellar age of 7 Gyr (Choi et al. 2016; Dotter 2016).

Figure 12. The distribution diagram of the LASSO-MLPNet catalog on Teff-log g. The colour indicates the value of [Fe/H]. The isochrones are computed from the MIST stellar evolution model, in which the stellar age is 7 Gyr and the [Fe/H] is −0.5 dex (solid line), 0 (dotted line), and 0.5 dex (dotted line), respectively.

The proposed LASSO-MLPNet is a data-driven method; its learning and performance depend on the sufficiency of the reference data in the training set. Therefore, LASSO-MLPNet can degrade in the parameter ranges with scarce reference data, for example, for the spectra with Teff < 3500 K, Teff > 6500 K, or [Fe/H] < −1.5. For this reason, the released catalog does not contain these kinds of spectra. In contrast, LASP delivers stellar parameters by minimizing the χ2 between the observed spectra and template spectra from the ELODIE library (Prugniel & Soubiran 2001; Prugniel et al. 2007). This is a kind of prototype method, which is relatively robust in the parameter ranges with few reference samples.
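The template-matching idea behind LASP (find the template whose χ2 distance to the observed spectrum is smallest, then read off its parameters) can be sketched as follows. The random templates, the tiny parameter grid, and the constant per-pixel error are illustrative assumptions, not the ELODIE library or the actual LASP fit.

```python
import numpy as np

rng = np.random.default_rng(3)
templates = rng.normal(size=(5, 100))  # stand-in template spectra, one per grid point
labels = np.array([4000.0, 4500.0, 5000.0, 5500.0, 6000.0])  # hypothetical Teff grid (K)
observed = templates[2] + rng.normal(scale=0.1, size=100)    # noisy copy of template 2
error = np.full(100, 0.1)              # assumed per-pixel flux uncertainty

# chi^2 of the observed spectrum against every template; the best-matching
# template supplies the parameter estimate.
chi2 = np.sum((observed - templates) ** 2 / error ** 2, axis=1)
best = labels[np.argmin(chi2)]
```

Unlike a data-driven regressor, this prototype approach needs only one good template per parameter grid point, which is why it can remain usable where training samples are scarce.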

6 CONCLUSIONS

In this article, a Multilayer Perceptron neural network model based on LASSO feature selection (LASSO-MLPNet) was proposed for estimating the atmospheric parameters (Teff, log g, and [Fe/H]) from LAMOST DR8 spectra. We used the stellar spectra of the common stars between LAMOST and the APOGEE_payne catalog as the reference data. Each reference sample consists of a LAMOST DR8 spectrum and the corresponding parameter estimations from the APOGEE_payne catalog. The experimental results show that the LASSO-MLPNet predictions are highly consistent with the APOGEE_payne catalog. For the spectra with 10 < SNRg ≤ 20, the MAEs of the three atmospheric parameters (Teff, log g, and [Fe/H]) are 90.29 K, 0.152 dex, and 0.064 dex, respectively.

To evaluate the predictive performance of the proposed model, this paper compared LASSO-MLPNet with six typical regression methods. The experimental results show that our model has good estimation capability. At the same time, we also evaluated the consistencies between the APOGEE_payne catalog and the LASP estimations: the MAEs of the three parameters are 144.59 K, 0.236 dex, and 0.108 dex, respectively (Table 5). Compared with the results in Section 4, the LASSO-MLPNet predictions show better consistency with the APOGEE_payne catalog. We also analysed the uncertainty of the parameter estimation, and the results show that the LASSO-MLPNet model has good robustness.

Due to the lack of cool and hot stars in APOGEE, the available samples of this kind are scarce in the reference data. Therefore, the generalization performance of the model may be relatively inferior on the spectra of these stars. In addition, the spectra with low SNR are contaminated by much noise, and the metal lines in them are very weak. These factors directly affect the estimation performance of LASSO-MLPNet. Therefore, much work remains to be done in the future to further improve the parameter estimation performance for these low-quality spectra.

As part of this study, we estimated the atmospheric parameters for more than 4.82 million low-resolution spectra with 5 < SNRg ≤ 80 and 3500 K ≤ Teff ≤ 6500 K from LAMOST DR8. The estimation results are released as a value-added catalog for reference.

ACKNOWLEDGEMENTS

This work was supported by the National Natural Science Foundation of China (grant no. 11973022), the Natural Science Foundation of Guangdong Province (no. 2020A1515010710), and the Major Projects of the Joint Fund of Guangdong Province and the National Natural Science Foundation of China (grant no. U1811464).

LAMOST, a multi-object optical fibre spectroscopic telescope covering a large sky area, is a major national engineering project built by the Chinese Academy of Sciences. Funding for the project is provided by the National Development and Reform Commission. LAMOST is operated and managed by the National Astronomical Observatories, Chinese Academy of Sciences.

Software: numpy (Harris et al. 2020), scipy (Virtanen et al. 2020), astropy (Price-Whelan et al. 2018), matplotlib (Hunter 2007), and scikit-learn (Pedregosa et al. 2011).

DATA AVAILABILITY

The LAMOST data employed in this article will be available to users outside China after September 2022 for download from LAMOST DR8 at http://www.lamost.org/dr8/. The software for this pipeline, the trained model, the training set, the test data sets, and the produced catalog are available at https://github.com/xrli/LASSO-MLPNet.

REFERENCES

Bailer-Jones C. A., Irwin M., Gilmore G., von Hippel T., 1997, MNRAS, 292, 157
Choi J., Dotter A., Conroy C., Cantiello M., Paxton B., Johnson B. D., 2016, ApJ, 823, 102
Cui X.-Q. et al., 2012, Res. Astron. Astrophys., 12, 1197
Dotter A., 2016, ApJS, 222, 8
Gal Y., Ghahramani Z., 2016, in Balcan M. F., Weinberger K. Q., eds, Proceedings of the 33rd International Conference on Machine Learning, Proceedings of Machine Learning Research, Vol. 48, PMLR, New York, USA, p. 1050
Harris C. R. et al., 2020, Nature, 585, 357
Holtzman J. A. et al., 2018, AJ, 156, 125
Hron J., Matthews A., Ghahramani Z., 2018, in Dy J., Krause A., eds, Proceedings of the 35th International Conference on Machine Learning, Vol. 80, PMLR, p. 2019
Hunter J. D., 2007, Comput. Sci. Eng., 9, 90
Jönsson H. et al., 2018, AJ, 156, 126
Koleva M., Prugniel P., Bouchard A., Wu Y., 2009, A&A, 501, 1269
Leung H. W., Bovy J., 2019, MNRAS, 483, 3255
Luo A.-L., Zhao Y.-H., Zhao G., Deng L.-C., Liu X.-W., et al., 2015, Res. Astron. Astrophys., 15, 1095
Li X., Wu Q. J., Luo A., Zhao Y., Lu Y., Zuo F., Yang T., Wang Y., 2014, ApJ, 790, 105
Li X., Lu Y., Comte G., Luo A., Zhao Y., Wang Y., 2015, ApJS, 218, 3
Liu C. et al., 2014, ApJ, 790, 110
Liu X.-W., Zhao G., Hou J.-L., 2015, Res. Astron. Astrophys., 15, 1089
Majewski S. R. et al., 2017, AJ, 154, 94
Masseron T. et al., 2014, A&A, 571, A47
Pedregosa F. et al., 2011, J. Mach. Learn. Res., 12, 2825
Pérez A. E. G. et al., 2016, AJ, 151, 144
Plez B., Brett J. M., Nordlund A., 1992, A&A, 256, 551
Price-Whelan A. M. et al., 2018, AJ, 156, 123
Prugniel P., Soubiran C., 2001, A&A, 369, 1048
Prugniel P., Soubiran C., Koleva M., Borgne D. L., 2007, preprint (astro-ph/0703658)
Schölkopf B., Smola A., Müller K.-R., 1997, in International Conference on Artificial Neural Networks, Springer, Berlin, Heidelberg, p. 583
Taylor M., 2017, preprint (arXiv:1711.01885)
Tibshirani R., 1996, J. R. Stat. Soc. B (Methodological), 58, 267
Ting Y.-S., Rix H.-W., Conroy C., Ho A. Y., Lin J., 2017, ApJ, 849, L9
Ting Y.-S., Conroy C., Rix H.-W., Cargile P., 2019, ApJ, 879, 69
Virtanen P. et al., 2020, Nat. Methods, 17, 261
Wang R. et al., 2020, ApJ, 891, 23
Wu Y. et al., 2011, Res. Astron. Astrophys., 11, 924
Wu Y., Du B., Luo A., Zhao Y., Yuan H., 2014, Proc. Int. Astron. Union, 10, 340
Xiang M. et al., 2019, ApJS, 245, 34
Xiang M.-S. et al., 2017, MNRAS, 464, 3657
Zhang B., Liu C., Deng L.-C., 2020, ApJS, 246, 9
Zhao G., Zhao Y.-H., Chu Y.-Q., Jing Y.-P., Deng L.-C., 2012, Res. Astron. Astrophys., 12, 723

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://dbpia.nl.go.kr/journals/pages/open_access/funder_policies/chorus/standard_publication_model)