Jing Zhong, Zhijie Deng, Xiangru Li, Lili Wang, Haifeng Yang, Hui Li, Xirong Zhao, Estimation of stellar mass and star formation rate based on galaxy images, Monthly Notices of the Royal Astronomical Society, Volume 531, Issue 1, June 2024, Pages 2011–2027, https://doi.org/10.1093/mnras/stae1271
ABSTRACT
Studying stellar mass (M*) and star formation rate (SFR) is crucial for a deeper understanding of the formation and evolution of galaxies in the Universe. Traditionally, astronomers infer the properties of galaxies from spectra, which are highly informative but expensive and hard to obtain. Fortunately, modern sky surveys have produced vast amounts of high-spatial-resolution photometric images. Photometric images are more economical to obtain than spectra, so it would be very helpful for related studies if M* and SFR could be estimated from photometric images. This paper therefore conducts a preliminary exploration in this regard. We constructed a deep learning model named Galaxy Efficient Network (GalEffNet) for estimating integrated M* and specific star formation rate (sSFR) from Dark Energy Spectroscopic Instrument galaxy images. GalEffNet primarily consists of a general feature extraction module and a parameter feature extractor. The results indicate that the proposed GalEffNet performs well in estimating M* and sSFR, with σ reaching 0.218 and 0.410 dex, respectively. To further assess the robustness of the network, we analysed its prediction uncertainty. The results show that our model maintains good consistency within a reasonable bias range. We also compared the performance of various network architectures and further tested the proposed scheme on image sets with various resolutions and wavelength bands. Furthermore, we conducted an applicability analysis on galaxies of various sizes, redshifts, and morphological types. The results indicate that our model performs well across galaxies with diverse characteristics, suggesting its potential for broad applicability.
1 INTRODUCTION
With the commencement of various large-scale sky survey projects, our understanding of galaxy evolution has undergone a profound transformation. In 2012, the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST; Luo et al. 2015) began its survey, generating a wealth of spectroscopic data crucial for the study of the internal structure of the Milky Way, galaxy clusters, stellar characteristics, and more. Cosmological surveys of large-scale structure, such as Euclid (Laureijs et al. 2011) and the Large Synoptic Survey Telescope (LSST; LSST Dark Energy Science Collaboration 2012), are designed to scan vast expanses of the sky, creating unprecedented galaxy samples. The Sloan Digital Sky Survey (SDSS; York et al. 2000; Yanny et al. 2009), by mapping the three-dimensional structure of the Universe in detail, has become the most influential survey project to date. The Dark Energy Spectroscopic Instrument (DESI; Dey et al. 2019), as the latest-generation survey tool, offers deeper images and spectroscopic data than SDSS, providing an enduring data legacy for astronomers worldwide and serving as the primary data source for this research.
Observations of large samples of star-forming galaxies have shown that stellar mass (M*) and star formation rate (SFR) are key parameters for studying galaxy evolution (e.g. Brinchmann et al. 2004; Elbaz et al. 2011; Schreiber et al. 2015; Steffen et al. 2021). M* serves as a rough representation of the entire history of star formation within a galaxy. SFR reflects the efficiency with which a galaxy converts gas into stars over a unit of time, indicating the mass of stars produced annually. By measuring both the M* and the SFR within galaxies, we can delve into the detailed processes of their formation and evolution, and even explore the distribution of baryonic matter on a cosmic scale, which holds crucial significance.
To acquire the characteristics of galaxies, different researchers rely on various theoretical models and assumptions. M* can be calculated based on stellar population synthesis models (Bisigello et al. 2019; Li et al. 2022a). Alternatively, it can be estimated from the linear relationship between the spectral energy distribution (SED) of galaxies and their mass-to-light ratio (M*/L) (Kauffmann et al. 2003). Calculating SFR is more complex, often utilizing several commonly used SFR-tracing indicators, such as ultraviolet (UV; Salim et al. 2007), infrared (IR; Calzetti et al. 2010), H α emission lines (Brinchmann et al. 2004; Renzini & Peng 2015), and radio wavelengths (Murphy et al. 2011; Enia et al. 2022). To obtain a high-precision SFR, relying solely on data from a single wavelength is insufficient: we need to account for the parts affected by interstellar dust while also analysing the unaffected parts. Therefore, the combination of multiwavelength measurements (such as UV + IR, H α + IR) is the most commonly used method (Kennicutt et al. 2009; Hao et al. 2011; Popesso et al. 2019), which enhances the accuracy of the final derived SFR. The emergence of various simulated or semi-empirical SED models (e.g. Bruzual & Charlot 2003; Leja et al. 2017) has opened new avenues for SFR calculation. While these methods possess a solid theoretical foundation and accuracy, they may be constrained by observational conditions and data quality, especially for galaxies requiring extensive detailed photometric data. Therefore, efficient, flexible, and easily interpretable estimation tools like lephare (Ilbert et al. 2006), magphys (Da Cunha, Charlot & Elbaz 2008; Da Cunha et al. 2015), bagpipes (Carnall et al. 2018), and cigale (Boquien et al. 2019) are commonly employed for studying galaxy parameters. However, the parameters provided by these tools are not entirely independent and may yield degenerate results under different parameter settings. Consequently, a rational setting of parameters is crucial to obtaining reasonable and stable fitting models and data analysis results.
In the study of M* and SFR in galaxies, spectroscopic analysis is a common and powerful approach. However, the collection and processing of spectroscopic data are time-consuming and expensive. Therefore, astronomical imaging has emerged as a more economical method for studying the sky. This trend has brought about a wealth of high-resolution and publicly available astronomical images. Compared to spectroscopy-based methods (Bonjean et al. 2019; Surana et al. 2020; Sandles et al. 2022), the image-based methods provide a different perspective for studying M* and SFR. These approaches involve a comprehensive analysis of galaxies’ features in two-dimensional space, including morphological characteristics (such as size, shape, symmetry), colour distribution, brightness distribution, and potential star-forming regions. This further enhances our understanding of galaxy evolution and stellar formation mechanisms.
In 2021, Buck & Wolf (2021) conducted a study on simulated SDSS images generated by the Illustris simulation. They proposed establishing a pixel-by-pixel connection between astronomical broad-band imaging data and underlying discernible physical properties such as M*, SFR, and more. This approach enables the extraction of galaxy attributes from relatively inexpensive galaxy images, eliminating the need for costly Integral Field Unit spectroscopic measurements, as seen in Mapping Nearby Galaxies at Apache Point Observatory (MaNGA; Bundy et al. 2014) and the Sydney-AAO Multi-object Integral-field spectrograph (SAMI; Bryant et al. 2015). However, due to the virtual and complex nature of simulated images, models trained on them may not generalize well to real data sets. Therefore, we proposed a research goal focused on estimating parameters based on real galaxy images.
With the advent of the artificial intelligence and big data era, machine learning methods have shown promising applications in numerous astronomical studies, such as galaxy morphology classification (Ćiprijanović et al. 2022; Fang et al. 2023), weak gravitational lensing of galaxies (Fluri et al. 2022), galaxy cluster mass estimation (Ntampaka et al. 2018, 2019), and photometric redshift estimation (Henghes et al. 2022; Li et al. 2022b). Neural networks, as a machine learning method, have garnered increasing attention in recent years. Compared with traditional machine learning methods, neural networks demonstrate exceptional performance in tasks such as image classification and regression. In research on galaxy images in particular, work has centred on convolutional neural networks (CNNs). For instance, Wu & Boada (2019) utilized CNNs to estimate gas metallicity, Pasquet et al. (2019) employed deep CNNs for estimating photometric redshifts, Ntampaka et al. (2019) applied a CNN to estimate cluster masses, and Bisigello et al. (2022) applied a CNN to estimate redshift, M*, and SFR. CNNs exhibit translational invariance and a degree of rotational invariance, and demonstrate excellent feature-learning capabilities in addressing galaxy image-related challenges.
Therefore, this paper aims to train Galaxy Efficient Network (GalEffNet), a deep convolutional neural network, to learn and predict integrated M* and SFR solely from real galaxy images. We established a reference data set through cross-matching DESI galaxy image data with the SDSS Max Planck Institute for Astrophysics - Johns Hopkins University (MPA-JHU) catalogue. GalEffNet is trained using transfer learning, allowing it to extract more comprehensive physical features from multiband images of DESI galaxies. This process further enhances the accuracy and reliability of the model's predictions, enabling effective predictions of M* and sSFR for new galaxy images.
The structure of this paper is organized as follows: Section 2 introduces the DESI galaxy images data, the SDSS MPA-JHU catalogue, and the relevant data pre-processing methods. Section 3 introduces the GalEffNet model. Section 4 presents the performance of our model, relevant experiments, and research results. Finally, in Section 5, we provide a summary and outlook for this paper.
2 DATA
The reference data set for the neural network model learning in this paper is composed of the DESI Data Release 9 (DR9) galaxy images (Dey et al. 2019) and the SDSS MPA-JHU Data Release 8 (DR8) catalogue (Kauffmann et al. 2003; Brinchmann et al. 2004), with M* and specific star formation rate (sSFR). This data set was established through cross-matching between the DESI DR9 catalogue and the MPA-JHU DR8 catalogue. The following two sections will provide further details on the process of creating the reference data set and the relevant pre-processing steps.
2.1 Reference data set based on homogeneous observations from DESI and MPA-JHU
The SDSS is one of the largest optical sky survey projects to date. It has produced deep images of about one-third of the sky in five optical bands: u, g, r, i, and z, and measured the spectra of over three million celestial objects. Extensive research on the emission lines in these spectra has generated numerous value-added catalogues, providing rich information about celestial objects. In this study, the SDSS MPA-JHU DR8 catalogue is used, where Kauffmann et al. (2003) calculated M* using photometric luminosities across the g, r, i, and z bands. By comparing with synthetic galaxy models, they estimated the dust correction and the ratio of stellar mass to light, thereby deriving the best estimate of M*. Specifically, we used the median stellar mass (lgm_fib_p50) of target galaxies located within the SDSS fibre aperture (3 arcsec in diameter). The SFR is estimated by Brinchmann et al. (2004) for galaxies also within the SDSS fibre aperture (3 arcsec in diameter), using the dust extinction corrected H α emission lines, derived from the Balmer decrement ratio H α/H β. For galaxies without emission lines, Brinchmann et al. (2004) estimated the SFR using the relationship between SFR and the spectral index D4000 (Balogh et al. 1999). Because the distribution of stellar mass within galaxies is not uniform, we ultimately chose to normalize SFR with M* within the same galaxy, resulting in the median specific star formation rate (specsfr_fib_p50) as a reference label. sSFR characterizes the intensity of star formation activity in a galaxy.
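Since the catalogue masses and star formation rates are logarithmic (dex) quantities, normalizing SFR by M* within the same galaxy amounts to a subtraction in log space. A minimal sketch (the function name is ours, not from the catalogue):

```python
import numpy as np

def log_ssfr(log_sfr, log_mstar):
    """Specific SFR in log10 space: lg sSFR = lg SFR - lg M* (dex)."""
    return np.asarray(log_sfr) - np.asarray(log_mstar)

# e.g. a galaxy forming 1 Msun/yr (lg SFR = 0) with lg M* = 10
# has lg sSFR = 0 - 10 = -10 dex
```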
The primary objective of DESI is to precisely observe the three-dimensional distribution of the large-scale structure of galaxies in the Universe, thereby unravelling the mysteries of the Universe's accelerated expansion and dark energy. As the world's most powerful multi-object survey spectrograph, DESI will provide more than ten times as much spectroscopic and imaging data as all previous survey observations combined, offering an unprecedented wealth of data resources. DESI covers 19 721 square degrees of the extragalactic sky visible from the Northern hemisphere. The survey footprint is divided into Northern and Southern sections and encompasses three optical bands (g, r, z). In this paper, we utilize the DESI DR9 galaxy image data released in January 2021 to perform feature learning related to M* and sSFR.
To ensure accurate matching between reference labels and the image data set, we initially filtered reliable reference labels from the MPA-JHU catalogue. We set a series of criteria.
(i) spectrotype = ′GALAXY′ and reliable = 1. The first condition removes sources labelled STAR or Quasi-Stellar Object (QSO), keeping only galaxies; reliable = 1 ensures that the final selected galaxy labels are reliable and accurate.

(ii) z > 0, together with a good redshift-quality flag. This selects galaxies with positive redshifts and removes bad redshift measurements, ensuring that the selected galaxies have reliable redshift values.

(iii) Signal-to-noise ratio in the r band (S/Nr) ≥ 5. This criterion eliminates low-quality galaxy data.

(iv) Valid values of M* and sSFR. Galaxies with invalid or missing values of either parameter are filtered out.
Subsequently, we used the topcat1 tool to perform cross-matching between approximately 76 million sources from the DESI DR9 samples and the MPA-JHU catalogue. The matching parameters are the galaxy coordinates (ra, dec), with a matching radius of 3 arcsec; the best match is chosen to ensure accurate pairing. Because of repeated observations in the SDSS and DESI surveys, both the SDSS MPA-JHU DR8 catalogue and the DESI DR9 catalogue contain duplicate entries. To ensure the purity of the training data, we removed these duplicates. Finally, we obtained a total of 805 678 images with M* and sSFR parameters. Before using the images as input to GalEffNet, we perform further data pre-processing, cleaning, and filtering.
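The cross-match itself is performed with topcat, but the criterion it applies can be sketched in a few lines of NumPy: for each source, find the nearest catalogue entry and accept it only if the angular separation is within 3 arcsec. The helper names below are our own illustration, not the paper's code:

```python
import numpy as np

def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Angular separation (arcsec) between sky positions given in degrees,
    using the haversine formula (numerically stable for small separations)."""
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
    d = 2 * np.arcsin(np.sqrt(
        np.sin((dec2 - dec1) / 2) ** 2
        + np.cos(dec1) * np.cos(dec2) * np.sin((ra2 - ra1) / 2) ** 2))
    return np.degrees(d) * 3600.0

def best_match(ra, dec, cat_ra, cat_dec, radius=3.0):
    """Index of the closest catalogue source within `radius` arcsec, else None."""
    sep = angular_sep_arcsec(ra, dec, np.asarray(cat_ra), np.asarray(cat_dec))
    i = int(np.argmin(sep))
    return i if sep[i] <= radius else None
```

Keeping only the best match within the radius mirrors topcat's "best match" mode used here.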
2.2 Data pre-processing
In this section, we conducted research on the data samples obtained through cross-matching. We summarized three key issues in the pre-processing stage: addressing masked noisy astronomical sources, identifying blended galaxies, and managing source size constraints. Next, we will introduce the methods for addressing these three issues one by one.
2.2.1 Masking noisy astronomical sources
Each sample in this work is a galaxy image observed by DESI, consisting of 152 × 152 pixels in three optical bands, g, r, and z (Fig. 1). Typically, each image contains multiple celestial objects, with only the one at the centre of the image being the target galaxy we need to predict; the rest are noisy objects. Since the number of noisy objects far exceeds that of target galaxies, the model may struggle to extract meaningful information, which is detrimental to subsequent research. To address this issue, we mask the image to retain only the target galaxy. We delve deeper into the rationale for masking noisy astronomical sources in Appendix A1. Next, we use adaptive thresholding and contour fitting to mask the noisy astronomical sources.

Examples of galaxy images from the DESI data base. Each image is 152 × 152 pixels and covers the g, r, z three optical bands. The numbers indicate the ra and dec of each target galaxy located at the centre of the image.
(i) Adaptive Thresholding: First, we utilized Stein's algorithm2 to convert the original images containing the g, r, and z bands into RGB images, while linearly scaling the pixel values of the images to the [0, 1] range. We then convert the RGB image to a greyscale image by taking a weighted average of the colour channel values for each pixel. After that, we calculate the threshold based on the research by Yuan et al. (2023). Using the greyscale characteristics of the image, we determine the maximum and minimum greyscale values in the image, denoted as Cmax and Cmin. The threshold T0 is initialized as the mean of these two extremes:

T0 = (Cmax + Cmin)/2. (1)
We divide the image into foreground and background using the initial threshold T0, and calculate their respective average greyscale values, Cobj and Cbkg. Then, we update the threshold through the following iterative process:

Tk+1 = (Cobj + Cbkg)/2. (2)

This iterative process is repeated until the current threshold Tk and the threshold for the next iteration Tk+1 are equal,

Tk+1 = Tk, (3)

obtaining the adaptive optimal threshold.
Using the obtained threshold, we extract connected regions in the image with greyscale values higher than this threshold. Set the pixel values within the regions to 1 and the rest to 0, resulting in the binary segmentation image as shown in Fig. 2(c).
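The iterative thresholding and binarization described above can be sketched in pure NumPy as follows (an illustration of the described procedure, not the authors' code):

```python
import numpy as np

def adaptive_threshold(gray, tol=1e-6, max_iter=100):
    """Iterative threshold of Section 2.2.1: start from the mid-grey value
    T0 = (Cmax + Cmin)/2, then repeatedly set the new threshold to the mean
    of the foreground and background averages until it stops changing."""
    g = np.asarray(gray, dtype=float)
    t = (g.max() + g.min()) / 2.0                     # T0 = (Cmax + Cmin)/2
    for _ in range(max_iter):
        obj, bkg = g[g > t], g[g <= t]
        t_next = (obj.mean() + bkg.mean()) / 2.0      # T_{k+1} = (Cobj + Cbkg)/2
        if abs(t_next - t) < tol:                     # converged: T_{k+1} == T_k
            return t_next
        t = t_next
    return t

def binarize(gray, t):
    """Pixels above the threshold become 1, the rest 0 (binary segmentation)."""
    return (np.asarray(gray, dtype=float) > t).astype(np.uint8)
```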

Visual representations of the different steps for masking noisy objects. The process involves converting the image from (a) to a greyscale image (b), generating a binary image (c) using an adaptive thresholding method. Next, obtaining the galaxy contour image (d) through edge detection, followed by ellipse fitting for each galaxy (e), and finally, masking the noisy astronomical sources, retaining the target galaxy (f) located at the centre of the image.
(ii) Contour Fitting: We perform edge detection on the binary image to extract the contours of each galaxy in the image. Then, using OpenCV library functions, we fit ellipses to these contours. Based on the characteristics of the target galaxy, we use the distance between the ellipse’s centre and the image’s centre as an evaluation parameter to identify the target galaxy. Finally, we enlarge the ellipse size by a factor of 1.5 to encompass as much information of the galaxy as possible, and mask the regions outside the ellipse with black.
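Assuming the ellipse has already been fitted (the paper uses OpenCV's contour and ellipse-fitting functions for that step), the enlarge-and-mask operation can be sketched in NumPy as follows; all names are illustrative:

```python
import numpy as np

def mask_outside_ellipse(img, center, axes, angle_deg, scale=1.5):
    """Black out pixels outside the fitted ellipse, enlarged by `scale`
    (1.5 in the paper) to retain the target galaxy's outer halo.
    center = (cx, cy) in pixels, axes = (a, b) semi-axes in pixels."""
    h, w = img.shape[:2]
    yy, xx = np.mgrid[0:h, 0:w]
    cx, cy = center
    a, b = axes[0] * scale, axes[1] * scale
    th = np.radians(angle_deg)
    # rotate pixel coordinates into the ellipse frame
    xr = (xx - cx) * np.cos(th) + (yy - cy) * np.sin(th)
    yr = -(xx - cx) * np.sin(th) + (yy - cy) * np.cos(th)
    inside = (xr / a) ** 2 + (yr / b) ** 2 <= 1.0
    out = img.copy()
    out[~inside] = 0          # mask the regions outside the ellipse with black
    return out
```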
After conducting visual inspections on randomly extracted sources, we found that the masking algorithm successfully masked over 90 per cent of the noisy objects while preserving the majority of the halo information from the target galaxy. Fig. 2 illustrates the process of masking noisy objects step by step. The limitations of this algorithm are discussed in Appendix A2, where we also demonstrate its masking effect on low-surface-brightness galaxies and disturbed galaxies. On some images, the algorithm fails to accurately identify the target galaxy at the centre of the image or properly fit its contour; the images of these galaxies are filtered out. In this work, approximately 7 per cent of the targets were excluded by the masking algorithm.
2.2.2 Blended galaxies identification
Blended galaxies are a visual phenomenon in which multiple galaxies, closely positioned on the sky, appear as a single galaxy when observed from a distance. This can confound the surface brightness, shape, and spectral characteristics of galaxies, making it challenging to accurately discern the true properties of each individual galaxy. To address the issues introduced by blended galaxies, astronomers employ various methods, such as spectral analysis (Joseph, Courbin & Starck 2016) and machine learning techniques (Burke et al. 2019; Arcelin et al. 2020). In our data set, a significant number of target galaxies exhibit blending. Blended galaxies imply the possibility of multiple celestial objects within the precision range of cross-matching, leading to inaccurate parameter determinations. At the same time, the combined radiation from multiple galaxies not only blurs the colours and luminosities of the target galaxy but also complicates the measurements of M* and sSFR, resulting in imprecise estimates. Therefore, to measure M* and sSFR more accurately, it is crucial to precisely identify and remove blended galaxies.
Existing methods for detecting blended galaxies often yield less than optimal results. After trying various approaches, we ultimately chose to employ the photutils3 image processing package together with a connectivity-based recognition method to identify and remove blended galaxies from the image data, ensuring data quality. First, we use the photutils package to identify blended galaxies. We then introduce a connectivity-based recognition method that effectively assists in identifying blended galaxies; owing to its low false-positive rate, it is highly suitable for a second round of detection after processing with the photutils package. Repeating and refining these steps helps reduce potential data biases and improves the reliability of the measurements of M* and sSFR.
First, we introduce the method based on the photutils image processing package. photutils is a Python package for astronomical image analysis, providing open-source functionality for detecting sources, measuring their sizes and shapes, and identifying different celestial objects. Identifying blended galaxies in the DESI image data using photutils mainly involves the following key steps.
(i) Masking Noisy Astronomical Sources: The original DESI images are pre-processed as described in Section 2.2.1 to reduce background noise and the impact of noisy astronomical sources on the detection of blended galaxies. This is done to improve the identification of blended galaxies.
(ii) Source Detection: Using functions from photutils, source detection is performed on the pre-processed images. This step automatically identifies celestial sources in the image and determines their central positions. To ensure the reliability of detection, we require that detected sources have a minimum number of connected pixels (10 pixels) and that each pixel exceeds a specified threshold. In this case, we define a 2D detection threshold image using the background Root Mean Square (RMS) image. The setting of the threshold is crucial: if it is too low, background noise can be mistakenly identified as celestial sources; if it is too high, some dimmer objects may go undetected. After experimenting on the images, we set the threshold to 1.5 times the noise level. This threshold setting helps us detect and identify target galaxies more accurately.
(iii) Blended Galaxies Identification: Blended galaxies in the image have connected pixel values, causing them to be identified as a single source. Using functions from photutils, saddle points between two sources are detected to determine if two objects are blended. This requires that the two blended objects must be sufficiently separated to accurately identify their saddle point. If the number of detected sources is greater than one, it can be considered as a blended galaxy.
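The detection criteria above (threshold of 1.5 × background RMS, at least 10 connected pixels, more than one detected source implying blending) can be illustrated with a pure-Python connected-component count. This is a sketch of the same logic, not a substitute for photutils, which additionally deblends touching sources via saddle points:

```python
import numpy as np
from collections import deque

def count_sources(img, rms, nsigma=1.5, npixels=10):
    """Count detected sources following the criteria of Section 2.2.2:
    pixels above nsigma * background RMS, grouped into 8-connected regions
    of at least `npixels` pixels. More than one source suggests blending."""
    mask = np.asarray(img, dtype=float) > nsigma * rms
    seen = np.zeros_like(mask, dtype=bool)
    h, w = mask.shape
    n_sources = 0
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and not seen[sy, sx]:
                # flood-fill one 8-connected region and measure its size
                size, q = 0, deque([(sy, sx)])
                seen[sy, sx] = True
                while q:
                    y, x = q.popleft()
                    size += 1
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and mask[ny, nx] and not seen[ny, nx]):
                                seen[ny, nx] = True
                                q.append((ny, nx))
                if size >= npixels:
                    n_sources += 1
    return n_sources
```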
We demonstrate the effectiveness of blended galaxies identification using the photutils package in Fig. 3. Among them, Figs 3(a) and (b) are examples of successfully identifying blended galaxies and non-blended galaxies. Fig. 3(c) illustrates an example where the photutils package incorrectly identified a blended galaxy as a non-blended one, and a secondary detection will be performed using the connectivity-based recognition method.

The effectiveness of blended galaxy identification using the photutils package. Fig. (a) is successfully identified as a blended galaxy and will be removed, while Fig. (b) is successfully identified as a non-blended galaxy and will be retained as input data. Fig. (c) represents an instance where the photutils package failed to recognize a blended galaxy, and a secondary detection will be conducted using the connectivity-based recognition method.
Next, we present the connectivity-based recognition method. As the centres of galaxies are usually bright, blended galaxies often appear in the image as multiple bright peaks; we can therefore identify blended galaxies by detecting these peaks. Following the masking concept of Section 2.2.1, we perform the threshold calculation once again on the masked data, generating a new binary image. Since some areas of the image have already been blacked out, the calculated threshold is generally higher than before, making it easier to segment blended galaxies into independent regions. Next, we utilize the OpenCV library to extract the connected regions of galaxies from the binary image and count them. If there are two or more connected regions, each with more than 3 pixels, we identify the image as containing blended galaxies. As shown in Fig. 4, the connectivity-based recognition method successfully identifies blended galaxies that photutils failed to recognize (Fig. 3c).
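The final decision rule of this method can be sketched as follows. We read the condition as requiring at least two sufficiently large connected regions; the region extraction itself is done with OpenCV's connected-components functions in the paper, so only the rule is shown here:

```python
def is_blended(region_sizes, min_regions=2, min_pixels=3):
    """Connectivity-based check: flag the image as blended when at least
    `min_regions` connected regions each contain more than `min_pixels`
    pixels. `region_sizes` is a list of pixel counts per connected region."""
    big = [s for s in region_sizes if s > min_pixels]
    return len(big) >= min_regions
```

Requiring more than 3 pixels per region filters out isolated noise pixels that survive the second thresholding.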

The effectiveness of successfully identifying blended galaxies using the connectivity-based recognition method. The number and pixel count of connected regions (the central circles) are computed from the binary image (c) to determine if there are blended galaxies.
2.2.3 Source size constraint
During the cross-matching process, we opted to use parameters from the SDSS value-added catalogue to match with the DESI galaxy images. The SDSS fibre has a diameter of 3 arcsec, but in this work we consider galaxies with radii up to 6 arcsec, extending the photometric data beyond the area covered spectroscopically. This choice implies that our parameters are best suited to capturing the features of small galaxies. For galaxies that are too large, the parameters may represent only local information and might not characterize the entire galaxy. To ensure the accuracy and relevance of our labels, we further filtered and processed the data. Taking into account the errors in spectroscopic observations and cross-matching, we chose to study galaxies with radii less than 6 arcsec. To validate this choice, we conducted relevant tests in Section 4.4.1; the results indicated no decrease in performance when the galaxy diameter exceeded 3 arcsec. Given that our study restricts sizes to within 6 arcsec and the DESI pixel scale is 0.262 arcsec, 6 arcsec corresponds to approximately 22.9 pixels, so we discarded galaxies with a circumscribed-circle radius greater than 23 pixels. Finally, we cropped the images from 152 × 152 pixels to 64 × 64 pixels to remove large uninformative regions at the image edges, preventing interference with the model's learning process. After these selection steps, we obtained 352 130 DESI observation samples.
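The size cut and the centre crop are straightforward to reproduce; a small sketch under the stated DESI pixel scale (function names are ours):

```python
import numpy as np

PIXEL_SCALE = 0.262  # arcsec per pixel (DESI imaging)

def radius_pixels(radius_arcsec, pixel_scale=PIXEL_SCALE):
    """Convert an angular radius to pixels: 6 arcsec -> ~22.9 px."""
    return radius_arcsec / pixel_scale

def center_crop(img, size=64):
    """Crop an (H, W, C) image around its centre: 152x152 -> 64x64 here."""
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]
```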
To enhance the efficiency of exploration while ensuring adequate data, we randomly selected 100 000 samples from the above data set. Following standard practice in machine-learning model construction, we randomly divided these data into training, validation, and testing sets in a ratio of 6:1:3. The training set carries the experiential knowledge the model needs to learn, the validation set guides the adjustment of learning strategies and model hyperparameters, and the testing set evaluates the quality of the learning outcomes. The three sets consist of 60 000, 10 000, and 30 000 samples, respectively. This ratio helps the model learn the characteristics of the data efficiently, improving its generalization ability and accuracy. Fig. 5 presents the distributions of the selected samples in the parameter spaces of redshift, M*, and sSFR. The redshift distribution is relatively low, clustering within the interval (0, 0.35]. The values of M* span the range [7.5, 11.5], while the values of sSFR fall within [−13.5, −8] and exhibit a distinct bimodal distribution.
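A minimal sketch of the random 6:1:3 split (the seed and function name are ours):

```python
import numpy as np

def split_indices(n, ratios=(6, 1, 3), seed=0):
    """Randomly partition n sample indices into train/validation/test
    sets with the given ratio (6:1:3 in this work)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    total = sum(ratios)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

With n = 100 000 this yields 60 000, 10 000, and 30 000 disjoint samples, matching the paper.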

The distributions of the used samples in three parameter space. The training set, validation set, and test set, respectively consist of 60 000 samples, 10 000 samples, and 30 000 samples.
3 METHOD
The application of deep learning algorithms in the fields of astrophysics and cosmology has become increasingly prevalent, emerging as a powerful tool for the detection, classification, and description of astrophysical sources. It excels in automatically extracting features from complex data and exhibits strong generalization capabilities, particularly when investigating intricate astrophysical phenomena. At a fundamental level, inferring galaxy properties from large-scale photometric measurements can be viewed as an image regression problem. However, regression tasks based on deep learning for photometric images are often plagued by the following three data characteristics: (i) complex structures and morphologies: galaxies exhibit diverse morphologies and structures, posing a challenge in accurately extracting and understanding these features; (ii) diverse colour and brightness distributions: significant variations in colour and brightness distributions among different galaxies may introduce some level of ambiguity or error in regression tasks; (iii) the presence of star-forming regions: the existence of star-forming regions in galaxies can impact the extraction of overall galaxy features. To address these challenges, this paper introduces an efficient, flexible, and scalable convolutional neural network architecture GalEffNet as our research methodology.
3.1 Deep neural network GalEffNet
This section introduces the overall architecture and design principles of GalEffNet. The network structure of GalEffNet is depicted in Fig. 6 and is primarily composed of two key components: the General Feature Extraction Module and the Parameter Feature Extractor. Throughout the entire model, we emphasize the importance of feature extraction. If the foundational backbone network fails to learn high-quality features adequately, the model’s performance may be unsatisfactory. Therefore, GalEffNet’s distinctive feature lies in using the streamlined EfficientNet-B3 architecture (Tan & Le 2019) as the General Feature Extraction Module to ensure effective feature extraction from galaxy images. Before the introduction of EfficientNet, research aimed at improving neural network performance primarily focused on the width, depth, and resolution of the network. However, the relationships between these three factors are closely intertwined. The core idea behind EfficientNet is to use a compound scaling method to balance these three critical factors, aiming to obtain an optimal model under certain complexity constraints. Hence, we selected the EfficientNet-B3 backbone network to explore the connection between the DESI grz three-band photometric images and the physical properties of galaxies.
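For reference, EfficientNet's compound scaling ties depth, width, and input resolution to a single coefficient φ via base constants reported by Tan & Le (2019), chosen so that FLOPs roughly double per unit of φ. A small sketch:

```python
# EfficientNet compound scaling (Tan & Le 2019): depth ~ alpha**phi,
# width ~ beta**phi, resolution ~ gamma**phi, with the base constants
# satisfying alpha * beta**2 * gamma**2 ~= 2.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution multipliers

def scaling_factors(phi):
    """Multipliers applied to the baseline network for coefficient phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi
```

EfficientNet-B3 corresponds to a larger φ than the B0 baseline, trading extra computation for the balanced gains in all three dimensions described above.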

Structure of GalEffNet. Pre-processed galaxy images are input into the GalEffNet model, sequentially passing through the General Feature Extraction Module (Stem Block and MBConv Block) and the Parameter Feature Extractor (PFeatX Block). Finally, the model outputs predictions for M* and sSFR at the same time through the output layer. In the diagram, FC represents a fully connected layer, which maps the outputs of the preceding feature extraction layers to the final prediction values. BN denotes batch normalization, applied to normalize the current input under the same conditions to expedite model training. Conv represents a convolution operation, and k*k denotes the size of the convolutional kernel. For example, Conv3*3 signifies a convolution operation with a kernel size of 3*3. Swish refers to the swish activation function, expressed as y = x*sigmoid(x).
EfficientNet comprises two modules: the Stem Block and the MBConv Block. During the forward pass, the pre-processed images first pass through the Stem Block. Serving as the starting point of the network, the Stem Block performs preliminary processing and feature extraction on the input images. Through a series of operations such as convolution and pooling, it gradually reduces the size and channel number of the feature maps, extracting initial low-level features. The network then proceeds through seven stages, each consisting of a stack of MBConv Blocks repeated a different number of times. The MBConv Block is the core component of the EfficientNet series, designed to extract deeper feature information. It first employs a 1 × 1 ordinary convolution to adjust the number of channels. It then applies depthwise separable convolution (Depthwise Conv) to extract spatial features independently from each input channel, reducing both parameter count and computational cost. Next, a Squeeze-and-Excitation channel attention mechanism compresses and recalibrates the feature channels to enhance the network's ability to capture crucial information. Finally, a 1 × 1 convolution layer restores the number of feature channels. The entire MBConv Block uses residual connections to add the input to the output, allowing gradients to flow directly through the network and improving training stability.
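To make concrete why depthwise separable convolution reduces the parameter count, the following sketch (our illustration, not code from the paper) compares a standard k × k convolution with its depthwise separable equivalent:

```python
def conv_params(k, c_in, c_out):
    """Parameters of a standard k x k convolution (biases omitted)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Depthwise k x k conv (one filter per input channel) followed by
    a 1 x 1 pointwise conv that mixes channels."""
    return k * k * c_in + c_in * c_out

# Example: a 3x3 convolution mapping 128 -> 128 channels
standard = conv_params(3, 128, 128)                  # 147456
separable = depthwise_separable_params(3, 128, 128)  # 1152 + 16384 = 17536
print(standard, separable, round(standard / separable, 1))
```

For this typical layer size the separable form needs roughly 8× fewer parameters, which is the saving the MBConv Block exploits.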
Finally, we constructed a simple network module, the PFeatX Block, serving as the Parameter Feature Extractor. This block enhances the model’s sensitivity to crucial parameters by further emphasizing and complementing features on the feature map. The PFeatX Block is responsible for mapping high-level features to the final output. It transforms the feature map into a one-dimensional vector through operations like global average pooling, and then maps the one-dimensional vector to the final output space, namely M* and sSFR, using fully connected layers. This concept aims to capture details and local features of important structures in galaxies, further enhancing the model’s performance. With this approach, we not only establish a numerical connection between imaging data and the values of M* and sSFR but also have the potential to conduct a comprehensive two-dimensional analysis of the image data.
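As an illustration of the final mapping performed by the PFeatX Block (global average pooling followed by a fully connected layer to the two outputs), consider this minimal NumPy sketch; the function name, weights, and shapes are hypothetical, not taken from the paper:

```python
import numpy as np

def pfeatx_head(feature_map, W, b):
    """Global average pooling + fully connected layer.

    feature_map : (C, H, W) array of high-level features
    W           : (C, 2) weight matrix, b : (2,) bias (hypothetical values)
    Returns two numbers, interpreted here as (M*, sSFR).
    """
    v = feature_map.mean(axis=(1, 2))  # pool each channel to one value -> (C,)
    return v @ W + b

# Toy check: 4 constant channels, weights that copy channels 0 and 1
fmap = np.stack([np.full((8, 8), c) for c in [1.0, 2.0, 3.0, 4.0]])
W = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]])
out = pfeatx_head(fmap, W, np.zeros(2))
# out == [1.0, 2.0]
```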
3.2 Model training
This section describes the model initialization and the strategy for selecting the optimal model. During the initialization stage of the GalEffNet model, for the General Feature Extraction Module we loaded the pre-trained weights of EfficientNet-B3, trained on the ImageNet data set. This approach, known as transfer learning, gives the model strong feature extraction capabilities from the start and allows it to converge more quickly. Recognizing that different parameter settings may lead to varied experimental results, we employed the following configuration: 50 epochs of iterative training based on transfer learning, a learning rate of 3e−4, a batch size of 256, and an L2 regularization coefficient α of 0.001. We also implemented an early stopping mechanism, terminating the training process if the loss on the validation set fails to decrease by at least 0.005 over a span of 5 epochs. This configuration underwent in-depth analysis and experimental validation to ensure that the model converges stably and rapidly during training and performs well on the test set. The regularization coefficient also helps control the model's complexity, preventing overfitting.
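The early stopping rule admits more than one implementation; the following sketch shows one plausible reading (our illustration, not the authors' code): stop when no epoch in the last five has improved the best earlier validation loss by at least 0.005.

```python
def should_stop(val_losses, patience=5, min_delta=0.005):
    """Return True when none of the last `patience` validation losses
    improved the best earlier loss by at least `min_delta`."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    recent_best = min(val_losses[-patience:])
    return recent_best > best_before - min_delta

print(should_stop([1.0, 0.8, 0.6, 0.4, 0.2, 0.1]))       # False: still improving
print(should_stop([1.0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]))  # True: plateaued
```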
Lastly, to improve the model’s data fitting capabilities, we selected the Mean Absolute Error (MAE) as the model’s loss function. This allows the model to update its parameters or weights in a way that minimizes the loss function. Through the iterative training process, GalEffNet gradually optimizes its parameters, enhancing the accuracy of predicting galaxy parameters.
4 RESULTS AND DISCUSSION
To assess the performance of the GalEffNet model, we evaluated the model in Section 4.1 and performed an analysis of the model’s uncertainty in Section 4.2. Subsequently, we applied the GalEffNet model to DESI DR9 photometric images and conducted a series of tests. In Section 4.3, we analysed the impact of galaxy colour and morphology on M* and sSFR. Section 4.4 then presented tests on galaxies of various sizes, redshifts, and types to further validate the robustness and applicability of the model.
4.1 Model evaluation
After determining the optimal hyperparameters for the model, this section evaluates the predictive performance of the GalEffNet model on the test set. The main evaluation metrics are the standard deviation ($\sigma$) and mean absolute error (MAE) of the differences between GalEffNet predictions and the reference values from MPA-JHU, defined as

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{y}_i - y_i\right|, \qquad \sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\Delta_i - \overline{\Delta}\right)^2},$$

where $y_i$ is the MPA-JHU reference value, $\hat{y}_i$ is the corresponding GalEffNet prediction, $\Delta_i = \hat{y}_i - y_i$, and $\overline{\Delta}$ is the mean of the $\Delta_i$ over the $N$ test galaxies.

Consistent with Bisigello et al. (2022), we also estimate the fraction of outliers ($f_{\mathrm{out}}$) and the bias over the entire sample. Outliers are defined as galaxies for which M* or sSFR is overestimated or underestimated by a factor of two (approximately 0.3 dex). The bias is defined as

$$\mathrm{bias} = \langle \Delta \rangle = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right).$$
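The four metrics can be computed directly from the prediction residuals. A small sketch (ours, assuming Δ = prediction − reference and the 0.3 dex outlier threshold quoted above):

```python
import numpy as np

def evaluation_metrics(y_ref, y_pred, outlier_dex=0.3):
    """MAE, sigma, bias, and outlier fraction of Delta = prediction - reference."""
    d = np.asarray(y_pred) - np.asarray(y_ref)
    mae = np.mean(np.abs(d))
    sigma = np.std(d)
    bias = np.mean(d)
    f_out = np.mean(np.abs(d) > outlier_dex)  # factor-of-two (~0.3 dex) outliers
    return mae, sigma, bias, f_out

mae, sigma, bias, f_out = evaluation_metrics(
    [10.0, 10.0, 10.0, 10.0], [10.1, 9.9, 10.4, 9.6])
# d = [0.1, -0.1, 0.4, -0.4] -> MAE 0.25, bias 0.0, f_out 0.5
```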
Fig. 7 illustrates the comparison on the test set. Specifically, the MAE for M* and sSFR is 0.147 and 0.295 dex, with σ being 0.218 and 0.410 dex, respectively. The fraction of outliers for M* is 0.129, with a bias of 0.022. The fraction of outliers for sSFR is larger, at 0.360, with a bias of −0.016. Compared to M*, the estimation of sSFR is more challenging, with a relatively larger degree of dispersion within the sample. Overall, there is good agreement between reference values and model predictions, indicating good precision in reconstructing galaxy properties and suggesting that our model has been trained effectively.

Comparison results between GalEffNet predictions and MPA-JHU reference values for the test samples. The diagonal dashed line in the graph represents equality between predicted and reference values, and point intensity indicates the density of samples. The top-left corner displays the results of four evaluation metrics, while the bottom-right corner shows the distribution of residuals. The vertical dashed line in the residual distribution plot represents a null difference, and dotted lines indicate predicted values equal to twice or half the reference value.
In Fig. 8, we use orange and blue to represent the histograms of reference and predicted values, respectively. Both M* and sSFR exhibit long-tailed distributions. Moreover, the range of predicted values is narrower than that of the reference values. This limitation may arise because GalEffNet struggles to predict extremely high or low galaxy property values accurately, leading to larger errors there. This indicates that the model encounters difficulties with extreme values, and further adjustments or improvements are needed to enhance its performance and robustness.

The distribution of reference and predicted galaxy parameters. The left plot represents M*, and the right plot represents sSFR.
In order to further demonstrate the performance of our model, we replicated multiple models for the task of galaxy property prediction on the DESI data set. These models include several typical deep learning networks such as the ResNet series (He et al. 2016), EfficientNet, and Transformer series models like Vision Transformer (Dosovitskiy et al. 2021) and Swin Transformer (Liu et al. 2021). The ResNet series is renowned for its residual connections, which aid in training deep neural network architectures more effectively. EfficientNet employs a network scaling method based on compound coefficients, balancing scaling across the dimensions of depth, width, and resolution within the network. It comprises the base model, denoted as B0, obtained through neural architecture search, and seven extended models, denoted as B1 to B7, derived from this base model through compound expansion. Each extended model varies in depth, width, and resolution to meet the requirements of different tasks and resource constraints. Here, 'depth' refers to the number of layers in the network, determining the depth of the feature hierarchies the model can learn. 'Width' refers to the number of channels in the feature matrix within the network, influencing the model's ability to capture features. 'Resolution' denotes the dimension of the input data. For example, in this paper, the depth multiplier factor for EfficientNet-B3 is 1.4 times that of the baseline model B0, the width multiplier factor is 1.2 times that of B0, and the resolution of galaxy images input to the network is 64 × 64. Building upon EfficientNet, Tan & Le (2021) further developed the EfficientNetV2 S–XL series models. These models refine the scaling of depth, width, and resolution and introduce advanced data augmentation and regularization techniques to effectively control model size and computational costs while enhancing model performance.
Unlike other networks, Vision Transformer and Swin Transformer utilize self-attention mechanisms. This mechanism helps the model establish effective connections between different regions of input data to better capture global contextual information. We replaced the General Feature Extraction Module of GalEffNet with each of the aforementioned models and performed the same prediction task on the same test data set. The experimental comparative results are shown in Table 1.
Comparison of performance in galaxy parameter prediction after combining GalEffNet with classic deep learning network. Bold text indicates the optimal result for the corresponding evaluation metric.
| General feature extraction module | M* MAE | M* σ | M* fout | ⟨ΔM*⟩ | sSFR MAE | sSFR σ | sSFR fout | ⟨ΔsSFR⟩ |
|---|---|---|---|---|---|---|---|---|
| ResNet-50 (He et al. 2016) | 0.520 | 0.330 | 0.755 | −0.520 | 0.428 | 0.515 | 0.610 | 0.125 |
| ResNet-101 (He et al. 2016) | 0.421 | 0.433 | 0.601 | −0.306 | 0.714 | 0.591 | 0.765 | 0.612 |
| ResNet-151 (He et al. 2016) | 0.758 | 0.359 | 0.925 | −0.753 | 0.520 | 0.609 | 0.689 | 0.203 |
| EfficientNet-B0 (Tan & Le 2019) | 0.167 | 0.240 | 0.163 | 0.009 | 0.366 | 0.469 | 0.433 | −0.159 |
| EfficientNet-B1 (Tan & Le 2019) | 0.183 | 0.247 | 0.197 | −0.037 | 0.321 | 0.435 | 0.410 | −0.013 |
| EfficientNet-B2 (Tan & Le 2019) | 0.179 | 0.241 | 0.158 | −0.071 | 0.323 | 0.451 | 0.395 | **0.010** |
| GalEffNet (EfficientNet-B3; Tan & Le 2019) | **0.147** | **0.218** | **0.129** | 0.022 | **0.295** | **0.410** | **0.360** | −0.016 |
| EfficientNetV2-S (Tan & Le 2021) | 0.166 | 0.239 | 0.154 | **0.003** | 0.316 | 0.433 | 0.404 | 0.076 |
| Vision Transformer (Dosovitskiy et al. 2021) | 3.076 | 0.602 | 1.000 | −3.164 | 1.224 | 1.059 | 0.740 | −0.863 |
| Swin Transformer (Liu et al. 2021) | 0.472 | 0.670 | 0.606 | −0.127 | 0.955 | 1.059 | 0.903 | 0.398 |
The experimental results indicate that the EfficientNet models perform well overall, with the B3 model standing out. This may be because the effective information in the DESI photometric image data set is primarily concentrated within a specific region, with only small bright spots or dark background outside it, and EfficientNet adapts flexibly to the data features and the different categories of galaxy samples through balanced scaling across its three dimensions. Despite the relatively small size of the data set, models B0–B2 show slight underfitting, possibly because their complexity is insufficient to fully learn the data features. Conversely, more complex models such as ResNet-151 may overfit, performing well on the training set but generalizing poorly to the test set. Considering these factors, we opted for EfficientNet-B3 as the General Feature Extraction Module of GalEffNet. The Transformer models appear ill-suited to the current task and data set characteristics, suggesting that CNN-based architectures are more appropriate for property estimation from photometric image data.
4.2 Uncertainty analysis of the model
Model uncertainty analysis is another method to assess the performance of the GalEffNet model. Gal & Ghahramani (2016) demonstrated that neural networks with the Monte Carlo Dropout (MC Dropout) mechanism can be regarded as Bayesian neural networks with Gaussian distribution, which can be used to estimate uncertainty. The GalEffNet model employs Dropout in each hidden layer’s feature vector, providing support for evaluating model uncertainty.
For each test sample’s M* and sSFR, we conducted 10 repeated predictions using the MC Dropout technique, obtaining 10 different predicted values {y1,..., y10}. Based on these 10 predicted values, we calculated their standard deviation, which is referred to as the 1σ uncertainty. In Fig. 9, we present the GalEffNet model uncertainty. The dots in the figure represent the mean value of the 1σ uncertainty within each interval, while the longitudinal line segments indicate the standard deviation of the 1σ uncertainty within each interval. Observing the results, we found that the mean value of the 1σ uncertainty for M* is highest within the interval (8.0, 8.5], reaching 0.108 dex; whereas, within the interval (10.0, 10.5], the mean value is the lowest, at 0.060 dex. For sSFR, the mean value of the 1σ uncertainty peaks in the interval (−8.5, −8.0], reaching 0.161 dex; whereas, within the interval [−12.5, −12.0], it is the lowest, at 0.040 dex. Further analysis reveals that the intervals with higher model uncertainty are mainly concentrated within the [7.5, 9.0] range for M*, as well as the (−11.5, −10.5] and (−9.0, −8.0] intervals for sSFR. Taking into account the data distribution shown in Fig. 5, we observe that these intervals with greater uncertainty correspond precisely to areas where the amount of data is relatively sparse. Due to the lack of sufficient data, the model’s predictions within these intervals are more unstable and variable. Therefore, we recommend exercising caution when using prediction results within these intervals. Overall, the model uncertainty for M* remains roughly the same across each interval range, while the model uncertainty for sSFR fluctuates more significantly. This phenomenon may be due to the fact that different estimation methods are used for sSFR among different stellar populations in the MPA-JHU catalogue, leading to differences in the results. 
Additionally, by calculating the standard deviation of the 1σ uncertainty for all samples, we obtained values of 0.028 and 0.043 dex for M* and sSFR, respectively. This indicates that the GalEffNet model exhibits a relatively low degree of dispersion in terms of its 1σ uncertainty, suggesting that the model’s predictions are generally robust overall.
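The MC Dropout procedure described above amounts to repeating the forward pass with dropout active and taking the standard deviation of the repeated predictions. A minimal sketch (the `noisy_predict` stand-in is our placeholder for a network forward pass with dropout, not the real model):

```python
import numpy as np

rng = np.random.default_rng(42)

def noisy_predict(x):
    """Stand-in for a forward pass with dropout active: each call differs."""
    return x + rng.normal(0.0, 0.1, size=np.shape(x))

def mc_dropout(predict_fn, x, n_samples=10):
    """Repeat the stochastic prediction and summarize it."""
    preds = np.array([predict_fn(x) for _ in range(n_samples)])
    # mean prediction and its 1-sigma uncertainty, per output
    return preds.mean(axis=0), preds.std(axis=0)

mean, sigma1 = mc_dropout(noisy_predict, np.array([10.2, -9.8]), n_samples=10)
```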

GalEffNet model uncertainty. GalEffNet utilizes the MC Dropout technique to conduct 10 rounds of predictions for each sample, and calculates the standard deviation of these 10 predicted values. This standard deviation is referred to as the 1σ uncertainty. The dots represent the mean value of the 1σ uncertainty within each interval. The longitudinal line segments indicate the standard deviation of the 1σ uncertainty within each interval.
4.3 Impact of colour and morphological information on M* and sSFR
This section aims to test the importance of colour information on galaxy parameters by reducing the number of bands to a single band and fixing the image resolution. Additionally, by fixing the colour information (e.g. using three DESI grz bands) and lowering the image resolution, we can investigate the significance of morphological features.
4.3.1 Effects of colour information (bands)
DESI images consist of three bands: g, r, and z, each carrying different amounts of information. Therefore, the number of bands used in predicting M* and sSFR may influence the accuracy of the results. During training and prediction, we utilized RGB images composed of the combination of the g, r, and z bands. This implies that the parameter measurements of our model might be influenced by colour. In the process of model training, the omission of information from certain bands could result in reduced information received by the model, thus affecting its performance.
To delve into the impact of various bands (colour information) on M* and sSFR, we conducted the following experiments. Specifically, similar to the approach by Wu & Boada (2019), we explored various combinations of bands, including images with one band (g, r, z) and two bands (gr, gz, rz), as illustrated in Fig. 10. These images with various band combinations were input into our model to investigate their potential influence on the results. For the three single-band images of g, r, and z, we replicated them into the red, blue, and green channels, creating a colour image with three channels. As for images with two bands, such as gr, we duplicated them into the red and blue channels, while utilizing the average values of these two bands in the green channel.
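The channel replication scheme described above can be sketched as follows (our illustration; `compose_rgb` is a hypothetical helper name, not from the paper):

```python
import numpy as np

def compose_rgb(bands):
    """Build a 3-channel image from 1, 2, or 3 single-band images.

    bands: list of 2-D arrays, e.g. [g], [g, r], or [g, r, z].
    A single band is replicated into all three channels; for two bands,
    the first and second fill two channels and their average fills the third.
    """
    if len(bands) == 1:
        b = bands[0]
        return np.stack([b, b, b], axis=-1)
    if len(bands) == 2:
        b1, b2 = bands
        return np.stack([b1, (b1 + b2) / 2.0, b2], axis=-1)
    return np.stack(bands, axis=-1)

g = np.ones((4, 4))
r = 3.0 * np.ones((4, 4))
img = compose_rgb([g, r])
# middle channel holds the g/r average, i.e. 2.0 everywhere
```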

Randomly selected sample galaxies and their images in different bands. In each row, we display an RGB image composed of the three bands grz on the left, and on the right are the gr, gz, rz, g, r, and z bands.
Fig. 11 illustrates the variation in MAE and σ of the model's predictions for M* and sSFR across different wavelength bands. Compared to grz imaging, the network trained and tested on single-band images performed relatively poorly; among the three single bands, the g band performed best. Adding a second band significantly reduced the error, bringing it close to the level of the original three-colour image, and the gr and gz combinations outperformed rz. This improvement may be attributed to the additional information the g band provides about young stellar populations. Training the model on three-band photometry raised the estimates of M* and sSFR to a higher level of accuracy. Reducing the number of bands while keeping the image resolution unchanged thus isolates the impact of colour information at the same level of detail: colour reflects the spectral features of galaxies and hence carries information about their properties. The rational selection of bands and data sources is therefore crucial in galaxy studies to ensure that the model achieves optimal performance.

Using σ and MAE to measure the impact of various wavelength bands on model performance.
4.3.2 Effects of morphological features (resolution)
Similar to testing the impact of colour information using various bands, here we examine the influence of the spatial resolution of galaxy images on the two galaxy properties. Reducing the image resolution while keeping the colour information constant allows us to isolate the importance of morphological features: lowering the resolution removes fine detail from the image while leaving the colour information unchanged, so the contribution of morphology to recognizing objects and structure can be observed. We therefore use bilinear interpolation to decrease the spatial resolution of the reference images to various pixel levels and retrain the network.
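Bilinear interpolation itself is straightforward; the following self-contained sketch (ours, not the paper's code, which presumably uses a library routine) shows the interpolation used to lower the resolution of a single-band image:

```python
import numpy as np

def bilinear_resize(img, out_h, out_w):
    """Resize a 2-D image with bilinear interpolation."""
    h, w = img.shape
    ys = np.linspace(0, h - 1, out_h)   # sample positions in source rows
    xs = np.linspace(0, w - 1, out_w)   # sample positions in source cols
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]             # fractional offsets
    wx = (xs - x0)[None, :]
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bot = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy

ramp = np.array([[0.0, 1.0], [2.0, 3.0]])
out = bilinear_resize(ramp, 3, 3)
# corners are preserved and the centre is the average of all four pixels (1.5)
```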
According to Fig. 12, we found that the MAE and σ of the model's predictions for M* and sSFR gradually decrease as the image resolution increases. We attribute this to the reduction in image resolution destroying detailed information that is crucial for predicting galaxy parameters, thereby degrading the accuracy of the predictions. Specifically, high-resolution images can reveal finer internal structures within galaxies, such as bulge-like and bar-like structures, and these details are indispensable for accurately predicting the physical properties of galaxies. This indicates that leveraging the morphological information provided by images significantly enhances the performance of GalEffNet.

Using σ and MAE to measure the impact of various resolutions on model performance.
Overall, by adjusting the number of bands and image resolution, we have delved into their roles in the analysis of galaxy images and their importance in understanding galaxy characteristics. The experiments demonstrate that if colour information or morphology is helpful for predicting galaxy parameters, the errors are correspondingly reduced. Therefore, we conclude that colour information and galaxy morphology contain information about galaxy properties, including M* and sSFR. This experimental section enhances our understanding and utilization of information within galaxy images.
4.4 Analysis of model applicability
In this section, we conduct an applicability analysis of the GalEffNet model, exploring its performance across galaxy size, redshift, and galaxy types. This series of studies provides a comprehensive understanding of the model’s performance and limitations, offering crucial insights for further optimizing the model and improving prediction accuracy. Moreover, it serves as a valuable reference for practical applications.
4.4.1 Applicability across various galaxy sizes
In our study, we chose parameters from the SDSS value-added catalogue as reference values. The fibres of the SDSS spectrograph are only 3 arcsec in diameter, covering only the central regions of most galaxies. To ensure that the selected galaxies fall within an appropriate size range, we applied the filtering in Section 2.2.3, retaining galaxies with radii within 6 arcsec. Even after this pre-processing, there remains some disparity between the 3-arcsec fibre diameter and the 6-arcsec radius limit, which raises concerns about how well the input galaxy images match the labels used. To address this issue, this section analyses the applicability of the model to galaxies of various sizes, to verify that it provides accurate predictions across the full size range.
In this paper, galaxy size (i.e. galaxy radius) is defined as 1.5 times the major axis of the ellipse fitted to the target galaxy with the OpenCV library in Section 2.2.1. This definition is in pixels; given the DESI pixel scale of 0.262 arcsec, it can be converted into a radius in arcsec. As shown in Fig. 13, we divided the test data set into five intervals of galaxy size (in arcsec) and evaluated the model's performance in predicting M* and sSFR within each interval. The model's prediction performance is relatively stable across the intervals, with accuracy improving slightly as galaxy size increases. This implies that, although the reference parameters reflect only local information about a galaxy, they are equally applicable to the characteristics of the galaxy as a whole. It confirms that our model can learn from the full two-dimensional images of galaxies spanning a broad range of sizes, ensuring the data are effectively applicable to our parameters and analysis methods. We therefore conclude that the input galaxy images are well matched with the labels used in our study.
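The pixel-to-arcsec conversion is a one-line calculation; a sketch under the definition stated above (radius = 1.5 × fitted major axis in pixels, 0.262 arcsec per pixel; the helper name is ours):

```python
DESI_PIXEL_SCALE = 0.262  # arcsec per pixel

def galaxy_radius_arcsec(major_axis_px):
    """Galaxy radius per the paper's definition: 1.5 times the major axis
    of the fitted ellipse, converted from pixels to arcsec."""
    return 1.5 * major_axis_px * DESI_PIXEL_SCALE

# A fitted major axis of 15 pixels corresponds to ~5.9 arcsec,
# just inside the 6-arcsec selection limit.
print(galaxy_radius_arcsec(15))
```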

The predictive performance of GalEffNet on M* and sSFR across different ranges of galaxy sizes. The symbols ▲ and ■ represent the results for MAE and σ.
4.4.2 Applicability across various redshifts
In this subsection, we evaluate the applicability of the model at various redshifts. Specifically, we evaluate the performance of the GalEffNet model by comparing the differences between the predicted values and the reference values of M* and sSFR at various redshift values.
During the data pre-processing stage, we restricted galaxies to radii within 6 arcsec. Although this restriction typically excludes some very low-redshift galaxies, the experimental results indicate that the remaining data still cover a relatively low-redshift range, concentrated in (0, 0.35]. This limit on redshift is likely due to the flux limitations of the SDSS or DESI surveys. From Fig. 14, the σ of sSFR remains between 0.38 and 0.45 dex and the MAE ranges from 0.28 to 0.33 dex across redshifts. The model's performance in predicting sSFR is thus relatively stable, with small error fluctuations, suggesting that redshift variation has a minor impact on the inferred star formation activity within galaxies. In contrast, within the redshift range (0.05, 0.35], the predictions of M* are relatively accurate, with both σ and MAE staying below 0.23 dex, whereas within (0, 0.05] the prediction error for M* is relatively large. This could be attributed to the limited amount of data in the (0, 0.05] range, where the model lacks sufficient samples to learn to predict M* accurately. Overall, within the low-redshift range (0, 0.35], even with a slight increase in redshift, the consistency between the GalEffNet predictions and the reference values is not significantly affected, indicating that the model remains reliable across the tested redshift range.

The predictive performance of GalEffNet on M* and sSFR across the tested redshift range. The symbols ▲ and ■ represent the results for MAE and σ.
4.4.3 Analysis across galaxy types
In this subsection, we conducted applicability experiments to assess the model's performance on various types of galaxies. This also allowed us to further analyse and understand the relationships between galaxy morphological types and their physical properties. We conducted tests based on the galaxy classifications provided on the DESI DR9 official website. As shown in Fig. 15, our data set covers four galaxy morphological types: round exponential galaxies with a variable radius ('REX'), elliptical galaxies ('DEV'), spiral galaxies ('EXP'), and Sersic profiles ('SER'). The main differences among these types lie in their surface brightness profile models, which describe how the luminosity of a galaxy changes with increasing distance from its centre. The REX model assumes that the surface brightness decreases exponentially with radius, treats galaxies as perfect circles, and disregards ellipticity and other complexities of shape. The SER model is more general: its key characteristic, the Sersic index n, alters the shape of the brightness profile. When n = 1, the model reduces to a simple exponential profile, suitable for spiral galaxies; when n = 4, it approximates the de Vaucouleurs profile, commonly used to describe galaxies with more centrally concentrated brightness, such as ellipticals. In other words, DEV (bulge-dominated) and EXP (disc-dominated) are special cases of the Sersic profile, and galaxies with Sersic indices between 1 and 4 exhibit both bulge and disc components, labelled SER galaxies here. Next, we analyse the model's predictive performance in relation to the different galaxy morphologies and characteristics.

Colour (grz) images of four exemplary galaxy morphological types in DESI DR9.
(i) ‘DEV’ Galaxies: Elliptical galaxies typically exhibit pronounced elliptical shapes with significant axial symmetry. Their colours tend to be relatively uniform and red, which may be attributed to the high-velocity dispersion and relative scarcity of cold gas in elliptical galaxies, leading to relatively low levels of star formation activity. These galaxies have gone through relatively early evolutionary stages, with the internal stellar populations being dominated by older stars, and new star formation is relatively rare. These characteristics indicate that elliptical galaxies have relatively high M* and low sSFR (Fig. 16).

Figure 16. A comparative density plot of the distribution of M* relative to sSFR for the four galaxy types. The colour represents the density of the samples.
(ii) ‘EXP’ Galaxies: Spiral galaxies exhibit prominent spiral structures and are often accompanied by nebulae and star-forming regions. In colour, they generally display a bluish tint, especially in the spiral arms and the central region. These galaxies are rich in cold gas and often contain active interstellar material and continuous star formation activity, so they are likely to keep generating new stars. Spiral galaxies therefore have relatively high sSFR.
(iii) ‘SER’ Galaxies: SER physically refers to composite galaxies, with DEV and EXP as its special cases. Some SER galaxies share characteristics with elliptical galaxies, and their parameter distributions are comparable: relatively high M* and relatively low sSFR. Other SER galaxies have parameter distributions similar to spiral galaxies, exhibiting relatively high sSFR.
(iv) ‘REX’ Galaxies: Round exponential galaxies typically exhibit a uniform surface brightness distribution and a nearly circular morphology. Their colours are relatively uniform, possibly with a strong red component. This may indicate that the stars within these galaxies are mainly old, yet their sSFR is not very low, suggesting that they are still undergoing star formation activity.
According to the results shown in Table 2, our model performs best in predicting M* for Sersic-profile and elliptical galaxies, while it excels in predicting sSFR for round exponential and spiral galaxies. To explain this, we compared the density distributions of M* relative to sSFR for the various galaxy types (Fig. 16). The four types exhibit distinct characteristics that directly influence their M* and sSFR. For instance, young galaxies undergoing intense star formation may show marked differences in morphology and colour, while older galaxies exhibit more stable and mature morphologies. Consequently, galaxies with high sSFR, indicating active star formation, display richer features, enabling the model to learn them better and predict them more accurately. Conversely, galaxies with low sSFR possess fewer internal features, so the model's predictions are less effective. A similar trend is observed for M*. Hence, the model's performance across galaxy types depends on how well its feature extraction aligns with the morphological characteristics of each type.
Table 2. Performance comparison of GalEffNet in predicting parameters for four types of galaxies. Bold text indicates the optimal result for the corresponding evaluation metric.

| Galaxy type | MAE (M*) | σ (M*) | fout (M*) | ⟨ΔM*⟩ | MAE (sSFR) | σ (sSFR) | fout (sSFR) | ⟨ΔsSFR⟩ |
|---|---|---|---|---|---|---|---|---|
| SER | 0.144 | 0.213 | 0.122 | 0.022 | 0.298 | 0.415 | 0.362 | −0.011 |
| DEV | **0.113** | **0.161** | **0.075** | **0.019** | 0.308 | 0.421 | 0.389 | **−0.008** |
| REX | 0.210 | 0.290 | 0.235 | 0.028 | 0.262 | 0.349 | 0.324 | −0.066 |
| EXP | 0.213 | 0.278 | 0.234 | 0.061 | **0.247** | **0.347** | **0.281** | −0.066 |
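The evaluation metrics in Table 2 can be reproduced from the prediction residuals Δ = predicted − true. A minimal numpy sketch, assuming σ is the standard deviation of the residuals, ⟨Δ⟩ their mean, and fout the fraction of residuals whose absolute value exceeds a chosen threshold (the 0.2 dex cut used here is illustrative, not the paper's definition):

```python
import numpy as np

def regression_metrics(y_true, y_pred, out_threshold=0.2):
    """MAE, sigma, outlier fraction, and mean bias of the residuals.

    out_threshold (dex) is an illustrative choice for the outlier
    cut; the exact definition of fout in the paper may differ.
    """
    residuals = np.asarray(y_pred) - np.asarray(y_true)
    mae = np.mean(np.abs(residuals))
    sigma = np.std(residuals)
    f_out = np.mean(np.abs(residuals) > out_threshold)
    bias = np.mean(residuals)
    return mae, sigma, f_out, bias

# toy example: predictions scattered around the truth with 0.2 dex noise
rng = np.random.default_rng(0)
truth = rng.uniform(8.0, 11.5, size=1000)      # e.g. log10(M*/Msun)
preds = truth + rng.normal(0.0, 0.2, size=1000)
mae, sigma, f_out, bias = regression_metrics(truth, preds)
```

For Gaussian residuals the MAE is about 0.8σ, which matches the rough MAE/σ ratios seen in Table 2.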
5 SUMMARY AND OUTLOOK
This paper proposes a comprehensive data pre-processing method that addresses three key issues in DESI galaxy images: masking noisy astronomical sources, identifying blended galaxies, and managing source size constraints. The aim is to minimize the impact of noise, providing cleaner and more effective data for subsequent model training. We then construct a deep convolutional neural network, GalEffNet, to estimate the M* and sSFR of galaxies in the DESI DR9 data set. Emphasizing the importance of feature extraction, the model is trained and tested using DESI photometric images cross-matched with homogeneous galaxies from the high-precision SDSS MPA-JHU DR8 catalogue. To assess the performance of GalEffNet, a series of comparative experiments is conducted in which the general feature extraction module of GalEffNet is replaced with other classical neural network models, including ResNet, the EfficientNet series, Vision Transformer, and Swin Transformer. These experiments reveal that EfficientNet exhibits superior overall performance, with metrics consistently outperforming the other models. Specifically, EfficientNet-B3, serving as the backbone network for GalEffNet, achieves outstanding results: for M*, the model's MAE and σ are 0.147 and 0.218 dex, respectively; for sSFR, they are 0.295 and 0.410 dex. To validate the model's reliability, we conducted an uncertainty analysis, which showed that the uncertainty of the model is relatively low; its predictions are therefore deemed robust.
Subsequently, we investigated the importance of colour and morphological features for M* and sSFR by reducing the number of bands to a single-band image at fixed resolution, and by decreasing the image resolution at fixed colour information. We observed that the error decreases as morphological or colour complexity increases. These two experiments thus contribute significantly to our understanding of the relative importance of colour and morphological information in predicting galaxy properties. Furthermore, we conducted an applicability analysis across various sizes, redshifts, and morphological types of galaxies. The results demonstrate that our model provides accurate predictions across the various size and redshift ranges, although performance varies across morphological types. Hence, the model may require more flexible feature extraction capabilities when dealing with galaxies of different morphological types, to better capture their intrinsic properties.
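The resolution-reduction experiment described above can be emulated by block-averaging an image while keeping all bands, a simple stand-in for the paper's actual degradation procedure (which is not specified here); the 152 × 152 cutout size is a hypothetical example, not the data set's real dimensions:

```python
import numpy as np

def downsample(image, factor):
    """Block-average an (H, W, C) image by an integer factor,
    reducing spatial resolution while preserving colour (all bands)."""
    h, w, c = image.shape
    assert h % factor == 0 and w % factor == 0, "factor must divide H and W"
    return image.reshape(h // factor, factor,
                         w // factor, factor, c).mean(axis=(1, 3))

# hypothetical 152x152 grz cutout degraded to half and quarter resolution
img = np.random.rand(152, 152, 3).astype(np.float32)
half = downsample(img, 2)     # shape (76, 76, 3)
quarter = downsample(img, 4)  # shape (38, 38, 3)
```

The complementary band-reduction experiment would instead keep the full resolution and slice out a single channel, e.g. `img[:, :, 1:2]`.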
In the future, we will expand the scope of our research and explore survey data across different wavelengths, making our work a more valuable reference for astronomers and data-analysis researchers. Moreover, for many scientific data sets, especially surveys, the number of available labels is often much smaller than the number of images, and the labels themselves typically carry non-negligible noise or bias; leveraging powerful self-supervised or unsupervised learning approaches may therefore become possible. In subsequent research, we will investigate small sky regions released by DESI and use low-redshift samples from SDSS for training to estimate properties for galaxies that are either unreleased or unobserved by DESI.
ACKNOWLEDGEMENTS
This work was supported by the National Natural Science Foundation of China (Grant Nos. 12373108, 11973022, 11903008, and 12273075), the Natural Science Foundation of Guangdong Province (Grant No. 2020A1515010710), and China Manned Space Program through its Space Application System.
DATA AVAILABILITY
The DESI galaxy images used in this article can be downloaded from the DESI DR9 website at https://data.desi.lbl.gov/public/ets/target/catalogs/dr9/1.1.1/targets/main/resolve/. Reference labels are obtained from the SDSS MPA-JHU DR8 catalogue, available for download at https://skyserver.sdss.org/CasJobs/SubmitJob.aspx. The implementation code for the proposed neural network is developed in TensorFlow. The complete project, along with its documentation, is publicly available at https://github.com/yuu250/GalEffNet. The project includes our entire codebase, trained models, experimental data, and usage instructions, serving both astronomical exploration and research on data processing algorithms.
Footnotes
https://astronomy.swin.edu.au/cosmos/S/Surface+Brightness+Profiles
APPENDIX A: MASKING NOISY ASTRONOMICAL SOURCES ALGORITHM
A1 Rationality of the algorithm
In Section 2.2.1, we employed adaptive threshold segmentation and contour fitting to mask noisy objects. The algorithm ensures that only the target galaxy at the centre of each image is retained, reducing the model's learning errors. However, since galaxy evolution is deeply influenced by the environment, it is necessary to examine whether masking noisy objects around the target galaxies is reasonable, i.e. whether this operation has a positive effect on the model estimates. To this end, we conducted a comparative analysis using both the original data set (without masking noisy objects around the target galaxy) and the pre-processed data set (with such masking) to validate the estimation results.
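The masking step can be sketched as: threshold the image, group bright pixels into connected components, keep the component covering the image centre, and replace the rest with a background estimate. The numpy sketch below uses a simple global mean + k·std threshold and 4-connected flood fill as stand-ins for the adaptive thresholding and contour fitting actually used in Section 2.2.1; the function name and background estimate are ours:

```python
import numpy as np
from collections import deque

def mask_noisy_sources(image, k=3.0):
    """Keep only the source covering the image centre.

    Stand-in for the paper's adaptive-threshold + contour-fitting
    pipeline: pixels above mean + k*std are source pixels, grouped
    into 4-connected components; every component except the central
    one is replaced by a crude background level (the image mean).
    """
    img = np.asarray(image, dtype=float)
    bright = img > img.mean() + k * img.std()
    labels = np.zeros(img.shape, dtype=int)
    current = 0
    for start in zip(*np.nonzero(bright)):
        if labels[start]:
            continue
        current += 1
        labels[start] = current
        queue = deque([start])
        while queue:  # BFS flood fill over 4-connected bright pixels
            y, x = queue.popleft()
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if (0 <= ny < img.shape[0] and 0 <= nx < img.shape[1]
                        and bright[ny, nx] and not labels[ny, nx]):
                    labels[ny, nx] = current
                    queue.append((ny, nx))
    centre_label = labels[img.shape[0] // 2, img.shape[1] // 2]
    if centre_label == 0:
        return None  # no source at the centre: the image is filtered out
    masked = img.copy()
    masked[(labels != 0) & (labels != centre_label)] = img.mean()
    return masked

# synthetic demo: central target plus a contaminating corner source
demo = np.zeros((21, 21))
demo[9:12, 9:12] = 10.0   # target galaxy at the centre
demo[0:3, 0:3] = 10.0     # neighbouring noisy source
cleaned = mask_noisy_sources(demo)
```

Returning `None` when no central source is found mirrors the filtering behaviour described in Appendix A2: images without an identifiable central galaxy are discarded.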
As shown in Table A1, we evaluated the performance of the GalEffNet model in predicting M* and sSFR on the original and pre-processed data sets. The results indicate that performance on the pre-processed data set improves over the original data set. Specifically, for the M* estimation, the σ of the pre-processed data set decreased by 6.45 per cent and the MAE by 16.48 per cent compared to the original data set. Similarly, for sSFR, σ decreased by 5.31 per cent and the MAE by 6.93 per cent. Both the fraction of outliers and the bias in M* and sSFR also decreased. Therefore, masking around the target galaxies is reasonable and feasible, helping to reduce interference and making our estimates of M* and sSFR more stable and less error-prone.
Table A1. Comparison of the performance of GalEffNet in predicting M* and sSFR on the original data set and the pre-processed data set. Bold text indicates the optimal result for the corresponding evaluation metric.

| Data set type | MAE (M*) | σ (M*) | fout (M*) | ⟨ΔM*⟩ | MAE (sSFR) | σ (sSFR) | fout (sSFR) | ⟨ΔsSFR⟩ |
|---|---|---|---|---|---|---|---|---|
| Original data set | 0.176 | 0.233 | 0.159 | −0.038 | 0.317 | 0.433 | 0.386 | −0.044 |
| Pre-processed data set | **0.147** | **0.218** | **0.129** | **0.022** | **0.295** | **0.410** | **0.360** | **−0.016** |
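The percentage improvements quoted in Appendix A1 follow directly from the table values; differences of a few hundredths of a per cent from the quoted figures arise from rounding in the table:

```python
def pct_decrease(before, after):
    """Relative decrease of a metric, in per cent."""
    return 100.0 * (before - after) / before

# values from Table A1 (original vs pre-processed data set)
sigma_mstar = pct_decrease(0.233, 0.218)  # ~6.44 per cent
mae_mstar = pct_decrease(0.176, 0.147)    # ~16.48 per cent
sigma_ssfr = pct_decrease(0.433, 0.410)   # ~5.31 per cent
mae_ssfr = pct_decrease(0.317, 0.295)     # ~6.94 per cent
```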
A2 Limitations of the algorithm
In Appendix A1, we demonstrated that the noisy-source masking algorithm enhances the performance of our model predictions. While the method performs well in most cases, we also note its limitations in certain scenarios. In this appendix, we therefore analyse two types of special celestial objects: disturbed galaxies and low-surface-brightness galaxies (LSBGs). Through these two case studies, we aim to delve into the limitations of the current method and provide clear directions and a basis for future improvements.
The morphology of a disturbed galaxy changes due to interactions with another galaxy. Fig. A1 shows that, when processing disturbed galaxies, the algorithm effectively masks other noisy objects around the target galaxy but fails to preserve the tidal tail of the galaxy itself. This limitation leads to the loss of tail information, affecting the integrity of the galaxy image and the accuracy of subsequent parameter analysis. LSBGs are only slightly brighter than the background sky, often by only a few per cent, making them difficult to identify in observations. When applying the noisy-source masking algorithm to LSBGs (Fig. A2), we found that it can successfully identify some LSBGs with more prominent features and mask the surrounding noisy sources.

Figure A1. The masking noisy astronomical sources algorithm applied to disturbed galaxy images randomly selected from DESI. The algorithm fails to include the tails of the disturbed galaxies.

Figure A2. The masking noisy astronomical sources algorithm applied to low-surface-brightness galaxy images randomly selected from DESI.
However, the algorithm may encounter difficulties with some blended galaxies, disturbed galaxies, and LSBGs. Fig. A3 shows cases where the masking fails. It is worth emphasizing that our algorithm proceeds to mask surrounding noisy sources only if the target galaxy is correctly identified at the centre of the image and its contour is accurately fitted there; otherwise, the image is filtered out. As shown in Fig. A3, the algorithm fails to fit the correct contour when dealing with blended galaxies surrounded by a large number of haloes (first row of subfigures) or with irregular disturbed galaxies (second row). Additionally, due to the low brightness of LSBGs (third row), such galaxies may not be identified at all. In these cases, our algorithm automatically filters out the images. Owing to these limitations of the masking algorithm, there may be unquantified detection and selection biases in our data set. In future work, we will focus on improving the algorithm's handling of these special celestial objects.

Figure A3. Cases where the masking noisy astronomical sources algorithm fails. From top to bottom: a blended galaxy, a disturbed galaxy, and a low-surface-brightness galaxy. If the algorithm fails to recognize the target galaxy or to accurately fit its contour, it does not proceed with the masking step and discards the image.
Author notes
These authors contributed equally to this work.