Galaxy image classification using hierarchical data learning with weighted sampling and label smoothing

Table 1.

Clean sample selection rule (Willett et al. 2013). A sample is selected as a clean sample only when the response frequency of every corresponding task exceeds the given threshold. For the smooth galaxies (CRS, IBS, and CSS), the selection thresholds are relaxed to 0.5 (cf. Zhu et al. 2019).

Classname	Abbreviation	Task	Selection	|$N_{samples}$|
Completely round smooth	CRS	T01	f_smooth > 0.469	8434
		T07	f_completely round > 0.50
In between smooth	IBS	T01	f_smooth > 0.469	8069
		T07	f_in between > 0.50
Cigar-shaped smooth	CSS	T01	f_smooth > 0.469	578
		T07	f_cigar-shaped > 0.50
Edge-on	EO	T01	f_features/disk > 0.430	3903
		T02	f_edge-on,yes > 0.602
Spiral	SPI	T01	f_features/disk > 0.430	7806
		T02	f_edge-on,no > 0.715
		T04	f_spiral,yes > 0.619
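
The selection rule of Table 1 can be sketched as a simple threshold check. This is a minimal illustration rather than the procedure actually used: the dictionary keys are hypothetical names for the vote fractions, and rejecting a sample that satisfies more than one rule is an assumption that the text does not state.

```python
# Thresholds transcribed from Table 1. The keys are hypothetical names for
# the vote fractions, not the column names of any particular file.
CLEAN_RULES = {
    "CRS": {"smooth": 0.469, "completely_round": 0.50},
    "IBS": {"smooth": 0.469, "in_between": 0.50},
    "CSS": {"smooth": 0.469, "cigar_shaped": 0.50},
    "EO": {"features_or_disk": 0.430, "edge_on_yes": 0.602},
    "SPI": {"features_or_disk": 0.430, "edge_on_no": 0.715, "spiral_yes": 0.619},
}


def clean_class(fractions):
    """Return the class whose thresholds are all exceeded, or None."""
    matches = [
        name
        for name, rule in CLEAN_RULES.items()
        if all(fractions.get(key, 0.0) > threshold for key, threshold in rule.items())
    ]
    # Assumption: a sample that satisfies several rules is not treated as clean.
    return matches[0] if len(matches) == 1 else None


example = {"smooth": 0.80, "completely_round": 0.10,
           "in_between": 0.15, "cigar_shaped": 0.75}
print(clean_class(example))  # CSS
```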

After the clean sample selection process, 28 790 samples are obtained: 8434 CRS, 8069 IBS, 578 CSS, 3903 EO, and 7806 SPI samples. The training, validation, and test sets are established by randomly dividing the samples of each galaxy class in a ratio of 8:1:1, and contain 23 031, 2878, and 2881 samples, respectively (Table 2). The CRS, IBS, and SPI classes are large and roughly balanced; in comparison, there are fewer EO samples, and the CSS class is the smallest, with less than 1/10 of the samples of any one of CRS, IBS, and SPI. There is therefore an imbalance between CSS and the other four classes, which makes the training of deep learning models more difficult; the solution is detailed in Section 3.2.

Table 2.

The division of the training set, validation set, and test set. The clean sample data set is selected according to Table 1 and divided into a training set, a validation set, and a test set in the ratio of 8:1:1. Each entry is the number of samples of the given class in the corresponding data set.

	CRS	IBS	CSS	EO	SPI	Total
Clean data set	8434	8069	578	3903	7806	28 790
Training set	6747	6455	462	3122	6245	23 031
Validation set	843	807	58	390	780	2878
Test set	844	807	58	391	781	2881
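
The per-class sizes in Table 2 can be checked with a few lines of arithmetic. The rounding rule below is an assumption, since the text only gives the 8:1:1 ratio; it is used here because it reproduces the counts of Table 2. The random shuffling of the samples within each class is omitted.

```python
CLEAN_COUNTS = {"CRS": 8434, "IBS": 8069, "CSS": 578, "EO": 3903, "SPI": 7806}


def split_sizes(n):
    """Return (train, validation, test) sizes for one class of n samples."""
    n_train = (4 * n + 2) // 5     # 0.8 * n rounded to the nearest integer
    n_val = (n - n_train) // 2     # half of the remainder, rounded down
    n_test = n - n_train - n_val   # whatever is left
    return n_train, n_val, n_test


sizes = {name: split_sizes(n) for name, n in CLEAN_COUNTS.items()}
totals = tuple(sum(column) for column in zip(*sizes.values()))
print(sizes["CSS"])  # (462, 58, 58)
print(totals)        # (23031, 2878, 2881)
```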

3 HIERARCHICAL IMBALANCED DATA LEARNING WITH WEIGHTED SAMPLING AND LABEL SMOOTHING

Deep learning-based galaxy image classification is often hampered by three typical data characteristics: some classes are more similar to their neighbouring classes than others; the amount of data is imbalanced between classes; and the discrete representation of galaxy classes is at odds with the essentially gradual change of morphology. In this section, we therefore propose a novel learning method, HIWL, built from three parts: hierarchical learning with an efficient model, weighted sampling, and label smoothing. For each part, we analyse the corresponding problem and, using the Galaxy Zoo-The Galaxy Challenge data set, give solutions from the general to the specific. In Section 3.1, we combine the deep network model EfficientNet with hierarchical learning to learn the features of galaxy images in which some classes are more similar to their neighbouring classes than others. In Section 3.2, we use weighted sampling to reduce the negative impact of data imbalance. In Section 3.3, we use label smoothing to alleviate the learning problems caused by the discrepancy between the discrete class representation and the gradual change of morphology.

3.1 Hierarchical learning using efficient models

In a multiclass galaxy image recognition task, most classes share few similarities and are easily distinguished from each other; only a few classes are highly similar and difficult to tell apart. We refer to this phenomenon as similarity imbalance. The feature extraction capability of the model is particularly important here: if the backbone cannot learn high-quality features, the recognition performance of the model will not be satisfactory. Moreover, the samples that share few similarities with other classes appear much more frequently and attract a disproportionate amount of attention from the learning model, so the model is left with insufficient ability to identify the highly similar classes. Feature extraction capability and similarity imbalance therefore both challenge the classification performance of the model. To deal with these challenges, we choose the EfficientNet series of models as the backbone (Section 3.1.1) and combine it with a hierarchical learning strategy (Section 3.1.2) to train the model.

3.1.1 Deep learning model EfficientNet

In this part, we introduce the overall architecture and ideas of the deep network model EfficientNet, explain why we selected this series of models, and describe the specific structure of the EfficientNet-B1 model we use.

EfficientNet (Tan & Le 2019) is a deep learning model proposed by Google Brain in 2019. Before EfficientNet, most work on improving the performance of neural networks focused on a single factor: network width, depth, or input resolution. However, the three factors are interdependent, and model performance quickly saturates when only one of them is scaled. The idea of EfficientNet is to use compound scaling to balance the three factors at the same time, aiming to obtain the optimal model at a given complexity. The series includes a base model B0 obtained by Neural Architecture Search (NAS), and extended models B1∼B7 obtained from it by compound scaling. NAS (Zoph & Le 2017) is a subfield of AutoML that aims to find more efficient, well-structured models by designing search spaces, search strategies, and performance evaluation strategies. EfficientNet-B0, obtained with NAS, serves as a baseline network that is simple, clean, and easy to extend and generalize. EfficientNet-B1 to EfficientNet-B7 are obtained by compound scaling of the baseline model B0 with different multipliers for the three factors, which corresponds to increasing input resolution (224∼600) and increasing network complexity.

Models of different scales suit learning problems with different data set sizes. For the photometric image data set Galaxy Zoo-The Galaxy Challenge, we choose B1 from this series. In this data set, the valid information in most sample images is concentrated within the central 240 × 240 pixels; outside this region there are only small bright spots (other galaxies) or black background. The input resolution of EfficientNet-B1 is also 240 × 240 pixels. In addition, although the data set is not large, the baseline model B0 slightly underfits because of insufficient network complexity, whereas B2 and more complex models show some overfitting. Therefore, this work selects B1.

The structure of the EfficientNet-B1 model is shown in Fig. 1. The model consists of three modules: the Stem module, the MBConv module, and the Final Layers module. Following the inference direction, the input first passes through the Stem module, which extracts the initial features and consists of three components: a convolution (kernel size 3 × 3), batch normalization, and the swish activation function. It then passes through seven stages, each a stack of a different number of similar MBConv modules that iteratively extract deeper feature information. The number of stacked MBConv modules differs across EfficientNet-B0∼B7; B1 is composed of 23 such modules, whose specific parameters differ from module to module. The MBConv module uses depthwise convolution, which requires less computation than conventional convolution, together with the channel attention mechanism Squeeze-and-Excitation (SE). In depthwise convolution, each convolution kernel is applied to the feature map of only one channel, which greatly reduces the computational effort. However, depthwise convolution does not capture the connections between channels, so the SE structure is used to learn the correlation between channels and obtain channel-specific attention. The combination of the two makes the MBConv module lighter and better at integrating channel information, and it is the cornerstone of EfficientNet’s speed and feature extraction capability. Finally, the Final Layers module consolidates the information and performs the classification. It consists of a convolution (kernel size 1 × 1), batch normalization, the swish activation function, global average pooling, a dropout layer, and a fully connected layer.
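
The saving offered by depthwise convolution can be illustrated by counting weights. The sketch below assumes that the depthwise convolution is followed by a 1 × 1 pointwise convolution that mixes channels, as in common depthwise separable designs; the channel numbers are illustrative and are not the actual configuration of EfficientNet-B1.

```python
def conv_weights(c_in, c_out, k):
    """Weights of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_weights(c_in, c_out, k):
    """Weights of a k x k depthwise plus a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k kernel per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes the channels
    return depthwise + pointwise


# Illustrative layer: 32 input channels, 64 output channels, 3 x 3 kernels.
print(conv_weights(32, 64, 3))                 # 18432
print(depthwise_separable_weights(32, 64, 3))  # 2336
```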

Figure 1.

Structure of EfficientNet-B1. Conv denotes the convolution operation and s denotes its stride. For example, ‘Conv3*3, s1’ denotes a convolution with a kernel size of 3 × 3 and a stride of 1. BN denotes batch normalization, which normalizes the current batch of inputs to a common scale to speed up model training. Swish refers to the swish activation function, y = x * sigmoid(x).

3.1.2 Hierarchical learning

In some image classification tasks, the degree of similarity differs from one pair of classes to another, and for some pairs the samples of one class are relatively easily misclassified as the other (Song et al. 2016). Hierarchical learning is an effective way to address this problem; the idea is to classify the data sequentially, from easy to difficult. For example, Kim et al. (2019) used a cascade of four two-classification models for hierarchical learning of five similar disease classes (i.e. four layers), which improved the overall disease diagnosis accuracy by 2.8 per cent. Gashi et al. (2021) used hierarchical learning for five classes of socially active poses, first separating head and face and then subdividing these two parts, and obtained an improvement of 2–9 per cent. However, these works used more than two classification models, which increases the overall complexity of the model. To simplify the problem, this work proposes a novel hierarchical learning scheme with two layers (Fig. 2): in the first layer, all classes that are difficult to distinguish are merged into a single combined class, which is trained together with the classes that are easy to distinguish; in the second layer, the combined class is split back into its individual classes, and the distinction between these difficult classes is learned specifically.

Figure 2.

Hierarchical learning. The samples from a class which are easy to distinguish share few similarities with the galaxy images from any other class; the samples from a class which are difficult to distinguish share high similarities with the galaxy images from at least one other class.

In the Galaxy Zoo-The Galaxy Challenge data set, CSS and EO galaxies both appear wide in the middle and narrow at the flanks, which makes them difficult to discriminate from each other. CRS, IBS, and SPI are round, elliptical, and spiral with arms, respectively, so these three classes are relatively easy to distinguish from each other. In the training set, CSS has the fewest samples (462), EO has relatively few (3122), and each of the other three classes has more than 6000. Following the idea of hierarchical learning, we first treat CSS and EO as a combined class (3584 samples), which is also easy to distinguish from the other three classes. We therefore classify the samples into four classes in the first layer: the combined class, CRS, IBS, and SPI. This not only makes it easier for the model to learn the differences between the classes, but also alleviates the imbalance between CSS and the CRS, IBS, and SPI classes. In the second layer, the combined class is divided further into CSS and EO by a two-classification model trained with weighted sampling. Finally, the two trained models are used sequentially (more details in Section 4.1) to distinguish between the five classes.

3.2 Weighted sampling

Data imbalance refers to a large difference in sample size between the classes of a data set. It can have a serious negative impact on the overall performance of a CNN (Buda, Maki & Mazurowski 2018): a model trained on an imbalanced data set tends to recognize minority-class samples as majority-class samples. Two widely used techniques for alleviating the imbalance problem are oversampling and undersampling. When used without other techniques, oversampling has been shown experimentally to outperform undersampling (Mohammed, Rawashdeh & Abdullah 2020). Krawczyk et al. (2016) proposed an algorithm combining boosting and undersampling, which has high application value in breast cancer tumour diagnosis. The disadvantages of oversampling are a tendency to overfit and a high time cost, whereas undersampling may cause the model to underfit and to learn features insufficiently because of the reduced amount of data. Therefore, this work proposes a weighted sampling approach to better alleviate the effects of imbalance.

The idea of weighted sampling is to assign a weight to each sample in the imbalanced data set, equal to the reciprocal of the number of samples in its class. Samples from the minority class thus receive a higher weight and samples from the majority class a lower weight. A sample with a high weight is more likely to be drawn, possibly repeatedly, but its class contains fewer samples; a sample with a low weight is less likely to be drawn, but its class contains more samples. As a result, the numbers of samples drawn from each class are approximately balanced. This idea can be expressed as follows:

$$\begin{eqnarray*} w_i=\frac{1}{N_i}, \end{eqnarray*}$$

(1)

$$\begin{eqnarray*} p_i=\frac{w_i}{\sum _{j=1}^{k}{w_j\times N_j}}=\frac{w_i}{k}, \end{eqnarray*}$$

(2)

$$\begin{eqnarray*} E(C=c_i,M=m)=p_i\times N_i\times m=\frac{m}{k}, \end{eqnarray*}$$

(3)

where k is the number of classes, N_i is the number of samples in class c_i, w_i is the weight of each sample in class c_i, p_i is the probability that a given sample in class c_i is drawn, and m is the total number of samples to be drawn. The expected number of samples drawn from any class is |$E\left(C = c_{i},M = m \right) = \frac{m}{k}$|. Therefore, the sampled data set is balanced in expectation.

In the Galaxy Zoo-The Galaxy Challenge data set, CSS is imbalanced with respect to the other four classes. The imbalance between the CSS and EO classes on the one hand and the CRS, IBS, and SPI classes on the other has been mitigated by the first layer of hierarchical learning. The second layer is devoted to distinguishing CSS from EO, and the imbalance between these two classes is handled by weighted sampling when they are fed into the two-classification model. In the training phase, each CSS sample has a weight of 1/462 and each EO sample a weight of 1/3122; the corresponding probabilities of being selected are 1/924 and 1/6244, respectively. Each CSS sample is therefore more likely to be drawn than each EO sample, but because EO is the majority class (3122 samples) in the CSS-EO classification problem, the probability that a draw comes from either class is 1/2, and the two classes tend to be balanced. However, because sampling is done with replacement, the same sample may be drawn several times. We therefore apply online data augmentation after sampling (described in more detail in Section 4.2). Owing to the random nature of the data augmentation, the final training data contain essentially no duplicate images even if one galaxy image is drawn more than once, which ensures the diversity of the final data.
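
Equations (1)–(3) and the CSS-EO numbers quoted above can be checked with exact fractions, followed by a small empirical draw. This is a sketch using only the Python standard library; the function names and the random seed are illustrative choices, not part of the original method.

```python
from fractions import Fraction
import random

TRAIN_COUNTS = {"CSS": 462, "EO": 3122}  # second-layer training set (Table 2)


def sampling_probabilities(counts):
    """Per-sample probability p_i = w_i / k with w_i = 1 / N_i (equations 1, 2)."""
    k = len(counts)
    return {name: Fraction(1, n) / k for name, n in counts.items()}


def expected_class_counts(counts, m):
    """Expected draws per class, E = p_i * N_i * m = m / k (equation 3)."""
    probs = sampling_probabilities(counts)
    return {name: probs[name] * n * m for name, n in counts.items()}


probs = sampling_probabilities(TRAIN_COUNTS)
print(probs["CSS"], probs["EO"])  # 1/924 1/6244
expected = expected_class_counts(TRAIN_COUNTS, 3584)  # 1792 for each class

# Empirical check: draw 3584 samples with replacement using per-sample weights.
rng = random.Random(0)  # arbitrary seed, for reproducibility only
labels = ["CSS"] * 462 + ["EO"] * 3122
weights = [1 / TRAIN_COUNTS[label] for label in labels]
draws = rng.choices(labels, weights=weights, k=3584)
css_share = draws.count("CSS") / len(draws)  # close to 0.5
```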

3.3 Label smoothing

In traditional supervised machine learning, labels are often one-hot encoded (equation 4).

$$\begin{eqnarray*} {\hat{y}}_{i} = \left\lbrace {\begin{array}{*{10}c}1,~~i = target \\ {0,~~i \ne target} \\ \end{array}} \right. \end{eqnarray*}$$

(4)

With one-hot labels, minimizing the loss drives the prediction for the target class towards 1 and the predictions for the non-target classes towards 0. This label encoding ignores the relationship between the true class and the other classes and cannot guarantee the generalization ability of the model, which sometimes makes the model prone to overfitting (Zhang et al. 2019). This drawback is amplified when some labels are wrong.

When one-hot encoding is used for the training labels in multiclass galaxy morphology recognition, the labels are purely discrete, taking only the values 0 and 1. However, galaxy morphology changes gradually, so the labels should not be restricted to this either/or configuration. Specifically, for the Galaxy Zoo-The Galaxy Challenge data set, there is a gradual transition between CRS and SPI in the Hubble classification scheme. Moreover, in the classification scheme of Galaxy Zoo 2, CRS, CSS, and IBS all belong to the smooth galaxies and share gradually changing morphological characteristics. With one-hot encoding, the model learns only the typical features of each class and not the gradual transitions between classes. As a result, the model has a higher error rate when recognizing samples of transitional morphological type.

Label smoothing, a regularization method in machine learning, was originally proposed by Szegedy et al. (2016). It has been applied in several fields, such as image classification (Hou et al. 2019), image segmentation (Islam & Glocker 2021), machine translation (Liang, Wang & Cao 2022), and speech recognition (Zheng, Yang & Dang 2020). The method modifies the one-hot labels (equation 5) by introducing a small parameter α: with K denoting the number of predicted classes, the label at the target class position becomes 1 − α and the label at each non-target class position becomes α/(K − 1). Müller, Kornblith & Hinton (2019) explain from a representation visualization perspective that label smoothing decreases the intraclass distance and increases the interclass distance of samples in the representation space, which facilitates the calibration of the model.

$$\begin{eqnarray*} {\hat{y}}_{i} = \left\lbrace {\begin{array}{*{10}c}1 - \alpha ,~~i = target \\ {\alpha /(K - 1),~~i \ne target} \\ \end{array}} \right. \end{eqnarray*}$$

(5)

$$\begin{eqnarray*} p_{c=i}=\frac{exp(z_{c=i})}{\sum _{j=1}^{k}{exp(z_{c=j})}} \end{eqnarray*}$$

(6)

When softmax is used for multiclass classification, the predicted value of the model for the target class i is given by equation (6), where c denotes the class, p is the predicted value for a class, z is the logit vector fed to softmax, and k is the dimension of the logit vector. During training with one-hot labels, the predicted value for the target class, p_{c = i}, keeps tending to 1, and the predicted values for the non-target classes, p_{c ≠ i}, keep tending to 0. This means that z_{c = i} tends to positive infinity and z_{c ≠ i} tends to negative infinity. Since the input x is fixed, the weights w or the bias b must become correspondingly large, which easily causes overfitting. With label smoothing, the target for p_{c = i} is a fixed value smaller than 1. Consequently, z_{c = i} and z_{c ≠ i} no longer tend to infinity, and w and b stop growing once they reach a certain value. Label smoothing therefore suppresses overfitting and performs better than one-hot encoding. In addition, label smoothing gives every entry of the label code a non-zero value (1 − α for the target class and α/(K − 1) for the others), which reflects the gradual transitions between classes. For example, the one-hot encoding of CSS is (0, 0, 1, 0, 0); after smoothing with α = 0.1, the encoding becomes (0.025, 0.025, 0.9, 0.025, 0.025). The value 0.025 represents, to a certain extent, the relationship between this class and the other classes, and it also means that the sample has some probability of belonging to each class. Label smoothing thus makes the labels better suited to gradually changing morphology, and the learned model is more reasonable.
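
The CSS example and the bounded-logit argument can be reproduced with a short sketch of equations (5) and (6). The function names are illustrative. The last lines show that a finite gap between the target logit and the other logits, log((1 − α)(K − 1)/α), is enough for the softmax output to match the smoothed label exactly.

```python
import math


def smooth_label(target, num_classes, alpha):
    """Equation (5): 1 - alpha on the target, alpha / (K - 1) elsewhere."""
    off_target = alpha / (num_classes - 1)
    return [1.0 - alpha if i == target else off_target for i in range(num_classes)]


def softmax(logits):
    """Equation (6), shifted by the maximum logit for numerical stability."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# CSS is the third of the five classes (index 2); alpha = 0.1 as in the text.
label = smooth_label(target=2, num_classes=5, alpha=0.1)

# Finite logit gap that reproduces the smoothed label: log(0.9 * 4 / 0.1) = log(36).
gap = math.log((1 - 0.1) * (5 - 1) / 0.1)
probs = softmax([0.0, 0.0, gap, 0.0, 0.0])
```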

4 EXPERIMENT

To explore the effectiveness of the proposed method HIWL, we conduct an experimental study in this section. We apply HIWL to a real galaxy data set to verify its classification effectiveness and compare the model with other galaxy classification models. Section 4.1 describes the analysis and experimental procedure for applying HIWL to the specific galaxy data set, Galaxy Zoo-The Galaxy Challenge. As an indispensable part of the experiment, the data augmentation is described in detail in Section 4.2. Finally, Section 4.3 presents some details of the experiments, such as parameter settings and specific training strategies.

4.1 Overall design of the experiment

Before the experiments, we analysed the data set and found the following characteristics: (1) CSS galaxies are very similar to EO galaxies in morphology, while the other classes are clearly distinct from each other; (2) there are far fewer CSS samples than samples of the other four classes; (3) the galaxy images in the data set change gradually in morphology; and (4) some data may be mislabelled, because the amount of source data is huge and the labelling was done by volunteers after only brief training. We applied our learning method HIWL to this data set. In the overall design of the experiment, hierarchical learning is used to learn the difference between similar classes and to alleviate the effects of data imbalance, weighted sampling is used to deal with the remaining imbalance, and label smoothing is used to learn the gradual transitions between classes.

The overall design of the experiment is shown in Fig. 3. First, we filter the original samples (61 578 galaxy images) according to the clean sample selection rules to obtain the clean sample data (28 790 images), and divide them into a training set, a validation set, and a test set. In the training phase, we follow the idea of hierarchical learning and group the training data into four classes: CRS, IBS, SPI, and a combined class of CSS and EO. The samples of these four classes are processed with several data pre-processing procedures (details in Section 4.2) and two training strategies (details in Section 4.3), and are then fed to the four-classification model, whose backbone is EfficientNet-B1, for training. For the two classes that are difficult to distinguish from each other, CSS and EO, we train a two-classification model in the same way. In the validation or test phase, each sample is pre-processed and passed to the trained four-classification model. If the sample is assigned to one of the non-combined classes, this is the final prediction; if it is assigned to the combined class, it is passed through the trained two-classification model to obtain the final prediction.
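
The two-layer prediction procedure described above can be sketched as a simple cascade. The two models are represented by arbitrary callables that return a class name, and the name of the combined class is a hypothetical placeholder; the real models are the trained EfficientNet-B1 classifiers.

```python
COMBINED = "CSS+EO"  # hypothetical name for the combined class


def first_layer_label(label):
    """Map a five-class label to its first-layer (four-class) label."""
    return COMBINED if label in ("CSS", "EO") else label


def predict(image, four_class_model, two_class_model):
    """Cascade the two trained models to obtain a five-class prediction."""
    coarse = four_class_model(image)   # CRS, IBS, SPI, or COMBINED
    if coarse != COMBINED:
        return coarse                  # easy classes are final after layer 1
    return two_class_model(image)      # CSS or EO, decided by layer 2


# Stand-in models for illustration only.
always_combined = lambda image: COMBINED
always_spiral = lambda image: "SPI"
always_css = lambda image: "CSS"

print(predict(None, always_spiral, always_css))    # SPI
print(predict(None, always_combined, always_css))  # CSS
```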

Figure 3.

The overall design of the experiment. The design as a whole incorporates the following four operations: (1) data set selection and division; (2) hierarchical learning; (3) data pre-processing; (4) learning strategies in the training phase.

4.2 Data augmentation

For deep learning, 23 031 training samples are not enough to train a sufficiently robust model. It is therefore necessary to pre-process the training data with data augmentation techniques. Considering the diversity of galaxy morphology and the fact that the useful information in most images of this data set is concentrated within the central 240 × 240 pixels, we enrich the sample data through the data augmentation operations shown in Fig. 4. These operations include centre cropping and random operations such as rotation, scaling, horizontal flipping, and vertical flipping.

Figure 4.

Data augmentation in the training phase. The original image of the sample is used as the final input of the network structure after the following operations: centre cropping to 256 × 256; random rotation, random scaling, and cropping to 240 × 240; random horizontal flip; random vertical flip; and finally normalization.

Suppose the original training set is expressed as |$S_{tr}=\lbrace (\mathbf {x}_i,y_i), i=1, 2, \cdots , 23031\rbrace$| (Section 2.3), where 23 031 is the number of training samples, |$\mathbf {x}_i$| denotes the ith galaxy image, and y_i denotes the class of |$\mathbf {x}_i$|. The model is learned by iterative optimization. Each complete pass over the training set is called an epoch, and in each epoch the model parameters are updated based on an approximate data set of the training set S_tr. Assuming that the model is trained for a total of N_e epochs, the learning process of the four-classification model is summarized as follows:

For n from 1 to N_e:
(a) Generate an approximate data set |$S_{tr}^n=\lbrace (\mathbf {z}_i,y_i), i=1, 2, \cdots , 23031\rbrace$| of the sample set S_tr using the data augmentation method (Fig. 4). The number of samples in |$S_{tr}^n$| is the same as the number of samples in the training set S_tr, and the class of both sample |$\mathbf {z}_i$| and sample |$\mathbf {x}_i$| is y_i. The sample |$\mathbf {z}_i$| is generated from the sample |$\mathbf {x}_i$| by the data augmentation method of Fig. 4: (1) apply CenterCrop to the sample |$\mathbf {x}_i$| to generate an image |$\mathbf {v}_{i}^{(1)}$|; (2) apply RandomRotation to |$\mathbf {v}_{i}^{(1)}$| to generate an image |$\mathbf {v}_{i}^{(2)}$|; (3) apply RandomResizedCrop to |$\mathbf {v}_{i}^{(2)}$| to generate an image |$\mathbf {v}_{i}^{(3)}$|; (4) apply RandomHorizontalFlip to |$\mathbf {v}_{i}^{(3)}$| to generate an image |$\mathbf {v}_{i}^{(4)}$|; (5) apply RandomVerticalFlip to |$\mathbf {v}_{i}^{(4)}$| to generate an image |$\mathbf {v}_{i}^{(5)}$|; and (6) normalize |$\mathbf {v}_{i}^{(5)}$| to generate the image |$\mathbf {z}_{i}$|.
(b) Use the data set |$S_{tr}^n$| to learn the model.
(c) Perform the next round of iterations.

Here, the RandomHorizontalFlip in step (4) and the RandomVerticalFlip in step (5) are each applied with probability 0.5, and every operation in the other steps is applied with probability 1.

In the above learning process, |$S_{tr}^{n}$| is generated independently in each epoch by the data augmentation method; this is called online data augmentation. In contrast, offline data augmentation generates a fixed augmented data set |$S_{tr}^{\prime }$| from the training set S_tr before iterative training and uses |$S_{tr}^{\prime }$| in every epoch. In total, N_e × 23 031 = 2 832 813 samples (with N_e = 123, Section 4.3.1) are generated by online data augmentation throughout the learning process. This greatly enhances the diversity of the training data and helps the generalization ability of the learned model. At the same time, only 23 031 samples are used in each epoch, the same number as in the original training set S_tr, which keeps the computational burden manageable. By contrast, because of storage and computational constraints, it would not be feasible to generate 2 832 813 augmented samples in advance and use them for model training in an offline data augmentation scheme.
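
The difference between online and offline augmentation lies in where the augmented set is generated. The following sketch regenerates the augmented set inside the epoch loop, as in online augmentation. It is deliberately simplified: images are small nested lists, only the two random flips (each with probability 0.5) are implemented, and `fit_one_epoch` is a hypothetical callback standing in for one epoch of model training; cropping, rotation, rescaling, and normalization are omitted.

```python
import random


def augment(image, rng):
    """Randomly flip a 2D image (list of rows); each flip has probability 0.5."""
    if rng.random() < 0.5:
        image = [row[::-1] for row in image]  # horizontal flip
    if rng.random() < 0.5:
        image = image[::-1]                   # vertical flip
    return [list(row) for row in image]       # return a fresh copy


def train_online(dataset, num_epochs, fit_one_epoch, seed=0):
    """Regenerate the augmented set in every epoch; return the samples generated."""
    rng = random.Random(seed)
    generated = 0
    for _ in range(num_epochs):
        epoch_set = [(augment(x, rng), y) for x, y in dataset]  # one per original
        fit_one_epoch(epoch_set)
        generated += len(epoch_set)
    return generated


toy = [([[1, 2], [3, 4]], "CRS"), ([[5, 6], [7, 8]], "SPI")]
seen = []
total = train_online(toy, num_epochs=3, fit_one_epoch=seen.append)
print(total)  # 6
```

With the values in the text, the same count gives 123 × 23 031 = 2 832 813 generated samples, while each epoch still processes only 23 031 of them.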

In the above four-classification model training, each augmented data set |$S_{tr}^n$| is generated from the original training set S_tr by sampling without replacement: each sample in S_tr produces exactly one augmented sample in |$S_{tr}^n$|. However, the two-classification problem on CSS and EO is characterized by data imbalance, with a ratio of 462:3122 ≈ 1:6.76 (Table 2). Therefore, sampling with replacement is adopted when generating the augmented data set |$S_{tr, CSS-EO}^n=\lbrace (\mathbf {z}_i,y_i), i=1, 2, \cdots , 3584\rbrace$| from the original training set |$S_{tr, CSS-EO}=\lbrace (\mathbf {x}_j,y_j), j=1, 2, \cdots , 3584\rbrace$|: for each i = 1, 2, ⋅⋅⋅, 3584, a sample |$\mathbf {x}$| with class y is randomly selected from S_{tr, CSS − EO} using the weighted sampling method of Section 3.2; steps (1)–(6) of the data augmentation above are then applied to |$\mathbf {x}$| to generate the image |$\mathbf {z}$|; and |$(\mathbf {z}_i, y_i) = (\mathbf {z}, y)$| is taken as the ith sample of |$S_{tr,CSS-EO}^n$|. This is still an online data augmentation method. According to the analysis in Section 3.2, the augmented sample set |$S_{tr,CSS-EO}^n$| is balanced between CSS and EO.

4.3 Training strategies and experimental parameters

4.3.1 Training strategies

This section describes the schemes for model initialization and optimal model selection. HIWL includes two submodels, namely a four-classification model and a two-classification model. Transfer learning is adopted in the initialization phase of both submodels: each model loads the officially released PyTorch weights pre-trained on the ImageNet data set. In this way, the model converges earlier and has a strong feature extraction ability from the start of training.

For optimal model selection, the models with the top three accuracies are retained as candidates for the final model. A complete iteration means that all samples in a set are learned by the model once. Each complete iteration over the training set is called an epoch, and each classifier is trained for a number of epochs. Let N_e denote the total number of epochs of a model, |$N_e^4$| the total number of epochs of the four-classification model, and |$N_e^2$| the total number of epochs of the two-classification model. After each epoch, a classifier and its classification accuracy on the validation set are obtained. Thus, after N_e epochs, N_e candidate classifiers and their validation accuracies are available. An overly large N_e increases the risk of overfitting, and an inappropriately small N_e usually results in underfitting; both have a negative impact on validation accuracy. Therefore, a large number of epochs (such as 1000) is set in advance as an upper bound for N_e, and the training procedure is terminated when the loss on the validation set no longer decreases and the validation accuracy no longer increases. The number of epochs completed at this point is N_e. Based on this scheme, |$N_e^4$| and |$N_e^2$| are chosen as 123 and 96, respectively. Because HIWL consists of two subclassifiers, one four-classification model and one two-classification model, the combined effect of the subclassifiers should be considered when establishing HIWL. Therefore, the models with the top three validation accuracies among the 123 candidate four-classification models are retained, as are the models with the top three validation accuracies among the 96 candidate two-classification models. During the 123 epochs, the top three validation accuracies of the four-classification models are 97.84 per cent (at the 90th epoch), 98.02 per cent (at the 98th epoch), and 97.91 per cent (at the 120th epoch). During the 96 epochs, the top three validation accuracies of the two-classification models are 95.30 per cent (at the 47th epoch), 95.08 per cent (at the 57th epoch), and 95.08 per cent (at the 62nd epoch). As a result, 3 × 3 = 9 candidate HIWL models are generated, and the HIWL model with the highest validation accuracy is taken as the final model.
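The selection procedure (keep the top three checkpoints of each submodel, then evaluate all nine pairings) can be sketched as below. The function names are illustrative, and `placeholder_score` is only a stand-in: in an actual run the score of a pairing would be the validation accuracy of the combined two-stage classifier, which is not reproduced here.

```python
from itertools import product


def top_k(candidates, k=3):
    """Return the k (epoch, val_acc) pairs with the highest val_acc."""
    return sorted(candidates, key=lambda c: c[1], reverse=True)[:k]


def select_best_pair(four_cls, two_cls, combined_val_acc):
    """Score every (four-class, two-class) pairing and return the best one."""
    pairs = list(product(four_cls, two_cls))
    return max(pairs, key=lambda p: combined_val_acc(*p)), pairs


# Retained checkpoints as (epoch, validation accuracy), as quoted in the text.
four_cls_top3 = [(90, 0.9784), (98, 0.9802), (120, 0.9791)]
two_cls_top3 = [(47, 0.9530), (57, 0.9508), (62, 0.9508)]


def placeholder_score(four, two):
    # Placeholder only: a real run would evaluate the combined
    # classifier on the validation set instead of multiplying accuracies.
    return four[1] * two[1]


best, all_pairs = select_best_pair(four_cls_top3, two_cls_top3, placeholder_score)
```

Scoring the pairings jointly, rather than taking the best checkpoint of each submodel separately, reflects the point made above that the combined effect of the subclassifiers matters.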

4.3.2 Experimental parameter settings

Different parameter settings often lead to different experimental results; our specific experimental parameters are as follows. The deep learning framework is PyTorch, and the GPU is an Nvidia RTX 2060 with 6 GB of memory. For the four-classification model, we trained for 123 epochs starting from the transfer-learning initialization, with a learning rate of 0.005, a cosine annealing learning rate decay strategy, a batch size of 24, and a label smoothing parameter α = 0.05. For the two-classification model, we trained for 96 epochs starting from the transfer-learning initialization, with the same learning rate (0.005), decay strategy (cosine annealing), batch size (24), and label smoothing parameter (α = 0.05). Another important problem is the determination of an appropriate number of epochs for training the model (that is, when to stop the training procedure). To do this, the relationship between training loss and the number of epochs and the relationship between validation accuracy and the number of epochs are investigated (Fig. 5). In Fig. 5(a), the loss values of both submodels decrease and then converge to low values before the training process reaches the stopping point. During this period, the corresponding validation accuracies of the submodels (Fig. 5b) both increase and then stabilize at high values, without any significant decrease. Therefore, neither submodel overfits during the training phase, and the submodels fit the data best and are most accurate where both the loss values and the validation accuracies are stable. Furthermore, when selecting the optimal model (details in Section 4.3.1), the candidate submodels with high validation accuracy are all retained at epochs where the losses are low. During the 123 epochs, the four-classification models with the top three validation accuracies are retained at the 90th epoch (97.84 per cent), the 98th epoch (98.02 per cent), and the 120th epoch (97.91 per cent). During the 96 epochs, the two-classification models with the top three validation accuracies are retained at the 47th epoch (95.30 per cent), the 57th epoch (95.08 per cent), and the 62nd epoch (95.08 per cent). Therefore, the choice of 123 epochs for the four-classification model and 96 epochs for the two-classification model is appropriate.
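To make the two hyperparameters concrete, the sketch below evaluates a cosine annealing schedule and a smoothed target vector in plain Python. Two points are assumptions rather than settings reported here: the schedule is taken to decay to a minimum learning rate of 0 over the full number of epochs, and label smoothing is written in its common uniform form, where the target is (1 − α) times the one-hot vector plus α/K for each of the K classes.

```python
import math


def cosine_annealing_lr(epoch, total_epochs, base_lr=0.005, min_lr=0.0):
    """Learning rate after `epoch` epochs of cosine annealing."""
    cosine = math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + cosine)


def smooth_labels(true_class, num_classes, alpha=0.05):
    """Uniform label smoothing: (1 - alpha) * one_hot + alpha / num_classes."""
    off_value = alpha / num_classes
    return [
        (1.0 - alpha) + off_value if k == true_class else off_value
        for k in range(num_classes)
    ]


# Four-classification model: 123 epochs, base learning rate 0.005.
lr_start = cosine_annealing_lr(0, 123)
lr_mid = cosine_annealing_lr(61.5, 123)
lr_end = cosine_annealing_lr(123, 123)

# Smoothed target for a sample of class 0 in the four-class problem.
target = smooth_labels(true_class=0, num_classes=4)
```

Under these assumptions the learning rate falls smoothly from 0.005 to 0, and the true class receives a target of 0.9625 while each other class receives 0.0125.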

Figure 5.

Training loss and validation accuracy curves for the two training phases. In subgraphs (a) and (b), the horizontal axis represents the epoch and the vertical axis represents the loss and the accuracy, respectively. The total number of epochs for the four-classification model is 123, and the total number of epochs for the two-classification model is 96.

5 RESULTS AND DISCUSSION

5.1 Evaluation metrics

The model evaluation metrics used in this experiment are Accuracy (equation 7), Recall (equation 8), Precision (equation 9), and F1-Score (equation 10). Accuracy means the ratio of the number of correctly predicted samples to the total number of samples, and is a statistic for all samples. Specifically in galaxy image classification, it refers to the number of galaxy images correctly classified by the model as a ratio of the number of all galaxy images.

$$\begin{eqnarray*} Accuracy=\frac{N_{TP}+N_{TN}}{N_{TP}+N_{TN}+N_{FP}+N_{FN}} \end{eqnarray*}$$

(7)

where N_TP is the number of true positives, N_TN is the number of true negatives, N_FP is the number of false positives (FPs), and N_FN is the number of false negatives (FNs).

Recall means the ratio of the number of correctly predicted samples in a class to the total number of such samples and is a statistic for all samples in a class. Specifically in galaxy image classification, it refers to the ratio of the number of galaxy images in a class (e.g. CRS) that are correctly classified by the model to the number of all galaxy images in that class.

$$\begin{eqnarray*} Recall=\frac{N_{TP}}{N_{TP}+N_{FN}} \end{eqnarray*}$$

(8)

Precision means the ratio of the number of correctly predicted samples in a class to the total number of samples predicted as such, and is a statistic for all samples predicted as a certain class. Specifically in classification of galaxy images, it refers to the ratio of the number of galaxy images of a certain class (e.g. CSS) that are correctly classified by the model to the total number of galaxy images classified by the model into this class.

$$\begin{eqnarray*} Precision=\frac{N_{TP}}{N_{TP}+N_{FP}} \end{eqnarray*}$$

(9)

Therefore, recall and precision have different emphases. Furthermore, the F1-Score is based on the harmonic mean of precision and recall, which is an evaluation metric after weighing the two.

$$\begin{eqnarray*} F1\ Score=2\ast \frac{Precision\ast Recall}{Precision+Recall} \end{eqnarray*}$$

(10)

Accuracy measures the prediction of a global sample and gives a general view of the model’s performance, but it does not provide a more detailed local view of the model’s prediction for a certain class. Therefore, recall, precision, and F1-Score need to be considered. When the cost of FN is high, i.e. when misclassification of samples in a class into other classes would have serious consequences, the focus should be on improving the recall of that class. When the cost of FP is high, i.e. when being misclassified into a certain class has serious consequences, the focus should be on improving the precision of that class. The F1-Score is a combination of recall and precision and is not biased towards either, which can be applied to more general situations.
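For a multi-class problem, the counts in equations (7)–(10) can be read per class from a confusion matrix. As a sketch, in notation introduced here only for illustration, let |$C_{ij}$| be the number of samples of true class i that are predicted as class j, with K classes in total. For class k,

$$\begin{eqnarray*} N_{TP}^{(k)}=C_{kk},\quad N_{FN}^{(k)}=\sum _{j\ne k}C_{kj},\quad N_{FP}^{(k)}=\sum _{i\ne k}C_{ik} \end{eqnarray*}$$

so the recall of class k is |$C_{kk}$| divided by the sum of row k, and its precision is |$C_{kk}$| divided by the sum of column k. The overall accuracy is the fraction of all samples that lie on the diagonal:

$$\begin{eqnarray*} Accuracy=\frac{\sum _{k=1}^{K}C_{kk}}{\sum _{i=1}^{K}\sum _{j=1}^{K}C_{ij}} \end{eqnarray*}$$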

5.2 Results presentation of HIWL

We save the models with the top three accuracies for both the four-classification and the two-classification models during training. Then, one model of each type is selected and, after combined validation, the HIWL model with the highest combined accuracy is obtained. The accuracy of this HIWL model on the test set is 96.32 per cent. The corresponding confusion matrix is shown in Table 3, and Table 4 lists its evaluation metrics (recall, precision, and F1-Score) on the test set.

Table 3.

Confusion matrix of HIWL on the test set. Rows represent the true class and columns represent the class predicted by the model.

	CRS	IBS	CSS	EO	SPI
CRS	813	26	0	0	5
IBS	25	777	0	0	5
CSS	0	6	41	11	0
EO	0	1	11	376	3
SPI	2	8	0	3	768

Table 4.

Recall, precision, and F1-Score of HIWL on the test set. Avg represents the arithmetic mean of the corresponding metrics for all classes.

Class	Recall	Precision	F1-Score
CRS	0.9633	0.9679	0.9656
IBS	0.9628	0.9499	0.9563
CSS	0.7069	0.7855	0.7455
EO	0.9616	0.9641	0.9629
SPI	0.9834	0.9834	0.9834
Avg	0.9156	0.9302	0.9227

Table 3 shows that the misclassification ratio is less than 5 per cent for all classes except CSS. The relatively large misclassification ratio for CSS arises because CSS and EO are very similar and the number of original CSS samples is small. Table 4 shows that, except for CSS, the recall and F1-Score of the other classes are over 0.956, and the precision is over 0.949. For CSS, the precision is close to 0.8, and the recall and F1-Score are both over 0.7.
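The overall accuracy and the per-class recall quoted above can be recomputed directly from the confusion matrix in Table 3. The following sketch uses only the counts printed in that table; the helper names are illustrative.

```python
CLASSES = ["CRS", "IBS", "CSS", "EO", "SPI"]

# Table 3: rows are true classes, columns are predicted classes.
CONFUSION = [
    [813, 26, 0, 0, 5],
    [25, 777, 0, 0, 5],
    [0, 6, 41, 11, 0],
    [0, 1, 11, 376, 3],
    [2, 8, 0, 3, 768],
]


def recall_per_class(matrix):
    """Diagonal entry divided by the row sum, for each true class."""
    return [row[k] / sum(row) for k, row in enumerate(matrix)]


def overall_accuracy(matrix):
    """Sum of the diagonal divided by the total number of samples."""
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


recalls = dict(zip(CLASSES, recall_per_class(CONFUSION)))
accuracy = overall_accuracy(CONFUSION)
misclassified = {name: 1.0 - r for name, r in recalls.items()}
```

The 2881 test samples give 2775 correct predictions, that is, an overall accuracy of 96.32 per cent, and CSS is the only class whose misclassification ratio exceeds 5 per cent.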

In theory, the hierarchical learning mechanism increases the complexity of a galaxy classification model. To explore whether this approach is advisable, we compared HIWL with a simpler model, a single-layer model with weighted sampling and label smoothing (SLWSLM), designed by removing the hierarchical learning mechanism from HIWL. The experimental results on the test set are presented in Table 5. The hierarchical learning mechanism has an evident impact on galaxy image recognition: the overall accuracy improves by 0.42 per cent, and the recall for the minority classes CSS and EO improves by 8.62 per cent and 2.81 per cent, respectively. The data used to train the second-layer submodel belong to two minority classes and account for a small subset of the entire data set, so the additional training time is short (48 s per epoch): the training time per epoch is 369 s for HIWL and 321 s for SLWSLM. Furthermore, EfficientNet has lower complexity and faster convergence than general backbone models (such as ResNet). Therefore, the hierarchical learning scheme is desirable.
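The two-layer decision can be summarized by a short routing sketch. It assumes, following the description of the submodels, that the four-classification model predicts CRS, IBS, SPI, or a merged CSS/EO class and that only samples assigned to the merged class are passed to the two-classification model. The label `CSS_OR_EO` and the stub models are hypothetical and serve only to make the example runnable.

```python
MERGED = "CSS_OR_EO"  # hypothetical name for the merged first-layer class


def hiwl_predict(image, four_class_model, two_class_model):
    """Route a sample through the first layer and, if needed, the second."""
    coarse = four_class_model(image)
    if coarse == MERGED:
        # Only the two similar minority classes reach the second layer.
        return two_class_model(image)
    return coarse


# Stub models operating on toy "images" described by two fields.
def stub_four_class(image):
    return MERGED if image["elongated"] else image["coarse_label"]


def stub_two_class(image):
    return "EO" if image["has_disc"] else "CSS"


samples = [
    {"elongated": False, "coarse_label": "SPI", "has_disc": True},
    {"elongated": True, "coarse_label": None, "has_disc": True},
    {"elongated": True, "coarse_label": None, "has_disc": False},
]
predictions = [hiwl_predict(s, stub_four_class, stub_two_class) for s in samples]
```

Because the second layer is invoked only for the merged class, its extra cost at both training and inference time is limited to the small CSS/EO subset, which is consistent with the 48 s per epoch overhead reported above.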

Table 5.

The desirability of using a hierarchical classification mechanism. This evaluation on the test set is conducted by comparing HIWL with a simpler model, SLWSLM (a single-layer model with weighted sampling and label smoothing). SLWSLM is designed by removing the hierarchical learning mechanism from HIWL.

	HIWL		SLWSLM
	Recall	Overall acc	Recall	Overall acc
CRS	0.9633	96.32 per cent	0.9763	95.90 per cent
IBS	0.9628		0.9579
CSS	0.7069		0.6207
EO	0.9616		0.9335
SPI	0.9834		0.9795

To explore the effect of label smoothing, we compared HIWL with another simpler model, a hierarchical model with weighted sampling (HMWS), designed by removing the label smoothing mechanism from HIWL. The experimental results on the test set are presented in Table 6. Label smoothing has a positive impact on the overall accuracy and on the recall of most classes: the overall accuracy improves by 0.28 per cent, and the recall improves by 0.12 per cent, 3.45 per cent, 1.79 per cent, and 0.13 per cent for IBS, CSS, EO, and SPI, respectively, whereas the recall for CRS is slightly lower (0.9633 versus 0.9668).

Table 6.

The desirability of using a label smoothing mechanism. This evaluation on the test set is conducted by comparing the HIWL with a simpler model HMWS. The HMWS is designed by removing the label smoothing mechanism from HIWL.

	HIWL		HMWS
	Recall	Overall acc	Recall	Overall acc
CRS	0.9633	96.32 per cent	0.9668	96.04 per cent
IBS	0.9628		0.9616
CSS	0.7069		0.6724
EO	0.9616		0.9437
SPI	0.9834		0.9821

5.3 Comparison and analysis of different galaxy classification models

We replicated several classification models and used these models for galaxy classification tasks under the data set Galaxy Zoo-The Galaxy Challenge. These models include the classical works on galaxy classification Dieleman (Dieleman et al. 2015) and ResNet26 (Zhu et al. 2019), and several typical backbone deep learning networks. The latter includes several models from the CNN series, i.e. VGG (Simonyan & Zisserman 2015), GoogleNet (Szegedy et al. 2015), ResNet (He et al. 2016), EfficientNet (Tan & Le 2019), and a transformer series model called Vision Transformer (Dosovitskiy et al. 2021). To test the effectiveness of the method HIWL, we also investigated the version of replacing the backbone model of HIWL with each of the above models and performed the same classification task on the same data set. The results of the experimental comparison are shown in Table 7.

Table 7.

Comparison of the test results before and after combining the method HIWL with the classical model. The table contains 11 comparison models such as Dieleman, ResNet, Vision Transformer, etc. Avg acc represents the average accuracy of 10 runs for each model, Avg acc (with HIWL) represents the average accuracy of 10 runs for each model after incorporating HIWL, and Promotion represents the difference between the accuracy of the model after incorporating HIWL and that of the original model.

Model	Avg acc	Avg acc (with HIWL)	Promotion
Dieleman (Dieleman et al. 2015)	0.9337	0.9400	0.0063
ResNet26 (Zhu et al. 2019)	0.9074	0.9147	0.0073
VGG16 (Simonyan & Zisserman 2015)	0.9431	0.9469	0.0038
GoogleNet (Szegedy et al. 2015)	0.9480	0.9507	0.0027
ResNet34 (He et al. 2016)	0.9507	0.9597	0.0090
ResNet50 (He et al. 2016)	0.9469	0.9497	0.0028
EfficientNet-B0 (Tan & Le 2019)	0.9521	0.9577	0.0056
EfficientNet-B1 (Tan & Le 2019)	0.9542	0.9612	0.0070
EfficientNet-B2 (Tan & Le 2019)	0.9503	0.9580	0.0077
Vision Transformer (Dosovitskiy et al. 2021)	0.9264	0.9451	0.0187

Table 7 shows that each classic model improves to a different degree after incorporating the HIWL method. Among the 11 network models, EfficientNet-B1 has the highest average accuracy both before and after incorporating HIWL. The Vision Transformer (ViT) shows the largest improvement among the 11 models, mainly because of the training strategies of this method: ViT requires a large amount of data, and HIWL uses transfer learning, which reduces this demand and results in a significant improvement for ViT on this task.

Based on the highest validation accuracy [except Reza (2021) and Lin et al. (2021)] of each model, we compared the HIWL with galaxy classification works based on the Galaxy Zoo data set in recent years. These available works are based on deep learning methods such as ANN, CNN, and Vision Transformer, which have been widely studied in recent years. These works are Reza (2021), Zhu et al. (2019), Zhang et al. (2022), Gupta, Srijith & Desai (2022), Silva & Ventura (2019), Goyal et al. (2020), Jiménez et al. (2020), Lin et al. (2021), and Kalvankar, Pandit & Parwate (2020). The comparison results are presented in Table 8. The results of the comparing methods are all extracted from the original articles.

Table 8.

Comparison between the HIWL and nine other galaxy classification works based on the Galaxy Zoo data set in literature. In this table, Overall val acc represents the highest overall accuracy on validation set, and Overall test acc represents the highest overall accuracy on test set. Num classes represents the number of classes to be divided. The accuracies of Reza (2021) and Lin et al. (2021) are based on the test set, and the others are based on the validation set.

Method	Overall val acc	Overall test acc	Num classes
ANN (Reza 2021)		98.2 per cent	4
ResNet26 (Zhu et al. 2019)	95.21 per cent		5
SC-Net (Zhang et al. 2022)	94.70 per cent		5
NODE-ACA (Gupta et al. 2022)	95.00 per cent		5
Silva & Ventura (2019)	94.01 per cent		6
layered CNN (Goyal et al. 2020)	88.33 per cent		3
Jiménez et al. (2020)	96.43 per cent		2
ViT (Lin et al. 2021)		81.21 per cent	8
EfficientNet-B5 (Kalvankar et al. 2020)	93.70 per cent		7
HIWL	97.22 per cent	96.32 per cent	5

Table 8 shows that the overall accuracies of most of the models exceed 90 per cent, and some are around 95 per cent. However, many models pay insufficient attention to the minority classes. For example, ANN (Reza 2021) studies the recognition of four classes (elliptical, merger, spiral, and star), of which merger and star are minority classes. The recall, precision, and F1-Score of star are all lower than 0.6, while those of merger are all lower than 0.2. On the whole, there is a relationship between the number of classes and the final overall accuracy: generally speaking, the larger the number of classes, the lower the overall accuracy. Four of the works in the table, including HIWL, deal with five classes, and HIWL has a higher overall accuracy than the other three. HIWL also has the second-highest overall accuracy (97.22 per cent on the validation set and 96.32 per cent on the test set) after ANN (Reza 2021), which deals with four classes of galaxy. It is worth noting that the ratio of the number of samples in the minority classes to that in the majority classes in ANN (Reza 2021) is very small (< 1:100). Therefore, the low recognition rate in the minority classes has little impact on its final overall accuracy.
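The last point follows from the fact that overall accuracy is the average of the per-class recalls weighted by class size. The sketch below illustrates this with entirely hypothetical class counts and recalls (they are not taken from Reza 2021 or from any other cited work); only the minority-to-majority ratio of 1:100 mirrors the situation described above.

```python
def overall_accuracy(class_counts, class_recalls):
    """Overall accuracy as the class-size-weighted mean of per-class recall."""
    correct = sum(class_counts[c] * class_recalls[c] for c in class_counts)
    return correct / sum(class_counts.values())


# Hypothetical data set: two large classes and one minority class (1:100).
counts = {"majority_a": 5000, "majority_b": 5000, "minority": 50}

# Same majority recall in both cases; only the minority recall changes.
recalls_good = {"majority_a": 0.99, "majority_b": 0.99, "minority": 0.95}
recalls_poor = {"majority_a": 0.99, "majority_b": 0.99, "minority": 0.20}

acc_good = overall_accuracy(counts, recalls_good)
acc_poor = overall_accuracy(counts, recalls_poor)
accuracy_drop = acc_good - acc_poor
```

In this toy setting, a fall in minority recall from 0.95 to 0.20 lowers the overall accuracy by less than half a percentage point, which is why overall accuracy alone can hide poor performance on minority classes.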

In particular, we compared HIWL with three related works from the literature based on recall, precision, and F1-Score on the validation set. Each of these works studied the recognition of CRS, IBS, CSS, EO, and SPI. SC-Net (Zhang et al. 2022) did not report precision or F1-Score, so HIWL is not compared with it on these two metrics.

Table 9 shows that, although SC-Net (Zhang et al. 2022) has a higher recall than HIWL on CRS, IBS, and CSS, its recall on EO is significantly lower (by more than 10 per cent) than that of the other three models. In addition, the recall of HIWL in each class is higher than that of ResNet26 (Zhu et al. 2019) and NODE-ACA (Gupta et al. 2022). In the comparison based on precision (Table 10), the proposed HIWL achieves the highest performance in every class. In the comparison based on F1-Score (Table 11), HIWL again achieves the highest performance. On the whole, therefore, the proposed HIWL is superior to the other typical works on the three metrics (recall, precision, and F1-Score).

Table 9.

Comparison of recall between the HIWL and three literature works based on the validation set. Each of these works focuses on the recognition of CRS, IBS, CSS, EO, and SPI.

Method	Recall
	CRS	IBS	CSS	EO	SPI
ResNet26 (Zhu et al. 2019)	0.9634	0.9431	0.5862	0.9485	0.9782
SC-Net (Zhang et al. 2022)	0.9785	0.9785	0.7833	0.8259	0.9850
NODE-ACA (Gupta et al. 2022)	0.9592	0.9425	0.4894	0.9426	0.9268
HIWL	0.9715	0.9727	0.7414	0.9718	0.9897

Table 10.

Comparison of precision between the HIWL and two literature works based on the validation set. Each of these works focuses on the recognition of CRS, IBS, CSS, EO, and SPI.

Method	Precision
	CRS	IBS	CSS	EO	SPI
ResNet26 (Zhu et al. 2019)	0.9611	0.9561	0.7234	0.9412	0.9573
NODE-ACA (Gupta et al. 2022)	0.9621	0.9001	0.6053	0.9048	0.9565
HIWL	0.9808	0.9632	0.7963	0.9668	0.9872

Table 11.

Comparison of F1-Score between the HIWL and two literature works based on the validation set. Each of these works focuses on the recognition of CRS, IBS, CSS, EO, and SPI.

Method	F1-Score
	CRS	IBS	CSS	EO	SPI
ResNet26 (Zhu et al. 2019)	0.9622	0.9495	0.6476	0.9448	0.9677
NODE-ACA (Gupta et al. 2022)	0.9607	0.9208	0.5412	0.9233	0.9414
HIWL	0.9762	0.9679	0.7679	0.9693	0.9885

5.4 Model feature visualization

To observe the characteristics of the information extracted by HIWL, we visualized the features extracted by the models. The output of a model’s penultimate layer is taken as the extracted features. The models investigated are EfficientNet-B1 and HIWL with EfficientNet-B1 as the backbone; the experiments in Table 7 show that these two models have the best performance. Each model is visualized on training samples and test samples. To keep the plots readable, only 5000 galaxy samples randomly selected from the training set were visualized. The test set contains 2881 samples, all of which were used in the visualization.

For the EfficientNet-B1 model, we directly input the selected training data or the test data into the model. After obtaining the output of the penultimate layer, the operations of flattening, dimensionality reduction, and visualization are performed sequentially. For HIWL, we performed the following operations: (1) the samples labelled CRS, IBS, and SPI are fed into the four-classification model of HIWL, and the output features of the penultimate layer are obtained; (2) the samples labelled CSS and EO are fed into the two-classification model to obtain the output features of the penultimate layer; (3) the features obtained from these samples are flattened and projected into a two-dimensional space using t-distributed Stochastic Neighbor Embedding (t-SNE). The result is shown in Fig. 6.
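Steps (1) and (2) and the flattening part of step (3) can be sketched as follows. The feature extractors here are hypothetical stubs that return small nested lists in place of real penultimate-layer outputs; the resulting list of flat vectors is what would then be passed to a t-SNE implementation for the two-dimensional projection, which is not reproduced here.

```python
SECOND_LAYER_CLASSES = {"CSS", "EO"}


def flatten(nested):
    """Flatten an arbitrarily nested list of numbers into one vector."""
    flat = []
    for item in nested:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


def extract_features(samples, four_class_features, two_class_features):
    """Pick the submodel by label, then flatten its penultimate output."""
    rows = []
    for image, label in samples:
        if label in SECOND_LAYER_CLASSES:
            feature_map = two_class_features(image)
        else:
            feature_map = four_class_features(image)
        rows.append((label, flatten(feature_map)))
    return rows


# Hypothetical stubs standing in for the two penultimate layers.
def stub_four_class_features(image):
    return [[image, 0.0], [0.0, image]]


def stub_two_class_features(image):
    return [[-image, 0.0], [0.0, -image]]


samples = [(1.0, "CRS"), (2.0, "CSS"), (3.0, "EO"), (4.0, "SPI")]
rows = extract_features(samples, stub_four_class_features, stub_two_class_features)
```

Both extractors must produce vectors of the same length after flattening so that all samples can be embedded in one common space.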

Figure 6.

Visualization of model features. (a) and (b) present the extracted features of 5000 randomly selected training samples, and (c) and (d) present the extracted features of test samples. The features of HIWL carry more discriminative information than those of EfficientNet-B1.

Fig. 6 shows that, for both EfficientNet-B1 and HIWL, the visualization results on the training data are consistent with those on the test data. In the feature space of EfficientNet-B1, the sample points within a class are relatively sparse, and the sample points of different classes lie close together and are entangled with each other, resulting in inconspicuous class boundaries. In contrast, in the feature space of HIWL, the sample points within a class are denser, the distance between the sample points of different classes is larger, and the entanglement is greatly reduced, resulting in obvious class boundaries. Although multiple techniques in HIWL contribute to this change, most of the contribution comes from label smoothing, which is consistent with Müller et al. (2019). These characteristics of the HIWL features help to alleviate model misclassification and make the classification more accurate.

In addition, Fig. 6 shows a certain similarity between CRS and IBS, and a high similarity between EO and CSS. This similarity is evident from the partial overlap of their samples in feature space: some EO samples lie in the region occupied by CSS, and vice versa.

5.5 Model attention visualization

Deep learning networks are often regarded as black boxes with weak interpretability: why a network makes a given prediction and where it focuses are generally unknown. Gradient-weighted Class Activation Mapping (Grad-CAM; Selvaraju et al. 2017) is a deep network visualization method based on gradient localization. It uses the gradient of the target (such as the logit of a class in a classification task) flowing into the final convolutional layer to generate a coarse localization map that highlights the image regions important for the prediction. It thus explains the classification basis of the DNN model in the form of a heat map, which helps humans to understand and analyse the model. To understand which image regions the model focuses on when classifying galaxies, we visualized the heat map for each of the five classes using Grad-CAM; the results are shown in Fig. 7. This visualization compares two models, EfficientNet-B1 and HIWL. In the figure, the closer a region is to red, the more attention the model pays to it, and the closer to black, the less attention it pays.
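The core arithmetic of Grad-CAM can be sketched in a few lines. The example below is a framework-free illustration under stated assumptions, not the code used for Fig. 7: it takes the feature maps of the final convolutional layer and the gradients of the class score with respect to them as plain nested lists (in practice both would come from a deep learning framework), averages each gradient map to obtain a channel weight, forms the weighted sum of the feature maps, and applies a ReLU. The final rescaling to [0, 1] is a common display convention added here, not part of the definition.

```python
def grad_cam(activations, gradients):
    """Compute a Grad-CAM map from K feature maps and their gradients.

    activations: list of K feature maps, each an H x W nested list.
    gradients: same shape; gradients[k][i][j] is the derivative of the class
    score with respect to activations[k][i][j].
    """
    height = len(activations[0])
    width = len(activations[0][0])

    # Channel weights: global average of each gradient map over the grid.
    weights = [sum(sum(row) for row in g) / (height * width) for g in gradients]

    # Weighted sum of the feature maps.
    cam = [[0.0] * width for _ in range(height)]
    for weight, fmap in zip(weights, activations):
        for i in range(height):
            for j in range(width):
                cam[i][j] += weight * fmap[i][j]

    # ReLU keeps only regions with a positive influence on the class score.
    cam = [[max(value, 0.0) for value in row] for row in cam]

    # Rescale to [0, 1] for display (assumed convention, skipped if all zero).
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam
```

The resulting map has the spatial resolution of the last convolutional layer and is upsampled to the input image size before being overlaid as a heat map.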

Figure 7.

Visualization using the Grad-CAM method: the redder a region, the larger its value and the more attention it receives. The first row shows five original galaxy samples, one from each class; the second row shows the Grad-CAM visualization of EfficientNet-B1; the third row shows the Grad-CAM visualization of HIWL.


In general, the regions of focus are larger and more scattered when EfficientNet-B1 is used for galaxy image recognition, whereas they are smaller and more concentrated for HIWL. Neither model is affected by small bright spots (other galaxies) in the image; both still focus on the main regions of the object to be identified. This robustness benefits from the high noise immunity of the EfficientNet series of models. In particular, for CRS, HIWL focuses on the centre and the edge contour, while EfficientNet-B1 also focuses on a few redundant surrounding pixels. For IBS, both models focus on the aspect ratio and small surrounding regions, but the regions on which EfficientNet-B1 focuses are more scattered. For CSS, both models have similar focus regions; the difference is that those of HIWL are more concentrated. For EO, EfficientNet-B1 focuses on large, dispersed regions containing both tips of the galaxy as well as the surrounding areas at its sides, while HIWL focuses on the central width and its surroundings, which are relatively more concentrated. For SPI, EfficientNet-B1 focuses less on the spiral-arm regions and more on dispersed surrounding regions, while HIWL focuses on the spiral arms. Therefore, the focus of HIWL is well placed and compact, whereas the attention of EfficientNet-B1 is drawn to some redundant regions.

6 CONCLUSION

This work addresses several typical problems in deep learning-based methods for classifying galaxy images: the automatic learning of galaxy features; the masking effect caused by the differing degrees of similarity between classes; the class imbalance problem; and the machine learning problems caused by the discrepancy between the discrete representation of galaxy classes and the essentially gradual change of morphology (DDRGC). For the automatic learning of galaxy features, this paper explores the application of the EfficientNet model, which combines representational power with efficiency in extracting galaxy information. To deal with the masking and imbalance effects, we use hierarchical learning together with weighted sampling and online data augmentation. For the problems caused by the DDRGC, this paper explores label smoothing to reduce the discrepancy and its negative effects.
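Two of these ingredients are simple enough to illustrate directly. The sketch below shows the standard uniform form of label smoothing and an inverse-frequency sampling weight. Both are assumptions for illustration: the smoothing factor of 0.1 and the 1/N weighting are not taken from this paper, and the class counts are those of the full clean sample in Table 1 rather than of the training split.

```python
# Clean-sample counts per class, taken from Table 1.
CLASS_COUNTS = {"CRS": 8434, "IBS": 8069, "CSS": 578, "EO": 3903, "SPI": 7806}


def smooth_label(true_index, num_classes, epsilon=0.1):
    """Uniform label smoothing: (1 - epsilon) * one_hot + epsilon / num_classes.

    epsilon=0.1 is an assumed default, not a value from the paper.
    """
    uniform = epsilon / num_classes
    return [
        (1.0 - epsilon) + uniform if k == true_index else uniform
        for k in range(num_classes)
    ]


def inverse_frequency_weights(counts):
    """Per-sample draw weight 1 / N_c for each class c.

    With these weights the total weight of every class is equal, so a
    weighted sampler draws each class equally often in expectation.
    """
    return {name: 1.0 / n for name, n in counts.items()}


# Smoothed target for a sample of the third class in a five-class problem.
target = smooth_label(true_index=2, num_classes=5)

# A CSS sample is drawn far more often than a CRS sample.
weights = inverse_frequency_weights(CLASS_COUNTS)
oversampling_factor = weights["CSS"] / weights["CRS"]
```

With the Table 1 counts, each CSS image would be drawn 8434/578 times as often as each CRS image under this weighting, which is why online data augmentation is useful alongside it: the repeated minority-class images are not presented identically each time.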

The proposed scheme was evaluated on the Galaxy Zoo-The Galaxy Challenge data, where the task is to classify galaxy images into five classes: CRS, IBS, CSS, EO, and SPI. The proposed scheme achieves an overall classification accuracy of 96.32 per cent. Applying the idea of this method to other typical classification models also yields an improvement. In further comparisons with recent works on galaxy classification, the proposed HIWL shows advantages on three evaluation metrics (recall, precision, and F1-score). In particular, the method shows a significant improvement in the recognition of CSS, with a recall of 70.69 per cent and a precision of 78.55 per cent. To understand the recognition effectiveness and basis of the model, we also use t-SNE to visualize the model features and Grad-CAM to visualize the regions of interest of the model as a heat map.
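For reference, the F1-score is the harmonic mean of precision and recall. The short example below applies this definition to the CSS precision and recall quoted above; the resulting value is only what these two rounded figures imply and may differ slightly from a per-class F1-score computed directly from the confusion matrix or averaged over runs.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# CSS precision and recall as quoted in the text, expressed as fractions.
css_f1 = f1_score(precision=0.7855, recall=0.7069)
```

Here `css_f1` comes out at about 0.744, between the two inputs and closer to the lower one, as expected for a harmonic mean.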

The typical characteristic of HIWL is the improved recognition of images from minority classes of galaxies. Rare and special celestial bodies are of particular value in astronomical research. Therefore, a typical potential application of the method is astronomical data processing in the search for such rare and special celestial bodies. In addition, although this paper verifies the effectiveness of the method on galaxy images, the core techniques of HIWL are hierarchical classification, weighted sampling, and label smoothing. These techniques can be applied not only to the recognition of galaxy images, but also to a wider range of astronomical data-processing problems, such as other types of astronomical images, spectra, and time-series observational data.

In future work, we will combine label smoothing with morphological features in a more nuanced way. That is, the smoothing value can reflect the similarity between classes, so that classes with different degrees of similarity receive different and meaningful values in the label. In addition, we plan to explore the galaxy recognition problem on larger data sets and with more fine-grained class divisions.

ACKNOWLEDGEMENTS

This work was supported by the National Natural Science Foundation of China (Grant No. 11973022), the Natural Science Foundation of Guangdong Province (No. 2020A1515010710), and the Major Projects of the Joint Fund of Guangdong and the National Natural Science Foundation (Grant No. U1811464). Software: numpy (Harris et al. 2020), matplotlib (Hunter 2007), pytorch (Paszke et al. 2019), torchvision (Marcel & Rodriguez 2010), and tqdm (da Costa-Luis 2019).

DATA AVAILABILITY

The data set named Galaxy Zoo-The Galaxy Challenge is available at https://www.kaggle.com/competitions/galaxy-zoo-the-galaxy-challenge/data. The code for the whole experimental process on this data set, including clean sample selection, clean sample set partitioning, pre-processing, model training, and model comparison, is available at https://github.com/xrli/HIWL.

REFERENCES

Baillard A. et al., 2011, A&A, 532, A74, doi:10.1051/0004-6361/201016423

Buda M., Maki A., Mazurowski M. A., 2018, Neural Netw., 106, 249, doi:10.1016/j.neunet.2018.07.011

da Costa-Luis C. O., 2019, J. Open Sour. Softw., 4, 1277, doi:10.21105/joss.01277

Dieleman S., Willett K. W., Dambre J., 2015, MNRAS, 450, 1441, doi:10.1093/mnras/stv632

Dosovitskiy A. et al., 2021, in 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria. OpenReview.net, https://openreview.net/pdf?id=YicbFdNTTy (accessed December 2022)

Gashi S., Saeed A., Vicini A., Di Lascio E., Santini S., 2021, in Proc. 2021 International Conference on Multimodal Interaction. ACM, New York, p. 168

Goyal L. M., Arora M., Pandey T., Mittal M., 2020, Earth Sci. Inform., 13, 1427, doi:10.1007/s12145-020-00526-w

Gupta R., Srijith P., Desai S., 2022, Astron. Comput., 38, 100543, doi:10.1016/j.ascom.2021.100543

Harris C. R. et al., 2020, Nature, 585, 357, doi:10.1038/s41586-020-2649-2

He K., Zhang X., Ren S., Sun J., 2016, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE Computer Society, Los Alamitos, CA, p. 770

Hou J., Zeng H., Cai L., Zhu J., Chen J., Ma K.-K., 2019, Neurocomputing, 345, 15, doi:10.1016/j.neucom.2018.11.088

Hunter J. D., 2007, Comput. Sci. Eng., 9, 90, doi:10.1109/MCSE.2007.55

Islam M., Glocker B., 2021, in International Conference on Information Processing in Medical Imaging. Springer, Berlin, p. 677

Jiménez M., Torres M. T., John R., Triguero I., 2020, IEEE Access, 8, 47232, doi:10.1109/ACCESS.2020.2978804

José A. et al., 2020, A&A, 638, A134, doi:10.1051/0004-6361/202037697

Kalvankar S., Pandit H., Parwate P., 2020, preprint

Khalifa N. E., Taha M. H., Hassanien A. E., Selim I., 2018, in International Conference on Computing Sciences and Engineering (ICCSE). IEEE, Piscataway, p. 1

Kim J. P. et al., 2019, NeuroImage: Clin., 23, 101811, doi:10.1016/j.nicl.2019.101811

Krawczyk B., Galar M., Jeleń Ł., Herrera F., 2016, Appl. Soft Comput., 38, 714, doi:10.1016/j.asoc.2015.08.060

Li C., Zhang W., Lin J., 2019, Chin. Astron. Astrophys., 43, 539, doi:10.1016/j.chinastron.2019.11.005

Liang B., Wang P., Cao Y., 2022, preprint

Lin J. Y. Y., Liao S. M., Huang H. J., Kuo W. T., Ou O. H. M., 2021, preprint

Lintott C. et al., 2011, MNRAS, 410, 166, doi:10.1111/j.1365-2966.2010.17432.x

Lukic V., Brüggen M., 2016, Proc. Int. Astron. Union, 12, 217, doi:10.1017/S1743921316012771

Małek K. et al., 2013, A&A, 557, A16, doi:10.1051/0004-6361/201321447

Marcel S., Rodriguez Y., 2010, in Proc. 18th ACM International Conference on Multimedia. ACM, New York, p. 1485

Mohammed R., Rawashdeh J., Abdullah M., 2020, in 11th International Conference on Information and Communication Systems (ICICS). IEEE, Piscataway, p. 243

Müller R., Kornblith S., Hinton G. E., 2019, in Wallach H., Larochelle H., Beygelzimer A., d'Alché-Buc F., Fox E., Garnett R., eds, Advances in Neural Information Processing Systems, Vol. 32. Curran Associates, Inc, p. 4694, https://proceedings.neurips.cc/paper/2019/file/f1748d6b0fd9d439f71450117eba2725-Paper.pdf (accessed December 2022)

Paszke A. et al., 2019, in Wallach H., Larochelle H., Beygelzimer A., d'Alché-Buc F., Fox E., Garnett R., eds, Advances in Neural Information Processing Systems, Vol. 32. Curran Associates, Inc, p. 8026, https://proceedings.neurips.cc/paper/2019/file/bdbca288fee7f92f2bfa9f7012727740-Paper.pdf (accessed December 2022)

Raddick M. J., Bracey G., Gay P. L., Lintott C. J., Murray P., Schawinski K., Szalay A. S., Vandenberg J., 2010, Astron. Educ. Rev., 9, 010103, doi:10.3847/AER2009036

Reza M., 2021, Astron. Comput., 37, 100492, doi:10.1016/j.ascom.2021.100492

Scoville N. et al., 2007, ApJS, 172, 1, doi:10.1086/516585

Selvaraju R. R., Cogswell M., Das A., Vedantam R., Parikh D., Batra D., 2017, in Proc. IEEE International Conference on Computer Vision. IEEE Computer Society, Los Alamitos, CA, p. 618

Sevilla-Noarbe I., Etayo-Sotos P., 2015, Astron. Comput., 11, 64, doi:10.1016/j.ascom.2015.03.010

Silva M., Ventura T., 2019, in Anais da X Escola Regional de Informática de Mato Grosso. SBC, Porto Alegre, p. 31

Simonyan K., Zisserman A., 2015, in Bengio Y., LeCun Y., eds, 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA (accessed December 2022)

Song G. h., Jin X. g., Chen G. l., Nie Y., 2016, Front. Inf. Technol. Electron. Eng., 17, 897, doi:10.1631/FITEE.1500346

Sreejith S. et al., 2018, MNRAS, 474, 5232, doi:10.1093/mnras/stx2976

Szegedy C. et al., 2015, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE Computer Society, Los Alamitos, CA, p. 1

Szegedy C., Vanhoucke V., Ioffe S., Shlens J., Wojna Z., 2016, in Proc. IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, Los Alamitos, CA, p. 2818

Tan M., Le Q., 2019, in International Conference on Machine Learning. p. 6105, http://proceedings.mlr.press/v97/tan19a/tan19a.pdf (accessed December 2022)

Tyson J. A., 2002, in Tyson J. A., Wolff S., eds, Proc. SPIE Conf. Ser. Vol. 4836, Survey and Other Telescope Technologies and Discoveries. SPIE, Bellingham, p. 10

Willett K. W. et al., 2013, MNRAS, 435, 2835, doi:10.1093/mnras/stt1458

York D. G. et al., 2000, AJ, 120, 1579, doi:10.1086/301513

Zhang Z., He T., Zhang H., Zhang Z., Xie J., Li M., 2019, preprint

Zhang Z., Zou Z., Li N., Chen Y., 2022, Res. Astron. Astrophys., 22, 055002, doi:10.1088/1674-4527/ac5732

Zheng Y., Yang X., Dang X., 2020, preprint

Zhu X.-P., Dai J.-M., Bian C.-J., Chen Y., Chen S., Hu C., 2019, Ap&SS, 364, 55, doi:10.1007/s10509-019-3540-1

Zoph B., Le Q. V., 2017, in 5th International Conference on Learning Representations, ICLR 2017, Toulon, France. OpenReview.net, https://openreview.net/pdf?id=r1Ue8Hcxg (accessed December 2022)