Galaxy morphology classification using multiscale convolution capsule network

Li, Guangping; Xu, Tingting; Li, Liping; Gao, Xianjun; Liu, Zhijing; Cao, Jie; Yang, Mingcun; Zhou, Weihong

doi:10.1093/mnras/stad854

ABSTRACT

Classification of galaxy morphology is an active topic in astronomical research. Although significant progress has been made in the last decade in classifying galaxy morphology using deep learning, there are still deficiencies in spatial feature representation and classification accuracy. In this study, we present a multiscale convolutional capsule network (MSCCN) model for the classification of galaxy morphology. First, the model improves the convolutional layers using a multibranch structure to extract the multiscale hidden features of galaxy images. To further explore the hidden information in the features, the multiscale features are encapsulated and fed into the capsule layer. Second, we replace the softmax function in dynamic routing with a sigmoid function, which enhances the robustness of MSCCN. The classification model achieves 97 per cent accuracy, 96 per cent precision, 98 per cent recall, and 97 per cent F1-score under macro averaging. In addition, we carried out a more comprehensive model evaluation: we visualized the morphological features of part of the sample set using the t-distributed stochastic neighbour embedding (t-SNE) algorithm. The results show that the model has good generalization ability and robustness, and that it can be used effectively for galaxy morphological classification.

methods: data analysis, techniques: image processing, galaxies: general

1 INTRODUCTION

A galaxy is a celestial system composed of stars and interstellar medium, with a spatial scale of thousands to hundreds of thousands of light-years; its properties are related to environmental density, interaction history, gas accretion, and the dark matter halo (Ball, Loveday & Brunner 2008). Most galaxies have obvious geometric features. The main structures of a galaxy are spheroids and discs, with bars and spiral arms within the discs (Abraham, van den Bergh & Nair 2003). By following the evolution of galaxy structure with redshift, we can understand the formation and evolution of galaxies (Wang & Xu 2007). Classifying galaxies accurately and understanding their morphological structure is therefore of great importance for studying their physical properties.

Different classification criteria lead to different galaxy classification systems. In the early stage, limited by the observation equipment, acquisition technology, and other factors, only images of some bright and nearby galaxies could be obtained, and these were classified directly by eye; this is called the visual classification system. The typical example is the Hubble classification system (Hubble 1926). According to their morphology, galaxies can be divided into four basic types: spiral (|$\mathit {S}$|), barred spiral (|$\mathit {SB}$|), elliptical (|$\mathit {E}$|), and irregular (|$\mathit {Irr}$|). Since then, model-based and non-model classification systems have been proposed. The model-based classification system divides galaxy morphology mainly according to the different surface brightness profiles that correspond to different morphologies. The most common methods are exponential fitting (Sersic 1968) and bulge-disc fitting (Ostrander et al. 1998). The non-model classification system classifies galaxy morphology mainly according to structural parameters such as the clumpiness index (Conselice 2003), moment index (Lotz, Primack & Madau 2004), and concentration index (Bershady, Jangren & Conselice 2000).

Since the turn of the century, the Sloan Digital Sky Survey (SDSS) (Lupton et al. 2001), Large Synoptic Survey Telescope (LSST) (Ivezić et al. 2019), Cosmic Evolution Survey (COSMOS) (Scoville et al. 2007), LAMOST (Zhao et al. 2012), Space Infrared Telescope Facility (SIRTF) (Fanson & Fazio 1998), and JWST (Gardner et al. 2006) have been gradually implemented. Traditional data processing methods struggle to process the resulting galaxy data efficiently and in real time. The traditional galaxy morphology classification method distinguishes different classes of galaxy by manually extracting galaxy image features, including surface brightness (|$\mathit {I(r)}$|), Gini coefficient (|$\mathit {G}$|), semidiameter (|$\mathit {SD}$|), and concentration (Abraham et al. 2003; Sorrentino, Antonuccio-Delogu & Rifatto 2006). These methods depend on the comprehensiveness of the feature extraction; moreover, selecting and extracting the features is time-consuming and laborious.

In recent years, deep learning has been widely used in galaxy morphology classification because it requires no manually designed features, has strong learning ability, and can fully mine hidden features. Zhu et al. (2019) proposed an improved deep residual network model to classify galaxy morphology; this model improves the residual unit, reduces the network depth, and increases the network width. Gupta, Srijith & Desai (2022) introduced a continuous-depth version of the residual network for the Galaxy Zoo2 (GZ2) (Hart et al. 2016) galaxy data set, which trains neural ordinary differential equations (NODE) using the adjoint and adaptive checkpoint adjoint (ACA) numerical techniques. Compared with the residual network, this model uses about one-third fewer parameters and achieves better classification performance. Wang Lin-Qian & Luo (2022) built a galaxy classification network (GMC-net) to classify the photometric images of galaxies in the SDSS DR16, GZ2, and EFIGI catalogues. The model automatically extracts the features of galaxy images and classifies them according to their shapes, avoiding the difficulties of feature extraction, feature selection, and classifier selection. Mittal et al. (2020) designed a convolutional neural network (CNN) classifier for irregular galaxies; the model uses data augmentation techniques and different activation functions to classify the galaxies and obtains better results than earlier comparable models. Fielding, Nyirenda & Vaccari (2022) used a convolutional auto-encoder as a feature extractor and clustered the features by k-means, fuzzy c-means, and agglomerative clustering to classify galaxies; this approach classified the Galaxy Zoo data well. Zhang et al. (2022) addressed the large training sets and parameter counts required by supervised deep learning models by proposing a few-shot learning model based on Siamese networks to classify galaxies; the model is suitable not only for the taxonomy of galaxy morphology but also for identifying rare astrophysical objects. Although the sample data are limited, the model still achieves excellent classification results. Yao-Yu Lin et al. (2021) first applied the Vision Transformer to galaxy morphology classification and achieved results comparable to those of traditional CNNs; the method is particularly good at classifying smaller and fainter galaxies. Nishikawa-Toomey, Smith & Gal (2020) proposed a semisupervised deep learning method; this model uses a variational auto-encoder (VAE) with equivariant transformer layers as a classifier to classify galaxy morphology. Using fewer labelled data, the method achieved higher accuracy than existing approaches.

However, the neural networks previously used for galaxy morphology classification have some disadvantages, of which those of CNNs are representative. When a CNN is used for galaxy morphology classification, its pooling operations lose some critical feature information. In addition, CNNs are not isotropic: they retain only the magnitude of a feature and ignore the important information of direction and spatial position (Nishikawa-Toomey et al. 2020). These factors degrade galaxy classification performance. To solve these problems, Sabour, Frosst & Hinton (2017) proposed a new deep learning network called the capsule network (CapsNet). The core unit of CapsNet is the capsule, a group of neurons that stores and outputs feature information in the form of a vector. The length of the capsule vector represents the probability that the object exists, and its direction represents the object’s attributes (location, rotation, size, colour, etc.). CapsNet can therefore output more comprehensive feature information than CNNs, which use only a single neuron for target representation.

Although the capsule network has many advantages in image classification, there is still room for improvement: its feature extraction ability is insufficient, it performs poorly in some classification tasks, and its large number of parameters and computations hinders its wider adoption. To solve these problems, we propose a multiscale convolutional capsule network (MSCCN) for galaxy morphology classification.

The study is organized as follows: we discuss the traditional capsule network, and introduce the network structure of multiscale CNNs in Section 2. In Section 3, we introduce the data set and experiment equipment used in this study. Besides, we pre-process the data and select the best hyper-parameters for our model. In Section 4, we show the classification results of the model and compare our results with other similar works. In Section 5, we visualize the output of the DigitCaps layer of the model, and analyse the physical representation of the low-level evidence of galaxy data. Finally, we present our conclusion in Section 6.

2 METHODS

2.1 Capsule network

A complete CapsNet can be divided into an encoder and a decoder. The encoder contains the convolution layer, the primary capsules (PrimaryCaps) layer, and the digital capsules (DigitCaps) layer. The decoder is composed of fully connected layers. CapsNet builds a spatial hierarchy through the dynamic routing process and thereby extracts the spatial feature information of objects. Dynamic routing is a routing-by-agreement algorithm that predicts each capsule of the current layer by assigning a suitable weight to each capsule of the previous layer. Fig. 1 shows the structure of the traditional capsule network (CapsNet).

Figure 1.

Structure of traditional capsule networks.


2.1.1 Convolution layer

The convolution layer of CapsNet is used to extract the low-level features of the object. The convolution formula is defined as follows:

$$\begin{eqnarray} {y_{ij}} = \sum \limits _{u = 1}^m {\sum \limits _{v = 1}^n {{f_{uv}} \cdot {x_{i-u + 1,j-v + 1}}} }, \end{eqnarray}$$

(1)

where x_ij, 1 ≤ i ≤ M, 1 ≤ j ≤ N, is the galaxy image matrix, and f_uv, 1 ≤ u ≤ m, 1 ≤ v ≤ n, is a filter. In the convolution layer, the activation of neuron |$\mathit {i}$| in layer |$\mathit {l}$| is |$a_i^l = f(\sum \nolimits _{j = 1}^m {w_j^{(l)} \cdot a_{i-j + m}^{(l-1)} + {b^{(l)}}})$|, where |$\mathit {w_j^{(l)}}$| is the convolution kernel, |$b^{(l)}$| is the bias, and f is the activation function. All neurons in layer |$\mathit {l}$| share the same weights.
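As an illustration only, equation (1) can be evaluated directly with nested loops. The sketch below is not the authors' implementation; it assumes zero-based indexing and restricts the output to positions where the filter fits entirely inside the image (a 'valid' convolution). The function name `conv2d_valid` and the toy inputs are hypothetical.

```python
import numpy as np

def conv2d_valid(x, f):
    """Evaluate y_ij = sum_u sum_v f_uv * x_(i-u+1, j-v+1) where the filter fits."""
    M, N = x.shape
    m, n = f.shape
    y = np.zeros((M - m + 1, N - n + 1))
    # In zero-based indexing the term x_(i-u+1, j-v+1) becomes x[i - u, j - v].
    for i in range(m - 1, M):
        for j in range(n - 1, N):
            acc = 0.0
            for u in range(m):
                for v in range(n):
                    acc += f[u, v] * x[i - u, j - v]
            y[i - (m - 1), j - (n - 1)] = acc
    return y

# Toy 4 x 4 "image" and a 2 x 2 filter (both hypothetical).
x = np.arange(16, dtype=float).reshape(4, 4)
f = np.array([[1.0, 0.0], [0.0, -1.0]])
y = conv2d_valid(x, f)
```

With this filter each output equals x[i, j] - x[i - 1, j - 1], so every entry of `y` is the same constant for the ramp image used here.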

2.1.2 Primary capsule layer

The primary capsule layer encapsulates the primary features into vectors and normalizes the galaxy data. We use the squashing function for normalization, which narrows the range of the input data, reduces the amount of network computation, and strengthens the expressive and learning ability of the network. The expression of the primary capsule layer is

$$\begin{eqnarray} {u^{l(i,j)}} = {f_s}\left({\begin{array}{c} {f_a}\big(\sum \nolimits _i {a_i^1\big)} \\ \vdots \\ {{f_a}\big(\sum \nolimits _i {a_i^l}\big)} \end{array}} \right) , \end{eqnarray}$$

(2)

where u^l(i, j) represents the primary capsule, f_a represents the operation of the primary capsule layer on feature data, |${f_a}(\sum \nolimits _i {a_i^l})$| is the output of the convolution layer, and f_s is the non-linear squash function:

$$\begin{eqnarray} {f_s}\left({f_a}\left(\sum \limits _i {a_i^l}\right)\right) = \frac{\left\Vert {f_a}\left(\sum \nolimits _i {a_i^l}\right) \right\Vert ^2}{1 + \left\Vert {f_a}\left(\sum \nolimits _i {a_i^l}\right) \right\Vert ^2} \cdot \frac{{f_a}\left(\sum \nolimits _i {a_i^l}\right)}{\left\Vert {f_a}\left(\sum \nolimits _i {a_i^l}\right) \right\Vert }. \end{eqnarray}$$

(3)
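A minimal NumPy sketch of the squash non-linearity may help make its behaviour concrete. It uses the form with the squared norm in the denominator, as in equation (7) and in Sabour et al. (2017); the small constant `eps` is an assumption added here only to avoid division by zero.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Shrink each vector so its length lies in [0, 1) while keeping its direction."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    norm = np.sqrt(sq_norm + eps)  # eps is an assumption, not part of the formula
    return (sq_norm / (1.0 + sq_norm)) * (s / norm)

# One long and one short vector (hypothetical values).
v = squash(np.array([[3.0, 4.0], [0.03, 0.04]]))
lengths = np.linalg.norm(v, axis=-1)
```

Long vectors are mapped to lengths just below 1 and short vectors to lengths near 0, which is what allows the capsule length to be read as a probability.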

2.1.3 Digital capsule layer

The information transmitted between the primary capsule layer and the digital capsule layer consists of two main processes: linear transformation and dynamic routing update. In the linear transformation, an activation vector |${\hat{u}_{j|i}}$| can be obtained when a low-level capsule i transmits data to a high-level capsule j, and is calculated by the following equation:

$$\begin{eqnarray} \hat{u}_{j|i}=w_{ij} \cdot u_i , \end{eqnarray}$$

(4)

where u_i are the output vectors of the low-level capsules, and w_ij is a unique linear transformation matrix.

In the high-level capsule, the total input s_j is obtained by calculating a weighted summation of the |${\hat{u}_{j|i}}$| and c_ij, and the process is expressed as

$$\begin{eqnarray} {s_j} = \sum \limits _{i = 1}^N {{c_{ij}} \cdot {\hat{u}_{j|i}}}, \end{eqnarray}$$

(5)

where c_ij are the coupling coefficients, which are determined by iterative dynamic routing. In the traditional CapsNet, c_ij are calculated by the softmax function, and the calculation process is as follows:

$$\begin{eqnarray} {c_{ij}=\frac{\mathrm{ exp}(b_{ij})}{\sum \nolimits _k \mathrm{ exp}(b_{ik})}} \end{eqnarray}$$

(6)

The parameter b_ij is the log prior probability that capsule i is coupled to capsule j. It is initialized to zero and then increased by the agreement |${\hat{u}_{j|i}}\cdot {v_j}$| between the prediction vector and the output v_j of capsule j. In each iteration of dynamic routing, c_ij, s_j, and v_j are computed in turn and b_ij is then updated; Fig. 2 presents this update process. After the final iteration, an optimal set of coupling coefficients is obtained and dynamic routing is complete.

Figure 2.

Update process of dynamic routing.


Finally, the output of the high-level capsule j, denoted v_j, is obtained by passing s_j through the squash function:

$$\begin{eqnarray} {v_j} = \frac{{{{\Vert {{s_j}} \Vert }^2}}}{{1 + {{\Vert {{s_j}} \Vert }^2}}} \cdot \frac{{{s_j}}}{{\Vert {{s_j}} \Vert }}. \end{eqnarray}$$

(7)

2.1.4 Loss function

In this work, the total loss of the model is a combination of the margin loss and the reconstruction loss. In the digital capsule layer, margin losses are used to update the weights between digital capsules, which are calculated as follows:

$$\begin{eqnarray} {L_k} = {T_k}\max {(0,{m^ + }-\left\Vert {{v_k}} \right\Vert )^2} + \lambda (1-{T_k})\max {(0,\left\Vert {{v_k}} \right\Vert - {m^ - })^2}, \end{eqnarray}$$

(8)

where T_k is an indicator function of classification: T_k is 1 if class k is present and 0 otherwise. m⁺ = 0.9 and m⁻ = 0.1 are the margins for present and absent classes, which penalize false negatives (FN) and false positives (FP), respectively. λ defaults to 0.5.
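The margin loss can be sketched in a few lines of NumPy. This follows the form given by Sabour et al. (2017), in which both hinge terms are squared, with m⁺ = 0.9, m⁻ = 0.1, and λ = 0.5 as stated above. The function name and the example capsule lengths are hypothetical.

```python
import numpy as np

def margin_loss(v_norm, T, m_plus=0.9, m_minus=0.1, lam=0.5):
    """v_norm: capsule lengths ||v_k||, shape (batch, classes); T: one-hot labels."""
    present = T * np.maximum(0.0, m_plus - v_norm) ** 2
    absent = lam * (1.0 - T) * np.maximum(0.0, v_norm - m_minus) ** 2
    return np.sum(present + absent, axis=-1)  # summed over the classes

# One sample with five classes; the true class is class 0 (hypothetical values).
v_norm = np.array([[0.95, 0.05, 0.30, 0.10, 0.02]])
T = np.array([[1.0, 0.0, 0.0, 0.0, 0.0]])
loss = margin_loss(v_norm, T)
```

In this example only the third capsule contributes: its length of 0.30 exceeds m⁻ although its class is absent, so it is penalized by 0.5 × 0.2² = 0.02.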

The reconstruction loss represents the loss of the image reconstructed in the decoder; it is the Euclidean distance between the reconstructed vector and the input vector. In this work, we scale it by a factor of 0.0005 so that the reconstruction loss does not dominate the margin loss.

There are three fully connected layers in the final part of the network. These layers map the feature space learned in the previous layers to the sample label space, integrating the learned galaxy morphological features so that the galaxy image can be reconstructed.

2.2 Multiscale convolution capsule network

In the traditional CapsNet, the low-level feature extraction module has a simple structure that uses only a single convolution layer to extract the low-level features of the image. Its feature extraction is insufficient and its parameters are redundant. To further improve the performance of CapsNet, we first use a multibranch structure to improve the capsule generation process, which increases the multiscale feature extraction ability of the model. Secondly, when the coupling coefficients are computed by the ‘routing softmax’ function (Sabour et al. 2017), their values are concentrated in a small interval; we therefore replace the ‘routing softmax’ function in dynamic routing with a sigmoid function, which helps the network obtain a more uniform distribution of routing coefficients and strengthens the discernibility of the output vector for each class. On this basis, we constructed a multiscale convolution capsule network model for the classification of galaxy images. Fig. 3 shows the structure of the MSCCN model. The contributions of this work are as follows:

Instead of the convolution layer in CapsNet, we use a multiscale parallel convolution layer to extract the multiscale hidden features of galaxy images.
In dynamic routing, we use a sigmoid function to replace the ‘routing softmax’ function to strengthen the discernibility of the output vector for each class and enhance the robustness capability of the CapsNet.

Figure 3.

Structure of the MSCCN model. Inputs: 80 × 80 downsampled images of the galaxies. Sigmoid: the function used to replace the softmax function in dynamic routing. Conv1: a convolutional layer with a receptive field of 7 × 7 and 256 filters. Conv2: a convolutional layer with a receptive field of 5 × 5 and 256 filters. Conv3: a convolutional layer with a receptive field of 3 × 3 and 256 filters. Conv4: merging of the features of the two channels. Primary Caps: the features extracted by the convolutional layers are reshaped into 32 primary capsules with eight dimensions, where each dimension is a feature map of size 32 × 32. DigitCaps: there are five capsules, each of which represents one class. Decoder: three fully connected layers with 512, 1024, and 19 200 neurons, used to reconstruct the galaxy images.


2.2.1 Multiscale parallel convolution layer

Based on the advantages of multiscale features in CNNs, more researchers have tried to introduce multiscale ideas into CapsNet models. For example, Xiang et al. (2018) proposed a multiscale capsule network, which addressed the original model’s poor suitability for images with rich internal features. This model improves performance and converges more easily than the previous model. Jeong & Kim (2021) proposed a multiscale decomposed capsule network (MDCN) to address parameter redundancy during the training of capsule networks; the MDCN architecture synthesizes capsules with fewer parameters and achieves better performance and parameter reduction. We have therefore also introduced multiscale features in our work to improve the structure of the model.

In the improved model, we increase the diversity of features and reduce the loss of basic information by combining the advantages of the CNN and CapsNet models. The MSCCN model uses three convolution kernels of different scales to extract the low-level features of galaxy morphology and to mine multiscale and multilevel feature information. To extract the multiscale morphological features of galaxy images, 7 × 7, 5 × 5, and 3 × 3 convolution kernels are selected to construct a multiscale parallel convolution layer.

In the MSCCN model, the output of two channels is spliced to construct a complete galaxy morphological feature map. The capsule transforms these feature vectors into a merged galaxy feature as the input of the following digital capsule layer.

In the primary capsule layer of the MSCCN model, the feature maps from the convolution layers are merged and activated, and a new capsule unit is then synthesized from every eight channels. The number of capsules in the primary capsule layer is therefore one-eighth of the total number of activation features in the preceding layer, and the resulting capsule units are used as the input of the digital capsule layer.
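The one-eighth relation can be checked with a short reshape. The sketch below assumes a merged feature map of 32 × 32 × 256, matching the 32 × 32 maps and 32 eight-dimensional capsule types in the Fig. 3 caption. Grouping consecutive channels into one capsule is an assumption made for illustration, and `to_primary_capsules` is a hypothetical name.

```python
import numpy as np

def to_primary_capsules(feature_maps, caps_dim=8):
    """Group every `caps_dim` consecutive channels at a pixel into one capsule vector."""
    h, w, c = feature_maps.shape
    if c % caps_dim != 0:
        raise ValueError("channel count must be divisible by caps_dim")
    # Each spatial position yields c // caps_dim capsules of length caps_dim.
    return feature_maps.reshape(h * w * (c // caps_dim), caps_dim)

rng = np.random.default_rng(0)
features = rng.standard_normal((32, 32, 256))  # stand-in for the merged feature map
capsules = to_primary_capsules(features)
```

Here 32 × 32 × 256 = 262 144 activations become 32 768 capsules of eight dimensions, that is, one-eighth as many units.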

2.2.2 Sigmoid routing

In the DigitCaps layer, the capsule network transmits information from each capsule to the next layer in the form of its activation value, and the input to each capsule is a weighted summation of the activation vectors |$\hat{u}_{j|i}$|. The process is shown in equation (5), where c_ij are the coupling coefficients determined iteratively by dynamic routing.

In traditional models, c_ij are calculated by the softmax function, as in equation (6), but we found that the softmax function converts the logits of the coupling coefficients into a set of concentrated values, which may leave little difference between the coupling coefficients assigned to the true features and to the false features. This may result in a wrong summation of prediction vectors in the next capsule and affect the final classification performance. In our study, we use a sigmoid function instead of the softmax function in dynamic routing, so that c_ij is no longer the allocation probability of the capsule but the strength of the correlation between the two capsules. It is defined by the following equation:

$$\begin{eqnarray} {c_{ij}^{\prime }=\frac{1}{1+\mathrm{ exp}(-b_{ij})}} \end{eqnarray}$$

(9)

The sigmoid function is continuous and smooth; used as an activation function, it compresses the data into the interval (0, 1), which enhances the expressiveness of the network. In dynamic routing, the sigmoid function can assign large coupling coefficients to the real features and small coupling coefficients to the false features, which prevents wrong features from obtaining large weights and being passed to the next capsule layer. Compared with softmax, sigmoid reduces the agglomeration of the coupling coefficients and gives the capsules of the DigitCaps layer a more uniform distribution of coupling coefficients. Table 1 lists the update process of sigmoid dynamic routing.

Table 1.

Sigmoid routing algorithm.

1: Input to routing(|$\hat{u}_{j|i},r,l$|)
2: For all capsule i in layer l and capsule j in layer (l + 1): b_ij ← 0
3: For r iterations do:
4:   For all capsule i in layer l: |$c_{ij}^{\prime } \leftarrow \mathrm{sigmoid}(b_{ij})$|
5:   For all capsule j in layer (l + 1): |$s_{j} \leftarrow \sum _i c_{ij}^{\prime } \cdot \hat{u}_{j|i}$|
6:   For all capsule j in layer (l + 1): v_j ← squash(s_j)
7:   For all capsule i in layer l and capsule j in layer (l + 1): |$b_{ij} \leftarrow b_{ij}+\hat{u}_{j|i} \cdot v_{j}$|
Return v_j
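The steps listed in Table 1 can be sketched in NumPy as follows. This is an illustrative reading of the algorithm, not the authors' code. It assumes the standard sigmoid 1/(1 + exp(−b)), includes the softmax coupling of equation (6) as an option for comparison, and uses hypothetical sizes (6 low-level capsules, 5 classes, 16 dimensions).

```python
import numpy as np

def squash(s, eps=1e-9):
    sq = np.sum(s ** 2, axis=-1, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

def sigmoid(b):
    return 1.0 / (1.0 + np.exp(-b))

def softmax(b, axis):
    e = np.exp(b - b.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def routing(u_hat, r=3, coupling="sigmoid"):
    """u_hat: prediction vectors, shape (n_low, n_high, dim). Returns v, shape (n_high, dim)."""
    if r < 1:
        raise ValueError("at least one routing iteration is required")
    n_low, n_high, _ = u_hat.shape
    b = np.zeros((n_low, n_high))                      # step 2
    for _ in range(r):                                 # step 3
        if coupling == "sigmoid":
            c = sigmoid(b)                             # step 4 (equation 9)
        else:
            c = softmax(b, axis=1)                     # traditional routing (equation 6)
        s = np.einsum("ij,ijd->jd", c, u_hat)          # step 5 (equation 5)
        v = squash(s)                                  # step 6 (equation 7)
        b = b + np.einsum("ijd,jd->ij", u_hat, v)      # step 7: agreement update
    return v

rng = np.random.default_rng(0)
u_hat = 0.1 * rng.standard_normal((6, 5, 16))  # hypothetical prediction vectors
v = routing(u_hat, r=3)
v_soft = routing(u_hat, r=3, coupling="softmax")
```

With the sigmoid, each coefficient depends only on its own logit, so the coefficients leaving a low-level capsule need not sum to one, unlike the softmax case.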


3 DATA

3.1 Data preparation

Our sample set was selected from the Galaxy Challenge of Galaxy Zoo2 (GZ2) (Hart et al. 2016), whose data set is hosted on the Kaggle platform.¹ The Kaggle data were selected from SDSS DR7 and contain 61 578 labelled galaxy observations, each an image of 424 × 424 × 3 pixels. The label of each image is a 1 × 37 vector derived from the corrected cumulative vote fractions of the GZ2 volunteers. We selected five classes of galaxies from GZ2 for model training and testing: completely round, in-between smooth, cigar-shaped smooth, edge-on, and spiral (Zhu et al. 2019). Fig. 4 shows 20 randomly selected galaxy images from the GZ2 data set.

The GZ2 data set has corresponding classification thresholds. For a galaxy image to be assigned to a class, its corrected cumulative vote fractions must meet certain thresholds. To obtain enough galaxy samples, we modified the threshold criteria for smooth galaxies and left the criteria for the other galaxies unchanged. Table 2 lists the threshold selection criteria for the five classes of galaxy.

Table 2.

The criteria of clean sample selection. Here, T01–T11 are the 11 classification tasks in GZ2; f_smooth is the probability that a galaxy image is classified as a smooth galaxy; f_{features/disc} is the probability of being classified as having features or a disc; f_{spiral, yes} is the probability of being classified as a spiral; f_{edge-on, yes} and f_{edge-on, no} are the probabilities of being classified as edge-on and not edge-on; f_{completely round}, f_in-between, and f_cigar-shaped are the probabilities of being classified as completely round, in-between, and cigar-shaped (Willett et al. 2013).

Class	Clean sample	Task	Threshold	N_sample
0	Spiral	T01	f_{features/disc} ≥ 0.430	7806
		T02	f_{edge-on, no} ≥ 0.715
		T04	f_{spiral, yes} ≥ 0.619
1	Edge-on	T01	f_{features/disc} ≥ 0.430	3903
		T02	f_{edge-on, yes} ≥ 0.602
2	Cigar-shaped smooth	T01	f_smooth ≥ 0.469	578
		T07	f_cigar-shaped ≥ 0.50
3	Completely round	T01	f_smooth ≥ 0.469	8343
		T07	f_{completely round} ≥ 0.50
4	In-between smooth	T01	f_smooth ≥ 0.469	8069
		T07	f_in-between ≥ 0.50
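The thresholds in Table 2 amount to a small decision rule. The sketch below is one possible reading of it. The dictionary keys are hypothetical names for the vote fractions, and the order in which the rules are tested is an assumption, since the table does not say how a galaxy that satisfies more than one rule is handled.

```python
def clean_class(f):
    """Return the Table 2 class label for one galaxy, or None if no rule matches.

    `f` maps (hypothetical) vote-fraction names to values in [0, 1];
    missing fractions are treated as 0.
    """
    g = lambda key: f.get(key, 0.0)
    if g("features_or_disc") >= 0.430:
        if g("edge_on_no") >= 0.715 and g("spiral_yes") >= 0.619:
            return 0  # spiral
        if g("edge_on_yes") >= 0.602:
            return 1  # edge-on
    if g("smooth") >= 0.469:
        if g("cigar_shaped") >= 0.50:
            return 2  # cigar-shaped smooth
        if g("completely_round") >= 0.50:
            return 3  # completely round
        if g("in_between") >= 0.50:
            return 4  # in-between smooth
    return None

# Hypothetical vote fractions for a single galaxy.
galaxy = {"smooth": 0.80, "completely_round": 0.65, "in_between": 0.30,
          "cigar_shaped": 0.05, "features_or_disc": 0.15}
label = clean_class(galaxy)
```

Galaxies for which the function returns None fall outside the clean sample and would not be used for training or testing.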


We collected 28 790 galaxy images based on the threshold rules in Table 2. These comprise 7806 spiral, 578 cigar-shaped smooth, 3903 edge-on, 8069 in-between smooth, and 8434 completely round galaxies, which together form the data set used in our study.

3.2 Data pre-processing

When the training set is limited, data augmentation can improve the final recognition performance and generalization ability of a model. General data augmentation methods include rotation, translation, scaling, random flipping, and brightness adjustment. It should be noted that scaling, translation, and brightness changes had little effect on the model performance. Therefore, we augment the sample set through rotation and random flipping (horizontal and vertical). However, excessive data augmentation increases the computational cost of the model and, although it may improve overall performance, it can introduce imbalance between classes (Balestriero, Bottou & LeCun 2022). We therefore experimented with augmenting the training set by factors of 1 to 5 and found that the accuracy of the model no longer changed significantly once the augmentation factor reached 3. Table 3 summarizes the model accuracy and training time at the different augmentation multiples.

Table 3.

Model performance and training time under different data augmentation methods. ‘Multiples’ is the augmentation multiple of the training set. The highest accuracy, 0.9701, is obtained at a multiple of 2.

	Data augmentation methods
Multiples	Rotation angle (°)	Flipping probability	Accuracy	Time (s/epoch)
0	/	/	0.9505	16
1	90	/	0.9659	30
2	90	0.5	0.9701	46
3	90, 180	0.5	0.9698	65
4	90, 180, 270	0.5	0.9669	89
5	45, 90, 180, 270	0.5	0.9689	145
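The rotation and random flipping used for augmentation can be sketched with NumPy alone. This sketch handles only rotations by multiples of 90°, so the 45° rotation in the last row of Table 3 is outside its scope. Applying the horizontal and vertical flips independently, each with probability 0.5, is an assumption, and `augment` is a hypothetical helper.

```python
import numpy as np

def augment(image, angle, rng, flip_prob=0.5):
    """Rotate an (H, W, C) image by a multiple of 90 degrees, then flip at random."""
    if angle % 90 != 0:
        raise ValueError("this sketch only handles multiples of 90 degrees")
    out = np.rot90(image, k=(angle // 90) % 4, axes=(0, 1))
    if rng.random() < flip_prob:
        out = out[:, ::-1]  # horizontal flip
    if rng.random() < flip_prob:
        out = out[::-1, :]  # vertical flip
    return out

rng = np.random.default_rng(0)
image = rng.random((80, 80, 3))  # stand-in for one pre-processed galaxy image
aug = augment(image, 90, rng)
```

Because these operations only permute pixels, the augmented image keeps the same shape and pixel values as the original.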


Secondly, we centre-crop the galaxy images, which retains the complete information of the galaxy while reducing the noise and dimension of the data. We crop the images from 424 × 424 × 3 pixels to 212 × 212 × 3 pixels and then downsample them to 80 × 80 × 3 pixels. After pre-processing, we divide the data into a test set and a training set in the ratio 1:9.
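The pre-processing steps can be sketched as follows. The text does not state which interpolation is used for downsampling or how the split is randomized, so nearest-neighbour sampling and a seeded random permutation are assumptions made here, and all function names are hypothetical.

```python
import numpy as np

def centre_crop(image, size):
    """Keep the central size x size region of an (H, W, C) image."""
    h, w = image.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return image[top:top + size, left:left + size]

def downsample_nearest(image, size):
    """Nearest-neighbour downsampling to size x size (interpolation is an assumption)."""
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return image[rows[:, None], cols[None, :]]

def split_indices(n, test_fraction=0.1, seed=0):
    """Shuffle n sample indices and split them 9:1 into training and test sets."""
    order = np.random.default_rng(seed).permutation(n)
    n_test = int(round(n * test_fraction))
    return order[n_test:], order[:n_test]

image = np.zeros((424, 424, 3), dtype=np.uint8)  # placeholder for one GZ2 image
small = downsample_nearest(centre_crop(image, 212), 80)
train_idx, test_idx = split_indices(28790)
```

For the 28 790 images reported above, a 1:9 split gives 2879 test images and 25 911 training images.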

4 RESULT AND DISCUSSION

4.1 Computer set-up

The experiments were run on a server with an RTX A5000 GPU (24 GB), a 14-core Intel(R) Xeon(R) Gold 6330 CPU, and the Windows 10 operating system. The software environment consists of PyCharm Professional 2021.1, CUDA 11.2, Python, TensorFlow, Pandas, and Scikit-learn.

4.2 Selection of hyper-parameters

In deep learning, the selection of hyper-parameters plays a vital role in the performance of the model. We carried out a series of experiments on the convolution kernel size, the number of dynamic routing iterations, and the batch-size to select the best value of each hyper-parameter.

The convolution layer of the capsule network is mainly used to extract the low-level features of the galaxy images, and the size of the convolution kernel affects the performance of feature extraction. We designed three experiments based on the principle that large convolution kernels enlarge the receptive field, whereas small convolution kernels extract more detailed features. The tested configurations are a combination of 7 × 7, 5 × 5, and 3 × 3 kernels, a combination of 9 × 9, 6 × 6, and 3 × 3 kernels, and a single convolution layer with a 9 × 9 kernel. The other parameters of the network remain unchanged and the model is trained in batches. The results are shown in Table 4.

Table 4.

The classification accuracy of MSCCN under different hyper-parameters.

Hyper-parameter	Value	Accuracy
Conv kernel	7, 5, 3	$\boldsymbol{0.9673}$
	9, 6, 3	0.9396
	9	0.9293
Routing	3	0.9273
	6	0.9144
	12	$\boldsymbol{0.9664}$
Batch-size	32	$\boldsymbol{0.9626}$
	64	0.9466
	128	0.9473

The weight updating in CapsNet is based on iterative dynamic routing, and the number of routing iterations plays an essential role in the stability and classification performance of the model. To explore its influence on the performance of CapsNet, we tested 3, 6, and 12 routing iterations while keeping the other parameters of the network unchanged. The classification results are shown in Table 4.
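
To make the role of the routing iteration count concrete, the following is a minimal NumPy sketch of routing-by-agreement in the style of Sabour et al. (2017), with the number of iterations as a parameter. The `mode="sigmoid"` branch is only an assumption about how a sigmoid could stand in for the softmax when computing coupling coefficients; it is not taken from the authors' implementation, and the function names and toy shapes are hypothetical.

```python
import numpy as np

def squash(s, eps=1e-9):
    """Shrink each vector so that its length lies in [0, 1)."""
    norm_sq = np.sum(s * s, axis=-1, keepdims=True)
    return (norm_sq / (1.0 + norm_sq)) * s / np.sqrt(norm_sq + eps)

def route(u_hat, n_iter=3, mode="softmax"):
    """Route prediction vectors u_hat of shape (n_in, n_out, dim) to output capsules."""
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))  # routing logits
    for _ in range(n_iter):
        if mode == "softmax":
            # Coupling coefficients of each input capsule sum to 1 over the outputs.
            e = np.exp(b - b.max(axis=1, keepdims=True))
            c = e / e.sum(axis=1, keepdims=True)
        else:
            # Assumed sigmoid variant: each coefficient is set independently.
            c = 1.0 / (1.0 + np.exp(-b))
        s = np.einsum("ij,ijd->jd", c, u_hat)      # weighted sum per output capsule
        v = squash(s)
        b = b + np.einsum("ijd,jd->ij", u_hat, v)  # agreement update
    return v

rng = np.random.default_rng(0)
u_hat = rng.normal(size=(32, 5, 16))  # 32 input capsules, 5 classes, 16-D outputs
v3 = route(u_hat, n_iter=3, mode="sigmoid")
v12 = route(u_hat, n_iter=12, mode="softmax")
```

More iterations sharpen the coupling coefficients but add computation to every forward pass, which is one reason the iteration count is treated as a hyper-parameter.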

Deep learning models are commonly optimized with the mini-batch stochastic gradient descent algorithm, whose update rule is as follows:

$$\begin{eqnarray} {w_{t + 1}} = {w_t}-\eta \frac{1}{n}\sum \limits _{x \in \beta } {\nabla l(x,{w_t})} , \end{eqnarray}$$

(10)

where η is the learning rate, n is the batch-size, β is the mini-batch of samples, w_t is the weight vector at step t, and ∇l(x, w_t) is the gradient of the loss l with respect to the weights. In equation (10), the learning rate and the batch-size directly determine the optimization performance of the model and are the most critical parameters for its convergence. In our work, the learning rate is 0.001 and decays over time. Several experiments were conducted to select the best batch-size: we tested batch-sizes of 32, 64, and 128 while keeping the other parameters unchanged. The classification results are shown in Table 4.
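
As a worked illustration of equation (10), the sketch below applies the update to a toy linear least-squares problem. The squared-error loss, the synthetic data, and the learning rate of 0.05 are assumptions chosen so that the example converges quickly; they are not the settings used for MSCCN.

```python
import numpy as np

def sgd_step(w, batch_x, batch_y, lr):
    """One update of equation (10) for the loss l = (x . w - y)^2 / 2."""
    residual = batch_x @ w - batch_y            # shape (n,)
    grad = batch_x.T @ residual / len(batch_y)  # mean gradient over the mini-batch
    return w - lr * grad

rng = np.random.default_rng(0)
x = rng.normal(size=(256, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = x @ true_w                                  # noiseless targets

w = np.zeros(3)
for epoch in range(200):
    order = rng.permutation(len(x))             # reshuffle every epoch
    for start in range(0, len(x), 32):          # batch-size n = 32
        idx = order[start:start + 32]
        w = sgd_step(w, x[idx], y[idx], lr=0.05)
```

A smaller batch-size gives noisier gradient estimates but more updates per epoch, which is why the two quantities in equation (10) are tuned together.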

4.3 Regression model based on MSCCN

In our work, we also designed a regression model based on CapsNet, which predicts the vote fractions for the list of 37 questions in the GZ2 decision tree. In this regression model, we did not use data augmentation and only cropped the images with a central window, as in our classification model. The final input to the regression model is an image of size 224 × 224 × 3 together with the vote fractions for the 37 questions in the GZ2 decision tree.

The root mean square error (RMSE) measures the deviation between the predicted values and the actual values. The RMSE of our regression model is 0.08192; compared with the public leader-board of the Kaggle Galaxy Zoo challenge, this RMSE would place ninth. The results show that the error between the predicted and actual values is small, so the model predicts the vote fractions accurately.
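
For reference, the RMSE over a set of predicted vote fractions can be computed as in the sketch below. The arrays are made-up toy values with three columns instead of 37, and averaging over all entries of the array is an assumption about how the score is aggregated.

```python
import numpy as np

def rmse(predicted, target):
    """Root mean square error over all entries of two equally shaped arrays."""
    predicted = np.asarray(predicted, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean((predicted - target) ** 2)))

# Toy vote fractions: two galaxies, three answers each.
target = np.array([[0.8, 0.2, 0.0], [0.1, 0.6, 0.3]])
predicted = np.array([[0.7, 0.2, 0.1], [0.1, 0.5, 0.4]])
score = rmse(predicted, target)
```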

4.4 Classification results of the MSCCN model

The MSCCN model achieves its best results when the convolution kernel sizes are 7 × 7, 5 × 5, and 3 × 3, the batch-size is 32, and the number of dynamic routing iterations is 3. We set the initial learning rate to 0.001 and the learning rate decay factor to 0.9. As the iterations increase, the learning rate decays by a factor of 0.9, which helps to avoid overfitting of the classification model. There are five classes of galaxy in the data set, so we set the number of digital capsules to five.
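
The learning-rate schedule described above can be written as a simple exponential decay. The decay interval is not stated in the text, so the sketch assumes one decay per epoch through the hypothetical `decay_steps` parameter.

```python
def decayed_learning_rate(step, initial_lr=0.001, decay_rate=0.9, decay_steps=1):
    """Exponential decay: multiply the rate by decay_rate every decay_steps steps."""
    return initial_lr * decay_rate ** (step / decay_steps)

# Learning rate at the start of the first four epochs (assumed per-epoch decay).
schedule = [decayed_learning_rate(epoch) for epoch in range(4)]
```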

The final classification accuracy of the trained model for galaxy morphology is 0.9701. Among the five classes, edge-on galaxies obtain the highest classification accuracy, 0.9925, followed by completely round smooth galaxies at 0.9751. The classification accuracies of spiral galaxies and in-between smooth galaxies are 0.9725 and 0.9617, respectively, and that of cigar-shaped smooth galaxies is 0.9298. Our analysis shows that the original data contain fewer cigar-shaped smooth galaxies, so the MSCCN model cannot fully learn their characteristics and its classification performance for this class is lower than for the other four. Fig. 5 shows the training loss and classification accuracy curves of the MSCCN model. The model converges after 70 epochs and has 7.65 million parameters. Training one epoch takes 46 s, so a fully trained model takes about 53 min.

Figure 4.

Galaxy morphology images in the GZ2 data set.

Figure 5.

The accuracy and loss curves of the different models. CapsNet is the traditional model, MS_CapsNet is the CapsNet model with a multiscale convolution layer, and SR_CapsNet is the CapsNet model with sigmoid routing.

We also calculated the confusion matrix of the MSCCN model on the test set. In Fig. 6, each row of the matrix represents the true label of a galaxy category, while each column represents the predicted label. The confusion matrix shows that three edge-on galaxies were wrongly predicted as cigar-shaped smooth galaxies and one cigar-shaped smooth galaxy was wrongly classified as a spiral galaxy, because these classes are very similar in morphology and structure. When the characteristics of two categories are similar or identical, the performance of the model is degraded. In addition, six completely round smooth galaxies were misclassified as in-between smooth galaxies, and 45 in-between smooth galaxies were predicted to be completely round smooth galaxies. Both classes are smooth, and the thresholds used to select their clean samples are very close, which leads to some deviations in the classification results.
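
A confusion matrix with the same layout as Fig. 6 (rows are true labels, columns are predicted labels) can be built as in the sketch below. The label lists are toy values for three classes, not the paper's data.

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Count samples by (true class, predicted class); rows are true labels."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true_labels, predicted_labels):
        matrix[t, p] += 1
    return matrix

true_labels = [0, 0, 1, 1, 2, 2, 2]
predicted_labels = [0, 1, 1, 1, 2, 2, 0]
cm = confusion_matrix(true_labels, predicted_labels, n_classes=3)

# Fraction of each true class that was predicted correctly.
per_class_recall = cm.diagonal() / cm.sum(axis=1)
```

Off-diagonal entries identify which pairs of classes are confused, which is how counts such as the 45 in-between smooth galaxies predicted as completely round are read off the matrix.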

Figure 6.

The confusion matrix of the MSCCN model.

The receiver operating characteristic (ROC) curve and the area under the curve (AUC) reduce the interference caused by different test sets in model evaluation, so they measure the performance of a model more objectively. To verify the generalization ability of the model, we calculated the ROC curve and the AUC value for each class of galaxy. In Fig. 7, each colour represents one galaxy class; the horizontal axis is the false positive rate (FPR) and the vertical axis is the true positive rate (TPR). For an ideal model, the TPR is close to 1 and the FPR is close to 0. In Fig. 7, the TPR of all five galaxy classes is close to 1 and the FPR is close to 0, indicating that the model achieves good predictions for each class. The AUC values of the other four classes are all above 0.99, and the AUC value of the in-between smooth galaxies is above 0.98, which indicates that the model is robust and that the imbalance of the data samples has little effect on its overall performance.
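
For a single class treated one-vs-rest, the ROC curve and its AUC can be computed as sketched below. The scores are toy values, ties between scores are not handled, and the one-vs-rest treatment is an assumption about how the per-class curves in Fig. 7 were obtained.

```python
import numpy as np

def roc_curve_points(scores, is_positive):
    """FPR and TPR after thresholding at each score, from highest to lowest."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    hits = np.asarray(is_positive, dtype=bool)[order]
    tpr = np.concatenate([[0.0], np.cumsum(hits) / hits.sum()])
    fpr = np.concatenate([[0.0], np.cumsum(~hits) / (~hits).sum()])
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2.0))

# Toy scores for one class: 1 marks samples that truly belong to the class.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
is_positive = [1, 1, 0, 1, 0, 0]
fpr, tpr = roc_curve_points(scores, is_positive)
area = auc(fpr, tpr)
```

In this toy case 8 of the 9 positive-negative pairs are ranked correctly, so the area is 8/9; an AUC above 0.99 means almost every such pair is ordered correctly.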

Figure 7.

ROC curves and AUC values of the MSCCN model.

4.5 Results comparison with other similar works

Dieleman, Willett & Dambre (2015) first applied deep learning to galaxy morphology classification in the Galaxy Challenge. They proposed a convolutional neural network tailored to the properties of galaxy images. The model exploits the translational and rotational symmetry of the images and autonomously learns abstract, multilevel feature representations, so it can represent image categories with morphological information and classify galaxies accurately and quickly. Huertas-Company et al. (2015) used the Dieleman model to classify high-redshift galaxy images from the CANDELS survey into five categories and achieved excellent classification results. Zhu et al. (2019), building on ResNet V2 and the characteristics of galaxy images, proposed an improved deep residual network for galaxy morphology classification, namely ResNet-26. This model improves the residual unit, reduces the depth and increases the width of the network, and automatically extracts galaxy morphological features for identification and classification. Gupta et al. (2022) introduced a continuous-depth version of the residual network (ResNet), called neural ordinary differential equations (NODE), for galaxy morphology classification. They trained NODE with different numerical techniques, such as the adjoint and adaptive checkpoint adjoint (ACA) methods, and compared it with ResNet; the results show that the accuracy of NODE is comparable to that of ResNet while using about one-third as many parameters.

In this work, we compared the MSCCN model with the Dieleman, ResNet-26, and NODE models in the same experimental environment to verify the validity and superiority of the MSCCN model for galaxy morphology classification. The Dieleman model, the first to apply a CNN to astronomical image classification, consists of four convolution layers and three fully connected layers. The ResNet-26 model is a 26-layer residual network. We analysed the results of the four models using accuracy, precision, recall, and F1-score as the evaluation indexes. Table 5 shows the results of the four models.

Table 5.

Comparison of the classification evaluation indexes of the four models.

Model	Dieleman (Dieleman et al. 2015)	NODE (Gupta et al. 2022)	ResNet-26 (Zhu et al. 2019)	MSCCN (this work)
Accuracy (per cent)	93.88	91.65	94.68	$\boldsymbol{97.01}$
Precision (per cent)	94.55	91.55	95.12	$\boldsymbol{95.97}$
Recall (per cent)	94.86	93.59	95.21	$\boldsymbol{98.16}$
F1 (per cent)	94.56	92.60	95.15	$\boldsymbol{96.39}$

The Dieleman, NODE, and ResNet-26 models are all deep learning models designed for galaxy morphology classification. Table 5 shows that they perform well on galaxy images: their accuracies are 0.9388, 0.9165, and 0.9468, and their precisions are 0.9455, 0.9155, and 0.9512, respectively. The accuracy and precision of the MSCCN model are 0.9701 and 0.9597, which are better than those of the other methods. We also report the recall: the Dieleman, NODE, and ResNet-26 models reach 0.9486, 0.9359, and 0.9521, whereas the MSCCN model reaches 0.9816. These results show that, even though cigar-shaped smooth galaxy samples are few and the data set is unbalanced, the MSCCN model still performs better than the three comparison models. The F1-score of the MSCCN model, 0.9639, is also higher than those of the Dieleman (0.9456), NODE (0.9260), and ResNet-26 (0.9515) models.
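
The four indexes in Table 5 can be derived from a confusion matrix laid out as in Fig. 6 (rows are true labels, columns are predicted labels). The sketch below uses macro averaging, in which each class contributes equally regardless of its size; taking the macro F1 as the mean of the per-class F1-scores is an assumption about the convention used, and the matrix is a toy example.

```python
import numpy as np

def macro_scores(cm):
    """Accuracy plus macro-averaged precision, recall, and F1 from a confusion matrix."""
    cm = np.asarray(cm, dtype=float)
    tp = cm.diagonal()
    precision = tp / cm.sum(axis=0)  # per class; assumes every class is predicted at least once
    recall = tp / cm.sum(axis=1)     # per class; assumes every class has at least one sample
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": tp.sum() / cm.sum(),
        "precision": precision.mean(),
        "recall": recall.mean(),
        "f1": f1.mean(),
    }

cm = [[8, 2, 0], [1, 9, 0], [0, 1, 9]]  # toy 3-class confusion matrix
scores = macro_scores(cm)
```

Because every class carries the same weight, a small class such as the cigar-shaped smooth galaxies influences the macro scores as much as a large one, which makes these indexes informative for unbalanced data sets.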

Kalvankar, Pandit & Parwate (2020) proposed a fine-tuned architecture based on EfficientNetB5 to classify galaxies into seven classes. They added irregular galaxies to the five classes and subdivided spiral galaxies into barred and unbarred spiral galaxies according to whether they have a bar structure. The fine-tuned architecture achieved a classification accuracy of 0.9370. Yao-Yu Lin et al. (2021) used a Vision Transformer to classify smaller and fainter galaxies. Building on the work of Zhu et al. (2019) and Kalvankar et al. (2020), they separated irregular galaxies from merger galaxies according to whether a merger is present, and thus classified galaxies into eight classes; the best overall classification accuracy of that work is 0.8055. To evaluate our model more objectively, we selected eight classes of galaxies following Yao-Yu Lin et al. (2021) and seven classes following Kalvankar et al. (2020). The MSCCN model reaches an accuracy of 0.9159 on the eight classes and 0.9427 on the seven classes. These results show that the MSCCN model still performs well on galaxy morphology classification tasks with more classes.

4.6 Analysis and discussion

The MSCCN model shows excellent results in accuracy, precision, loss, confusion matrix, ROC curve, and other model evaluation indices, which indicates that the model performs well on galaxy images. The accuracy for cigar-shaped smooth galaxies is 0.9298 (0.7236 without data augmentation), which is lower than for the other four classes. This is because the number of original cigar-shaped smooth galaxy samples is too small, and when the model classifies galaxies, the classification boundary tends to encroach on the region of minority classes. However, the analysis of the ROC curves and AUC values shows that the data imbalance has a limited impact on the overall performance of the model, and its overall generalization ability remains strong. In the data set, completely round smooth galaxies and in-between smooth galaxies have no obvious interclass boundary and differ little in classification threshold, which increases the difficulty of classification. The classification accuracies of the model for completely round smooth galaxies and in-between smooth galaxies are 0.9706 and 0.9810, respectively. The results of this work show that the multiscale convolution layer of the MSCCN model can extract the multiscale primary features of galaxies, accurately classify each class of galaxy, and mitigate the influence of ambiguous class boundaries on the performance of the model.

5 VISUALIZATION ANALYSIS OF MSCCN MODEL

To explore how the morphological features of galaxies are represented, we randomly selected 1000 samples from the test set and visualized the output of the DigitCaps layer of the MSCCN model. The visualization is based on the t-SNE algorithm, a non-linear dimensionality reduction and visualization method (van der Maaten & Hinton 2008). It retains the local structure of the sample data and yields low-dimensional data that closely resemble the original high-dimensional data (Dai & Tong 2018). The t-SNE algorithm converts the similarity between data points into probabilities: similarities in the original high-dimensional space are represented by a Gaussian distribution, and similarities in the embedding space are represented by a Student's t-distribution, so that data in the high-dimensional space can be mapped to a low-dimensional space and visualized. Fig. 8 shows the feature visualization of the MSCCN model.
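
For reference, the two similarity measures mentioned above take the following standard form in van der Maaten & Hinton (2008). The symbols are introduced here for illustration and are not defined elsewhere in this paper: x_i are the high-dimensional features (here, DigitCaps outputs), y_i are their low-dimensional embeddings, N is the number of samples, and σ_i is a per-point bandwidth set by the perplexity.

$$\begin{eqnarray*} p_{j|i} = \frac{\exp \left(-\Vert x_i - x_j\Vert ^2 / 2\sigma _i^2\right)}{\sum \limits _{k \ne i} \exp \left(-\Vert x_i - x_k\Vert ^2 / 2\sigma _i^2\right)}, \qquad p_{ij} = \frac{p_{j|i} + p_{i|j}}{2N}, \end{eqnarray*}$$

$$\begin{eqnarray*} q_{ij} = \frac{\left(1 + \Vert y_i - y_j\Vert ^2\right)^{-1}}{\sum \limits _{k \ne l} \left(1 + \Vert y_k - y_l\Vert ^2\right)^{-1}}, \qquad C = \sum \limits _{i \ne j} p_{ij} \log \frac{p_{ij}}{q_{ij}}. \end{eqnarray*}$$

The embedding is found by minimizing the Kullback-Leibler divergence C with respect to the y_i. The heavy tails of the Student's t-distribution let moderately distant points sit far apart in the map, which is what produces the visibly separated clusters.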

Figure 8.

Feature visualization of the DigitCaps layer of the MSCCN model.

In Fig. 8, because galaxies of the same morphology have similar underlying structures, each class of galaxies is distributed in a cluster. Completely round smooth galaxies and in-between smooth galaxies tend to merge. Since both are smooth galaxies, there is no definite classification boundary between them, and they are similar in shape, which results in misclassification. In Fig. 8, cigar-shaped smooth galaxies and spiral galaxies are interwoven, and many samples between them are misclassified. By analysing the results and comparing the images, we found that the geometric shapes of cigar-shaped smooth galaxies and lateral spiral galaxies are very similar. When the original samples were labelled, the two classes were easily mislabelled, yet such samples can be correctly recognized by the model. At the same time, the similar shapes of the two classes can also lead to wrong classifications. This finding contributes to the understanding of the physical properties of galaxy morphology.

Fig. 8 also shows outliers in the DigitCaps layer of the MSCCN model. The red outliers (Id: 1–3589, 1–3755) are spiral galaxies but appear in the cluster of in-between smooth galaxies, shown in orange. The analysis suggests that these two galaxy images have structures similar to those of in-between smooth galaxies and are therefore easily misclassified. The model misclassified the blue outlier (Id: 3–2476) as a spiral galaxy. On inspection, this is a poor-quality image: the galaxy occupies only a small part of the image, and its class is not easy to distinguish. In addition, the outlier (Id: 2–3793) was found to be a spiral galaxy that was wrongly labelled as an in-between smooth galaxy; the model predicted it to be a lateral galaxy.

Furthermore, for comparison, we visualized the output of the last average pooling layer of a CNN. For this purpose, we constructed a convolutional neural network to classify galaxy morphology based on the model of Dai & Tong (2018). We again selected 1000 samples to visualize the CNN features, and Fig. 9 shows the feature visualization of the last average pooling layer of the CNN. The MSCCN model has better separability than the CNN, with each class of galaxy separated from the others.

Figure 9.

Feature visualization of the last average pooling layer of the CNN (Accuracy = 0.92).

In summary, we visually analysed the output of the DigitCaps layer. In Fig. 8, each class of galaxy is clustered and separated from the others, which shows that the MSCCN model has better separability and an excellent classification effect. In the future, we will try to explore the low-level structure of the sample data and the physical meaning of the high-dimensional feature representation through visualization analysis, which will help to discover patterns in the data quickly, explain the classification results of the model, and provide more helpful feedback for galaxy classification systems.

6 CONCLUSION

This study presents a method for galaxy morphology classification, namely the multiscale convolution capsule network. In this network, we replaced the convolution layer of the traditional capsule network with a multiscale convolution layer built from parallel convolution branches, and reconstructed the primary capsule layer. Additionally, we used a disperse dynamic routing algorithm to obtain a more uniform distribution of coupling coefficients; it assigns larger coupling coefficients to the true features and smaller coupling coefficients to the wrong features, which strengthens the discernibility of the output vector for each class and improves the robustness of the CapsNet. The model can fully capture multiscale galaxy features, further extract the hidden information in galaxy images, and reduce parameter redundancy. At the same time, it addresses the inability of traditional deep learning models to extract the spatial information of galaxies and the associated loss of feature information. The results show that the multiscale convolution capsule network achieves better classification performance on galaxy morphology than the comparative models selected in this study, and it can be applied to galaxy morphology classification.

ACKNOWLEDGEMENTS

We thank the anonymous referee for valuable and helpful comments and suggestions. This work is supported by the National Natural Science Foundation of China (61561053) and the Scientific Research Foundation Project of Yunnan Education Department (2023J0624). This work is also supported by the Astronomical Big Data Joint Research Center, co-founded by National Astronomical Observatories, Chinese Academy of Sciences and Alibaba Cloud.

DATA AVAILABILITY

All data used in this work are publicly available. Details on how to access the data can be found at https://www.kaggle.com/competitions/galaxy-zoo-the-galaxy-challenge/overview.

All code produced in this work is available upon reasonable request to the authors.

Footnotes

1

https://www.kaggle.com/competitions/galaxy-zoo-the-galaxy-challenge/overview

References

Abraham R. G., van den Bergh S., Nair P., 2003, ApJ, 588, 218, doi:10.1086/373919

Balestriero R., Bottou L., LeCun Y., 2022, Advances in Neural Information Processing Systems, preprint (arXiv), doi:10.48550/arXiv.2204.03632

Ball N. M., Loveday J., Brunner R. J., 2008, MNRAS, 383, 907, doi:10.1111/j.1365-2966.2007.12627.x

Bershady M. A., Jangren A., Conselice C. J., 2000, AJ, 119, 26

Conselice C. J., 2003, ApJS, 147, 1, doi:10.1086/375001

Dai J.-M., Tong J., 2018, preprint (arXiv), doi:10.48550/arXiv.1807.05657

Dieleman S., Willett K. W., Dambre J., 2015, MNRAS, 450, 1441, doi:10.1093/mnras/stv632

Fanson J. L., Fazio, 1998, in Bely P. Y., Breckinridge J. B., eds, Proc. SPIE Conf. Ser. Vol. 3356, Space Telescopes and Instruments V. SPIE, Bellingham, p. 478

Fielding E., Nyirenda C. N., Vaccari M., 2022, International Conference on Electrical, Computer and Energy Technologies (ICECET), Prague, Czech Republic, p. 1

Gardner J. P. et al., 2006, Space Sci. Rev., 123, 485, doi:10.1007/s11214-006-8315-7

Gupta R., Srijith P. K., Desai S., 2022, Astron. Comput., 38, 100543, doi:10.1016/j.ascom.2021.100543

Hart R. E. et al., 2016, MNRAS, 461, 3663, doi:10.1093/mnras/stw1588

Hubble E. P., 1926, ApJ, 64, 321, doi:10.1086/143018

Huertas-Company M. et al., 2015, ApJS, 221, 8, doi:10.1088/0067-0049/221/1/8

Ivezić Ž. et al., 2019, ApJ, 873, 111, doi:10.3847/1538-4357/ab042c

Jeong M., Kim C., 2021, IEEE International Conference on Image Processing (ICIP). Anchorage, AK, USA, p. 739

Kalvankar S., Pandit H., Parwate P., 2020, preprint (arXiv), doi:10.48550/arXiv.2008.13611

Lotz J. M., Primack J., Madau P., 2004, AJ, 128, 163, doi:10.1086/421849

Lupton R., Gunn J. E., Ivezic Z., Knapp G. R., Kent S., Yasuda N., 2001, in Harnden F. R. Jr., Primini F. A., Payne H. E., eds, ASP Conf. Ser. Vol. 238, Active Galaxies. Astron. Soc. Pac., San Francisco, p. 269

Mittal A., Soorya A., Nagrath P., Hemanth D. J., 2020, Earth Sci. Inf., 13, 601

Nishikawa-Toomey M., Smith L., Gal Y., 2020, preprint (arXiv), doi:10.48550/arXiv.2011.08714

Ostrander E. J., Nichol R. C., Ratnatunga K. U., Griffiths R. E., 1998, AJ, 116, 2644, doi:10.1086/300627

Sabour S., Frosst N., Hinton G. E., 2017, Proceedings of the 31st International Conference on Neural Information Processing Systems. Curran Associates Inc., Long Beach, California, USA, p. 6000, doi:10.48550/arXiv.1710.09829

Scoville N. et al., 2007, ApJS, 172, 1, doi:10.1086/516585

Sersic J. L., 1968, Atlas de Galaxias Australes. Observatorio Astronomico, Cordoba, Argentina

Sorrentino G., Antonuccio-Delogu V., Rifatto A., 2006, A&A, 460, 673, doi:10.1051/0004-6361:20065789

van der Maaten L., Hinton G., 2008, J. Mach. Learn. Res., 9, 2579

Wang M., Xu X., 2007, Prog. Astron., 25, 215

Wang Lin-Qian B. Q., Luo A.-L., 2022, Astron. Res. Technol., 19, 359

Willett K. W. et al., 2013, MNRAS, 435, 2835, doi:10.1093/mnras/stt1458

Xiang C., Zhang L., Tang Y., Zou W., Xu C., 2018, IEEE Signal Process. Lett., 25, 1850, doi:10.1109/LSP.2018.2873892

Yao-Yu Lin J., Liao S.-M., Huang H.-J., Kuo W.-T., Hsuan-Min Ou O., 2021, preprint (arXiv), doi:10.48550/arXiv.2110.01024

Zhang Z., Zou Z., Li N., Chen Y., 2022, Res. Astron. Astrophys., 22, 055002, doi:10.1088/1674-4527/ac5732

Zhao G., Zhao Y.-H., Chu Y.-Q., Jing Y.-P., Deng L.-C., 2012, Res. Astron. Astrophys., 12, 723, doi:10.1088/1674-4527/12/7/002

Zhu X.-P., Dai J.-M., Bian C.-J., Chen Y., Chen S., Hu C., 2019, Ap&SS, 364, 55, doi:10.1007/s10509-019-3540-1

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://dbpia.nl.go.kr/journals/pages/open_access/funder_policies/chorus/standard_publication_model)
