Abstract

We present a machine-learning (ML) approach for classifying the kinematic profiles of elliptical galaxies in the Mapping Nearby Galaxies at Apache Point Observatory (MaNGA) survey. Previous studies employing ML to classify spectral data of galaxies have provided valuable insights into morphological galaxy classification. This study aims to enhance the understanding of galaxy kinematics by leveraging ML. The kinematics of 2624 MaNGA elliptical galaxies are investigated using integral field spectroscopy by classifying their one-dimensional velocity dispersion (VD) profiles. We utilized a total of 1266 MaNGA VD profiles (Data Release 15) and employed a combination of unsupervised and supervised learning techniques. The unsupervised K-means algorithm classifies VD profiles into four categories: flat, decline, ascend, and irregular. A bagged decision trees classifier (TreeBagger)-supervised ensemble is trained using visual tags, achieving 100 per cent accuracy on the training set and 88 per cent accuracy on the test set. Our analysis finds that the majority (68 per cent) of MaNGA elliptical galaxies present flat VD profiles, a result that warrants further investigation of its implications for the dark matter problem.

1 INTRODUCTION

The kinematic profiles of galaxies have unveiled critical challenges in astronomy, such as the ‘missing mass’ problem. For instance, the observation of flat rotation curves in spiral galaxies (Rubin, Ford & Thonnard 1980) implies the presence of a dynamical mass that exceeds the luminous matter anticipated from the Keplerian decline in Newtonian dynamics. This discrepancy is commonly referred to as the dark matter problem: unseen mass is invoked to account for what the observed luminous components cannot supply. However, elliptical galaxies in some instances exhibit declining velocity dispersion (VD) profiles, suggesting a deficiency in dark matter (Milgrom & Sanders 2003; Romanowsky et al. 2003; Tian & Ko 2016). This discovery raises the question of whether the observed dark matter deficiency is a systematic trait, given that elliptical galaxies are commonly thought to have formed through mergers of spiral galaxies (Romanowsky et al. 2003). Therefore, systematic investigations of kinematic profiles can provide valuable insights into the phenomena of dark matter and galaxy formation.

Various types of one-dimensional (1D) VD profiles have been observed in elliptical galaxies. In addition to declining profiles (Durazo et al. 2018), both flat and ascending profiles have been reported in the literature (Veale et al. 2018; Tian et al. 2021). Moreover, the majority of brightest cluster galaxies (BCGs), a distinct type of elliptical galaxy situated at the centres of galaxy clusters, exhibit flat VD profiles (Tian et al. 2021). Identifying BCGs can be a complex task, often requiring a comprehensive survey. As a result, classifying VD profiles could serve as a valuable tool for studying BCGs, even though the challenge lies in the complexity of collecting these VD profiles.

Efficient classification of kinematic profiles can be achieved using two primary techniques. First, spatially resolved galactic kinematics requires integral field spectroscopy (IFS, Bacon et al. 2001; Cappellari & Copin 2003; Bundy et al. 2015; García-Benito et al. 2015), an advanced observational method that combines imaging and spectroscopy. By employing an integral field unit (IFU) with a spectrograph, IFS generates a data cube containing spatial and spectral information, facilitating the study of various properties. IFS has significantly impacted the investigation of astronomical phenomena, offering valuable insights into the spatially resolved kinematics, dynamics, and chemical composition of galaxies, star-forming regions, and nebulae. Second, automatic identification can be achieved through machine learning (ML), a subfield of artificial intelligence that develops algorithms and statistical models, enabling computers to learn from and make predictions or decisions based on data.

ML techniques have been utilized in classifying spectral data (Teimoorinia 2012; Rahmani, Teimoorinia & Barmby 2018; Sarmiento, Huertas-Company & Knapen 2021a; Vavilova et al. 2021; Chang et al. 2022) and deriving various physical parameters of galaxies (Masters et al. 2015; Krakowski et al. 2016; D’Isanto & Polsterer 2018; Bonjean et al. 2019; Davidzon et al. 2019; Hemmati et al. 2019; Chang et al. 2021; Sarmiento et al. 2021b; Marini et al. 2022; Teimoorinia et al. 2022). Furthermore, several studies have presented the principal methods for performing automatic morphological galaxy classification (Ball & Brunner 2010; Way et al. 2012; Ivezić et al. 2014; VanderPlas 2016; El Bouchefry & de Souza 2020; Fluke & Jacobs 2020; Vavilova et al. 2020, 2021).

In a recent study, the Mapping Nearby Galaxies at Apache Point Observatory (MaNGA) survey was utilized as a source for ML classification (Sarmiento et al. 2021a). The Simple Contrastive Learning of Visual Representations framework (Chen et al. 2020a) was employed to explore the multidimensional kinematic and stellar population maps of galaxies. Their studies demonstrated that the most relevant classification strongly correlates with VD, dividing the sample into slow- and fast-rotating galaxies. Moreover, another study presented a novel self-supervised ML method to visualize the multidimensional information on stellar population and kinematics in the MaNGA survey in a 2D plane (Sarmiento et al. 2021b). The results confirmed that the distinction between high- and low-VD systems reveals different assembly histories, rather than merely reflecting a stellar mass bimodality, and also showed that a supervised classification based on contrastive-learning representations achieved better accuracy than a purely convolutional neural network (CNN)-based supervised classification on the ImageNet data base (Chen et al. 2020b).

Various post-processing parameters have been utilized to enhance our understanding of the physical mechanisms in addition to using morphological galaxy features. For example, Teimoorinia et al. (2022) introduced an unsupervised method for organizing the spectra of the MaNGA survey (Data Release 15; DR15) using a Deep Embedded Self-Organising Map. They condensed the entirety of the MaNGA-observed universe into a 15 × 15 grid of spectra, which served as fingerprints to detect the presence of distinct stellar populations within galaxies. Their findings confirmed that galaxies with similar fingerprints have similar morphologies and inclination angles. Furthermore, Chang et al. (2022) proposed a method to classify galaxy merger stages using ML techniques on MaNGA DR15, incorporating projected separation, line-of-sight velocity difference, Sloan Digital Sky Survey (SDSS) gri images, and MaNGA Hα velocity map as inputs. Their results demonstrated that the physical characteristics of interacting galaxies can be more easily classified using tree-structured classifiers.

ML techniques have also been applied to identify BCG and IntraCluster Light (ICL). Marini et al. (2022) explored the use of supervised Random Forest to classify stars in simulated galaxy clusters after subtracting the member galaxies. They interpreted the dynamically different components as the individual properties of the stars in the BCG and ICL. The ML algorithm was trained, cross-validated, and tested on 29 clusters from a set of cosmological hydrodynamical simulations called DIANOGA. The results showed that this method is reliable and faster than the traditional method of identifying ICL and BCG in the main halo of simulated galaxy clusters. However, the data were simulated rather than observed.

1D profiles have advantages in ML classification owing to their reduced sensitivity to dust obscuration and other effects that can impact visual classification. Additionally, 1D profiles are easily comparable across different surveys and telescopes, allowing for more consistent classification. Furthermore, these profiles provide detailed kinematic information that can aid in understanding the underlying physical processes of galaxies. Thus, this study opted to classify the kinematics of elliptical galaxies based on 1D MaNGA VD profiles, with a view to future studies classifying BCGs in observed data.

The classification of VD profiles of elliptical galaxies has the potential to contribute to preliminary studies aimed at identifying deficiencies in dark matter (DM) and establishing correlations with BCG properties. As previously noted, systematically declining VD profiles could present a challenge to the DM model. Even though comprehensive dynamical studies are required to quantify this concern, such classifications can assist in preliminary sample selection. Furthermore, Tian et al. (2021) reported mostly flat VD profiles for BCGs that were independently identified by Hsu et al. (2022). It would be intriguing to explore the association between flat VD profiles and BCGs within a large IFS sample. The amalgamation of these statistical properties could potentially establish an alternative methodology for investigating BCGs. However, a detailed exploration of this proposition is beyond the scope of this study and calls for future research endeavours.

The structure of this paper is as follows. In Section 2, we introduce VD profiles of MaNGA elliptical galaxies and the ML-based method to classify the VD profile. In Section 3, we present both unsupervised and supervised ML-based classification. In Section 4, we discuss and summarize our findings and implications.

2 DATA AND METHODS

The study of galaxies is a complex and challenging field within astrophysics, necessitating sophisticated tools and techniques to unravel the intricate structures and kinematics of these massive objects. MaNGA is a groundbreaking initiative aimed at elucidating the detailed composition and kinematic structure of thousands of nearby galaxies using IFU spectroscopy. ML-based techniques provide an efficient way to automatically classify or identify galactic characteristics when dealing with enormous samples.

In this paper, we present our method for classifying the VD profiles of elliptical galaxies in MaNGA using a combination of supervised and unsupervised algorithms. Our approach leverages the power of artificial intelligence to analyse the vast amounts of spectroscopic data generated by the MaNGA survey and identify meaningful patterns in the VD profiles of these galaxies. By combining K-means and TreeBagger algorithms, we aim to achieve high accuracy and efficiency in classifying the VD profiles of MaNGA elliptical galaxies, which will help us to better understand the underlying physical processes governing galaxy formation and evolution.

2.1 Data

The MaNGA project is designed to map the composition and kinematic structure of nearby galaxies using the SDSS (Smee et al. 2013; Albareti et al. 2017; Blanton et al. 2017). By utilizing IFS data, MaNGA can measure spectra for hundreds of pixels within each galaxy. MaNGA provides derived data products such as maps of emission line fluxes, gas and stellar kinematics, and stellar population properties. MaNGA samples (Data Release 17; DR17) contain observations and data products for over 10 000 galaxies. In this study, two separate data sets from the SDSS are utilized. The first data set, from DR15, consists of 1266 elliptical galaxies and serves as the training set for the ML model. The second data set, from DR17, contains 2624 elliptical galaxies and provides a larger, independent set of galaxies for comparison and validation of the model’s performance. Later in the process, the training and testing approach is further refined by using a subset of 400 galaxies from the DR15 data set, with half used for training and the other half for testing.

Elliptical galaxies in MaNGA can be identified through a comprehensive morphological classification, utilizing a CNN-based analysis. Domínguez Sánchez et al. (2022) developed the MaNGA Deep Learning Morphological Value Added Catalogue to provide morphological classifications for galaxies, as described in Domínguez Sánchez et al. (2018), presented as part of SDSS DR17 for the final MaNGA galaxy sample. The morphological classification scheme offers a T-Type value ranging from −4 to 9, obtained by training a CNN on the T-Types from the Nair & Abraham (2010) catalogue. The catalogue also includes two primary classifications: (1) PLTG, which separates early-type galaxies from late-type galaxies, and (2) PS0, which distinguishes pure ellipticals from S0s. Elliptical galaxies can be identified using the criteria T-Type < 0 and PS0 < 0.5, resulting in a total of 2632 samples (Domínguez Sánchez et al. 2022). When combined with the IFS data from MaNGA, complete VD profiles were obtained for 2624 elliptical galaxies.
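As a brief illustration of this selection step, the sketch below applies the T-Type < 0 and PS0 < 0.5 criteria to a catalogue table; the file and column names (manga_morphology_dr17.csv, TType, PS0) are placeholders rather than the actual names used in the value added catalogue.

```matlab
% Sketch: select elliptical galaxies from a morphology catalogue.
% File and column names are illustrative placeholders.
cat = readtable('manga_morphology_dr17.csv');   % hypothetical catalogue export
isElliptical = cat.TType < 0 & cat.PS0 < 0.5;   % T-Type < 0 and P_S0 < 0.5
ellipticals  = cat(isElliptical, :);
fprintf('Selected %d elliptical galaxies\n', height(ellipticals));
```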

We classified the 1D VD profiles of 2624 MaNGA elliptical galaxies by calculating the mean line-of-sight stellar VD within circular rings centred on each galaxy (Durazo et al. 2018; Tian et al. 2021). This analysis was performed using the Marvin Python package (Cherinka et al. 2019). To obtain complete IFS data for each sample, we restricted spaxel points to within one effective radius Re, the half-light radius. We excluded any data with 1D VD values below 20 km s−1 and those lacking associated uncertainties. To improve the clarity of the stacked profiles, we normalized all VD profiles by their mean VD (σ¯) and effective radius (Re).
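A minimal sketch of this profile construction, assuming the stellar VD map and its uncertainty have already been retrieved (e.g. with Marvin) as 2D arrays vdMap and vdErr, with the galaxy centre (x0, y0) and effective radius RePix given in spaxel units; all variable names are illustrative.

```matlab
% Sketch: build a normalized 1D VD profile from a 2D stellar VD map.
% Assumed inputs (illustrative): vdMap, vdErr (km/s), x0, y0, RePix (spaxels).
nBins = 21;                                     % radial grid within 1 Re
[nx, ny] = size(vdMap);
[X, Y] = meshgrid(1:ny, 1:nx);
r = sqrt((X - x0).^2 + (Y - y0).^2) / RePix;    % radius in units of Re

% Keep spaxels within 1 Re, with VD >= 20 km/s and a finite uncertainty.
valid = isfinite(vdMap) & vdMap >= 20 & isfinite(vdErr) & vdErr > 0 & r <= 1;

edges   = linspace(0, 1, nBins + 1);
profile = nan(1, nBins);
for i = 1:nBins
    inRing = valid & r >= edges(i) & r < edges(i + 1);
    if any(inRing(:))
        profile(i) = mean(vdMap(inRing));       % mean line-of-sight VD in the ring
    end
end

profile = profile / mean(profile, 'omitnan');   % normalize by the mean VD (sigma-bar)
rGrid   = 0.5 * (edges(1:end - 1) + edges(2:end));  % bin centres in R/Re
```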

2.2 Methods

The primary objective was to develop a classifier for VD profiles using both supervised and unsupervised ML techniques. The methodology employed to train the model is depicted in Fig. 1. Initially, the MaNGA VD profiles (DR15) were pre-processed to remove zero elements and retain those with large error bars. After pre-processing, K-means unsupervised classification was applied to the DR15 VD profiles with k values ranging from 2 to 5, and the results were evaluated. Based on this evaluation, k = 4 was selected for supervised classification. Following this, a subset of 400 DR15 VD profiles was randomly selected and divided into training and testing sets, each comprising 50 per cent of the tagged profiles. Specifically, only 200 visually tagged profiles were used to train the bagged decision tree ensemble (TreeBagger) classifier. During the training process, hyperparameters were continuously adjusted, and the accuracy of the supervised classifier was evaluated. Once the accuracy on the training set reached 100 per cent, the model was assessed on the test set to determine its accuracy and confusion matrix. Finally, the classifier with the highest accuracy was employed to predict the classification of the DR17 VD profiles, and all profiles were labelled.

Figure 1. Flowchart outlining the process of data analysis. The MaNGA data are split into two sets: DR15 and DR17. For the DR15 data, unsupervised ML (K-means) and supervised ML (TreeBagger) are used for classification. The DR17 data are directly classified.

Recognizing the importance of capturing the overall trend in each VD profile, we normalized the pre-processed VD profiles along the R/Re axis and interpolated them onto a grid of 21 points, denoted by pn. To ensure smoothness in the VD profile, we included the standard deviation σ and the selected points p3, p6, p9, p12, p15, p18, and p21 as inputs. Specifically, the estimation of predictor importance is achieved through the permutation of out-of-bag (OOB) predictor observations, a method intrinsic to the TreeBagger algorithm. For K-means clustering we employed the cosine distance metric, with each centroid being the mean of the points in that cluster after normalization to unit Euclidean length. We ran the K-means algorithm five times with new initial cluster centroids each time (setting ‘Replicates’ to five) and selected the final solution with the lowest within-cluster sum of point-to-centroid distances. The maximum number of iterations for the K-means algorithm to converge (MaxIter) was set to 10. Since K-means clustering is an unsupervised method, we evaluated the results for k values ranging from 2 to 5 to determine the optimal number of categories.
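A minimal sketch of this clustering step, assuming the pre-processed profiles are held in an N-by-8 matrix called features (the seven sampled points plus σ; the variable name is illustrative), with the kmeans options mirroring the settings quoted above.

```matlab
% Sketch: K-means clustering of the scaled VD profiles with a cosine metric.
% 'features' (illustrative name) is an N-by-8 matrix of [p3 ... p21, sigma].
opts = statset('MaxIter', 10);                 % MaxIter = 10
for k = 2:5                                    % evaluate k = 2 to 5
    [idx, C, sumd] = kmeans(features, k, ...
        'Distance', 'cosine', ...              % cosine distance metric
        'Replicates', 5, ...                   % 5 runs with new initial centroids
        'Options', opts);
    fprintf('k = %d: total within-cluster distance = %.3f\n', k, sum(sumd));
end
```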

Upon determining the appropriate k value, we proceeded to train a supervised classifier. Among various ML algorithms, we employed tree-structured classifiers (TreeBagger, Chang et al. 2022) for clustering the physical properties of galaxies. TreeBagger is a random forest ensemble technique that avoids feature loss by not reprocessing effective features (Zhang et al. 2019). The hyperparameters used for TreeBagger training are listed in Table 1. We set the minimum number of observations in a leaf node (‘MinLeafSize’) to 6 after examining values in the range [5, 20]. The number of decision trees in the bagged ensemble (‘NumTrees’) was set to 65 based on OOB error analysis, which can reduce variance when training a model on noisy data sets and save memory and computation time (Janitza & Hornung 2018). Additionally, we utilized OOB error analysis to calculate and save the importance of each predictor. The specifications for the ‘Prior’ and ‘Predictor Selection’ hyperparameters are provided in Table 1.

Table 1. The hyperparameters used to set up the training for the supervised classification using TreeBagger.

Hyperparameter | Input | Description
NumTrees | 65 | Number of decision trees in the bagged ensemble.
MinLeafSize | 6 | Minimum number of leaf node observations.
Prior | Uniform | Prior probability for each class is set to be equal.
Predictor Selection | Curvature | Select the best-split predictor at each node by minimizing the p-value of chi-square tests of independence between each predictor and the response.

A unique training method was employed to enhance the efficiency of the training process, especially given the limited data. Instead of relying on a large data set, the approach utilized a smaller set of 200 visually tagged VD profiles from DR15 as the training set, despite the uneven distribution across categories. To enhance the ML classifier’s ability to recognize VD profiles, a replication strategy was implemented. Categories with fewer VD profiles were replicated more frequently, leading to a total of 470 samples for training inputs. To increase the sample size, we duplicated the 470 samples three times, resulting in a total of 1410 VD profile samples, including repeated items. Importantly, overfitting can constrain accuracy enhancement, so the number of duplications must be carefully controlled. This approach enabled maximum training effectiveness with minimal manpower.
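The oversampling and training steps can be sketched as follows, assuming the 200 tagged DR15 profiles are stored as a feature matrix Xtrain and a categorical label vector Ytrain (the names and the replication rule are illustrative); the TreeBagger options follow Table 1.

```matlab
% Sketch: replicate under-represented classes, then train the bagged ensemble.
% Xtrain: M-by-8 feature matrix; Ytrain: categorical tags
% (Decline, Flat, Ascend, Irregular). The replication rule is illustrative.
classes = categories(Ytrain);
Xbal = []; Ybal = [];
for c = 1:numel(classes)
    sel  = Ytrain == classes{c};
    nRep = max(1, round(max(countcats(Ytrain)) / sum(sel)));  % oversample small classes
    Xbal = [Xbal; repmat(Xtrain(sel, :), nRep, 1)];            %#ok<AGROW>
    Ybal = [Ybal; repmat(Ytrain(sel), nRep, 1)];               %#ok<AGROW>
end

model = TreeBagger(65, Xbal, Ybal, ...         % NumTrees = 65 (Table 1)
    'Method', 'classification', ...
    'MinLeafSize', 6, ...                      % minimum observations per leaf
    'Prior', 'Uniform', ...                    % equal class priors
    'PredictorSelection', 'curvature', ...     % curvature test for splits
    'OOBPrediction', 'on', ...
    'OOBPredictorImportance', 'on');
```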

To assess the performance of the TreeBagger classifier, we employ a confusion matrix that compares the model’s predicted class labels with the true class labels. The matrix is arranged such that the rows represent the true class labels and the columns represent the predicted class labels. Each cell in the matrix corresponds to the number of instances belonging to a particular true class and classified as a specific predicted class. The matrix’s diagonal displays the number of instances correctly classified, while the off-diagonal elements indicate incorrect classifications. The model’s accuracy is determined by summing the values along the diagonal of the confusion matrix and dividing by the sum of all values in the matrix. This offers a measure of the model’s capacity to accurately predict the classes of instances relative to the total number of predictions made.
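In code, this evaluation is straightforward; the sketch below assumes the held-out features and visual tags are stored in Xtest and Ytest (illustrative names) and reuses the model trained above.

```matlab
% Sketch: confusion matrix and overall accuracy on the test set.
predicted = categorical(predict(model, Xtest));  % TreeBagger predict returns labels as a cell array
cm = confusionmat(Ytest, predicted);             % rows: true classes, columns: predicted classes
accuracy = sum(diag(cm)) / sum(cm(:));           % diagonal (correct) over all predictions
fprintf('Test-set accuracy: %.1f per cent\n', 100 * accuracy);
```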

3 RESULTS

Our results are presented in two subsections: K-means unsupervised clustering and TreeBagger-supervised classification. The first subsection discusses the K-means unsupervised classification of the scaled MaNGA VD profiles (DR15). We display the classified types for various k values and the percentage of each type within the total sample of elliptical galaxies in DR15. In the second subsection, we present the supervised classification of the scaled MaNGA VD profiles (DR15 and DR17) using TreeBagger. We compare the visually classified types with the prediction results for the train and test sets and provide a comprehensive analysis of the bagged decision trees training. Generally, the K-means unsupervised algorithm is used to provide an initial understanding of approximately how many categories or clusters exist within the data set, whereas the TreeBagger-supervised training is employed to achieve high accuracy in classification. Finally, the classification accuracy for each type of VD profile is examined for both the K-means and TreeBagger methods.

3.1 K-means unsupervised clustering

The K-means unsupervised classification method reveals distinct categories in the scaled MaNGA VD profiles (DR15), with the Flat and Decline categories being the most prevalent when initially grouped, and additional categories emerging as the value of k increases. In Fig. 2, the K-means unsupervised classification of the scaled MaNGA VD profiles (DR15) is presented. In the background of each panel, multiple grey lines display the profiles of each category, while the median profile of each category is depicted by a black line with circle markers. To enhance the efficiency of the classification and minimize noise, each profile is first normalized and then divided into 21 equidistant points, from which seven evenly spaced points are selected to effectively remove the majority of the noise. When the VD profiles are initially grouped into two categories (k = 2), the Flat and Decline categories are found to be the most prevalent. However, as the value of k is increased, the Irregular and Ascend categories begin to separate and become more distinct. At k = 5, a slightly declining category also emerges, although its trend is similar to that of the Decline category, resembling a branch of it. It is worth noting that in Fig. 2, the black median profile of the Irregular category appears flat due to the significant deviation introduced by the irregular features. This suggests that individual VD profiles within the Irregular category display a wide range of values and do not follow a consistent trend. As a result, when computing the median value for each point in the profiles to create the black median profile, the resulting line is flat, devoid of any distinct peaks or troughs. Consequently, the subsequent classification with k = 4 identifies four distinct categories: (1) Decline, (2) Flat, (3) Ascend, and (4) Irregular. When using K-means to cluster these four categories, the Flat category accounts for approximately 55.8 per cent of the MaNGA DR15 VD profiles, while the Decline category accounts for approximately 33.7 per cent. This intriguing discovery warrants further investigation into the potential correlation between the Flat category and BCGs. It is crucial to remember that the K-means algorithm, being an unsupervised technique, can only offer approximate categorizations. Thus, employing a supervised training classifier to validate the classification accuracy based on visual tags for VD profiles is recommended.

Figure 2. The K-means unsupervised classification of the scaled MaNGA DR15 VD profiles. The classified types for different k values (2, 3, 4, and 5) are displayed in rows from top to bottom, labelled as Flat, Decline, Irregular, Ascend, and Slightly Decline according to the trend of each VD profile. The grey profiles represent all the profiles in each category, while the black profile represents the median value of each point in all profiles aligned within the category. The percentage of each type to the total sample of elliptical galaxies in DR15 is indicated by the number tagged in each panel.

3.2 TreeBagger-supervised classification

The supervised TreeBagger classifier, which utilized 200 visually tagged samples of the scaled MaNGA DR15 VD profiles for training and 200 additional samples for evaluation, is illustrated in Fig. 3. In the top row of Fig. 3, the 400 randomly selected and visually tagged samples were classified, with Decline samples accounting for 45.5 per cent and emerging as the dominant category. The middle and bottom rows display the predicted train and test sets, respectively, revealing a clear separation between Flat, Decline, and Irregular samples. To analyse the accuracy, OOB error, and predictor importance of the TreeBagger model, a comprehensive analysis is presented in Fig. 4.

Figure 3. The TreeBagger-supervised classification of the scaled MaNGA DR15 VD profiles. The top row shows the types that were classified visually; the middle row shows the prediction of the train set by the best-forest classifier; and the bottom row shows the prediction of the test set by the best-forest classifier. The four columns from left to right are the classified types labelled as Flat, Decline, Ascend, and Irregular. The grey profiles represent all the profiles in each category, while the black profile represents the median value of each point in all profiles aligned within the category.

Figure 4. A comprehensive analysis of the TreeBagger training. (a) The OOB error while applying different numbers of grown trees. (b) The importance estimates of each predictor; σ indicates the standard deviation of the profile, and pn indicates the n-th element of the profile. The confusion matrices of the train and test sets are shown in (c) and (d), respectively. The order of the predicted classes in (c) and (d) is (1) Decline, (2) Flat, (3) Ascend, and (4) Irregular.

The performance of the TreeBagger classifier is analysed through the OOB error, predictor importance, and confusion matrices for the train and test sets. Fig. 4(a) illustrates the decrease in OOB error as the number of trees in the TreeBagger training increases. Notably, the OOB error approaches zero when the number of trees exceeds 60. In Fig. 4(b), a bar graph of predictor importance shows the standard deviation of the scaled DR15 VD profiles to be the most influential factor in the classification decision, with secondary peaks in the importance estimates observed for predictors p3, p9, and p18. Additionally, Figs 4(c) and (d) present the confusion matrices for the train and test sets, respectively. To enhance recognition rates during training, categories with fewer samples in the train set were duplicated, resulting in a total of 470 samples in the train set’s confusion matrix (see Fig. 4c) and 200 samples in the test set’s confusion matrix (see Fig. 4d). It is important to note that classes 1 to 4 in the confusion matrices correspond to the Decline, Flat, Ascend, and Irregular categories, respectively. The confusion matrices reveal that the trained TreeBagger classifier achieved a classification accuracy of 100 per cent on the train set, though some test set samples were classified differently from their visually tagged categories.
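The diagnostics in Figs 4(a) and (b) can be reproduced directly from the trained ensemble; this is a minimal sketch using the model object defined earlier, with OOBPredictorImportance enabled during training.

```matlab
% Sketch: OOB error curve and permutation-based predictor importance.
oobErr = oobError(model);                      % OOB error as a function of grown trees
figure; plot(oobErr);
xlabel('Number of grown trees'); ylabel('Out-of-bag error');

imp = model.OOBPermutedPredictorDeltaError;    % importance from OOB permutation
figure; bar(imp);
xlabel('Predictor'); ylabel('Importance estimate');
```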

A comparative analysis of the classification accuracies for the K-means unsupervised classifier and the TreeBagger-supervised classifier, as presented in Table 2, offers insights into their respective performances relative to the visually tagged VD profiles (DR15). The K-means unsupervised classifier achieved an overall accuracy of 73.5 per cent, while the TreeBagger-supervised classifier achieved an overall accuracy of 88 per cent when predicting the test set. A closer examination of the classification accuracy for each category reveals that both the K-means and TreeBagger classifiers performed best in classifying the Ascend category, whereas the Irregular category had the lowest accuracy for both classifiers. This lower accuracy in classifying the Irregular category could be due to the absence of a specific trend in the Irregular profiles, making accurate classification more challenging.

Table 2. Comparing the accuracy of different types of profiles by applying K-means unsupervised clustering and TreeBagger-supervised classification.

Accuracy (per cent) | Flat | Decline | Ascend | Irregular | All
K-means | 71.4 | 76.4 | 96.7 | 16.7 | 73.5
TreeBagger train set | 100 | 100 | 100 | 100 | 100
TreeBagger test set | 80.3 | 93.8 | 93.8 | 60.0 | 88.0

Upon evaluating the TreeBagger-supervised classifier’s performance, we proceeded to apply it to classify the normalized MaNGA DR17 VD profiles, as illustrated in Fig. 5. Surprisingly, the Flat category comprises approximately 67.9 per cent of all 2624 samples in DR17, while the Decline category constitutes only 24.47 per cent. The grey lines in the background of each panel indicate an accurate classification of the trends, with Ascend and Irregular samples being clearly discernible from the Flat and Decline categories.
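The final application step amounts to a single predict call followed by a tally of the category fractions; the sketch below assumes the DR17 profiles have been reduced to the same 8-column feature matrix, here called featuresDR17 (illustrative name).

```matlab
% Sketch: classify the DR17 profiles and tally the fraction of each category.
labelsDR17 = categorical(predict(model, featuresDR17));
counts = countcats(labelsDR17);
frac   = 100 * counts / numel(labelsDR17);
disp(table(categories(labelsDR17), counts, frac, ...
    'VariableNames', {'Type', 'Count', 'Percent'}));
```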

Figure 5. The TreeBagger-supervised classification of the scaled MaNGA DR17 VD profiles. The four panels from left to right are the classified types labelled as Flat, Decline, Ascend, and Irregular.

This study’s findings underscore the potential of the TreeBagger-supervised classifier for categorizing MaNGA VD profiles, thus enhancing our understanding of the distribution of elliptical galaxies by type. It is important to note, however, that the visual tags of the DR15 VD profiles were employed as the ground truth for training and assessing the classifier, and human error or disagreement may introduce uncertainty. Consequently, there is a pressing need for additional research to refine the classification algorithm’s accuracy. By doing so, we can more precisely determine the category of each elliptical galaxy and obtain a deeper insight into its physical properties, ultimately contributing to the advancement of our knowledge regarding galaxy formation and evolution.

4 DISCUSSIONS AND SUMMARY

In this work, we investigate the classification of MaNGA VD profiles using K-means unsupervised clustering and TreeBagger-supervised classification. The K-means unsupervised classification of scaled MaNGA VD profiles (DR15) reveals distinct categories, with the Flat and Decline categories being the most prevalent. As the value of k increases, additional categories such as the Irregular and Ascend categories become more distinct. In contrast, the supervised TreeBagger classifier, trained and evaluated on visually tagged samples, demonstrates a high classification accuracy. The comparison between the K-means and TreeBagger classifiers shows that the TreeBagger method outperforms the K-means algorithm in classifying the VD profiles. The TreeBagger classifier is then applied to classify the normalized MaNGA VD profiles (DR17), uncovering an unexpectedly high percentage of Flat category galaxies.

The findings emphasize the potential of the TreeBagger classifier in categorizing MaNGA VD profiles and enhancing our understanding of the distribution of elliptical galaxies by type. However, the reliance on visually tagged VD profiles as the ground truth introduces the possibility of human error or disagreement, which highlights the need for further research to refine the classification algorithm’s accuracy. By doing so, we can more accurately determine each elliptical galaxy’s category and gain deeper insight into its physical properties.

The fraction of flat profiles is an intriguing aspect to consider in elliptical galaxies. In the literature, Durazo et al. (2018) examined approximately 300 MaNGA elliptical galaxies, all of which displayed declining profiles. Conversely, Tian et al. (2021) discovered that 54 MaNGA BCGs mostly exhibit flat profiles. A systematic examination is essential to determine whether flat profiles imply potential BCG candidates. The presence of numerous flat profile samples among all MaNGA elliptical galaxies raises questions about the proportion of flat profiles compared to other profile types.

Various types of kinematic profiles have been suggested in the context of MOND (Milgrom 1983). For spiral galaxies, MOND predicts a characteristic surface density Σm = a0/G, where a0 = 1.2 × 10−10 m s−2. This prediction differentiates high-surface brightness (HSB) galaxies with Σ > Σm from low-surface brightness (LSB) galaxies with Σ < Σm. In spiral galaxies, flat and ascending rotation curves are divided according to HSB and LSB classifications (Sanders & McGaugh 2002). Moreover, the HSB and LSB of elliptical galaxies may imply different VD profiles in the MOND paradigm, necessitating a systematic examination of our findings. The prevalence of flat profiles also warrants further exploration into the dark matter problem.
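For a rough sense of scale, and noting that conventions in the literature differ by factors of order 2π, the characteristic surface density implied by a0 = 1.2 × 10−10 m s−2 evaluates to:

```latex
\Sigma_{\rm m} = \frac{a_0}{G}
  \approx \frac{1.2\times10^{-10}\,\mathrm{m\,s^{-2}}}{6.67\times10^{-11}\,\mathrm{m^{3}\,kg^{-1}\,s^{-2}}}
  \approx 1.8\,\mathrm{kg\,m^{-2}}
  \approx 8.6\times10^{2}\,\mathrm{M_{\odot}\,pc^{-2}} .
```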

In summary, this paper presents an ML-based classification of VD profiles for elliptical galaxies in the MaNGA survey. The unsupervised K-means algorithm is employed to categorize the profiles into four distinct groups, while the TreeBagger-supervised classifier attains an overall accuracy of 88 per cent when making predictions on the test set. Our analysis reveals that the Flat category dominates, accounting for 67.9 per cent of the entire DR17 data set, with the Decline category representing only 24.47 per cent. The identification of flat VD profiles in this study has significant implications for our understanding of galaxy formation and evolution, since such galaxies may represent potential BCG candidates and require further investigation to determine their contribution to the evolution of elliptical galaxies.

ACKNOWLEDGEMENTS

YD is supported by the Taiwan Ministry of Education (MOE) Higher Education SPROUT Project Grant to the Center for Astronautical Physics and Engineering (CAPE), as well as Taiwan National Science and Technology Council (NSTC) grants 111-2636-M-008-004 and 112-2636-M-008-003 to Professor Loren C. Chang of National Central University. YT is supported by Taiwan NSTC 110-2112-M-008-015-MY3. CMK is supported by Taiwan NSTC 111-2112-M-008-013 and NSTC 112-2112-M-008-032.

DATA AVAILABILITY

The catalogues outlined in this paper are included in the final data release of the MaNGA survey and will be made public as part of SDSS DR15 (https://dr15.sdss.org/sas/dr15/manga/) and DR17 (https://www.sdss4.org/dr17/manga/). Upon request, the code employed for the machine-learning algorithm can be shared.

References

Albareti F. D. et al., 2017, ApJS, 233, 25
Bacon R. et al., 2001, MNRAS, 326, 23
Ball N. M., Brunner R. J., 2010, Int. J. Mod. Phys., 19, 1049
Blanton M. R. et al., 2017, AJ, 154, 28
Bonjean V., Aghanim N., Salomé P., Beelen A., Douspis M., Soubrié E., 2019, A&A, 622, A137
Bundy K. et al., 2015, ApJ, 798, 7
Cappellari M., Copin Y., 2003, MNRAS, 342, 345
Chang Y.-Y., Hsieh B.-C., Wang W.-H., Lin Y.-T., Lim C.-F., Toba Y., Zhong Y., Chang S.-Y., 2021, ApJ, 920, 68
Chang Y.-Y., Lin L., Pan H.-A., Lin C.-A., Hsieh B.-C., Bottrell C., Wang P.-W., 2022, ApJ, 937, 97
Chen T., Kornblith S., Norouzi M., Hinton G. E., 2020a, preprint
Chen T., Kornblith S., Swersky K., Norouzi M., Hinton G. E., 2020b, preprint
Cherinka B. et al., 2019, AJ, 158, 74
Davidzon I. et al., 2019, MNRAS, 489, 4817
Domínguez Sánchez H., Huertas-Company M., Bernardi M., Tuccillo D., Fischer J. L., 2018, MNRAS, 476, 3661
Domínguez Sánchez H., Margalef B., Bernardi M., Huertas-Company M., 2022, MNRAS, 509, 4024
Durazo R., Hernandez X., Cervantes Sodi B., Sanchez S. F., 2018, ApJ, 863, 107
D’Isanto A., Polsterer K. L., 2018, A&A, 609, A111
El Bouchefry K., de Souza R. S., 2020, in Škoda P., Adam F., eds, Knowledge Discovery in Big Data from Astronomy and Earth Observation. Elsevier, Amsterdam, p. 225
Fluke C. J., Jacobs C., 2020, Wiley Interdiscip. Rev.: Data Min. Knowl. Discov., 10, e1349
García-Benito R. et al., 2015, A&A, 576, A135
Hemmati S. et al., 2019, ApJL, 881, L14
Hsu Y.-H. et al., 2022, ApJ, 933, 61
Ivezić Ž., Connolly A. J., VanderPlas J. T., Gray A., 2014, Statistics, Data Mining, and Machine Learning in Astronomy: A Practical Python Guide for the Analysis of Survey Data. Princeton Univ. Press, Princeton
Janitza S., Hornung R., 2018, PLoS One, 13, 1
Krakowski T., Malek K., Bilicki M., Pollo A., Kurcz A., Krupa M., 2016, A&A, 596, A39
Marini I., Borgani S., Saro A., Murante G., Granato G. L., Ragone-Figueroa C., Taffoni G., 2022, MNRAS, 514, 3082
Masters D. et al., 2015, ApJ, 813, 53
Milgrom M., 1983, ApJ, 270, 365
Milgrom M., Sanders R. H., 2003, ApJ, 599, L25
Nair P. B., Abraham R. G., 2010, ApJS, 186, 427
Rahmani S., Teimoorinia H., Barmby P., 2018, MNRAS, 478, 4416
Romanowsky A. J., Douglas N. G., Arnaboldi M., Kuijken K., Merrifield M. R., Napolitano N. R., Capaccioli M., Freeman K. C., 2003, Science, 301, 1696
Rubin V. C., Ford W. K. J., Thonnard N., 1980, ApJ, 238, 471
Sanders R. H., McGaugh S. S., 2002, ARA&A, 40, 263
Sarmiento R., Huertas-Company M., Knapen J., 2021a, Bull. Am. Astron. Soc., 53, 301
Sarmiento R., Huertas-Company M., Knapen J. H., Sánchez S. F., Sánchez H. D., Drory N., Falcón-Barroso J., 2021b, ApJ, 921, 177
Smee S. A. et al., 2013, AJ, 146, 32
Teimoorinia H., 2012, AJ, 144, 172
Teimoorinia H., Archinuk F., Woo J., Shishehchi S., Bluck A. F. L., 2022, AJ, 163, 71
Tian Y., Ko C.-M., 2016, MNRAS, 462, 1092
Tian Y., Cheng H., McGaugh S. S., Ko C.-M., Hsu Y.-H., 2021, ApJ, 917, L24
VanderPlas J., 2016, Python Data Science Handbook: Essential Tools For Working With Data. O’Reilly Media, Inc., Sebastopol, CA
Vavilova I., Dobrycheva D., Vasylenko M., Elyiv A., Melnyk O., 2020, in Škoda P., Adam F., eds, Knowledge Discovery in Big Data from Astronomy and Earth Observation. Elsevier, Amsterdam, p. 307
Vavilova I. B., Dobrycheva D. V., Vasylenko M. Yu., Elyiv A. A., Melnyk O. V., Khramtsov V., 2021, A&A, 648, A122
Veale M., Ma C.-P., Greene J. E., Thomas J., Blakeslee J. P., Walsh J. L., Ito J., 2018, MNRAS, 473, 5446
Way M., Scargle J. D., Ali K., Srivastava A., 2012, Advances in Machine Learning and Data Mining for Astronomy, 1st edn. Chapman and Hall/CRC, New York
Zhang J., Lin F., Xiong P., Du H., Zhang H., Liu M., Hou Z., Liu X., 2019, IEEE Access, 7, 70634
