Shimpei Nishimoto, Toshikazu Onishi, Atsushi Nishimura, Shinji Fujita, Yasutomo Kawanishi, Shuyo Nakatani, Kazuki Tokuda, Yoshito Shimajiri, Hiroyuki Kaneko, Yusuke Miyamoto, Tsuyoshi Inoue, Atsushi M Ito, Infrared bubble recognition in the Milky Way and beyond using deep learning, Publications of the Astronomical Society of Japan, Volume 77, Issue 2, April 2025, Pages 403–424, https://doi.org/10.1093/pasj/psaf008
Abstract
We propose a deep-learning model that can detect Spitzer bubbles accurately using two-wavelength near-infrared data acquired by the Spitzer Space Telescope and JWST. The model is based on the single-shot multibox detector as an object detection model, trained and validated using Spitzer bubbles identified by the Milky Way Project (MWP bubbles). We found that using only MWP bubbles with clear structures, along with normalization and data augmentation, significantly improved performance. To reduce the dataset bias, we also use data without bubbles in the dataset selected by combining two techniques: negative sampling and clustering. The model was optimized by hyperparameter tuning using Bayesian optimization. Applying this model to a test region of the Galactic plane resulted in a 98% detection rate for MWP bubbles with 8 µm emission clearly encompassing 24 µm emission. Additionally, we applied the model to a broader area of |$1^\circ \leq |l| \leq 65^\circ$|, |$|b| \leq 1^\circ$|, including both training and validation regions, and the model detected 3006 bubbles, of which 1413 were newly detected. We also attempted to detect bubbles in the high-mass star-forming region Cygnus X, as well as in external galaxies, the Large Magellanic Cloud (LMC) and NGC 628. The model successfully detected Spitzer bubbles in these external galaxies, though it also detected Mira-type variable stars and other compact sources that can be difficult to distinguish from Spitzer bubbles. The detection process takes only a few hours, demonstrating the efficiency in detecting bubble structures. Furthermore, the method used for detecting Spitzer bubbles was applied to detect shell-like structures observable only in the 8 µm emission band, leading to the detection of 469 shell-like structures in the LMC and 143 in NGC 628.
1 Introduction
High-mass stars significantly impact the surrounding interstellar medium (ISM) and the evolution of galaxies (Krumholz 2014). Mechanically, they dynamically disturb the ISM through stellar winds, ionizing radiation, dust heating, the expansion of H ii regions, etc. (Hosokawa & Inutsuka 2005; Watson et al. 2008; Shimajiri et al. 2014). Chemically, their supernova explosions at the end of their lifecycles enrich heavy elements (Wanajo et al. 2002; Nomoto et al. 2013). Therefore, a deep understanding of the mechanisms of high-mass star formation is crucial for comprehending galaxy evolution.
The ISM is filled with ring-like and shell-like structures of various sizes, ranging from supergiant shells spanning kiloparsecs created by energetic events such as supernova explosions (Kim et al. 1999) to smaller structures associated with protostar formation (Harada et al. 2023; Tokuda et al. 2023). Recent JWST observations have revealed that entire galaxies are densely packed with these ring and shell structures (Barnes et al. 2023; Watkins et al. 2023). Unraveling the nature and formation mechanisms of these structures is key to understanding the lifecycle of the ISM in the context of star formation and ultimately deciphering the process of galaxy evolution.
Among these numerous rings and shells, Spitzer bubbles are particularly well studied and have been systematically identified in large numbers, especially within the Milky Way (Churchwell et al. 2006, 2007; Simpson et al. 2012; Jayasinghe et al. 2019). These structures are characterized by an 8 µm bright shell surrounding a central 24 µm emission region (see figure 1 in Beaumont et al. 2014). Spitzer bubbles have typically been associated with the feedback effects of high-mass stars, such as in the collect and collapse (C&C) process (Elmegreen & Lada 1977). However, there is ongoing debate about their formation mechanisms (Zavagno et al. 2007; Torii et al. 2015), as some evidence implies that they capture the direct triggers of high-mass star formation, as proposed in the cloud–cloud collision (CCC: Habe & Ohta 1992; Fukui et al. 2021 and references therein). Understanding the origins of Spitzer bubbles is, therefore, critical for constraining the mechanisms of high-mass star formation.
In addition, Spitzer bubbles are used not only to understand individual star-forming regions but also in statistical studies of high-mass star formation mechanisms, such as CCC and C&C, across the entire Milky Way (Kendrew et al. 2012; Thompson et al. 2012). Statistical studies of such high-mass star formation mechanisms require comprehensive and accurate detection of Spitzer bubbles in the entire Milky Way and other galaxies. However, some Spitzer bubbles were not detected in previous studies, and the conventional detection of Spitzer bubbles has relied primarily on manual work, which is time-consuming and costly (Ueda et al. 2020). Therefore, we developed a deep-learning model that can detect Spitzer bubbles. Deep learning generally enables faster and more accurate detection by processing large datasets. Additionally, deep learning excels at capturing small-scale structures and patterns, helping us to better understand the physical processes involved in the formation and evolution of Spitzer bubbles, as well as their interaction with the surrounding environment. This approach also saves time and resources, making large-scale observational data analysis feasible. By comparing the spatial and velocity distributions of molecular gas associated with Spitzer bubbles, we can statistically investigate the origins of Spitzer bubbles and the mechanisms of high-mass star formation (Liu et al. 2015; Torii et al. 2018; Fujita et al. 2019). We are preparing a subsequent paper on the investigation of star formation mechanisms through comparison with molecular gas.
In this study, we focus on developing a deep-learning model that can rapidly detect Spitzer bubbles, including previously undetected ones. In section 2, we introduce the infrared data used in this study. Section 3 describes the details of the model, dataset, method of data processing, and evaluation metrics. To enhance the model performance, we conducted data optimization, as discussed in section 4, and hyperparameter tuning of the model, as described in section 5. In section 6, we discuss the effectiveness of the model in detecting Spitzer bubbles within the test region, Cygnus X, the Large Magellanic Cloud (LMC), and NGC 628. We also discuss the results of detecting shell-like structures observable only in the 8 µm band, which are considered to be generated by supernova explosions or high-mass star formation, utilizing the methods described in sections 4 and 5.
1.1 Features of Spitzer bubbles
The 24 µm emission associated with the Spitzer bubbles is considered to trace H ii regions, which are ionized and heated to approximately 10|$^4$| K by ultraviolet (UV) radiation from high-mass stars. Within the H ii region, dust mixed with ionized gas is heated by an intense radiation field, forming a bright nebula at 24 µm similar to radio continuum emission (Churchwell et al. 2006; Watson et al. 2008, 2010). The photodissociation region (PDR) surrounding the H ii region is traced by 8 µm emission, which is produced by the excitation of polycyclic aromatic hydrocarbons (PAHs) due to far-UV radiation leaking from the H ii region. Thus, the characteristic morphology of Spitzer bubbles, where a bright 8 µm shell surrounds the central part of the 24 µm emission, can be observed in both CCC and C&C scenarios (Dale et al. 2007; Takahira et al. 2014; Shima et al. 2016).
H ii regions are the most abundant energy sources of turbulence within giant molecular clouds (GMCs) (Matzner 2002). Among the 102 bubbles selected from the Spitzer bubbles detected by Churchwell et al. (2006, 2007) (hereafter CH06, CH07), at least 86% coincide with radio continuum emission at 20 cm (Deharveng et al. 2010). This result suggests that most Spitzer bubbles are associated with H ii regions and are significant energy sources within GMCs. Additionally, Spitzer bubbles are useful as tracers of star formation activity due to their relatively low contamination from supernova remnants (SNRs), asymptotic giant branch star bubbles, and planetary nebulae (Deharveng et al. 2010). Thus, Spitzer bubbles are appropriate indicators of high-mass star formation and have important features for understanding the process of high-mass star formation.
1.2 Previous work on the detection of Spitzer bubbles
1.2.1 Manual work
The mid-infrared image surveys of the Galactic plane conducted by ISO (Infrared Space Observatory; Kessler et al. 1996) and MSX (Midcourse Space Experiment; Price 1995; Egan et al. 1998; Price et al. 2001) revealed the presence of many Spitzer bubble-like structures in the Galactic disk. Subsequently, CH06 and CH07 created the first Spitzer bubble catalogs, which include 591 Spitzer bubbles within the range |$-65^{\circ } \leq l \leq 65^{\circ }$|, |$-1^{\circ } \leq b \leq 1^{\circ }$|, using only the GLIMPSE data at 3.6, 5.8, and 8.0 µm. However, owing to limited resources, it was suggested that the actual number of Spitzer bubbles within the surveyed area of the Galactic plane was underestimated. Additionally, the CH06 and CH07 catalogs contain cases where the same Spitzer bubble was counted two or more times (for example, N1 and CN146, S1 and CS116).
Following the above studies, the Milky Way Project (MWP), involving over 35000 citizen scientists, detected Spitzer bubbles in the same region as CH06 and CH07 using GLIMPSE 8 µm and MIPSGAL 24 µm data (Simpson et al. 2012, DR1). Because Spitzer bubbles are associated with H ii regions, as mentioned above, the addition of 24 µm data makes the identification of Spitzer bubbles much easier than when relying primarily on 8 µm data. Consequently, the project detected 5106 Spitzer bubbles, including 86% of the sources in CH06 and CH07. In addition, 928 yellow balls, which have compact emission at both 8 and 24 µm and are thought to be in the early stages of high-mass star formation, were also detected (Kerton et al. 2015). The significant increase in the number of Spitzer bubbles has enabled statistical studies of high-mass star formation on the Galactic scale. However, the DR1 is not recommended because of the low accuracy in measuring the shapes and sizes of bubbles and the lack of uncertainty parameters (Jayasinghe et al. 2019). Therefore, 2600 Spitzer bubbles were newly scrutinized and cataloged (Jayasinghe et al. 2019, DR2). The DR2 is a more refined catalog because of its accurate measurement of the shapes and sizes of Spitzer bubbles, with a maximum zoom level twice that employed in the DR1, and its elimination of the duplication that existed in the DR1 (hereafter, MWP refers to the DR2).
In the MWP, many citizen scientists participated in detecting bubbles from the Spitzer data; however, this method is time-consuming, subjective, and difficult to calibrate (Beaumont et al. 2014). Despite the significant time and human resources invested in the MWP, it was confirmed that undetected Spitzer bubbles still existed (Ueda et al. 2020). Furthermore, with the increasing data volume from modern telescopes like the James Webb Space Telescope (JWST), comprehensive human detection is becoming increasingly difficult. In particular, Spitzer bubble surveys by humans using all-sky data from the Wide-field Infrared Survey Explorer (WISE; Wright et al. 2010) or data from JWST would likely take years. Therefore, it is crucial to detect Spitzer bubbles using machine learning with less human intervention.
1.2.2 Machine-learning work
Recently, machine-learning techniques have been applied to various astronomical data for various scientific purposes, such as solving the near–far problem in the inner Galaxy using a convolutional neural network (CNN) (Fujita et al. 2023) and predicting H|$_2$| column density using Extra Trees Regressor, which is similar to Random Forests (Shimajiri et al. 2023).
For Spitzer bubbles, machine learning has enabled systematic, quantitative, and repeatable detection through automatic classification. The Random Forest classification method introduced in the DR2, named Brut, was the first automated classification method for Spitzer bubbles (Beaumont et al. 2014; Jayasinghe et al. 2019). Brut has made it possible to supplement human detection by setting specific criteria for the detection of Spitzer bubbles. Subsequently, the performance of Brut was significantly improved by retraining on synthetic images of bubbles in three Spitzer bands (4.5, 8, 24 µm) generated with HYPERION, a three-dimensional dust continuum Monte Carlo radiative transfer code (Xu & Offner 2017). However, recent advancements in object detection using CNNs have shown overwhelming performance improvements compared to the Random Forest used in Brut (Liu et al. 2020). Focusing on CNNs, Ueda et al. (2020) developed a new model that can detect Spitzer bubbles (hereafter, the Ueda model), although the Ueda model had issues such as false detections, long inference times, and complexity of result analysis. Therefore, we have attempted to develop a new deep-learning model that is both fast and accurate.
2 Data description
In this study, we used observational data from GLIMPSE, MIPSGAL, and JWST. Table 1 shows the wavelengths, resolutions, and observation areas of the GLIMPSE and MIPSGAL data.
Table 1. Comparison of wavelengths, angular resolutions, and observation areas of GLIMPSE and MIPSGAL.*

| | GLIMPSE | MIPSGAL |
|---|---|---|
| Wavelengths | 3.6, 4.5, 5.8, 8 µm | 24, 70 µm |
| Resolutions | ${{1^{\prime\prime}\!\!.{}5}}$–${{1^{\prime\prime}\!\!.{}9}}$ | $5^{\prime \prime }$, $15^{\prime \prime }$ |
| Area | $-65^\circ \leq l \leq 65^\circ$, $-1^\circ \leq b \leq 1^\circ$ | $-65^\circ \leq l \leq 65^\circ$, $-1^\circ \leq b \leq 1^\circ$ |

*In this study, we use only GLIMPSE 8 µm and MIPSGAL 24 µm.
2.1 Galactic Legacy Infrared Mid-Plane Survey Extraordinaire (GLIMPSE)
GLIMPSE (Benjamin et al. 2003; Churchwell et al. 2009) observed the Galactic plane using the IRAC (Fazio et al. 2004) on the Spitzer Space Telescope (Werner et al. 2004). IRAC has four bands (3.6, 4.5, 5.8, and 8.0 µm) with angular resolutions ranging from |${{1^{\prime\prime}\!\!.{}5}}$| (3.6 µm) to |${{1^{\prime\prime}\!\!.{}9}}$| (8.0 µm). The observational range covers |$-65^{\circ } \leq l \leq 65^{\circ }$|, |$-1^{\circ } \leq b \leq 1^{\circ }$|. In this study, we used the 8.0 µm data from GLIMPSE. The 8.0 µm band is dominated by strong PAH features at 7.7 and 8.6 µm, which control the diffuse emission in this band. The infrared emission from PAHs is observed in the direction of PDRs excited by far-UV radiation leaking from H ii regions.
2.2 MIPSGAL
MIPSGAL is a survey of the Galactic plane (|$-65^{\circ } \leq l \leq 65^{\circ }$|, |$-1^{\circ } \leq b \leq 1^{\circ }$|) using the Multiband Imaging Photometer for Spitzer (MIPS: Rieke et al. 2004; Carey et al. 2009) at 24 and 70 µm. The angular resolutions are 5|$^{\prime \prime }$| at 24 µm and 15|$^{\prime \prime }$| at 70 µm. The 24 µm emission is dominated by dust continuum emission, which is thought to be due to very small grains (VSG) out of thermal equilibrium or big grains in thermal equilibrium.
2.3 JWST
For the extragalactic galaxy NGC 628, we used data observed by the F770W and F2100W filters of MIRI on the JWST (Gardner et al. 2006). The FWHM is |${{0^{\prime\prime}\!\!.{}25}}$| for F770W and |${{0^{\prime\prime}\!\!.{}67}}$| for F2100W. The 7.7 µm band data include PAH emission similar to GLIMPSE (Tielens 2008). The 21 µm band data are thought to be due to thermal emission from VSG, similar to MIPSGAL. The FITS files for the data were downloaded from the Multimission Archive at STScI (MAST).1
3 Method of Spitzer bubble detection
To detect Spitzer bubbles at high speed, we used an object detection method, the single-shot multibox detector (SSD), developed by Liu et al. (2016). SSD is a CNN-based object detector that outputs the location and class confidence using a single convolutional neural network, which speeds up the object detection process. Object detection methods such as SSD "detect" objects by simultaneously outputting the classification results and locations of objects in each image. On the other hand, Brut and the Ueda model are classifiers that "classify" cropped images into specific categories in a sliding-window manner. Because of this feature, object detection methods are more suitable than classifiers for detection over vast areas.
Figure 1 shows the development procedure, details of which are described in sections 4 and 5.
Fig. 1. Processing flow of our deep-learning model that can detect Spitzer bubbles. First, we divided the input data into training, validation, and test sets (left side; see table 3) to train the model and evaluate its performance. All data were normalized within the range of 0 to 1 as described in sub-subsection 3.3.1. Furthermore, the training data are divided into two types: bubble and non-bubble data (subsection 3.3). Secondly, to improve the model performance (F2 score), bubble selection (subsection 4.1) and data augmentation (subsection 4.2) were applied to the bubble data in the training set, and the hyperparameters of the model were optimized using Bayesian optimization. Finally, all data are input into the optimized model and the results output by the model are treated as bubble candidates (section 6).
3.1 Comparison between SSD and the Ueda model
In this subsection, we show the advantages of SSD when compared to the Ueda model in terms of inference time and the processing method of the results.
3.1.1 Inference time
Simple CNN-based classifiers such as the Ueda model can only classify the presence or absence of objects in an image using confidence scores. Therefore, to determine the exact location of a single Spitzer bubble in an image, they must crop the image at various sizes and combine the inference results. When there are multiple Spitzer bubbles in an image, they are processed in the same way.
Similarly, since the original image is too large to be processed by SSD, our model also crops input images at various sizes for inference (see sub-subsection 3.3.2). However, unlike classifiers, SSD in our model can output the location of objects in images and detect multiple objects simultaneously; thus it can use larger sizes for cropping. This capability allows our model to significantly reduce the number of cropped images needed to get the exact location and the inference time compared to the CNN model. For example, the inference time by GPU for a small region of 6 deg|$^2$| is approximately 10 min for SSD with approximately 80000 images to be inferred. On the other hand, the Ueda model would use approximately 700 million images, assuming crop sizes of 23, 26, 28, ..., 2263, 2489, and 2738 pixels (calculated as |$50 \times 1.1^x$|, [x= |$-$|8–43]) and sliding-window strides of |$1/10$| of the crop sizes. As a result, it takes about 50 min for the inference by GPU. The inference time by GPU for SSD and CNN is not proportional to the number of cropped images because the Ueda model has fewer CNN layers than SSD, resulting in a shorter inference time per image. In addition, including the time for processing image data, such as normalization and resizing, the total computation time for the Ueda model exceeds several tens of hours.
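For reference, the crop-size series of the Ueda model quoted above can be reproduced from the stated formula; in the minimal Python sketch below, the exponent range is an assumption chosen so that the endpoints match the listed sizes:

```python
# Crop sizes assumed for the Ueda model's sliding-window inference (subsection 3.1.1),
# computed as 50 x 1.1**x; the exponent range is an assumption chosen so that the
# endpoints match the sizes quoted in the text (23, 26, 28, ..., 2263, 2489, 2738).
crop_sizes = [round(50 * 1.1 ** x) for x in range(-8, 43)]
print(crop_sizes[:3], crop_sizes[-3:])  # [23, 26, 28] [2263, 2489, 2738]
```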
Based on this difference in inference time, SSD is more suitable than the Ueda model for fast object detection.
3.1.2 Post-processing after detection
SSD is much simpler than the Ueda model in processing detection results. The Ueda model calculates the position of a Spitzer bubble using a probability cube created by connecting the probabilities of the cropped images at each crop size into a map and overlaying them. In this probability cube, a Spitzer bubble is determined if the number of connected voxels exceeding the probability threshold is greater than a specific value. However, such a complex process is time-consuming, and has several hyperparameters. In contrast, SSD can calculate the positions of Spitzer bubbles without a complex analysis using probability cubes because SSD can output the location information.
In addition, since SSD has high detection accuracy (Liu et al. 2016), it can achieve the objective of faster speed without compromising accuracy, despite the different methods compared to the classifier. Thus, in this study, we used the SSD model.
3.2 Single-shot multibox detector (SSD)
SSD places 8732 detection boxes of various sizes and shapes in the region of a single image (see the following paragraphs) and simultaneously performs classification and regression on all these boxes in a single inference. The positions of the objects are then determined from the offsets of those boxes obtained by the regression. In addition, SSD uses multiple feature maps with different resolutions for object detection, instead of a single feature map. By detecting smaller objects in feature maps of shallow layers and larger objects in feature maps of deeper layers, SSD can detect objects of various sizes.
SSD has four main components: VGG layer, Extra layer, loc layer, and conf layer (figure 2).
Fig. 2. Convolution flow and output of the SSD on which our model is based. The arrows indicate the sequence in which the data are convolved. Sources 1–6 are the feature maps generated by the convolution of the input image. Source 1 is the output after inputting the 38 |$\times$| 38 |$\times$| 512 feature map to L2 Norm, and all sources are then input to the loc and conf layers. SSD can detect both large and small objects using feature maps with six different resolutions. In this figure, only a single image is input, but images can be input in specified batch units. The input image is an example of a Spitzer bubble with 8 µm emission in green and 24 µm emission in red. We use the same color scheme for all subsequent images observed by the Spitzer Space Telescope. See table 2 for the size of each source and subsection 3.2 for the DBox.
The first VGG layer is based on VGG-16 (CNN with 16 layers) and the loc and conf layers each consist of one convolution layer. These components work together as follows: The VGG and Extra layers are the feature extractors, producing six feature maps from the input image. The loc layer refines the positions and sizes of detected objects based on the feature map information. The conf layer assigns confidence scores to each detected object. Together, these layers help to accurately localize and classify objects in the image.
The feature maps from source 1 to source 6 are extracted using the VGG layer and the Extra layer. Source 1 has features corresponding to small areas of an image, source 2 and source 3 have progressively larger areas in that order, and source 6 has features corresponding to large areas that represent the entire image. The resolutions of each source are shown in table 2. Due to these four components and six feature maps, the SSD can accurately detect variously sized objects at high speed.
Table 2. Resolutions of each source, the number of kinds of DBoxes, and the total number of DBoxes for each source.*

| | Resolution | Kinds | Total |
|---|---|---|---|
| Source 1 | $38\times 38$ | 4 | 5776 |
| Source 2 | $19\times 19$ | 6 | 2166 |
| Source 3 | $10\times 10$ | 6 | 600 |
| Source 4 | $5\times 5$ | 6 | 150 |
| Source 5 | $3\times 3$ | 4 | 36 |
| Source 6 | $1\times 1$ | 4 | 4 |

*Sources 1, 5, and 6 have four kinds of DBoxes and sources 2, 3, and 4 have six kinds of DBoxes, so source 1 has 5776 (|$38 \times 38 \times 4$|) DBoxes and source 2 has 2166 (|$19 \times 19 \times 6$|) DBoxes.
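As a quick arithmetic check, the total of 8732 DBoxes quoted in subsection 3.2 follows from summing resolution|$^2$| |$\times$| kinds over the six sources (a minimal Python sketch):

```python
# Total number of DBoxes: sum of resolution^2 x kinds over the six sources (table 2).
sources = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]
total = sum(res * res * kinds for res, kinds in sources)
print(total)  # 8732 = 5776 + 2166 + 600 + 150 + 36 + 4
```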
SSD estimates the exact position of objects using the relative offset values to the detection boxes. The fixed detection boxes are called Default-Boxes (DBoxes), and the boxes indicating the estimated position of the object with applied offsets are called Bounding-Boxes (BBoxes).
SSD detects objects using the six feature maps mentioned above and DBoxes corresponding to each feature map with different sizes and locations. The DBoxes are created by defining specific positions and sizes on the feature map grid (figure 3a).
Fig. 3. Pattern of DBoxes enclosing the source on two feature maps, and DBox and BBox images. (a) Examples of feature maps and DBoxes for sources 3 and 4. DBoxes exist in all cells, but only the DBoxes where objects are located are drawn in this figure. The DBox with the highest confidence is shown in red. Only the DBoxes of sources 3 and 4 are taken as examples here, but in reality, loc and conf are obtained for all DBoxes. See table 2 for the size of each source and the number of DBoxes. (b) The gray squares are dozens of the 8732 DBoxes of different sizes and positions. The red DBoxes have the highest confidence for the object, and the blue ones have the next highest confidence. Although we show only several DBoxes for simplicity, there are more DBoxes around the objects. (c) Because the objects and the DBoxes are slightly misaligned, SSD adjusts them using the offset information and the equation defined by Liu et al. (2016). Then, we use NMS to eliminate redundant BBoxes. The 8 and 24 µm emissions are shown in green and red, respectively.
There are a total of 8732 DBoxes, consisting of four or six DBoxes with different aspect ratios for all six feature map cells (table 2). SSD achieves superior execution speed compared to other methods by using multiple DBoxes for position estimation and class classification in a single inference.
Feature maps from source 1 to source 6 are input into the loc and conf layers, and convolution is performed once for each source. The loc layer outputs offset values (|$\Delta$|center|$_{\mathrm{x}}$|, |$\Delta$|center|$_{\mathrm{y}}$|, |$\Delta$|width, |$\Delta$|height) for each of the 8732 DBoxes. The conf layer outputs the confidence score for all object categories (c|$_{\mathrm{bubble}}$|, c|$_{\mathrm{non-bubble}}$|) for each of the 8732 DBoxes. Then, the top 200 DBoxes with the highest c|$_{\mathrm{bubble}}$| obtained from the conf layer are extracted. By substituting the coordinates of the DBoxes (center|$_{\mathrm{x}_{\rm d}}$|, center|$_{\mathrm{y}_{\rm d}}$|, width|$_{\rm d}$|, height|$_{\rm d}$|) and the offset values of the output into the equation defined by Liu et al. (2016) (figure 3b), DBoxes are converted into the exact position coordinates of the objects (BBoxes). Furthermore, redundant BBoxes are eliminated by a method called non-maximum suppression (NMS), leaving only one BBox per object (figure 3c). In this study, the remaining BBoxes after NMS are treated as detected Spitzer bubbles.
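To make the decoding step concrete, the sketch below converts DBox coordinates and loc-layer offsets into BBoxes using the offset parameterization of Liu et al. (2016); the variance factors (0.1, 0.2) are common SSD defaults and are an assumption here, not values quoted in this paper. Redundant boxes can then be suppressed with an NMS routine such as torchvision.ops.nms.

```python
import torch
from torchvision.ops import nms

def decode(dboxes, loc, variances=(0.1, 0.2)):
    """Convert loc offsets into BBoxes (cx, cy, w, h).
    dboxes, loc: tensors of shape (8732, 4)."""
    cxcy = dboxes[:, :2] + loc[:, :2] * variances[0] * dboxes[:, 2:]
    wh = dboxes[:, 2:] * torch.exp(loc[:, 2:] * variances[1])
    return torch.cat([cxcy, wh], dim=1)

def to_corners(boxes):
    """(cx, cy, w, h) -> (xmin, ymin, xmax, ymax), the format expected by nms."""
    return torch.cat([boxes[:, :2] - boxes[:, 2:] / 2,
                      boxes[:, :2] + boxes[:, 2:] / 2], dim=1)

# keep = nms(to_corners(bboxes), conf_bubble, iou_threshold=0.5)
```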
Generally, SSD can detect multiple categories of objects (e.g., cars, people, and bicycles) at the same time; however, in this study, it is used to detect a single category, bubbles. In addition, we use two wavelength bands, 8 and 24 µm, for the detection of Spitzer bubbles. Therefore, the input was two-band data and the output was the background and Spitzer bubble classes.
3.3 Dataset used for training
The training data in this study are composed of two types: data with Spitzer bubbles (bubble data) and data without Spitzer bubbles (non-bubble data) (figure 4). The bubble and non-bubble data consist of image data (data) and classes (labels) with the location information of the objects. The sequential procedure of updating the parameters of the SSD using all training data and evaluating the model with validation data is called an epoch (see figure 6 in Nishimoto et al. 2022).
Fig. 4. Example of bubble and non-bubble data; the white box in the bubble data indicates the position of the MWP bubble. Both data have 300 |$\times$| 300 pixels. The 8 and 24 µm emissions are shown in green and red, respectively.
Fig. 5. Comparison of the same Spitzer bubble normalized using all pixel data (left) and normalized without using the point sources (right). The detected point sources are marked with cyan circles. The 8 and 24 µm emissions are shown in green and red, respectively.
Fig. 6. Example of how to crop data when creating validation data. The crop sizes are 150, 300, 600, 900, 1200, 1500, 1800, 2400, and 3000 pixels, and the data are cropped at sliding-window strides of |$1/3$| of the crop size. The white dashed arrow indicates the direction in which the data are cropped. The 8 and 24 µm emissions are shown in green and red, respectively.
Bubble data (figures 4a, 4b, 4c) include Spitzer bubbles detected by the Milky Way Project (MWP bubbles). The model developed in this study aims to detect Spitzer bubbles that show a clear distribution of 24 µm emission surrounded by 8 µm emission. Therefore, we used only distinct MWP bubbles, corresponding to bubbles categorized as Rank 1 (see subsection 4.1), for the training and validation data. Additionally, the ratio of the square areas of the 175 bubbles shared by both MWP and CH06 and CH07 shows that the area of MWP bubbles, calculated from the major axis (MajAxis), is approximately 1.7 times larger than the area of the CH06 and CH07 bubbles, calculated from the outer radius (|$R_{\mathrm{out}}$|). In this study, to ensure that the entire 8 µm shell structure is included, we expanded the radii of MWP bubbles by a factor of 1.3, where 1.3 is the square root of 1.7. To handle partial bubble morphologies robustly, we annotated bubbles on full-field Spitzer images before cropping. Partial bubbles were only included in the training data if more than 60% of their structure was present within the cropped region. This approach minimizes false detections and enhances the model's performance. The bubble data used for training are generated at each epoch, with a fixed random seed applied only at the beginning to ensure reproducibility.
We also incorporated non-bubble data in training data to suppress false positives (backgrounds). Non-bubble data were created by randomly cropping regions outside the areas of Rank 1 MWP bubbles, so non-bubble data are not expected to include Spitzer bubbles in the image (figures 4d, 4e, 4f). The role of non-bubble data is to make SSD learn the areas that are unrelated to the periphery of Spitzer bubbles as backgrounds. Spitzer bubbles exist only locally, and their area within the Milky Way is very small. If SSD is trained with only bubble data, SSD can only learn the areas around Spitzer bubbles as backgrounds and cannot correctly learn the areas that are unrelated to the periphery of Spitzer bubbles as backgrounds. Therefore, including images with only backgrounds such as non-bubble data can avoid limiting the backgrounds to only the data surrounding Spitzer bubbles and suppress false positives. In this study, we applied negative sampling and non-bubble clustering to the non-bubble data to train the model effectively (see subsection 4.3).
After cropping, bubble and non-bubble data larger than 300 pixels are reduced in resolution and normalized using the method described in sub-subsection 3.3.1; all data are then resized to 300 |$\times$| 300 pixels. Table 3 lists the regions used for training, validation, and testing. The training and validation data were randomly selected from FITS files excluding the test region.
Table 3. Galactic longitude ranges of the training, validation, and test regions.*

| | Galactic longitude range |
|---|---|
| Training | Area of $1^\circ \leq \vert l \vert \leq 65^\circ$ excluding the test and validation regions |
| Validation | $31.^{\!\!\!\circ }5 \leq l \leq 34.^{\!\!\!\circ }5$, $37.^{\!\!\!\circ }5 \leq l \leq 40.^{\!\!\!\circ }5$, $46.^{\!\!\!\circ }5 \leq l \leq 49.^{\!\!\!\circ }5$, $52.^{\!\!\!\circ }5 \leq l \leq 55.^{\!\!\!\circ }5$ |
| Test | $10.^{\!\!\!\circ }5 \leq l \leq 22.^{\!\!\!\circ }5$ |

*After determining the test region, the training and validation regions were randomly determined. The training, validation, and test regions all have |$|b| \leq 1^\circ$|.
3.3.1 Processing of the dataset
In this study, to treat the values of 8 and 24 µm equally, the data were normalized for each channel to a range from 0 to 1 before being input into the model. The background level of the data obtained from the Spitzer Space Telescope was almost 0; however, for the JWST data, a specific value had to be subtracted to make the background level 0. The maximum value was set to three times the standard deviation above the mean intensity level of each target image. In this process, for the 8 µm data only, the regions containing point sources were excluded, and the remaining data were normalized within the range of 0 to 1. The regions with point sources were assigned a value of 1 after normalization.
Figure 5 compares the same Spitzer bubble, normalized using all pixels and normalized without using the point sources. As shown on the left side of figure 5, when bright point sources are present, the 8 µm distribution of the Spitzer bubble becomes close to 0 after normalization, making the bubble appear as if it is only a point source in the 24 µm emission. To address this, point sources are identified and excluded using DAOStarFinder in photutils,2 based on their size and flux intensity. Specifically, point sources are defined as objects with a full width at half maximum (FWHM) smaller than or equal to |${{1^{\prime\prime}\!\!.{}98}}$|, matching the PSF of the 8 µm observations. Additionally, to account for differences in flux intensity, we remove only point sources with intensities exceeding the mean plus three times the variance of the cropped data. This ensures that only the brightest point sources, which could significantly affect the normalization, are excluded while retaining small bubbles and other relevant structures for analysis.
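A minimal sketch of this procedure, using DAOStarFinder from photutils; the FWHM value, mask radius, and clipping statistics below are illustrative simplifications, not the exact parameters used in this work:

```python
import numpy as np
from photutils.detection import DAOStarFinder

def normalize_8um(img, fwhm_pix=2.4, mask_radius_pix=3):
    """Scale an 8 um cutout to [0, 1], excluding bright point sources from the
    statistics and setting their pixels to 1 afterwards (sub-subsection 3.3.1).
    fwhm_pix and mask_radius_pix are illustrative, not the paper's exact values."""
    finder = DAOStarFinder(threshold=img.mean() + 3 * img.std(), fwhm=fwhm_pix)
    sources = finder(img - np.median(img))
    mask = np.zeros(img.shape, dtype=bool)
    if sources is not None:
        yy, xx = np.mgrid[0:img.shape[0], 0:img.shape[1]]
        for row in sources:
            r2 = (xx - row["xcentroid"]) ** 2 + (yy - row["ycentroid"]) ** 2
            mask |= r2 <= mask_radius_pix ** 2
    vmax = img[~mask].mean() + 3 * img[~mask].std()   # clip level from unmasked pixels
    out = np.clip(img, 0.0, vmax) / vmax
    out[mask] = 1.0                                    # point-source regions set to 1
    return out
```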
Additionally, because all data are saved in the PNG format, they were converted to 256 gradations after the normalization. Then, when the data are input into the model, they are divided by 255 to fit within the range of 0 to 1.
3.3.2 Creation of validation data
The validation data and other data to be inferred are too large to be input directly into SSD, so these data are cropped into windows of various sizes and inferred. We determined the crop sizes as half and multiples of 300 pixels, which is one edge of the SSD’s input size. For example, crop sizes could be 150, 300, 600, or 900 pixels (figure 6). The sliding-window stride was set at |$1/3$| of the crop size to ensure finer image scanning. For instance, if the crop size is 300 pixels, the next cropping position would be at a stride of 100 pixels. This setup allows the SSD model to process the image more efficiently and accurately detect its features. The areas used for validation data are shown in table 3.
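A minimal sketch of this sliding-window cropping scheme (the crop sizes here are a subset of those listed above, and the edge handling is an assumption):

```python
def sliding_window_boxes(width, height, crop_sizes=(150, 300, 600, 900)):
    """Yield (x0, y0, x1, y1) crop windows with a stride of 1/3 of the crop size,
    following the scheme of sub-subsection 3.3.2."""
    for size in crop_sizes:
        stride = size // 3
        for y0 in range(0, max(height - size, 0) + 1, stride):
            for x0 in range(0, max(width - size, 0) + 1, stride):
                yield (x0, y0, x0 + size, y0 + size)

# Example: at a crop size of 300 pixels, successive windows start every 100 pixels.
```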
3.4 Loss
In deep learning, loss refers to measuring how well or poorly a model’s predictions match the true values. It is a numerical value that quantifies the error, with lower values indicating better performance. During training, models use backpropagation to adjust their parameters and minimize the loss by calculating gradients and updating weights.
The loss function for SSD is the sum of the confidence loss for class prediction and the location loss for bounding box regression. The location loss is calculated using the Smooth L1 Loss, which is a loss function robust to outliers. The confidence loss is calculated using Cross-entropy Loss.
SSD calculates the loss by dividing the DBoxes into positive DBoxes and negative DBoxes, where the former have intersection over union (IoU) |$\ge$| 0.5 with a ground truth BBox (Spitzer bubble location information) and the latter have IoU |$<$| 0.5 with every ground truth BBox (figures 7 and 8). For the positive DBoxes, both location loss and confidence loss are calculated, while for the negative DBoxes, only confidence loss is calculated. The number of negative DBoxes is much larger than that of positive DBoxes. Therefore, to avoid bias between negative and positive DBoxes in training, the number of negative DBoxes is limited to three times the number of positive DBoxes, and the negative DBoxes with the highest confidence loss are selected. Additionally, because there are no positive DBoxes in non-bubble data (no ground truth BBoxes), the top 10 negative DBoxes with the highest confidence loss were selected for training.
Fig. 7. IoU represents the degree of overlap between two boxes, defined as the area of their intersection divided by the area of their union. This figure is taken from Nishimoto et al. (2022).
Fig. 8. Examples of positive and negative DBoxes. DBoxes with IoU |$\ge$| 0.5 for the ground truth BBox (white boxes) are judged to be positive DBoxes (yellow boxes). DBoxes with IoU |$<$| 0.5 are judged to be negative DBoxes (orange box).
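The loss computation described above, with hard negative mining at a 3:1 ratio, can be sketched in PyTorch as follows; this is a simplified sketch that assumes background DBoxes are labeled 0 and omits the special rule of keeping only the top 10 negatives for non-bubble images:

```python
import torch
import torch.nn.functional as F

def multibox_loss(loc_pred, conf_pred, loc_true, labels, neg_pos_ratio=3):
    """loc_pred/loc_true: (N, 8732, 4); conf_pred: (N, 8732, 2); labels: (N, 8732),
    where 0 marks a negative (background) DBox and 1 a positive (bubble) DBox."""
    pos = labels > 0
    # Smooth L1 location loss, positives only.
    loc_loss = F.smooth_l1_loss(loc_pred[pos], loc_true[pos], reduction="sum")
    # Cross-entropy confidence loss for every DBox.
    conf_all = F.cross_entropy(conf_pred.view(-1, 2), labels.view(-1),
                               reduction="none").view(labels.shape)
    # Hard negative mining: keep the 3 x n_pos negatives with the largest loss.
    neg_cand = conf_all.clone()
    neg_cand[pos] = 0.0
    n_neg = (pos.sum(dim=1) * neg_pos_ratio).clamp(max=labels.size(1) - 1)
    rank = neg_cand.argsort(dim=1, descending=True).argsort(dim=1)
    neg = rank < n_neg.unsqueeze(1)
    conf_loss = conf_all[pos].sum() + conf_all[neg].sum()
    n_pos = pos.sum().clamp(min=1).float()
    return (loc_loss + conf_loss) / n_pos
```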
3.5 Evaluation criteria
In this study, we used precision, recall, and F2 score (a weighted harmonic mean of precision and recall, giving more weight to recall) as criteria to evaluate the maximum performance of the model. One of the purposes of this study is to find undetected Spitzer bubbles, and increasing the precision may reduce the detection of new undetected Spitzer bubbles. Therefore, we used the F2 score, which emphasizes recall over precision, for performance evaluation. Equations (1)–(3) for calculating precision, recall, and the F2 score are as follows:

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \quad (1)$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}, \quad (2)$$

$$F2 = \frac{5 \times \mathrm{Precision} \times \mathrm{Recall}}{4 \times \mathrm{Precision} + \mathrm{Recall}}. \quad (3)$$
TP (true positive) represents the number of MWP bubbles correctly detected. FP (false positive) indicates objects that were detected as Spitzer bubbles but are not MWP bubbles, while FN (false negative) indicates objects that were detected as non-Spitzer bubbles but are cataloged as Spitzer bubbles by MWP.
Additionally, among the BBoxes that exceed the confidence threshold, those with IoU |$\ge$| 0.5 are merged into the BBox with the highest confidence using NMS.
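For completeness, the IoU between two boxes, used both for the positive/negative DBox assignment and for the NMS merging described above, can be computed as in the following sketch, where boxes are given as (xmin, ymin, xmax, ymax):

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix0, iy0 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix1, iy1 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Example: iou((0, 0, 2, 2), (1, 1, 3, 3)) = 1 / 7 ~ 0.14, below the 0.5 threshold.
```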
4 Details of data optimization
In this section, we introduce the effects of three optimizations to improve the performance of the model: (1) selection of MWP bubbles, (2) data augmentation, and (3) non-bubble clustering. We illustrate the impact of each optimization by comparing the transitions in precision, recall, and F2 scores. In subsections 4.1, 4.2, and 4.3, we describe the experiments that we conducted with default values of mini-batch size = 8, learning rate = |$1\times 10^{-4}$|, and weight decay = |$1\times 10^{-4}$| (see section 5). In addition, SSD utilizes negative sampling, a form of random sampling, for non-bubble data to facilitate effective learning.
4.1 Selection of Spitzer bubbles
The MWP bubbles include many objects with unclear 8 µm shell structures and 24 µm distributions. These objects can obscure the criteria by which the model detects Spitzer bubbles and negatively impact the detection of clear Spitzer bubbles. To understand how data ambiguity affects model accuracy, we ranked MWP bubbles into three categories and trained the model using training and validation data with different ranks (Rank 1, Ranks |$1+2$|, Ranks |$1+2+3$|). We applied these three models to the test region and determined the optimal rank for training and validation data based on the accuracy of newly detected bubbles.
In this study, we classify the MWP bubbles used into three patterns: Rank 1, Rank 2, and Rank 3, as shown in figure 9. Rank 1 includes bubbles where 8 µm encloses 24 µm. Rank 2 includes bubbles where a distorted 8 µm encloses 24 µm. Rank 3 includes bubbles where 8 µm does not enclose 24 µm. The numbers of bubbles identified as Rank 1, Rank 2, and Rank 3 were 634, 952, and 815, respectively. The total number of bubbles classified as Rank 1, Rank 2, and Rank 3 is 2401, excluding MWP bubbles that span multiple FITS files. We evaluated the performance of the model by changing the MWP bubbles used in training and validation data to Rank 1, Ranks |$1+2$|, and Ranks |$1+2+3$|, using the results of the test region. Because the validation data also use ranked MWP bubbles, each model has different validation data. Therefore, the comparisons in terms of precision, recall, and the F2 score are only for reference. The performance of each model is compared using inference results in the test region. In the test region, there are 84 Rank 1 MWP bubbles, 111 Rank 2 MWP bubbles, and 91 Rank 3 MWP bubbles.
Fig. 9. Examples of the three ranks of MWP bubbles: Rank 1 (8 µm encompassing 24 µm, 634 bubbles), Rank 2 (distorted 8 µm encompassing 24 µm, 952 bubbles), and Rank 3 (no encompassing, 815 bubbles). The name of the MWP bubble used is noted in the top left-hand corner of each image. The 8 and 24 µm emissions are shown in green and red, respectively.
We created training and validation data in three patterns: using only Rank 1, Ranks |$1+2$|, and Ranks |$1+2+3$| MWP bubbles. We trained three models with these three patterns of data and applied the three trained models to the test region. As a result, the number of objects newly detected as bubbles in the test region was 193 for the model trained with Rank 1, 625 for the model trained with Ranks |$1+2$|, and 875 for the model trained with Ranks |$1+2+3$|.
Figure 10 displays 25 randomly selected objects detected by each model. It is clear that the results of the models trained with Ranks |$1+2$| and Ranks |$1+2+3$| MWP bubbles contain a significant number of false positives, i.e., structures that are evidently not associated with high-mass stars. Therefore, we conclude that the inclusion of MWP bubbles with unclear 8 µm shell structures and 24 µm distributions in the training data increases false positives. In subsequent experiments, we used only Rank 1 MWP bubbles for training and validation data.
Fig. 10. Comparison of newly detected Spitzer bubble objects in the test region when changing the rank of the MWP bubbles used in the training data. The objects capturing features formed by the radiation of high-mass stars are ID = 4, 5, 7, 8, 10, 13, 14, 15, 16, 17, 18, 19, 20, 21, 23, and 24 (16/25) for Rank 1; ID = 1, 4, 7, 10, 12, 14, 15, 19, 20, 21, 22, 24, and 25 (13/25) for Ranks |$1+2$|; and ID = 1, 7, and 19 (3/25) for Ranks |$1+2+3$|. The 8 and 24 µm emissions are shown in green and red, respectively.
4.2 Effect of data augmentation
Data augmentation (DA) is expected to improve model performance by expanding the amount of data through image processing, including rotation and flipping, thereby preventing overfitting. We attempted to improve performance with DA because there are only 647 Rank 1 MWP bubbles. In this study, we used translation, rotation, and flipping as DA techniques. Rotation and flipping were applied to Spitzer bubbles that had been translated beforehand, as shown in figure 11. While numerous other augmentation methods, such as cut-out (DeVries & Taylor 2017), mix-up (Zhang et al. 2018), and GAN-based image generation (Goodfellow et al. 2014), are available, we focused on these conventional techniques to ensure a balanced and effective training process. Future work could explore the potential benefits of incorporating more advanced augmentation strategies.
Fig. 11. Example of data augmentation. Rank 1 MWP bubbles were translated and augmented using five patterns of flipping and rotation. The augmented data included upside-down and left–right flipped images, as well as images rotated by |$90^\circ$|, |$180^\circ$|, and |$270^\circ$|. An additional experiment with further rotation angles (|$45^\circ$|, |$135^\circ$|, |$225^\circ$|, and |$315^\circ$|) resulted in an F2 score of 0.619, which is comparable to the score without these additional angles (F2 score = 0.666). Based on these findings, we limited rotations to |$90^\circ$|, |$180^\circ$|, and |$270^\circ$|.
In the translation process, a bubble is cropped from the data and randomly positioned within the image, ensuring that it remains fully contained within the image boundaries. For each epoch, the cropping size of the bubble is randomly chosen, ranging from 1.3 to 6 times the actual size of the bubble. The upper limit of 6 times the size ensures that the bubble remains recognizable within a 300 |$\times$| 300 image. We updated the bubble data created in this way every epoch with a fixed random seed to improve the generalization performance of the model (as explained in subsection 3.3).
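A hedged sketch of this translation step, assuming the bubble's bounding box (top-left corner and size, in pixels of the parent mosaic) is known; the edge-clamping behavior is an assumption:

```python
import random

def random_translation_crop(bx, by, bsize, img_w, img_h, scale_range=(1.3, 6.0)):
    """Choose a crop window scale_range times the bubble size and place it so the
    bubble lands at a random position while staying fully inside the window
    (subsection 4.2). Returns (x0, y0, crop_size) in parent-image pixels."""
    crop = int(random.uniform(*scale_range) * bsize)
    x0 = bx - random.randint(0, crop - bsize)   # bubble offset within the window
    y0 = by - random.randint(0, crop - bsize)
    x0 = max(0, min(x0, img_w - crop))          # keep the window inside the mosaic
    y0 = max(0, min(y0, img_h - crop))
    return x0, y0, crop

# The cropped window is then resized to 300 x 300 and randomly flipped or
# rotated by 90, 180, or 270 degrees (e.g., with numpy.rot90 / numpy.flip).
```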
To measure the performance improvement with DA, we compared the precision, recall, and F2 score using the test results of the model trained with and without DA. Figure 12 shows a comparison of 25 randomly selected newly detected objects classified as Spitzer bubbles when evaluating the test region. The model trained without DA detected many point sources, whereas the model trained with DA detected many objects that could be considered Rank 1 or 2 MWP bubbles. In this region, the precision, recall, and F2 score with DA were 0.293, 0.976, and 0.666, respectively, while the precision, recall, and F2 score without DA were 0.0596, 0.833, and 0.232, respectively. Based on these results, we conclude that training with DA increased the recall of Rank 1 MWP bubbles and effectively suppressed false positives due to the enhanced background patterns in the bubble data. Therefore, in subsequent experiments, we applied DA to the bubble data in addition to selecting Spitzer bubbles for the training data.
Fig. 12. Comparison of the 25 randomly selected objects that were newly detected as Spitzer bubbles by the two models trained on training data with data augmentation (DA) and training data without DA (non-DA). The 8 and 24 µm emissions are shown in green and red, respectively.
4.3 Negative sampling and non-bubble clustering
Here, we select the same number of non-bubble objects as data-augmented bubbles for training. Randomly selected non-bubble data may include Spitzer bubbles that have not been previously detected, which could degrade the training performance. Additionally, statistically, most non-bubble images contain only stars, as shown in figure 4d. Therefore, it is important to include infrared structures that are not Spitzer bubbles, such as infrared ridges or cores, as non-bubble objects in the negative data, in order to prevent these structures from being misidentified as Spitzer bubbles.
To address this issue, we first randomly selected 120000 non-bubble data samples from all regions and applied a clustering-based negative sampling method, as illustrated in figure 13. Using k-means clustering, we divided the non-bubble data into 10 clusters. This method proved to be the most effective for clustering non-bubble data, including undetected Spitzer bubbles. The breakdown of these 10 clusters is shown in figure 14. Among these clusters, clusters 6 and 10 were excluded because they were found to likely contain Spitzer bubbles. For the remaining clusters, the number of images per cluster was balanced through negative sampling, ensuring that each cluster contributed an equal number of samples. This balancing approach helped optimize the training process for non-bubble data. Of particular importance is that clusters 1, 4, 5, and 8 include complex infrared structures, such as filaments and ridges, which are not related to Spitzer bubbles. By including these structures in the negative sampling process, we ensured that the training dataset accounted for diverse non-bubble features, thereby enhancing the model’s robustness.
Fig. 13. Method of clustering non-bubble data (|$300 \times 300$| pixels) using feature maps (|$N \times 1 \times$| 1). First, we created non-bubble data. Secondly, the non-bubble data were compressed down to |$1 \times 1$| using the VGG layer of the pre-trained SSD. Finally, they were clustered by k-means after being flattened. k-means is a non-hierarchical clustering method that divides data into k clusters (MacQueen et al. 1967). The figure shows an example when classified into three classes.
Fig. 14. Results of clustering the non-bubble data by the method shown in figure 13. The numbers of images for clusters 1–10 were 4675, 29886, 17717, 7253, 11447, 536, 18803, 2362, 26231, and 1090. Due to the large number of images, only 16 from each cluster are shown, randomly selected. This result shows many images with no structures, such as clusters 2, 7, and 9. Because the number of images in each cluster differs greatly, it is clear that the SSD cannot uniformly learn the characteristics of each cluster using simple random sampling from all non-bubble data. Clusters 6 and 10, with Spitzer bubble-like structures, were excluded from the non-bubble data (marked with a cross). By learning for each cluster, SSD can efficiently learn images of clusters 1 and 8, which have structures that are prone to false positives.
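A minimal sketch of this clustering step, assuming a feature extractor (here called `vgg_backbone`) that maps each two-channel 300 |$\times$| 300 cutout to a |$1 \times 1$| feature map, and using scikit-learn's KMeans:

```python
import torch
from sklearn.cluster import KMeans

def cluster_non_bubble(images, vgg_backbone, n_clusters=10, random_state=0):
    """Compress non-bubble cutouts to feature vectors with a pre-trained backbone
    and group them with k-means (subsection 4.3). `images` is an array of shape
    (N, 2, 300, 300); `vgg_backbone` is assumed to return (N, C, 1, 1) features."""
    with torch.no_grad():
        feats = vgg_backbone(torch.as_tensor(images, dtype=torch.float32))
    feats = feats.flatten(start_dim=1).cpu().numpy()              # (N, C)
    labels = KMeans(n_clusters=n_clusters, random_state=random_state).fit_predict(feats)
    return labels  # clusters that turn out to contain bubble-like images can be dropped
```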
To measure the performance, we compared precision, recall, and F2 score with the test results of the model trained with and without non-bubble clustering. In the test region, precision, recall, and F2 score with non-bubble clustering were 0.419, 0.893, and 0.728, respectively, and precision, recall, and F2 score without non-bubble clustering were 0.293, 0.976, and 0.666, respectively.
For the purpose of creating pure non-bubble data, it is also possible to apply a pre-trained model to all regions and recreate non-bubble data from regions outside MWP bubbles and newly detected bubbles, in addition to the clustering method. However, this method is more time-consuming and the model cannot efficiently learn the non-bubble data with various features; therefore, we adopted the clustering method.
From these results, we conclude that the performance of the model improves by selecting Spitzer bubbles, performing data augmentation, adding infrared structures not related to the bubbles, and removing undetected bubbles through clustering non-bubble data in the training data.
5 Hyperparameter optimization
In SSD, some hyperparameters, such as the bubble mini-batch size, learning rate, and weight decay, need to be predetermined before training. We attempted to improve the performance of the model by optimizing the hyperparameters using Bayesian optimization, one of the hyperparameter optimization methods. In this study, we searched the hyperparameters within the ranges shown in table 4 using Weights & Biases sweeps. We ran 40 search trials and found that the F2 score tended to be higher when the learning rate and weight decay were below 0.0001 and the bubble mini-batch size was 32. The parameter values that gave the highest F2 score on the validation data (0.598) were as follows: a learning rate of |$7.7461\times 10^{-5}$|, a weight decay of |$8.3171\times 10^{-5}$|, and a bubble mini-batch size of 32. Figure 15 compares the changes in F2 score, recall, and precision between the default hyperparameters and the best parameters. The precision increased overall, and although the recall increased more slowly than with the default parameters, it eventually reached the same level. In section 6, we use the model obtained with the best parameters.3
Fig. 15. Comparison of precision, recall, and F2 scores of the best and default models. The x-axis represents the number of epochs, while the y-axis represents precision, recall, and F2 score, ranging from 0 to 1.
Table 4. Hyperparameters explored by Bayesian optimization.*

Loss hyperparameters:

| Type | Min | Max |
|---|---|---|
| Learning rate | 0.000001 | 0.001 |
| Weight decay | 0.000001 | 0.001 |

Mini-batch hyperparameter:

| Type | Batch size |
|---|---|
| Bubble mini-batch | 2, 4, 8, 16, 32 |

*The learning rate and weight decay were explored in the range of 0.00001 to 0.1, and the mini-batch was explored in the range of 2, 4, 8, 16, and 32.
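For illustration, a Bayesian search over the ranges in table 4 can be configured with Weights & Biases sweeps roughly as follows; the metric name, project name, and choice of a log-uniform distribution are assumptions rather than details taken from this paper:

```python
import wandb

sweep_config = {
    "method": "bayes",                               # Bayesian optimization
    "metric": {"name": "val_f2", "goal": "maximize"},
    "parameters": {
        "learning_rate": {"distribution": "log_uniform_values",
                          "min": 1e-6, "max": 1e-3},
        "weight_decay":  {"distribution": "log_uniform_values",
                          "min": 1e-6, "max": 1e-3},
        "bubble_minibatch": {"values": [2, 4, 8, 16, 32]},
    },
}
sweep_id = wandb.sweep(sweep_config, project="spitzer-bubble-ssd")
# wandb.agent(sweep_id, function=train, count=40)    # 40 trials, as in section 5
```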
6 Result and discussion
6.1 Detection of Spitzer bubbles
In this section, we show the performance and validity of our model. First, we apply the model to the test region and compare the result with the MWP bubbles. Then, we extend the application to the training and validation regions. Last, we examine the validity of our model for detecting Spitzer bubbles in other data in three regions: Cygnus X, the LMC, and NGC 628. We show here the characteristics of the detected bubbles. For the inferences, we used crop sizes of 100, 150, 300, 600, 900, 1200, 1500, 1800, 2400, and 3000 pixels, and the sliding-window stride was set at |$1/3$| of the cropping size.
6.1.1 Application to the test region
We applied the model to the test region, which was not used for training and validation. The test region is set to |$10.^{\!\!\!\circ }5 \leq l \leq 22.^{\!\!\!\circ }5$|, |$-1^{\circ } \leq b \leq 1^{\circ }$| (table 3), and contains 286 MWP bubbles (Rank 1: 84; Rank 2: 111; Rank 3: 91). The model evaluated the test region and detected 289 objects (inference time, 23 min). Figure 16 shows the detected objects in magenta and Rank 1 MWP bubbles in white. About 80% of the detected objects are located within galactic latitudes |$-0.^{\!\!\!\circ }5 \leq b \leq 0.^{\!\!\!\circ }5$|, and various sizes of objects can be seen. Table 5 compares the detected objects with Rank 1 MWP bubbles. Out of the 84 Rank 1 MWP bubbles, 82 were detected, resulting in a very high detection rate of 98% for Rank 1 MWP bubbles.

Fig. 16. Test region (|$10.^{\!\!\!\circ }5 \leq l \leq 22.^{\!\!\!\circ }5$|, |$-1^{\circ } \leq b \leq 1^{\circ }$|) with the 289 objects detected by our model as Spitzer bubbles (magenta squares) and the Rank 1 MWP bubbles (white circles). The 8 and 24 µm emissions are shown in green and red, respectively.
Table 5. Comparison between the objects detected by our model and the Rank 1 MWP bubbles in the test region.

| MWP | Predicted bubble | Predicted non-bubble |
| --- | --- | --- |
| Bubble | 82 | 2 |
| Non-bubble | 207 | — |

The number of Rank 1 MWP bubbles in the test region is 84, while our model detected 289 objects.
Figure 17 shows a histogram of the sizes of the Rank 1 MWP bubbles that could and could not be detected. The two undetected bubbles were |${{0^{\prime}\!\!.{}34}}$| (= 17 pixels) and |${{10^{\prime}\!\!.{}34}}$| (= 517 pixels) in size, indicating that both the smallest and the largest bubbles were missed. The detection rate by MWP bubble rank is 98% for Rank 1, 63% for Rank 2, and 22% for Rank 3; over all MWP bubbles, the detection rate is 60%. The 40% of MWP bubbles not detected by our model have distorted shapes, and most of them may be objects for which considerable time has passed since bubble formation, objects located in complex environments, or objects not associated with high-mass star formation.

Fig. 17. Size comparison of the Rank 1 MWP bubbles that could and could not be detected by our model.
Figure 18 shows the 111 objects newly detected as Spitzer bubbles by the model. At least 50% of the newly detected objects, such as ID = 7, 10, 20, 25, 48, and 90, can be regarded as Rank 1 or 2 bubbles with 8 µm emission encompassing 24 µm emission, indicating the effectiveness of the optimization performed in sections 4 and 5. On the other hand, some objects do not capture the characteristics of Spitzer bubbles. For example, objects with ID = 3, 4, 6, and 9 show strong 24 µm and weak 8 µm emission. These false detections are likely due to the inclusion of inappropriate bubble data in the training set. As explained in sub-subsection 3.3.1, the bubble data are normalized to a range of 0 to 1 using areas other than the point sources detected by DAOStarFinder. Subsequently, as described in subsection 4.2, the bubble data undergo data augmentation, including enlargement. During this process, large bright objects may be included; because they are not judged to be point sources, they are normalized as they are. In such cases, the maximum value of the large bright object becomes 1, causing the emission from the MWP bubble to be underestimated. In particular, the 8 µm emission from MWP bubbles is easily underestimated, resulting in bubble data where the 24 µm emission is prominent. Trained on such data, the model learns to identify point sources that are bright at 24 µm as Spitzer bubbles, leading to detections such as ID = 3, 4, 6, and 9. Finding an appropriate method to remove objects that are compact at 8 µm but brighter and more extended than point sources remains a challenge for future work. ID = 0 and 1 appear to be misidentifications caused by diffraction spikes at 8 µm (Hora et al. 2012).
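The following is a minimal sketch of this normalization step for a single-band cutout stored as a 2D NumPy array; the DAOStarFinder parameters and the circular masking radius are illustrative assumptions. It shows how a bright extended source that is not flagged as a point source stays in the unmasked pixels, sets the maximum, and thereby suppresses the bubble emission after scaling.

```python
# A minimal sketch of the normalization discussed above.  The DAOStarFinder
# parameters and the masking radius are assumptions for illustration.
import numpy as np
from astropy.stats import sigma_clipped_stats
from photutils.detection import DAOStarFinder

def normalize_excluding_point_sources(cutout, fwhm=3.0, nsigma=5.0, mask_radius=5):
    mean, median, std = sigma_clipped_stats(cutout, sigma=3.0)
    finder = DAOStarFinder(fwhm=fwhm, threshold=nsigma * std)
    sources = finder(cutout - median)  # table of detected point sources (or None)

    # Mask a small circular region around each detected point source.
    mask = np.zeros(cutout.shape, dtype=bool)
    if sources is not None:
        yy, xx = np.mgrid[:cutout.shape[0], :cutout.shape[1]]
        for row in sources:
            r2 = (xx - row["xcentroid"]) ** 2 + (yy - row["ycentroid"]) ** 2
            mask |= r2 < mask_radius ** 2

    # Scale to [0, 1] using only the unmasked pixels; a bright extended source
    # left unmasked dominates vmax and suppresses the bubble emission.
    vmin = np.nanmin(cutout[~mask])
    vmax = np.nanmax(cutout[~mask])
    if vmax <= vmin:
        return np.zeros_like(cutout)
    return np.clip((cutout - vmin) / (vmax - vmin), 0.0, 1.0)
```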

Fig. 18. The 111 objects newly detected as Spitzer bubbles by our model in the test region, sorted by size. The scale bar is 40|$^{\prime \prime }$|. The 8 and 24 µm emissions are shown in green and red, respectively.
Additionally, some objects, such as ID = 101 and 106, are the same structure detected with different boxes. In SSD, BBoxes detected as bubbles with an overlap rate of 30% or more are considered the same object, but ID = 101 and 106 have an overlap rate of less than 30%. Some Spitzer bubbles are in contact with several other bubbles, making it difficult to separate them with such a simple numerical criterion.
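For reference, this criterion can be sketched as follows, interpreting the 30% overlap rate as an intersection-over-union (IoU) threshold on boxes given as (x_min, y_min, x_max, y_max); the function names are illustrative.

```python
# A sketch of the merging criterion, interpreting the 30% overlap rate as an
# intersection-over-union (IoU) threshold; the function names are illustrative.
def iou(box_a, box_b):
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    ix0, iy0 = max(ax0, bx0), max(ay0, by0)
    ix1, iy1 = min(ax1, bx1), min(ay1, by1)
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0

def same_object(box_a, box_b, threshold=0.3):
    # Boxes overlapping by 30% or more are merged into one detection; boxes of
    # touching bubbles that overlap by less than 30% (e.g. ID = 101 and 106)
    # remain separate, which produces the duplicates discussed above.
    return iou(box_a, box_b) >= threshold
```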
6.1.2 Application to the training and validation regions
We applied the model to the entire region, including the areas used for training and validation (|$1^{\circ } \leq |l| \leq 65^{\circ }$|, |$|b| \leq 1^{\circ }$|), to detect Spitzer bubbles that MWP could not detect. The model detected 3006 objects, of which 1413 were newly detected as bubbles. The detection rate for all MWP bubbles was almost the same as in the test region, and the detection rates for each rank were also almost identical (table 6).
Table 6. Comparison of detection rates for ranked MWP bubbles in the test region and in the region |$1^{\circ } \leq |l| \leq 65^{\circ }$|, |$|b| \leq 1^{\circ }$|.

| Region | Rank 1 | Ranks 1 + 2 | Ranks 1 + 2 + 3 |
| --- | --- | --- | --- |
| Test region | 98% | 78% | 60% |
| Entire training and validation regions | 97% | 81% | 63% |
The detection rate for Rank 1 MWP bubbles was very high, while it decreased as Rank 2 and Rank 3 MWP bubbles were included. This pattern suggests that the model preferentially selects objects with Rank 1-like properties, and indeed many of the newly detected objects appear to be Spitzer bubbles. However, some newly detected objects are small and contain point sources or fuzzy structures, indicating a certain number of false detections; with the current model performance, these need to be excluded manually. Table 7 shows part of the catalog of the newly detected bubbles.
Table 7. Part of the catalog of the 1413 objects that our model newly detected as Spitzer bubbles in the entire training and validation regions.*

| Name | |$G_{\mathrm{LON}}$|† (|$^{\circ }$|) | |$G_{\mathrm{LAT}}$|† (|$^{\circ }$|) | Radius (|$^{\prime }$|) |
| --- | --- | --- | --- |
| |$SB-GP_{0}$| | 5.7442 | 1.0089 | 0.22 |
| |$SB-GP_{1}$| | 14.3548 | 0.5842 | 0.22 |
| |$SB-GP_{2}$| | 295.5209 | |$-$|0.2372 | 0.23 |

*The complete version of table 7 and images of the newly detected objects are available as supplementary data (table 7 and figure E1).
†|$G_{\mathrm{LON}}$| and |$G_{\mathrm{LAT}}$| are the galactic longitude and latitude of the central positions of the objects.
The inference time for this region was approximately 3.6 h, significantly shorter than the period required for the MWP classification. Given the high detection rate for Rank 1 MWP bubbles and the characteristics of the newly detected bubbles, the deep-learning model is clearly highly effective in detecting Spitzer bubbles.
6.1.3 Application to the Cygnus X region
We now present the results of applying our model to Cygnus X. Cygnus X is one of the most active star-forming regions in the Milky Way, located at a distance of approximately 1.4 kpc (Rygl et al. 2012). It contains hundreds of OB-type stars (Wright et al. 2015), and 47 MWP bubbles have been listed by Jayasinghe et al. (2019). Schneider et al. (2006, 2007) showed that the molecular clouds in Cygnus X form connected groups, and most of the molecular clouds in this region are understood to be at the same distance. We verified whether it is possible to detect Spitzer bubbles in Cygnus X, which contains such massive molecular cloud complexes. The data used were survey data from IRAC and MIPS centered around [RA(J2000.0), Dec(J2000.0)] = (20h30m25.8s, +40°0′), covering an area of 24 deg|$^2$| (Papovich et al. 2016).
Our model detected 69 objects in Cygnus X. Figure 19 shows the detected objects in magenta and the MWP bubbles in white; objects of various sizes were detected. Table 8 shows the confusion matrix between the objects detected by our model and the MWP bubbles. The detection rate for all MWP bubbles was 62%. Because the MWP bubbles in Cygnus X are not ranked, this value should be compared with the detection rate for all MWP bubbles in the test region (60%); the two rates are almost the same. Thus, even in Cygnus X, where star formation is active and the 8 µm distribution is complex, the bubble detection rate of the model was close to that in the test region.

Fig. 19. Cygnus X with the 69 objects detected by our model as Spitzer bubbles (magenta squares) and the MWP bubbles (white circles). The 8 and 24 µm emissions are shown in green and red, respectively.
Table 8. Confusion matrix between the objects detected by our model and the MWP bubbles in Cygnus X.

| MWP | Predicted bubble | Predicted non-bubble |
| --- | --- | --- |
| Bubble | 29 | 18 |
| Non-bubble | 40 | — |

The number of MWP bubbles in Cygnus X is 47, while our model detected 69 objects.
Figure 20 shows the 40 objects newly detected as bubbles, sorted by size. Many of the detected objects are bubbles where 8 µm encloses 24 µm (e.g., ID = 3, 13, 21, 23, 26, 30, and 36). On the other hand, objects such as ID = 7, 12, 20, and 38 are faint and extended at 24 µm, similar to their appearance at 8 µm. These objects have shell-like structures, so the model judged them as bubbles, but it is difficult to determine whether they are Spitzer bubbles. Furthermore, objects such as ID = 32 and 33, which have isolated 8 µm distributions with strong 24 µm point sources, were also detected. To suppress such cases, it will be necessary to improve the training data and increase the number of training iterations. The inference time for Cygnus X was approximately 8 min.

Fig. 20. The 40 objects in Cygnus X newly detected as Spitzer bubbles by our model, sorted by size. The scale bar corresponds to 0.5 pc. A coordinate catalog of the objects newly detected as Spitzer bubbles in Cygnus X is available as supplementary data in table E1. The 8 and 24 µm emissions are shown in green and red, respectively.
6.1.4 Application to the LMC
The LMC is an external galaxy located approximately 50 kpc from the Milky Way (Pietrzyński et al. 2019). Owing to its small inclination (approximately 35|$^\circ$|: van der Marel & Cioni 2001), individual objects in the LMC have little uncertainty in distance. Additionally, the gas-to-dust mass ratio varies significantly across the galaxy and is about 2–4 times the value near the Sun (Gordon et al. 2003). Observational studies of the LMC are therefore important for investigating high-mass star formation in environments different from the Milky Way. However, the LMC is located farther away than objects in the Milky Way, resulting in worse spatial resolution than in the test region. No Spitzer bubble catalog has been produced for the LMC. The data were taken from the Surveying the Agents of a Galaxy's Evolution (SAGE) survey (Meixner et al. 2006), observed with IRAC (8 µm) and MIPS (24 µm).
Figure 21 shows in magenta the 128 objects identified as bubbles by the model. No object exceeding a few hundred parsecs was detected, but objects with sizes of roughly 10–100 pc were successfully identified. Figure 22 shows images of the 128 detected objects, sorted by size. Many of the detected objects show 24 µm emission enclosed by 8 µm emission. Additionally, small objects such as ID = 0–39 (except for 28, 31, 36, and 38) have characteristics similar to yellow balls. The typical size of bubbles is less than several tens of parsecs (see subsection 1.1), and considering the spatial resolution of the LMC data (approximately 0.49 pc pixel|$^{-1}$|), these objects may be unresolved bubble structures.

Fig. 21. The LMC with the 128 objects detected by our model as Spitzer bubbles (magenta circles). The 8 and 24 µm emissions are shown in green and red, respectively.

Fig. 22. The 128 objects in the LMC newly detected as Spitzer bubbles by our model, sorted by size. The scale bar is 10 pc. A coordinate catalog of the objects newly detected as Spitzer bubbles in the LMC is available as supplementary data in table E2. The 8 and 24 µm emissions are shown in green and red, respectively.
However, it has been confirmed that some objects are associated with Mira-type variable stars (ID = 6, 25, and 38) and T Tauri stars (ID = 21). Objects associated with such stars have radiation spectra similar to Spitzer bubbles, making them difficult to distinguish. Furthermore, objects such as ID = 69, 76, and 82, where the 8 and 24 µm distributions are similar to those seen in the Cygnus X region, have an extended 24 µm distribution resembling the appearance at 8 µm, making it difficult to determine whether they really are Spitzer bubbles.
In addition to these characteristics, ID = 44 (ESO 55-29), 45 (NGC 2150), and 47 (ESO 56-154) are galaxies or active galactic nuclei, and ID = 60 and 104 (N 132D) are supernova remnants. Galaxies have 8 and 24 µm emission similar to Spitzer bubbles, making them difficult to distinguish; therefore, the model may have mistakenly detected them.
The inference time for the LMC was approximately 20 min.
6.1.5 Application to NGC 628
NGC 628 (also known as M 74) is a spiral galaxy located 9.84 |$\pm$| 0.63 Mpc from our galaxy. It is a face-on galaxy, allowing for detailed studies of spiral arms, star formation regions, and the ISM. Since high angular resolution observations have been made toward NGC 628 from optical to radio wavelengths, such as PHANGS-ALMA (Atacama Large Millimeter/submillimeter Array: Leroy et al. 2021), PHANGS-MUSE (Multi Unit Spectroscopic Explorer: Emsellem et al. 2022), and PHANGS-HST (Hubble Space Telescope: Lee et al. 2022), it is an ideal laboratory for understanding galaxy formation and evolution. PHANGS-JWST (Watkins et al. 2023) detected 1964 Spitzer bubbles in NGC 628 using PHANGS data and JWST 7.7 µm. We used JWST 7.7 and 21 µm data, which are close in wavelength to Spitzer 8 and 24 µm, to verify whether Spitzer bubbles can be detected in galaxies located farther than the LMC.
Since the spatial resolution in distant galaxies such as NGC 628 is more than 10 times worse than in the LMC, many Spitzer bubbles with compact 8 and 24 µm distributions, such as yellow balls, may be detected. Therefore, for NGC 628, we added crop sizes of 25, 50, and 75 pixels to the inference crop sizes, resulting in 25, 50, 75, 100, 150, 300, 600, and 900 pixels. Crop sizes of 1200, 1500, 1800, 2400, and 3000 pixels were excluded because they would exceed the size of the observational data. The sliding-window stride remained |$1/3$| of the crop size. One pixel of the JWST observation data of NGC 628 is |${0^{\prime\prime}.11}$|, corresponding to 5.28 pc.
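As a quick consistency check of the quoted pixel scale, the small-angle relation gives the physical size per pixel from the adopted distance; the short snippet below assumes the 9.84 Mpc distance quoted above.

```python
# A quick check of the quoted pixel scale using the small-angle relation,
# assuming the 9.84 Mpc distance adopted above.
distance_pc = 9.84e6      # distance to NGC 628 in parsecs
pixel_arcsec = 0.11       # JWST pixel scale in arcseconds
arcsec_per_rad = 206265.0

pixel_pc = distance_pc * pixel_arcsec / arcsec_per_rad
print(f"{pixel_pc:.2f} pc per pixel")  # ~5.25 pc, close to the quoted 5.28 pc
```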
As a result of the inference, the model detected 203 objects as Spitzer bubbles. Figure 23 shows the distribution of the detected objects. The sizes of the detected bubbles range from approximately 40 to 400 pc, with many bubbles distributed along the spiral arms. Many of the detected objects show 24 µm emission enclosed by 8 µm emission (figure 24). On the other hand, as in the LMC, objects with compact 8 and 24 µm distributions were also detected owing to the low resolution. The inference time for NGC 628 was approximately 5 min.

Fig. 23. JWST image of NGC 628 overlaid with the 203 objects detected by our model as Spitzer bubbles (magenta circles). Many of the objects are distributed along the arms. The 7.7 and 21 µm emissions are shown in green and red, respectively.

Fig. 24. The 203 objects newly detected as Spitzer bubbles by our model in NGC 628, sorted by size. The scale bar corresponds to 3 pc. A coordinate catalog of the objects newly detected as Spitzer bubbles in NGC 628 is available as supplementary data in table E3. The 7.7 and 21 µm emissions are shown in green and red, respectively.
6.1.6 Characteristics of Spitzer bubbles detected by our model
In this study, we ranked MWP bubbles and used only Rank 1 MWP bubbles for training and validation data. As shown in subsection 4.1, using all MWP bubbles in the training data increases the number of obvious false detections that do not have the characteristics of Spitzer bubbles.
The main concern with restricting the training data to ranked MWP bubbles is the potential omission of Spitzer bubbles generated by high-mass stars. However, our model achieved a very high detection rate of 97% for Rank 1 MWP bubbles. Additionally, the model newly detected objects with characteristics similar to those of Rank 1 MWP bubbles, and the number of these newly detected objects is approximately equal to the number of Rank 1 MWP bubbles, effectively doubling the sample size. This demonstrates the limitations of detection by visual inspection and highlights the importance of deep-learning detection methods like the one used in this study.
Furthermore, such comprehensive bubble detection is fundamental research for understanding star formation mechanisms, as seen in the statistical estimates of triggered star formation by Thompson et al. (2012) and Kendrew et al. (2012).
6.2 Detection of (super-)bubbles created by supernovae
In detecting Spitzer bubbles in the LMC and NGC 628, large shell-like structures observed in the 8 µm emission were rarely detected. This is because 24 µm emission is not observed within these shells, reflecting that they were not formed in association with recent high-mass star formation. In this subsection, we introduce the detection of shell-like structures observed in the 8 µm emission band in the LMC and NGC 628 (hereafter, 8 µm shell-like structures). Recent JWST observations have confirmed many 8 µm shell-like structures in NGC 628 and other galaxies (Barnes et al. 2023; Mayya et al. 2023). Shell-like structures larger than several hundred parsecs are thought to have been formed by supernova explosions. The ISM is swept up by shock waves from supernova explosions approximately once every million years, and molecular clouds are born through this repeated sweeping. Filamentary molecular clouds perpendicular to the magnetic field are formed with each compression, and star formation begins when the line density becomes sufficiently large (Inutsuka et al. 2015). Many 8 µm shell-like structures are considered to be formed through this process. We attempted to detect 8 µm shell-like structures by applying the methods established in sections 4 and 5.
We developed the model using the methods described in sections 4 and 5, changing only the training data. We created the training data by extracting only the 8 µm data (Spitzer IRAC) from the training data used in section 5. The MWP bubbles were reselected to include only objects with well-confirmed 8 µm shell-like structures (see subsection 4.1). We note that the training data consist of the 8 µm shells associated with Spitzer bubbles, not shells created by supernovae.
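The following is a minimal sketch of how the single-wavelength training cutouts can be prepared, assuming each two-wavelength cutout is stored as an array of shape (height, width, 2) with the 8 µm map in channel 0; the channel ordering and the duplication used to keep the network input shape unchanged are illustrative assumptions, not necessarily our actual implementation.

```python
# A minimal sketch of preparing 8 µm-only training cutouts.  The channel
# ordering (8 µm in channel 0) and the duplication to keep a 2-channel input
# are assumptions for illustration.
import numpy as np

def to_8um_only(cutout_2ch):
    """Drop the 24 µm channel; duplicate the 8 µm map to keep a 2-channel input."""
    band_8um = cutout_2ch[..., 0]
    return np.stack([band_8um, band_8um], axis=-1)
```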
6.2.1 Application to the LMC
We trained a model capable of detecting 8 µm shell-like structures and applied it to the LMC. The model detected 469 objects as 8 µm shell-like structures (figure 25), with sizes ranging from about 10 pc to about 900 pc. Most of these objects have well-confirmed 8 µm shell-like structures, and some are accompanied by 24 µm emission and are thought to be Spitzer bubbles.

Fig. 25. A Spitzer image of the LMC overlaid with the 469 objects detected by the model as 8 µm shell-like structures (magenta circles). An image and a coordinate catalog of the individual 8 µm shell-like structures are available as supplementary data in figure E2 and table E4. The 8 and 24 µm emissions are shown in green and red, respectively.
The crop size and sliding-window stride are the same as in sub-subsection 6.1.4. The inference time was also about 20 min.
6.2.2 Application to NGC 628
Next, we introduce the results of applying this model to the 7.7 µm data of NGC 628 observed by JWST. The crop sizes and the sliding-window stride were the same as those in sub-subsection 6.1.5. The model detected 143 objects as bubbles (figure 26). As in the LMC, most of the objects have 8 µm shell-like structures. Even though the model was trained on the 8 µm shell structures of Spitzer bubbles, which are formed by the radiation of young high-mass stars, it can detect 8 µm shell-like structures of the order of a few hundred parsecs that may have been formed by supernova explosions. However, some 8 µm shell-like structures in NGC 628 were not detected. This may be because, while Spitzer bubbles typically exhibit a ring structure in 8 µm emission, some super-bubbles formed by supernova explosions appear only as holes in the extended ISM, which the current model may not detect. To detect bubble structures thought to be formed by supernova explosions, we need to construct training data using, for example, JWST data or simulation results; this will be discussed in a forthcoming paper.

Fig. 26. A JWST image of NGC 628 overlaid with the 143 objects detected by the model as 8 µm shell-like structures (magenta circles). An image and a coordinate catalog of the individual 8 µm shell-like structures are available as supplementary data in figure E3 and table E5. The 7.7 and 21 µm emissions are shown in green and red, respectively.
Recent observations indicate that galaxies are filled with bubbles formed by supernova explosions (Barnes et al. 2023). These bubbles interact with the ISM and with adjacent bubbles, inducing active star formation and forming Spitzer bubbles. Thus, by applying the method developed in this study to bubbles formed by supernova explosions and observed at 8 µm, it becomes possible to statistically advance the study of star formation history in galaxies by comparing the spatial distributions of these two types of bubbles and the dynamics of the associated gas.
7 Summary
We developed a deep-learning model to detect Spitzer bubbles with the single-shot multibox detector using 8 and 24 µm data from the Spitzer Space Telescope. Applying this model to the Milky Way at |$1^{\circ } \leq |l| \leq 65^{\circ }$|, |$|b| \leq 1^{\circ }$|, we newly identified 1413 objects as Spitzer bubbles, many of which exhibit a distinct 8 µm feature encompassing 24 µm emission. In addition, the detection rate for Rank 1 MWP bubbles was very high (98% in the test region and 97% over the full region). When we applied the model to Cygnus X, the LMC, and NGC 628, it newly detected 40 objects in Cygnus X, 128 in the LMC, and 203 in NGC 628 as Spitzer bubbles. These newly detected objects share characteristics with Rank 1 MWP bubbles, indicating that our model is effective in detecting Spitzer bubbles. Inference times varied by region: approximately 3.6 h for the Milky Way (|$1^{\circ } \leq |l| \leq 65^{\circ }$|, |$|b| \leq 1^{\circ }$|), 8 min for Cygnus X, 20 min for the LMC, and 5 min for NGC 628. Given the high detection rate for Rank 1 MWP bubbles, the reliability of the newly detected bubbles, and the model's efficiency, this deep-learning approach proves effective for detecting Spitzer bubbles rapidly and accurately compared with the manual method. The model can be used to detect Spitzer bubbles with high speed and accuracy in observational data obtained by JWST and other telescopes. However, some compact objects that the model newly detected as bubbles are galaxies or Mira-type variable stars, which are difficult to distinguish from Spitzer bubbles. To further improve the performance of the model, we consider it necessary to create training data using simulations and to construct an architecture optimized for detecting Spitzer bubbles.
The deep-learning method used in this study can be applied to various objects. We attempted to detect shell-like structures observed in the 8 µm band in the LMC and NGC 628 using Spitzer Space Telescope (8 µm) and JWST (7.7 µm) data. For the training data, we used the 8 µm shell-like structures of Spitzer bubbles. The model detected 469 shell-like structures in the LMC and 143 in NGC 628, some of which may have been formed by supernova explosions. Although the model was able to detect many 8 µm shell-like structures, some objects could still not be detected; we consider it necessary to create the training data for 8 µm shell-like structures more carefully.
Acknowledgments
This work was supported by the “Young interdisciplinary collaboration project” in the National Institutes of Natural Sciences (NINS) and JSPS KAKENHI Grant Numbers JP23H00129 and JP18H05440. This work was also supported by JST SPRING, Grant Number JPMJSP2139. The authors would like to thank all the FUGIN-AI members who were involved in this study, and would also like to thank Shota Ueda for his generous support. This work made use of Photutils, an Astropy package for detection and photometry of astronomical sources (Bradley 2023).
Appendix. Details of the machine and environment used in our model
The machine and environment used for model development in this paper are as follows:
Hardware specifications
Machine type: Custom-built desktop
CPU: Intel Core i9-13900K @ 5.50 GHz, 24 cores
GPU: NVIDIA GeForce RTX 4090 @ 24 GB GDDR6X
RAM: 96 GB DDR5 @ 4800 MHz
Storage: 2 TB NVMe SSD
Software environment
Operating system: Ubuntu 20.04.6 LTS
NVIDIA-driver version: 545.23.08
CUDA version: 12.3.107
Python version: 3.8.10
Python libraries
Astropy: 5.2.2
Matplotlib: 3.7.4
NumPy: 1.24.4
Pandas: 2.0.3
PyTorch: 2.1.2
SciPy: 1.10.1
The main Python libraries are listed above. For other libraries that depend on these, see requirements.txt in the GitHub repository.3 Docker was also used to build the learning environment; see the docker folder in the GitHub repository3 for details on the Docker files.
Footnotes
1 ⟨https://mast.stsci.edu/⟩.
2 ⟨https://zenodo.org/records/12585239⟩.
3 The Python codes of our model are available at ⟨https://github.com/ninpei7114/galactic_bubble⟩.