Fig. 2.
Convolution flow and output of the SSD on which our model is based. The arrows indicate the order in which the data are convolved. Sources 1–6 denote the feature maps generated by successive convolutions of the input image. Source 1 is the output obtained by passing the 38 × 38 × 512 feature map through an L2Norm layer; all six sources are then fed into the loc (localization) and conf (confidence) layers. Because the six feature maps have different resolutions, SSD can detect both large and small objects. Although only a single input image is shown here, images can also be input in batches of a specified size. The input image is an example of a Spitzer bubble, with 8 µm emission shown in green and 24 µm emission in red; the same color scheme is used for all subsequent images observed by the Spitzer Space Telescope. See table 2 for the size of each source and subsection 3.2 for the default boxes (DBox).
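The two operations named in the caption, the L2Norm applied to Source 1 and the per-DBox outputs of the loc and conf layers, can be sketched in plain Python. This is a minimal illustration assuming the standard SSD conventions: each spatial location's channel vector is normalized to unit L2 norm and multiplied by a per-channel scale, loc predicts 4 box offsets per DBox, and conf predicts one score per class per DBox. The function names are hypothetical, and the source sizes and DBox counts in the usage example are those of the original SSD300, not necessarily the values listed in table 2.

```python
import math
from typing import List, Sequence, Tuple

# A feature map indexed as fmap[row][col][channel].
FeatureMap = List[List[List[float]]]


def l2_norm(fmap: FeatureMap, weights: Sequence[float], eps: float = 1e-10) -> FeatureMap:
    """Normalize the channel vector at every spatial location to unit L2 norm,
    then multiply each channel by its scale weight (learnable in SSD)."""
    out: FeatureMap = []
    for row in fmap:
        new_row = []
        for vec in row:
            norm = math.sqrt(sum(v * v for v in vec)) + eps  # eps avoids division by zero
            new_row.append([w * v / norm for w, v in zip(weights, vec)])
        out.append(new_row)
    return out


def head_output_sizes(
    source_sizes: Sequence[int],
    boxes_per_location: Sequence[int],
    num_classes: int,
) -> Tuple[int, int, int]:
    """Return (total DBoxes, loc outputs, conf outputs) summed over all sources.

    A source of size s x s with k DBoxes per location contributes s*s*k DBoxes;
    loc emits 4 offsets per DBox and conf emits num_classes scores per DBox.
    """
    total = sum(s * s * k for s, k in zip(source_sizes, boxes_per_location))
    return total, total * 4, total * num_classes


if __name__ == "__main__":
    # Original SSD300 settings (assumed here; table 2 gives this model's sizes).
    sizes = [38, 19, 10, 5, 3, 1]
    boxes = [4, 6, 6, 6, 4, 4]
    # Hypothetical two-class setup: bubble + background.
    print(head_output_sizes(sizes, boxes, num_classes=2))
```

With the SSD300 settings above, the script prints `(8732, 34928, 17464)`: 8732 DBoxes in total, each with 4 loc values and 2 conf scores. The per-location normalization in `l2_norm` is what lets the larger-magnitude activations of the 38 × 38 × 512 map be combined with the deeper sources without dominating them.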