Abstract

We critically examine recent claims for the presence of above-atmosphere optical transients in publicly available digitized scans of Schmidt telescope photographic plate material derived from the National Geographic Society–Palomar Observatory Sky Survey. We use the publicly available SuperCOSMOS Sky Survey catalogues to examine the morphology of the sources statistically. We develop a simple, objective, and automated image classification scheme based on a random forest decision tree classifier. We find that the putative transients are likely to be spurious artefacts of the photographic emulsion. We suggest that these images may have arisen from the copying procedure employed to disseminate glass copy survey atlas sets in the era before large-scale digitization programmes.

1. INTRODUCTION

The National Geographic Society–Palomar Observatory Sky Survey (hereafter POSSI), undertaken in the mid-20th century, was the main deep and wide-angle optical reference of the sky for many decades following its publication (Minkowski & Abell 1963). When subsequently supplemented with Southern hemisphere counterpart surveys conducted with other Schmidt telescopes, notably the UK Schmidt Telescope at Siding Spring Observatory in Australia (Cannon 1984), it formed an optical reference of the entire sky. Coupled with second-epoch surveys and the advent of large-scale digitization programmes (Lasker 1995), these surveys yielded whole-sky, multicolour, and multi-epoch catalogues that became the cornerstone of survey astronomy before the advent of wide-angle surveys with cameras employing large-format digital detectors. The 1.2 m Schmidt photographic surveys have since been superseded in depth and image quality by programmes employing telescopes with larger apertures and sensitive CCD cameras (e.g. SDSS, Gunn et al. 1998; PanSTARRS, Chambers et al. 2016). However, it was not until relatively recently, with the advent of ESA–Gaia (Gaia Collaboration 2016), that a superior, deep, and truly all-sky optical survey was achieved. Moreover, the venerable digitized optical all-sky surveys remain relevant today as a snapshot of the sky at epochs during the latter half of the last century. They are particularly important as an early-epoch reference for current and future time-domain astronomical surveys.

Recently, a number of articles have appeared in the astronomical literature noting the presence of a significant population of apparently transient sources on POSSI ‘E’ (red) plate scans (Villarroel et al. 2021, and citations thereof). The data analysed came from both the Digitized Sky Survey (DSS, Lasker 1992) and the SuperCOSMOS Sky Survey (SSS, Hambly et al. 2001a, b, c). While the original discovery paper discusses various plausible non-astronomical origins for the nine transients identified, subsequent work (e.g. Solano, Villarroel & Rodrigo 2022; Solano et al. 2024) is firmly based on the premise that these apparent transients represent real, above-atmosphere astronomical phenomena. If the transients are real, causality arguments, together with the appearance and disappearance of multiple simultaneous detections spread over angular scales of tens of arcminutes in 45 min exposures separated by 6 d, require the sources to lie well inside the Solar system. Villarroel et al. (2022) propose a population of highly reflective, glinting objects in near-Earth orbit as the source of the transients. No such population has been noted in more recent, deeper surveys such as SDSS and PanSTARRS, but if it does exist, it would have major implications for current and future deep, high time-cadence surveys, e.g. the Vera C. Rubin Observatory Legacy Survey of Space and Time (Schwamb et al. 2023, and references therein). Such surveys already face significant contamination from the constellations of artificial satellites currently being launched into low Earth orbit (Hu et al. 2022); this was, of course, not a concern at the mid-20th century epoch of POSSI.

In this paper, we present an independent analysis of the source photographic plate material and SSS scans used in passing by Villarroel et al. (2021). We argue against the premise that these are real above-atmosphere transients and suggest an alternative, mundane explanation as to their origin.

2. OBSERVATIONS, METHODS, AND RESULTS

The 1.2 m Palomar Oschin and UK Schmidt Telescopes employed 355 × 355 mm², 1 mm thick glass plates in a vacuum-clamping plate holder for optimum focus over the focal surface of the imaging optics. The delicate survey originals were then copied onto paper (in the case of POSSI), film, and glass for distribution to observatories and research institutes for the purposes of inspection and quantitative densitometry (Morgan et al. 1992). It was primarily the glass copies that were digitized at scale by various scanning programmes in the latter half of the 20th century (Hambly et al. 2001c, and references therein). To our understanding, of the major all-sky digitization programmes undertaken, only that of the USNO (Monet et al. 2003) had access to the POSSI originals for scanning; the SSS and DSS (McLean, personal communication; see also Lasker 1995) employed glass copies of POSSI. The primary method for atlas copy production was contact printing (West & Dumoulin 1980). In this process, an intermediate glass positive was made from each original, and second-generation copy negatives were then made from that positive. Exposures were made under vacuum using ultraviolet light for optimum focus and resolution over the full 6 × 6 deg² field of view. Hence, the DSS and SSS data used in the study of Villarroel et al. (2021) derive from independent machine scans and software processing of independent glass copies; those independent copy negatives were, however, generated from the same copy positive, itself a copy of the single original POSSI negative.

2.1. Visual inspection of copy plates

The copy set scanned during the SSS programme is to this day housed in the plate archive of the Royal Observatory, Edinburgh and is available for close inspection. We visually inspected the locally held glass copy plates of POSSI survey fields E0070 and E0086 to verify the presence of the transients that Villarroel et al. (2021) identified on the former in the publicly available digital scans. We confirmed the presence of nine point-like detections on E0070 at the positions reported, although to the eye (aided by a microscope) they generally appear rather more concentrated and circular than typical stellar images of the same estimated brightness. We noted, as originally reported, that they are absent from the overlapping plate E0086 (exposed 6 d later), but furthermore that E0086 contains similar detections that are not present on E0070. We also noted various blemishes on the POSSI copy plates; some examples are shown in Fig. 1. The presence of emulsion holes is particularly germane (see Section 3).

Figure 1. Examples of imaging defects on glass copy plates of POSSI fields E0070 and E0086 in the atlas set held at the Royal Observatory Edinburgh. On the left is what appears to be a bubble in the thin emulsion coating on the plate; in the middle is an emulsion hole; and on the right are examples of dark marks that could plausibly be produced via emulsion scratches on the intermediate copy positive.

2.2. Image profile statistics

Given that the blobs appear somewhat sharper than is typical for stars, we extracted profile information from the SSS catalogues via the SuperCOSMOS Science Archive (SSA, Hambly et al. 2004). Amongst the image parameters measured from the digitized plate data (position, brightness, and morphology), the SSA provides a profile statistic (hereafter denoted η) for each detected source (Hambly et al. 2001b). This is a continuously distributed, zero-mean, unit-weight statistic distilled from one-dimensional radial profile information. That information is derived from the ‘areal profile’, determined at re-thresholded intensity levels above the threshold isophote during the image parameter analysis stage using the algorithm of Bunclark & Irwin (1983).¹ The normalization is done with respect to the dominant population of point-like sources on a given plate. The resulting $\mathcal{N}(0,1)$ statistic quantifies how point-like each detection is compared with the typical stellar profile for the plate and, amongst other benefits, provides the means for discrete image classification cuts. For example, in the SSA, stellar images have −3 < η < +2.5; galaxies, with shallower profiles, have η > 2.5; and spurious noise images, with generally sharper (than stellar) profiles, have η < −3.0. Fig. 2 shows the distribution of η versus magnitude for detections in the SSA from POSSI plate E0070 (SSA plateId = 327750; see example queries in the Appendix). Also highlighted are the nine potential transients identified by Villarroel et al. (2021); it is noteworthy that all lie on the negative side of the $\mathcal{N}(0,1)$ distribution, in the range −3 ⪅ η < −1. Although eight of the nine individual detections receive a rough discrete classification flag of 2 (≡ star) in the SSA, with the ninth (η = −3.2) classed as noise, the combined probability of drawing nine values with η ⪅ −1.0 from a truly normal distribution is $\mathcal{O}(0.1^9)$, i.e. vanishingly small.
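The discrete cuts and the chance-coincidence argument above can be sketched in a few lines of Python. This is a minimal illustration, assuming that η for genuine stellar images is exactly $\mathcal{N}(0,1)$ and that the nine values are independent draws; the handling of the boundaries at η = −3 and η = +2.5 and the function names are our own choices, not the SSA implementation.

```python
import math


def classify_eta(eta: float) -> str:
    """Discrete class from the profile statistic eta, using the cuts quoted in the text.

    Treatment of values exactly at -3.0 or +2.5 is an assumption.
    """
    if eta < -3.0:
        return "noise"   # sharper than the plate's stellar profile
    if eta <= 2.5:
        return "star"    # consistent with a point source
    return "galaxy"      # shallower, extended profile


def normal_cdf(x: float) -> float:
    """Cumulative distribution function of the standard normal N(0, 1)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


# Chance that one N(0, 1) draw falls below -1, and that nine independent draws all do
p_single = normal_cdf(-1.0)
p_nine = p_single ** 9

print(f"P(eta < -1) = {p_single:.3f}")
print(f"P(all nine) = {p_nine:.1e}")
print([classify_eta(x) for x in (-3.2, -1.5, 3.0)])
```

Rounding the single-draw probability to 0.1, as in the text, gives 10⁻⁹; using Φ(−1) ≈ 0.159 instead gives ≈ 6 × 10⁻⁸. Either value is negligible.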

Figure 2. Distribution of the stellar profile statistic η for the SSA catalogue detections in POSSI survey field E0070 versus red magnitude E in the range 10 < E < 20 and −10.0 < η < +10.0. The central $\mathcal{N}(0,1)$ distribution at any magnitude is occupied by point sources; galaxies and blends scatter to positive values of the statistic, while sharp noise features tend to scatter to negative values. Large solid circles highlight the data points of the transient sources identified by Villarroel et al. (2021).

2.3. Image classification via machine learning

To put the analysis of image morphology on a more objective statistical footing, we undertook a simple machine learning (ML) image classification exercise. We selected a subset of the morphological parameters available from the SSA and trained a random forest decision tree classifier on three classes (star, galaxy, and spurious), using independent modern sky survey catalogues as a source of highly reliable training data. The procedure is outlined in the following sections (where we have denoted public data base table and column names in monotype for convenience and clarity).

2.3.1. Plate catalogue data selection

We created individual plate catalogue selections from the SSA for fields E0070 and E0086 by applying cuts to retain isolated, well-exposed detections with no obvious quality issues: blend = 0; sMag < 19.5; and quality < 128. We retained the following six catalogue parameters as features on which to train and subsequently classify: the profile statistic η (prfStat); the photographic plate magnitude (sMag); the image area (area); the intensity-weighted image position on the plate (xCen, yCen); and the eccentricity e, computed from the intensity-weighted semimajor and semiminor ellipse-fit parameters (aI and bI). An example SSA data base query is provided in the Appendix.
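A minimal Python sketch of this selection step follows. The column names are those quoted above, but the rows are hypothetical placeholders rather than SSA records, and the eccentricity definition e = √(1 − (bI/aI)²) is our assumption, since the exact formula is not stated here.

```python
import math

# Hypothetical rows standing in for SSA detection records (illustrative values only)
rows = [
    {"blend": 0, "sMag": 17.2, "quality": 0, "prfStat": -0.4, "area": 40,
     "xCen": 120000.0, "yCen": 95000.0, "aI": 3.0, "bI": 2.4},
    {"blend": 1, "sMag": 16.0, "quality": 0, "prfStat": 4.1, "area": 90,
     "xCen": 20000.0, "yCen": 15000.0, "aI": 6.0, "bI": 3.0},
    {"blend": 0, "sMag": 19.8, "quality": 0, "prfStat": 0.2, "area": 12,
     "xCen": 50000.0, "yCen": 55000.0, "aI": 1.5, "bI": 1.4},
    {"blend": 0, "sMag": 18.0, "quality": 256, "prfStat": -5.0, "area": 8,
     "xCen": 70000.0, "yCen": 30000.0, "aI": 1.2, "bI": 1.1},
]

FEATURES = ("prfStat", "sMag", "area", "xCen", "yCen", "e")


def eccentricity(a, b):
    """Eccentricity of an ellipse with semi-major axis a and semi-minor axis b (assumed definition)."""
    return math.sqrt(max(0.0, 1.0 - (b / a) ** 2))


def select_features(rows):
    """Apply the cuts blend = 0, sMag < 19.5, quality < 128; return six-element feature vectors."""
    features = []
    for r in rows:
        if r["blend"] != 0 or r["sMag"] >= 19.5 or r["quality"] >= 128:
            continue  # drop blended, faint, or quality-flagged detections
        e = eccentricity(r["aI"], r["bI"])
        features.append([r["prfStat"], r["sMag"], r["area"], r["xCen"], r["yCen"], e])
    return features


feats = select_features(rows)
print(dict(zip(FEATURES, feats[0])))
```

Only the first hypothetical row survives: the others are blended, fainter than the magnitude cut, or quality-flagged.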

2.3.2. Training data: high-reliability stars

A selection of highly reliable, isolated stars covering each field was created using Gaia DR3 (Gaia Collaboration 2023). We made an astrometric reliability cut following Lindegren et al. (2021) via the ‘renormalized unit-weight error’ statistic (ruwe < 1.4; again, an example query is provided in the Appendix). This selection was then corrected for proper motion to the observation epoch of the plate and paired with the plate catalogue (as defined above) using a proximity criterion of 1 arcsec, a tolerance dictated by the likely uncertainties in the SSS plate astrometry (Hambly et al. 2001c), to create a list of high-reliability stars detected on the plate.
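The proper-motion correction and 1 arcsec pairing can be sketched as below. This is a minimal illustration, assuming linear propagation from the Gaia DR3 reference epoch (J2016.0), with pmra already including the cos δ factor as in the Gaia archive; the record layout, helper names, and brute-force nearest-neighbour search are hypothetical simplifications (a spatial index would be preferable for a full plate).

```python
import math

MAS_PER_DEG = 3.6e6  # milliarcseconds per degree


def propagate(ra, dec, pmra, pmdec, epoch_from, epoch_to):
    """Linear proper-motion propagation (degrees in, degrees out).

    pmra is assumed to include the cos(dec) factor, as in the Gaia archive.
    """
    dt = epoch_to - epoch_from  # years; negative when going back to the plate epoch
    new_dec = dec + pmdec * dt / MAS_PER_DEG
    new_ra = ra + pmra * dt / (MAS_PER_DEG * math.cos(math.radians(dec)))
    return new_ra % 360.0, new_dec


def separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation via the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0


def match_reliable_stars(gaia, plate, plate_epoch, radius_arcsec=1.0, gaia_epoch=2016.0):
    """Pair RUWE-selected Gaia sources with their nearest plate detection within the radius."""
    pairs = []
    if not plate:
        return pairs
    for g in gaia:
        if g["ruwe"] >= 1.4:
            continue  # astrometric reliability cut
        ra, dec = propagate(g["ra"], g["dec"], g["pmra"], g["pmdec"], gaia_epoch, plate_epoch)
        best = min(plate, key=lambda p: separation_arcsec(ra, dec, p["ra"], p["dec"]))
        if separation_arcsec(ra, dec, best["ra"], best["dec"]) <= radius_arcsec:
            pairs.append((g["source_id"], best["id"]))
    return pairs
```

For example, a star moving at 100 mas yr⁻¹ in declination shifts by 6.6 arcsec between 1950 and 2016, far beyond the 1 arcsec tolerance, so the correction cannot be skipped.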

2.3.3. Training data: high-reliability galaxies

A selection of highly reliable galaxies covering each field was created using PanSTARRS PS1-DR2 (Flewelling et al. 2020). Tables ObjectThin and StackObjectThin were joined to provide a multiple-detection, multiband catalogue including point spread function (PSF) and Kron magnitudes and source flags. Our selection required detection in both PS1 r and i, and we applied the recommended star–galaxy separation criterion iPsfMag − iKronMag > 0.05 (Farrow et al. 2014). A brightness cut of rPsfMag < 21 was applied. Once again, the training set was created by pairing the high-reliability galaxy catalogue with the plate catalogue using a proximity criterion of 1 arcsec.
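The galaxy selection reduces to a simple predicate on each joined PS1 row. In this sketch the column names follow those quoted above; treating a magnitude of −999 (or a missing value) as a non-detection is our assumption about how missing measurements are flagged, and the function name and candidate rows are hypothetical.

```python
PS1_MISSING = -999.0  # assumed sentinel for a missing magnitude


def is_reliable_galaxy(row, psf_minus_kron=0.05, r_limit=21.0):
    """True for an extended, r- and i-detected PS1 source brighter than the r limit.

    row: dict with rPsfMag, iPsfMag and iKronMag from the joined PS1 tables.
    """
    mags = (row.get("rPsfMag"), row.get("iPsfMag"), row.get("iKronMag"))
    if any(m is None or m == PS1_MISSING for m in mags):
        return False  # require detections in both r and i
    extended = row["iPsfMag"] - row["iKronMag"] > psf_minus_kron
    return extended and row["rPsfMag"] < r_limit


candidates = [
    {"rPsfMag": 19.0, "iPsfMag": 18.9, "iKronMag": 18.5},    # extended: kept
    {"rPsfMag": 19.0, "iPsfMag": 18.5, "iKronMag": 18.48},   # point-like: rejected
    {"rPsfMag": 21.5, "iPsfMag": 21.0, "iKronMag": 20.5},    # too faint in r: rejected
    {"rPsfMag": -999.0, "iPsfMag": 18.0, "iKronMag": 17.5},  # no r detection: rejected
]
print([is_reliable_galaxy(c) for c in candidates])  # [True, False, False, False]
```

The PSF-minus-Kron test works because the Kron aperture captures the extended light of a galaxy that a PSF fit misses, so galaxies have fainter PSF magnitudes than Kron magnitudes.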

2.3.4. Training data: high-probability spurious detections

To define a training set of high-likelihood spurious plate detections, we used highly complete star and galaxy catalogues, defined as above but without the quality criteria, and inverted the pairing with a relaxed proximity criterion: a plate catalogue entry was deemed likely spurious if there was no associated Gaia stellar or PS1 galaxy entry of any kind within 5 arcsec of its measured position. In this way, we ensured that any plate catalogue entry with a chance of being real was excluded from the spurious detection training data. The apparent transients on E0070 were, of course, also excluded from the training set.
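A brute-force Python sketch of this inverted pairing follows, assuming each record carries an identifier and a position in degrees; the function names and record layout are hypothetical, and for a full plate catalogue the O(N × M) loop would be replaced by a spatial index. The essential design choice is that the 5 arcsec radius is deliberately larger than the 1 arcsec pairing radius, so that any detection with even a loose counterpart is kept out of the spurious class.

```python
import math


def separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation via the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0


def likely_spurious(plate, reference, exclude_ids=(), radius_arcsec=5.0):
    """IDs of plate detections with no reference star or galaxy within radius_arcsec.

    reference: positions from the Gaia and PS1 catalogues without quality cuts.
    exclude_ids: detections that must never enter the training set.
    """
    spurious = []
    for p in plate:
        if p["id"] in exclude_ids:
            continue
        isolated = all(
            separation_arcsec(p["ra"], p["dec"], r["ra"], r["dec"]) > radius_arcsec
            for r in reference
        )
        if isolated:
            spurious.append(p["id"])
    return spurious
```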

2.3.5. Training and testing

Following standard practice in defining ML training data, the star and galaxy sets were subsampled to the same size as the spurious detection catalogue (the smallest of the three), and all three were then split 80 per cent/20 per cent into independent training and validation sets numbering ≈10 000 and ≈2500 entries, respectively. We used the random forest implementation in scikit-learn (Pedregosa et al. 2011) for our decision tree classifier.
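The class balancing and 80/20 split can be sketched with the standard library alone. This is an illustration assuming a per-class (stratified) split and a fixed random seed; the function name and seed are hypothetical, and the actual procedure may differ (for example, by using scikit-learn's train_test_split).

```python
import random


def balance_and_split(classes, train_fraction=0.8, seed=42):
    """Subsample every class to the size of the smallest, then split each class train/validation.

    classes: mapping from label to a list of feature vectors (or record IDs).
    Returns two lists of (item, label) pairs.
    """
    rng = random.Random(seed)                         # fixed seed for reproducibility
    n_min = min(len(items) for items in classes.values())
    n_train = int(round(train_fraction * n_min))
    train, valid = [], []
    for label, items in classes.items():
        sample = rng.sample(items, n_min)             # random subsample without replacement
        train.extend((x, label) for x in sample[:n_train])
        valid.extend((x, label) for x in sample[n_train:])
    rng.shuffle(train)
    rng.shuffle(valid)
    return train, valid
```

Balancing before splitting keeps the classifier from simply favouring the most populous class, at the cost of discarding most of the available star and galaxy examples.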

Automated hyper-parameter tuning (Probst, Boulesteix & Bischl 2019) was performed using the scikit-learn HalvingRandomSearchCV class. Two key hyper-parameters of random forests are the number of trees (also known as the number of estimators) and the number of features (up to a maximum of all those available) considered at each decision node (Efron & Tibshirani 1994; Hastie, Tibshirani & Friedman 2009). Such hyper-parameters influence execution speed and reliability, and need to be chosen with care to avoid overfitting. We examined the ‘out-of-bag’ (OOB) error for our training sets and found an optimum configuration (the minimum number of estimators beyond which there is no significant improvement in the OOB error) of ≈100 trees, with the maximum number of features per split restricted to the square root (E0070) or base-2 logarithm (E0086) of the number available, i.e. √6 or log₂ 6. The resulting scikit-learn hyper-parameters for training independent random forest classifiers for the two adjacent fields E0070 and E0086 are given in Table 1, and Fig. 3 shows the confusion matrix for each field. Table 2 quantifies precision (true positives divided by the sum of true and false positives), recall (true positives divided by the sum of true positives and false negatives), and F1 score (the harmonic mean of precision and recall; Fawcett 2006). Finally, Table 3 lists the feature importances output as part of the training. As expected, it is the profile statistic that carries most weight in classifying the images, while position on the plate is relatively unimportant.
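The metrics in Table 2 follow from a confusion matrix as in the sketch below, which also shows how the 'sqrt' and 'log2' settings for the maximum number of features translate into an integer number of features per split. The confusion-matrix counts are hypothetical, not those behind Fig. 3, and truncation to an integer is our reading of scikit-learn's behaviour rather than something stated in the text; under that reading both √6 ≈ 2.45 and log₂ 6 ≈ 2.58 give two features per split.

```python
import math


def per_class_metrics(cm, labels):
    """Precision, recall and F1 per class from cm[true][predicted] counts."""
    n = len(labels)
    metrics = {}
    for k, label in enumerate(labels):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp   # predicted as k but truly another class
        fn = sum(cm[k]) - tp                        # truly k but predicted as another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[label] = (precision, recall, f1)  # F1 is the harmonic mean of the two
    return metrics


def features_per_split(rule, n_features):
    """Integer number of features tried at each split for the 'sqrt' or 'log2' rule.

    Truncation to an integer (minimum 1) is assumed to mirror scikit-learn.
    """
    value = math.sqrt(n_features) if rule == "sqrt" else math.log2(n_features)
    return max(1, int(value))


# Hypothetical counts (rows: true star, galaxy, bad; columns: predicted, same order)
cm = [[84, 10, 6],
      [8, 86, 6],
      [13, 15, 72]]
for label, (p, r, f1) in per_class_metrics(cm, ["star", "galaxy", "bad"]).items():
    print(f"{label:7s} precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
print(features_per_split("sqrt", 6), features_per_split("log2", 6))
```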

Figure 3. Normalized confusion matrices of true/false positives/negatives when training for three detection classes in the data sets for field E0070 (left-hand panel) and E0086 (right-hand panel). True positives lie on the diagonal from upper left to lower right in each case. Field-to-field differences indicate the likely level of random uncertainty in the procedure; the magnitude cut of E < 19.5 mag avoids the regime in which emulsion noise introduces the strongest signal-to-noise ratio dependencies into the results.

Table 1.

Random forest hyper-parameters for the two fields.

Hyper-parameterValue in field
E0070E0086
max_depth223
min_samples_split236
min_samples_leaf117
max_features√6log2(6)
Number of trees128128
Table 2.

Precision, recall, and F1 score of the random forest classifiers trained and optimized as described in the text for the two adjacent POSSI fields. The F1 score is the harmonic mean of precision and recall (Fawcett 2006).

Class     E0070                          E0086
          Precision  Recall  F1 score    Precision  Recall  F1 score
Star      0.80       0.84    0.82        0.80       0.81    0.81
Galaxy    0.78       0.86    0.82        0.86       0.84    0.81
Bad       0.87       0.72    0.79        0.76       0.77    0.80
Table 3.

Decision tree feature importance for the random forest classifiers in the two adjacent fields.

Feature   Relative importance (per cent)
          E0070    E0086
prfStat   50.5     45.8
area      16.2     17.2
e         16.6     16.9
sMag      10.6     12.7
xCen       3.3      3.9
yCen       2.8      3.5

2.3.6. Classifying the whole plate detection catalogues

The individually trained classifiers were applied to the full SSA catalogues for the adjacent fields E0070 and E0086. For the former, this resulted in ≈45 000 stellar, ≈27 000 galaxy, and ≈8000 bad detections brighter than E = 19.5 mag. Fig. 4 shows the resulting histogram: the logarithmic number–magnitude counts for stars and galaxies conform to expectations, with shallower star counts and steeper, log-linear galaxy counts that cross at around 18th magnitude. The counts of sources classified as bad rise steeply towards the faint end, consistent with a population of faint detections arising from emulsion noise, from optical imaging artefacts fragmented by the automated image analysis software, or from foreign-body contamination (e.g. dust and hair) on the scanned plates (see e.g. Storkey et al. 2004).
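Histograms like Fig. 4 amount to binning each class's magnitudes and taking log₁₀ of the counts. The sketch below assumes 0.5 mag bins over 10 ≤ E < 19.5; the bin width, the function name, and the toy input are hypothetical, and the real counts come from the classified SSA catalogue.

```python
import math
from collections import Counter


def log_number_counts(mags_by_class, bin_width=0.5, mag_min=10.0, mag_max=19.5):
    """log10 of the number of detections per magnitude bin, for each class.

    Bins are keyed by their centre; empty bins are omitted.
    """
    result = {}
    for label, mags in mags_by_class.items():
        counts = Counter(
            math.floor((m - mag_min) / bin_width)
            for m in mags
            if mag_min <= m < mag_max
        )
        result[label] = {
            mag_min + (k + 0.5) * bin_width: math.log10(n)
            for k, n in sorted(counts.items())
        }
    return result
```

On such a plot a population whose counts grow by a constant factor per magnitude appears as a straight line, which is why the galaxy counts look log-linear.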

Figure 4. Number–magnitude counts in the three discrete classes for field E0070 as determined via the ML classifier as described in the text.

As a further validation of our results, albeit a necessary rather than sufficient one, we used the small overlap region between the two adjacent fields. We searched the lists of detections classified as bad in the two fields for positional coincidences within a tolerance of 1 arcsec and, as expected, found none. Furthermore, the overlap region enabled us to search for apparent transients appearing on E0086 that are not present on E0070, i.e. the opposite situation to that described in Villarroel et al. (2021). We found more than 100 detections on E0086 in the overlap region with brightness, eccentricity, and area similar to those of the nine detections on E0070; some examples are shown in Fig. 5. Of course, as the confusion matrices in Fig. 3 indicate, stars will be misclassified as galaxies, and galaxies as stars, at the ≈10 per cent level in the overlap region between the two plates.
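Both overlap checks reduce to positional cross-matching. The sketch below assumes positions in degrees and a caller-supplied test for membership of the overlap region (the in_overlap predicate, like the function names, is hypothetical). It finds detections present on one plate but absent from the other, and lists coincidences between two detection lists such as the bad detections from each field.

```python
import math


def separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation via the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0


def unpaired_in_overlap(plate_b, plate_a, in_overlap, radius_arcsec=1.0):
    """IDs of plate-B detections inside the overlap with no plate-A detection within the radius."""
    return [
        b["id"]
        for b in plate_b
        if in_overlap(b["ra"], b["dec"])
        and not any(
            separation_arcsec(b["ra"], b["dec"], a["ra"], a["dec"]) <= radius_arcsec
            for a in plate_a
        )
    ]


def coincidences(list_a, list_b, radius_arcsec=1.0):
    """Pairs of IDs from two detection lists that lie within the radius of each other."""
    return [
        (a["id"], b["id"])
        for a in list_a
        for b in list_b
        if separation_arcsec(a["ra"], a["dec"], b["ra"], b["dec"]) <= radius_arcsec
    ]
```

If bad detections were real sources, bad lists from the two plates would share entries; an empty coincidence list is what spurious, plate-specific detections predict.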

Figure 5. Two examples of detections classified as bad by our ML classifier that masquerade as transient sources in the overlap region of POSSI fields E0086 (right-hand panels) and E0070 (left-hand panels), being present in the former but absent from the latter. In each case, several real astronomical sources common to the two fields are enclosed in large ellipses, while the bad detection is circled.

Applying our trained classifier for field E0070, we find that all nine of the transients identified by Villarroel et al. (2021) are classed as bad, with the class probabilities listed in Table 4.

Table 4.

ML classifier results for the nine apparent transients of Villarroel et al. (2021) in terms of percentage relative probability amongst the three classes.

Transient   Class probability (per cent)
no.         Bad     Star    Galaxy
1           80.4    19.6    0.0
2           99.3     0.7    0.0
3           95.9     4.0    0.1
4           62.4    37.1    0.5
5           98.3     1.2    0.5
6           74.5    24.6    0.9
7           67.3    32.5    0.2
8           81.4    18.4    0.2
9           85.7    14.0    0.3
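Assigning a discrete class from these probabilities can be done with a simple arg-max, which is our assumption about how the classes are assigned. The short Python check below, using the values transcribed from Table 4 (the variable names are ours), confirms that every row sums to 100 per cent within rounding and that the bad class wins in all nine cases, with the narrowest margin over the stellar class for transient 4.

```python
# Class probabilities (per cent) transcribed from Table 4, ordered (bad, star, galaxy)
TABLE4 = {
    1: (80.4, 19.6, 0.0),
    2: (99.3, 0.7, 0.0),
    3: (95.9, 4.0, 0.1),
    4: (62.4, 37.1, 0.5),
    5: (98.3, 1.2, 0.5),
    6: (74.5, 24.6, 0.9),
    7: (67.3, 32.5, 0.2),
    8: (81.4, 18.4, 0.2),
    9: (85.7, 14.0, 0.3),
}
CLASSES = ("bad", "star", "galaxy")


def winning_class(probs):
    """Label of the class with the highest probability (arg-max)."""
    return CLASSES[max(range(len(CLASSES)), key=lambda k: probs[k])]


winners = {n: winning_class(p) for n, p in TABLE4.items()}
row_sums_ok = all(abs(sum(p) - 100.0) < 0.15 for p in TABLE4.values())
lowest_bad = min(p[0] for p in TABLE4.values())
narrowest = min(TABLE4, key=lambda n: TABLE4[n][0] - TABLE4[n][1])  # smallest bad - star margin

print(winners)
print(f"rows sum to 100 per cent: {row_sums_ok}")
print(f"lowest bad-class probability: {lowest_bad} per cent (transient {narrowest})")
```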

3. DISCUSSION

The precision of the ML classifier described above is 81 per cent for field E0070. This is adequate for the purpose of this study, at least in the sense that of the nine detections under investigation we might expect seven or eight (9 × 0.81 ≈ 7.3) to be correctly classified, so the fact that one or two cases in Table 4 show a relatively weaker distinction between the bad and stellar classes should not be too concerning. In any case, none has any significant probability of being a galaxy, which is unsurprising given their sharp profiles, as indicated by their significantly negative profile statistics. The relatively low precision compares unfavourably with previous applications of ML techniques to the classification of parametrized images from photographic plates. For example, Odewahn et al. (1992, 2004) and Weir, Fayyad & Djorgovski (1995) applied ML techniques to catalogues derived from photographic plates and achieved significantly higher precision. However, these works considered only two classes (star and galaxy), whereas in this study we have a third class whose occupation of the low-dimensional feature space may well be less distinct than those of stars and galaxies. Moreover, some reliability tests in those earlier works were done internally in overlap regions, since at that time deep, wide-angle, high-angular-resolution catalogues from projects such as Gaia and PanSTARRS were not available for training and validation. Furthermore, the later work used measurements of second-generation POSS original plates with fine-grained Kodak IIIa emulsions, whereas the data analysed here are from copy plates with older, coarser 103a emulsions (as were state of the art at the time of POSSI). Regarding modern (i.e. late 20th century) hypersensitized, fine-grained Kodak emulsions, it is interesting to note that there is a large literature on the appearance of spurious microdots (see e.g. Good 1988, and references therein), but no such artefacts have been reported on the older, coarse-grained varieties.
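The expected number of correct classifications can be checked by treating the nine classifications as independent trials, each correct with probability 0.81. That independence assumption is a simplification, and the sketch (function name hypothetical) only illustrates the arithmetic.

```python
from math import comb


def binomial_summary(n, p):
    """Expected successes and P(at least k successes) for n independent trials."""
    expected = n * p
    p_at_least = {
        k: sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))
        for k in range(n + 1)
    }
    return expected, p_at_least


expected, p_at_least = binomial_summary(9, 0.81)
print(f"expected number correct: {expected:.2f}")     # 7.29
print(f"P(at least 7 correct): {p_at_least[7]:.2f}")  # 0.76
```

Under these assumptions there is roughly a three-in-four chance that at least seven of the nine classifications are correct.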

Neither Villarroel et al. (2021) nor any of the follow-up works discusses the provenance of the photographic plate material scanned to create the images on which their analyses are based, yet the reproduction of glass copy plates from the survey originals is clearly an important consideration. Emulsion flaws will be present on original negatives, on the contact positives derived from them as the first step in creating copies, and on any paper, film, or glass copy negatives derived from those positives. While the presence of the transients in independent scans (DSS and SSS) does indeed rule out the respective digital scanning procedures, hardware, and software as the source of the images under consideration here, it does not follow that those images must be present on the original negatives, or that they are astronomical in origin. For example, suppose that there were a significant number of bubbles and holes (e.g. Fig. 1) in the emulsion coatings of the plates used as positives in the production of the glass copy negatives. These would appear as dark spots on those copies, since on the intermediate positives they would be low-density spots, just like the positive images of real astronomical sources. This would easily explain the presence of the nine apparent transients and of many hundreds or even thousands more on each POSSI copy plate, i.e. it is likely that the entire survey is peppered with such isolated detections. Unfortunately, we cannot make any quantitative assessment of the prevalence of holes and bubbles like those illustrated in Fig. 1, since the automated analysis in the DSS programmes considered only dark detections on an otherwise light background as potential sources, and densities below that of the sky were ignored.
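The propagation of a positive-emulsion hole through the copying chain can be illustrated with a deliberately crude toy model. The sketch assumes that each contact print simply inverts density (D → D_MAX − D) and that a hole records zero density; it ignores the characteristic curve, grain, and image spread, and all names and values are hypothetical. Its only purpose is to show that a defect on the single intermediate positive appears as a dark, star-like spot on every copy negative made from it, and hence in independent scans of independent copies, while being absent from the original.

```python
D_MAX = 3.0  # hypothetical maximum density of the toy emulsion


def contact_print(plate, holes=()):
    """Toy contact print: density is inverted (D_MAX - D) onto fresh emulsion.

    holes: pixel positions where the emulsion of the print being made is missing,
    so that pixel records zero density.
    """
    copy = [[D_MAX - d for d in row] for row in plate]
    for i, j in holes:
        copy[i][j] = 0.0
    return copy


def dark_spots(plate, sky=1.0, threshold=0.5):
    """Pixels denser than the sky by more than the threshold: a crude 'detection'."""
    return sorted(
        (i, j)
        for i, row in enumerate(plate)
        for j, d in enumerate(row)
        if d - sky > threshold
    )


original = [[1.0] * 5 for _ in range(5)]  # uniform sky density on the original negative
original[1][1] = 2.5                       # a real star: a dense spot on the negative

positive = contact_print(original, holes=[(3, 3)])  # emulsion hole on the intermediate positive
copy_a = contact_print(positive)  # e.g. the copy later scanned for one survey
copy_b = contact_print(positive)  # an independent copy made from the same positive

print(dark_spots(original))  # [(1, 1)]
print(dark_spots(copy_a))    # [(1, 1), (3, 3)]
print(dark_spots(copy_b))    # [(1, 1), (3, 3)]
```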

Clearly, it would be most instructive to analyse (or, at the very least, visually inspect) the original POSSI negative plates and the intermediate copies. The originals are, we understand, archived in the Carnegie Observatories plate vault as part of the plate archive holdings of the Observatories of the Carnegie Institution for Science.² Whether the intermediate positives used in copy atlas production still exist and, assuming they were retained for posterity, where they are held is unknown to us. As noted previously, the only large-scale digitization and catalogue generation programme to scan original glass negatives of POSSI was that of the USNO (Monet et al. 2003). Unfortunately, the image data from those scans appear to be no longer available; furthermore, USNO-B1.0 catalogue entries were created only for detections paired between at least two plates of different epoch or colour. Detections appearing on one plate only (such as those under consideration here) were therefore never retained in the USNO-B1.0 catalogue, even supposing they were present on the original negative plates.

4. CONCLUSIONS

We have undertaken an independent study of nine apparent optical transients identified by Villarroel et al. (2021) in publicly available digitized scans of POSSI copy plates. We have verified the presence of those detections on the locally held copy plates from which the SSS scans were made, noting subtle differences between them and other stellar images of similar brightness. We also noted the presence of emulsion blemishes (holes) during our visual inspection. Using the publicly available catalogues of all detections on the same plates, derived during the production of the whole-sky SSS, we have made an objective, quantitative, and statistical analysis of the morphological properties of sources in two adjacent fields. We find (i) that the image profiles of the transients are significantly sharper than those of typical stellar images on the plates; (ii) that an ML decision-tree classifier labels the images as spurious with high probability; (iii) that similar apparent transients are present on the copy plate of the adjacent field; and finally (iv) that there are many hundreds of similar images on both plates in the overlap region between the two fields. We suggest one likely mechanism for the origin of at least some of these apparent transients: emulsion holes on the intermediate positive plates used during reproduction of the copy sets. We therefore caution that digitized all-sky survey catalogues derived from the POSSI glass copies are likely peppered with these isolated false detections, and that great care must be exercised when interpreting the publicly available digitized images or when making samples of unpaired catalogue records derived from them.

ACKNOWLEDGEMENTS

We thank two anonymous referees for their positive reception and constructive comments on the original version of this paper. This research has made use of data obtained from the SuperCOSMOS Science Archive, prepared and hosted by the Wide Field Astronomy Unit, Institute for Astronomy, University of Edinburgh, which is funded by the UK Science and Technology Facilities Council. This work has made use of data from the European Space Agency (ESA) mission Gaia (https://www.cosmos.esa.int/gaia), processed by the Gaia Data Processing and Analysis Consortium (DPAC, https://www.cosmos.esa.int/web/gaia/dpac/consortium). Funding for the DPAC has been provided by national institutions, in particular the institutions participating in the Gaia Multilateral Agreement. The Pan-STARRS1 Surveys (PS1) and the PS1 public science archive have been made possible through contributions by the Institute for Astronomy, the University of Hawaii, the Pan-STARRS Project Office, the Max-Planck Society and its participating institutes, the Max Planck Institute for Astronomy, Heidelberg and the Max Planck Institute for Extraterrestrial Physics, Garching, The Johns Hopkins University, Durham University, the University of Edinburgh, the Queen’s University Belfast, the Harvard-Smithsonian Center for Astrophysics, the Las Cumbres Observatory Global Telescope Network Incorporated, the National Central University of Taiwan, the Space Telescope Science Institute, the National Aeronautics and Space Administration under Grant No. NNX08AR22G issued through the Planetary Science Division of the NASA Science Mission Directorate, the National Science Foundation Grant No. AST-1238877, the University of Maryland, Eotvos Lorand University (ELTE), the Los Alamos National Laboratory, and the Gordon and Betty Moore Foundation.

DATA AVAILABILITY

All data used in this study are publicly available from online archive systems such as the SuperCOSMOS Science Archive (Hambly et al. 2004), the Gaia Archive (Salgado et al. 2020), and the Barbara A. Mikulski Archive for Space Telescopes (MAST; Shaw, Cherinka & Forshay 2021). Some example queries are provided for convenience in the Appendix.


References

Bunclark P. S., Irwin M. J., 1983, Proc. an International Colloquium, Statistical Methods in Astronomy. European Space Agency ESA SP-201, Noordwijk, p. 195

Cannon R. D., 1984, in Capaccioli M., ed., Astrophysics and Space Science Library, Vol. 110, Astronomy with Schmidt-Type Telescopes. D. Reidel Publishing Co, Dordrecht, p. 25

Chambers K. C. et al., 2016, preprint

Efron B., Tibshirani R., 1994, An Introduction to the Bootstrap. CRC Press, New York

Farrow D. J. et al., 2014, MNRAS, 437, 748

Fawcett T., 2006, Patt. Recog. Lett., 27, 861

Flewelling H. A. et al., 2020, ApJS, 251, 7

Gaia Collaboration, 2016, A&A, 595, A1

Gaia Collaboration, 2023, A&A, 674, A1

Good A. R., 1988, in Marx S., ed., Proc. IAU Workshop, Astrophotography. Springer, Berlin, p. 28

Gunn J. E. et al., 1998, AJ, 116, 3040

Hambly N. C., Davenhall A. C., Irwin M. J., MacGillivray H. T., 2001a, MNRAS, 326, 1315

Hambly N. C., Irwin M. J., MacGillivray H. T., 2001b, MNRAS, 326, 1295

Hambly N. C. et al., 2001c, MNRAS, 326, 1279

Hambly N., Read M., Mann R., Sutorius E., Bond I., MacGillivray H., Williams P., Lawrence A., 2004, in Ochsenbein F., Allen M. G., Egret D., eds, ASP Conf. Ser. Vol. 314, Astronomical Data Analysis Software and Systems (ADASS) XIII. Astron. Soc. Pac., San Francisco, p. 137

Hastie T., Tibshirani R., Friedman J., 2009, Random Forests. Springer, New York, p. 587

Hu J. A., Rawls M. L., Yoachim P., Ivezić Ž., 2022, ApJ, 941, L15

Lasker B. M., 1992, in MacGillivray H. T., Thomson E. B., eds, Astrophysics and Space Science Library, Vol. 174, Digitized Optical Sky Surveys. Springer, Berlin, p. 87

Lasker B. M., 1995, PASP, 107, 763

Lindegren L. et al., 2021, A&A, 649, A4

Minkowski R. L., Abell G. O., 1963, in Strand K. A., ed., Basic Astronomical Data: Stars and Stellar Systems. Univ. Chicago Press, Chicago, p. 481

Monet D. G. et al., 2003, AJ, 125, 984

Morgan D. H., Tritton S. B., Savage A., Hartley M., Cannon R. D., 1992, in MacGillivray H. T., Thomson E. B., eds, Astrophysics and Space Science Library, Vol. 174, Digitized Optical Sky Surveys. Springer, Berlin, p. 11

Odewahn S. C., Stockwell E. B., Pennington R. L., Humphreys R. M., Zumach W. A., 1992, AJ, 103, 318

Odewahn S. C. et al., 2004, AJ, 128, 3092

Pedregosa F. et al., 2011, J. Mach. Learn. Res., 12, 2825

Probst P., Boulesteix A.-L., Bischl B., 2019, J. Mach. Learn. Res., 20, 1

Salgado J. et al., 2020, in Ballester P., Ibsen J., Solar M., Shortridge K., eds, ASP Conf. Ser. Vol. 522, Astronomical Data Analysis Software and Systems XXVII. Astron. Soc. Pac., San Francisco, p. 307

Schwamb M. E. et al., 2023, ApJS, 266, 22

Shaw R. A., Cherinka B., Forshay P., 2021, MAST Portal Guide

Solano E., Villarroel B., Rodrigo C., 2022, MNRAS, 515, 1380

Solano E., Marcy G. W., Villarroel B., Geier S., Streblyanska A., Lombardi G., Bär R. E., Andruk V. N., 2024, MNRAS, 527, 6312

Storkey A. J., Hambly N. C., Williams C. K. I., Mann R. G., 2004, MNRAS, 347, 36

Villarroel B. et al., 2021, Sci. Rep., 11, 12794

Villarroel B. et al., 2022, preprint

Weir N., Fayyad U. M., Djorgovski S., 1995, AJ, 109, 2401

West R. M., Dumoulin B., 1980, AAS Photo Bull., 23, 3

APPENDIX A: EXAMPLE ARCHIVE QUERIES FOR DATA SETS EMPLOYED IN THE STUDY

Plate details and internal scanning housekeeping data for the POSSI-E plates studied herein (E0070 and E0086) can be obtained with the following query (paste it into the ‘Freeform SQL’ web form in the SSA, having first selected ‘Full SSA’ as the data base):

SELECT *
FROM Plate
WHERE plateNum IN (70, 86) AND
     emulsion LIKE '103a' AND
     filterID LIKE 'E'

The output gives the internal plate identifiers for the relevant catalogue subsets (plate E0070 has identifier 327750 and E0086 has 327766). For example, the data plotted in Fig. 2 can then be generated as follows:

SELECT sMag, prfStat
FROM   Detection
WHERE  plateId = 327750 AND sMag < 20.0 AND
       ra BETWEEN 205.0 AND 220.0 AND
       dec BETWEEN +26.0 AND +34.0

where the final positional predicates are a speed optimization that forces an indexed search of the ∼10-billion-row Detection table. As another example, the internal detection object identifiers of the nine transient sources, located using the positions quoted by Villarroel et al. (2021) or, for the three detections without quoted positions, positions measured from the SSS images, can be obtained using the SSA cone-search facility and then included in a query that extracts their data separately for overplotting in Fig. 2:

SELECT sMag, prfStat, ra, dec
FROM   Detection
WHERE  objId IN
   (1409569612299549, 1409569612298291,
    1409569612298315, 1409569612295478,
    1409569612295920, 1409569612296943,
    1409569612291853, 1409569612291887,
    1409569612291586)

A complete SSA catalogue for either plate, restricted to only those columns (features) used in the ML part of the analysis, can be generated as follows (explanations of the columns in all tables are provided in the data base schema browser):

SELECT DISTINCT ra, dec, aI, bI, class, blend,
       quality, prfStat, sMag,
       sqrt(1.0 - (bI*bI)/(aI*aI)) AS e
FROM Detection
WHERE plateId = 327750 AND surveyId = 5 AND
   ra BETWEEN 205.0 AND 220.0 AND
   dec BETWEEN +26.0 AND +34.0 AND
   blend = 0 AND quality < 128 AND
   sMag < 19.5

(note that, owing to the construction of the SSA data base, there are duplicate catalogue records for POSSI-E plate detections; these are filtered out above using the DISTINCT keyword).
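As an illustration of what the query above computes, the following Python sketch applies the same non-positional cuts client-side to rows that are assumed to have already been retrieved for one plate and region (the plateId, surveyId and RA/Dec predicates are not repeated). The function names `ellipticity` and `select_features` are hypothetical and not part of any SSA tool; only the column names come from the query. Exact duplicates are dropped on the full set of projected columns, mirroring DISTINCT, and e is computed as sqrt(1 − b²/a²) from the aI and bI columns.

```python
import math
from typing import Iterable

# Columns projected by the SSA query (names taken from the query itself).
COLUMNS = ("ra", "dec", "aI", "bI", "class", "blend", "quality", "prfStat", "sMag")


def ellipticity(a: float, b: float) -> float:
    """Ellipticity e = sqrt(1 - b^2/a^2), the same expression as the SQL
    sqrt(1.0 - (bI*bI)/(aI*aI)); a and b are the semi-major and semi-minor axes."""
    if a <= 0.0 or not 0.0 <= b <= a:
        raise ValueError("expected a > 0 and 0 <= b <= a")
    return math.sqrt(1.0 - (b * b) / (a * a))


def select_features(rows: Iterable[dict]) -> list:
    """Hypothetical client-side mirror of the query's non-positional cuts:
    keep blend = 0, quality < 128 and sMag < 19.5, drop exact duplicate
    records (the role of DISTINCT) and append the ellipticity 'e'."""
    seen = set()
    selected = []
    for row in rows:
        if row["blend"] != 0 or row["quality"] >= 128 or row["sMag"] >= 19.5:
            continue
        key = tuple(row[c] for c in COLUMNS)
        if key in seen:  # duplicate POSSI-E catalogue record
            continue
        seen.add(key)
        record = {c: row[c] for c in COLUMNS}
        record["e"] = ellipticity(row["aI"], row["bI"])
        selected.append(record)
    return selected
```

Rows exported as CSV (for example via `csv.DictReader`) arrive as strings and would need converting to numbers before being passed to `select_features`; doing the cuts server-side, as in the query, is of course far cheaper for a whole plate.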

Similarly, SQL/ADQL queries can be used in the Gaia and MAST science archives to obtain the other data sets described in the main text. For example, the Gaia sample used to generate the training set of reliable stars for field E0070 (Section 2.3.2) can be obtained with the Gaia archive search facility as follows:

SELECT ra, dec, pmra, pmdec
FROM gaiadr3.gaia_source
WHERE ruwe < 1.4 AND
     1 = CONTAINS(POINT('ICRS', ra, dec),
                  CIRCLE('ICRS', 216.08370, 29.34827, 4.25))

Note that, as described in Section 2.3.4, the cut on ruwe was not applied when generating the training sets of spurious objects. The ICRS field centre used in this query was derived by precessing the B1950.0 equatorial coordinates (215.535°, 29.574°) given in the raPnt and decPnt columns of the output from the first SSA query above. The circular area selection, made with the ADQL CONTAINS and CIRCLE geometric functions (a ‘cone search’), uses a radius of 4.25 degrees around the field centre so as to enclose the entire plate area.
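The text states only that the field centre was obtained by precession. As an assumption, the Python sketch below uses the standard IAU 1976 precession angles (ζ, z, θ) to precess mean B1950.0 coordinates to J2000.0, ignoring the E-terms of aberration and the FK4 to FK5/ICRS frame corrections, and then gives a local equivalent of the ADQL cone predicate using the haversine separation. All function names are hypothetical.

```python
import math

ARCSEC = math.pi / (180.0 * 3600.0)  # one arcsecond in radians


def precess_b1950_to_j2000(ra_deg: float, dec_deg: float) -> tuple:
    """Precess mean equatorial coordinates from equinox B1950.0 to J2000.0
    with the IAU 1976 angles zeta, z, theta. Precession only: E-terms and
    FK4 -> FK5/ICRS systematic corrections are ignored (arcsecond level)."""
    jd_b1950 = 2433282.4235
    t0 = (jd_b1950 - 2451545.0) / 36525.0  # start epoch, Julian centuries from J2000
    t = -t0                                  # interval from B1950.0 to J2000.0
    common = (2306.2181 + 1.39656 * t0 - 0.000139 * t0 ** 2) * t
    zeta = (common + (0.30188 - 0.000344 * t0) * t ** 2 + 0.017998 * t ** 3) * ARCSEC
    z = (common + (1.09468 + 0.000066 * t0) * t ** 2 + 0.018203 * t ** 3) * ARCSEC
    theta = ((2004.3109 - 0.85330 * t0 - 0.000217 * t0 ** 2) * t
             - (0.42665 + 0.000217 * t0) * t ** 2 - 0.041833 * t ** 3) * ARCSEC

    ra0, dec0 = math.radians(ra_deg), math.radians(dec_deg)
    a = math.cos(dec0) * math.sin(ra0 + zeta)
    b = (math.cos(theta) * math.cos(dec0) * math.cos(ra0 + zeta)
         - math.sin(theta) * math.sin(dec0))
    c = (math.sin(theta) * math.cos(dec0) * math.cos(ra0 + zeta)
         + math.cos(theta) * math.sin(dec0))
    ra = math.degrees(math.atan2(a, b) + z) % 360.0
    dec = math.degrees(math.asin(c))
    return ra, dec


def angular_separation_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2.0) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2)
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))


def in_cone(ra: float, dec: float, ra_c: float, dec_c: float,
            radius_deg: float = 4.25) -> bool:
    """Local analogue of 1 = CONTAINS(POINT('ICRS', ra, dec),
    CIRCLE('ICRS', ra_c, dec_c, radius_deg))."""
    return angular_separation_deg(ra, dec, ra_c, dec_c) <= radius_deg
```

Applied to (215.535, 29.574), this sketch should return a centre within a few arcseconds of the (216.08370, 29.34827) used in the query, which is negligible compared with the 4.25 degree search radius; a full FK4 to ICRS transformation would be preferable wherever arcsecond accuracy matters.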

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.