Abstract

We present a novel automated methodology to detect and classify periodic variable stars in a large data base of photometric time series. The method is based on multivariate Bayesian statistics and uses a multistage approach. We applied it to the ground-based data of the Trans-Atlantic Exoplanet Survey (TrES) Lyr1 field, which is also observed by the Kepler satellite, covering ∼26 000 stars. We found many eclipsing binaries as well as classical non-radial pulsators, such as slowly pulsating B stars, γ Doradus, β Cephei and δ Scuti stars, and also a few classical radial pulsators.

1 INTRODUCTION

In recent years there has been rapid progress in astronomical instrumentation, yielding an enormous amount of new time-resolved photometric data and, as a result, large data bases. These data bases contain many light curves of variable stars, both of known and unknown nature. Well-known examples are the large data bases resulting from the CoRoT (Fridlund et al. 2006) and Kepler (Gilliland et al. 2010) space missions, containing respectively ∼100 000 and ∼150 000 light curves so far. The ESA Gaia mission, expected to be launched in 2012, will monitor about one billion stars during five years. Besides the space missions, large-scale photometric monitoring of stars with ground-based automated telescopes also delivers large numbers of light curves. The challenging task of fast and automated detection and classification of new variable stars is therefore a necessary first step in order to make them available for further research and to study their group properties.

Several efforts have already been made to detect and classify variable stars. In the framework of the CoRoT mission, a procedure for fast light-curve analysis and derivation of classification parameters was developed by Debosscher et al. (2007). That algorithm searches for a fixed number of frequencies and overtones, giving the same set of parameters for each star. The variable stars were then classified using a Gaussian classifier (Debosscher et al. 2007, 2009) and a Bayesian network classifier (Sarro et al. 2009).

In this paper, we present a new version of this method to detect and classify periodic variable stars. In contrast to the previous versions, the new automated methodology only uses significant frequencies and overtones to classify the variables, giving rise to less confusion, especially when dealing with ground-based data. In order to be able to deal with a variable number of parameters, we also introduce a novel multistage approach. This new methodology offers much more flexibility. We applied this method to the ground-based photometric data of the Trans-Atlantic Exoplanet Survey (TrES) Lyr1 field, covering ∼26 000 stars. The classification algorithm considers various classes of non-radial pulsators, such as β Cep, slowly pulsating B (SPB) stars, δ Sct and γ Dor stars, as well as classical radial pulsators (Cepheids, RR Lyr) and eclipsing binaries [see e.g. Aerts, Christensen-Dalsgaard & Kurtz (2010) for a definition of the classes of these pulsators].

2 A NEW METHODOLOGY

2.1 Variability detection

To detect and extract the variables we performed an automated frequency analysis on all time series. The algorithm first checks for a possible polynomial trend up to order 2 and subtracts it, as it can have a large detrimental influence on the frequency spectrum through aliasing. The order of the trend was determined using a classical likelihood-ratio test. Although the coefficients of the trend are recomputed each time a new oscillation frequency is added to the fit, the order of the trend remains fixed.
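The trend-order selection can be sketched as follows: a minimal numpy illustration of nested likelihood-ratio tests for Gaussian errors. The function name and significance level are hypothetical and not taken from the paper:

```python
import math
import numpy as np

def choose_trend_order(t, y, max_order=2, alpha=0.05):
    """Pick the polynomial trend order (0, 1 or 2) with nested
    likelihood-ratio tests; a sketch assuming Gaussian errors and a
    hypothetical significance level alpha."""
    n = len(y)
    order = 0
    rss = np.sum((y - np.polyval(np.polyfit(t, y, 0), t)) ** 2)
    for k in range(1, max_order + 1):
        rss_k = np.sum((y - np.polyval(np.polyfit(t, y, k), t)) ** 2)
        # For nested Gaussian models, 2 log LR = n log(RSS_{k-1} / RSS_k),
        # asymptotically chi-squared with 1 degree of freedom.
        stat = n * math.log(rss / rss_k)
        p_value = math.erfc(math.sqrt(stat / 2.0))  # chi2(1) survival function
        if p_value < alpha:
            order, rss = k, rss_k
        else:
            break
    return order
```

Because the models are nested, the higher-order fit never increases the residual sum of squares, so the test statistic is always non-negative.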

After detrending, the algorithm searches for significant frequencies and overtones in the residuals, using Fourier analysis. It first locates the frequency with the highest amplitude in the discrete Fourier transform and checks whether this frequency is significant, using the false alarm probability (Horne & Baliunas 1986; Schwarzenberg-Czerny 1998). Note that a detected frequency peak can be significant but unreliable. Reliability is checked through pre-specified frequency intervals that are not trustworthy (e.g. around multiples of 1 c/d for ground-based data). Unreliable frequencies are pre-whitened, but flagged as ‘unreliable’ and not used for classification. If the frequency is the first significant reliable frequency, the algorithm checks whether half of this frequency is also significant and reliable; if so, the frequency is replaced with its half to better model binary light curves. In a next step, the algorithm searches for significant overtones, using the likelihood-ratio test, to model possible non-sinusoidal variations (like those of RR Lyr stars). This procedure is repeated as long as significant frequencies are found. The resulting frequencies νn can be used to make a harmonic best fit to the light curve of the form
y(t) = ∑_{k=0}^{K} c_k t^k + ∑_{n=1}^{N} ∑_{m=1}^{M} A_{nm} sin(2π m ν_n t + φ_{nm}),  (1)
with 0 ≤ K ≤ 2 the order of the trend, N the number of significant frequencies, determined using the false alarm probability, and M ≥ 1 the number of harmonics, determined using the likelihood-ratio test.
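A minimal sketch of such an iterative pre-whitening loop: an amplitude spectrum is evaluated on a frequency grid, the highest peak is checked and subtracted, and the search repeats on the residuals. A crude signal-to-noise proxy stands in for the paper's false alarm probability, and all names and thresholds here are hypothetical:

```python
import numpy as np

def extract_frequencies(t, y, fgrid, max_freqs=2, snr_min=4.0):
    """Iterative pre-whitening sketch: find the highest peak of a discrete
    Fourier amplitude spectrum on a grid, subtract a least-squares sinusoid
    at that frequency, and repeat on the residuals."""
    resid = y - np.mean(y)
    found = []
    for _ in range(max_freqs):
        # Amplitude spectrum for (possibly uneven) time sampling.
        amp = np.abs(np.exp(-2j * np.pi * fgrid[:, None] * t) @ resid) * 2.0 / len(t)
        k = int(np.argmax(amp))
        if amp[k] < snr_min * np.median(amp):  # crude significance proxy
            break
        nu = fgrid[k]
        # Least-squares sine + cosine at the detected frequency.
        X = np.column_stack([np.sin(2 * np.pi * nu * t), np.cos(2 * np.pi * nu * t)])
        coef, *_ = np.linalg.lstsq(X, resid, rcond=None)
        resid = resid - X @ coef
        found.append(nu)
    return found, resid
```

The harmonic terms of equation (1) would be fitted the same way, with extra sine/cosine columns at 2ν, 3ν, … for each accepted frequency.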

The frequency analysis method used by Debosscher et al. (2007) performs well on properly reduced satellite data for which it was designed, but not on noisier ground-based data, as many insignificant frequencies and overtones can degrade the performance of the classifier.

2.2 The classifier

The aim of supervised classification is to assign to each variable target a probability that it belongs to a particular pre-defined variability class, given a set of observed parameters. This set of parameters (also called attributes) is obtained from the variability detection pipeline described above and contains frequencies, amplitudes and phase differences. The classifier relies on a set of known examples of each class, the so-called training set, which needs to represent each variability class well.

We used a novel multistage approach, where the classification problem is divided into several sequential steps. This classifier partitions the set of given variability classes Ci into two or more parts C^(1)_1, C^(1)_2, … . This simplifies the classification by degrading the level of detail to a smaller number of categories. Each of these partitions C^(1)_j, which can contain several variability classes, is then again split into C^(2)_1, C^(2)_2, … , which in turn can be partitioned further, and so on, each time specializing the classification until each subpartition contains only one variability class. These partitions can be represented in a tree.

This approach offers several advantages compared to a single-stage classifier. The main advantage is that in each stage a different classifier and a different set of attributes can be used. This is important as attributes carrying useful information for the separation of two classes can be useless or even harmful for distinguishing other classes. In each stage, attributes that carry no information for separating the classes of interest can be removed, thereby significantly reducing possible confusion. In addition, it is also possible to have a variable number of attributes. This allows us to make different branches for mono- versus multi-periodic pulsators. In this way we do not need a fixed set of attributes, thereby avoiding the introduction of spurious frequencies or overtones, which was sometimes the case in Debosscher et al. (2007). As already mentioned, this too is important as insignificant attributes can degrade the performance of the classifier.

We took each of the classifier nodes in the multistage tree as a Gaussian mixture classifier. The Gaussian mixture classifier is based on Bayes' rule:
P(C = c_i | A = a) = L(A = a | C = c_i) P(C = c_i) / ∑_{j=1}^{N_c} L(A = a | C = c_j) P(C = c_j),  (2)
with Nc the number of different classes. These classes can correspond to the variability classes (e.g. β Cep, SPB, …), but as the Gaussian mixture classifier is used at the nodes in the multistage classifier, a class in this context may also correspond to a group of variability classes relevant for a particular node. P(C=ci|A=a) is the a posteriori probability of the target belonging to class ci given the observational evidence a, and is the goal of the classification problem. L(A=a|C=ci) is the conditional likelihood of an attribute set a given that it belongs to variability class ci. P(C=ci) is the a priori probability of a target belonging to class ci. As no reliable prior values for variability classes are known yet, we used a uniform prior.
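With a uniform prior, a posterior in the style of equation (2) reduces to normalizing the per-class likelihoods; a minimal sketch (the function name is ours, not the paper's):

```python
import numpy as np

def posterior(likelihoods, priors=None):
    """Turn per-class likelihoods L(A=a|C=c_i) into posterior
    probabilities P(C=c_i|A=a); a uniform prior is used when none is
    supplied, as in the text."""
    L = np.asarray(likelihoods, dtype=float)
    prior = np.full(len(L), 1.0 / len(L)) if priors is None else np.asarray(priors, dtype=float)
    joint = L * prior
    return joint / joint.sum()
```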

In previous versions of the classifier, the likelihood was approximated as a single Gaussian. Some of the variability classes, however, are not well modelled by a single Gaussian. An example of this is shown in Fig. 1, in which multiple components are clearly preferable.

Figure 1. Gaussian mixture in the two-parameter space (log (ν1), log (a)) for the classical Cepheids in the training set, estimated by the expectation-maximization (EM) algorithm.

The likelihood is now approximated as a finite sum of multivariate Gaussians:

L(A = a | C = c_i) = ∑_{k=1}^{M_i} α_k N(a; μ_k, Σ_k),  (3)

where

N(a; μ_k, Σ_k) = (2π)^(−N_a/2) |Σ_k|^(−1/2) exp[ −(1/2) (a − μ_k)^T Σ_k^(−1) (a − μ_k) ],  (4)

with M_i the finite number of Gaussian components of class c_i, N_a the number of attributes, α_k the a priori probability of belonging to component k, and μ_k and Σ_k the mean vector and covariance matrix of Gaussian component k.

For each node of the multistage tree, the set of variability classes is partitioned. The best attributes are selected for that node and the classifier is trained, meaning that the Gaussian mixture for each class is determined. To do so, we used the expectation-maximization (EM) method (see e.g. Gamerman & Migon 1993). Given a variability class, the unknowns are the number of Gaussian components, the prior probability of belonging to a particular component, and the mean vectors μ_k and covariance matrices Σ_k of each component. The EM algorithm is an iterative method for calculating maximum likelihood estimates of parameters in probabilistic models that depend on unobserved latent variables. EM alternates between an expectation (E) step, which computes the expectation of the log-likelihood evaluated using the current parameter estimates, and a maximization (M) step, which computes the parameters maximizing the expected log-likelihood found in the E step. These parameter estimates are then used to determine the distribution of the latent variables in the next E step. Given the number of Gaussian components M_i, the remaining unknowns in the model can be determined using this procedure. The number of components itself is determined using the Bayesian information criterion (BIC), a criterion for model selection among a set of parametric models with different numbers of parameters. We obtained three components in the example in Fig. 1 using the BIC; the Akaike information criterion (AIC) gives the same number of components. This solution turns out to be very stable when changing initial values, in the sense that the EM algorithm always converges to the same solution.
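The EM fit and the BIC-based choice of the number of components can be sketched with scikit-learn's GaussianMixture; this is an illustration, not the authors' implementation:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_class_mixture(X, max_components=5, random_state=0):
    """Fit mixtures with 1..max_components Gaussian components via EM
    and keep the one with the lowest BIC (sketch of the model-selection
    step described in the text)."""
    best, best_bic = None, np.inf
    for k in range(1, max_components + 1):
        gm = GaussianMixture(n_components=k, n_init=5,
                             random_state=random_state).fit(X)
        bic = gm.bic(X)
        if bic < best_bic:
            best, best_bic = gm, bic
    return best
```

The `n_init=5` restarts mimic the stability check against different initial values mentioned above.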

2.3 Automated classification

Once the classifiers in each node are trained, the targets can be classified. In each node we assign to each target a probability that it belongs to a particular class relevant for that node. In order to obtain the final probability for each variability class we multiply the probabilities along the corresponding root-to-leaf path using the chain rule of conditional probability. Let C_i^(1) ⊃ C_i^(2) ⊃ … ⊃ C_i^(L) be the successive subpartitions along this path, with C_i^(L) the subpartition that contains only class C_i. The probability that the target T belongs to C_i is thus given by

P(T ∈ C_i) = P(C_i^(1)) P(C_i^(2) | C_i^(1)) ⋯ P(C_i^(L) | C_i^(L−1)),  (5)
where we dropped ‘T∈’ and the observed attributes {Ai} on the right-hand side of the equation, for the sake of notational simplicity. We retain the most probable class assignment for a given variable star of unknown type and label it according to the Mahalanobis distance.
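The root-to-leaf multiplication of node probabilities can be sketched with a nested-dict tree; the layout is hypothetical, as the paper does not prescribe a data structure:

```python
def leaf_probabilities(node, attrs, p_path=1.0, out=None):
    """Walk a multistage tree: each internal node holds a 'classify'
    callable returning one probability per child; leaves are class names.
    Probabilities multiply along each root-to-leaf path."""
    if out is None:
        out = {}
    if isinstance(node, str):  # leaf: a single variability class
        out[node] = out.get(node, 0.0) + p_path
        return out
    probs = node["classify"](attrs)  # one probability per child partition
    for child, q in zip(node["children"], probs):
        leaf_probabilities(child, attrs, p_path * q, out)
    return out
```

Because the probabilities at every node sum to one, the leaf probabilities sum to one as well.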
Note that the denominator in equation (2) forces the target to belong to one of the pre-defined classes, even when the target is very far from the class centres in attribute space. For that reason it is important to include an outlier detection step to flag possibly wrong predictions. Debosscher et al. (2009) approximated a training class with a single Gaussian, and computed the Mahalanobis distance of a target to the centre of the class as an outlier indicator. For the multistage approach with multidimensional Gaussians, we use the following extension of the Mahalanobis distance:
d = √[ (a − μ̄)^T Σ_tot^(−1) (a − μ̄) ],  (6)

with a the attribute vector of the target and μ̄ = ∑_k α_k μ_k the centre of mass of the Gaussian mixture. The total covariance Σ_tot is defined as the sum of the intracomponent covariances and the intercomponent covariance:

Σ_tot = ∑_{k=1}^{M_i} α_k Σ_k + ∑_{k=1}^{M_i} α_k (μ_k − μ̄)(μ_k − μ̄)^T,  (7)

where μ_k is the mean vector of each of the M_i Gaussian components. If, and only if, the distance is above a certain threshold, the outlier flag will be set to indicate that the target does not seem to belong to any of the pre-defined classes. This distance is a multidimensional generalization of the one-dimensional statistical distance (e.g. distance to the mean value of a Gaussian in terms of the standard deviation). For this reason, a value of the distance threshold d = 3 is chosen.
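The generalized Mahalanobis distance of equations (6) and (7) can be sketched directly in numpy (the function name is ours):

```python
import numpy as np

def mixture_mahalanobis(a, weights, means, covs):
    """Distance of attribute vector a to the centre of mass of a Gaussian
    mixture, using the total (intra- plus inter-component) covariance."""
    a = np.asarray(a, dtype=float)
    w = np.asarray(weights, dtype=float)
    mu = np.array([np.asarray(m, dtype=float) for m in means])
    centre = w @ mu                                   # mixture centre of mass
    intra = sum(wk * np.asarray(Sk, dtype=float) for wk, Sk in zip(w, covs))
    diff = mu - centre
    inter = (w[:, None] * diff).T @ diff              # between-component spread
    total = intra + inter
    d2 = (a - centre) @ np.linalg.solve(total, a - centre)
    return np.sqrt(d2)
```

For a single-component mixture this reduces to the ordinary Mahalanobis distance, so the d = 3 threshold keeps its familiar "3σ" interpretation.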

2.4 Training the classifier

In order to train the classifier, we computed the attributes of the training-set objects, which were taken from Hipparcos, OGLE and CoRoT, with the variability detection pipeline described in Section 2.1. We only computed up to two significant frequencies, each with up to three harmonics, which in our experience is sufficient for classification purposes. Since the quality of the classification results depends crucially on the quality of the training set, we checked all the light curves and phase plots in this set. The variability classes we took into account are listed in Table 1. We carefully set up the multistage tree, which is given in Fig. 2. Applying clustering techniques on CoRoT data, Sarro et al. (2009) managed to identify new classes. In view of the Kepler mission, two of these classes, stars with activity and variables due to rotational modulation, are taken into account in the multistage tree. A detailed description of these two classes can be found in Debosscher et al. (2010).

Table 1

The variability classes taken into account in the multistage tree, with the number of light curves (NLC) used to define the classes.

Class                               NLC
Eclipsing binaries (ECL)            790
Ellipsoidal (ELL)                    35
Classical Cepheids (CLCEP)          170
Double-mode Cepheids (DMCEP)         79
RR Lyr stars, subtype ab (RRAB)      70
RR Lyr stars, subtype c (RRC)        21
RR Lyr stars, subtype d (RRD)        52
β Cep stars (BCEP)                   28
δ Sct stars (DSCUT)                  86
Slowly pulsating B stars (SPB)       91
γ Dor stars (GDOR)                   33
Mira variables (MIRA)               136
Semiregular (SR)                    103
Activity (ACT)                       51
Rotational modulation (ROT)          26
Figure 2. Multistage decomposition. The subtree represented in the S box is not replicated for simplicity.

In each node, we manually selected the best attributes to distinguish the classes considered in that node. In order to evaluate the significance of an attribute we measured the information gain and gain ratio with respect to each class (Witten & Frank 2005). Based on these results we selected the attributes with the highest information gain and gain ratio that also make sense from an astrophysical point of view. In practice, attributes can show structure in the training set even when they are not supposed to; attributes that theory tells us should be random variables were therefore excluded in order to avoid overfitting. In each node the classifier was then tested using stratified 10-fold cross-validation (see e.g. Witten & Frank 2005). In stratified n-fold cross-validation, the original sample is randomly partitioned into n subsamples. Of the n subsamples, a single one is retained as the validation data for testing the model, and the remaining n − 1 subsamples are used as training data. The cross-validation process is then repeated n times (the folds), with each of the n subsamples used exactly once as the validation data. The n results from the folds are then combined to produce a single estimate. Each fold contains roughly the same proportions of the class labels. We kept the attributes giving the best classification results, not only in terms of correctly classified targets, but also in terms of accuracy measured by the area under the ROC curve (Witten & Frank 2005): the higher the area under the ROC curve, the better the test.
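The stratified cross-validation procedure described above can be sketched with scikit-learn; this is illustrative, with GaussianNB standing in for the paper's Gaussian mixture classifier:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.naive_bayes import GaussianNB

def cv_accuracy(make_clf, X, y, n_splits=10, seed=0):
    """Stratified n-fold cross-validation: every target is validated
    exactly once and each fold keeps roughly the class proportions."""
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    n_correct = 0
    for train_idx, test_idx in folds.split(X, y):
        clf = make_clf().fit(X[train_idx], y[train_idx])
        n_correct += np.sum(clf.predict(X[test_idx]) == y[test_idx])
    return n_correct / len(y)
```

Stratification matters here because some training classes (e.g. RRC with 21 light curves) are small: an unstratified split could leave a fold with no examples of a class at all.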

Stratified 10-fold cross-validation was also applied on the multistage tree as a whole. When only the first frequency and its main amplitude are available, poor results are obtained because there is simply too little information available for classification. When we leave out those examples and only use the training examples for which we have more information, very good results are obtained, as can be seen in Table 2. Only 5.8 per cent of the training examples are wrongly classified. When we replace the variability class models by single Gaussians, we obtain a worse result, with 7.3 per cent wrong predictions (see Table 3). When, in addition, we use only a single stage, 10.7 per cent of the training examples are misclassified (see Table 4). We can thus conclude that our multistage classification tree with Gaussian mixtures at its nodes is a significant improvement.

Table 2

The confusion matrix for the multistage tree applied on the training set objects with at least two harmonics for the first frequency. Each stellar variability class in each node is modelled by a finite sum of multivariate Gaussians. The last line lists the correct classification (CC) for every class separately. The average correct classification is 94.2 per cent.

       BCEP DSCUT CLCEP DMCEP MIRA   SR RRAB  RRC  RRD  SPB GDOR  ELL  ROT  ACT  ECL
BCEP      5     3     0     0    0    0    0    0    0    0    0    0    0    0    0
DSCUT     2    19     0     0    0    0    0    1    0    0    0    0    0    0    0
CLCEP     0     0   147     0    0    1    0    0    0    0    0    0    0    0    0
DMCEP     0     0     1    71    0    0    0    0    0    0    0    0    0    0    1
MIRA      0     0     0     0  109    5    0    0    0    0    0    0    0    0    0
SR        0     0     0     0    8   61    0    0    0    0    0    2    1    0    1
RRAB      0     0     0     0    0    0   67    0    0    0    0    0    0    0    0
RRC       0     0     0     0    0    0    2   13    0    0    0    0    0    0    0
RRD       0     0     0     0    0    0    0    0   34    0    0    0    0    0    0
SPB       0     0     0     0    0    0    0    0    0   19    3    1    0    0    0
GDOR      0     0     0     3    0    0    0    0    0    6    6    0    0    0    1
ELL       0     0     0     1    0    1    0    0    1    0    1   13    0    0    9
ROT       0     0     0     0    0    0    0    0    0    0    0    0   22    0    1
ACT       0     0     0     0    0    0    0    0    0    0    0    0    0   46    0
ECL       0     1     3     0    0    5    0    3    1    6    0    4    3    1  690
CC     71.4  82.6  97.4  94.7 93.2 83.6 97.1 76.5 94.4 61.3 60.0 65.0 84.6 97.9 98.3
Table 3

The confusion matrix for the multistage tree applied on the training set objects with at least two harmonics for the first frequency. Each stellar variability class in each node is modelled by a single Gaussian. The last line lists the correct classification (CC) for every class separately. The average correct classification is 92.7 per cent.

       BCEP DSCUT CLCEP DMCEP MIRA   SR RRAB  RRC  RRD  SPB GDOR  ELL  ROT   ACT  ECL
BCEP      5     3     0     0    0    0    0    0    0    0    0    0    0     0    0
DSCUT     1    18     0     0    0    0    0    0    0    0    0    0    0     0    0
CLCEP     0     0   148     0    0    1    0    0    0    0    0    0    0     0    1
DMCEP     0     0     0    71    0    0    0    0    0    0    0    1    0     0    2
MIRA      0     0     0     0  114   10    0    0    0    0    0    0    0     0    0
SR        0     0     0     0    3   55    0    0    0    0    0    0    0     0   14
RRAB      0     0     0     2    0    0   67    0    0    0    0    0    0     0    0
RRC       1     0     0     0    0    0    2   14    0    0    0    0    0     0    1
RRD       0     0     0     0    0    0    0    0   32    0    0    0    0     0    0
SPB       0     0     0     0    0    0    0    0    0   19    2    1    0     0    0
GDOR      0     0     0     2    0    0    0    0    0    7    8    0    0     0    2
ELL       0     0     1     0    0    1    0    0    0    1    0   15    0     0   15
ROT       0     0     0     0    0    2    0    0    0    1    0    0   23     0    1
ACT       0     0     0     0    0    0    0    0    0    0    0    0    0    47    0
ECL       0     2     2     0    0    4    0    3    4    3    0    3    3     1  666
CC     71.4  78.3  98.0  94.7 97.4 75.3 97.1 82.4 88.9 61.3 80.0 75.0 88.5 100.0 94.8
Table 4

The confusion matrix for a single-stage classifier applied on the training set objects with at least two harmonics for the first frequency. Each stellar variability class is modelled by a single Gaussian. The last line lists the correct classification (CC) for every class separately. The average correct classification is 89.3 per cent.

       BCEP DSCUT CLCEP DMCEP MIRA   SR RRAB  RRC  RRD  SPB GDOR  ELL  ROT  ACT  ECL
BCEP      3     4     0     0    0    0    0    1    0    0    0    0    0    0    2
DSCUT     3    19     0     0    0    0    0    0    0    0    0    0    0    0    3
CLCEP     0     0   149     1    0    1    0    0    0    0    0    0    0    0    3
DMCEP     0     0     0    70    0    0    0    0    1    0    0    1    0    0    4
MIRA      0     0     0     0  114   11    0    0    0    0    0    0    0    0    0
SR        0     0     0     0    3   59    0    0    0    0    0    1    0    1   12
RRAB      0     0     0     0    0    0   67    0    0    0    0    0    0    0    0
RRC       0     0     0     0    0    0    2   15    0    0    0    0    0    0   18
RRD       0     0     0     0    0    0    0    1   35    0    0    0    0    0    4
SPB       0     0     0     0    0    0    0    0    0   25    2    1    0    0    5
GDOR      0     0     0     3    0    0    0    0    0    4    6    1    0    0    8
ELL       0     0     2     1    0    1    0    0    0    2    2   13    1    0   29
ROT       0     0     0     0    0    0    0    0    0    0    0    1   23    0    4
ACT       1     0     0     0    0    0    0    0    0    0    0    0    2   46    0
ECL       0     0     0     0    0    1    0    0    0    0    0    2    0    0  610
CC     42.9  82.6  98.7  93.3 97.4 80.8 97.1 88.2 97.2 80.6 60.0 65.0 88.5 97.8 86.9

3 APPLICATION TO TrES DATA

3.1 The TrES Lyr1 data set

We analysed 25 947 light curves in the TrES Lyr1 field. TrES is a network of three 10-cm optical telescopes searching the sky for transiting planets (Alonso et al. 2007; O'Donovan 2008). This network consisted of Sleuth (Palomar Observatory, Southern California), the PSST (Lowell Observatory, Northern Arizona) and STARE (Observatorio del Teide, Canary Islands, Spain); TrES now excludes Sleuth and STARE, but includes WATTS. The TrES Lyr1 field is a 5°.7 × 5°.7 field, centred on the star 16 Lyr, and is part of the Kepler field (Alonso et al. 2007). Most light curves have about 15 000 observations spread over a total time-span of approximately 75 d. A small fraction has fewer than 5000 observations with a total time-span of around 62 d. Observations are given in either the Sloan r (Sleuth) or the Kron–Cousins R magnitude (PSST) and the mean R magnitude ranges from 9.2 to 16.3.

3.2 Classification of variable stars

3.2.1 Results of the variability detection

With the variability detection algorithm described in Section 2.1, we searched for frequencies in the range 3/Ttot to 50 c/d, with Ttot the total time-span of the observations in days. In order to avoid the problem of daily aliasing in an automated way, small frequency intervals around multiples of 1 c/d were flagged as ‘unreliable’. Using a false alarm probability of α = 0.005 (the null hypothesis of only having noise in the light curve is rejected when P < α, with P the probability of finding such a peak in the power spectrum of a time series containing only noise), about 18 000 objects were found to be non-constant. The stars for which we could not find significant frequencies were used to determine the rms level of the time series as a function of the mean magnitude, plotted in Fig. 3, which indicates to what level we can detect variability. The upward trend can be explained in terms of photon noise.
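The false alarm probability of the cited Horne & Baliunas (1986) form can be sketched as follows; M, the number of independent frequencies, must be estimated, and the pipeline's exact choice of M is not given here:

```python
import math

def false_alarm_probability(z, n_indep):
    """False alarm probability of the highest peak of a normalized
    periodogram, FAP = 1 - (1 - exp(-z))**M, following Horne & Baliunas
    (1986); n_indep (M) is an estimate of the number of independent
    frequencies."""
    return 1.0 - (1.0 - math.exp(-z)) ** n_indep
```

A peak then passes the detection criterion of the text when its FAP falls below α = 0.005.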

Figure 3. The rms of the time series plotted as a function of the mean magnitude for stars having no significant frequencies and no trend.

3.2.2 Classification results

We used the multistage tree presented in Section 2.4, where we excluded the stars with activity and variables with rotational modulation. As already mentioned, these classes were included in the multistage tree in view of the Kepler mission; we do not expect to find good candidates in the ground-based data of TrES Lyr1, as these classes are characterized by low amplitudes. The classification algorithm was able to detect many good candidate class members. By a candidate we mean a target whose highest class probability exceeds a cut-off value (pmin = 0.5 or 0.75) and whose generalized Mahalanobis distance to that class is d < 3. A quick visual check of the light curves and phase plots of the targets with a distance above 3 showed that a large fraction of those light curves suffers from instrumental effects. The results of the classification are listed in Table 5.

Table 5

Overview of the classification results using two different cut-off values for the highest class probability p. A generalized Mahalanobis distance d < 3 to the most probable class is taken as defined in equation (6).

Class(es)                             p > 0.5   p > 0.75
Eclipsing binaries (ECL)                  158        130
Ellipsoidal (ELL)                         571        214
Classical Cepheids (CLCEP)                  3          2
Double-mode Cepheids (DMCEP)                0          0
RR Lyr stars, subtype ab (RRAB)             2          2
RR Lyr stars, subtype c (RRC)               4          4
RR Lyr stars, subtype d (RRD)               0          0
β Cep or δ Sct stars (BCEP/DSCUT)         842        720
SPB or γ Dor stars (SPB/GDOR)             914        453
Mira variables (MIRA)                       0          0
Semiregular (SR)                            8          5

As with CoRoT, the main objective of TrES was the search for planets. We do not find many long-period variables, Cepheids or RR Lyr stars among its targets. The total time-span of the light curves is also too short to be able to detect Mira-type variables.

3.2.3 Eclipsing binaries and ellipsoidal variables

Irrespective of the observed field on the sky, we should always find a number of eclipsing binaries and ellipsoidal variables. Light curves of eclipsing binaries are very different from those of pulsating stars and are therefore generally well separated using the phase differences between the first three harmonics of the first frequency. Most detected candidate binaries therefore have a very high probability (>90 per cent) of belonging to the ECL class. We found 158 reliable eclipsing binaries. Some good examples of eclipsing binary light curves are shown in Fig. 4. It is remarkable that, although eclipses are not always easily seen in the light curve, they clearly show up in the phase plot and are detected by the classification algorithm.

Figure 4. Left-hand panels: a sample of TrES Lyr1 time series of eclipsing binaries. Right-hand panels: the corresponding phase plots, made with the detected frequency.

3.2.4 Monoperiodic pulsators

Although Cepheids and RR Lyr stars are easy to distinguish from the other classes owing to their large amplitudes, almost no good candidates were found. Examples of the few candidates that were found are shown in Fig. 5.

Figure 5. Left-hand panels: a sample of TrES Lyr1 time series of radial pulsators. Right-hand panels: the corresponding phase plots, made with the detected frequency.

3.2.5 Multiperiodic pulsators

As no colour information was available, confusion between β Cep and δ Sct stars occurs because of their overlapping frequency ranges. For this reason we merged these two classes into a single class. It is possible that, for the same target, the individual class probabilities are each below 0.5 but add up to a value well above 0.5. Similarly, we could often not make a clear distinction between γ Dor and SPB stars, because they show similar gravity-mode spectra. This problem could be solved by adding supplementary information such as temperatures, colours or spectra, both for the targets and for the training sets. Although frequencies around multiples of 1 c/d have been flagged as unreliable, the γ Dor and SPB classes in particular suffer from the combination of daily aliasing and instrumental effects. For these classes, a visual inspection of the light curves and phase plots was needed. Fig. 6 shows some good examples of non-radial pulsators.
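The probability bookkeeping for merged classes is simple: the posterior probabilities of the confusable classes are summed before the cut-off of Table 5 is applied. A minimal sketch (the dictionary layout and example probabilities are ours, purely for illustration):

```python
def merge_classes(probs, groups):
    """Sum the posterior probabilities of classes that cannot be
    separated without colour information; other classes pass through."""
    merged = {}
    grouped = set()
    for name, members in groups.items():
        merged[name] = sum(probs.get(m, 0.0) for m in members)
        grouped.update(members)
    # keep every class that is not part of a merged group
    merged.update({c: p for c, p in probs.items() if c not in grouped})
    return merged

# Hypothetical posterior probabilities for one target:
probs = {"BCEP": 0.35, "DSCUT": 0.30, "ECL": 0.20, "SPB": 0.10, "GDOR": 0.05}
groups = {"BCEP/DSCUT": ("BCEP", "DSCUT"), "SPB/GDOR": ("SPB", "GDOR")}
merged = merge_classes(probs, groups)
```

Here the target fails the p > 0.5 cut in both BCEP (0.35) and DSCUT (0.30) individually, but passes it comfortably in the merged BCEP/DSCUT class (0.65).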

Figure 6. Left-hand column: some TrES Lyr1 light curves of non-radial pulsators. Right-hand column: the corresponding phase plots, made with the detected frequency.

4 DISCUSSION AND CONCLUSIONS

In contrast to previous classification methods for photometric time series (e.g. Debosscher et al. 2007), we now use only significant frequencies and overtones as attributes, giving rise to less confusion. The multistage approach developed here allows us to deal statistically with a variable number of attributes. Another advantage of this approach is that the conditional probabilities in each node can be simplified by dropping attributes that are not relevant for that node. Moreover, a different classifier can be chosen in each node. In this paper we only used the Gaussian mixture classifier, but other methods, such as Bayesian networks, can be used as well, which gives more flexibility. Finally, the variability classes were better described by a finite mixture of multivariate Gaussians.
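Besides the class probability, Table 5 applies an outlier cut: a candidate is only retained if its generalized Mahalanobis distance to the most probable class, as defined in equation (6), satisfies d < 3. As a simplified stand-in for that definition (two attributes, a single Gaussian component with illustrative mean and covariance), the distance can be sketched as:

```python
import math

def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance d = sqrt((x - mu)^T C^-1 (x - mu)) for a
    two-attribute vector; candidates with d >= 3 would be rejected."""
    dx0, dx1 = x[0] - mean[0], x[1] - mean[1]
    # analytic inverse of the 2x2 covariance matrix
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    i00, i01 = cov[1][1] / det, -cov[0][1] / det
    i10, i11 = -cov[1][0] / det, cov[0][0] / det
    d2 = dx0 * (i00 * dx0 + i01 * dx1) + dx1 * (i10 * dx0 + i11 * dx1)
    return math.sqrt(d2)
```

With an identity covariance the distance reduces to the Euclidean one, so `mahalanobis_2d((3.0, 4.0), (0.0, 0.0), [[1.0, 0.0], [0.0, 1.0]])` gives d = 5 and such a candidate would be flagged as an outlier of that class.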

We applied our methods to the ground-based data of the TrES Lyr1 field, which is also observed by the Kepler satellite. We found non-radial pulsators such as β Cep, δ Sct, SPB and γ Dor stars. Because of the lack of precise, dereddened colour information, and because of the overlap in frequency range, we could, however, sometimes not avoid confusion between β Cep and δ Sct stars on the one hand, and between SPB and γ Dor stars on the other. Besides non-radial pulsators, we also detected binary stars and some classical radial pulsators. The results of this classification will be made available through electronic tables. A small sample is given in Table 6. See the Supporting Information for the full results.

Table 6

The results of the supervised classification applied to the TrES Lyr1 data base. The first column contains the identifiers of the TrES Lyr1 objects. For each object, the first two frequencies, each with three harmonics, are given. Phase differences for the first frequency are also given. Details of these attributes can be found in Debosscher et al. (2007). The last column gives the most probable class according to the classification. The full version of this table is available with the online version of the article (see Supporting Information).

Identifier   Frequencies (c/d)   Amplitudes (mmag)              Phase differences   Class
             f1    f2            a11  a12  a13  a21  a22  a23   pdf12  pdf13
0044210.9065.3142.16111.7730.6020.114.00−1.36−2.90ECL
100725.5744.4815.0110.9314.596.57−1.972.71ECL
161163.4929.6513.9514.0111.989.992.49−1.653.00ECL
2197015.4121.4320.0554.2515.274.60−1.43−2.83ECL
2248865.8365.72118.5911.7016.25−1.54ECL
0145727.8320.651.681.030.991.570.88−1.752.72BCEP/DSCUT
0766528.5729.002.201.151.402.15−0.99−1.30BCEP/DSCUT
0687422.2718.992.901.231.622.680.64−0.79BCEP/DSCUT
0323127.5054.672.040.921.402.01−1.30−2.92BCEP/DSCUT
14595115.8128.7145.508.266.0611.385.961.59−0.01BCEP/DSCUT
2505210.152.015.372.254.883.43−1.33SPB/GDOR
237589.2728.914.212.843.63−0.92SPB/GDOR
1066212.261.2726.297.167.5819.136.363.302.87−0.65SPB/GDOR
1197938.6138.506.0134.006.760.92ELL
0637442.1596.577.2346.772.0520.631.901.91ELL
2276342.27253.6562.54139.654.8415.46−2.51−0.75ELL
0278618.8710.86168.8775.1241.024.802.322.81−0.39RRAB
2384912.2612.84369.48105.1532.94141.9531.391.62−0.71RRAB
1618624.1124.21347.27159.82118.5733.242.34−1.39RRAB
1128742.9344.23142.6513.5323.18−2.72RRC
0655642.9344.24127.0712.1522.363.64−2.74RRC
0988031.6412.32198.7317.6712.136.882.362.19−3.051.10RRC
2446963.5531.72167.8023.574.0025.284.25−1.56−0.09RRC
027182.274.65198.8026.685.9114.9213.2813.601.230.28CLCEP
106042.272.16239.0935.174.7014.751.170.86CLCEP
163500.822.5962.9818.9114.2061.6939.3631.33−1.70−0.92SR
001160.9713.90100.8829.9832.4557.082.39−2.59SR
000190.7910.6171.4717.0519.7077.1028.5611.371.052.78SR
058751.340.7782.5512.872.398.485.232.732.02−2.78SR
052051.101.2078.4214.612.3514.594.061.15−0.11−2.12SR
001110.6012.9359.2217.2528.1438.6425.4813.21−0.330.44SR
226430.960.8278.989.945.606.576.632.67−0.56SR
173071.086.3366.3560.8136.0383.6038.8558.531.050.72SR

The research leading to these results has received funding from the European Research Council under the European Community’s Seventh Framework Programme (FP7/2007-2013)/ERC grant agreement no. 227224 (PROSPERITY), from the Research Council of K. U. Leuven (GOA/2008/04), from the Fund for Scientific Research of Flanders (G.0332.06), from the Belgian Federal Science Policy Office (C90309: CoRoT Data Exploitation, C90291 Gaia-DPAC) and from the Spanish Ministerio de Educación y Ciencia through grant AYA2005-04286. Public access to the TrES data was provided through the NASA Star and Exoplanet Database (NStED, http://nsted.ipac.caltech.edu).

REFERENCES

Aerts C., Christensen-Dalsgaard J., Kurtz D. W., 2010, Asteroseismology. Springer-Verlag, Berlin (ISBN 978-1-4020-5178-4)

Alonso R. et al., 2007, in Afonso C., Weldrake D., Henning Th., eds, ASP Conf. Ser. Vol. 366, Transiting Extrasolar Planets Workshop. Astron. Soc. Pac., San Francisco, p. 13

Debosscher J. et al., 2007, A&A, 475, 1159

Debosscher J. et al., 2009, A&A, 506, 519

Debosscher J., Blomme J., Aerts C., De Ridder J., 2010, A&A, submitted

Fridlund M., Baglin A., Lochard J., Conroy L., eds, 2006, ESA SP-1306, The CoRoT Mission: Pre-Launch Status. ESA, Noordwijk

Gamerman D., Migon H., 1993, J. R. Statistical Soc. Ser. B, 55, 629

Gilliland R. L. et al., 2010, PASP, 122, 131

Horne J. H., Baliunas S. L., 1986, ApJ, 302, 763

O’Donovan F., 2008, PhD thesis, California Institute of Technology, Pasadena, California

Sarro L. M., Debosscher J., López M., Aerts C., 2009, A&A, 494, 739

Schwarzenberg-Czerny A., 1998, MNRAS, 301, 831

Witten I. H., Frank E., 2005, Data Mining: Practical Machine Learning Tools and Techniques, 2nd edn. Morgan Kaufmann, San Francisco (ISBN 0-12-088407-0)

SUPPORTING INFORMATION

Additional Supporting Information may be found in the online version of this article:

Table 6. The results of the supervised classification applied to the TrES Lyr1 data base.

Please note: Wiley-Blackwell are not responsible for the content or functionality of any supporting materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the article.
