Completeness of the Gaia-verse III: using hidden states to infer gaps, detection efficiencies, and the scanning law from the DR2 light curves

Boubert, Douglas; Everall, Andrew; Fraser, Jack; Gration, Amery; Holl, Berry

doi:10.1093/mnras/staa3791

ABSTRACT

The completeness of the Gaia catalogues depends heavily on the status of that space telescope through time. Stars are only published with each of the astrometric, photometric, and spectroscopic data products if they are detected a minimum number of times. If there is a gap in scientific operations, a drop in the detection efficiency, or a deviation of Gaia from the commanded scanning law, then stars will miss out on potential detections and thus be less likely to make it into the Gaia catalogues. We lay the groundwork to retrospectively ascertain the status of Gaia throughout the mission from the tens of individual measurements of the billions of stars, by developing novel methodologies to infer both the orientation and angular velocity of Gaia through time and gaps and efficiency drops in the detections. We have applied these methodologies to the Gaia data release 2 variable star epoch photometry – which are the only publicly available Gaia time-series at the present time – and make the results publicly available. We accompany these results with a new python package scanninglaw that can be used to easily predict Gaia observation times and detection probabilities for arbitrary locations on the sky.

methods: data analysis, methods: statistical, stars: statistics, Galaxy: kinematics and dynamics, Galaxy: stellar content

1 INTRODUCTION

Gaia (Gaia Collaboration 2016) is the first all-sky astrometric, photometric, and spectroscopic survey with tens to hundreds of repeat observations over a decade-long baseline. Future Gaia data releases (DR) will drive a generational breakthrough in our ability to carry out time-domain astrophysics, allowing us to study binary stars, exoplanets, variable stars, supernovae, and microlensing events through combinations of epoch astrometric, photometric, and spectroscopic measurements. Our ability to discover a transient source or detect the variability of a periodic source will depend on both the number of Gaia observations and the timings between them (as well as the uncertainties of the individual observations). The pattern of these Gaia observations varies systematically across the sky – as described by the scanning law which states where Gaia was looking at every point in time – and thus we will be differently able to investigate time-domain phenomena in different regions of the sky. The picture is further complicated by the possibility that a Gaia observation predicted by the scanning law may not result in a published measurement, as that can change the completeness and sensitivity of the Gaia pipeline to time variable objects at that location on the sky. The set of variable and non-variable objects in the Gaia catalogues will be biased by the selection function resulting from the effective scanning law that was used, and we will not be able to make unbiased general statements unless we can correct for these biases, which will require us (amongst other things) to know when Gaia looked at a given point on the sky and the probability that each of those observations would have resulted in a measurement of a given star. It is thus of fundamental importance that we have a precise knowledge of the Gaia scanning law, the reasons why a Gaia observation might not result in a published measurement, and the times when no Gaia measurements were acquired.

In Boubert, Everall & Holl (2020, hereafter referred to as Paper I), we conducted preliminary work towards that goal by exploiting the light curves of the 550 737 variable sources published in Gaia DR2 (Evans et al. 2018; Gaia Collaboration 2018; Holl et al. 2018; Riello et al. 2018), which tell us the time of each detection and the location of the detection on the Gaia focal plane. By aligning the reported location of the source on the focal plane with the location of the source on the sky, we were able to derive first-order corrections to the Gaia DR2 nominal scanning law¹ (the scanning law that DPAC (Data Processing and Analysis Consortium) commanded Gaia to follow and which can deviate from the true scanning law by up to 30 arcseconds). By looking at the times between consecutive photometric measurements of all of the variable stars, and by comparing the predicted number of observations at each location on the sky to the reported maximum number of astrometric, photometric, and spectroscopic detections, we were able to identify ‘gaps’ which contributed no detections to the published catalogue.

Boubert & Everall (2020, hereafter referred to as Paper II) relied on the results of Paper I to accurately predict the number of observations of each source that could have resulted in detections (i.e. those not during a period which did not result in any published detections). Paper II used this information to model the probability that the Gaia DR2 observations of a source at any location on the sky resulted in at least five detections – the minimum requirement for a source to enter the Gaia DR2 source catalogue – and thus arrived at an approximation to the Gaia DR2 selection function. However, there were several deficiencies in Paper I which we will address here. First, our corrections to the scanning law were simplistic and only corrected the pitch and roll of Gaia, not the yaw. Our corrected scanning law was thus able to predict the across-scan location of the source to a precision of 0.083 arcsec, but the along-scan location remained uncertain at a level of 30 arcsec, with a corresponding uncertainty on the timing of predicted observations of 0.5 s. Second, we defined a gap to have occurred between any two consecutive photometric measurements more than 1 per cent of a day apart without a robust statistical justification. This may have caused us to both miss shorter gaps and to erroneously interpret long intervals between consecutive observations as gaps in regions of the sky that are simply sparse in variable stars. Third, we ignored that another reason for a Gaia observation not to have resulted in a published detection is if that detection is deleted on-board due to Gaia scanning dense regions of the sky for large fractions of the spin period and exhausting its available storage capacity due to the limited down-link bandwidth. This will result in a magnitude-dependent deletion fraction that varies as a function of time as Gaia scans across more and less dense parts of the sky. 
Fourth, we ignored that there are other processes besides gaps and on-board deletions that can cause an observation to not result in a published detection, ranging from faint sources not being detected every time due to photon shot noise to crowding preventing every source from being assigned a window to quality cuts applied by the Gaia DPAC as listed in the Introduction of Paper I.

It is likely that we will not be able to identify with full certainty the nature of all of the processes that caused observations to be missed. Instead, our goal in this work is to characterize the time-varying and partially probabilistic process through which Gaia observations result in published detections, to the best degree possible with the 550 737 light curves associated with Gaia DR2 variable sources. Our methods will be immediately applicable to larger samples of Gaia epoch measurements when they are made available in later DR. The fourth and subsequent Gaia DR will contain all of the epoch astrometric, photometric, and spectroscopic measurements and an extension of this methodology will allow the community to diagnose the photometric and temporal biases impairing the completeness of these DR.

This work consists of two parts. In Section 3, we use the locations of the variable stars both on the focal plane and on the sky at their times of detection to infer Gaia’s orientation and angular velocity throughout the 22 months of Gaia DR2. In Section 4, we combine the detection times of the variable stars with the predicted observations that did not result in published detections to simultaneously infer gaps which did not result in any published detections and the magnitude dependency of the detection probability with time. The methodologies in both sections are based on temporally evolving state space models and so we begin with a pedagogical overview of Gaussian process regression, Hidden Markov Models, and Kalman filters in Section 2.

2 MODELLING STATES THROUGH TIME

Throughout this paper, we will consider probabilistic models with time-evolving state parameters |$\boldsymbol{\theta }(t)$| that describe some aspect of a process that we are interested in. At discrete times |$\lbrace t_{i}\rbrace _{i = 1}^{n}$|⁠, we make a noisy measurement of either those parameters, a function of those parameters, or random variables that are conditioned on those parameters. The latter two cases are known as hidden state spaces, because we never have direct access to the values of the state parameters. In this section, we give an overview of such models as a precursor to the other sections in this work.

2.1 Gaussian process regression

Perhaps the most common state-space probabilistic model is the Gaussian process regression model, which allows us to predict a state at some arbitrary time given observations of that state at some number of other times (Sacks et al. 1989). We treat each state θ(t) as the realization of a random variable Θ(t) and assume that for any finite set of times |$\lbrace t_i\rbrace _{i=1}^{n}$| those random variables |$\lbrace \Theta (t_{i})\rbrace _{i=1}^{n}$| have a multivariate normal distribution. Thus, the time evolution of a state, |$\lbrace \theta (t)\rbrace _{t \in \mathbb {R}}$|, is a realization of a Gaussian process. A Gaussian random process is completely defined by its mean-value function, m, given by m(t) = 〈Θ(t)〉, and its covariance kernel, k, given by |$k(t, t^{\prime }) = {Cov}(\Theta (t), \Theta (t^{\prime }))$|. Accordingly, we write |$\Theta \sim \operatorname{GP}(m, k)$|. Given a sample of |$\lbrace \Theta (t_{i})\rbrace _{i = 1}^{n}$|, which we can express as a real vector |$\boldsymbol{\theta } = [\theta (t_{1}) \dots \theta (t_{n})]^{\mathrm{T}}$|, we may predict the random variable Θ(t_*), for any arbitrary time t_*, using the conditional random variable |$\Theta (t_{*})\,\,|\,\,(\lbrace \Theta (t_{i})\rbrace _{i = 1}^{n} = \boldsymbol{\theta })$|. We denote this random variable |$\hat{\Theta }(t_{*})$|. It may be shown (Rasmussen & Williams 2006) that |$\hat{\Theta }(t_{*})$| is normal with mean

$$\begin{eqnarray*} \langle \hat{\Theta }(t_{*})\rangle = m(t_{*}) + \boldsymbol{k}^{\mathrm{T}}\boldsymbol{\sf K}^{-1}(\boldsymbol{\theta } - \boldsymbol{m}) \end{eqnarray*}$$

(1)

and variance

$$\begin{eqnarray*} {Var}(\hat{\Theta }(t_{*})) = k(t_{*}, t_{*}) - \boldsymbol{k}^{\mathrm{T}}\boldsymbol{\sf K}^{-1}\boldsymbol{k} \end{eqnarray*}$$

(2)

where |$\boldsymbol{k} = [k(t_{*}, t_{1}) \ldots k(t_{*}, t_{n})]^{\mathrm{T}}$|⁠, K_ij = k(t_i, t_j), and |$\boldsymbol{m} = [m(t_{1}) \ldots m(t_{n})]^{\mathrm{T}}$|⁠. Typically, we know neither the mean-value function nor the covariance kernel. Instead, we assume that they are members of some family of functions, which is parametrized by a set of hyperparameters.

It is common to assume that the mean is constant, given by m(t) = μ for some real μ, and that the covariance kernel is a linear sum of a squared exponential and a Kronecker delta, given by

$$\begin{eqnarray*} k(t, t^{\prime }) = \varepsilon ^{2}\exp \left(\frac{-(t^{\prime } - t)^{2}}{2l^{2}}\right) + \sigma ^2(t)\delta (t, t^{\prime }). \end{eqnarray*}$$

(3)

The first term of this expression is the covariance kernel for the underlying time-series, and the second is the covariance kernel for the noise in each observation, with σ(t_i) being the known noise of the measurement at time t_i. We call the hyperparameter μ the ‘signal mean’, ε² the ‘signal variance’ and l the ‘time-scale’, and collate them in the hyperparameter vector |$\boldsymbol{\xi }=[\mu ,\varepsilon ^2,l]$|⁠.
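As an illustration of equations (1)–(3), the following is a minimal sketch of the predictive mean and variance under the constant-mean, squared-exponential model. The function names `kernel` and `gp_predict` are hypothetical and not part of any package described here, and the sketch assumes that we want to predict the underlying noise-free signal at t_*, so that the Kronecker delta term is only applied to the observed times.

```python
import numpy as np

def kernel(t1, t2, eps2, ell):
    """Squared-exponential signal covariance (first term of equation 3)."""
    dt = t1[:, None] - t2[None, :]
    return eps2 * np.exp(-dt**2 / (2.0 * ell**2))

def gp_predict(t, theta, sigma, t_star, mu, eps2, ell):
    """Predictive mean (equation 1) and variance (equation 2) at times t_star.

    Assumption: the prediction is for the noise-free signal, so the noise
    term sigma^2 only enters the covariance matrix of the observed times.
    """
    K = kernel(t, t, eps2, ell) + np.diag(sigma**2)    # K_ij = k(t_i, t_j)
    k = kernel(t, t_star, eps2, ell)                   # one column per t_star
    mean = mu + k.T @ np.linalg.solve(K, theta - mu)
    var = eps2 - np.sum(k * np.linalg.solve(K, k), axis=0)
    return mean, var

# Made-up measurements, purely for illustration.
t = np.array([0.0, 1.0, 2.5, 4.0])
theta = np.array([0.1, 0.8, 0.3, -0.5])
sigma = np.full(4, 0.05)
mean, var = gp_predict(t, theta, sigma, np.array([1.0, 10.0]),
                       mu=0.0, eps2=1.0, ell=1.0)
```

Close to an observed time the predictive mean follows the measurement and the variance is small, whereas far from all observations the prediction reverts to the signal mean μ with variance ε².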

By maximizing the likelihood we can recover the hyperparameters that best suit the data. The random variables |$\lbrace \Theta (t_{i})\rbrace _{i = 1}^{n}$| have a multivariate normal distribution, and therefore have the probability density function (PDF)

$$\begin{eqnarray*} f(\boldsymbol{\theta}; \boldsymbol {\xi}) = \frac{1}{\sqrt{{(2\pi)^{n}|\boldsymbol{\sf K}_{\boldsymbol{\xi }}|}}} \exp \left(-\frac{1}{2}(\boldsymbol {\theta} - \boldsymbol {m}_{\boldsymbol {\xi }})^{\mathrm{T}}{\boldsymbol{\sf K}_{\boldsymbol {\xi }}}^{\, -1}(\boldsymbol {\theta } - \boldsymbol {m}_{\boldsymbol {\xi }})\right), \end{eqnarray*}$$

(4)

where the subscripts now indicate that K and |$\boldsymbol {m}$| are functions of the hyperparameter vector |$\boldsymbol {\xi }$|⁠. By definition, the likelihood of |$\boldsymbol {\xi }$| is given by

$$\begin{eqnarray*} L(\boldsymbol {\xi }| \boldsymbol {\theta}) = f(\boldsymbol {\theta }; \boldsymbol {\xi}) \end{eqnarray*}$$

(5)

and hence the log-likelihood of |$\boldsymbol {\xi }$| is, up to an additive constant,

$$\begin{eqnarray*} \ln (L(\boldsymbol {\xi }| \boldsymbol {\theta })) = - \frac{1}{2} (\boldsymbol {\theta } - \boldsymbol {m}_{\boldsymbol {\xi }})^{\mathrm{T}}{\boldsymbol{\sf K}_{\boldsymbol {\xi }}}^{\, -1}(\boldsymbol {\theta } - \boldsymbol {m}_{\boldsymbol {\xi }}) - \frac{1}{2}\ln (|\boldsymbol{\sf K}_{\boldsymbol {\xi}}|). \end{eqnarray*}$$

(6)

We must therefore find the hyperparameter vector that maximizes this expression. It is common practice to ignore the error associated with the resulting maximum-likelihood estimate of the hyperparameter vector, and to proceed as if the hyperparameter is known with certainty.
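The log-likelihood of equation (6) can be evaluated stably through a Cholesky factorization of the covariance matrix. The sketch below is one possible way to do this for the kernel of equation (3); the names `log_likelihood` and `best_timescale` are hypothetical, and the grid search over the time-scale stands in for whichever optimizer is preferred.

```python
import numpy as np

def log_likelihood(t, theta, sigma, mu, eps2, ell):
    """Log-likelihood of equation (6), up to an additive constant."""
    dt = t[:, None] - t[None, :]
    K = eps2 * np.exp(-dt**2 / (2.0 * ell**2)) + np.diag(sigma**2)
    L = np.linalg.cholesky(K)                    # K = L L^T
    r = np.linalg.solve(L, theta - mu)           # r.r = (theta-m)^T K^-1 (theta-m)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))   # ln|K|
    return -0.5 * (r @ r) - 0.5 * log_det

def best_timescale(t, theta, sigma, mu, eps2, ell_grid):
    """Return the time-scale on ell_grid that maximizes the likelihood."""
    values = [log_likelihood(t, theta, sigma, mu, eps2, ell) for ell in ell_grid]
    return ell_grid[int(np.argmax(values))]
```

In this sketch the signal mean and signal variance are held fixed and only the time-scale is varied, which is an illustrative simplification rather than a requirement of the method.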

When the mean is constant we may simply estimate it by setting it equal to the sample mean or – more accurately – the weighted sample mean,

$$\begin{eqnarray*} \mu = \frac{\sum _{i = 1}^{n}\theta _{i}/\sigma _i}{\sum _{i = 1}^{n}1/\sigma _{i}} \end{eqnarray*}$$

(7)

where θ_i and σ_i are the reported measurement and uncertainty at time t_i. When the signal variance dominates the noise variance, that is, when ε² ≫ σ², we may estimate it by setting it equal to the sample variance or – more accurately – the weighted sample variance,

$$\begin{eqnarray*} \varepsilon^{2} = \frac{\sum _{i = 1}^{n}(\theta _{i} - \mu)^{2}/\sigma _{i}}{\sum _{i = 1}^{n}1/\sigma _{i}}. \end{eqnarray*}$$

(8)

Similarly, the time-scale, l, can be set equal to some known characteristic time-scale for the physical process in question. Again, it is common practice to ignore the error associated with these estimates.

We will use Gaussian process regression in Section 4.1 to fill in missing values in the Gaia DR2 variable star light curves, by inferring the magnitudes of the stars at the predicted observation times that did not yield published detections.

2.2 Hidden Markov Models

Hidden Markov Models are used to model a system that is known to transition between discrete states through a Markov Process X, but where we do not have direct access to the value of the states |$\lbrace X_i\rbrace _{i=1}^n$|⁠. There is assumed to be another discrete process Y whose states |$\lbrace Y_i\rbrace _{i=1}^n$| are conditioned on process X and are observed, and so the task is to infer the states of process X given the observed states of process Y. For instance, the lead author regularly travels by train from Oxford to London. Whether that train leaves Oxford on time on a particular day (Y_i = 1) or not (Y_i = 0) is conditionally dependent on whether there were leaves on the track between Hereford and Oxford (X_i = 1) or not (X_i = 0). The lead author will not know if there were leaves on the track, but the train is more likely to be running late if this was the case. The probability of there being leaves on the track will depend in some way on whether there were leaves on the track the previous day, for instance through the weather or the number of leaves left on the trees.

The Markov property tells us that the probability distribution of the future state X_i conditioned on the values of all previous states only depends on the present state,

$$\begin{eqnarray*} P(X_{i}|X_1=x_1,\ldots ,X_{i-1}=x_{i-1}) = P(X_{i}|X_{i-1}=x_{i-1}). \end{eqnarray*}$$

(9)

The other key property of a Hidden Markov process is that the observation at the present time Y_i is only conditionally dependent on the present hidden state,

$$\begin{eqnarray*} P(Y_{i}|X_1=x_1,\ldots ,X_i=x_i) = P(Y_{i}|X_i=x_i). \end{eqnarray*}$$

(10)

These two probability distributions defined at each step completely specify a Hidden Markov Model. In the analogy above, we would need to specify the probability of there being leaves on the track today given whether there were leaves on the track yesterday, and the probability of the Oxford-to-London train leaving Oxford on time depending on whether there are leaves on the track today.

It is common for these probability distributions to be parametrized by hyperparameters |$\boldsymbol{\xi }$|⁠, whose values are optimized by maximizing the likelihood of the observations. By marginalizing over the hidden states it is possible to condition the observation at the next state on the observations at all previous states |$P(Y_{i}|Y_1=y_1,\ldots ,Y_{i-1}=y_{i-1},\boldsymbol {\xi })$| and thus compute the likelihood of the hyperparameters |$\boldsymbol{\xi }$| as

$$\begin{eqnarray*} &&{L(\boldsymbol{\xi}|Y_1 =y_1,\dots ,Y_n=y_n)} \nonumber \\ &&{\quad= P(Y_1=y_1|\boldsymbol {\xi })\prod _{i=2}^n P(Y_i|Y_1=y_1,\ldots ,Y_{i-1}=y_{i-1},\boldsymbol {\xi }).} \end{eqnarray*}$$

(11)

The Viterbi (1967) algorithm can be used to find the most likely sequence of hidden state values given the observations and fixed values for any hyperparameters. We stress that there is a difference between the most likely value of a state at each time and the value of the state in the most likely sequence of states.
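To make the train example concrete, the sketch below evaluates the likelihood of equation (11) with the forward algorithm and finds the most likely hidden sequence with the Viterbi algorithm. All of the probabilities are hypothetical numbers chosen purely for illustration, and the function names are our own.

```python
# Hypothetical probabilities for the leaves-on-the-track example.
# Hidden state X: 0 = no leaves, 1 = leaves. Observation Y: 0 = late, 1 = on time.
initial = [0.7, 0.3]                     # P(X_1 = x)
transition = [[0.8, 0.2], [0.4, 0.6]]    # transition[a][b] = P(X_i = b | X_{i-1} = a)
emission = [[0.2, 0.8], [0.7, 0.3]]      # emission[a][y] = P(Y_i = y | X_i = a)

def likelihood(obs):
    """Forward algorithm: probability of obs, marginalized over hidden states."""
    alpha = [initial[x] * emission[x][obs[0]] for x in (0, 1)]
    for y in obs[1:]:
        alpha = [sum(alpha[a] * transition[a][b] for a in (0, 1)) * emission[b][y]
                 for b in (0, 1)]
    return sum(alpha)

def viterbi(obs):
    """Most likely sequence of hidden states given the observations."""
    prob = [initial[x] * emission[x][obs[0]] for x in (0, 1)]
    back = []                            # back[i][b] = best predecessor of state b
    for y in obs[1:]:
        step, new_prob = [], []
        for b in (0, 1):
            best = max((0, 1), key=lambda a: prob[a] * transition[a][b])
            step.append(best)
            new_prob.append(prob[best] * transition[best][b] * emission[b][y])
        back.append(step)
        prob = new_prob
    state = max((0, 1), key=lambda x: prob[x])
    path = [state]
    for step in reversed(back):          # trace the best path backwards
        state = step[state]
        path.append(state)
    return path[::-1]
```

The forward recursion sums over the previous hidden state where the Viterbi recursion takes a maximum, which is the origin of the distinction between the most likely state at each time and the state in the most likely sequence.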

We will use a Hidden Markov Model in Section 4.2 to identify gaps when Gaia observations were not resulting in published detections.

2.3 Kalman filters

The Kalman filter is an algorithm to estimate a hidden dynamical state |$\boldsymbol {x}(t)$| at a series of discrete times t_{i = 1, …, n} under two assumptions. First, the state at time t_i is a linear function of the state at t_{i − 1},

$$\begin{eqnarray*} \boldsymbol {x}_{i} = \boldsymbol{\sf F}_i\boldsymbol {x}_{i-1} + \boldsymbol {w}_{i}, \end{eqnarray*}$$

(12)

where |$\boldsymbol{\sf F}_i$| is the matrix describing the state-transition model and |$\boldsymbol {w}_{i}$| is the process noise which is assumed to be drawn from a zero-mean multivariate normal distribution with covariance matrix |$\boldsymbol{\sf Q}_i$|⁠. Second, at each time a noisy measurement |$\boldsymbol {z}_i$| is made of the true state |$\boldsymbol {x}_i$| through the linear model

$$\begin{eqnarray*} \boldsymbol {z}_{i} = \boldsymbol{\sf H}_i\boldsymbol {x}_{i} + \boldsymbol {v}_{i}, \end{eqnarray*}$$

(13)

where |$\boldsymbol{\sf H}_i$| is the matrix describing the observation model and |$\boldsymbol {v}_{i}$| is the observation noise which is assumed to be drawn from a zero-mean multivariate normal distribution with covariance matrix |$\boldsymbol{\sf R}_i$|⁠.

The Kalman filter represents our estimate of the state through a multivariate normal distribution with mean |$\boldsymbol{\hat{x}}_{i|j}$| and covariance |$\boldsymbol{\sf P}_{i|j}$|, which are the mean and covariance of our estimate at time t_i given all of the measurements made at times up to and including time t_j. The Kalman filter iteratively calculates the means |$\boldsymbol{\hat{x}}_{i|i}$| and covariances |$\boldsymbol{\sf P}_{i|i}$| at each time-step given all of the prior observations. Each time-step is split into a prediction stage (where |$\boldsymbol{\hat{x}}_{i|i-1}$| and |$\boldsymbol{\sf P}_{i|i-1}$| are predicted from |$\boldsymbol {\hat{x}}_{i-1|i-1}$| and |$\boldsymbol{\sf P}_{i-1|i-1}$|) and a measurement update stage (where the updated estimates |$\boldsymbol {\hat{x}}_{i|i}$| and |$\boldsymbol{\sf P}_{i|i}$| are obtained based on the measurement |$\boldsymbol {z}_i$|). The full system of equations is

$$\begin{eqnarray*} \boldsymbol{\hat{x}}_{i|i-1} &= \boldsymbol{\sf F}_i\boldsymbol {\hat{x}}_{i-1|i-1} \end{eqnarray*}$$

(14)

$$\begin{eqnarray*} \boldsymbol{\sf P}_{i|i-1} &= \boldsymbol{\sf F}_i\boldsymbol{\sf P}_{i-1|i-1}\boldsymbol{\sf F}_i^T+\boldsymbol{\sf Q}_i \end{eqnarray*}$$

(15)

$$\begin{eqnarray*} \boldsymbol {y}_i &\equiv \boldsymbol {z}_i - \boldsymbol{\sf H}_i\boldsymbol {\hat{x}}_{i|i-1} \end{eqnarray*}$$

(16)

$$\begin{eqnarray*} \boldsymbol{\sf S}_i &\equiv \boldsymbol{\sf H}_i\boldsymbol{\sf P}_{i|i-1}\boldsymbol{\sf H}_i^T + \boldsymbol{\sf R}_i \end{eqnarray*}$$

(17)

$$\begin{eqnarray*} \boldsymbol{\sf K}_i &\equiv \boldsymbol{\sf P}_{i|i-1}\boldsymbol{\sf H}_i^T\boldsymbol{\sf S}_i^{-1} \end{eqnarray*}$$

(18)

$$\begin{eqnarray*} \boldsymbol {\hat{x}}_{i|i} &= \boldsymbol {\hat{x}}_{i|i-1} + \boldsymbol{\sf K}_i \boldsymbol {y}_i \end{eqnarray*}$$

(19)

$$\begin{eqnarray*} \boldsymbol{\sf P}_{i|i} &= (\boldsymbol{\sf I}-\boldsymbol{\sf K}_i\boldsymbol{\sf H}_i)\boldsymbol{\sf P}_{i|i-1}. \end{eqnarray*}$$

(20)

The terms |$\boldsymbol {y}_i$| and |$\boldsymbol{\sf S}_i$| are the mean and covariance of the residual between the measurement and the prediction, while the term |$\boldsymbol{\sf K}_i$| is the matrix of optimal weights to apply to the residual when computing the updated estimate, otherwise known as the Kalman gain. If the period under consideration is a fixed interval t₁ ≤ t ≤ t_n, then it is possible to obtain an improved estimate of the state at time t_i by incorporating the information from observations at later times t_i < t_j ≤ t_n. This is done through the Rauch–Tung–Striebel smoothing algorithm (Rauch, Striebel & Tung 1965), which performs a backward pass starting at time t_n, where we note that the final mean and covariance from the forward algorithm |$\boldsymbol {\hat{x}}_{n|n}$| and |$\boldsymbol{\sf P}_{n|n}$| already utilize all of the available information. At time t_i, the system of equations to be evaluated is

$$\begin{eqnarray*} \boldsymbol{\sf C}_i = \boldsymbol{\sf P}_{i|i}\boldsymbol{\sf F}_{i+1}^T\boldsymbol{\sf P}_{i+1|i}^{-1} \end{eqnarray*}$$

(21)

$$\begin{eqnarray*} \boldsymbol {\hat{x}}_{i|n} = \boldsymbol {\hat{x}}_{i|i} + \boldsymbol{\sf C}_i(\boldsymbol {\hat{x}}_{i+1|n} - \boldsymbol {\hat{x}}_{i+1|i}) \end{eqnarray*}$$

(22)

$$\begin{eqnarray*} \boldsymbol{\sf P}_{i|n} = \boldsymbol{\sf P}_{i|i} + \boldsymbol{\sf C}_i (\boldsymbol{\sf P}_{i+1|n} - \boldsymbol{\sf P}_{i+1|i})\boldsymbol{\sf C}_i^T. \end{eqnarray*}$$

(23)

A toy example of a system where Kalman filters are suitable is in estimating the position and velocity of a moving object subject to random, zero-mean accelerations where we only have access to occasional, noisy measurements of the position. The state encodes the position p_i and velocity q_i at each time t_i, and the expected position and velocity at the next time is a linear function of the estimate of the position and velocity at the previous time

$$\begin{eqnarray*} \hat{p}_{i} = \hat{p}_{i-1} + \hat{q}_{i-1}\Delta t_i, \end{eqnarray*}$$

(24)

$$\begin{eqnarray*} \hat{q}_{i} = \hat{q}_{i-1}, \end{eqnarray*}$$

(25)

where Δt_i = t_i − t_{i − 1}. In the formalism above, we would write

$$\begin{eqnarray*} \boldsymbol {x}_i={\begin{bmatrix}p_i \\ q_i\end{bmatrix}},\quad \boldsymbol{\sf F}_i = {\begin{bmatrix}1 &\quad \Delta t_i \\ 0&\quad 1\end{bmatrix}}, \quad \boldsymbol{\sf H}_i = {\begin{bmatrix}1 &\quad 0 \end{bmatrix}}. \end{eqnarray*}$$

(26)
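The toy model can be turned into a short working example. The sketch below implements the forward pass of equations (14)–(20) and the backward pass of equations (21)–(23) for the position and velocity state of equation (26). The function names are hypothetical, and the process noise covariance is assumed to take the standard discrete white-noise acceleration form, which is one common choice rather than the only one.

```python
import numpy as np

def kalman_filter(times, z, r_var, q_acc, x0, P0):
    """Forward pass of equations (14)-(20) for the position-velocity toy model.

    times, z : measurement times and noisy position measurements
    r_var    : variance of each position measurement (the 1x1 matrix R_i)
    q_acc    : variance of the random acceleration (assumed form of Q_i)
    x0, P0   : prior mean and covariance of the state at the first time
    """
    H = np.array([[1.0, 0.0]])
    x, P, t_prev = x0, P0, times[0]
    filtered, predicted, Fs = [], [], []
    for t, zi in zip(times, z):
        dt = t - t_prev                     # zero at the first step (update only)
        F = np.array([[1.0, dt], [0.0, 1.0]])
        G = np.array([[0.5 * dt**2], [dt]])
        Q = q_acc * (G @ G.T)
        x_pred = F @ x                      # equation (14)
        P_pred = F @ P @ F.T + Q            # equation (15)
        y = zi - H @ x_pred                 # equation (16)
        S = H @ P_pred @ H.T + r_var        # equation (17)
        K = P_pred @ H.T @ np.linalg.inv(S) # equation (18)
        x = x_pred + K @ y                  # equation (19)
        P = (np.eye(2) - K @ H) @ P_pred    # equation (20)
        predicted.append((x_pred, P_pred))
        filtered.append((x, P))
        Fs.append(F)
        t_prev = t
    return filtered, predicted, Fs

def rts_smoother(filtered, predicted, Fs):
    """Backward pass of equations (21)-(23)."""
    smoothed = [filtered[-1]]               # the final filtered estimate is optimal
    for i in range(len(filtered) - 2, -1, -1):
        x_f, P_f = filtered[i]
        x_p, P_p = predicted[i + 1]
        x_s, P_s = smoothed[0]              # smoothed estimate at step i + 1
        C = P_f @ Fs[i + 1].T @ np.linalg.inv(P_p)   # equation (21)
        x = x_f + C @ (x_s - x_p)                    # equation (22)
        P = P_f + C @ (P_s - P_p) @ C.T              # equation (23)
        smoothed.insert(0, (x, P))
    return smoothed
```

Although only the position is measured, the velocity is recovered because the state-transition matrix couples it to the position, and the smoothed covariances are never larger than the filtered ones because the backward pass also uses the later measurements.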

The Kalman filter can be generalized to non-linear dynamical systems

$$\begin{eqnarray*} \boldsymbol {x}_{i} = f(\boldsymbol {x}_{i-1}) + \boldsymbol {w}_{i} \end{eqnarray*}$$

(27)

$$\begin{eqnarray*} \boldsymbol {z}_{i} = h(\boldsymbol {x}_{i}) + \boldsymbol {v}_{i}, \end{eqnarray*}$$

(28)

in which case it is known as the extended Kalman filter. The extended Kalman filter algorithm only differs from the linear case in that the prediction of the next state and the expected observation use the non-linear functions

$$\begin{eqnarray*} \boldsymbol {\hat{x}}_{i|i-1} = f(\boldsymbol {\hat{x}}_{i-1|i-1}) \end{eqnarray*}$$

(29)

$$\begin{eqnarray*} \boldsymbol {y}_i = \boldsymbol {z}_i - h(\boldsymbol {\hat{x}}_{i|i-1}), \end{eqnarray*}$$

(30)

whilst the remaining equations remain the same with the matrices |$\boldsymbol{\sf F}_i$| and |$\boldsymbol{\sf H}_i$| being defined by the Jacobians

$$\begin{eqnarray*} \boldsymbol{\sf F}_i = \frac{\partial f}{\partial \boldsymbol {x}}\Big \vert _{\boldsymbol {\hat{x}}_{i-1|i-1}}, \quad \boldsymbol{\sf H}_i = \frac{\partial h}{\partial \boldsymbol {x}}\Big \vert _{\boldsymbol {\hat{x}}_{i|i-1}}. \end{eqnarray*}$$

(31)
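A single predict and update cycle of the extended Kalman filter can be sketched as follows. The function `ekf_step` and the example model, in which the constant-velocity state is observed only through the square of the position, are hypothetical choices made to keep the Jacobians of equation (31) simple.

```python
import numpy as np

def ekf_step(x, P, z, f, h, jac_f, jac_h, Q, R):
    """One predict-update cycle of the extended Kalman filter."""
    F = jac_f(x)                            # Jacobian of f at x_{i-1|i-1}
    x_pred = f(x)                           # equation (29)
    P_pred = F @ P @ F.T + Q                # equation (15)
    H = jac_h(x_pred)                       # Jacobian of h at x_{i|i-1}
    y = z - h(x_pred)                       # equation (30)
    S = H @ P_pred @ H.T + R                # equation (17)
    K = P_pred @ H.T @ np.linalg.inv(S)     # equation (18)
    x_new = x_pred + K @ y                  # equation (19)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred   # equation (20)
    return x_new, P_new

# Hypothetical model: unit time-step, measurement of the squared position.
def f(x):
    return np.array([x[0] + x[1], x[1]])

def jac_f(x):
    return np.array([[1.0, 1.0], [0.0, 1.0]])

def h(x):
    return np.array([x[0] ** 2])

def jac_h(x):
    return np.array([[2.0 * x[0], 0.0]])

x_new, P_new = ekf_step(np.array([1.0, 0.0]), np.eye(2), np.array([1.21]),
                        f, h, jac_f, jac_h, np.zeros((2, 2)), np.array([[0.01]]))
```

Here the measurement 1.21 corresponds to a true position of 1.1, and the update moves the position estimate from 1.0 most of the way towards that value.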

A modification of the extended Kalman filter to allow for multiplicative uncertainties will be presented in Section 3.3 to allow us to model the orientation of Gaia, whilst an extended Kalman filter will be used in Section 4.3 to model the efficiency with which Gaia observations result in detections through time.

3 SCANNING LAW

Determining the Gaia scanning law is equivalent to determining the attitude of Gaia through time. Determining the attitude of a spacecraft is a key engineering challenge which has been covered extensively in the literature (see Wertz 2012, and references therein), although our use case is peculiar in that we do not have access to gyroscopic² measurements or data from a star tracker. Instead, we will attempt to infer the attitude of Gaia based on the nominal scanning law and the locations and timings of the detections of the Gaia DR2 variable stars. Note that in the post-processed Astrometric Global Iterative Solution of Gaia (AGIS, see Lindegren et al. 2012) the attitude is also derived from the observations themselves, though based on a set of ‘primary’ sources that behave like single point-like sources whose colour is stable. Gaia’s point spread function is colour-dependent (Lindegren et al. 2018) and so if a source’s colour changes with time then that adds additional uncertainty into the determination of its position. Variable stars have colour variations and could be components of astrometric binary systems, both of which make them less than ideal as reference sources for the extreme attitude precision that the official astrometric solution requires, but they are the only sources for which we have Gaia time-series. We chose to use the first-order multiplicative extended Kalman filter (MEKF, Markley 2003), where the attitude of the spacecraft is expressed as a quaternion, the attitude error is multiplicative, and the attitude error and angular velocity vector are assumed to be hidden states that jointly satisfy a multivariate normal distribution at all times. We use quaternions as they are the ideal mathematical object to encode the orientation of a spacecraft (an argument developed in the following section) and were used in the AGIS (Lindegren et al. 2012) to determine Gaia’s attitude for the official Gaia DPAC astrometric solution.

3.1 What are quaternions?

Quaternions (z = a + bi + cj + dk) are a generalization of the complex numbers (z = a + bi) with two additional ‘imaginary’ dimensions satisfying i² = j² = k² = ijk = −1. If this definition is as impenetrable to the reader as it was to the authors, then we recommend an hour exploring Sanderson & Eater (2018). An alternative way to write quaternions is in the composite form |$\boldsymbol {q}=(\boldsymbol {q},q_4)$|⁠, where q₄ ≡ a is the real part of the quaternion, and |$\boldsymbol {q} \equiv (b,c,d)^T$| corresponds to the imaginary parts. The key utility of quaternions is that any rotation by an angle θ about an axis specified by the unit vector |$\hat{u}$| in three real dimensions can be uniquely encoded, without the risk of gimbal lock, as a single number in quaternion space

$$\begin{eqnarray*} \boldsymbol {q} = \cos {\frac{\theta }{2}} + (u_xi+u_yj+u_zk)\sin {\frac{\theta }{2}}. \end{eqnarray*}$$

(32)

Every unit quaternion (i.e. a quaternion normalized such that |$|\boldsymbol {q}|^2+q_4^2=1$|) corresponds to a rotation. The more common way to represent rotations in three dimensions is in terms of a rotation matrix, which can be recovered from the quaternion through

$$\begin{eqnarray*} \boldsymbol{\sf A}(\boldsymbol {q}) = \big(q_4^2-|\boldsymbol {q}|^2\big)I_3 + 2\boldsymbol {q}\boldsymbol {q}^T - 2q_4[\boldsymbol {q}\times ], \end{eqnarray*}$$

(33)

where |$[\boldsymbol {q}\times ]$| is the cross-product matrix

$$\begin{eqnarray*} [\boldsymbol {q}\times ] \equiv {\begin{bmatrix}0 &\quad -q_3 &\quad q_2 \\ q_3 &\quad 0 &\quad -q_1 \\ -q_2 &\quad q_1 &\quad 0 \\ \end{bmatrix}}. \end{eqnarray*}$$

(34)
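Equations (32)–(34) translate directly into code. In the sketch below a quaternion is stored as an array with the vector part first and the scalar part q₄ last, and the function names are hypothetical.

```python
import numpy as np

def cross_matrix(v):
    """Cross-product matrix [v x] of equation (34)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def quaternion_from_axis_angle(axis, angle):
    """Unit quaternion of equation (32), stored as (q_1, q_2, q_3, q_4)."""
    u = np.asarray(axis, dtype=float)
    u = u / np.linalg.norm(u)               # make sure the axis is a unit vector
    return np.append(u * np.sin(angle / 2.0), np.cos(angle / 2.0))

def rotation_matrix(q):
    """Rotation matrix A(q) of equation (33)."""
    v, q4 = q[:3], q[3]
    return ((q4**2 - v @ v) * np.eye(3)
            + 2.0 * np.outer(v, v)
            - 2.0 * q4 * cross_matrix(v))

# A quarter turn about the z-axis.
q = quaternion_from_axis_angle([0.0, 0.0, 1.0], np.pi / 2.0)
A = rotation_matrix(q)
```

With the sign convention of equation (33), applying A to a vector gives its components in the rotated frame, so the quarter turn above maps the x-axis unit vector to (0, −1, 0). The quaternions q and −q yield the same matrix.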

We will follow Markley (2003) in adopting a slightly altered definition of quaternion multiplication which follows that of direction cosine matrices,

$$\begin{eqnarray*} \boldsymbol {p} \otimes \boldsymbol {q} \equiv {\begin{bmatrix}p_4\boldsymbol {q}+q_4\boldsymbol {p}-\boldsymbol {p}\times \boldsymbol {q} \\ p_4q_4-\boldsymbol {p}\cdot \boldsymbol {q} \\ \end{bmatrix}} \end{eqnarray*}$$

(35)

$$\begin{eqnarray*} \hphantom{\boldsymbol {p} \otimes \boldsymbol {q}} \Rightarrow \boldsymbol{\sf A}(\boldsymbol {p})\boldsymbol{\sf A}(\boldsymbol {q})=\boldsymbol{\sf A}(\boldsymbol {p}\otimes \boldsymbol {q}), \end{eqnarray*}$$

(36)

noting that quaternion multiplication is in general non-commutative. This differs from the standard convention only in the ordering of the quaternions: |$\boldsymbol {qp} = \boldsymbol {p} \otimes \boldsymbol {q}$|.
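The product of equation (35) and the composition property of equation (36) can be checked numerically. The following sketch again stores the scalar part last and uses hypothetical function names.

```python
import numpy as np

def rotation_matrix(q):
    """Rotation matrix A(q) of equation (33)."""
    v, q4 = q[:3], q[3]
    vx = np.array([[0.0, -v[2], v[1]],
                   [v[2], 0.0, -v[0]],
                   [-v[1], v[0], 0.0]])
    return (q4**2 - v @ v) * np.eye(3) + 2.0 * np.outer(v, v) - 2.0 * q4 * vx

def multiply(p, q):
    """Quaternion product of equation (35), with the scalar part stored last."""
    pv, p4, qv, q4 = p[:3], p[3], q[:3], q[3]
    return np.append(p4 * qv + q4 * pv - np.cross(pv, qv), p4 * q4 - pv @ qv)

s = np.sqrt(0.5)
p = np.array([0.0, 0.0, s, s])      # quarter turn about the z-axis
q = np.array([s, 0.0, 0.0, s])      # quarter turn about the x-axis
lhs = rotation_matrix(p) @ rotation_matrix(q)
rhs = rotation_matrix(multiply(p, q))
```

For these two quarter turns the matrices `lhs` and `rhs` agree, as required by equation (36), whereas swapping the order of the two quaternions gives a different product.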

The orientation of a body that is rotating can be parametrized by the required rotation from a non-rotating frame, and thus we can use a quaternion to encode the orientation of a body. Under our convention, the kinematic equation relating the time evolution of the quaternion to the angular velocity vector |$\boldsymbol {\omega }$| is

$$\begin{eqnarray*} \dot{\boldsymbol {q}} = \frac{1}{2} {\begin{bmatrix}\boldsymbol {\omega } \\ 0\end{bmatrix}} \otimes \boldsymbol {q}. \end{eqnarray*}$$

(37)

Here, |$\boldsymbol {\omega }$| is the angular velocity evaluated in the frame which moves with the body such that the moment of inertia tensor is constant and diagonal. Integrating this equation of motion is non-trivial due to the manifold constraint that the quaternion must at all times remain normalized to unity, and must therefore evolve smoothly across the unit hyper-sphere, S³. Adapting the first-order manifold integrator of Crouch & Grossman (1993) to a quaternion framework, we find that if |$\boldsymbol {\omega }$| is constant between t and t + δt, then the system evolves exactly according to:

$$\begin{eqnarray*} \boldsymbol {q}(t+\delta t) = {\begin{bmatrix}\frac{\boldsymbol {\omega }}{|\boldsymbol {\omega }|}\sin (\frac{\delta t}{2}|\boldsymbol {\omega }|) \\ \cos (\frac{\delta t}{2}|\boldsymbol {\omega }|) \end{bmatrix}}\otimes \boldsymbol {q}(t). \end{eqnarray*}$$

(38)

This expression exactly preserves the unit normalization of the quaternion. In the limit of small time-steps, it reduces to the result of a naive additive integration of the equation of motion.
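A minimal sketch of the update of equation (38) is given below, assuming a constant angular velocity over each step. The function names are hypothetical and the six-hour spin period is used only as an illustrative value.

```python
import numpy as np

def multiply(p, q):
    """Quaternion product of equation (35), with the scalar part stored last."""
    pv, p4, qv, q4 = p[:3], p[3], q[:3], q[3]
    return np.append(p4 * qv + q4 * pv - np.cross(pv, qv), p4 * q4 - pv @ qv)

def propagate(q, omega, dt):
    """Advance the attitude q by dt for a constant body-frame angular velocity."""
    rate = np.linalg.norm(omega)
    if rate == 0.0:
        return q.copy()                     # no rotation, attitude unchanged
    half = 0.5 * dt * rate
    step = np.append(omega / rate * np.sin(half), np.cos(half))
    return multiply(step, q)                # equation (38)

period = 6.0 * 3600.0                       # illustrative spin period in seconds
omega = np.array([0.0, 0.0, 2.0 * np.pi / period])
q0 = np.array([0.0, 0.0, 0.0, 1.0])
q_half = propagate(q0, omega, 0.5 * period)
q_full = propagate(q0, omega, period)
```

After one full period the quaternion returns to −q0 rather than q0. Both represent the same orientation, because equation (33) is quadratic in the components of the quaternion.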

A further consequence of the unit-normalization constraint is that if the observed attitude |$\boldsymbol {q}$| is an uncertain estimate of the true attitude |$\hat{\boldsymbol {q}}$|, then it is meaningless to interpret that uncertainty as additive, |$\boldsymbol {q} = \hat{\boldsymbol {q}}+\delta \boldsymbol {q}$|, because there is no guarantee that the sum of the true attitude and noise will have unit normalization, and hence no guarantee that it will represent a valid orientation state. An alternative used by Markley (2003) is to consider a multiplicative uncertainty |$\boldsymbol {q} = \boldsymbol {\delta q}(\boldsymbol {a})\otimes \hat{\boldsymbol {q}}$|, where the uncertainty is parametrized by an unconstrained vector |$\boldsymbol {a}$| in |$\mathbb {R}^3$| which is mapped to the space of unit quaternions by the function

$$\begin{eqnarray*} \boldsymbol {\delta q}(\boldsymbol {a}) = \frac{1}{\sqrt{4+|\boldsymbol {a}|^2}}{\begin{bmatrix}\boldsymbol {a}\\ 2\end{bmatrix}}. \end{eqnarray*}$$

(39)

The product of two unit quaternions is a unit quaternion and thus this form is guaranteed to preserve the unit normalization. To first order in the components of |$\boldsymbol {a}$|⁠, the rotation matrix corresponding to the mean rotation and an additional small rotation can be approximated by

$$\begin{eqnarray*} \boldsymbol{\sf A}({\boldsymbol q}) &=& \boldsymbol{\sf A}(\boldsymbol {\delta q}(\boldsymbol {a})\otimes \hat{{\boldsymbol q}}) \nonumber \\ &=& \boldsymbol{\sf A}(\boldsymbol {\delta q}(\boldsymbol {a}))\boldsymbol{\sf A}(\hat{{\boldsymbol q}}) \nonumber \\ &=& (\boldsymbol{\sf I}_3-[\boldsymbol {a}\times ])\boldsymbol{\sf A}(\hat{{\boldsymbol q}}). \end{eqnarray*}$$

(40)
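The map in equation (39) and the first-order result in equation (40) can be checked numerically. The sketch below assumes quaternions stored as `[x, y, z, w]` and a Markley-style attitude matrix; the helper names (`cross_matrix`, `delta_q`, `delta_q_inverse`, `attitude_matrix`) are hypothetical and are not taken from any released code.

```python
import numpy as np


def cross_matrix(v):
    """Matrix [v×] such that cross_matrix(v) @ x equals np.cross(v, x)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])


def delta_q(a):
    """Equation (39): map an unconstrained 3-vector to a unit quaternion."""
    a = np.asarray(a, dtype=float)
    return np.append(a, 2.0) / np.sqrt(4.0 + a @ a)


def delta_q_inverse(q):
    """Inverse of equation (39): a = 2 * (vector part) / (scalar part)."""
    return 2.0 * q[:3] / q[3]


def attitude_matrix(q):
    """Rotation matrix of a unit quaternion [x, y, z, w] (assumed convention)."""
    v, w = q[:3], q[3]
    return ((w * w - v @ v) * np.eye(3)
            - 2.0 * w * cross_matrix(v)
            + 2.0 * np.outer(v, v))


a = np.array([1e-4, -2e-4, 3e-4])          # small illustrative error vector (rad)
dq = delta_q(a)
exact = attitude_matrix(dq)                 # full rotation matrix
first_order = np.eye(3) - cross_matrix(a)   # right-hand side of equation (40)
print(np.abs(exact - first_order).max())    # second order in |a|
```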

3.2 Positions of stars on the sky and on the focal plane

We are attempting to infer the attitude of Gaia with very little information. Traditionally, the attitude of a spacecraft is inferred using measurements from a combination of gyroscopes and star trackers, but we do not have access to these. We only have access to the light curves of the 550 737 stars classified as photometrically variable by DPAC in Gaia DR2. Each of the flux measurements in those light curves has an associated transit_id which encodes the time t, the field of view F, and the across-scan pixel P and CCD C of the detection on the Gaia focal plane. In the Gaia body frame, the along-scan angles are the longitude and the across-scan angles are the latitude, so named because they point in directions parallel and perpendicular to the direction in which Gaia is scanning. Note that technically the transit_id encodes the centre of the window that was assigned on-board to record the flux around the identified point source. Both time and position in the transit_id are from the on-board estimates, which have an uncertainty of roughly 1 pixel, that is, ∼60 mas in the along-scan direction (corresponding to ∼1 ms) and ∼180 mas in the across-scan direction. As we do not have access to the sub-pixel centroid estimate for each window, the precision of each transit observation will hence be limited to about 1 pixel in both directions. The timings of the detections are those measured on-board Gaia and so will be subject to aberration effects, while the timings of the brightest detections (G < 12) could be offset due to the partial, gated integrations used to prevent saturation.

It is more important to accurately know the across-scan location of a source than the along-scan location, because in the latter case a 20 arcsec offset will only cause an error of 0.3 s in the observation time, while in the former case it might mean that the star falls outside of Gaia’s field of view. Relativistic aberration is caused by Gaia’s motion around the Sun of roughly 30 km s⁻¹ (Gaia is located at L2 and so moves at approximately Earth’s orbital velocity), which can result in a maximum total aberration of about 20 arcsec (see equation 17 of Klioner 2003) across a combination of the along- and across-scan directions. Due to the importance of the across-scan location being correct, we corrected for the effect of aberration as described later in this section. The gating can only cause the detection location to be off in the along-scan direction by at most half of a CCD integration time (the ungated CCD integration time is 4.42 s; see section 2.2 of Crowley et al. 2016) and so the predicted observation times of the gated brightest stars can be off by at most 2.2 s. Errors of these scales in the predicted observation time will not be astrophysically important even for stars exploding as supernovae, which justifies our neglect of this effect.

Given the geometry of the Gaia focal plane, there are functions η(F, P, C) and ζ(F, P, C) that map these quantities to along- and across-scan angles in the body frame of Gaia, which can then be expressed as a unit vector. The attitude of Gaia at the time of detection can then be used to rotate that unit vector to the ICRS reference frame. However, we already know the location of the star on the sky from the Gaia DR2 source catalogue and thus we can constrain the unknown attitude at the time of the detection by requiring that the location of the star on the focal plane aligns with the location of the star on the sky. We note that – even if the position of the star on the sky and the geometry of the focal plane is known perfectly – the alignment of one vector in the body frame with one in the reference frame only constrains two of the three degrees of freedom in the attitude.

We used the information in the Gaia DR2 light curves in Paper I to make a determination of the Gaia scanning law. We did not carry out a full attitude determination in that preliminary work, restricting ourselves to only considering across-scan corrections to the positions of the preceding and following fields of view. This is equivalent to changing the roll and pitch of Gaia in the body frame but leaving the yaw fixed, and thus reduces the dimensionality of the problem to two degrees of freedom. In this work, we adopt the same across-scan focal plane geometry as in Paper I and additionally constrain the along-scan location,

$$\begin{eqnarray*} \zeta (F,C,P) &=& \Delta_f(1-2F)+(\Delta_c+\delta _c)(C-3)-(\Delta _p+\delta _p)\nonumber\\ &&\times \,(P-1965/2), \nonumber \\ \eta (F,C,P) &=& \eta _{\mathrm{AF1}}+\gamma _c(1-2F) \end{eqnarray*}$$

(41)

where Δ_f = 220.9979 arcsec is the magnitude of the across-scan offset of each of the fields of view, Δ_c = 356.5435 arcsec is the across-scan size of each CCD, Δ_p = 0.1768 arcsec is the across-scan size of each pixel, η_AF1 = −0.5° is the along-scan distance between the centre of each field of view and the reference centre of the AF1 CCDs, and γ_c = 106.5°/2 is half the basic angle of Gaia. The quantities δ_c and δ_p are free parameters that can correct for small errors in the across-scan sizes of the CCDs and pixels, respectively. We do not include corrections to the across-scan field-of-view offset nor the along-scan offset of AF1 from the field-of-view centres, because these are fully degenerate with the roll and yaw of Gaia. The finite size of the along-scan and across-scan pixels means that both of the expressions in equation (41) give only an estimate of the location on the focal plane. We assume a variance in the along-scan direction of |$\sigma _{\eta }^2=(17\,\,\mathrm{arcsec})^2$| (based on the variance in the along-scan locations of stars at their times of observation if we use the nominal scanning law) and in the across-scan direction of |$\sigma _{\zeta }^2 = \frac{1}{12}(\Delta _p+\delta _p)^2+\sigma _p^2$| (the first term is the variance of a uniform distribution and the second term accounts for any excess spread in the across-scan direction).
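For concreteness, equation (41) and the across-scan variance can be written as a few lines of code. This is only a sketch: the function names are hypothetical, F is assumed to take the values 0 and 1 for the two fields of view (which of the two is the preceding one is not specified here), and the corrections and σ_p must be supplied in arcsec.

```python
# Constants quoted in the text (arcsec unless stated otherwise).
DELTA_F = 220.9979      # across-scan offset of each field of view
DELTA_C = 356.5435      # across-scan size of each CCD
DELTA_P = 0.1768        # across-scan size of each pixel
ETA_AF1_DEG = -0.5      # along-scan offset of AF1 (degrees)
GAMMA_C_DEG = 106.5 / 2 # half the basic angle (degrees)


def zeta(F, C, P, delta_c=0.0, delta_p=0.0):
    """Across-scan angle in arcsec from equation (41)."""
    return (DELTA_F * (1 - 2 * F)
            + (DELTA_C + delta_c) * (C - 3)
            - (DELTA_P + delta_p) * (P - 1965 / 2))


def eta(F):
    """Along-scan angle in degrees from equation (41)."""
    return ETA_AF1_DEG + GAMMA_C_DEG * (1 - 2 * F)


def sigma_zeta_squared(delta_p=0.0, sigma_p=0.0):
    """Across-scan variance in arcsec^2: uniform-pixel term plus excess spread."""
    return (DELTA_P + delta_p) ** 2 / 12 + sigma_p ** 2


# Example call with hypothetical detection indices and no corrections applied.
print(zeta(F=1, C=5, P=1200), eta(F=1), sigma_zeta_squared())
```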

Suppose you have a mean longitude and latitude |$(\hat{\phi },\hat{\theta })$| with covariant uncertainties |$(\Delta \phi \cos {\hat{\theta }}, \Delta \theta)\sim N(0,\Sigma)$|⁠. To first order, the unit vector defined by those coordinates is then distributed like |$\boldsymbol {u} \sim N(\hat{\boldsymbol {u}},\boldsymbol{\sf T}\Sigma \boldsymbol{\sf T}^T)$|⁠, where

$$\begin{eqnarray*} \hat{\vec{u}} = {\begin{bmatrix}\cos {\hat{\phi }}\cos {\hat{\theta }} \\ \sin {\hat{\phi }}\cos {\hat{\theta }} \\ \sin {\hat{\theta }}\end{bmatrix}}, \quad \boldsymbol{\sf T} = {\begin{bmatrix}-\sin {\hat{\phi }} &\quad -\cos {\hat{\phi }}\sin {\hat{\theta }} \\ +\cos {\hat{\phi }} &\quad -\sin {\hat{\phi }}\sin {\hat{\theta }} \\ 0 &\quad +\cos {\hat{\theta }}\end{bmatrix}}. \end{eqnarray*}$$

(42)

This operation is equivalent to projecting the uncertainty in the unit vector into the plane that is tangent to the unit sphere at the mean position and thus the covariance matrix |$\boldsymbol{\sf T}\Sigma\boldsymbol{\sf T}^T$| is rank deficient – it describes uncertainty in three dimensions but only has two degrees of freedom. In almost all places where we will use this approximation the uncertainty on the positions is less than 1 arcsec and so the approximation is very accurate. We use these equations to express the uncertain along- and across-scan coordinates as a unit vector in Gaia’s body frame. One of the conveniences of working with unit vectors rather than angles is that it is easy to rotate a mean vector |$\hat{\boldsymbol {u}}$| and covariance matrix Σ from the reference frame to the body frame of Gaia using the matrix |$\boldsymbol{\sf A}(q)$|, with the rotated vector having mean |$\boldsymbol{\sf A}\hat{\boldsymbol {u}}$| and covariance |$\boldsymbol{\sf A}\Sigma\boldsymbol{\sf A}^T$|.
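The projection in equation (42) is straightforward to reproduce. The sketch below uses an arbitrary illustrative position and covariance (not values from this work) and a hypothetical helper `unit_vector_and_projection`; it shows that the resulting 3 × 3 covariance has no component along the mean direction.

```python
import numpy as np


def unit_vector_and_projection(phi, theta):
    """Mean unit vector and the 3x2 matrix T of equation (42) (angles in rad)."""
    u = np.array([np.cos(phi) * np.cos(theta),
                  np.sin(phi) * np.cos(theta),
                  np.sin(theta)])
    T = np.array([[-np.sin(phi), -np.cos(phi) * np.sin(theta)],
                  [np.cos(phi), -np.sin(phi) * np.sin(theta)],
                  [0.0, np.cos(theta)]])
    return u, T


arcsec = np.deg2rad(1.0 / 3600.0)
phi, theta = np.deg2rad(35.0), np.deg2rad(-20.0)   # illustrative position
Sigma = arcsec ** 2 * np.array([[0.25, 0.05],
                                [0.05, 0.09]])     # illustrative covariance

u, T = unit_vector_and_projection(phi, theta)
cov_u = T @ Sigma @ T.T                            # rank-deficient 3x3 covariance

# Rotating to another frame only needs a rotation matrix (here about z).
angle = np.deg2rad(10.0)
A = np.array([[np.cos(angle), np.sin(angle), 0.0],
              [-np.sin(angle), np.cos(angle), 0.0],
              [0.0, 0.0, 1.0]])
u_rot, cov_rot = A @ u, A @ cov_u @ A.T
```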

The Gaia DR2 astrometry of a star is given as the mean position, parallax, and proper motions |$\boldsymbol {x}_0=(\hat{\alpha }\cos {\hat{\delta }},\hat{\delta },\hat{\varpi },\hat{\mu }_{\alpha }\cos {\hat{\delta }},\hat{\mu }_{\delta })_0$| at the epoch T₀ = J2015.5 and a joint covariance matrix Σ describing the uncertainty in each of these. In general we will be interested in the position of the star at a different epoch T₀ + ΔT and so we need to propagate the position forward or backward in time by applying the linear operator

$$\begin{eqnarray*} \boldsymbol{\sf M} = {\begin{bmatrix} 1 &\, 0 &\, X_{\mathrm{G}}\sin {\hat{\alpha }_0}-Y_{\mathrm{G}}\cos {\hat{\alpha }_0} &\, \Delta T &\, 0 \\ 0 &\, 1 &\, (X_{\mathrm{G}}\cos {\hat{\alpha }_0} + Y_{\mathrm{G}}\sin {\hat{\alpha }_0})\sin {\hat{\delta }_0} - Z_{\mathrm{G}}\cos {\hat{\delta }_0} &\, 0 &\, \Delta T \\ 0 &\, 0 &\, 1 &\, 0 &\, 0 \\ 0 &\, 0 &\, 0 &\, 1 &\, 0 \\ 0 &\, 0 &\, 0 &\, 0 &\, 1 \end{bmatrix}}, \nonumber\\ \end{eqnarray*}$$

(43)

where |$\boldsymbol {x}_{\mathrm{G}} = (X_{\mathrm{G}},Y_{\mathrm{G}},Z_{\mathrm{G}})$| is the vector between the Solar system barycentre and Gaia at the epoch T₀ + ΔT (obtained from the NASA JPL HORIZONS ephemeris calculator) and we assume that the proper motions and parallax are constant over the ±1 yr interval of ΔT. The predicted astrometry at the epoch T₀ + ΔT will have mean |$\boldsymbol{\sf M}\boldsymbol {x}_0$| and covariance |$\boldsymbol{\sf M}\Sigma\boldsymbol{\sf M}^T$|. We predict the astrometry for all of the sources with variable light curves in Gaia DR2 at the time of each flux measurement. We discarded the measurements from any source which lacked 5D astrometry in Gaia DR2, which resulted in us discarding 466 706 out of the 17 672 340 flux measurements. Focusing only on the rows and columns of the predicted astrometry that contain the positions, we can then use the results of the previous paragraph to predict the mean |$\hat{u}$| and covariance |$\boldsymbol{\sf S}$| of the components of the unit vector pointing from Gaia to the source in the reference frame at the epoch T₀ + ΔT.
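A short sketch of the linear propagation in equation (43) follows. The function name is hypothetical, and the units are an assumption made for the example: angular quantities in mas (positions expressed as offsets), proper motions in mas yr⁻¹, ΔT in yr, and the barycentric position of Gaia in au so that the parallax terms are also in mas.

```python
import numpy as np


def propagation_matrix(alpha0, delta0, x_gaia, dT):
    """Linear operator M of equation (43); alpha0 and delta0 in radians."""
    X, Y, Z = x_gaia
    M = np.eye(5)
    M[0, 2] = X * np.sin(alpha0) - Y * np.cos(alpha0)
    M[1, 2] = ((X * np.cos(alpha0) + Y * np.sin(alpha0)) * np.sin(delta0)
               - Z * np.cos(delta0))
    M[0, 3] = dT
    M[1, 4] = dT
    return M


# Illustrative inputs only (not a real source or ephemeris).
alpha0, delta0 = np.deg2rad(120.0), np.deg2rad(-30.0)
x0 = np.array([0.0, 0.0, 2.0, 5.0, -3.0])        # offsets, parallax, proper motions
Sigma = np.diag([0.04, 0.04, 0.01, 0.09, 0.09])  # joint covariance
x_gaia = np.array([0.3, -0.9, -0.4])             # hypothetical position (au)

M = propagation_matrix(alpha0, delta0, x_gaia, dT=-0.5)
x_pred = M @ x0                                  # predicted astrometry
Sigma_pred = M @ Sigma @ M.T                     # predicted covariance
position_cov = Sigma_pred[:2, :2]                # rows and columns of the position
```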

Finally, we accounted for the relativistic effect of aberration, which causes sources viewed by a moving observer to appear nearer to the apex of the observer’s motion. The unit vectors |$\boldsymbol {u}$| from the previous paragraph were derived using the location of the source on the sky in a frame that is stationary with respect to the Solar system barycentre, but aberration due to Gaia’s orbit will cause the location of the source to vary between observations. Klioner (2003) states that the angular shift towards the apex is given by

$$\begin{eqnarray*} \delta \theta &=& \frac{1}{c}|\boldsymbol {v}_{\mathrm{G}}|\sin \theta \left[1+\frac{1}{c^2}(1+\gamma)w(\boldsymbol {x}_{\mathrm{G}})+\frac{1}{4}\frac{|\boldsymbol {v}_{\mathrm{G}}|^2}{c^2}\right] \nonumber \\ &&-\,\frac{1}{4}\frac{|\boldsymbol {v}_{\mathrm{G}}|^2}{c^2}\sin 2\theta + \frac{1}{12}\frac{|\boldsymbol {v}_{\mathrm{G}}|^3}{c^3}\sin 3\theta + O(c^{-4}), \end{eqnarray*}$$

(44)

where |$\boldsymbol {v}_{\mathrm{G}}$| is the velocity of Gaia relative to the Solar system barycentre at the epoch T₀ + ΔT (obtained from the NASA JPL HORIZONS ephemeris calculator), θ is the angle of the source from the apex of Gaia’s motion, γ is a parameter in the parametrized post-Newtonian formalism (we assume that General Relativity holds and thus γ = 1), and |$w(\boldsymbol {x}_{\mathrm{G}})$| gives the Solar system gravitational potential at Gaia’s location. The angular shifts predicted by equation (44) are at most about 20 arcsec for sources observed by Gaia. We note that due to Gaia’s low velocity (30 km s⁻¹) and large distance from any massive bodies, almost all of the shift comes from the first term (|$\delta \theta \approx \frac{1}{c}|\boldsymbol {v}_{\mathrm{G}}|\sin \theta$|), with the remaining terms contributing at most 0.0005 arcsec. We computed the angular shift due to aberration for every observation of every source and then rotated the unit vector |$\boldsymbol {u}$| by that amount around the axis perpendicular to the velocity vector of Gaia and |$\boldsymbol {u}$| using the rotation matrix |$\boldsymbol{\sf R}$|. When predicting observations of sources in Section 4, we used the first-order expansion of equation (10) of Klioner (2003) that predicts that the unit vector |$\boldsymbol {u}$| towards a source will be perturbed to |$\boldsymbol {u}+\boldsymbol {v}_{\mathrm{G}}/c$|, with the caveat that we re-normalize the resulting vector to ensure it retains unit normalization. In subsequent paragraphs, we will refer to the unit vector with aberration included as |$\boldsymbol {u}$|.
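The first-order treatment of aberration is simple enough to sketch directly. The example below perturbs a unit vector by v_G/c and re-normalizes, then compares the resulting angular shift with the leading term of equation (44). The function names and the 30 km s⁻¹ velocity vector are illustrative assumptions; in practice the velocity would come from an ephemeris.

```python
import numpy as np

C_KM_S = 299792.458  # speed of light in km/s


def aberrate(u, v_gaia):
    """First-order aberration: perturb u by v/c and re-normalize."""
    shifted = u + v_gaia / C_KM_S
    return shifted / np.linalg.norm(shifted)


def leading_order_shift(u, v_gaia):
    """Leading term of equation (44), |v| sin(theta) / c, in radians."""
    speed = np.linalg.norm(v_gaia)
    cos_theta = u @ v_gaia / speed
    sin_theta = np.sqrt(max(0.0, 1.0 - cos_theta ** 2))
    return speed / C_KM_S * sin_theta


v_gaia = np.array([0.0, 30.0, 0.0])   # illustrative velocity (km/s)
u = np.array([1.0, 0.0, 0.0])         # source 90 degrees from the apex
u_ab = aberrate(u, v_gaia)

# Angle between the original and aberrated directions, in arcsec.
angle = np.arctan2(np.linalg.norm(np.cross(u, u_ab)), u @ u_ab)
shift_arcsec = np.rad2deg(angle) * 3600.0
predicted_arcsec = np.rad2deg(leading_order_shift(u, v_gaia)) * 3600.0
print(shift_arcsec, predicted_arcsec)
```

A source perpendicular to the velocity vector experiences the largest shift, which for this velocity is close to the 20 arcsec quoted above.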

Suppose that at this epoch we have an uncertain estimate of the attitude of Gaia |$\boldsymbol {\delta q}(\boldsymbol {a})\otimes \hat{{\boldsymbol q}}$| parametrized by |$\boldsymbol {a}$| with mean |$\hat{\boldsymbol {a}}=0$| and covariance |$\boldsymbol{\sf P}_{aa}$|, then – to first order in the uncertainties of the attitude and the unit vector in the reference frame – the predicted unit vector in the body frame of Gaia has mean |$\boldsymbol{\sf A}(\hat{{\bf q}})\vec{u}$| and covariance |$\boldsymbol{\sf H}_a\boldsymbol{\sf P}_{aa}\boldsymbol{\sf H}_a^T+\boldsymbol{\sf A}(\hat{{\boldsymbol q}})\boldsymbol{\sf S}\boldsymbol{\sf A}^T(\hat{{\boldsymbol q}})$|⁠, where |$\boldsymbol{\sf H}_a = [\hat{\boldsymbol {u}}\times ]$|⁠. The second term is simply the rotation of the uncertainty in the unit vector in the reference frame and we refer the reader to Markley (2003) for details on the first term.

The key to the MEKF that we will describe in the following section is the measurement update step: the uncertain unit vector in the Gaia body frame pointing from Gaia to a source is predicted using our current estimate of Gaia’s attitude, thus allowing us to use the unit vector deduced from the field of view, CCD, and pixel of the detection as a measurement to constrain the attitude.

3.3 The multiplicative extended Kalman filter

The MEKF is a modification of the extended Kalman filter described in Section 2.3 that allows for part of the state – the attitude of the spacecraft – to be constrained to have unit normalization. We stress that the overview of the MEKF presented here draws heavily from Markley (2003), who presented one of the seminal overviews of the MEKF, and Burton et al. (2017), who adapted the MEKF for the case where the attitude needed to be determined based on noisy measurements of the vector pointing towards the Sun rather than measurements from a gyroscope or star tracker.

The MEKF represents the true attitude quaternion of the spacecraft as

$$\begin{eqnarray*} {\boldsymbol q}(t)=\boldsymbol{\delta q}(\boldsymbol {a}(t))\otimes \hat{{\boldsymbol q}}(t), \end{eqnarray*}$$

(45)

where |$\hat{{\boldsymbol q}}(t)$| is our expectation value of the attitude quaternion and |$\boldsymbol{\delta q}(\boldsymbol {a}(t))$| encodes the uncertainty in that attitude (see equation 39). The state space of the MEKF is composed of three parts: the mean attitude quaternion |$\hat{{\boldsymbol q}}(t)$|⁠, the attitude error vector |$\boldsymbol {a}(t)$|⁠, and the angular velocity vector |$\boldsymbol {\omega }(t)$|⁠. The redundancy between the expectation value of |$\boldsymbol {a}(t)$| and |$\hat{{\boldsymbol q}}(t)$| is removed by resetting |$\boldsymbol {a}(t)$| to zero after each measurement update. The quantities that are explicitly tracked by the MEKF are thus the mean attitude and angular velocity vector |$(\hat{{\boldsymbol q}},\hat{\boldsymbol{\omega }})$| and the error covariance matrix

$$\begin{eqnarray*} \boldsymbol{\sf P} = {\begin{bmatrix}\boldsymbol{\sf P}_{aa} &\quad \boldsymbol{\sf P}_{a\omega } \\ \boldsymbol{\sf P}_{a\omega }^T &\quad \boldsymbol{\sf P}_{\omega \omega } \end{bmatrix}}. \end{eqnarray*}$$

(46)

We assume that Gaia is subject to random zero-mean angular accelerations such that the dynamical equations are

$$\begin{eqnarray*} \dot{{\boldsymbol q}} = \frac{1}{2}{\begin{bmatrix}\boldsymbol{\omega } \\ 0 \end{bmatrix}}\otimes {\boldsymbol q} \end{eqnarray*}$$

(47)

$$\begin{eqnarray*} \dot{\boldsymbol {a}} = \boldsymbol {\omega }-\hat{\boldsymbol {\omega }}+\boldsymbol {a}\times (\boldsymbol {\omega }+\hat{\boldsymbol {\omega }}) \end{eqnarray*}$$

(48)

$$\begin{eqnarray*} \dot{\boldsymbol {\omega }} = \boldsymbol {w}, \end{eqnarray*}$$

(49)

where |$\boldsymbol {w}$| is drawn from a zero-mean multivariate normal distribution with the covariance matrix |$\boldsymbol{\sf Q}=\sigma _{\mathrm{MEKF}}^2\boldsymbol{\sf I}_{3\times 3}$|⁠. We note that in standard applications of the MEKF, equation (49) would either have input from a gyroscope or include a model of the spacecraft dynamics in terms of torques and the moment of inertia. We have access to neither of these and thus rely on there being a sufficient number of variable star measurements that we capture the change in the angular velocity vector through time. Our estimates of these quantities evolve through time satisfying the equations

$$\begin{eqnarray*} \dot{\hat{{\boldsymbol q}}}(t) = \frac{1}{2}{\begin{bmatrix}\hat{\boldsymbol {\omega }} \\ 0 \end{bmatrix}}\otimes \hat{{\boldsymbol q}}(t) \end{eqnarray*}$$

(50)

$$\begin{eqnarray*} \dot{\hat{\boldsymbol {\omega }}}(t) = 0 \end{eqnarray*}$$

(51)

$$\begin{eqnarray*} \dot{\boldsymbol{\sf P}}(t) = \boldsymbol{\sf F}\boldsymbol{\sf P}(t) + \boldsymbol{\sf P}(t)\boldsymbol{\sf F}^T + \boldsymbol{\sf G}\boldsymbol{\sf Q}\boldsymbol{\sf G}^T, \end{eqnarray*}$$

(52)

where

$$\begin{eqnarray*} \boldsymbol{\sf F} = {\begin{bmatrix}\frac{\partial \dot{\boldsymbol {a}}}{\partial \boldsymbol {a}}\big \vert _{\hat{\boldsymbol {a}},\hat{\boldsymbol {\omega }}} &\quad \frac{\partial \dot{\boldsymbol {a}}}{\partial \boldsymbol {\omega }}\big \vert _{\hat{\boldsymbol {a}},\hat{\boldsymbol {\omega }}} \\ \frac{\partial \dot{\boldsymbol {\omega }}}{\partial \boldsymbol {a}}\big \vert _{\hat{\boldsymbol {a}},\hat{\boldsymbol {\omega }}} &\quad \frac{\partial \dot{\boldsymbol {\omega }}}{\partial \boldsymbol {\omega }}\big \vert _{\hat{\boldsymbol {a}},\hat{\boldsymbol {\omega }}} \end{bmatrix}} = {\begin{bmatrix}-[\hat{\boldsymbol {\omega }}\times ] &\quad \boldsymbol{\sf I}_{3\times 3} \\ \boldsymbol{\sf 0}_{3\times 3} &\quad \boldsymbol{\sf 0}_{3\times 3} \end{bmatrix}}, \end{eqnarray*}$$

(53)

and

$$\begin{eqnarray*} \boldsymbol{\sf G} = {\begin{bmatrix}\frac{\partial \dot{\boldsymbol {a}}}{\partial \boldsymbol {w}}\big \vert _{\hat{\boldsymbol {a}},\hat{\boldsymbol {\omega }}} \\ \frac{\partial \dot{\boldsymbol {\omega }}}{\partial \boldsymbol {w}}\big \vert _{\hat{\boldsymbol {a}},\hat{\boldsymbol {\omega }}}\end{bmatrix}} = {\begin{bmatrix}\boldsymbol{\sf 0}_{3\times 3} \\ \boldsymbol{\sf I}_{3\times 3} \end{bmatrix}}. \end{eqnarray*}$$

(54)

We refer the reader to Markley (2003) and Burton et al. (2017) for the derivations of these expressions.

Analogously with Section 2.3, we define |$\hat{{\boldsymbol q}}_{i|j}$|⁠, |$\hat{\boldsymbol {\omega }}_{i|j}$|⁠, and |$\boldsymbol{\sf P}_{i|j}$| to be our estimate of the mean attitude quaternion, mean angular velocity vector, and covariance between the attitude error and angular velocity vectors at time t_i given all of the measurements made at times up to and including time t_j. The MEKF iteratively calculates the quantities |$\hat{{\boldsymbol q}}_{i|i}$|⁠, |$\hat{\boldsymbol {\omega }}_{i|i}$|⁠, and |$\boldsymbol{\sf P}_{i|i}$| at each time-step given all of the prior observations. Each time-step is split into a prediction and a measurement update stage.

At the prediction stage, the dynamical equations above are solved with the expressions

$$\begin{eqnarray*} \hat{{\boldsymbol q}}_{i|i-1} = {\begin{bmatrix}\frac{\hat{\boldsymbol {\omega }}_{i-1|i-1}}{|\hat{\boldsymbol {\omega }}_{i-1|i-1}|}\sin \left(\frac{\delta t}{2}|\hat{\boldsymbol {\omega }}_{i-1|i-1}|\right) \\ \cos \left(\frac{\delta t}{2}|\hat{\boldsymbol {\omega }}_{i-1|i-1}|\right) \end{bmatrix}}\otimes \hat{\boldsymbol {q}}_{i-1|i-1} \end{eqnarray*}$$

(55)

$$\begin{eqnarray*} \hat{\boldsymbol {\omega }}_{i|i-1} = \hat{\boldsymbol {\omega }}_{i-1|i-1} \end{eqnarray*}$$

(56)

$$\begin{eqnarray*} \boldsymbol{\sf P}_{i|i-1} = \Phi _{i|i-1}\boldsymbol{\sf P}_{i-1|i-1}\Phi _{i|i-1}^T + \delta t \boldsymbol{\sf G}\boldsymbol{\sf Q}\boldsymbol{\sf G}^T, \end{eqnarray*}$$

(57)

where δt = t_i − t_{i − 1} and the state transition matrix is defined by |$\Phi_{i|i-1} = \exp(\delta t \boldsymbol{\sf F}_{i-1|i-1})$|⁠. We exploited the structure of |$\boldsymbol{\sf F}_{i-1|i-1}$| to avoid explicitly calculating the matrix exponential, as detailed in Appendix A.
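One way to avoid a general matrix exponential is to note that the block structure of F in equation (53) gives Φ = [[R, B], [0, I]], where R = exp(−δt[ω×]) follows from Rodrigues' formula and B is its time integral. The sketch below is our own illustration of this idea and is not necessarily the form derived in Appendix A; the function names and the initial covariance are hypothetical.

```python
import numpy as np


def cross_matrix(v):
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])


def transition_matrix(omega, dt):
    """Closed form of exp(dt * F) for F = [[-[omega x], I], [0, 0]]."""
    rate = np.linalg.norm(omega)
    Phi = np.eye(6)
    if rate * dt < 1e-12:                 # negligible rotation: R = I, B = dt * I
        Phi[:3, 3:] = dt * np.eye(3)
        return Phi
    K = cross_matrix(omega / rate)
    angle = rate * dt
    Phi[:3, :3] = np.eye(3) - np.sin(angle) * K + (1.0 - np.cos(angle)) * K @ K
    Phi[:3, 3:] = (dt * np.eye(3)
                   - (1.0 - np.cos(angle)) / rate * K
                   + (dt - np.sin(angle) / rate) * K @ K)
    return Phi


def predict_covariance(P, omega, dt, sigma_mekf):
    """Equation (57) with G Q G^T = diag(0, sigma_mekf^2 I)."""
    Phi = transition_matrix(omega, dt)
    process = np.zeros((6, 6))
    process[3:, 3:] = sigma_mekf ** 2 * np.eye(3)
    return Phi @ P @ Phi.T + dt * process


arcsec = np.deg2rad(1.0 / 3600.0)
omega = np.array([0.0, 0.0, 60.0 * arcsec])     # illustrative spin (rad/s)
P = np.diag([arcsec ** 2] * 3 + [1e-16] * 3)    # hypothetical prior covariance
P_pred = predict_covariance(P, omega, dt=10.0, sigma_mekf=1.28e-6 * arcsec)
```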

At the measurement stage, we use the detected location of the star on the focal plane and the known location of the star on the sky to place a constraint on the attitude of Gaia. The unit vector of the star in the reference frame has mean |$\hat{\boldsymbol {x}}_i$| and covariance Σ_i, while the unit vector of the detection in the body frame has mean |$\hat{\boldsymbol {u}}_i$| and covariance |$\boldsymbol{\sf R}_i$|. The equations expressing this are

$$\begin{eqnarray*} \hat{\boldsymbol{\sf A}} = \boldsymbol{\sf A}(\hat{{\boldsymbol q}}_{i|i-1}) \end{eqnarray*}$$

(58)

$$\begin{eqnarray*} \hat{\boldsymbol {m}}_i = \hat{\boldsymbol{\sf A}}\hat{\boldsymbol {x}}_i \end{eqnarray*}$$

(59)

$$\begin{eqnarray*} \boldsymbol{\sf H} = {\begin{bmatrix}[\hat{\boldsymbol {m}}_i\times] & \boldsymbol{\sf 0}_3 \end{bmatrix}} \end{eqnarray*}$$

(60)

$$\begin{eqnarray*} \boldsymbol{\sf S} = \boldsymbol{\sf H} \boldsymbol{\sf P}_{i|i-1} \boldsymbol{\sf H}^T + \hat{\boldsymbol{\sf A}}\Sigma _i\hat{\boldsymbol{\sf A}}^T + \boldsymbol{\sf R}_i \end{eqnarray*}$$

(61)

$$\begin{eqnarray*} \boldsymbol{\sf K} = \boldsymbol{\sf P}_{i|i-1}\boldsymbol{\sf H}^T\boldsymbol{\sf S}^{-1} \end{eqnarray*}$$

(62)

$$\begin{eqnarray*} \boldsymbol {c} = \hat{\boldsymbol {u}}_i - \hat{\boldsymbol {m}}_i \end{eqnarray*}$$

(63)

$$\begin{eqnarray*} {\begin{bmatrix}\hat{\boldsymbol{a}} \\ \Delta \hat{\boldsymbol {\omega }} \end{bmatrix}} = \boldsymbol{\sf K}\boldsymbol {c} \end{eqnarray*}$$

(64)

$$\begin{eqnarray*} {\boldsymbol q}_{i|i} = \boldsymbol{\delta q}(\hat{\boldsymbol {a}})\otimes {\boldsymbol q}_{i|i-1} \end{eqnarray*}$$

(65)

$$\begin{eqnarray*} \hat{\boldsymbol {\omega }}_{i|i} = \Delta \hat{\boldsymbol {\omega }} + \hat{\boldsymbol {\omega }}_{i|i-1} \end{eqnarray*}$$

(66)

$$\begin{eqnarray*} \boldsymbol{\sf P}_{i|i} = (\boldsymbol{\sf I}_{6\times 6}-\boldsymbol{\sf K}\boldsymbol{\sf H})\boldsymbol{\sf P}_{i|i-1} \end{eqnarray*}$$

(67)

$$\begin{eqnarray*} L_i = -\frac{1}{2}(\boldsymbol {c}^T\boldsymbol{\sf S}^{-1}\boldsymbol {c} + \log |\boldsymbol{\sf S}|+2\log {2\pi}), \end{eqnarray*}$$

(68)

where the final line calculates the log-likelihood of the location of the detection given the location of the source on the sky and our prior estimate of the attitude. In practice, the rank deficiency of both Σ_i and |$\boldsymbol{\sf R}_i$| causes the inversion of |$\boldsymbol{\sf S}$| to be numerically unstable. We solve this by discarding the rows and columns corresponding to the x component of vectors and matrices in the body frame of Gaia, because all of the information gained during the measurement update is contained in the other two components. The redundancy between |$\hat{\boldsymbol {a}}$| and |$\hat{{\boldsymbol q}}(t)$| is removed during each measurement update in equation (65) by moving the mean rotation represented by a non-zero |$\hat{\boldsymbol {a}}$| to the mean attitude quaternion |$\hat{{\boldsymbol q}}$|. The total log-likelihood of the MEKF for a given set of values for the hyperparameters is simply the sum of the individual log-likelihoods L_i from each time-step.
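The measurement update can be sketched as a single function. The code below is an illustration under stated assumptions rather than the implementation used in this work: quaternions are stored as `[x, y, z, w]` with a Markley-style product, the predicted body-frame vector is taken to be the star's reference-frame vector rotated by the predicted attitude, the x component is dropped as described above, and all helper names and test values are hypothetical.

```python
import numpy as np


def cross_matrix(v):
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])


def quat_mul(p, q):
    """p ⊗ q for [x, y, z, w] quaternions (assumed Markley-style convention)."""
    vec = p[3] * q[:3] + q[3] * p[:3] - np.cross(p[:3], q[:3])
    return np.append(vec, p[3] * q[3] - p[:3] @ q[:3])


def attitude_matrix(q):
    v, w = q[:3], q[3]
    return ((w * w - v @ v) * np.eye(3)
            - 2.0 * w * cross_matrix(v) + 2.0 * np.outer(v, v))


def delta_q(a):
    return np.append(a, 2.0) / np.sqrt(4.0 + a @ a)


def measurement_update(q_pred, omega_pred, P_pred, x_ref, Sigma_ref, u_body, R_body):
    A = attitude_matrix(q_pred)                           # eq. (58)
    m = A @ x_ref                                         # predicted body-frame vector
    H = np.hstack([cross_matrix(m), np.zeros((3, 3))])    # eq. (60)
    S = H @ P_pred @ H.T + A @ Sigma_ref @ A.T + R_body   # eq. (61)
    c = u_body - m                                        # eq. (63)
    keep = [1, 2]                                         # discard the x component
    H2, S2, c2 = H[keep], S[np.ix_(keep, keep)], c[keep]
    K = P_pred @ H2.T @ np.linalg.inv(S2)                 # eq. (62)
    correction = K @ c2                                   # eq. (64)
    q_post = quat_mul(delta_q(correction[:3]), q_pred)    # eq. (65)
    omega_post = omega_pred + correction[3:]              # eq. (66)
    P_post = (np.eye(6) - K @ H2) @ P_pred                # eq. (67)
    _, logdet = np.linalg.slogdet(S2)
    loglike = -0.5 * (c2 @ np.linalg.solve(S2, c2)
                      + logdet + 2.0 * np.log(2.0 * np.pi))  # eq. (68)
    return q_post, omega_post, P_post, loglike


# Synthetic check: the true attitude differs from the prediction by a_true.
q_pred = np.array([0.0, 0.0, 0.0, 1.0])
omega_pred = np.zeros(3)
P_pred = np.diag([1e-6] * 3 + [1e-12] * 3)
x_ref = np.array([0.6, 0.8, 0.0])
a_true = np.array([2e-5, -1e-5, 3e-5])
u_body = attitude_matrix(quat_mul(delta_q(a_true), q_pred)) @ x_ref
q_post, omega_post, P_post, loglike = measurement_update(
    q_pred, omega_pred, P_pred, x_ref, np.zeros((3, 3)), u_body, 1e-14 * np.eye(3))
```

A single detection constrains only two of the three attitude degrees of freedom, so the component of the attitude error about the line of sight is left unchanged by this update.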

The forward algorithm described above is sufficient to calculate the log-likelihood and to predict the attitude and angular velocity of Gaia at each time given all prior measurements, but it is possible to obtain a more precise estimate that additionally incorporates the measurements at later times through an altered version of the Rauch–Tung–Striebel smoothing algorithm (Rauch et al. 1965) discussed in Section 2.3. The required alterations were given in Kubelka, Reinstein & Svoboda (2016):

$$\begin{eqnarray*} \boldsymbol{\sf C}_i = \boldsymbol{\sf P}_{i|i}\Phi _{i+1|i}^T\boldsymbol{\sf P}_{i+1|i}^{-1} \end{eqnarray*}$$

(69)

$$\begin{eqnarray*} {\begin{bmatrix}\hat{\boldsymbol {a}} \\ \Delta \hat{\boldsymbol {\omega }} \end{bmatrix}} = \boldsymbol{\sf C}_i{\begin{bmatrix}\boldsymbol{\delta q}^{-1}({\boldsymbol q}_{i+1|n}\otimes {\boldsymbol q}_{i+1|i}^{-1})\\ \hat{\boldsymbol {\omega }}_{i+1|n}-\hat{\boldsymbol {\omega }}_{i+1|i}\end{bmatrix}} \end{eqnarray*}$$

(70)

$$\begin{eqnarray*} {\boldsymbol q}_{i|n} = \boldsymbol{\delta q}(\hat{\boldsymbol {a}})\otimes {\boldsymbol q}_{i|i} \end{eqnarray*}$$

(71)

$$\begin{eqnarray*} \hat{\boldsymbol {\omega }}_{i|n} = \Delta \hat{\boldsymbol {\omega }} + \hat{\boldsymbol {\omega }}_{i|i} \end{eqnarray*}$$

(72)

$$\begin{eqnarray*} \boldsymbol{\sf P}_{i|n} = \boldsymbol{\sf P}_{i|i} + \boldsymbol{\sf C}_i (\boldsymbol{\sf P}_{i+1|n} - \boldsymbol{\sf P}_{i+1|i})\boldsymbol{\sf C}_i^T, \end{eqnarray*}$$

(73)

where the function |$\boldsymbol{\delta q}^{-1}$| is the inverse of the Gibbs map in equation (39). Of the results in this subsection, only the generalization of the measurement update to incorporate uncertainty in the known unit vector in the reference frame is an original derivation of this work. The remaining results are a hybrid of Markley (2003), Burton et al. (2017), and Kubelka et al. (2016).
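A single backward step of the smoother in equations (69)–(73) might look as follows. This is a hedged sketch: the quaternion layout `[x, y, z, w]`, the product convention, and every function and variable name are assumptions made for illustration, and the filtered and predicted quantities would in practice be stored during the forward pass.

```python
import numpy as np


def quat_mul(p, q):
    """p ⊗ q for [x, y, z, w] quaternions (assumed Markley-style convention)."""
    vec = p[3] * q[:3] + q[3] * p[:3] - np.cross(p[:3], q[:3])
    return np.append(vec, p[3] * q[3] - p[:3] @ q[:3])


def quat_inverse(q):
    """Inverse of a unit quaternion (its conjugate)."""
    return np.append(-q[:3], q[3])


def delta_q(a):
    return np.append(a, 2.0) / np.sqrt(4.0 + a @ a)


def delta_q_inverse(q):
    return 2.0 * q[:3] / q[3]


def smoother_step(q_filt, w_filt, P_filt, q_pred, w_pred, P_pred, Phi,
                  q_smooth_next, w_smooth_next, P_smooth_next):
    """Combine the filtered state at step i with the smoothed state at i+1."""
    C = P_filt @ Phi.T @ np.linalg.inv(P_pred)                      # eq. (69)
    rotation = quat_mul(q_smooth_next, quat_inverse(q_pred))
    innovation = np.concatenate([delta_q_inverse(rotation),
                                 w_smooth_next - w_pred])
    correction = C @ innovation                                     # eq. (70)
    q_smooth = quat_mul(delta_q(correction[:3]), q_filt)            # eq. (71)
    w_smooth = w_filt + correction[3:]                              # eq. (72)
    P_smooth = P_filt + C @ (P_smooth_next - P_pred) @ C.T          # eq. (73)
    return q_smooth, w_smooth, P_smooth


# Sanity check: if the later data add nothing (smoothed == predicted at i+1),
# the smoothed state at step i equals the filtered state.
dt = 10.0
Phi = np.eye(6)
Phi[:3, 3:] = dt * np.eye(3)
q_filt = np.array([0.1, 0.2, -0.3, 0.9])
q_filt = q_filt / np.linalg.norm(q_filt)
w_filt = np.array([0.0, 0.0, 2.9e-4])
P_filt = np.diag([1e-8] * 3 + [1e-12] * 3)
P_pred = Phi @ P_filt @ Phi.T + np.diag([0.0] * 3 + [1e-14] * 3)
q_pred = np.array([0.1, 0.2, -0.3, 0.92])
q_pred = q_pred / np.linalg.norm(q_pred)
q_s, w_s, P_s = smoother_step(q_filt, w_filt, P_filt, q_pred, w_filt, P_pred,
                              Phi, q_pred, w_filt, P_pred)
```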

3.4 Application of the MEKF

Our model of the attitude of Gaia has four free parameters: the variance |$\sigma _{\mathrm{MEKF}}^2$| of the random angular accelerations experienced by Gaia, the excess spread in the locations of the star on the focal plane |$\sigma _p^2$|, and the corrections δ_c and δ_p to the across-scan widths of the CCDs and pixels, respectively. If we propose a set of values for these parameters and provide a set of detections of Gaia sources then the MEKF algorithm described above returns a log-likelihood, and so we are able to carry out maximum-likelihood estimation of the values of those parameters. We opt to split the 22 months of Gaia DR2 into day-long chunks and perform maximum-likelihood optimization separately in each day. We do this to make our estimation more robust to extreme adverse events; if Gaia’s attitude rapidly changed in one short period then that could cause the value of |$\sigma _{\mathrm{MEKF}}^2$| to be biased high. The MEKF is a recursive algorithm and so cannot be naively parallelized, so splitting the optimization into chunks also makes it more computationally tractable by allowing it to be spread across multiple CPUs. We performed the optimization using the scipy implementation of the Nelder–Mead algorithm (Gao & Han 2012). The initial attitude and angular velocity vector were taken from the last time point of the nominal scanning law prior to the first data point in that day. We note that the nominal scanning law as provided by DPAC has a discontinuity on 2014 September 25 (OBMT = 1326.7 rev), which corresponds to the transition from the original nominal scanning law to one optimized for the ‘GAia Relativistic Experiment on Quadrupole’ (de Bruijne et al. 2010). This transition created a discontinuity in the phase of both the precession and spin. 
Whilst in reality Gaia would smoothly transition to the updated scanning law, the nominal scanning file has an abrupt transition with the attitude changing by more than 100° between two consecutive 10 s time-steps. This transition occurs during the data-taking gap associated with the first decontamination of Gaia and so is not constrained by the variable star detections. We reset the MEKF at that time point by resetting the attitude and angular velocity to that given by the nominal scanning law after the transition.

In Fig. 1, we show a corner plot of the maximum-likelihood values of (σ_MEKF, σ_p, δ_c, δ_p) from each of the 667 d. We estimate the values of our parameters from the (16, 50, 84) per cent percentiles of this forest of maximum-likelihood values to be |$\sigma _{\mathrm{MEKF}} = 1.28_{-0.08}^{+0.06}\,\,\mu \mathrm{as}\,\,\mathrm{s}^{-2}$|⁠, |$\sigma _{p} = 56.04_{-3.10}^{+5.76}\,\,\mathrm{mas}$|⁠, |$\delta _{c} = 31.61_{-3.33}^{+6.13}\,\,\mathrm{mas}$|⁠, and |$\delta _{p} = -43.70_{-3.23}^{+1.80}\,\,\mathrm{mas}$|⁠. The values of the latter three of these parameters (shown as blue lines in Fig. 1) are in excellent agreement with our findings in Paper I, giving us faith in the results of these two independent methodologies, though the value of σ_p is slightly smaller due to us mistakenly ignoring aberration in the previous work. We verified that if we neglected aberration in this work then we recovered the larger value of σ_p.
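The chunked estimation strategy can be illustrated with a toy problem. In the sketch below a one-parameter Gaussian likelihood stands in for the summed MEKF log-likelihood (an assumption made purely so that the example is self-contained); each synthetic 'day' is optimized independently with the scipy Nelder–Mead implementation and the per-day estimates are then summarized by their (16, 50, 84) per cent percentiles.

```python
import numpy as np
from scipy.optimize import minimize


def negative_log_likelihood(params, residuals):
    """Stand-in for the MEKF: zero-mean Gaussian residuals with unknown scale."""
    log_sigma = params[0]
    sigma = np.exp(log_sigma)
    return 0.5 * np.sum((residuals / sigma) ** 2) + residuals.size * log_sigma


def fit_chunk(residuals):
    """Maximum-likelihood scale for one chunk, optimized in log(sigma)."""
    result = minimize(negative_log_likelihood, x0=[0.0], args=(residuals,),
                      method="Nelder-Mead")
    return np.exp(result.x[0])


# Synthetic data: 20 independent 'days' with the same true scale.
rng = np.random.default_rng(0)
true_sigma = 2.0
chunks = [rng.normal(0.0, true_sigma, size=500) for _ in range(20)]

# Chunks are independent, so this loop could be spread across several CPUs.
daily_sigma = np.array([fit_chunk(chunk) for chunk in chunks])

lower, median, upper = np.percentile(daily_sigma, [16, 50, 84])
print(f"sigma = {median:.2f} (+{upper - median:.2f} / -{median - lower:.2f})")
```

Summarizing the per-chunk estimates with percentiles keeps a single anomalous chunk from dominating the quoted value.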

Figure 1.

Corner plot of the hyperparameters of our Gaia attitude model, determining the random angular accelerations experienced by Gaia (σ_MEKF), the excess error in the location of the stars (σ_p), and the across-scan corrections to the CCD δ_c and pixel δ_p widths. The blue lines indicate the values that the latter three of these hyperparameters took in our much simpler model of Gaia’s attitude in Paper I.

Adopting the median values of the parameters as our fiducial values, we inserted the nominal scanning law time-points into the MEKF as time points without a measurement update step and calculated the means and covariances of the attitude and angular velocity at those time-steps, employing the MEKF with the additional backwards smoothing step. In Figs 2 and 3, we show the across- and along-scan distance between the locations of the two fields of view in our corrected scanning law and their locations in the nominal scanning law, in the frame of the nominal scanning law. The across-scan offsets of the two fields of view have medians (12.9 and −11.4 arcsec, respectively) that are significantly different from zero, indicating either a long-term difference between the two scanning laws or that our value of Δ_f is too small by approximately 12 arcsec. Fig. 2 differs significantly from the equivalent figure of Paper I, because in that previous work we failed to include relativistic aberration when deriving the scanning law, causing our model in that paper to fit the aberration and thus resulting in periodic structure on the 63-d precession period. There is, however, periodic behaviour in the across-scan offsets on the 6 h spin period of Gaia. These are projections of small differences in Gaia’s angular velocity between the two scanning laws that cause the angular offset to grow and shrink as Gaia rotates.³ The along-scan offsets of the two fields of view are identical, which is a geometric constraint imposed by our model. Fig. 3 shows an offset that changes linearly with time, except for the period shortly after OBMT = 300 d where Gaia transitions from the Ecliptic Pole Scanning Law to the Nominal Scanning Law (the nominal scanning law published by DPAC has a discontinuity at this transition). 
During periods where none of the variable stars have published detections our scanning law deviates significantly from the nominal scanning law, because we have not directly modelled the precession of Gaia’s spin axis. However, this is not a significant issue, because detections taken during these periods will not have contributed to the Gaia DR2 data products. We will make our scanning law publicly available upon acceptance of this manuscript.

Figure 2.

The across-scan distance between the locations of the two fields of view in our corrected scanning law and their locations in the nominal scanning law, in the frame of the nominal scanning law. This figure is comparable to fig. 3 of Paper I. The light blue bars in each panel indicate the gaps derived in Section 4, during which there are no published detections of variable stars.

Figure 3.

The along-scan distance between the locations of the two fields of view in our corrected scanning law and their locations in the nominal scanning law, in the frame of the nominal scanning law. The preceding and following fields of view have identical along-scan offsets.

4 EFFICIENCY OF OBTAINING USEFUL DETECTIONS FROM OBSERVATIONS

If we had perfect knowledge of Gaia’s scanning law then we could predict the number of times that any star would have transited across either of the two fields of view, which we will term to be a predicted observation. Not every observation of a star results in a measurement in the Gaia DR2 variable star epoch table, which we will term to be a published detection. There are several reasons why a predicted observation might not result in a published detection:

(i) Gaia experienced events which resulted in some detections not being of sufficient quality for publication in DR2 (e.g. the decontamination procedures or micro-meteoroid impacts, see Gaia Collaboration 2016). We term these periods to be gaps because no star of any magnitude has any published detections from during these periods. Part of the data taken during DR2 gaps may appear in subsequent DR.
(ii) An observation only results in a published detection with some probability which depends on the properties of the source. This may be because the occurrence of a detection is probabilistic (for instance, at the faint magnitude limit photon shot noise will result in stars being detected on only a fraction of their observations) or because of quality cuts made by the Gaia DPAC.
(iii) Gaia has a limited capacity to assign windows to stars and store and downlink the resulting scientific data, and so if densely populated regions are being scanned then only some fraction of the predicted observations will result in data that makes it to the ground. The retention behaviour is decided by magnitude bin (see Table 1), with some bins being prioritized to ensure good calibration across the entire magnitude range.

Table 1.

The magnitude bins used on-board by Gaia to determine priority for deletion if the downlink bandwidth is exhausted and which we used to bin the epoch measurements in this work. The first three columns of this table are a reproduction of table 1.10, section 1.3.3 of de Bruijne et al. (2018).

Packet	Magnitude range	Deletion (per cent)	l (d)	ε (d⁻¹)
SP1-1	(5.00,13.00)	0	150.49	1.53
SP1-2	(13.00,16.00)	0	1.57	1.58
SP1-3	(16.00,16.30)	1	9.65	1.48
SP1-4	(16.30,17.00)	1	1.78	1.52
SP1-5	(17.00,17.20)	2	7.01	1.52
SP1-6	(17.20,18.00)	2	2.15	1.52
SP1-7	(18.00,18.10)	2	11.11	1.51
SP1-8	(18.10,19.00)	2	2.01	1.44
SP1-9	(19.00,19.05)	2	7.02	1.30
SP1-10	(19.05,19.95)	7	0.28	1.21
SP1-11	(19.95,20.00)	2	7.87	1.00
SP1-12	(20.00,20.30)	13	1.39	0.99
SP1-13	(20.30,20.40)	12	2.32	1.08
SP1-14	(20.40,20.50)	13	2.21	1.07
SP1-15	(20.50,20.60)	28	2.51	1.02
SP1-16	(20.60,20.70)	28	3.01	0.97
SP1-17	(20.70,20.80)	28	8.64	0.84
SP1-18	(20.80,20.90)	28	19.67	0.73
SP1-19	(20.90,21.00)	24	15.10	0.63

The objective of this section is to rigorously identify the gaps in Gaia data taking by exploiting the DR2 variable star epoch photometry. We attempted this in Paper I by looking for periods longer than 1 per cent of a day where none of the variable stars had a published detection, but there were several possible issues with this approach. First, this approach cannot distinguish between a period with no published detections and a period where no variable stars were observed because Gaia was scanning a sparse region of the sky. Second, our choice of 1 per cent of a day was an attempt to mitigate the first problem and it is likely that there are gaps shorter than this. Third, gaps are not the only reason that an observation might not result in a detection, as mentioned above.

Our novel methodology in this work is based on the idea that a run of non-detections could be due to a gap or it could be due to those stars having a low probability that an observation would result in a published detection. By pairing the published detections with the predicted observations that failed to result in published detections and weighing the possibility of a gap against the possibility of low detection efficiencies, we are able to search for gaps in a way that is robust to the density of variable stars on the sky and to the low detection efficiency of faint stars. We model the probability of a source with magnitude G having a published detection at the predicted observation time t as A(t)B(G, t), where A(t) takes binary values and represents gaps that affect all magnitudes equally while B(G, t) can take any value in (0,1) and represents any other effects that can cause an observation to not yield a published detection. We assume that the magnitude dependency of B(G, t) is piecewise constant such that at time t it takes a different value in each of the star packet magnitude intervals (see Table 1), because one of the most important drivers of a time-varying detection probability is crowding and the impact of crowding changes between each star packet magnitude interval.

In Section 4.1, we discuss the preparation of the data and the prediction of the times of observation. We describe our discrete hidden Markov model for A(t) in Section 4.2 and our extended Kalman filter model for B(G, t) in Section 4.3. The full Bayesian specification of this problem would require us to have a posterior over millions of parameters (one discrete and one continuous hidden state at each epoch measurement in addition to the tens of hyperparameters) and so we opt for an iterative maximum-likelihood approach which we describe in Section 4.4.

4.1 Predicting observations and data cleaning

We use the 550 737 variable stars in the Gaia DR2 epoch photometry tables. We extract the times of observation at Gaia from the transit_id and approximate the error in the epoch G magnitude from the reported epoch flux and flux error. For each source in the epoch photometry table we then predict when it would have been observed during the 22 months of Gaia DR2, following the methodology laid out in Papers I and II and using the scanning law derived in Section 3. The resulting 19 685 650 predicted observations were then matched to the published detections with a window of 1 per cent of a day, with the predicted observations being assigned a value of one if they are matched with a published detection and zero otherwise.
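The matching step can be sketched as follows. This is a minimal illustration, not the pipeline used in this work: the function name `match_predictions` is hypothetical, times are assumed to be in days, and the window is assumed to be a symmetric tolerance of 0.01 d around each predicted observation.

```python
import numpy as np

def match_predictions(t_pred, t_det, window=0.01):
    """Return 1 for each predicted time with a detection within `window` days, else 0.

    ASSUMPTION: the 1 per cent of a day window is treated as |t_pred - t_det| <= 0.01 d.
    """
    t_pred = np.asarray(t_pred, dtype=float)
    t_det = np.sort(np.asarray(t_det, dtype=float))
    if t_det.size == 0:
        return np.zeros(t_pred.shape, dtype=int)
    # Position of the first detection at or after each predicted time.
    right = np.searchsorted(t_det, t_pred)
    left = np.clip(right - 1, 0, t_det.size - 1)
    right = np.clip(right, 0, t_det.size - 1)
    # Distance to the nearest detection on either side.
    nearest = np.minimum(np.abs(t_pred - t_det[left]), np.abs(t_det[right] - t_pred))
    return (nearest <= window).astype(int)

# Illustrative times only: two pairs of predicted observations, one detection in each pair.
t_pred = [100.000, 100.074, 163.250, 163.324]
t_det = [100.001, 163.322]
flags = match_predictions(t_pred, t_det)
```

The flags produced here play the role of the one/zero values assigned to the predicted observations in the text.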

Our methodology is magnitude dependent and so we must assign a magnitude to each predicted observation. In the case of those with matching published detections, we take the magnitude estimate and error reported at that epoch, but for the non-detections we must infer the magnitude at those epochs. The naïve approach would be to assign the mean G-band magnitude from the Gaia source table to be the magnitude at each non-detection, but that ignores that these stars have been identified to be variable stars and thus the magnitude will significantly change between epochs. The best way to proceed would be to have theoretical models that predict the light curves of different types of variable star objects, fit those model light curves to the published detections in a Bayesian way, and then marginalize over those fits to obtain a predicted measurement at each of the predicted observations without a published detection. However, this is far beyond the scope of this work and would likely only make a marginal difference to our results. We will be combining the measurements from many stars into magnitude bins and thus a few observations crossing the border into an adjacent magnitude bin should not overly bias our results. We therefore decided to infer these values using Gaussian process regression, to which we gave an introduction in Section 2.1. The reader should view our use of Gaussian process regression in this instance as a mathematically convenient curve-fitting scheme that can account for uncertainty in the individual magnitude measurements.

We assume that the time evolution of the magnitude of each star can be modelled as the realization of a Gaussian process. We have measurements of the magnitude G(t_i) and magnitude variance σ²(t_i) at times $\{t_i\}_{i=1}^{n}$ and wish to predict the magnitude at any time t_*. We assume that the mean is constant, and that the covariance kernel is given by equation (3). We fix the mean and signal variance according to equations (7) and (8). Though the actual period distribution of the published variable stars ranges from hours to hundreds of days, we fix the length scale to l = 1 d as a very rough ‘typical’ value. This has the consequence that a non-detection occurring soon before or soon after a detection will be assigned a magnitude close to that of the detection, whereas a non-detection occurring long before or long after a detection will be assigned a magnitude close to the sample mean. The scanning law causes most stars observed in the preceding field-of-view to be observed 2 h later by the following field of view and thus observations come in pairs. Our inference scheme is therefore most effective in the fairly common case that one of these observations did not result in a detection. We give examples of the use of Gaussian process regression to infer the magnitudes of non-detections in Fig. 4.
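The behaviour described above (imputed magnitudes close to a nearby detection, and close to the sample mean far from any detection) can be reproduced with a short sketch. Equations (3), (7), and (8) are not restated in this section, so the code below makes loudly labelled assumptions: a squared-exponential kernel stands in for equation (3), and an inverse-variance weighted mean and weighted scatter stand in for equations (7) and (8). The function name `gp_impute` and the example values are hypothetical.

```python
import numpy as np

def gp_impute(t_obs, g_obs, var_obs, t_new, length=1.0):
    """Predict magnitudes at t_new from noisy detections with a fixed length scale (days)."""
    t_obs, g_obs, var_obs, t_new = (
        np.asarray(a, dtype=float) for a in (t_obs, g_obs, var_obs, t_new)
    )
    # ASSUMPTION: weighted mean and weighted scatter stand in for equations (7) and (8).
    w = 1.0 / var_obs
    mean = np.sum(w * g_obs) / np.sum(w)
    signal_var = np.sum(w * (g_obs - mean) ** 2) / np.sum(w)

    # ASSUMPTION: squared-exponential kernel stands in for equation (3).
    def kernel(a, b):
        return signal_var * np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)

    K = kernel(t_obs, t_obs) + np.diag(var_obs)   # noisy covariance of the detections
    K_star = kernel(t_new, t_obs)                 # covariance between new and observed times
    pred_mean = mean + K_star @ np.linalg.solve(K, g_obs - mean)
    pred_var = signal_var - np.einsum("ij,ji->i", K_star, np.linalg.solve(K, K_star.T))
    return pred_mean, pred_var

# Illustrative values: two pairs of detections and two non-detections to fill in.
t_obs = [10.0, 10.074, 40.0, 40.074]
g_obs = [15.00, 15.02, 15.60, 15.58]
var_obs = [1e-4, 1e-4, 1e-4, 1e-4]
pred_mean, pred_var = gp_impute(t_obs, g_obs, var_obs, t_new=[10.03, 25.0])
```

In this example the first non-detection sits between two detections and inherits their magnitude, while the second lies many length scales from any detection and reverts to the mean with the full signal variance.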

Figure 4.

Examples of the results from our use of a Gaussian Process to impute the magnitudes of the variable stars at the epochs at which they were predicted to be observed but that did not result in a published detection in the Gaia DR2 epoch photometry tables. The solid and dashed lines indicate the weighted mean and 1σ regions of the published detections of each star.

We applied this formalism to all 550 737 of the stars with epoch photometry. We then used the magnitude at each predicted observation to group them across all stars by magnitude into the bins given in Table 1, and further sorted these observations by time. If there is a matching published detection then the magnitude used for the grouping is the one measured by Gaia, otherwise it is the magnitude infilled by our Gaussian Process as described above.
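The grouping and sorting step might look like the sketch below. The bin edges are read off Table 1; treating each interval as half-open (lower edge included, upper edge excluded) and silently dropping magnitudes outside 5 to 21 are assumptions made for illustration, and `group_by_bin` is a hypothetical name.

```python
import numpy as np

# Edges of the 19 star-packet magnitude bins listed in Table 1.
BIN_EDGES = np.array([5.00, 13.00, 16.00, 16.30, 17.00, 17.20, 18.00, 18.10, 19.00, 19.05,
                      19.95, 20.00, 20.30, 20.40, 20.50, 20.60, 20.70, 20.80, 20.90, 21.00])

def group_by_bin(times, mags, flags):
    """Split predicted observations into the Table 1 bins and sort each bin by time."""
    times, mags, flags = map(np.asarray, (times, mags, flags))
    # ASSUMPTION: intervals are half-open, so digitize gives 0..18 for in-range magnitudes.
    idx = np.digitize(mags, BIN_EDGES) - 1
    series = {}
    for i in range(len(BIN_EDGES) - 1):
        sel = idx == i
        order = np.argsort(times[sel], kind="stable")
        # Key i + 1 matches the packet label SP1-(i + 1).
        series[i + 1] = (times[sel][order], flags[sel][order])
    return series

# Illustrative input: three predicted observations pooled from different stars.
series = group_by_bin(times=[3.0, 1.0, 2.0], mags=[16.1, 16.2, 20.95], flags=[1, 0, 1])
```

Each entry of `series` is then one time-ordered sequence of detection flags for a single magnitude bin.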

In summary, we have magnitude bins i = 1, …, 19 each with a time-series $t_j^i$ of predicted observations j = 1, …, Mⁱ each with a flag $k_j^i$ which indicates whether there is a matching published detection ($k_j^i=1$) or not ($k_j^i=0$).

4.2 Hidden Markov Model for gaps

We will model the occurrence of gaps with a Hidden Markov Model, which we introduced in Section 2.2. In this application the hidden state X_i at time t_i indicates whether a gap is occurring (X_i = 0) or not (X_i = 1), and the observation Y_i reports whether the predicted observation of a variable star at time t_i has yielded a published detection (Y_i = 1) or not (Y_i = 0). We will assume that no detections are possible during gaps

$$\begin{eqnarray*} P(Y_i=1|X_i=0) \equiv 0 \Rightarrow P(Y_i=0|X_i=0) \equiv 1, \end{eqnarray*}$$

(74)

and that at each step there is some known probability p_i that a published detection occurs if we are outside of a gap

$$\begin{eqnarray*} P(Y_i=1|X_i=1) \equiv p_i \Rightarrow P(Y_i=0|X_i=1) \equiv 1-p_i. \end{eqnarray*}$$

(75)

The value of p_i will depend on the magnitude of the star being observed at time t_i and will be an output of the model in the following section.

The transition between states will be governed by the probability q_i that a gap will begin at step i and the probability r_i that a gap will end at step i,

$$\begin{eqnarray*} P(X_i=0|X_{i-1}=1) \equiv q_i \Rightarrow P(X_i=1|X_{i-1}=1) \equiv 1-q_i, \end{eqnarray*}$$

(76)

$$\begin{eqnarray*} P(X_i=1|X_{i-1}=0) \equiv r_i \Rightarrow P(X_i=0|X_{i-1}=0) \equiv 1-r_i. \end{eqnarray*}$$

(77)

The predicted observation times are not uniformly spaced and so we assume that

$$\begin{eqnarray*} q_i &=& \frac{1}{2}-\frac{1}{2}\exp {\left(-\frac{t_i-t_{i-1}}{q}\right)}, \nonumber \\ r_i &=& \frac{1}{2}-\frac{1}{2}\exp {\left(-\frac{t_i-t_{i-1}}{r}\right)}, \end{eqnarray*}$$

(78)

where q and r are the time-scales over which non-gaps and gaps persist. These expressions have the properties that as t_i − t_{i − 1} → 0 the probability of the state changing tends to zero, and that if t_i is a long time after t_{i − 1} then the two states are entirely uncorrelated (X_i is equally likely to be zero or one regardless of the state X_{i − 1}).
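Equations (74) to (78) are enough to write down a forward filter for the gap state. The sketch below is an illustration rather than the inference used in this work: it returns only the filtered probability of being outside a gap, the even prior on the first state is an assumption, and the function names are hypothetical. The detection probabilities p_i are taken as given and assumed to lie strictly between zero and one.

```python
import numpy as np

def switch_prob(dt, scale):
    """Equation (78): probability that the state changes over a time step dt."""
    return 0.5 - 0.5 * np.exp(-dt / scale)

def gap_forward_filter(t, y, p, q, r, prior_nongap=0.5):
    """Return P(X_i = 1 | Y_1, ..., Y_i), the filtered probability of not being in a gap."""
    t, y, p = (np.asarray(a, dtype=float) for a in (t, y, p))
    post = np.empty(len(t))
    # Belief vector ordered as [gap (X=0), non-gap (X=1)]; the prior is an ASSUMPTION.
    belief = np.array([1.0 - prior_nongap, prior_nongap])
    for i in range(len(t)):
        if i > 0:
            dt = t[i] - t[i - 1]
            qi, ri = switch_prob(dt, q), switch_prob(dt, r)
            # Rows are the previous state, columns the next state (equations 76 and 77).
            transition = np.array([[1.0 - ri, ri], [qi, 1.0 - qi]])
            belief = belief @ transition
        # Emission probabilities from equations (74) and (75).
        like = np.array([0.0, p[i]]) if y[i] == 1 else np.array([1.0, 1.0 - p[i]])
        belief = belief * like
        belief /= belief.sum()
        post[i] = belief[1]
    return post

# Illustrative run: detections, then ten consecutive misses of bright stars, then detections.
t = 0.001 * np.arange(20)
y = np.array([1] * 5 + [0] * 10 + [1] * 5)
post = gap_forward_filter(t, y, p=np.full(20, 0.95), q=1.216, r=0.043)
```

A single detection forces the state out of a gap because of equation (74), whereas a run of misses from sources with high p_i steadily shifts the belief towards a gap.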

4.3 Continuous state space modelling

We will model the probability of an observation at time t_i resulting in a published detection if a gap is not occurring with an extended Kalman filter in each of the magnitude bins. We have observations y_i at predicted observation times t_i at which a star has a published detection (y_i = 1) or not (y_i = 0). We assume that the probability of a published detection is given by

$$\begin{eqnarray*} \operatorname{P}(y_i|x_i)=\left\lbrace \begin{array}{@{}l@{\quad }l@{}}\Phi (x_i) &\text{if $y_i = 1$,}\\ 1-\Phi (x_i) &\text{if $y_i = 0$,} \end{array}\right. \end{eqnarray*}$$

(79)

where $x_i\in\mathbb{R}$ is the hidden state and Φ(x_i) maps x_i to (0,1). Throughout this section, we will use the notation ϕ(x|μ, σ²) and Φ(x|μ, σ²) to denote the PDF and cumulative distribution function of a normally distributed random variable with mean μ and variance σ², where if the givens are omitted – such as in equation (79) – we are referring to the standard normal distribution (μ = 0, σ² = 1). For convenience, we give the explicit forms of these functions,

$$\begin{eqnarray*} \phi (x|\mu ,\sigma ^2) = \frac{1}{\sqrt{2\pi \sigma ^2}}e^{-\frac{(x-\mu )^2}{2\sigma ^2}}, \end{eqnarray*}$$

(80)

$$\begin{eqnarray*} \Phi (x|\mu ,\sigma ^2) = \frac{1}{2}\left(1+\operatorname{erf}\left(\frac{x-\mu }{\sqrt{2\sigma ^2}}\right)\right). \end{eqnarray*}$$

(81)

The use of Φ(x) in equation (79) is solely to allow us to work with state space variables in (− ∞, +∞) and then map them to (0,1), and the choice of this function over other functions that can carry out this mapping (such as a sigmoid function) was made to simplify integrals later in this section.

We assume that the state variables X_i are related to each other through a variation on an extended Kalman filter

$$\begin{eqnarray*} x_{i} = F_ix_{i-1} + w_i, \end{eqnarray*}$$

(82)

$$\begin{eqnarray*} F_i = e^{-\tau _i/l}, \end{eqnarray*}$$

(83)

$$\begin{eqnarray*} w_i \sim \operatorname{Normal} (0,(1-e^{-2\tau _i/l})\varepsilon ^2), \end{eqnarray*}$$

(84)

$$\begin{eqnarray*} \tau _i = t_i-t_{i-1}, \end{eqnarray*}$$

(85)

where l and ε² are the length-scale and variance of the process. These equations can be equivalently expressed through the PDF of the next state conditioned on the previous state,

$$\begin{eqnarray*} \operatorname{P}(x_{i}|x_{i-1}) = \phi (x_{i} \vert e^{-\tau _i/l}x_{i-1},(1-e^{-2\tau _i/l})\varepsilon ^2). \end{eqnarray*}$$

(86)

We note that this process is mean reverting: as the time between observations grows large the conditional expectation of the next state tends to zero.
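A short worked step shows why ε² can be read as the variance of the process. Assuming that x_{i − 1} has variance ε² and that w_i is independent of x_{i − 1}, equations (82)–(84) give

$$\begin{eqnarray*} \operatorname{Var}(x_i) &=& F_i^2\operatorname{Var}(x_{i-1}) + \operatorname{Var}(w_i) \nonumber \\ &=& e^{-2\tau _i/l}\varepsilon ^2 + (1-e^{-2\tau _i/l})\varepsilon ^2 = \varepsilon ^2, \end{eqnarray*}$$

so the variance is preserved for any spacing τ_i, while the conditional mean e^{−τ_i/l}x_{i − 1} decays towards zero on the time-scale l.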

Throughout this section, we will encounter expressions of the form f(x) = Φ(x)ϕ(x|μ, σ²) and will need to approximate f(x) ≈ Aϕ(x|m, s²). Denote the jth moment of f(x) about zero by $J_j=\int_{-\infty}^{+\infty}x^jf(x)\,\mathrm{d}x$, then we set A = J₀ to ensure the integral of f(x) is preserved and further choose m = J₁/J₀ and s² = J₂/J₀ − m², that is, we set the mean and variance of the approximation to be the mean and variance of the normalized f(x). The moments are given by the expressions

$$\begin{eqnarray*} J_0 = \frac{1}{2}\left(1+\operatorname{erf}\left(\frac{\mu }{\sqrt{2}\sqrt{1+\sigma ^2}}\right)\right), \end{eqnarray*}$$

(87)

$$\begin{eqnarray*} J_1 = \mu J_0 + \sigma ^2\phi (0|\mu ,1+\sigma ^2), \end{eqnarray*}$$

(88)

$$\begin{eqnarray*} J_2 = \sigma ^2 J_0 + \mu J_1 + \frac{\mu \sigma ^2}{1+\sigma ^2}\phi (0|\mu ,1+\sigma ^2). \end{eqnarray*}$$

(89)

For (μ, σ²) = (0, 1) the difference between the normalized cumulative density functions is less than 1 per cent everywhere, while for (μ, σ²) = (0, 0.1) this difference is less than 0.05 per cent, as shown in Fig. 5. For our purposes σ² is typically much less than unity and so this approximation is excellent.
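The closed-form moments in equations (87)–(89) can be checked against direct numerical integration. The sketch below uses only the standard library; the function names and the trapezoidal grid are illustrative choices, not part of this work.

```python
import math

def norm_pdf(x, mu=0.0, var=1.0):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def norm_cdf(x, mu=0.0, var=1.0):
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0 * var)))

def moment_match(mu, var):
    """Return (A, m, s2) such that Phi(x) phi(x|mu, var) is approximated by A phi(x|m, s2)."""
    j0 = norm_cdf(mu, 0.0, 1.0 + var)                        # equation (87)
    d = norm_pdf(0.0, mu, 1.0 + var)
    j1 = mu * j0 + var * d                                   # equation (88)
    j2 = var * j0 + mu * j1 + mu * var / (1.0 + var) * d     # equation (89)
    m = j1 / j0
    return j0, m, j2 / j0 - m * m

def quadrature_moments(mu, var, n=4001, half_width=12.0):
    """Trapezoidal estimate of J0, J1, J2 on a grid spanning mu +/- half_width sigma."""
    sd = math.sqrt(var)
    lo = mu - half_width * sd
    h = 2.0 * half_width * sd / (n - 1)
    j = [0.0, 0.0, 0.0]
    for k in range(n):
        x = lo + k * h
        weight = 0.5 if k in (0, n - 1) else 1.0
        f = norm_cdf(x) * norm_pdf(x, mu, var)
        for power in range(3):
            j[power] += weight * h * x ** power * f
    return j

A, m, s2 = moment_match(0.3, 0.5)
q0, q1, q2 = quadrature_moments(0.3, 0.5)
```

For (μ, σ²) = (0, 1) the normalized f(x) is the density of the larger of two independent standard normal variables, whose mean 1/√π and variance 1 − 1/π give an independent check on the formulae.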

Figure 5.

The maximum percentage difference between the cumulative densities of the function Φ(x)ϕ(x|μ, σ²) and the Gaussian approximation described in the text.

We need to obtain both the likelihood of the hyperparameters l and ε² and the best estimate of the states $\{x_i\}_{i=1}^{n}$ given the observations $\{y_i\}_{i=1}^{n}$. The reason our model cannot be expressed as a conventional extended Kalman filter is that our observations are not continuous and thus we cannot simply differentiate the observation model with respect to the state space parameters. We instead follow the formalism laid out in Fraser (2008, chap. 4).

Denote by μ_{i|j} and $\sigma_{i|j}^2$ the mean and variance of the estimate of the state at time t_i given all of the observations that occurred up to and including time t_j. We initialize the algorithm by specifying a weak prior on the first state P(x₁) = ϕ(x₁| − 3, 6). In the forward part of the algorithm we iteratively forecast the next state given all previous observations

$$\begin{eqnarray*} P(x_{i+1}|y_1,\dots ,y_i) = \int P(x_{i+1}|x_i)P(x_i|y_1,\dots ,y_i) \mathrm{d}x_i \end{eqnarray*}$$

(90)

$$\begin{eqnarray*} \qquad = \int \phi (x_{i+1}|e^{-\tau _i/l}x_i,(1-e^{-2\tau _i/l})\varepsilon ^2)\phi \left(x_i|\mu _{i|i},\sigma ^2_{i|i}\right) \mathrm{d}x_i \end{eqnarray*}$$

(91)

$$\begin{eqnarray*} \qquad = \phi \left(x_{i+1}|\mu _{i+1|i},\sigma ^2_{i+1|i}\right), \end{eqnarray*}$$

(92)

using the fact that the convolution of two normal distributions is another normal distribution, where

$$\begin{eqnarray*} \mu _{i+1|i} = \mu _{i|i}e^{-\tau _i/l} \end{eqnarray*}$$

(93)

$$\begin{eqnarray*} \sigma _{i+1|i}^2 = \sigma _{i|i}^2e^{-2\tau _i/l}+\varepsilon ^2(1-e^{-2\tau _i/l}). \end{eqnarray*}$$

(94)

We can then obtain the conditional state distribution and probability of the observation as

$$\begin{eqnarray*} &&{P(x_{i+1}|y_1,\dots ,y_{i+1})P(y_{i+1}|y_1,\dots ,y_i)} \nonumber \\ &&{\quad= P(y_{i+1}|x_{i+1})P(x_{i+1}|y_1,\dots ,y_i)} \nonumber \\ &&{\quad= \Phi (x_{i+1})\phi (x_{i+1}|\mu _{i+1|i},\sigma ^2_{i+1|i})} \nonumber \\ &&{\quad\approx L_{i+1|i}\phi (x_{i+1}|\mu _{i+1|i+1},\sigma ^2_{i+1|i+1})} \end{eqnarray*}$$

(95)

where we obtain L_{i + 1|i}, μ_{i + 1|i + 1} and $\sigma^2_{i+1|i+1}$ through the moment-matching approximation discussed above. Note that equation (95) is only true for the case that y_{i + 1} = 1. If y_{i + 1} = 0 then P(y_{i + 1}|x_{i + 1}) = 1 − Φ(x_{i + 1}) and the expressions above can be trivially adjusted. The total likelihood L is then given by

$$\begin{eqnarray*} L = L_1\prod _{i=1}^{n-1}L_{i+1|i} \end{eqnarray*}$$

(96)

where L₁ can be computed by replacing P(x_{i + 1}|x_i) by the prior P(x₁) in the equations above. Having approximated all of the conditional probability distributions in the forward algorithm as normal distributions, the backwards smoothing filter described in Section 2.3 can be applied to compute μ_{i|n} and $\sigma_{i|n}^2$, noting F_i = exp (− τ_i/l).

If there is known to be a gap at time t_i, then we adjust the expressions above by stating P(y_i|x_i) = 1.
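The forward pass of this section can be assembled into a compact sketch. It follows equations (87)–(89) and (93)–(96) together with the prior ϕ(x₁| − 3, 6), but the function names, the `in_gap` argument, and the numerical values of l and ε² in the example are illustrative assumptions. The y = 0 case uses 1 − Φ(x), so its moments are the moments of the Gaussian forecast minus those of Φ(x)ϕ(x|μ, σ²). The backwards smoothing step is not included.

```python
import math

def _pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def _cdf(x, mu, var):
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0 * var)))

def forecast(mu, var, tau, l, eps2):
    """Equations (93) and (94): propagate the state estimate across a time step tau."""
    f = math.exp(-tau / l)
    return mu * f, var * f * f + eps2 * (1.0 - f * f)

def update(mu, var, y):
    """Moment-matched measurement update; returns (likelihood, posterior mean, posterior variance)."""
    j0 = _cdf(mu, 0.0, 1.0 + var)
    d = _pdf(0.0, mu, 1.0 + var)
    j1 = mu * j0 + var * d
    j2 = var * j0 + mu * j1 + mu * var / (1.0 + var) * d
    if y == 0:
        # Moments of (1 - Phi(x)) phi(x|mu, var): Gaussian moments minus the ones above.
        j0, j1, j2 = 1.0 - j0, mu - j1, mu * mu + var - j2
    m = j1 / j0
    return j0, m, j2 / j0 - m * m

def forward_filter(t, y, l, eps2, in_gap=None, prior=(-3.0, 6.0)):
    """Return filtered means, variances, and the log of the total likelihood (equation 96)."""
    mu, var = prior
    log_like = 0.0
    means, variances = [], []
    for i in range(len(t)):
        if i > 0:
            mu, var = forecast(mu, var, t[i] - t[i - 1], l, eps2)
        # Inside a known gap P(y_i|x_i) = 1, so the update is skipped.
        if in_gap is None or not in_gap[i]:
            like, mu, var = update(mu, var, y[i])
            log_like += math.log(like)
        means.append(mu)
        variances.append(var)
    return means, variances, log_like

# Illustrative runs with made-up hyperparameters l = 2 d and eps2 = 1.5.
t = [0.1 * k for k in range(50)]
m1, v1, ll1 = forward_filter(t, [1] * 50, l=2.0, eps2=1.5)
m0, v0, ll0 = forward_filter(t, [0] * 50, l=2.0, eps2=1.5)
```

The detection probability at each time follows from the filtered state as Φ(μ_{i|i}/√(1 + σ²_{i|i})), which is J₀ evaluated at the filtered mean and variance.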

4.4 Estimating the hyperparameter likelihood and discussion

Our model for the gaps has two free parameters q and r and our model for the detection efficiency has two free parameters l_i and $\varepsilon_i^2$ in each of the 19 magnitude bins i, for a total of 40 free parameters. The run-time of both algorithms is linear in the number of predicted observations, but with almost 20 million predicted observations it would still be prohibitively expensive to simultaneously optimize all parameters. We opt instead to alternate between optimizing the free parameters in the gaps model and optimizing the free parameters in the detection efficiency model, using the maximum-likelihood gaps or efficiencies from the last optimization of each in the next optimization of the other.

Our final time-scales for the persistence of non-gaps and gaps were q = 1.216 and r = 0.043 d. If the time between the current state and the next is equal to the persistence time-scale then the probability of the next state being the same as the current state is 68.4 per cent, while if the time difference is twice the persistence time-scale then the probability of remaining in the same state is only 56.8 per cent. Our final values for q and r tell us that gaps are not frequent and tend to be short-lived. The final values for the time-scale l and variance ε² of the detection efficiency process in each magnitude bin are given in Table 1. These describe the time-scales over which the detection efficiencies appear to vary and the scale of those variations. We illustrate the final gaps and detection efficiencies in Fig. 6.
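The two percentages quoted above follow directly from equation (78). Writing Δt for the time between consecutive predicted observations, the probability of remaining outside a gap is

$$\begin{eqnarray*} 1-q_i &=& \frac{1}{2}+\frac{1}{2}\exp {\left(-\frac{\Delta t}{q}\right)}, \nonumber \\ \Delta t = q: \quad 1-q_i &=& \frac{1}{2}+\frac{1}{2}e^{-1} \approx 0.684, \nonumber \\ \Delta t = 2q: \quad 1-q_i &=& \frac{1}{2}+\frac{1}{2}e^{-2} \approx 0.568, \end{eqnarray*}$$

and the same values hold for remaining in a gap with q replaced by r.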

Figure 6.

Illustration of our final gaps and probabilities of detection. Top: the inferred gaps where no Gaia observations resulted in published detections are shown in blue. Bottom: for each of the magnitude bins used to define Gaia star packets, the colour map shows the probability that a Gaia observation made at that time would result in a published detection.

Fig. 6 shows rich structure which can be attributed to known effects. The drops in detection probability around OBMT 1925 rev, 2150 rev, 2825 rev, 3475 rev, and 3575 rev are due to periods when Gaia was frequently scanning across the Galactic plane. The initial period of low detection probability for sources fainter than G = 20.3 is due to the faint-end threshold for the SkyMapper CCDs to register a detection (de Bruijne et al. 2015) originally being set to G = 20.3. This threshold was changed to G = 21.0 on 2014 September 15 and decreased to its final value of G = 20.7 on 2014 October 27 (see section 1.3.3 of de Bruijne et al. 2018). These threshold changes do not appear as crisp edges in this plot because the SkyMapper-estimated G magnitude is not as accurate as the calibrated measurement made by the astrometric CCDs. We identified a total of 206 gaps, an increase from the 94 gaps identified in Paper I. The gaps which are clearly visible in Fig. 6 were primarily caused by the mirror decontaminations and subsequent refocusings, station-keeping maneuvers, and micro-meteoroid impacts.

5 USING OUR SCANNING LAW TO PREDICT GAIA OBSERVATIONS AND THE PROBABILITY OF THEM RESULTING IN DETECTIONS

The results of our series of papers rely heavily on predicting when Gaia observed a location on the sky, given the scanning law. Being able to predict Gaia observations would be useful in other science cases beyond inferring Gaia’s completeness, and so we have produced a python module scanninglaw (https://github.com/gaiaverse/scanninglaw) based on the dustmaps package by Green (2018) and subsequent selectionfunctions package from Paper II. This enables the user to ask the question ‘At what times could Gaia have observed my star in the time frame of DR2 and what is the probability that each observation was successfully processed?’. This is demonstrated by determining when the fastest main-sequence star in the Galaxy (S5-HVS1, Koposov et al. 2020) would have been observed in Gaia DR2. This python package has options to download and use the DPAC nominal scanning law or the one derived in Section 3 of this work.

6 CONCLUSIONS

The completeness of the Gaia catalogues is heavily dependent on the status of Gaia through time. If there is a gap in scientific operations, a drop in the detection efficiency or Gaia deviates from the commanded scanning law, then stars will miss out on potential detections and thus be less likely to make it into the Gaia catalogues. The Gaia mission will take hundreds of epoch astrometric, photometric, and spectroscopic measurements of billions of stars, which will implicitly encode the status of Gaia throughout the mission. In this work, we have laid the groundwork for the future exploitation of these massive time-series by developing novel methodologies to infer the orientation of Gaia and the gaps and detection efficiencies from time-series of Gaia detections. We have applied these methodologies to the Gaia DR2 variable star epoch photometry, which is the only publicly available Gaia time-series at the present time. The nominal scanning law will be updated in the early third Gaia data release (EDR3), but the true attitude determination used in the DPAC astrometric solution will not be made available at that time.⁴ Therefore, in a later paper in this series we will determine a more accurate scanning law for the period covered by DR3 by applying the methods presented in this paper to the extended variable star photometry which will become available in the full DR3.

The objective of this work in the context of the Completeness of the Gaia-verse series was to more accurately infer Gaia’s true scanning law and the timings of data-taking gaps, which will be used in subsequent works to infer selection functions for the astrometric, photometric, and spectroscopic data products. However, our results are also of immediate practical use. We have created a new open-source python package scanninglaw which can be used to query the times that Gaia observed a location on the sky and the probability of each of those observations resulting in a published detection. This package can download and use any of the publicly available scanning laws for the 22 months of Gaia DR2.

ACKNOWLEDGEMENTS

DB thanks Magdalen College for his fellowship and the Rudolf Peierls Centre for Theoretical Physics for providing office space and travel funds. AE thanks the Science and Technology Facilities Council of the United Kingdom for financial support. This work has made use of data from the European Space Agency (ESA) mission Gaia (https://www.cosmos.esa.int/gaia), processed by the Gaia Data Processing and Analysis Consortium (DPAC, https://www.cosmos.esa.int/web/gaia/dpac/consortium). Funding for the DPAC has been provided by National Institutions, in particular the institutions participating in the Gaia Multilateral Agreement.

DATA AVAILABILITY

The data underlying this article are publicly available from the European Space Agency’s Gaia archive (https://gea.esac.esa.int/archive/). The scanning law derived in Section 3 is publicly available on the Harvard Dataverse (https://doi.org/10.7910/DVN/MYIPLH), as are the identified gaps and detection probabilities in Gaia data-taking (https://doi.org/10.7910/DVN/ST8TSM). The authors welcome queries from those interested in using our data products in their own works.

Footnotes

1

https://www.cosmos.esa.int/web/gaia/scanning-law-pointings

2

Note that Gaia does not use any gyroscopes to guide its attitude, instead it uses a cold-gas micro-Newton thruster system firing several times per second, using the scientific instrument measurements to maintain the programmed scanning law during nominal operations (Gaia Collaboration 2016).

3

An analogy can make this clearer. Imagine we release a satellite on a circular orbit about the Earth and at the same time release a second satellite on a slightly inclined but otherwise identical orbit. The distance between these two satellites will oscillate with a period that is identical to the period of their orbit. Similarly, small offsets in the estimated angular velocity of Gaia between the nominal scanning law and our derived scanning law cause the angular offset between the two to oscillate with Gaia’s rotation period.

4

https://www.cosmos.esa.int/web/gaia/earlydr3

REFERENCES

Andrle M. S., Crassidis J. L., 2015, J. Guid. Control Dyn., 38, 1614, doi:10.2514/1.G001025

Boubert D., Everall A., 2020, MNRAS, 497, 4246 (Paper II), doi:10.1093/mnras/staa2305

Boubert D., Everall A., Holl B., 2020, MNRAS, 497, 1826 (Paper I), doi:10.1093/mnras/staa2050

Burton R., Rock S., Springmann J., Cutler J., 2017, Acta Astronaut., 133, 269, doi:10.1016/j.actaastro.2017.01.024

Cardoso J. R., Leite F. S., 2010, J. Comput. Appl. Math., 233, 2867, doi:10.1016/j.cam.2009.11.032

Crouch P. E., Grossman R., 1993, J. Nonlinear Sci., 3, 1, doi:10.1007/BF02429858

Crowley C. et al., 2016, A&A, 595, A6, doi:10.1051/0004-6361/201628990

de Bruijne J., Siddiqui H., Lammers U., Hoar J., O’Mullane W., Prusti T., 2010, in Klioner S. A., Seidelmann P. K., Soffel M. H., eds, IAU Symp. Vol. 261, Relativity in Fundamental Astronomy: Dynamics, Reference Frames, and Data Analysis. Cambridge Univ. Press, Cambridge, p. 331, doi:10.1017/S1743921309990597

de Bruijne J. H. J., Allen M., Azaz S., Krone-Martins A., Prod’homme T., Hestroffer D., 2015, A&A, 576, A74, doi:10.1051/0004-6361/201424018

de Bruijne J. H. J. et al., 2018, Gaia DR2 documentation Chapter 1: Introduction, Gaia DR2 documentation. Available at: https://gea.esac.esa.int/archive/documentation/GDR2/

Dieci L., Papini A., 2001, Numer. Algorithms, 28, 137, doi:10.1023/A:1014071202885

Evans D. W. et al., 2018, A&A, 616, A4, doi:10.1051/0004-6361/201832756

Fraser A. M., 2008, Hidden Markov Models and Dynamical Systems. Vol. 107, Society for Industrial and Applied Mathematics, Philadelphia

Gaia Collaboration, 2016, A&A, 595, A1, doi:10.1051/0004-6361/201629272

Gaia Collaboration, 2018, A&A, 616, A1, doi:10.1051/0004-6361/201833051

Gao F., Han L., 2012, Comput. Optim. Appl., 51, 259, doi:10.1007/s10589-010-9329-3

Green G. M., 2018, J. Open Source Softw., 3, 695, doi:10.21105/joss.00695

Holl B. et al., 2018, A&A, 618, A30, doi:10.1051/0004-6361/201832892

Klioner S. A., 2003, AJ, 125, 1580, doi:10.1086/367593

Koposov

S. E.

et al. ,

2020

,

MNRAS

,

491

,

2465

10.1093/mnras/stz3081

Crossref

Search ADS

Kubelka

V.

,

Reinstein

M.

,

Svoboda

T.

,

2016

,

Robot. Auton. Syst.

,

84

,

88

10.1016/j.robot.2016.07.006

Crossref

Search ADS

Lindegren

L.

,

Lammers

U.

,

Hobbs

D.

,

O’Mullane

W.

,

Bastian

U.

,

Hernández

J.

,

2012

,

A&A

,

538

,

A78

10.1051/0004-6361/201117905

Crossref

Search ADS

Lindegren

L.

et al. ,

2018

,

A&A

,

616

,

A2

10.1051/0004-6361/201832727

Crossref

Search ADS

Markley

F. L.

,

2003

,

J. Guid. Control Dyn.

,

26

,

311

10.2514/2.5048

Crossref

Search ADS

Rasmussen

C. E.

,

Williams

C. K. I.

,

2006

,

Gaussian Processes for Machine Learning

.

MIT Press

,

Cambridge

Google Scholar

Google Preview

OpenURL Placeholder Text

WorldCat

Rauch

H. E.

,

Striebel

C. T.

,

Tung

F.

,

1965

,

AIAA J.

,

3

,

1445

10.2514/3.3166

Crossref

Search ADS

Riello

M.

et al. ,

2018

,

A&A

,

616

,

A3

10.1051/0004-6361/201832712

Crossref

Search ADS

Sacks

J.

,

Welch

W. J.

,

Mitchell

T. J.

,

Wynn

H. P.

,

1989

,

Stat. Sci.

,

4

,

409

10.1214/ss/1177012413

Crossref

Search ADS

Sanderson

G.

,

Eater

B.

,

2018

,

Visualizing Quaternions: An Explorable Video Series

, Available at:

https://eater.net/quaternions

(last accessed Aug 2020)

Google Scholar

Google Preview

OpenURL Placeholder Text

WorldCat

Viterbi

A.

,

1967

,

IEEE Trans. Inform. Theory

,

13

,

260

10.1109/TIT.1967.1054010

Crossref

Search ADS

Wertz

J. R.

,

2012

,

Spacecraft Attitude Determination and Control

. Vol.

73

,

Springer Science & Business Media

,

Berlin

10.1063/1.4707858

Google Scholar

Google Preview

OpenURL Placeholder Text

WorldCat

Crossref

APPENDIX A: MATRIX EXPONENTIAL FOR THE MEKF

The goal of this appendix is to evaluate |$\exp{(\delta t \boldsymbol{\sf F})}$| where |$\boldsymbol{\sf F}$| is a block matrix of the form

$$\begin{eqnarray*} \boldsymbol{\sf F} = {\begin{bmatrix}\omega \boldsymbol{\sf S}& \boldsymbol{\sf I}_{3\times3} \\ \boldsymbol{\sf 0}_{3\times3} & \boldsymbol{\sf 0}_{3\times3} \end{bmatrix}}, \end{eqnarray*}$$

(A1)

where |$\boldsymbol{\sf S}\equiv -[\boldsymbol {n}\times ]$|, ω is a scalar in |$\mathbb{R}$|, and |$\boldsymbol {n}$| is a unit vector in |$\mathbb{R}^3$|. Using the expression in equation 1.3 of Dieci & Papini (2001) for the exponential of an upper triangular block matrix, we have

$$\begin{eqnarray*} e^{\delta t \boldsymbol{\sf F}} = {\begin{bmatrix}e^{\delta t \omega \boldsymbol{\sf S}} & \int _0^1 \delta t e^{(1-x)\omega \delta t \boldsymbol{\sf S}}\mathrm{d}x \\ \boldsymbol{\sf 0}_{3\times3} & \boldsymbol{\sf I}_{3\times3}\end{bmatrix}}. \end{eqnarray*}$$

(A2)

The first diagonal term can be simplified by noting that |$\boldsymbol{\sf S}$| is a skew-symmetric matrix with unit two-norm and thus the Rodrigues rotation formula (e.g. see section 2 of Cardoso & Leite 2010) gives us

$$\begin{eqnarray*} e^{\omega \delta t \boldsymbol{\sf S}} = \boldsymbol{\sf I}_{3\times3}+\sin {\omega \delta t}\boldsymbol{\sf S} + (1-\cos {\omega \delta t})\boldsymbol{\sf S}^2, \end{eqnarray*}$$

(A3)

where the rotation angle |$\omega \delta t$| is a scalar in |$\mathbb{R}$|. The off-diagonal term can be simplified by substituting the Rodrigues formula, changing variables to |$u=(1-x)\omega \delta t$|, and integrating,

$$\begin{eqnarray*} \int _0^1 \delta t e^{(1-x)\omega \delta t \boldsymbol{\sf S}}\mathrm{d}x &=& \frac{1}{\omega }\int _0^{\omega \delta t} \left[\boldsymbol{\sf I}_{3\times3}+\sin {u}\,\boldsymbol{\sf S} + (1-\cos {u})\boldsymbol{\sf S}^2\right] \mathrm{d}u \nonumber \\ &=& \delta t \boldsymbol{\sf I}_{3\times3}+\frac{1}{\omega }\left(1-\cos {\omega \delta t}\right)\boldsymbol{\sf S} \nonumber\\ &&+\,\frac{1}{\omega }\left(\omega \delta t-\sin {\omega \delta t}\right)\boldsymbol{\sf S}^2. \end{eqnarray*}$$

(A4)

We note that this result matches that derived by Andrle & Crassidis (2015).
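The closed form in equations (A2) to (A4) can be checked numerically against a brute-force evaluation of the matrix exponential. The following is a minimal sketch and not the code used in this work: the function names (cross_matrix, closed_form_exp, series_exp) and the values chosen for ω, δt, and n are illustrative assumptions, and the reference exponential is a plain truncated Taylor series, which is adequate here only because the norm of δt F is small.

```python
import numpy as np


def cross_matrix(n):
    """Return [n x], the matrix such that cross_matrix(n) @ v == np.cross(n, v)."""
    x, y, z = n
    return np.array([[0.0, -z, y],
                     [z, 0.0, -x],
                     [-y, x, 0.0]])


def closed_form_exp(omega, dt, n):
    """Evaluate exp(dt * F) from the closed form of equations (A2)-(A4)."""
    S = -cross_matrix(n)          # S = -[n x], with n a unit vector
    S2 = S @ S
    I3 = np.eye(3)
    theta = omega * dt            # rotation angle
    # Equation (A3): Rodrigues formula for the upper-left block.
    upper_left = I3 + np.sin(theta) * S + (1.0 - np.cos(theta)) * S2
    # Equation (A4): upper-right block (this form requires omega != 0).
    upper_right = (dt * I3
                   + (1.0 - np.cos(theta)) / omega * S
                   + (theta - np.sin(theta)) / omega * S2)
    return np.block([[upper_left, upper_right],
                     [np.zeros((3, 3)), I3]])


def series_exp(A, n_terms=60):
    """Reference matrix exponential from a truncated Taylor series."""
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, n_terms):
        term = term @ A / k       # term now holds A^k / k!
        result = result + term
    return result


# Illustrative values only (hypothetical, not taken from the Gaia scanning law).
omega, dt = 0.7, 1.3
n = np.array([1.0, 2.0, -2.0]) / 3.0   # unit vector

# Block matrix F of equation (A1).
F = np.block([[-omega * cross_matrix(n), np.eye(3)],
              [np.zeros((3, 3)), np.zeros((3, 3))]])

max_abs_diff = np.max(np.abs(closed_form_exp(omega, dt, n) - series_exp(dt * F)))
print(max_abs_diff)
```

For these values the printed difference should be at the level of floating-point round-off. Because the expression in equation (A4) divides by ω, the limit ω → 0, in which the off-diagonal block tends to δt I, would need to be handled separately in any practical implementation.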

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://dbpia.nl.go.kr/journals/pages/open_access/funder_policies/chorus/standard_publication_model)
