Abstract

We revisit the Hipparcos 2007 re-reduction and find improvements to the catalogue by leveraging Gaia EDR3. We show that including a constant residual offset and additional dispersion (two free parameters in total) in the Hipparcos 2007 Intermediate Astrometric Data (IAD) creates a new catalogue with significantly better agreement with Gaia EDR3. The astrometric parameters, after recalibration, have z-scores that follow a unit-Gaussian when measured against Gaia EDR3 values. We have expanded the python astrometry tool, htof, to recalibrate the IAD on-the-fly. On a second front, we find that a merged set of IAD from the 1997 and 2007 Hipparcos reductions is not possible in an internally consistent manner. This can be understood if Hipparcos 2007 is an improved, but overfit, model to the underlying along-scan data. For this reason, we recommend using the recalibrated Hipparcos 2007 astrometric parameters, or those from the HipparcosGaia Catalogue of Accelerations – because the signatures of overfitting are calibrated out. We advise caution in fitting orbits to the IAD from either Hipparcos 2 as-published or the recalibrated version presented here.

1 INTRODUCTION

The first reduction of the Hipparcos mission (hereafter, Hip1, ESA 1997; Perryman et al. 1997) provided proper motions, parallaxes, and positions for over 100 000 stars across the sky. These parameters were reduced from repeated observations of the instantaneous position of the stars. For this, Hipparcos scanned the sky over its 4-yr mission in a revolving pattern called the scanning law (van Leeuwen & Evans 1998). The photon-multiplier tube and slit arrangement of the satellite meant that each observation is precise in the direction along the scan, and imprecise perpendicular to the scan. In practice, each observation is thus considered one-dimensional (1D). The Intermediate Astrometric Data (IAD) are required to reconstruct the final catalogue parameters and errors. The IAD collectively consist of: the 1D residuals (with respect to the catalogue skypath) in the along-scan directions; the on-sky position angle of the scan; the time each scan was collected; and the associated standard error of each residual measurement (hereafter, along-scan error). Each version of the Hipparcos catalogue has its own IAD.

The van Leeuwen (2007b) Hipparcos reduction (hereafter, Hip2), was a new reduction of the same observational data, which reported significant improvements, in part due to an improved satellite attitude reconstruction (van Leeuwen & Fantino 2005). This led to formal errors on the final parameters (positions, parallaxes, and proper motions) better than Hip1 by up to a factor of four.

The Hip2 IAD have slightly different residuals (compared to Hip1), and importantly: dramatically smaller along-scan errors, such that some scans are weighted substantially more than other scans during the fitting of the skypath parameters. In this paper, we show that including a constant residual offset and additional dispersion in the Hip2 IAD creates a new catalogue with significantly better agreement with Gaia EDR3. The agreement is improved at a level that is nearly 60 Gaussian sigma above random chance, even after accounting for the addition of these two free parameters. We motivate our study in Section 2. We outline our method and data selection in Section 3. In Section 4, we derive the best value for the two new global constants, show their quantitative benefits to the catalogue, and validate this new set of IAD. We discuss the residuals, a critical piece of the Hip2 IAD, in Section 5. To fully understand the characteristics of the IAD before and after recalibration, we introduce a toy model to explain and highlight certain statistical features of the data. We show how the recalibrated IAD can benefit the orbit-fitting of individual stars in Section 6. In Section 7, we discuss the challenges of a merger of Hip1 and Hip2 that would result in a final, single set of IAD. We propose that a weighted combination of the fitted astrometric parameters (as in the HipparcosGaia Catalogue of Accelerations; HGCA; Brandt 2018) is still the preferred method; a statistical merger of the individual scans and residuals is not fruitful. We conclude and offer recommendations for the uses of Hipparcos astrometry in Section 8.

2 MOTIVATION

Both Hipparcos reductions are distributed as two sets of astrometric data: the catalogue which describes the best-fitting skypath via five (or more) astrometric parameters; and the IAD, which give the difference (residual) between the observed on-sky position and the best-fitting skypath. The five astrometric parameters reported by the catalogue are those of the χ2 (chi-squared) minimizing solution (see e.g. section 2.1 of Brandt et al. 2021), where χ2=iri2/σi2. Where ri is the along-scan residual for the ith observation, and σi is the corresponding along-scan error. If the data and errors are well-behaved, the residuals with respect to the best-fitting solution across every star should have mean zero. There are nearly 14 million residuals across the entire IAD and so the mean can be easily computed with high formal precision.

The mean of the Hip1 residuals is consistent with zero. It is −0.002 mas with a formal uncertainty on the mean of 0.01 mas: It is displaced from zero by only 0.2 Gaussian sigma. However, the Hip2 residuals are surprising. The mean of the distribution of residuals is −0.12 mas while the formal uncertainty on this mean is 0.003 mas. This is an offset that is significant at 40 Gaussian sigma due to the millions of samples. The fact that Hip2 has an unexpectedly non-zero residual offset motivates the following ostensibly absurd proposal: Could adding this residual offset, or a similar value fit as a free parameter, improve the catalogue?

Additionally, we investigate a second free parameter to improve the catalogue: inflating each individual along-scan error with a global additional dispersion added in quadrature (akin to cosmic dispersion or astrometric excess noise in Gaia). Inflating the errors is motivated by 1: the evidence that the errors in Hipparcos 2 are underestimated (discussed extensively in Brandt 2018); and 2: the fact that for many stars, Hipparcos 2 deflated the along-scan errors so that the goodness-of-fit metric for the astrometric skypath became desirable (van Leeuwen 2007a; Michalik et al. 2014) – a process often referred to as error renormalization but which is typically only used to inflate errors. Makarov (2022) and Gould, Kollmeier & Sesar (2016) (see Section 5) additionally hint at subtleties in the Hipparcos error profile. Note that inflating the uncertainties on the fitted astrometric parameters is not equivalent to inflating the individual IAD along-scan errors in quadrature, because the latter results in a more even weighting of each scan to which the skypath is fitted (and therefore different astrometric parameters). We stress that an investigation of whether the along-scan errors should be inflated is, a priori, difficult without Gaia EDR3 parallaxes and proper motions (or other ground-truth values). Perryman et al. (1997) were able to estimate a 0–20 per cent error inflation for Hip1 (see their table 2). Yet, as we show, Gaia allows one to precisely calculate the error inflation needed to match external uncertainties. In the next section, we describe how we recalibrate the Hipparcos 2 IAD by inflating errors and adding a residual offset.

3 METHODS AND DATA

When we refer to Hip2 as-is, we mean the IAD corresponding to the van Leeuwen (2007b) catalogue when it was published.1 Adding a residual offset and additional dispersion to the Hip2 IAD creates an entirely new set of IAD (residuals and along scan errors) for each star. Performing a standard five-parameter fit to these new IAD results in new, recalibrated astrometric parameters. We refer to this process as recalibration and the new catalogue and new IAD (together) as Hip2 recalibrated. We recalibrate Hipparcos with code2 based on the open-source astrometric-fitting package htof (Brandt et al. 2021; Brandt & Michalik 2022). We seek the additional dispersion and residual offset that best recalibrate Hip2. We define the best Hip2 recalibrated catalogue as follows:

  1. The best catalogue has the most accurately estimated uncertainties on parallaxes and proper motions.

  2. The best catalogue has the best agreement with ground-truth parallaxes and proper motions.

Ideally both (1) and (2) occur simultaneously but we do not enforce that explicitly. In practice, ‘ground-truth’ means a modern measure of parallaxes and proper motions that are substantially more precise than Hip2. Additionally, any ground-truth parameters must be such that lingering systematics affect the data at a level much smaller than the Hip2 errors on those same parameters.

We use Gaia EDR3 for the ground-truth parallaxes. The Gaia EDR3 parallaxes are on average 40 times more precise than the Hip1 parallaxes (and Hip2 after our recalibration). For ground-truth proper motions, we adopt the long-term proper motions between Hipparcos and Gaia, from the HGCA (Brandt 2021). These proper motions are based on the positional displacement between the two missions. For our sample, these long-term proper motions are on average 30 times more precise than the Hipparcos proper motions.

Because we use long-term, ∼25-yr, proper motions as the ground-truth values for the 3.4-yr Hipparcos proper motions, we restrict our analysis to the sources that are not accelerating. We identify these with the HGCA, by choosing stars with an acceleration χ2 less than 10. Furthermore, we limit ourselves to only five-parameter sources (solution type 5), and we remove the 6617 sources where the initially published IAD was flagged as corrupted by Brandt et al. (2021). Our final sample consists of 67 256 stars and nearly 8 million individual IAD observations.

A natural metric to measure agreement is the chi-squared (χ2) of the ground-truth astrometric parameters compared to the Hipparcos parameters. For parallax ϖ (each star indexed by i) this is

(1)

with

(2)

where ϖH,i and ϖG,i are the parallaxes in Hipparcos and Gaia for star i, and σϖ,H,i and σϖ,G,i are their uncertainties. Similar equations describe the likelihood for the two directions of proper motion.

(3)
(4)

where α and δ refer to right ascension3 and declination, respectively. The total metric which takes into account the covariance, is

(5)
(6)
(7)

where CHG−1 is the inverse covariance matrix for the difference between the Hipparcos and Gaia astrometric parameters. In the absence of covariance (i.e. CHG−1 is diagonal), equation (5) reduces to the sum of the individual metrics: χ2ϖ + χ2μα + χ2μδ. Indeed, the sum of the metrics ends up being nearly identical to the proper multivariate χ2 metric.

For any constant additional dispersion and constant residual offset, we perturb the Hipparcos catalogue IAD by those two values, then re-solve for a new, slightly perturbed catalogue: each additional dispersion and residual offset yields new ϖH,i, μα,H,i, and μδ,H,i (and errors) for each star. Thus for every recalibration, there is an associated larger or smaller likelihood relative to as-is Hip2.

An important last step is that we hold fixed the uncertainties for every refit (i.e. the σϖ,i2 etc. are fixed). Therefore, the only way that an additional dispersion can improve the agreement with the reference astrometry is by bringing the Hipparcos astrometry closer to the Gaia values. (Otherwise, inflating uncertainties to infinity would suppress χ2 to 0 for the ‘best’ solution.) Adding a uniform additional dispersion to all IAD gives each along-scan measurement a more uniform contribution to the fit than in Hip2 as-is, where the uncertainties on different scans can differ more substantially. This different relative weighting of the IAD produces small differences in the best-fitting astrometric skypaths. The aim is that these small differences budge the values naturally closer to the Gaia EDR3 ‘ground-truths’.

We therefore minimize the right-hand terms of equations (14). Minimizing the three χ2 (with fixed uncertainties) each separately will yield three different sets of best-fitting additional dispersions and residual offsets. Additionally, we minimize the multivariate χ2 metric equation (5). The best-fitting values could differ significantly from one metric to another. Differing best-fitting values would be evidence against the hypothesis that an additional dispersion and residual offset can improve the catalogue. Ideally, there exists a single residual offset and additional dispersion which naturally result in a reduced χ2 of unity across all four metrics.

There is some evidence that the best residual offset is a function of magnitude. The residual mean increases slightly with magnitude: from roughly −0.1 mas near Hipparcos magnitude (Hp) magnitudes of 3, to −0.16 mas at Hp magnitudes of 11. Encoding this variation in our procedure would not appreciably improve the recalibration over simply adopting a constant Δr because the residual mean range is small, but it would introduce more free parameters.

We solve for the best residual offset (Δr) and additional dispersion by brute force minimization of the metric. We compute the χ2 metrics at each trial value of Δr and additional dispersion. We test values of Δr over a fine grid between −0.4 and 0.8 mas, and additional dispersions between 0 and 5 mas. The boundaries are such because the global minimum occurs well within that grid; all trial values outside those boundaries yielded worse catalogues. In the following section, we discuss the results of this optimization.

4 THE RECALIBRATED HIPPARCOS 2007 CATALOGUE

Fig. 1 shows the four χ2 metrics (one for parallax, one for each direction of proper motion, and the average of all three metrics) over the evaluation grid of Δr and additional dispersion. Each parcel of the grid requires refitting all 67 000 stars in our sample. We computed 400 million astrometric solutions to construct Fig. 1.

The χ2 between Hipparcos and Gaia parallaxes (first panel, counting from the left) and Hipparcos and Gaia–Hipparcos long-term proper motions (second and third panels), by adding a residual offset (Δr) and additional dispersion to the Hip2 IAD. We have subtracted the χ2 of the catalogue as-published (0 additional dispersion and 0 residual offset) from each panel. The far right panel is equation (5) (which almost equals the sum of the three metrics). The first three panels have been multiplied by three to match the scale of the far-right sum panel. Δχ2 = 0 represents no improved agreement with Gaia. Δχ2 < 0 corresponds to an improved final catalogue, and a positive Δχ2 indicates degradation. The thin, vertical line in each panel is the negative of the observed residual offset from the van Leeuwen (2007b) IAD. The yellow dot denotes the minimum metric value in each panel. We adopt the combined multivariate χ2 as the best metric, yielding Δr = 0.140 mas and an additional dispersion of 2.25 mas as the optimal parameters.
Figure 1

The χ2 between Hipparcos and Gaia parallaxes (first panel, counting from the left) and Hipparcos and GaiaHipparcos long-term proper motions (second and third panels), by adding a residual offset (Δr) and additional dispersion to the Hip2 IAD. We have subtracted the χ2 of the catalogue as-published (0 additional dispersion and 0 residual offset) from each panel. The far right panel is equation (5) (which almost equals the sum of the three metrics). The first three panels have been multiplied by three to match the scale of the far-right sum panel. Δχ2 = 0 represents no improved agreement with Gaia. Δχ2 < 0 corresponds to an improved final catalogue, and a positive Δχ2 indicates degradation. The thin, vertical line in each panel is the negative of the observed residual offset from the van Leeuwen (2007b) IAD. The yellow dot denotes the minimum metric value in each panel. We adopt the combined multivariate χ2 as the best metric, yielding Δr = 0.140 mas and an additional dispersion of 2.25 mas as the optimal parameters.

The best agreement with Gaia EDR3 occurs at a Δr = 0.140 mas and an additional dispersion of 2.25 mas. Remarkable is that all four metrics yield roughly the same best-fitting values, a fact that strongly validates the procedure. Comparing the panels of Fig. 1 with each other, the best-fitting Δr and additional dispersion from one metric are within three standard deviations of those from any other metric. The agreement between the Hipparcos and Gaia catalogues is improved by 3800 points of χ2 in total (roughly 60 Gaussian sigma), by adopting the quoted Δr and additional dispersion while holding the Hipparcos uncertainties fixed.4 Because our metric is χ2, we can easily calculate the formal errors on our best-fitting values from Fig. 1. We use the χ2 survival function for 2 degrees of freedom. The best-fitting values with 1σ confidence intervals, are Δr = 0.140 ± 0.008 mas and an additional dispersion of 2.25 ± 0.04 mas.

As another consistency check, we computed the constant offset that brings the residual mean to zero. The residual mean in the as-is catalogue is −0.12 mas, but adding back in +0.12 mas and refitting does not cause the residuals to be centred at zero. The offset that centres the residuals at 0 is 0.145 mas, in excellent statistical agreement with our best-fitting Δr, despite the fact that we never enforced balanced residuals in the metric. Thus, fitting for the best external statistical agreement with Gaia EDR3 also improved the internal statistical properties of the IAD.

Fig. 2 showcases the improved accuracy of the errors on the astrometric parameters. In particular, the figure displays histograms of the three sets of z-scores, for Hip2 recalibrated, Hip2 as-is, and Hip1:

(8)
(9)
(10)

where HG denotes the long-baseline HGCA proper motions5 from the HGCA and i indexes the stars in the sample (all ∼67 000).

Histograms of z-scores of the fitted astrometric parameters relative to Gaia EDR3 parallaxes (top row) and HGCA long-baseline proper motions (middle and bottom row). The vertical axis is probability density. Left-hand panel shows the Hip2 recalibrated IAD (green), and middle panel shows the IAD as-is (blue; showing the less-than-optimal agreement with Gaia EDR3). Right-hand panel is the Hip1 IAD (black), showing how there is still an overdispersion (error underestimation), but to a lesser degree than in Hip2. A unit-Gaussian distribution (grey dashed line) is plotted to guide the eye.
Figure 2

Histograms of z-scores of the fitted astrometric parameters relative to Gaia EDR3 parallaxes (top row) and HGCA long-baseline proper motions (middle and bottom row). The vertical axis is probability density. Left-hand panel shows the Hip2 recalibrated IAD (green), and middle panel shows the IAD as-is (blue; showing the less-than-optimal agreement with Gaia EDR3). Right-hand panel is the Hip1 IAD (black), showing how there is still an overdispersion (error underestimation), but to a lesser degree than in Hip2. A unit-Gaussian distribution (grey dashed line) is plotted to guide the eye.

The middle column of Fig. 2 displays the Hip2 as-is parameters with respect to the Gaia EDR3 values. There, the distribution of z-scores is overdispersed relative to the unit-Gaussian. This is clear evidence for an underestimation of the Hip2 as-is parameter errors. For comparison, we show Hip1 in the right-hand column. It too has evidence for error-underestimation, but to a noticeably lesser degree. Some degree of underestimation is not surprising and simply hints at non-negligible external uncertainties in Hip1 with respect to Gaia EDR3. Indeed, the Hip1 consortia were aware of this possibility, and estimated the ratio of external to internal uncertainties to be between 1 and 1.2 (Perryman et al. 1997) – e.g. an error inflation between 0 and 20 per cent. Fig. 2 (Hip1 panel) shows that the true ratio is 1.07 for the parallaxes and 1.12 for the proper motions, which agrees remarkably well with Perryman et al. (1997).

The left-hand column of Fig. 2 shows Hip2 recalibrated. The distribution of the z-scores matches extremely well the unit-Gaussian. This demonstrates the reliability of the recalibrated parameter errors. This is noteworthy. The recalibrated parameter z-scores did not have to result in a unit-Gaussian, as we never fit for that behaviour. Our recalibration was tuned only for the best agreement with Gaia EDR3 in the values of the parameters.

Thus, we have the following remarkable summary. We did not tune the recalibration to: (1) improve the internal statistical agreement of Hip2 (i.e. remove the residual offset); nor (2) ensure more accurate errors on the Hipparcos astrometric parameters. However, both (1) and (2) are improved by the recalibration, even though neither were explicitly optimized for. The residual offset is removed, and the estimate of the astrometric errors is improved (Fig. 2). Finally, note that a parallax offset of −0.09 mas is visible in the Hip2 as-is, Hip2 recalibrated, and Hip1 data (top row of Fig. 2). This offset is 19 sigma significant (in terms of the standard error on the estimate of the mean of the ϖ z-scores). We do not investigate this offset in detail.

Fig. 2 shows that the catalogue is improved on average by recalibration. Fig. 3 shows these improvements across magnitudes. The right-hand panel shows how Hip2 as-is has a varying overdispersion in the parameter residuals; the overdispersion (and therefore error underestimation) increases with brightness. Despite that complex variation with magnitude, Fig. 3 (left-hand panel) shows that recalibration unambiguously improves agreement with Gaia EDR3 for all magnitudes. The additional dispersion, although a constant for all stars, compensates perfectly for the varying amount of error underestimation.

Histograms of z-scores of the fitted astrometric parameters relative to Gaia EDR3, binned by magnitude. Parallax and proper motions have been binned together. Left-hand panel: The recalibrated data, showing how all stars, independent of magnitude, have recalibrated parameters that agree excellently with Gaia EDR3. Note that the same additional dispersion and residual offset have been used for all stars. Right-hand panel: The as-is data, showing how the errors are most underestimated for bright stars and have the worst agreement with Gaia EDR3. The vertical axis is probability density. The expected unit-Gaussian distribution (grey dashed line) is overplotted. The as-is panel is substantially unchanged even if the error transformation from the published catalogue is kept (see Michalik et al. 2014 for an explanation of the transformation).
Figure 3

Histograms of z-scores of the fitted astrometric parameters relative to Gaia EDR3, binned by magnitude. Parallax and proper motions have been binned together. Left-hand panel: The recalibrated data, showing how all stars, independent of magnitude, have recalibrated parameters that agree excellently with Gaia EDR3. Note that the same additional dispersion and residual offset have been used for all stars. Right-hand panel: The as-is data, showing how the errors are most underestimated for bright stars and have the worst agreement with Gaia EDR3. The vertical axis is probability density. The expected unit-Gaussian distribution (grey dashed line) is overplotted. The as-is panel is substantially unchanged even if the error transformation from the published catalogue is kept (see Michalik et al. 2014 for an explanation of the transformation).

4.1 A local recalibration

The dominant source of systematic errors in Hipparcos is the spacecraft attitude (i.e. its orientation). This is accounted for in Hip2 by cubic splines with frequent anchor points; these are fit iteratively with the five-parameter astrometric skypaths (van Leeuwen & Fantino 2005). An alternative to modelling the spacecraft attitude is to use nearly simultaneous observations of stars. Given the beam combiner setup of the Hipparcos spacecraft, these are displaced by ≈58 (ESA 1997); the simultaneity of the observations results in the same attitude. The use of nearly simultaneous reference observations provides an alternative to modelling the attitude but it couples observations of distant stars. Makarov (2002) used the mean residuals of these reference stars to compute approximate corrections to the 1997 Hipparcos astrometry, resolving much of the discrepancy between the Hipparcos Pleiades parallax and the consensus of other measurements. A similar approach could be attempted for the entire Hipparcos data set using Gaia as an anchor for the reference stars used for any target star, but it is beyond the scope of this work.

One could, in principle, fit for residual offsets (abscissae zero-points) for each orbit, as suggested by Zacharias et al. (2022). We find that Hipparcos does not have the statistical power necessary to fit orbit–orbit variations in the residual offset. We investigated this by calculating the residual offsets in each of the ∼3000 orbits, and the statistical error in the estimate of those offsets. Fig. 4 shows the z-scores (residual offset divided by the standard error in that offset) across all orbits.

Left-hand panel: The z-scores of residual offsets calculated per orbit (in blue), for all ∼3000 orbits in Hipparcos 2. The best-fitting quadratic trend is shown in light grey and its coefficients are provided in Section 4.1. Right-hand panel: The same z-scores but binned. The distribution of z-scores is nearly identical to a unit normal (dashed grey).
Figure 4

Left-hand panel: The z-scores of residual offsets calculated per orbit (in blue), for all ∼3000 orbits in Hipparcos 2. The best-fitting quadratic trend is shown in light grey and its coefficients are provided in Section 4.1. Right-hand panel: The same z-scores but binned. The distribution of z-scores is nearly identical to a unit normal (dashed grey).

The distribution of z-scores of the per-orbit residual offsets is nearly identical to a unit normal. There is a slight quadratic trend present in the residuals.6 However, quadratically recalibrating Hipparcos 2 doubles the number of free parameters in the procedure, but results in a catalogue only 0.0016 per cent better than that from the constant offset recalibration. In other words, breaking up the residuals by orbit number is practically indistinguishable from any other division into ∼3000 random bins. Or from another angle, orbit-to-orbit variations of the residual offsets are significantly smaller than the precision with which we can measure those variations. We conclude that there is not enough statistical power to do an orbit-by-orbit calibration of Hipparcos.

5 A FOCUSED DISCUSSION OF THE HIP2 IAD RESIDUALS

The previous section demonstrated that a recalibration of the Hip2 IAD, including a substantial error inflation, improves the agreement of the resulting Hip2 catalogue with Gaia. After this recalibration, Hip2 astrometry is then ≈5 per cent more precise, on average, than Hip1 astrometry. In this section, we turn our attention to the Hip2 IAD themselves: their statistics, and their suitability for direct orbit fitting. We especially focus on the statistics of their z-scores.

A well-fit model of any data set will have a reduced chi-squared (χ2/dof) of ≈1, where dof is the degrees of freedom: the number of measurements (hereafter, N) minus the number of fitting parameters. This directly implies that the z-scores of the measurements should have a variance of about dof/N, which is always less than one. Thus, fitting any model to any dataset results in residuals that are technically overfit. To quantify the magnitude of overfitting, we define the overfitting fraction as 1 minus the variance of the distribution of the z-scores; we expect a value ≈1 − dof/N. We usually quote the overfitting fraction as a percent.

In Hip1, each source in our sample has, on average, 64 observations. Fitting a five-parameter astrometric skypath should result in a z-score variance of (64 − 5)/64 = 0.92, or overfitting of 8 per cent. For Hip2, we would also expect the residual distribution to be narrower than a unit Gaussian, but less so than for Hip1 because each scan in an orbit was included separately.7 The average number of scans per source is 114, and so fitting a five-parameter astrometric skypath would result in a z-score variance of 1 − 5/114 = 0.96, for an overfitting of ≈4 per cent. For the subsequent subsections, it is important to keep in mind that figure of 4 per cent for Hip2.

5.1 The residuals and overfitting

The left-hand panel of Fig. 5 displays the z-scores of the residuals for 300 bright stars from Hip2. The z-score width of Hip2 as-is (blue) is slightly higher than the expected value of 0.96: the z-scores have a variance of 1.01. However, the distribution is much narrower in the recalibrated case (green) for these bright magnitudes. The distribution has a variance of 0.15, suggesting that a model with roughly 80 to 90 free parameters was fit – not five – if this variance is taken at face-value. The recalibration procedure does not induce the overfitting. Fig. 3 shows that the recalibrated errors are correct at every magnitude. Thus, the recalibration merely reveals the overfitting in the along-scan residuals. Instead, in Hip2 as-is, the overfitting was apparent in the astrometric parameters (Fig. 2), and we show in this section that it is hidden in the residuals because of the deflated along-scan errors. The apparent overfitting is larger for brighter stars. The middle and right-hand panels of Fig. 5 show these variances for all magnitudes in our sample. They are binned by Hp in constant bins of 300 stars. The recalibrated panel reveals that bright stars are significantly more overfit than the catalogue on-average. The overfitting fraction varies from ≈85 per cent (Hp < 4) to non-existent at faint magnitudes (Hp > 11). Averaging across all magnitudes, the overfitting fraction is 22 per cent, much larger than the 4 per cent expected.

Left-hand panel: The z-scores of the residuals of 300 bright stars near Hp magnitude 3. The recalibrated distribution is highly narrowed (σ2 = 0.15), revealing the overfitting in the as-is residuals that were masked by deflated errors. Middle panel: The variance of the z-score distribution across all Hipparcos magnitudes in our subsample (i.e. the green σ2 in the left-hand panel, but for the many different magnitude bins). Each bin contains 300 stars. The apparent overfitting increases with increasing brightness (decreasing magnitude). Right-hand panel: The same as the middle panel but for Hip2 as-is. Note: The first magnitude bin (magnitudes ≈2 to ≈4) in both middle/right-hand panels correspond to the recal/as-is distributions in the left-hand panel. The as-is data do not include the error transformation (see Michalik et al. 2014), which forced the reduced χ2 of every star to unity.
Figure 5

Left-hand panel: The z-scores of the residuals of 300 bright stars near Hp magnitude 3. The recalibrated distribution is highly narrowed (σ2 = 0.15), revealing the overfitting in the as-is residuals that were masked by deflated errors. Middle panel: The variance of the z-score distribution across all Hipparcos magnitudes in our subsample (i.e. the green σ2 in the left-hand panel, but for the many different magnitude bins). Each bin contains 300 stars. The apparent overfitting increases with increasing brightness (decreasing magnitude). Right-hand panel: The same as the middle panel but for Hip2 as-is. Note: The first magnitude bin (magnitudes ≈2 to ≈4) in both middle/right-hand panels correspond to the recal/as-is distributions in the left-hand panel. The as-is data do not include the error transformation (see Michalik et al. 2014), which forced the reduced χ2 of every star to unity.

An overfitting of 22 per cent instead of 4 per cent implies either that the residuals are suppressed (i.e. structure has been removed by overfitting), or that the recalibrated along-scan errors for the on-sky positions are overestimated. But the recalibrated along-scan errors cannot be deflated without degrading the agreement of the resulting astrometric catalogue with Gaia EDR3. The presence of modest overfitting neatly explains the residuals’ narrow distribution of z-scores. In other words, significantly more parameters were fit to Hip2 than five astrometric parameters per star; how many additional parameters exactly is unknown to us. This fact has important consequences for the expected behaviour of the residuals and in what contexts they can be used.

In the search for an explanation of the apparent overfitting, we turn to the attitude reconstruction. (The additional two parameters fit to 7.7 million residuals in our recalibration cannot explain the 22 per cent overfitting.) One reason that Hip2 achieved smaller formal errors was because of an improved reconstruction of the satellite attitude (van Leeuwen 2007a). The satellite attitude reconstruction operation fit one free parameter (called a node) roughly every 60 s, over the entire 3.4-yr baseline of Hipparcos. van Leeuwen (2007a) states that the average interval between nodes is 64 s (section 9.2.3), and section 3.3. of van Leeuwen & Fantino (2005) implies 66 s. A 64- or 66-s interval over 3.4 yr means that 1.6 to 1.7 million points were fit for the satellite attitude reconstruction. There are 13.6 million residuals in the entire Hip2 dataset. 1.6 million points fit corresponds to 12 per cent. The satellite attitude reconstruction, together with the five astrometric parameters per star, can account for ≈3/4 of the 22 per cent overfitting fraction present. Hip1, which lacks this detailed attitude reconstruction, has an overfitting of 11 per cent. The five-parameter astrometric fits account for 8 per cent overfitting, again ≈3/4 of the total overfitting seen. Additionally, in Hip2, the bright stars were weighted more heavily in the satellite attitude reconstruction compared to faint stars. Thus this would also naturally explain why bright stars appear much more overfit than the faint stars.

In Hip2 as-is, the IAD residuals approximate the expected distribution, while the resulting astrometric catalogue does not agree with Gaia EDR3. Our recalibration produces excellent agreement between the Hip2 astrometric catalogue and Gaia EDR3, while it results in an IAD distribution considerably narrower than expected. Fig. 6 shows these facts in a pictographic table: The middle and bottom rows show the distribution of IAD residuals and astrometric catalogue residuals, respectively; the left-hand column is recalibrated, while the right-hand column shows our recalibration. The top row, a toy model whose behaviour matches that of Hip2 (as-is and recalibrated), is the subject of the next subsection.

A pictographic table where the left-hand column is recalibrated, and the right-hand column is as-is. Top row: The ‘toy model’ along-scan residuals. These are the z-scores of the recalibrated toy model (left-hand panel; green) and the toy model as-is (right-hand panel; blue). Each panel has a light grey unit-Gaussian overplotted. Note that the ideal residuals distribution is not exactly the unit-Gaussian plotted, rather it is one with a variance of 0.96. Middle row: The actual Hip2 residuals, with the recalibrated Hip2 IAD on the left-hand panel (green) and the IAD as-is on the right-hand panel (blue). The width, in terms of the z-score, of the recalibrated distribution is 0.881 (variance 0.8812 ≈ 0.78), revealing the presence of 22 per cent overfitting inherent to the 2007 IAD. Bottom row: The real Hip2 parameter residuals, as in Fig. 2, but with parallaxes and proper motions binned together. Note how the recalibrated parameters (bottom-left panel) match the unit-Gaussian, while the recalibrated residuals (middle-left panel) deviate from expectation. The situation is reversed for the as-is data.
Figure 6

A pictographic table where the left-hand column is recalibrated, and the right-hand column is as-is. Top row: The ‘toy model’ along-scan residuals. These are the z-scores of the recalibrated toy model (left-hand panel; green) and the toy model as-is (right-hand panel; blue). Each panel has a light grey unit-Gaussian overplotted. Note that the ideal residuals distribution is not exactly the unit-Gaussian plotted, rather it is one with a variance of 0.96. Middle row: The actual Hip2 residuals, with the recalibrated Hip2 IAD on the left-hand panel (green) and the IAD as-is on the right-hand panel (blue). The width, in terms of the z-score, of the recalibrated distribution is 0.881 (variance 0.8812 ≈ 0.78), revealing the presence of 22 per cent overfitting inherent to the 2007 IAD. Bottom row: The real Hip2 parameter residuals, as in Fig. 2, but with parallaxes and proper motions binned together. Note how the recalibrated parameters (bottom-left panel) match the unit-Gaussian, while the recalibrated residuals (middle-left panel) deviate from expectation. The situation is reversed for the as-is data.

Fig. 6 shows empirically that with overfit residuals, one can never have the z-scores of the fitted parameters and the z-scores of the along-scan residuals simultaneously follow the expected distributions (a unit-Gaussian and a slightly narrower Gaussian, respectively). To understand further, we present an informative toy-model replica of the Hipparcos 2007 data.

5.2 A toy model of overfit data

We address the hypothesis that a large number of free parameters were fit to the Hip2 IAD, such that some unknown function was subtracted from the residuals. We illustrate this hypothesis on toy data and aim to show the following two behaviours.

  1. Astrometric parameters fit to overfit residuals can be well-behaved even if the residuals themselves (the on-sky positions) have had some modestly significant unknown function subtracted away.

  2. Astrometric parameters fit to overfit residuals can be improved by refitting with the correct along-scan uncertainties.

Point (1), which on its surface is oxymoronic, is in fact expected if the overfitting procedure itself leads to a better underlying model of the data. This is always possible, especially if the overfitting procedure is done simultaneously with the fitting of the astrometric skypath (or in some iterative procedure that is approximately simultaneous). In the case of Hip2, the process of developing a better underlying model (skypath) was had at the cost of suppressing structure in the residuals. A non-parametric attitude correction based on the IAD themselves will inevitably include, and project out, a component of the actual skypaths. This reduces the utility of the residuals themselves; but leads to improved astrometric parameters (roughly 5 per cent better on average compared to Hip1).

We create a single set of toy model data that is analogous to Hip2 in order to illustrate this behaviour. The toy model is comprised of 30 000 fictitious stars with 108 scans each, for roughly 3 million residuals total. Each star has a sky-path described by a single parameter: the mean of that star’s position. Their ground-truth positions are all taken to be 0 mas, and measurement error causes the observations to jitter about that zero position. We take each scan to uniformly have an uncertainty of 1 mas. The residuals therefore are Gaussian random variables with standard deviations of 1 mas. The total dataset is thus 6 million entries long: 3 million residuals with 3 million (identical) along-scan errors. We refer to this dataset as the ‘perfect Hip2’ toy model IAD. We replicate overfitting in Hip2 by doing the following to the ‘perfect Hip2’ toy IAD. We chunk the residuals into 27 point segments, and fit a fifth order polynomial to every segment. This causes a 22 per cent overfitting fraction. This is intentionally identical to the 22 per cent in the real Hip2 data. We leave the along-scan errors set to 1 mas. We refer to this now-overfit dataset, yet with correct along-scan errors, as the ‘Hip2 recalibrated’ toy IAD. The top-left panel of Fig. 6 shows the distribution of z-scores (the residuals all divided by 1 mas) in green for the ‘Hip2 recalibrated’ toy IAD. It is narrower than 1, owing to the overfitting. The z-score distribution has a standard deviation of 0.881. Note that 1 − 0.8812 = 0.22, equal to the input overfitting fraction. The ‘correct’ along-scan errors cannot be known to be 1 mas a priori. They must be estimated from the data. As well, a metric for ‘correct’ must be established, which is a priori difficult. We consider the following metric of ‘correct’: The residuals’ z-score distribution matches a unit Gaussian. It makes little difference whether we adopt the slightly more-exact definition of ‘correct’ as being a Gaussian of variance 0.96 – not 1. The observer who was handed the ‘Hip2 as-is’ toy IAD, would have found that a (deflated) along-scan error of 0.86 mas achieves correctness.8 Deflating the along-scan errors to 0.86 mas gives a distribution of residual z-scores that best visually matches a unit-Gaussian. Combining the overfit residuals with the deflated along-scan errors of 0.86 mas, yields our second dataset. We call this dataset the ‘Hip2 as-is’ toy IAD. The top-right panel of Fig. 6 shows the distribution of z-scores for the ‘Hip2 as-is’ toy IAD. The width is 1, showing nearly perfect agreement with the desired normal distribution.

An along-scan error of 0.86 mas is an underestimate. What has occurred is that structure has been removed from the residuals, and the along-scan errors have been deflated to compensate. But it is nearly impossible to know how much structure has been removed without a comparison dataset: a toy ‘Gaia EDR3’ to match the toy ‘Hip2 as-is’ IAD.

For all three toy datasets (‘as-is’, ‘recalibrated’, and ‘perfect’), we perform a one-parameter astrometric fit to each star: fitting for the mean of the residuals. We then examine the statistics of the distribution of the fitted parameter for all 30 000 stars. Remember that the toy dataset (prior to overfitting) was constructed so that each star’s position was scattered around 0. So our toy ‘Gaia EDR3’ astrometric parameters are simple: positions of 0 mas for every star.

The z-scores produced by fitting astrometric parameters to the toy IAD are nearly identical to those seen in the real data in the bottom panels of Fig. 6. For the as-is toy data, the distribution of parameter z-scores is overdispersed. The scatter of the fitted parameters is nearly 0.1 mas (equalling the expected 1 mas/N), but the inferred formal error on each parameter is about 0.086 mas. So fitting parameters to the toy ‘Hip2 as-is’ IAD ostensibly yields ‘improved’ formal errors that are better by roughly 12 per cent. Yet, because of toy ‘Gaia EDR3’ we know that there is no real improvement: the scatter relative to this comparison data is not any better than 0.1 mas.

Recall that the toy ‘Hip2 recalibrated’ IAD has identical residuals as the toy ‘Hip2 as-is’ IAD. Yet the former has 1 mas along-scan errors, which are inflated relative to the as-is data. Fitting astrometric parameters to the toy ‘Hip2 recalibrated’ IAD gives formal uncertainties that match the actual 0.1 mas scatter observed about zero. The distribution of z-scores is accurately Gaussian, just like with the real recalibrated Hip2 data.

An equivalent way to interpret this result is: One can recalibrate and fix the toy ‘as-is’ data by inflating each along-scan error with an additional dispersion of 0.5 mas. This brings the total along-scan error back to the correct value of 1 mas. As we showed in Sections 3 and 4, solving for the correct additional dispersion is straightforward if one has access to a very precise comparison dataset of parameters like Gaia EDR3. We argue that this toy model has provided understanding of why our recalibration on actual Hip2 data was successful.

6 THE RECALIBRATED IAD IN THE CONTEXT OF ORBIT FITTING

We have so far reserved our discussion to the benefits of the recalibration to the IAD as a whole. The recalibrated Hip2 data provide better agreement with Gaia EDR3, but they also inform and prove useful for individual planetary systems, in particular, the orbits of stars with known substellar companions. For any planetary system, one way to constrain the orbits of the constituents is to fit directly the reflex motion of the star to the on-sky positions given by Hipparcos – as in Zucker & Mazeh (2001), Sozzetti & Desidera (2010), Reffert & Quirrenbach (2011), Sahlmann et al. (2011), Snellen & Brown (2018), and Nowak et al. (2020). One can reconstruct the on-sky positions by adding the residuals to the best-fitting skypath.

The on-sky positions may be statistically well-behaved for stars where the errors are not severely underestimated. These are stars whose average along-scan error is significantly larger than 2.25 mas, mainly those fainter than Hp ≈ 9. However, for bright stars like 51 Eri (along-scan errors ≈ 1 mas) or β Pic (along-scan errors ≈ 0.8 mas) with very large differences between as-is and recalibrated formal errors, these on-sky positions are poorly behaved statistically. Caution should be heeded in these cases. We quantitatively illustrate why with 51 Eri. Table 1 shows the first five rows of the Hip2 as-is IAD and the Hip2 recalibrated IAD. On the right of the dividing vertical line is the IAD as-is published with the Java Tool (see footnote 1), and on the left is the IAD after recalibration. IORB, EPOCH, RES, and SRES are the orbit number, the time in years from 1991.25, the residual, and the along-scan error, respectively. Many columns have been omitted for clarity because, like IORB and EPOCH, they do not change with recalibration. The table highlights how the errors of the on-sky positions of 51 Eri, after recalibration, are roughly 2.5 times larger than the as-is data. The best-fitting sky-path to 51 Eri after recalibration therefore has a much smaller χ2 of only 24 – a factor of five smaller than what is expected from 104 observations and five free parameters. If used alongside other sources of data (like radial velocities), it correspondingly exerts much less influence on an orbital fit than with the IAD as-published. Such a potentially hazardous situation is common because using Hipparcos and Gaia astrometry, alongside radial velocities and/or direct-imaging, is the norm for breaking the mass-inclination degeneracy of giant-planet orbits.

Table 1.

The first five entries of the Hip2 IAD (recalibrated and as-is) for 51 Eri (Hip 21547).

IORBEPOCHRES (recal)SRES (recal)RES (as-is)SRES (as-is)
167−1.20320.502.420.110.88
167−1.20321.582.421.200.88
208−1.1533−0.972.43−1.010.91
208−1.15330.062.480.021.04
208−1.1533−0.872.44−0.910.93
IORBEPOCHRES (recal)SRES (recal)RES (as-is)SRES (as-is)
167−1.20320.502.420.110.88
167−1.20321.582.421.200.88
208−1.1533−0.972.43−1.010.91
208−1.15330.062.480.021.04
208−1.1533−0.872.44−0.910.93
Table 1.

The first five entries of the Hip2 IAD (recalibrated and as-is) for 51 Eri (Hip 21547).

IORBEPOCHRES (recal)SRES (recal)RES (as-is)SRES (as-is)
167−1.20320.502.420.110.88
167−1.20321.582.421.200.88
208−1.1533−0.972.43−1.010.91
208−1.15330.062.480.021.04
208−1.1533−0.872.44−0.910.93
IORBEPOCHRES (recal)SRES (recal)RES (as-is)SRES (as-is)
167−1.20320.502.420.110.88
167−1.20321.582.421.200.88
208−1.1533−0.972.43−1.010.91
208−1.15330.062.480.021.04
208−1.1533−0.872.44−0.910.93

Beyond the effects of underestimated uncertainties, the use of the IAD residuals themselves remains problematic for orbit fitting. Hip2 used observed stellar positions to reconstruct the satellite attitude. A physical perturbation to the stellar positions will, in turn, perturb the attitude correction. Real stellar motion residuals cannot be distinguished from noise and will be suppressed as a byproduct of correcting the satellite attitude. A similar effect is well-known in high-contrast imaging, resulting in substantial suppression of planet light as a byproduct of suppressing starlight using post-processing algorithms (Lafrenière et al. 2007; Soummer, Pueyo & Larkin 2012). Correcting for this bias can be done by reprocessing the data with synthetic sources added (Lafrenière et al. 2007; Marois, Macintosh & Véran 2010) or by forward-modelling the effects of post-processing (Brandt et al. 2013; Pueyo 2016). In the context of Hipparcos, one could locally re-calibrate the along-scan attitude (see Section 4.1) using extrapolated Gaia positions of single well-behaved HG stars. This exceeds the scope of this work. We are left with a bias whose sign is clear (residuals from the best-fitting skypath will be suppressed) but whose magnitude is uncertain.

Overall, this implies important caveats when constraining the orbits of giant planets with Hip2. For the brightest (most precise) sources in Hipparcos, their errors are really a factor of two to three worse than reported in the published Hip2 catalogue – reducing their utility. Moreover, their on-sky positions, even if one wanted to fit to them, are potentially affected by a partial subtraction of real residuals in the skypath due to overfitting. There is, however, a clear way forward: fitting to recalibrated astrometric parameters9 and not the individual positions implied by the IAD. Indeed, by combining error-inflated Hip2 astrometric parameters with Gaia EDR3, Dupuy, Brandt & Brandt (2022) derived the strongest mass upper limit for 51 Eri b. If one fits to the recalibrated astrometric parameters instead, then the left-hand column of Fig. 2 proves that the weighting is correct. The complicated magnitude-dependent overfitting fraction of Fig. 5 can be safely ignored. But, fitting to the astrometric parameters comes with the unfortunate drawback that nearly all of the information about short-period planets (periods less than the Hipparcos 3.5-yr baseline) is lost. However, for long-period planets, fitting to astrometric parameters offers the same constraining power as fitting to the on-sky positions themselves.

7 A MERGER OF HIP1 AND HIP2

We now address one final question: If the Hip2 recalibrated astrometric parameters are well behaved, is there a way to combine Hip1 and Hip2 (recalibrated) to yield a final set that is superior to either individually? This is particularly useful for the problem of fitting orbits of massive companions around stars. One would ideally want to fit to a single, final, set of merged astrometry. We investigate two ways to create this final merged reduction.

7.1 An attempted merger between NDAC, FAST, and Hip2 at the level of the IAD

For Hip1, two independent Hipparcos IAD were merged, and the result of that merger was fitted to produce the Hipparcos 1 catalogue. The two IAD were the NDAC and FAST reductions by the NDAC and FAST data consortia, respectively (Bernstein 1994; ESA 1997; Perryman et al. 1997).10 So, can we merge the three IAD from Hip2 (recal), NDAC, and FAST, to produce a final set of Hipparcos IAD, in the same way that NDAC and FAST were merged together to produce the first Hipparcos IAD? We refer to this way of merging as ‘a merger at the level of the IAD’.

We treat NDAC, FAST, and Hip2 as three independent yet correlated data reductions. We show in this section that a merger at the level of the IAD of all three reductions, because of the overfitting in Hip2, results in residuals whose statistical properties are polluted by overfitting in the same way as Hip2. We found that the level of overfitting was less, because of dilution from the less-overfit NDAC and FAST. But overfitting is still present. We conclude that a three-way merger does not yield useful IAD, and so such a merger is not useful. We now detail the methods of a three-way merger and describe the null result.

We start by formatting the Hip2 IAD so that each observation matches up with each NDAC and FAST observation from Hip1. Within a single data reduction, each star is comprised of many observations, each marked by a time. In a given Hip1 IAD file, NDAC, and FAST reported at most one observation per time (per-star). However, for Hip2 there are usually two to four observations at any one time stamp (see e.g. the IAD of 51 Eri in Table 1). Hip 27321 has 111 observations in the Hip2 IAD, while there are roughly 33 observations for NDAC and FAST each in Hip1. Before attempting the merger, we average together the Hip2 data at a single time (per-star) with inverse variance weighting. This yields IAD that are equivalent when fitted, yet with only one observation per time instead of two to four. We performed this averaging and verified that the resulting astrometric parameters fitted to these IAD are identical to the published catalogue. Meaning, the averaging of Hip2 data occurring at identical time-stamps (per-star) makes no difference, yet it crucially allows us to perform a statistical merger per-orbit with NDAC and FAST.

For a single observation of a star, we now have at-most three residuals: one from NDAC, one from FAST, and one from Hipparcos 2. For Hip 27321, we now have about 99 observations: one from Hip2, NDAC, and FAST for every time, across 33 different times. A merger at the level of IAD means combining, at each time, those three residuals together according to the Best-Linear-Unbiased-Estimator (BLUE) weights (see Brandt et al. 2021, or section 3.2 of van Leeuwen & Evans 1998). Such weights are calculated uniquely from the covariance matrix for the three observations. In the covariance matrix, we denote NDAC as N, FAST by F, and Hip2 by the letter V. Without loss of generality, we order the matrix such that the indices 1, 2, and 3 refer to NDAC (N), FAST (F), and Hip2 (V), respectively. For example, C1,2 = ρNFσNσF, and C1,3 = ρNVσNσV, where ρ denotes the correlation coefficient and σ the along-scan error for that observation. The BLUE weights are

(11)

with the three weights w0, w1, and w2 for combining the three residuals, the along-scan error of the merged residual is

(12)

So merging the NDAC, FAST, and Hip2 residuals is straightforward given the covariance matrix C for every observation (and its inverse C−1). However, we do not know all the elements of this matrix.

We know the along-scan errors and thus the diagonal elements. The errors are available from the IAD for every observation and reduction. For Hip2, we use the recalibrated along-scan errors. ρNF is supplied with the Hip1 IAD, and usually falls between 0.2 and 0.8 (in Appendix  A, we performed a consistency check of the ρNF and find it appropriate to use these values as provided). Unfortunately, we are missing two of the three correlation coefficients: ρVF and ρVN. So to perform the merger, we need to solve for those two unknown correlation coefficients.

Easiest is to calculate those coefficients empirically from the residuals themselves. That trivial calculation yields:

(13)
(14)

We can merge the vast majority of orbits using the provided NDAC-FAST correlation coefficient (ρNF) and 0.69, 0.66 for ρVF and ρVN. However, for some values of ρNF, the triplet of correlations is disallowed by the Cauchy–Schwarz (CS) inequality and so the merger is not possible. This will be especially relevant if we want to try other values of ρVF. The CS inequality implies that given two correlation coefficients (ρNF and ρVN) then the third (ρVF) must lie within the range

(15)
(16)
(17)

For problematic orbits, one can set ρVF to the maximal or minimal value, or any value in-between. We characterize this range as follows, using a CS magnitude, dubbed ACS, that ranges between −1 and 1.

(18)

One solution for problematic orbits is to replace one of the correlation coefficients with the result of equation (18) (for some value of ACS). Replacing problematic orbits however is just one way to merge the three reductions. This method also relies on trusting the empirical correlation coefficients we calculated. Because we have resolved the issue of disallowed triplets, we could replace the empirical estimates with blind guesses. We outline below the three most reasonable algorithms to find these correlation coefficients. They are ordered in terms of computational complexity, from least to most. Or equivalently, ordered by how much we trust our empirical estimates of ρVF and ρVN, from most to least.

  1. Use the empirical values ρVF = 0.69 and ρVN = 0.66. In the rare edge case where the correlations form a triplet that is incompatible with the CS-inequality, replace ρVF with equation (18) for some global value of ACS. Try full mergers with all possible values of ACS (i.e. varying between −1 and 1), and take the merger with the best statistical properties.

  2. Fix only one of the two unknown correlation coefficients to the empirical value. For example, ρVN = 0.66. Then solve for ρVF with equation (18) for some global value of ACS. Try full mergers with all possible values of ACS, and take the merger with the best statistical properties.

  3. Trust only the Hip1 provided ρNF. Then, let ρVN vary over all possible values (−1 to 1). For each trial value of ρVN, solve for ρVF with equation (18). Try full mergers with all possible values of ACS and ρVN. Note that this requires evaluating a grid of mergers. For example, 900 mergers are required for 30 values of ACS and 30 of ρVN.

We implemented all three methods above. For all constructions, the final set of merged data was polluted by overfitting in the same way as Hip2, meaning that the merger yielded residuals with undesirable statistical properties. The amount of overfitting varied but was never reduced to a satisfactory level.

Although we went through great pains to exhaust the three reasonable options outlined, this conclusion is no surprise. As described in Section 5.2, when residuals are overfit, real signal is removed and cannot be regained. The underlying model (skypath) may be improved, but those residuals are only useful relative to that model (Hip2) and not necessarily relative to another model (e.g. Hip1). We conclude that it is presently impossible to merge Hip1 and Hip2 together at the level of the intermediate data and individual orbits, in a way that yields new IAD with the desired statistical properties.

A second, albeit higher level, way of merging Hip1 and Hip2 turned out to be viable and trivial: combining the reductions at the level of the parameters. This does not result in a new set of IAD, but it does result in an improved set of fitted astrometric parameters. We detail these efforts and the resulting merger in the following subsection.

7.2 A weighted combination of Hip1 and recalibrated Hip2

Brandt (2018) showed that a weighted combination of Hip1 and Hip2 yields a catalogue that is better than either reduction on their own. We perform a weighted combination of Hip1 and Hip2 just as in the HGCA (Brandt 2018, 2021), except that we use the new, recalibrated Hip2 data. The astrometric parameters (parallax and proper motions) for a star are the weighted combination of the parameters Hip1 and the recalibrated Hip2. So μαmerged=(1f)μαHip1+fμαHip2,recal, with f being the weight applied to Hip2. We do a simple grid search for the weight that maximizes agreement with Gaia EDR3 proper motions.

The result of that minimization is f = 0.63. The left-hand panel of Fig. 7 shows the resulting distribution of proper motion residuals, μαmergedμαGaiaEDR3 and μδmergedμδGaiaEDR3, binned together. We display the raw parameter differences, not normalizing by standard error. This way, it is easy to see by eye that the 37 per cent/63 per cent weighting of Hip1/Hip2 recalibrated (black) is more performant than either reduction on their own. The combination is 5 per cent better, on average, than Hip2 recalibrated on its own (green); and 11 per cent better than Hip1 on its own (red). This improvement is highly statistically significant, as discussed by Brandt (2018).

The distributions of the differences between Hipparcos–Gaia EDR3 long-baseline proper motions and Hipparcos proper motions, for variations of mixing the Hipparcos 1997 and 2007 catalogues. The vertical axis is probability density. Both right ascension and declination proper motions are binned together. In the left-hand panel, we show these differences using the original Perryman et al. (1997) Hipparcos catalogue (labelled Hip1), the recalibrated version of the van Leeuwen (2007b) Hipparcos catalogue (labelled Hip2 recal), and a mixture of the two catalogues at the level of the best-fitting parameters. The mixture of the two catalogues has lower residuals than either catalogue on its own. In the legend, σ is the standard deviation of the residuals for all points within the plot’s limits. The right-hand panel shows the same comparisons, but using the original Hip2 catalogue (not recalibrated). This figure is in the same style as fig. 2 in Brandt (2018), where this comparison was made for Gaia DR2. The right-hand side here slightly differs from fig. 2 in Brandt (2018), because of the slightly different subset of Hipparcos sources used.
Figure 7

The distributions of the differences between HipparcosGaia EDR3 long-baseline proper motions and Hipparcos proper motions, for variations of mixing the Hipparcos 1997 and 2007 catalogues. The vertical axis is probability density. Both right ascension and declination proper motions are binned together. In the left-hand panel, we show these differences using the original Perryman et al. (1997) Hipparcos catalogue (labelled Hip1), the recalibrated version of the van Leeuwen (2007b) Hipparcos catalogue (labelled Hip2 recal), and a mixture of the two catalogues at the level of the best-fitting parameters. The mixture of the two catalogues has lower residuals than either catalogue on its own. In the legend, σ is the standard deviation of the residuals for all points within the plot’s limits. The right-hand panel shows the same comparisons, but using the original Hip2 catalogue (not recalibrated). This figure is in the same style as fig. 2 in Brandt (2018), where this comparison was made for Gaia DR2. The right-hand side here slightly differs from fig. 2 in Brandt (2018), because of the slightly different subset of Hipparcos sources used.

For comparison, we repeat the exercise with Hip2 as-is – just as was done in Brandt (2018). The right-hand panel of Fig. 7 shows the result of the mixture of Hip1 and Hip2. We find, just as Brandt (2018) found, that the optimal mixture is 40/60 Hip1/Hip2 as-is. The slightly smaller weight on Hip2 (when using the as-is reduction) is expected. Given a mixture of Hip2 and Hip1, more weight should be applied to Hip2 recalibrated because it is improved over Hip2 as-is.

The Hip1/Hip2 recalibrated mixture is barely better (only 0.2 per cent better) than a Hip1/Hip2 as-is mixture. This is a direct consequence of Hip2 (recal) being 0.5 per cent better than Hip2 as-is, where one would roughly expect an improvement of f(0.5 per cent)=0.63(0.5 per cent)=0.3 per cent, close to the 0.2 per cent found. This yields an important and highly convenient corollary: Previous works that mix Hip1/Hip2 40/60 with inflated parameter errors need not take into account this new recalibration because the improvement (on average) is not significant. It is possible though that Hip1/Hip2 recalibrated is superior for some sources and use cases. It is possible as well, that for some sources Hip2 recal (alone) is the optimal choice.

8 CONCLUSIONS

We showed that Hipparcos 2007 is improved by including an offset to the IAD residuals and an additional dispersion in quadrature to the along-scan errors. The best residual offset is +0.140 mas and the best additional dispersion is 2.25 mas. This yields a recalibrated IAD, and fitting skypaths to those recalibrated IAD yields recalibrated astrometric parameters. The recalibration of Hip2 can easily be done on-the-fly by adding those best-fitting values to and refitting the IAD. Instead of producing an entirely new catalogue and recalibrated IAD, we provide a python interface for users to easily recalibrate the data on-demand. We have updated htof with the new capability to recalibrate the IAD. htof can apply the residual offset and additional dispersion to the IAD, perform a refit to those data, and write to file the new best-fitting parameters and the recalibrated IAD. This operation takes negligible time to complete. We encourage the reader to consult the recalibration example jupyter notebook.11 The recalibrated astrometric parameters have better agreement with Gaia EDR3 astrometric parameters. This improvement is significant at roughly 60 Gaussian sigma.

We revealed two important caveats with using Hipparcos 2, both with the recalibrated and 2007 as-is data. These are:

  1. The individual position measurements (i.e. the residuals) show strong evidence for being overfit by 22 per cent on average (Fig. 6), and the overfitting is more severe for brighter stars. The satellite attitude reconstruction in Hip2 accounts for about half of the overfitting.

  2. Because the individual position measurements of Hip2 are overfit, one should exercise caution when fitting an orbit directly to those positions.

We offer recommendations for the major use case of fitting orbits to the IAD (in order from most preferred to least):

  1. We recommend fitting orbits to the astrometric parameters and proper motion anomalies provided by the HGCA, which uses a mixture of Hip1/Hip2. We showed that a mixture of Hip1/Hip2 recalibrated is more or less identical to Hip1/Hip2 as-is, and so the HGCA is in line with the improvements to Hip2 presented here.

  2. In cases where one finds they do not want to use the HGCA, then the next best option is to use the recalibrated Hip2 astrometric parameters.

  3. If one needs to fit to the on-sky positions directly (i.e. the IAD residuals + the best-fitting skypath), then take caution. Check the statistical properties of the recalibrated residuals and errors for that particular source, or consider using the Hip1 (which is less overfit than Hip2) residuals and skypath. One should perform the same statistical checks to Hip1.

Following our recommendations means that one will be least likely to bias the inferred parameters of companions in an orbital fit. There exist stars where fitting the on-sky positions may be okay. That is the case if the recalibrated residuals happen to be statistically well behaved for that source.

As with all data and all recommendations, there is no replacement for individual assessment of a data’s validity on a case-by-case basis (e.g. Snellen & Brown 2018 for β Pic). The Hipparcos 2007 recalibration, in addition to its statistical improvements, can serve as a valuable tool in that assessment.

Acknowledgement

GMB is supported by the National Science Foundation (NSF) Graduate Research Fellowship under grant no. 1650114. DM gratefully acknowledges support from the European Space Agency (ESA) Research Fellow programme. This work has made use of data from the ESA mission Gaia (https://www.cosmos.esa.int/Gaia), processed by the Gaia Data Processing and Analysis Consortium (DPAC; https://www.cosmos.esa.int/web/Gaia/dpac/consortium). Funding for the DPAC has been provided by national institutions, in particular the institutions participating in the Gaia Multilateral Agreement.

This work made use of astropy (Astropy Collaboration 2013; Price-Whelan et al. 2018), scipy (Virtanen et al. 2020), numpy (Oliphant 2006; van der Walt, Colbert & Varoquaux 2011), pandas (McKinney 2010; The Pandas Development Team 2020), and htof (Brandt et al. 2021; Brandt & Michalik 2022).

DATA AVAILABILITY

The data underlying this article are available at https://www.cosmos.esa.int/web/hipparcos/interactive-data-access for Hipparcos, hosted and publicly available via the European Space Agency. The Gaia data are available at https://gea.esac.esa.int/archive/, and the HipparcosGaia Catalogue of Accelerations is available on Vizier (https://vizier.cfa.harvard.edu/), J/ApJS/254/42.

Footnotes

1

For Hip2 as-is, we use the IAD recently published for the 2014 Java Tool. But these IAD are, for the vast majority of stars, nearly identical to the IAD published in 2007. See comments on this in Brandt et al. (2021).

2

Code available by request.

3

α includes the standard cos δ factor. It is often denoted α*.

4

The 3800 points of improvement is with respect to the parameter errors after inflating the Hipparcos IAD by 2.25 mas. Adopting the Hip2 as-is catalogue errors results in a much larger improvement of 6600 points of χ2.

5

The long-baseline proper motions are sometimes referred to as Δα/Δt in right ascension and Δδ/Δt in declination, e.g. in Brandt 2018.

6

The best-fitting trend is r(j)/(mas) = a + bj + cj2 where j is the orbit number index and a = 5.569 · 10−1, b = −7.539 · 10−4, and c = 2.719 · 10−7.

7

One would expect the 2-4 scans within a single orbit to be correlated, but for the construction of the Hip2 as-is catalogue they were assumed to be uncorrelated. Thus, the overfitting fraction expected is 4 per cent.

8

Or, slightly more than 0.86 mas if a final distribution width of 0.96 was desired.

9

The new astrometric parameters for any star are retrievable from the header of its recalibrated data file produced by htof.

10

Note that the merged Hipparcos 1 IAD were not published; one must reconstruct it on their own. But, merging is uncomplicated and we detail it in Section 7.1.

References

Astropy Collaboration
,
2013
,
A&A
,
558
,
A33

Bernstein
H. H.
,
1994
,
A&A
,
283
,
293

Brandt
T. D.
,
2018
,
ApJS
,
239
,
31

Brandt
T. D.
,
2021
,
ApJS
,
254
,
42

Brandt
T. D.
et al. ,
2013
,
ApJ
,
764
,
183

Brandt
G. M.
,
Michalik
D.
,
Brandt
T. D.
,
Li
Y.
,
Dupuy
T. J.
,
Zeng
Y.
,
2021
,
AJ
,
162
,
230

Brandt
G. M.
,
Michalik
D.
,
Brandt
T. D.
,
Hung
G.
,
Li
Y.
,
Andreason
T.
,
Dupuy
T. J.
,
Zeng
Y.
,
2022
,
Htof
.
Zenodo
, available at: https://doi.org/10.5281/zenodo.6789233

Dupuy
T. J.
,
Brandt
G. M.
,
Brandt
T. D.
,
2022
,
MNRAS
,
509
,
4411

ESA
,
1997
,
The HIPPARCOS and TYCHO catalogues. Astrometric and photometric star catalogues derived from the ESA HIPPARCOS Space Astrometry Mission
.
ESA Publications Division
,
the Netherlands

Gould
A.
,
Kollmeier
J. A.
,
Sesar
B.
,
2016
,
preprint
()

Lafrenière
D.
,
Marois
C.
,
Doyon
R.
,
Nadeau
D.
,
Artigau
É.
,
2007
,
ApJ
,
660
,
770

McKinney
W.
,
2010
,
Proc. 9th Python Sci. Conf., Data Structures for Statistical Computing in Python
. p.
56

Makarov
V. V.
,
2002
,
AJ
,
124
,
3299

Makarov
V. V.
,
2022
,
Rev. Mex. Astron. Astrofis.
,
58
,
315

Marois
C.
,
Macintosh
B.
,
Véran
J.-P.
,
2010
,
Proc. SPIE Conf. Ser. Vol. 7736, Adaptive Optics Systems II
.
SPIE
,
Bellingham
, p.
77361J

Michalik
D.
,
Lindegren
L.
,
Hobbs
D.
,
Lammers
U.
,
2014
,
A&A
,
571
,
A85

Nowak
M.
et al. ,
2020
,
A&A
,
642
,
L2

Oliphant
T.
,
2006
,
NumPy: A Guide to NumPy
.
Trelgol Publishing
,
USA

Perryman
M. A. C.
et al. ,
1997
,
A&A
,
500
,
501

Price-Whelan
A. M.
et al. ,
2018
,
AJ
,
156
,
123

Pueyo
L.
,
2016
,
ApJ
,
824
,
117

Reffert
S.
,
Quirrenbach
A.
,
2011
,
A&A
,
527
,
A140

Sahlmann
J.
et al. ,
2011
,
A&A
,
525
,
A95

Snellen
I. A. G.
,
Brown
A. G. A.
,
2018
,
Nature Astron.
,
2
,
883

Soummer
R.
,
Pueyo
L.
,
Larkin
J.
,
2012
,
ApJ
,
755
,
L28

Sozzetti
A.
,
Desidera
S.
,
2010
,
A&A
,
509
,
A103

The Pandas Development Team
,
2020
,
pandas-dev/pandas
.
Pandas

van der Walt
S.
,
Colbert
S. C.
,
Varoquaux
G.
,
2011
,
Comput. Sci. Eng.
,
13
,
22

van Leeuwen
F.
,
2007a
,
Astrophysics and Space Science Library, Vol. 350, Hipparcos, the New Reduction of the Raw Data
.
Springer
,
Berlin

van Leeuwen
F.
,
2007b
,
A&A
,
474
,
653

van Leeuwen
F.
,
Evans
D. W.
,
1998
,
A&AS
,
130
,
157

van Leeuwen
F.
,
Fantino
E.
,
2005
,
A&A
,
439
,
791

Virtanen
P.
et al. ,
2020
,
Nature Methods
,
17
,
261

Zacharias
N.
,
Makarov
V. V.
,
Finch
C. T.
,
Harris
H. C.
,
Munn
J. A.
,
Subasavage
J. P.
,
2022
,
AJ
,
164
,
36

Zucker
S.
,
Mazeh
T.
,
2001
,
ApJ
,
562
,
549

APPENDIX A: VERIFYING THE HIP1 NDAC AND FAST CORRELATION COEFFICIENTS

The correlation between each NDAC and FAST residual pair are given in the Hip1 IAD. These correlations can be verified in the following simple way. For any given observation pair, the NDAC-FAST covariance matrix defines a transformation that when applied to that observation pair should decorrelate the two observations. If it does not decorrelate the two observations (on average, over all the samples in Hip1), then the correlation coefficients are on average incorrect.

The transformation matrix is that which diagonalizes the covariance matrix. We calculate this transformation matrix for all observation pairs in the sample used in this paper (using the covariance matrices provided in the IAD); apply each transformation to each one of the millions of observation pairs; and then verify that the ‘decorrelated’ NDAC-FAST observations are indeed decorrelated. And indeed, the result is a success. The decorrelated NDAC/FAST residuals form a 2D Gaussian, demonstrating that on average the correlation coefficients are correctly estimated.

One important caveat though is that of potential skew lying beneath the noise of Hip1. However, one can use Gaia to reduce the scatter in the Hip1 residuals and therefore perform a more precise check. Recalibration of the residuals with Gaia may reveal a skew in the decorrelated dataset. The recalibration procedure is as follows:

  1. Reduce the sample of stars to those that are not accelerating (i.e. the same sample we use in this work).

  2. Refit the positions only of each of the stars in that sample, whilst fixing the proper motions and parallaxes to the HGCA values.

In Fig. A1 we show the NDAC and FAST residuals after such a recalibration. In the top panel, they are plotted before decorrelating. We have converted the residuals to z-scores by dividing by the standard error. The positive correlation between the two consortia’s residuals is apparent. In the bottom panel, we show the result of decorrelating each pair of observations using the correlation coefficients as provided in the IAD. If the correlation coefficients from the IAD were on average under or overestimated, then the decorrelation would be unsuccessful and the 2D Gaussian seen in the bottom panel would be skewed. However, we find the resulting Gaussian to be nicely uncorrelated and symmetric. This demonstrates that the Hip1 correlation coefficients on average are trustworthy and are appropriate to use as provided.

Top panel: The Hip1 NDAC and FAST residuals plotted against eachother. Bottom panel: The same residuals in the top panel, but each pair has been transformed to the decorrelated basis using the correlation coefficients provided by the IAD. A beautifully symmetric 2D Gaussian is recovered. Note that the bullseye is nearly the same (very symmetric) even if the as-published IAD residuals are used instead of the Gaia recalibrated ones. Two red contours (at arbitrary levels) are shown to emphasis the symmetry.
Figure A1

Top panel: The Hip1 NDAC and FAST residuals plotted against eachother. Bottom panel: The same residuals in the top panel, but each pair has been transformed to the decorrelated basis using the correlation coefficients provided by the IAD. A beautifully symmetric 2D Gaussian is recovered. Note that the bullseye is nearly the same (very symmetric) even if the as-published IAD residuals are used instead of the Gaia recalibrated ones. Two red contours (at arbitrary levels) are shown to emphasis the symmetry.

We note that qualitatively Fig. A1 is unchanged (decorrelation is successful) if one uses Hip1 as-is.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.