ABSTRACT

The High angular resolution monolithic optical and near-infrared integral field spectrograph (HARMONI) is the first-light visible and near-infrared integral field spectrograph for the Extremely Large Telescope. To reach the diffraction limit of the telescope (≈ 10 mas) and maintain an optimal image quality over long exposures, an accurate measurement of the geometrical distortions in the instrument’s guide star field is needed. Geometrical distortions affecting the guide stars map directly to pointing errors of the science field. The systematic contribution to the pointing error can be calibrated and removed by a corrective model. In this work, we propose a formulation of the corrective model that aims to calibrate the geometrical field distortions down to a given target residual while reducing the time spent in calibrations. We also propose a calibration procedure that accounts for the uncertainties of the measurement process. We developed a tool named harmoni-pm to simulate the expected pointing error caused by geometrical distortions and to validate the effectiveness of the proposed corrective model. We relied on pseudo Zernike polynomials to formulate the model, and on the Bayesian theoretical framework to study the propagation of uncertainties along the calibration process. Compared with the classical calibration algorithm, the Bayesian calibration algorithm reduced the number of calibration points required to reach the same model residual. Additionally, we were able to optimize the hardware of the Geometrical Calibration Unit and reduce the time required to achieve the calibration goal.

1. INTRODUCTION

The Extremely Large Telescope (ELT) is a visible to near-infrared (NIR) segmented telescope, currently under construction by the European Southern Observatory. At the time of its completion (expected near 2028), its 39-m primary mirror will give access to a previously unexplored resolution regime of 10 mas near the diffraction limit.

The High angular resolution monolithic optical and near-infrared integral field spectrograph (HARMONI) will be one of the first-generation instruments of the ELT, providing integral field spectroscopy with a variety of adaptive optics modes that enable diffraction-limited observations. HARMONI is conceived as a workhorse instrument and, as such, it must enable a wide range of science cases, ranging from the high-redshift universe to Solar system bodies (Thatte et al. 2021).

The performance of the instrument in the different science cases was validated by means of HARMONI’s pipeline simulator HSIM (Zieleniewski et al. 2015; Pereira-Santaella et al. 2023). Simulations included the detection and classification of high-redshift Type Ia supernovae (Bounissou et al. 2018), observability studies of Population III stars (Grisdale et al. 2021), gas kinematic studies of high-redshift galaxies (Richardson et al. 2020), multiwavelength observations of the circumgalactic medium emission (Augustin et al. 2019), supermassive black hole mass measurements (Nguyen et al. 2023), and determination of morpho-kinematics and star-formation rates of cosmic noon active galactic nuclei (García-Lorenzo et al. 2022) and other high-redshift galaxies (Kendrew et al. 2016; Grisdale et al. 2022).

While adaptive optics is necessary to achieve the diffraction limit, an equally accurate pointing is also critical. This represents a technical challenge not only from the opto-mechanical perspective, but also from the calibration process itself: an instrumental, quasi-systematic pointing error is expected, and must be removed up to a residual comparable to the diffraction limit. This is achieved by means of a corrective model, which predicts the quasi-systematic pointing error in all points of the instrument’s focal plane.

In this paper, we contribute to the development of the corrective model of HARMONI in three ways. First, we introduce the formulation of a general corrective model as a linear combination of complex basis functions on the focal plane, along with a classical calibration strategy (Section 4.2). Next, we provide a conjugate Bayesian calibration algorithm that, in addition to being incremental and not requiring Monte-Carlo sampling, has the potential of reducing the number of measurements needed to reach the calibration goal (Section 4.3). We continue by exploring its practical application to HARMONI field distortions using pseudo Zernike polynomials (PZPs) as basis functions on the unit disc, and propose a geometrical calibration unit (GCU) mask pattern that reduces the impact of measurement noise in calibrations (Section 6). Finally, we discuss its potential application to the pointing model of a telescope, expressed as a multipole expansion of the pointing error (Section 8).

2. HARMONI

HARMONI is a slicer-based integral field spectrograph designed to operate in the 0.46- to 2.45-μm range (3500 < R < 17 000) in a broad variety of scientific programs (Thatte et al. 2022). Its different spectral set-ups will enable observations in the NIR bands, including J (1.05–1.32 μm), H (1.44–1.82 μm), K (1.95–2.47 μm), and portions thereof. It will enable both adaptive optics (AO) [including laser-tomography adaptive optics (LTAO), single-conjugate adaptive optics (SCAO), and high-contrast adaptive optics (HCAO)] and non-AO observations, with fields of view (FoV) of 9 arcsec × 6 arcsec, 4 arcsec × 3 arcsec, 2 arcsec × 1.5 arcsec, and 0.8 arcsec × 0.6 arcsec. In its highest resolution configuration, it will be able to resolve objects around 10 mas, with a sampling of 4 mas per spaxel over a 0.8 arcsec × 0.6 arcsec FoV.

One of the design features of HARMONI is that guiding does not take place on the science target in any AO mode other than HCAO (Thatte et al. 2022). Instead, the pointing of the science field in HARMONI is controlled by a guide probe measuring the position of a natural guide star (NGS). The basic control loop sends feedback to the telescope to ensure that the guide star is always centred on the guide probe. Any unknown term in the apparent distance between the NGS and the science target will therefore appear as a pointing error on the science target. If the error changes on the time-scale of an observation (effectively up to 1 h), it will also impact the image quality. This error would manifest as a dragging of the science object during the observation.

Of particular concern in HARMONI is optical distortion introduced by the Focal Plane Relay Subsystem (FPRS), as the fundamental architecture of the instrument means that the distortion pattern is seen to rotate as the instrument tracks the field rotation of the telescope. The instrument performance budgets—internal to the project—identify this as a major source of error if not properly characterized by modelling.

Different guide probes are used for different AO modes of HARMONI: a single probe on-axis for HCAO, a single probe within 15 arcsec of the science target for SCAO, a single guide probe within 60 arcsec for non-AO, and two guide probes within 60 arcsec for LTAO (Dohlen et al. 2022). The latter is the most challenging configuration for the geometric distortion, as it is the most demanding combination of spatial resolution and physical patrol field.

Ensuring that the field distortion contribution does not affect the LTAO image quality (≈10 mas) means understanding the model at the level of ≈2 mas over 60 000 mas; i.e. 1 part in 30 000.

If we understand the guide probes as measurement devices of the positions of stars, we should expect them to be affected by random and quasi-systematic error contributions. While the former fall under the category of measurement noise and are—in internal project budgets—much smaller than the pointing accuracy, the latter are caused by slowly varying (≫exposure time) opto-mechanical effects (e.g. optical distortions, mechanical non-linearities, etc.) and are much larger than the measurement noise. The overall instrument control concept therefore relies on a corrective pointing model to compensate for long-term quasi-systematic effects.

The calibration of the corrective model relies on HARMONI’s GCU. The GCU consists of a deployable mask that can be inserted in the focal plane of the telescope, and provides a pattern of sources whose locations are known beforehand. The difference between the nominal and measured locations is then used as training data for the corrective model.

3. METHODOLOGY

In order to validate the proposals made in this paper, two simulation codes were used: harmoni-pm and bayescal.

harmoni-pm (Carracedo-Carballal 2022) is a pointing error simulator written in Python 3 for HARMONI. It was used to characterize the error distribution of the measurements of the guide probes, identify the main drivers of the instrumental error, validate corrective models, and evaluate calibration patterns. harmoni-pm relies on an exhaustive, extensible instrument model in which contributions to the instrumental pointing error are described as composed |$\mathbb {R}^2\rightarrow \mathbb {R}^2$| transforms. These transforms are applied to the coordinates of the points contained in the instrument focal plane, and each transform encodes the effect of a particular type of opto-mechanical distortion (e.g. relay optics, positioning errors of the GCU mask, etc.).

The transforms are parametrized by random variables, classified by the time at which they are sampled:

  • Manufacturing time. Used to represent manufacturing tolerances (e.g. the size of a particular mechanical element).

  • Calibration. Used to model the repeatability of quantities that may change between two consecutive calibrations of the instrument.

  • Positioning. Used to model quantities that change every time a measurement is performed.

bayescal (Carracedo-Carballal 2023) is a calibration process simulator written after the results of this characterization (Carracedo-Carballal et al. 2022). bayescal runs large batches of simulated calibrations, and it was used to produce the results of Section 7. It was designed to be small and to output easily reproducible results. It features a simplified instrumental model that captures the main traits of the instrumental pointing error as simulated by harmoni-pm.

This simplified instrumental model comprises the contribution of HARMONI’s relay optics, positioning errors of the GCU mask, positioning accuracy of the guide probe, and angular errors of the instrument rotator. High-order distortions of the relay optics were included as a fully toleranced model of the FPRS mirrors. On the other hand, this instrumental model does not take into account systematic, mechanical effects of the guide probes.

4. PROBLEM FORMULATION

Let |$F\subseteq \mathbb {R}^2$| be the set of point coordinates contained in an idealized focal plane of the instrument. The points in this idealized focal plane are related to true sky coordinates by a simple transform involving the plate scale of the telescope. Let F′⊆F be the set of point coordinates that can be actually observed by the instrument’s detector, including the effects of the field distortion introduced by the different opto-mechanical elements in the light path. The instrumental distortion causes the observed locations |$\pmb {x}^{\prime }\in F^{\prime }$| to be displaced with respect to their theoretical locations |$\pmb {x}\in F$| as:

|$\pmb {x}^{\prime } = \pmb {x} + \pmb {\varepsilon }(\pmb {x})$|(1)

The instrumental pointing error |$\pmb {\varepsilon }\in \mathbb {R}^2$| can be seen as a function of |$\pmb {x}$|⁠, with |$\Vert \pmb {\varepsilon }(\pmb {x})\Vert _2$| a quantity much smaller than the focal plane dimensions. Requirements of the instrument impose smoothness conditions on |$\pmb {\varepsilon }(\pmb {x})$| and therefore |$\pmb {\varepsilon }(\pmb {x})\approx \pmb {\varepsilon }(\pmb {x}^{\prime })$| up to an error that is below the instrument’s pointing accuracy.

The goal of the corrective model is, given certain maximum pointing error residual E > 0, to provide certain function |$\tilde{\pmb {\varepsilon }}: F\rightarrow \mathbb {R}^2$| such that |$\Vert \tilde{\pmb {\varepsilon }}(\pmb {x})-\pmb {\varepsilon }(\pmb {x})\Vert _2\lt E,\forall \pmb {x}\in F$|⁠.

While the aforementioned maximum residual requirement applies for all points in F, the model still needs to be calibrated from a finite (and hopefully small) number of calibration points. These calibration points are provided by the GCU. During an instrument’s calibration, the GCU deploys a mask with a pattern of illuminated points in the ELT’s focal plane. The position of these calibration points is assumed to be known with given accuracy.

The measurement devices for this task are the guide probes, which each consist of two mirrors that can be placed anywhere in their corresponding half of the relayed focal plane, and image objects therein. The instrumental pointing error is simply the difference between the expected location of the reference object (in this case, the GCU’s calibration points) and the location that is actually observed by the guide probe.

4.1 The corrective model

We start by representing all two-dimensional (2D) vectors |$\pmb {v}=(x, y)\in \mathbb {R}^2$| as complex quantities |$\hat{v}\in \mathbb {C}$| by means of the bijection:

|$\hat{v} = x + iy$|(2)

where i is the imaginary unit. We therefore refer to the true instrumental pointing error in complex form as |$\hat{\varepsilon }(\pmb {x})$|⁠. If |$\hat{\varepsilon }(\pmb {x})$| allows an asymptotic expansion of certain basis functions |$\hat{G}_j:F\rightarrow \mathbb {C}$| of the form:

|$\hat{\varepsilon }(\pmb {x}) \sim \sum _{j=0}^{\infty }\hat{\alpha }^j\hat{G}_j(\pmb {x})$|(3)

we can formulate the corrective model as a truncated asymptotic expansion of order J that is fully parametrized by its complex coefficients |$\hat{\alpha }^j\in \mathbb {C}$|⁠. A prediction of the pointing error |$\hat{\varepsilon }(\pmb {x})$| at the point |$\pmb {x}\in F$| is obtained by evaluating

|$\hat{\tilde{\varepsilon }}(\pmb {x}) = \sum _{j=0}^{J-1}\hat{\alpha }^j\hat{G}_j(\pmb {x})$|(4)
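As an illustration of the truncated expansion above, the sketch below evaluates a complex linear combination of basis functions at focal-plane points; the three basis functions and coefficient values are hypothetical placeholders, not HARMONI’s actual PZP basis.

```python
import numpy as np

def predict_error(alpha, basis, points):
    """Evaluate the truncated expansion sum_j alpha_j * G_j(x) at each point.

    alpha  : complex coefficients, one per basis function
    basis  : list of callables mapping complex focal-plane coords to complex values
    points : complex focal-plane coordinates (x + iy)
    """
    points = np.asarray(points, dtype=complex)
    return sum(a * g(points) for a, g in zip(alpha, basis))

# Hypothetical low-order basis: constant, z, and z^2 (z = x + iy)
basis = [lambda z: np.ones_like(z), lambda z: z, lambda z: z**2]
alpha = np.array([0.1 + 0.0j, 0.01j, 0.001 + 0.001j])
prediction = predict_error(alpha, basis, np.array([0.5 + 0.5j]))
```

The real and imaginary parts of the result give the horizontal and vertical components of the predicted pointing error, following the bijection of equation (2).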

4.2 The classical calibration problem

The classical calibration problem exploits the linearity of equation (3) and ignores the effects of the measurement noise. Assuming that the distortion is well described by an expansion of the first J functions, the problem is posed as follows:

Let |$\pmb {x}^i={(x, y)}^\top \in F, 0\le i \lt K$|, with |$J \le K$|, be a set of K calibration points and |$\hat{\varepsilon }^i = \hat{\varepsilon }(\pmb {x}^i) \in \mathbb {C}$| the corresponding set of K measurements of the pointing error in the calibration points in complex form. Find |$\hat{\alpha }^j\in \mathbb {C}, 0\le j \lt J$| such that the residual sum of squares

|$\text{RSS} = \sum _{i=0}^{K-1}{\left|\hat{\varepsilon }^i - \sum _{j=0}^{J-1}\hat{\alpha }^j\hat{G}_j(\pmb {x}^i)\right|}^2$|(5)

is minimal.

If we write |$\hat{\pmb {\alpha }} = {(\hat{\alpha }^0, \hat{\alpha }^1, \dots , \hat{\alpha }^{J-1})}^\top \in \mathbb {C}^J$|⁠, |$\hat{\pmb {\varepsilon }} = {(\hat{\varepsilon }^0, \hat{\varepsilon }^1, ..., \hat{\varepsilon }^{K-1})}^\top \in \mathbb {C}^K$|⁠, and |$\hat{\pmb {G}}\in \mathbb {C}^{K\times J}$| as a matrix such that its ith row and jth column is |$\hat{G}^i_j=\hat{G}_j(\pmb {x}^i)$|⁠, equation (5) can be written in matrix form as

|$\text{RSS} = {\left\Vert \hat{\pmb {\varepsilon }} - \hat{\pmb {G}}\hat{\pmb {\alpha }}\right\Vert }_2^2$|(6)

Matrix |$\hat{\pmb {G}}$| is also called the collocation matrix, by similarity with other interpolation problems. If J = K and |$\text{det }\hat{\pmb {G}} \ne 0$|⁠, |$\hat{\pmb {G}}$| is invertible and a solution with null residual can be found:

|$\hat{\pmb {\alpha }} = \hat{\pmb {G}}^{-1}\hat{\pmb {\varepsilon }}$|(7)

How well this solution extends to the rest of points in F depends on the choice of |$\hat{G}_j$|⁠, the measurement noise, and how uniformly the calibration points |$\pmb {x}^i$| are distributed in F. In general, uneven distributions of calibration points translate to ill-conditioned collocation matrices, which amplify measurement noise. In the extreme case in which two calibration points are the same, equation (5) refers to an underdetermined system of equations and the collocation matrix becomes singular. A good calibration pattern is therefore not only determined by the number of calibration points, but also by their impact on the condition number of the collocation matrix.

While this approach does not take measurement noise into account, its effect can be arbitrarily reduced by providing more calibration points (K > J) and ensuring that the rank of |$\hat{\pmb {G}}$| is at least J. In this case, the system of equations becomes overdetermined and ordinary least squares can be used to minimize equation (6).
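The overdetermined classical calibration can be sketched with NumPy’s complex-valued least squares. The ground-truth distortion, noise level, and monomial basis G_j(z) = z^j below are illustrative assumptions, not the instrument’s actual figures.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground-truth distortion: a J = 3 term complex expansion
def true_error(z):
    return (0.05 + 0.02j) + (0.01 - 0.005j) * z + (0.002j) * z**2

# K > J calibration points drawn uniformly on the unit disc, plus measurement noise
K, J, sigma_e = 25, 3, 1e-4
r = np.sqrt(rng.uniform(0.0, 1.0, K))
phi = rng.uniform(0.0, 2.0 * np.pi, K)
z = r * np.exp(1j * phi)
eps = true_error(z) + sigma_e * (rng.normal(size=K) + 1j * rng.normal(size=K))

# Complex collocation matrix G[i, j] = G_j(z_i); here G_j(z) = z**j for simplicity
G = np.stack([z**j for j in range(J)], axis=1)

# Ordinary least squares on the overdetermined system G alpha ~ eps
alpha, *_ = np.linalg.lstsq(G, eps, rcond=None)
```

With K > J well-spread points, the recovered coefficients approach the ground truth up to the measurement noise, illustrating how extra calibration points average the noise down.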

4.3 The Bayesian calibration problem

Let us now assume a certain abstract model M that makes predictions |$\tilde{\mathcal {D}}={(\tilde{x}^0, \tilde{x}^1,\dots )}^\top$| of certain observable from a given set of model parameters |$\pmb {\vartheta }={(\vartheta ^0, \vartheta ^1,\dots )}^\top$|⁠. We also assume that these parameters were fitted from previous observations |$\mathcal {D}={(x^0, x^1,\dots )}^\top$|⁠.

In the context of Bayesian inference, model parameters are treated as joint distributions |$p(\pmb {\vartheta })$| that encode the degree of belief we have in them. If the probability of observing certain |$\mathcal {D}$| is known when the parameters are known to be exactly |$\pmb {\vartheta }$|⁠, Bayes’ theorem can be used to find the degree of belief over the parameters after providing additional evidence |$\mathcal {D}$|⁠:

|$p(\pmb {\vartheta }|\mathcal {D}) = \frac{p(\mathcal {D}|\pmb {\vartheta })\, p(\pmb {\vartheta })}{p(\mathcal {D})}$|(8)

The probability distribution |$p(\pmb {\vartheta })$| is called prior probability distribution of the parameters, and |$p(\pmb {\vartheta }|\mathcal {D})$| the posterior probability distribution of the parameters. Note that |$p(\pmb {\vartheta }|\mathcal {D})$| is a function evaluated in |$\pmb {\vartheta }$|⁠, leaving |$\mathcal {D}$| fixed. While |$p(\mathcal {D}|\pmb {\vartheta })$| has the form of the probability distribution of measuring certain observation vector |$\mathcal {D}$|⁠, it is evaluated on |$\pmb {\vartheta }$| and therefore it is not a proper probability distribution. We call this function the likelihood function, and we make this distinction explicit by the notation

|$\mathcal {L}(\pmb {\vartheta }|\mathcal {D}) = p(\mathcal {D}|\pmb {\vartheta })$|(9)

Finally, |$p(\mathcal {D})$| is the probability distribution of observing |$\mathcal {D}$| for all possible choices of |$\pmb {\vartheta }$| and it is generally difficult to calculate. Since |$p(\mathcal {D})$| does not depend on |$\pmb {\vartheta }$|⁠, we can treat it as a normalization constant and write equation (8) as

|$p(\pmb {\vartheta }|\mathcal {D}) \propto p(\mathcal {D}|\pmb {\vartheta })\, p(\pmb {\vartheta })$|(10)

Posterior probability distributions can also be used as prior distributions for subsequent Bayesian updating steps, which enables iterative refining of the parameter knowledge as additional evidence is provided.

Identifying |$\pmb {\vartheta }$| with the complex model coefficients |$\hat{\pmb {\alpha }}\in \mathbb {C}^J$| and |$\mathcal {D}$| with both the calibration points in complex form |$\hat{\pmb {p}} = {(\hat{x}^0,\dots ,\hat{x}^{K-1})}^\top$| and their corresponding observations of the pointing error |$\hat{\pmb {\varepsilon }}\in \mathbb {C}^K$|⁠, we can formulate the Bayesian calibration problem as follows:

Given a set of observations of the pointing error |$(\hat{\pmb {\varepsilon }},\hat{\pmb {p}})$| and prior knowledge of the model coefficients |$p(\hat{\pmb {\alpha }})$|⁠, find the posterior probability distribution of the parameters |$p(\hat{\pmb {\alpha }}|\hat{\pmb {\varepsilon }},\hat{\pmb {p}})$| as

|$p(\hat{\pmb {\alpha }}|\hat{\pmb {\varepsilon }},\hat{\pmb {p}}) \propto p(\hat{\pmb {\varepsilon }},\hat{\pmb {p}}|\hat{\pmb {\alpha }})\, p(\hat{\pmb {\alpha }})$|(11)

As an additional simplifying assumption, we will consider the calibration points |$\hat{\pmb {p}}$| to be known up to arbitrary precision. This is a justifiable assumption, as the physical location of points is planned to be measured by means of microscopy, before the commissioning phase. Since the GCU mask is our ultimate source of truth, errors in this characterization will propagate directly as errors of the corrective model.

If we write all probabilities of the form |$p(\hat{\pmb {\varepsilon }},\hat{\pmb {p}}|\hat{\pmb {\alpha }})$| as |$p(\hat{\pmb {\varepsilon }}|\hat{\pmb {p}},\hat{\pmb {\alpha }})p(\hat{\pmb {p}})$|⁠, we formalise this assumption by modelling |$p(\hat{\pmb {p}})$| as a Dirac delta distribution centred in the nominal calibration points. This enables us to marginalize equation (11) on |$\hat{\pmb {p}}$|⁠, from which we conclude that |$p(\hat{\pmb {\varepsilon }}|\hat{\pmb {\alpha }})=p(\hat{\pmb {\varepsilon }}|\hat{\pmb {p}},\hat{\pmb {\alpha }})$|⁠. Since this dependency on |$\hat{\pmb {p}}$| appears in all probabilities involving |$\hat{\pmb {\varepsilon }}$|⁠, we will make it explicit only when further clarification is needed. Equation (11) can therefore be simplified as

|$p(\hat{\pmb {\alpha }}|\hat{\pmb {\varepsilon }}) \propto p(\hat{\pmb {\varepsilon }}|\hat{\pmb {\alpha }})\, p(\hat{\pmb {\alpha }})$|(12)

4.4 Likelihood function

The most fundamental ingredient of Bayesian inference is an accurate description of the likelihood function. This method assumes a Gaussian likelihood, which is common for measurement devices with multiple contributions to the measurement error. One must note, however, that this does not hold in all scenarios: the applicability of the method is contingent on verifying the Gaussianity of the measurement error.

For the sake of simplicity, we start by rewriting the complex arithmetic in equation (6) in terms of real-valued linear operations. This can be done by alternating real and imaginary parts of complex vectors, and encoding the complex arithmetic in a real-valued collocation matrix as described in Appendix  A. In this form, the components of |$\pmb {p}$| consist of alternating horizontal and vertical coordinates of the calibration points, and components of |$\pmb {\varepsilon }$| of alternating horizontal and vertical projections of the pointing error.
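A possible realization of this complex-to-real embedding is sketched below. The exact coefficient layout used by the paper is given in equations (A13)–(A16) of Appendix A, which are not reproduced here, so the interleaving convention in this sketch is an assumption chosen only to be self-consistent.

```python
import numpy as np

def complex_to_real_collocation(G_hat):
    """Expand a complex K x J collocation matrix into a real 2K x 2J matrix.

    Rows alternate real/imaginary parts of each measurement and columns
    alternate real/imaginary parts of each coefficient, so that
    G @ complex_to_real_vector(alpha_hat) reproduces the complex product
    G_hat @ alpha_hat component by component.
    """
    K, J = G_hat.shape
    G = np.zeros((2 * K, 2 * J))
    G[0::2, 0::2] = G_hat.real   # Re part of each product: Re(G)Re(a) ...
    G[0::2, 1::2] = -G_hat.imag  # ... minus Im(G)Im(a)
    G[1::2, 0::2] = G_hat.imag   # Im part: Im(G)Re(a) ...
    G[1::2, 1::2] = G_hat.real   # ... plus Re(G)Im(a)
    return G

def complex_to_real_vector(v_hat):
    """Interleave real and imaginary parts: (Re v0, Im v0, Re v1, Im v1, ...)."""
    v_hat = np.asarray(v_hat, dtype=complex)
    return np.column_stack([v_hat.real, v_hat.imag]).ravel()
```

This embedding turns the complex linear system into an equivalent real-valued one, on which the Gaussian likelihood of the following equations can be stated per axis.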

Assuming Gaussianity, the probability of observing a pointing error εi in one of the two axes of certain measurement point is given by

|$p(\varepsilon ^i|\pmb {\alpha },p^i,\sigma ^2_{\rm E}) = \frac{1}{\sqrt{2\pi \sigma ^2_{\rm E}}}\exp \left[-\frac{{\left(\varepsilon ^i - \sum _{j=0}^{2J-1}G^i_j\alpha ^j\right)}^2}{2\sigma ^2_{\rm E}}\right]$|(13)

where |$G^i_j$| are the coefficients of the real-valued collocation matrix |$\pmb {G}\in \mathbb {R}^{2K\times 2J}$|⁠, and are calculated as described in equations (A13)–(A16). This implies that the coefficients |$G^i_j$| are indeed a function of pi. The likelihood function |$\mathcal {L}(\pmb {\alpha }|\varepsilon ^i,p^i,\sigma ^2_{\rm E})$| is simply the probability density |$p(\varepsilon ^i|\pmb {\alpha },p^i,\sigma ^2_{\rm E})$| evaluated in |$\pmb {\alpha }$|⁠.

If we can guarantee the independence of measurements, we can formulate the likelihood function with respect to the measurement vector |$\pmb {\varepsilon }$| as simply the product of the likelihoods of the individual measurements:

|$\mathcal {L}(\pmb {\alpha }|\pmb {\varepsilon },\pmb {p},\sigma ^2_{\rm E}) = \prod _{i=0}^{2K-1}\mathcal {L}(\pmb {\alpha }|\varepsilon ^i,p^i,\sigma ^2_{\rm E})$|(14)

The likelihood function can therefore be expressed as a multivariate Gaussian distribution of mean |$\pmb {G}\pmb {\alpha }$| and covariance matrix |$\sigma _{\rm E}^2\mathbb {I}$|⁠, where |$\mathbb {I}$| stands for the identity matrix. The probability density is then written as

|$p(\pmb {\varepsilon }|\pmb {\alpha },\pmb {p},\sigma ^2_{\rm E}) = {\left(2\pi \sigma ^2_{\rm E}\right)}^{-K}\exp \left[-\frac{{\left\Vert \pmb {\varepsilon } - \pmb {G}\pmb {\alpha }\right\Vert }_2^2}{2\sigma ^2_{\rm E}}\right]$|(15)

5. ANALYTIC SOLUTION

In many applications, posterior distributions can only be estimated by means of Monte-Carlo methods which are computationally expensive. None the less, there are particular instances in which the posterior distribution admits a closed-form expression that is of the same form as that of the prior distribution. This occurs for certain combinations of prior distributions and likelihood functions when they are of the exponential family (e.g. Gaussian, Gamma, exponential, etc.). We say in this case that the prior and posterior are conjugate probability distributions, and the prior |$p(\pmb {\vartheta })$| is said to be a conjugate prior.

Simulated classical calibrations (Carracedo-Carballal et al. 2022) showed that the distribution of model coefficients is also well described by a multivariate Gaussian distribution of the form

|$p(\pmb {\alpha }) = \mathcal {N}\left(\pmb {\alpha }\, |\, \bar{\pmb {\alpha }}, \pmb {\Sigma }\right)$|(16)

where |$\bar{\pmb {\alpha }}\in \mathbb {R}^{2J}$| is the mean vector and |$\pmb {\Sigma }\in \mathbb {R}^{2J\times 2J}$| the covariance matrix of the distribution. If these parameters refer to those of a prior or a posterior distribution, they are called hyperparameters.

If we write the prior distribution |$p(\pmb {\alpha })$| at calibration step n as

|$p(\pmb {\alpha }) = \mathcal {N}\left(\pmb {\alpha }\, |\, \pmb {\alpha }_n, \pmb {\Sigma }_n\right)$|(17)

and the corresponding posterior distribution after providing additional observations of the pointing error |$\pmb {\varepsilon }$| as

|$p(\pmb {\alpha }|\pmb {\varepsilon }) = \mathcal {N}\left(\pmb {\alpha }\, |\, \pmb {\alpha }_{n+1}, \pmb {\Sigma }_{n+1}\right)$|(18)

then it can be proven (see Appendix B1) that the posterior hyperparameters can be calculated as

|$\pmb {\Sigma }_{n+1} = {\left(\pmb {\Sigma }_n^{-1} + \sigma _{\rm E}^{-2}\, \pmb {G}^\top \pmb {G}\right)}^{-1}$|(19)
|$\pmb {\alpha }_{n+1} = \pmb {\Sigma }_{n+1}\left(\pmb {\Sigma }_n^{-1}\pmb {\alpha }_n + \sigma _{\rm E}^{-2}\, \pmb {G}^\top \pmb {\varepsilon }\right)$|(20)

which reduces to the classical calibration when K = J and the prior hyperparameters describe an infinitely uninformative distribution.1

In a realistic application of this calibration technique, we could obtain a good representative of the model coefficients from the mean |$\pmb {\alpha }_{n+1}$| and a measure of their uncertainty from their individual variances (which equal the diagonal elements of |$\pmb {\Sigma }_{n+1}$|⁠).
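A minimal sketch of the conjugate updating step, written in the standard Bayesian linear-regression form for a Gaussian prior and Gaussian likelihood (the function name and array layout are our own):

```python
import numpy as np

def bayes_update(alpha_n, Sigma_n, G, eps, sigma_e2):
    """One conjugate Gaussian update of the model-coefficient hyperparameters.

    alpha_n  : prior mean of the coefficients (2J,)
    Sigma_n  : prior covariance (2J x 2J)
    G        : real-valued collocation matrix of the new measurements (2K x 2J)
    eps      : new pointing-error measurements (2K,)
    sigma_e2 : per-axis measurement noise variance
    """
    Sigma_n_inv = np.linalg.inv(Sigma_n)
    # Posterior covariance: precision of prior plus precision of the data
    Sigma_next = np.linalg.inv(Sigma_n_inv + G.T @ G / sigma_e2)
    # Posterior mean: precision-weighted combination of prior mean and data
    alpha_next = Sigma_next @ (Sigma_n_inv @ alpha_n + G.T @ eps / sigma_e2)
    return alpha_next, Sigma_next
```

Because the posterior is again Gaussian, the returned pair can be fed back as the prior of the next step, which is what makes the calibration incremental without Monte-Carlo sampling.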

5.1 Prediction of model residuals

While the degree of uncertainty of the |$\pmb {\alpha }$| is contained in |$\pmb {\Sigma }_{n+1}$|⁠, it is usually more relevant to understand how it translates to uncertainty in the pointing error. This result can be obtained from the variance of the posterior predictive distribution:

|$p(\tilde{\pmb {\varepsilon }}|\pmb {\varepsilon }) = \int p(\tilde{\pmb {\varepsilon }}|\pmb {\alpha })\, p(\pmb {\alpha }|\pmb {\varepsilon })\, \text{d}\pmb {\alpha }$|(21)

where |$\tilde{\pmb {\varepsilon }}$| denotes a prediction made by the corrective model, given that previous evidence |$\pmb {\varepsilon }$| has been provided. Note that |$p(\tilde{\pmb {\varepsilon }}|\pmb {\alpha })$| has the form of the likelihood function and |$p(\pmb {\alpha }|\pmb {\varepsilon })$| is conjugate for this function. It can be proven (see Appendix B3) that |$p(\tilde{\pmb {\varepsilon }}|\pmb {\varepsilon })$| is another multivariate Gaussian of the form

|$p(\tilde{\pmb {\varepsilon }}|\pmb {\varepsilon }) = \mathcal {N}\left(\tilde{\pmb {\varepsilon }}\, |\, \tilde{\pmb {G}}\pmb {\alpha }_{n+1},\ \sigma ^2_{\rm E}\mathbb {I} + \tilde{\pmb {G}}\pmb {\Sigma }_{n+1}{\tilde{\pmb {G}}}^\top \right)$|(22)

where |$\tilde{\pmb {G}}$| is the collocation matrix evaluated in the test points used for the prediction. The covariance matrix |$\sigma ^2_{\rm E}\mathbb {I} + \tilde{\pmb {G}}\pmb {\Sigma }_{n+1}{\tilde{\pmb {G}}}^\top$| is hereinafter referred to as the model uncertainty covariance matrix and will be denoted by |$\tilde{\pmb {U}}_{n+1}$|⁠.

The variance of the predicted pointing error is simply calculated as

|$\sigma ^2[\tilde{\pmb {\varepsilon }}|\pmb {\varepsilon }] = \text{diag}\, \tilde{\pmb {U}}_{n+1}$|(23)

We remark from this equation that the variance is lower bounded by the measurement noise variance |$\sigma ^2_{\rm E}$|⁠. This is expected, as the only way to access pointing error measurements is by means of a measurement device (the guide probe) with finite precision.

This result can be used as part of a termination condition, in which new observations of the pointing error update the hyperparameters iteratively until the degree of certainty of the true pointing error reaches a threshold value (e.g. |$\Vert \sigma ^2[\tilde{\pmb {\varepsilon }}|\pmb {\varepsilon }]\Vert _\infty \lt \sigma ^2_\text{max}$|⁠, with |$\sigma ^2_\text{max}$| certain maximum variance of the pointing error in the test points).
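The termination condition can be sketched as follows, building the model uncertainty covariance from a posterior covariance over a grid of test points (the names Sigma_post, G_test, and var_max are illustrative):

```python
import numpy as np

def predictive_variance(G_test, Sigma_post, sigma_e2):
    """Diagonal of the model uncertainty covariance sigma_e2*I + G Sigma G^T."""
    U = sigma_e2 * np.eye(G_test.shape[0]) + G_test @ Sigma_post @ G_test.T
    return np.diag(U)

def calibration_done(G_test, Sigma_post, sigma_e2, var_max):
    """Terminate when the worst-case predictive variance drops below var_max."""
    return np.max(predictive_variance(G_test, Sigma_post, sigma_e2)) < var_max
```

Note that even with a perfectly known model (zero posterior covariance) the predictive variance never falls below the measurement noise variance, so var_max must be chosen above that floor.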

We must take into account that the Bayesian updating step in equation (20) depends on the particular order in which new measurements have been presented to the model. This also implies that batched updatings (i.e. presenting K > 1 measurements in the measurement vector |$\pmb {\varepsilon }$| at once) are not formally equivalent to sequential updatings (i.e. performing K updating steps with one measurement each).

This dependency on ordering is particularly pronounced in the first updatings with vague priors, as the first observations have a larger relative impact on the posterior than subsequent ones. If the prior is too vague, noisy measurements may introduce an undesired shrinkage that biases future updatings towards erroneous values.

In case uninformative priors are not available, this effect can be mitigated by reordering the calibration points so that each point maximizes its distance to the points already measured. The rationale of this strategy is to leverage the first steps to integrate distortion information from the whole field as much as possible. Taking points too close together may give excessive weight to certain correlations, shrinking future observations towards wrong directions.
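One way to implement such a reordering is a greedy farthest-point traversal, in which each new calibration point maximizes its minimum distance to those already taken; this specific heuristic is our illustration, not necessarily the paper's exact procedure.

```python
import numpy as np

def farthest_point_order(points):
    """Greedy reordering of 2D calibration points.

    Starting from an arbitrary point, repeatedly pick the point whose minimum
    distance to the already-selected set is largest, so that early updates
    sample the whole field rather than a local cluster.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    order = [0]                               # arbitrary starting point
    min_d = np.linalg.norm(points - points[0], axis=1)
    for _ in range(n - 1):
        nxt = int(np.argmax(min_d))           # farthest from selected set
        order.append(nxt)
        min_d = np.minimum(min_d, np.linalg.norm(points - points[nxt], axis=1))
    return order
```

The resulting index list can then drive the order in which GCU points are measured and fed to the sequential Bayesian update.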

5.1.1 Mean square error and root mean square error

In many practical applications, however, the mean square error (MSE) and the root mean square error (RMS) are the preferred metrics for model performance:

|$\text{MSE} = \frac{1}{2K}\sum _{i=0}^{2K-1}{\left(r^i\right)}^2,\qquad \text{RMS} = \sqrt{\text{MSE}}$|(24)

with ri the per-axis real-valued residuals between the prediction and the measured pointing error. The uncertainties in the model parameters and the measurement noise can be used to model the residual vector |$\pmb {r}=(r_0,r_1,\dots ,r_{2K-1})^\top$| as a zero-mean multivariate Gaussian:

|$\pmb {r}\sim \mathcal {N}\left(\pmb {0}, \tilde{\pmb {U}}_{n+1}\right)$|(25)

which can be used to provide upper bounds both on the mean and variance of the MSE and the RMS. If we consider the MSE a random variable calculated as

|$\text{MSE} = \frac{1}{2K}\pmb {r}^\top \pmb {r}$|(26)

given that the components of |$\pmb {r}$| come from a multivariate Gaussian, the distribution of the MSE is that of a generalized central χ2 distribution, with no Gaussian contribution:

|$\text{MSE} \sim \frac{1}{2K}\sum _{i=0}^{2K-1}\lambda _i z_i^2,\qquad z_i\sim \mathcal {N}(0, 1)\ \text{i.i.d.}$|(27)

where λi are the 2K eigenvalues of |$\tilde{\pmb {U}}_{n+1}$|⁠. The expected value and variances of the MSE are given by

|$\text{E}[\text{MSE}] = \frac{1}{2K}\sum _{i=0}^{2K-1}\lambda _i,\qquad \sigma ^2[\text{MSE}] = \frac{1}{2K^2}\sum _{i=0}^{2K-1}\lambda _i^2$|(28)

If the uncertainty is sufficiently large (e.g. in the earliest steps of the calibration), the expected value and variance of the RMS can be approximated by uncertainty propagation of the MSE as

|$\text{E}[\text{RMS}] \approx \sqrt{\text{E}[\text{MSE}]},\qquad \sigma ^2[\text{RMS}] \approx \frac{\sigma ^2[\text{MSE}]}{4\, \text{E}[\text{MSE}]}$|(29)

We must remark, none the less, that this approach will introduce a slight amount of overestimation as it assumes that we are sampling |$\pmb {\alpha }$| for every estimation and not taking its mean value |$\pmb {\alpha }_{n+1}$| directly.
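Under the zero-mean Gaussian residual model above, the MSE moments reduce to traces of the model uncertainty covariance matrix (equivalent to summing its eigenvalues and squared eigenvalues). The sketch below uses this trace identity together with first-order propagation to the RMS; the function names are our own.

```python
import numpy as np

def mse_moments(U):
    """Expected value and variance of MSE = r^T r / (2K) for r ~ N(0, U).

    For a zero-mean Gaussian quadratic form: E[MSE] = tr(U)/(2K) and
    Var[MSE] = 2 tr(U^2)/(2K)^2, i.e. sums of the eigenvalues and of the
    squared eigenvalues of U, avoiding an explicit eigendecomposition.
    """
    two_k = U.shape[0]
    e_mse = np.trace(U) / two_k
    var_mse = 2.0 * np.trace(U @ U) / two_k**2
    return e_mse, var_mse

def rms_moments(U):
    """First-order (delta-method) propagation of the MSE moments to the RMS."""
    e_mse, var_mse = mse_moments(U)
    return np.sqrt(e_mse), var_mse / (4.0 * e_mse)
```

For an isotropic covariance U = sigma^2 I, the expected MSE is simply sigma^2, which matches the measurement-noise floor discussed earlier.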

5.2 Repeatability priors

During the calibration procedure, posterior distributions obtained from the previous measurements are reinterpreted as prior distributions of the next. After the termination condition has been met, we can choose the mean |$\hat{\pmb {\alpha }}_{n+1}\in \mathbb {C}^J$| as a good representative of the model coefficients. None the less, we still have to define the hyperparameters of the initial prior |$p(\pmb {\alpha })=p(\pmb {\alpha }|\pmb {\alpha }_{0}, \pmb {\Sigma }_{0})$|⁠, used before performing the first measurement.

A reasonable choice is to obtain them from the repeatability of classically calibrated model coefficients, as observed after a certain number M of calibration sessions:

|$\pmb {\alpha }_0 = \bar{\pmb {\alpha }}$|(30)
|$\pmb {\Sigma }_0 = \pmb {Q}$|(31)

where |$\bar{\pmb {\alpha }}$| is the sample mean of the coefficients and |$\pmb {Q}$| is the sample covariance matrix, both estimated from a sample of M model coefficient vectors.

This choice is motivated by the fact that, in some scenarios, an appropriate choice of basis functions in the asymptotic expansions can be used to distinguish between contributions of different origin and repeatability. This is particularly true for HARMONI, in which optical distortions (which are described by the higher order terms of the PZP expansion) have much smaller variability than mechanical distortions (which mostly affect the lowest order terms).

On the other hand, this choice implicitly assumes that the first M calibrations are good representatives of future calibrations, which is not necessarily true. We can overcome this limitation by reinterpreting |$\pmb {\alpha }_0,\pmb {\Sigma }_0$| as random variables conditioned by observed calibrated coefficients |$\pmb {\alpha }$|⁠, providing us with a Bayesianly justifiable method (Rubin 1984) to update our knowledge of the repeatability of the distortion. From Bayes’ theorem, it follows that

|$p(\pmb {\alpha }_0,\pmb {\Sigma }_0|\pmb {\alpha }) \propto p(\pmb {\alpha }|\pmb {\alpha }_0,\pmb {\Sigma }_0)\, p(\pmb {\alpha }_0,\pmb {\Sigma }_0)$|(32)

where the likelihood function |$p(\pmb {\alpha }|\pmb {\alpha }_0,\pmb {\Sigma }_0)=\mathcal {L}(\pmb {\alpha }_0,\pmb {\Sigma }_0|\pmb {\alpha })$| is simply the prior distribution of the coefficients:

|$\mathcal {L}(\pmb {\alpha }_0,\pmb {\Sigma }_0|\pmb {\alpha }) = \mathcal {N}\left(\pmb {\alpha }\, |\, \pmb {\alpha }_0, \pmb {\Sigma }_0\right)$|(33)

We call the distribution |$p(\pmb {\alpha }_0,\pmb {\Sigma }_0)$| a repeatability prior, as it reflects the repeatability of measurements between two consecutive calibration sessions.

We have, in principle, a certain freedom to choose the form of the prior |$p(\pmb {\alpha }_0,\pmb {\Sigma }_0)$|⁠. If we wish to leverage the algebraic properties of conjugate prior distributions, we can model the joint probability of |$\pmb {\alpha }_0,\pmb {\Sigma }_0$| as a Gaussian–Inverse–Wishart distribution:

|$p(\pmb {\alpha }_0,\pmb {\Sigma }_0) = \mathcal {N}\left(\pmb {\alpha }_0\, |\, \pmb {\mu }_m, \pmb {\Sigma }_0/\lambda _m\right)\, \text{Inv-Wishart}\left(\pmb {\Sigma }_0\, |\, \pmb {\Psi }_m, \nu _m\right)$|(34)

which is a conjugate prior for the likelihood described in equation (33). |$\pmb {\mu }_m\in \mathbb {R}^{2J}$| is the mean vector of |$\pmb {\alpha }_0$|⁠, and |$\pmb {\Psi }_m\in \mathbb {R}^{2J\times 2J}$| is the inverse scale matrix (which is positive definite). λm > 0 and νm > 2J − 1 are pseudo-count parameters that weight the effect of new observations of |$\pmb {\alpha }$| in the calculation of the posterior distribution.

The least informative prior has λ0 = 1 and ν0 = 2J + 1, and could be initialized from a zero mean and a diagonal |$\pmb {\Psi }_0$|⁠. Unfortunately, this choice exhibits slow convergence to the true covariance matrix, preventing the algorithm from quickly leveraging the prior information from previous calibrations.

A faster, albeit more frequentist, approach is to rely on Parametric Empirical Bayes (PEB) for prior generation. Under PEB, the prior hyperparameters |$\pmb {\mu }_0$| and |$\pmb {\Psi }_0$| are derived directly from maximum-likelihood estimators (MLEs) of previous observations (equations 30–31) as

(35)
(36)
(37)
(38)

in which we identified |$\pmb {Q}$| with the mean variance. We also leveraged ν0 and λ0 to account for the strength of the evidence used as prior (M observations).
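The spirit of this construction can be sketched by moment matching. Note that the specific choices λ0 = M and ν0 = M + 2J + 1 below are our illustrative assumptions, not necessarily the paper's equations (35)–(38); function names are ours as well:

```python
import numpy as np

def peb_prior(alpha_mle, Q, M, J):
    """Sketch of a PEB-style prior from M previous classical calibrations.
    `alpha_mle` is an (M, 2J) array of MLE coefficient vectors and `Q`
    the mean variance (2J x 2J). Pseudo-count choices are illustrative."""
    p = 2 * J
    mu0 = alpha_mle.mean(axis=0)   # prior mean from past MLEs
    lam0 = M                       # pseudo-count: strength of M observations
    nu0 = M + p + 1                # keeps nu0 > p - 1 for any M >= 1
    # Choose Psi0 so that the Inverse-Wishart prior mean,
    # E[Sigma0] = Psi0 / (nu0 - p - 1), reproduces the mean variance Q.
    psi0 = (nu0 - p - 1) * Q
    return mu0, psi0, lam0, nu0
```

With this choice the prior expectation of the covariance equals the empirical mean variance |$\pmb {Q}$|, while λ0 and ν0 grow with M, encoding the strength of the prior evidence.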

The analytic formula for the Bayesian update step has been described by Gelman et al. (2021). If |$\pmb {\alpha }^0,\dots ,\pmb {\alpha }^{n-1}$| are the calibrated coefficients obtained from n calibrations, the hyperparameters in equation (34) are updated as

(39)
(40)
(41)
(42)

where S is the sum of squares matrix about the sample mean:

(43)

Note that for the case of sequential updating (n = 1), |$\bar{\pmb {\alpha }}=\pmb {\alpha }$| and S is zero. While choosing sequential over batched updating should not affect the resulting priors, it can be desirable to benefit from the most up-to-date repeatability prior before each calibration. In the next calibration, the hyperparameters |$\pmb {\alpha }_0,\pmb {\Sigma }_0$| of the calibration priors would be drawn from the updated Gaussian–Inverse–Wishart distribution. The sampling procedure is documented by Gelman et al. (2021) and is performed in two steps: first, draw |$\pmb {\Sigma }_0|\pmb {\alpha }\sim \text{Inv-Wishart}(\pmb {\Psi }_{m+1},\nu _{m+1})$| and then draw |$\pmb {\alpha }_0|\pmb {\Sigma }_0,\pmb {\alpha }\sim \mathcal {MN}(\pmb {\mu }_{m+1},\pmb {\Sigma }_0/\lambda _{m+1})$|⁠.
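Since the update follows the standard conjugate formulas described by Gelman et al. (2021), the update step of equations (39)–(42) and the two-step sampling can be sketched with NumPy/SciPy (function names are ours):

```python
import numpy as np
from scipy.stats import invwishart, multivariate_normal

def niw_update(mu, psi, lam, nu, alphas):
    """Conjugate update of the Gaussian-Inverse-Wishart hyperparameters
    given n new calibrated coefficient vectors `alphas` (an n x 2J array)."""
    alphas = np.atleast_2d(np.asarray(alphas, dtype=float))
    n = alphas.shape[0]
    abar = alphas.mean(axis=0)
    dev = alphas - abar
    S = dev.T @ dev                     # sum of squares about the sample mean
    lam_new = lam + n
    nu_new = nu + n
    mu_new = (lam * mu + n * abar) / lam_new
    d = (abar - mu)[:, None]
    psi_new = psi + S + (lam * n / lam_new) * (d @ d.T)
    return mu_new, psi_new, lam_new, nu_new

def draw_repeatability_prior(mu, psi, lam, nu, rng=None):
    """Two-step sampling: Sigma0 ~ Inv-Wishart(psi, nu), then
    alpha0 | Sigma0 ~ N(mu, Sigma0 / lam)."""
    sigma0 = invwishart.rvs(df=nu, scale=psi, random_state=rng)
    alpha0 = multivariate_normal.rvs(mean=mu, cov=sigma0 / lam, random_state=rng)
    return alpha0, sigma0
```

Sequential updating corresponds to calling `niw_update` with a single coefficient vector (n = 1), in which case `S` vanishes.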

5.3 Calibration algorithm

We are now ready to formulate the full calibration algorithm. The algorithm assumes that the repeatability prior hyperparameters |$(\pmb {\mu }_m,\pmb {\Psi }_m,\lambda _m,\nu _m)$| can be saved and restored from permanent storage through certain routines named load and save. The algorithm also assumes that at least M previous classical calibrations were performed, from which |$\bar{\pmb {\alpha }}$| and |$\pmb {Q}$| were obtained.

6. APPLICATION TO HARMONI’S FIELD DISTORTION

Before applying this method, one must provide evidence for the Gaussianity of the likelihood function. Simulations performed by the pointing simulator harmoni-pm (Carracedo-Carballal et al. 2022) showed that the measurement process is well described by a Gaussian with certain measurement noise variance |$\sigma _{\rm E}^2$| for both axes (Appendix  C), which also motivates a Gaussian likelihood.

Next, one must choose basis functions |$\hat{G}_j$| of the asymptotic expansion of the error and a good calibration pattern. While the latter is not as critical as the former, it has an impact on the number of points that are eventually required to calibrate the instrument up to the desired accuracy.

6.1 Basis functions

Due to the circular geometry of HARMONI’s focal plane, we propose a corrective model consisting of a truncated expansion of PZP of the pointing error. PZPs are enumerated by two indices: the radial order n and the azimuthal order m. Biesheuvel et al. (2018) formulate these polynomials in polar coordinates as

(44)

where |$\rho =|\hat{v}|=(x^2+y^2)^{\frac{1}{2}}$| and |$\theta =\text{arg}(\hat{v})=\text{arctan}\frac{y}{x}$|⁠. The radial polynomial |$R^{|m|}_n$| is a real function of ρ that can be written in terms of the Jacobi polynomials |$P^{(\alpha ,\beta )}_n(x)$| as

(45)

Our method requires the basis functions to be enumerated by a single index 0 ≤ j < J. While multiple indexations are possible, in this article we will refer to the so-called OSA/ANSI indices (Thibos et al. 2002), which relate m, n, and j by

(46)

This choice has the advantage of sorting PZPs in increasing radial order. This ensures that, as long as J is a triangular number, all azimuthal orders up to a certain radial order are contained in the expansion. In particular, first-order distortions are encoded by the first three PZPs: |$\hat{Z}_0$| describes a displacement in the focal plane, |$\hat{Z}_1$| a change in the aspect ratio, and |$\hat{Z}_2$| any combination of a rotation and a rescaling of the focal plane.
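Assuming equation (46) is the standard OSA/ANSI relation j = [n(n + 2) + m]/2 (Thibos et al. 2002), the two-way mapping can be sketched as follows (function names are ours):

```python
import math

def ansi_to_nm(j):
    """Invert the OSA/ANSI single index: j = (n*(n+2) + m) / 2."""
    n = math.ceil((-3 + math.sqrt(9 + 8 * j)) / 2)  # smallest valid radial order
    m = 2 * j - n * (n + 2)                         # recover the azimuthal order
    return n, m

def nm_to_ansi(n, m):
    """Forward OSA/ANSI mapping from (n, m) to the single index j."""
    return (n * (n + 2) + m) // 2
```

For example, the first three indices map to (0, 0), (1, −1), and (1, 1), matching the displacement and first-order terms described above.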

As PZPs are defined in the unit disc, our basis functions must be evaluated in the normalized focal plane as

(47)

6.2 Calibration pattern

The choice of the calibration pattern is not only given by the basis functions, but also by practical constraints imposed by the instrument design. The performance of several calibration patterns was evaluated in a previous work (Carracedo-Carballal et al. 2022): a dense square grid (of around 400 points) had the best performance, at the cost of a long calibration time. Spiral-like patterns behave well in the J = K case for low-order realizations of the model, but tend to leave highly undersampled regions at higher orders.

The best trade-off between performance and number of points was given by the Optimal Concentric Sampling (OCS) pattern. This pattern, originally described by Ramos-López et al. (2016), optimizes the condition number of the collocation matrices of the Zernike expansion, and it is parametrized by the truncation order of the expansion. For a corrective model with J = 10 complex coefficients, the corresponding pattern takes the form shown in Fig. 1.

Figure 1. OCS pattern with 10 calibration points. The sub-indices represent the order in which the calibration points are traversed.

7. RESULTS

The performance of the algorithm was evaluated against simulated distortions between the ELT’s focal plane and the relayed focal plane, using bayescal. The goals of these simulations were:

  • Validate the ability of the model to produce calibration coefficients meeting a certain accuracy goal.

  • Compare the predicted uncertainties of the model with the model residuals after calibration.

  • Study the impact of the number of classical calibrations used as prior evidence of the repeatability prior.

  • Compare the performance of the Bayesian calibration algorithm with respect to the classical calibration technique.

bayescal models the total distortion as a composition of three |$\mathbb {R}^2\rightarrow \mathbb {R}^2$| coordinate transforms, representing how 2D coordinates are displaced between focal planes:

(48)

These transforms can be split into a fixed part and a variable part. The fixed part corresponds to Tfprs and was modelled after a Finite Element Method (FEM) analysis of FPRS distortions up to the 3rd radial order (introducing pointing errors of order 100 μm/30 mas). The variable part is used to represent the repeatability of measurements between calibrations, and consists of a random 2D displacement |$T_{\Delta \pmb {x}}=\pmb {x} + \Delta \pmb {x}$| composed with a random 2D rotation |$T_{\phi } = R(\phi )\pmb {x}$|⁠.

In order to highlight the ability of the model to distinguish between fixed and variable contributions to the pointing error, the effect of the latter was artificially increased to surpass the former. In particular, coordinates of the displacement |$\Delta \pmb {x}={(\Delta x,\Delta y)}^\top$| are drawn from zero-mean Gaussian distributions, with σΔx = 1 mm and σΔy = 2 mm. The angle of the rotation is also drawn from a zero-mean Gaussian distribution, with σϕ = 30 arcsec.
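The variable part of the simulated distortion can be sketched as follows; the order in which rotation and displacement are composed is our assumption, and the function name is illustrative:

```python
import numpy as np

def draw_variable_distortion(rng, sigma_dx=1.0, sigma_dy=2.0, sigma_phi_arcsec=30.0):
    """Draw one realization of the variable part of the distortion: a random
    2D rotation composed with a random 2D displacement (units in mm)."""
    dx = rng.normal(0.0, sigma_dx)
    dy = rng.normal(0.0, sigma_dy)
    # Rotation angle drawn in arcseconds, converted to radians.
    phi = np.deg2rad(rng.normal(0.0, sigma_phi_arcsec) / 3600.0)
    R = np.array([[np.cos(phi), -np.sin(phi)],
                  [np.sin(phi),  np.cos(phi)]])
    return lambda x: R @ np.asarray(x) + np.array([dx, dy])

T = draw_variable_distortion(np.random.default_rng(0))
```

Each call to `draw_variable_distortion` produces a new realization, mimicking the change of the variable distortion between calibration sessions.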

The measurement noise refers to the uncertainty of the technique used to locate the centre of the imaged guide star. It depends on contributions such as the detector's resolution, the fitting algorithm, or the signal-to-noise ratio of the acquired image. We chose a realistic, isotropic Gaussian measurement noise of |$\sigma _{x}=\sigma _y=\sigma _{\rm E}=0.2\, \mu$|m, resulting in a total measurement noise variance of |$\sigma _n^2=\sigma _x^2+\sigma _y^2\approx 8\times 10^{-2}\, \mu$|m². The calibration goal was set to |$\sigma _{\rm max}=6\, \mu$|m, corresponding to the tracking accuracy requirement of the guide probes (Estrada et al. 2022).

Since the calibrations assumed a model with J = 10, the same 10-point OCS calibration pattern shown in Fig. 1 was used. Additionally, in a real-world scenario, we should expect that some of the optical fibres composing the points of the GCU mask may be damaged during the lifetime of the instrument. As a consequence, the definitive GCU mask pattern must contain redundant points that can replace the failing ones.

Validations were performed against a redundant version of the same pattern with K = 90 calibration points in total, distributed in the azimuthal and radial directions (Fig. 2). Calibration points were ordered to maximize their distance with respect to the previous one, improving the quality of the sampling during the first steps of the calibration. The resulting order is reflected by the sub-indices of the points in Fig. 1.
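The ordering criterion can be sketched with a greedy pass over the point list; this is an illustrative sketch, and the actual ordering used for the GCU pattern may differ in detail:

```python
import numpy as np

def order_calibration_points(points, start=0):
    """Greedy ordering: each next point maximizes its distance to the
    previously visited one, improving the sampling during the first
    steps of the calibration."""
    points = np.asarray(points, dtype=float)
    remaining = list(range(len(points)))
    order = [remaining.pop(start)]
    while remaining:
        last = points[order[-1]]
        dists = [np.linalg.norm(points[i] - last) for i in remaining]
        order.append(remaining.pop(int(np.argmax(dists))))
    return order
```

Applied to four collinear points, for instance, the pass alternates between the two ends of the segment before filling the interior.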

Figure 2. Original OCS pattern (larger dots) augmented with redundant points (smaller dots). This redundancy was achieved by multiplying the density of points in the azimuthal and radial directions by 3.

7.1 Accuracy testing

For accuracy testing, we forced a non-informative prior for |$\pmb {\alpha }_0,\pmb {\Sigma }_0$| in all calibrations, with |$\pmb {\mu }_0=\pmb {0}$| and |$\pmb {\Psi }_0=(100 \ {\mu \rm m})^2\mathbb {I}$|⁠. This resulted in the algorithm requiring J measurements prior to having a good description of the field distortion.

A total of 10⁴ calibrations were performed, each with a different realization of the distortion. The performance was evaluated as the RMS measured at all 90 calibration points, and compared against the RMS predictions described in equation (29).

The result of this test is shown in Fig. 3. We see not only that the calibration meets the goal, but also that the predicted RMS (⁠|$\approx 0.35\pm 0.17\, \mu$|m) overestimates the observed mean RMS (⁠|$\approx 0.31\, \mu$|m) by a small amount. We attribute this excess uncertainty to the vague priors used for testing: when increasing the number of measurements per point, we observe a reduction in the uncertainties that pushes the prediction closer to the observed values (Fig. 4).

Figure 3. Histogram of the RMS of the residual after 10⁴ simulated calibrations. All 10 calibration points, one measurement per calibration point, no prior information. The dashed line represents the mean expectation of the predicted RMS. The box represents the standard deviation of the uncertainty of the RMS prediction.

Figure 4. Histogram of the RMS of the residual after 10⁴ simulated calibrations. All 10 calibration points, two measurements per calibration point, no prior information.

7.2 Uncertainty prediction

A critical feature of this algorithm is its ability to make informed predictions of its own uncertainty. For this analysis, we trained the model with the results of M = 100 classical calibrations, starting with |$\pmb {\mu }_0=\pmb {0}$| and |$\pmb {\Psi }_0=(100\ {\mu \rm m})^2\mathbb {I}$| and updating the repeatability priors as usual, iterating the algorithm over 10⁴ simulated calibrations. Let εx, εy be the horizontal and vertical components of the measured pointing error, and |$\tilde{\varepsilon }_x,\tilde{\varepsilon }_y$| the random variables predicted by the pointing model. Since the uncertainty decreases as the number of accepted calibrations increases, we normalize the measured pointing errors by

(49)
(50)

which, for a perfect calibration with a good approximation of its uncertainty, should result in both rx and ry behaving as Gaussian distributions of zero mean and unitary variance.
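Assuming equations (49)–(50) are the usual standardization by the predictive mean and standard deviation, a minimal numerical check reads (values below are purely illustrative):

```python
import numpy as np

def normalized_residuals(eps, pred_mean, pred_std):
    """Standardize the measured pointing error by the predictive mean and
    standard deviation; for a perfect calibration with well-estimated
    uncertainty the result behaves as N(0, 1)."""
    return (eps - pred_mean) / pred_std

# Self-consistent check: errors drawn from the predictive distribution itself.
rng = np.random.default_rng(42)
mu, sigma = 0.1, 0.35                 # illustrative values (micrometres)
r = normalized_residuals(rng.normal(mu, sigma, 10000), mu, sigma)
```

When the errors are indeed drawn from the predictive distribution, `r` has sample mean near 0 and sample standard deviation near 1, which is exactly the diagnostic used in Figs 5 and 6.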

In all cases, the calibration goal was achieved after taking measurements at the first two calibration points (⁠|$\pmb {p}_0,\pmb {p}_1$|⁠) once. This is not surprising, as the variable part of the pointing error consists of a displacement and a rotation.

The results for the first five calibration points are summarized in Fig. 5. We can see that, in all cases, the observed residuals are consistent with the predicted uncertainties. We can also see that the histogram of observed residuals at |$\pmb {p}_0,\pmb {p}_1$| is well described by the uncertainty predicted by the model. Unsurprisingly, the model overestimates the uncertainty around |$\pmb {p}_{2-4}$|⁠, as information about the distortion at those points is obtained more indirectly from the prior distribution. We observe a similar behaviour by overlaying the normalized residual histogram in both axes and all 90 calibration points (Fig. 6). In terms of RMS, we observe a performance similar to that of providing all 10 points with a very uninformative prior (Fig. 7).

Figure 5. Histogram of the normalized residuals with respect to the predicted residuals, one measurement per calibration point. While the pointing error at the first two calibration points |$\pmb {p}_0,\pmb {p}_1$| seems to be perfectly described by the model, we observe extra uncertainty in the rest. We also observe small offsets in the non-visited points of the calibration patterns with respect to their predicted means, all of them smaller than 1σ.

Figure 6. Histograms of the normalized residual, using prior information. 10⁴ calibrations, two calibration points, one measurement per calibration point.

Figure 7. Histograms of the residual RMS, using prior information. 10⁴ calibrations, two calibration points, one measurement per calibration point.

Repeating the test with twice the number of measurements, the model still needs two different calibration points to achieve the calibration goal. Apart from a lower uncertainty at the sampled points, the impact on performance is negligible (Figs 8, 9, and 10).

Figure 8. Histogram of the normalized residuals with respect to the predicted residuals, two measurements per calibration point.

Figure 9. Histograms of the normalized residual, using prior information. 10⁴ calibrations, two calibration points, two measurements per calibration point.

Figure 10. Histograms of the residual RMS, using prior information. 10⁴ calibrations, two calibration points, two measurements per calibration point.

7.3 Impact of the number of classical calibrations

In the two previous tests, we made the choice of M = 100 calibrations based on somewhat vague assumptions. In order to optimize this choice, we repeated the simulations using prior information from M = 25, 50, 100, and 200 classical calibrations.

Results of these tests are summarized in Fig. 11. In all cases, the calibration goal was met after measuring two calibration points. In the M = 25 case, the tails of the RMS histogram were considerably larger than those of the classical calibration. None the less, the performance of the algorithm can be made comparable to that of the previous calibrations simply by taking one extra measurement at each point (Fig. 12).

Figure 11. Quartiles of the RMS after 10⁴ simulations, compared with the predicted uncertainty (two measurements needed in total). Horizontal lines represent the quartiles of the RMS of a classical calibration using all 10 points.

Figure 12. RMS histogram with M = 25 and 10⁴ simulations. Bayesian calibrations were performed with one measurement per point (top panel) and two measurements per point (bottom panel). Classical calibrations were performed with all 10 calibration points, one measurement per calibration point.

It should be noted that the smallest M is not necessarily the best choice in all cases. In particular, for M = 25, initial Bayesian calibrations required two measurements per calibration point to surpass the performance of the classical calibration, but only one measurement per calibration point in subsequent calibrations. This evolution in the number of measurements may not be desirable in contexts in which a repeatable behaviour of the uncertainty is desired.

In our case, M = 100 ensures both a reasonable number of previous calibrations (if we assumed one geometrical calibration per day, this number would be achieved in approximately 3 months) and a stable number of required measurements.

7.4 Performance with respect to the classical calibration

We finish by comparing the performance of the classical calibration with that of the Bayesian calibration. We do this by performing 10⁴ simulations of each, and calculating the RMS of the residuals as measured at the 90 locations defined by the redundant OCS pattern. For the Bayesian calibration, we considered two cases: the informative case (as described in Algorithm 1) and the uninformative case (forcing |$\pmb {\alpha }_0=\pmb {0}$| and |$\pmb {\Sigma }_0=(200\ {\mu \rm m})^2\mathbb {I}$|⁠).

Algorithm 1. Bayesian calibration algorithm.

In the uninformative case, calibrations were carried out by performing one simulated measurement of the distortion in each of the 10 calibration points of the non-redundant OCS pattern. Then, we calculated the histograms of the residual RMS and overlaid them in Fig. 13. From these results, it is evident that the performance of the uninformative Bayesian calibration is equivalent to that of the classical calibration. This is an expected consequence of the removal of the shrinkage introduced by the prior distributions: in the limiting case in which the priors are absolutely flat, |$\pmb {\alpha }_1$| tends to the MLE for |$\pmb {\alpha }$|⁠.
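The flat-prior limit can be illustrated with the posterior mean of a standard Gaussian linear model (a generic sketch, not the paper's exact update): as the prior covariance grows, the shrinkage towards the prior mean disappears and the posterior mean converges to the least-squares (MLE) solution.

```python
import numpy as np

def posterior_mean(G, eps, sigma_e, alpha0, Sigma0):
    """Posterior mean of the coefficients for a Gaussian likelihood with
    noise sigma_e and Gaussian prior N(alpha0, Sigma0)."""
    P0 = np.linalg.inv(Sigma0)                  # prior precision
    A = G.T @ G / sigma_e**2 + P0
    b = G.T @ eps / sigma_e**2 + P0 @ alpha0
    return np.linalg.solve(A, b)

rng = np.random.default_rng(1)
G = rng.normal(size=(20, 4))                    # toy collocation matrix
alpha_true = rng.normal(size=4)
eps = G @ alpha_true + rng.normal(0.0, 0.01, size=20)
alpha_mle = np.linalg.lstsq(G, eps, rcond=None)[0]
# A nearly flat prior removes the shrinkage: the posterior mean -> MLE.
alpha_flat = posterior_mean(G, eps, 0.01, np.zeros(4), (200.0**2) * np.eye(4))
```

With an informative (tight) prior, the same routine instead pulls the estimate towards `alpha0`, which is the shrinkage exploited by the informative case below.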

Figure 13. Residual RMS histograms for the classical and Bayesian calibration, no prior information, 10⁴ calibrations. Both Bayesian and classical calibrations were performed with all 10 calibration points, one measurement per calibration point.

The informative case was performed by estimating |$\bar{\pmb {\alpha }}$| and |$\pmb {Q}$| from M = 100 previously simulated classical calibrations, and allowing |$\pmb {\alpha }_0, \pmb {\Sigma }_0$| to be drawn from the repeatability priors. In this case, the calibration goal was achieved after measuring only two calibration points. The residual RMS histograms are shown in Fig. 14. Even though the original estimate of the RMS remained at 0.38 μm, we observe that the Bayesian calibration now outperforms the classical calibration. This illustrates the ability of the model to capture the repeatability of the highest order distortion and shrink measurements towards the mean.

Figure 14. Residual RMS histograms for the classical and Bayesian calibration, using prior information, 10⁴ calibrations. Bayesian calibrations used two calibration points, one measurement per calibration point. Classical calibrations were performed with all 10 calibration points, one measurement per calibration point.

If we further increase the number of measurements per point to two, we observe an additional reduction of the RMS towards the measurement noise (Fig. 15). This, once again, illustrates the ability of the model to acknowledge the increase of measurement precision gained by repeating the same measurements twice.

Figure 15. Residual RMS histograms for the classical and Bayesian calibration, using prior information, 10⁴ calibrations. Bayesian calibrations used two calibration points, two measurements per calibration point. Classical calibrations were performed with all 10 calibration points, one measurement per calibration point.

8. OTHER APPLICATIONS

The freedom of choice of basis functions allows this method to be applied beyond the calibration of geometrical distortions of an instrument like HARMONI. In particular, this method could be applied to the calibration of a pointing model of a ground-based telescope.

For instance, we can identify F with the set of azimuthal coordinates of calibration stars in the sky and F′ with the coordinates to which the telescope must be commanded to observe these stars in the centre of their science field. The fact that sky coordinates are a particular case of spherical coordinates motivates a multipole expansion of the pointing error as

(51)

with |$Y^{m}_{l}(\theta ,\phi )$| being Laplace’s spherical harmonics, and also the basis functions of this pointing model. This series can be truncated at l = d, with d being the order of the resulting model. For a dth order spherical harmonic model, one must keep the first (d + 1)2 spherical harmonics with 0 ≤ ld.
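As a sketch (function names are ours), the (d + 1)² basis functions of a dth order model can be evaluated from associated Legendre functions:

```python
import numpy as np
from math import factorial
from scipy.special import lpmv

def sph_harm_Y(l, m, pol, az):
    """Laplace spherical harmonic Y_l^m (Condon-Shortley convention),
    with polar angle `pol` and azimuth `az` in radians."""
    am = abs(m)
    norm = np.sqrt((2 * l + 1) / (4 * np.pi)
                   * factorial(l - am) / factorial(l + am))
    y = norm * lpmv(am, l, np.cos(pol)) * np.exp(1j * am * az)
    if m < 0:
        y = (-1) ** am * np.conj(y)   # Y_l^{-m} from the conjugate relation
    return y

def sph_basis(d, pol, az):
    """Evaluate the first (d + 1)^2 basis functions of a d-th order model."""
    return np.array([sph_harm_Y(l, m, pol, az)
                     for l in range(d + 1)
                     for m in range(-l, l + 1)])

basis = sph_basis(2, 1.1, 0.3)   # 2nd-order model: 9 basis functions
```

The nested loop makes the counting explicit: for each 0 ≤ l ≤ d there are 2l + 1 azimuthal orders, summing to (d + 1)² terms.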

A similar argument to the one used in the derivation of OCS can be made for a calibration pattern with spherical harmonics. In this case, additional constraints exist regarding the availability of stars near the calibration points and the time required to complete the calibration. Dette et al. (2005) discuss a family of sampling patterns for spherical harmonic expansions that satisfy different optimality criteria (Kiefer 1974). Unlike OCS, these criteria minimize a property of the inverse of the information matrix of a given sampling pattern, such as its determinant, its trace, or its greatest eigenvalue.

The proposed patterns consist of the Cartesian products of two samplings in azimuth and elevation. While the azimuth sampling is uniform, the elevation sampling is calculated from the nodes of quadrature formulas with uniform weights.

9. CONCLUSIONS AND FUTURE WORK

  • We proposed a general corrective model that is able to describe the quasi-static geometric distortion in the focal plane of an instrument.

  • The Bayesian formulation of the calibration problem enables incremental calibration, informed updating of the uncertainties in the model parameters, and informative predictions of the distortion.

  • Given that proper priors are provided, the Bayesian calibration algorithm is able to outperform the classical calibration algorithm with fewer measurements.

  • We applied this model to HARMONI’s field distortion under certain simplifying assumptions like highly repeatable GCU mask distortion, no bias in guide probe positioning, and isotropic and homogeneous measurement noise.

  • The classical formulation of the calibration problem was used to reduce the number of points in the GCU mask while keeping a good immunity to measurement noise. As a result of this trade-off, the redundant OCS pattern was proposed.

  • The current formulation of the corrective model can be easily augmented to include anisotropic pointing error measurement, as well as position-dependent measurement errors of the guide probes.

For future work, we plan to extend the instrumental model to incorporate the effect of the systematic positioning errors of the guide probe mechanisms. As these errors are of mechanical origin, we expect them to be one of the biggest contributions to the total instrumental error.

We also intend to couple the corrective model to distortions of the GCU mask itself, due to thermal expansion of the supporting structure. Additionally, while the current estimates of the RMS can be used as sufficiently good termination conditions for calibration, they still overestimate the distribution of observed RMS. The cause of this overestimation can be tied to the crude approximation of the RMS distribution (approximation of the MSE distribution as a Gaussian plus a first order expansion of the square root) and to the choice of the calibration coefficients (we are not sampling |$\pmb {\alpha }$| but picking |$\pmb {\alpha }_{n+1}$|⁠, which is known with less uncertainty). In this regard, we plan to develop the estimation of the RMS further to produce more realistic predictions of the observed RMS.

ACKNOWLEDGEMENTS

GCC and JPL acknowledge support from grants PID2019-105423GA-I00 and PID2022-140483NB-C22 funded by AEI (10.13039/501100011033) and BDC 20221289 funded by MCI through the Recovery, Transformation and Resilience Plan from the Spanish State, and by NextGenerationEU from the European Union through the Recovery and Resilience Facility. MPS acknowledges funding support from the Ramón y Cajal program of the Spanish Ministerio de Ciencia e Innovación (RYC2021-033094-I).

DATA AVAILABILITY

Source code of the simulation scripts, data products, and Jupyter Lab notebooks are freely accessible from the GitHub repository at https://github.com/BatchDrake/bayescal, under the terms of the 3-clause BSD license.

Footnotes

1

Proof in Appendix B2.

References

Augustin R. et al., 2019, MNRAS, 489, 2417

Biesheuvel R. S., Janssen A. J. E. M., Pozzi P., Pereira S. F., 2018, OSA Continuum, 1, 581

Bounissou S., Thatte N., Zieleniewski S., Houghton R. C. W., Tecza M., Hook I., Neichel B., Fusco T., 2018, MNRAS, 478, 3189

Carracedo-Carballal G. J., 2022, harmoni-pm GitHub repository. GitHub. Available at: https://github.com/BatchDrake/harmoni-pm

Carracedo-Carballal G. J., 2023, bayescal GitHub repository. GitHub. Available at:

Carracedo-Carballal G. J., Piqueras-López J., Clarke F., 2022, in Ibsen J., Chiozzi G., eds, Proc. SPIE Conf. Ser. Vol. 12189, Software and Cyberinfrastructure for Astronomy VII. SPIE, Bellingham, p. 121892G

Dette H., Melas V. B., Pepelyshev A., 2005, Ann. Stat., 33, 2758

Dohlen K. et al., 2022, Proc. SPIE Conf. Ser. Vol. 12187, Modeling, Systems Engineering, and Project Management for Astronomy X. SPIE, Bellingham, p. 121871D

Estrada A. et al., 2022, in Navarro R., Geyl R., eds, Proc. SPIE Conf. Ser. Vol. 12188, Advances in Optical and Mechanical Technologies for Telescopes and Instrumentation V. SPIE, Bellingham, p. 121883O

García-Lorenzo B., Monreal-Ibero A., Pereira-Santaella M., Thatte N., Ramos Almeida C., Galbany L., Mediavilla E., 2022, A&A, 659, A79

Gelman A., Carlin J. B., Stern H. S., Dunson D. B., Vehtari A., Rubin D. B., 2021, Bayesian Data Analysis, 3rd edn. CRC Press, Boca Raton, FL

Grisdale K., Thatte N., Devriendt J., Pereira-Santaella M., Slyz A., Kimm T., Dubois Y., Yi S. K., 2021, MNRAS, 501, 5517

Grisdale K. et al., 2022, MNRAS, 513, 3906

Kendrew S. et al., 2016, MNRAS, 458, 2405

Kiefer J., 1974, Ann. Stat., 2, 849

Nguyen D. D., Cappellari M., Pereira-Santaella M., 2023, MNRAS, 526, 3548

Pereira-Santaella M., Routledge L., Zieleniewski S., Kendrew S., 2023, HSIM 3 GitHub repository. GitHub. Available at:

Ramos-López D., Sánchez-Granero M., Fernández-Martínez M., Martínez-Finkelshtein A., 2016, Appl. Math. Comput., 274, 247

Richardson M. L. A., Routledge L., Thatte N., Tecza M., Houghton R. C. W., Pereira-Santaella M., Rigopoulou D., 2020, MNRAS, 498, 1891

Rubin D. B., 1984, Ann. Stat., 12, 1151

Thatte N. et al., 2021, The Messenger, 182, 7

Thatte N. et al., 2022a, in Proc. SPIE Conf. Ser. Vol. 12184, Ground-based and Airborne Instrumentation for Astronomy IX. SPIE, Bellingham, p. 1218420

Thibos L. N., Applegate R. A., Schwiegerling J. T., Webb R., 2002, J. Refract. Surg., 18, S652

Zieleniewski S., Thatte N., Kendrew S., Houghton R., Swinbank A., Tecza M., Clarke F., Fusco T., 2015, MNRAS, 453, 3755

APPENDIX A: REAL-VALUED COLLOCATION MATRIX

Let |$\hat{\pmb {p}} = {(\hat{p}^0,\hat{p}^1,\dots ,\hat{p}^{K-1})}^\top \in \mathbb {C}^K$| be a complex vector of evaluation points in the relayed focal plane, |$\hat{\pmb {\alpha }} = {(\hat{\alpha }^0,\hat{\alpha }^1,\dots ,\hat{\alpha }^{J-1})}^\top \in \mathbb {C}^J$| the complex vector of coefficients of the corrective model, and |$\hat{\pmb {\varepsilon }}={(\hat{\varepsilon }^0,\hat{\varepsilon }^1,\dots ,\hat{\varepsilon }^{K-1})}^\top \in \mathbb {C}^K$| the vector of pointing errors evaluated in |$\hat{\pmb {p}}$| in complex form. Predictions of the pointing error can be made by evaluating

(A1)

in which the coefficients |$\hat{G}^k_j$| of the complex collocation matrix |$\hat{\pmb {G}}$| are obtained from evaluations of the first J basis functions |$\hat{G}_j$| as |$\hat{G}^k_j=\hat{G}_j(\hat{p}^k)$|⁠.

In order to obtain a real-valued expression for the pointing error in both axes, we need to decompose all complex quantities into their real and imaginary parts:

(A2)
(A3)
(A4)

Now, equation (A1) can be expanded to

(A5)

By grouping real and imaginary quantities, we arrive at the following set of real-valued equations:

(A6)
(A7)

The quantities |$\varepsilon ^k_d,d\in \lbrace x, y\rbrace$| are still enumerated by two indices. Since single-indexed vectors are more convenient from the implementation perspective, we can rely on the following interlaced representation of complex vectors:

|$\varepsilon ^{2k} = \varepsilon ^k_x,\quad \varepsilon ^{2k+1} = \varepsilon ^k_y,$| (A8)
|$\alpha ^{2j} = \alpha ^j_x,\quad \alpha ^{2j+1} = \alpha ^j_y,$| (A9)

which assigns even indices to the real parts and odd indices to the imaginary parts. Under this representation, equations (A6) and (A7) can be written as

|$\varepsilon ^{2k} = \sum _{j=0}^{J-1}\left(G^k_{j,x}\alpha ^{2j} - G^k_{j,y}\alpha ^{2j+1}\right),$| (A10)
|$\varepsilon ^{2k+1} = \sum _{j=0}^{J-1}\left(G^k_{j,y}\alpha ^{2j} + G^k_{j,x}\alpha ^{2j+1}\right),$| (A11)

or, in a more compact form

|$\pmb {\varepsilon } = \pmb {G}\pmb {\alpha },$| (A12)

where |$\pmb {\varepsilon }\in \mathbb {R}^{2K}$|⁠, |$\pmb {\alpha }\in \mathbb {R}^{2J}$|⁠, and |$\pmb {G}\in \mathbb {R}^{2K\times 2J}$|⁠. The coefficients of the real collocation matrix |$G_p^q$| can be expressed in terms of the real and imaginary parts of the coefficients of the complex collocation matrix |$\hat{G}^k_j$| as

|$G^{2k}_{2j} = G^k_{j,x},$| (A13)
|$G^{2k}_{2j+1} = -G^k_{j,y},$| (A14)
|$G^{2k+1}_{2j} = G^k_{j,y},$| (A15)
|$G^{2k+1}_{2j+1} = G^k_{j,x}.$| (A16)
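The interlacing of equations (A8)-(A9) and the block structure of equations (A13)-(A16) translate directly into array slicing. The following sketch (function names are illustrative, assuming NumPy) builds the real collocation matrix from the complex one and checks that real and complex evaluations agree:

```python
import numpy as np

def real_collocation_matrix(G_hat):
    """Build the real collocation matrix G (2K x 2J) from the complex
    collocation matrix G_hat (K x J), following equations (A13)-(A16)."""
    K, J = G_hat.shape
    G = np.empty((2 * K, 2 * J))
    G[0::2, 0::2] = G_hat.real   # G^{2k}_{2j}     =  Re(G^k_j)
    G[0::2, 1::2] = -G_hat.imag  # G^{2k}_{2j+1}   = -Im(G^k_j)
    G[1::2, 0::2] = G_hat.imag   # G^{2k+1}_{2j}   =  Im(G^k_j)
    G[1::2, 1::2] = G_hat.real   # G^{2k+1}_{2j+1} =  Re(G^k_j)
    return G

def interlace(z):
    """Interlaced real representation of a complex vector, eqs. (A8)-(A9):
    even indices hold real parts, odd indices imaginary parts."""
    out = np.empty(2 * z.size)
    out[0::2] = z.real
    out[1::2] = z.imag
    return out
```

By construction, multiplying the real matrix by the interlaced coefficient vector reproduces the interlaced complex product of equation (A1).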

APPENDIX B: POSTERIOR DISTRIBUTIONS

This paper introduces Bayesian updating formulae for two distributions: the calibration-time coefficient distribution and the model repeatability distributions. These distributions have been chosen to be conjugate with respect to their corresponding likelihood functions, so that the posterior distribution is analytic and of the same form as the prior distribution. This section describes how the Bayesian updating formulae have been obtained in each case.

B1 Model coefficient distribution

Let |$\pmb {\alpha }=(\alpha ^0,\alpha ^1,\dots ,\alpha ^{2J-1})^\top \in \mathbb {R}^{2J}$| be the vector of real and imaginary parts of the model coefficients, |$\sigma ^2_{\rm E}$| the variance of the measurement noise, |$\pmb {\varepsilon }=(\varepsilon ^0,\varepsilon ^1,\dots ,\varepsilon ^{2K-1})^\top \in \mathbb {R}^{2K}$| the real vector constructed from measurements of the pointing error in the horizontal and vertical axes at K measurement points, and |$\pmb {p}={(p^0,p^1,\dots ,p^{2K-1})}^\top$| the real vector constructed from the alternating horizontal and vertical coordinates of the K measurement points.

Let |$p(\pmb {\alpha })$| be the prior probability density of the coefficient vector, which encodes our knowledge of the model coefficients. According to Bayes' theorem, a new observation |$\pmb {\varepsilon }$| updates the probability density |$p(\pmb {\alpha })$| as

|$p(\pmb {\alpha }|\pmb {\varepsilon }) = \dfrac{p(\pmb {\varepsilon }|\pmb {\alpha },\pmb {p},\sigma ^2_{\rm E})\,p(\pmb {\alpha })}{p(\pmb {\varepsilon })},$| (B1)

where |$p(\pmb {\varepsilon }|\pmb {\alpha },\pmb {p},\sigma ^2_{\rm E})$| is just the likelihood function introduced in equation (15) and |$p(\pmb {\varepsilon })$| acts as a mere normalization constant. Equation (B1) can therefore be written as

|$p(\pmb {\alpha }|\pmb {\varepsilon }) \propto p(\pmb {\varepsilon }|\pmb {\alpha },\pmb {p},\sigma ^2_{\rm E})\,p(\pmb {\alpha }).$| (B2)

If |$p(\pmb {\alpha })$| is modelled as a multivariate Gaussian with mean vector |$\pmb {\alpha }_n$| and covariance matrix |$\pmb {\Sigma }_{n}$|

|$p(\pmb {\alpha }) = \mathcal {N}(\pmb {\alpha }\,|\,\pmb {\alpha }_n, \pmb {\Sigma }_n),$| (B3)

the posterior |$p(\pmb {\alpha }|\pmb {\varepsilon })$| will also be multivariate Gaussian, as it is conjugate with respect to the likelihood function (15):

|$p(\pmb {\alpha }|\pmb {\varepsilon }) = \mathcal {N}(\pmb {\alpha }\,|\,\pmb {\alpha }_{n+1}, \pmb {\Sigma }_{n+1}),$| (B4)

where the sub-indices n and n + 1 have been conveniently chosen to distinguish between prior hyperparameters and posterior hyperparameters. |$p(\pmb {\alpha })$| can be written as

|$p(\pmb {\alpha }) = \dfrac{1}{\sqrt{(2\pi )^{2J}|\pmb {\Sigma }_n|}}\exp \left[-\dfrac{1}{2}{(\pmb {\alpha }-\pmb {\alpha }_n)}^\top \pmb {\Sigma }_n^{-1}(\pmb {\alpha }-\pmb {\alpha }_n)\right].$| (B5)

Since all distributions in equation (B2) are of the exponential family, we can take the logarithms in both sides of the equation and remove the common factors:

|$-\dfrac{1}{2}{(\pmb {\alpha }-\pmb {\alpha }_{n+1})}^\top \pmb {\Sigma }_{n+1}^{-1}(\pmb {\alpha }-\pmb {\alpha }_{n+1}) = -\dfrac{1}{2\sigma ^2_{\rm E}}{(\pmb {\varepsilon }-\pmb {G}\pmb {\alpha })}^\top (\pmb {\varepsilon }-\pmb {G}\pmb {\alpha }) - \dfrac{1}{2}{(\pmb {\alpha }-\pmb {\alpha }_n)}^\top \pmb {\Sigma }_n^{-1}(\pmb {\alpha }-\pmb {\alpha }_n) + S,$| (B6)

where S is a normalization term that does not depend on |$\pmb {\alpha }$|. Both sides of the identity (B6) are therefore second-degree polynomials in |$\pmb {\alpha }$|.

We introduce now the precision matrix |$\pmb {\Lambda }_n=\pmb {\Sigma }_n^{-1}$|⁠. By definition, all covariance matrices |$\pmb {\Sigma }_n$| are symmetric and positive definite, hence the corresponding precision matrices |$\pmb {\Lambda }_n$| exist and must also be symmetric and positive definite. This implies that |${\pmb {u}}^\top \pmb {\Lambda }_n\pmb {v} = {\pmb {v}}^\top \pmb {\Lambda }_n\pmb {u}$| for any |$\pmb {u},\pmb {v}\in \mathbb {R}^{2J}$|⁠. By exploiting this property, we can further expand (B6) into

|${\pmb {\alpha }}^\top \pmb {\Lambda }_{n+1}\pmb {\alpha } - 2{\pmb {\alpha }}^\top \pmb {\Lambda }_{n+1}\pmb {\alpha }_{n+1} = {\pmb {\alpha }}^\top {\pmb {G}}^\top \sigma _{\rm E}^{-2}\pmb {G}\pmb {\alpha } - 2{\pmb {\alpha }}^\top {\pmb {G}}^\top \sigma _{\rm E}^{-2}\pmb {\varepsilon } + {\pmb {\alpha }}^\top \pmb {\Lambda }_n\pmb {\alpha } - 2{\pmb {\alpha }}^\top \pmb {\Lambda }_n\pmb {\alpha }_n + S^{\prime }.$| (B7)

By grouping the quadratic terms in |$\pmb {\alpha }$|, we see that |${\pmb {\alpha }}^\top \pmb {\Lambda }_{n+1}\pmb {\alpha }={\pmb {\alpha }}^\top {\pmb {G}}^\top \sigma _{\rm E}^{-2}\pmb {G}\pmb {\alpha } + {\pmb {\alpha }}^\top \pmb {\Lambda }_n\pmb {\alpha }={\pmb {\alpha }}^\top \left({\pmb {G}}^\top \sigma _{\rm E}^{-2}\pmb {G} + \pmb {\Lambda }_n\right)\pmb {\alpha }$|. This allows us to conclude that

|$\pmb {\Lambda }_{n+1} = {\pmb {G}}^\top \sigma _{\rm E}^{-2}\pmb {G} + \pmb {\Lambda }_n,$| (B8)

which can be interpreted as the Bayesian updating formula for the inverse of the covariance matrix |$\pmb {\Sigma }_n$|⁠.

By repeating the process on the linear terms in |$\pmb {\alpha }$| and removing common factors, we see that |${\pmb {\alpha }}^\top \pmb {\Lambda }_{n+1}\pmb {\alpha }_{n+1}={{\pmb {\alpha }}^\top {\pmb {G}}^\top }\sigma _{\rm E}^{-2}{\pmb {\varepsilon }}+{\pmb {\alpha }}^\top \pmb {\Lambda }_n\pmb {\alpha }_n$|⁠. Since |${\pmb {\alpha }}^\top$| is a common factor in this identity, we can rewrite it as

|${\pmb {\alpha }}^\top \left(\pmb {\Lambda }_{n+1}\pmb {\alpha }_{n+1} - {\pmb {G}}^\top \sigma _{\rm E}^{-2}\pmb {\varepsilon } - \pmb {\Lambda }_n\pmb {\alpha }_n\right) = 0,$| (B9)

which must hold for any |$\pmb {\alpha }$| and therefore |$\pmb {\Lambda }_{n+1}\pmb {\alpha }_{n+1}={\pmb {G}}^\top \sigma _{\rm E}^{-2}\pmb {\varepsilon }+\pmb {\Lambda }_n\pmb {\alpha }_n$|⁠. By solving now for |$\pmb {\alpha }_{n+1}$|

|$\pmb {\alpha }_{n+1} = \pmb {\Lambda }_{n+1}^{-1}\left({\pmb {G}}^\top \sigma _{\rm E}^{-2}\pmb {\varepsilon } + \pmb {\Lambda }_n\pmb {\alpha }_n\right),$| (B10)

which can be interpreted as the Bayesian updating formula for the mean vector |$\pmb {\alpha }_n$|⁠. The full Bayesian updating formula for the model coefficient distribution is therefore written as

|$\pmb {\Sigma }_{n+1} = {\left({\pmb {G}}^\top \sigma _{\rm E}^{-2}\pmb {G} + \pmb {\Sigma }_n^{-1}\right)}^{-1},$| (B11)
|$\pmb {\alpha }_{n+1} = \pmb {\Sigma }_{n+1}\left({\pmb {G}}^\top \sigma _{\rm E}^{-2}\pmb {\varepsilon } + \pmb {\Sigma }_n^{-1}\pmb {\alpha }_n\right).$| (B12)
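The updating formulae (B11)-(B12) amount to a few dense linear-algebra operations. A minimal sketch, assuming NumPy and an illustrative function name:

```python
import numpy as np

def bayes_update(alpha_n, Sigma_n, G, eps, sigma_E2):
    """One Bayesian update of the coefficient distribution, eqs. (B11)-(B12).

    alpha_n  : prior mean (2J,)
    Sigma_n  : prior covariance (2J, 2J)
    G        : real collocation matrix (2K, 2J)
    eps      : measured pointing errors (2K,)
    sigma_E2 : measurement-noise variance
    """
    Lambda_n = np.linalg.inv(Sigma_n)            # prior precision matrix
    Lambda_np1 = G.T @ G / sigma_E2 + Lambda_n   # eq. (B8) / inverse of (B11)
    Sigma_np1 = np.linalg.inv(Lambda_np1)
    alpha_np1 = Sigma_np1 @ (G.T @ eps / sigma_E2 + Lambda_n @ alpha_n)  # (B12)
    return alpha_np1, Sigma_np1
```

Successive batches of measurements can be chained by feeding the returned posterior back in as the next prior.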

B2 Connection to the classical calibration

When K = J, a classical solution to the calibration problem can be obtained by the formula

|$\pmb {\alpha } = \pmb {G}^{-1}\pmb {\varepsilon },$| (B13)

with |$\pmb {\alpha }, \pmb {\varepsilon }\in \mathbb {R}^{2J}$| and |$\pmb {G}\in \mathbb {R}^{2J\times 2J}$|⁠. This solution can be connected to a limiting case of the Bayesian update when a batch of J measurements has been used as evidence and the prior |$p(\pmb {\alpha })$| tends to an infinitely uninformative distribution. From equations (B11)–(B12), we obtain

|$\pmb {\Sigma }_1 = {\left({\pmb {G}}^\top \sigma _{\rm E}^{-2}\pmb {G} + \pmb {\Sigma }_0^{-1}\right)}^{-1},$| (B14)
|$\pmb {\alpha }_1 = \pmb {\Sigma }_1\left({\pmb {G}}^\top \sigma _{\rm E}^{-2}\pmb {\varepsilon } + \pmb {\Sigma }_0^{-1}\pmb {\alpha }_0\right),$| (B15)

where |$\pmb {G}\in \mathbb {R}^{2J\times 2J}$| is now the classical real collocation matrix.

If we introduce a prior for the covariance matrix of the form |$\pmb {\Sigma }_0=\sigma _0^2\mathbb {I}$| (i.e. same variance |$\sigma _0^2$| for all coefficients and no correlation between them), we can analyse the limiting case of an infinitely uninformative prior distribution by taking the limits of |$\pmb {\Sigma }_1$| and |$\pmb {\alpha }_1$| when |$\sigma ^2_0\rightarrow \infty$|⁠:

|$\lim _{\sigma ^2_0\rightarrow \infty }\pmb {\Sigma }_1 = \sigma ^2_{\rm E}{\left({\pmb {G}}^\top \pmb {G}\right)}^{-1},$| (B16)
|$\lim _{\sigma ^2_0\rightarrow \infty }\pmb {\alpha }_1 = {\left({\pmb {G}}^\top \pmb {G}\right)}^{-1}{\pmb {G}}^\top \pmb {\varepsilon } = \pmb {G}^{-1}\pmb {\varepsilon },$| (B17)

from which we see that |$\pmb {\alpha }_1$| equals the classical solution as written in equation (B13).
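This limiting behaviour can be checked numerically: a single Bayesian update from a nearly flat prior, with a square collocation matrix, should reproduce the classical solution of equation (B13). A sketch assuming NumPy (the variance values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
J2 = 6                          # 2J, for a model with J = 3 complex coefficients
G = rng.normal(size=(J2, J2))   # square real collocation matrix (K = J)
eps = rng.normal(size=J2)       # measured pointing errors
sigma_E2 = 1e-4                 # measurement-noise variance
sigma_02 = 1e12                 # nearly uninformative prior variance

# Single Bayesian update from the flat prior (alpha_0 = 0, Sigma_0 = sigma_02*I),
# following eqs. (B14)-(B15) in precision form.
Lambda_1 = G.T @ G / sigma_E2 + np.eye(J2) / sigma_02
alpha_1 = np.linalg.solve(Lambda_1, G.T @ eps / sigma_E2)

# Classical solution, eq. (B13)
alpha_classical = np.linalg.solve(G, eps)
```

For a well-conditioned G, the two solutions agree to numerical precision as the prior variance grows.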

We should remark that, as per equation (B11), sequential and batched updates have the same effect on the covariance matrix. This is no longer true for the mean vector |$\pmb {\alpha }_n$|, in which updates from previous observations condition the probability of future observations.

On the other hand, since equation (B16) still applies to sequential updates, we should expect a significant decrease of the total variance of |$\pmb {\Sigma }_J$| with respect to |$\pmb {\Sigma }_{J-1}$| (as it now contains information on all distortion modes).

B3 Posterior predictive distribution

The choice of conjugate priors also ensures the existence of a closed-form expression for the posterior predictive distribution. In order to find this expression, we consider a collocation matrix evaluated in N distinct test points |$\tilde{\pmb {G}}\in \mathbb {R}^{2N\times 2J}$|⁠, with N < J. We start by remarking that |$p(\tilde{\pmb {\varepsilon }}|\pmb {\varepsilon })$| can be obtained from the following marginalization:

|$p(\tilde{\pmb {\varepsilon }}|\pmb {\varepsilon }) = \int p(\tilde{\pmb {\varepsilon }}|\pmb {\alpha },\sigma ^2_{\rm E})\,p(\pmb {\alpha }|\pmb {\varepsilon })\,{\rm d}\pmb {\alpha }.$| (B18)

Since |$p(\pmb {\alpha }|\pmb {\varepsilon })=p(\pmb {\alpha }|\pmb {\alpha }_{n+1},\pmb {\Sigma }_{n+1})$|⁠, it can also be written as

|$p(\tilde{\pmb {\varepsilon }}|\pmb {\varepsilon }) = \int p(\tilde{\pmb {\varepsilon }}|\pmb {\alpha },\sigma ^2_{\rm E})\,p(\pmb {\alpha }|\pmb {\alpha }_{n+1},\pmb {\Sigma }_{n+1})\,{\rm d}\pmb {\alpha }.$| (B19)

Since |$p(\tilde{\pmb {\varepsilon }}|\pmb {\alpha },\sigma ^2_{\rm E})$| depends on |$\pmb {\alpha }$| only through the mean |$\tilde{\pmb {G}}\pmb {\alpha }$|, we could tentatively introduce the change of variable |$\pmb {m}=\tilde{\pmb {G}}\pmb {\alpha }$|. Unfortunately, |$\tilde{\pmb {G}}$| is a rectangular projection matrix, and any change of variable requires the transformation to be bijective.

We can overcome this difficulty by remarking that |$\tilde{\pmb {G}}$| consists of 2N linearly independent row vectors |$\pmb {G}_i \in \mathbb {R}^{2J}$|. Let |$G = \text{span}\lbrace \pmb {G}_0,\dots ,\pmb {G}_{2N-1}\rbrace$| be the vector subspace spanned by the rows of |$\tilde{\pmb {G}}$|. It is clear that G is a proper subspace of |$\mathbb {R}^{2J}$|, as |$\dim G = 2N < 2J$|.

Let |$W\subset \mathbb {R}^{2J}$| be a subspace such that |$W\cap G=\lbrace \pmb {0}\rbrace$| and |$W\oplus G=\mathbb {R}^{2J}$|. If |$\pmb {w}_i \in W$| are a set of 2(J − N) basis vectors of W, we can complete the matrix |$\tilde{\pmb {G}}$| as follows:

|$\pmb {A} = \begin{pmatrix}\tilde{\pmb {G}} \\ {\pmb {w}_0}^\top \\ \vdots \\ {\pmb {w}_{2(J-N)-1}}^\top \end{pmatrix}.$| (B20)

Since by construction the matrix |$\pmb {A}\in \mathbb {R}^{2J\times 2J}$| is full rank and its rows span all |$\mathbb {R}^{2J}$|⁠, |$\pmb {A}^{-1}$| is guaranteed to exist. We can now introduce the following change of variable:

|$\pmb {m} = \pmb {A}\pmb {\alpha },$| (B21)

with |$\pmb {m}={(m_0,\dots ,m_{2J-1})}^\top \in \mathbb {R}^{2J}$|. This allows equation (B19) to be rewritten as

|$p(\tilde{\pmb {\varepsilon }}|\pmb {\varepsilon }) = \int p(\tilde{\pmb {\varepsilon }}|\pmb {m},\sigma ^2_{\rm E}\mathbb {I})\,p(\pmb {A}^{-1}\pmb {m}|\pmb {\alpha }_{n+1},\pmb {\Sigma }_{n+1})\,|\det \pmb {A}^{-1}|\,{\rm d}\pmb {m}.$| (B22)

We further note that |$p(\tilde{\pmb {\varepsilon }}|\pmb {m},\sigma ^2_{\rm E}\mathbb {I})$| does not depend on the extra dimensions we added to complete |$\pmb {A}$|. This allows us to marginalize only over the first 2N components of |$\pmb {m}$|:

|$p(\tilde{\pmb {\varepsilon }}|\pmb {\varepsilon }) = \int p(\tilde{\pmb {\varepsilon }}|m_0,\dots ,m_{2N-1},\sigma ^2_{\rm E}\mathbb {I})\,p(\pmb {A}^{-1}\pmb {m}|\pmb {\alpha }_{n+1},\pmb {\Sigma }_{n+1})\,|\det \pmb {A}^{-1}|\,{\rm d}\pmb {m},$| (B23)

where we kept the dependency on |$m_{2N},\dots ,m_{2J-1}$| as it is still necessary to compute |$\pmb {A}^{-1}\pmb {m}$|. Note that |$p(\pmb {A}^{-1} \pmb {m}|\pmb {\alpha }_{n+1},\pmb {\Sigma }_{n+1})$| is also a multivariate Gaussian probability distribution. Its logarithmic probability density takes the form

|$\log p(\pmb {A}^{-1}\pmb {m}|\pmb {\alpha }_{n+1},\pmb {\Sigma }_{n+1}) = -\dfrac{1}{2}{(\pmb {A}^{-1}\pmb {m}-\pmb {\alpha }_{n+1})}^\top \pmb {\Sigma }_{n+1}^{-1}(\pmb {A}^{-1}\pmb {m}-\pmb {\alpha }_{n+1}) + C.$| (B24)

By writing |$\pmb {\Sigma }_{n+1}^{-1}={(\pmb {A}^{-1}\pmb {A})}^\top \pmb {\Sigma }_{n+1}^{-1}\pmb {A}^{-1}\pmb {A}$|⁠, equation (B24) becomes

|$\log p(\pmb {A}^{-1}\pmb {m}|\pmb {\alpha }_{n+1},\pmb {\Sigma }_{n+1}) = -\dfrac{1}{2}{(\pmb {m}-\pmb {A}\pmb {\alpha }_{n+1})}^\top {\left(\pmb {A}\pmb {\Sigma }_{n+1}{\pmb {A}}^\top \right)}^{-1}(\pmb {m}-\pmb {A}\pmb {\alpha }_{n+1}) + C,$| (B25)

from which we identify the exponent of a multivariate Gaussian with mean |$\pmb {A}\pmb {\alpha }_{n+1}$| and covariance matrix |$\pmb {A}\pmb {\Sigma }_{n+1}{\pmb {A}}^\top$|:

|$p(\pmb {A}^{-1}\pmb {m}|\pmb {\alpha }_{n+1},\pmb {\Sigma }_{n+1})\,|\det \pmb {A}^{-1}| = \mathcal {N}\!\left(\pmb {m}\,|\,\pmb {A}\pmb {\alpha }_{n+1},\ \pmb {A}\pmb {\Sigma }_{n+1}{\pmb {A}}^\top \right).$| (B26)

The multivariate Gaussian density |$p(\tilde{\pmb {\varepsilon }}|\pmb {m},\sigma ^2_{\rm E}\mathbb {I})$| can also be written as |$p(\tilde{\pmb {\varepsilon }}-\pmb {m}^{\prime }|\pmb {0},\sigma ^2_{\rm E}\mathbb {I})$| with |$\pmb {m}^{\prime }={(m_0,\dots ,m_{2N-1})}^\top$|, as it does not depend on the extra dimensions we added to construct |$\pmb {A}$|. We are now ready to remove the dummy dependency on |$m_{2N},\dots ,m_{2J-1}$|, along with the corresponding rows and columns in the mean expression |$\pmb {A}\pmb {\alpha }_{n+1}$| (which now reduces to |$\tilde{\pmb {G}}\pmb {\alpha }_{n+1}$|) and the covariance matrix |$\pmb {A}\pmb {\Sigma }_{n+1}{\pmb {A}}^\top$| (which now reduces to |$\tilde{\pmb {G}}\pmb {\Sigma }_{n+1}{\tilde{\pmb {G}}}^\top$|):

|$p(\tilde{\pmb {\varepsilon }}|\pmb {\varepsilon }) = \int \mathcal {N}\!\left(\tilde{\pmb {\varepsilon }}-\pmb {m}^{\prime }\,|\,\pmb {0},\sigma ^2_{\rm E}\mathbb {I}\right)\mathcal {N}\!\left(\pmb {m}^{\prime }\,|\,\tilde{\pmb {G}}\pmb {\alpha }_{n+1},\ \tilde{\pmb {G}}\pmb {\Sigma }_{n+1}{\tilde{\pmb {G}}}^\top \right){\rm d}\pmb {m}^{\prime }.$| (B27)

We finish by remarking that equation (B27) is actually the convolution integral of two multivariate Gaussian distributions:

|$p(\tilde{\pmb {\varepsilon }}|\pmb {\varepsilon }) = \left[\mathcal {N}\!\left(\pmb {0},\sigma ^2_{\rm E}\mathbb {I}\right) * \mathcal {N}\!\left(\tilde{\pmb {G}}\pmb {\alpha }_{n+1},\ \tilde{\pmb {G}}\pmb {\Sigma }_{n+1}{\tilde{\pmb {G}}}^\top \right)\right](\tilde{\pmb {\varepsilon }}),$| (B28)

from which it follows that |$\tilde{\pmb {\varepsilon }}|\pmb {\varepsilon }$| is also a multivariate Gaussian variable of the form

|$\tilde{\pmb {\varepsilon }}|\pmb {\varepsilon } \sim \mathcal {N}\!\left(\tilde{\pmb {G}}\pmb {\alpha }_{n+1},\ \sigma ^2_{\rm E}\mathbb {I} + \tilde{\pmb {G}}\pmb {\Sigma }_{n+1}{\tilde{\pmb {G}}}^\top \right).$| (B29)
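The closed-form posterior predictive of equation (B29) is cheap to evaluate: a matrix-vector product for the mean and one sandwich product for the covariance. A minimal sketch, assuming NumPy (the function name is illustrative):

```python
import numpy as np

def posterior_predictive(G_tilde, alpha_post, Sigma_post, sigma_E2):
    """Posterior predictive Gaussian of the pointing error at new test
    points, eq. (B29): mean G_tilde @ alpha_post and covariance
    sigma_E2 * I + G_tilde @ Sigma_post @ G_tilde.T."""
    mean = G_tilde @ alpha_post
    cov = sigma_E2 * np.eye(G_tilde.shape[0]) \
        + G_tilde @ Sigma_post @ G_tilde.T
    return mean, cov
```

Note that the predictive covariance is bounded below by the measurement-noise floor: its eigenvalues are never smaller than |$\sigma^2_{\rm E}$|, since the model-uncertainty term is positive semidefinite.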

APPENDIX C: SCATTER PLOT MATRICES

In order to understand the stochastic behaviour of the corrective model coefficients, we simulated a total of 20 000 classical calibrations. Simulations were performed with HARMONI's pointing simulator harmoni-pm, assuming a corrective model with J = 10 complex coefficients. Each calibration was fed 553 simulated measurements of the pointing error along a uniform grid of points inside the technical field. Each simulated measurement took into account contributions such as manufacturing tolerances, optical distortions, and mechanical instabilities, and each calibration produced a |$\mathbb {C}^{10}$| vector of calibrated coefficients.

In order to represent the distribution of and correlations between coefficients, two scatter plot matrices were used: one for the real parts of the coefficients (Fig. C1) and another for the imaginary parts (Fig. C2). The plot in the ith row and jth column (with |$i\ne j$|) is a 2D scatter plot of the pair formed by the real (imaginary) parts of the coefficients |$\hat{\alpha }^i$| and |$\hat{\alpha }^j$|. The ith plot in the diagonal represents the histogram of the real (imaginary) part of the coefficient |$\hat{\alpha }^i$|. Since the purpose of these plots is to represent the shape of the underlying coefficient distribution, the simulated real (imaginary) parts of the coefficients were normalized by their respective sample means and standard deviations.

Figure C1.

Scatter plot matrix of the real part of the complex coefficient vector |$\hat{\pmb {\alpha }}$|⁠. In this figure, |$\alpha ^m_n$| represents the real part of the complex coefficient multiplying the polynomial |$\hat{G}^m_n$|⁠. Rows and columns are sorted by increasing OSA/ASI index, as per equation (46). Plot axes are normalized by the sample mean and standard deviation of the real parts of the respective coefficients.

Figure C2.

Scatter plot matrix of the imaginary part of the complex coefficient vector |$\hat{\pmb {\alpha }}$|⁠. In this figure, |$\alpha ^m_n$| represents the imaginary part of the complex coefficient multiplying the polynomial |$\hat{G}^m_n$|⁠. Rows and columns are sorted by increasing OSA/ASI index, as per equation (46). Plot axes are normalized by the sample mean and standard deviation of the imaginary parts of the respective coefficients.

The obtained scatter plot matrices are compatible with a multivariate Gaussian random variable, with slight correlations between certain coefficients.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.