Spatially constrained direction-dependent calibration

Yatawatta, Sarod

doi:10.1093/mnras/stab3643

ABSTRACT

Direction-dependent calibration of widefield radio interferometers estimates the systematic errors along multiple directions in the sky. This is necessary because most systematic errors, which are caused by effects such as the ionosphere or the receiver beam shape, vary significantly across the sky. Fortunately, in most situations these variations exhibit some deterministic behaviour. We enforce this underlying smooth spatial behaviour of systematic errors as an additional constraint on spectrally constrained direction-dependent calibration. Using both analysis and simulations, we show that this additional spatial constraint improves the performance of multifrequency direction-dependent calibration.

Instrumentation: interferometers, methods: numerical, techniques: interferometric

1 INTRODUCTION

One limitation for improving the quality of data obtained by modern widefield radio interferometric arrays is the systematic errors affecting the data. Calibration, or the determination and removal of systematic errors, is therefore a crucial data processing step in radio interferometry. When the observing field of view is wide, especially in low-frequency radio astronomy, the systematic errors vary with the direction in the sky and therefore, direction-dependent calibration is necessary. Numerous techniques exist for direction-dependent calibration and Cotton & Mauch (2021) give a comprehensive overview of the current state of the art.

By definition, ‘direction dependence’ implies that the systematic errors are spatially variable (in addition to temporal and spectral variability). We can categorize the spatial variation into two groups. In deterministic variability, the spatial variations are smooth and contiguous and they can be described by a simple model. Examples for such variations are the systematic errors caused by the main lobe of a phased array beam or the refraction caused by a benign ionosphere. The second category of spatial variability is random or stochastic. In this case, a simple model will not be able to completely describe the systematic errors. The side lobe patterns of a phased array beam or scintillation caused by a turbulent ionosphere are two examples for the causes of stochastic direction-dependent errors. In practice, the systematic errors are a combination of both deterministic and stochastic components, and the amount of each component is unknown.

Multifrequency direction-dependent calibration has already been improved by using the spectral smoothness of systematic errors as a regularizer (Yatawatta 2015; Brossard et al. 2018; Ollier et al. 2018; Yatawatta et al. 2018, 2020). Improving on this, in this paper, we add an implicit model for the spatial dependence of systematic errors as an extra regularizer. We favour an implicit model over an explicit model because the amount of deterministic behaviour of the direction dependence is unknown to us. For one observation, we might get well-defined direction dependence of systematic errors while for another observation, we might get completely random and uncorrelated direction dependence. Our spatial model will adapt itself to each observation according to this dichotomy. We use the calibration solutions for each direction to bootstrap our spatial model. In order to prevent overfitting, we pass the solutions through an information bottleneck by using elastic net regression (Zou & Hastie 2005) to construct the spatial model.

Explicit models have already been used to improve direction-dependent calibration, for example by modelling ionospheric effects (Cotton 2007; Mitchell et al. 2008; Intema et al. 2009; Arora et al. 2016; Albert et al. 2020) or beam effects (Bhatnagar et al. 2008; Yatawatta 2018b; Cotton & Mauch 2021). Most of these methods are not scalable to process multifrequency data in a distributed computer. Instead of having a model with both spectral and spatial dependence, we use disjoint models and separate constraints for spectral dependence and spatial dependence. We use consensus optimization (Boyd et al. 2011) and extend our previous work (Yatawatta 2020) to incorporate the spatial constraint as an extension to a model derived by federated averaging (McMahan et al. 2016). In this manner, we are able to incorporate both spectral and spatial constraints without losing scalability in processing multifrequency data in a distributed computer. However, since we use an implicit model, relating this model to actual physical phenomena, such as the ionosphere and the beam shape, is more involved, which is a drawback of our approach.

The remainder of the paper is organized as follows. In Section 2, we define the radio interferometric data model and describe our spatially constrained distributed calibration algorithm. In Section 3, we derive performance criteria based on the influence function (Hampel et al. 1986; Yatawatta 2019a). We perform simulations and present the results in Section 4 to illustrate the improvement due to the spatial constraint. Finally, we draw our conclusions in Section 5.

Notation: Lower case bold letters refer to column vectors (e.g. y). Upper case bold letters refer to matrices (e.g. C). Unless otherwise stated, all parameters are complex numbers. The set of complex numbers is given as |${\mathbb {C}}$| and the set of real numbers as |${\mathbb {R}}$|⁠. The matrix inverse, pseudo-inverse, transpose, Hermitian transpose, and conjugation are referred to as (·)⁻¹, (·)^†, (·)^T, (·)^H, (·)^⋆, respectively. The matrix Kronecker product is given by ⊗. The vectorized representation of a matrix is given by vec(·). The identity matrix of size N is given by I_N. All logarithms are to the base e, unless stated otherwise. The Frobenius norm is given by ‖ · ‖.

2 SPECTRALLY AND SPATIALLY CONSTRAINED CALIBRATION

We consider a radio interferometer with N receivers (or stations), collecting data from K directions in the sky. The visibilities at baseline p, q are given by Hamaker, Bregman & Sault (1996)

$$\begin{eqnarray*} {\bf V}_{pqf_i}=\sum _{k=1}^K {\bf J}_{pkf_i} {\bf C}_{pqkf_i} {\bf J}_{qkf_i}^H + {\bf N}_{pq} \end{eqnarray*}$$

(1)

where |${\bf V}_{pqf_i}$| |$(\in \mathbb {C}^{2\times 2})$| is the visibility matrix in full polarization. The subscripts p and q denote the two stations forming the baseline p, q, and f_i denotes the receiving frequency. The systematic errors along direction k are given by |${\bf J}_{pkf_i}$| and |${\bf J}_{qkf_i}$| |$(\in \mathbb {C}^{2\times 2})$|. The source coherency of direction k is given by |${\bf C}_{pqkf_i}$| |$(\in \mathbb {C}^{2\times 2})$|. The noise term is given by N_pq |$(\in \mathbb {C}^{2\times 2})$|, which represents the receiver noise and the noise due to sources not included in the model. Note that all components in (1) also depend on time, but we leave this dependence implicit.
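The data model (1) can be sketched in a few lines of NumPy for a single baseline and a single frequency. This is only an illustration: the function name `simulate_visibility`, the array layout (direction index first) and the `noise_std` parameter are assumptions made here and do not come from the released software.

```python
import numpy as np

def simulate_visibility(J_p, J_q, C_pq, noise_std=0.0, rng=None):
    """Evaluate V_pq = sum_k J_pk C_pqk J_qk^H + N_pq for one baseline.

    J_p, J_q : (K, 2, 2) complex arrays, systematic errors of stations p and q
    C_pq     : (K, 2, 2) complex array, source coherency of each direction
    noise_std: standard deviation of the real and imaginary noise parts
    """
    rng = np.random.default_rng() if rng is None else rng
    # Sum over directions k of J_pk @ C_pqk @ J_qk^H
    V = np.einsum('kab,kbc,kdc->ad', J_p, C_pq, J_q.conj())
    # Complex, zero-mean Gaussian noise term N_pq
    noise = noise_std * (rng.standard_normal((2, 2))
                         + 1j * rng.standard_normal((2, 2)))
    return V + noise
```

With `noise_std=0.0` the result equals the explicit sum of the K matrix products, which is a convenient sanity check for the index ordering.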

We stack the systematic errors along direction k for all stations into a block matrix as

$$\begin{eqnarray*} {{\bf J}_{kf_i}}\buildrel\triangle \over =[{{\bf J}}_{1k{f_i}}^T,{{\bf J}}_{2k{f_i}}^T,\ldots ,{{\bf J}}_{Nk{f_i}}^T]^T, \end{eqnarray*}$$

(2)

where |${{\bf J}_{kf_i}}$||$\in \mathbb {C}^{2N\times 2}$|⁠.

The spectral constraint on the systematic errors is given by (Yatawatta 2015)

$$\begin{eqnarray*} {\bf J}_{kf_i}={\bf B}_{f_i} {\bf Z}_{k} \end{eqnarray*}$$

(3)

where |${\bf B}_{f_i}$| (⁠|$\in \mathbb {R}^{2N\times 2FN}$|⁠) is a basis in frequency, evaluated at f_i and Z_k (⁠|$\in \mathbb {C}^{2FN\times 2}$|⁠) is a matrix that accumulates information over all frequencies (say P frequencies) pertaining to direction k. Note that Z_k is therefore independent of frequency, i.e. a ‘global’ model. The number of basis functions describing the frequency dependence is given by F.
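To make the dimensions in (3) concrete, the following sketch builds a frequency basis matrix from Bernstein polynomials and applies it to a global model. The Kronecker structure B = bᵀ ⊗ I_2N, the normalization of frequency to [0, 1] and the function names are assumptions made for illustration; the text above only fixes the sizes of the matrices.

```python
import numpy as np
from math import comb

def bernstein_basis(f, f_min, f_max, F):
    """F Bernstein polynomials of degree F-1 evaluated at frequency f."""
    x = (f - f_min) / (f_max - f_min)   # normalize frequency to [0, 1]
    n = F - 1
    return np.array([comb(n, j) * x**j * (1 - x)**(n - j) for j in range(F)])

def frequency_basis_matrix(f, f_min, f_max, F, N):
    """Assumed form B_f = b^T kron I_2N, a real matrix of shape (2N, 2FN)."""
    b = bernstein_basis(f, f_min, f_max, F)
    return np.kron(b[None, :], np.eye(2 * N))

# Example: N = 3 stations, F = 4 basis functions, one frequency
B = frequency_basis_matrix(150.0, 115.0, 185.0, F=4, N=3)
Z_k = np.ones((2 * 4 * 3, 2), dtype=complex)   # global model, shape (2FN, 2)
J_k = B @ Z_k                                  # shape (2N, 2), as in (3)
```

Because the Bernstein polynomials sum to one at every frequency, a model Z_k with identical blocks yields the same J at all frequencies.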

In Yatawatta (2020), we have introduced an additional constraint

$$\begin{eqnarray*} {\bf Z}_{k} = \overline{{\bf Z}}_k \end{eqnarray*}$$

(4)

where |$\overline{{\bf Z}}_k$| (⁠|$\in \mathbb {C}^{2FN\times 2}$|⁠) is an external model describing the frequency dependence. For example, we can build Z_k in (3) using data at only a subset of the full frequency range. Similarly, we can build |$\overline{{\bf Z}}_k$| using another subset of frequencies. By using the constraint (4), we are able to reach consensus between these different frequency subsets. This is useful when we are computationally limited to simultaneously process all available frequencies.

In this paper, we build our external model |$\overline{{\bf Z}}_k$| in (4) using the information available over all directions, i.e. by varying k. Let |${\rm{\boldsymbol \Phi }}_k$| be a basis function in space (such as spherical harmonics) evaluated along the direction k. We can describe |$\overline{{\bf Z}}_k$| as

$$\begin{eqnarray*} \overline{{\bf Z}}_k = {\bf Z} {\rm{\boldsymbol \Phi }}_k \end{eqnarray*}$$

(5)

where |${\rm{\boldsymbol \Phi }}_k \in \mathbb {C}^{2G\times 2}$| and the global spatial model, |${\bf Z} \in \mathbb {C}^{2FN\times 2G}$|⁠. The number of spatial basis functions is given by G. Note that Z is independent of k, i.e. the direction in the sky, and is therefore global, both spectrally and spatially.

The spatial basis can be constructed from a vector of basis functions |${\rm{\boldsymbol \phi }}_k$| (|$\in \mathbb {C}^{G\times 1}$|) evaluated along direction k as

$$\begin{eqnarray*} {\rm{\boldsymbol \Phi }}_k ={\bf I}_2 \otimes {\rm{\boldsymbol \phi }}_k. \end{eqnarray*}$$

(6)

Note that with G = 1, we get (federated) averaging.
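The construction in (5) and (6) can be checked numerically with a short sketch. The function names are hypothetical, and the basis values `phi_k` are taken as given (for instance, spherical harmonics evaluated elsewhere).

```python
import numpy as np

def spatial_basis_matrix(phi_k):
    """Phi_k = I_2 kron phi_k, shape (2G, 2), from phi_k of shape (G,)."""
    return np.kron(np.eye(2), np.asarray(phi_k).reshape(-1, 1))

def external_model(Z, phi_k):
    """Zbar_k = Z Phi_k, shape (2FN, 2), from Z of shape (2FN, 2G)."""
    return Z @ spatial_basis_matrix(phi_k)

# With G = 1 and phi_k = 1 for every direction, Phi_k = I_2, so the
# external model is the same matrix Z in all directions.
Z = np.arange(16, dtype=complex).reshape(8, 2)
same_everywhere = external_model(Z, np.array([1.0]))
```

In the G = 1 case, fitting Z to the per-direction solutions in the least-squares sense reduces to taking their mean, which is the averaging referred to above.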

With the spectral and spatial constraints (3) and (4), we formulate the calibration problem as

$$\begin{eqnarray*} \lbrace {\bf {J}}_{kf_i},\ldots ,{\bf {Z}}_k:\ \forall \ k,i\rbrace &=&\underset{{\bf {J}}_{kf_i},\ldots ,{\bf {Z}}_k}{\rm arg\ min} \sum _i g_{f_i}(\lbrace {\bf J}_{kf_i}:\ \forall k\rbrace)\nonumber \\ {\rm subject\ to}\ \ {\bf {J}}_{kf_i} &=& {\bf {B}}_{f_i} {\bf {Z}}_k,\ \ i\in [1,P],k\in [1,K]\nonumber \\ {\rm and}\ \ {\bf {Z}}_k &=& \overline{{\bf Z}}_k, \ \ k\in [1,K]. \end{eqnarray*}$$

(7)

The unconstrained calibration cost function at frequency f_i for all K directions is given by |$g_{f_i}(\lbrace {\bf J}_{kf_i}:\ \forall k\rbrace)$| and this can be solved, for instance, by using the space alternating generalized expectation maximization algorithm (Fessler & Hero 1994; Kazemi et al. 2011).

In order to solve (7), we form the augmented Lagrangian, for all k, i as

$$\begin{eqnarray*} &&{L(\lbrace {\bf {J}}_{kf_i},{\bf {Z}}_k,{\bf {Y}}_{kf_i},{\bf X}_{k}: \forall \ k,i\rbrace) }\nonumber \\ &&\quad =\sum _i g_{f_i}(\lbrace {\bf J}_{kf_i}:\ \forall k\rbrace)\nonumber \\ &&\qquad + \sum _{i,k} \left(\Vert {\bf {Y}}_{kf_i}^H ({\bf {J}}_{kf_i}- {\bf {B}}_{f_i} {\bf {Z}}_k)\Vert + \frac{\rho }{2} \Vert {\bf {J}}_{kf_i}- {\bf {B}}_{f_i} {\bf {Z}}_k \Vert ^2 \right) \nonumber \\ &&\qquad+\sum _k \left(\Vert {\bf X}_k^H\left({\bf {Z}}_k - \overline{{\bf Z}}_k \right)\Vert + \frac{\alpha }{2} \Vert {\bf {Z}}_k - \overline{{\bf Z}}_k \Vert ^2 \right). \end{eqnarray*}$$

(8)

The Lagrange multiplier for the spectral constraint (3) is given by |${\bf {Y}}_{kf_i}$| (⁠|$\in \mathbb {C}^{2N\times 2}$|⁠) while the Lagrange multiplier for the spatial constraint (4) is given by X_k (⁠|$\in \mathbb {C}^{2FN\times 2}$|⁠). The regularization factors for both constraints are given by ρ and α. Note that both ρ and α can be made direction (k) dependent as well, but we omit this to simplify the notation.

We use the alternating direction method of multipliers (ADMM) algorithm (Boyd et al. 2011) to solve (8) and the pseudo-code is given in Algorithm 1. As in Yatawatta (2015), we consider a fusion centre that is connected to a set of worker nodes to implement Algorithm 1. Most of the computations are done in parallel at all the worker nodes.

Algorithm 1

Spatially constrained distributed calibration

The fusion centre performs the update of Z_k in (3) and also the update of Z in (5). Note that the spatial model Z is updated less frequently than the spectral model Z_k, i.e. with a cadence of C in Algorithm 1. The spectral model can be updated in closed form by solving

$$\begin{eqnarray*} {{\bf {Z}}_k} &=& \left(\sum _i \rho {\bf {B}}_{f_i}^T {\bf {B}}_{f_i} + \alpha {\bf I}_{2FN} \right)^{\dagger }\nonumber \\ &&\times \! \left(\sum _i {\bf {B}}_{f_i}^T \left({\bf {Y}}_{kf_i} + \rho {\bf {J}}_{kf_i}\right) + \alpha \overline{{\bf Z}}_k -{\bf X}_{k} \right). \nonumber \\ \end{eqnarray*}$$

(9)
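The closed-form update (9) translates directly into NumPy. The sketch below handles one direction k; the function name and the list-based argument layout are illustrative assumptions, and the sign conventions follow (9) as written.

```python
import numpy as np

def update_Zk(B_list, J_list, Y_list, Zbar_k, X_k, rho, alpha):
    """Closed-form update of the spectral model Z_k following (9).

    B_list : list of (2N, 2FN) real basis matrices, one per frequency
    J_list : list of (2N, 2) complex solutions J_{k f_i}
    Y_list : list of (2N, 2) complex Lagrange multipliers Y_{k f_i}
    Zbar_k : (2FN, 2) external (spatial) model for direction k
    X_k    : (2FN, 2) Lagrange multiplier of the spatial constraint
    """
    dim = B_list[0].shape[1]
    lhs = alpha * np.eye(dim)
    rhs = alpha * Zbar_k - X_k
    for B, J, Y in zip(B_list, J_list, Y_list):
        lhs = lhs + rho * B.T @ B          # sum_i rho B^T B
        rhs = rhs + B.T @ (Y + rho * J)    # sum_i B^T (Y + rho J)
    return np.linalg.pinv(lhs) @ rhs
```

Setting `alpha = 0` and `X_k = 0` recovers an update with only the spectral constraint, which may be a useful check when comparing the two modes.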

The spatial model update is performed using (4) and (5). In order to prevent overfitting, we use elastic-net regression (Zou & Hastie 2005) to update Z. The model update can be formulated as

$$\begin{eqnarray*} {\bf Z} = \underset{{\bf Z}}{\rm arg\ min} \sum _k \Vert \overline{{\bf Z}}_k - {\bf Z} {\rm{\boldsymbol \Phi }}_k \Vert ^2 + \lambda \Vert {\bf Z}\Vert ^2 + \mu \Vert {\bf Z}\Vert _1 \end{eqnarray*}$$

(10)

where λ and μ are introduced to keep Z low-energy and sparse. We use the fast iterative shrinkage and thresholding algorithm (FISTA; Beck & Teboulle 2009) to solve (10) using the differentiable cost function

$$\begin{eqnarray*} h({\bf Z})=\sum _k \Vert \overline{{\bf Z}}_k - {\bf Z} {\rm{\boldsymbol \Phi }}_k \Vert ^2 + \lambda \Vert {\bf Z}\Vert ^2 \end{eqnarray*}$$

(11)

and its gradient

$$\begin{eqnarray*} \nabla h = {\bf Z}\left(\sum _k {\rm{\boldsymbol \Phi }}_k {{\rm{\boldsymbol \Phi }}_k}^H + \lambda {\bf I}_{2G} \right) - \sum _k \overline{{\bf Z}}_k {{\rm{\boldsymbol \Phi }}_k}^H. \end{eqnarray*}$$

(12)

In FISTA, we iterate with starting step size t, and at each iteration, we perform soft thresholding of each element of Z (real and imaginary parts taken separately) as

$$\begin{eqnarray*} \mathcal {T}([{\bf Z}]_i)={\rm sign}\left([{\bf Z}-t \nabla h]_{i} \right) \left(|[{\bf Z}-t \nabla h]_{i}| - \mu t\right)_{+} \end{eqnarray*}$$

(13)

where we use the subscript i to denote a single element of Z, with real and imaginary parts taken separately. The step size t is automatically updated in FISTA. The soft thresholding operator is denoted as |$\mathcal {T}(\cdot)$| and (z)₊ = max(0, z).
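The spatial model update (10) to (13) can be sketched as follows. This is a simplified variant rather than the released implementation: it uses a fixed step size (the reciprocal of the largest eigenvalue of the matrix in (12)) instead of the automatically updated step mentioned above, and all function names are hypothetical.

```python
import numpy as np

def soft_threshold(Z, thresh):
    """Soft thresholding (13), real and imaginary parts taken separately."""
    def st(x):
        return np.sign(x) * np.maximum(np.abs(x) - thresh, 0.0)
    return st(Z.real) + 1j * st(Z.imag)

def fista_elastic_net(Zbar_list, Phi_list, lam, mu, n_iter=40):
    """Estimate Z in (10) from per-direction models Zbar_k and bases Phi_k."""
    # A = sum_k Phi_k Phi_k^H + lam I and C = sum_k Zbar_k Phi_k^H, as in (12)
    A = sum(Phi @ Phi.conj().T for Phi in Phi_list)
    A = A + lam * np.eye(A.shape[0])
    C = sum(Zb @ Phi.conj().T for Zb, Phi in zip(Zbar_list, Phi_list))
    C = np.asarray(C, dtype=complex)
    t = 1.0 / np.linalg.eigvalsh(A).max()   # fixed step size (assumption)
    Z = np.zeros_like(C)
    W = Z.copy()                            # extrapolated point
    s = 1.0                                 # momentum parameter
    for _ in range(n_iter):
        grad = W @ A - C                    # gradient (12) evaluated at W
        Z_new = soft_threshold(W - t * grad, mu * t)
        s_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * s * s))
        W = Z_new + ((s - 1.0) / s_new) * (Z_new - Z)
        Z, s = Z_new, s_new
    return Z
```

With G = 1, Φ_k = I₂ and λ = μ = 0, the first iteration already returns the mean of the per-direction models, consistent with the averaging interpretation of the G = 1 case.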

It is evident from the construction of Z_k and Z that we do not build an explicit model for physical effects, such as the beam shape or the ionosphere. Our model is implicit but much simpler because we decouple the spectral dependence and the spatial dependence. If we want to estimate the systematic errors along direction k at frequency f_i (subject to a unitary ambiguity), we can use (3), (4), and (5) to get

$$\begin{eqnarray*} \widehat{\bf J}_{kf_i}={\bf B}_{f_i} {\bf Z} {\rm{\boldsymbol \Phi }}_k \end{eqnarray*}$$

(14)

illustrating the decoupling of spectral and spatial coordinates. Another way of looking at (14) is that we factorize the systematic errors, along direction k at frequency f_i, into three matrices (similar to a low-rank matrix approximation; Lu & Yang 2015), out of which we have some freedom in selecting |${\bf B}_{f_i}$| and |${\rm{\boldsymbol \Phi }}_k$|. This underlines the need for a careful selection of basis functions in frequency and in space.

3 PERFORMANCE ANALYSIS

We reuse our previous work (Yatawatta 2018a; Yatawatta 2019b) and derive the influence function (Hampel et al. 1986) as a performance measure. From (9), we select a single frequency, i.e. f_i = f, and a single direction k. At convergence, the global spectral model Z_k can be written as a function of variables J_kf and Y_kf as

$$\begin{eqnarray*} {\bf Z}_k=\rho {\bf P}{\bf J}_{kf} + {\bf P}{\bf Y}_{kf}+{\bf R} \end{eqnarray*}$$

(15)

where

$$\begin{eqnarray*} {\bf P}\buildrel\triangle \over = \left(\sum _i \rho {\bf {B}}_{f_i}^T {\bf {B}}_{f_i} + \alpha {\bf I}_{2FN} \right)^{\dagger } {\bf B}_f^T\ \ \in \mathbb {C}^{2FN\times 2N} \end{eqnarray*}$$

(16)

and the remainder term R is independent of J_kf and Y_kf. We substitute (15) into (8) and find the gradient with respect to J_kf and Y_kf. Let |${\bf F}\buildrel\triangle \over ={\bf I}_{2N}-\rho {\bf B}_f{\bf P}$| (|$\in \mathbb {C}^{2N\times 2N}$|) and let r₁(R₁) and r₂(R₂) be the remainder terms that are independent of J_kf and Y_kf. We can simplify the gradient of (8) as

$$\begin{eqnarray*} && {{\rm grad}(L,{\bf J}_{kf})}\nonumber \\ &&\quad={\rm grad}(g_{f}({\bf J}_{kf}),{\bf J}_{kf}) + \left(\frac{\rho }{2}{\bf F}^H{\bf F}+\frac{\alpha }{2}\rho ^2 {\bf P}^H{\bf P}\right){\bf J}_{kf}\nonumber \\ &&\qquad+ \left(\frac{1}{2}{\bf F}^H{\bf F}+\frac{\alpha }{2}\rho {\bf P}^H{\bf P}\right){\bf Y}_{kf}+{\bf r}_1({\bf R}_1), \end{eqnarray*}$$

(17)

and

$$\begin{eqnarray*} && {{\rm grad}(L,{\bf Y}_{kf})}\nonumber \\ &&\quad=\left(\frac{1}{2}{\bf F}^H{\bf F}+\frac{\alpha }{2}\rho {\bf P}^H{\bf P}\right){\bf J}_{kf}\nonumber \\ &&\qquad+ \left(-\frac{1}{2\rho }({\bf I}-{\bf F}^H{\bf F})+\frac{\alpha }{2}{\bf P}^H{\bf P}\right){\bf Y}_{kf}+ {\bf r}_2({\bf R}_2). \end{eqnarray*}$$

(18)

At a local minimum, we have grad(L, J_kf) = 0 and grad(L, Y_kf) = 0. Therefore, we can equate (17) and (18) to zero to get a system of equations as

$$\begin{eqnarray*} && {{\bf H}\begin{bmatrix}{\bf J}_{kf}\\ {\bf Y}_{kf} \end{bmatrix}}\nonumber \\ &&+ \begin{bmatrix}{\rm grad}(g_{f}({\bf J}_{kf}),{\bf J}_{kf})+{\bf r}_1({\bf R}_1)\\ {\bf r}_2({\bf R}_2)\\ \end{bmatrix} = \begin{bmatrix}{\bf 0}\\ {\bf 0}\\ \end{bmatrix} \end{eqnarray*}$$

(19)

where

$$\begin{eqnarray*} {\bf H}=\begin{bmatrix}{\bf H}_{11} & {\bf H}_{12}\\ {\bf H}_{21} & {\bf H}_{22}\\ \end{bmatrix} \end{eqnarray*}$$

(20)

with

$$\begin{eqnarray*} {\bf H}_{11}\buildrel\triangle \over =\left(\frac{\rho }{2}{\bf F}^H{\bf F}+\frac{\alpha }{2}\rho ^2 {\bf P}^H{\bf P}\right) \end{eqnarray*}$$

(21)

$$\begin{eqnarray*} {\bf H}_{12}={\bf H}_{21}^H\buildrel\triangle \over =\left(\frac{1}{2}{\bf F}^H{\bf F}+\frac{\alpha }{2}\rho {\bf P}^H{\bf P}\right) \end{eqnarray*}$$

(22)

$$\begin{eqnarray*} {\bf H}_{22}\buildrel\triangle \over =\left(-\frac{1}{2\rho }({\bf I}-{\bf F}^H{\bf F})+\frac{\alpha }{2}{\bf P}^H{\bf P}\right). \end{eqnarray*}$$

(23)

Following Yatawatta (2018a), we take the differential of (19) to get

$$\begin{eqnarray*} {\bf H} \begin{bmatrix}\mathrm{ d}{\bf J}_{kf}\\ \mathrm{ d}{\bf Y}_{kf}\\ \end{bmatrix} + \begin{bmatrix}\mathrm{ d}{\rm grad}(g_{f}({\bf J}_{kf}),{\bf J}_{kf})\\ {\bf 0}\\ \end{bmatrix} = \begin{bmatrix}\bf 0\\ {\bf 0}\\ \end{bmatrix} \end{eqnarray*}$$

(24)

where terms independent of J_kf and Y_kf become zero in the differential. By row elimination, we can simplify (24) to eliminate Y_kf to get

$$\begin{eqnarray*} \widetilde{\bf H}\mathrm{ \mathrm{ d}}{\bf J}_{kf} + \mathrm{ d}{\rm grad}(g_{f}({\bf J}_{kf}),{\bf J}_{kf}) ={\bf 0} \end{eqnarray*}$$

(25)

where

$$\begin{eqnarray*} \widetilde{\bf H} = {\bf H}_{11} - {\bf H}_{12}{\bf H}_{22}^{\dagger }{\bf H}_{21},\ \ \in \mathbb {C}^{2N\times 2N}. \end{eqnarray*}$$

(26)
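The quantities in (16) and (21) to (26) are small dense matrices and can be assembled directly. The sketch below follows those equations term by term; the function name and argument layout are assumptions, and ρ > 0 is required because of the 1/(2ρ) factor in (23).

```python
import numpy as np

def reduced_hessian_term(B_list, B_f, rho, alpha):
    """Return H_tilde of (26) for the selected frequency f.

    B_list : list of (2N, 2FN) basis matrices over all frequencies
    B_f    : (2N, 2FN) basis matrix at the selected frequency
    """
    n, dim = B_f.shape                       # n = 2N, dim = 2FN
    M = alpha * np.eye(dim) + sum(rho * B.T @ B for B in B_list)
    P = np.linalg.pinv(M) @ B_f.T            # (16), shape (2FN, 2N)
    F = np.eye(n) - rho * B_f @ P            # F = I - rho B_f P
    FhF = F.conj().T @ F
    PhP = P.conj().T @ P
    H11 = 0.5 * rho * FhF + 0.5 * alpha * rho**2 * PhP             # (21)
    H12 = 0.5 * FhF + 0.5 * alpha * rho * PhP                      # (22)
    H21 = H12.conj().T
    H22 = -(np.eye(n) - FhF) / (2.0 * rho) + 0.5 * alpha * PhP     # (23)
    # Eliminate Y_kf by row elimination, giving (26)
    return H11 - H12 @ np.linalg.pinv(H22) @ H21
```

The result is a 2N × 2N Hermitian matrix that enters (28) through I₂ ⊗ H̃.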

Using the influence function, we study the change in J_kf, i.e. the solution to (7), due to small changes in the input data. For any given baseline p, q at any given time and frequency, there are eight real data points because |${\bf V}_{pqf_i}$| in (1) is a matrix in |$\mathbb {C}^{2\times 2}$| requiring eight real data points. We select one baseline (say p′, q′) and one real valued data point out of eight (say r). Let this data point be |$x_{p^\prime q^\prime r}$|⁠. We apply the chain-rule to differentiate g_f(J_kf), and considering this to be a function of both J_kf and input data |$x_{p^\prime q^\prime r}$|⁠,

$$\begin{eqnarray*} \mathrm{ d} \mathrm{vec}\left({\rm grad}(g_{f}({\bf J}_{kf}),{\bf J}_{kf})\right) \!&=&\! \mathcal {D}_{\bf J}{\rm grad}(g_{f}({\bf J}_{kf})) \mathrm{vec}\left(\mathrm{ d}{\bf J}_{kf}\right)\nonumber \\ &&+\,\frac{\partial }{\partial x_{p^\prime q^\prime r}} \mathrm{vec}\left({\rm grad}(g_{f}({\bf J}_{kf}),{\bf J}_{kf}) \right)\!. \nonumber\\ \end{eqnarray*}$$

(27)

where |$\mathcal {D}_{\bf J}{\rm grad}(g_{f}({\bf J}_{kf}))$| is the Hessian of g_f(J_kf) whose exact expression can be found in Yatawatta (2019b). Substituting (27) into (25), we get

$$\begin{eqnarray*} &&{\bf I}_2\otimes \widetilde{\bf H}\mathrm{vec}\left(\mathrm{ d}{\bf J}_{kf}\right) +\left(\mathcal {D}_{\bf J}{\rm grad}(g_{f}({\bf J}_{kf})) \mathrm{vec}\left(\mathrm{ d}{\bf J}_{kf}\right) \right.\nonumber \\ &&\quad +\,\left. \frac{\partial }{\partial x_{p^\prime q^\prime r}} \mathrm{vec}\left({\rm grad}(g_{f}({\bf J}_{kf}),{\bf J}_{kf}) \right) \right) = {\bf 0}. \end{eqnarray*}$$

(28)

Using the fact that only |${\bf V}_{p^\prime q^\prime f}$| in (1) is dependent on |$x_{p^\prime q^\prime r}$|⁠, we have

$$\begin{eqnarray*} && {\frac{\partial }{\partial x_{p^\prime q^\prime r}} \mathrm{vec}\left({\rm grad}(g_{f}({\bf J}_{kf}),{\bf J}_{kf}) \right)}\nonumber \\ && = - \left({\bf A}_{q^\prime }{\bf J}_{kf} {\bf C}_{p^\prime q^\prime f}^H\right)^T \otimes {\bf A}_{p^\prime }^T \mathrm{vec}\left(\frac{\partial {\bf V}_{p^\prime q^\prime f}}{\partial x_{p^\prime q^\prime r}}\right) \end{eqnarray*}$$

(29)

where the canonical selection matrix A_p (⁠|$\in \mathbb {R}^{2\times 2N}$|⁠) is given as

$$\begin{eqnarray*} {\bf A}_p \buildrel\triangle \over =[{\bf 0},{\bf 0},\ldots ,{\bf I}_2,\ldots ,{\bf 0}]. \end{eqnarray*}$$

(30)

In other words, only the p-th block of (30) is I₂ and the rest are all zeros.

Finally we have

$$\begin{eqnarray*} && {\mathrm{vec}\left(\frac{\partial {\bf J}_{kf}}{\partial x_{p^\prime q^\prime r}}\right)}\nonumber \\ &&\quad=\left(\mathcal {D}_{\bf J}{\rm grad}(g_{f}({\bf J}_{kf}))+{\bf I}_2\otimes \widetilde{\bf H}\right)^{\dagger }\nonumber \\ &&\qquad\times \left({\bf A}_{q^\prime }{\bf J}_{kf} {\bf C}_{p^\prime q^\prime f}^H\right)^T \otimes {\bf A}_{p^\prime }^T \mathrm{vec}\left(\frac{\partial {\bf V}_{p^\prime q^\prime f}}{\partial x_{p^\prime q^\prime r}}\right) \end{eqnarray*}$$

(31)

that relates the change in J_kf due to a small change in the input datum |$x_{p^\prime q^\prime r}$|⁠.

As in section 2.2 of Yatawatta & Avruch (2021), we use (31) to study the influence function of the residual R_pqf,

$$\begin{eqnarray*} {\bf R}_{pqf}={\bf V}_{pqf}-\sum _{k=1}^{K} \widehat{{\bf J}}_{kpf} {\bf C}_{kpqf} \widehat{{\bf J}}_{kqf}^H \end{eqnarray*}$$

(32)

where |$\widehat{{\bf J}}_{kpf} \forall k,p,f$| are calculated using the solution of (7). We consider a mapping between the input data and the output residual as

$$\begin{eqnarray*} {\bf y}={\bf x}-{\bf s}(\widehat{{\rm{\boldsymbol \theta }}}). \end{eqnarray*}$$

(33)

where x is the input and y is the residual, both |$\in \mathbb {R}^D$|, and the data length D = 8 × N(N − 1)/2 × Δ_t is the amount of data used to obtain a solution for (7) at one frequency. The model s(·) describes the sky model and is parametrized by |$\widehat{{\bf J}}_{kpf} \forall k,p,f$|, which are collected into the vector of real parameters |$\widehat{{\rm{\boldsymbol \theta }}}$|.

We relate the input x and the output y statistically through their probability density functions, p_X(x) and p_Y(y), as

$$\begin{eqnarray*} p_X({\bf x}) = |\mathcal {J}| p_Y({\bf y}) \end{eqnarray*}$$

(34)

where |$\mathcal {J}$| is the Jacobian of the mapping between the input and the output, which is dependent on the model |${\bf s}(\widehat{{\rm{\boldsymbol \theta }}})$| in (33), which in turn is dependent on (31). The exact expressions can be found in Yatawatta (2018a, 2019b). The determinant of |$\mathcal {J}$| is given as

$$\begin{eqnarray*} |\mathcal {J}|=\exp \left(\sum _{i=1}^D \log (1 + \lambda _i) \right) \end{eqnarray*}$$

(35)

where λ_i are eigenvalues that are dependent on the model and the loss function (see equation 14 in Yatawatta & Avruch 2021). Ideally, all λ_i → 0 so we have |$|\mathcal {J}|=1$|, i.e. no distortion in the probability density relation (34). In practice, some λ_i are non-zero (and negative). Therefore, there is always a distortion due to the calibration. Another way of looking at this distortion is to measure the number of degrees of freedom absorbed into the calibration. This can be done by measuring the area of the epigraph of the curve 1 + λ_i up to the ordinate 1, i.e. the area between the curve and the line at 1. As an example, a linear system is considered in section 3.1 of Yatawatta (2019b). We will give an example related to calibration in Section 4.
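Given a set of eigenvalues λ_i, both the determinant (35) and the area above the curve 1 + λ_i can be evaluated in a few lines. The sketch assumes unit spacing in the index i and λ_i > −1; the function names are hypothetical, and the eigenvalues themselves must come from the expressions in the cited references.

```python
import numpy as np

def jacobian_determinant(lam):
    """|J| = exp(sum_i log(1 + lambda_i)), as in (35)."""
    return float(np.exp(np.sum(np.log1p(lam))))

def area_above_curve(lam):
    """Area between the ordinate 1 and the curve 1 + lambda_i.

    Only eigenvalues below zero contribute; the index spacing is taken as 1.
    """
    lam = np.asarray(lam, dtype=float)
    return float(np.sum(np.clip(-lam, 0.0, None)))

lam = np.array([-0.5, -0.25, 0.0])
print(jacobian_determinant(lam))   # 0.5 * 0.75 * 1.0 = 0.375
print(area_above_curve(lam))       # 0.5 + 0.25 = 0.75
```

A smaller area indicates fewer degrees of freedom absorbed by calibration, which is the criterion used when comparing regularization settings.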

4 SIMULATIONS

We simulate an interferometric array with N = 62 stations, using phased array beams, as in LOFAR. The duration of the observation is 10 min, divided into samples with 10 s integration, giving 60 time samples in total. We use P = 8 frequencies, equally spaced in the range [115, 185] MHz. The calibration is performed using every Δ_t = 10 time samples, or 100 s of data.
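For these settings, the data length D = 8 × N(N − 1)/2 × Δ_t defined in Section 3 can be evaluated directly. The helper name below is hypothetical.

```python
def data_length(n_stations, n_time):
    """D = 8 x N(N-1)/2 x Delta_t real data points per solution interval."""
    n_baselines = n_stations * (n_stations - 1) // 2
    return 8 * n_baselines * n_time

# N = 62 stations give 1891 baselines; with Delta_t = 10 time samples:
print(data_length(62, 10))   # 151280
```

Each solution interval at one frequency therefore uses 151280 real data points.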

The sky model consists of K = 10 point sources in the sky that are being calibrated, with their positions randomly chosen within a 30 deg radius from the field centre. Their intrinsic fluxes are randomly chosen in the range [100, 1000] Jy and their spectral indices are randomly chosen from a standard normal distribution. An additional 400 weak sources (both point sources and Gaussians) are randomly positioned across a 16 × 16 square degrees field of view. Their intensities are uniform-randomly selected from [0.01, 0.5] Jy with flat spectra. All the aforementioned sources are unpolarized. A model for diffuse structure (with Stokes I, Q, and U fluxes) based on shapelets is also added to the simulation. The simulation incorporates beam effects, both the dipole beam and the station beam.

The systematic errors, i.e. |${\bf J}_{pkf_i}$| in (1), are simulated for the K = 10 sources as follows. For any given p, we simulate the eight values of |${\bf J}_{pkf_i}$| for the central frequency and for k = 1. We multiply this with a random third-order polynomial in frequency to get the frequency variation. We also multiply this value with a random sinusoidal in time to get time variability. We get spatial variability by propagating |${\bf J}_{pkf_i}$| for k = 1 to other directions (or other values of k). We do this by generating random planes in l, m coordinates (such as a₁l + a₂m + a₃ where a₁, a₂, a₃ are generated from a standard normal distribution) and multiplying the eight values of |${\bf J}_{pkf_i}$| for k = 1 with these random planes evaluated at the l, m coordinates of each direction. Finally, an additional random value drawn from a standard normal distribution is added to the values of |${\bf J}_{pkf_i}$| at each k.
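The spatial part of this recipe can be sketched as below for a single station. Several details are assumptions made for illustration, since the description leaves them open: one independent random plane per real value, the same treatment for every direction including k = 1, and the ordering of the eight real values when they are repacked into a 2 × 2 complex matrix. The frequency polynomial and the sinusoidal time variation are omitted.

```python
import numpy as np

def propagate_to_directions(j_ref, lm, rng):
    """Spread the eight real values of J (for k = 1) to K directions.

    j_ref : (8,) real values of the reference J for one station
    lm    : (K, 2) direction cosines (l, m) of the K directions
    rng   : numpy random Generator
    Returns a (K, 8) array of real values, one row per direction.
    """
    K = lm.shape[0]
    # Plane coefficients a1, a2, a3 for each of the eight values (assumption)
    coeffs = rng.standard_normal((8, 3))
    design = np.column_stack([lm[:, 0], lm[:, 1], np.ones(K)])
    planes = design @ coeffs.T               # a1*l + a2*m + a3, shape (K, 8)
    # Multiply by the planes, then add a standard normal random value
    return j_ref[None, :] * planes + rng.standard_normal((K, 8))

def to_jones(values):
    """Pack eight real values into a 2x2 complex matrix (ordering assumed)."""
    return (values[:4] + 1j * values[4:]).reshape(2, 2)
```

The plane term provides the deterministic, smooth part of the direction dependence, while the added random value provides the stochastic part.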

The noise N_pq in (1) is generated as complex zero mean Gaussian distributed elements and is added to the simulated signal with a signal to noise ratio of 0.05. An example of a simulation is shown in Fig. 1(a).

Figure 1.

Sample images (not deconvolved) made using simulated data, covering about 13.3 × 13.3 square degrees in the sky. (a) The image before calibration. (b) The diffuse sky and the weak sources that are hidden in the simulated data. (c) The residual image after calibration, without using spatial regularization. (d) The residual image after calibration, with spatial regularization. Both (c) and (d) reveal the weak sky shown in (b), but the intensity is much less.

We perform calibration using a third order Bernstein polynomial in frequency for consensus. The spectral regularization parameter ρ is chosen with a peak value of 300 and scaled according to the apparent flux of each of the K = 10 directions being calibrated. When spatial regularization is enabled, we use a spherical harmonic basis with order 3 (giving G = 9 basis functions). The spatial regularization parameter α is set equal to ρ for each direction. We use 20 ADMM iterations and the spatial model is updated at every tenth iteration (if enabled). The elastic net regression to update the spatial model uses λ = 0.01 and μ = 10⁻⁴ as the L2 and L1 constraints and we use 40 FISTA iterations.

We measure the performance of calibration by measuring the suppression of the unmodelled weak sources in the sky due to calibration. As seen in Fig. 1(b), we can simulate only the weak sky, and compare this to the residual images made after calibration. Numerically, we can cross-correlate, for example, Fig. 1(b) with Fig. 1(c) or Fig. 1(d). We show the correlation coefficient calculated this way for several simulations in Fig. 2. Note that calibration along K = 10 directions with only 100 s of data is not well constrained. None the less, we see from Fig. 2 that introducing spatial constraints into the problem increases the correlation of the residual with the weak sky. In other words, the spatial constraints decrease the loss of unmodelled structure in the sky.
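One way to compute such a correlation coefficient between two images of equal size is sketched below. The use of the Pearson (mean-subtracted, normalized) coefficient and the function name are assumptions; the text does not specify the exact estimator.

```python
import numpy as np

def image_correlation(img_a, img_b):
    """Pearson correlation coefficient between two images of equal shape."""
    a = np.asarray(img_a, dtype=float).ravel()
    b = np.asarray(img_b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

The coefficient is insensitive to an overall scaling of either image, so a residual that preserves the structure of the weak sky at a reduced intensity still correlates strongly with it.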

Figure 2.

The correlation of the residual maps with the weak sky signal (in Stokes I, Q, and U). With spatial constraint, we can increase the correlation from about 30 per cent to 34 per cent on average.

We use the influence function derived in Section 3 to explain the behaviour in Fig. 2. We show the plots of 1 + λ_i against i in Fig. 3 for calibration with no regularization (ρ = α = 0), with only spectral regularization (ρ > 0, α = 0), and with both spectral and spatial regularization (ρ > 0, α > 0). We see that calibration with both spatial and spectral regularization gives the curve of 1 + λ_i with the lowest area above the curve (or the epigraph). In other words, the number of degrees of freedom consumed by calibration with both spectral and spatial regularization is the lowest. This also means that the loss of weak signals due to calibration is the lowest for calibration with both spectral and spatial regularization, as we show in Fig. 2. However, the improvement depends on the actual amount of deterministic variability of systematic errors with direction and on the appropriate selection of regularization parameters ρ, α.

Figure 3.

Eigenvalue plots for the influence function of calibration with (i) no regularization (ii) spectral regularization, and (iii) both spectral and spatial regularization. The epigraph of the curve is lowest with both spectral and spatial regularization.

5 CONCLUSIONS

We have presented a method to incorporate spatial constraints into spectrally constrained direction-dependent calibration. Whenever the direction-dependent errors have a deterministic spatial variation, we can bootstrap a spatial model. With only a small increase in computations, we are able to improve the performance of spectrally constrained calibration. The improvement depends on the actual spatial variation of systematic errors, i.e. whether it is deterministic or stochastic. Moreover, the appropriate selection of the regularization factor α also affects the final result. Future work will focus on automating the determination of optimal regularization and modelling parameters, for example, by using reinforcement learning.

ACKNOWLEDGEMENTS

We thank Chris Jordan for the careful review and valuable comments.

DATA AVAILABILITY

Ready-to-use software based on this work and test data are available online.¹

Footnotes

¹ http://sagecal.sourceforge.net and https://github.com/SarodYatawatta/smart-calibration

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://dbpia.nl.go.kr/journals/pages/open_access/funder_policies/chorus/standard_publication_model)
