Abstract

A major stage of radio interferometric data processing is calibration: the estimation of systematic errors in the data and the correction for such errors. A stochastic error (noise) model is assumed, and in most cases this underlying model is assumed to be Gaussian. However, outliers in the data, due to interference or to errors in the sky model, have adverse effects on processing based on a Gaussian noise model. Most of the shortcomings of calibration, such as the loss in flux or coherence and the appearance of spurious sources, can be attributed to deviations of the noise from the assumed Gaussian model. In this paper, we propose to improve the robustness of calibration by using a noise model based on Student's t-distribution. Student's t-noise arises from Gaussian noise whose variance is itself a random variable, i.e. unknown. Unlike for a Gaussian noise model, traditional least-squares minimization does not directly extend to calibration under a Student's t-noise model. Therefore, we use a variant of the expectation–maximization algorithm, called the expectation–conditional maximization either algorithm, with a Student's t-noise model, and use the Levenberg–Marquardt algorithm in the maximization step. We give simulation results to show the robustness of the proposed calibration method as opposed to traditional Gaussian-noise-model-based calibration, especially in preserving the flux of weaker sources that are not included in the calibration model.

1 INTRODUCTION

Radio interferometry gives an enhanced view of the sky, with increased sensitivity and higher resolution. There is a trend towards using phased arrays as the building blocks of radio telescopes (LOFAR1 and SKA2) as opposed to the traditional dish-based interferometers. In order to reach the true potential of such telescopes, calibration is essential. Calibration refers to the estimation of systematic errors introduced by the instrument (such as the beam shape and receiver gain) and also by the propagation path (such as the ionosphere), and correction for such errors, before any imaging is done. Conventionally, calibration is done by observing a known celestial object (called the external calibrator), in addition to the part of the sky being observed. This is improved by self-calibration (Cornwell & Wilkinson 1981), which is essentially using the observed sky itself for the calibration. Therefore, self-calibration entails consideration of both the sky as well as the instrument as unknowns. By iteratively refining the sky and the instrument model, the quality of the calibration is improved by orders of magnitude in comparison to using an external calibrator.

From a signal processing perspective, calibration is essentially the maximum likelihood (ML) estimation of the instrument and sky parameters. An in-depth overview of existing calibration techniques from an estimation perspective can be found in, e.g., Boonstra & van der Veen (2003), van der Veen, Leshem & Boonstra (2004), van der Tol, Jeffs & van der Veen (2007) and Kazemi et al. (2011). All such calibration techniques are based on a Gaussian noise model, and the ML estimate is obtained by minimizing the least-squares cost function using a non-linear optimization technique such as the Levenberg–Marquardt (LM; Levenberg 1944; Marquardt 1963) algorithm. Despite the obvious advantages of self-calibration, there are also some limitations. For instance, Cornwell & Fomalont (1999) give a detailed overview of the practical problems in self-calibration, in particular due to errors in the initial sky model. It is well known that the sources not included in the sky model have lower flux (or loss of coherence), and Martí-Vidal et al. (2010) is a recent study on this topic. Moreover, under certain situations, fake or spurious sources could appear due to calibration as studied by Martí-Vidal & Marcaide (2008).

In this paper, we propose to improve the robustness of calibration by assuming a Student's t-noise model (Gosset 1908) instead of a Gaussian noise model. One of the earliest attempts in deviating from a Gaussian-noise-model-based calibration can be found in Schwab (1982), where instead of minimizing a least-squares cost function, an l1 norm minimization was considered. Minimizing the l1 norm is equivalent to having a noise model which has a Laplacian distribution (Aravkin, Friedlander & van Leeuwen 2012). The motivation for Schwab (1982) to deviate from the Gaussian noise model was the ever present outliers in the radio interferometric data.

In a typical radio interferometric observation, there is a multitude of causes for outliers in the data.

  • Radio frequency interference caused by man-made radio signals is a persistent cause of outliers in the data. However, data affected by such interference are removed before any calibration is performed by flagging (e.g. Offringa et al. 2010). But there might be faint interference still present in the data, even after flagging.

  • The initial sky model used in self-calibration is almost always different from the true sky that is observed. Such model errors also create outliers in the data. This is especially significant when we observe a part of the sky that has sources with complicated, extended structure. Moreover, during calibration, only the brightest sources are normally included in the sky model and the weaker sources act together to create outliers.

  • During day-time observations, the Sun could act as a source of interference, especially during high solar activity. In addition the Galactic plane also affects the signals on short baselines.

  • An interferometer made of phased arrays will have sidelobes that change with time and frequency. It is possible that a strong celestial source, faraway from the part of the sky being observed, will pass through such a sidelobe. This will also create outliers in the data.

To summarize, model errors of the sky as well as the instrument will create outliers in the data, and in some situations calibration based on a Gaussian noise model will fail to perform satisfactorily. In this paper, we consider the specific problem of the effect of unmodelled sources in the sky during calibration. We consider ‘robustness’ to be the preservation of the fluxes of the unmodelled sources. Therefore, our prime focus is to minimize the loss of flux or coherence of unmodelled sources, whereas our previous work (Yatawatta, Kazemi & Zaroubi 2012) measured robustness in terms of the quality of calibration.

Robust data modelling using Student's t-distribution has been applied in many diverse areas of research, and Lange, Little & Taylor (1989), Bartkowiak (2007) and Aravkin et al. (2012) are a few such examples. However, the traditional least-squares minimization is not directly applicable when we have a non-Gaussian noise model, and we apply the expectation–maximization (EM; Dempster, Laird & Rubin 1977) algorithm to convert calibration into an iteratively reweighted least-squares minimization problem, as proposed by Lange et al. (1989). In fact, we use an extension of the EM algorithm called the expectation–conditional maximization either (ECME) algorithm (Liu & Rubin 1995) to convert calibration to a tractable minimization problem. However, we emphasize that we do not force a non-Gaussian noise model on to the data. In case there are no outliers and the noise is actually Gaussian, our algorithms would work as traditional calibration does.

The rest of the paper is organized as follows. In Section 2, we give an overview of radio interferometric calibration. We consider the effect of unmodelled sources in the sky during calibration in Section 3. Next, in Section 4, we discuss the application of Student's t-noise model in calibration. We also present the weighted LM routine used in calibration. In Section 5, we present simulation results to show the superiority of the proposed calibration approach in minimizing the loss in coherence and present conclusions in Section 6.

Notation. Lowercase bold letters refer to column vectors (e.g. |${\boldsymbol y}$|). Uppercase bold letters refer to matrices (e.g. |${\bf {\sf C}}$|). Unless otherwise stated, all parameters are complex numbers. The matrix inverse, transpose, Hermitian transpose and conjugation are referred to as $(.)^{-1}$, $(.)^{\rm T}$, $(.)^{\rm H}$ and $(.)^{\star}$, respectively. The matrix Kronecker product is given by ⊗. The statistical expectation operator is given as $E\{.\}$. The vectorized representation of a matrix is given by $\mathrm{vec}(.)$. The $i$th diagonal element of matrix |${\bf {\sf A}}$| is given by |${\bf {\sf A}}_{ii}$|. The $i$th element of a vector |${\boldsymbol y}$| is given by |${\boldsymbol y}_i$|. The identity matrix is given by |${\bf {\sf I}}$|. Estimated parameters are denoted by a hat, $\widehat{(.)}$. All logarithms are to the base e, unless stated otherwise. The $l_2$ norm is given by $\Vert .\Vert$ and the infinity norm by $\Vert .\Vert_{\infty}$. A random variable $X$ that has a distribution $\mathcal{P}$ is denoted by $X \sim \mathcal{P}$.

2 DATA MODEL

We give a brief overview of radio interferometry in this section. For more information about radio interferometry, the reader is referred to Thompson, Moran & Swenson (2001) and to Hamaker, Bregman & Sault (1996) for the data model in particular. We consider the radio frequency sky to be composed of discrete sources, faraway from the Earth such that the approaching radiation from each one of them appears to be plane waves. We decompose the contribution from the ith source into two orthogonal polarizations |${\boldsymbol u}_i=[u_{xi}\ u_{yi}]^{\rm T}$|⁠. The interferometric array consists of R receiving elements with dual polarized feeds. At the pth station, this plane wave causes an induced voltage, which is dependent on the beam attenuation as well as the radio frequency receiver chain attenuation. The induced voltages at the x and y polarization feeds |$\tilde{\boldsymbol v}_{pi}=[v_{xpi}\ v_{ypi}]^{\rm T}$| due to a source i are given as
$$\tilde{\boldsymbol v}_{pi}={\bf {\sf J}}_{pi}\,{\boldsymbol u}_i \qquad (1)$$
The 2 × 2 Jones matrix |${\bf {\sf J}}_{pi}$| in equation (1) represents the effects of the propagation medium, the beam shape and the receiver. If there are K known sources (that are in the sky model) and K′ unknown sources, the total signal will be a superposition of K + K′ such signals as in equation (1).
Consider the correlation of signals at the pth receiver and the qth receiver, as shown in Fig. 1, with proper signal delay. After correlation, the correlated signal of the pth station and the qth station (called the visibilities), |${\bf {\sf V}}_{pq}=E\lbrace {\boldsymbol v}_{p} {\boldsymbol v}_{q}^{\rm H} \rbrace$|, is given by
$${\bf {\sf V}}_{pq}=\sum_{i=1}^{K}{\bf {\sf J}}_{pi}{\bf {\sf C}}_{pqi}{\bf {\sf J}}_{qi}^{\rm H}+\sum_{i^{\prime}=1}^{K^{\prime}}{\bf {\sf J}}_{pi^{\prime}}{\bf {\sf C}}_{pqi^{\prime}}{\bf {\sf J}}_{qi^{\prime}}^{\rm H}+{\bf {\sf N}}_{pq} \qquad (2)$$
In equation (2), |${\bf {\sf J}}_{pi}$| and |${\bf {\sf J}}_{qi}$| are the Jones matrices describing errors along the direction of source i, at stations p and q, respectively. The 2 × 2 noise matrix is given as |${\bf {\sf N}}_{pq}$|⁠. The contribution from the ith source on the baseline pq is given by the 2 × 2 matrix |${\bf {\sf C}}_{pqi}$|⁠. The noise matrix |${\bf {\sf N}}_{pq}$| is assumed to have elements with zero mean, complex Gaussian entries with equal variance in real and imaginary parts. Moreover, in equation (2), we have split the contribution from the sky into two parts: K sources that are known to us and K′ sources that are unknown. Generally, the bright sources are always known but there are infinitely many faint sources that are too weak to be detected and too numerous to be included in the sky model. Therefore, almost always K′ is much larger than K.
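The structure of the measurement equation (2) can be sketched numerically. Below is a minimal numpy illustration with random 2 × 2 Jones matrices and rank-1 Hermitian coherencies; all sizes and values are illustrative assumptions, not taken from the paper, and the baseline-dependent phase inside |${\bf {\sf C}}_{pqi}$| is omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(1)
R, K = 4, 3  # stations and known sources (illustrative sizes)

# Random 2x2 Jones matrices J[p, i] and rank-1 Hermitian coherencies C[i].
J = rng.standard_normal((R, K, 2, 2)) + 1j * rng.standard_normal((R, K, 2, 2))
u = rng.standard_normal((K, 2)) + 1j * rng.standard_normal((K, 2))
C = np.einsum('ka,kb->kab', u, u.conj())  # C[i] = u_i u_i^H (Hermitian)

def visibility(p, q, noise):
    """V_pq = sum_i J_pi C_i J_qi^H + N_pq, mirroring equation (2)."""
    V = noise.astype(complex).copy()
    for i in range(K):
        V += J[p, i] @ C[i] @ J[q, i].conj().T
    return V

N01 = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
V01 = visibility(0, 1, N01)
```

Because each |${\bf {\sf C}}_{pqi}$| here is Hermitian, the noiseless part satisfies the usual conjugate symmetry between baselines pq and qp.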
Figure 1.

A basic radio interferometer that correlates the signals received from faraway celestial sources. The signals are corrupted by Earth's atmosphere as well as by the receiver beam pattern, and these corruptions are represented by |${\bf {\sf J}}_p$| and |${\bf {\sf J}}_q$|⁠.

During calibration, we only estimate the Jones matrices |${\bf {\sf J}}_{pi}$| for p ∈ [1, R] and i ∈ [1, K]; in other words, we estimate the errors along the known bright sources. Due to our ignorance of the K′ unknown sources, the effective noise during calibration becomes
$${\bf {\sf N}}^{\prime}_{pq}=\sum_{i^{\prime}=1}^{K^{\prime}}{\bf {\sf J}}_{pi^{\prime}}{\bf {\sf C}}_{pqi^{\prime}}{\bf {\sf J}}_{qi^{\prime}}^{\rm H}+{\bf {\sf N}}_{pq} \qquad (3)$$
and our assumption regarding the noise being complex circular Gaussian breaks down, depending on the properties of the signals of the weak sources. The prime motivation of this paper is to address this problem of the possible non-Gaussianity of the noise due to an error in the sky model. A similar situation could arise even for calibration along one direction (or direction independent calibration), when K = 1, if there is an error in the source model, for instance in the shape of the source.
The vectorized form of equation (2), |${\boldsymbol v}_{pq}=\mathrm{vec}({\bf {\sf V}}_{pq}),$| can be written as
$${\boldsymbol v}_{pq}=\sum_{i=1}^{K}\left({\bf {\sf J}}_{qi}^{\star}\otimes{\bf {\sf J}}_{pi}\right)\mathrm{vec}\left({\bf {\sf C}}_{pqi}\right)+\sum_{i^{\prime}=1}^{K^{\prime}}\left({\bf {\sf J}}_{qi^{\prime}}^{\star}\otimes{\bf {\sf J}}_{pi^{\prime}}\right)\mathrm{vec}\left({\bf {\sf C}}_{pqi^{\prime}}\right)+{\boldsymbol n}_{pq} \qquad (4)$$
where |${\boldsymbol n}_{pq}=\mathrm{vec}({\bf {\sf N}}_{pq})$|⁠. Depending on the time and frequency interval within which calibration solutions are obtained, we can stack up all cross-correlations within that interval as
$${\boldsymbol d}=\left[\mathrm{real}\left({\boldsymbol v}_{12}^{\rm T}\right),\,\mathrm{imag}\left({\boldsymbol v}_{12}^{\rm T}\right),\,\mathrm{real}\left({\boldsymbol v}_{13}^{\rm T}\right),\,\ldots\right]^{\rm T} \qquad (5)$$
where |${\boldsymbol d}$| is a vector of size N × 1 of real data points. Thereafter, we can rewrite the data model as
$${\boldsymbol d}=\sum_{i=1}^{K}{\boldsymbol s}_i({\boldsymbol \theta})+\sum_{i^{\prime}=1}^{K^{\prime}}{\boldsymbol s}_{i^{\prime}}+{\boldsymbol n} \qquad (6)$$
where |${\boldsymbol \theta }$| is the real parameter vector (size M × 1) that is estimated by calibration. The contribution of the ith known source on all data points is given by |${\boldsymbol s}_i({\boldsymbol \theta })$| (size N × 1) and the unknown contribution from the ith unknown source is given by |${\boldsymbol s}_{i^{\prime }}$| (size N × 1). The noise vector based on a Gaussian noise model is given by |${\boldsymbol n}$| (size N × 1). The parameters |${\boldsymbol \theta }$| are the elements of |${\bf {\sf J}}_{pi}$|s, with real and imaginary parts considered separately.
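The vectorization used to pass from equation (2) to its vectorized form relies on the standard identity vec(ABC) = (C^T ⊗ A) vec(B) with column-major vec; for a term of the form J_p C J_q^H this gives (J_q^* ⊗ J_p) vec(C). A quick numerical check with random matrices (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
Jp = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
Jq = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
C = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))

vec = lambda M: M.reshape(-1, order='F')  # column-major vectorization

lhs = vec(Jp @ C @ Jq.conj().T)           # vec(J_p C J_q^H)
rhs = np.kron(Jq.conj(), Jp) @ vec(C)     # (J_q^* (x) J_p) vec(C)
```

The column-major (`order='F'`) convention is what makes the Kronecker factor appear as |${\bf {\sf J}}_{qi}^{\star}\otimes{\bf {\sf J}}_{pi}$| rather than the transpose ordering.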
The ML estimate of |${\boldsymbol \theta }$| under a zero mean, white Gaussian noise is obtained by minimizing the least-squares cost
$$\widehat{{\boldsymbol \theta}}=\arg\min_{{\boldsymbol \theta}}\left\Vert{\boldsymbol d}-\sum_{i=1}^{K}{\boldsymbol s}_i({\boldsymbol \theta})\right\Vert^2 \qquad (7)$$
as done in current calibration approaches (Boonstra & van der Veen 2003; van der Veen et al. 2004; van der Tol et al. 2007; Kazemi et al. 2011). However, due to the unmodelled sources, the effective noise is actually
$${\boldsymbol n}^{\prime}=\sum_{i^{\prime}=1}^{K^{\prime}}{\boldsymbol s}_{i^{\prime}}+{\boldsymbol n} \qquad (8)$$
even when |${\boldsymbol n}$| is assumed to be Gaussian. Therefore, traditional calibration based on a least-squares cost minimization would not perform optimally. In order to improve this, we have to consider the statistical properties of the effective noise |${\boldsymbol n}^{\prime }$|⁠, and we shall do that in Section 3.

3 EFFECT OF UNMODELLED SOURCES IN CALIBRATION

In this section, we study the effect of unmodelled sources on |${\bf {\sf N}}^{\prime }_{pq}$| in equation (3) when |${\bf {\sf N}}_{pq}$| has elements with zero mean, complex circular white Gaussian statistics. For simplicity, we select only one element of the 2 × 2 matrix (say the element in the first row and first column). Let us denote the baseline coordinates as u, v, w in wavelengths (we omit the pq subscript for simplicity). We can rewrite equation (3) for just one element as
$$z_{pq}=\sum_{i^{\prime}=1}^{K^{\prime}}g_{pq i^{\prime}}\,I_{pq i^{\prime}}\,{\rm e}^{-\jmath 2\pi\left(u l_{i^{\prime}}+v m_{i^{\prime}}+w\left(n_{i^{\prime}}-1\right)\right)}+n_{pq} \qquad (9)$$
In equation (9), |$g_{pq i^{\prime }}$| corresponds to the corruptions along the direction of source i′ (contributions from |${\bf {\sf J}}_{pi^{\prime }}$| and |${\bf {\sf J}}_{qi^{\prime }}$|). The intensity of the i′th source seen on the baseline pq is given by |$I_{pq i^{\prime }}$|. The direction cosines of the i′th source are given by |$l_{i^{\prime }}$|, |$m_{i^{\prime }}$| and |$n_{i^{\prime }}$|. The Gaussian noise is given by |$n_{pq} \sim \mathcal {CN}(0,\rho ^2)$|. We assume that |$g_{pq i^{\prime }}$|, |$I_{pq i^{\prime }}$|, |$l_{i^{\prime }}$|, |$m_{i^{\prime }}$|, |$n_{i^{\prime }}$| and npq are statistically independent of each other. Moreover, the sources are assumed to be uniformly distributed in a field of view defined by |$-\overline{l} \le l_{i^{\prime }} \le \overline{l}$| and |$-\overline{m} \le m_{i^{\prime }} \le \overline{m}$|, and |$(n_{i^{\prime }}-1) \approx 0$|. The sources outside this area of the sky have almost no contribution to the signal because the values of |$|g_{pq i^{\prime }}|$| are very small, mainly due to attenuation by the beam shape.
With the above assumptions, we see that
$$E\{z_{pq}\}=\sum_{i^{\prime}=1}^{K^{\prime}}E\left\{g_{pq i^{\prime}} I_{pq i^{\prime}}\right\}\frac{\sin\left(2\pi u\overline{l}\right)}{2\pi u\overline{l}}\,\frac{\sin\left(2\pi v\overline{m}\right)}{2\pi v\overline{m}} \qquad (10)$$
which is almost zero if |$|u|>\frac{1}{2 \overline{l}}$| and |$|v|>\frac{1}{2 \overline{m}}$|⁠. Therefore, we make the following assumptions applicable to long baselines:
  • the mean of the effective noise is almost equal to the mean of noise, E{zpq} → E{npq} = 0,

  • the variance of effective noise is greater than the variance of noise, E{|zpq|2} > E{|npq|2}.

Let us briefly consider the implications of equation (10) above. First, the field of view is |$2\overline{l} \times 2\overline{m}$| in the sky. Now, consider the longest baseline length or the maximum value of |$\sqrt{u^2+v^2}$| to be |$\overline{u}$|⁠. Therefore, the image resolution will be about |$1/\overline{u}$|⁠, and consider the field of view to be of width |$2 \overline{l} \approx P\times 1/\overline{u}$|⁠. In other words, the field of view is P image pixels when the pixel width is |$1/\overline{u}$|⁠. Now, in order for E{zpq} ≈ 0 in equation (10), we need |$|u|>\frac{1}{2 \overline{l}}$| or |$|u|> \overline{u}/P$| (and a similar expression can be derived for |v|). This means that for baselines that are at least 1/P times the maximum baseline length, we can assume E{zpq} ≈ 0.

To illustrate the above discussion, we give a numerical example considering the LOFAR high-band array at 150 MHz. The point spread function at this frequency is about 6 arcsec and the field of view is about 10° in diameter. Therefore, P ≈ 10 × 3600/6 = 6000. The longest baseline is about 80 km, so for all baselines longer than about 80 000/6000 ≈ 13 m, the assumptions made above more or less hold.
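The suppression argued for above can be checked numerically: for l′ uniform in [−l̄, l̄], the expectation of the fringe term is sin(2πul̄)/(2πul̄), which is bounded by 1/(2πul̄) once |u| exceeds 1/(2l̄). A Monte Carlo sketch, with a field-of-view half-width and baseline length that are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative numbers: half field of view ~5 deg (small-angle, so direction
# cosine ~ angle in radians); baseline u in wavelengths. Not from the paper.
lbar = np.deg2rad(5.0)
u = 200.0                      # well above the threshold 1/(2*lbar) ~ 5.7
l = rng.uniform(-lbar, lbar, 1_000_000)

# Empirical E{exp(-2*pi*j*u*l)} over uniformly distributed l.
emp = np.exp(-2j * np.pi * u * l).mean()
# Analytic value sin(2*pi*u*lbar)/(2*pi*u*lbar); np.sinc(x) = sin(pi x)/(pi x).
analytic = np.sinc(2 * u * lbar)
```

For this baseline the analytic mean is below one per cent of the per-source amplitude, consistent with the claim that the unmodelled-source contribution averages towards zero on long baselines.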

To summarize the discussion in this section, we claim that
$$E\left\{\sum_{i^{\prime}=1}^{K^{\prime}}{\boldsymbol s}_{i^{\prime}}\right\}\rightarrow{\boldsymbol 0} \qquad (11)$$
in equation (8) and, therefore, |$E\lbrace {\boldsymbol n}^{\prime }\rbrace \rightarrow E\lbrace {\boldsymbol n}\rbrace$|⁠. However, the covariance of |${\boldsymbol n}^{\prime }$| is different than the covariance of |${{\boldsymbol n}}$|⁠, and in general, the effective noise is not necessarily Gaussian anymore.

3.1 SAGE algorithm with unmodelled sources

In our previous work (Kazemi et al. 2011), we have presented the Space Alternating Generalized EM (SAGE; Fessler & Hero 1994) algorithm as an efficient and accurate method to solve equation (7), when the noise model is Gaussian. However, when there are unmodelled sources, as we have seen in this section, the noise model is not necessarily Gaussian.

The SAGE expectation step is finding the conditional mean of the kth signal,
$${\boldsymbol x}^k={\boldsymbol s}_k({\boldsymbol \theta})+{\boldsymbol n} \qquad (12)$$
where |${\boldsymbol x}^k$| is the hidden data. Using this, we can rewrite the observed data |${\boldsymbol d}$| as
$${\boldsymbol d}={\boldsymbol x}^k+\sum_{i=1,\,i\ne k}^{K}{\boldsymbol s}_i({\boldsymbol \theta})+\sum_{i^{\prime}=1}^{K^{\prime}}{\boldsymbol s}_{i^{\prime}} \qquad (13)$$
The conditional mean of |${\boldsymbol x}^k$| given |${\boldsymbol d}$|⁠, is given as |$\widehat{{\boldsymbol x}^k}$|⁠, where
$$\widehat{{\boldsymbol x}^k}={\boldsymbol d}-\sum_{i=1,\,i\ne k}^{K}{\boldsymbol s}_i\left(\widehat{{\boldsymbol \theta}}\right)-\sum_{i^{\prime}=1}^{K^{\prime}}{\boldsymbol s}_{i^{\prime}} \qquad (14)$$
where we still assume a Gaussian noise model |${\boldsymbol n}$|⁠. Under the assumption that |$\sum _{i^{\prime }=1}^{K^{\prime }} {\boldsymbol s}_{i^{\prime }} \rightarrow {\boldsymbol 0}$|⁠, the conditional mean simplifies to
$$\widehat{{\boldsymbol x}^k}={\boldsymbol d}-\sum_{i=1,\,i\ne k}^{K}{\boldsymbol s}_i\left(\widehat{{\boldsymbol \theta}}\right) \qquad (15)$$
The SAGE maximization step maximizes the likelihood of the conditional mean |$\widehat{{\boldsymbol x}^k}$| under the noise |${\boldsymbol n}^{\prime }$|⁠. However, we cannot use a least-squares cost function as |${\boldsymbol n}^{\prime }$| is not necessarily Gaussian anymore, because of the unmodelled sources. In Section 4, we explore an alternative noise model based on Student's t-distribution (Gosset 1908) for the maximization of the likelihood.
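The conditional-mean step described above amounts to peeling the contributions of all other modelled sources off the data. A toy numpy sketch, where random vectors stand in for the modelled s_i terms and all sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
N, K = 8, 3                       # toy sizes (illustrative)
s = rng.standard_normal((K, N))   # stands in for the modelled s_i(theta_hat)
n = 0.1 * rng.standard_normal(N)  # Gaussian noise
d = s.sum(axis=0) + n             # data, with the unmodelled sum taken as ~0

def cond_mean(k):
    # Subtract every other modelled source from the data; what remains is
    # the k-th signal plus the full noise term.
    return d - s[np.arange(K) != k].sum(axis=0)

x1 = cond_mean(1)                 # hidden-data estimate for source k = 1
```

By construction the estimate equals the k-th signal plus the noise, which is exactly the quantity whose likelihood the maximization step then works on.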

4 ROBUST CALIBRATION

First, we briefly describe the univariate Student's t-distribution (Lange et al. 1989; Bartkowiak 2007). Let X be a random variable with a normal distribution |$\mathcal {N}(\varepsilon ,\sigma ^2/\gamma )$|⁠, where γ is also a random variable. Then the conditional distribution of X is
$$p(x|\gamma)=\frac{1}{\sqrt{2\pi\sigma^2/\gamma}}\exp\left(-\frac{(x-\varepsilon)^2}{2\sigma^2/\gamma}\right) \qquad (16)$$
We consider γ to have a gamma distribution, |$\gamma \sim {\mathrm{gamma}}(\frac{\nu }{2},\frac{\nu }{2})$|, where ν is positive (also called the number of degrees of freedom). The density function of γ is
$$p(\gamma)=\frac{\left(\frac{\nu}{2}\right)^{\frac{\nu}{2}}}{\Gamma\left(\frac{\nu}{2}\right)}\,\gamma^{\frac{\nu}{2}-1}\exp\left(-\frac{\nu\gamma}{2}\right),\quad \gamma>0 \qquad (17)$$
Then, the marginal distribution of X is
$$p(x)=\frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\sigma\,\Gamma\left(\frac{\nu}{2}\right)}\left(1+\frac{(x-\varepsilon)^2}{\nu\sigma^2}\right)^{-\frac{\nu+1}{2}} \qquad (18)$$
and this is the probability density function that defines Student's t-distribution. In Fig. 2, we show the probability density functions of the Gaussian distribution and Student's t-distribution, both with zero mean and unit scale. We see that for low values of the number of degrees of freedom ν, Student's t-distribution has a heavier tail. The asymptotic limit of Student's t-distribution as ν → ∞ is Gaussian, and for ν > 30 the two distributions are indistinguishable within the resolution of the data points used in Fig. 2.
Figure 2.

Probability density functions for standard normal distribution and Student's t-distribution, with ν = 2 and 30. At ν = 30, the Student's distribution is indistinguishable from the normal distribution.
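The comparison in Fig. 2 can be reproduced with scipy.stats: for ν = 2 the tails are clearly heavier than Gaussian, while at ν = 30 the t density is everywhere within about 0.01 of the standard normal. A short sketch:

```python
import numpy as np
from scipy.stats import norm, t as student_t

x = np.linspace(-6, 6, 201)
p_gauss = norm.pdf(x)               # standard normal density
p_t2 = student_t.pdf(x, df=2)       # heavy-tailed Student's t, nu = 2
p_t30 = student_t.pdf(x, df=30)     # nearly Gaussian, nu = 30
```

At |x| = 6 the ν = 2 density is several orders of magnitude larger than the Gaussian one, which is exactly why the t model tolerates outliers that would dominate a least-squares fit.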

Returning to equation (8), we see that the increase in the noise variance due to the unmodelled sources can be considered as the effect of γ in equation (16). Therefore, we consider the noise vector |${\boldsymbol n}^{\prime }$| to have independent, identically distributed entries, with the distribution given by equation (18) with ε = 0 and σ = ρ = 1. In the SAGE iterations outlined in Section 3.1, at the kth iteration (15), we have |$\widehat{{\boldsymbol x}^k}$| as the data vector and |${\boldsymbol s}_k({\boldsymbol \theta })$| as the model used to estimate the parameters |${\boldsymbol \theta }$| (or a subset of them). Therefore, the estimation problem is to find the ML estimate of |${\boldsymbol \theta }$| (size M × 1), given the data |${\boldsymbol y}=\widehat{{\boldsymbol x}^k}$| (size N × 1) and the model |${\boldsymbol f}({\boldsymbol \theta })={\boldsymbol s}_k({\boldsymbol \theta })$| (size N × 1) with noise |${\boldsymbol n}^{\prime }$|. Hence, we can rewrite our data model as
$${\boldsymbol y}={\boldsymbol f}({\boldsymbol \theta})+{\boldsymbol n}^{\prime} \qquad (19)$$
where the unknowns are |${\boldsymbol \theta }$| and ν, the number of degrees of freedom of the noise |${\boldsymbol n}^{\prime }$|. Then, the ith element of the vector |${\boldsymbol y}$| (denoted by |${\boldsymbol y}_i$|) in equation (19) will have a distribution of the form of equation (18) with σ = 1 and |$\mu _i={\boldsymbol f}_i({\boldsymbol \theta })$|, where |${\boldsymbol f}_i({\boldsymbol \theta })$| is the ith element of the vector function |${\boldsymbol f}({\boldsymbol \theta })$|. The likelihood function becomes
$$l({\boldsymbol \theta},\nu\,|\,{\boldsymbol y})=\prod_{i=1}^{N}\frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)}\left(1+\frac{\left({\boldsymbol y}_i-{\boldsymbol f}_i({\boldsymbol \theta})\right)^2}{\nu}\right)^{-\frac{\nu+1}{2}} \qquad (20)$$
and the log-likelihood function is
$$L({\boldsymbol \theta},\nu\,|\,{\boldsymbol y})=N\log\Gamma\left(\frac{\nu+1}{2}\right)-N\log\Gamma\left(\frac{\nu}{2}\right)-\frac{N}{2}\log(\nu\pi)-\frac{\nu+1}{2}\sum_{i=1}^{N}\log\left(1+\frac{\left({\boldsymbol y}_i-{\boldsymbol f}_i({\boldsymbol \theta})\right)^2}{\nu}\right) \qquad (21)$$
Note that, unlike in the Gaussian case, maximizing the likelihood is no longer equivalent to minimizing a least-squares cost function. In addition, we have an extra parameter, ν, the number of degrees of freedom. Hence, we use the ECME algorithm (Liu & Rubin 1995; Li, Wang & Chai 2006) to solve this problem. The ECME algorithm is an extension of the EM algorithm for the t-distribution presented by Lange et al. (1989).
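With σ = 1, the log-likelihood is simply a sum of Student's t log-densities over the residuals y_i − f_i(θ). A direct implementation, checked against scipy.stats.t (the random residuals are illustrative):

```python
import numpy as np
from scipy.stats import t as student_t
from scipy.special import gammaln

def t_loglik(resid, nu):
    """Student's t log-likelihood (sigma = 1) of residuals y_i - f_i(theta)."""
    N = resid.size
    return (N * gammaln((nu + 1) / 2) - N * gammaln(nu / 2)
            - N / 2 * np.log(nu * np.pi)
            - (nu + 1) / 2 * np.sum(np.log1p(resid ** 2 / nu)))

rng = np.random.default_rng(7)
r = rng.standard_normal(50)   # stand-in residuals
```

For any admissible ν this agrees with summing `scipy.stats.t.logpdf` over the residuals, which is a convenient correctness check when embedding the likelihood in a larger solver.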
The auxiliary variables are the weights wi (N values) and a scalar λ. All these are initialized to 1 at the beginning. The expectation step in the ECME algorithm involves the conditional estimation of hidden variables γi (or the weights wi) as
$$w_i=\lambda\,\frac{\nu+1}{\nu+\left({\boldsymbol y}_i-{\boldsymbol f}_i({\boldsymbol \theta})\right)^2} \qquad (22)$$
and the update of the scalar λ
$$\lambda=\frac{1}{N}\sum_{i=1}^{N}w_i \qquad (23)$$
The maximization step involves finding the value for ν that is a solution for
$$-\Psi\left(\frac{\nu}{2}\right)+\log\left(\frac{\nu}{2}\right)+1+\frac{1}{N}\sum_{i=1}^{N}\left(\log w_i-w_i\right)+\Psi\left(\frac{\nu+1}{2}\right)-\log\left(\frac{\nu+1}{2}\right)=0 \qquad (24)$$
where |$\Psi (x)=\frac{{\rm d}}{{\rm d}x} \log \left(\Gamma (x)\right)$| is the digamma function. Since the distribution is almost Gaussian for ν > 30, the search space for solving equation (24) is kept within 2 ≤ ν ≤ 30, and the initial value of ν is chosen to be 2.
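This ν update is a one-dimensional root-finding problem. A sketch using the standard EM/ECME degrees-of-freedom equation for the t-distribution (its exact form is assumed from Lange et al. 1989 and Liu & Rubin 1995), with the root bracketed in [2, 30] as described above:

```python
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

def update_nu(w, lo=2.0, hi=30.0):
    """Solve the degrees-of-freedom equation for nu on [lo, hi].

    w: the EM weights from the expectation step. The equation is the
    standard t-distribution nu-update; the interval follows the text.
    """
    c = np.mean(np.log(w) - w) + 1.0
    def phi(nu):
        return (-digamma(nu / 2) + np.log(nu / 2) + c
                + digamma((nu + 1) / 2) - np.log((nu + 1) / 2))
    if phi(lo) * phi(hi) > 0:    # no sign change: return the closer endpoint
        return lo if abs(phi(lo)) < abs(phi(hi)) else hi
    return brentq(phi, lo, hi)

rng = np.random.default_rng(0)
nu_clean = update_nu(np.ones(100))        # unit weights: Gaussian end of range
r = rng.standard_t(df=2.0, size=4000)     # heavy-tailed residuals
w_heavy = 3.0 / (2.0 + r ** 2)            # E-step weights at nu = 2
nu_heavy = update_nu(w_heavy)
```

Unit weights (no outliers) push the solution to the Gaussian end of the bracket, while weights computed from heavy-tailed residuals drive ν towards its lower bound.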
Once wi is known, |${\boldsymbol y}_i$| has a normal distribution with variance determined by wi. Therefore, in the maximization step of the EM algorithm, we minimize the weighted least-squares cost function:
$$\sum_{i=1}^{N}w_i\left({\boldsymbol y}_i-{\boldsymbol f}_i({\boldsymbol \theta})\right)^2 \qquad (25)$$
With this formulation at hand, we present the LM algorithm for robust calibration in Algorithm 1, similar to the presentations in Lourakis (2004) and Madsen, Nielsen & Tingleff (2004). The additional information needed in Algorithm 1 is the Jacobian of |${\boldsymbol f}({\boldsymbol \theta })$|⁠, i.e. |${\bf {\sf J}}({\boldsymbol \theta }) = \frac{\mathrm{\partial} {\boldsymbol f}({\boldsymbol \theta })}{\mathrm{\partial} {\boldsymbol \theta }}$|⁠, that can be calculated in closed form using equations (2) and (4). The diagonal matrix with the weights |$\sqrt{w_i}$| as its diagonal entries is given by |${\bf {\sf W}}$|⁠.

Algorithm 1: Robust Levenberg–Marquardt (ECME)

Require: Data |${\boldsymbol y}$|⁠, mapping |${\boldsymbol f}({\boldsymbol \theta })$|⁠, Jacobian |${\bf {\sf J}}({\boldsymbol \theta })$|⁠, ν, initial value |${\boldsymbol \theta }^0$|

 1: |${\boldsymbol \theta } \leftarrow {\boldsymbol \theta }^0$|⁠; wi ← 1; λ ← 1

 2: while l < max EM iterations do

 3:  k ← 0; η ← 2

 4:  |${\bf {\sf J}}({\boldsymbol \theta }) \leftarrow {\bf {\sf W}} {\bf {\sf J}}({\boldsymbol \theta })$|

 5:  |${\bf {\sf A}} \leftarrow {\bf {\sf J}}({\boldsymbol \theta })^T{\bf {\sf J}}({\boldsymbol \theta }); {\boldsymbol e}\leftarrow {\bf {\sf W}}({\boldsymbol y}-{\boldsymbol f}({\boldsymbol \theta })); {\boldsymbol g} \leftarrow {\bf {\sf J}}({\boldsymbol \theta })^T {\boldsymbol e}$|

 6:  found |$\leftarrow (\Vert {\boldsymbol g}\Vert _{\infty } <\epsilon _1); \mu \leftarrow \tau \max {\bf {\sf A}}_{ii}$|

 7:  while (not found) and (k < max iterations) do

 8:   k ← k + 1; Solve |$({\bf {\sf A}} + \mu {\bf {\sf I}}){\boldsymbol h}={\boldsymbol g}$|

 9:   if |$\Vert {\boldsymbol h}\Vert <\epsilon _2(\Vert {\boldsymbol \theta }\Vert +\epsilon _2)$| then

10:    found ← true

11:   else

12:    |${\boldsymbol \theta }_{new} \leftarrow {\boldsymbol \theta } + {\boldsymbol h}$|

13:    |$\rho \leftarrow (\Vert {\boldsymbol e}\Vert ^2 - \Vert {\bf {\sf W}}({\boldsymbol y}-{\boldsymbol f}({\boldsymbol \theta }_{new}))\Vert ^2)/({\boldsymbol h}^T(\mu {\boldsymbol h}+{\boldsymbol g}))$|

14:    if ρ > 0 then

15:     |${\boldsymbol \theta } \leftarrow {\boldsymbol \theta }_{new}$|

16:     |${\bf {\sf J}}({\boldsymbol \theta }) \leftarrow {\bf {\sf W}} {\bf {\sf J}}({\boldsymbol \theta })$|

17:     |${\bf {\sf A}} \leftarrow {\bf {\sf J}}({\boldsymbol \theta })^T{\bf {\sf J}}({\boldsymbol \theta }); {\boldsymbol e}\leftarrow {\bf {\sf W}}({\boldsymbol y}-{\boldsymbol f}({\boldsymbol \theta })); {\boldsymbol g} \leftarrow {\bf {\sf J}}({\boldsymbol \theta })^T {\boldsymbol e}$|

18:     found |$\leftarrow (\Vert {\boldsymbol g}\Vert _{\infty }\le \epsilon _1)$|

19:     μ ← μ · max(1/3, 1 − (2ρ − 1)³); η ← 2

20:    else

21:     μ ← μη; η ← 2η

22:    endif

23:   endif

24:  endwhile

25:  Update weights |$w_i\leftarrow \lambda \frac{\nu +1}{\nu + ({\boldsymbol y}_i - {\boldsymbol f}_i({\boldsymbol \theta }))^2}$|

26:  Update |$\lambda \leftarrow \frac{1}{N} \sum _{i=1}^N w_i$|

27:  Update ν using (24)

28:  l ← l + 1

29: endwhile

30: return |${\boldsymbol \theta }$|
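Algorithm 1 can be sketched in Python for a generic real-valued model f(θ) with Jacobian J(θ). Iteration counts, tolerances and the toy test problem below are all illustrative assumptions, and the ν update keeps its previous value when equation-style bracketing finds no root in [2, 30]:

```python
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

def robust_lm(y, f, jac, theta0, em_iters=5, lm_iters=50,
              eps1=1e-9, eps2=1e-12, tau=1e-3, nu=2.0):
    """Sketch of the robust LM / ECME loop of Algorithm 1.

    y: data (N,), f(theta) -> model (N,), jac(theta) -> Jacobian (N, M).
    """
    theta = np.asarray(theta0, dtype=float).copy()
    w = np.ones(y.size)          # EM weights, initialized to 1
    lam = 1.0
    for _ in range(em_iters):
        sw = np.sqrt(w)
        # Maximization step: Levenberg-Marquardt on the weighted residuals.
        J = sw[:, None] * jac(theta)
        e = sw * (y - f(theta))
        A, g = J.T @ J, J.T @ e
        mu, eta = tau * np.max(np.diag(A)), 2.0
        found = np.max(np.abs(g)) < eps1
        k = 0
        while not found and k < lm_iters:
            k += 1
            h = np.linalg.solve(A + mu * np.eye(A.shape[0]), g)
            if np.linalg.norm(h) < eps2 * (np.linalg.norm(theta) + eps2):
                break
            theta_new = theta + h
            e_new = sw * (y - f(theta_new))
            rho = (e @ e - e_new @ e_new) / (h @ (mu * h + g))
            if rho > 0:          # step accepted: reduce damping
                theta = theta_new
                J = sw[:, None] * jac(theta)
                e = sw * (y - f(theta))
                A, g = J.T @ J, J.T @ e
                found = np.max(np.abs(g)) <= eps1
                mu *= max(1.0 / 3.0, 1.0 - (2.0 * rho - 1.0) ** 3)
                eta = 2.0
            else:                # step rejected: increase damping
                mu *= eta
                eta *= 2.0
        # Expectation step: update weights, lambda and nu.
        r = y - f(theta)
        w = lam * (nu + 1.0) / (nu + r ** 2)
        lam = np.mean(w)
        c = np.mean(np.log(w) - w) + 1.0
        phi = lambda v: (-digamma(v / 2) + np.log(v / 2) + c
                         + digamma((v + 1) / 2) - np.log((v + 1) / 2))
        if phi(2.0) * phi(30.0) < 0:
            nu = brentq(phi, 2.0, 30.0)
    return theta, nu

# Toy problem: a straight line with gross outliers (values illustrative).
rng = np.random.default_rng(5)
x = np.linspace(0.0, 1.0, 200)
f = lambda th: th[0] + th[1] * x
jac = lambda th: np.column_stack([np.ones_like(x), x])
y = f([1.0, 2.0]) + 0.05 * rng.standard_normal(x.size)
y[::20] += 5.0               # 10 outlier samples
theta_hat, nu_hat = robust_lm(y, f, jac, [0.0, 0.0])
```

After a few EM iterations the outlier samples receive small weights, and the fitted parameters land close to the true values despite the contamination.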

5 SIMULATION RESULTS

In this section, we provide simulation results to demonstrate the robustness of our proposed calibration approach. We simulate an interferometric array with R = 47 stations, with the longest baseline about 30 km. We simulate an observation centred at the north celestial pole, with a duration of 6 h at 150 MHz. The integration time for each data sample is kept at 10 s, so for the full duration of the observation there are 2160 data points. Each data point consists of 1081 baselines and 8 real values corresponding to the 2 × 2 complex visibility matrix.

The sky is simulated to have 300 sources, uniformly distributed over a field of view of 12° × 12°. The intensities of the sources are drawn using a power-law distribution, with the peak intensity at 40 Jy. In Fig. 3, we show the histogram of the intensities of the sources. Our intention is to compare the fluxes of the weak sources, i.e. the sources with intensities less than or equal to 1 Jy, after directional calibration is performed. In order to do that, we corrupt the visibilities of the bright sources with directional errors that vary slowly with time. We consider three scenarios here: we only corrupt the signals of the sources that have intensities greater than (i) 1, (ii) 2 and (iii) 5 Jy. For the simulated sky model, there are 28, 11 and 7 sources that have fluxes greater than 1, 2 and 5 Jy, respectively. Note that in each case, we do not corrupt the signals of the weak sources as our only objective is to find the recovered flux after directional calibration and subtraction of the bright sources from the data, although in reality all sources will be corrupted by similar directional errors. Finally, we add zero mean white Gaussian noise to the simulated data, with the signal-to-noise ratio (SNR) defined as
$$\mathrm{SNR}=10\log_{10}\left(\frac{\sum_{p,q}\Vert{\bf {\sf V}}_{pq}-{\bf {\sf N}}_{pq}\Vert^2}{\sum_{p,q}\Vert{\bf {\sf N}}_{pq}\Vert^2}\right) \qquad (26)$$
In all simulations, we have kept the SNR at 5 dB.
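Assuming the usual power-ratio definition, SNR = 10 log10(P_signal/P_noise), the 5 dB noise level can be set by scaling the noise realization; the signal samples below are hypothetical stand-ins, not the simulated visibilities of the paper:

```python
import numpy as np

rng = np.random.default_rng(9)

# Hypothetical signal and noise samples; we assume the usual power-ratio
# definition SNR = 10*log10(P_signal / P_noise) for this sketch.
signal = 2.0 * rng.standard_normal(10_000)
noise = rng.standard_normal(signal.size)
target_db = 5.0

# Scale the noise realization so the resulting SNR is exactly the target.
scale = np.sqrt(np.mean(signal ** 2) /
                (10.0 ** (target_db / 10.0) * np.mean(noise ** 2)))
noise *= scale
snr_db = 10.0 * np.log10(np.mean(signal ** 2) / np.mean(noise ** 2))
```

Scaling the noise rather than the signal keeps the source fluxes fixed across realizations, which matters when the quantity under study is the recovered flux.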

In Fig. 4, we show some of the weak sources (with intensities less than 1 Jy) over a 4° × 4° area of the field of view. In Fig. 5, we have also added the bright sources with slowly varying directional errors. Note that in order to recover Fig. 4 from Fig. 5, directional calibration is essential.

Figure 3.

Histogram of the fluxes of the 300 simulated sources. The peak flux is 40 Jy.

Figure 4.

Simulated image of 4° × 4° of the sky, showing only weak sources with intensities less than 1 Jy.

Figure 5.

Simulated image of the sky where bright sources with fluxes greater than 1 Jy have been corrupted with directional errors. Due to these errors, there are artefacts throughout the image that make it difficult to study the fainter background sources.

In Fig. 6, we show the image after directional calibration along the bright sources and subtraction of their contribution from the data, using traditional calibration based on a Gaussian noise model. On the other hand, in Fig. 7, we show the image after directional calibration and subtraction using a robust noise model. With respect to the subtraction of the bright sources from the data, both normal calibration and robust calibration show equal performance as seen from Figs 6 and 7.

Figure 6.

Image of the sky where the bright sources have been calibrated and subtracted from the data to reveal the fainter background sources. The traditional calibration based on a Gaussian noise model is applied.

Figure 7.

Image of the sky where the bright sources have been calibrated and subtracted from the data to reveal the fainter background sources. The robust calibration proposed in this paper is applied.

We perform Monte Carlo simulations with different directional gain and additive noise realizations for each scenario (i), (ii) and (iii) as outlined previously. For each realization, we image the data after subtraction of the bright sources and compare the flux recovered for the weak sources before and after directional calibration. The directional calibration is performed for every 10 time samples (every 100 s duration). Therefore, the number of data points used for each calibration is N = 10 × 1081 × 8 = 86 480, and the number of real parameters estimated is 47 × 8 × 28 = 10 528, 47 × 8 × 11 = 4136 and 47 × 8 × 7 = 2632 for scenarios (i), (ii) and (iii), respectively. For each scenario, we perform 100 Monte Carlo simulations.

Our performance metric is the ratio between the recovered peak flux of each weak source and its original flux. We calculate the average ratio (recovered flux/original flux) over all Monte Carlo simulations. In Figs 8–10, we show the results obtained for scenarios (i), (ii) and (iii), respectively.
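As a minimal sketch of how this metric is formed, with hypothetical flux values standing in for the simulated sources:

```python
import numpy as np

# Sketch of the performance metric: recovered peak flux of each weak source,
# per Monte Carlo run, divided by its original flux, then averaged over runs.
# All flux values here are hypothetical placeholders.

rng = np.random.default_rng(0)
original_flux = np.array([0.2, 0.35, 0.5, 0.8])   # Jy, hypothetical weak sources

# Hypothetical recovered peak fluxes over 100 runs: ~90% of the original flux
recovered_flux = original_flux * (0.9 + 0.05 * rng.standard_normal((100, 4)))

ratio = recovered_flux / original_flux            # one ratio per run, per source
mean_ratio = ratio.mean(axis=0)                   # average over Monte Carlo runs
flux_reduction = 100.0 * (1.0 - mean_ratio)       # per cent, as in Table 1

print(mean_ratio.round(3), flux_reduction.round(1))
```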

Figure 8. Ratio between the recovered flux and the original flux of each of the weak sources, when bright sources (>1 Jy) are subtracted.

Figure 9. Ratio between the recovered flux and the original flux of each of the weak sources, when bright sources (>2 Jy) are subtracted.

Figure 10. Ratio between the recovered flux and the original flux of each of the weak sources, when bright sources (>5 Jy) are subtracted.

We observe two major characteristics in Figs 8–10. First, as we calibrate over an increasing number of directions (and subtract an increasing number of sources), the recovered flux is reduced. Secondly, in all scenarios, robust calibration recovers more flux than normal calibration. To illustrate this point, we also plot in Fig. 11 the ratio between the recovered flux using robust calibration and the recovered flux using normal calibration. As we see from Fig. 11, this ratio is almost always greater than 1, indicating that we recover more flux using robust calibration.

Figure 11. Ratio between the recovered flux using robust calibration and the recovered flux using normal calibration. Almost always, robust calibration recovers more flux compared with normal calibration. The different colours indicate different scenarios where the number of bright sources subtracted is varied.

We summarize our findings in Table 1. We see that, in the worst case, normal calibration gives a flux reduction of about 20 per cent compared with robust calibration.

Table 1. Comparison of the reduction of flux of the weak background sources with normal and robust calibration.

No. of sources calibrated   Lowest flux of the         Average reduction of the flux of weak
and subtracted              subtracted sources (Jy)    background sources (per cent)
                                                       Normal calibration   Robust calibration
28                          1                          8.7                  8.2
11                          2                          12.3                 3.2
7                           5                          7.9                  3.1

Up to now, we have considered the sky to consist only of point sources. In reality, there is diffuse structure in the sky. This diffuse structure is seldom incorporated into the sky model during calibration, either because it is too faint or because of the complexity of modelling it accurately. We have therefore also done simulations where there is faint diffuse structure in the sky and only the bright foreground sources are calibrated and subtracted. We use scenario (i) from the previous simulation, except that the sources below 1 Jy are replaced with Gaussian sources with peak intensities below 1 Jy and with random shapes and orientations.

In Fig. 12, we show the residual image of a 6° × 6° area of the sky after removing all sources brighter than 1 Jy with normal calibration. The residual image is obtained by averaging 100 Monte Carlo simulations. The equivalent image for robust calibration is given in Fig. 13.

Figure 12. The average residual image of the diffuse structure after subtracting the bright sources by normal calibration. The colour scale is in Jy per PSF.

Figure 13. The average residual image of the diffuse structure after subtracting the bright sources by robust calibration. The colour scale is in Jy per PSF and is the same as in Fig. 12.

As seen from Figs 12 and 13, more flux remains in the diffuse structure after robust calibration. This is clearly seen in the bottom right-hand corner of both figures, where Fig. 13 shows more positive flux than Fig. 12.

6 CONCLUSIONS

We have presented the use of Student's t-distribution in radio interferometric calibration. Compared with traditional calibration that has an underlying Gaussian noise model, robust calibration using Student's t-distribution can handle situations where there are model errors or outliers in the data. Moreover, by automatically selecting the number of degrees of freedom ν during calibration, we have the flexibility of choosing the appropriate distribution even when no outliers are present and the noise is perfectly Gaussian. For the specific case considered in this paper, i.e. the loss of coherency or flux of unmodelled sources, we have given simulation results that show the significantly improved flux preservation with robust calibration. Future work would focus on adopting this for pipeline processing of massive data sets from new and upcoming radio telescopes.
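The robustness mechanism summarized above can be illustrated with a much-simplified scalar example. This is not the paper's full ECME-plus-Levenberg–Marquardt scheme: it estimates a single gain-like parameter rather than Jones matrices, and it fixes ν instead of selecting it automatically. It shows how Student's t maximum likelihood, solved by EM reweighting, downweights outliers that pull the Gaussian (least-squares) estimate away:

```python
import numpy as np

# Simplified scalar illustration of robust estimation with Student's t noise.
# Estimate a single parameter from data with outliers, comparing the Gaussian
# ML estimate (plain mean) with a Student's t ML estimate obtained by EM.
# nu is fixed here; the paper adapts it during calibration.

def student_t_mean(y, nu=2.0, iters=50):
    mu, sigma2 = np.median(y), np.var(y)
    for _ in range(iters):
        r2 = (y - mu) ** 2
        w = (nu + 1.0) / (nu + r2 / sigma2)   # E-step: outliers get small weight
        mu = np.sum(w * y) / np.sum(w)        # M-step: weighted mean
        sigma2 = np.sum(w * r2) / len(y)      # M-step: weighted scale
    return mu

rng = np.random.default_rng(1)
y = 1.0 + 0.05 * rng.standard_normal(200)     # "clean" Gaussian data around 1.0
y[:10] += 5.0                                  # outliers, e.g. unmodelled flux

gaussian_est = y.mean()                        # biased towards the outliers
robust_est = student_t_mean(y)                 # stays close to 1.0
print(gaussian_est, robust_est)
```

The weights w play the role of the hidden variance scalings in the EM interpretation of the t-distribution: a residual far outside the current scale contributes almost nothing to the parameter update, which is why unmodelled sources perturb the gain solutions less.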

ACKNOWLEDGEMENTS

We thank the reviewer, Fred Schwab, for his careful review and insightful comments. We also thank the Editor and Assistant Editor for their comments. We also thank Wim Brouw for commenting on an earlier version of this paper.

1 The Low Frequency Array.

2 The Square Kilometre Array.

REFERENCES

Aravkin A., Friedlander M., van Leeuwen T., 2012, Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP): Robust Inversion Via Semistochastic Dimensionality Reduction. IEEE, Piscataway, NJ, p. 5425

Bartkowiak A., 2007, Proc. IEEE 6th Int. Conf. on Computer Information Systems and Industrial Management Applications (CISIM'07): Should Normal Distribution be Normal? The Student's T Alternative. IEEE, Piscataway, NJ, p. 3

Boonstra A., van der Veen A., 2003, IEEE Trans. Signal Process., 51, 25

Cornwell T., Fomalont E. B., 1999, in Taylor G. B., Carilli C. L., Perley R. A., eds, ASP Conf. Ser. Vol. 180, Synthesis Imaging in Radio Astronomy II. Astron. Soc. Pac., San Francisco, p. 187

Cornwell T. J., Wilkinson P. N., 1981, MNRAS, 196, 1067

Dempster A., Laird N., Rubin D., 1977, J. R. Stat. Soc. B, 39, 1

Fessler J., Hero A., 1994, IEEE Trans. Signal Process., 42, 2664

Gosset W. S., 1908, Biometrika, 6, 1

Hamaker J. P., Bregman J. D., Sault R. J., 1996, A&AS, 117, 96

Kazemi S., Yatawatta S., Zaroubi S., Labropoulos P., de Bruyn A., Koopmans L., Noordam J., 2011, MNRAS, 414, 1656

Lange K., Little R., Taylor J., 1989, J. Am. Stat. Assoc., 84, 881

Levenberg K., 1944, Q. J. App. Math., 2, 164

Li S., Wang H., Chai T., 2006, American Control Conference. Am. Autom. Control Counc., Troy, NY, p. 6

Liu C., Rubin D., 1995, Stat. Sin., 5, 19

Lourakis M., 2004, Technical report, Levmar: Levenberg–Marquardt nonlinear least squares algorithms in C/C++. Institute of Computer Science, Foundation for Research and Technology, Hellas (FORTH)

Madsen K., Nielsen H., Tingleff O., 2004, Lecture Notes: Methods for Non-linear Least Squares Problems, 2nd edn. Technical University of Denmark

Marquardt D., 1963, SIAM J. App. Math., 11, 431

Martí-Vidal I., Marcaide J., 2008, A&A, 480, 289

Martí-Vidal I., Ros E., Perez-Torres M., Guirado J., Jiménez-Monferrer S., Marcaide J., 2010, A&A, 515, A53

Offringa A., de Bruyn A., Biehl M., Zaroubi S., Bernardi G., Pandey V., 2010, MNRAS, 405, 155

Schwab F., 1982, VLA Scientific Memorandum, 136, 1

Thompson A., Moran J., Swenson G., 2001, Interferometry and Synthesis in Radio Astronomy, 3rd edn. Wiley Interscience, New York

van der Tol S., Jeffs B., van der Veen A., 2007, IEEE Trans. Signal Process., 55, 4497

van der Veen A., Leshem A., Boonstra A., 2004, Exp. Astron., 17, 231

Yatawatta S., Kazemi S., Zaroubi S., 2012, Proc. IEEE Int. Symp. on Signal Processing and Information Technology