ABSTRACT

The incomplete sampling of data in complex polarization measurements from radio telescopes negatively affects both the rotation measure (RM) transfer function and the Faraday depth spectra derived from these data. Such gaps in polarization data are mostly caused by flagging of radio frequency interference and their effects worsen as the percentage of missing data increases. In this paper we present a novel method for inferring missing polarization data based on Gaussian processes (GPs). GPs are stochastic processes that enable us to encode prior knowledge in our models. They also provide a comprehensive way of incorporating and quantifying uncertainties in regression modelling. In addition to providing non-parametric model estimates for missing values, we also demonstrate that GP modelling can be used for recovering rotation measure values directly from complex polarization data, and that inferring missing polarization data using this probabilistic method improves the resolution of reconstructed Faraday depth spectra.

1 INTRODUCTION

Polarization measurements from radio telescopes with broad-band receiver systems have changed the way in which we investigate magnetized astrophysical plasmas by enabling the use of the rotation measure (RM) synthesis method (Burn 1966; Brentjens & de Bruyn 2005). In this method the Fourier relationship between polarized intensity as a function of wavelength-squared and the Faraday dispersion function is exploited to recover the polarized intensity as a function of Faraday depth, ϕ, such that
(1)
where
(2)

Although conceptually simple, in practical terms the implementation of this method and the interpretation of the resulting Faraday dispersion function are complicated by a number of factors. The first of these complications is that the Stokes pseudo-vectors Q and U are not sampled natively in wavelength-squared but linearly in frequency. A further complication is that equation (1) does not represent a true Fourier relationship as P2) does not exist at λ2 < 0. Since F(ϕ) is not necessarily a purely real quantity this represents a fundamental limitation in attempting to reconstruct an unknown Faraday dispersion function from measured values of P2). An additional limitation comes from the finite bandwidth and hence range of wavelength-squared, Δ(λ2), measured by a radio telescope receiver, as well as the potentially incomplete sampling over this bandwidth due to various effects [radio frequency interference (RFI), instrumental problems, etc.] that require flagging (excision) of specific frequency channel data during an observation.

Incomplete sampling of complex polarization in frequency-space, as well as the nonlinear mapping of frequency to λ2, results in a non-uniform distribution of measurements in λ2-space. This is typically represented as a multiplicative weighting function, W2), which results in the convolution of the Faraday dispersion function with a transfer function,
(3)
known as the rotation measure transfer function, RMTF(ϕ) (Brentjens & de Bruyn 2005). Consequently, the complications caused by both the λ2 < 0 problem and the mapping of linear frequency to wavelength-squared may be encapsulated in the weighting function W2) and its Fourier counterpart, RMTF(ϕ). Although attempts have been made to deconvolve the RMTF from the Faraday depth spectra, e.g. Heald (2009), the methods are inherently underconstrained due to the |$P(\lambda ^2\lt 0) = \varnothing$| problem.

This can cause statistical ambiguity in the results of such a deconvolution process, as Faraday depth is inherently complex-valued and therefore not subject to the same conjugate symmetry as, for example, interferometric imaging. It has also been shown that certain Faraday structures will themselves not have conjugate symmetry (Brandenburg & Stepanov 2014). Consequently, underconstrained deconvolution processes are liable to introduce spurious non-physical structures, whilst also being unable to reconstruct all true physical structures (Macquart et al. 2012; Pratley, Johnston-Hollitt & Gaensler 2020).

These issues are of particular importance when complex or compound Faraday structures are seen along a single line of sight. In a blind data challenge comparing different methods for extracting the parameters of Faraday structures, Sun et al. (2015) noted that RM synthesis was highly prone to missing composite Faraday structures and that the ∼25 per cent of extragalactic sources found observationally by Law et al. (2011) to be Faraday complex was likely an underestimate. Sun et al. (2015) concluded that when Faraday complexity is present, only QU-fitting methods will produce results that do not introduce extraneous scatter in recovered RM values. Such QU-fitting methods, e.g. Farnsworth, Rudnick & Brown (2011), O’Sullivan et al. (2012), and Ideguchi et al. (2014), model the observed polarization, |$\tilde{P}(\lambda ^2)$|⁠, directly rather than decomposing Faraday depth spectra to recover parameters. However, whilst QU-fitting methods can be more reliable for recovering composite structures, in particular those including Faraday thick components, in general they rely on physically parametrized models of the synchrotron-emitting and Faraday rotating media along the line of sight to be known a priori. One recent exception to this is the deconvolution method proposed by Pratley et al. (2020), which uses a constrained clean-style non-parametric QU-fitting approach that combines QU-fitting with deconvolution.

In this work we explore the potential for using a Gaussian process modelling (GPM) approach to improve both the interpretation of Faraday depth spectra and the recovery of physical parameters from polarization measurements. In its simplest form, a GPM approach can be considered part of the family of QU-fitting methods; however, it has certain advantages over physically parametrized QU-fitting methods, which include both a robustness to using non-exact models for imperfectly known data sets and an ability to provide predictive models for imputation including posterior variances. The contents of this paper are as follows, in Section 3 we describe the underlying GPM methodology including a description of the covariance kernel function employed and the optimization of hyperparameters; in Section 4 we illustrate the impact of GP optimization and re-sampling of polarization data on their resulting Faraday depth spectra in the presence of significant RFI flagging, and demonstrate how the rotation measures for Faraday thin data can be recovered from the hyperparameters of the GP; in Section 5 we demonstrate the method using a simulated source population with observational specifications matching that of the MeerKAT radio telescope; in Section 6 we discuss the implications of these results, compare the method to previous work in the literature and consider the use of more complex covariance kernels. In Section 7 we draw our conclusions.

2 GAUSSIAN PROCESS MODELLING FOR FARADAY DEPTH RECONSTRUCTION

Gaussian processes (GPs) are probabilistic non-parametric models that can be used for a variety of regression and classification problems. GPM can be used to describe a given data set by assuming a priori that the data themselves represent a realization of a stochastic process described by a multivariate Gaussian distribution. Such models are non-parametric as the GP is a prior on the data itself rather than on a parametric model, i.e. the complexity of the model grows with the inclusion of additional data points. GPMs are considered to be an attractive solution when modelling systems where the degree of complexity required by a parametric model is not necessarily supported by the amount of information in the observations. Overly complex parametric models are prone to overfitting and can lead to misinterpretation of the data. In contrast, Gaussian processes are flexible probabilistic regression models that do not require exact knowledge of the underlying physical model and can also provide posterior uncertainties that naturally reflect high uncertainty in regions of data space where there are few observational constraints.

In astrophysics, Gaussian processes have been used in various applications including in the study of photometric redshifts (Way et al. 2009), variability in stellar light curves (Angus et al. 2018), galactic dust emission (Leike & Enßlin 2019), eclipses in cataclysmic variable stars (McAllister et al. 2016), transmission spectroscopy of extrasolar planets (Gibson et al. 2012), and radio velocity studies for planetary candidates (Rajpaul et al. 2015). They have also been used extensively in cosmic microwave background analysis (Bond & Efstathiou 1987; Wandelt & Hansen 2003), as well as for modelling instrumental systematics and calibration, e.g. Gibson et al. (2012), Haywood et al. (2014), Barclay et al. (2015), Czekala et al. (2015), Evans et al. (2015), Rajpaul et al. (2015), Aigrain, Parviainen & Pope (2016), Rajpaul, Aigrain & Roberts (2016), Littlefair, Burningham & Helling (2017), Mertens, Ghosh & Koopmans (2018) and interferometric image reconstruction (Junklewitz et al. 2016).

2.1 GPM for imputation of missing polarization data

The transfer function described in equation (3) is a function of the observational sampling in λ2-space. In practice this sampling is affected not only by the characteristics of the telescope’s receiver system but also by the necessary removal of frequency channels affected by RFI. Radio telescopes operating at or close to L band (1.4 GHz) typically experience data flagging at a level of 20–50 per cent in bandwidth due to a wide variety of both terrestrial and satellite RFI sources (e.g. Mauch et al. 2020). As well as negatively impacting the sensitivity of an observation, this flagging can also result in large contiguous gaps in the frequency coverage that have a detrimental effect on the form of the RMTF.

Contingent on there being sufficient information in the remaining data, GPM may provide a potential method for imputing1 these missing data subject to only weak assumptions about the form of the data and without the need for a parametrized physical model. Furthermore, GPM not only provides a model estimate of the value for each missing data point but also an uncertainty on that value, allowing astronomers to set thresholds on the degree of reliability they are prepared to accept when imputing the values of missing data points.

From the perspective of imputation in a more general sense, this kind of application addressing RFI excision can be considered as an example of recovering data missing completely at random (MCAR) and/or missing at random (MAR; e.g. Seaman et al. 2013), depending on the nature of the RFI, i.e. it is being used to repair deficits from local interruptions in the data set rather than recovering information from a signal at or below the reliable cusp of detection.

2.2 GPM for resampling of irregular data

In addition to the imputation of missing data values, GPM can also provide a method of re-sampling observational data on to a uniform spacing in λ2-space. Since data are natively taken in channels uniformly spaced in frequency, some previous practical implementations of RM Synthesis have employed an initial processing step to re-bin spectral data, |$Q(\nu)~\&~U(\nu)$|⁠, into regularly spaced arrays of λ2 in order to use a fast Fourier transform (FFT) transformation into Faraday depth space, as implemented in for example pyrmsynth  2 package (described in Pratley & Johnston-Hollitt 2020) that was used to perform RM synthesis for the LOFAR Two-metre Sky Survey (Van Eck et al. 2018). Like RFI flagging, the nonlinear relationship between frequency and wavelength-squared results in a distribution of measured data points that are often separated by large contiguous gaps in λ2-space, resulting in high sidelobes in the RMTF. Furthermore, GPM can not only be used to re-sample data within the observed bandwidth but also to infer data beyond the boundaries of that bandwidth. Without additional data, these inferences will be subject to rapidly increasing degrees of uncertainty but may still be used over moderate distances in frequency space to improve the resolution of a measurement in Faraday depth. In practice, re-sampling to uniformly spaced λ2-positions and imputing the values of data that are missing due to RFI flagging can be done simultaneously.

3 METHOD

The basic premise of GPM is that, instead of employing a physically motivated or semi-empirical parametric model to describe a data set, we can instead assume that the data, |$\mathbf {y}(\mathbf {x})$|⁠, represent the realization of a random Gaussian process,
(4)
where |$K(\mathbf {x},\mathbf {x})$| is the matrix that defines the degree of covariance between every pair of measurement positions, (x, x), for |$\mathbf {x} = \lbrace x_1, x_2, \dots , x_n\rbrace$|⁠, and |$\mathbf {\mu }(\mathbf {x})$| is an underlying mean function, which in this case is considered to be zero-valued. In the case of non-negligible polarization leakage, this would manifest as μ(x) ≠ 0.
The expected value of the data at all other positions, x*, can be calculated as a posterior mean, μ*, with an associated posterior variance, C*, such that
(5)
 
(6)
see e.g. Rasmussen & Williams (2006) and Roberts et al. (2013).

For polarization data measured at regular intervals in frequency (spectral channels) and hence irregular intervals in λ2, the posterior mean can be used to infer the values of the polarization data at regular intervals, |$\lambda _{\ast }^2$|⁠, avoiding the need for explicit re-binning of the data, and the posterior variance can be used to define the weighting function |$W(\lambda _{\ast }^2)$| at those positions and hence calculate the RMTF.

3.1 Kernel definition

The covariance matrix used in equations (5) and (6) can be populated analytically using a kernel function, k, that defines the degree of covariance for each measurement separation,
(7)
where k(x1, x2) is the covariance between two measurements at a separation |x1x2|. We note that this assumes statistical homogeneity within a given data space, which may not be the case for data sets that combine significantly different λ2 regimes.

3.2 Covariance kernel definition

For the application to radio polarization data we use a compound covariance kernel. The contributions to this kernel comprise covariance arising both from the expected signal and from the properties of the measurement data. Faraday rotation results in periodicity of the complex polarization signal but the presence of additional Faraday components, including thick structures, can cause deviations from exact periodicity in the data. To represent this behaviour we use a quasi-periodic kernel formed from the multiplication of an exponential kernel and a periodic kernel.

We implement our kernel using the celerite Gaussian processing library (Foreman-Mackey et al. 2017). Traditional GPM scales as |$\mathcal {O}(N^3)$|⁠. This is due to the need to invert the covariance matrix and often makes GPM computationally intractable for large data sets. In contrast, the celerite GP implementation uses a restricted set of kernels that allow for fast separable evaluation with a linear complexity of |$\mathcal {O}(N)$|⁠.

The celerite kernel has a generalized exponential form
(8)
which can be used to produce a quasi-periodic kernel of the form
(9)
In this work we combine this quasi-periodic kernel with a white noise kernel
(10)
to take account of the thermal noise present on observational measurements of the complex polarization. Together, k1 and k2 are used to create a composite kernel
(11)
where the contributions of the different components are governed by the values of their hyperparameters, h, c, P, and σ2. While the values of these hyperparameters need to be inferred from the data, they do not explicitly characterize a functional form for the data, but instead govern the scales on which different covariance distributions act.

3.3 Hyperparameter estimation

The values of the hyperparameters can be optimized by evaluation of a likelihood function using the measured data. Polarization data are complex valued, where
(12)
and so the log-likelihood function can be broken down such that
(13)
The components of this likelihood are
(14)
 
(15)
where
(16)
 
(17)
In the case where the covariance matrix is characterized solely by stationary kernels, i.e. kernels that are functions of the sample separation, |xixj|, only, it can be assumed that |$\mathbf {K}_{\rm U}\equiv \mathbf {K}_{\rm Q}$| and in this paper we assume a single kernel of the form given in equation (9) to describe both Stokes parameters, the hyperparameters of which are optimized using the joint likelihood in equation (13).

4 APPLICATION TO SIMULATED DATA

4.1 Example scenarios

We illustrate the use of the GPM method on two examples: (1) a simple Faraday Thin scenario, and (2) a more complex scenario involving both Faraday Thin and Faraday Thick components. This second scenario corresponds to the example geometry outlined in section 1 of Brentjens & de Bruyn (2005).

4.1.1 Scenario 1

In this scenario we assume polarized emission from the lobe of a radio galaxy with an intensity of 1 Jy at a reference frequency of νref = 1.4 GHz and a synchrotron spectral index of α = 0.7, which has a single valued Faraday depth of 50 rad m−2 such that
(18)
 
(19)

4.1.2 Scenario 2

In this scenario we adopt the second line of sight geometry of Brentjens & de Bruyn (2005), where
(20)
 
(21)
and the total polarization is given by
(22)

In both scenarios we assume that the signal is measured over a frequency bandwidth from 0.58 to 2.50 GHz in 512 evenly spaced frequency channels. We map each channel frequency to a corresponding value of λ2. For each measurement we assume that (Q, U, σQ, σU) are recorded, where Q corresponds to the real part of the polarization and U corresponds to the imaginary component.

4.2 Faraday depth spectra

We project the data into Faraday depth space following the numerical method outlined in Brentjens & de Bruyn (2005). The resulting Faraday depth spectra are calculated using
(23)
where Wi is the weight for channel i at a wavelength of λi, and Pi is the complex polarization measured in that channel, see equation (2). These are the dirty Faraday depth spectra, i.e. they are convolved with the RMTF, which itself is calculated using
(24)
In each case, K, is the sum of the weights,
(25)
where the weights are defined as the reciprocal of the variance in each channel, and λ0 is a reference wavelength used to derotate the polarization data to a common zero-point defined as
(26)

The recovered Faraday depth spectra for Scenario 2 are shown in Fig. 1, which illustrates the difference in the Faraday depth spectra when polarization data are sampled uniformly in frequency compared to being sampled uniformly in λ2-space. From the figure, two key differences can be observed. First, by comparing the full width at half-maximum (FWHM) of the RMTF in each case, it can be seen that the resolution in Faraday depth is improved by a factor of ∼2. This is important for resolving different Faraday components and complex structure. Secondly, it can be seen that the main lobe of the RMTF is clearly separated from the sidelobe structure in the case of uniform λ2-sampling, and that the amplitude of the first sidelobe is ∼3 times smaller than that resulting from uniform sampling in frequency. As well as aiding visual inspection, this is important for approaches that incorporate deconvolution, as incorrect or imperfect identification of model component positions during deconvolution will result in residuals with amplitudes proportional to RMTF sidelobe levels. As a consequence of these factors, the Faraday thick structure from Scenario 2 in Fig. 1 is resolved in the resulting Faraday depth spectrum when using uniform λ2-sampling, with the positions of the edges clearly delineated. Additionally, we note that the sidelobe response of regular sampling in λ2-space could be further improved by the application of a window function prior to Fourier transforming, and that such an application could be tailored to balance the contributions from sidelobe structure, thermal noise and resolution, as needed for an individual science case.

Scenario 2: RMTF (left), Faraday depth spectrum (centre), and Faraday depth spectrum overlaid with input model (right) for the case of (i) data evenly sampled in frequency (top) and (ii) data evenly sampled in wavelength-squared (bottom). Data are noiseless in both cases.
Figure 1.

Scenario 2: RMTF (left), Faraday depth spectrum (centre), and Faraday depth spectrum overlaid with input model (right) for the case of (i) data evenly sampled in frequency (top) and (ii) data evenly sampled in wavelength-squared (bottom). Data are noiseless in both cases.

In what follows, we use inverse variance weights for calculating Faraday depth spectra. For the optimized GP reconstructions, we use an approximate predictive posterior variance for the weights given by
(27)
(Rasmussen & Williams 2006), where |$\sigma _{\rm n}^2$| is the amplitude of the white noise kernel in equation (10). The white noise variance needs to be included because we are building a Faraday depth spectrum from the model estimate of the noisy complex polarization data.

4.3 Parameter estimation

To optimize the hyperparameters of our GP kernel we use Markov Chain Monte Carlo (MCMC) implemented via the emcee package (Foreman-Mackey et al. 2013) to explore the posterior probability distribution. We use 500 samples for a burn-in then take the maximum likelihood hyperparameter values from the burn-in and perturb them by adding noise at a level of 10−5. We then use these perturbed values to start a production run of 5000 samples. Prior ranges for each of the hyperparameters are shown in Table 1. The lower limit on log P corresponds to a maximum Faraday depth of ∼1200 rad m−2.

Table 1.

Range of uniform priors used for each of the hyperparameters in GP kernel, see equation (9).

HyperparameterPrior
ln h(−25,5)
ln c(−25,5)
ln P(−6,10)
ln σ(−25,5)
HyperparameterPrior
ln h(−25,5)
ln c(−25,5)
ln P(−6,10)
ln σ(−25,5)
Table 1.

Range of uniform priors used for each of the hyperparameters in GP kernel, see equation (9).

HyperparameterPrior
ln h(−25,5)
ln c(−25,5)
ln P(−6,10)
ln σ(−25,5)
HyperparameterPrior
ln h(−25,5)
ln c(−25,5)
ln P(−6,10)
ln σ(−25,5)

4.4 Imputation of missing data

RFI flagging creates gaps in the polarization data thereby reducing the λ2-coverage in an irregular fashion and causing the sidelobe structure in the RMTF to become more prominent.To mimic the RFI flagging process, we remove random portions of data from the simulated data set. To do this, a random position in the data set is selected and a chunk comprised of a randomly selected number of entries is removed. This process is repeated until the desired percentage of data is removed. We simulate data sets with 20, 30, and 40 per cent missing values. An example of this flagging using the Scenario 1 data set is shown in Fig. 2, which shows the input data, flagged at a level of 30 per cent. These data are scattered by white noise, giving an SNR of one in each channel.

Left upper: Stokes parameters for Scenario 1 (see Section 4.1) with SNRchan = 1 and 30 per cent of the data flagged (black & grey error bars), and maximum a posteriori GP model (equation 9; blue solid line) with one standard deviation posterior uncertainty (grey shaded area). Left lower: residuals between the posterior mean for Stokes Q and the data (black points) and residuals between the posterior mean and the missing/flagged Stokes Q data (blue points). The ±3σ limits are marked by grey dashed lines. Right: The inferred period of the process with the true period marked (vertical blue line) as well as the 16, 50, and 84 per cent confidence intervals of the posterior probability distribution (grey dashed lines).
Figure 2.

Left upper: Stokes parameters for Scenario 1 (see Section 4.1) with SNRchan = 1 and 30 per cent of the data flagged (black & grey error bars), and maximum a posteriori GP model (equation 9; blue solid line) with one standard deviation posterior uncertainty (grey shaded area). Left lower: residuals between the posterior mean for Stokes Q and the data (black points) and residuals between the posterior mean and the missing/flagged Stokes Q data (blue points). The ±3σ limits are marked by grey dashed lines. Right: The inferred period of the process with the true period marked (vertical blue line) as well as the 16, 50, and 84 per cent confidence intervals of the posterior probability distribution (grey dashed lines).

Following Sun et al. (2018), we define the SNR of the integrated polarization data as
(28)
where σν is the standard deviation of the random Gaussian noise added to each frequency channel and Nchan is the number of frequency channels. P0 is the polarized amplitude at the reference frequency, taken in this case to be 1.4 GHz. Noise is added independently to both Stokes Q and Stokes U data.

Fig. 2 also shows the posterior mean from the GP model, optimized using only the unflagged data points, as well as the residual between the input data and the optimized GP model for both the unflagged and flagged data points. The right-hand panel of Fig. 2 shows the posterior probability distribution for the hyperparameter, P, which can be used to calculate the rotation measure as described later in this work, see Section 4.6.

The effect of the flagging process on the Faraday depth spectrum is shown for Scenario 1 in Fig. 3. In Fig. 3 (centre), data have been imputed at regular intervals across the full bandwidth of the simulation and the Faraday depth spectrum calculated using the posterior mean values, weighted by the posterior covariance at each point.

Faraday depth spectra for Scenario 1 (Faraday thin) with 20 per cent (top), 30 per cent (middle), and 40 per cent (bottom) of channel data flagged and SNRchan = 1. Left: Faraday depth spectrum from flagged data. Right: Faraday depth spectra from GPM reconstructed data as described in Section 4.4. The ±5 σϕ limits are shown as a light grey shaded area.
Figure 3.

Faraday depth spectra for Scenario 1 (Faraday thin) with 20 per cent (top), 30 per cent (middle), and 40 per cent (bottom) of channel data flagged and SNRchan = 1. Left: Faraday depth spectrum from flagged data. Right: Faraday depth spectra from GPM reconstructed data as described in Section 4.4. The ±5 σϕ limits are shown as a light grey shaded area.

It is noticeable in Fig. 3 that the spectra calculated from the optimized GP model appear significantly cleaner than those calculated from the original data. This is in part because the sidelobe structure due to the gaps in frequency coverage is reduced, but also because the posterior mean values from the model are not scattered by the thermal noise in the same way as the original data. Although this may be advantageous in enhancing some features of the spectra, it can also potentially lead to overinterpretation of low-level features. We caution that structures present at amplitudes below |$5\!-\!8\, \sigma _{\phi }$| should not be considered to represent true astrophysical components but are more likely to arise from structure in the noise. For uniform spectral noise, i.e. noise per frequency channel, the value of σϕ can be calculated by taking into account the number of independent measurements in complex polarization, which is |$N_{\rm chan}^{\prime }$|⁠, where |$N_{\rm chan}^{\prime }$| is the number of unflagged frequency channels in the data set. By analogy to the quantity P2), which has σP = σ when σQ = σU = σ (Brentjens & de Bruyn 2005), the amplitude of the Faraday depth spectrum will have noise |$\sigma _{\phi } = \sigma /\sqrt{N_{\rm chan}^{\prime }}$|⁠. The threshold indicated in Fig. 3 is shown at 5σϕ.

The right-hand column of Fig. 3 shows the Faraday depth spectrum reconstructed from the difference between the GP estimate and the noiseless model in Scenario 1. We note that this difference is not the same as the residual that would be obtained from deconvolution of Faraday depth reconstruction of the GP estimate, which depends only on the data itself, but instead describes the degree of difference between the true model structure and the estimate fitted to the noisy data. Residuals in this structure below the level of the noise threshold are consistent with the model overfitting to noise fluctuations in the Stokes Q and U data from the perspective of the model. One notable feature of this difference in the case of low SNR is the appearance of a residual peak at ϕ = −50 rad m−2 for Scenario 1. This is due to the fact that the GP model is not parametrically constrained to produce the same amplitude for both the Stokes Q and U components, but rather the amplitude will follow the pattern of the noise in the data. For low signal to noise data, this can result in a local offsets between the amplitude of Q and U inferred by the GP estimate as a function of λ2. From the perspective of the model this local offset removes flux from the true peak and moves it to the mirrored peak, resulting in the double structure seen in the residual Faraday depth spectrum. However, once again we note that from the perspective of the data, this is a consequence of the noise realization, which is why it is reflected in the GP model estimate.

As illustrated by Foreman-Mackey et al. (2017), it is often the case that valid inferences can be made using incorrect GP models. For physical processes whose underlying properties are not well understood, such as generalized Faraday spectra, this can significantly complicate the process of selecting a parametric model. Incorporating domain knowledge helps to make the problem of model selection and interpretation of parameters less difficult. We demonstrate this by using the GP kernel from Scenario 1 to impute missing data for simulations of Scenario 2. An example of this is shown in Fig. 4, which shows a simulated data set where 20 per cent of the data points have been removed to mimic RFI flagging. In spite of not being an exact model, the optimized GP is able to reconstruct the missing data within the measurement uncertainties. It can also be seen that the optimized model is able to recover the rotation measure of the radio galaxy with high accuracy.

Left upper: Stokes parameters for Scenario 2 (see Section 4.1; black & grey error bars), and maximum a posteriori GP model (equation 9; blue solid line) with one standard deviation posterior uncertainty (grey shaded area). Left lower: residuals between the posterior mean for Stokes Q and the data (black points) and residuals between the posterior mean and the missing/flagged Stokes Q data (blue points). The ±3σ limits are marked by grey dashed lines. Right: The inferred period of the process with the true period marked (vertical blue line) as well as the 16, 50, and 84 per cent confidence intervals of the posterior probability distribution (grey dashed lines).
Figure 4.

Left upper: Stokes parameters for Scenario 2 (see Section 4.1; black & grey error bars), and maximum a posteriori GP model (equation 9; blue solid line) with one standard deviation posterior uncertainty (grey shaded area). Left lower: residuals between the posterior mean for Stokes Q and the data (black points) and residuals between the posterior mean and the missing/flagged Stokes Q data (blue points). The ±3σ limits are marked by grey dashed lines. Right: The inferred period of the process with the true period marked (vertical blue line) as well as the 16, 50, and 84 per cent confidence intervals of the posterior probability distribution (grey dashed lines).

In this case, recovering the Faraday depth of the thin component is not as strong an indicator of performance as in Scenario 1. Instead we compare the posterior mean of the optimized GP to the input model without noise and the residuals for this comparison are shown in the lower left-hand panel of Fig. 4. Fig. 5 shows the Faraday depth spectrum reconstructed from the optimized GP for Scenario 2 in the presence of different levels of RFI flagging. These should be compared to the noiseless model spectrum shown in Fig. 1.

Faraday depth spectra for Scenario 2 (Faraday complex) with 20 per cent (top), 30 per cent (middle), and 40 per cent (bottom) of channel data flagged and SNRchan = 5 at a reference frequency of 1.4 GHz. Left: Faraday depth spectrum from flagged data. Centre: Faraday depth spectra reconstructed from the GP estimate as described in Section 4.4. Right: Faraday depth residuals formed by subtracting the noiseless model from the GP estimate. The ±5 σϕ limits are shown as a light-grey shaded area.
Figure 5.

Faraday depth spectra for Scenario 2 (Faraday complex) with 20 per cent (top), 30 per cent (middle), and 40 per cent (bottom) of channel data flagged and SNRchan = 5 at a reference frequency of 1.4 GHz. Left: Faraday depth spectrum from flagged data. Centre: Faraday depth spectra reconstructed from the GP estimate as described in Section 4.4. Right: Faraday depth residuals formed by subtracting the noiseless model from the GP estimate. The ±5 σϕ limits are shown as a light-grey shaded area.

4.5 Multiple sources along the line of sight

As described in Sun et al. (2015), it is expected that many lines of sight will have more than one Faraday thin component. Such Faraday geometries are also able to be recovered by the kernel in equation (9). In this case the hyperparameter P will represent the |RM| of the component with the highest rotation. An example of where this can occur is the superposition of radio galaxy jets or lobes oriented along a line of sight unresolved by the beam of a telescope. Such geometries result in double Faraday components, typically with similar Faraday depths as the integrated rotation differs only by the degree contributed in the local medium between the jets whilst experiencing a common degree of rotation from the comparatively larger path between the galaxy and the observer. An example of such a situation is shown in Fig. 6. We illustrate the GP model in this figure at high signal to noise, SNRchan = 20, to demonstrate that the same GP kernel from equation (9) can accurately represent the complex polarization behaviour of multiple sources without requiring additional kernel components. The Faraday depth spectra in the lower panels of Fig. 6 show the recovered Faraday depth spectrum from the noisy input data (left) and from the optimized GP reconstruction (right). It can be observed that the double structure is more apparent in the GP reconstruction. We note that the separation of the peaks in Faraday depth is wider than the separation of components in the model. This is not an artefact of the GP reconstruction, but due to the position of the first null in the RMTF.

Upper: Stokes parameters for line of sight comprised of two Faraday thin sources with SNRchan = 20 (black & grey error bars), and maximum a posteriori GP model (equation 9; blue solid line) with one standard deviation posterior uncertainty (grey shaded area). Left lower: Faraday depth spectrum calculated from the noisy input data sampled at regular intervals in frequency. Right lower: Faraday depth spectrum calculated from the optimized GPM sampled at regular intervals in λ2-space with the model components marked as solid black lines.
Figure 6.

Upper: Stokes parameters for line of sight comprised of two Faraday thin sources with SNRchan = 20 (black & grey error bars), and maximum a posteriori GP model (equation 9; blue solid line) with one standard deviation posterior uncertainty (grey shaded area). Left lower: Faraday depth spectrum calculated from the noisy input data sampled at regular intervals in frequency. Right lower: Faraday depth spectrum calculated from the optimized GPM sampled at regular intervals in λ2-space with the model components marked as solid black lines.

4.6 Rotation measure extraction

Under certain circumstances it is possible to use the optimized hyperparameters to recover the rotation measure of a source. In principle, for Faraday thin sources the absolute value of the rotation measure can be recovered from the periodicity using,
(29)
Due to the stationary nature of the covariance kernels, it is not possible to recover the sign of the rotation measure from the hyperparameters of the covariance matrix; however, it can be recovered by comparing the correlation co-efficient, ccorr, between (e.g.) the Stokes Q data and the Stokes U data shifted by ±π/2 radians, U±:
(30)

Using Scenario 1, we evaluate the accuracy of RM recovery for the Faraday thin source as a function of the SNR of the integrated polarization data, SNRint. It can be seen in Fig. 7 that for Scenario 1, the rotation measure is recovered with <1σ deviation from the true value for even low values of SNRint < 10 where the SNR on an individual frequency channel, SNRchan ≪ 1. A minimum value of SNRint = 8 is typically used to identify sources in total polarization images, see e.g. George, Stil & Keller (2012), and this value is marked in Fig. 7. It can be seen that the optimized RM value is in good agreement with both the expectation value and the true value of the RM above this limit.

Recovered rotation measures (RM) as a function of the signal-to-noise ratio (SNR) in the integrated polarization data for Scenario 1 are shown for a range of different ϕgal values. The difference between the expectation of the RM from the GP and the input to the Scenario 1 model are shown in each case with uncertainties calculated using the 68 per cent confidence interval on the optimized value of log P, which results in asymmetric uncertainties on the RM values. The point at which the SNR in an individual frequency channel is equal to one is shown as a vertical dashed line (blue). The distribution of equivalent expectation values for ten different noise realizations are shown as light-grey lines.
Figure 7.

Recovered rotation measures (RM) as a function of the signal-to-noise ratio (SNR) in the integrated polarization data for Scenario 1 are shown for a range of different ϕgal values. The difference between the expectation of the RM from the GP and the input to the Scenario 1 model are shown in each case with uncertainties calculated using the 68 per cent confidence interval on the optimized value of log P, which results in asymmetric uncertainties on the RM values. The point at which the SNR in an individual frequency channel is equal to one is shown as a vertical dashed line (blue). The distribution of equivalent expectation values for ten different noise realizations are shown as light-grey lines.

However, we note that when using this method to obtain an RM there is a lower limit on the absolute value of the rotation measure that can be reliably recovered. This limit is dependent on the both the frequency coverage of the observational data and the signal to noise ratio. At very low values of Faraday depth the degree of rotation in the complex polarization signal becomes smaller and causes a degeneracy between hyperparameter c in equation (9) and the periodicity, P. This is exacerbated for data at higher frequencies where the rotation is more poorly sampled. This lack of rotation also causes the method for assigning the direction of the rotation measure described in equation (30) to be less effective, as there are not guaranteed to be ±π/2 radians of rotation present in the observed data. As this lower limit, ϕlim, is a strong function of frequency coverage, it is possible to determine it a priori for a particular telescope such that
(31)
where Δλ2 is the coverage of the observational data in λ2-space and the given expression corresponds to that coverage being equivalent to π/2 radians for ϕlim. As illustrated in Fig. 7, RM values above this limit are recovered reliably for SNRint ≥ 8.

We emphasize that this limit is only applicable to the calculation of the rotation measure from the hyperparameters and does not affect the improvement in the Faraday depth spectrum from GPM re-sampling or imputation, as shown in Fig. 1, where structure at low Faraday depths is recovered equivalently well.

5 PERFORMANCE CONSIDERATIONS FOR SURVEYS

We now extend the application to simulated data to a large data set, such as might be recovered from a MeerKAT observation of a survey field. We use a simulated data set containing Stokes Q and Stokes U data for 10 000 Faraday thin sources. Observational parameters are matched to those of the MeerKAT telescope, one of the precursor arrays of the Square Kilometre Array (SKA; Dewdney et al. 2009; Jonas 2009). The MeerKAT consists of 64 13.5 m parabolic dishes sited in the Northern Cape province of South Africa (Jonas 2016; Taylor & Jarvis 2017).

A population of 10 000 Faraday thin sources was created with a Stokes I flux density at 1.4 GHz taken from a uniform distribution with values between 10 μJy and 30 mJy, and an RM taken from a uniform distribution between −100 and  +100 rad m−2. Stokes Q and U values were generated for 320 channels spaced uniformly in frequency from 0.88 to 1.68 GHz. The intrinsic angle of polarization for each source, χ0, was sampled randomly from a uniform distribution [−π, +π). For each frequency channel, noise was added to the Stokes Q and U values using a normal distribution with a mean of zero and a variance of 180 μ Jy beam−1, equivalent to an integrated noise level of ∼10 μJy beam−1. These specifications are representative of the observational parameters for the MeerKAT MIGHTEE survey (Taylor & Jarvis 2017).

The frequency-dependence of the flux density was modelled such that
(32)
where Sν is the flux density at each frequency, ν, |$S_{1.4\rm {GHz}}$| is the flux density at 1.4 |$\rm {GHz}$|⁠, and α is the spectral index, which is taken to be the canonical value of α  = 0.7 for synchrotron radiation.
The intensity of the polarized signal P is given by
(33)
where Q and U are the Stokes parameters for linear polarization, I is the total intensity Stokes parameter, and Π is the polarization fraction. To simulate values for Π, the following relationship was used,
(34)
(Stil et al. 2014), where Π is the median percentage polarization fraction and |$S_{\rm 1.4\, GHz}$| is the 1.4 GHz flux density in mJy.

We randomly select 500 sources from this population with an SNRint ≥ 8 and |RM| > ϕlim and use an optimized GP with the kernel specified in equation (9) to recover the rotation measure for each object. We use a prior ranges as specified in Table 1. Uncertainties on the recovered RMs are calculated from the 68 per cent confidence interval on the optimized value of log P, which results in asymmetric uncertainties on the RM values. For 500 randomly selected samples under these constraints all of the recovered RMs lie within 5σ of the true value and |$\gt 99{{\ \rm per\ cent}}$| within 3σ. The average run time per object is 43 seconds on a Macbook Pro with a 3.5 GHz Intel Core i7 processor.

6 DISCUSSION

6.1 Using GP reconstructed Faraday depth spectra

Examples of Faraday depth spectra reconstructed from the optimized GP for Scenario 1 and Scenario 2 are shown in Figs 3 and 5. The reconstructed Faraday depth spectra in each case (right hand column of both figures) represent the structure in the quasi-periodic component of the covariance kernel only. Although the white noise component is taken into account during the optimization, it is not appropriate to scatter the predicted values by this distribution as the white component represents the measurement noise on the original data only. As mentioned already in Section 4.4, in order to avoid overinterpretation of the reconstructed Faraday depth spectrum, the significance of features below the equivalent detection threshold in Faraday depth space should be carefully considered.

If the optimized GPM was used to infer the posterior mean only at the positions of the input data, the model could be subtracted from the data at those positions to estimate a residual, as shown in the lower panel of Figs 2 and 4. This residual could then be transformed separately to the model and combined with the model reconstruction in a manner similar to that employed by the final step of the Cotton-Schwab CLEAN algorithm in synthesis imaging. We note that a kernel-based GPM approach to two-dimensional image synthesis and deconvolution in radio astronomy is likely to be prohibitive in its computational complexity, as the fast separable kernels employed here are only appropriate for one-dimensional data. However, kernel-free methods such as those described by Arras et al. (2020) can achieve similar scaling.

6.1.1 Recovery of Faraday thick structure

As can be seen in Fig. 5, using an optimized GP to reconstruct the Faraday depth spectrum in the case where Faraday thick structure is present can reveal these structures without the need for assuming any specific parametric model of Faraday depth a priori. Although it is not possible to recover the parametrized geometry of the thick structure from the hyperparameters of the GP in this case, the shape is far more clearly delineated in the GP-reconstructed Faraday depth spectrum than in the spectrum calculated from the original data. This remains the case even in the presence of high degrees of RFI excision.

In principle, GPs might also be used to infer the complex polarization signal at smaller values of λ2 than are present in the original data set, which could be beneficial for recovering extended structure in Faraday depth space. When used to infer the behaviour of a signal outside the range of measurements, the posterior covariance of a GP will increase in response to the stationary nature of the covariance not finding any additional measurements to anchor it. This increase in covariance can be used to down-weight predicted values of the posterior mean, relative to those known with more confidence. The behaviour of the inferred posterior mean will tend towards the mean function and in the general case where the mean function is assumed to be zero-valued, this is equivalent to the same logical sampling as in the original data set; however, for spectra where Faraday thick structure are present, which have non-zero means at low-λ2 this may artificially truncate structure. This was found to be the case here for the kernel in equation (9) when used to extend λ2-space.

It is also important to note that structures which exist only on scales larger than that set by the measured |$\lambda ^2_{\rm min}$| will not be recovered by a GP model used for inferring data at positions outside the range of the original measurements. Hence this form of structure enhancement for Faraday thick structures can only be used to improve the recovery of structures which exist on scales that are already sampled in part by the original measurements. Any larger structures will not appear in the Faraday depth spectrum of the GP reconstruction, an effect analogous to flux loss in synthesis imaging for radio interferometry.

6.1.2 Zero-valued Faraday depth signals

In the scenarios considered here, we have assumed a zero-valued mean function. In the absence of instrumental leakage, this assumption is valid for modelling sources with non-zero Faraday rotation; however, there are also two scenarios where a non-zero offset in both Stokes Q and Stokes U can occur, resulting in a zero-valued Faraday depth signal. The first of these is the case of un-rotated radio polarization signals, the second is the presence of residual instrumental leakage. The inclusion of such a constant offset as a free parameter (separately for Q and U) in the optimization process is straightforward to implement and requires only very minor changes to be made in the calculation of the posterior mean. However, for some telescopes with wide bandwidths, polarization leakage can be frequency-dependent and consequently modelling it as a constant offset will not be sufficient. In this case there is potential for confusion between the leakage term and covariate behaviour in the complex polarization; as can be seen in Fig. 4, smoothly varying systematic changes in amplitude can be modelled perfectly well using stationary kernels.

6.2 Faraday Challenge

In order to evaluate the performance of the GP approach against other methods in the literature we use the 17 models from the Faraday challenge of Sun et al. (2015). These models include examples of single Faraday thin objects, multiple (two) Faraday thin objects, and combinations of Faraday thick and Faraday thin objects. While the GP approach can only determine RM values in the case of single Faraday thin models, we can use the model evaluation metrics to compare the predicted Faraday depth spectra in all cases. For this purpose we use the kernel from equation (10) for all test data sets with priors as listed in Table 1. All evaluations are made on models with SNRint = 32, as in Sun et al. (2015).

The reduced χ2 for parametric models described in Sun et al. (2015) is roughly equivalent to the standardized mean squared error (SMSE). For an optimized GP, the SMSE is defined as
(35)
where P* is the posterior mean of complex polarization from the optimized GP, P is the input complex polarization, and |$\sigma _{\rm n}^2$| is the variance of the white noise on the input data. In order to account for the independence of the white noise between Stokes Q and U we set N = 600. The difference between the SMSE and the |$\chi ^2_{\rm r}$| used by Sun et al. (2015) is the normalization, which instead uses NK in the denominator where K is the number of parameters. Although the GP used here is not strictly parametric, we calculate this quantity using K = 4 to represent the number of hyperparameters in our model.
Since the GP optimization method produces a full predictive distribution at the position of each test sample, we also calculate the mean standardized log likelihood as a measure of performance (Rasmussen & Williams 2006). The negative log-likelihood for the GP is given by
(36)
where P are the input polarization data, P* are the GP reconstructed polarization data, and |$\sigma _{\ast }^2$| is the variance of the GP reconstruction, defined in equation (27). We standardize this loss by subtracting the equivalent loss as calculated using a model that predicts using a simple Gaussian with the mean and variance of the input data. The individual performance metrics for each model in the Faraday challenge are listed in Table 2.
Table 2.

Faraday challenge models. This table lists the input parameters for the 17 Faraday challenge models in Columns 1–9, and the evaluation metrics for the predicted complex Stokes data from the optimized GP in Columns 10–12. See Section 6.2 for details.

Input parametersEvaluation metrics
Modelp1ϕ1χ1p2ϕ2χ2p0ϕcϕs|$\chi ^2_r$|SMSEMSLL〈ϕ −SMSE〉
(per cent)rad m−2deg(per cent)rad m−2deg(per cent)rad m−2rad m−2
01100.0500.10400.880.87−1.461.62
02100.049.38601.000.99−0.941.31
03100.04.96601.061.05−0.471.01
0425.0−37.84016.70103.18−361.031.03−0.991.33
0525.0−37.84−4024.005.05−400.960.95−0.711.18
0625.0−37.84−409.005.05−401.051.04−0.801.16
0725.0−44.55016.7037.50721.081.07−0.511.28
0825.0232.56409.00192.70401.041.03−1.271.30
0925.0−37.83−4016.505.051400.910.90−0.721.05
1025.0−37.8409.00103.00−361.051.05−0.901.27
1125.0149.504023.75163.50−681.101.09−1.341.33
1225.0−232.5609.00−50.10721.031.02−1.471.41
1325.0−44.55024.0037.54720.990.98−0.581.17
141.9−136.98500.870.87−1.411.59
151.8−240.22−361.9−250.17501.071.06−1.351.16
161.9−136.98250.990.98−1.531.25
171.8−240.00−361.9−250.17251.021.01−1.411.20
Input parametersEvaluation metrics
Modelp1ϕ1χ1p2ϕ2χ2p0ϕcϕs|$\chi ^2_r$|SMSEMSLL〈ϕ −SMSE〉
(per cent)rad m−2deg(per cent)rad m−2deg(per cent)rad m−2rad m−2
01100.0500.10400.880.87−1.461.62
02100.049.38601.000.99−0.941.31
03100.04.96601.061.05−0.471.01
0425.0−37.84016.70103.18−361.031.03−0.991.33
0525.0−37.84−4024.005.05−400.960.95−0.711.18
0625.0−37.84−409.005.05−401.051.04−0.801.16
0725.0−44.55016.7037.50721.081.07−0.511.28
0825.0232.56409.00192.70401.041.03−1.271.30
0925.0−37.83−4016.505.051400.910.90−0.721.05
1025.0−37.8409.00103.00−361.051.05−0.901.27
1125.0149.504023.75163.50−681.101.09−1.341.33
1225.0−232.5609.00−50.10721.031.02−1.471.41
1325.0−44.55024.0037.54720.990.98−0.581.17
141.9−136.98500.870.87−1.411.59
151.8−240.22−361.9−250.17501.071.06−1.351.16
161.9−136.98250.990.98−1.531.25
171.8−240.00−361.9−250.17251.021.01−1.411.20
Table 2.

Faraday challenge models. This table lists the input parameters for the 17 Faraday challenge models in Columns 1–9, and the evaluation metrics for the predicted complex Stokes data from the optimized GP in Columns 10–12. See Section 6.2 for details.

Input parametersEvaluation metrics
Modelp1ϕ1χ1p2ϕ2χ2p0ϕcϕs|$\chi ^2_r$|SMSEMSLL〈ϕ −SMSE〉
(per cent)rad m−2deg(per cent)rad m−2deg(per cent)rad m−2rad m−2
01100.0500.10400.880.87−1.461.62
02100.049.38601.000.99−0.941.31
03100.04.96601.061.05−0.471.01
0425.0−37.84016.70103.18−361.031.03−0.991.33
0525.0−37.84−4024.005.05−400.960.95−0.711.18
0625.0−37.84−409.005.05−401.051.04−0.801.16
0725.0−44.55016.7037.50721.081.07−0.511.28
0825.0232.56409.00192.70401.041.03−1.271.30
0925.0−37.83−4016.505.051400.910.90−0.721.05
1025.0−37.8409.00103.00−361.051.05−0.901.27
1125.0149.504023.75163.50−681.101.09−1.341.33
1225.0−232.5609.00−50.10721.031.02−1.471.41
1325.0−44.55024.0037.54720.990.98−0.581.17
141.9−136.98500.870.87−1.411.59
151.8−240.22−361.9−250.17501.071.06−1.351.16
161.9−136.98250.990.98−1.531.25
171.8−240.00−361.9−250.17251.021.01−1.411.20
Input parametersEvaluation metrics
Modelp1ϕ1χ1p2ϕ2χ2p0ϕcϕs|$\chi ^2_r$|SMSEMSLL〈ϕ −SMSE〉
(per cent)rad m−2deg(per cent)rad m−2deg(per cent)rad m−2rad m−2
01100.0500.10400.880.87−1.461.62
02100.049.38601.000.99−0.941.31
03100.04.96601.061.05−0.471.01
0425.0−37.84016.70103.18−361.031.03−0.991.33
0525.0−37.84−4024.005.05−400.960.95−0.711.18
0625.0−37.84−409.005.05−401.051.04−0.801.16
0725.0−44.55016.7037.50721.081.07−0.511.28
0825.0232.56409.00192.70401.041.03−1.271.30
0925.0−37.83−4016.505.051400.910.90−0.721.05
1025.0−37.8409.00103.00−361.051.05−0.901.27
1125.0149.504023.75163.50−681.101.09−1.341.33
1225.0−232.5609.00−50.10721.031.02−1.471.41
1325.0−44.55024.0037.54720.990.98−0.581.17
141.9−136.98500.870.87−1.411.59
151.8−240.22−361.9−250.17501.071.06−1.351.16
161.9−136.98250.990.98−1.531.25
171.8−240.00−361.9−250.17251.021.01−1.411.20

The median of the reduced chi-squared statistic recovered from our model, |$\chi ^2_{\rm r} = 0.99$|⁠, is in good agreement with the expected optimum value of 1.00 ± 0.02 from Sun et al. (2015). Results for both the SMSE and |$\chi _{\rm r}^2$| across all 17 test models, evaluated using SNR = 32 as in Sun et al. (2015), are shown in Fig. 8. By comparison to Fig. 3 in that work it can be seen that the GP |$\chi _{\rm r}^2$| has a median value comparable to that of the three best performing methods in that work, each of which use QU-fitting approaches. Furthermore, it can also be seen that the distribution of |$\chi _{\rm r}^2$| for the GP method is significantly narrower than those methods, each of which has a 68 per cent confidence interval that extends to values larger than |$\chi _{\rm r}^2=1.1$|⁠.

Box-and-whisker plots for figures of merit over all 17 Faraday challenge models. The boxes show the first, the second (median), and the third quartile, and the ends of the whiskers show the edges of the distribution. Points deemed to be outliers are shown as open circles.
Figure 8.

Box-and-whisker plots for figures of merit over all 17 Faraday challenge models. The boxes show the first, the second (median), and the third quartile, and the ends of the whiskers show the edges of the distribution. Points deemed to be outliers are shown as open circles.

In order to evaluate the resulting Faraday depth reconstructions we draw T = 100 Monte Carlo realizations from the posterior distribution of the optimized GP for each model. We then construct a residual Faraday depth spectrum from the difference between the model realization and the noiseless Faraday challenge model. To mitigate the effect of noise correlation in Faraday depth space, we downsample these residual Faraday depth data by a factor
(37)
where ΔϕFWHM ≃ 135 rad m−2 is the FWHM of the RMTF for the Faraday challenge data and δϕ is the pixel size in Faraday depth. We use the down-sampled residual Faraday depth spectra to calculate an SMSE of the form shown in equation (35), relative to a model of 0 + 0j. The Faraday depth SMSE uses |$\sigma _{\phi } = \sigma /\sqrt{N_{\rm chan}}$| as a normalization, as described in Section 4.4, and N = 2 × Nds, where Nds is the number of data points in each of the downsampled complex parts of the residual Faraday depth spectrum. We average the resulting SMSE values across all realizations for a particular model to obtain the quantity 〈ϕ-SMSE〉 and this is listed for each model in Table   2 and the distribution visualized in Fig. 8.

By comparing the SMSE and 〈ϕ-SMSE〉 values for each model it can be seen that the goodness of fit for the GP model estimates is higher in the Stokes   Q and U data space than it is in the Faraday depth signal space. This is to be expected as the model itself is optimized against the noisy Stokes data. Fig. 8 also indicates models that are deemed to be outliers based on the interquartile range of the data. These two outliers are Models 1 and 14, which are low outliers for the data (QU) space SMSE distribution and high outliers for the signal (ϕ) space 〈ϕ-SMSE〉 distribution, indicating that the GP model estimates are overfitting the noisy QU data in these cases, resulting in larger Faraday depth residuals.

The calculation of 〈ϕ-SMSE〉 performed here is possible due to the nature of the GP model, which provides a full predictive posterior distribution for each estimate. Whilst equivalent values are not available for the other methods profiled in the Faraday challenge of Sun et al. (2015), we are able to visualize the equivalent residual Faraday depth spectra. For example, for Model 1, the average difference between the Faraday depth, ϕ1 (see Table 2), of the noiseless model and the fitted value of this parameter across all of the methods in the Sun et al. (2015) paper is Δϕ1 = 1.57 rad m−2 (see table 3 in that work). In Fig. 9 we show the residual Faraday depth spectra for the difference between the true noiseless model and a model with a value of ϕ1 that is incorrect by this amount. We also include residuals for Δϕ1 = 0.60 rad m−2 and Δϕ1 = 3.60 rad m−2, which were the smallest and largest differences returned by the methods evaluated against Model 1 in the Sun et al. (2015) comparison. It can be seen from this figure that the amplitude of the residuals from the GP model are equivalent to those of the best fitting model from the Sun et al. (2015) Faraday challenge in this case; however, it should also be noted that the nature of these residuals is different in form to those resulting from an inexact parametric model.

Upper: Residual Faraday depth spectra for Model 1 of the Faraday challenge, constructed using the transform of the difference between the noiseless model and a model where the fitted Faraday depth, ϕ1, differs from the input model by (i) an average deviation of Δϕ1 = 1.57 rad m−2 (grey line indicates absolute polarization, with real and imaginary components shown in cyan), (ii) a minimum deviation of Δϕ1 = 0.60 rad m−2 (blue dashed line), and (iii) a maximum deviation of Δϕ1 = 3.60 rad m−2 (magenta dashed line). Lower: Residual Faraday depth spectrum for Model 1 of the Faraday challenge, constructed using the transform of the difference between the noiseless model and the posterior mean of the optimized GP model. The ±5 σϕ limits appropriate to SNRint = 32 for the Faraday challenge are shown as a light-grey shaded area.
Figure 9.

Upper: Residual Faraday depth spectra for Model 1 of the Faraday challenge, constructed using the transform of the difference between the noiseless model and a model where the fitted Faraday depth, ϕ1, differs from the input model by (i) an average deviation of Δϕ1 = 1.57 rad m−2 (grey line indicates absolute polarization, with real and imaginary components shown in cyan), (ii) a minimum deviation of Δϕ1 = 0.60 rad m−2 (blue dashed line), and (iii) a maximum deviation of Δϕ1 = 3.60 rad m−2 (magenta dashed line). Lower: Residual Faraday depth spectrum for Model 1 of the Faraday challenge, constructed using the transform of the difference between the noiseless model and the posterior mean of the optimized GP model. The ±5 σϕ limits appropriate to SNRint = 32 for the Faraday challenge are shown as a light-grey shaded area.

6.3 Alternative kernels

For a given regression problem, there are an infinite number of models that can be used. As such, it is usually not feasible to test all possible models. In this work so far we have used the GP kernel defined in equation (9); here we consider the use of alternative kernels. We denote our original kernel as ‘3-HP’ and we compare it to (i) the quasi-periodic kernel used by Foreman-Mackey et al. (2017) to model stellar variability, and (ii) a hybrid kernel including a second squared-exponential term.

In Foreman-Mackey et al. (2017), the period of rotation of a star was determined by extracting the optimized P hyperparameter, analogous to our use of P to estimate the rotation measure of a Faraday thin source. The stellar variability kernel from Foreman-Mackey et al. (2017) is also built using the celerite library and has the form,
(38)
where A, B, C, and P are hyperparameters. We denote this kernel as ‘4-HP’.
The hybrid kernel, denoted here as ‘5-HP’, has the form,
(39)
where h1, c1, P, h2, and c2 are hyperparameters. Both the 4-HP and 5-HP kernels have an extra term which is intended to model smoothly varying non-periodic behaviour such as that which might arise from Faraday thick components along the line of sight.

Although different methods of evaluating competing models have been developed, the computational complexity involved in applying them limits their applicability. To overcome this limitation, easier-to-use approximate methods are commonly used to compare the performance of two or more models. These include the Bayesian Information Criterion (BIC), the Akaike Information Criterion (AIC) and the corrected Akaike Information Criterion (AICc). These information criteria are designed to evaluate the amount of information lost by a particular model. The model that loses the least information, i.e. has the lowest valued information criterion, is considered to be the preferred model.

Burnham & Anderson (2004) suggest that the BIC is a powerful model selection tool when the ‘true’ model is in the set of test models; however, in the case of empirical data where the true model is unlikely to be known a priori the AIC is considered to be a more appropriate criterion. Furthermore, the AICc places a stronger complexity penalty on models used with finite data series than the original AIC. In the case of small sample sizes it is considered to be more appropriate and accurate for finding the preferred model. However, Kass & Raftery (1995) suggest that the AIC should only be used in situations where the prior distribution is similar to that of the likelihood and that when this isn’t the case and the prior is significantly less informative than the likelihood then the BIC will favour simpler models than the AIC.

For each sample data set we optimize the hyperparameters for each of the three kernels using the method described in Section 4.3 and use these values to compute the Bayesian Information Criteria (BIC; Schwarz 1978),
(40)
the AIC (Akaike 1974),
(41)
and the AICc (Hurvich & Tsai 1989),
(42)
where log L* is given by equation (36), K is the number of hyperparameters, and N = 600 is the number of data points.

We calculate the median value and inter-quartile range for each of the three information criteria from ten different noise realizations with SNRint = 32. The goodness of fit statistics for the Faraday challenge models using the two alternative kernels are |$\chi ^2_{\rm r} = 0.98$| for the 4-HP model, and |$\chi ^2_{\rm r} = 0.97$| for the 5-HP model.

The median values of the information criteria suggest a preference for the HP-4 kernel in the case of single Faraday thin component models, a preference for the HP-5 kernel in the case of double Faraday thin models, and a preference for the HP-3 kernel in the case of Faraday thick models. However, we note that these are modal preferences and no model type shows a unanimous preference for a particular kernel. Indeed, although ranking the three kernels using the median value of the information criteria for each data set may nominally show a preference for one kernel above others, the overlapping inter-quartile ranges for most models suggest that there is in fact no clear preference. The only model where one kernel is significantly preferred is Model 1, where the inter-quartile range of the HP-4 kernel is outside the inter-quartile range of both other kernels, see Fig. 10. We also see no difference in the median ranking across the three information criteria, which is perhaps unsurprising given that NK in all cases. Furthermore, as noted by Bollen et al. (2014), information criteria comparisons are appropriate only for data sets with the same number of measurement samples, and we emphasize that the kernel comparison presented here is for the specifications of the test sets in the Sun et al. (2015) Faraday challenge.

Box-and-whisker plots for BIC values derived from each of the 3 kernels using 10 noise realizations of Model 1 from Sun et al. (2015). The boxes show the first, the second (median), and the third quartile, and the ends of the whiskers show the minimum and maximum values.
Figure 10.

Box-and-whisker plots for BIC values derived from each of the 3 kernels using 10 noise realizations of Model 1 from Sun et al. (2015). The boxes show the first, the second (median), and the third quartile, and the ends of the whiskers show the minimum and maximum values.

7 CONCLUSIONS

In this work we suggest that GPM may be a flexible pre-processing approach for the improved analysis of Faraday depth spectra as it does not rely on an exact parametrized physical model, or set of models, to be known a priori and is therefore suitable for situations where some prior knowledge of the form of the signal is understood, but the exact line of sight structure is imperfectly known. This is often the case for Faraday depth analysis. With regard to this, we have shown that even simple GPM covariance kernels can be used to impute data reliably for complex lines of sight.

We have demonstrated the applicability of GPM for improved recovery of structure in Faraday depth spectra. We have shown that these improvements include the reduction of sidelobe contamination in Faraday depth spectra through imputation of missing data when RFI flagging causes significant losses, even in the case where significant portions of the original data set are lost. We have also illustrated the benefits of using GPM to re-sample complex polarization data on to regular spacings in λ2-space as an alternative to currently used re-gridding methods. We have shown that these applications in combination can provide an improved representation of Faraday thick structure in Faraday depth spectra.

We have shown that under certain circumstances the hyperparameters of the periodic covariance kernel can be used to recover rotation measure values for Faraday thin sources and that this hyperparameter can also be physically meaningful in a more limited way for complex lines of sight. We have presented a method to calculate the range of RM over which this method is applicable for different telescopes and/or data sets and we have demonstrated that within this range RM recovery from GP hyperparameters is robust to typical detection thresholds.

We have discussed limitations in the use of GPM reconstructed Faraday depth spectra, highlighted situations in which GPM is not an appropriate solution and presented a quantitative method for determining a threshold above which Faraday structure can be safely considered significant when reconstructed using an optimized GP fitted to noisy data.

We have evaluated the GP reconstruction proposed here against other methods in the literature and have shown that it has a data space performance equivalent to other QU-fitting methods and furthermore that the variance in results from the GP approach is smaller in the Stokes data space than seen from those methods. Using the same test data we demonstrate that even complex Faraday structure can be recovered using a comparatively simple covariance kernel.

ACKNOWLEDGEMENTS

We thank the anonymous referee for their careful reading of the paper and their helpful comments, which improved the content of this paper. The authors also thank Torsten Enßlin for useful comments on an earlier draft of this paper. This work makes extensive use of the celerite library.3 SWN acknowledges support from the UK Newton Fund via the Science & Technology Facilities Council (STFC) through the Development in Africa with Radio Astronomy (DARA) Big Data project ST/R001898/1. AMS gratefully acknowledges support from an Alan Turing Institute AI fellowship EP/V030302/1. JH acknowledges support from the UK Science & Technology Facilities Council (STFC). MC acknowledges support from the National Agency for Research and Development (ANID) Scholarship Program DOCTORADO BECAS CHILE under grant 2018-72190574.

DATA AVAILABILITY

All code from this work is publicly available on Github at the following address: https://github.com/as595/GPM-for-Faraday-Depth-Spectra

Footnotes

1

In statistics, imputation is the process of replacing missing data points with substituted values such as model estimates.

REFERENCES

Aigrain
 
S.
,
Parviainen
 
H.
,
Pope
 
B. J. S.
,
2016
,
MNRAS
,
459
,
2408
 

Akaike
 
H.
,
1974
,
IEEE Trans. Autom. Control
,
19
,
716
 

Angus
 
R.
,
Morton
 
T.
,
Aigrain
 
S.
,
Foreman-Mackey
 
D.
,
Rajpaul
 
V.
,
2018
,
MNRAS
,
474
,
2094
 

Arras
 
P.
,
Perley
 
R. A.
,
Bester
 
H. L.
,
Leike
 
R.
,
Smirnov
 
O.
,
Westermann
 
R.
,
Enßlin
 
T. A.
,
2021
,
A&A
,
646
,
A84
 

Barclay
 
T.
,
Endl
 
M.
,
Huber
 
D.
,
Foreman-Mackey
 
D.
,
Cochran
 
W. D.
,
MacQueen
 
P. J.
,
Rowe
 
J. F.
,
Quintana
 
E. V.
,
2015
,
ApJ
,
800
,
46
 

Bollen
 
K. A.
,
Harden
 
J. J.
,
Ray
 
S.
,
Zavisca
 
J.
,
2014
,
Struct. Equ. Modeling
,
21
,
1
 

Bond
 
J. R.
,
Efstathiou
 
G.
,
1987
,
MNRAS
,
226
,
655
 

Brandenburg
 
A.
,
Stepanov
 
R.
,
2014
,
ApJ
,
786
,
91
 

Brentjens
 
M. A.
,
de Bruyn
 
A. G.
,
2005
,
A&A
,
441
,
1217
 

Burn
 
B. J.
,
1966
,
MNRAS
,
133
,
67
 

Burnham
 
K. P.
,
Anderson
 
D. R.
,
2004
,
Sociol. Methods Res.
,
33
,
261
 

Czekala
 
I.
,
Andrews
 
S. M.
,
Mandel
 
K. S.
,
Hogg
 
D. W.
,
Green
 
G. M.
,
2015
,
ApJ
,
812
,
128
 

Dewdney
 
P. E.
,
Hall
 
P. J.
,
Schilizzi
 
R. T.
,
Lazio
 
T. J. L. W.
,
2009
,
Proc. IEEE
,
97
,
1482
 

Evans
 
T. M.
,
Aigrain
 
S.
,
Gibson
 
N.
,
Barstow
 
J. K.
,
Amundsen
 
D. S.
,
Tremblin
 
P.
,
Mourier
 
P.
,
2015
,
MNRAS
,
451
,
680
 

Farnsworth
 
D.
,
Rudnick
 
L.
,
Brown
 
S.
,
2011
,
AJ
,
141
,
191
 

Foreman-Mackey
 
D.
,
Hogg
 
D. W.
,
Lang
 
D.
,
Goodman
 
J.
,
2013
,
PASP
,
125
,
306
 

Foreman-Mackey
 
D.
,
Agol
 
E.
,
Angus
 
R.
,
Ambikasaran
 
S.
,
2017
,
AJ
,
154
,
220
 

George
 
S. J.
,
Stil
 
J. M.
,
Keller
 
B. W.
,
2012
,
Publ. Astron. Soc. Aust.
,
29
,
214
 

Gibson
 
N. P.
,
Aigrain
 
S.
,
Roberts
 
S.
,
Evans
 
T. M.
,
Osborne
 
M.
,
Pont
 
F.
,
2012
,
MNRAS
,
419
,
2683
 

Haywood
 
R. D.
 et al. ,
2014
,
MNRAS
,
443
,
2517
 

Heald
 
G.
,
2009
, in
Strassmeier
 
K. G.
,
Kosovichev
 
A. G.
,
Beckman
 
J. E.
, eds,
IAU Symp. Vol. 259, Cosmic Magnetic Fields: From Planets, to Stars and Galaxies
. Kluwer, Dordrecht, p.
591
 

Hurvich
 
C. M.
,
Tsai
 
C.-L.
,
1989
,
Biometrika
,
76
,
297
 

Ideguchi
 
S.
,
Tashiro
 
Y.
,
Akahori
 
T.
,
Takahashi
 
K.
,
Ryu
 
D.
,
2014
,
ApJ
,
792
,
51
 

Jonas
 
J. L.
,
2009
,
IEEE Proc.
,
97
,
1522
 

Jonas
 
J. L.
,
2018
,
PoS
,
MeerKAT2016
,
001

Junklewitz
 
H.
,
Bell
 
M. R.
,
Selig
 
M.
,
Enßlin
 
T. A.
,
2016
,
A&A
,
586
,
A76
 

Kass
 
R. E.
,
Raftery
 
A. E.
,
1995
,
J. Am. Stat. Assoc.
,
90
,
773

Law
 
C. J.
 et al. ,
2011
,
ApJ
,
728
,
57
 

Leike
 
R. H.
,
Enßlin
 
T. A.
,
2019
,
A&A
,
631
,
A32
 

Littlefair
 
S. P.
,
Burningham
 
B.
,
Helling
 
C.
,
2017
,
MNRAS
,
466
,
4250
 

McAllister
 
M. J.
 et al. ,
2016
,
MNRAS
,
464
,
1353
 

Macquart
 
J. P.
,
Ekers
 
R. D.
,
Feain
 
I.
,
Johnston-Hollitt
 
M.
,
2012
,
ApJ
,
750
,
139
 

Mauch
 
T.
 et al. ,
2020
,
ApJ
,
888
,
61
 

Mertens
 
F. G.
,
Ghosh
 
A.
,
Koopmans
 
L. V. E.
,
2018
,
MNRAS
,
478
,
3640
 

O’Sullivan
 
S. P.
 et al. ,
2012
,
MNRAS
,
421
,
3300
 

Pratley
 
L.
,
Johnston-Hollitt
 
M.
,
2020
,
ApJ
,
894
,
38
 

Pratley
 
L.
,
Johnston-Hollitt
 
M.
,
Gaensler
 
B. M.
,
2020
,
preprint (arXiv:2010.07932)

Rajpaul
 
V.
,
Aigrain
 
S.
,
Osborne
 
M. A.
,
Reece
 
S.
,
Roberts
 
S.
,
2015
,
MNRAS
,
452
,
2269
 

Rajpaul
 
V.
,
Aigrain
 
S.
,
Roberts
 
S.
,
2016
,
MNRAS
,
456
,
L6
 

Rasmussen
 
C.
,
Williams
 
C.
,
2006
,
Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning
.
MIT Press
,
Cambridge, MA, USA

Roberts
 
S.
,
Osborne
 
M.
,
Ebden
 
M.
,
Reece
 
S.
,
Gibson
 
N.
,
Aigrain
 
S.
,
2013
,
Phil. Trans. R. Soc. A
,
371
,
20110550
 

Schwarz
 
G.
,
1978
,
Ann. Stat.
,
6
,
461

Seaman
 
S.
,
Galati
 
J.
,
Jackson
 
D.
,
Carlin
 
J.
,
2013
,
Statistical Science
,
28
,
257

Stil
 
J. M.
,
Keller
 
B. W.
,
George
 
S. J.
,
Taylor
 
A. R.
,
2014
,
ApJ
,
787
,
99
 

Sun
 
X. H.
 et al. ,
2015
,
AJ
,
149
,
60
 

Sun
 
S.
,
Zhang
 
G.
,
Wang
 
C.
,
Zeng
 
W.
,
Li
 
J.
,
Grosse
 
R.
,
2018
,
PMLR
,
80
,
4828

Taylor
 
A. R.
,
Jarvis
 
M.
,
2017
,
IOP Conf. Ser.: Mater. Sci. Eng.
,
198
,
012014
 

Van Eck
 
C. L.
 et al. ,
2018
,
A&A
,
613
,
A58
 

Wandelt
 
B. D.
,
Hansen
 
F. K.
,
2003
,
Phys. Rev. D
,
67
,
023001
 

Way
 
M. J.
,
Foster
 
L. V.
,
Gazis
 
P. R.
,
Srivastava
 
A. N.
,
2009
,
ApJ
,
706
,
623
 

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://dbpia.nl.go.kr/journals/pages/open_access/funder_policies/chorus/standard_publication_model)