-
PDF
- Split View
-
Views
-
Cite
Cite
S W Ndiritu, A M M Scaife, D L Tabb, M Cárcamo, J Hanson, Gaussian process modelling for improved resolution in Faraday depth reconstruction, Monthly Notices of the Royal Astronomical Society, Volume 502, Issue 4, April 2021, Pages 5839–5853, https://doi.org/10.1093/mnras/stab379
- Share Icon Share
ABSTRACT
The incomplete sampling of data in complex polarization measurements from radio telescopes negatively affects both the rotation measure (RM) transfer function and the Faraday depth spectra derived from these data. Such gaps in polarization data are mostly caused by flagging of radio frequency interference and their effects worsen as the percentage of missing data increases. In this paper we present a novel method for inferring missing polarization data based on Gaussian processes (GPs). GPs are stochastic processes that enable us to encode prior knowledge in our models. They also provide a comprehensive way of incorporating and quantifying uncertainties in regression modelling. In addition to providing non-parametric model estimates for missing values, we also demonstrate that GP modelling can be used for recovering rotation measure values directly from complex polarization data, and that inferring missing polarization data using this probabilistic method improves the resolution of reconstructed Faraday depth spectra.
1 INTRODUCTION
Although conceptually simple, in practical terms the implementation of this method and the interpretation of the resulting Faraday dispersion function are complicated by a number of factors. The first of these complications is that the Stokes pseudo-vectors Q and U are not sampled natively in wavelength-squared but linearly in frequency. A further complication is that equation (1) does not represent a true Fourier relationship as P(λ2) does not exist at λ2 < 0. Since F(ϕ) is not necessarily a purely real quantity this represents a fundamental limitation in attempting to reconstruct an unknown Faraday dispersion function from measured values of P(λ2). An additional limitation comes from the finite bandwidth and hence range of wavelength-squared, Δ(λ2), measured by a radio telescope receiver, as well as the potentially incomplete sampling over this bandwidth due to various effects [radio frequency interference (RFI), instrumental problems, etc.] that require flagging (excision) of specific frequency channel data during an observation.
This can cause statistical ambiguity in the results of such a deconvolution process, as Faraday depth is inherently complex-valued and therefore not subject to the same conjugate symmetry as, for example, interferometric imaging. It has also been shown that certain Faraday structures will themselves not have conjugate symmetry (Brandenburg & Stepanov 2014). Consequently, underconstrained deconvolution processes are liable to introduce spurious non-physical structures, whilst also being unable to reconstruct all true physical structures (Macquart et al. 2012; Pratley, Johnston-Hollitt & Gaensler 2020).
These issues are of particular importance when complex or compound Faraday structures are seen along a single line of sight. In a blind data challenge comparing different methods for extracting the parameters of Faraday structures, Sun et al. (2015) noted that RM synthesis was highly prone to missing composite Faraday structures and that the ∼25 per cent of extragalactic sources found observationally by Law et al. (2011) to be Faraday complex was likely an underestimate. Sun et al. (2015) concluded that when Faraday complexity is present, only QU-fitting methods will produce results that do not introduce extraneous scatter in recovered RM values. Such QU-fitting methods, e.g. Farnsworth, Rudnick & Brown (2011), O’Sullivan et al. (2012), and Ideguchi et al. (2014), model the observed polarization, |$\tilde{P}(\lambda ^2)$|, directly rather than decomposing Faraday depth spectra to recover parameters. However, whilst QU-fitting methods can be more reliable for recovering composite structures, in particular those including Faraday thick components, in general they rely on physically parametrized models of the synchrotron-emitting and Faraday rotating media along the line of sight to be known a priori. One recent exception to this is the deconvolution method proposed by Pratley et al. (2020), which uses a constrained clean-style non-parametric QU-fitting approach that combines QU-fitting with deconvolution.
In this work we explore the potential for using a Gaussian process modelling (GPM) approach to improve both the interpretation of Faraday depth spectra and the recovery of physical parameters from polarization measurements. In its simplest form, a GPM approach can be considered part of the family of QU-fitting methods; however, it has certain advantages over physically parametrized QU-fitting methods, which include both a robustness to using non-exact models for imperfectly known data sets and an ability to provide predictive models for imputation including posterior variances. The contents of this paper are as follows, in Section 3 we describe the underlying GPM methodology including a description of the covariance kernel function employed and the optimization of hyperparameters; in Section 4 we illustrate the impact of GP optimization and re-sampling of polarization data on their resulting Faraday depth spectra in the presence of significant RFI flagging, and demonstrate how the rotation measures for Faraday thin data can be recovered from the hyperparameters of the GP; in Section 5 we demonstrate the method using a simulated source population with observational specifications matching that of the MeerKAT radio telescope; in Section 6 we discuss the implications of these results, compare the method to previous work in the literature and consider the use of more complex covariance kernels. In Section 7 we draw our conclusions.
2 GAUSSIAN PROCESS MODELLING FOR FARADAY DEPTH RECONSTRUCTION
Gaussian processes (GPs) are probabilistic non-parametric models that can be used for a variety of regression and classification problems. GPM can be used to describe a given data set by assuming a priori that the data themselves represent a realization of a stochastic process described by a multivariate Gaussian distribution. Such models are non-parametric as the GP is a prior on the data itself rather than on a parametric model, i.e. the complexity of the model grows with the inclusion of additional data points. GPMs are considered to be an attractive solution when modelling systems where the degree of complexity required by a parametric model is not necessarily supported by the amount of information in the observations. Overly complex parametric models are prone to overfitting and can lead to misinterpretation of the data. In contrast, Gaussian processes are flexible probabilistic regression models that do not require exact knowledge of the underlying physical model and can also provide posterior uncertainties that naturally reflect high uncertainty in regions of data space where there are few observational constraints.
In astrophysics, Gaussian processes have been used in various applications including in the study of photometric redshifts (Way et al. 2009), variability in stellar light curves (Angus et al. 2018), galactic dust emission (Leike & Enßlin 2019), eclipses in cataclysmic variable stars (McAllister et al. 2016), transmission spectroscopy of extrasolar planets (Gibson et al. 2012), and radio velocity studies for planetary candidates (Rajpaul et al. 2015). They have also been used extensively in cosmic microwave background analysis (Bond & Efstathiou 1987; Wandelt & Hansen 2003), as well as for modelling instrumental systematics and calibration, e.g. Gibson et al. (2012), Haywood et al. (2014), Barclay et al. (2015), Czekala et al. (2015), Evans et al. (2015), Rajpaul et al. (2015), Aigrain, Parviainen & Pope (2016), Rajpaul, Aigrain & Roberts (2016), Littlefair, Burningham & Helling (2017), Mertens, Ghosh & Koopmans (2018) and interferometric image reconstruction (Junklewitz et al. 2016).
2.1 GPM for imputation of missing polarization data
The transfer function described in equation (3) is a function of the observational sampling in λ2-space. In practice this sampling is affected not only by the characteristics of the telescope’s receiver system but also by the necessary removal of frequency channels affected by RFI. Radio telescopes operating at or close to L band (1.4 GHz) typically experience data flagging at a level of 20–50 per cent in bandwidth due to a wide variety of both terrestrial and satellite RFI sources (e.g. Mauch et al. 2020). As well as negatively impacting the sensitivity of an observation, this flagging can also result in large contiguous gaps in the frequency coverage that have a detrimental effect on the form of the RMTF.
Contingent on there being sufficient information in the remaining data, GPM may provide a potential method for imputing1 these missing data subject to only weak assumptions about the form of the data and without the need for a parametrized physical model. Furthermore, GPM not only provides a model estimate of the value for each missing data point but also an uncertainty on that value, allowing astronomers to set thresholds on the degree of reliability they are prepared to accept when imputing the values of missing data points.
From the perspective of imputation in a more general sense, this kind of application addressing RFI excision can be considered as an example of recovering data missing completely at random (MCAR) and/or missing at random (MAR; e.g. Seaman et al. 2013), depending on the nature of the RFI, i.e. it is being used to repair deficits from local interruptions in the data set rather than recovering information from a signal at or below the reliable cusp of detection.
2.2 GPM for resampling of irregular data
In addition to the imputation of missing data values, GPM can also provide a method of re-sampling observational data on to a uniform spacing in λ2-space. Since data are natively taken in channels uniformly spaced in frequency, some previous practical implementations of RM Synthesis have employed an initial processing step to re-bin spectral data, |$Q(\nu)~\&~U(\nu)$|, into regularly spaced arrays of λ2 in order to use a fast Fourier transform (FFT) transformation into Faraday depth space, as implemented in for example pyrmsynth 2 package (described in Pratley & Johnston-Hollitt 2020) that was used to perform RM synthesis for the LOFAR Two-metre Sky Survey (Van Eck et al. 2018). Like RFI flagging, the nonlinear relationship between frequency and wavelength-squared results in a distribution of measured data points that are often separated by large contiguous gaps in λ2-space, resulting in high sidelobes in the RMTF. Furthermore, GPM can not only be used to re-sample data within the observed bandwidth but also to infer data beyond the boundaries of that bandwidth. Without additional data, these inferences will be subject to rapidly increasing degrees of uncertainty but may still be used over moderate distances in frequency space to improve the resolution of a measurement in Faraday depth. In practice, re-sampling to uniformly spaced λ2-positions and imputing the values of data that are missing due to RFI flagging can be done simultaneously.
3 METHOD
For polarization data measured at regular intervals in frequency (spectral channels) and hence irregular intervals in λ2, the posterior mean can be used to infer the values of the polarization data at regular intervals, |$\lambda _{\ast }^2$|, avoiding the need for explicit re-binning of the data, and the posterior variance can be used to define the weighting function |$W(\lambda _{\ast }^2)$| at those positions and hence calculate the RMTF.
3.1 Kernel definition
3.2 Covariance kernel definition
For the application to radio polarization data we use a compound covariance kernel. The contributions to this kernel comprise covariance arising both from the expected signal and from the properties of the measurement data. Faraday rotation results in periodicity of the complex polarization signal but the presence of additional Faraday components, including thick structures, can cause deviations from exact periodicity in the data. To represent this behaviour we use a quasi-periodic kernel formed from the multiplication of an exponential kernel and a periodic kernel.
We implement our kernel using the celerite Gaussian processing library (Foreman-Mackey et al. 2017). Traditional GPM scales as |$\mathcal {O}(N^3)$|. This is due to the need to invert the covariance matrix and often makes GPM computationally intractable for large data sets. In contrast, the celerite GP implementation uses a restricted set of kernels that allow for fast separable evaluation with a linear complexity of |$\mathcal {O}(N)$|.
3.3 Hyperparameter estimation
4 APPLICATION TO SIMULATED DATA
4.1 Example scenarios
We illustrate the use of the GPM method on two examples: (1) a simple Faraday Thin scenario, and (2) a more complex scenario involving both Faraday Thin and Faraday Thick components. This second scenario corresponds to the example geometry outlined in section 1 of Brentjens & de Bruyn (2005).
4.1.1 Scenario 1
4.1.2 Scenario 2
In both scenarios we assume that the signal is measured over a frequency bandwidth from 0.58 to 2.50 GHz in 512 evenly spaced frequency channels. We map each channel frequency to a corresponding value of λ2. For each measurement we assume that (Q, U, σQ, σU) are recorded, where Q corresponds to the real part of the polarization and U corresponds to the imaginary component.
4.2 Faraday depth spectra
The recovered Faraday depth spectra for Scenario 2 are shown in Fig. 1, which illustrates the difference in the Faraday depth spectra when polarization data are sampled uniformly in frequency compared to being sampled uniformly in λ2-space. From the figure, two key differences can be observed. First, by comparing the full width at half-maximum (FWHM) of the RMTF in each case, it can be seen that the resolution in Faraday depth is improved by a factor of ∼2. This is important for resolving different Faraday components and complex structure. Secondly, it can be seen that the main lobe of the RMTF is clearly separated from the sidelobe structure in the case of uniform λ2-sampling, and that the amplitude of the first sidelobe is ∼3 times smaller than that resulting from uniform sampling in frequency. As well as aiding visual inspection, this is important for approaches that incorporate deconvolution, as incorrect or imperfect identification of model component positions during deconvolution will result in residuals with amplitudes proportional to RMTF sidelobe levels. As a consequence of these factors, the Faraday thick structure from Scenario 2 in Fig. 1 is resolved in the resulting Faraday depth spectrum when using uniform λ2-sampling, with the positions of the edges clearly delineated. Additionally, we note that the sidelobe response of regular sampling in λ2-space could be further improved by the application of a window function prior to Fourier transforming, and that such an application could be tailored to balance the contributions from sidelobe structure, thermal noise and resolution, as needed for an individual science case.

Scenario 2: RMTF (left), Faraday depth spectrum (centre), and Faraday depth spectrum overlaid with input model (right) for the case of (i) data evenly sampled in frequency (top) and (ii) data evenly sampled in wavelength-squared (bottom). Data are noiseless in both cases.
4.3 Parameter estimation
To optimize the hyperparameters of our GP kernel we use Markov Chain Monte Carlo (MCMC) implemented via the emcee package (Foreman-Mackey et al. 2013) to explore the posterior probability distribution. We use 500 samples for a burn-in then take the maximum likelihood hyperparameter values from the burn-in and perturb them by adding noise at a level of 10−5. We then use these perturbed values to start a production run of 5000 samples. Prior ranges for each of the hyperparameters are shown in Table 1. The lower limit on log P corresponds to a maximum Faraday depth of ∼1200 rad m−2.
Range of uniform priors used for each of the hyperparameters in GP kernel, see equation (9).
Hyperparameter . | Prior . |
---|---|
ln h | (−25,5) |
ln c | (−25,5) |
ln P | (−6,10) |
ln σ | (−25,5) |
Hyperparameter . | Prior . |
---|---|
ln h | (−25,5) |
ln c | (−25,5) |
ln P | (−6,10) |
ln σ | (−25,5) |
Range of uniform priors used for each of the hyperparameters in GP kernel, see equation (9).
Hyperparameter . | Prior . |
---|---|
ln h | (−25,5) |
ln c | (−25,5) |
ln P | (−6,10) |
ln σ | (−25,5) |
Hyperparameter . | Prior . |
---|---|
ln h | (−25,5) |
ln c | (−25,5) |
ln P | (−6,10) |
ln σ | (−25,5) |
4.4 Imputation of missing data
RFI flagging creates gaps in the polarization data thereby reducing the λ2-coverage in an irregular fashion and causing the sidelobe structure in the RMTF to become more prominent.To mimic the RFI flagging process, we remove random portions of data from the simulated data set. To do this, a random position in the data set is selected and a chunk comprised of a randomly selected number of entries is removed. This process is repeated until the desired percentage of data is removed. We simulate data sets with 20, 30, and 40 per cent missing values. An example of this flagging using the Scenario 1 data set is shown in Fig. 2, which shows the input data, flagged at a level of 30 per cent. These data are scattered by white noise, giving an SNR of one in each channel.

Left upper: Stokes parameters for Scenario 1 (see Section 4.1) with SNRchan = 1 and 30 per cent of the data flagged (black & grey error bars), and maximum a posteriori GP model (equation 9; blue solid line) with one standard deviation posterior uncertainty (grey shaded area). Left lower: residuals between the posterior mean for Stokes Q and the data (black points) and residuals between the posterior mean and the missing/flagged Stokes Q data (blue points). The ±3σ limits are marked by grey dashed lines. Right: The inferred period of the process with the true period marked (vertical blue line) as well as the 16, 50, and 84 per cent confidence intervals of the posterior probability distribution (grey dashed lines).
Fig. 2 also shows the posterior mean from the GP model, optimized using only the unflagged data points, as well as the residual between the input data and the optimized GP model for both the unflagged and flagged data points. The right-hand panel of Fig. 2 shows the posterior probability distribution for the hyperparameter, P, which can be used to calculate the rotation measure as described later in this work, see Section 4.6.
The effect of the flagging process on the Faraday depth spectrum is shown for Scenario 1 in Fig. 3. In Fig. 3 (centre), data have been imputed at regular intervals across the full bandwidth of the simulation and the Faraday depth spectrum calculated using the posterior mean values, weighted by the posterior covariance at each point.

Faraday depth spectra for Scenario 1 (Faraday thin) with 20 per cent (top), 30 per cent (middle), and 40 per cent (bottom) of channel data flagged and SNRchan = 1. Left: Faraday depth spectrum from flagged data. Right: Faraday depth spectra from GPM reconstructed data as described in Section 4.4. The ±5 σϕ limits are shown as a light grey shaded area.
It is noticeable in Fig. 3 that the spectra calculated from the optimized GP model appear significantly cleaner than those calculated from the original data. This is in part because the sidelobe structure due to the gaps in frequency coverage is reduced, but also because the posterior mean values from the model are not scattered by the thermal noise in the same way as the original data. Although this may be advantageous in enhancing some features of the spectra, it can also potentially lead to overinterpretation of low-level features. We caution that structures present at amplitudes below |$5\!-\!8\, \sigma _{\phi }$| should not be considered to represent true astrophysical components but are more likely to arise from structure in the noise. For uniform spectral noise, i.e. noise per frequency channel, the value of σϕ can be calculated by taking into account the number of independent measurements in complex polarization, which is |$N_{\rm chan}^{\prime }$|, where |$N_{\rm chan}^{\prime }$| is the number of unflagged frequency channels in the data set. By analogy to the quantity P(λ2), which has σP = σ when σQ = σU = σ (Brentjens & de Bruyn 2005), the amplitude of the Faraday depth spectrum will have noise |$\sigma _{\phi } = \sigma /\sqrt{N_{\rm chan}^{\prime }}$|. The threshold indicated in Fig. 3 is shown at 5σϕ.
The right-hand column of Fig. 3 shows the Faraday depth spectrum reconstructed from the difference between the GP estimate and the noiseless model in Scenario 1. We note that this difference is not the same as the residual that would be obtained from deconvolution of Faraday depth reconstruction of the GP estimate, which depends only on the data itself, but instead describes the degree of difference between the true model structure and the estimate fitted to the noisy data. Residuals in this structure below the level of the noise threshold are consistent with the model overfitting to noise fluctuations in the Stokes Q and U data from the perspective of the model. One notable feature of this difference in the case of low SNR is the appearance of a residual peak at ϕ = −50 rad m−2 for Scenario 1. This is due to the fact that the GP model is not parametrically constrained to produce the same amplitude for both the Stokes Q and U components, but rather the amplitude will follow the pattern of the noise in the data. For low signal to noise data, this can result in a local offsets between the amplitude of Q and U inferred by the GP estimate as a function of λ2. From the perspective of the model this local offset removes flux from the true peak and moves it to the mirrored peak, resulting in the double structure seen in the residual Faraday depth spectrum. However, once again we note that from the perspective of the data, this is a consequence of the noise realization, which is why it is reflected in the GP model estimate.
As illustrated by Foreman-Mackey et al. (2017), it is often the case that valid inferences can be made using incorrect GP models. For physical processes whose underlying properties are not well understood, such as generalized Faraday spectra, this can significantly complicate the process of selecting a parametric model. Incorporating domain knowledge helps to make the problem of model selection and interpretation of parameters less difficult. We demonstrate this by using the GP kernel from Scenario 1 to impute missing data for simulations of Scenario 2. An example of this is shown in Fig. 4, which shows a simulated data set where 20 per cent of the data points have been removed to mimic RFI flagging. In spite of not being an exact model, the optimized GP is able to reconstruct the missing data within the measurement uncertainties. It can also be seen that the optimized model is able to recover the rotation measure of the radio galaxy with high accuracy.

Left upper: Stokes parameters for Scenario 2 (see Section 4.1; black & grey error bars), and maximum a posteriori GP model (equation 9; blue solid line) with one standard deviation posterior uncertainty (grey shaded area). Left lower: residuals between the posterior mean for Stokes Q and the data (black points) and residuals between the posterior mean and the missing/flagged Stokes Q data (blue points). The ±3σ limits are marked by grey dashed lines. Right: The inferred period of the process with the true period marked (vertical blue line) as well as the 16, 50, and 84 per cent confidence intervals of the posterior probability distribution (grey dashed lines).
In this case, recovering the Faraday depth of the thin component is not as strong an indicator of performance as in Scenario 1. Instead we compare the posterior mean of the optimized GP to the input model without noise and the residuals for this comparison are shown in the lower left-hand panel of Fig. 4. Fig. 5 shows the Faraday depth spectrum reconstructed from the optimized GP for Scenario 2 in the presence of different levels of RFI flagging. These should be compared to the noiseless model spectrum shown in Fig. 1.

Faraday depth spectra for Scenario 2 (Faraday complex) with 20 per cent (top), 30 per cent (middle), and 40 per cent (bottom) of channel data flagged and SNRchan = 5 at a reference frequency of 1.4 GHz. Left: Faraday depth spectrum from flagged data. Centre: Faraday depth spectra reconstructed from the GP estimate as described in Section 4.4. Right: Faraday depth residuals formed by subtracting the noiseless model from the GP estimate. The ±5 σϕ limits are shown as a light-grey shaded area.
4.5 Multiple sources along the line of sight
As described in Sun et al. (2015), it is expected that many lines of sight will have more than one Faraday thin component. Such Faraday geometries are also able to be recovered by the kernel in equation (9). In this case the hyperparameter P will represent the |RM| of the component with the highest rotation. An example of where this can occur is the superposition of radio galaxy jets or lobes oriented along a line of sight unresolved by the beam of a telescope. Such geometries result in double Faraday components, typically with similar Faraday depths as the integrated rotation differs only by the degree contributed in the local medium between the jets whilst experiencing a common degree of rotation from the comparatively larger path between the galaxy and the observer. An example of such a situation is shown in Fig. 6. We illustrate the GP model in this figure at high signal to noise, SNRchan = 20, to demonstrate that the same GP kernel from equation (9) can accurately represent the complex polarization behaviour of multiple sources without requiring additional kernel components. The Faraday depth spectra in the lower panels of Fig. 6 show the recovered Faraday depth spectrum from the noisy input data (left) and from the optimized GP reconstruction (right). It can be observed that the double structure is more apparent in the GP reconstruction. We note that the separation of the peaks in Faraday depth is wider than the separation of components in the model. This is not an artefact of the GP reconstruction, but due to the position of the first null in the RMTF.

Upper: Stokes parameters for line of sight comprised of two Faraday thin sources with SNRchan = 20 (black & grey error bars), and maximum a posteriori GP model (equation 9; blue solid line) with one standard deviation posterior uncertainty (grey shaded area). Left lower: Faraday depth spectrum calculated from the noisy input data sampled at regular intervals in frequency. Right lower: Faraday depth spectrum calculated from the optimized GPM sampled at regular intervals in λ2-space with the model components marked as solid black lines.
4.6 Rotation measure extraction
Using Scenario 1, we evaluate the accuracy of RM recovery for the Faraday thin source as a function of the SNR of the integrated polarization data, SNRint. It can be seen in Fig. 7 that for Scenario 1, the rotation measure is recovered with <1σ deviation from the true value for even low values of SNRint < 10 where the SNR on an individual frequency channel, SNRchan ≪ 1. A minimum value of SNRint = 8 is typically used to identify sources in total polarization images, see e.g. George, Stil & Keller (2012), and this value is marked in Fig. 7. It can be seen that the optimized RM value is in good agreement with both the expectation value and the true value of the RM above this limit.

Recovered rotation measures (RM) as a function of the signal-to-noise ratio (SNR) in the integrated polarization data for Scenario 1 are shown for a range of different ϕgal values. The difference between the expectation of the RM from the GP and the input to the Scenario 1 model are shown in each case with uncertainties calculated using the 68 per cent confidence interval on the optimized value of log P, which results in asymmetric uncertainties on the RM values. The point at which the SNR in an individual frequency channel is equal to one is shown as a vertical dashed line (blue). The distribution of equivalent expectation values for ten different noise realizations are shown as light-grey lines.
We emphasize that this limit is only applicable to the calculation of the rotation measure from the hyperparameters and does not affect the improvement in the Faraday depth spectrum from GPM re-sampling or imputation, as shown in Fig. 1, where structure at low Faraday depths is recovered equivalently well.
5 PERFORMANCE CONSIDERATIONS FOR SURVEYS
We now extend the application to simulated data to a large data set, such as might be recovered from a MeerKAT observation of a survey field. We use a simulated data set containing Stokes Q and Stokes U data for 10 000 Faraday thin sources. Observational parameters are matched to those of the MeerKAT telescope, one of the precursor arrays of the Square Kilometre Array (SKA; Dewdney et al. 2009; Jonas 2009). The MeerKAT consists of 64 13.5 m parabolic dishes sited in the Northern Cape province of South Africa (Jonas 2016; Taylor & Jarvis 2017).
A population of 10 000 Faraday thin sources was created with a Stokes I flux density at 1.4 GHz taken from a uniform distribution with values between 10 μJy and 30 mJy, and an RM taken from a uniform distribution between −100 and +100 rad m−2. Stokes Q and U values were generated for 320 channels spaced uniformly in frequency from 0.88 to 1.68 GHz. The intrinsic angle of polarization for each source, χ0, was sampled randomly from a uniform distribution [−π, +π). For each frequency channel, noise was added to the Stokes Q and U values using a normal distribution with a mean of zero and a variance of 180 μ Jy beam−1, equivalent to an integrated noise level of ∼10 μJy beam−1. These specifications are representative of the observational parameters for the MeerKAT MIGHTEE survey (Taylor & Jarvis 2017).
We randomly select 500 sources from this population with an SNRint ≥ 8 and |RM| > ϕlim and use an optimized GP with the kernel specified in equation (9) to recover the rotation measure for each object. We use a prior ranges as specified in Table 1. Uncertainties on the recovered RMs are calculated from the 68 per cent confidence interval on the optimized value of log P, which results in asymmetric uncertainties on the RM values. For 500 randomly selected samples under these constraints all of the recovered RMs lie within 5σ of the true value and |$\gt 99{{\ \rm per\ cent}}$| within 3σ. The average run time per object is 43 seconds on a Macbook Pro with a 3.5 GHz Intel Core i7 processor.
6 DISCUSSION
6.1 Using GP reconstructed Faraday depth spectra
Examples of Faraday depth spectra reconstructed from the optimized GP for Scenario 1 and Scenario 2 are shown in Figs 3 and 5. The reconstructed Faraday depth spectra in each case (right hand column of both figures) represent the structure in the quasi-periodic component of the covariance kernel only. Although the white noise component is taken into account during the optimization, it is not appropriate to scatter the predicted values by this distribution as the white component represents the measurement noise on the original data only. As mentioned already in Section 4.4, in order to avoid overinterpretation of the reconstructed Faraday depth spectrum, the significance of features below the equivalent detection threshold in Faraday depth space should be carefully considered.
If the optimized GPM was used to infer the posterior mean only at the positions of the input data, the model could be subtracted from the data at those positions to estimate a residual, as shown in the lower panel of Figs 2 and 4. This residual could then be transformed separately to the model and combined with the model reconstruction in a manner similar to that employed by the final step of the Cotton-Schwab CLEAN algorithm in synthesis imaging. We note that a kernel-based GPM approach to two-dimensional image synthesis and deconvolution in radio astronomy is likely to be prohibitive in its computational complexity, as the fast separable kernels employed here are only appropriate for one-dimensional data. However, kernel-free methods such as those described by Arras et al. (2020) can achieve similar scaling.
6.1.1 Recovery of Faraday thick structure
As can be seen in Fig. 5, using an optimized GP to reconstruct the Faraday depth spectrum in the case where Faraday thick structure is present can reveal these structures without the need for assuming any specific parametric model of Faraday depth a priori. Although it is not possible to recover the parametrized geometry of the thick structure from the hyperparameters of the GP in this case, the shape is far more clearly delineated in the GP-reconstructed Faraday depth spectrum than in the spectrum calculated from the original data. This remains the case even in the presence of high degrees of RFI excision.
In principle, GPs might also be used to infer the complex polarization signal at smaller values of λ2 than are present in the original data set, which could be beneficial for recovering extended structure in Faraday depth space. When used to infer the behaviour of a signal outside the range of measurements, the posterior covariance of a GP will increase in response to the stationary nature of the covariance not finding any additional measurements to anchor it. This increase in covariance can be used to down-weight predicted values of the posterior mean, relative to those known with more confidence. The behaviour of the inferred posterior mean will tend towards the mean function and in the general case where the mean function is assumed to be zero-valued, this is equivalent to the same logical sampling as in the original data set; however, for spectra where Faraday thick structure are present, which have non-zero means at low-λ2 this may artificially truncate structure. This was found to be the case here for the kernel in equation (9) when used to extend λ2-space.
It is also important to note that structures which exist only on scales larger than that set by the measured |$\lambda ^2_{\rm min}$| will not be recovered by a GP model used for inferring data at positions outside the range of the original measurements. Hence this form of structure enhancement for Faraday thick structures can only be used to improve the recovery of structures which exist on scales that are already sampled in part by the original measurements. Any larger structures will not appear in the Faraday depth spectrum of the GP reconstruction, an effect analogous to flux loss in synthesis imaging for radio interferometry.
6.1.2 Zero-valued Faraday depth signals
In the scenarios considered here, we have assumed a zero-valued mean function. In the absence of instrumental leakage, this assumption is valid for modelling sources with non-zero Faraday rotation; however, there are also two scenarios where a non-zero offset in both Stokes Q and Stokes U can occur, resulting in a zero-valued Faraday depth signal. The first of these is the case of un-rotated radio polarization signals, the second is the presence of residual instrumental leakage. The inclusion of such a constant offset as a free parameter (separately for Q and U) in the optimization process is straightforward to implement and requires only very minor changes to be made in the calculation of the posterior mean. However, for some telescopes with wide bandwidths, polarization leakage can be frequency-dependent and consequently modelling it as a constant offset will not be sufficient. In this case there is potential for confusion between the leakage term and covariate behaviour in the complex polarization; as can be seen in Fig. 4, smoothly varying systematic changes in amplitude can be modelled perfectly well using stationary kernels.
6.2 Faraday Challenge
In order to evaluate the performance of the GP approach against other methods in the literature we use the 17 models from the Faraday challenge of Sun et al. (2015). These models include examples of single Faraday thin objects, multiple (two) Faraday thin objects, and combinations of Faraday thick and Faraday thin objects. While the GP approach can only determine RM values in the case of single Faraday thin models, we can use the model evaluation metrics to compare the predicted Faraday depth spectra in all cases. For this purpose we use the kernel from equation (10) for all test data sets with priors as listed in Table 1. All evaluations are made on models with SNRint = 32, as in Sun et al. (2015).
Faraday challenge models. This table lists the input parameters for the 17 Faraday challenge models in Columns 1–9, and the evaluation metrics for the predicted complex Stokes data from the optimized GP in Columns 10–12. See Section 6.2 for details.
. | Input parameters . | Evaluation metrics . | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Model . | p1 . | ϕ1 . | χ1 . | p2 . | ϕ2 . | χ2 . | p0 . | ϕc . | ϕs . | |$\chi ^2_r$| . | SMSE . | MSLL . | 〈ϕ −SMSE〉 . |
. | (per cent) . | rad m−2 . | deg . | (per cent) . | rad m−2 . | deg . | (per cent) . | rad m−2 . | rad m−2 . | . | . | . | . |
01 | 100.0 | 500.10 | 40 | − | − | − | − | − | − | 0.88 | 0.87 | −1.46 | 1.62 |
02 | 100.0 | 49.38 | 60 | − | − | − | − | − | − | 1.00 | 0.99 | −0.94 | 1.31 |
03 | 100.0 | 4.96 | 60 | − | − | − | − | − | − | 1.06 | 1.05 | −0.47 | 1.01 |
04 | 25.0 | −37.84 | 0 | 16.70 | 103.18 | −36 | − | − | − | 1.03 | 1.03 | −0.99 | 1.33 |
05 | 25.0 | −37.84 | −40 | 24.00 | 5.05 | −40 | − | − | − | 0.96 | 0.95 | −0.71 | 1.18 |
06 | 25.0 | −37.84 | −40 | 9.00 | 5.05 | −40 | − | − | − | 1.05 | 1.04 | −0.80 | 1.16 |
07 | 25.0 | −44.55 | 0 | 16.70 | 37.50 | 72 | − | − | − | 1.08 | 1.07 | −0.51 | 1.28 |
08 | 25.0 | 232.56 | 40 | 9.00 | 192.70 | 40 | − | − | − | 1.04 | 1.03 | −1.27 | 1.30 |
09 | 25.0 | −37.83 | −40 | 16.50 | 5.05 | 140 | − | − | − | 0.91 | 0.90 | −0.72 | 1.05 |
10 | 25.0 | −37.84 | 0 | 9.00 | 103.00 | −36 | − | − | − | 1.05 | 1.05 | −0.90 | 1.27 |
11 | 25.0 | 149.50 | 40 | 23.75 | 163.50 | −68 | − | − | − | 1.10 | 1.09 | −1.34 | 1.33 |
12 | 25.0 | −232.56 | 0 | 9.00 | −50.10 | 72 | − | − | − | 1.03 | 1.02 | −1.47 | 1.41 |
13 | 25.0 | −44.55 | 0 | 24.00 | 37.54 | 72 | − | − | − | 0.99 | 0.98 | −0.58 | 1.17 |
14 | − | − | − | − | − | − | 1.9 | −136.98 | 50 | 0.87 | 0.87 | −1.41 | 1.59 |
15 | 1.8 | −240.22 | −36 | − | − | − | 1.9 | −250.17 | 50 | 1.07 | 1.06 | −1.35 | 1.16 |
16 | − | − | − | − | − | − | 1.9 | −136.98 | 25 | 0.99 | 0.98 | −1.53 | 1.25 |
17 | 1.8 | −240.00 | −36 | − | − | − | 1.9 | −250.17 | 25 | 1.02 | 1.01 | −1.41 | 1.20 |
. | Input parameters . | Evaluation metrics . | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Model . | p1 . | ϕ1 . | χ1 . | p2 . | ϕ2 . | χ2 . | p0 . | ϕc . | ϕs . | |$\chi ^2_r$| . | SMSE . | MSLL . | 〈ϕ −SMSE〉 . |
. | (per cent) . | rad m−2 . | deg . | (per cent) . | rad m−2 . | deg . | (per cent) . | rad m−2 . | rad m−2 . | . | . | . | . |
01 | 100.0 | 500.10 | 40 | − | − | − | − | − | − | 0.88 | 0.87 | −1.46 | 1.62 |
02 | 100.0 | 49.38 | 60 | − | − | − | − | − | − | 1.00 | 0.99 | −0.94 | 1.31 |
03 | 100.0 | 4.96 | 60 | − | − | − | − | − | − | 1.06 | 1.05 | −0.47 | 1.01 |
04 | 25.0 | −37.84 | 0 | 16.70 | 103.18 | −36 | − | − | − | 1.03 | 1.03 | −0.99 | 1.33 |
05 | 25.0 | −37.84 | −40 | 24.00 | 5.05 | −40 | − | − | − | 0.96 | 0.95 | −0.71 | 1.18 |
06 | 25.0 | −37.84 | −40 | 9.00 | 5.05 | −40 | − | − | − | 1.05 | 1.04 | −0.80 | 1.16 |
07 | 25.0 | −44.55 | 0 | 16.70 | 37.50 | 72 | − | − | − | 1.08 | 1.07 | −0.51 | 1.28 |
08 | 25.0 | 232.56 | 40 | 9.00 | 192.70 | 40 | − | − | − | 1.04 | 1.03 | −1.27 | 1.30 |
09 | 25.0 | −37.83 | −40 | 16.50 | 5.05 | 140 | − | − | − | 0.91 | 0.90 | −0.72 | 1.05 |
10 | 25.0 | −37.84 | 0 | 9.00 | 103.00 | −36 | − | − | − | 1.05 | 1.05 | −0.90 | 1.27 |
11 | 25.0 | 149.50 | 40 | 23.75 | 163.50 | −68 | − | − | − | 1.10 | 1.09 | −1.34 | 1.33 |
12 | 25.0 | −232.56 | 0 | 9.00 | −50.10 | 72 | − | − | − | 1.03 | 1.02 | −1.47 | 1.41 |
13 | 25.0 | −44.55 | 0 | 24.00 | 37.54 | 72 | − | − | − | 0.99 | 0.98 | −0.58 | 1.17 |
14 | − | − | − | − | − | − | 1.9 | −136.98 | 50 | 0.87 | 0.87 | −1.41 | 1.59 |
15 | 1.8 | −240.22 | −36 | − | − | − | 1.9 | −250.17 | 50 | 1.07 | 1.06 | −1.35 | 1.16 |
16 | − | − | − | − | − | − | 1.9 | −136.98 | 25 | 0.99 | 0.98 | −1.53 | 1.25 |
17 | 1.8 | −240.00 | −36 | − | − | − | 1.9 | −250.17 | 25 | 1.02 | 1.01 | −1.41 | 1.20 |
Faraday challenge models. This table lists the input parameters for the 17 Faraday challenge models in Columns 1–9, and the evaluation metrics for the predicted complex Stokes data from the optimized GP in Columns 10–12. See Section 6.2 for details.
. | Input parameters . | Evaluation metrics . | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Model . | p1 . | ϕ1 . | χ1 . | p2 . | ϕ2 . | χ2 . | p0 . | ϕc . | ϕs . | |$\chi ^2_r$| . | SMSE . | MSLL . | 〈ϕ −SMSE〉 . |
. | (per cent) . | rad m−2 . | deg . | (per cent) . | rad m−2 . | deg . | (per cent) . | rad m−2 . | rad m−2 . | . | . | . | . |
01 | 100.0 | 500.10 | 40 | − | − | − | − | − | − | 0.88 | 0.87 | −1.46 | 1.62 |
02 | 100.0 | 49.38 | 60 | − | − | − | − | − | − | 1.00 | 0.99 | −0.94 | 1.31 |
03 | 100.0 | 4.96 | 60 | − | − | − | − | − | − | 1.06 | 1.05 | −0.47 | 1.01 |
04 | 25.0 | −37.84 | 0 | 16.70 | 103.18 | −36 | − | − | − | 1.03 | 1.03 | −0.99 | 1.33 |
05 | 25.0 | −37.84 | −40 | 24.00 | 5.05 | −40 | − | − | − | 0.96 | 0.95 | −0.71 | 1.18 |
06 | 25.0 | −37.84 | −40 | 9.00 | 5.05 | −40 | − | − | − | 1.05 | 1.04 | −0.80 | 1.16 |
07 | 25.0 | −44.55 | 0 | 16.70 | 37.50 | 72 | − | − | − | 1.08 | 1.07 | −0.51 | 1.28 |
08 | 25.0 | 232.56 | 40 | 9.00 | 192.70 | 40 | − | − | − | 1.04 | 1.03 | −1.27 | 1.30 |
09 | 25.0 | −37.83 | −40 | 16.50 | 5.05 | 140 | − | − | − | 0.91 | 0.90 | −0.72 | 1.05 |
10 | 25.0 | −37.84 | 0 | 9.00 | 103.00 | −36 | − | − | − | 1.05 | 1.05 | −0.90 | 1.27 |
11 | 25.0 | 149.50 | 40 | 23.75 | 163.50 | −68 | − | − | − | 1.10 | 1.09 | −1.34 | 1.33 |
12 | 25.0 | −232.56 | 0 | 9.00 | −50.10 | 72 | − | − | − | 1.03 | 1.02 | −1.47 | 1.41 |
13 | 25.0 | −44.55 | 0 | 24.00 | 37.54 | 72 | − | − | − | 0.99 | 0.98 | −0.58 | 1.17 |
14 | − | − | − | − | − | − | 1.9 | −136.98 | 50 | 0.87 | 0.87 | −1.41 | 1.59 |
15 | 1.8 | −240.22 | −36 | − | − | − | 1.9 | −250.17 | 50 | 1.07 | 1.06 | −1.35 | 1.16 |
16 | − | − | − | − | − | − | 1.9 | −136.98 | 25 | 0.99 | 0.98 | −1.53 | 1.25 |
17 | 1.8 | −240.00 | −36 | − | − | − | 1.9 | −250.17 | 25 | 1.02 | 1.01 | −1.41 | 1.20 |
. | Input parameters . | Evaluation metrics . | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Model . | p1 . | ϕ1 . | χ1 . | p2 . | ϕ2 . | χ2 . | p0 . | ϕc . | ϕs . | |$\chi ^2_r$| . | SMSE . | MSLL . | 〈ϕ −SMSE〉 . |
. | (per cent) . | rad m−2 . | deg . | (per cent) . | rad m−2 . | deg . | (per cent) . | rad m−2 . | rad m−2 . | . | . | . | . |
01 | 100.0 | 500.10 | 40 | − | − | − | − | − | − | 0.88 | 0.87 | −1.46 | 1.62 |
02 | 100.0 | 49.38 | 60 | − | − | − | − | − | − | 1.00 | 0.99 | −0.94 | 1.31 |
03 | 100.0 | 4.96 | 60 | − | − | − | − | − | − | 1.06 | 1.05 | −0.47 | 1.01 |
04 | 25.0 | −37.84 | 0 | 16.70 | 103.18 | −36 | − | − | − | 1.03 | 1.03 | −0.99 | 1.33 |
05 | 25.0 | −37.84 | −40 | 24.00 | 5.05 | −40 | − | − | − | 0.96 | 0.95 | −0.71 | 1.18 |
06 | 25.0 | −37.84 | −40 | 9.00 | 5.05 | −40 | − | − | − | 1.05 | 1.04 | −0.80 | 1.16 |
07 | 25.0 | −44.55 | 0 | 16.70 | 37.50 | 72 | − | − | − | 1.08 | 1.07 | −0.51 | 1.28 |
08 | 25.0 | 232.56 | 40 | 9.00 | 192.70 | 40 | − | − | − | 1.04 | 1.03 | −1.27 | 1.30 |
09 | 25.0 | −37.83 | −40 | 16.50 | 5.05 | 140 | − | − | − | 0.91 | 0.90 | −0.72 | 1.05 |
10 | 25.0 | −37.84 | 0 | 9.00 | 103.00 | −36 | − | − | − | 1.05 | 1.05 | −0.90 | 1.27 |
11 | 25.0 | 149.50 | 40 | 23.75 | 163.50 | −68 | − | − | − | 1.10 | 1.09 | −1.34 | 1.33 |
12 | 25.0 | −232.56 | 0 | 9.00 | −50.10 | 72 | − | − | − | 1.03 | 1.02 | −1.47 | 1.41 |
13 | 25.0 | −44.55 | 0 | 24.00 | 37.54 | 72 | − | − | − | 0.99 | 0.98 | −0.58 | 1.17 |
14 | − | − | − | − | − | − | 1.9 | −136.98 | 50 | 0.87 | 0.87 | −1.41 | 1.59 |
15 | 1.8 | −240.22 | −36 | − | − | − | 1.9 | −250.17 | 50 | 1.07 | 1.06 | −1.35 | 1.16 |
16 | − | − | − | − | − | − | 1.9 | −136.98 | 25 | 0.99 | 0.98 | −1.53 | 1.25 |
17 | 1.8 | −240.00 | −36 | − | − | − | 1.9 | −250.17 | 25 | 1.02 | 1.01 | −1.41 | 1.20 |
The median of the reduced chi-squared statistic recovered from our model, |$\chi ^2_{\rm r} = 0.99$|, is in good agreement with the expected optimum value of 1.00 ± 0.02 from Sun et al. (2015). Results for both the SMSE and |$\chi _{\rm r}^2$| across all 17 test models, evaluated using SNR = 32 as in Sun et al. (2015), are shown in Fig. 8. By comparison to Fig. 3 in that work it can be seen that the GP |$\chi _{\rm r}^2$| has a median value comparable to that of the three best performing methods in that work, each of which use QU-fitting approaches. Furthermore, it can also be seen that the distribution of |$\chi _{\rm r}^2$| for the GP method is significantly narrower than those methods, each of which has a 68 per cent confidence interval that extends to values larger than |$\chi _{\rm r}^2=1.1$|.

Box-and-whisker plots for figures of merit over all 17 Faraday challenge models. The boxes show the first, the second (median), and the third quartile, and the ends of the whiskers show the edges of the distribution. Points deemed to be outliers are shown as open circles.
By comparing the SMSE and 〈ϕ-SMSE〉 values for each model it can be seen that the goodness of fit for the GP model estimates is higher in the Stokes Q and U data space than it is in the Faraday depth signal space. This is to be expected as the model itself is optimized against the noisy Stokes data. Fig. 8 also indicates models that are deemed to be outliers based on the interquartile range of the data. These two outliers are Models 1 and 14, which are low outliers for the data (QU) space SMSE distribution and high outliers for the signal (ϕ) space 〈ϕ-SMSE〉 distribution, indicating that the GP model estimates are overfitting the noisy QU data in these cases, resulting in larger Faraday depth residuals.
The calculation of 〈ϕ-SMSE〉 performed here is possible due to the nature of the GP model, which provides a full predictive posterior distribution for each estimate. Whilst equivalent values are not available for the other methods profiled in the Faraday challenge of Sun et al. (2015), we are able to visualize the equivalent residual Faraday depth spectra. For example, for Model 1, the average difference between the Faraday depth, ϕ1 (see Table 2), of the noiseless model and the fitted value of this parameter across all of the methods in the Sun et al. (2015) paper is Δϕ1 = 1.57 rad m−2 (see table 3 in that work). In Fig. 9 we show the residual Faraday depth spectra for the difference between the true noiseless model and a model with a value of ϕ1 that is incorrect by this amount. We also include residuals for Δϕ1 = 0.60 rad m−2 and Δϕ1 = 3.60 rad m−2, which were the smallest and largest differences returned by the methods evaluated against Model 1 in the Sun et al. (2015) comparison. It can be seen from this figure that the amplitude of the residuals from the GP model are equivalent to those of the best fitting model from the Sun et al. (2015) Faraday challenge in this case; however, it should also be noted that the nature of these residuals is different in form to those resulting from an inexact parametric model.

Upper: Residual Faraday depth spectra for Model 1 of the Faraday challenge, constructed using the transform of the difference between the noiseless model and a model where the fitted Faraday depth, ϕ1, differs from the input model by (i) an average deviation of Δϕ1 = 1.57 rad m−2 (grey line indicates absolute polarization, with real and imaginary components shown in cyan), (ii) a minimum deviation of Δϕ1 = 0.60 rad m−2 (blue dashed line), and (iii) a maximum deviation of Δϕ1 = 3.60 rad m−2 (magenta dashed line). Lower: Residual Faraday depth spectrum for Model 1 of the Faraday challenge, constructed using the transform of the difference between the noiseless model and the posterior mean of the optimized GP model. The ±5 σϕ limits appropriate to SNRint = 32 for the Faraday challenge are shown as a light-grey shaded area.
6.3 Alternative kernels
For a given regression problem, there are an infinite number of models that can be used. As such, it is usually not feasible to test all possible models. In this work so far we have used the GP kernel defined in equation (9); here we consider the use of alternative kernels. We denote our original kernel as ‘3-HP’ and we compare it to (i) the quasi-periodic kernel used by Foreman-Mackey et al. (2017) to model stellar variability, and (ii) a hybrid kernel including a second squared-exponential term.
Although different methods of evaluating competing models have been developed, the computational complexity involved in applying them limits their applicability. To overcome this limitation, easier-to-use approximate methods are commonly used to compare the performance of two or more models. These include the Bayesian Information Criterion (BIC), the Akaike Information Criterion (AIC) and the corrected Akaike Information Criterion (AICc). These information criteria are designed to evaluate the amount of information lost by a particular model. The model that loses the least information, i.e. has the lowest valued information criterion, is considered to be the preferred model.
Burnham & Anderson (2004) suggest that the BIC is a powerful model selection tool when the ‘true’ model is in the set of test models; however, in the case of empirical data where the true model is unlikely to be known a priori the AIC is considered to be a more appropriate criterion. Furthermore, the AICc places a stronger complexity penalty on models used with finite data series than the original AIC. In the case of small sample sizes it is considered to be more appropriate and accurate for finding the preferred model. However, Kass & Raftery (1995) suggest that the AIC should only be used in situations where the prior distribution is similar to that of the likelihood and that when this isn’t the case and the prior is significantly less informative than the likelihood then the BIC will favour simpler models than the AIC.
We calculate the median value and inter-quartile range for each of the three information criteria from ten different noise realizations with SNRint = 32. The goodness of fit statistics for the Faraday challenge models using the two alternative kernels are |$\chi ^2_{\rm r} = 0.98$| for the 4-HP model, and |$\chi ^2_{\rm r} = 0.97$| for the 5-HP model.
The median values of the information criteria suggest a preference for the HP-4 kernel in the case of single Faraday thin component models, a preference for the HP-5 kernel in the case of double Faraday thin models, and a preference for the HP-3 kernel in the case of Faraday thick models. However, we note that these are modal preferences and no model type shows a unanimous preference for a particular kernel. Indeed, although ranking the three kernels using the median value of the information criteria for each data set may nominally show a preference for one kernel above others, the overlapping inter-quartile ranges for most models suggest that there is in fact no clear preference. The only model where one kernel is significantly preferred is Model 1, where the inter-quartile range of the HP-4 kernel is outside the inter-quartile range of both other kernels, see Fig. 10. We also see no difference in the median ranking across the three information criteria, which is perhaps unsurprising given that N ≫ K in all cases. Furthermore, as noted by Bollen et al. (2014), information criteria comparisons are appropriate only for data sets with the same number of measurement samples, and we emphasize that the kernel comparison presented here is for the specifications of the test sets in the Sun et al. (2015) Faraday challenge.

Box-and-whisker plots for BIC values derived from each of the 3 kernels using 10 noise realizations of Model 1 from Sun et al. (2015). The boxes show the first, the second (median), and the third quartile, and the ends of the whiskers show the minimum and maximum values.
7 CONCLUSIONS
In this work we suggest that GPM may be a flexible pre-processing approach for the improved analysis of Faraday depth spectra as it does not rely on an exact parametrized physical model, or set of models, to be known a priori and is therefore suitable for situations where some prior knowledge of the form of the signal is understood, but the exact line of sight structure is imperfectly known. This is often the case for Faraday depth analysis. With regard to this, we have shown that even simple GPM covariance kernels can be used to impute data reliably for complex lines of sight.
We have demonstrated the applicability of GPM for improved recovery of structure in Faraday depth spectra. We have shown that these improvements include the reduction of sidelobe contamination in Faraday depth spectra through imputation of missing data when RFI flagging causes significant losses, even in the case where significant portions of the original data set are lost. We have also illustrated the benefits of using GPM to re-sample complex polarization data on to regular spacings in λ2-space as an alternative to currently used re-gridding methods. We have shown that these applications in combination can provide an improved representation of Faraday thick structure in Faraday depth spectra.
We have shown that under certain circumstances the hyperparameters of the periodic covariance kernel can be used to recover rotation measure values for Faraday thin sources and that this hyperparameter can also be physically meaningful in a more limited way for complex lines of sight. We have presented a method to calculate the range of RM over which this method is applicable for different telescopes and/or data sets and we have demonstrated that within this range RM recovery from GP hyperparameters is robust to typical detection thresholds.
We have discussed limitations in the use of GPM reconstructed Faraday depth spectra, highlighted situations in which GPM is not an appropriate solution and presented a quantitative method for determining a threshold above which Faraday structure can be safely considered significant when reconstructed using an optimized GP fitted to noisy data.
We have evaluated the GP reconstruction proposed here against other methods in the literature and have shown that it has a data space performance equivalent to other QU-fitting methods and furthermore that the variance in results from the GP approach is smaller in the Stokes data space than seen from those methods. Using the same test data we demonstrate that even complex Faraday structure can be recovered using a comparatively simple covariance kernel.
ACKNOWLEDGEMENTS
We thank the anonymous referee for their careful reading of the paper and their helpful comments, which improved the content of this paper. The authors also thank Torsten Enßlin for useful comments on an earlier draft of this paper. This work makes extensive use of the celerite library.3 SWN acknowledges support from the UK Newton Fund via the Science & Technology Facilities Council (STFC) through the Development in Africa with Radio Astronomy (DARA) Big Data project ST/R001898/1. AMS gratefully acknowledges support from an Alan Turing Institute AI fellowship EP/V030302/1. JH acknowledges support from the UK Science & Technology Facilities Council (STFC). MC acknowledges support from the National Agency for Research and Development (ANID) Scholarship Program DOCTORADO BECAS CHILE under grant 2018-72190574.
DATA AVAILABILITY
All code from this work is publicly available on Github at the following address: https://github.com/as595/GPM-for-Faraday-Depth-Spectra
Footnotes
In statistics, imputation is the process of replacing missing data points with substituted values such as model estimates.