-
PDF
- Split View
-
Views
-
Cite
Cite
M Purver, C G Bassa, I Cognard, G H Janssen, R Karuppusamy, M Kramer, K J Lee, K Liu, J W McKee, D Perrodin, S Sanidas, R Smits, B W Stappers, Removal and replacement of interference in tied-array radio pulsar observations using the spectral kurtosis estimator, Monthly Notices of the Royal Astronomical Society, Volume 510, Issue 2, February 2022, Pages 1597–1611, https://doi.org/10.1093/mnras/stab3434
- Share Icon Share
ABSTRACT
We describe how to implement the spectral kurtosis method of interference removal (zapping) on a digitized signal of averaged power values. Spectral kurtosis is a hypothesis test, analogous to the t-test, with a null hypothesis that the amplitudes from which power is formed belong to a ‘good’ distribution – typically Gaussian with zero mean – where power values are zapped if the hypothesis is rejected at a specified confidence level. We derive signal-to-noise ratios (SNRs) as a function of amount of zapping for folded radio pulsar observations consisting of a sum of signals from multiple telescopes in independent radio-frequency interference environments, comparing four methods to compensate for lost data with coherent (tied-array) and incoherent summation. For coherently summed amplitudes, scaling amplitudes from non-zapped telescopes achieves a higher SNR than replacing zapped amplitudes with artificial noise. For incoherently summed power values, the highest SNR is given by scaling power from non-zapped telescopes to maintain a constant mean. We use spectral kurtosis to clean a tied-array radio pulsar observation by the Large European Array for Pulsars: the signal from one telescope is zapped with time and frequency resolutions of |$6.25\, \mathrm{ms}$| and |$0.16\, \mathrm{MHz}$|, removing interference, along with 0.27 per cent of ‘good’ data, giving an uncertainty of |$0.25\, \mathrm{\mu \mathrm{ s}}$| in pulse time of arrival (TOA) for PSR J1022+1001. We use a single-telescope observation to demonstrate recovery of the pulse profile shape, with 0.6 per cent of data zapped and a reduction from 1.22 to |$0.70\, \mathrm{\mu \mathrm{ s}}$| in TOA uncertainty.
1 INTRODUCTION
Terrestrial sources of radio-frequency interference hamper many astronomical radio observations, swamping the signal from outer space with intense and unwanted human-made radiation. Although it is usually impossible to subtract interference from useful information received at the same time and frequency, it is possible to remove or replace contaminated portions of data entirely, and so prevent them from affecting the integrated signal.
To remove interference, we wish to identify it using a fair and reliable method. One such method, a general signal processing technique termed ‘spectral kurtosis’ (Dwyer 1983), was first applied to radio astronomy by Nita et al. (2007), subsequently refined by Nita & Gary (2010a,b) and applied again by Gary, Liu & Nita (2010). It is a statistical method that attempts to separate ‘Gaussian white noise’ from everything else, on the assumption that the useful information resembles Gaussian white noise while the interference does not. It uses the signal power to make a binary decision about whether to remove data, and it can be applied with fine resolution in time and frequency. We refer to the quantity used to make this decision as ‘the estimator’.
Pulsars are weak and rapidly varying astronomical radio sources that require observations with fine time resolution, so interference cannot easily be ‘averaged out’ with integration across time, frequency, or multiple telescopes. Spectral kurtosis has previously been used to remove interference from single-telescope pulsar observations from the Parkes, Lovell, and Green Bank Telescopes (van Straten 2013; Dolch et al. 2014; Lam 2016; Kar et al. 2019). We extend this approach to cover the case of a pulsar observation consisting of a sum of signals from an array of telescopes in independent interference environments, which results in a variable number of telescopes contributing to the observation at different times and frequencies. Since it is possible to compensate for contaminated data at one telescope using uncontaminated data from the rest of the array, we compare four methods of replacing the data that are lost from an array pulsar observation when interference is removed.
We apply one such method to a timing observation made as a part of the Large European Array for Pulsars (LEAP) project (first shown in section 4.5 Bassa et al. 2016). This array comprises five widely separated radio telescopes, whose simultaneous pulsar observations can be summed coherently to produce a tied array of equivalent sensitivity to a single telescope of |$195\, \mathrm{m}$| in diameter. Coherent summation is not always possible, so LEAP sometimes acts as an incoherently summed array equivalent to a 130-m telescope. LEAP provides pulsar observations of greater sensitivity than any other existing steerable radio telescope, enabling precise timing of dynamical properties, such as a pulsar’s rotational period and proper motion. The project aims to use high-precision timing to detect gravitational waves, and interference removal is necessary to maintain its accuracy. Spectral kurtosis can be applied independently to each telescope’s data, excising interference while sacrificing as little useful information as possible.
In Section 2 of this paper, we fully describe the implementation of the spectral kurtosis method of Nita & Gary (2010b). In Section 3, we explain advantages and disadvantages of this method of interference detection. In Section 4, we explore the effects of general interference removal on folded pulse profiles produced using observations from an array of telescopes, and look at how adverse effects can be mitigated while maximizing signal-to-noise ratio (SNR). In Section 5, we summarize the application of spectral kurtosis to a LEAP observation. In Section 6, we conclude.
2 SPECTRAL KURTOSIS METHOD OF INTERFERENCE DETECTION
Each instance of the spectral kurtosis estimator is derived from a portion of the radio signal, and its value is a measure of the statistical properties of that portion. We use a hypothesis test to classify data as either ‘good’ or ‘bad’ based on the value of the estimator, where bad data are usually interference and good data are usually not. The null hypothesis is that data are good, because we know the values expected from good data. The classification provides evidence of whether a portion is likely to be good or bad in reality, but, since it examines a finite number of data, it cannot tell us whether the portion is definitely good or bad. We therefore remove portions that are suspected of being bad when they give estimator values that are outside specific limits, and we call this removal ‘zapping’. We can control our level of suspicion, because the estimator allows us to define the fraction of good data that we are willing to zap mistakenly. In general, the more good data we are willing to lose, the more interference we will eliminate. If the amount of interference in an observation is small then we may zap more good data than bad, but the spoiling effect of even a small amount of dominant interference is usually sufficient to justify this.
In the following subsections, we explain how to calculate the estimator and its limits. We provide some example values for variables used in the calculations, as an aid to implementation of the method. Where numerical integration is required, we refer to routines within the GNU Scientific Library (gsl) for the c and c++ programming languages. All variables used have real values, with imaginary quantities shown explicitly using the imaginary unit i.
2.1 The radio signal
Although the estimator is a function of signal power, we begin with the amplitudes from which power is derived. The digitized signal initially consists of an evenly sampled time series of radio amplitudes, covering a fixed frequency bandwidth; we assume that the continuous signal has been limited to cover the same bandwidth before being sampled, so that its information is captured accurately (Shannon 1949). Each time sample records the amplitude in either one or two polarization components, and each amplitude may be either real or complex (in the case of complex amplitudes, the imaginary part is simply a phase-shifted version of the real part in which each sinusoidal wave making up the signal has been shifted by |$\frac{\pi }{2}$| radians, capturing the same information as a real signal of twice the sampling rate). The amplitudes are drawn from a set containing a fixed number of discrete values, stored using a corresponding number of ‘sampling bits’, and we assume this set to be large enough to approximate a continuum of values for statistical purposes. Although the number of values does not change, the values themselves can be calibrated dynamically during an observation.
The frequency resolution of the signal can be improved, at the expense of time resolution, by performing discrete Fourier transforms (DFTs) on sequences of amplitudes; the more time samples are used in each DFT, the more the frequency resolution improves (pp. 260–262 Bracewell 2000). Although this is often referred to as ‘moving from the time domain to the frequency domain’, the values that come out of a DFT are still amplitudes, with each amplitude representing the signal at one time and one frequency. DFTs of consecutive sequences of amplitudes can therefore be used as a time series with multiple frequency channels, where each DFT contributes one time point to all channels. The process simply exchanges time resolution for frequency resolution, and it is worth noting that the DFT of a single value gives the value itself, i.e. the time-domain signal can be thought of as being the frequency-domain signal with one channel. The channelized amplitudes are generally complex, representing the magnitude and phase of the signal at each time and frequency. Given the same signal and equipment, a complex time series consisting of T samples and a real time series consisting of 2T samples produce almost the same amplitudes in channels 1 to T − 1 of their respective DFTs (where T is a positive integer), as long as the frequencies in the signal are within the range of the channels. This similarity allows amplitudes from telescopes with real and complex sampling to be added together in the frequency domain (section 4.1 of Bassa et al. 2016). There are some differences in the way that continuous signal frequencies are distributed into discrete bins, which can be mitigated by applying different weights to each bin; the differences are larger in the lowest- and highest-frequency channels, and these channels are often not used because they do not approximate the original signal well even if weights are used (pp. 281, 288 Bracewell 2000). The DFT amplitudes in channels 0 and T from the real time series are themselves always real, so they cannot generally be compared to the amplitude in channel 0 from the complex time series, and the complex time series does not produce a channel T.
2.2 The statistical distribution of the signal
The identification of interference by the spectral kurtosis method is based on the probability density function (PDF) of a set of amplitudes, which gives the probabilities of a single amplitude taking any given value and which we refer to as the ‘distribution’ of the set. The number of amplitudes in the set can be chosen at will, and the time and frequency ranges covered by the set are the time and frequency resolutions of zapping.
Depending on the origin of the signal, the distribution can have different general forms (shapes), and other distinguishing characteristics, such as different mean values. We can use spectral kurtosis to test whether each measured set of amplitudes is well described by a particular distribution. In a ‘good’ signal of Gaussian white noise, the amplitudes (taking the real and imaginary parts as separate values if the signal is complex) are uncorrelated and have a Gaussian distribution with a mean of 0 when collected over the time and frequency resolutions of zapping and across all polarizations; in a ‘bad’ signal, the distribution of amplitudes when collected over these ranges is non-Gaussian, contains correlated amplitudes and/or does not have a mean of 0. Although the portion of the signal from the pulsar might have a non-Gaussian distribution, it is typically too weak to substantially alter the total signal distribution, so a detectably bad signal is usually caused by interference. If a bad signal is caused by unpolarized interference, the distribution over time and frequency within a single polarization is bad, and two polarizations have the same distribution as one another; if a bad signal is caused by polarized interference, two polarization components are differently distributed. In all but pathological cases, the real and imaginary parts of a complex signal have the same distribution as one another.
2.3 The estimator
The estimator is an unbiased estimate of the scaled variance divided by the square of the mean for a set of samples of summed power. It is referred to as ‘spectral’ because it can be calculated separately for each frequency channel by using Fourier transforms, although it can also be calculated across the full bandwidth without leaving the time domain. It is called ‘kurtosis’ because the variance of signal power involves the mathematical fourth power of amplitude values, but each element in the calculation is more generally the square of a sum of squares rather than a simple fourth power.
The underlying probability distribution of S can be revealed by making many measurements of |$\hat{S}$|. But the distribution for good data can also be approximated analytically, allowing it to be calculated more efficiently. The distribution depends on M and N, but, for Gaussian-distributed amplitudes with a mean of 0, it does not depend on the variance of those amplitudes. In other words, the estimator behaves in the same way for Gaussian amplitudes (good data) of any ‘loudness’, and can thus be used to distinguish them from most non-Gaussian amplitudes (bad data). The estimator shares this property with the t-statistic (Student 1908), and in fact a t-test could be used to classify good and bad data using amplitudes instead of power values. We have not undertaken an interference detection comparison between the estimator and the t-statistic, but have employed the estimator because it can be used on either averaged power values or amplitudes and because it can apply a consistent test to polarized signals regardless of the angle between the radio source and the receiver plane.
For good data, S has a mean of 1 for all allowed values of M and N; if we are to decide which amplitudes to accept as good without deriving the distribution of S empirically, we must calculate the distribution’s shape as best we can by computing some of its higher moments as well.
2.4 The probability distribution of the estimator
In order to understand what values we expect the estimator to take when the null hypothesis is true, we determine the approximate cumulative distribution function (CDF), P(S), that is produced by good |$\hat{S}$| values. The CDF gives the fraction of good |$\hat{S}$| values that are expected to fall at or below any level S, and its shape depends only on M and N.

Profile SNR as a function of the fraction of summed power samples zapped due to interference, using coherent (top) and incoherent (bottom) summation of signals from two identical telescopes in independent interference environments, with four different methods of equalization (see text for other parameter values).

Profile SNR as a function of the fraction of summed power samples zapped due to interference, using coherent (top) and incoherent (bottom) summation of signals from five identical telescopes in independent interference environments, with four different methods of equalization (see text for other parameter values).
2.4.1 The CDF using Type I
Type I corresponds to κ ≤ 0, and applies only in some cases for which M ≤ 9 (as long as N ≥ 0.5, which covers all of its allowed values when calculating the estimator distribution caused by Gaussian white noise). Such small values of M give a PDF with a large variance, making the estimator so insensitive to interference that it might not be considered useful (Nita & Gary 2010a). Additionally, we have found that Type I does not provide a good approximation to the true PDF of S as generated using simulated random numbers, even though it does approximate the first four moments well. The problem is worst at small S, where the Type I PDF gives a substantial probability of S < 0, despite the fact that |$\hat{S}$| cannot be negative. These negative values indicate that a Pearson distribution is inadequate when M is small, and that it is necessary to approximate more than four moments or to derive the PDF empirically in these cases. However, since we do not know the scope of applications for which spectral kurtosis will be employed, we include this part of the method for completeness.
2.4.2 The CDF using Type IV
Type IV corresponds to 0 < κ < 1. It applies in all cases for which M ≥ 240 and N ≤ 13.5, and in some cases for which 14 ≤ M ≤ 239 and N ≤ 13.5, so it is most commonly used when few amplitudes are used to make each power value. It approximates the true PDF of S well for all values of M and N to which it applies, working best when M ≳ 25.
2.4.3 The CDF using Type VI
Type VI corresponds to κ > 1. It applies in all cases for which M ≥ 3 and N ≥ 14, and in some cases for which 3 ≤ M ≤ 239 and N ≤ 13.5, so it is most commonly used when many amplitudes are used to make each power value. Like Type I, Type VI cannot identify interference fairly when M is small: it gives a substantial probability of S < 0 in some cases for which M ≲ 12, even though |$\hat{S}$| cannot be negative. In the more useful cases for which M ≳ 25, however, simulations with random numbers showed that Type VI provides a good approximation to the true PDF of S.
2.5 The limits of the estimator
We now decide on the fraction of good data that we are willing to reject, 2f, and use the transformed CDF to calculate lower and upper limits of S such that a fraction f of good |$\hat{S}$| values are expected to fall below the lower limit, SL, and a fraction f are expected to fall above the upper limit, SU. The range between the limits is a confidence interval: we reject any portion of data that produces an estimator value outside the range, because we suspect those data of being bad. We wish to choose the smallest value of f that adequately removes interference, in order to keep as many good data as possible and avoid any substantial change to the overall amplitude distribution of an observation.
After checking that the initial guess for a transformed limit falls within the range for which the relevant CDF is defined, we calculate the value of the CDF at that point. Since all CDFs increase monotonically, we make the guess a lower bound if the CDF falls below its target value or as an upper bound if the CDF falls above its target value. We then calculate the CDF at intervals in S′ (moving in one direction by, for example, |$\frac{\sqrt{\mu _2}}{a}$| at a time) until the CDF crosses its target value, giving us the other bound for the transformed limit. After this, we bisect the upper and lower bounds iteratively, and at each iteration we make the bisection point a new upper or lower bound according to the value of the CDF at that point. Once the upper and lower bounds are sufficiently close together, we take their bisection point as the transformed limit, and finally convert this to a limit on S using equation (11). We use our limits to zap portions of data that give |$\hat{S}\lt S_L$| or |$\hat{S}\gt S_U$|.
For η = 3 (f = 0.001350), example values corresponding to those in Sections 2.4.1–2.4.3 are: M = 3, N = 4, SL = −1.492, SU = 7.417 (Type I); M = 1000, N = 2, SL = 0.8499, SU = 1.1818 (Type IV); M = 600, N = 16, SL = 0.8321, SU = 1.1901 (Type VI).
3 ADVANTAGES AND DISADVANTAGES OF THE ESTIMATOR
Spectral kurtosis is one of many methods designed to distinguish interference from useful data. While spectral kurtosis looks for non-Gaussianity in the distribution of amplitudes, most techniques flag outlying power values. ‘Median absolute deviation’, for example, distinguishes any portion of data whose power is substantially different to the portions around it, using a median-based variance estimate that is robust to outliers (Fridman 2009). The pulsar processing software psrchive can tackle narrowband interference by automatically zapping data in frequency channels that stand out from the median channel power, and can also remove impulsive interference by zapping parts of an average pulse profile that deviate from the expected shape (van Straten, Demorest & Oslowski 2012).
Other approaches to interference mitigation include: eliminating ‘cyclostationary’ signals that have periodic statistical properties (Ait-Allal et al. 2010); characterizing known sources of interference in a specific environment (Czech, Mishra & Inggs 2018); and removing signals that appear simultaneously to both a telescope and an adjacent reference receiver that is not pointing at the astronomical source (Briggs, Bell & Kesteven 2000).
Since no method of interference removal is perfect, we examine some of the particular features of the estimator.
3.1 Advantages of the estimator
The estimator is statistically unbiased, allowing us to choose fairly the fraction of good data that will be rejected. It is also simple in its assumption that the distributions of interference can usually be distinguished from the distribution of useful data: this is an unashamedly frequentist approach that requires no prior knowledge of the interference distributions. An examination by eye shows that spectral kurtosis is successful in zapping many different kinds of interference, with the loss of only a small fraction of good data.
As shown in Section 5, the estimator can be used to zap interference with fine time and frequency resolutions (e.g. |$6.25\, \mathrm{ms}$| and |$0.16\, \mathrm{MHz}$|), salvaging more useful data than other methods with coarser resolutions. Zapping can operate in either the time or frequency domain, allowing the time resolution to be improved at the expense of frequency resolution or vice versa. The ability of the estimator to detect interference generally improves as more power values are accumulated, since its variance decreases as M increases (Nita & Gary 2010a), but the best value of M also depends on the distribution and time-scale of the interference.
It is possible to zap different types of interference by using the estimator with different values of M on the same data, either independently or jointly in a ‘multiscale’ approach (Gary et al. 2010), the latter also allowing transient signals to be detected and classified using data with as few as two sampling bits (Nita 2016; Nita & Gary 2016; Nita et al. 2019). It is also possible to make the estimator more sensitive to randomly changing signals and less sensitive to smoothly varying ones by normalizing the power at each frequency and time by the total power across the bandwidth at the same time, before using the normalized values to produce the estimator (Nita et al. 2007).
The estimator can be effective on power values that have been averaged together, so it can be used retrospectively after data have been compressed in this way, although its performance deteriorates as more values are used in each average.
3.2 Disadvantages of the estimator
The frequentist approach that makes spectral kurtosis simple is also its fundamental limitation. Ideally, we would zap data using a Bayesian method in which we had prior knowledge of the amount and the distributions of the interference. Without these things, we cannot give an accurate probability that any particular portion of data is bad. But prior knowledge is not generally available to us, since the interference environment often changes on time-scales shorter than the duration of our radio observations. We therefore choose to use spectral kurtosis based on our long-term experience and suspicions about interference, but must acknowledge that the technique will be more successful in some situations than others.
Inevitably, the estimator will sometimes cause us to zap useful data or fail to zap interference. This can happen in two ways. First, data can be mislabelled as good or bad. We can control the fraction of good data that will be rejected (the type I error rate), but we cannot predetermine the fraction of bad data that will be accepted (the type II error rate), since it depends on the similarity between the distributions of estimator values from bad data and the distribution from good data. Correlated Gaussian noise might be labelled as good, for example, and Nita et al. (2007) found that periodic interference with a duty cycle of 40–60 per cent can closely mimic a good data distribution. Secondly, useful data can have a ‘bad’ distribution or interference can have a ‘good’ distribution. The former case could occur if single pulses from a pulsar had an SNR close to or greater than 1 in a single frequency channel; the latter case could result from interference that had uncorrelated Gaussian-distributed amplitudes whose variance remained approximately constant over the zapping time-scale and bandwidth. Zapping the pulsar, in particular, could alter its apparent profile shape (see Section 4.1). To combat these problems, we should use frequency channels that are too fine for single pulses to be seen above the observational noise, and we may need to employ other methods of interference removal as well as, or instead of, spectral kurtosis.
The statistical nature of spectral kurtosis makes it less suitable for data that have been averaged over many power values. As the number of power values averaged together (N) increases, the estimator loses its ability to distinguish between different gamma power distributions and therefore to detect certain types of interference (Nita & Gary 2010b). The method works best on power values sampled near the Nyquist rate, requiring substantial data storage space and computational power when zapping.
4 EFFECTS OF INTERFERENCE REMOVAL ON PULSE PROFILES
Regardless of the method used to identify interference, zapping alters our data. We look at situations where this may cause a problem for pulsar observations, and examine four methods of compensating for the lost data in array observations with telescopes in independent interference environments.
4.1 Situations in which zapping may alter profile shape
When making simultaneous radio observations using multiple telescopes in order to increase the SNR, the signals from each telescope may be summed coherently, using amplitudes with phase information, or incoherently, using power without phase information. When portions of the signal at each telescope are independently zapped prior to summation, the final signal has a variable number of contributing telescopes as a function of time and frequency. This can pose a problem for pulsar observations in particular, because of the importance of profile shape.
After summation over the available telescopes, a pulsar observation is typically summed as a function of the pulsar’s rotational phase in a process called ‘folding’. This gives the ‘average pulse’, or profile. Samples are summed incoherently in a number of phase ‘bins’, giving a profile with phase and frequency resolution, to which time resolution is added by repeating the process (pp. 165–166 Lorimer & Kramer 2005). Profiles give a higher SNR than individual pulses, which allows more accurate timing of the pulsar. Timing accuracy also relies on the profile shape remaining highly stable over time, and so every care is taken to avoid altering it instrumentally. Profile shape change could be caused by portions of a strong pulsar signal being zapped as interference, but, even when zapping is unrelated to pulsar emission, we must still understand its effect on the profile.
To quantify the profile change caused by zapping, we begin with the amplitude signal at a single telescope with no interference (the signal may be real or complex and in the time or frequency domain). We neglect variation between individual pulse measurement, which is caused by phenomena such as pulse jitter and interstellar scintillation (see e.g. Lorimer & Kramer 2005, fig. 1.1 and pp. 8, 92, 202), and assume that all pulses are identical monochromatic waves whose magnitude can be described as a function of rotational phase only. The signal power consists of a source (pulsar) contribution and a gamma-distributed random noise contribution, where the noise comes from Gaussian-distributed random amplitudes. In a single time sample within a single frequency and polarization channel, before summation over N or 2N values (see Section 2.3), the source power alone would have a mean of h(θ) and a variance of 0 at a single phase value, θ, while the random noise power alone would have a mean of g and a variance of g2 (where h(θ) and g are both non-negative). Because the source and noise have already been added together as amplitudes, within which the two contributions were independent, the summed power has a mean of N(h(θ) + g) and a variance of N(2h(θ)g + g2). When many samples of summed power are added together in folding, the central limit theorem dictates that the folded power follows an approximately Gaussian distribution. Assuming that g does not change with time, the variance is approximately constant if h(θ) ≪ g, which is usually the case since h(θ) and g come from individual power samples and pulsar radiation at the Earth is weak. However, we can see that there is some phase-dependence to the variance, which could cause measured profile shape to deviate from the true profile shape given by the function h(θ) if the noise is non-Gaussian. Furthermore, zapping would cause variations in the power level that might manifest as additional phase-dependent noise or as an inconstant power baseline, where the baseline without zapping has a constant value of Ng.
4.2 Equalizing zapped data in array observations with independent zapping at each telescope
After summing a signal coherently over multiple telescopes, we may wish to return the resulting amplitudes to the time domain via an inverse Fourier transform. We can avoid artefacts from the process by minimizing the noise level changes that can be produced by zapping, which also produces a more constant power baseline in both coherently and incoherently summed signals. This minimization is achieved by equalizing the amplitude or power variances of summed samples that have different amounts of zapping, either by scaling up samples that have fewer contributing telescopes or by adding artificial noise to them. If we are not returning the signal to the time domain, we can instead equalize the power mean by subtracting different baseline values from summed samples that have different amounts of zapping. Below, we compare the typical SNR of the summed signal after using these three processes and after no equalization.
4.2.1 Equalization of coherent observations
With coherent summation, {Xl} represents a set of magnitudes of amplitudes summed over l telescopes. Power values are the squared magnitudes of these summed amplitudes. The set of power values has a measurable mean, |$\mathrm{E}\lbrace X_l^2\rbrace$|, which is |$(l^2\overline{h}+lg)$| if the amplitudes are complex or |$(l^2\overline{h}+lg) /2$| if they are real, where |$\overline{h}$| is the time-average of h(θ) over all phase values. This quantity should be measured using a sufficient number of samples to make a stable estimate. Each summed amplitude is associated with its own value of l after zapping, and we need to measure |$\mathrm{E}\lbrace X_l^2\rbrace$| using all combinations of telescopes if |$\overline{h}$| makes any substantial contribution, since the |$\overline{h}$| and g terms vary with different powers of l and we cannot measure the two independently. If g dominates, however, then we can calculate |$\mathrm{E}\lbrace X_l^2\rbrace$| using a separate measurement from each telescope. Ideally, it should be measured and used separately in each frequency and polarization channel, but an average can be used if it is similar in each channel. We assume that |$\overline{h}$| remains constant, as does h(θ) at each phase value, but any measurable variation in these parameters could be included in g, which would then acquire phase dependence, and the following equations for coherent and incoherent observations would remain valid. In principle, we could also replace |$\overline{h}$| with h(θ) in these equations if we were able to measure the time average of h(θ) separately at each phase value, but this would require longer, folded variance measurements.
Since groups of M summed power samples are zapped together, the values of y and z depend on whether groups of samples being folded into each profile bin are zapped independently, and their ranges are 1 ≤ y ≤ 2 and 1 ≤ z ≤ 2. If M ≫ W, then y ≃ 2 (most consecutive samples entering a bin are zapped together). The value of y decreases when M decreases towards W, but with the ‘resonances’ at which y = 2 if zapping and binning are synchronized (e.g. if M = W and the M samples that are considered for zapping are the same set as the W samples that are summed into a bin). When M decreases below W, y decreases until y ≃ 1 when M ≪ W (many consecutive groups of samples entering a bin are zapped independently), although this limit is reached very slowly unless M is small. A baseband observation of a pulsar with a period of |$10\, \mathrm{ms}$| using 0.16-MHz-wide frequency channels and 1024 bins across the profile, with N = 2 and M = 1000, would give y ≃ 2, but a similar observation of a pulsar with a period of |$10\, \mathrm{s}$| would give y a lower value (though not close to 1 unless M were made smaller). If M ≤ WD, then z = 1, where D is the number of bins across the pulse profile and so WD is the number of summed power samples across a single pulse (all groups of samples entering a bin from different pulses are zapped independently). If M > WD but M ≪ FWD, then z ≃ 1, where FWD is the number of samples across an entire profile (many groups of samples entering a bin from different pulses are zapped independently). When M increases above WD and towards FWD, z increases until z ≃ 2 when M ≫ FWD (most groups of samples entering a bin from different pulses are zapped together). This last situation is undesirable as entire profiles would be zapped together, and z ≃ 1 is typical for a pulsar observation.
4.2.2 Equalization of incoherent observations
Incoherent summation generally produces a lower profile SNR than coherent summation, but may be necessary if amplitudes cannot be stored or if the alignment of time series between telescopes cannot be made accurate enough to guarantee coherence. With incoherent summation, {Xl} represents a set of power values summed over l telescopes. This set has a measurable mean, E{Xl}, of |$Nl(\overline{h}+g)$| and a measurable variance, Var{Xl}, of |$Nl(2\overline{h}g+g^2)$|. These quantities should be measured using a sufficient number of samples to make stable estimates. Each summed power value is associated with its own value of l after zapping, and we can calculate the means and variances of the summed power values for all combinations of telescopes using separate measurements from each telescope, since each quantity varies with a single power of l.
4.3 Comparison of equalization methods for zapped data
The choice of equalization method may depend on the fraction of data that are zapped, which can vary with time, frequency, and telescope environment, and the choice may also depend on whether data are summed coherently or incoherently. In most cases, mean equalization and variance equalization by scaling produce a higher profile SNR than variance equalization by addition of artificial noise and no equalization. This is because artificial noise introduces additional variance without increasing the mean, as does a failure to equalize the power baseline.
We focus on the regime of |$\overline{h}\ll h(\theta)\ll g$|, which is typical of pulsar observations at phase values where the pulse is visible. The first inequality comes about because a normal pulsar gives no emission for the majority of its period, so the pulse itself is usually well above the average emission strength. The second inequality occurs because these are quantities in a single sample at a single telescope, in which noise usually dominates over source contribution (where noise does not dominate, coherent summation loses its SNR advantage over incoherent summation even if there is no zapping). Figs 1–3 use h = 0.01, |$\overline{h}=0.0001$|, g = 1, N = 2, W = 2, y = 2, F = 1000, and z = 1 (representing an observation of a pulsar with a period of |$10\, \mathrm{ms}$|) and show the relationships between profile SNR (μ′(θ)/σ(θ)) and fraction of summed power samples zapped at each telescope (q) for 2, 5, and 100 identical telescopes in independent interference environments (this example also applies to identical groups of telescopes in which each group is zapped together and each group is in an independent interference environment; see Taylor et al. 2019 and Nita & Hellbourg 2020 for calculations of the estimator using multiple receivers in the same environment). Coherent and incoherent summation are shown, with all four methods of equalization.

Profile SNR as a function of the fraction of summed power samples zapped due to interference, using coherent (top) and incoherent (bottom) summation of signals from 100 identical telescopes in independent interference environments, with four different methods of equalization (see text for other parameter values).
Mean equalization usually gives the highest SNR, although the difference between mean equalization and variance equalization by scaling decreases as more telescopes are added, and the second method is very slightly better for coherent summation of 100 telescopes with a small or moderate amount of zapping. The scaling method may be preferred for coherent summation in particular, because it avoids artefacts when returning amplitudes to the time domain (see Section 4.2), and the penalty in SNR is small unless there is a large amount of zapping. Mean equalization can fall behind if W or N increase sufficiently to cause the signal to make a substantial contribution to variance.
Variance equalization by addition of artificial noise lags some way behind the other active methods (except in the case of a single telescope, for which it is identical to variance equalization by scaling), but it gives a better SNR than the passive method of no equalization when summing two telescopes with a small amount of zapping, as well as preventing artefacts when returning coherently summed amplitudes to the time domain. With a larger number of telescopes, it only retains this SNR advantage for incoherent summation. It can be applied at each telescope without reference to the others, so it is computationally simpler than the scaling method and may be preferred if equalization needs to be done before the individual signals are combined – for example, interference-free signals from widely spaced telescopes may be needed in order to synchronize their summation, after which the artificial noise could be replaced by a different equalization method.
The method of no equalization gives a relatively poor SNR for incoherent summation, but its performance for coherent summation improves as the number of telescopes increases. It is the simplest method, because it does not require replacement values to be computed, and may be preferred when there is very little zapping, or in the coherent summation of a large number of telescopes. However, its unequal power baseline can cause its SNR to decrease below that of all other methods when W or N increases, particularly for incoherent summation (e.g. when W = 2000 and y = 1.5, representing an observation of a pulsar with a period of |$10\, \mathrm{s}$|).
All four methods of equalization produce profile means that are linear functions of h(θ), so they do not make systematic changes to profile shape as long as the distribution of all profile noise is close to Gaussian. Even Gaussian profile noise with a phase-dependent variance produces Gaussian noise of constant variance in each complex-valued bin of a profile’s DFT, which means that the standard method of frequency-domain pulsar timing should not produce unwanted correlations between timing residuals (often called ‘red noise’) when using equalized signals (Taylor 1992). The main danger to profile shape, other than zapping of the pulsar contribution itself, is non-Gaussian noise. We rely on the accumulation of data into each profile phase bin to make the noise approximately Gaussian, in line with the central limit theorem (even though the equations for profile mean and variance above do not depend on the noise distribution). Substantial non-Gaussianity could arise if q were very close to 1, or if q were very close to 0 when h ≳ g. It could also occur if the duration and bandwidth of a typical burst of interference were not much less than the duration and bandwidth over which the profile was folded, as there would then be few independent instances of zapping within each profile, and so the binomial distribution of zapping might not resemble a Gaussian shape. Similarly, it could happen if the time and frequency resolutions of zapping were not much less than the folding duration and bandwidth. As a rough guide, fewer than 100 independent instances of zapping within a profile may be too few when q = 0.1. With or without zapping, it is worth noting that profile noise may be substantially skewed and non-Gaussian if few (less than about 50) amplitude samples contribute to the power in each profile bin.
5 INTERFERENCE REMOVAL FROM PULSAR OBSERVATIONS
Spectral kurtosis has been employed successfully by the LEAP project, which makes astronomical observations of pulsars using up to five radio telescopes simultaneously (Bassa et al. 2016; Smits et al. 2017). The aim of the project is to measure the times of arrival of pulses with sufficient accuracy to detect variation that is characteristic of the influence of gravitational waves, thereby measuring the strength of a background of low-frequency waves that is believed to permeate the Solar System from distant sources such as binary supermassive black holes (Hellings & Downs 1983). The signal from each telescope is converted to the baseband frequency range and sampled at the Nyquist rate to enable coherent summation (pp. 117–120 Lorimer & Kramer 2005), allowing spectral kurtosis to be used effectively alongside a simpler method that zaps portions of the signal whose power deviates greatly from an expected value or from the power of neighbouring portions (section 4.5 of Bassa et al. 2016). Each telescope’s signal is recorded digitally using eight sampling bits, calibrated for polarization accuracy and then zapped if necessary, before the stored signals are summed with their amplitudes calibrated to maximize the SNR of the observation. Pulse profiles, showing the average radio emission from a pulsar as it rotates, can then be produced and timed.
Fig. 4, reproduced from Bassa et al. (2016), shows the improvement in the pulse profile of PSR J1022+1001 achieved by zapping a signal from the Nançay radio telescope using spectral kurtosis and replacing the rejected data with artificial Gaussian noise. The 6-min segment of this LEAP observation used four telescopes and covered a frequency range of 1332–|$1460\, \mathrm{MHz}$|. Each measurement of the estimator used 1000 power values averaged over two complex polarization channels (M = 1000 and N = 2), giving zapping resolutions of |$6.25\, \mathrm{ms}$| and |$0.16\, \mathrm{MHz}$|. The estimator thresholds were set using η = 3, meaning that 0.27 per cent of good data from Nançay were zapped. Through the application of spectral kurtosis, little data were lost and an observation that was riddled with interference became suitable for high-precision pulsar timing.

The pulse profile of PSR J1022+1001, with brightness representing power, during 6 min of a LEAP observation with a bandwidth of |$128\, \mathrm{MHz}$|, both without (top) and with (bottom) interference removal using spectral kurtosis. The observation is a coherent summation of signals from the Jodrell Bank, Effelsberg, Nançay, and Westerbork radio telescopes, with spectral kurtosis applied to Nançay using M = 1000, N = 2, and η = 3, and rejected data replaced by artificial Gaussian noise. This high-resolution zapping makes the observation usable while sacrificing only a small fraction of data. Reproduced from Bassa et al. (2016, fig. 6).
Fig. 5 shows the improvement in the pulse profile shape of PSR J1022+1001 achieved by zapping a signal from the Nançay radio telescope using the same parameters and resolutions as those above, while Fig. 6 reveals the persistent broadband interference that was removed. This 30-min LEAP observation took place on 2013 July 27, used four telescopes and covered a frequency range of 1364–|$1460\, \mathrm{MHz}$|. Following the application of spectral kurtosis, the psrchive tool pat was used to align the LEAP profile with a template profile of high S/N using the Fourier phase gradient between them (Taylor 1992). This gave an estimated uncertainty of only |$0.25\, \mathrm{\mu \mathrm{ s}}$| in the pulse arrival time associated with the zapped tied-array profile, compared to |$1.35\, \mathrm{\mu \mathrm{ s}}$| for the non-zapped Nançay profile alone.

The pulse profile of PSR J1022+1001, folded over 30 min of a LEAP observation with a bandwidth of |$96\, \mathrm{MHz}$|, both without (top) and with (bottom) interference removal using spectral kurtosis. The top panel shows the contribution from the Nançay radio telescope only, while the bottom panel shows the coherent summation of signals from the Jodrell Bank, Effelsberg, Nançay, and Westerbork radio telescopes, with spectral kurtosis applied to Nançay using M = 1000, N = 2, and η = 3, and rejected data replaced by artificial Gaussian noise. Zapping restores the profile shape and flattens the power baseline, allowing the pulse arrival time to be measured with a high estimated accuracy of |$0.25\, \mathrm{\mu \mathrm{ s}}$|. This compares to |$1.35\, \mathrm{\mu \mathrm{ s}}$| for Nançay alone and without zapping.

The dynamic spectrum of the Nançay observation shown in Fig. 5, with brightness representing power, before interference removal. Spectral kurtosis is applied before the pulsar observation is folded, and can zap the persistent broadband interference seen here.
A further LEAP observation of PSR J1022+1001 demonstrates that spectral kurtosis recovers the unique profile shape that is critical to high-precision pulsar timing. Fig. 7 shows the profile of PSR J1022+1001 produced from a 60-min observation made on 2021 May 15 by the Nançay radio telescope over a frequency range of 1332–|$1460\, \mathrm{MHz}$|, before and after zapping using the same parameters and resolutions as those above. Fig. 8 shows that the interference in the observation was persistent and broadband, and zapping of the folded observation using the psrchive tool paz had little effect. Spectral kurtosis zapped around 0.6 per cent of the data, and reduced the estimated timing uncertainty of the Nançay profile from 1.22 to |$0.70\, \mathrm{\mu \mathrm{ s}}$| according to the pat tool. Fig. 9 shows the residual profile produced by subtracting this Nançay observation from the four-telescope LEAP observation shown in Fig. 5, again using pat, with the lack of residual structure indicating that spectral kurtosis did not alter the shape of the pulse profile.

The pulse profile of PSR J1022+1001, folded over 60 min of a LEAP observation with a bandwidth of |$128\, \mathrm{MHz}$|, both without (top) and with (bottom) interference removal using spectral kurtosis. Both panels show the contribution from the Nançay radio telescope only, with spectral kurtosis applied using M = 1000, N = 2, and η = 3, and rejected data replaced by artificial Gaussian noise. While zapping of the folded observation is ineffective at removing this interference, spectral kurtosis improves the estimated accuracy of the pulse arrival time measured at Nançay from 1.22 to |$0.70\, \mathrm{\mu \mathrm{ s}}$|.

The dynamic spectrum of the Nançay observation shown in Fig. 7, with brightness representing power, before interference removal. The persistent broadband interference seen here cannot be separated from the pulsar signal using a folded observation, but can be zapped by spectral kurtosis using fine time and frequency resolutions simultaneously.

The pulse profile of PSR J1022+1001, with a residual profile (top) showing the difference between the four-telescope LEAP observation from Fig. 5 (middle) and the Nançay observation from Fig. 7 (bottom) after interference removal using spectral kurtosis. The Nançay observation has been reduced to the same frequency range as the earlier LEAP observation, and its profile has been scaled and shifted when producing the residual profile. Interference removal from the Nançay telescope does not appear to have changed the pulse profile shape, judging by the lack of structure in the residuals between these two observations that were made 8-yr apart.
6 CONCLUSIONS
This paper provides a recipe for the implementation of the spectral kurtosis method from start to finish, allowing signal interference to be zapped from real or complex time series data stored as either amplitudes or power (Section 2). The frequentist nature of spectral kurtosis makes it effective without prior knowledge of the interference that will be encountered, so it is widely applicable rather than being ideal in specific situations (Section 3). We have shown its success in enabling an accurate radio-frequency array observation of a pulsar in the presence of interference local to one telescope, allowing signals from multiple widely spaced telescopes to be combined with only a very small loss of usable information and without any apparent detriment to the shape of the pulse profile (Section 5). The preservation of the unique profile signature of each pulsar is crucial for precise timing of its rotation, and the timing information from the cleaned observations is being used in a long-term project to detect gravitational waves.
When zapping data that contain a rapidly varying signal such as pulsar emission, it is important that the estimator does not recognize the signal amplitudes as non-Gaussian, as the spectral kurtosis procedure would then remove the information of interest. Observers should therefore ensure that the time and frequency resolutions of an observation are too fine to allow single pulses to be detected (Section 3). In order to maintain a Gaussian noise distribution, the time and frequency resolutions of zapping should be much less than the duration and bandwidth of an observation or folded pulse profile, so that there are many independent opportunities for zapping within the observation (Section 4).
The quality of an observation made using an array of telescopes in independent interference environments is improved by compensating for zapped data, regardless of the zapping technique used, and the methods of compensation are applicable to any widely spaced array (Section 4). The highest SNR is usually obtained by mean equalization: equalizing the mean of the summed power so that its baseline level remains constant over time. Mean equalization is the most appropriate method for an incoherently summed signal. However, if the signal is coherently summed and its amplitudes are stored so that its time and frequency resolutions can be adjusted later, it is better to apply variance equalization by scaling: equalizing the variance of the summed amplitudes over time, which also results in a constant power baseline and avoids unwanted artefacts if the amplitudes are transformed to different time and frequency resolutions. The SNR after variance equalization by scaling is usually slightly less than the SNR after mean equalization, but the difference is small when there is either a small-to-moderate amount of zapping or a large number of telescopes. Variance equalization by scaling is the most appropriate method when the signal may be summed coherently or incoherently in different parts of an observation. The alternative method of variance equalization by addition of artificial noise may be needed to allow signals from multiple telescopes to be synchronized, and can then be replaced with another method to improve SNR.
ACKNOWLEDGEMENTS
The European Pulsar Timing Array (EPTA) is a collaboration of European institutes working towards the direct detection of low-frequency gravitational waves, for which it has implemented the Large European Array for Pulsars (LEAP). The authors acknowledge the support of colleagues in the EPTA, and MP expresses gratitude for the patience of co-authors and family while the writing of this paper was completed. The work reported here has been funded by the European Research Council Advanced Grant ‘LEAP’, grant agreement ID 227947 (Principal Investigator: M. Kramer). KL and MK are supported by the European Research Council Synergy Grant ‘BlackHoleCam’, grant agreement ID 610058 (Principal Investigator: M. Kramer). KJL acknowledges support from the National Basic Research Program of China, 973 Program, 2015CB857101 and NSFC 11373011. The Effelsberg Radio Telescope is operated by the Max-Planck-Institut für Radioastronomie in Germany. The Westerbork Synthesis Radio Telescope is operated by the Netherlands Institute for Radio Astronomy (ASTRON) with support from the Netherlands Organisation for Scientific Research (NWO). The Nançay Radio Observatory is operated by the Paris Observatory and associated with the Centre National de la Recherche Scientifique in France. Pulsar research at the Jodrell Bank Centre for Astrophysics, and the observations using the Lovell Telescope, are supported by a consolidated grant from the Science and Technology Facilities Council (STFC) in the UK. The Sardinia Radio Telescope is operated by the Istituto Nazionale di Astrofisica (INAF) in Italy, and was undergoing its astronomical validation phase when the observations used in this paper were made.
DATA AVAILABILITY
The data and software programmes underlying this article will be shared on a reasonable request to the corresponding author.