ABSTRACT

The fast folding algorithm (FFA) is a phase-coherent search technique for periodic signals. It has rarely been used in radio pulsar searches, having been historically supplanted by the less computationally expensive fast Fourier transform (FFT) with incoherent harmonic summing (IHS). Here, we derive from first principles that an FFA search closely approaches the theoretical optimum sensitivity to all periodic signals; it is analytically shown to be significantly more sensitive than the standard FFT+IHS method, regardless of pulse period and duty cycle. A portion of the pulsar phase space has thus been systematically underexplored for decades; pulsar surveys aiming to fully sample the pulsar population should include an FFA search as part of their data analysis. We have developed an FFA software package, riptide, fast enough to process radio observations on a large scale; riptide has already discovered sources undetectable using existing FFT+IHS implementations. Our sensitivity comparison between search techniques also shows that a more realistic radiometer equation is needed, which includes an additional term: the search efficiency. We derive the theoretical efficiencies of both the FFA and the FFT+IHS methods and discuss how excluding this term has consequences for pulsar population synthesis studies.

1 INTRODUCTION

The standard search procedure for periodic pulsar signals relies on taking the fast Fourier transform (FFT) of noisy input data and forming the fluctuation power spectrum (see e.g. Lorimer & Kramer 2004, for an overview). A sufficiently bright periodic signal manifests itself in the power spectrum as a discrete set of harmonics: peaks located at integer multiples of the signal’s fundamental frequency. The statistical significance of such a signal is evaluated by the sum of its normalized harmonic powers, a practice aptly named incoherent harmonic summing (IHS; Huguenin and Taylor, as described by Burns & Clark 1969). For each significant candidate thus identified, the original data are then phase-coherently folded modulo the candidate period to produce an integrated pulse profile and other diagnostic information that are then evaluated by a combination of visual inspection and, in more recent years, machine-learning algorithms (e.g. Lyon et al. 2016, for a review). The FFT has long dominated the landscape of pulsar searching, being the only periodicity search technique employed in nearly all major pulsar surveys of the past three decades (e.g. Johnston et al. 1992; Manchester et al. 2001; Cordes et al. 2006; Keith et al. 2010; Stovall et al. 2014).

However, an alternative method consists of directly folding the data at a wide range of closely spaced trial periods; the fast folding algorithm (FFA; Staelin 1969) does so efficiently by taking advantage of redundant operations. One integrated profile is produced for each trial period, which can then be evaluated for the presence of a pulse. Although as old as pulsar astronomy itself, for many years the FFA was rarely applied in practice and only for small-scale purposes. Examples include targeted searches for radio emission from X-ray emitting neutron stars (Crawford et al. 2009; Kondratiev et al. 2009), or attempts to find an underlying periodicity in the emission of the repeating fast radio burst FRB 121102 (Scholz et al. 2016). Outside of pulsar astronomy, we may note that the FFA has been adapted to detect transiting exoplanets in Kepler photometry data (Petigura, Marcy & Howard 2013) and used in searches for extraterrestrial intelligence (SETI@home, e.g. Korpela et al. 2001). When it comes to large-scale FFA searches for pulsars, Faulkner et al. (2004) suggest that one was attempted for some time on the Parkes Multibeam Pulsar Survey (Manchester et al. 2001), crediting it with the discovery of a pulsar with a 7.7 s period, but there have been no subsequent publications on the matter. Interest in further developing the FFA has nevertheless been growing in recent years, with the practical sensitivity analysis of Cameron et al. (2017) and the implementation of an FFA search pipeline on the PALFA survey by Parent et al. (2018), which has found a new pulsar missed by the standard FFT pipeline running on the same data.

So far, the FFA has been used mostly in niche cases because of its high computational cost (Lorimer & Kramer 2004; Cameron et al. 2017). The cost of an FFA search grows unreasonably large if one wants to probe shorter (P ≲ 500 ms) periods, and limited effort has been invested so far in making truly fast FFA implementations. On the other hand, highly optimized FFT libraries are widely available, upon which fast pulsar searching codes can be built. The FFT can also be efficiently extended to search for binary millisecond pulsars (e.g. Ransom, Eikenberry & Middleditch 2002). These computational advantages, however, must be weighed against serious sensitivity concerns.

Only one theoretical comparison between the sensitivity of the FFA and the standard FFT procedure has been published before (Kondratiev et al. 2009); its conclusion was that the FFA is better at detecting pulsars with both long periods (P ≳ 2 s) and short duty cycles (δ ≲ 1 per cent). More recently, a thorough analysis of the sensitivity of the PALFA pulsar survey (Lazarus et al. 2015) revealed that FFT searches do not reach their expected theoretical sensitivity to P ≥ 100 ms pulsars on real-world data, an effect that worsens for longer periods of a few seconds or more. The presence of low-frequency ‘red’ noise was found to be the main cause, and a suggested counter-measure was to use the FFA to search the long-period regime. Cameron et al. (2017) focused on the problem of assessing the output of the FFA for the presence of a pulsar signal; they empirically tested the behaviour of several pulse profile evaluation algorithms on a sample of real long-period pulsar observations. Methods based on matched filtering were found to perform better, and the FFA was shown to exceed the sensitivity of the FFT in a number of test cases. Parent et al. (2018) did a similar analysis on artificial data, and reported that on real PALFA survey data, their FFA implementation was on average more sensitive to pulsars with periods longer than a second (see their fig. 6) than the FFT-based presto pipeline (Ransom et al. 2002).

Although empirical sensitivity comparisons between search methods are important, it is not clear whether the differences in sensitivity thus measured between FFA and FFT search codes demonstrate an intrinsic superiority of either technique in a specific period or duty cycle regime, or if they are indirectly caused by technical implementation details. Particular concerns include how the statistical significance of signals is assessed, or the method used in FFT pipelines to mitigate the effects of low-frequency noise in Fourier spectra, which can have dramatic period-dependent effects (van Heerden, Karastergiou & Roberts 2017). To fully understand and then realize the true potential of the FFA, the intrinsic sensitivity of search methods must be compared analytically.

In this paper, we mathematically demonstrate that an FFA search procedure is more sensitive than its FFT counterpart to all periodic signals. In Section 2, we explain the FFA core operation, and propose a more computationally efficient variant of the original FFA algorithm. In Section 3, we formulate the general problem of detecting a signal in noisy data as a problem of statistical hypothesis testing; from there, we derive the optimally sensitive test statistic for the presence of a periodic signal in uncorrelated Gaussian noise, and show that calculating it amounts to running the FFA and convolving its output folded profiles with a matched filter. In Section 4, we use the same hypothesis-testing framework to calculate the sensitivity of the standard FFT+IHS method, and demonstrate that it is always outperformed by the FFA, more so for smaller pulse duty cycles; pulse period is shown to be irrelevant in this comparison. In Section 5, we describe the details of our own FFA implementation riptide1 and how we propose to surmount the practical hurdles of realizing the full potential of the FFA on real data contaminated by low-frequency noise and interference. Our implementation is sufficiently fast to be employed on a modern all-sky survey. We discuss the implications of our results in Section 6. In particular, we caution that the radiometer equation of Dewey et al. (1985), if used to compute the minimum detectable flux density of a search, implicitly assumes that the method employed is phase-coherent and that the pulse shape is known a priori; we provide correction factors that more realistically account for the sensitivities of both the FFA and the incoherent FFT-based standard method. We then conclude in Section 7.

2 THE FAST FOLDING ALGORITHM

The FFA is an efficient method to phase-coherently fold an evenly sampled time series at multiple closely spaced trial periods at once, taking advantage of the many redundant calculations in the process. It was originally devised to search for periodic signals in one of the very first pulsar surveys (Staelin 1969). In this section, we explain the principle of the FFA and describe a variant of the algorithm that produces identical results to the original, but does not require any input padding and uses modern CPU caches more efficiently.

2.1 The folding transform

If we consider an evenly sampled time series containing m cycles of a periodic pulsed signal with an integer period p (expressed in number of samples), the data may be represented as a two-dimensional array with m rows and p columns; each row represents an individual cycle, arranged in chronological order from top to bottom, and all pulses appear vertically aligned in phase. In this case, phase-coherently folding the signal is achieved by summing the array across all rows. However, if the period of the signal is increased to a larger, non-integer value p + Δp with 0 < Δp < 1, its consecutive pulses will appear to drift in phase towards higher values of pulse phase (i.e. the right-hand side), as illustrated in the example shown in the left-hand panel of Fig. 1. To phase-coherently integrate the signal, we now need to apply circular rotations to each pulse in order to compensate for the observed phase drift, which is a linear function of time; in other words, we must integrate the rows of the array along a diagonal path, whose slope can be defined by a single parameter s: the number of phase bins drifted towards the right between the first and last rows. Note that the integration path may wrap around in phase at the right edge of the array, possibly multiple times for higher values of s. The path s = 0 corresponds to a folding period of p, while s = m corresponds to a folding period of approximately p + 1. The exact relationship between s and folding period is derived in Section 2.3. We call the folding transform of the data the set of integrated pulse profiles obtained for all trial values of s with 0 ≤ s < m, which presents itself as an array of the same dimensions as the input data (right-hand panel of Fig. 1); column index represents phase, and each row corresponds to a trial folding period between p and approximately p + 1. Since the input data are discrete in time, integration paths with non-integer values of s are identical to the path with the closest integer value, which means that the folding transform provides the highest possible period resolution for the given sampling rate. A brute-force calculation of the folding transform would take $\mathcal{O}(pm^2)$ additions, but redundancies between consecutive period trials can be exploited.
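To make the folding transform concrete, it can be computed naively as follows. This is a deliberately brute-force NumPy sketch of the $\mathcal{O}(pm^2)$ calculation described above (the function name and structure are ours, for illustration only):

```python
import numpy as np

def folding_transform_bruteforce(x, p):
    # Reshape the time series into m rows of p samples; the length of x
    # is assumed to be a multiple of p.
    data = np.asarray(x, dtype=float).reshape(-1, p)
    m = data.shape[0]
    out = np.zeros((m, p))
    for s in range(m):       # trial drift: s bins between first and last rows
        for i in range(m):
            # The linear drift rate is s / (m - 1) bins per row; undo the
            # accumulated drift of row i with a circular left rotation.
            shift = int(round(i * s / (m - 1))) if m > 1 else 0
            out[s] += np.roll(data[i], -shift)
    return out
```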

Figure 1. An illustration of the folding transform. Left-hand panel: input data, arranged in m = 8 rows of p = 6 samples, containing an artificial train of pulses with a width of one time sample, an initial phase of ϕ = 1 sample and a period larger than p, such that the pulses appear to drift in phase by s = 4 bins over the total observation time. Right-hand panel: folding transform of the input data, where darker colours denote larger integrated intensity. In a blind search, a visible peak in the folding transform denotes a candidate periodic signal; the row and column indices of the peak correspond to its period and initial phase, respectively.

2.2 A depth-first traversal, arbitrary length FFA

The FFA can be seen as a divide-and-conquer algorithm, where the idea is to cheaply compute the folding transform of a data block from the folding transforms of its two halves; FFT algorithms are based on the same principle (Cooley & Tukey 1965). Suppose that we have split the input data, in two-dimensional form as in the previous section, into two arbitrarily sized blocks that we call the head and the tail, with $m_h$ and $m_t$ rows, respectively, such that $m = m_h + m_t$. Let us consider an integration path that shifts to the right by s phase bins between the mid-points in time of the first and last rows: this amounts to m − 1 rows’ worth of time-span, and the drift rate of the path is thus s/(m − 1) phase bins per row. Within the head and tail parts of the data, this integration path will therefore drift by i and j phase bins, respectively, whose expressions are
$i = \left\lfloor \frac{m_h - 1}{m - 1}\, s \right\rceil, \qquad j = \left\lfloor \frac{m_t - 1}{m - 1}\, s \right\rceil$  (1)
where $\lfloor\,\rceil$ denotes rounding to the nearest integer. This is illustrated with an example in the left-hand panel of Fig. 2. We further need to take into account a possible 1-bin phase jump at the boundary between the head and the tail, which we call b:
$b = s - (i + j)$  (2)
Therefore, if H, T, and F denote the folding transforms of the head, tail, and whole input, respectively, then we can obtain the s-th row $F_s$ of F by adding together pairs of rows from H and T with the following rule, illustrated in the middle and right-hand panels of Fig. 2:
$F_s = H_i + \left(T_j \ll (i + b)\right)$  (3)
where the notation $\ll k$ denotes the element-wise circular rotation of an array by k elements towards the left; rotating $T_j$ by $i + b$ bins towards the left takes care of phase-connecting the two integration segments. From there a divide-and-conquer algorithm naturally emerges, where the input data are recursively split in two similarly sized blocks, thereby creating a binary tree of folding transforms where the input sizes are (approximately) halved at every recursion level. A leaf node corresponds to an input block with a single row, in which case its folding transform is trivially equal to itself. Our proposed variant of the FFA performs a depth-first, post-order traversal of the tree; the left sub-tree is visited first, resulting in the calculation of H, followed by the traversal of the right sub-tree, resulting in the calculation of T, and then the two are combined into F using equation (3). The original variant of the FFA described by Staelin (1969) instead performs a breadth-first traversal from the bottom level of the tree: first, the input is partitioned in two-row blocks and their folding transforms are calculated; from there, the folding transforms of all four-row blocks are built from the results of the previous step, and so on until the transform of the whole input data has been obtained.
Figure 2. Depth-first variant of the FFA. The principle of the algorithm is to split the input data (left-hand panel) into two parts, outlined in blue and red, respectively, and independently calculate their folding transforms (middle panel); by adding together pairs of rows from each of these intermediate outputs with appropriate element-wise circular shifts, the folding transform of the whole input data (right-hand panel) can be obtained. It is shown in this specific example how the fourth row of the output, corresponding to an integration path with a total shift of s = 3 bins across the input data, is calculated. See text for a full explanation. This divide-and-conquer strategy is applied recursively, thus avoiding the redundant additions that would be performed in a brute-force calculation of the folding transform.

The depth-first traversal FFA presented here is more efficient than the original breadth-first variant for two reasons. First, the depth-first variant can directly handle inputs with an arbitrary, non-power-of-two number of rows m, removing the need to zero-pad the input data. This not only saves operations in the folding transform calculation itself, but also avoids creating additional folded profiles to be evaluated for the presence of a pulse. We note that the input data must still be presented in rectangular form, with a total number of samples that is a multiple of p. If the last input row is incomplete, it must be either ignored or padded to a full p samples; we have chosen the former option in our implementation because it is more computationally efficient. Secondly, a depth-first traversal of the tree has better memory locality and is better suited to the hierarchical cache structure of modern CPUs: as the traversal proceeds towards the deeper layers of the tree, the calculations associated with a node and all of its sub-tree can fit entirely in cache once a certain depth is reached, regardless of the size of the full input. In contrast, the original breadth-first variant has to initially partition the input into two-row blocks and compute their respective transforms; for large inputs that cannot fit in the CPU cache, this requires two comparatively slow transactions with the main memory (RAM): one to read the entire input, another to write an equal-sized intermediate output with all two-row blocks transformed. All these operations have to be done before computing the next intermediate output with four-row blocks transformed. Building the final result in this order always requires a total of $2\log_2(m)$ RAM transactions, which negatively affects execution times.
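For reference, the depth-first recursion can be transcribed almost directly into NumPy. The sketch below is illustrative only and far from riptide's optimized C implementation; it assumes the definitions of equations (1)-(3), with Python's round() standing in for the nearest-integer operator $\lfloor\,\rceil$:

```python
import numpy as np

def folding_transform(data):
    # data: (m, p) array, one pulse period per row.
    m = data.shape[0]
    if m == 1:
        return data            # leaf node: the transform of one row is itself
    mh = m // 2                # head and tail sizes (any split works)
    mt = m - mh
    H = folding_transform(data[:mh])
    T = folding_transform(data[mh:])
    F = np.empty_like(data)
    for s in range(m):
        i = round(s * (mh - 1) / (m - 1))   # head-internal shift, equation (1)
        j = round(s * (mt - 1) / (m - 1))   # tail-internal shift, equation (1)
        b = s - (i + j)                     # boundary phase jump, equation (2)
        # Phase-connect tail to head, equation (3): rotate T_j left by i + b.
        F[s] = H[i] + np.roll(T[j], -(i + b))
    return F
```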

2.3 Spacing of trial periods

The statement in Staelin (1969) that the trial periods of the FFA are spaced linearly is true only as a first-order approximation. Let τ denote the sampling time, $P_0 = p\tau$ the chosen base folding period and $\nu_0 = 1/P_0$ the corresponding base folding frequency. Between the mid-points in time of the first and last rows of the input array, the total time elapsed is $T = (m - 1)p\tau$. If the data contain a signal with a different frequency $\nu_0 + \Delta\nu$, then between the first and last rows, its train of pulses will drift in phase by some number of bins s to the right. The relationship between s and Δν is obtained by writing the phase difference of the pulse train (with frequency ν) relative to the folding frequency $\nu_0$. This phase difference, expressed in number of folding periods, is given by $\Delta\phi(t) = (\nu - \nu_0)t$. At t = T, the pulse train is late in phase by s time samples with respect to the fiducial point, which means that $\Delta\phi(T) = -s/p$ and thus
$\Delta\nu = -\frac{s}{pT} = -\frac{s}{(m-1)\,p^2\tau}$  (4)
The output profiles produced by the FFA are therefore spaced linearly in frequency rather than in period. The s-th row in the folding transform thus corresponds to a trial frequency of
$\nu_s = \nu_0 - \frac{s}{(m-1)\,p^2\tau}$  (5)
and to a trial period of
$P_s = \frac{1}{\nu_s} = p\tau \left(1 - \frac{s}{(m-1)\,p}\right)^{-1}$  (6)
It is worth noting that the last period trial $P_{m-1}$ is strictly larger than (p + 1)τ. In a practical search, where a sequence of folding transforms with increasing base periods is calculated, this means that their last few rows may have trial periods larger than the base period of the next folding transform. These redundant trials are ignored in our search code implementation.
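Equations (5) and (6) are cheap to tabulate in code; a minimal helper of our own, for illustration (assuming m > 1):

```python
def trial_periods(p, m, tau):
    """Trial periods of the folding transform with base period p samples,
    m pulse periods and sampling time tau (equations 5 and 6)."""
    nu0 = 1.0 / (p * tau)                        # base folding frequency
    return [1.0 / (nu0 - s / ((m - 1) * p * p * tau)) for s in range(m)]
```

One can check that the last element returned exceeds (p + 1)τ, as noted above.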

3 OPTIMALLY SEARCHING FOR PERIODIC SIGNALS AND SENSITIVITY OF THE FFA

In this section, we examine the general problem of testing for the presence of a signal in discretely sampled noisy data, which is a matter of statistical hypothesis testing. After introducing all necessary technical terms, we find from first principles the optimally sensitive test statistic for detecting a signal with known parameters. For periodic signals, it emerges that computing this test statistic is equivalent to phase-coherently folding the data and correlating the output with a matched filter that reproduces the known pulse shape. In a practical search, the signal parameters are not known a priori; in this case a sensitivity penalty is incurred, which we quantify.

3.1 Problem formulation

In ideal observing conditions, the evenly sampled time series data to search can be represented by an N-dimensional vector |$\mathbf {x}$| resulting from the sum of background noise |$\mathbf {w}$| and possibly an astrophysical signal |$\mathbf {s}$|⁠, which can be written as
$\mathbf{x} = a\,\mathbf{s} + \mathbf{w}$  (7)
The samples of $\mathbf{w}$ are assumed to be mutually independent and drawn from the same normal distribution with known mean m and standard deviation σ; we will consider the case m = 0 and σ = 1 without loss of generality, as the data can be normalized beforehand with the transformation $\mathbf{x} \mapsto (\mathbf{x} - m)/\sigma$. The amplitude of the signal is denoted by the unknown parameter $a \ge 0$, noting that we impose a unit square sum (or unit L2-norm) normalization on $\mathbf{s}$:
$\mathbf{s} \cdot \mathbf{s} = \sum_i s_i^2 = 1$  (8)
where the dot symbol denotes the dot product. Why the choice of unit square sum normalization is natural here will become apparent later. We will assume, until stated otherwise, that the shape of the astrophysical signal is known in advance, i.e. we are searching for a well-defined signal shape (the values of all $s_i$ are known) with unknown amplitude. Establishing whether or not the signal $\mathbf{s}$ is present in $\mathbf{x}$ means deciding between the mutually exclusive statistical hypotheses below:
  • H0: the data |$\mathbf {x}$| contain no signal, a = 0.

  • H1: the data |$\mathbf {x}$| contain the signal |$\mathbf {s}$|⁠, a > 0.

To that end, we need a mathematical function |$T(\mathbf {x})$|⁠, a test statistic, that reduces the data |$\mathbf {x}$| to a single summary value from which one can sensibly decide between hypotheses. For any test statistic, one must also define a so-called critical region, which is the set of values of |$T(\mathbf {x})$| for which the null hypothesis is rejected. In most practical applications, as will be the case here, the critical region will simply be defined by a threshold above which (or below which) one decides in favour of H1. Here, we want to find a test statistic (and critical threshold) that maximizes the sensitivity of our search for the presence of |$\mathbf {s}$|⁠, a concept that first needs to be rigorously defined.

3.2 Evaluating a test statistic

When choosing between hypotheses, two types of errors can be made. A type I error is a rejection of the null hypothesis H0 when it is true, which in pulsar astronomy parlance is a false alarm or false positive. A type II error is a rejection of the alternative hypothesis H1 when it is true − here, quite simply a missed pulsar. Unless the decision problem is trivial, there is no perfectly accurate test, so that there exists a trade-off between the two types of error rates; when evaluating tests, the standard practice is therefore to first set an acceptable false positive rate, or significance level noted α, and identify which test exhibits the lowest type II error rate while maintaining a type I error probability smaller than α (e.g. Casella & Berger 2002). For our specific purposes, the sensitivity of a test statistic T to a given signal shape |$\mathbf {s}$| can be rigorously evaluated by following the procedure below:

  • Set the significance level α. In a pulsar search, the choice of α is usually made so that no more than one false alarm per observation occurs on average, assuming ideal data without interference signals (e.g. Lorimer & Kramer 2004).

  • Calculate the critical threshold η of the test statistic T associated with the chosen false alarm rate α. η is defined by
    $P\left(T(\mathbf{x}) \ge \eta \mid H_0\right) = \alpha$  (9)
    where the notation P(E|H) denotes the probability of event E given hypothesis H is true.
  • Assuming the alternative hypothesis H1 is true, compute the probability γ that T exceeds the critical value:
    $\gamma = P\left(T(\mathbf{x}) \ge \eta \mid H_1\right)$  (10)
    γ is called the statistical power of the test T, or completeness of the search, and it is a function of the signal amplitude a.
Note that we have implicitly assumed that T is a monotonically increasing function of a. If instead T is a monotonically decreasing function of a, the inequalities in (9) and (10) must be reversed. One last useful concept is the upper limit $\mathcal{U}$, which is the minimum signal amplitude that is detectable with at least some probability $\gamma_\mathrm{min}$, following the rigorous definition of Kashyap et al. (2010). It is a characteristic of any search procedure and is defined by
$\mathcal{U} = \min \left\{ a : \gamma(a) \ge \gamma_\mathrm{min} \right\}$  (11)
For the sake of simplicity we will set $\gamma_\mathrm{min} = 0.5$ throughout the paper. This now provides us with a rigorous definition of sensitivity: we can conveniently compare two test statistics by their respective upper limits at some fixed significance level α.

3.3 Constructing the most powerful test: the Z-statistic

One interesting and widely applied test is the so-called likelihood-ratio test (LRT) statistic Λ, whose most general definition is (e.g. Casella & Berger 2002)
$\Lambda(\mathbf{x}) = \frac{\sup_{\theta \in \Theta_0} \mathcal{L}(\theta \mid \mathbf{x})}{\sup_{\theta \in \Theta} \mathcal{L}(\theta \mid \mathbf{x})}$  (12)
where θ is a vector of model parameters, $\mathcal{L}(\theta \mid \mathbf{x})$ the likelihood function, Θ the entire parameter space, Θ0 the subset of that space spanned by the null hypothesis H0, and sup denotes the supremum. The LRT takes values between zero and one and essentially measures the goodness-of-fit of the null hypothesis. The main reason to investigate the LRT is that under some specific conditions it has optimal statistical power; furthermore, writing out its expression, if possible, may also suggest another equally powerful test with a simpler expression. For our particular problem, we have a single parameter a which only takes positive values, and the natural logarithm of $\Lambda(\mathbf{x})$ reduces to the simple form
$\ln \Lambda(\mathbf{x}) = -\frac{1}{2} \max\left(\mathbf{x} \cdot \mathbf{s},\ 0\right)^2$  (13)
which is derived in the appendix (Section A1). The above expression depends on the data only through what we will call the Z-statistic
$Z(\mathbf{x}) = \mathbf{x} \cdot \mathbf{s}$  (14)
with the intuition that it will retain all desirable properties of the LRT. The Z-statistic is in fact the maximum-likelihood estimator for the parameter a (see Section A1) and it follows a normal distribution with mean a and unit variance, as can be seen by expanding the above:
$Z(\mathbf{x}) = (a\,\mathbf{s} + \mathbf{w}) \cdot \mathbf{s} = a + \sum_i s_i w_i$  (15)
By our initial assumption, the $w_i$ are independent variables that follow a standard normal distribution, that is with zero mean and unit variance, which is compactly written as $w_i \sim \mathcal{N}(0,1)$. In the above equation, the sum of the $w_i$ weighted by the coefficients $s_i$ is also a normal random variable, whose mean is zero (by linearity of expectation) and whose variance is equal to the sum of the $s_i^2$, which is 1 by definition (equation 8). Therefore, $Z(\mathbf{x}) \sim \mathcal{N}(a, 1)$.
From there we may find the critical value of the Z-statistic for a given significance level α by applying equation (9). Under the null hypothesis, we have a = 0 and thus we can write $P(Z(\mathbf{x}) \ge u \mid H_0) = \overline{\Phi}(u)$ where $\overline{\Phi}$ is the survival function of the standard normal distribution $\mathcal{N}(0,1)$. The critical value for the Z-statistic, $\eta_z$, at significance level α is the solution to the equation $\overline{\Phi}(u) = \alpha$, and its expression is thus given by
$\eta_z(\alpha) = \overline{\Phi}^{-1}(\alpha)$  (16)
where $\overline{\Phi}^{-1}$ denotes the inverse survival function of the standard normal distribution. $\eta_z$ therefore represents the number of Gaussian sigmas associated with the probability α. By application of the Karlin–Rubin theorem, the test defined as follows:
  • Choose H1 if |$Z(\mathbf {x}) \ge \eta _z(\alpha)$|⁠.

  • Choose H0 otherwise.

is the most powerful test for the presence of the signal $\mathbf{s}$ regardless of the chosen significance level α. A demonstration is provided in the appendix (Section A2). Its statistical power as a function of true signal amplitude a can be found directly from equation (10) by remarking that, under the alternative hypothesis H1, $Z(\mathbf{x}) - a$ follows a standard normal distribution:
$\gamma_z(a) = \overline{\Phi}\left(\eta_z - a\right) = \Phi\left(a - \eta_z\right)$  (17)
where Φ denotes the cumulative distribution function of the standard normal distribution and where the property $\overline{\Phi}(u) = \Phi(-u)$ has been used. We can remark that for $a = \eta_z$, $\gamma_z = \Phi(0) = 0.5$, and therefore from equation (11) that the upper limit of the Z-test is simply
$\mathcal{U}_z = \eta_z(\alpha) = \overline{\Phi}^{-1}(\alpha)$  (18)
which is thus the lowest achievable at significance level α among all statistical tests. Since the Z-statistic is the best estimator of the signal amplitude relative to the background noise standard deviation (here set to 1 without loss of generality), we will interchangeably call it signal-to-noise ratio (SNR or S/N) when later discussing its practical software implementation.
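These quantities are straightforward to evaluate numerically. A minimal sketch with scipy, assuming the 7σ threshold used later in this paper:

```python
from scipy.stats import norm

alpha = norm.sf(7.0)     # significance level of a 7-sigma detection threshold
eta_z = norm.isf(alpha)  # critical value, equation (16); here eta_z = 7
power = lambda a: norm.cdf(a - eta_z)   # statistical power, equation (17)
print(power(eta_z))      # 0.5: the upper limit U_z equals eta_z, equation (18)
```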

3.4 Case of periodic signals

If we further assume that the signal |$\mathbf {s}$| is periodic with a known integer period of p samples, then |$\mathbf {s}$| is a sequence of identical segments |$\mathbf {t}$| that we will call the pulse template. In this case, it is equivalent to calculate the Z-statistic by phase-coherently folding the data |$\mathbf {x}$| at a period p into a so-called folded profile |$\mathbf {f}$|⁠, and then by taking the dot product of the folded output with the pulse template |$\mathbf {t}$|⁠. Assuming that the data contain exactly m pulse periods, the elements of |$\mathbf {f}$| are given by
$f_j = \sum_{i=0}^{m-1} x_{pi+j}$  (19)
where 0 ≤ i < m represents the pulse index and 0 ≤ j < p the phase bin index. Since $\mathbf{s}$ is periodic, the elements of the pulse template are such that $t_j = s_{pi+j}$ for any i, j. Writing out the Z-statistic expression and rearranging, we get
$Z(\mathbf{x}) = \sum_{i=0}^{m-1} \sum_{j=0}^{p-1} x_{pi+j}\, t_j = \sum_{j=0}^{p-1} f_j\, t_j = \mathbf{f} \cdot \mathbf{t}$  (20)
which completes our demonstration. When searching for a periodic signal with a known period and pulse shape, the optimally sensitive procedure to detect its presence is based upon phase-coherently folding the input data into a profile and then correlating it with the known pulse template.
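In code, this fold-and-correlate procedure takes a few lines. A sketch with hypothetical helper names; note that the per-period template t inherits its normalization from the unit square sum of the full signal, i.e. $m \sum_j t_j^2 = 1$:

```python
import numpy as np

def z_statistic(x, t, p):
    # x: time series containing exactly m = len(x) // p pulse periods
    # t: pulse template (one period), normalized so that m * sum(t**2) == 1
    f = np.asarray(x).reshape(-1, p).sum(axis=0)  # folded profile, equation (19)
    return float(np.dot(f, t))                    # Z = f . t, equation (20)
```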

3.5 Searching for periodic signals with unknown parameters

Until now, we have considered the true signal $\mathbf{s}$ to be known in advance; this is not the case in a blind search, where one must make a priori assumptions about the properties of the signal. We may end up testing the input data, erroneously so, for the presence of a different trial signal $\mathbf{s}^{\prime}$, which means using a less powerful test statistic Z′:
$Z^{\prime}(\mathbf{x}) = \mathbf{x} \cdot \mathbf{s}^{\prime} = a\left(\mathbf{s} \cdot \mathbf{s}^{\prime}\right) + \mathbf{w} \cdot \mathbf{s}^{\prime}$  (21)
where |$\mathbf {s^{\prime }}$| is also normalized to unit square sum. The relevant term in the equation above is what we will call the search efficiency
$\mathcal{E} = \mathbf{s} \cdot \mathbf{s}^{\prime}$  (22)
which measures the similarity between the true and postulated signals, noting that |$|\mathcal {E}| \le 1$| and that |$\mathcal {E} = 1$| if and only if |$\mathbf {s^{\prime }} = \mathbf {s}$|⁠. Following the same steps as in Section 3.3, one can show that the degraded test statistic Z’ follows a normal distribution |$\mathcal {N}(\mathcal {E} a, 1)$|⁠; its critical value remains equal to ηz, but its statistical power is reduced to a value of
$\gamma_z^{\prime}(a) = \Phi\left(\mathcal{E}a - \eta_z\right)$  (23)
and its correspondingly higher upper limit |$\mathcal {U}_z^{\prime }$| is given by
$\mathcal{U}_z^{\prime} = \frac{\mathcal{U}_z}{\mathcal{E}} = \frac{\overline{\Phi}^{-1}(\alpha)}{\mathcal{E}}$  (24)
One therefore needs to search for a parametrized family of signal templates so that at least one is acceptably similar to the true signal. Periodic pulsar signals need to be described by at least three parameters:
  • Their period p.

  • Their pulse phase ϕ at the start of the observation.

  • Their duty cycle δ, defined as the ratio between pulse width and period.

This naturally raises the question of gridding this three-dimensional search space so that the efficiency $\mathcal{E}$ remains acceptably high. We also need to estimate $\mathcal{E}$ in a practical FFA search. The FFA conveniently provides the finest possible period coverage allowed by the time resolution of the data, by producing a sequence of folded profiles at all distinguishable trial periods. Using the result of Section 3.4, these profiles can be directly tested for the presence of various pulse templates, parametrized by a duty cycle δ and initial phase ϕ. However, the finest possible coverage of the phase dimension can be achieved using matched filtering; that is, performing circular convolution of the profiles with templates with different values of δ and an initial phase of zero. This only leaves the gridding of the duty cycle dimension for detailed consideration; since δ can span several orders of magnitude in known pulsars, an efficient strategy is to geometrically space the trial duty cycle values, where the minimum trial value corresponds to a width of one phase bin. To provide an estimate of $\mathcal{E}$ in a real search using this strategy, we need to postulate a shape both for the true signal and the templates (or matched filters). This investigation can be highly multidimensional; we will assume that the true pulse shape is Gaussian, which is a good approximation for the majority of pulsars (e.g. Rankin 1983), and that the templates are so-called boxcar or top-hat filters because they are very computationally efficient in practice (Section 5.4). Under these assumptions, and if the trial boxcar widths are geometrically spaced by a factor of 2, the average efficiency when searching for a Gaussian pulse with unknown duty cycle is
$\mathcal{E}_\mathrm{FFA} \approx 0.93$  (25)
which we derive in the appendix (Section A3). This can be taken as a reasonable approximation of the average efficiency of a practical FFA search, and shows that for single-peaked profiles boxcars provide quite acceptable sensitivity. For more exotic, multipeaked pulsar profiles, however, the above is an overestimate. We leave the problem of choosing the optimal set of matched filters with respect to the known pulsar population open for future investigation. From equation (24), the upper limit of a practical FFA search using boxcar filters at a significance level α can be approximated by
$\mathcal{U}_\mathrm{FFA} \approx \frac{\overline{\Phi}^{-1}(\alpha)}{\mathcal{E}_\mathrm{FFA}} \approx 1.075\; \overline{\Phi}^{-1}(\alpha)$  (26)
or in other words, 1.075 times the number of Gaussian sigmas associated with the probability α.

4 SENSITIVITY ANALYSIS OF THE STANDARD FFT PROCEDURE

After establishing the optimal, or near-optimal sensitivity of an FFA search procedure, we now need to quantify how the standard incoherent FFT-based search performs in comparison. Although the topic of the sensitivity of a sum of Fourier harmonics to periodic signals has been covered before (e.g. Groth 1975; Vaughan et al. 1994; Ransom et al. 2002; Kondratiev et al. 2009), we will do so again in a way that fits the hypothesis testing framework laid out in the previous section. This will allow a mathematically rigorous comparison of FFA and FFT upper limits.

4.1 The incoherent detection statistic

For the purposes of this section, we will use the so-called unitary normalization of the discrete Fourier transform (DFT); the DFT of a sequence |$\mathbf {x}$| with N elements is the set of coefficients |$\mathbf {X}$| whose elements are given by
$X_k = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} x_n\, \mathrm{e}^{-2\pi i k n / N}$  (27)
where 0 ≤ k < N and $i^2 = -1$. We will use uppercase letters to denote the complex-valued, unitary Fourier transforms of the vectors defined in equation (7). The unitary DFT conserves the complex dot product and therefore the L2-norm, such that in particular $\|\mathbf{S}\|_2 = \|\mathbf{s}\|_2 = 1$. Because the $w_i$ are mutually independent and follow a standard normal distribution, the real and imaginary parts of all complex elements of $\mathbf{W}$ are also mutually independent and normally distributed with zero mean and variance 1/2; this important property will be relevant below (a demonstration of it can be found in e.g. chapter 15 of Kay 1993). We will assume that $\mathbf{s}$ is a periodic signal whose fundamental frequency is $k_0/N$ where $k_0$ is a positive integer. In this case, all elements of $\mathbf{S}$ are zero, except for indices that are integer multiples of $k_0$, which contain all of the signal’s power (e.g. Ransom et al. 2002). The incoherent detection statistic I is defined as the sum of harmonic powers up to some rank h:
$I(a, h) = |X_0|^2 + 2\sum_{l=1}^{h} \left|X_{l k_0}\right|^2$  (28)
where we have used the Hermitian symmetry of the DFT of a real-valued input. In a practical search, the optimal number of harmonics to sum, h, is unknown a priori and depends mainly on the duty cycle of the pulse. This requires trying multiple values of h, and therefore calculating multiple incoherent detection statistics, so that sensitivity is maximized over a broader range of pulse shapes. In practice, trial values of h are powers of two up to either 16 or 32.
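As a concrete illustration, the statistic of equation (28) can be computed from an FFT in a few lines (a sketch of ours, assuming the fundamental falls exactly at Fourier bin k0 and that h × k0 does not exceed the Nyquist bin):

```python
import numpy as np

def incoherent_sum(x, k0, h):
    # Unitary DFT normalization, equation (27); rfft returns bins 0..N//2,
    # which suffices by the Hermitian symmetry of a real-valued input.
    X = np.fft.rfft(x) / np.sqrt(len(x))
    return abs(X[0])**2 + 2.0 * sum(abs(X[l * k0])**2 for l in range(1, h + 1))
```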

4.2 Distribution of the incoherent detection statistic

Let us first define, for practical purposes, the fraction of power contained in the Fourier harmonics of the postulated signal |$\mathbf {s}$| up to rank h:
$P(h) = |S_0|^2 + 2\sum_{l=1}^{h} \left|S_{l k_0}\right|^2$  (29)
which is equal to 1 when all harmonics of the signal are summed. For each element of |$\mathbf {X}$|⁠, with the exception of X0, we can write
(30)
Here, we must recall that ℜWi and ℑWi are independent random variables that follow a normal distribution with mean 0 and variance 1/2, while Si is a complex-valued constant. This means that the terms |$\sqrt{2} \Re {W_i}$| and |$\sqrt{2} \Im {W_i}$| follow |$\mathcal {N}(0,1)$|⁠. The equation above is thus the sum of squares of two independent normal variables with unit variances, and means equal to |$a \sqrt{2} \Re {S_i}$| and |$a \sqrt{2} \Im {S_i}$|⁠, respectively. 2|Xi|2 therefore follows, by definition, a non-central chi-squared distribution:
$2|X_k|^2 \sim \chi^2\left(d = 2,\ \lambda = 2a^2|S_k|^2\right)$  (31)
where d denotes degrees of freedom and λ is the so-called non-centrality parameter: it is defined as the sum of the squared means of every contributing normal variable. The case of $X_0$ must be treated separately; writing out its expression from equation (27) shows that it is real-valued and normally distributed with variance 1 instead of 1/2. Thus $|X_0|^2 \sim \chi^2(d = 1, \lambda = a^2|S_0|^2)$. From there, we can simply expand each term of equation (28) into its real and imaginary parts as we did in equation (30), remark that all random variables involved are mutually independent, and conclude that I(a, h) likewise follows the non-central chi-squared distribution below:2
$I(a, h) \sim \chi^2\left(d = 2h + 1,\ \lambda = a^2 P(h)\right)$  (32)
The distribution of the incoherent detection statistic is fully specified by three parameters: the number of harmonics summed, the true amplitude of the signal, and its cumulative harmonic power distribution P(h), which can be numerically calculated by taking the DFT of any postulated pulse shape. It is important to note that the distribution does not depend on the signal period; the same is therefore true of any relative difference in sensitivity between the FFA and the FFT.
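The cumulative harmonic power distribution of a postulated pulse shape can be computed numerically as stated. A sketch (hypothetical helper name; assumes h is well below the Nyquist rank):

```python
import numpy as np

def harmonic_power_fraction(template, h):
    # template: one period of the postulated pulse shape
    t = np.asarray(template, dtype=float)
    t = t / np.sqrt((t * t).sum())           # unit square sum normalization
    S = np.fft.rfft(t) / np.sqrt(len(t))     # unitary DFT; S[l] is harmonic l
    return abs(S[0])**2 + 2.0 * (np.abs(S[1:h + 1])**2).sum()
```

Narrower pulses spread their power over more harmonics, so P(h) rises more slowly with h; this is the origin of the duty cycle dependence discussed below.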

4.3 Statistical power analysis

With the distribution of the detection statistic I now known, we only need to follow the procedure laid out in the previous section to determine its sensitivity to any periodic signal. A number of software packages provide routines accurate to machine precision that compute the PDF, CDF, and survival function (as well as their respective inverse functions) of the non-central chi-squared distribution; we used the python language implementations of the scipy.stats package (Virtanen et al. 2019) in this section. For a given number of harmonics summed h, the critical value $\eta_I(h)$ of the incoherent detection statistic is obtained from equation (9), via the application of the inverse survival function of the chi-squared distribution with d = 2h + 1 degrees of freedom, $\overline{F}^{-1}_{\chi^2(2h+1)}$:
$\eta_I(h) = \overline{F}^{-1}_{\chi^2(2h+1)}(\alpha)$  (33)
The completeness $\gamma_I$ (at significance level α) is a function of both the signal amplitude a and the number of harmonics summed h, and is likewise found using equation (10):
$\gamma_I(a, h) = \overline{F}_{\chi^2\left(2h+1,\, \lambda(a)\right)}\left(\eta_I(h)\right)$  (34)
where |$\overline{F}_{\chi ^{2}(2h+1, \lambda (a))}$| is the survival function of the non-central chi-squared distribution of equation (32). The upper limit of the incoherent detection statistic is the true signal amplitude |$\mathcal {U}_I(h)$| such that the signal is detected with probability 0.5 in a sum of h harmonics, as defined in equation (11):
$\gamma_I\left(\mathcal{U}_I(h),\ h\right) = 0.5$  (35)
Calculating $\mathcal{U}_I(h)$ means solving the equation $\gamma_I(a, h) = 0.5$ for a; $\gamma_I$ is a monotonically increasing function of a, which means that the equation can be quickly solved numerically using the bisection method, an implementation of which is available in the scipy.optimize package. Obtaining the value of $\mathcal{U}_I(h)$ as a function of pulse duty cycle and different values of h now allows us to directly compare the sensitivity of the standard FFT search method with that of the FFA, for the same fixed significance level α. We need to set two free parameters to complete this comparison (a numerical sketch follows the list below):
  • The pulse shape, which defines the cumulative harmonic power distribution P(h): to remain consistent with the analysis done on the FFA above, we also use Gaussians, parametrized by their full-width at half-maximum (FWHM) in units of pulse period.

  • The significance level, which we set to 7σ, i.e. $\alpha = \overline{\Phi}(7)$ where $\overline{\Phi}$ is the survival function of the standard normal distribution.
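With these two parameters fixed, the computation of $\mathcal{U}_I(h)$ can be sketched with a few scipy calls (illustrative code of ours; Ph stands for the harmonic power fraction P(h) of equation 29):

```python
from scipy.stats import chi2, ncx2, norm
from scipy.optimize import bisect

def upper_limit_ihs(h, Ph, alpha=norm.sf(7.0)):
    d = 2 * h + 1
    eta = chi2.isf(alpha, d)                 # critical value, equation (33)
    # gamma_I(a, h) - 0.5 is monotonically increasing in a, so bisection on
    # a bracketing interval solves equation (35).
    completeness = lambda a: ncx2.sf(eta, d, a**2 * Ph) - 0.5
    return bisect(completeness, 1e-3, 1e3)
```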

We have plotted the upper limits |$\mathcal {U}_I(h)$| for h = 1, 2, ..., 32 in the left-hand panel of Fig. 3, as a function of Gaussian pulse FWHM: these different upper limits are shown as a set of annotated grey lines. For each pulse duty cycle, there is one optimal harmonic sum, which is that with the smallest upper limit and therefore highest sensitivity. The upper limit of the standard FFT search procedure, assuming that up to 32 harmonics are summed, is therefore
$\mathcal{U}_\mathrm{FFT} = \min_{1 \le h \le 32} \mathcal{U}_I(h)$  (36)
which is represented by the orange line on the left-hand side of Fig. 3. The ratios |$\mathcal {U}_\mathrm{z} / \mathcal {U}_\mathrm{I}(h)$| and |$\mathcal {U}_\mathrm{z} / \mathcal {U}_\mathrm{FFT}$| are directly comparable to the search efficiency defined in the previous section, which is evident from equation (24); these ratios are plotted on the right-hand panel of Fig. 3 and provide a more convenient comparison of search methods. The optimal upper limit |$\mathcal {U}_z$| can only be achieved by the FFA procedure if the pulse shape and width are known in advance, and thus we have also plotted the more realistic upper limit for a practical FFA search |$\mathcal {U}_\mathrm{FFA}$|⁠, given by equation (26).
Figure 3. Sensitivity comparison between an FFA-based search and the standard FFT procedure with incoherent harmonic summing. We assumed a significance threshold of 7σ and a periodic signal with a Gaussian pulse shape. Left-hand panel: upper limit as a function of Gaussian pulse FWHM (in log-scale), expressed in units of pulse period. The upper limit of a search procedure is the minimum signal amplitude (in units of background noise standard deviation) that it can detect with 50 per cent probability. See Section 3.1 for a rigorous definition of amplitude. The theoretical optimum upper limit ($\mathcal{U}_z = 7$ here) could be achieved by the FFA if the pulse shape and width were known a priori (blue dashed line); in practice, a penalty is incurred when using boxcar matched filters (blue solid line). FFT search codes compute multiple harmonic sums, typically up to 32 harmonics, whose individual upper limits are plotted as grey curves; the upper limit of the whole FFT procedure corresponds to their lower envelope (orange line). Right-hand panel: search efficiency $\mathcal{E}$, i.e. the ratio between the theoretical optimum upper limit $\mathcal{U}_z$ and the upper limit of each search procedure. $\mathcal{E}_\mathrm{FFA} \approx 0.93$ in practice. It is implicitly assumed that the FFA search employs enough output phase bins to resolve the pulse.

We note that some past surveys (e.g. Manchester et al. 2001) have used the phase information of the harmonics of candidates reported by the incoherent harmonic sums to reconstruct their integrated pulse profile, and also estimate their phase-coherent S/N. Although this process is useful in removing chance detections, it does not change the upper limit of the search as a whole for a fixed false alarm probability α, because it can only be applied to signals bright enough to exceed the detection threshold of the FFT+IHS procedure in the first place. Computing resources permitting, implementing this second-stage selection process might however justify an increase of the acceptable α for the FFT+IHS procedure in practice.

4.4 Summary

We have rigorously shown that in the presence of pure uncorrelated Gaussian noise, a phase-coherent FFA-based search is more sensitive than the incoherent FFT search procedure regardless of pulse period and duty cycle. The improvement offered by the FFA is greater for shorter pulse duty cycles. We do note, however, that there may be some exotic pulse shapes for which the standard FFT procedure can still outperform the FFA in terms of sensitivity, if the set of matched filters used in the FFA search does not account for their possible existence. For example, a pulse with two widely spaced identical peaks may see only one of them matched by a set of single-peaked matched filters, resulting in at best ≈ 71 per cent efficiency.3 In that sense, the incoherent harmonic sum could be said to be more agnostic to pulse shape. On the other hand, we have assumed that the fundamental signal frequency falls at the centre of a Fourier frequency bin, i.e. it is an integer multiple of 1/N. This corresponds to the best-case sensitivity scenario for the FFT method, where all of the signal power is concentrated in Fourier bins with frequencies that are multiples of the fundamental. Otherwise, a significant amount of power leaks to neighbouring bins, a sensitivity-reducing effect known as scalloping; it can in principle be eliminated using so-called Fourier interpolation, but not entirely so in practice due to limited computing resources (see section 4.1 of Ransom et al. 2002, for details). In large-scale searches, a computationally cheap version of Fourier interpolation called interbinning is employed, which results in a worst-case loss of SNR of 7.4 per cent, equivalent to further dividing the FFT’s search efficiency by 1.074.

5 AN IMPLEMENTATION OF THE FFA TO PROCESS LARGE-SCALE PULSAR SURVEYS

Implementing an FFA search pipeline that retains, on real-world, imperfect data, all the desirable properties it has on ideal data is challenging. In this section, we describe the practical problems that need to be surmounted, the solutions that we implemented in our own FFA search software package riptide, and the rationale behind the technical choices that we made. A significant amount of practical knowledge was gained by testing and refining the implementation on a portion of the SUPERB survey (Keane et al. 2018), in particular on observations of known sources that were either faint or made difficult to detect by different strains of radio-frequency interference.

5.1 Overview of the implementation

The high-level interface of riptide is written in the python language, while the performance-critical sections of the code are implemented in optimized c functions. riptide provides a library of python functions and classes that can be used to perform interactive analysis and visualisation of dedispersed time series data, using for example ipython or jupyter. The high-level functions of riptide form the building blocks of its pipeline executable, that takes a set of dedispersed time series as input and outputs a number of data products, including a set of files containing the parameters of detected candidate signals and useful diagnostic information (see Section 5.7 for details on the pipeline itself). The code is modular so that it can in principle support arbitrary data formats; currently, riptide can process dedispersed time series files produced by either presto4 (Ransom et al. 2002) or sigproc5 (Lorimer 2011) pulsar software packages.

5.2 Data whitening and normalization

In Section 3, we have demonstrated the optimal sensitivity of the combination of the FFA and matched filtering, but under the significant assumption that the background noise was Gaussian with mean m and standard deviation σ known a priori, that we took to be 0 and 1, respectively, without loss of generality. In practice these parameters have to be estimated from the data directly, in the presence of an additional low-frequency noise component and possibly interference. Devising an efficient and unbiased estimator for m and σ in the general case is extremely challenging, as it would require fitting the low-frequency noise component parameters, any detectable signal regardless of its nature, and m and σ simultaneously. To that we must further add the constraint that evaluating the noise parameters should be done in a small fraction of the time taken by the search process.

Cameron et al. (2017) have partially avoided this problem by essentially evaluating m and σ in every folded profile separately, using its so-called baseline: a region of the pulse, defined by empirical criteria, that appears to contain no signal. This method has long been employed in pulsar search and timing software, and is implemented for example in the widely used psrchive package (Hotan, van Straten & Manchester 2004). However, doing so has a number of drawbacks. In particular, the estimators for the noise parameters have to work with a small number of data points, no larger than the number of phase bins, yielding uncertain estimates that vary significantly from one pulse to the next, which in turn increases the uncertainty of the output S/N values. At longer trial periods, the baseline is also more likely to be contaminated by low-frequency noise. Furthermore, and most importantly, any procedural method to locate the baseline region needs to rely on heuristic criteria, such as excluding a putative pulse even when none is present (e.g. ‘Algorithm 2’ in Cameron et al. 2017), or searching for a contiguous segment of the profile with the lowest mean (implemented in the pdmp utility of psrchive). The two methods above yield estimates of m and σ that are systematically biased low in the presence of pure noise, and therefore S/N values that are biased high, which invalidates the optimal sensitivity demonstration made in Section 3.

Our priority was therefore to remain within all mathematical assumptions of Section 3, at least to the extent allowed by real-world data. We have chosen to directly evaluate the noise parameters from the whole input time series once during pre-processing, in an attempt to obtain unbiased and low-variance estimates. This approach is also used by Parent et al. (2018). In our implementation, we first remove the low-frequency noise component by subtracting a running median from the input data; the width of the median window is left to the user to choose and optimize. The running median filter has the desirable property of leaving intact any constant impulse whose width is less than half that of the window (Gallagher & Wise 1981). Then the input time series is normalized to zero mean and unit standard deviation before being searched.
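The pre-processing stage described above amounts to two array operations. An equivalent sketch using scipy (riptide's actual implementation differs in its details, and the window width remains a user parameter):

```python
import numpy as np
from scipy.ndimage import median_filter

def whiten(x, window):
    # Subtract a running median to remove the low-frequency noise component,
    # then normalize to zero mean and unit standard deviation.
    x = np.asarray(x, dtype=float)
    detrended = x - median_filter(x, size=window)
    return (detrended - detrended.mean()) / detrended.std()
```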

5.3 Searching a period range

A given period range [P0, P1] could naively be searched by calculating a sequence of folding transforms, with base periods (and therefore numbers of output phase bins) expressed in number of samples ranging from $p_0 = \lfloor P_0/\tau \rfloor$ to $p_1 = \lceil P_1/\tau \rceil$, where τ is the time resolution of the input data and $\lfloor\,\rfloor$ and $\lceil\,\rceil$ are the floor and ceiling functions, respectively. However, if $P_1/P_0 \gg 1$ this procedure has a highly non-uniform phase and pulse duty cycle resolution, which becomes excessively high at trial periods of a few seconds and longer, especially in an era where τ is typically on the order of tens of microseconds. The standard way to circumvent the problem, adopted by previous FFA implementations (Kondratiev et al. 2009; Cameron et al. 2017), is to iteratively downsample the input data by a factor of two, and consecutively search so-called octaves of the form $[2^k P_0, 2^{k+1} P_0]$, where within each octave the number of phase bins in the output folded profiles increases from b to 2b proportionally to the trial period, with b being a user-defined parameter that sets the minimum duty cycle resolution. We have implemented the same procedure in riptide, with one difference worth mentioning: we have provided the option to use real-valued downsampling factors strictly smaller than 2, which generates more numerous and shorter pseudo-octaves. In practice, the user chooses a minimum and maximum number of output phase bins, $b_\mathrm{min}$ and $b_\mathrm{max}$, respectively, and the data are iteratively downsampled by $f = (b_\mathrm{max} + 1)/b_\mathrm{min}$ rather than f = 2. We have made this choice to keep the duty cycle resolution much closer to uniform, rather than letting it oscillate by a factor of two; furthermore, if what matters most when reporting on a large-scale FFA search is the lowest duty cycle resolution across the period range explored, then one should increase the minimum number of phase bins while keeping the maximum number of phase bins as low as possible, to avoid spending additional computing resources. The whole search process is summarized as a diagram in Fig. 4.
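The pseudo-octave plan is easy to sketch (schematic code of ours; rounding details and the handling of the first downsampling step differ in riptide):

```python
def pseudo_octave_plan(P0, P1, bmin, bmax):
    # Each step covers a period range f times longer than the previous one,
    # with the original data downsampled by a further factor f, so that the
    # number of output phase bins stays between bmin and bmax.
    f = (bmax + 1) / bmin
    plan, pstart, ds = [], P0, 1.0
    while pstart < P1:
        plan.append((pstart, min(f * pstart, P1), ds))  # (start, end, factor)
        pstart, ds = f * pstart, f * ds
    return plan
```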

Figure 4. Sequence of operations performed by riptide when processing a dedispersed time series over a user-defined period range; see also Section 5.3 for a detailed explanation. The output periodogram is two-dimensional, as the code returns the highest S/N per period trial and per pulse width trial, rather than just the best S/N per period trial across all pulse widths. In practice this can greatly improve sensitivity to narrow pulses in the presence of interference or low-frequency noise (see text and Fig. 5).

There are a few details to be aware of here. First, using a non-integer factor makes any two consecutive samples in the downsampled data correlated with each other. This is not a problem in principle, because the samples integrated in any given phase bin during folding are not consecutive and therefore remain uncorrelated. This may no longer hold, however, when chaining (i.e. composing) many downsampling operations, which we carefully avoid doing in our implementation. When moving from one pseudo-octave to the next, we always return to the original highest-resolution data and downsample it with the appropriately chosen, geometrically increasing factor. The real caveat lies in how downsampling by a real-valued factor f affects the noise variance in the output samples. If the input data are uncorrelated Gaussian noise with unit variance, then the sample variance of the output data is $f - \frac{1}{3}$ rather than f, as one would expect when using an integer factor. This is shown in Section A4 and properly accounted for in riptide.
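The $f - \frac{1}{3}$ result is easy to verify by simulation. Below is a minimal Monte Carlo check; the fractional-weight downsampling scheme shown is one natural choice, and Section A4 gives the formal derivation:

```python
import numpy as np

def downsample(x, f):
    # Integrate the series over consecutive windows of real-valued length f,
    # splitting the samples at window edges with fractional weights.
    out = []
    for a in np.arange(0, len(x) - f, f):
        b = a + f
        total = x[int(np.ceil(a)):int(b)].sum()      # whole samples inside
        total += (np.ceil(a) - a) * x[int(a)]        # left fractional sample
        total += (b - np.floor(b)) * x[int(b)]       # right fractional sample
        out.append(total)
    return np.array(out)

y = downsample(np.random.default_rng(0).standard_normal(1_000_000), f=2.37)
print(y.var())   # ~ 2.37 - 1/3 = 2.04, noticeably less than f
```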

5.4 Matched filtering

While the folded profiles can be, and ideally should be, convolved with pulse templates of highly diverse shapes, we have limited ourselves to boxcars for computing performance reasons, at least in the current version of the code. The loss of sensitivity is mild as long as the profile is single-peaked, and has already been evaluated above and in Section A3. Circular convolution with a set of boxcar filters is particularly fast, as it can be implemented as 'strided differences' of the exclusive prefix sum of the profile being evaluated. If P is the profile array with n bins, its exclusive prefix sum S is an array whose elements are defined as

$S_k = S_{k-1} + P_{k-1}$    (37)

with the exception S0 = 0. If B is a boxcar of width w bins and unit height, and C = P ∗ B their circular convolution product, then

$C_j = S_{j+w} - S_j$    (38)

noting that the array index of P is taken modulo n because the convolution is circular; for the same reason, we calculate S over two concatenated repeats of the profile P. The height of the boxcar is also properly scaled as a function of the number of pulse periods summed, to follow the unit square sum normalization of Section 3. This implementation is quite efficient on the CPU, since it re-uses the profile prefix sum array multiple times, and that array typically fits into the L1 cache. The sequence of trial boxcar widths used in the search is controlled by two user-defined parameters: a maximum trial duty cycle (20 per cent by default) and a geometric spacing factor $f_{\rm sp}$ between consecutive trial widths ($f_{\rm sp} = 1.5$ by default). The first trial width is always w0 = 1 sample, and the whole sequence is generated with the recurrence relation $w_{k+1} = \max\{\lfloor f_{\rm sp} w_k \rfloor,\, w_k + 1\}$ up to a maximum value.
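For reference, the prefix-sum construction of equations (37) and (38) and the trial width recurrence can be written in a few lines of NumPy; this is a minimal sketch with our own variable names, omitting the boxcar height normalization mentioned above.

```python
import numpy as np

def boxcar_convolve(profile, w):
    """Circular convolution of a folded profile with a unit-height boxcar of
    width w bins, via the exclusive prefix sum (equations 37 and 38)."""
    n = len(profile)
    # Exclusive prefix sum over two concatenated repeats of the profile,
    # so that windows wrapping past the last bin need no special casing.
    S = np.concatenate(([0.0], np.cumsum(np.tile(profile, 2))))
    j = np.arange(n)
    return S[j + w] - S[j]  # 'strided differences'

def trial_widths(nbins, max_duty=0.20, fsp=1.5):
    """Boxcar width sequence: w0 = 1, w_{k+1} = max(floor(fsp*w_k), w_k + 1)."""
    widths, w = [], 1
    while w <= max_duty * nbins:
        widths.append(w)
        w = max(int(fsp * w), w + 1)
    return widths

profile = np.random.default_rng(1).standard_normal(1024)
snrs = {w: boxcar_convolve(profile, w).max() for w in trial_widths(1024)}
```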

5.5 Identifying statistically significant candidate signals

Overall, the combination of FFA and matched filtering returns S/N as a function of trial period, width, and phase, an output that can be too large to keep fully in memory, especially when processing multiple dispersion measure (DM) trials. FFA implementations typically reduce this output to a so-called periodogram, a one-dimensional array containing the highest S/N as a function of trial period (across all possible trial widths and phases). On ideal data containing only uncorrelated Gaussian noise and a possible periodic signal, identifying statistically significant signal periods is a trivial exercise of applying a threshold to the periodogram; the threshold value is the critical value of the Z-statistic (equation 16), i.e. the number of Gaussian sigmas that corresponds to the desired false alarm probability. However, situations where this is not applicable arise relatively often in practice. If the data are contaminated by low-frequency noise, it will manifest itself in every folded profile at longer trial periods (typically a few seconds) in the form of a fluctuating baseline, to which the widest matched filters have an optimal response. Randomly occurring bursts of interference are also empirically found to have a similar effect. We have noticed on several Parkes L-band test cases that keeping in memory only the best S/N value across all trial widths runs the risk of missing long-period pulsar signals that could otherwise be found in a blind search. This is illustrated in Fig. 5 on a Parkes multibeam observation of the 12.1-s pulsar PSR J2251–3711 (Morello et al. 2020) affected by interference: the pulsar emerges as a clear peak when considering the optimal trial width in isolation, but is undetectable if the only information available is the best S/N versus period plot. We therefore keep in memory the highest S/N as a function of both trial period and width, making the output periodograms of riptide two-dimensional; each pulse width trial is then post-processed separately.

Figure 5. Highlighting the advantages of analysing pulse width trials separately when searching for long-period faint sources. Here, we have run riptide on a 9-min Parkes L-band observation of PSR J2251–3711, dedispersed at the known DM of the pulsar. The 12.1-s period of the pulsar is highlighted in orange; many harmonics of an interference source with a 1-s period are detected. Top panel: best S/N obtained across all trial pulse widths as a function of trial period. For most trial periods, the highest S/N is obtained with the widest matched filter due to its optimal response to residual low-frequency noise, and the peak associated with the pulsar is difficult to distinguish from the background. Bottom panel: S/N obtained specifically with the optimal 3-bin width matched filter, where the associated peak could be identified in a blind search. riptide keeps one periodogram per width trial in memory, which are then independently searched for peaks, in an attempt to be robust to situations with imperfect data.

Furthermore, periodograms of contaminated data commonly show a trend of S/N increasing as a function of period, and all reported S/N values above a certain trial period may exceed the theoretical significance threshold. Whether a peak is significant can then only be evaluated relative to neighbouring period trials; visual inspection of periodograms is not an option in a large-scale survey, and a reliable peak identification algorithm is required. This is a crucial part of a pipeline, which if not carefully designed can vastly reduce the inherent sensitivity of the FFA itself on imperfect data; it should also run in a reasonable fraction of the time required to compute the periodogram itself. Our peak finding algorithm is based on evaluating, as a function of trial period, the overall trend of the S/N and the dispersion around it (Fig. 6), in order to determine how far above the trend a point should lie to be considered significant. We first divide the whole trial frequency range into equal-sized segments. Here, we operate on trial frequency rather than trial period, relying on the property that the complex-valued coefficients of the DFT of uncorrelated Gaussian noise are also uncorrelated; the DFT frequency bins are spaced by 1/T, where T is the integration time, and one can expect the S/N values in an FFA periodogram to decorrelate on a frequency scale of order 1/T as well. The width of the frequency segments is thus chosen to be $\Delta z_{\rm seg}/T$, where $\Delta z_{\rm seg}$ is a dimensionless free parameter. In each segment, we calculate the median m of the S/N values and estimate their standard deviation σ from the interquartile range (IQR) of the S/N values using σ = IQR/1.349, in order to be robust to outliers. An estimate of the local significance threshold is s = m + kσ, where k is a dimensionless parameter defining the significance level. All segments thus provide a set of control points $(\nu_i, s_i)$, where $\nu_i$ is the centre frequency of the i-th segment and $s_i$ the estimated local significance threshold. The last step is to fit a function of frequency to this set of points. Here, the choice of functional form might aim to reflect that the data are expected to contain two noise components: a Gaussian component with uniform power as a function of frequency, and a low-frequency component whose power is typically modelled as proportional to $\nu^{-\alpha}$, where α > 0 is the noise spectral index (α = 2 in the case of Brownian noise). With signal power being the square of S/N, a logical choice appears to be a function of the form $s = \sqrt{A \nu^{-\alpha} + B}$; however, we found that it often yields a poor fit to the control points on real test cases, overestimating the selection threshold at low frequencies. We attribute this to the red noise reduction effect of the running median filter and to the presence of interference. Furthermore, this fit is non-linear and does not always converge, which is highly undesirable as we aim for maximum reliability. In the end, we found that the best results were obtained with a polynomial in log(ν), a function that does not diverge around ν = 0 and can be reliably fitted with the linear least-squares method. The degree of the polynomial is exposed as a free parameter and is equal to 2 by default.
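The following sketch condenses the procedure into code, under the default settings quoted above (k = 6, a second-degree polynomial in log ν); it is a schematic rendition for one per-width periodogram, not riptide's exact implementation, and the $\Delta z_{\rm seg}$ value shown is arbitrary.

```python
import numpy as np

def dynamic_threshold(freqs, snrs, tobs, dz_seg=10.0, k=6.0, poly_deg=2):
    """Fit a peak selection threshold s(nu) to an S/N vs. trial frequency array.

    Segments of width dz_seg / tobs yield robust control points (nu_i, s_i),
    with s_i = median + k * sigma and sigma estimated from the IQR; a
    polynomial in log(nu) is then fitted to the control points.
    """
    edges = np.arange(freqs.min(), freqs.max(), dz_seg / tobs)
    centres, levels = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        seg = snrs[(freqs >= lo) & (freqs < hi)]
        if seg.size < 4:
            continue
        q1, med, q3 = np.percentile(seg, [25, 50, 75])
        sigma = (q3 - q1) / 1.349  # robust standard deviation estimate
        centres.append(0.5 * (lo + hi))
        levels.append(med + k * sigma)
    coeffs = np.polyfit(np.log(centres), levels, poly_deg)
    return np.polyval(coeffs, np.log(freqs))

# trial points with snrs > dynamic_threshold(freqs, snrs, tobs) form the peaks
```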

Figure 6. An explanation of the peak detection algorithm employed in riptide, using the same observation of PSR J2251–3711 as in Fig. 5. For each individual trial width, the S/N versus trial frequency array is partitioned into equal-sized frequency segments with a width of a few times 1/T, where T is the length of the input time series. Within every segment, the median m and standard deviation σ of the S/N are calculated using robust estimators. Each segment yields a control point with coordinates (ν, s), where ν corresponds to the centre frequency of the segment and s = m + kσ, where k is a parameter set to k = 6 by default. Finally, a second-degree polynomial in log(ν) is fitted to the control points, yielding the peak selection threshold (orange line). Any points lying above the threshold are clustered into peaks. The detection of the pulsar is highlighted with a red background and bolder points. See text for full details and the design rationale for this algorithm. Note that the frequency segments are much narrower in practice; they have been widened here for readability.

We can note that the exact same candidate selection problems caused by S/N trends are encountered in FFT-based periodicity searches as well, where the low-frequency bins of Fourier spectra contain significant excess power. The classical solution in FFT search implementations is to perform so-called ‘spectral whitening’, where portions of the power spectra are divided by a factor dependent on the locally estimated power average (Israel & Stella 1996; Ransom et al. 2002). Our dynamic thresholding algorithm above is conceptually similar.

5.6 Improving performance with dynamic duty cycle resolution

A useful feature of our implementation is the option to split the search period range into a number of user-defined segments, each using a different number of phase bins; the idea is to use more bins (i.e. a higher duty cycle resolution) at longer trial periods. Indeed, the complexity of an FFA search scales unfavourably with both the minimum period being searched and the maximum desired duty cycle resolution. Searching an octave means performing a sequence of folding transforms and convolving every output profile with matched filters. Using the notation of Section 2, let N be the total number of samples in the input time series after any appropriate downsampling, p the base period in samples of the folding transform, and m the number of rows, such that N = mp. The cost of the folding transform itself is $mp\log_2(m)$ additions if m is a power of two (Staelin 1969). The cost of matched filtering with boxcars, using the method of Section 5.4, is equal to the product of the number of rows, the number of phase bins (p), and the number of boxcar matched filters. If the consecutive pulse width trials are geometrically spaced by a factor of 2, the number of matched filters is approximately $\log_2(p)$. The number of arithmetic operations associated with a single folding transform can thus be estimated by

$\mathcal{C}_{\rm transform} = mp\log_2(m) + mp\log_2(p) = N\log_2(N)$    (39)

Searching an octave where the trial periods range between b and 2b therefore costs $\mathcal{C}_{\rm octave} = b N \log_2(N)$. Here, b represents the shortest trial period in number of samples and determines the minimum duty cycle resolution 1/b. The asymptotic complexity of searching several octaves using iterative downsampling is dominated by the cost of the first, shortest period octave; the cost of searching a time series is thus $\mathcal{C}_{\rm TS} = \mathcal{O}(b N \log_2(N))$. If we denote by T the integration time and by P0 the shortest trial period in seconds, then the input data can be initially downsampled to an optimal time resolution of τ = P0/b, which means that the effective number of samples to search is N = T/τ. Writing ν0 = 1/P0 for the highest trial frequency, we can re-express $\mathcal{C}_{\rm TS}$ as

$\mathcal{C}_{\rm TS} = \mathcal{O}\!\left(b^2 \nu_0 T \log_2(b \nu_0 T)\right)$    (40)

However, when searching a range of DMs, one must limit the pulse broadening effects of dedispersing at an incorrect DM (e.g. Cordes & McLaughlin 2003). To achieve adequate DM space coverage, the trial DM step needs to be proportional to the minimum pulse width being searched, $W_{\rm min} = P_0/b$, which means that the number of required DM trials scales with $\nu_0 b$. The total asymptotic complexity of searching all DMs is thus

$\mathcal{C}_{\rm total} = \mathcal{O}\!\left(b^3 \nu_0^2 T \log_2(b \nu_0 T)\right)$    (41)

The $b^3$ term is a strong incentive to avoid using the same duty cycle resolution across a search period range that may, in practice, span two orders of magnitude or more. When using the full pipeline, riptide users are therefore given the option to partition the period search range into any number of arbitrary sub-ranges, each with the duty cycle resolution of their choice depending on the computing resources available. This might enable, for example, searching periods down to 100 ms or even shorter with a small (≈100) number of phase bins, while keeping a high duty cycle resolution (≳1000 phase bins) for periods of a second or longer.
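As a quick illustration of why partitioning pays off, one can evaluate equation (41) for different configurations; the sketch below compares a uniform-resolution search with a two-segment plan, under the idealized assumption that the asymptotic expression holds exactly and with arbitrary example numbers.

```python
import math

def ffa_cost(b, P0, T):
    """Relative cost of an FFA search over all DM trials, equation (41):
    O(b^3 * nu0^2 * T * log2(b * nu0 * T)), with nu0 = 1 / P0."""
    nu0 = 1.0 / P0
    return b**3 * nu0**2 * T * math.log2(b * nu0 * T)

T = 540.0  # 9-minute observation
uniform = ffa_cost(b=1024, P0=0.1, T=T)  # high resolution everywhere
split = ffa_cost(b=128, P0=0.1, T=T) + ffa_cost(b=1024, P0=1.0, T=T)
print(f"two-segment plan is ~{uniform / split:.0f}x cheaper")
```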

5.7 Full pipeline overview

In a large-scale pulsar survey processing scenario, a concurrently running FFT-based search already requires an observation to be dedispersed into a set of time series, each with its own trial dispersion measure, with a DM step almost certainly finer than required by an FFA search. The riptide pipeline is designed to take advantage of this situation, taking a set of existing DM trials as an input and processing only the minimal subset necessary to achieve adequate DM space coverage; the trial DM step is chosen using the classical method, that is, by limiting the pulse broadening time-scale due to dedispersing the data at a DM different from that of the source (e.g. chapter 6 of Lorimer & Kramer 2004). If running riptide on its own, the user is responsible for dedispersing the data with the appropriate DM step beforehand. All selected DM trials are then distributed between the available CPUs using the multiprocessing module of the Python standard library, and processed in parallel. For each DM trial a periodogram is produced and searched for peaks, but only the peak parameters are accumulated in memory. Once all time series have been processed, the parallel section of the pipeline ends. All reported peaks are then grouped into clusters in frequency space, and clusters that are harmonics of others are flagged. The user decides whether such harmonics are kept and later output as candidates, or discarded immediately; finally, a candidate object is produced for each remaining cluster. The code saves the following data products:

  • A table of all detected peak parameters before clustering. These include: period, DM, width in phase bins, duty cycle, S/N.

  • A table of cluster parameters, with any harmonic relationships. The cluster parameters are simply those of its brightest member peak.

  • Candidate files in JSON format, each containing: a list of best-fitting parameters (period, DM, S/N, duty cycle), a sub-integrations plot produced by folding the de-reddened and normalized time series associated with the best-fitting DM, all available metadata of the observation (e.g. coordinates, integration time, observing frequency, etc.), and a table of all detected peaks associated with the candidate (a schematic example follows this list).

  • Candidate plots, although this can be disabled to save time; plots can still be produced later.
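To make the contents of a candidate file concrete, the sketch below mirrors the fields listed above as a Python dictionary; the key names and values are purely illustrative and do not necessarily match riptide's actual JSON schema.

```python
# Hypothetical candidate layout; key names and values are illustrative only.
candidate = {
    "best_fit": {"period": 2.4, "dm": 50.0, "snr": 12.3, "duty_cycle": 0.01},
    "subints": [[...]],  # folded sub-integrations at the best-fitting DM
    "metadata": {"coordinates": "...", "tint": 540.0, "freq": 1382.0},
    "peaks": [
        {"period": 2.4, "dm": 50.0, "width_bins": 10, "snr": 12.3},
    ],
}
```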

Saving intermediate data products enables one to test all post-processing steps on known pulsar observations and to further improve them in the future. It is important to verify, for example, whether a known pulsar is missed because it is genuinely undetectable, or whether it registers as a peak that is then lost to erroneous clustering or harmonic flagging.

5.8 Benchmarks

To measure the speed of the riptide pipeline, we decided against using a pulsar observation, because the total number of candidate files produced by the final stage of the pipeline depends on the signal content of the data. For reproducibility purposes, we have therefore measured the execution time of the riptide pipeline (version 0.1.0) on a set of 128 DM trials made of artificially generated Gaussian noise. Each time series was 9 min long, containing $2^{23}$ time samples with a 64 µs sampling interval, reproducing the parameters of recent Parkes multibeam receiver surveys (Keith et al. 2010; Keane et al. 2018). The benchmark was conducted on a node of the OzStar computing facility, reserving 16 Intel Xeon Gold 6140 @ 2.3 GHz CPU cores to search the data in parallel. Here, we deliberately did not use the dynamic duty cycle resolution feature previously described, choosing instead to specify a single, wide period search range with the same bmin and bmax parameters in order to keep the experiment simple. We provided riptide with the following base configuration:

  • Minimum search period Pmin = 1.0 s.

  • Maximum search period Pmax = 120.0 s.

  • Average number of phase bins bavg = 1024, for a duty cycle resolution of δ = 0.1 per cent. In practice, we set the code parameters bmax and bmin equal to bavg plus and minus 4 per cent, respectively.

  • Nproc = 16 parallel processes, one per available CPU core.

We then measured how individually changing the parameters Pmin, bavg, and Nproc (while keeping all other parameters fixed) affected the total processing time per DM trial. The results are shown in Fig. 7. The execution times as a function of minimum trial period and number of phase bins are well predicted by the asymptotic complexity expression of equation (40). A deviation from the prediction is observed where the search is computationally cheap, that is, for both lower numbers of phase bins and higher minimum trial periods: in this regime, the total execution time is instead dominated by the overheads of reading, de-reddening, and normalizing the input data. The performance of the pipeline was, however, not found to scale linearly with the number of CPUs employed; although some operations of the pipeline run sequentially and do not benefit from multiple CPUs, we found that the non-parallel sections of the code consumed a negligible amount of time in this experiment. We therefore attribute the observed effect to limited memory bandwidth, for which the CPUs likely compete. Overall, a single 9-min DM trial can be processed in approximately one second on a modern compute node with 16 CPU cores, while using at least 1000 phase bins for trial periods of a second or longer.

Figure 7. Processing time per DM trial of the riptide pipeline, evaluated on a set of 128 artificial DM trials with a length of 9 min and a sampling time of 64 µs. The pipeline was initially configured to search trial periods from 1 to 120 s with 1024 phase bins, and allowed to use 16 CPU cores. We then measured how changing one configuration parameter at a time from this initial setup affected the execution time. The plots show the total processing time per DM trial as a function of the average number of output phase bins (left), minimum trial period (middle), and number of CPU cores (right). For the left and middle plots, the expected scaling is extrapolated from the most intensive test case (i.e. with the highest run time) using equation (40), which gives the asymptotic complexity of searching a single time series. For the right plot, the expected scaling is a linear performance increase as a function of the number of CPU cores. See text for more details on the experimental setup and a discussion of the results.

6 DISCUSSION

6.1 A more realistic radiometer equation to determine detectable flux densities

The following form of the radiometer equation (Dewey et al. 1985; Lorimer & Kramer 2004) has often been used to evaluate the minimum detectable average flux density $S_{\rm min}$ in a pulsar survey:

$S_{\rm min} = \beta \, \frac{({\rm S/N}_{\rm min})\, T_{\rm sys}}{G \sqrt{n_{\rm p}\, t_{\rm int}\, \Delta f}} \sqrt{\frac{W}{P - W}}$    (42)

where β is a loss factor due to digitization (see footnote 6), $T_{\rm sys}$ is the system temperature, G the telescope gain, $n_{\rm p}$ the number of polarization channels summed, $t_{\rm int}$ the integration time, Δf the observing bandwidth, P the signal period, and W the effective pulse width in the dedispersed data, i.e. with all broadening effects taken into account. ${\rm S/N}_{\rm min}$ is quoted as the 'detection threshold' in the original paper, which is incorrect; one should instead substitute the actual minimum detectable amplitude, the upper limit of the search. Indeed, the detection threshold is the number of Gaussian sigmas that corresponds to the false alarm probability α, i.e. the critical value of the Z-statistic (equation 16); the upper limit is equal to the detection threshold if and only if one performs phase-coherent folding followed by matched filtering with the exact pulse shape of the signal (equation 18). Yu (2018) has previously reported this discrepancy, and noted that the equation above is misused in a number of pulsar survey papers. If we retain ${\rm S/N}_{\rm min}$ as the detection threshold in the equation above, then the search efficiency $\mathcal{E}$ needs to be taken into account as follows:

$S_{\rm min} = \frac{\beta}{\mathcal{E}} \, \frac{({\rm S/N}_{\rm min})\, T_{\rm sys}}{G \sqrt{n_{\rm p}\, t_{\rm int}\, \Delta f}} \sqrt{\frac{W}{P - W}}$    (43)

For an FFA search, $\mathcal{E}_{\rm FFA} = 0.93$ can be used as long as the duty cycles considered are resolved in the folded output. For an incoherent FFT search, $\mathcal{E}$ is a function of the effective pulse duty cycle δ = W/P; for practical use, we have fitted a sigmoid function to the search efficiency shown in the right-hand panel of Fig. 3. Assuming that up to 32 harmonics can be summed:

(44)

which approximates the exact value derived in Section 4 with a relative error no larger than 6 per cent for $10^{-4} \le \delta \le 0.5$. The exact value for the median pulsar duty cycle of 2.8 per cent (from the ATNF pulsar catalogue; Manchester et al. 2005) is $\mathcal{E}_{\rm FFT} = 0.70$, which remains valid even if summing a maximum of 16 harmonics. These corrections do not account for any additional reduction in sensitivity incurred from data imperfections, various software implementation details, and the possibility of losing a detection in the many stages of the search process. Whenever possible, the true minimum detectable average flux density should be evaluated via signal injection in real data processing conditions, as previously done by Lazarus et al. (2015) and Parent et al. (2018). Finally, we note that there are multiple conventions as to what the β factor represents in the radiometer equation above. For example, Johnston et al. (1992) included in β not only losses due to digitization, but also what they called 'approximations made in the software', which could potentially refer to an estimate of the search efficiency. Likewise, Dewey et al. (1985) defined β as 'various system losses'.
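For concreteness, equation (43) can be evaluated as in the sketch below; the survey parameters are arbitrary placeholders, and the efficiency values are those derived in this paper (0.93 for the FFA, 0.70 for FFT+IHS at the median duty cycle).

```python
import math

def smin(snr_min, tsys, gain, n_pol, t_int, bw, period, width,
         efficiency, beta=1.0):
    """Minimum detectable mean flux density (Jy), equation (43): the classical
    radiometer equation with the search efficiency term included."""
    return (beta * snr_min * tsys
            / (efficiency * gain * math.sqrt(n_pol * t_int * bw))
            * math.sqrt(width / (period - width)))

# Same hypothetical survey, searched with two different methods:
survey = dict(snr_min=7.0, tsys=25.0, gain=0.7, n_pol=2,
              t_int=540.0, bw=300e6, period=1.0, width=0.028)
print(smin(efficiency=0.93, **survey))  # FFA
print(smin(efficiency=0.70, **survey))  # FFT+IHS, median duty cycle
```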

Above and beyond assessing the detection efficiency, a further caveat applies when using values calculated with equation (43) to infer intrinsic pulsar properties. It would be incorrect to interpret flux density values determined in this way, which apply to detection thresholds in the final dedispersed data stream, as simply scaled versions of intrinsic pulsar properties. For example, it is common to scale up flux densities determined from the (even uncorrected) radiometer equation by the square of the distance to a pulsar to infer an intrinsic property. When doing this, at the very least an extra efficiency factor should come into play to account for the various transfer functions of the interstellar medium, telescope system, dedispersion procedures, etc.; in reality, the inversion might be quite complex and line-of-sight dependent. The end result, as to what fraction of the intrinsic population is probed by these searches, contains this difficult-to-account-for incompleteness.

6.2 Improving FFT searches with coherent harmonic summing

It has previously been suggested (Kondratiev et al. 2009; Lazarus et al. 2015; Parent et al. 2018) that the reduced sensitivity to narrow pulses of the standard FFT procedure is foremost caused by the limit of 32 harmonics usually imposed on the incoherent harmonic sums. We have put this idea to the test in Fig. 8, running the sensitivity comparison again but this time allowing up to 1024 harmonics to be summed. From the results, we instead conclude that the lack of sensitivity of the standard FFT+IHS method to narrower pulses is mainly a consequence of its incoherence, i.e. the discarding of all phase information contained in the complex-valued Fourier spectra. It is possible, however, to achieve the same level of sensitivity as the coherent FFA-based procedure in a Fourier domain search. Indeed, the Z-statistic can be calculated as follows, remarking that the unitary DFT conserves the complex dot product:

$Z(\mathbf{x}) = \mathbf{x} \cdot \mathbf{s} = \mathbf{X} \cdot \overline{\mathbf{S}} = \sum_k X_k \overline{S_k}$    (45)

where $\mathbf{X}$ and $\mathbf{S}$ are, respectively, the unitary DFTs of the data and of the signal template being searched for, and the overline represents complex conjugation. The formula above represents a coherent form of harmonic summing, which is equivalent to folding and matched filtering in the time domain. Coherent harmonic summing is a potentially attractive solution, as it could be integrated into an existing FFT+IHS pipeline where the Fourier transform is already performed for free, and used as a complementary method to cover the short duty cycle regime where the incoherent search is least efficient. If too costly, this coherent search could be limited to either longer signal periods or isolated pulsars, ignoring any Doppler-induced variations of signal period due to binary motion, which are expensive to compensate for (Johnston & Kulkarni 1991; Ransom et al. 2002).
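Equation (45) is a direct consequence of Parseval's theorem, and is easy to verify numerically with unitary DFTs, as in the following minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 4096
s = np.zeros(n)
s[:40] = 1.0
s /= np.linalg.norm(s)                # unit square sum normalization (Section 3)
x = 0.5 * s + rng.standard_normal(n)  # faint signal in unit-variance noise

# Unitary DFTs conserve the dot product, so the Z-statistic can be
# computed as a coherent harmonic sum in the Fourier domain.
X = np.fft.fft(x, norm="ortho")
S = np.fft.fft(s, norm="ortho")

z_time = np.dot(x, s)
z_fourier = np.vdot(S, X).real        # sum over k of X_k * conj(S_k)
print(z_time, z_fourier)              # identical up to rounding error
```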
Figure 8. Same search efficiency plot as in Fig. 3, also assuming a 7σ significance level and a Gaussian pulse, except that we have allowed for the incoherent summation of up to 1024 harmonics. We have extended the X-axis to include pulse duty cycles down to 0.01 per cent. The usual limit of 32 harmonics summed in most search codes is highlighted in black. Summing more harmonics is beneficial for duty cycles shorter than about 1 per cent, but cannot compensate for the lack of phase coherence of this search procedure.

6.3 Efficacy of the FFA in a real search

We have covered the theoretical sensitivity of the FFA in depth, but its ability to discover new pulsars in practice is just as important. riptide has successfully been used to discover new pulsars that were missed by FFT search codes running on the same data. An early development version of riptide was used to process a portion of the SUPERB survey (Keane et al. 2018). This search was highly experimental for several reasons: the pipeline used a crude peak detection algorithm to analyse periodograms, which may have resulted in missed detections, and employed a low number of phase bins (250) even at the longest search periods, which clearly limited its ability to find short duty cycle pulsars similar to the 23.5-s PSR J0250+5854 (Tan et al. 2018). Furthermore, the accuracy of the candidate selection algorithm employed was limited compared to the one running on the candidates produced by the standard FFT search. In spite of these limitations, nine previously unknown pulsars missed in the FFT analysis were discovered, with periods ranging from 373 ms to 2.4 s and duty cycles larger than one per cent. Timing solutions for these discoveries are presented in Spiewak et al. (2020). A new, deeper FFA search of the entire SUPERB survey and of several other Parkes archival surveys will be performed soon (Morello et al., in preparation).

More recently, riptide was used to discover the previously unknown pulsar PSR J0043–73, with a 937-ms period, in a deep targeted search of the Small Magellanic Cloud (Titus et al. 2019). This source was not detected by the FFT-based presto running on the same data, although it should be noted that, to save computing time, presto was only allowed to sum up to 8 harmonics. In the same data, the known pulsar PSR J0131–7310, with a 348-ms period, was also detected exclusively by riptide. Furthermore, a complete re-processing of the LOFAR Tied-Array All-Sky Survey (Sanidas et al. 2019) is currently in progress, in which riptide is being run in parallel to presto. The results will be reported in a future paper (Bezuidenhout et al., in preparation).

The fact that riptide has discovered a number of pulsars with relatively short periods (less than a second), in a regime where the FFA was previously thought to bring no substantial benefits, is both encouraging and consistent with the theoretical sensitivity analysis presented here. Any difference in the number of pulsars detected between FFA and FFT search codes may result not only from the higher intrinsic sensitivity of the FFA, which we have demonstrated in this paper, but also from other implementation details, such as low-frequency noise and interference mitigation schemes, which may be of equal if not higher importance. A thorough empirical comparison between riptide and other FFT search codes (presto in particular) on real data will be performed in the aforementioned upcoming publications.

7 CONCLUSION

We have analytically shown that a search procedure combining an FFA with matched filtering is optimally sensitive to periodic signals in the presence of uncorrelated Gaussian noise. In particular, it is more sensitive than the standard FFT technique with incoherent harmonic summing to all periodic signals. We have defined a search efficiency term to rigorously compare the sensitivities of pulsar searching procedures; the search efficiency reaches its optimal value of 1 when phase-coherently folding the input data and correlating the output with a matched filter reproducing the exact shape of the signal. We have estimated the practical efficiency of both an FFA search (0.93) and of an FFT search with incoherent harmonic summing (0.70 for the median pulsar duty cycle). The efficiency of the FFT+IHS method is even lower for shorter pulse duty cycles δ; for example, it is at least a factor of 4 less sensitive than the FFA for δ = 0.1 per cent. The advantage of the FFA could be even greater in practice on long-period signals, due to how FFT search codes mitigate the effects of low-frequency noise, although this must be investigated further.

There are two main consequences. First, a portion of the pulsar phase space has not been properly searched in nearly all radio pulsar surveys, and the inclusion of an FFA search code in all processing pipelines should now become standard practice; for that purpose, we have made our FFA implementation, riptide, publicly available. Secondly, we have discussed how the search efficiency term must be properly taken into account when trying to infer properties of the general pulsar population from the results of a pulsar search survey; we have warned against a potential misuse of the radiometer equation in which the search efficiency is implicitly assumed to be equal to 1, even for an incoherent and thus less efficient search method. This leads to a significant overestimate of the sensitivity of a survey, and could have important consequences for pulsar population synthesis studies, which may also need to take into account the duty cycle dependence of the standard FFT+IHS method's efficiency.

ACKNOWLEDGEMENTS

VM and BWS acknowledge funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 694745). Code benchmarks were performed on the OzSTAR national facility at Swinburne University of Technology. OzSTAR is funded by Swinburne and the National Collaborative Research Infrastructure Strategy (NCRIS). VM thanks Greg Ashton, Scott Ransom, Anne Archibald, and Kendrick Smith for useful discussions, and the organizers of the IAU symposium ‘Pulsar Astrophysics – The Next 50 Years’ for making some of these discussions possible in the first place. Finally, we thank the anonymous referee for helping considerably improve the clarity of the paper.

DATA AVAILABILITY

The discovery observation of PSR J2251–3711, used to demonstrate the peak finding algorithm implemented in riptide (Figs 5 and 6), as well as the benchmark data (Fig. 7) will be shared on reasonable request to the corresponding author.

Footnotes

2. Groth (1975) provides, with a different formalism, series expansions for the PDF and CDF of this distribution, but does not mention its name.

3. I.e. $1/\sqrt{2}$; the reason is that unit S/N pulses have unit square sum, quite an important detail of this paper.

6. Although some authors include other losses in β, which is relevant to the discussion here; see below.

REFERENCES

Burns W. R., Clark B. G., 1969, A&A, 2, 280

Cameron A. D., Barr E. D., Champion D. J., Kramer M., Zhu W. W., 2017, MNRAS, 468, 1994

Casella G., Berger R., 2002, Statistical Inference. Cengage Learning, Boston, MA

Cooley J. W., Tukey J. W., 1965, Math. Comput., 19, 297

Cordes J. M., McLaughlin M. A., 2003, ApJ, 596, 1142

Cordes J. M. et al., 2006, ApJ, 637, 446

Crawford F., Lorimer D. R., Devour B. M., Takacs B. P., Kondratiev V. I., 2009, ApJ, 696, 574

Dewey R. J., Taylor J. H., Weisberg J. M., Stokes G. H., 1985, ApJ, 294, L25

Faulkner A. J. et al., 2004, MNRAS, 355, 147

Gallagher N., Wise G., 1981, IEEE Trans. Acoust. Speech Signal Process., 29, 1136

Groth E. J., 1975, ApJS, 29, 285

Hotan A. W., van Straten W., Manchester R. N., 2004, PASA, 21, 302

Israel G. L., Stella L., 1996, ApJ, 468, 369

Johnston H. M., Kulkarni S. R., 1991, ApJ, 368, 504

Johnston S., Lyne A. G., Manchester R. N., Kniffen D. A., D'Amico N., Lim J., Ashworth M., 1992, MNRAS, 255, 401

Kashyap V. L., van Dyk D. A., Connors A., Freeman P. E., Siemiginowska A., Xu J., Zezas A., 2010, ApJ, 719, 900

Kay S. M., 1993, Fundamentals of Statistical Signal Processing. Prentice Hall PTR, Upper Saddle River, NJ

Keane E. F. et al., 2018, MNRAS, 473, 116

Keith M. J. et al., 2010, MNRAS, 409, 619

Kondratiev V. I., McLaughlin M. A., Lorimer D. R., Burgay M., Possenti A., Turolla R., Popov S. B., Zane S., 2009, ApJ, 702, 692

Korpela E., Werthimer D., Anderson D., Cobb J., Leboisky M., 2001, Comput. Sci. Eng., 3, 78

Lazarus P. et al., 2015, ApJ, 812, 81

Lorimer D. R., 2011, Astrophysics Source Code Library, record ascl:1107.016

Lorimer D. R., Kramer M., 2004, Handbook of Pulsar Astronomy. Cambridge University Press, Cambridge, UK

Lyon R. J., Stappers B. W., Cooper S., Brooke J. M., Knowles J. D., 2016, MNRAS, 459, 1104

Manchester R. N. et al., 2001, MNRAS, 328, 17

Manchester R. N., Hobbs G. B., Teoh A., Hobbs M., 2005, AJ, 129, 1993

McLaughlin M. A., Cordes J. M., 2003, ApJ, 596, 982

Morello V. et al., 2020, MNRAS, 493, 1165

Parent E. et al., 2018, ApJ, 861, 44

Petigura E. A., Marcy G. W., Howard A. W., 2013, ApJ, 770, 69

Rankin J. M., 1983, ApJ, 274, 333

Ransom S. M., Eikenberry S. S., Middleditch J., 2002, AJ, 124, 1788

Sanidas S. et al., 2019, A&A, 626, A104

Scholz P. et al., 2016, ApJ, 833, 177

Spiewak R. et al., 2020, MNRAS, 496, 4836

Staelin D. H., 1969, Proc. IEEE, 57, 724

Stovall K. et al., 2014, ApJ, 791, 67

Tan C. M. et al., 2018, ApJ, 866, 54

Titus N. et al., 2019, MNRAS, 487, 4332

van Heerden E., Karastergiou A., Roberts S. J., 2017, MNRAS, 467, 1661

Vaughan B. A. et al., 1994, ApJ, 435, 362

Virtanen P. et al., 2019, Nature Methods, 17, 261

Yu M., 2018, ApJ, 868, 8

APPENDIX A: MATHEMATICAL DERIVATIONS

A1 The likelihood-ratio test for the presence of a signal with known shape

In the context of the hypothesis testing problem of Section 3, let us write the log-likelihood function of the data $\mathbf{x}$. Recall that we have modelled the data as the sum of a known signal $\mathbf{s}$, normalized to unit square sum and scaled by an amplitude parameter a, and of independent samples of normally distributed noise $\mathbf{w}$ (equation 7). The data $\mathbf{x}$ follow a multivariate Gaussian distribution with:

  • mean $\mathbf{\mu} = a \mathbf{s}$

  • covariance matrix equal to the identity matrix, because the noise samples $w_i$ are assumed to be mutually independent, and thus uncorrelated

This means that the log-likelihood function of the parameter a given the data $\mathbf{x}$ is

$\mathcal{L}(a|\mathbf{x}) = -\frac{N}{2}\ln(2\pi) - \frac{1}{2}\left\Vert\mathbf{x}\right\Vert^2 + a\,(\mathbf{x}\cdot\mathbf{s}) - \frac{a^2}{2}$    (A1)

where N is the length of the vector $\mathbf{x}$, and we have used the fact that $\Vert\mathbf{s}\Vert^2 = 1$. We then apply the definition of the likelihood-ratio test to our hypothesis testing problem, recalling that the null hypothesis is a = 0:

$\ln\Lambda(\mathbf{x}) = \max_{a \ge 0}\, \mathcal{L}(a|\mathbf{x}) - \mathcal{L}(0|\mathbf{x})$    (A2)

Differentiating equation (A1) with respect to a shows that $\mathcal{L}(a|\mathbf{x})$ reaches its maximum for $a^* = \mathbf{x}\cdot\mathbf{s}$. This also means that the test statistic $Z(\mathbf{x}) = \mathbf{x}\cdot\mathbf{s}$ is, by definition, the maximum-likelihood estimator of the parameter a. From there, the compact expression of equation (13) for $\ln\Lambda(\mathbf{x})$ follows.

A2 Optimality of the Z-statistic

Here, we show, using the Karlin–Rubin theorem as formulated in Casella & Berger (2002), that a test based on the Z-statistic is the most powerful test for the presence of the signal $\mathbf{s}$ in the data, regardless of the significance level α. Recall that our two hypotheses involving the signal amplitude parameter a are H0: a = 0 and H1: a > 0. The theorem requires the following two conditions to be met:

  • Z is a sufficient statistic for the parameter a; this means the test statistic Z captures all the information that the data $\mathbf{x}$ can provide about the parameter a.

  • Z has a monotone likelihood ratio; that is, if g(z|a) is the probability density function (PDF) of Z, then for any a2 > a1 the ratio g(z|a2)/g(z|a1) is a monotone function of z. Intuitively, this expresses the idea that the larger the test statistic Z, the more likely the alternative hypothesis is to be true.

Under the conditions above the theorem states that the following test is most powerful at significance level α:

  • Choose H1 if Z ≥ η with η such that P(Z ≥ η|H0) = α. The value of η is given by equation (16).

  • Choose H0 otherwise.

We can show that both conditions above are indeed met. First, Z is a sufficient statistic for a by application of the Fisher–Neyman factorization theorem (e.g. chapter 6 of Casella & Berger 2002); the joint PDF of the data $f(\mathbf{x}|a)$ can be expressed as follows, knowing that the data follow the multivariate Gaussian distribution stated in Section A1:

$f(\mathbf{x}|a) = (2\pi)^{-N/2} \exp\left(-\frac{1}{2}\left\Vert\mathbf{x}\right\Vert^2\right) \exp\left(a\,(\mathbf{x}\cdot\mathbf{s}) - \frac{a^2}{2}\right)$    (A3)

Here, $f(\mathbf{x}|a)$ is the product of two terms: one that depends on the data but not on the parameter a, and another (the rightmost exponential) that does depend on the parameter a, but depends on the data only through the Z-statistic $Z(\mathbf{x}) = \mathbf{x}\cdot\mathbf{s}$. The factorization theorem thus applies and Z is a sufficient statistic for a. Secondly, we have previously shown (equation 15) that Z follows a normal distribution with mean a and unit variance, and thus the PDF of Z is given by $g(z|a) = \exp(-(z-a)^2/2)/\sqrt{2\pi}$. For any a2 > a1 we have

$\frac{g(z|a_2)}{g(z|a_1)} = \exp\left[(a_2 - a_1)\, z - \frac{a_2^2 - a_1^2}{2}\right]$    (A4)

which is a monotonically increasing function of z, completing the demonstration.

A3 Efficiency of boxcar matched filters in a blind search for Gaussian pulses

Although in a practical FFA search the folded profiles are discretely sampled, we will assume that the width of a phase bin is much smaller than the width of the Gaussian-shaped pulse being searched for. This enables us to work with continuous functions instead, and to derive analytical expressions for the search efficiency when using a family of boxcar pulse templates. We model the true pulse shape by a Gaussian function of the form

$g(\phi) = A \exp\left(-\frac{\phi^2}{2\sigma^2}\right)$    (A5)

where ϕ represents pulse phase in units of the pulse period, with ϕ ∈ [−1/2, +1/2], and σ is the Gaussian's standard deviation in the same units. A is a normalization factor chosen so that, in line with the convention of Section 3, the square sum of g over one pulse period is equal to 1. This is written as

$\int_{-1/2}^{+1/2} g^2(\phi)\, {\rm d}\phi = 1$    (A6)

If we further assume σ ≪ 1, then the function is essentially zero outside the integration interval, and replacing the integration bounds by −∞ and +∞ does not change the value of the integral. Doing so in the equation above yields $A^2\sqrt{\pi\sigma^2} = 1$, and our properly normalized Gaussian pulse is therefore

$g(\phi) = \left(\pi\sigma^2\right)^{-1/4} \exp\left(-\frac{\phi^2}{2\sigma^2}\right)$    (A7)
Our search template is a boxcar filter of width w, also expressed in units of the pulse period:

$t(\phi) = \begin{cases} w^{-1/2} & {\rm if}\ |\phi| \le w/2 \\ 0 & {\rm otherwise} \end{cases}$    (A8)

where the unit square sum normalization has also been applied. The efficiency $\mathcal{E}$ (equation 22) is equal to the scalar product of g with t, which is a function of w:

$\mathcal{E}(w) = \int_{-w/2}^{+w/2} g(\phi)\, t(\phi)\, {\rm d}\phi = \sqrt{2}\, \pi^{1/4} \sqrt{\frac{\sigma}{w}}\; {\rm erf}\!\left(\frac{w}{2\sqrt{2}\,\sigma}\right)$    (A9)

where the integration range has been limited to [−w/2, +w/2], since t is uniformly zero outside of these bounds, and the integral has been expressed in terms of the error function erf. For increased clarity, let us define the half-width of the boxcar in units of sigma, $u = \frac{w}{2\sigma}$. We can then write the more compact expression

$\mathcal{E}(u) = \pi^{1/4}\, \frac{{\rm erf}\left(u/\sqrt{2}\right)}{\sqrt{u}}$    (A10)
We find numerically that $\mathcal{E}$ reaches its maximum for u* = 1.40 and that $\mathcal{E}(u^*) = 0.943$. In other words, the boxcar filter that optimally matches a Gaussian function with standard deviation σ has a width $w^* = 2.80\sigma = 1.19\,{\rm FWHM}$ and recovers 94.3 per cent of the true S/N of the pulse. In a blind search, σ is unknown; however, a good strategy is to try a one-dimensional grid of boxcar widths that are power-of-two multiples of some minimum value (one phase bin in practice). In this case, any interval of the form [w, 2w] is guaranteed to contain a grid point; in particular, $[w^*/\sqrt{2}, w^*\sqrt{2}]$ is not only centred (in log-scale) on the optimum w*, but also always contains the grid point closest (in log-scale as well) to w*. A good estimate of the average efficiency of the search $\bar{\mathcal{E}}$ is therefore

$\bar{\mathcal{E}} = \frac{\sqrt{2}}{u^*} \int_{u^*/\sqrt{2}}^{u^*\sqrt{2}} \mathcal{E}(u)\, {\rm d}u \approx 0.93$    (A11)

where the leading factor $\frac{\sqrt{2}}{u^*}$ is the inverse of the length of the integration interval. The worst case is reached for $u = u^*\sqrt{2}$, where $\mathcal{E} = 0.901$. Two major caveats to keep in mind are that the search efficiency may be significantly lower for pulses with multiple peaks, and that the above derivation approximates discrete folded pulse profiles and templates with continuous functions, which is valid only if the number of phase bins is large enough for the pulse to be well resolved.
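These numbers are straightforward to reproduce with SciPy; the short sketch below evaluates equation (A10), recovering u* ≈ 1.40, the 0.943 peak efficiency, the ≈0.93 average of equation (A11), and the 0.901 worst case.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar
from scipy.special import erf

def efficiency(u):
    """Equation (A10): boxcar vs. Gaussian pulse efficiency, u = w / (2 sigma)."""
    return np.pi**0.25 * erf(u / np.sqrt(2)) / np.sqrt(u)

u_star = minimize_scalar(lambda u: -efficiency(u), bounds=(0.1, 10.0),
                         method="bounded").x
print(u_star, efficiency(u_star))       # ~1.40, ~0.943

# Average efficiency over [u*/sqrt(2), u*sqrt(2)], equation (A11)
integral, _ = quad(efficiency, u_star / np.sqrt(2), u_star * np.sqrt(2))
print(np.sqrt(2) / u_star * integral)   # ~0.93
print(efficiency(u_star * np.sqrt(2)))  # worst case, ~0.901
```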

We note that the above derivation is directly applicable to single-pulse searches as well. Our 0.943 peak efficiency factor differs from the value of 0.868 given in the appendix of McLaughlin & Cordes (2003); although they do not provide details of how their value is derived, the discrepancy could arise because they took into account a preliminary downsampling of the data to a time resolution comparable to the width of the pulse of interest, which results in a loss of S/N. We have instead considered the optimal case where the time (or phase) sampling interval is much smaller than the pulse width.

A4 Variance of uncorrelated Gaussian noise samples downsampled by a non-integer factor

Here, we consider a sequence of time samples $x_i$ made of normally distributed, uncorrelated Gaussian noise, with arbitrary mean and unit variance. The operation of downsampling by a non-integer factor f = n + r, where n = ⌊f⌋ and 0 < r < 1, is illustrated in Fig. A1. A downsampling window gathers and adds f consecutive input samples into an output sample, before moving forward by f input samples and repeating the process, creating a new time series with a time resolution reduced by a factor f. In the addition, samples that only partially overlap with the window are weighted by their overlap fraction. As the window moves forward, the 'initial phase' ϕ of the downsampling window relative to the start of the first sample with which it overlaps varies; the variance of the output sample created is a function of ϕ. Indeed, one can distinguish two cases:

  • Case A: 0 ≤ ϕ < 1 − r. The window gathers 1 − ϕ partial sample on the left, n − 1 full samples in the middle, and r + ϕ partial sample on the right.

  • Case B: 1 − r ≤ ϕ < 1. The window gathers 1 − ϕ partial sample on the left, n full samples in the middle, and r + ϕ − 1 partial sample on the right.

Figure A1. An illustration of downsampling a time series by a non-integer factor f = n + r, where n = ⌊f⌋ and 0 < r < 1.

The variance of a single output sample σ² as a function of ϕ is thus

$\sigma^2(\phi) = \begin{cases} (1-\phi)^2 + (n-1) + (r+\phi)^2 & {\rm if}\ 0 \le \phi < 1-r \\ (1-\phi)^2 + n + (r+\phi-1)^2 & {\rm if}\ 1-r \le \phi < 1 \end{cases}$    (A12)

where we have used the fact that the input samples are uncorrelated, the general property ${\rm Var}[aX] = a^2\,{\rm Var}[X]$, and the fact that variances of uncorrelated variables add linearly. If f is an irrational number, then over a long input time series ϕ takes values that can be considered uniformly distributed between 0 and 1. Therefore, the average variance of the output samples $\overline{\sigma^2}$ is

$\overline{\sigma^2} = \int_0^1 \sigma^2(\phi)\, {\rm d}\phi = f - \frac{1}{3}$    (A13)

which we have verified numerically by downsampling artificial Gaussian noise. It is interesting to note that if f is a rational number with a small denominator, ϕ cycles through a small set of rational values and the above is not exactly correct; we neglect this edge case in riptide, where we apply the above equation for all non-integer f.