ABSTRACT

The cosmological 21 cm signal is one of the most promising avenues to study the Epoch of Reionization. One class of experiments aiming to detect this signal is global signal experiments measuring the sky-averaged 21 cm brightness temperature as a function of frequency. A crucial step in the interpretation and analysis of such measurements is separating foreground contributions from the remainder of the signal, requiring accurate models for both components. Current models for the signal (non-foreground) component, which may contain cosmological and systematic contributions, are incomplete and unable to capture the full signal. We propose two new methods for extracting this component from the data: First, we employ a foreground-orthogonal Gaussian Process to extract the part of the signal that cannot be explained by the foregrounds. Second, we use a FlexKnot parametrization to model the full signal component in a free-form manner, not assuming any particular shape or functional form. This method uses Bayesian model selection to find the simplest signal that can explain the data. We test our methods on both synthetic data and publicly available EDGES low-band data. We find that the Gaussian Process can clearly capture the foreground-orthogonal signal component of both data sets. The FlexKnot method correctly recovers the full shape of the input signal used in the synthetic data and yields a multimodal distribution of different signal shapes that can explain the EDGES observations.

1 INTRODUCTION

21 cm cosmology is the endeavour to use the hyperfine transition of neutral hydrogen to probe the Universe at high redshifts (5 ≲ z ≲ 30), from the Dark Ages, when the Universe was neutral, via Cosmic Dawn, when the first stars formed, to the Epoch of Reionization, when early galaxies ionized the intergalactic medium (IGM). This is one of the last unexplored frontiers of cosmology, and measurements of the 21 cm signal of neutral hydrogen will allow us to study the properties of the IGM and the first stars and galaxies (Madau, Meiksin & Rees 1997; Furlanetto, Oh & Briggs 2006; Pritchard & Loeb 2012; Barkana 2016).

The 21 cm signal is the emission or absorption line corresponding to the hyperfine transition of neutral hydrogen. When the spin temperature of the hydrogen atoms is higher than the background radiation temperature, the atoms emit, creating a peak in the differential brightness temperature. When it is lower, the hydrogen atoms absorb the background radiation, causing a trough in the spectrum. We generally expect a trough during Cosmic Dawn and potentially a peak towards lower redshifts when the IGM is heated above the background temperature. We can observe this signal as a distortion in a smooth radio background (typically assumed to be the Cosmic Microwave Background, CMB), at a frequency of $\nu_{21} = 1420.4\,\mathrm{MHz} / (1+z)$ depending on the redshift $z$ of the corresponding hydrogen cloud.
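For concreteness, the mapping between redshift and observed frequency can be written as a short helper (an illustrative sketch; the constant is the rest-frame 21 cm line frequency):

```python
# Convert between redshift and observed 21 cm frequency,
# nu_obs = nu_21 / (1 + z) with nu_21 = 1420.4 MHz (rest frame).
NU_21 = 1420.4  # MHz, rest-frame 21 cm frequency

def observed_frequency(z):
    """Observed frequency (MHz) of 21 cm radiation from redshift z."""
    return NU_21 / (1.0 + z)

def redshift(nu_obs):
    """Redshift corresponding to an observed frequency nu_obs (MHz)."""
    return NU_21 / nu_obs - 1.0
```

For the 50–100 MHz band considered below, this corresponds to redshifts from roughly z ≈ 27 down to z ≈ 13.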

The two main types of 21 cm experiments are global signal experiments, which measure the sky-averaged 21 cm signal, and interferometers, which measure the spatial fluctuations of the signal. Global signal experiments include EDGES (Experiment to Detect the Global EoR Signature; Bowman, Rogers & Hewitt 2008), SARAS (Shaped Antenna measurement of the background RAdio Spectrum; Patra et al. 2013), SCI-HI (Sonda Cosmológica de las Islas para la Detección de Hidrógeno Neutro; Voytek et al. 2014), BIGHORNS (Broadband Instrument for Global HydrOgen ReioNisation Signal; Sokolowski et al. 2015), LEDA (Large Aperture Experiment to Detect the Dark Ages; Price et al. 2018; Garsden et al. 2021), PRIZM (Probing Radio Intensity at high-Z from Marion; Philip et al. 2019), MIST (Mapper of the IGM Spin Temperature; Monsalve et al. 2023), and REACH (Radio Experiment for the Analysis of Cosmic Hydrogen; de Lera Acedo et al. 2022). Of particular interest are EDGES, which has claimed a detection of the 21 cm signal (Bowman et al. 2018), and SARAS 3, which reported a non-detection (Singh et al. 2022) of the signal claimed by EDGES. Future experiments (e.g. REACH and MIST) are expected to provide independent global signal measurements to answer this question. Measurements of fluctuations in the 21 cm signal have been performed by several interferometers. While no detection has been reported so far, these measurements have set upper limits on the 21 cm power spectrum. The strongest limits so far have been set by the HERA telescope (Hydrogen Epoch of Reionization Array; Abdurashidova et al. 2022; The HERA Collaboration 2022); other interferometers include LOFAR (Low-Frequency Array; Patil et al. 2017; Mertens et al. 2020), MWA (Murchison Widefield Array; Dillon et al. 2014; Trott et al. 2020), NenuFAR (New Extension in Nançay Upgrading LOFAR; Mertens, Semelin & Koopmans 2021), PAPER (Precision Array for Probing the Epoch of Reionization; Kolopanis et al. 
2019), and GMRT (Giant Metrewave Radio Telescope; Paciga et al. 2013). Although fluctuations in the cosmological signal contain unique information about the first sources of light and the underlying cosmology, in this paper we choose to focus on the global 21 cm signal measurement.

The 21 cm global signal is challenging to observe. Not only does a measurement require sufficient sensitivity (the signal reaches a contrast of up to $280\,\mathrm{mK}$, Cohen et al. 2017, under the assumption that the background is the CMB), but it is also contaminated by astrophysical foreground sources (Chandrasekhar 1960; Rogers et al. 2015), radio frequency interference (Rogers et al. 2005), ionospheric effects (Vedantham et al. 2014; Datta et al. 2016; Shen et al. 2021, 2022), and instrumental effects such as beam chromaticity (e.g. Monsalve et al. 2017b; Anstey, de Lera Acedo & Handley 2021) or analogue receiver properties (Jishnu Nambissan et al. 2021; de Lera Acedo et al. 2022). While instrumental challenges can be addressed by careful design and calibration (Monsalve et al. 2017a), the unknown astrophysical foreground sources have to be constrained jointly with the signal. Thus, we fit the measured sky temperature as a sum of the foreground and signal components, $T_{\rm sky}(\nu) = T_{\rm fg}(\nu) + T_{\rm signal}(\nu)$, following Bowman et al. (2018).

The claimed detection of the 21 cm signal by Bowman et al. (2018) showed a much stronger absorption trough than expected from physical models that assume the CMB as the background radiation and standard astrophysical heating and cooling processes (Cohen et al. 2017; Reis, Fialkov & Barkana 2021). This has sparked discussion about the choice of foreground model (Hills et al. 2018; Singh & Subrahmanyan 2019; Bevins et al. 2021), the question of instrumental calibration (Sims & Pober 2020), and the possibility of non-standard cosmological signals (Barkana 2018; Ewall-Wice et al. 2018; Feng & Holder 2018; Muñoz & Loeb 2018; Slatyer & Wu 2018; Fialkov & Barkana 2019; Mirocha & Furlanetto 2019). However, all previous work either assumed the signal to have a particular shape that does not necessarily correspond to the actual signal (including the original work by Bowman et al. 2018), or used physical models, typically based on simulations (such as Furlanetto, Zaldarriaga & Hernquist 2004; Iliev et al. 2006; McQuinn et al. 2007; Mesinger, Furlanetto & Cen 2011; Fialkov et al. 2014a; Ross et al. 2017; Semelin et al. 2017) that are limited by our understanding of the underlying astrophysics (Majumdar et al. 2014). This mismatch between the signal model and the true signal will affect the quality of the foreground separation and prevent the extraction of the correct signal from the data.

In this work, we propose a new way to model the signal component that is agnostic to the precise form of the signal, allowing for an unexpected cosmological signal shape or an undetected systematic contribution. From this point on, we refer to any non-foreground part of the measurement as the signal component, and do not address the question of separating the cosmological signal from systematics. This question is considered in the forthcoming work by Shen et al. (2023). Concretely, we propose two techniques to describe this signal component: a non-parametric Gaussian Process, and a model-independent FlexKnot parametrization. These free-form approaches allow us to find out what (not necessarily cosmological) signal is hidden in the data. This signal can then be interpreted, either to learn about systematics in the data or about astrophysical processes. The latter will, however, require physical models to connect the observed signal properties to physical properties, re-introducing model-dependence.

Gaussian Processes (GPs; Rasmussen & Williams 2006) are a standard tool for modelling functions in a non-parametric fashion. Taken as a prior over smooth functions, GP regression enables Bayesian inference over a regression function with an unknown parametric form and has successfully been used in a wide array of applications (Almosallam et al. 2016; Ghosh et al. 2020; Li et al. 2021; Mertens, Bobin & Carucci 2023). Utilizing a GP to model the unknown signal allows us to place a prior distribution over the signal without assuming a known shape. Gaussian Processes have a small number of parameters specifying the mean and covariance, but these only encode priors on the signal smoothness and strength and do not impose any functional form on the signal. However, since GPs are very flexible function approximators, they need to be sufficiently constrained to avoid confounding with the foreground model. We enforce a fairly strict constraint to achieve this, orthogonalizing the GP with respect to the foreground fit, but other options are possible as well.

In this work, we utilize the concept of a foreground-orthogonal basis, which has been previously considered in the literature (Liu & Tegmark 2012; Vedantham et al. 2014; Switzer & Liu 2014; Tauscher et al. 2018; Tauscher, Rapetti & Burns 2020; Rapetti et al. 2020; Liu & Shaw 2020; Bassett et al. 2021). We say a signal function is orthogonal to the foregrounds if no amount of foreground terms can reduce the norm of the signal. If we consider the foreground terms and signal as vectors containing their values at each of the observed frequencies, then this orthogonality is equivalent to the signal vector being orthogonal (zero dot product) to all foreground term vectors. This concept is particularly useful here because the foreground-orthogonal component of a measurement is unambiguous (although still dependent on the foreground model) and has to be fully contained within the signal component (which could be of cosmological origin or caused by systematics).
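This dot-product notion of orthogonality can be sketched numerically. Below, a hypothetical signal vector is decomposed against an assumed design matrix of five power-law foreground terms (exponents $n - 3.5$, following the linearized model of Bowman et al. 2018); the least-squares residual is the foreground-orthogonal component:

```python
import numpy as np

rng = np.random.default_rng(0)
nu = np.linspace(50.0, 100.0, 123)   # observed frequencies (MHz)
nu_c = 75.0                          # reference frequency (MHz)

# Foreground term vectors, one row per term (assumed power-law exponents).
F = np.stack([(nu / nu_c) ** (n - 3.5) for n in range(1, 6)])  # 5 x 123

signal = rng.normal(size=nu.size)    # stand-in for some signal vector

# Least-squares fit of the foreground terms to the signal; the residual is
# the foreground-orthogonal component: no combination of foreground terms
# can reduce its norm any further.
coeffs, *_ = np.linalg.lstsq(F.T, signal, rcond=None)
residual = signal - F.T @ coeffs

# Orthogonality: the dot product with every foreground term vector is
# zero up to numerical precision.
max_dot = np.abs(F @ residual).max()
```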

In addition to the Gaussian Process, we explore a second method to fit the signal in a model-independent way. The FlexKnot parametrization is based on a free-form method that has been introduced by Bridges et al. (2009) for modelling the primordial power spectrum. It has been used in the CMB community for several years, constraining the primordial power spectrum and dark energy equation of state (Vázquez et al. 2012a, b, 2013; Abazajian et al. 2014; Aslanyan et al. 2014; Hee et al. 2016; Planck Collaboration XX 2016; Hee et al. 2017; Finelli et al. 2018; Handley et al. 2019; Planck Collaboration I 2020a; Planck Collaboration X 2020b; Escamilla & Vazquez 2023). It was also used to model the pressure profile of galaxy clusters (Olamaie et al. 2018). Millea & Bouchet (2018) adapted the method for parametrizing the reionization history, giving it the name FlexKnot. More recently the use of FlexKnot has been extended by Heimersheim et al. (2022) to explore reionization history constraints from hypothetical high-redshift Fast Radio Bursts. Their implementation forms the basis of the implementation adopted in this work.

The FlexKnot method differs from the Gaussian Process in that it is a parametric method and allows for a variable degree of complexity. The parametrization works by specifying a set of interpolation points (knots) as x and y coordinates (the parameters) that allow us to describe the signal function by interpolating between these knots. The number of interpolation knots is itself also a parameter, leading to the key difference between the Gaussian Process and FlexKnot methods: we can compare interpolations of different complexity levels and use Bayesian model selection to find the simplest signal that can explain the observations. At the same time, we still obtain a free-form signal fit that is not limited to a particular shape.

With this paper, we introduce the concept of a free-form model for the global 21 cm signal. We highlight what the Gaussian Process and FlexKnot methods can and cannot do, and demonstrate their use on both synthetic data, and the EDGES low-band measurements from Bowman et al. (2018). We describe both methods in Section 2, present the results for both cases in Section 3, discuss advantages and limitations in Section 4, and give our conclusions in Section 5.

2 METHOD

The quantity observed by a global 21 cm signal experiment is the sky-averaged brightness temperature as a function of frequency, $T_{\rm sky}(\nu)$. The actual measurement is a spectral intensity but it is typically expressed as the corresponding blackbody temperature. The sky temperature is dominated by contributions from foregrounds but contains a small cosmological signal that we are interested in. Additionally, the measured temperature may contain systematic contributions. In this work, we focus on separating the foregrounds from the remaining signal, $T_{\rm signal}(\nu)$, and will not distinguish between cosmological and systematic contributions (this is addressed in the forthcoming work by Shen et al. 2023). Thus, for our purposes, the sky-averaged observed brightness temperature is

$$T_{\rm sky}(\nu) = T_{\rm fg}(\nu) + T_{\rm signal}(\nu), \qquad (1)$$

where $T_{\rm fg}(\nu)$ is the foreground component and $T_{\rm signal}(\nu)$ contains the remaining contributions (including the cosmological signal and possible systematics).

In Section 2.1, we briefly describe the foreground model we adopt from Bowman et al. (2018). Section 2.2 describes the synthetic data we generate to validate our methods and Section 2.3 summarizes the likelihood function adopted from Bowman et al. (2018). Finally, we give a detailed description of the Gaussian Process and FlexKnot methods in Sections 2.4 and 2.5, respectively.

2.1 Foreground model

The core advances discussed in this work concern the modelling of the signal component, hence we only briefly describe the foreground model and use a simple model derived in Bowman et al. (2018). However, our signal modelling approach can be used with any parametrized foreground model (e.g. the alternatives considered in Hills et al. 2018).

For concreteness, we assume the following parametrized foreground model. It describes the foreground contribution to the sky temperature as a sum of polynomial terms

$$T_{\rm fg}(\nu) = \sum_{n=1}^{5} a_n \left(\frac{\nu}{\nu_{\rm c}}\right)^{n - 3.5}, \qquad (2)$$

where $a_n$ are free parameters and $\nu_{\rm c}$ is set to $75\,\mathrm{MHz}$. We note that this model corresponds to equation (2) rather than (1) of Bowman et al. (2018), but Bowman et al. (2018) themselves report consistent results for both models. We assume wide and nearly uninformative priors on the foreground parameters, as we discuss in Sections 2.4.2 and 2.5.3 for each method.
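As a minimal sketch, the foreground model can be evaluated as follows; the exponents $n - 3.5$ assume the five power-law terms of the linearized model (equation 2 of Bowman et al. 2018, reindexed to the labels $a_1 \ldots a_5$), and the parameter values are the illustrative ones used for the synthetic data in Section 2.2:

```python
import numpy as np

NU_C = 75.0  # MHz, reference frequency

def t_foreground(nu, a):
    """Foreground model: sum of five power-law terms a_n * (nu/nu_c)^(n-3.5)
    for n = 1..5 (assumed form, after eq. 2 of Bowman et al. 2018)."""
    nu = np.asarray(nu, dtype=float)
    return sum(a_n * (nu / NU_C) ** (n - 3.5)
               for n, a_n in enumerate(a, start=1))

# Parameter values quoted for the synthetic data in Section 2.2 (in K).
a = [1570.0, 620.0, -1000.0, 700.0, -175.0]
nu = np.linspace(50.0, 100.0, 123)
T_fg = t_foreground(nu, a)  # steeply decreasing, foreground-dominated spectrum
```

At $\nu = \nu_{\rm c}$ all terms are unity, so the model evaluates to the sum of the coefficients there.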

2.2 Synthetic data

We generate a synthetic data set with a known ground truth to validate our methods. We use the foreground model from equation (2), and a signal component corresponding to a flattened Gaussian (from Bowman et al. 2018)

$$T_{\rm signal}(\nu) = -A \left( \frac{1 - \mathrm{e}^{-\tau \mathrm{e}^{B}}}{1 - \mathrm{e}^{-\tau}} \right), \quad B = \frac{4 (\nu - \nu_0)^2}{w^2} \log\!\left[ -\frac{1}{\tau} \log\!\left( \frac{1 + \mathrm{e}^{-\tau}}{2} \right) \right], \qquad (3)$$

with amplitude $A$, central frequency $\nu_0$, full width at half-maximum $w$, and flattening factor $\tau$. Finally, we add Gaussian noise with a standard deviation of $25\,\mathrm{mK}$ to emulate the thermal noise level of the EDGES experiment (Bowman et al. 2018). The synthetic data are generated with foreground parameters $(a_1, a_2, a_3, a_4, a_5) = (1570, 620, -1000, 700, -175)$ and signal parameters $(A, \nu_0, w, \tau) = (500\,\mathrm{mK}, 78\,\mathrm{MHz}, 19\,\mathrm{MHz}, 7)$; both sets were chosen to be similar to those found in Bowman et al. (2018).

Note that we use equation (3) not only to generate the synthetic data (with the parameter values mentioned above) but also, separately, to fit the EDGES low-band data. In the latter case, we fit the parameters of equation (3) in order to reproduce the method of Bowman et al. (2018) as a comparison to our methods. We emphasize that these are two separate uses of equation (3), with different signal parameters and different foregrounds.
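A sketch of the synthetic data generation, assuming the flattened Gaussian profile of Bowman et al. (2018) for the signal; the random seed is an arbitrary choice, and the foreground term (from the foreground model above) is omitted here for brevity:

```python
import numpy as np

def flattened_gaussian(nu, A, nu0, w, tau):
    """Flattened Gaussian absorption profile (form from Bowman et al. 2018):
    depth A, centre nu0, FWHM w, flattening factor tau."""
    B = (4.0 * (nu - nu0) ** 2 / w ** 2
         * np.log(-np.log((1.0 + np.exp(-tau)) / 2.0) / tau))
    return -A * (1.0 - np.exp(-tau * np.exp(B))) / (1.0 - np.exp(-tau))

rng = np.random.default_rng(42)
nu = np.linspace(50.0, 100.0, 123)                                     # MHz
T_signal = flattened_gaussian(nu, A=500.0, nu0=78.0, w=19.0, tau=7.0)  # mK
noise = rng.normal(0.0, 25.0, size=nu.size)                            # 25 mK
T_synth = T_signal + noise  # plus T_fg(nu) from the foreground model (omitted)
```

By construction the profile reaches $-A$ at $\nu_0$ and half of that depth at $\nu_0 \pm w/2$, which is the FWHM property of the parametrization.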

2.3 Likelihood function

To compare a given model $m = T_{\rm fg} + T_{\rm signal}$ (a vector composed of temperature values at the observed frequencies) to the measured data $d$ (a vector of measurements at every frequency), we use a Gaussian likelihood, following Bowman et al. (2018). The likelihood accounts for Gaussian thermal noise with a free parameter $\sigma_{\rm noise}$ that determines the standard deviation:

$$\mathcal{L}(d \mid m, \sigma_{\rm noise}) = \prod_{i} \frac{1}{\sqrt{2\pi\sigma_{\rm noise}^2}} \exp\!\left( -\frac{(d_i - m_i)^2}{2\sigma_{\rm noise}^2} \right). \qquad (4)$$

We use this likelihood for both our synthetic data and the EDGES data. Since the EDGES low-band data are measured in 123 frequency channels $\nu_1, \ldots, \nu_{123}$ (Bowman et al. 2018), the data as well as the signal and foregrounds will be 123-dimensional temperature vectors corresponding to those frequencies.
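In practice one works with the logarithm of this Gaussian likelihood; a minimal sketch:

```python
import numpy as np

def log_likelihood(d, m, sigma_noise):
    """Gaussian log-likelihood of data d given model m = T_fg + T_signal,
    with i.i.d. thermal noise of standard deviation sigma_noise."""
    d, m = np.asarray(d, dtype=float), np.asarray(m, dtype=float)
    n = d.size
    return (-0.5 * n * np.log(2.0 * np.pi * sigma_noise ** 2)
            - 0.5 * np.sum((d - m) ** 2) / sigma_noise ** 2)
```

The log-likelihood is maximized when the model matches the data exactly, and it decreases quadratically with the residuals.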

2.4 Gaussian Process signal model

We now proceed with introducing the methods to represent the signal component, starting with the Gaussian Process approach. GPs are a common way to model smoothly varying functions. A GP is a stochastic process, i.e. a collection of random variables, defined such that the joint distribution of any finite sub-collection is a multivariate normal. Specifically, we consider the continuous Gaussian Process $\mathcal{GP}(\mu(\nu), \kappa(\nu, \nu'))$ to describe the signal at a frequency $\nu$, where $\mu(\nu)$ is the mean function calculated at frequency $\nu$, and $\kappa(\nu, \nu')$ is the covariance function calculated between two frequencies $\nu$ and $\nu'$. We generate the GP signal vector as a finite realization of a Gaussian Process. It follows a multivariate normal distribution $[T_{\rm GP}(\nu_1), \ldots, T_{\rm GP}(\nu_n)]^{\rm T} \sim \mathcal{N}(\boldsymbol{\mu}, K)$, where $\boldsymbol{\mu} = [\mu(\nu_1), \ldots, \mu(\nu_n)]^{\rm T}$ denotes the mean vector constructed by evaluating the mean function at each input location. Correspondingly, the covariance matrix $K$ is defined such that $K_{ij} = \kappa(\nu_i, \nu_j)$ for all frequency pairs $(\nu_i, \nu_j)$. The GP is fully specified by its mean and covariance functions. Of particular importance is the covariance function, often referred to as the kernel of the GP, which controls many features of the resulting function, such as the level of smoothness (correlation) across the input locations.

We choose the mean function to be zero, $\boldsymbol{\mu} = \mathbf{0}$, so as not to bias the signal towards any particular shape. As the covariance function, we choose a simple squared exponential kernel

$$\kappa(\nu, \nu') = \sigma_f^2 \exp\!\left( -\frac{(\nu - \nu')^2}{2\ell^2} \right), \qquad (5)$$

with amplitude $\sigma_f^2$ and length scale $\ell$. The amplitude controls how far the GP samples are allowed to move away from the mean, and the length scale controls the overall correlation length of the samples across frequency space.

In addition to this smooth variation, we allow for thermal noise as described in Section 2.3. Taking equation (4) with $m = T_{\rm fg} + T_{\rm GP}$ yields the corresponding likelihood for observational data $d$. In order to draw samples from the Gaussian Process, we use a formulation that explicitly includes the noise term $\epsilon_i$, i.e. we draw expected measurements at frequency $\nu_i$ from the model $T_{\rm fg}(\nu_i) + T_{\rm GP}(\nu_i) + \epsilon_i$, with $\epsilon_i$ being a Gaussian random variable with zero mean and variance $\sigma_{\rm noise}^2$.
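A finite GP realization under the squared exponential kernel can be sketched as follows (the hyperparameter values and random seed are illustrative choices, not fitted values):

```python
import numpy as np

def sq_exp_kernel(nu, sigma_f, ell):
    """Squared exponential covariance matrix:
    K_ij = sigma_f^2 * exp(-(nu_i - nu_j)^2 / (2 ell^2))."""
    d = nu[:, None] - nu[None, :]
    return sigma_f ** 2 * np.exp(-0.5 * d ** 2 / ell ** 2)

rng = np.random.default_rng(1)
nu = np.linspace(50.0, 100.0, 123)
K = sq_exp_kernel(nu, sigma_f=0.5, ell=15.0)  # illustrative: 0.5 K, 15 MHz

# One finite GP realization: a draw from N(0, K); a small diagonal jitter
# keeps the covariance numerically positive definite.
jitter = 1e-10 * np.eye(nu.size)
T_gp = rng.multivariate_normal(np.zeros(nu.size), K + jitter)

# Adding the explicit thermal noise term epsilon_i (25 mK, in K).
T_model = T_gp + rng.normal(0.0, 0.025, size=nu.size)
```

Larger `ell` produces smoother curves; larger `sigma_f` allows larger excursions from the zero mean.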

2.4.1 Orthogonal Gaussian Process

It should be noted that combining the parametric foreground model with the GP, with no further constraints, yields a solution characterized by degeneracy between the estimates of the foregrounds and the signal. To illustrate this point, consider the signals in Fig. 1. All the curves shown contain the same cosmological signal but differ in the foreground component, and are thus equally preferred by the data (as any “foreground-shaped” addition can be compensated by slightly different foreground parameters). This leads to confounding (Hodges & Reich 2010), as the fit is jointly appropriate, but there is no incentive for the GP to separate the signal from the foregrounds. This degeneracy leads to large uncertainties on both the GP signal and the foreground parameters.

Figure 1. Five different signals that are equivalent up to their foreground-proportional terms. Each of these will have the same likelihood (for different values of the foreground parameters), thus making the signals equally preferred by the data.

To control the degeneracy between the GP and the foreground model, the GP must be penalized away from functional forms that are similar to the foregrounds. One way to do this is through penalization of the kernel hyperparameters $\ell$ and $\sigma_f$, for example a prior choice for $\sigma_f$ that encourages small deviations from the prior mean at zero. That choice, however, could also negatively affect the result, e.g. by discouraging a better-fitting signal that requires larger deviations from the prior mean.

Another way to avoid the degeneracy, which is the main method pursued in this paper, is to fit only the foreground-orthogonal component of the signal rather than the full signal. The foreground-orthogonal part contains less information than the full signal because multiple signals share the same foreground-orthogonal component (e.g. those shown in Fig. 1). However, these are exactly the kinds of signals that cannot be distinguished by the likelihood. Thus, we fit the foreground-orthogonal component, which represents exactly the information that can be extracted from the data alone. In Section 3.1, we will see that the Gaussian Process can very accurately estimate this component of the signal.

The foreground-orthogonal signal fit obtained from the GP is generically useful for signal comparison purposes. In particular, it can be used to compare the data to (e.g. physical or systematic) theory signals by simply comparing their foreground-orthogonal components. This can be done by applying the foreground-orthogonal projection to the theory signal and then comparing the resulting foreground-orthogonal theory to the foreground-orthogonal GP fit. We illustrate the projection in Fig. 2, showing full theory signals (top panel) and their foreground-orthogonal components (bottom panel).

Figure 2. Demonstration of the foreground-orthogonal projection with a range of simulated cosmological signals. The particular signals are chosen for illustrative purposes; the different lines correspond to different settings of the X-ray production in early galaxies ($f_X$ from $10^{-1}$, blue, to $10^3$, yellow; see Fialkov et al. 2014a, Fialkov, Barkana & Visbal 2014b, and Fialkov & Barkana 2014 for details). The top panel shows $T_{\rm signal}$ with the recognizable shape, while the bottom panel shows the foreground-orthogonal component $T_{\rm signal, projected}$ of these signals (note the y-axis scale). We want to stress that we show these cosmological signals without any systematics and only for illustrative purposes, while $T_{\rm signal}$ in our analysis may contain both cosmological and systematic contributions.

Comparing foreground-orthogonal projections is sufficient and lossless (under a Gaussian likelihood as in equation (4), and an unrestricted foreground model as adopted here). If the foreground-orthogonal theory signal is close to the foreground-orthogonal component of the data, this implies that there exists a set of foreground parameters for which the theory model plus a foreground term are equally close to the full data. We give a mathematical derivation of this in the paragraph on analytical foreground marginalization in Section 2.5.1.

In the future, it might be practical for signal interpretation purposes to consider the foreground-orthogonal components, rather than analyse the full signals. Projecting an ensemble of physical signals would enable the interpretation of the foreground-orthogonal component in terms of astrophysical properties, just as it is done now with full signals. We discuss this option further in Section 4.

To estimate the foreground-orthogonal component, we construct a Gaussian Process that is always orthogonal to the foregrounds, allowing us to fit the parametric foregrounds and non-parametric GP signal simultaneously without degeneracy. While this is generally difficult to do for GPs across the entire input space (see e.g. Plumlee & Joseph 2017), it can easily be achieved at any finite set of locations by conditioning by kriging (Rue & Held 2005). These processes have been utilized in spatial statistics under the name restricted spatial regression (Hanks et al. 2015; Khan & Calder 2022).

We can achieve this by first drawing $T_{\rm GP}$ from the unconstrained multivariate normal distribution $\mathcal{N}(\mathbf{0}, K)$ and then applying the transformation $T_{\rm GP, proj} = (\mathrm{Id} - P_F)\, T_{\rm GP}$ using the projection matrix $P_F = K F^{\rm T} (F K F^{\rm T})^{-1} F$, where the $5 \times 123$-dimensional design matrix $F_{ni} = f_n(\nu_i)$ describes the five independent foreground modes from equation (2). This can be derived by first considering the joint distribution of the GP $T_{\rm GP}$ and a linear transformation of the GP, $F T_{\rm GP}$, which is multivariate normal. Then the conditional distribution of $T_{\rm GP} \mid F T_{\rm GP} = 0$ is available through standard formulas, enforcing the orthogonality constraint. To efficiently sample from the posterior distribution, we use the standard trick of marginalizing out the GP from the joint distribution to obtain the following distribution of our data:

$$d \sim \mathcal{N}\!\left( T_{\rm fg},\ (\mathrm{Id} - P_F)\, K\, (\mathrm{Id} - P_F)^{\rm T} + \sigma^2 \mathrm{Id} \right). \qquad (6)$$

That is, we assume our observed data are drawn from a normal distribution centred around the foreground model, with covariance defined by the orthogonal GP plus the noise term. Conditional on the foreground, kernel hyperparameters, and noise parameter $\sigma^2$, the posterior distribution of the 21 cm signal is available in closed form, and is simply a multivariate normal $\mathcal{N}\big(\tilde{K}(\tilde{K} + \sigma^2 \mathrm{Id})^{-1}(T_{\rm obs} - T_{\rm fg}),\ \tilde{K} - \tilde{K}(\tilde{K} + \sigma^2 \mathrm{Id})^{-1}\tilde{K}\big)$, where $\tilde{K} = (\mathrm{Id} - P_F)\, K\, (\mathrm{Id} - P_F)^{\rm T}$, following the standard formulas of Gaussian Process regression. We sample from this distribution at each step of a Markov chain Monte Carlo (MCMC) run to recover the posterior distribution of $T_{\rm GP, proj}$ itself.
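The projection and the closed-form posterior can be sketched as follows; the foreground mode exponents, kernel hyperparameters, and seed are illustrative assumptions rather than the fitted values:

```python
import numpy as np

rng = np.random.default_rng(2)
nu = np.linspace(50.0, 100.0, 123)

# Design matrix of five assumed power-law foreground modes (5 x 123).
F = np.stack([(nu / 75.0) ** (n - 3.5) for n in range(1, 6)])

# Squared exponential GP covariance with illustrative hyperparameters
# (sigma_f = 0.5 K, ell = 5 MHz).
d = nu[:, None] - nu[None, :]
K = 0.25 * np.exp(-0.5 * d ** 2 / 5.0 ** 2)

# Conditioning by kriging: P_F = K F^T (F K F^T)^{-1} F, and the covariance
# of the foreground-orthogonal GP, K~ = (Id - P_F) K (Id - P_F)^T.
P_F = K @ F.T @ np.linalg.solve(F @ K @ F.T, F)
Id = np.eye(nu.size)
K_tilde = (Id - P_F) @ K @ (Id - P_F).T

# A draw from the unconstrained GP, projected onto the orthogonal subspace;
# F @ T_proj vanishes up to numerical precision.
T_gp = rng.multivariate_normal(np.zeros(nu.size), K + 1e-10 * Id)
T_proj = (Id - P_F) @ T_gp

def gp_signal_posterior(K_tilde, r, sigma):
    """Closed-form posterior of the orthogonal GP signal given the residual
    r = T_obs - T_fg and noise level sigma (standard GP regression):
    mean = K~ (K~ + sigma^2 Id)^-1 r, cov = K~ - K~ (K~ + sigma^2 Id)^-1 K~."""
    A = K_tilde + sigma ** 2 * np.eye(K_tilde.shape[0])
    W = np.linalg.solve(A, K_tilde)        # A^{-1} K~ (A, K~ symmetric)
    return W.T @ r, K_tilde - K_tilde @ W  # posterior mean, covariance
```

In the full analysis, this mean and covariance would be re-evaluated at every MCMC step, conditional on the current foreground, hyperparameter, and noise values.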

2.4.2 Gaussian Process priors

We expect the variation of the GP to be of the order of $\pm 1\,\mathrm{K}$, so we set the prior on $\sigma_f$ to be the exponential distribution $\pi(\sigma) = \exp(-\sigma\,\mathrm{K}^{-1})$. This standard choice allows deviations from the mean while preferring smaller fluctuations (so that the prior does not introduce artificial fluctuations). If one is expecting signals much larger than $1\,\mathrm{K}$ (e.g. in cases with additional radio backgrounds present or non-standard thermal histories; Barkana 2018; Ewall-Wice et al. 2018; Feng & Holder 2018; Muñoz & Loeb 2018; Slatyer & Wu 2018; Fialkov & Barkana 2019; Mirocha & Furlanetto 2019), this choice can be adjusted. As for the smoothness, we work in the frequency range $[50\,\mathrm{MHz}, 100\,\mathrm{MHz}]$ and set the prior on $\ell$ to be an inverse-gamma distribution with shape $\alpha = 5$ and scale $\beta = 75\,\mathrm{MHz}$. This is a fairly standard choice that penalizes the model away from exceedingly small or large length scales, but the GP is not very sensitive to these choices due to the orthogonality constraint.
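These hyperpriors can be written down directly with `scipy.stats` (a sketch; the distributions match the text, while the seed and sample size are arbitrary):

```python
import numpy as np
from scipy import stats

# Priors on the GP hyperparameters described in the text:
# sigma_f ~ Exponential with rate 1/K, i.e. pi(sigma) = exp(-sigma / 1 K);
# ell     ~ Inverse-Gamma with shape alpha = 5 and scale beta = 75 MHz.
prior_sigma_f = stats.expon(scale=1.0)         # in K
prior_ell = stats.invgamma(a=5.0, scale=75.0)  # in MHz

rng = np.random.default_rng(3)
sigma_f_draws = prior_sigma_f.rvs(size=10_000, random_state=rng)
ell_draws = prior_ell.rvs(size=10_000, random_state=rng)
```

The inverse-gamma prior has mean $\beta / (\alpha - 1) = 18.75\,\mathrm{MHz}$ and suppresses both exceedingly small and exceedingly large length scales.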

The remaining priors are set as follows: for the foreground component, we place uninformative priors on the parameters of the foreground model using the recommended QR reparametrization trick as described in the Stan manual (Stan Development Team 2023a). The prior for each scaled and transformed parameter is a wide, uninformative normal distribution $\mathcal{N}(0, 10^6)$. For the noise $\sigma_{\rm noise}$, we assume a half-normal distribution $\mathcal{N}^+(0\,\mathrm{K}, 1\,\mathrm{K})$, which covers a wide range of reasonable values. We perform full MCMC for all parameters in the model, and implement the orthogonal GP model in Stan (RStan version 2.26; Stan Development Team 2023b, 2018) using the No-U-Turn Sampler algorithm (Hoffman & Gelman 2014). The results of estimating the Gaussian Process are shown in Section 3.1. Note that we also fitted a non-orthogonalized GP to the data, using the same prior distributions for the hyperparameters as above, but found that it was difficult to avoid degeneracy with the foregrounds through prior distributions alone.

2.5 FlexKnot signal model

The Gaussian Process fulfils one of our goals: it allows us to extract information about the signal model without assuming any particular function or shape, and we obtain a tight constraint on the foreground-orthogonal projection of the signal. However, this leaves a wide range of possible full (non-orthogonal) signals compatible with this constraint.

Our goal in this section is to differentiate between these possible full signals and to find the most likely underlying signal. To achieve this we parametrize the signal with the FlexKnot method (Vázquez et al. 2012a; Millea & Bouchet 2018; Heimersheim et al. 2022) and apply Bayesian model selection to find the most probable true signals. Our final result will be a posterior distribution of signals, showing us which shapes are compatible with the data.

Even though multiple signals may fit the data equally well, we can differentiate them based on their complexity. Simpler, more predictive models are more likely to be correct, whereas complex functions with many free parameters are susceptible to overfitting the data. In Bayesian statistics, this effect is quantified as the Bayesian evidence for a given model, and we can make use of this in the FlexKnot method.

We use the FlexKnot method to parametrize the signal Tsignal as a function of frequency ν. FlexKnot uses a family of flexible parametrizations that are based on interpolation. The first FlexKnot function is an interpolation between two points, i.e. a straight line. The second FlexKnot function adds a third point and interpolates between the three points, and so on. We refer to these interpolation points as knots. Each function is parametrized by the coordinates {νi, Ti} of its knots, and the value at any other point ν is interpolated. We illustrate this in the top panel of Fig. 3.

Figure 3. Illustration of the FlexKnot parametrization $T_{\rm signal}(\nu)$. Top: an example FlexKnot with six interpolation points (knots, red with arrows) and the corresponding interpolated curve (blue). Only the knots (not the interpolation) are restricted by the priors to be within the range $\nu_i \in [50\,\mathrm{MHz}, 100\,\mathrm{MHz}]$ and $T_i \in [-2\,\mathrm{K}, 2\,\mathrm{K}]$. Bottom: an example posterior distribution of the knot coordinates $\{\nu_i, T_i\}$ for a FlexKnot run. This is the raw data from which the signal posteriors (Sections 3.2 and 3.3) are computed.

The interpolation function used between the knots is the piecewise cubic Hermite interpolating polynomial (PCHIP), following Millea & Bouchet (2018). Alternative options are other spline variants or even a linear interpolation. We have not found meaningful differences between the various interpolation techniques, and chose PCHIP to avoid the sharp features that a linear interpolation would introduce, making the functions easier to interpret. Note that, unlike Heimersheim et al. (2022, modelling the reionization history), we do not fix the first and last interpolation knots to the minimum and maximum y-axis levels, as those minima and maxima do not exist here. Instead, we extrapolate the cubic Hermite polynomials before the first and after the last interpolation knot.
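As an illustration, such a FlexKnot curve can be sketched with scipy's `PchipInterpolator`; the knot coordinates below are made up for demonstration and are not values from our analysis:

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

# Illustrative knot coordinates {nu_i, T_i} in (MHz, K); hypothetical values.
nu_knots = np.array([55.0, 62.0, 70.0, 78.0, 86.0, 95.0])
T_knots = np.array([0.1, -0.2, -0.5, -0.45, 0.0, 0.2])

# PCHIP interpolation between the knots; extrapolate=True continues the
# cubic Hermite polynomials beyond the first and last knot, as in the text.
flexknot = PchipInterpolator(nu_knots, T_knots, extrapolate=True)

# Evaluate the signal on the EDGES low band (123 frequency channels).
nu = np.linspace(50.0, 100.0, 123)
T_signal = flexknot(nu)
```

The curve passes exactly through the knots and, unlike a cubic spline, PCHIP does not overshoot between them.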

After setting up the FlexKnot parametrization we sample the parameter space using Nested Sampling (Skilling 2004); specifically, we use the code polychord (Handley, Hobson & Lasenby 2015a, b). We perform multiple sampling runs for each number of knots to ensure our results are not influenced by outlier runs, averaging the posterior samples. In Sections 3.2 and 3.3, we show the evidences of individual runs (specifically Figs 9 and 12) and show that the scatter is small. The parameter space spans all knot coordinates and foreground parameters; however, as discussed in Section 2.5.1, we only need to sample the knot coordinates numerically. We choose Nested Sampling so that we can compute the Bayesian evidence Z for each model, and we choose polychord because it is well suited to the high-dimensional parameter space of FlexKnot models. Every Nested Sampling run gives us the posterior distribution of knot positions for a given number of knots; the lower panel of Fig. 3 shows an example for Nknots = 6. We use the package anesthetic (Handley 2019) to analyse the Nested Sampling results. As the final visualization step we do not show the distribution of knot coordinates, but the distribution of functions, by computing the interpolation for every sample. The reason for this is that the knot coordinates are not directly interpretable, while the interpolated function is both what the data are ultimately sensitive to and what we are interested in. For this purpose we use fgivenx (Handley 2018), which yields the functional posterior distributions that we show in Section 3.

We explore functions from Nknots = 2 up to a maximum of 15 knots. The cutoff was chosen empirically, as the Bayesian evidence for models declines after 6 knots, falling to a negligible level of less than $10^{-6}$ of the total evidence by 15 knots. We would like to treat Nknots as a discrete parameter of the model, but it is impractical to sample Nknots directly. Instead, we run separate Nested Sampling runs for each Nknots and average the results weighted by the Bayesian evidence Z of each run. This is equivalent to treating Nknots as a parameter and marginalizing over its value. We can show this by writing the posterior distribution of Tsignal given the data d and expanding the Nknots parameter. This shows that P(Tsignal∣d) is equivalent to the average over individual posteriors P(Tsignal∣Nknots, d) weighted by the Bayesian evidence Z(Nknots):

$$P(T_{\rm signal} \mid d) = \sum_{N_{\rm knots}} P(T_{\rm signal} \mid N_{\rm knots}, d)\, P(N_{\rm knots} \mid d) \propto \sum_{N_{\rm knots}} Z(N_{\rm knots})\, P(T_{\rm signal} \mid N_{\rm knots}, d). \quad (7)$$

The same rewrite can be used for other quantities such as the foreground parameter posterior.
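In practice this evidence-weighted marginalization amounts to a weighted resampling of the per-run chains. A minimal numpy sketch, using entirely synthetic posterior samples and illustrative log-evidence values (not results from our runs):

```python
import numpy as np

# Hypothetical per-run results: posterior samples of T_signal(nu) (rows)
# and log-evidences log Z for each number of knots. Values are illustrative.
rng = np.random.default_rng(0)
samples = {n: rng.normal(size=(1000, 123)) for n in range(2, 16)}
logZ = {n: -0.5 * (n - 6) ** 2 for n in range(2, 16)}  # peaked at 6 knots

# Weight each run by its evidence Z(N_knots), assuming a uniform prior over
# N_knots, i.e. marginalize over the number of knots as in equation (7).
logZ_arr = np.array([logZ[n] for n in sorted(logZ)])
weights = np.exp(logZ_arr - logZ_arr.max())  # subtract max for stability
weights /= weights.sum()

# Draw a combined posterior sample set proportional to the weights.
counts = rng.multinomial(4000, weights)
combined = np.vstack([
    samples[n][rng.integers(0, len(samples[n]), size=c)]
    for n, c in zip(sorted(samples), counts)
])
```

Subtracting the maximum log-evidence before exponentiating avoids numerical under/overflow when the Z values span many orders of magnitude.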

2.5.1 Orthogonalizing the foregrounds

There is one additional technique we want to introduce in this section. While the FlexKnot method is not limited to foreground-orthogonal functions, we can make use of the orthogonalization to drastically speed up the sampling: we can analytically integrate out the foreground parameters and reduce the parameter space we need to sample numerically by five dimensions. We note that this idea is not limited to FlexKnot but can be applied to any parametrization, such as the flattened Gaussian or even physical models.

In the following, we will show (i) how to find an orthogonal basis for the foregrounds, and (ii) how to analytically solve the foreground parameter marginalization.

Orthonormal foreground basis:

We want to express the foreground with new basis functions $\tilde{f}_n$ that are orthogonal and normalized,

$$T_{\rm fg}(\nu) = \sum_{n=1}^{5} \tilde{a}_n \tilde{f}_n(\nu), \quad (8)$$

where the functions $\tilde{f}_n$ are orthonormal. We mark quantities in the new basis with a tilde to distinguish them from the equivalent quantities in the original basis. The new foreground basis vectors are given by $\mathbf{\tilde{f}}_n = \left(\tilde{f}_n(\nu_1), \ldots, \tilde{f}_n(\nu_{123})\right)$. Then $\mathbf{\tilde{f}}_n$ and $\mathbf{\tilde{f}}_m$ are orthonormal if and only if $\mathbf{\tilde{f}}_n \cdot \mathbf{\tilde{f}}_m = \delta_{nm}$. To calculate this basis we define the 5 × 123-dimensional foreground matrix $F_{ni} = f_n(\nu_i)$ and apply a singular value decomposition

$$F = U \Sigma V^{\rm T}$$

to obtain the desired orthogonal foreground basis $\tilde{F}_{ni} = (V^{\rm T})_{ni}$. We can convert between the original foreground parameters $a_n$ and the corresponding new basis parameters $\tilde{a}_n$ as

$$\tilde{a}_n = \sum_{m} \left(\Sigma\, U^{\rm T}\right)_{nm} a_m. \quad (9)$$

Fig. 4 shows a comparison of the original foreground basis functions $f_n$ (top) and the new orthogonal foreground basis functions $\tilde{f}_n$ (bottom). Note that orthogonal here refers to the basis functions being orthogonal to each other, not orthogonal to the foregrounds.
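The orthonormalization step can be sketched in a few lines of numpy; the power-law basis below is a stand-in for our actual foreground model, chosen only to demonstrate the SVD step:

```python
import numpy as np

# Stand-in power-law foreground basis f_n(nu) on the EDGES low band;
# not the paper's actual five-term foreground model.
nu = np.linspace(50.0, 100.0, 123)
x = nu / 75.0                                        # normalized frequency
F = np.vstack([x ** (-2.5 + n) for n in range(5)])   # 5 x 123 matrix F_ni

# Singular value decomposition F = U S V^T; the rows of V^T form the
# orthonormal foreground basis f~_n.
U, S, Vt = np.linalg.svd(F, full_matrices=False)
F_tilde = Vt                                         # 5 x 123, orthonormal rows

# Orthonormality check: F~ F~^T should be the identity.
gram = F_tilde @ F_tilde.T

# Projector onto the foreground-orthogonal subspace, used to project
# signals and data as described in the text.
P_orth = np.eye(123) - F_tilde.T @ F_tilde
```

Any vector multiplied by `P_orth` has its foreground-proportional component removed, which is exactly the projection used for the foreground-orthogonal figures.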

Figure 4. Original foreground basis functions fn(ν) (top) and orthogonal foreground basis functions $\tilde{f}_n(\nu)$ (bottom). The new basis functions are orthogonal and normalized, and thus allow us to easily project signals into a foreground-orthogonal space.

Analytical foreground marginalization:

To fit a signal model to the data one typically needs to simultaneously sample the signal and foreground parameters. However, we found that for Gaussian likelihoods and foreground models that are linear in their parameters, such as our equation (2) or equation (1) of Bowman et al. (2018), there exists a shortcut. We can analytically compute the posterior of the foreground parameters conditional on the signal parameters. This means that we can sample the foreground parameters analytically (negligible computational cost) and need to sample only the signal parameters numerically.

We start by inserting the orthogonal foreground model, equation (8), into the likelihood, equation (4). This yields the likelihood as a function of the FlexKnot parameters θFK = {νn, Tn} and the foreground parameters $\theta_{\rm fg} = \lbrace \tilde{a}_n \rbrace$:

$$\mathcal{L}(\theta_{\rm FK}, \theta_{\rm fg}) \propto \exp\left(-\frac{1}{2\sigma_{\rm noise}^2} \Big| \boldsymbol{\xi} - \sum_n \tilde{a}_n \mathbf{\tilde{f}}_n \Big|^2 \right), \quad (10)$$

with the residuals vector $\boldsymbol{\xi} = \mathbf{d} - \mathbf{T}_{\rm signal}(\theta_{\rm FK})$ (omitting the θFK dependence for brevity). We can simplify the exponent in two steps: first, we decompose $\boldsymbol{\xi}$ into foreground-proportional terms with coefficients $\mu_n = \boldsymbol{\xi} \cdot \mathbf{\tilde{f}}_n$ and a foreground-orthogonal remainder $\boldsymbol{\xi}_r$. Secondly, we use the orthonormality of the basis to write the square of the sum as a sum of squares, as all cross-terms vanish:

$$\boldsymbol{\xi} = \boldsymbol{\xi}_r + \sum_n \mu_n \mathbf{\tilde{f}}_n, \quad (11)$$
$$\Big| \boldsymbol{\xi} - \sum_n \tilde{a}_n \mathbf{\tilde{f}}_n \Big|^2 = |\boldsymbol{\xi}_r|^2 + \sum_n \left(\mu_n - \tilde{a}_n\right)^2. \quad (12)$$

Adding the foreground and FlexKnot prior terms gives us

$$-2 \ln P(\theta_{\rm FK}, \theta_{\rm fg} \mid d) = \frac{|\boldsymbol{\xi}_r|^2}{\sigma_{\rm noise}^2} + \sum_n \frac{\left(\mu_n - \tilde{a}_n\right)^2}{\sigma_{\rm noise}^2} + \sum_n \frac{\tilde{a}_n^2}{\sigma_{\tilde{a}_n}^2} + {\rm const}, \quad (13)$$

where the prior scale on the foreground parameters $\sigma_{\tilde{a}_n} = 10^6\,\mathrm{K}$ is much larger than the noise level $\sigma_{\rm noise} \ll 1\,\mathrm{K}$ and much larger than the range allowed by the likelihood (on the order of $10^3\,\mathrm{K}$). Thus, we can approximate the foreground prior term by a constant and obtain

$$P(\theta_{\rm FK}, \theta_{\rm fg} \mid d) \propto \exp\left(-\frac{|\boldsymbol{\xi}_r|^2}{2\sigma_{\rm noise}^2}\right) \prod_n \exp\left(-\frac{\left(\tilde{a}_n - \mu_n\right)^2}{2\sigma_{\rm noise}^2}\right).$$

At this point we see that the posterior has factorized into a foreground-dependent Gaussian term and a FlexKnot-dependent term. We can use this both to sample the foreground parameters analytically conditional on the FlexKnot parameters (simply a Normal distribution with mean μn(θFK) and scale σnoise), and to marginalize over the foreground parameters as follows:

$$P(\theta_{\rm FK} \mid d) \propto \int \mathrm{d}^5\tilde{a} \; P(\theta_{\rm FK}, \theta_{\rm fg} \mid d) \quad (14)$$
$$\propto \exp\left(-\frac{|\boldsymbol{\xi}_r(\theta_{\rm FK})|^2}{2\sigma_{\rm noise}^2}\right). \quad (15)$$

We use this form in our implementation, sampling the FlexKnot parameters θFK numerically and obtaining the foreground parameters analytically as derived parameters in polychord. We store both the orthogonal foreground parameters $\tilde{a}_n$ and the original foreground parameters an in our chains, and will show the posterior distributions of both (Fig. D1, and Figs 10 and 13, respectively).

We emphasize that, unlike for the GP analysis, this orthogonalization is not essential for the FlexKnot method to work. All it does is significantly improve the sampling efficiency, both by reducing the parameter space by five dimensions, and by letting us skip sampling over the degenerate foreground parameters.
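The decomposition and marginalization above can be sketched in a few lines of numpy; the data vector, signal, and foreground basis below are placeholders, not our actual inputs:

```python
import numpy as np

# Sketch of the analytical foreground marginalization: decompose the
# residual xi = d - T_signal into foreground coefficients mu_n and a
# foreground-orthogonal remainder xi_r. All inputs here are synthetic.
rng = np.random.default_rng(1)
nu = np.linspace(50.0, 100.0, 123)
F = np.vstack([(nu / 75.0) ** (-2.5 + n) for n in range(5)])  # stand-in basis
F_tilde = np.linalg.svd(F, full_matrices=False)[2]            # orthonormal rows

d = rng.normal(size=123)      # placeholder data vector
T_signal = np.zeros(123)      # placeholder FlexKnot signal
sigma_noise = 0.025

xi = d - T_signal
mu = F_tilde @ xi             # mu_n = xi . f~_n
xi_r = xi - F_tilde.T @ mu    # foreground-orthogonal remainder

# The marginal likelihood of the FlexKnot parameters depends only on
# |xi_r|^2, and the foreground parameters a~_n can be sampled analytically
# as independent normals N(mu_n, sigma_noise).
log_L_marg = -0.5 * (xi_r @ xi_r) / sigma_noise**2
a_tilde = rng.normal(mu, sigma_noise)
```

Because the decomposition is orthogonal, the squared residual splits exactly into the orthogonal and foreground-proportional parts, which is what makes the Gaussian integral over the foreground parameters analytic.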

2.5.2 Mean subtraction

We apply one important post-processing step to the final FlexKnot signals, which is to normalize all functions to have zero mean. This is necessary because, while FlexKnot can eliminate almost the entire foreground degeneracy that affects the shape of the signal, it cannot eliminate the degeneracy between the absolute signal level and the foreground level.

This is because (i) the foregrounds can combine to create an almost constant offset in the signal, as demonstrated by the blue line in the lower panel of Fig. 4. And because (ii) a FlexKnot parametrization can also allow for a constant offset by shifting all knot coordinates vertically. The latter does not incur a complexity penalty, thus creating a degeneracy between the signal level and the foreground level that cannot be removed by the FlexKnot method.

The post-processing step allows us to visualize the good constraint on the shape and contrast of the signal. In terms of scientific results, this means we cannot constrain quantities that depend on the absolute signal level but it leaves inferences about the shape of the signal unaffected. Constraints on absolute signal levels can be made possible by adding more assumptions, such as fixing the value of the signal at a given frequency based on a physical model, but this is beyond the scope of this work.
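This post-processing step is a one-line operation on the posterior samples; the sample array below is synthetic and stands in for the actual FlexKnot chains:

```python
import numpy as np

# Shift every posterior sample T_signal(nu) (one row per sample) to have
# zero mean across the band, removing the constant offset that is
# degenerate with the foreground level. Samples here are synthetic.
rng = np.random.default_rng(3)
samples = rng.normal(loc=0.7, size=(4000, 123))  # hypothetical posterior draws

samples_zero_mean = samples - samples.mean(axis=1, keepdims=True)
```

The shape and contrast of each sample are untouched; only the (unconstrained) absolute level is removed.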

Note that a similar technique does not solve the degeneracy of the Gaussian Process fit seen in Fig. 7. This is because the GP already penalizes constant offsets of the fit, and an additional mean subtraction would be less useful. We explicitly show this in Appendix  A.

2.5.3 FlexKnot priors

We choose simple priors on the FlexKnot parameters, i.e. the coordinates of the interpolation points. We allow the frequency coordinates νi to lie between 50 and 100 MHz (the frequency range corresponding to the EDGES low-band observations), and the temperature coordinates Ti to lie between −2 K and 2 K (the largest signal amplitude we consider). The temperature prior is uniform (flat); for the frequency prior we use a forced identifiability prior (Handley, Hobson & Lasenby 2015b). This means we sample the frequency coordinates as though they were drawn from a uniform (flat) prior and then sorted, so that the knot coordinates are always in ascending order. This gives the same result as a uniform (flat) prior on νi, but avoids the combinatorial explosion of having to sample every possible permutation of the knot coordinates. As for the number of interpolation knots Nknots, we use a discrete uniform prior from the minimum of 2 knots up to 15 knots; we find empirically that the evidence for models with more than 15 knots is negligible. As the prior on the foreground parameters we adopt a Normal distribution $\mathcal{N}\left(0\,\mathrm{K}, 10^6\,\mathrm{K}\right)$ over the transformed foreground parameters $\tilde{a}_n$ (see Section 2.5.1). Given that the actual values lie in the $\pm 10^3\,\mathrm{K}$ range, this is effectively an uninformative and approximately flat prior. The difference between this choice and a prior on an (as chosen for the Gaussian Process) is negligible, as the priors have such a small effect on our results. The noise level σnoise is a free parameter as well; here we choose the same prior as in the Gaussian Process run, a non-negative Normal distribution $\mathcal{N}^+(0\,\mathrm{K}, 1\,\mathrm{K})$.
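The sorted-uniform (forced identifiability) prior can be sketched as follows; in Nested Sampling codes it is typically realized as a transform of the unit hypercube, so this naive sort-based version only illustrates the resulting distribution:

```python
import numpy as np

# Draw knot coordinates from the priors described in the text: frequencies
# uniform in [50, 100] MHz, then sorted into ascending order; temperatures
# uniform in [-2, 2] K. The ranges follow the text; the code is a sketch.
def sample_knots(n_knots, rng):
    nu = np.sort(rng.uniform(50.0, 100.0, size=n_knots))  # MHz, ascending
    T = rng.uniform(-2.0, 2.0, size=n_knots)              # K, flat prior
    return nu, T

rng = np.random.default_rng(2)
nu_knots, T_knots = sample_knots(6, rng)
```

Sorting removes the N! permutation degeneracy among the knots while leaving the marginal distribution of each set of frequencies uniform.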

In addition to these, we also consider additional priors to encode our prior knowledge about the signal. In our analysis we consider either a uniform prior $[-2\,\mathrm{K}, 2\,\mathrm{K}]$ on the signal values, which affects the posterior only slightly beyond the existing knot coordinate priors; a Gaussian prior $0\,\mathrm{K} \pm 1\,\mathrm{K}$ on the signal contrast (the difference between the highest and lowest signal value); or a Gaussian prior on every signal point. We use the second option, the Gaussian prior on the signal contrast, for the rest of our analysis and show the effects of the other choices in Appendix B.

3 RESULTS

We apply the Gaussian Process and FlexKnot methods both to the synthetic data, where we know the ground truth, and to the EDGES low-band data from Bowman et al. (2018), to test our methods on a real measurement. We will first discuss the GP fit (Section 3.1) to both data sets, then the FlexKnot fit to the synthetic data (Section 3.2) and to the EDGES low-band data (Section 3.3).

3.1 Gaussian Process

As discussed in Section 2.4, we fit both, an unconstrained GP and a foreground-orthogonal GP. Extracting the signal with an unconstrained fit proves difficult, but we can extract the foreground-orthogonal part of the signal with a high degree of confidence. We will start this section with that result, and show the corresponding results for the unconstrained GP in Fig. 7.

We show the foreground-orthogonal Gaussian Process posterior fit to the synthetic data in Fig. 5. The plot shows the posterior distribution (blue contours) of the Gaussian Process that is (by construction) orthogonal to the foreground component. The shades of blue show the 1σ, 2σ, and 3σ contours, corresponding to 68 per cent, 95 per cent, and 99.7 per cent credibility intervals. These constraints on the foreground-orthogonal components show us which data features need to be explained by systematics or an astrophysical model. For comparison, we also show the projected ground truth signal (red line) and a projection of the synthetic data (black points) that this fit is based on. These projections are the foreground-orthogonal components of the ground truth signal and data, respectively, using the orthogonalization discussed in Section 2.5.1. We see that the GP posterior matches the ground truth signal very well, and its width is consistent with the scatter in the data points. The posterior for the noise level (a free parameter) is $\sigma_{\rm noise} = 0.027\,\mathrm{K} \pm 0.002\,\mathrm{K}$, which is consistent with the input noise level of 0.025 K. As for the other GP parameters, we give mean values and 95 per cent credible intervals to account for the asymmetry of the distributions. The amplitude σf has mean 0.93 K and lies within $[0.2\,\mathrm{K}, 2.1\,\mathrm{K}]$; the kernel length scale ℓ has mean 7.3 MHz and lies within $[5.5\,\mathrm{MHz}, 8.7\,\mathrm{MHz}]$.

Figure 5. Gaussian Process fit to the synthetic data set introduced in Section 2.2. The blue contours indicate the 68 per cent, 95 per cent, and 99.7 per cent credible intervals of the temperature values Tsignal, projected at every frequency, derived from the Gaussian Process posterior. The red line shows the foreground-orthogonal component of the true signal (see Section 2.2 for details) and the black dots show the foreground-orthogonal component of the synthetic data vector. The figure shows that the Gaussian Process correctly recovers the input signal shape and its uncertainty is consistent with the scatter in the data.

Applying the same method to the EDGES low-band data from Bowman et al. (2018), we obtain the posterior shown in Fig. 6. Again we show the posterior contours in blue and the projected data points in black, but in this case we do not know the true signal. An interesting comparison is to see how well a flattened Gaussian signal (as reported in Bowman et al. 2018) can explain the data, or equivalently, how well it can explain the foreground-orthogonal component shown here. We show a (projected) fit of the flattened Gaussian model (equation 3) as an orange dashed line, and see that although the flattened Gaussian fit largely overlaps with the GP posterior, it significantly deviates from the GP contours around 74, 82, and 91 MHz.

Figure 6. Gaussian Process fit to the EDGES low-band data (from Bowman et al. 2018). We show posterior credible intervals as blue contours, and the projected data points in black. We do not have a ground truth as the real cosmological or systematic signal present is unknown. However, we can show a flattened Gaussian fit (orange dashed line, equation 3) following Bowman et al. (2018) for comparison. This shows us which features in the signal the flattened Gaussian cannot explain, and we see multiple points where the GP and flattened Gaussian fits deviate by more than 3σ. Note that the orange dashed line here is a fit (varying A, ν0, w, and τ) of equation (3) to the EDGES data, while the red solid line in Fig. 5 is the known input signal for the synthetic data. Both look alike because the input signal for the synthetic data was chosen to be similar to the best-fitting signal in Bowman et al. (2018). Also note that the shape of the flattened Gaussian looks unfamiliar because we are only showing its foreground-orthogonal component.

The Gaussian Process fit posterior for the noise level is $\sigma_{\rm noise} = 0.021\,\mathrm{K} \pm 0.002\,\mathrm{K}$, somewhat lower than found in Bowman et al. (2018), which is expected, as the Gaussian Process fits the data more accurately. For the other parameters, we again give the mean value and 95 per cent credible intervals: the amplitude mean is 1.95 K with credible interval $\sigma_f \in [0.4\,\mathrm{K}, 3.8\,\mathrm{K}]$; the mean kernel length scale is 7.23 MHz with interval $\ell \in [5.6\,\mathrm{MHz}, 8.9\,\mathrm{MHz}]$.

We want to comment on the sharp features at 70 and 90 MHz in Figs 5 and 6. These are reminiscent of the features found in the original EDGES analysis (Bowman et al. 2018) and are an example of a feature that cannot be absorbed by the foregrounds, and thus is present in both analyses. Note, however, that this feature does not necessarily have to be present as a sharp rise and fall like it is here: in Section 3.3 (specifically Fig. 11), we will see signal shapes that contain the same foreground-orthogonal component but look visually different. This highlights why the foreground-orthogonal projection is such a useful tool for checking whether different signals are consistent with the data, or with each other.

As a point of comparison we also show the fit of an unconstrained Gaussian Process to both data sets in Fig. 7. The posterior constraints show large uncertainties (blue contour bands) which are caused by the degeneracy with foregrounds. These constraints depend somewhat on the choice of priors, but the qualitative result remains the same.

Figure 7. Gaussian Process posterior contours without orthogonalization. The blue contours show the posterior distribution of signals for the synthetic data set (top) and the EDGES low-band data (bottom). The red solid line (top) indicates the input signal (ground truth) for the synthetic data, the orange dashed line (bottom) shows the flattened Gaussian fit to the EDGES data. The Gaussian Process contours for the synthetic data are consistent with the input, but both Gaussian Process posteriors are rather wide and unconstraining due to their degeneracy with the foreground term.

3.2 FlexKnot signal fit to synthetic data

Unlike our Gaussian Process, the FlexKnot method can reliably produce a posterior for the full signal. As discussed in Section 2.5, the reason this is possible is that the FlexKnot method has an implicit prior towards simpler, more predictive functions due to the Bayesian evidence weighting. This is the distinguishing criterion between multiple signals that fit the data equally well: while their foreground-orthogonal projection can be the same, the full signals differ in complexity. The FlexKnot method will find the simplest signals among the infinite number of signals that are consistent with the data. While the simplest signal is not necessarily the true signal, we expect it to be at least instructive, and notice that the method does in fact recover the ground truth for the synthetic data.

First, we plot the posterior contours of the FlexKnot functional posterior, showing the posterior of the signal value at every frequency. The top panel of Fig. 8 shows credible intervals (blue contours) along with the ground truth signal (red line) used to generate the synthetic data. We see that the contours are compatible with the input signal and indeed suggest the correct shape. Overall we see a much larger uncertainty than in the orthogonal Gaussian Process fit, which is expected since here we are fitting the full signal, not just the foreground-orthogonal component. The main source of uncertainty is the degeneracy with the foregrounds.

Figure 8. FlexKnot fit to our synthetic data set, visualized as posterior contours (top) and random posterior samples (bottom). These plots use the full signal Tsignal, rather than the foreground-orthogonal projection we showed for the GP fit. The only post-processing we apply is to shift the signal to have zero mean, Tsignal − 〈Tsignal〉ν. Both plots show the input signal (red line) as the ground truth that the synthetic data were generated from. Top: credible intervals of the temperature value at each frequency ν, visualized as blue contours corresponding to 68 per cent, 95 per cent, and 99.7 per cent credible intervals, respectively. We see that the ground truth (red line, synthetic data input signal) is recovered within the 68 per cent credible region of the posterior. Bottom: random samples drawn proportional to the FlexKnot posterior. Every sample is a function T(ν) shown as a thin black line. This visualization gives us a better view of the functional shapes contained in the posterior. We see that many samples have a similar shape to the ground truth (red line), although a fraction of posterior samples spread out at low frequencies and do not recover the flat nature of the ground truth. Overall, though, we could infer the underlying signal from observing the posterior samples.

We make one post-processing adjustment to the posteriors for all FlexKnot plots in this section: we shift all signals to have zero mean. This is because one of the foreground components is an approximately constant signal, and without additional priors, we cannot resolve that degeneracy since the constant does not add any complexity. However, we are mostly interested in the shape of the signal here, and not its absolute level, thus this is not an issue in our analysis. Note that this is the same effect that causes the FlexKnot knot coordinate posteriors in Fig. 3 to be stretched out vertically. In Fig. C1, we show the posterior functions without this mean-zero shift. In the same figure, we also show the foreground-orthogonal projection of the FlexKnot posterior for comparison with the GP results.

In the posterior contour plot (top panel of Fig. 8), it can be hard to determine the shapes of individual functions, so in addition we want to look at a representative sample (drawn proportional to the posterior) of signals. The bottom panel of Fig. 8 shows ≈4000 such function samples, and allows us to get an intuition for the functional shapes. Here we can see that almost all functions have a flattened absorption trough, similar to the ground truth signal (red line). We see the posterior spreading out at the low-frequency tail, as well as at the high-frequency tail (although to a lesser degree).

The posterior distribution is marginalized over the number of knots. As discussed in Section 2.5, this is an average over all runs, weighted by the Bayesian evidence of each run. We show the evidence as a function of Nknots in Fig. 9. We see that the evidence rises sharply until we reach the peak at Nknots = 6, and falls off continuously after that. Both effects are what we would expect: the sharp rise because low-Nknots models cannot adequately fit the data, and the fall-off because high-Nknots models are more complex than necessary and less predictive, and are thus weakly penalized by the Bayesian evidence. At the largest number we explore, Nknots = 15, the evidence has fallen to $Z_{15} \approx e^{223}$, which is smaller than the peak evidence ($Z_6 \approx e^{245}$) by a factor of more than $10^9$. Thus, we expect larger numbers of knots to be negligible, and truncate our analysis at Nknots = 15.

Figure 9. Bayesian evidence (logarithmic, base e) as a function of the number of knots for the FlexKnot fit to the synthetic data set. Marginalizing over the number of knots corresponds to weighting runs by their evidence Z, so this plot shows us which runs account for the largest amount of posterior mass. For each Nknots value we perform multiple Nested Sampling runs (shown as black dots) to account for outliers, and take the mean (red "-" markers) of the log Z values, since Z is lognormally distributed (Handley, Hobson & Lasenby 2015b).

Finally, the posteriors of the noise level and the foreground parameters are shown in Fig. 10. As expected, the foreground parameters are highly correlated. We note that the transformed foreground parameters $\tilde{a}_n$ (see Section 2.5.1) are significantly less correlated, as demonstrated in Fig. D1. In both figures, we see that the foreground parameter posteriors are consistent with the input parameters at the 95 per cent level. We find a noise level of $\sigma_{\rm noise} = 0.026\,\mathrm{K} \pm 0.001\,\mathrm{K}$, also consistent with the 0.025 K noise added to the synthetic data.

Figure 10. The posterior distribution of noise and foreground parameters for the FlexKnot signal fit to the synthetic data set. We find the expected degenerate distribution of foreground parameters but recover the input values (this is more visible in the upper panel of Fig. D1 in the Appendix). Note that the degeneracy is not a problem for our Nested Sampling due to the foreground orthogonalization discussed in Section 2.5.1. The noise posterior distribution shown in the top right inset is consistent with the input noise level (red dashed line).

As discussed in Section 2.5, we have chosen a Gaussian prior on the largest difference in the signal values for our FlexKnot implementation. Different choices of priors can affect the posterior if the signals differ significantly, but the effect on the synthetic data set is small. We show posterior plots for other prior choices in Appendix  B, for both the synthetic data and the EDGES low-band data.

3.3 FlexKnot signal fit to EDGES low-band data

Now we apply the FlexKnot analysis to the EDGES low-band data from Bowman et al. (2018). In Fig. 11 (top panel), we show the resulting posterior contours along with the flattened Gaussian fit for comparison. The posterior contours are more complex compared to the synthetic data (which only contain the flattened Gaussian signal, a foreground, and random noise) and appear multimodal. The credible intervals are rather large, as the different modes cover different parts of the temperature range.

Figure 11. FlexKnot posteriors for the EDGES low-band data set (Bowman et al. 2018), visualized as credible intervals (top) and posterior samples (bottom). The plots show the full signal except for a mean-zero shift, as described in Section 3.2. Top: credible intervals of the temperature Tsignal − 〈Tsignal〉ν at every frequency (blue contours) derived from the FlexKnot fit, along with the best-fitting flattened Gaussian model (orange dashed line, equation 3) for comparison. While we can recognize the main posterior mode from this plot, the contours indicate a multimodal distribution and do not describe the posterior in a useful way. Bottom: random samples drawn from the FlexKnot posterior, shown in five subpanels. Subpanel A shows samples drawn from the whole posterior (black), as well as the best-fitting flattened Gaussian model (orange dashed line) again for comparison. The posterior samples clearly show the multimodal nature of the posterior. We can distinguish four modes by eye and colour these in the subpanels below: subpanel B highlights the largest mode (blue, 61 per cent of the posterior), followed by subpanel C (green, 14 per cent), subpanel D (orange, 23 per cent), and subpanel E (red, 2.8 per cent). In each panel, all posterior samples are shown in grey with the respective mode highlighted in colour.

To better visualize this posterior distribution, we again turn to a plot of randomly drawn posterior function samples, shown in the bottom panels of Fig. 11, in which the individual signal modes are clearly distinguishable.14 We show the same posterior samples in five subpanels of Fig. 11: subpanel A shows all samples in black, again with the best-fitting flattened Gaussian for comparison. The next four subpanels (B through E) each highlight a subset of the posterior samples in colour to illustrate the different apparent modes in the posterior. This ad hoc classification is based on the signal slope between 73 and 84 MHz and should not be treated as rigorous; the numbers we give below are therefore only estimates.
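As an illustration, such a slope-based mode separation could be implemented along the following lines. This is only a sketch: the frequency grid and samples are made up, and we use a simple two-class split rather than the four modes distinguished in the actual analysis.

```python
import numpy as np

def classify_by_slope(freq, samples, nu_lo=73.0, nu_hi=84.0):
    """Split posterior signal samples into modes by the mean slope
    between nu_lo and nu_hi (in MHz). Returns an integer label per
    sample: 0 for falling signals, 1 for rising ones."""
    i_lo = np.argmin(np.abs(freq - nu_lo))
    i_hi = np.argmin(np.abs(freq - nu_hi))
    slopes = (samples[:, i_hi] - samples[:, i_lo]) / (freq[i_hi] - freq[i_lo])
    return (slopes > 0).astype(int)

# Toy usage: two falling and one rising "posterior sample"
freq = np.linspace(50, 100, 123)
samples = np.vstack([-0.01 * freq, -0.02 * freq, 0.01 * freq])
labels = classify_by_slope(freq, samples)  # → [0, 0, 1]
```

A finer split (e.g. by slope magnitude or by features at other frequencies) would be needed to separate all four modes seen in Fig. 11.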

We notice that the majority of the posterior samples (subpanel B, |$\sim 61{{\ \rm per\ cent}}$|⁠) follow an oscillatory pattern with a peak around 65 MHz, a trough around 85 MHz, and another peak around 95 MHz.

A smaller fraction of the posterior samples (subpanel C, |$\sim 14{{\ \rm per\ cent}}$|) shows a similar shape with an additional trough around 55 MHz and without the 95 MHz peak. Neither of these two signal modes is compatible with the flattened Gaussian shape, and neither would be detected in an analysis that forcefully assumes such a shape.

The third mode (subpanel D, |$\sim 23{{\ \rm per\ cent}}$|⁠) resembles the signal found by Bowman et al. (2018) with a trough at 78 MHz, except for the tails at low and high frequencies that are not flat in our case. We also note that the flattened Gaussian best-fitting profile (orange dashed line in subpanel A) shows a much deeper trough that is not recovered by the FlexKnot fits.

Finally, the small fourth family of posterior samples (subpanel E, |$\sim 2.8{{\ \rm per\ cent}}$|⁠) follows an inverted oscillatory pattern, with a trough around 65 MHz and a peak around 90 MHz. However, it is much less probable than the other modes, and only highlighted here for its distinct shape.

The balance between the different signal modes is directly influenced by any priors chosen for the analysis. As we show in Appendix  B, the prior choice does not particularly affect the shapes, but does change the relative probabilities of the different modes. In particular, the third mode (subpanel D) has a lower contrast (difference between maximum and minimum of the signal) and thus is preferred by the Gaussian prior (see Fig. B1).

We also show the Bayesian evidence as a function of Nknots in Fig. 12. The evidence again peaks at Nknots = 6 and falls off after that. However, this fall-off is slightly weaker than for the synthetic data (Fig. 9). This could be an indication that the signal underlying the EDGES low-band data is more complex than the flattened Gaussian we used for the synthetic data. However, this trend is only a weak argument, and in both cases the evidence peaks at the same complexity level. The ratio between the peak (|$\mathit{Z}=\rm e^{267}$|) and Nknots = 15 (|$\mathit{Z}=\rm e^{255}$|) is still very large (e¹² ≈ 1.6 × 10⁵), so we can safely truncate our analysis at that point.
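When combining runs with different Nknots, the per-run evidences can be converted into normalized model weights (cf. footnote 13). A minimal sketch of this step, using illustrative log-evidence values rather than the measured ones, with a log-sum-exp shift for numerical stability:

```python
import numpy as np

def evidence_weights(log_Z):
    """Convert the log-evidences of several runs into normalized
    model weights. Subtracting the maximum before exponentiating
    avoids overflow for large log-evidences (here of order 260)."""
    log_Z = np.asarray(log_Z, dtype=float)
    w = np.exp(log_Z - log_Z.max())
    return w / w.sum()

# Illustrative log-evidence values only (not the measured ones)
weights = evidence_weights([260.0, 267.0, 263.0, 255.0])
```

The run with the highest evidence dominates the weights; runs more than a few log-units below it contribute negligibly, which is why truncating at large Nknots is safe.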

Figure 12. Log evidence as a function of Nknots for FlexKnot results on the EDGES low-band data (Bowman et al. 2018); note the split y-axis. Again we show individual runs (black dots) as well as the mean (red '-' markers) for every Nknots value. The evidence again peaks at Nknots = 6 but falls off less strongly than for the synthetic data (Fig. 9). It still falls off by a factor of more than 10⁵, making Nknots > 15 negligible. The slower fall-off suggests that the signal contained in the EDGES data is more complex than the flattened Gaussian in our synthetic data.

We show the posteriors for the foreground parameters in Figs 13 (an) and D1 (|$\tilde{a}_n$|). The foreground parameter an values cover a large range due to their degeneracy, while the transformed |$\tilde{a}_n$| parameters cover a narrower range. The FlexKnot foreground posteriors cover a range near the flattened Gaussian best-fitting point (yellow crosses, best visible in Fig. D1, lower panel) but do not overlap as well as the synthetic data inputs did with the respective posteriors (upper panel of Fig. D1). This is not unexpected, as the EDGES data might not be well described by a flattened Gaussian. The noise level of |$\sigma_{\rm noise} = (0.020 \pm 0.001)\,\mathrm{K}$| is higher than the passive load test done by the EDGES team (0.015 K; Bowman et al. 2018), but closer to the expectation than the flattened Gaussian fit (|$\sigma_{\rm noise} = (0.026 \pm 0.002)\,\mathrm{K}$|), showing that the FlexKnot functions allow a closer fit to the data.

Figure 13. Noise and foreground parameter distribution for the FlexKnot signal fit to the EDGES low-band data. The foreground parameters an are degenerate, as expected. The FlexKnot fit is closer to the data and thus produces a lower noise posterior σnoise (top-right inset, compared to the dashed orange line).

Fig. 13 also shows a multimodal distribution of foreground parameters. This is expected, as the signal distribution (Fig. 11) is multimodal and the foreground and signal components must add up to the data. We confirmed that the clusters seen in Fig. 13 are indeed the same clusters as in Fig. 11.

4 DISCUSSION

In this section, we want to discuss the strengths and limitations of the proposed methods, specifically the foreground orthogonalization we used and its dependence on the chosen foreground model.

The foreground-orthogonal Gaussian Process fit is a new way to analyse the 21 cm global signal. It is a tool to extract and present exactly the part of the data that cannot be explained by the foregrounds, allowing a direct comparison to theoretical predictions. This does not assume that the theoretical signals are foreground-orthogonal; we simply apply the foreground-orthogonal projection to their theoretical description. As discussed in Section 2.4.1, this projection is lossless, allowing us to obtain the likelihood without the need for further sampling and marginalization.

Relying on the Gaussian Process fit to compare observations to theory models requires high confidence that the GP result indeed follows the data. By design, the GP foreground-orthogonal fit is a very robust method because the data always contains an unambiguous foreground-orthogonal component. We can verify this by plotting this component, i.e. the foreground-orthogonal projection of the data points themselves, along with the Gaussian Process posterior. We have done this in Figs 5 and 6, showing that the GP posterior contours indeed follow the data.

Finally, the foreground orthogonalization is based on distinguishing between foreground-parallel and foreground-orthogonal components of the signal. Thus any result is dependent on the assumed foreground model. This is not a unique problem to this method but applies to all methods that rely on foreground modelling. With the foreground model choice being an active topic of research (e.g. Hills et al. 2018; Singh & Subrahmanyan 2019; Bevins et al. 2021), we emphasize that our foreground orthogonalization can be applied directly to any other foreground model with unbounded linear parameters, such as equations (4), (7), (9) in Hills et al. (2018), and equation (2) through (8) in Bevins et al. (2021). The only change required is to recompute the design matrix Fni as described in Section 2.4.1. We leave expanding the GP method to foreground models with non-linear parameters or bounded parameter ranges for future work, and note that the FlexKnot method can already be applied to any parametrized foreground model without the aforementioned restrictions.
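As a sketch of how such a projection can be computed for any linear foreground model: build the design matrix Fni, take an orthonormal basis of its row space via an SVD (cf. Section 2.4.1), and subtract the foreground-parallel part of the data. The power-law basis below is purely illustrative and not the exact model used in the paper.

```python
import numpy as np

def foreground_orthogonal_projection(F, data):
    """Project `data` (length N_freq) onto the orthogonal complement
    of the row space of the foreground design matrix F
    (N_params x N_freq), assuming a linear model T_fg = a_n F_ni."""
    # Orthonormal basis of the foreground subspace from the SVD rows
    _, _, Vt = np.linalg.svd(F, full_matrices=False)
    coeffs = Vt @ data           # foreground-parallel coefficients
    return data - Vt.T @ coeffs  # foreground-orthogonal residual

# Toy example: a 5-term power-law foreground basis on an EDGES-like band
nu = np.linspace(50, 100, 123)
F = np.vstack([(nu / 75.0) ** (-2.5 + n) for n in range(5)])
ortho = foreground_orthogonal_projection(F, np.sin(nu / 3.0))
```

Swapping in a different linear foreground model only changes the rows of F; the projection code itself is model-agnostic.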

5 CONCLUSIONS

In this paper, we have presented two model-independent approaches to explore a 21 cm measurement and find the signal (cosmological or systematic) that is present in addition to the foregrounds. The two methods achieve different and complementary goals.

Our first method, based on Gaussian Processes, can reliably extract the foreground-orthogonal component of the signal. This is an easy-to-implement and robust method that has been under-utilized in the literature so far. We successfully apply the technique to both a synthetic data set and the EDGES low-band data (Bowman et al. 2018). In the future, we expect this method to be used to compare observational data to theoretical signal predictions. In particular, theorists could explore which types of astrophysical models are especially “bright” in the foreground-orthogonal space and are thus easier to detect or rule out. Furthermore, one could investigate possible sources of systematics in the same way, i.e. consider which kinds of systematics can explain the foreground-orthogonal part of the observed data.

The gist of our first method is to absorb as much of the observed sky temperature as possible into the foreground component, leaving only the features that cannot be explained by the foreground. Our second method, based on FlexKnot, instead absorbs only as much of the observation into the foreground component as is beneficial to leave the simplest possible signal remainder.

Our FlexKnot approach is the first free-form parametrization that can fit the entire 21 cm signal present in the data (foreground-orthogonal and -parallel components) without relying on a theoretical model or pre-defined functional shape. Instead it utilizes a flexible interpolation scheme and Bayesian model selection to find the simplest signal that explains the data. We successfully demonstrate its ability on a synthetic data set and present a novel analysis of the EDGES low-band data (Bowman et al. 2018). The multi-modal posterior distribution we find suggests that multiple signals are consistent with the data, comparable to individual results in the literature (e.g. Hills et al. 2018).

For both of our methods, we briefly want to discuss the required instrument sensitivity. The Gaussian Process method relies on the foreground-orthogonal component of the data being significantly larger than the noise level (otherwise the Gaussian Process is consistent with zero), and the FlexKnot method indirectly relies on the same criterion (otherwise the simplest FlexKnot model to explain the data will be a constant). Thus we can only expect these methods to provide useful results if the foreground-orthogonal projection of the data is significantly larger than the noise level of the instrument. This is a similar sensitivity requirement to other methods, as a lower sensitivity would always allow a foreground-only fit to explain the data.
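This sensitivity criterion can be phrased as a simple sanity check: compare the chi-squared of the foreground-orthogonal data component to its noise-only expectation. The sketch below assumes a linear foreground model; the basis functions, noise level, and injected signal are illustrative only.

```python
import numpy as np

def orthogonal_significance(F, data, sigma_noise):
    """Rough detectability check: chi^2 of the foreground-orthogonal
    data component, divided by its degrees of freedom
    (N_freq - N_params). Values near 1 are consistent with pure
    noise; values >> 1 indicate signal beyond the foreground model."""
    _, _, Vt = np.linalg.svd(F, full_matrices=False)
    resid = data - Vt.T @ (Vt @ data)
    dof = data.size - F.shape[0]
    return float(np.sum(resid**2) / sigma_noise**2 / dof)

# Toy check with an assumed power-law basis and 0.02 K noise
nu = np.linspace(50, 100, 123)
F = np.vstack([(nu / 75.0) ** (-2.5 + n) for n in range(5)])
rng = np.random.default_rng(0)
noise = 0.02 * rng.standard_normal(nu.size)
ratio_noise = orthogonal_significance(F, noise, 0.02)
ratio_signal = orthogonal_significance(F, noise + 0.3 * np.sin(nu / 3.0), 0.02)
```

Here `ratio_noise` comes out near 1 while `ratio_signal` is far above 1, mirroring the requirement that the foreground-orthogonal projection of the data be significantly larger than the noise level.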

In the future, we expect the foreground-orthogonalization to be applied to a multitude of data sets, theoretical signal predictions, and different foreground models. In particular, a combination with recent advances in foreground modelling (Tauscher et al. 2018; Anstey, de Lera Acedo & Handley 2021; Shen et al. 2022) could provide the basis for future 21 cm global signal analyses.

We are especially excited about extensions of the FlexKnot method. In particular, while distinguishing between cosmological and systematic contributions is not within the scope of this paper, there are properties we expect of the cosmological signal that can help us distinguish between the two. The FlexKnot method could be expanded to model the cosmological signal specifically (Shen et al. 2023), or to focus on certain types of systematics.

Acknowledgements

We would like to thank Adam Ormondroyd, Harry Bevins, and Will Handley for helpful discussions about our Nested Sampling procedure, and Thomas Gessey-Jones for providing the simulation data used for Fig. 2. We also thank the anonymous reviewer for their helpful feedback.

Besides the polychord, anesthetic, and fgivenx libraries mentioned before, this work makes wide use of the numpy (Harris et al. 2020), scipy (Virtanen et al. 2020), pandas (Reback et al. 2022), and Matplotlib (Hunter 2007) libraries.

SH acknowledges the support of the Science and Technology Facilities Council (STFC, grant code ST/T505985/1), via the award of a Doctoral Training Partnership (DTP) Ph.D. studentship, and the Institute of Astronomy for a maintenance award. HL was supported by the Institute of Astronomy as part of the 2021 summer internship programme. FP was supported by the UK Medical Research Council programme MC_UU_00002/10 and the Engineering and Physical Sciences Research Council grant EP/R018561/1. AF was supported by a Royal Society University Research Fellowship.

DATA AVAILABILITY

The data underlying this article will be shared on reasonable request to the corresponding author.

Footnotes

1. Note that concurrent work by Mertens, Bobin & Carucci (2023) uses GPs to model the 21 cm signal based on seminumerical simulations, while our goal is to achieve modelling of the signal without relying on simulations.

2. Note that we do not expect our synthetic data to exactly match the EDGES observations. The reasons for this are (i) that we picked a set of plausible parameters but not exactly the best-fitting parameters from EDGES, (ii) that we use a slightly different foreground model, and (iii) that the EDGES observations may contain a signal that is different from the flattened Gaussian we assumed for the synthetic data.

3. The remaining simulation parameters are set to |$R_{\rm mfp}=40\, \mathrm{Mpc}$|, ζ = 15, |$V_{\mathrm{c}}=15\, \mathrm{km\,s^{-1}}$|, f*,II = 0.02, f*,III = 0.01, fr = 1, and the intermediate stellar initial mass function from Gessey-Jones et al. (2022).

4. Note that the posterior distributions are very stretched in the vertical direction. This is due to a degeneracy of the signal mean with the foregrounds that we discuss in Section 3.2.

5. We use the scipy implementation scipy.interpolate.pchip (Virtanen et al. 2020).

6. This is the same matrix as the design matrix from Section 2.4.1.

7. Note that the resulting matrices U, D, and VT are 5 × 5, 5 × 5, and 5 × 123 dimensional, respectively; Fig. 4 shows the five 123-dimensional rows of F (top) and |$\mathbf{\tilde{\mathit{F}}} = \mathbf{\mathit{V}}^T$| (bottom).

8. Zero is an arbitrary choice for illustration purposes.

9. Because we know there exist signals <1 K that can fit the data (as demonstrated in Bowman et al. 2018), we do not consider much larger signals.

10. We will see this in Section 3, specifically in Figs 10 and 13.

11. The noise level of the foreground-orthogonal component is expected to be the same as that of the full signal, as the foreground-parallel component cannot absorb random thermal noise.

12. The same feature appears in the synthetic data analysis because that data is generated with a similar profile as the EDGES flattened Gaussian fit, as discussed in Section 2.2.

13. The weights are normalized to sum to 1 while preserving the relative weight between runs.

14. We have reproduced this plot with multiple Nested Sampling chains to confirm that the modes correspond to the posterior shape and are not an artefact of the sampling.

References

Abazajian K. N., Aslanyan G., Easther R., Price L. C., 2014, J. Cosmol. Astropart. Phys., 2014, 053
Abdurashidova Z. et al., 2022, ApJ, 925, 221
Almosallam I. A., Lindsay S. N., Jarvis M. J., Roberts S. J., 2016, MNRAS, 455, 2387
Anstey D., de Lera Acedo E., Handley W., 2021, MNRAS, 506, 2041
Aslanyan G., Price L. C., Abazajian K. N., Easther R., 2014, J. Cosmol. Astropart. Phys., 2014, 052
Barkana R., 2016, Phys. Rep., 645, 1
Barkana R., 2018, Nature, 555, 71
Bassett N., Rapetti D., Tauscher K., Burns J. O., Hibbard J. J., 2021, ApJ, 908, 189
Bevins H. T. J., Handley W. J., Fialkov A., de Lera Acedo E., Greenhill L. J., Price D. C., 2021, MNRAS, 502, 4405
Bowman J. D., Rogers A. E. E., Hewitt J. N., 2008, ApJ, 676, 1
Bowman J. D., Rogers A. E. E., Monsalve R. A., Mozdzen T. J., Mahesh N., 2018, Nature, 555, 67
Bridges M., Feroz F., Hobson M. P., Lasenby A. N., 2009, MNRAS, 400, 1075
Chandrasekhar S., 1960, Radiative Transfer. Dover Publications, New York
Cohen A., Fialkov A., Barkana R., Lotem M., 2017, MNRAS, 472, 1915
Datta A., Bradley R. F., Burns J. O., Harker G. J. A., Komjathy A., Lazio T. J. L. W., 2016, ApJ, 831
de Lera Acedo E. et al., 2022, Nat. Astron., 6, 984
Dillon J. S. et al., 2014, Phys. Rev. D, 89, 023002
Escamilla L. A., Vazquez J. A., 2023, Eur. Phys. J. C, 83, 251
Ewall-Wice A., Chang T. C., Lazio J., Doré O., Seiffert M., Monsalve R. A., 2018, ApJ, 868, 63
Feng C., Holder G., 2018, ApJ, 858, L17
Fialkov A., Barkana R., 2014, MNRAS, 445, 213
Fialkov A., Barkana R., 2019, MNRAS, 486, 1763
Fialkov A., Barkana R., Pinhas A., Visbal E., 2014a, MNRAS, 437, L36
Fialkov A., Barkana R., Visbal E., 2014b, Nature, 506, 197
Finelli F. et al., 2018, J. Cosmol. Astropart. Phys., 2018, 016
Furlanetto S. R., Zaldarriaga M., Hernquist L., 2004, ApJ, 613, 1
Furlanetto S. R., Oh S. P., Briggs F. H., 2006, Phys. Rep., 433, 181
Garsden H. et al., 2021, MNRAS, 506, 5802
Gessey-Jones T. et al., 2022, MNRAS, 516, 841
Ghosh A. et al., 2020, MNRAS, 495, 2813
Handley W., 2018, J. Open Source Softw., 3, 849
Handley W., 2019, J. Open Source Softw., 4, 1414
Handley W. J., Hobson M. P., Lasenby A. N., 2015a, MNRAS, 450, L61
Handley W. J., Hobson M. P., Lasenby A. N., 2015b, MNRAS, 453, 4384
Handley W. J., Lasenby A. N., Peiris H. V., Hobson M. P., 2019, Phys. Rev. D, 100, 103511
Hanks E. M., Schliep E. M., Hooten M. B., Hoeting J. A., 2015, Environmetrics, 26, 243
Harris C. R. et al., 2020, Nature, 585, 357
Hee S., Handley W. J., Hobson M. P., Lasenby A. N., 2016, MNRAS, 455, 2461
Hee S., Vázquez J. A., Handley W. J., Hobson M. P., Lasenby A. N., 2017, MNRAS, 466, 369
Heimersheim S., Sartorio N. S., Fialkov A., Lorimer D. R., 2022, ApJ, 933, 57
Hills R., Kulkarni G., Meerburg P. D., Puchwein E., 2018, Nature, 564, E32
Hodges J. S., Reich B. J., 2010, Am. Stat., 64, 325
Hoffman M., Gelman A., 2014, J. Mach. Learn. Res., 15, 1593
Hunter J. D., 2007, Comput. Sci. Eng., 9, 90
Iliev I. T., Mellema G., Pen U. L., Merz H., Shapiro P. R., Alvarez M. A., 2006, MNRAS, 369, 1625
Jishnu Nambissan T. et al., 2021, Exp. Astron., 51, 193
Khan K., Calder C. A., 2022, J. Am. Stat. Assoc., 117, 482
Kolopanis M. et al., 2019, ApJ, 883, 133
Li E.-K., Du M., Zhou Z.-H., Zhang H., Xu L., 2021, MNRAS, 501, 4452
Liu A., Shaw J. R., 2020, PASP, 132, 062001
Liu A., Tegmark M., 2012, MNRAS, 419, 3491
Madau P., Meiksin A., Rees M. J., 1997, ApJ, 475, 429
Majumdar S., Mellema G., Datta K. K., Jensen H., Choudhury T. R., Bharadwaj S., Friedrich M. M., 2014, MNRAS, 443, 2843
McQuinn M., Lidz A., Zahn O., Dutta S., Hernquist L., Zaldarriaga M., 2007, MNRAS, 377, 1043
Mertens F. G. et al., 2020, MNRAS, 493, 1662
Mertens F. G., Semelin B., Koopmans L. V. E., 2021, in Siebert A. et al., eds, SF2A-2021: Proceedings of the Annual Meeting of the French Society of Astronomy and Astrophysics. Société Française d'Astronomie et d'Astrophysique (SF2A), France, p. 211
Mertens F. G., Bobin J., Carucci I. P., 2023, MNRAS, 527, 3517
Mesinger A., Furlanetto S., Cen R., 2011, MNRAS, 411, 955
Millea M., Bouchet F., 2018, A&A, 617, A96
Mirocha J., Furlanetto S. R., 2019, MNRAS, 483, 1980
Monsalve R. A., Rogers A. E. E., Bowman J. D., Mozdzen T. J., 2017a, ApJ, 835, 49
Monsalve R. A., Rogers A. E. E., Bowman J. D., Mozdzen T. J., 2017b, ApJ, 847, 64
Monsalve R. A. et al., 2023, preprint
Muñoz J. B., Loeb A., 2018, Nature, 557, 684
Olamaie M., Hobson M. P., Feroz F., Grainge K. J. B., Lasenby A., Perrott Y. C., Rumsey C., Saunders R. D. E., 2018, MNRAS, 481, 3853
Paciga G. et al., 2013, MNRAS, 433, 639
Patil A. H. et al., 2017, ApJ, 838, 65
Patra N., Subrahmanyan R., Raghunathan A., Udaya Shankar N., 2013, Exp. Astron., 36, 319
Philip L. et al., 2019, J. Astron. Instrum., 8, 1950004
Planck Collaboration XX, 2016, A&A, 594, A20
Planck Collaboration I, 2020a, A&A, 641, A1
Planck Collaboration X, 2020b, A&A, 641, A10
Plumlee M., Joseph V. R., 2018, Stat. Sin., 28, 601
Price D. C. et al., 2018, MNRAS, 478, 4193
Pritchard J. R., Loeb A., 2012, Rep. Prog. Phys., 75, 086901
Rapetti D., Tauscher K., Mirocha J., Burns J. O., 2020, ApJ, 897, 174
Rasmussen C., Williams C., 2006, Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning. MIT Press, Cambridge, MA, USA
Reback J. et al., 2022, pandas-dev/pandas: Pandas 1.4.2. Zenodo
Reis I., Fialkov A., Barkana R., 2021, MNRAS, 506, 5479
Rogers A. E. E., Pratap P., Carter J. C., Diaz M. A., 2005, Radio Sci., 40, RS5S17
Rogers A. E. E., Bowman J. D., Vierinen J., Monsalve R., Mozdzen T., 2015, Radio Sci., 50, 130
Ross H. E., Dixon K. L., Iliev I. T., Mellema G., 2017, MNRAS, 468, 3785
Rue H., Held L., 2005, Gaussian Markov Random Fields. Chapman and Hall/CRC, London
Semelin B., Eames E., Bolgar F., Caillat M., 2017, MNRAS, 472, 4508
Shen E., Anstey D., de Lera Acedo E., Fialkov A., Handley W., 2021, MNRAS, 503, 344
Shen E., Anstey D., de Lera Acedo E., Fialkov A., 2022, MNRAS, 515, 4565
Shen E., Anstey D., de Lera Acedo E., Fialkov A., 2023, preprint
Sims P. H., Pober J. C., 2020, MNRAS, 492, 22
Singh S., Subrahmanyan R., 2019, ApJ, 880, 26
Singh S. et al., 2022, Nat. Astron., 6, 607
Skilling J., 2004, in Fischer R., Preuss R., Toussaint U. V., eds, AIP Conf. Ser. Vol. 735, Bayesian Inference and Maximum Entropy Methods in Science and Engineering: 24th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering. Am. Inst. Phys., New York, p. 395
Slatyer T. R., Wu C.-L., 2018, Phys. Rev. D, 98, 023013
Sokolowski M. et al., 2015, Publ. Astron. Soc. Aust., 32, e004
Stan Development Team, 2018, The Stan Core Library. http://mc-stan.org/, last accessed 2023-09-07
Stan Development Team, 2023a, Stan Modeling Language Users Guide and Reference Manual. http://mc-stan.org/, last accessed 2023-09-07
Stan Development Team, 2023b, RStan: the R interface to Stan. https://mc-stan.org/, last accessed 2023-09-07
Switzer E. R., Liu A., 2014, ApJ, 793, 102
Tauscher K., Rapetti D., Burns J. O., Switzer E., 2018, ApJ, 853, 187
Tauscher K., Rapetti D., Burns J. O., 2020, ApJ, 897, 132
The HERA Collaboration, 2022, preprint
Trott C. M. et al., 2020, MNRAS, 493, 4711
Vázquez J. A., Bridges M., Hobson M. P., Lasenby A. N., 2012a, J. Cosmol. Astropart. Phys., 2012, 006
Vázquez J. A., Bridges M., Hobson M. P., Lasenby A. N., 2012b, J. Cosmol. Astropart. Phys., 2012, 020
Vázquez J. A., Bridges M., Ma Y.-Z., Hobson M. P., 2013, J. Cosmol. Astropart. Phys., 2013, 001
Vedantham H. K., Koopmans L. V. E., de Bruyn A. G., Wijnholds S. J., Ciardi B., Brentjens M. A., 2014, MNRAS, 437, 1056
Virtanen P. et al., 2020, Nat. Methods, 17, 261
Voytek T. C., Natarajan A., Jáuregui García J. M., Peterson J. B., López-Cruz O., 2014, ApJ, 782, L9

APPENDIX A: MEAN CENTRING IN GP FIT

In this section, we test whether the mean subtraction (Section 2.5.2) that we apply to the FlexKnot signals can also remove the foreground degeneracy in the Gaussian Process fit. Fig. A1 shows the GP posterior contours for the synthetic data (top) and the EDGES low-band data (bottom) with the mean subtraction applied. As expected (see the discussion in Section 2.5.2), the mean subtraction is not very effective and does not change the qualitative result.

Figure A1. Gaussian Process posterior contours with mean-centring for the synthetic data (top) and EDGES low-band data (bottom). This figure is the counterpart to Fig. 7 but with the mean-centring described in Section 2.5.2. The mean-centring reduces the degeneracy in the Gaussian Process contours somewhat but does not solve the issue; the effect is not as strong as for the FlexKnot method (discussed in Appendix C).

APPENDIX B: EFFECT OF SIGNAL PRIOR CHOICE IN FLEXKNOT ANALYSIS

In addition to the necessary priors on the knot parameters (where we chose uniform priors on |$[-2\, \mathrm{K}, 2\, \mathrm{K}]$| for the Tsignal value of each knot), we can add a further prior to encode what we expect of the signal. In the main analysis, we chose a prior limiting the overall contrast of the signal, i.e. the difference between the highest and lowest signal value. We put a Gaussian prior |$\mathcal{N}(0\, \mathrm{K}, 1\, \mathrm{K})$| on this difference to encode that we consider a signal with larger variation less plausible.
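A minimal sketch of this contrast prior (for brevity evaluated on the knot values directly; in the analysis it is the difference of the highest and lowest interpolated signal values that enters, and the normalization constant is irrelevant for sampling):

```python
import numpy as np

def log_contrast_prior(T_signal, sigma=1.0):
    """Gaussian N(0 K, sigma) log-prior on the signal contrast,
    i.e. the difference between the highest and lowest signal value
    (in K); the constant normalization term is omitted."""
    contrast = float(np.max(T_signal) - np.min(T_signal))
    return -0.5 * (contrast / sigma) ** 2

# A flat signal is unpenalized; a 2 K peak-to-trough signal is
# suppressed by a factor of e^-2 relative to it
lp_flat = log_contrast_prior(np.array([0.1, 0.1, 0.1]))
lp_deep = log_contrast_prior(np.array([-1.0, 1.0, 0.0]))
```

This penalizes only the overall peak-to-trough range, unlike the per-value prior in the third row of Fig. B1, which constrains every signal value individually.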

In Fig. B1, we show the effect of different such prior choices. The first row shows just a uniform prior |$[-2\, \mathrm{K}, 2\, \mathrm{K}]$| on the signal values. This is almost equivalent to no additional prior, as the prior on the knot parameters has approximately the same effect (though the signal could exceed the knot prior range in the extrapolation). Note that the plots show mean-subtracted signals, which is why some lines reach outside the |$[-2\, \mathrm{K}, 2\, \mathrm{K}]$| range. The second row shows the aforementioned Gaussian prior on the signal contrast, as used in the main analysis. The third row shows a Gaussian prior |$\mathcal{N}(0\, \mathrm{K}, 1\, \mathrm{K})$| on every value Tsignal(νi), which we found to be too strong, suppressing many posterior modes.

Figure B1. Comparison of prior choices for the FlexKnot signal fit. The first row shows runs with a uniform |$[-2\, \mathrm{K}, 2\, \mathrm{K}]$| prior on the signal values, the second row runs with a Gaussian prior |$\mathcal{N}(0\, \mathrm{K}, 1\, \mathrm{K})$| on the difference between the highest and lowest signal value, and the third row runs with a strong Gaussian prior |$\mathcal{N}(0\, \mathrm{K}, 1\, \mathrm{K})$| on every signal value. We demonstrate the effect for both the synthetic data (left) and the EDGES low-band data (right), and add lines indicating the input signal (red solid, left) or the flattened Gaussian best-fit (orange dashed, right) for easier comparison.

Note that in all cases we keep the uniform prior on the FlexKnot coordinates |$[-2\, \mathrm{K}, 2\, \mathrm{K}]$|⁠. We think this is appropriate here since even a foreground-only fit produces residuals |$\ll 2\, \mathrm{K}$|⁠, but we emphasize that for other data a wider prior may need to be considered.

The left and right columns show the synthetic and EDGES low-band data, respectively. We see that the prior choice mostly affects the relative weight of the different posterior modes, as expected. For example, the synthetic data posteriors (left column) in the first row show a flattened Gaussian-shaped signal fitting the data, as well as a signal with a large |$+2\, \mathrm{K}$| peak and a |$-1\, \mathrm{K}$| trough; the contrast prior in the second row gives more weight to the former and suppresses the latter, more extreme mode, as desired. The third row, however, is extremely constraining and artificially restricts the results to a single signal shape.
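The three prior choices compared above can be written as simple log-probability terms on a gridded signal. The following is a minimal sketch; the function names and the evaluation of Tsignal on a frequency grid are our own illustrative choices, and normalization constants are dropped:

```python
import numpy as np

def log_prior_uniform(T, lo=-2.0, hi=2.0):
    """Uniform prior [-2 K, 2 K] on every signal value (first row of Fig. B1)."""
    return 0.0 if np.all((T >= lo) & (T <= hi)) else -np.inf

def log_prior_contrast(T, sigma=1.0):
    """Gaussian prior N(0 K, 1 K) on the signal contrast, i.e. the
    difference between the highest and lowest value (second row)."""
    contrast = T.max() - T.min()
    return -0.5 * (contrast / sigma) ** 2

def log_prior_per_value(T, sigma=1.0):
    """Gaussian prior N(0 K, 1 K) on every individual value (third row);
    this is the variant found to be too restrictive."""
    return -0.5 * np.sum((T / sigma) ** 2)

# A constant 0.5 K signal: allowed by the uniform prior, unpenalized by the
# contrast prior (zero contrast), but penalized per point by the third prior.
T_flat = np.full(100, 0.5)
```

This makes the qualitative difference explicit: the contrast prior only penalizes the overall amplitude of features, while the per-value prior also penalizes any constant offset, which is degenerate with the foreground anyway.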

APPENDIX C: FLEXKNOT MEAN-ZERO SHIFT AND FOREGROUND-ORTHOGONAL PROJECTION

The FlexKnot parameter posterior initially consists of samples of knot parameters, as illustrated in the lower panel of Fig. 3. We then turn each sample into the corresponding T(ν) function via the FlexKnot interpolation. The resulting functions are shown in the upper panel of Fig. C1. We can already read off the different function shapes, but due to the trivial degeneracy of shifting the signal up and down (which is not penalized by the complexity penalty of the Bayesian evidence), these plots can be hard to interpret when many different shapes are present. Contour plots in particular are impossible to interpret in this format.
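As a minimal illustration, one knot sample can be turned into a function on the measurement frequency grid as follows. The monotone PCHIP interpolant and the function names used here are illustrative assumptions, not necessarily the exact interpolation scheme of our implementation:

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

def flexknot_signal(knot_nu, knot_T, nu_grid):
    """Turn one posterior sample of knot coordinates into T(nu) values.

    The knots are sorted first, since samplers need not return them
    ordered in frequency; extrapolation covers the band edges.
    """
    order = np.argsort(knot_nu)
    interp = PchipInterpolator(knot_nu[order], knot_T[order], extrapolate=True)
    return interp(nu_grid)

# Hypothetical example: an absorption-like sample on an EDGES-like band.
nu = np.linspace(50.0, 100.0, 201)  # MHz
T = flexknot_signal(np.array([50.0, 78.0, 100.0]),
                    np.array([0.0, -0.5, 0.0]), nu)
```

Each posterior sample processed this way yields one curve in the upper panel of Fig. C1.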

Figure C1. Different post-processing steps we consider for visualizing our FlexKnot results. Top: the raw FlexKnot posterior function samples. Middle: posterior samples shifted to have zero mean; this is what we use in the main text. Bottom: posterior samples projected (in post-processing) into the foreground-orthogonal subspace; we use this only as a sanity check. All runs shown here use the synthetic data set and the Gaussian prior on the signal contrast discussed in Appendix B.

Thus, we remove this degeneracy in post-processing by shifting all signals to have zero mean, as shown in the middle panel of Fig. C1. This does not affect the functional shapes we are interested in, and it makes the plots much more readable. This is the format we use for all functional posterior plots in Sections 3.2 and 3.3 and in Appendix B.

Finally, we can also project the FlexKnot signals into the foreground-orthogonal subspace, as shown in the lower panel of Fig. C1 (note the different y-axis). This is a useful check of whether the FlexKnot result is consistent with the Gaussian Process result; indeed, all FlexKnot results are consistent with the GP results in this projection. We emphasize that the FlexKnot still describes the full signal Tsignal (the foreground-orthogonal projection is applied in post-processing, purely for this check), while the Gaussian Process directly fits the foreground-orthogonal component Tsignal,projected.
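Both post-processing steps of this appendix, the mean-zero shift and the foreground-orthogonal projection, are simple linear operations on each posterior sample. The sketch below assumes an illustrative log-polynomial foreground basis; the actual basis follows from the foreground model in the main text:

```python
import numpy as np

def orthogonal_projector(F):
    """Projector onto the orthogonal complement of the column span of F."""
    Q, _ = np.linalg.qr(F)
    return np.eye(F.shape[0]) - Q @ Q.T

# Illustrative foreground basis (an assumption): polynomial in log-frequency.
nu = np.linspace(50.0, 100.0, 201)                  # MHz
F = np.vander(np.log(nu / 75.0), N=5, increasing=True)

P = orthogonal_projector(F)
T_sample = np.exp(-0.5 * ((nu - 78.0) / 5.0) ** 2)  # toy posterior sample

T_zero_mean = T_sample - T_sample.mean()            # middle panel of Fig. C1
T_projected = P @ T_sample                          # bottom panel of Fig. C1
```

The projection removes everything the foreground basis could absorb (including any constant offset, here part of the basis), which is why the bottom panel of Fig. C1 shows a much smaller amplitude than the other panels.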

APPENDIX D: TRANSFORMED FOREGROUND PARAMETER POSTERIORS

In this section, we show the foreground parameter plots for the transformed parameters |$\tilde{a}_n$|⁠. These parameters do not suffer from the degeneracies of the non-transformed parameters an and thus simplify the comparison between the FlexKnot results and the synthetic data inputs or flattened Gaussian best-fitting parameters, respectively.
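One standard way to obtain such degeneracy-free parameters is to re-express the (linearized) foreground model in an orthonormalized basis, e.g. via a QR decomposition. The sketch below is only an illustration of this idea, using a hypothetical log-polynomial basis and made-up parameter values; it is not necessarily the exact transformation defined in the main text:

```python
import numpy as np

# Hypothetical linearized foreground basis: columns of F span the model,
# and the columns are strongly degenerate over a narrow band.
nu = np.linspace(50.0, 100.0, 201)                 # MHz
F = np.vander(np.log(nu / 75.0), N=5, increasing=True)

# F = Q R with orthonormal columns Q; coefficients in the Q basis are the
# transformed, decorrelated parameters.
Q, R = np.linalg.qr(F)

a = np.array([1000.0, -2.5, 0.1, 0.0, 0.0])        # made-up a_n values [K]
a_tilde = R @ a                                    # transformed parameters

# F @ a and Q @ a_tilde trace out the same foreground curve, so the
# reparametrization changes the posterior geometry but not the model.
```

Because the columns of Q are orthonormal, the transformed parameters map onto independent directions of the data, which is why their posteriors are far less correlated than those of the an.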

In Fig. D1 we show the synthetic data posterior (top panel), corresponding to Fig. 10 in the main text, and the EDGES low-band data posterior (bottom panel), corresponding to Fig. 13 in the main text.

Figure D1. Posterior distributions of the transformed parameters |$\tilde{a}_n$| (rather than the an parameters shown in Figs 10 and 13). We show the FlexKnot foreground posterior distributions (blue contours) for the synthetic (top) and EDGES low-band (bottom) data sets. The top panel shows that the input parameters used for synthetic data (red markers) are well recovered by the FlexKnot fit. The bottom panel instead shows the foreground parameter values for the best-fitting flattened Gaussian model (orange markers). While these lie within the FlexKnot 95 per cent credibility region, they overlap less well with the FlexKnot posterior. This is expected because the FlexKnot method finds different signals than the flattened Gaussian fit.
