Emulation of the cosmic dawn 21-cm power spectrum and classification of excess radio models using an artificial neural network

ABSTRACT

The cosmic 21-cm line of hydrogen is expected to be measured in detail by the next generation of radio telescopes. The enormous data set from future 21-cm surveys will revolutionize our understanding of early cosmic times. We present a machine learning approach based on an artificial neural network that uses emulation in order to uncover the astrophysics in the epoch of reionization and cosmic dawn. Using a seven-parameter astrophysical model that covers a very wide range of possible 21-cm signals, over the redshift range 6 to 30 and wavenumber range 0.05 to |$1 \ \rm {Mpc}^{-1}$| we emulate the 21-cm power spectrum with a typical accuracy of |$10 - 20~{{\ \rm per\ cent}}$|⁠. As a realistic example, we train an emulator using the power spectrum with an optimistic noise model of the square kilometre array (SKA). Fitting to mock SKA data results in a typical measurement accuracy of 2.8 per cent in the optical depth to the cosmic microwave background, 34 per cent in the star-formation efficiency of galactic haloes, and a factor of 9.6 in the X-ray efficiency of galactic haloes. Also, with our modelling we reconstruct the true 21-cm power spectrum from the mock SKA data with a typical accuracy of |$15 - 30~{{\ \rm per\ cent}}$|⁠. In addition to standard astrophysical models, we consider two exotic possibilities of strong excess radio backgrounds at high redshifts. We use a neural network to identify the type of radio background present in the 21-cm power spectrum, with an accuracy of 87 per cent for mock SKA data.

methods: numerical, methods: statistical, dark ages, reionization, first stars, cosmology: theory

1 INTRODUCTION

The redshifted 21-cm signal from neutral hydrogen is the most promising probe of the epoch of reionization (EoR) and cosmic dawn. This 21-cm emission or absorption originates from the hyperfine splitting of the hydrogen atom. As this signal depends on both cosmological and astrophysical parameters, it should be possible to decipher abundant information about the early Universe from the signal once it is observed. The low frequency array (LOFAR; Gehlot et al. 2019), the precision array to probe the epoch of reionization (PAPER; Kolopanis et al. 2019), the Murchison wide-field array (MWA; Trott et al. 2020), the Owens valley radio observatory long wavelength array (OVRO-LWA; Eastwood et al. 2019), the large-aperture experiment to detect the dark age (LEDA; Price et al. 2018; Garsden et al. 2021), and the hydrogen epoch of reionization array (HERA; DeBoer et al. 2017) are experiments that have analysed data in an attempt to detect the power spectrum from the EoR. Although the existing upper limits are weak, they already provide interesting constraints on some of the exotic scenarios (e.g. with extra radio background as considered here) (Mondal et al. 2020; Abdurashidova et al. 2022). HERA along with the New Extension in Nançay Upgrading LOFAR (NenuFAR; Zarka et al. 2012) and the square kilometre array (SKA; Koopmans et al. 2015) will aim to measure the power spectrum over a wide range of redshifts including cosmic dawn. Thus, we expect a great deal of data from observations in the upcoming decade.

The question arises as to what are the possible ways to infer the astrophysical parameters from the observed 21-cm power spectrum data. Since the characteristic astrophysical parameters at high redshifts are currently almost entirely unconstrained, the 21-cm signal must be calculated for a large number of parameter sets that cover a wide range of possibilities. Given the complexity of the 21-cm signal (see Barkana 2018a; Mesinger 2019) and its highly non-linear dependence on the astrophysical parameters, Artificial Neural Networks (ANNs) are a useful method for emulation and fitting. Shimabukuro & Semelin (2017) used an ANN to estimate the astrophysical parameters from 21-cm observations. They trained the ANN using 70 data sets where each set consists of the 21-cm power spectrum obtained using 21cmfast (Mesinger, Furlanetto & Cen 2011) as input, with three EoR parameters used in the simulation as output. They applied the trained ANN to 54 data sets to evaluate how the algorithm performs. Kern et al. (2017) used a machine learning algorithm to emulate the 21-cm power spectrum and perform Bayesian analysis for parameter constraints over eleven parameters which included six parameters of the EoR and X-ray heating and five additional cosmological parameters. Schmit & Pritchard (2018) built an emulator using a neural network to emulate the 21-cm power spectrum where they generated the training and test data sets using the 21cmfast simulation and compared their results with 21cmmc. Cohen et al. (2020) introduced the first all sky averaged (global) 21-cm signal emulator using an ANN. Recently, Bevins et al. (2021) and Bye, Portillo & Fialkov (2022) proposed two different approaches for emulating the global 21-cm signal. The astrophysical parameters and reionization history can also be recovered from 21-cm images directly using convolutional neural networks (CNN) (Gillet et al. 2019; La Plante & Ntampaka 2019). In this paper, we use an emulation method to constrain the 21-cm power spectrum for the seven-parameter astrophysical model. Using ANN, we construct an emulator that is trained on a large data set of models that cover a very wide range of the astrophysical parameter space. Given the seven-parameter astrophysical model, the emulator is able to predict the 21-cm power spectrum over a wide redshift range (z = 6 to 30). We construct our algorithm in a way that approximately accounts for emulation error (i.e. the uncertainty due to the finite size of the training set), and also test the accuracy (and improve the error estimates) using 5-fold cross-validation. We also explore a more realistic case of the observational measurements expected for the SKA, as well as extended models that also include an excess early radio background. We note that any supervised machine learning algorithm is specific to the particular seminumerical model that is used to generate the training data. Therefore, the results obtained from such algorithms may not be directly applicable to simulations conducted using other seminumerical methods (such as 21cmfast) or to the real Universe; in further work we plan to test with data generated with other methods.

This paper is organized as follows: We present in Section 2 a description of the theory and methods used to generate the data sets (2.1–2.4) and build the ANN and maximum likelihood estimator (2.5–2.6). Section 3 presents our results, for standard astrophysical models (3.1–3.4) and ones with an early radio background (3.5–3.6). Finally, we summarize our results and discuss our conclusions in Section 4.

2 THEORY AND METHODS

2.1 21-cm signals

2.1.1 Astrophysical parameters

We use seven key parameters to parametrize the high redshift astrophysics: the star formation efficiency (f_⋆), the minimum circular velocity of star-forming haloes (V_C), the X-ray radiation efficiency (f_X), the power-law slope (α) and the low energy cutoff (E_min) of the X ray spectral energy distribution (SED), the optical depth (τ) of the cosmic microwave background (CMB), and the mean free path (R_mfp) of ionizing photons. Here we briefly discuss these astrophysical parameters.

• The star formation efficiency, f_⋆, quantifies the fractional amount of gas in star-forming dark matter haloes that is converted into stars (Tegmark et al. 1997). The value of f_⋆ depends on the details of star formation that are unknown at high redshift, so we treat it as a free parameter. We assume a constant star formation efficiency in haloes heavier than the atomic cooling mass and a logarithmic cutoff in the efficiency in lower mass haloes (Fialkov et al. 2013). We cover a wide range of f_⋆ values, from 0.0001 to 0.5.

• The circular velocity, V_C, is another parameter that encodes the information about star formation. Star formation takes place in dark matter haloes that are massive enough to radiatively cool the in-falling gas (Tegmark et al. 1997). This is the main element in setting the minimum mass of star-forming haloes, M_min. We equivalently use the minimum circular velocity as one of our free parameters. Since the cooling and the internal feedback depend on the depth of the potential and the potential is directly related to V_C, it is more physical to use a fixed V_C versus redshift rather than a fixed M_min. Since complex feedback (e.g. Schauer et al. 2015) of various types can suppress star formation in low-mass haloes, we treat V_C as a free parameter. In practice the actual threshold is not spatially homogeneous in our simulation since individual pixels are affected by feedback processes including Lyman–Werner feedback on small haloes, photoheating feedback during the EoR, and the streaming velocity between dark matter and baryons. The relation between the circular velocity (V_C) and the minimum mass of the dark matter halo (M_min) is given (in the Einstein de-Sitter limit which is valid at high redshift) by

$$\begin{eqnarray} V_C = 16.9\left(\frac{M_{\rm {min}}}{10^8{\rm M}_{\odot }}\right)^{1/3}\left(\frac{1+z}{10}\right)^{1/2}\left(\frac{\Omega _m}{0.0316}\right)^{1/6}\ \rm {km\ s^{-1}} \ . \end{eqnarray}$$

(1)

• The X-ray radiation efficiency, f_X, is defined by the standard expression of the ratio of the X-ray luminosity to the star formation rate (⁠|$\rm {L_X}-$|SFR relation) [see Fialkov, Barkana & Visbal (2014), Cohen et al. (2017) for more details]

$$\begin{eqnarray} \frac{\rm {L_X}}{\rm {SFR}} = 3\times 10^{40}f_X \ \ \rm {erg\ s^{-1}\ M_\odot ^{-1}\ yr}\ \ . \end{eqnarray}$$

(2)

In the above expression |$\rm {L_X}$| is the bolometric luminosity and f_X is the X-ray efficiency of the source. The normalization is such that f_X = 1 corresponds to the typical observed value for low-metallicity galaxies. Given the almost total absence of observational constraints at the relevant redshifts, we vary f_X from 0.0001 to 1000.

• The power-law slope α and the low energy cutoff E_min determine the shape of the spectral energy distribution (SED). We parametrize the X-ray SED by the power-law slope α (where dlog(E_X)/dlog(ν) = −α) and the low energy cutoff (E_min). These two parameters have significant degeneracy, so we vary α in the narrow range 1–1.5 and E_min in the broad range of 0.1–3.0 keV. The SEDs of the early X-ray sources strongly affect the 21-cm signal from both the EoR and cosmic dawn (Fialkov & Barkana 2014; Fialkov et al. 2014). Soft X-ray sources (emitting mostly below 1 keV) produce strong fluctuations on relatively small scales (up to a few tens of Mpc) whereas hard X-ray sources produce milder fluctuations on larger scales. X-ray binaries (XRB) (Mirabel et al. 2011; Fragos et al. 2013) are major sources that are expected to have a hard X-ray SED.

• The optical depth of the CMB, τ, is one of two parameters that describe the EoR. For given values of the other astrophysical parameters, the CMB optical depth has a one-to-one relation with the ionizing efficiency ζ which is defined by

$$\begin{eqnarray} \zeta = f_{\star }f_{\rm {esc}}N_{\rm {ion}}\frac{1}{1 + \bar{n}_{\rm {rec}}}\ ,\\ \end{eqnarray}$$

(3)

where f_⋆ is the star formation efficiency, f_esc is the fraction of ionizing photons that escape from their host galaxy, N_ion is the number of ionizing photons produced per stellar baryon in star-forming haloes, and |$\bar{n}_{\rm {rec}}$| is the mean number of recombinations per ionized hydrogen atom. We choose to include the CMB optical depth (τ) in our seven-parameter astrophysical model instead of the ionizing efficiency (ζ) because τ is directly constrained by CMB observations (Planck Collaboration VI 2020).

• The mean free path of ionizing photons, R_mfp, is the other EoR parameter (Alvarez & Abel 2012). R_mfp sets the maximum distance travelled by ionizing photons. Due to the process of structure formation, dense regions of neutral hydrogen (Lyman-limit systems) effectively absorb all the ionizing radiation and thus limit the sphere of influence of each ionizing source. The mean free path parameter approximately accounts for the effect of these dense neutral hydrogen pockets during reionization. In our simulations, we vary R_mfp from 10 to 70 comoving Mpc (Wyithe & Loeb 2004; Songaila & Cowie 2010).

2.1.2 Power spectrum

It is possible in principle to map the distribution of neutral hydrogen three dimensionally in the early Universe by observing the brightness temperature contrast of the 21-cm line. In order to infer the information about the astrophysical processes in the EoR and cosmic dawn, there are a variety of approaches one can follow to characterize the 21-cm signal. Other than the global signal, the most straightforward approach is to use the statistical description of the 21-cm fluctuations, i.e. the 21-cm power spectrum.

The 21-cm power spectrum encodes a great deal of information about the underlying physical processes related to reionization and cosmic dawn. We define the power spectrum P(k) of fluctuation of the 21-cm brightness temperature (relative to the radio background, which is the CMB in standard models) by

$$\begin{eqnarray} \langle \tilde{\delta }_{T_b}(\boldsymbol {k})\tilde{\delta }^{*}_{T_b}(\boldsymbol {k}^{\prime }) \rangle = (2\pi)^3\delta _D(\boldsymbol {k}-\boldsymbol {k}^{\prime })P(k)\ , \end{eqnarray}$$

(4)

where k is the comoving wave vector, δ_D is the Dirac delta function, and the angular brackets denote the ensemble average. |$\tilde{\delta }_{T_b} (\boldsymbol {k})$| is the Fourier transform of |$\delta _{T_b} (\boldsymbol {x})$| which is defined by |$\delta _{T_b} (\boldsymbol {x}) = (\delta T_b(\boldsymbol {x})-\bar{\delta T_b})/\bar{\delta T_b}$|⁠. Finally we express the power spectrum in terms of the variance, in mK² units:

$$\begin{eqnarray} \Delta ^2 = \langle \delta T_b \rangle ^2 \frac{k^3P(k)}{2\pi ^2}\ , \end{eqnarray}$$

(5)

where the expression k³P(k)/2π² is dimensionless. The 21-cm signal is significantly non-Gaussian because of both large-scale and small-scale processes during reionization and cosmic dawn. Thus, the power spectrum does not reveal all the statistical information that is available. Nevertheless, a wealth of astrophysical information can be extracted from the 21-cm power spectrum and it can be measured relatively easily from observations.

2.2 The excess radio background

The first observational signature of the H i 21-cm line from cosmic dawn was tentatively detected by the EDGES collaboration (Bowman et al. 2018). The shape and magnitude of this signal are not consistent with the standard astrophysical expectation. The reported 21-cm signal is centred at ν = 78.2 MHz with an absorption trough of |$\delta T_b = -500^{+200}_{-500}$| mK (Bowman et al. 2018). The amplitude of absorption is more than a factor of two larger than that predicted from standard astrophysics based on the ΛCDM cosmology and hierarchical structure formation. The SARAS 3 experiment recently reported the upper limit of the global signal that is inconsistent with the EDGES signal (Singh et al. 2022) at 95 per cent, so it will be some time before we can be confident that the global 21-cm signal has been reliably measured.

If EDGES is confirmed, one possible explanation of this observed signal is that there is an additional cooling mechanism that makes the neutral hydrogen gas colder than expected; a novel dark matter interaction with the cosmic gas (Barkana 2018b) is a viable option, but it likely requires a somewhat elaborate dark matter model (Barkana et al. 2018; Berlin et al. 2018; Muñoz & Loeb 2018; Liu et al. 2019). Another possibility, which we consider in detail in this paper, is an excess radio background above the CMB (Bowman et al. 2018; Ewall-Wice et al. 2018; Feng & Holder 2018; Fialkov & Barkana 2019; Mirocha & Furlanetto 2019; Ewall-Wice, Chang & Lazio 2020; Reis, Fialkov & Barkana 2020b). This excess radio background increases the contrast between the spin temperature and the background radiation temperature. In this case the basic equation for the observed 21-cm brightness temperature from redshift z relative to the background is

$$\begin{eqnarray} \delta T_b = \frac{T_{\rm {S}}-T_{\rm {rad}}}{1 + z}(1-e^{-\tau _{\nu }})\ , \end{eqnarray}$$

(6)

where T_rad = T_CMB + T_radio, with T_radio being the brightness temperature of the excess radio background and T_CMB = 2.725(1 + z) K. We consider two distinct types of extra radio models, which we have considered in previous publications. The external radio model assumes a homogeneous background that is not directly related to astrophysical sources, i.e. may be generated by exotic processes (such as dark matter decay) in the early Universe. In this model, we assume that the brightness temperature of the excess radio background at the 21-cm rest-frame frequency at redshift z is given by (Fialkov & Barkana 2019)

$$\begin{eqnarray} T_{\rm {radio}} = A_r \times 2.725(1 + z)\left[\frac{1420}{78(1 + z)}\right]^{\beta } \ \hbox{K}\ \ , \end{eqnarray}$$

(7)

where the spectral index β = −2.6 (set to match the slope of the observed extragalactic radio background observed by ARCADE2 [Fixsen et al. 2011; Seiffert et al. 2011) and confirmed by LWA1 (Dowell & Taylor 2018)] and A_r is the amplitude of the radio background. Here 1420 MHz/(1 + z) is the observed frequency corresponding to redshift z, and A_r measures the amplitude (relative to the CMB) at the central frequency of the EDGES feature (78 MHz). Thus, the external radio model has eight free parameters: f_⋆, V_C, f_X, α, E_min, τ, R_mfp, and A_r.

In contrast to this external radio background, astrophysical sources such as supermassive black holes or supernovae could in principle produce such an extra radio background due to synchrotron radiation. In such a case, the radio emission would originate from within high redshift radio galaxies and would thus result in a spatially varying radio background, as computed accurately on large scales within our seminumerical simulations (Reis et al. 2020b). The galaxy radio luminosity can be written as

$$\begin{eqnarray} L_{\rm {radio}}(\nu , z) = f_{R} \times 10^{22} \left(\frac{\nu }{150~\hbox{MHz}} \right)^{-\alpha _{\rm {radio}}} \left(\frac{\rm {SFR}}{\rm {M_\odot yr^{-1}}}\right) \ \ \ \ \rm {W\, Hz}^{-1}\ ,\\ \end{eqnarray}$$

(8)

where α_radio is the spectral index in the radio band, |$\rm {SFR}$| is the star formation rate, and f_R is the normalization of the radio emissivity. Based on observations of low-redshift galaxies, we set α_radio = 0.7 and note that f_R = 1 roughly corresponds to the expected value (Gürkan et al. 2018; Mirocha & Furlanetto 2019). Since extrapolating low-redshift observations to cosmic dawn may be wildly inaccurate, in our analysis we allow f_R to vary over a wide range. Thus, the galactic radio model is also based on eight parameters: f_⋆, V_C, f_X, α, E_min, τ, R_mfp, and f_R.

Both types of radio background, if they exist, can affect the 21-cm power spectrum, leading to a strong amplification of the 21-cm signal during cosmic dawn and the EoR in models in which the radio background is significantly brighter than the CMB. However, there are some major differences between the two models. The external radio background is spatially uniform, is present at early cosmic times (prior to the formation of the first stars), and increases with redshift (i.e. it is very strong at cosmic dawn and weakens during the EoR). On the other hand, the galactic radio background is non-uniform, and its intensity generally rises with time as it follows the formation of galaxies (as long as f_R is assumed to be constant with redshift).

2.3 Mock SKA data

To consider a more realistic case study, we create mock SKA data by including several expected observational effects in the 21-cm power spectrum, which we refer to as the case ‘mock SKA data’. To incorporate the SKA noise case within the data, (i) we include the effect of the SKA angular resolution, (ii) we add a pure Gaussian noise smoothed over the SKA resolution as a realization of the SKA thermal noise. The strength of the SKA thermal noise (for a frequency depth corresponding to 3 comoving Mpc and assuming a 1000 h integration) is approximated by

$$\begin{eqnarray} \sigma _{\rm {thermal}} = a\left(\frac{1+z}{17}\right)^b \ , \end{eqnarray}$$

(9)

where a and b depend on the resolution used. Here we use a smoothing radius, R_SKA = 20 Mpc, for which a = 4 mK and b = 5.1 for z > 16, while for lower redshifts, b = 2.7 [following Banet et al. (2021), see also Koopmans et al. (2015)] and (iii) we remove (i.e. set to zero) modes from part of the k-space (the ‘foreground wedge’) that is dominated by foregrounds [following Reis, Barkana & Fialkov (2020a), see also Datta, Bowman & Carilli (2010); Morales et al. (2012); Dillon et al. (2014); Pober et al. (2014); Liu, Parsons & Trott (2014a, b); Jensen et al. (2015); Pober (2015)]. Each of the three effects is included along with its expected redshift dependence. Regarding the foreground avoidance/filtering, we note that we assume that the high-resolution maps of the SKA will enable a first step of reasonably accurate foreground subtraction, so that the remaining wedge-like region that must be set to zero will be limited [corresponding to the ‘optimistic model’ of Pober et al. (2014)]. In order to gain some understanding of the separate SKA effects, we also consider a case that we label ‘SKA thermal noise’. In this case, we add the effect of SKA resolution and thermal noise, i.e. the same as ‘mock SKA data’ except without foreground avoidance.

Given the lower sensitivity, for cases with mock SKA effects we use coarser binning, as we do not expect the results to depend on the detailed power spectrum shape that would come out in finer binning. Specifically, we used eight redshift bins and five k bins. The five k-bins are spaced evenly in log scale between k = 0.05 Mpc⁻¹ and k = 1.0 Mpc⁻¹; we average the 21-cm power spectrum at each redshift over the range of k values within each bin. To fix the redshift bins, we imagine placing our simulation box multiple times along the line of sight, so that our comoving box size fixes the redshift range of each bin. For example, we start with z = 27.4, which corresponds to 50 MHz (the limit of the SKA), as the far side of the highest redshift bin. Then the centre of the box is 192 comoving Mpc (half of our 384 Mpc box length) closer to us. The redshift corresponding to the centre is taken as the central redshift of the first bin. The next redshift bin is 384 Mpc closer and so on. As the total comoving distance between z = 27.4 and z = 6 is around 3000 Mpc, we obtain eight redshift bins that naturally correspond to a line-of-sight filled with simulation boxes. We then average the 21-cm power spectrum over the redshift range spanned by each box along the line of sight, by using the simulation outputs which we have at finer resolution in redshift. This averaging is part of the effect of observing a light cone; while there is also an associated anisotropy (Barkana & Loeb 2006; Datta et al. 2012), in this paper we only consider the spherically averaged 21-cm power spectrum.

2.4 Method to generate the data set

We use our own seminumerical simulation code (Visbal et al. 2012; Fialkov & Barkana 2014), which we have named 21 cmSPACE (21-cm Seminumerical Predictions Across Cosmological Epochs), to predict the 21-cm signal for each possible model. The simulation generates realizations of the Universe in a large cosmological volume (384³ comoving Mpc³) with a resolution of 3 comoving Mpc over a wide range of redshifts (6 to 50). The simulation follows the hierarchical structure formation and the evolution of the Ly α, X-ray, Lyman–Werner (LW), and ionizing ultraviolet radiation. The extended Press–Schechter formalism is used to compute the star formation rate in each cell at each redshift (Barkana & Loeb 2004). The 21-cm brightness temperature cubes are output by the simulation and we use them to calculate the 21-cm power spectrum at each redshift. While this seminumerical simulation was inspired by 21cmfast (Mesinger, Furlanetto & Cen 2011), it is an entirely independent implementation with various differences such as more accurate X-ray heating (including the effect of local reionization on the X-ray absorption) and Ly α fluctuations (including the effect of multiple scattering and Ly α heating). Inhomogeneous processes such as the streaming velocity, LW feedback, and photoheating feedback are also included in the code. We created a mock 21-cm signal using the code for a large number of astrophysical models and calculated the 21-cm power spectrum for each parameter combination. Considering first standard astrophysical models (without an excess radio background), we generated the 21-cm power spectrum for 3195 models that cover a wide range of possible values of the seven astrophysical parameters. The ranges of the parameters were f_⋆ = 0.0001 − 0.50, V_C = 4.2 − 100 km s⁻¹, f_X = 0.0001 − 1000, α = 1.0 − 1.5, E_min = 0.1 − 3.0 keV, τ = 0.01 − 0.12, and R_mfp = 10.0 − 70.0 Mpc. The sampling of different parameters is done using randomly selected values over these wide ranges in the seven-parameter space.

As explained above, our analysis involved two more data sets (3195 models each) of 21-cm power spectra, with either full SKA noise or SKA thermal noise only, in order to analyse a more realistic situation. In order to investigate the two scenarios of the excess radio background (where the number of free parameters is increased by one), we use two new data sets of models: 10 158 models with the galactic radio background and 5077 models with the external radio background. Throughout this work, we adopt the ΛCDM cosmology with cosmological parameters from Planck Collaboration VIII (2014).

2.5 Artificial neural network

ANN (often simply called neural networks or NN) are computing systems that mimic in some ways the biological neural networks that constitute the human brain. We briefly summarize their properties. An ANN consists of a collection of artificial neurons. Each artificial neuron has inputs and produces a single output which can be the input of multiple other neurons. In our analysis, we utilize a Multi-layer Perceptron (MLP) which is a deep neural network consisting of fully connected layers. To define the architecture of the neural network, several parameters must be specified, including the number of hidden layers, the number of nodes in each layer, the activation function, the solver, and the maximum number of iterations.

An MLP (Ramchoun et al. 2016) is a supervised learning algorithm that learns to fit a mapping f: X^m → Yⁿ using a training data set, where m is the input dimension and n is the output dimension. When we apply unknown data as a set of input features X = x₁, x₂, x₃,..., x_m, the neural network uses the mapping to infer the target output (Y). This MLP can be used for both classification and regression problems. The advantage of an MLP is that it can learn highly non-linear models. Every neural network has three different types of layers each consisting of a set of nodes or neurons. They are the input layer, hidden layer, and output layer. The input layer consists of a set of neurons that represent the input features X = x₁, x₂, x₃,..., x_m. Each neuron in the input layer is connected to all the neurons in the first hidden layer with some weights and each node in the first hidden layer is connected to all the nodes in the next hidden layer and so on. The output layer receives the values from the last hidden layer and transforms them to the output target value. A specific weight (w_ij) and a bias (b_j) are applied to every input or feature. Both the weight and the bias are initially chosen randomly. For a particular neuron in the i’th hidden layer, if m_i is the input and n_j is the output of that neuron, then n_j = f(∑_iw_ijm_i + b_j), where f is called the activation function. Using linear activation functions would make the entire network linear in the inputs, and thus equivalent to a one layered network. Thus, non-linear activation functions such as a rectified linear unit (ReLu; Hara, Saito & Shouno 2015), sigmoid function (Han & Moraga 1995), etc., are typically used in order to provide the ability to handle complex, non-linear data. The precise nature of the non-linearity depends on the activation function used. We use the sigmoid activation function, f(x) = 1/(1 + e^−x). When the absolute value of the input is large, this function saturates and returns a constant output. The output obtained from the activated neuron is then utilized as the input for the next hidden layer. The primary objective of training an ANN is to determine a set of weights and biases that enable the ANN to generate output vectors that are consistent with the desired output, for a given set of input vectors. A backpropagation algorithm (Rumelhart, Hinton & Williams 1986) is usually used to train an ANN. The training procedure for a network involves several steps:

Initialization: Randomly chosen initial weights and biases are applied to all the nodes or neurons in each layer.
Forward propagation: The output is computed using the neural network based on the initial choices of the weights and biases given the input from the training data set. Since the calculation progresses from the input to the output layer (through the hidden layers), this is known as forward propagation.
Error estimation: An error function (often called a loss function) is used to compute the difference between the predicted and the true (known) output of the model, given the current weights. MLP uses different loss functions based on the problem type. For regression, a common choice is the mean square error.
Backpropagation and updating of the weights: A backpropagation algorithm minimizes the error function and finds the optimal weight values, typically by using the gradient descent technique (Kingma & Ba 2014). The outermost weights get updated first and then the updates propagates towards the input layer, hence the term backpropagation.
Repetition until convergence: In each iteration, the weights get updated by a small amount, so to train a neural network several iterations are required. The number of iterations until convergence depends on the learning rate and the optimization method used in the network.

Once the network has been trained using the training data set, the trained network can make predictions for arbitrary input data that were not a part of the training set. We note that in the various mappings presented below, when we train emulators using simulated training sets, we do this separately for each of our cases of ideal data, mock SKA data, and SKA thermal noise only. In each case the power spectra within the training set are generated from 21-cm images that include the observational effects that correspond to the appropriate case. By applying the observational effects directly to the images, we are working in the spirit of a forward pipeline simulation, which is designed to be as realistic as possible.

2.5.1 Astrophysical parameter predictions

For the purpose of predicting the astrophysical parameters, we adopted a four-layer MLP (input layer + two hidden layers + output layer) from the Scikit-learn library (Pedregosa et al. 2011). Specifically, there are 150 neurons in the first hidden layer and 50 neurons in the second hidden layer. The network was expected to be somewhat complex as we want a mapping between the seven astrophysical parameters of the model and, on the other side, the 21-cm power spectrum for 32 values of the wavenumber in the range 0.05 Mpc⁻¹ <k <1.0 Mpc⁻¹ and for redshifts ranging from 6 to 30. For the purpose of predicting the parameters, the 21-cm power spectra are the input and the astrophysical parameters are the output.

Data pre-processing is an important step before applying machine learning. We have two pre-processing steps in the code: (1) Standardization of the data sets, and (2) Dimensionality reduction of the data sets. If individual features have large differences in their ranges, those with large ranges will dominate over those with small ranges, which leads to biased results. Thus it is important to transform all the features to comparable scales, i.e. perform standardization of the data. Mathematically, to do the standardization, we subtract the mean from each value of each feature and then divide it by the standard deviation. After standardization, all the features are centred around zero and have identical variances. For optimal performance of the learning algorithm, it is best that the individual features are as close as possible to standard normally distributed data (a Gaussian distribution with zero mean and unit variance). For standardization of the features, we use the StandardScaler class from the preprocessing module in the Scikit-learn library (Pedregosa et al. 2011). For dimensionality reduction, we use Principal Component Analysis (PCA; Jolliffe & Cadima 2016) to project our data into a lower dimensional space. In the case of predicting the seven parameter model, the input dimension for a particular model is 25 × 32, which is quite high and makes the learning algorithm too slow. Thus we use PCA to speed up the learning algorithm. Though the reduction in dimensionality results in loss of information, the data sets with the reduced dimension are sufficiently good to encode the most important relationships between the data points. Using PCA with 200 components, we are able to reduce the input dimension by 75 per cent, and capture |$\gt 99~{{\ \rm per\ cent}}$| of the data variance; for the mock SKA and SKA thermal noise cases, the number of components is lower so we use all of them without dimensionality reduction. We transform each data set (both the parameters and the 21-cm power spectra) in log scale (log₁₀) before applying it to the MLP regressor. In our MLP regressor, we set the maximum number of iterations to 10 000. We choose the logistic sigmoid function as the activation function for the hidden layers, an adaptive learning rate initialized with 0.001 and the stochastic gradient-based optimizer (Kingma & Ba 2014) for the weight optimization. No early stopping criteria has been used here. We use 2557 models (which is 80 per cent of our full data set) to train the neural network, and we then apply the trained ANN to a test data set consisting of 639 models (20 per cent of our full data set). Throughout this paper, for simplicity we choose test cases that have non-zero power spectra from intergalactic hydrogen, i.e. that have not fully reionized by redshift 6.

The mapping presented in this subsection, of the power spectrum to the parameters, is included for completeness, and since it is useful whenever a quick but still accurate result of the parameter fitting is needed. However, in the rest of this paper we focus more on Markov Chain Monte Carlo (MCMC) fitting (Hogg, Bovy & Lang 2010), since that allows us to also generate estimates of errors as well as multiparameter error contours; in that application, we make only limited use of the mapping presented in this subsection, as a method to obtain a good first guess of the seven astrophysical parameters given a 21-cm power spectrum.

2.5.2 Emulation of the 21-cm power spectrum

If the statistical description of the 21-cm signal (here the 21-cm power spectrum) is our main focus, then we hope to avoid the need to run a seminumerical simulation for each parameter combination. We can instead construct an emulator that provides rapidly computed output statistics that capture the important information in the signal given a set of astrophysical parameters.

We train a neural network to predict the 21-cm power spectrum based on the seven parameter astrophysical model specified above. This trained network/emulator is used primarily for inference using MCMC (hereafter, the word ‘emulator’ refers to the network that predicts the power spectrum given the seven parameter astrophysical model). As in the case of the ANN to predict the parameters, here also we standardize the features as part of data pre-processing. To reduce the dimension of the power spectrum data, we apply PCA transformation to the data; after experimentation we found that here 20 PCA components suffice to capture |$\gt 98~{{\ \rm per\ cent}}$| of the data variance. As before, we again use a log scale for both the data set of the parameters and the 21-cm power spectra. Next we need to find the appropriate neural network architecture to construct the emulator. For this, we choose some specified hyperparameters for our MLP estimator and search among all possible combinations to find the best one to use in our MLP regressor. To emulate the 21-cm power spectrum, we employ a three layer MLP from the Scikit-learn library (Pedregosa et al. 2011) with 134 neurons in each layer. Thus, the full network architecture consists of five layers in total: input layer, three hidden layers, and output layer. We use the logistic sigmoid function (Han & Moraga 1995) as the activation function for the hidden layer and limited-memory Broyden–Fletcher–Goldfarb–Shanno (BFGS) optimizer (Byrd et al. 1995) for the weight optimization. As before, we fix the maximum number of iterations to 10 000. After emulating the 21-cm power spectrum, we need to inverse PCA transform the predicted data to get back the power spectrum in original dimensions.

2.6 Posterior distribution of the astrophysical model

Given the 21-cm power spectrum, we can predict the seven astrophysical parameters that describe the EoR and the cosmic dawn using the NN trained on the data set consisting of the seven astrophysical parameters and the 21-cm power spectrum P(k) over the wide range of redshifts of interest. This parameter estimation is very computationally fast. However, estimating uncertainties on the predicted parameters using a traditional neural network, as employed in this study, is not straightforward. One solution is to use a Bayesian neural network that incorporates uncertainties while making predictions on output parameters (Hortúa, Volpi & Malagò 2020). Leaving this for future investigation, here we take a hybrid approach. In order to estimate the astrophysical parameters as well as their uncertainties (including complete information on covariances), our approach is as follows. First we use the NN to predict the parameters given the 21-cm power spectrum. Then we employ our power spectrum emulator to an MCMC sampler, using the predicted parameter values as the initial guesses; the emulator allows us to avoid a direct dependence on the seminumerical simulation, thus making the MCMC process computationally fast. The final output of the sampler is the posterior probability distribution for the parameters. We report the median of each of the marginalized distributions of the parameters as the predicted value and calculate the uncertainty bound based on quantiles. This is discussed in further detail in the results section, below.

In order to estimate the uncertainties in predicting the parameters we follow a Bayesian analysis for finding the posterior probability distribution of the parameters. We use MCMC methods for sampling the probability distribution functions or probability density functions (pdfs).

The posterior pdf for the parameters θ given the data D, p(θ|D), is, in general, the likelihood p(D|θ) (i.e. the pdf for the data D given the parameters θ) times the prior pdf p(θ) for the parameters, divided by the probability of the data p(D):

$$\begin{eqnarray} p(\theta |D) = \frac{p(D|\theta)p(\theta)}{p(D)}\ , \end{eqnarray}$$

(10)

where the denominator p(D) can be thought of as a normalization factor that makes the posterior distribution function integrate to unity. If we assume that the noise is independent between data points, then the likelihood function is the product of the conditional probabilities

$$\begin{eqnarray} \mathcal {L} = \prod _{n=1}^N p(D_n|\theta)\ . \end{eqnarray}$$

(11)

Taking the logarithm,

$$\begin{eqnarray} \ln \mathcal {L} = -\frac{1}{2}\sum _{n=1}^N \left[\frac{[D_n-D_{\rm {n, model}} (\theta)]^2}{s_n^2} + \ln (2\pi s_n^2)\right]\ , \end{eqnarray}$$

(12)

where we set

$$\begin{eqnarray} s_n^2 = \sigma _n^2 + D_{\rm {n, model}}^2f^2 \ . \end{eqnarray}$$

(13)

The likelihood function here is assumed to be a Gaussian, where the variance is modelled as is common for the MCMC procedure, as a sum of a constant plus a multiple of the predicted data (i.e. as a combination of an absolute error and a relative error). Here f is a free parameter that gives the MCMC procedure some flexibility, so we include it effectively as an additional model parameter. We apply the procedure to obtain the posterior distribution for all the parameters (seven astrophysical parameters and f) and then marginalize over the extra parameter (f) to obtain the properly marginalized posterior distribution for the seven astrophysical parameters (Hogg et al. 2010). Here the index n denotes various z-bins and k-bins, where the data D_n is the mock observation of the 21-cm power spectrum and D_{n, model} is the predicted 21-cm power spectrum from the emulator. In this work we adopt an effective constant error of:

$$\begin{eqnarray} \sigma _n = 0.25\ {\rm mK}^2\ . \end{eqnarray}$$

(14)

This ensures that the algorithm does not try to achieve a low relative error when the fluctuation itself is low (below ∼0.5 mK) and likely more susceptible to systematic errors in realistic data. What we have described here is a typical setup for MCMC. We emphasize that regardless of our detailed assumptions, in the end we have test models that allow us to independently test the reliability of the uncertainty estimates, as described further in the results section below.

When we use one of the data sets with observational noise and other SKA effects, we modify equation (13) as follows:

$$\begin{eqnarray} s_n^2 = \sigma _n^2 + \sigma _{\rm {var},n}^2 + D_{\rm {n, model}}^2f^2 \ , \end{eqnarray}$$

(15)

where |$\sigma _{\rm {var},n}^2$| is the variance (for each bin of z and k) of the SKA noise power spectrum, found separately for the mock SKA and SKA thermal noise cases. We found these expected variances by randomly generating observational (i.e. signal-free) data. We did not include co-variances among different bins since we found that the correlation coefficients are much smaller than unity (no more than a few percent), indeed so small that their values did not converge even with tens of thousands of samples.

We use the emcee sampler (Foreman-Mackey et al. 2013) which is the affine-invariant ensemble sampler for MCMC (Goodman & Weare 2010). The MCMC sampler only computes the likelihood when the parameters are within the prior bounds. We set the prior bounds for the parameters according to Table 1 and we use flat priors for the parameter values (in log except for α and E_min).

Table 1.

Open in new tab

The prior bounds for the astrophysical parameters.

Parameters	Lower bound	Upper bound
f_⋆	0.0001	0.50
V_C (km s⁻¹)	4.2	100
f_X	0.0001	1000
α	0.9	1.6
E_min (keV)	0.09	3.1
τ	0.01	0.14
R_mfp (Mpc)	9	74

Table 1.

Open in new tab

The prior bounds for the astrophysical parameters.

Parameters	Lower bound	Upper bound
f_⋆	0.0001	0.50
V_C (km s⁻¹)	4.2	100
f_X	0.0001	1000
α	0.9	1.6
E_min (keV)	0.09	3.1
τ	0.01	0.14
R_mfp (Mpc)	9	74

3 RESULTS

3.1 Performance analysis of the emulator

We show the performance of the emulator of the 21-cm power spectrum in Fig. 1. We compare the emulated power spectrum and the true power spectrum from the seminumerical simulation for two particular k values. The left panel shows a few random examples of the emulated power spectrum (solid lines) and the true power spectrum (dashed lines). The different colours denote different models. In this figure, we see that the accuracy of the emulator is generally good and tends to improve with the height of the power spectrum, although there is some random variation among different models. A more representative, statistical analysis of the accuracy is shown further below.

A few random examples of the emulated power spectrum without SKA noise (left panel) and with SKA noise (right panel) at k = 0.11 Mpc−1 (upper panel) and k ≈1.0 Mpc−1 (lower panel); note that the k-bin values and widths are different in the SKA case, as explained in the text. The dashed line is the true power spectrum from the simulation and the solid line is the emulated power spectrum (for combinations of astrophysical parameters that were not included in the training set). Different colours show different models.

Figure 1.

A few random examples of the emulated power spectrum without SKA noise (left panel) and with SKA noise (right panel) at k = 0.11 Mpc⁻¹ (upper panel) and k ≈1.0 Mpc⁻¹ (lower panel); note that the k-bin values and widths are different in the SKA case, as explained in the text. The dashed line is the true power spectrum from the simulation and the solid line is the emulated power spectrum (for combinations of astrophysical parameters that were not included in the training set). Different colours show different models.

Open in new tab Download slide

The right panel of Fig. 1 shows a few random examples of the comparison between the power spectrum emulated by the emulator trained using mock SKA data set and the true power spectrum from the mock SKA data set. The different colours denote different astrophysical models. Again, the emulation is seen to be reasonably accurate, although in some cases the emulated 21-cm power spectrum significantly deviates from the actual one at low redshift and/or when the power spectrum is low. The variations intrinsic to the different models in the power spectra (left panels in Fig. 1) are heavily suppressed once we include the expected observational effects of the SKA experiment into the power spectra (right panels in Fig. 1). In particular, the thermal noise dominates at high redshift. However, as we find from the results below, when we fit the power spectrum with SKA noise there is still significant information in the data that allows the fitting procedure to reconstruct the input parameters. An advantage of machine learning is that the algorithm learns directly how to best deal with noisy data, and there is no need to try to explicitly model or fit the observational effects.

3.2 Dependence of the emulation error on the redshift and wavenumber

For a more detailed assessment of the emulator, we calculate how the error varies with redshift and wavenumber. For this we use a test data set of 639 models (which is 20 per cent of our full data set) for each of the cases: ideal data set, mock SKA data, and SKA thermal noise case. We first directly test the emulator by comparing the predicted power spectrum (feeding into the emulator the known true parameters) to the true simulated power spectrum (as in the previous subsection, but here divided separately into k and z bins). In addition, we test the complete framework by finding the best-fitting astrophysical parameters to mock data using the MCMC sampler; feeding the best-fitting parameters to the emulator of the power spectrum; and finding the error of this best-fitting predicted power spectrum compared to the true simulated power spectrum. In cases with SKA noise (mock SKA data set and SKA thermal noise case), we are not interested in finding the error in the predicted power spectrum with SKA noise (as the power spectrum is often dominated by noise, especially at high redshifts); instead, we make the more challenging comparison of the best-fit predicted power spectrum to the true power spectrum, both in their ‘clean’ versions (i.e. without SKA noise). To be clear, this means taking the reconstructed best-fitting astrophysical parameters (which were reconstructed from the mock SKA data set, based on the NN trained using mock SKA power spectrum), and using it as input to the NN trained using power spectra from the ideal data set (i.e. without SKA noise). Here we use the following definition to quantify the error as a function of redshift and wavenumber:

$$\begin{eqnarray} {\rm Error}(z, k) = \rm {Median\left|\frac{\Delta ^2_{predicted\_clean}-\Delta ^2_{true\_clean}}{\Delta ^2_{true\_clean} + 0.25 \ \rm {mK^2}}\right|} \ \ , \end{eqnarray}$$

(16)

where we take the median over the test models; in this paper we often take the median in order to measure the typical error and reduce the sensitivity to outliers. This definition of the error measures the absolute value of the relative error, except that the denominator includes a constant in order not to demand a low relative error when the fluctuation itself is low [in agreement with equation (14)]. Note that here the errors are much larger than before because they are not normalized to the maximum value of the power spectrum but are measured separately at each bin, including when the power spectrum is low.

In Fig. 2, we show how the error varies with wavenumber (top panels) and redshift (bottom panels), for both the ideal and SKA cases (mock SKA data set and SKA thermal noise case). For the direct emulation case (left-most panels, where we emulate the power spectrum using the true parameters from the test data set), the relative error decreases with wavenumber up to k ∼ 0.1 − 0.2 Mpc⁻¹, then plateaus, and again increases above k ∼ 0.6 Mpc⁻¹. The redshift dependence shows a less regular pattern, except that the errors tend to increase both at the low-redshift and high-redshift end. Overall, the typical emulation error of the power spectrum in each bin is |$10-20~{{\ \rm per\ cent}}$| over a broad range of k and z, but it rises above 20 per cent at the lowest and highest k values (for most redshifts), and at the lowest redshift for all k values (i.e. at z = 6, near the end of reionization, when the power spectrum is highly variable and is sensitive to small changes in the parameters). For some perspective, we note that a 20 per cent error is typically adopted to represent the systematic theoretical modelling error in the 21-cm power spectrum (e.g. Greig & Mesinger 2015, 2017). In the panels in the second column from the left, we use the best-fitting parameters derived from the network trained using ideal data set to emulate the power spectrum. From the comparison to the left-most panels, we see that the fitting of the astrophysical parameters (in this ideal case) is nearly perfect, in that the error that it adds is small compared to the error of the emulator itself. In the panels in the third column, the best-fitting parameters are derived from the network trained using mock SKA data set, but as noted above, the errors are calculated for the ability to predict the real power spectrum, i.e. by comparing the true power spectrum to the prediction of the emulator that was trained using the power spectrum without SKA noise. SKA noise reduces the accuracy of the reconstruction of the astrophysical parameters but not by too much, increasing the typical errors by a fairly uniform factor of ∼1.5, to |$15-30~{{\ \rm per\ cent}}$| for most values of k and z. For the panels in the last column, we use the best-fitting parameters derived from the network trained using the power spectrum of SKA thermal noise case. The errors are nearly identical to the full SKA noise panels, showing that the foreground effects do not add substantial error beyond the angular resolution plus thermal noise, at least for the optimistic foreground avoidance model that we have assumed.

Figure 2.

Redshift and wavenumber dependence of the relative error in emulating the best-fitting power spectrum. The upper panels shows the dependence on wavenumber (for fixed redshift) and the lower panels depict the redshift dependence (for fixed wavenumber). For the left-most panels, we emulate the power spectrum using the true parameters from the test data set. For the panels in the second column from the left, we emulate the power spectrum using the best-fitting parameters derived from the network of ideal data set. For the panels in the third column from the left, we use the best-fitting parameters derived from the network of mock SKA data, but for the error we measure the prediction of the real power spectrum, i.e. we apply the emulator trained without SKA noise. For the panels in the right-most column, we use the best-fitting parameters derived from the network of SKA thermal noise case and otherwise do the same as for the third column. Note that the plots in this figure show all 25 z values and 32 k values.

Open in new tab Download slide

In order to get a better understanding of the span of the models over k and z, we show in Fig. 3 characteristic quantities that enter into the above calculation of the relative errors. In the left column, we show the median of the clean power spectrum (without any noise) as a function of the wavenumber (upper panel) and redshift (lower panel). In the other columns, the median of the absolute difference between the true and predicted clean power spectra is shown as a function of wavenumber (upper panels) and redshift (lower panels). For the panels in the middle column, the best-fitting parameters are derived from the network without any noise (i.e. ideal data set), whereas for the panels in the right column we use the best-fitting parameters derived from the network trained using mock SKA data set to emulate the clean power spectrum (without noise). This figure shows that the 21-cm power spectrum varies greatly as a function of k and z, even when we take out the model-to-model variation by showing the median of the 639 random test cases. The variation is by three and a half orders of magnitude; even if we ignore the parameter space in which the fluctuation level is lower than 0.5 mK (see equation 14), we are left with a range of more than two orders of magnitude. For the considered ranges, the overall variation with redshift at a given wavelength is much greater than the variation with wavelength at a given redshift. Over this large range, the relative error in each case (with or without SKA noise) remains relatively constant; this is seen by the panels in Fig. 3 that show the relative error, which overall follows a similar pattern (with z and k) as the power spectrum except with a compressed range of values.

$Left column: Median of the true (clean) power spectrum (ideal data set), $\Delta ^2_{\rm {true\_clean}}$, as a function of wavenumber (upper panel) and redshift (lower panel). Other columns: The median of the absolute value of the difference between the true and predicted clean power spectrum. For the panels in the middle column, we emulate the power spectrum using the best-fitting parameters derived from the network trained using ideal data set. For the panels in the right column, the best-fitting parameters are derived from the network trained using mock SKA data set, but the error is measured by emulating the clean power spectrum. As in Fig. 2, the plots in this figure show all 25 z values and 32 k values.$

Figure 3.

Left column: Median of the true (clean) power spectrum (ideal data set), |$\Delta ^2_{\rm {true\_clean}}$|⁠, as a function of wavenumber (upper panel) and redshift (lower panel). Other columns: The median of the absolute value of the difference between the true and predicted clean power spectrum. For the panels in the middle column, we emulate the power spectrum using the best-fitting parameters derived from the network trained using ideal data set. For the panels in the right column, the best-fitting parameters are derived from the network trained using mock SKA data set, but the error is measured by emulating the clean power spectrum. As in Fig. 2, the plots in this figure show all 25 z values and 32 k values.

Open in new tab Download slide

3.3 Errors in the fitted astrophysical parameters

Up to now, we have examined the errors in emulating or reconstructing the 21-cm power spectrum. Of greater interest is, of course, the ability to extract astrophysical information from a given power spectrum. In addition to the unavoidable effect on the fitting of the emulation uncertainty, there are also the SKA observational effects.

In order to account for the emulation uncertainty, we employed a method in the spirit of k-fold cross-validation, where the training data set is randomly divided into k portions, and each model is trained on various k − 1 portions (we emphasize that this is completely separate from the actual k-fold cross-validation that we perform in the next subsection). We added the k MCMC chains from all the runs and used the resulting combined chain to get parameter uncertainties and error contours. We used k = 40 for the algorithm described in this subsection. The posterior probability for one astrophysical model is shown in Fig. 4, for the ideal data set (A1 shows the same example for the case of the mock SKA data). The figure shows the posterior distribution of the seven parameter astrophysical model for an example power spectrum from the ideal data set. In this figure the grey dashed lines denote the true parameters from the simulation. As the posteriors are not perfectly Gaussian, we report the median of the distribution (black dashed lines) as the predicted parameter value from the MCMC sampler. The results are rather insensitive to α, and so its value is not well determined, and thus extends to the limits of the prior assumed range, especially in the noisy cases (where the errors are larger). In the case of R_mfp, the upper limit runs into the edge of the assumed range, in the the mock SKA case. As noted in section 2.1.1, we assumed a reasonable upper limit to R_mfp based on observational and theoretical constraints; in the future, we plan to explore more realistic modelling of the end stages of reionization.

$The posterior distribution of the seven parameter astrophysical model, showing the pairwise covariances (off-diagonal entries) and their marginalized distribution (diagonal entries) across each model parameter. The black dashed lines denote the median of the distribution. The grey dashed lines denote the true parameters from the simulation. In the 2D contours, there are only two levels of probability. The fraction of probability contained in each contour: 0.68 and 0.95 (dark through light blue). All the parameters are in log10 except α and Emin. The upper right panel shows the true power spectrum (dashed) versus the reconstructed best-fitting one (solid), at one wavenumber. This figure shows a case that uses the power spectrum from the ideal data set.$

Figure 4.

The posterior distribution of the seven parameter astrophysical model, showing the pairwise covariances (off-diagonal entries) and their marginalized distribution (diagonal entries) across each model parameter. The black dashed lines denote the median of the distribution. The grey dashed lines denote the true parameters from the simulation. In the 2D contours, there are only two levels of probability. The fraction of probability contained in each contour: 0.68 and 0.95 (dark through light blue). All the parameters are in log₁₀ except α and E_min. The upper right panel shows the true power spectrum (dashed) versus the reconstructed best-fitting one (solid), at one wavenumber. This figure shows a case that uses the power spectrum from the ideal data set.

Open in new tab Download slide

More generally, while this type of result for individual models is interesting, we prefer to look at properties that more generally characterize the broad range of possible astrophysical models. In order to understand the general trends, we consider below the overall statistics of the fitting as calculated for a large number of models.

3.4 k-fold cross-validation and statistical analysis of the astrophysical parameter errors

In order to test the overall performance in predicting each of the parameters, we use our test data set of 639 models, which is 20 per cent of our full data set. We calculate in each case the 1-σ MCMC uncertainty. In order to test whether this uncertainty is a realistic error estimate, we calculate the normalized error in predicting a parameter (⁠|$\rm {P_{predicted}}$|⁠) compared to the true value (⁠|$\rm {P_{true}}$|⁠), which we term the |$z\rm {-score}$|⁠:

$$\begin{eqnarray} z\rm {-score}\ = \frac{P_{\rm {true}}-P_{\rm {predicted}}}{\sigma }\ . \end{eqnarray}$$

(17)

If σ is an accurate estimate then the actual values of this normalized error (i.e. |$z\rm {-score}$|⁠) for the test data set should have a standard deviation of unity. All these quantities, namely |$\rm {P_{true}}$|⁠, |$\rm {P_{predicted}}$|⁠, and σ, are measured in log space (log₁₀) for all the parameters except for α and E_min.

To test and improve the error estimation using our procedure, we perform k-fold cross-validation. In k-fold cross-validation, the training sample is partitioned into subsets, reserving one subset for testing while training the emulator on the remaining data. We then repeat this process k times, each time using a different subset as the test data. Comparing the different cases gives validation (if the results do not vary too much), while taking the mean of various statistics among the k-folds gives better estimates. Here we choose to work with k = 5 so that each of our validation sets consists of a test data set equal to 20 per cent of the whole data set. We note that the PCA reduction (as described in Section 2) is performed on the training data for each of the k-folds.

We show the combined histogram of the z-score from all 5-folds in predicting each of the parameters in Fig. 5 (for the three most important parameters of high-redshift galaxies) and Fig. B1 (for the four other parameters, shown in the Appendix). In these figures, the left panels are for the ideal data, the middle panels are for the case with mock SKA data, and the right panels are for the SKA thermal noise case. The black solid line in each panel shows the best-fitting Gaussian of the histogram, also listing its mean (μ) and standard deviation (σ) within the panel. The two grey dashed lines in each panel represent the 3σ boundary of the respective Gaussian. Table 2 (along with Table B1) lists the corresponding parameters of best-fitting Gaussians for each of the 5-folds. The σ values are fairly consistent among the different folds, while there is more variation in some cases in the bias μ. The best-fitting values to the combined distributions (shown in the Figures) agree rather closely with the mean values of the Gaussian parameters (shown in the Tables).

$Histogram of the normalized errors/$z\rm {-scores}$ for the parameters: f⋆, VC and fX as defined in equation (17). The distribution is shown over all the 5-folds. Also shown in each panel is the best-fitting Gaussian, with its parameters listed within the panel. The parameter values here are in log10.$

Figure 5.

Histogram of the normalized errors/|$z\rm {-scores}$| for the parameters: f_⋆, V_C and f_X as defined in equation (17). The distribution is shown over all the 5-folds. Also shown in each panel is the best-fitting Gaussian, with its parameters listed within the panel. The parameter values here are in log₁₀.

Open in new tab Download slide

Table 2.

Open in new tab

Standard deviation (σ) and mean (μ) of the best-fitting Gaussian of the respective histogram of f_⋆, V_C, and f_X for 5-fold cross-validation. We also calculate the mean value of each of the best-fitting parameter for the Gaussian over all the 5-folds. The three parameters here are in log₁₀. Here we show the results for all three cases: ideal data set (top), mock SKA data (middle), and SKA thermal noise (bottom).

Parameters	f_⋆		V_C		f_X
Gaussian fit	σ	μ	σ	μ	σ	μ
Ideal data set
Fold 1	0.78	−0.00	0.81	+0.07	1.01	−0.25
Fold 2	0.85	+0.15	0.89	+0.42	1.05	−0.41
Fold 3	0.69	+0.08	0.75	+0.28	1.09	−0.32
Fold 4	0.60	−0.24	0.66	−0.25	1.06	−0.24
Fold 5	0.55	−0.14	0.67	−0.14	1.07	−0.38
Mean	0.69	−0.03	0.76	+0.08	1.06	−0.32
Mock SKA data set
Fold 1	0.80	−0.09	0.83	−0.44	1.37	−1.27
Fold 2	0.74	−0.11	0.72	−0.19	1.44	−1.47
Fold 3	0.73	−0.23	0.72	−0.12	1.48	−1.23
Fold 4	0.85	−0.10	0.87	−0.88	1.48	−1.61
Fold 5	0.73	−0.04	0.86	−0.85	1.48	−1.65
Mean	0.77	−0.11	0.80	−0.50	1.45	−1.45
SKA thermal noise case
Fold 1	0.79	−0.11	0.85	−0.45	1.44	−1.25
Fold 2	0.76	−0.16	0.71	−0.20	1.44	−1.45
Fold 3	0.70	−0.22	0.73	−0.12	1.45	−1.24
Fold 4	0.87	−0.11	0.82	−0.83	1.47	−1.56
Fold 5	0.75	−0.06	0.85	−0.84	1.47	−1.60
Mean	0.77	−0.13	0.79	−0.49	1.45	−1.42

Parameters	f_⋆		V_C		f_X
Gaussian fit	σ	μ	σ	μ	σ	μ
Ideal data set
Fold 1	0.78	−0.00	0.81	+0.07	1.01	−0.25
Fold 2	0.85	+0.15	0.89	+0.42	1.05	−0.41
Fold 3	0.69	+0.08	0.75	+0.28	1.09	−0.32
Fold 4	0.60	−0.24	0.66	−0.25	1.06	−0.24
Fold 5	0.55	−0.14	0.67	−0.14	1.07	−0.38
Mean	0.69	−0.03	0.76	+0.08	1.06	−0.32
Mock SKA data set
Fold 1	0.80	−0.09	0.83	−0.44	1.37	−1.27
Fold 2	0.74	−0.11	0.72	−0.19	1.44	−1.47
Fold 3	0.73	−0.23	0.72	−0.12	1.48	−1.23
Fold 4	0.85	−0.10	0.87	−0.88	1.48	−1.61
Fold 5	0.73	−0.04	0.86	−0.85	1.48	−1.65
Mean	0.77	−0.11	0.80	−0.50	1.45	−1.45
SKA thermal noise case
Fold 1	0.79	−0.11	0.85	−0.45	1.44	−1.25
Fold 2	0.76	−0.16	0.71	−0.20	1.44	−1.45
Fold 3	0.70	−0.22	0.73	−0.12	1.45	−1.24
Fold 4	0.87	−0.11	0.82	−0.83	1.47	−1.56
Fold 5	0.75	−0.06	0.85	−0.84	1.47	−1.60
Mean	0.77	−0.13	0.79	−0.49	1.45	−1.42

Table 2.

Open in new tab

Parameters	f_⋆		V_C		f_X
Gaussian fit	σ	μ	σ	μ	σ	μ
Ideal data set
Fold 1	0.78	−0.00	0.81	+0.07	1.01	−0.25
Fold 2	0.85	+0.15	0.89	+0.42	1.05	−0.41
Fold 3	0.69	+0.08	0.75	+0.28	1.09	−0.32
Fold 4	0.60	−0.24	0.66	−0.25	1.06	−0.24
Fold 5	0.55	−0.14	0.67	−0.14	1.07	−0.38
Mean	0.69	−0.03	0.76	+0.08	1.06	−0.32
Mock SKA data set
Fold 1	0.80	−0.09	0.83	−0.44	1.37	−1.27
Fold 2	0.74	−0.11	0.72	−0.19	1.44	−1.47
Fold 3	0.73	−0.23	0.72	−0.12	1.48	−1.23
Fold 4	0.85	−0.10	0.87	−0.88	1.48	−1.61
Fold 5	0.73	−0.04	0.86	−0.85	1.48	−1.65
Mean	0.77	−0.11	0.80	−0.50	1.45	−1.45
SKA thermal noise case
Fold 1	0.79	−0.11	0.85	−0.45	1.44	−1.25
Fold 2	0.76	−0.16	0.71	−0.20	1.44	−1.45
Fold 3	0.70	−0.22	0.73	−0.12	1.45	−1.24
Fold 4	0.87	−0.11	0.82	−0.83	1.47	−1.56
Fold 5	0.75	−0.06	0.85	−0.84	1.47	−1.60
Mean	0.77	−0.13	0.79	−0.49	1.45	−1.42

Parameters	f_⋆		V_C		f_X
Gaussian fit	σ	μ	σ	μ	σ	μ
Ideal data set
Fold 1	0.78	−0.00	0.81	+0.07	1.01	−0.25
Fold 2	0.85	+0.15	0.89	+0.42	1.05	−0.41
Fold 3	0.69	+0.08	0.75	+0.28	1.09	−0.32
Fold 4	0.60	−0.24	0.66	−0.25	1.06	−0.24
Fold 5	0.55	−0.14	0.67	−0.14	1.07	−0.38
Mean	0.69	−0.03	0.76	+0.08	1.06	−0.32
Mock SKA data set
Fold 1	0.80	−0.09	0.83	−0.44	1.37	−1.27
Fold 2	0.74	−0.11	0.72	−0.19	1.44	−1.47
Fold 3	0.73	−0.23	0.72	−0.12	1.48	−1.23
Fold 4	0.85	−0.10	0.87	−0.88	1.48	−1.61
Fold 5	0.73	−0.04	0.86	−0.85	1.48	−1.65
Mean	0.77	−0.11	0.80	−0.50	1.45	−1.45
SKA thermal noise case
Fold 1	0.79	−0.11	0.85	−0.45	1.44	−1.25
Fold 2	0.76	−0.16	0.71	−0.20	1.44	−1.45
Fold 3	0.70	−0.22	0.73	−0.12	1.45	−1.24
Fold 4	0.87	−0.11	0.82	−0.83	1.47	−1.56
Fold 5	0.75	−0.06	0.85	−0.84	1.47	−1.60
Mean	0.77	−0.13	0.79	−0.49	1.45	−1.42

Table 3.

Open in new tab

The median (over 639 test models) of the actual error (|P_true − P_predicted|) for each parameter for 5-fold cross-validation. As before, all the parameter values are in log₁₀ except α and E_min. Here we show the results for all three cases: ideal data set (top), mock SKA data (middle), and SKA thermal noise (bottom).

Parameters	Fold 1	Fold 2	Fold 3	Fold 4	Fold 5	Mean
Ideal data set
f_⋆	0.0199	0.0273	0.0226	0.0126	0.0114	0.0188
V_C (km s⁻¹)	0.0080	0.0067	0.0061	0.0145	0.0138	0.0098
f_X	0.1924	0.3458	0.3106	0.1912	0.2149	0.2510
α	0.1928	0.1936	0.1902	0.1847	0.1921	0.1907
E_min (keV)	0.1257	0.3631	0.2955	0.1048	0.1312	0.2041
τ	0.0036	0.0045	0.0043	0.0034	0.0034	0.0038
R_mfp (Mpc)	0.0575	0.0613	0.0643	0.0731	0.0741	0.0661
Mock SKA data set
f_⋆	0.1355	0.0691	0.0764	0.1894	0.1661	0.1273
V_C (km s⁻¹)	0.0605	0.0192	0.0209	0.1752	0.1921	0.0936
f_X	0.8439	0.9416	0.7955	1.1552	1.1771	0.9827
α	0.2210	0.2167	0.2222	0.2242	0.2292	0.2227
E_min (keV)	0.7906	0.9044	0.8302	1.0050	1.0200	0.9100
τ	0.0108	0.0076	0.0081	0.0158	0.0159	0.0117
R_mfp (Mpc)	0.1030	0.0750	0.0729	0.1151	0.1121	0.0956
SKA thermal noise case
f_⋆	0.1266	0.0687	0.0768	0.1931	0.1528	0.1236
V_C (km s⁻¹)	0.0572	0.0187	0.0211	0.1725	0.1828	0.0905
f_X	0.7845	0.9113	0.7805	1.0742	1.1113	0.9323
α	0.2220	0.2179	0.2181	0.2235	0.2276	0.2218
E_min (keV)	0.7572	0.8882	0.8236	0.9760	1.0110	0.8912
τ	0.0107	0.0078	0.0080	0.0150	0.0151	0.0113
R_mfp (Mpc)	0.1011	0.0751	0.0736	0.1118	0.1104	0.0944

Parameters	Fold 1	Fold 2	Fold 3	Fold 4	Fold 5	Mean
Ideal data set
f_⋆	0.0199	0.0273	0.0226	0.0126	0.0114	0.0188
V_C (km s⁻¹)	0.0080	0.0067	0.0061	0.0145	0.0138	0.0098
f_X	0.1924	0.3458	0.3106	0.1912	0.2149	0.2510
α	0.1928	0.1936	0.1902	0.1847	0.1921	0.1907
E_min (keV)	0.1257	0.3631	0.2955	0.1048	0.1312	0.2041
τ	0.0036	0.0045	0.0043	0.0034	0.0034	0.0038
R_mfp (Mpc)	0.0575	0.0613	0.0643	0.0731	0.0741	0.0661
Mock SKA data set
f_⋆	0.1355	0.0691	0.0764	0.1894	0.1661	0.1273
V_C (km s⁻¹)	0.0605	0.0192	0.0209	0.1752	0.1921	0.0936
f_X	0.8439	0.9416	0.7955	1.1552	1.1771	0.9827
α	0.2210	0.2167	0.2222	0.2242	0.2292	0.2227
E_min (keV)	0.7906	0.9044	0.8302	1.0050	1.0200	0.9100
τ	0.0108	0.0076	0.0081	0.0158	0.0159	0.0117
R_mfp (Mpc)	0.1030	0.0750	0.0729	0.1151	0.1121	0.0956
SKA thermal noise case
f_⋆	0.1266	0.0687	0.0768	0.1931	0.1528	0.1236
V_C (km s⁻¹)	0.0572	0.0187	0.0211	0.1725	0.1828	0.0905
f_X	0.7845	0.9113	0.7805	1.0742	1.1113	0.9323
α	0.2220	0.2179	0.2181	0.2235	0.2276	0.2218
E_min (keV)	0.7572	0.8882	0.8236	0.9760	1.0110	0.8912
τ	0.0107	0.0078	0.0080	0.0150	0.0151	0.0113
R_mfp (Mpc)	0.1011	0.0751	0.0736	0.1118	0.1104	0.0944

Table 3.

Open in new tab

Parameters	Fold 1	Fold 2	Fold 3	Fold 4	Fold 5	Mean
Ideal data set
f_⋆	0.0199	0.0273	0.0226	0.0126	0.0114	0.0188
V_C (km s⁻¹)	0.0080	0.0067	0.0061	0.0145	0.0138	0.0098
f_X	0.1924	0.3458	0.3106	0.1912	0.2149	0.2510
α	0.1928	0.1936	0.1902	0.1847	0.1921	0.1907
E_min (keV)	0.1257	0.3631	0.2955	0.1048	0.1312	0.2041
τ	0.0036	0.0045	0.0043	0.0034	0.0034	0.0038
R_mfp (Mpc)	0.0575	0.0613	0.0643	0.0731	0.0741	0.0661
Mock SKA data set
f_⋆	0.1355	0.0691	0.0764	0.1894	0.1661	0.1273
V_C (km s⁻¹)	0.0605	0.0192	0.0209	0.1752	0.1921	0.0936
f_X	0.8439	0.9416	0.7955	1.1552	1.1771	0.9827
α	0.2210	0.2167	0.2222	0.2242	0.2292	0.2227
E_min (keV)	0.7906	0.9044	0.8302	1.0050	1.0200	0.9100
τ	0.0108	0.0076	0.0081	0.0158	0.0159	0.0117
R_mfp (Mpc)	0.1030	0.0750	0.0729	0.1151	0.1121	0.0956
SKA thermal noise case
f_⋆	0.1266	0.0687	0.0768	0.1931	0.1528	0.1236
V_C (km s⁻¹)	0.0572	0.0187	0.0211	0.1725	0.1828	0.0905
f_X	0.7845	0.9113	0.7805	1.0742	1.1113	0.9323
α	0.2220	0.2179	0.2181	0.2235	0.2276	0.2218
E_min (keV)	0.7572	0.8882	0.8236	0.9760	1.0110	0.8912
τ	0.0107	0.0078	0.0080	0.0150	0.0151	0.0113
R_mfp (Mpc)	0.1011	0.0751	0.0736	0.1118	0.1104	0.0944

Parameters	Fold 1	Fold 2	Fold 3	Fold 4	Fold 5	Mean
Ideal data set
f_⋆	0.0199	0.0273	0.0226	0.0126	0.0114	0.0188
V_C (km s⁻¹)	0.0080	0.0067	0.0061	0.0145	0.0138	0.0098
f_X	0.1924	0.3458	0.3106	0.1912	0.2149	0.2510
α	0.1928	0.1936	0.1902	0.1847	0.1921	0.1907
E_min (keV)	0.1257	0.3631	0.2955	0.1048	0.1312	0.2041
τ	0.0036	0.0045	0.0043	0.0034	0.0034	0.0038
R_mfp (Mpc)	0.0575	0.0613	0.0643	0.0731	0.0741	0.0661
Mock SKA data set
f_⋆	0.1355	0.0691	0.0764	0.1894	0.1661	0.1273
V_C (km s⁻¹)	0.0605	0.0192	0.0209	0.1752	0.1921	0.0936
f_X	0.8439	0.9416	0.7955	1.1552	1.1771	0.9827
α	0.2210	0.2167	0.2222	0.2242	0.2292	0.2227
E_min (keV)	0.7906	0.9044	0.8302	1.0050	1.0200	0.9100
τ	0.0108	0.0076	0.0081	0.0158	0.0159	0.0117
R_mfp (Mpc)	0.1030	0.0750	0.0729	0.1151	0.1121	0.0956
SKA thermal noise case
f_⋆	0.1266	0.0687	0.0768	0.1931	0.1528	0.1236
V_C (km s⁻¹)	0.0572	0.0187	0.0211	0.1725	0.1828	0.0905
f_X	0.7845	0.9113	0.7805	1.0742	1.1113	0.9323
α	0.2220	0.2179	0.2181	0.2235	0.2276	0.2218
E_min (keV)	0.7572	0.8882	0.8236	0.9760	1.0110	0.8912
τ	0.0107	0.0078	0.0080	0.0150	0.0151	0.0113
R_mfp (Mpc)	0.1011	0.0751	0.0736	0.1118	0.1104	0.0944

Table 4.

Open in new tab

The mean and median of the error (using equation 18) in the fitting procedure. External emulator: The emulator trained using the 21-cm power spectra with an external radio background. Galactic emulator: The emulator trained using the 21-cm power spectra with a galactic radio background. Here we use the 21-cm power spectrum from the ideal data set.

Case	Excess background type	Emulator	Mean	Median
A	External	External	0.0071	0.0043
B	External	Galactic	0.0217	0.0159
C	Galactic	Galactic	0.0092	0.0060
D	Galactic	External	0.0142	0.0109

Case	Excess background type	Emulator	Mean	Median
A	External	External	0.0071	0.0043
B	External	Galactic	0.0217	0.0159
C	Galactic	Galactic	0.0092	0.0060
D	Galactic	External	0.0142	0.0109

Table 4.

Open in new tab

Case	Excess background type	Emulator	Mean	Median
A	External	External	0.0071	0.0043
B	External	Galactic	0.0217	0.0159
C	Galactic	Galactic	0.0092	0.0060
D	Galactic	External	0.0142	0.0109

Case	Excess background type	Emulator	Mean	Median
A	External	External	0.0071	0.0043
B	External	Galactic	0.0217	0.0159
C	Galactic	Galactic	0.0092	0.0060
D	Galactic	External	0.0142	0.0109

Table 5.

Open in new tab

Same as Table 4, but here we use the 21-cm power spectra from the mock SKA data set. As in Fig. 10, the errors are measured on the ability to predict the clean power spectrum.

Case	Excess background type	Emulator	Mean	Median
A	External	External	0.0079	0.0056
B	External	Galactic	0.0265	0.0170
C	Galactic	Galactic	0.0136	0.0075
D	Galactic	External	0.0188	0.0140

Case	Excess background type	Emulator	Mean	Median
A	External	External	0.0079	0.0056
B	External	Galactic	0.0265	0.0170
C	Galactic	Galactic	0.0136	0.0075
D	Galactic	External	0.0188	0.0140

Table 5.

Open in new tab

Same as Table 4, but here we use the 21-cm power spectra from the mock SKA data set. As in Fig. 10, the errors are measured on the ability to predict the clean power spectrum.

Case	Excess background type	Emulator	Mean	Median
A	External	External	0.0079	0.0056
B	External	Galactic	0.0265	0.0170
C	Galactic	Galactic	0.0136	0.0075
D	Galactic	External	0.0188	0.0140

Case	Excess background type	Emulator	Mean	Median
A	External	External	0.0079	0.0056
B	External	Galactic	0.0265	0.0170
C	Galactic	Galactic	0.0136	0.0075
D	Galactic	External	0.0188	0.0140

The standard deviations (σ) for most of the seven parameters are close to unity (within |$\sim 20~{{\ \rm per\ cent}}$|⁠), which implies that our procedure generates a reasonable estimate of the uncertainties. The errors are significantly smaller than expected for τ, and also (to a lesser extent) for f_⋆ and V_C. In the noisy cases, the error is significantly underestimated for f_X (which is the parameter that has the largest log uncertainty). The mean (which measures the bias in the prediction) is in every parameter at most 0.3σ in size, for the ideal data set. The mock SKA data set and SKA thermal noise case give similar results to each other, consistent with the similar comparison in Fig. 2. With the noisy data, the mean values are biased by as much as ∼1.4σ for some of the parameters (f_X and E_min), with significant skewness that favours low values (particularly for f_⋆). Even though the thermal noise is assumed to be Gaussian, it is quite large (especially at high redshifts), and when this is combined with the highly non-linear dependence of the power spectrum on the astrophysical parameters, the resulting distributions are significantly non-Gaussian. When fitting real data, these results can be used directly as estimates of the expected error distributions. It may also be possible to improve the procedure in order to reduce the errors and the bias. In the case of noisy data, various regularization techniques such as dropout or weight decay, or exploring alternative network architectures, might be effective in improving the predictions. We leave for future work further exploration of these possibilities. We also note that most of the distributions are fairly Gaussian, in that only a small fraction of the samples yield best-fitting parameter values that fall outside the 3σ boundary of the respective Gaussian fit.

In Figs 6 and B2 (the latter in the Appendix), we show the combined histogram of the size of the actual error (|P_true − P_predicted|) from all 5-folds for each of the parameters. In the left panels we compare the histogram of the actual error for the cases: ideal data and mock SKA data, whereas in the right panels we compare the actual error for the cases: ideal data and SKA thermal noise. Again the actual errors are measured in log₁₀ for all the parameters except for α and E_min.

Figure 6.

Histogram of the actual errors (|P_true − P_predicted|) in predicting the parameters: f_⋆, V_C, and f_X. The distribution is shown over all the 5-folds. The parameter values here are in log₁₀.

Open in new tab Download slide

Table 3 shows the corresponding median of the actual error for each fold from the 5-fold cross-validation for the cases: ideal data set, mock SKA data set, and SKA thermal noise. We also calculate the mean (of the median) over all the 5-folds for all cases. In the theoretical case of no observational limitations (‘ideal data set’), the emulation errors still allow the parameters V_C and τ to be reconstructed with a typical accuracy of 2.3 per cent and 0.9 per cent, respectively, and f_⋆ to within 4.5 per cent. The ionizing mean free path (R_mfp) is typically uncertain by a factor of 1.16, and f_X by a factor of 1.78. For the linear parameters, the actual error is typically ±0.19 in α and ±0.20 keV in E_min.

One might worry that the dimensionality reduction using PCA on the data set likely introduces some correlations in the resulting power spectrum predictions (see Fig. 2) over the various z and k-bins. In testing the sensitivity to the number of PCA components used to train the emulator, we focus on our main results. Specifically, for the case of the ideal data set we tried doubling the number of PCA components used throughout the analysis. We found that the standard deviation (σ) and mean (μ) of the best-fitting Gaussian of the astrophysical parameters were as follows: f_⋆: (σ = 0.67, μ = 0.00), V_C: (σ = 0.60, μ = 0.03), f_X: (σ = 0.94, μ = −0.24), α: (σ = 1.05, μ = −0.03), E_min: (σ = 0.79, μ = −0.10), τ: (σ = 0.59, μ = 0.05), and R_mfp: (σ = 0.89, μ = 0.02). These values are similar (within |$\sim 20~{{\ \rm per\ cent}}$| or better) to the mean values shown for the ideal data set in Tables 2 and B1. Thus, the main goal of this work, which is to constrain seven-parameter astrophysical models and obtain their uncertainties from the mock 21-cm power spectrum, is not very sensitive to the details of the PCA reduction; we leave for the future further analysis of correlations in the reconstructed power spectrum.

The actual error with mock SKA data and with SKA thermal noise are nearly identical, with the mock SKA case increasing the error by up to 10 per cent (but much less in most cases). The errors are substantially larger compared to the ideal data set, with the mock SKA case giving a median error of 24 per cent in V_C, 2.8 per cent in τ, 34 per cent in f_⋆, factors of 1.25 in R_mfp and 9.6 in f_X, and errors in the linear parameters of ±0.22 in α and ±0.91 keV in E_min. Of course, currently our knowledge of most of these parameters is uncertain by large factors (orders of magnitude in some cases), so these types of constraints would represent a remarkable advance.

3.5 Classification of the radio backgrounds

As noted in the introduction, the possible observation of the absorption profile of the 21-cm line centred at 78 MHz with an amplitude of −500 K by the EDGES collaboration is incompatible with the standard astrophysical prediction. One of the possible explanations for this unexpected signal is that the excess radio background above the CMB enhances the contrast between the spin temperature and the background radiation temperature. Fialkov & Barkana (2019) considered a uniform external radio background (not related to the astrophysical sources directly), with a synchrotron spectrum of spectral index β = −2.6 and amplitude parameter A_r measured relative to the CMB at the reference frequency of 78 MHz. Another potential model for the excess radio background is that it comes from the high redshift radio galaxies. The effect of the inhomogeneous galactic radio background on the 21-cm signal has been explored by Reis et al. (2020b). They used the galactic radio background model to explain the unexpected EDGES low band signal. In our work, we use both the external and galactic radio models and train a neural network to try to infer the type of the radio background given the 21-cm power spectrum. For this purpose, we create a training data set of 9500 models (where there are ∼5000 models with a galactic radio background and ∼4500 models with an external radio background), with the astrophysical parameters varying over the following ranges: f_⋆ = 0.01 − 0.5, V_C = 4.2 − 60 km s⁻¹, f_X = 0.0001 − 1000, α = 1.0 − 1.5, E_min = 0.1 − 3.0, τ = 0.033 − 0.089, R_mfp = 10.0 − 70.0. For the models with a galactic radio background, the normalization of the radio emissivity (measured relative to low-redshift galaxies), f_R, varies over the range f_R = 0.01 − 10⁷, and the range for the amplitude of the radio background, A_r, for the external radio models is is 0.0001 − 0.5.

We apply an EDGES-compatible test data set to the two trained networks. The models that we refer to as EDGES-compatible satisfy the criteria adopted by Fialkov & Barkana (2019) as representing a rough compatibility with the 99 per cent limits of the detected signal in the EDGES low band experiment, in terms of the overall decline and rise without regard to the precise shape of the absorption (which is much more uncertain). The enhanced radio emission must strictly be a high redshift phenomena, in order to not overproduce the observed radio background (Fialkov & Barkana 2019), so we assume a cut-off redshift, z_cutoff = 15 (Reis et al. 2020b) below which f_R = 1 as for present-day radio sources. So we only consider here redshifts from 15 to 30 (or the highest SKA redshift in the case with SKA noise). In our training data set, we treat the radio background parameters A_r or f_R on an equal footing and add an extra column of a binary parameter that specify the type of radio background: 0 for the external radio background and 1 for the galactic radio background. In our EDGES compatible test data set, we have 530 models and 308 models with an external and a galactic radio background, respectively. We apply this test data set to the trained NN. In the predicted parameters, we round off the binary parameter either to zero (when it is ≤0.5) which is the label for the external radio background, or to unity (when it is >0.5) which is the label of the galactic radio background. The confusion matrix shown in Fig. 7 indicates the performance of our classification method for identifying the type of radio background. In the case without noise, the accuracy is 99 per cent. The information available in the whole k range, i.e. 0.05 Mpc⁻¹ <k <1.0 Mpc⁻¹, helps yield this high classification accuracy, despite the level of emulation errors as seen in Fig. 2. The classification accuracy drops to 87.2 per cent if we use the power spectra from the mock SKA data set with excess radio background; the accuracy remains fairly high as these EDGES-inspired models have high 21-cm power spectra that are not so strongly affected by the SKA thermal noise.

Figure 7.

Confusion matrix depicts the performance of our classification method to distinguish the type of the radio background given the 21-cm power spectrum. Left panel: ideal data set. 62.41 per cent of the test models, i.e. 523 models out of 838 models are labelled as having the external radio background and 36.63 per cent of the test models, i.e. 307 models out of 838 models are labelled as having the galactic radio background. The number of misclassified cases is 8, i.e. eight models have been classified as having the other radio background rather than the true radio background. In this case, the overall accuracy is 99 per cent. Right panel: When we train the ANN using the 21-cm power spectrum from the mock SKA data set, the overall accuracy is 87.2 per cent, as seen in the confusion matrix.

Open in new tab Download slide

3.6 Accuracy of fitting the excess radio background models

In the previous subsection we found that our NN works well to infer the type of radio background present in the 21-cm power spectrum. In order to understand cases of misclassification, we now ask whether we can fit a model with the galactic radio background with parameters (f_⋆, V_C, f_X, α, E_min, τ, R_mfp, A_r) of the external radio background model. To address this question, we train an NN that can predict the parameters in the parameter space of an external radio background given the 21-cm power spectra with an excess radio background. We also construct an emulator and train it using the data set with the external radio background. We then employ our trained emulator in the MCMC sampler. If we apply a data set generated with a galactic background, the NN will find the approximate best-fitting parameters in the parameter space of the external radio background, and we use this predicted set of parameters from the NN as the initial guess in the MCMC sampler. The output of the MCMC sampler is the posterior distribution of the best-fitting parameters. We take the median of the distribution for each of the parameters and report it as the predicted best-fitting parameter value. We use these best-fitting parameter values in the emulator trained on the external radio background data set to emulate the 21-cm power spectrum (which, again, is actually based on the galactic radio background model). The left panel of the Fig. 8 shows a few random examples of the quality of fitting the galactic models with an external radio background. In the plot, the solid line is the true power spectrum with the galactic radio background, the dashed line is the best-fitting emulated power spectrum using the correct emulator (galactic) and the dotted line is the best-fitting emulated power spectrum using the other emulator (external). The right panel of Fig. 8 shows a few examples of a similar setup except that the roles of the two types of radio models have been reversed.

Figure 8.

A few examples of the comparison of the emulated power spectra and the true power spectra with the two different models for an excess radio background. Left panel: The true power spectrum comes from the galactic radio background model, and the other emulator uses the external radio background model. Right panel: The true power spectrum comes from the external radio background, and the other emulator uses the galactic radio background model. Shown here is k = 0.11 Mpc⁻¹. Different colours refer to different models (with no relation between the models in the two panels).

Open in new tab Download slide

To test the overall, statistical performance of the emulators in fitting power spectra with different radio backgrounds, we estimate the relative error using the equation

$$\begin{eqnarray} \rm {Error} = \frac{\sqrt{\rm {Mean\left[\left(\Delta ^2_{\rm {predicted}}-\Delta ^2_{\rm {true}}\right)^2\right]}}}{Max\left[\Delta ^2_{\rm {true}}\right]}\ . \end{eqnarray}$$

(18)

As explained before, this yields a simple, optimistic estimate of the overall error in a simple number. It suffices here for our interest in comparing various cases, as follows:

Case A: We have a test data set of 200 models with an external radio background. We fit the power spectra of the test data set using the emulator trained with power spectra with the external radio background (i.e. the correct model in this case). The top left panel of Fig. 9 shows the histogram of errors in the fitting procedure. If we compare the true and predicted power spectra, 98.5 per cent of the cases in the test data set give a relative error less than 0.04.

Figure 9.

Histogram of errors showing the overall performance of the fitting procedure. Top panels: We use a test data set of 200 models with an external radio background. In the top left panel, the histogram shows the error when we employ the emulator trained using the 21-cm power spectra with the external radio background. In the top right panel, the histogram shows the error when we use the emulator trained using 21-cm power spectra with the wrong, galactic radio background model. Bottom panels: We use a test data set of 200 models with a galactic radio background. In the bottom left panel, the histogram represents the error when we apply the emulator trained using the 21-cm power spectra with a galactic radio background. The histogram in the bottom right panel shows the error when we use the emulator trained using the 21-cm power spectra with the wrong, external radio background model. Here we consider the 21-cm power spectra from the ideal data set.

Open in new tab Download slide

Case B: We fit the power spectra of the same test data set as in case A but using the wrong emulator, trained with power spectra with a galactic radio background. The top right panel of Fig. 9 shows the histogram of errors in the fitting procedure. In this case, we find that the relative error is still lower than 0.04 for 87 per cent of cases.

Case C: We use a test data set of 200 models with a galactic radio background. We fit the power spectra of the test data set using the correct emulator, i.e. trained using the power spectra with a galactic radio background. The histogram of errors is shown in the bottom left panel of Fig. 9. Here the relative error is lower than 0.04 for 98.5 per cent cases.

Case D: We fit the power spectra of the same test data set as in case C but using the wrong emulator, trained with the external radio background model. The bottom right panel of Fig. 9 shows the histogram of error for this case. We find that the relative error is still lower than 0.04 in 95 per cent of cases.

Table 4 lists the mean and the median of the relative errors in the various cases mentioned above. The errors increase significantly when the incorrect emulator is used, by factors between 1.5 and 4 in the various cases. Fig. 10 and Table 5 show similar results to Fig. 9 and Table 4, but using the mock SKA power spectrum. SKA noise increases the errors, but typically by only tens of percent, up to a factor of 1.5. This is generally reminiscent of the magnitude of the effect of SKA noise that we saw in a different context in Fig. 2.

Figure 10.

Same as Fig. 9, except using mock SKA data set. In each case we use the best-fitting parameters derived from NN’s trained using mock SKA data, but for the error we measure the prediction of the clean power spectrum, i.e. we apply the emulators (both external and galactic) trained using ideal data set to emulate the best-fitting 21-cm power spectrum, and compare this to the true clean power spectrum.

Open in new tab Download slide

4 CONCLUSIONS

In this work, we applied machine learning techniques to analyse a data set of power spectra from mock 21-cm observations. We developed a numerical emulation that is a computationally efficient method that speeds up the data analysis by bypassing the need to run very large numbers of seminumerical simulations. We trained our neural network over a wide range of possible values of the seven astrophysical parameters that include the star formation efficiency, the minimum circular velocity of star-forming haloes, the X-ray radiation efficiency, the power-law slope and low energy cutoff of the X-ray SED, the CMB optical depth from reionization, and the mean free path of the ionizing photons. We constructed our algorithm in a way that approximately accounts for emulation error (i.e. the uncertainty due to the finite size of the training set), and also tested the accuracy (and improved the error estimates) using 5-fold cross-validation.

As a result we obtained an emulator that can predict the 21-cm power spectrum given the set of astrophysical parameters. We analysed the overall performance of the emulator by comparing, for 639 test models, the emulated power spectrum to the true power spectrum generated in the seminumerical simulation. We showed how the error varies with wavenumber and redshift, using the definition in equation (16). We found that the typical emulation error of the power spectrum in each bin is |$10-20~{{\ \rm per\ cent}}$| over a broad range of k and z, but it rises above 20 per cent at the lowest and highest k values (for most redshifts), and at the lowest redshift for all k values (i.e. at z = 6, near the end of reionization). SKA noise reduces the accuracy of the reconstruction of the astrophysical parameters but not by too much, increasing the typical errors by a fairly uniform factor of ∼1.5.

In order to find constraints on the astrophysical parameters, we employed the emulator to an MCMC sampler to fit parameters to a given 21-cm power spectrum. We quantified the error in predicting the parameters using equation (17). We found that the standard deviations (σ) for most of the seven parameters (Figs 5 and B1) were within 20 per cent of our error estimate. Also, the mean (which measures the bias) was small in every parameter for the ideal data set, and the distribution was fairly well described by a Gaussian. The mock SKA data set or SKA thermal noise case gave nearly identical results to each other; the noisy data made the bias as large as ∼1.4σ for some of the parameters, and added significant skewness in some cases. Table 2 (along with Table B1) listed the corresponding parameters of best-fitting Gaussians for each of the 5-folds, along with means of these parameters over the 5-folds. In Figs 6 and B2, we showed the histograms of the size of the actual error in predicting each of the parameters, with the medians listed in Table 3. We measured the parameters and uncertainties in log scale (log₁₀) for all the parameters except for α and E_min. In the case of the ideal data set, we found that the emulation errors still allow the parameters V_c and τ to be reconstructed with a typical accuracy of 2.3 per cent and 0.9 per cent, respectively, and f_⋆ to within 4.5 per cent. The ionizing mean free path (R_mfp) is typically uncertain by a factor of 1.16, and f_X by a factor of 1.78. For the linear parameters, the uncertainty is typically ±0.19 in α and ±0.20 in E_min.

Noisy SKA data only marginally affected the uncertainty in α, indicating the importance of the emulation error and also the overall lack of sensitivity of the power spectrum to this parameter. However, SKA noise increased the errors to 24 per cent in V_C, 2.8 per cent in τ, and 34 per cent in f_⋆. The uncertainty factor increased mildly (to 1.25) in R_mfp but greatly (to 9.6) in f_X, and also in E_min (to ±0.91 keV). Currently we have almost no observational information about the values of most of the astrophysical parameters we used in this work, except for the optical depth (τ) of the CMB photons. The detection of the polarized CMB signal by WMAP (Spergel et al. 2003) and Planck (Planck Collaboration VI 2020) has provided constraints on the optical depth, but their precision is limited by the cosmic variance of the large-angle polarization effect on the CMB. We have shown that 21-cm power spectrum observations can potentially produce a precise measurement of this and other astrophysical parameters. Of course we have used a relatively simple astrophysical model here. If such precise measurements are indeed achieved, they will motivate comparisons with more complicated models where, for example, the various efficiencies (for star formation, X-rays, and ionizing photons) depend on redshift, halo mass, and perhaps halo environment or merger history.

In another part of this work, we applied a neural network to classify the nature of the excess radio background, if it is present in the 21-cm signal. We compared models with an external radio background (assumed to be primordial and homogeneous) to a galactic radio background (produced by the same galaxies as other radiation backgrounds, thus generally reflecting the galaxy distribution). The accuracy of the classification was 99 per cent for the 21-cm power spectrum without SKA noise (ideal data set), going down to a still high accuracy of 87 per cent if we use the 21-cm power spectrum from the mock SKA data set. When fitting data with either the correct emulator or the one from the other type of radio background, we found that the fits were in all cases rather accurate (Table 4). However, the errors increased significantly when the incorrect emulator was used, by factors between 1.5 and 4 in the various cases in the case without SKA noise. Adding SKA noise increased the errors, but typically by only tens of percent, up to a factor of 1.5 (Table 5).

In summary, emulating and fitting the 21-cm power spectrum using ANNs is a rapid and accurate method. One of the potential extensions of this work will be to improve the accuracy of the emulator, e.g. by trying to change various parameters such as the number of layers of the NN. Another possible improvement is to use the current procedure as the first step of a fit, and then zoom in on a smaller region of the parameter space in order to achieve higher accuracy (noting that we have covered a far wider range of astrophysical parameters than most similar work in the literature). A further direction is to make the astrophysical model more realistic by adding a significant number of parameters, and seeing whether the computational speed and fitting accuracy are maintained. NN’s will clearly remain valuable in this field, given the highly non-linear dependence of the 21-cm power spectrum on astrophysical parameters, the wide range of possible values for these parameters, and the relative slowness of realistic simulations, even seminumerical ones.

Acknowledgement

This project was made possible for SS, RB, and IR through the support of the Israel Science Foundation (grant No. 2359/20). RB also acknowledges the support of The Ambrose Monell Foundation and the Institute for Advanced Study. AF was supported by the Royal Society University Research Fellowship.

This research made use of: Numpy (Harris et al. 2020), Scipy (Virtanen et al. 2020), matplotlib (Hunter 2007), seaborn (Waskom 2021), getdist (Lewis 2019), and the NASA Astrophysics Data System Bibliographic Services.

DATA AVAILABILITY

The data underlying this article will be shared on reasonable request to the corresponding author.

References

Abdurashidova

et al. ,

2022

ApJ

924

Parameters	α		E_min		τ		R_mfp
Gaussian fit	σ	μ	σ	μ	σ	μ	σ	μ
Ideal data set
Fold 1	1.13	+0.05	0.86	−0.07	0.62	+0.02	0.83	+0.06
Fold 2	1.10	+0.19	0.94	−0.30	0.61	+0.02	0.90	+0.08
Fold 3	1.14	+0.05	0.94	−0.22	0.65	−0.06	0.94	+0.04
Fold 4	1.10	+0.09	0.95	+0.04	0.61	+0.00	0.89	+0.11
Fold 5	1.10	+0.00	0.87	−0.17	0.54	+0.06	0.90	+0.11
Mean	1.11	+0.08	0.91	−0.14	0.61	+0.01	0.89	+0.08
Mock SKA data set
Fold 1	1.18	+0.19	1.17	−1.09	0.45	+0.20	0.87	−0.15
Fold 2	1.18	+0.28	1.19	−1.14	0.54	+0.17	0.82	−0.07
Fold 3	1.23	+0.17	1.23	−1.06	0.58	+0.11	0.82	−0.14
Fold 4	1.18	+0.22	1.11	−1.37	0.60	+0.18	0.87	−0.15
Fold 5	1.19	+0.17	1.14	−1.42	0.52	+0.15	0.83	−0.08
Mean	1.19	+0.21	1.17	−1.22	0.54	+0.16	0.84	−0.12
SKA thermal noise case
Fold 1	1.18	+0.20	1.17	−1.05	0.52	+0.20	0.85	−0.16
Fold 2	1.16	+0.29	1.16	−1.09	0.52	+0.18	0.83	−0.09
Fold 3	1.23	+0.18	1.24	−1.05	0.61	+0.10	0.83	−0.17
Fold 4	1.18	+0.22	1.19	−1.32	0.61	+0.20	0.87	−0.17
Fold 5	1.20	+0.19	1.17	−1.40	0.52	+0.16	0.81	−0.05
Mean	1.19	+0.22	1.19	−1.18	0.56	+0.17	0.84	−0.13

Parameters	α		E_min		τ		R_mfp
Gaussian fit	σ	μ	σ	μ	σ	μ	σ	μ
Ideal data set
Fold 1	1.13	+0.05	0.86	−0.07	0.62	+0.02	0.83	+0.06
Fold 2	1.10	+0.19	0.94	−0.30	0.61	+0.02	0.90	+0.08
Fold 3	1.14	+0.05	0.94	−0.22	0.65	−0.06	0.94	+0.04
Fold 4	1.10	+0.09	0.95	+0.04	0.61	+0.00	0.89	+0.11
Fold 5	1.10	+0.00	0.87	−0.17	0.54	+0.06	0.90	+0.11
Mean	1.11	+0.08	0.91	−0.14	0.61	+0.01	0.89	+0.08
Mock SKA data set
Fold 1	1.18	+0.19	1.17	−1.09	0.45	+0.20	0.87	−0.15
Fold 2	1.18	+0.28	1.19	−1.14	0.54	+0.17	0.82	−0.07
Fold 3	1.23	+0.17	1.23	−1.06	0.58	+0.11	0.82	−0.14
Fold 4	1.18	+0.22	1.11	−1.37	0.60	+0.18	0.87	−0.15
Fold 5	1.19	+0.17	1.14	−1.42	0.52	+0.15	0.83	−0.08
Mean	1.19	+0.21	1.17	−1.22	0.54	+0.16	0.84	−0.12
SKA thermal noise case
Fold 1	1.18	+0.20	1.17	−1.05	0.52	+0.20	0.85	−0.16
Fold 2	1.16	+0.29	1.16	−1.09	0.52	+0.18	0.83	−0.09
Fold 3	1.23	+0.18	1.24	−1.05	0.61	+0.10	0.83	−0.17
Fold 4	1.18	+0.22	1.19	−1.32	0.61	+0.20	0.87	−0.17
Fold 5	1.20	+0.19	1.17	−1.40	0.52	+0.16	0.81	−0.05
Mean	1.19	+0.22	1.19	−1.18	0.56	+0.17	0.84	−0.13

Parameters	α		E_min		τ		R_mfp
Gaussian fit	σ	μ	σ	μ	σ	μ	σ	μ
Ideal data set
Fold 1	1.13	+0.05	0.86	−0.07	0.62	+0.02	0.83	+0.06
Fold 2	1.10	+0.19	0.94	−0.30	0.61	+0.02	0.90	+0.08
Fold 3	1.14	+0.05	0.94	−0.22	0.65	−0.06	0.94	+0.04
Fold 4	1.10	+0.09	0.95	+0.04	0.61	+0.00	0.89	+0.11
Fold 5	1.10	+0.00	0.87	−0.17	0.54	+0.06	0.90	+0.11
Mean	1.11	+0.08	0.91	−0.14	0.61	+0.01	0.89	+0.08
Mock SKA data set
Fold 1	1.18	+0.19	1.17	−1.09	0.45	+0.20	0.87	−0.15
Fold 2	1.18	+0.28	1.19	−1.14	0.54	+0.17	0.82	−0.07
Fold 3	1.23	+0.17	1.23	−1.06	0.58	+0.11	0.82	−0.14
Fold 4	1.18	+0.22	1.11	−1.37	0.60	+0.18	0.87	−0.15
Fold 5	1.19	+0.17	1.14	−1.42	0.52	+0.15	0.83	−0.08
Mean	1.19	+0.21	1.17	−1.22	0.54	+0.16	0.84	−0.12
SKA thermal noise case
Fold 1	1.18	+0.20	1.17	−1.05	0.52	+0.20	0.85	−0.16
Fold 2	1.16	+0.29	1.16	−1.09	0.52	+0.18	0.83	−0.09
Fold 3	1.23	+0.18	1.24	−1.05	0.61	+0.10	0.83	−0.17
Fold 4	1.18	+0.22	1.19	−1.32	0.61	+0.20	0.87	−0.17
Fold 5	1.20	+0.19	1.17	−1.40	0.52	+0.16	0.81	−0.05
Mean	1.19	+0.22	1.19	−1.18	0.56	+0.17	0.84	−0.13

Parameters	α		E_min		τ		R_mfp
Gaussian fit	σ	μ	σ	μ	σ	μ	σ	μ
Ideal data set
Fold 1	1.13	+0.05	0.86	−0.07	0.62	+0.02	0.83	+0.06
Fold 2	1.10	+0.19	0.94	−0.30	0.61	+0.02	0.90	+0.08
Fold 3	1.14	+0.05	0.94	−0.22	0.65	−0.06	0.94	+0.04
Fold 4	1.10	+0.09	0.95	+0.04	0.61	+0.00	0.89	+0.11
Fold 5	1.10	+0.00	0.87	−0.17	0.54	+0.06	0.90	+0.11
Mean	1.11	+0.08	0.91	−0.14	0.61	+0.01	0.89	+0.08
Mock SKA data set
Fold 1	1.18	+0.19	1.17	−1.09	0.45	+0.20	0.87	−0.15
Fold 2	1.18	+0.28	1.19	−1.14	0.54	+0.17	0.82	−0.07
Fold 3	1.23	+0.17	1.23	−1.06	0.58	+0.11	0.82	−0.14
Fold 4	1.18	+0.22	1.11	−1.37	0.60	+0.18	0.87	−0.15
Fold 5	1.19	+0.17	1.14	−1.42	0.52	+0.15	0.83	−0.08
Mean	1.19	+0.21	1.17	−1.22	0.54	+0.16	0.84	−0.12
SKA thermal noise case
Fold 1	1.18	+0.20	1.17	−1.05	0.52	+0.20	0.85	−0.16
Fold 2	1.16	+0.29	1.16	−1.09	0.52	+0.18	0.83	−0.09
Fold 3	1.23	+0.18	1.24	−1.05	0.61	+0.10	0.83	−0.17
Fold 4	1.18	+0.22	1.19	−1.32	0.61	+0.20	0.87	−0.17
Fold 5	1.20	+0.19	1.17	−1.40	0.52	+0.16	0.81	−0.05
Mean	1.19	+0.22	1.19	−1.18	0.56	+0.17	0.84	−0.13

Article Contents

Emulation of the cosmic dawn 21-cm power spectrum and classification of excess radio models using an artificial neural network

ABSTRACT

1 INTRODUCTION

2 THEORY AND METHODS

2.1 21-cm signals

2.1.1 Astrophysical parameters

2.1.2 Power spectrum

2.2 The excess radio background

2.3 Mock SKA data

2.4 Method to generate the data set

2.5 Artificial neural network

2.5.1 Astrophysical parameter predictions

2.5.2 Emulation of the 21-cm power spectrum

2.6 Posterior distribution of the astrophysical model

3 RESULTS

3.1 Performance analysis of the emulator

3.2 Dependence of the emulation error on the redshift and wavenumber

3.3 Errors in the fitted astrophysical parameters

3.4 k-fold cross-validation and statistical analysis of the astrophysical parameter errors

3.5 Classification of the radio backgrounds

3.6 Accuracy of fitting the excess radio background models

4 CONCLUSIONS

Acknowledgement

DATA AVAILABILITY

References

APPENDIX A: POSTERIOR DISTRIBUTION OF THE SEVEN PARAMETER ASTROPHYSICAL MODEL

APPENDIX B: HISTOGRAMS OF THE NORMALIZED ERROR IN PREDICTING THE ASTROPHYSICAL PARAMETERS

Citations

Views

Altmetric

Email alerts

Astrophysics Data System

Citing articles via

Latest

Most Read

Most Cited

This Feature Is Available To Subscribers Only