ABSTRACT

Analysing next-generation cosmological data requires balancing accurate modelling of non-linear gravitational structure formation against computational demands. We propose a solution by introducing a machine learning-based field-level emulator within the Hamiltonian Monte Carlo-based Bayesian Origin Reconstruction from Galaxies (BORG) inference algorithm. Built on a V-net neural network architecture, the emulator enhances the predictions of first-order Lagrangian perturbation theory so that they accurately align with full N-body simulations while significantly reducing evaluation time. We test its incorporation in BORG for sampling cosmic initial conditions using mock data based on non-linear large-scale structures from N-body simulations and Gaussian noise. The method efficiently and accurately explores the high-dimensional parameter space of initial conditions, fully extracting the cross-correlation information of the data field binned at a resolution of |$1.95\,h^{-1}$| Mpc. Per cent-level agreement with the ground truth in the power spectrum and bispectrum is achieved up to the Nyquist frequency |$k_\mathrm{N} \approx 2.79\,h \,\, \mathrm{Mpc}^{-1}$|. Posterior resimulations – using the inferred initial conditions for N-body simulations – show that the recovery of information in the initial conditions is sufficient to accurately reproduce halo properties. In particular, we show a highly accurate |$M_{200\mathrm{c}}$| halo mass function and stacked density profiles of haloes in different mass bins within |$[0.853,16]\times 10^{14}\,{\rm M}_{\odot }\,h^{-1}$|. As all available cross-correlation information is extracted, the remaining limitations in recovering the initial conditions stem from the noise level and the data grid resolution. This is promising, as it underscores the significance of accurate non-linear modelling and indicates the potential for extracting additional information at smaller scales.

1 INTRODUCTION

Imminent next-generation cosmological surveys, such as DESI (DESI Collaboration 2016), Euclid (Laureijs et al. 2011; Amendola et al. 2018), LSST at the Vera C. Rubin Observatory (LSST Science Collaboration 2009; LSST Dark Energy Science Collaboration 2012; Ivezić et al. 2019), SPHEREx (Doré et al. 2014), and Subaru Prime Focus Spectrograph (Takada et al. 2012), will provide an unprecedented wealth of galaxy data probing the large-scale structure of the Universe. The increased volume of data, with the expected number of galaxies in the order of billions, must now be matched by accurate modelling to optimally extract physical information. Analysis at the field level recently emerged as a successful alternative to the traditional way of analysing cosmological data – through a limited set of summary statistics – and instead uses information from the entire field (Jasche & Wandelt 2013; Wang et al. 2014; Jasche, Leclercq & Wandelt 2015; Lavaux & Jasche 2016).

Addressing the complexities of modelling higher-order statistics and defining optimal summary statistics for information extraction from the galaxy distribution is a challenge that can be overcome by employing a fully numerical approach at the field level (see e.g. Jasche et al. 2015; Lavaux & Jasche 2016; Jasche & Lavaux 2019; Lavaux, Jasche & Leclercq 2019). All higher-order statistics of the cosmic matter distribution are generated through a physics model, which connects the early Universe with the large-scale structure of today. Enabled by the nearly Gaussian nature of the initial perturbations, the inference is pushed to explore the space of plausible initial conditions from which the present-day non-linear cosmic structures formed under gravitational collapse. Although accurate physics models are needed, the computational resources required for parameter inference prompt the use of approximate models. In this work, we propose a solution by introducing a physics model based on a machine-learning-trained field-level emulator that extends first-order Lagrangian Perturbation Theory (1LPT) to accurately align with full N-body simulations, yielding higher accuracy and a lower computational footprint during inference than currently available models.

Field-level inferences, supported by a spectrum of methodologies and nuances (e.g. Jasche & Wandelt 2013; Kitaura 2013; Ata, Kitaura & Müller 2014; Wang et al. 2014; Jasche & Lavaux 2017; Seljak et al. 2017; Modi, Feng & Seljak 2018; Bos, Kitaura & van de Weygaert 2019; Schmidt et al. 2019; Ata et al. 2021; Kitaura et al. 2021; Villaescusa-Navarro et al. 2021a; Hahn et al. 2022, 2023; Jindal et al. 2023; Kostić et al. 2023; Legin et al. 2023; Modi, Li & Blei 2023; Stadler, Schmidt & Reinecke 2023; Bayer, Seljak & Modi 2023a; Bayer, Modi & Ferraro 2023b), have become a prominent approach. Notably, analysis at the field level has recently been shown to best serve the goal of maximizing the information extraction (Leclercq & Heavens 2021; Boruah & Rozo 2024). It has also been shown that a significant amount of the cosmological signal is embedded in non-linear scales (e.g. Ma & Scott 2016; Seljak et al. 2017; Villaescusa-Navarro et al. 2020, 2021b), which can be optimally extracted by field-level inference. N-body simulations are currently the most sophisticated numerical method to simulate the full non-linear structure formation of our Universe (for a review, see Vogelsberger et al. 2020). Direct use of N-body simulations in field-level inference pipelines is none the less challenging, due to the high cost of model evaluations. To lower the required computational resources, all the while resolving non-linear scales, quasi N-body numerical schemes have been developed, e.g. tCOLA (Tassev, Zaldarriaga & Eisenstein 2013), FastPM (Feng et al. 2016), FlowPM (Modi, Lanusse & Seljak 2021), PMWD (Li et al. 2022), and Hybrid Physical-Neural ODEs (Lanzieri, Lanusse & Starck 2022). All these, however, involve significant trade-offs between speed and non-linear accuracy (e.g. Stopyra et al. 2024).

As demonstrated with the Markov Chain Monte Carlo (MCMC) based Bayesian Origin Reconstruction from Galaxies (BORG) algorithm (Jasche & Wandelt 2013) for galaxy clustering (Jasche et al. 2015; Lavaux & Jasche 2016; Jasche & Lavaux 2019; Lavaux et al. 2019), weak lensing (Porqueres et al. 2021, 2022, 2023), velocity tracers (Prideaux-Ghee et al. 2023), and Lyman-|$\alpha$| forest (Porqueres et al. 2019, 2020), field-level inference with more than tens of millions of parameters has become feasible. These applications, in addition to other studies within the BORG framework (Jasche & Lavaux 2019; Ramanah et al. 2019; Nguyen et al. 2020; Tsaprazi et al. 2022; Andrews et al. 2023; Tsaprazi et al. 2023), rely on fast, approximate, and differentiable physical forward models, such as first and second-order Lagrangian Perturbation Theory (1LPT/2LPT), non-linear particle mesh models, and tCOLA. The application of the BORG algorithm (Jasche et al. 2015; Lavaux & Jasche 2016; Jasche & Lavaux 2019; Lavaux et al. 2019) to trace the galaxy distribution of the nearby Universe from SDSS-II (Abazajian et al. 2009), BOSS/SDSS-III (Eisenstein et al. 2011; Dawson et al. 2013), and the 2M++ catalogue (Lavaux & Hudson 2011), respectively, further laid the foundation for posterior resimulations of the local Universe with high-resolution N-body simulations (Leclercq, Jasche & Wandelt 2015; Nguyen et al. 2020; Desmond et al. 2022; Hutt et al. 2022; Mcalpine et al. 2022; Stiskalek et al. 2024). With the success of this approach, the requirements increased. Posterior resimulations require that the inferred initial conditions, when evolved to redshift |$z=0$|, reproduce accurate cluster masses and halo properties. To achieve this goal, Stopyra et al. (2024) showed that a minimal accuracy of the physics simulator during inference is required. This inevitably leads to a higher computational demand, which we address in this work by the use of the field-level emulator.

Specifically, we propose to use a machine learning replacement for N-body simulations in the inference. It takes the form of a field-level emulator, which is trained through deep learning to mimic the predictions of the complex N-body simulation, as displayed in Fig. 1. Such emulators have recently been used both to map cosmological initial conditions to the final density field (Bernardini et al. 2020) and to map the output of a fast and less accurate simulation to the more complex output of an N-body simulation (He et al. 2019; de Oliveira et al. 2020; Kaushal et al. 2021; Jamieson et al. 2022a, b). While He et al. (2019) and de Oliveira et al. (2020) map 1LPT to FastPM and N-body simulations, respectively, for a fixed cosmology, Jamieson et al. (2022b) made a significant advance over the previous works by both training across different cosmologies and mapping 1LPT directly to N-body simulations. Similarly, Kaushal et al. (2021) presented the mapping from tCOLA to N-body for various cosmologies. The main benefit of the field-level emulator over other gravity models is that it is fast and differentiable to evaluate, while achieving per cent level accuracy compared to N-body simulations at scales down to |$k \sim 1\,h$| Mpc|$^{-1}$|. The computational cost to model the complexity is instead pushed into the network training of the emulator.

Figure 1.

Visual comparison of three physical forward models at redshift |$z=0$|⁠, illustrated in the panels as slices through the density field. In all cases, |$128^3$| particles in a cubic volume with side length |$250\,h^{-1}$| Mpc were simulated. Identical initial conditions underpin the N-body simulation, the first-order Lagrangian Perturbation Theory (1LPT), and the emulator (BORG-EM). The emulator, utilizing 1LPT displacements as input and generating N-body-like displacements, effectively bridges the gap between 1LPT and N-body outcomes; while the 1LPT struggles to replicate collapsed overdensities as effectively as the N-body simulation, the emulator achieves remarkable success in reproducing N-body-like cosmic structures. In the right-most panel, the N-body simulation’s initial conditions originate from the posterior distribution realized through the BORG algorithm. The inference process employs the N-body density field (the left-most panel) combined with Gaussian noise as the ground truth, utilizing the BORG-EM as the forward model during inference. As expected, given that this represents a single realization of posterior resimulations of initial conditions, minor deviations from the true N-body field are visible. Being able to capture large-scale structures as well as collapsed overdensities, the field-level emulator demonstrates that field-level inference from non-linear cosmic structures is feasible.

An investigation of the robustness of field-level emulators was made in Jamieson et al. (2022a). The emulator achieves per cent-level accuracy deep into the non-linear regime and generalizes well, demonstrating that it has learned general physical principles and non-linear mode couplings. This builds confidence in emulators being part of the physical forward model within cosmological inference. Other recent works have also demonstrated promise in inferring the initial conditions from non-linear dark matter densities with machine-learning based methods (Jindal et al. 2023; Legin et al. 2023).

The manuscript is structured as follows. In Section 2, we describe the merging of the 1LPT forward model in BORG with a field-level emulator, which we call BORG-EM, to infer the initial conditions from non-linear dark matter densities. In Section 3, we set up the corresponding data model and introduce a novel multiscale likelihood. We demonstrate the efficient exploration of the high-dimensional parameter space of initial conditions in Section 4. In Section 5, we show that our method fully extracts all cross-correlation information from the data well into the noise-dominated regime up to the data grid resolution limit. We use the inferred initial conditions to run posterior resimulations and show in Section 6 that the recovered information in the initial conditions is sufficient for accurate recovery of halo masses and density profiles. We summarize and conclude our findings in Section 7.

2 FIELD-LEVEL INFERENCE WITH EMULATORS

Inferring the initial conditions of the Universe from available data is enabled by the Bayesian Origin Reconstruction from Galaxies (BORG) algorithm. The modular structure of BORG facilitates replacement of the forward model, which translates initial conditions to the data space. To increase the accuracy and decrease the computational cost, we integrate field-level emulators.

2.1 The BORG algorithm

The BORG algorithm is a Bayesian hierarchical inference framework designed to analyse the three-dimensional cosmic matter distribution underlying observed galaxies in surveys (Jasche & Kitaura 2010; Jasche & Wandelt 2013; Jasche et al. 2015; Lavaux & Jasche 2016; Jasche & Lavaux 2019; Lavaux et al. 2019). More specifically, by incorporating physical models of gravitational structure formation and a data likelihood BORG turns the task of analysing the present-day cosmic structures into exploring the high-dimensional space of plausible initial conditions from which these observed structures formed. To explore this vast parameter space, BORG uses the Hamiltonian Monte Carlo (HMC) framework generating Markov Chain Monte Carlo (MCMC) samples that approximate the posterior distribution of initial conditions (Jasche & Kitaura 2010; Jasche & Wandelt 2013). The benefit of the HMC is two-fold: (1) it utilizes conserved quantities such as the Hamiltonian to obtain a high Metropolis-Hastings acceptance rate, effectively reducing the rejected model evaluations and (2) it efficiently uses model gradients to traverse the parameter space, reducing random walk behaviour through persistent motion (Duane et al. 1987; Neal 1993).
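For illustration, the sketch below shows a single HMC transition in Python (not the BORG implementation): a momentum draw, leapfrog integration driven by the gradient, and a Metropolis-Hastings acceptance based on the change in the Hamiltonian. The callables log_posterior and grad_log_posterior are hypothetical stand-ins for the negative Hamiltonian potential and the adjoint gradient provided by the forward model.

    # Minimal sketch of one HMC transition; not the BORG implementation.
    import numpy as np

    def hmc_step(x, log_posterior, grad_log_posterior, eps=1e-2, n_leapfrog=10,
                 rng=np.random.default_rng()):
        p = rng.standard_normal(x.shape)                 # momentum draw
        x_new, p_new = x.copy(), p.copy()

        # Leapfrog integration of the Hamiltonian dynamics (persistent motion).
        p_new += 0.5 * eps * grad_log_posterior(x_new)
        for _ in range(n_leapfrog - 1):
            x_new += eps * p_new
            p_new += eps * grad_log_posterior(x_new)
        x_new += eps * p_new
        p_new += 0.5 * eps * grad_log_posterior(x_new)

        # Metropolis-Hastings acceptance using the (approximately conserved) Hamiltonian.
        h_old = -log_posterior(x) + 0.5 * np.sum(p ** 2)
        h_new = -log_posterior(x_new) + 0.5 * np.sum(p_new ** 2)
        if rng.uniform() < np.exp(h_old - h_new):
            return x_new, True
        return x, False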

Importantly, BORG relies solely on forward model evaluations and at no point requires inverting the flow of time in them, which is generally not possible because such inverse problems are ill-posed in the presence of noisy and incomplete observational data (Nusser & Dekel 1992; Crocce & Scoccimarro 2006). BORG implements a fully differentiable forward model such that the adjoint gradient with respect to the initial conditions can be computed. Current physics models in BORG include first- and second-order Lagrangian Perturbation Theory (BORG-1LPT/BORG-2LPT) and non-linear particle mesh models (BORG-PM) with or without tCOLA. During inference, BORG accounts for both systematic and stochastic observational uncertainties, such as survey characteristics, selection effects, and luminosity-dependent galaxy biases as well as unknown noise and foreground contaminations (Jasche & Wandelt 2013; Jasche et al. 2015; Lavaux & Jasche 2016; Jasche & Lavaux 2017). As BORG infers the initial conditions, it effectively also recovers the dynamic formation history of the large-scale structure as well as its evolved density and velocity fields.

The sampled initial conditions generated by BORG can be used for running posterior resimulations using more sophisticated structure formation models at high resolution (Leclercq et al. 2015; Nguyen et al. 2020; Desmond et al. 2022; Mcalpine et al. 2022). In this work, we aim to improve the fidelity of inferences through the incorporation of novel machine-learning emulator techniques into the forward model, as required for accurate recovery of massive structures (Nguyen et al. 2021; Stopyra et al. 2024).

2.2 Field-level emulators

In this work, we build upon the work of Jamieson et al. (2022b) by incorporating the convolutional neural network (CNN) emulator into the physical forward model of BORG. While an N-body simulation translates initial particle positions into their final positions, which represent the non-linear dark matter distribution, several approximate models strive for the same result at a lower computational cost. Specifically, our emulator takes first-order LPT displacements as input and corrects them such that its output aligns with the results of an actual N-body simulation.

Denoting |$\boldsymbol {q}$| as the initial positions on an equidistant Cartesian grid in Lagrangian space and |$\boldsymbol {p}$| as the final positions of simulated dark matter particles, we define the displacements as |$\boldsymbol{\Psi } = \boldsymbol {p}-\boldsymbol {q}$|⁠. The 3-dimensional displacement field |$\boldsymbol{\Psi }_{1\mathrm{LPT}}$| generated by 1LPT at redshift |$z=0$| is used as input. Let us denote the emulator mapping |$G_{\mathrm{EM}}$| as

|$\boldsymbol{\Psi }_{\mathrm{EM}} = G_{\mathrm{EM}}\left(\boldsymbol{\Psi }_{1\mathrm{LPT}}\right),$|(1)

where |$\boldsymbol{\Psi }_{\mathrm{EM}}$| is the output displacement field of the emulator. The final positions |$\boldsymbol {p}$| of the emulator are then given by

|$\boldsymbol {p} = \boldsymbol {q} + \boldsymbol{\Psi }_{\mathrm{EM}}.$|(2)

As we will see in Section 2.2.1, |$\Omega _\mathrm{m}$| is an additional input to the network, which helps encode the dependence of clustering across different scales and enables the emulator to accurately reproduce the final particle positions (at redshift |$z=0$|⁠) of N-body simulations for a wide range of cosmologies as shown in Jamieson et al. (2022b). For this work, we use a fixed set of cosmological parameters and we will therefore not explicitly state |$\Omega _\mathrm{m}$| as an input to the emulator.
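As a minimal sketch of equations (1) and (2), the following snippet applies a trained emulator (here a hypothetical PyTorch module emulator) to a 1LPT displacement field stored as a (3, N, N, N) tensor; since the cosmology is fixed in this work, |$\Omega _\mathrm{m}$| is not passed explicitly.

    # Sketch of equations (1)-(2): the emulator corrects 1LPT displacements.
    import torch

    def emulate_positions(q, psi_1lpt, emulator):
        """q: Lagrangian grid positions; psi_1lpt: 1LPT displacements at z=0."""
        psi_em = emulator(psi_1lpt.unsqueeze(0)).squeeze(0)   # eq. (1)
        return q + psi_em                                     # eq. (2)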

2.2.1 Model re-training

The model architecture of the emulator is identical to the one described in Jamieson et al. (2022b) and is based on the U-Net/V-Net architecture (Ronneberger, Fischer & Brox 2015; Milletari, Navab & Ahmadi 2016). These specialized convolutional neural network architectures were initially developed for biomedical image segmentation and three-dimensional volumetric medical imaging, respectively. By working on multiple levels of resolution connected in a U-shape as shown in Fig. 2, first by several downsampling layers and then by the same amount of upsampling layers, these architectures excel at capturing fine spatial details. The sensitivity to information at different scales of these architectures is a critical aspect, making them particularly suitable for accurately representing cosmological large-scale structures. They also maintain intricate spatial information through skip connections in their down- and upsampling paths.

Figure 2.

Overview of the field-level emulator integration within the BORG algorithm, which we call BORG-EM. The process begins with the evolution of the initial white-noise field using first-order Lagrangian Perturbation Theory (1LPT) up to redshift |$z=0$|⁠, yielding the displacement field |$\Psi _{1\mathrm{LPT}}$|⁠. After periodic padding, these displacements are passed through the V-NET emulator. The emulator’s output displacement field |$\Psi _{\mathrm{EM}}$| is subsequently transformed into an overdensity field |$\boldsymbol{\delta }$| using a cloud-in-cell mass assignment scheme. A likelihood evaluation with the data |$\boldsymbol {d}$| (here the output of an N-body simulation |$+$| Gaussian noise) ensues, followed by the back-propagation of the adjoint gradient for updating the initial conditions.

Our neural network (whose architecture is described in more detail in Appendix A1) is trained and tested using the framework map2map1 for field-to-field emulators, based on PyTorch (Paszke et al. 2019). Gradients of the loss function with respect to the model weights during training and of the data model with respect to the input during field-level inference are offered by the automatic differentiation engine autograd in PyTorch.

For detailed model consistency within the BORG framework, we retrain the model weights of the emulator using the BORG-1LPT predictions as input (for details see Appendix A2). To predict the cosmological power spectrum of initial conditions, we use the class transfer function (Blas, Lesgourgues & Tram 2011). As outputs during training, we use the Quijote Latin Hypercube (Quijote LH) N-body suite (Villaescusa-Navarro et al. 2020), wherein each simulation is characterized by unique values for the five cosmological parameters: |$\Omega _\mathrm{m},\Omega _\mathrm{b},\sigma _8,n_\mathrm{s},$| and h. As the emulator expects a |$\Lambda$|CDM cosmological background, it can navigate through this variety of cosmologies by explicitly using the |$\Omega _\mathrm{m}$| as input to each layer of the CNN. The other parameters only affect the initial conditions, not gravitational clustering, and need not be included as input. In total, 2000 input-output simulation pairs from BORG-1LPT and the Quijote latin-hypercube were generated, out of which 1757, 122, and 121 cosmologies were used in the training, validation, and test set, respectively.

As described in Jamieson et al. (2022b), even though the emulator was trained on |$512^3$| particles in a |$1\,h^{-1}$| Gpc box, the only requirement on the input is that the Lagrangian resolution is |$1.95\,h^{-1}$| Mpc. The emulator can thus handle larger volumes by dividing them into smaller sub-volumes to be processed independently, with subsequent tiling of the sub-volumes to construct the complete field. For smaller volumes that can be handled directly, periodic padding of 48 cells in each dimension is necessary before the network prediction.
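A short sketch of this periodic padding step, assuming the displacement field is stored as a (batch, 3, D, H, W) PyTorch tensor, is given below; the 'circular' padding mode wraps the field around the box, enforcing periodicity.

    # 48-cell periodic padding applied before the network prediction (sketch).
    import torch
    import torch.nn.functional as F

    def periodic_pad(psi, pad=48):
        return F.pad(psi, (pad,) * 6, mode='circular')

    psi = torch.randn(1, 3, 128, 128, 128)
    print(periodic_pad(psi).shape)   # torch.Size([1, 3, 224, 224, 224])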

2.2.2 Mini-emulator

In this work, we also train and deploy a compact variant of the field-level emulator which we call the mini-emulator, described in Appendix A3. The modified architecture of the emulator results in the number of model parameters being reduced by a factor of 4. In turn, this speeds up the forward and adjoint computations by a factor of 4 without significantly reducing the accuracy (see Appendix  C). This makes the mini-emulator especially appealing for application during the initial phase of the BORG inference. This also suggests that improved network architectures can still yield improved performance. A detailed investigation of model architectures is, however, beyond the scope of this paper and will be explored in future works.

2.2.3 Timing and accuracy of emulator

We compare the improvements in timing and accuracy offered by the emulator against forward models previously used in BORG, such as BORG-1LPT and COLA.2 In Table 1, we highlight the significant increase in computational efficiency achieved by our emulator BORG-EM and the mini-emulator.

Table 1.

To demonstrate the effectiveness of the emulator, we compare evaluation times (in wall-clock seconds) of the forward and adjoint parts of different approximate physics models available in BORG, benchmarked against a full N-body simulation. The indicative times are of relevance for the generation of BORG samples of initial conditions per time unit. A comprehensive timing comparison would require optimization for specific settings and hardware in each method, which is outside the scope of this work. The scenario involves |$128^3$| particles in a cubic volume of side length |$250\,h^{-1}$| Mpc. |$\star$|: Supermicro 4124GS-TNR node with NVIDIA A100 40GiB; |$\dagger$|: 128 cores on a Dell R6525 node with AMD EPYC Rome 7502, using the COLA settings required for sufficiently high accuracy as shown by Stopyra et al. (2024).

Structure formation model | |$t_{\mathrm{forward}}$| [s] | |$t_{\mathrm{adjoint}}$| [s]
1LPT | |$\lt 10^{-1}$| | |$\lt 10^{-1}$|
BORG-EM (1LPT |$+$| emulator)|$^\star$| | 1.6 | 2.6
1LPT |$+$| mini-emulator|$^\star$| | 0.4 | 0.6
BORG-PM (COLA, |$n_{\mathrm{steps}} = 20$|, forcesampling |$=4$|)|$^{\dagger}$| | |$\sim 8$| | |$\sim 8$|
N-body (P-Gadget-III, |$n_{\mathrm{steps}} = 1664$|)|$^{\dagger}$| | |$\sim 6 \times 10^2$| | –

In terms of accuracy, as shown in Jamieson et al. (2022b), the field-level emulator achieves per cent level accuracy for the density power spectrum, the Lagrangian displacement power spectrum, the momentum power spectrum, and density bispectra as compared to the Quijote suite of N-body simulations down to |$k\sim 1\,h$| Mpc|$^{-1}$|⁠. In Appendix  B, we define some of these summary statistics (also see Jamieson et al. 2022b). High accuracy is also achieved for the halo mass function, as well as for halo profiles and the matter density power spectrum in redshift space.

In Fig. 1, we present a graphical comparison between the results obtained from BORG-1LPT with and without the utilization of the re-trained emulator BORG-EM, in addition to the corresponding N-body simulation utilizing P-Gadget-III (see more in Section 4.1). The emulator effectively collapses the overdensities of the BORG-1LPT prediction into more pronounced haloes, filaments, and walls, forming the cosmic web, while expanding underdense regions into cosmic voids. In Appendix  C, we show the accuracies in terms of the cross-power spectrum, power spectrum, bispectrum, halo mass function, and stacked halo density profiles.

2.3 BORG  + Field-level emulator

In general, BORG obtains data-constrained realizations of a set of plausible three-dimensional initial conditions in the form of the white noise amplitudes |$\boldsymbol {x}$| given some data |$\boldsymbol {d}$|, such as a dark matter overdensity field or observed galaxy counts. Following Jasche & Lavaux (2019), one can show that the posterior distribution from Bayes' theorem reads

|$\pi \left(\boldsymbol {x}| \boldsymbol {d}\right) = \frac{\pi \left(\boldsymbol {x}\right)\, \pi \left(\boldsymbol {d}| G(\boldsymbol {x},\boldsymbol{\Omega })\right)}{\pi \left(\boldsymbol {d}\right)},$|(3)

where |$\pi \left(\boldsymbol {x}\right)$| is the prior distribution encompassing our a priori knowledge about the initial white-noise field, |$\pi \left(\boldsymbol {d}\right)$| is the evidence which normalizes the posterior distribution, and |$\pi \left(\boldsymbol {d}| G(\boldsymbol {x},\boldsymbol{\Omega })\right)$| is the likelihood that describes the statistical process of obtaining the data |$\boldsymbol {d}$| given the initial conditions |$\boldsymbol {x}$|⁠, cosmological parameters |$\boldsymbol{\Omega }$|⁠, and a structure formation model G. The final density field |$\boldsymbol{\delta }$| at redshift |$z=0$| is thus related to the initial white-noise field |$\boldsymbol {x}$| through

|$\boldsymbol{\delta } = G\left(\boldsymbol {x},\boldsymbol{\Omega }\right) = \left(G_{\mathrm{EM}} \circ G_{\mathrm{1LPT}}\right)\left(\boldsymbol {x},\boldsymbol{\Omega }\right),$|(4)

where we explicitly model the joint forward model G as the function composition of two parts, the first being the BORG-1LPT model, including the application of the transfer function and the primordial power spectrum, and the second being the field-level emulator. Note that the network weights and biases of the emulator are implicitly assumed.

A schematic of the incorporation of the field-level emulator into BORG, which we call BORG-EM, is shown in Fig. 2. The initial white-noise field |$\boldsymbol {x}$| is evolved using BORG-1LPT to redshift |$z=0$|⁠. We obtain the 3-dimensional displacements |$\boldsymbol{\Psi }_{1\mathrm{LPT}}$| using the 1LPT predicted particle positions |$\boldsymbol {p}_{1\mathrm{LPT}}$| and the initial grid positions |$\boldsymbol {q}$|⁠. The displacements are corrected through the use of the emulator, yielding the updated displacements |$\boldsymbol{\Psi }_{\mathrm{EM}}$| and, in turn, the particle positions through the use of equation (2). The Cloud-In-Cell (CIC) algorithm is applied as the particle mesh assignment scheme, which gives us the effective number of particles per voxel and, subsequently, the final overdensity field |$\boldsymbol{\delta }$| at |$z=0$|⁠. After a likelihood computation, the adjoint gradient is back-propagated through the combined structure formation model G to the initial white-noise field |$\boldsymbol {x}$|⁠.
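For concreteness, a minimal NumPy sketch of the CIC mass assignment step (not the BORG implementation) is given below; it scatters unit-mass particles onto an |$n^3$| grid with periodic boundaries and returns the overdensity field.

    # Minimal sketch of cloud-in-cell (CIC) mass assignment.
    import numpy as np

    def cic_overdensity(pos, box_size, n_grid):
        """pos: (N, 3) particle positions in [0, box_size)."""
        s = pos / (box_size / n_grid)               # positions in grid units
        i0 = np.floor(s - 0.5).astype(int)          # index of lower cell centre
        w1 = s - 0.5 - i0                           # weight towards the upper cell
        w0 = 1.0 - w1
        rho = np.zeros((n_grid,) * 3)
        for dx in (0, 1):
            for dy in (0, 1):
                for dz in (0, 1):
                    idx = (i0 + np.array([dx, dy, dz])) % n_grid
                    w = (w1[:, 0] if dx else w0[:, 0]) \
                        * (w1[:, 1] if dy else w0[:, 1]) \
                        * (w1[:, 2] if dz else w0[:, 2])
                    np.add.at(rho, (idx[:, 0], idx[:, 1], idx[:, 2]), w)
        return rho / rho.mean() - 1.0               # overdensity field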

As described in detail in Jasche & Wandelt (2013), new samples |$\boldsymbol {x}$| from the posterior distribution |$\pi \left(\boldsymbol {x}| \boldsymbol {d}\right)$| can be obtained by following the Hamiltonian dynamics in the high-dimensional space of initial conditions. This requires computing the Hamiltonian forces, which can be obtained by differentiating the Hamiltonian potential |$\mathcal {H}(\boldsymbol {x}) = \mathcal {H}_{\mathrm{prior}}(\boldsymbol {x}) + \mathcal {H}_{\mathrm{likelihood}}(\boldsymbol {x})$|⁠. The introduction of the emulator only affects the likelihood part of the Hamiltonian, so we only need to differentiate

|$\mathcal {H}_{\mathrm{likelihood}}(\boldsymbol {x}) = -\ln \pi \left(\boldsymbol {d}| G(\boldsymbol {x},\boldsymbol{\Omega })\right)$|(5)

with respect to the initial conditions |$\boldsymbol {x}$|⁠. The chain rule yields

|$\frac{\partial \mathcal {H}_{\mathrm{likelihood}}(\boldsymbol {x})}{\partial x_i} = \sum _{j,k} \frac{\partial \mathcal {H}_{\mathrm{likelihood}}}{\partial \left(\boldsymbol{\Psi }_{\mathrm{EM}}\right)_j} \frac{\partial \left(\boldsymbol{\Psi }_{\mathrm{EM}}\right)_j}{\partial \left(\boldsymbol{\Psi }_{1\mathrm{LPT}}\right)_k} \frac{\partial \left(\boldsymbol{\Psi }_{1\mathrm{LPT}}\right)_k}{\partial x_i},$|(6)

where the new component is the matrix containing the gradients of the emulator output with respect to its input

|$\frac{\partial \left(\boldsymbol{\Psi }_{\mathrm{EM}}\right)_j}{\partial \left(\boldsymbol{\Psi }_{1\mathrm{LPT}}\right)_k},$|(7)

which is accessible through auto-differentiation in PyTorch.
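In practice, the full Jacobian in equation (7) is never materialized; only its action on the likelihood gradient (a vector-Jacobian product) is needed. A hedged sketch using torch.autograd, with emulator again a hypothetical trained module, reads:

    # Adjoint of the emulator via a vector-Jacobian product (sketch).
    import torch

    def emulator_adjoint(psi_1lpt, grad_wrt_psi_em, emulator):
        """Propagate d(likelihood)/d(Psi_EM) back to d(likelihood)/d(Psi_1LPT)."""
        psi_in = psi_1lpt.detach().requires_grad_(True)
        psi_em = emulator(psi_in)
        (grad_wrt_psi_1lpt,) = torch.autograd.grad(
            outputs=psi_em, inputs=psi_in, grad_outputs=grad_wrt_psi_em)
        return grad_wrt_psi_1lpt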

3 MODELLING NON-LINEAR DARK MATTER FIELDS

Having the field-level emulator integrated into BORG, we now set up the data model necessary to infer the initial white-noise field from a non-linear dark matter distribution generated by an N-body simulation. To generate data with noise, we utilize a Gaussian data model such that Gaussian noise is added to the simulation output. We also introduce a novel multiscale likelihood to the BORG algorithm, which allows balancing the statistical information at small scales against our physical understanding at large scales, where the physics model performs best.

3.1 Gaussian data model

The final dark-matter particle positions from the |$z=0$| snapshot of an N-body simulation are passed through the CIC algorithm to obtain the dark matter over-density field |$\boldsymbol{\delta }^{\mathrm{sim}}$|⁠. The data |$\boldsymbol {d}$| is generated by

|$\boldsymbol {d} = \boldsymbol{\delta }^{\mathrm{sim}} + \mathbf {n},$|(8)

where |$\mathbf {n}$| is noise drawn from a zero mean Gaussian with diagonal covariance matrix |$\mathbf {C}=\mathbb{1}\sigma ^2$|⁠.

During inference, the predicted final density field |$\boldsymbol{\delta }$| from BORG and the data |$\boldsymbol {d}$| are compared through a likelihood function, from which the adjoint gradient backpropagates to the white noise field. To describe the data model in equation (8), we introduce the real-space voxel-based Gaussian likelihood

|$\pi \left(\boldsymbol {d}| \boldsymbol{\delta }\right) = \prod _i \frac{1}{\sqrt{2\pi \sigma ^2}} \exp \left(-\frac{\left(d_i - \delta _i\right)^2}{2\sigma ^2}\right),$|(9)

where i runs over all voxels in the fields.
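A compact sketch of the data model of equation (8) and the logarithm of the likelihood in equation (9), with the noise level |$\sigma = 3$| used later in Section 4, could read:

    # Mock-data generation (eq. 8) and Gaussian log-likelihood (eq. 9); sketch only.
    import numpy as np

    def make_mock_data(delta_sim, sigma=3.0, rng=np.random.default_rng(0)):
        return delta_sim + rng.normal(0.0, sigma, size=delta_sim.shape)

    def log_likelihood(delta_pred, d, sigma=3.0):
        chi2 = np.sum((d - delta_pred) ** 2) / sigma ** 2
        return -0.5 * chi2 - 0.5 * d.size * np.log(2.0 * np.pi * sigma ** 2)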

3.2 Multiscale likelihood model

The likelihood in equation (9) is voxel-wise and therefore places all the weight of the inference on the smallest scales, which is precisely where our physics simulator has the largest discrepancy. To inform the data model about where we expect the physics model to perform best, we introduce a multiscale likelihood. Instead of giving all weight to the small scales, we re-balance the weights in an information-theoretically consistent way, as shown below. Importantly, we only ever discard and never introduce information, resulting in a conservative inference.

To balance the statistical information at small scales with our physical understanding at large scales, we embrace a partitioned approach to the likelihood function by incorporating L factors, each dedicated to refining the prediction and data at specific scales before conducting a comparison. We start by decomposing the likelihood

|$\pi \left(\boldsymbol {d}| \boldsymbol{\delta }\right) = \prod _{l} \pi \left(\boldsymbol {d}| \boldsymbol{\delta }\right)^{w_l},$|(10)

which is a statistically valid approach as long as the weight factors |$w_l$| satisfy the condition:

|$\sum _{l} w_l = 1.$|(11)

We next introduce |$\mathbf {K}^l$| to denote the operation of averaging the field values over neighbourhood sub-volumes spanning |$k_l \equiv 2^{3l}$| voxels, where the subscript l denotes the level of granularity. We ensure that the averaging process occurs over different sub-volumes at each likelihood evaluation. For the Gaussian likelihood in equation (9), equation (10) becomes

|$\pi \left(\boldsymbol {d}| \boldsymbol{\delta }\right) = \prod _{l} \left[\prod _i \frac{1}{\sqrt{2\pi \sigma _l^2}} \exp \left(-\frac{\left(\left(\mathbf {K}^l\boldsymbol {d}\right)_i - \left(\mathbf {K}^l\boldsymbol{\delta }\right)_i\right)^2}{2\sigma _l^2}\right)\right]^{w_l},$|(12)

where |$\sigma _l = \sigma k_l^{-1/2}$| decreases with l to account for increasingly coarser resolutions (see Appendix D1) and i runs over all voxels in the fields at level l.
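A hedged sketch of equation (12), in log form, is given below, realizing |$\mathbf {K}^l$| as average pooling with kernel side |$2^l$| (so that the finest level is the voxel level and |$k_l$| voxels are averaged per coarse cell); the randomization of sub-volume offsets between likelihood evaluations is omitted for brevity.

    # Multiscale Gaussian log-likelihood (sketch of eq. 12).
    import math
    import torch
    import torch.nn.functional as F

    def multiscale_log_likelihood(delta_pred, d, weights, sigma=3.0):
        """delta_pred, d: (n, n, n) tensors; weights: w_l ordered from the
        voxel level to the coarsest level, summing to one."""
        log_like = 0.0
        for l, w_l in enumerate(weights):
            side = 2 ** l                           # pooling kernel per dimension
            sigma_l = sigma / math.sqrt(side ** 3)  # sigma_l = sigma * k_l^(-1/2)
            if side > 1:
                d_l = F.avg_pool3d(d[None, None], side)[0, 0]
                p_l = F.avg_pool3d(delta_pred[None, None], side)[0, 0]
            else:
                d_l, p_l = d, delta_pred
            resid = d_l - p_l
            log_like = log_like + w_l * (
                -0.5 * (resid ** 2).sum() / sigma_l ** 2
                - 0.5 * resid.numel() * math.log(2 * math.pi * sigma_l ** 2))
        return log_like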

With the use of the average pooling operators |$\lbrace \mathbf {K}^l\rbrace$|⁠, some information is lost due to the smoothing process. As shown in Appendix D2, the difference in information entropy H of the data vector |$\boldsymbol {d}$| between the voxel-based likelihood and the multiscale likelihood becomes

(13)

This means that the multiscale likelihood results in not using all available information in the data, which comes with the advantage of implicitly establishing a framework for conducting robust Bayesian inference (see more in e.g. Jasche & Lavaux 2019; Miller & Dunson 2019) in the presence of potential model mis-specification.

More specifically, this approach can operate iteratively, with weight factors |$w_l$| dynamically adjusted throughout the inference process. This strategy allows us to initially use most of the large-scale information in the data before progressively incorporating finer details to capture more complex non-linear scales (see Section 4.2 for further details). While this is reminiscent of annealing-based methods in e.g. Modi et al. (2018), Porqueres et al. (2020), and Bayer et al. (2023b), it is worth stressing that our multiscale likelihood approach is conceptually different through the information-theoretically rigorous decomposition in equation (10) and the introduction of the average pooling operators.

4 DEMONSTRATION OF INFERENCE WITH BORG-EM

The BORG algorithm uses a Markov chain Monte Carlo (MCMC) algorithm to explore the posterior distribution of white-noise fields. To evaluate the performance of the field-level emulator integration, we aim to infer the |$128^3\sim 10^6$| primordial white-noise amplitudes from a non-linear matter distribution as simulated by an N-body simulation. We use a cubic Cartesian box of side length |$250\,h^{-1}$| Mpc on |$128^3$| equidistant grid nodes, that is with a resolution of |$\sim$||$1.95\,h^{-1}$| Mpc. Notably, the emulator then operates on periodically padded displacement fields of size |$224^3$| as input. To assess the algorithm’s validity and efficiency in exploring the very high-dimensional parameter space, we employ warm-up testing of the Markov chain from a region far away from the target. Despite subjecting the neural network emulator to random inputs from such a Markov chain, which have not been included in the training data, it demonstrates a coherent warm-up toward the correct target region. The faster, but slightly less accurate mini-emulator is used during the initial phase of inference, after which we switch to the fully tested emulator.

4.1 Generation of ground truth data

Although the forward model used during inference is limited to BORG-EM, that is BORG-1LPT plus the emulator extension, we generate the ground truth dark-matter overdensity field using the N-body simulation code P-Gadget-III, a non-public extension of the Gadget-II code (Springel 2005). The initial conditions are generated at |$z=127$| using the IC-generator MUSIC (Hahn & Abel 2011) together with a transfer function from CAMB (Lewis, Challinor & Lasenby 2000). The cosmological model used corresponds to the fiducial Quijote suite: |$\Omega _\mathrm{m} = 0.3175$|, |$\Omega _\mathrm{b} = 0.049$|, |$h = 0.6711$|, |$n_\mathrm{s} = 0.9624$|, |$\sigma _8 = 0.834$|, and |$w=-1$| (Villaescusa-Navarro et al. 2020), consistent with the latest constraints from Planck (Aghanim et al. 2020). For consistency, we compile and run the TreePM code P-Gadget-III with the same configuration as used for Quijote LH. This means approximating gravity consistently in terms of softening length, parameters for the gravity PM grid (grid size, |$a_{\mathrm{smooth}}$|), and opening angle for the tree. For example, the Quijote LH used PMGRID |$=1024$|, i.e. with a grid cell size of |$\sim$||$0.98\,h^{-1}$| Mpc, which we match by using PMGRID |$=256$| in our smaller volume.

In our test scenario, we use a sufficiently high signal-to-noise ratio (|$\sigma = 3$| in equation 8), such that the algorithm would have to recover all intrinsic details of the non-linearities in the large-scale structure. In contrast, a lower signal-to-noise scenario would be an easier problem, as the predictions would be allowed to fluctuate more around the truth. Moreover, |$\sigma = 3$| corresponds to a signal-to-noise ratio above one for scales down to |$k \approx 1.5\,h\,\mathrm{Mpc}^{-1}$| (see Appendix  E), which is below the scales for which the emulator is expected to perform accurately.

4.2 Weight scheduling of multiscale likelihood

We set up the use of the multiscale likelihood at |$L=6$| different levels of granularity, effectively smoothing the fields at most to |$4^3$| voxels. The data model is then informed about the accuracy of the physics simulator at different scales. We empirically select the weights to follow a specific scheduling strategy and initialize with |$w = [0.0001,0.0002,0.001,0.01,0.1,0.8887]$|, i.e. with the largest weight on the largest scale. This effectively initiates the inference process at a coarser resolution to facilitate the enhancement of large-scale correlations. The weight |$w_5$| is then increased, at the expense of |$w_6$|, until it becomes the dominant factor. All weight transitions are shown in Appendix D3. This iterative process of increasing the weight on the next smaller scale continues until we converge, after roughly 5000 samples, to the empirically determined weights |$w = [0.957, 0.01, 0.01, 0.009, 0.008, 0.007]$|. While the majority of the weight is assigned to the smallest scale at the voxel level, non-zero weights are maintained for the larger scales. Only samples obtained with this final set of weights are recorded from the Markov chain; those generated earlier belong to the warm-up phase and are discarded. While we see no major change by switching to |$[1,0,0,0,0,0]$| after warm-up, further refinement of the weights is left for future research.

4.3 Initial warm-up phase of the Markov chain

To evaluate its initial behaviour, we initiate the Markov chain with an overdispersed Gaussian density field, set at three-tenths of the expected amplitude in a standard |$\Lambda$|CDM scenario. In the |$128^3 \sim 10^6$| high-dimensional parameter space of white noise field amplitudes, this means that the Markov chain will initially explore regions far away from the target distribution. If it successfully transitions to the typical set and performs accurate exploration, it signifies a successful demonstration of its efficacy in traversing high-dimensional parameter space. To prematurely place the Markov chain in the target region to hasten the warm-up phase would risk bypassing a true evaluation of the algorithm’s performance and exploration capabilities.

We monitor and illustrate this systematic drift through parameter space by following the power spectrum |$P(k)$| and two different configurations of the reduced bispectrum Q (defined in Appendix  B) as computed from subsequent predicted final overdensity fields, as shown in Fig. 3. In particular, the middle panel shows the bispectrum as a function of the angle |$\theta$| between two wave vectors, chosen to |$k_1=0.1\,h$| Mpc|$^{-1}$| and |$k_2=1.0\,h$| Mpc|$^{-1}$|⁠. In the right panel, the reduced bispectrum is displayed as a function of the magnitude of the three wave vectors for an equilateral configuration. We observe that approaching the fiducial statistics requires only a few thousand Markov chain transitions while fine-tuning to obtain the correct statistics requires additional tens of thousands of transitions during the final warm-up phase.
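For reference, a minimal NumPy sketch of the power-spectrum estimate used for this monitoring (normalization conventions vary between codes) could look as follows.

    # Spherically averaged power spectrum of an overdensity field (sketch).
    import numpy as np

    def power_spectrum(delta, box_size, n_bins=32):
        n = delta.shape[0]
        delta_k = np.fft.rfftn(delta)
        pk3d = np.abs(delta_k) ** 2 * box_size ** 3 / n ** 6   # P(k) = V |d_k|^2 / N^2
        kf = 2 * np.pi / box_size                              # fundamental frequency
        kx = np.fft.fftfreq(n, d=1.0 / n) * kf
        kz = np.fft.rfftfreq(n, d=1.0 / n) * kf
        kmag = np.sqrt(kx[:, None, None] ** 2 + kx[None, :, None] ** 2
                       + kz[None, None, :] ** 2)
        bins = np.linspace(kf, np.pi * n / box_size, n_bins + 1)  # up to Nyquist
        which = np.digitize(kmag.ravel(), bins)
        pk = np.array([pk3d.ravel()[which == i].mean() for i in range(1, n_bins + 1)])
        return 0.5 * (bins[1:] + bins[:-1]), pk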

Figure 3.

The initial phase of the inference is monitored by following the systematic drift of the power spectrum and two configurations of the reduced bispectrum towards the fiducial ground truths. After a few thousand samples, we approach the regime of per cent-level agreement with the power spectrum and bispectrum. More samples are needed to enter the typical set from which plausible Gaussian initial conditions can be sampled. Model mis-specification between the ground truth (N-body) and the BORG-EM predicted fields shows up as small discrepancies at the largest scales, as a consequence of putting most weight on fitting the smallest scales where most statistical power resides.

Apparent differences remain between the statistics of the forward-simulated samples |$\boldsymbol{\delta }$| of initial conditions and those of the ground truth. Although the small scales appear to agree with the ground truth simulation |$\boldsymbol{\delta }^{\mathrm{sim}}$|, there is an evident discrepancy at larger scales. The observed behaviour can be explained by the model mismatch between the field-level emulator BORG-EM and the N-body simulation. Because the multiscale likelihood still puts most weight on the small scales, it is expected that, in fitting the smallest scales, the data model pushes the discrepancy to the larger scales. It is worth noting that the discrepancy would be even larger for e.g. BORG-1LPT or COLA, where the model mismatch is more significant.

5 ACCURACY OF INFERRED INITIAL CONDITIONS

After the completion of the warm-up phase, the Markov chain generates samples |$\lbrace \boldsymbol {x}\rbrace$| from the posterior distribution |$\pi (\boldsymbol {x}|\boldsymbol {d})$|. We show that the inferred initial conditions display Gaussianity with the expected mean and variance, and that the transfer function with respect to the ground truth initial conditions deviates by less than 5 per cent over all scales. We also show that the cross-correlation between the final density fields and the data is as high as the correlation between the simulated N-body field and the data, indicating that we have extracted all cross-correlation information content from the data. Notably, information on linear scales in the initial conditions is mainly tied up in non-linear scales in the final conditions due to gravitational collapse. This highlights the importance of non-linear physics models, such as the field-level emulator, to constrain linear regimes in the initial conditions.

5.1 Statistical accuracy

To assess the quality of the drawn initial condition samples from the posterior distribution, we first verify their expected Gaussianity as compared to the ground truth in Fig. 4, including tests that the skewness and excess kurtosis are consistent with zero, as expected. Notably, in Fig. 3, we also see that the two- and three-point statistics (power spectra and bispectra) of the inferred initial conditions towards the end of warm-up are recovered to within a few per cent and 10 per cent, respectively.
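The per-sample Gaussianity checks can be summarized with a few lines of SciPy (a sketch, operating on a flattened white-noise sample x):

    # Moments of a white-noise sample: mean, variance, skewness, excess kurtosis.
    import numpy as np
    from scipy.stats import skew, kurtosis

    def gaussianity_summary(x):
        x = np.ravel(x)
        return {"mean": x.mean(), "var": x.var(),
                "skewness": skew(x), "excess_kurtosis": kurtosis(x)}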

Figure 4.

The distribution of individual samples of initial conditions |$\lbrace \boldsymbol {x}_i\rbrace$|⁠, here displayed next to the true initial conditions |$\boldsymbol {x}_{\mathrm{true}}$| for data generation, exhibit desired Gaussian properties: zero mean, unit variance, and minimal skewness |$\gamma _1$| and excess kurtosis |$\gamma _2$| (with standard deviation errors computed over the samples).

We also evaluate the transfer function, defined as the square root of the ratio between the power spectra of the inferred fields and the ground truth. As illustrated in Fig. 5, we achieve high accuracy (⁠|$\lt 5~{{\ \rm per\ cent}}$|⁠) for both the initial and final conditions across all modes. We observe that the final conditions align more closely with the ground truth. This is the result of accurately explaining the N-body generated data with the emulator during inference despite the residual model mis-specification, particularly small-scale inaccuracies (see e.g. Fig. C2). The per cent-level deviations in the transfer functions at intermediate and large scales can also be attributed to model mis-specifications, which propagate through mode coupling (as discussed along Fig. 6 in Section 5.2) and power conservation in cosmological fields to affect larger scales in the inferred initial conditions. Furthermore, as most statistical information resides at smaller scales, the emulator model prioritizes fitting these scales, further explaining the minor larger-scale discrepancies. The quality of the initial conditions remains high, as shown in Section 6.
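Given a power-spectrum estimator such as the hypothetical power_spectrum helper sketched in Section 4.3, the transfer function follows directly:

    # Transfer function T(k) = sqrt(P_inferred / P_truth); sketch only.
    import numpy as np

    def transfer_function(delta_inferred, delta_truth, box_size):
        k, p_inf = power_spectrum(delta_inferred, box_size)
        _, p_tru = power_spectrum(delta_truth, box_size)
        return k, np.sqrt(p_inf / p_tru)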

Figure 5.

Transfer functions for the inferred initial and final density fields with the respective ground truth show agreements within 5 per cent across the entire Fourier range. The noise-dominated region of the data is shown in grey. Notably, the final conditions align more closely with the ground truth than the initial conditions. We expect this and other per cent-level deviations to result from small-scale model mis-specifications, which propagate through mode coupling and power conservation to larger scales in the initial conditions.

Figure 6.

Information recovery of inferred initial conditions. Left: the size of proto-haloes (as defined by the diameter |$D_{\mathrm{proto-halo}}$|⁠) in the initial conditions collapse into smaller haloes (of size |$D_{\mathrm{halo}}$|⁠) in the final field. In the spherical collapse approximation to halo formation, all of our proto-haloes collapse into the noise-dominated region (S/N |$\lt 1$|⁠) of our data, and most of them even below our data grid resolution as given by the Nyquist frequency |$k_\mathrm{N}$|⁠. Right: Cross-correlation analysis reveals that the inferred initial fields |$\boldsymbol {x}$| exhibit notable correlation with the truth |$\boldsymbol {x}^{\mathrm{truth}}$| up to |$k \sim 0.35\,h$| Mpc|$^{-1}$|⁠. Between the corresponding final conditions |$\boldsymbol{\delta }$| and the data |$\boldsymbol {d}$|⁠, this correlation extends to much smaller scales of |$k \sim 2h$| Mpc|$^{-1}$|⁠. Note that, this closely mirrors the correlation between the true final density field |$\boldsymbol{\delta }^{\mathrm{sim}}$| and the data |$\boldsymbol {d}$|⁠, which demonstrates that we have fully extracted the cross-correlation information, as evident in the lower panel. Importantly, this is obtained despite the lower correlation in the initial field. With our noise level and data grid resolution, and because larger regions in the initial conditions collapse into smaller regions in the final conditions, there is no more information in the initial conditions to gain.

5.2 Information recovery

From the collapsed objects in the final conditions, we are trying to recover the information on the initial conditions. In this pursuit, we have to acknowledge the effects of gravitational collapse, which cause initially extended regions in Lagrangian space to collapse to smaller scales in the final conditions. This means that gravitational structure formation introduces a temporal mode coupling between the present-day small scales and the earlier large scales, as illustrated in Fig. 6.

5.2.1 Proto-haloes versus haloes

We distinguish between the Lagrangian scales in the initial conditions and the Eulerian scales in the final conditions due to mass transport. In particular, overdense regions in the initial conditions collapse into smaller regions in the final conditions, while underdense regions experience growth (Gunn & Gott 1972). We can understand this mode coupling in more detail through the spherical collapse approximation to halo formation. We start by considering a halo of mass M and follow the trajectories of all of its constituent particles back to the initial conditions. Since the early Universe is arbitrarily close to uniform with a mean density |$\rho _\mathrm{m} = \rho _\mathrm{c} \Omega _\mathrm{m}$|, we can imagine the collapse of a uniform spherical volume – the proto-halo volume – from which the present-day halo formed. We can define the proto-halo through the Lagrangian radius |$R_\mathrm{L}$| within which all particles from a halo with mass M must have initially resided,

|$R_\mathrm{L} = \left(\frac{3M}{4\pi \rho _\mathrm{m}}\right)^{1/3}.$|(14)

We thus expect the linear regime in the initial conditions to be pushed into the non-linear regime of today.

Using the |$M_{200\mathrm{c}}$| mass definition (see more in Section 6.1) we find haloes in the ground truth simulation with accurate mass estimates in the range |$[0.853,16] \times 10^{14}\, {\rm M}_{\odot }$|⁠, corresponding to Lagrangian radii between |$6.14h^{-1} \,\, \mathrm{Mpc}$| and |$16.3h^{-1} \,\,\mathrm{Mpc}$| for the least and most massive haloes, respectively. The diameter |$D=2R_\mathrm{ L}$| can be converted to the corresponding scale of these regions |$k \in [0.193, 0.512]h \,\, \mathrm{Mpc}^{-1}$|⁠. Note, however, that, in Fig. 6, we show all haloes found, even the smallest ones. The virial radii for these haloes lie in the range |$[0.34, 1.85]h^{-1} \,\, \mathrm{Mpc}$|⁠, i.e. with diameter scales of |$k \in [1.70, 9.24]h \,\, \mathrm{Mpc}^{-1}$|⁠. In the left panel of Fig. 6, we highlight this size difference of haloes and the corresponding proto-haloes in units of the corresponding scale in |$h \,\, \mathrm{Mpc}^{-1}$|⁠. As our Cartesian grid of the final density field has a resolution of |$1.95\,h^{-1}$| Mpc with a Nyquist frequency |$k_{\mathrm{N}} = 2.79\,h \, \mathrm{Mpc}^{-1}$|⁠, this means that the majority of proto-haloes will collapse to haloes below the resolution of our data constraints. While one might anticipate an impact on the reconstructed halo centres, our observations suggest the effect is minor, as presented in Section 6.2. The precise extent to which additional information can be extracted at higher resolution remains uncertain and requires further investigation.
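As a worked example of equation (14), the quoted Lagrangian radii follow from the fiducial |$\Omega _\mathrm{m} = 0.3175$| and the critical density |$\rho _\mathrm{c} \approx 2.775 \times 10^{11}\,h^2\,{\rm M}_{\odot }\,\mathrm{Mpc}^{-3}$|:

    # Lagrangian radius of a proto-halo (eq. 14) for the fiducial cosmology.
    import numpy as np

    RHO_C = 2.775e11                    # critical density [h^2 Msun / Mpc^3]
    rho_m = 0.3175 * RHO_C              # mean matter density

    def lagrangian_radius(m_halo):
        """m_halo in Msun/h; returns R_L in Mpc/h."""
        return (3.0 * m_halo / (4.0 * np.pi * rho_m)) ** (1.0 / 3.0)

    for m in (0.853e14, 16e14):
        r_l = lagrangian_radius(m)
        print(f"M = {m:.3g}: R_L = {r_l:.2f} Mpc/h, k = pi/R_L = {np.pi / r_l:.3f} h/Mpc")
    # -> R_L ~ 6.14 and 16.3 Mpc/h, i.e. k ~ 0.512 and 0.193 h/Mpc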

5.2.2 Limitations set by Gaussian noise and data grid resolution

We also note that because of non-linear structure formation, different cosmic environments in the data will be differently informative at different scales (see e.g. Bonnaire et al. 2022). This is particularly evident in our case of using a Gaussian data model since Gaussian noise, in contrast to e.g. Poisson noise, is insensitive to the environment of the matter distribution. With our fixed noise threshold, the signal-to-noise ratio becomes significantly higher in overdense regions as compared to underdense regions. It is therefore expected that the constraining power in our inference will come almost exclusively from overdense regions.

In Fig. 6, we show where the transition from signal domination to noise domination occurs for our use of |$\sigma =3$| in equation (8) (also see Appendix  E). This introduces a soft barrier below which recovery of information is limited. We also display a hard boundary of information recovery given by the data grid resolution, corresponding to the three-dimensional Nyquist frequency |$k_{\mathrm{N}} \approx 2.79\,h \, \mathrm{Mpc}^{-1}$|⁠, below which no information is accessible.

5.2.3 Cross-correlation in inferred fields

We examine the information recovered from our inference by computing the cross-correlation between the true white noise field |$\boldsymbol {x}^{\mathrm{truth}}$| and the set of sampled fields |$\lbrace \boldsymbol {x}\rbrace$|, as depicted in the right panel of Fig. 6. We see a strong correlation (|$\gt $|80 per cent) with the truth up to |$k \sim 0.35\,h$| Mpc|$^{-1}$|, after which the correlation drops. We also compare the correlation between the final density fields |$\boldsymbol{\delta }$| and the data |$\boldsymbol {d}$|. Furthermore, we benchmark this against the correlation between the true final density field |$\boldsymbol{\delta }^{\mathrm{sim}}$| and the data |$\boldsymbol {d}$|. As depicted in the lower panel, we achieve a correlation for the inferred fields that is nearly as high, differing by only a sub-per cent margin up to |$k \sim 2\,h$| Mpc|$^{-1}$|. This demonstrates that the cross-correlation information content between the final fields |$\boldsymbol{\delta }$| and the data |$\boldsymbol {d}$| has been fully extracted well within the noise-dominated regime, up to the grid resolution of the data.
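
For reference, the normalized cross-correlation used here can be estimated with a few lines of numpy; the snippet below is a minimal sketch with a hypothetical helper, assuming simple spherical binning of Fourier modes on a periodic cubic box, and is not the BORG implementation.

```python
import numpy as np

def cross_correlation(field1, field2, box_size, n_bins=32):
    """Normalized cross-correlation r(k) of two cubic fields on a periodic box."""
    n = field1.shape[0]
    f1, f2 = np.fft.rfftn(field1), np.fft.rfftn(field2)
    kf = 2.0 * np.pi / box_size                      # fundamental mode [h/Mpc]
    kx = np.fft.fftfreq(n, d=1.0 / n) * kf
    kz = np.fft.rfftfreq(n, d=1.0 / n) * kf
    kmag = np.sqrt(kx[:, None, None]**2 + kx[None, :, None]**2 + kz[None, None, :]**2)
    bins = np.linspace(kf, (n // 2) * kf, n_bins + 1)  # bin up to the Nyquist frequency
    idx = np.digitize(kmag.ravel(), bins)

    def radial_average(power):
        return np.array([power.ravel()[idx == i].mean() for i in range(1, n_bins + 1)])

    p12 = radial_average((f1 * np.conj(f2)).real)
    p11 = radial_average(np.abs(f1) ** 2)
    p22 = radial_average(np.abs(f2) ** 2)
    k_centres = 0.5 * (bins[1:] + bins[:-1])
    return k_centres, p12 / np.sqrt(p11 * p22)       # r(k) in [-1, 1]
```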

Despite achieving maximal information extraction from the final conditions, we are limited in the reconstruction of the initial conditions by the noise level and the data grid resolution. This is promising since it shows that more information can be obtained. For instance, an inference with a lower signal-to-noise ratio exhibits diminished correlation, as shown in Appendix  F, indicating that higher signal-to-noise ratios increase information extraction. Because most collapsed structures have moved into these regimes (high noise or even sub-grid), enhancing the resolution of the data space could potentially also boost recovery and increase the correlation of the initial conditions down to smaller scales. We leave this for future investigation, but it is clear that non-linear models are essential to constrain even the linear regime of the initial conditions.

6 POSTERIOR RESIMULATIONS

As highlighted by Stopyra et al. (2024), in the context of model mis-specification, a sufficiently accurate physics model is needed during inference to recover the initial conditions accurately. Otherwise, when using the inferred initial conditions to run posterior resimulations in high-fidelity simulations, one may risk obtaining an overabundance of massive haloes and voids, as observed by Desmond et al. (2022), Hutt et al. (2022), and Mcalpine et al. (2022). To validate the incorporation of the emulator in field-level inference, we evolve 80 independent posterior samples (see details in Appendix  G) of inferred initial conditions with the N-body simulator P-Gadget-III, mirroring the data generation set-up (see Section 4.1). The posterior resimulations align with the ground truth N-body simulation in the formation of massive large-scale structures, in terms of the halo mass function as well as density profiles, which demonstrates the robustness of the emulator model in inference. The mean and standard deviation of the posterior resimulations as well as of the inferred initial conditions are shown in Fig. H1.

6.1 Halo mass function

To identify haloes in our N-body simulations, we employ the spherical-overdensity-based halo finder AHF (Knollmann & Knebe 2009) and the Friend-of-Friends (FoF) algorithm in nbodykit (Hand et al. 2018). We adopt the |$M_{200\mathrm{c}}$| definition (for AHF) for the halo mass, representing the mass enclosed within a spherical halo whose average density is equal to 200 times the critical density. Recovering the |$M_{200\mathrm{c}}$| mass of haloes is generally a more stringent test than the FoF technique because it requires resolving the density profile accurately down to smaller scales. FoF does not depend on a fixed density threshold to define halo boundaries. Instead, it determines boundaries based on particle distribution and connectivity using a linking length (we use |$l = 0.2\,h^{-1}$| Mpc). Halo sizes and shapes are therefore defined more forgivingly, without requiring precise density profiles over larger regions. Consequently, FoF haloes are more robust against density variations and are adept at accommodating complex substructures.
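
As an illustration of the FoF step, the sketch below shows how a particle catalogue could be passed to nbodykit's FOF algorithm; the positions are placeholders and the call names (ArrayCatalog, FOF, find_features) follow our reading of the nbodykit documentation, so they should be checked against the installed version.

```python
import numpy as np
from nbodykit.lab import ArrayCatalog, FOF

# Placeholder particle positions in Mpc/h; in practice these come from the snapshot.
pos = np.random.uniform(0.0, 250.0, size=(10000, 3))

cat = ArrayCatalog({'Position': pos})
cat.attrs['BoxSize'] = [250.0, 250.0, 250.0]

# linking_length is in units of the mean interparticle separation unless absolute=True;
# nmin discards groups with fewer than 30 particles (the FoF resolution limit used here).
fof = FOF(cat, linking_length=0.2, nmin=30)
halos = fof.find_features()            # one row per FoF group
print(halos['Length'].compute()[:5])   # particle counts of the first few groups
```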

In Fig. 7, we show the halo mass function as found by AHF and FoF, clearly demonstrating the high accuracy of the posterior resimulations. The number density of haloes from the posterior resimulations shows high consistency both with the FoF and |$M_{200\mathrm{c}}$| haloes from the ground truth simulation. We also validate the |$M_{200\mathrm{c}}$| halo mass function against the theoretically expected Tinker prediction for this cosmology (Tinker et al. 2008) (represented by the grey shading). Similarly, the FoF halo mass function is validated against the theory prediction of Watson et al. (2013). We conclude that the resimulations, plotted with the scatter across the resimulation samples, align with the predicted halo mass function, i.e. fall within the Poisson interval of the expected mean. While it would also be possible to fold the Poisson error of the resimulations into a single error bar, the Poisson error would dominate and obscure an important point: the resimulation scatter is minimal, allowing for precise determination of the count in each bin. This is a result of the high signal-to-noise ratio employed in this study, along with the fact that all voxels are constrained by data. The residual panel further reveals small discrepancies, with some bins containing more haloes and others fewer in the posterior resimulations compared to the ground truth. These discrepancies can be attributed to model mis-specification between the emulator and the N-body ground truth during inference.

Figure 7.

Halo mass function for the entire cubic volume of side length |$250\,h^{-1}$| Mpc obtained using posterior resimulations of the BORG-EM field-level inference. For comparison with the ground truth simulation, we adopt two different halo mass definitions, |$M_{200\mathrm{c}}$| (through AHF) (left) and FoF (right). Note that, |$M_{200\mathrm{c}}$|⁠, being a spherical overdensity definition, puts stricter requirements on the density profile. The posterior resimulation bins are plotted along with uncertainties only originating from the scatter in the resimulations to emphasize its minimal size, demonstrating our ability to accurately determine the count in each bin. To show that our resimulations are consistent with the halo mass function (i.e. within the Poisson interval of the expected mean in that bin), we plot the theoretical halo mass function with Poisson scatter. The grey shaded regions show the 99 per cent confidence interval from Poisson fluctuations around the Tinker and FoF predictions, respectively. The mass function is not accurately reproduced below the mass resolution limit, illustrated by the red shaded region and defined as haloes consisting of less than 130 and 30 simulated particles for |$M_{200\mathrm{c}}$| and FoF, respectively.

A recent investigation on the number of particles required to accurately describe the density profile and mass of |$M_{200\mathrm{c}}$| dark matter haloes has shown that at least |$N=130$| particles are required (Mansfield & Avestruz 2021). This results in the Poisson contribution |$\sqrt{N}/N$| to the halo mass being below 10 per cent, ensuring greater accuracy. With our simulation set-up, this corresponds to |$8.53\times 10^{13} \,{\rm M}_{\odot }$|⁠, below which the mass is in general not accurately recovered. The (red) shaded region of Fig. 7 represents the mass resolution limit. In our case, masses down to |$5\times 10^{13}\, {\rm M}_{\odot }$| show a significant correlation with the theoretical expectation. Below the mass resolution limit, we also match the ground truth mass function down to |$2\times 10^{13}\, {\rm M}_{\odot }$|⁠. The FoF mass resolution limit is set to 30 particles (in our case corresponding to |$2\times 10^{13} {\rm M}_{\odot }$|⁠) which is sufficient to accurately recover the mass function (Reed et al. 2003; Trenti et al. 2010), as shown here next to the theoretical prediction from Watson et al. (2013).
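
The mass limits quoted above follow directly from the simulation particle mass; the short check below reproduces them, assuming the Quijote fiducial |$\Omega _\mathrm{m} = 0.3175$| and |$\rho _\mathrm{c} = 2.775\times 10^{11}\,(\mathrm{M}_{\odot }/h)\,(h^{-1}\,\mathrm{Mpc})^{-3}$|.

```python
# Worked check of the quoted resolution limits (Omega_m value is an assumption).
rho_m = 0.3175 * 2.775e11            # mean matter density [Msun/h / (Mpc/h)^3]
box, n_part = 250.0, 128**3          # box side [Mpc/h] and particle number
m_particle = rho_m * box**3 / n_part # ~6.6e11 Msun/h per particle

print(f"130 particles -> {130 * m_particle:.2e} Msun/h")  # ~8.5e13 (M200c limit)
print(f" 30 particles -> {30 * m_particle:.2e} Msun/h")   # ~2.0e13 (FoF limit)
```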

6.2 Stacked halo density profiles

In addition to the halo mass functions in Fig. 7, which shows high consistency with |$\Lambda$|CDM expectations and the ground truth, we investigate how well the density profiles of haloes in different mass bins are reconstructed. The centre of mass of each halo will, however, vary from one posterior sample to the next. The mean displacement of halo centres over all posterior resimulations is found to be |$0.13^{+0.03}_{-0.02}\,h^{-1}$| Mpc. We therefore use the ground truth N-body simulation prediction as an initial guess for the locations of each halo in each posterior resimulation and re-centre these with an iterative scheme to identify the correct centre of mass for each. This prescription allows us to identify the haloes in posterior resimulations corresponding to each of the ground truth haloes, which in turn enables us to compare how well we constrain the halo density profiles. Notably, this results in only using the haloes present in the ground truth simulation.
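
The iterative re-centring is not spelled out in detail here; the snippet below is one hedged way to implement it as a shrinking-sphere centre-of-mass iteration, with the shrink factor and stopping radius chosen purely for illustration rather than taken from this work.

```python
import numpy as np

# Hedged sketch of an iterative (shrinking-sphere) re-centring: starting from the
# ground-truth halo position, repeatedly replace the centre by the centre of mass
# of particles within a sphere whose radius shrinks each iteration.
def recentre(pos, centre, r_start, box=250.0, shrink=0.9, r_min=0.05, n_min=10):
    centre = np.asarray(centre, dtype=float)
    r = r_start
    while r > r_min:
        d = (pos - centre + 0.5 * box) % box - 0.5 * box   # periodic displacements
        sel = np.einsum('ij,ij->i', d, d) < r * r
        if sel.sum() < n_min:                              # too few particles left
            break
        centre = (centre + d[sel].mean(axis=0)) % box
        r *= shrink
    return centre
```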

We use the cumulative density profile

|$\rho (<r) = \frac{3\, M(<r)}{4 \pi r^{3}}$| (15)

and group haloes into three different mass bins. We compute the mean density profile for each mass bin in each posterior resimulation, and subsequently average the means over the simulations.

For the lower mass bin edge, we pick the |$8.53\times 10^{13}\, {\rm M}_{\odot }\,h^{-1}$| threshold as described in the previous section, and for the high-mass end, we select |$1.6\times 10^{15}\, {\rm M}_{\odot }\,h^{-1}$| as it includes all haloes in the simulation. In units of |$10^{14}\,{\rm M}_{\odot }\,h^{-1}$|, the bins are [0.853,2], [2,5], and [5,16].

We display the average over the stacked density profiles as well as the standard error in Fig. 8. For all mass bins, as compared to the ground truth, we see at most a 10 per cent discrepancy down to |$\frac{1}{2}r_{200\mathrm{c}}$|⁠, highlighting the high quality of the sampled initial conditions by BORG-EM.

Figure 8.

Stacked halo density profiles obtained from the posterior resimulations and the ground truth in three distinct mass bins. We used the ground truth N-body simulation prediction as an initial guess for the locations of each halo in each posterior resimulation and re-centred these with an iterative scheme to identify the correct centre of mass for each halo. The average density profile within each mass bin is then computed for each resimulation, after which we average the density profiles over all resimulations. In all bins, we demonstrate discrepancies below 10 per cent for |$r\gt \frac{1}{2}r_{200\mathrm{c}}$|. Note that the virial radii |$r_{200\mathrm{c}}$| of the ground truth haloes lie in the range |$[0.34,1.85]h^{-1}$| Mpc, i.e. below the data resolution.

7 DISCUSSION & CONCLUSIONS

The maximization of scientific yield from upcoming next-generation surveys of billions of galaxies heavily relies on the synergy between high-precision data, high-fidelity physics models, and advanced technologies for extracting cosmological information. In this work, we built on recent success in analysis at the field level, which naturally incorporates all non-linear and higher-order statistics of the cosmic matter distribution within the physics model. More specifically, we incorporated a field-level emulator into the Bayesian hierarchical inference algorithm BORG to infer the initial white-noise field from a dark matter-only density field produced by a high-fidelity N-body simulation. While reducing computational cost compared to full N-body simulations, the emulator significantly increases the accuracy as compared to first-order Lagrangian perturbation theory and COLA. As outlined by Stopyra et al. (2024), a sufficiently accurate physics model is required to accurately reconstruct massive objects, a standard that our emulator surpasses.

In this work, we introduced a novel multiscale likelihood approach that balances our comprehension of large-scale physics with the statistical power on smaller scales. This approach effectively informs the data model about the physics simulator’s accuracy across various scales during inference. The average operator inevitably erases some of the data information, as quantified by the information entropy, but in return provides a method for conducting robust Bayesian inference, which proves advantageous in instances of model mis-specification.

In our demonstration, we utilized the dark-matter-only output of the N-body simulation code P-Gadget-III with added Gaussian noise as the data. Within the Hamiltonian Monte Carlo based BORG algorithm, we sampled the initial conditions in the form of |$128^3 \sim 10^6$| primordial white-noise amplitudes in a cubic volume with side length |$250\,h^{-1}$| Mpc and with a grid resolution of |$1.95\,h^{-1}$| Mpc.

A worry when putting a trained machine-learning model into an MCMC algorithm is that the sampler might probe regions, particularly during the initial phase, that have not been part of the training domain of the network. The MCMC framework could therefore be trapped in an error regime of the machine-learning model. Our observations suggest that this never occurs: both the emulator and the mini-emulator have become sufficiently generalized during training to allow for exploration within an MCMC framework. This could partly be explained by 1LPT playing a pivotal role in fixing the global configuration, while the emulator is a fairly local and sparse operator that only updates the smaller scales.

Having reached the target region of the |$\sim$||$10^6$|-dimensional space of initial conditions, we note that the sampled initial conditions display the expected Gaussianity. The per cent-level agreement of the power spectrum and bispectrum with the ground truth reveals our ability to accurately recover scales down to approximately |$k\sim 2\,h$| Mpc|$^{-1}$|.

Importantly, upon examining information recovery, we note that the inference has fully extracted all cross-correlation information that can be gained from the noisy data up to the data grid resolution. This is evident from the sub-per cent level agreement with the correlation between the ground truth simulation and the data. The initial conditions are constrained up to |$k \sim 0.35\,h$| Mpc|$^{-1}$| at the 80 per cent level, demonstrating that non-linear data models are essential to constrain even the linear regime of the initial conditions. Spherical collapse theory, in which massive objects originate from larger scales in the initial conditions, offers one explanation: the Eulerian virial radii of the haloes lie below our data resolution. This is promising as it indicates that increasing the resolution of the data could potentially increase the recovered small-scale information in the initial conditions. We leave the investigation of increasing the data resolution for future work.

We further demonstrate the robustness of incorporating the field-level emulator in inference by running posterior resimulations, using the inferred initial conditions. We note consistency for the stringent |$M_{200\mathrm{c}}$| definition of halo mass as well as for the Friend-of-friends halo mass function down to |$0.853\times 10^{14}\,{\rm M}_{\odot }$|⁠. We further show less than 10 per cent discrepancies for stacked density profiles in all mass bins.

The current emulator network boasts improved accuracy and speed, yet there is a chance the architecture is not fully optimized. Particularly, our mini-emulator in this study reveals a promising trend: despite substantially reduced neural network model weights, leading to a considerable cut in evaluation time, the accuracy remains notably high. This finding merits further investigation. The potential of field-level emulators shown in this work further suggests that an extension to more complicated scenarios such as using hydrodynamical simulations might be possible. Complex physics can be directly integrated into the training data using the same pipeline showcased in this work. Of interest to this work would also be extending the emulator to multiple redshifts. Such a redshift-dependent emulator has already been developed by Jamieson et al. (2024). Redshift is added as a second style parameter, and the emulator is retrained on a suite of simulations including snapshots that span a range of redshifts from |$z=3$| to |$z=0$|⁠. Another redshift-dependent emulator has also been developed by Bartlett et al. (2024) using the scale factor as a style parameter.

Given that the emulator operates as an extension to the structure formation model module and is trained on multiple cosmologies, there is also interest in integrating this technique with the exploration of cosmological parameters. Notably, joint sampling of the initial density field and cosmological parameters within the BORG framework has been investigated in prior research (Ramanah et al. 2019; Andrews et al. 2023; Porqueres et al. 2023).

In conclusion, we here demonstrated that Bayesian inference within a Hamiltonian Monte Carlo machinery with a high-fidelity simulation is numerically feasible. Machine learning in the form of field-level emulators for cosmological simulation makes it possible to extract more of the non-linear information in the data.

ACKNOWLEDGEMENTS

We thank Adam Andrews, Metin Ata, Nai Boonkongkird, Simon Ding, Mikica Kocic, Stuart McAlpine, Hiranya Peiris, Andrew Pontzen, Natalia Porqueres, Fabian Schmidt, and Eleni Tsaprazi for useful discussions related to this work. We also thank Adam Andrews and Nhat-Minh Nguyen for their feedback on the manuscript. This research utilized the Sunrise HPC facility supported by the Technical Division at the Department of Physics, Stockholm University. This work was carried out within the Aquila Consortium.3 This work was supported by the Simons Collaboration on ‘Learning the Universe’. This work has been enabled by support from the research project grant ‘Understanding the Dynamic Universe’ funded by the Knut and Alice Wallenberg Foundation under Dnr KAW 2018.0067. This work has received funding from the Centre National d’Etudes Spatiales. LD acknowledges travel funding supplied by Elisabeth och Herman Rhodins minne. SS is supported by the Göran Gustafsson Foundation for Research in Natural Sciences and Medicine. JJ acknowledges support by the Swedish Research Council (VR) under the project 2020–05143 – ‘Deciphering the Dynamics of Cosmic Structure’. JJ acknowledges the hospitality of the Aspen Center for Physics, which is supported by National Science Foundation grant PHY-1607611. The participation of JJ at the Aspen Center for Physics was supported by the Simons Foundation.

DATA AVAILABILITY

The data underlying this article will be shared on reasonable request to the corresponding author.

Footnotes

2. Further comparisons of interest with other fast GPU-based forward models (e.g. Li et al. 2022) would require a thorough evaluation of their timing and accuracy, as well as their trade-off, which is beyond the scope of this study.

REFERENCES

Abazajian K. N. et al., 2009, ApJS, 182, 543
Aghanim N. et al., 2020, A&A, 641, 67
Alves de Oliveira R., Li Y., Villaescusa-Navarro F., Ho S., Spergel D. N., 2020, preprint
Amendola L. et al., 2018, Living Rev. Relativ., 21, 1
Andrews A., Jasche J., Lavaux G., Schmidt F., 2023, MNRAS, 520, 5746
Ata M., Kitaura F. S., Müller V., 2014, MNRAS, 446, 4250
Ata M., Kitaura F. S., Lee K. G., Lemaux B. C., Kashino D., Cucciati O., Hernandez-Sanchez M., Le Fevre O., 2021, MNRAS, 500, 3194
Bartlett D. J., Chiarenza M., Doeser L., Leclercq F., 2024, preprint
Bayer A. E., Seljak U., Modi C., 2023a, preprint
Bayer A. E., Modi C., Ferraro S., 2023b, J. Cosmol. Astropart. Phys., 2023, 46
Bernardini M., Mayer L., Reed D., Feldmann R., 2020, MNRAS, 496, 5116
Blas D., Lesgourgues J., Tram T., 2011, J. Cosmol. Astropart. Phys., 2011, 026
Bonnaire T., Aghanim N., Kuruvilla J., Decelle A., 2022, A&A, 661, A146
Boruah S. S., Rozo E., 2024, MNRAS, 527, L162
Bos E. G. P., Kitaura F.-S., van de Weygaert R., 2019, MNRAS, 488, 2573
Crocce M., Scoccimarro R., 2006, Phys. Rev. D, 73, 063519
Dawson K. S. et al., 2013, ApJ, 145, 41
DESI Collaboration, 2016, preprint
Desmond H., Hutt M. L., Devriendt J., Slyz A., 2022, MNRAS, 511, L45
Doré O. et al., 2014, preprint
Duane S., Kennedy A. D., Pendleton B. J., Roweth D., 1987, Phys. Lett. B, 195, 216
Eisenstein D. J. et al., 2011, AJ, 142, 24
Feng Y., Chu M. Y., Seljak U., McDonald P., 2016, MNRAS, 463, 2273
Gunn J. E., Gott J. R. I., 1972, ApJ, 176, 1
Hahn O., Abel T., 2011, MNRAS, 415, 2101
Hahn C. et al., 2022, preprint
Hahn C. H. et al., 2023, J. Cosmol. Astropart. Phys., 2023, 010
Hand N., Feng Y., Beutler F., Li Y., Modi C., Seljak U., Slepian Z., 2018, AJ, 156, 160
He K., Zhang X., Ren S., Sun J., 2016, in Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, p. 770
He S., Li Y., Feng Y., Ho S., Ravanbakhsh S., Chen W., Póczos B., 2019, PNAS, 116, 13825
Hutt M. L., Desmond H., Devriendt J., Slyz A., 2022, MNRAS, 516, 3592
Ivezić Ž. et al., 2019, ApJ, 873, 111
Jamieson D., Li Y., He S., Villaescusa-Navarro F., Ho S., Alves de Oliveira R., Spergel D. N., 2022a, preprint
Jamieson D., Li Y., Alves de Oliveira R., Villaescusa-Navarro F., Ho S., Spergel D. N., 2022b, ApJ, 952, 145
Jamieson D., Li Y., Villaescusa-Navarro F., Ho S., Spergel D. N., 2024, preprint
Jasche J., Kitaura F. S., 2010, MNRAS, 407, 29
Jasche J., Lavaux G., 2017, A&A, 606, 1
Jasche J., Lavaux G., 2019, A&A, 625, A64
Jasche J., Wandelt B. D., 2013, MNRAS, 432, 894
Jasche J., Leclercq F., Wandelt B. D., 2015, J. Cosmol. Astropart. Phys., 2015, 036
Jindal V., Jamieson D., Liang A., Singh A., Ho S., 2023, preprint
Kaushal N., Villaescusa-Navarro F., Giusarma E., Li Y., Hawry C., Reyes M., 2021, ApJ, 930, 115
Kitaura F. S., 2013, MNRAS, 429, L84
Kitaura F. S., Ata M., Rodríguez-Torres S. A., Hernández-Sánchez M., Balaguera-Antolínez A., Yepes G., 2021, MNRAS, 502, 3456
Knollmann S. R., Knebe A., 2009, ApJS, 182, 608
Kostić A., Nguyen N. M., Schmidt F., Reinecke M., 2023, J. Cosmol. Astropart. Phys., 2023, 063
LSST Dark Energy Science Collaboration, 2012, preprint
LSST Science Collaboration, 2009, preprint
Lanzieri D., Lanusse F., Starck J.-L., 2022, preprint
Laureijs R. et al., 2011, preprint
Lavaux G., Hudson M. J., 2011, MNRAS, 416, 2840
Lavaux G., Jasche J., 2016, MNRAS, 455, 3169
Lavaux G., Jasche J., Leclercq F., 2019, preprint
Leclercq F., Heavens A., 2021, MNRAS, 506, L85
Leclercq F., Jasche J., Wandelt B., 2015, J. Cosmol. Astropart. Phys., 2015, 015
Legin R., Ho M., Lemos P., Perreault-Levasseur L., Ho S., Hezaveh Y., Wandelt B., 2024, MNRAS, 527, L173
Lewis A., Challinor A., Lasenby A., 2000, ApJ, 538, 473
Li Y., Modi C., Jamieson D., Zhang Y., Lu L., Feng Y., Lanusse F., Greengard L., 2022, ApJS, 270, 36
Ma Y. Z., Scott D., 2016, Phys. Rev. D, 93, 083510
Mansfield P., Avestruz C., 2021, MNRAS, 500, 3309
Mcalpine S. et al., 2022, MNRAS, 512, 5823
Michaux M., Hahn O., Rampf C., Angulo R. E., 2021, MNRAS, 500, 663
Miller J. W., Dunson D. B., 2019, J. Am. Stat. Assoc., 114, 1113
Milletari F., Navab N., Ahmadi S. A., 2016, in Proc. 4th International Conference on 3D Vision, 3DV 2016. p. 565, available at: http://promise12.grand-challenge.org/results/
Modi C., Feng Y., Seljak U., 2018, J. Cosmol. Astropart. Phys., 2018, 028
Modi C., Lanusse F., Seljak U., 2021, Astron. Comput., 37, 100505
Modi C., Li Y., Blei D., 2023, J. Cosmol. Astropart. Phys., 2023, 059
Neal R. M., 1993, Technical Report CRG-TR-93-1, University of Toronto
Nguyen N. M., Jasche J., Lavaux G., Schmidt F., 2020, J. Cosmol. Astropart. Phys., 2020, 011
Nguyen N. M., Schmidt F., Lavaux G., Jasche J., 2021, J. Cosmol. Astropart. Phys., 2021, 058
Nusser A., Dekel A., 1992, ApJ, 391, 443
Paszke A. et al., 2019, preprint
Porqueres N., Jasche J., Lavaux G., Enßlin T., 2019, A&A, 630, A151
Porqueres N., Hahn O., Jasche J., Lavaux G., 2020, A&A, 642, A139
Porqueres N., Heavens A., Mortlock D., Lavaux G., 2021, MNRAS, 502, 3035
Porqueres N., Heavens A., Mortlock D., Lavaux G., 2022, MNRAS, 509, 3194
Porqueres N., Heavens A., Mortlock D., Lavaux G., Makinen T. L., 2023, preprint
Prideaux-Ghee J., Leclercq F., Lavaux G., Heavens A., Jasche J., 2023, MNRAS, 518, 4191
Ramanah D. K., Lavaux G., Jasche J., Wandelt B. D., 2019, A&A, 621, A69
Reed D., Gardner J., Quinn T., Stadel J., Fardal M., Lake G., Governato F., 2003, MNRAS, 346, 565
Ronneberger O., Fischer P., Brox T., 2015, in Nawab N., Hornegger J., Wells W. M., Frangi A. F., eds, Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, Lecture Notes in Computer Science. Springer, Berlin, p. 234
Schmidt F., Elsner F., Jasche J., Nguyen N. M., Lavaux G., 2019, J. Cosmol. Astropart. Phys., 2019, 042
Seljak U., Aslanyan G., Feng Y., Modi C., 2017, J. Cosmol. Astropart. Phys., 2017, 009
Springel V., 2005, MNRAS, 364, 1105
Stadler J., Schmidt F., Reinecke M., 2023, J. Cosmol. Astropart. Phys., 2023, 069
Stiskalek R., Desmond H., Devriendt J., Slyz A., 2024, MNRAS, 534, 3120
Stopyra S., Peiris H. V., Pontzen A., Jasche J., Lavaux G., 2024, MNRAS, 527, 1244
Takada M. et al., 2012, PASJ, 66, R1
Tassev S., Zaldarriaga M., Eisenstein D. J., 2013, J. Cosmol. Astropart. Phys., 2013, 036
Tinker J., Kravtsov A. V., Klypin A., Abazajian K., Warren M., Yepes G., Gottlöber S., Holz D. E., 2008, ApJ, 688, 709
Trenti M., Smith B. D., Hallman E. J., Skillman S. W., Shull J. M., 2010, ApJ, 711, 1198
Tsaprazi E., Nguyen N. M., Jasche J., Schmidt F., Lavaux G., 2022, J. Cosmol. Astropart. Phys., 2022, 003
Tsaprazi E., Jasche J., Lavaux G., Leclercq F., 2023, preprint
Villaescusa-Navarro F., 2018, Astrophysics Source Code Library, record ascl:1811.008
Villaescusa-Navarro F. et al., 2020, ApJS, 250, 2
Villaescusa-Navarro F. et al., 2021a, preprint
Villaescusa-Navarro F. et al., 2021b, Nat. Commun., 9, 4950
Vogelsberger M., Marinacci F., Torrey P., Puchwein E., 2020, Nat. Rev. Phys., 2, 42
Wang L., Yoon K. J., 2022, IEEE Transact. Patt. Analys. Mach. Intelligence, 44, 3048
Wang H., Mo H. J., Yang X., Jing Y. P., Lin W. P., 2014, ApJ, 794, 1
Watson W. A., Iliev I. T., D’Aloisio A., Knebe A., Shapiro P. R., Yepes G., 2013, MNRAS, 433, 1230

APPENDIX A: FIELD-LEVEL EMULATOR

We here describe in more detail the neural network architecture used, the generation of consistent training data with the first-order LPT implementation in BORG, and the changes necessary to train a mini-emulator.

A1 Neural network architecture

We implement a U-Net/V-Net architecture, akin to the one in Jamieson et al. (2022a) and Alves de Oliveira et al. (2020). This model operates on four resolution levels in a ‘V’ pattern, utilizing a series of 3 downsampling (by stride-2 |$2^3$| convolutions) and subsequently 3 upsampling layers (by stride-|$1/2$||$2^3$| transposed convolutions). Each level involves |$3^3$| convolutions, with the incorporation of a residual connection (a |$1^3$| convolution instead of identity) within each block, inspired by ResNet (He et al. 2016). Batch normalization layers follow every convolution, except the initial and final two, and are accompanied by a leaky ReLU activation function (with a negative slope of 0.01). Similar to the U-Net structure, inputs from downsampling layers are concatenated with outputs from the corresponding resolution levels’ upsampling layers. Each layer maintains 64 channels, excluding the input and output layers (3) and those following concatenations (128). A difference from the original U-Net is that we directly add the input displacement to the output.
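
To make the block structure concrete, here is a minimal PyTorch sketch of one such residual convolution block (|$3^3$| convolutions, batch normalization, leaky ReLU, and a |$1^3$|-convolution skip path); the channel counts and exact layer ordering are illustrative assumptions, not the trained network.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """One residual block: two 3^3 convolutions with a 1^3-convolution skip path."""
    def __init__(self, in_ch=64, out_ch=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm3d(out_ch),
            nn.LeakyReLU(0.01),
            nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm3d(out_ch),
        )
        self.skip = nn.Conv3d(in_ch, out_ch, kernel_size=1)  # 1^3 conv instead of identity
        self.act = nn.LeakyReLU(0.01)

    def forward(self, x):
        return self.act(self.body(x) + self.skip(x))

x = torch.randn(1, 64, 32, 32, 32)   # (batch, channels, voxels^3)
print(ResBlock()(x).shape)           # torch.Size([1, 64, 32, 32, 32])
```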

We use the same loss function as in Jamieson et al. (2022b),

(A1)

where |$L_{\delta }$| and |$L_{\boldsymbol{\Psi }}$| are mean square error losses on the Eulerian density field |$\delta$| and the particle displacements |$\boldsymbol{\Psi }$|, respectively, and the weight parameter is chosen to be |$\lambda =3$| for faster training.

A2 Generation of BORG-LPT data

The emulator from Jamieson et al. (2022b) was retrained with input data coming from the implementation of 1LPT in BORG. BORG-1LPT was run down to |$z=0$|⁠. The initial conditions used for Quijote (Villaescusa-Navarro et al. 2020) have a mesh size of |$1024^3$|⁠, which sets the highest resolution that can be run. The initial conditions generator computes the 2LPT displacements at |$z=127$|⁠, and fills the mesh up to the particle grid Nyquist mode, and uses zero padding on the rest, as this effectively de-aliases the convolutions when computing the 2LPT corrections (see e.g. Michaux et al. 2021).

For the Latin hyper-cube, the particle grid used for the N-body simulation was |$512^3$|⁠. In preparation for using the white noise field in BORG, we make use of the idea behind the zero padding by setting all modes higher than the particle grid Nyquist mode to zero. We then run BORG-LPT with |$1024^3$| particles and finally downsample by removing every other particle in each dimension.
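
The following numpy sketch illustrates the zero-padding idea described above, setting all Fourier modes of a white-noise field above the Nyquist mode of a coarser particle grid to zero; the grid sizes follow the text, while the function itself is an illustrative stand-in for the BORG implementation.

```python
import numpy as np

# Zero all modes above the Nyquist mode of a coarser particle grid
# (e.g. a 512^3 particle grid inside a 1024^3 mesh, as described above).
def zero_above_nyquist(white_noise, n_coarse):
    n = white_noise.shape[0]
    wk = np.fft.rfftn(white_noise)
    kx = np.fft.fftfreq(n, d=1.0 / n)        # integer mode numbers
    kz = np.fft.rfftfreq(n, d=1.0 / n)
    k_nyq = n_coarse // 2
    mask = (np.abs(kx)[:, None, None] > k_nyq) | \
           (np.abs(kx)[None, :, None] > k_nyq) | \
           (kz[None, None, :] > k_nyq)
    wk[mask] = 0.0
    return np.fft.irfftn(wk, s=white_noise.shape)

# e.g. zero_above_nyquist(white_noise_1024, 512) in the set-up described above
```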

A3 Mini-emulator

We downsize the neural network architecture presented in Appendix A1 by altering the channel count in each layer from 64 to 32, consequently reducing the number of model weights from |$3.35 \times 10^6$| to |$0.84 \times 10^6$|⁠.

To reduce the computational cost for training, we only make use of 5 per cent of the full data set. Based on ideas from recent knowledge distillation techniques (Wang & Yoon 2022), we also introduce a novel approach of leveraging the outputs of the emulator to train the mini-emulator. In essence, the comprehensive emulator acts as an auxiliary teacher, complementing the Quijote data. The loss L from equation (A1) thus becomes only one part of the new |$L_{\mathrm{mini}}$| loss through

(A2)

where the individual losses L and |$L_{\mathrm{teacher}}$| both have the form of equation (A1), with the only difference being the ground truth output either as the Quijote suite or the full emulator predictions. This student-teacher methodology yields a modest yet discernible enhancement over the performance of a teacher-less mini-emulator (i.e. when training the mini-emulator independently), but further investigation is needed.
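
A hedged sketch of this student-teacher idea is given below: the mini-emulator (student) output is penalized against both the simulation truth and the full emulator (teacher) prediction. Plain mean square errors stand in for the combined loss of equation (A1), and the equal weighting of the two terms is an illustrative assumption; the actual combination follows equation (A2).

```python
import torch
import torch.nn.functional as F

def mini_emulator_loss(student_out, teacher_out, truth):
    loss_truth = F.mse_loss(student_out, truth)           # plays the role of L
    loss_teacher = F.mse_loss(student_out, teacher_out)   # plays the role of L_teacher
    return loss_truth + loss_teacher

# Example with placeholder displacement fields of shape (batch, 3, N, N, N)
x = torch.randn(1, 3, 16, 16, 16)
print(mini_emulator_loss(x, x.clone(), torch.zeros_like(x)))
```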

APPENDIX B: POWER SPECTRUM AND BISPECTRUM

The density modes, denoted as |$\delta (\boldsymbol {k})$|⁠, are obtained by performing the Fourier transform on the density field |$\delta (\boldsymbol {s})$| at each location |$\boldsymbol {s}$| in space. The power spectrum |$P(k)$|⁠, which describes the distribution of the density modes, can then be computed as:

|$\left\langle \delta (\boldsymbol {k})\, \delta ^{*}(\boldsymbol {k}^{\prime }) \right\rangle = (2\pi)^{3}\, \delta _{\mathrm{D}}^{(3)}(\boldsymbol {k} - \boldsymbol {k}^{\prime })\, P(k).$| (B1)

In this expression, |$\delta _{\mathrm{D}}^{(3)}(\boldsymbol {k})$| denotes the 3D Dirac delta function, which enforces the homogeneity of the density fluctuations. The requirement of isotropy is reflected in the fact that the power spectrum only depends on the magnitude |$k \equiv |\boldsymbol {k}|$|. Similarly, the cross-power spectrum |$C_k$| between two fields |$\delta _1$| and |$\delta _2$| is given by

|$\left\langle \delta _{1}(\boldsymbol {k})\, \delta _{2}^{*}(\boldsymbol {k}^{\prime }) \right\rangle = (2\pi)^{3}\, \delta _{\mathrm{D}}^{(3)}(\boldsymbol {k} - \boldsymbol {k}^{\prime })\, C_{k},$| (B2)

where we usually normalize |$C_k$| by |$\sqrt{P_1(k)P_2(k)}$|.

The three-point statistic in the form of the matter bispectrum |$B\left(k_{1:3}\right)$|⁠, where |$k_{1:3} \equiv \lbrace k_1, k_2, k_3\rbrace$|⁠, is given by

|$\left\langle \delta (\boldsymbol {k}_{1})\, \delta (\boldsymbol {k}_{2})\, \delta (\boldsymbol {k}_{3}) \right\rangle = (2\pi)^{3}\, \delta _{\mathrm{D}}^{(3)}(\boldsymbol {k}_{1} + \boldsymbol {k}_{2} + \boldsymbol {k}_{3})\, B\left(k_{1:3}\right).$| (B3)

Note that the sum of the wave vectors in the Dirac delta function requires them to form a closed triangle. The bispectrum can therefore be expressed through the magnitudes of two wave vectors and the separation angle |$\theta$|, i.e. |$B\left(k_1, k_2, \theta \right)$|. We also define the reduced bispectrum Q as

|$Q = \frac{B\left(k_{1}, k_{2}, \theta \right)}{P(k_{1})P(k_{2}) + P(k_{2})P(k_{3}) + P(k_{3})P(k_{1})},$| (B4)

which has a weaker dependence on cosmology and scale. To compute the power spectrum, cross-power spectrum, and bispectrum, we make use of the pylians3 package (Villaescusa-Navarro 2018).
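
For completeness, a hedged usage sketch of the Pylians3 routines behind these statistics is given below; the argument and attribute names follow our reading of the Pylians3 documentation and may need adjusting to the installed version, and the field is a placeholder.

```python
import numpy as np
import Pk_library as PKL

delta = np.random.randn(128, 128, 128).astype(np.float32)   # placeholder overdensity field
BoxSize, MAS, threads = 250.0, 'CIC', 1                      # Mpc/h, mass-assignment scheme

Pk = PKL.Pk(delta, BoxSize, axis=0, MAS=MAS, threads=threads)
print(Pk.k3D[:3], Pk.Pk[:3, 0])      # k bins and monopole P(k)

theta = np.linspace(0.0, np.pi, 10)  # separation angles for B(k1, k2, theta)
Bk = PKL.Bk(delta, BoxSize, 0.2, 0.3, theta, MAS, threads)
print(Bk.Q)                          # reduced bispectrum Q(theta)
```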

APPENDIX C: APPROXIMATE PHYSICS MODELS

We compare the different structure formation models BORG-1LPT, COLA (using force factor |$f=4$| and either |$n=10$| or |$n=20$| time-steps), mini-emulator, and emulator with the high-fidelity N-body simulation code P-Gadget-III (see Section 4.1). This comparison encompasses an examination of the cross-power spectrum in Fig. C1 and of the power spectrum and two distinct configurations of the reduced bispectrum in Fig. C2 (see Appendix  B). Moreover, we compare three different definitions of the halo mass function in Fig. C3 and halo density profiles in Fig. C4. In all statistics, it is evident that the emulator, as well as the mini-emulator, outperforms the other approximate models and provides highly accurate predictions compared to the N-body predictions. We used the N-body simulation prediction to identify halo locations and re-centred these to identify the centres of mass of the same objects in each forward model.

Figure C1.

Comparison between different gravity models in terms of the cross-power spectrum. Both the mini-emulator and emulator outperform first-order LPT and COLA (using |$n=10$| and |$n=20$| time-steps; force factor |$f=4$|⁠) and reach a per cent-level agreement with the N-body simulation predictions.

Figure C2.

Comparison between different gravity models, all having been run with |$128^3$| particles in a cubic volume with side length |$250\,h^{-1}$| Mpc box to |$z=0$| using the same initial conditions, in terms of the power spectrum and two configurations of the reduced bispectrum. Both the mini-emulator and emulator outperform first-order LPT and COLA (using |$n=10$| and |$n=20$| time-steps; force factor |$f=4$|⁠) and reach a per cent-level agreement with the N-body simulation predictions.

Of most relevance is the emulators’ improved accuracy over COLA. In terms of the power spectrum and the reduced bispectra, the emulators improve from a |$\gt $|10 per cent discrepancy at small scales down to per cent-level discrepancies up to |$k \sim 2\,h$| Mpc|$^{-1}$| for the fiducial cosmology, as can be seen in Fig. C2.

Furthermore, while COLA performs accurately for |$M_{100\mathrm{m}}$| and the Friend-of-Friends mass functions (Fig. C3), COLA struggles significantly with the |$M_{200\mathrm{c}}$| mass function as well as the inner halo density profiles (Figs C3 and C4). This follows from the fact that obtaining accurate |$M_{200\mathrm{c}}$| masses requires accurately determining the density profile at the |$200\rho _c$| radius, which corresponds to |$\sim 630$| times the mean density of the Universe for our value of |$\Omega _\mathrm{m}$|. This occurs at the radius |$r_{200\mathrm{c}}$|, and in Fig. C4 this is where COLA is inaccurate, while the emulators are still highly accurate. Hence, the emulators are able to recover the correct |$M_{200\mathrm{c}}$| masses down to the mass resolution limit. The |$M_{100\mathrm{m}}$| and Friend-of-Friends mass functions, however, are less strict and do not require density profiles to be accurate on scales as small as |$r_{200\mathrm{c}}$|. This explains why COLA performs almost as accurately as the emulators on these mass functions.

Figure C3.

Halo mass function for different gravity models, all having been run with |$128^3$| particles in a cubic volume with side length |$250h^{-1}$| Mpc box to |$z=0$| using the same initial conditions, computed using three different halo mass definitions. Left: |$M_{200\mathrm{c}}$| (mass is the total mass within a radius such that the average density is 200 times the critical density of the Universe). Middle: |$M_{100\mathrm{m}}$| (mass within a sphere such that the radius is 100 times the mean density of the Universe). We validate the |$M_{100\mathrm{m}}$| and |$M_{200\mathrm{c}}$| halo mass functions against the theoretically expected Tinker prediction for this cosmology (Tinker et al. 2008). Right: FoF mass with linking length 0.2, as compared with the mass function from Watson et al. (2013). Notably, the inability of LPT to resolve small scales results in the absence of haloes that can be reliably identified across all defined halo definitions. Below the mass resolution limit, which differs between the mass definitions (130, 110, and 30 particles, respectively) as given by Reed et al. (2003), Trenti et al. (2010), and Mansfield & Avestruz (2021), we cannot accurately recover the mass function. As compared to COLA (using |$n=10$| and |$n=20$| time-steps; force factor |$f=4$|⁠), the emulators display significant improvement towards the N-body mass functions.

Figure C4.

Stacked density profiles of haloes for different gravity models using the same initial conditions with |$128^3$| particles in a cubic volume with side length |$250\,h^{-1}$| Mpc. To identify halo locations we used the N-body predictions and re-centred them to determine the centre of mass for each object in every forward model. This allows us to define halo density profiles for first-order LPT even though we cannot reliably identify haloes. We adopt the |$M_{200\mathrm{c}}$| mass definition and average the profile over all haloes in three mass bins. Only the emulators show accurate profiles below the virial radius |$r_{200\mathrm{c}}$| as compared to the N-body. Note that the virial radii |$r_{200\mathrm{c}}$| of the ground truth haloes lie in the range |$[0.34,1.85]\,h^{-1}$| Mpc.

APPENDIX D: MULTISCALE LIKELIHOOD

We here derive the effect on the variance of averaging the fields and the effect this in turn has on the information entropy of the data. We also provide the weights used for different scales during inference.

D1 Scale-dependent variance

At the target resolution |$N = 128^3$|⁠, the noise is drawn from a zero-mean Gaussian with covariance matrix |$\mathbf {C}$|⁠, such that noise realizations |$\mathbf {n}\curvearrowleft \pi (\mathbf {n})=\mathcal {N}(\mathbf {0},\mathbf {C})$| represent independent Gaussian noise corresponding to the Gaussian data model in equation (8). We assume a constant data variance |$\sigma ^2$| for all diagonal elements of the noise covariance matrix |$\mathbf {C}$|⁠, whose elements thus are

|$C_{ij} = \left\langle n_i n_j \right\rangle = \sigma ^2\, \delta ^K_{ij},$| (D1)

where |$n_i$| and |$n_j$| represent field values of the noise at voxels i and j, and the Kronecker delta |$\delta ^K_{ij}$| describes the diagonal covariance matrix.

We now apply the average pooling operator |$K^l$| defined as

|$K^l_{ir} = k_l^{-1} \,\, \mathrm{if} \,\, i \in \Omega _r, \quad K^l_{ir} = 0 \,\, \mathrm{otherwise},$| (D2)

where i and r represent voxels in the original and smoothed field, respectively, and |$\Omega _r$| is the domain of |$k_l = 2^{3l}$| adjacent voxels from the original field that are averaged over to level l. The total number of voxels in the smoothed grid becomes |$N_l = N k_l^{-1}$|⁠. The operation can now be defined through

|$n_r^l = \sum _i K^l_{ir}\, n_i,$| (D3)

where |$n_r^l$| is the field value of the noise at voxel r at smoothing level l. To derive the variance at the smoothing level l, we evaluate the covariance matrix at this level as

|$C^l_{rs} = \left\langle n^l_r n^l_s \right\rangle = \sum _{i,j} K^l_{ir} K^l_{js} \left\langle n_i n_j \right\rangle = \sigma ^2 \sum _i K^l_{ir} K^l_{is} = \frac{\sigma ^2}{k_l}\, \delta ^K_{rs},$| (D4)

where, in the last step, we utilized that there are |$k_l$| voxels in the domain |$\Omega _r$| of the smoothing process. Additionally, we accounted for the non-overlapping nature of the smoothing domains |$\Omega _r$| and |$\Omega _s$|, leading to the product of the kernels |$K_{ir}^l$| and |$K_{is}^l$| being non-zero only if |$r = s$|. The resulting relation

|$\sigma _l^2 = \frac{\sigma ^2}{k_l}$| (D5)

was here derived for |$\mathbf {n}$| but note it also holds for |$\mathbf {d}=\boldsymbol{\delta }^{\mathrm{sim}}+\mathbf {n}$| since |$\boldsymbol{\delta }^{\mathrm{sim}}$| is fixed and the variance only enters through |$\mathbf {n}$|⁠.
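
Equation (D5) can be verified numerically in a few lines; the sketch below draws Gaussian noise with |$\sigma =3$| on a |$128^3$| grid and applies one level of |$2\times 2\times 2$| average pooling (|$k_1 = 8$|).

```python
import numpy as np

# Numerical check of equation (D5): averaging over k_l = 2^(3l) voxels reduces
# the Gaussian noise variance by the same factor (here one level, k_1 = 8).
rng = np.random.default_rng(0)
sigma, n = 3.0, 128
noise = rng.normal(0.0, sigma, size=(n, n, n))

pooled = noise.reshape(n // 2, 2, n // 2, 2, n // 2, 2).mean(axis=(1, 3, 5))
print(noise.var(), pooled.var())     # ~9.0 and ~9.0 / 8 = 1.125
```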

D2 Information entropy

The information entropy H for the data field |$\boldsymbol {d}$| is defined as

|$H = -\int p(\boldsymbol {d}) \ln p(\boldsymbol {d})\, \mathrm{d}\boldsymbol {d}.$| (D6)

We identify |$p(\boldsymbol {d}) \equiv \pi (\boldsymbol {d}|\boldsymbol{\delta })$| as our likelihood and perform the standard calculation of entropy of an independent multivariate Gaussian, yielding

|$H = \frac{N}{2} \ln (2\pi) + \frac{N}{2} + \frac{N}{2} \ln \sigma ^2,$| (D7)

where we used that the covariance matrix in our case is diagonal with variances |$\sigma ^2$| and |$N=128^3$| is the total number of voxels. We proceed similarly to the sum in equation (12): after applying a smoothing operator |$\mathbf {K}^l$| separately, we compute the information entropy and then sum the individual information entropies. Similar to equation (D7) and using our result from equation (D5), we obtain

(D8)
(D9)

The total loss of information can be written

(D10)

which is evident since |$k_l=2^{3l}\gt 1$| and |$\sum _l w_l = 1$|, making each of the three terms in equation (D9) lower than in equation (D7), even when summed over all l. This indicates that the multiscale likelihood erases information in the data |$\boldsymbol {d}$|.

D3 Weight scheduling

In Table D1, we show the weight values used in the multiscale likelihood during the initial phase of the inference. Importantly, weights are only adjusted during the warm-up phase. The final set of weights, corresponding to the last row of the table, is used when recording samples.
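
Between the checkpoints of Table D1 the weights increase linearly with the number of accepted samples; a minimal sketch of such an interpolation (shown for |$w_1$| only, with checkpoint values taken from the table) is given below.

```python
import numpy as np

# Linear interpolation of w_1 between the accepted-sample checkpoints of Table D1.
n_accept = np.array([0, 300, 700, 1200, 1800, 2500])
w1_checkpoints = np.array([0.0001, 0.0001, 0.0001, 0.0001, 0.1, 0.957])

def w1(n):
    return np.interp(n, n_accept, w1_checkpoints)

print(w1(2100))   # about 0.47, partway between the 1800 and 2500 checkpoints
```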

Table D1.

The six-stage weight schedule within the multiscale likelihood during the initial phase of the inference. The weights are provided corresponding to specific numbers of accepted samples |$N_{\textrm {accept}}$| in the Markov chain. A linear increase is applied between the specified weight values. The values at the transitions were chosen empirically to ensure that the scale with the highest w made the most significant likelihood contribution.

|$N_{\textrm {accept}}$|   |$w_1$|   |$w_2$|   |$w_3$|   |$w_4$|   |$w_5$|   |$w_6$|
0                          0.0001    0.0002    0.001     0.01      0.1       0.8887
300                        0.0001    0.0002    0.001     0.1       0.8887    0.01
700                        0.0001    0.0002    0.1       0.8797    0.01      0.01
1200                       0.0001    0.1       0.8699    0.01      0.01      0.01
1800                       0.1       0.86      0.01      0.01      0.01      0.01
2500                       0.957     0.01      0.009     0.009     0.008     0.007

APPENDIX E: SIGNAL-TO-NOISE

The Gaussian noise is insensitive to density amplitudes, resulting in high signal-to-noise in denser regions and low signal-to-noise in underdense regions. In Fig. E1, we show that for |$k\gt 1.52\,h$| Mpc|$^{-1}$| the noise dominates over the signal. We also indicate the Nyquist frequency, corresponding to the smallest scale from which information can be accessed.

Figure E1.

Power spectra for the ground truth simulation |$\boldsymbol{\delta }^{\mathrm{sim}}$|⁠, the added noise |$\mathbf {n} \curvearrowleft \pi (\mathbf {n})=\mathcal {N}(\mathbf {0},\mathbb{1}\sigma ^2)$| with |$\sigma =3$|⁠, and the data |$\mathbf {d}$|⁠. The grey regions show where the signal-to-noise is greater than 1 (for |$k\gt 1.52h$| Mpc|$^{-1}$|⁠) and the Nyquist frequency |$k_{\mathrm{N}} \approx 2.79\,h$| Mpc|$^{-1}$|⁠.

APPENDIX F: INFERENCE WITH LOWER SIGNAL-TO-NOISE (⁠|$\sigma =4$|⁠)

The signal-to-noise chosen affects the amount of information that can be extracted. In a sibling inference with |$\sigma =4$|⁠, i.e. with a lower signal-to-noise, we obtain the expected behaviour of lower correlations over the full Fourier range, as shown in Fig. F1.

Figure F1.

Information recovery of inferred initial conditions as in Fig. 6, but here with |$\sigma =4$| used in equation (8) for generating the mock data and in equation (12) during inference. The results corresponding to |$\sigma =3$|⁠, as shown in Fig. 6, are represented by the bold lines, while the |$\sigma =4$| case is shown by fainter lines. In both cases, the mean cross-correlation over the inferred samples is displayed. In line with expectations, this lower signal-to-noise scenario yields reduced correlation for both initial and final conditions.

APPENDIX G: AUTO-CORRELATION FUNCTION

To check how many samples are needed to obtain independent samples from the posterior, we compute the auto-correlation function |$\text{ACF}(l)$| for the white noise field |$\boldsymbol {x}$| as

(G1)

where l here is the lag for a given Markov chain for an individual white noise amplitude |$\boldsymbol {x}_i$| with mean |$\bar{\boldsymbol {x}}_i$|⁠. Note also that, |$N=128^3$| is the total number of amplitudes, |$\mathcal {F}$| is the Fast Fourier Transform, |$\star$| denotes a conjugation and |$^{*}$| represents a complex conjugate.

We establish the correlation length of the Markov chain as the sample lag needed for the auto-correlation function to drop below 0.1, and obtain an average of |$430 \pm 154$| samples.
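
A minimal sketch of an FFT-based auto-correlation for a single chain of one white-noise amplitude, together with the lag at which it drops below the 0.1 threshold, is given below; it is an illustrative stand-in for the full computation over all |$128^3$| amplitudes.

```python
import numpy as np

def correlation_length(chain, threshold=0.1):
    """Lag at which the auto-correlation of a 1D Markov chain drops below `threshold`."""
    x = np.asarray(chain, dtype=float) - np.mean(chain)
    n = len(x)
    f = np.fft.rfft(x, 2 * n)                    # zero-pad to avoid circular wrap-around
    acf = np.fft.irfft(f * np.conj(f))[:n]
    acf /= acf[0]                                # normalize so that ACF(0) = 1
    below = np.nonzero(acf < threshold)[0]
    return below[0] if below.size else n
```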

APPENDIX H: SLICES OF POSTERIOR FIELDS

In Fig. H1, we show slices of the posterior fields, both of the inferred initial conditions and the resimulations.

Figure H1.

The mean of the posterior resimulations (top centre) is compared with the ground truth (top left) using a projection slab width of |$19.5\,h^{-1}$| Mpc. The mean exhibits a high visual correlation with the ground truth down to the smallest (voxel) scales. The standard deviation (top right) varies non-uniformly following the large-scale structure. The inferred initial conditions (bottom two rows; slab of width |$1.95h^{-1}$| Mpc) show no correlation with the ground truth at the smallest scale, as also illustrated quantitatively by Figs 6 and F1. However, employing a Gaussian smoothing filter, here with a smoothing scale |$\sigma _{\mathrm{filt}}=4$|⁠, visually reveals the correlation with the truth (bottom row).

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.