Nicolas Chartier, Benjamin Wandelt, Yashar Akrami, Francisco Villaescusa-Navarro, CARPool: fast, accurate computation of large-scale structure statistics by pairing costly and cheap cosmological simulations, Monthly Notices of the Royal Astronomical Society, Volume 503, Issue 2, May 2021, Pages 1897–1914, https://doi.org/10.1093/mnras/stab430
ABSTRACT
To exploit the power of next-generation large-scale structure surveys, ensembles of numerical simulations are necessary to give accurate theoretical predictions of the statistics of observables. High-fidelity simulations come at a towering computational cost. Therefore, approximate but fast simulations, surrogates, are widely used to gain speed at the price of introducing model error. We propose a general method that exploits the correlation between simulations and surrogates to compute fast, reduced-variance statistics of large-scale structure observables without model error at the cost of only a few simulations. We call this approach Convergence Acceleration by Regression and Pooling (CARPool). In numerical experiments with intentionally minimal tuning, we apply CARPool to a handful of gadget-iii N-body simulations paired with surrogates computed using COmoving Lagrangian Acceleration. We find ∼100-fold variance reduction even in the non-linear regime, up to |$k_\mathrm{max} \approx 1.2\, h {\rm Mpc^{-1}}$| for the matter power spectrum. CARPool realizes similar improvements for the matter bispectrum. In the nearly linear regime CARPool attains far larger sample variance reductions. By comparing to the 15 000 simulations from the Quijote suite, we verify that the CARPool estimates are unbiased, as guaranteed by construction, even though the surrogate misses the simulation truth by up to |$60{{\ \rm per\ cent}}$| at high k. Furthermore, even with a fully configuration-space statistic like the non-linear matter density probability density function, CARPool achieves unbiased variance reduction factors of up to ∼10, without any further tuning. Conversely, CARPool can be used to remove model error from ensembles of fast surrogates by combining them with a few high-accuracy simulations.
1 INTRODUCTION
The next generation of galaxy surveys will provide a detailed chart of cosmic structure and its growth on our cosmic light cone. These include the Euclid space telescope (Laureijs et al. 2011; Euclid Collaboration 2020), the Dark Energy Spectroscopic Instrument (DESI; DESI Collaboration 2016a, b), the Rubin Observatory Legacy Survey of Space and Time (LSST; Ivezić et al. 2019; LSST Science Collaboration 2009; LSST Dark Energy Science Collaboration 2018), the Square Kilometre Array (SKA; Yahya et al. 2015; Square Kilometre Array Cosmology Science Working Group 2020), the Wide Field InfraRed Survey Telescope (WFIRST; Spergel et al. 2015), the Subaru Hyper Suprime-Cam (HSC) and Prime Focus Spectrograph (PFS) surveys (Aihara et al. 2018; Tamura et al. 2016), and the Spectro-Photometer for the History of the Universe, Epoch of Reionization, and Ices Explorer (SPHEREx; Doré et al. 2014, 2018). These data sets will provide unprecedented statistical power to constrain the initial perturbations, the growth of cosmic structure, and the cosmic expansion history. To access this information requires accurate theoretical models of large-scale structure statistics, such as power spectra and bispectra. While analytical work, such as standard perturbation theory (Jain & Bertschinger 1994; Goroff et al. 1986), Lagrangian perturbation theory (LPT; Bouchet et al. 1995; Matsubara 2008), renormalized perturbation theory (Crocce & Scoccimarro 2006), and effective field theory (Carrasco, Hertzberg & Senatore 2012; Vlah, White & Aviles 2015; Perko et al. 2016), has made great strides [see also Bernardeau et al. (2002), Desjacques, Jeong & Schmidt (2018) for reviews], the reference models for large-scale structure are based on computationally intensive N-body simulations that compute the complex non-linear regime of structure growth. In recent years, the BACCO simulation project (Angulo et al. 2020), the Outer Rim Simulation (Heitmann et al. 2019), the Aemulus project I (DeRose et al. 2019), the ABACUS Cosmos suite (Garrison et al. 2018), the Dark Sky Simulations (Skillman et al. 2014), the MICE Grand Challenge (MICE-GC; Crocce et al. 2015), the Coyote Universe I (Heitmann et al. 2010), and the Uchuu simulations (Ishiyama et al. 2020), among others, involved generation of expensive N-body simulations.
While analytical methods compute expectation values of large-scale structure statistics, a simulation generates a single realization and its output therefore suffers from sample variance. Reducing this variance to a point where it is subdominant to the observational error therefore requires running ensembles of simulations.
Computational cosmologists have been tackling the challenge of optimizing N-body codes and gravity solvers for ever larger numbers of particles. Widely used codes include the parallel Tree Particle-Mesh (TreePM or TPM) codes gadget-ii by Springel (2005) and greem by Ishiyama, Fukushige & Makino (2009), the adaptive treecode 2hot by Warren (2013), the GPU-accelerated abacus code originated from Garrison (2019), the Hardware/Hybrid Accelerated Cosmology Code (hacc) developed by Habib et al. (2016), and the distributed-memory and GPU-accelerated pkdgrav3, based on Fast Multipole Methods and adaptive particle timesteps, from Potter, Stadel & Teyssier (2017). The memory and CPU time requirements of such computations are a bottleneck for future work on new-generation cosmological data sets. As an example, the 43 100 runs in the Quijote simulations from Villaescusa-Navarro et al. (2020), of which the data outputs are public and used in this paper, required 35 million CPU-core-hours.
The search for solutions has led to alternative, fast, and approximate ways to generate predictions for large-scale structure statistics. The COmoving Lagrangian Acceleration (COLA) solver of Tassev, Zaldarriaga & Eisenstein (2013) is a PM code that solves the particle equations of motion in an accelerated frame given by LPT. Particles are nearly at rest in this frame for much of the mildly non-linear regime. As a consequence, much larger timesteps can be taken, leading to significant time savings. The N-body solver FASTPM of Feng et al. (2016) operates on a similar principle, using modified kick and drift factors to enforce the Zel’dovich approximation in the mildly non-linear regime. The spatial COLA (sCOLA) scheme (Tassev et al. 2015) extends the idea of using LPT to guide the solution in the spatial domain. Leclercq et al. (2020) have carefully examined and implemented these ideas to allow splitting large N-body simulations into many perfectly parallel, independently evolving small simulations.
In a different family of approaches, but still using LPT, Monaco et al. (2013) proposed a parallelized implementation of the PINpointing Orbit Crossing-Collapsed HI-erarchical Objects (pinocchio) algorithm from Taffoni, Monaco & Theuns (2002). Chuang et al. (2015) developed a physically motivated enhancement of the Zel’dovich approximation called EZmocks. Approximate methods and full N-body simulations can also be jointly used. For instance, Tassev & Zaldarriaga (2012) proposed a statistical linear regression model of the non-linear matter density field using the density field given by perturbation theory, for which the random residual error is minimized.
Recently, so-called emulators have been of great interest: they predict statistics in the non-linear regime based on a generic mathematical model whose parameters are trained on simulation suites covering a range of cosmological parameters. An emulator is trained by Angulo et al. (2020) on the BACCO simulations; similarly, the Aemulus project contributions II, III, and IV (McClintock et al. 2019a, b; Zhai et al. 2019), respectively, construct an emulator for the halo mass function, the galaxy correlation function, and the halo bias using the Aemulus I suite (DeRose et al. 2019). Not only do emulators that map cosmological parameters to certain outputs need large numbers of simulations for training, they also do not guarantee unbiased results with respect to full simulation codes, especially outside the parameter range used during training.
Recent advances in deep learning have allowed training emulators that reproduce particle positions or density fields starting from initial conditions therefore essentially emulating the full effect of a low-resolution cosmological N-body code – these include the Deep Density Displacement Model (D3M) of He et al. (2019) stemming from the U-NET architecture (Ronneberger, Fischer & Brox 2015). Kodi Ramanah et al. (2020) describe a complementary deep learning tool that increases the mass and spatial resolution of low-resolution N-body simulations using a variant of Generative Adversarial Networks (Goodfellow et al. 2014).
None of these fast approximate solutions exactly reproduce the results of more computationally intensive codes. They trade computational accuracy for computational speed, especially in the non-linear regime. In this vein, the recent series of papers by Lippich et al. (2019), Blot et al. (2019), and Colavincenzo et al. (2019) compare the covariance matrices of clustering statistics given by several low-fidelity methods to those of full N-body codes and find statistical biases in the parameter uncertainties by up to 20 per cent.
A different approach to this problem is to reduce the stochasticity of the initial conditions, thereby modifying the statistics of the observables in such a way as to reduce sample variance. This is the spirit of the method of fixed fields invented and first explored by Pontzen et al. (2016) and Angulo & Pontzen (2016). They found in numerical experiments that a large variety of statistics retain the correct mean, and analytically showed that pairing and fixing, while changing the initial distributions, only impact a measure-zero set of correlations when the errors are not smothered by the large number of available modes. While this approach does not guarantee that any given statistic will be unbiased, the numerical study by Villaescusa-Navarro et al. (2018) showed that ‘fixing’ succeeds in reducing variance for several statistics of interest with no detectable bias when comparing to an ensemble of hundreds of full simulations, and at no additional cost compared to regular simulations. Still, it is clear that other statistics must necessarily be biased, for example, the square of any variance-reduced statistic, such as four-point functions. Also in the family of variance reduction methods, Smith & Angulo (2019) built a composite model of the matter power spectrum and managed to cancel most of the cosmic variance on large scales, notably by using the ratio of matched phase initial conditions.
In this paper, we show that it is possible to get the best of both worlds: the speed of fast surrogates and the guarantee of full-simulation accuracy.1 We take inspiration from control variates, a classical variance reduction technique that directly and optimally minimizes the variance of any random quantity [see Lavenberg & Welch (1981) for a review, and Gorodetsky et al. (2020) and Peherstorfer, Willcox & Gunzburger (2016) for related recent applications], to devise a way to combine fast but approximate simulations (or surrogates) with computationally intensive accurate simulations to vastly accelerate convergence while guaranteeing arbitrarily small bias with respect to the full simulation code. We call this Convergence Acceleration by Regression and Pooling (CARPool).2
The paper is organized as follows. In Section 2, we explore the theory of univariate and multivariate estimation with control variates and highlight some differences in our setting for cosmological simulations. In Section 3, we briefly discuss both the N-body simulation suite and our choice of fast surrogates we use in the numerical experiments presented in Section 4. We conclude in Section 5.
Table 1 lists mathematical notation and definitions used throughout this paper.
Table 1. Mathematical notation and definitions.

Notation | Description
---|---
|$\mathcal {S}_{N} = \left\lbrace r_1, \dots , r_N \right\rbrace$| | Set of N random seeds |$r_n$| of the probability space
|$\boldsymbol{y}(r_n)\equiv \boldsymbol{y}_n$| | Random column vector of size p at seed |$r_n$|
|$\mathbb {E}\left[ \boldsymbol{y} \right]\equiv \boldsymbol{\mu _y}$| | Expectation value of random vector |$\boldsymbol{y}$|
|$[[ m,n ]]$| | Set of integers from m to n
|$\boldsymbol{M}^{\boldsymbol{T}}$| | Transpose of real matrix |$\boldsymbol{M}$|
|$\boldsymbol{M}^{\boldsymbol{\dagger }}$| | Moore–Penrose pseudo-inverse of matrix |$\boldsymbol{M}$|
|$\det \left(\boldsymbol{M}\right)$| | Determinant of matrix |$\boldsymbol{M}$|
|$\mathbb {E} \left[\left(\boldsymbol{x} - \mathbb {E} \left[ \boldsymbol{x}\right]\right) \left(\boldsymbol{x} - \mathbb {E} \left[\boldsymbol{x}\right]\right)^{\boldsymbol{T}} \right]\, \equiv \boldsymbol{\Sigma _{\boldsymbol{xx}}}$| | Variance-covariance matrix of random vector |$\boldsymbol{x}$|
|$\mathbb {E} \left[ \left(\boldsymbol{y} - \mathbb {E} \left[ \boldsymbol{y} \right] \right) \left(\boldsymbol{x} - \mathbb {E} \left[ \boldsymbol{x}\right] \right) ^{\boldsymbol{T}} \right]\, \equiv \boldsymbol{\Sigma _{\boldsymbol{yx}}}$| | Cross-covariance matrix of random vectors |$\boldsymbol{y}$| and |$\boldsymbol{x}$|
|$\sigma _{y}^2$| | Variance of scalar random variable y
|$\boldsymbol{0}_{p,q}$| and |$\boldsymbol{0}_p$| | Null matrix in |$\mathbb {R}^{p \times q}$| and null vector in |$\mathbb {R}^{p}$|
|$\boldsymbol{I}_p$| | Square p × p identity matrix
2 METHODS
Our goal is to find a more precise – i.e. lower variance – and unbiased estimator of |$\mathbb {E}\left[ \boldsymbol{y}\right]$| with a much smaller number of simulations |$\boldsymbol{y}_n$|. The means by which we achieve this is to construct another set of quantities that are fast to compute such that (i) their means are small enough to be negligible, and (ii) their errors are anticorrelated with the errors in the |$\boldsymbol{y}_n$|,3 and to add some multiple of these to |$\boldsymbol{\bar{y}}$| to cancel some of its error. This is the control variates principle.
2.1 Theoretical framework
In what follows we will use the word simulation to refer to costly high-fidelity runs and surrogate for fast but low-fidelity runs.
2.1.1 Introduction with the scalar case
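The scalar construction underlying CARPool is the classical control variates identity, which we sketch here in the paper's notation. Let y be a costly scalar statistic with unknown mean |$\mu _y$|, and let c be a cheap surrogate statistic, correlated with y, whose mean |$\mu _c$| is known. For any coefficient β, the combination |$x(\beta) = y - \beta \left(c - \mu _c \right)$| satisfies |$\mathbb {E}\left[ x(\beta) \right] = \mu _y$|: it is unbiased whatever the value of β. Its variance, |$\sigma _x^2(\beta) = \sigma _y^2 - 2\beta \sigma _{yc} + \beta ^2 \sigma _c^2$|, is minimized by the optimal coefficient |$\beta ^{\star } = \sigma _{yc}/\sigma _c^2$| [equation (4)], for which |$\sigma _x^2(\beta ^{\star }) = \sigma _y^2 \left(1 - \rho _{yc}^2\right)$|, with |$\rho _{yc}$| the correlation coefficient of y and c. The stronger the correlation between surrogate and simulation, the larger the variance reduction, and this holds no matter how biased |$\mu _c$| is with respect to |$\mu _y$|.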
2.1.2 Multivariate control variates
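In the multivariate setting, |$\boldsymbol{y} \in \mathbb {R}^p$| collects the simulated statistics, |$\boldsymbol{c} \in \mathbb {R}^q$| the surrogate statistics, and the scalar coefficient becomes a p × q control matrix |$\boldsymbol{\beta }$|. The combined vector |$\boldsymbol{x}(\boldsymbol{\beta }) = \boldsymbol{y} - \boldsymbol{\beta } \left(\boldsymbol{c} - \boldsymbol{\mu _c} \right)$| [equation (6)] again satisfies |$\mathbb {E}\left[ \boldsymbol{x}(\boldsymbol{\beta })\right] = \boldsymbol{\mu _y}$| for any |$\boldsymbol{\beta }$|, with covariance |$\boldsymbol{\Sigma _{xx}}(\boldsymbol{\beta }) = \boldsymbol{\Sigma _{yy}} - \boldsymbol{\beta }\boldsymbol{\Sigma _{cy}} - \boldsymbol{\Sigma _{yc}}\boldsymbol{\beta }^{\boldsymbol{T}} + \boldsymbol{\beta }\boldsymbol{\Sigma _{cc}}\boldsymbol{\beta }^{\boldsymbol{T}}$|. The optimal choice is |$\boldsymbol{\beta ^{\star }} = \boldsymbol{\Sigma _{yc}}\boldsymbol{\Sigma _{cc}^{-1}}$| [equation (8)], which yields |$\boldsymbol{\Sigma _{xx}}(\boldsymbol{\beta ^{\star }}) = \boldsymbol{\Sigma _{yy}} - \boldsymbol{\Sigma _{yc}}\boldsymbol{\Sigma _{cc}^{-1}}\boldsymbol{\Sigma _{cy}}$|.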
Optimizing variance reduction here means minimizing the confidence region associated with |$\mathbb {E}\left[ \boldsymbol{x}(\boldsymbol{\beta })\right]$|, as represented by the generalized variance |$\det \left(\boldsymbol{\Sigma _{xx}}(\boldsymbol{\beta })\right)$|. Appendix A presents a Bayesian solution to the Gaussian version of this optimization problem.
2.2 Estimation in practice
In this section, we examine practical implications of the control variates implementation when the optimal control matrix |$\boldsymbol{\beta }$| (or coefficients) and the mean of the cheap estimator |$\boldsymbol{\mu _c}$| are unknown. We will consider an online approach in order to improve the estimates of (4) or (8) as simulations and surrogates are computed. Estimating |$\boldsymbol{\mu _c}$| is done through an inexpensive pre-computation step that consists in running fast surrogates. From now on, to differentiate our use of the control variates principle and its application to cosmological simulations from the theory presented above, we will refer to it as the CARPool technique.
For the purposes of this paper, we will take as our goal to produce low-variance estimates of expectation values of full simulation observables. When we discuss model error, it is therefore only relative to the full simulation. From an absolute point of view the accuracy of the full simulation depends on a number of factors such as particle number, force resolution, timestepping, inclusion of physical effects, etc. The numerical examples of full simulations we give are not selected for their unmatched accuracy, but for the availability of a large ensemble that we can use to validate the CARPool results.
2.2.1 Estimation of |$\boldsymbol{\mu _c}$|
In the textbook control variates setting, the crude approximation |$\boldsymbol{\mu _c}$| of |$\boldsymbol{\mu }$| is assumed to be known. There is no reason for this to be the case in the context of cosmological simulations, thus we compute |$\bar{\boldsymbol{\mu }}_{\boldsymbol{c}}$| with surrogate samples drawn on a separate set of seeds |$\mathcal {S}_{M} = \left\lbrace r_{1}, \dots , r_{M} \right\rbrace$| (|$\mathcal {S}_{N} \cap \mathcal {S}_{M} = \emptyset$|, where |$\mathcal {S}_{N}$| is the set of initial conditions of simulations). What is then the additional variance-covariance of the control variates estimate stemming from the estimation of |$\boldsymbol{\mu _c}$|?
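Since the M extra surrogates are drawn independently of the N simulation/surrogate pairs, the answer is additive: the covariance of the CARPool mean picks up a term from the uncertainty of |$\bar{\boldsymbol{\mu }}_{\boldsymbol{c}}$|, becoming approximately |$\boldsymbol{\Sigma _{xx}}(\boldsymbol{\beta })/N + \boldsymbol{\beta }\boldsymbol{\Sigma _{cc}}\boldsymbol{\beta }^{\boldsymbol{T}}/M$| [the content of equation (11), referenced again in Fig. 1]. Because surrogates are cheap, M can be made large enough for the second term to be negligible.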
2.2.2 Estimation of the control matrix
Note that for finite N, the inverse of |$\boldsymbol{\widehat{\Sigma }_{cc}}$| in equation (13) is not an unbiased estimator of the precision matrix |$\boldsymbol{\Sigma _{cc}^{-1}}$| (Hartlap, Simon & Schneider 2006). Moreover, |$\boldsymbol{\widehat{\Sigma }_{cc}^{-1}}$| is not defined when |$\boldsymbol{\widehat{\Sigma }_{cc}}$| is rank-deficient, which is guaranteed to happen when N is smaller than p. We have consequently replaced |$\boldsymbol{\Sigma _{cc}^{-1}}$| by the Moore–Penrose pseudo-inverse – always defined and unique – |$\boldsymbol{\Sigma _{cc}^{\dagger }}$| in equation (8) for the numerical analysis presented in Section 4, so as to be able to compute multivariate CARPool estimates even when N < p.
Since the singular value decomposition exists for any complex or real matrix, we can write |$\boldsymbol{\Sigma _{yc}} = \boldsymbol{UVW^{T}}$| and |$\boldsymbol{\Sigma _{cc}} = \boldsymbol{OPQ^{T}}=\boldsymbol{OPO^{T}}$| by symmetry. The optimal control matrix now gives |$\boldsymbol{\beta ^{\star }} = \boldsymbol{UVW^{T}OP^{-1}O^{T}}$|. The product |$\boldsymbol{P^{-\frac{1}{2}}O^{T}}$| whitens the centered surrogate vector elements (principal component analysis whitening), |$\boldsymbol{OP^{-\frac{1}{2}}}$| restretches the coefficients and returns them to the surrogate basis, and then |$\boldsymbol{UVW^{T}}$| projects the scaled surrogate elements into the high-fidelity simulation basis and rescales them to match the costly simulation covariance. It follows that, when using |${\boldsymbol{\hat{\beta }}}$| in practice, the projections are done in bases specifically adapted to the |$\boldsymbol{y}$| and |$\boldsymbol{c}$| samples available. With this argument, we justify why we use the same simulation/surrogate pairs to compute |${\boldsymbol{\hat{\beta }}}$| first (with the Moore–Penrose pseudo-inverse of the surrogate covariance replacing the precision matrix) and estimate the CARPool mean after that.
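To make the estimation step concrete, the following numpy sketch computes both the dense control matrix (via the Moore–Penrose pseudo-inverse, as argued above) and its diagonal restriction from N simulation/surrogate pairs. The function name and array layout are our own illustration, not code from a released CARPool implementation.

```python
import numpy as np

def estimate_beta(Y, C, diag=False):
    """Estimate the CARPool control matrix from N paired samples.

    Y : (N, p) array, one simulation statistic vector per row (per seed).
    C : (N, p) array, the surrogate statistics on the same seeds.
    Returns a length-p vector (diag=True) or a (p, p) matrix (diag=False).
    """
    dY = Y - Y.mean(axis=0)
    dC = C - C.mean(axis=0)
    if diag:
        # Univariate CARPool: one coefficient per bin,
        # beta_i = cov(y_i, c_i) / var(c_i); the (N - 1) factors cancel.
        return (dY * dC).sum(axis=0) / (dC * dC).sum(axis=0)
    # Multivariate CARPool: beta = Sigma_yc @ pinv(Sigma_cc); the Moore-Penrose
    # pseudo-inverse keeps the estimate defined even when N < p.
    N = Y.shape[0]
    Sigma_yc = dY.T @ dC / (N - 1)
    Sigma_cc = dC.T @ dC / (N - 1)
    return Sigma_yc @ np.linalg.pinv(Sigma_cc)
```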
2.2.3 Multivariate versus univariate CARPool
So far we have not assumed any special structure for |$\boldsymbol{\beta }$|. If, as in the classical control variates setting, the (potentially dense) covariances on the right-hand side of equation (8) are known a priori, then |$\boldsymbol{\beta ^{\star }}$| is the best solution because it exploits the mutual information between all elements of |$\boldsymbol{y}$| and |$\boldsymbol{c}$|.
In practice, we will be using the online approach discussed in Section 2.2.2 for a very small number of simulations. If we are limited by a very small number of |$\left\lbrace \boldsymbol{y}_n,\boldsymbol{c}_n \right\rbrace$| pairs compared to the number of elements of the vectors, the estimate of |$\boldsymbol{\beta ^{\star }}$| can be unstable and possibly worsen the variance of equation (15), though unbiasedness remains guaranteed.
We will demonstrate below that in the case of small number of simulations and a large number of statistics to estimate from the simulations, it is advantageous to impose structure on |$\boldsymbol{\beta }$|. In the simplest case, we can set the off-diagonal elements to zero. This amounts to treating each vector element separately and results in a decoupled problem with a separate solution (4) for each vector element.
In the numerical experiments of Section 4, we compare three estimators:
(i) gadget, where we compute the sample mean |$\boldsymbol{\bar{y}}$| from N-body simulations only;
(ii) multivariate CARPool, described by equation (6), where we estimate the control matrix |$\boldsymbol{\beta }$| online using equations (13), and denote it by |$\boldsymbol{\beta ^{\star }}$|;
(iii) univariate CARPool, where we use the empirical counterpart of equation (4) as the control coefficient for each element of a vector: we estimate |$\boldsymbol{\beta ^{\mathrm{diag}}}$|.
Other, intermediate choices between fully dense and diagonal |$\boldsymbol{\beta }$| are possible and may be advantageous in some circumstances. We will leave an exploration of these to future work, and simply note here that this freedom to tune |$\boldsymbol{\beta }$| does not affect the mean of the CARPool estimate.
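Assembling the pieces, a minimal sketch of the CARPool mean of equation (15) then reads as follows, with mu_c_hat precomputed from the M separate surrogate runs (again an illustration under our own naming, not the paper's released code):

```python
def carpool_mean(Y, C, mu_c_hat, diag=True):
    """CARPool mean: x_bar(beta) = y_bar - beta (c_bar - mu_c_hat).

    Y, C     : (N, p) paired simulation and surrogate statistics.
    mu_c_hat : length-p surrogate mean from M independent surrogate runs
               on seeds disjoint from those of Y and C.
    Unbiased with respect to the simulation code for *any* choice of beta.
    Requires numpy arrays and estimate_beta from the sketch in Section 2.2.2.
    """
    beta = estimate_beta(Y, C, diag=diag)
    y_bar, c_bar = Y.mean(axis=0), C.mean(axis=0)
    shift = beta * (c_bar - mu_c_hat) if diag else beta @ (c_bar - mu_c_hat)
    return y_bar - shift
```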
3 COSMOLOGICAL SIMULATIONS
This section describes the simulation methods that we use to compute the statistics presented in Section 4. The simulations assume a Λ cold dark matter (ΛCDM) cosmology congruent with the Planck constraints provided by Planck Collaboration (2020): Ωm = 0.3175, Ωb = 0.049, h = 0.6711, ns = 0.9624, σ8 = 0.834, w = −1.0, and Mν = 0.0 eV.
3.1 Quijote simulations at the fiducial cosmology
Villaescusa-Navarro et al. (2020) have publicly released data outputs from N-body cosmological simulations run with the full TreePM code gadget-iii, a development of the previous version gadget-ii by Springel (2005).4 Available data and statistics include simulation snapshots, matter power spectra, matter bispectra and matter probability density functions. The sample mean of each statistic computed from all available realizations gives the unbiased estimator of |$\mathbb {E} \left[ \boldsymbol{y} \right] = \boldsymbol{\mu }$|. The fiducial cosmology data set contains 15 000 realizations; their characteristics are grouped in Table 2.
Table 2. Characteristics of the Quijote simulations at the fiducial cosmology.

Characteristic/parameter | Value
---|---
Simulation box volume | (1000 h−1 Mpc)³
Number of CDM particles | Np = 512³
Force mesh grid size | Nm = 1024
Starting redshift | zi = 127
Initial conditions | Second-order Lagrangian perturbation theory (2LPT)
Redshift of data outputs | z ∈ {3.0, 2.0, 1.0, 0.5, 0.0}
As discussed in Section 2.2, the Quijote simulations are selected because we have access to an extensive ensemble of simulations that we can use to validate the CARPool approach. In the following we will look at wavenumbers k ∼ 1 hMpc−1, where the Quijote simulations may not be fully resolved. This is not important for the purposes of this paper; we will consider the full simulation ensemble as the gold standard that we attempt to reproduce with a much smaller number of simulations plus fast surrogates.
In the next section, we present the chosen low-fidelity simulation code which provides an approximate statistic |$\boldsymbol{c}$| for our numerical experiments.
3.2 Choice of approximate simulation method
As the fast surrogate we use l-picola (Howlett, Manera & Percival 2015), a parallel implementation of the COLA scheme of Tassev et al. (2013) introduced in Section 1. The run settings, deliberately chosen for speed rather than accuracy, are listed in Table 3.
Table 3. Characteristics of the l-picola surrogate runs.

Characteristic/parameter | Value
---|---
Number of timesteps | 20 (linearly spaced)
Modified timestepping from Tassev et al. (2013) | nLPT = +0.5
Force mesh grid size | Nm = 512
Starting redshift | zi = 127
Initial conditions | Second-order Lagrangian perturbation theory (2LPT)
Redshift of data outputs | z ∈ {1.0, 0.5, 0.0}
4 APPLICATION AND RESULTS
In this section, we apply the CARPool technique to three standard cosmological statistics: the matter power spectrum, the matter bispectrum, and the one-dimensional probability density function (PDF) of matter fractional overdensity. We seek to improve the precision of estimates of theoretical expectations of these quantities as computed by gadget-iii. To assess the actual improvement, we need the sample mean |$\boldsymbol{\bar{y}}$| of the Quijote simulations on the one hand, and the estimator (15) on the other hand.
Additionally, unless stated otherwise, each test case has the following characteristics:
|$N_\mathrm{max}=500$| simulation pairs |$\left\lbrace \boldsymbol{y}_i, \boldsymbol{c}_i \right\rbrace$| are generated, and the cumulative sample mean |$\boldsymbol{\bar{y}}$| (resp. |$\boldsymbol{\bar{x}(\beta)}$|) is computed after every 5 additional simulations (resp. simulation pairs).
M = 1500 additional fast simulations are dedicated to the estimation of |$\boldsymbol{\mu _c}$|.
The sample mean of 15 000 N-body simulations, accessible in the Quijote database, is taken as the true |$\boldsymbol{\mu }$|.
p = q, since we post-process gadget-iii and l-picola snapshots with the same analysis codes (i.e. the same vector size for |$\boldsymbol{y}$| and |$\boldsymbol{c}$|).
The analysis is performed at redshift z = 0.5. The lower the redshift, the more non-linear (and hence more difficult) the structure formation problem. We pick the lowest redshift that is relevant for upcoming galaxy surveys. We expect CARPool to be even more efficient for higher redshifts.
|$\delta (\boldsymbol{x}) \equiv \rho (\boldsymbol{x})/\bar{\rho } - 1$| is the matter density contrast field, with the density |$\rho(\boldsymbol{x})$| computed on a grid using the Cloud-in-Cell (CiC) mass assignment scheme. |$\boldsymbol{x}$| exceptionally denotes the three-dimensional comoving grid coordinates here.
Ngrid designates the density contrast grid size when post-processing snapshots.
We use the bias-corrected and accelerated (BCa) bootstrap,6 with B = 5000 resamples with replacement, to compute the |$95{{\ \rm per\ cent}}$| confidence intervals of the estimators; Efron & Tibshirani (1994) explain the computation, and a usage sketch follows this list.
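As an illustration of this last point, scipy's bootstrap routine implements the BCa interval directly; the sketch below computes a 95 per cent interval for the mean of a single statistic bin (the data array is a placeholder, not Quijote output):

```python
import numpy as np
from scipy.stats import bootstrap

rng = np.random.default_rng(42)
samples = rng.normal(size=50)      # placeholder for the N values of one bin

# BCa confidence interval for the mean of this bin, B = 5000 resamples.
res = bootstrap((samples,), np.mean, n_resamples=5000,
                confidence_level=0.95, method='BCa', random_state=rng)
print(res.confidence_interval)     # (low, high) at the 95 per cent level
```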
The procedure is illustrated in Fig. 1. The first step is to run M fast surrogates to compute the approximate mean |$\boldsymbol{\mu }_{\boldsymbol{c}}$|. How large M should be depends on the accuracy demanded by the user. Then, for each newly picked initial condition, both the expensive simulation code and the low-fidelity method are run to produce a snapshot pair. Only in this step do we need to run the high-fidelity simulation code N times. The mean (15) can be computed for each additional pair to track the estimate. In the next section, we assess the capacity of CARPool to use fewer than 10 simulations and a set of fast surrogates to match the precision of a large number of N-body simulations. All the statistics are calculated from the snapshots with the python 3 module pylians3.7
Figure 1. Flowchart of the practical application of CARPool to cosmological simulations. We highlight the estimation of |$\boldsymbol{\mu _c}$| as a precomputation step using M fast simulations. The larger M is, the smaller the impact on the variance/covariance of the control variates estimator, as expressed in (11) and Appendix A. The fractional overdensity images are projected slices of 60 h−1Mpc.
4.1 Matter power spectrum
This section is dedicated to estimating the power spectrum of matter density in real space at z = 0.5, the lower end of the range covered by next-generation galaxy redshift surveys. The density contrast |$\delta (\boldsymbol{x})$| is computed from each snapshot with the grid size Ngrid = 1024. The publicly available power spectra range from |$k_\mathrm{min}= 8.900 \times 10^{-3} \, h {\rm Mpc^{-1}}$| to |$k_\mathrm{max}=5.569\, h {\rm Mpc^{-1}}$| and contain 886 bins. The following analysis is restricted to |$k_\mathrm{max}=1.194\, h {\rm Mpc^{-1}}$|, which results in 190 bins. We simplify our test case by compressing the power spectra into p = 95 bins, using the appropriate re-weighting by the number of modes in each k bin given in pylians3. Univariate CARPool gives the best results since we are using the smallest possible number of costly N-body simulations; for this reason, power spectrum estimates using the multivariate framework are not shown here. As we discuss in Appendix C, we intentionally run our fast surrogate (COLA) in a mode that produces a power spectrum that is highly biased compared to the full simulations, with a power deficit of more than 60 per cent on small scales.
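The bin compression mentioned above is simple enough to state explicitly. A numpy sketch, under our own naming, that merges adjacent bins with mode-count weights might look as follows:

```python
import numpy as np

def compress_pk(k, pk, n_modes, factor=2):
    """Merge groups of `factor` adjacent k-bins, weighting by mode counts.

    k, pk, n_modes : length-n arrays of bin centres, P(k) values, and the
    number of Fourier modes per bin (as provided by pylians3).
    Returns the compressed bin centres and power spectrum (190 -> 95 bins
    for factor=2).
    """
    m = (len(k) // factor) * factor          # drop a possible ragged tail
    k2d = k[:m].reshape(-1, factor)
    p2d = pk[:m].reshape(-1, factor)
    w2d = n_modes[:m].reshape(-1, factor).astype(float)
    w_tot = w2d.sum(axis=1)
    return (k2d * w2d).sum(axis=1) / w_tot, (p2d * w2d).sum(axis=1) / w_tot
```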
4.1.1 CARPool versus N-body estimates
Fig. 2 shows the estimated power spectrum with 95 per cent confidence intervals enlarged by a factor of 20 for better visibility. Only 5 N-body simulations are needed to compute an unbiased estimate of the power spectrum with much higher precision than 500 N-body runs on large scales and on the scale of Baryon Acoustic Oscillations (BAO). On small scales, confidence intervals are of comparable size.8
Figure 2. Estimated power spectrum with 500 N-body simulations versus 5 pairs of ‘N-body + cheap’ simulations, from which |$\widehat{\boldsymbol{\beta ^\mathrm{diag}}}$| is derived. The estimated 95 per cent confidence intervals are computed with the BCa bootstrap. They are enlarged by a factor of 20 for better visibility.
We must verify that these results are not produced by a ‘lucky’ set of 5 simulation pairs. To this end, we compute 100 CARPool means |$\boldsymbol{\bar{x}}(\widehat{\boldsymbol{\beta ^\mathrm{diag}}})$| from distinct sets of five random seeds. The CARPool estimates fall within sub-per cent accuracy relative to the sample mean from 15 000 N-body simulations, as illustrated by the upper panel of Fig. 3. The percentage error of the gadget sample mean of 500 simulations with respect to 15 000 simulations is plotted with 95 per cent confidence intervals. We stress here that every percentage error plot in this paper shows an error with respect to 15 000 N-body simulations. The mean of 500 gadget realizations is thus not at zero per cent, though the difference is very small.
Figure 3. Estimated power spectrum percentage error with respect to 15 000 N-body runs: 500 N-body simulations versus 100 sets of five pairs of ‘N-body + cheap’ simulations. Each set uses a distinct |$\widehat{\boldsymbol{\beta ^\mathrm{diag}}}$|, calculated with the same seeds used for |$\boldsymbol{\bar{x}}$|. The upper panel estimate uses |$\widehat{\boldsymbol{\beta ^\mathrm{diag}}}$| while the lower panel convolves the diagonal elements of |$\widehat{\boldsymbol{\beta ^\mathrm{diag}}}$| with a narrow top-hat window. Beta smoothing removes outliers and Gaussianizes the tails by effectively increasing the number of degrees of freedom for each β estimate. Both panels use the same random seeds. The estimated 95 per cent confidence intervals are plotted for the N-body sample mean only, using BCa bootstrap. The dark blue symbols show the 68 per cent percentile of the CARPool estimates ordered by the absolute value of the percentage error; the rest appears in light blue symbols.
4.1.2 Beta smoothing
Since we use a very small number of simulations, the estimates of the diagonal elements of |$\widehat{\boldsymbol{\beta ^\mathrm{diag}}}$| are noisy. This leads to heavy-tailed distributions for the CARPool estimates. Using the freedom we have to modify |$\boldsymbol{\beta }$| without affecting unbiasedness, we can exploit the fact that we expect neighboring bins to have similar optimal |$\boldsymbol{\beta }$|. Convolving the diagonal elements with a five-bin-wide top-hat window slightly reduces the spread at small scales of CARPool estimates computed with only five gadget power spectra, and removes outliers. The comparison of the two panels in Fig. 3 illustrates this point. Using a nine-bin-wide Hanning window for the smoothing yields similar results. We call this technique beta smoothing and use it with a five-bin-wide top-hat window in what follows.
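A minimal sketch of this smoothing, with an edge treatment of our own choosing (the text does not prescribe one), is:

```python
import numpy as np

def smooth_beta(beta_diag, width=5):
    """Convolve per-bin control coefficients with a top-hat window.

    Dividing by the convolved window mass renormalizes the average at the
    array edges, avoiding the zero-padding bias of a bare 'same' convolution.
    Any such modification of beta leaves the CARPool mean unbiased.
    """
    window = np.ones(width)
    num = np.convolve(beta_diag, window, mode='same')
    den = np.convolve(np.ones_like(beta_diag), window, mode='same')
    return num / den
```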
Both panels of Fig. 3 show the symmetric |$95{{\ \rm per\ cent}}$| confidence intervals of the surrogate mean with grey dashed lines. They represent the |$95{{\ \rm per\ cent}}$| error band likely to stem from the estimation of |$\boldsymbol{\mu _c}$|, relative to the mean of 15 000 gadget simulations; hence, at large scales especially, the CARPool means concentrate slightly away from zero percentage error. Though the unbiased estimator in equation (15) takes a precomputed cheap mean, the practitioner can decide to run more approximate simulations on the fly to improve the accuracy of |$\boldsymbol{\bar{\mu }_c}$|. Note that the CARPool means with 5 N-body simulations still land within the |$95{{\ \rm per\ cent}}$| confidence intervals from 500 gadget simulations, even at large scales where the difference due to the surrogate mean is visible.
Fig. 4 exhibits the convergence of one power spectrum bin at the BAO scale as we add more simulations: the |$95{{\ \rm per\ cent}}$| error band of the control variates estimate shrinks extremely fast compared to that of the N-body sample mean.
Figure 4. Convergence of a single k-bin at the BAO scale: the cumulative sample mean |$\boldsymbol{\bar{y}}$| of N-body simulations versus the sample mean |$\boldsymbol{\bar{x}(\widehat{\beta ^\mathrm{diag}})}$|. Confidence intervals take into account that |$\boldsymbol{\beta ^\mathrm{diag}}$| is estimated from the same number of samples used to compute the CARPool estimate of P(k).
4.1.3 Empirical variance reduction
The left-hand panel of Fig. 5 shows the empirical generalized variance reduction of the CARPool estimate compared to the standard estimate, as defined in equation (9). The vertical axis corresponds to the volume ratio of two parallelepipeds of dimension p = 95, in other words the volume ratio of error ‘boxes’ for two estimators. The determinant |$\det \left(\boldsymbol{\widehat{\Sigma _{yy}}}\right)$| is fixed because we take all 15 000 N-body simulations available in Quijote to compute the most accurate estimate of |$\boldsymbol{\Sigma _{yy}}$| we have access to, whereas |$\det \left(\boldsymbol{\Sigma _{xx}}(\boldsymbol{\hat{\beta }})\right)$| changes each time new simulation pairs are run. More precisely, for each data point in Fig. 5, we take the control matrix estimate computed with |$5k,k \in [[ 1,100 ]]$| simulation pairs and generate 3000 |$\boldsymbol{x}$| samples according to (14) to obtain an estimator of |$\boldsymbol{\Sigma _{xx}}$|. For that, we use 3000 Quijote simulations and 3000 additional l-picola surrogates run with the corresponding seeds.
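For p = 95, the determinants entering equation (9) overflow double precision if evaluated directly, so working with log-determinants is advisable. A numpy sketch of the ratio, under assumed array layouts of our own, is:

```python
import numpy as np

def log10_generalized_variance_ratio(X, Y_full):
    """log10 of det(Sigma_xx) / det(Sigma_yy) from two sample sets.

    X      : (n, p) CARPool samples generated as in equation (14).
    Y_full : (m, p) N-body samples (here all 15 000 Quijote spectra).
    slogdet avoids the floating-point under/overflow that a direct
    det(.) would hit for p = 95.
    """
    _, ld_x = np.linalg.slogdet(np.cov(X, rowvar=False))
    _, ld_y = np.linalg.slogdet(np.cov(Y_full, rowvar=False))
    return (ld_x - ld_y) / np.log(10.0)
```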
Figure 5. Left-hand panel: Generalized variance ratio for the power spectrum up to kmax ≈ 1.2 hMpc−1 as a function of the number of available simulations. Each |$\widehat{\boldsymbol{\beta }}$| and |$\widehat{\boldsymbol{\beta ^\mathrm{diag}}}$| serves to generate 3000 samples according to (14) to estimate the CARPool covariance matrix. Right-hand panel: Standard deviation reduction for each power spectrum bin due to CARPool. The blue and black curves use |$\widehat{\boldsymbol{\beta }}$| and |$\widehat{\boldsymbol{\beta ^\mathrm{diag}}}$| estimated with 500 samples. The dashed grey curve exhibits the actual standard deviation ratio when we have five samples only to compute |$\widehat{\boldsymbol{\beta ^\mathrm{diag}}}$|. |$\boldsymbol{\Sigma _{yy}}$| is estimated using all 15 000 available power spectra from the Quijote simulations.
The simpler univariate scheme outperforms the estimation of the optimal |$\boldsymbol{\beta ^{\star }}$| for |$N=5k,k \in [[ 1,100 ]]$|, corroborating the experiments of Section 4.1.1. Furthermore, the variance reduction granted by a sub-optimal diagonal |$\boldsymbol{\beta ^\mathrm{diag}}$| improves rapidly and reaches its apparent limit quickly. We suspect that the slight worsening of the variance reduction, when the number of available samples to estimate |$\boldsymbol{\beta ^{\star }}$| neighbors the vector size p, is linked to the eigenspectrum of |$\boldsymbol{\Sigma _{cc}^{\dagger }}$| and could be improved by projecting out the eigenmodes corresponding to the smallest, noisiest eigenvalues.
We depict the scale-dependent performance of CARPool for the matter power spectrum in the right-hand panel of Fig. 5. The vertical axis is the variance reduction to expect from the optimal control coefficients (or matrix). Namely, we take the data points of the left panel for 500 simulation/surrogate pairs, extract the diagonal of the covariance matrices, and divide the arrays. The blue and black curves show the variance reduction with respect to the sample mean of N-body simulations using all 500 simulation/surrogate pairs to estimate the control matrix. In practice, we estimate |$\boldsymbol{\beta }$| using only five simulation/surrogate pairs; does this noisy |$\boldsymbol{\hat{\beta }}$| lead to significant inefficiency? The grey dashed curve shows the actual standard deviation reduction brought by the rough estimate of |$\boldsymbol{\beta ^\mathrm{diag}}$| using five simulation pairs only, with which the results of Figs 2 and 3 are computed. A few k-bins fluctuate high but the variance reduction remains close to optimal, especially considering that only five simulations were used, and we have not attempted any further regularization except for beta smoothing.
4.2 Matter bispectrum
We compute the shot-noise corrected matter bispectrum in real space (Hahn et al. 2020; Villaescusa-Navarro et al. 2020), using pyspectrum 9 with Ngrid = 360 and bins of width |$\Delta k= 3k_\mathrm{f} = 1.885\times 10^{-2}\, h {\rm Mpc^{-1}}$|, where |$k_\mathrm{f} = \frac{2\pi }{L}\, h {\rm Mpc^{-1}}$| is the fundamental mode depending on the box size L. As in the previous section, we present only the results using |$\boldsymbol{\beta ^\mathrm{diag}}$| instead of |$\boldsymbol{\beta ^{\star }}$|. We examine two distinct sets of bispectrum coefficients: in the first case we study the bispectrum for squeezed isosceles triangles as a function of opening angle only, averaging over scale; in the second case we compute equilateral triangles as a function of k.
4.2.1 Squeezed isosceles triangles
We start the analysis by regrouping isosceles triangles (k1 = k2) and re-weighting the bispectrum monopoles for various |$k_3/k_1$| ratios in ascending order. Only squeezed triangles are considered here: |$\left(k_3/k_1\right)_\mathrm{max} = 0.20$| so that the dimension of |$\boldsymbol{y}$| is p = 98 (see Table 1).
4.2.2 CARPool versus N-body estimates
On the order of 5 samples are required to achieve a precision similar to that of the sample mean of 500 N-body simulations, as we show in Fig. 6 (upper panel). Fig. 7 (upper panel) corroborates the claim by showing the percentage error of 100 CARPool means using 5 costly simulations each. The reference is the mean of the 15 000 bispectra from the Quijote simulations. As in the previous section, we show the |$95{{\ \rm per\ cent}}$| error band due to estimation of the surrogate mean |$\boldsymbol{\mu _c}$| with dashed curves.
Figure 6. Upper panel: Estimated bispectrum for squeezed isosceles triangles with 500 N-body simulations versus 5 pairs of ‘N-body + cheap’ simulations, from which the smoothed |$\widehat{\boldsymbol{\beta ^\mathrm{diag}}}$| is derived. The estimated |$95{{\ \rm per\ cent}}$| confidence intervals are computed with the BCa bootstrap. They are enlarged by a factor of 20 for better visibility. Lower panel: As in the upper panel, but for the reduced bispectrum of equilateral triangles.
Figure 7. Upper panel: Estimated bispectra percentage error for squeezed isosceles triangles with respect to 15 000 N-body runs: 500 N-body simulations versus 100 sets of 5 pairs of ‘N-body + cheap’ simulations. Each set uses a distinct |$\widehat{\boldsymbol{\beta ^\mathrm{diag}}}$|, calculated with the same seeds used in |$\boldsymbol{\bar{x}}$| and smoothed by a five-bin-wide flat window. The estimated |$95{{\ \rm per\ cent}}$| confidence intervals are plotted for the N-body sample mean only, using BCa bootstrap. The dark blue symbols show the |$68{{\ \rm per\ cent}}$| percentile of the CARPool estimates ordered by the absolute value of the percentage error; light-blue symbols represent the rest. Lower panel: As in the upper panel, but for the reduced bispectrum of equilateral triangles.
4.2.3 Empirical variance reduction
As for the power spectrum, the upper left-hand panel of Fig. 8 shows that the generalized variance reduction is much more significant when separately estimating control coefficients for each triangle configuration. The right-hand side of the curve suggests an increasing improvement of the multivariate case, but in this range of required sample numbers the variance reduction scheme loses its appeal. We have used 1800 additional simulations to compute the covariance matrices entering the generalized variance estimates. In the upper right-hand panel of the figure, the calculation of the standard deviation ratio for each triangle configuration follows the same logic as in Section 4.1.3. The grey dashed curve corresponds to the standard deviation reduction brought by control coefficients (i.e. the univariate CARPool framework) estimated with 5 simulation/surrogate pairs only.
Figure 8. Upper left-hand panel: Generalized variance ratio of bispectrum for squeezed isosceles triangles as a function of the number of available simulations. Each |$\widehat{\boldsymbol{\beta }}$| and |$\widehat{\boldsymbol{\beta ^\mathrm{diag}}}$| serves to generate 1800 samples according to (14) to estimate the CARPool covariance matrix. Upper right-hand panel: Standard deviation reduction for each squeezed isosceles triangle to expect from CARPool. The blue and black curves respectively use |$\widehat{\boldsymbol{\beta }}$| and |$\widehat{\boldsymbol{\beta ^\mathrm{diag}}}$| estimated with 500 samples. The dashed grey curve exhibits the actual standard deviation ratio when we have five samples only to compute |$\widehat{\boldsymbol{\beta ^\mathrm{diag}}}$|. |$\boldsymbol{\Sigma _{yy}}$| is estimated with all 15 000 available bispectra from the Quijote simulations. Lower panels: As in the upper panels, but for the reduced bispectrum of equilateral triangles.
4.2.4 Equilateral triangles
Here, we analyse equilateral triangles with the modulus of k1 = k2 = k3 varying up to |$k_\mathrm{max} = 0.75\, h {\rm Mpc^{-1}}$| (p = 40). For better visibility, we show the reduced bispectrum monopole Q(k1, k2, k3).
4.2.5 CARPool versus N-body estimates
Similarly to the previous set of triangle configurations, we compare the precision of the CARPool estimator using 5 N-body simulations with that of the sample mean from 500 gadget runs. Fig. 6 (lower panel) exhibits the estimated reduced bispectrum with five seeds, while Fig. 7 (lower panel) shows the relative error of various CARPool sets with respect to the reference from 15 000 N-body samples.
4.2.6 Empirical variance reduction
In Fig. 8 (lower panels), we observe a trend similar to that of the previous experiments: the univariate control coefficients are much better than the control matrix in terms of generalized variance reduction for a realistic number of full N-body simulations.
4.3 Probability density function of smoothed matter fractional overdensity
The power spectrum and the bispectrum are Fourier-space statistics. How does CARPool fare on a purely direct-space statistic? In the Quijote simulations, the probability density function of the matter fractional overdensity, or the matter PDF, is computed on a grid with Ngrid = 512, smoothed by a top-hat filter of radius R. There are 100 histogram bins in the range |$\rho /\bar{\rho } \in \left[ 10^{-2}, 10^{2}\right]$|. We work with the R = 5 h−1Mpc case and restrict the estimation of the PDF to the interval |$\rho /\bar{\rho } \in \left[ 8\times 10^{-2}, 5\times 10^1\right]$| that contains p = 70 bins. Note that we intentionally do not do anything to improve the correspondence of the surrogate and simulation histograms, an example of which is displayed in Fig. 9.
Figure 9. Probability density function of the smoothed matter fractional overdensity of gadget-iii and l-picola snapshots at z = 0.5 for the same initial conditions. The characteristics of l-picola are provided in Table 3.
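For reference, a self-contained numpy sketch of the measurement described above (top-hat smoothing of the CiC density contrast in Fourier space, then histogramming |$\rho /\bar{\rho }$|) could read as follows; pylians3 provides optimized equivalents, and everything here, names included, is illustrative:

```python
import numpy as np

def smoothed_density_pdf(delta, box_size, R=5.0):
    """Top-hat smooth a density-contrast grid, then histogram rho/rho_bar.

    delta    : (Ng, Ng, Ng) CiC density contrast.
    box_size : box side in the same length units as R (h^-1 Mpc here).
    """
    Ng = delta.shape[0]
    k1d = 2.0 * np.pi * np.fft.fftfreq(Ng, d=box_size / Ng)
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing='ij', sparse=True)
    kR = np.sqrt(kx**2 + ky**2 + kz**2) * R
    kR[0, 0, 0] = 1.0                      # dummy value to avoid 0/0
    W = 3.0 * (np.sin(kR) - kR * np.cos(kR)) / kR**3
    W[0, 0, 0] = 1.0                       # W(kR) -> 1 as kR -> 0
    delta_s = np.fft.ifftn(np.fft.fftn(delta) * W).real
    rho = 1.0 + delta_s                    # rho / rho_bar
    bins = np.logspace(-2, 2, 101)         # the 100 bins quoted in the text
    pdf, edges = np.histogram(rho, bins=bins, density=True)
    return pdf, edges
```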
4.3.1 Empirical variance reduction
For the matter PDF, we show the empirical variance reduction results before the actual estimates: Fig. 10 shows that the variance reduction is much milder for the PDF than for the power spectrum or the bispectrum, both for the univariate and multivariate CARPool frameworks. While the multivariate case does eventually lead to significant gains, CARPool needs |$\mathcal {O}(100)$| simulations to learn how to map density contrast in COLA outputs to density contrast in gadget-iii simulations. While COLA places overdense structures close to the right position, their density contrast is typically underestimated, meaning a level set of the COLA output is informative about a different level set of the gadget-iii simulation.
Figure 10. Left-hand panel: Generalized variance ratio of the matter PDF as a function of the number of available simulations. Each |$\widehat{\boldsymbol{\beta }}$| and |$\widehat{\boldsymbol{\beta ^\mathrm{diag}}}$| serves to generate 1800 samples according to (14) to estimate the CARPool covariance matrix. Right-hand panel: Standard deviation reduction for each PDF bin to expect from CARPool. The blue and black curves respectively use |$\widehat{\boldsymbol{\beta }}$| and |$\widehat{\boldsymbol{\beta ^\mathrm{diag}}}$| estimated with 500 samples. The dashed grey curve exhibits the actual standard deviation ratio when we have 10 samples only to compute |$\widehat{\boldsymbol{\beta ^\mathrm{diag}}}$|. |$\boldsymbol{\Sigma _{yy}}$| is estimated with all 15 000 available PDFs from the Quijote simulations.
The right-hand panel nonetheless shows that it is possible to reduce the variance of the one-point PDF with CARPool, unlike with paired-fixed fields (Villaescusa-Navarro et al. 2018). As for the bispectrum, we took the data outputs of 1800 additional simulations to compute the covariance matrices entering the generalized variance and standard error estimates.
4.3.2 CARPool versus N-body estimates
For the matter PDF we compare CARPool estimates in both the multivariate and univariate settings. Figs 11 and 12 are paired and show the comparable performance at the tails of the estimated PDF for the smoothed |$\widehat{\boldsymbol{\beta ^\mathrm{diag}}}$| with 50 samples on the one hand, and the dense |$\widehat{\boldsymbol{\beta }}$| matrix obtained with 125 simulations on the other. We can expect an order of magnitude fewer N-body simulations to be needed to compute an accurate estimate of the PDF when applying the simple univariate CARPool technique (50 instead of 500 here). As discussed above, with enough simulations CARPool can learn the mapping between the density contrasts of COLA and gadget outputs. Therefore, the matter PDF is a case where the multivariate framework, which involves the estimation of p × p covariance matrices, shows improvement over the more straightforward univariate case once the number of available simulation pairs passes a threshold.
Figure 11. Estimated matter PDF with 500 N-body simulations versus CARPool estimates. |$\widehat{\boldsymbol{\beta ^\mathrm{diag}}}$| is used in the upper panel whereas the full control matrix is computed in the lower panel. The estimated |$95{{\ \rm per\ cent}}$| confidence intervals are computed with the BCa bootstrap. They are enlarged by a factor of 40 for better visibility.
Figure 12. Estimated matter PDF percentage error with respect to 15 000 N-body runs: sample mean of 500 N-body simulations versus CARPool estimates. In the upper panel, |$\widehat{\boldsymbol{\beta ^\mathrm{diag}}}$| is used for each set and smoothed by a five-bin-wide flat window. In the lower panel, the full control matrix |$\widehat{\boldsymbol{\beta }}$| is estimated for each group of seeds. The estimated |$95{{\ \rm per\ cent}}$| confidence intervals are plotted for the N-body sample mean only, using BCa bootstrap.
While we wanted to test the performance of CARPool with minimal tuning, we expect that with some mild additional assumptions and tuning the univariate CARPool approach could be improved and similar gains to the multivariate case could be obtained with a smaller number of simulations. As an example, one could pre-process the COLA outputs to match the PDF (and power spectrum) of gadget-iii using the approach described in Leclercq et al. (2013) to guarantee a close correspondence between bins of density contrast. In addition, a regularizing assumption would be to consider transformations from COLA to gadget-iii density contrasts that are smooth and monotonic.
4.4 Summary of results
Here we present a summary of the variance reduction observed in our numerical experiments. With M = 1500 additional fast simulations reserved for estimating the cheap mean |$\boldsymbol{\bar{\mu }_c}$|, and with percentage errors relative to the mean of 15 000 full N-body runs available in Quijote, we find:
With only 5 N-body simulations, the univariate CARPool technique recovers the 95-bin power spectrum up to |$k_\mathrm{max} \approx 1.2\, h {\rm Mpc^{-1}}$| within the |$0.5{{\ \rm per\ cent}}$| error band, when the control coefficients are smoothed.
For the bispectrum of 98 squeezed isosceles triangle configurations, the recovery is within |$2{{\ \rm per\ cent}}$| when 5 N-body simulations are available, and |$1{{\ \rm per\ cent}}$| when we have 10 of them, still with the smoothed |$\widehat{\boldsymbol{\beta ^\mathrm{diag}}}$|.
The bispectrum estimator of equilateral triangles on 40 bins falls within the |$2{{\ \rm per\ cent}}$| (resp. |$1{{\ \rm per\ cent}}$|) error band with 5 simulations (resp. 10) at large k, and performs better than the mean of 500 gadget simulations at large scales.
The standard deviation of matter PDF bins can also be reduced with CARPool, by factors between 3 and 10, implying that the number of required costly simulations is lowered by an order of magnitude.
In Appendix B, we provide the power spectrum and bispectrum results when the CARPool means are computed with 10 simulation/surrogate pairs instead of the 5 pairs presented so far.
5 DISCUSSION AND CONCLUSIONS
We presented CARPool, a general scheme for reducing variance on estimates of large-scale structure statistics. It operates on the idea of forming a combination (pooling) of a small number of accurate simulations with a larger number of fast but approximate surrogates in such a way as to not introduce systematic error (zero bias) on the combination. The result is equivalent to having run a much larger number of accurate simulations. This approach is particularly adapted to cosmological applications where our detailed physical understanding has resulted in a number of perturbative and non-perturbative methods to build fast surrogates for high-accuracy cosmological simulations.
To demonstrate the operation and promise of the technique, we computed high-accuracy, low-variance predictions for statistics of gadget-iii cosmological N-body simulations in the ΛCDM model at z = 0.5. Many surrogate methods are available; for illustration we selected the approximate particle-mesh COLA solver l-picola.
For three different statistics, the matter power spectrum, the matter bispectrum, and the probability density function of the matter fractional overdensity, CARPool reduces the variance by factors of 10 to 100 even in the non-linear regime, and by much larger factors on large scales. Using only five gadget-iii simulations, CARPool computes Fourier-space two-point and three-point functions of the matter distribution at a precision comparable to that of 500 gadget-iii simulations.
CARPool requires (i) inexpensive access to surrogate solutions, and (ii) strong correlations of the fluctuations about the mean of the surrogate model with the fluctuations of the expensive and accurate simulations. By construction, CARPool estimates are unbiased compared to the full simulations no matter how biased the surrogates might be. In all our examples, we achieved substantial variance reductions even though the fast surrogate statistics were highly biased compared to the full simulations.
So far we have presented CARPool as a way to accelerate the convergence of ensemble averages of accurate simulations. An equivalent point of view would be to consider it a method to remove approximation error from ensembles of fast mocks by running a small number of full simulations. Such simulations often already exist, as in our case with the Quijote simulations, not least because strategies to produce fast surrogates are often tested against a small number of simulations.
In some cases there are opportunities to use CARPool almost for free: for instance, using linear theory from the initial conditions as a surrogate model has the advantage that |$\boldsymbol{\mu _c}$| (the mean linear theory power spectrum) is perfectly known a priori. In addition, the de-correlation between linearly and non-linearly evolved perturbations is well studied, and can be used to set |$\boldsymbol{\beta }$|. Even for just a single N-body simulation, and without the need to estimate |$\boldsymbol{\mu _c}$| from an ensemble of surrogates, this would remove cosmic variance on the largest scales better than in our numerical experiments with l-picola, which are limited by the uncertainty of the |$\boldsymbol{\mu _c}$| estimate.
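A hedged sketch of this 'free' variant: with an analytic linear-theory mean and a physically motivated decorrelation model, a single N-body realization can be debiased on large scales. All names are ours, and the inputs rho_k and sigma_ratio_k would come from a model of the linear-to-non-linear decorrelation rather than from surrogate ensembles:

```python
def carpool_single_linear(pk_nbody, pk_linear_ic, pk_linear_theory,
                          rho_k, sigma_ratio_k):
    """CARPool with linear theory as the surrogate, for one N-body run.

    pk_nbody         : non-linear P(k) of the single simulation (array)
    pk_linear_ic     : linear P(k) evolved from the same initial conditions
    pk_linear_theory : analytic ensemble mean of the linear P(k)  (mu_c)
    rho_k            : modelled cross-correlation between the two estimates
    sigma_ratio_k    : modelled ratio sigma_y(k) / sigma_c(k)
    """
    beta = rho_k * sigma_ratio_k  # optimal univariate control coefficient
    return pk_nbody - beta * (pk_linear_ic - pk_linear_theory)
```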
Regardless of the details of the implementation, the reduction of sample variance on observables could be used to avoid having to run ensembles of simulations (or even surrogates) at the full survey volume. This would simplify simulation efforts for upcoming large surveys since memory limitations rather than computational time are currently the most severe bottleneck for full-survey simulations (Potter et al. 2017).
In comparison to other methods of variance reduction, CARPool has the main advantage of guaranteeing lack of model error (‘bias’) compared to the full simulation. ‘Fixing’ (Angulo & Pontzen 2016; Pontzen et al. 2016) explicitly modifies the statistics of the generated simulation outputs; which observables are unbiased must be checked on a case-by-case basis, either through theoretical arguments or through explicit simulation (Villaescusa-Navarro et al. 2018). Klypin, Prada & Byun (2020) argue that ‘fixed’ field initialization is unsuitable for simulation suites to estimate accurate covariance matrices, and they are pessimistic about the possibility of generating mock galaxy catalogues solely with this technique.
Pontzen et al. (2016) and Angulo & Pontzen (2016) also introduce and study the ‘pairing’ technique. ‘Pairing’ reduces variance for k-space observables (such as the power spectrum) by a factor of |$\mathcal {O}(1)$| by combining two simulations whose initial conditions differ only by an overall minus sign, that is, they are phase-flipped. This technique can be analysed simply in the control variates framework of CARPool: consider the phase-flipped simulation as the surrogate. By symmetry, the mean of an ensemble of phase-flipped simulations is identical to the mean of the unflipped simulations. ‘Pairing’ then amounts to taking |$\beta=-1$|, cancelling the contributions of odd-order terms in the initial conditions (Angulo & Pontzen 2016; Pontzen et al. 2016) and thereby reducing the variance of the simulation output. Inserting this |$\beta$| into equation (2) and taking the expectation shows that ‘pairing’ is an unbiased estimator of the simulation mean.
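Explicitly, inserting |$\beta = -1$| into the combination |$\boldsymbol{y}-\boldsymbol{\beta }\left(\boldsymbol{c} - \boldsymbol{\mu _c} \right)$| gives |$\boldsymbol{y} + \boldsymbol{c} - \boldsymbol{\mu _c}$|, so that |$\mathrm{E}\left[\boldsymbol{y} + \boldsymbol{c} - \boldsymbol{\mu _c}\right] = \boldsymbol{\mu _y} + \boldsymbol{\mu _c} - \boldsymbol{\mu _c} = \boldsymbol{\mu _y}$|: the phase-flipped surrogate cancels the odd-order fluctuations without shifting the mean.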
Other opportunities for exploiting the control variates principle abound, and related ideas have been used in the past. For example, a very recent study (Smith et al. 2021) succeeds in reducing the variance of the quadrupole estimator of the two-point clustering statistic in redshift space by combining different, correlated lines of sight through the halo catalogue of the Outer Rim simulation. Although not derived from a general theoretical framework guaranteeing unbiasedness and optimal variance reduction, their approach requires no pre-computed fast surrogates for the specific application at hand and uses a control matrix set by physical assumptions.
While we intentionally refrained from tuning CARPool for this first study, there are opportunities to use physical insight to adapt it to cosmological applications. For instance, the one-point remapping technique proposed by Leclercq et al. (2013), which increases the cross-correlation between LPT-evolved density fields and full N-body simulations, could improve the snapshots of a chosen surrogate before applying CARPool.
In future work, we plan to explore intermediate forms of CARPool between the multivariate and univariate versions we study in this paper. Any given entry of |$\boldsymbol{y}$| could be predicted by an optimal combination of a small subset of |$\boldsymbol{c}$|. In this case, the variance reduction could be improved compared to the univariate case while the reduced dimension of the control matrix would ensure a stable estimate using a moderate number of simulations.
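One such intermediate scheme is a banded control matrix, in which each bin of |$\boldsymbol{y}$| is regressed on a narrow band of neighbouring bins of |$\boldsymbol{c}$|. The sketch below, with hypothetical names and assuming paired sample matrices as in the earlier sketch, illustrates the idea:

```python
import numpy as np

def banded_beta(y, c, half_width=2):
    """Estimate a banded control matrix: bin j of y is regressed on the
    (2*half_width + 1) neighbouring bins of c only, so far fewer
    coefficients are fitted than in the full p-by-p control matrix.

    y, c : (N, p) paired samples; returns beta of shape (p, p), banded.
    """
    N, p = y.shape
    dy = y - y.mean(axis=0)
    dc = c - c.mean(axis=0)
    beta = np.zeros((p, p))
    for j in range(p):
        lo, hi = max(0, j - half_width), min(p, j + half_width + 1)
        # least-squares fit of bin j of y on the nearby bins of c
        coef, *_ = np.linalg.lstsq(dc[:, lo:hi], dy[:, j], rcond=None)
        beta[j, lo:hi] = coef
    return beta
```

The CARPool combination then uses |$\boldsymbol{y} - \boldsymbol{\beta }\left(\boldsymbol{c} - \boldsymbol{\mu _c}\right)$| with this banded |$\boldsymbol{\beta }$|, which remains stable with a moderate number of simulation pairs.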
The CARPool setup can be applied to numerous ‘N-body code plus surrogate’ pairs in cosmology. It can be used to make high-resolution corrections to low-resolution simulations while reducing variance. This will provide an alternative to the procedure suggested by Rasera et al. (2014), where the mass resolution effect is estimated by a polynomial fit of the matter power spectrum ratio, and to the work of Blot et al. (2015), where a linear transformation of the low-resolution power spectra, preserving the mean and variance, is smoothed by a polynomial fit. Furthermore, rather than relying on a single surrogate, exploiting multiple low-fidelity methods for variance reduction is also worth exploring, especially if the cost of running a large number of surrogates is non-negligible. For instance, taking linear theory as a second surrogate in addition to l-picola would have strongly reduced the number of l-picola runs required to match the variance of the |$\boldsymbol{\mu _c}$| estimate to the massively reduced variance of |$\boldsymbol{y}-\boldsymbol{\beta }\left(\boldsymbol{c} - \boldsymbol{\mu _c} \right)$|. In this regard, the multifidelity Monte Carlo scheme of Peherstorfer et al. (2016) and the approximate control variates framework of Gorodetsky et al. (2020) are recent techniques that reduce variance with multiple surrogates for a fixed computational budget. CARPool can also be combined with other techniques: for instance, if the paired-fixed fields initialization of Angulo & Pontzen (2016) is found to be unbiased in practice for a particular statistic, it can be combined with CARPool for further variance reduction.
The simplicity of the theory behind CARPool makes the method attractive for various applications both in and beyond cosmology, as long as the conditions given above are satisfied. Our results suggest that CARPool allows estimating the expectation values of any desired large-scale structure correlators with negligible variances from a small number of accurate simulations, thereby providing a useful complement to analytical approaches such as higher-order perturbation theory or effective field theory. We are planning to explore a number of these applications in upcoming publications.
ACKNOWLEDGEMENTS
We thank Martin Crocce, Janis Fluri, Cullan Howlett, and Hans Arnold Winther for their advice on COLA, and Boris Leistedt for stimulating discussions. We are grateful to Pier-Stefano Corasaniti, Eiichiro Komatsu, Marius Millea, Andrew Pontzen, Yann Rasera, and Matias Zaldarriaga for stimulating comments on an earlier version of the manuscript. Nicolas Chartier acknowledges funding from LabEx ENS-ICFP (PSL). Benjamin Wandelt acknowledges support by the ANR BIG4 project, grant ANR-16-CE23-0002 of the French Agence Nationale de la Recherche; and the Labex ILP (reference ANR-10-LABX-63) part of the Idex SUPER, and received financial state aid managed by the Agence Nationale de la Recherche, as part of the programme Investissements d’avenir under the reference ANR-11-IDEX-0004-02. The Flatiron Institute is supported by the Simons Foundation. Yashar Akrami is supported by LabEx ENS-ICFP: ANR-10-LABX-0010/ANR-10-IDEX-0001-02 PSL*. Francisco Villaescusa-Navarro acknowledges funding from the WFIRST program through NNG26PJ30C and NNN12AA01C.
DATA AVAILABILITY
The data underlying this article are available through globus.org, and instructions can be found at https://github.com/franciscovillaescusa/Quijote-simulations. Additionally, a python3 package and code examples are provided at https://github.com/CompiledAtBirth/pyCARPool to reproduce some results presented in this study.
Footnotes
As a jargon reminder, the accuracy and precision of an estimate refer, respectively, to the trueness of its expectation (the statistical bias) and to the spread of the estimate around that expectation (standard errors, confidence intervals).
We will consider surrogates to be much faster than simulations, so that we only need to consider the number of simulations to evaluate computational cost.
The intuition behind this principle is that for two random scalars a and b, we have |$\sigma _{a+b}^2 = \sigma _{a}^2 + \sigma _{b}^2+2\mathrm{cov}(a,b)$|.
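Replacing |$b$| by |$-\beta b$| in this identity gives |$\sigma _{a-\beta b}^2 = \sigma _{a}^2 + \beta ^2\sigma _{b}^2 - 2\beta \,\mathrm{cov}(a,b)$|, which is minimized at |$\beta ^{\star } = \mathrm{cov}(a,b)/\sigma _{b}^2$|, leaving the residual variance |$\sigma _{a}^2\left(1 - \rho _{ab}^2\right)$|, where |$\rho _{ab}$| is the correlation coefficient of a and b.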
Instructions to access the data are given at https://github.com/franciscovillaescusa/Quijote-simulations
The parallelized version of the code is available at http://cosmo.nyu.edu/roman/2LPT/
Available at https://github.com/cgevans/scikits-bootstrap
Available at https://github.com/franciscovillaescusa/Pylians3
While bootstrap is robust for estimating the |$95{{\ \rm per\ cent}}$| error bars of a sample mean over 500 simulations, it is not equally reliable with a very small number of realizations. This leads to large bin-to-bin variations of the estimated CARPool confidence intervals in Fig. 2. An alternative, parametric computation of confidence intervals with very few samples, using Student t-score values, can be found in Appendix B.
Available at https://github.com/changhoonhahn/pySpectrum
REFERENCES
APPENDIX A: ANALYTICAL DERIVATION: A BAYESIAN APPROACH
There is an elegant Bayesian derivation of the optimal form of the control variates estimator for the Gaussian case. The result coincides with the minimum variance estimator even in the non-Gaussian case. As in the derivation by Rubinstein & Marcus (1985), the covariance matrices of the full simulations |$\boldsymbol{y}$| and of the fast simulations |$\boldsymbol{c}$| are assumed to be known. In the main text, we use non-parametric approaches to estimate uncertainties since |$\boldsymbol{\beta }$| is not known a priori but estimated from the same simulations that we use to estimate |$\boldsymbol{\mu _y}$|.
APPENDIX B: ADDITIONAL INSIGHT ON RESULTS AND CONFIDENCE INTERVALS
Because the trustworthiness of confidence intervals for a sample mean estimated from very few realizations is debatable, we provide here, for the power spectrum only, Fig. B1 with bootstrap confidence intervals from 10 CARPool samples, and Fig. B2 for CARPool with 5 and 10 N-body simulations but with t-score intervals computed according to equation (B1). The latter figure is to be compared with Figs 2 and B1 (exactly the same data except for the blue CARPool confidence intervals). The paired plots agree, and the symmetric t-score confidence intervals tend to be larger. Additionally, for the two- and three-point clustering statistics, we present in Figs B3 (power spectrum) and B4 (bispectrum) the percentage errors of the CARPool means with 10 simulations, which are not shown in the main part of the paper.
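For concreteness, a minimal sketch of a symmetric Student-t interval of this kind, assuming equation (B1) is the standard construction and with variable names of our choosing:

```python
import numpy as np
from scipy import stats

def t_confidence_interval(samples, level=0.95):
    """Symmetric Student-t confidence interval for the mean of each bin.

    samples : (N, p) array of N realizations of a p-bin statistic.
    Returns (lower, upper), each of shape (p,).
    """
    N = samples.shape[0]
    mean = samples.mean(axis=0)
    sem = samples.std(axis=0, ddof=1) / np.sqrt(N)  # standard error of the mean
    t = stats.t.ppf(0.5 + level / 2.0, df=N - 1)    # e.g. t_{0.975, N-1}
    return mean - t * sem, mean + t * sem
```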

As in Fig. 2, but with 10 N-body simulations used for the CARPool estimate.

As in the lower panel of Fig. 3, but with 10 N-body simulations used for the CARPool estimate.

As in Fig. 7, but with 10 N-body simulations used for the CARPool estimate.
We also provide in Fig. B5 an overview of the optimal control matrix |$\boldsymbol{\beta ^{\star }}$| from equation (8) for the matter power spectrum and matter PDF test cases.

Estimated matrices entering |$\boldsymbol{\beta ^{\star }}$| for the matter power spectrum (left) and the matter PDF (right). The cross-covariance, covariance, and precision matrices are normalized, i.e. we display |$\boldsymbol{D} ^{-1}\boldsymbol{\widehat{\Sigma }}\boldsymbol{D} ^{-1}$| with |$\boldsymbol{D} = \sqrt{\mathrm{diag}\left({\boldsymbol{\widehat{\Sigma }}}\right)}$|. ‘od’ denotes the fractional overdensity bin |$\rho /\bar{\rho }$|. For better visibility, the diverging colour scale is not forced to be centred at 0.0 for the |$\boldsymbol{\Sigma _{yc}}$| and |$\boldsymbol{\Sigma _{cc}}$| estimates in the upper left corner (power spectrum). All matrices are estimated from 500 simulation pairs and represent the ‘close to optimal’ |$\boldsymbol{\beta ^{\star }}$| towards which the control matrix estimator tends in the multivariate setting.
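The normalization used in Fig. B5 is the usual reduction of a covariance to a correlation-like matrix; a one-line sketch, with a function name of our choosing:

```python
import numpy as np

def normalize_covariance(sigma):
    """Return D^{-1} Sigma D^{-1}, with D = sqrt(diag(Sigma))."""
    d = np.sqrt(np.diag(sigma))
    return sigma / np.outer(d, d)
```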
APPENDIX C: COLA TIMESTEPPING AND CROSS-CORRELATION COEFFICIENTS
The numerator in (C3) is the cross power spectrum between the two aforementioned density contrast fields; |$\delta (\boldsymbol{k})$| is the Fourier transform of the real-space density contrast |$\delta (\boldsymbol{x})$|. Note that these coefficients serve as a proxy for the correlation between the COLA and gadget snapshots, but do not estimate the canonical cross-correlations of (10) between the statistics |$\boldsymbol{y}$| and |$\boldsymbol{c}$| computed from these snapshots. Having tested different schemes, we concluded that linearly spaced timesteps yield a better cross-correlation than logarithmically spaced ones, and that the fewer the timesteps, the more influential the modified timestepping parameter nLPT becomes for the cross-correlation coefficients (in the case of this study, with a very high starting redshift of zi = 127). Fig. C1 shows an example with 10 and 20 linearly spaced timesteps and nLPT ∈ {−2.5, +0.5} (the fiducial value and our experimentally ‘best’ value, respectively). Although ζyc(k = 1.0 hMpc−1) ≈ 0.96 with 10 timesteps exceeds ζyc(k = 1.0 hMpc−1) ≈ 0.94 with 20 timesteps for nLPT = +0.5, we nevertheless generated our l-picola snapshots with 20 timesteps between zi = 127 and z = 0.0, again to avoid tuning l-picola for any one particular statistic. In any case, even with 20 timesteps the l-picola surrogates are much faster than full gadget-iii simulations.
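As a sketch of how such coefficients can be measured from two gridded density-contrast fields, ignoring mass-assignment deconvolution and shot-noise corrections, and with names and binning choices of our own:

```python
import numpy as np

def cross_correlation_zeta(delta_y, delta_c, boxsize, n_bins=50):
    """Cross-correlation coefficients zeta_yc(k) between two gridded
    density contrast fields (a sketch of equation C3): the binned cross
    power spectrum divided by the root of the two auto power spectra."""
    n = delta_y.shape[0]
    dk_y = np.fft.rfftn(delta_y)
    dk_c = np.fft.rfftn(delta_c)
    # physical |k| of each grid mode, in units of the fundamental mode kf
    kf = 2.0 * np.pi / boxsize
    k_full = np.fft.fftfreq(n, d=1.0 / n) * kf   # full-axis mode numbers
    k_half = np.fft.rfftfreq(n, d=1.0 / n) * kf  # half-axis (rfft) modes
    kmag = np.sqrt(k_full[:, None, None] ** 2
                   + k_full[None, :, None] ** 2
                   + k_half[None, None, :] ** 2)
    bins = np.linspace(kf, kmag.max(), n_bins + 1)
    idx = np.digitize(kmag.ravel(), bins)
    # bin the auto and cross spectra; empty bins yield NaN in the ratio
    p_yy = np.bincount(idx, weights=np.abs(dk_y.ravel()) ** 2,
                       minlength=n_bins + 2)[1:n_bins + 1]
    p_cc = np.bincount(idx, weights=np.abs(dk_c.ravel()) ** 2,
                       minlength=n_bins + 2)[1:n_bins + 1]
    p_yc = np.bincount(idx, weights=(dk_y * np.conj(dk_c)).real.ravel(),
                       minlength=n_bins + 2)[1:n_bins + 1]
    k_centres = 0.5 * (bins[:-1] + bins[1:])
    return k_centres, p_yc / np.sqrt(p_yy * p_cc)
```

The normalization of the spectra cancels in the ratio, so the raw FFT amplitudes suffice for this diagnostic.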

Power spectrum recovery ratio (top) and cross power spectrum coefficients (bottom) at z = 0.5 between a specific l-picola snapshot computed with 10 (left) and 20 (right) linearly-spaced timesteps and the corresponding N-body snapshot derived from the same initial conditions at zi = 127.