Nicolas Chartier, Benjamin Wandelt, Yashar Akrami, Francisco Villaescusa-Navarro, CARPool: fast, accurate computation of large-scale structure statistics by pairing costly and cheap cosmological simulations, Monthly Notices of the Royal Astronomical Society, Volume 503, Issue 2, May 2021, Pages 1897–1914, https://doi.org/10.1093/mnras/stab430
ABSTRACT
To exploit the power of next-generation large-scale structure surveys, ensembles of numerical simulations are necessary to give accurate theoretical predictions of the statistics of observables. High-fidelity simulations come at a towering computational cost. Therefore, approximate but fast simulations, surrogates, are widely used to gain speed at the price of introducing model error. We propose a general method that exploits the correlation between simulations and surrogates to compute fast, reduced-variance statistics of large-scale structure observables without model error at the cost of only a few simulations. We call this approach Convergence Acceleration by Regression and Pooling (CARPool). In numerical experiments with intentionally minimal tuning, we apply CARPool to a handful of gadget-iii N-body simulations paired with surrogates computed using COmoving Lagrangian Acceleration. We find ∼100-fold variance reduction even in the non-linear regime, up to |$k_\mathrm{max} \approx 1.2\, h {\rm Mpc^{-1}}$| for the matter power spectrum. CARPool realizes similar improvements for the matter bispectrum. In the nearly linear regime CARPool attains far larger sample variance reductions. By comparing to the 15 000 simulations from the Quijote suite, we verify that the CARPool estimates are unbiased, as guaranteed by construction, even though the surrogate misses the simulation truth by up to |$60{{\ \rm per\ cent}}$| at high k. Furthermore, even with a fully configuration-space statistic like the non-linear matter density probability density function, CARPool achieves unbiased variance reduction factors of up to ∼10, without any further tuning. Conversely, CARPool can be used to remove model error from ensembles of fast surrogates by combining them with a few high-accuracy simulations.
1 INTRODUCTION
The next generation of galaxy surveys will provide a detailed chart of cosmic structure and its growth on our cosmic light cone. These include the Euclid space telescope (Laureijs et al. 2011; Euclid Collaboration 2020), the Dark Energy Spectroscopic Instrument (DESI; DESI Collaboration 2016a, b), the Rubin Observatory Legacy Survey of Space and Time (LSST; Ivezić et al. 2019; LSST Science Collaboration 2009; LSST Dark Energy Science Collaboration 2018), the Square Kilometre Array (SKA; Yahya et al. 2015; Square Kilometre Array Cosmology Science Working Group 2020), the Wide Field InfraRed Survey Telescope (WFIRST; Spergel et al. 2015), the Subaru Hyper Suprime-Cam (HSC) and Prime Focus Spectrograph (PFS) surveys (Aihara et al. 2018; Tamura et al. 2016), and the Spectro-Photometer for the History of the Universe, Epoch of Reionization, and Ices Explorer (SPHEREx; Doré et al. 2014, 2018). These data sets will provide unprecedented statistical power to constrain the initial perturbations, the growth of cosmic structure, and the cosmic expansion history. To access this information requires accurate theoretical models of large-scale structure statistics, such as power spectra and bispectra. While analytical work, such as standard perturbation theory (Jain & Bertschinger 1994; Goroff et al. 1986), Lagrangian perturbation theory (LPT; Bouchet et al. 1995; Matsubara 2008), renormalized perturbation theory (Crocce & Scoccimarro 2006), and effective field theory (Carrasco, Hertzberg & Senatore 2012; Vlah, White & Aviles 2015; Perko et al. 2016), has made great strides [see also Bernardeau et al. (2002), Desjacques, Jeong & Schmidt (2018) for reviews], the reference models for large-scale structure are based on computationally intensive N-body simulations that compute the complex non-linear regime of structure growth. In recent years, the BACCO simulation project (Angulo et al. 2020), the Outer Rim Simulation (Heitmann et al. 2019), the Aemulus project I (DeRose et al. 2019), the ABACUS Cosmos suite (Garrison et al. 2018), the Dark Sky Simulations (Skillman et al. 2014), the MICE Grand Challenge (MICE-GC; Crocce et al. 2015), the Coyote Universe I (Heitmann et al. 2010), and the Uchuu simulations (Ishiyama et al. 2020), among others, involved generation of expensive N-body simulations.
While analytical methods compute expectation values of large-scale structure statistics, a simulation generates a single realization and its output therefore suffers from sample variance. Reducing this variance to a point where it is subdominant to the observational error therefore requires running ensembles of simulations.
Computational cosmologists have been tackling the challenge of optimizing N-body codes and gravity solvers for ever larger numbers of particles. Widely used codes include the parallel Tree Particle-Mesh (TreePM or TPM) codes gadget-ii by Springel (2005) and greem by Ishiyama, Fukushige & Makino (2009), the adaptive treecode 2hot by Warren (2013), the GPU-accelerated abacus code originated from Garrison (2019), the Hardware/Hybrid Accelerated Cosmology Code (hacc) developed by Habib et al. (2016), and the distributed-memory and GPU-accelerated pkdgrav3, based on Fast Multipole Methods and adaptive particle timesteps, from Potter, Stadel & Teyssier (2017). The memory and CPU time requirements of such computations are a bottleneck for future work on new-generation cosmological data sets. As an example, the 43 100 runs in the Quijote simulations from Villaescusa-Navarro et al. (2020), of which the data outputs are public and used in this paper, required 35 million CPU-core-hours.
The search for solutions has led to alternative, fast, and approximate ways to generate predictions for large-scale structure statistics. The COmoving Lagrangian Acceleration (COLA) solver of Tassev, Zaldarriaga & Eisenstein (2013) is a PM code that solves the particle equations of motion in an accelerated frame given by LPT. Particles are nearly at rest in this frame for much of the mildly non-linear regime. As a consequence, much larger timesteps can be taken, leading to significant time savings. The N-body solver FASTPM of Feng et al. (2016) operates on a similar principle, using modified kick and drift factors to enforce the Zel’dovich approximation in the mildly non-linear regime. The spatial COLA (sCOLA) scheme (Tassev et al. 2015) extends the idea of using LPT to guide the solution in the spatial domain. Leclercq et al. (2020) have carefully examined and implemented these ideas to allow splitting large N-body simulations into many perfectly parallel, independently evolving small simulations.
In a different family of approaches, but still using LPT, Monaco et al. (2013) proposed a parallelized implementation of the PINpointing Orbit Crossing-Collapsed HI-erarchical Objects (pinocchio) algorithm from Taffoni, Monaco & Theuns (2002). Chuang et al. (2015) developed a physically motivated enhancement of the Zel’dovich approximation called EZmocks. Approximate methods and full N-body simulations can also be jointly used. For instance, Tassev & Zaldarriaga (2012) proposed a statistical linear regression model of the non-linear matter density field using the density field given by perturbation theory, for which the random residual error is minimized.
Recently, so-called emulators have been of great interest: they predict statistics in the non-linear regime based on a generic mathematical model whose parameters are trained on simulation suites covering a range of cosmological parameters. An emulator is trained by Angulo et al. (2020) on the BACCO simulations; similarly, the Aemulus project contributions II, III, and IV (McClintock et al. 2019a, b; Zhai et al. 2019), respectively, construct an emulator for the halo mass function, the galaxy correlation function, and the halo bias using the Aemulus I suite (DeRose et al. 2019). Not only do emulators that map cosmological parameters to certain outputs need large numbers of simulations for training, they also do not guarantee unbiased results with respect to full simulation codes, especially outside the parameter range used during training.
Recent advances in deep learning have allowed training emulators that reproduce particle positions or density fields starting from initial conditions therefore essentially emulating the full effect of a low-resolution cosmological N-body code – these include the Deep Density Displacement Model (D3M) of He et al. (2019) stemming from the U-NET architecture (Ronneberger, Fischer & Brox 2015). Kodi Ramanah et al. (2020) describe a complementary deep learning tool that increases the mass and spatial resolution of low-resolution N-body simulations using a variant of Generative Adversarial Networks (Goodfellow et al. 2014).
None of these fast approximate solutions exactly reproduce the results of more computationally intensive codes. They trade computational accuracy for computational speed, especially in the non-linear regime. In this vein, the recent series of papers by Lippich et al. (2019), Blot et al. (2019), and Colavincenzo et al. (2019) compare the covariance matrices of clustering statistics given by several low-fidelity methods to those of full N-body codes and find statistical biases in the parameter uncertainties by up to 20 per cent.
A different approach to this problem is to reduce the stochasticity of the initial conditions, thereby modifying the statistics of the observables in such a way as to reduce sample variance. This is the spirit of the method of fixed fields invented and first explored by Pontzen et al. (2016) and Angulo & Pontzen (2016). They found in numerical experiments that a large variety of statistics retain the correct mean, and analytically showed that pairing and fixing, while changing the initial distributions, only impact a measure-zero set of correlations when the errors are not smothered by the large number of available modes. While this approach does not guarantee that any given statistic will be unbiased, the numerical study by Villaescusa-Navarro et al. (2018) showed that ‘fixing’ succeeds in reducing variance for several statistics of interest with no detectable bias when comparing to an ensemble of hundreds of full simulations, and at no additional cost compared to regular simulations. Still, it is clear that other statistics must necessarily be biased, for example, the square of any variance-reduced statistic, such as four-point functions. Also in the family of variance reduction methods, Smith & Angulo (2019) built a composite model of the matter power spectrum and managed to cancel most of the cosmic variance on large scales, notably by using the ratio of matched phase initial conditions.
In this paper, we show that it is possible to get the best of both worlds: the speed of fast surrogates and the guarantee of full-simulation accuracy.1 We take inspiration from control variates, a classical variance reduction technique that directly and optimally minimizes the variance of any random quantity [see Lavenberg & Welch (1981) for a review, and Gorodetsky et al. (2020) and Peherstorfer, Willcox & Gunzburger (2016) for related recent applications], to devise a way to combine fast but approximate simulations (or surrogates) with computationally intensive accurate simulations to vastly accelerate convergence while guaranteeing arbitrarily small bias with respect to the full simulation code. We call this Convergence Acceleration by Regression and Pooling (CARPool).2
The paper is organized as follows. In Section 2, we explore the theory of univariate and multivariate estimation with control variates and highlight some differences in our setting for cosmological simulations. In Section 3, we briefly discuss both the N-body simulation suite and our choice of fast surrogates we use in the numerical experiments presented in Section 4. We conclude in Section 5.
Table 1 lists mathematical notation and definitions used throughout this paper.
Table 1. Mathematical notation and definitions.

Notation | Description
---|---
|$\mathcal {S}_{N} = \left\lbrace r_1, \dots , r_N \right\rbrace$| | Set of N random seeds |$r_n$| of the probability space
|$\boldsymbol{y}(r_n)\equiv \boldsymbol{y}_n$| | Random column vector of size p at seed |$r_n$|
|$\mathbb {E}\left[ \boldsymbol{y} \right]\equiv \boldsymbol{\mu _y}$| | Expectation value of random vector |$\boldsymbol{y}$|
|$[[ m,n ]]$| | Set of integers from m to n
|$\boldsymbol{M}^{\boldsymbol{T}}$| | Transpose of real matrix |$\boldsymbol{M}$|
|$\boldsymbol{M}^{\boldsymbol{\dagger }}$| | Moore–Penrose pseudo-inverse of matrix |$\boldsymbol{M}$|
|$\det \left(\boldsymbol{M}\right)$| | Determinant of matrix |$\boldsymbol{M}$|
|$\mathbb {E} \left[\left(\boldsymbol{x} - \mathbb {E} \left[ \boldsymbol{x}\right]\right) \left(\boldsymbol{x} - \mathbb {E} \left[\boldsymbol{x}\right]\right)^{\boldsymbol{T}} \right]\, \equiv \boldsymbol{\Sigma _{\boldsymbol{xx}}}$| | Variance-covariance matrix of random vector |$\boldsymbol{x}$|
|$\mathbb {E} \left[ \left(\boldsymbol{y} - \mathbb {E} \left[ \boldsymbol{y} \right] \right) \left(\boldsymbol{x} - \mathbb {E} \left[ \boldsymbol{x}\right] \right) ^{\boldsymbol{T}} \right]\, \equiv \boldsymbol{\Sigma _{\boldsymbol{yx}}}$| | Cross-covariance matrix of random vectors |$\boldsymbol{y}$| and |$\boldsymbol{x}$|
|$\sigma _{y}^2$| | Variance of scalar random variable y
|$\boldsymbol{0}_{p,q}$| and |$\boldsymbol{0}_p$| | Null matrix in |$\mathbb {R}^{p \times q}$| and null vector in |$\mathbb {R}^{p}$|
|$\boldsymbol{I}_p$| | Square p × p identity matrix
2 METHODS
Our goal is to find a more precise – i.e. lower variance – and unbiased estimator of |$\mathbb {E}\left[ \boldsymbol{y}\right]$| with a much smaller number of simulations |$\boldsymbol{y}_n$|. The means by which we achieve this is to construct another set of quantities that are fast to compute such that (i) their means are small enough to be negligible, and (ii) their errors are anticorrelated with the errors in the |$\boldsymbol{y}_n$|,3 and to add some multiple of these to |$\boldsymbol{\bar{y}}$| to cancel some of its error. This is the control variates principle.
2.1 Theoretical framework
In what follows we will use the word simulation to refer to costly high-fidelity runs and surrogate for fast but low-fidelity runs.
2.1.1 Introduction with the scalar case
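The scalar construction underlying CARPool is the classical control variates identity, which we sketch here in the paper's notation. Let y be a costly scalar statistic with unknown mean |$\mu _y$|, and let c be a cheap surrogate statistic, correlated with y, whose mean |$\mu _c$| is known. For any coefficient β, the combination |$x(\beta) = y - \beta \left(c - \mu _c \right)$| satisfies |$\mathbb {E}\left[ x(\beta) \right] = \mu _y$|: it is unbiased whatever the value of β. Its variance, |$\sigma _x^2(\beta) = \sigma _y^2 - 2\beta \sigma _{yc} + \beta ^2 \sigma _c^2$|, is minimized by the optimal coefficient |$\beta ^{\star } = \sigma _{yc}/\sigma _c^2$| [equation (4)], for which |$\sigma _x^2(\beta ^{\star }) = \sigma _y^2 \left(1 - \rho _{yc}^2\right)$|, with |$\rho _{yc}$| the correlation coefficient of y and c. The stronger the correlation between surrogate and simulation, the larger the variance reduction, and this holds no matter how biased |$\mu _c$| is with respect to |$\mu _y$|.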
2.1.2 Multivariate control variates
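In the multivariate setting, |$\boldsymbol{y} \in \mathbb {R}^p$| collects the simulated statistics, |$\boldsymbol{c} \in \mathbb {R}^q$| the surrogate statistics, and the scalar coefficient becomes a p × q control matrix |$\boldsymbol{\beta }$|. The combined vector |$\boldsymbol{x}(\boldsymbol{\beta }) = \boldsymbol{y} - \boldsymbol{\beta } \left(\boldsymbol{c} - \boldsymbol{\mu _c} \right)$| [equation (6)] again satisfies |$\mathbb {E}\left[ \boldsymbol{x}(\boldsymbol{\beta })\right] = \boldsymbol{\mu _y}$| for any |$\boldsymbol{\beta }$|, with covariance |$\boldsymbol{\Sigma _{xx}}(\boldsymbol{\beta }) = \boldsymbol{\Sigma _{yy}} - \boldsymbol{\beta }\boldsymbol{\Sigma _{cy}} - \boldsymbol{\Sigma _{yc}}\boldsymbol{\beta }^{\boldsymbol{T}} + \boldsymbol{\beta }\boldsymbol{\Sigma _{cc}}\boldsymbol{\beta }^{\boldsymbol{T}}$|. The optimal choice is |$\boldsymbol{\beta ^{\star }} = \boldsymbol{\Sigma _{yc}}\boldsymbol{\Sigma _{cc}^{-1}}$| [equation (8)], which yields |$\boldsymbol{\Sigma _{xx}}(\boldsymbol{\beta ^{\star }}) = \boldsymbol{\Sigma _{yy}} - \boldsymbol{\Sigma _{yc}}\boldsymbol{\Sigma _{cc}^{-1}}\boldsymbol{\Sigma _{cy}}$|.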
Optimizing variance reduction here means minimizing the confidence region associated with |$\mathbb {E}\left[ \boldsymbol{x}(\boldsymbol{\beta })\right]$|, as represented by the generalized variance |$\det \left(\boldsymbol{\Sigma _{xx}}(\boldsymbol{\beta })\right)$|. Appendix A presents a Bayesian solution to the Gaussian version of this optimization problem.
2.2 Estimation in practice
In this section, we examine practical implications of the control variates implementation when the optimal control matrix |$\boldsymbol{\beta }$| (or coefficients) and the mean of the cheap estimator |$\boldsymbol{\mu _c}$| are unknown. We will consider an online approach in order to improve the estimates of (4) or (8) as simulations and surrogates are computed. Estimating |$\boldsymbol{\mu _c}$| is done through an inexpensive pre-computation step that consists in running fast surrogates. From now on, to differentiate our use of the control variates principle and its application to cosmological simulations from the theory presented above, we will refer to it as the CARPool technique.
For the purposes of this paper, we will take as our goal to produce low-variance estimates of expectation values of full simulation observables. When we discuss model error, it is therefore only relative to the full simulation. From an absolute point of view the accuracy of the full simulation depends on a number of factors such as particle number, force resolution, timestepping, inclusion of physical effects, etc. The numerical examples of full simulations we give are not selected for their unmatched accuracy, but for the availability of a large ensemble that we can use to validate the CARPool results.
2.2.1 Estimation of |$\boldsymbol{\mu _c}$|
In the textbook control variates setting, the crude approximation |$\boldsymbol{\mu _c}$| of |$\boldsymbol{\mu }$| is assumed to be known. There is no reason for this to be the case in the context of cosmological simulations, thus we compute |$\bar{\boldsymbol{\mu }}_{\boldsymbol{c}}$| with surrogate samples drawn on a separate set of seeds |$\mathcal {S}_{M} = \left\lbrace r_{1}, \dots , r_{M} \right\rbrace$| (|$\mathcal {S}_{N} \cap \mathcal {S}_{M} = \emptyset$|, where |$\mathcal {S}_{N}$| is the set of initial conditions of simulations). What is then the additional variance-covariance of the control variates estimate stemming from the estimation of |$\boldsymbol{\mu _c}$|?
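Since the M extra surrogates are drawn independently of the N simulation/surrogate pairs, the answer is additive: the covariance of the CARPool mean picks up a term from the uncertainty of |$\bar{\boldsymbol{\mu }}_{\boldsymbol{c}}$|, becoming approximately |$\boldsymbol{\Sigma _{xx}}(\boldsymbol{\beta })/N + \boldsymbol{\beta }\boldsymbol{\Sigma _{cc}}\boldsymbol{\beta }^{\boldsymbol{T}}/M$| [the content of equation (11), referenced again in Fig. 1]. Because surrogates are cheap, M can be made large enough for the second term to be negligible.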
2.2.2 Estimation of the control matrix
Note that for finite N, the inverse of |$\boldsymbol{\widehat{\Sigma }_{cc}}$| in equation (13) is not an unbiased estimator of the precision matrix |$\boldsymbol{\Sigma _{cc}^{-1}}$| (Hartlap, Simon & Schneider 2006). Moreover, |$\boldsymbol{\widehat{\Sigma }_{cc}^{-1}}$| is not defined when |$\boldsymbol{\widehat{\Sigma }_{cc}}$| is rank-deficient, which is guaranteed to happen when N is smaller than p. We have consequently replaced |$\boldsymbol{\Sigma _{cc}^{-1}}$| by the Moore–Penrose pseudo-inverse – always defined and unique – |$\boldsymbol{\Sigma _{cc}^{\dagger }}$| in equation (8) for the numerical analysis presented in Section 4, so as to be able to compute multivariate CARPool estimates even when N < p.
Since the singular value decomposition exists for any complex or real matrix, we can write |$\boldsymbol{\Sigma _{yc}} = \boldsymbol{UVW^{T}}$| and |$\boldsymbol{\Sigma _{cc}} = \boldsymbol{OPQ^{T}}=\boldsymbol{OPO^{T}}$| by symmetry. The optimal control matrix now gives |$\boldsymbol{\beta ^{\star }} = \boldsymbol{UVW^{T}OP^{-1}O^{T}}$|. The product |$\boldsymbol{P^{-\frac{1}{2}}O^{T}}$| whitens the centered surrogate vector elements (principal component analysis whitening), |$\boldsymbol{OP^{-\frac{1}{2}}}$| restretches the coefficients and returns them to the surrogate basis, and then |$\boldsymbol{UVW^{T}}$| projects the scaled surrogate elements into the high-fidelity simulation basis and rescales them to match the costly simulation covariance. It follows that, when using |${\boldsymbol{\hat{\beta }}}$| in practice, the projections are done in bases specifically adapted to the |$\boldsymbol{y}$| and |$\boldsymbol{c}$| samples available. With this argument, we justify why we use the same simulation/surrogate pairs to compute |${\boldsymbol{\hat{\beta }}}$| first (with the Moore–Penrose pseudo-inverse of the surrogate covariance replacing the precision matrix) and estimate the CARPool mean after that.
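To make the estimation step concrete, the following numpy sketch computes both the dense control matrix (via the Moore–Penrose pseudo-inverse, as argued above) and its diagonal restriction from N simulation/surrogate pairs. The function name and array layout are our own illustration, not code from a released CARPool implementation.

```python
import numpy as np

def estimate_beta(Y, C, diag=False):
    """Estimate the CARPool control matrix from N paired samples.

    Y : (N, p) array, one simulation statistic vector per row (per seed).
    C : (N, p) array, the surrogate statistics on the same seeds.
    Returns a length-p vector (diag=True) or a (p, p) matrix (diag=False).
    """
    dY = Y - Y.mean(axis=0)
    dC = C - C.mean(axis=0)
    if diag:
        # Univariate CARPool: one coefficient per bin,
        # beta_i = cov(y_i, c_i) / var(c_i); the (N - 1) factors cancel.
        return (dY * dC).sum(axis=0) / (dC * dC).sum(axis=0)
    # Multivariate CARPool: beta = Sigma_yc @ pinv(Sigma_cc); the Moore-Penrose
    # pseudo-inverse keeps the estimate defined even when N < p.
    N = Y.shape[0]
    Sigma_yc = dY.T @ dC / (N - 1)
    Sigma_cc = dC.T @ dC / (N - 1)
    return Sigma_yc @ np.linalg.pinv(Sigma_cc)
```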
2.2.3 Multivariate versus univariate CARPool
So far we have not assumed any special structure for |$\boldsymbol{\beta }$|. If, as in the classical control variates setting, the (potentially dense) covariances on the right-hand side of equation (8) are known a priori, then |$\boldsymbol{\beta ^{\star }}$| is the best solution because it exploits the mutual information between all elements of |$\boldsymbol{y}$| and |$\boldsymbol{c}$|.
In practice, we will be using the online approach discussed in Section 2.2.2 for a very small number of simulations. If we are limited by a very small number of |$\left\lbrace \boldsymbol{y}_n,\boldsymbol{c}_n \right\rbrace$| pairs compared to the number of elements of the vectors, the estimate of |$\boldsymbol{\beta ^{\star }}$| can be unstable and possibly worsen the variance of equation (15), though unbiasedness remains guaranteed.
We will demonstrate below that in the case of small number of simulations and a large number of statistics to estimate from the simulations, it is advantageous to impose structure on |$\boldsymbol{\beta }$|. In the simplest case, we can set the off-diagonal elements to zero. This amounts to treating each vector element separately and results in a decoupled problem with a separate solution (4) for each vector element.
In the numerical experiments of Section 4, we compare three estimators:
(i) gadget, where we compute the sample mean |$\boldsymbol{\bar{y}}$| from N-body simulations only;
(ii) multivariate CARPool, described by equation (6), where we estimate the control matrix |$\boldsymbol{\beta }$| online using equations (13), and denote it by |$\boldsymbol{\beta ^{\star }}$|;
(iii) univariate CARPool, where we use the empirical counterpart of equation (4) as the control coefficient for each element of a vector: we estimate |$\boldsymbol{\beta ^{\mathrm{diag}}}$|.
Other, intermediate choices between fully dense and diagonal |$\boldsymbol{\beta }$| are possible and may be advantageous in some circumstances. We will leave an exploration of these to future work, and simply note here that this freedom to tune |$\boldsymbol{\beta }$| does not affect the mean of the CARPool estimate.
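Assembling the pieces, a minimal sketch of the CARPool mean of equation (15) then reads as follows, with mu_c_hat precomputed from the M separate surrogate runs (again an illustration under our own naming, not the paper's released code):

```python
def carpool_mean(Y, C, mu_c_hat, diag=True):
    """CARPool mean: x_bar(beta) = y_bar - beta (c_bar - mu_c_hat).

    Y, C     : (N, p) paired simulation and surrogate statistics.
    mu_c_hat : length-p surrogate mean from M independent surrogate runs
               on seeds disjoint from those of Y and C.
    Unbiased with respect to the simulation code for *any* choice of beta.
    Requires numpy arrays and estimate_beta from the sketch in Section 2.2.2.
    """
    beta = estimate_beta(Y, C, diag=diag)
    y_bar, c_bar = Y.mean(axis=0), C.mean(axis=0)
    shift = beta * (c_bar - mu_c_hat) if diag else beta @ (c_bar - mu_c_hat)
    return y_bar - shift
```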
3 COSMOLOGICAL SIMULATIONS
This section describes the simulation methods that we use to compute the statistics presented in Section 4. The simulations assume a Λ cold dark matter (ΛCDM) cosmology congruent with the Planck constraints provided by Planck Collaboration (2020): Ωm = 0.3175, Ωb = 0.049, h = 0.6711, ns = 0.9624, σ8 = 0.834, w = −1.0, and Mν = 0.0 eV.
3.1 Quijote simulations at the fiducial cosmology
Villaescusa-Navarro et al. (2020) have publicly released data outputs from N-body cosmological simulations run with the full TreePM code gadget-iii, a development of the previous version gadget-ii by Springel (2005).4 Available data and statistics include simulation snapshots, matter power spectra, matter bispectra and matter probability density functions. The sample mean of each statistic computed from all available realizations gives the unbiased estimator of |$\mathbb {E} \left[ \boldsymbol{y} \right] = \boldsymbol{\mu }$|. The fiducial cosmology data set contains 15 000 realizations; their characteristics are grouped in Table 2.
Table 2. Characteristics of the Quijote simulations at the fiducial cosmology.

Characteristic/parameter | Value
---|---
Simulation box volume | (1000 h−1 Mpc)³
Number of CDM particles | Np = 512³
Force mesh grid size | Nm = 1024
Starting redshift | zi = 127
Initial conditions | Second-order Lagrangian perturbation theory (2LPT)
Redshift of data outputs | z ∈ {3.0, 2.0, 1.0, 0.5, 0.0}
As discussed in Section 2.2, the Quijote simulations are selected because we have access to an extensive ensemble of simulations that we can use to validate the CARPool approach. In the following we will look at wavenumbers k ∼ 1 hMpc−1, where the Quijote simulations may not be fully resolved. This is not important for the purposes of this paper; we will consider the full simulation ensemble as the gold standard that we attempt to reproduce with a much smaller number of simulations plus fast surrogates.
In the next section, we present the chosen low-fidelity simulation code which provides an approximate statistic |$\boldsymbol{c}$| for our numerical experiments.
3.2 Choice of approximate simulation method
As the fast surrogate we use l-picola (Howlett, Manera & Percival 2015), a parallel implementation of the COLA scheme of Tassev et al. (2013) introduced in Section 1. The run settings, deliberately chosen for speed rather than accuracy, are listed in Table 3.
Table 3. Characteristics of the l-picola surrogate runs.

Characteristic/parameter | Value
---|---
Number of timesteps | 20 (linearly spaced)
Modified timestepping from Tassev et al. (2013) | nLPT = +0.5
Force mesh grid size | Nm = 512
Starting redshift | zi = 127
Initial conditions | Second-order Lagrangian perturbation theory (2LPT)
Redshift of data outputs | z ∈ {1.0, 0.5, 0.0}
4 APPLICATION AND RESULTS
In this section, we apply the CARPool technique to three standard cosmological statistics: the matter power spectrum, the matter bispectrum, and the one-dimensional probability density function (PDF) of matter fractional overdensity. We seek to improve the precision of estimates of theoretical expectations of these quantities as computed by gadget-iii. To assess the actual improvement, we need the sample mean |$\boldsymbol{\bar{y}}$| of the Quijote simulations on the one hand, and the estimator (15) on the other hand.
Additionally, unless stated otherwise, each test case has the following characteristics:
|$N_\mathrm{max}=500$| simulation pairs |$\left\lbrace \boldsymbol{y}_i, \boldsymbol{c}_i \right\rbrace$| are generated, and the cumulative sample mean |$\boldsymbol{\bar{y}}$| (resp. |$\boldsymbol{\bar{x}(\beta)}$|) is computed after every 5 additional simulations (resp. simulation pairs).
M = 1500 additional fast simulations are dedicated to the estimation of |$\boldsymbol{\mu _c}$|.
The sample mean of 15 000 N-body simulations, accessible in the Quijote database, is taken as the true |$\boldsymbol{\mu }$|.
p = q, since we post-process gadget-iii and l-picola snapshots with the same analysis codes (i.e. the same vector size for |$\boldsymbol{y}$| and |$\boldsymbol{c}$|).
The analysis is performed at redshift z = 0.5. The lower the redshift, the more non-linear (and hence more difficult) the structure formation problem. We pick the lowest redshift that is relevant for upcoming galaxy surveys. We expect CARPool to be even more efficient for higher redshifts.
|$\delta (\boldsymbol{x}) \equiv \rho (\boldsymbol{x})/\bar{\rho } - 1$| is the matter density contrast field, with the density |$\rho(\boldsymbol{x})$| computed on a grid using the Cloud-in-Cell (CiC) mass assignment scheme. |$\boldsymbol{x}$| exceptionally denotes the three-dimensional comoving grid coordinates here.
Ngrid designates the density contrast grid size when post-processing snapshots.
We use the bias-corrected and accelerated (BCa) bootstrap,6 with B = 5000 resamples with replacement, to compute the |$95{{\ \rm per\ cent}}$| confidence intervals of the estimators; Efron & Tibshirani (1994) explain the computation, and a usage sketch follows this list.
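As an illustration of this last point, scipy's bootstrap routine implements the BCa interval directly; the sketch below computes a 95 per cent interval for the mean of a single statistic bin (the data array is a placeholder, not Quijote output):

```python
import numpy as np
from scipy.stats import bootstrap

rng = np.random.default_rng(42)
samples = rng.normal(size=50)      # placeholder for the N values of one bin

# BCa confidence interval for the mean of this bin, B = 5000 resamples.
res = bootstrap((samples,), np.mean, n_resamples=5000,
                confidence_level=0.95, method='BCa', random_state=rng)
print(res.confidence_interval)     # (low, high) at the 95 per cent level
```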
The procedure is illustrated in Fig. 1. The first step is to run M fast surrogates to compute the approximate mean |$\boldsymbol{\mu }_{\boldsymbol{c}}$|. How large M should be depends on the accuracy demanded by the user. Then, for each newly picked initial condition, both the expensive simulation code and the low-fidelity method are run to produce a snapshot pair. Only in this step do we need to run the high-fidelity simulation code N times. The mean (15) can be computed for each additional pair to track the estimate. In the next section, we assess the capacity of CARPool to use fewer than 10 simulations and a set of fast surrogates to match the precision of a large number of N-body simulations. All the statistics are calculated from the snapshots with the python 3 module pylians3.7
Figure 1. Flowchart of the practical application of CARPool to cosmological simulations. We highlight the estimation of |$\boldsymbol{\mu _c}$| as a precomputation step using M fast simulations. The larger M is, the smaller the impact on the variance/covariance of the control variates estimator, as expressed in (11) and Appendix A. The fractional overdensity images are projected slices of 60 h−1Mpc.
4.1 Matter power spectrum
This section is dedicated to estimating the power spectrum of matter density in real space at z = 0.5, the lower end of the range covered by next-generation galaxy redshift surveys. The density contrast |$\delta (\boldsymbol{x})$| is computed from each snapshot with the grid size Ngrid = 1024. The publicly available power spectra range from |$k_\mathrm{min}= 8.900 \times 10^{-3} \, h {\rm Mpc^{-1}}$| to |$k_\mathrm{max}=5.569\, h {\rm Mpc^{-1}}$| and contain 886 bins. The following analysis is restricted to |$k_\mathrm{max}=1.194\, h {\rm Mpc^{-1}}$|, which results in 190 bins. We simplify our test case by compressing the power spectra into p = 95 bins, using the appropriate re-weighting by the number of modes in each k bin given in pylians3. Univariate CARPool gives the best results since we are using the smallest possible number of costly N-body simulations; for this reason, power spectrum estimates using the multivariate framework are not shown here. As we discuss in Appendix C, we intentionally run our fast surrogate (COLA) in a mode that produces a power spectrum that is highly biased compared to the full simulations, with a power deficit of more than 60 per cent on small scales.
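The bin compression mentioned above is simple enough to state explicitly. A numpy sketch, under our own naming, that merges adjacent bins with mode-count weights might look as follows:

```python
import numpy as np

def compress_pk(k, pk, n_modes, factor=2):
    """Merge groups of `factor` adjacent k-bins, weighting by mode counts.

    k, pk, n_modes : length-n arrays of bin centres, P(k) values, and the
    number of Fourier modes per bin (as provided by pylians3).
    Returns the compressed bin centres and power spectrum (190 -> 95 bins
    for factor=2).
    """
    m = (len(k) // factor) * factor          # drop a possible ragged tail
    k2d = k[:m].reshape(-1, factor)
    p2d = pk[:m].reshape(-1, factor)
    w2d = n_modes[:m].reshape(-1, factor).astype(float)
    w_tot = w2d.sum(axis=1)
    return (k2d * w2d).sum(axis=1) / w_tot, (p2d * w2d).sum(axis=1) / w_tot
```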
4.1.1 CARPool versus N-body estimates
Fig. 2 shows the estimated power spectrum with 95 per cent confidence intervals enlarged by a factor of 20 for better visibility. Only 5 N-body simulations are needed to compute an unbiased estimate of the power spectrum with much higher precision than 500 N-body runs on large scales and on the scale of Baryon Acoustic Oscillations (BAO). On small scales, confidence intervals are of comparable size.8
Figure 2. Estimated power spectrum with 500 N-body simulations versus 5 pairs of ‘N-body + cheap’ simulations, from which |$\widehat{\boldsymbol{\beta ^\mathrm{diag}}}$| is derived. The estimated 95 per cent confidence intervals are computed with the BCa bootstrap. They are enlarged by a factor of 20 for better visibility.
We must verify that these results are not produced by a ‘lucky’ set of 5 simulation pairs. To this end, we compute 100 CARPool means |$\boldsymbol{\bar{x}}(\widehat{\boldsymbol{\beta ^\mathrm{diag}}})$| from distinct sets of five random seeds. The CARPool estimates fall within sub-per cent accuracy relative to the sample mean from 15 000 N-body simulations, as illustrated by the upper panel of Fig. 3. The percentage error of the gadget sample mean of 500 simulations with respect to 15 000 simulations is plotted with 95 per cent confidence intervals. We stress here that every percentage error plot in this paper shows an error with respect to 15 000 N-body simulations. The mean of 500 gadget realizations is thus not at zero per cent, though the difference is very small.
Figure 3. Estimated power spectrum percentage error with respect to 15 000 N-body runs: 500 N-body simulations versus 100 sets of five pairs of ‘N-body + cheap’ simulations. Each set uses a distinct |$\widehat{\boldsymbol{\beta ^\mathrm{diag}}}$|, calculated with the same seeds used for |$\boldsymbol{\bar{x}}$|. The upper panel estimate uses |$\widehat{\boldsymbol{\beta ^\mathrm{diag}}}$| while the lower panel convolves the diagonal elements of |$\widehat{\boldsymbol{\beta ^\mathrm{diag}}}$| with a narrow top-hat window. Beta smoothing removes outliers and Gaussianizes the tails by effectively increasing the number of degrees of freedom for each β estimate. Both panels use the same random seeds. The estimated 95 per cent confidence intervals are plotted for the N-body sample mean only, using BCa bootstrap. The dark blue symbols show the 68 per cent percentile of the CARPool estimates ordered by the absolute value of the percentage error; the rest appears in light blue symbols.
4.1.2 Beta smoothing
Since we use a very small number of simulations, the estimates of the diagonal elements of |$\widehat{\boldsymbol{\beta ^\mathrm{diag}}}$| are noisy. This leads to heavy-tailed distributions for the CARPool estimates. Using the freedom we have to modify |$\boldsymbol{\beta }$| without affecting unbiasedness, we can exploit the fact that we expect neighboring bins to have similar optimal |$\boldsymbol{\beta }$|. Convolving the diagonal elements with a five-bin-wide top-hat window slightly reduces the spread at small scales of CARPool estimates computed with only five gadget power spectra, and removes outliers. The comparison of the two panels in Fig. 3 illustrates this point. Using a nine-bin-wide Hanning window for the smoothing yields similar results. We call this technique beta smoothing and use it with a five-bin-wide top-hat window in what follows.
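A minimal sketch of this smoothing, with an edge treatment of our own choosing (the text does not prescribe one), is:

```python
import numpy as np

def smooth_beta(beta_diag, width=5):
    """Convolve per-bin control coefficients with a top-hat window.

    Dividing by the convolved window mass renormalizes the average at the
    array edges, avoiding the zero-padding bias of a bare 'same' convolution.
    Any such modification of beta leaves the CARPool mean unbiased.
    """
    window = np.ones(width)
    num = np.convolve(beta_diag, window, mode='same')
    den = np.convolve(np.ones_like(beta_diag), window, mode='same')
    return num / den
```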
Both panels of Fig. 3 show the symmetric |$95{{\ \rm per\ cent}}$| confidence intervals of the surrogate mean with grey dashed lines. They represent the |$95{{\ \rm per\ cent}}$| error band likely to stem from the estimation of |$\boldsymbol{\mu _c}$|, relative to the mean of 15 000 gadget simulations; hence, at large scales especially, the CARPool means concentrate slightly away from zero percentage error. Though the unbiased estimator in equation (15) takes a precomputed cheap mean, the practitioner can decide to run more approximate simulations on the fly to improve the accuracy of |$\boldsymbol{\bar{\mu }_c}$|. Note that the CARPool means with 5 N-body simulations still land within the |$95{{\ \rm per\ cent}}$| confidence intervals from 500 gadget simulations, even at large scales where the difference due to the surrogate mean is visible.
Fig. 4 exhibits the convergence of one power spectrum bin at the BAO scale as we add more simulations: the |$95{{\ \rm per\ cent}}$| error band of the control variates estimate shrinks extremely fast compared to that of the N-body sample mean.
Figure 4. Convergence of a single k-bin at the BAO scale: the cumulative sample mean |$\boldsymbol{\bar{y}}$| of N-body simulations versus the sample mean |$\boldsymbol{\bar{x}(\widehat{\beta ^\mathrm{diag}})}$|. Confidence intervals take into account that |$\boldsymbol{\beta ^\mathrm{diag}}$| is estimated from the same number of samples used to compute the CARPool estimate of P(k).
4.1.3 Empirical variance reduction
The left-hand panel of Fig. 5 shows the empirical generalized variance reduction of the CARPool estimate compared to the standard estimate, as defined in equation (9). The vertical axis corresponds to the volume ratio of two parallelepipeds of dimension p = 95, in other words the volume ratio of error ‘boxes’ for two estimators. The determinant |$\det \left(\boldsymbol{\widehat{\Sigma _{yy}}}\right)$| is fixed because we take all 15 000 N-body simulations available in Quijote to compute the most accurate estimate of |$\boldsymbol{\Sigma _{yy}}$| we have access to, whereas |$\det \left(\boldsymbol{\Sigma _{xx}}(\boldsymbol{\hat{\beta }})\right)$| changes each time new simulation pairs are run. More precisely, for each data point in Fig. 5, we take the control matrix estimate computed with |$5k,k \in [[ 1,100 ]]$| simulation pairs and generate 3000 |$\boldsymbol{x}$| samples according to (14) to obtain an estimator of |$\boldsymbol{\Sigma _{xx}}$|. For that, we use 3000 Quijote simulations and 3000 additional l-picola surrogates run with the corresponding seeds.
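For p = 95, the determinants entering equation (9) overflow double precision if evaluated directly, so working with log-determinants is advisable. A numpy sketch of the ratio, under assumed array layouts of our own, is:

```python
import numpy as np

def log10_generalized_variance_ratio(X, Y_full):
    """log10 of det(Sigma_xx) / det(Sigma_yy) from two sample sets.

    X      : (n, p) CARPool samples generated as in equation (14).
    Y_full : (m, p) N-body samples (here all 15 000 Quijote spectra).
    slogdet avoids the floating-point under/overflow that a direct
    det(.) would hit for p = 95.
    """
    _, ld_x = np.linalg.slogdet(np.cov(X, rowvar=False))
    _, ld_y = np.linalg.slogdet(np.cov(Y_full, rowvar=False))
    return (ld_x - ld_y) / np.log(10.0)
```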
Figure 5. Left-hand panel: Generalized variance ratio for the power spectrum up to kmax ≈ 1.2 hMpc−1 as a function of the number of available simulations. Each |$\widehat{\boldsymbol{\beta }}$| and |$\widehat{\boldsymbol{\beta ^\mathrm{diag}}}$| serves to generate 3000 samples according to (14) to estimate the CARPool covariance matrix. Right-hand panel: Standard deviation reduction for each power spectrum bin due to CARPool. The blue and black curves use |$\widehat{\boldsymbol{\beta }}$| and |$\widehat{\boldsymbol{\beta ^\mathrm{diag}}}$| estimated with 500 samples. The dashed grey curve exhibits the actual standard deviation ratio when we have five samples only to compute |$\widehat{\boldsymbol{\beta ^\mathrm{diag}}}$|. |$\boldsymbol{\Sigma _{yy}}$| is estimated using all 15 000 available power spectra from the Quijote simulations.
The simpler univariate scheme outperforms the estimation of the optimal |$\boldsymbol{\beta ^{\star }}$| for |$N=5k,k \in [[ 1,100 ]]$|, corroborating the experiments of Section 4.1.1. Furthermore, the variance reduction granted by a sub-optimal diagonal |$\boldsymbol{\beta ^\mathrm{diag}}$| improves rapidly and reaches its apparent limit quickly. We suspect that the slight worsening of the variance reduction, when the number of available samples to estimate |$\boldsymbol{\beta ^{\star }}$| neighbors the vector size p, is linked to the eigenspectrum of |$\boldsymbol{\Sigma _{cc}^{\dagger }}$| and could be improved by projecting out the eigenmodes corresponding to the smallest, noisiest eigenvalues.
We depict the scale-dependent performance of CARPool for the matter power spectrum in the right-hand panel of Fig. 5. The vertical axis is the variance reduction to expect from the optimal control coefficients (or matrix). Namely, we take the data points of the left panel for 500 simulation/surrogate pairs, extract the diagonal of the covariance matrices, and divide the arrays. The blue and black curves show the variance reduction with respect to the sample mean of N-body simulations using all 500 simulation/surrogate pairs to estimate the control matrix. In practice, we estimate |$\boldsymbol{\beta }$| using only five simulation/surrogate pairs; does this noisy |$\boldsymbol{\hat{\beta }}$| lead to significant inefficiency? The grey dashed curve shows the actual standard deviation reduction brought by the rough estimate of |$\boldsymbol{\beta ^\mathrm{diag}}$| using five simulation pairs only, with which the results of Figs 2 and 3 are computed. A few k-bins fluctuate high but the variance reduction remains close to optimal, especially considering that only five simulations were used, and we have not attempted any further regularization except for beta smoothing.
4.2 Matter bispectrum
We compute the shot-noise corrected matter bispectrum in real space (Hahn et al. 2020; Villaescusa-Navarro et al. 2020), using pyspectrum 9 with Ngrid = 360 and bins of width |$\Delta k= 3k_\mathrm{f} = 1.885\times 10^{-2}\, h {\rm Mpc^{-1}}$|, where |$k_\mathrm{f} = \frac{2\pi }{L}\, h {\rm Mpc^{-1}}$| is the fundamental mode depending on the box size L. As in the previous section, we present only the results using |$\boldsymbol{\beta ^\mathrm{diag}}$| instead of |$\boldsymbol{\beta ^{\star }}$|. We examine two distinct sets of bispectrum coefficients: in the first case we study the bispectrum for squeezed isosceles triangles as a function of opening angle only, averaging over scale; in the second case we compute equilateral triangles as a function of k.
4.2.1 Squeezed isosceles triangles
We start the analysis by regrouping isosceles triangles (k1 = k2) and re-weighting the bispectrum monopoles for various |$k_3/k_1$| ratios in ascending order. Only squeezed triangles are considered here: |$\left(k_3/k_1\right)_\mathrm{max} = 0.20$| so that the dimension of |$\boldsymbol{y}$| is p = 98 (see Table 1).
4.2.2 CARPool versus N-body estimates
On the order of 5 samples are required to achieve a precision similar to that of the sample mean of 500 N-body simulations, as we show in Fig. 6 (upper panel). Fig. 7 (upper panel) corroborates the claim by showing the percentage error of 100 CARPool means using 5 costly simulations each. The reference is the mean of the 15 000 bispectra from the Quijote simulations. As in the previous section, we show the |$95{{\ \rm per\ cent}}$| error band due to estimation of the surrogate mean |$\boldsymbol{\mu _c}$| with dashed curves.
Figure 6. Upper panel: Estimated bispectrum for squeezed isosceles triangles with 500 N-body simulations versus 5 pairs of ‘N-body + cheap’ simulations, from which the smoothed |$\widehat{\boldsymbol{\beta ^\mathrm{diag}}}$| is derived. The estimated |$95{{\ \rm per\ cent}}$| confidence intervals are computed with the BCa bootstrap. They are enlarged by a factor of 20 for better visibility. Lower panel: As in the upper panel, but for the reduced bispectrum of equilateral triangles.
Figure 7. Upper panel: Estimated bispectra percentage error for squeezed isosceles triangles with respect to 15 000 N-body runs: 500 N-body simulations versus 100 sets of 5 pairs of ‘N-body + cheap’ simulations. Each set uses a distinct |$\widehat{\boldsymbol{\beta ^\mathrm{diag}}}$|, calculated with the same seeds used in |$\boldsymbol{\bar{x}}$| and smoothed by a five-bin-wide flat window. The estimated |$95{{\ \rm per\ cent}}$| confidence intervals are plotted for the N-body sample mean only, using BCa bootstrap. The dark blue symbols show the |$68{{\ \rm per\ cent}}$| percentile of the CARPool estimates ordered by the absolute value of the percentage error; light-blue symbols represent the rest. Lower panel: As in the upper panel, but for the reduced bispectrum of equilateral triangles.
4.2.3 Empirical variance reduction
As for the power spectrum, the upper left-hand panel of Fig. 8 shows that the generalized variance reduction is much more significant when separately estimating control coefficients for each triangle configuration. The right-hand side of the curve suggests an increasing improvement of the multivariate case, but in this range of required sample numbers the variance reduction scheme loses its appeal. We have used 1800 additional simulations to compute the covariance matrices entering the generalized variance estimates. In the upper right-hand panel of the figure, the calculation of the standard deviation ratio for each triangle configuration follows the same logic as in Section 4.1.3. The grey dashed curve corresponds to the standard deviation reduction brought by control coefficients (i.e. the univariate CARPool framework) estimated with 5 simulation/surrogate pairs only.
Figure 8. Upper left-hand panel: Generalized variance ratio of bispectrum for squeezed isosceles triangles as a function of the number of available simulations. Each |$\widehat{\boldsymbol{\beta }}$| and |$\widehat{\boldsymbol{\beta ^\mathrm{diag}}}$| serves to generate 1800 samples according to (14) to estimate the CARPool covariance matrix. Upper right-hand panel: Standard deviation reduction for each squeezed isosceles triangle to expect from CARPool. The blue and black curves respectively use |$\widehat{\boldsymbol{\beta }}$| and |$\widehat{\boldsymbol{\beta ^\mathrm{diag}}}$| estimated with 500 samples. The dashed grey curve exhibits the actual standard deviation ratio when we have five samples only to compute |$\widehat{\boldsymbol{\beta ^\mathrm{diag}}}$|. |$\boldsymbol{\Sigma _{yy}}$| is estimated with all 15 000 available bispectra from the Quijote simulations. Lower panels: As in the upper panels, but for the reduced bispectrum of equilateral triangles.
4.2.4 Equilateral triangles
Here, we analyse equilateral triangles with the modulus of k1 = k2 = k3 varying up to |$k_\mathrm{max} = 0.75\, h {\rm Mpc^{-1}}$| (p = 40). For better visibility, we show the reduced bispectrum monopole Q(k1, k2, k3).
4.2.5 CARPool versus N-body estimates
Similarly to the previous set of triangle configurations, we compare the precision of the CARPool estimator using 5 N-body simulations with that of the sample mean from 500 gadget runs. Fig. 6 (lower panel) exhibits the estimated reduced bispectrum with five seeds, while Fig. 7 (lower panel) shows the relative error of various CARPool sets with respect to the reference from 15 000 N-body samples.
4.2.6 Empirical variance reduction
In Fig. 8 (lower panels), we observe a trend similar to that of the previous experiments: the univariate control coefficients are much better than the control matrix in terms of generalized variance reduction for a realistic number of full N-body simulations.
4.3 Probability density function of smoothed matter fractional overdensity
The power spectrum and the bispectrum are Fourier-space statistics. How does CARPool fare on a purely direct-space statistic? In the Quijote simulations, the probability density function of the matter fractional overdensity, or the matter PDF, is computed on a grid with Ngrid = 512, smoothed by a top-hat filter of radius R. There are 100 histogram bins in the range |$\rho /\bar{\rho } \in \left[ 10^{-2}, 10^{2}\right]$|. We work with the R = 5 h−1Mpc case and restrict the estimation of the PDF to the interval |$\rho /\bar{\rho } \in \left[ 8\times 10^{-2}, 5\times 10^1\right]$| that contains p = 70 bins. Note that we intentionally do not do anything to improve the correspondence of the surrogate and simulation histograms, an example of which is displayed in Fig. 9.
Figure 9. Probability density function of the smoothed matter fractional overdensity of gadget-iii and l-picola snapshots at z = 0.5 for the same initial conditions. The characteristics of l-picola are provided in Table 3.
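For reference, a self-contained numpy sketch of the measurement described above (top-hat smoothing of the CiC density contrast in Fourier space, then histogramming |$\rho /\bar{\rho }$|) could read as follows; pylians3 provides optimized equivalents, and everything here, names included, is illustrative:

```python
import numpy as np

def smoothed_density_pdf(delta, box_size, R=5.0):
    """Top-hat smooth a density-contrast grid, then histogram rho/rho_bar.

    delta    : (Ng, Ng, Ng) CiC density contrast.
    box_size : box side in the same length units as R (h^-1 Mpc here).
    """
    Ng = delta.shape[0]
    k1d = 2.0 * np.pi * np.fft.fftfreq(Ng, d=box_size / Ng)
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing='ij', sparse=True)
    kR = np.sqrt(kx**2 + ky**2 + kz**2) * R
    kR[0, 0, 0] = 1.0                      # dummy value to avoid 0/0
    W = 3.0 * (np.sin(kR) - kR * np.cos(kR)) / kR**3
    W[0, 0, 0] = 1.0                       # W(kR) -> 1 as kR -> 0
    delta_s = np.fft.ifftn(np.fft.fftn(delta) * W).real
    rho = 1.0 + delta_s                    # rho / rho_bar
    bins = np.logspace(-2, 2, 101)         # the 100 bins quoted in the text
    pdf, edges = np.histogram(rho, bins=bins, density=True)
    return pdf, edges
```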
4.3.1 Empirical variance reduction
For the matter PDF, we show the empirical variance reduction results before the actual estimates: Fig. 10 shows that the variance reduction is much milder for the PDF than for the power spectrum or the bispectrum, both for the univariate and multivariate CARPool frameworks. While the multivariate case does eventually lead to significant gains, CARPool needs |$\mathcal {O}(100)$| simulations to learn how to map density contrast in COLA outputs to density contrast in gadget-iii simulations. While COLA places overdense structures close to the right position, their density contrast is typically underestimated, meaning a level set of the COLA output is informative about a different level set of the gadget-iii simulation.
Figure 10. Left-hand panel: Generalized variance ratio of the matter PDF as a function of the number of available simulations. Each |$\widehat{\boldsymbol{\beta }}$| and |$\widehat{\boldsymbol{\beta ^\mathrm{diag}}}$| serves to generate 1800 samples according to (14) to estimate the CARPool covariance matrix. Right-hand panel: Standard deviation reduction for each PDF bin to expect from CARPool. The blue and black curves respectively use |$\widehat{\boldsymbol{\beta }}$| and |$\widehat{\boldsymbol{\beta ^\mathrm{diag}}}$| estimated with 500 samples. The dashed grey curve exhibits the actual standard deviation ratio when we have 10 samples only to compute |$\widehat{\boldsymbol{\beta ^\mathrm{diag}}}$|. |$\boldsymbol{\Sigma _{yy}}$| is estimated with all 15 000 available PDFs from the Quijote simulations.
The right-hand panel nonetheless shows that it is possible to reduce the variance of the one-point PDF with CARPool, unlike with paired-fixed fields (Villaescusa-Navarro et al. 2018). As for the bispectrum, we took the data outputs of 1800 additional simulations to compute the covariance matrices entering the generalized variance and standard error estimates.
4.3.2 CARPool versus N-body estimates
For the matter PDF we compare CARPool estimates in both the multivariate and univariate settings. Figs 11 and 12 are paired and show the comparable performance at the tails of the estimated PDF for the smoothed |$\widehat{\boldsymbol{\beta ^\mathrm{diag}}}$| with 50 samples on the one hand, and the dense |$\widehat{\boldsymbol{\beta }}$| matrix obtained with 125 simulations on the other. We can expect an order of magnitude fewer N-body simulations to be needed to compute an accurate estimate of the PDF when applying the simple univariate CARPool technique (50 instead of 500 here). As discussed above, with enough simulations CARPool can learn the mapping between the density contrasts of COLA and gadget outputs. Therefore, the matter PDF is a case where the multivariate framework, which involves the estimation of p × p covariance matrices, shows improvement over the more straightforward univariate case once the number of available simulation pairs passes a threshold.
Figure 11. Estimated matter PDF with 500 N-body simulations versus CARPool estimates. |$\widehat{\boldsymbol{\beta ^\mathrm{diag}}}$| is used in the upper panel whereas the full control matrix is computed in the lower panel. The estimated |$95{{\ \rm per\ cent}}$| confidence intervals are computed with the BCa bootstrap. They are enlarged by a factor of 40 for better visibility.
Figure 12. Estimated matter PDF percentage error with respect to 15 000 N-body runs: sample mean of 500 N-body simulations versus CARPool estimates. In the upper panel, |$\widehat{\boldsymbol{\beta ^\mathrm{diag}}}$| is used for each set and smoothed by a five-bin-wide flat window. In the lower panel, the full control matrix |$\widehat{\boldsymbol{\beta }}$| is estimated for each group of seeds. The estimated |$95{{\ \rm per\ cent}}$| confidence intervals are plotted for the N-body sample mean only, using BCa bootstrap.
While we wanted to test the performance of CARPool with minimal tuning, we expect that with some mild additional assumptions and tuning the univariate CARPool approach could be improved and similar gains to the multivariate case could be obtained with a smaller number of simulations. As an example, one could pre-process the COLA outputs to match the PDF (and power spectrum) of gadget-iii using the approach described in Leclercq et al. (2013) to guarantee a close correspondence between bins of density contrast. In addition, a regularizing assumption would be to consider transformations from COLA to gadget-iii density contrasts that are smooth and monotonic.
4.4 Summary of results
Here we present a summary of the variance reduction observed in our numerical experiments. With M = 1500 additional fast simulations reserved for estimating the cheap mean |$\boldsymbol{\bar{\mu }_c}$|, and with percentage errors relative to the mean of 15 000 full N-body runs available in Quijote, we find:
With only 5 N-body simulations, the univariate CARPool technique recovers the 95-bin power spectrum up to |$k_\mathrm{max} \approx 1.2\, h {\rm Mpc^{-1}}$| within the |$0.5{{\ \rm per\ cent}}$| error band, when the control coefficients are smoothed.
For the bispectrum of 98 squeezed isosceles triangle configurations, the recovery is within |$2{{\ \rm per\ cent}}$| when 5 N-body simulations are available, and |$1{{\ \rm per\ cent}}$| when we have 10 of them, still with the smoothed |$\widehat{\boldsymbol{\beta ^\mathrm{diag}}}$|.
The bispectrum estimator of equilateral triangles on 40 bins falls within the |$2{{\ \rm per\ cent}}$| (resp. |$1{{\ \rm per\ cent}}$|) error band with 5 simulations (resp. 10) at large k, and performs better than the mean of 500 gadget simulations at large scales.
The standard deviation of matter PDF bins can also be reduced with CARPool, by factors between 3 and 10, implying that the number of required costly simulations is lowered by an order of magnitude.
In Appendix B, we provide the power spectrum and bispectrum results when the CARPool means are computed with 10 simulation/surrogate pairs instead of the 5 pairs presented so far.
5 DISCUSSION AND CONCLUSIONS
We presented CARPool, a general scheme for reducing variance on estimates of large-scale structure statistics. It operates on the idea of forming a combination (pooling) of a small number of accurate simulations with a larger number of fast but approximate surrogates in such a way as to not introduce systematic error (zero bias) on the combination. The result is equivalent to having run a much larger number of accurate simulations. This approach is particularly adapted to cosmological applications where our detailed physical understanding has resulted in a number of perturbative and non-perturbative methods to build fast surrogates for high-accuracy cosmological simulations.
To demonstrate the operation and promise of the technique, we computed high-accuracy, low-variance predictions for statistics of gadget-iii cosmological N-body simulations in the ΛCDM model at z = 0.5. Many surrogate methods are available; for illustration we selected the approximate particle-mesh COLA solver l-picola.
For three different statistics, the matter power spectrum, the matter bispectrum, and the probability density function of the matter fractional overdensity, CARPool reduces the variance by factors of 10 to 100 even in the non-linear regime, and by much larger factors on large scales. Using only five gadget-iii simulations, CARPool computes Fourier-space two-point and three-point functions of the matter distribution at a precision comparable to that of 500 gadget-iii simulations.
CARPool requires (i) inexpensive access to surrogate solutions, and (ii) strong correlations of the fluctuations about the mean of the surrogate model with the fluctuations of the expensive and accurate simulations. By construction, CARPool estimates are unbiased compared to the full simulations no matter how biased the surrogates might be. In all our examples, we achieved substantial variance reductions even though the fast surrogate statistics were highly biased compared to the full simulations.
So far we have presented CARPool as a way to accelerate the convergence of ensemble averages of accurate simulations. An equivalent point of view would be to consider it a method to remove approximation error from ensembles of fast mocks by running a small number of full simulations. Such simulations often already exist, as in our case with the Quijote simulations, not least because strategies to produce fast surrogates are often tested against a small number of simulations.
In some cases there are opportunities to use CARPool almost for free: for instance, using linear theory from the initial conditions as a surrogate model has the advantage that |$\boldsymbol{\mu _c}$| (the mean linear theory power spectrum) is perfectly known a priori. In addition, the de-correlation between linearly and non-linearly evolved perturbations is well studied, and can be used to set |$\boldsymbol{\beta }$|. Even for just a single N-body simulation, and without the need to estimate |$\boldsymbol{\mu _c}$| from an ensemble of surrogates, this would remove cosmic variance on the largest scales better than in our numerical experiments with l-picola, which are limited by the uncertainty of the |$\boldsymbol{\mu _c}$| estimate.
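A hedged sketch of this 'free' variant: with an analytic linear-theory mean and a physically motivated decorrelation model, a single N-body realization can be debiased on large scales. All names are ours, and the inputs rho_k and sigma_ratio_k would come from a model of the linear-to-non-linear decorrelation rather than from surrogate ensembles:

```python
def carpool_single_linear(pk_nbody, pk_linear_ic, pk_linear_theory,
                          rho_k, sigma_ratio_k):
    """CARPool with linear theory as the surrogate, for one N-body run.

    pk_nbody         : non-linear P(k) of the single simulation (array)
    pk_linear_ic     : linear P(k) evolved from the same initial conditions
    pk_linear_theory : analytic ensemble mean of the linear P(k)  (mu_c)
    rho_k            : modelled cross-correlation between the two estimates
    sigma_ratio_k    : modelled ratio sigma_y(k) / sigma_c(k)
    """
    beta = rho_k * sigma_ratio_k  # optimal univariate control coefficient
    return pk_nbody - beta * (pk_linear_ic - pk_linear_theory)
```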
Regardless of the details of the implementation, the reduction of sample variance on observables could be used to avoid having to run ensembles of simulations (or even surrogates) at the full survey volume. This would simplify simulation efforts for upcoming large surveys since memory limitations rather than computational time are currently the most severe bottleneck for full-survey simulations (Potter et al. 2017).
In comparison to other methods of variance reduction, CARPool has the main advantage of guaranteeing lack of model error (‘bias’) compared to the full simulation. ‘Fixing’ (Angulo & Pontzen 2016; Pontzen et al. 2016) explicitly modifies the statistics of the generated simulation outputs; which observables are unbiased must be checked on a case-by-case basis, either through theoretical arguments or through explicit simulation (Villaescusa-Navarro et al. 2018). Klypin, Prada & Byun (2020) argue that ‘fixed’ field initialization is unsuitable for simulation suites to estimate accurate covariance matrices, and they are pessimistic about the possibility of generating mock galaxy catalogues solely with this technique.
Pontzen et al. (2016) and Angulo & Pontzen (2016) also introduce and study the ‘pairing’ technique. ‘Pairing’ reduces variance for k-space observables (such as the power spectrum) by a factor of |$\mathcal {O}(1)$| by combining two simulations whose initial conditions differ only by an overall minus sign, that is, they are phase-flipped. This technique can be analysed simply in the control variates framework of CARPool: consider the phase-flipped simulation as the surrogate. By symmetry, the mean of an ensemble of phase-flipped simulations is identical to the mean of the unflipped simulations. ‘Pairing’ then amounts to taking |$\beta=-1$|, cancelling the contributions of odd-order terms in the initial conditions (Angulo & Pontzen 2016; Pontzen et al. 2016) and thereby reducing the variance of the simulation output. Inserting this |$\beta$| into equation (2) and taking the expectation shows that ‘pairing’ is an unbiased estimator of the simulation mean.
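Explicitly, inserting |$\beta = -1$| into the combination |$\boldsymbol{y}-\boldsymbol{\beta }\left(\boldsymbol{c} - \boldsymbol{\mu _c} \right)$| gives |$\boldsymbol{y} + \boldsymbol{c} - \boldsymbol{\mu _c}$|, so that |$\mathrm{E}\left[\boldsymbol{y} + \boldsymbol{c} - \boldsymbol{\mu _c}\right] = \boldsymbol{\mu _y} + \boldsymbol{\mu _c} - \boldsymbol{\mu _c} = \boldsymbol{\mu _y}$|: the phase-flipped surrogate cancels the odd-order fluctuations without shifting the mean.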
Other opportunities for exploiting the control variates principle abound, and related ideas have been used in the past. For example, a very recent study (Smith et al. 2021) succeeds in reducing the variance of the quadrupole estimator of the two-point clustering statistic in redshift space by combining different, correlated lines of sight through the halo catalogue of the Outer Rim simulation. Although not derived from a general theoretical framework guaranteeing unbiasedness and optimal variance reduction, their approach requires no pre-computed fast surrogates for the specific application at hand and uses a control matrix set by physical assumptions.
While we intentionally refrained from tuning CARPool for this first study, there are opportunities to use physical insight to adapt it to cosmological applications. For instance, the one-point remapping technique proposed by Leclercq et al. (2013), which increases the cross-correlation between LPT-evolved density fields and full N-body simulations, could improve the snapshots of a chosen surrogate before applying CARPool.
In future work, we plan to explore intermediate forms of CARPool between the multivariate and univariate versions we study in this paper. Any given entry of |$\boldsymbol{y}$| could be predicted by an optimal combination of a small subset of |$\boldsymbol{c}$|. In this case, the variance reduction could be improved compared to the univariate case while the reduced dimension of the control matrix would ensure a stable estimate using a moderate number of simulations.
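One such intermediate scheme is a banded control matrix, in which each bin of |$\boldsymbol{y}$| is regressed on a narrow band of neighbouring bins of |$\boldsymbol{c}$|. The sketch below, with hypothetical names and assuming paired sample matrices as in the earlier sketch, illustrates the idea:

```python
import numpy as np

def banded_beta(y, c, half_width=2):
    """Estimate a banded control matrix: bin j of y is regressed on the
    (2*half_width + 1) neighbouring bins of c only, so far fewer
    coefficients are fitted than in the full p-by-p control matrix.

    y, c : (N, p) paired samples; returns beta of shape (p, p), banded.
    """
    N, p = y.shape
    dy = y - y.mean(axis=0)
    dc = c - c.mean(axis=0)
    beta = np.zeros((p, p))
    for j in range(p):
        lo, hi = max(0, j - half_width), min(p, j + half_width + 1)
        # least-squares fit of bin j of y on the nearby bins of c
        coef, *_ = np.linalg.lstsq(dc[:, lo:hi], dy[:, j], rcond=None)
        beta[j, lo:hi] = coef
    return beta
```

The CARPool combination then uses |$\boldsymbol{y} - \boldsymbol{\beta }\left(\boldsymbol{c} - \boldsymbol{\mu _c}\right)$| with this banded |$\boldsymbol{\beta }$|, which remains stable with a moderate number of simulation pairs.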
The CARPool setup can be applied to numerous ‘N-body code plus surrogate’ pairs in cosmology. It can be used to make high-resolution corrections to low-resolution simulations while reducing variance. This will provide an alternative to the procedure suggested by Rasera et al. (2014), where the mass resolution effect is estimated by a polynomial fit of the matter power spectrum ratio, and to the work of Blot et al. (2015), where a linear transformation of the low-resolution power spectra, preserving the mean and variance, is smoothed by a polynomial fit. Furthermore, rather than relying on a single surrogate, exploiting multiple low-fidelity methods for variance reduction is also worth exploring, especially if the cost of running a large number of surrogates is non-negligible. For instance, taking linear theory as a second surrogate in addition to l-picola would have strongly reduced the number of l-picola runs required to match the variance of the |$\boldsymbol{\mu _c}$| estimate to the massively reduced variance of |$\boldsymbol{y}-\boldsymbol{\beta }\left(\boldsymbol{c} - \boldsymbol{\mu _c} \right)$|. In this regard, the multifidelity Monte Carlo scheme of Peherstorfer et al. (2016) and the approximate control variates framework of Gorodetsky et al. (2020) are recent techniques that reduce variance with multiple surrogates for a fixed computational budget. CARPool can also be combined with other techniques: for instance, if the paired-fixed fields initialization of Angulo & Pontzen (2016) is found to be unbiased in practice for a particular statistic, it can be combined with CARPool for further variance reduction.
The simplicity of the theory behind CARPool makes the method attractive for various applications both in and beyond cosmology, as long as the conditions given above are satisfied. Our results suggest that CARPool allows estimating the expectation values of any desired large-scale structure correlators with negligible variances from a small number of accurate simulations, thereby providing a useful complement to analytical approaches such as higher-order perturbation theory or effective field theory. We are planning to explore a number of these applications in upcoming publications.
ACKNOWLEDGEMENTS
We thank Martin Crocce, Janis Fluri, Cullan Howlett, and Hans Arnold Winther for their advice on COLA, and Boris Leistedt for stimulating discussions. We are grateful to Pier-Stefano Corasaniti, Eiichiro Komatsu, Marius Millea, Andrew Pontzen, Yann Rasera, and Matias Zaldarriaga for stimulating comments on an earlier version of the manuscript. Nicolas Chartier acknowledges funding from LabEx ENS-ICFP (PSL). Benjamin Wandelt acknowledges support by the ANR BIG4 project, grant ANR-16-CE23-0002 of the French Agence Nationale de la Recherche; and the Labex ILP (reference ANR-10-LABX-63) part of the Idex SUPER, and received financial state aid managed by the Agence Nationale de la Recherche, as part of the programme Investissements d’avenir under the reference ANR-11-IDEX-0004-02. The Flatiron Institute is supported by the Simons Foundation. Yashar Akrami is supported by LabEx ENS-ICFP: ANR-10-LABX-0010/ANR-10-IDEX-0001-02 PSL*. Francisco Villaescusa-Navarro acknowledges funding from the WFIRST program through NNG26PJ30C and NNN12AA01C.
DATA AVAILABILITY
The data underlying this article are available through globus.org, and instructions can be found at https://github.com/franciscovillaescusa/Quijote-simulations. Additionally, a python3 package and code examples are provided at https://github.com/CompiledAtBirth/pyCARPool to reproduce some results presented in this study.
Footnotes
As a jargon reminder, the accuracy and precision of an estimate refer, respectively, to the trueness of its expectation (the statistical bias) and to the spread of the estimate around that expectation (standard errors, confidence intervals).
We will consider surrogates to be much faster than simulations, so that we only need to consider the number of simulations to evaluate computational cost.
The intuition behind this principle is that for two random scalars a and b, we have |$\sigma _{a+b}^2 = \sigma _{a}^2 + \sigma _{b}^2+2\mathrm{cov}(a,b)$|.
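Replacing |$b$| by |$-\beta b$| in this identity gives |$\sigma _{a-\beta b}^2 = \sigma _{a}^2 + \beta ^2\sigma _{b}^2 - 2\beta \,\mathrm{cov}(a,b)$|, which is minimized at |$\beta ^{\star } = \mathrm{cov}(a,b)/\sigma _{b}^2$|, leaving the residual variance |$\sigma _{a}^2\left(1 - \rho _{ab}^2\right)$|, where |$\rho _{ab}$| is the correlation coefficient of a and b.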
Instructions to access the data are given at https://github.com/franciscovillaescusa/Quijote-simulations
The parallelized version of the code is available at http://cosmo.nyu.edu/roman/2LPT/
Available at https://github.com/cgevans/scikits-bootstrap
Available at https://github.com/franciscovillaescusa/Pylians3
While bootstrap is robust for estimating the |$95{{\ \rm per\ cent}}$| error bars of a sample mean over 500 simulations, it is not equally reliable with a very small number of realizations. This leads to large bin-to-bin variations of the estimated CARPool confidence intervals in Fig. 2. An alternative, parametric computation of confidence intervals with very few samples, using Student t-score values, can be found in Appendix B.
Available at https://github.com/changhoonhahn/pySpectrum
REFERENCES
APPENDIX A: ANALYTICAL DERIVATION: A BAYESIAN APPROACH
There is an elegant Bayesian derivation of the optimal form of the control variates estimator for the Gaussian case. The result coincides with the minimum variance estimator even in the non-Gaussian case. As in the derivation by Rubinstein & Marcus (1985), the covariance matrices of the full simulations |$\boldsymbol{y}$| and of the fast simulations |$\boldsymbol{c}$| are assumed to be known. In the main text, we use non-parametric approaches to estimate uncertainties since |$\boldsymbol{\beta }$| is not known a priori but estimated from the same simulations that we use to estimate |$\boldsymbol{\mu _y}$|.
APPENDIX B: ADDITIONAL INSIGHT ON RESULTS AND CONFIDENCE INTERVALS
Because the trustworthiness of confidence intervals for a sample mean estimated from very few realizations is debatable, we provide here, for the power spectrum only, Fig. B1 with bootstrap confidence intervals from 10 CARPool samples, and Fig. B2 for CARPool with 5 and 10 N-body simulations but with t-score intervals computed according to equation (B1). The latter figure is to be compared with Figs 2 and B1 (exactly the same data except for the blue CARPool confidence intervals). The paired plots agree, and the symmetric t-score confidence intervals tend to be larger. Additionally, for the two- and three-point clustering statistics, we present in Figs B3 (power spectrum) and B4 (bispectrum) the percentage errors of the CARPool means with 10 simulations, which are not shown in the main part of the paper.
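For concreteness, a minimal sketch of a symmetric Student-t interval of this kind, assuming equation (B1) is the standard construction and with variable names of our choosing:

```python
import numpy as np
from scipy import stats

def t_confidence_interval(samples, level=0.95):
    """Symmetric Student-t confidence interval for the mean of each bin.

    samples : (N, p) array of N realizations of a p-bin statistic.
    Returns (lower, upper), each of shape (p,).
    """
    N = samples.shape[0]
    mean = samples.mean(axis=0)
    sem = samples.std(axis=0, ddof=1) / np.sqrt(N)  # standard error of the mean
    t = stats.t.ppf(0.5 + level / 2.0, df=N - 1)    # e.g. t_{0.975, N-1}
    return mean - t * sem, mean + t * sem
```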

As in Fig. 2, but with 10 N-body simulations used for the CARPool estimate.

As in the lower panel of Fig. 3, but with 10 N-body simulations used for the CARPool estimate.

As in Fig. 7, but with 10 N-body simulations used for the CARPool estimate.
We also provide in Fig. B5 an overview of the optimal control matrix |$\boldsymbol{\beta ^{\star }}$| from equation (8) for the matter power spectrum and matter PDF test cases.

Estimated matrices entering |$\boldsymbol{\beta ^{\star }}$| for the matter power spectrum (left) and the matter PDF (right). The cross-covariance, covariance, and precision matrices are normalized, i.e. we display |$\boldsymbol{D} ^{-1}\boldsymbol{\widehat{\Sigma }}\boldsymbol{D} ^{-1}$| with |$\boldsymbol{D} = \sqrt{\mathrm{diag}\left({\boldsymbol{\widehat{\Sigma }}}\right)}$|. ‘od’ denotes the fractional overdensity bin |$\rho /\bar{\rho }$|. For better visibility, the diverging colour scale is not forced to be centred at 0.0 for the |$\boldsymbol{\Sigma _{yc}}$| and |$\boldsymbol{\Sigma _{cc}}$| estimates in the upper left corner (power spectrum). All matrices are estimated from 500 simulation pairs and represent the ‘close to optimal’ |$\boldsymbol{\beta ^{\star }}$| towards which the control matrix estimator tends in the multivariate setting.
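The normalization used in Fig. B5 is the usual reduction of a covariance to a correlation-like matrix; a one-line sketch, with a function name of our choosing:

```python
import numpy as np

def normalize_covariance(sigma):
    """Return D^{-1} Sigma D^{-1}, with D = sqrt(diag(Sigma))."""
    d = np.sqrt(np.diag(sigma))
    return sigma / np.outer(d, d)
```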
APPENDIX C: COLA TIMESTEPPING AND CROSS-CORRELATION COEFFICIENTS
The numerator in (C3) is the cross power spectrum between the two aforementioned density contrast fields; |$\delta (\boldsymbol{k})$| is the Fourier transform of the real-space density contrast |$\delta (\boldsymbol{x})$|. Note that these coefficients serve as a proxy for the correlation between the COLA and gadget snapshots, but do not estimate the canonical cross-correlations of (10) between the statistics |$\boldsymbol{y}$| and |$\boldsymbol{c}$| computed from these snapshots. Having tested different schemes, we concluded that linearly spaced timesteps yield a better cross-correlation than logarithmically spaced ones, and that the fewer the timesteps, the more influential the modified timestepping parameter nLPT becomes for the cross-correlation coefficients (in the case of this study, with a very high starting redshift of zi = 127). Fig. C1 shows an example with 10 and 20 linearly spaced timesteps and nLPT ∈ {−2.5, +0.5} (the fiducial value and our experimentally ‘best’ value, respectively). Although ζyc(k = 1.0 hMpc−1) ≈ 0.96 with 10 timesteps exceeds ζyc(k = 1.0 hMpc−1) ≈ 0.94 with 20 timesteps for nLPT = +0.5, we nevertheless generated our l-picola snapshots with 20 timesteps between zi = 127 and z = 0.0, again to avoid tuning l-picola for any one particular statistic. In any case, even with 20 timesteps the l-picola surrogates are much faster than full gadget-iii simulations.
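As a sketch of how such coefficients can be measured from two gridded density-contrast fields, ignoring mass-assignment deconvolution and shot-noise corrections, and with names and binning choices of our own:

```python
import numpy as np

def cross_correlation_zeta(delta_y, delta_c, boxsize, n_bins=50):
    """Cross-correlation coefficients zeta_yc(k) between two gridded
    density contrast fields (a sketch of equation C3): the binned cross
    power spectrum divided by the root of the two auto power spectra."""
    n = delta_y.shape[0]
    dk_y = np.fft.rfftn(delta_y)
    dk_c = np.fft.rfftn(delta_c)
    # physical |k| of each grid mode, in units of the fundamental mode kf
    kf = 2.0 * np.pi / boxsize
    k_full = np.fft.fftfreq(n, d=1.0 / n) * kf   # full-axis mode numbers
    k_half = np.fft.rfftfreq(n, d=1.0 / n) * kf  # half-axis (rfft) modes
    kmag = np.sqrt(k_full[:, None, None] ** 2
                   + k_full[None, :, None] ** 2
                   + k_half[None, None, :] ** 2)
    bins = np.linspace(kf, kmag.max(), n_bins + 1)
    idx = np.digitize(kmag.ravel(), bins)
    # bin the auto and cross spectra; empty bins yield NaN in the ratio
    p_yy = np.bincount(idx, weights=np.abs(dk_y.ravel()) ** 2,
                       minlength=n_bins + 2)[1:n_bins + 1]
    p_cc = np.bincount(idx, weights=np.abs(dk_c.ravel()) ** 2,
                       minlength=n_bins + 2)[1:n_bins + 1]
    p_yc = np.bincount(idx, weights=(dk_y * np.conj(dk_c)).real.ravel(),
                       minlength=n_bins + 2)[1:n_bins + 1]
    k_centres = 0.5 * (bins[:-1] + bins[1:])
    return k_centres, p_yc / np.sqrt(p_yy * p_cc)
```

The normalization of the spectra cancels in the ratio, so the raw FFT amplitudes suffice for this diagnostic.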

Power spectrum recovery ratio (top) and cross power spectrum coefficients (bottom) at z = 0.5 between a specific l-picola snapshot computed with 10 (left) and 20 (right) linearly-spaced timesteps and the corresponding N-body snapshot derived from the same initial conditions at zi = 127.