Dark Energy Survey year 3 results: covariance modelling and its impact on parameter estimation and quality of fit

ABSTRACT

We describe and test the fiducial covariance matrix model for the combined two-point function analysis of the Dark Energy Survey Year 3 (DES-Y3) data set. Using a variety of new ansatzes for covariance modelling and testing, we validate the assumptions and approximations of this model. These include the assumption of Gaussian likelihood, the trispectrum contribution to the covariance, the impact of evaluating the model at a wrong set of parameters, the impact of masking and survey geometry, deviations from Poissonian shot noise, galaxy weighting schemes, and other sub-dominant effects. We find that our covariance model is robust and that its approximations have little impact on goodness of fit and parameter estimation. The largest impact on best-fitting figure-of-merit arises from the so-called f_sky approximation for dealing with finite survey area, which on average increases the χ² between maximum posterior model and measurement by |$3.7{{\ \rm per\ cent}}$| (Δχ² ≈ 18.9). Standard methods to go beyond this approximation fail for DES-Y3, but we derive an approximate scheme to deal with these features. For parameter estimation, our ignorance of the exact parameters at which to evaluate our covariance model causes the dominant effect. We find that it increases the scatter of maximum posterior values for Ω_m and σ₈ by about |$3{{\ \rm per\ cent}}$| and for the dark energy equation-of-state parameter by about |$5{{\ \rm per\ cent}}$|⁠.

large-scale structure of Universe, cosmology: observations

1 INTRODUCTION

Our understanding of the Universe has become much more accurate in the past decades due to a massive amount of observational data collected through different probes, such as the cosmic microwave background (CMB; see e.g. Planck Collaboration VI 2020), big bang nucleosynthesis (see e.g. Fields et al. 2020), Type Ia supernovae (see e.g. Riess 2017; Smith et al. 2020), number counts of clusters of galaxies (see e.g. Mantz et al. 2014; Costanzi et al. 2019; Abbott et al. 2020), the correlation of galaxy positions, and that of their measured shape (see e.g. Abbott et al. 2018; Heymans et al. 2020). From the study of that data, a standard cosmological model has emerged characterized by a small number of parameters (see e.g. Frieman, Turner & Huterer 2008; Peebles 2012; Blandford et al. 2020). Current spectroscopic and photometric surveys of galaxies such as the Extended Baryon Oscillation Spectroscopic Survey¹ and earlier phases of the Sloan Digital Sky Survey, the Hyper Suprime-Cam Subaru Strategic Program,² the Kilo-Degree Survey (KiDS³), and the Dark Energy Survey (DES⁴) have become instrumental in testing this standard model at a new front: the growth of density perturbations in the late-time Universe. Also, future surveys, such as the Dark Energy Spectroscopic Instrument,⁵ the Vera Rubin Observatory Legacy Survey of Space and Time,⁶ Euclid,⁷ and the Nancy Grace Roman Space Telescope,⁸ will push this test to a precision exceeding that provided by other cosmological probes.

An important part of this program is the DES, a state-of-the-art galaxy survey that completed its 6-yr observational campaign in 2019 January (Diehl et al. 2019) collecting data on position, colour, and shape for more than 300 million galaxies. This makes DES the most sensitive and comprehensive photometric galaxy survey ever performed. The main cosmological analyses of the first year (Y1) of DES data have been concluded (Abbott et al. 2018, 2019b) and analyses of the first 3 yr of data (Y3) are under way. The study of the large-scale structure (LSS) of the Universe based on the DES-Y3 data set has the potential to become the most stringent test of our understanding of cosmological physics to date.

To achieve this goal, the DES team is comparing different theoretical models characterized by a range of cosmological parameters to the measured statistics of the LSS in order to determine the model and range of parameters that are in best agreement with the data. The statistics of the LSS considered in the main DES-Y3 analysis are two-point correlation functions of the galaxy density field (galaxy clustering), the weak gravitational lensing field (cosmic shear), and the cross-correlation functions between these fields (galaxy–galaxy lensing) in real space and measured in different redshift bins. These three types of two-point correlation functions are combined into one data vector – the so-called 3×2pt data vector.

A key ingredient in analysing these statistics is a model for the likelihood of a cosmological model given the measured correlation functions. Under the assumption of Gaussian statistical uncertainties (which is to be validated), this likelihood is completely characterized by the covariance matrix that describes how correlated the uncertainties of different data points in the 3×2pt data vector are. Validating the quality of the covariance model for the DES-Y3 two-point analyses is the main focus of this paper.

There are several methods to estimate covariance matrices that can roughly be divided into four main categories: covariance estimation from the data itself (e.g. through jackknife or sub-sampling methods; cf. Norberg et al. 2009; Friedrich et al. 2016), covariance estimation from a suite of simulations (e.g. Hartlap, Simon & Schneider 2007; Dodelson & Schneider 2013; Taylor, Joachimi & Kitching 2013; Percival et al. 2014; Taylor & Joachimi 2014; Joachimi 2017; Sellentin & Heavens 2017; Avila et al. 2018; Shirasaki et al. 2019), theoretical covariance modelling (e.g. Schneider et al. 2002; Eifler, Schneider & Hartlap 2009; Krause et al. 2017), or hybrid methods combining both simulations and theoretical covariance models (e.g. Pope & Szapudi 2008; Friedrich & Eifler 2018; Hall & Taylor 2019).

For the DES-Y3 3×2pt analysis, we adopt a theoretical covariance model as our fiducial covariance matrix. This fiducial covariance model is based on a halo model and includes a dominant Gaussian component, a non-Gaussian component (trispectrum and supersample covariance), redshift space distortions (RSDs), curved sky formalism, finite angular bin width, non-Limber computation for the clustering part, Gaussian shape noise, Poissonian shot noise, and f_sky approximation to treat the finite DES-Y3 survey footprint (although taking into account the exact survey geometry when computing sampling noise contributions to the covariance). In order to assess the accuracy of that model, we study the impact of several approximations and assumptions that go into it (and into two-point function covariance models in general):

the Gaussian likelihood assumption, i.e. whether knowledge of the covariance is sufficient to calculate the likelihood;
robustness with respect to the modelling of the non-Gaussian covariance contributions, i.e. contributions from the trispectrum and supersample covariance;
treatment of the fact that two-point functions are measured in finite angular bins;
cosmology dependence of the covariance model;
random point shot noise;
the assumption of Poissonian shot noise;
survey geometry and the f_sky approximation;
other covariance modelling details such as flat sky versus curved sky calculations, Limber approximation, and RSDs.

We generate different types of mock data and/or analytical estimates to determine how each of these effects impacts the quality of the fit between measurements of the 3×2pt data vector and maximum posterior models (quantified by the distribution of χ² between the two). We also show how they impact cosmological parameter constraints derived from measurements of the 3×2pt data vector. For most of these tests, we employ a linearized Gaussian likelihood framework that allows us to analytically quantify the impact of covariance errors on the χ² distribution and parameter constraints. This is complemented by a set of lognormal simulations and importance sampling techniques to quickly assess large numbers of mock (non-linear) likelihood analyses.

This paper is part of a larger release of scientific results from year-3 data of the DES and our analysis is informed by the (in some cases preliminary) analysis choices of the other DES-Y3 studies. In addition to carving out the most stringent constraints yet on cosmological parameters from late-time two-point statistics of galaxy density and cosmic shear, the year-3 analysis of the DES collaboration is introducing and testing numerous methodological innovations that pave the way for future experiments. Details of the DES-Y3 galaxy catalogues and the photometric estimation of their redshift distribution are presented by Sevilla-Noarbe et al. (2020), Hartley et al. (2020), Everett et al. (2020), Myles et al. (2021), Gatti et al. (2020a), Cawthon et al. (2020), Buchs et al. (2019), and Cordero et al. (2020). The measurements of galaxy shapes and the calibration of these measurements for the purpose of cosmic weak gravitational lensing analyses are detailed by Gatti et al. (2020b), Jarvis et al. (2020), and MacCrann et al. (2020). Krause et al. (2021) develop and test the theoretical modelling pipeline of the DES-Y3 3×2pt analysis, Pandey et al. (2021) outline how galaxy bias is incorporated in this pipeline, DeRose et al. (2020) validate this pipeline with the help of simulated data, and Muir et al. (2020) describe how we have blinded our analysis to focus our efforts on model-independent validation criteria and reduce the chance of confirmation bias. The DES-Y3 methodology to sample high-dimensional likelihoods and to characterize external and internal tensions is outlined by Lemos et al. (2021) and Doux et al. (2020). Measurements of cosmic shear two-point correlation functions and analyses thereof are presented by Amon et al. (2021) and Secco et al. (2021), the measurement and analysis of galaxy clustering two-point statistics are carried out by Rodríguez-Monroy et al. (2021), and two-point cross-correlations between galaxy density and cosmic shear (galaxy–galaxy lensing) are measured and analysed by Prat et al. (2021), with additional analyses of lensing magnification and shear ratios carried out by Elvin-Poole et al. (in preparation) and Sánchez et al. (2021) and results for an alternative lens galaxy sample presented by Porredon et al. (2020) and Porredon et al. (2021). Finally, in DES Collaboration (2020) we present our cosmological analysis of the full 3×2pt data vector.

Our paper is structured as follows. We start by presenting a discussion of our validation strategy in Section 2, where we also summarize our main findings before plunging into the details in the remainder of the paper. In Section 3, we review the modelling and structure of the 3×2pt data vector. Section 4 describes our fiducial covariance model as well as two alternatives to it that are used to validate several modelling assumptions. In Section 5, we describe our linearized likelihood formalism and derive analytically how different covariance matrices impact parameter constraints and maximum posterior χ² within that formalism (including the presence of nuisance parameters and allowing for Gaussian priors on these parameters). In Section 6, we present the details of each step in our validation strategy, followed by a short Section 7 presenting a simple test to corroborate some of the results from the linearized framework. We conclude with a discussion of our results in Section 8. Seven appendices describe in more detail some results used in this work.

2 COVARIANCE VALIDATION STRATEGY AND SUMMARY OF THE RESULTS

How should one validate the quality of a covariance model (and the associated likelihood model) for the purpose of constraining cosmological model parameters from a measured statistic? A straightforward answer seems to be that one should run a large number of accurate cosmological simulations, then measure and analyse the statistic at hand in each of the simulated data sets, and test whether the true parameters of the simulations are located within the, say, |$68.3{{\ \rm per\ cent}}$| quantile of the inferred parameter constraints in |$68.3{{\ \rm per\ cent}}$| of the simulations. There are, however, at least two problems with such an approach.

The first one is a conceptual problem. Consider a Bayesian analysis of a measured statistic |$\boldsymbol{\hat{\xi }}$| with a model |$\boldsymbol{\xi }[\boldsymbol{\pi }]$| that is parametrized by model parameters |$\boldsymbol{\pi }$|⁠. For each value of |$\boldsymbol{\pi }$|⁠, the statistical uncertainties in the measurement |$\boldsymbol{\hat{\xi }}$| will have some distribution

$$\begin{eqnarray*} \mathcal {L}\left(\boldsymbol{\pi }|\boldsymbol{\hat{\xi }}\right) \equiv p \left(\boldsymbol{\hat{\xi }}|\boldsymbol{\pi }\right) , \end{eqnarray*}$$

(1)

which is also called the likelihood of the parameters given the data. If this function is known, then a Bayesian analysis will assign a posterior probability distribution to the parameters as

$$\begin{eqnarray*} p\left(\boldsymbol{\pi }|\boldsymbol{\hat{\xi }}\right) = \frac{1}{\mathcal {N}}\ \mathcal {L}\left(\boldsymbol{\pi }|\boldsymbol{\hat{\xi }}\right)\ \mathrm{pr}(\boldsymbol{\pi })\ . \end{eqnarray*}$$

(2)

Here, |$\mathrm{pr}(\boldsymbol{\pi })$| is a prior probability distribution that parametrizes prior knowledge from other experiments (or theoretical constraints) and the normalization constant |$\mathcal {N}$| is fixed by demanding that |$p(\boldsymbol{\pi }|\boldsymbol{\hat{\xi }})$| be a probability distribution. The |$68.3{{\ \rm per\ cent}}$| confidence region for the parameters |$\boldsymbol{\pi }$| would then, e.g. be stated as a volume |$V_{68.3{{\ \rm per\ cent}}}$| in parameter space that contains |$68.3{{\ \rm per\ cent}}$| of the probability. To unambiguously define that volume, one can e.g. impose the additional condition that

$$\begin{eqnarray*} \min _{\boldsymbol{\pi } \in V_{68.3{{\ \rm per\ cent}}}} p\left(\boldsymbol{\pi }|\boldsymbol{\hat{\xi }}\right) \ge \max _{\boldsymbol{\pi } \notin V_{68.3{{\ \rm per\ cent}}}} p\left(\boldsymbol{\pi }|\boldsymbol{\hat{\xi }}\right) \end{eqnarray*}$$

(3)

or, more frequently, one would directly define one-dimensional intervals that satisfy the above conditions for the marginalized posterior distributions on the individual parameter axes. Unfortunately, if one performs such an analysis many times, one is not guaranteed that the true parameters (e.g. of a simulation) are located within |$V_{68.3{{\ \rm per\ cent}}}$| in |$68.3{{\ \rm per\ cent}}$| of the cases. This has recently been referred to as the prior volume effect [this issue is discussed in, e.g. Raveri & Hu (2019) and Abbott et al. (2019a)]. One may argue that a Bayesian posterior should not be interpreted in terms of frequencies, but that does not help with the task of validating this posterior on the basis of a large number of simulated data sets.⁹
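The condition in equation (3) selects the highest-posterior-density region. A minimal sketch of how such a region could be found on a uniform parameter grid is given below. The function name `hpd_mask`, the grid, and the unit-Gaussian toy posterior are illustrative assumptions and are not part of the DES-Y3 pipeline.

```python
import numpy as np

def hpd_mask(posterior, credibility=0.683):
    """Boolean mask of the grid cells inside the highest-posterior-density region.

    Assumes the posterior is tabulated on a uniform grid, so that cell values
    are proportional to cell probabilities.
    """
    p = np.asarray(posterior, dtype=float)
    p = p / p.sum()                          # normalise to unit total probability
    flat = p.ravel()
    order = np.argsort(flat)[::-1]           # cells from highest to lowest density
    cumulative = np.cumsum(flat[order])
    # smallest number of cells whose summed probability reaches the target
    n_keep = np.searchsorted(cumulative, credibility) + 1
    mask = np.zeros(flat.size, dtype=bool)
    mask[order[:n_keep]] = True
    return mask.reshape(p.shape)

# Toy example: a unit Gaussian posterior in one parameter (hypothetical).
grid = np.linspace(-5.0, 5.0, 2001)
posterior = np.exp(-0.5 * grid**2)
mask = hpd_mask(posterior)
lower, upper = grid[mask].min(), grid[mask].max()
print(f"68.3 per cent region: [{lower:.2f}, {upper:.2f}]")
```

By construction every cell inside the mask has a posterior value at least as large as every cell outside it, which is the condition of equation (3). For the toy Gaussian the region is close to the familiar ±1σ interval, up to the grid resolution.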

Another more practical problem is the fact that it is not (yet) feasible to generate enough sufficiently accurate mock data sets to validate covariance matrices of large data vectors with high precision. We recall that for the DES-Y1 analysis, a total of 18 realistic simulated data sets were available to validate the inference pipeline (MacCrann et al. 2018). At the same time, the main reason why N-body simulations would be required to test the accuracy of covariance (and likelihood) models is to capture contributions to the covariance coming from the trispectrum (connected four-point function) of the cosmic density field. However, for DES-like analyses it has been shown that this contribution is negligible (see e.g. Krause et al. 2017; Barreira, Krause & Schmidt 2018). The reason for this is twofold: First, very small scales (where the trispectrum contribution to the covariance would matter most) are often cut off from analyses because on these scales already the modelling of the data vector, |$\boldsymbol{\xi }[\boldsymbol{\pi }]$|⁠, is inaccurate. Secondly, on small scales the covariance matrix is often dominated by effects coming from sparse sampling such as shot noise and shape noise. These covariance contributions are typically easy to model (although one has to be careful when estimating effective number densities and shape-noise dispersions or when estimating the number of galaxy pairs in the presence of complex survey footprints; see Troxel et al. 2018a, b).

As a result of the above considerations, we base our covariance validation strategy mostly on the use of a linearized likelihood (where the model |$\boldsymbol{\xi }[\boldsymbol{\pi }]$| is linear in the parameters |$\boldsymbol{\pi }$|⁠). In this framework, the Bayesian likelihood allows for an interpretation in terms of frequencies – both for total and marginalized constraints. Also, this allows us to perform large numbers of simulated likelihood analyses very efficiently, without the need to run computationally expensive Markov chain Monte Carlo (MCMC) codes. In addition, any leading-order deviation from a linearized likelihood will be next-to-leading order for the purpose of studying the impact of covariance errors (i.e. errors on errors) on our analysis.

Within the linearized likelihood formalism, we confirm the findings of Krause et al. (2017) and Barreira et al. (2018) for the DES-Y3 set-up: Both supersample covariance and trispectrum have a negligible impact on our analysis. This allows us to estimate the impact of other assumptions in our covariance and likelihood model either analytically or by the means of simplified mock data such as lognormal simulations (as opposed to full N-body simulations; cf. Section 4.3).

We summarize our main findings in Fig. 1 and Table 1 for the busy reader. For the combined data vector of the DES-Y3 two-point function analysis (the 3×2pt data vector; see details in Section 3), the left-hand panel of Fig. 1 shows the impact of different assumptions in our likelihood model on the mean and scatter of χ² between maximum posterior model and measurements. To obtain the maximum posterior model, we are fitting for all the 28 parameters listed in Table 3 within the linearized likelihood framework described in Section 5.1. Since we assume Gaussian priors on 13 nuisance parameters, the effective number of parameters in that fit will be between 28 and 15. Within the linearized likelihood approach, we find that with a perfect covariance model the average χ² is expected to be about 507.6; i.e. the effective number of degrees of freedom in the fit is N_param,eff ≈ 23.4. The right-hand panel of Fig. 1 shows the equivalent results when cosmic shear correlation functions are excluded from the data vector (the 2×2pt data vector). The green points in both panels denote effects that have been already accounted for in the previous year-1 analysis of DES.
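The effective number of parameters quoted above follows from the 531 data points given in Section 3: 531 − 507.6 ≈ 23.4. The sketch below illustrates, for a toy linear model, how Gaussian priors reduce the effective number of fitted parameters. It assumes that χ² contains only the data term and that the priors are centred exactly on the true parameter values; the design matrix `A`, the noise levels, and the prior precisions are random or hypothetical placeholders, and the expression used is derived under these assumptions rather than taken from Section 5.

```python
import numpy as np

def effective_n_param(A, cov, prior_precision):
    """N_data - <chi2> for a linear model xi = A @ pi with Gaussian priors.

    Assumes chi2 = r^T C^-1 r with r the residual at the maximum posterior,
    and priors centred on the true parameters.
    """
    cinv_A = np.linalg.solve(cov, A)                         # C^-1 A
    fisher = A.T @ cinv_A                                    # F = A^T C^-1 A
    gf = np.linalg.solve(fisher + prior_precision, fisher)   # (F + Lambda)^-1 F
    return 2.0 * np.trace(gf) - np.trace(gf @ gf)

rng = np.random.default_rng(0)
n_data, n_param = 60, 6
A = rng.normal(size=(n_data, n_param))       # toy derivatives of the model
sigma = rng.uniform(0.5, 2.0, size=n_data)   # toy per-point uncertainties
cov = np.diag(sigma**2)
# Gaussian priors on the last two parameters only (hypothetical precisions).
prior_precision = np.diag([0.0, 0.0, 0.0, 0.0, 4.0, 4.0])
n_eff = effective_n_param(A, cov, prior_precision)

# Monte Carlo cross-check: draw noisy data around the truth (pi_true = 0),
# find the maximum posterior, and average the resulting chi2.
fisher = A.T @ np.linalg.solve(cov, A)
chi2_samples = []
for _ in range(4000):
    data = sigma * rng.normal(size=n_data)
    pi_hat = np.linalg.solve(fisher + prior_precision, A.T @ (data / sigma**2))
    residual = data - A @ pi_hat
    chi2_samples.append(np.sum((residual / sigma) ** 2))
mean_chi2 = np.mean(chi2_samples)

# The same bookkeeping applied to the numbers quoted in the text.
n_eff_des = 531 - 507.6
print(n_eff, n_data - mean_chi2, n_eff_des)
```

With priors on two of the six toy parameters, the effective number of parameters lies between four and six, mirroring the statement that the DES-Y3 value lies between 15 and 28.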

Figure 1.

Impact of different covariance modelling choices on χ² between measured 3×2pt (left-hand panel) and 2×2pt (right-hand panel) data vectors and maximum posterior models. The dashed vertical lines and error bars indicate the 1σ fluctuations expected in χ². See the main text for details.


Table 1.

Summary of the impact of the different effects tested here on the distribution of χ² between measurement and maximum posterior model, on the scatter |$\sigma [\hat{\pi }]$| of maximum posterior parameters |$\hat{\pi }$|⁠, and on the standard deviations σ_π on these parameters inferred from the likelihood. See the text for details.

Effect	〈χ²〉	σ(χ²)	\|$\sigma (\hat{\Omega }_\mathrm{ m})$\|	\|$\sigma _{\Omega _\mathrm{ m}}$\|	\|$\sigma (\hat{\sigma }_8)$\|	\|$\sigma _{\sigma _8}$\|	\|$\sigma (\hat{w})$\|	σ_w
Fiducial	507.6	31.8	0.0509	0.0509	0.0975	0.0975	0.244	0.244
Angular bin width	402.1	26.0	+0.8 per cent	+7.4 per cent	+0.8 per cent	+8.3 per cent	+1.0 per cent	+7.4 per cent
Connected four-point function	507.6	31.8	+0.1 per cent	−0.8 per cent	+0.1 per cent	−0.9 per cent	+0.1 per cent	−0.8 per cent
Curved sky	507.7	31.8	+0.0 per cent	−0.0 per cent	+0.0 per cent	−0.0 per cent	+0.0 per cent	−0.0 per cent
Non-Limber and RSD	511.4	32.1	+0.1 per cent	−0.6 per cent	+0.1 per cent	−0.6 per cent	+0.3 per cent	−1.4 per cent
Non-Gauss. likelihood	–	32.6	+0.8 per cent (low)	–	+0.4 per cent (low)	–	+0.5 per cent (low)	–
			−0.9 per cent (high)		−0.4 per cent (high)		+0.05 per cent (high)
Covariance cosmology	508.6	32.4	+2.9 per cent	+(0.1 ± 0.06) per cent	+2.8 per cent	+(0.1 ± 0.05) per cent	+4.7 per cent	+(0.1 ± 0.06) per cent
Random point shot noise	511.3	32.0	+0.0 per cent	−0.5 per cent	+0.0 per cent	−0.6 per cent	+0.0 per cent	−0.2 per cent
Non-Poisson shot noise	515.0	32.3	+0.0 per cent	−0.7 per cent	+0.0 per cent	−0.8 per cent	+0.0 per cent	−0.6 per cent
Masking and survey geometry	526.5	33.8	+0.6 per cent	−0.8 per cent	+0.7 per cent	−0.3 per cent	+0.3 per cent	−1.3 per cent

What stands out in our analysis is the large effect of finite angular bin sizes on the cosmic variance and mixed terms of our covariance model (cf. Section 4 for this terminology, where we also show that it is unavoidable to take into account finite bin width in the pure shot noise and shape noise terms of the covariance). In DES-Y1, this has been dealt with in an approximate manner, by computing the covariance model for a very fine angular binning and then re-summing the matrix to obtain a coarser binning (Krause et al. 2017). This time we incorporate the exact treatment of finite angular bin size for all the three two-point functions into our fiducial covariance model (cf. Section 4). The blue points in Fig. 1 denote improvements that have been made in the year-3 analysis compared to the year-1 covariance model, and the red points are estimates of effects that are not taken into account in the fiducial DES-Y3 likelihood – either because they are negligible or because an exact treatment is unfeasible (cf. Section 6 for details). Adding these effects in quadrature, our results suggest that the maximum posterior χ² of the DES-Y3 3×2pt analysis should be on average |$\approx 4{{\ \rm per\ cent}}$| (Δχ² ≈ 20.3) higher than expected if the exact covariance matrix of our data vector was known.
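The quadrature sum can be reproduced approximately from the 〈χ²〉 column of Table 1. The sketch below assumes that the red points of Fig. 1 correspond to the three rows listed in the dictionary; this selection is an assumption made here for illustration and is not stated explicitly in the text.

```python
import math

fiducial_chi2 = 507.6

# <chi2> values copied from Table 1. Which rows count as effects "not taken
# into account in the fiducial likelihood" is an assumption of this sketch.
mean_chi2 = {
    "Covariance cosmology": 508.6,
    "Non-Poisson shot noise": 515.0,
    "Masking and survey geometry": 526.5,
}

offsets = {name: value - fiducial_chi2 for name, value in mean_chi2.items()}
total_offset = math.sqrt(sum(delta**2 for delta in offsets.values()))
fraction = total_offset / fiducial_chi2

print(f"Delta chi2 in quadrature: {total_offset:.1f}")
print(f"relative to fiducial:     {100 * fraction:.1f} per cent")
```

Under this assumption the sum is dominated by the masking and survey geometry term (Δχ² ≈ 18.9), which is the f_sky effect highlighted in the abstract.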

Table 1 summarizes the offsets in χ² displayed in the left-hand panel of Fig. 1 and also shows how parameter constraints based on the 3×2pt data vector are impacted by assumptions of our covariance and likelihood model. We distinguish two effects here: the scatter of a maximum posterior parameter π (which we denote by |$\sigma [\hat{\pi }]$|⁠) and the width of posterior constraints inferred from our likelihood model (which we denote by σ_π). For our tests of likelihood non-Gaussianity, we state the changes in the difference between the fiducial parameter values and the upper (high) and lower (low) boundaries of the 68.3 per cent quantile with respect to the standard deviation of the Gaussian likelihood. For our tests of the impact of covariance cosmology, we show the mean of all σ_π obtained from our 100 different covariances and also indicate the scatter of these σ_π values.

The effect that has the dominant impact on parameter constraints is that of evaluating the covariance model at a set of parameters that do not represent the exact cosmology of the Universe. When computing the covariance at 100 different cosmologies that were randomly drawn from a Monte Carlo Markov chain (run around a fiducial model data vector; see Section 6.8 for details), we find that the differences between these covariances introduce an additional scatter in maximum posterior parameter values. This scatter increases by about |$3{{\ \rm per\ cent}}$| for Ω_m and σ₈ and by about |$5{{\ \rm per\ cent}}$| for the dark energy equation-of-state parameter w. This increased scatter is in fact the dominant effect, since the width of the derived parameter constraints hardly changes between the different covariance matrices. Note especially that rerunning the analysis with a covariance updated to the best-fitting parameters does not mitigate this effect.

In Fig. 2, we take the two effects that had the largest impacts on χ² and show the resulting mismatch between scatter of maximum posterior values and width of the inferred contours for a wider range of parameters. All of our results take into account marginalization over nuisance parameters (and all other parameters).
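The mismatch between parameter scatter and inferred posterior width can be illustrated with a toy linear model. The sketch below assumes a model that is linear in the parameters, no priors, and an estimator built from a possibly erroneous covariance; the function name `scatter_and_width` and all input matrices are hypothetical and only meant to show the structure of the calculation, not the expressions derived in Section 5.

```python
import numpy as np

def scatter_and_width(A, cov_true, cov_model):
    """Actual scatter of best-fit parameters versus the width inferred from
    the model covariance, for a linear model xi = A @ pi without priors."""
    cinv_A = np.linalg.solve(cov_model, A)        # C_model^-1 A
    fisher_model = A.T @ cinv_A                   # A^T C_model^-1 A
    inferred_cov = np.linalg.inv(fisher_model)    # what the analysis reports
    estimator = inferred_cov @ cinv_A.T           # maps data to best-fit parameters
    actual_cov = estimator @ cov_true @ estimator.T
    return np.sqrt(np.diag(actual_cov)), np.sqrt(np.diag(inferred_cov))

rng = np.random.default_rng(1)
n_data, n_param = 40, 3
A = rng.normal(size=(n_data, n_param))            # toy model derivatives
B = rng.normal(size=(n_data, n_data))
cov_true = B @ B.T + n_data * np.eye(n_data)      # toy positive-definite covariance

# Perfect covariance model: scatter and inferred width agree.
scatter, width = scatter_and_width(A, cov_true, cov_true)
ratio_exact = scatter / width

# Covariance model that underestimates every element by a factor of two:
# the best-fit values are unchanged, but the reported width is too small.
scatter, width = scatter_and_width(A, cov_true, 0.5 * cov_true)
ratio_underestimated = scatter / width

print(ratio_exact, ratio_underestimated)
```

A ratio above one means that the maximum posterior values scatter more than the reported contours suggest, which is the quantity plotted in Fig. 2.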

Figure 2.

Impact of covariance errors on the ratio of the standard deviation of maximum posterior parameters to the width of the posterior derived from the erroneous covariance. Green triangles show the effect caused by non-Poissonian shot noise and orange circles show the effect caused by the f_sky approximation (cf. Appendix C for our beyond-f_sky treatment). These ratios have been calculated purely on the basis of different analytic covariance models and within the linearized likelihood framework discussed in Section 5.1. We also show the ratio of maximum posterior parameter scatter observed from the 197 flask simulations to the statistical uncertainties expected from a lognormal covariance matrix matching the flask configuration. Within the statistical uncertainties, these ratios are consistent with 1.


Our reason for exclusively investigating the impact of covariance errors on χ² and parameter constraints is that those are the two measures by which our final (one-shot) data analysis will be interpreted and judged.¹⁰ In the remainder of this paper, we detail how the above results were obtained.

3 THE 3×2-POINT DATA VECTOR

The combined 3×2pt data vector of the DES-Y3 analysis consists of measurements of the following two-point correlations:

the angular two-point correlation function w(θ) of galaxy density contrast measured for luminous red galaxies in five different redshift bins (see e.g. Rodríguez-Monroy et al. 2021; Cawthon et al. 2020, as well as other relevant references given in Section 1),
the auto- and cross-correlation functions ξ₊(θ) and ξ₋(θ) between the galaxy shapes of four redshift bins of source galaxies (see e.g. Amon et al. 2021; Gatti et al. 2020a; Myles et al. 2021; Secco et al. 2021),
the tangential shear γ(θ) imprinted on source galaxy shapes around positions of foreground redMaGiC galaxies (see e.g. Prat et al. 2021).

At the time of writing this paper, the exact choices for redshift intervals and angular bins considered for each two-point function are still being determined by a careful study of their impact on the robustness of DES-Y3 parameter constraints (DES Collaboration 2020; Krause et al. 2021). For the purposes of testing the modelling of the covariance matrix, we will use the most recent but possibly not final DES-Y3 analysis choices. We do not expect that our tests and conclusions will change in a significant manner with further updated analysis choices. We assume that each of the correlation functions is measured in 20 logarithmically spaced angular bins between θ_min = 2.5 arcmin and θ_max = 250 arcmin. Some of these bins in some of the measured two-point functions are being cut from the analysis to ensure unbiased cosmological results, resulting in a total of 531 data points when using the preliminary DES-Y3 scale cuts.
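The angular binning described above can be written down directly. The sketch below builds 20 logarithmically spaced bins between 2.5 and 250 arcmin; the use of the geometric mean as a representative angle per bin is an illustrative choice and not necessarily the convention of the DES-Y3 measurement pipeline.

```python
import numpy as np

theta_min, theta_max, n_bins = 2.5, 250.0, 20   # arcmin, as quoted in the text

# 21 bin edges, equally spaced in log(theta)
edges = np.geomspace(theta_min, theta_max, n_bins + 1)

# Every bin spans the same ratio theta_upper / theta_lower = 100**(1/20)
edge_ratio = edges[1:] / edges[:-1]

# Geometric bin centres (an illustrative convention)
centres = np.sqrt(edges[1:] * edges[:-1])

for lo, hi, mid in zip(edges[:-1], edges[1:], centres):
    print(f"{lo:8.2f} - {hi:8.2f} arcmin (centre {mid:8.2f})")
```

Each bin is about 26 per cent wider than the previous one, so the bins are far from infinitesimally thin, which is why the finite bin width matters for the covariance.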

Our starting point for modelling the different two-point functions in the 3×2pt data vector is the three-dimensional (3D) non-linear matter power spectrum P(k, z) at a given wavenumber k and redshift z. We obtain it by using either of the Boltzmann solvers CLASS¹¹ or CAMB¹² to calculate the linear power spectrum and the HALOFIT fitting formula (Smith et al. 2003) in its updated version (Takahashi et al. 2012) to turn this into the late-time non-linear power spectrum. From this 3D power spectrum, the angular power spectra required for our three two-point functions [cosmic shear (κκ), galaxy–galaxy lensing (δ_gκ), and galaxy clustering (δ_gδ_g)] in the Limber approximation are given by (e.g. Limber 1953; Krause et al. 2017):

$$\begin{eqnarray*} C^{ij}_{\kappa \kappa }(\ell) = \int \mathrm{ d}\chi \frac{q^i_\kappa (\chi) q^j_\kappa (\chi)}{\chi ^2} P\left(\frac{\ell +\frac{1}{2}}{\chi }, z(\chi)\right), \end{eqnarray*}$$

(4)

$$\begin{eqnarray*} C^{ij}_{\delta _\mathrm{ g} \kappa }(\ell) = \int \mathrm{ d}\chi \frac{q^i_\delta \left(\frac{\ell +\frac{1}{2}}{\chi },\chi \right) q^j_\kappa (\chi)}{\chi ^2} P\left(\frac{\ell +\frac{1}{2}}{\chi }, z(\chi)\right), \end{eqnarray*}$$

(5)

$$\begin{eqnarray*} C^{ij}_{\delta _\mathrm{ g} \delta _\mathrm{ g}}(\ell) = \int \mathrm{ d}\chi \frac{q^i_\delta \left(\frac{\ell +\frac{1}{2}}{\chi },\chi \right) q^j_\delta \left(\frac{\ell +\frac{1}{2}}{\chi },\chi \right)}{\chi ^2} P\left(\frac{\ell +\frac{1}{2}}{\chi }, z(\chi)\right)\ , \end{eqnarray*}$$

(6)

where χ is the comoving radial distance, i and j denote different combinations of pairs of redshift bins, and the lensing efficiency |$q^i_\kappa$| and the radial weight function for clustering |$q^i_\delta$| are, respectively, given by

$$\begin{eqnarray*} q^i_\kappa (\chi) &=& \frac{3H_0^2 \Omega _\mathrm{ m}\chi }{2a(\chi)} \int _{\chi }^{\chi _\mathrm{ h}} \mathrm{ d}\chi ^\prime \left(\frac{\chi ^\prime - \chi }{\chi }\right) n^i_\kappa (z(\chi ^\prime)) \frac{\mathrm{ d} z}{\mathrm{ d}\chi ^\prime } , \nonumber \\ q^i_\delta (k,\chi) &=& b^i(k, z(\chi))\ n^i_\delta (z(\chi)) \frac{\mathrm{ d} z}{\mathrm{ d}\chi }\ . \end{eqnarray*}$$

(7)

Here, H₀ is the Hubble parameter today, Ω_m is the ratio of today’s matter density to today’s critical density of the Universe, a(χ) is the Universe’s scale factor at comoving distance χ, and bⁱ(k, z) is a scale- and redshift-dependent galaxy bias. Furthermore, |$n^i_{\kappa , \delta }(z)$| denote the redshift distributions of the different DES-Y3 redshift bins of source and lens galaxies, respectively, normalized such that

$$\begin{eqnarray*} \int \mathrm{ d}z \,\, n^i_{\kappa , \delta }(z) = 1\ . \end{eqnarray*}$$

(8)

Note that on large angular scales the DES-Y3 analysis does not make use of the Limber approximation for galaxy clustering but instead employs the method derived in Fang, Eifler & Krause (2020a).
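On the scales where the Limber approximation is used, equations (4)–(6) are one-dimensional integrals over comoving distance and can be evaluated with simple quadrature. The sketch below is a toy illustration only: the function `limber_cl`, the kernels, and the power-law spectrum are hypothetical stand-ins chosen so that the integral has a closed form, and the power spectrum callable takes χ in place of z(χ).

```python
import numpy as np

def limber_cl(ell, chi, q_i, q_j, power):
    """Trapezoidal estimate of C(ell) = int dchi q_i q_j / chi^2 P((ell + 1/2) / chi, chi)."""
    k = (ell + 0.5) / chi
    integrand = q_i * q_j / chi**2 * power(k, chi)
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(chi)))

# Toy inputs (hypothetical): kernels q = chi on [500, 2000] and P(k) = A / k.
chi = np.linspace(500.0, 2000.0, 2001)
A = 3.0
toy_power = lambda k, chi: A / k
q = chi.copy()

ell = 100
cl_numeric = limber_cl(ell, chi, q, q, toy_power)

# For these inputs the integrand is A * chi / (ell + 1/2), so the integral is analytic.
cl_analytic = A * (chi[-1] ** 2 - chi[0] ** 2) / (2.0 * (ell + 0.5))
```

In a realistic calculation the kernels would be built from equation (7) and the power spectrum would come from a Boltzmann solver plus a non-linear prescription.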

The above angular power spectra are now related to the real space correlation functions w(θ), γ_t(θ), and ξ_±(θ) as

$$\begin{eqnarray*} w^i(\theta) &=& \sum _\ell \frac{2 \ell +1}{4\pi } P_\ell (\cos \theta) C^{ii}_{\delta _\mathrm{ g} \delta _\mathrm{ g}}(\ell)\ , \nonumber \\ \gamma ^{ij}_t(\theta) &=& \sum _\ell \frac{2\ell + 1}{4\pi } \frac{P_\ell ^2\left(\cos \theta \right)}{\ell (\ell + 1)} C_{\delta _\mathrm{ g}\kappa }^{ij}(\ell)\ ,\nonumber \\ \xi _{\pm }^{ij}(\theta) &=& \sum _{\ell \ge 2} \frac{2\ell + 1}{4\pi }\ \frac{2(G_{\ell , 2}^+(x) \pm G_{\ell , 2}^-(x))}{\ell ^2(\ell + 1)^2}\ C^{ij}_{\kappa \kappa }(\ell)\ .\nonumber \\ \end{eqnarray*}$$

(9)

Here, P_ℓ are the Legendre polynomials of order ℓ, |$P_\ell ^m$| are the associated Legendre polynomials, x = cos θ, and the functions |$G_{\ell , 2}^{+,-}(x)$| are given in Appendix A (see also Stebbins 1996). Note that we only consider the autocorrelations wⁱ(θ) for each tomographic bin since in the Y1 analysis it was shown that the cross-correlations do not carry significant information (Elvin-Poole et al. 2018).
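The first line of equation (9) is a Legendre series and can be summed directly. The following is a minimal sketch, assuming the power spectrum is tabulated at every integer multipole from ℓ = 0 to some maximum; the helper name `w_of_theta` is illustrative.

```python
import numpy as np
from numpy.polynomial.legendre import legval

def w_of_theta(theta_rad, cl):
    """w(theta) = sum_l (2l + 1) / (4 pi) P_l(cos theta) C_l for l = 0 .. len(cl) - 1."""
    ell = np.arange(len(cl))
    coeffs = (2 * ell + 1) / (4 * np.pi) * np.asarray(cl)
    # legval evaluates sum_l coeffs[l] * P_l(x).
    return legval(np.cos(theta_rad), coeffs)

# Sanity check with a single multipole: C_l = delta_{l,1} gives 3 cos(theta) / (4 pi).
cl = np.zeros(10)
cl[1] = 1.0
theta = np.radians([1.0, 10.0, 60.0])
w = w_of_theta(theta, cl)
```

The sums for γ_t and ξ_± have the same structure, with the Legendre polynomials replaced by the corresponding functions in equation (9).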

The above relations between angular power spectra and real space correlation functions can all be written in the form

$$\begin{eqnarray*} \xi ^{\mathrm{AB}}(\theta) = \sum _{\ell = 0}^\infty \frac{2 \ell +1}{4 \pi } F^{AB}_\ell (\theta) \, C_\ell ^{AB}\ . \end{eqnarray*}$$

(10)

This is particularly useful when deriving covariance expressions and when performing averages over finite bins in the angular scale θ. To achieve the latter, one can simply derive analytical averages of the functions |$F^{AB}_\ell (\theta)$|⁠. Both of these points will be considered in the next sections.
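As an illustration of such an analytical average, consider the case F_ℓ = P_ℓ relevant for w(θ). Assuming an area-weighted average over the bin (weight sin θ dθ, which is an assumption of this sketch), the standard identity (2ℓ + 1) P_ℓ = P′_{ℓ+1} − P′_{ℓ−1} gives the bin average in closed form. The function name is hypothetical.

```python
import numpy as np
from numpy.polynomial.legendre import Legendre

def bin_averaged_legendre(ell, theta_lo, theta_hi):
    """Area-weighted average of P_ell(cos theta) over theta_lo <= theta <= theta_hi (radians)."""
    x_lo, x_hi = np.cos(theta_hi), np.cos(theta_lo)  # cos is decreasing in theta
    if ell == 0:
        return 1.0
    upper = Legendre.basis(ell + 1)
    lower = Legendre.basis(ell - 1)

    def antiderivative(x):
        # Antiderivative of P_ell(x) is [P_{ell+1}(x) - P_{ell-1}(x)] / (2 ell + 1).
        return (upper(x) - lower(x)) / (2 * ell + 1)

    return (antiderivative(x_hi) - antiderivative(x_lo)) / (x_hi - x_lo)

# Example: average of P_7 over a bin from 30 to 40 degrees.
lo, hi = np.radians(30.0), np.radians(40.0)
avg_p7 = bin_averaged_legendre(7, lo, hi)
```

For very narrow bins at high ℓ the difference of antiderivatives can suffer from cancellation, so a production implementation would need more care than this sketch.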

For most of our tests, we consider the 3×2pt data vector and its covariance matrix at the fiducial cosmology described in Section 5, where we also show the Gaussian priors assumed on some of these parameters when assessing the impact of covariance modelling on parameter constraints and maximum posterior χ².

4 COVARIANCE MATRICES FOR THE 3 × 2PT DATA VECTOR

The covariance matrix of measurements of cosmological two-point statistics typically contains three contributions (cf. Krause & Eifler 2017; Krause et al. 2017),

$$\begin{eqnarray*} \mathbf {C} = \mathbf {C}_{\mathrm{G}} + \mathbf {C}_{\mathrm{nG}} + \mathbf {C}_{\mathrm{SSC}}\ . \end{eqnarray*}$$

(11)

Here, |$\mathbf {C}_{\mathrm{G}}$| is the contribution to the covariance that would be present if the cosmic matter density and cosmic shear fields were pure Gaussian random fields (see also Schneider et al. 2002; Crocce, Cabré & Gaztañaga 2011), |$\mathbf {C}_{\mathrm{nG}}$| are contributions involving the connected four-point function of these fields (the trispectrum), and |$\mathbf {C}_{\mathrm{SSC}}$| is the so-called supersample covariance contribution resulting from the fact that any survey only observes a finite volume of the Universe and that the mean density in that volume is subject to fluctuations due to long wavelength modes (Takada & Hu 2013; Schaan, Takada & Spergel 2014).

In the fiducial DES-Y3 analysis, we model all of these covariance contributions analytically. This fiducial model is described in Section 4.1. In Section 4.2, we describe an alternative model for the non-Gaussian covariance contributions that is used to test the robustness of our analysis with respect to the modelling of the trispectrum contribution. Finally, Section 4.3 describes a set of lognormal simulations (Xavier, Abdalla & Joachimi 2016) and the covariance matrix of the 3×2pt data vector estimated from them. These simulations also allow us to test the accuracy of our Gaussian likelihood assumption and the treatment of masking and finite survey area in our fiducial covariance model.

4.1 Fiducial DES-Y3 covariance

In our fiducial covariance matrix, we model the non-Gaussian covariance contributions |$\mathbf {C}_{\mathrm{nG}}$| and |$\mathbf {C}_{\mathrm{SSC}}$| using a halo model combined with leading-order perturbation theory to approximate the trispectrum of the cosmic density field and to compute the mode coupling between scales larger than the considered survey volume with scales inside that volume. These calculations are carried out using the CosmoCov code package (Fang et al. 2020a) based on the CosmoLike framework (Krause & Eifler 2017). Our modelling of these contributions has not changed with respect to the year-1 analysis of DES and we refer the reader to Krause et al. (2017) as well as to the CosmoLike papers for details. However, the modelling of the Gaussian contribution has changed as described in the following.

4.1.1 Gaussian covariance

Our modelling of the Gaussian covariance part has changed with respect to the year-1 analysis in the following ways:

we use (and present for the first time¹³) analytical expressions for the angular bin averaging of the functions |$F^{AB}_\ell (\theta)$| (cf. equation 10) for all four types of two-point functions present in our data vector (see Section 6.3; this is especially relevant for the sampling-noise contribution to the covariance; cf. Troxel et al. 2018b);
we account for RSDs and also use a non-Limber calculation to obtain the galaxy–galaxy clustering power spectrum |$C_{\delta _\mathrm{ g }\delta _\mathrm{ g}}(\ell)$| (see Section 6.5);
we do not make use of the flat-sky approximation anymore (see Section 6.4).

To derive expressions for the Gaussian covariance part, let us first consider an all-sky survey. If a two-point function measurement |$\hat{\xi }^{\mathrm{AB}}(\theta)$| could be obtained from data on the entire sky, then for most types of two-point correlations it would be related to power spectrum measurements |$C_\ell ^{AB}$| from a spherical harmonics decomposition of the same all-sky data through equation (10), i.e.

$$\begin{eqnarray*} \hat{\xi }^{\mathrm{AB}}(\theta) = \sum _{\ell = 0}^\infty \frac{2 \ell +1}{4 \pi } F^{AB}_\ell (\theta) \, \hat{C}_\ell ^{AB}\ . \end{eqnarray*}$$

(12)

A notable exception to this is the cosmic shear two-point functions |$\hat{\xi }_\pm$| that obtain contributions from both the so-called E-mode and B-mode power spectra (Schneider et al. 2002). For these functions, equation (12) in the curved sky formalism becomes

$$\begin{eqnarray*} \hat{\xi }_{\pm }^{ij}(\theta) = \sum _{\ell \ge 2} \frac{2\ell + 1}{4\pi }\ \frac{2(G_{\ell , 2}^+(x) \pm G_{\ell , 2}^-(x))}{\ell ^2(\ell + 1)^2}\ \left(\hat{C}^{E, ij}_{\gamma \gamma }(\ell) \pm \hat{C}^{B, ij}_{\gamma \gamma }(\ell)\right)\ , \end{eqnarray*}$$

(13)

where in the absence of shape-measurement systematics (and ignoring post-Born corrections) |$\langle \hat{C}^{E, ij}_{\gamma \gamma }(\ell) \rangle = C^{ij}_{\kappa \kappa }(\ell)$| and |$\langle \hat{C}^{B, ij}_{\gamma \gamma }(\ell) \rangle = 0$|⁠.

Since this is a linear equation in C(ℓ)’s, the covariance of two different two-point function measurements |$\hat{\xi }^{AB}$| and |$\hat{\xi }^{CD}$| at two different angular scales θ₁ and θ₂ would be given in terms of the covariance of the corresponding power spectrum measurements by

$$\begin{eqnarray*} \mathrm{Cov}\left[\hat{\xi }^{AB}(\theta _1), \hat{\xi }^{CD}(\theta _2)\right] = \sum _{\ell _1, \ell _2} \frac{(2\ell _1 +1)(2\ell _2 +1)}{(4\pi)^2} F_{\ell _1}^{AB}\left(\cos \theta _1 \right) F_{\ell _2}^{CD}\left(\cos \theta _2 \right) \mathrm{Cov}\left[\hat{C}_{\ell _1}^{AB}, \hat{C}_{\ell _2}^{CD}\right]\ . \end{eqnarray*}$$

(14)

Again, for |$\hat{\xi }^{AB}(\theta) = \hat{\xi }_\pm (\theta)$| one would have to use |$\hat{C}_{\ell }^{AB} = \hat{C}^{E}_{\gamma \gamma }(\ell) \pm \hat{C}^{B}_{\gamma \gamma }(\ell)$| in this sum.
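Because equation (14) is a linear propagation of errors, it can be written as a matrix product. The sketch below assumes F_ℓ = P_ℓ (the w(θ) case) and a toy diagonal harmonic-space covariance; the function names and the toy numbers are illustrative only.

```python
import numpy as np
from numpy.polynomial.legendre import legvander

def projection_matrix(theta_rad, ell_max):
    """M[a, l] = (2l + 1) / (4 pi) * P_l(cos theta_a), i.e. F_l = P_l as for w(theta)."""
    ell = np.arange(ell_max + 1)
    # legvander returns the matrix of P_l(x_a) for l = 0 .. ell_max.
    return legvander(np.cos(theta_rad), ell_max) * (2 * ell + 1) / (4 * np.pi)

def real_space_covariance(M_ab, M_cd, cov_cl):
    """Equation (14) as a matrix product: Cov_xi = M_ab Cov_C M_cd^T."""
    return M_ab @ cov_cl @ M_cd.T

theta = np.radians([0.5, 1.0, 2.0])
M = projection_matrix(theta, ell_max=50)
cov_cl = np.diag(1.0 / (2 * np.arange(51) + 1))  # toy diagonal covariance (made up)
cov_xi = real_space_covariance(M, M, cov_cl)
```

For a cross-covariance between two different two-point functions, `M_ab` and `M_cd` would be built from the respective F_ℓ functions.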

For the autopower spectrum of galaxy density contrast in one of our redshift bins, the harmonic space covariance would be (Crocce et al. 2011)

$$\begin{eqnarray*} \mathrm{Cov}\left[\hat{C}_{\delta _\mathrm{ g} \delta _\mathrm{ g}}^{ii}(\ell _1), \hat{C}_{\delta _\mathrm{ g} \delta _\mathrm{ g}}^{ii}(\ell _2)\right] = \frac{2\delta _{\ell _1\ell _2}}{(2\ell _1 + 1)} \left(C_{\delta _\mathrm{ g} \delta _\mathrm{ g}}^{ii}(\ell _1) + \frac{1}{n_\mathrm{ g}} \right)^2. \end{eqnarray*}$$

(15)

Here, n_g is the number density of the galaxies and |$\delta _{\ell _1\ell _2}$| is the Kronecker symbol. To account for partial-sky surveys (such as DES), we simply divide this expression (and similar ones for the other two-point functions) by the observed sky fraction f_sky. This so-called f_sky approximation leads to the following harmonic-space Gaussian covariances (see also Krause et al. 2017):

$$\begin{eqnarray*} \mathrm{Cov}\left[\hat{C}_{\mathrm{ g} \mathrm{ g}}^{ij}(\ell _1), \hat{C}_{\mathrm{ g g}}^{kl}(\ell _2)\right] = \frac{\delta _{\ell _1\ell _2} \left[\left(C_{\mathrm{ g g}}^{ik}(\ell _1) + \frac{\delta _{ik}}{n_\mathrm{ g}^i} \right)\left(C_{\mathrm{ g g}}^{jl}(\ell _1) + \frac{\delta _{jl}}{n_\mathrm{ g}^j} \right) + \left(C_{\mathrm{ g g}}^{il}(\ell _1) + \frac{\delta _{il}}{n_\mathrm{ g}^i} \right)\left(C_{\mathrm{ g g}}^{jk}(\ell _1) + \frac{\delta _{jk}}{n_\mathrm{ g}^j} \right)\right]}{(2\ell _1 + 1) f_{\mathrm{sky}}} \end{eqnarray*}$$

(16)

$$\begin{eqnarray*} \mathrm{Cov}\left[\hat{C}_{\gamma \gamma }^{E, ij}(\ell _1), \hat{C}_{\gamma \gamma }^{E, kl}(\ell _2)\right] = \frac{\delta _{\ell _1\ell _2} \left[\left(C_{\kappa \kappa }^{ik}(\ell _1) + \frac{\delta _{ik}\sigma _{\epsilon , i}^2}{n_\mathrm{ s}^i} \right)\left(C_{\kappa \kappa }^{jl}(\ell _1) + \frac{\delta _{jl} \sigma _{\epsilon , j}^2}{n_\mathrm{ s}^j} \right) + \left(C_{\kappa \kappa }^{il}(\ell _1) + \frac{\delta _{il}\sigma _{\epsilon , i}^2}{n_\mathrm{ s}^i} \right)\left(C_{\kappa \kappa }^{jk}(\ell _1) + \frac{\delta _{jk} \sigma _{\epsilon , j}^2}{n_\mathrm{ s}^j} \right)\right]}{(2\ell _1 + 1) f_{\mathrm{sky}}} \end{eqnarray*}$$

(17)

$$\begin{eqnarray*} \mathrm{Cov}\left[\hat{C}_{\gamma \gamma }^{B, ij}(\ell _1), \hat{C}_{\gamma \gamma }^{B, kl}(\ell _2)\right] = \frac{\delta _{\ell _1\ell _2} \left[\frac{\delta _{ik}\sigma _{\epsilon , i}^2}{n_\mathrm{ s}^i} \frac{\delta _{jl} \sigma _{\epsilon , j}^2}{n_\mathrm{ s}^j} + \frac{\delta _{il}\sigma _{\epsilon , i}^2}{n_\mathrm{ s}^i}\frac{\delta _{jk} \sigma _{\epsilon , j}^2}{n_\mathrm{ s}^j} \right]}{(2\ell _1 + 1) f_{\mathrm{sky}}} \end{eqnarray*}$$

(18)

$$\begin{eqnarray*} \mathrm{Cov}\left[\hat{C}_{\mathrm{ g} \kappa }^{ij}(\ell _1), \hat{C}_{\mathrm{ g} \kappa }^{kl}(\ell _2)\right] = \frac{\delta _{\ell _1\ell _2} \left[\left(C_{\mathrm{ g g}}^{ik}(\ell _1) + \frac{\delta _{ik}}{n_\mathrm{ g}^i} \right)\left(C_{\kappa \kappa }^{jl}(\ell _1) + \frac{\delta _{jl} \sigma _{\epsilon , j}^2}{n_\mathrm{ s}^j} \right) + C_{\mathrm{ g} \kappa }^{il}(\ell _1) C_{\mathrm{ g} \kappa }^{kj}(\ell _1)\right]}{(2\ell _1 + 1) f_{\mathrm{sky}}} \end{eqnarray*}$$

(19)

$$\begin{eqnarray*} \mathrm{Cov}\left[\hat{C}_{\mathrm{ g g}}^{ij}(\ell _1), \hat{C}_{\gamma \gamma }^{E, kl}(\ell _2)\right] = \frac{\delta _{\ell _1\ell _2} \left[C_{\mathrm{ g} \kappa }^{ik}(\ell _1)C_{\mathrm{ g} \kappa }^{jl}(\ell _1) + C_{\mathrm{ g} \kappa }^{il}(\ell _1)C_{\mathrm{ g} \kappa }^{jk}(\ell _1) \right]}{(2\ell _1 + 1) f_{\mathrm{sky}}} \end{eqnarray*}$$

(20)

$$\begin{eqnarray*} \mathrm{Cov}\left[\hat{C}_{\mathrm{ gg}}^{ij}(\ell _1), \hat{C}_{\mathrm{ g} \kappa }^{kl}(\ell _2)\right] = \frac{\delta _{\ell _1\ell _2} \left[\left(C_{\mathrm{ gg}}^{ik}(\ell _1) + \frac{\delta _{ik}}{n_\mathrm{ g}^i} \right)C_{\mathrm{ g} \kappa }^{jl}(\ell _1) + C_{\mathrm{ g} \kappa }^{il}(\ell _1) \left(C_{\mathrm{ g g}}^{jk}(\ell _1) + \frac{\delta _{jk}}{n_\mathrm{ g}^j} \right)\right]}{(2\ell _1 + 1) f_{\mathrm{sky}}} \end{eqnarray*}$$

(21)

$$\begin{eqnarray*} \mathrm{Cov}\left[\hat{C}_{\mathrm{ g} \kappa }^{ij}(\ell _1), \hat{C}_{\gamma \gamma }^{E, kl}(\ell _2)\right] = \frac{\delta _{\ell _1\ell _2} \left[C_{\mathrm{ g} \kappa }^{ik}(\ell _1)\left(C_{\kappa \kappa }^{jl}(\ell _1) + \frac{\delta _{jl} \sigma _{\epsilon , j}^2}{n_\mathrm{ s}^j} \right) + C_{\mathrm{ g} \kappa }^{il}(\ell _1)\left(C_{\kappa \kappa }^{jk}(\ell _1) + \frac{\delta _{jk} \sigma _{\epsilon , j}^2}{n_\mathrm{ s}^j} \right)\right]}{(2\ell _1 + 1) f_{\mathrm{sky}}} \end{eqnarray*}$$

(22)

$$\begin{eqnarray*} \mathrm{Cov}\left[\hat{C}_{\mathrm{ g g}}^{ij}(\ell _1), \hat{C}_{\gamma \gamma }^{B, kl}(\ell _2)\right] = 0\ \left(\mathrm{as\ are\ all\ other\ covariances\ with\ only\ one}\ \hat{C}_{\gamma \gamma }^{B}\right)\ . \end{eqnarray*}$$

(23)

At this point, let us introduce the following nomenclature: we will denote the terms that contain two power spectra as the cosmic variance contribution to the covariance, the terms that contain no power spectrum at all as the sampling-noise contributions (or shape-noise and shot-noise contributions), and the terms that contain one power spectrum and one sampling-noise factor as the mixed terms. We test the accuracy of the f_sky approximation that results in equations (16)–(23) in Section 6.6 by comparing it to more accurate expressions.
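The nomenclature above can be made concrete for the simplest case, the autopower spectrum of one lens bin (equation 15 divided by f_sky). The sketch below expands the square into its three contributions. The function name, the toy power spectrum, and the value f_sky = 0.1 are hypothetical; the number density is that of the first lens bin in Table 2, converted from arcmin⁻² to sr⁻¹ because the shot-noise term 1/n_g must be expressed per steradian.

```python
import numpy as np

ARCMIN2_PER_SR = (180.0 * 60.0 / np.pi) ** 2  # square arcminutes in one steradian

def gaussian_cov_gg_auto(ell, cl, n_per_arcmin2, f_sky):
    """Split equation (15), divided by f_sky, into its three contributions."""
    noise = 1.0 / (n_per_arcmin2 * ARCMIN2_PER_SR)  # shot noise 1 / n_g, n_g per steradian
    prefactor = 2.0 / ((2 * ell + 1) * f_sky)
    return {
        "cosmic_variance": prefactor * cl**2,    # two power spectra
        "mixed": prefactor * 2.0 * cl * noise,   # one power spectrum, one noise factor
        "sampling_noise": prefactor * noise**2,  # no power spectrum
    }

ell = np.arange(2, 2001)
cl_toy = 1e-6 * (ell / 100.0) ** -1.0  # hypothetical power spectrum
terms = gaussian_cov_gg_auto(ell, cl_toy, n_per_arcmin2=0.0221, f_sky=0.1)
total = sum(terms.values())
```

In the f_sky approximation all three contributions scale as 1/f_sky, so halving the observed area doubles every entry.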

4.2 Analytical lognormal covariance model

To test the robustness of the CosmoLike covariance, we also employ an alternative model for the connected four-point function part of the covariance: the lognormal model. Hilbert, Hartlap & Schneider (2011) originally derived this as a model for the covariance of cosmic shear correlation functions, assuming that the lensing convergence κ can be written in terms of a Gaussian random field n as (see also Xavier et al. 2016)

$$\begin{eqnarray*} \kappa = \lambda \left(\mathrm{ e}^{n+\mu } - 1\right)\ , \end{eqnarray*}$$

(24)

where it is assumed that 〈n〉 = 0. For given values λ > 0 and μ the power spectrum of n can be chosen such as to reproduce a desired two-point correlation function ξ_κ (see Xavier et al. 2016, for caveats). Furthermore, for any given value λ > 0 one can choose μ such that 〈κ〉 = 0. This makes λ the only free parameter of the lognormal covariance model. Hilbert et al. (2011) show that this model leads to a number of correction terms to the Gaussian covariance model, and identify the most dominant of these terms to be

$$\begin{eqnarray*} C_{\mathrm{LN}}[\hat{\xi }_\kappa (\theta _1), \hat{\xi }_\kappa (\theta _2)] \approx C_{\mathrm{G}}[\hat{\xi }_\kappa (\theta _1), \hat{\xi }_\kappa (\theta _2)] + \frac{4\ \xi _\kappa (\theta _1) \xi _\kappa (\theta _2)}{A_{\mathrm{S}}\lambda ^2} \mathrm{Var}_{\mathrm{S}}(\kappa)\ . \end{eqnarray*}$$

(25)

Here, A_S is the area of the considered survey footprint and Var_S(κ) is the variance of κ when averaged over the footprint. We generalize this to the covariance of two-point correlations |$\hat{\xi }_{AB}$| and |$\hat{\xi }_{CD}$| between arbitrary scalar fields δ_A, δ_B, δ_C, and δ_D as

$$\begin{eqnarray*} C_{\mathrm{LN}}[\hat{\xi }_{AB}(\theta _1), \hat{\xi }_{CD}(\theta _2)] - C_{\mathrm{G}}[\hat{\xi }_{AB}(\theta _1), \hat{\xi }_{CD}(\theta _2)] \approx \frac{\xi _{AB}(\theta _1) \xi _{CD}(\theta _2)}{A_{\mathrm{S}}} \left\lbrace \frac{\mathrm{Cov}_{\mathrm{S}}(\delta _A, \delta _C)}{\lambda _A \lambda _C} + \frac{\mathrm{Cov}_{\mathrm{S}}(\delta _A, \delta _D)}{\lambda _A \lambda _D} + \frac{\mathrm{Cov}_{\mathrm{S}}(\delta _B, \delta _C)}{\lambda _B \lambda _C} + \frac{\mathrm{Cov}_{\mathrm{S}}(\delta _B, \delta _D)}{\lambda _B \lambda _D} \right\rbrace . \end{eqnarray*}$$

(26)

Here, Cov_S(δ_A, δ_C) is the covariance of δ_A and δ_C after the two fields have been averaged over the entire survey footprint (and likewise for the other terms appearing above). Following Hilbert et al. (2011), we use this expression even when considering non-scalar fields (i.e. the shear field) by replacing ξ_XY(θ) by the appropriate two-point functions ξ₊(θ), ξ₋(θ), γ_t(θ) [or w(θ), for the scalar galaxy density contrast].
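The shifted lognormal transformation of equation (24) is easy to illustrate at the level of a single pixel. For a zero-mean Gaussian n with variance σ_n², the choice μ = −σ_n²/2 gives 〈κ〉 = 0, and the variance of κ is then λ²(e^{σ_n²} − 1). The sketch below draws independent samples to check both statements; the function names and the target variance are made up for the example, and the shift is the value listed for the fourth source bin in Table 2.

```python
import numpy as np

def gaussian_sigma_for_variance(var_kappa, lam):
    """Standard deviation of n that yields Var(kappa) = var_kappa."""
    return np.sqrt(np.log1p(var_kappa / lam**2))

def shifted_lognormal(n, sigma_n, lam):
    """kappa = lam * (exp(n + mu) - 1) with mu = -sigma_n^2 / 2, so that <kappa> = 0."""
    mu = -0.5 * sigma_n**2
    return lam * (np.exp(n + mu) - 1.0)

rng = np.random.default_rng(1)
lam, var_target = 0.03287, 1e-4  # shift from Table 2; variance is a made-up value
sigma_n = gaussian_sigma_for_variance(var_target, lam)
kappa = shifted_lognormal(rng.normal(0.0, sigma_n, size=1_000_000), sigma_n, lam)
```

By construction the samples are bounded from below by −λ, which is the property that makes λ act as a shift parameter.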

To choose the parameters λ_X (also called the lognormal shift parameters; cf. Xavier et al. 2016), we follow a procedure similar to the one outlined in Friedrich et al. (2018). There, it is shown how the value of λ_X can be adjusted in order to match the re-scaled cumulant

$$\begin{eqnarray*} S_3(\vartheta) \equiv \frac{\langle \delta _X(\vartheta)^3 \rangle }{\langle \delta _X(\vartheta)^2 \rangle ^2} \end{eqnarray*}$$

(27)

of the random field δ_X smoothed with a top-hat filter of angular radius ϑ to the value of S₃ predicted by leading-order perturbation theory for that same smoothing scale. Since the focus in our paper is the covariance matrix of two-point statistics (hence a four-point function), we modify their method to match instead the value of the reduced fourth-order cumulant

$$\begin{eqnarray*} S_4(\vartheta) \equiv \frac{\left\langle \delta _X(\vartheta)^4 \right\rangle - 3\left\langle \delta _X(\vartheta)^2 \right\rangle ^2}{\left\langle \delta _X(\vartheta)^2 \right\rangle ^3}\ . \end{eqnarray*}$$

(28)

The field δ_X here will be either projections of the 3D matter density contrast along the line-of-sight distribution of our lens galaxies or the lensing convergence fields corresponding to our four source redshift bins. The smoothing scale ϑ at which we use the λ_X to match S₄ to its perturbation theory value is chosen such that it corresponds to about 10 Mpc h⁻¹ at the mean redshift of the line-of-sight projection kernels corresponding to the different δ_X. This is approximately the scale at which Friedrich et al. (2018) found the shifted lognormal model to be a good approximation of the overall PDF of density fluctuations in N-body simulations (cf. their fig. 5).
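For a shifted lognormal field the reduced cumulant of equation (28) has a closed form, which follows from the standard moments of a lognormal variable (this expression is derived here for illustration and is not quoted from the paper): with w = 1 + 〈δ_X²〉/λ², one finds S₄ = (w³ + 3w² + 6w + 6)/λ². Given a target S₄ and a variance at the smoothing scale, the shift can then be found by a one-dimensional root search. The sketch below takes both inputs as given and does not compute the perturbation theory prediction; function names are hypothetical.

```python
import math

def s4_shifted_lognormal(lam, variance):
    """S_4 = kappa_4 / kappa_2^3 for delta = lam * (exp(n - sigma^2 / 2) - 1)."""
    w = 1.0 + variance / lam**2  # w = exp(sigma_n^2)
    return (w**3 + 3.0 * w**2 + 6.0 * w + 6.0) / lam**2

def shift_from_s4(s4_target, variance, lo=1e-4, hi=1e2, tol=1e-12):
    """Bisection in log(lam); S_4 decreases monotonically with lam at fixed variance."""
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if s4_shifted_lognormal(mid, variance) > s4_target:
            lo = mid  # S_4 too large, so the shift must be larger
        else:
            hi = mid
        if hi / lo - 1.0 < tol:
            break
    return math.sqrt(lo * hi)

# Round trip with made-up inputs: recover the shift that generated a given S_4.
variance = 0.05
s4 = s4_shifted_lognormal(1.252, variance)
lam_recovered = shift_from_s4(s4, variance)
```

In the limit of vanishing variance the expression reduces to S₄ = 16/λ², the familiar lognormal value for λ = 1.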

Our results are shown in Table 2, where we present the number density, galaxy bias (relevant for lenses only), shape-noise dispersion (per shear component; relevant for sources only), and the lognormal shift parameters obtained from the procedure described above. Note that for the source galaxy samples, the relevant line-of-sight projection kernel used to derive the shift parameter is the lensing kernel (and not the redshift distribution of the source galaxies). For the lens galaxies, all shift parameters come out to be >1. Since a shifted lognormal field is bounded from below by −λ rather than by −1, there will be pixels with negative density in our lognormal simulations. However, the fraction of such pixels is <0.01 for all runs and all bins, and setting δ_g = −1 in these pixels has an unnoticeable effect on the statistics measured in these maps (e.g. for bin 4, which is affected most, the standard deviation of δ_g changes by |$0.053{{\ \rm per\ cent}}$|). Note further that at the time of completing the simulation runs presented in Section 4.3, the DES-Y3 shear catalogue and redshift distribution were not finalized. As a consequence, the shape-noise dispersion values used for simulations differ from the values in this table. We display the projection kernels assumed for our analysis in Fig. 3.

Table 2.

Number density, galaxy bias (relevant for lenses only), shape-noise dispersion (per shear component; relevant for sources only), and the lognormal shift parameters obtained from the procedure described in Section 4.2.

z-bin	n_g (arcmin⁻²)	Bias	σ_ϵ	Lognormal shift
Lenses 1	0.0221	1.7	−	1.089
Lenses 2	0.0381	1.7	−	1.106
Lenses 3	0.0583	1.7	−	1.047
Lenses 4	0.0295	2.0	−	1.252
Lenses 5	0.0251	2.0	−	1.177
Sources 1	1.7971	−	0.2724	0.004 53
Sources 2	1.5521	−	0.2724	0.008 85
Sources 3	1.5967	−	0.2724	0.019 18
Sources 4	1.0979	−	0.2724	0.032 87

4.3 Lognormal covariance from simulations

We also produce a test DES-Y3 covariance matrix from a set of simulations. We use the publicly available code flask (Full sky Lognormal Astro fields Simulation Kit; Xavier et al. 2016¹⁴) to generate 800 DES-Y3 footprint healpix maps (Górski et al. 2005) of density, convergence, and shear with NSIDE = 8192, as well as galaxy position catalogues that reproduce the DES-Y3 properties. flask is able to quickly produce correlated tomographic simulations of lognormal clustering and weak lensing fields based on the DES-Y3 lens and source samples. Lognormal fields have been shown to be a good approximation to cosmological fields (Coles & Jones 1991; Wild et al. 2005; Clerkin et al. 2017), while being much less computationally expensive to generate than full N-body simulations.

As input for the simulations, we used a set of auto- and cross-power spectra and the lognormal field shift parameters. The theoretical input power spectra were generated using CosmoLike, and the lognormal shifts are the ones listed in Table 2. In order to reproduce the properties of shear fields, we added the shape-noise term by sampling each pixel of the simulated maps to match the corresponding shape-noise dispersion σ_ϵ and number density n_g of the tomographic bin. At the time of completing the simulation runs, the DES-Y3 shear catalogue and redshift distribution were not finalized. For this reason, the values used in the simulations are slightly different from the values in Table 2. For the simulations, we set the number density for the five tomographic lens bins to 0.0227, 0.0392, 0.0583, 0.0451, and 0.0278 (arcmin⁻²). The shape-noise dispersion values for the four tomographic bins of sources were set to 0.270 49, 0.332 12, 0.325 37, and 0.350 37. The cosmology adopted for the theoretical power spectra is set as Ω_m = 0.3, σ₈ = 0.823 55, n_s = 0.97, Ω_b = 0.048, h₀ = 0.69, and |$\Omega _\nu h_0^2 = 0.000\,83$|.
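To give a sense of the scales involved in adding shape noise per pixel, the sketch below computes the area of a healpix pixel (the sphere is divided into 12 × NSIDE² equal-area pixels) and the resulting noise level of the mean shear in a pixel. Treating every pixel as if it contained exactly the mean number of galaxies is a simplifying assumption of this example and not necessarily what flask does; the function names are hypothetical and the example values are those of the first source bin in Table 2.

```python
import math

def healpix_pixel_area_arcmin2(nside):
    """Area of one healpix pixel: the full sky divided by 12 * nside^2 equal-area pixels."""
    full_sky_arcmin2 = 4.0 * math.pi * (180.0 * 60.0 / math.pi) ** 2
    return full_sky_arcmin2 / (12 * nside**2)

def pixel_shape_noise(sigma_eps, n_per_arcmin2, nside):
    """Per-component noise standard deviation of the mean shear in one pixel."""
    mean_count = n_per_arcmin2 * healpix_pixel_area_arcmin2(nside)
    return sigma_eps / math.sqrt(mean_count)

area = healpix_pixel_area_arcmin2(8192)  # roughly 0.18 arcmin^2
noise = pixel_shape_noise(sigma_eps=0.2724, n_per_arcmin2=1.7971, nside=8192)
```

At this resolution the mean number of source galaxies per pixel is below one, so the per-pixel noise exceeds the single-galaxy dispersion σ_ϵ.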

We use the publicly available code treecorr¹⁵ (Jarvis, Bernstein & Jain 2004) to measure the 3×2pt correlation functions for 200 DES-Y3 realizations. For all measurements, we used 20 log-spaced angular separation bins on scales between 2.5 and 250 arcmin. We set the treecorr parameter bin_slop to zero, essentially setting all estimators to brute-force computation. In Fig. 4, we validate the measurements by comparing them with the theoretical input.

We will use the flask covariance mainly to estimate the impact of the survey geometry.
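Estimating a covariance and the associated correlation matrix from a set of simulated data vectors takes only a few lines. The following is a minimal sketch with mock Gaussian data standing in for the measured two-point functions; the array shapes and function names are illustrative and do not reflect the DES-Y3 pipeline.

```python
import numpy as np

def sample_covariance(data_vectors):
    """Unbiased covariance estimate from an array of shape (n_realizations, n_data)."""
    return np.cov(data_vectors, rowvar=False)

def correlation_matrix(cov):
    """Normalize a covariance matrix so that its diagonal is unity."""
    sigma = np.sqrt(np.diag(cov))
    return cov / np.outer(sigma, sigma)

# Mock data: 500 realizations of a 4-point data vector with a known covariance.
rng = np.random.default_rng(0)
true_cov = np.array([[1.0, 0.5, 0.0, 0.0],
                     [0.5, 2.0, 0.3, 0.0],
                     [0.0, 0.3, 1.5, 0.2],
                     [0.0, 0.0, 0.2, 0.5]])
mocks = rng.multivariate_normal(np.zeros(4), true_cov, size=500)
cov_hat = sample_covariance(mocks)
corr_hat = correlation_matrix(cov_hat)
```

Such an estimate has rank at most N − 1 for N realizations, so it can only be inverted when the number of realizations exceeds the length of the data vector.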

4.4 Comparisons among covariances

Here, we present some comparisons between the different covariance matrices. In Fig. 5, we show the ratio of the diagonal elements of the different covariance matrices introduced in this section, displaying the variances of the measurements of both ξ₊(θ) and w(θ).

Figure 3.

Redshift distributions of lens galaxies (shaded regions) and source galaxies (solid lines) in our fiducial test configuration.

Figure 4.

Validation of flask simulations. Each panel shows the absolute difference of three two-point correlations measured on flask realizations and the predicted correlation functions from input C(ℓ)s normalized to the statistical error given by the standard deviation along flask realizations (ΔX/σ_X, where X = w, γ_t, ξ₊, ξ₋). Grey dots are single realizations and blue dots are their mean.

Figure 5.

Ratio of the diagonal elements of the different covariance matrices introduced in this section with respect to each other. The left-hand panel compares the variances of measurements of ξ₊(θ) while the right-hand panel compares the variances of measurements of w(θ). To give a sense of the goodness of fit between the covariance estimated from flask and our fiducial analytic matrix, we treat the diagonal elements of the flask covariance as a multivariate Gaussian whose covariance can be inferred from the properties of the Wishart distribution (Taylor et al. 2013). The low p-value for the highest redshift bin of w(θ) most likely results from our incomplete treatment of the survey mask (cf. discussion in Section 6 and Appendix C).

In Fig. 6, we compare the covariance matrices obtained from the flask simulations and the analytical halo model covariance.

Figure 6.

flask (lower diagonal) versus CosmoLike halo model (upper diagonal) correlation matrix.

5 IMPACT OF COVARIANCE ERRORS ON A LINEARIZED GAUSSIAN LIKELIHOOD

As discussed earlier, a full assessment of the impact of using different covariance matrices on parameter estimation is unfeasible owing to the computational demand of running a large number of MCMC chains. Since the covariance matrices studied in this work differ by subdominant effects, we do not expect large modifications in the results of the estimation of the parameters. Therefore, we will bypass this difficulty by using a linearized approximation of the model data vector as a function of the parameters. The measured data are assumed to be a multivariate Gaussian variable characterized by a covariance matrix and a given prior matrix. This approach is called the Gaussian linear model (Seehars et al. 2014, 2016; Raveri & Hu 2019).

Within this approach, we study the following impacts of different covariances:

error in the parameter estimation, characterized by the width of the contours;
the scatter of the best-fitting (maximum posteriors) parameters;
change in the maximum posterior χ² value;
error in the maximum posterior χ² value.

In the remainder of this section, we detail this method.

5.1 Linearized likelihoods

To speed up our simulated likelihood analyses, we employ a linearized model of the data vector |$\boldsymbol{\xi }$| (e.g. the DES-Y3 3×2-point function data vector). This can be considered a linear Taylor expansion of our full model around a fiducial set of parameters |$\boldsymbol{\pi }^0$| that is summarized in Table 3. In this approximation, our model data vector becomes

$$\begin{eqnarray*} \boldsymbol{\xi }(\boldsymbol{\pi }) = \boldsymbol{\xi }(\boldsymbol{\pi }^0) + \sum _\alpha \left(\pi _\alpha - \pi _\alpha ^0\right)\left.\frac{\partial \boldsymbol{\xi }(\boldsymbol{\pi })}{\partial \pi _\alpha }\right|_{\boldsymbol{\pi } = \boldsymbol{\pi }^0} , \end{eqnarray*}$$

(29)

where the sum is over all components π_α of the parameter vector |$\boldsymbol{\pi }$| (we will use Latin indices for the components of the data vector and Greek indices for the components of the parameter vector). Given a two-point function measurement |$\boldsymbol{\hat{\xi }}$| and abbreviating

$$\begin{eqnarray*} \boldsymbol{\xi }^0 &=& \boldsymbol{\xi }\left(\boldsymbol{\pi }^0 \right) \nonumber \\ \delta \boldsymbol{\xi } &=& \boldsymbol{\hat{\xi }} - \boldsymbol{\xi }^0 \nonumber \\ \delta \boldsymbol{\pi } &=& \boldsymbol{\pi } - \boldsymbol{\pi }^0 \nonumber \\ \partial _\alpha \boldsymbol{\xi } &=& \left.\frac{\partial \boldsymbol{\xi }(\boldsymbol{\pi })}{\partial \pi _\alpha }\right|_{\boldsymbol{\pi } = \boldsymbol{\pi }^0} \end{eqnarray*}$$

our figure of merit χ² as a function of the parameters becomes in the linearized approximation

$$\begin{eqnarray*} \chi ^2[\delta \boldsymbol{\pi }] &=&\ \left(\delta \boldsymbol{\xi } - \sum _\alpha \delta \pi _\alpha \partial _\alpha \boldsymbol{\xi } \right)^T \mathbf {C}^{-1} \left(\delta \boldsymbol{\xi } - \sum _\alpha \delta \pi _\alpha \partial _\alpha \boldsymbol{\xi } \right)\nonumber \\ && + \left(\boldsymbol{\pi } - \boldsymbol{\pi }^{\mathrm{prior}}\right)^T\ \mathbf {P}\ \left(\boldsymbol{\pi } - \boldsymbol{\pi }^{\mathrm{prior}}\right)\ . \end{eqnarray*}$$

(30)

Here, we have allowed for a Gaussian prior with covariance matrix |$\mathbf {P}^{-1}$| and central value |$\boldsymbol{\pi }^{\mathrm{prior}}$|⁠. To find the deviation |$\delta \boldsymbol{\pi }^{\mathrm{MP}} = \boldsymbol{\pi }^{\mathrm{MP}} - \boldsymbol{\pi }^0$| from our fiducial parameters that minimizes this function (the maximum posterior value of the parameters is denoted by |$\boldsymbol{\pi }^{\mathrm{MP}}$|⁠), we have to solve

$$\begin{eqnarray*} \left. \frac{\partial \chi ^2}{\partial (\delta \pi _\beta)} \right|_{\delta \boldsymbol{\pi }=\delta \boldsymbol{\pi }^{\mathrm{MP}}}\ = 0 \ . \end{eqnarray*}$$

(31)

Defining a vector |$\mathbf {x}$| such that |$x_\beta = \delta \boldsymbol{\xi }^T \mathbf {C}^{-1} \partial _\beta \boldsymbol{\xi }$| as well as the Fisher matrix |$F_{\alpha \beta } = \partial _\beta \boldsymbol{\xi }^T \mathbf {C}^{-1} \partial _\alpha \boldsymbol{\xi }$|⁠, this becomes

$$\begin{eqnarray*} (\mathbf {F}+\mathbf {P})\ \delta \boldsymbol{\pi }^{\mathrm{MP}} &=& \mathbf {x} + \mathbf {P}\ (\boldsymbol{\pi }^{\mathrm{prior}} - \boldsymbol{\pi }^{0})\nonumber\\ &\Rightarrow& \boldsymbol{\pi }^{\mathrm{MP}} = \boldsymbol{\pi }^0 + (\mathbf {F}+\mathbf {P})^{-1} \mathbf {x} + (\mathbf {F}+\mathbf {P})^{-1}\mathbf {P}\ (\boldsymbol{\pi }^{\mathrm{prior}} - \boldsymbol{\pi }^{0})\ . \end{eqnarray*}$$

(32)
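Equation (32) amounts to a single linear solve. The sketch below is one possible implementation under stated assumptions: the derivatives ∂_α ξ are supplied as the columns of a matrix `dxi` (for example from finite differences of the full model), the prior enters through its precision matrix P, and parameters with flat priors simply have zero rows and columns in P. The function name and the toy inputs are hypothetical.

```python
import numpy as np

def max_posterior(xi_hat, xi0, dxi, cov, pi0, prior_precision=None, pi_prior=None):
    """Solve (F + P) dpi = x + P (pi_prior - pi0), i.e. equation (32).

    dxi has shape (n_data, n_par); column alpha holds d xi / d pi_alpha at pi0.
    """
    n_par = dxi.shape[1]
    P = np.zeros((n_par, n_par)) if prior_precision is None else prior_precision
    pi_prior = pi0 if pi_prior is None else pi_prior
    cinv_dxi = np.linalg.solve(cov, dxi)  # C^{-1} d_alpha xi for every alpha
    F = dxi.T @ cinv_dxi                  # Fisher matrix F_{alpha beta}
    x = cinv_dxi.T @ (xi_hat - xi0)       # x_beta = dxi^T C^{-1} d_beta xi
    return pi0 + np.linalg.solve(F + P, x + P @ (pi_prior - pi0))

# Toy check: noise-free data generated by the linear model are recovered exactly.
rng = np.random.default_rng(0)
dxi = rng.normal(size=(6, 2))
cov = np.diag(rng.uniform(0.5, 1.5, size=6))
pi0 = np.array([0.3, 0.8])
pi_true = np.array([0.32, 0.79])
xi0 = rng.normal(size=6)
xi_hat = xi0 + dxi @ (pi_true - pi0)
pi_mp = max_posterior(xi_hat, xi0, dxi, cov, pi0)
```

Because the solve is so cheap, the maximum posterior parameters can be recomputed for many noisy realizations of the data vector and for different covariance matrices, which is what makes the linearized approach attractive for the tests in this section.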

Table 3.

Fiducial cosmology and standard deviation of Gaussian parameter priors used in our mock likelihood analyses. A_{IA, i} is the intrinsic alignment amplitude in the i-th source redshift bin, m_i is the multiplicative shear bias, and Δz_{s, i} parametrizes systematic shifts in the photometric redshift distribution of that bin. Δz_{l, i} parametrizes systematic shifts in the photometric redshift distribution of the i-th lens redshift bin. The Gaussian priors we choose for the parameters follow the analysis choices of Abbott et al. (2018) and we assume infinite flat priors for all other parameters.

Parameter	Fiducial value	σ_prior
Cosmology
Ω_m	0.3	–
σ₈	0.823 55	–
h₁₀₀	0.69	–
n_s	0.97	–
w₀	−1	–
Ω_b	0.048	–
Ω_ν	0.001 743	–
Ω_Λ	1 − Ω_m − Ω_ν	–
b₁	1.7	–
b₂	1.7	–
b₃	1.7	–
b₄	2.0	–
b₅	2.0	–
Δz_{l, 1}	0.0	0.04
Δz_{l, 2}	0.0	0.04
Δz_{l, 3}	0.0	0.04
Δz_{l, 4}	0.0	0.04
Δz_{l, 5}	0.0	0.04
Δz_{s, 1}	0.0	0.08
Δz_{s, 2}	0.0	0.08
Δz_{s, 3}	0.0	0.08
Δz_{s, 4}	0.0	0.08
A_{IA, 1}	0.0	–
A_{IA, 2}	0.0	–
A_{IA, 3}	0.0	–
A_{IA, 4}	0.0	–
m₁	0.0	0.03
m₂	0.0	0.03
m₃	0.0	0.03
m₄	0.0	0.03

We now want to consider the situation when a model covariance matrix |$\mathbf {C}_{\mathrm{mod}}$| is used to calculate the likelihood in equation (30) that is different from the true covariance matrix |$\mathbf {C}_{\mathrm{true}}$| of the statistical uncertainties in the data vector |$\boldsymbol{\hat{\xi }}$|⁠. In that case, our linearized likelihood will be a Gaussian centred around |$\boldsymbol{\pi }^{\mathrm{MP}}$| and with parameter covariance matrix

$$\begin{eqnarray*} \mathbf {C}_{\boldsymbol{\pi }, \mathrm{like}} = (\mathbf {F}_{\mathrm{mod}}+\mathbf {P})^{-1}\ , \end{eqnarray*}$$

(33)

where |$F_{\mathrm{mod},\alpha \beta } = \partial _\beta \boldsymbol{\xi }^T \mathbf {C}_{\mathrm{mod}}^{-1} \partial _\alpha \boldsymbol{\xi }$| is the Fisher matrix calculated from the model covariance.

The actual covariance matrix of |$\boldsymbol{\pi }^{\mathrm{MP}}$| includes two sources of noise. First, statistical uncertainties in the measurement |$\boldsymbol{\hat{\xi }}$| that are described by the covariance matrix |$\mathbf {C}_{\mathrm{true}}$| and are represented by the first term in equation (32) that is proportional to |$\mathbf {x}$|⁠, and secondly, statistical uncertainties in our choice of the prior centre that are described by the prior covariance matrix |$\mathbf {P}^{-1}$| and are represented by the second term in equation (32) that is proportional to |$\boldsymbol{\pi }^{\mathrm{prior}}$|⁠. The latter term has the covariance matrix |$(\mathbf {F}_{\mathrm{mod}}+\mathbf {P})^{-1}\mathbf {P}(\mathbf {F}_{\mathrm{mod}}+\mathbf {P})^{-1}$| (because the covariance matrix of |$\boldsymbol{\pi }^{\mathrm{prior}}$| is |$\mathbf {P}^{-1}$|⁠). Hence, the total covariance matrix of |$\boldsymbol{\pi }^{\mathrm{MP}}$| can be written as

$$\begin{eqnarray*} \left(\mathbf {C}_{\boldsymbol{\pi }, \mathrm{MP}}\right)_{\alpha \beta }\ &\equiv & \mathrm{Cov}\left[\pi _{\alpha }^{\mathrm{MP}}, \pi _{\beta }^{\mathrm{MP}}\right] \nonumber \\ &=&\ \left[(\mathbf {F}_{\mathrm{mod}}+\mathbf {P})^{-1}\mathbf {P}(\mathbf {F}_{\mathrm{mod}}+\mathbf {P})^{-1}\right]_{\alpha \beta } \nonumber \\ && +\ \sum _{\kappa , \lambda } (\mathbf {F}_{\mathrm{mod}} + \mathbf {P})_{\alpha \kappa }^{-1}\ (\mathbf {F}_{\mathrm{mod}} + \mathbf {P})_{\lambda \beta }^{-1} \sum _{i,k} \partial _\kappa \xi _i\ \left(\mathbf {C}_{\mathrm{mod}}^{-1}\mathbf {C}_{\mathrm{true}}\mathbf {C}_{\mathrm{mod}}^{-1}\right)_{ik}\ \partial _\lambda \xi _k\ . \end{eqnarray*}$$

(34)

For |$\mathbf {C}_{\mathrm{mod}} = \mathbf {C}_{\mathrm{true}}$|⁠, it is easy to see that this parameter covariance |$\mathbf {C}_{\boldsymbol{\pi }, \mathrm{MP}}$| equals the covariance |$\mathbf {C}_{\boldsymbol{\pi }, \mathrm{like}}$| that describes the shape of our likelihood (as it should).
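The step can be spelled out explicitly. For |$\mathbf {C}_{\mathrm{mod}} = \mathbf {C}_{\mathrm{true}} \equiv \mathbf {C}$| the inner sum in equation (34) reduces to the Fisher matrix, so that (in matrix notation, with |$\mathbf {F}$| denoting the Fisher matrix computed from |$\mathbf {C}$|)

$$\begin{eqnarray*} \sum _{i,k} \partial _\kappa \xi _i\ \left(\mathbf {C}^{-1}\mathbf {C}\,\mathbf {C}^{-1}\right)_{ik}\ \partial _\lambda \xi _k &=& F_{\kappa \lambda } \nonumber \\ \Rightarrow \ \mathbf {C}_{\boldsymbol{\pi }, \mathrm{MP}} &=& (\mathbf {F}+\mathbf {P})^{-1}\,\mathbf {P}\,(\mathbf {F}+\mathbf {P})^{-1} + (\mathbf {F}+\mathbf {P})^{-1}\,\mathbf {F}\,(\mathbf {F}+\mathbf {P})^{-1} \nonumber \\ &=& (\mathbf {F}+\mathbf {P})^{-1}\,(\mathbf {F}+\mathbf {P})\,(\mathbf {F}+\mathbf {P})^{-1} = (\mathbf {F}+\mathbf {P})^{-1} = \mathbf {C}_{\boldsymbol{\pi }, \mathrm{like}}\ . \end{eqnarray*}$$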

5.2 Impact on the width of the likelihood and scatter of best-fitting parameters

We can use the above findings to study the impact of different effects in covariance modelling on parameter constraints. If a covariance matrix |$\mathbf {C}_1$| contains a noise contribution that is missing in another covariance matrix |$\mathbf {C}_2$|⁠, then we quantify the difference between these matrices by considering the following two effects:

Width of likelihood contours: Denoting the Fisher matrices obtained from |$\mathbf {C}_1$| or |$\mathbf {C}_2$| as |$\mathbf {F}_1$| and |$\mathbf {F}_2$|⁠, respectively, the widths of likelihood contours drawn from the different covariances are given by
$$\begin{eqnarray*} \mathbf {C}_{\boldsymbol{\pi }, \mathrm{like},\ 1} &=& (\mathbf {F}_1+\mathbf {P})^{-1}\nonumber \\ \mathbf {C}_{\boldsymbol{\pi }, \mathrm{like},\ 2} &=& (\mathbf {F}_2+\mathbf {P})^{-1}\ . \end{eqnarray*}$$
(35)
Hence, if the difference |$\mathbf {C}_1-\mathbf {C}_2 = \mathbf {E}$| represents noise contributions missing from (or misestimated in) |$\mathbf {C}_2$|, then a comparison of |$\mathbf {C}_{\boldsymbol{\pi }, \mathrm{like},\ 1}$| and |$\mathbf {C}_{\boldsymbol{\pi }, \mathrm{like},\ 2}$| quantifies the impact of this on the width of parameter contours.
Scatter in the centre of likelihood contours: If the data vector |$\boldsymbol{\hat{\xi }}$| had |$\mathbf {C}_1$| as its true covariance matrix but |$\mathbf {C}_2$| were used to derive the maximum posterior parameters |$\boldsymbol{\pi }^{\mathrm{MP}}$| from it, then the maximum posterior parameter covariance would be given by
$$\begin{eqnarray*} \left(\mathbf {C}_{\boldsymbol{\pi }, \mathrm{MP},\ 2}\right)_{\alpha \beta } &=&\ \left[(\mathbf {F}_{2}+\mathbf {P})^{-1}\ \mathbf {P}\ (\mathbf {F}_{2}+\mathbf {P})^{-1}\right]_{\alpha \beta } \nonumber \\ && +\ \sum _{\kappa , \lambda } (\mathbf {F}_{2} + \mathbf {P})_{\alpha \kappa }^{-1}\ (\mathbf {F}_{2} + \mathbf {P})_{\lambda \beta }^{-1} \sum _{i,k} \partial _\kappa \xi _i\ \left(\mathbf {C}_{2}^{-1}\ \mathbf {C}_{1}\ \mathbf {C}_{2}^{-1}\right)_{ik}\ \partial _\lambda \xi _k\ . \end{eqnarray*}$$
(36)
If the difference |$\mathbf {C}_1-\mathbf {C}_2 = \mathbf {E}$| represents noise contributions missing from (or misestimated in) |$\mathbf {C}_2$|, then a comparison of |$\mathbf {C}_{\boldsymbol{\pi }, \mathrm{MP},\ 2}$| and |$\mathbf {C}_{\boldsymbol{\pi }, \mathrm{MP},\ 1} \equiv \mathbf {C}_{\boldsymbol{\pi }, \mathrm{like},\ 1}$| quantifies the impact of this on the scatter in the location of parameter contours.

An inaccurate covariance model will in general have a different impact on the width and the location of parameter contours. Hence, in order to quantify the importance of different effects in covariance modelling for parameter estimation, we compare both the pair |$\mathbf {C}_{\boldsymbol{\pi }, \mathrm{like},\ 1} / \mathbf {C}_{\boldsymbol{\pi }, \mathrm{like},\ 2}$| and the pair |$\mathbf {C}_{\boldsymbol{\pi }, \mathrm{MP},\ 1} / \mathbf {C}_{\boldsymbol{\pi }, \mathrm{MP},\ 2}$|⁠.
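The two comparisons of this section can be sketched numerically as follows. This is a toy example under stated assumptions, not the DES-Y3 computation: the derivative matrix `jac`, the covariances `c1` and `c2`, the missing noise term `e`, and the prior precision `prior_prec` are random or hand-picked placeholders.

```python
import numpy as np

def fisher(jac, cov):
    """Fisher matrix F = J^T C^{-1} J for the derivative matrix J."""
    return jac.T @ np.linalg.solve(cov, jac)

def cov_like(jac, cov_mod, prior_prec):
    """Eq. (35): parameter covariance describing the width of the likelihood."""
    return np.linalg.inv(fisher(jac, cov_mod) + prior_prec)

def cov_map(jac, cov_mod, cov_true, prior_prec):
    """Eq. (36): covariance of the maximum posterior parameters when cov_mod is
    used in the fit while the data scatter according to cov_true."""
    inv_fp = cov_like(jac, cov_mod, prior_prec)
    cinv_jac = np.linalg.solve(cov_mod, jac)          # C_mod^{-1} J
    data_term = cinv_jac.T @ cov_true @ cinv_jac      # J^T C_mod^{-1} C_true C_mod^{-1} J
    return inv_fp @ (prior_prec + data_term) @ inv_fp

# Toy set-up; all numbers are illustrative assumptions.
rng = np.random.default_rng(3)
n_data, n_param = 40, 3
jac = rng.normal(size=(n_data, n_param))
a = rng.normal(size=(n_data, n_data))
c2 = a @ a.T / n_data + np.eye(n_data)    # covariance that misses a noise term
e = rng.normal(size=(n_data, 5))
c1 = c2 + 0.5 * e @ e.T                   # C_1 = C_2 + E, with E positive semi-definite
prior_prec = np.diag([0.0, 1.0, 4.0])

width_1 = np.sqrt(np.diag(cov_like(jac, c1, prior_prec)))
width_2 = np.sqrt(np.diag(cov_like(jac, c2, prior_prec)))
scatter_2 = np.sqrt(np.diag(cov_map(jac, c2, c1, prior_prec)))
print("width ratio  ", width_2 / width_1)
print("scatter ratio", scatter_2 / width_1)
```

In such a set-up, where |$\mathbf {E}$| is positive semi-definite, the contours drawn with |$\mathbf {C}_2$| cannot be wider than those drawn with |$\mathbf {C}_1$|, while the scatter of the maximum posterior parameters cannot be smaller than in the optimally weighted case.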

5.3 Distribution of χ² when fitting for parameters

Within the linearized likelihood model developed in the previous section, we now investigate how errors in the covariance model impact the distribution of |$\chi _{\mathrm{MP}}^2$| between measured data vector |$\boldsymbol{\hat{\xi }}$| and a maximum posterior model |$\boldsymbol{\xi }_{\mathrm{MP}} = \boldsymbol{\xi }(\boldsymbol{\pi }^{\mathrm{MP}})$|⁠,

$$\begin{eqnarray*} \hat{\chi }_{\mathrm{MP}}^2 = (\boldsymbol{\hat{\xi }} - \boldsymbol{\xi }_{\mathrm{MP}})^T \mathbf {C}^{-1} (\boldsymbol{\hat{\xi }} - \boldsymbol{\xi }_{\mathrm{MP}})\ . \end{eqnarray*}$$

(37)

We start with the case that

the true covariance |$\mathbf {C}$| of |$\boldsymbol{\hat{\xi }}$| is known;
no parameter priors are used when determining the best-fitting model |$\boldsymbol{\xi }_{\mathrm{MP}}$|⁠;
the true expectation value |$\boldsymbol{\bar{\xi }} \equiv \langle \boldsymbol{\hat{\xi }}\rangle$| lies within our parameter space; i.e. there are parameters |$\boldsymbol{\pi }^{\mathrm{true}}$| such that |$\boldsymbol{\xi }(\boldsymbol{\pi }^{\mathrm{true}}) = \boldsymbol{\bar{\xi }}$|⁠.

We will show that, as expected, in this case |$\hat{\chi }_{\mathrm{MP}}^2$| should follow a χ² distribution with N_data − N_param degrees of freedom.

Using equations (29) and (32) (and setting again |$\delta \boldsymbol{\xi } \equiv \boldsymbol{\hat{\xi }} - \boldsymbol{\xi }^0$|⁠), one can see that the maximum posterior data vector is given by

$$\begin{eqnarray*} \boldsymbol{\xi }_{\mathrm{MP}} &=& \boldsymbol{\xi }^0 + \sum _{\alpha \beta } \partial _\alpha \boldsymbol{\xi }\ \left(\mathbf {F}^{-1}\right)_{\alpha \beta }\ \left(\delta \boldsymbol{\xi }^T \mathbf {C}^{-1} \partial _\beta \boldsymbol{\xi }\right)\nonumber \\ &=&\ \boldsymbol{\bar{\xi }} + \sum _{\alpha \beta } \partial _\alpha \boldsymbol{\xi }\ \left(\mathbf {F}^{-1}\right)_{\alpha \beta }\ \left((\boldsymbol{\hat{\xi }}-\boldsymbol{\bar{\xi }})^T \mathbf {C}^{-1} \partial _\beta \boldsymbol{\xi }\right)\nonumber \\ &=&\ \boldsymbol{\bar{\xi }} + \sum _{\alpha \beta } \sum _{kl} \partial _\alpha \boldsymbol{\xi }\ \left(\mathbf {F}^{-1}\right)_{\alpha \beta }\ (\hat{\xi }_k-\bar{\xi }_k) \left(\mathbf {C}^{-1}\right)_{kl} \partial _\beta \xi _l\nonumber \\ & \equiv &\ \boldsymbol{\bar{\xi }} + \boldsymbol{\mathcal {P}} \cdot \left(\boldsymbol{\hat{\xi }} - \boldsymbol{\bar{\xi }}\right) \ . \end{eqnarray*}$$

(38)

Here, the second line follows from the fact that |$\boldsymbol{\bar{\xi }} =\langle \boldsymbol{\hat{\xi }}\rangle =\langle \boldsymbol{\xi }_{\mathrm{MP}}\rangle$| and we have defined the matrix

$$\begin{eqnarray*} \mathcal {P}_{ij} = \sum _{\alpha \beta } \partial _\alpha \xi _i \sum _{l} \left(\mathbf {F}^{-1}\right)_{\alpha \beta }\ \left(\mathbf {C}^{-1}\right)_{lj} \partial _\beta \xi _l\ . \end{eqnarray*}$$

(39)

It can be shown that |$\boldsymbol{\mathcal {P}}$| is an idempotent matrix (⁠|$\boldsymbol{\mathcal {P}}^2=\boldsymbol{\mathcal {P}}$|⁠) and furthermore that

$$\begin{eqnarray*} \mathrm{Trace}\left(\boldsymbol{\mathcal {P}}\right) &=& N_{\mathrm{param}} \nonumber \\ \boldsymbol{C}^{-1} \boldsymbol{\mathcal {P}} \boldsymbol{C} &=& \boldsymbol{\mathcal {P}}^T \ . \end{eqnarray*}$$

(40)

The residual between the measurement |$\boldsymbol{\hat{\xi }}$| and the best-fitting model |$\boldsymbol{\xi }_{\mathrm{MP}}$| can be written in terms of |$\boldsymbol{\mathcal {P}}$| as

$$\begin{eqnarray*} \boldsymbol{\hat{\xi }} - \boldsymbol{\xi }_{\mathrm{MP}} &=& \left(\boldsymbol{\hat{\xi }} - \boldsymbol{\bar{\xi }}\right) - \left(\boldsymbol{\xi }_{\mathrm{MP}} - \boldsymbol{\bar{\xi }}\right) \nonumber \\ &=& \left(\mathbb {1} - \boldsymbol{\mathcal {P}}\right)\cdot \left(\boldsymbol{\hat{\xi }} - \boldsymbol{\bar{\xi }}\right)\ . \end{eqnarray*}$$

(41)

Hence, the covariance matrix of |$\boldsymbol{\hat{\xi }} - \boldsymbol{\xi }_{\mathrm{MP}}$| is given by

$$\begin{eqnarray*} \mathbf {C}_{\mathcal {P}} \equiv \langle (\boldsymbol{\hat{\xi }} - \boldsymbol{\xi }_{\mathrm{MP}}) (\boldsymbol{\hat{\xi }} - \boldsymbol{\xi }_{\mathrm{MP}})^T \rangle = (\mathbb {1} - \boldsymbol{\mathcal {P}}) \mathbf {C}\ (\mathbb {1} - \boldsymbol{\mathcal {P}})^T . \end{eqnarray*}$$

(42)

This makes it straightforward to find the expectation value

$$\begin{eqnarray*} \left\langle \chi _{\mathrm{MP}}^2 \right\rangle &=& \left\langle (\boldsymbol{\hat{\xi }} - \boldsymbol{\xi }_{\mathrm{MP}})^T \boldsymbol{C}^{-1}(\boldsymbol{\hat{\xi }} - \boldsymbol{\xi }_{\mathrm{MP}}) \right\rangle \nonumber \\ &=& \mathrm{Trace}\left(\mathbf {C}_{\mathcal {P}}\ \mathbf {C}^{-1}\right) \nonumber \\ &=& \sum _{jk} C_{kj}\ \left(C^{-1}\right)_{jk}\ -\ \sum _{k} \mathcal {P}_{kk} \nonumber \\ &=& N_{\mathrm{data}} - N_{\mathrm{param}}\ . \end{eqnarray*}$$

(43)

Similarly, the variance of |$\chi _{\mathrm{MP}}^2$| can be shown to be

$$\begin{eqnarray*} \mathrm{Var}(\chi _{\mathrm{MP}}^2) &=& \left\langle \left(\chi _{\mathrm{MP}}^2\right)^2 \right\rangle - \left\langle \chi _{\mathrm{MP}}^2 \right\rangle ^2 \nonumber \\ &=&\ 2\ \mathrm{Trace}\left(\left[ \mathbf {C}_{\mathcal {P}}\ \mathbf {C}^{-1}\right]^2 \right) \nonumber \\ &=&\ 2(N_{\mathrm{data}} - N_{\mathrm{param}})\ . \end{eqnarray*}$$

(44)
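The properties of |$\boldsymbol{\mathcal {P}}$| stated in equation (40) and the resulting moments in equations (43) and (44) can be checked numerically. The sketch below does so for a random toy problem; the dimensions, the derivative matrix `jac`, and the covariance `cov` are assumptions chosen only for illustration.

```python
import numpy as np

# Toy set-up; all numbers are illustrative assumptions.
rng = np.random.default_rng(1)
n_data, n_param = 30, 4
jac = rng.normal(size=(n_data, n_param))              # derivatives d xi_i / d pi_alpha
a = rng.normal(size=(n_data, n_data))
cov = a @ a.T + n_data * np.eye(n_data)               # true covariance C

cinv = np.linalg.inv(cov)
fisher = jac.T @ cinv @ jac
proj = jac @ np.linalg.solve(fisher, jac.T @ cinv)    # eq. (39)

resid = np.eye(n_data) - proj
cov_resid = resid @ cov @ resid.T                     # eq. (42)
m = cov_resid @ cinv
mean_chi2 = np.trace(m)                               # eq. (43)
var_chi2 = 2.0 * np.trace(m @ m)                      # eq. (44)
print(mean_chi2, var_chi2)
```

Up to round-off, `mean_chi2` and `var_chi2` should equal N_data − N_param and 2(N_data − N_param) for any full-rank choice of `jac`.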

So far, we have only re-derived textbook results (Anderson 2003). Now how do |$\langle \chi _{\mathrm{MP}}^2 \rangle$| and |$\mathrm{Var}(\chi _{\mathrm{MP}}^2)$| change if the covariance model |$\mathbf {C}_{\mathrm{mod}}$| we use to find the best-fitting model |$\boldsymbol{\xi }_{\mathrm{MP}}$| and to compute |$\chi _{\mathrm{MP}}^2$| is different from the true covariance matrix |$\mathbf {C}$| of |$\boldsymbol{\hat{\xi }}$|?

Following similar steps as equations (38) and (39), one can show that

$$\begin{eqnarray*} \boldsymbol{\xi }_{\mathrm{MP}} = \boldsymbol{\bar{\xi }} + \boldsymbol{\mathcal {P}}_{\mathrm{mod}} \cdot (\boldsymbol{\hat{\xi }}-\boldsymbol{\bar{\xi }}), \end{eqnarray*}$$

(45)

where

$$\begin{eqnarray*} (\mathcal {P}_{\mathrm{mod}})_{ij} = \sum _{\alpha \beta } \partial _\alpha \xi _i \sum _{l} \left(\mathbf {F}_{\mathrm{mod}}^{-1}\right)_{\alpha \beta }\ \left(\mathbf {C}_{\mathrm{mod}}^{-1}\right)_{lj} \partial _\beta \xi _l \end{eqnarray*}$$

(46)

and where the Fisher matrix |$\mathbf {F}_{\mathrm{mod}}$| is computed from the model covariance |$\mathbf {C}_{\mathrm{mod}}$|. In particular, equation (45) shows that |$\boldsymbol{\xi }_{\mathrm{MP}}$| is still an unbiased estimator of |$\boldsymbol{\bar{\xi }}$| even when |$\mathbf {C}_{\mathrm{mod}} \ne \mathbf {C}$|. When deriving the moments of |$\chi _{\mathrm{MP}}^2$|, we will still come across expectation values like (cf. equation 43)

$$\begin{eqnarray*} \left\langle (\hat{\xi }_i - \bar{\xi }_i)(\hat{\xi }_j - \bar{\xi }_j) \right\rangle \equiv (\mathbf {C})_{ij} \ne (\mathbf {C}_{\mathrm{mod}})_{ij}\ . \end{eqnarray*}$$

(47)

Hence, the expectation value and variance of |$\chi _{\mathrm{MP}}^2$| are given by

$$\begin{eqnarray*} \left\langle \chi _{\mathrm{MP}}^2 \right\rangle = \mathrm{Trace}\left(\mathbf {C}_{\boldsymbol{\mathcal {P}}_{\mathrm{mod}}} \mathbf {C}_{\mathrm{mod}}^{-1} \right) \end{eqnarray*}$$

(48)

$$\begin{eqnarray*} \mathrm{Var}\left(\chi _{\mathrm{MP}}^2\right) = 2\ \mathrm{Trace}\left(\left[ \mathbf {C}_{\boldsymbol{\mathcal {P}}_{\mathrm{mod}}} \mathbf {C}_{\mathrm{mod}}^{-1} \right]^2 \right), \end{eqnarray*}$$

(49)

where

$$\begin{eqnarray*} \mathbf {C}_{\boldsymbol{\mathcal {P}}_{\mathrm{mod}}} = (\mathbb {1} - \boldsymbol{\mathcal {P}}_{\mathrm{mod}}) \mathbf {C}\ (\mathbb {1} - \boldsymbol{\mathcal {P}}_{\mathrm{mod}})^T . \end{eqnarray*}$$

(50)

Now we are left to investigate how equations (48) and (49) change when a Gaussian parameter prior |$\mathbf {P}$| is included in the likelihood function (cf. equation 30). A complication in this case is that now |$\boldsymbol{\xi }_{\mathrm{MP}}$| is not necessarily an unbiased estimate of |$\boldsymbol{\bar{\xi }}$| anymore. This is because in equation (30) we have centred our prior around the model parameters |$\boldsymbol{\pi }^{\mathrm{prior}}$| that may be different from the true parameters |$\boldsymbol{\pi }^{\mathrm{true}}$|⁠. Inserting the full expression for the maximum posterior parameters (equation 32) into our linearized model, we now get

$$\begin{eqnarray*} \boldsymbol{\xi }_{\mathrm{MP}} = \boldsymbol{\xi }^0 + \boldsymbol{\mathcal {P}}_{\mathrm{mod}} \cdot (\boldsymbol{\hat{\xi }}-\boldsymbol{\xi }^0) + \boldsymbol{\zeta } \end{eqnarray*}$$

(51)

with

$$\begin{eqnarray*} (\mathcal {P}_{\mathrm{mod}})_{ij} &=& \sum _{\alpha \beta } \partial _\alpha \xi _i \sum _{l} (\mathbf {F}_{\mathrm{mod}} + \mathbf {P})_{\alpha \beta }^{-1}\ \left(\mathbf {C}_{\mathrm{mod}}^{-1}\right)_{lj} \partial _\beta \xi _l\nonumber \\ \boldsymbol{\zeta } &=& \sum _\alpha \left[(\mathbf {F}_{\mathrm{mod}}+\mathbf {P})^{-1}\mathbf {P}\ (\boldsymbol{\pi }^{\mathrm{prior}} - \boldsymbol{\pi }^{0})\right]_{\alpha }\partial _\alpha \boldsymbol{\xi }.\nonumber \\ \end{eqnarray*}$$

(52)

The residual between |$\boldsymbol{\hat{\xi }}$| and |$\boldsymbol{\xi }_{\mathrm{MP}}$| hence becomes

$$\begin{eqnarray*} \boldsymbol{\hat{\xi }} - \boldsymbol{\xi }_{\mathrm{MP}} &=&\ (\mathbb {1} - \boldsymbol{\mathcal {P}}_{\mathrm{mod}})\cdot (\boldsymbol{\hat{\xi }} - \boldsymbol{\xi }^0) - \boldsymbol{\zeta }\ . \end{eqnarray*}$$

(53)

Treating the prior centre |$\boldsymbol{\pi }^{\mathrm{prior}}$| again as a random vector centred around |$\boldsymbol{\pi }^{\mathrm{true}}$|⁠, |$\boldsymbol{\zeta }$| also becomes a random vector with covariance

$$\begin{eqnarray*} \left(\mathbf {C}_\zeta \right)_{ij}& \equiv & \mathrm{Cov}[\zeta _i , \zeta _j]\nonumber \\ &=& \sum _{\alpha \beta \gamma \delta } \partial _{\alpha } \xi _i\ (\mathbf {F}_{\mathrm{mod}} + \mathbf {P})_{\alpha \beta }^{-1}\ \mathbf {P}_{\beta \gamma }\ (\mathbf {F}_{\mathrm{mod}} + \mathbf {P})_{\gamma \delta }^{-1}\ \partial _{\delta } \xi _j\ . \end{eqnarray*}$$

(54)

Hence, along lines similar to the case without a prior, we can write the moments of |$\chi _{\mathrm{MP}}^2$| for a given model covariance as

$$\begin{eqnarray*} \left\langle \chi _{\mathrm{MP}}^2 \right\rangle = \mathrm{Trace}\left(\lbrace \mathbf {C}_{{\mathcal {P}}_{\mathrm{mod}}}+\mathbf {C}_\zeta \rbrace \ \mathbf {C}_{\mathrm{mod}}^{-1}\right) \end{eqnarray*}$$

(55)

$$\begin{eqnarray*} \mathrm{Var}\left(\chi _{\mathrm{MP}}^2\right) = 2\ \mathrm{Trace}\left(\left[ \lbrace \mathbf {C}_{{\mathcal {P}}_{\mathrm{mod}}}+\mathbf {C}_\zeta \rbrace \ \mathbf {C}_{\mathrm{mod}}^{-1}\right]^2 \right) \ . \end{eqnarray*}$$

(56)

Notice that in the absence of priors (where |$\mathbf {C}_\zeta = \mathbb {0}$|) and when the model covariance equals the true covariance |$\mathbf {C}$|, we recover equations (43) and (44), as expected. Equations (55) and (56) are used to produce our main result shown in Fig. 1 for different covariance matrices.
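A compact way to evaluate equations (55) and (56) is sketched below. The function `chi2_moments` and the toy inputs are hypothetical and serve only to illustrate the matrix operations; they do not reproduce the DES-Y3 covariances or derivatives.

```python
import numpy as np

def chi2_moments(jac, cov_true, cov_mod, prior_prec=None):
    """Mean and variance of chi^2_MP following eqs (50), (52) and (54)-(56).

    jac        : derivatives d xi_i / d pi_alpha, shape (n_data, n_param)
    cov_true   : true covariance C of the data vector
    cov_mod    : model covariance C_mod used in the fit and in chi^2
    prior_prec : prior precision matrix P (None means no prior)
    """
    n_data, n_param = jac.shape
    if prior_prec is None:
        prior_prec = np.zeros((n_param, n_param))
    cinv_mod = np.linalg.inv(cov_mod)
    inv_fp = np.linalg.inv(jac.T @ cinv_mod @ jac + prior_prec)   # (F_mod + P)^{-1}
    proj = jac @ inv_fp @ jac.T @ cinv_mod                        # P_mod, eq. (52)
    resid = np.eye(n_data) - proj
    cov_resid = resid @ cov_true @ resid.T                        # eq. (50)
    cov_zeta = jac @ inv_fp @ prior_prec @ inv_fp @ jac.T         # eq. (54)
    m = (cov_resid + cov_zeta) @ cinv_mod
    return np.trace(m), 2.0 * np.trace(m @ m)                     # eqs (55), (56)

# Toy set-up; all numbers are illustrative assumptions.
rng = np.random.default_rng(4)
n_data, n_param = 30, 4
jac = rng.normal(size=(n_data, n_param))
a = rng.normal(size=(n_data, n_data))
cov = a @ a.T + n_data * np.eye(n_data)
prior_prec = np.diag([0.0, 0.0, 2.0, 10.0])

print(chi2_moments(jac, cov, cov))                # correct covariance, no prior
print(chi2_moments(jac, 1.1 * cov, cov))          # true noise 10 per cent larger than modelled
print(chi2_moments(jac, cov, cov, prior_prec))    # correct covariance, with prior
```

Without a prior and with the correct covariance the function returns the textbook values. In this toy example, informative priors raise the expected χ² above N_data − N_param, because the prior-constrained parameters are less free to absorb noise in the data.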

6 EXPLORING DIFFERENT EFFECTS IN THE COVARIANCE MODELLING

Our main goal is to study the impact of including different effects in the covariance modelling on the estimation of parameters. Several covariance matrices were generated and tested under different assumptions and approximations. The main results were already shown in Section 2. We now present the details of each step in the validation strategy that was outlined in Section 5.

6.1 Gaussian likelihood assumption

A basic assumption of our framework for testing different covariance matrices is that the likelihood function of the data is Gaussian. One simple reason why the sampling distribution of the correlation functions cannot be an exact multivariate Gaussian is that this would violate the positivity constraint of the power spectrum (Schneider & Hartlap 2009). There are also other reasons, described below. The purpose of this subsection is to assess the impact of non-Gaussianity of the likelihood of two-point functions on parameter estimation. In this sense, checking this basic assumption is a test of the whole framework and is different from the robustness tests for the covariance matrix modelling described in the remaining subsections of this section.

The impact of a non-Gaussian likelihood on parameter estimation from weak lensing correlation functions was recently studied by Lin et al. (2020), who found no significant biases in the one-dimensional posteriors of Ω_m and σ₈ between the multivariate Gaussian likelihood model and more complex non-Gaussian likelihood models. Also, Sellentin, Heymans & Harnois-Déraps (2018) used the skewed distributions of weak lensing shear correlation functions to derive an analytical expression for a non-Gaussian likelihood.

We first consider a full-sky survey such that each of our two-point function estimators |$\hat{\xi }^{\mathrm{AB}}(\theta)$| is a harmonic transform of a harmonic space estimator |$\hat{C}_\ell ^{AB}$| (cf. equation 12), i.e.

$$\begin{eqnarray*} \hat{\xi }^{\mathrm{AB}}(\theta) = \sum _{\ell = 0}^\infty \frac{2 \ell +1}{4 \pi } F^{AB}_\ell (\theta) \, \hat{C}_\ell ^{AB}\ . \end{eqnarray*}$$

(57)

Each |$\hat{C}_\ell ^{AB}$| is given in terms of the spherical harmonics coefficients a_ℓm, b_ℓm of two Gaussian random fields as

$$\begin{eqnarray*} \hat{C}_\ell ^{AB} = \frac{1}{2\ell +1} \sum _{m=-\ell }^{\ell } a_{\ell m} b_{\ell m}^{*}\ . \end{eqnarray*}$$

(58)

The product of two Gaussian random variables does not follow a Gaussian distribution. Therefore, in principle one would not expect |$\hat{C}_\ell ^{AB}$| [and consequently |$\hat{\xi }^{\mathrm{AB}}(\theta)$|] to have a Gaussian likelihood. However, at small scales, i.e. at high multipoles ℓ, the sum of the random variables |$a_{\ell m} b_{\ell m}^{*}$| in equation (58) will approach a Gaussian distribution by means of the central limit theorem, since there are a large number (2ℓ + 1) of independent modes. It should be pointed out that at these small scales the galaxy density and shear fields characterized by a_ℓm and b_ℓm are themselves non-Gaussian due to the non-linear evolution of gravity.

It is hence our working hypothesis that non-Gaussianity of |$\hat{C}_\ell ^{AB}$| only matters at the largest scales (small ℓ’s) where both a_ℓm and b_ℓm can be considered Gaussian random variables but not their product. In the full-sky case, it can then be shown that the second and third central moments of |$\hat{C}_\ell ^{AB}$| are given by

$$\begin{eqnarray*} \left\langle \left(\hat{C}_\ell ^{AB} - C_\ell ^{AB}\right)^2 \right\rangle = \frac{\left[\left(C_\ell ^{AB}\right)^2+ C_\ell ^{AA}C_\ell ^{BB}\right]}{2\ell +1} \end{eqnarray*}$$

(59)

$$\begin{eqnarray*} \left\langle \left(\hat{C}_\ell ^{AB} - C_\ell ^{AB}\right)^3 \right\rangle = \frac{2 \left[ \left(C_\ell ^{AB}\right)^3 + 3 C_\ell ^{AA}C_\ell ^{BB}C_\ell ^{AB}\right]}{(2\ell +1)^2}\ . \end{eqnarray*}$$

(60)

If only a fraction f_sky of the sky is being observed, these moments get divided by f_sky and |$f_{\mathrm{sky}}^2$|⁠, respectively.
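Equations (59) and (60) can be cross-checked with a small full-sky Monte Carlo of the estimator in equation (58). The sketch below is an illustration under simplifying assumptions: the 2ℓ + 1 independent real degrees of freedom of each field are drawn directly as real Gaussian pairs, and the spectra `c_aa`, `c_bb`, `c_ab` are arbitrary toy numbers. The function names are hypothetical.

```python
import numpy as np

def cl_moments(ell, c_aa, c_bb, c_ab, f_sky=1.0):
    """Second and third central moments of hat C_ell^{AB}, eqs (59) and (60),
    divided by f_sky and f_sky^2, respectively, as described in the text."""
    n_modes = 2 * ell + 1
    mu2 = (c_ab**2 + c_aa * c_bb) / n_modes / f_sky
    mu3 = 2.0 * (c_ab**3 + 3.0 * c_aa * c_bb * c_ab) / n_modes**2 / f_sky**2
    return mu2, mu3

def simulate_cl(ell, c_aa, c_bb, c_ab, n_real, rng):
    """Full-sky realizations of eq. (58), with each of the 2 ell + 1 modes
    represented by one correlated real Gaussian pair (a, b)."""
    cov = np.array([[c_aa, c_ab], [c_ab, c_bb]])
    modes = rng.multivariate_normal(np.zeros(2), cov, size=(n_real, 2 * ell + 1))
    return np.mean(modes[..., 0] * modes[..., 1], axis=1)

# Toy numbers; illustrative assumptions only.
rng = np.random.default_rng(2)
ell, c_aa, c_bb, c_ab = 2, 1.0, 2.0, 0.5
cl_hat = simulate_cl(ell, c_aa, c_bb, c_ab, 200_000, rng)
mu2_mc = np.mean((cl_hat - c_ab) ** 2)
mu3_mc = np.mean((cl_hat - c_ab) ** 3)
mu2, mu3 = cl_moments(ell, c_aa, c_bb, c_ab)
print(mu2_mc / mu2, mu3_mc / mu3)
```

Both printed ratios should be close to unity, up to Monte Carlo noise. The skewness implied by `mu3` decreases with the number of modes, in line with the central limit argument given above.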

Assuming different multipoles to be uncorrelated, the corresponding moments of |$\hat{\xi }^{\mathrm{AB}}(\theta)$| can be computed as

$$\begin{eqnarray*} \left\langle \left(\hat{\xi }^{\mathrm{AB}}(\theta)-\xi ^{\mathrm{AB}}(\theta)\right)^2\right\rangle = \sum _{\ell = 0}^\infty \left(\frac{2 \ell +1}{4 \pi } F^{AB}_\ell (\theta)\right)^2 \ \left\langle \left(\hat{C}_\ell ^{AB} - C_\ell ^{AB}\right)^2\right\rangle \end{eqnarray*}$$

(61)

$$\begin{eqnarray*} \left\langle \left(\hat{\xi }^{\mathrm{AB}}(\theta)-\xi ^{\mathrm{AB}}(\theta)\right)^3\right\rangle = \sum _{\ell = 0}^\infty \left(\frac{2 \ell +1}{4 \pi } F^{AB}_\ell (\theta)\right)^3 \ \left\langle \left(\hat{C}_\ell ^{AB} - C_\ell ^{AB}\right)^3\right\rangle \ . \end{eqnarray*}$$

(62)

Equation (61) is of course nothing but the diagonal of the covariance matrix (cf. equation 14).

The dominant effect of the non-Gaussianity of the C_ℓ’s is a positive skewness in the distribution of our data vectors (Sellentin et al. 2018). To estimate its impact on our parameter constraints, we approximate the entire distribution of our 3×2pt data vector by a multivariate lognormal distribution. The covariance of our data vector and the skewness of each data point as given by equation (62) are sufficient to fix the parameters of a shifted lognormal distribution. We have already discussed this in Sections 4.2 and 4.3, though with a conceptual difference: in those sections, we describe how to configure lognormal simulations of the cosmic density field, while here we assume measurements of the 3×2pt functions to have a multivariate lognormal distribution. To be explicit, we fix the shift parameters λ(θ) that enter the lognormal PDF of the measurements |$\hat{\xi }^{\mathrm{AB}}(\theta)$| in the different angular bins (cf. equation 24 for the definition of λ) via the equation

$$\begin{eqnarray*} \left\langle \left(\hat{\xi }^{\mathrm{AB}}(\theta)-\xi ^{\mathrm{AB}}(\theta)\right)^3\right\rangle = \frac{3\left\langle \left(\hat{\xi }^{\mathrm{AB}}(\theta)-\xi ^{\mathrm{AB}}(\theta)\right)^2\right\rangle ^2}{\lambda (\theta)} + \frac{\left\langle \left(\hat{\xi }^{\mathrm{AB}}(\theta)-\xi ^{\mathrm{AB}}(\theta)\right)^2\right\rangle ^3}{\lambda (\theta)^3}\ , \end{eqnarray*}$$

(63)

which relates the second and third central moments of lognormal random variables (Hilbert et al. 2011).
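Equation (63) is a cubic in 1/λ with a single real root, so the shift parameter can be obtained in closed form. The following sketch shows one way to do this for a single data point and to draw matching random deviates. The function names are hypothetical, and the sampler assumes the standard zero-mean shifted lognormal parametrization, which may differ in detail from the convention of equation (24).

```python
import numpy as np

def lognormal_shift(mu2, mu3):
    """Solve eq. (63) for lambda, given the second and third central moments.

    With v = sqrt(mu2) / lambda, eq. (63) becomes v^3 + 3 v = skewness,
    whose unique real root follows from Cardano's formula.
    """
    gamma = mu3 / mu2**1.5                                  # skewness
    root = np.sqrt(0.25 * gamma**2 + 1.0)
    v = np.cbrt(0.5 * gamma + root) + np.cbrt(0.5 * gamma - root)
    return np.sqrt(mu2) / v

def draw_shifted_lognormal(mu2, lam, size, rng):
    """Zero-mean shifted lognormal deviates with variance mu2 and shift lam."""
    sigma_g2 = np.log1p(mu2 / lam**2)                       # variance of the Gaussian exponent
    g = rng.normal(-0.5 * sigma_g2, np.sqrt(sigma_g2), size=size)
    return lam * (np.exp(g) - 1.0)

# Round trip for toy numbers (illustrative assumptions).
lam_in, mu2 = 2.0, 0.3
mu3 = 3.0 * mu2**2 / lam_in + mu2**3 / lam_in**3            # eq. (63)
lam_out = lognormal_shift(mu2, mu3)

rng = np.random.default_rng(5)
draws = draw_shifted_lognormal(mu2, lam_out, 400_000, rng)
print(lam_out, draws.var() / mu2, np.mean((draws - draws.mean()) ** 3) / mu3)
```

The closed form assumes positive skewness; for vanishing skewness the shift diverges, which corresponds to the Gaussian limit.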

In the top panel of Fig. 7, we show the impact of this non-Gaussianity on the distribution of maximum posterior χ². For that figure, we generated 300 000 random realizations of our fiducial data vector from a multivariate Gaussian distribution, 300 000 random realizations of that data vector from a multivariate lognormal distribution, and 300 000 random realizations from another lognormal distribution, whose skewness in each data point was increased by a factor of 5. For each of these random realizations, we analytically determined the maximum posterior model within the linearized likelihood formalism of Section 5.1 and then computed the χ² between that model and the random realization. The blue histogram in the top panel of Fig. 7 shows the distribution of these χ² values for the Gaussian random realizations and the red histogram corresponds to the lognormal random realizations. The two histograms are almost identical. Hence, within the f_sky approximation employed above, non-Gaussianity in the likelihood does not seem to affect our analysis. Also, even in the extreme scenario of enhancing the skewness of the data vector by a factor of 5 (green histogram), the increase in the scatter of χ² remains smaller than about |$3{{\ \rm per\ cent}}$| of the average χ², which still would not dominate over the other effects discussed in subsequent sections (cf. Fig. 1).

Figure 7.

Top panel: Distribution of χ² when drawing 3×2pt data vectors from a Gaussian distribution (blue histogram), from a shifted lognormal distribution where the skewness of each data point was computed in the f_sky approximation (red histogram) and when assuming that the skewness of the data points is 5 times that of the f_sky approximation (green histogram). Bottom panel: Distribution of maximum posterior σ₈ when fitting the linearized model of Section 5.1 to Gaussian realizations of our fiducial data vector, to lognormal realizations of our fiducial data vector (blue histogram), and to lognormal realizations with 5 times the skewness of the f_sky approximation employed in Section 6.1 (orange histogram).

The impact of non-Gaussianity on the likelihood becomes even more negligible when directly considering the distribution of maximum posterior parameters. We demonstrate this in the bottom panel of Fig. 7 for the best-fitting values of σ₈ but find similar results for our other key cosmological parameters Ω_m and w₀. Therefore, we conclude that it is safe to assume a Gaussian distribution for the statistical uncertainties of the DES-Y3 two-point function measurements.

6.2 Modelling of connected four-point function in covariance

The connected four-point contribution to the covariance is the part that is most challenging to model analytically (Schneider et al. 2002; Hilbert et al. 2011; Sato et al. 2011; Takada & Hu 2013). This contribution is most relevant at small scales and turns out to be a small one for current LSS analyses (Krause et al. 2017; Barreira et al. 2018). This is for two reasons: (1) such analyses typically cut away their smallest scales because of uncertainties in the modelling of their data vectors and (2) at small scales the covariance matrix is often dominated by shape noise and shot noise that are believed to be well understood.

We test whether the non-Gaussian covariance parts (by which we mean both the connected four-point function and supersample covariance) are a relevant contribution to our error budget by either

replacing the non-Gaussian contributions from the fiducial halo model with the lognormal covariance described in Section 4.2, or
setting them to zero, i.e. using only a Gaussian covariance matrix.

Fig. 1 and Table 1 show that neither of these changes has a significant impact on the distribution of χ² and our parameter constraints. Assuming that our halo model and lognormal recipes do not underestimate the non-Gaussian covariance parts by orders of magnitude [see e.g. Sato et al. (2009) and Hilbert et al. (2011) for justifications of this assumption], this demonstrates that we are insensitive to the exact modelling of these contributions. At the same time, we want to stress that this finding holds for the specific scale cuts, redshift distributions, and tracer densities of the DES-Y3 3×2pt analysis and cannot necessarily be generalized to other analysis set-ups.

6.3 Exact angular bin averaging

Equation (14) holds when measuring the two-point correlation functions in infinitesimally small bins around the angular scales θ₁ and θ₂. This is unfeasible in practice and in fact also leads to divergent covariance matrices. This can, for example, be seen for the galaxy clustering correlation functions, where the constant term proportional to |$1/n_\mathrm{ g}^2$| in the harmonic space covariance gives a contribution to the real space covariance of

$$\begin{eqnarray*} \frac{1}{4\pi ^2 n_\mathrm{ g}^2f_{\mathrm{sky}}}\ \underset{N\rightarrow \infty }{\lim }\sum _{\ell = 1}^N \frac{(2\ell +1)}{2} P_{\ell }\left(\cos \theta \right)^2 \rightarrow \frac{1}{4\pi ^2 n_\mathrm{ g}^2f_{\mathrm{sky}}}\ \delta _D(\cos \theta - \cos \theta) (= \infty)\ . \end{eqnarray*}$$

The reason for this divergence is simply the fact that the number of galaxy pairs found in an infinitesimal bin vanishes, leading to infinite shot noise. This problem disappears when considering finite angular bins.

To analytically average over a finite angular bin [θ_min, θ_max], we assume that the number of galaxy pairs with angular separation θ is proportional to sin θ (corresponding to a uniform distribution of galaxies on the sky). We then replace the functions |$F^{AB}_\ell (\theta)$| in equations (9) and (10) by

$$\begin{eqnarray*} F^{AB}_\ell (\theta) \rightarrow \frac{1}{\cos \theta _{\min } - \cos \theta _{\max }}\ \int _{\theta _{\min }}^{\theta _{\max }} \mathrm{d}\theta \ \sin \theta \ F^{AB}_\ell (\theta)\ . \end{eqnarray*}$$

(64)

For the galaxy clustering correlation function w(θ), this leads to

$$\begin{eqnarray*} P_\ell (\cos \theta) \rightarrow \frac{\left[P_{\ell +1}(x) - P_{\ell -1}(x) \right]_{\cos \theta _{\max }}^{\cos \theta _{\min }}}{(2\ell +1)(\cos \theta _{\min } - \cos \theta _{\max })} . \end{eqnarray*}$$

(65)

The corresponding expressions for the galaxy–galaxy lensing correlation function γ_t(θ) and for the cosmic shear correlation functions ξ_± are presented (together with derivations of all the bin averaged expressions) in Appendix B.
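The bin average of equation (65) can be checked numerically against the defining integral of equation (64). The following is a minimal sketch, assuming SciPy is available; the function names are illustrative and not part of the DES-Y3 pipeline.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import eval_legendre


def bin_averaged_legendre(ell, theta_min, theta_max):
    """Bin average of P_ell(cos theta) over [theta_min, theta_max], equation (65)."""
    x_hi, x_lo = np.cos(theta_min), np.cos(theta_max)  # cos decreases with theta
    if ell == 0:
        return 1.0  # the average of P_0 = 1 over any bin is 1

    def bracket(x):
        # (2 ell + 1) times the antiderivative of P_ell, up to a constant
        return eval_legendre(ell + 1, x) - eval_legendre(ell - 1, x)

    return (bracket(x_hi) - bracket(x_lo)) / ((2 * ell + 1) * (x_hi - x_lo))


def bin_averaged_legendre_numeric(ell, theta_min, theta_max):
    """Same average by direct quadrature of equation (64) with F = P_ell(cos theta)."""
    integral, _ = quad(lambda t: np.sin(t) * eval_legendre(ell, np.cos(t)),
                       theta_min, theta_max)
    return integral / (np.cos(theta_min) - np.cos(theta_max))
```

Both routes should agree to numerical precision; the closed form avoids one quadrature per multipole and bin.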

We show below how the bin averaging solves the problem of diverging diagonal values of the covariance for w(θ). This can be seen from

$$\begin{eqnarray*} \sum _{\ell } \frac{\left(\left[ P_{\ell +1}\left(x \right) - P_{\ell -1}\left(x \right)\right]_{\cos \theta _{\max }}^{\cos \theta _{\min }}\right)^2}{2(2\ell +1)f_{\mathrm{sky}}(n_\mathrm{ g} A_{\mathrm{bin}})^2} &=& \int _{\cos \theta _{\max }}^{\cos \theta _{\min }}\mathrm{d} x_1\int _{\cos \theta _{\max }}^{\cos \theta _{\min }}\mathrm{d} x_2 \sum _{\ell } \frac{2\ell +1}{2} \frac{P_{\ell }\left(x_1 \right)P_{\ell }\left(x_2 \right)}{f_{\mathrm{sky}}(n_\mathrm{ g} A_{\mathrm{bin}})^2} \nonumber \\ &=& \int _{\cos \theta _{\max }}^{\cos \theta _{\min }}\mathrm{d} x_1\int _{\cos \theta _{\max }}^{\cos \theta _{\min }}\mathrm{d} x_2 \ \frac{\delta _D(x_1 - x_2)}{f_{\mathrm{sky}}(n_\mathrm{ g} A_{\mathrm{bin}})^2} \nonumber \\ &=& \frac{\cos \theta _{\min } - \cos \theta _{\max }}{f_{\mathrm{sky}}(n_\mathrm{ g} A_{\mathrm{bin}})^2} \nonumber \\ &=& \frac{2}{A_{\mathrm{survey}} A_{\mathrm{bin}} n_\mathrm{ g}^2}\nonumber \\ &=& \frac{1}{N_{\mathrm{pair}}}\ , \end{eqnarray*}$$

(66)

where A_survey = 4πf_sky is the total survey area, A_bin = 2π(cos θ_min − cos θ_max) is the bin area, and N_pair the total number of galaxy pairs used to estimate |$\hat{w}$|⁠. The above expression is the usual formula for the shot-noise part of the real space covariance.
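The key step of equation (66), that the sum over ℓ collapses to cos θ_min − cos θ_max, can be verified numerically. The sketch below is illustrative only: it truncates the sum at a hypothetical `ell_max`, and it assumes that the ℓ = 0 term is included, as required by the completeness relation of the Legendre polynomials.

```python
import numpy as np
from scipy.special import eval_legendre


def shot_noise_sum(theta_min, theta_max, ell_max):
    """Partial sum over ell <= ell_max of ([P_{l+1} - P_{l-1}])^2 / (2 (2l + 1))."""
    x_hi, x_lo = np.cos(theta_min), np.cos(theta_max)
    ell = np.arange(1, ell_max + 1)
    bracket = (eval_legendre(ell + 1, x_hi) - eval_legendre(ell - 1, x_hi)
               - eval_legendre(ell + 1, x_lo) + eval_legendre(ell - 1, x_lo))
    # ell = 0 term: the integral of P_0 over the bin is x_hi - x_lo
    ell0_term = 0.5 * (x_hi - x_lo) ** 2
    return ell0_term + np.sum(bracket ** 2 / (2.0 * (2 * ell + 1)))
```

For a bin such as [0.2, 0.4] rad the partial sum approaches cos θ_min − cos θ_max from below as `ell_max` grows; the remainder shrinks only roughly as 1/`ell_max`, which is one reason the closed-form 1/N_pair expression is convenient.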

The impact of the exact angular bin averaging for the noise and mixed terms in the Gaussian part of the covariance matrix is included for all four types of two-point functions present in the DES-Y3 data vector and the DES-Y3 fiducial covariance.

6.4 Flat versus curved sky

For the Y1 analysis, it was shown that the flat-sky approximation was valid for the galaxy–galaxy shear and shear–shear two-point correlation function (Krause et al. 2017). In Y3, the fiducial covariance computes the full sky correlations; see equations (12) and (13). We show in Fig. 1 that the effect of including curved sky results has negligible impact on the χ² distribution. Table 1 shows that this is also true for parameter constraints.

6.5 Limber approximation and RSD effects

The modelling of the angular power spectrum of two tracers involves a projection from the 3D power spectrum that requires integrals with integrands containing the product of two spherical Bessel functions, which are highly oscillatory. The inclusion of RSD effects in a simple linear modelling (Kaiser 1987) involves the computation of those integrals with derivatives of the Bessel functions. These integrals are notoriously difficult to perform numerically and it is usual to apply the so-called Limber approximation (Limber 1953; LoVerde & Afshordi 2008). An efficient computation of these integrals without resorting to the Limber approximation was recently implemented in the case of the angular power spectrum for galaxy clustering in Fang et al. (2020b). We use their approach to study the impact of taking into account both non-Limber computations and RSD effects in the covariance matrix. Fig. 1 and Table 1 show that not taking these effects into account leads to an increase in average χ² of about |$0.5{{\ \rm per\ cent}}$| and an underestimation of uncertainties in key cosmological parameters by |$0.6{{\ \rm per\ cent}}$| to |$1.4{{\ \rm per\ cent}}$|⁠.

6.6 Effect of the mask geometry

The analytical covariance models described in Section 3 make use of the so-called f_sky approximation; i.e. they take the covariance of an all-sky survey and divide this by the sky fraction of DES-Y3 to approximate the covariance of our partial sky data. In Appendix C, we show how to go beyond this approximation. First, we note there that the covariance of the two-point function measurements between pairs of scalar random fields (δ_a, δ_b) and (δ_c, δ_d) within angular bins |$[\theta _{-}^{ab},\theta _{+}^{ab}]$| and |$[\theta _{-}^{cd},\theta _{+}^{cd}]$| is given by

$$\begin{eqnarray*} &&{\mathrm{Cov}\left\lbrace \hat{\xi }^{ab}\left[\theta _{-}^{ab},\theta _{+}^{ab}\right],\hat{\xi }^{cd}\left[\theta _{-}^{cd},\theta _{+}^{cd}\right]\right\rbrace \ \frac{N_{\mathrm{pair}}^{ab}\left[\theta _{-}^{ab},\theta _{+}^{ab}\right] \ N_{\mathrm{pair}}^{cd}\left[\theta _{-}^{cd},\theta _{+}^{cd}\right]}{n_a n_b n_c n_d}}\nonumber \\ &&{\quad= (2\pi)^2\sum _{\ell _1\ \ell _2} \left[P_{\ell _1+1}(x) - P_{\ell _1-1}(x) \right]_{\theta _{+}^{ab}}^{\theta _{-}^{ab}}\ \left[P_{\ell _2+1}(x) - P_{\ell _2-1}(x) \right]_{\theta _{+}^{cd}}^{\theta _{-}^{cd}}\cdot \mathrm{Cov}\left\lbrace \tilde{C}_{\ell _1}^{ab}, \tilde{C}_{\ell _2}^{cd} \right\rbrace \ . } \end{eqnarray*}$$

(67)

Here, P_ℓ are again the Legendre polynomials and the angular bin averaging was already evaluated. The factor |$N_{\mathrm{pair}}^{ab}[\theta _{-}^{ab},\theta _{+}^{ab}]$| (resp. |$N_{\mathrm{pair}}^{cd}[\theta _{-}^{cd},\theta _{+}^{cd}]$|) is the average number of pairs of random points that uniformly sample the footprint with densities n_a, n_b (resp. n_c, n_d) and whose separation falls into the angular bin |$[\theta _{-}^{ab},\theta _{+}^{ab}]$| (resp. |$[\theta _{-}^{cd},\theta _{+}^{cd}]$|). Hence, these factors describe how the exact survey geometry suppresses the number of pairs of positions in the bins |$[\theta _{-}^{ab},\theta _{+}^{ab}]$| and |$[\theta _{-}^{cd},\theta _{+}^{cd}]$| with respect to the f_sky approximation. Finally, |$\mathrm{Cov}\lbrace \tilde{C}_{\ell _1}^{ab}, \tilde{C}_{\ell _2}^{cd} \rbrace$| is the covariance of pseudo-C_ℓ estimates of the power spectra between the fields (δ_a, δ_b) and (δ_c, δ_d) (see Appendix C for more details). Note that the full survey footprint modifies the covariance with respect to the f_sky approximation used in Section 3 both through the factors |$N_{\mathrm{pair}}^{ab}[\theta _{-}^{ab},\theta _{+}^{ab}]/n_an_b$|, |$N_{\mathrm{pair}}^{cd}[\theta _{-}^{cd},\theta _{+}^{cd}]/n_c n_d$| and by changing |$\mathrm{Cov}\lbrace \hat{C}_{\ell _1}^{ab}, \hat{C}_{\ell _2}^{cd} \rbrace$| compared to equations (16)–(23).

One can determine the factors |$N_{\mathrm{pair}}^{ab}[\theta _{-}^{ab},\theta _{+}^{ab}]/n_an_b$| and |$N_{\mathrm{pair}}^{cd}[\theta _{-}^{cd},\theta _{+}^{cd}]/n_c n_d$| either by counting pairs in a set of random points that trace the survey footprint homogeneously or analytically from the power spectrum of the survey mask (see our Appendix C as well as Troxel et al. 2018b). This will generally lead to an enhancement of statistical uncertainties (i.e. of the covariance matrix) with respect to the f_sky approximation. To calculate |$\mathrm{Cov}\lbrace \tilde{C}_{\ell _1}^{ab}, \tilde{C}_{\ell _2}^{cd} \rbrace$|, one could e.g. follow approximations made by Efstathiou (2004). We slightly modify their arguments in Appendix C to arrive at

$$\begin{eqnarray*} \mathrm{Cov}\left\lbrace \tilde{C}_{\ell _1}^{ab}, \tilde{C}_{\ell _2}^{cd} \right\rbrace \approx \frac{1}{4}\left(\frac{C_{\ell _1}^{ac} C_{\ell _2}^{bd} + C_{\ell _2}^{ac} C_{\ell _1}^{bd} + C_{\ell _1}^{ac} C_{\ell _1}^{bd} + C_{\ell _2}^{ac} C_{\ell _2}^{bd}}{(2\ell _1+1)(2\ell _2+1)} + \frac{C_{\ell _1}^{ad} C_{\ell _2}^{bc} + C_{\ell _2}^{ad} C_{\ell _1}^{bc} + C_{\ell _1}^{ad} C_{\ell _1}^{bc} + C_{\ell _2}^{ad} C_{\ell _2}^{bc}}{(2\ell _1+1)(2\ell _2+1)}\right)\ \mathcal {M}_{\ell _1 \ell _2}\ . \end{eqnarray*}$$

(68)

Here, the mode coupling matrix |$\mathcal {M}_{\ell _1 \ell _2}$| again depends on the power spectrum of the survey mask and is also detailed in Appendix C. Note that in order to keep our notation brief, we have assumed that the power spectra |$C_{\ell _1}^{ac}$| etc. in the above equation include both the underlying cosmological power spectra and contributions to the power spectra from sampling noise, such as shape noise and shot noise.
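The structure of equation (68) can be made explicit with a short sketch. It assumes that the four (noise-inclusive) power spectra are available as arrays indexed by ℓ = 0…ℓ_max and that the mode coupling matrix has already been computed; the function and argument names are hypothetical, and the computation of the coupling matrix itself (Appendix C) is not shown.

```python
import numpy as np


def pseudo_cl_covariance(c_ac, c_bd, c_ad, c_bc, coupling):
    """Right-hand side of equation (68) for ell_1, ell_2 = 0..ell_max.

    The four spectra are 1D arrays indexed by ell (noise already included);
    `coupling` is the (ell_max + 1, ell_max + 1) mode coupling matrix M.
    """
    c_ac, c_bd, c_ad, c_bc = (np.asarray(c, dtype=float)
                              for c in (c_ac, c_bd, c_ad, c_bc))
    ell = np.arange(c_ac.size)
    norm = np.outer(2 * ell + 1, 2 * ell + 1)

    def symmetrized(x, y):
        # x_{l1} y_{l2} + x_{l2} y_{l1} + x_{l1} y_{l1} + x_{l2} y_{l2}
        same_ell = x * y
        return (np.outer(x, y) + np.outer(y, x)
                + same_ell[:, None] + same_ell[None, :])

    return 0.25 * (symmetrized(c_ac, c_bd) + symmetrized(c_ad, c_bc)) / norm * coupling
```

The symmetrization over ℓ₁ and ℓ₂ guarantees a symmetric result whenever the coupling matrix is symmetric.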

In practice, equation (68) and the approximations proposed by Efstathiou (2004) yield very similar results and they are both valid on scales ℓ₁, ℓ₂ that are much smaller than the typical scales of the mask W. Unfortunately, the DES-Y3 analysis mask has features and holes over a large range of scales, so the angular scales of interest in the 3×2pt analysis are never strictly smaller than the scales of our mask. Hence, equation (68) is not sufficiently accurate in our case and in fact significantly overestimates our statistical uncertainties. In Fig. 8, we explain a simple scheme that can be used to correct for this. To motivate this procedure, consider how one would compute the Gaussian covariance model directly from the real space two-point correlation functions, i.e. without taking the detour to Fourier space that was used in Section 3. For the covariance of |$\hat{\xi }^{ab}[\theta _{-}^{ab},\theta _{+}^{ab}]$| and |$\hat{\xi }^{cd}[\theta _{-}^{cd},\theta _{+}^{cd}]$|, this would amount to integration over all pairs of locations within our survey mask that fall into the angular bins |$[\theta _{-}^{ab},\theta _{+}^{ab}]$| and |$[\theta _{-}^{cd},\theta _{+}^{cd}]$|. Schematically, this leads to terms of the form

$$\begin{eqnarray*} \mathrm{Cov} \propto \underset{(ab)\in \mathrm{mask,bin}}{\int } \mathrm{d}\Omega ^{a}\mathrm{d}\Omega ^{b} \underset{(cd)\in \mathrm{mask,bin}}{\int } \mathrm{d}\Omega ^{c}\mathrm{d}\Omega ^{d} \xi ^{ac}(\theta ^{ac})\xi ^{bd}(\theta ^{bd}) + \dots \ . \end{eqnarray*}$$

(69)

Here, Ω^a…Ω^d are four locations inside the survey mask such that the distance between Ω^a and Ω^b lies inside the angular bin |$[\theta _{-}^{ab},\theta _{+}^{ab}]$| and the distance between Ω^c and Ω^d lies inside the angular bin |$[\theta _{-}^{cd},\theta _{+}^{cd}]$|⁠. Now the approximation of Efstathiou (2004) assumes that the two-point functions ξ^ac(θ), ξ^bd(θ) are negligible on scales θ comparable to the small-scale features in the mask. Schematically, this amounts to making the approximation

$$\begin{eqnarray*} \int \mathrm{d}\Omega ^{a}\ \dots \ W(\Omega ^{a})\xi ^{ac}(\theta ^{ac})\approx \bar{\xi }^{ac}\int \mathrm{d}\Omega ^{a}\ \dots \ W(\Omega ^{a})\delta _{\mathrm{Dirac}}^2(\Omega ^{a}-\Omega ^{c}) , \end{eqnarray*}$$

(70)

where |$\bar{\xi }^{ac}$| is a suitable average of the two-point function over different scales. Our understanding of the approximation of Efstathiou (2004) via equation (70) is based on findings that we present in Appendix D. This approximation fails when the mask contains features (e.g. holes) on scales where the two-point function has not yet decayed. Assuming that such small-scale holes cover a fraction f_mask of a coarser version of the footprint, this can roughly be corrected for with a multiplicative factor, i.e. by instead using the approximation

$$\begin{eqnarray*} \int \mathrm{d}\Omega ^{a}\ \dots \ W(\Omega ^{a})\xi ^{ac}(\theta ^{ac}) \approx f_{\mathrm{mask}}\ \bar{\xi }^{ac}\int \mathrm{d}\Omega ^{a}\ \dots \ W(\Omega ^{a})\delta _{\mathrm{Dirac}}^2(\Omega ^{a}-\Omega ^{c})\ . \end{eqnarray*}$$

(71)

Figure 8.

The impact of masking on the DES-Y3 covariance. The blue histogram in the left-hand panel shows the distribution of χ² obtained from our flask data vectors when using the f_sky approximation. We restrict this figure to the 2×2pt function part of the data vector since it is this part that suffers the most from masking effects (cf. Fig. 1). Ansatzes in the CMB literature (e.g. Efstathiou 2004) are not sufficient to correct for this, because the DES footprint has features down to very small scales. In the main text, we have motivated a possible way to correct for these small-scale masking features and the orange histogram in the left-hand panel shows that this ansatz indeed significantly improves the χ² obtained from our flask measurements. The sketch in the right-hand panel visualizes how small-scale features in the mask lead to an overestimation in the covariance when using common ways to treat the impact of survey geometry on the two-point function covariance (see the main text for explanation).

The right-hand panel of Fig. 8 visualizes this for the mixed terms in the covariance, where one of the correlation functions ξ^ac or ξ^bd is due to sampling noise such as shape noise or shot noise and is hence exactly proportional to a Dirac delta function. In that case, the integration is over pairs that share one end point. Now, the approximation made e.g. in Efstathiou (2004) or by our equation (68) assumes that also the correlation function between the other two end points effectively acts as a delta function with respect to the smallest scale features in the survey mask (cf. equation 70). We find that this is not the case for the DES-Y3 mask and that it contains features on all scales relevant to our analysis. However, as indicated in equation (71), one can approximately correct for this by multiplying the mixed terms in the covariance by the fraction f_mask of the coarser survey geometry that is covered by small-scale holes in the mask. This can be considered a next-to-leading-order correction to our equation (68).

By applying equation (71) twice, one can see that the cosmic variance terms (terms where neither of the two-point functions ξ^ac or ξ^bd is exactly proportional to a delta function) can be corrected by multiplication with |$f_{\mathrm{mask}}^2$|. To implement this correction in practice, we draw circles within the DES-Y3 survey footprint with radii ranging from 5 to 20 arcmin and measure the masking fraction in these circles. We find that this fraction is |${\approx} 90{{\ \rm per\ cent}}$| across the considered scales. Multiplying the mixed terms in the covariance by that fraction and the cosmic variance terms by the square of that fraction (together with using equation 68), we indeed find significant improvement of the maximum posterior χ² obtained for the flask simulations, as is shown in the left-hand panel of Fig. 8 (as well as in Fig. 1).
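The rescaling described above can be written down in a few lines. This is only a sketch: it assumes that the Gaussian covariance has already been split into pure-noise, mixed, and cosmic variance parts (stored as three arrays of equal shape, a hypothetical layout), and it assumes that the pure-noise part is left unchanged, since the text only specifies factors for the other two.

```python
import numpy as np


def apply_small_scale_mask_correction(cov_noise, cov_mixed, cov_cosmic, f_mask):
    """Rescale mixed terms by f_mask and cosmic variance terms by f_mask**2.

    The pure-noise part is passed through unchanged (an assumption of this sketch).
    """
    cov_noise, cov_mixed, cov_cosmic = (np.asarray(c, dtype=float)
                                        for c in (cov_noise, cov_mixed, cov_cosmic))
    return cov_noise + f_mask * cov_mixed + f_mask ** 2 * cov_cosmic
```

With f_mask ≈ 0.9, the mixed terms are reduced by about 10 per cent and the cosmic variance terms by about 19 per cent.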

In Fig. 9, we use our flask measurements together with the technique of precision matrix expansion (PME; from inverse covariance = precision matrix; Friedrich & Eifler 2018) and perform a consistency check of the modelling ansatz described above by investigating the impact of masking on individual covariance terms. We find both with the PME method and with our analytic ansatz that masking effects are most impactful in the covariance terms that depend on shape noise of the weak lensing source galaxies (i.e. in what we called mixed terms in Section 3). This also agrees with the findings of Joachimi et al. (2020) and it further motivates the modelling of masking effects that we have described here. Nevertheless, we do not elevate this modelling ansatz to our fiducial covariance model because its motivation remains rather heuristic. However, we consider it a realistic estimate for the error made by the f_sky approximation and can hence use it to estimate the impact of that approximation on parameter constraints. In Fig. 2, we have already shown that this impact is below the |$1{{\ \rm per\ cent}}$| level; i.e. we underestimate the scatter of maximum posterior parameters by less than |$1{{\ \rm per\ cent}}$| when making the f_sky approximation in our fiducial covariance model.

Figure 9.

The method of PME (Friedrich & Eifler 2018) allows us to estimate the impact of individual covariance terms on χ² even when only few simulated measurements are available. The orange squares show the average χ² between our flask measurements and their mean [re-scaled by a factor of N_flask/(N_flask − 1) to account for the correlation of individual measurements and mean] when using either no PME at all or when using PME estimates from shape-noise free sims or from the full sims. The blue dots show the corresponding χ² values when using the heuristically motivated analytical treatment of masking and survey geometry presented in the main text. The grey dashed line represents the number of data points and should be the average χ² if we had a perfect covariance model (note that for this comparison we have not performed any parameter fitting).

Note that Kilbinger & Schneider (2004), Sato et al. (2011), Shirasaki et al. (2019), and Philcox & Eisenstein (2019) have devised and promoted an alternative method to correct for masking, which amounts to direct Monte Carlo integration of expressions like equation (69). Given the large area of DES-Y3 and its numerous combinations of redshift bins, we did not find this to be feasible.

6.7 Non-Poissonian shot noise

In the Poissonian limit and in a complete region of the sky, the power spectrum of shot noise is scale independent and given by

$$\begin{eqnarray*} N^{\rm {complete}}_{\ell } = \frac{1}{\bar{n}}, \end{eqnarray*}$$

(72)

where |$\bar{n}$| is the galaxy density per steradian. As noted in the previous subsection, in galaxy surveys not every region of the sky is fully accessible; i.e. the presence of bright stars, satellite trails, etc. leads to artificial changes in the measured galaxy density. These density changes can potentially modify the observed galaxy power spectrum and bias any cosmological analyses derived from them, and thus, they are avoided by removing certain regions of the sky where artefacts may be found. These regions are usually smaller than the resolution of the (pixelated) survey mask used to determine whether a region of the sky is within the footprint or not, since it is computationally expensive to increase the resolution. This can be described through a fractional mask W_i = 1/f_i, where i is a given pixel of the mask and f_i is the fractional area of the pixel unaffected by the presence of artefacts. If we compute the galaxy overdensity as |$\delta _{g, i} = N_{i}/(\bar{N} W_{i})-1$|, with |$\bar{N} = \sum _{i} N_{i}/ \sum _{i} W_{i}$|, the mean number of sources per pixel, we can estimate the Poissonian noise power spectrum as (Nicola et al. 2020)

$$\begin{eqnarray*} N_{\ell } = \Omega _{\mathrm{ pix}}\frac{\bar{W}}{\bar{N}}, \end{eqnarray*}$$

(73)

where |$\bar{W}$| is the mean of the weights W_i across the footprint, and Ω_pix is the area of the pixels from the mask in steradians. In the case where all the pixels in the footprint are fully complete, we recover equation (72) since |$\bar{n} = \bar{N}/\Omega _{\mathrm{ pix}}$|, and |$\bar{N}=\sum _{i}N_{i}/\sum _{i}1$|. However, in the case that any of the pixels of the mask are not fully complete we obtain an increased shot-noise contribution by a factor |$\bar{W} \ge 1$| (since 0 ≤ f_i ≤ 1).
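Equation (73) translates directly into code. The sketch below assumes per-pixel arrays of galaxy counts and coverage fractions restricted to pixels inside the footprint (with f_i > 0); the function name and array layout are illustrative, not those of any DES code.

```python
import numpy as np


def fractional_mask_shot_noise(counts, coverage, pixel_area):
    """Shot-noise power N_ell = Omega_pix * W_bar / N_bar of equation (73).

    counts     : galaxy counts N_i in the footprint pixels
    coverage   : unmasked area fraction f_i of each pixel (0 < f_i <= 1)
    pixel_area : pixel area Omega_pix in steradians
    """
    counts = np.asarray(counts, dtype=float)
    weights = 1.0 / np.asarray(coverage, dtype=float)  # W_i = 1 / f_i
    n_bar = counts.sum() / weights.sum()               # N_bar = sum N_i / sum W_i
    w_bar = weights.mean()                             # W_bar
    return pixel_area * w_bar / n_bar
```

For fully complete pixels (all f_i = 1) this reduces to 1/n̄ as in equation (72); any incomplete pixel raises the returned value.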

In previous studies, the DES-Y1 lens galaxies were shown to prefer a super-Poissonian variance (Friedrich et al. 2018; Gruen et al. 2018) that might be a consequence of their complex selection criteria or due to the nature of their formation and evolution (see e.g. Baldauf et al. 2013; Dvornik et al. 2018). This super-Poissonian variance leads to an enhancement in shot noise. In order to test for this effect, we proceeded to estimate the angular power spectrum, C_ℓ ≈ C_ℓ,galaxies + N_ℓ + δC_ℓ, of DES-Y1 redmagic galaxies selected in Elvin-Poole et al. (2018) using NaMaster (Alonso et al. 2019), where N_ℓ is the shot-noise contribution from equation (73) and δC_ℓ is the excess power that can be due to a number of factors (variations in completeness not captured by the mask, super-Poissonian shot noise, observational systematics, etc.). We also compute the power spectrum, C_ℓ,rnd, of a random field with the same number of objects as the galaxy sample considered, where the probability of populating a pixel i is proportional to its weight, W_i. We find that C_ℓ,rnd is statistically consistent with N_ℓ. We then compute the ratio as follows:

$$\begin{eqnarray*} r_{\ell } = \frac{C_{\ell }-C_{\ell ,rnd}}{N_{\ell }} \approx \frac{C_{\ell , \mathrm{ galaxies}}}{N_{\ell }} + \frac{\delta C_{\ell }}{N_{\ell }}. \end{eqnarray*}$$

(74)

In Fig. 10, we show r_ℓ compared to the theoretical expectation for C_ℓ,galaxies/N_ℓ = C_ℓ,th/N_ℓ, where C_ℓ,th is the theoretical power spectrum computed using the best-fitting parameters found in Elvin-Poole et al. (2018). We also allow for a 20 per cent variation in the linear galaxy bias, which is much larger than the uncertainty found in Elvin-Poole et al. (2018). We find that there is an excess power at ℓ ≥ 3000 that cannot be explained by an excess galaxy clustering (i.e. a larger than measured linear bias or a non-linear bias component). We identify this excess (between |$2{{\ \rm per\ cent}}$| and |$6{{\ \rm per\ cent}}$|⁠) with |$\frac{\delta C_{\ell }}{N_{\ell }}$| in equation (74).

Figure 10.

Measured ratio |$r_{\ell } = \frac{C_{\ell }-C_{\ell ,rnd}}{N_{\ell }}$| (crosses) compared to predicted contribution of the galaxy power spectra over the shot noise (solid line) for the fiducial parameters at Elvin-Poole et al. (2018) allowing for a 20 per cent uncertainty in the galaxy bias (shaded regions) for two redshift bins (bin 4 in blue and bin 5 in orange). Horizontal dashed lines are just to guide the eye. If the shot noise were to be completely Poissonian, the measured and predicted ratios would agree; however, we find an excess between |$2{{\ \rm per\ cent}}$| and |$6{{\ \rm per\ cent}}$|⁠.

This excess will translate into an extra shot-noise-like contribution to the covariance matrix (Philcox et al. 2020). The way we include this is by fitting a correction to the number density α_n such that the excess power is compatible with zero. In order to do so, we minimize the following χ²:

$$\begin{eqnarray*} \chi ^{2}(\alpha _{n}) = \sum _{\ell }\left(\frac{C_{\ell }-C_{\ell , th}}{\alpha _{n}N_{\ell }}-\frac{C_{\ell ,rnd}}{N_{\ell }}\right)^{2}\left(\frac{\Delta C_{\ell , th}}{\alpha _{n} N_{\ell }}\right)^{-2} , \end{eqnarray*}$$

(75)

where ΔC_ℓ,th is varied in the range of 1.2²C_ℓ,th–0.8²C_ℓ,th (so we are allowing for 20 per cent uncertainty in the bias for the fit). The resulting values for α_n can be found in Table 4. In Figs 1 and 2 and Table 1, one can see that depleting the lens galaxy densities in our fiducial covariance model by these factors has a negligible effect on both maximum posterior χ² and parameter constraints.
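Multiplying out the two factors in equation (75) shows that each term equals [(C_ℓ − C_ℓ,th) − α_n C_ℓ,rnd]²/ΔC_ℓ,th², so for a fixed ΔC_ℓ,th the minimization is a linear least-squares problem with a closed-form solution. The sketch below illustrates this under the simplifying assumption that ΔC_ℓ,th is a fixed per-multipole array (the text instead varies it over a range); all names are hypothetical and the inputs are expected to be NumPy arrays of equal length.

```python
import numpy as np


def chi2_alpha(alpha, c_ell, c_th, c_rnd, n_ell, delta_c_th):
    """Equation (75) for a trial value of alpha_n."""
    ratio = (c_ell - c_th) / (alpha * n_ell) - c_rnd / n_ell
    return np.sum(ratio ** 2 * (delta_c_th / (alpha * n_ell)) ** -2)


def best_fit_alpha(c_ell, c_th, c_rnd, delta_c_th):
    """Closed-form minimum of equation (75) when delta_c_th is held fixed."""
    inv_var = 1.0 / delta_c_th ** 2
    return np.sum((c_ell - c_th) * c_rnd * inv_var) / np.sum(c_rnd ** 2 * inv_var)
```

Note that N_ℓ cancels in the product of the two factors, which is why it does not appear in the closed-form estimate.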

Table 4.

Best-fitting values of α_n to correct for the excess shot noise with the DES-Y1 redmagic galaxies.

Bin number	α_n
1	1.042 ± 0.002
2	1.069 ± 0.003
3	1.072 ± 0.003
4	1.057 ± 0.003
5	1.021 ± 0.001

6.8 Cosmology dependence of the covariance model

In order to evaluate our covariance model, we choose a particular set of cosmological parameters. We do not vary these parameters when sampling our parameter posterior and this may impact the width of our parameter constraints (Hamimeche & Lewis 2008; Eifler et al. 2009; White & Padmanabhan 2015; Kalus, Percival & Samushia 2016). Our main reason for not sampling the covariance model along with the data model is that computing a covariance matrix is computationally too costly for this to be feasible. Recently, Carron (2013) has also indicated that it may indeed be incorrect to vary the covariance cosmology when running MCMC chains.

It is only after running the MCMC chains that we can recompute the covariance at our best-fitting parameters and re-derive our parameter constraints – repeating this process until our constraints have converged (cf. Abbott et al. 2018, for the application of this procedure in the DES Y1 data). Therefore, the cosmology at which we compute our covariance is expected to be off from the best-fitting cosmology. In this subsection, we investigate how χ², as well as cosmological parameter constraints, shifts when computing the covariance at cosmologies that are randomly drawn from the DES-Y3-like posterior.

We test the robustness of our constraints against the choice of cosmological parameters at which we evaluate the covariance model by taking a set of 100 different cosmologies drawn randomly from the simulated DES-Y3 3×2pt posterior and generating 100 lognormal covariance matrices. Using each of these covariances, we estimate posteriors for a given realization of simulated DES-Y3 3×2pt data with noise drawn from a fiducial lognormal covariance.

Since it is prohibitively expensive to perform simulated analyses running MCMC chains for each covariance matrix, we use the technique of importance sampling. That allows us to quickly evaluate how these different likelihood modelling choices impact the derived parameter constraints without repeatedly running expensive sampling algorithms. In our importance sampling pipeline, we take a fiducial analysis as a proposal distribution, re-evaluate the likelihoods using the alternative covariance matrix, and compute importance weights as:

$$\begin{eqnarray*} w_i = \frac{\mathcal {L}(\pi _i | \hat{\xi }, {\bf C}_\text{alt})}{\mathcal {L}(\pi _i | \hat{\xi }, {\bf C}_\text{fid})}, \end{eqnarray*}$$

(76)

where C_fid is the fiducial covariance in the analysis and C_alt is the alternative one. If the changes induced by the new covariance matrix in the posterior are not too large, the re-weighted samples represent the target distribution (i.e. the posterior for the alternative covariance matrix). So we have

$$\begin{eqnarray*} E_{\bf p}[f(X_i)] = \sum _i p_i f(X_i) = \sum _i q_i \frac{p_i}{q_i} f(X_i) = E_{\bf q}[w_i f(X_i)], \end{eqnarray*}$$

(77)

for a function f(X_i) of the posterior samples. Here, p_i is the probability of X_i under the target distribution p and q_i is the probability of X_i under the proposal distribution q (see e.g. MacKay 2002; Owen 2013).

To diagnose the performance of our importance sampling estimates, we use the effective sample size (ESS):

$$\begin{eqnarray*} \text{ESS} = \frac{\left \langle w_i\right\rangle ^2}{\left\langle w_i^2\right\rangle } N_\text{samples} , \end{eqnarray*}$$

(78)

where N_samples is the total number of posterior samples used in the estimation. The ESS as defined above quantifies the statistical power of the sample set after the re-weighting process (assuming uncorrelated samples). It is equal to the original sample size re-scaled by the ratio of the variances under each of the distributions (Martino, Elvira & Louzada 2017), such that the error of the mean of a quantity x with standard deviation σ_x under the target distribution can be estimated as |$\sigma _x / \sqrt{\text{ESS}}$|⁠. Additionally, since our proposal distribution is itself a weighted sample set, we incorporate both the original and the importance weights in our ESS estimate.
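Equations (76) and (78) can be sketched as follows for a Gaussian likelihood. The sketch assumes that the residuals (data vector minus model prediction) have been stored for every posterior sample, and it combines pre-existing sample weights with the importance weights by simple multiplication, which is one plausible reading of the last sentence above rather than a description of the DES-Y3 pipeline. The determinant factor of the Gaussian likelihood is the same for all samples and drops out once the weights are normalized.

```python
import numpy as np


def log_gaussian_like(residuals, cov):
    """-chi^2 / 2 for each row of `residuals`, up to a sample-independent constant."""
    precision = np.linalg.inv(cov)
    return -0.5 * np.einsum("ni,ij,nj->n", residuals, precision, residuals)


def importance_weights(residuals, cov_fid, cov_alt):
    """Importance weights of equation (76), up to an overall normalization."""
    log_w = log_gaussian_like(residuals, cov_alt) - log_gaussian_like(residuals, cov_fid)
    return np.exp(log_w - log_w.max())  # subtract the maximum for numerical stability


def effective_sample_size(weights):
    """Equation (78): <w>^2 / <w^2> * N = (sum w)^2 / sum w^2."""
    weights = np.asarray(weights, dtype=float)
    return weights.sum() ** 2 / np.sum(weights ** 2)
```

If the proposal samples carry their own weights `w_orig`, the combined estimate in this sketch would be `effective_sample_size(w_orig * importance_weights(...))`.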

Using the fiducial lognormal covariance matrix, we run the nested sampling algorithm MultiNest (Feroz & Hobson 2008; Feroz, Hobson & Bridges 2009; Feroz et al. 2019), and perform the importance sampling procedure to estimate parameters using each of the 100 covariance matrices randomly sampled in parameter space. The (S₈, Ω_m) contours can be seen in Fig. 11. The ESSs for the importance sampled estimates range from 16 446 to 18 329 (implying a standard error of the mean within |$0.78{{\ \rm per\ cent}}$| of the standard deviation for all cases), and the contours show good statistics. As the impact of covariance cosmology is barely noticeable for this range of tested parameters, we repeat the analysis for a few more extreme (and unlikely) cosmologies in Appendix F.
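As a quick check of the quoted bound, the relative standard error of the mean, 1/√ESS, evaluated at the two ends of the ESS range is (rounded values)

$$\begin{eqnarray*} \frac{1}{\sqrt{16\,446}} \approx 0.0078\ , \qquad \frac{1}{\sqrt{18\,329}} \approx 0.0074\ , \end{eqnarray*}$$

i.e. between about 0.74 and 0.78 per cent of the standard deviation σ_x.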

Figure 11.

(S₈, Ω_m) constraints for a given noisy realization of the DES-Y3 3×2pt data vector analysed using 100 lognormal covariance matrices, each computed from a different cosmology drawn from a simulated DES-Y3 3×2pt posterior. The 100 contours are superimposed in the plot, showing very small change in constraints. The points indicate the cosmologies at which the covariances were evaluated.

These results all confirm that we can safely neglect the impact of the choice of covariance cosmology in the DES-Y3 3×2pt analysis. One caveat of this conclusion is that we have only varied cosmological parameters (including galaxy bias parameters) but not nuisance parameters (multiplicative shear bias, photometric redshift uncertainties) or parameters that describe intrinsic alignment. However, the DES-Y3 shear and photo-z calibration yield tight Gaussian priors on the corresponding nuisance parameters. Also, intrinsic alignment is relevant only on small angular scales where the covariance matrix is dominated by sampling noise contributions. Hence, we do not expect the results of this section to change significantly had all parameters been varied.

6.9 Random point shot noise

We also consider the effect of additional shot noise in the measurements of galaxy clustering resulting from the use of finite numbers of random points. The Landy–Szalay estimator (Landy & Szalay 1993) estimates the galaxy clustering correlation function inside an angular bin [θ₁, θ₂] as

$$\begin{eqnarray*} \hat{w}[\theta _{1}, \theta _{2}] = \frac{DD[\theta _{1}, \theta _{2}] - 2DR[\theta _{1}, \theta _{2}] + RR[\theta _{1}, \theta _{2}]}{RR[\theta _{1}, \theta _{2}]}\ , \end{eqnarray*}$$

(79)

where DD[θ₁, θ₂] is the number of galaxy pairs found within the angular bin, RR[θ₁, θ₂] is the (normalized) number of pairs of random points that uniformly sample the survey footprint, and DR[θ₁, θ₂] is the (normalized) number of galaxy-random-point pairs within the angular bin. If the number density of random points n_r is much larger than the number density of the galaxies n_g (as is recommended to reduce sampling noise), then RR and DR must be re-scaled by factors of (n_g/n_r)² and (n_g/n_r), respectively.
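As a minimal sketch of equation (79), the function below evaluates the estimator for one angular bin from raw pair counts. It assumes one common normalization convention, in which each raw count is divided by the total number of possible pairs of that type; for large catalogues this plays the role of the density-ratio re-scaling described above. The function and argument names are illustrative only.

```python
def landy_szalay(dd, dr, rr, n_gal, n_rand):
    """Landy-Szalay estimate of w in one angular bin (equation 79).

    dd, rr        : raw counts of unique galaxy-galaxy / random-random pairs
    dr            : raw count of galaxy-random cross pairs
    n_gal, n_rand : total numbers of galaxies and of random points
    """
    # Normalise each raw count by the total number of possible pairs.
    dd_n = dd / (n_gal * (n_gal - 1) / 2.0)
    dr_n = dr / (n_gal * n_rand)
    rr_n = rr / (n_rand * (n_rand - 1) / 2.0)
    return (dd_n - 2.0 * dr_n + rr_n) / rr_n
```

For an unclustered sample all three normalized counts agree and the estimate vanishes; doubling the galaxy pair count at fixed DR and RR gives an estimate of 1.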

We stress that the Landy–Szalay estimator was devised at a time of very limited computational resources, when it was prohibitively costly to measure pair counts with a large number of random points. Hence, it was vital to minimize random point shot noise. Nowadays, footprint geometries of photometric surveys are typically characterized by high-resolution healpix maps. The most straightforward way to calculate the galaxy clustering correlation function is to simply assign a value of galaxy density contrast to each of these pixels and then measure the scalar autocorrelation function of the unmasked pixels. This way, one avoids random point shot noise completely.

Nevertheless, it is still very common to measure w(θ) by means of equation (79). So we also tested what impact a finite number of random points would have on our analysis. To do so, we extended expressions of Cabré & Gaztañaga (2009; see their appendix A) to the case where the same random points are used to estimate w(θ) in each of our redshift bins and also to subtract shear around random points from our galaxy–galaxy lensing correlation functions. Note that this causes a noise contribution to the two-point function measurements that is correlated among different redshift bins. We assumed a random point density of 1.36 arcmin⁻², which is more than 20 times larger than the density of our most dense lens galaxy sample. From Table 1, it can be seen that not accounting for the random point shot noise in the covariance leads to an increase in average χ² of |${\lesssim} 1{{\ \rm per\ cent}}$| and to an underestimation of parameter uncertainties by |${\approx} 0.5{{\ \rm per\ cent}}$|⁠. Hence, this effect can be ignored for our analysis.

6.10 Effective densities and effective shape noise

We close this section by spelling out an aspect of covariance modelling that may seem straightforward but which has repeatedly come up in covariance discussions.

If the tracer galaxies used to estimate two-point correlation functions are weighted according to some weighting scheme, then this may change the effective number densities and the effective shape noise that should be used when evaluating the covariance expressions in Section 4. In the following, we will derive how this can be done for each of the two-point functions in the DESY3 3×2pt data vector.

6.10.1 Galaxy clustering

We start with the galaxy clustering correlation function w(θ). We assume a weighting scheme that is aimed at correcting for non-cosmological density fluctuations resulting from spatially varying observing conditions (as e.g. in Elvin-Poole et al. 2018). This means that the weights assigned to each galaxy in fact sample a weight map that spans the entire footprint.

Instead of measuring w(θ) from the weighted galaxies by means of, say, the Landy–Szalay estimator (Landy & Szalay 1993), it will be more convenient to think of the galaxy density contrast as a pixelized field on the sky. Furthermore, we will assume that the weight map has been normalized such that 〈w〉 = 1 (which can always be done without changing the outcome of the weighted measurement). Consider pixel i with galaxy count N_g,i and weight w_i. If n_g is the average galaxy density of the unweighted sample, then by taking expectation values with respect to many Poissonian shot-noise realizations (and hence ignoring fluctuations of the underlying matter density field) we get

$$\begin{eqnarray*} \langle N_{\mathrm{ g},i} \rangle = \frac{A_{\mathrm{pix}} n_\mathrm{ g}}{w_i} \end{eqnarray*}$$

(80)

$$\begin{eqnarray*} \mathrm{Var}(N_{\mathrm{ g},i}) = \frac{A_{\mathrm{pix}} n_\mathrm{ g}}{w_i} \end{eqnarray*}$$

(81)

$$\begin{eqnarray*} \mathrm{Var}(w_i N_{\mathrm{ g},i}) = w_i A_{\mathrm{pix}} n_\mathrm{ g} \end{eqnarray*}$$

(82)

$$\begin{eqnarray*} \mathrm{Var}\left(\frac{w_i N_{\mathrm{ g},i}}{A_{\mathrm{pix}} n_\mathrm{ g}} - 1\right) &\equiv &\ \mathrm{Var}(\delta _{\mathrm{ g},i})\nonumber \\ &=&\ \frac{w_i}{A_{\mathrm{pix}} n_\mathrm{ g}}\ , \end{eqnarray*}$$

(83)

where A_pix is the area of each pixel and the second to last line serves as definition of δ_g,i and needs the fact that we demanded 〈w〉 = 1. Note that these equations are only valid for an ensemble of observations that shares the same weight maps and differs only in their shot-noise realizations.
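Equations (80)–(83) can be checked numerically with a small Monte Carlo experiment. The sketch below draws Poisson counts for a fixed toy weight map and compares the measured variance of the weighted density contrast with w_i/(A_pix n_g). The map size, the number of realizations, the mean count per pixel, and the weight distribution are arbitrary toy choices, not DES-Y3 values.

```python
import numpy as np

rng = np.random.default_rng(42)

n_pix, n_real = 200, 5000     # pixels and shot-noise realisations (toy values)
mean_count = 50.0             # A_pix * n_g: expected galaxies per pixel for w = 1

# Toy weight map, normalised such that <w> = 1.
w = rng.uniform(0.8, 1.2, size=n_pix)
w /= w.mean()

# Observed counts are depleted by 1/w_i (equation 80); Poisson noise only,
# i.e. fluctuations of the underlying density field are ignored.
counts = rng.poisson(mean_count / w, size=(n_real, n_pix))

# Weighted density contrast as defined in equation (83).
delta = w * counts / mean_count - 1.0

var_measured = delta.var(axis=0)    # variance over realisations, per pixel
var_predicted = w / mean_count      # right-hand side of equation (83)
ratio = var_measured / var_predicted
```

Averaged over pixels, `ratio` should scatter closely around 1, and the mean of `delta` should be consistent with zero.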

From the set of all pixels, we can now estimate w(θ) within a finite angular bin [θ₁, θ₂] as

$$\begin{eqnarray*} \hat{w}[\theta _1, \theta _2] = \frac{\sum _{\mathrm{pxls}\ i\gt j} \Delta _{[\theta _1, \theta _2]}^{ij}\ \delta _{g,i}\ \delta _{g,j}}{\sum _{\mathrm{pxls}\ i\gt j} \Delta _{[\theta _1, \theta _2]}^{ij}}\ , \end{eqnarray*}$$

(84)

where the symbol |$\Delta _{[\theta _1, \theta _2]}^{ij}$| in the double sum over all pixels is 1 when the distance of the pixel pair i, j is within [θ₁, θ₂] and 0 otherwise. Note that we assume an enumeration of the pixels and that we demand i > j in the sum in order to not count any pair of pixels twice.

If shot noise is the only source of noise, then it is straightforward to calculate the variance of this measurement as

$$\begin{eqnarray*} \mathrm{Var}(\hat{w}[\theta _1, \theta _2]) &=& \frac{\sum _{\mathrm{pxls}\ i\gt j} \Delta _{[\theta _1, \theta _2]}^{ij}\ \left\langle \delta _{\mathrm{ g},i}^2\right\rangle \ \left\langle \delta _{\mathrm{ g},j}^2\right\rangle }{\left[\sum _{\mathrm{pxls}\ i\gt j} \Delta _{[\theta _1, \theta _2]}^{ij}\right]^2}\nonumber \\ &=& \frac{1}{(A_{\mathrm{pix}}n_\mathrm{ g})^2}\frac{\sum _{\mathrm{pxls}\ i\gt j} \Delta _{[\theta _1, \theta _2]}^{ij}\ w_i\ w_j}{\left[\sum _{\mathrm{pxls}\ i\gt j} \Delta _{[\theta _1, \theta _2]}^{ij}\right]^2}\nonumber \\ &=& \frac{1}{N_{\mathrm{pair},\mathrm{ g}}[\theta _1, \theta _2]}\frac{\sum _{\mathrm{pxls}\ i\gt j} \Delta _{[\theta _1, \theta _2]}^{ij}\ w_i\ w_j}{\sum _{\mathrm{pxls}\ i\gt j} \Delta _{[\theta _1, \theta _2]}^{ij}}\ . \end{eqnarray*}$$

(85)

In the last line, N_pair,g[θ₁, θ₂] is the number of unweighted galaxy pairs within the angular bin [θ₁, θ₂]. Note that in the presence of clustering, this should be calculated from a set of random points instead of from the actual galaxy catalogue.

The first factor on the right-hand side of equation (85) is what the shot-noise variance of |$\hat{w}$| should be in the absence of a weighting scheme. The second factor is a two-point function of the weight map itself. If the weight map has a white-noise power spectrum, then this factor will be close to 1 in any angular bin that does not include angular distances of 0. This means that at large enough scales the last line of equation (85) looks like the covariance for plain Poissonian shot noise without any notion of an effective number density. This may be surprising, but it stems from the fact that the weighting scheme we assumed does not simply multiply the galaxy density contrast field. Instead, it reverses an already existing depletion of galaxy density from non-cosmological density fluctuations.

In conclusion, the effective number density that should be used to compute the covariance of |$\hat{w}[\theta _1, \theta _2]$| is

$$\begin{eqnarray*} n_{\mathrm{ g},\mathrm{eff}}[\theta _1, \theta _2] = n_\mathrm{ g} \sqrt{\frac{\sum _{\mathrm{pxls}\ i\gt j} \Delta _{[\theta _1, \theta _2]}^{ij}}{\sum _{\mathrm{pxls}\ i\gt j} \Delta _{[\theta _1, \theta _2]}^{ij}\ w_i\ w_j}} \ . \end{eqnarray*}$$

(86)
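Equation (86) can be evaluated directly from a weight map by brute-force pair counting. The following sketch does so on a small flat-sky patch, which is an assumption made purely for brevity; a realistic implementation would use angular separations on the sphere and a tree-based pair counter. The function name and interface are hypothetical.

```python
import numpy as np

def effective_density(n_g, positions, weights, theta_min, theta_max):
    """Effective number density of equation (86) for one angular bin.

    n_g       : mean density of the unweighted galaxy sample
    positions : array (n_pix, 2) of flat-sky pixel centres
    weights   : array (n_pix,) sampling the weight map
    """
    w = np.asarray(weights, float)
    w = w / w.mean()                          # enforce <w> = 1
    pos = np.asarray(positions, float)
    # Pairwise separations between pixel centres (O(n_pix^2) memory).
    sep = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    i, j = np.triu_indices(len(w), k=1)       # count every pixel pair once
    in_bin = (sep[i, j] >= theta_min) & (sep[i, j] < theta_max)
    n_pairs = in_bin.sum()                    # sum over Delta^{ij}
    ww = np.sum(w[i][in_bin] * w[j][in_bin])  # sum over Delta^{ij} w_i w_j
    return n_g * np.sqrt(n_pairs / ww)
```

For a uniform weight map the function returns n_g, and for a white-noise weight map it stays close to n_g in bins that exclude zero separation, in line with the discussion of equation (85).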

6.10.2 Galaxy–galaxy lensing

We move on to consider the galaxy–galaxy lensing correlation function γ_t[θ₁, θ₂]. We assume that the lens galaxy sample comes with weights derived from a weight map w^l as in the previous subsection while each source galaxy j has a weight |$w_j^s$| that does not come from an entire weight map but is instead the result of the individual quality of shape measurement for this galaxy. A measurement of γ_t can be constructed as

$$\begin{eqnarray*} \hat{\gamma }_t[\theta _1, \theta _2] = \frac{\sum _{\mathrm{pxl}\ i,\ \mathrm{source}\ j} \Delta _{[\theta _1, \theta _2]}^{ij}\ \delta _{l,i}\ \epsilon _{t,j\rightarrow i}\ w_j^s}{\sum _{\mathrm{pxl}\ i,\ \mathrm{source}\ j} \Delta _{[\theta _1, \theta _2]}^{ij}\ w_j^s}\ . \end{eqnarray*}$$

(87)

Here, δ_l,i is the galaxy density contrast of the lenses defined in analogy to the previous subsection, ϵ_{t, j → i} is the tangential component of the shear of source j with respect to lens pixel i, and |$w_j^s$| is the weight of source galaxy j. Note that due to our definition of the lens galaxy density contrast this estimator already includes subtraction of shear around random points.

If shot noise and shape noise are the only sources of noise, then it can be readily shown that

$$\begin{eqnarray*} \mathrm{Var}(\hat{\gamma }_t[\theta _1, \theta _2])&=& \frac{\sum _{\mathrm{pxl}\ i,\ \mathrm{source}\ j} \Delta _{[\theta _1, \theta _2]}^{ij}\ \left\langle \delta _{l,i}^2\right\rangle \ \left\langle \left(\epsilon _{t,j\rightarrow i}\ w_j^s\right)^2\right\rangle }{\left[\sum _{\mathrm{pxl}\ i,\ \mathrm{source}\ j} \Delta _{[\theta _1, \theta _2]}^{ij}\ w_j^s\right]^2}\nonumber \\ &\approx& \frac{1}{N_{\mathrm{pair},ls}[\theta _1, \theta _2]}\frac{\sum _{\mathrm{pxl}\ i,\ \mathrm{source}\ j} \Delta _{[\theta _1, \theta _2]}^{ij}\ \left\langle \left(\epsilon _{t,j\rightarrow i}\ w_j^s\right)^2\right\rangle }{\sum _{\mathrm{pxl}\ i,\ \mathrm{sources}\ j} \Delta _{[\theta _1, \theta _2]}^{ij}\ w_j^s}\nonumber \\ &\approx& \frac{1}{2 N_{\mathrm{pair},ls}[\theta _1, \theta _2]}\frac{\sum _{\mathrm{source}\ j} \left\langle |\boldsymbol{\epsilon }_j|^2\ \left(w_j^s\right)^2\right\rangle }{N_s} \,\,\left(\mathrm{only\ with}\ \left\langle w_j^s \right\rangle = 1 = \left\langle w_j^l \right\rangle \ !\right). \end{eqnarray*}$$

(88)

Here, N_pair,ls[θ₁, θ₂] is the number of unweighted lens–source pairs in the angular bin [θ₁, θ₂], N_s is the total number of source galaxies and |$\boldsymbol{\epsilon }_j = \epsilon _{1,j} + i \epsilon _{2,j}$| is the complex intrinsic ellipticity of source galaxy j.

Note that the final expression in equation (88) explicitly allows for the possibility that the source weights |$w_j^s$| are correlated with the intrinsic ellipticities |$\boldsymbol{\epsilon }_j$| of the source galaxies. One can interpret equation (88) as

$$\begin{eqnarray*} \mathrm{Var}(\hat{\gamma }_t[\theta _1, \theta _2]) = \frac{\sigma _{\epsilon ,\mathrm{eff}}^2}{N_{\mathrm{pair},ls}[\theta _1, \theta _2]} \end{eqnarray*}$$

(89)

with the effective dispersion of intrinsic ellipticity per shear component given by

$$\begin{eqnarray*} \sigma _{\epsilon ,\mathrm{eff}}^2 = \frac{1}{2} \frac{\sum _{\mathrm{source}\ j} |\boldsymbol{\epsilon }_j|^2\ (w_j^s)^2}{N_s}\ . \end{eqnarray*}$$

(90)

One subtlety here is that the above derivation requires |$\langle w_j^s \rangle = 1$|. The above expressions must be modified when this is not the case or when taking into account responses R_j of a shape catalogue generated with metacalibration (Sheldon & Huff 2017). We detail what to do in the latter case in Appendix G.
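A minimal sketch of equation (90) is given below. It re-scales the weights to unit mean before evaluating the sum, as the derivation requires, and it assumes that the catalogue ellipticities are dominated by intrinsic shapes so that they can stand in for ϵ_j. Metacalibration responses are not included, and the function name is illustrative.

```python
import numpy as np

def sigma_eps_eff_sq(e1, e2, weights):
    """Effective per-component shape-noise variance of equation (90).

    e1, e2  : arrays of the two ellipticity components per source galaxy
    weights : array of source weights w_j^s (any overall normalisation)
    """
    w = np.asarray(weights, float)
    w = w / w.mean()                   # the derivation requires <w^s> = 1
    e_abs_sq = np.asarray(e1, float)**2 + np.asarray(e2, float)**2
    # Averaging over the catalogue keeps any weight-ellipticity correlation.
    return 0.5 * np.mean(e_abs_sq * w**2)
```

Because the product |ϵ_j|² (w_j^s)² is averaged galaxy by galaxy, a correlation between weights and ellipticities is retained rather than factorized away.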

6.10.3 Cosmic shear

For cosmic shear, we follow Schneider et al. (2002) and construct a measurement of ξ₊ from a set of sources as

$$\begin{eqnarray*} \hat{\xi }_+[\theta _1, \theta _2] = \frac{\sum _{i\gt j} \Delta _{[\theta _1, \theta _2]}^{ij}\ w_i w_j\ (\epsilon _{1,i}\epsilon _{1,j}+\epsilon _{2,i}\epsilon _{2,j})}{\sum _{i\gt j} \Delta _{[\theta _1, \theta _2]}^{ij}\ w_i w_j}\ . \end{eqnarray*}$$

(91)

If shape noise is the only source of noise and if the intrinsic ellipticities of galaxies are not correlated with their weights, then the variance of |$\hat{\xi }_+$| is given by

$$\begin{eqnarray*} \mathrm{Var}(\hat{\xi }_+[\theta _1, \theta _2])&=& \frac{\sum _{i\gt j} \Delta _{[\theta _1, \theta _2]}^{ij}\ \left(\left\langle \epsilon _{1,i}^2 w_i^2\right\rangle \left\langle \epsilon _{1,j}^2 w_j^2\right\rangle +\left\langle \epsilon _{2,i}^2 w_i^2\right\rangle \left\langle \epsilon _{2,j}^2 w_j^2\right\rangle \right)}{\left[\sum _{i\gt j} \Delta _{[\theta _1, \theta _2]}^{ij}\ w_i w_j\right]^2}\nonumber \\ &\approx& \frac{2 \sigma _{\epsilon ,\mathrm{eff}}^4}{N_{\mathrm{pair}}[\theta _1, \theta _2]}\ . \end{eqnarray*}$$

(92)

Here, N_pair[θ₁, θ₂] is the number of source galaxy pairs in the bin [θ₁, θ₂] and we have replaced each expectation value |$\langle \epsilon _{1/2,i}^2 w_i^2\rangle$| by |$\sigma _{\epsilon ,\mathrm{eff}}^2$| from equation (90). Note that we again assumed 〈w_j〉 = 1 and that this may require re-scaling of both weights and σ_ϵ when using shape measurements from metacalibration.

6.10.4 Testing validity of effective shape noise

To test the validity of our expression for effective shape noise in equation (90), we run a sub-sample covariance estimator on our data (see e.g. Friedrich et al. 2016). In particular, we divide all of our source and lens galaxy samples into 200 randomly chosen sub-samples and measure the galaxy–galaxy lensing correlation function of each source–lens bin combination. As a result, we obtain 200 measurements of |$\hat{\gamma }_t$| in each source–lens bin combination. Since we employ completely random sub-sampling, i.e. without any regard for e.g. a division of our footprint into sub-regions, the sample covariance of these 200 measurements will almost exclusively be dominated by shape noise and shot noise. This is even more so, because the lens and source densities of the sub-samples are very low.

In Fig. 12, we show the ratio of the variances of the 200 galaxy–galaxy lensing measurements |$\hat{\gamma }_t$| in the different lens–source bin combinations to equation (89). Assuming that the sub-sample covariances follow a Wishart distribution, we find that these ratios are perfectly consistent with 1. This indicates that equation (90) indeed yields an accurate effective shape-noise dispersion, and that one should indeed use the plain density of lens galaxies (as opposed to any notion of effective density) when evaluating covariance expressions.
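To give a feeling for how tightly such variance ratios should scatter around 1, the toy experiment below draws 200 pure shape-noise measurements per bin with the variance of equation (89) and compares their sample variance with the model. It is a simplified stand-in for the actual test: the bins are taken to be independent and Gaussian, and the shape-noise level, the pair count, and the number of bins are hypothetical values. For Gaussian data, the relative scatter of a sample variance from N measurements is √(2/(N−1)), i.e. about 10 per cent for N = 200.

```python
import numpy as np

rng = np.random.default_rng(1)

n_sub, n_bins = 200, 500              # 200 sub-samples as in the text; n_bins is a toy value
sigma_eff_sq, n_pair = 0.09, 1.0e4    # hypothetical shape noise and pair count per bin
var_model = sigma_eff_sq / n_pair     # equation (89)

# Pure shape-noise measurements of gamma_t, independent between sub-samples.
gamma_t = rng.normal(0.0, np.sqrt(var_model), size=(n_sub, n_bins))

# Ratio of the sub-sample variance to the model variance, one value per bin.
ratio = gamma_t.var(axis=0, ddof=1) / var_model

# Expected relative scatter of a sample variance from n_sub Gaussian draws.
expected_scatter = np.sqrt(2.0 / (n_sub - 1))
```

In this idealized setting, `ratio` averages to 1 and its bin-to-bin scatter matches `expected_scatter`; the real measurement additionally has to account for correlations between bins.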

Figure 12.

Ratio between the sample variance of |$\hat{\gamma }_t$| measured in 200 randomly selected sub-samples of the DESY3 lens and sources catalogues and equation (89) for the shape-noise contribution to the covariance (again using equation 90 to calculate the effective shape-noise dispersion σ_ϵ,eff). Each row displays the variances measured for a different source redshift bin and vertical dashed lines separate points belonging to different lens redshift bins (1–5 from left to right). Assuming that the covariance estimates have a Wishart distribution, we calculate the covariance matrix of these ratios (cf. Taylor et al. 2013) and find that they are consistent with 1 (both for the cosmic shear and galaxy–galaxy lensing variances).

7 A SIMPLE χ² TEST

In this short section, we present a simple χ² test that does not rely on the linearized framework. However, it has the disadvantage of not addressing the impact on the estimation of parameters. Here, we generate a large number of ‘contaminated’ data vectors (we use 1000) by Gaussian sampling from a given covariance matrix that includes different effects and then compute a χ² distribution from these data vectors using a fiducial covariance matrix. The resulting shifts in the mean value of χ² and their standard deviations give another benchmark for the importance of the different effects considered here. We show the results of this test in Fig. 13. Note that the relative increases in χ² follow closely what we obtained within the linearized likelihood framework in Fig. 1. This indicates that the dominant way in which covariance errors cause χ² offsets is not through the altered scatter of maximum posterior parameter locations but simply through using an erroneous inverse covariance when computing χ². That also justifies our usage of the linearized likelihood framework since any impact of non-linear parameter dependences on parameter fitting can be expected to be even less relevant than linear fitting in the first place.
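The test described above can be sketched in a few lines. The example assumes that no parameters are fitted, so that each data vector is a pure noise realization around the true model; under that assumption the expected mean χ² is tr(C_fid⁻¹ C_true), which equals the number of data points when both covariances agree. The function name, its arguments, and the default seed are illustrative choices.

```python
import numpy as np

def chi2_shift_test(cov_true, cov_fid, n_draws=1000, seed=0):
    """Mean and standard deviation of chi^2 for 'contaminated' data vectors.

    cov_true : covariance including the effect under study
    cov_fid  : fiducial covariance used to evaluate chi^2
    """
    rng = np.random.default_rng(seed)
    n_data = cov_true.shape[0]
    # Noise-only data vectors (model already subtracted) drawn from cov_true.
    draws = rng.multivariate_normal(np.zeros(n_data), cov_true, size=n_draws)
    # chi^2 = d^T C_fid^{-1} d for every draw, without forming the inverse.
    chi2 = np.einsum("ni,ni->n", draws, np.linalg.solve(cov_fid, draws.T).T)
    return chi2.mean(), chi2.std(ddof=1)
```

Comparing the returned mean with the number of data points gives the relative χ² shift caused by the effect that distinguishes `cov_true` from `cov_fid`.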

Figure 13.

χ² tests taking into account different effects. Colours follow the scheme of Fig. 1.

8 DISCUSSIONS AND CONCLUSIONS

In this paper, we have presented the fiducial covariance model of the DES-Y3 joint analysis of cosmic shear, galaxy–galaxy lensing, and galaxy clustering correlation functions (the 3×2pt analysis). We then investigated how the assumptions and approximations of that model (including the assumption of Gaussian statistical uncertainties) impact the distribution of maximum posterior χ² and maximum posterior estimates of cosmological parameters.

The fiducial covariance matrix of the DES-Y3 3×2pt analysis uses the formalism of Krause & Eifler (2017) to model supersample covariance as well as the trispectrum contribution to the covariance. The model for the Gaussian covariance part (i.e. the contributions from the disconnected four-point function) correctly takes into account sky curvature and includes analytical averaging over the finite angular bins in which the two-point functions are measured. Furthermore, the galaxy clustering power spectra that enter our calculation of the Gaussian covariance part are computed using the non-Limber formalism of Fang et al. (2020b) and also include modelling of RSDs. The finite survey area of DES-Y3 is incorporated in the covariance model via the f_sky approximation (except in the pure shape-noise and shot-noise terms where we follow Troxel et al. 2018b).

In order to perform our validation tests for the DES-Y3 covariance matrix, we developed a plethora of new modelling ansatzes and testing strategies that are applicable in general. These new techniques are as follows:

We have motivated and devised a way of drawing realizations of the 3×2pt data vector from a non-Gaussian distribution in order to test the accuracy of our Gaussian likelihood assumption.
We have derived analytical expressions for angular bin averaging of all four types of two-point correlation functions included in the 3×2pt vector (ξ₊, ξ₋, γ_t, and w). These expressions correctly account for sky curvature. To the best of our knowledge, an analytical treatment of bin averaging for cosmic shear two-point function has not been presented before (though we have shared our results with Fang et al. 2020a, who have used them for the fiducial covariance computations).
We have extended the lognormal analytical model for the covariance of cosmic shear two-point function of Hilbert et al. (2011) to the other two-point functions present in the 3×2pt data vector.
Within a linearized likelihood formalism, we have analytically derived how covariance model inaccuracies influence the distribution of maximum-posterior χ² and of maximum-posterior parameter estimates. The results we presented also allow for the possibility of including the Gaussian priors on certain model parameters and can be used to analytically estimate the impact of covariance errors on cosmological likelihood analyses.
By fitting an effective number density to the high-ℓ plateau of galaxy clustering C_ℓ measurements, we have estimated how much the assumption of Poissonian shot noise influences our likelihood analysis. This is similar in spirit to the RASCALC technique presented by Philcox et al. (2020), and we agree with those authors that non-Poissonian shot noise can be viewed as an effective description of how short-scale non-linearities in galaxy clustering influence the covariance.
We calculated covariance matrices for 100 different sets of cosmological and nuisance parameters randomly drawn from a simulated likelihood chain. This allowed us to investigate whether calculating our covariance model at a reasonable, but wrong point in parameter space significantly impacts our analysis. This was done both within our linearized likelihood framework and by using importance sampling to quickly evaluate the 100 non-linear likelihoods.
We have derived how the two-point correlation function of weight maps influences the covariance of galaxy clustering two-point function measurements. In that context, we have also found that traditional ways of deriving an effective number density for a given set of galaxy weights are erroneous when those weights are aimed at undoing a suspected depletion of galaxy density (e.g. due to variations in observing conditions).
We have derived an expression for the effective dispersion of intrinsic source shapes for the situation when source galaxy weights are correlated with galaxy ellipticity. We have also shown how metacalibration responses (Sheldon & Huff 2017) enter that expression for effective shape noise.
We have described a clean sub-sample covariance estimation scheme that directly measures the sampling noise contributions to the covariance from a given data set. We then used the resulting covariance estimates to test the validity of our assumed effective shape noise values.
We have employed the hybrid covariance estimation technique PME (Friedrich & Eifler 2018) to efficiently evaluate the importance of individual contributions to the covariance from only a limited set of simulated data (in our case: 200 realizations of the 3×2pt data vector including shape noise and 100 realizations without shape noise).
We have devised a treatment of survey geometry in covariance modelling that improves upon existing approximations (of e.g. Efstathiou 2004) and we have demonstrated how to carry those approximations from harmonic space to real space.

Using these results, we perform several tests for the fiducial DES-Y3 3×2pt covariance matrix and likelihood model, with the following conclusions:

The assumption of Gaussian statistical uncertainties is sufficiently accurate (cf. Section 6.1). Hence, knowledge of the covariance of the 3×2pt data vector is sufficient to model our statistical uncertainties. The main assumption made to arrive at this conclusion is that non-Gaussian error bars are primarily a large-scale problem and that at small scales the number of modes present within the DES-Y3 survey volume converges to a Gaussian distribution via the central limit theorem.
The non-Gaussian part of the covariance has a negligible impact on both maximum posterior χ² and parameter constraints (cf. Section 6.2). This statement is not a general one but only holds for the specific DES-Y3 3×2pt analysis set-up. The main assumption made to arrive at that conclusion is that the CosmoLike model (Krause, Eifler & Blazek 2016) or the lognormal model (Hilbert et al. 2011) for the non-Gaussian covariance does not vastly underestimate the true covariance. Given the results of Sato et al. (2009) and Hilbert et al. (2011), we find that this is a safe assumption.
Of all covariance modelling assumptions investigated in this paper, the f_sky approximation (made in the mixed term and cosmic variance term of our covariance model) has the largest effect on maximum posterior χ². On average, it increases χ² between measurement and maximum posterior model by about |$3.7{{\ \rm per\ cent}}$| (Δχ² ≈ 18.9) for the 3×2pt data vectors and by about |$5.7{{\ \rm per\ cent}}$| (Δχ² ≈ 16.0) for the 2×2pt data vector (cf. Table 1).
However, neither the f_sky approximation nor any other covariance modelling detail tested in this paper (cf. Table 1; with the exception of finite bin width; see the next point) has any significant impact on the location and width of constraints on the parameters Ω_m, σ₈, and w.
The only exception to this statement is finite angular bin width that – if not taken into account in the mixed term and cosmic variance term of the covariance model – significantly increases the scatter of maximum posterior parameters (without increasing the inferred constraints accordingly). However, finite bin width has been taken into account in the past in an approximate manner – see e.g. Krause et al. (2017).
The fact that we do not know the true cosmological parameters of the Universe forces us to evaluate our covariance model at a wrong set of parameters. Even when iteratively adjusting those parameters to the maximum posterior parameters of the analysis, the parameters of the covariance model will scatter around the ‘true’ cosmological/nuisance parameters. We consider this an irreducible covariance error and find that it increases the maximum posterior scatter of Ω_m and σ₈ by about |$3{{\ \rm per\ cent}}$| and that of the dark energy equation-of-state parameter w by about |$5{{\ \rm per\ cent}}$| (cf. Section 6.8 and Table 1). At the same time, we find this effect to have a negligible impact on maximum posterior χ².

In summary, we have shown that our fiducial covariance and likelihood model underestimates the scatter of maximum posterior parameters by about 3–|$5{{\ \rm per\ cent}}$|⁠, which is mostly caused by uncertainty in the set of cosmological and nuisance parameters at which we evaluate that model. On average, the χ² between maximum posterior model and measurement of the 3×2pt data vector will be |${\sim} 4{{\ \rm per\ cent}}$| higher than expected with perfect knowledge of the covariance matrix. This is mainly caused by our use of the f_sky approximation. We have devised an improved treatment of the full survey geometry (cf. Section 6.6) but for the reason mentioned above this was only used to test the impact of the f_sky approximation on parameter constraints.

Given the small impact that we estimated from the unaccounted effects in the covariance modelling, we conclude that the fiducial covariance model is adequate to be used in the 3×2pt DES-Y3 analysis. While our validation of this covariance model has been carried out with a preliminary set of scale cuts and redshift distributions, we do not expect qualitative changes for the final DES-Y3 analysis set-up.

While the DES-Y3-specific outcomes of our study cannot straightforwardly be transferred to other surveys and analyses, our methodological innovations will be useful tools in the covariance and likelihood validation of future experiments.

ACKNOWLEDGEMENTS

OF gratefully acknowledges support by the Kavli Foundation and the International Newton Trust through a Newton-Kavli-Junior Fellowship and by Churchill College Cambridge through a postdoctoral By-Fellowship. The authors thank Henrique Xavier for very helpful discussions about mock simulations and the flask code. This research was partially supported by the Laboratório Interinstitucional de e-Astronomia (LIneA), the Brazilian Funding agency CNPq, the INCT of the e-Universe, and the Sao Paulo State Research Agency (FAPESP). The authors acknowledge the use of computational resources from LIneA, the Center for Scientific Computing (NCC/GridUNESP) of the Sao Paulo State University (UNESP), and from the National Laboratory for Scientific Computing (LNCC/MCTI, Brazil), where the SDumont supercomputer (sdumont.lncc.br) was used.

This paper has gone through internal review by the DES collaboration. Funding for the DES Projects has been provided by the U.S. Department of Energy, the U.S. National Science Foundation, the Ministry of Science and Education of Spain, the Science and Technology Facilities Council of the United Kingdom, the Higher Education Funding Council for England, the National Center for Supercomputing Applications at the University of Illinois at Urbana–Champaign, the Kavli Institute of Cosmological Physics at the University of Chicago, the Center for Cosmology and Astro-Particle Physics at the Ohio State University, the Mitchell Institute for Fundamental Physics and Astronomy at Texas A&M University, Financiadora de Estudos e Projetos, Fundação Carlos Chagas Filho de Amparo à Pesquisa do Estado do Rio de Janeiro, Conselho Nacional de Desenvolvimento Científico e Tecnológico and the Ministério da Ciência, Tecnologia e Inovação, the Deutsche Forschungsgemeinschaft, and the Collaborating Institutions in the DES.

The Collaborating Institutions are Argonne National Laboratory, the University of California at Santa Cruz, the University of Cambridge, Centro de Investigaciones Energéticas, Medioambientales y Tecnológicas-Madrid, the University of Chicago, University College London, the DES-Brazil Consortium, the University of Edinburgh, the Eidgenössische Technische Hochschule (ETH) Zürich, Fermi National Accelerator Laboratory, the University of Illinois at Urbana–Champaign, the Institut de Ciències de l’Espai (IEEC/CSIC), the Institut de Física d’Altes Energies, Lawrence Berkeley National Laboratory, the Ludwig-Maximilians Universität München and the associated Excellence Cluster Universe, the University of Michigan, the National Optical Astronomy Observatory, the University of Nottingham, The Ohio State University, the University of Pennsylvania, the University of Portsmouth, SLAC National Accelerator Laboratory, Stanford University, the University of Sussex, Texas A&M University, and the OzDES Membership Consortium.

Based in part on observations at Cerro Tololo Inter-American Observatory at NSF’s NOIRLab (NOIRLab Prop. ID 2012B-0001; PI: J. Frieman), which is managed by the Association of Universities for Research in Astronomy (AURA) under a cooperative agreement with the National Science Foundation.

The DES data management system is supported by the National Science Foundation under grant numbers AST-1138766 and AST-1536171. The DES participants from Spanish institutions are partially supported by MINECO under grants AYA2015-71825, ESP2015-66861, FPA2015-68048, SEV-2016-0588, SEV-2016-0597, and MDM-2015-0509, some of which include ERDF funds from the European Union. IFAE is partially funded by the CERCA program of the Generalitat de Catalunya. Research leading to these results has received funding from the European Research Council under the European Union’s Seventh Framework Program (FP7/2007-2013) including ERC grant agreements 240672, 291329, and 306478. We acknowledge support from the Brazilian Instituto Nacional de Ciência e Tecnologia (INCT) e-Universe (CNPq grant 465376/2014-2).

This manuscript has been authored by Fermi Research Alliance, LLC under Contract No. DE-AC02-07CH11359 with the U.S. Department of Energy, Office of Science, Office of High Energy Physics.

This work made use of the software packages getdist (Lewis 2019), chainconsumer (Hinton 2016), matplotlib (Hunter 2007), and numpy (Harris et al. 2020).

We would like to thank the anonymous journal referee for their helpful comments.

DATA AVAILABILITY

The DES-Y3 3×2pt covariance matrix and likelihoods will be made public upon publication of our final data analysis. c++ and python tools to configure flask as described in Section 4.3 are available at https://github.com/OliverFHD/CosMomentum. Tools to compute Gaussian and halomodel covariance are available at https://github.com/CosmoLike/CosmoCov.

Footnotes

www.sdss.org/surveys/eboss

hsc.mtk.nao.ac.jp/ssp

kids.strw.leidenuniv.nl

www.darkenergysurvey.org

www.desi.lbl.gov

www.lsst.org

www.euclid-ec.org

nasa.gov/content/goddard/nancy-grace-roman-space-telescope

In order to deal with the prior volume effect, Joachimi et al. (2020) proposed to report parameter constraints through what they call projected joint highest posterior density. This topic will be addressed in a separate DES paper (Raveri et al., in preparation).

Alternatively, one could investigate the distribution of p-values (or probability to exceed; cf. Hall & Taylor 2019) as opposed to the distribution of χ².

www.class-code.net

camb.info

We have shared our results with Fang et al. (2020a), who have used them for their covariance calculations.

http://www.astro.iag.usp.br/flask/

https://github.com/rmjarvis/TreeCorr

from |$\smash{\sum _\ell \frac{2\ell + 1}{2} P_\ell (x) P_\ell (y) = \delta _D(x -y)}$|; see N. Bronstein & A. Semendjajew (1979) for this and other properties of Legendre polynomials.

Note that a factor of 1/isin (θ) is missing in the second line of this equation.

A map of the mask will come with a finite resolution, in which case the fractional values 0 < W < 1 of the mask describe the completeness of observations within each resolution element of the map.

REFERENCES

Abbott T. M. C. et al., 2018, Phys. Rev. D, 043526

