Optimizing marked power spectra for cosmology

ABSTRACT

Marked power spectra provide a computationally efficient way to extract non-Gaussian information from the matter density field using the usual analysis tools developed for the power spectrum without the need for explicit calculation of higher-order correlators. In this work, we explore the optimal form of the mark function used for re-weighting the density field, to maximally constrain cosmology. We show that adding to the mark function or multiplying it by a constant leads to no additional information gain, which significantly reduces our search space for optimal marks. We quantify the information gain of this optimal function and compare it against mark functions previously proposed in the literature. We find that we can gain around |$\sim 2$| times smaller errors in |$\sigma _8$| and |$\sim 4$| times smaller errors in |$\Omega _\mathrm{m}$| compared to using the traditional power spectrum alone, an improvement of |$\sim 60~{{\ \rm per\ cent}}$| compared to other proposed marks when applied to the same data set.

cosmological parameters, large-scale structure of Universe

1 INTRODUCTION

Much of the progress made so far in cosmological large-scale structure studies has been using tools that are only optimal for the study of Gaussian random fields, such as correlation functions and power spectra. There are good reasons for these: CMB observations have confirmed that the primordial density fluctuations were very close to Gaussian (Komatsu et al. 2003; Planck Collaboration VII 2020). However, over time, non-linear gravitational clustering leads to highly non-Gaussian (NG) features in the small-scale matter distribution. This is the so-called ‘cosmic web’, including voids, sheets, filaments, and virialised haloes, and its rich NG structure may contain invaluable information to constrain different theories of the evolution of the Universe (Bernardeau et al. 2002; Libeskind et al. 2018).

This NG information will not be fully captured by 2-point statistics alone (Scoccimarro, Zaldarriaga & Hui 1999; Bernardeau et al. 2002); however, there are multiple ways to attempt to recover the extra information, all falling under the umbrella term of ‘higher-order statistics’. One method is to explicitly calculate the higher-order n-point correlations. Most commonly, the next-order correlator to the power spectrum, the bispectrum (3-point correlator), has been investigated and used in measurements of the 3D galaxy distribution (Gil-Marín et al. 2017; Philcox et al. 2022a). The trispectrum (i.e. 4th-order correlator) has also been exploited (Hu 2001; Gualdi & Verde 2022; Philcox 2022). The combination of these next-order statistics has been shown to have the power to break degeneracies in cosmological and astrophysical parameters (Bernardeau, van Waerbeke & Mellier 1997; Pires, Leonard & Starck 2012; Hahn & Villaescusa-Navarro 2021). The downside to directly calculating the bispectrum or trispectrum is that it can be computationally expensive and can lead to a large increase in the size of the resulting data vector (with the associated complexity of estimating its covariance matrix and theoretical predictions). However, significant progress has been made at accelerating these calculaions (Takahashi et al. 2020; Philcox et al. 2022b; Choustikov, Vlah & Challinor 2023; Philcox & Flöss 2024). There is also no guarantee that this next-order approach leads to an optimal extraction of NG information.

Another approach is to compute summary statistics that capture NG information. Compared to n-point correlators, they are often easier to compute and potentially more efficient at capturing NG information. An incomplete list includes density split statistics (Gruen et al. 2016; Friedrich 2018; Friedrich et al. 2018), the integrated cosmic shear 3-point function (Halder et al. 2021, 2023; Gong et al. 2023), counts in cells of the PDF (Uhlemann et al. 2020), minimum spanning trees (Naidoo et al. 2020; Naidoo, Massara & Lahav 2022), |$k$|-nearest neighbours (Banerjee & Abel 2021), and wavelet scattering transforms (Cheng et al. 2020, 2024; Valogiannis & Dvorkin 2022; Eickenberg et al. 2022). There is also significant interest in the advantage of looking at voids as a probe of cosmology (Pisani et al. 2019; Davies et al. 2020), including the number density of voids (Woodfinden et al. 2022) and the void size function (Ronconi & Marulli 2017; Bayer et al. 2021; Kreisch et al. 2021; Thiele et al. 2023).

An alternative method is to still use 2-point statistics but transform the fields themselves such that higher-order information is brought into the 2-point correlators. Examples of this include Gaussianized fields (Weinberg 1992; Neyrinck 2011; Neyrinck, Szapudi & Szalay 2011), log-normal transformed fields (Neyrinck, Szapudi & Szalay 2009; Wang et al. 2011), and reconstructed density fields (Eisenstein et al. 2007). In this work, we explore a particular form of these transforms, called the ‘marked’ field, and its associated ‘marked correlation functions’. Marked correlation functions were first formulated in Stoyan (1984) in the form of marked point processes, where each source was weighted by specific source properties. These were used in measurements of luminosity- and morphology-dependent clustering of galaxies (Beisbart & Kerscher 2000; Sureshkumar et al. 2021) and its interpretation within the halo model (Gottlöber et al. 2002; Sheth 2005) and galaxy formation models (Sheth, Connolly & Skibba 2005; Skibba et al. 2006). In the context of cosmological structure formation, marked correlation functions involve locally weighting the original density field by a mark function that depends on the smoothed local density field at the same point. The correlations of this marked field with itself and with the unmarked field then contain information beyond the two-point function of the original field (see e.g. Philcox, Massara & Spergel 2020). This procedure has several advantages: computing the marked field is inexpensive, and measuring the marked power spectra containing additional NG information can be done using the vast existing infrastructure for the estimation of two-point functions, their covariances, and, potentially, their theoretical expectations.

This formalism was more recently popularized by White (2016, W16 hereafter) as a potentially powerful test for modified gravity (MG) theories, and versions of this mark have been used in the literature. The W16 function has been used to test modified gravity theories (Armijo et al. 2018; Satpathy et al. 2019; Armijo et al. 2024), and the effect on MG models in perturbation theory (Aviles et al. 2020). It has also been compared with other transforms such as the clipping and logarithmic transform (Valogiannis & Bean 2018), as well as the ‘log mark’ and ‘Gaussian mark’ (Hernández-Aguayo, Baugh & Li 2018). Massara et al. (2021) demonstrated the power of marked statistics on late-time cosmological parameters using simulations, and galaxy field data (Massara et al. 2023), finding that they led to a significant improvement in errors on cosmological parameters. Perturbative expressions of the W16 mark within a cosmological context have been derived in Philcox et al. (2020); Philcox, Aviles & Massara (2021).

Our aim in this work is to study the possibility of finding an optimal mark function, beyond that allowed by specific functional forms. Specifically, we define ‘optimality’ as minimizing the final uncertainty on two cosmological parameters, |$\Omega _\mathrm{m}$| and |$\sigma _8$|⁠, where |$\Omega _\mathrm{m}$| is the current non-relativistic energy density fraction, and |$\sigma _8$| is the standard deviation of the linear density field on spheres with a radius of |$8\, h^{-1}{\rm Mpc}$|⁠. Similar studies have been recently carried out in the context finding the optimal mark functions within the context of modified gravity (Kärcher, Bel & de la Torre 2024), where they explore various types of mark functions, including environmental weightings, but this paper looks at the density weighted mark in a cosmological context.

This paper is organized as follows: in Section 2, we describe some of the theories of marked correlation functions and the methodology we use to search for optimal non-parametric mark functions. We present our results in Section 3, first exploring the potential for optimality of the W16 function and then finding optimal marks under different conditions, studying their universality. Finally, we summarize our main findings and conclude in Section 4.

2 THEORY AND METHODS

2.1 Marked fields and power spectra

Consider a measurement of an overdensity field |$\delta ({\bf x})$| which, for simplicity, we will assume to be the matter overdensity. Let |$\delta _R({\bf x})$| be the same field smoothed on a comoving scale |$R$|⁠. The specific choice of smoothing kernel is arbitrary, and here we will use a Gaussian kernel, with |$R$| as its standard deviation. Finally, let |$M(\delta _R)$| be the mark function, which we will consider to be a smooth, infinitely differentiable function defined over the range of values that |$\delta _R$| can take. With these ingredients, the marked overdensity field|$\Delta ({\bf x})$| is defined to be

$$\begin{eqnarray} \Delta ({\bf x})\equiv M(\delta _R({\bf x}))\, [1+\delta ({\bf x})]-\langle M(\delta _R)\, [1+\delta ]\rangle , \end{eqnarray}$$

(1)

where |$\langle \cdots \rangle$| denotes a spatial average over the observed region.

The reasons for considering marked fields are manifold. First, the mark function allows us to up-weight or down-weight different regions based on their local environment (e.g. up-weighting underdensities, where small-scale inhomogeneities may be closer to the linear regime). Moreover, through this simple operation, the marked field becomes a non-linear and mildly non-local function of the overdensity. Thus, the low-order correlations of this field (with itself and with |$\delta$|⁠) will automatically contain contributions from higher-order correlations (Philcox et al. 2020). Crucially, we can thus apply well-established techniques used to estimate two-point functions and their uncertainties from real data to extract NG information.

Once |$\delta ({\bf x})$| and |$\Delta ({\bf x})$| have been constructed, we can estimate their three 2-point correlators in Fourier space (i.e. power spectra): |$P_{\delta \delta }(k)$|⁠, |$P_{\delta \Delta }(k)$|⁠, |$P_{\Delta \Delta }(k)$|⁠, where

$$\begin{eqnarray} \langle a({\bf k})\, b^{*}({\bf k}^{\prime })\rangle \equiv (2\pi)^3\, \delta ^D({\bf k}-{\bf k}^{\prime })P_{ab}(k), \end{eqnarray}$$

(2)

and we use the following convention for Fourier transforms:

$$\begin{eqnarray} a({\bf k})\equiv \int \, d^3x\, e^{-i{\bf k}\cdot {\bf x}}\, a({\bf x}). \end{eqnarray}$$

(3)

The efficiency of a given choice of mark function |$M$| should be quantified in terms of its ability to tease out additional information beyond that contained in the power spectrum of the original density field |$P_{\delta \delta }(k)$|⁠. Given a data vector composed of the three power spectra |${\bf d}\equiv (P_{\delta \delta }(k),P_{\delta \Delta }(k),P_{\Delta \Delta }(k))$|⁠, measured at a set of |$k$| values, and assuming that the data follow a multivariate normal distribution (which is a good approximation on sufficiently small scales (e.g. Hamimeche & Lewis 2008)), we can quantify the amount of information recovered from the Fisher matrix:

$$\begin{eqnarray} \mathcal {F}_{\alpha \beta } = \partial _\alpha {\bf d}^T\, {\sf C}^{-1}\partial _\beta {\bf d}, \end{eqnarray}$$

(4)

where |$\partial _\alpha \equiv \partial /\partial \theta _\alpha$| denotes the partial derivative of the data vector with respect to the model parameter |$\theta _\alpha$|⁠, and |${\sf C}$| is the covariance matrix of |${\bf d}$|⁠.

For simplicity, in our analysis, we will only consider two cosmological parameters, the non-relativistic matter fraction |$\Omega _\mathrm{m}$|⁠, and the amplitude of linear matter fluctuations |$\sigma _8$|⁠. To quantify the information gain, we will use the figure of merit

$$\begin{eqnarray} {\rm FOM}\equiv {\rm det}(\mathcal {F}). \end{eqnarray}$$

(5)

We will consider both the joint constraints on |$\Omega _\mathrm{m}$| and |$\sigma _8$|⁠, in which case |$\mathcal {F}$| is a |$2\times 2$| matrix, as well as either |$\sigma _8$| or |$\Omega _\mathrm{m}$| separately (for which |$\mathcal {F}$| is simply a number). We will also compare the FOM for the matter power spectrum |$P_{\delta \delta }(k)$| alone, and for the full set of power spectra, in order to quantify the relative information gain when including the marked density field.

2.2 Mark functions

The most common form of the mark function used in the literature was proposed by White (2016), and takes the form

$$\begin{eqnarray} M(\delta _R)=\left[1+\frac{\delta _R}{1+\delta _s}\right]^{-p}, \end{eqnarray}$$

(6)

In what follows, we will label the function above W16 and, although we will use it as a case study, our main aim is to explore more general functional forms for the mark function, quantifying their optimality.

2.2.1 Symmetries of mark functions

Before we introduce a method to define general mark functions, it is worth exploring its symmetries, in order to avoid spending time exploring functions that are, ultimately, equivalent.

Two mark functions |$M(\delta _R)$| and |$M^{\prime }(\delta _R)$| are equivalent if they are able to recover the exact same amount of information from any given overdensity map (i.e. if they produce the same Fisher information matrix). Under this definition, we can identify two obvious symmetries of mark functions (i.e. transformations connecting two equivalent functions:

Linear scaling:|$M^{\prime }=A\, M$|⁠, where |$A$| is an arbitrary non-zero constant. The effect of this transformation would be to simply increase the amplitude of the marked overdensity field (⁠|$\Delta ^{\prime }=A\, \Delta$|⁠) by a known amount, which cannot alter the information encoded in the field. Evidently this symmetry includes also reflections of the mark function across the |$x$|-axis (⁠|$M^{\prime }=-M$|⁠).
Linear offset:|$M^{\prime }=M+b$|⁠, where |$b$| is an arbitrary constant. Under this transformation, the marked field simply receives an additive contribution proportional to the original field: |$\Delta ^{\prime }=\Delta +b\delta$|⁠. Since our proposed analysis would involve both |$\Delta$| and |$\delta$|⁠, no additional information can be uncovered.

Mark functions connected via an affine transformation of the form |$M^{\prime }=A\, M+b$| are therefore equivalent. Although the arguments above should suffice, this can be easily proven formally. Under such a transformation, the data vector |${\bf d}$| and its covariance matrix |${\sf C}$| become |${\bf d}^{\prime }={\sf T}{\bf d}$| and |${\sf C}^{\prime }={\sf T}{\sf C}{\sf T}^T$|⁠,¹ where the matrix |${\sf T}$| is

$$\begin{eqnarray} {\sf T}\equiv \begin{pmatrix}1 & 0 & 0 \\ b & A & 0 \\ b^2 & 2Ab & A^2 \end{pmatrix}. \end{eqnarray}$$

(7)

The matrix |${\sf T}$| is invertible, and therefore the transformed Fisher matrix is

$$\begin{eqnarray} {\cal F}^{\prime }_{\alpha \beta }=\partial _\alpha {\bf d}^T{\sf T}^T({\sf T}^{T})^{-1}{\sf C}^{-1}{\sf T}^{-1}{\sf T}\partial _\beta {\bf d}=\partial _\alpha {\bf d}^T{\sf C}^{-1}\partial _\beta {\bf d}={\cal F}_{\alpha \beta }. \end{eqnarray}$$

(8)

Although this result may seem obvious, it has interesting consequences when trying to interpret a given mark function. Consider, for example, the mark function displayed in blue in Fig. 1. Taken at face value, this function would be interpreted as up-weighting underdensities and suppressing overdense regions. Shifting the function downwards by 1, and multiplying the result by −1, we obtain the function shown in red which, instead, favours overdensities and suppresses underdensities. Shifting this by |$1/2$| downwards, we instead obtain a function (orange) that upweights both overdensities and underdensities (the latter with a negative weight), and suppresses regions where the density is close to the mean. Although the interpretation of these three curves as mark functions is clearly different, they are equivalent in terms of the amount of information recovered.

Figure 1.

Examples of seemingly distinct (but in fact equivalent) mark functions.

Open in new tab Download slide

In order to sift through truly unique mark functions, one may therefore impose a constraint on their amplitude (e.g. by fixing their |$L^2$| norm), and by forcing them to cross zero at a specific value of |$\delta _R$| (an obvious choice would be |$\delta _R=0$|⁠, but other values may be used). We will use this to our advantage in the next Section.

2.2.2 General mark functions

In order to explore mark functions with a generic form, in search of the optimal one, we make use of Gaussian processes. A Gaussian process is a collection of multivariate normal random variables, representing the values of a function |$M(x)$| at different points along the |$x$|-axis (in our case, |$x=\delta _R$|⁠). A Gaussian process is thus fully defined by a mean function (which we take to be zero), and a kernel function parametrizing the covariance between any two points in the process. In our analysis, we use the radial basis function, also known as ‘squared exponential’ kernel. This choice is widely used in machine learning due to its infinite differentiability, which leads to the generation of smooth functions. The kernel is specified by only two parameters: an amplitude |$\sigma ^2$| and length-scale |$\ell$|⁠:

$$\begin{eqnarray} k_{\rm SE}(x,x^{\prime })=\sigma ^2\, \exp \left(-\frac{(x-x^{\prime })^2}{2\ell ^2}\right). \end{eqnarray}$$

(9)

With this choice, we thus define a given mark function |$M(\delta _R)$| in terms of its values at a set of |$N$| fixed points, or nodes, along the |$\delta _R$| axis. We label the node positions |$\boldsymbol{\delta }_R^{*}\equiv (\delta ^{*}_{R,1},\cdots ,\delta ^{*}_{R,N})$|⁠, and the corresponding function values |${\bf M}_*\equiv (M(\delta _{R,1}^{*}),\cdots ,M(\delta _{R,N}^{*}))$|⁠. The value of the function at any other |$\delta _R$| is given by the most likely Gaussian process conditioned on the known node values:

$$\begin{eqnarray} M(\delta _R)={\bf k}^T_*(\delta _R)\, {\sf K}^{-1}_*\, {\bf f}_*, \end{eqnarray}$$

(10)

where the vector |${\bf k}_*(\delta _R)$| and matrix |${\sf K}_*$| are

$$\begin{eqnarray} k_{*,i}(\delta _R)\equiv k_{\rm SE}(\delta _R,\delta ^{*}_{R,i}), K_{*,ij}\equiv k_{\rm SE}(\delta ^{*}_{R,i},\delta ^{*}_{R,j}). \end{eqnarray}$$

(11)

In principle, a general mark function can then be defined in terms of the node values |${\bf M}_*$|⁠. We can reduce the number of free parameters, while simultaneously avoiding exploring equivalent functions, taking advantage of the symmetries described in the previous section. To do so, we impose a normalization constraint on the node values, such that |$|{\bf M}_*|^2=1$|⁠. In other words, |${\bf M}_*$| is a unit vector that can be described in terms of |$N-1$| variables. We choose these variables to be the spherical coordinates parametrizing any point in the unit |$(N-1)$|-sphere. For instance, in the case |$N=4$|⁠, we would parametrize |${\bf M}_*$| in terms of three angles, |$(\psi ,\theta ,\phi)$| as

$$\begin{eqnarray} {\bf M}_*=(\sin \psi \, \sin \theta \, \sin \phi , \sin \psi \, \sin \theta \, \cos \phi , \sin \psi \, \cos \theta , \cos \psi), \\ \end{eqnarray}$$

(12)

where |$\phi \in [0,2\pi)$|⁠, and |$\lbrace \theta ,\psi \rbrace \in [0,\pi)$|⁠.

In our analysis, we will fix the kernel amplitude to |$\sigma ^2=20$|⁠, which is far larger than the typical amplitude of the mark functions given the normalization imposed above. Finally, the correlation length |$\ell$|⁠, which governs the smoothness of the resulting function, is chosen based on the inter-node spacing. This itself depends on the dynamic range of |$\delta _R$|⁠, which varies with the smoothing scale |$R$|⁠. For the three smoothing scales explored, the corresponding correlation lengths are |$\ell (R=5h^{-1}\, {\rm Mpc})=5$|⁠, |$\ell (R=10h^{-1}\, {\rm Mpc})=2$|⁠, and |$\ell (R=30h^{-1}\, {\rm Mpc})=0.75$|⁠.

2.3 Finding optimal mark functions

To quantify the performance of different mark functions, we make use of |$N$|-body simulations. The simulations used are described in Section 2.3.1, while the method used to find an optimal mark function from these simulations is outlined in Section 2.3.2. Section 2.3.3 provides further details about the calculation of power spectrum covariances, a central aspect of this analysis.

2.3.1 Simulations

We use two types of simulations in this analysis. First, we generate a suite of full |$N$|-body simulations, each with a box of size |$L_{\rm box}=700{\rm Mpc\,h^{-1}}$|⁠, with |$512^3$| dark matter particles. The simulations are run using the public Gadget-2 code (Springel, Yoshida & White 2001; Springel 2005). We run one simulation for a fiducial set of |$\Lambda$|CDM cosmological parameters

$$\begin{eqnarray} \lbrace \Omega _\mathrm{m},\sigma _8,h,n_s,\Omega _b\rbrace =\lbrace 0.3,0.8,0.7,0.96,0.05\rbrace . \end{eqnarray}$$

(13)

We then run four additional simulations, two of them varying |$\Omega _\mathrm{m}$| by |$\Delta \Omega _\mathrm{m}=\pm 0.02$|⁠, while keeping all other parameters fixed, and two of them varying |$\sigma _8$| by |$\Delta \sigma _8=0.02$|⁠. The initial conditions for these simulations were generated using the same seed, in order to partially cancel the effects of cosmic variance when computing the Fisher matrix from their power spectra (as described in Section 2.3.2). Snapshots are generated at |$z\in \lbrace 0,\, 0.1,\, 0.3,\, 0.5,\, 1,\, 2\rbrace$|⁠, although we only study the |$z=0$| snapshot here, leaving the investigation of the redshift dependence of the optimal mark to future studies.

Furthermore, we generate a suite of 100 fast simulations using the COLA algorithm (Tassev, Zaldarriaga & Eisenstein 2013). These are generated with the same box size and number of particles as the previous simulations, and assuming the fiducial cosmology above. The COLA simulations are used to quantify the impact of NG terms in the power spectrum covariance, as described in Section 2.3.3.

All power spectra are estimated from a matter overdensity field calculated using a cloud-in-cell algorithm on a grid with |$N_{\rm grid}=256$| cells per side. Power spectra are estimated at a set of wavenumbers covering the range |$k\in [0,k_{\rm Ny}]$|⁠, with |$k_{\rm Ny}=\pi N_{\rm grid}/L_{\rm box}=1.15\, h\, {\rm Mpc}^{-1}$|⁠, in intervals of width |$\Delta k=2\pi /L_{\rm box}=0.009\, h\, {\rm Mpc}^{-1}$|⁠.

2.3.2 Method overview

In order to quantify the optimality of a given mark function, defined in terms of the Gaussian process nodes (or rather, the associated hypersphere angles defined in Section 2.2.2), we do the following:

Having measured the original overdensity field |$\delta$|⁠, and its smoothed version |$\delta _R$|⁠, we evaluate the mark function on the latter, and construct the marked overdensity as defined in equation (1). This is done for all five simulations (fiducial cosmology and 1-parameter departures).
We then compute the three auto- and cross-power spectra from these fields and build the complete data vector |${\bf d}\equiv \lbrace P_{\delta \delta }(k),\, P_{\delta \Delta }(k),\, P_{\Delta \Delta }(k)\rbrace$| for each simulation.
We construct the data vector derivatives via finite differences as
$$\begin{eqnarray} \partial _\alpha {\bf d}\simeq \frac{{\bf d}(\theta _\alpha +\delta \theta _\alpha)-{\bf d}(\theta _\alpha -\delta \theta _\alpha)}{2\delta \theta _\alpha }, \end{eqnarray}$$
(14)
where |${\bf d}(\theta)$| is the vector of power spectra estimated from the simulation with parameters |$\theta$|⁠.
We construct an approximate covariance matrix, in the form of the analytical disconnected (or ‘Gaussian’) covariance, estimated from the power spectra of |$\delta$| and |$\Delta$| in the fiducial simulation. See Section 2.3.3 for a detailed discussion of the covariance matrix.
We combine the results of the last two points to compute the Fisher matrix (equation 4), and the FOM associated with this mark.

The steps above allow us to estimate the FOM for a given mark function, defined by the hypersphere angles that determined its Gaussian process nodes. To find the optimal mark function, we thus maximize this FOM as a function of the node angles. In our fiducial analysis, we use |$N=4$| nodes, and therefore we marginalize over three angle parameters. The location of the nodes in the |$\delta _R$| axis depends on the value of |$R$| used (which governs the dynamic range of |$\delta _R$|⁠). We spread the 4 nodes evenly between the minimum and maximum |$\delta _R$| values. We show the information for the three smoothing cases in Table 1, and we will study the dependence of our results on these choices in Section 3.2.3. We use the modified Powell algorithm as implemented in scipy (Virtanen et al. 2020) to find the optimal mark function. In practice, we do this by minimizing the negative logarithm of the FOM, which is a smoother function of the node parameters. In order to avoid local minima, and to study the possibility of finding distinct, but equally performing mark functions, we repeat this minimization starting from a set of 100 random points on the hypersphere and study the resulting distribution of optimal mark functions.

Table 1.

Open in new tab

Node spacing information for calculating mark functions. R corresponds to the smoothing scale, while we also show the range of |$\delta _R$| between which our 4 nodes are placed, and |$\ell$| corresponds to the length-scale of the Gaussian process.

R (⁠\|$h^{-1}$\|MPc)	\|$\delta _R(min, max)$\|	\|$\ell$\|
5	\|$-$\|0.9, 18.8	5.0
10	\|$-$\|0.8, 4.6	2.0
30	\|$-$\|0.4,0.5	0.75

Table 1.

Open in new tab

R (⁠\|$h^{-1}$\|MPc)	\|$\delta _R(min, max)$\|	\|$\ell$\|
5	\|$-$\|0.9, 18.8	5.0
10	\|$-$\|0.8, 4.6	2.0
30	\|$-$\|0.4,0.5	0.75

2.3.3 Covariance matrices

A crucial aspect of our analysis is the power spectrum covariance matrix |${\sf C}$| used to calculate the Fisher matrix and FOM. Since the covariance of |$P_{\delta \Delta }(k)$| and |$P_{\Delta \Delta }(k)$| depends on the choice of mark function, this matrix must be recalculated for every value of the node parameters explored in the minimization procedure described in the previous section. Calculating this covariance accurately from simulations at every step would be prohibitively expensive, and therefore we employ an approximate estimate covariance matrix. Specifically, we calculate the disconnected component of the covariance matrix, equivalent to the covariance matrix assuming that all fields involved (⁠|$\delta$| and |$\Delta$|⁠) are Gaussian. In this case, the covariance between the power spectra of a pair of fields |$(a,b)$| and another pair |$(c,d)$| is

$$\begin{eqnarray} {\rm Cov}(P_{ab}(k),P_{cd}(k^{\prime }))=\frac{\delta ^K_{k,k^{\prime }}}{N_k}\left[P_{ac}(k)P_{bd}(k)+P_{ad}(k)P_{bc}(k)\right], \end{eqnarray}$$

(15)

where |$\delta ^K_{kk^{\prime }}$| is the Kronecker delta, and |$N_k$| is the number of modes used to calculate the power spectra in the bin of wavenumbers labelled by |$k$|⁠. The disconnected covariance has been found to be the dominant contribution to the matter power spectrum covariance, especially on mildly non-linear scales (Mohammed, Seljak & Vlah 2016). Therefore, although not accurate in detail, it should provide a reasonable estimate of the power spectrum uncertainties when comparing the performance of different mark functions. Since the validity of this approximation is less well studied for general marked fields and, in general, to mitigate the impact of its inaccuracies on our results, we limit our analysis to scales |$k\lt 0.3h\, {\rm Mpc}^{-1}$| and will study the impact of this choice on our results.

Once the optimal mark function has been determined using this approximation, we make use of the COLA simulations to compute a better estimate of the covariance matrix,

$$\begin{eqnarray} {\rm Cov}_{ij}= {\rm Cov}^{\rm NG, COLA}_{ij} \sqrt{\frac{{\rm Cov}^{\rm G,N-body}_{ii}{\rm Cov}^{\rm G,N-body}_{jj}}{{\rm Cov}^{\rm G,COLA}_{ii}{\rm Cov}^{\rm G,COLA}_{jj}}}, \end{eqnarray}$$

(16)

where |${\rm Cov}^{\rm G}$| is the Gaussian covariance estimated from the measured power spectra (either in the COLA or true N-body sims), following equation (15), and |${\rm Cov}^{\rm NG,COLA}$| is the full covariance matrix calculated from the COLA realizations. The motivation for this equation stems from the fact that we expect the COLA realizations to lack power on small scales when compared to the true N-body simulation. Thus we scale the NG COLA covariance by the relative amplitude between the Gaussian covariance estimated from both types of simulations, along the diagonal. Once calculated, we use this covariance to recalculate the FOM associated with the optimal mark function and quantify the relative information gain compared with that predicted using the analytical disconnected covariance.

It is important to note that, for a general mark function, the fields |$\delta$| and |$\Delta$| (and therefore their power spectra) may be highly correlated. This is obvious, for example, in the case of a constant mark |$M(\delta _R)={\rm const.}$|⁠, for which the fields are completely correlated, and no new information can be gained from |$\Delta$|⁠. In these cases, the combined power spectrum covariance |${\sf C}$| may be very close to singular, and hence numerically unstable upon inversion (e.g. in the constant mark case, two-thirds of the covariance matrix eigenvalues are exactly zero, since |$P_{\Delta \Delta }\propto P_{\Delta \delta }\propto P_{\delta \delta }$|⁠). To avoid this issue, we construct a pseudo-inverse of our covariance matrix using its singular value decomposition. We calculate the eigenvalues and eigenvectors of the matrix, identify all eigenvalues lower than a fraction |$f_{\rm thr}$| of the largest eigenvalue, and construct the inverse matrix by inverting all eigenvalues except those below the threshold, which we set to zero in the pseudo-inverse. This effectively removes all information from modes of the data vector that are aligned with the affected eigenvectors. As in Park et al. (2023), we use a condition number, of |$f_{\rm thr}=10^{-7}$| in our analysis.

3 RESULTS

3.1 Exploring the W16 mark

The performance of the W16 mark function for different values of its free parameters |$(\delta _s,p)$|⁠, and for different smoothing scales was studied in Massara et al. (2021, 2023, 2024). They find the values to depend on the type of data used, N-body simulations, and whether galaxy or matter overdensity fields were used. In Massara et al. (2021), out of 125 combinations of |$R$|⁠, |$\delta _s$|⁠, and |$p$|⁠, the best parameters were found to be |$R = 10 h^{-1}$| Mpc, |$p = 2$|⁠, and |$\delta _s = 0.25$| using N-body simulations. In Massara et al. (2023) 60 parametrizations were explored, finding slightly different marks with |$R\simeq 30 h^{-1}$| Mpc, |$p=1$| and |$\delta _s$| between 0.1 and 0.5. The cases |$(R,p,\delta _s)=(30\, h^{-1}\, {\rm Mpc},0.5,1.0)$| and |$(R,p,\delta _s)=(15\, h^{-1}\, {\rm Mpc},0.5,0.1)$| were employed on real data in Massara et al. (2024) using simulation-based inference. We later compare our results to marks using these parameters.

The fast, simplified evaluation procedure described in the previous section allows us here to quantify the performance of the W16 mark function on a finer parameter grid. Specifically, for three different Gaussian smoothing scales of |$R=5,\, 10$|⁠, and |$30\, h^{-1}\, {\rm Mpc}$|², we vary |$p$| between (−5, 5), and |$\delta _s$| between (0,5), on a |$49\times 49$| grid (i.e. |$2{,}401$| evaluations for each smoothing scale). We then calculate the parameter errors from the Fisher matrix (see equation 4) |$\sigma (\theta _i)=\sqrt{{{\mathcal { F}}^{-1}}_{ii}}$|⁠, where |$\theta _i\in \lbrace \Omega _\mathrm{m},\sigma _8\rbrace$|⁠. We calculate this error in two cases: for a data vector including the three power spectra |$(P_{\delta \delta }(k),P_{\delta \Delta }(k),P_{\Delta \Delta }(k))$|⁠, and including only |$P_{\delta \delta }(k)$|⁠. Finally, we estimate the error improvement factor |$r$| as the ratio of both errors:

$$\begin{eqnarray} r(\theta)=\frac{\sigma (\theta |\delta \delta)}{\sigma (\theta |\delta \delta +\delta \Delta +\Delta \Delta)}. \end{eqnarray}$$

(17)

The results of this exercise are shown in Fig. 2, where the smoothing scale increases from top to bottom, and error improvements for |$\sigma _8$| and |$\Omega _\mathrm{m}$| are plotted on the left and right columns, respectively. We see the highest improvement for |$\Omega _\mathrm{m}$| at a smoothing radius of |$30\, h^{-1}\, {\rm Mpc}$|⁠, contrasting to |$\sigma _8$| at |$R=5\, h^{-1}\, {\rm Mpc}$|⁠. We can also observe large degeneracies in the mark function parameters for a given smoothing scale (i.e. broad regions of the |$(p,\delta _s)$| space achieving roughly he same error improvement). For example, in the case of |$\Omega _\mathrm{m}$|⁠, a broad region with |$p\lesssim -1$|⁠, largely independent of |$\delta _s$| can achieve close-to-optimal constraints for |$R=5$| and |$10\, h^{-1}\, {\rm Mpc}$| (interestingly, this trend changes sharply to positive values of |$p$| for |$R=30\, h^{-1}\, {\rm Mpc}$|⁠). In contrast, positive values of |$p$| seem to achieve close to optimal constraints on |$\sigma _8$|⁠, almost independently of |$\delta _s$|⁠. In all cases, the error improvement ratio drops sharply near |$p=0$|⁠, where the mark function is approximately constant, as expected. Massara et al. (2023) found maximum information recovery for at |$p=1$| at |$R=30\, h^{-1}\, {\rm Mpc}$|⁠, and |$p=2$| for |$R=10\, h^{-1}\, {\rm Mpc}$|⁠, using a top-hat smoothing radius. Qualitatively, we can see similarities, however, there is ultimately no clear ‘best’ mark, or clear local maxima and minima in the |$(p,\delta _s)$| plane. This motivates our goal to explore more general mark functions.

$Error improvement (see equation 17) in $\Omega _\mathrm{m}$ (left) and $\sigma _8$ (right) when altering the free parameters $\delta _s$ and $p$ of the W16 mark function in equation (6). We also calculate for three different values of the smoothing scale, $R$, of (5, 10, 30) $h^{-1}$Mpc.$

Figure 2.

Error improvement (see equation 17) in |$\Omega _\mathrm{m}$| (left) and |$\sigma _8$| (right) when altering the free parameters |$\delta _s$| and |$p$| of the W16 mark function in equation (6). We also calculate for three different values of the smoothing scale, |$R$|⁠, of (5, 10, 30) |$h^{-1}$|Mpc.

Open in new tab Download slide

We find a maximum error improvement ratio of |$\sim 1.4$| in |$\sigma _8$| and |$\sim 5$| in |$\Omega _\mathrm{m}$|⁠. Note, however, that these error improvement ratios were estimated using the Gaussian covariance approximation from equation (15), and should therefore not be taken at face value. As shown in Table 2, the effect of this approximation can be significant, with the error improvement ratios changing to 1.7 and 2.8, respectively. Nevertheless, this exercise demonstrates the difficulty in finding an optimal mark from a family of functions with limited freedom.

Table 2.

Open in new tab

Results of error improvement from the 900 optimized mark functions shown in Fig. 3. We display the error ratio, |$r = \epsilon ^\Delta _{i}/{\epsilon _{i}}$| of the mean curves displayed in Fig. 3, with the various covariance approximations described in Section 2.3.3. The last two rows show the results obtained with the W16 parameterizations for parameters |$p=1$|⁠, |$\delta _s=0.5$| (Massara et al. 2024).

Mark type	\|$R\, [h^{-1}\, {\rm Mpc}]$\|	\|${r}(\sigma _8)$\|	\|${r}(\Omega _\mathrm{m})$\|
	5	1.3	2.7
Optimized, Gaussian Cov.	10	1.4	4.1
	30	1.4	4.5
	5	2.0	4.5
Optimized, Combined Cov.	10	2.1	4.6
	30	2.0	3.5
W16, Gaussian Cov.	30	1.4	4.5
W16, Combined Cov.	30	1.7	2.8

Mark type	\|$R\, [h^{-1}\, {\rm Mpc}]$\|	\|${r}(\sigma _8)$\|	\|${r}(\Omega _\mathrm{m})$\|
	5	1.3	2.7
Optimized, Gaussian Cov.	10	1.4	4.1
	30	1.4	4.5
	5	2.0	4.5
Optimized, Combined Cov.	10	2.1	4.6
	30	2.0	3.5
W16, Gaussian Cov.	30	1.4	4.5
W16, Combined Cov.	30	1.7	2.8

Table 2.

Open in new tab

Mark type	\|$R\, [h^{-1}\, {\rm Mpc}]$\|	\|${r}(\sigma _8)$\|	\|${r}(\Omega _\mathrm{m})$\|
	5	1.3	2.7
Optimized, Gaussian Cov.	10	1.4	4.1
	30	1.4	4.5
	5	2.0	4.5
Optimized, Combined Cov.	10	2.1	4.6
	30	2.0	3.5
W16, Gaussian Cov.	30	1.4	4.5
W16, Combined Cov.	30	1.7	2.8

Mark type	\|$R\, [h^{-1}\, {\rm Mpc}]$\|	\|${r}(\sigma _8)$\|	\|${r}(\Omega _\mathrm{m})$\|
	5	1.3	2.7
Optimized, Gaussian Cov.	10	1.4	4.1
	30	1.4	4.5
	5	2.0	4.5
Optimized, Combined Cov.	10	2.1	4.6
	30	2.0	3.5
W16, Gaussian Cov.	30	1.4	4.5
W16, Combined Cov.	30	1.7	2.8

3.2 Optimizing the mark function

3.2.1 Optimal marks

Given the degeneracies displayed by the W16 mark function, it is important to ensure that, for the more complex Gaussian process marks we aim to optimize, we can find true, converged, global minima. To do so, for a given choice of smoothing scale and FOM, we run 100 optimizations starting from random points on the surface of the 4D sphere. Once the minimizers have converged, we can flag those that ended up in local minima, having achieved a significantly lower FOM than the optimum. For the three smoothing scales and FOMs explored (i.e. optimizing for |$\Omega _\mathrm{m}$| or |$\sigma _8$| separately, and for their combined FOM), this leads to a total of 900 mark functions.

The results of this study are summarized in Fig. 3. Each column corresponds to a different smoothing scale, |$R = 5,\, 10$|⁠, and |$30\, h^{-1}\, {\rm Mpc}$| from left to right. The first row shows the optimized mark function for each case, each containing the 300 mark curves found by the optimizer, 100 for each FOM. Each FOM is plotted in a different colour, with the filled region illustrating the range of different curves found by the optimizers (i.e. maximum and minimum). Remarkably, for each smoothing scale, we find that the minimizer converges on almost the same mark function for all three FOM choices. Consequently, it seems possible to find a single function that optimizes constraints on both of these parameters simultaneously. The black line shows the mean mark function, calculated from the 900 optimizers. The physical interpretation of this optimal mark is not immediately evident, especially given the additive and multiplicative symmetries discussed in Section 2.2.1, which would allow us to shift the mark functions shown in the figure up or down by an arbitrary amount. Since these symmetries allow us to fix the value of the mark function at a given |$\delta _R$|⁠, let us take |$M(\delta _R=0)=0$| as a natural choice (this ensures that, to lowest order, the marked field is at least quadratic in the matter overdensity). With this constrains in mind, consider for concreteness the optimal mark for |$R=10\, h^{-1}\, {\rm Mpc}$|⁠: since the mark function reaches a minimum close to |$\delta _R=0$|⁠, it effectively upweights both underdensities and overdensities away from the mean with the same sign.³ The other smoothing scales seem to approximately reproduce this structure (a minimum close to |$\delta _R=0$|⁠). This is a markedly different behaviour to that of the W16 function, shown as a red line in the figure for the parameters taken from Massara et al. (2021, 2024) of |$R=10, p = 2, \delta _s = 0.25$| and |$R=30, p = 1, \delta _s = 0.5$|⁠, respectively. Since the W16 function is monotonically decreasing, even if we used the shift symmetry to enforce the constraint |$M(\delta _R=0)=0$|⁠, the point |$\delta _R=0$| would simply be a zero crossing and not a minimum. With our normalization, the W16 function would thus weigh underdensities and overdensities with opposite signs.

$The columns correspond to results for different smoothing scales, corresponding to $R =$ 5, 10, 30 $h^{-1}$ Mpc, respectively. In the first row, we display the optimal mark functions $M(\delta _R)$ found for 900 optimizers, 300 for each column, as a function of the value of the smoothed overdensity. The marks are colour coded for each of three different FOM, with the filled region illustrating the range of different curves found by the optimizers (i.e. maximum and minimum). The second row shows the power spectra for the autocorrelation and cross-correlation of the density field, $\delta$, and marked density field, $\Delta$, for the mark functions shown above. The third row shows a slice of the mark function as a function of the density field of the simulation $M(\delta _R)$, highlighting the scales of features each mark upweights/downweights. The marked field $\Delta ({\bf x})$ is shown in the final row, which is simply a product of the marked field in row 3 and the fiducial field in Fig. 4.$

Figure 3.

The columns correspond to results for different smoothing scales, corresponding to |$R =$| 5, 10, 30 |$h^{-1}$| Mpc, respectively. In the first row, we display the optimal mark functions |$M(\delta _R)$| found for 900 optimizers, 300 for each column, as a function of the value of the smoothed overdensity. The marks are colour coded for each of three different FOM, with the filled region illustrating the range of different curves found by the optimizers (i.e. maximum and minimum). The second row shows the power spectra for the autocorrelation and cross-correlation of the density field, |$\delta$|⁠, and marked density field, |$\Delta$|⁠, for the mark functions shown above. The third row shows a slice of the mark function as a function of the density field of the simulation |$M(\delta _R)$|⁠, highlighting the scales of features each mark upweights/downweights. The marked field |$\Delta ({\bf x})$| is shown in the final row, which is simply a product of the marked field in row 3 and the fiducial field in Fig. 4.

Open in new tab Download slide

The second row in Fig. 3 shows the three power spectra |$\lbrace P_{\delta \delta }(k),P_{\delta \Delta }(k),P_{\Delta \Delta }(k)\rbrace$| for the optimal mark function in each case. It is interesting to note that while, with our choice of sign for the mark function, all three power spectra are positive for |$R=5$| and |$10\, h^{-1}\, {\rm Mpc}$|⁠, the cross-correlation |$P_{\delta \Delta }(k)$| changes sign as a function of scale for |$R=30\, h^{-1}\, {\rm Mpc}$|⁠. This is not necessarily surprising, since the morphology of the regions over which the smoothed overdensity field takes different values (and hence the structures that get positively or negatively weighted) is a strong function of the smoothing scale. It is worth noting that, although the power spectra used for this analysis were measured from a single N-body simulation, they are numerically stable, as are the finite-difference partial derivatives used to compute the Fisher matrix. This is greatly aided by the fact that the N-body simulations were run with different cosmologies all using initial conditions generated from the same seed. Therefore we expect the qualitative results found here to be robust against numerical instabilities in the method used to estimate the Fisher matrix and associated parameter uncertainties.

This can be seen in the third and fourth rows of the same figure, which show the mark function evaluated on the smoothed overdensity field, |$M(\delta _R({\bf x}))$|⁠, and the marked field |$\Delta ({\bf x})$|⁠, defined in equation (1), in a 2D slice of the N-body simulation used here. The original overdensity field is shown in Fig. 4. The marked fields display, in all cases, a significantly different morphology, with some structures in the original field highlighted and others suppressed. The marked fields, in which extreme environments (e.g. large overdensities or voids) can undergo significant enhancements, can therefore display very significant NG fluctuations, potentially stronger than those of the original unsmoothed overdensity. This may play an important role in the covariance matrix of the resulting power spectra, as we will see.

$A slice of the simulation for the original, not marked, overdensity of the fiducial cosmology. Values are capped at $2 \sigma$ values away from the mean value for clarity, where $\sigma$ is the standard deviation of the field.$

Figure 4.

A slice of the simulation for the original, not marked, overdensity of the fiducial cosmology. Values are capped at |$2 \sigma$| values away from the mean value for clarity, where |$\sigma$| is the standard deviation of the field.

Open in new tab Download slide

The error improvement factors resulting from these optimal marks, calculated as described in Section 3.1, are shown in Table 2. The table displays results for two different choices of the covariance matrix: the Gaussian approximation of equation (15) (first three rows for each smoothing scale), used to obtain the optimal mark function, and the combined covariance of equation (16), calculated from the COLA simulations and accounting for NG contributions (rows 4–6). Rows 7 and 8 show the improvement factors found for both covariances using the W16 function with close-to-optimal parameters |$p=1$|⁠, |$\delta _s=0.5$| (Massara et al. 2024). We see that the choice of covariance has a relevant impact on the final achieved constraints, with |$r(\sigma _8)$| changing by up to 50 per cent, and |$r(\Omega _\mathrm{m})$| by up to 60 per cent. The largest improvement factor is achieved for the |$10\, h^{-1}\, {\rm Mpc}$| smoothing scale, which achieves an error reduction factor of 2.1 and 4.6 for |$\sigma _8$| and |$\Omega _\mathrm{m}$|⁠, respectively (compared to 1.4 and 4.1 for the Gaussian covariance). We find that this optimal mark function outperforms the W16 function, by |$\sim 20~{{\ \rm per\ cent}}$| for |$\sigma _8$|⁠, and by |$\sim 60~{{\ \rm per\ cent}}$| for |$\Omega _\mathrm{m}$|⁠. Fixing the smoothing scale to the same one used for the W16 function (⁠|$30\, h^{-1}\, {\rm Mpc}$|⁠), these drop to |$\sim 20~{{\ \rm per\ cent}}$| and |$\sim 25~{{\ \rm per\ cent}}$|⁠, respectively. The results found thus far therefore confirm that it is possible to find more general marks that outperform simple parametrizations such as W16, and that the resulting mark is relatively universal, in that it can optimize the constraints on more than one parameter.

3.2.2 Covariance matrices

Given the non-negligible impact precision with which the covariance matrix is estimated on the final constraints, it is worth exploring the potential origin of the observed differences in more detail. These are displayed in Fig. 5. The three rows show results for the |$R=5,\, 10,$| and |$30\, h^{-1}\, {\rm Mpc}$| smoothing scales. In each row, the first and second plots show the diagonals of the covariance matrix for the |$P_{\delta \Delta }(k)$| and |$P_{\Delta \Delta }(k)$| power spectra, respectively. Results are shown for the Gaussian covariances estimated from the true N-body simulations and from COLA, for the covariance estimated directly from the COLA realizations, and for the combination used in our analysis (see equation 16). The third column then shows the correlation matrix associated with the COLA covariance, with the blocks corresponding to different pairings of the measured power spectra |$\lbrace P_{\delta \delta },P_{\delta \Delta },P_{\Delta ,\Delta }\rbrace$| clearly visible and marked.

$Diagonals of the various covariance matrices for the three mark functions for smoothing scales R = 5 $h^{-1}$ Mpc, 10 $h^{-1}$ Mpc, and 30 $h^{-1}$ Mpc from top to bottom, respectively. We plot the NG covariance calculated from the 100 COLA simulations, compared to the Gaussian approximation from equation (15) calculated for the full N-body simulations, and our combined covariance estimate from equation (16) In the 3rd column, we show the corresponding correlation matrices.$

Figure 5.

Diagonals of the various covariance matrices for the three mark functions for smoothing scales R = 5 |$h^{-1}$| Mpc, 10 |$h^{-1}$| Mpc, and 30 |$h^{-1}$| Mpc from top to bottom, respectively. We plot the NG covariance calculated from the 100 COLA simulations, compared to the Gaussian approximation from equation (15) calculated for the full N-body simulations, and our combined covariance estimate from equation (16) In the 3rd column, we show the corresponding correlation matrices.

Open in new tab Download slide

First of all, comparing the Gaussian errors in the first two columns, we can confirm our expectation that the COLA realizations recover a reduced clustering amplitude on small scales, which motivates the correction included in equation (16). Secondly, we can see that the full covariance estimated from COLA displays a significantly larger variance on small scales than that predicted by the Gaussian approximation, particularly for the |$R=10$| and |$30\, h^{-1}\, {\rm Mpc}$|⁠. As shown in the third column of the figure, this large NG contribution is highly correlated across different wavenumbers, dominating the off-diagonal elements of the covariance on small scales. Effectively, this seemingly large contribution is equivalent to a large increase in the statistical uncertainties of a single mode (or at most a small number of them), which explains its relatively moderate impact in the final constraints (40 per cent at most) in spite of its large contribution to the small-scale power spectrum uncertainties.

The presence of this correlated contribution, and its growing amplitude for larger smoothing scales, is to be expected. The marked field is effectively a product of two fields: |$m({\bf x})\equiv M(\delta _R({\bf x}))$|⁠, and |$\delta ({\bf x})$| where, by construction, |$m({\bf x})$| only has power on scales |$k\, \ll 1/R$|⁠. Therefore, on small scales (⁠|$kR\gg 1$|⁠):

|$m$| acts effectively as a constant (but stochastic) amplitude (⁠|$\Delta ({\bf x})\sim m\, \delta ({\bf x})$|⁠).
|$m$| and |$\delta$| are largely statistically uncorrelated.

In this regime, the power spectrum of |$\Delta$| can therefore be approximated as

$$\begin{eqnarray} \hat{P}_{\Delta \Delta }(k)=m^2\, \hat{P}_{\delta \delta }(k), \end{eqnarray}$$

(18)

and therefore, taking into account the stochasticity of |$m$|⁠, its covariance is:

$$\begin{eqnarray} &{\rm Cov}(\hat{P}_{\Delta \Delta }(k), \hat{P}_{\Delta \Delta }(k^{\prime })) \end{eqnarray}$$

(19)

$$\begin{eqnarray} \equiv \langle \hat{P}_{\Delta \Delta }(k)\hat{P}_{\Delta \Delta }(k^{\prime })\rangle -\langle \hat{P}_{\Delta \Delta }(k)\rangle \langle \hat{P}_{\Delta \Delta }(k^{\prime })\rangle \end{eqnarray}$$

(20)

$$\begin{eqnarray} \simeq \langle m^4\rangle {\rm Cov}(\hat{P}_{\delta \delta }(k),\hat{P}_{\delta \delta }(k^{\prime }))+\sigma _{m^2}^2P_{\delta \delta }(k)P_{\delta \delta }(k^{\prime }), \end{eqnarray}$$

(21)

where |$\sigma _{m^2}^2\equiv \langle m^4\rangle -\langle m^2\rangle ^2$|⁠. The second term in the equation above, due to the stochasticity of the marked smooth field, is 100 per cent correlated on all scales and effectively allows additional variance for the power spectrum to vary its amplitude while preserving its shape.

3.2.3 Stability to analysis choices

As described above, under our fiducial setup we have been able to find an optimal mark function that is qualitatively universal (i.e. functions with similar shapes are able to maximize the information recovered on several cosmological parameters). It is, however, important to verify that the resulting optimal function is not strongly dependent on the specific choices made in our analysis, particularly regarding the methodology used to describe general functions. To this end, we repeated our analysis changing some of these choices, and compared the resulting optimal mark functions with our fiducial result.

Specifically, we explored the following choices:

The number of nodes used to describe the Gaussian process. Our fiducial choice of 4 nodes (corresponding to three degrees of freedom, after normalizing the function as described in Section 2.2.2 may not allow for sufficient freedom to discover potentially more optimal functions. We thus repeat our analysis for five Gaussian process nodes.
The correlation length of the Gaussian process |$\ell$|⁠. This parameter determines the ‘smoothness’ of the function (i.e. the speed with which it can vary across its domain of definition). Our fiducial choice of |$\ell$| (different for each smoothing scale |$R$|⁠) was selected as a compromise between the range of variation of |$\delta _R$| and the separation between nodes |$\Delta \delta _R$|⁠.⁴ This choice may have a significant impact on the shape of the recovered optimal mark, restricting its freedom to vary over overdensity ranges |$\ll \ell$|⁠. We repeat our study for four different correlation lengths: |$\ell \in \lbrace 0.75,\, 2.0,\, 5.0,\, 10\rbrace$|⁠.
The smallest scale used in the analysis. In our fiducial analysis, we use all Fourier modes up to |$k_{\rm max}=0.3\, h\, {\rm Mpc}^{-1}$|⁠. This scale was chosen as a compromise between ensuring that we sufficiently probe the mildly non-linear regime where the overdensity field is clearly NG, while avoiding complex astrophysical effects (e.g. baryonic effects) that may dominate the theoretical error budget on small scales (⁠|$k\lesssim 1\, h\, {\rm Mpc}^{-1}$|⁠). The shape of the optimal mark function likely depends on the range of scales over which NG information can be extracted, and therefore we repeat our analysis for |$k_{\rm max}=0.2\, h\, {\rm Mpc}^{-1}$| and |$0.1\, h\, {\rm Mpc}^{-1}$|⁠.

The results are shown in Fig. 6 for the |$R=10\, h^{-1}\, {\rm Mpc}$| smoothing scale, with our fiducial mark function shown in black. We find that increasing the number of GP nodes (purple line, with node positions marked as circles) leads to an optimal mark that is qualitatively equivalent to our fiducial one. Likewise, varying the GP correlation length does not lead to significant deviations (again, qualitatively), from this optimal mark. We thus conclude that the constraints imposed on the mark function by our analysis, which restricts its degree of smoothness and allowed range of variation, do not have a significant effect on the shape of the optimal mark function found here.

$Example of the universality of the mark function. Optimized curves from different tests are displayed, including varying the number of nodes, the length-scale of the GPR ($\ell$), and the maximum k used in the analysis.$

Figure 6.

Example of the universality of the mark function. Optimized curves from different tests are displayed, including varying the number of nodes, the length-scale of the GPR (⁠|$\ell$|⁠), and the maximum k used in the analysis.

Open in new tab Download slide

On the other hand, we observe that reducing the maximum wavenumber used in the analysis does have a gradual effect on the shape of the optimal mark seemingly shifting the high-density maximum of the function towards larger values of |$\delta _R$|⁠. Specifically, we find cutting |$k_{\rm max}=0.2\, h\, {\rm Mpc}^{-1}$| and |$k_{\rm max}=0.1\, h\, {\rm Mpc}^{-1}$| yields the error improvements in |$\sigma _8$| and |$\Omega _\mathrm{m}$| of (1.2, 2.4), and (1.1, 1.1), respectively. These are reduced significantly compared to the fiducial results of (2.1, 4.6). As justified above, this is not entirely surprising, since the threshold scale |$k_{\rm max}$| directly constrains the level of non-Gaussianity present in the field on the scales used. This result, however, implies that the exact form of the optimal mark function does depend on some of the analysis choices, and therefore should be re-calibrated whenever these choices change.

4 DISCUSSION

In this work, we studied the use of marked power spectra as a way to enhance the constraining power of large-scale structure analysis for cosmology. In particular, we studied the problem of finding an optimal form for the mark function, beyond that allowed by parametric approaches used in the past. To do so, we have modelled the mark function as the posterior mean of a Gaussian process defined by its value at a number of nodes (4 or 5 in our case) spread throughout the domain of the smoothed density field |$\delta _R$|⁠. To facilitate this study, we have identified the symmetries of this mark function (i.e. transformations under which the marked field retains the same amount of information), in particular showing that mark functions are equivalent under affine transformations. Besides reducing the number of degrees of freedom to explore in our optimization, this is also useful in interpreting the properties of different mark functions (e.g. the type of environments over which the original field is up- or down-weighted).

We find that it is possible to find a single mark function that is simultaneously optimal in the recovery of different cosmological parameters (⁠|$\Omega _\mathrm{m}$| and |$\sigma _8$| in the case studied here), both individually and jointly. The form of this optimal mark function is robust against the main choices made to define it (e.g. the number of GP nodes and the correlation length of the GP covariance). When applied to the matter overdensity field, this mark is able to improve the uncertainties on |$\Omega _\mathrm{m}$| and |$\sigma _8$| by a factor of |$\sim 4$| and |$\sim 2$|⁠, respectively (see Table 2). The optimal mark functions found for three different smoothing scales are shown in Fig. 3. Their main common feature seems to be the existence of a minimum around |$\delta _R\sim 0$| which, making use of the affine symmetry mentioned above, can be interpreted as the function upweighting both overdensities and underdensities with the same sign (and not opposite signs, as a zero-crossing mark function would imply). This interpretation is only qualitative and we have seen that the exact shape of the function depends on the choices made when defining the characteristic scales of the analysis: namely, the smoothing scale |$R$| used to define the environment and the smallest scale used to constrain cosmology, |$k_{\rm max}$|⁠.

The results presented here are subject to a number of caveats that must be noted. Perhaps most importantly, our analysis has focused on the use of marked power spectra for the study of the 3D matter overdensity field itself. Real observations instead must rely on biased tracers of this matter overdensity, such as galaxies, or use projected (i.e. 2D) probes, such as cosmic shear or CMB lensing. Our analysis may be extended to 2D fields in the latter case (Cowell + in preparation), but the use of biased tracers implies significant departures from the assumptions of our study. First, our ability to recover information on cosmological parameters is significantly hampered by the need to marginalize over galaxy bias (or, more in general, the details of galaxy–halo relation, Wechsler & Tinker 2018). Although, the use of marked fields may improve our ability to break the degeneracy between bias and cosmological parameters, it is not immediately clear that a single mark function may be found that simultaneously recovers optimal constraints on all parameters. Secondly, galaxy clustering studies inevitably suffer from shot noise, caused by the discrete nature of the tracer. This degrades the precision with which information can be extracted from the smallest scales and may therefore affect the performance and the specific shape of the optimal mark function (just as we have shown a change in |$k_{\rm max}$| does). Finally, we have not considered the effects of super sample covariance on the shape of the optimal mark (Bayer et al. 2023).

In the same vein, our analysis has been relatively limited in scope, studying only constraints on two cosmological parameters. More than one mark function may be needed to optimize constraints on a larger parameter set, particularly if parameters that rely heavily on the small-scale dependence of the power spectrum, such as neutrino masses, are included. The impact of other parameters that is poorly constrained by large-scale structure, but which show degeneracies with parameters of interest (e.g., |$h$| or |$n_s$|⁠), is also unclear. The design of optimal general marks (or sets thereof) for the extraction of constraints from multidimensional parameter spaces is an interesting problem that we leave for future work.

Finally, we have relied on a Gaussian covariance matrix approximation to derive optimal mark functions. This was necessary to avoid having to carry out expensive calculations over hundreds of simulations at each step in the optimization. As we show in Fig. 5, while the Gaussian covariance is a relatively good approximation on large scales, the power spectra involving the marked field display significant departures from it, particularly on small scales. We have argued that these NG contributions, which become increasingly relevant for larger smoothing scales, are relatively harmless since they effectively correspond to a larger variance for a single data mode. Nevertheless, the exact shape of the optimal mark is likely sensitive, at some level, to this approximation. The impact of NG covariances has also affected our analysis by forcing us to choose a relatively conservative scale cut (⁠|$k_{\rm max}=0.3\, h\, {\rm Mpc}^{-1}$|⁠), beyond which other NG contributions would become more relevant. As we have seen, this choice can significantly affect the shape of the resulting optimal mark. A deeper study of the covariance matrix for marked fields would be useful, not only to improve the design of optimal mark functions but, perhaps more importantly, to facilitate their use in cosmological analyses without having to rely upon an expensive simulation-based approach to compute them.

As confirmed by previous works, marked power spectra are a promising avenue to extract NG information from large-scale structure probes. Our study has shown that the statistical power of this technique can be significantly enhanced by searching for an optimal mark function, allowing it to take both positive and negative values, and to depart from simple parametric forms. Applying a similar procedure to the various observational LSS probes under exploitation will therefore increase the return on investment in current and next-generation investment.

ACKNOWLEDGEMENTS

We would like to thank Francisco Villaescusa-Navarro, Elena Massara, and Joaquin Armijo for their useful discussions. JAC is funded by a Kavli/IPMU PhD Studentship. DA acknowledges support from the Beecroft Trust. This work was supported by JSPS KAKENHI Grants 23K13095 and 23H00107 (to JL). We made extensive use of computational resources at the University of Oxford Department of Physics, funded by the John Fell Oxford University Press Research Fund.

DATA AVAILABILITY

The data and software used for this analysis can be made available upon request.

Footnotes

Note that we are abusing the notation here. The data vector |${\bf d}=(P_{\delta \delta },P_{\delta \Delta },P_{\Delta \Delta })$|⁠, is not really a 3D vector, but a concatenation of three power spectra measured at a set of wavenumbers. Our argument holds nevertheless, by simply promoting the matrix |${\sf T}$| to a block-diagonal matrix.

Note that these are not directly comparable to the smoothing scales used in Massara et al. (2021, 2023) as they used top-hat window functions.

Note that, under the multiplicative symmetry, both positive and negative mark function values can be interpreted as upweighting the overdensity field in the corresponding regions, although the relative sign between different parts of the mark function can be used to unlock additional information.

For instance, small lengths |$\ell \ll \Delta \delta _R$| would force the recovered Gaussian process mean to systematically drift towards zero between nodes.

References

Armijo

Cai

Y.-C.

Padilla

Peacock

J. A.

2018

MNRAS

478

3627

Month:	Total Views:
November 2024	48
December 2024	60
January 2025	32
February 2025	34
March 2025	25
April 2025	23
May 2025	13

Article Contents

Optimizing marked power spectra for cosmology

ABSTRACT

1 INTRODUCTION

2 THEORY AND METHODS

2.1 Marked fields and power spectra

2.2 Mark functions

2.2.1 Symmetries of mark functions

2.2.2 General mark functions

2.3 Finding optimal mark functions

2.3.1 Simulations

2.3.2 Method overview

2.3.3 Covariance matrices

3 RESULTS

3.1 Exploring the W16 mark

3.2 Optimizing the mark function

3.2.1 Optimal marks

3.2.2 Covariance matrices

3.2.3 Stability to analysis choices

4 DISCUSSION

ACKNOWLEDGEMENTS

DATA AVAILABILITY

Footnotes

References

Citations

Views

Altmetric

Email alerts

Astrophysics Data System

Citing articles via

Latest

Most Read

Most Cited

This Feature Is Available To Subscribers Only