-
PDF
- Split View
-
Views
-
Cite
Cite
Richard Stiskalek, Harry Desmond, Julien Devriendt, Adrianne Slyz, Evaluating the variance of individual halo properties in constrained cosmological simulations, Monthly Notices of the Royal Astronomical Society, Volume 534, Issue 4, November 2024, Pages 3120–3132, https://doi.org/10.1093/mnras/stae2292
- Share Icon Share
ABSTRACT
Constrained cosmological simulations play an important role in modelling the local Universe, enabling investigation of the dark matter content of local structures and their formation. We introduce an internal method for quantifying the extent to which the variance of individual halo properties is suppressed by the constraints imposed on the initial conditions. We apply it to the Constrained Simulations in BORG (CSiBORG) suite of 101 high-resolution realizations across the posterior probability distribution of initial conditions from the Bayesian Origin Reconstruction from Galaxies (BORG) algorithm. The method is based on the overlap of the initial Lagrangian patch of a halo in one simulation with those in another, measuring the degree to which the haloes’ particles are initially coincident. This addresses the extent to which the imposed large-scale structure constraints reduce the variance of individual halo properties. We find consistent reconstructions of |$M\gtrsim 10^{14}~\mathrm{M}_\odot \, h^{-1}$| haloes, indicating that the constraints from the BORG algorithm are sufficient to pin down the masses, positions, and peculiar velocities of clusters to high precision, though we do not assess how well they reproduce observations of the local Universe. The effect of the constraints tapers off towards lower mass, and the halo spins and concentrations are largely unconstrained at all masses. We document the advantages of evaluating halo consistency in the initial conditions and describe how the method may be used to quantify our knowledge of the halo field given galaxy survey data analysed through the lens of probabilistic inference machines such as BORG.
1 INTRODUCTION
The dynamics of the Universe are largely governed by dark matter (DM), constituting the majority of the matter content of the Universe. Over the past decades, cosmological simulations have emerged as the paramount instrument to elucidate its non-linear dynamics and interplay with baryons (Wechsler & Tinker 2018; Vogelsberger et al. 2020; Angulo & Hahn 2022). Simulations typically employ initial conditions (ICs) based on a realization of a Lambda cold dark matter (|$\Lambda \mathrm{CDM}$|) power spectrum and random phases of the primordial matter field (Press & Schechter 1974; Davis et al. 1985; Lacey & Cole 1993; Eisenstein & Hut 1998; Tinker et al. 2008). Such ICs produce universes that resemble the real Universe only statistically, but cannot be linked object by object.
The alternative is the ‘constrained simulation’, in which not only the amplitudes but also the phases of the primordial density perturbations are encoded. The beginnings of this endeavour can be traced back to Bertschinger (1987) and Hoffman & Ribak (1991), who laid the foundation for simulating constrained realizations of Gaussian random fields. Local Universe constraints were subsequently derived from galaxy counts (Kolatt et al. 1996; Bistolas & Hoffman 1998), peculiar velocity measurements (van de Weygaert & Hoffman 2000; Klypin et al. 2003; Kravtsov, Klypin & Hoffman 2002) and galaxy groups (Wang et al. 2014), which, together with advances in simulation resolution, gravity modelling, IC generation or galaxy bias modelling, have led to local Universe simulation becoming a mature field.
Deriving the ICs that generated the structure we see around us is an inference problem, and, as we have only one Universe, is best formulated in a Bayesian framework. This realization led to the development of a Bayesian forward-modelling approach now known as the Bayesian Origin Reconstruction from Galaxies (BORG) algorithm (Jasche & Wandelt 2013; Jasche, Leclercq & Wandelt 2015; Lavaux & Jasche 2016; Jasche & Lavaux 2019), although other early studies explored the forward-modelling approach as well (e.g. Kitaura 2013; Heß, Kitaura & Gottlöber 2013; Wang et al. 2013). borg leverages an efficient Hamiltonian Markov Chain Monte Carlo algorithm to sample the initial matter field along with parameters associated with observational selection effects, galaxy bias, and cosmology. The BORG posterior encapsulates all realizations of the local Universe that are compatible with the observational constraints used to derive them. The flagship application of BORG – and the one that we will use in this work – targeted the |$\mathrm{2M}\tt {++}$| galaxy catalogue, a whole-sky redshift compilation of 69 160 galaxies (Lavaux & Jasche 2016; Jasche & Lavaux 2019) based on the Two-Micron-All-Sky Extended Source Catalog (Skrutskie et al. 2006). Work is in progress to augment the constraints with information from cosmic shear (Porqueres et al. 2021) and peculiar velocities (Prideaux-Ghee et al. 2023).
In this work, we use the Constrained Simulations inborg (CSiBORG) suite of constrained cosmological simulations (Bartlett, Desmond & Ferreira 2021; Desmond et al. 2022), which are based on the BORG|$\mathrm{2M}\tt {++}$| ICs. Each CSiBORG box is a resimulation of ICs from a single CSiBORG posterior sample inferred from the |$\mathrm{2M}\tt {++}$| galaxy catalogue (Lavaux & Jasche 2016; Jasche & Lavaux 2019), so that differences between realizations quantify the reconstruction uncertainty associated with our incomplete knowledge of the galaxy field and galaxy – halo connection. Hutt et al. (2022) used CSiBORG to study the effect of the reduced cosmic variance on the halo mass function (HMF) and clustering of haloes, and developed a method to assess the consistency of halo reconstruction from the final conditions. CSiBORG has also previously been used to create catalogues of local voids-as-antihaloes (Desmond et al. 2022), and search for modified gravity (Bartlett et al. 2021) and dark matter annihilation and decay (Bartlett et al. 2022; Kostić, Bartlett & Desmond 2023).
A focus of constrained simulations has been used to study the Local Group and its assembly history, e.g. within the CLUES (Gottloeber, Hoffman & Yepes 2010; Sorce et al. 2016a) and SIBELIUS (Sawala et al. 2022; McAlpine et al. 2022) projects. The CLUES collaboration explored the reconstruction of the local Universe and its clusters (e.g. Sorce et al. 2016b, c; Sorce, Gottlöber & Yepes 2020). Constrained simulations have also been used to study the connection between Sloan Digital Sky Survey galaxies and their haloes (Wang et al. 2016; Yang et al. 2018; Zhang, Yang & Guo 2022; Xu et al. 2023), quantify the compatibility of the local Universe with |$\Lambda \mathrm{CDM}$| (Stopyra et al. 2021, 2023), model the local Universe in modified gravity (Naidoo et al. 2023), model the reionization of the local Universe (e.g. Ocvirk et al. 2016; Aubert et al. 2018; Ocvirk et al. 2020), or predict the future evolution of protoclusters (Ata et al. 2022). Recently, the SLOW project was introduced, which aims to resimulate constrained ICs from the Constrained LOcal and Nesting Environment Simulations project (clones; Sorce 2018) using a hydrodynamical simulation (Dolag et al. 2023; Böss et al. 2023; Hernández-Martínez et al. 2024).
Due to their Bayesian set-up, BORG and CSiBORG afford quantification of the effects of data and model uncertainties on the dark matter distribution produced in the simulations. An alternative method leverages the Wiener filter, which allows reconstruction of the mean field but cannot quantify reconstruction uncertainty (Hoffman & Ribak 1991; Zaroubi et al. 1995; Zaroubi, Hoffman & Dekel 1999; Doumler et al. 2013a,b; Hoffman, Courtois & Tully 2015). In a recent study, Valade et al. (2023) compared a Wiener filter and Bayesian-based (hamlet; Valade et al. 2022) approach to reconstruction from a mock peculiar velocity catalogue. For example, to quantify reconstruction uncertainty, the CLUES team (as described in Sorce et al. 2014, 2016b) uses constrained realizations to estimate the possible residuals between the Wiener filter and the true underlying field (Hoffman & Ribak 1991, 1992). In this approach, they add random power at large scales – if the Wiener filter is not constrained – to ensure that the resulting power spectrum matches the underlying cosmological model. By varying the random seed for this added power, they can quantify the impact of unconstrained modes on the reconstruction (Sorce et al. 2016c; Sorce 2020). Closely related to this work, Sorce, Blaizot & Dubois (2019) investigated the suppression of cosmic variance in constrained simulations of the Virgo cluster, comparing the reconstructed properties to observational estimates and finding good agreement. Additionally, Sorce, Blaizot & Dubois quantified the uniqueness of the Virgo cluster in comparison to a population of random haloes.
The objective of our study is to investigate precision or robustness of the reconstructions of individual haloes in constrained cosmological simulations, as opposed to accuracy in the sense of how well the reconstructions match the local Universe. We develop a framework to assess whether a halo present in one box is also present in another, and hence what properties of the haloes are robustly reconstructed across the suite, in the sense of being insensitive to simulation within the suite. This is achieved by means of a novel metric, the overlap of haloes’ initial Lagrangian patches. While the method is agnostic as to the way in which the simulations of the suite are linked, a natural application (and the one on which we focus) is to suites that sample the IC posterior of a previous inference (e.g. BORG). We showcase the method by application to CSiBORG, where we quantify the consistency of the halo reconstruction as a function of various halo properties. The significance of the results are established by contrast with quijote, an unconstrained suite. This metric quantifies the consistency of halo reconstruction (precision), but it does not assess how reliably these haloes match onto structures in the real local Universe (accuracy).
Other applications of our method include matching haloes between DM-only simulations and their hydrodynamical counterparts, as well as simulations using different cosmological models, e.g. |$\Lambda \mathrm{CDM}$| and modified gravity. While traditional methods for matching haloes between different runs often depend on a consistent DM particle ordering between the runs (e.g. Butsky et al. 2016; Desmond et al. 2017; Mitchell et al. 2018; Cataldi et al. 2021), ours does not. In both above-mentioned scenarios, our approach can quantify how either the hydrodynamics or the cosmology impacts the properties of individual haloes, instead of relying solely on population statistics conditioned on properties such as mass (e.g. Pallero et al. 2023).
The structure of the paper is as follows. In Section 2, we introduce the two sets of simulations employed in our work, in Section 3, we introduce the overlap metric and its interpretation, Section 4 contains our results and in Section 5, we discuss the results. Lastly, we conclude in Section 6. All logarithms in this work are base-10.
2 SIMULATED DATA
In this section, we describe the two sets of simulations that we use, CSiBORG and Quijote, and their halo catalogues.
2.1 CSiBORG
The CSiBORG suite, first presented in Bartlett et al. (2021), consists of 101 DM-only N-body simulations in a |$677.7~\mathrm{Mpc}\, h^{-1}$| box centred on the Milky Way (MW), with ICs sampled from the BORG reconstruction of the |$\mathrm{2M}\tt {++}$| galaxy survey (Lavaux & Jasche 2016). This reconstruction covers the same volume as each CSiBORG box, discretized into |$256^3$| cells for a spatial resolution of |$2.65~\mathrm{Mpc}\, h^{-1}$| (Jasche & Lavaux 2019). The BORG density field is constrained within a spherical volume of radius |$\sim 155~\mathrm{Mpc}\, h^{-1}$| around the MW, where the |$\mathrm{2M}\tt {++}$| catalogue has high completeness.
In CSiBORG, the ICs are propagated linearly to |$z = 69$| and augmented with white noise on a |$2048^3$| grid in the central high-completeness region, corresponding to a spatial resolution of |$0.33~\mathrm{Mpc}\, h^{-1}$| and a DM particle mass of |$3.09 \times 10^9~\mathrm{M}_\odot\, h^{-1}$|. To ensure a smooth transition to the remainder of the box, a buffer region of approximately |$10~\mathrm{Mpc}\, h^{-1}$| is added at the edge of the high-resolution region. Both BORG and CSiBORG adopt the cosmological parameters from the Planck Collaboration I (2014) best-fitting results, including the Wilkinson Microwave Anisotropy Probe (WMAP) polarization, high multipole moment, and baryonic acoustic oscillation data, except |$H_0$| which is taken from the 5-yr WMAP results combined with Type Ia supernovae and baryonic acoustic oscillation data (Hinshaw et al. 2009) (|$T_\mathrm{CMB} = 2.728~\mathrm{K}$|, |$\Omega _\mathrm{m} = 0.307$|, |$\Omega _\Lambda = 0.693$|, |$\Omega _\mathrm{b} = 0.04825$|, |$H_0 = 70.5~\mathrm{km}~\mathrm{s}^{-1}~\mathrm{Mpc}^{-1}$|, |$\sigma _8 = 0.8288$|, |$n = 0.9611$|). The DM density field is evolved to |$z = 0$| using the adaptive mesh refinement code ramses (Teyssier 2002), where only the central high-resolution region is refined (reaching level 18 by |$z = 0$| with a spatial resolution of |$2.6~\mathrm{kpc}\, h^{-1}$|).
As we will be studying the reconstruction of individual objects, whose initial Lagrangian patches are constrained in BORG, it is illustrative to consider how many BORG cells constitute the Lagrangian patch of a halo. We find this to be approximately |$N \approx 7~M_{\rm tot} / (10^{13} \mathrm{M}_\odot \, h^{-1})$|. BORG constrains the average field value in each cell with physical size |$2.65~\mathrm{Mpc}\, h^{-1}$|, and we do not a priori expect haloes with Lagrangian patches spanning only a few BORG cells to be consistently reconstructed in CSiBORG since such haloes likely vary strongly across the BORG posterior. However, haloes above |$\sim 10^{14}~\mathrm{Mpc}\, h^{-1}$| comprise initially |$\gtrsim 100$| cells and are therefore likely well constrained.
2.2 Quijote
We compare the CSiBORG results to the publicly available Quijote simulations1 (Villaescusa-Navarro et al. 2020). Quijote is a suite of unconstrained simulations evolved from |$z = 127$| to |$z = 0$| using the gadget-iii code (Springel et al. 2008). We use 10 realizations of the Quijote DM-only simulations with randomly drawn IC phases, each with a volume of |$1~\mathrm{Gpc}\, h^{-1}$| and a particle mass of |$8.72 \times 10^{10}~\mathrm{M}_\odot\, h^{-1}$| in a fiducial cosmology: |$\Omega _\mathrm{m} = 0.3175$|, |$\Omega _\Lambda = 0.6825$|, |$\Omega _\mathrm{b} = 0.049$|, |$H_0 = 67.11~\mathrm{km}~\mathrm{s}^{-1}~\mathrm{Mpc}^{-1}$|, |$\sigma _8 = 0.834$|, and |$n = 0.9624$|. Besides the constraints, the most significant difference with csiborg is the volume, so for an approximately fair comparison when calculating any extrinsic quantity we mimic the CSiBORG high-resolution region by splitting each Quijote box into 27 non-overlapping spherical sub-volumes of radius |$155~\mathrm{Mpc}\, h^{-1}$| centred at |$n \times 155~\mathrm{Mpc}\, h^{-1}$| for |$n=1, 3, 5$| along each axis.
2.3 Halo catalogues
We use the friends-of-friends halo finder (FOF; Davis et al. 1985) in both CSiBORG and Quijote, with a linking length parameter of |$b = 0.2$|. FOF connects particles within a distance b times the mean interparticle separation. FOF can create artificially large structures by connecting extraneous particles to haloes along nearby filaments of the density field, particularly for merging haloes at |$z = 0$| (Eisenstein & Hut 1998). Warren et al. (2006) and Lukić et al. (2009) proposed corrections to this based on particle number and halo concentration, respectively, but as we do not require high-precision halo masses, we do not apply them here.
To reduce numerical resolution errors of recovered haloes and their properties, the authors have suggested that haloes must contain at least |$50\!-\!100$| particles (Springel et al. 2008; Onions et al. 2012; Knebe et al. 2013; van den Bosch & Jiang 2016; Griffen et al. 2016), though Diemer & Kravtsov (2015) suggested stricter criteria for measuring, e.g. concentration to avoid having only a few particles in the inner parts of the halo, especially if the inner region spans only a few force resolution lengths. Recently, van den Bosch & Ogiya (2018) proposed a more stringent criterion to determine the numerical convergence of infalling subhaloes. However, in this work, we are not concerned with substructure or halo profiles and therefore adopt the simpler and less restrictive criterion of 100 particles for all haloes, corresponding to a minimum halo mass of |$3.09 \times 10^{11}~\mathrm{M}_\odot\, h^{*-1}$| and |$8.72 \times 10^{12}~\mathrm{M}_\odot\, h^{-1}$| for CSiBORG and Quijote, respectively. In CSiBORG, we identify haloes only inside the high-resolution region. As CSiBORG has a mass resolution nearly two orders of magnitude better than Quijote, when comparing the two simulations, we only consider haloes above the mass resolution of Quijote.
CSiBORG and Quijote assume different cosmological parameters, yielding a difference in both the large-scale structure and halo population. Hutt et al. (2022) study how this affects the HMF, and show in their Fig. 1 no significant disparities due to the different cosmologies. They do however quantify the reduced spread of the HMFs of individual CSiBORG realizations relative to Quijote due to the suppression of cosmic variance by the IC constraints. We show the HMFs in Fig. 1. It is found that the CSiBORG and Quijote HMFs have some systematic disagreement, with the former predicting higher bright-end abundances (see also McAlpine et al. 2022; Desmond et al. 2022; Hutt et al. 2022). However, this becomes statistically significant only above |$M \sim 10^{15} \mathrm{M}_\odot \, h^{-1}$|. On the other hand, CSiBORG systematically undershoots the Quijote HMF below |$\sim 10^{14.2}~\mathrm{M}_\odot \, h^{-1}$|. These discrepancies have recently been investigated by Stopyra et al. (2023) who showed that replacing the 10-step particle mesh solver used in the BORG forward model with a 20-step COLA solver (Tassev, Zaldarriaga & Eisenstein 2013) corrects for them.

The friends-of-friends (FOF) HMF in CSiBORG and Quijote. The thin background lines show the realizations of CSiBORG and Quijote and the bold lines show their mean. The CSiBORG HMF undershoots and overshoots the Quijote HMF at low and high masses, respectively. The systematic offset is driven by the BORG gravity model and is not associated with deviations of the local Universe from the cosmic average (see Section 2.3 for further discussion). The lower limit is approximately given by Quijote haloes containing 100 particles.
3 METHODOLOGY
We aim to develop a metric to quantify the precision with which haloes are robustly reconstructed between boxes, e.g. across the simulations of constrained suites. We do so by evaluating the similarity of proto-haloes at high redshift, measured by their initial Lagrangian patches. This is a priori a sensible approach as it is the initial conditions that are constrained: focusing on the initial snapshot facilitates the establishment of a more causally coherent framework, circumventing reliance on interpretations deduced purely from the final conditions as in e.g. Hutt et al. (2022). Most of our work is intended to interpret and show that it is useful a posteriori as well.
Similar approaches are used to associate haloes between DM-only and hydrodynamical simulations sharing the same ICs (e.g. Butsky et al. 2016; Desmond et al. 2017; Mitchell et al. 2018; Cataldi et al. 2021). For instance, Desmond et al. leverage the ability to match DM particle IDs between DM-only and hydrodynamical runs to match haloes as follows: If halo a from the DM-only run shares the most particles with halo b from the hydrodynamical run, and conversely, b shares most particles with a, they are identified as a match. This bears a strong resemblance to the method we develop because the particle IDs are assigned based on their position in the initial snapshot. In CSiBORG, the particle IDs are not consistent between realizations, so this method cannot be used directly.
We cross-correlate two IC realizations as follows. We denote the first simulation as the ‘reference’ and the second as the ‘crossing’ simulation and calculate the overlap of each halo in the reference simulation with all haloes in the crossing simulation. We identify (FOF) haloes in the final snapshot of both the reference |$\mathcal {A}$|th and the crossing |$\mathcal {B}$|th IC realization and trace their constituent particles back to the initial snapshot. To calculate the intersecting mass of a halo |$a \in \mathcal {A}$| and |$b \in \mathcal {B}$|, we use the nearest grid point (NGP) scheme to assign the halo particles to a |$2048^3$| grid in the initial snapshot, matching the initial refinement of the high-resolution region. We denote the mass assigned to the nth cell as |$m_a^{(n)}$| and |$m_b^{(n)}$|, respectively, for the two haloes. On the same grid, we also calculate the background mass field of all particles assigned to a halo in the final snapshot in the respective simulation, |$\widehat{m}_{\mathcal {A}}^{(n)}$| and |$\widehat{m}_{\mathcal {B}}^{(n)}$|.
In the NGP scheme, a particle located at |$x_i \in \left[0, 1\right]$| along the ith axis is assigned to the |$\lfloor x_i N_{\rm cells}\rfloor$|th cell, where |$N_{\rm cells}$| is the total number of cells along an axis. We then apply a Gaussian smoothing to the NGP field using the kernel width of one cell and define the intersecting mass of haloes a and b as
where |$n = 1,\ldots ,2048^3$| is the grid index. This definition accounts for the fact that particles of more than one halo may contribute to a single cell and may be motivated as
i.e. the contribution of a is weighted by the mass of b in that cell normalized by the total mass in that cell, and vice versa. Next, we define the IC overlap between the haloes a and b as
such that |$M_{a} = \sum _{n} m_{a}^{(n)}$| is the total particle mass of the ath halo, and similarly for the bth halo. This ensures that |$\mathcal {O}_{a b} \in \left[0, 1\right]$| and can be interpreted simply as the mass of the ath halo that overlaps with the bth halo normalized by their total mass. The definition of equation (1) accounts for the fact that some cells of a halo |$a \in \mathcal {A}$| may overlap simultaneously with haloes |$b_1, b_2 \in \mathcal {B}$|. Due to the presence of |$b_2$|, the intersecting mass |$X_{a b_1}$| is appropriately reduced (denominator of Equation 1) to avoid double-counting. For further illustration, in Fig. 2, we present an example of Lagrangian patches from two haloes in CSiBORG with a |$75{{\ \rm per\ cent}}$| overlap. Despite being drawn from distinct steps of the BORG posterior, the two Lagrangian patches are located at the same comoving position and lead to a collapsed structure at |$z = 0$| with a mass of approximately |$10^{15} ~ \mathrm{M}_\odot \, h^{-1}$|, also at a similar location.

Example of Lagrangian patches at |$z = 69$| for two haloes identified at |$z = 0$| from CSiBORG at two distinct Markov Chain Monte Carlo steps of the BORG posterior. Both haloes have a mass of |$10^{15} ~ \mathrm{M}_\odot \, h^{-1}$| at |$z = 0$| and a mutual Lagrangian patch overlap of approximately |$75{{\ \rm per\ cent}}$|. The plot shows the projected mass density of the Lagrangian patches in the x–y plane, along with density contours corresponding to the two haloes. The contour colours match that of the labels above the figures. Notably, the size of each Lagrangian patch of such haloes is around |$30~\mathrm{cMpc}\, h^{-1}$|. Both panels share the same x- and y-axes.
We can illustrate the overlap on a toy example of two haloes. If we have that |$m_a^{(n)} = m_b^{(n)} = \widehat{m}_\mathcal {A}^{(n)} = \widehat{m}_\mathcal {B}^{(n)} = m$| in all cells, then |$X_{a b}$| is simply m times the number of cells occupied simultaneously by both a and b. Moreover, if the ath halo is completely enclosed within the bth halo, then the overlap can be expressed as |$\mathcal {O}_{a b} = M_a / M_b$|, i.e. the ratio of their masses. In case of perfect overlap, |$M_a = M_b$| and thus |$\mathcal {O}_{a b} = 1$|.
The overlap measures the similarity of two haloes in their Lagrangian patches. However, a halo may overlap with many smaller haloes, producing a large set of small overlaps with none of the overlapping haloes being similar to the reference halo in mass or size. Therefore, we also calculate the maximum overlap that a halo a has with any halo in the crossing simulation |$\max _{b \in \mathcal {B}} \mathcal {O}_{a b}$|. If this quantity is sufficiently high across all crossing simulations, it implies that the halo is consistently reconstructed across the IC realizations. While the overlap between a pair of haloes is symmetric, if a halo |$a_0 \in \mathcal {A}$| has a maximum overlap |$\mathcal {O}_{a_0 b_0} = \max _{b \in \mathcal {B}} \mathcal {O}_{a_0 b}$| with some halo |$b_0 \in \mathcal {B}$|, this does not imply that |$b_0$| also has a maximum overlap with |$a_0$| since its maximum overlap is defined as |$\max _{a \in \mathcal {A}} \mathcal {O}_{b_0 a}$| and the two sets of haloes over which the overlaps are maximized are not the same.
The properties of the overlap |$\mathcal {O}_{a b}$| lend it a natural interpretation as the probability of a match between haloes in two simulations. That a reference halo can have a non-zero overlap with multiple haloes that themselves may overlap in their initial Lagrangian patches is already accounted for in the definition of the overlap through the denominator of the intersecting mass in equation (1), which implies that the overlap of a pair of haloes is modified by the presence of other overlapping haloes. Therefore, we consider that the probability that a reference halo a being matched to some halo in the crossing simulation is simply the sum of the overlaps with all haloes in the crossing simulation:
and the probability of not being matched to any halo in the crossing simulation is
The definitions of the intersecting mass and overlap in equations (1) and (3) ensure that not only the overlap between a pair of haloes is always |$\le 1$|, but also that the sum of overlaps with a reference halo such as in equation (4) is always |$\le 1$|. In the denominator of equation (1), the intersecting mass is weighted by the fraction of a cell occupied by the pair of haloes, taking into account the contributions from all haloes in both simulations. This ensures that the intersecting mass is not counted multiple times when summing over crossing haloes.
We calculate the most likely value of a matched halo property |$\mathcal {H}$| (for instance, mass) based on haloes that overlap with the reference halo. For each crossing simulation indexed i, we select |$\mathcal {H}_i$| of the halo with the highest overlap |$w_i$| with the reference halo. We then calculate the most likely matched property as the mode of the distribution of |$\mathcal {H}_i$| weighted by |$w_i$|. To identify it, we employ the shrinking sphere method, commonly utilized for finding the centre of DM haloes (Power et al. 2003). We iteratively shrink the search radius around the weighted average of the enclosed |$\mathcal {H}_i$| samples until less than 5 samples are enclosed, at which point we take their average as the mode of the distribution. We verify that 5 samples are sufficient and our conclusions are not affected by increasing it.
4 RESULTS
We showcase the framework on the CSiBORG suite. We first calculate the overlaps of halo initial Lagrangian patches defined in equation (3). We then calculate the halo mass, spin, and concentration of overlapping haloes and compare them to the reference halo. At each step, we compare our findings to the unconstrained Quijote suite to assess the significance of the results. We calculate the overlaps for haloes whose total mass exceeds |$10^{13.25}~\mathrm{M}_\odot \, h^{-1}$| and pairs of haloes closer in mass than |$2~\mathrm{dex}$|. The latter is simply to reduce computational expense, since halo pairs with larger differences in mass have negligible overlaps (|$\lesssim 1$| per cent).
Although the DM particle mass differs between CSiBORG and Quijote, this does not prevent us from comparing trends across the two simulation suites. This discrepancy would only be problematic if we were matching a CSiBORG box to Quijote, which we do not. The overlap metric itself does not directly depend on the particle mass, as the particles are deposited onto a regular grid: it is primarily influenced by the shot noise resulting from sampling the density field with a finite number of particles. However, we do not expect this to significantly impact our results. A halo with a mass of |$10^{13.25}~\mathrm{M}_\odot \, h^{-1}$| in Quijote already consists of 200 particles, with the shot noise decreasing further at higher halo masses.
4.1 Overlaps
We begin by using a single reference simulation and assessing how consistently its haloes are present in the remaining IC realizations. We calculate the non-zero overlaps between its haloes and those of another simulation following the approach of Section 3. This yields for each halo in the reference simulation a set of overlaps (which could potentially be empty), but its sum will never exceed one because of the denominator of equation (1). Retaining the same reference, the process is reiterated across the remaining IC realizations.
4.1.1 Pair overlap
We present the overlaps between the haloes in Fig. 3(a) where, for every reference halo, we concatenate all sets of overlaps with the remaining IC realizations. For comparison, in Fig. 3(a), we also show the overlaps in Quijote where we again use a single reference simulation. The hex-bin shows the CSiBORG results and the lines indicate binned medians and |$2\sigma$| spread. It is evident that in both CSiBORG and Quijote, the most likely overlap value is close to zero, which follows from the predominance of non-matched haloes in both simulation suites. However, a notable distinction emerges in CSiBORG: there is a pronounced tail of high overlaps, especially evident for haloes above |$10^{14}~\mathrm{M}_\odot \, h^{-1}$|. These massive haloes have counterparts that occupy the same part of the initial snapshot in the majority of the IC realizations. This is not the case in Quijote, where in fact the high-overlap tail becomes less significant for more massive haloes due to their rarity.

Comparison of Lagrangian patch overlaps in CSiBORG and Quijote simulations. In both cases, a single reference simulation is assumed, and for each halo, a list of overlaps is calculated per crossing simulation. The x-axis denotes the mass of the reference halo, the hex bins illustrate CSiBORG overlaps, with the lines delineating overlaps in CSiBORG and Quijote, each accompanied by |$1\sigma$| error bars showing the spread among points. The differentiation in overlap behaviour between the two simulations becomes evident at high mass, where the constraints of the CSiBORG suite become significant. The concatenated pair overlaps show no trend because they are dominated by frequent chance overlaps. On the other hand, there is a clear trend in the maximum overlaps towards higher halo masses.
4.1.2 Maximum pair overlap
Next, we keep the same reference simulation, but instead calculate the maximum overlap of a reference halo in each other IC realization. In Fig. 3(a), we present the median of the maximum overlaps for each reference halo across the remaining IC realizations. Unlike before, in CSiBORG, the mean trend of the maximum overlaps as a function of mass is no longer close to zero. On the other hand, in Quijote, the mean trend of maximum overlaps remains close to zero. This indicates that there are haloes in any pair of simulations that match well in CSiBORG, especially at high mass, while there are not in Quijote. We verify that this conclusion holds irrespective of the choice of reference simulation.
In Fig. 4, we show the fraction of CSiBORG haloes in a reference simulation that have a median maximum overlap with other simulations over |$1,\, 2,\, 3\sigma$|-level thresholds calculated in Quijote (|$84.1,\, 97.7,\, 99.9$| per cent). This figure again highlights that more massive haloes are more clearly constrained: already at |$10^{14} \mathrm{M}_\odot \, h^{-1}$|, about 75 per cent of CSiBORG haloes have median maximum overlaps more significant than the |$1\sigma$| level in Quijote.

Fraction of CSiBORG reference haloes, as a function of |$\log M_{\rm tot}$|, with a mean maximum overlap with the remaining IC realizations exceeding the |$1,\, 2,\, 3\sigma$| thresholds in Quijote (|$84.1,\, 97.7,\, 99.9$| per cent), as shown in Fig. 3. The bands show the |$1\sigma$| spread among the CSiBORG realizations. Even at the lower mass threshold a large proportion of CSiBORG haloes has maximum overlaps more significant than in Quijote.
In Fig. 5, we show in CSiBORG for every halo from a single reference simulation the number of simulations in which a reference halo a and a halo b from a crossing simulation both have maximum overlaps with each other (|$f_{\rm sym}$|). We plot |$f_{\rm sym}$| against both the reference halo mass and the average maximum overlap of a halo. On average, haloes around |$\sim 10^{14} \mathrm{M}_\odot \, h^{-1}$| have symmetric maximum overlaps in 50 per cent of the IC realizations. In contrast, haloes around |$\sim 10^{15} \mathrm{M}_\odot \, h^{-1}$| have symmetric maximum overlaps in nearly all IC realizations. On the other hand, the relationship between |$f_{\rm sym}$| and the average maximum overlap has less scatter than its relationship with mass. For example, haloes with an average maximum overlap of |$\sim 0.2$| have symmetric maximum overlaps in approximately 50 per cent of the realizations. The results of Figs 4 and 5 demonstrate that the most massive haloes exhibit reduced variance in CSiBORG compared to a suite of unconstrained simulations, highlighting the robustness of our method in identifying consistently constrained haloes across different simulations. However, this does not imply that these haloes are accurately reconstructed relative to real observations; rather, it reflects the internal consistency of the simulation suite.

Number of simulations in which a reference halo a and a halo b from a crossing simulation both have maximum overlaps with each other, |$f_{\rm sym}$|, shown for every halo from a single reference simulation in CSiBORG. The dashed line is an arithmetic average in a bin along with |$1\sigma$| spread. The highest mass haloes have symmetric maximum overlaps in nearly all IC realizations.
4.1.3 Probability of being matched
Lastly, we calculate the probability that a reference halo has a counterpart in any other IC realization, as given by equation (4). In Fig. 6, we plot the mean and standard deviation of this probability, averaged over the remaining IC realizations. If a halo has no potential match in another simulation, then the sum of its overlaps in that simulation is simply zero. Mirroring previous results, there is a clear distinction between CSiBORG and Quijote above |$\sim 10^{14}~\mathrm{M}_\odot \, h^{-1}$|, in that the majority of CSiBORG haloes are matched while the majority of Quijote haloes are not. Below this mass, the distinction weakens, though it remains significant. The uncertainty on the probability of a match shown in Fig. 6(b) peaks at |$10^{14}~\mathrm{M}_\odot \, h^{-1}$|. Below this mass, and particularly near the lower mass threshold of |$10^{13.25} \mathrm{M}_\odot \, h^{-1}$|, there are only haloes with a mass above this threshold to overlap with and not below it, thus underestimating the uncertainty.

Probability of a reference halo having a match in the remaining IC realizations calculated as the sum of overlaps. The hex bins show the CSiBORG data and the lines are the mean trends in CSiBORG and Quijote, with |$1\sigma$| error bars characterizing the spread among points. The probability of a match is typically significantly higher in CSiBORG except for the lower mass threshold where the constraints weaken. The truncation at low masses in Fig. 6(a), and the reduction in uncertainty toward very low masses in Fig. 6(b), are due to the lower mass threshold of |$10^{13.25} \mathrm{M}_\odot \, h^{-1}$| when matching haloes.
This is complementary to the results of Fig. 3(a), which was instead sensitive to the maximum overlap a reference halo has with another simulation. On the other hand, in Fig. 6, a high probability of a match can also be due to adding many small overlaps.
In both CSiBORG and Quijote, there is a trend that more massive haloes have higher sum of overlaps, although this is more pronounced in CSiBORG. However, this is not surprising, since the most massive haloes have initial Lagrangian patches of |$\sim 10~\mathrm{Mpc}\, h^{-1}$| and thus naturally overlap with more objects. For comparison, see the trend of Fig. 3(a) where the mean trend of the maximum overlaps in Quijote never rises – while the large haloes have overlaps with many small ones such that the sum of overlaps may increase, the maximum overlap a given reference halo has with any one halo in a crossing simulation does not increase with mass.
In Fig. 7, we show in CSiBORG the relation between the probability of being matched and the maximum overlap averaged over the IC realizations. The two quantities are strongly correlated, though the most massive haloes deviate from the |$1\!-\!1$| line, as smaller overlaps start to contribute more significantly to the sum over overlaps. Nevertheless, even for the most massive haloes, this shows that sum over overlaps is typically dominated by one object that has a large overlap with the reference halo.

Relation between the mean summed overlap and mean maximum overlap of reference haloes, both of which are averaged over the remaining CSiBORG IC realizations. Although the most massive haloes deviate from the |$1\!-\!1$| line due to contributions of smaller overlaps, it is clear that the summed overlaps are typically dominated by the overlap to a single object, making the match relatively unambiguous.
4.2 Halo properties of overlapping haloes
Now that we have explored the properties of the overlap statistic and its behaviour across the CSiBORG suite, we turn our attention to investigating the properties of haloes that it matches. This includes their separation (Section 4.2.1), mass (Section 4.2.2), peculiar velocity (Section 4.2.3) and concentration and spin (Section 4.2.4).
4.2.1 Final snapshot separation of overlapping haloes
For a reference halo that has a significant overlap with a halo in other IC realization, our anticipation is that their proximity in the ICs will make the pair unusually close in the final snapshot as well. Because there are nearby haloes between boxes even in the absence of any IC constraints, we juxtapose our results with those from the unconstrained Quijote suite. Any suppression in distance observed in CSiBORG relative to Quijote can then be ascribed to the constraints.
We show the results in Fig. 8. We cross-match all IC realizations to a single reference simulation and compute |$\langle \Delta R\rangle$|, the overlap-weighted average separation of haloes in the final snapshot averaged over all IC realizations. We calculate the separation as |$\Delta R = \Vert (\boldsymbol {x}_a - \boldsymbol {x}_b)\Vert$|, where |$\boldsymbol {x}_a$| and |$\boldsymbol {x}_b$| are the position of the reference and matched halo, respectively, and |$\Vert \boldsymbol {x} \Vert = \sqrt{\boldsymbol {x} \cdot \boldsymbol {x}}$|. In CSiBORG haloes, are consistently more likely to remain close in the final snapshot if they originate from the same Lagrangian patch. The lowest mass matched objects in CSiBORG show a variation of |$\sim 10~\mathrm{Mpc}\, h^{-1}$|. For an individual reference halo and its maximum-overlap matches from the remaining IC realizations we find, as expected, a strong negative correlation between the overlap value and their |$z = 0$| separation.

The overlap-weighted mean separation of haloes in the final snapshot |$\Delta R$| of CSiBORG realization 7444 averaged over all remaining IC realizations. The hex bins are the CSiBORG overlaps and the lines are the CSiBORG and Quijote mean trends, respectively, with |$1\sigma$| spread among points Although the halo positions are constrained across the entire mass range when compared to Quijote, there is a significant variation in the positions of the lower mass objects.
4.2.2 Mass of matched haloes
The next quantity that we look at is the most probable mass of the matched haloes. We calculate this following the approach outlined at the end of Section 3. From each crossing simulation, we find the mass of the maximally overlapping halo and construct a weighted histogram, where the weights are the maximum overlaps. We then define the most probable mass as the mode of this distribution, with its uncertainty given by the square root of the overlap-weighted average square of residuals around the mode. We plot an example of this in the left panel of Fig. 9 for the most massive halo in a single CSiBORG realization, finding the most likely mass to be within |$0.2~\mathrm{dex}$| of the reference halo mass. This is in part by construction, since the overlap itself is preferentially higher for haloes that have a similar mass. However, while the overlap is preferentially higher for haloes of a similar mass, it does not guarantee the objects being a ‘good’ match. If that were the case, then we would find that even in Quijote the mass of the matched haloes is on average close to the mass of the reference halo, which it is not.

Distributions of most likely matched haloes’ properties (|$\log M_{\rm tot}$|, |$\log \lambda _{\rm 200c}$|, c) with the most massive high-resolution halo in one IC realization (7444) of CSiBORG shown as the blue histogram. The green ‘control’ histogram is the spin and concentration of haloes with a similar mass from the remaining IC realizations: from each, we take the 10 haloes closest in mass to the reference halo. The blue and green vertical lines are the mode of the ‘matched’ and ‘control’ histograms, respectively, and the corresponding shaded bands are |$1\sigma$| uncertainties. The red line is the reference halo property. A comparison to the control distribution reveals that neither the spin nor the concentration is constrained. The matched concentration of this halo has a sharp peak near the reference concentration; however, a similar trend is not observed for other massive haloes.
By analysing histograms like Fig. 9, we calculate and show in Fig. 10 the most likely mass of all reference haloes in CSiBORG above |$10^{13.25}~\mathrm{M}_\odot \, h^{-1}$|. In this figure, we show three panels: comparison between the reference and most likely halo mass, the uncertainty of the most likely mass as a function of the reference mass and the ratio of the most likely mass to the reference mass as a function of the median probability of being matched.

Most likely mass of haloes matched to a single realization of CSiBORG averaged over all crossing simulations and defined as the mode of a histogram such as Fig. 9. The left panel shows the reference-to-expected halo mass relation, the middle panel shows the uncertainty of the expected mass and the right panel shows the ratio of the expected mass to the reference mass as a function of probability of being matched defined in equation (4). The agreement between the reference and matched mass is strongly correlated with the matching probability.
The reference and most likely matched halo masses agree well for the most massive objects, with deviations from the |$1\!-\!1$| line increasing towards lower mass. However, these objects also have a consistently lower probability of being matched (right panel of Fig. 10) and therefore it is not unexpected. Below the scale of |$\sim 10^{14}~\mathrm{M}_\odot \, h^{-1}$|, the matching is less robust: we only cross-match haloes with mass above |$10^{13.25}~\mathrm{M}_\odot \, h^{-1}$| since below this scale, the IC constraints weaken as indicated by Figs 3(a) and 6(a). Therefore, a reference halo near the threshold will only have potential matches that are above this threshold, thus biasing the matching procedure.
4.2.3 Peculiar velocity alignment
In Fig. 11, we show the alignment and ratio of the magnitudes of the peculiar velocities in the final snapshot of CSiBORG. We define the alignment angle |$\theta$| between the peculiar velocity vectors of haloes a and b with peculiar velocities |$\boldsymbol {v}_a$| and |$\boldsymbol {v}_b$|, respectively, as:

The mean alignment angle (top panel) and ratio of magnitudes (bottom panel) of reference haloes with the remaining IC realizations in CSiBORG, plotted as a function of the mean maximum overlap of the reference halo averaged over the IC realizations. The dashed line is the mean trend in a bin with |$1\sigma$| spread among points. Higher overlapping haloes tend to have their peculiar velocities more aligned in final snapshot.
For each reference halo, we calculate the alignment of its peculiar velocity vector with the corresponding vector of the highest overlapping halo from another IC realization. We find that the alignment of a reference halo and a single maximum overlap crossing halo is strongly correlated with the magnitude of this overlap across the IC realizations. The alignment and ratio of the magnitudes plotted in Fig. 11 is averaged over the crossing IC realizations. While most matched cluster-mass haloes show strong alignment in the final snapshot, there are exceptions. These often coincide with a lower mean maximum overlap. Nevertheless, even plotting the alignment as a function of the mean maximum overlap, there remains a significant scatter. We also compare the magnitudes of peculiar velocities, finding their ratios to be on average |$\sim 1$|, however, with significantly larger scatter towards smaller overlaps.
4.2.4 Spin and concentration constraint
Lastly, we turn our attention to the spin and concentration of these haloes. We use the Bullock spin definition
where |$J_{\rm 200c}$| is the angular momentum magnitude of particles within |$R_{\rm 200c}$| and |$V_{\rm 200c}^2 = G M_{\rm 200c} / R_{\rm 200c}$| (Bullock et al. 2001). We define concentration as the ratio of the virial radius to the scale radius of the Navarro–Frenk–White (NFW) profile, |$c = R_{\rm 200c} / R_{\rm s}$| (Navarro, Frenk & White 1996).
We calculate the most likely spin and concentration of the matched haloes following the approach outlined in Section 3. In the middle panel of Fig. 9, we show the comparison of the spins of overlapping haloes to the spin of a reference halo, which we take to be the most massive halo in one IC realizations of CSiBORG. However, unlike previously, we do not find any preference for the spin of overlapping haloes to be similar to the reference halo spin, regardless of whether we weight the matched haloes by overlap or not. The matched distribution in Fig. 9 has a secondary peak near the reference spin; however, it is not statistically significant. In fact, the distribution of the matched spins is in good agreement with the simulation average, which is approximately a mass-independent Gaussian distribution in |$\log \lambda _{\rm 200c}$|. We find similar conclusions to hold for all haloes, regardless of their mass.
Next, we investigate the concentration of the matched haloes. In the right panel of Fig. 9, we show the comparison of the concentrations of overlapping haloes to the concentration of a reference halo, which we again take to be the most massive halo in one IC realizations of CSiBORG. We find that in this particular example, the mode of the weighted distribution agrees well with the reference halo concentration, but the width of this distribution is still similar to the expectation from the simulated mass–concentration relation. To delve deeper, we assess the matched concentrations for all haloes with a mass exceeding |$10^{15}~\mathrm{M}_\odot \, h^{-1}$|, which we previously identified as being consistently reconstructed. We find no discernible correlation between the most likely and reference concentrations. We also find that the concentration of the highest overlap halo from each IC realization is within 10 (30) per cent of the reference value in only 20 (50) per cent of realizations, without any clear dependence on the reference halo mass. The agreement seen in Fig. 9 is thus coincidental or specific to the halo in question. However, even in such instances, the significance is questionable as there is no notable improvement over the mass–concentration relation.
5 DISCUSSION
5.1 Interpretation of the results
Constrained simulations enable an object-by-object comparison of the local Universe with theory. If a direct, or at least probabilistic, correspondence between a simulated halo and an observed structure can be established, constrained simulations potentially allow inference of the properties and assembly histories of nearby objects and hence pose rigorous tests for galaxy formation models and cosmology. In this work, we outline and test a method for assessing whether a halo is robustly reconstructed across a set of simulations, differing for example in ICs drawn from the posterior of a preceding inference. This allows us to compare the properties of the ‘same’ haloes across the simulations. By ‘robustly reconstructed’, we refer specifically to the suppression of variance in halo properties across the sampled ICs, meaning the haloes originate from the same Lagrangian patches and have similar properties at |$z = 0$|. An important additional aspect in assessing the quality of any local Universe reconstruction is how faithfully it reproduces the observed Universe. However, we do not test this aspect in this work.
We illustrate our framework by applying it to the CSiBORG suite of 101 constrained simulations of the local |$\sim 155~\mathrm{Mpc}\, h^{-1}$| Universe with ICs on a grid of spacing |$2.65~\mathrm{Mpc}\, h^{-1}$| derived from the BORG inference of the |$\mathrm{2M}\tt {++}$| catalogue. We find that cluster-mass haloes (|$M \gtrsim 10^{14}~\mathrm{M}_\odot \, h^{-1}$|) are consistently reconstructed across the suite and originate from the same Lagrangian region. Haloes of this mass are distributed over |$\sim 70$| cells in the initial snapshot at |$z = 69$|. Assuming that the comoving cell density in the initial snapshot is |$\Omega _{\rm m} \rho _{\rm c}$|, where |$\rho _{\rm c}$| is the current critical density of the Universe, we can approximate the spatial resolution, L, required to distribute a mass M over N cells as:
Assuming the criterion of 70 resolution elements across the initial Lagrangian patch to be universal, this suggests that for haloes of mass |$10^{15},\, 10^{14},\, 10^{13}~\mathrm{M}_\odot \, h^{-1}$| to be robustly reconstructed, one must use IC constraints at the scales |$5.5,\, 2.6$| and |$1.2~\mathrm{Mpc}\, h^{-1}$|, respectively.
While high-mass haloes are consistently reconstructed and have counterparts of similar mass and peculiar velocity across all IC realizations, the overlapping haloes originating from same Lagrangian regions are not similar in neither their spin or concentration. Despite the BORG reconstruction employed in this work not utilizing a peculiar velocity catalogue as a constraint – relying solely on galaxy positions in the |$\mathrm{2M}\tt {++}$| catalogue – it is reassuring to find that the highest mass haloes indeed have aligned velocities, since to some extent galaxy positions are complementary to the peculiar velocity field information. In fact, BORG has been used in the past to reconstruct the local peculiar velocity field (Jasche & Lavaux 2019).
Cadiou, Pontzen & Peiris (2021) demonstrated that the angular momentum of haloes can be accurately predicted directly from their Lagrangian patches, but that it exhibits chaotic behaviour under small changes to the patch boundary. Although the high-mass CSiBORG haloes are present in all IC realizations, their overlaps never reach unity and so their Lagrangian patches are not exactly aligned. Therefore, given the findings of Cadiou, Pontzen & Peiris, the lack of constraint on spin is not surprising. The halo concentration depends on both the Lagrangian patch configuration and subsequent accretion history (Rey, Pontzen & Saintonge 2019). While we find that on average the halo concentration in CSiBORG is not constrained, we leave a detailed examination of specific observed clusters for future work along with comparing the mass accretion histories of haloes that are initially strongly overlapping. This will help tease out the properties of haloes and their constituent particles that are responsible for setting their concentrations.
For certain haloes, we observe that those originating from consecutive CSiBORG simulations tend to have higher maximum overlaps. CSiBORG resimulates every 24th step of the BORG chain, yet the autocorrelation length of the chain is |$\sim 100$|. CSiBORG thus effectively oversamples the BORG posterior – even though it varies without correlation the unconstrained small-scale modes at each step – which could lead to an overestimation of confidence in the BORG constraints if relying only on a few consecutive CSiBORG samples. To mitigate this effect, we have averaged across all remaining 100 IC realizations for each reference realization, the majority of which are fully decorrelated from it. A fully satisfactory solution would be to resimulate only decorrelated IC realizations; however, it is challenging to derive a large number of these due to the high computational cost of the BORG inference.
5.2 Comparison with the literature
Hutt et al. (2022) introduce an algorithm to assess whether ‘twins’ of a single halo can be identified in all IC realizations of a constrained simulation. This is done by selecting a reference simulation and cataloguing the positions of all haloes. A halo is then chosen from the reference simulation, and its size is calculated as |$R_{\rm 200c} = \left[ (3 M_{\rm 200c}) / (4\pi \cdot 200 \cdot \rho _{\rm crit}) \right]^{1/3}$|. In the other simulations, the haloes within a designated ‘search radius’ of the reference halo are then identified and the one most similar in mass selected. If no haloes are found within the search radius, the reference halo is discarded as not present within the crossing simulation. This approach differs from ours in several important ways:
We match haloes directly in the ICs, which are constrained by BORG, instead of relying on the forward-modelled realizations of the final snapshot,
We do not require the matched haloes to be within a fixed radius of one another, but instead calculate the overlap of their initial Lagrangian patches,
While the procedure of Hutt et al. is inherently binary, ours has a continuous interpretation in terms of probabilities.
It is nevertheless instructive to compare our results. In Fig. 12, we show the fraction of matches identified via the approach of Hutt et al. that are also the maximum-overlap pairs for several choices of search radii in CSiBORG. Due to the stringent requirement of Hutt et al. that a halo must have a ‘twin’ in all IC realizations, there is a strong agreement between the two approaches. We calculate |$f_{\rm agreement}$|, the fraction of matches identified by Hutt et al. that are also the maximum overlap pairs as a function of a lower mass threshold. For the fiducial search radius used by Hutt et al., the agreement is |$\sim 90$| per cent regardless of the mass threshold. However, this comes at the cost of only a small number of haloes being identified as matches by Hutt et al.: for example, at |$10^{14}~\mathrm{M}_\odot \, h^{-1}$|, this is only |$\sim 10$| per cent of the haloes. Our method has the advantage of providing useful information even in the regime where the matching across all realizations is partial or even weak.

Comparison of our matching procedure to that of Hutt et al. (2022), who identify ‘twin’ haloes in the final snapshots of CSiBORG. Solid lines correspond to |$f_{\rm agreement}$|, which is the fractional agreement between the Hutt et al. matches and the maximum-overlap pairs above mass |$M_{\rm tot, min}$|. Dotted lines are |$f_{\rm match}$|, the fraction of haloes above |$M_{\rm tot, min}$| identified as matches by Hutt et al.. The high |$f_{\rm agreement}$| indicates excellent agreement between the approaches regardless of the search radius (shown in the legend). However, only a small fraction of haloes are identified as matches by Hutt et al.. Our method establishes a match probability applicable even to far more uncertain matches.
Another methodology with similarities to ours is that of Pfeifer et al. (2023), in which the goal is to identify the simulation most representative of the local Universe. However, a crucial difference between our work and that of Pfeifer et al. is that we focus on studying the consistency of constraining haloes across the posterior samples of an IC inference, rather than assessing the reliability of matching these haloes to the local Universe. They use a Wiener filter-based reconstruction that must be supplied with random small-scale/unconstrained modes. This strategy can alternatively be conceptualized as an intensive fine-tuning process to pinpoint the optimal random seed on a very fine grid level, which can then be re-simulated (McAlpine et al. 2022). Pfeifer et al. find the realization that best resembles the local Universe by minimizing the sum of p-values of simulated cluster-mass haloes being at the locations of observed clusters. However, while they find the most representative constrained simulations, their approach does not provide any information about the consistency with which the matched simulated haloes are reconstructed. We believe that this is crucial information to determine the confidence with which we can assert the properties of the dark matter distribution of the local Universe, given the quality and quantity of data used to set the constraints.
A similar technique to match observed galaxies to haloes from constrained cosmological simulations has been applied by Zhang et al. (2022), Xu et al. (2023). Zhang et al. develop a ‘neighbourhood’ subhalo abundance matching (SHAM), a SHAM model specifically tailored for constrained cosmological simulations which ranks haloes based on both their peak mass and closeness in position and velocity to the observed galaxy (Yang et al. 2018). This statistical approach enables them to connect haloes in their single constrained simulation with observed galaxies, thereby facilitating the study of, e.g. the galaxy-to-halo size relation. Our method would allow folding in the uncertainties associated with the reconstruction.
6 CONCLUSION
We have investigated the extent to which haloes are robustly reconstructed across multiple cosmological simulations that sample the IC posterior of a previous inference. While this question is particularly pertinent to simulation suites constrained (with uncertainty) to match the local Universe, other applications include the matching of haloes between DM-only simulations and their hydrodynamical counterparts, or simulations of varying cosmology such as |$\Lambda \mathrm{CDM}$| versus modified gravity.
We argue that for a halo to be consistently reconstructed, its Lagrangian patch in the initial conditions must strongly overlap with a halo in most other realizations of the suite, a condition implying similarity in both mass and location. This is a stricter and more causal measure than similarity in the final snapshot, providing a clearer condition for two haloes to be ‘the same’. In the future, an even stricter measure of similarity may be introduced by measuring the haloes’ initial overlap in the |$6\mathrm{D}$| phase space directly, instead of only the |$\mathrm{3}\mathrm{D}$| position space.
We apply the method to CSiBORG, a suite of constrained simulations with initial conditions sampled from the posterior of the BORG algorithm applied to the |$\mathrm{2M}\tt {++}$| galaxy number density field. We establish the significance of the results by comparing to the unconstrained Quijote suite. Based on the criteria mentioned above, we find that cluster-mass haloes (|$M \gtrsim 10^{14} \mathrm{M}_\odot \, h^{-1}$|) are consistently reconstructed in CSiBORG in position, mass, and peculiar velocity, with higher mass haloes typically more strongly constrained. For haloes below this mass threshold, the constraints diminish, and haloes do not consistently originate from the same Lagrangian patches. Regarding secondary halo properties such as spin and concentration, even consistently matched, high-mass haloes display variations across the IC realizations. The absence of constraints on concentration beyond the mass–concentration relation is surprising given its strong dependence on mass assembly history. This might imply that the assembly history of even the most massive haloes in CSiBORG remains unconstrained; however, we defer a comprehensive study of the mass assembly history to future work. In the future, we will also investigate the extent to which halo overlap correlates with reliability of match to an observed object in the local Universe. In sum, our framework provides a step towards the goal of identifying the ICs that led to the observed Universe and the objects it contains.
ACKNOWLEDGEMENTS
We thank Jens Jasche, Guilhem Lavaux, Francisco Villaescusa-Navarro and Tariq Yasin for useful inputs and discussions. We also thank Jonathan Patterson for smoothly running the Glamdring Cluster hosted by the University of Oxford, where the data processing was performed. This work was done within the Aquila Consortium.2
RS acknowledges financial support from STFC grant no. ST/X508664/1, HD is supported by a Royal Society University Research Fellowship (grant no. 211046). This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 693024). This work was performed using the DiRAC Data Intensive service at Leicester, operated by the University of Leicester IT Services, which forms part of the STFC DiRAC HPC Facility (www.dirac.ac.uk). The equipment was funded by BEIS capital funding via STFC capital grants ST/K000373/1 and ST/R002363/1 and STFC DiRAC Operations grant ST/R001014/1. DiRAC is part of the National e-Infrastructure. For the purpose of open access, we have applied a Creative Commons Attribution (CC BY) licence to any Author Accepted Manuscript version arising.
DATA AVAILABILITY
The code underlying this article is available at https://github.com/Richard-Sti/csiborgtools_public and other data will be made available on reasonable request to the authors.