-
PDF
- Split View
-
Views
-
Cite
Cite
Zhongxu Zhai, Will J Percival, Testing the framework of the halo occupation distribution with assembly bias modelling and empirical extensions, Monthly Notices of the Royal Astronomical Society, Volume 535, Issue 3, December 2024, Pages 2469–2481, https://doi.org/10.1093/mnras/stae2489
- Share Icon Share
ABSTRACT
We investigate theoretical systematics caused by the application of the halo occupation distribution (HOD) to the study of galaxy clustering at non-linear scales. To do this, we repeat recent cosmological analyses using extended HOD models based on both the Aemulus and Aemulus |$\nu$| simulation suites, allowing for variations in the dark matter halo shape, incompleteness, baryonic effects, and position bias of central galaxies. We fit to the galaxy correlation function including the projected correlation function, redshift-space monopole and quadrupole, and consider how the changes in HOD affect the retrieval of cosmological information. These extensions can be understood as an evaluation of the impact of the secondary bias in the clustering analysis. In the application of BOSS (Baryon Oscillation Spectroscopic Survey) galaxies, these changes do not have a significant impact on the measured linear growth rate. However, we do find weak to mild evidence for some of the effects modelled by the empirical parametrizations adopted. The modelling is able to make the HOD approach more complete in terms of cosmological constraints. We anticipate that the future and better data can provide tighter constraints on the new prescriptions of the HOD model.
1 INTRODUCTION
Galaxy clustering measurements have wide applications in cosmology and galaxy science. At large scales, the spatial distribution of galaxies reveals coherent patterns through baryon acoustic oscillations (BAOs) and redshift-space distortions (RSDs). Observational data from cosmological surveys such as the Sloan Digital Sky Survey (SDSS-I/II; York et al. 2000; Abazajian et al. 2009), the two-degree Field Galaxy Redshift Survey (2dFGRS; Colless et al. 2001; Cole et al. 2005), WiggleZ (Drinkwater et al. 2010), SDSS-Baryon Oscillation Spectroscopic Survey (BOSS; Dawson et al. 2013), SDSS-extended Baryon Oscillation Spectroscopic Survey (eBOSS; Dawson et al. 2016), and the Dark Energy Spectroscopic Instrument (DESI; DESI Collaboration 2016, 2024) have provided accurate measurements of the expansion history and structure growth of the universe. When combined with perturbation theory, these measurements can constrain a large family of dark energy and modified gravity models (Clifton et al. 2012; Zhai et al. 2017b; Abdalla et al. 2022; Ferraro et al. 2022).
At small scales, the clustering amplitude is also sensitive to the physics underlying galaxy formation and evolution (see e.g. White et al. 2007; Zheng, Coil & Zehavi 2007; Brown et al. 2008; Wechsler & Tinker 2018; Salucci 2019 and references therein). Due to the dual dependence on cosmology and galaxy formation, it is challenging, but not impossible, to extract cosmological information on small scales. One way to do this is to use N-body simulations to provide an accurate description of the non-linear evolution of the dark matter field. Approximate statistical methods can then be used to place galaxies within dark matter haloes, which allow sufficient sampling in the parameter space to marginalize over the model dependence. This combined approach can allow us to extract cosmological information at non-linear scales, but requires N-body simulations spanning the range of cosmologies to be tested. In order to interpolate between a small number of such simulations, the emulator method (Heitmann et al. 2009) has been demonstrated to provide accurate measurements for the growth rate of structure using the recent observational data (see Kwan et al. 2015, 2023; Zhai et al. 2019, 2023b; Chapman et al. 2022; Yuan et al. 2022 and references therein), as well as investigations into other aspects (Heitmann et al. 2010; Euclid Collaboration 2019; McClintock et al. 2019; Nishimichi et al. 2019; Rogers et al. 2019; Wibking et al. 2019; Lange et al. 2022; Chapman, Zhai & Percival 2023; Storey-Fisher et al. 2024).
In the application of the emulator approach to the SDSS-BOSS galaxies, Zhai et al. (2023b) report that the amplitude of matter fluctuation is lower than that from the cosmic microwave background (CMB) observed by the Planck satellite (Planck Collaboration VI 2020), a finding similar to other studies using galaxies at lower redshift (Leauthaud et al. 2017; Wibking et al. 2020; Lange et al. 2022). This may imply that the modelling to connect galaxies with dark matter haloes is not complete, or novel physics is required to reconcile the various experiments. This emulator approach relies on the halo occupation distribution (HOD) model to describe how galaxies are distributed within haloes. The method is flexible and can be easily extended with more parameters to describe physics beyond its most basic form where halo mass is the only dominant factor (Hearin et al. 2016). Indeed, Zhai et al. (2023b) considered the freedom that the velocity field of central or satellite galaxies can be different from the dark matter field (Guo et al. 2015), and the effect of assembly bias, where the clustering properties of galaxies or dark matter haloes are affected by the local environment in addition to halo mass. They also adopted a velocity scaling parameter to enable a more flexible modelling of the velocity field to retrieve the information due to RSD (Reid et al. 2014).
In addition to the standard two-point correlation function (2PCF) other summary statistics, providing extra small-scale non-linear information, can be explored with a similar emulator approach. These include the marked correlation function (Storey-Fisher et al. 2024), void-based statistics (Fraser et al. 2024), the density split method for galaxy clustering (Paillas et al. 2024), the wavelet scattering transform algorithm (Valogiannis & Dvorkin 2022; Valogiannis, Yuan & Dvorkin 2024), and the kth nearest neighbour statistics (Yuan, Abel & Wechsler 2024). When analysing the two-point functions and these statistics, marginalizing over the HOD parameters to remove the effects of galaxy formation relies on the HOD being a complete description of the link between galaxies and mass in the Universe. This motivates us to consider the impact of more complications in the model of the galaxy–halo connection. This will also allow us to investigate the correlation between the empirical models of galaxy bias and the cosmological parameters, and see if any of the extensions we consider can mitigate the tension between the large-scale structure (LSS) measurement and CMB observations. We consider the parametrization and modelling of both the central and satellite galaxies, including the dark matter halo shape and distribution of central galaxies within the haloes. We also test the assumption of the completeness of the HOD model by incorporating an additional parameter to describe model incompleteness. Although these effects may not directly result from the assembly history of dark matter haloes, they are inspired by considering how galaxy formation might be affected by halo properties. From this point of view, any impact on the clustering property that is not fully modelled by halo mass could be understood as a secondary bias or assembly bias.
We note that these effects have been studied in the literature for different purposes. For instance, within HOD models, the simplest assumption is that haloes are spherical. However, we know that even simple models of structure formation lead to non-spherical collapse (Doroshkevich 1970; Zel’dovich 1970). Furthermore, simulations show that mergers of dark matter haloes can further change the internal structures. Early studies found that the dark matter halo shape can be described by a triaxial ellipsoid model with explicit dependence on halo mass and redshift (Jing & Suto 2002; Allgood et al. 2006). Using the Millennium Simulation and a galaxy catalogue generated by a semi-analytic model, Zu et al. (2008) and van Daalen, Angulo & White (2012) reported that the halo shape can significantly affect the galaxy correlation function in both real and redshift space, especially at small scales. More recently, Mezini et al. (2024) studied the Milky Way-sized and cluster-sized dark matter haloes and found the anisotropic and aligned distribution of subhaloes within their host haloes. Durkalec, Pollo & Abbas (2024) investigated the halo asymmetry using both a HOD and subhalo abundance matching method. This effect may directly influence the retrieval of cosmological information using galaxy clustering measurements at scales where the impact is significant. To explicitly quantify the potential effect, we incorporate this halo shape into our previous emulator-based modelling to explore the possible bias on cosmological parameter inference. In addition, we consider effects such as the spatial distribution of central galaxies, baryonic effects, and incompleteness of the HOD model in this work. We also investigate the impact of neutrino mass using the newly released Aemulus |$\nu$| simulation (DeRose et al. 2023).
This paper is organized as follows. In Section 2, we give an overview of our modelling of galaxy clustering at small scales within the emulator approach. In Section 3, we describe how we extend the basic model with additional prescriptions. In Section 4, we discuss our results and conclude with a summary of the main findings.
2 MODELLING
In this section, we first introduce our analysis pipeline based on the HOD framework, including the summary statistics, model parameters, and likelihood analysis. Then we focus on a new addition to the method, including the Alcock–Paczynski (AP) effect.
2.1 Basic framework
The basic framework for the analysis follows from our previous work modelling galaxy clustering at small scales (Zhai et al. 2019, 2023b; Zhai, Percival & Guo 2023a), and we only summarize it briefly in this section. We fit models to galaxy clustering measurements based on the 2PCF |$\xi (r)$| which measures the excess probability of finding two galaxies separated by a distance r. In order to include information that does not depend on modelling the RSD effect, we fit the projected correlation function (Davis & Peebles 1983):
where |$r_{\rm p}$| and |$\pi$| denote the separations of the galaxy pair perpendicular and parallel to the line of sight, respectively. We combine this with the clustering measurement in redshift space, decomposing the correlation function to compute the multipoles:
where |$L_{\ell }$| is the Legendre polynomial of order |$\ell$|, s is the magnitude of the separation, and |$\mu$| is cosine of the angle between r and the line of sight. We only consider the first two orders of the multipole |$\xi _{0}$| and |$\xi _{2}$|. The measurements of these statistics are performed with 10 logarithmic bins for |$r_{\rm p}$| or s from 0.1 to |$60.2 \, h^{-1} \, {\rm Mpc}$|.
We model galaxy clustering using the HOD framework, based on the model of Zheng et al. (2005). The basic component is the mean number of central and satellite galaxies in each halo:
where |$M_{\text{min}}$|, |$\sigma _{\log {M}}$|, |$\alpha$|, |$M_{\text{sat}}$|, and |$M_{\text{cut}}$| are free parameters. This model has been used extensively for clustering analysis of massive galaxies (e.g. White et al. 2011; Parejko et al. 2013; Zhai et al. 2017a). To complete the model, we also consider parameters for additional effects, including the concentration parameter for the satellite distribution compared with the Navarro–Frenk–White (NFW) profile (Navarro, Frenk & White 1996), velocity bias parameters for central and satellite galaxies (Guo et al. 2015), incompleteness parameter for central occupancy at the massive end (Leauthaud et al. 2016; Lange et al. 2022), scaling parameter of the velocity field of dark matter haloes (Reid et al. 2014), and environment-based assembly bias parameters (McEwen & Weinberg 2018; Salcedo et al. 2018; Han et al. 2019; Xu, Zehavi & Contreras 2021; Yuan et al. 2021). We refer the readers to table 3 of Zhai et al. (2023b) for detailed definitions and symbols of the model parameters used throughout the analysis.
We use the above HOD model to populate dark matter haloes from numerical simulations to generate mock galaxy catalogues. In this work, we utilize the Aemulus simulation suite (DeRose et al. 2019) and the new Aemulus |$\nu$| suite (DeRose et al. 2023) with massive neutrinos. The Aemulus suite comprises 40 simulations with different cosmological parameters to construct the emulator, and another independent 35 simulations to evaluate the emulator’s accuracy.
The Aemulus suite is based on the dark matter-only simulations without massive neutrinos. Modelling the massive neutrino component in numerical simulations for cosmological applications has been an active topic for years, and a variety of methods have been developed in the literature. Depending on the mass range and the required accuracy, both linear and non-linear treatments of the neutrino component have been applied to different purposes (see e.g. Brandbyge & Hannestad 2009; Viel, Haehnelt & Springel 2010; Ali-Haïmoud & Bird 2013; Banerjee & Dalal 2016; Upadhye et al. 2016; Banerjee et al. 2018 and references therein). Research on LSS shows that the finite mass of neutrinos leads to a large velocity dispersion and suppresses the clustering amplitude of dark matter haloes (Lesgourgues & Pastor 2006; Saito, Takada & Taruya 2008). Given the measured precision of the growth rate using galaxy clustering at small scales from our previous work, it is worthwhile to quantitatively investigate the impact of neutrino mass. For this purpose, we utilize the latest Aemulus |$\nu$| simulation suite (DeRose et al. 2023) in part of this work, which extends and updates the previous Aemulus suite while maintaining the same simulation volume and mass resolution. For more details on this simulation, we refer interested readers to DeRose et al. (2023).
In order to provide a reference model for comparison when we investigate the model extensions using simulations with massive neutrinos, we rebuild the emulator and rerun the analysis performed in Zhai et al. (2023b). In terms of the measurement of linear growth rate, we find that the massive neutrinos do not have a significant impact as we show later. We will present a thorough analysis of the constraint on the neutrino mass in a separate paper (Gao et al., in preparation), and we focus on the extension of our model of galaxy–halo connection in this work.
The Aemulus |$\nu$| suite has two tiers. Tier 1 includes 100 simulations covering a wider range of cosmological parameter space, while tier 2 comprises 50 simulations with similar coverage to Aemulus. To ensure emulator accuracy with dense samplings, we only use tier 2 for massive neutrino-related modelling in this work. Both the Aemulus and Aemulus |$\nu$| simulations have the same box size of |$1.05\, h^{-1} \, {\rm Gpc}$| and |$1400^3$| dark matter particles, while the Aemulus |$\nu$| boxes also have |$1400^3$| neutrino particles. The mass resolution slightly depends on the cosmological parameters but is suitable for massive galaxies like the BOSS sample. Both simulation suites are run with a spatially flat w cold dark matter (wCDM) cosmology, and the model parameters with their ranges can be found in DeRose et al. (2019, 2023) or table 3 of Zhai et al. (2023b). The evaluation of each model extension is performed individually with a reference model. We provide more details in the corresponding sections and summarize in Table 1.
Systematics and model extensions investigated in this work. We list the symbols for the parameters, physical meaning, range in the modelling, simulation suite used, and the corresponding section in this paper.
Parameter . | Meaning . | Range . | Simulation . | Section . |
---|---|---|---|---|
|$q_{\parallel }, \, q_{\perp }$| | AP parameters | Derived | Aemulus | 2.2 |
|$e_{1}, \, e_{2}$| | Shape parameter for dark matter halo | [0, 1] | Aemulus | 3.1 |
|$L_{\rm frac}$| | HOD incompleteness parameter | [0, 0.2] | Aemulus |$\nu$| | 3.2 |
|$B_{\rm f}$| | Scaling parameter for baryonic effect | [0.5, 1.5] | Aemulus |$\nu$| | 3.3 |
|$p_{\rm b}$| | Position bias for central galaxies | [0, 0.3] | Aemulus |$\nu$| | 3.4 |
Parameter . | Meaning . | Range . | Simulation . | Section . |
---|---|---|---|---|
|$q_{\parallel }, \, q_{\perp }$| | AP parameters | Derived | Aemulus | 2.2 |
|$e_{1}, \, e_{2}$| | Shape parameter for dark matter halo | [0, 1] | Aemulus | 3.1 |
|$L_{\rm frac}$| | HOD incompleteness parameter | [0, 0.2] | Aemulus |$\nu$| | 3.2 |
|$B_{\rm f}$| | Scaling parameter for baryonic effect | [0.5, 1.5] | Aemulus |$\nu$| | 3.3 |
|$p_{\rm b}$| | Position bias for central galaxies | [0, 0.3] | Aemulus |$\nu$| | 3.4 |
Systematics and model extensions investigated in this work. We list the symbols for the parameters, physical meaning, range in the modelling, simulation suite used, and the corresponding section in this paper.
Parameter . | Meaning . | Range . | Simulation . | Section . |
---|---|---|---|---|
|$q_{\parallel }, \, q_{\perp }$| | AP parameters | Derived | Aemulus | 2.2 |
|$e_{1}, \, e_{2}$| | Shape parameter for dark matter halo | [0, 1] | Aemulus | 3.1 |
|$L_{\rm frac}$| | HOD incompleteness parameter | [0, 0.2] | Aemulus |$\nu$| | 3.2 |
|$B_{\rm f}$| | Scaling parameter for baryonic effect | [0.5, 1.5] | Aemulus |$\nu$| | 3.3 |
|$p_{\rm b}$| | Position bias for central galaxies | [0, 0.3] | Aemulus |$\nu$| | 3.4 |
Parameter . | Meaning . | Range . | Simulation . | Section . |
---|---|---|---|---|
|$q_{\parallel }, \, q_{\perp }$| | AP parameters | Derived | Aemulus | 2.2 |
|$e_{1}, \, e_{2}$| | Shape parameter for dark matter halo | [0, 1] | Aemulus | 3.1 |
|$L_{\rm frac}$| | HOD incompleteness parameter | [0, 0.2] | Aemulus |$\nu$| | 3.2 |
|$B_{\rm f}$| | Scaling parameter for baryonic effect | [0.5, 1.5] | Aemulus |$\nu$| | 3.3 |
|$p_{\rm b}$| | Position bias for central galaxies | [0, 0.3] | Aemulus |$\nu$| | 3.4 |
We use the Latin hypercube algorithm (Heitmann et al. 2009) to sample the HOD model parameter space and generate the training sample. The measurement of the 2PCF (equations 1 and 2) from these mocks yields the final training set to construct the emulator, and the detailed methodology follows our earlier work (Zhai et al. 2019).
In order to constrain cosmological parameters, we perform a Bayesian analysis, with a Gaussian likelihood:
Note that we include emulator inaccuracies in the analysis, using the test simulations to evaluate the emulator accuracy. We include the scatter as an additive contribution to the covariance matrix. To obtain the posterior of the model parameters, we use the nested sampling algorithm (Skilling 2004) provided by the multinest (Feroz, Hobson & Bridges 2009; Buchner et al. 2014) package. The tests run until the convergence criteria are reached, i.e. the Bayes evidence estimate is stable and the posterior can be produced as a by-product.
2.2 Alcock–Paczynski effect
For the measurement of the galaxy correlation function, we must assume a fiducial model to convert redshift into distance, which means the measured correlation function depends on this choice. To mimic this within the model to be fitted to the data, we need to stretch the simulations we use along and across the line of sight by the AP parameters (Alcock & Paczynski 1979).
In the eBOSS analysis (Bautista et al. 2021; Chapman et al. 2022), this effect was incorporated by scaling the distance parameter when making predictions of the 2PCF, but that ignores the effect on halo finding. In our work, instead, we include the effect during the construction of the galaxy mocks, following Lange et al. (2023), where the simulation coordinates are adjusted based on the assumed fiducial cosmology.
Before we apply the AP correction to the model, we first examine the effect on the correlation function measurement. In order to do so, we define two parameters |$q_{\parallel }$| and |$q_{\perp }$| and use them to scale the axes for line of sight and perpendicular directions, respectively, to mimic the AP effect. These parameters are not the same as the |$\alpha$|s commonly used to compress BAO measurements, as they do not include the sound horizon at the drag epoch. We are interested in the shift in the full clustering signal caused by the mismatch of true and fiducial cosmology, not the shift of the BAO position from changes in both the geometric stretching and the sound horizon. In order to maintain the galaxy number density, we regenerate the HOD mocks using the Aemulus boxes with the number density corrected by the factor |$q_{\parallel }q_{\perp }^{2}$| such that the resultant mock after AP parameter correction matches the observed galaxy number density.
Given this modelling, we can test the impact of the AP effect on the cosmological constraints. Since the cosmological parameters are allowed to vary, we can explicitly compute the strength of the AP correction for each corresponding model being tested. Therefore |$q_{\parallel }$| and |$q_{\perp }$| are not free parameters. To implement the AP effect, we construct the emulators for the correlation function with |$q_{\parallel }$| and |$q{\perp }$| as part of the extended parameter set, such that we do not have to define a fiducial cosmology. Then, in the likelihood analysis when the fiducial cosmology is known, we compute the AP parameters of each model as derived parameters and use these values. In Fig. 1, we present the constraint on a subset of parameters in the cosmology and HOD parameter set. For reference, we also show the result when the AP scaling is not applied. We can see that these AP parameters do not have a significant impact on the cosmological measurements. We also compute the linear growth rate and find that the offset is no higher than a few tenth of |$\sigma$|. Therefore, we conclude that ignoring AP stretching in the analysis using small-scale clustering does not have a significant impact. This is consistent with the earlier eBOSS result but using a different approach (Chapman et al. 2022).

Constraint on a subset of the model parameters with the AP effect corrected using BOSS galaxies. For reference, the original result without AP stretching of the models is also shown. The result does not reveal systematic offset due to the addition of the AP effect, and thus the measurement of growth rate is not significantly affected.
3 EXTENSIONS
In this section, we introduce the extensions considered for the HOD-based model. In Table 1, we summarize the key parameters and their corresponding information.
3.1 Dark matter halo shape
The first extension we consider is the effect of dark matter halo shape on clustering measurements using high-resolution N-body simulations. We then introduce a parametrization to extend the parameter space of the HOD-based model, and finally present the impact on cosmological parameters.
3.1.1 Impact of halo shape on correlation function
We adopt a simple approach using subhaloes instead of galaxies such that the halo shape is represented by the spatial distribution of the subhaloes within the host halo. The halo catalogue we use is from the MultiDark Planck 2 (mdpl2) simulation (Prada et al. 2012; Klypin et al. 2016)1 with cosmological parameters |$\lbrace \Omega _{\rm m}, \Omega _{\rm b}, h, n_{\rm s}, \sigma _{8}\rbrace = \lbrace 0.307, 0.048, 0.677, 0.96, 0.8228\rbrace$|. The simulation has a box size of |$1\, h^{-1} \, {\rm Gpc}$| with |$3840^3$| dark matter particles, and the mass resolution is |$1.51 \times 10^9\, h^{-1} \, {\rm M}_{\odot }$|. The dark matter haloes are identified using the rockstar algorithm (Behroozi, Wechsler & Wu 2013). In order to avoid artefacts due to unresolved (sub)structures, we restrict the analysis in this section to only (sub)haloes above |$10^{12}\, h^{-1} \, {\rm M}_{\odot }$|, which corresponds to at least 100 dark matter particles as the lower limit.
We test the effect of the halo shape using the method from Zu et al. (2008) and Daalen et al. (2012). To form a baseline, we randomly rotate the position of each individual subhalo around the centre of the host halo, i.e. we sphericalize the spatial distribution of subhaloes within the host halo. We also apply the same rotation vector to the relative velocity of the subhaloes with respect to the centre of the host halo. This implicitly keeps the correlation between the host halo and subhaloes, such as the infall pattern. In Fig. 2, we present the ratio of the 2PCF between the sphericalized and the original distribution measured in redshift space for |$w_{\rm p}$|, |$\xi _{0}$|, and |$\xi _{2}$|, respectively. The top row shows the example when we only use haloes with a mass in the range |$13.0\lt \log {M \, [h^{-1} \, {\rm M}_{\odot }]}\lt 13.2$|, but for different redshifts, and the bottom row picks the |$z=0.55$| snapshot with different ranges of halo mass. For each realization, we repeat the sphericalization procedure 20 times and measure the 2PCF with three axes as the line of sight. The solid line denotes the mean, and the shaded area displays the 68 per cent uncertainty from these 60 measurements.
![Impact of changes in the dark matter halo shape assumed on the correlation function in redshift space: $w_{\rm p}$ (left), RSD monopole $\xi _{0}$ (middle), and quadrupole $\xi _{2}$ (right), using the halo and subhalo catalogue from the mdpl2 simulation. The result shows the ratio of the measurements between the sphericalized and the original distribution. The top panel shows a sample with halo mass $13.0\lt \log {M \, [h^{-1} \, {\rm M}_{\odot }]}\lt 13.2$ at different redshifts, while the bottom panel shows samples at $z=0.55$ with different halo masses. For each realization, we repeat the sphericalization procedure multiple times with different random seeds and measure the 2PCF. The solid line denotes the mean and the shaded area displays the 68 per cent uncertainty from these measurements.](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/mnras/535/3/10.1093_mnras_stae2489/1/m_stae2489fig2.jpeg?Expires=1749327613&Signature=EIYB8-Tq4xwvE2r24av6cdLdeQ1sHf4EmAzV8j9Uras06iRtycFjZkZKJBfDeeldNTRy2PRKTeik24AlLJYizrwjh3lKeIkrVtUNLgeyUqh0vziGfNR3uc5zIYIE36oPVp4Ih8QTGNqr9dMoyO86Ku~I8m7NEH0rPm48Ys5ecFLc42vH8dOwwWvH38c~1eMDzlkHoqeF7LbP33y19ygXzijSY--ByNPRXVjA2FW~Wr4fBs2hgL7YdMNUPfxhV0YIVkA1k61LIF5Odg~gxwbnX95RVPhdqi1ylfXjGkUUa1Vd6frsOYD950nFVrew4cLAaBjgfmc8U4mkuh0994lS1g__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Impact of changes in the dark matter halo shape assumed on the correlation function in redshift space: |$w_{\rm p}$| (left), RSD monopole |$\xi _{0}$| (middle), and quadrupole |$\xi _{2}$| (right), using the halo and subhalo catalogue from the mdpl2 simulation. The result shows the ratio of the measurements between the sphericalized and the original distribution. The top panel shows a sample with halo mass |$13.0\lt \log {M \, [h^{-1} \, {\rm M}_{\odot }]}\lt 13.2$| at different redshifts, while the bottom panel shows samples at |$z=0.55$| with different halo masses. For each realization, we repeat the sphericalization procedure multiple times with different random seeds and measure the 2PCF. The solid line denotes the mean and the shaded area displays the 68 per cent uncertainty from these measurements.
From the results, we can see that the dark matter halo shape can impact the clustering measurement mainly at small scales, consistent with previous studies and our expectations, since the sphericalization process only redistributes the subhaloes (satellite galaxies). We see some change in |$\xi _{2}$| at |${\sim} 10 \, h^{-1} \, {\rm Mpc}$|, but it appears relatively stronger than it is, due to the statistics crossing zero. From the top row, the results show some dependence on redshift, i.e. haloes of the same mass can experience more effects due to shape in terms of both the amplitude and shape of the correlation function at higher redshift. However, this dependence is not as strong as the halo mass dependence as shown in the bottom row. More massive haloes have a stronger effect due to their shape. The effect on the most massive haloes can become significant at a few |$h^{-1} \, {\rm Mpc}$|, simply due to the increase in the halo radius.
This dependency can be partially explained by the distribution of the shape parameters of dark matter haloes. Fig. 3 shows the ellipticity parameter of the rockstar haloes defined using the mass distribution tensor for particles within the halo radius (Zemp et al. 2011; Behroozi et al. 2013):
where |$x_{i}$| denotes the i component of the position vector. We can obtain the sorted eigenvalues of this matrix to get the axes of the principal ellipsoid (|$a\gt b\gt c$|). This matrix indeed simplifies the shape information of the halo into a few parameters defining an ellipsoid. We show the distribution of this result using the mdpl2 simulation in the figure for various masses and redshifts. For both axis ratios (|$b/a$| and |$c/a$|), we can see a similar dependency with slightly different amplitudes of change. The result is consistent with early studies (Jing & Suto 2002; Kasun & Evrard 2005; Allgood et al. 2006; Bett et al. 2007; Bonamigo et al. 2015; Vega-Ferrero, Yepes & Gottlöber 2017), i.e. the axis ratio is larger (closer to sphericity) for less massive haloes and lower redshifts. This dependency can be further investigated within the framework of hierarchical growth of structure formation, but the scatter of the distribution may imply complicated couplings of various halo properties, such as the halo formation history, mass assembly history, baryonic physics, local environments, and so on. We refer the readers to e.g. Smith, Watts & Sheth (2006), Jeeson-Daniel et al. (2011), Bryan et al. (2013), Chua et al. (2019), Chen et al. (2020), Lau et al. (2021), Cataldi et al. (2023) and references therein for some recent and further studies.

The distribution of the axis ratios for dark matter haloes identified in the mdpl2 simulation. The top row shows the redshift dependency of a sample selected by mass, while the bottom row shows the mass dependency at z = 0.55.
3.1.2 Modelling of halo shape in HOD framework
When using the HOD model to create mock catalogues, the simplest approaches to implement a halo shape are to either place satellites such that they are spherically distributed or to place satellites that follow dark matter particles. Both methods have been used extensively in the past, but we should note that this binary choice may not be flexible enough given the required accuracy of LSS measurements from future surveys with larger volumes and higher number density. In particular, the baryonic physics of galaxy formation may impact the evolution of satellite galaxies within their host haloes such that the distribution does not match that of the dark matter. There can be plenty of effects or phenomena caused by the baryonic processes, and the lack of a well-established model of galaxy formation can make an exhaustive exploration impossible. Therefore, in this work, we use the halo shape as an example and implement an empirical model given the emulator approach we have developed. i.e. we can parametrize the shape and test if the data show any evidence for non-standard behaviour.
We use the distribution of satellite galaxies to represent the dark matter halo shape. To implement the model, we first use the HOD formalism to generate galaxy mocks assuming the satellites are spherically distributed as in our previous work. Then for each host halo, we scale the axis ratios (|$b/a$| and |$c/a$|) determined through the ellipsoidal model (equation 6) by introducing two new parameters:
With the rescaled axis ratio, we scale the coordinates of the satellites in the host halo frame through
Note that we also apply this scaling method to the velocity components of the satellites in the host halo frame. The final step is to perform a single rotation to all of the satellites belonging to the host halo, and we repeat the process for all the host haloes. This removes the correlation between halo orientation and environment and allows us to consider the effect of the radial distribution only. In the parametrization we choose, haloes can return to the spherical shape when |$e_{1} = e_{2} = 0$|, and to an ellipsoidal shape when |$e_{1} = e_{2} = 1$|. Other values of |$e_{1}$| and |$e_{2}$| represent the deviation of the halo shape from spherical or ellipsoidal. We note that this parametrization is not unique, as one can also adopt models for triaxiality to describe ellipticity and prolaticity as used in e.g. Franx, Illingworth & de Zeeuw (1991), Jing & Suto (2002), Allgood et al. (2006), and Lau et al. (2021).
To see the impact of this model, we apply this parametrization to the test samples of the Aemulus suite with large volumes and present the results of |$\xi _{0}$| for 100 independent HOD models with various cosmological simulations in Fig. 4. The result is represented as the ratio of |$\xi _{0}$| with the new satellite distribution divided by the measurement of the original HOD mock. We can see that these additional degrees of freedom can make the clustering amplitude lower than assuming a spherical distribution of satellites. The impact is clearly scale dependent and more significant in the one-halo regime as expected. The overall result shows that the ellipsoidal haloes are less clustered than spherical haloes. This result is related to the work of Faltenbacher & White (2010), which showed that near-spherical haloes are more clustered than non-spherical ones. Their study was in the framework of assembly bias rather than empirical parametrization: they simply split the galaxy population and compared their clustering amplitude. This result should not be confused with the process of sphericalization, as shown in Fig. 2. Sphericalization can reduce the clustering amplitude for the same statistics (|$\xi _{0}$|). This is likely caused by the rotation operation increasing the separation of satellites within the dark matter haloes on average (Daalen et al. 2012). Our model works in a different way with the parametrization in equations (7) and (8) preserving the volume of the spheroid and ellipsoid, and thus the average separation of the galaxy pairs of the one-halo terms. The resultant behaviour of the clustering statistics becomes closer to the physical exploration in Faltenbacher & White (2010). We note that compared with those extensive discussions of the dependency on halo mass, concentration, and halo spin, our parametrization only serves as a simplified approximation.

Effect of our parametrization of dark matter halo shape on |$\xi _{0}$| in redshift space, expressed as the ratio between shape modulated and the original spherical distribution. It displays results from hundreds of HOD models populating the Aemulus test simulations.
We next apply this model to the training set of the HOD mocks and rebuild the emulator within the extended model parameter space. With the new parameters, the overall accuracy of the emulator remains comparable to the original one without significant degradation. We then apply the model to the BOSS measurement from Zhai et al. (2023a, hereafter Z23) at |$z=0.55$| for subsamples split by different galaxy properties. To isolate the effect of the halo shape parameters, we do not implement the environment-based assembly bias parameters as in Z23 and Zhai et al. (2023b), since we find that this effect is not significant in the BOSS galaxies. For a fair comparison, we first turn off the assembly bias parameter by setting |$f_{\text{env}}=0$| in the original model (see e.g. Zhai et al. 2023b for more details of this model) and rerun the cosmological constraint. The result in Fig. 5 shows that ignoring this effect does not affect the measurement of |$f\sigma _{8}$| (red square and red dot), as we already pointed out. We then apply the new emulator model with halo shape parameters to the same BOSS measurement (blue dot). The |$f\sigma _{8}$| value is consistent with the previous one, indicating that the new degree of freedom does not prefer a higher |$f\sigma _{8}$| measurement to resolve the discrepancy with Planck. It is likely that the shape model in the one-halo regime is not correlated strongly with cosmological information, or it is been obscured by significant shot noise. The former can be understood through the baryonic processes within the dark matter haloes, while the latter can be further tested with future data such as DESI with higher galaxy number density (DESI Collaboration 2016). We also perform a sanity check for the emulator construction. The original emulator with assembly bias parameters turned off is equivalent to the new emulator but with shape parameters turned off. The latter can be simply achieved by setting |$e_{1}=e_{2}=0$|, and we find that the result (green dot) is consistent with the previous ones, showing that the tiny variation is only due to the interpolation accuracy of the Gaussian process.

Measurement of |$f\sigma _{8}$| using different emulator models. For reference, the original result from Z23 is shown as red squares, the result using the same model but with assembly bias turned off is shown as red dots. The results from the new emulator with varying halo shape modelling are shown as blue (shape parameter added) and green (spherical distribution) dots, respectively.
3.2 Incompleteness
As mentioned in the previous sections, we use the Aemulus |$\nu$| suite in the following analyses. The first extension we consider is the (in)completeness of the galaxy sample in terms of clustering measurement, which has been previously discussed in the literature such as Leauthaud et al. (2016), Tinker et al. (2017), and Zhai et al. (2017a). To minimize the bias in the analysis of Aemulus V, we adopt corrections from both the sample and modelling sides. For the former, we restrict the sample to a narrow redshift range and only select galaxies at the brighter end. For the latter, we introduce the parameter |$f_{\text{max}}$| to model the incompleteness of central galaxies at the massive end. Indeed our analysis in Zhai et al. (2023b) has shown a correlation between the parameter |$f_{\text{max}}$| and growth rate-related parameters. Enforcing this parameter to unity can lead to a lower estimate of |$f\sigma _{8}$| with a significance of a few tens of |$\sigma$|, similar to the findings in Lange et al. (2022).
In Z23, we tried multiple selections on the galaxy sample in addition to their brightness. We note that value-added parameters such as star formation rate and stellar mass can lead to different levels of completeness in individual samples. This effect may not be fully corrected within the current HOD framework. Given the high accuracy of cosmological measurements, it is useful to revisit the effect. In this section, we focus on the modelling side and explore the systematics that may not be corrected by the current HOD model. On the other hand, there are artefacts in the numerical simulations where galaxies can be lost during the evolution of dark matter haloes. These effects can introduce an extra source of scatter in the modelling of HOD. Therefore, we extend our model by introducing additional degrees of freedom and evaluating the impact on cosmological measurements. We define a new parameter |$L_{\rm frac}$| in this approach and discuss the modelling and results in the following four cases.
3.2.1 Case 1
The first case is a minimal extension of the model. Using the HOD mocks produced with the Aemulus |$\nu$| suite, we randomly select a fraction |$L_{\rm frac}$| of galaxies and remove them from the catalogue. To preserve the number density of the mock galaxies used to generate the HOD mocks, we artificially add |$L_{\rm frac}$| galaxies back to the catalogue, but with some changes. For simplicity, we denote these galaxies as lost galaxies, noting that this terminology may be similar to ‘orphan’ galaxies widely used in literature, but we use a different name as the situations are different. We consider two methods for the distribution of these lost galaxies in phase space. The first method assumes that these lost galaxies are randomly distributed in space and are essentially unclustered, acting as an extreme case. The second method is similar, but we use a random subsample of the dark matter particles to assign both positions and velocities to the lost galaxies.
3.2.2 Case 2
Our HOD model assumes that all the mocks are generated with the same number density. For the analysis of Aemulus V at |$z=0.55$|, it is |$2\times 10^{-4} \, (h^{-1} \, \text{Mpc})^{-3}$|. This number density is used as input to determine the HOD parameter |$M_{\text{min}}$|. This means that the galaxy mocks in case 1 correspond to a HOD model with a number density |$L_{\rm frac}$| lower than the input. To correct this issue, we propose a second case where, when we generate the HOD mocks, the input number density is obtained by scaling the fiducial one by |$(1-L_{\rm frac})$|. This results in a HOD mock with a number density lower than expected. Next, we apply the same two methods as in case 1 to add extra |$L_{\rm frac}$| galaxies to the mock. The resultant mock still has the number density as the fiducial one. This requirement should be fulfilled since the observed sample can have the number density well measured.
3.2.3 Case 3
This case is simple and considers the opposite situation, where we scale the fiducial number density by a factor (|$1+L_{\rm frac}$|) and regenerate the corresponding HOD mocks. Then we randomly pick |$L_{\rm frac}$| galaxies and remove them from the catalogue with a remaining subsample that can still match the required number density, implying that |$L_{\rm frac}$| galaxies are simply lost from the sample.
3.2.4 Case 4
The above cases all deal with adding/removing a fraction of galaxies but differ in how to match the number density. In this case, the simplest model can just assume the number density as a free parameter. To avoid confusion caused by different symbols, we model this effect using the |$L_{\rm frac}$| parameter as a fractional change with respect to the required input number density.
3.2.5 Results
With the above models for different considerations, we regenerate the HOD mocks in the extended parameter space with |$L_{\rm frac}$| and rerun the cosmological constraint on the CMASS galaxies in Aemulus V. In Fig. 6, we present the constraints for each model. For reference, the original constraints from Aemulus V and the reanalysis using Aemulus |$\nu$| are also shown. In the first three cases, the range of |$L_{\rm frac}$| is from 0 to 0.2, while in the last case |$L_{\rm frac}$| can vary from −0.2 to 0.2.

Constraint on the growth-related parameters and the incompleteness parameter |$L_{\rm frac}$| for different models. Top-left: Case 1. Top-right: Case 2. Bottom-left: Case 3. Bottom-right: Case 4. The result from Aemulus V and Aemulus |$\nu$| is also shown for reference. More details are provided in the text.
For the growth-related parameters, we can see that the new modelling of |$L_{\rm frac}$| does not show a significant impact on the results, indicating that the incompleteness of the modelling does not yield a biased measurement of |$f\sigma _{8}$|. The first two cases show quite similar constraints on the incompleteness parameter |$L_{\rm frac}$|. With the first method of adding galaxies that are not clustered, the constraint on |$L_{\rm frac}$| is up to a few per cent level. On the other hand, the second method of adding galaxies by dark matter particles allows a much wider range of |$L_{\rm frac}$|. This is not surprising since the first method is an extreme model to artificially exclude galaxy pairs in the correlation function, and the impact is much more radical than the second method which still uses the clustered dark matter particles.
In the opposite model of case 3, the constraint on |$L_{\rm frac}$| is weak and dominated by the prior. Similar to the previous two cases, the additional degree of freedom does not degrade the cosmological constraint either. Part of the reason is that the change of number density up to 20 per cent can only slightly change the halo mass of the galaxy sample given the Schechter form of the halo mass function. The impact can be much smaller than 0.1 dex on the halo mass range without a significant change in the galaxy bias.
The last case, assuming number density as a free parameter, can be an important test for the parameter degeneracy since it has been fixed in our previous works. The reason is that fixing the number density is equivalent to bringing the abundance information to anchor the average halo mass of the sample and thus the galaxy bias. This may translate to the constraint on the cosmological parameters as well. Therefore, to assess the possible impact, we construct HOD mocks and build emulators with number density as a free parameter. Note that this parameter is expressed as a fractional change with respect to the required one. We find that the cosmological constraint is quite stable, but the measurement on |$L_{\rm frac}$| itself is not tight and can be dominated by the prior.
We also perform another test which assumes a 1 per cent error in the measured number density. We add this information as a Gaussian prior and rerun the test. This is similar to the test in the early BOSS LOWZ clustering analysis (Parejko et al. 2013) which assumes a 15 per cent error on the galaxy number density, but this level of uncertainty corresponds to the variation of number density as a function of redshift. Our selection of BOSS galaxies can have a much lower error. In Storey-Fisher et al. (2024), we use the BOSS QPM mocks and estimate that the fractional change of number density is 0.43 per cent. Therefore, our assumption of 1 per cent in the analysis is a conservative choice. For the constraint on growth-related parameters, we find that it is consistent with the fiducial one when |$L_{\rm frac}$| is free. This consistency seems to imply that most of the cosmological information is from the clustering signals rather than the abundance. This also demonstrates the robustness of our approach, and fixing the number density as an external constraint does not significantly degrade the cosmological measurements.
Next, we apply this new model with extensions of model incompleteness to the BOSS galaxies split by different properties as in Z23. Among these multiple combinations of data and models, we do not find significant deviation in the measurement of linear growth rate. In Fig. 7, we show an example when the sample is selected by luminosity. It shows that adding the new degrees of freedom for sample incompleteness into the HOD approach does not strongly affect the cosmological measurements.

Measurement of |$f\sigma _{8}$| with different model extensions. For reference, the original result from Z23 is shown as a red square, a reanalysis using Aemulus |$\nu$| with assembly bias turned off is shown as red dot. The results show that model incompleteness, baryonic effect, and position bias for central galaxies as model extensions of the HOD framework do not have significant impact on the measurement of cosmological growth. The result using galaxy samples selected by other properties is similar.
3.3 Baryonic effects
The clustering analysis conducted in this study relies on gravity-only simulations. While we have employed a flexible HOD model to capture the relationship between galaxies and dark matter haloes, effectively accounting for the influence of baryonic physics at small scales, it is likely that this model does not fully address the uncertainties associated with galaxy formation physics. Given this limitation, various approaches have been proposed and implemented to investigate the modelling and implications within the cosmological framework. For instance, the BACCO simulation employs numerical simulation and machine learning to quantify the impact of baryonic effect on the matter power spectrum (Aricò et al. 2021), Chaves-Montero, Angulo & Contreras (2023) explicitly explore the potential of resolving the |$S_{8}$| tension with galaxy formation models, and Kwan, McCarthy & Salcido (2024) employ the BAHAMA simulation to study the baryonic effect on galaxy clustering at small scales.
In this section, we take a step in a similar direction but without explicitly delving into complex baryonic models involving star formation, gas cooling, or active galactic nucleus (AGN) feedback. Instead, we introduce a simple single-parameter extension of the HOD model. To incorporate the effects of approximate baryonic processes on the output of gravity-only simulations, we introduce a new parameter |$B_{\rm f}$| that scales the dark matter halo mass. We then use the scaled mass to generate the HOD mocks. We artificially set the range of this new parameter to be [0.5, 1.5] and rerun the analysis, including the generation of mocks and construction of the emulator. We find that this new parameter does not significantly degrade the accuracy of the clustering statistics as we initially anticipated. Subsequently, we apply the model to the BOSS CMASS galaxies to examine the impact on cosmological measurements, similar to the approach in previous sections.
We first examine the impact of this new parameter on the galaxy correlation function using the Aemulus |$\nu$| tier 2 simulations. In Fig. 8, we present the result for |$\xi _{0}$| in redshift space by computing the ratio with and without the |$B_{\rm f}$| parameter for hundreds of HOD models. The red shaded area shows the inner 68 and 95 per cent distribution of the samples. The result shows that the change in halo mass of up to 50 per cent can have a significant impact on the amplitude of the correlation function, with a larger impact at small scales where the one-halo term is more dominant.

Ratio of |$\xi _{0}$| in redshift space considering the effect from baryonic effect (red) and position bias of central galaxies (blue). The results come from the Aemulus |$\nu$| tier 2 simulations with hundreds of HOD models randomly distributed in the parameter space, expressed as the ratio compared with the results without corrections from baryonic effect or position bias of centrals. The shaded area denotes the inner 68 and 95 per cent distribution. The behaviour for |$w_{\rm p}$| and |$\xi _{2}$| is similar.
In the left-hand panel of Fig. 9, we show the constraint on the growth rate-related parameters considering the baryonic effect and compare them with the reference results. It shows that the cosmological inference is almost unchanged, indicating that the approximate model does not affect the measurement of |$f\sigma _{8}$|. On the other hand, the constraint on the parameter |$B_{\rm f}$| is centred at unity. We do not see a noticeable baryonic effect from this clustering analysis. In order to see if this result is due to a statistical fluke or not, we also run tests with the new emulator on the CMASS galaxies split by different properties and find that although the growth rate measurement is not significantly affected (Fig. 7), the constraint on |$B_{\rm f}$| is somewhat higher than Aemulus V. We display the constraint in the left-hand panel of Fig. 10. Because |$B_{\rm f}$| is a scaling factor for halo mass, the preference for |$B_{\rm f}\gt 1$| indicates that the halo mass prediction from the N-body simulation is somewhat underestimated. However, the deviation from unity is less than |$2\sigma$|, and thus the finding is not conclusive. In addition, |$B_{\rm f}$| is correlated with other parameters, affecting both centrals and satellites. The high value of |$B_{\rm f}$| correlates with slightly lower |$f_{\rm max}$|, the incompleteness parameter of central occupancy at the massive end, and lower |$\eta _{\rm vs}$|, the velocity bias of satellites and slightly higher |$\alpha$| for satellite occupancy. In general, a high |$B_{\rm f}$| can produce a galaxy mock with a higher satellite fraction and thus lower number of centrals in order to preserve the galaxy number density. To maintain the galaxy bias anchored by a large-scale measurement, the lower number of centrals has to be offset by less occupancy of the more massive haloes and thus a lower value of |$f_{\rm max}$|. Since the overall impact of |$B_{\rm f}$| is relatively weak at large scales where the centrals dominate (Fig. 8), this influence is not very significant. For the same reason, a high value of |$B_{\rm f}$| can limit satellites to a narrower range of halo mass (a halo before scaling without satellites can host satellites after scaling, and there are fewer haloes available at the massive end due to |$f_{\rm max}$|). The model has to produce more satellite pairs with a faster occupancy (higher |$\alpha$|) as a function of halo mass. In order to balance the enhanced clustering due to more one-halo terms, the velocity field needs to be suppressed to fit the data.

Similar to Fig. 6, but for baryonic model (left) and position bias of central galaxies (right). The results from Aemulus V and Aemulus |$\nu$| are also shown for reference.

Constraint on the extended model parameter from the BOSS galaxies split by different properties. Left: Baryonic parameter |$B_{\rm f}$|. Right: Position bias of central galaxies.
This interpretation does not consider effects on the internal structures of the dark matter haloes such as the radius or radial density profile. These may partially explain the constraint on |$B_{\rm f}$| since the cosmological measurement from clustering also depends on the velocity field, which is not modified in our model. Additionally, the change in dark matter halo mass behaves exactly as that due to assembly bias, but our model is more simplified and can absorb any dependency on parameters rather than halo mass. A more realistic approach could mimic models like the decorated HOD that can redistribute the halo catalogue during mock construction as presented in Hearin et al. (2016).
3.4 Position bias of central galaxies
In the HOD model, we assume that central galaxies reside at the centres of their dark matter haloes. In reality, however, the spatial distribution of central galaxies within their haloes can be complex, influenced by factors such as halo mergers, interactions with neighbouring galaxies, and the local cosmological environment. Therefore, the assumption that central galaxies are located at the centres of their haloes should be re-examined. This effect is particularly important in the study of galaxy clusters and weak lensing, as misidentifying the cluster centre can significantly impact the modelling and understanding of halo dynamics (Yang et al. 2006; Johnston et al. 2007). To differentiate this effect from the miscentring effect in galaxy clusters, we simply refer to it as ‘position bias’ for central galaxies in our analysis.
We relax this assumption by introducing a new parameter |$p_{\rm b}$|. For central galaxies in the mock catalogues, we first identify their host haloes and halo radius |$R_{\rm h}$|. We then multiply |$R_{\rm h}$| by |$p_{\rm b}$| and apply a Gaussian dispersion with a mean of zero and a width of |$p_{\rm b}R_{\rm h}$|. Next, we draw random values from this distribution and add them as perturbations to the positions of the central galaxies. In this way, |$p_{\rm b}$| describes the strength of the offset of central galaxies compared to the halo centre, and we choose a range of [0, 0.3] for |$p_{\rm b}$| in our modelling.
Similar to the previous test on the baryonic effect, we display the impact on |$\xi _{0}$| in Fig. 8. We can see that the overall impact is less significant than the baryonic effect, but this is mainly due to the range of |$p_{\rm b}$| we have considered. On the other hand, the position bias is more important at small scales and has almost no impact at large scales. This is consistent with expectations, as at large scales, the position offset of central galaxies is negligible compared to the distances between galaxies. For the mass range of interest, the average perturbation in distance is around a few tens to hundreds of kpc. However, this change is not negligible for galaxy pairs within massive haloes, thus it can alter the correlation function in the one-halo regime. This is interesting because the model only applies to central galaxies but has a more significant effect at small scales.
With this extended model, we make cosmological constraints using the CMASS galaxies and present the results in the right-hand panel of Fig. 9. Similar to previous work, we do not observe a clear impact on the measurement of the linear growth rate from this position bias. However, we find that the results favour a non-zero value of |$p_{\rm b}$| and peak around 0.2. This suggests that central galaxies have a preference for not residing in the centre of their host haloes. We also conduct the same tests on the BOSS galaxies split by different properties (Fig. 7). The right-hand panel of Fig. 10 shows the constraint on |$p_{\rm b}$|. The results are similar to the tests on the baryonic model, as these subsamples provide slightly different constraints compared to Aemulus V. Their constraint on |$p_{\rm b}$| is closer to 0, indicating no or weak offset in the positions of central galaxies.
4 DISCUSSION AND CONCLUSION
The observational data from state-of-the-art sky surveys have enabled accurate measurements of small-scale galaxy clustering. These measurements contain a great amount of information about the underlying cosmology and the physics governing galaxy formation and evolution but, an unbiased retrieval of this information, requires a robust understanding of the correlation between galaxies and their host dark matter haloes. Due to its simplicity and economy, the HOD model has been widely used for this purpose. Its construction for different types of galaxies has provided useful information about how galaxies evolve over a wide range of redshift. On the other hand, we should be concerned that its basic assumptions and prescriptions may have a direct impact on the cosmological measurements and the ignorance of any component may have a noticeable effect on the cosmological interpretation. This limitation can be somewhat examined and mitigated by introducing additional degrees of freedom. We have witnessed this development in the past decades with the basic HOD model extended for various purposes. In this work, we continue in this direction by introducing additional model parameters and explicitly examine their impact on the cosmological measurements with the BOSS galaxy sample using both the Aemulus (DeRose et al. 2019) and Aemulus |$\nu$| (DeRose et al. 2023) simulation suites.
As an empirical model, the extension of the HOD model can be described using simple parametrizations motivated by physical numerical simulations or observational requirements (Beltz-Mohrmann, Berlind & Szewciw 2020; Beltz-Mohrmann et al. 2023). We consider the spatial distribution of satellite galaxies within the host halo and that, because of baryonic effects, the halo mass can be different from a dark matter-only simulation. For the latter, we consider that central galaxy positions can be offset from the centre of the host haloes, a phenomenon similar to the miscentring effect observed in the study of galaxy clusters. In addition, we consider the effect of sample incompleteness. We implement these extended HOD models using numerical simulations showing how they impact the correlation function as a function of scale in both real and redshift space. With the new model parameters, we adopt our previously developed emulator approach to retrieve cosmological constraints and isolate the influence from these new parameters. Interestingly, the results from the BOSS galaxies show that these new parameters do not affect the cosmological constraint significantly in terms of the measurement of linear growth rate |$f\sigma _{8}$|.
The robustness of the cosmological constraints from information at small scales is interesting, and it is good to note that the basic HOD model is still sufficient to retrieve the key information about the underlying cosmology without considering additional complexities. In essence, the new degrees of freedom do not have a significant correlation with the underlying cosmology or RSD measurements (e.g. Padilla et al. 2019; Contreras et al. 2021; Alimi & Koskas 2024). Our analysis is limited by the galaxy sample considered in terms of its sample variance and shot noise, and the summary statistics we use. More data would provide better constraints on the extensions of our model and rule out the possible bias of the problem. In this case, future work will be necessary to pin down the behaviour of these new degrees of freedom.
The current BOSS galaxies cover a cosmic volume comparable to the typical simulation box of Aemulus or Aemulus |$\nu$|. This is sufficient for a small-scale analysis like that we have undertaken. However, the number density only allows observations of bright galaxies and this can lead to significant noise at small scales. At scales below a few |$h^{-1} \, {\rm Mpc}$|, it is likely that the cosmological information is contaminated by both the shot noise and complicated baryonic physics. We have seen such behaviour in the scale-dependent analysis (e.g. fig. 10 of Aemulus V): adding small scales into the analysis does not improve the cosmological constraints. Our updated analysis in this work also reveals similar findings. The shape modelling for satellites and position bias for central galaxies only affect clustering in the one-halo regime, and these new parameters are not strongly correlated with cosmological parameters. Given the uncertainties of the constraints, it could be useful to add more statistics in addition to the standard 2PCF (Hahn et al. 2020, 2024) to break these degeneracies. But this risks introducing further complexities in the modelling, and we leave such investigation in future works.
In addition to the cosmological measurements, we also constrain the parameters quantifying the model extensions. We do not observe significant constraints on these parameters or evidence for strong deviations from the basic HOD model. However, we do see some weak to mild constraints for a subset of these models. For instance, our analysis of the baryonic effect shows that the halo mass from the dark matter-only simulation is somewhat underestimated, and the central galaxies have a preference for not residing in the centre of the host halo. Although the deviations from the standard HOD model are not strong, it is worthwhile to revisit and exploit with future and better data. From this point of view, the model and methodology developed in this work can be considered a pilot study for ongoing and future sky surveys such as DESI. The physics of the tracers observed in the future will be very different and the resultant empirical modelling will have to vary significantly. Therefore our extended model can provide the additional flexibility to describe the clustering properties of these targets and build an unbiased connection with dark matter haloes.
Since the extended parameters can change the clustering properties of galaxies, we can understand these models within the framework of assembly bias or secondary bias. However, the approach we adopt is quite different from the literature (e.g. Xu et al. 2021). We could do a similar analysis by looking at samples split by different properties. Then the comparison of the clustering measurement would provide an empirical prescription of the assembly bias parameters to augment the basic HOD model (e.g. Hadzhiyska et al. 2023). The models we have considered in this work are only a limited number of trials to improve the modelling and characterization of galaxy–halo connections on small scales. There are plenty of possibilities that can be incorporated into the model, such as influence from halo spin, age, galactic conformity, galaxy and halo orientation, and many others. Our work only serves as a first step in this direction. What is clear is that the combination with the emulator approach is able to marginalize over the cosmological dependency and improve the robustness of the results.
ACKNOWLEDGEMENTS
We thank the anonymous reviewers for the helpful and insightful comments and suggestions that have significantly improved this paper. ZZ appreciates helpful comments and discussions with Renyue Cen, Xiaodong Li, Yi Zheng, Jeremy Tinker, and the Aemulus team. ZZ was supported by NSFC (12373003), the National Key Research and Development Program of China (2023YFA1605600), and acknowledges the generous sponsorship from the Yangyang Development Fund.
WJP acknowledges the support of the Canadian Space Agency and the Natural Sciences and Engineering Research Council of Canada (NSERC; funding reference number RGPIN-2019-03908).
Research at Perimeter Institute was supported in part by the Government of Canada through the Department of Innovation, Science and Economic Development Canada and by the Province of Ontario through the Ministry of Colleges and Universities.
This research was enabled in part by support provided by Compute Ontario (computeontario.ca) and the Digital Research Alliance of Canada (alliancecan.ca).
Software used: python, matplotlib (Hunter 2007), numpy (van der Walt, Colbert & Varoquaux 2011), and scipy (Jones, Oliphant & Peterson 2001).
DATA AVAILABILITY
The galaxy mocks and data are available upon reasonable request to the author.