Hui Peng, Yu Yu, Precise self-calibration of interloper bias in spectroscopic surveys, Monthly Notices of the Royal Astronomical Society, Volume 526, Issue 1, November 2023, Pages 820–829, https://doi.org/10.1093/mnras/stad2808
ABSTRACT
Interloper contamination due to line misidentification is an important issue for future low-resolution spectroscopic surveys. We realize that the algorithm previously used for photometric redshift self-calibration can, with minor modifications, be applied to calibrate the interloper bias. To explore the robustness of the modified self-calibration algorithm, we construct mock catalogues based on the China Space Station Telescope (CSST), using its two main target emission lines, Hα and [O iii]. The self-calibration algorithm is tested with interloper fractions of 1, 5, and 10 per cent. We find that the interloper fraction and mean redshift in each redshift bin can be reconstructed to an accuracy of ∼0.002 and ∼0.001(1 + z), respectively. We also find that the impact of cosmic magnification, which is usually ignored in previous works, can be significant, and we therefore propose a convenient and efficient method to eliminate it. With this elimination method, the calibration accuracy is effectively recovered with a slightly larger uncertainty.
1 INTRODUCTION
Spectroscopic surveys are powerful tools for exploring the formation and evolution of galaxies and the large-scale structure, and for studying the properties of dark matter and dark energy. Several ongoing and upcoming spectroscopic surveys will map larger and deeper regions of the sky, such as the Nancy Grace Roman Space Telescope (RST; Spergel et al. 2015), the Prime Focus Spectrograph (Takada et al. 2014), the Hobby–Eberly Telescope Dark Energy Experiment (HETDEX; Gebhardt et al. 2021), Euclid (Amendola et al. 2018), the Dark Energy Spectroscopic Instrument (DESI Collaboration et al. 2016a,b), and the China Space Station Telescope (CSST; Gong et al. 2019).
Many spectroscopic surveys will obtain spectra with a lower signal-to-noise ratio (S/N) in order to observe more galaxies at higher redshift, and their redshifts are likely to be determined from only a single emission line detected near the minimum acceptable S/N (e.g. HETDEX, RST, Euclid, and CSST). An important systematic effect is therefore interloper contamination due to misidentified emission lines. Pullen et al. (2016) found that interlopers can be effectively removed in surveys with high spectral resolution by secondary line identification or by finding correlated emission lines, but must be carefully considered in low-resolution spectroscopic surveys. The RST [O iii] survey may suffer from Hα contamination at the level of tens of per cent despite secondary line identification, and this problem is expected to be more severe for Euclid and CSST owing to their lower S/N line detection thresholds (Pullen et al. 2016; Addison et al. 2019). Interlopers have been shown to significantly degrade the cosmological constraints from measurements of the power spectrum, correlation function, redshift-space distortions, baryon acoustic oscillations, weak lensing, and so on (e.g. Pullen et al. 2016; Leung et al. 2017; Addison et al. 2019; Grasshorn Gebhardt et al. 2019; Awan & Gawiser 2020; Massara et al. 2021).
Several methods have been proposed to mitigate the impact of interlopers, such as combining spectroscopy with photometry (Kirby et al. 2007) and Bayesian frameworks based on prior assumptions about the luminosity functions and equivalent width distributions of galaxies (Leung et al. 2017; Davis et al. 2023; Mentuch Cooper et al. 2023). The measured correlation functions and power spectra are significantly affected by the presence of interlopers and can therefore be used to constrain the interloper fraction. Grasshorn Gebhardt et al. (2019) presented a proof of concept for constraining interloper fractions by including their effects in the modelling of the galaxy auto- and cross-power spectra of the main and interloper samples. An alternative approach is to minimize the residual difference between the observed and predicted cross-correlation multipoles of the two samples (Farrow et al. 2021). In addition, Gong et al. (2021) proposed that the galaxy–galaxy correlation between two particular observed redshift bins can also be used to obtain the interloper fraction. Similarly, Foroozan, Massara & Percival (2022) presented a method that fits a model for the monopole and quadrupole moments of the contaminated correlation function with a free parameter for the interloper fraction.
In this paper, we also use the galaxy–galaxy correlation between redshift bins, similar to the input of Gong et al. (2021). The straightforward approximation based on the ratio of angular correlation functions in Gong et al. (2021) leads to relatively large and unstable reconstruction errors, especially for large interloper fractions. We go one step further and aim to estimate the interloper fractions to higher precision, meeting the requirements of analyses in future spectroscopic surveys. The theory behind this method is almost the same as that used to self-calibrate photometric redshift errors (Schneider et al. 2006; Benjamin et al. 2010; Zhang, Pen & Bernstein 2010; Zhang, Yu & Zhang 2017; Schaan, Ferraro & Seljak 2020; Peng et al. 2022; Xu et al. 2023). In both cases, the cross-correlations between different observed redshift bins arise from redshift errors. Since the principle is the same, the algorithm previously used for photometric redshift self-calibration in Peng et al. (2022) is well suited to this problem.
We propose a modified self-calibration algorithm to precisely obtain the interloper fractions in the redshift bins of spectroscopic surveys. To systematically investigate the performance of the algorithm in practice, we take the CSST slitless spectroscopic survey as an example and construct light-cone simulations to generate mock galaxy catalogues. We consider the two main target lines in CSST, Hα and [O iii], and investigate several interloper fractions. These two strong emission lines are also the dominant targets of RST and Euclid, although their redshift coverage differs from that of CSST because of different filter designs. Over a wide range of interloper fractions, we find that the algorithm accurately reconstructs the interloper fraction and the mean redshift in each tomographic bin. We also find that cosmic magnification, which has been ignored in previous literature (e.g. Pullen et al. 2016; Grasshorn Gebhardt et al. 2019; Farrow et al. 2021; Gong et al. 2021; Foroozan et al. 2022), can seriously contaminate the cross-correlation measurements and degrade the accuracy. We therefore propose a convenient and efficient method to eliminate its impact. After applying this method, the self-calibration results are unbiased, with only a mild increase in uncertainty. Additionally, in Appendix B we demonstrate the performance of our method on the contamination between the Hα and [O ii] emission lines. We focus on CSST here, but the implications generalize to all similar low-resolution spectroscopic surveys.
This paper is organized as follows. Section 2 gives a brief overview of the self-calibration method and the modified algorithm. The simulation and mock catalogues are presented in Section 3. Section 4 describes the implementation of the algorithm and the method used to eliminate the impact of cosmic magnification. We present our conclusions and discussion in Section 5.
2 METHODOLOGY
2.1 The self-calibration method
Assume that we observe an emission line at wavelength λobs in a spectrum. It may come from an emission line with rest-frame wavelength λ1 at redshift z1, or from another emission line with rest-frame wavelength λ2 at a different redshift z2, so that

$$\lambda_{\rm obs} = (1+z_1)\,\lambda_1 = (1+z_2)\,\lambda_2. \qquad (1)$$
Thus, because of possible misidentification, if we use λ1 (λ2) to determine the redshifts, some interloper galaxies will actually be λ2 (λ1) emitters at different redshifts. This interloper bias must be precisely calibrated to meet the requirements of ongoing and future spectroscopic surveys.
The self-calibration method we use here is almost the same as the one used to calibrate photometric redshift errors (Peng et al. 2022). Assume that we split the galaxies into n redshift bins. For galaxies whose redshifts are determined from an emission line, we denote by $P_{ij} \equiv N_{i\rightarrow j}/N_j^D$ the fraction of galaxies in observed redshift bin j that actually belong to true redshift bin i. Here, $N_{i\rightarrow j}$ is the number of galaxies misidentified from true bin i into observed bin j, and $N_j^D$ is the total number of galaxies in observed bin j, so that $\sum_i P_{ij} = 1$. The angular power spectrum between two observed redshift bins, $C_{ij}^{gg,D}(\ell)$, is related to the power spectra of the true redshift bins, $C_{ij}^{gg,R}(\ell)$, by

$$C_{ij}^{gg,D}(\ell) = \sum_k P_{ki}\,P_{kj}\,C_{kk}^{gg,R}(\ell) + \delta N_{ij}^{gg,D}(\ell). \qquad (2)$$
This equation uses the approximation $C_{k\ne m}^{gg,R}(\ell)=0$, since the galaxy cross-correlation between non-overlapping true redshift bins vanishes under the Limber approximation in the absence of cosmic magnification. The last term, $\delta N_{ij}^{gg,D}(\ell)$, is the shot-noise fluctuation remaining after subtraction of its ensemble average. With measurements on multiple scales, the number of unknowns ($P_{ij}$, $C_{kk}^{gg,R}$) is smaller than the number of observables ($C_{ij}^{gg,D}$), so the equations can in principle be solved. For a given ℓ, we can rewrite them in matrix form,

$$C^{gg,D}(\ell) = P^{\rm T}\, C^{gg,R}(\ell)\, P + \delta N^{gg,D}(\ell). \qquad (3)$$
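To make the counting argument above concrete, the short sketch below evaluates equation (2) without the shot-noise term for a toy two-bin case and tallies unknowns against observables. It assumes a general n × n matrix P that loses n free entries to the column-sum constraint, one $C_{kk}^{gg,R}$ per bin and multipole band, and one independent $C_{ij}^{gg,D}$ per bin pair and band; the function names and toy numbers are hypothetical.

```python
import numpy as np

def observed_spectra(P, c_true):
    """Equation (2) without shot noise: C^D_ij = sum_k P_ki P_kj C^R_kk."""
    return P.T @ np.diag(c_true) @ P

def count_unknowns(n_bins, n_bands, n_free_offdiag=None):
    """Free off-diagonal P entries plus one C^R_kk per bin and band.
    A general P keeps n(n-1) free entries once its columns are normalized."""
    if n_free_offdiag is None:
        n_free_offdiag = n_bins * (n_bins - 1)
    return n_free_offdiag + n_bins * n_bands

def count_observables(n_bins, n_bands):
    """Independent auto- and cross-spectra: n(n+1)/2 per band."""
    return n_bins * (n_bins + 1) // 2 * n_bands

# Toy case: 5% of observed bin 0 comes from true bin 1; observed bin 1 is clean.
f = 0.05
P = np.array([[1 - f, 0.0],
              [f,     1.0]])          # P[k, j] = N_{k->j} / N_j^D, columns sum to one
CD = observed_spectra(P, np.array([2.0, 1.5]))
print(CD[0, 1])                       # equals f * C^R_11

# Ten bins and six bands, as used in this paper.
print(count_unknowns(10, 6), count_observables(10, 6))   # -> 150 330 (general P)
print(count_unknowns(10, 6, n_free_offdiag=7))           # -> 67 (seven interloper fractions)
```

With 10 bins and six bands, a general scatter matrix gives 150 unknowns against 330 observables, whereas seven interloper fractions give only 67 unknowns. Even if only the 10 auto-spectra and 7 non-vanishing cross-spectra are used (102 observables), the interloper problem stays over-determined.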
Note that there are some clear differences between interloper contamination in spectroscopic surveys and photometric redshift scatter in photometric surveys. According to equation (1), there is a strict correspondence between the true and observed redshifts of interlopers when two emission lines are considered. If the redshift bins are divided properly, the interlopers in an observed bin come from only one true redshift bin, rather than from multiple bins, or even from all bins as in the case of photometric redshift scatter. Thus, each observed bin i has at most one non-zero off-diagonal element $P_{ji}$ ($j \ne i$); we denote it as the interloper fraction $f_i$ of observed bin i, so that $P_{ii}$ reduces to $1 - f_i$. Take a CSST-like spectroscopic survey observing the Hα 6563 Å and [O iii] 5007 Å lines as an example. We can divide the redshift range 0 < z < 1.478 into 10 tomographic bins with edges z = 0.000, 0.100, 0.200, 0.311, 0.442, 0.573, 0.719, 0.890, 1.062, 1.254, and 1.478, so that contamination occurs only between bins j and j + 3, for j = 1, …, 7.
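The binning above can be checked directly against equation (1). The sketch below, which assumes the rest-frame wavelengths 6563 and 5007 Å quoted in the text and uses hypothetical helper names, reinterprets each bin edge up to z = 0.890 as an [O iii] redshift and compares it with the edge three places higher; it then builds the pattern of allowed off-diagonal elements $P_{ij}$ (true bin i, observed bin j) for the two choices of target line.

```python
import numpy as np

LAMBDA_HALPHA = 6563.0   # rest-frame wavelengths in Å, as quoted in the text
LAMBDA_OIII = 5007.0

def convert_redshift(z, lam_from, lam_to):
    """Equation (1): a line observed at (1 + z) * lam_from, reinterpreted as lam_to."""
    return (1.0 + z) * lam_from / lam_to - 1.0

edges = np.array([0.000, 0.100, 0.200, 0.311, 0.442, 0.573,
                  0.719, 0.890, 1.062, 1.254, 1.478])
n_bins = len(edges) - 1

# Reinterpret the Hα redshift of each edge up to z = 0.890 as an [O III] redshift;
# it should land on the edge three places higher.
mapped = convert_redshift(edges[:8], LAMBDA_HALPHA, LAMBDA_OIII)
print(np.round(mapped - edges[3:], 4))

# Allowed off-diagonal entries P[true, observed] for each choice of target line.
mask_halpha = np.zeros((n_bins, n_bins), dtype=bool)
mask_oiii = np.zeros((n_bins, n_bins), dtype=bool)
for j in range(n_bins - 3):
    mask_halpha[j + 3, j] = True   # true [O III] emitter in bin j+3, assigned to bin j via Hα
    mask_oiii[j, j + 3] = True     # true Hα emitter in bin j, assigned to bin j+3 via [O III]
```

The residuals are all below 10⁻³, consistent with edges obtained from this mapping and rounded to three decimals, and the two masks are transposes of each other: Hα-based redshifts place interlopers in observed bins 1–7, [O iii]-based ones in bins 4–10.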
Fig. 1 shows two cases in which these two emission lines are used to determine the redshifts, with an interloper fraction $f_i$ = 5 per cent in each redshift bin. Compared with the case of photometric redshift errors, the number of unknown parameters $P_{ij}$ is greatly reduced, which makes it possible to obtain precise solutions for the interloper fraction in each observed redshift bin. Gong et al. (2021) used the ratio between the cross-correlation and the auto-correlation to estimate the same interloper fractions. However, the accuracy of this simplistic approximation degrades as the interloper fraction rises (although this can be mitigated iteratively; see Appendix A for details). Therefore, to ensure high accuracy in future survey analyses, a more robust algorithm is needed to obtain precise interloper fractions.
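The degradation of the ratio estimator can be seen in a three-bin toy chain built from equation (2): observed bin i receives interlopers from true bin j, while observed bin j itself receives a fraction $f_1$ from true bin k. The sketch assumes equal true-bin spectra and uses hypothetical function names; it only illustrates the direction of the bias and is not the procedure of Gong et al. (2021).

```python
import numpy as np

def ratio_estimate(CD, i, j):
    """Simplistic estimator f0 = C^D_ij / C^D_jj (true bin j -> observed bin i)."""
    return CD[i, j] / CD[j, j]

def chain_spectra(f, f1, c):
    """Bins i=0, j=1, k=2: a fraction f of observed bin 0 comes from true bin 1,
    a fraction f1 of observed bin 1 comes from true bin 2; observed bin 2 is clean."""
    P = np.array([[1 - f, 0.0,    0.0],
                  [f,     1 - f1, 0.0],
                  [0.0,   f1,     1.0]])     # rows: true bins, columns: observed bins
    return P.T @ np.diag(c) @ P              # equation (2) without shot noise

c = np.array([1.0, 1.0, 1.0])   # hypothetical equal true-bin spectra
for f1 in (0.0, 0.05, 0.10):
    f0 = ratio_estimate(chain_spectra(0.05, f1, c), 0, 1)
    print(f"f1 = {f1:.2f}: f0 = {f0:.4f} (true f = 0.05)")
```

For f = 0.05 the estimate grows from 0.050 (f₁ = 0) to about 0.052 and 0.055 for f₁ = 0.05 and 0.10; to first order $f_0 \approx f(1+f_1)$, a bias of roughly $f f_1$ that grows with the interloper fraction.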
![Two typical cases when using Hα (left panel) and [O iii] (right panel) lines to determine the redshifts with interloper fraction 5 per cent. The vertical axis zs labels the true redshift bins, the horizontal axis zHα and $z_{\rm [O\, \small {III}]}$ denote the redshift bins detected through emission lines. The 10 redshift bins are divided according to the strict corresponding relation in equation (1).](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/mnras/526/1/10.1093_mnras_stad2808/1/m_stad2808fig1.jpeg?Expires=1749375177&Signature=ooIMJ2-lR-Dv1TZo-PSuONzDtUfOZupYiKAEJiVOTw2OWHcSRuk2IwV44p6BLChoYHra8tr39WU1wNHQPSql~F3ZwrCRjpZN8aYxwXlfiI4SwjDfBEkLpwWnY7Kg84jkBAKtsCvHDC-YAVmPqlVizg0ZumDoWx31NlzJxLUsf3m~PXyAy0Vz0YIce6v~6jDmWXejpkamFENRGl8bDIayEZsgUclfKTPGbeVvUlDthVBJeaq51lq9FJSzwhO1~F5amJzdVBWz085H-yN5v9KcMTUieVzk8IrmAnTSLZWB7kEJ4HH17fHSN3DTabcXwBrBnTSO-ZQnxM8b6jroV9JnTQ__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Figure 1. Two typical cases in which the Hα (left panel) and [O iii] (right panel) lines are used to determine the redshifts, with an interloper fraction of 5 per cent. The vertical axis, $z_s$, labels the true redshift bins, and the horizontal axes, $z_{\rm H\alpha}$ and $z_{\rm [O\,III]}$, denote the redshift bins inferred from the emission lines. The 10 redshift bins are divided according to the strict correspondence in equation (1).
2.2 Modified algorithm
In Peng et al. (2022), an algorithm based on the self-calibration theory above was developed to calibrate photometric redshift errors. It seeks the solution by minimizing

$$\mathcal{J} = \sum_{\ell} \left\| C_{\ell}^{gg,D} - P^{\rm T} C_{\ell}^{gg,R} P \right\|_F^2, \qquad (4)$$
where ‖·‖F denotes the Frobenius norm. $\mathcal{J}$ measures the decomposition error between observations and reconstructions accumulated over all data matrices, where $C_{\ell}^{gg,D}$ is the observed power spectrum, and P and $C_{\ell}^{gg,R}$ are derived by the algorithm. We find that this self-calibration algorithm is also well suited to obtaining the interloper fractions in the observed redshift bins of spectroscopic surveys. Because of some properties unique to interloper bias, we make the following modifications to the algorithm so that it reconstructs the interloper fractions more efficiently.
First, the cross-power spectra in $C_{\ell}^{gg,D}$ and the off-diagonal elements of matrix P should exist only for redshift bin pairs satisfying equation (1). Thus, we set all other matrix elements to 0 during the iteration, and only the non-vanishing cross-power spectra are used to initialize the iteration. Secondly, with our redshift binning scheme, interlopers never come from adjacent redshift bins, so we need not worry about the non-vanishing large-scale cross-correlation between neighbouring bins when the bins are narrow. Therefore, we relax the limits on the largest scale and on the redshift bin width that were imposed in the self-calibration of photometric redshifts (Peng et al. 2022). Finally, thanks to the greatly reduced number of unknown parameters and the lower complexity of the system, we use only the first part of the original photometric redshift self-calibration algorithm, i.e. the fixed-point iteration, with the above modifications. The new procedure is summarized in Algorithm 1, and we find that it is sufficient to solve the problem and obtain accurate results. We refer readers to Zhang et al. (2017) and Peng et al. (2022) for the original algorithm used in photometric redshift self-calibration.
Algorithm 1. Modified algorithm for solving equation (3) to obtain the interloper fractions in redshift bins. The differences from the algorithm used in Peng et al. (2022) are shown in italics.
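Algorithm 1 itself is not reproduced in this text. As a rough illustration only, the sketch below implements a masked alternating least-squares (fixed-point) iteration for equation (3): it alternates a linear least-squares update of the diagonal $C_{\ell}^{gg,R}$ at fixed P with a least-squares update of P at fixed $C_{\ell}^{gg,R}P$, then zeros every element not allowed by equation (1) and resets $P_{ii} = 1 - f_i$. The specific update rules, the single start from P = I (instead of 1000 random initial matrices), the noise-free input and all function names are our assumptions, not the published procedure.

```python
import numpy as np

def model_spectra(P, c_true):
    """Equation (3) without noise: C^D(l) = P^T diag(c(l)) P for every band l."""
    return np.einsum("ki,lk,kj->lij", P, c_true, P)

def cost(CD, P, c_true):
    """Equation (4): sum over bands of the squared Frobenius norm of the residual."""
    return float(np.sum((CD - model_spectra(P, c_true)) ** 2))

def self_calibrate(CD, mask, n_iter=200):
    """Masked fixed-point iteration (illustrative sketch, not the published Algorithm 1).

    CD   : (n_bands, n, n) observed auto- and cross-spectra
    mask : (n, n) bool, True where true bin t may leak into observed bin i (P[t, i])
    """
    n = CD.shape[1]
    mask = mask & ~np.eye(n, dtype=bool)          # diagonal is fixed by the constraint
    keep = mask | mask.T | np.eye(n, dtype=bool)
    CD = np.where(keep, CD, 0.0)                  # only non-vanishing spectra enter
    P = np.eye(n)                                 # start from "no interlopers"
    for _ in range(n_iter):
        # (a) C^R step: normal equations ((P P^T) o (P P^T)) c = diag(P C^D P^T)
        M = (P @ P.T) ** 2
        rhs = np.einsum("ki,lij,kj->kl", P, CD, P)
        c_true = np.clip(np.linalg.solve(M, rhs).T, 0.0, None)
        # (b) P step: least-squares fit of C^D(l) ~ B(l)^T P_new with B(l) = C^R(l) P fixed
        B = c_true[:, :, None] * P[None, :, :]
        G = np.einsum("lkj,lmj->km", B, B)        # sum_l B B^T
        H = np.einsum("lkj,lji->ki", B, CD)       # sum_l B C^D
        P_new = np.linalg.solve(G, H)
        # (c) projection: zero forbidden entries, clip, and rebuild the diagonal so
        #     that every column sums to one (P_ii = 1 - f_i)
        off = np.clip(np.where(mask, P_new, 0.0), 0.0, 0.999)
        P = off + np.diag(1.0 - off.sum(axis=0))
    return P, c_true

# Toy problem: 7 bins, Hα-like pattern (true bin j+3 leaks into observed bin j).
n = 7
mask = np.zeros((n, n), dtype=bool)
f_true = np.array([0.05, 0.10, 0.08, 0.06])
P_true = np.eye(n)
for j, fj in enumerate(f_true):
    mask[j + 3, j] = True
    P_true[j + 3, j] = fj
    P_true[j, j] = 1.0 - fj
rng = np.random.default_rng(0)
c_input = rng.uniform(1.0, 3.0, size=(6, n))      # hypothetical true spectra, 6 bands
P_est, c_est = self_calibrate(model_spectra(P_true, c_input), mask)
print(np.round([P_est[j + 3, j] for j in range(4)], 4))
```

In this noise-free toy set-up the iteration should recover the input fractions. Because the mask leaves a single free entry per contaminated column, much of the error a step of this kind produces falls on forbidden elements and is projected out; with noisy spectra one would instead compare many starting points through $\mathcal{J}$, as described in Section 4.1.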
3 SIMULATION
The CSST is a 2-m space telescope planned for launch at the end of 2024. It will simultaneously conduct photometric imaging and slitless spectroscopic surveys, covering 17 500 deg² of sky in about 10 yr. The CSST has three bands, GU, GV, and GI, spanning wavelengths from ∼250 to ∼1100 nm. The Hα and [O iii] emission lines are the main targets of the CSST spectroscopic survey.
zCOSMOS is a survey covering 1.7 deg² with a magnitude limit of ≃22.5 (Lilly et al. 2007, 2009), close to the survey depth of the CSST. We therefore use the redshift distribution of the zCOSMOS catalogue, limited to magnitude 22.5, to construct the mock CSST spectroscopic survey. The redshift distribution is presented in Fig. 2, and the redshift bins are divided as in Fig. 1, ranging from 0 to 1.478. Note that the redshift range covered by a given emission line is limited by the instrumental wavelength coverage of each survey, and cannot reach 1.478 for Hα or [O iii] in the CSST spectroscopic survey. Here, we use this broader redshift range from zCOSMOS as an example to show the generality of our method, without imposing the strict redshift limits of the individual emission lines in real spectroscopic surveys.

Figure 2. The redshift distribution of the zCOSMOS catalogue, adopted to construct the mock catalogue for the CSST spectroscopic survey. The vertical axis indicates the number of objects per arcmin² in each observed redshift bin.
The simulation used to construct the mock galaxy catalogues is the same as in Peng et al. (2022): a high-resolution N-body simulation presented in Jing (2019) with a flat ΛCDM cosmology consistent with WMAP observations (Komatsu et al. 2011; Hinshaw et al. 2013). Using a particle-particle-particle-mesh gravity solver starting from z = 144, the simulation evolves 3072³ particles in a (600 Mpc h⁻¹)³ comoving volume with periodic boundary conditions. From the snapshots at various redshifts, we cut out curved slices 300 Mpc h⁻¹ thick and stack them to construct light-cones up to z ∼ 2.48. To avoid repeating structures along the line of sight, all boxes are randomly rotated and shifted before slicing. We constructed a total of 300 pseudo-independent light-cones, including lensing convergence maps, friends-of-friends haloes, and dark matter particle distributions.
We use 223 simulated maps, each covering 67.13 deg², for a total of ∼15 000 deg², similar to the CSST sky area after masking. We construct the mock catalogues as follows. First, we set the number of galaxies in each observed bin using the redshift distribution in Fig. 2. Then, the preset interloper fractions fix the number of galaxies in each true redshift bin. The haloes in the simulation are treated as galaxies, and we select haloes in descending order of mass until the required number is reached in each true redshift bin. Finally, to match the interloper fractions, we randomly assign the corresponding fraction of galaxies in each true redshift bin to the relevant observed bin. Note that the observed redshifts of interlopers are known to us because of the strict relationship in equation (1).
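The bookkeeping behind these steps can be sketched as follows for Hα-based redshifts, where a fraction $f_j$ of observed bin j comes from true bin j + 3. The halo selection by mass is not modelled; the integer rounding of the interloper numbers, the equal bin counts in the example and all function names are assumptions for illustration.

```python
import numpy as np

def true_bin_counts(n_obs, f, shift=3):
    """Galaxies to draw in each true-z bin so that observed bin j holds n_obs[j]
    galaxies, a fraction f[j] of which are interlopers from true bin j + shift."""
    n_true = np.array(n_obs, dtype=int)
    for j, fj in enumerate(f):
        moved = int(round(fj * n_obs[j]))
        n_true[j] -= moved            # genuine members of observed bin j
        n_true[j + shift] += moved    # extra galaxies needed in true bin j + shift
    return n_true

def assign_observed_bins(true_bin, n_obs, f, shift=3, seed=None):
    """Randomly relabel the required number of true-bin (j + shift) galaxies as
    interlopers in observed bin j; equation (1) fixes which bin they land in."""
    rng = np.random.default_rng(seed)
    obs_bin = true_bin.copy()
    for j, fj in enumerate(f):
        moved = int(round(fj * n_obs[j]))
        members = np.flatnonzero(true_bin == j + shift)
        obs_bin[rng.choice(members, size=moved, replace=False)] = j
    return obs_bin

# Hypothetical example: 1000 galaxies per observed bin, 5 per cent interlopers in bins 1-7.
n_obs = np.full(10, 1000)
f = np.full(7, 0.05)
n_true = true_bin_counts(n_obs, f)
true_bin = np.repeat(np.arange(10), n_true)   # stands in for the selected haloes per bin
obs_bin = assign_observed_bins(true_bin, n_obs, f, seed=1)
print(n_true, np.bincount(obs_bin, minlength=10))
```

Each observed bin ends up with its target count, while true bins 8–10, whose observed counterparts contain no interlopers here, must be filled with 5 per cent more haloes than their observed counts.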
We use healpix (Górski et al. 2005) to construct the galaxy overdensity maps with Nside = 1024, corresponding to a spatial resolution of ∼3.4 arcmin. We use the function compute_coupled_cell in namaster (Alonso et al. 2019) to measure the angular power spectra of galaxies. As mentioned in Section 2.2, we do not cut out the large-scale data here. Thus, the ℓ modes from the fundamental mode ℓmin = 44 to ℓmax = 1000 are used in the analysis, divided into six broad bands: [44, 410), [410, 578), [578, 708), [708, 817), [817, 913), and [913, 1000).
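The quoted numbers can be reproduced with a few lines of arithmetic. The sketch below computes the HEALPix pixel scale for Nside = 1024, the fundamental mode of a 67.13 deg² map under a square-patch assumption, and the number of (2ℓ + 1) modes in each band. The plain band average is only illustrative: compute_coupled_cell returns coupled pseudo-Cℓ, and in practice binning and mode decoupling would go through NaMaster's own tools (e.g. NmtBin).

```python
import numpy as np

n_side = 1024
pix_area_sr = 4.0 * np.pi / (12 * n_side**2)           # HEALPix pixels have equal area
pix_arcmin = np.degrees(np.sqrt(pix_area_sr)) * 60.0    # pixel scale in arcmin

map_area_deg2 = 67.13
side_rad = np.radians(np.sqrt(map_area_deg2))           # square-patch approximation (assumption)
ell_fundamental = 2.0 * np.pi / side_rad

band_edges = np.array([44, 410, 578, 708, 817, 913, 1000])
# Modes in each half-open band [lo, hi): sum_{l=lo}^{hi-1} (2l + 1) = hi^2 - lo^2
modes = band_edges[1:] ** 2 - band_edges[:-1] ** 2

def bin_power(ell, cl, edges=band_edges):
    """Unweighted average of a per-multipole spectrum in each band (illustrative only)."""
    return np.array([cl[(ell >= lo) & (ell < hi)].mean()
                     for lo, hi in zip(edges[:-1], edges[1:])])

print(round(pix_arcmin, 2), round(ell_fundamental), modes)
```

The pixel scale comes out at ≈3.44 arcmin and the fundamental mode at ≈44. The six bands each hold ≈1.66 × 10⁵ modes, equal to within 1 per cent, which suggests the band edges were chosen to contain equal numbers of modes; the text does not state this explicitly.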
4 RESULTS
4.1 Fiducial results
In this work, we take interloper fractions of 1, 5, and 10 per cent as examples. We assess the performance of the self-calibration algorithm when the Hα and [O iii] lines, respectively, are used to determine the redshifts of galaxies in the mock catalogues. As in Peng et al. (2022), we input $\ell C_{\ell}^{gg,D}$ with 1000 initial matrices, and the solution selection criterion is also the same as in Peng et al. (2022), via
Here, $\mathcal{J}_{\min}$ is the smallest $\mathcal{J}$ obtained by the modified algorithm. We take the average of the solutions in this range as the final result. However, unlike Peng et al. (2022), the reconstruction uncertainty here is not given by the standard deviation of the selected solutions. Because this problem has far fewer unknowns, the selected solutions are highly concentrated, and their variance is not a good estimate of the uncertainty induced by the reconstruction algorithm. Instead, we divide the maps into 10 groups, each covering ∼1500 deg². We then use the standard deviation of the reconstruction results from these 10 groups, divided by $\sqrt{10}$, as the uncertainty of the complete sample.
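The selection and error estimate can be sketched as below. Equation (5) is not reproduced here, so the relative tolerance `rel_tol` on $\mathcal{J}$ is a hypothetical stand-in for its threshold, and the use of the sample standard deviation (ddof = 1) is an assumption; the numbers are made up.

```python
import numpy as np

def select_and_average(solutions, costs, rel_tol):
    """Mean of the solutions with J <= (1 + rel_tol) * J_min.
    rel_tol is a hypothetical stand-in for the threshold of the selection criterion."""
    costs = np.asarray(costs, dtype=float)
    keep = costs <= (1.0 + rel_tol) * costs.min()
    return np.asarray(solutions, dtype=float)[keep].mean(axis=0)

def group_uncertainty(group_results):
    """Standard deviation across sky groups divided by sqrt(N_groups).
    ddof=1 (sample standard deviation) is an assumption."""
    r = np.asarray(group_results, dtype=float)
    return r.std(axis=0, ddof=1) / np.sqrt(r.shape[0])

# Hypothetical numbers: three candidate solutions for one interloper fraction,
# and ten per-group reconstructions of the same fraction.
best = select_and_average([[0.050], [0.051], [0.200]], [1.00, 1.01, 5.00], rel_tol=0.05)
sigma = group_uncertainty([0.048, 0.052] * 5)
print(best, sigma)
```

With ten groups, dividing by √10 converts the group-to-group scatter, measured on ∼1500 deg² subsamples, into an estimate for the full ∼15 000 deg² sample.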
As shown in Fig. 1, interlopers occur in only seven observed redshift bins in our mock data: bins 1–7 and bins 4–10 when the redshifts are determined by the Hα and [O iii] lines, respectively. The points and error bars in Fig. 3 show the reconstruction results for the different cases when cosmic magnification is ignored. We present only the reconstructed off-diagonal elements, i.e. the interloper fraction $f_i$ in each tomographic bin, because the diagonal elements follow from the column-sum-to-one constraint. The biases of the reconstructed interloper fractions are very small, except in the first redshift bin when Hα is used to determine the redshift. This underestimation also occurs with the method of Gong et al. (2021) (see Appendix A). We argue that this deviation arises partly from the low S/N of the power spectrum measurement in the lowest redshift bin and partly from cosmic variance, so it is not a major concern here. We also find that the reconstruction accuracy is insensitive to the interloper fraction. Taking all three cases together, the mean absolute bias of the reconstructed interloper fractions is 0.0017 for the remaining six redshift bins when Hα is used to determine the redshifts, and 0.0021 for all seven redshift bins when redshifts are determined by [O iii].
![The reconstruction results when using $\rm H\alpha$ (left panel) and $\rm [O\, \small {III}]$ (right panel) lines to determine the redshifts of galaxies, with interloper fractions at 1 per cent, 5 per cent, and 10 per cent. The points and error bars indicate the biases and 1σ uncertainties of the reconstructed interloper fractions in different cases.](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/mnras/526/1/10.1093_mnras_stad2808/1/m_stad2808fig3.jpeg?Expires=1749375177&Signature=BGLSkDuWGCPf7COo7601GGDPUbqwIIT6T6EAXRCa4L~j9ZS4rLDTFhjrm-hPtqAa8vlsYNlUmYUdMJaihdKELVRCP9aIe4Vyi-Mw5ZzuRy6VXcNxZaBxH~WH5I0snurr0eZ0y8r3Bpjh9N8iI2i~fkT7T10hHZrqzX~71-CsrGE4kXZ1ffx1mCENUkf08IiLf5sf5zmZj8L04pZzr3oJoa0L2mvd1vcDIsFnPDzkuuFdA3tAZdWNTWr6WlQUdhNaahbBQcdwWDnGD3UnTmSGgZriOTelytIsk0jHZEdoNN6iySUxwTNnkNlWejK2q2G-zMJKs3S5LD8Dcf2HD6fw2A__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Figure 3. Reconstruction results when the Hα (left panel) and [O iii] (right panel) lines are used to determine the redshifts of galaxies, with interloper fractions of 1, 5, and 10 per cent. The points and error bars indicate the biases and 1σ uncertainties of the reconstructed interloper fractions in the different cases.
Moreover, with the interloper fractions reconstructed by the self-calibration algorithm, the bias in the mean redshift of each tomographic bin can be reduced. We use the approximation proposed by Zhang et al. (2010) to estimate the mean true redshift of an observed bin i:

$$\langle z_i \rangle \simeq \sum_j P_{ji}\, \langle z_j^{D} \rangle. \qquad (6)$$
Here, $\langle z_j^{D}\rangle$ is the mean observed redshift of observed bin j, which is assumed to be approximately equal to the unknown mean true redshift of true bin j. The biases of the mean redshifts obtained from equation (6) are shown in Fig. 4. The error bars again come from the standard deviation of the mean redshifts estimated from the 10 groups, divided by $\sqrt{10}$. The deviation from the true value in each tomographic bin is successfully reduced to ∼0.001(1 + z).
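Equation (6) is a single matrix product, $\langle z\rangle = P^{\rm T}\langle z^{D}\rangle$. The sketch below applies it to the Hα binning and also propagates a uniform 0.002 error in every $f_i$. Bin midpoints stand in for the measured $\langle z_j^{D}\rangle$, which is an assumption, and the helper names are hypothetical.

```python
import numpy as np

def mean_true_redshift(P, z_obs_mean):
    """Equation (6): <z_i> ~= sum_j P_ji <z_j^D>, i.e. P^T applied to the bin means."""
    return P.T @ np.asarray(z_obs_mean)

def halpha_matrix(f, n_bins=10, shift=3):
    """P for Hα-based redshifts: fraction f[j] of observed bin j comes from true bin j+shift."""
    P = np.eye(n_bins)
    for j, fj in enumerate(f):
        P[j + shift, j] = fj
        P[j, j] = 1.0 - fj
    return P

edges = np.array([0.000, 0.100, 0.200, 0.311, 0.442, 0.573,
                  0.719, 0.890, 1.062, 1.254, 1.478])
z_mid = 0.5 * (edges[:-1] + edges[1:])      # stand-in for <z_j^D> (assumption)

z_true = mean_true_redshift(halpha_matrix(np.full(7, 0.05)), z_mid)
z_off = mean_true_redshift(halpha_matrix(np.full(7, 0.052)), z_mid)  # f biased by 0.002
frac_err = np.abs(z_off - z_true) / (1.0 + z_true)
print(frac_err.max())
```

A 0.002 error in each interloper fraction shifts the mean redshifts by about 6 × 10⁻⁴(1 + z) for this binning, roughly in line with the ∼0.002 and ∼0.001(1 + z) accuracies quoted for f and ⟨z⟩.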
![Bias on the mean redshift in each tomographic bin estimated from equation (6) with the reconstructed results when using $\rm H\alpha$ (left panel) and $\rm [O\, \small {III}]$ (right panel) lines to determine the redshifts of galaxies. The points and error bars indicate the bias and 1σ uncertainties of the estimated mean redshift for each tomographic bin in different cases. The grey and light grey region show the accuracy of δ〈z〉 < 0.001(1 + z) and δ〈z〉 < 0.003(1 + z), respectively.](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/mnras/526/1/10.1093_mnras_stad2808/1/m_stad2808fig4.jpeg?Expires=1749375177&Signature=Ug~a-6-t0W-gr173NCqPR7uiGWNZIm40HRQO-coCwSMEBHl2Sb-oJN9-8fboZnr6C1Ff1WVtdB8hQqsa0mBhEWljG3ZQNhTOa7n7LWmE31uXgyIzuwoT1EoC3BTc4nh3Ht5phKojxcWiF7a17GQGa4Q3TwWv0SmwMcQJOspt7LovBCJCsPlsGHqmC2BkYJ98tNhBX90ek0uAdJLXUcvqSVurxidjeg1JfnBsc2FJ9hhFsMnJJgIV31Q0ais9pf2f7MsoQviDty1B~YmvU4pRmVJQE85FwBtboXyHvSFWz3BUL1Ivaywb0-3axFAA0YeTCOu1av2fr5yV2M6X9kytGA__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Figure 4. Bias on the mean redshift in each tomographic bin estimated from equation (6) with the reconstructed results, when the Hα (left panel) and [O iii] (right panel) lines are used to determine the redshifts of galaxies. The points and error bars indicate the bias and 1σ uncertainty of the estimated mean redshift for each tomographic bin in the different cases. The grey and light grey regions show accuracies of δ〈z〉 < 0.001(1 + z) and δ〈z〉 < 0.003(1 + z), respectively.
4.2 Cosmic magnification
In a real survey, gravitational lensing alters galaxy positions and fluxes, which changes the observed galaxy clustering. This effect, known as magnification bias, is an important source of systematic error for precision cosmology. The galaxy number overdensity after lensing is

$$\delta_g^{L} = \delta_g + g\,\kappa. \qquad (7)$$
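Here, κ is the lensing convergence and the pre-factor is g ≡ 2(α − 1), where α is the logarithmic slope of the cumulative number counts at the flux limit Flim, defined as

$$\alpha \equiv -\left.\frac{{\rm d}\ln n(>F)}{{\rm d}\ln F}\right|_{F_{\rm lim}}. \qquad (8)$$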
With the lensed galaxy distribution, the power spectrum between an observed foreground redshift bin i and a background redshift bin j can be approximated by the simple form

$$C_{ij,L}^{gg,D} \simeq C_{ij}^{gg,D} + g_j\, C_{ij}^{g\kappa,D}, \qquad (9)$$
where $C_{ij}^{g\kappa,D}$ is the galaxy–galaxy lensing power spectrum between observed redshift bins. This effect has been ignored in previous related literature. However, given the high precision of the interloper fractions reconstructed in the previous section, we investigate the impact of cosmic magnification on our self-calibration method.
The lensing convergence κ in each redshift bin is given by

$$\kappa = \frac{\int n(z_s)\,\kappa(z_s)\,{\rm d}z_s}{\int n(z_s)\,{\rm d}z_s}, \qquad (10)$$
where n(zs) is the distribution of true galaxy redshifts. The logarithmic slopes α of the 10 redshift bins are set to [0.20, 0.26, 0.35, 0.55, 0.90, 1.35, 1.90, 2.35, 2.85, and 3.50], consistent with Unruh et al. (2020). We then use the lensed galaxy number overdensity, equation (7), to calculate the power spectra and apply the self-calibration algorithm.
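The pre-factors implied by these slopes, together with discrete versions of equations (7) and (10), are sketched below. Sampling κ(zs) and n(zs) on a grid of source redshifts is an assumption, and the function names are hypothetical.

```python
import numpy as np

# Logarithmic slopes of the 10 bins quoted in the text (after Unruh et al. 2020).
alpha = np.array([0.20, 0.26, 0.35, 0.55, 0.90, 1.35, 1.90, 2.35, 2.85, 3.50])
g = 2.0 * (alpha - 1.0)          # pre-factor of equation (7)

def bin_convergence(kappa_zs, n_zs):
    """Discrete version of equation (10): n(z_s)-weighted mean of kappa(z_s)."""
    n_zs = np.asarray(n_zs, dtype=float)
    return np.sum(n_zs * np.asarray(kappa_zs, dtype=float)) / np.sum(n_zs)

def lensed_overdensity(delta_g, kappa, g_bin):
    """Equation (7): delta_g^L = delta_g + g * kappa."""
    return delta_g + g_bin * kappa

print(np.round(g, 2))
```

The pre-factor changes sign between the fifth and sixth bins (α = 0.90 to 1.35): behind overdensities, magnification lowers the counts of the low-redshift samples (g < 0) but raises those of the high-redshift ones, with g reaching 5.0 in the last bin, which is consistent with the larger high-redshift biases shown in Fig. 5.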
In Fig. 5, we compare the reconstructed interloper fractions with and without magnification for all cases. Cosmic magnification clearly degrades the accuracy drastically and must be taken into account. The algorithm mistakes the lensing contribution for the effect of interlopers, and the biases are larger for high-redshift bins, where α, and hence g, is larger. To preserve the high precision of the self-calibration method, it is essential to eliminate the magnification effect.

Figure 5. Comparison between the reconstructed interloper fractions after lensing (black points) and the fiducial results before lensing (red points). Cosmic magnification drastically degrades the accuracy and needs to be eliminated.
We propose a convenient and efficient elimination method. Assume that there are interloper galaxies between redshift bins i and j. The term $g_j C_{ij}^{g\kappa,D}$ is the lensing magnification term we need to eliminate. We can easily define two assistant redshift bins, i− and i+, whose upper and lower boundaries coincide with the lower and upper boundaries of bin i, respectively. When the widths of bins i− and i+ equal that of bin i, we have the first-order approximation

$$C_{ij}^{g\kappa,D} \simeq \frac{1}{2}\left(C_{i_-j}^{g\kappa,D} + C_{i_+j}^{g\kappa,D}\right). \qquad (11)$$
Note that there are no interlopers between the assistant bins i± and redshift bin j, so their cross-correlations arise from galaxy–galaxy lensing alone, and we have

$$C_{i_\pm j,L}^{gg,D} = g_j\, C_{i_\pm j}^{g\kappa,D}. \qquad (12)$$
We can therefore use the assistant redshift bins to estimate the lensing magnification signal between bins i and j, and subtract it from the measured cross-power spectrum $C_{ij,L}^{gg,D}$ to obtain $C_{ij}^{gg,D}$. When bin i is the first redshift bin, we estimate the magnification signal by working backward from two higher-redshift assistant bins, i+ and i++, although this may introduce a relatively large error. Additionally, in a few rare cases involving the first bin, the estimated magnification contribution has the opposite sign to the pre-factor g in equation (7), which must be due to large noise; in these cases we simply ignore the magnification. Furthermore, if the bins are too wide to find assistant bins of equal width, we make the widths as close as possible and account for the difference in bin width.
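The subtraction step can be sketched as below. The average over i− and i+ follows the first-order approximation (11) combined with equation (12); the backward linear extrapolation $2C_{i_+j} - C_{i_{++}j}$ for the first bin and the band-by-band sign test are our assumed forms of the procedure described above, and the function names are hypothetical.

```python
import numpy as np

def magnification_estimate(C_minus_j, C_plus_j):
    """Equations (11)-(12): g_j C^{g kappa}_ij ~ mean of the measured assistant-bin
    cross-spectra C^{gg,D}_{i-j,L} and C^{gg,D}_{i+j,L} (band by band)."""
    return 0.5 * (np.asarray(C_minus_j, dtype=float) + np.asarray(C_plus_j, dtype=float))

def magnification_estimate_first_bin(C_plus_j, C_plusplus_j):
    """First bin: linear extrapolation backward from bins i+ and i++ (assumed form)."""
    return 2.0 * np.asarray(C_plus_j, dtype=float) - np.asarray(C_plusplus_j, dtype=float)

def remove_magnification(C_ij_lensed, estimate, g_j):
    """Subtract the estimate from C^{gg,D}_{ij,L}; bands where the estimate has the
    opposite sign to g_j are treated as noise-dominated and left uncorrected."""
    estimate = np.asarray(estimate, dtype=float)
    estimate = np.where(np.sign(estimate) == np.sign(g_j), estimate, 0.0)
    return np.asarray(C_ij_lensed, dtype=float) - estimate

# Hypothetical signal that varies linearly with the foreground-bin redshift.
z_centres = np.array([0.4, 0.5, 0.6, 0.7])    # bins i-, i, i+, i++
signal = 1.0 + 2.0 * z_centres
print(magnification_estimate(signal[0], signal[2]),
      magnification_estimate_first_bin(signal[2], signal[3]))
```

For a signal that varies linearly with the foreground-bin redshift, both the symmetric average and the backward extrapolation recover the value in bin i exactly; curvature in the signal and noise in the assistant bins introduce residual errors, most strongly for the extrapolated first bin.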
Figs 6 and 7 show the results after applying the elimination method. Fig. 6 shows that the accuracy of the reconstructed interloper fractions is effectively recovered, with a slightly larger uncertainty (∼50 per cent increase). This indicates that our elimination method efficiently removes the impact of cosmic magnification and can be applied in practice to ensure high reconstruction accuracy. Fig. 7 shows that the accuracy of the mean redshift in each tomographic bin is also recovered to a level similar to the fiducial results. Therefore, despite the influence of cosmic magnification, the reconstruction with our elimination method still reaches similarly high accuracy.

Figure 6. Same as Fig. 5, but the blue points show the results after applying the elimination method to remove the impact of cosmic magnification. The comparison shows the effectiveness of our elimination method: the biases of the interloper fractions are reduced to a level similar to the fiducial results before lensing.

Figure 7. Comparison between the bias on the mean redshift in each tomographic bin after applying the elimination method to remove the impact of cosmic magnification (blue points) and the fiducial results before lensing (red points). The self-calibration accuracy is effectively recovered with a slightly larger uncertainty.
5 CONCLUSIONS AND DISCUSSION
We modified the algorithm previously used to self-calibrate photometric redshift errors and applied it to calibrate the interloper bias in spectroscopic surveys. The algorithm was validated on CSST-based mock data, with precise reconstructions of the interloper fraction and mean redshift in each tomographic bin. The power spectra in the true redshift bins can also be precisely derived at the same time. We also found that cosmic magnification can be a major issue, drastically degrading the accuracy of the self-calibration results. We therefore proposed a convenient and efficient elimination method to ensure high accuracy in practice. After applying it, the biases of the reconstructed interloper fractions are reduced to a level similar to that without magnification. This will play a crucial role in many practical analyses.
In this work, we took the Hα and [O iii] lines as an example to validate the self-calibration method. They are widely used as target lines in future Stage IV spectroscopic surveys (e.g. RST, Euclid, and CSST). Although these surveys cover different redshift ranges, the stability of the reconstruction accuracy across redshift bins indicates that the self-calibration algorithm and elimination method should be effective for all of them. The method is equally applicable to interloper contamination between other emission lines, such as Lyα and [O ii] in HETDEX, where approximately 95 per cent of emission-line detections are spectra containing only one apparently single-peaked emission line (Davis et al. 2023). Moreover, as long as the correspondence relation is satisfied, the binning scheme and redshift range can be adapted to the requirements of different analyses or to the instrumental wavelength coverage of a survey. Our method of removing the magnification impact is not the only approach, but it is the one with minimal assumptions, and other feasible methods can also be incorporated into the self-calibration process. For example, a single assistant redshift bin combined with an assumed cosmology could serve the same purpose.
So far, our work has been carried out on simulated data. The results give us confidence that our self-calibration algorithm, together with the elimination method, will enable accurate and well-calibrated redshifts for analyses in CSST and other ongoing and future spectroscopic surveys. We defer the application to real data to future work.
ACKNOWLEDGEMENTS
We thank Pengjie Zhang and Xin Wang for useful discussions. This work is supported by the National Key Basic Research and Development Program of China (number 2018YFA0404504), the National Science Foundation of China (grant numbers 12273020, 11621303, and 11890691), the China Manned Space Project with numbers CMS-CSST-2021-A02 and CMS-CSST-2021-A03, the ‘111’ Project of the Ministry of Education under grant number B20019, and sponsorship from the Yangyang Development Fund. This work made use of the Gravity Supercomputer at the Department of Astronomy, Shanghai Jiao Tong University.
DATA AVAILABILITY
All data included in this study are available upon reasonable request to the corresponding author.
APPENDIX A: PREVIOUS METHOD AND ITS IMPROVEMENT BY ITERATION
In Gong et al. (2021), the interloper fraction from the true redshift bin j to the observed redshift bin i is estimated by a simple ratio, which we denote as f0 here: |$f_0 \simeq C_{ji}^{gg,D}/C_{jj}^{gg,D}$| (A1).
The implicit assumption in the above estimate is that there are no interlopers in the observed redshift bin j. However, this is largely untrue in practice, especially when many redshift bins are used. In a more realistic case, the ratio derived from equation (2) is
Here, we further consider interloper galaxies from the true redshift bin k to the observed bin j. For example, when Hα lines are used to determine the redshifts, redshift bins i, j, and k can correspond to bins 4, 7, and 10, respectively. This allows us to derive the bias on f0 caused by neglecting these secondary interlopers. Denoting the interloper fraction Pkj = f1, equation (A2) can be Taylor-expanded as follows:
Hence the ratio is close to f0 only if f1 is very small.
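For intuition about the size of this bias, the Python sketch below evaluates the ratio in a simplified toy model. It is not the paper's equation (A2): it assumes, hypothetically, that f0 and f1 are fractions of the observed-bin samples, that observed bin i is a mixture of (1 − f0) true-i and f0 true-j galaxies, that observed bin j is a mixture of (1 − f1) true-j and f1 true-k galaxies, and that distinct true bins are uncorrelated. The function name `toy_ratio` is illustrative only.

```python
def toy_ratio(f0, f1, c_jj, c_kk):
    """Ratio C_ji^D / C_jj^D in a hypothetical two-step mixing model.

    Assumptions (illustrative only, not the paper's equation A2):
      observed i = (1 - f0) * true i + f0 * true j
      observed j = (1 - f1) * true j + f1 * true k
      true bins i, j, k are mutually uncorrelated;
      c_jj, c_kk are the true auto power spectra of bins j and k.
    """
    # Only the true-j galaxies are shared by observed bins i and j.
    c_ji_obs = f0 * (1.0 - f1) * c_jj
    # The observed-j auto spectrum also picks up the secondary interlopers from k.
    c_jj_obs = (1.0 - f1) ** 2 * c_jj + f1 ** 2 * c_kk
    return c_ji_obs / c_jj_obs


if __name__ == "__main__":
    f0 = 0.05
    for f1 in (0.0, 0.01, 0.05, 0.10):
        r = toy_ratio(f0, f1, c_jj=1.0, c_kk=1.0)
        print(f"f1 = {f1:.2f}: ratio = {r:.5f}, relative bias = {r / f0 - 1:+.4f}")
```

With equal true spectra in bins j and k, the relative bias of the ratio in this toy model is close to f1 at first order, so the ratio approaches f0 only when the secondary interloper fraction is small.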
In Fig. A1, we compare the results of the ratio method based on equation (A1) with those of our self-calibration algorithm. We note that the improved accuracy of the ratio method relative to the results in Gong et al. (2021) may be due to our use of angular power spectra instead of the angular correlation function, or to a difference in the range of scales used in the analysis. In addition, the error bars in Gong et al. (2021) are obtained by averaging the errors over different angular bins, whereas here we estimate the error bars from the 10 groups. In some redshift bins, the reconstruction accuracy of the ratio method clearly degrades as the interloper fraction rises, in line with the expectation from equation (A3). By contrast, the accuracy of the results obtained with our self-calibration method remains stable across the different cases.

Figure A1. Comparison of results from the self-calibration algorithm (red points) and the ratio method of Gong et al. (2021) (blue points). In some redshift bins, the reconstruction accuracy of the previous ratio method degrades as the interloper fractions rise.
Using equation (A3), we can in fact derive an iterative method based on the relation between the ratio |$C_{ji}^{gg,D}/C_{jj}^{gg,D}$| and f0. Because the redshifts in a given catalogue have upper and lower limits, Pkk = 1 for k at the high- or low-redshift end (depending on the case), and the interloper fraction Pkj is then exactly equal to the ratio |$C_{kj}^{gg,D}/C_{kk}^{gg,D}$|. This estimate of Pkj, i.e. f1 in equation (A3), can then be used to obtain the interloper fraction f0. Proceeding iteratively in this way, all the true interloper fractions can be derived in order. Here we simplify equation (A3) to
because the interloper fractions are usually small. Nevertheless, if the value of |$C_{kk}^{gg,P}/C_{jj}^{gg,P}$| is found to be much larger than 1, we can use it to approximate |$C_{kk}^{gg,R}/C_{jj}^{gg,R}$| in equation (A3) in the calculation. The results after applying this iterative method are shown in Fig. A2. We find that it successfully resolves the problem of the previous ratio method, with an accuracy comparable to that of the self-calibration algorithm. However, in complex cases that require multistep iterations, the accuracy of this approach may degrade. Furthermore, as noted in Sun et al. (2023), this naive ratio estimator is sub-optimal and biased in terms of statistical errors. The self-calibration algorithm is therefore more practical and can be applied in complicated cases.
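A minimal sketch of the iterative procedure is given below in Python. It assumes a hypothetical toy model, not the paper's equations (A3) and (A4): along a chain of bins b_0, b_1, b_2, ..., the observed bin b_{m+1} contains a fraction p_m of galaxies from the true bin b_m, the end bin b_0 receives no interlopers, and distinct true bins are uncorrelated. In the paper's example the chain would be (k, j, i), so p_0 plays the role of f1 and p_1 that of f0. In this model the measured ratio r_m = C^D_{b_m b_{m+1}} / C^D_{b_m b_m} equals p_m (1 − p_{m−1}) / [(1 − p_{m−1})^2 + p_{m−1}^2 ρ_m], where ρ_m is the ratio of the true auto spectra of bins b_{m−1} and b_m, so each fraction follows from the previous one. The function `iterate_fractions` is illustrative only.

```python
def iterate_fractions(ratios, rho=None):
    """Recover interloper fractions bin by bin from measured ratios.

    Hypothetical toy model (not the paper's equations A3/A4):
      ratios[m] = C^D_{b_m b_{m+1}} / C^D_{b_m b_m} along a chain b_0, b_1, ...
      p[m]      = fraction of observed bin b_{m+1} coming from true bin b_m
      bin b_0 receives no interlopers, so p[0] = ratios[0] exactly.
      rho[m]    = C^R_{b_{m-1} b_{m-1}} / C^R_{b_m b_m}; rho=None drops the
                  second-order p[m-1]**2 term (small-fraction limit).
    """
    fractions = []
    p_prev = 0.0  # the end bin b_0 has no interlopers
    for m, r in enumerate(ratios):
        rho_m = 0.0 if rho is None else rho[m]
        # Invert r = p (1 - p_prev) / [(1 - p_prev)^2 + p_prev^2 rho_m] for p.
        p = r * ((1.0 - p_prev) ** 2 + p_prev ** 2 * rho_m) / (1.0 - p_prev)
        fractions.append(p)
        p_prev = p
    return fractions


if __name__ == "__main__":
    # Forward-model two ratios from chosen toy fractions, then recover them.
    p_true = [0.10, 0.05]
    rho = [0.0, 1.0]  # rho[0] is unused because b_0 has no interlopers
    r0 = p_true[0]
    r1 = p_true[1] * (1 - p_true[0]) / ((1 - p_true[0]) ** 2 + p_true[0] ** 2 * rho[1])
    print(iterate_fractions([r0, r1], rho))  # recovers approximately [0.10, 0.05]
    print(iterate_fractions([r0, r1]))       # small-fraction approximation
```

In this toy model the recursion is exact when ρ is known; when the second-order term is dropped, the small error made at each step propagates to the next fraction, which is one way to see why multistep iterations may lose accuracy.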

Figure A2. Same as Fig. A1, but the blue points are further calibrated using the iterative method, following equation (A4). The iterative method successfully resolves the problem of the previous ratio method and performs comparably to the self-calibration algorithm.
APPENDIX B: INTERLOPER CONTAMINATION BETWEEN ANOTHER PAIR OF EMISSION LINES
Here, we also investigate the accuracy of the self-calibration algorithm and elimination method for contamination between the Hα 6563 Å and [O ii] 3727 Å emission lines, which are more widely separated in wavelength than the fiducial pair. To match the relation in equation (1), we divide the redshift range 0 < z < 1.478 into a different set of nine tomographic bins with edges z = 0.000, 0.100, 0.200, 0.300, 0.407, 0.761, 0.937, 1.113, 1.289, and 1.478. Contamination occurs between bins j and j + 5, for j = 1, …, 4.
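The pairing of bins j and j + 5 can be checked with the wavelength-matching condition: an Hα emitter at redshift z and an [O ii] emitter at redshift z' produce a line at the same observed wavelength when (1 + z) × 6563 Å = (1 + z') × 3727 Å. The Python sketch below (function names hypothetical) maps each lower-redshift bin edge through this condition and compares it with the quoted edges, to the three-decimal precision at which they are given.

```python
# Rest-frame wavelengths in angstroms, as quoted in Appendix B.
LAMBDA_HALPHA = 6563.0
LAMBDA_OII = 3727.0

# Tomographic bin edges quoted in Appendix B (nine bins).
EDGES = [0.000, 0.100, 0.200, 0.300, 0.407, 0.761, 0.937, 1.113, 1.289, 1.478]


def misidentified_redshift(z_true, lam_true, lam_assumed):
    """Redshift assigned when a line with rest wavelength lam_true is taken to
    be a line with rest wavelength lam_assumed at the same observed wavelength:
    (1 + z_true) * lam_true == (1 + z_assigned) * lam_assumed."""
    return (1.0 + z_true) * lam_true / lam_assumed - 1.0


def paired_bins(edges, shift, lam_lowz_line, lam_highz_line, tol=1e-3):
    """Return 1-based (j, j + shift) pairs whose edges map onto each other.

    lam_lowz_line is the line assumed in the lower-redshift bins and
    lam_highz_line the line assumed in the higher-redshift bins;
    tol reflects the three-decimal precision of the quoted edges.
    """
    pairs = []
    n_bins = len(edges) - 1
    for j in range(n_bins - shift):
        z_lo = misidentified_redshift(edges[j], lam_lowz_line, lam_highz_line)
        z_hi = misidentified_redshift(edges[j + 1], lam_lowz_line, lam_highz_line)
        if abs(z_lo - edges[j + shift]) < tol and abs(z_hi - edges[j + shift + 1]) < tol:
            pairs.append((j + 1, j + 1 + shift))
    return pairs


if __name__ == "__main__":
    print(paired_bins(EDGES, 5, LAMBDA_HALPHA, LAMBDA_OII))  # → [(1, 6), (2, 7), (3, 8), (4, 9)]
```

Bin 5 (0.407 < z < 0.761) has no partner in this scheme, because an Hα emitter in that bin would be assigned an [O ii] redshift above z = 1.478.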
The results for the Hα and [O ii] emission lines, shown in Fig. B1, are consistent with those in the main text. This consistency indicates that our method can be generalized to other similar spectroscopic surveys.