Convolution- and deconvolution-based estimates of galaxy scaling relations from photometric redshift surveys

Sheth, Ravi K.; Rossi, Graziano

doi:10.1111/j.1365-2966.2010.16258.x

Abstract

In addition to the maximum likelihood approach, there are two other methods that are commonly used to reconstruct the true redshift distribution from photometric redshift data sets: one uses a deconvolution method, and the other a convolution. We show how these two techniques are related, and how this relationship can be extended to include the study of galaxy scaling relations in photometric data sets. We then show what additional information photometric redshift algorithms must output so that they too can be used to study galaxy scaling relations, rather than just redshift distributions. We also argue that the convolution-based approach may permit a more efficient selection of the objects for which calibration spectra are required.

methods: analytical, methods: statistical, galaxies: formation, cosmology: observations

1 INTRODUCTION

The next generation of sky surveys will provide reasonably accurate photometric redshift estimates, so there is considerable interest in the development of techniques that can use these noisy distance estimates to provide unbiased estimates of galaxy scaling relations. While there exist a number of methods for estimating photometric redshifts (Budavári 2009 and references therein), there are fewer for using these to estimate accurate redshift distributions (Padmanabhan et al. 2005; Sheth 2007; Lima et al. 2008; Cunha et al. 2009), the luminosity function (Sheth 2007) or the joint luminosity–size, colour–magnitude, etc. relations (Rossi & Sheth 2008; Christlein et al. 2009; Rossi, Sheth & Park 2010).

Ideally, the output from a photometric redshift estimator is a normalized likelihood function that gives the probability that the true redshift is z given the observed colours c (e.g. Bolzonella, Miralles & Pelló 2000; Collister & Lahav 2004; Cunha et al. 2009). Let ℒ(z|c) denote this quantity; it may be skewed, bimodal or, more generally, of arbitrary shape.

Let ζ denote the mean or the most probable value of this distribution (it does not matter which, although some of the logic that follows is more transparent if ζ denotes the mean). Often, ζ (sometimes with an estimate of the uncertainty on its value) is the only quantity that is available. Therefore, in Section 2.1 we first consider how ζ compares with the true redshift z, and contrast the convolution and deconvolution methods for estimating dN/dz, while in Section 2.2 we describe how to reconstruct the redshift distribution directly from colours. Section 2.3 shows what this implies if one wishes to use the full distribution ℒ(z|c). Section 2.4 shows how to extend the logic to the luminosity function, and Section 2.5 to scaling relations, again by contrasting the convolution and deconvolution methods, and showing what generalization of ℒ(z|c) is required from the photometric redshift codes if one wishes to do this. A final section summarizes our results.

Where necessary, we write the Hubble constant as H₀= 100 h km s⁻¹ Mpc⁻¹, and we assume a spatially flat cosmological model with (Ω_M, Ω_Λ, h) = (0.3, 0.7, 0.7), where Ω_M and Ω_Λ are the present-day densities of matter and cosmological constant scaled to the critical density.

2 TO CONVOLVE OR TO DECONVOLVE?

In what follows, we will use spectroscopic and photometric redshifts from the Sloan Digital Sky Survey (SDSS) to illustrate some of our arguments. Details of how the early-type galaxy sample was selected are in Rossi et al. (2010); the photo-zs for this sample are from Csabai et al. (2003).

2.1 The redshift distribution

Suppose that the true redshifts z are available for a subset of the objects; for now, assume that the subset is a random subsample of the objects in a magnitude-limited catalogue. Ideally, this subset would have the same geometry as the full survey, as cross-correlating objects with spectra and those without allows the use of other methods (e.g. Caler, Sheth & Jain 2009). In practice, this may be difficult to achieve – and this is not required for the analysis that follows, provided that the photometric redshift estimator does not have spatially dependent biases (e.g. as a result of photometric calibrations varying across the survey).

For objects with spectroscopic redshifts, one can study the joint distribution of ζ and z (see Fig. 1). Most photometric redshift codes are constructed to return 〈ζ|z〉 ≈ z. The codes that do so are sometimes said to be unbiased, but they are not perfect: the scatter around the unbiased mean is of order σ_ζ|z ≈ 0.05(1 + z). This scatter, combined with the fact that 〈ζ|z〉 ≈ z, means that 〈z|ζ〉 ≠ ζ: the fact that 〈z|ζ〉 is guaranteed to be biased is not widely appreciated. However, we show below that it matters little whether 〈ζ|z〉 or 〈z|ζ〉 is unbiased; what matters is whether the bias is accurately quantified.
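A simple Gaussian model makes the asymmetry between 〈ζ|z〉 and 〈z|ζ〉 concrete. The Gaussian forms, the constant scatter σ, and the symbols μ and s below are assumptions adopted purely for illustration; they are not measured from the SDSS sample.

```latex
% Assumed toy model (illustrative only):
%   true redshifts:   z ~ N(\mu, s^2)
%   photometric ones: \zeta | z ~ N(z, \sigma^2), so that \langle\zeta|z\rangle = z
\langle z|\zeta\rangle
  = \frac{s^{2}\,\zeta + \sigma^{2}\,\mu}{s^{2} + \sigma^{2}}
  = \zeta - \frac{\sigma^{2}}{s^{2} + \sigma^{2}}\,(\zeta - \mu)
```

In this model 〈z|ζ〉 is pulled away from ζ towards μ whenever σ > 0, and the two agree only at ζ = μ. For instance, with μ = 0.2, s = 0.1 and σ = 0.05, an object with ζ = 0.3 has 〈z|ζ〉 = 0.28.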

Figure 1

Distribution of the difference between spectroscopic and photometric redshifts (z and ζ), at fixed z (top) and ζ (bottom), in the SDSS early-type galaxy sample. Note that p(ζ|z) is rather well centred on z, whereas p(z|ζ) is not centred on ζ.


In particular, if 𝒩(ζ) and dN/dz denote the distributions of ζ and z values in the subset of the data where both z and ζ are available, then what matters is that p(ζ|z) and p(z|ζ), where

$$\mathcal{N}(\zeta)\,p(z|\zeta) = \frac{dN}{dz}\,p(\zeta|z), \qquad (1)$$

are known. Note that

$$\mathcal{N}(\zeta) = \int dz\,\frac{dN}{dz}\,p(\zeta|z). \qquad (2)$$

The algorithm in Sheth (2007) assumes that p(ζ|z), measured in the subset for which both z and ζ are available, also applies to the full sample for which z is not available. Since 𝒩(ζ) is measured in the full data set, and p(ζ|z) is known, a deconvolution is then used to estimate the true dN/dz.

Suppose, however, that one measured p(z|ζ) instead. Then, because

$$\frac{dN}{dz} = \int d\zeta\,\mathcal{N}(\zeta)\,p(z|\zeta), \qquad (3)$$

one could estimate the quantity on the left-hand side by ‘convolving’ the two measurables on the right-hand side. For the data subset in which both z and ζ are available, this is correct by definition. Clearly, to apply this method on the larger data set for which only ζ is available, one must assume that p(z|ζ) of the subset from which it was measured will remain accurate in the larger data set.
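The step that turns the joint distribution into a convolution can be written out explicitly. The sketch below assumes only that 𝒩(ζ) p(z|ζ) and (dN/dz) p(ζ|z) are two factorizations of the same joint distribution of z and ζ, with 𝒩(ζ) the distribution of photometric redshifts, and that p(ζ|z) is normalized.

```latex
\int d\zeta\,\mathcal{N}(\zeta)\,p(z|\zeta)
  = \int d\zeta\,\frac{dN}{dz}\,p(\zeta|z)
  = \frac{dN}{dz}\int d\zeta\,p(\zeta|z)
  = \frac{dN}{dz}
```

Integrating the same joint distribution over z instead of ζ gives the corresponding expression for 𝒩(ζ), which is the relation that the deconvolution method inverts.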

Rossi et al. (2010) have shown that the deconvolution method accurately reconstructs the true dN/dz distribution from 𝒩(ζ), the distribution of photometric redshifts. Fig. 2 shows that the convolution approach also works well, even when only a random 5 per cent of the full data set is used to calibrate p(z|ζ), as displayed in Fig. 1. Thus, for the data set in which both z and ζ are available, both the convolution and deconvolution approaches are valid, whether or not the means (or, for that matter, the most probable values) of p(z|ζ) and p(ζ|z) are unbiased, and however complicated (skewed, multimodal) the shapes of these two distributions are. This remains true in the larger data set, where only ζ is known. However, whereas the convolution approach assumes that p(z|ζ) is the same in the calibration subset as in the full one, the deconvolution approach assumes that p(ζ|z) is the same.
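The convolution estimate described above can be sketched on a toy catalogue. Everything numerical here is an assumption made for illustration (the gamma-distributed true redshifts, the Gaussian scatter of 0.05(1 + z), the bin edges and the variable names); only the 5 per cent random calibration fraction is taken from the text. The sketch measures p(z|ζ) as a column-normalized histogram in the calibration subset and applies it to the ζ counts of the full catalogue.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy catalogue: illustrative numbers only, not SDSS measurements.
n_gal = 200_000
z_true = rng.gamma(shape=4.0, scale=0.05, size=n_gal)               # true redshifts z
zeta = z_true + 0.05 * (1.0 + z_true) * rng.standard_normal(n_gal)  # photo-zs

edges = np.linspace(-0.4, 1.2, 41)   # common bins for z and zeta

# Random 5 per cent calibration subset in which z is "known".
calib = rng.random(n_gal) < 0.05

# p(z|zeta) from the calibration subset: joint histogram with axis 0 = z,
# axis 1 = zeta, normalized so that every zeta column sums to one.
joint, _, _ = np.histogram2d(z_true[calib], zeta[calib], bins=[edges, edges])
col_sum = joint.sum(axis=0)
p_z_given_zeta = np.divide(joint, col_sum, out=np.zeros_like(joint),
                           where=col_sum > 0)

# N(zeta) measured in the full photometric catalogue.
n_zeta, _ = np.histogram(zeta, bins=edges)

# Convolution estimate: dN/dz = sum over zeta bins of N(zeta) p(z|zeta).
dndz_conv = p_z_given_zeta @ n_zeta

# Compare with the truth, and with naively using N(zeta) as if it were dN/dz.
dndz_true, _ = np.histogram(z_true, bins=edges)
err_conv = np.abs(dndz_conv - dndz_true).sum() / n_gal
err_naive = np.abs(n_zeta - dndz_true).sum() / n_gal
print(f"fractional L1 error: convolution {err_conv:.3f}, naive {err_naive:.3f}")
```

Objects that fall in a ζ bin with no calibration spectra contribute nothing to the estimate in this sketch, so a real application would need either coarser bins in the tails or a smoother estimate of p(z|ζ).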


Figure 2

Distribution of 𝒩(ζ) (dotted) and dN/dz (solid); crosses show the result of convolving 𝒩(ζ) with p(z|ζ) (from the bottom panel of Fig. 1).


2.2 Convolution directly from colours

The integral in equation (3) is really a sum over all the objects in the photometric data set, where each object with estimated ζ contributes to dN/dz with weight p(z|ζ):

$$\frac{dN}{dz} = \sum_i p(z|\zeta_i). \qquad (4)$$

Now, recall that ζ was the mean (or most probable) value of a distribution returned by a photometric redshift code. In cases where the observed colours c map to a unique value of ζ, then this sum over ζ is really a sum over c, and the expression above is really

$$\frac{dN}{dz} = \sum_i p(z|c_i). \qquad (5)$$

Equation (5) is one of the key results of this paper.

Although we arrived at equation (5) by requiring the mapping c→ζ to be one-to-one [as may be the case for e.g. luminous red galaxies (LRGs)], it is actually more general. This is because one can simply measure p(z|c) in the sample for which spectra are in hand, for the same reason that one could measure p(z|ζ). In fact, p(z|c) is an easier measurement, since it does not depend on the output of a photo-z code! The constraint on the mapping between c and ζ in the discussion above was simply to motivate the connection between photo-z codes and the convolution method. Once the connection has been made, however, there is no real reason to go through the intermediate step of estimating ζ, since all photo-z codes use the observed colours c anyway. In this respect, equation (5) is a more direct and natural expression to work with than equation (4). In particular, because p(z|c) is an observable, the convolution approach of equation (5) is independent of any photo-z algorithm. Of course, if this method is to work, then the subsample with spectral information must be able to provide an accurate estimate of p(z|c).
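A minimal sketch of the sum over colours follows. It assumes a single colour, a binned (histogram) estimate of p(z|c), and a made-up colour–redshift relation that is deliberately not one-to-one, so that p(z|c) is bimodal for some colours; the function name `dndz_from_colours` and all numerical values are hypothetical. No photometric redshift is computed at any point.

```python
import numpy as np

def dndz_from_colours(c_spec, z_spec, c_phot, c_edges, z_edges):
    """Sum p(z|c_i) over a photometric catalogue, with p(z|c) estimated
    as a histogram of spectroscopic redshifts in each colour bin."""
    # joint[k, j]: spectroscopic objects in colour bin k and redshift bin j
    joint, _, _ = np.histogram2d(c_spec, z_spec, bins=[c_edges, z_edges])
    per_colour = joint.sum(axis=1, keepdims=True)
    p_z_given_c = np.divide(joint, per_colour, out=np.zeros_like(joint),
                            where=per_colour > 0)
    # Number of photometric objects in each colour bin.
    n_c, _ = np.histogram(c_phot, bins=c_edges)
    # Every object in colour bin k contributes p(z | colour bin k).
    return n_c @ p_z_given_c

rng = np.random.default_rng(3)
n_gal = 200_000
z_true = rng.gamma(shape=4.0, scale=0.05, size=n_gal)
# Toy colour: quadratic in z, so the same colour can arise at two redshifts.
colour = 1.0 + 4.0 * z_true * (0.6 - z_true) + 0.05 * rng.standard_normal(n_gal)
has_spec = rng.random(n_gal) < 0.05        # random spectroscopic subsample

c_edges = np.linspace(-3.0, 2.0, 101)
z_edges = np.linspace(0.0, 1.2, 31)
dndz_est = dndz_from_colours(colour[has_spec], z_true[has_spec],
                             colour, c_edges, z_edges)
dndz_true, _ = np.histogram(z_true, bins=z_edges)
frac_err = np.abs(dndz_est - dndz_true).sum() / n_gal
print(f"fractional L1 error of the colour-based estimate: {frac_err:.3f}")
```

With several colours the same idea applies, but a histogram in colour space becomes sparse quickly, so some other estimate of p(z|c) (for example from nearest neighbours in colour space) would probably be needed.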

2.3 Relation to photo-z algorithms

The convolution method of the previous subsection provides a simple way of illustrating how one should use the output from photo-z codes that actually provide a properly calibrated probability distribution for each set of colours c, to estimate dN/dz. It also shows in what sense the codes should be ‘unbiased’.

In particular, equation (5) suggests that one can estimate dN/dz by summing over all the objects in the data set, weighting each by the distribution ℒ(z|c) that the photo-z code returns for it. This is because

$$\frac{dN}{dz} = \sum_i p(z|c_i) = \sum_i \mathcal{L}(z|c_i) \quad {\rm if} \quad \mathcal{L}(z|c) = p(z|c). \qquad (6)$$

Equation (6) shows that if ℒ(z|c) does not have the same shape as p(z|c), then use of ℒ(z|c) will lead to a bias; this is the pernicious bias that must be reduced – whether or not 〈z|c〉 equals the spectroscopic redshift is, in some sense, irrelevant. (In the case of a one-to-one mapping between c and ζ, 〈z|c〉 is the same as the quantity 〈z|ζ〉, which we have discussed in the previous subsections.)

Satisfying ℒ(z|c) = p(z|c) is non-trivial. This is perhaps most easily seen by supposing that the template or training set consists of two galaxy types (early- and late-type, say), for which the same observed colours are associated with two different redshifts. In this case, if the photo-z algorithms are working well, then ℒ(z|c) will be bimodal for at least some c. However, if the sample of interest only contains LRGs, then p(z|c) may actually be unimodal. As a result, ℒ(z|c) ≠ p(z|c) unless proper priors on the templates are used, or care is taken to ensure that the training set is representative of the sample of interest.

2.4 The luminosity function

We can perform a similar analysis of the luminosity function. In this case, the key is to recognize that, in a magnitude-limited survey, the quantity that is most directly affected by the photometric redshift error is not the luminosity function φ(M) itself, but the luminosity distribution N(M) ≡ V_max(M) φ(M) (Sheth 2007). In a spectroscopic survey, N(M) differs from φ(M) because one sees the brightest objects to larger distances: V_max(M) is the largest comoving volume to which an object with absolute magnitude M could be seen. If we use ℳ to denote the absolute magnitude estimated using the photometric redshift ζ, and M its correct value, then

$$\mathcal{N}(\mathcal{M}) = \int dM\,N(M)\,p(\mathcal{M}|M). \qquad (7)$$

Sheth (2007) describes a deconvolution algorithm for estimating N(M) given measurements of 𝒩(ℳ) and the assumption that p(ℳ|M), measured in a subset for which both z and ζ (hence both M and ℳ) are available, also applies to the full photometric survey.

Following the discussion in the previous section, we could instead have measured p(M|ℳ), and then used the fact that

$$N(M) = \int d\mathcal{M}\,\mathcal{N}(\mathcal{M})\,p(M|\mathcal{M}) \qquad (8)$$

to estimate the quantity on the left-hand side by summing over the photometric catalogue on the right-hand side, weighting each object in it by p(M|ℳ); note that this weight depends on ℳ. Fig. 3 shows p(ℳ|M) and p(M|ℳ); notice how broad they are, and how much more skewed and biased p(M|ℳ) is than p(ℳ|M). Nevertheless, Rossi et al. (2010) have shown that the deconvolution algorithm produces good results. Fig. 4 shows that the convolution algorithm works equally well.


Figure 3

Similar to Fig. 1 but for the true absolute magnitude M and the estimate ℳ from the photometry. Notice that p(ℳ|M) is approximately symmetrically distributed around M, whereas p(M|ℳ) can be both significantly offset from ℳ and skewed.


Figure 4

Same as Fig. 2 but for the absolute magnitudes. Crosses show the distribution one obtains by convolving the dotted histogram with the distributions shown in the bottom panel of Fig. 3; solid histogram shows the true distribution of M.
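The same convolution can be sketched for absolute magnitudes. The toy below is not the SDSS analysis: it assumes a Gaussian distribution of true absolute magnitudes, imposes no flux limit, ignores k-corrections, and clips photometric redshifts at a small positive value so that a magnitude can always be computed. The variable names are hypothetical; the cosmology is the flat model with (Ω_M, Ω_Λ, h) = (0.3, 0.7, 0.7).

```python
import numpy as np

rng = np.random.default_rng(7)

# Comoving distance on a redshift grid (trapezoidal rule), flat model.
z_grid = np.linspace(0.0, 2.0, 4001)
inv_e = 1.0 / np.sqrt(0.3 * (1.0 + z_grid) ** 3 + 0.7)
steps = 0.5 * (inv_e[1:] + inv_e[:-1]) * np.diff(z_grid)
d_c = (299792.458 / 70.0) * np.concatenate([[0.0], np.cumsum(steps)])  # Mpc

def dist_mod(z):
    """Distance modulus 5 log10(d_L / 10 pc), interpolated on the grid."""
    d_l = (1.0 + z) * np.interp(z, z_grid, d_c)   # luminosity distance, Mpc
    return 5.0 * np.log10(d_l * 1.0e5)

# Toy catalogue (all numbers are illustrative assumptions).
n_gal = 200_000
z_true = 0.02 + rng.gamma(shape=4.0, scale=0.05, size=n_gal)
zeta = z_true + 0.05 * (1.0 + z_true) * rng.standard_normal(n_gal)
zeta = np.clip(zeta, 0.005, None)                 # keep photo-zs positive
m_abs_true = rng.normal(-21.0, 0.8, size=n_gal)   # true M
m_app = m_abs_true + dist_mod(z_true)             # apparent magnitude
m_abs_phot = m_app - dist_mod(zeta)               # estimate based on zeta

edges = np.linspace(-30.0, -10.0, 81)
calib = rng.random(n_gal) < 0.05                  # random 5 per cent with spectra

# p(true M | photometric estimate): axis 0 = true, axis 1 = photometric.
joint, _, _ = np.histogram2d(m_abs_true[calib], m_abs_phot[calib],
                             bins=[edges, edges])
per_phot = joint.sum(axis=0)
p_true_given_phot = np.divide(joint, per_phot, out=np.zeros_like(joint),
                              where=per_phot > 0)

n_phot, _ = np.histogram(m_abs_phot, bins=edges)  # distribution of estimates
n_conv = p_true_given_phot @ n_phot               # convolution estimate of N(M)

n_true, _ = np.histogram(m_abs_true, bins=edges)
err_conv = np.abs(n_conv - n_true).sum() / n_gal
err_naive = np.abs(n_phot - n_true).sum() / n_gal
print(f"fractional L1 error: convolution {err_conv:.3f}, naive {err_naive:.3f}")
```

Because small redshift errors translate into large magnitude errors at low redshift, the distribution of photometric estimates in this toy is much broader than the true one, which is the regime in which the correction matters most.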


One estimates φ(M) by dividing N(M) by V_max(M). Since this weight is the same for all objects with the same M, one could have added an additional weighting term to the sum above to get

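$$\phi(M) = \frac{N(M)}{V_{\rm max}(M)} = \int d\mathcal{M}\,\mathcal{N}(\mathcal{M})\,\frac{p(M|\mathcal{M})}{V_{\rm max}(M)}. \qquad (9)$$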

One might have written

⁠, so the expression above shows explicitly why should the photometric errors be thought of as affecting N(M) and not φ(M).

To make the connection to p(z|c) and then to ℒ(z|c), it is worth considering how one computes M from z given the observed colours c. If there were no k-correction, then the luminosity in a given band would be determined from the observed apparent brightness by the square of the (cosmology-dependent) luminosity distance – the colours are not necessary. In practice, however, one must apply a k-correction; this depends on the spectral type of the galaxy, and hence on its colour. As a result, the mapping between m and M depends on z and c. However, it is still true that both M and z are determined by c. Therefore, the spectroscopic subsample that was previously used to estimate p(z|c) also allows one to estimate p(M, z|c). The quantity of interest in the previous section, p(z|c), is simply the integral of p(M, z|c) over all M. The quantity of interest here, p(M|c), is the integral of p(M, z|c) over all z. Thus, equation (8) becomes

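$$N(M) = \int dc\,\mathcal{N}(c)\int dz\,p(M,z|c) = \int dc\,\mathcal{N}(c)\,p(M|c) = \sum_i p(M|c_i), \qquad (10)$$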

where the second to last expression writes the integral of p(M, z|c) over all z as p(M|c), and the final one writes the integral explicitly as a sum over all the objects in the catalogue.
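The mapping from apparent magnitude m and redshift z to absolute magnitude M that underlies p(M, z|c) can be sketched as follows, using the flat cosmology adopted in this paper. The function names are hypothetical, and the `k_corr` argument is only a placeholder: the k-correction depends on spectral type and hence on colour, and no model for it is assumed here.

```python
import numpy as np

C_KM_S = 299792.458  # speed of light in km/s

def luminosity_distance_mpc(z, omega_m=0.3, omega_l=0.7, h=0.7, n_step=2048):
    """Luminosity distance in Mpc for a spatially flat model."""
    zz = np.linspace(0.0, z, n_step)
    inv_e = 1.0 / np.sqrt(omega_m * (1.0 + zz) ** 3 + omega_l)
    # Trapezoidal rule for the integral of dz / E(z) from 0 to z.
    integral = np.sum(0.5 * (inv_e[1:] + inv_e[:-1]) * np.diff(zz))
    return (1.0 + z) * (C_KM_S / (100.0 * h)) * integral

def absolute_magnitude(m, z, k_corr=0.0, **cosmo):
    """M = m - 5 log10(d_L / 10 pc) - k, with k supplied by the caller."""
    d_pc = luminosity_distance_mpc(z, **cosmo) * 1.0e6
    return m - 5.0 * np.log10(d_pc / 10.0) - k_corr

d_l = luminosity_distance_mpc(0.1)        # roughly 460 Mpc for this cosmology
m_abs = absolute_magnitude(17.5, 0.1)     # roughly -20.8 with no k-correction
print(f"d_L(z=0.1) = {d_l:.1f} Mpc, M = {m_abs:.2f}")
```

Applying `absolute_magnitude` to each object of the spectroscopic subsample, with its own colour-dependent k-correction, yields the (M, z) pairs from which p(M, z|c) can be histogrammed in bins of c.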

The expression above is the convolution-type estimate of N(M); it does not require a photometric redshift code. However, in principle, a photometric redshift code could output ℒ(M, z|c): the quantity such codes currently output, ℒ(z|c), is the integral of ℒ(M, z|c) over all M. The relevant weighted sum becomes

$$N(M) = \sum_i \mathcal{L}(M|c_i), \qquad (11)$$

where ℒ(M|c) is the integral of ℒ(M, z|c) over all z, the sum is over all the objects in the catalogue, and the method only works if ℒ(M|c) = p(M|c).

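Note that the luminosity density (in solar units) can therefore be written as

$$\begin{aligned}
\int dM\,\phi(M)\,L &= \int dM\,N(M)\,\frac{L}{V_{\rm max}(L)} \\
 &= \int d\mathcal{M}\,\mathcal{N}(\mathcal{M})\int dM\,p(M|\mathcal{M})\,\frac{L}{V_{\rm max}(L)} \\
 &= \int d\mathcal{M}\,\mathcal{N}(\mathcal{M})\,\left\langle \frac{L}{V_{\rm max}(L)}\,\Big|\,\mathcal{M}\right\rangle \\
 &= \int dc\,\mathcal{N}(c)\,\left\langle \frac{L}{V_{\rm max}(L)}\,\Big|\,c\right\rangle. \qquad (12)
\end{aligned}$$

The line before the last line shows that one requires the average 〈L/V_max(L)〉 summed over the distribution 𝒩(ℳ); this is easily computed from distributions like those shown in the bottom panel of Fig. 3. The final expression writes this as a sum over the observed distribution of colours.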

2.5 Galaxy scaling relations

Although the previous section considered the luminosity function in a single band, it is clear that the photometric redshift codes could output ℒ(M, z|c), where M is a set of absolute luminosities (typically, these will be those associated with the various bandpasses from which the colours c were determined). Hence, the colour–magnitude relation, which is really a statement about the joint distribution in two bands, can be estimated by

$$N(M) = \sum_i p(M|c_i). \qquad (13)$$

Galaxy scaling relations can be estimated similarly, if we simply interpret M as being the vector of observables, which can include sizes, etc. (not just luminosities). In principle, quantities other than colours (e.g. apparent magnitudes, surface brightness, axis ratios) can play a role in the photometric redshift determination; this can be incorporated into the formalism simply by using c to now denote the full set of observables from which the redshift and other intrinsic quantities M were estimated.

If one wishes to use the output from a photo-z code, rather than from the spectroscopic subset, one would use

$$N(M) = \sum_i \mathcal{L}(M|c_i), \qquad (14)$$

having checked that, in the spectroscopic subset, ℒ(M|c) = p(M|c).

3 DISCUSSION

We showed how previous work on deconvolution algorithms for making unbiased reconstructions of galaxy distributions and scaling relations (Sheth 2007; Rossi & Sheth 2008; Rossi et al. 2010) could be related to convolution-based methods. Whereas deconvolution-based methods require accurate knowledge of p(ζ|z), the distribution of the photometric redshift ζ given the true redshift z, convolution-based methods require accurate knowledge of p(z|ζ). Since ζ is derived from photometry, this may more generally be written as p(z|c), where c is the vector of observed photometric parameters that were used to estimate the redshift. In both cases, p(z|c) and p(ζ|z) are calibrated from a sample in which z is known, and are then used in a larger sample where z is not available. If the smaller training set has the same selection limits as the larger data set (e.g. both have the same magnitude limit), then both approaches are valid. We have illustrated our arguments with measurements in the SDSS (Figs 1–4).

We also showed what additional information must be output from photometric redshift codes if their results are to be used in a convolution-like approach to provide unbiased estimates of galaxy scaling relations. In particular, we argued that only if the redshift distribution output by a photo-z algorithm, ℒ(z|c), has the same shape as p(z|c), can the algorithm be said to be unbiased. Only in this case can its output (available for the full sample) be used in place of p(z|c) (which is typically available for a small subset). The safest way to accomplish this is for the training set to be a random subsample of the full data set – and to then tune the algorithm so that ℒ(z|c) = p(z|c). If the training set is not representative, then care must be taken to ensure that ℒ(z|c) does not yield biased results.

Obtaining spectra is expensive, so the question arises as to whether or not there is a more efficient alternative to the random-sample approach. For the convolution method, which requires p(z|c), the answer is clearly ‘yes’. This is because some colour combinations (e.g. the red sequence) might give rise to a narrow p(z|c) distribution, whereas others may result in broader distributions. Since it will take fewer objects to accurately estimate the shape of a narrow p(z|c) distribution than that of a broad one, observational effort would be better placed in obtaining spectra for those objects that produce broad p(z|c) distributions. For the deconvolution approach, one would like to preferentially target those redshifts z that produce broader p(ζ|z) distributions – for similar reasons. However, since z is not known until the spectra are taken, this cannot be done, so taking a random sample of the full data set is the safest way to proceed.
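One possible way to turn this argument into numbers is sketched below. It borrows Neyman allocation from stratified sampling, which is not something prescribed above: spectra are assigned to colour bins in proportion to the number of photometric objects times the width of p(z|c) in that bin, which minimizes the variance of the estimated mean redshift rather than the error on the full shape of p(z|c). The function names, the two colour bins and their widths are hypothetical, and in practice the widths would have to come from a pilot sample.

```python
import numpy as np

def neyman_allocation(n_phot, sigma_z, budget):
    """Split a spectroscopic budget across colour bins in proportion to
    (objects in the bin) x (width of p(z|c) in the bin)."""
    weights = np.asarray(n_phot, dtype=float) * np.asarray(sigma_z, dtype=float)
    return budget * weights / weights.sum()

def stratified_variance(n_phot, sigma_z, n_spec):
    """Variance of the stratified estimate of the mean redshift."""
    frac = np.asarray(n_phot, dtype=float) / np.sum(n_phot)
    return np.sum(frac ** 2 * np.asarray(sigma_z, dtype=float) ** 2 / n_spec)

# Two hypothetical colour bins: a narrow "red sequence" and a broad "blue" bin.
n_phot = np.array([50_000, 50_000])
sigma_z = np.array([0.02, 0.08])
budget = 1000

alloc = neyman_allocation(n_phot, sigma_z, budget)       # 200 and 800 spectra
random_alloc = budget * n_phot / n_phot.sum()            # 500 and 500 spectra

var_neyman = stratified_variance(n_phot, sigma_z, alloc)
var_random = stratified_variance(n_phot, sigma_z, random_alloc)
print(alloc, var_neyman, var_random)
```

In this example the broad bin receives four times as many spectra as the narrow one, and the variance of the mean redshift drops from about 3.4e-6 for the random sample to about 2.5e-6.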

Our methods permit accurate measurement of many scaling relations for which spectra were previously thought to be necessary (e.g. the colour–magnitude relation, the size–surface brightness relation, the photometric Fundamental Plane), so we hope that our work will enable photometric redshift surveys to provide more stringent constraints on galaxy formation models at a fraction of the cost of spectroscopic surveys.

RKS thanks L. Da Costa, M. Maia, P. Pellegrini, M. Makler and the organizers of the DES Workshop in Rio in 2009 May where he had stimulating discussions with C. Cunha and M. Lima about the relative merits of convolution and deconvolution methods, and the APC at Paris 7 Diderot and MPI-Astronomie Heidelberg, for hospitality when this work was written up.

Funding for the SDSS and SDSS-II has been provided by the Alfred P. Sloan Foundation, the Participating Institutions, the National Science Foundation, the US Department of Energy, the National Aeronautics and Space Administration, the Japanese Monbukagakusho, the Max Planck Society and the Higher Education Funding Council for England. The SDSS Web site is http://www.sdss.org/.

The SDSS is managed by the Astrophysical Research Consortium for the Participating Institutions. The Participating Institutions are the American Museum of Natural History, Astrophysical Institute Potsdam, University of Basel, University of Cambridge, Case Western Reserve University, University of Chicago, Drexel University, Fermilab, the Institute for Advanced Study, the Japan Participation Group, Johns Hopkins University, the Joint Institute for Nuclear Astrophysics, the Kavli Institute for Particle Astrophysics and Cosmology, the Korean Scientist Group, the Chinese Academy of Sciences (LAMOST), Los Alamos National Laboratory, the Max-Planck-Institute for Astronomy (MPIA), the Max-Planck-Institute for Astrophysics (MPA), New Mexico State University, Ohio State University, University of Pittsburgh, University of Portsmouth, Princeton University, the United States Naval Observatory and the University of Washington.

REFERENCES

Bolzonella M., Miralles J.-M., Pelló R., 2000, A&A, 363, 476

Budavári T., 2009, ApJ, 695, 747

Caler M., Sheth R. K., Jain B., 2009, MNRAS, submitted (arXiv:0811.2805)

Christlein D., Gawiser E., Marchesini D., Padilla N., 2009, MNRAS, 400, 429

Collister A. A., Lahav O., 2004, PASP, 116, 345

Csabai I. et al., 2003, AJ, 125, 580

Cunha C. E., Lima M., Oyaizu H., Frieman J., Lin H., 2009, MNRAS, 396, 2379

Lima M., Cunha C. E., Oyaizu H., Frieman J., Lin H., Sheldon E. S., 2008, MNRAS, 390, 118

Padmanabhan N. et al., 2005, MNRAS, 359, 237

Rossi G., Sheth R. K., 2008, MNRAS, 387, 735

Rossi G., Sheth R. K., Park C., 2010, MNRAS, 401, 666

Sheth R. K., 2007, MNRAS, 378, 709
