Ilya Mandel, Will M Farr, Jonathan R Gair, Extracting distribution parameters from multiple uncertain observations with selection biases, Monthly Notices of the Royal Astronomical Society, Volume 486, Issue 1, June 2019, Pages 1086–1093, https://doi.org/10.1093/mnras/stz896
Abstract
We derive a Bayesian framework for incorporating selection effects into population analyses. We allow for both measurement uncertainty in individual measurements and, crucially, for selection biases on the population of measurements, and show how to extract the parameters of the underlying distribution based on a set of observations sampled from this distribution. We illustrate the performance of this framework with an example from gravitational-wave astrophysics, demonstrating that the mass ratio distribution of merging compact-object binaries can be extracted from Malmquist-biased observations with substantial measurement uncertainty.
1 INTRODUCTION
The problem of extracting the distributional properties of a population of sources based on a set of observations drawn from that distribution is a common one, frequently labelled as hierarchical modelling (e.g. Hogg, Myers & Bovy 2010; Bovy, Hogg & Roweis 2011 call this ‘extreme deconvolution’). In practical applications, one often has to deal with selection effects: the observed population will have a Malmquist bias (Malmquist 1922, 1925) whereby the loudest or brightest sources are most likely to be detected, and it is necessary to correct for this bias in order to extract the true source population (e.g. Farr et al. 2014; Foreman-Mackey, Hogg & Morton 2014). In other applications, significant measurement uncertainties in the individual observations must be accounted for (e.g. Farr & Mandel 2018). Of course, these two complications – measurement uncertainties and selection effects – are often present simultaneously.
There have been multiple attempts to address the problem of population-based inference with both selection effects and significant measurement uncertainties. The earliest correct published solution to this problem, as far as we are aware, belongs to Loredo (2004). However, despite the availability of this solution, it is easy to be lured into a seemingly straightforward but incorrect derivation. The most common mistake is the modification of the model population distribution to account for the selection function, i.e. the inclusion of the probability of detecting a particular event only as a multiplicative term in the probability of observing that event. This detection probability is usually included as the probability marginalized over all realizations of the data, ignoring the fact that we know the particular data realization that has been observed. For a given data realization the probability that a source is detected, which is a property purely of the data, is by definition equal to one for any data set associated with an observation we are analysing. On the other hand, as shown below, it is critical to include the detection probability in the normalization factor to account for the different numbers of events expected to be observed under different population models.
We sketched out the correct approach to including selection effects in Mandel, Farr & Gair (2016) (which is superseded by the present manuscript) and Abbott et al. (2016). Other correct applications in the literature include Fishbach & Holz (2017), Fishbach, Holz & Farr (2018), and Feeney et al. (2019). Here, we expand and clarify the earlier treatment of Loredo (2004) by presenting two different approaches to solving this problem below: a bottom-up and a top-down derivation, showing that they yield the same result. Some among us find one or the other approach to be more clear, and we hope that including both will also benefit readers.
We illustrate the derived methodology with two examples. The first is the classic example of measuring a luminosity function with a flux-limited survey. The second is an example from gravitational-wave astronomy: the measurement of the mass ratio of merging binary neutron stars. We show that ≳ 1000 observations at a signal-to-noise ratio (SNR) of ≳ 20 will be necessary to accurately measure the mass ratio distribution. This feat, which can be accomplished with third-generation ground-based gravitational-wave detectors, could elucidate the details of neutron star formation.
2 PROBLEM STATEMENT AND NOTATION
We consider a population of events or objects, each described by a set of parameters |$\vec{\theta }$|. These parameters represent the characteristics of individual events. For example, in the case of compact binary coalescences observed by LIGO and Virgo these would include the masses, spin magnitudes and spin orientations of the two components, the location of the source on the sky, the distance of the source, the orientation and eccentricity of the binary orbit etc. The distribution of events in the population is described via parameters |$\vec{\lambda }$|, so that the number density of objects follows |$\frac{\mathrm{d}N}{\mathrm{d}\vec{\theta }} (\vec{\lambda }) = N p_\textrm{pop}(\vec{\theta }|\vec{\lambda }{}^{\prime })$|. In the gravitational-wave context, these parameters could represent properties of the population like the slope of the mass function of black holes in compact binaries, or the shape of the spin magnitude distribution, or the mixing fractions of different subpopulations. They could also represent physical ingredients used in population synthesis calculations, for example the parameters of the initial mass function, stellar metallicity distribution or stellar winds and the properties of common envelope evolution or of the distribution of supernova kicks. In this second case, the distribution of the individual event properties |$p_\textrm{pop}(\vec{\theta }|\vec{\lambda }{}^{\prime })$| could be obtained from the output of population synthesis codes for that particular choice of input physics. We have separated |$\vec{\lambda }$| into the overall normalization for the number or rate of events N and the set of parameters describing the shape of the distribution alone |$\vec{\lambda }{}^{\prime }$|. For instance, if the underlying distribution is modelled as a multidimensional Gaussian, |$\vec{\lambda }$| would consist of the mean vector and covariance matrix; alternatively, a non-parametric distribution could be described with a (multidimensional) histogram, in which case |$\vec{\lambda }$| represents the weights of various histogram bins.
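As a minimal worked example of this notation (a hypothetical one-parameter population, not one analysed in this paper), suppose the only event parameter is the chirp mass Mc and the population shape is a power law with slope γ between fixed bounds, so that the shape parameters are λ′ = {γ, Mmin, Mmax} and
$$\frac{\mathrm{d}N}{\mathrm{d}M_\mathrm{c}}(\vec{\lambda }) = N\, p_\textrm{pop}(M_\mathrm{c}|\vec{\lambda }{}^{\prime }), \qquad p_\textrm{pop}(M_\mathrm{c}|\gamma ,M_\mathrm{min},M_\mathrm{max}) = \frac{(1-\gamma )\, M_\mathrm{c}^{-\gamma }}{M_\mathrm{max}^{1-\gamma }-M_\mathrm{min}^{1-\gamma }}, \qquad M_\mathrm{min}\le M_\mathrm{c}\le M_\mathrm{max},$$
valid for γ ≠ 1; the rate N and the shape parameters can then be inferred jointly, or the shape alone, as discussed in Sections 3 and 4.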
This distribution is sampled by drawing a set of Nobs ‘observed events’ with true parameters |$\lbrace \vec{\theta }_i\rbrace$|, for i ∈ [1, Nobs]. For each object in the population we make a noisy measurement of |$\vec{\theta }_i$|, represented by a likelihood function relating the measured data, |$\vec{d}_i$|, to the parameters of the event, |$\vec{\theta }$|: |$p\left(\vec{d}_i \mid \vec{\theta }_i \right)$|.
Moreover, based on the observed data, some objects are classed as ‘observable’ and others are ‘unobservable.’ For example, a survey may impose a per-pixel or per-aperture threshold on the flux for inclusion of point sources in a catalogue, or a gravitational wave detector may only report events whose SNR rises above some predetermined threshold. This detection probability can be estimated empirically for a search pipeline via a large injection campaign. In some cases, it can be modelled analytically; for example, for low-mass compact binaries, the gravitational-wave strain in the frequency domain is proportional to the 5/6 power of the chirp mass Mc, so the detection probability scales as the surveyed volume,1|$\propto M_\mathrm{ c}^{15/6}$|. Throughout this article, we will assume that whether or not an event is counted as a detection is a property only of the data for each object and so there exists an indicator function |$\mathbf {I}(\vec{d})$| that is equal to 1 for ‘observable’ objects that would be classified as detections and 0 otherwise; this is by far the most common case for astronomical observations.2
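As a concrete sketch of these two ingredients, the following toy injection campaign estimates the detection probability empirically under an assumed signal-to-noise model with a hard threshold; the amplitude constant, noise model, distance distribution, and threshold value are illustrative assumptions rather than properties of any real search pipeline.

import numpy as np

rng = np.random.default_rng(42)

def observed_snr(chirp_mass, distance):
    # Toy data model: the signal amplitude scales as Mc^(5/6) / d for low-mass
    # inspirals, with unit-variance Gaussian noise added. The constant 8.0 and
    # the noise model are illustrative assumptions.
    return 8.0 * chirp_mass ** (5.0 / 6.0) / distance + rng.standard_normal(np.shape(distance))

def is_detected(rho, threshold=8.0):
    # Indicator function I(d): a property of the observed data (here, the SNR) only.
    return rho > threshold

def detection_probability(chirp_mass, n_injections=100_000, d_max=3.0):
    # Empirical detection probability from a mock injection campaign, with sources
    # distributed uniformly in Euclidean volume out to d_max (arbitrary units).
    d = d_max * rng.uniform(size=n_injections) ** (1.0 / 3.0)
    rho = observed_snr(chirp_mass, d)
    return np.mean(is_detected(rho))

# The detection probability grows roughly as Mc^(15/6) while the horizon for
# such sources lies well inside d_max.
for mc in (1.0, 1.2, 1.4):
    print(f"Mc = {mc}: p_det ~ {detection_probability(mc):.3f}")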
3 BOTTOM-UP DERIVATION
First, we follow the bottom-up approach of deriving the likelihood for obtaining a particular set of observations given the population parameters, by starting with a simple problem without either measurement uncertainties or selection effects and gradually building up the problem complexity. For the moment, we assume that we are only interested in the shape of the population distribution, and ignore the normalization, or rate, of objects in the population; we discuss estimation of both the rate and shape of a population at the end of this section and in Section 4.
We have so far described inference based on the shape of the distribution |$p_\textrm{pop}(\vec{\theta }|\vec{\lambda }^{\prime })$| while ignoring the overall normalization. This is appropriate when the overall normalization of the population counts is not interesting, or when the bulk of the information comes from the distribution properties rather than the detection rate (a single data point). This is a reasonable assumption in the gravitational-wave context, where the astrophysical uncertainty on the rates of compact object mergers covers several orders of magnitude. While inferring the rate is of great interest, models may not predict it with sufficient precision for that measurement to have strong constraining power.
4 TOP-DOWN DERIVATION
This is the same result we derived in Section 3. Each multiplicative term in the numerator of equation (13) from Section 3 is the integral |$\int \mathrm{d}\vec{\theta } p(\vec{d}_i|\vec{\theta }) p_\textrm{pop}(\vec{\theta }|\vec{\lambda }^{\prime })$|, approximated as a Monte Carlo sum over the posterior samples. The denominator of equation (13) is |$\alpha ^{N_\mathrm{obs}}$|. Meanwhile, α = Ndet/N according to equation (12), which is identical to equation (23) from this section. With the substitution |$p_\textrm{pop}(\vec{\theta }|\vec{\lambda }^{\prime }) = (dN/d\vec{\theta }) / N$|, the entire fraction in equation (13) is identical to the first term of equation (22) divided by |$N_\mathrm{det}^{N_\mathrm{obs}}$|, which cancels the last term of equation (13). Thus, we see that equations (13) and (22) are equivalent up to the choice of priors.
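In code, the resulting shape-only likelihood can be evaluated as in the following minimal sketch; the function names and the interim-prior reweighting convention are ours, and the detection fraction α would be supplied separately, for example from an injection campaign such as the toy one sketched in Section 2.

import numpy as np

def log_population_likelihood(posterior_samples, interim_prior, p_pop, alpha):
    # Shape-only population log-likelihood of the form described above: a product
    # over events of the per-event integrals
    #     int dtheta p(d_i | theta) p_pop(theta | lambda'),
    # each approximated by a Monte Carlo average over that event's posterior
    # samples (reweighted by the interim prior pi(theta) used to produce them),
    # divided by alpha(lambda')^{N_obs}.
    #
    # posterior_samples : list of length N_obs; arrays of theta samples per event
    # interim_prior     : list of arrays; pi(theta) evaluated at those samples
    # p_pop             : callable returning p_pop(theta | lambda') at fixed lambda'
    # alpha             : detection fraction alpha(lambda') for the same lambda'
    log_like = 0.0
    for theta, pi in zip(posterior_samples, interim_prior):
        # <p_pop / pi> over the event's posterior samples estimates the per-event
        # integral up to a lambda'-independent constant.
        log_like += np.log(np.mean(p_pop(theta) / pi))
    # Selection effects enter only through the overall normalisation alpha^{N_obs}.
    return log_like - len(posterior_samples) * np.log(alpha)

If the per-event samples were obtained with a flat interim prior, the reweighting reduces to a plain average of p_pop over the samples, which is the Monte Carlo sum referred to above.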
Note that the commonly employed technique of modifying |$\frac{\mathrm{d}N}{\mathrm{d}\vec{\theta }}$| to account for the selection function is not correct, and will lead to biased results as long as the selection is dependent only on the observed data.
5 HOW IMPORTANT IS IT TO INCLUDE SELECTION EFFECTS?
In the former case, every event contributes to a mistake in inference as the factor |$1/\alpha (\vec{\lambda }^{\prime })$| is also missing. The number of events required before the error becomes apparent will then depend on how strongly this factor varies with the population parameters, which depends on the particular inference problem. For example, in the case of inferring the slope of the black hole mass function from binary black hole mergers observed by LIGO, this would be a strong effect: shallower mass functions give more high-mass events, which are visible to greater distances, and so a higher proportion of the total population lies within the LIGO detector horizon (see for example Fishbach & Holz 2017). However, in the case of inferring the Hubble constant using binary neutron star observations with counterparts, the natural prior on the distance distribution is uniform in comoving volume and, since mass redshifting and non-Euclidean cosmological corrections are negligible within the current LIGO horizon, the selection effect is largely independent of the Hubble constant (Abbott et al. 2017b). To be concrete, in the example that will be described in the next section, we repeated the analysis using the former of these wrong methods (as a worst-case scenario) and we show the results of that analysis as dashed lines in Fig. 3. That figure shows the probability–probability plot, i.e. the fraction of times the true parameters lie at a particular significance level over many experiments. For true and modelled distributions that are both Gaussians with common variance σ but means that differ by a bias b, the amount by which the p–p plot deviates from the diagonal depends on b/σ (see discussion in Gair & Moore 2015). We see that, for that specific example, with 10 events the bias is already evident in the p–p plot, but at a level consistent with b/σ < 1: there is a bias, but it is smaller than the typical statistical error. For 100 events the effect is much more pronounced and consistent with b/σ ∼ a few, so the result will be appreciably biased. These numbers are for a specific problem, and the threshold for inclusion of selection effects to avoid bias will vary from problem to problem. It is therefore important to always include selection effects properly in the analysis, unless there is a good reason to believe that they can be ignored, which typically could only be assessed by doing the analysis including selection effects anyway.
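The dependence on b/σ can be illustrated with a toy calculation (a stand-in for the analyses behind Fig. 3, not the analyses themselves): model each experiment's posterior as a Gaussian of width σ whose mean misses the truth by the bias b in addition to a statistical scatter of the same size, and examine how far the resulting p–p curve moves from the diagonal as b/σ grows.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def pp_quantiles(bias_over_sigma, n_experiments=10_000):
    # Quantile of the truth within a Gaussian posterior of width sigma whose mean
    # misses the truth by the bias b plus a statistical error also of size sigma.
    # For b = 0 the quantiles are uniform, giving a diagonal p-p curve.
    z = rng.standard_normal(n_experiments)
    return np.sort(norm.cdf(-bias_over_sigma - z))

for b_over_sigma in (0.0, 0.5, 2.0):
    q = pp_quantiles(b_over_sigma)
    empirical_cdf = (np.arange(q.size) + 1) / q.size
    print(f"b/sigma = {b_over_sigma}: max deviation from diagonal = "
          f"{np.max(np.abs(q - empirical_cdf)):.3f}")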
6 AN ILLUSTRATION: MEASURING A LUMINOSITY FUNCTION WITH A FLUX-LIMITED SURVEY
Measuring a luminosity function from a flux-limited survey is a classic problem in astronomy that deals with selection effects (see e.g. Malmquist 1922). Here we apply the method discussed in the previous sections to a toy-model, but illustrative, version of this problem.
Figure 1. The distribution of observed (blue) and true (orange) luminosities for a draw from the model discussed in Section 6. Due to selection effects, the distribution of observed luminosities peaks at higher luminosity and falls more rapidly at low luminosity than the true distribution of sources.
When we apply the ‘top-down’ methodology to this problem, the crucial integral in equation (23) is not analytically tractable, even though both the population distribution and the selection function are simple functions. We must evaluate this integral numerically. We choose to do this by sampling over the unobserved population and associated data (subject to the constraint that the fluxes associated with the unobserved population are always below Fth) in an MCMC at the same time as we sample the properties of the population and observed objects. That is, we explicitly implement equation (18) as our posterior density, summing over the (unknown) number of non-detected systems. Sampling over the unobserved population with this posterior is a method for numerically evaluating the selection integral. Code and results implementing this model in the stan sampler (Carpenter et al. 2017) can be found at https://github.com/farr/SelectionExample. One result of the sampling is an estimate of the luminosity function parameters L* and α; a joint posterior on these parameters appears in Fig. 2. The analysis also recovers, with similar accuracy, the expected number of objects in the survey volume (Λ), improved estimates of each object’s intrinsic luminosity (informed by the population model), and the luminosity distributions of the set of objects too dim to be observed by the survey, as a by-product of the selection function modelling.
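The stan model linked above evaluates the selection integral implicitly, by sampling over the unobserved objects; the same quantity can also be estimated directly by Monte Carlo, as in the following sketch for an assumed toy model (a Schechter-like luminosity function, sources uniform in Euclidean volume, and a hard flux threshold), which is a stand-in rather than the model in the linked repository.

import numpy as np

rng = np.random.default_rng(1)

def selection_fraction(L_star, slope, F_th, d_max=100.0, n_draws=200_000):
    # Monte Carlo estimate of the selection integral alpha(lambda): the fraction
    # of sources in the survey volume whose flux exceeds the threshold F_th.
    # Assumed toy model: luminosities from p(L) proportional to L^slope * exp(-L/L_star)
    # with slope > -1 (drawn via a gamma distribution); sources uniform in
    # Euclidean volume out to d_max (arbitrary units); flux F = L / (4 pi d^2).
    L = L_star * rng.gamma(slope + 1.0, size=n_draws)
    d = d_max * rng.uniform(size=n_draws) ** (1.0 / 3.0)
    F = L / (4.0 * np.pi * d ** 2)
    return np.mean(F > F_th)

print(selection_fraction(L_star=1.0, slope=-0.5, F_th=1e-4))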
7 AN ILLUSTRATION: MEASURING THE MASS RATIO OF BINARY NEUTRON STARS
Do all or the majority of merging binary neutron stars have mass ratios very close to unity? Is the answer to this question redshift- or metallicity-dependent? This question is an important science driver for third-generation gravitational-wave detectors.5 Here, we examine how many neutron star binary mergers must be detected in order to measure the mass-ratio distribution, providing an illustration of the methodology described in the previous sections.
We test the self-consistency of the inference on λ by creating 100 mock populations with random values of λ drawn from the flat prior λ ∈ [0, 0.1]. For each population, we compute the posterior distribution on λ following the methodology described above. We then ask for the quantile of the true value of λ within this posterior. Fig. 3 shows the cumulative distribution of this quantile value, the so-called p–p plot. If posteriors are self-consistent, we expect the truth to fall within the X per cent Bayesian credible interval X per cent of the time, i.e. the p–p plot should be diagonal (e.g. Cook, Gelman & Rubin 2006; Sidery et al. 2014; Veitch et al. 2015). We confirm that the p–p plot is consistent with the diagonal within statistical fluctuations.
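The bookkeeping for this test is straightforward; the helper below (ours, with the posterior samples assumed to come from the hierarchical analysis itself) computes the quantiles and quantifies consistency with the diagonal via a Kolmogorov–Smirnov test against the uniform distribution.

import numpy as np
from scipy.stats import kstest

def pp_check(true_values, posterior_sample_sets):
    # For each mock population, the quantile of the true lambda within its
    # posterior samples; for a self-consistent analysis these quantiles are
    # uniform on [0, 1], i.e. the p-p plot tracks the diagonal.
    quantiles = np.array([np.mean(samples < truth)
                          for truth, samples in zip(true_values, posterior_sample_sets)])
    # A Kolmogorov-Smirnov test against the uniform distribution quantifies
    # 'consistent with the diagonal within statistical fluctuations'.
    return np.sort(quantiles), kstest(quantiles, "uniform").pvalue

Plotting the sorted quantiles against their empirical cumulative fractions then gives a p–p plot of the kind shown in Fig. 3.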
Figure 3. The p–p plot of the cumulative distribution of the quantile of the true value of λ within its posterior, as estimated from 10 (solid orange curve) and 100 (solid blue curve) mock data sets. These are consistent with the diagonal (dashed black line). For comparison, we show the corresponding results, as dashed lines, from using one particular wrong method, as described in Section 5.
Figure 4. The width of the 90 per cent credible interval Δλ as a function of the number of detections; the true value of λ is 0.05 in all mock catalogues. The fluctuations relative to the |$\Delta \lambda \propto N_\mathrm{det}^{-1/2}$| trend are due to the stochastic nature of the detected sample.
Having tested the method and its implementation, we now analyse the uncertainty in the inferred value of λ. This time, we fix the value of λ at λ = 0.05 when generating mock data catalogues, but vary the number of simulated events, with a subset of the events labelled as detectable. We compute the width of the 90 per cent credible interval on λ, defined here as stretching from the 5th to the 95th percentile of the posterior. In Fig. 4, we plot this width Δλ against the number of detectable events.
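The width computation itself is simple; the sketch below uses a toy Gaussian posterior whose width is made to shrink as N_det^{-1/2} purely to illustrate the expected trend (the constant is arbitrary), rather than output from the actual mock catalogues.

import numpy as np

rng = np.random.default_rng(3)

def credible_interval_width(samples, level=0.90):
    # Width of the credible interval defined in the text: from the 5th to the
    # 95th percentile of the posterior samples when level = 0.90.
    lo, hi = np.percentile(samples, [50.0 * (1.0 - level), 50.0 * (1.0 + level)])
    return hi - lo

# Toy Gaussian posteriors centred on lambda = 0.05 whose width shrinks as
# N_det^(-1/2) (the constant 0.1 is arbitrary), to illustrate the expected trend.
for n_det in (100, 300, 1000, 3000):
    samples = 0.05 + (0.1 / np.sqrt(n_det)) * rng.standard_normal(20_000)
    print(n_det, round(credible_interval_width(samples), 4))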
We find that ∼1000 detections at ρ ≥ 18 are necessary in order to measure λ to an accuracy δλ ≈ 0.01. Distributions with λ = 0.01 and λ = 0.02 yield median values of η (q) of 0.243 (0.71) and 0.236 (0.62), respectively, so at least a thousand detections are required in order to make meaningful inference on the mass ratio distribution with a view to distinguishing evolutionary models. An even greater number of detections would be required in each of several redshift bins in order to search for redshift-dependent changes in the mass ratio distribution – perhaps |$\mathcal {O}(10000)$|, given the plausible variation of the mass ratio distribution with redshift.
ACKNOWLEDGEMENTS
IM and WF thank Tom Loredo for useful discussions and the Statistical and Applied Mathematical Sciences Institute, partially supported by the National Science Foundation under Grant DMS-1127914, for hospitality. IM’s work was performed in part at Aspen Center for Physics, which is supported by National Science Foundation grant PHY-1607611; IM’s visit there was partially supported by a grant from the Simons Foundation. IM thanks Stephen Justham, Vicky Kalogera, and Fred Rasio for discussions related to the illustrative example. We thank Arya Farahi for alerting us to a typo in the manuscript, and the anonymous referee for a number of insightful comments.
Footnotes
1. In practice, there are very weak deviations from this power law due to the imperfect – noisy – measurement of signal amplitude.
2. An example where the selection may be parameter- rather than data-dependent is in surveys of objects that have been selected based on data in yet other surveys; Maggie Lieu pointed us to X-ray selected populations of galaxy clusters in a weak-lensing catalogue. This can still be treated within the framework proposed here, by considering the combined likelihood for both data sets and marginalizing over the ‘discarded’ data from the survey used for selection.
3. This is a very artificial model since all detectors have noise and the reason that |$p_\textrm{det}(\vec{\theta })$| is not equal to one is because of that noise. However, it serves to illustrate the basic idea.
4. The rationale for writing this as a double-integral, when the integral over |$\vec{d}$| is in fact trivial – since the likelihood is normalized over |$\vec{d}$| – will become apparent below.
5. This was identified as a key goal during an ongoing study commissioned by the Gravitational Wave International Committee (GWIC), https://gwic.ligo.org/3Gsubcomm/charge.shtml.
6. We assume that the noise spectral density is proportional to the LIGO A+ design, https://dcc.ligo.org/LIGO-T1800042/public.