Ilya Mandel, Will M Farr, Jonathan R Gair, Extracting distribution parameters from multiple uncertain observations with selection biases, Monthly Notices of the Royal Astronomical Society, Volume 486, Issue 1, June 2019, Pages 1086–1093, https://doi.org/10.1093/mnras/stz896
Abstract
We derive a Bayesian framework for incorporating selection effects into population analyses. We allow for both measurement uncertainty in individual measurements and, crucially, for selection biases on the population of measurements, and show how to extract the parameters of the underlying distribution based on a set of observations sampled from this distribution. We illustrate the performance of this framework with an example from gravitational-wave astrophysics, demonstrating that the mass ratio distribution of merging compact-object binaries can be extracted from Malmquist-biased observations with substantial measurement uncertainty.
1 INTRODUCTION
The problem of extracting the distributional properties of a population of sources based on a set of observations drawn from that distribution is a common one, frequently labelled as hierarchical modelling (e.g. Hogg, Myers & Bovy 2010; Bovy, Hogg & Roweis 2011 call this ‘extreme deconvolution’). In practical applications, one often has to deal with selection effects: the observed population will have a Malmquist bias (Malmquist 1922, 1925) whereby the loudest or brightest sources are most likely to be detected, and it is necessary to correct for this bias in order to extract the true source population (e.g. Farr et al. 2014; Foreman-Mackey, Hogg & Morton 2014). In other applications, significant measurement uncertainties in the individual observations must be accounted for (e.g. Farr & Mandel 2018). Of course, these two complications – measurement uncertainties and selection effects – are often present simultaneously.
There have been multiple attempts to address the problem of population-based inference with both selection effects and significant measurement uncertainties. The earliest correct published solution to this problem, as far as we are aware, belongs to Loredo (2004). However, despite the availability of this solution, it is easy to be lured into a seemingly straightforward but incorrect derivation. The most common mistake is the modification of the model population distribution to account for the selection function, i.e. the inclusion of the probability of detecting a particular event only as a multiplicative term in the probability of observing that event. This detection probability is usually included as the probability marginalized over all realizations of the data, ignoring the fact that we know the particular data realization that has been observed. For a given data realization the probability that a source is detected, which is a property purely of the data, is by definition equal to one for any data set associated with an observation we are analysing. On the other hand, as shown below, it is critical to include the detection probability in the normalization factor to account for the different numbers of events expected to be observed under different population models.
We sketched out the correct approach to including selection effects in Mandel, Farr & Gair (2016) (which is superseded by the present manuscript) and Abbott et al. (2016). Other correct applications in the literature include Fishbach & Holz (2017), Fishbach, Holz & Farr (2018), and Feeney et al. (2019). Here, we expand and clarify the earlier treatment of Loredo (2004) by presenting two different approaches to solving this problem below: a bottom-up and a top-down derivation, showing that they yield the same result. Some among us find one or the other approach to be more clear, and we hope that including both will also benefit readers.
We illustrate the derived methodology with two examples. The first is the classic example of measuring a luminosity function with a flux-limited survey. The second is an example from gravitational-wave astronomy: the measurement of the mass ratio of merging binary neutron stars. We show that ≳ 1000 observations at a signal-to-noise ratio (SNR) of ≳ 20 will be necessary to accurately measure the mass ratio distribution. This feat, which can be accomplished with third-generation ground-based gravitational-wave detectors, could elucidate the details of neutron star formation.
2 PROBLEM STATEMENT AND NOTATION
We consider a population of events or objects, each described by a set of parameters |$\vec{\theta }$|. These parameters represent the characteristics of individual events. For example, in the case of compact binary coalescences observed by LIGO and Virgo these would include the masses, spin magnitudes and spin orientations of the two components, the location of the source on the sky, the distance of the source, the orientation and eccentricity of the binary orbit etc. The distribution of events in the population is described via parameters |$\vec{\lambda }$|, so that the number density of objects follows |$\frac{\mathrm{d}N}{\mathrm{d}\vec{\theta }} (\vec{\lambda }) = N p_\textrm{pop}(\vec{\theta }|\vec{\lambda }{}^{\prime })$|. In the gravitational-wave context, these parameters could represent properties of the population like the slope of the mass function of black holes in compact binaries, or the shape of the spin magnitude distribution, or the mixing fractions of different subpopulations. They could also represent physical ingredients used in population synthesis calculations, for example the parameters of the initial mass function, stellar metallicity distribution or stellar winds and the properties of common envelope evolution or of the distribution of supernova kicks. In this second case, the distribution of the individual event properties |$p_\textrm{pop}(\vec{\theta }|\vec{\lambda }{}^{\prime })$| could be obtained from the output of population synthesis codes for that particular choice of input physics. We have separated |$\vec{\lambda }$| into the overall normalization for the number or rate of events N and the set of parameters describing the shape of the distribution alone |$\vec{\lambda }{}^{\prime }$|. For instance, if the underlying distribution is modelled as a multidimensional Gaussian, |$\vec{\lambda }$| would consist of the mean vector and covariance matrix; alternatively, a non-parametric distribution could be described with a (multidimensional) histogram, in which case |$\vec{\lambda }$| represents the weights of various histogram bins.
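As a minimal worked example of this notation (a hypothetical one-parameter population, not one analysed in this paper), suppose the only event parameter is the chirp mass Mc and the population shape is a power law with slope γ between fixed bounds, so that the shape parameters are λ′ = {γ, Mmin, Mmax} and
$$\frac{\mathrm{d}N}{\mathrm{d}M_\mathrm{c}}(\vec{\lambda }) = N\, p_\textrm{pop}(M_\mathrm{c}|\vec{\lambda }{}^{\prime }), \qquad p_\textrm{pop}(M_\mathrm{c}|\gamma ,M_\mathrm{min},M_\mathrm{max}) = \frac{(1-\gamma )\, M_\mathrm{c}^{-\gamma }}{M_\mathrm{max}^{1-\gamma }-M_\mathrm{min}^{1-\gamma }}, \qquad M_\mathrm{min}\le M_\mathrm{c}\le M_\mathrm{max},$$
valid for γ ≠ 1; the rate N and the shape parameters can then be inferred jointly, or the shape alone, as discussed in Sections 3 and 4.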
This distribution is sampled by drawing a set of Nobs ‘observed events’ with true parameters |$\lbrace \vec{\theta }_i\rbrace$|, for i ∈ [1, Nobs]. For each object in the population we make a noisy measurement of |$\vec{\theta }_i$|, represented by a likelihood function relating the measured data, |$\vec{d}_i$|, to the parameters of the event, |$\vec{\theta }$|: |$p\left(\vec{d}_i \mid \vec{\theta }_i \right)$|.
Moreover, based on the observed data, some objects are classed as ‘observable’ and others are ‘unobservable.’ For example, a survey may impose a per-pixel or per-aperture threshold on the flux for inclusion of point sources in a catalogue, or a gravitational wave detector may only report events whose SNR rises above some predetermined threshold. This detection probability can be estimated empirically for a search pipeline via a large injection campaign. In some cases, it can be modelled analytically; for example, for low-mass compact binaries, the gravitational-wave strain in the frequency domain is proportional to the 5/6 power of the chirp mass Mc, so the detection probability scales as the surveyed volume,1|$\propto M_\mathrm{ c}^{15/6}$|. Throughout this article, we will assume that whether or not an event is counted as a detection is a property only of the data for each object and so there exists an indicator function |$\mathbf {I}(\vec{d})$| that is equal to 1 for ‘observable’ objects that would be classified as detections and 0 otherwise; this is by far the most common case for astronomical observations.2
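As a concrete sketch of these two ingredients, the following toy injection campaign estimates the detection probability empirically under an assumed signal-to-noise model with a hard threshold; the amplitude constant, noise model, distance distribution, and threshold value are illustrative assumptions rather than properties of any real search pipeline.

import numpy as np

rng = np.random.default_rng(42)

def observed_snr(chirp_mass, distance):
    # Toy data model: the signal amplitude scales as Mc^(5/6) / d for low-mass
    # inspirals, with unit-variance Gaussian noise added. The constant 8.0 and
    # the noise model are illustrative assumptions.
    return 8.0 * chirp_mass ** (5.0 / 6.0) / distance + rng.standard_normal(np.shape(distance))

def is_detected(rho, threshold=8.0):
    # Indicator function I(d): a property of the observed data (here, the SNR) only.
    return rho > threshold

def detection_probability(chirp_mass, n_injections=100_000, d_max=3.0):
    # Empirical detection probability from a mock injection campaign, with sources
    # distributed uniformly in Euclidean volume out to d_max (arbitrary units).
    d = d_max * rng.uniform(size=n_injections) ** (1.0 / 3.0)
    rho = observed_snr(chirp_mass, d)
    return np.mean(is_detected(rho))

# The detection probability grows roughly as Mc^(15/6) while the horizon for
# such sources lies well inside d_max.
for mc in (1.0, 1.2, 1.4):
    print(f"Mc = {mc}: p_det ~ {detection_probability(mc):.3f}")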
3 BOTTOM-UP DERIVATION
First, we follow the bottom-up approach of deriving the likelihood for obtaining a particular set of observations given the population parameters, by starting with a simple problem without either measurement uncertainties or selection effects and gradually building up the problem complexity. For the moment, we assume that we are only interested in the shape of the population distribution, and ignore the normalization, or rate, of objects in the population; we discuss estimation of both the rate and shape of a population at the end of this section and in Section 4.
We have so far described inference based on the shape of the distribution |$p_\textrm{pop}(\vec{\theta }|\vec{\lambda }^{\prime })$| while ignoring the overall normalization. This is appropriate when the overall normalization of the population counts is not interesting, or when the bulk of the information comes from the distribution properties rather than the detection rate (a single data point). This is a reasonable assumption in the gravitational-wave context, where the astrophysical uncertainty on the rates of compact object mergers covers several orders of magnitude. While inferring the rate is of great interest, models may not predict it with sufficient precision for that measurement to have strong constraining power.
4 TOP-DOWN DERIVATION
This is the same result we derived in Section 3. Each multiplicative term in the numerator of equation (13) from Section 3 is the integral |$\int \mathrm{d}\vec{\theta } p(\vec{d}_i|\vec{\theta }) p_\textrm{pop}(\vec{\theta }|\vec{\lambda }^{\prime })$|, approximated as a Monte Carlo sum over the posterior samples. The denominator of equation (13) is |$\alpha ^{N_\mathrm{obs}}$|. Meanwhile, α = Ndet/N according to equation (12), which is identical to equation (23) from this section. With the substitution |$p_\textrm{pop}(\vec{\theta }|\vec{\lambda }^{\prime }) = (dN/d\vec{\theta }) / N$|, the entire fraction in equation (13) is identical to the first term of equation (22) divided by |$N_\mathrm{det}^{N_\mathrm{obs}}$|, which cancels the last term of equation (13). Thus, we see that equations (13) and (22) are equivalent up to the choice of priors.
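In code, the resulting shape-only likelihood can be evaluated as in the following minimal sketch; the function names and the interim-prior reweighting convention are ours, and the detection fraction α would be supplied separately, for example from an injection campaign such as the toy one sketched in Section 2.

import numpy as np

def log_population_likelihood(posterior_samples, interim_prior, p_pop, alpha):
    # Shape-only population log-likelihood of the form described above: a product
    # over events of the per-event integrals
    #     int dtheta p(d_i | theta) p_pop(theta | lambda'),
    # each approximated by a Monte Carlo average over that event's posterior
    # samples (reweighted by the interim prior pi(theta) used to produce them),
    # divided by alpha(lambda')^{N_obs}.
    #
    # posterior_samples : list of length N_obs; arrays of theta samples per event
    # interim_prior     : list of arrays; pi(theta) evaluated at those samples
    # p_pop             : callable returning p_pop(theta | lambda') at fixed lambda'
    # alpha             : detection fraction alpha(lambda') for the same lambda'
    log_like = 0.0
    for theta, pi in zip(posterior_samples, interim_prior):
        # <p_pop / pi> over the event's posterior samples estimates the per-event
        # integral up to a lambda'-independent constant.
        log_like += np.log(np.mean(p_pop(theta) / pi))
    # Selection effects enter only through the overall normalisation alpha^{N_obs}.
    return log_like - len(posterior_samples) * np.log(alpha)

If the per-event samples were obtained with a flat interim prior, the reweighting reduces to a plain average of p_pop over the samples, which is the Monte Carlo sum referred to above.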
Note that the commonly employed technique of modifying |$\frac{\mathrm{d}N}{\mathrm{d}\vec{\theta }}$| to account for the selection function is not correct, and will lead to biased results as long as the selection is dependent only on the observed data.
5 HOW IMPORTANT IS IT TO INCLUDE SELECTION EFFECTS?
In the former case, every event contributes to a mistake in inference as the factor |$1/\alpha (\vec{\lambda }^{\prime })$| is also missing. The number of events required before the error becomes apparent will then depend on how strongly this factor varies with the population parameters, which depends on the particular inference problem. For example, in the case of inferring the slope of the black hole mass function from binary black hole mergers observed by LIGO, this would be a strong effect: shallower mass functions give more high-mass events, which are visible to greater distances, and so a higher proportion of the total population lies within the LIGO detector horizon (see for example Fishbach & Holz 2017). However, in the case of inferring the Hubble constant using binary neutron star observations with counterparts, the natural prior on the distance distribution is uniform in comoving volume and, since mass redshifting and non-Euclidean cosmological corrections are negligible within the current LIGO horizon, the selection effect is largely independent of the Hubble constant (Abbott et al. 2017b). To be concrete, in the example that will be described in the next section, we repeated the analysis using the former of these wrong methods (as a worst-case scenario) and we show the results of that analysis as dashed lines in Fig. 3. That figure shows the probability–probability plot, i.e. the fraction of times the true parameters lie at a particular significance level over many experiments. For true and modelled distributions that are both Gaussians with common variance σ but means that differ by a bias b, the amount by which the p–p plot deviates from the diagonal depends on b/σ (see discussion in Gair & Moore 2015). We see that, for that specific example, with 10 events the bias is already evident in the p–p plot, but at a level consistent with b/σ < 1: there is a bias, but it is smaller than the typical statistical error. For 100 events the effect is much more pronounced and consistent with b/σ ∼ a few, so the result will be appreciably biased. These numbers are for a specific problem, and the threshold for inclusion of selection effects to avoid bias will vary from problem to problem. It is therefore important to always include selection effects properly in the analysis, unless there is a good reason to believe that they can be ignored, which typically could only be assessed by doing the analysis including selection effects anyway.
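The dependence on b/σ can be illustrated with a toy calculation (a stand-in for the analyses behind Fig. 3, not the analyses themselves): model each experiment's posterior as a Gaussian of width σ whose mean misses the truth by the bias b in addition to a statistical scatter of the same size, and examine how far the resulting p–p curve moves from the diagonal as b/σ grows.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def pp_quantiles(bias_over_sigma, n_experiments=10_000):
    # Quantile of the truth within a Gaussian posterior of width sigma whose mean
    # misses the truth by the bias b plus a statistical error also of size sigma.
    # For b = 0 the quantiles are uniform, giving a diagonal p-p curve.
    z = rng.standard_normal(n_experiments)
    return np.sort(norm.cdf(-bias_over_sigma - z))

for b_over_sigma in (0.0, 0.5, 2.0):
    q = pp_quantiles(b_over_sigma)
    empirical_cdf = (np.arange(q.size) + 1) / q.size
    print(f"b/sigma = {b_over_sigma}: max deviation from diagonal = "
          f"{np.max(np.abs(q - empirical_cdf)):.3f}")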
6 AN ILLUSTRATION: MEASURING A LUMINOSITY FUNCTION WITH A FLUX-LIMITED SURVEY
Measuring a luminosity function from a flux-limited survey is a classic problem in astronomy that deals with selection effects (see e.g. Malmquist 1922). Here we apply the method discussed in the previous sections to a toy-model, but illustrative, version of this problem.
Figure 1. The distribution of observed (blue) and true (orange) luminosities for a draw from the model discussed in Section 6. Due to selection effects, the distribution of observed luminosities peaks at higher luminosity and falls more rapidly at low luminosity than the true distribution of sources.
When we apply the ‘top-down’ methodology to this problem, the crucial integral in equation (23) is not analytically tractable, even though both the population distribution and the selection function are simple functions. We must evaluate this integral numerically. We choose to do this by sampling over the unobserved population and associated data (subject to the constraint that the fluxes associated with the unobserved population are always below Fth) in an MCMC at the same time as we sample the properties of the population and observed objects. That is, we explicitly implement equation (18) as our posterior density, summing over the (unknown) number of non-detected systems. Sampling over the unobserved population with this posterior is a method for numerically evaluating the selection integral. Code and results implementing this model in the stan sampler (Carpenter et al. 2017) can be found at https://github.com/farr/SelectionExample. One result of the sampling is an estimate of the luminosity function parameters L* and α; a joint posterior on these parameters appears in Fig. 2. The analysis also recovers, with similar accuracy, the expected number of objects in the survey volume (Λ), improved estimates of each object’s intrinsic luminosity (informed by the population model), and the luminosity distributions of the set of objects too dim to be observed by the survey, as a by-product of the selection function modelling.
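The stan model linked above evaluates the selection integral implicitly, by sampling over the unobserved objects; the same quantity can also be estimated directly by Monte Carlo, as in the following sketch for an assumed toy model (a Schechter-like luminosity function, sources uniform in Euclidean volume, and a hard flux threshold), which is a stand-in rather than the model in the linked repository.

import numpy as np

rng = np.random.default_rng(1)

def selection_fraction(L_star, slope, F_th, d_max=100.0, n_draws=200_000):
    # Monte Carlo estimate of the selection integral alpha(lambda): the fraction
    # of sources in the survey volume whose flux exceeds the threshold F_th.
    # Assumed toy model: luminosities from p(L) proportional to L^slope * exp(-L/L_star)
    # with slope > -1 (drawn via a gamma distribution); sources uniform in
    # Euclidean volume out to d_max (arbitrary units); flux F = L / (4 pi d^2).
    L = L_star * rng.gamma(slope + 1.0, size=n_draws)
    d = d_max * rng.uniform(size=n_draws) ** (1.0 / 3.0)
    F = L / (4.0 * np.pi * d ** 2)
    return np.mean(F > F_th)

print(selection_fraction(L_star=1.0, slope=-0.5, F_th=1e-4))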
7 AN ILLUSTRATION: MEASURING THE MASS RATIO OF BINARY NEUTRON STARS
Do all or the majority of merging binary neutron stars have mass ratios very close to unity? Is the answer to this question redshift- or metallicity-dependent? This question is an important science driver for third-generation gravitational-wave detectors.5 Here, we examine how many neutron star binary mergers must be detected in order to measure the mass-ratio distribution, providing an illustration of the methodology described in the previous sections.
We test the self-consistency of the inference on λ by creating 100 mock populations with random values of λ drawn from the flat prior λ ∈ [0, 0.1]. For each population, we compute the posterior distribution on λ following the methodology described above. We then ask for the quantile of the true value of λ within this posterior. Fig. 3 shows the cumulative distribution of this quantile value, the so-called p–p plot. If posteriors are self-consistent, we expect the truth to fall within the X per cent Bayesian credible interval X per cent of the time, i.e. the p–p plot should be diagonal (e.g. Cook, Gelman & Rubin 2006; Sidery et al. 2014; Veitch et al. 2015). We confirm that the p–p plot is consistent with the diagonal within statistical fluctuations.
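The bookkeeping for this test is straightforward; the helper below (ours, with the posterior samples assumed to come from the hierarchical analysis itself) computes the quantiles and quantifies consistency with the diagonal via a Kolmogorov–Smirnov test against the uniform distribution.

import numpy as np
from scipy.stats import kstest

def pp_check(true_values, posterior_sample_sets):
    # For each mock population, the quantile of the true lambda within its
    # posterior samples; for a self-consistent analysis these quantiles are
    # uniform on [0, 1], i.e. the p-p plot tracks the diagonal.
    quantiles = np.array([np.mean(samples < truth)
                          for truth, samples in zip(true_values, posterior_sample_sets)])
    # A Kolmogorov-Smirnov test against the uniform distribution quantifies
    # 'consistent with the diagonal within statistical fluctuations'.
    return np.sort(quantiles), kstest(quantiles, "uniform").pvalue

Plotting the sorted quantiles against their empirical cumulative fractions then gives a p–p plot of the kind shown in Fig. 3.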
Figure 3. The p–p plot of the cumulative distribution of the quantile of the true value of λ within its posterior, as estimated from 10 (solid orange curve) and 100 (solid blue curve) mock data sets. These are consistent with the diagonal (dashed black line). For comparison, we show the corresponding results, as dashed lines, from using one particular wrong method, as described in Section 5.
Figure 4. The width of the 90 per cent credible interval Δλ as a function of the number of detections; the true value of λ is 0.05 in all mock catalogues. The fluctuations relative to the |$\Delta \lambda \propto N_\mathrm{det}^{-1/2}$| trend are due to the stochastic nature of the detected sample.
Having tested the method and its implementation, we now analyse the uncertainty in the inferred value of λ. This time, we fix the value of λ at λ = 0.05 when generating mock data catalogues, but vary the number of simulated events, with a subset of the events labelled as detectable. We compute the width of the 90 per cent credible interval on λ, defined here as stretching from the 5th to the 95th percentile of the posterior. In Fig. 4, we plot this width Δλ against the number of detectable events.
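The width computation itself is simple; the sketch below uses a toy Gaussian posterior whose width is made to shrink as N_det^{-1/2} purely to illustrate the expected trend (the constant is arbitrary), rather than output from the actual mock catalogues.

import numpy as np

rng = np.random.default_rng(3)

def credible_interval_width(samples, level=0.90):
    # Width of the credible interval defined in the text: from the 5th to the
    # 95th percentile of the posterior samples when level = 0.90.
    lo, hi = np.percentile(samples, [50.0 * (1.0 - level), 50.0 * (1.0 + level)])
    return hi - lo

# Toy Gaussian posteriors centred on lambda = 0.05 whose width shrinks as
# N_det^(-1/2) (the constant 0.1 is arbitrary), to illustrate the expected trend.
for n_det in (100, 300, 1000, 3000):
    samples = 0.05 + (0.1 / np.sqrt(n_det)) * rng.standard_normal(20_000)
    print(n_det, round(credible_interval_width(samples), 4))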
We find that ∼1000 detections at ρ ≥ 18 are necessary in order to measure λ to an accuracy δλ ≈ 0.01. Distributions with λ = 0.01 and λ = 0.02 yield median values of η (q) of 0.243 (0.71) and 0.236 (0.62), respectively, so at least a thousand detections are required in order to make meaningful inference on the mass ratio distribution with a view to distinguishing evolutionary models. An even greater number of detections would be required in each of several redshift bins in order to search for redshift-dependent changes in the mass ratio distribution – perhaps |$\mathcal {O}(10000)$|, given the plausible variation of the mass ratio distribution with redshift.
ACKNOWLEDGEMENTS
IM and WF thank Tom Loredo for useful discussions and the Statistical and Applied Mathematical Sciences Institute, partially supported by the National Science Foundation under Grant DMS-1127914, for hospitality. IM’s work was performed in part at Aspen Center for Physics, which is supported by National Science Foundation grant PHY-1607611; IM’s visit there was partially supported by a grant from the Simons Foundation. IM thanks Stephen Justham, Vicky Kalogera, and Fred Rasio for discussions related to the illustrative example. We thank Arya Farahi for alerting us to a typo in the manuscript, and the anonymous referee for a number of insightful comments.
Footnotes
1. In practice, there are very weak deviations from this power law due to the imperfect – noisy – measurement of signal amplitude.
2. An example where the selection may be parameter- rather than data-dependent is in surveys of objects that have been selected based on data in yet other surveys; Maggie Lieu pointed us to X-ray selected populations of galaxy clusters in a weak-lensing catalogue. This can still be treated within the framework proposed here, by considering the combined likelihood for both data sets and marginalizing over the ‘discarded’ data from the survey used for selection.
3. This is a very artificial model since all detectors have noise and the reason that |$p_\textrm{det}(\vec{\theta })$| is not equal to one is because of that noise. However, it serves to illustrate the basic idea.
4. The rationale for writing this as a double-integral, when the integral over |$\vec{d}$| is in fact trivial – since the likelihood is normalized over |$\vec{d}$| – will become apparent below.
5. This was identified as a key goal during an ongoing study commissioned by the Gravitational Wave International Committee (GWIC), https://gwic.ligo.org/3Gsubcomm/charge.shtml.
6. We assume that the noise spectral density is proportional to the LIGO A+ design, https://dcc.ligo.org/LIGO-T1800042/public.