Abstract

Aims

Fits of species-abundance distributions to empirical data are increasingly used to evaluate models of diversity maintenance and community structure and to infer properties of communities, such as species richness. Two distributions predicted by several models are the Poisson lognormal (PLN) and the negative binomial (NB) distribution; however, at least three different ways to parameterize the PLN have been proposed, which differ in whether unobserved species contribute to the likelihood and in whether the likelihood is conditional upon the total number of individuals in the sample. Each of these has an analogue for the NB. Here, we propose a new formulation of the PLN and NB that includes the number of unobserved species as one of the estimated parameters. We investigate the performance of parameter estimates obtained from this reformulation, as well as the existing alternatives, for drawing inferences about the shape of species abundance distributions and estimation of species richness.

Methods

We simulate the random sampling of a fixed number of individuals from lognormal and gamma community relative abundance distributions, using a previously developed ‘individual-based’ bootstrap algorithm. We use a range of sample sizes, community species richness levels and shape parameters for the species abundance distributions that span much of the realistic range for empirical data, generating 1 000 simulated data sets for each parameter combination. We then fit each of the alternative likelihoods to each of the simulated data sets, and we assess the bias, sampling variance and estimation error for each method.

Important Findings

Parameter estimates behave reasonably well for most parameter values, exhibiting modest levels of median error. However, for the NB, median error becomes extremely large as the NB approaches either of two limiting cases. For both the NB and PLN, >90% of the variation in the error in model parameters across parameter sets is explained by three quantities that corresponded to the proportion of species not observed in the sample, the expected number of species observed in the sample and the discrepancy between the true NB or PLN distribution and a Poisson distribution with the same mean. There are relatively few systematic differences between the four alternative likelihoods. In particular, failing to condition the likelihood on the total sample sizes does not appear to systematically increase the bias in parameter estimates. Indeed, overall, the classical likelihood performs slightly better than the alternatives. However, our reparameterized likelihood, for which species richness is a fitted parameter, has important advantages over existing approaches for estimating species richness from fitted species-abundance models.

INTRODUCTION

Ecologists have sought to understand what drives patterns in the commonness and rarity of species for much of the last century (Magurran and Henderson 2011). In high-diversity assemblages, species-by-species estimates of demographic and other ecological parameters, such as competitive ability, cannot be made feasibly for the entire community, due both to the sheer number of species that must be studied and to the fact that the overwhelming majority of species in such communities are rare (McGill et al. 2007). Consequently, studies of commonness and rarity in high-diversity assemblages often focus on the overall frequency distributions of common and rare species, usually termed the ‘species-abundance distribution’ or ‘relative abundance distribution’, without respect to species identity. Such data are frequently fitted with alternative models, either as part of an attempt to identify the mechanisms that drive biodiversity maintenance in the community (Hubbell 2001; Lande et al. 2003) or to assess community properties for conservation purposes (Engen 2007; Gray 1981).

Two probability distributions that commonly recur in theory for species abundance distributions are the lognormal and the gamma distributions (Lande et al. 2003). These two distributions were first proposed when the analysis of commonness and rarity was still in its infancy, largely for phenomenological reasons (Fisher et al. 1943; Preston 1948). More recently, they have been derived from more mechanistic models of community structure. For instance, the lognormal and gamma arise as important special cases of stochastic abundance models (Diserud and Engen 2000; Engen and Lande 1996a, 1996b). The lognormal distribution also arises as a limiting case of the occupancy of high-dimensional niche space (Connolly et al. 2009; May 1975) and as an approximation for second-order departures from pure neutrality (Pueyo 2006). The gamma distribution, or its discrete analogue, the negative binomial (NB) distribution, is generated by several formulations of neutral theory (Etienne et al. 2007; Volkov et al. 2003, 2007).

Most species abundance distributions record counts of individuals, and they are incomplete samples from a community, rather than complete censuses. Thus, in most contemporary approaches to the analysis of species-abundance models, sampling theory is applied to derive a distribution for the probability that a species is represented by a particular number of individuals in a random sample from a community with a particular underlying community abundance distribution (Connolly and Dornelas 2011):
(1)
where P(n|x) is the probability of observing n individuals of a species in a sample, given that its mean abundance in the community is x, and f(x) is the probability distribution of mean abundance (i.e. it approximates the variation among species in mean abundance).
When P(n|x) is a Poisson distribution (i.e. sampling is random) and f(x) is a lognormal distribution, then the corresponding sample follows a Poisson lognormal (PLN) distribution:
(2)
where n is a natural number (n = 0, 1, 2, …), and μ and σ are the mean and standard deviation (SD) of log abundance in the community. When f(x) is the gamma distribution, then the sample follows an NB distribution:
(3)
where a and k are the scale and shape parameters of the gamma distribution, and m = ak is the mean of the resulting NB distribution.
In most cases, when the PLN and NB are fitted to species-abundance data, the log-likelihood that is used has a so-called ‘zero-truncated’ form and implicitly assumes that all species’ abundances are independent of one another:
(4)

In this expression, θ is the vector of parameter values (μ and σ for the PLN, and m and k for NB), Sobs is the number of species observed in the sample, N is the total number of individuals in the sample, sn is the number of species with abundance n and p(n) is the PLN or NB probability from equation (2) or (3). formula is thus the probability that a species has abundance n, conditional on the fact that it is observed at least once in the sample (i.e. its abundance is not zero). The product follows from the assumption that all species’ abundances are statistically independent of one another, and the multinomial coefficient in front of the product accounts for all the different ways that the Sobs species can have the observed set of abundances.

There are two unrealistic assumptions associated with the likelihood in equation (4) that are potentially problematic. One is the implicit assumption that a species’ abundance n can be arbitrarily large. Another is the assumption that species’ abundances are statistically independent of one another. In ecological samples, however, the abundance of a species in a sample will be constrained by total sampling effort or sample size and the abundances of the other species. One way to explicitly incorporate these two assumptions in the likelihood is to make it conditional on the total number of individuals observed in the sample:
(5)
where ni is the abundance of species i in the sample (Etienne and Olff 2005). Here, the numerator is the unconditional likelihood from equation (4), and the denominator normalizes this likelihood by all possible combinations of abundances of the Sobs observed species that would add up to a total sample size of N individuals. Etienne and Olff (2005) applied this likelihood to the PLN, but the expression applies equally well to the NB.
Although the likelihoods in equations (4) and (5) consider only the species that appear in the sample being analysed, they can be used to estimate the total number of species in the community. That is, the fitted parameters θ can be used in equation (2) or (3) to estimate p(0) and the size of the community species pool obtained from:
(6)
(Pielou 1975). However, because this is a derived parameter (i.e. it is not estimated directly by an optimization procedure when the likelihoods in equation (4) or (5) are fitted to species-abundance data), estimating the sampling variance for this potentially ecologically important quantity requires a computationally intensive approach (Connolly et al. 2005; Engen 2007). Moreover, one may wish to constrain multiple samples from the same community to have the same S or to specify a covariate model for variation in S among different communities. When S is a derived rather than a fitted parameter, such constrains are extremely difficult to impose.
Recently, Izsak (2008) proposed re-formulating the PLN likelihood with three fitted parameters, to explicitly estimate the total number of species in the community, S:
(7)

Note that, here, the products run from 0 to N, rather than from 1 to N, to incorporate the species with abundance zero, and the probabilities are not normalized by 1 − p(0). An analogous approach could be adopted for the NB, estimating m, k and S. In addition to requiring estimation of an additional parameter (which we might expect to increase the sampling variance of parameter estimates), this approach has the disadvantage of incorporating hypothetical data (i.e. the unobserved species) into the likelihood. The incorporation of unobserved species is potentially problematic for model selection: if two competing models estimate different numbers of unobserved species, the one that estimates the most unobserved species will tend to have a smaller likelihood, even if the two models fit the observed part of the distribution equally well (see  Appendix).

Because of the growing interest in fitting species abundance models to data, we here comparatively evaluate the statistical performance of different approaches to fitting the PLN and NB to species abundance data. First, we propose an alternative parameterization of the PLN and NB, which includes species richness as an estimated parameter, but which has only two fitted parameters, and for which unobserved species do not contribute to the likelihood. We then conduct a simulation study to assess error of parameter estimates using this new likelihood, as well as the classical (equation 4), conditional (equation 5) and three-parameter (equation 7) likelihoods for both these distributions. We also evaluate how differences in bias and sampling variance of parameter estimates contribute to these differences. In particular, we are interested in whether the failure to condition on sample size increases the error associated with the likelihoods that omit this normalization and on whether a reparameterization of the zero-truncated likelihoods provides a satisfactory alternative to the three-parameter likelihood of equation (7) when direct estimation of S is desired. Finally, we identify regression models that explain variation in estimate error among a broad range of parameter sets in terms of functions of those parameters.

APPROXIMATE TWO-PARAMETER LIKELIHOOD WITH SPECIES RICHNESS

The first moment of a PLN distribution is formula (Shaban 1988). Consequently, if the true number of species in the community is S, then the expected total number of individuals, given μ and σ, is
(8)
We can solve this to express μ as a function of S and σ, if we substitute the observed sample size, N, for the expected sample size:
(9a)
(9b)

By substituting the right-hand side of equation (9b) for μ in equation (2), the classical (zero-truncated) PLN is reparameterized as a function of the two community-level parameters, σ and S, rather than μ, which depends on sampling intensity. Note that this is a kind of hybrid moment–maximum likelihood approach because the observed total sample size N has been used in place of E(N) in equation (9a).

We can reparameterize the NB in similar fashion by expressing the mean, m, in terms of S and N:
(10)

Thus, S can be explicitly estimated without hypothetical, unobserved species contributing to the likelihood. Also, in contrast to the three-parameter formulation, there are only two fitted parameters (S and the shape parameter σ or k).

METHODS

To evaluate the bias and precision of the parameter estimates produced by the four alternative likelihoods of the PLN and NB, we conduct parametric bootstrap simulations. That is, we simulate data using a set of ‘true’ parameter values, and we compare the estimated values of our parameters with these true values. The data are simulated using a previously developed ‘hypergeometric’ bootstrapping algorithm, in which a fixed number of individuals N is sampled from an underlying relative abundance distribution of Strue species that is either lognormal with shape parameter σtrue or gamma with shape parameter ktrue (see Connolly et al. 2009 for a detailed description).

Bootstrap simulations were conducted for a range of parameter values designed to capture much of the realistic range for species abundance data likely to be fitted with species-abundance models (Fig. 1). We consider sample sizes N of 100, 1 000 and 10 000 because this spans the orders of magnitude of sample size exhibited by most species abundance data sets, at least for macroscopic organisms. For example, the largest samples from individual locations (e.g. Dornelas and Connolly 2008; Volkov et al. 2007) are on the order of tens of thousands of individuals. We consider Strue values of 20, 100 and 500.

illustration of the expected abundance distributions associated with the parameter sets explored in this paper. The left column of panels shows the PLN and the right column the NB. In each panel, color indicates the shape parameter, and line type indicates the sample size (in number of individuals), for species richness levels of (a, b) S = 20, (c, d) S = 100 and (e, f) S = 500.
Figure 1:

illustration of the expected abundance distributions associated with the parameter sets explored in this paper. The left column of panels shows the PLN and the right column the NB. In each panel, color indicates the shape parameter, and line type indicates the sample size (in number of individuals), for species richness levels of (a, b) S = 20, (c, d) S = 100 and (e, f) S = 500.

For the PLN, we consider σtrue values of 1, 2 and 3 for the standard deviation (SD) of log abundance. Lande et al. (2003) report σ = 2 as a typical value for natural communities. Moreover, analyses of ensembles of species abundance distributions suggest that estimates very rarely fall below 1 or above 3. For instance, in Connolly et al. (2005), all but one local coral and fish abundance distribution had 1 < σ < 3 (also see Engen et al. (2002, 2008) for insects and Gaston and Blackburn (2000) for birds). From equation (8), these 27 parameter combinations yield estimates of the sampling intensity parameter μtrue that range from about −6 to 6. We excluded one combination of parameter values (N = 100, Strue = 20, σtrue = 3) because the expected number of observed species (i.e. those with abundance >0) was <10. We think it unlikely that anyone would seek to fit a species-abundance model to such a dataset at least in isolation.

For the NB, we wished to explore small values of the shape parameter, k, which produce log series-like distributions (i.e. distributions without a mode on a log scale, k < 1); as well as cases with k > 1, which produce distributions of log abundance with an internal mode. Here, we choose ktrue values of 0.02, 0.2, 2 and 20. Ideally, we would have liked to explore even smaller values of ktrue. In particular, the log series represents a limiting case of the NB for which k → 0 and fitted estimates of the NB often yield values of k that are extremely small (Connolly et al. 2009; Volkov et al. 2007). However, for very small values of k, p(0) tends to unity (see equation 3). This tends to cause significant problems with convergence of optimization algorithms. These problems are manageable when fitting one or a few data sets but not when tens of thousands of fits are involved, as in the present study. In addition, several parameter combinations led to expected numbers of observed species <10; as with the PLN, these parameter combinations were excluded. These included all cases with k = 0.02 and S = 20, as well as the combination N = 100, k = 0.02 and S = 100.

For all simulated data sets, parameters were estimated by numerical maximization of the model’s log-likelihood, using the optimization function nlminb() in R (R Core Development Team 2011). Inspection of likelihood surfaces suggested that a commonly used alternative, optim(), tended to falsely report successful convergence for abundance distributions with small μ for the PLN, or small or large k for the NB, too frequently to yield reliable sampling distributions of parameter estimates. For the NB, we used the function dnbinom() to calculate p(n) in equation (3). For the PLN, convergence problems were frequently encountered when using the R function integrate() to evaluate equation (2). Therefore, we used Bulmer’s (1974) closed-form approximation for p(n), for n ≥ 25:
(11)

Using a smaller cutoff abundance value (e.g. n > 10, as recommended by Bulmer 1974) increased the prevalence of convergence problems. Conversely, increasing the threshold further (e.g. to n = 100) did not measurably improve either convergence or sampling distributions of parameter estimates. To calculate p(n), we used the function dpolono() in package VGAM (Yee 2010), which allows user specification of the cutoff abundance value from which to apply equation (11). For the constrained PLN and NB likelihoods, we applied a fast Fourier transform to approximate the quantity in the denominator of equation (5) (cf. Etienne and Olff 2005), using the R function fft(). Note that, in fitting the three-parameter likelihood, we convert factorials to gamma functions, in order to estimate S as a real number (rather than an integer, as in Izsak 2008): this helps maintain comparability with the other likelihoods, for which S is a derived parameter and can take non-integer values.

We used the frequency distributions of estimates of the natural logarithm of the community-level parameters of the model (S^ and the shape parameters σ^ or k^) from the fits to simulated data to evaluate the performance of each of the different likelihood approaches. For the classical and conditional likelihoods, we calculated, from the other fitted parameters (equation 6). In general, we found that these frequency distributions were not consistently skewed, but they were frequently leptokurtic, with a large number of outliers. Therefore, we quantify estimator performance using absolute median error, rather than mean-squared error:
(12)
where θ^i is the estimate of parameter θ (log(σ), log(S) or log(k)) for simulated data set i and θtrue is the value of the parameter that was actually used to simulate the data. Similarly, we quantify bias using the median:
(13)

For the same reason, we compared sampling variability in parameter estimates based on the interquartile range (IQR), rather than the SD, of parameter estimates (see, e.g. Cabrera and Watson 1997).

To assess whether estimators were reasonably statistically well behaved (i.e. biases were small relative to their stochastic variability), we normalized MB to IQR. To compare the performance of the different likelihoods, we analysed their absolute median errors across all the parameter sets examined, using paired Wilcoxon signed-ranks tests (with Bonferroni corrections), along with graphical inspection. We inspect median bias and IQR to understand the extent to which systematic differences between likelihood methods in absolute median error are due to differences in bias or sampling variance. We also examine absolute median error as functions of the parameters used to generate the data, in order to identify the characteristics of species abundance distributions that are likely to be associated with reliable or unreliable parameter estimates. Finally, we develop a regression model to better understand what drives variation in absolute median error among our true parameter sets.

RESULTS

Summary statistics

Box plots of parameter estimates (on a natural logarithm scale and normalized to the true parameter values) are shown in Supplementary DataSupplementary Data (see online supplementary material). Median biases are shown in Supplementary Data and Supplementary Data (see online supplementary material), IQR in Supplementary Data and Supplementary Data (see online supplementary material) and absolute median error in Supplementary Data and Supplementary Data (see online supplementary material). Convergence information is reported in Supplementary Data (see online supplementary material).

Comparison of methods: bias and variance

Frequency distributions of the median bias, relative to IQR, across parameter sets, indicated that parameter estimates were well behaved in most cases (Fig. 2). In particular, for the PLN, median biases were always less than IQR, with the exception of two parameter sets for the reparameterized model (Fig. 2a and b). These corresponded to (N = 1 000 and N = 10 000, σtrue = 3, Strue = 100), for which S^ was more strongly negatively biased than it was for the other likelihoods (Supplementary Data, see online supplementary material). For the NB, the majority of parameter sets also produced relatively well-behaved estimates, except when ktrue = 0.02 (Fig. 2c and d). In these cases, for both parameters, there were four cases for the three-parameter likelihood and one for the constrained likelihood, for which bias exceeded IQR for one or both parameters. For the three-parameter likelihood, these cases all involved large positive biases in k, and negative biases in S, whereas the opposite was the case for the constrained likelihood (Supplementary Data and Supplementary Data, see online supplementary material).

median bias, expressed as a proportion of IQR, for (a) log(σ) and (b) log(S) of the PLN and (c) log(k) and (d) log(S) of the NB. Each panel shows the frequency distribution (across parameter sets) of the ratio of median bias to IQR, for each of the four likelihoods considered in this paper.
Figure 2:

median bias, expressed as a proportion of IQR, for (a) log(σ) and (b) log(S) of the PLN and (c) log(k) and (d) log(S) of the NB. Each panel shows the frequency distribution (across parameter sets) of the ratio of median bias to IQR, for each of the four likelihoods considered in this paper.

For the NB, parameter estimates also failed to converge in a large proportion of simulations for three parameter sets with N = 100: ktrue = 20 and Strue = 100, ktrue = 2 and Strue = 500 and ktrue = 20 and Strue = 500 (Supplementary Data, see online supplementary material). In these cases, K^ sometimes wandered off towards very large positive values, indicating that the simulated data were close to a Poisson distribution. For these cases, IQRs in k were very large for the classical, reparameterized and constrained likelihoods but not the three-parameter likelihood (Supplementary Data, see online supplementary material).

Comparison of methods: absolute median error

For the PLN, the classical likelihood performed best, exhibiting low absolute median error for both σ and S (Fig. 3; Supplementary Data, see online supplementary material). For σ, the constrained likelihood performed worst, with absolute median error consistently and significantly higher than those of the other likelihoods. Conversely, there was no clear best-performing likelihood, with the classical and reparameterized likelihoods exhibiting similar levels of absolute median error, and the three-parameter likelihood only marginally worse. Poor performance of the constrained likelihood was due to its larger IQR than the alternatives, rather than larger bias (Supplementary Data, see online supplementary material). For S, the classical likelihood had significantly lower median error than all the alternatives, while the other three likelihoods did not exhibit significant differences in median error (Fig. 3; Supplementary Data, see online supplementary material). The good performance of the classical likelihood was due mainly to its lower bias in S, compared to the reparameterized and three-parameter likelihoods (Supplementary Data, see online supplementary material), but to its lower IQR, compared to the constrained likelihood (Supplementary Data, see online supplementary material).

comparison of absolute median error in PLN parameter estimates across the four likelihoods. Median error in log(σ) is shown in the upper left half of the figure; median error in log(S) is shown in the bottom right. In each panel, the coordinates of each point represent the absolute median error (across simulations) of the two likelihoods being compared, for one of the parameter sets used to simulate the data. In each panel, these median errors were recalculated using only those simulated datasets for which fits converged for both of the likelihoods being compared. The solid line is the unity line. Thus, points above and left of the unity line indicate that the method plotted on the vertical axis exhibits greater median error than that on the horizontal axis and vice versa for points below and right of the unity line. P values report the results of a Wilcoxon signed-ranks test for paired observations. P values indicating statistically significant differences between methods (after Bonferroni correction: critical α = 0.083) are in bold.
Figure 3:

comparison of absolute median error in PLN parameter estimates across the four likelihoods. Median error in log(σ) is shown in the upper left half of the figure; median error in log(S) is shown in the bottom right. In each panel, the coordinates of each point represent the absolute median error (across simulations) of the two likelihoods being compared, for one of the parameter sets used to simulate the data. In each panel, these median errors were recalculated using only those simulated datasets for which fits converged for both of the likelihoods being compared. The solid line is the unity line. Thus, points above and left of the unity line indicate that the method plotted on the vertical axis exhibits greater median error than that on the horizontal axis and vice versa for points below and right of the unity line. P values report the results of a Wilcoxon signed-ranks test for paired observations. P values indicating statistically significant differences between methods (after Bonferroni correction: critical α = 0.083) are in bold.

For the NB, the four likelihoods all exhibited very similar levels of median error. There was some suggestion of greater error in k for the three-parameter likelihood, relative to the classical and reparameterized likelihoods, but this was statistically significant only for the reparameterized likelihood. For S, there was some suggestion of lower error overall for the constrained likelihood, although this was statistically significant only in comparison to the reparameterized model. However, there were two cases for which the constrained likelihood performed particularly poorly (Fig. 4; Supplementary Data6, see online supplementary material), due to a combination of high bias and high sampling variance (Supplementary Data, see online supplementary material). These cases corresponded to the parameter combinations with the largest expected number of unobserved species (>100; parameter sets N = 100 and N = 1 000, k = 0.02, S = 500).

comparison of absolute median error in NB parameter estimates across the four likelihoods, plotted in the same way as in Fig. 3. Median error in log(k) is shown in the upper left half of the figure; median error in log(S) is shown in the bottom right.
Figure 4:

comparison of absolute median error in NB parameter estimates across the four likelihoods, plotted in the same way as in Fig. 3. Median error in log(k) is shown in the upper left half of the figure; median error in log(S) is shown in the bottom right.

Determinants of median error

Patterns of variation in median error as a function of sample size, species richness and shape parameter were qualitatively identical across methods for both the PLN and NB. Therefore, we illustrate here in the main text the patterns using the classical likelihood (Fig. 5). For the PLN, relatively low levels of error (∼20% of the true parameter value or less) for both parameters occurred when one of two conditions were met: either N = 10 000, or N = 1 000 and σtrue < 3 (Fig. 5a and b). For both parameters, median error is an increasing function of σtrue and a decreasing function of N. For σ, when N = 1 000 or 10 000, error decreased with increasing Strue. Conversely, for S, error increased with Strue. For the NB, error tended to be greater overall than for the PLN, particularly when ktrue < 1 (Fig. 5c and d). Low levels of median error (∼20% or smaller) in both parameters occurred only for simulations with ktrue > 1 and N > 100, and ktrue = 0.2, N = 10 000. For k, error decreased as N increased. In general, error also decreased to a minimum at ktrue = 2, except when sample size was small and the number of species was large. In this case, sampling variances were extremely large. Error also tended to increase with Strue when ktrue > 1 or N = 100. For S, the patterns were similar. However, error remained low as ktrue increased from 2 to 20, rather than increasing. Also, the biases tended to be smaller than for k, particularly the bias in S for the small N, high Strue cases with ktrue = 2 and ktrue = 20, for which the error was large, but not outrageously so.

absolute median error of (a) log(σ) and (b) log(S) for the PLN and (c) log(k) and (d) log(S) of the NB, for each of the parameter combinations used. Values of the relevant shape parameter are shown on the horizontal axis. Other parameter values are as follows: N = 100 (black), N = 1 000 (red), N = 10 000 (green), Strue = 20 (circles), Strue = 100 (upward-pointing triangles) and Strue = 500 (downward pointing triangles).
Figure 5:

absolute median error of (a) log(σ) and (b) log(S) for the PLN and (c) log(k) and (d) log(S) of the NB, for each of the parameter combinations used. Values of the relevant shape parameter are shown on the horizontal axis. Other parameter values are as follows: N = 100 (black), N = 1 000 (red), N = 10 000 (green), Strue = 20 (circles), Strue = 100 (upward-pointing triangles) and Strue = 500 (downward pointing triangles).

Examination of the patterns in Fig. 5 led us to consider a regression model, in which variation in median error depends upon three quantities: the incompleteness of the distribution (here measured as p0, the expected fraction of species not observed in the sample), the expected number of observed species, E(Sobs), and the extent to which the distribution departs from a Poisson distribution. As a measure of the latter, we simply summed the absolute differences in p(n) between PLN or NB and a Poisson distribution with the same mean (i.e. we calculated one minus the percentage overlap in the two distributions). A regression model incorporating these three explanatory variables explained >90% of the variation in median error among parameter sets for both the PLN and NB. The results of this regression using the classical likelihood are reported in Table 1 and Fig. 6, but similar results were obtained using the other likelihoods.

Table 1:

regressions of absolute median error in the fitted parameters of the PLN and NB

ModelAbsolute median error inExplanatory variablesCoefficientsStandard errorPR2
PLNlog σp(0)0.9830.0979 × 10−10*0.97
dist.pois−0.9880.1781 × 10−5*
log[E(Sobs)]−0.4830.0242 × 10−15*
Intercept−0.0060.1740.974
log Sp(0)0.8620.0512 × 10−14*0.93
log[E(Sobs)]−0.0520.0140.0014*
Intercept0.2540.0640.0006*
NBlog kp(0)2.6320.1314 × 10−18*0.97
log(dist.pois)−0.5130.0346 × 10−15*
log[E(Sobs)]−0.4900.0423 × 10−12*
Intercept−0.2420.1810.192
log Sp(0)1.2150.0641 × 10−17*0.94
log(dist.pois)0.0480.0160.007*
log[E(Sobs)]−0.0480.0200.025*
Intercept0.2730.0880.004*
ModelAbsolute median error inExplanatory variablesCoefficientsStandard errorPR2
PLNlog σp(0)0.9830.0979 × 10−10*0.97
dist.pois−0.9880.1781 × 10−5*
log[E(Sobs)]−0.4830.0242 × 10−15*
Intercept−0.0060.1740.974
log Sp(0)0.8620.0512 × 10−14*0.93
log[E(Sobs)]−0.0520.0140.0014*
Intercept0.2540.0640.0006*
NBlog kp(0)2.6320.1314 × 10−18*0.97
log(dist.pois)−0.5130.0346 × 10−15*
log[E(Sobs)]−0.4900.0423 × 10−12*
Intercept−0.2420.1810.192
log Sp(0)1.2150.0641 × 10−17*0.94
log(dist.pois)0.0480.0160.007*
log[E(Sobs)]−0.0480.0200.025*
Intercept0.2730.0880.004*

Absolute median error has been log-transformed for the shape parameters σ and k and square-root transformed for species richness S. p(0) is the probability that a species is not observed in the species-abundance sample. E(Sobs) is the expected number of observed species in the sample and ‘dist.pois’ is one minus the percent overlap of the true NB or PLN distribution and a Poisson distribution with the same mean.

*

P values are significant with a confidence level of α = 0.05.

Table 1:

regressions of absolute median error in the fitted parameters of the PLN and NB

ModelAbsolute median error inExplanatory variablesCoefficientsStandard errorPR2
PLNlog σp(0)0.9830.0979 × 10−10*0.97
dist.pois−0.9880.1781 × 10−5*
log[E(Sobs)]−0.4830.0242 × 10−15*
Intercept−0.0060.1740.974
log Sp(0)0.8620.0512 × 10−14*0.93
log[E(Sobs)]−0.0520.0140.0014*
Intercept0.2540.0640.0006*
NBlog kp(0)2.6320.1314 × 10−18*0.97
log(dist.pois)−0.5130.0346 × 10−15*
log[E(Sobs)]−0.4900.0423 × 10−12*
Intercept−0.2420.1810.192
log Sp(0)1.2150.0641 × 10−17*0.94
log(dist.pois)0.0480.0160.007*
log[E(Sobs)]−0.0480.0200.025*
Intercept0.2730.0880.004*
ModelAbsolute median error inExplanatory variablesCoefficientsStandard errorPR2
PLNlog σp(0)0.9830.0979 × 10−10*0.97
dist.pois−0.9880.1781 × 10−5*
log[E(Sobs)]−0.4830.0242 × 10−15*
Intercept−0.0060.1740.974
log Sp(0)0.8620.0512 × 10−14*0.93
log[E(Sobs)]−0.0520.0140.0014*
Intercept0.2540.0640.0006*
NBlog kp(0)2.6320.1314 × 10−18*0.97
log(dist.pois)−0.5130.0346 × 10−15*
log[E(Sobs)]−0.4900.0423 × 10−12*
Intercept−0.2420.1810.192
log Sp(0)1.2150.0641 × 10−17*0.94
log(dist.pois)0.0480.0160.007*
log[E(Sobs)]−0.0480.0200.025*
Intercept0.2730.0880.004*

Absolute median error has been log-transformed for the shape parameters σ and k and square-root transformed for species richness S. p(0) is the probability that a species is not observed in the species-abundance sample. E(Sobs) is the expected number of observed species in the sample and ‘dist.pois’ is one minus the percent overlap of the true NB or PLN distribution and a Poisson distribution with the same mean.

*

P values are significant with a confidence level of α = 0.05.

observed versus predicted median error of (a) log(σ) and (b) log(S) for the PLN and (c) log(k) and (d) log(S) of the NB, illustrating the fit of the regression model from Table 1. The solid line is the unity line, and each point represents one of the parameter combinations used to simulate the data. Note that absolute median error is log-transfomed for the shape parameters σ and k and square root transformed for species richness S.
Figure 6:

observed versus predicted median error of (a) log(σ) and (b) log(S) for the PLN and (c) log(k) and (d) log(S) of the NB, illustrating the fit of the regression model from Table 1. The solid line is the unity line, and each point represents one of the parameter combinations used to simulate the data. Note that absolute median error is log-transfomed for the shape parameters σ and k and square root transformed for species richness S.

DISCUSSION

For most parameter sets, all the likelihoods perform similarly well, exhibiting relatively low median error and low bias relative to sampling variance. However, for the NB in particular, median error can be extremely large either when k is very small (k = 0.02 in this study) or when N is small (N = 100 in this study; Fig. 5), patterns that are reflected in very high median biases (Supplementary Data, see online supplementary material). While these biases were particularly large for estimates of the shape parameter k, they were also large for the parameter S, at least when k was small. Because fits of the NB to species-abundance data frequently suggest log series-like (i.e. k << 1) distributions (Connolly et al. 2009; Volkov et al. 2007), this suggests that the traditional maximum likelihood approach of fitting the NB to a single species-abundance distribution is problematic.

Our regression models facilitate an explanation of what factors drive the variation in estimator error that we found in this study. Our three explanatory variables were the probability of a species having abundance zero, the expected number of observed species, and the difference between the abundance distribution and the closest Poisson distribution. p0 is an inverse measure of the ‘completeness’ of the expected abundance distribution, as a representation of the true underlying lognormal or gamma relative abundance distribution. The lower p0 is, the more ‘unveiled’ that underlying distribution is (sensuPreston 1948). The expected number of observed species is a measure of how complete a picture of this observable part of the abundance distribution the sample will constitute (i.e. how ‘filled in’ the expected abundance distribution will be with observed species abundances). Thus, these two quantities together indicate how effectively the observed data reveal the shape of the underlying community abundance distribution. However, each of our models is derived from assuming Poisson sampling from an underlying gamma or lognormal pattern of relative abundances. For some parameter values (e.g. low μ, low σtrue for the PLN; low m, high ktrue for the NB), the Poisson sampling component dominates the variability in observed species abundances, and the overlap between the NB or PLN and the Poisson distribution is high. In such cases, the shape of the sample abundance distribution is less sensitive to the parameters of the underlying gamma or lognormal distribution, and median error is greater. The predictability of absolute median error suggests that a similar set of explanatory variables may be used to develop a comprehensive bias correction technique for the PLN and NB distributions (see Cabrera and Watson 1997 for a similar approach).

Comparisons of the performance of the different likelihoods suggest that, on balance, considering both the PLN and NB, the classical formulation of the likelihood performs better than the alternatives, even though S is a derived rather than fitted parameter in that formulation. It was the only likelihood for which median bias was always less than IQR (Fig. 2). For the PLN, the classical likelihood also yields the lowest error in estimates of σ and equal lowest error in estimation of S. For the NB, there was little to differentiate the likelihoods: the three-parameter likelihood was marginally worse than the classical and reparameterized likelihoods for k, whereas the constrained likelihood was marginally better than the non-constrained likelihoods for S. Only one of these differences was statistically significant, however (Fig. 4).

Our comparisons provide a direct test of whether the statistical performance of parameter estimates for the classical, reparameterized and three-parameter likelihoods are biased by their failure to account for the fact that the potential shapes that a sample abundance distribution can take are constrained by the total number of individuals in the sample. Previous work has investigated this possibility indirectly, by comparing parameter bias for unconstrained likelihoods fitted to two different types of simulated data: data for which the total number of individuals sampled is fixed (as in the simulations used here) and data for which each species’ abundance is an independent Poisson random variable (Connolly and Dornelas 2011; Connolly et al. 2009). The results in this paper are consistent with the conclusions drawn from the earlier simulation studies: biases do not appear to be systematically greater when constraints on total sample size are ignored by the likelihood.

Indeed, contrary to our expectation, the constrained likelihood actually performed worse than the unconstrained likelihoods for the PLN. It is not immediately obvious why consistently larger IQR in σ would result from the normalization that differentiates the constrained likelihood from the other candidates examined in this paper. It may well be that the poorer performance of the constrained likelihood arises from numerical error, rather than the likelihood per se. For example, calculating the normalization constant for the PLN requires numerical evaluation of the integral in equation (2) up to n = 25 for a large range of values of p(n). Thus, any numerical approximation errors would be amplified, relative to the other likelihoods. The fact that the constrained likelihood did not perform worse than the alternatives for the NB would be consistent with such an explanation (assuming that numerical approximation error associated with evaluating the gamma functions in the NB likelihood are small by comparison). Regardless, however, there is certainly limited evidence that the lack of a constraint on total community size in the classical, reparameterized, or three-parameter likelihoods leads to materially poorer parameter estimates.

Our results do suggest that the reparameterized likelihood provides a viable alternative to the three-parameter likelihood, when the direct estimation of total species richness is desired. Indeed, there is some support for using this likelihood in preference to the three-parameter likelihood: the reparameterized likelihood exhibits significantly lower median error in k (but not S) for the NB, and there is a suggestion that it also exhibits lower median error in σ for the PLN (this difference is only marginally non-significant after a Bonferroni correction). We had expected the additional parameter used in the three-parameter likelihood to lead to greater sampling variance, compared to two-parameter likelihoods. Indeed, there does appear to be slightly greater IQR for the three-parameter model for k and σ (Supplementary Data, see online supplementary material), but lower bias also appears to contribute to the better performance of the reparameterized likelihood (Supplementary Data, see online supplementary material).

Because the three-parameter likelihood, all other things being equal, will tend to be a decreasing function of the predicted number of unobserved species ( Appendix), we conjectured that the three-parameter likelihoods would be a poor choice when model selection is one of the goals of analysis, and the competing models predict different numbers of unobserved species. Preliminary support for this conjecture becomes apparent when we fit both the NB and PLN to data simulated with the NB hypergeometric bootstrap and then compare model selection results based on the reparameterized versus the three-parameter likelihood (Table 2). In this case, the true NB distribution consistently estimates a larger proportion of unobserved species, and indeed, this does appear to impair model selection by the three-parameter likelihood. In 31 of 32 parameter sets, the median log-likelihood ratio comparing the NB and PLN fits is greater when model selection is based on the reparameterized likelihood than when it is based on the three-parameter likelihood. In other words, the reparameterized approach favors the true NB distribution more strongly than the three-parameter likelihood (Table 2). Moreover, across parameter sets, the magnitude of the underperformance of the three-parameter likelihood is correlated with the difference in formula estimated by the NB and the PLN. Indeed, if we exclude those parameter sets where formula for both the PLN and NB (in such cases, the sampling variance of the difference in formula is very large, relative to its magnitude), the difference in formula explains almost all the variation in the relative model selection performance of the two likelihoods (Spearman’s ρ = 0.96, P < 0.0001).

Table 2:

model selection results for data simulated with the hypergeometric NB bootstrap

True parameters
median(ℒNB / ℒPLN)
median[p(0)^]
NkSReparThreeparPerformance differenceNBPLN
1000.2200.29−0.060.350.1600.073
1 0000.2200.910.130.780.1340.033
10 0000.2201.840.461.390.0900.017
1002200.120.000.120.0400.030
1 0002200.570.520.050.001<0.001
10 0002200.610.590.02<0.001<0.001
10020200.000.000.000.0070.009
1 0002020−0.05−0.090.04<0.001<0.001
10 00020200.050.050.00<0.001<0.001
1 0000.021000.74−0.100.840.3280.077
10 0000.021001.440.041.390.4080.060
1000.21000.300.060.240.4790.292
1 0000.21001.671.050.620.3800.117
10 0000.21004.852.911.940.2410.047
10021000.030.000.030.3790.371
1 00021001.140.750.390.0240.011
10 00021003.343.210.14<0.001<0.001
100201000.000.000.000.3750.348
1 000201000.030.030.00<0.001<0.001
10 000201000.210.210.00<0.001<0.001
1000.025000.27−0.020.290.6020.304
1 0000.025001.160.540.620.7480.195
10 0000.025003.571.741.830.7420.139
1000.25000.060.000.060.7410.617
1 0000.25002.271.830.440.5870.256
10 0000.250011.009.411.590.3820.101
10025000.000.000.000.8040.757
1 00025000.500.400.100.2440.200
10 00025009.038.250.780.0080.002
100205000.000.000.000.8540.832
1 000205000.000.000.000.1460.148
10 00020500−13.96−13.87−0.08<0.001<0.001
True parameters
median(ℒNB / ℒPLN)
median[p(0)^]
NkSReparThreeparPerformance differenceNBPLN
1000.2200.29−0.060.350.1600.073
1 0000.2200.910.130.780.1340.033
10 0000.2201.840.461.390.0900.017
1002200.120.000.120.0400.030
1 0002200.570.520.050.001<0.001
10 0002200.610.590.02<0.001<0.001
10020200.000.000.000.0070.009
1 0002020−0.05−0.090.04<0.001<0.001
10 00020200.050.050.00<0.001<0.001
1 0000.021000.74−0.100.840.3280.077
10 0000.021001.440.041.390.4080.060
1000.21000.300.060.240.4790.292
1 0000.21001.671.050.620.3800.117
10 0000.21004.852.911.940.2410.047
10021000.030.000.030.3790.371
1 00021001.140.750.390.0240.011
10 00021003.343.210.14<0.001<0.001
100201000.000.000.000.3750.348
1 000201000.030.030.00<0.001<0.001
10 000201000.210.210.00<0.001<0.001
1000.025000.27−0.020.290.6020.304
1 0000.025001.160.540.620.7480.195
10 0000.025003.571.741.830.7420.139
1000.25000.060.000.060.7410.617
1 0000.25002.271.830.440.5870.256
10 0000.250011.009.411.590.3820.101
10025000.000.000.000.8040.757
1 00025000.500.400.100.2440.200
10 00025009.038.250.780.0080.002
100205000.000.000.000.8540.832
1 000205000.000.000.000.1460.148
10 00020500−13.96−13.87−0.08<0.001<0.001

The NB and PLN were both fitted to all simulated data sets, using the reparameterized (Repar) and three-parameter (Threepar) likelihoods. Then, model selection performance was assessed by calculating, for each simulated dataset, the ratio of the NB maximum log-likelihood to that of the PLN (so that larger positive values indicate stronger support for the true NB distribution over the PLN). This was done once using the reparameterized likelihood and once using the three-parameter likelihood. Performance difference is the difference between the log-likelihood ratios obtained using these two approaches (positive values indicate that model selection based on the reparameterized likelihood supports the true model more strongly than model selection based on the three-parameter likelihood). The final two columns report the estimated fraction of unobserved species, calculated from the maximum likelihood estimates for the three-parameter NB and PLN likelihoods. All medians are calculated across the simulated data sets for a given combination of true parameter values.

Table 2:

model selection results for data simulated with the hypergeometric NB bootstrap

True parameters
median(ℒNB / ℒPLN)
median[p(0)^]
NkSReparThreeparPerformance differenceNBPLN
1000.2200.29−0.060.350.1600.073
1 0000.2200.910.130.780.1340.033
10 0000.2201.840.461.390.0900.017
1002200.120.000.120.0400.030
1 0002200.570.520.050.001<0.001
10 0002200.610.590.02<0.001<0.001
10020200.000.000.000.0070.009
1 0002020−0.05−0.090.04<0.001<0.001
10 00020200.050.050.00<0.001<0.001
1 0000.021000.74−0.100.840.3280.077
10 0000.021001.440.041.390.4080.060
1000.21000.300.060.240.4790.292
1 0000.21001.671.050.620.3800.117
10 0000.21004.852.911.940.2410.047
10021000.030.000.030.3790.371
1 00021001.140.750.390.0240.011
10 00021003.343.210.14<0.001<0.001
100201000.000.000.000.3750.348
1 000201000.030.030.00<0.001<0.001
10 000201000.210.210.00<0.001<0.001
1000.025000.27−0.020.290.6020.304
1 0000.025001.160.540.620.7480.195
10 0000.025003.571.741.830.7420.139
1000.25000.060.000.060.7410.617
1 0000.25002.271.830.440.5870.256
10 0000.250011.009.411.590.3820.101
10025000.000.000.000.8040.757
1 00025000.500.400.100.2440.200
10 00025009.038.250.780.0080.002
100205000.000.000.000.8540.832
1 000205000.000.000.000.1460.148
10 00020500−13.96−13.87−0.08<0.001<0.001
True parameters
median(ℒNB / ℒPLN)
median[p(0)^]
NkSReparThreeparPerformance differenceNBPLN
1000.2200.29−0.060.350.1600.073
1 0000.2200.910.130.780.1340.033
10 0000.2201.840.461.390.0900.017
1002200.120.000.120.0400.030
1 0002200.570.520.050.001<0.001
10 0002200.610.590.02<0.001<0.001
10020200.000.000.000.0070.009
1 0002020−0.05−0.090.04<0.001<0.001
10 00020200.050.050.00<0.001<0.001
1 0000.021000.74−0.100.840.3280.077
10 0000.021001.440.041.390.4080.060
1000.21000.300.060.240.4790.292
1 0000.21001.671.050.620.3800.117
10 0000.21004.852.911.940.2410.047
10021000.030.000.030.3790.371
1 00021001.140.750.390.0240.011
10 00021003.343.210.14<0.001<0.001
100201000.000.000.000.3750.348
1 000201000.030.030.00<0.001<0.001
10 000201000.210.210.00<0.001<0.001
1000.025000.27−0.020.290.6020.304
1 0000.025001.160.540.620.7480.195
10 0000.025003.571.741.830.7420.139
1000.25000.060.000.060.7410.617
1 0000.25002.271.830.440.5870.256
10 0000.250011.009.411.590.3820.101
10025000.000.000.000.8040.757
1 00025000.500.400.100.2440.200
10 00025009.038.250.780.0080.002
100205000.000.000.000.8540.832
1 000205000.000.000.000.1460.148
10 00020500−13.96−13.87−0.08<0.001<0.001

The NB and PLN were both fitted to all simulated data sets, using the reparameterized (Repar) and three-parameter (Threepar) likelihoods. Then, model selection performance was assessed by calculating, for each simulated dataset, the ratio of the NB maximum log-likelihood to that of the PLN (so that larger positive values indicate stronger support for the true NB distribution over the PLN). This was done once using the reparameterized likelihood and once using the three-parameter likelihood. Performance difference is the difference between the log-likelihood ratios obtained using these two approaches (positive values indicate that model selection based on the reparameterized likelihood supports the true model more strongly than model selection based on the three-parameter likelihood). The final two columns report the estimated fraction of unobserved species, calculated from the maximum likelihood estimates for the three-parameter NB and PLN likelihoods. All medians are calculated across the simulated data sets for a given combination of true parameter values.

Although biases in parameter estimates tended to be less than the IQR, especially for the classical likelihoods (Fig. 2), there were some parameter combinations where both biases and sampling variances were large, and thus, median error in parameter estimates was high. This was particularly the case for the NB distribution when sample size was small, species richness was large and the shape of the observed abundance distribution was relatively insensitive to changes in k (i.e. k << 1, where distributions approach the log series, and k >> 1, where distributions approach the Poisson. If the goal is simply to fit the observed species-abundance distribution, this high median error is relatively unimportant because the shape of the distribution is relatively insensitive to changes in parameter values in this range. However, because it is also possible to estimate species richness or infer demographic parameters from species abundance (e.g. k is the immigration rate in the neutral model of Volkov et al. 2007), some exploration of alternative approaches is needed in future work. We suspect that the most productive avenues will include the integration of multiple sources of information. For instance, if there are multiple local species-abundance samples from the same community, it is possible that better parameter estimates can be obtained by fitting multivariate species abundances (e.g. Etienne 2009). Alternatively, if independent sources of information about NB parameters are available (e.g. estimates of S from presence-absence data or species range maps), then this information may be incorporated within a Bayesian framework.

We conclude with several recommendations for species abundance analysis using the PLN and NB. Firstly, when the goal is simply to fit a distribution and draw inferences from the parameters, the classical formulation of the PLN likelihood appears to perform best overall, even though S is not estimated directly. Secondly, conditioning the likelihood on the total number of individuals in the sample appears to have a relatively small effect on the quality of parameter estimation. There is certainly no evidence of consistently greater median error in parameter estimates when this conditioning is not used; indeed, error in σ is larger for the constrained likelihood. Thus, the choice of whether or not to condition on the total number of sampled individuals should probably be determined by the purpose of analysis. For example, if a goal of analysis is model selection, and the alternative model is conditional upon a particular number of individuals (as in most neutral models), then conditioning the PLN or NB in similar fashion will ensure that likelihoods are comparable (Etienne and Olff 2005). Finally, the reparameterized (approximate) likelihood works as well as, or (in the case of the NB, for estimates of k) better than, the three-parameter likelihood. Thus, when direct estimates of S are required, and model selection is intended, the reparameterized formulation may be preferable.

SUPPLEMENTARY MATERIAL

Supplementary Figures S1Supplementary Data and Supplementary Tables S1Supplementary Data are available at Journal of Plant Ecology online.

FUNDING

Australian Research Council (grants CE0561435 and DP0880544) and James Cook University.

We thank F. He and Sun Yat Sen University for supporting SRC’s participation in the International Symposium in Biodiversity and Theoretical Ecology from which this contribution arose. Mizue Hisano provided assistance with statistical programming and figure preparation. We also thank R. Etienne for suggestions and R. Izsak for assistance with a technical point, at early stages of the work.

Conflict of interest statement. None declared.

APPENDIX

Consider an idealized case in which two models predict exactly the same observed abundance distribution (i.e. the two models have identical zero-truncated likelihoods), but they differ in the number of unobserved species they predict. For each model, the ratio of the non-truncated to the zero-truncated log-likelihood will be
(A.1)
The derivative of equation (A.1) with respect to s0 is always negative, and thus, the model predicting the larger number of unobserved species (i.e. the larger p(0)) would have the smaller three-parameter likelihood. The derivative of (A.1) is messy, but it is possible to confirm by brute force evaluation that it is negative for moderately small (e.g. <100) values Sobs and s0. For large values of Sobs and s0, we can apply Stirling’s approximation: formula to obtain a clearer analytical result:
and then differentiating:

References

Bulmer
MG
,
Fitting the Poisson lognormal distribution to species-abundance data
Biometrics
,
1974
, vol.
30
(pg.
101
-
10
)
Cabrera
J
Watson
GS
,
Simulation methods for mean and median bias reduction in parametric estimation
J Stat Plan Inference
,
1997
, vol.
57
(pg.
143
-
52
)
Connolly
SR
Dornelas
M
Magurran
AE
McGill
BJ
,
Fitting and empirical evaluation of species abundance models
Biological Diversity: Frontiers in Measurement and Assessment
,
2011
Oxford
Oxford University Press
(pg.
123
-
40
)
Connolly
SR
Dornelas
M
Bellwood
DR
et al.
,
Testing species abundance models: a new bootstrap approach applied to Indo-Pacific coral reefs
Ecology
,
2009
, vol.
90
(pg.
3138
-
49
)
Connolly
SR
Hughes
TP
Bellwood
DR
et al.
,
Community structure of corals and reef fishes at multiple scales
Science
,
2005
, vol.
309
(pg.
1363
-
5
)
Diserud
OH
Engen
S
,
A general and dynamic species abundance model, embracing the lognormal and the gamma models
Am Nat
,
2000
, vol.
155
(pg.
497
-
511
)
Dornelas
M
Connolly
SR
,
Multiple modes in a coral species abundance distribution
Ecol Lett
,
2008
, vol.
11
pg.
1008
pg.
1016
Engen
S
,
Heterogeneous communities with lognormal species abundance distribution: species-area curves and sustainability
J Theor Biol
,
2007
, vol.
249
(pg.
791
-
803
)
Engen
S
Lande
R
,
Population dynamic models generating species abundance distributions of the gamma type
J Theor Biol
,
1996
, vol.
178
(pg.
325
-
31
)
Engen
S
Lande
R
,
Population dynamic models generating the lognormal species abundance distribution
Math Biosci
,
1996
, vol.
132
(pg.
169
-
83
)
Engen
S
Lande
R
Walla
T
et al.
,
Analyzing spatial structure of communities using the two- dimensional Poisson lognormal species abundance model
Am Nat
,
2002
, vol.
160
(pg.
60
-
73
)
Engen
S
Saether
BE
Sverdrup-Thygeson
A
et al.
,
Assessment of species diversity from species abundance distributions at different localities
Oikos
,
2008
, vol.
117
(pg.
738
-
48
)
Etienne
RS
,
Improved estimation of neutral model parameters for multiple samples with different degrees of dispersal limitation
Ecology
,
2009
, vol.
90
(pg.
847
-
52
)
Etienne
RS
Alonso
D
McKane
AJ
,
The zero-sum assumption in neutral biodiversity theory
J Theor Biol
,
2007
, vol.
248
(pg.
522
-
36
)
Etienne
RS
Olff
H
,
Confronting different models of community structure to species-abundance data: a Bayesian model comparison
Ecol Lett
,
2005
, vol.
8
(pg.
493
-
504
)
Fisher
RA
Corbet
AS
Williams
CB
,
The relation between the number of species and the number of individuals in a random sample of an animal population
J Anim Ecol
,
1943
, vol.
12
(pg.
42
-
58
)
Gaston
KJ
Blackburn
TM
Pattern and Process in Macroecology
,
2000
Blackwell
Oxford
Gray
JS
,
Detecting pollution induced changes in communities using the log-normal distribution of individuals among species
Mar Pollut Bull
,
1981
, vol.
12
(pg.
173
-
6
)
Hubbell
SP
The Unified Neutral Theory of Biodiversity and Biogeography
,
2001
Princeton, NJ
Princeton University Press
Izsak
R
,
Maximum likelihood fitting of the Poisson lognormal distribution
Environ Ecol Stat
,
2008
, vol.
15
(pg.
143
-
56
)
Lande
R
Engen
S
Saether
BE
Stochastic Population Dynamics in Ecology and Conservation
,
2003
New York
Oxford University Press
Magurran
AE
Henderson
PA
Magurran
AE
McGill
BJ
,
Commonness and rarity
Biological Diversity: Frontiers in Measurement and Assessment
,
2011
Oxford
Oxford University Press
May
RM
Cody
ML
Diamond
JM
,
Patterns of species abundance and diversity
Ecology and Evolution of Communities
,
1975
Cambridge
Belknap Press of Harvard University Press
(pg.
81
-
120
)
McGill
BJ
Etienne
RS
Gray
JS
et al.
,
Species abundance distributions: moving beyond single-prediction theories to integration within an ecological framework
Ecol Lett
,
2007
, vol.
10
(pg.
995
-
1015
)
Pielou
EC
Ecological Diversity
,
1975
New York
Wiley
Preston
FW
,
The commonness, and rarity, of species
Ecology
,
1948
, vol.
29
(pg.
254
-
83
)
Pueyo
S
,
Diversity: between neutrality and structure
Oikos
,
2006
, vol.
112
(pg.
392
-
405
)
Shaban
SA
Crow
EL
Shimizu
K
,
Poisson-lognormal distributions
Lognormal Distributions: Theory and Applications
,
1988
Boca Raton, FL
CRC Press
(pg.
195
-
210
)
R Core Development Team
R: A Language and Environment for Statistical Computing
,
2011
Vienna, Austria
R Foundation for Statistical Computing
Volkov
I
Banavar
JR
Hubbell
SP
et al.
,
Neutral theory and relative species abundance in ecology
Nature
,
2003
, vol.
424
(pg.
1035
-
7
)
Volkov
I
Banavar
JR
Hubbell
SP
et al.
,
Patterns of relative species abundance in rainforests and coral reefs
Nature
,
2007
, vol.
450
(pg.
45
-
9
)
Yee
TW
,
The VGAM package for categorical data analysis
J Stat Software
,
2010
, vol.
32
(pg.
1
-
34
)

Supplementary data