Abstract

In an effort to effectively model observed patterns in the spatial configuration of individuals of multiple species in nature, we introduce the saturated pairwise interaction Gibbs point process. Its main strength lies in its ability to model both attraction and repulsion within and between species, over different scales. As such, it is particularly well-suited to the study of associations in complex ecosystems. Based on the existing literature, we provide an easy to implement fitting procedure as well as a technique to make inference for the model parameters. We also prove that under certain hypotheses the point process is locally stable, which allows us to use the well-known ‘coupling from the past’ algorithm to draw samples from the model. Different numerical experiments show the robustness of the model. We study three different ecological data sets, demonstrating in each one that our model helps disentangle competing ecological effects on species' distribution.

1 INTRODUCTION

Point processes, that is, random events in time and/or space, have seen widespread use in forestry and plant ecology (Thompson, 1955), astronomy (Babu & Feigelson, 1996), epidemiology (Waller & Gotway, 2004), geology (Connor & Hill, 1995), wireless networks (Andrews et al., 2010; Baccelli & Błlaszczyszyn, 2009) and criminology (Mohler et al., 2011). A marked point process is a point process that has additional features attached to each event. Marked point processes are particularly important in ecology where observations often include properties of the individuals, for example their size or species. Indeed, marked point processes have been used to model trees and their species in a swamp forest (Dixon, 2002), trees in a tropical forest along with their species and size (Hubbell et al., 2012), adult and juvenile plants in a dipterocarp forest along with their species (Punchi-Manage et al., 2013), see also the many other examples in the spatstat.data R package (Baddeley et al., 2015).

The spatial arrangement of individuals of different species reflects what is termed ecological community assembly. Community assembly may be thought as the outcome of species having differential dispersal abilities, environmental tolerance and biotic interactions (Weiher et al., 2011). Statistical modelling of community assembly data on multi-species abundances or presence/absence in samples has focused on environmental tolerance and biotic interactions, within the framework of generalised linear mixed effects modelling (Ovaskainen et al., 2017). Yet, community assembly is an outcome of underlying mechanisms of birth, growth, reproduction, dispersal and death of individuals. Those individuals' fates may be affected by other individuals of the same or different species through positive and negative interactions, such as competition and facilitation. Additionally, indirect interactions between individuals within a species may be mediated through natural enemies (e.g., seed predators, herbivores and pathogens) that use density-dependent search strategies. Compared to alternative models that work with abundance or occurrence data, the use of point process models in such ecological settings enables inferences about the underlying mechanisms driving spatial arrangement of individuals in a multi-species setting.

The most common approach to the analysis of multi-species point processes is to compute the cross-pair correlation functions between all pairs of species, following the methodology of Møller and Waagepetersen (2004) (see Sections 4.4 and 4.5 therein). Such an approach quickly becomes impractical when the number of species increases beyond the bivariate setting. Compared to this type of ad hoc analysis, an integrated modelling framework is more robust and allows ecological questions to be answered more systematically.

A number of such models have recently been introduced to tackle multi-species spatial point patterns. First, the log-Gaussian Cox process has been successfully used to jointly model the locations of a nine tree species subset of the Barro Colorado Island 50 Ha plot in Waagepetersen et al. (2016). Briefly, the model assumes that a number of correlated Gaussian fields are driving the log-intensity of the point process, and the correlation coefficients between the Gaussian fields are thought to represent positive or negative interactions between species. Second, some types of Gibbs point processes have been used to model a larger subset of species from the same Barro Colorado Island dataset in Rajala et al. (2018). Although the Gibbs point process is usually thought of as modelling repulsion, the Geyer model (Geyer, 1999) introduces a saturation parameter that allows it to model either attraction or repulsion. The model in Rajala et al. (2018) is an extended version of the Geyer model adapted to the multi-species setting.

Our aim in this manuscript is to expand upon some of the ideas in Rajala et al. (2018) and to build a solid unified framework that can be applied to a wide range of multi-species marked point patterns. To that end, we introduce the saturated pairwise interaction Gibbs point process. Within this class of models, Rajala et al. (2018) only consider potential functions that are linear combinations of step functions and in brief, our model instead is based on ecologically sound potential functions. We also allow for pairwise interactions whose magnitude are driven by the individuals' marks, such as their size, which are thought to be important in explaining species' distribution.

In addition to providing the theoretical background to the model, we shall also consider the problem of simulating the point process without conditioning on the number of points. Although not considered in Rajala et al. (2018), such unconditioned simulation is indeed important, for example, to do Monte Carlo simulations as well as compute simulation envelopes and run goodness of fit tests. The model is validated in a series of numerical experiments in which we compute coverage probabilities, mean estimates and consider the sensitivity of the inference procedure to some of the fixed model parameters.

We demonstrate the flexibility of our model by applying it to various point patterns of interest to plant ecologists. These cover a range of ecosystems and, within the framework of our model, showcase the different types of positive and negative interactions that arise in the analysis of ecological data. We start with modelling the locations of a hundred Norway spruce trees in Germany (Fiksel, 1988) before moving on to a study of close to a thousand trees in a swamp forest in South Carolina (Good & Whipple, 1982). We conclude with an analysis of the well-known Barro Colorado Island data set studied in Rajala et al. (2018) and Waagepetersen et al. (2016). In this last analysis, we cover a larger subset of the data than Waagepetersen et al. (2016), and compared to Rajala et al. (2018), we put more emphasis on the species' interactions. We examine how well our model has performed on a data set containing almost a hundred species and many thousands of individual trees. In each of these examples, we show that our model has helped quantify and disentangle the effects of different ecological mechanisms on the spatial distribution. The model described in this manuscript has been implemented as an R package (R Core Team, 2019) to facilitate its use by ecology practitioners and other researchers.

We begin in Section 2 by introducing some notation and defining our new Gibbs point process. In Section 3 we explain how to do inference on our point process model by using the logistic regression inference technique for Gibbs point processes from Baddeley et al. (2014). We recall two important simulation algorithms in Section 4, and prove that they can be applied to our model. We provide numerical studies in Section 5 and applications to real data sets in Section 6. Some additional technical results are given in Appendix S1.

2 MATHEMATICAL SPECIFICATION OF THE MODEL

2.1 Notation

Before introducing our model, we start with some brief elements of point process theory. The interested reader may find more details in any of the numerous textbooks on the topic, for example Kallenberg (1983), Daley and Vere-Jones (2003, 2008) and Møller and Waagepetersen (2004). In the remainder of this manuscript, we shall often use 1A to denote the indicator function of a set A, that is, the function with values 1 on A and 0 elsewhere.

We consider p species located in a bounded region W2, with individuals labeled by an integer representing their species, and each individual also equipped with a mark in representing one of its properties such as its height or width. More precisely, we denote by (x,i,m)W×{1,,p}×=:𝕊 an individual of species i at the location x with a mark m. The set of admissible configurations is denoted by

and the cardinality of the set ω by |ω|. We assume that the reference measure of the marks is the Lebesgue measure. Given a configuration of individuals ω𝒩, we denote by ωi the sub-configuration consisting of individuals of species i, for 1ip. In short, ωi:={(x,k,m)ω:k=i}.

In point process theory, a number of descriptive functions have been introduced. The two we shall use in this manuscript are the Papangelou conditional intensity and the density with respect to a unit rate Poisson point process. Briefly, the density of the point process is the function j such that j(ω) is proportional to the probability that a configuration occurs in an infinitesimal volume around ω𝒩.

The Papangelou conditional intensity of the point process (Daley and Vere-Jones (2008), definition 10.4.I) is a function π such that π((x,i,m),ω)dxdm is the probability that there is an individual of species i, mark around m and location around xW conditional on the configuration ω𝒩 (outside of dx×{i}×dm). Contrary to the density j, the Papangelou conditional intensity gives information on the conditional probability of finding new individuals, given an existing configuration. Although the density characterises the point process, it is the Papangelou conditional intensity that appears in point process inference and simulation, and thus it shall play a key role in our analysis.

2.2 Potential functions

Our model is based on short- and medium-range potential functions, which themselves depend on interaction distances. In order to facilitate comparisons between different potential functions, we impose a few conditions. A short-range potential functionφRS with short-range interaction radius RS is a [0,1]-valued decreasing1 function which satisfies φRS(0)=1, φRS(r)0.5 for rRS and φRS(r)<0.5 for r>RS. Similarly, a medium-range potential functionψRMRL with medium-range interaction radius RM and long-range interaction radius RL is a [0,1]-valued function such that 0.5ψRMRL(r)1 for RMrRL and 0ψRMRL(r)<0.5 otherwise. Larger values of the potential functions correspond to stronger interactions, and in particular the potential functions fall to no more than half of their maximum at the corresponding interaction distances. We list in Table 1 a few commonly used potential functions at short and medium ranges.

TABLE 1

Short-range (first three rows, in blue) and medium-range (last two rows, in red) potential functions [Colour figure can be viewed at https://dbpia.nl.go.kr]

Potential functionDefinitionShape
Exponentialexp[rln(2)/RS]graphic
Square bump1exp[(RS)2ln(2)/r2]graphic
Step1[0,RS](r)graphic
Normalexp[4(r(RM+RL)/2)2ln(2)(RLRM)2]graphic
Geyer1[RM,RL](r)graphic
Potential functionDefinitionShape
Exponentialexp[rln(2)/RS]graphic
Square bump1exp[(RS)2ln(2)/r2]graphic
Step1[0,RS](r)graphic
Normalexp[4(r(RM+RL)/2)2ln(2)(RLRM)2]graphic
Geyer1[RM,RL](r)graphic

Notes: In the table below, RS, RM and RL are, respectively, the short-range, medium-range and long-range interaction radii.

TABLE 1

Short-range (first three rows, in blue) and medium-range (last two rows, in red) potential functions [Colour figure can be viewed at https://dbpia.nl.go.kr]

Potential functionDefinitionShape
Exponentialexp[rln(2)/RS]graphic
Square bump1exp[(RS)2ln(2)/r2]graphic
Step1[0,RS](r)graphic
Normalexp[4(r(RM+RL)/2)2ln(2)(RLRM)2]graphic
Geyer1[RM,RL](r)graphic
Potential functionDefinitionShape
Exponentialexp[rln(2)/RS]graphic
Square bump1exp[(RS)2ln(2)/r2]graphic
Step1[0,RS](r)graphic
Normalexp[4(r(RM+RL)/2)2ln(2)(RLRM)2]graphic
Geyer1[RM,RL](r)graphic

Notes: In the table below, RS, RM and RL are, respectively, the short-range, medium-range and long-range interaction radii.

2.3 Model

Our model is parametrised by the following quantities.

  • 1.

    An intercept vector (β1,0,,βp,0)Tp which is interpreted as the log-intensities of the different species, if there were no interactions.

  • 2.

    Environmental covariates X1,,XK which are assumed to be bounded.

  • 3.

    For 1ip and 1kK, a coefficient βi,k that represents the response of species i to environmental covariate k.

  • 4.

    A function u(z,(ω{z})i2) representing the short-range interactions between species i2 in ω and an individual z=(x,i1,m) of species i1 with mark m at location x.

  • 5.

    A function v(z,(ω{z})i2) that models the medium-range interactions between species i2 in ω and an individual z as in (4).

  • 6.

    For 1i1,i2p, a coefficient αi1,i2 which represents the magnitude of short-range interactions between species i1 and species i2. Positive values of αi1,i2 correspond to attraction between species i1 and species i2 while negative values are associated with repulsion. Note that it is assumed that α is symmetric, in the sense that αi1,i2=αi2,i1.

  • 7.

    For 1i1,i2p, a symmetric coefficient γi1,i2 which is the magnitude of medium-range interactions between each pair of species i1 and species i2. As in (6), we interpret the sign of γi1,i2 as indicating either attraction or repulsion.

The model is specified by its density, defined by

(1)

for ω𝒩, and where C>0 is a normalising constant. The Papangelou conditional intensity π directly follows from (1) by the formula π((x,i,m),ω):=j(ω{(x,i,m)})/j(ω) for (x,i,m)ω. We compute π explicitly in Appendix S1.

As mentioned above, the function u(z,(ω{z})i2) is interpreted as the saturated sum of short-range interactions between species i2 in ω and an individual z=(x1,i1,m1) of species i1 at x1 and with mark m1. Letting Ri1,i2S denote the short range interaction distance between species i1 and i2, we propose to define u as either

(2)

or, taking into account marks,

(3)

where N is called the saturation parameter, and the set of saturated configurations is defined as S(ω,N)={ηω:|η|N}. The quantity uunmarked((x1,i1,m1),ωi2) consists in the sum of the N largest pairwise interactions between the individual at x1 and individuals of species i2 in ω. Heuristically, the larger this quantity, the more short-range interactions there are between the individual at x1 and species i2. Our interpretation of the saturation parameter N is similar to that of Rajala et al. (2018) who write that N ‘reproduces the feature that the neighbourhood must eventually saturate with individuals as resources are finite’.

In the first of our two definitions (2), the distances Ri1,i2S are interpreted as typical short-range interaction distances between individuals of species i1 and i2. This contrasts with the second definition (3), in which the distances Ri1,i2S are measured as a proportion of the average marks of interacting individuals. One could consider other choices involving marks instead of (3), for example interactions proportional to the absolute difference of marks, thereby modelling fiercer competition between dissimilar individuals.

Similarly, letting Ri1,i2M (respectively Ri1,i2L) be the medium- (respectively long-) range interaction distances between species i1 and i2, we define

(4)

as well as

(5)

where the set of saturated configurations S(ωi2,N) was defined above. The parameters Ri1,i2M and Ri1,i2L have the same interpretation as Ri1,i2S, but relate to what we call medium- and long-range interactions instead of short-range ones.

2.4 Saturated pairwise interaction Gibbs point process

We call our model defined by (1) a ‘saturated pairwise interaction Gibbs point process’, and the aim of this section is to make explicit why we have settled on this name. As an aside, although to the best of our knowledge saturated pairwise interaction Gibbs point processes have never been described in the scientific literature, spatstat has implemented internally what they call pairsat.family and describe as a ‘Saturated Pairwise Interaction Point Process Family’.

Rewriting the model's density (1), for example in the marked case (3) and (5), we have

When N=, this is precisely a pairwise interaction Gibbs point process (see e.g., Møller & Waagepetersen, 2004, section 6.2) with inhomogeneous intensity for species i given by

and pairwise interaction functions

(6)

(the factor 2 in front of αi1,i2 and γi1,i2 respectively, arises because for any pair x1, x2 of locations in ω, our model double-counts the pairwise interaction between x1 and x2). Equation (6) above makes clear the joint effect of the short and medium range potentials, as well as the effect of the magnitude and sign of the coefficients αi1,i2 and γi1,i2. A plot to illustrate this effect is provided in Figure 1.

Two potential functions summed together, for α=1, γ=−1/2, exponential short-range potential, and normal medium-range potential (see Table 1), when N=∞.We plot the short-range potential in densely dashed red (- - - - -), the medium-range potential in loosely dashed blue (− − −) and the sum of the two in solid purple (______). [Colour figure can be viewed at https://dbpia.nl.go.kr]
FIGURE 1

Two potential functions summed together, for α=1, γ=1/2, exponential short-range potential, and normal medium-range potential (see Table 1), when N=.We plot the short-range potential in densely dashed red (- - - - -), the medium-range potential in loosely dashed blue () and the sum of the two in solid purple (______). [Colour figure can be viewed at https://dbpia.nl.go.kr]

When N is finite, the model only accounts for interactions between each individual and its N closest neighbours. This explains our use of the adjective ‘saturated’ to qualify our model.

2.5 Some cases of interest

2.5.1 Non-interacting model

Assume that αi1,i2=γi1,i2=0, so that there is neither attraction nor repulsion. Our general model (1) simplifies to

which can be seen (see e.g., Daley & Vere,-Jones 2003) to be a multi-type inhomogeneous Poisson point process with intensity for the ith type given by

In other words, each of the species is modelled independently by inhomogeneous Poisson point processes with log-intensities driven linearly by the environmental covariates.

2.5.2 Multivariate Geyer model

We assume now that βi,k=0 and γi1,i2=0. We further assume that the short range interaction potential is the step potential from Table 1. The density in the unmarked case (2) is equal to

which is an instance of the class of models used in Rajala et al. (2018).

3 INFERENCE

3.1 Logistic regression of Baddeley et al

In this subsection, we prove that the assumptions of Baddeley et al. (2014) hold, which ensures that their logistic regression can be used to do inference for our model. This method enables us to estimate the parameters β, α and γ.

The density of the model defined in (1) can be written as

(7)

In the equation above, we have defined the parameter vector θ:=(θ0T,θ1T,θ2T,θ3T)T, where θ0:=(β1,0,,βp,0)T, θ1:=(β1,1,,β1,n,,βp,1,,βp,n)T, θ2:=(α1,1,,αp,p)T and θ3:=(γ1,1,,γp,p)T.

In addition, we have set t(ω):=(t0(ω)T,t1(ω)T,t2(ω)T,t3(ω)T)T, where

for

and

Under this new compact notation (7), the Papangelou conditional intensity at ω𝒩 and for an individual of species i{1,,p} with mark m located at xW is readily computed as

(8)

where t((x,i,m),ω):=t(ω{(x,i,m)})t(ω).

The fact that we can write the density and the Papangelou conditional intensities respectively as (7) and (8) guarantees that the assumptions of Baddeley et al. (2014) hold. Given an observed configuration ω, the logistic regression technique of Baddeley et al. (2014) can be summarised as:

  • 1.

    sample a set of dummy points D with known (fixed) intensity, denoted by ρ;

  • 2.

    compute t(z,ω{z}) defined in (8) as z ranges over ωD;

  • 3.

    obtain θ defined above by a logistic regression with response variable 1 when z=(x,i,m)ω and 0 otherwise, input variables t(z,ω{z}) and offset term log(ρ(x)).

3.2 Variance-covariance matrix

Our model belongs to the class of Gibbs point processes and as such, SEs and confidence intervals are not straightforward to produce. Indeed, it has been shown in Baddeley et al. (2014) that, although the SEs corresponding to the logistic regression of the previous section are a good approximation, they are in general not accurate. Instead, asymptotic confidence intervals can be estimated by the technique introduced in Coeurjolly and Rubak (1998) (see also section 4 and the appendices of Baddeley et al., 2014). We will not repeat here the details of the construction of the asymptotic variance-covariance matrix, but we draw the reader's attention to the fact that there appears to be multiple typographic errors in equation (A4) of Baddeley et al. (2014). We refer to our package described in Appendix S1 for the details of the implementation.

3.3 Estimation of the other parameters

Section 3.1 dealt with the estimation of β, α and γ. It remains to explain how to choose the saturation parameter N, the shape of the potential functions, as well as the interaction radii between and within species on the short, medium and long ranges.

We shall often fix the potential shapes in order to simplify the analysis. Regarding the saturation parameter N, in some cases, we shall keep it fixed to 2. This assumption implies that the probability of a new individual being at a given location depends only on its two neighbours with which it interacts most, disregarding other individuals. Another option would be to follow the last paragraph of section 2.2 in Rajala et al. (2018) and set N automatically depending on the observed abundances.

In Rajala et al. (2018), the interaction radii are fixed a priori, and they write as their justification ‘in data analysis one usually has a priori information on relevant ranges (e.g., Uriarte et al., 2004)’. Although a priori fixing these parameters has been done in some of our analyses, we also wanted a straightforward statistical procedure to estimate the interaction radii. This has allowed us to fit the model to different data sets without prior knowledge of the characteristics of the species involved.

Our basic idea is to calibrate the model for various values of the interaction radii, saturation parameters, and potential shapes, and choose the set of values which performs best according to some measure of goodness of fit. Since one of our goals is to apply the model to large-scale datasets, an important requirement for the measure of goodness-of-fit is that it be relatively fast to compute. Consequently, we have refrained from using computationally heavy techniques like that of Møller and Berthelsen (2012) or an explicit computation of the likelihood as in section 8.3.2 of Møller and Waagepetersen (2004). Instead, we propose as a measure of the goodness of fit the pseudo-likelihood corresponding to the logistic regression in Section 3.1. More explicitly, we choose values of the saturation parameter and interaction radii which maximise the logistic pseudo-likelihood.

4 SIMULATION

4.1 Coupling from the past

In some cases, it is possible to use the ‘coupling from the past’ algorithm (sometimes called ‘perfect simulation’ algorithm) to sample from our point process, see section 11 of Møller and Waagepetersen (2004). Contrary to other simulation algorithms, the ‘coupling from the past’ algorithm is not approximate, and produces samples from the actual point process. In order to apply such an algorithm in practice, one needs to prove that its Papangelou conditional intensity is locally stable, that is, that there exists a function h such that π((x,i,m),ω)h(x) almost everywhere. The following Proposition 1 ensures that our model is locally stable under some additional hypotheses. We define x+:=max(x,0) for any real number x.

Proposition 1

Assuming that for anyi1,i2, γi1,i20, we have

for almost anyxW, 1ip, mandω𝒩, and where

Proof

The proof is a straightforward consequence of Lemma 1 in Appendix S1.

Given Proposition 1 above, we shall often work under the assumption

which is to say that none of the medium-range interactions are attractive. Under (H), Proposition 1 ensures that the ‘coupling from the past’ algorithm can be applied. The details of how the algorithm applies to our setting are provided in Section 3 of Appendix S1.

4.2 Metropolis–Hastings algorithm

Although the algorithm introduced in the previous subsection is extremely powerful, it has two disadvantages. First, it is sometimes slow, and for some values of the parameters, it does not converge in a reasonable time. Second, it requires the additional hypothesis (H) which we would like to relax in some instances. As such, in some cases, we will fall back on the unconditional Metropolis–Hastings algorithm, see algorithm 7.4 of Møller and Waagepetersen (2004). There are a series of possible variations of the algorithm, see for example remark 7.6 of Møller and Waagepetersen (2004) for a specialisation to the locally stable setting.

Since we aim for a version of the algorithm which can be applied to simulate from our model in all settings, we shall choose, in the notation of Møller and Waagepetersen (2004), a probability of birth equal to 1/2, uniformly distributed births qb(·)=1W(·)/|W|, and a probability 1/2 of a uniformly distributed death distributed according to qd(·,ω)=1ω(·)/|ω|, where ω𝒩.

5 NUMERICAL SIMULATIONS

5.1 Simulation study

We start with a simulation study involving two species. This ensures that the number of parameters is tractable, while still demonstrating that the ‘coupling from the past’ algorithm and the fitting procedure are working as expected. We ran simulation studies involving significantly more species, and we have not observed any decrease in performance. We report the results of a seven species study in Appendix S1. In this first numerical experiment, we consider a ‘saturated pairwise interaction Gibbs point process’ on the square region W=[1,1]2, consisting of p=2 species, with no marks, and whose distribution is driven by two geospatial covariates, X1(x,y)=x and X2(x,y)=y. We consider uniform short-range interaction radii of RS=0.05, medium-range interaction radii of RM=0.07 and long-range interaction radii of RL=0.12. The rest of the parameters are given by β0T=(2.5,2), β1T=(2,2.5) (corresponding to X1), β2T=(1,1.5) (corresponding to X2), and

We set the saturation parameter N to 2, take as the short-range potential the square bump function, and choose the normal medium-range potential, see Table 1. In order to illustrate our experiment, we plot on the left of Figure 2 a typical sample from this point process.

Typical samples considered in our numerical experiments. On the left, a sample from the point process considered in Section 5.1, in the middle, a sample from the point process considered in Section 5.2 and finally on the right, a sample from the point process considered in Section 5.3. [Colour figure can be viewed at https://dbpia.nl.go.kr]
FIGURE 2

Typical samples considered in our numerical experiments. On the left, a sample from the point process considered in Section 5.1, in the middle, a sample from the point process considered in Section 5.2 and finally on the right, a sample from the point process considered in Section 5.3. [Colour figure can be viewed at https://dbpia.nl.go.kr]

We sampled 1000 independent draws of this point process. Since the assumption (H) from Section 4.1 is satisfied and the simulation procedure is reasonably fast for these parameters, these draws are sampled by the ‘coupling from the past’ algorithm. The saturation parameter, interaction distances and interaction shapes were set to their true values. We then fit each of the samples by the logistic regression technique from Section 3.1, and produced asymptotic confidence intervals according to Section 3.2. The results are presented in Table 2. Our results are satisfying, showing good mean estimates over only 1000 samples, along with coverage probabilities with a mean and median of 95%.

TABLE 2

Parameter estimates & coverage probabilities

ParameterTrue valueMeanMedianRMSECoverage prob.
β1,02.502.502.510.3340.94
β2,02.001.971.980.3410.94
β1,12.002.142.100.5750.95
β2,12.502.672.630.5670.97
β1,21.001.081.040.4180.96
β2,21.501.621.580.4480.95
α1,10.200.530.341.120.95
α1,20.100.0920.110.2730.95
α2,20.600.820.750.5620.95
γ1,10.600.690.660.3970.96
γ1,20.300.310.300.1790.96
γ2,20.000.0180.0230.02390.94
ParameterTrue valueMeanMedianRMSECoverage prob.
β1,02.502.502.510.3340.94
β2,02.001.971.980.3410.94
β1,12.002.142.100.5750.95
β2,12.502.672.630.5670.97
β1,21.001.081.040.4180.96
β2,21.501.621.580.4480.95
α1,10.200.530.341.120.95
α1,20.100.0920.110.2730.95
α2,20.600.820.750.5620.95
γ1,10.600.690.660.3970.96
γ1,20.300.310.300.1790.96
γ2,20.000.0180.0230.02390.94
TABLE 2

Parameter estimates & coverage probabilities

ParameterTrue valueMeanMedianRMSECoverage prob.
β1,02.502.502.510.3340.94
β2,02.001.971.980.3410.94
β1,12.002.142.100.5750.95
β2,12.502.672.630.5670.97
β1,21.001.081.040.4180.96
β2,21.501.621.580.4480.95
α1,10.200.530.341.120.95
α1,20.100.0920.110.2730.95
α2,20.600.820.750.5620.95
γ1,10.600.690.660.3970.96
γ1,20.300.310.300.1790.96
γ2,20.000.0180.0230.02390.94
ParameterTrue valueMeanMedianRMSECoverage prob.
β1,02.502.502.510.3340.94
β2,02.001.971.980.3410.94
β1,12.002.142.100.5750.95
β2,12.502.672.630.5670.97
β1,21.001.081.040.4180.96
β2,21.501.621.580.4480.95
α1,10.200.530.341.120.95
α1,20.100.0920.110.2730.95
α2,20.600.820.750.5620.95
γ1,10.600.690.660.3970.96
γ1,20.300.310.300.1790.96
γ2,20.000.0180.0230.02390.94

5.2 Sensitivity analysis

In this experiment, we study how sensitive our calibration is to mis-specified values of the interaction radii and the saturation parameter N. We consider a ‘saturated pairwise interaction Gibbs point process’ on W=[0,1]2, consisting in p=2 species, with no marks, and whose distribution is driven by a single environmental covariate X1(x,y)=x. We assume that the two species interact over different ranges, and that their distribution is characterised by β0T=(4,3.5), β1T=(1.5,2), and

We take as the short-range potential the square bump function from Table 1, and choose a saturation parameter N=2.

Although the assumption (H) from Section 4.1 is satisfied, it is faster to sample 100 independent draws of this point process by the Metropolis–Hastings algorithm of Section 4.2, with 100,000 steps. In order to give a sense of the type of point process we are working with, we show in the middle of Figure 2 a typical sample.

In our experiment, we first fit each of the samples by mis-specifying the short-range interaction radii RS, then assumed a mis-specification of the saturation parameter N. More specifically, we consider two mis-specifications of the interaction radii, namely

We also consider an under-specified saturation parameter N=1 and an over-specified N+=4.

The results for the interaction radii mis-specification are presented in Table 3. The main insight gained from this part of the experiment is that the estimates of the parameters are fairly accurate even when the interaction radii have been mis-specified by around 50%. This is largely due to the shape of our short-range potential function which is flat around the origin, and the high intensity of points in each sample relative to the saturation parameter N. In addition, we remark that the estimates are notably better when the radius is mis-specified as R+S. Our interpretation of this fact is that when the user chooses an interaction radius which is larger than the true one, the same broad pairwise interactions are accounted for. When the radius is under-specified instead, some pairwise interaction are strongly discounted, which biases the estimates of some of the parameters.

TABLE 3

Mis-specification of the interaction radii as either an under-specification RS or an over-specification R+S

Under-specificationOver-specification
ParameterTrue valueMeanCoverage prob.MeanCoverage prob.
β1,0  4.04.160.803.770.92
β2,0  3.53.410.903.510.97
β1,1  1.51.640.931.640.93
β2,1  2.02.350.872.060.93
α1,1  0.40.380.920.440.97
α1,20.30.380.870.310.97
α2,2  0.40.520.900.300.93
Under-specificationOver-specification
ParameterTrue valueMeanCoverage prob.MeanCoverage prob.
β1,0  4.04.160.803.770.92
β2,0  3.53.410.903.510.97
β1,1  1.51.640.931.640.93
β2,1  2.02.350.872.060.93
α1,1  0.40.380.920.440.97
α1,20.30.380.870.310.97
α2,2  0.40.520.900.300.93
TABLE 3

Mis-specification of the interaction radii as either an under-specification RS or an over-specification R+S

Under-specificationOver-specification
ParameterTrue valueMeanCoverage prob.MeanCoverage prob.
β1,0  4.04.160.803.770.92
β2,0  3.53.410.903.510.97
β1,1  1.51.640.931.640.93
β2,1  2.02.350.872.060.93
α1,1  0.40.380.920.440.97
α1,20.30.380.870.310.97
α2,2  0.40.520.900.300.93
Under-specificationOver-specification
ParameterTrue valueMeanCoverage prob.MeanCoverage prob.
β1,0  4.04.160.803.770.92
β2,0  3.53.410.903.510.97
β1,1  1.51.640.931.640.93
β2,1  2.02.350.872.060.93
α1,1  0.40.380.920.440.97
α1,20.30.380.870.310.97
α2,2  0.40.520.900.300.93

The results related to the mis-specification of the saturation parameter N are in Table 4. A few things stand out in this analysis. First, the β parameters (which relate to the abundance) are well estimated even when the saturation parameter is mis-specified. Indeed, the mean estimated values of β1,0, β2,0, β1,1, β2,1, β1,2 and β2,2 are very close to the true values, and the associated coverage probabilities are of the right magnitude. Second, some interaction coefficients have very bad coverage probabilities, but broadly speaking their signs and magnitude are properly recovered by the estimation procedure. Third, when the saturation parameter is under-specified, the corresponding interaction coefficients are larger in magnitude, while when it is over-specified the interaction coefficients are smaller. Heuristically, this is due to the fact that when the saturation parameter is under-specified, there are less interactions accounted for in the sum of short-range interactions (2), and consequently the corresponding interaction coefficient that multiplies the sum ought to be larger.

TABLE 4

Mis-specification of the saturation parameter as either N or N+

Under-specificationOver-specification
ParameterTrue valueMeanCoverage prob.MeanCoverage prob.
β1,04.04.060.884.030.91
β2,03.53.510.943.410.91
β1,11.51.690.891.530.96
β2,12.02.190.902.100.94
α1,10.40.520.910.220.48
α1,20.30.540.620.190.53
α2,20.40.350.960.280.90
Under-specificationOver-specification
ParameterTrue valueMeanCoverage prob.MeanCoverage prob.
β1,04.04.060.884.030.91
β2,03.53.510.943.410.91
β1,11.51.690.891.530.96
β2,12.02.190.902.100.94
α1,10.40.520.910.220.48
α1,20.30.540.620.190.53
α2,20.40.350.960.280.90
TABLE 4

Mis-specification of the saturation parameter as either N or N+

Under-specificationOver-specification
ParameterTrue valueMeanCoverage prob.MeanCoverage prob.
β1,04.04.060.884.030.91
β2,03.53.510.943.410.91
β1,11.51.690.891.530.96
β2,12.02.190.902.100.94
α1,10.40.520.910.220.48
α1,20.30.540.620.190.53
α2,20.40.350.960.280.90
Under-specificationOver-specification
ParameterTrue valueMeanCoverage prob.MeanCoverage prob.
β1,04.04.060.884.030.91
β2,03.53.510.943.410.91
β1,11.51.690.891.530.96
β2,12.02.190.902.100.94
α1,10.40.520.910.220.48
α1,20.30.540.620.190.53
α2,20.40.350.960.280.90

5.3 Inference of the interaction radii

In this paragraph, we assume that the true interaction radii are unknown, and we study how well the model is able to recover them using our proposed method from Section 3.3. We do not choose the same parameters as in the previous Section 5.2 since, as observed there, the model is not very sensitive to the actual value of the interaction radius. Instead, we purposely choose strong interaction coefficient values to allow our fitting procedure to recover the true values of the interaction radii.

We choose an observation window W=[0,1]2, with p=2 species, no marks, and whose distribution is driven by a single geospatial covariate X1(x,y)=x0.5. We assume that all interactions occur at a distance of 0.05 and in addition we assume that the interactions at those ranges are quite strong, so that the calibration procedure is able to pick them up. To be explicit, the rest of the parameters are given by β0T=(6.5,2.6), β1T=(1,1), and

We choose N=2 for the saturation parameter and take as the short-range potential the exponential function from Table 1. A typical sample is shown on the right of Figure 2.

We sampled 1000 independent draws of this point process. Although the assumption (H) from Section 4.1 is satisfied, these draws are sampled with 1,000,000 steps of the Metropolis–Hastings algorithm which is quicker for such extreme values of the interaction coefficients. For each draw of the point process, we find the optimal short-range interaction coefficient by maximising the pseudo-likelihood. We find in Figure 3 that for around 4% of samples, the pseudo-likelihood is actually maximised by choosing the largest possible interaction radius. When removing these outliers, the mean estimated short-range interaction radius is found to be 0.06. If instead we keep these samples, then the mean estimate significantly overestimates the true interaction radius, and the median actually works best.

Optimal short-range interaction radius for each draw, obtained by pseudo-likelihood maximization. The maximization was done on a discrete grid between 0.0025 and 0.5. The true value of the interaction radius is shown in red, the median estimate is in blue, and the average estimate (including the values hitting the hard limit at 0.5) is drawn in yellow. [Colour figure can be viewed at https://dbpia.nl.go.kr]
FIGURE 3

Optimal short-range interaction radius for each draw, obtained by pseudo-likelihood maximization. The maximization was done on a discrete grid between 0.0025 and 0.5. The true value of the interaction radius is shown in red, the median estimate is in blue, and the average estimate (including the values hitting the hard limit at 0.5) is drawn in yellow. [Colour figure can be viewed at https://dbpia.nl.go.kr]

In order to explore how well our method is actually performing, we also searched for the interaction radius which maximises the average pseudo-likelihood over all draws. Although not practical, since this method requires the observation of multiple replications of the point process, we show in Figure 4 that this technique properly recovers the true value of the short-range interaction radius.

Pseudo-log-likelihood averaged over all samples, for a given value of the short-range interaction coefficient. The value that maximizes the average log-likelihood is found to be the true value of the interaction radius, RS=0.05, shown here in red. [Colour figure can be viewed at https://dbpia.nl.go.kr]
FIGURE 4

Pseudo-log-likelihood averaged over all samples, for a given value of the short-range interaction coefficient. The value that maximizes the average log-likelihood is found to be the true value of the interaction radius, RS=0.05, shown here in red. [Colour figure can be viewed at https://dbpia.nl.go.kr]

Although we have reported here the results of a study with quite extreme values of the interaction coefficients, our reported findings are representative of a range of other tested values. In running the simulation with other interaction coefficients, we find that the main change is in the proportion of samples for which the method does not properly converge. We found this proportion to vary between 4% and 30%. We gather from this experiment that the method introduced in Section 3.3 works reasonably well to estimate unknown interaction radii, except in certain cases where the pseudo-likelihood maximising radius appears to be infinite. In conclusion, we caution the reader to not put much confidence in estimated values of the interaction radii hitting the hard upper-bound, especially when the corresponding interaction coefficient is not statistically significant.

6 REAL APPLICATIONS

In this section, we consider three different case studies from plant ecology. In each case we give examples of ecological insights derived from our model. All three data sets consist of the locations of trees, however, differing in their biome, plot size, density of individuals and number of species.

6.1 Norway spruces

In this subsection we consider the locations of 134 Norway spruce trees in a natural forest stand in Saxonia, Germany. The original source of the data is unknown, but it has been widely studied in the point process literature, see for example section 4 of Fiksel (1988) and example 2 in Goulard et al. (1996). The diameter at breast height in metres has been recorded for each individual tree in the dataset, and will serve as our marks. There are no associated environmental covariates, and instead the dataset is often used as an example of a regular marked point process, with interaction distances thought to be proportional to marks. What we call interaction radii are sometimes described in the literature on this dataset as ‘influence zones’ (Goulard et al., 1996), ‘hard-core’ and “interaction” radii (Penttinen et al., 1992). Various estimates of these values have been derived in previous analyses and one of our aims shall be to compare our results to the literature. In Figure 5, we show the locations of the spruces along with discs proportional to their diameters.

Norway spruces with marks representing their diameter at breast height. The background colour gradient is the fitted log-Papangelou conditional intensity. [Colour figure can be viewed at https://dbpia.nl.go.kr]
FIGURE 5

Norway spruces with marks representing their diameter at breast height. The background colour gradient is the fitted log-Papangelou conditional intensity. [Colour figure can be viewed at https://dbpia.nl.go.kr]

6.1.1 Results

Following Goulard et al. (1996), we assume that interactions take place at distances proportional to the marks, and so we choose (3) and (5), which in words assumes that individual to individual interactions are proportional to the average marks of the two individuals. In order to estimate the interaction radii, potential function shapes and saturation parameter, we deployed the multi-dimensional maximisation outlined in Section 3.3, using the pseudo-likelihood of the logistic regression as the objective function. Our only constraint is restricting the saturation parameter to the range of values {1,2,4,6}; however, we found that the fit was not significantly influenced by these values. The results of our model are summarised in Table 5.

TABLE 5

Norway spruce dataset results

ParameterEstimate95% CI
Interceptβ01.88(2.57,1.19)
Short-range coefficientα5.18(6.92,3.43)
Medium-range coefficientγ0.14(0.05,0.23)
Short-range radiusRS2.41
Medium-range radiusRM16.40
Long-range radiusRL24.43
Short-range shapeφRSExponential
Medium-range shapeψRMRLGeyer
SaturationN6
ParameterEstimate95% CI
Interceptβ01.88(2.57,1.19)
Short-range coefficientα5.18(6.92,3.43)
Medium-range coefficientγ0.14(0.05,0.23)
Short-range radiusRS2.41
Medium-range radiusRM16.40
Long-range radiusRL24.43
Short-range shapeφRSExponential
Medium-range shapeψRMRLGeyer
SaturationN6

Notes: We do not give the 95% confidence intervals for the parameters fitted by the ad hoc pseudo-likelihood maximisation. The other confidence intervals are produced by the method outlined in Section 3.2.

TABLE 5

Norway spruce dataset results

ParameterEstimate95% CI
Interceptβ01.88(2.57,1.19)
Short-range coefficientα5.18(6.92,3.43)
Medium-range coefficientγ0.14(0.05,0.23)
Short-range radiusRS2.41
Medium-range radiusRM16.40
Long-range radiusRL24.43
Short-range shapeφRSExponential
Medium-range shapeψRMRLGeyer
SaturationN6
ParameterEstimate95% CI
Interceptβ01.88(2.57,1.19)
Short-range coefficientα5.18(6.92,3.43)
Medium-range coefficientγ0.14(0.05,0.23)
Short-range radiusRS2.41
Medium-range radiusRM16.40
Long-range radiusRL24.43
Short-range shapeφRSExponential
Medium-range shapeψRMRLGeyer
SaturationN6

Notes: We do not give the 95% confidence intervals for the parameters fitted by the ad hoc pseudo-likelihood maximisation. The other confidence intervals are produced by the method outlined in Section 3.2.

Recall that the radii in Table 5 are given as a proportion of the marks, so that for example two individuals of size 0.2m interact on the short-range at a distance of 0.2RS=0.482m. Our fitted estimates are broadly in line with what other researchers have estimated or a prior fixed in the relevant literature, see Fiksel (1988), Penttinen et al. (1992) and Goulard et al. (1996). Indeed, as others have observed, there are strong negative short-range interactions between the locations of the spruces. In addition, the authors of Penttinen et al. (1992) choose a ‘hard-core radius’ of 1m, where our short-range interaction radius amounts to 0.6m on average (calculated as RS times the average tree diameter of 25cm). We find medium-range interactions that occur at an average distance of 5.1m (calculated as the mean of RM and RL times the average tree diameter), which is analogous to the quantity Penttinen et al. (1992) call an ‘interaction radius’ and set to 3.5m. The authors in Goulard et al. (1996) choose an influence zone of five times the diameter, which again is comparable to our fitted short-range interaction radius. The best short-range potential function is found to be the exponential, which is notably the shape chosen for interactions in the pairwise Gibbs point process used in Penttinen et al. (1992).

We have also gone further than some of the existing models. To the best of our knowledge, other models do not capture the statistically significant medium-range positive interactions in the dataset, occurring between 16 and 24 times the diameter at breast height. This property of the point pattern might be caused by a mixture of pollination and seed dispersal. These ecological mechanisms would tend to increase the likelihood of finding individuals surrounded by others at these medium ranges.

6.2 South Carolina Savannah river site

In this subsection, we study the locations of 734 individual trees in a 200m×50m plot in the Savannah river site, South Carolina, USA. Seven different plots were originally set up by Bill Good, and a first analysis of their spatial patterns was conducted in Good and Whipple (1982), see also the subsequent analyses in Jones et al. (1994) and Dixon (2002). We focus on one of the plots from the original experiment shown in Figure 6. The data set can be obtained using the R language (R Core Team, 2019) as ecespa::swamp from the ecespa package available on CRAN.

South Carolina Savannah river site [Colour figure can be viewed at https://dbpia.nl.go.kr]
FIGURE 6

South Carolina Savannah river site [Colour figure can be viewed at https://dbpia.nl.go.kr]

There are no known environmental covariates related to this data set; however, the (unmeasured) water level is thought to be an important driver of the spatial distribution. Contrary to Section 6.1 and to simplify the analysis, we assume that the saturation parameter N is equal to 2, that the short-range interaction potential is the square exponential from Table 1, and finally we assume that there are no medium-range interactions. We also let the interaction radii be on a discrete grid, with grid size 1m, and constrain them to be less than 20m.

6.2.1 Fitting of the interaction radii

In order to estimate the different interaction radii, we follow the procedure outlined in Section 3.3 and implemented in Section 6.1. We find that the fitted short range interaction distances RS in metres are given by

where entry i,j of the matrix above corresponds to Ri,jS, the short-range interaction distance between species i and j. We have put in bold values of the interaction distances which are later found to be associated with significant interactions, and greyed out values which are found not to be. Since their corresponding interactions are weak, greyed out values carry weak statistical weight. In addition, values of the interaction radius attaining our hard upper-bound of 20m should not be taken at face value given our findings in Section 5.3.

We observe that the short-range interaction radii Ri,iS within each of the species has a mean of around 2m while the interaction radii Ri,jS between species are on average five times larger. Thus, the intra-species and inter-species short-range interaction radii appear to relate to different underlying ecological processes. The intra-species interaction radii Ri,iS might be related to the seed dispersal distance and the range within which individuals (of the same species) compete for resources. The inter-species interaction radii Ri,jS could be due to unmeasured environmental variation and/or be the range within which individuals (of different species) compete for resources.

6.2.2 Results

The fitted values for the matrix of short-range interaction coefficients α are presented in Figure 7. The results support the hypothesis of strong clustering within each species, with the exception of the bald cypress in which we observe mild repulsion, although the parameter estimate is not statistically significant. Similar results were already obtained in Dixon (2002), where it was written that the particular status of the cypress “may be due to logging … or it may represent some other difference between cypress and the other tree species”.

On the left-hand side, short range interaction coefficients within each of the species αi,i. On the right-hand side, short range interaction coefficients between each of the species αi,j. We provide estimates along with the corresponding 95% confidence intervals. [Colour figure can be viewed at https://dbpia.nl.go.kr]
FIGURE 7

On the left-hand side, short range interaction coefficients within each of the species αi,i. On the right-hand side, short range interaction coefficients between each of the species αi,j. We provide estimates along with the corresponding 95% confidence intervals. [Colour figure can be viewed at https://dbpia.nl.go.kr]

The estimates of the pairwise short-range interaction radii are all negative and all but two of the 95% confidence intervals do not overlap with zero. However, we recall that we have used a two-step procedure in which the interaction radii were specifically chosen to maximise the pseudo-likelihood, and in addition we have not made any correction for the multiple testing problem. Hence, we should be cautious in interpreting the confidence intervals. Broadly speaking however, there is evidence of competition rather than facilitation between species. We note in particular that many of the strongest repulsive associations involve the swamp tupelo. These results also corroborate what was observed in the existing literature on this dataset, see in particular Dixon (2002). However, the technique introduced in Dixon (2002) did not find most of the inter-species interactions to be statistically significant, perhaps due to the fact that heterogeneity in the interaction radii could not be accounted for.

6.3 Barro Colorado Island

Fully mapped out forest plots are a rare occurrence in ecology. These are, however, crucial in understanding the relative importance of dispersal limitation, biotic interactions and habitat filtering in explaining species' distributions. Many seminal studies of spatial distributions within forest plots have been unable to account for inter-species associations (Condit et al., 2000; John et al., 2007; Shen et al., 2013; Wiegand et al., 2007) and when they have it is via an analysis of pair correlation functions (Deyi et al., 2020; Uriarte et al., 2004). By contrast, our model allows us to conduct the analysis within a fully integrated model-based framework.

In this section, we study the 1000m×500m tropical moist forest plot at Barro Colorado Island, Panama. All woody trees and shrubs whose stems have a diameter of at least 1cm have been censused in multiple years (see Condit, 1998; Condit et al., 1999; Hubbell et al., 2012 for more details). Regarding the analysis of the Barro Colorado Island data set specifically, attempts at analysing ecological drivers of multi-species distributions within a unified framework have been scarce, and we shall mostly compare our results to Rajala et al. (2018, section 5) and Waagepetersen et al. (2016, section 6.2) which are the most extensive studies to date.

A wide range of environmental covariates are available for the Barro Colorado Island dataset, for example information about the soil type, elevation, etc. We settled upon six ecologically relevant covariates, namely slope and elevation, solar irradiance, soil pH and phosphorus content, and finally the soil moisture in the mid dry season in a non-drought year from Kupers et al. (2019). Rajala et al. (2018) chose instead six covariates from principal component analysis, which can be difficult to interpret, while Waagepetersen et al. (2016) settled on eleven different covariates including the first five of ours. We remark that our method scales well with the number of environmental covariates, and the reason for restricting our attention to only six of them is simply ease of presentation.

There are around 300 different species and hundreds of thousands of individual trees in the Barro Colorado Island data set, and consequently various techniques have been used to reduce the numerical complexity. The authors in Waagepetersen et al. (2016) restrict their attention to nine seemingly arbitrarily chosen species with intermediate abundance. In Rajala et al. (2018) instead, the authors exclude species for which they do not have an estimate of ‘reproducible size’, which is used as a proxy for the size at which individuals reach reproductive maturity. Then for each species, the authors remove individual trees below the reproducible size threshold, and finally exclude species with less than 50 remaining individuals.

In order to restrict our analysis to that of adult trees which are thought to have a more regular distribution, following Rajala et al. (2018) we remove immature individuals from the data set. Immature individuals were removed based on their size, with estimates of size at reproductive maturity available as a supplement to Flügge et al. (2014). While Rajala et al. (2018) exclude from their analysis the species for which the size at reproductive maturity is not available, we do not since excluding entire species from the data set might lead to missed ecological interactions. Instead, we find that reproductive maturity is well explained by a regression YaSb, where S is the maximum diameter of the species and Y is the size at reproductive maturity. This leads us to exclude individuals that are below the reproductive size for their species, or if that trait is not available, below the extrapolated size at reproductive maturity inferred from their maximum diameter. Compared to Rajala et al. (2018), this retains more species. Finally, we group species with less than 70 individuals into a separate category which shall still play a role in the interactions accounted for by the model. After this procedure, we end up with 82 different species comprising around 45,000 individual trees. This constitutes a few thousand more individuals and nine times more species than Waagepetersen et al. (2016); 50% more individuals and roughly the same number of species as Rajala et al. (2018).

We fix the saturation parameter N to 2 and let the shape of the potential functions be the square bump and normal, respectively. We choose 10m as the short-range interaction radius and search for residual medium-range interactions between 20 and 40m. These values are in line with the results of neighbourhood-dependent growth models, see table 4 in Uriarte et al. (2004). We implemented a Lasso regularisation of the logistic regression of Section 3.1 in order to facilitate the analysis of the many potential resulting interactions. The theoretical justification for using regularisation on the composite likelihood is provided in Daniel et al. (2018), see also Ba and Coeurjolly (2020) for the asymptotic properties of the regularised estimator in our setting. We chose as the regularisation parameter the one that minimises AIC.

6.3.1 Results

We start by presenting in Figure 8 the intra-species interactions coefficients. We broadly observe that most species are clustered, with a few exhibiting very significant clumping. Notably, our three most clustered species are Anaxagorea panamensis, Bactris major and Rinorea sylvatica which were highlighted in Seri et al. (2015) as ‘exceptional species’ in terms of their clustering. In addition, in part due to the removal of immature trees, we find some species which have negative or null intra-species interactions, leading to regular distributions. In Figure 12 we show in more detail the spatial distribution of four such species. Protium panamense is an instructive example that exhibits strong intra-species short-range negative interactions and almost no medium-range interactions. This species was analysed in Waagepetersen et al. (2016) without removing immature trees. Analysing the configuration of mature trees in their framework would be more challenging since the Cox process in their model is restricted to positive associations between individuals and therefore cannot properly account for these negative intra-species interactions.

On the left-hand side, short range interaction coefficients within each of the species αi,i. On the right-hand side, medium-range interaction coefficients between each of the species γi,i. The estimates were obtained by averaging out the results of 10 logistic regressions, each with a different binomial draw of the dummy points D. The error bars represent the variation among these draws. [Colour figure can be viewed at https://dbpia.nl.go.kr]
FIGURE 8

On the left-hand side, short range interaction coefficients within each of the species αi,i. On the right-hand side, medium-range interaction coefficients between each of the species γi,i. The estimates were obtained by averaging out the results of 10 logistic regressions, each with a different binomial draw of the dummy points D. The error bars represent the variation among these draws. [Colour figure can be viewed at https://dbpia.nl.go.kr]

In Figure 9 we show the inter-species interaction coefficients. We find that our model has properly disentangled two different kinds of associations. First, on the short range, species are generally negatively associated with one another, which is a strong marker of competition for resources. Second, on the medium range, we see substantially more positive associations, possibly indicating some dependency on unmeasured environmental covariates. Others in the literature (Rajala et al., 2018; Waagepetersen et al., 2016) have not been able to disentangle these numerous short-range negative interactions from associations at broader scales. We find that some of the species pairs studied in Waagepetersen et al. (2016) are negatively associated, for example Swartzia simplex with most other species, or Hirtella triandra with Garcinia intermedia. These negative associations were not picked up by Waagepetersen et al. (2016) while they were corroborated by our analysis of Ripley's cross K-function (not shown here). Indeed, all significant interactions in Waagepetersen et al. (2016) were found to be positive. We were unable to compare our results with those of Rajala et al. (2018) more closely since they did not report the species' label in their figures.

On the left-hand side, the 40 largest short-range interaction coefficients between the species αi,j. On the right-hand side, the 40 largest medium range interaction coefficients between each of the species γi,j. The coefficients shown in blue are negative, so that the corresponding interactions are repulsive, while those in red are positive, meaning the interactions are attractive. In both panels, the thickness of the cord is proportional to the strength of the interaction. [Colour figure can be viewed at https://dbpia.nl.go.kr]
FIGURE 9

On the left-hand side, the 40 largest short-range interaction coefficients between the species αi,j. On the right-hand side, the 40 largest medium range interaction coefficients between each of the species γi,j. The coefficients shown in blue are negative, so that the corresponding interactions are repulsive, while those in red are positive, meaning the interactions are attractive. In both panels, the thickness of the cord is proportional to the strength of the interaction. [Colour figure can be viewed at https://dbpia.nl.go.kr]

Ecological processes such as dispersal and competition are expected to display distinct spatial signatures (Seabloom et al., 2005). We hypothesise that the outputs of the model presented here partly result from these ecological processes. Our model has disentangled associations on different scales, providing a basis for dissecting the underlying ecological processes.

In terms of ecological insights, in Figure 10 we show that species with a smaller maximum diameter at breast height tend to be more clustered, with the relationship being statistically significant (p=0.000214 significance according to a Wald test). This is a well-known feature of the Barro Colorado Island data set that our model has successfully picked up, see for example Condit et al. (2000). We also found that larger species on average have more negative associations with other species, reflecting size-dependent competitive pressure (p<2·1016, Wald test, plot not shown here).

Mean of the intra-species interaction coefficient for each species (obtained as the average of αi,i and γi,i) as a function of the species' maximum diameter at breast height. The fit shown on the figure is a GAM fit with basis dimension 3, along with its 95% confidence bands. [Colour figure can be viewed at https://dbpia.nl.go.kr]
FIGURE 10

Mean of the intra-species interaction coefficient for each species (obtained as the average of αi,i and γi,i) as a function of the species' maximum diameter at breast height. The fit shown on the figure is a GAM fit with basis dimension 3, along with its 95% confidence bands. [Colour figure can be viewed at https://dbpia.nl.go.kr]

6.3.2 Model assessment

We shall show next that our model satisfies the following compelling criteria:

  • (i)

    for a given species, conditioning on other species and accounting for the corresponding interactions yields a conditional occurrence probability estimate which captures the inhomogeneity in the point pattern well;

  • (ii)

    the intra-species interaction coefficients indicate clustering or regularity in each of the species' spatial distribution;

  • (iii)

    the inter-species interaction coefficients shown in Figure 9 capture actual associations between species in the data set.

(i) Species-specific intensity

We begin by showing that our model correctly captures the underlying spatial inhomogeneity. Consider a species i, and the configuration ωi in which we remove all individuals of species i. Recall that the Papangelou conditional intensity π((x,i,m),ωi) is interpreted as the probability of finding an individual of species i around x and with mark around m, conditional on individuals of other species. We expect individuals of species i to be found at locations where this Papangelou conditional intensity takes large values. We would like to assess how well the Papangelou conditional intensity π((x,i,m),ωi) is able to separate the region into high and low density of individuals of species i. For that purpose, we compute the Area Under the ROC Curve (AUC), compare with Nam and D'Agostino (2002). In the point process framework, the AUC is computed by discretising the study area, and choosing as events the presence or absence of an individual in a cell (see, e.g., Lombardo et al., 2018). More precisely, in our context, for a (conditional) intensity λ the AUC is defined (see section 6.7.3 in Baddeley et al., 2015) as

where X is a uniformly chosen point of the point process (in our case, of species i) and U is a continuous random variable uniformly distributed over the study region. The AUC measures the ability of the intensity to properly separate the region into areas of high and low density of individuals, with a value of 0.5 indicating a lack of discriminatory power. In our analysis, we have discretised the study region into 1m×1m cells, computed λ at each cell and at the location of each individual of species i to produce an estimate of the AUC. We have in practice used the auc.ppp function in spatstat Baddeley et al. (2015).

More precisely, we proceed as follows. First, we fit each species separately according to a Poisson point process driven by the same six environmental covariates used in our case study, and produce a maximum likelihood intensity estimate. Second, for each species, we compute the Papangelou conditional intensity of our fitted Gibbs point process, conditional on other species (as described in the previous paragraph), over the whole area. We then compute the AUC in both cases. We show in Figure 11 the resulting performance gain in terms of AUC species by species. The saturated pairwise interactions Gibbs point process attains an average AUC of 0.76 by conditioning on other species, compared to an average of 0.65 for the standard Poisson point process. We find that the AUC of most species is improved. This shows that inter-species interactions are important in shaping the species' conditional distributions. We acknowledge that part of this improvement is due to our model having more parameters; our main point here is that the model is indeed capturing associations between species and capitalising on these to improve the conditional intensity estimates.

Conditional Area Under the ROC Curve (AUC) improvement species by species, when going from an inhomogeneous Poisson point process to the saturated pairwise interaction Gibbs point process. Each blue point corresponds to one species. Points in the top-left quadrant indicate species for which our model produces a better AUC than that of an inhomogeneous Poisson point process model. The average AUC improvement is 0.11 and our model has improved the conditional AUC for 83% of species. [Colour figure can be viewed at https://dbpia.nl.go.kr]
FIGURE 11

Conditional Area Under the ROC Curve (AUC) improvement species by species, when going from an inhomogeneous Poisson point process to the saturated pairwise interaction Gibbs point process. Each blue point corresponds to one species. Points in the top-left quadrant indicate species for which our model produces a better AUC than that of an inhomogeneous Poisson point process model. The average AUC improvement is 0.11 and our model has improved the conditional AUC for 83% of species. [Colour figure can be viewed at https://dbpia.nl.go.kr]

In order to illustrate how well the Papangelou conditional intensity resembles the actual spatial distribution, let us take a closer look at the four species which were found to exhibit most intra-species short-range repulsion, namely Protium panamense (‘protpa’), Prioria copaifera (‘pri2co’), Apeiba membranacea (‘apeime’) and Hura crepitans (‘huracr’). We show in Figure 12 the Papangelou conditional intensity computed at each of the species, conditional on other species. We see clearly that for these species, our model has properly separated the region into locations where the species occurs and others where it does not. The rather large corresponding AUC values for these species ranging from 0.76 to 0.90 corroborate this result.

Log-Papangelou conditional intensities of the four most repulsive species in our model, conditional on all other species, see the text for details on how this quantity is defined. Our model has captured most of the spatial inhomogeneity and its conditional intensity has properly separated the area into areas of high and low density of individuals. This is well quantified by the area under the ROC curve (AUC) metric which is quite high for these species (AUCprotpa=0.76, AUCpri2co=0.90, AUCapeime=0.81, AUChuracr=0.90). [Colour figure can be viewed at https://dbpia.nl.go.kr]
FIGURE 12

Log-Papangelou conditional intensities of the four most repulsive species in our model, conditional on all other species, see the text for details on how this quantity is defined. Our model has captured most of the spatial inhomogeneity and its conditional intensity has properly separated the area into areas of high and low density of individuals. This is well quantified by the area under the ROC curve (AUC) metric which is quite high for these species (AUCprotpa=0.76, AUCpri2co=0.90, AUCapeime=0.81, AUChuracr=0.90). [Colour figure can be viewed at https://dbpia.nl.go.kr]

(ii) Intra-species clustering

We now show that the intra-species clustering or regularity is partly captured by the intra-species interaction coefficients. We characterise intra-species clustering in terms of the inhomogeneous L-function defined, for example, on p. 32 of Møller and Waagepetersen (2004). In general, for any two species i and j and a distance R>0, we define

(9)

as a measure of the association between individuals of species i and j within a distance R of each other. In the equation above, Li,j(r) is the cross inhomogeneous L-function defined on p. 49 of Møller and Waagepetersen (2004), and which generalises the usual inhomogeneous L-function.

In order to evaluate the degree of clustering in each species, we shall perform a hypothesis test (see chapter 10 in Baddeley et al., 2015). Let our null hypothesis be that a given species i is an inhomogeneous Poisson point process, conditionally on all other species. By Proposition 1 in Appendix S1, this is for example the case if αi,i=γi,i=0 and the saturation parameter N is sufficiently large. Under these hypotheses, the conditional point process is second-order intensity reweighted stationary (SOIRS), see Møller and Waagepetersen (2004, definition 4.5), and so the standard definition of Li,i makes sense. In particular, in this case Li,i is expected to be zero. Again by Proposition 1 in Appendix S1, the intensity of the conditional point process is proportional to π((x,i,m),ξi), where ξi is the point process consisting in individuals of species other than i. The statistic Li,i could be estimated by normalising the standard estimator by the fitted Papangelou conditional intensity, but we choose instead to rely on the leave-one-out kernel smoother derived in section 2.2 of Baddeley et al. (2000). If the corresponding empirical L-function is outside the simulation envelopes obtained by draws of an inhomogeneous Poisson point process with intensity the standard leave-one-out kernel estimate of the species, then we have grounds to reject the null hypothesis. When the null hypothesis does not hold, strictly speaking, the previous definition of Li,i and of its estimator do not make sense because first, π((x,i,m),ξi) can not be viewed as proportional to the intensity and second, even if this were the case, the SOIRS assumption is not met. However, we can expect that under the alternative the estimator of Li,i diverges from the expected value under the null with the same interpretation as under the SOIRS assumption. More precisely, values above rπr2 of Li,i, and thus positive values of Li,i, indicate more species-specific clustering than if the species were conditionally an inhomogeneous Poisson point process. Negative values of Li,i instead indicate more regularity.

We find in Figure 13 that 86% of species which were above the envelopes (i.e., indicating that the species is significantly more clustered than would be expected if it were conditionally Poisson distributed) were also found to have positive short-range interaction coefficients. Both species which were below the envelopes were also found to have negative short-range interaction coefficients. In addition, we find that the intra-species short-range interaction coefficients αi,i are positively correlated with Li,i, with Pearson coefficient 0.71, and show in Figure 13 a scatter plot of all 82 species. Overall, species which are more clustered than would be expected if they were conditionally Poisson distributed tend to have positive short-range intra-species interaction coefficients, and conversely species which are more regular tend to have negative coefficients. This can also be seen visually in Figure 12, where we show that the four most repulsive species–with their estimated intensity shown in the background–tend to have a more regular distribution than that of a (conditional) inhomogeneous Poisson point process.

Scatter plot of the intra-species short-range interaction coefficients αi,i in terms of Li,i‾, for r ranging from 0 to 20 m. We have superimposed the results of a linear regression along with its 95% confidence bands (slope 1.04). The envelopes are computed with 400 draws of an inhomogeneous Poisson point process. [Colour figure can be viewed at https://dbpia.nl.go.kr]
FIGURE 13

Scatter plot of the intra-species short-range interaction coefficients αi,i in terms of Li,i, for r ranging from 0 to 20m. We have superimposed the results of a linear regression along with its 95% confidence bands (slope 1.04). The envelopes are computed with 400 draws of an inhomogeneous Poisson point process. [Colour figure can be viewed at https://dbpia.nl.go.kr]

(iii) Inter-species clustering

We characterise inter-species associations in terms of the inhomogeneous cross L-function Li,j(r) described above. We still use definition (9) to analyse inter-species interactions. Assume as the null hypothesis that for two species i and j we have αi,j=γi,j=0. By Proposition 2 in Appendix S1, the two species are independent conditionally on other species. By proposition 4.4 in Møller and Waagepetersen (2004), under these hypotheses, the conditional point process formed of the two species is cross SOIRS (see definition 4.8 in Møller & Waagepetersen, 2004). In this case, the definition of Li,j makes sense and Li,j is equal to zero. As in (ii) above, strictly speaking, under the alternative hypothesis the definition of Li,j and its estimator do not make sense. However, we can again expect that values of Li,j(r) above their expectation under the null point to species positively associated, and conversely values below their expectation under the null indicate negatively associated species. Therefore, negative values of Li,j correspond to repulsion and positive values correspond to positive associations (at least for small values of r, see p. 49 of Møller & Waagepetersen, 2004), and so this quantity serves as a good indicator of spatial associations between species.

Heuristically, then, Li,j represents the average relative distance to the theoretical cross L-function if the two species were independent conditionally on other species. So, for example Li,j=0.5 indicates that the cross L-function is on average 50% less than if the two species were independent. Envelopes are not as straightforward to produce as in the intra-species setting (ii) above, though. Indeed, the null hypothesis in this case is that species i and j are independent point processes, but they need not be Poissonian. And indeed, in general they are not even saturated pairwise interaction Gibbs point process, and their simulation (conditional on other species containing tens of thousands of individuals) is very computationally demanding. Therefore, in Figure 14 we restrict ourselves to the 28 species which were not found in (ii) to depart from the conditional inhomogeneous Poisson hypothesis.

Scatter plot of the inter-species short-range interaction coefficients αi,j in terms of Li,j‾, for r ranging from 0 to 20 m. We have superimposed the results of a linear regression along with its 95% confidence bands (slope 0.34). The envelopes were computed with 400 draws of two independent inhomogeneous Poisson point processes. [Colour figure can be viewed at https://dbpia.nl.go.kr]
FIGURE 14

Scatter plot of the inter-species short-range interaction coefficients αi,j in terms of Li,j, for r ranging from 0 to 20m. We have superimposed the results of a linear regression along with its 95% confidence bands (slope 0.34). The envelopes were computed with 400 draws of two independent inhomogeneous Poisson point processes. [Colour figure can be viewed at https://dbpia.nl.go.kr]

We find in Figure 14 that 67% of species which were above the envelopes (indicating that the two species are found closer than would be expected if they were independent) were also found to have positive short-range inter-species interaction coefficients. In addition, 93% of species which were below the envelopes were found to have negative short-range inter-species interaction coefficients. We also show in Figure 14 a scatter plot of all species pairs and also observe that Li,j and αi,j are positively correlated with Pearson coefficient 0.48. Our findings lend credence to the fact that the short-range interaction coefficients αi,j capture associations between individuals of different species. Overall, we have shown that the short-range interaction coefficients capture associations between individuals, both within and between species, and the way the model accounts for these associations convincingly models the species' conditional spatial distribution.

7 DISCUSSION

Two main classes of models had previously been proposed to analyse the spatial arrangement of individuals in large multi-species ecological datasets. First, the log-Gaussian Cox process proposed in Waagepetersen et al. (2016) is an elegant model that fits within a Bayesian framework well, but cannot model competition causing repulsion within a species, nor does it scale well with the number of species. In addition, the latent correlated Gaussian fields have no straightforward interpretation in ecological applications. Furthermore, as pointed out when analysing Protium panamense in Section 6.3, the multivariate log-Gaussian Cox process cannot serve as a model for a species with null or negative intra-species interactions that interacts with other species. Second, the saturated Gibbs point process introduced in Rajala et al. (2018) captures pairwise interactions over different ranges, and scales well with the number of species. We find the second class to be more compelling. Inspired by the work of Rajala et al. (2018), in this manuscript we have introduced the ‘saturated pairwise interaction Gibbs point process’ to start working towards a unified framework to untangle the three main drivers underlying community assembly, namely species' dispersal abilities, environmental tolerance and biotic interactions.

In contrast to the model in Rajala et al. (2018), in modelling pairwise interactions, we allow the use of more realistic smooth potential functions instead of linear combinations of step functions. Moreover, our model has a role for marks such as the individuals' size, and these are thought to be influential in affecting species' distribution. These two features have allowed us to handle applications that are out of reach of existing models. For example, the locations of Norway spruces studied in Section 6.1 exhibit exponential pairwise interactions at a distance that is proportional to individuals' diameters. We have also studied other spatial patterns from plant ecology in which competing ecological factors are at play, and have shown how these mechanisms materialise within the framework of the model. We have found that our model has performed well in the Barro Colorado Island analysis in Section 6.3, a dataset containing almost a hundred species and many thousands of individual trees. This has helped us gain additional insights into three very different ecosystems, namely a spruce forest from northern Europe, a subtropical swamp forest, and a neotropical rainforest.

Additionally, we have addressed the problem of simulating this point process, and in particular, we proved in Proposition 1 a crucial result that allows us to apply the ‘coupling from the past’ algorithm to draw samples from the point process. In our manuscript, simulating from the model has helped us carefully validate the model's performance and allowed us to do a sensitivity analysis, see Section 5. We also believe that simulating from the model will be important in future work, since it is necessary to do Monte Carlo simulations as well as compute simulation envelopes and run goodness of fit tests.

Our model can be applied in a wide range of settings, and may also be useful outside of ecology. Indeed, the notion of a physical pairwise interaction making it more or less likely that two individuals occur close by is a compelling assumption that surely also makes sense in physics, epidemiology and economics among others. We have consequently made our fitting and simulation procedures available as an open-source R package, see Appendix S1 for more details.

DATA AVAILABILITY STATEMENT

The model described in this manuscript is openly available on Github at https://www.github.com/iflint1. The analyses of real data sets can be reproduced by executing the corresponding scripts made public as Github gists at https://gist.github.com/iflint1/.

1Hereon, the term ‘decreasing’ is to be understood in the weak sense, that is, φ is said to be decreasing if for all xy, φ(x)φ(y).

ACKNOWLEDGEMENTS

We thank the anonymous referees whose comments helped improve a previous version of this manuscript. One of the referees in particular has helped us make substantial improvements to an earlier version. This work was supported by Australian Research Council Grant number DP190100613. Open access publishing facilitated by The University of Melbourne, as part of the Wiley - The University of Melbourne agreement via the Council of Australian University Librarians.

REFERENCES

Andrews
,
J.G.
,
Ganti
,
R.K.
,
Haenggi
,
M.
,
Jindal
,
N.
&
Weber
,
S.
(
2010
)
A primer on spatial modeling and analysis in wireless networks
.
IEEE Commun Mag
,
48
,
156
163
.

Ba
,
I.
&
Coeurjolly
,
J.F.
(
2020
)
High-dimensional inference for inhomogeneous Gibbs point processes
.

Babu
,
G.J.
&
Feigelson
,
E.D.
(
1996
)
Spatial point processes in astronomy
.
Journal of Statistical Planning and Inference
,
50
,
311
326
.

Baccelli
,
F.
&
Błlaszczyszyn
,
B.
(
2009
)
Stochastic geometry and wireless networks: volume I theory
.
Hanover, MA
:
Now Publishers Inc
Available from: https://doi.org/10.1561/1300000006

Baddeley
,
A.J.
,
Møller
,
J.
&
Waagepetersen
,
R.
(
2000
)
Non-and semi-parametric estimation of interaction in inhomogeneous point patterns
.
Statistica Neerlandica
,
54
,
329
350
. Available from: https://doi.org/10.1111/1467-9574.00144

Baddeley
,
A.
,
Coeurjolly
,
J.-F.
,
Rubak
,
E.
&
Waagepetersen
,
R.
(
2014
)
Logistic regression for spatial Gibbs point processes
.
Biometrika
,
101
,
377
392
.

Baddeley
,
A.
,
Rubak
,
E.
&
Turner
,
R.
(
2015
)
Spatial point patterns: methodology and applications
.
London
:
Chapman & Hall/CRC Press
.

Coeurjolly
,
J.-F.
&
Rubak
,
E.
(
2013
)
Fast covariance estimation for innovations computed from a spatial Gibbs point process
.
Scandinavian Journal of Statistics
,
40
,
669
684
.

Condit
,
R.
(
1998
)
Tropical forest census plots
.
Berlin, Germany, and Georgetown, Texas
:
Springer-Verlag and R. G. Landes Company
.

Condit
,
R.
,
Ashton
,
P.S.
,
Manokaran
,
N.
,
LaFrankie
,
J.V.
,
Hubbell
,
S.P.
&
Foster
,
R.B.
(
1999
)
Dynamics of the forest communities at Pasoh and Barro Colorado: comparing two 50-ha plots
.
Philosophical Transactions of the Royal Society of London Series B, Biological Sciences
,
354
,
1739
1748
.

Condit
,
R.
,
Ashton
,
P.S.
,
Baker
,
P.
,
Bunyavejchewin
,
S.
,
Gunatilleke
,
S.
,
Gunatilleke
,
N.
et al. (
2000
)
Spatial patterns in the distribution of tropical tree species
.
Science
,
288
,
1414
1418
. Available at: https://science.sciencemag.org/content/288/5470/1414

Connor
,
C.B.
&
Hill
,
B.E.
(
1995
)
Three nonhomogeneous Poisson models for the probability of basaltic volcanism: application to the Yucca mountain region, Nevada
.
Journal of Geophysical Research: Solid Earth
,
1978-2012
(
100
),
10107
10125
.

Daley
,
D.J.
&
Vere-Jones
,
D.
(
2003
)
An introduction to the theory of point processes
.
Probability and Its Applications
, Vol.
1
.
New York
:
Springer-Verlag
.

Daley
,
D.J.
&
Vere-Jones
,
D.
(
2008
)
An introduction to the theory of point processes
.
Probability and Its Applications
, Vol.
2
.
New York
:
Springer-Verlag
.

Daniel
,
J.
,
Horrocks
,
J.
&
Umphrey
,
G.J.
(
2018
)
Penalized composite likelihoods for inhomogeneous Gibbs point process models
.
Computational Statistics & Data Analysis
,
124
,
104
116
.

Deyi
,
Y.
,
Liu
,
Y.
,
Ye
,
Q.
,
Cadotte
,
M.
&
He
,
F.
(
2020
November 13)
Trait dissimilarity and hierarchy predict spatial co-occurrence patterns of tree species in a subtropical forest
.
Authorea
. Available from: https://doi.org/10.22541/au.160525624.40303010/v1

Dixon
,
P.M.
(
2002
)
Nearest-neighbor contingency table analysis of spatial segregation for several species
.
Ecoscience
,
9
,
142
151
.

Fiksel
,
T.
(
1988
)
Estimation of interaction potentials of Gibbsian point processes
.
Statistics
,
19
,
77
86
.

Flügge
,
A.J.
,
Olhede
,
S.C.
&
Murrell
,
D.J.
(
2014
)
A method to detect subcommunities from multivariate spatial associations
.
Methods in Ecology and Evolution
,
5
,
1214
1224
.

Geyer
,
C.J.
(
1999
) Likelihood inference for spatial point processes: likelihood and computation. In:
Kendall
,
W.
,
Barndroff-Nielsen
,
O.
&
van
Lieshout
,
M.N.
(Eds.)
Stochastic geometry: likelihood and computation
.
London
:
Chapman & Hall/CRC Press
, pp.
141
172
.

Good
,
B.J.
&
Whipple
,
S.A.
(
1982
)
Tree spatial patterns: South Carolina bottomland and swamp forests
.
Bulletin of the Torrey Botanical Club
,
109
,
529
536
.

Goulard
,
M.
,
Särkkä
,
A.
&
Grabarnik
,
P.
(
1996
)
Parameter estimation for marked Gibbs point processes through the maximum pseudolikelihood method
.
Scandinavian Journal of Statistics
,
23
,
365
379
.

Hubbell
,
S.
,
Condit
,
R.
&
Foster
,
R.
(
2012
)
Barro Colorado forest census plot data
. Available at: http://ctfs.si.edu/webatlas/datasets/bci.

John
,
R.
,
Dalling
,
J.W.
,
Harms
,
K.E.
,
Yavitt
,
J.B.
,
Stallard
,
R.F.
,
Mirabello
,
M.
et al. (
2007
)
Soil nutrients influence spatial distributions of tropical tree species
.
Proceedings of the National Academy of Sciences
,
104
,
864
869
. Available at: https://www.pnas.org/content/104/3/864

Jones
,
R.H.
,
Sharitz
,
R.R.
,
James
,
S.M.
&
Dixon
,
P.M.
(
1994
)
Tree population dynamics in seven South Carolina mixed-species forests
.
Bulletin of the Torrey Botanical Club
,
121
,
360
368
.

Kallenberg
,
O.
(
1983
)
Random measures
, 3rd edition. Berlin, Germany:
Akademie-Verlag
.

Kupers
,
S.J.
,
Wirth
,
C.
,
Engelbrecht
,
B.M.J.
&
Rüger
,
N.
(
2019
)
Dry season soil water potential maps of a 50 hectare tropical forest plot on Barro Colorado Island
.
Panama. Scientific Data
,
6
,
63
.

Lombardo
,
L.
,
Opitz
,
T.
&
Huser
,
R.
(
2018
)
Point process-based modeling of multiple debris flow landslides using INLA: an application to the 2009 Messina disaster
.
Stochastic Environmental Research and Risk Assessment
,
32
,
2179
2198
.

Mohler
,
G.O.
,
Short
,
M.B.
,
Brantingham
,
P.J.
,
Schoenberg
,
F.P.
&
Tita
,
G.E.
(
2011
)
Self-exciting point process modeling of crime
.
Journal of the American Statistical Association
,
106
,
100
108
.

Møller
,
J.
&
Berthelsen
,
K.K.
(
2012
)
Transforming spatial point processes into Poisson processes using random superposition
.
Advances in Applied Probability
,
44
,
42
62
.

Møller
,
J.
&
Waagepetersen
,
R.P.
(
2004
)
Statistical inference and simulation for spatial point processes
.
London
:
Chapman & Hall/CRC Press
.

Nam
,
B.-H.
&
D'Agostino
,
R.B.
(
2002
)
Discrimination index, the area under the ROC curve
.
Boston, MA
:
Birkhäuser
. Available from:, pp.
267
279
. https://doi.org/10.1007/978-1-4612-0103-8_20

Ovaskainen
,
O.
,
Tikhonov
,
G.
,
Norberg
,
A.
,
Guillaume Blanchet
,
F.
,
Duan
,
L.
,
Dunson
,
D.
et al. (
2017
)
How to make more out of community data? A conceptual framework and its implementation as models and software
.
Ecology Letters
,
20
,
561
576
.

Penttinen
,
A.
,
Stoyanell
,
D.
&
Henttonen
,
H.M.
(
1992
)
Marked point processes in forest statistics
.
Forest Science
,
638
,
806
824
.

Punchi-Manage
,
R.
,
Getzin
,
S.
,
Wiegand
,
T.
,
Kanagaraj
,
R.
,
Gunatilleke
,
C.V.S.
,
Gunatilleke
,
I.A.U.N.
et al. (
2013
)
Effects of topography on structuring local species assemblages in a Sri Lankan mixed dipterocarp forest
.
Journal of Ecology
,
101
,
149
160
.

Rajala
,
T.
,
Murrell
,
D.J.
&
Olhede
,
S.C.
(
2018
)
Detecting multivariate interactions in spatial point patterns with Gibbs models and variable selection
.
Journal of the Royal Statistical Society: Series C
,
67
,
1237
1273
.

R Core Team (

2019
)
R: a language and environment for statistical computing
. Vienna, Austria:
R Foundation for Statistical Computing
. Available from: https://www.R-project.org/

Seabloom
,
E.W.
,
Bjørnstad
,
O.N.
,
Bolker
,
B.M.
&
Reichman
,
O.J.
(
2005
)
Spatial signature of environmental heterogeneity, dispersal, and competition in successional grasslands
.
Ecological Monographs
,
75
,
199
214
. Available from: https://doi.org/10.1890/03-0841

Seri
,
E.
,
Shtilerman
,
E.
&
Shnerb
,
N.M.
(
2015
)
The GLOCAL forest
.
PLoS One
,
10
,
1
9
. Available from: https://doi.org/10.1371/journal.pone.0126117

Shen
,
G.
,
He
,
F.
,
Waagepetersen
,
R.
,
Sun
,
I.-F.
,
Hao
,
Z.
,
Chen
,
Z.-S.
et al. (
2013
)
Quantifying effects of habitat heterogeneity and other clustering processes on spatial distributions of tree species
.
Ecology
,
94
,
2436
2443
.

Thompson
,
H.
(
1955
)
Spatial point processes, with applications to ecology
.
Biometrika
,
42
,
102
115
.

Uriarte
,
M.
,
Condit
,
R.
,
Canham
,
C.D.
&
Hubbell
,
S.P.
(
2004
)
A spatially explicit model of sapling growth in a tropical forest: does the identity of neighbours matter?
Journal of Ecology
,
92
,
348
360
.

Waagepetersen
,
R.
,
Guan
,
Y.
,
Jalilian
,
A.
&
Mateu
,
J.
(
2016
)
Analysis of multispecies point patterns by using multivariate log-Gaussian Cox processes
.
Journal of the Royal Statistical Society: Series C
,
65
,
77
96
.

Waller
,
L.A.
&
Gotway
,
C.A.
(
2004
)
Applied spatial statistics for public health data
.
Boca Raton
:
John Wiley & Sons
.

Weiher
,
E.
,
Freund
,
D.
,
Bunton
,
T.
,
Stefanski
,
A.
,
Lee
,
T.
&
Bentivenga
,
S.
(
2011
)
Advances, challenges and a developing synthesis of ecological community assembly theory
.
Philosophical Transactions of the Royal Society B: Biological Sciences
,
366
,
2403
2413
.

Wiegand
,
T.
,
Gunatilleke
,
S.
&
Gunatilleke
,
N.
(
2007
)
Species associations in a heterogeneous Sri Lankan Dipterocarp forest
.
The American Naturalist
,
170
,
E77
E95
.

Author notes

Funding information Australian Research Council, Grant/Award Number: DP190100613

This is an open access article under the terms of the http://creativecommons.org/licenses/by/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.