-
PDF
- Split View
-
Views
-
Cite
Cite
Virgilio Gómez-Rubio, Francisco Palmí-Perales, Multivariate Posterior Inference for Spatial Models with the Integrated Nested Laplace Approximation, Journal of the Royal Statistical Society Series C: Applied Statistics, Volume 68, Issue 1, January 2019, Pages 199–215, https://doi.org/10.1111/rssc.12292
- Share Icon Share
Summary
The integrated nested Laplace approximation (INLA) is a convenient way to obtain approximations to the posterior marginals for parameters in Bayesian hierarchical models when the latent effects can be expressed as a Gaussian Markov random field. In addition, its implementation in the R-INLA package for the R statistical software provides an easy way to fit models using the INLA in practice in a fraction of the time that other computer-intensive methods (e.g. Markov chain Monte Carlo methods) take to fit the same model. Although the INLA provides a fast approximation to the marginals of the model parameters, it is difficult to use it with models that are not implemented in R-INLA. It is also difficult to make multivariate posterior inference on the parameters of the model as the INLA focuses on the posterior marginals and not the joint posterior distribution. We describe how to use the INLA within the Metropolis–Hastings algorithm to fit complex spatial models and to estimate the joint posterior distribution of a small number of parameters. We illustrate the benefits of this new method with two examples. In the first, a spatial econometrics model with two auto-correlation parameters (for the response and the error term) is considered. This model is not currently available in R-INLA, and multivariate inference is often required to assess dependence between the two spatial auto-correlation parameters in the model. Furthermore, the estimation of spillover effects is based on the joint posterior distribution of a spatial auto-correlation parameter and a covariate coefficient. In the second example, a multivariate spatial model for several diseases is proposed for disease mapping. This model includes a shared specific spatial effect as well as disease-specific spatial effects. Dependence on the shared spatial effect is modulated via disease-specific weights. By inspecting the joint posterior distribution of these weights it is possible to assess which diseases have a similar spatial pattern.
1 Introduction
Bayesian inference on complex hierarchical models has relied on Markov chain Monte Carlo (MCMC) methods for many years. Inference with MCMC sampling is based on drawing samples from the joint posterior distribution of the model parameters, which is often of a high dimension. For Bayesian hierarchical models with complex structure or large data sets, MCMC sampling can be very computationally demanding. See, for example, Brooks et al. (2011) for a recent summary of MCMC methods.
To avoid dealing with high dimensional posterior distributions that are difficult to estimate, Rue et al. (2009) focused on marginal inference and they developed a method to approximate the posterior marginal of the model parameters by using the Laplace approximation and numerical integration. Also, they focused on models whose latent effects are a Gaussian Markov random field (GMRF). GMRFs have several properties that can be exploited for computational efficiency when fitting Bayesian models (Rue and Held, 2005). This new method has been termed the integrated nested Laplace approximation (INLA) and an implementation is available in the RINLA package, that can cope with a large family of models.
Bivand et al. (2014) described a novel approach to fit models that are not implemented in RINLA with the INLA. They noted that some models can be fitted with RINLA when one of the parameters is fixed, such as several models that are widely used in spatial econometrics (LeSage and Pace, 2009). In this particular case, the models can be fitted when the spatial auto-correlation parameter is fixed. As this parameter is in a bounded interval, values for the spatial auto-correlation parameters can be taken from a fine grid on its (bounded) support. Conditioning on these values, models can be fitted to obtain conditional posterior marginals of all the other parameters. The posterior marginal of the spatial auto-correlation parameter is then obtained by combining the marginal likelihoods of the fitted models and the prior by using the Bayes rule. The posterior marginals for the remainder of the parameters in the model are obtained by averaging over the different conditional posterior marginals.
The previous method can be easily extended to more than one parameter but, as the number of parameters increases, the number of models to be fitted increases exponentially. Also, it is problematic when the parameters that need to be fixed are not in a bounded interval. Hence, if the model needs to be conditioned on several parameters to be fitted with RINLA, a different approach is required.
Gómez-Rubio and Rue (2018) suggest using the Metropolis–Hastings algorithm (Metropolis et al., 1953; Hastings, 1970) together with the INLA when models need to be conditioned on several parameters. In this way, the joint posterior distribution of an ensemble of parameters can be obtained via MCMC sampling, whereas the posterior marginals of all the other parameters are obtained by averaging over several conditional marginal distributions.
This paper is motivated by the need to fit complex spatial econometrics models and models for disease mapping with a weighted sum of several spatial effects, for which we are also interested in how to perform multivariate inference on a small ensemble of parameters. Multivariate inference on a small ensemble of parameters of the spatial models will rely on the output that is produced by the INLA-within-MCMC algorithm (Gómez-Rubio and Rue, 2018), i.e. samples from the joint posterior distribution of the ensemble of parameters.
This paper is structured as follows. A summary of the INLA and a detailed description of the spatial models that are implemented in RINLA is given in Section 2. Next, Section 3 covers how the INLA can be used within the Metropolis–Hastings algorithm to fit new spatial models. This is illustrated in Section 4, where two examples on spatial econometrics and joint disease mapping are developed. Finally, a summary and discussion are available in Section 5.
2 Spatial models with the integrated nested Laplace approximation
The INLA (Rue et al., 2009) provides a way to obtain approximations to the posterior marginals of effects and hyperparameters of a Bayesian hierarchical model that can be expressed as a latent GMRF. Some recent developments have been described in Martins et al. (2013), and Rue et al. (2017), which will provide a good summary for those who are not familiar with the INLA and the RINLA package. More details can be found in Blangiardo and Cameletti (2015) (chapter 4), where computational details are clearly explained and a full example for a normal–gamma model is developed.
In a latent GMRF, the means of observations y=(y1,…,yn) are conveniently linked to a linear predictor on latent effects x, which have a multivariate Gaussian distribution with zero mean and variance covariate matrix that depends on a set of hyperparameters θ. INLAs can provide accurate approximations to the posterior marginals of the latent effects π(xi|y), i=1,…,nx, and hyperparameters π(θj|y), j=1,…,nθ. These approximations will be denoted by and respectively. Furthermore, the INLA can also compute other quantities of interest, such as different criteria for model assessment and choice. In particular, Rue et al. (2009) proposed an approximation to the marginal likelihood π(y). Note that this is often difficult to compute and that the approximation that is provided by the INLA is very accurate (see Hubin and Storvik (2016) and Gómez-Rubio and Rue (2018) for details).
The INLA is implemented as an R (R Core Team, 2018) package named RINLA. In addition to providing a simple way of defining models and computing the approximation to the posterior marginals, RINLA provides some extra features for model specification (see Martins et al. (2013) for details) and can compute derived quantities for model checking and model assessment. In particular, an approximation to the marginal likelihood of the model π(y) is provided, which can also be used for model selection, for example.
Several researchers (Gómez-Rubio et al., 2014; Bivand et al., 2015; Lindgren and Rue, 2015; Blangiardo and Cameletti, 2015) have summarized the various spatial models that are available in RINLA as latent effects that can be used to build models. We shall provide a summary to give an overview of what has already been implemented and what other spatial models have not been added yet. Furthermore, see Bakka et al. (2018) for a recent review on fitting spatial models with INLAs.
Spatial latent effects for lattice data in RINLA have a prior distribution which is a multivariate normal distribution with zero mean and precision matrix τT, where τ is a precision parameter and T is a square and symmetric matrix. The structure of T will control how the spatial dependence is and it can take different forms to induce different types of spatial interaction. A description of the main spatial models that have been implemented with the INLA is available in the on-line supplementary materials for this paper.
3 Integrated nested Laplace approximation within Markov chain Monte Carlo sampling
As described in the previous section, RINLA provides a large number of spatial latent effects that can be used to build more complex models. However, this list is far from exhaustive and it is difficult to implement new models in RINLA. A possible approach to implement new models is the rgeneric latent effect that allows the user to define the latent model in R but this requires the user to specify the full structure of the latent effect as a GMRF. Furthermore, this generic approach will estimate the posterior marginals of the parameters in the latent effect, which are seldom adequate for multivariate posterior inference.
To extend the number of models that the INLA can fit through the RINLA package, Gómez-Rubio and Rue (2018) propose the use of the Metropolis–Hastings algorithm (Metropolis et al., 1953; Hastings, 1970) together with the INLA to fit some complex models that are not implemented in RINLA. We shall now introduce this new approach, which we shall use to develop the examples in Section 4 to fit new spatial models and to make multivariate inference using the joint posterior distribution of a small subset of parameters.
Denote by θ the ensemble of parameters and hyperparameters to be estimated in a Bayesian hierarchical model. Now θ will include some of the latent effects in x and not only the hyperparameters θ that were described in Section 2. Similarly to Bivand et al. (2014), Gómez-Rubio and Rue (2018) argue that very complex spatial models can be fitted with RINLA when some of the parameters are fixed. Let us call these parameters θc so that θ becomes (θc,θ−c). By conditioning on θc, the INLA can be used to obtain the posterior marginals of all the parameters in θ−c, i.e. π(θ−c,i|y,θc), and the conditional marginal likelihood π(y|θc).
Li et al. (2012) dealt with complex spatiotemporal models in a similar way by conditioning on some of the parameters at their maximum likelihood (ML) estimates. Although this is a reasonable way to proceed for highly parameterized models, it ignores the uncertainty of some of the parameters in the model when computing the posterior marginal of the other model parameters. Bivand et al. (2014) also fitted spatial models conditioning on some of the model parameters at different values in a grid and then combined the resulting models to obtain the posterior marginals (not conditioned on θc) for the model of interest.
Note that is the marginal likelihood of the model conditioned on , which can be obtained by fitting the model after fixing the values of θc to . Also, this will provide the posterior marginals for all the parameters in θ−c.
Gómez-Rubio and Rue (2018) stated that the marginal likelihood that is used in equation (1) is an approximation. However, they argued that this error is very small because this approximation is very accurate (see Hubin and Storvik (2016) for a thorough study) and that the stationary distribution is very close to the actual posterior distribution. They also provided numerical evidence in a simulation study that supports that the use of the INLA within an MCMC algorithm will provide an accurate approximation to the posterior distribution. In addition, Gómez-Rubio and Rue (2018) illustrate the use of the INLA within the Metropolis–Hastings algorithm with three examples, including fitting a simple spatial econometrics model. In addition to fitting models with unimplemented latent effects, this approach can be employed to use other priors that are not implemented in RINLA (in particular, multivariate priors) and to fit models with missing values in the covariates.
Finally, it is difficult to give a general rule for the choice of θc in highly parameterized models. As a simple rule, it can be taken such as the conditional model on θc is a latent Gaussian model and can be fitted with RINLA. Similarly, θc can be chosen so that, when fixed, the resulting model is a generalized linear mixed model. In this case, the design matrix on the covariates will not depend on any other hyperparameters and the variance matrix of the random effects can be expressed as a precision parameter times a matrix with known structure (that does not depend on any other hyperparameters).
In the examples in the next section we consider some spatial models and provide some hints on how to choose θc. Furthermore, we stress the importance of using the INLA-within-MCMC approach for multivariate inference on any number of model parameters, which is a point that is not sufficiently addressed in Gómez-Rubio and Rue (2018).
3.1 Alternatives to the integrated nested Laplace approximation within Markov chain Monte Carlo sampling approach
The INLA-within-MCMC approach can be regarded as a way to integrate out most of the parameters in the model so that we need to deal with only the set of parameters θc. In this way, the INLA-within-MCMC approach uses MCMC sampling to estimate the posterior distribution of θc, and the marginal distributions of the remainder of the parameters in the model are obtained by Bayesian model averaging. For this reason, the INLA can be regarded as a device to reduce the dimension of the model to focus on a small subset of parameters θc.
The INLA-within-MCMC approach can deal with any model that can be expressed as a conditional latent GMRF model. However, it is still a sequential algorithm that may take time to converge. For this reason, we believe that other alternatives to estimate the posterior distribution of θc should be considered in practice.
First, instead of estimating the posterior distribution of θc these parameters may be fixed at their maximum likelihood estimates or posterior modes (if known) . This will simplify inference on θc (i.e. the posterior is a Dirac distribution at ) and ignore the uncertainty about these parameters in the model but this approximation to π(θc|y) is simple to compute and it may also provide good approximations to the posterior marginals of the other parameters in the model. For example, Li et al. (2012) used a similar solution (e.g. fixing the values of some parameters in the model) to fit spatiotemporal models for disease mapping.
Alternatively, the posterior distribution of θc may be estimated at a set of points at or close to the posterior mode. In particular, a central composite design (CCD) (Box and Draper, 2007) can be used to place some points in the parametric space at which the posterior distribution of θc is computed. For any given point in the CCD , its posterior density will be proportional to , and these values can be rescaled to approximate the posterior density of and to obtain the weights that are used in Bayesian model averaging when computing the posterior marginals for all the other parameters in the model. This is similar to the approach that is used in the INLA-within-MCMC method but with the conditional models evaluated at fewer points.
The CCD is already implemented in RINLA to estimate the posterior distribution of the hyperparameters, as described in Rue et al. (2017), for the latent models that are available therein. In our current proposal we are dealing with models that are not implemented in RINLA or that are not latent GMRFs (but conditional latent GMRFs). The CCD is used in a similar way to that in RINLA to set points in the parametric space of θc at which to evaluate a conditional model. Note also that in this setting it is possible to fit all the conditional models in parallel and that this will take a fraction of the time that is required by the INLA-within-MCMC approach. To obtain the points in the CCD the rsm package (Lenth, 2009) will be used. Others who have used CCDs to integrate some of the model hyperparameters out include Vanhatalo et al. (2013).
4 Examples
In this section we provide two examples on how to fit complex spatial models with the INLA by using the RINLA package and the Metropolis–Hastings algorithm. Both models have a complex spatial structure, with several spatial terms, and they cannot be fitted with RINLA alone. In addition, to fitting the model, we show how to exploit the joint posterior distribution that is obtained on a reduced ensemble of parameters to make multivariate inference. We shall compare these results with those obtained by setting θc at some reasonable values (ML estimate or posterior mode) or using a CCD to estimate the posterior distribution of θc, as discussed in Section 3.1.
As stated in Section 3, the rgeneric latent effect in the RINLA software could be used to implement these models by using the R language. However, this will allow us only to fit the models but not to provide multivariate posterior distributions on ensembles of the model parameters, which is also our aim in these examples.
4.1 Spatial econometrics
This model has two auto-correlation terms: one on the response y and another on the error term u, which are controlled by auto-correlation parameters ρ and λ respectively. Spatial adjacency matrix W is often taken as a binary matrix, so the entry at row i and column j will be 1 if areas i and j are neighbours. Although this is a reasonable adjacency matrix, we shall take W to be row standardized, so that the sum of all the elements in a row add up to 1. This will bring interesting properties to the spatial auto-correlation parameters. In particular, they will be bounded in the interval (1/λmin,1), where λmin is the minimum eigenvalue of W (Haining, 2003).
LeSage and Pace (2009) fitted this model by using Bayesian inference, for which they assigned priors to all model parameters. In particular, they considered typical independent inverse gamma priors for all the precisions, vague independent normal distributions for β and γ and uniform prior distributions for ρ and λ in the interval (1/λmin,1).
Hence, this model has a non-linear term in ρ, (I−ρW)−1, that multiplies the linear predictor on the covariates and lagged covariates, and a complex error structure. This model cannot be fitted with RINLA unless we condition on ρ and λ.
To fit this model with RINLA, we have resorted to the INLA-within-MCMC approach with θc=(ρ,λ). At each step of the algorithm we shall obtain a conditional (on θc) posterior marginal for the remainder of the parameters in the model θ−c. These conditional marginals will later be combined to obtain the posterior marginals of the model parameters θ−c.
The proposal distribution for θc is the product of two normal distributions centred at the current value of the parameters and standard deviation 0.25. This value has been obtained after some tuning to ensure a suitable acceptance rate. The starting point is θc=(0,0). Then, 500 iterations were used as burn-in, followed by 5000 iterations more, of which one in five was kept to reduce auto-correlation. This provided a final 1000 samples to estimate the joint posterior distribution of θc. Note that thinning also involves the conditional distributions computed at each step of the Metropolis–Hastings algorithm.
We have fitted this model to the Columbus data set (Anselin, 1988) which is available in R package spdep (Bivand and Piras, 2015). This data set contains information about 49 neighbourhoods in Columbus (Ohio) in 1980. We shall reproduce a classical analysis consisting of explaining crime rate on housing value (variable HOVAL) and household income (variable INC). We have not included lagged covariates, so γ=0.
In addition to using the INLA-within-MCMC approach, we have also fitted this model by using Gibbs sampling with the rjags package (Plummer, 2016) and using ML with function sacsarlm() from the spdep package.
Both MCMC sampling and the INLA-within-MCMC approach must deal with the computation of matrices I−λW and I−ρW and their determinants. This is an important topic in spatial econometrics and several approximations have been proposed (see LeSage and Pace (2009), Bivand et al. (2013) and the references therein). As an alternative, we use a CCD (as described in Section 3.1), which is centred at the ML estimates and inscribed in the box defined by taking ±0.75 standard errors (as reported by function sacsarlm) from the ML estimates. Note that the INLA-within-MCMC method averages over 1000 posterior marginals whereas an INLA with a CCD will use only 10 posterior marginals. Also, with a CCD the matrices I−λW and I−ρW and their determinants need to be computed at only a handful of points, which will reduce the computational burden.
Finally, we have also considered the approximation based on setting the values of ρ and λ at their ML estimates (which will be denoted as the INLA-at-ML approach). In the current example, these values can be obtained very fast and fitting a model with an INLA conditional on these values provides marginals for all the other parameters in the model. Although these posterior marginals ignore the uncertainty about ρ and λ, they can be regarded as approximations to their respective posterior marginals in the Manski model.
Fig. 1 shows the posterior marginal distributions of the auto-correlation parameters, estimated by using INLA-within-MCMC, INLA-with-CCD, MCMC and ML methods. As can be seen, there is good agreement between the various estimates. Regarding the other parameters in the model, Fig. 2 summarizes the different estimates and how they seem to agree. Note that these marginals have been obtained in different ways. For the INLA-within-MCMC approach, they are obtained by averaging the conditional marginals that are obtained at each step of the Metropolis–Hastings algorithm. With the INLA-with-CCD approach, they have been averaged over the marginals obtained from the models fitted at the 10 CCD points. Finally, for the INLA-at-ML approach, the marginals are those fitted by the INLA after fixing ρ and λ at their ML estimates.

Estimates of the spatial auto-correlation parameters of the Manski model by using the INLA-within-MCMC (), INLA-with-CCD (
), MCMC (
) and ML (
) methods: (a) ρ; (b) λ

Estimates of (a) the intercept, (b), (c) covariate coefficients and (d) variance by using the INLA-within-MCMC (), INLA-with-CCD (
), MCMC (
) and INLA approaches with ρ and λ fixed at their ML estimates (
, INLA with ML): the covariates are housing value HOVAL and household income INC
The non-MCMC methods seem to provide narrower posterior marginals for the intercept and variance, but this is probably because they do not account for all the variability that is associated with the spatial auto-correlation parameters. However, these methods are computationally faster than the MCMC and INLA-within-MCMC methods and still provide reasonable approximations to the posterior marginals of these parameters.
Finally, the joint posterior distribution of (ρ,λ) is shown in Fig. 3. The bivariate distributions that are obtained with the INLA-within-MCMC and MCMC algorithms look alike. The ML estimate has also been added to the plots (as a black dot) and in both plots looks close to the mode of the joint posterior distribution. Although no estimate of the joint posterior distribution is provided for the INLA-with-CCD algorithm, posterior summary statistics could be computed and these can be plugged into a bivariate Gaussian distribution to approximate π(ρ,λ|y).

Joint posterior distribution of (ρ,λ) (, ML estimate): (a) INLA-within-MCMC approach; (b) MCMC sampling
4.1.1 Computation of the impacts
Direct impacts measure changes in the response in the same area where the change occurs, indirect impacts measure changes in adjacent areas and total impacts are the sum of direct and indirect impacts. For this reason the average direct impact is defined as the mean of the trace of the impacts matrix, the average indirect impact is the sum of all the off-diagonal impacts divided by n and the total impact is the sum of all the elements in the impacts matrix divided by n.
As discussed in Gómez-Rubio et al. (2017), computing the impacts requires multivariate inference as the distribution depends on the (joint) posterior distribution of ρ, βr and γr. Hence, computing the impacts from the posterior marginals of these parameters alone is not enough to obtain accurate estimates. Note that in our particular case we are not considering lagged covariates and that γr=0, which simplifies the computation of the impacts.
To compute the impacts we can exploit the fact that we have different models conditioned on the values of ρ. Hence, we could compute the marginal distribution of the impacts for each value of ρ obtained with the Metropolis–Hastings and the conditional (on ρ) marginal π(βr|y,ρ). Then we could average over all conditional marginal distributions of the impacts given ρ to obtain the final distribution of the impacts, which could be used to compute summary statistics.
Table 1 summarizes point estimates and standard deviations of the various average impacts that are considered in this example. The estimation methods for which the impacts are reported are the MCMC, INLA-within-MCMC, INLA-with-CCD, INLA-at-ML and ML estimates obtained with the functions in the spdep package. In general, all Bayesian methods provide very similar results, with INLA-based approaches reporting slightly smaller standard deviations than does the MCMC approach.
Mean and standard deviation (in parentheses) of the impacts for the Manski model without lagged covariates fitted to the Columbus data set†
Method . | Results for INC . | Results for HOVAL . | ||||
---|---|---|---|---|---|---|
. | Direct . | Indirect . | Total . | Direct . | Indirect . | Total . |
MCMC | −0.96 (0.37) | −0.46 (0.42) | −1.42 (0.66) | −0.30 (0.10) | −0.14 (0.14) | −0.44 (0.20) |
INLA within MCMC | −0.97 (0.33) | −0.44 (0.35) | −1.40 (0.48) | −0.30 (0.10) | −0.13 (0.10) | −0.43 (0.14) |
INLA within CCD | −1.06 (0.40) | −0.54 (0.29) | −1.60 (0.49) | −0.29 (0.12) | −0.15 (0.09) | −0.44 (0.15) |
INLA at ML | −1.06 (0.32) | −0.53 (0.36) | −1.59 (0.48) | −0.29 (0.10) | −0.14 (0.11) | −0.43 (0.14) |
ML | −1.10 (—) | −0.55 (—) | −1.65 (—) | −0.29 (—) | −0.15 (—) | −0.44 (—) |
Method . | Results for INC . | Results for HOVAL . | ||||
---|---|---|---|---|---|---|
. | Direct . | Indirect . | Total . | Direct . | Indirect . | Total . |
MCMC | −0.96 (0.37) | −0.46 (0.42) | −1.42 (0.66) | −0.30 (0.10) | −0.14 (0.14) | −0.44 (0.20) |
INLA within MCMC | −0.97 (0.33) | −0.44 (0.35) | −1.40 (0.48) | −0.30 (0.10) | −0.13 (0.10) | −0.43 (0.14) |
INLA within CCD | −1.06 (0.40) | −0.54 (0.29) | −1.60 (0.49) | −0.29 (0.12) | −0.15 (0.09) | −0.44 (0.15) |
INLA at ML | −1.06 (0.32) | −0.53 (0.36) | −1.59 (0.48) | −0.29 (0.10) | −0.14 (0.11) | −0.43 (0.14) |
ML | −1.10 (—) | −0.55 (—) | −1.65 (—) | −0.29 (—) | −0.15 (—) | −0.44 (—) |
†The covariates are housing value HOVAL and household income INC.
Mean and standard deviation (in parentheses) of the impacts for the Manski model without lagged covariates fitted to the Columbus data set†
Method . | Results for INC . | Results for HOVAL . | ||||
---|---|---|---|---|---|---|
. | Direct . | Indirect . | Total . | Direct . | Indirect . | Total . |
MCMC | −0.96 (0.37) | −0.46 (0.42) | −1.42 (0.66) | −0.30 (0.10) | −0.14 (0.14) | −0.44 (0.20) |
INLA within MCMC | −0.97 (0.33) | −0.44 (0.35) | −1.40 (0.48) | −0.30 (0.10) | −0.13 (0.10) | −0.43 (0.14) |
INLA within CCD | −1.06 (0.40) | −0.54 (0.29) | −1.60 (0.49) | −0.29 (0.12) | −0.15 (0.09) | −0.44 (0.15) |
INLA at ML | −1.06 (0.32) | −0.53 (0.36) | −1.59 (0.48) | −0.29 (0.10) | −0.14 (0.11) | −0.43 (0.14) |
ML | −1.10 (—) | −0.55 (—) | −1.65 (—) | −0.29 (—) | −0.15 (—) | −0.44 (—) |
Method . | Results for INC . | Results for HOVAL . | ||||
---|---|---|---|---|---|---|
. | Direct . | Indirect . | Total . | Direct . | Indirect . | Total . |
MCMC | −0.96 (0.37) | −0.46 (0.42) | −1.42 (0.66) | −0.30 (0.10) | −0.14 (0.14) | −0.44 (0.20) |
INLA within MCMC | −0.97 (0.33) | −0.44 (0.35) | −1.40 (0.48) | −0.30 (0.10) | −0.13 (0.10) | −0.43 (0.14) |
INLA within CCD | −1.06 (0.40) | −0.54 (0.29) | −1.60 (0.49) | −0.29 (0.12) | −0.15 (0.09) | −0.44 (0.15) |
INLA at ML | −1.06 (0.32) | −0.53 (0.36) | −1.59 (0.48) | −0.29 (0.10) | −0.14 (0.11) | −0.43 (0.14) |
ML | −1.10 (—) | −0.55 (—) | −1.65 (—) | −0.29 (—) | −0.15 (—) | −0.44 (—) |
†The covariates are housing value HOVAL and household income INC.
4.2 Joint modelling of three diseases
Spatial models have been used for a long time in disease mapping to study the spatial variation of disease. Several researchers have used the INLA to fit spatial and spatiotemporal models for disease mapping of a single disease as MCMC methods are often very slow when the number of areas is large. See Blangiardo and Cameletti (2015) for a recent summary. Modelling several diseases is more complex because it requires models with a multivariate response and it often involves non-linear terms on some parameters in the linear predictor of the model (Leroux et al., 1999; Bivand et al., 2015). Knorr-Held and Best (2001) have considered the joint modelling of two diseases by including a shared spatial effect (with different weights for each disease) plus specific spatial terms for each disease.
In the following example we extend this model to create a multivariate model for three diseases following Downing et al. (2008). They fitted a joint model to six diseases with different weighted shared and specific spatial components. In our case, we shall consider only three diseases, which are modelled by using a shared spatial term and specific spatial patterns.
Note that this model looks like a standard log-linear model and that, for a fixed value of δ(d), it could be fitted with some standard software packages for Bayesian inference, including RINLA. We shall use our new approach to fit this model by taking θc=δ=(δ(1),…,δ(D)). In RINLA, this value can be included as a weight of the effects in the linear predictor by using a latent effect of type besag.
The prior for δ(d) will be a log-normal distribution with mean 0 and a precision equal to 0.1 to provide vague prior information. Shared component vi will be an intrinsic conditional auto-regressive spatial effect with precision τv. Specific components will also have an intrinsic conditional auto-regressive spatial effect with precision τs (which can be easily modelled by using the replicate feature in RINLA).
We shall fit this model to the cases of lip, oral cavity and pharynx tumours (international classification of diseases, version 10, code C00-C14), oesophagus tumour (code C15) and stomach tumour (code C16) from 1996 to 2014 at the province level in mainland Spain. Expected cases were computed by using age–sex standardized rates from the period 1996–2014.
Standardized mortality ratios (i.e. ) for each disease are shown in the maps in Fig. 4. Stomach tumours seem to have a spatial pattern that is different, and the other two diseases seem to have a very similar spatial pattern. We expect that the model which was described earlier would capture this joint spatial pattern and account for the different overall rates by means of the disease-specific intercepts that are included in the specific components.

Standardized mortality ratio of (a) lip, oral cavity and pharynx cancers, (b) oesophagus cancer and (c) stomach cancer in Spain in the period 1996–2014
Similarly to the previous example, we shall study how other approximations will work. In this case, ML estimates of δ are not easy to obtain and we shall rely on the posterior means and standard deviations to create the CCD. Given that the hyperparameters are positive in this model, the CCD will be considered in the log-scale. It will be centred at the log-posterior modes that are reported by the MCMC algorithm and within 0.75 times the posterior standard deviations in the log-scale. The standard deviations can be obtained from the MCMC output or approximated by using the delta method (see Hosmer et al. (2008), appendix 1) from the posterior means and variances of δ. In this case, they will be (σ(d)/μ(d))2, where μ(d) is the posterior mean and σ(d) the posterior standard deviation of δ(d).
In this case it is not easy to find values to create the CCD as in the previous example. In any case, our aim with this comparison is to investigate how other faster approximations will work. In addition, the CCD now produces a set of 16 different values for the spatial weights and it seems to be concentrated about the posterior modes. Furthermore, another approximation based on fitting the model at the posterior mode of δ will be used. This will be denoted as the INLA-at-mode method.
Fig. 5 summarizes the estimates that were obtained with the MCMC (Gibbs sampling with WinBUGS), INLA-within-MCMC, INLA-with-CCD and INLA-at-mode methods. Figs 5(a)–5(c) show the estimates of the posterior distribution of weights δ(d) for the three diseases for which there is a very good agreement between both estimation methods. The INLA-with-CCD approach performs reasonably well considering that it is based on 16 models and the CCD design seems to be too concentrated about the posterior modes. The INLA-at-mode method simply shows a vertical line at the posterior mode of δ(d).

(a)–(c) Summary of the posterior marginals of δ(d) (, INLA within MCMC;
, INLA with CCD;
, MCMC;
, INLA at mode) and summary of bivariate posterior distributions of (δ(i),δ(j)), i≠j, obtained with (d)–(f) the INLA-within-MCMC approach and (g)–(i) MCMC sampling
An interesting question now is how spatial dependence between the three causes of death could be assessed. Fig. 5 also shows the bivariate posterior distributions of each pair of spatial weights. These plots show the strong correlation between the weights that are associated with oral cavity and oesophagus cancers, and how the weight that is associated with stomach cancer seems to be independent from the two other weights. This supports the idea that stomach cancer has a distinct spatial pattern and that oral cavity and oesophagus cancers have a similar spatial pattern. Furthermore, there is a very good agreement between the estimates that were obtained with the INLA-within-MCMC and MCMC approaches.
It is also worth mentioning that the INLA-with-CCD approach can also provide approximations to the posterior correlations between the spatial weights by dividing the posterior estimate of the covariance by the posterior estimates of the standard deviations. For δ(1) and δ(2) this value is 0.41, whereas for δ(3) and any of the other two weights it is 0.12. The same correlations that are computed from the MCMC output are higher but this is because the CCD is concentrated about the posterior mode of δ. In any case, the positive dependence between the first two diseases is also captured by the INLA-with-CCD approximation. Note that it is not possible to explore the dependence between the different diseases with the model fitted by using the INLA-at-mode approach.
Finally, Fig. 6 displays the marginal distributions of the disease-specific intercepts α(d) and precisions that appear in the model. For the INLA-within-MCMC approach, these have been obtained by doing an average of the conditional marginals obtained at each step of the Metropolis–Hastings algorithm. In general, there is good agreement between this approach and MCMC sampling. Regarding the INLA-with-CCD and INLA-at-mode approaches, they provide good estimates of the disease-specific intercepts and the precision of the disease-specific random effects. However, they seem not to account for all the variability in the estimation of the precision of the shared spatial random effect, as the posterior marginal is centred at the posterior mode but with a smaller variance.

Marginal distributions of the parameters in the model obtained by Bayesian model averaging the conditional marginals obtained at each step of the Metropolis–Hastings algorithm (estimates obtained with INLA within MCMC () and INLA with CCD (
), and marginals obtained with MCMC (
) and INLA at the mode (
)): (a) α(1); (b) α(2); (c) α(3); (d) τs; (e) τv
Although it is not shown here, we have observed a positive strong correlation between δ(1) and δ(2). We believe that this happens because of the similar spatial pattern that both diseases have. δ(3) shows less correlation with the other two weighting parameters. This kind of multivariate inference has also been possible because we have been able to estimate the joint posterior distribution of (δ(1),δ(2),δ(3)) by fitting the model with the INLA-within-MCMC algorithm.
5 Discussion
Gómez-Rubio and Rue (2018) develop a novel approach to fit Bayesian models that are not implemented in RINLA by combining a Metropolis–Hastings and INLA algorithm. In this paper we have exploited this method to show how to fit complex spatial models with several spatial components so that multivariate inference on a small ensemble of important parameters is feasible as their joint posterior distribution is obtained. This joint posterior distribution can also be used to compute the posterior distribution of any function on this ensemble of parameters. Furthermore, we have also explored how simpler approximations will work when fitting complex spatial models, which provide good approximations to most of the posterior marginals of the parameters at a reduced computational cost. These simpler approaches are based on estimating the posterior distribution of the hyperparameters by using a CCD or simply by fixing their values at the ML estimates or posterior modes.
As seen in the examples that are presented in this paper, the INLA-within-Metropolis approach can be used to fit complex spatial models with a simple implementation of the Metropolis–Hastings algorithm. This implementation is simpler compared with a full implementation of the model using several MCMC algorithms as sampling involves only a few parameters. Furthermore, convergence of the MCMC simulations is achieved in fewer iterations because the number of parameters that need to be simulated from is greatly reduced compared with fitting the model completely by using MCMC sampling. For large data sets, this should be an interesting alternative as an INLA algorithm can fit conditional models faster than MCMC sampling.
The choice of the set of hyperparameters θc to condition on can be chosen in different ways. In general, any choice that makes the conditional model a latent GMRF that can be fitted with an INLA will work. In many applications it will be enough to choose θc so that the conditional model is a generalized linear mixed model that can be easily fitted with an INLA. This is how we have selected the hyperparameters in θc in the examples that were developed in this paper. In most cases, the dimension of θc will be small, but the INLA-within-MCMC algorithm can be applied to sets of hyperparameters of any dimension.
For cases in which the spatial model can be fitted completely with RINLA, the approach that is presented in this paper is also appealing because it makes multivariate inference based on the joint posterior possible. Although we have focused on spatial models, the methods that are presented in this paper can be easily applied to any other model that can be expressed as a conditional GMRF and, hence, can be fitted with RINLA by conditioning on a small number of parameters.
In the future we hope to explore alternatives to the Metropolis–Hastings algorithm that can fit the required conditional models in parallel. In particular, importance sampling can be a good alternative as sampling values of θc can be performed in parallel. Furthermore, the INLA-within-MCMC approach can be embedded in reversible jump MCMC algorithms to tackle the problem of model fitting and model selection at the same time. This could be useful, for example, to perform variable selection in complex spatial or spatiotemporal models.
Supporting Information
Additional ‘supporting information’ may be found in the on-line version of this article:
‘Supplementary materials: Multivariate posterior inference for spatial models with the Integrated Nested Laplace Approximation’.
Acknowledgements
This work has been supported by grant PPIC-2014-001-P, funded by the Consejería de Educación, Cultura y Deportes and the Fondo Europeo de Desarrollo Regional, and grant MTM2016-77501-P, funded by the Ministerio de Economía y Competitividad.
F. Palmí-Perales was supported by a doctoral scholarship awarded by the University of Castilla–La Mancha (Spain).