Lu Zhang, Sudipto Banerjee, Spatial Factor Modeling: A Bayesian Matrix-Normal Approach for Misaligned Data, Biometrics, Volume 78, Issue 2, June 2022, Pages 560–573, https://doi.org/10.1111/biom.13452
Abstract
Multivariate spatially oriented data sets are prevalent in the environmental and physical sciences. Scientists seek to jointly model multiple variables, each indexed by a spatial location, to capture any underlying spatial association for each variable and associations among the different dependent variables. Multivariate latent spatial process models have proved effective in driving statistical inference and rendering better predictive inference at arbitrary locations for the spatial process. High-dimensional multivariate spatial data, which are the theme of this article, refer to data sets where the number of spatial locations and the number of spatially dependent variables are both very large. The field has witnessed substantial developments in scalable models for univariate spatial processes, but such methods for multivariate spatial processes, especially when the number of outcomes is moderately large, are limited in comparison. Here, we extend scalable modeling strategies for a single process to multivariate processes. We pursue Bayesian inference, which is attractive for full uncertainty quantification of the latent spatial process. Our approach exploits distribution theory for the matrix-normal distribution, which we use to construct scalable versions of a hierarchical linear model of coregionalization (LMC) and spatial factor models that deliver inference over a high-dimensional parameter space including the latent spatial process. We illustrate the computational and inferential benefits of our algorithms over competing methods using simulation studies and an analysis of a massive vegetation index data set.
1 Introduction
Statistical modeling for multiple spatially oriented variables is required to capture underlying spatial associations in each variable and to account for inherent associations among the different variables. As an example, to which we return later, consider a set of spatially indexed spectral variables for vegetation activity on the land. Such variables exhibit strong spatial dependence, as customarily exhibited through plots of spatial variograms and other exploratory maps. In addition, the variables are assumed to be associated with each other because of shared physical processes that manifest through the observations.
Modeling each variable separately captures the spatial distribution of that variable independent of other variables. Such an analysis ignores associations among the variables and can impair prediction or interpolation (see, e.g., Wackernagel, 2003; Chiles and Delfiner, 2009; Gelfand and Banerjee, 2010; Cressie and Wikle, 2015). Each of the aforementioned works provides ample evidence, theoretical and empirical, in favor of joint modeling of multiple spatially indexed variables. Joint modeling, or multivariate spatial analysis, is especially pertinent in the presence of spatial misalignment, where not all variables have been observed over the same set of locations. For example, suppose Y(s) is a normalized difference vegetation index (NDVI) and X(s) is red reflectance. If location s_0 has yielded a measurement for X(s_0) but not for Y(s_0), then optimal imputation of Y(s_0) should proceed from p(Y(s_0) | Y, X), where Y and X comprise all measurements on Y(s) and X(s). If the processes Y(·) and X(·) are modeled as independent, then the predictive distribution p(Y(s_0) | Y, X) reduces to p(Y(s_0) | Y) and will not exploit the possible predictive information present in X for Y(s_0). This specific issue has also been discussed, with examples, in Banerjee and Gelfand (2002).
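To see concretely why a joint model sharpens imputation, consider a toy bivariate normal for (Y(s_0), X(s_0)) with a hypothetical cross-correlation (the value −0.5 below is illustrative, not estimated from data); conditioning on X shifts the predictive mean and shrinks the predictive variance relative to the independence model:

```python
import numpy as np

# Hypothetical joint covariance for (Y(s0), X(s0)): unit variances and a
# cross-correlation rho between NDVI Y and red reflectance X (illustrative).
rho = -0.5
Sigma = np.array([[1.0, rho],
                  [rho, 1.0]])

x_obs = 0.5                    # X measured at s0; Y missing there

# Conditional distribution Y | X = x_obs under a zero-mean bivariate normal
cond_mean = Sigma[0, 1] / Sigma[1, 1] * x_obs
cond_var = Sigma[0, 0] - Sigma[0, 1] ** 2 / Sigma[1, 1]

# A joint model borrows strength from X: the conditional variance 1 - rho^2
# is smaller than the marginal variance 1.0 used by an independent model.
print(cond_mean)               # -0.25
print(cond_var)                # 0.75
```

The independence model would predict Y(s_0) with mean 0 and variance 1; the joint model uses the observed X(s_0) to do strictly better whenever ρ ≠ 0.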
Joint modeling is driven by vector-valued latent spatial stochastic processes, such as a multivariate Gaussian process. These are specified with matrix-valued cross-covariance functions (see, e.g., Le and Zidek, 2006; Genton and Kleiber, 2015; Salvaña and Genton, 2020, and references therein) that model pairwise associations at distinct locations. Theoretical properties of cross-covariances are well established, but practical modeling implications and computational efficiency require specific considerations depending upon the application (see, e.g., Le et al., 1997; Sun et al., 1998; Le et al., 2001; Schmidt and Gelfand, 2003; Gamerman and Moreira, 2004; Banerjee et al., 2014).
High-dimensional multivariate spatial models deal with a large number of dependent variables over a massive number of locations. While analyzing massive spatial and spatial-temporal databases has received attention (see, e.g., Sun et al., 2011; Banerjee, 2017; Heaton et al., 2019; Zhang et al., 2020), the bulk of methods have focused on one or very few (two or three) spatially dependent variables and often rely upon restrictive assumptions that preclude full inference on the latent process. With larger numbers of dependent variables, modeling the cross-covariance becomes challenging. Even for stationary cross-covariance functions, where we assume that the associations among the variables do not change over space and that the spatial association for each variable depends only on the difference between two positions, matters become computationally challenging.
This article builds upon the popular linear models of coregionalization (Bourgault and Marcotte, 1991; Goulard and Voltz, 1992; Wackernagel, 2003; Gelfand et al., 2004; Chiles and Delfiner, 2009; Genton and Kleiber, 2015). Our contributions include: (i) developing a hierarchical model with a matrix-normal distribution as a prior for an unknown linear transformation on latent spatial processes, (ii) extending classes of spatial factor models for spatially misaligned data, and (iii) accounting for multiple outcomes over a very large number of locations. Spatial factor models have been explored by Wang and Wall (2003), Lopes et al. (2008), Ren and Banerjee (2013), and Taylor-Rodriguez et al. (2019). Lopes et al. (2008) provide an extensive discussion on how hierarchical models emerged from dynamic factor models. Ren and Banerjee (2013) proposed low-rank specifications for spatially varying factors to achieve dimension reduction, but such low-rank specifications tend to oversmooth the latent process in massive data sets containing millions of locations. More recently, Taylor-Rodriguez et al. (2019) consider nearest-neighbor Gaussian processes (Datta et al., 2016) for spatial factors with the usual constrained loading matrices of nonspatial factor models. These are more restrictive than needed for identifying spatially correlated factors (see, e.g., Ren and Banerjee, 2013).
We develop our modeling framework in Section 2. Section 3 presents some theoretical results about posterior consistency for the proposed models. Simulation studies for exploring the performance of proposed models are summarized in Section 4. Section 5 presents an application to remote-sensed vegetation analysis on land surfaces. We conclude with some discussion in Section 6.
2 Multivariate Spatial Processes
Let ω(s) = (ω_1(s), …, ω_q(s))^T be a q × 1 stochastic process, where each ω_i(s) is a real-valued random variable at location s. The process is specified by its mean μ(s) = E[ω(s)] and, customarily, second-order stationary covariances C_{ij}(s, s') = cov(ω_i(s), ω_j(s')) for i, j = 1, …, q. These covariances define the matrix-valued q × q cross-covariance function C(s, s') with (i, j)-th entry C_{ij}(s, s'). While there is no loss of generality in assuming the process mean to be zero by absorbing the mean into a separate regression component in the model, as we will do here, modeling the cross-covariance function requires care. From its definition, C(s, s') need not be symmetric, but must satisfy C(s, s') = C(s', s)^T. Also, since var(∑_{i=1}^n a_i^T ω(s_i)) ≥ 0 for any set of finite locations {s_1, …, s_n} and any set of constant vectors {a_1, …, a_n}, we have ∑_{i=1}^n ∑_{j=1}^n a_i^T C(s_i, s_j) a_j ≥ 0. Genton and Kleiber (2015) provide a comprehensive review of cross-covariance functions.
Perhaps the most widely used approach for constructing multivariate random fields is the linear model of coregionalization (LMC). This hinges on invertible linear maps of independent spatial processes yielding valid spatial processes. If f(s) = (f_1(s), …, f_K(s))^T is a K × 1 vector of independent spatial processes so that cov(f_i(s), f_j(s')) = 0 for all i ≠ j and any two locations s and s' (same or distinct), then the LMC (Bourgault and Marcotte, 1991) specifies

ω(s) = Λ f(s),    (1)

where ω(s) is q × 1, Λ is q × K, λ_k^T is the k-th row of Λ, and each f_k(s) is an independent Gaussian process with correlation function ρ_k(·, ·; φ_k) with parameters φ_k. The cross-covariance for ω(s) yields nondegenerate process-realizations whenever K = q and Λ is nonsingular. To achieve dimension reduction in the number of variables, we restrict K < q so we have nondegenerate realizations in a K-dimensional subspace.
Schmidt and Gelfand (2003) propose multivariate spatial processes through a hierarchical spatial conditional model, whereupon Λ in (1) is a q × q lower triangular matrix. Other variants of the LMC (e.g., Goulard and Voltz, 1992) can also be recast as (1) using linear algebra. The flexibility offered in modeling Λ is appealing and, in particular, can accrue computational benefits in high-dimensional settings. Hence, we build upon (1).
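The cross-covariance implied by (1) is C(s, s') = Λ diag{ρ_1(s, s'; φ_1), …, ρ_K(s, s'; φ_K)} Λ^T. A minimal numerical sketch (dimensions and parameter values are illustrative) confirming that this construction satisfies the general requirements C(s, s') = C(s', s)^T and C(s, s) = ΛΛ^T:

```python
import numpy as np

rng = np.random.default_rng(0)
q, K = 3, 2
Lam = rng.normal(size=(q, K))       # loading matrix Lambda (illustrative values)
phi = np.array([6.0, 18.0])         # decay parameters, one per factor

def cross_cov(s, t):
    """C(s, t) = Lam diag(rho_k(||s - t||)) Lam^T with exponential correlations."""
    h = np.linalg.norm(np.asarray(s) - np.asarray(t))
    return Lam @ np.diag(np.exp(-phi * h)) @ Lam.T

C_st = cross_cov((0.1, 0.2), (0.4, 0.6))
C_ts = cross_cov((0.4, 0.6), (0.1, 0.2))

# Symmetry across arguments holds by construction, and at distance zero every
# correlation equals 1, so C(s, s) collapses to Lam Lam^T.
assert np.allclose(C_st, C_ts.T)
assert np.allclose(cross_cov((0.1, 0.2), (0.1, 0.2)), Lam @ Lam.T)
```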
2.1 A Bayesian LMC (BLMC) Factor Model
Let y(s) denote the q × 1 vector of dependent outcomes in location s, let x(s) be the corresponding p × 1 vector of explanatory variables, and let β be a p × q regression coefficient matrix in the multivariate spatial model

y(s) = β^T x(s) + ω(s) + ε(s),    (2)

where the latent process ω(s) is an LMC as described above. Elements in ω(s) are as described in (1), while the noise process ε(s) ~ N(0, Σ) with covariance matrix Σ. We model {β, Λ, Σ} using a Matrix-Normal-Inverse-Wishart family. To be precise,

γ | Σ ~ MN(μ_γ, V_γ, Σ),  Σ ~ IW(Ψ, ν),    (3)

where γ = [β^T, Λ]^T stacks the regression and loading coefficients, μ_γ is a (p + K) × q matrix, and V_γ is a (p + K) × (p + K) positive definite matrix. A random matrix Z ~ MN_{n×p}(M, U, V) has the probability density function (Dawid, 1981)

p(Z | M, U, V) = (2π)^{−np/2} |V|^{−n/2} |U|^{−p/2} exp{ −(1/2) tr[ V^{−1} (Z − M)^T U^{−1} (Z − M) ] },

where tr(·) is the trace function, M is the mean matrix, U is the first scale matrix with dimension n × n, and V is the second scale matrix with dimension p × p. This distribution is equivalent to vec(Z) ~ N(vec(M), V ⊗ U), where ⊗ is the Kronecker product and vec(Z) is the vectorized n × p random matrix Z. We refer to the model specified through (2)–(3) as the BLMC factor model.
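The equivalence between the matrix-normal density and its vectorized multivariate normal form can be verified numerically; the sketch below (dimensions and scale matrices illustrative) evaluates both log-densities with plain NumPy and checks that they agree:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 4, 3
M = rng.normal(size=(n, p))                               # mean matrix
A = rng.normal(size=(n, n)); U = A @ A.T + n * np.eye(n)  # n x n row scale
B = rng.normal(size=(p, p)); V = B @ B.T + p * np.eye(p)  # p x p column scale
Z = rng.normal(size=(n, p))

# Matrix-normal log-density of MN_{n x p}(M, U, V) at Z
R = Z - M
quad = np.trace(np.linalg.solve(V, R.T) @ np.linalg.solve(U, R))
logdet = p * np.linalg.slogdet(U)[1] + n * np.linalg.slogdet(V)[1]
ld_mn = -0.5 * quad - 0.5 * logdet - 0.5 * n * p * np.log(2 * np.pi)

# Equivalent vectorized form: vec(Z) ~ N(vec(M), V kron U), column-major vec
C = np.kron(V, U)
r = Z.T.ravel() - M.T.ravel()                             # column-major residual
ld_vec = (-0.5 * r @ np.linalg.solve(C, r)
          - 0.5 * np.linalg.slogdet(C)[1]
          - 0.5 * n * p * np.log(2 * np.pi))

assert np.isclose(ld_mn, ld_vec)
```

The trace form avoids ever building the np × np Kronecker covariance, which is exactly why the matrix-normal parameterization scales while the naive vectorized form does not.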
Without misalignment, the observation model in (2) can be cast as

Y = Xβ + FΛ^T + E,

where Y = (y(s_1), …, y(s_n))^T is the n × q response matrix, X is the corresponding n × p design matrix with full rank p, E is the n × q matrix of noise terms, and F is the n × K matrix with j-th column being the n × 1 vector comprising f_j(s_i)'s for i = 1, …, n.
The parameters Λ and F are not jointly identified in factor models, and some constraints are required to ensure identifiability (Ren and Banerjee, 2013; Lopes and West, 2004). These constraints are not without problems. For example, a lower trapezoidal (triangular for K = q) specification for Λ imposes possibly unjustifiable conditional independence on the spatial processes. Alternatively, ordering the spatial range parameters can ensure identifiability but creates difficulties in computation and interpretation. We avoid such constraints and instead transform posterior samples of F and Λ to obtain inference for the latent process. This parameterization yields conditional conjugate distributions and, therefore, efficient posterior sampling. We elucidate below in the context of misaligned data.
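The lack of joint identifiability is easy to demonstrate: for any K × K orthogonal matrix Q, replacing the factors F by FQ and the loadings Λ by ΛQ leaves FΛ^T, and hence the likelihood, unchanged. A small numerical illustration (dimensions hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
n, q, K = 50, 4, 2
F = rng.normal(size=(n, K))      # latent factor matrix (illustrative)
Lam = rng.normal(size=(q, K))    # loading matrix (illustrative)

# Build an orthogonal K x K rotation Q via a QR decomposition
Q, _ = np.linalg.qr(rng.normal(size=(K, K)))

# (F Q)(Lam Q)^T = F Q Q^T Lam^T = F Lam^T: the fitted surface is unchanged
assert np.allclose(Q @ Q.T, np.eye(K))
assert np.allclose(F @ Lam.T, (F @ Q) @ (Lam @ Q).T)
```

Spatially correlated factors with distinct range parameters break this invariance through the prior, which is why prior information on the φ_k, rather than hard constraints on Λ, can drive identification here.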
2.2 Inference for Spatially Misaligned Data
Let L = {s_1, …, s_n} be the set of locations that have recorded at least one of the observed outcomes and let L_i ⊆ L be the subset of locations that have recorded the i-th response. Then L = L_1 ∪ ⋯ ∪ L_q, and let n_i denote the number of locations in L_i. Let M_i = L \ L_i denote the set of locations where at least one response, but not the i-th response, is recorded, so that M = M_1 ∪ ⋯ ∪ M_q is the set of all locations with incomplete data. We derive the conditional distribution of the latent factors and of the unobserved responses conditional on the observed responses. Let D be the selection matrix that extracts the observed responses from Y in each of the locations in L. The joint distribution of the observed responses and F, given the remaining parameters, can be represented through an augmented linear system,
where the stacked response and design quantities collect the observed outcomes, R_k(φ_k) is the n × n spatial correlation matrix corresponding to ρ_k(·, ·; φ_k), and blockdiag(·) represents the block diagonal operator stacking matrices along the diagonal. Premultiplying the augmented system by the inverse Cholesky factors of the corresponding covariance blocks yields a transformed linear model whose error terms are independent, each with unit variance. The full conditional distribution of F for the LMC model in (2) then follows from standard linear-model theory.
For misaligned data, we will perform Bayesian updating of the outcomes missing at a location s ∈ M. Let m(s) be the suffix that indexes outcomes that are missing at s and o(s) the suffix that indexes those that are observed. Writing μ(s) = β^T x(s) + Λ f(s), the conditional distribution of the missing outcomes given the parameters and the observed outcomes is

y_{m(s)}(s) | y_{o(s)}(s), β, Λ, F, Σ ~ N( μ_{m(s)}(s) + Σ_{m(s),o(s)} Σ_{o(s),o(s)}^{−1} { y_{o(s)}(s) − μ_{o(s)}(s) }, Σ_{m(s),m(s)} − Σ_{m(s),o(s)} Σ_{o(s),o(s)}^{−1} Σ_{o(s),m(s)} ),

where Σ_{m(s),o(s)} is the submatrix of Σ extracted with row and column indices m(s) and o(s), respectively. With the priors given in (3), we let γ = [β^T, Λ]^T and define the augmented design matrix X̃ = [X : F]. Using standard distribution theory for the conjugate matrix-normal linear model, the conditional posterior distribution of {γ, Σ} given Y, F, and φ again belongs to the Matrix-Normal-Inverse-Wishart family,

{γ, Σ} | Y, F, φ ~ MNIW( μ*, V*, Ψ*, ν + n ),  V* = (V_γ^{−1} + X̃^T X̃)^{−1},  μ* = V* (V_γ^{−1} μ_γ + X̃^T Y),  Ψ* = Ψ + Y^T Y + μ_γ^T V_γ^{−1} μ_γ − μ*^T V*^{−1} μ*,

where μ_γ and V_γ are the prior mean and first scale matrix for γ. In particular, with a flat prior on β, the corresponding block of μ_γ is a zero matrix, and the conditional distribution of γ given Σ and the remaining parameters is matrix-normal with the mean and scale matrices given above.
The parameters φ_k, k = 1, …, K, by themselves, are not consistently estimable under in-fill asymptotics. Therefore, irrespective of the sample size (within a fixed domain), inference on φ_k will be sensitive to the choice of the prior. Furthermore, without placing restrictions on the loading matrix or ordering these parameters (Ren and Banerjee, 2013), these parameters are identifiable primarily through the prior. We treat these as unknown and model them using priors based upon customary spatial domain considerations. The full conditional distributions for the φ_k are not available in closed form. However, since {β, Λ, Σ} and φ are conditionally independent given F, and the f_k are independent for k = 1, …, K, we obtain the full conditional density of φ_k up to a proportionality constant as

p(φ_k | F) ∝ p(φ_k) × N(f_k | 0, R_k(φ_k))

for each k = 1, …, K, where p(φ_k) is the prior for φ_k.
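The Metropolis step only needs the logarithm of this full conditional, i.e., the log prior plus the Gaussian log-likelihood of f_k under R_k(φ_k), evaluated at current and proposed values. A sketch with an exponential correlation and a uniform prior (locations and prior bounds illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 60
coords = rng.uniform(size=(n, 2))                    # locations in the unit square
dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

def log_full_conditional(phi, f, lo=2.12, hi=212.0):
    """log p(phi | f) up to a constant: uniform prior plus log N(f | 0, R(phi))."""
    if not (lo < phi < hi):
        return -np.inf                               # outside the prior support
    R = np.exp(-phi * dists)                         # exponential correlation matrix
    L = np.linalg.cholesky(R)
    z = np.linalg.solve(L, f)                        # z = L^{-1} f
    # log N(f | 0, R) = -0.5 f^T R^{-1} f - log|L| - (n/2) log(2 pi)
    return -0.5 * z @ z - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2 * np.pi)

phi_true = 6.0
L = np.linalg.cholesky(np.exp(-phi_true * dists))
f = L @ rng.normal(size=n)                           # one simulated factor draw

lp = log_full_conditional(phi_true, f)
assert np.isfinite(lp)
assert log_full_conditional(500.0, f) == -np.inf     # rejected by the prior
```

Using the Cholesky factor gives both the quadratic form and the log-determinant in one factorization, which is the dominant cost of each Metropolis evaluation.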
Turning to predictions, if U = {u_1, …, u_{n'}} is a set of new locations, then each f_k over U is independent of Y given F and φ. The predictive distribution of each f_k over U therefore follows from standard Gaussian process theory as a multivariate normal for each k = 1, …, K, and the predictive distribution of the responses over U is obtained by composition from (2), using the independence between the new responses and Y given the latent factors and the model parameters. These distributions allow sampling from the posterior predictive distribution over U using the posterior samples of the parameters. We elaborate below.
2.3 The Block Update Markov Chain Monte Carlo (MCMC) Algorithm
We formulate an efficient MCMC algorithm for obtaining full Bayesian inference as follows. From the l-th iteration, we generate {γ, Σ} from their Matrix-Normal-Inverse-Wishart full conditional. Next, we draw the missing outcomes over M from their conditional normal distributions and then update F from its full conditional. We complete the (l + 1)-th iteration by drawing each φ_k through a Metropolis random walk on its full conditional. Upon convergence, these iterations will generate samples from the desired joint posterior distribution.

For inference at new locations, we sample each f_k over U from its Gaussian predictive distribution, given the posterior samples of F and φ, and then generate posterior predictions of the responses from (2). Applying the single component adaptive Metropolis (SCAM) algorithm introduced in Haario et al. (2005), one can avoid tuning parameters in the Metropolis algorithm by warming up each MCMC chain of φ_k with an adaptive proposal distribution. In our implementation, we use the proposal distribution defined by eq. (2.1) in Roberts and Rosenthal (2009).

We sample F as a single block through a linear transformation of the independent variables in the whitened model. Sampling the missing outcomes follows analogously. We significantly improve convergence by reducing the posterior dependence among the parameters in this Gibbs with Metropolis algorithm (Gelman et al., 2013). Since the latent process is sensitive to the value of the intercept, we recommend using an intercept-centered latent process to obtain inference for the latent spatial pattern.
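A sketch of a SCAM-style update in the spirit of Haario et al. (2005): after a warm-up, each component's proposal standard deviation tracks 2.4 times the empirical standard deviation of its own chain history plus a small stabilizing constant. The target below is a toy stand-in (a lognormal-shaped density), not the paper's actual full conditional:

```python
import numpy as np

rng = np.random.default_rng(4)

def log_target(phi):
    """Toy stand-in for log p(phi_k | F): lognormal(mu=2, sigma=1) shape."""
    return -np.inf if phi <= 0 else -0.5 * (np.log(phi) - 2.0) ** 2 - np.log(phi)

phi, lp = 5.0, log_target(5.0)
history, eps, accepted = [phi], 0.05, 0
n_iter = 5000
for it in range(n_iter):
    # Fixed proposal during warm-up, then adapt to the chain's own spread
    sd = 1.0 if it < 100 else 2.4 * np.std(history) + eps
    prop = phi + sd * rng.normal()
    lp_prop = log_target(prop)
    if np.log(rng.uniform()) < lp_prop - lp:   # Metropolis accept/reject
        phi, lp = prop, lp_prop
        accepted += 1
    history.append(phi)

rate = accepted / n_iter
assert 0.05 < rate < 0.95                      # the chain mixes rather than sticks
```

Because the adaptation uses only each φ_k's own one-dimensional history, it needs no cross-parameter covariance estimate, matching the single-component character of SCAM.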
2.4 Scalable Modeling
We use a conjugate gradient method (Nishimura and Suchard, 2018) to facilitate sampling of F when the relevant precision matrix is sparse. Here, we develop a scalable BLMC model with each element of f(s) modeled as a Nearest-Neighbor Gaussian Process (NNGP).

Let each f_k be an NNGP, which implies a precision matrix of the form (I − A_k)^T D_k^{−1} (I − A_k) for each k = 1, …, K, where A_k is a sparse lower triangular matrix with no more than a specified small number, m, of nonzero entries in each row and D_k is a diagonal matrix. The diagonal entries of D_k and the nonzero entries of A_k are obtained from the conditional variances and conditional expectations for a Gaussian process with covariance function ρ_k(·, ·; φ_k). We consider a fixed order of locations in L and let N(s_i) be the set of at most m neighbors of s_i among the locations s_j ∈ L with j < i. The (i, j)-th entry of A_k is 0 whenever s_j ∉ N(s_i). If j_1 < j_2 < ⋯ < j_m are the m column indices for the nonzero entries in the i-th row of A_k, then the (i, j_l)-th element of A_k is the l-th element of the 1 × m vector ρ_k(s_i, N(s_i)) ρ_k(N(s_i), N(s_i))^{−1}. The i-th diagonal element of D_k is given by ρ_k(s_i, s_i) − ρ_k(s_i, N(s_i)) ρ_k(N(s_i), N(s_i))^{−1} ρ_k(N(s_i), s_i). Repeating these calculations for each row completes the construction of A_k and D_k and yields a sparse precision matrix. This construction is performed in parallel and requires storage or computation of at most n matrices of dimension at most m × m, costing O(nm^3) flops and O(nm^2) storage. See Supplementary Appendix B for details.
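The row-by-row construction above can be sketched as follows; for illustration we take all previous locations as neighbors (m = n − 1), in which case the sparse factorization reproduces the dense precision matrix exactly, whereas in practice m is a small number such as 10 and A is genuinely sparse:

```python
import numpy as np

rng = np.random.default_rng(5)
n, phi = 30, 6.0
coords = rng.uniform(size=(n, 2))
d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
R = np.exp(-phi * d)                         # dense exponential correlation

A = np.zeros((n, n))
D = np.zeros(n)
D[0] = R[0, 0]
for i in range(1, n):
    nb = list(range(i))                      # neighbor set N(s_i): here, all j < i
    Rnn = R[np.ix_(nb, nb)]                  # correlation among the neighbors
    r = R[nb, i]                             # correlation between s_i and neighbors
    w = np.linalg.solve(Rnn, r)              # kriging weights -> nonzero row of A
    A[i, nb] = w
    D[i] = R[i, i] - r @ w                   # conditional variance -> diagonal of D

# Precision (I - A)^T D^{-1} (I - A); with m = n - 1 it equals R^{-1} exactly
IA = np.eye(n) - A
prec = IA.T @ np.diag(1.0 / D) @ IA
assert np.allclose(prec, np.linalg.inv(R), atol=1e-8)
```

With small m, each row only conditions on its m nearest preceding neighbors, so Rnn shrinks to m × m and the O(nm^3) cost quoted above follows directly from the per-row solve.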
Sampling F is computationally expensive, but is expedited by efficiently solving the associated linear system for any vector u. If the precision matrix has a sparse Cholesky factor, then these solves are cheap. To be precise, the Woodbury matrix identity separates the sparse block-diagonal precision, built from the NNGP precision matrices of the f_k, from the low-rank correction induced by the loadings, so that a solve with the full precision reduces to sparse solves plus a small dense system. If all the NNGP precision matrices have similar structures, then permuting rows and columns with a common permutation often renders structures that can be exploited by BLMC for very large spatial data sets. For example, if these precision matrices are banded with bandwidth b, then their block-diagonal concatenation is also banded. Moreover, the corrected matrix remains banded with only a modest increase in bandwidth, so the correction hardly increases the computational burden of the Cholesky decomposition when q is small. Assembling these features, the required solves are scalable for any vector u.
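A generic numerical sketch of the Woodbury-style solve (a diagonal matrix stands in for the sparse part, and the blocks are illustrative rather than the paper's exact matrices): only the cheap part is solved at full dimension, plus one small q × q system:

```python
import numpy as np

rng = np.random.default_rng(6)
n, q = 200, 3
# S: a well-conditioned "sparse" part (diagonal here for simplicity);
# H = S + W W^T: a rank-q update, as arises from a q-column loading term.
S_diag = 1.0 + rng.uniform(size=n)
W = rng.normal(size=(n, q))
H = np.diag(S_diag) + W @ W.T
u = rng.normal(size=n)

# Woodbury:
# (S + W W^T)^{-1} u = S^{-1}u - S^{-1}W (I_q + W^T S^{-1} W)^{-1} W^T S^{-1}u
Sinv_u = u / S_diag                          # cheap solve with the sparse part
Sinv_W = W / S_diag[:, None]
small = np.eye(q) + W.T @ Sinv_W             # only a q x q system to factorize
x = Sinv_u - Sinv_W @ np.linalg.solve(small, W.T @ Sinv_u)

assert np.allclose(H @ x, u)                 # matches the direct solve
```

The n × n factorization is avoided entirely; with a genuinely sparse S the dominant cost is the sparse solve, and the dense work stays at dimension q, which is why the approach scales when q is small.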
We conclude this section with a remark on the BLMC model with diagonal Σ. This specification is desirable for data sets with a massive number of responses q. A diagonal Σ avoids the quadratic growth of the number of parameters in Σ as q increases. We illustrate an NNGP-based BLMC with diagonal Σ in Section 4.2.
3 On Posterior Consistency: Large-Sample Properties of Posterior Estimates
We present some theoretical results for the models constructed in the previous section. Specifically, we investigate the behavior of the posterior distribution as the sample size increases and establish its convergence to an oracle distribution. Here, for establishing the results, we will assume conjugate Matrix-Normal-Inverse-Wishart (MNIW) models with no misalignment. First, we assume that y(s) itself is modeled as a spatial process without explicitly introducing a latent process. Let

cov(y(s), y(s')) = { α ρ(s, s'; ψ) + (1 − α) δ(s = s') } Σ,    (17)

where ρ(·, ·; ψ) is a spatial correlation function defined through hyperparameter ψ, δ denotes Dirac's delta function, and Σ is the nonspatial covariance matrix of y(s). The fixed scalar α ∈ (0, 1) represents the proportion of total variability allocated to the spatial process. This implies that Y follows a matrix-normal distribution with mean Xβ, row scale α R(ψ) + (1 − α) I_n, and column scale Σ. We model {β, Σ} using the conjugate MNIW prior with prefixed hyperparameters. Closely following the developments in Gamerman and Moreira (2004), we obtain the posterior distribution of {β, Σ} as an MNIW distribution with updated hyperparameters. We refer to the above model as the "response" model.
Next, we consider the spatial regression model with the latent process,

y(s) = β^T x(s) + ω(s) + ε(s),    (20)

where ω(s) is a latent process and ε(s) is measurement error. For theoretical tractability, we restrict posterior inference to {β, Σ} and the latent process, assuming that the scalar α is fixed. Assuming that the joint distribution of β and Σ is given in (3), the posterior distribution again belongs to the MNIW family with updated hyperparameters. We refer to the above model as the "latent" model.
We establish the posterior consistency of {β, Σ} for the response model (17) and the latent model (20). For distinguishing the variables based on the number of observations, we make the dependence upon n explicit and write Y_n, X_n, and the corresponding scale matrices with a subscript n. Proofs and technical details are available in Supplementary Appendix C.

Theorem 1 (Theorems S.1 and S.2). The parameter set {β, Σ} is posterior consistent for both the conjugate response and latent models if and only if the smallest eigenvalue of the information matrix for β diverges as n → ∞.
When the explanatory variables share the same spatial correlation as the responses, the necessary and sufficient condition in Theorem 1 holds (see Remark S.2). When the explanatory variables are themselves regarded as independent observations, the condition also holds (see Remark S.3).
4 Simulation
We present two simulation examples. The first compares the BLMC model with other multivariate Bayesian spatial models. The second assesses our BLMC model when K is not excessively large. BLMC models were implemented in Julia 1.2.0 (Bezanson et al., 2017). We modeled the univariate processes in the proposed BLMC by NNGP. We took the LMC model proposed by Schmidt and Gelfand (2003) as a benchmark in the first simulation example. The benchmark model was implemented in R 3.4.4 through the function spMisalignLM in the R package spBayes (Finley et al., 2007). The posterior inference for each model was based on MCMC chains with 5000 iterations after a burn-in of 5000 iterations. All models were run on a single Intel Core i7-7700K CPU @ 4.20GHz with 32 GB of random-access memory running Ubuntu 18.04.2 LTS. Convergence diagnostics and other posterior summaries were implemented within the Julia statistical environment. Model comparisons were based on parameter estimates (posterior mean and 95% credible interval), root mean squared prediction error (RMSPE), mean squared error of intercept-centered latent processes (MSEL), prediction interval coverage (CVG; the percent of intervals containing the true value), interval coverage for the intercept-centered latent process of observed responses (CVGL), the average continuous rank probability score for responses (CRPS; Gneiting and Raftery, 2007), the average interval score for responses (INT; Gneiting and Raftery, 2007), and run time. We assessed convergence of MCMC chains by visually monitoring autocorrelations and checking the accuracy of parameter estimates using effective sample size (ESS) (Gelman et al., 2013, section 10.5) and Monte Carlo standard errors (MCSE) with batch size 50 (Flegal et al., 2008).
To calculate the CRPS and INT, we assumed that the associated predictive distribution was well approximated by a Gaussian distribution with mean centered at the predicted value and standard deviation equal to the predictive standard error. All NNGP models were specified with at most m nearest neighbors.
4.1 Simulation Example 1
We simulated the response from the LMC model in (2) with q = K = 2 over 1200 randomly generated locations over a unit square. The size of the data set was kept moderate to enable comparisons with the expensive full GP-based LMC models for experiments conducted on the computing setup described earlier. The explanatory variable x(s) consists of an intercept and a single predictor generated from a standard normal distribution. An exponential correlation function was used to model each f_k, that is, ρ_k(s, s'; φ_k) = exp(−φ_k ‖s − s'‖), where ‖s − s'‖ is the Euclidean distance between s and s', and φ_k is the decay for each k. We randomly picked 200 locations for predicting each response to examine the predictive performance. Supplementary Appendix D presents the fixed parameters generating the data and the subsequent posterior estimates.
For the NNGP-based BLMC model, we assigned a flat prior for β. The prior for Λ followed (3) with prior mean a zero matrix and prior scale a diagonal matrix whose diagonal elements are 25. The prior for Σ was set to follow an inverse-Wishart distribution. For the benchmark LMC, we assigned a flat prior for β, an inverse-Wishart prior for the cross-covariance matrix, and IG(2, 0.5) for each diagonal element of Σ. We assigned unif(2.12, 212) as priors of the decays for both models. This implies that the "effective spatial range," which is the distance where spatial correlation drops below 0.05, will be bounded above by √2 (the maximum intersite distance within a unit square) and bounded below by 1/100th of that to ensure a wide range.
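For the exponential correlation, the correlation falls to 0.05 at distance −log(0.05)/φ ≈ 3/φ, so the stated bounds on the effective range translate directly into the unif(2.12, 212) bounds on the decay:

```python
import math

# Effective spatial range for exp(-phi * d): solve exp(-phi * d) = 0.05,
# i.e., d = -log(0.05) / phi, approximately 3 / phi.
max_dist = math.sqrt(2.0)                  # maximum intersite distance, unit square
phi_lo = 3.0 / max_dist                    # range bounded above by sqrt(2)
phi_hi = 3.0 / (max_dist / 100.0)          # range bounded below by sqrt(2)/100

print(round(phi_lo, 2), round(phi_hi, 0))  # 2.12 212.0
```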
Table 1 presents posterior estimates of parameters and performance metrics for all candidate models. Both models provided similar posterior inferences for β, and the 95% credible intervals of the elements of β all include the true values used to generate the data. The NNGP-based BLMC model and the benchmark LMC model cost 2.38 min and around 18.25 h, respectively. Despite the shorter running time, we observed superior performance of the NNGP-based BLMC over the benchmark LMC for inference on the latent process using CVGL, MSEL, CRPSL, and INTL. Moreover, the interpolated maps of the recovered intercept-centered latent processes (Figure 1) by BLMC and benchmark LMC are almost indistinguishable from each other. BLMC and benchmark LMC produce very similar RMSPEs, CRPSs, and INTs. The differences in estimates between the two models are likely emerging from the different prior settings and sampling schemes. The benchmark LMC restricts the loading matrix Λ to be upper triangular, while BLMC does not, resulting in greater flexibility in fitting the latent process. On the other hand, the unidentified parameterization of BLMC causes somewhat less stable inference for the hyperparameters φ_k. The inferences for the intercepts are also less stable due to the sensitivity of the intercept to the latent process. For all other parameters, including the intercept-centered latent process on 1200 locations, the median ESS is 4111.5. All MCSEs were consistently less than 0.02. These diagnostics suggest adequate convergence of the MCMC algorithm.
| Parameter | True | BLMC Inference | BLMC MCSE | Benchmark LMC Inference | Benchmark LMC MCSE |
|---|---|---|---|---|---|
| β11 | 1.0 | 0.705 (0.145, 1.233) | 0.034 | 0.806 (0.502, 1.131) | 0.002 |
| β12 | −1.0 | −1.24 (−1.998, −0.529) | 0.045 | −1.1 (−1.533, −0.646) | 0.001 |
| β21 | −5.0 | −4.945 (−5.107, −4.778) | 0.002 | −4.949 (−5.113, −4.787) | 0.004 |
| β22 | 2.0 | 1.979 (1.78, 2.166) | 0.004 | 1.974 (1.785, 2.167) | 0.002 |
| Σ11 | 0.4 | 0.346 (0.283, 0.409) | 0.002 | 0.306 (0.248, 0.364) | 0.003 |
| Σ12 | 0.15 | 0.133 (0.072, 0.194) | 0.003 | 0.0 | – |
| Σ22 | 0.3 | 0.29 (0.198, 0.386) | 0.004 | 0.233 (0.159, 0.334) | 0.005 |
| ϕ1 | 6.0 | 8.723 (4.292, 14.065) | 0.343 | 12.839 (8.805, 17.471) | 0.23 |
| ϕ2 | 18.0 | 22.63 (15.901, 29.555) | 0.416 | 18.075 (12.99, 23.741) | 0.301 |
| RMSPE^a | – | [0.728, 0.756, 0.742] | | [0.725, 0.762, 0.744] | |
| MSEL^b | – | [0.136, 0.168, 0.152] | | [0.147, 0.192, 0.169] | |
| CRPS^a | – | [−0.412, −0.423, −0.418] | | [−0.41, −0.427, −0.418] | |
| CRPSL^b | – | [−0.035, −0.038, −0.036] | | [−0.216, −0.248, −0.232] | |
| CVG^a | – | [0.915, 0.955, 0.935] | | [0.925, 0.96, 0.9425] | |
| CVGL^b | – | [0.946, 0.962, 0.954] | | [0.756, 0.773, 0.765] | |
| INT^a | – | [3.378, 3.756, 3.567] | | [3.347, 3.823, 3.585] | |
| INTL^a | – | [0.282, 0.329, 0.305] | | [1.875, 2.023, 1.949] | |
| Time (s) | – | 143 | | [42047, 23664]^c | |
^a [response 1, response 2, all responses].
^b Intercept + latent process on 1000 observed locations for [response 1, response 2, all responses].
^c [time for MCMC sampling, time for recovering predictions].

Figure 1: Interpolated maps of (a) & (d) the true generated intercept-centered latent processes, and the posterior means of the intercept-centered latent process ω from (b) & (e) the NNGP-based BLMC model and (c) & (f) the benchmark LMC model. Heat maps of (l) the actual finite sample correlation among latent processes and (g)–(k) the posterior mean of the finite sample correlation among latent processes based on the posterior samples of Ω. This figure appears in color in the electronic version of this article, and any mention of color refers to that version.
4.2 Simulation Example 2
We generated 100 different data sets using (2) with a diagonal Σ (i.e., independent measurement errors across outcomes). Supplementary Appendix D presents the parameter values used to generate the data sets. We fixed a set of 1200 irregularly situated locations inside a unit square. The explanatory variable x(s) comprised an intercept and two predictors generated independently from a standard normal distribution. The same set of locations and explanatory variables were used for the 100 data sets. Each f_k was generated using an exponential covariance function, ρ_k(s, s'; φ_k) = exp(−φ_k ‖s − s'‖), where φ_k was the decay for f_k. We held out 200 locations for assessing predictive performances.
For each simulated data set, we fitted the BLMC model specifying a diagonal Σ with K from 1 to 10. Each φ_k has a Gamma prior with shape and scale equaling 2 and 4.24, respectively, so that the expected effective spatial range is half of the maximum intersite distance. We assigned a flat prior for β, a vague prior for Λ following the prior of Λ in the preceding example, and IG(2, 1.0) priors for the diagonal elements of Σ.
The posterior mean and the 95% credible interval of CVGL, CVG, RMSPE, and the diagnostic metric MCSE for the regression slopes and Σ across the 100 simulation studies are summarized by K in Table 2. Inference for CVG and MCSE was robust to the choice of K: all of the 95% credible intervals for CVG and MCSE were within [0.9, 0.99] and [0.0, 0.02], respectively. As shown in Table 2, the performance metrics improved quickly as K increased from 1 to 10; on average, RMSPE decreased by about 30.9% and CVGL increased from 28% to 95%. Given that the data come from an LMC model, we can conclude that BLMC with diagonal Σ is efficient in obtaining inference for the latent processes even when K is not adequately large. We also create heat maps of the posterior mean of the finite sample correlation matrix among the latent processes based on posterior samples of Ω = (ω − 1ω̄ᵀ)ᵀ(ω − 1ω̄ᵀ)/n, with ω̄ the vector of column means of ω. Figures 1(g)–1(k) depict such heat maps from one of the 100 simulated data sets. As K increases from 2 to 10, the estimated correlation matrix approaches the true correlation matrix. The plots also reveal that the performance of BLMC is sensitive to the choice of K. We recommend choosing K based on scientific considerations for the problem at hand and exploratory data analyses, or checking the RMSPE for different K and picking K by an elbow rule (Thorndike, 1953).
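The finite sample correlation among the latent processes can be computed from each posterior sample of ω by centering the columns, forming the cross-product, and converting the resulting covariance to a correlation matrix. A minimal sketch with stand-in simulated values (all names are ours):

```python
import numpy as np

def finite_sample_corr(omega):
    """omega: (n, K) posterior sample of the latent process over n
    locations. Returns the K x K finite sample correlation matrix
    of its columns."""
    centered = omega - omega.mean(axis=0)            # subtract column means
    Omega = centered.T @ centered / omega.shape[0]   # finite sample covariance
    d = np.sqrt(np.diag(Omega))
    return Omega / np.outer(d, d)                    # covariance -> correlation

rng = np.random.default_rng(0)
omega = rng.standard_normal((1000, 3))  # stand-in for one posterior sample
R = finite_sample_corr(omega)
print(R.shape)  # (3, 3)
```

Averaging such matrices over posterior samples gives the heat maps shown in Figures 1(g)–1(k); the normalizing constant in Ω cancels when converting to a correlation.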
Table 2: Posterior mean (2.5%, 97.5%) percentiles of performance metrics by K

| K | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| CVGL | 0.28 (0.14, 0.93) | 0.39 (0.17, 0.94) | 0.49 (0.2, 0.95) | 0.58 (0.25, 0.96) |
| CVG | 0.95 (0.92, 0.98) | 0.95 (0.92, 0.98) | 0.95 (0.92, 0.98) | 0.95 (0.92, 0.98) |
| RMSPE | 2.07 (1.94, 2.18) | 1.96 (1.86, 2.05) | 1.87 (1.78, 1.97) | 1.78 (1.7, 1.85) |
| MCSE | 0.004 (0.002, 0.008) | 0.005 (0.002, 0.01) | 0.005 (0.003, 0.01) | 0.004 (0.003, 0.01) |

| K | 5 | 6 | 7 | 8 |
|---|---|---|---|---|
| CVGL | 0.67 (0.29, 0.96) | 0.74 (0.33, 0.96) | 0.81 (0.38, 0.96) | 0.86 (0.44, 0.96) |
| CVG | 0.95 (0.92, 0.98) | 0.95 (0.92, 0.98) | 0.95 (0.92, 0.98) | 0.95 (0.91, 0.98) |
| RMSPE | 1.7 (1.62, 1.77) | 1.63 (1.55, 1.69) | 1.56 (1.5, 1.63) | 1.51 (1.45, 1.57) |
| MCSE | 0.005 (0.003, 0.01) | 0.005 (0.003, 0.013) | 0.005 (0.003, 0.011) | 0.005 (0.003, 0.011) |

| K | 9 | 10 |
|---|---|---|
| CVGL | 0.91 (0.59, 0.96) | 0.95 (0.92, 0.96) |
| CVG | 0.95 (0.91, 0.98) | 0.95 (0.91, 0.98) |
| RMSPE | 1.46 (1.41, 1.51) | 1.43 (1.38, 1.48) |
| MCSE | 0.005 (0.003, 0.01) | 0.005 (0.003, 0.01) |
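The elbow rule mentioned in Section 4.2 can be applied directly to the average RMSPE values in Table 2. One common implementation (our choice of variant, not necessarily the authors') picks the K whose point falls farthest below the chord joining the first and last points of the RMSPE curve:

```python
# Average RMSPE by K = 1..10, read from Table 2.
rmspe = [2.07, 1.96, 1.87, 1.78, 1.70, 1.63, 1.56, 1.51, 1.46, 1.43]

def elbow(values):
    """Return the 1-based index (here, K) of the point with the largest
    vertical gap below the straight line from the first to the last value."""
    n = len(values)
    slope = (values[-1] - values[0]) / (n - 1)
    gaps = [values[0] + slope * i - v for i, v in enumerate(values)]
    return max(range(n), key=gaps.__getitem__) + 1

print(elbow(rmspe))  # -> 5 for these values
```

For this particular curve the rule selects K = 5, though, as the text notes, scientific considerations should also inform the choice.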
5 Remote-Sensed Vegetation Data Analysis
We apply our proposed models to analyze the normalized difference vegetation index (NDVI) and enhanced vegetation index (EVI), which measure vegetation activity on the land surface and can help us understand the global distribution of vegetation types as well as their biophysical and structural properties and spatial variations. Apart from vegetation indices, we consider gross primary productivity (GPP) data, the global terrestrial evapotranspiration (ET) product, and land cover data (see Ramon Solano et al., 2010; Mu et al., 2013; Sulla-Menashe and Friedl, 2018, for further details). The geographic coordinates of our variables were mapped on a Sinusoidal (SIN) projection grid. We focus on zone h08v05, which covers 11 119 505 to 10 007 555 m west of the prime meridian and 3 335 852 to 4 447 802 m north of the equator, situated in the western United States. Our explanatory variables included an intercept and a binary indicator of no vegetation or urban area derived from the 2016 land cover data. All other variables were measured by the MODIS satellite over a 16-day period from 2016.04.06 to 2016.04.21. Some variables were rescaled and transformed during exploratory data analysis to improve model fitting. The data sets were downloaded using the R package MODIS, and the code for the exploratory data analysis is provided as supplementary material to this article.
Our data comprise 1 020 000 observed locations, which we use to illustrate the proposed model. Our spatially dependent outcomes were the transformed NDVI (labeled NDVI) and red reflectance (red refl). A Bayesian multivariate regression model, defined by (2) excluding the latent process, was also fitted for comparison. All NNGP-based models used the same number of nearest neighbors. We randomly held out 10% of each response, and additionally held out all responses over the region 10 400 000 to 10 300 000 m west of the prime meridian and 3 800 000 to 3 900 000 m north of the equator, to evaluate the models' predictive performance over a missing region (white square) and at randomly missing locations. Figure 2(a) maps the transformed NDVI data.

Colored NDVI and red reflectance images of the western United States (zone h08v05). Maps of the raw data (a) & (d) and the posterior mean of the intercept-centered latent process recovered from (b) & (e) BLMC and (c) & (f) BLMC with diagonal Σ. Correlation of responses (g) and posterior mean of the finite sample correlation among latent processes from the BLMC model with diagonal Σ (h). Heat map (i) of counts of observed responses: each observed location is marked with a dot whose color represents the number of observed responses at that location; the greener the color, the higher the count. This figure appears in color in the electronic version of this article, and any mention of color refers to that version
We ran both models for 5000 iterations after 5000 burn-in iterations. The priors for all parameters except the decays followed those in Section 4. Based on variograms fitted to the raw data, we assigned Gamma(200, 0.02) and Gamma(200, 0.04) priors to ϕ1 and ϕ2 for BLMC. All code was run on a single thread, with no other processes running simultaneously, so as to provide an accurate measure of computing time.
Table 3 presents results for the BLMC. The regression coefficients for the no-vegetation-or-urban-area indicator show relatively low biomass (low NDVI) and high red reflectance over non-vegetated and urban areas. Estimates of Σ and the finite sample process covariance matrix Ω, as defined in Section 4.2, show a negative association between the residuals and latent processes of transformed NDVI and red reflectance, consistent with the known relationship between the two responses. BLMC captured a high negative correlation (about −0.87, based on the posterior means of Ω in Table 3) between the latent processes of the two responses, indicating that the spatial patterns of the latent processes of NDVI and red reflectance are almost the reverse of each other. The maps of the latent processes recovered by BLMC, presented in Figure 2, also support this relationship.
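The high negative cross-correlation follows directly from the posterior means of Ω reported in Table 3:

```python
import math

# Posterior means of the finite sample process covariance Ω (Table 3).
omega11 = 1.675e-2   # var of latent process, transformed NDVI
omega12 = -6.873e-3  # covariance between the two latent processes
omega22 = 3.764e-3   # var of latent process, red reflectance

# Convert the covariance entries to a correlation.
corr = omega12 / math.sqrt(omega11 * omega22)
print(round(corr, 2))  # -> -0.87
```

A correlation near −0.87 confirms that the two latent surfaces are close to mirror images of each other.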
Table 3: Vegetation data analysis summary: posterior mean (2.5%, 97.5%) percentiles
| Parameter | Bayesian linear model: inference | BLMC: inference | BLMC: MCSE |
|---|---|---|---|
| Intercept1 | 0.2515 (0.2512, 0.2517) | 0.1433 (0.1418, 0.1449) | 1.145e-4 |
| Intercept2 | 0.1395 (0.1394, 0.1396) | 0.1599 (0.159, 0.1608) | 6.17e-5 |
| No veg or urban area1 | −0.1337 (−0.1346, −0.1328) | −1.385e-2 (−1.430e-2, −1.342e-2) | 1.69e-5 |
| No veg or urban area2 | 6.035e-2 (5.992e-2, 6.075e-2) | 7.831e-3 (7.584e-3, 8.097e-3) | 8.24e-6 |
| Σ11 | 1.599e-2 (1.594e-2, 1.603e-2) | 3.514e-4 (3.477e-4, 3.553e-4) | 1.93e-7 |
| Σ12 | −6.491e-3 (−6.512e-3, −6.471e-3) | −1.084e-4 (−1.100e-4, −1.067e-4) | 8.19e-8 |
| Σ22 | 3.656e-3 (3.646e-3, 3.667e-3) | 1.074e-4 (1.063e-4, 1.084e-4) | 4.79e-8 |
| Ω11 | – | 1.675e-2 (1.674e-2, 1.676e-2) | 4.17e-7 |
| Ω12 | – | −6.873e-3 (−6.879e-3, −6.867e-3) | 1.77e-7 |
| Ω22 | – | 3.764e-3 (3.760e-3, 3.768e-3) | 9.06e-8 |
| ϕ1 | – | 3.995 (3.887, 4.075) | 7.535e-3 |
| ϕ2 | – | 12.376 (11.512, 13.320) | 7.60e-3 |
| RMSPEa | [0.074, 0.0359, 0.0581] | [0.0326, 0.0171, 0.0260] | |
| CRPSa | [−0.04135, −0.01988, −0.03061] | [−0.01561, −0.00879, −0.0122] | |
| CVGa | [0.956, 0.958, 0.957] | [0.954, 0.947, 0.950] | |
| INTa | [0.3468, 0.1711, 0.2589] | [0.1965, 0.0995, 0.1480] | |
| Time (min) | 11 | 2318 | |
a[1st response transformed NDVI, 2nd response red reflectance, all responses].
We provide RMSPE, CVG, CRPS, INT, MCSE, and run time in Table 3. BLMC substantially improved predictive accuracy: its RMSPEs were more than 50% lower than those of the Bayesian linear model. CVG is similar between the two models, while INT and CRPS also favored BLMC over the Bayesian linear model. Figure 2 presents the estimated latent processes from BLMC. Notably, BLMC smooths out the predictions in the held-out region. The model's run time was around 38.6 h, which is still impressive given the full model-based analysis it offers for such a massive multivariate spatial data set.
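The "more than 50% lower" claim can be checked directly against the RMSPE rows of Table 3:

```python
# RMSPEs from Table 3: [response 1 (NDVI), response 2 (red refl), all responses].
rmspe_blm  = [0.074, 0.0359, 0.0581]   # Bayesian linear model
rmspe_blmc = [0.0326, 0.0171, 0.0260]  # BLMC

# Relative reduction in RMSPE achieved by BLMC for each entry.
reductions = [1 - b / a for a, b in zip(rmspe_blm, rmspe_blmc)]
print([round(r, 3) for r in reductions])  # each reduction exceeds 0.5
```

The reductions are roughly 56%, 52%, and 55% for response 1, response 2, and all responses, respectively.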
We also fitted a BLMC with diagonal Σ to explore the underlying latent processes of 10 (transformed) responses: (i) NDVI, (ii) EVI, (iii) GPP, (iv) net photosynthesis (PsnNet), (v) red reflectance (red refl), (vi) blue reflectance (blue refl), (vii) average daily global ET, (viii) latent heat flux (LE), (ix) potential ET (PET), and (x) potential LE (PLE). In total, 12 057 locations have no responses, and 656 366 observed locations have misaligned data (at least one but not all responses), covering 65.12% of observed locations. The heat map in Figure 2(i) presents the status of misalignment over the study domain.
Based on the exploratory analysis, we observed two groups of responses with high within-group correlations but relatively low between-group correlations (see Figure 2(g)), and picked K accordingly. Estimates from the BLMC model are presented in Table 4. No vegetation or urban area exhibits lower vegetation indices (lower NDVI and EVI) and lower production of chemical energy in organic compounds by living organisms (lower GPP and PsnNet). We observe a trend of higher blue reflectance, red reflectance, and ET (higher LE) and lower potential ET (lower PET and PLE) in urban and non-vegetated areas. We provide maps of posterior predictions for all 10 variables in Supplementary Appendix F. The latent processes corresponding to transformed NDVI and red reflectance fitted in the two analyses (Figure 2) share a similar pattern. Finally, the heat map of the posterior mean of the finite sample correlation among the latent processes (elements of Ω as defined in Section 4.2) based on BLMC with diagonal Σ, presented in Figure 2(h), reveals a high underlying correlation among NDVI, EVI, GPP, PsnNet, and red and blue reflectance, and shows that LE and ET are slightly more correlated with NDVI and EVI than are PLE and PET. The total run time for BLMC with diagonal Σ was around 60.7 h (3642.25 min).
Table 4: Vegetation data analysis summary for BLMC with diagonal Σ: posterior mean (2.5%, 97.5%) percentiles

| Response | Slope | MCSE | Nugget (diagonal of Σ) | MCSE |
|---|---|---|---|---|
| NDVI | −0.0120 (−0.0124, −0.0116) | 1.37e-5 | 7.46e-4 (7.42e-4, 7.49e-4) | 7.60e-8 |
| EVI | −4.38e-3 (−4.68e-3, −4.08e-3) | 6.86e-6 | 8.68e-4 (8.65e-4, 8.7e-4) | 3.07e-8 |
| GPP | −0.197 (−0.199, −0.194) | 8.31e-5 | 0.0244 (0.0243, 0.0245) | 2.34e-6 |
| PsnNet | −4.48e-3 (−5.39e-3, −3.50e-3) | 3.42e-5 | 5.34e-3 (5.32e-3, 5.36e-3) | 3.50e-7 |
| red refl | 4.49e-3 (4.20e-3, 4.77e-3) | 5.11e-6 | 9.84e-4 (9.81e-4, 9.87e-4) | 3.13e-8 |
| blue refl | 0.0123 (0.0121, 0.0124) | 2.74e-6 | 2.60e-4 (2.59e-4, 2.61e-4) | 8.81e-9 |
| LE | 0.0908 (0.0884, 0.0932) | 1.36e-4 | 0.0531 (0.0529, 0.0533) | 2.37e-6 |
| ET | 0.0919 (0.0895, 0.0944) | 1.49e-4 | 0.0531 (0.053, 0.0533) | 2.27e-6 |
| PLE | −3.64e-3 (−3.98e-3, −3.36e-3) | 5.50e-5 | 2.095e-5 (2.086e-5, 2.104e-5) | 1.63e-9 |
| PET | −4.88e-3 (−5.99e-3, −3.96e-3) | 1.81e-4 | 6.50e-5 (6.44e-5, 6.57e-5) | 2.20e-8 |
6 Summary and Discussion
We have proposed scalable models for analyzing massive and possibly misaligned multivariate spatial data sets. Our framework offers flexible covariance structures and scalability by modeling the loading matrix of spatial factors using matrix-normal distributions and the factors themselves as NNGPs. This process-based formulation allows us to resolve spatial misalignment by fully model-based imputation. Through a set of simulation examples and an analysis of a massive misaligned data set comprising remote-sensed variables, we demonstrated the inferential and computational benefits accrued from our proposed framework.
This work can be expanded further in at least two important directions. The first is to extend the current methods to spatiotemporal data sets, where multiple variables are indexed by spatial coordinates, as considered here, as well as by temporal indices. Associations are likely to be exhibited across space and time as well as among the variables within a location and time-point. In addition, these variables are likely to be misaligned across time and space. Regarding the scalability of the spatiotemporal process, we can build a dynamic nearest-neighbor Gaussian process (DNNGP) (Datta et al., 2016) to model spatiotemporal factors and one can also envisage temporal dependence on the loading matrix.
A second direction will consider spatially varying coefficient models, where the regression coefficients β are modeled using a spatial (or spatiotemporal) random field to capture spatial (or spatiotemporal) patterns in how some of the predictors impact the outcomes. We can assign β a prior given by a multivariate Gaussian random field with a proportional cross-covariance function; the prior on β over the observed locations then follows a matrix-normal distribution, which is the prior we designed for β in all of the models proposed in this article. While the modification appears straightforward, the actual implementation requires more detailed exploration, and we leave these topics for future work.
From a computational perspective, we clearly need to further explore high-performance computing and high-dimensional spatial models amenable to such platforms. The programs provided with this work are for illustration and make limited use of graphics processing unit (GPU) computing and parallelized CPU computing. A parallel CPU implementation of the BLMC model could sample multiple MCMC chains simultaneously, improving the performance of practical implementations. Implementations of modeling methods such as MRA (Katzfuss, 2017) also require dedicated GPU programming. Other scalable modeling methods that build graphical Gaussian models over space, time, and variables can lead to sparse models for high-dimensional multivariate data that scale not only to millions of locations and time points, but also to hundreds or even thousands of spatially or spatiotemporally oriented variables. The idea here will be to extend current developments in Vecchia-type models to graphs encoding dependence among a large number of variables, so that the precision matrices across space, time, and variables are sparse. Research on scalable statistical models and high-performance computing algorithms for such models will be of substantial interest to statisticians and environmental scientists.
Data Availability Statement
The simulated and remote-sensed vegetation index data that support the findings in this article are openly available at GitHub at https://github.com/LuZhangstat/Multi_NNGP, Zhang (2021).
Acknowledgments
The work of the authors has been supported in part by National Science Foundation (NSF) under grants NSF/DMS 1916349 and NSF/IIS 1562303, and by the National Institute of Environmental Health Sciences (NIEHS) under grants R01ES030210 and 5R01ES027027.
References
Supplementary data
The Web Appendices referenced in Sections 2.4, 3, 4 and 5, the MODIS vegetation indices data analyzed in Section 5, and the Julia code implementing our models are available with this paper at the Biometrics website on Wiley Online Library. The data and code are also available at https://github.com/LuZhangstat/Multi_NNGP.