Lu Zhang, Sudipto Banerjee, Spatial Factor Modeling: A Bayesian Matrix-Normal Approach for Misaligned Data, Biometrics, Volume 78, Issue 2, June 2022, Pages 560–573, https://doi.org/10.1111/biom.13452
Abstract
Multivariate spatially oriented data sets are prevalent in the environmental and physical sciences. Scientists seek to jointly model multiple variables, each indexed by a spatial location, to capture any underlying spatial association for each variable and associations among the different dependent variables. Multivariate latent spatial process models have proved effective in driving statistical inference and rendering better predictive inference at arbitrary locations for the spatial process. High-dimensional multivariate spatial data, which are the theme of this article, refer to data sets where the number of spatial locations and the number of spatially dependent variables are both very large. The field has witnessed substantial developments in scalable models for univariate spatial processes, but such methods for multivariate spatial processes, especially when the number of outcomes is moderately large, are limited in comparison. Here, we extend scalable modeling strategies for a single process to multivariate processes. We pursue Bayesian inference, which is attractive for full uncertainty quantification of the latent spatial process. Our approach exploits distribution theory for the matrix-normal distribution, which we use to construct scalable versions of a hierarchical linear model of coregionalization (LMC) and spatial factor models that deliver inference over a high-dimensional parameter space including the latent spatial process. We illustrate the computational and inferential benefits of our algorithms over competing methods using simulation studies and an analysis of a massive vegetation index data set.
1 Introduction
Statistical modeling for multiple spatially oriented variables is required to capture underlying spatial associations in each variable and to account for inherent associations among the different variables. As an example, to which we return later, consider a set of spatially indexed spectral variables for vegetation activity on the land. Such variables exhibit strong spatial dependence, as customarily exhibited through plots of spatial variograms and other exploratory maps. In addition, the variables are assumed to be associated with each other because of shared physical processes that manifest through the observations.
Modeling each variable separately captures the spatial distribution of that variable independent of other variables. Such an analysis ignores associations among the variables and can impair prediction or interpolation (see, e.g., Wackernagel, 2003; Chiles and Delfiner, 2009; Gelfand and Banerjee, 2010; Cressie and Wikle, 2015). Each of the aforementioned works provides ample evidence, theoretical and empirical, in favor of joint modeling of multiple spatially indexed variables. Joint modeling, or multivariate spatial analysis, is especially pertinent in the presence of spatial misalignment, where not all variables have been observed over the same set of locations. For example, suppose Y(s) is a normalized difference vegetation index (NDVI) and X(s) is red reflectance. If location s_0 has yielded a measurement for X(s_0) but not for Y(s_0), then optimal imputation of Y(s_0) should proceed from p(Y(s_0) | Y, X), where Y and X comprise all measurements on Y(s) and X(s). If the processes Y(·) and X(·) are modeled as independent, then the predictive distribution p(Y(s_0) | Y, X) reduces to p(Y(s_0) | Y) and will not exploit the possible predictive information present in X for Y(s_0). This specific issue has also been discussed, with examples, in Banerjee and Gelfand (2002).
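To see concretely why a joint model sharpens imputation, consider a toy bivariate normal for (Y(s_0), X(s_0)) with a hypothetical cross-correlation (the value −0.5 below is illustrative, not estimated from data); conditioning on X shifts the predictive mean and shrinks the predictive variance relative to the independence model:

```python
import numpy as np

# Hypothetical joint covariance for (Y(s0), X(s0)): unit variances and a
# cross-correlation rho between NDVI Y and red reflectance X (illustrative).
rho = -0.5
Sigma = np.array([[1.0, rho],
                  [rho, 1.0]])

x_obs = 0.5                    # X measured at s0; Y missing there

# Conditional distribution Y | X = x_obs under a zero-mean bivariate normal
cond_mean = Sigma[0, 1] / Sigma[1, 1] * x_obs
cond_var = Sigma[0, 0] - Sigma[0, 1] ** 2 / Sigma[1, 1]

# A joint model borrows strength from X: the conditional variance 1 - rho^2
# is smaller than the marginal variance 1.0 used by an independent model.
print(cond_mean)               # -0.25
print(cond_var)                # 0.75
```

The independence model would predict Y(s_0) with mean 0 and variance 1; the joint model uses the observed X(s_0) to do strictly better whenever ρ ≠ 0.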
Joint modeling is driven by vector-valued latent spatial stochastic processes, such as a multivariate Gaussian process. These are specified with matrix-valued cross-covariance functions (see, e.g., Le and Zidek, 2006; Genton and Kleiber, 2015; Salvaña and Genton, 2020, and references therein) that model pairwise associations at distinct locations. Theoretical properties of cross-covariances are well established, but practical modeling implications and computational efficiency require specific considerations depending upon the application (see, e.g., Le et al., 1997; Sun et al., 1998; Le et al., 2001; Schmidt and Gelfand, 2003; Gamerman and Moreira, 2004; Banerjee et al., 2014).
High-dimensional multivariate spatial models deal with a large number of dependent variables over a massive number of locations. While analyzing massive spatial and spatial-temporal databases has received attention (see, e.g., Sun et al., 2011; Banerjee, 2017; Heaton et al., 2019; Zhang et al., 2020), the bulk of methods have focused on one or very few (two or three) spatially dependent variables and often rely upon restrictive assumptions that preclude full inference on the latent process. With larger numbers of dependent variables, modeling the cross-covariance becomes challenging. Even for stationary cross-covariance functions, where we assume that the associations among the variables do not change over space and that the spatial association for each variable depends only on the difference between two positions, matters become computationally challenging.
This article builds upon the popular linear models of coregionalization (Bourgault and Marcotte, 1991; Goulard and Voltz, 1992; Wackernagel, 2003; Gelfand et al., 2004; Chiles and Delfiner, 2009; Genton and Kleiber, 2015). Our contributions include: (i) developing a hierarchical model with a matrix-normal distribution as a prior for an unknown linear transformation on latent spatial processes, (ii) extending classes of spatial factor models for spatially misaligned data, and (iii) accounting for multiple outcomes over a very large number of locations. Spatial factor models have been explored by Wang and Wall (2003), Lopes et al. (2008), Ren and Banerjee (2013), and Taylor-Rodriguez et al. (2019). Lopes et al. (2008) provide an extensive discussion on how hierarchical models emerged from dynamic factor models. Ren and Banerjee (2013) proposed low-rank specifications for spatially varying factors to achieve dimension reduction, but such low-rank specifications tend to oversmooth the latent process in massive data sets containing millions of locations. More recently, Taylor-Rodriguez et al. (2019) consider nearest-neighbor Gaussian processes (Datta et al., 2016) for spatial factors with the usual constrained loading matrices of nonspatial factor models. These are more restrictive than needed for identifying spatially correlated factors (see, e.g., Ren and Banerjee, 2013).
We develop our modeling framework in Section 2. Section 3 presents some theoretical results about posterior consistency for the proposed models. Simulation studies for exploring the performance of proposed models are summarized in Section 4. Section 5 presents an application to remote-sensed vegetation analysis on land surfaces. We conclude with some discussion in Section 6.
2 Multivariate Spatial Processes
Let ω(s) = (ω_1(s), …, ω_q(s))^T be a q × 1 stochastic process, where each ω_i(s) is a real-valued random variable at location s. The process is specified by its mean μ(s) = E[ω(s)] and, customarily, second-order stationary covariances C_{ij}(s, s') = cov(ω_i(s), ω_j(s')) for i, j = 1, …, q. These covariances define the matrix-valued q × q cross-covariance function C(s, s') with (i, j)-th entry C_{ij}(s, s'). While there is no loss of generality in assuming the process mean to be zero by absorbing the mean into a separate regression component in the model, as we will do here, modeling the cross-covariance function requires care. From its definition, C(s, s') need not be symmetric, but must satisfy C(s, s') = C(s', s)^T. Also, since var(∑_{i=1}^n a_i^T ω(s_i)) ≥ 0 for any set of finite locations {s_1, …, s_n} and any set of constant vectors {a_1, …, a_n}, we have ∑_{i=1}^n ∑_{j=1}^n a_i^T C(s_i, s_j) a_j ≥ 0. Genton and Kleiber (2015) provide a comprehensive review of cross-covariance functions.
Perhaps the most widely used approach for constructing multivariate random fields is the linear model of coregionalization (LMC). This hinges on invertible linear maps of independent spatial processes yielding valid spatial processes. If f(s) = (f_1(s), …, f_K(s))^T is a K × 1 vector of independent spatial processes so that cov(f_i(s), f_j(s')) = 0 for all i ≠ j and any two locations s and s' (same or distinct), then the LMC (Bourgault and Marcotte, 1991) specifies

ω(s) = Λ f(s),    (1)

where ω(s) is q × 1, Λ is q × K, λ_k^T is the k-th row of Λ, and each f_k(s) is an independent Gaussian process with correlation function ρ_k(·, ·; φ_k) with parameters φ_k. The cross-covariance for ω(s) yields nondegenerate process-realizations whenever K = q and Λ is nonsingular. To achieve dimension reduction in the number of variables, we restrict K < q so we have nondegenerate realizations in a K-dimensional subspace.
Schmidt and Gelfand (2003) propose multivariate spatial processes through a hierarchical spatial conditional model, whereupon Λ in (1) is a q × q lower triangular matrix. Other variants of the LMC (e.g., Goulard and Voltz, 1992) can also be recast as (1) using linear algebra. The flexibility offered in modeling Λ is appealing and, in particular, can accrue computational benefits in high-dimensional settings. Hence, we build upon (1).
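The cross-covariance implied by (1) is C(s, s') = Λ diag{ρ_1(s, s'; φ_1), …, ρ_K(s, s'; φ_K)} Λ^T. A minimal numerical sketch (dimensions and parameter values are illustrative) confirming that this construction satisfies the general requirements C(s, s') = C(s', s)^T and C(s, s) = ΛΛ^T:

```python
import numpy as np

rng = np.random.default_rng(0)
q, K = 3, 2
Lam = rng.normal(size=(q, K))       # loading matrix Lambda (illustrative values)
phi = np.array([6.0, 18.0])         # decay parameters, one per factor

def cross_cov(s, t):
    """C(s, t) = Lam diag(rho_k(||s - t||)) Lam^T with exponential correlations."""
    h = np.linalg.norm(np.asarray(s) - np.asarray(t))
    return Lam @ np.diag(np.exp(-phi * h)) @ Lam.T

C_st = cross_cov((0.1, 0.2), (0.4, 0.6))
C_ts = cross_cov((0.4, 0.6), (0.1, 0.2))

# Symmetry across arguments holds by construction, and at distance zero every
# correlation equals 1, so C(s, s) collapses to Lam Lam^T.
assert np.allclose(C_st, C_ts.T)
assert np.allclose(cross_cov((0.1, 0.2), (0.1, 0.2)), Lam @ Lam.T)
```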
2.1 A Bayesian LMC (BLMC) Factor Model
Let y(s) denote the q × 1 vector of dependent outcomes in location s, let x(s) be the corresponding p × 1 vector of explanatory variables, and let β be a p × q regression coefficient matrix in the multivariate spatial model

y(s) = β^T x(s) + ω(s) + ε(s),    (2)

where the latent process ω(s) is an LMC as described above. Elements in ω(s) are as described in (1), while the noise process ε(s) ~ N(0, Σ) with covariance matrix Σ. We model {β, Λ, Σ} using a Matrix-Normal-Inverse-Wishart family. To be precise,

γ | Σ ~ MN(μ_γ, V_γ, Σ),  Σ ~ IW(Ψ, ν),    (3)

where γ = [β^T, Λ]^T stacks the regression and loading coefficients, μ_γ is a (p + K) × q matrix, and V_γ is a (p + K) × (p + K) positive definite matrix. A random matrix Z ~ MN_{n×p}(M, U, V) has the probability density function (Dawid, 1981)

p(Z | M, U, V) = (2π)^{−np/2} |V|^{−n/2} |U|^{−p/2} exp{ −(1/2) tr[ V^{−1} (Z − M)^T U^{−1} (Z − M) ] },

where tr(·) is the trace function, M is the mean matrix, U is the first scale matrix with dimension n × n, and V is the second scale matrix with dimension p × p. This distribution is equivalent to vec(Z) ~ N(vec(M), V ⊗ U), where ⊗ is the Kronecker product and vec(Z) is the vectorized n × p random matrix Z. We refer to the model specified through (2)–(3) as the BLMC factor model.
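The equivalence between the matrix-normal density and its vectorized multivariate normal form can be verified numerically; the sketch below (dimensions and scale matrices illustrative) evaluates both log-densities with plain NumPy and checks that they agree:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 4, 3
M = rng.normal(size=(n, p))                               # mean matrix
A = rng.normal(size=(n, n)); U = A @ A.T + n * np.eye(n)  # n x n row scale
B = rng.normal(size=(p, p)); V = B @ B.T + p * np.eye(p)  # p x p column scale
Z = rng.normal(size=(n, p))

# Matrix-normal log-density of MN_{n x p}(M, U, V) at Z
R = Z - M
quad = np.trace(np.linalg.solve(V, R.T) @ np.linalg.solve(U, R))
logdet = p * np.linalg.slogdet(U)[1] + n * np.linalg.slogdet(V)[1]
ld_mn = -0.5 * quad - 0.5 * logdet - 0.5 * n * p * np.log(2 * np.pi)

# Equivalent vectorized form: vec(Z) ~ N(vec(M), V kron U), column-major vec
C = np.kron(V, U)
r = Z.T.ravel() - M.T.ravel()                             # column-major residual
ld_vec = (-0.5 * r @ np.linalg.solve(C, r)
          - 0.5 * np.linalg.slogdet(C)[1]
          - 0.5 * n * p * np.log(2 * np.pi))

assert np.isclose(ld_mn, ld_vec)
```

The trace form avoids ever building the np × np Kronecker covariance, which is exactly why the matrix-normal parameterization scales while the naive vectorized form does not.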
Without misalignment, the observation model in (2) can be cast as

Y = Xβ + FΛ^T + E,

where Y = (y(s_1), …, y(s_n))^T is the n × q response matrix, X is the corresponding n × p design matrix with full rank p, E is the n × q matrix of noise terms, and F is the n × K matrix with j-th column being the n × 1 vector comprising f_j(s_i)'s for i = 1, …, n.
The parameters Λ and F are not jointly identified in factor models, and some constraints are required to ensure identifiability (Ren and Banerjee, 2013; Lopes and West, 2004). These constraints are not without problems. For example, a lower trapezoidal (triangular for K = q) specification for Λ imposes possibly unjustifiable conditional independence on the spatial processes. Alternatively, ordering the spatial range parameters can ensure identifiability but creates difficulties in computation and interpretation. We avoid such constraints and instead transform posterior samples of F and Λ to obtain inference for the latent process. This parameterization yields conditional conjugate distributions and, therefore, efficient posterior sampling. We elucidate below in the context of misaligned data.
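The lack of joint identifiability is easy to demonstrate: for any K × K orthogonal matrix Q, replacing the factors F by FQ and the loadings Λ by ΛQ leaves FΛ^T, and hence the likelihood, unchanged. A small numerical illustration (dimensions hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
n, q, K = 50, 4, 2
F = rng.normal(size=(n, K))      # latent factor matrix (illustrative)
Lam = rng.normal(size=(q, K))    # loading matrix (illustrative)

# Build an orthogonal K x K rotation Q via a QR decomposition
Q, _ = np.linalg.qr(rng.normal(size=(K, K)))

# (F Q)(Lam Q)^T = F Q Q^T Lam^T = F Lam^T: the fitted surface is unchanged
assert np.allclose(Q @ Q.T, np.eye(K))
assert np.allclose(F @ Lam.T, (F @ Q) @ (Lam @ Q).T)
```

Spatially correlated factors with distinct range parameters break this invariance through the prior, which is why prior information on the φ_k, rather than hard constraints on Λ, can drive identification here.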
2.2 Inference for Spatially Misaligned Data
Let L = {s_1, …, s_n} be the set of locations that have recorded at least one of the observed outcomes and let L_i ⊆ L be the subset of locations that have recorded the i-th response. Then L = L_1 ∪ ⋯ ∪ L_q, and let n_i denote the number of locations in L_i. Let M_i = L \ L_i denote the set of locations where at least one response, but not the i-th response, is recorded, so that M = M_1 ∪ ⋯ ∪ M_q is the set of all locations with incomplete data. We derive the conditional distribution of the latent factors and of the unobserved responses conditional on the observed responses. Let D be the selection matrix that extracts the observed responses from Y in each of the locations in L. The joint distribution of the observed responses and F, given the remaining parameters, can be represented through an augmented linear system,
where the stacked response and design quantities collect the observed outcomes, R_k(φ_k) is the n × n spatial correlation matrix corresponding to ρ_k(·, ·; φ_k), and blockdiag(·) represents the block diagonal operator stacking matrices along the diagonal. Premultiplying the augmented system by the inverse Cholesky factors of the corresponding covariance blocks yields a transformed linear model whose error terms are independent, each with unit variance. The full conditional distribution of F for the LMC model in (2) then follows from standard linear-model theory.
For misaligned data, we will perform Bayesian updating of the outcomes missing at a location s ∈ M. Let m(s) be the suffix that indexes outcomes that are missing at s and o(s) the suffix that indexes those that are observed. Writing μ(s) = β^T x(s) + Λ f(s), the conditional distribution of the missing outcomes given the parameters and the observed outcomes is

y_{m(s)}(s) | y_{o(s)}(s), β, Λ, F, Σ ~ N( μ_{m(s)}(s) + Σ_{m(s),o(s)} Σ_{o(s),o(s)}^{−1} { y_{o(s)}(s) − μ_{o(s)}(s) }, Σ_{m(s),m(s)} − Σ_{m(s),o(s)} Σ_{o(s),o(s)}^{−1} Σ_{o(s),m(s)} ),

where Σ_{m(s),o(s)} is the submatrix of Σ extracted with row and column indices m(s) and o(s), respectively. With the priors given in (3), we let γ = [β^T, Λ]^T and define the augmented design matrix X̃ = [X : F]. Using standard distribution theory for the conjugate matrix-normal linear model, the conditional posterior distribution of {γ, Σ} given Y, F, and φ again belongs to the Matrix-Normal-Inverse-Wishart family,

{γ, Σ} | Y, F, φ ~ MNIW( μ*, V*, Ψ*, ν + n ),  V* = (V_γ^{−1} + X̃^T X̃)^{−1},  μ* = V* (V_γ^{−1} μ_γ + X̃^T Y),  Ψ* = Ψ + Y^T Y + μ_γ^T V_γ^{−1} μ_γ − μ*^T V*^{−1} μ*,

where μ_γ and V_γ are the prior mean and first scale matrix for γ. In particular, with a flat prior on β, the corresponding block of μ_γ is a zero matrix, and the conditional distribution of γ given Σ and the remaining parameters is matrix-normal with the mean and scale matrices given above.
The parameters φ_k, k = 1, …, K, by themselves, are not consistently estimable under in-fill asymptotics. Therefore, irrespective of the sample size (within a fixed domain), inference on φ_k will be sensitive to the choice of the prior. Furthermore, without placing restrictions on the loading matrix or ordering these parameters (Ren and Banerjee, 2013), these parameters are identifiable primarily through the prior. We treat these as unknown and model them using priors based upon customary spatial domain considerations. The full conditional distributions for the φ_k are not available in closed form. However, since {β, Λ, Σ} and φ are conditionally independent given F, and the f_k are independent for k = 1, …, K, we obtain the full conditional density of φ_k up to a proportionality constant as

p(φ_k | F) ∝ p(φ_k) × N(f_k | 0, R_k(φ_k))

for each k = 1, …, K, where p(φ_k) is the prior for φ_k.
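The Metropolis step only needs the logarithm of this full conditional, i.e., the log prior plus the Gaussian log-likelihood of f_k under R_k(φ_k), evaluated at current and proposed values. A sketch with an exponential correlation and a uniform prior (locations and prior bounds illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 60
coords = rng.uniform(size=(n, 2))                    # locations in the unit square
dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

def log_full_conditional(phi, f, lo=2.12, hi=212.0):
    """log p(phi | f) up to a constant: uniform prior plus log N(f | 0, R(phi))."""
    if not (lo < phi < hi):
        return -np.inf                               # outside the prior support
    R = np.exp(-phi * dists)                         # exponential correlation matrix
    L = np.linalg.cholesky(R)
    z = np.linalg.solve(L, f)                        # z = L^{-1} f
    # log N(f | 0, R) = -0.5 f^T R^{-1} f - log|L| - (n/2) log(2 pi)
    return -0.5 * z @ z - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2 * np.pi)

phi_true = 6.0
L = np.linalg.cholesky(np.exp(-phi_true * dists))
f = L @ rng.normal(size=n)                           # one simulated factor draw

lp = log_full_conditional(phi_true, f)
assert np.isfinite(lp)
assert log_full_conditional(500.0, f) == -np.inf     # rejected by the prior
```

Using the Cholesky factor gives both the quadratic form and the log-determinant in one factorization, which is the dominant cost of each Metropolis evaluation.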
Turning to predictions, if U = {u_1, …, u_{n'}} is a set of new locations, then each f_k over U is independent of Y given F and φ. The predictive distribution of each f_k over U therefore follows from standard Gaussian process theory as a multivariate normal for each k = 1, …, K, and the predictive distribution of the responses over U is obtained by composition from (2), using the independence between the new responses and Y given the latent factors and the model parameters. These distributions allow sampling from the posterior predictive distribution over U using the posterior samples of the parameters. We elaborate below.
2.3 The Block Update Markov Chain Monte Carlo (MCMC) Algorithm
We formulate an efficient MCMC algorithm for obtaining full Bayesian inference as follows. From the l-th iteration, we generate {γ, Σ} from their Matrix-Normal-Inverse-Wishart full conditional. Next, we draw the missing outcomes over M from their conditional normal distributions and then update F from its full conditional. We complete the (l + 1)-th iteration by drawing each φ_k through a Metropolis random walk on its full conditional. Upon convergence, these iterations will generate samples from the desired joint posterior distribution.

For inference at new locations, we sample each f_k over U from its Gaussian predictive distribution, given the posterior samples of F and φ, and then generate posterior predictions of the responses from (2). Applying the single component adaptive Metropolis (SCAM) algorithm introduced in Haario et al. (2005), one can avoid tuning parameters in the Metropolis algorithm by warming up each MCMC chain of φ_k with an adaptive proposal distribution. In our implementation, we use the proposal distribution defined by eq. (2.1) in Roberts and Rosenthal (2009).

We sample F as a single block through a linear transformation of the independent variables in the whitened model. Sampling the missing outcomes follows analogously. We significantly improve convergence by reducing the posterior dependence among the parameters in this Gibbs with Metropolis algorithm (Gelman et al., 2013). Since the latent process is sensitive to the value of the intercept, we recommend using an intercept-centered latent process to obtain inference for the latent spatial pattern.
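A sketch of a SCAM-style update in the spirit of Haario et al. (2005): after a warm-up, each component's proposal standard deviation tracks 2.4 times the empirical standard deviation of its own chain history plus a small stabilizing constant. The target below is a toy stand-in (a lognormal-shaped density), not the paper's actual full conditional:

```python
import numpy as np

rng = np.random.default_rng(4)

def log_target(phi):
    """Toy stand-in for log p(phi_k | F): lognormal(mu=2, sigma=1) shape."""
    return -np.inf if phi <= 0 else -0.5 * (np.log(phi) - 2.0) ** 2 - np.log(phi)

phi, lp = 5.0, log_target(5.0)
history, eps, accepted = [phi], 0.05, 0
n_iter = 5000
for it in range(n_iter):
    # Fixed proposal during warm-up, then adapt to the chain's own spread
    sd = 1.0 if it < 100 else 2.4 * np.std(history) + eps
    prop = phi + sd * rng.normal()
    lp_prop = log_target(prop)
    if np.log(rng.uniform()) < lp_prop - lp:   # Metropolis accept/reject
        phi, lp = prop, lp_prop
        accepted += 1
    history.append(phi)

rate = accepted / n_iter
assert 0.05 < rate < 0.95                      # the chain mixes rather than sticks
```

Because the adaptation uses only each φ_k's own one-dimensional history, it needs no cross-parameter covariance estimate, matching the single-component character of SCAM.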
2.4 Scalable Modeling
We use a conjugate gradient method (Nishimura and Suchard, 2018) to facilitate sampling of F when the relevant precision matrix is sparse. Here, we develop a scalable BLMC model with each element of f(s) modeled as a Nearest-Neighbor Gaussian Process (NNGP).

Let each f_k be an NNGP, which implies a precision matrix of the form (I − A_k)^T D_k^{−1} (I − A_k) for each k = 1, …, K, where A_k is a sparse lower triangular matrix with no more than a specified small number, m, of nonzero entries in each row and D_k is a diagonal matrix. The diagonal entries of D_k and the nonzero entries of A_k are obtained from the conditional variances and conditional expectations for a Gaussian process with covariance function ρ_k(·, ·; φ_k). We consider a fixed order of locations in L and let N(s_i) be the set of at most m neighbors of s_i among the locations s_j ∈ L with j < i. The (i, j)-th entry of A_k is 0 whenever s_j ∉ N(s_i). If j_1 < j_2 < ⋯ < j_m are the m column indices for the nonzero entries in the i-th row of A_k, then the (i, j_l)-th element of A_k is the l-th element of the 1 × m vector ρ_k(s_i, N(s_i)) ρ_k(N(s_i), N(s_i))^{−1}. The i-th diagonal element of D_k is given by ρ_k(s_i, s_i) − ρ_k(s_i, N(s_i)) ρ_k(N(s_i), N(s_i))^{−1} ρ_k(N(s_i), s_i). Repeating these calculations for each row completes the construction of A_k and D_k and yields a sparse precision matrix. This construction is performed in parallel and requires storage or computation of at most n matrices of dimension at most m × m, costing O(nm^3) flops and O(nm^2) storage. See Supplementary Appendix B for details.
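The row-by-row construction above can be sketched as follows; for illustration we take all previous locations as neighbors (m = n − 1), in which case the sparse factorization reproduces the dense precision matrix exactly, whereas in practice m is a small number such as 10 and A is genuinely sparse:

```python
import numpy as np

rng = np.random.default_rng(5)
n, phi = 30, 6.0
coords = rng.uniform(size=(n, 2))
d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
R = np.exp(-phi * d)                         # dense exponential correlation

A = np.zeros((n, n))
D = np.zeros(n)
D[0] = R[0, 0]
for i in range(1, n):
    nb = list(range(i))                      # neighbor set N(s_i): here, all j < i
    Rnn = R[np.ix_(nb, nb)]                  # correlation among the neighbors
    r = R[nb, i]                             # correlation between s_i and neighbors
    w = np.linalg.solve(Rnn, r)              # kriging weights -> nonzero row of A
    A[i, nb] = w
    D[i] = R[i, i] - r @ w                   # conditional variance -> diagonal of D

# Precision (I - A)^T D^{-1} (I - A); with m = n - 1 it equals R^{-1} exactly
IA = np.eye(n) - A
prec = IA.T @ np.diag(1.0 / D) @ IA
assert np.allclose(prec, np.linalg.inv(R), atol=1e-8)
```

With small m, each row only conditions on its m nearest preceding neighbors, so Rnn shrinks to m × m and the O(nm^3) cost quoted above follows directly from the per-row solve.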
Sampling F is computationally expensive, but is expedited by efficiently solving the associated linear system for any vector u. If the precision matrix has a sparse Cholesky factor, then these solves are cheap. To be precise, the Woodbury matrix identity separates the sparse block-diagonal precision, built from the NNGP precision matrices of the f_k, from the low-rank correction induced by the loadings, so that a solve with the full precision reduces to sparse solves plus a small dense system. If all the NNGP precision matrices have similar structures, then permuting rows and columns with a common permutation often renders structures that can be exploited by BLMC for very large spatial data sets. For example, if these precision matrices are banded with bandwidth b, then their block-diagonal concatenation is also banded. Moreover, the corrected matrix remains banded with only a modest increase in bandwidth, so the correction hardly increases the computational burden of the Cholesky decomposition when q is small. Assembling these features, the required solves are scalable for any vector u.
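A generic numerical sketch of the Woodbury-style solve (a diagonal matrix stands in for the sparse part, and the blocks are illustrative rather than the paper's exact matrices): only the cheap part is solved at full dimension, plus one small q × q system:

```python
import numpy as np

rng = np.random.default_rng(6)
n, q = 200, 3
# S: a well-conditioned "sparse" part (diagonal here for simplicity);
# H = S + W W^T: a rank-q update, as arises from a q-column loading term.
S_diag = 1.0 + rng.uniform(size=n)
W = rng.normal(size=(n, q))
H = np.diag(S_diag) + W @ W.T
u = rng.normal(size=n)

# Woodbury:
# (S + W W^T)^{-1} u = S^{-1}u - S^{-1}W (I_q + W^T S^{-1} W)^{-1} W^T S^{-1}u
Sinv_u = u / S_diag                          # cheap solve with the sparse part
Sinv_W = W / S_diag[:, None]
small = np.eye(q) + W.T @ Sinv_W             # only a q x q system to factorize
x = Sinv_u - Sinv_W @ np.linalg.solve(small, W.T @ Sinv_u)

assert np.allclose(H @ x, u)                 # matches the direct solve
```

The n × n factorization is avoided entirely; with a genuinely sparse S the dominant cost is the sparse solve, and the dense work stays at dimension q, which is why the approach scales when q is small.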
We conclude this section with a remark on the BLMC model with diagonal Σ. This specification is desirable for data sets with a massive number of responses q. A diagonal Σ avoids the quadratic growth of the number of parameters in Σ as q increases. We illustrate an NNGP-based BLMC with diagonal Σ in Section 4.2.
3 On Posterior Consistency: Large-Sample Properties of Posterior Estimates
We present some theoretical results for the models constructed in the previous section. Specifically, we investigate the behavior of the posterior distribution as the sample size increases and establish its convergence to an oracle distribution. Here, for establishing the results, we will assume conjugate Matrix-Normal-Inverse-Wishart (MNIW) models with no misalignment. First, we assume that y(s) itself is modeled as a spatial process without explicitly introducing a latent process. Let

cov(y(s), y(s')) = { α ρ(s, s'; ψ) + (1 − α) δ(s = s') } Σ,    (17)

where ρ(·, ·; ψ) is a spatial correlation function defined through hyperparameter ψ, δ denotes Dirac's delta function, and Σ is the nonspatial covariance matrix of y(s). The fixed scalar α ∈ (0, 1) represents the proportion of total variability allocated to the spatial process. This implies that Y follows a matrix-normal distribution with mean Xβ, row scale α R(ψ) + (1 − α) I_n, and column scale Σ. We model {β, Σ} using the conjugate MNIW prior with prefixed hyperparameters. Closely following the developments in Gamerman and Moreira (2004), we obtain the posterior distribution of {β, Σ} as an MNIW distribution with updated hyperparameters. We refer to the above model as the "response" model.
Next, we consider the spatial regression model with the latent process,

y(s) = β^T x(s) + ω(s) + ε(s),    (20)

where ω(s) is a latent process and ε(s) is measurement error. For theoretical tractability, we restrict posterior inference to {β, Σ} and the latent process, assuming that the scalar α is fixed. Assuming that the joint distribution of β and Σ is given in (3), the posterior distribution again belongs to the MNIW family with updated hyperparameters. We refer to the above model as the "latent" model.
We establish the posterior consistency of {β, Σ} for the response model (17) and the latent model (20). For distinguishing the variables based on the number of observations, we make the dependence upon n explicit and write Y_n, X_n, and the corresponding scale matrices with a subscript n. Proofs and technical details are available in Supplementary Appendix C.

Theorem 1 (Theorems S.1 and S.2). The parameter set {β, Σ} is posterior consistent for both the conjugate response and latent models if and only if the smallest eigenvalue of the information matrix for β diverges as n → ∞.
When the explanatory variables share the same spatial correlation as the responses, the necessary and sufficient condition in Theorem 1 holds (see Remark S.2). When the explanatory variables are themselves regarded as independent observations, the condition also holds (see Remark S.3).
4 Simulation
We present two simulation examples. The first compares the BLMC model with other multivariate Bayesian spatial models. The second assesses our BLMC model when K is not excessively large. BLMC models were implemented in Julia 1.2.0 (Bezanson et al., 2017). We modeled the univariate processes in the proposed BLMC by NNGP. We took the LMC model proposed by Schmidt and Gelfand (2003) as a benchmark in the first simulation example. The benchmark model was implemented in R 3.4.4 through the function spMisalignLM in the R package spBayes (Finley et al., 2007). The posterior inference for each model was based on MCMC chains with 5000 iterations after a burn-in of 5000 iterations. All models were run on a single Intel Core i7-7700K CPU @ 4.20GHz with 32 GB of random-access memory running Ubuntu 18.04.2 LTS. Convergence diagnostics and other posterior summaries were implemented within the Julia statistical environment. Model comparisons were based on parameter estimates (posterior mean and 95% credible interval), root mean squared prediction error (RMSPE), mean squared error of intercept-centered latent processes (MSEL), prediction interval coverage (CVG; the percent of intervals containing the true value), interval coverage for the intercept-centered latent process of observed responses (CVGL), the average continuous rank probability score for responses (CRPS; Gneiting and Raftery, 2007), the average interval score for responses (INT; Gneiting and Raftery, 2007), and run time. We assessed convergence of MCMC chains by visually monitoring autocorrelations and checking the accuracy of parameter estimates using effective sample size (ESS) (Gelman et al., 2013, section 10.5) and Monte Carlo standard errors (MCSE) with batch size 50 (Flegal et al., 2008).
To calculate the CRPS and INT, we assumed that the associated predictive distribution was well approximated by a Gaussian distribution with mean centered at the predicted value and standard deviation equal to the predictive standard error. All NNGP models were specified with at most m nearest neighbors.
4.1 Simulation Example 1
We simulated the response from the LMC model in (2) with q = K = 2 over 1200 randomly generated locations over a unit square. The size of the data set was kept moderate to enable comparisons with the expensive full GP-based LMC models for experiments conducted on the computing setup described earlier. The explanatory variable x(s) consists of an intercept and a single predictor generated from a standard normal distribution. An exponential correlation function was used to model each f_k, that is, ρ_k(s, s'; φ_k) = exp(−φ_k ‖s − s'‖), where ‖s − s'‖ is the Euclidean distance between s and s', and φ_k is the decay for each k. We randomly picked 200 locations for predicting each response to examine the predictive performance. Supplementary Appendix D presents the fixed parameters generating the data and the subsequent posterior estimates.
For the NNGP-based BLMC model, we assigned a flat prior for β. The prior for Λ followed (3) with prior mean a zero matrix and prior scale a diagonal matrix whose diagonal elements are 25. The prior for Σ was set to follow an inverse-Wishart distribution. For the benchmark LMC, we assigned a flat prior for β, an inverse-Wishart prior for the cross-covariance matrix, and IG(2, 0.5) for each diagonal element of Σ. We assigned unif(2.12, 212) as priors of the decays for both models. This implies that the "effective spatial range," which is the distance where spatial correlation drops below 0.05, will be bounded above by √2 (the maximum intersite distance within a unit square) and bounded below by 1/100th of that to ensure a wide range.
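For the exponential correlation, the correlation falls to 0.05 at distance −log(0.05)/φ ≈ 3/φ, so the stated bounds on the effective range translate directly into the unif(2.12, 212) bounds on the decay:

```python
import math

# Effective spatial range for exp(-phi * d): solve exp(-phi * d) = 0.05,
# i.e., d = -log(0.05) / phi, approximately 3 / phi.
max_dist = math.sqrt(2.0)                  # maximum intersite distance, unit square
phi_lo = 3.0 / max_dist                    # range bounded above by sqrt(2)
phi_hi = 3.0 / (max_dist / 100.0)          # range bounded below by sqrt(2)/100

print(round(phi_lo, 2), round(phi_hi, 0))  # 2.12 212.0
```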
Table 1 presents posterior estimates of parameters and performance metrics for all candidate models. Both models provided similar posterior inferences for β, and the 95% credible intervals of the elements of β all include the true values used to generate the data. The NNGP-based BLMC model and the benchmark LMC model cost 2.38 min and around 18.25 h, respectively. Despite the shorter running time, we observed superior performance of the NNGP-based BLMC over the benchmark LMC for inference on the latent process using CVGL, MSEL, CRPSL, and INTL. Moreover, the interpolated maps of the recovered intercept-centered latent processes (Figure 1) by BLMC and benchmark LMC are almost indistinguishable from each other. BLMC and benchmark LMC produce very similar RMSPEs, CRPSs, and INTs. The differences in estimates between the two models are likely emerging from the different prior settings and sampling schemes. The benchmark LMC restricts the loading matrix Λ to be upper triangular, while BLMC does not, resulting in greater flexibility in fitting the latent process. On the other hand, the unidentified parameterization of BLMC causes somewhat less stable inference for the hyperparameters φ_k. The inferences for the intercepts are also less stable due to the sensitivity of the intercept to the latent process. For all other parameters, including the intercept-centered latent process on 1200 locations, the median ESS is 4111.5. All MCSEs were consistently less than 0.02. These diagnostics suggest adequate convergence of the MCMC algorithm.
| Parameter | True | BLMC Inference | BLMC MCSE | Benchmark LMC Inference | Benchmark LMC MCSE |
|---|---|---|---|---|---|
| β11 | 1.0 | 0.705 (0.145, 1.233) | 0.034 | 0.806 (0.502, 1.131) | 0.002 |
| β12 | −1.0 | −1.24 (−1.998, −0.529) | 0.045 | −1.1 (−1.533, −0.646) | 0.001 |
| β21 | −5.0 | −4.945 (−5.107, −4.778) | 0.002 | −4.949 (−5.113, −4.787) | 0.004 |
| β22 | 2.0 | 1.979 (1.78, 2.166) | 0.004 | 1.974 (1.785, 2.167) | 0.002 |
| Σ11 | 0.4 | 0.346 (0.283, 0.409) | 0.002 | 0.306 (0.248, 0.364) | 0.003 |
| Σ12 | 0.15 | 0.133 (0.072, 0.194) | 0.003 | 0.0 | – |
| Σ22 | 0.3 | 0.29 (0.198, 0.386) | 0.004 | 0.233 (0.159, 0.334) | 0.005 |
| ϕ1 | 6.0 | 8.723 (4.292, 14.065) | 0.343 | 12.839 (8.805, 17.471) | 0.23 |
| ϕ2 | 18.0 | 22.63 (15.901, 29.555) | 0.416 | 18.075 (12.99, 23.741) | 0.301 |
| RMSPE^a | – | [0.728, 0.756, 0.742] | | [0.725, 0.762, 0.744] | |
| MSEL^b | – | [0.136, 0.168, 0.152] | | [0.147, 0.192, 0.169] | |
| CRPS^a | – | [−0.412, −0.423, −0.418] | | [−0.41, −0.427, −0.418] | |
| CRPSL^b | – | [−0.035, −0.038, −0.036] | | [−0.216, −0.248, −0.232] | |
| CVG^a | – | [0.915, 0.955, 0.935] | | [0.925, 0.96, 0.9425] | |
| CVGL^b | – | [0.946, 0.962, 0.954] | | [0.756, 0.773, 0.765] | |
| INT^a | – | [3.378, 3.756, 3.567] | | [3.347, 3.823, 3.585] | |
| INTL^a | – | [0.282, 0.329, 0.305] | | [1.875, 2.023, 1.949] | |
| Time (s) | – | 143 | | [42047, 23664]^c | |
^a [response 1, response 2, all responses].
^b Intercept + latent process on 1000 observed locations for [response 1, response 2, all responses].
^c [time for MCMC sampling, time for recovering predictions].

Figure 1: Interpolated maps of (a) & (d) the true generated intercept-centered latent processes, and the posterior means of the intercept-centered latent process ω from (b) & (e) the NNGP-based BLMC model and (c) & (f) the benchmark LMC model. Heat maps of (l) the actual finite sample correlation among latent processes and (g)–(k) the posterior mean of the finite sample correlation among latent processes based on the posterior samples of Ω. This figure appears in color in the electronic version of this article, and any mention of color refers to that version.
4.2 Simulation Example 2
We generated 100 different data sets using (2) with a diagonal Σ (i.e., independent measurement errors across outcomes). Supplementary Appendix D presents the parameter values used to generate the data sets. We fixed a set of 1200 irregularly situated locations inside a unit square. The explanatory variable x(s) comprised an intercept and two predictors generated independently from a standard normal distribution. The same set of locations and explanatory variables were used for the 100 data sets. Each f_k was generated using an exponential covariance function, ρ_k(s, s'; φ_k) = exp(−φ_k ‖s − s'‖), where φ_k was the decay for f_k. We held out 200 locations for assessing predictive performances.
For each simulated data set, we fitted the BLMC model specifying a diagonal Σ with K from 1 to 10. Each φ_k has a Gamma prior with shape and scale equaling 2 and 4.24, respectively, so that the expected effective spatial range is half of the maximum intersite distance. We assigned a flat prior for β, a vague prior for Λ following the prior of Λ in the preceding example, and IG(2, 1.0) priors for the diagonal elements of Σ.
The posterior mean and the 95% credible interval of CVGL, CVG, RMSPE, and the diagnostic metric MCSE for the regression slopes and Σ across the 100 simulation studies are summarized by K in Table 2. Inference for CVG and MCSE was robust to the choice of K: all of the 95% credible intervals for CVG and MCSE were within [0.9, 0.99] and [0.0, 0.02], respectively. As shown in Table 2, the performance metrics improved quickly as K increased from 1 to 10; on average, RMSPE decreased by about 30.9% and CVGL increased from 28% to 95%. Given that the data come from an LMC model, we can conclude that BLMC with diagonal Σ is efficient in obtaining inference for the latent processes even when K is not adequately large. We also create heat maps of the posterior mean of the finite sample correlation matrix among the latent processes based on posterior samples of Ω = (ω − 1ω̄ᵀ)ᵀ(ω − 1ω̄ᵀ)/n, with ω̄ the vector of column means of ω. Figures 1(g)–1(k) depict such heat maps from one of the 100 simulated data sets. As K increases from 2 to 10, the estimated correlation matrix approaches the true correlation matrix. The plots also reveal that the performance of BLMC is sensitive to the choice of K. We recommend choosing K based on scientific considerations for the problem at hand and exploratory data analyses, or checking the RMSPE for different K and picking K by an elbow rule (Thorndike, 1953).
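The finite sample correlation among the latent processes can be computed from each posterior sample of ω by centering the columns, forming the cross-product, and converting the resulting covariance to a correlation matrix. A minimal sketch with stand-in simulated values (all names are ours):

```python
import numpy as np

def finite_sample_corr(omega):
    """omega: (n, K) posterior sample of the latent process over n
    locations. Returns the K x K finite sample correlation matrix
    of its columns."""
    centered = omega - omega.mean(axis=0)            # subtract column means
    Omega = centered.T @ centered / omega.shape[0]   # finite sample covariance
    d = np.sqrt(np.diag(Omega))
    return Omega / np.outer(d, d)                    # covariance -> correlation

rng = np.random.default_rng(0)
omega = rng.standard_normal((1000, 3))  # stand-in for one posterior sample
R = finite_sample_corr(omega)
print(R.shape)  # (3, 3)
```

Averaging such matrices over posterior samples gives the heat maps shown in Figures 1(g)–1(k); the normalizing constant in Ω cancels when converting to a correlation.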
Table 2: Posterior mean (2.5%, 97.5%) percentiles of performance metrics by K

| K | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| CVGL | 0.28 (0.14, 0.93) | 0.39 (0.17, 0.94) | 0.49 (0.2, 0.95) | 0.58 (0.25, 0.96) |
| CVG | 0.95 (0.92, 0.98) | 0.95 (0.92, 0.98) | 0.95 (0.92, 0.98) | 0.95 (0.92, 0.98) |
| RMSPE | 2.07 (1.94, 2.18) | 1.96 (1.86, 2.05) | 1.87 (1.78, 1.97) | 1.78 (1.7, 1.85) |
| MCSE | 0.004 (0.002, 0.008) | 0.005 (0.002, 0.01) | 0.005 (0.003, 0.01) | 0.004 (0.003, 0.01) |

| K | 5 | 6 | 7 | 8 |
|---|---|---|---|---|
| CVGL | 0.67 (0.29, 0.96) | 0.74 (0.33, 0.96) | 0.81 (0.38, 0.96) | 0.86 (0.44, 0.96) |
| CVG | 0.95 (0.92, 0.98) | 0.95 (0.92, 0.98) | 0.95 (0.92, 0.98) | 0.95 (0.91, 0.98) |
| RMSPE | 1.7 (1.62, 1.77) | 1.63 (1.55, 1.69) | 1.56 (1.5, 1.63) | 1.51 (1.45, 1.57) |
| MCSE | 0.005 (0.003, 0.01) | 0.005 (0.003, 0.013) | 0.005 (0.003, 0.011) | 0.005 (0.003, 0.011) |

| K | 9 | 10 |
|---|---|---|
| CVGL | 0.91 (0.59, 0.96) | 0.95 (0.92, 0.96) |
| CVG | 0.95 (0.91, 0.98) | 0.95 (0.91, 0.98) |
| RMSPE | 1.46 (1.41, 1.51) | 1.43 (1.38, 1.48) |
| MCSE | 0.005 (0.003, 0.01) | 0.005 (0.003, 0.01) |
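The elbow rule mentioned in Section 4.2 can be applied directly to the average RMSPE values in Table 2. One common implementation (our choice of variant, not necessarily the authors') picks the K whose point falls farthest below the chord joining the first and last points of the RMSPE curve:

```python
# Average RMSPE by K = 1..10, read from Table 2.
rmspe = [2.07, 1.96, 1.87, 1.78, 1.70, 1.63, 1.56, 1.51, 1.46, 1.43]

def elbow(values):
    """Return the 1-based index (here, K) of the point with the largest
    vertical gap below the straight line from the first to the last value."""
    n = len(values)
    slope = (values[-1] - values[0]) / (n - 1)
    gaps = [values[0] + slope * i - v for i, v in enumerate(values)]
    return max(range(n), key=gaps.__getitem__) + 1

print(elbow(rmspe))  # -> 5 for these values
```

For this particular curve the rule selects K = 5, though, as the text notes, scientific considerations should also inform the choice.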
5 Remote-Sensed Vegetation Data Analysis
We apply our proposed models to analyze the normalized difference vegetation index (NDVI) and enhanced vegetation index (EVI), which measure vegetation activity on the land surface and can help us understand the global distribution of vegetation types as well as their biophysical and structural properties and spatial variations. Apart from vegetation indices, we consider gross primary productivity (GPP) data, the global terrestrial evapotranspiration (ET) product, and land cover data (see Ramon Solano et al., 2010; Mu et al., 2013; Sulla-Menashe and Friedl, 2018, for further details). The geographic coordinates of our variables were mapped on a Sinusoidal (SIN) projection grid. We focus on zone h08v05, which covers 11 119 505 to 10 007 555 m west of the prime meridian and 3 335 852 to 4 447 802 m north of the equator, situated in the western United States. Our explanatory variables included an intercept and a binary indicator of no vegetation or urban area derived from the 2016 land cover data. All other variables were measured by the MODIS satellite over a 16-day period from 2016.04.06 to 2016.04.21. Some variables were rescaled and transformed during exploratory data analysis to improve model fitting. The data sets were downloaded using the R package MODIS, and the code for the exploratory data analysis is provided as supplementary material to this article.
Our data comprise 1 020 000 observed locations, which we use to illustrate the proposed model. Our spatially dependent outcomes were the transformed NDVI (labeled NDVI) and red reflectance (red refl). A Bayesian multivariate regression model, defined by (2) excluding the latent process, was also fitted for comparison. All NNGP-based models used the same number of nearest neighbors. We randomly held out 10% of each response, and additionally held out all responses over the region 10 400 000 to 10 300 000 m west of the prime meridian and 3 800 000 to 3 900 000 m north of the equator, to evaluate the models' predictive performance over a missing region (white square) and at randomly missing locations. Figure 2(a) maps the transformed NDVI data.

Colored NDVI and red reflectance images of the western United States (zone h08v05). Maps of the raw data (a) & (d) and the posterior mean of the intercept-centered latent process recovered from (b) & (e) BLMC and (c) & (f) BLMC with diagonal Σ. Correlation of responses (g) and posterior mean of the finite sample correlation among latent processes from the BLMC model with diagonal Σ (h). Heat map (i) of counts of observed responses: each observed location is marked with a dot whose color represents the number of observed responses at that location; the greener the color, the higher the count. This figure appears in color in the electronic version of this article, and any mention of color refers to that version
We ran both models for 5000 iterations after 5000 burn-in iterations. The priors for all parameters except the decays followed those in Section 4. Based on variograms fitted to the raw data, we assigned Gamma(200, 0.02) and Gamma(200, 0.04) priors to ϕ1 and ϕ2 for BLMC. All code was run on a single thread, with no other processes running simultaneously, so as to provide an accurate measure of computing time.
Table 3 presents results for the BLMC. The regression coefficients for the no-vegetation-or-urban-area indicator show relatively low biomass (low NDVI) and high red reflectance over non-vegetated and urban areas. Estimates of Σ and the finite sample process covariance matrix Ω, as defined in Section 4.2, show a negative association between the residuals and latent processes of transformed NDVI and red reflectance, consistent with the known relationship between the two responses. BLMC captured a high negative correlation (about −0.87, based on the posterior means of Ω in Table 3) between the latent processes of the two responses, indicating that the spatial patterns of the latent processes of NDVI and red reflectance are almost the reverse of each other. The maps of the latent processes recovered by BLMC, presented in Figure 2, also support this relationship.
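The high negative cross-correlation follows directly from the posterior means of Ω reported in Table 3:

```python
import math

# Posterior means of the finite sample process covariance Ω (Table 3).
omega11 = 1.675e-2   # var of latent process, transformed NDVI
omega12 = -6.873e-3  # covariance between the two latent processes
omega22 = 3.764e-3   # var of latent process, red reflectance

# Convert the covariance entries to a correlation.
corr = omega12 / math.sqrt(omega11 * omega22)
print(round(corr, 2))  # -> -0.87
```

A correlation near −0.87 confirms that the two latent surfaces are close to mirror images of each other.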
Table 3: Vegetation data analysis summary: posterior mean (2.5%, 97.5%) percentiles
| Parameter | Bayesian linear model: inference | BLMC: inference | BLMC: MCSE |
|---|---|---|---|
| Intercept1 | 0.2515 (0.2512, 0.2517) | 0.1433 (0.1418, 0.1449) | 1.145e-4 |
| Intercept2 | 0.1395 (0.1394, 0.1396) | 0.1599 (0.159, 0.1608) | 6.17e-5 |
| No veg or urban area1 | −0.1337 (−0.1346, −0.1328) | −1.385e-2 (−1.430e-2, −1.342e-2) | 1.69e-5 |
| No veg or urban area2 | 6.035e-2 (5.992e-2, 6.075e-2) | 7.831e-3 (7.584e-3, 8.097e-3) | 8.24e-6 |
| Σ11 | 1.599e-2 (1.594e-2, 1.603e-2) | 3.514e-4 (3.477e-4, 3.553e-4) | 1.93e-7 |
| Σ12 | −6.491e-3 (−6.512e-3, −6.471e-3) | −1.084e-4 (−1.100e-4, −1.067e-4) | 8.19e-8 |
| Σ22 | 3.656e-3 (3.646e-3, 3.667e-3) | 1.074e-4 (1.063e-4, 1.084e-4) | 4.79e-8 |
| Ω11 | – | 1.675e-2 (1.674e-2, 1.676e-2) | 4.17e-7 |
| Ω12 | – | −6.873e-3 (−6.879e-3, −6.867e-3) | 1.77e-7 |
| Ω22 | – | 3.764e-3 (3.760e-3, 3.768e-3) | 9.06e-8 |
| ϕ1 | – | 3.995 (3.887, 4.075) | 7.535e-3 |
| ϕ2 | – | 12.376 (11.512, 13.320) | 7.60e-3 |
| RMSPEa | [0.074, 0.0359, 0.0581] | [0.0326, 0.0171, 0.0260] | |
| CRPSa | [−0.04135, −0.01988, −0.03061] | [−0.01561, −0.00879, −0.0122] | |
| CVGa | [0.956, 0.958, 0.957] | [0.954, 0.947, 0.950] | |
| INTa | [0.3468, 0.1711, 0.2589] | [0.1965, 0.0995, 0.1480] | |
| Time (min) | 11 | 2318 | |
a[1st response transformed NDVI, 2nd response red reflectance, all responses].
We provide RMSPE, CVG, CRPS, INT, MCSE, and run time in Table 3. BLMC substantially improved predictive accuracy: its RMSPEs were more than 50% lower than those of the Bayesian linear model. CVG is similar between the two models, while INT and CRPS also favored BLMC over the Bayesian linear model. Figure 2 presents the estimated latent processes from BLMC. Notably, BLMC smooths out the predictions in the held-out region. The model's run time was around 38.6 h, which is still impressive given the full model-based analysis it offers for such a massive multivariate spatial data set.
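The "more than 50% lower" claim can be checked directly against the RMSPE rows of Table 3:

```python
# RMSPEs from Table 3: [response 1 (NDVI), response 2 (red refl), all responses].
rmspe_blm  = [0.074, 0.0359, 0.0581]   # Bayesian linear model
rmspe_blmc = [0.0326, 0.0171, 0.0260]  # BLMC

# Relative reduction in RMSPE achieved by BLMC for each entry.
reductions = [1 - b / a for a, b in zip(rmspe_blm, rmspe_blmc)]
print([round(r, 3) for r in reductions])  # each reduction exceeds 0.5
```

The reductions are roughly 56%, 52%, and 55% for response 1, response 2, and all responses, respectively.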
We also fitted a BLMC with diagonal Σ to explore the underlying latent processes of 10 (transformed) responses: (i) NDVI, (ii) EVI, (iii) GPP, (iv) net photosynthesis (PsnNet), (v) red reflectance (red refl), (vi) blue reflectance (blue refl), (vii) average daily global ET, (viii) latent heat flux (LE), (ix) potential ET (PET), and (x) potential LE (PLE). In total, 12 057 locations have no responses, and 656 366 observed locations have misaligned data (at least one but not all responses), covering 65.12% of observed locations. The heat map in Figure 2(i) presents the status of misalignment over the study domain.
Based on the exploratory analysis, we observed two groups of responses with high within-group correlations but relatively low between-group correlations (see Figure 2(g)), and picked K accordingly. Estimates from the BLMC model are presented in Table 4. No vegetation or urban area exhibits lower vegetation indices (lower NDVI and EVI) and lower production of chemical energy in organic compounds by living organisms (lower GPP and PsnNet). We observe a trend of higher blue reflectance, red reflectance, and ET (higher LE) and lower potential ET (lower PET and PLE) in urban and non-vegetated areas. We provide maps of posterior predictions for all 10 variables in Supplementary Appendix F. The latent processes corresponding to transformed NDVI and red reflectance fitted in the two analyses (Figure 2) share a similar pattern. Finally, the heat map of the posterior mean of the finite sample correlation among the latent processes (elements of Ω as defined in Section 4.2) based on BLMC with diagonal Σ, presented in Figure 2(h), reveals a high underlying correlation among NDVI, EVI, GPP, PsnNet, and red and blue reflectance, and shows that LE and ET are slightly more correlated with NDVI and EVI than are PLE and PET. The total run time for BLMC with diagonal Σ was around 60.7 h (3642.25 min).
Table 4: Vegetation data analysis summary for BLMC with diagonal Σ: posterior mean (2.5%, 97.5%) percentiles

| Response | Slope | MCSE | Nugget (diagonal of Σ) | MCSE |
|---|---|---|---|---|
| NDVI | −0.0120 (−0.0124, −0.0116) | 1.37e-5 | 7.46e-4 (7.42e-4, 7.49e-4) | 7.60e-8 |
| EVI | −4.38e-3 (−4.68e-3, −4.08e-3) | 6.86e-6 | 8.68e-4 (8.65e-4, 8.7e-4) | 3.07e-8 |
| GPP | −0.197 (−0.199, −0.194) | 8.31e-5 | 0.0244 (0.0243, 0.0245) | 2.34e-6 |
| PsnNet | −4.48e-3 (−5.39e-3, −3.50e-3) | 3.42e-5 | 5.34e-3 (5.32e-3, 5.36e-3) | 3.50e-7 |
| red refl | 4.49e-3 (4.20e-3, 4.77e-3) | 5.11e-6 | 9.84e-4 (9.81e-4, 9.87e-4) | 3.13e-8 |
| blue refl | 0.0123 (0.0121, 0.0124) | 2.74e-6 | 2.60e-4 (2.59e-4, 2.61e-4) | 8.81e-9 |
| LE | 0.0908 (0.0884, 0.0932) | 1.36e-4 | 0.0531 (0.0529, 0.0533) | 2.37e-6 |
| ET | 0.0919 (0.0895, 0.0944) | 1.49e-4 | 0.0531 (0.053, 0.0533) | 2.27e-6 |
| PLE | −3.64e-3 (−3.98e-3, −3.36e-3) | 5.50e-5 | 2.095e-5 (2.086e-5, 2.104e-5) | 1.63e-9 |
| PET | −4.88e-3 (−5.99e-3, −3.96e-3) | 1.81e-4 | 6.50e-5 (6.44e-5, 6.57e-5) | 2.20e-8 |
6 Summary and Discussion
We have proposed scalable models for analyzing massive and possibly misaligned multivariate spatial data sets. Our framework offers flexible covariance structures and scalability by modeling the loading matrix of spatial factors using matrix-normal distributions and the factors themselves as NNGPs. This process-based formulation allows us to resolve spatial misalignment by fully model-based imputation. Through a set of simulation examples and an analysis of a massive misaligned data set comprising remote-sensed variables, we demonstrated the inferential and computational benefits accrued from our proposed framework.
This work can be expanded further in at least two important directions. The first is to extend the current methods to spatiotemporal data sets, where multiple variables are indexed by spatial coordinates, as considered here, as well as by temporal indices. Associations are likely to be exhibited across space and time as well as among the variables within a location and time-point. In addition, these variables are likely to be misaligned across time and space. Regarding the scalability of the spatiotemporal process, we can build a dynamic nearest-neighbor Gaussian process (DNNGP) (Datta et al., 2016) to model spatiotemporal factors and one can also envisage temporal dependence on the loading matrix.
A second direction will consider spatially varying coefficient models, where the regression coefficients β are modeled using a spatial (or spatiotemporal) random field to capture spatial (or spatiotemporal) patterns in how some of the predictors impact the outcomes. We can assign β a prior given by a multivariate Gaussian random field with a proportional cross-covariance function; the prior on β over the observed locations then follows a matrix-normal distribution, which is the prior we designed for β in all of the models proposed in this article. While the modification appears straightforward, the actual implementation requires more detailed exploration, and we leave these topics for future work.
From a computational perspective, we clearly need to further explore high-performance computing and high-dimensional spatial models amenable to such platforms. The programs provided with this work are for illustration and make limited use of graphics processing unit (GPU) computing and parallelized CPU computing. A parallel CPU implementation of the BLMC model could sample multiple MCMC chains simultaneously, improving the performance of practical implementations. Implementations of modeling methods such as MRA (Katzfuss, 2017) also require dedicated GPU programming. Other scalable modeling methods that build graphical Gaussian models over space, time, and variables can lead to sparse models for high-dimensional multivariate data that scale not only to millions of locations and time points, but also to hundreds or even thousands of spatially or spatiotemporally oriented variables. The idea here will be to extend current developments in Vecchia-type models to graphs encoding dependence among a large number of variables, so that the precision matrices across space, time, and variables are sparse. Research on scalable statistical models and high-performance computing algorithms for such models will be of substantial interest to statisticians and environmental scientists.
Data Availability Statement
The simulated and remote-sensed vegetation index data that support the findings in this article are openly available at GitHub at https://github.com/LuZhangstat/Multi_NNGP, Zhang (2021).
Acknowledgments
The work of the authors has been supported in part by National Science Foundation (NSF) under grants NSF/DMS 1916349 and NSF/IIS 1562303, and by the National Institute of Environmental Health Sciences (NIEHS) under grants R01ES030210 and 5R01ES027027.
References
Supplementary data
The Web Appendices referenced in Sections 2.4, 3, 4 and 5, the MODIS vegetation indices data analyzed in Section 5, and the Julia code implementing our models are available with this paper at the Biometrics website on Wiley Online Library. The data and code are also available at https://github.com/LuZhangstat/Multi_NNGP.