Distributed model building and recursive integration for big spatial data modeling

95% CP in percentage for the first set of simulations: |$S=400$|⁠, |$\boldsymbol{\theta }=\lbrace 0.3, 0.6, 0.8, \log (3), \log (0.5), \log (1.6)\rbrace$|⁠.

Method	\|$K_1,\ldots ,K_M$\|	Intercept	\|$X_1$\|	\|$X_2$\|	\|$\log (\tau ^2)$\|	\|$\log (\rho ^2)$\|	\|$\log (\sigma ^2)$\|
\|$\widehat{\boldsymbol{\theta }}_r$\|	2, 2, 4	94	96	94	96	93	95
	2, 4, 2	95	96	94	96	93	95
	4, 2, 2	94	95	95	96	93	95
	2, 8	94	96	94	96	93	95
	2, 2, 2, 2	95	96	94	96	93	95
\|$\widehat{\boldsymbol{\theta }}_s$\|	2, 2, 4	94	96	94	96	93	95
	2, 4, 2	95	96	94	96	93	95
	4, 2, 2	94	95	95	96	93	95
	2, 8	94	96	94	96	93	95
	2, 2, 2, 2	95	96	94	96	93	95
Part.		100	100	100	0	0	0
NNGP		95	95	95	96	96	96

Method	\|$K_1,\ldots ,K_M$\|	Intercept	\|$X_1$\|	\|$X_2$\|	\|$\log (\tau ^2)$\|	\|$\log (\rho ^2)$\|	\|$\log (\sigma ^2)$\|
\|$\widehat{\boldsymbol{\theta }}_r$\|	2, 2, 4	94	96	94	96	93	95
	2, 4, 2	95	96	94	96	93	95
	4, 2, 2	94	95	95	96	93	95
	2, 8	94	96	94	96	93	95
	2, 2, 2, 2	95	96	94	96	93	95
\|$\widehat{\boldsymbol{\theta }}_s$\|	2, 2, 4	94	96	94	96	93	95
	2, 4, 2	95	96	94	96	93	95
	4, 2, 2	94	95	95	96	93	95
	2, 8	94	96	94	96	93	95
	2, 2, 2, 2	95	96	94	96	93	95
Part.		100	100	100	0	0	0
NNGP		95	95	95	96	96	96

TABLE 1

95% CP in percentage for the first set of simulations: |$S=400$|⁠, |$\boldsymbol{\theta }=\lbrace 0.3, 0.6, 0.8, \log (3), \log (0.5), \log (1.6)\rbrace$|⁠.

Method	\|$K_1,\ldots ,K_M$\|	Intercept	\|$X_1$\|	\|$X_2$\|	\|$\log (\tau ^2)$\|	\|$\log (\rho ^2)$\|	\|$\log (\sigma ^2)$\|
\|$\widehat{\boldsymbol{\theta }}_r$\|	2, 2, 4	94	96	94	96	93	95
	2, 4, 2	95	96	94	96	93	95
	4, 2, 2	94	95	95	96	93	95
	2, 8	94	96	94	96	93	95
	2, 2, 2, 2	95	96	94	96	93	95
\|$\widehat{\boldsymbol{\theta }}_s$\|	2, 2, 4	94	96	94	96	93	95
	2, 4, 2	95	96	94	96	93	95
	4, 2, 2	94	95	95	96	93	95
	2, 8	94	96	94	96	93	95
	2, 2, 2, 2	95	96	94	96	93	95
Part.		100	100	100	0	0	0
NNGP		95	95	95	96	96	96

Method	\|$K_1,\ldots ,K_M$\|	Intercept	\|$X_1$\|	\|$X_2$\|	\|$\log (\tau ^2)$\|	\|$\log (\rho ^2)$\|	\|$\log (\sigma ^2)$\|
\|$\widehat{\boldsymbol{\theta }}_r$\|	2, 2, 4	94	96	94	96	93	95
	2, 4, 2	95	96	94	96	93	95
	4, 2, 2	94	95	95	96	93	95
	2, 8	94	96	94	96	93	95
	2, 2, 2, 2	95	96	94	96	93	95
\|$\widehat{\boldsymbol{\theta }}_s$\|	2, 2, 4	94	96	94	96	93	95
	2, 4, 2	95	96	94	96	93	95
	4, 2, 2	94	95	95	96	93	95
	2, 8	94	96	94	96	93	95
	2, 2, 2, 2	95	96	94	96	93	95
Part.		100	100	100	0	0	0
NNGP		95	95	95	96	96	96

Computing times are reported in Table 2. Using |$\widehat{\boldsymbol{\theta }}_s$| is faster than recursively updating weights with |$\widehat{\boldsymbol{\theta }}_r$|⁠. In comparison, the partitioning and NNGP approaches are 212 and 467 times slower, respectively, than our estimator |$\widehat{\boldsymbol{\theta }}_s$| with |$K_1=4,K_2=K_3=2$|⁠. We also observe the increased computational burden in Settings IV and V. This phenomenon is underpinned by the computation complexity orders of |$\widehat{\boldsymbol{\theta }}_r$| and |$\widehat{\boldsymbol{\theta }}_s$| derived in Sections 2.3.3 and 2.5, which show that computation is slower when M or |$k_{\max }$| are larger.

TABLE 2

Mean elapsed time (Monte Carlo standard deviation) in seconds of |$\widehat{\boldsymbol{\theta }}_r$|⁠, |$\widehat{\boldsymbol{\theta }}_s$|⁠, NNGP and Part.

Method	\|$K_1,K_2,K_3$\|	\|$K_1,K_2,K_3$\|	\|$K_1,K_2,K_3$\|	\|$K_1,K_2$\|	\|$K_1,K_2,K_3,K_4$\|	16 CPUs
	\|$=2,2,4$\|	\|$=2,4,2$\|	\|$=4,2,2$\|	\|$=2,8$\|	\|$=2,2,2,2$\|
\|$\widehat{\boldsymbol{\theta }}_r$\|	0.80 (0.092)	0.66 (0.060)	0.66 (0.060)	1.2 (0.86)	0.97 (0.75)
\|$\widehat{\boldsymbol{\theta }}_s$\|	0.72 (0.084)	0.60 (0.064)	0.60 (0.064)	1.2 (0.82)	1.1 (1.2)
NNGP						276 (135)
Part.						127 (11)

Method	\|$K_1,K_2,K_3$\|	\|$K_1,K_2,K_3$\|	\|$K_1,K_2,K_3$\|	\|$K_1,K_2$\|	\|$K_1,K_2,K_3,K_4$\|	16 CPUs
	\|$=2,2,4$\|	\|$=2,4,2$\|	\|$=4,2,2$\|	\|$=2,8$\|	\|$=2,2,2,2$\|
\|$\widehat{\boldsymbol{\theta }}_r$\|	0.80 (0.092)	0.66 (0.060)	0.66 (0.060)	1.2 (0.86)	0.97 (0.75)
\|$\widehat{\boldsymbol{\theta }}_s$\|	0.72 (0.084)	0.60 (0.064)	0.60 (0.064)	1.2 (0.82)	1.1 (1.2)
NNGP						276 (135)
Part.						127 (11)

Across 500 simulations for the first set of simulations: |$S=400$|⁠, |$\boldsymbol{\theta }=\lbrace 0.3, 0.6, 0.8, \log (3), \log (0.5), \log (1.6)\rbrace$|⁠.

TABLE 2

Mean elapsed time (Monte Carlo standard deviation) in seconds of |$\widehat{\boldsymbol{\theta }}_r$|⁠, |$\widehat{\boldsymbol{\theta }}_s$|⁠, NNGP and Part.

Method	\|$K_1,K_2,K_3$\|	\|$K_1,K_2,K_3$\|	\|$K_1,K_2,K_3$\|	\|$K_1,K_2$\|	\|$K_1,K_2,K_3,K_4$\|	16 CPUs
	\|$=2,2,4$\|	\|$=2,4,2$\|	\|$=4,2,2$\|	\|$=2,8$\|	\|$=2,2,2,2$\|
\|$\widehat{\boldsymbol{\theta }}_r$\|	0.80 (0.092)	0.66 (0.060)	0.66 (0.060)	1.2 (0.86)	0.97 (0.75)
\|$\widehat{\boldsymbol{\theta }}_s$\|	0.72 (0.084)	0.60 (0.064)	0.60 (0.064)	1.2 (0.82)	1.1 (1.2)
NNGP						276 (135)
Part.						127 (11)

Method	\|$K_1,K_2,K_3$\|	\|$K_1,K_2,K_3$\|	\|$K_1,K_2,K_3$\|	\|$K_1,K_2$\|	\|$K_1,K_2,K_3,K_4$\|	16 CPUs
	\|$=2,2,4$\|	\|$=2,4,2$\|	\|$=4,2,2$\|	\|$=2,8$\|	\|$=2,2,2,2$\|
\|$\widehat{\boldsymbol{\theta }}_r$\|	0.80 (0.092)	0.66 (0.060)	0.66 (0.060)	1.2 (0.86)	0.97 (0.75)
\|$\widehat{\boldsymbol{\theta }}_s$\|	0.72 (0.084)	0.60 (0.064)	0.60 (0.064)	1.2 (0.82)	1.1 (1.2)
NNGP						276 (135)
Part.						127 (11)

Across 500 simulations for the first set of simulations: |$S=400$|⁠, |$\boldsymbol{\theta }=\lbrace 0.3, 0.6, 0.8, \log (3), \log (0.5), \log (1.6)\rbrace$|⁠.

A second set of simulations with the same model and |$S=25600$| is presented in the supplement. There, we show that our MRRI estimators scale well when S grows. A third set of simulations in the supplement investigates the log-linear model with h the log function, |$S=900$|⁠, and |$C_{\alpha }(\boldsymbol{s}_j, \boldsymbol{s}_{j^\prime }; \boldsymbol{\gamma })$| the exponential covariance function. Conclusions drawn from this simulation mirror those from the first 2 sets of simulations.

The fourth set of simulations mimics the data analysis of Section 4 with |$S=800$|⁠. We consider 2 ROIs, |$\mathcal {S}_1=[1,20]^2=\lbrace \boldsymbol{s}_j\rbrace _{j=1}^{400}$| and |$\mathcal {S}_2=[21,40]^2=\lbrace \boldsymbol{s}_j\rbrace _{j=401}^{800}$|⁠, |$\boldsymbol{s}_j \in \mathbb {R}^2$|⁠. Defining |$\mathcal {S}=\mathcal {S}_1 \cup \mathcal {S}_2$|⁠, the Gaussian outcomes |$\lbrace y_i(\mathcal {S})\rbrace _{i=1}^N$|⁠, |$N=10000$|⁠, are independently simulated with mean |$\lbrace \boldsymbol{1}\beta \rbrace _{i=1}^N$|⁠, |$\beta =0$|⁠, |$\boldsymbol{1}\in \mathbb {R}^S$| a vector of one’s, and the spatial covariance function in (1) with |$d=2$|⁠. The spatial variance is modeled through |$\tau (\boldsymbol{s}_j, \boldsymbol{s}_{j^\prime })=\tau ^2$|⁠, the spatial correlation is modeled through |$\boldsymbol{\rho }(\boldsymbol{s}_j) = \boldsymbol{\rho }_1 \mathbb {1}(\boldsymbol{s}_j \in \mathcal {S}_1) + \boldsymbol{\rho }_2 \mathbb {1}(\boldsymbol{s}_j \ \in \mathcal {S}_2)$|⁠, and |$\boldsymbol{\gamma }=\lbrace \log (\tau ^2), \boldsymbol{\rho }^\top _1, \boldsymbol{\rho }^\top _2\rbrace ^\top \in \mathbb {R}^{1+2q}$|⁠. Here, |$\boldsymbol{X}_i \in \mathbb {R}^q$| consists of an intercept and 2 nonspatially varying continuous variables independently generated from a Gaussian distribution with mean 0 and variance 1. The true values of the dependence parameters are set to |$\sigma ^2=1.6$|⁠, |$\tau ^2=3$|⁠, |$\boldsymbol{\rho }_1=(0.5,0.5,0.5)$|⁠, and |$\boldsymbol{\rho }_2=(0.6,0.6,0.6)$|⁠. We estimate |$\boldsymbol{\theta }=\lbrace \beta , \log (\tau ^2), \boldsymbol{\rho }^\top _1, \boldsymbol{\rho }^\top _2, \log (\sigma ^2)\rbrace$| using |$\widehat{\boldsymbol{\theta }}_s$| using the recursive partition of S with |$M=3$|⁠: |$K_1=K_2=2$|⁠, and |$K_3=4$|⁠. Each set |$\mathcal {A}_{k_1\ldots k_m}$| at resolution m is a union of the nearest neighbors in |$\mathcal {S}_1$| and |$\mathcal {S}_2$| separately, so that each set |$\mathcal {A}_{k_1k_2k_3}$| consists of 25 locations from |$\mathcal {S}_1$| and 25 locations from |$\mathcal {S}_2$|⁠, |$\mathcal {A}_{k_1k_2}$| consists of 100 locations from |$\mathcal {S}_1$| and 100 locations from |$\mathcal {S}_2$|⁠, and so on. Analyses are parallelized across 16 CPUs with 2 GB of RAM each. Table 3 reports the RMSE, ESE, ASE, mean bias (BIAS), and CP of our MRRI estimator |$\widehat{\boldsymbol{\theta }}_s$| averaged across 500 simulations, where |$\boldsymbol{\rho }_j=(\rho _{j1},\rho _{j2},\rho _{j3})$|⁠, |$j=1,2$|⁠.

TABLE 3

Simulation metrics of |$\widehat{\boldsymbol{\theta }}_s$| across 500 simulations for the fourth set of simulations: |$S=800$|⁠, |$\boldsymbol{\theta }=\lbrace 0,\log (3),0.5,0.5,0.5,0.6,0.6,0.6,\log (1.6)\rbrace$|⁠.

Parameter	RMSE\|$\times 10^3$\|	ESE\|$\times 10^3$\|	ASE\|$\times 10^3$\|	BIAS\|$\times 10^4$\|	CP \|$(\%)$\|
\|$\beta$\|	1.4	1.4	1.4	−0.78	94
\|$\log (\tau ^2)$\|	1.1	1.1	1.1	0.44	96
\|$\rho _{11}$\|	1.9	1.9	1.9	−0.57	94
\|$\rho _{12}$\|	1.9	1.9	1.8	−0.88	94
\|$\rho _{13}$\|	1.9	1.8	1.8	−1.80	93
\|$\rho _{21}$\|	1.9	1.9	1.9	−0.35	94
\|$\rho _{22}$\|	1.9	1.9	1.9	−1.20	95
\|$\rho _{23}$\|	1.9	1.9	1.9	−0.49	94
\|$\log (\sigma ^2)$\|	1.2	1.2	1.2	−0.51	96

Parameter	RMSE\|$\times 10^3$\|	ESE\|$\times 10^3$\|	ASE\|$\times 10^3$\|	BIAS\|$\times 10^4$\|	CP \|$(\%)$\|
\|$\beta$\|	1.4	1.4	1.4	−0.78	94
\|$\log (\tau ^2)$\|	1.1	1.1	1.1	0.44	96
\|$\rho _{11}$\|	1.9	1.9	1.9	−0.57	94
\|$\rho _{12}$\|	1.9	1.9	1.8	−0.88	94
\|$\rho _{13}$\|	1.9	1.8	1.8	−1.80	93
\|$\rho _{21}$\|	1.9	1.9	1.9	−0.35	94
\|$\rho _{22}$\|	1.9	1.9	1.9	−1.20	95
\|$\rho _{23}$\|	1.9	1.9	1.9	−0.49	94
\|$\log (\sigma ^2)$\|	1.2	1.2	1.2	−0.51	96

TABLE 3

Parameter	RMSE\|$\times 10^3$\|	ESE\|$\times 10^3$\|	ASE\|$\times 10^3$\|	BIAS\|$\times 10^4$\|	CP \|$(\%)$\|
\|$\beta$\|	1.4	1.4	1.4	−0.78	94
\|$\log (\tau ^2)$\|	1.1	1.1	1.1	0.44	96
\|$\rho _{11}$\|	1.9	1.9	1.9	−0.57	94
\|$\rho _{12}$\|	1.9	1.9	1.8	−0.88	94
\|$\rho _{13}$\|	1.9	1.8	1.8	−1.80	93
\|$\rho _{21}$\|	1.9	1.9	1.9	−0.35	94
\|$\rho _{22}$\|	1.9	1.9	1.9	−1.20	95
\|$\rho _{23}$\|	1.9	1.9	1.9	−0.49	94
\|$\log (\sigma ^2)$\|	1.2	1.2	1.2	−0.51	96

Parameter	RMSE\|$\times 10^3$\|	ESE\|$\times 10^3$\|	ASE\|$\times 10^3$\|	BIAS\|$\times 10^4$\|	CP \|$(\%)$\|
\|$\beta$\|	1.4	1.4	1.4	−0.78	94
\|$\log (\tau ^2)$\|	1.1	1.1	1.1	0.44	96
\|$\rho _{11}$\|	1.9	1.9	1.9	−0.57	94
\|$\rho _{12}$\|	1.9	1.9	1.8	−0.88	94
\|$\rho _{13}$\|	1.9	1.8	1.8	−1.80	93
\|$\rho _{21}$\|	1.9	1.9	1.9	−0.35	94
\|$\rho _{22}$\|	1.9	1.9	1.9	−1.20	95
\|$\rho _{23}$\|	1.9	1.9	1.9	−0.49	94
\|$\log (\sigma ^2)$\|	1.2	1.2	1.2	−0.51	96

Parallelizing over |$\widetilde{K}=16$| CPUs, mean elapsed time (Monte Carlo standard deviation) is 29 minutes (22 minutes). Simulation metrics in Table 3 are consistent with the results from the previous simulations: the RMSE, ESE, and ASE are approximately equal, the BIAS is negligible, and the CP reaches the nominal 95% level. Further, to illustrate the high statistical power of our approach, we perform a test of the hypotheses |$H_0: \rho _{12} = \rho _{22}$| versus |$H_A: \rho _{12} \ne \rho _{22}$| using the test statistic |$Z=(\widehat{\rho }_{12} - \widehat{\rho }_{22} -\rho _0)\lbrace \mathbb {V}(\widehat{\rho }_{12}) + \mathbb {V}(\widehat{\rho }_{22}) - 2\mbox{Cov}(\widehat{\rho }_{12}, \widehat{\rho }_{22})\rbrace ^{-1/2}$|⁠, which follows an approximate standard normal distribution under |$H_0$| when |$\rho _0=0$|⁠. In the context of the analysis of Section 4, this test evaluates whether ASD is associated with different spatial correlation in the left and right precentral gyri. Across the 500 simulations, the test rejects the null 100% of the time at level 0.05, illustrating the high statistical power of our approach. Using |$\rho _0=\rho _{12}-\rho _{22}$|⁠, the type-I error rate is 3.6% across the 500 simulations.

4 ESTIMATION OF BRAIN FUNCTIONAL CONNECTIVITY

We return to the motivating neuroimaging application described in Section 1. Out of 1112 ABIDE participants, 774 passed quality control: 379 with ASD, 647 males (335 with ASD) and 127 females (44 with ASD), with mean age 15 years (standard deviation 6 years). The left and right precentral gyri form the primary motor cortex and are responsible for executing voluntary movements. They are two of the largest ROIs in the Harvard-Oxford atlas (FMRIB Software Library, 2018) with 1786 and 1888 voxels, respectively. Many individuals with ASD have motor deficits (Jansiewicz et al., 2006). Atypical connectivity within the left and right precentral gyri may indicate that these motor deficits are related to how these 2 brain regions coordinate movement. The preprocessing pipeline of rfMRI data has already been described by Craddock et al. (2013). Participant-specific data have been registered into a common template space such that voxel locations are comparable between participants.

Define |$\mathcal {S}_1=\lbrace \boldsymbol{s}_j\rbrace _{j=1}^{1888}$| and |$\mathcal {S}_2=\lbrace \boldsymbol{s}_j\rbrace _{j=1889}^{3674}$| the set of voxels in the right and left precentral gyri, respectively, and |$\mathcal {S}=\mathcal {S}_1 \cup \mathcal {S}_2$|⁠. Voxels in the atlas outside the brain are assigned missing. Following the thinning described in Section 2.1, we obtain independent replicates |$y_i(\boldsymbol{s}_j)$| at |$S=3674$| voxel locations, |$i=1, \ldots , 75888$|⁠. We refer to this dataset as the “primary” dataset; we compare estimates from this primary dataset to estimates from a secondary dataset, consisting of the excluded time points, with identical dimensions N and S. There is a priori no reason to believe that the data distribution differs across primary and secondary datasets, and so comparing results across both datasets will allow us to quantify robustness of our analysis, a notoriously difficult task in analyses of rfMRI data (Uddin et al., 2017).

Let |$\boldsymbol{X}_i$| be corresponding observations of |$q=5$| covariates for outcome i: an intercept, ASD status (1 for ASD, 0 for neurotypical), age (centered and standardized), sex (0 for male, 1 for female), and the age (centered and standardized) by ASD status interaction. We model |$\mu (\boldsymbol{s}_j; \boldsymbol{X}_i, \boldsymbol{\beta })=\beta$|⁠, where |$\mu (\boldsymbol{s}_j; \boldsymbol{X}_i, \boldsymbol{\beta })$| is the mean rfMRI time series at voxel |$\boldsymbol{s}_j$|⁠. The covariance |$C_{\alpha }(\boldsymbol{s}_{j^\prime }, \boldsymbol{s}_j; \boldsymbol{X}_i, \boldsymbol{\gamma })$| is that given in Equation (1) of Section 2.1 with |$d=3$| and |$\tau (\boldsymbol{s}_j, \boldsymbol{s}_{j^\prime })= \lbrace (\tau ^2_1)^{\mathbb {1}(\boldsymbol{s}_j \in \mathcal {S}_1) + \mathbb {1}(\boldsymbol{s}_{j^\prime } \in \mathcal {S}_1)} (\tau ^2_2)^{\mathbb {1}(\boldsymbol{s}_j \in \mathcal {S}_2) + \mathbb {1}(\boldsymbol{s}_{j^\prime } \in \mathcal {S}_2)} \rbrace ^{1/2}$|⁠. As in Section 3, the correlation is modeled through |$\boldsymbol{\rho }(\boldsymbol{s}_j) = \boldsymbol{\rho }_1 \mathbb {1}(\boldsymbol{s}_j \in \mathcal {S}_1) + \boldsymbol{\rho }_2 \mathbb {1}(\boldsymbol{s}_j \ \in \mathcal {S}_2)$|⁠. Thus, |$\boldsymbol{\gamma }=\lbrace \log (\tau ^2_1), \log (\tau ^2_2), \boldsymbol{\rho }_1^\top , \boldsymbol{\rho }_2^\top \rbrace \in \mathbb {R}^{2+2q}$| and |$\boldsymbol{\theta }=\lbrace \beta , \boldsymbol{\gamma }, \log (\sigma ^2)\rbrace$|⁠, with |$\boldsymbol{\gamma }$| the parameter of interest describing the effect of covariates on functional connectivity. We interpret the effect of ASD on the correlation structure in detail in the supplement. An illustration of the correlation between ROIs for various values of |$\rho _{j2}, \rho _{j^\prime 2}$| is provided in the supplement.

The size of each (primary and secondary) outcome dataset is 15 GB. To overcome the computational burden of a whole dataset analysis, we estimate |$\boldsymbol{\theta }$| using the sequential estimator |$\widehat{\boldsymbol{\theta }}_s$|⁠. To partition the 3-dimensional spatial domain, we recursively partition |$\mathcal {S}_1$| and |$\mathcal {S}_2$| separately into |$K_1=2$|⁠, |$K_2=K_3=4$| disjoint sets based on nearest neighbors, |$M=3$|⁠. The sets |$\mathcal {A}_{k_1\ldots k_m}$| consist of the union of the disjoint partition sets of |$\mathcal {S}_1$| and |$\mathcal {S}_2$| at each resolution |$m \in \lbrace 1, \ldots , M\rbrace$|⁠. This strategy ensures that each |$\mathcal {A}_{k_1\ldots k_M}$| contains locations from both |$\mathcal {S}_1$| and |$\mathcal {S}_2$|⁠, so that gyri-specific parameters |$\boldsymbol{\rho }_1,\boldsymbol{\rho }_2,\tau _1,\tau _2$| are identifiable in all sets |$\mathcal {A}_{k_1\ldots k_M}$|⁠, |$k_M=1\ldots , K_M$|⁠. A plot of the partitioning is provided in the supplement.

The analysis of the primary and secondary datasets takes 4.28 and 3.62 hours, respectively, when parallelized across 32 CPUs with 4 GB of RAM each; a comparison of these run times is detailed in the supplement. We attempted to obtain the MLE using either the whole primary or secondary dataset by supplying the value of |$\widehat{\boldsymbol{\theta }}_s$| as a starting value. The estimation required more than 170 GB of RAM and terminated without completing after 190 hours. The standard deviation (s.d.) of block-specific estimates of |$\log (\sigma ^2)$| across the |$K_1K_2K_3=32$| blocks is 0.401 and 0.399 in the primary and secondary datasets, respectively, supporting our assumption that |$\sigma ^2$| does not depend on |$\boldsymbol{s}$|⁠; complete summaries are available in the supplement. The estimated effect and s.d. of the intercept, ASD status, age, sex, and age by ASD status interaction are reported in Table 4. In the primary dataset, the estimates (s.d.) of |$\beta$|⁠, |$\log (\sigma ^2)$|⁠, |$\log (\tau ^2_1)$| and |$\log (\tau ^2_2)$| are, respectively, |$-0.00537$| (⁠|$2.03\times 10^{-4}$|⁠), |$-4.05$| (⁠|$3.90\times 10^{-4}$|⁠), |$-0.108$| (⁠|$2.53\times 10^{-4}$|⁠), and |$-0.0699$| (⁠|$2.48\times 10^{-4}$|⁠). In the secondary dataset, the estimates (s.d.) of |$\beta$|⁠, |$\log (\sigma ^2)$|⁠, |$\log (\tau ^2_1)$| and |$\log (\tau ^2_2)$| are, respectively, |$-0.00609$| (⁠|$2.03\times 10^{-4}$|⁠), |$-4.05$| (⁠|$3.88\times 10^{-4}$|⁠), |$-0.111$| (⁠|$2.51\times 10^{-4}$|⁠), and |$-0.0709$| (⁠|$2.46\times 10^{-4}$|⁠). As expected, estimates of |$\beta$| are close to 0 and estimates of |$\sigma ^2+\tau ^2_j$| are close to 1 due to the centering and standardizing. The cosine similarity between standardized estimates of |$\boldsymbol{\theta }$| in the primary and secondary datasets, that is, the cosine of the angle between the 2 standardized vectors |$\widehat{\theta }_{q}/\lbrace \mathbb {V}(\widehat{\theta }_{q}) \rbrace ^{1/2}$|⁠, |$q=1, \ldots , 14$|⁠, is 0.999994, indicating a high degree of agreement between the standardized vectors.

TABLE 4

Estimated covariate effects and s.d. in the primary and secondary datasets.

	\|$\boldsymbol{\rho }_1$\|		\|$\boldsymbol{\rho }_2$\|
Covariate	Estimate	s.d.\|$\times 10^4$\|	Estimate	s.d.\|$\times 10^4$\|
	Primary dataset
Intercept	0.569	1.44	0.561	1.42
ASD status	−0.00676	1.32	−0.0050	1.27
Age	−0.0222	1.07	−0.0120	0.932
Sex	0.0362	1.79	0.0655	1.72
Age by ASD status interaction	−0.00310	1.38	−0.00386	1.25
	Secondary dataset
Intercept	0.568	1.43	0.562	1.41
ASD status	−0.00511	1.32	−0.00472	1.27
Age	−0.0211	1.07	−0.0118	0.931
Sex	0.0341	1.78	0.0643	1.72
Age by ASD status interaction	−0.00503	1.38	−0.00591	1.25

	\|$\boldsymbol{\rho }_1$\|		\|$\boldsymbol{\rho }_2$\|
Covariate	Estimate	s.d.\|$\times 10^4$\|	Estimate	s.d.\|$\times 10^4$\|
	Primary dataset
Intercept	0.569	1.44	0.561	1.42
ASD status	−0.00676	1.32	−0.0050	1.27
Age	−0.0222	1.07	−0.0120	0.932
Sex	0.0362	1.79	0.0655	1.72
Age by ASD status interaction	−0.00310	1.38	−0.00386	1.25
	Secondary dataset
Intercept	0.568	1.43	0.562	1.41
ASD status	−0.00511	1.32	−0.00472	1.27
Age	−0.0211	1.07	−0.0118	0.931
Sex	0.0341	1.78	0.0643	1.72
Age by ASD status interaction	−0.00503	1.38	−0.00591	1.25

TABLE 4

Estimated covariate effects and s.d. in the primary and secondary datasets.

	\|$\boldsymbol{\rho }_1$\|		\|$\boldsymbol{\rho }_2$\|
Covariate	Estimate	s.d.\|$\times 10^4$\|	Estimate	s.d.\|$\times 10^4$\|
	Primary dataset
Intercept	0.569	1.44	0.561	1.42
ASD status	−0.00676	1.32	−0.0050	1.27
Age	−0.0222	1.07	−0.0120	0.932
Sex	0.0362	1.79	0.0655	1.72
Age by ASD status interaction	−0.00310	1.38	−0.00386	1.25
	Secondary dataset
Intercept	0.568	1.43	0.562	1.41
ASD status	−0.00511	1.32	−0.00472	1.27
Age	−0.0211	1.07	−0.0118	0.931
Sex	0.0341	1.78	0.0643	1.72
Age by ASD status interaction	−0.00503	1.38	−0.00591	1.25

	\|$\boldsymbol{\rho }_1$\|		\|$\boldsymbol{\rho }_2$\|
Covariate	Estimate	s.d.\|$\times 10^4$\|	Estimate	s.d.\|$\times 10^4$\|
	Primary dataset
Intercept	0.569	1.44	0.561	1.42
ASD status	−0.00676	1.32	−0.0050	1.27
Age	−0.0222	1.07	−0.0120	0.932
Sex	0.0362	1.79	0.0655	1.72
Age by ASD status interaction	−0.00310	1.38	−0.00386	1.25
	Secondary dataset
Intercept	0.568	1.43	0.562	1.41
ASD status	−0.00511	1.32	−0.00472	1.27
Age	−0.0211	1.07	−0.0118	0.931
Sex	0.0341	1.78	0.0643	1.72
Age by ASD status interaction	−0.00503	1.38	−0.00591	1.25

From a practical perspective, estimates and their standard errors are virtually identical across primary and secondary datasets. Two-sample Z-tests, however, mostly reject the null that elements of |$\boldsymbol{\theta }$| are equal in both datasets at the typical 0.05 level: this is a feature of the sample size and the high power of the test, rather than of true underlying differences between the 2 datasets, and illustrates well the challenges of robustness in analyses of rfMRI data. We calibrate the |$\alpha$|-level of hypothesis testing procedures in our analysis by borrowing ideas from knock-offs (Barber and Candès, 2015). We estimate a data-dependent type-I error rate threshold as the 5% quantile of the observed p-values of the 2-sample tests of equality between parameters in primary and secondary datasets. The Gaussian critical value corresponding to this 5% quantile is |$z_{\mbox{{crit}}}=9.87$|⁠.

Armed with this robust critical value, we evaluate whether the ASD main and interaction effects are significantly different across the 2 brain regions. We perform a test of the hypotheses |$H^m_0: \rho _{12}=\rho _{22}$| versus |$H^m_A: \rho _{12} \ne \rho _{22}$| and |$H^i_0: \rho _{15}=\rho _{25}$| versus |$H^i_A: \rho _{15}\ne \rho _{25}$| using the test statistic Z in Section 3, where |$\rho _{j2}$| and |$\rho _{j5}$| are the ASD main and interaction effects, respectively, in ROI |$\mathcal {S}_j$|⁠, |$j=1,2$|⁠. The test statistic for |$H^m_0$| versus |$H^m_A$| takes a value of 13.99 and 3.04 in the primary and secondary datasets, respectively; since we reject |$H^m_0$| in the primary but not the secondary dataset, we conclude that the main ASD effect is not significantly different in the correlation structure of both brain regions. The test statistic for |$H^i_0$| versus |$H^i_A$| takes a value of 6.14 and 7.13 in the primary and secondary datasets, respectively, and we conclude that the ASD by age interaction effect is not significantly different in the correlation structure of both brain regions. An analysis that excludes the age by ASD status interaction is included in the supplement and agrees with this analysis.

Summaries of distances |$d_{jj^\prime }=(\boldsymbol{s}_{j^\prime } - \boldsymbol{s}_j)^\top (\boldsymbol{s}_{j^\prime } - \boldsymbol{s}_j)$| are provided in the supplement. For a male participant of mean age (15.13 years), the estimated correlation structures between the right and left precentral gyri, within the right precentral gyrus, and within the left precentral gyrus, for a participant with and without ASD are, respectively, |$0.915 \exp (- 1.83 d_{jj^\prime })$| and |$0.915 \exp (- 1.86 d_{jj^\prime })$|⁠, |$0.898 \exp (- 1.85 d_{jj^\prime })$| and |$0.898 \exp (- 1.89 d_{jj^\prime })$|⁠, and |$0.932 \exp (- 1.82 d_{jj^\prime })$| and |$0.932 \exp (- 1.84 d_{jj^\prime })$|⁠. ASD manifests as hyperconnectivity within and between the right and left precentral gyri. These findings concur with those of Nebel et al. (2014). These results also concur with a less powered analysis that averages each participant’s rfMRI time series at each voxel, then regresses the participants’ Pearson correlation between the right and left precentral gyri onto an intercept, ASD status, age, sex, and the age by ASD status interaction. Estimates (s.d.) of covariate effects from this analysis are 0.726 (0.00759), |$-0.0457$| (0.0101), 0.0159 (0.00745), |$-0.0111$| (0.0137), and |$-0.00293$| (0.0101). Only the intercept, ASD status, and age effects are significant at level 0.05, highlighting the superior power of our spatial approach.

5 DISCUSSION

The proposed MRRI estimators depend on the choice of the recursive partition of |$\mathcal {S}$|⁠. We have suggested that this partitioning be performed such that |$S_{k_1\ldots k_m}$| and |$K_m$|⁠, |$m=1, \ldots , M$| are relatively small compared to |$S_0$| and shown through simulations that this leads to desirable statistical and computational performance. Nonetheless, the GMM is known to underestimate the variance of estimators when |$K_m$| is moderately large relative to N (Hansen et al., 1996). Thus, special care should be taken to ensure |$K_m$| is relatively low-dimensional.

In this paper, we have allowed |$\boldsymbol{\theta }$| to vary spatially under the constraint that it can be consistently estimated using subsets of the spatial domain |$\mathcal {S}$|⁠. While we have assumed that |$\sigma ^2$| does not depend on |$\boldsymbol{s}$|⁠, a model that allows |$\sigma ^2$| to depend on |$\boldsymbol{s}$| can easily be fitted in our framework under this constraint. Other settings may consider a setting in which each subset is modeled through its own, subset-specific parameter. Future research should focus on the development of recursive and sequential integration rules for this setting with partially heterogeneous parameters |$\boldsymbol{\theta }$| following, for example, the work of Manschot and Hector (2022) and Hector and Reich (2024) in spatially varying coefficient models. Such extensions would allow parameters such as the nugget variance |$\sigma ^2$| to vary with |$\boldsymbol{s}$|⁠. A priori, these extensions should follow from zero-padding the weight matrices in Equations (4) and (6).

While our approach was motivated by a comparison of functional connectivity between participants with and without ASD, the proposed methods are generally applicable to Gaussian process modeling of high-dimensional images. Our ABIDE analysis has primarily focused on modeling correlation between the left and right precentral gyri as a function of the within-ROI correlations. This analysis is suitable when within-ROI correlation is of primary interest. Extensions that model the between-ROI correlation through a more flexible covariance structure are of interest but beyond the scope of the present work.

ACKNOWLEDGMENTS

The authors thank the reviewers and associate editor for their valuable feedback, which led to an improved manuscript. The authors also thank the ABIDE study organizers and members.

FUNDING

E.C.H. and B.J.R. were supported by the National Science Foundation grant DMS 2152887.

CONFLICT OF INTEREST

None declared.

DATA AVAILABILITY

ABIDE I data usage is unrestricted for noncommercial research purposes. To obtain access to ABIDE I data, complete registration with NITRC at nitrc.org/ir, then register with the 1000 Functional Connectomes Project at nitrc.org/projects/fcon_1000. Data access is also described at fcon_1000.projects.nitrc.org/ indi/abide/abide_I.html.

References

Alaerts

Woolley

D. G.

Steyaert

Di Martino

Swinnen

S. P.

Wenderoth

. (

2014

Underconnectivity of the superior temporal sulcus predicts emotion recognition deficits in autism

Social Cognitive and Affective Neuroscience

1589

–

1600

Arbabshirani

M. R.

Damaraju

Phlypo

Plis

Allen

et al. (

2014

Impact of autocorrelation on functional connectivity

NeuroImage

102

294

–

308

Banerjee

Carlin

B. P.

Gelfand

A. E.

(

2014

Hierarchical Modeling and Analysis for Spatial Data

Boca Raton, Florida

Chapman and Hall

Barber

R. F.

Candès

E. J.

(

2015

Controlling the false discovery rate via knockoffs

The Annals of Statistics

2055

–

2085

Bradley

J. R.

Cressie

N. A.

Shi

(

2016

A comparison of spatial predictors when datasets could be very large

Statistics Surveys

100

–

131

Craddock

Benhajali

Chu

Chouinard

Evans

Jakab

et al. (

2013

The Neuro Bureau Preprocessing Initiative: open sharing of preprocessed neuroimaging data and derivatives

Frontiers in Neuroinformatics

Google Preview

Cressie

N. A.

(

1993

Statistics for Spatial Data

New York

Wiley

Cressie

N. A.

Johannesson

(

2008

Fixed rank kriging for very large spatial data sets

Journal of the Royal Statistical Society Series B (Statistical Methodology)

209

–

226

Cressie

N. A.

Wikle

C. K.

(

2015

Statistics for Spatio-Temporal Data

Hoboken, New Jersey

Wiley

Google Preview

. https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/Atlases.

Datta

Banerjee

Finley

A. O.

Gelfand

A. E.

(

2016

Hierarchical nearest-neighbour Gaussian process models for large geostatistical datasets

Journal of the American Statistical Association

111

800

–

812

Di Martino

Yan

C.-G.

Denio

Castellanos

Alaerts

et al. (

2014

The autism brain imaging data exchange: towards a large-scale evaluation of the intrinsic brain architecture in autism

Molecular Psychiatry

659

–

667

Finley

A. O.

Datta

Cook

B. D.

Morton

D. C.

Andersen

H. E.

Banerjee

. (

2019

Efficient algorithms for Bayesian nearest neighbour Gaussian processes

Journal of Computational and Graphical Statistics

401

–

414

FMRIB Software Library

(

2018

FSL Atlases

[Accessed 18 August 2021]

Fuentes

(

2007

Approximate Likelihood for Large Irregularly Spaced Spatial Data

Journal of the American Statistical Association

102

321

–

331

Furrer

Genton

M. G.

Nychka

D. W.

(

2006

Covariance tapering for interpolation of large spatial datasets

Journal of Computational and Graphical Statistics

502

–

523

Godambe

V. P.

(

1991

Estimating Functions

New York

Oxford University Press

Hansen

L. P.

(

1982

Large sample properties of generalized method of moments estimators

Econometrica

1029

–

1054

Hansen

L. P.

Heaton

Yaron

(

1996

Finite-sample properties of some alternative GMM estimators

Journal of Business and Economic Statistics

262

–

280

Heaton

M. J.

Datta

Finley

A. O.

Furrer

Guinness

Guhaniyogi

et al. (

2019

A case study competition among methods for analyzing large spatial data

Journal of Agricultural, Biological and Environmental Statistics

398

–

425

Hector

E. C.

Reich

B. J.

(

2024

Distributed inference for spatial extremes modeling in high dimensions

Journal of the American Statistical Association

119

1297

–

1308

Hector

E. C.

Song

P. X.-K.

(

2021

A distributed and integrated method of moments for high-dimensional correlated data analysis

Journal of the American Statistical Association

116

805

–

818

Heyde

C. C.

(

1997

Quasi-Likelihood and its Application: a General Approach to Optimal Parameter Estimation

New York

Springer

Jansiewicz

E. M.

Goldberg

M. C.

Newschaffer

C. J.

Denckla

M. B.

Landa

Mostofsky

S. H

. (

2006

Motor signs distinguish children with high functioning autism and Asperger’s syndrome from controls

Journal of Autism and Developmental Disorders

613

–

621

Katzfuss

(

2017

A multi-resolution approximation for massive spatial datasets

Journal of the American Statistical Association

112

201

–

214

Katzfuss

Gong

(

2020

A class of multi-resolution approximations for large spatial datasets

Statistica Sinica

2203

–

2226

Kaufman

C. G.

Schervish

M. J.

Nychka

D. W.

(

2008

Covariance tapering for likelihood-based estimation in large spatial data sets

Journal of the American Statistical Association

103

1545

–

1555

Lindsay

B. G.

(

1988

Composite likelihood methods

Contemporary Mathematics

220

–

239

Liu

Ong

Y.-S.

Shen

Cai

(

2020

When Gaussian process meets big data: a review of scalable GPs

IEEE Transactions on Neural Networks and Learning Systems

–

Manschot

Hector

E. C.

(

2022

Functional regression with intensively measured longitudinal outcomes: a new lens through data partitioning

arXiv

arXiv:2207.13014

Monti

(

2011

Statistical analysis of fMRI time-series: a critical review of the GLM approach

Frontiers in Human Neuroscience

Nebel

M. B.

Eloyan

Barber

A. D.

Mostofsky

S. H.

(

2014

Precentral gyrus functional connectivity signatures of autism

Frontiers in Systems Neuroscience

Nychka

D. W.

Bandyopadhyay

Hammerling

Lindgren

Sain

(

2015

A multiresolution Gaussian process model for the analysis of large spatial datasets

Journal of Computational and Graphical Statistics

579

–

599

Paciorek

Schervish

(

2003

Nonstationary covariance functions for Gaussian process regression

. In:

Advances in Neural Information Processing Systems

. 16.

Cambridge, Massachusetts

MIT Press

Sang

Huang

J. Z.

(

2012

A full scale approximation of covariance functions for large spatial data sets

Journal of the Royal Statistical Society, Series B

111

–

132

Shehzad

Kelly

Reiss

P. T.

Craddock

R. C.

Emerson

J. W.

McMahon

et al. (

2014

A multivariate distance-based analytic framework for connectome-wide association studies

NeuroImage

–

Stein

M. L.

(

2013

Statistical properties of covariance tapers

Journal of Computational and Graphical Statistics

866

–

885

Sun

Genton

M. G.

(

2011

Geostatistics for large datasets

. In:

Advances and Challenges in Space-time Modelling of Natural Events

–

New York

Springer Nature

Uddin

L. Q.

Dajani

D. R.

Voorhies

Bednarz

Kana

R. K.

(

2017

Progress and roadblocks in the search for brain-based biomarkers of autism and attention-deficit/hyperactivity disorder

Translational Psychiatry

e1218

Zimmerman

D. L.

(

1989

Computationally exploitable structure of covariance matrices and generalized covariance matrices in spatial models

Journal of Statistical Computation and Simulation

1015