Sílvia Gonçalves, Julia Koh, Benoit Perron, Bootstrap Inference for Group Factor Models, Journal of Financial Econometrics, Volume 23, Issue 2, 2025, nbae020, https://doi.org/10.1093/jjfinec/nbae020
Abstract
Andreou et al. (2019) have proposed a test for common factors based on canonical correlations between factors estimated separately from each group. We propose a simple bootstrap test that avoids the need to estimate the bias and variance of the canonical correlations explicitly and provide high-level conditions for its validity. We verify these conditions for a wild bootstrap scheme similar to the one proposed in Gonçalves and Perron (2014). Simulation experiments show that this bootstrap approach leads to null rejection rates closer to the nominal level in all of our designs compared to the asymptotic framework.
Factor models have been extensively used in the past decades to reduce the dimension of large data sets. They are now widely used in forecasting, as controls in regressions, and as a tool to model cross-sectional dependence.
Andreou et al. (2019) have proposed a test of whether two groups of data contain common factors. The test consists of estimating a set of factors for each subgroup using principal components and testing whether some canonical correlations between these two groups of estimated factors equal 1, as they would if there were factors common to both groups of data. Inference in this situation is complicated by the need to account for the preliminary estimation of the factors. The asymptotic theory in Andreou et al. (2019) is highly nonstandard, featuring unusual rates of convergence and an asymptotic bias. Under restrictive assumptions, they propose an estimator for this bias and construct a feasible statistic. However, their simulation results suggest that, even under these restrictive assumptions, their statistic can exhibit level distortions.
This approach was applied in Andreou et al. (2022) to sets of returns on individual stocks and on portfolios. In principle, these two sets of returns should share a common set of factors that represent the stochastic discount factor. The authors find a set of three common factors that price both individual stocks and sorted portfolios. They also find that 10 principal components from the large number of factors proposed in the literature to price stocks (the factor zoo) are needed to span the space of these three common factors.
This article proposes the bootstrap as an alternative inference method. Our main contribution is to propose a simple bootstrap test that avoids the need to estimate the bias and variance of the canonical correlations explicitly. We show its validity under a set of high-level conditions that allow for weak dependence in the data-generating process (DGP). The specific bootstrap scheme to be used depends on the assumptions a researcher is willing to make about this dependence.
For example, if a researcher is willing to assume that the idiosyncratic terms do not exhibit cross-sectional or serial correlation, we show that a wild bootstrap is valid in this context. This is analogous to the results in Gonçalves and Perron (2014), henceforth GP (2014), who showed the validity of a wild bootstrap in the context of factor-augmented regression models. If the presence of cross-sectional dependence is important, a researcher could instead use the cross-sectional dependent (CSD) bootstrap of Gonçalves and Perron (2020). If serial correlation in the idiosyncratic errors is relevant, Koh (2022) proposed an autoregressive sieve bootstrap for factor models. Finally, we also discuss an extension of this method that allows for cross-sectional and serial dependence in the idiosyncratic errors.
The bootstrap has recently been applied in Andreou et al. (2024) to test for the number of common factors. Contrary to our framework, which follows Andreou et al. (2019), one set of factors is assumed to be observed there, implying that their bootstrap method is different from ours.
The remainder of the article is organized as follows. Section 1 describes the model and the testing problem in Andreou et al. (2019). Section 2 introduces a general bootstrap scheme in this context and provides a set of high-level conditions under which the bootstrap test is asymptotically valid under the null hypothesis. We also provide a set of sufficient conditions that ensure the bootstrap test is consistent under the alternative hypothesis. Section 3 verifies these conditions for the wild bootstrap method of GP (2014) under a set of assumptions similar to those in Andreou et al. (2019). Section 4 provides simulation results, and Section 5 concludes. We provide three appendices. Appendix A contains a set of assumptions under which we derive the limiting distribution of the original test statistic as well as auxiliary lemmas used to derive this asymptotic distribution. Appendix B contains a set of bootstrap high-level conditions that mirror the primitive assumptions in Appendix A. It also provides the bootstrap analog of the auxiliary lemmas introduced in Appendix A, which are used to prove the bootstrap results in Section 2. Finally, Appendix C contains the proofs of the bootstrap results for the wild bootstrap method proposed in Section 3.
A final word on notation. For a bootstrap sequence, say $Z^*_{NT}$, we use $Z^*_{NT} = o_{P^*}(1)$, or, equivalently, $Z^*_{NT} \to_{P^*} 0$, as $N, T \to \infty$, in probability, to mean that, for any $\epsilon > 0$, $P^*\left(\left|Z^*_{NT}\right| > \epsilon\right) = o_P(1)$, where $P^*$ denotes the probability measure conditionally on the original data. An equivalent notation is $Z^*_{NT} = o_{P^*}(1)$ (where we omit the qualification “in probability” for brevity). We also write $Z^*_{NT} = O_{P^*}(1)$ to mean that $P^*\left(\left|Z^*_{NT}\right| > M\right) = o_P(1)$ for some large enough M. Finally, we write $Z^*_{NT} \to_{d^*} X$ or, equivalently, $Z^*_{NT} \Rightarrow^* X$, in probability, to mean that, for all continuity points $x$ of the cdf of X, say $F(x)$, we have that $P^*\left(Z^*_{NT} \leq x\right) \to_P F(x)$.
1 Framework
1.1 The Group Panel Factor Model
However, we allow for the possibility that and are correlated with covariance matrix .
1.2 The Testing Problem
The critical value used in Andreou et al. (2019), henceforth AGGR (2019), is obtained from the asymptotic distribution of the test statistic as $N_1, N_2, T \to \infty$. Our goal in this article is to propose an alternative method of inference based on the bootstrap.
1.3 Canonical Correlations, Common and Group-Specific Factors, and Their Loadings
Here, we define the estimators of the canonical correlations. In the process of doing so, we also define the estimators of the common and group-specific factors and factor loadings. These will be used to form our bootstrap DGP.
Given $Y_j$, we estimate $F_j$ and $\Lambda_j$ with the standard method of principal components. In particular, $F_j$ is estimated with the matrix $\tilde F^j$ composed of $\sqrt{T}$ times the eigenvectors corresponding to the $k_j$ largest eigenvalues of $Y_j Y_j'/(N_j T)$ (arranged in decreasing order), where the normalization $\tilde F^{j\prime}\tilde F^j/T = I_{k_j}$ is used. The factor loading matrix is $\tilde\Lambda_j = Y_j'\tilde F^j/T$.
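To make this step concrete, the following sketch implements the principal components estimator for one sub-panel in numpy. The formulas follow the standard Bai (2003) estimator described above; the function and variable names are ours.

```python
import numpy as np

def pca_factors(Y, k):
    """Principal components estimation of a T x N panel Y with k factors.

    Returns Fhat (T x k), normalized so that Fhat' Fhat / T = I_k, and
    Lamhat (N x k) given by Y' Fhat / T, as in Bai (2003).
    """
    T, N = Y.shape
    # Eigenvectors of Y Y' / (N T) for the k largest eigenvalues,
    # arranged in decreasing order.
    eigval, eigvec = np.linalg.eigh(Y @ Y.T / (N * T))
    order = np.argsort(eigval)[::-1][:k]
    Fhat = np.sqrt(T) * eigvec[:, order]
    Lamhat = Y.T @ Fhat / T
    return Fhat, Lamhat
```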
The $k_c$ largest eigenvalues of $\hat R = \left(\tilde F^{1\prime}\tilde F^1/T\right)^{-1}\left(\tilde F^{1\prime}\tilde F^2/T\right)\left(\tilde F^{2\prime}\tilde F^2/T\right)^{-1}\left(\tilde F^{2\prime}\tilde F^1/T\right)$ are denoted by $\hat\rho_1^2 \geq \cdots \geq \hat\rho_{k_c}^2$. They correspond to the largest $k_c$ sample squared canonical correlations between $\tilde F^1$ and $\tilde F^2$.
For our bootstrap DGP (to be defined in the next section), we also need estimates of the common and group-specific factors and loadings. These estimates are also used to obtain the test statistic proposed by AGGR (2019). Hence, we describe them next.
First, using Definition 1 of AGGR (2019), we can estimate the common factors as follows. Let the $k_c$ associated eigenvectors of $\hat R$ (the canonical directions) be collected in the matrix $\hat W$, normalized to have length one. Given $\hat W$, an estimator of the common factors is $\hat F^c = \tilde F^1 \hat W$.
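The canonical-correlation step can be sketched as follows. Since the displayed formulas above were garbled in extraction, the matrix R below is the textbook construction of squared canonical correlations between two sets of variables, applied to the estimated factors; read it as an illustration of the procedure rather than a verbatim transcription of AGGR (2019)'s formulas.

```python
import numpy as np

def canonical_correlations(F1, F2, kc):
    """kc largest sample squared canonical correlations between the factor
    sets F1 (T x k1) and F2 (T x k2), plus the implied common-factor
    estimate F1 @ W, where W collects the canonical directions."""
    T = F1.shape[0]
    A11, A22, A12 = F1.T @ F1 / T, F2.T @ F2 / T, F1.T @ F2 / T
    R = np.linalg.solve(A11, A12) @ np.linalg.solve(A22, A12.T)
    eigval, eigvec = np.linalg.eig(R)
    order = np.argsort(eigval.real)[::-1][:kc]
    rho2 = eigval.real[order]            # squared canonical correlations
    W = eigvec.real[:, order]
    W /= np.linalg.norm(W, axis=0)       # normalize directions to length one
    return rho2, F1 @ W                  # (kc,), (T x kc) common factors
```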
The matrix Ξj is defined as . Given Ξj, we can now apply the method of principal components to obtain , composed of $\sqrt{T}$ times the eigenvectors corresponding to the largest eigenvalues of $\Xi_j\Xi_j'/(N_jT)$ (arranged in decreasing order), where the analogous normalization is used.
1.4 The Test Statistic and Its Asymptotic Distribution
Our goal is to propose a bootstrap test based on the bootstrap analogue of , say . In particular, we focus on obtaining a valid bootstrap p-value .2
To understand the properties that a bootstrap test should have in order to be asymptotically valid, we first review the large sample properties of this test statistic, as studied by AGGR (2019). In the following, we let (without loss of generality) and define . Since . We assume that . When , .
Appendix A contains a set of assumptions under which we derive the asymptotic distribution of . Compared to AGGR (2019), we impose a stricter rate condition on N relative to T. In particular, while our Assumption 1 maintains AGGR (2019)’s assumption that , we require that as opposed to . The main reason why we adopt a stricter rate condition is that it greatly simplifies both the asymptotic and the bootstrap theory.3 In addition, we generalize standard assumptions in the literature on factor models (see, e.g., Bai (2003), Bai and Ng (2006), and GP (2014)) to the group factor context of interest here. Our assumptions suggest natural bootstrap high-level conditions (which we provide in Appendix B) under which the bootstrap asymptotic distribution can be derived. Since some of these bootstrap conditions have already been verified in the previous literature, we can rely on existing results in proving our bootstrap theory. In contrast, AGGR (2019)’s assumptions are not easily adapted to proving our bootstrap theory.
Next, we characterize the asymptotic distribution of under Assumptions 1–6 in Appendix A. We introduce the following notation. First, we let and note that ujt captures the factors estimation uncertainty for panel j. In particular, as is well known (cf. Bai (2003)), estimation of fjt by principal components implies that each estimator is consistent for , a rotated version of fjt. The rotation matrix is defined as , where is a diagonal matrix containing the kj largest eigenvalues of on the main diagonal, in decreasing order. As shown by Bai (2003), ujt is the leading term in the asymptotic expansion of . We let denote the vector containing the first kc rows of and define . Finally, we let and .
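For reference, under Bai (2003)'s framework the rotation matrix takes the following form; this display is a reconstruction based on that framework, since the original formula was lost in extraction:

```latex
H_j \;=\; \left( \frac{\Lambda_j' \Lambda_j}{N_j} \right)
          \left( \frac{F_j' \tilde{F}^j}{T} \right) \tilde{V}_j^{-1},
```

where $\tilde{V}_j$ is the $k_j \times k_j$ diagonal matrix of the $k_j$ largest eigenvalues of $Y_j Y_j'/(N_j T)$, in decreasing order.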
Theorem 2.1 corresponds to Theorem 1 in AGGR (2019) under our Assumptions 1–6. For completeness, we provide a proof of this result in Appendix A. As in AGGR (2019), we obtain an asymptotic expansion of around , where . We then use the fact that under the null hypothesis, fjt and fkt share a set of common factors (i.e., for j = 1, 2), implying that the kc largest eigenvalues of are all equal to 1. This explains why is centered around kc under the null. However, the asymptotic distribution of depends on the contribution of the factors estimation uncertainty to , which involves products of and . Using Bai (2003)’s identity for the factor estimation error , we rely on Lemma A.2 in Appendix A (which gives an asymptotic expansion of up to order ) to obtain the asymptotic distribution in Theorem 2.1.4
Under our assumptions, the leading term of the asymptotic expansion of in Equation (2) is given by , where . Since under our assumptions, is of order . The asymptotic Gaussianity of the test statistic is driven by the first term on the right-hand side of Equation (2), which we can rewrite as , where . Under Assumption 6, satisfies a central limit theorem, that is, we assume5 that . Hence, is asymptotically distributed as , as stated in Equation (3). Note that in deriving this result we have used the fact that and to show that the remainder is .
Theorem 2.1 illustrates two crucial features of the asymptotic properties of the test statistic under the null. First, the test converges at a non-standard rate given by . Second, the statistic is not centered at even under the null. There is an asymptotic bias term of order given by . When multiplied by , this term is of order . Thus, the bias is diverging but at a slower rate than the convergence rate .
Two crucial conditions for showing that are (i) and (ii) . Under these conditions, we can use a standard normal critical value to test against . Since under is large and negative, the decision rule is to reject whenever , where is the α-quantile of a distribution. This is the approach followed by AGGR (2019).
As it turns out, estimating and is a difficult task when we allow for general time series and cross-sectional dependence in the idiosyncratic errors . In particular, we can show that depends on the cross-sectional dependence of and (but not on their serial dependence), whereas depends on both forms of dependence.
To see that depends on both serial and cross-sectional dependence in , note that is the long run variance of , whose form depends on the potential serial dependence of . It also depends on the cross-sectional dependence because is a (quadratic) function of ujt, which depends on the cross-sectional averages of . Thus, we conclude that is a complicated function of the serial and cross-sectional dependence in the idiosyncratic error terms.
For these reasons, in order to obtain a feasible test statistic, AGGR (2019) assume that each sub-panel follows a strict factor model. Under this assumption (including the assumption of conditional homoskedasticity in the idiosyncratic errors), the form of and simplifies considerably. Their Theorem 2 provides consistent estimators of these quantities, allowing for the construction of a feasible test statistic. However, even under these restrictive assumptions, our simulations (to be discussed later) show important level distortions.
This provides the main motivation for using the bootstrap as an alternative method of inference. Our main goal is to propose a simple bootstrap test that avoids the need to estimate and explicitly and outperforms the asymptotic theory-based test of AGGR (2019).
2 A General Bootstrap Scheme
2.1 The Bootstrap DGP and the Bootstrap Statistics
The goal of this section is to propose asymptotically valid bootstrap methods. A crucial condition for bootstrap validity is that the bootstrap p-value is asymptotically distributed as a uniform random variable on $[0,1]$ when the null hypothesis holds. Under the alternative, the bootstrap p-value should converge to zero in probability to ensure that the bootstrap test has power. We propose a general residual-based bootstrap scheme that resamples the residuals from the two sub-panels in order to create the bootstrap data on and . We highlight the crucial conditions that the resampled idiosyncratic errors and need to verify in order to produce an asymptotically valid bootstrap test.
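Once the B bootstrap statistics have been computed, the decision rule itself is short. This minimal sketch assumes a left-tailed test (the statistic is large and negative under the alternative, as discussed in Section 1.4); the function name and interface are ours.

```python
import numpy as np

def bootstrap_pvalue(stat_obs, stat_boot, alpha=0.05):
    """Left-tailed bootstrap p-value and test decision.

    stat_obs  : test statistic computed on the original panels
    stat_boot : array of B bootstrap statistics from resampled panels
    Rejects the null of kc common factors when p* <= alpha.
    """
    pstar = np.mean(np.asarray(stat_boot) <= stat_obs)
    return pstar, pstar <= alpha
```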
Estimation in the bootstrap world proceeds as in the original sample. First, we extract the $k_j$ principal components for each group j by applying the method of principal components to each bootstrap sub-panel. In particular, the matrix contains the estimated factors for each bootstrap sample generated from . The matrix collects the eigenvectors corresponding to the kj largest eigenvalues of (arranged in decreasing order and multiplied by $\sqrt{T}$), where we impose that . We then compute , where . The bootstrap test statistic is , where is a diagonal matrix containing the kc largest eigenvalues of obtained from the eigenvalue–eigenvector problem , where is the corresponding eigenvector matrix.
As in the original sample, estimation by principal components using the bootstrap data implies that each estimator is consistent for , a rotated version of . The bootstrap rotation matrix is defined as , where is a diagonal matrix containing the kj largest eigenvalues of on the main diagonal, in decreasing order. Contrary to Hj, is observed and could be used for inference on the factors as in GP (2014). Here, the bootstrap test statistic is invariant to , but it shows up in the bootstrap theory. The bootstrap p-value is based on , where is centered around kc because we have imposed the null hypothesis in the bootstrap DGP in Equation (4).
Next, we characterize the bootstrap distribution of . Following the proof of Theorem 2.1, we expand around , where is the bootstrap analogue of .7 Given Equation (4), and share a set of common factors (i.e., for j = 1, 2), implying that the kc largest eigenvalues of are all equal to 1 and is centered around kc. Note that this holds by construction, independently of whether the null hypothesis is true or not. As argued for the original statistic, the bootstrap distribution of is driven by the contribution of the factors estimation uncertainty (as measured by ) to . In particular, following the proof of Theorem 2.1, the asymptotic distribution of is based on an asymptotic expansion of up to order . This crucial result is given in Lemma B.2 in Appendix B. It relies on Conditions A*, B*, and C*, which are the bootstrap analogues of Assumptions 3, 4, and 5. We call these bootstrap high-level conditions because they apply to any bootstrap method that is used to generate the bootstrap draws . We will verify these conditions for the wild bootstrap in the next section.
The following result follows under Conditions A*–C*. We let , where denotes the first kc rows of . Similarly, we let , which is the bootstrap analogue of .
Lemma 3.1 gives the asymptotic expansion of and is the bootstrap analogue of Equation (2) in Theorem 2.1. The leading term in the expansion of in Equation (5) is given by , where is the bootstrap analogue of . Note that in the bootstrap world, , which explains why is omitted from the definition of . Under our bootstrap high-level conditions, is of order .
To show the asymptotic validity of the bootstrap test, we impose the following additional bootstrap high-level conditions. We define , and let .
Condition D*.
Condition E*, where is such that .
Condition D* requires the bootstrap bias to mimic the bias term . In particular, needs to be a -convergent estimator of . Having does not suffice. The main reason for the faster rate of convergence requirement is that the asymptotic bias term is of order and since the convergence rate is , this induces a shift of the center of the distribution of order . So, contrary to more standard settings where the asymptotic bias is of order , here the asymptotic bias diverges. However, any -consistent estimator of can be used to recenter and yield a random variable whose limiting distribution is . Condition D* requires that the bootstrap bias has this property. Condition E* requires that the bootstrap array satisfies a central limit theorem in the bootstrap world with an asymptotic variance–covariance matrix that converges in probability to . This condition is the bootstrap analogue of Assumption 6(b) in Appendix A.
We discuss a few implications of our bootstrap high-level conditions. The first one is that for the bootstrap to mimic the asymptotic bias term (as implied by Condition D*) we need to generate in a way that preserves the cross-sectional dependence of . Serial dependence in is asymptotically irrelevant for this term. The reason for this is that depends only on the cross-sectional dependence but not on the serial dependence of , as we explained in the previous section.
The second implication is that in order for the bootstrap to replicate the covariance (as required by Condition E*) we need to design a bootstrap method that generates with serial dependence (in addition to cross-sectional dependence). This can be seen by noting that is the long run variance of , which depends on both the serial and the cross-sectional dependence properties of .
The overall conclusion is that the implementation of the bootstrap depends on the serial and cross-sectional dependence assumptions we make on the idiosyncratic errors of each sub-panel. Different assumptions will lead to different bootstrap algorithms. Theorem 3.1 is useful because it gives a set of high-level conditions that can be used to prove the asymptotic validity of the bootstrap for any bootstrap scheme used to obtain .
To end this section, we discuss the asymptotic power of our bootstrap test. Although Conditions A*–E* suffice to show that under , a weaker set of assumptions suffices. In particular, the following high-level condition is sufficient to ensure that any bootstrap test based on is consistent.
Condition F* and , where ϵ is some positive number.
Under Assumptions 1–6, any bootstrap method that verifies Conditions A*–C* and F* satisfies under .
Since we reject if , Proposition 3.1 ensures that when is true.
3 Specific Bootstrap Schemes
3.1 The Wild Bootstrap Method
Here, we discuss a wild bootstrap method and show that it verifies Conditions A*–F* under a set of assumptions similar to those of Theorem 2 in AGGR (2019). Algorithm 1 contains a description of this method.
Wild Bootstrap
3. Compute the kc largest eigenvalues of and denote these by .
4. Compute the bootstrap test statistic .
6. Reject the null hypothesis of kc common factors at level α if .
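Steps 1, 2, and 5 of Algorithm 1 were lost in extraction; the sketch below fills in the resampling step under stated assumptions, patterned on GP (2014): the bootstrap panels are built from the restricted (null-imposed) factor fit plus wild-bootstrap errors, and the statistic is then recomputed on the bootstrap panels exactly as on the original data.

```python
import numpy as np

def wild_bootstrap_panels(Fhat_list, Lamhat_list, resid_list, rng):
    """One wild-bootstrap draw of the two sub-panels (sketch of Algorithm 1).

    Fhat_list[j]   : T x k_j factors estimated with the null imposed
    Lamhat_list[j] : N_j x k_j estimated loadings
    resid_list[j]  : T x N_j residuals from the restricted fit
    Each residual is multiplied by an independent N(0,1) draw, which
    preserves (conditional) heteroskedasticity but destroys serial and
    cross-sectional dependence, the setting under which Algorithm 1 is
    shown to be asymptotically valid.
    """
    Ystar = []
    for Fhat, Lam, E in zip(Fhat_list, Lamhat_list, resid_list):
        eta = rng.standard_normal(E.shape)  # external wild-bootstrap draws
        Ystar.append(Fhat @ Lam.T + E * eta)
    return Ystar
```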
To prove the asymptotic validity of the wild bootstrap p-value, we strengthen the primitive assumptions given in Appendix A as follows:
For j = 1, 2, and are mutually independent such that and for all (i, t).
(a) if or or , and (b) .
For each j = 1, 2,
for any i, k.
Assumption WB1 strengthens the moment conditions in Assumptions 2 and 3(a). A larger number of moments of fjt and of the idiosyncratic errors is required here than in GP (2014): we require the existence of 32 moments rather than 12. As explained above, our bootstrap test statistic involves products and cross products of bootstrap estimated factors from each sub-panel. The derivation of the bootstrap asymptotic distribution of relies on Lemma B.2, which obtains an asymptotic expansion of up to order . This requires not only the verification of Conditions A* and B* from GP (2014) (who obtain an asymptotic expansion of up to order ), but also of Condition C*, which is new to this article. The large number of moments is used in verifying this condition. In particular, we rely on repeated applications of the Cauchy–Schwarz inequality and bound sums such as for , which requires the existence of 2p moments of fjt and (see Lemma C.1).
Assumption WB2 rules out cross-sectional and serial correlation in the idiosyncratic errors of each sub-panel as well as correlation between and for . These assumptions are similar to those used by AGGR (2019) to justify their feasible test statistic (see their Theorem 2). For simplicity, we assume the external random variable to be Gaussian, but the result generalizes to any i.i.d. draw with mean zero, variance one, finite eighth moments, and a symmetric distribution.
Assume that Assumptions 1–6 strengthened by Assumptions WB1, WB2, and WB3 hold. Then, if Algorithm 1 is used to generate for j = 1, 2, the conclusions of Theorem 3.1 and Proposition 3.1 apply.
Theorem 4.1 theoretically justifies using the wild bootstrap p-value to test the null hypothesis of kc common factors. Although Assumption WB2 rules out dependence in in both dimensions, as in Theorem 2 of AGGR (2019), this bootstrap test requires neither an explicit bias correction nor a variance estimator. We show in Section 4 that the feasible test statistic of AGGR (2019) can be oversized even under these restrictive assumptions. The wild bootstrap essentially eliminates these level distortions.
3.2 An Extension: AR-CSD Bootstrap Method
The fact that we allow for a possibly non-diagonal covariance matrix means that we allow for cross-sectional dependence in the innovations vjt.
Our proposal is to create bootstrap observations using a residual-based bootstrap procedure that resamples the residuals of the AR model (6). Resampling the vector of the AR(p) residuals allowing for unrestricted cross-sectional dependence is complicated due to the fact that the covariance matrix is high dimensional. In particular, i.i.d. resampling of is not valid, as shown by Gonçalves and Perron (2020) in the context of factor-augmented regression models. Our bootstrap algorithm (described in Algorithm 2) relies on the CSD bootstrap of Gonçalves and Perron (2020). In the following, we let denote any consistent estimator of under the spectral norm. Examples include the thresholding estimator of Bickel and Levina (2008a) and the banding estimator of Bickel and Levina (2008b).
AR-CSD Bootstrap
2. Repeat steps 2 through 6 of Algorithm 1.
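Step 1 of Algorithm 2 was lost in extraction. The sketch below reconstructs it under stated assumptions: an AR(p) is fitted to each residual series by OLS, and the innovations are redrawn jointly as the covariance square root times i.i.d. N(0,1) vectors, so that the bootstrap errors inherit serial dependence (through the AR recursion) and cross-sectional dependence (through the estimated covariance).

```python
import numpy as np

def ar_csd_bootstrap_errors(resid, p, Sigma_sqrt, rng, burn=100):
    """One draw of AR-CSD bootstrap errors for one sub-panel (sketch).

    resid      : T x N residual matrix
    p          : autoregressive order
    Sigma_sqrt : N x N square root of a consistent (e.g., banded) estimator
                 of the innovation covariance matrix
    """
    T, N = resid.shape
    # Fit an AR(p) to each series by OLS.
    phi = np.empty((N, p))
    for i in range(N):
        X = np.column_stack([resid[p - l - 1:T - l - 1, i] for l in range(p)])
        phi[i] = np.linalg.lstsq(X, resid[p:, i], rcond=None)[0]
    # Build bootstrap errors recursively from cross-sectionally dependent
    # Gaussian innovations, discarding a burn-in period.
    estar = np.zeros((T + burn, N))
    for t in range(p, T + burn):
        vstar = Sigma_sqrt @ rng.standard_normal(N)
        estar[t] = sum(phi[:, l] * estar[t - l - 1] for l in range(p)) + vstar
    return estar[burn:]
```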
The wild bootstrap algorithm (Algorithm 1) is a special case of Algorithm 2 when we set for all i and . Another special case is the CSD bootstrap of Gonçalves and Perron (2020), which sets and lets denote the thresholding estimator based on the sample covariances of . Finally, a generalization of Algorithm 2 is the sieve bootstrap proposed by Koh (2024) in the context of MIDAS factor models. Although it would be interesting to extend the sieve bootstrap to our testing problem, we focus on a class of finite order AR models here in order to simplify the analysis.
The proof of the asymptotic validity of Algorithm 2 follows from Theorem 3.1 and Proposition 3.1 by verifying Conditions A*–F*. Since is both serially and cross-sectionally correlated, the verification of these bootstrap high-level conditions is much more involved than for the wild bootstrap and beyond the scope of this article. However, we evaluate by simulation the performance of both Algorithms 1 and 2 in the next section.
4 Simulations
For level experiments, we let . As in GP (2014), this common factor is generated independently over time from a standard normal distribution, . For power experiments, each group also has a specific factor, and . These two group-specific factors are generated independently over time from a bivariate normal distribution with unit variances and correlation . In all cases, the factor loadings are drawn independently from a standard normal distribution, , j = 1, 2.
The scalar β induces cross-sectional dependence in each group among the idiosyncratic innovations. This is similar to the design in Bai and Ng (2006). Note that we assume that Σv is a block diagonal matrix, so we do not consider dependence between the two groups. In Table 1 and Table 2, we report the parameter settings and sample sizes we consider, respectively.
Table 1. Parameter settings for the simulation designs.

| DGP | AR(1) coef. (group 1) | AR(1) coef. (group 2) | β |
|---|---|---|---|
| Design 1 (no serial & no cross-sectional dependence) | 0 | 0 | 0 |
| Design 2 (only serial dependence) | 0.5 | 0.3 | 0 |
| Design 3 (only cross-sectional dependence) | 0 | 0 | 0.5 |
| Design 4 (serial & cross-sectional dependence) | 0.5 | 0.3 | 0.5 |
Table 2. Sample sizes.

| N1 | N2 | T |
|---|---|---|
| 50 | 50 | 50 |
| 50 | 50 | 100 |
| 50 | 50 | 200 |
| 100 | 100 | 50 |
| 100 | 100 | 100 |
| 100 | 100 | 200 |
| 200 | 200 | 50 |
| 200 | 200 | 100 |
| 200 | 200 | 200 |
In Design 1, we assume that there is no serial correlation and no cross-sectional dependence and that the idiosyncratic errors are homoskedastic. The idiosyncratic error terms in Design 2 are serially correlated in each group where the AR(1) coefficient in group 1 is larger than the one in group 2. In the third design, we consider cross-sectional dependence without serial correlation in the idiosyncratic error term. Finally, in the last design, the idiosyncratic innovation terms are both serially and cross-sectionally correlated.
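For concreteness, here is a sketch of the data-generating process common to the four designs under the null of one common factor. Specifics that were not fully recoverable from the text are assumptions: the Toeplitz form beta**|i - k| for the within-group innovation covariance (patterned on the Bai and Ng (2006) design cited above) and the exact scaling of the components.

```python
import numpy as np

def simulate_group_panels(N1, N2, T, rho=(0.5, 0.3), beta=0.5, rng=None):
    """Simulate the two sub-panels for the level experiments (sketch)."""
    if rng is None:
        rng = np.random.default_rng()
    f = rng.standard_normal(T)            # common factor, i.i.d. N(0,1)
    panels = []
    for j, Nj in enumerate((N1, N2)):
        lam = rng.standard_normal(Nj)     # loadings, i.i.d. N(0,1)
        # Cross-sectionally dependent innovations within the group
        # (assumed Toeplitz covariance beta**|i - k|; beta = 0 gives i.i.d.).
        idx = np.arange(Nj)
        Sigma = beta ** np.abs(np.subtract.outer(idx, idx))
        v = rng.standard_normal((T, Nj)) @ np.linalg.cholesky(Sigma).T
        # AR(1) idiosyncratic errors with group-specific coefficient.
        e = np.zeros((T, Nj))
        for t in range(1, T):
            e[t] = rho[j] * e[t - 1] + v[t]
        panels.append(np.outer(f, lam) + e)
    return panels                         # [Y1 (T x N1), Y2 (T x N2)]
```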
We consider cross-sectional sample sizes $N_1 = N_2$ between 50 and 200 and T between 50 and 200. We simulate each design 5000 times, and the number of bootstrap replications is set to 399. We use the bootstrap algorithms proposed in Sections 2 and 3 with four different bootstrap methods: the wild bootstrap, the AR(1)-CSD bootstrap proposed earlier, and two variants of the latter: a parametric AR(1) bootstrap with no cross-sectional dependence and a CSD bootstrap with no serial dependence. The CSD and AR(1)-CSD bootstraps involve an estimator of the covariance matrix of the idiosyncratic errors. We rely on the banding estimator of Bickel and Levina (2008b), with the banding parameter k chosen by their cross-validation procedure. We focus on a 5% nominal level and report rejection rates for each design, bootstrap method, and sample size.
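The banding estimator itself is simple to state. The following sketch keeps sample covariances within a band of width k and zeroes out the rest, as in Bickel and Levina (2008b); the cross-validation choice of k used in our simulations is omitted here for brevity.

```python
import numpy as np

def banded_covariance(V, k):
    """Banding estimator: zero out sample covariances with |i - l| > k.

    V : T x N matrix of (mean-zero) residuals or innovations
    k : banding parameter (chosen by cross-validation in practice)
    """
    T, N = V.shape
    S = V.T @ V / T                       # sample covariance matrix
    idx = np.arange(N)
    return S * (np.abs(np.subtract.outer(idx, idx)) <= k)
```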
The simulation results for the level experiments are shown in Table 3. The row labeled “AGGR” reports results based on the asymptotic standard normal critical values. The other four rows contain the results for the bootstrap methods: WB for wild bootstrap and AR(1) for parametric AR(1) bootstrap method, CSD for the cross-sectional bootstrap, and AR-CSD for the bootstrap that combines the autoregressive and CSD bootstrap.
Table 3. Null rejection rates in percentage (nominal level 5%); N = N1 = N2.

| Design | Method | N=50, T=50 | N=50, T=100 | N=50, T=200 | N=100, T=50 | N=100, T=100 | N=100, T=200 | N=200, T=50 | N=200, T=100 | N=200, T=200 |
|---|---|---|---|---|---|---|---|---|---|---|
| Design 1 (i.i.d.) | AGGR | 6.5 | 4.9 | 3.3 | 7.4 | 6.2 | 5.0 | 8.3 | 7.2 | 6.2 |
| | WB | 5.3 | 4.9 | 4.3 | 5.8 | 5.7 | 5.1 | 6.7 | 6.4 | 5.6 |
| | AR(1) | 5.1 | 4.9 | 4.1 | 5.9 | 5.6 | 5.1 | 6.7 | 6.1 | 5.8 |
| | CSD | 6.1 | 5.9 | 5.6 | 6.0 | 5.7 | 5.6 | 6.7 | 6.2 | 6.0 |
| | AR(1)-CSD | 7.3 | 6.7 | 5.8 | 7.5 | 6.8 | 6.3 | 8.4 | 7.3 | 6.7 |
| Design 2 (AR) | AGGR | 14.3 | 10.0 | 7.7 | 15.2 | 12.4 | 9.8 | 17.7 | 13.8 | 10.8 |
| | WB | 9.8 | 8.7 | 8.0 | 10.4 | 9.8 | 8.9 | 12.5 | 10.7 | 9.3 |
| | AR(1) | 4.9 | 4.7 | 4.2 | 5.9 | 5.7 | 4.8 | 6.9 | 5.9 | 5.4 |
| | CSD | 11.9 | 12.5 | 14.7 | 11.5 | 11.5 | 11.4 | 13.0 | 11.0 | 10.1 |
| | AR(1)-CSD | 6.4 | 6.2 | 6.1 | 7.2 | 7.0 | 6.3 | 7.8 | 7.1 | 6.2 |
| Design 3 (CSD) | AGGR | 13.1 | 13.5 | 16.0 | 12.0 | 11.3 | 12.6 | 11.2 | 10.1 | 9.9 |
| | WB | 11.2 | 13.2 | 17.3 | 9.6 | 10.8 | 12.6 | 9.6 | 9.0 | 9.5 |
| | AR(1) | 11.6 | 13.1 | 17.2 | 10.1 | 10.4 | 12.5 | 9.5 | 9.1 | 9.8 |
| | CSD | 3.5 | 4.4 | 4.0 | 5.0 | 4.6 | 3.9 | 5.9 | 5.1 | 5.2 |
| | AR(1)-CSD | 6.5 | 6.1 | 4.7 | 7.7 | 6.1 | 5.2 | 8.7 | 6.8 | 6.1 |
| Design 4 (AR + CSD) | AGGR | 21.6 | 18.9 | 20.5 | 20.4 | 17.5 | 16.9 | 20.8 | 16.8 | 14.2 |
| | WB | 15.7 | 16.6 | 20.1 | 15.1 | 14.7 | 15.8 | 15.2 | 13.8 | 12.4 |
| | AR(1) | 9.6 | 11.6 | 15.0 | 8.6 | 9.4 | 10.6 | 8.7 | 8.2 | 8.2 |
| | CSD | 7.5 | 8.0 | 8.0 | 8.9 | 8.2 | 7.7 | 10.5 | 9.4 | 8.4 |
| | AR(1)-CSD | 5.4 | 5.3 | 5.1 | 6.5 | 5.8 | 4.9 | 7.1 | 6.1 | 5.7 |
Under the restrictive Design 1, where the assumptions of Theorem 2 of Andreou et al. (2019) are satisfied, the asymptotic theory performs reasonably well, although some distortions appear for the smaller values of T. For the other three designs, we find severe over-rejections for all sample sizes, as expected given that the statistic is computed assuming away autocorrelation and cross-sectional dependence.
Across all sample sizes and designs, bootstrap methods provide more reliable inference than standard normal inference. The bootstrap method that performs best is typically the one tailored to the properties of the DGP. For example, in Design 1, the wild bootstrap and the AR(1) bootstrap perform similarly, and they reject the null hypothesis at a rate close to 5%. To illustrate, for $N_1 = N_2 = 100$ and T = 50, the test rejects in 7.4% of the replications using the standard normal critical values. The rejection rates for the wild bootstrap and AR(1) bootstrap are 5.8% and 5.9%, respectively. On the other hand, the cross-sectional bootstrap and the combined AR(1) and CSD bootstrap reject in 6.0% and 7.5% of the replications. This higher rejection rate is the cost of using a more robust method than necessary.
As mentioned above, in Designs 2–4, the feasible statistic of Andreou et al. (2019) leads to large level distortions since it is not robust to serial correlation or cross-sectional dependence. Because there is serial dependence in the idiosyncratic error terms in Design 2, the wild bootstrap and CSD bootstrap are no longer valid, though they still improve on the use of the standard normal critical values. In this case, both the AR(1) and AR(1)-CSD bootstraps are valid and provide similar results, with a slight preference for the simple AR(1) bootstrap. To illustrate, with the same $N_1 = N_2 = 100$ and T = 50 as above, the standard normal critical values lead to a rejection rate of 15.2% for a 5% test. The (invalid) wild and CSD bootstraps have rejection rates of 10.4% and 11.5%, respectively. On the other hand, the (valid) AR(1) and AR(1)-CSD bootstraps have rejection rates of 5.9% and 7.2%.
In Designs 3 and 4, where we introduce cross-sectional dependence, neither the wild bootstrap nor the AR(1) bootstrap is valid, and they perform poorly, as expected. In the most general design with both serial and cross-sectional dependence, only the AR(1)-CSD bootstrap provides reliable results. While the asymptotic theory in the $N_1 = N_2 = 100$ and T = 50 case shows a rejection rate of 20.4%, the AR(1)-CSD bootstrap has a rejection rate of 6.5%, compared with 8.9% for the CSD bootstrap, 8.6% for the AR(1) bootstrap, and 15.1% for the simple wild bootstrap.
Our power results are presented in Table 4. These results must be interpreted with caution given the large level distortions documented in some cases. For the simple i.i.d. case (Design 1), where all tests have reasonable rejection rates under the null, we see that the bootstrap entails a small reduction in power relative to the AGGR test. The largest loss occurs for $N_1 = N_2 = 50$ and T = 50, where the AGGR test has a power of 65.2% while the wild bootstrap rejects in 61.5% of the cases. The gap between the two methods disappears as the sample size increases in both dimensions.
Table 4. Rejection rates in percentage under the alternative (power experiments, nominal level 5%); N = N1 = N2.

| Design | Method | N=50, T=50 | N=50, T=100 | N=50, T=200 | N=100, T=50 | N=100, T=100 | N=100, T=200 | N=200, T=50 | N=200, T=100 | N=200, T=200 |
|---|---|---|---|---|---|---|---|---|---|---|
| Design 1 (i.i.d.) | AGGR | 65.2 | 83.5 | 96.4 | 96.4 | 99.7 | 100.0 | 100.0 | 100.0 | 100.0 |
| | WB | 61.5 | 83.0 | 97.2 | 95.5 | 99.7 | 100.0 | 100.0 | 100.0 | 100.0 |
| | AR(1) | 60.8 | 83.4 | 97.4 | 95.2 | 99.6 | 100.0 | 99.9 | 100.0 | 100.0 |
| | CSD | 58.9 | 79.6 | 92.9 | 94.9 | 99.6 | 100.0 | 100.0 | 100.0 | 100.0 |
| | AR(1)-CSD | 62.0 | 81.0 | 93.7 | 95.6 | 99.7 | 100.0 | 100.0 | 100.0 | 100.0 |
| Design 2 (AR) | AGGR | 70.1 | 84.9 | 96.0 | 96.7 | 99.8 | 100.0 | 100.0 | 100.0 | 100.0 |
| | WB | 61.3 | 82.3 | 96.0 | 95.1 | 99.6 | 100.0 | 99.9 | 100.0 | 100.0 |
| | AR(1) | 48.9 | 74.2 | 93.3 | 90.1 | 99.3 | 100.0 | 99.8 | 100.0 | 100.0 |
| | CSD | 61.5 | 79.9 | 92.7 | 94.3 | 99.5 | 100.0 | 99.9 | 100.0 | 100.0 |
| | AR(1)-CSD | 50.0 | 71.6 | 87.5 | 90.3 | 99.2 | 100.0 | 99.9 | 100.0 | 100.0 |
| Design 3 (CSD) | AGGR | 68.4 | 84.3 | 94.4 | 96.3 | 99.6 | 100.0 | 100.0 | 100.0 | 100.0 |
| | WB | 64.7 | 83.9 | 95.0 | 95.2 | 99.5 | 100.0 | 100.0 | 100.0 | 100.0 |
| | AR(1) | 63.9 | 83.4 | 95.1 | 95.1 | 99.5 | 100.0 | 100.0 | 100.0 | 100.0 |
| | CSD | 46.1 | 66.3 | 83.1 | 91.7 | 99.0 | 99.9 | 99.9 | 100.0 | 100.0 |
| | AR(1)-CSD | 51.3 | 69.1 | 84.6 | 92.9 | 99.1 | 100.0 | 100.0 | 100.0 | 100.0 |
| Design 4 (AR + CSD) | AGGR | 73.3 | 85.0 | 94.3 | 96.9 | 99.7 | 100.0 | 100.0 | 100.0 | 100.0 |
| | WB | 65.4 | 82.6 | 94.4 | 94.9 | 99.5 | 100.0 | 100.0 | 100.0 | 100.0 |
| | AR(1) | 53.5 | 75.1 | 91.7 | 90.5 | 99.1 | 100.0 | 99.9 | 100.0 | 100.0 |
| | CSD | 48.8 | 66.5 | 83.8 | 91.3 | 99.0 | 100.0 | 99.9 | 100.0 | 100.0 |
| | AR(1)-CSD | 40.0 | 57.2 | 75.4 | 85.8 | 98.0 | 99.0 | 99.8 | 100.0 | 100.0 |
It is interesting to note that power increases faster in the cross-sectional than in the time series dimension. Going from N = 50 to N = 100 for given T has more impact on power than going from T = 50 to T = 100 for given N. This is consistent with the different rates of convergence of the statistic in the two dimensions.
Finally, we see that more complex idiosyncratic dependencies lead to a reduction in power for bootstrap methods that control level. Nevertheless, power approaches one rather quickly.
Overall, our results suggest that, except for the simple case with no serial or cross-sectional dependence and large sample sizes, the use of standard normal critical values leads to large level distortions. On the other hand, a bootstrap method adapted to the properties of the idiosyncratic terms delivers rejection rates close to the nominal level, while a misspecified bootstrap still improves matters noticeably. The use of more robust bootstrap methods has a small cost in terms of power.
5 Conclusions
In this article, we have proposed the bootstrap as a method of inference on the number of common factors in two groups of data. We propose, and theoretically justify under weak conditions, a simple bootstrap test that avoids the need to estimate the bias and variance of the canonical correlations explicitly. We have verified these conditions for the wild bootstrap under assumptions similar to those in AGGR (2019). However, other approaches tailored to more general DGPs are possible. Our simulation experiments show that the bootstrap leads to rejection rates closer to the nominal level than the asymptotic framework of AGGR (2019) in all of the designs we considered.
Funding
Funding support for this article was provided to Gonçalves and Perron by the Social Sciences and Humanities Research Council (SSHRC, grants 435-2023-0352 and 435-2020-1349, respectively) and by the Fonds de recherche du Québec—société et culture (FRQSC, grant 2020-SE2-269954).
This article is based on the Halbert White Memorial Lecture given by Sílvia Gonçalves at the Society for Financial Econometrics Conference on June 15th, 2024, in Rio de Janeiro, Brazil. We thank the JFEC Editors Allan Timmermann and Fabio Trojani for the invitation, and the discussants at the conference, Michael Wolf and Eric Ghysels, for their insightful comments. We also thank participants at the following conferences: Society for Financial Econometrics (2023), North American Summer Meetings of the Econometric Society (2023), Advances in Econometrics conference in honor of Joon Park (2023), International Association for Applied Econometrics (2023), Canadian Economics Association (2023), Société canadienne de science économique (2024), CIREQ Econometrics conference in honor of Eric Ghysels (2024), and International Symposium on Non-Parametric Statistics (2024).
Notes
For simplicity, we focus on here. Our results also apply to a test statistic based on the alternative estimator defined in AGGR (2019) (this is the sample analogue of in our notation).
Although we denote the bootstrap p-value by , we should note it is not random with respect to the bootstrap measure . A similar notation is used below to denote the bootstrap bias and bootstrap variance of the bootstrap test statistic . This choice of notation allows us to differentiate bootstrap population quantities from other potential estimators that do not rely on the bootstrap.
Under our Assumption 1, the asymptotic expansions of the test statistic (and of its bootstrap analogue) used to derive the limiting distributions need to have remainders of order , with , whereas AGGR (2019) need to obtain expansions up to order .
In contrast, AGGR (2019) rely on an asymptotic expansion up to order because they require rather than (see their Proposition 3).
AGGR (2019) provide conditions under which this high-level condition holds. See in particular their Assumptions A.5 and A.6, which are used to show that is a Near Epoch Dependent (NED) process. Since our contribution is proving the bootstrap validity in this context, we do not provide these more primitive conditions. They are not required to prove our bootstrap theory.
For example, we could use the principal components estimators and when generating . To distinguish these estimators from their restricted versions, we denote the latter by and .
Although is defined as a function of and does not depend on resampled data, we use this notation to indicate that it is the bootstrap analogue of .
We could allow for richer dynamics by assuming a sparse VAR model for the idiosyncratic error vector , as in Kock and Callot (2015), Krampe and Paparoditis (2021), and Krampe and Margaritella (2021). Under sparsity, we would estimate by a regularized OLS estimator such as LASSO rather than OLS. The remaining steps of our bootstrap method would remain the same.
This means that it contains terms of order and a remainder of order .
This means that it contains terms of order and a remainder of order . Instead, AGGR (2019) need to obtain higher order expansions with remainders of order because they replace our assumption with .
Note that is the bootstrap analogue of defined in eq. (B.3) of AGGR (2019). Although we keep the star notation when defining , we note that is not random when we condition on the original sample. We adopt this notation to be consistent with notation in AGGR (2019).
We can replace and with their expansions based on alternative group common factors such that , where is the matrix collecting the eigenvectors of associated with the kc largest eigenvalues. This yields a similar expansion such that .
Appendix
Appendix A: Asymptotic theory
This appendix is organized as follows. In Appendix A.1, we provide a set of primitive assumptions under which we derive the asymptotic distribution of . Appendix A.2 contains auxiliary lemmas used to derive this limiting distribution. Appendix A.3 provides a proof of the results in Section 1.4. When describing our assumptions below, it is convenient to collect the vectors and into a vector , whose dimension is .
A.1. Primitive assumptions
We let such that and where and .
- and such that where ΣG is a non-random positive definite matrix defined as
For each j = 1, 2, the factor loadings matrix is deterministic such that and has distinct eigenvalues.
For each
for any
for all (t, s) and for all (i, l) such that , and .
for every (t, s).
For each
, where for all (i, t).
For each s, .
.
.
A key step in deriving the asymptotic distribution of the AGGR (2019) test statistic (and of its bootstrap analogue) under our Assumption 1 is to obtain an asymptotic expansion of the factors estimation uncertainty (as characterized by for up to order9 ). See Lemma A.5 in Appendix A.2. As it turns out, Assumptions 1–4 are not sufficient to ensure this fast rate of convergence. For this reason, we strengthen Assumptions 1–4 as follows:
For each t and j = 1, 2, , where and .
For any j, k, .
For any .
For any j, k, , where .
For any j, k, .
For any j, k, , where .
For any j, k, .
Assumption 5(a) is a strengthening of Assumption 3(b) and corresponds to Assumption E.1 of Bai (2003). A similar assumption has been used by AGGR (2019). See in particular their Assumption A.7(c) on . As explained by Bai (2003), this assumption is satisfied when we rule out serial dependence, implying that for . In this case, Assumption 5(a) is equivalent to requiring that . More generally, this condition holds whenever for each panel j and each series i, the autocovariance function of is absolutely summable (thus covering all finite order stationary ARMA models).
To interpret Assumptions 5(b) and 5(c), let and . With this notation, we can rewrite part (b) as and part (c) as . The latter condition holds if for all j, k, s, which follows if part (a) holds and if we assume that for all k, t. To see this, note that , which is bounded by by the Cauchy–Schwarz inequality. If for all k, t, we can use Assumption 5(a) to verify Assumption 5(c). The assumption that for all k, t is a strengthening of Assumption 4(d), and the two are equivalent if we assume stationarity of . Hence, Assumption 5(c) holds under general serial and cross-sectional dependence in the idiosyncratic error terms.
A sufficient condition for Assumption 5(b) is that . We can show that this condition is implied by Assumptions 3(b) and 5(a) if we assume that and are mutually independent. We can verify Assumptions 5(d) and 5(e) by showing that and , where and , which holds for instance if is i.i.d. with and for j = 1, 2. Similarly, we can show that Assumptions 5(d) and 5(e) are verified under similar conditions on .
Our next assumption is a high-level condition that allows us to obtain the asymptotic normal distribution for the AGGR test statistic.
is such that .
, where and is a vector containing the first kc rows of .
Assumption 6(a) strengthens Assumption 2(a) by requiring that converges to at rate . This assumption is implied by standard mixing conditions on the factors via a maximal inequality for mixing processes and has been used in this literature. See, for example, Gonçalves, McCracken, and Perron (2017). AGGR (2019) assume the factors to be mixing, which explains why they do not explicitly write this assumption. It is used to omit from the term that appears in the asymptotic expansion of the test statistic. Assumption 6(b) is a high-level condition that requires the time series process to satisfy a CLT. AGGR (2019) provide conditions under which this high-level condition holds. See in particular their Assumptions A.5 and A.6, which are used to show that is a NED process. Since our contribution is proving bootstrap validity in this context, we do not provide these more primitive conditions. They are not required to prove our bootstrap theory.
A.2. Asymptotic expansion of the sample covariance of the factors estimation error
The main goal of this section is to provide an asymptotic expansion of for up to order , which is then used to characterize the bias term. See Lemma A.5 in Appendix A.3.
The following auxiliary lemma is used to prove Lemma A.2.
Note that to obtain this last bound, we impose Assumption 5(c), which is a strengthening of Assumption 3.
implying that
A.3. Proof of Theorem 2.1
Following AGGR (2019), we define , where . The test statistic is given by , where is a diagonal matrix containing the kc largest eigenvalues of obtained from the eigenvalue-eigenvector problem , where is the eigenvector matrix. The main idea of the proof is to obtain an expansion of through order10 , where , from which we obtain an asymptotic expansion of and of .
We can show that under Assumptions 1–4 (see Lemma A.3(a)). Using this result, we can show that , where , where . The following auxiliary lemma states this result and characterizes the term of order under Assumptions 1–4. Note that for this result we do not need Assumptions 5 and 6. Nor do we need to impose the null hypothesis of kc common factors between the two panels.
Lemma A.3(a) is the analogue of Lemma B.1 of AGGR (2019). In contrast to AGGR (2019), we rely on Bai's (2003) asymptotic expansion for , which explains why our set of assumptions differs from theirs. Lemma A.3(b) is the analogue of Lemma B.2 of AGGR (2019) under our Assumptions 1–4. Note that the order of magnitude of the remainder term follows from expressing as a function of the inverse matrices of and then using the expansion to obtain , given that . Instead, AGGR (2019) use a second-order expansion to obtain their equation (B.5). They require a higher-order asymptotic expansion than ours because their rate conditions on N and T are weaker than those we assume under Assumption 1.
The next step is to obtain an asymptotic expansion of the kc largest eigenvalues of when the two panels share kc common factors, that is, when for j = 1, 2 (hence, when the null hypothesis of kc common factors is true). We summarize these results in the following lemma.
Lemma A.4 gives the asymptotic expansion of through order under the null hypothesis that there are kc factors common to the two groups. This result is a simplified version of equation (B.13) of AGGR (2019) since it only contains terms of order (their expansion contains terms of order ).
Next, we can use Lemma A.2 to expand up to a remainder of order . We can then obtain the following result using the definition of given above.
The asymptotic distribution of the test statistic given in Theorem 2.1 follows from the previous lemmas by adding Assumption 6 to Assumptions 1–5.
Part (a): This follows from Lemma A.2 of GP (2014) and the fact that the rotation matrices are . Assumptions 1–4 are sufficient to apply this result.
The result follows by replacing with the asymptotic expansion given in Lemma A.2. ▪
The proof of this result follows from Lemmas A.3, A.4, and A.5 under Assumptions 1–6, when the null hypothesis is true. ▪
Appendix B: Bootstrap results
We organize this appendix as follows. In Appendix B.1, we provide a set of bootstrap high-level conditions, which are the bootstrap analogues of Assumptions 3, 4, and 5. These conditions are used to prove two auxiliary lemmas in Appendix B.2. Appendix B.3 provides the proofs of the results in Section 3.
B.1. Bootstrap high-level conditions
Here, we propose a set of high-level conditions on under which we can characterize the asymptotic distribution of the bootstrap test statistic . These conditions can be verified for any resampling scheme.
Condition A*
, for all i, t.
where .
Condition B*
.
.
Condition C*
.
, where .
.
, where .
.
Conditions A* and B* are used in GP (2014) and Gonçalves and Perron (2020) and have been verified for the wild bootstrap and the CSD bootstrap, respectively, when and are the PCA estimators. Here, they are obtained as in AGGR (2019) under the null. Condition C* is new to the group factor model and needs to be verified.
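For concreteness, a minimal sketch of one wild bootstrap draw in the spirit of GP (2014), under which Conditions A* and B* have been verified, is given below; F_hat, Lam_hat, and resid denote the factor estimates, loadings, and residuals from a preliminary PCA step, and the variable names are ours, not the paper's.

```python
import numpy as np

def wild_bootstrap_panel(F_hat, Lam_hat, resid, rng):
    """One wild bootstrap panel: X* = F_hat Lam_hat' + eta .* resid."""
    T, N = resid.shape
    eta = rng.standard_normal((T, N))   # external i.i.d. (0, 1) draws
    e_star = eta * resid                # breaks serial and cross-sectional
                                        # dependence, keeps the (t, i) scale
    return F_hat @ Lam_hat.T + e_star   # bootstrap panel, T x N
```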
B.2. Asymptotic expansion of the sample covariance of the bootstrap factor estimation error
First, note that under Conditions A* and B*, which are all from GP (2014).
The following auxiliary lemmas are the bootstrap analogues of Lemmas A.1 and A.2.
This follows immediately from Lemma B.1. ▪
B.3. Proof of bootstrap results in Section 3
This section is organized as follows. First, we state several auxiliary lemmas used to prove Lemma 3.1 and Theorem 3.1, followed by their proofs. Then, we prove Lemma 3.1, Theorem 3.1, and Proposition 3.1.
Following AGGR (2019), we define , where . The test statistic is given by , where is a diagonal matrix containing the kc largest eigenvalues of obtained from the eigenvalue–eigenvector problem , where is the matrix of eigenvectors associated with the kc largest eigenvalues. The main idea of the proof is to obtain an expansion of through order , where , from which we obtain an asymptotic expansion of and of .
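Schematically, and reusing the hypothetical helpers sketched earlier, one bootstrap replication recomputes the statistic from re-estimated group factors. The callable stat stands in for the paper's centered and scaled statistic, and this sketch glosses over how the null of kc common factors is imposed in the bootstrap DGP (handled by Algorithm 1 in the main text).

```python
import numpy as np

def bootstrap_distribution(X1, X2, k1, k2, kc, B, stat, rng):
    """Draws of the bootstrap statistic from B wild bootstrap panels."""
    T = X1.shape[0]
    F1, F2 = pca_factors(X1, k1), pca_factors(X2, k2)
    L1, L2 = X1.T @ F1 / T, X2.T @ F2 / T        # PCA loadings
    r1, r2 = X1 - F1 @ L1.T, X2 - F2 @ L2.T      # idiosyncratic residuals
    draws = np.empty(B)
    for b in range(B):
        rho2 = squared_canonical_correlations(
            pca_factors(wild_bootstrap_panel(F1, L1, r1, rng), k1),
            pca_factors(wild_bootstrap_panel(F2, L2, r2, rng), k2), kc)
        draws[b] = stat(rho2)
    return draws   # compare the sample statistic with these draws' quantiles
```

The bootstrap p-value is then, for instance, the fraction of draws at least as extreme as the sample statistic.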
The following auxiliary lemma provides the asymptotic expansion of through order .
Lemma B.3 is the bootstrap analogue of Lemma B.2 of AGGR (2019) when the rate conditions on N and T are as assumed in Assumption 1. Note that under this assumption, we only require an asymptotic expansion through order , which means its remainder is of order .
Lemma B.3 only requires Conditions A* and B*. Condition C* is not used here. Note that and , which explains the differences between the asymptotic expansions of and (in particular, we do not need to pre-multiply by ).
Since the bootstrap test statistic is defined as , where contains the first kc eigenvalues of , our next result provides an asymptotic expansion of , from which we obtain an asymptotic expansion of .
Suppose Conditions A* and B* hold. Under Assumption 1,
where is the upper-left block of the matrix defined in Lemma B.3 and is a matrix.
Lemma B.4 is the bootstrap analogue of Lemma B.4 of AGGR (2019) when N and T satisfy the rate conditions of Assumption 1. In contrast to Lemma B.4 in AGGR (2019), which holds only under the null hypothesis, our Lemma B.4 holds under both the null and the alternative hypotheses.
Next, we provide an asymptotic expansion of through order (i.e., with remainder of order ). This expansion is based on the asymptotic expansion of given in Lemma B.2. This result is in Appendix B.2, and it requires strengthening Conditions A* and B* with Condition C*. We can then obtain the following result using the definition of given in Lemma B.3.
Recall that , where denotes the vector containing the first kc rows of .
Suppose Conditions A*, B*, and C* hold and assume that Assumption 1 is verified with . Defining , we have that
The desired result follows by Lemma B.2, noting that , where (without loss of generality), which implies the definition of . ▪
This follows from Lemmas B.3, B.4, and B.5 under Conditions A*–C*. ▪
Since under the null hypothesis, the random variable inside can be written as , implying that . ▪
Appendix C: Proof of wild bootstrap results in Section 3.1
In this appendix, we first provide three auxiliary lemmas, followed by their proofs. Then, we prove Theorem 3.1 and Proposition 3.1.
Suppose Assumptions 1–4 hold. If either (1) and are mutually independent and for some and , or (2) for some and , it follows that
, and ;
and ;
,
where and is the diagonal matrix containing the largest eigenvalues of on the main diagonal in descending order.
Assume that Assumptions 1–6, strengthened by Assumptions WB1 and WB2, hold. Then Lemma 3.1 follows for Algorithm 1.
In Lemma C.2, we verify that the bootstrap method generated by Algorithm 1 satisfies Conditions A* through C*. To verify these conditions, we use Lemma C.1, which is valid under and . Therefore, Lemma C.2 is satisfied regardless of whether or is true.
In the following Lemma C.3, we obtain the uniform expansions of the group common factors, factor loadings, group-specific factors, and group-specific factor loadings up to order under to verify Condition D*. Note that Lemma C.3 is valid only under .
Assume that Assumptions 1–5 hold and is true. Then, for j = 1, 2, we have the following:
;
;
;
,
where and and is defined in Lemma C.1.
Similarly, we can show that (b) and (d) are by replacing with its expansion up to order . For example, ignoring , the term (d) is by Assumption 4(c). Using the proof of Lemma A.1(e), we can show that and that . Our asymptotic expansions for and are equivalent to those in AGGR (2019) (specifically, (C.92) and (C.94) in their Online Appendix).
Then, by following the arguments in AGGR (2019) (specifically, the arguments on page 56 of their Online Appendix), we can show that , and this completes the proof of part (iii).
Since we can show that , this expansion is equivalent to the expansion of in AGGR (2019) (i.e., equation (C.95) in their Online Appendix). ▪
We first verify Theorem 3.1 and then Proposition 3.1. Given Lemma C.2, it suffices to show that the wild bootstrap in Algorithm 1 satisfies Condition D* and Condition E*.
By applying arguments similar to those in Gonçalves and Perron (2020) (proof of their Lemma B.2), we can show that by Lemma C.1 and Assumption WB3(b). The proof for (i-b) is similar.
Since , we have four cases to consider. If , we have , since . In the second case, where , we have . The third case is and ; in this case, we can bound the term as , since . Finally, when and , we can bound the term as , where we use .
We can show that all the terms are under Assumptions 2 and 3.