Abstract

Andreou et al. (2019) have proposed a test for common factors based on canonical correlations between factors estimated separately from each group. We propose a simple bootstrap test that avoids the need to estimate the bias and variance of the canonical correlations explicitly and provide high-level conditions for its validity. We verify these conditions for a wild bootstrap scheme similar to the one proposed in Gonçalves and Perron (2014). Simulation experiments show that this bootstrap approach leads to null rejection rates closer to the nominal level in all of our designs compared to the asymptotic framework.

Factor models have been used extensively in past decades to reduce the dimension of large data sets. They are now widely used in forecasting, as controls in regressions, and as a tool to model cross-sectional dependence.

Andreou et al. (2019) have proposed a test of whether two groups of data contain common factors. The test consists of estimating a set of factors for each subgroup using principal components and testing whether some canonical correlations between these two groups of estimated factors equal 1, as they would if some factors were common to both groups of data. Inference in this situation is complicated by the need to account for the preliminary estimation of the factors. The asymptotic theory in Andreou et al. (2019) is highly nonstandard, featuring unusual rates of convergence and an asymptotic bias. Under restrictive assumptions, they propose an estimator for this bias and construct a feasible statistic. However, their simulation results suggest that, even under these restrictive assumptions, their statistic can exhibit level distortions.

This approach was applied in Andreou et al. (2022) to sets of returns on individual stocks and on portfolios. In principle, these two sets of returns should share a common set of factors that represent the stochastic discount factor. The authors find a set of three common factors that price both individual stocks and sorted portfolios. They also find that 10 principal components from the large number of factors proposed in the literature to price stocks (the factor zoo) are needed to span the space of these three common factors.

This article proposes the bootstrap as an alternative inference method. Our main contribution is to propose a simple bootstrap test that avoids the need to estimate the bias and variance of the canonical correlations explicitly. We show its validity under a set of high-level conditions that allow for weak dependence in the data-generating process (DGP). The specific bootstrap scheme that is used depends on the assumptions a researcher is willing to make about this dependence.

For example, if a researcher is willing to assume that the idiosyncratic terms exhibit neither cross-sectional nor serial correlation, we show that a wild bootstrap is valid in this context. This is analogous to the results in Gonçalves and Perron (2014), henceforth GP (2014), who showed the validity of a wild bootstrap in the context of factor-augmented regression models. If the presence of cross-sectional dependence is important, a researcher could instead use the cross-sectionally dependent (CSD) bootstrap of Gonçalves and Perron (2020). If serial correlation in the idiosyncratic errors is relevant, Koh (2022) proposed an autoregressive sieve bootstrap for factor models. Finally, we also discuss an extension of this method that allows for both cross-sectional and serial dependence in the idiosyncratic errors.

The bootstrap has recently been applied in Andreou et al. (2024) to test for the number of common factors. Contrary to our framework, which follows Andreou et al. (2019), one set of factors is assumed to be observed there, implying that their bootstrap method is different from ours.

The remainder of the article is organized as follows. Section 1 describes the model and the testing problem in Andreou et al. (2019). Section 2 introduces a general bootstrap scheme in this context and provides a set of high-level conditions under which the bootstrap test is asymptotically valid under the null hypothesis. We also provide a set of sufficient conditions that ensure the bootstrap test is consistent under the alternative hypothesis. Section 3 verifies these conditions for the wild bootstrap method of GP (2014) under a set of assumptions similar to those in Andreou et al. (2019). Section 4 provides simulation results, and Section 5 concludes. We provide three appendices. Appendix A contains a set of assumptions under which we derive the limiting distribution of the original test statistic, as well as auxiliary lemmas used to derive this asymptotic distribution. Appendix B contains a set of bootstrap high-level conditions that mirror the primitive assumptions in Appendix A. It also provides the bootstrap analogs of the auxiliary lemmas introduced in Appendix A, which are used to prove the bootstrap results in Section 2. Finally, Appendix C contains the proofs of the bootstrap results for the wild bootstrap method proposed in Section 3.

A final word on notation. For a bootstrap sequence, say $X^*_{N,T}$, we use $X^*_{N,T}\to_{p^*}0$, or, equivalently, $X^*_{N,T}\to_p0$, as $N,T\to\infty$, in probability, to mean that, for any $\epsilon>0$, $P^*\left(|X^*_{N,T}|>\epsilon\right)\to_p0$, where $P^*$ denotes the probability measure conditional on the original data. An equivalent notation is $X^*_{N,T}=o_{p^*}(1)$ (where we omit the qualification "in probability" for brevity). We also write $X^*_{N,T}=O_{p^*}(1)$ to mean that $P^*\left(|X^*_{N,T}|>M\right)\to_p0$ for some large enough $M$. Finally, we write $X^*_{N,T}\to_{d^*}X$ or, equivalently, $X^*_{N,T}\to_dX$, in probability, to mean that, for all continuity points $x\in\mathbb{R}$ of the cdf of $X$, say $F(x)\equiv P(X\le x)$, we have that $P^*\left(X^*_{N,T}\le x\right)-F(x)\to_p0$.

1 Framework

1.1 The Group Panel Factor Model

Following Andreou et al. (2019) (henceforth AGGR 2019), we consider a group factor panel model with two groups, both driven by a set of common factors and a set of specific factors:
$$y_{jt}=\Lambda^c_jf^c_t+\Lambda^s_jf^s_{jt}+\varepsilon_{jt},\quad j=1,2.$$
Here, $y_{1t}$ and $y_{2t}$ are $N_1\times1$ and $N_2\times1$ vectors, respectively. In particular, $y_{jt}=\left(y_{j,1t},\dots,y_{j,N_jt}\right)'$ collects the $N_j$ observations in group $j$ at time $t$. We use a similar notation to denote $\varepsilon_{jt}$. The $k^c\times1$ vector $f^c_t$ denotes the common factors, whereas $f^s_{jt}$ is a $k^s_j\times1$ vector which contains the factors specific to group $j$. The matrices $\Lambda^c_j$ and $\Lambda^s_j$ contain the factor loadings associated with the common factors and the group-specific factors for group $j$, respectively. Thus, $\Lambda^c_j$ is $N_j\times k^c$ and $\Lambda^s_j$ is $N_j\times k^s_j$. We let $k_j\equiv k^c+k^s_j$ denote the total number of factors in each sub-panel and define $k\equiv\min(k_1,k_2)$. Finally, we let $f_{jt}=\left((f^c_t)',(f^s_{jt})'\right)'$ denote the $k_j\times1$ vector that collects all factors in each group. Their variance–covariance matrices are $V_{jl}\equiv E\left(f_{jt}f_{lt}'\right)$, $j,l=1,2$. As in AGGR (2019), we assume that the common and group-specific factors have mean zero, variance–covariance matrix equal to the identity matrix, and that they are orthogonal within each group:
$$E\left(f_{jt}f_{jt}'\right)=\begin{pmatrix}I_{k^c}&0\\0&I_{k^s_j}\end{pmatrix}=I_{k_j},\quad j=1,2.$$

However, we allow for the possibility that $f^s_{1t}$ and $f^s_{2t}$ are correlated, with covariance matrix $\Phi$.

1.2 The Testing Problem

AGGR (2019) propose an inference procedure for determining the number of common factors $k^c$. Their procedure is based on the fact that the canonical correlations between the two sets of factors $f_{1t}$ and $f_{2t}$ identify the common factor space. Specifically, let $\rho_1\ge\rho_2\ge\cdots\ge\rho_k$ denote the ordered canonical correlations between $f_{1t}$ and $f_{2t}$. The squared canonical correlations $\rho^2_l$ for $l=1,\dots,k$ are defined as the $k$ largest eigenvalues of the matrix $R=V_{11}^{-1}V_{12}V_{22}^{-1}V_{21}$ (or, equivalently, of $\acute{R}=V_{22}^{-1}V_{21}V_{11}^{-1}V_{12}$). Proposition 1 of AGGR (2019) shows that if $k^c>0$, the largest $k^c$ canonical correlations are equal to 1 and the remaining $k-k^c$ are strictly less than 1. This corresponds to the null hypothesis that there are $k^c$ common factors, that is,
$$H_0:\ \rho_1=\cdots=\rho_{k^c}=1\ \text{ and }\ \rho_l<1\ \text{ for }l=k^c+1,\dots,k.$$
To test $H_0$, AGGR (2019) propose a test statistic based on
$$\hat{\xi}(k^c)\equiv\sum_{l=1}^{k^c}\hat{\rho}_l,$$
where $\hat{\rho}_l$ is the sample analogue of $\rho_l$ (we define these estimators below). Under the null, $\hat{\xi}(k^c)$ should be close to $k^c$, whereas it should be less than $k^c$ under the alternative hypothesis $H_1$. Here, $H_1$ is defined as
$$H_1:\ \rho_1=\cdots=\rho_r=1\ \text{ and }\ \rho_l<1\ \text{ for }l=r+1,\dots,k,\ \text{ for some }0\le r<k^c,$$
with the understanding that if $r=0$, $\rho_l<1$ for all $l=1,\dots,k$, corresponding to the absence of common factors. Thus, we reject the null when $\hat{\xi}(k^c)-k^c$ is negative and large in absolute value.

The critical value used in AGGR (2019) is obtained from the asymptotic distribution of the test statistic when $N_1$, $N_2$, and $T\to\infty$. Our goal in this article is to propose an alternative method of inference based on the bootstrap.

1.3 Canonical Correlations, Common and Group-Specific Factors, and Their Loadings

Here, we define the estimators $\hat{\rho}_l$, $l=1,\dots,k$. In the process of doing so, we also define the estimators of the common and group-specific factors and factor loadings. These will be used to form our bootstrap DGP.

We start by extracting the first $k_j$ principal components for each group $j$, with $j=1,2$. In particular, let $Y_j$ denote the observed data matrix of size $T\times N_j$ for group $j$. The factor model for this group can be written as
$$Y_j=F_j\Lambda_j'+\varepsilon_j,\tag{1}$$
where $\varepsilon_j$ is $T\times N_j$, $F_j=(f_{j1},\dots,f_{jT})'$ is $T\times k_j$, and $\Lambda_j$ is $N_j\times k_j$.

Given $Y_j$, we estimate $F_j$ and $\Lambda_j$ with the standard method of principal components. In particular, $F_j$ is estimated with the $T\times k_j$ matrix $\hat{F}_j=(\hat{f}_{j1},\dots,\hat{f}_{jT})'$ composed of $\sqrt{T}$ times the eigenvectors corresponding to the $k_j$ largest eigenvalues of $Y_jY_j'/TN_j$ (arranged in decreasing order), where the normalization $\hat{F}_j'\hat{F}_j/T=I_{k_j}$ is used. The factor loading matrix is $\hat{\Lambda}_j=Y_j'\hat{F}_j/T$.
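For concreteness, the following minimal Python sketch implements this principal components step (the function name and interface are ours, purely for illustration); it is reused in the sketches below.

```python
import numpy as np

def pca_factors(Y, k):
    """Principal components estimator of Y = F Lambda' + e.

    Y is T x N; returns (F_hat, Lambda_hat), where F_hat is sqrt(T) times
    the eigenvectors of Y Y' / (T N) for the k largest eigenvalues, so
    that F_hat' F_hat / T = I_k, and Lambda_hat = Y' F_hat / T.
    """
    T, N = Y.shape
    eigval, eigvec = np.linalg.eigh(Y @ Y.T / (T * N))
    order = np.argsort(eigval)[::-1][:k]      # indices of k largest eigenvalues
    F_hat = np.sqrt(T) * eigvec[:, order]     # T x k estimated factors
    Lambda_hat = Y.T @ F_hat / T              # N x k estimated loadings
    return F_hat, Lambda_hat
```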

The estimators $\hat{\rho}_l$, $l=1,\dots,k$, are obtained from the eigenvalues of the sample analogue of $R$. Specifically, letting
$$\hat{V}_{jk}=\hat{F}_j'\hat{F}_k/T=T^{-1}\sum_{t=1}^T\hat{f}_{jt}\hat{f}_{kt}',$$
we can define (see note 1)
$$\hat{R}=\hat{V}_{11}^{-1}\hat{V}_{12}\hat{V}_{22}^{-1}\hat{V}_{21}.$$

The $k^c$ largest eigenvalues of $\hat{R}$ are denoted by $\hat{\rho}^2_l$, $l=1,\dots,k^c$. They correspond to the $k^c$ largest sample squared canonical correlations between $\hat{f}_{1t}$ and $\hat{f}_{2t}$.
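Continuing the sketch above, the sample canonical correlations can be computed as follows (the returned matrix `W_hat` collects the canonical directions used below to estimate the common factors):

```python
def canonical_correlations(F1, F2, kc):
    """Sample squared canonical correlations between two sets of estimated
    factors (each T x k_j, normalized so that F' F / T = I)."""
    T = F1.shape[0]
    V11, V12 = F1.T @ F1 / T, F1.T @ F2 / T
    V22, V21 = F2.T @ F2 / T, F2.T @ F1 / T
    R_hat = np.linalg.solve(V11, V12) @ np.linalg.solve(V22, V21)
    eigval, eigvec = np.linalg.eig(R_hat)     # R_hat need not be symmetric
    order = np.argsort(eigval.real)[::-1][:kc]
    rho2 = eigval.real[order]                 # kc largest squared can. corr.
    W_hat = eigvec.real[:, order]             # canonical directions, k1 x kc
    W_hat /= np.linalg.norm(W_hat, axis=0)    # normalize to unit length
    return rho2, W_hat
```

The test statistic of Section 1.4 is then obtained as `np.sqrt(np.clip(rho2, 0.0, None)).sum()`.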

For our bootstrap DGP (to be defined in the next section), we also need estimates of the common and group-specific factors and loadings. These estimates are also used to obtain the test statistic proposed by AGGR (2019). Hence, we describe them next.

First, using Definition 1 of AGGR (2019), we can estimate the common factors as follows. Let the $k^c$ associated eigenvectors of $\hat{R}$ (the canonical directions) be collected in the $k_1\times k^c$ matrix $\hat{W}$, normalized to have length one, so that $\hat{W}'\hat{V}_{11}\hat{W}=\hat{W}'\hat{W}=I_{k^c}$ since $\hat{V}_{11}=I_{k_1}$. Given $\hat{W}$, an estimator of the common factors $f^c_t$ is $\hat{f}^c_t=\hat{W}'\hat{f}_{1t}$.

The group-specific factors $f^s_{jt}$ ($j=1,2$) can then be estimated using Definition 2 of AGGR (2019). In particular, the $\hat{f}^s_{jt}$ are obtained by applying the method of principal components to the $T\times N_j$ matrix of residuals $\Xi_j$ obtained from regressing $y_{jt}$ on the estimated common factors $\hat{f}^c_t$. More specifically, given model (1), we can further decompose $F_j$ and $\Lambda_j$ in terms of common and group-specific factors and factor loadings to write
$$Y_j=F^c\Lambda^{c\prime}_j+F^s_j\Lambda^{s\prime}_j+\varepsilon_j.$$
Let $\hat{F}^c=(\hat{f}^c_1,\dots,\hat{f}^c_T)'$ denote the $T\times k^c$ matrix of estimated common factors. The regression of $Y_j$ on $\hat{F}^c$ yields the common factor loadings
$$\hat{\Lambda}^c_j=Y_j'\hat{F}^c/T.$$

The matrix $\Xi_j$ is defined as $\Xi_j=Y_j-\hat{F}^c\hat{\Lambda}^{c\prime}_j$. Given $\Xi_j$, we can now apply the method of principal components to obtain $\hat{F}^s_j=(\hat{f}^s_{j1},\dots,\hat{f}^s_{jT})'$, composed of $\sqrt{T}$ times the eigenvectors corresponding to the $k^s_j$ largest eigenvalues of $\Xi_j\Xi_j'/TN_j$ (arranged in decreasing order), where the normalization $\hat{F}^{s\prime}_j\hat{F}^s_j/T=I_{k^s_j}$ is used.
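A compact sketch of this step (Definition 2), reusing `pca_factors` above; here `Fc_hat` would be `F1_hat @ W_hat`, with `W_hat` from `canonical_correlations` (Definition 1). All names are our own.

```python
def specific_factors(Y_j, Fc_hat, ks_j):
    """Definition 2, sketched: regress the panel on the estimated common
    factors (using Fc' Fc / T = I), then extract ks_j principal components
    from the residual panel Xi_j."""
    T = Y_j.shape[0]
    Lc_hat = Y_j.T @ Fc_hat / T                 # common factor loadings
    Xi_j = Y_j - Fc_hat @ Lc_hat.T              # T x N_j residual panel
    Fs_hat, Ls_hat = pca_factors(Xi_j, ks_j)    # specific factors/loadings
    return Lc_hat, Fs_hat, Ls_hat
```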

1.4 The Test Statistic and Its Asymptotic Distribution

To test $H_0$, we rely on the statistic
$$\hat{\xi}(k^c)=\sum_{l=1}^{k^c}\hat{\rho}_l.$$

Our goal is to propose a bootstrap test based on the bootstrap analogue of $\hat{\xi}(k^c)-k^c$, say $\hat{\xi}^*(k^c)-k^c$. In particular, we focus on obtaining a valid bootstrap $p$-value $p^*\equiv P^*\left(\hat{\xi}^*(k^c)\le\hat{\xi}(k^c)\right)$ (see note 2).

To understand the properties that a bootstrap test should have in order to be asymptotically valid, we first review the large sample properties of this test statistic, as studied by AGGR (2019). In the following, we let $N\equiv\min(N_1,N_2)=N_2$ (without loss of generality) and define $\mu_N=N_2/N_1$. Since $N=N_2\le N_1$, $\mu_N\le1$. We assume that $\mu_N\to\mu\in[0,1]$. When $N_1=N_2=N$, $\mu_N=\mu=1$.

Appendix A contains a set of assumptions under which we derive the asymptotic distribution of $\hat{\xi}(k^c)$. Compared to AGGR (2019), we impose a stricter rate condition on $N$ relative to $T$. In particular, while our Assumption 1 maintains AGGR (2019)'s assumption that $T/N\to0$, we require that $N/T^{3/2}\to0$ as opposed to $N/T^{5/2}\to0$. The main reason why we adopt a stricter rate condition is that it greatly simplifies both the asymptotic and the bootstrap theory (see note 3). In addition, we generalize standard assumptions in the literature on factor models (see, e.g., Bai (2003), Bai and Ng (2006), and GP (2014)) to the group factor context of interest here. Our assumptions suggest natural bootstrap high-level conditions (which we provide in Appendix B) under which the bootstrap asymptotic distribution can be derived. Since some of these bootstrap conditions have already been verified in the previous literature, we can rely on existing results for proving our bootstrap theory. By contrast, AGGR (2019)'s assumptions are not easily adapted to proving our bootstrap theory.

Next, we characterize the asymptotic distribution of $\hat{\xi}(k^c)$ under Assumptions 1–6 in Appendix A. We introduce the following notation. First, we let
$$u_{jt}\equiv\left(\frac{\Lambda_j'\Lambda_j}{N_j}\right)^{-1}\frac{\Lambda_j'\varepsilon_{jt}}{\sqrt{N_j}}$$
and note that $u_{jt}$ captures the factors estimation uncertainty for panel $j$. In particular, as is well known (cf. Bai (2003)), estimation of $f_{jt}$ by principal components implies that each estimator $\hat{f}_{jt}$ is consistent for $H_jf_{jt}$, a rotated version of $f_{jt}$. The rotation matrix is defined as $H_j=V_j^{-1}\frac{\hat{F}_j'F_j}{T}\frac{\Lambda_j'\Lambda_j}{N_j}$, where $V_j$ is a $k_j\times k_j$ diagonal matrix containing the $k_j$ largest eigenvalues of $Y_jY_j'/N_jT$ on the main diagonal, in decreasing order. As shown by Bai (2003), $u_{jt}$ is the leading term in the asymptotic expansion of $\sqrt{N_j}(\hat{f}_{jt}-H_jf_{jt})$. We let $u^{(c)}_{jt}$ denote the $k^c\times1$ vector containing the first $k^c$ rows of $u_{jt}$ and define $U_t\equiv\sqrt{\mu_N}\,u^{(c)}_{1t}-u^{(c)}_{2t}$. Finally, we let $\tilde{\Sigma}_U\equiv T^{-1}\sum_{t=1}^TE(U_tU_t')$ and $\tilde{\Sigma}_{cc}\equiv T^{-1}\sum_{t=1}^TE(f^c_tf^{c\prime}_t)$.

 
Theorem 2.1.
Suppose Assumptions 1–6 hold and the null hypothesis is true, so that $f_{jt}=(f^{c\prime}_t,f^{s\prime}_{jt})'$ for $j=1,2$. It follows that
$$\hat{\xi}(k^c)-k^c=\frac{1}{N\sqrt{T}}\frac{1}{2\sqrt{T}}\sum_{t=1}^TZ_{N,t}-\frac{1}{2N}\mathrm{tr}\left(\tilde{\Sigma}_{cc}^{-1}\tilde{\Sigma}_U\right)+O_p\left(\delta_{NT}^{-4}\right),\ \text{ where }Z_{N,t}\equiv U_t'U_t-E(U_t'U_t),\tag{2}$$
implying that
$$N\sqrt{T}\,\Omega_U^{-1/2}\left(\hat{\xi}(k^c)-k^c+\frac{1}{2N}\mathrm{tr}\left(\tilde{\Sigma}_{cc}^{-1}\tilde{\Sigma}_U\right)\right)\to_dN(0,1).\tag{3}$$

Theorem 2.1 corresponds to Theorem 1 in AGGR (2019) under our Assumptions 1–6. For completeness, we provide a proof of this result in Appendix A. As in AGGR (2019), we obtain an asymptotic expansion of $\hat{R}$ around $\tilde{R}\equiv\tilde{V}_{11}^{-1}\tilde{V}_{12}\tilde{V}_{22}^{-1}\tilde{V}_{21}$, where $\tilde{V}_{jk}\equiv T^{-1}\sum_{t=1}^Tf_{jt}f_{kt}'$. We then use the fact that under the null hypothesis, $f_{jt}$ and $f_{kt}$ share a set of common factors $f^c_t$ (i.e., $f_{jt}=(f^{c\prime}_t,f^{s\prime}_{jt})'$ for $j=1,2$), implying that the $k^c$ largest eigenvalues of $\tilde{R}$ are all equal to 1. This explains why $\hat{\xi}(k^c)$ is centered around $k^c$ under the null. However, the asymptotic distribution of $\hat{\xi}(k^c)$ depends on the contribution of the factors estimation uncertainty to $\hat{V}_{jk}\equiv T^{-1}\sum_{t=1}^T\hat{f}_{jt}\hat{f}_{kt}'$, which involves products of $\hat{f}_{jt}$ and $\hat{f}_{kt}$. Using Bai (2003)'s identity for the factor estimation error $\hat{f}_{jt}-H_jf_{jt}$, we rely on Lemma A.2 in Appendix A (which gives an asymptotic expansion of $T^{-1}\sum_{t=1}^T(\hat{f}_{jt}-H_jf_{jt})(\hat{f}_{kt}-H_kf_{kt})'$ up to order $O_p(\delta_{NT}^{-4})$) to obtain the asymptotic distribution in Theorem 2.1 (see note 4).

Under our assumptions, the leading term of the asymptotic expansion of $\hat{\xi}(k^c)-k^c$ in Equation (2) is given by $-\frac{1}{2N}B$, where $B\equiv\mathrm{tr}(\tilde{\Sigma}_{cc}^{-1}\tilde{\Sigma}_U)$. Since $B=O_p(1)$ under our assumptions, $-\frac{1}{2N}B$ is of order $O_p(N^{-1})$. The asymptotic Gaussianity of the test statistic is driven by the first term on the right-hand side of Equation (2), which we can rewrite as $\frac{1}{N\sqrt{T}}\frac{1}{2\sqrt{T}}\sum_{t=1}^TZ_{N,t}$, where $Z_{N,t}\equiv U_t'U_t-E(U_t'U_t)$. Under Assumption 6, $Z_{N,t}$ satisfies a central limit theorem; that is, we assume (see note 5) that $\frac{1}{2\sqrt{T}}\sum_{t=1}^TZ_{N,t}\to_dN(0,\Omega_U)$. Hence, $N\sqrt{T}\,\Omega_U^{-1/2}\left(\hat{\xi}(k^c)-k^c+\frac{1}{2N}\mathrm{tr}(\tilde{\Sigma}_{cc}^{-1}\tilde{\Sigma}_U)\right)$ is asymptotically distributed as $N(0,1)$, as stated in Equation (3). Note that in deriving this result we have used the fact that $T/N\to0$ and $N/T^{3/2}\to0$ to show that the remainder is $N\sqrt{T}\,O_p(\delta_{NT}^{-4})=o_p(1)$.

Theorem 2.1 illustrates two crucial features of the asymptotic properties of the test statistic $\hat{\xi}(k^c)$ under the null. First, the test statistic converges at the non-standard rate $N\sqrt{T}$. Second, the statistic $\hat{\xi}(k^c)$ is not centered at $k^c$ even under the null. There is an asymptotic bias term of order $O_p(N^{-1})$ given by $-B/2N$. When multiplied by $N\sqrt{T}$, this term is of order $O_p(\sqrt{T})$. Thus, the bias diverges, but at a slower rate than the convergence rate $N\sqrt{T}$.

The distributional result (3) is infeasible since we do not observe the asymptotic bias $B$ nor the asymptotic variance $\Omega_U$. To obtain a feasible test statistic, we need consistent estimators of $B$ and $\Omega_U$. In particular, suppose that $\hat{B}$ and $\hat{\Omega}_U$ denote such estimators. Then, a feasible test statistic is
$$\tilde{\xi}(k^c)\equiv N\sqrt{T}\,\hat{\Omega}_U^{-1/2}\left(\hat{\xi}(k^c)-k^c+\frac{1}{2N}\hat{B}\right).$$

Two crucial conditions for showing that $\tilde{\xi}(k^c)\to_dN(0,1)$ are (i) $\sqrt{T}(\hat{B}-B)=o_p(1)$ and (ii) $\hat{\Omega}_U-\Omega_U=o_p(1)$. Under these conditions, we can use a standard normal critical value to test $H_0$ against $H_1$. Since under $H_1$, $\hat{\xi}(k^c)-k^c$ is large and negative, the decision rule is to reject $H_0$ whenever $\tilde{\xi}(k^c)<z_\alpha$, where $z_\alpha$ is the $\alpha$-quantile of a $N(0,1)$ distribution. This is the approach followed by AGGR (2019).
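As an illustration of this decision rule only (we do not implement AGGR (2019)'s estimators $\hat{B}$ and $\hat{\Omega}_U$ here), given candidate values `B_hat` and `Omega_hat`:

```python
import numpy as np
from scipy.stats import norm

def aggr_decision(xi_hat, kc, N, T, B_hat, Omega_hat, alpha=0.05):
    """Studentize xi_hat(kc) - kc with the bias correction B_hat / (2 N)
    and variance estimate Omega_hat; reject for large negative values."""
    stat = N * np.sqrt(T) * (xi_hat - kc + B_hat / (2 * N)) / np.sqrt(Omega_hat)
    return stat < norm.ppf(alpha)             # one-sided, left-tail test
```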

As it turns out, estimating B and ΩU is a difficult task when we allow for general time series and cross-sectional dependence in the idiosyncratic errors εjt. In particular, we can show that B depends on the cross-sectional dependence of ε1t and ε2t (but not on their serial dependence), whereas ΩU depends on both forms of dependence.

To see this, assume that $k^c=1$ (and $k^s_j=0$ for $j=1,2$), in which case $B=\tilde{\Sigma}_{cc}^{-1}\tilde{\Sigma}_U$. Assume also that $N=N_2=N_1$, which implies that $\mu_N=1$. When the idiosyncratic errors are independent across the two groups, we can write
$$\tilde{\Sigma}_U=T^{-1}\sum_{t=1}^TE(U_t^2)=T^{-1}\sum_{t=1}^T\left[E(u_{1t}^2)+E(u_{2t}^2)\right].$$
For each group $j$, $E(u_{jt}^2)$ captures the factor estimation uncertainty in $\hat{f}_{jt}$ and is given by $E(u_{jt}^2)=\left(N^{-1}\Lambda_j'\Lambda_j\right)^{-2}\Gamma_{j,t}$, where $\Gamma_{j,t}\equiv\mathrm{Var}\left(N^{-1/2}\sum_{i=1}^N\lambda_{j,i}\varepsilon_{j,it}\right)$. It follows that
$$\tilde{\Sigma}_U=\left(N^{-1}\Lambda_1'\Lambda_1\right)^{-2}\Gamma_1+\left(N^{-1}\Lambda_2'\Lambda_2\right)^{-2}\Gamma_2,$$
where $\Gamma_j\equiv T^{-1}\sum_{t=1}^T\Gamma_{j,t}$. This shows that $B=\tilde{\Sigma}_{cc}^{-1}\tilde{\Sigma}_U$ depends on $\Gamma_1$ and $\Gamma_2$, the time averages of the variances of the cross-sectional averages of $\lambda_{j,i}\varepsilon_{j,it}$ for $j=1,2$. Hence, $B$ depends on the cross-sectional dependence of each group's idiosyncratic errors, but it does not depend on their serial dependence.
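To make the structure of $B$ concrete, a plug-in sample analogue in this scalar special case might look as follows (illustrative only; avoiding exactly this kind of explicit estimation is the point of our bootstrap test):

```python
def bias_B_scalar(L1, E1, L2, E2):
    """Plug-in analogue of B in the scalar case of the text (kc = 1, no
    specific factors, mu_N = 1, independent groups, Sigma_cc normalized
    to one). L_j: N-vector of loadings; E_j: T x N residual matrix."""
    def term(L, E):
        N = L.shape[0]
        g = E @ L / np.sqrt(N)       # N^{-1/2} sum_i lambda_{j,i} eps_{j,it}
        Gamma = np.mean(g ** 2)      # time average of the variances Gamma_{j,t}
        return Gamma / (L @ L / N) ** 2
    return term(L1, E1) + term(L2, E2)
```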

To see that $\Omega_U$ depends on both serial and cross-sectional dependence in $\varepsilon_{j,it}$, note that $\Omega_U\equiv\mathrm{Var}\left(\frac{1}{2T^{1/2}}\sum_{t=1}^TZ_{N,t}\right)$ is the long run variance of $Z_{N,t}\equiv(u_{1t}-u_{2t})^2-E(u_{1t}-u_{2t})^2$, whose form depends on the potential serial dependence of $\varepsilon_{j,it}$. It also depends on the cross-sectional dependence because $Z_{N,t}$ is a (quadratic) function of $u_{jt}$, which depends on the cross-sectional averages of $\varepsilon_{j,it}$. Thus, we conclude that $\Omega_U$ is a complicated function of the serial and cross-sectional dependence in the idiosyncratic error terms.

For these reasons, in order to obtain a feasible test statistic, AGGR (2019) assume that each sub-panel follows a strict factor model. Under this assumption (including the assumption of conditional homoskedasticity in the idiosyncratic errors), the form of B and ΩU simplifies considerably. Their Theorem 2 provides consistent estimators of these quantities, allowing for the construction of a feasible test statistic. However, even under these restrictive assumptions, our simulations (to be discussed later) show important level distortions.

This provides the main motivation for using the bootstrap as an alternative method of inference. Our main goal is to propose a simple bootstrap test that avoids the need to estimate B and ΩU explicitly and outperforms the asymptotic theory-based test of AGGR (2019).

2 A General Bootstrap Scheme

2.1 The Bootstrap DGP and the Bootstrap Statistics

Let $\hat{\xi}^*(k^c)$ denote the bootstrap analogue of $\hat{\xi}(k^c)$. Our goal is to propose a bootstrap test that rejects $H_0$ whenever $p^*\le\alpha$, where $\alpha$ is the significance level of the test and $p^*$ is the bootstrap $p$-value defined as
$$p^*\equiv P^*\left(\hat{\xi}^*(k^c)\le\hat{\xi}(k^c)\right).$$

The goal of this section is to propose asymptotically valid bootstrap methods. A crucial condition for bootstrap validity is that the bootstrap $p$-value is asymptotically distributed as $U[0,1]$, a uniform random variable on $[0,1]$, when $H_0$ holds. Under $H_1$, the bootstrap $p$-value should converge to zero in probability to ensure that the bootstrap test has power. We propose a general residual-based bootstrap scheme that resamples the residuals from the two sub-panels in order to create the bootstrap data on $y^*_{1t}$ and $y^*_{2t}$. We highlight the crucial conditions that the resampled idiosyncratic errors $\varepsilon^*_{1t}$ and $\varepsilon^*_{2t}$ need to verify in order to produce an asymptotically valid bootstrap test.

We adapt the general residual-based bootstrap method of GP (2014) to the group panel factor model. Specifically, for $j=1,2$, let $\{\varepsilon^*_{jt}:t=1,\dots,T\}$ denote a resampled version of $\{\tilde{\varepsilon}_{jt}=y_{jt}-\hat{\Lambda}^c_j\hat{f}^c_t-\hat{\Lambda}^s_j\hat{f}^s_{jt}\}$. The bootstrap DGP is
$$y^*_{jt}=\hat{\Lambda}^c_j\hat{f}^c_t+\hat{\Lambda}^s_j\hat{f}^s_{jt}+\varepsilon^*_{jt},\tag{4}$$
or, equivalently, for $j=1,2$, we let $Y^*_j=\tilde{F}_j\tilde{\Lambda}_j'+\varepsilon^*_j$, where $\tilde{F}_j=(\tilde{f}_{j1},\dots,\tilde{f}_{jT})'$ is $T\times k_j$ and $\tilde{\Lambda}_j=(\tilde{\lambda}_{j,1},\dots,\tilde{\lambda}_{j,N_j})'$ is $N_j\times k_j$. An important feature of Equation (4) is that it imposes the null hypothesis of $k^c$ common factors between the two panels, since the conditional mean of $y^*_{jt}$ relies on the restricted estimated factors $\tilde{f}_{jt}=(\hat{f}^{c\prime}_t,\hat{f}^{s\prime}_{jt})'$ for each $j=1,2$. This mimics the fact that $y_{jt}$ depends on $f_{jt}=(f^{c\prime}_t,f^{s\prime}_{jt})'$ under the null hypothesis. Similarly, the $\varepsilon^*_{jt}$ are a resampled version of the restricted residuals $\tilde{\varepsilon}_{jt}$. Although other bootstrap schemes that do not impose the null hypothesis could be considered (see note 6), we focus on the null restricted bootstrap DGP in Equation (4) for two main reasons. First, the fact that we impose the null hypothesis implies that the factors underlying the bootstrap DGP satisfy the normalization conditions imposed on the group factor model (see Assumption 2(a)). In particular, by construction $\hat{f}^c_t$ is orthogonal in-sample to $\hat{f}^s_{jt}$ for both $j=1,2$ when we use Definition 2 of AGGR (2019), and $T^{-1}\sum_{t=1}^T\hat{f}^c_t\hat{f}^{c\prime}_t=I_{k^c}$ and $T^{-1}\sum_{t=1}^T\hat{f}^s_{jt}\hat{f}^{s\prime}_{jt}=I_{k^s_j}$ for both $j=1,2$. These properties are crucial in showing the asymptotic Gaussianity of the bootstrap test statistic. Second, imposing the null hypothesis in the bootstrap DGP when doing hypothesis testing has been shown to be important to minimize the probability of type I error (see, e.g., Davidson and MacKinnon 1999).

Estimation in the bootstrap world proceeds as in the original sample. First, we extract the first $k_j$ principal components for each group $j$, with $j=1,2$, by applying the method of principal components to each sub-panel. In particular, the $T\times k_j$ matrix $\hat{F}^*_j=(\hat{f}^*_{j1},\dots,\hat{f}^*_{jT})'$ contains the estimated factors for each bootstrap sample generated from $Y^*_j=\tilde{F}_j\tilde{\Lambda}_j'+\varepsilon^*_j$. The matrix $\hat{F}^*_j$ collects the eigenvectors corresponding to the $k_j$ largest eigenvalues of $Y^*_jY^{*\prime}_j/TN_j$ (arranged in decreasing order and multiplied by $\sqrt{T}$), where we impose that $\hat{F}^{*\prime}_j\hat{F}^*_j/T=I_{k_j}$. We then compute $\hat{R}^*=\hat{V}^{*-1}_{11}\hat{V}^*_{12}\hat{V}^{*-1}_{22}\hat{V}^*_{21}$, where $\hat{V}^*_{jk}=\hat{F}^{*\prime}_j\hat{F}^*_k/T=T^{-1}\sum_{t=1}^T\hat{f}^*_{jt}\hat{f}^{*\prime}_{kt}$. The bootstrap test statistic is $\hat{\xi}^*(k^c)=\sum_{l=1}^{k^c}\hat{\rho}^*_l=\mathrm{tr}(\hat{\Lambda}^{*1/2})$, where $\hat{\Lambda}^*=\mathrm{diag}(\hat{\rho}^{*2}_l:l=1,\dots,k^c)$ is a $k^c\times k^c$ diagonal matrix containing the $k^c$ largest eigenvalues of $\hat{R}^*$, obtained from the eigenvalue-eigenvector problem $\hat{R}^*\hat{W}^*=\hat{W}^*\hat{\Lambda}^*$, where $\hat{W}^*$ is the $k_1\times k^c$ matrix of associated eigenvectors.

As in the original sample, estimation by principal components using the bootstrap data $Y^*_j$ implies that each estimator $\hat{f}^*_{jt}$ is consistent for $H^*_j\tilde{f}_{jt}$, a rotated version of $\tilde{f}_{jt}$. The bootstrap rotation matrix is defined as $H^*_j=V^{*-1}_j\frac{\hat{F}^{*\prime}_j\tilde{F}_j}{T}\frac{\tilde{\Lambda}_j'\tilde{\Lambda}_j}{N_j}$, where $V^*_j$ is a $k_j\times k_j$ diagonal matrix containing the $k_j$ largest eigenvalues of $Y^*_jY^{*\prime}_j/N_jT$ on the main diagonal, in decreasing order. Contrary to $H_j$, $H^*_j$ is observed and could be used for inference on the factors as in GP (2014). Here, the bootstrap test statistic $\hat{\xi}^*(k^c)$ is invariant to $H^*_j$, but $H^*_j$ shows up in the bootstrap theory. The bootstrap $p$-value $p^*$ is based on $N\sqrt{T}(\hat{\xi}^*(k^c)-k^c)$, where $\hat{\xi}^*(k^c)$ is centered around $k^c$ because we have imposed the null hypothesis in the bootstrap DGP in Equation (4).

Next, we characterize the bootstrap distribution of $\hat{\xi}^*(k^c)$. Following the proof of Theorem 2.1, we expand $\hat{R}^*$ around $\tilde{R}^*\equiv\tilde{V}^{*-1}_{11}\tilde{V}^*_{12}\tilde{V}^{*-1}_{22}\tilde{V}^*_{21}$, where $\tilde{V}^*_{jk}\equiv T^{-1}\sum_{t=1}^T\tilde{f}_{jt}\tilde{f}_{kt}'$ is the bootstrap analogue of $\tilde{V}_{jk}\equiv T^{-1}\sum_{t=1}^Tf_{jt}f_{kt}'$ (see note 7). Given Equation (4), $\tilde{f}_{jt}$ and $\tilde{f}_{kt}$ share a set of common factors $\hat{f}^c_t$ (i.e., $\tilde{f}_{jt}=(\hat{f}^{c\prime}_t,\hat{f}^{s\prime}_{jt})'$ for $j=1,2$), implying that the $k^c$ largest eigenvalues of $\tilde{R}^*$ are all equal to 1 and $\hat{\xi}^*(k^c)$ is centered around $k^c$. Note that this holds by construction, independently of whether the null hypothesis $H_0$ is true or not. As argued for the original statistic, the bootstrap distribution of $\hat{\xi}^*(k^c)$ is driven by the contribution of the factors estimation uncertainty (as measured by $\hat{f}^*_{jt}-H^*_j\tilde{f}_{jt}$) to $\hat{V}^*_{jk}\equiv T^{-1}\sum_{t=1}^T\hat{f}^*_{jt}\hat{f}^{*\prime}_{kt}$. In particular, following the proof of Theorem 2.1, the asymptotic distribution of $\hat{\xi}^*(k^c)$ is based on an asymptotic expansion of $T^{-1}\sum_{t=1}^T(\hat{f}^*_{jt}-H^*_j\tilde{f}_{jt})(\hat{f}^*_{kt}-H^*_k\tilde{f}_{kt})'$ up to order $O_{p^*}(\delta_{NT}^{-4})$. This crucial result is given in Lemma B.2 in Appendix B. It relies on Conditions A*, B*, and C*, which are the bootstrap analogues of Assumptions 3, 4, and 5. We call these bootstrap high-level conditions because they apply to any bootstrap method that is used to generate the bootstrap draws $\varepsilon^*_{jt}$. We will verify these conditions for the wild bootstrap in the next section.

The following result follows under Conditions A*–C*. We let $U^*_t\equiv\sqrt{\mu_N}\,u^{(c)*}_{1t}-u^{(c)*}_{2t}$, where $u^{(c)*}_{jt}$ denotes the first $k^c$ rows of $u^*_{jt}\equiv\left(N_j^{-1}\tilde{\Lambda}_j'\tilde{\Lambda}_j\right)^{-1}N_j^{-1/2}\sum_{i=1}^{N_j}\tilde{\lambda}_{j,i}\varepsilon^*_{j,it}$. Similarly, we let $\tilde{\Sigma}^*_U\equiv T^{-1}\sum_{t=1}^TE^*(U^*_tU^{*\prime}_t)$, which is the bootstrap analogue of $\tilde{\Sigma}_U\equiv T^{-1}\sum_{t=1}^TE(U_tU_t')$.

 
Lemma 3.1.
Suppose that Conditions A*, B*, and C* hold. It follows that
$$\hat{\xi}^*(k^c)-k^c=\frac{1}{N\sqrt{T}}\frac{1}{2\sqrt{T}}\sum_{t=1}^T\left(U^{*\prime}_tU^*_t-E^*(U^{*\prime}_tU^*_t)\right)-\frac{1}{2N}\mathrm{tr}\left(\tilde{\Sigma}^*_U\right)+O_{p^*}\left(\delta_{NT}^{-4}\right).\tag{5}$$

Lemma 3.1 gives the asymptotic expansion of $\hat{\xi}^*(k^c)$ and is the bootstrap analogue of Equation (2) in Theorem 2.1. The leading term in the expansion of $\hat{\xi}^*(k^c)-k^c$ in Equation (5) is given by $-\frac{1}{2N}B^*$, where $B^*\equiv\mathrm{tr}(\tilde{\Sigma}^*_U)$ is the bootstrap analogue of $B\equiv\mathrm{tr}(\tilde{\Sigma}_{cc}^{-1}\tilde{\Sigma}_U)$. Note that in the bootstrap world, $\tilde{\Sigma}^*_{cc}\equiv T^{-1}\sum_{t=1}^T\hat{f}^c_t\hat{f}^{c\prime}_t=I_{k^c}$, which explains why $\tilde{\Sigma}^{*-1}_{cc}$ is omitted from the definition of $B^*$. Under our bootstrap high-level conditions, $-\frac{1}{2N}B^*$ is of order $O_{p^*}(N^{-1})$.

To show the asymptotic validity of the bootstrap test, we impose the following additional bootstrap high-level conditions. We define $Z^*_{N,t}\equiv U^{*\prime}_tU^*_t-E^*(U^{*\prime}_tU^*_t)$ and let $\Omega^*_U\equiv\mathrm{Var}^*\left(\frac{1}{2\sqrt{T}}\sum_{t=1}^TZ^*_{N,t}\right)$.

Condition D*. $\sqrt{T}(B^*-B)\to_p0$.

Condition E*. $\Omega^{*-1/2}_U\frac{1}{2\sqrt{T}}\sum_{t=1}^TZ^*_{N,t}\to_{d^*}N(0,1)$, in probability, where $\Omega^*_U\equiv\mathrm{Var}^*\left(\frac{1}{2\sqrt{T}}\sum_{t=1}^TZ^*_{N,t}\right)$ is such that $\Omega^*_U-\Omega_U\to_p0$.

 
Theorem 3.1.
Assume Assumptions 1–6 hold and $H_0$ is true. Then, any bootstrap scheme that verifies Conditions A*–E* is such that
$$\sup_{x\in\mathbb{R}}\left|P^*\left(N\sqrt{T}\left(\hat{\xi}^*(k^c)-k^c\right)\le x\right)-P\left(N\sqrt{T}\left(\hat{\xi}(k^c)-k^c\right)\le x\right)\right|\to_p0,$$
which implies that $p^*\to_dU[0,1]$.

Condition D* requires the bootstrap bias $B^*$ to mimic the bias term $B$. In particular, $B^*$ needs to be a $\sqrt{T}$-convergent estimator of $B$. Having $B^*-B=o_p(1)$ does not suffice. The main reason for the faster rate of convergence requirement is that the asymptotic bias term $(2N)^{-1}B$ is of order $O_p(N^{-1})$, and since the convergence rate is $N\sqrt{T}$, this induces a shift of the center of the distribution of order $O_p(\sqrt{T})$. So, contrary to more standard settings where the asymptotic bias is of order $O(1)$, here the asymptotic bias diverges. However, any $\sqrt{T}$-consistent estimator of $B$ can be used to recenter $\hat{\xi}(k^c)-k^c$ and yield a random variable whose limiting distribution is $N(0,\Omega_U)$. Condition D* requires that the bootstrap bias $B^*$ has this property. Condition E* requires that the bootstrap array $Z^*_{N,t}$ satisfies a central limit theorem in the bootstrap world with an asymptotic variance $\Omega^*_U$ that converges in probability to $\Omega_U$. This condition is the bootstrap analogue of Assumption 6(b) in Appendix A.

We discuss a few implications of our bootstrap high-level conditions. The first one is that for the bootstrap to mimic the asymptotic bias term $B$ (as implied by Condition D*), we need to generate $\varepsilon^*_{jt}$ in a way that preserves the cross-sectional dependence of $\varepsilon_{jt}$. Serial dependence in $\varepsilon^*_{jt}$ is asymptotically irrelevant for this term. The reason for this is that $B$ depends only on the cross-sectional dependence, but not on the serial dependence, of $\varepsilon_{jt}$, as we explained in the previous section.

The second implication is that in order for the bootstrap to replicate the covariance $\Omega_U$ (as required by Condition E*), we need to design a bootstrap method that generates $\varepsilon^*_{jt}$ with serial dependence (in addition to cross-sectional dependence). This can be seen by noting that $\Omega^*_U$ is the long run variance of $\frac{1}{2\sqrt{T}}\sum_{t=1}^TZ^*_{N,t}$, which depends on both the serial and the cross-sectional dependence properties of $\{\varepsilon^*_{jt}\}$.

The overall conclusion is that the implementation of the bootstrap depends on the serial and cross-sectional dependence assumptions we make on the idiosyncratic errors of each sub-panel. Different assumptions will lead to different bootstrap algorithms. Theorem 3.1 is useful because it gives a set of high-level conditions that can be used to prove the asymptotic validity of any bootstrap scheme used to obtain $\varepsilon^*_{jt}$.

To end this section, we discuss the asymptotic power of our bootstrap test. Although Conditions A*–E* suffice to show that $p^*\to_p0$ under $H_1$, a weaker set of assumptions suffices. In particular, the following high-level condition is sufficient to ensure that any bootstrap test based on $\hat{\xi}^*(k^c)$ is consistent.

Condition F*. $\frac{1}{2\sqrt{T}}\sum_{t=1}^TZ^*_{N,t}=O_{p^*}(1)$ and $B^*=O_{p^*}(N^{1-\epsilon})$, where $\epsilon$ is some positive number.

 
Proposition 3.1.

Under Assumptions 1–6, any bootstrap method that verifies Conditions A*–C* and F* satisfies $p^*\to_p0$ under $H_1$.

Since we reject $H_0$ if $p^*\le\alpha$, Proposition 3.1 ensures that $P(p^*\le\alpha)\to1$ when $H_1$ is true.

3 Specific Bootstrap Schemes

3.1 The Wild Bootstrap Method

Here, we discuss a wild bootstrap method and show that it verifies Conditions A*–F* under a set of assumptions similar to those of Theorem 2 in AGGR (2019). Algorithm 1 contains a description of this method.

Algorithm 1

Wild Bootstrap

1. For $t=1,\dots,T$ and $j=1,2$, let
$$y^*_{jt}=\hat{\Lambda}^c_j\hat{f}^c_t+\hat{\Lambda}^s_j\hat{f}^s_{jt}+\varepsilon^*_{jt},$$
where $\tilde{f}_{jt}=(\hat{f}^{c\prime}_t,\hat{f}^{s\prime}_{jt})'$ and $\varepsilon^*_{jt}=(\varepsilon^*_{j,1t},\dots,\varepsilon^*_{j,N_jt})'$ is such that
$$\varepsilon^*_{j,it}=\tilde{\varepsilon}_{j,it}\,\eta_{j,it},$$
where the $\eta_{j,it}$ are i.i.d. $N(0,1)$ across $(j,i,t)$.

2. For $j=1,2$, estimate the bootstrap factors $\hat{F}^*_j$ by extracting the first $k_j$ principal components from $y^*_{jt}$, and set
$$\hat{V}^*_{jk}=\hat{F}^{*\prime}_j\hat{F}^*_k/T$$
and
$$\hat{R}^*=\hat{V}^{*-1}_{11}\hat{V}^*_{12}\hat{V}^{*-1}_{22}\hat{V}^*_{21}.$$

3. Compute the $k^c$ largest eigenvalues of $\hat{R}^*$ and denote these by $\hat{\rho}^{*2}_l$, $l=1,\dots,k^c$.

4. Compute the bootstrap test statistic $\hat{\xi}^*(k^c)=\sum_{l=1}^{k^c}\hat{\rho}^*_l$.

5. Repeat steps 1–4 $M$ times and then compute the bootstrap $p$-value as
$$\hat{p}^*=\frac{1}{M}\sum_{b=1}^M1\left\{\hat{\xi}^{*(b)}(k^c)\le\hat{\xi}(k^c)\right\},$$
where $\hat{\xi}^{*(b)}(k^c)$ is the value of the bootstrap test statistic for replication $b=1,\dots,M$.

6. Reject the null hypothesis of $k^c$ common factors at level $\alpha$ if $\hat{p}^*\le\alpha$.
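For concreteness, here is a compact sketch of Algorithm 1 in Python, reusing the helpers `pca_factors`, `canonical_correlations`, and `specific_factors` sketched in Section 1.3 (all function names are ours; this is a minimal illustration, not the authors' code):

```python
def xi_statistic(Y1, Y2, k1, k2, kc):
    """xi_hat(kc): sum of the kc largest sample canonical correlations;
    also returns the estimated common factors F1_hat @ W_hat."""
    F1, _ = pca_factors(Y1, k1)
    F2, _ = pca_factors(Y2, k2)
    rho2, W = canonical_correlations(F1, F2, kc)
    return np.sqrt(np.clip(rho2, 0.0, None)).sum(), F1 @ W

def wild_bootstrap_test(Y1, Y2, kc, ks1, ks2, M=399, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    k1, k2 = kc + ks1, kc + ks2
    xi_hat, Fc = xi_statistic(Y1, Y2, k1, k2, kc)
    # Null-restricted fit and residuals eps_tilde for each group.
    fits, resids = [], []
    for Yj, ksj in ((Y1, ks1), (Y2, ks2)):
        Lc, Fs, Ls = specific_factors(Yj, Fc, ksj)
        fits.append(Fc @ Lc.T + Fs @ Ls.T)
        resids.append(Yj - fits[-1])
    xi_star = np.empty(M)
    for b in range(M):
        # Step 1: wild bootstrap, eps*_{j,it} = eps_tilde_{j,it} * eta_{j,it}.
        Y1s = fits[0] + resids[0] * rng.standard_normal(resids[0].shape)
        Y2s = fits[1] + resids[1] * rng.standard_normal(resids[1].shape)
        # Steps 2-4: re-estimate factors and the statistic on bootstrap data.
        xi_star[b], _ = xi_statistic(Y1s, Y2s, k1, k2, kc)
    # Steps 5-6: bootstrap p-value and decision.
    p_star = np.mean(xi_star <= xi_hat)
    return p_star, p_star <= alpha
```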

To prove the asymptotic validity of the wild bootstrap $p$-value, we strengthen the primitive assumptions given in Appendix A as follows:

 
Assumption WB1

For $j=1,2$, $\{f_{jt}\}$ and $\{\varepsilon_{j,it}\}$ are mutually independent, with $E\|f_{jt}\|^{32}\le M<\infty$ and $E|\varepsilon_{j,it}|^{32}\le M<\infty$ for all $(i,t)$.

 
Assumption WB2

(a) $\mathrm{Cov}(\varepsilon_{j,it},\varepsilon_{k,ls})=0$ if $j\ne k$ or $i\ne l$ or $t\ne s$, and (b) $E(\varepsilon^2_{j,it})=\gamma_{j,ii}>0$.

 
Assumption WB3

For each $j=1,2$,

  • $\frac{1}{\sqrt{T}}\sum_{t=1}^T\left(\varepsilon^2_{j,it}\varepsilon^2_{j,kt}-E(\varepsilon^2_{j,it}\varepsilon^2_{j,kt})\right)=O_p(1)$ for any $i,k$.

  • $\max_{i\le N_j}\left\|\frac{1}{T}\sum_{t=1}^Tf_{jt}\varepsilon_{j,it}\right\|=O_p\left(\sqrt{\log N_j/T}\right)$.

  • $E\left\|\frac{1}{\sqrt{N_j}}\Lambda_j'\varepsilon_{jt}\right\|^2\le M$.

Assumption WB1 strengthens the moment conditions in Assumptions 2 and 3(a). A larger number of moments of $f_{jt}$ and $\varepsilon_{j,it}$ is required here than in GP (2014) (who require the existence of 12 moments rather than 32). As explained above, our bootstrap test statistic $\hat{\xi}^*(k^c)$ involves products and cross products of bootstrap estimated factors from each sub-panel. The derivation of the bootstrap asymptotic distribution of $\hat{\xi}^*(k^c)$ relies on Lemma B.2, which obtains an asymptotic expansion of $T^{-1}\sum_{t=1}^T(\hat{f}^*_{jt}-H^*_j\tilde{f}_{jt})(\hat{f}^*_{kt}-H^*_k\tilde{f}_{kt})'$ up to order $O_{p^*}(\delta_{NT}^{-4})$. This requires not only the verification of Conditions A* and B* from GP (2014) (who obtain an asymptotic expansion of $T^{-1}\sum_{t=1}^T(\hat{f}^*_{jt}-H^*_j\tilde{f}_{jt})(\hat{f}^*_{kt}-H^*_k\tilde{f}_{kt})'$ up to order $O_{p^*}(\delta_{NT}^{-2})$), but also of Condition C*, which is new to this article. The large number of moments is used in verifying this condition. In particular, we rely on repeated applications of the Cauchy–Schwarz inequality and bound sums such as $\frac{1}{N_jT}\sum_{i=1}^{N_j}\sum_{t=1}^T|\tilde{\varepsilon}_{j,it}|^p$ for $p\le16$, which requires the existence of $2p$ moments of $f_{jt}$ and $\varepsilon_{j,it}$ (see Lemma C.1).

Assumption WB2 rules out cross-sectional and serial correlation in the idiosyncratic errors of each sub-panel, as well as correlation between $\varepsilon_{jt}$ and $\varepsilon_{kt}$ for $j\ne k$. These assumptions are similar to the assumptions used by AGGR (2019) to justify their feasible test statistic (see their Theorem 2). For simplicity, we assume the external random variable $\eta_{j,it}$ to be Gaussian, but the result generalizes to any i.i.d. draw that has mean zero, variance one, a finite eighth moment, and a symmetric distribution.

 
Theorem 4.1.

Assume that Assumptions 1–6, strengthened by Assumptions WB1, WB2, and WB3, hold. Then, if Algorithm 1 is used to generate $\varepsilon^*_{jt}$ for $j=1,2$, the conclusions of Theorem 3.1 and Proposition 3.1 apply.

Theorem 4.1 provides a theoretical justification for using the wild bootstrap $p$-value $p^*$ to test the null hypothesis of $k^c$ common factors. Although Assumption WB2 rules out dependence in $\varepsilon_{jt}$ in both dimensions, as in Theorem 2 of AGGR (2019), this bootstrap test requires neither an explicit bias correction nor a variance estimator. We show in Section 4 that the feasible test statistic of AGGR (2019) can be oversized even under these restrictive assumptions. The wild bootstrap essentially eliminates these level distortions.

3.2 An Extension: AR-CSD Bootstrap Method

Here, we discuss an extension of the wild bootstrap that allows for cross-sectional and serial dependence in the idiosyncratic error terms of each sub-panel. In particular, we assume that for each $j=1,2$ and $i=1,\dots,N_j$, the idiosyncratic errors $\varepsilon_{j,it}$ follow an AR($p$) model (autoregressive model of order $p$):
$$\varepsilon_{j,it}=a_{ji}(L)\varepsilon_{j,i,t-1}+v_{j,it},\tag{6}$$
where $a_{ji}(L)=a_{ji,1}+a_{ji,2}L+\cdots+a_{ji,p}L^{p-1}$. If we collect all observations $i$ for panel $j$, we can write this as $\varepsilon_{jt}=A_j(L)\varepsilon_{j,t-1}+v_{jt}$, where $A_j(L)=A_{j,1}+A_{j,2}L+\cdots+A_{j,p}L^{p-1}$ and the $A_{j,k}$ are $N_j\times N_j$ diagonal matrices with coefficients $a_{ji,k}$ along the main diagonal. Since $N_j$ is large, consistent estimation of $A_{j,k}$ is not feasible unless we impose some form of sparsity. Assuming that each series $\varepsilon_{j,it}$ is an autoregressive process with possibly heterogeneous coefficients is a restrictive form of sparsity which allows the use of OLS (see note 8). In addition, we assume that
$$v_{jt}\sim\text{i.i.d.}\,(0,\Sigma_{v,j}).$$

The fact that we allow for a possibly non-diagonal covariance matrix $\Sigma_{v,j}$ means that we allow for cross-sectional dependence in the innovations $v_{jt}$.

Our proposal is to create bootstrap observations $\varepsilon^*_{jt}$ using a residual-based bootstrap procedure that resamples the residuals of the AR model (6). Resampling the vector of AR($p$) residuals $\tilde{v}_{jt}$ while allowing for unrestricted cross-sectional dependence is complicated by the fact that the covariance matrix $\Sigma_{v,j}$ is high dimensional. In particular, i.i.d. resampling of $\tilde{v}_{j,it}$ is not valid, as shown by Gonçalves and Perron (2020) in the context of factor-augmented regression models. Our bootstrap algorithm (described in Algorithm 2) relies on the CSD bootstrap of Gonçalves and Perron (2020). In the following, we let $\tilde{\Sigma}_{v,j}$ denote any consistent estimator of $\Sigma_{v,j}$ under the spectral norm. Examples include the thresholding estimator of Bickel and Levina (2008a) and the banding estimator of Bickel and Levina (2008b).
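As one example, a minimal sketch of the banding estimator of Bickel and Levina (2008b) (their cross-validation rule for choosing the banding parameter is omitted):

```python
def banded_cov(V, band_k):
    """Banding estimator: keep sample covariances with |i - l| <= band_k
    and set the rest to zero. V is a T x N matrix of AR residuals."""
    S = np.cov(V, rowvar=False)               # N x N sample covariance
    idx = np.arange(S.shape[0])
    keep = np.abs(np.subtract.outer(idx, idx)) <= band_k
    return S * keep
```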

Algorithm 2

AR-CSD Bootstrap

1. For $t=1,\dots,T$ and $j=1,2$, let
$$y^*_{jt}=\hat{\Lambda}^c_j\hat{f}^c_t+\hat{\Lambda}^s_j\hat{f}^s_{jt}+\varepsilon^*_{jt},$$
where $\tilde{f}_{jt}=(\hat{f}^{c\prime}_t,\hat{f}^{s\prime}_{jt})'$ and $\varepsilon^*_{jt}=(\varepsilon^*_{j,1t},\dots,\varepsilon^*_{j,N_jt})'$ is such that
$$\varepsilon^*_{j,it}=\tilde{a}_{ji}(L)\varepsilon^*_{j,i,t-1}+v^*_{j,it},$$
with $\varepsilon^*_{j,i0}=0$ for $i=1,\dots,N_j$, and where $v^*_{j,it}$ is the $i$-th element of $v^*_{jt}$ obtained as
$$v^*_{jt}=\tilde{\Sigma}^{1/2}_{v,j}\eta^*_{jt},\quad\eta^*_{jt}\sim\text{i.i.d. }N(0,I_{N_j}).$$

2. Repeat steps 2 through 6 of Algorithm 1.
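A sketch of step 1 for one group, under the stated AR($p$) assumptions; here `Sigma_half` stands for a square root (e.g., a Cholesky factor) of the banded or thresholded estimate $\tilde{\Sigma}_{v,j}$:

```python
def ar_csd_errors(resid, p, Sigma_half, rng):
    """Fit a heterogeneous AR(p) to each residual series by OLS, draw
    cross-sectionally dependent innovations v*_t = Sigma_half @ eta_t with
    eta_t ~ N(0, I), and build eps* recursively with eps*_0 = 0."""
    T, N = resid.shape
    A = np.zeros((N, p))
    for i in range(N):                        # series-by-series OLS
        y = resid[p:, i]
        X = np.column_stack([resid[p - l:T - l, i] for l in range(1, p + 1)])
        A[i] = np.linalg.lstsq(X, y, rcond=None)[0]
    v_star = rng.standard_normal((T, N)) @ Sigma_half.T
    eps_star = np.zeros((T, N))
    for t in range(T):                        # recursion in eq. (6)
        for l in range(1, min(p, t) + 1):
            eps_star[t] += A[:, l - 1] * eps_star[t - l]
        eps_star[t] += v_star[t]
    return eps_star
```

Replacing the wild bootstrap draw in step 1 of Algorithm 1 with the output of `ar_csd_errors` added to the null-restricted fit gives the AR-CSD version of the test.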

The wild bootstrap algorithm (Algorithm 1) is a special case of Algorithm 2 obtained by setting $\tilde{a}_{ji}(L)=0$ for all $i$ and $\tilde{\Sigma}_{v,j}=\mathrm{diag}(\tilde{\varepsilon}^2_{j,it})$. Another special case is the CSD bootstrap of Gonçalves and Perron (2020), which sets $\tilde{a}_{ji}(L)=0$ and lets $\tilde{\Sigma}_{v,j}$ denote the thresholding estimator based on the sample covariances of $\tilde{\varepsilon}_{j,it}$. Finally, a generalization of Algorithm 2 is the sieve bootstrap proposed by Koh (2024) in the context of MIDAS factor models. Although it would be interesting to extend the sieve bootstrap to our testing problem, we focus on a class of finite order AR models here in order to simplify the analysis.

The proof of the asymptotic validity of Algorithm 2 follows from Theorem 3.1 and Proposition 3.1 by verifying Conditions A*–F*. Since $\varepsilon^*_{j,it}$ is both serially and cross-sectionally correlated, the verification of these bootstrap high-level conditions is much more involved than for the wild bootstrap and is beyond the scope of this article. However, we evaluate by simulation the performance of both Algorithms 1 and 2 in the next section.

4 Simulations

In this section, we compare the performance of the bootstrap methods discussed in the previous sections. Our DGP is a simple model with one factor for each group:
$$y_{jt}=\Lambda_jf_{jt}+\varepsilon_{jt},\tag{7}$$
where $y_{jt}$ and $\varepsilon_{jt}$ are $N_j\times1$ for $t=1,\dots,T$. As opposed to Andreou et al. (2019), we assume that both groups have the same frequency.

For level experiments, we let $f_{1t}=f_{2t}=f^c_t$. As in GP (2014), this common factor is generated independently over time from a standard normal distribution, $f^c_t\sim$ i.i.d. $N(0,1)$. For power experiments, each group has a specific factor: $f_{1t}=f^s_{1t}$ and $f_{2t}=f^s_{2t}$. These two group-specific factors are generated independently over time from a bivariate normal distribution with unit variances and correlation $\phi=0.99$. In all cases, the factor loadings are drawn independently from a standard normal distribution, $\Lambda_j\sim$ i.i.d. $N(0,1)$, $j=1,2$.

The idiosyncratic error terms in the model, $\varepsilon_t=(\varepsilon_{1t}',\varepsilon_{2t}')'$, are such that
$$\varepsilon_t=A_\varepsilon\varepsilon_{t-1}+v_t,$$
where $A_\varepsilon$ is the block-diagonal matrix
$$A_\varepsilon=\begin{pmatrix}a_{\varepsilon,1}I_{N_1}&0\\0&a_{\varepsilon,2}I_{N_2}\end{pmatrix}$$
and $a_{\varepsilon,j}$ is the AR(1) coefficient in group $j$ (we assume that all individual series in each group share the same autoregressive coefficient). The innovations in the idiosyncratic error terms, $v_t=(v_{1t}',v_{2t}')'$, are such that
$$v_t\sim\text{i.i.d. }N(0,\Sigma_v),$$
where $\Sigma_{v,1}$ is the first diagonal block and $\Sigma_{v,2}$ is the second diagonal block of
$$\Sigma_v=\begin{pmatrix}\Sigma_{v,1}&0\\0&\Sigma_{v,2}\end{pmatrix},\quad\text{with the }(i,l)\text{ element of }\Sigma_{v,j}\text{ given by }\beta^{|i-l|}.$$

The scalar $\beta$ induces cross-sectional dependence in each group among the idiosyncratic innovations. This is similar to the design in Bai and Ng (2006). Note that we assume that $\Sigma_v$ is a block diagonal matrix, so we do not consider dependence between the two groups. In Table 1 and Table 2, we report the parameter settings and sample sizes we consider, respectively.
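A sketch of this simulation DGP; the $\beta^{|i-l|}$ form of the within-group innovation covariance is our reading of the Bai and Ng (2006)-style design described above:

```python
def simulate_design(N1, N2, T, a1, a2, beta, null=True, phi=0.99, seed=0):
    """One draw from the DGP in Equation (7): a single factor per group,
    AR(1) idiosyncratic errors with coefficients a1 and a2, and within-group
    innovation covariance beta**|i-l| (beta = 0 gives i.i.d. innovations)."""
    rng = np.random.default_rng(seed)
    if null:                                  # level experiments: common factor
        f1 = f2 = rng.standard_normal(T)
    else:                                     # power: correlated specific factors
        z = rng.multivariate_normal([0.0, 0.0], [[1.0, phi], [phi, 1.0]], size=T)
        f1, f2 = z[:, 0], z[:, 1]
    def group_errors(N, a):
        S = beta ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
        v = rng.multivariate_normal(np.zeros(N), S, size=T)
        e = v.copy()
        for t in range(1, T):                 # AR(1) recursion
            e[t] = a * e[t - 1] + v[t]
        return e
    Y1 = np.outer(f1, rng.standard_normal(N1)) + group_errors(N1, a1)
    Y2 = np.outer(f2, rng.standard_normal(N2)) + group_errors(N2, a2)
    return Y1, Y2
```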

Table 1

DGPs

DGP                                                      $a_{\varepsilon,1}$   $a_{\varepsilon,2}$   $\beta$
Design 1 (no serial & no cross-sectional dependence)     0                     0                     0
Design 2 (only serial dependence)                        0.5                   0.3                   0
Design 3 (only cross-sectional dependence)               0                     0                     0.5
Design 4 (serial & cross-sectional dependence)           0.5                   0.3                   0.5
Table 2

Sample sizes in simulation experiment

N1     N2     T
50     50     50
50     50     100
50     50     200
100    100    50
100    100    100
100    100    200
200    200    50
200    200    100
200    200    200

In Design 1, we assume that there is no serial correlation and no cross-sectional dependence and that the idiosyncratic errors are homoskedastic. The idiosyncratic error terms in Design 2 are serially correlated in each group where the AR(1) coefficient in group 1 is larger than the one in group 2. In the third design, we consider cross-sectional dependence without serial correlation in the idiosyncratic error term. Finally, in the last design, the idiosyncratic innovation terms are both serially and cross-sectionally correlated.

We consider sample sizes $N_1=N_2=N$ between 50 and 200 and $T$ between 50 and 200. We simulate each design 5000 times, and the number of bootstrap replications is set to 399. We use the bootstrap algorithms proposed in Sections 2 and 3 with four different bootstrap methods: the wild bootstrap and the AR(1)-CSD bootstrap proposed earlier, as well as two variants, namely a parametric AR(1) bootstrap with no cross-sectional dependence and a CSD bootstrap with no serial dependence. The CSD and AR(1)-CSD bootstraps involve an estimator of the covariance matrix of the idiosyncratic errors. We rely on the banding estimator of Bickel and Levina (2008b) with the banding parameter $k$ chosen by their cross-validation procedure. We focus our results on $\alpha=0.05$ and report rejection rates for each design, bootstrap method, and sample size.

The simulation results for the level experiments are shown in Table 3. The row labeled "AGGR" reports results based on the asymptotic standard normal critical values. The other four rows contain the results for the bootstrap methods: WB for the wild bootstrap, AR(1) for the parametric AR(1) bootstrap, CSD for the cross-sectional dependence bootstrap, and AR(1)-CSD for the bootstrap that combines the autoregressive and CSD bootstraps.

Table 3

Rejection rate of 5% test—level

                              N = 50               N = 100              N = 200
                     T =  50    100   200      50    100   200      50    100   200
Design 1    AGGR          6.5   4.9   3.3      7.4   6.2   5.0      8.3   7.2   6.2
(i.i.d.)    WB            5.3   4.9   4.3      5.8   5.7   5.1      6.7   6.4   5.6
            AR(1)         5.1   4.9   4.1      5.9   5.6   5.1      6.7   6.1   5.8
            CSD           6.1   5.9   5.6      6.0   5.7   5.6      6.7   6.2   6.0
            AR(1)-CSD     7.3   6.7   5.8      7.5   6.8   6.3      8.4   7.3   6.7
Design 2    AGGR         14.3  10.0   7.7     15.2  12.4   9.8     17.7  13.8  10.8
(AR)        WB            9.8   8.7   8.0     10.4   9.8   8.9     12.5  10.7   9.3
            AR(1)         4.9   4.7   4.2      5.9   5.7   4.8      6.9   5.9   5.4
            CSD          11.9  12.5  14.7     11.5  11.5  11.4     13.0  11.0  10.1
            AR(1)-CSD     6.4   6.2   6.1      7.2   7.0   6.3      7.8   7.1   6.2
Design 3    AGGR         13.1  13.5  16.0     12.0  11.3  12.6     11.2  10.1   9.9
(CSD)       WB           11.2  13.2  17.3      9.6  10.8  12.6      9.6   9.0   9.5
            AR(1)        11.6  13.1  17.2     10.1  10.4  12.5      9.5   9.1   9.8
            CSD           3.5   4.4   4.0      5.0   4.6   3.9      5.9   5.1   5.2
            AR(1)-CSD     6.5   6.1   4.7      7.7   6.1   5.2      8.7   6.8   6.1
Design 4    AGGR         21.6  18.9  20.5     20.4  17.5  16.9     20.8  16.8  14.2
(AR + CSD)  WB           15.7  16.6  20.1     15.1  14.7  15.8     15.2  13.8  12.4
            AR(1)         9.6  11.6  15.0      8.6   9.4  10.6      8.7   8.2   8.2
            CSD           7.5   8.0   8.0      8.9   8.2   7.7     10.5   9.4   8.4
            AR(1)-CSD     5.4   5.3   5.1      6.5   5.8   4.9      7.1   6.1   5.7

Under the restrictive Design 1, where the assumptions of Theorem 2 of Andreou et al. (2019) are satisfied, the asymptotic theory performs reasonably well, although some distortions appear for the smaller values of T. For the other three designs, we find severe over-rejections for all sample sizes, as expected given that the statistic is computed assuming away autocorrelation and cross-sectional dependence.

Across all sample sizes and designs, the bootstrap methods provide more reliable inference than standard normal inference. The bootstrap method that performs best is typically the one tailored to the properties of the DGP. For example, in Design 1, both the wild bootstrap and the AR(1) bootstrap perform similarly, and they reject the null hypothesis at a rate close to 5%. To illustrate, for $N_1=N_2=100$ and $T=50$, the test rejects in 7.4% of the replications using the standard normal critical values. The rejection rates for the wild bootstrap and AR(1) bootstrap are 5.8% and 5.9%, respectively. On the other hand, the CSD bootstrap and the combined AR(1)-CSD bootstrap reject in 6.0% and 7.5% of the replications. This higher rejection rate is the cost of using a more robust method than necessary.

As mentioned above, in Designs 2–4, the feasible statistic in Andreou et al. (2019) leads to large level distortions since it is not robust to serial correlation or cross-sectional dependence. Because there is serial dependence in the idiosyncratic error terms in Design 2, the wild bootstrap and CSD bootstrap are no longer valid while still improving on the use of the standard normal critical values. In this case, both the AR(1) and AR(1)-CSD bootstraps are valid and provide similar results with a slight preference for the simple AR(1) bootstrap. To illustrate, with the same N1=N2=100 and T =50 as above, the standard normal critical values lead to a rejection rate of 15.2% for a 5% test. The (invalid) wild and CSD bootstraps have rejection rates of 10.4% and 11.5%, respectively. On the other hand, the (valid) AR(1) and AR(1)-CSD bootstraps have rejection rates of 5.9% and 7.2%.

In Designs 3 and 4, where we introduce cross-sectional dependence, neither the wild bootstrap nor the AR(1) bootstrap are valid and they are not performing well, as expected. In the most general design with both serial and cross-sectional dependence, only the AR(1)-CSD bootstrap provides reliable results. While the asymptotic theory in the N1=N2=100 and T=50 case shows a rejection rate of 20.4%, the AR(1)-CSD bootstrap has a rejection rate of 6.5% compared with 8.9% for the CSD bootstrap, 8.6% for the AR(1) bootstrap, and 15.0% for the simple wild bootstrap.

Our power results are presented in Table 4. These results must be interpreted with caution given the large level distortions documented in some cases. For the simple i.i.d. case (Design 1) where all tests have reasonable rejection rates under the null, we see that the bootstrap entails a small reduction in power relative to the AGGR test. The largest loss occurs for N1=N2=T=50 where the AGGR test has a power of 65.2%, while the wild bootstrap rejects in 61.5% of the cases. The gap between the two methods disappears as sample size increases in both dimensions.

Table 4

Rejection rate of 5% test—power

                              N = 50                 N = 100                N = 200
                     T =  50     100    200      50     100    200      50     100    200
Design 1    AGGR         65.2   83.5   96.4     96.4   99.7  100.0    100.0  100.0  100.0
(i.i.d.)    WB           61.5   83.0   97.2     95.5   99.7  100.0    100.0  100.0  100.0
            AR(1)        60.8   83.4   97.4     95.2   99.6  100.0     99.9  100.0  100.0
            CSD          58.9   79.6   92.9     94.9   99.6  100.0    100.0  100.0  100.0
            AR(1)-CSD    62.0   81.0   93.7     95.6   99.7  100.0    100.0  100.0  100.0
Design 2    AGGR         70.1   84.9   96.0     96.7   99.8  100.0    100.0  100.0  100.0
(AR)        WB           61.3   82.3   96.0     95.1   99.6  100.0     99.9  100.0  100.0
            AR(1)        48.9   74.2   93.3     90.1   99.3  100.0     99.8  100.0  100.0
            CSD          61.5   79.9   92.7     94.3   99.5  100.0     99.9  100.0  100.0
            AR(1)-CSD    50.0   71.6   87.5     90.3   99.2  100.0     99.9  100.0  100.0
Design 3    AGGR         68.4   84.3   94.4     96.3   99.6  100.0    100.0  100.0  100.0
(CSD)       WB           64.7   83.9   95.0     95.2   99.5  100.0    100.0  100.0  100.0
            AR(1)        63.9   83.4   95.1     95.1   99.5  100.0    100.0  100.0  100.0
            CSD          46.1   66.3   83.1     91.7   99.0   99.9     99.9  100.0  100.0
            AR(1)-CSD    51.3   69.1   84.6     92.9   99.1  100.0    100.0  100.0  100.0
Design 4    AGGR         73.3   85.0   94.3     96.9   99.7  100.0    100.0  100.0  100.0
(AR + CSD)  WB           65.4   82.6   94.4     94.9   99.5  100.0    100.0  100.0  100.0
            AR(1)        53.5   75.1   91.7     90.5   99.1  100.0     99.9  100.0  100.0
            CSD          48.8   66.5   83.8     91.3   99.0  100.0     99.9  100.0  100.0
            AR(1)-CSD    40.0   57.2   75.4     85.8   98.0   99.0     99.8  100.0  100.0

It is interesting to note that power increases faster in the cross-sectional than in the time series dimension. Going from N =50 to N =100 for given T has more impact on power than going from T =50 to T =100 for given N. This is consistent with the different rates of convergence of the statistic in the two dimensions.

Finally, we see that more complex idiosyncratic dependencies lead to a reduction in power for bootstrap methods that control level. Nevertheless, power approaches one rather quickly.

Overall, our results suggest that, except in the simple case with no serial or cross-sectional dependence and large sample sizes, the use of standard normal critical values leads to large level distortions. On the other hand, a bootstrap method that adapts to the properties of the idiosyncratic terms delivers rejection rates close to the nominal level, while a misspecified bootstrap still improves matters noticeably. The use of more robust bootstrap methods has a small cost in terms of power.

5 Conclusions

In this article, we have proposed the bootstrap as a method of inference on the number of common factors in two groups of data. We propose, and theoretically justify under weak conditions, a simple bootstrap test that avoids the need to estimate the bias and variance of the canonical correlations explicitly. We have verified these conditions in the case of the wild bootstrap under conditions similar to those in AGGR (2019). However, other approaches tailored to more general DGPs are possible. Our simulation experiment shows that the bootstrap leads to rejection rates closer to the nominal level in all of the designs we considered compared to the asymptotic framework of AGGR (2019).

Funding

Funding support for this article was provided by Gonçalves and Perron: the Social Sciences and Humanities Research Council (SSHRC, grants 435-2023-0352 and 435-2020-1349, respectively) and the Fonds de recherche du Québec—société et culture (FRQSC, grant 2020-SE2-269954).

This article is based on the Halbert White Memorial Lecture given by Sílvia Gonçalves at the Society for Financial Econometrics Conference on June 15th, 2024, in Rio de Janeiro, Brazil. We thank the JFEC Editors Allan Timmermann and Fabio Trojani for the invitation, and the discussants at the conference, Michael Wolf and Eric Ghysels, for their insightful comments. We also thank participants at the following conferences: Society for Financial Econometrics (2023), North American Summer Meetings of the Econometric Society (2023), Advances in Econometrics conference in honor of Joon Park (2023), International Association for Applied Econometrics (2023), Canadian Economics Association (2023), Société canadienne de science économique (2024), CIREQ Econometrics conference in honor of Eric Ghysels (2024), and International Symposium on Non-Parametric Statistics (2024).

Notes

1

For simplicity, we focus on $\hat{R}$ here. Our results also apply to a test statistic based on the alternative estimator defined in AGGR (2019), which is the sample analogue of $\acute{R}$ in our notation.

2

Although we denote the bootstrap $p$-value by $p^*$, we should note it is not random with respect to the bootstrap measure $P^*$. A similar notation is used below to denote the bootstrap bias $B^*$ and bootstrap variance $\Omega^*_U$ of the bootstrap test statistic $\hat{\xi}^*(k^c)$. This choice of notation allows us to differentiate bootstrap population quantities from other potential estimators that do not rely on the bootstrap.

3

Under our Assumption 1, the asymptotic expansions of the test statistic (and of its bootstrap analogue) used to derive the limiting distributions need to have remainders of order $O_p(\delta_{NT}^{-4})$, with $\delta_{NT}\equiv\min(\sqrt{N},\sqrt{T})$, whereas AGGR (2019) need to obtain expansions up to order $O_p(\delta_{NT}^{-6})$.

4

In contrast, AGGR (2019) rely on an asymptotic expansion up to order $O_p(\delta_{NT}^{-6})$ because they require $N/T^{5/2}\to0$ rather than $N/T^{3/2}\to0$ (see their Proposition 3).

5

AGGR (2019) provide conditions under which this high-level condition holds. See in particular their Assumptions A.5 and A.6, which are used to show that $Z_{N,t}$ is a near epoch dependent (NED) process. Since our contribution is proving bootstrap validity in this context, we do not provide these more primitive conditions. They are not required to prove our bootstrap theory.

6

For example, we could use the principal components estimators $\hat{f}_{jt}$ and $\hat{\Lambda}_j$ when generating $y^*_{jt}$. To distinguish these estimators from their restricted versions, we denote the latter by $\tilde{f}_{jt}$ and $\tilde{\Lambda}_j$.

7

Although $\tilde{V}^*_{jk}$ is defined as a function of $\tilde{f}_{kt}$ and does not depend on resampled data, we use this notation to indicate that it is the bootstrap analogue of $\tilde{V}_{jk}$.

8

We could allow for richer dynamics by assuming a sparse VAR model for the idiosyncratic error vector $\varepsilon_{jt}$, as in Kock and Callot (2015), Krampe and Paparoditis (2021), and Krampe and Margaritella (2021). Under sparsity, we would estimate $A_j(L)$ by a regularized OLS estimator such as the LASSO rather than OLS. The remaining steps of our bootstrap method would remain the same.

9

This means that it contains terms of order $O_p(\delta_{NT}^{-2})$ and a remainder of order $O_p(\delta_{NT}^{-4})$.

10

This means that it contains terms of order $O_p(\delta_{NT}^{-2})$ and a remainder of order $O_p(\delta_{NT}^{-4})$. Instead, AGGR (2019) need to obtain higher order expansions with remainders of order $O_p(\delta_{NT}^{-6})$ because they replace our assumption $NT^{-3/2}\to0$ with $NT^{-5/2}\to0$.

11

Note that $\tilde{V}^*_{jk}$ is the bootstrap analogue of $\tilde{V}_{jk}\equiv T^{-1}\sum_{t=1}^Tf_{jt}f_{kt}'$ defined in eq. (B.3) of AGGR (2019). Although we keep the star notation when defining $\tilde{V}^*_{jk}$, we note that $\tilde{V}^*_{jk}$ is not random when we condition on the original sample. We adopt this notation to be consistent with the notation in AGGR (2019).

12

We can replace $\hat{f}^c_t$ and $\hat{\lambda}^c_{j,i}$ with expansions based on alternative group common factors such as $\dot{f}^c_t=\dot{W}'\hat{f}_{2t}$, where $\dot{W}$ is a matrix collecting the eigenvectors of the alternative estimator of $R$ (see note 1) associated with its $k^c$ largest eigenvalues. This yields a similar expansion, $\dot{f}^c_t=\dot{H}^c\left(f^c_t+\frac{1}{\sqrt{N_2}}u^{(c)}_{2t}\right)+o_p(T^{-1/2})$.

Appendix

Appendix A: Asymptotic theory

This appendix is organized as follows. In  Appendix A.1, we provide a set of primitive assumptions under which we derive the asymptotic distribution of ξ^(kc).  Appendix A.2 contains auxiliary lemmas used to derive this limiting distribution.  Appendix A.3 provides a proof of the results in Section 1.4. When describing our assumptions below, it is convenient to collect the vectors ftc,f1ts and f2ts into a vector Gt=(ftc,f1ts,f2ts), whose dimension is kc+k1s+k2s.

A.1. Primitive assumptions
 
Assumption 1

We let N,T such that TN0, and NT3/20, where N=min(N1,N2)=N2 and μNN2/N1μ[0,1].

 
Assumption 2

  • E(Gt)=0 and EGt4M such that 1Tt=1TGtGtpΣG>0, where ΣG is a non-random positive definite matrix defined as
  • For each j = 1, 2, the factor loadings matrix Λj(λj,1,,λj,Nj) is deterministic such that ||λj,i||M and ΣΛ,jlimNjΛjΛj/Nj>0 has distinct eigenvalues.

 
Assumption 3

For each j=1,2,  

  • E(εj,it)=0,E(|εj,it|8)M for any i,t.

  • E(εj,itεj,ls)=σj,il,ts,|σj,il,ts|σ¯j,il for all (t, s) and |σj,il,ts|τj,ts for all (i, l) such that 1Nji,l=1Njσ¯j,ilM,T1t,s=1Tτj,tsM, and 1NjTt,s,i,l|σj,il,ts|M.

  • E|1Nji=1Nj(εj,isεj,itE(εj,isεj,it))|4M for every (t, s).

 
Assumption 4

For each j=1,2,  

  • E(1Nji=1Nj||1Tt=1TGtεj,it||2)M, where E(Gtεj,it)=0 for all (i, t).

  • For each s, E1TNjt=1Ti=1NjGt(εj,isεj,itE(εj,isεj,it))2M.

  • E1TNjt=1TGtεjtΛj2M.

  • E(1Tt=1T1NjΛjεjt2)M.

Assumptions 2–4 are standard in the factor literature. In particular, Assumptions 2(a) and 2(b) impose standard conditions on the factors and factor loadings, respectively. They are identical to Assumptions A.2 and A.3 of AGGR (2019). Assumption 3 imposes standard time and cross-section dependence and heteroskedasticity in the idiosyncratic errors of each panel and corresponds to Assumption 2 of GP (2014). Finally, Assumption 4 imposes conditions on moments and weak dependence among {Gt} and {εj,it}. This assumption corresponds to Assumptions 3(a)–3(d) in GP (2014). Note that given Assumption 2(b), which assumes the factor loadings to be deterministic, we can show that Assumption 4(d) is implied by Assumptions 2(b) and 3(b). To see this, note that we can write
given that Assumption 2(b) and Cauchy–Schwartz’s inequality imply that we can bound |λj,iλj,l|=|k=1kjλj,ikλj,lk|(k=1kjλj,ik2)1/2(k=1kjλj,lk2)1/2=λj,iλj,lM. Assumption 4(d) then follows from Assumption 3(b) which bounds 1Nji=1Njl=1Njσ¯j,il by M for all t. The reason why we keep Assumption 4(d) is that we will give its bootstrap analogue in  Appendix B.1. Note also that, as stated in GP (2014), we can show that Assumptions 4(a) and 4(c) are implied by Assumptions 2 and 3 if we assume that the factors and the idiosyncratic errors are mutually independent. Assumption 4(b) in turn holds if we assume in addition that T2Nj1s,q=1Ti=1Nj|Cov(εj,itεj,is,εj,itεj,iq)|M, which follows if εj,it is i.i.d. and E(εj,it4)M.

A key step in deriving the asymptotic distribution of the AGGR (2019) test statistic (and of its bootstrap analogue) under our Assumption 1 is to obtain an asymptotic expansion of the factors estimation uncertainty (as characterized by 1Tt=1T(f^jtHjfjt)(f^ktHkfkt) for j,k{1,2} up to order9  Op(δNT4)). See Lemma A.5 in  Appendix A.2. As it turns out, Assumptions 1–4 are not sufficient to ensure this fast rate of convergence. For this reason, we strengthen Assumptions 1–4 as follows:

 
Assumption 5

  • For each t and j =1, 2, s=1T|γj,st|M, where γj,stE(1Nji=1Njεj,isεj,it) and l=1Njσ¯j,ilM.

  • For any j, k, 1Ts=1Tfjst=1Tγj,stεktΛkNk=Op(1).

  • For any j,k,1Ts=1Tt=1Tγj,stεktΛkNk2=Op(1).

  • For any j, k, 1Ts=1Tfjs1Tt=1T(1Ni=1Nλk,iεk,it(εj,isεj,itE(εj,isεj,it)))=Op(1), where N=min(N1,N2).

  • For any j, k, 1Ts=1Tfjs1Tt=1T1NjNki1=1Nji2i1Nkλk,i2εk,i2t(εj,i1sεj,i1tE(εj,i1sεj,i1t))=Op(1).

  • For any j, k, 1Ts=1T1Tt=1T(1Ni=1Nλk,iεk,it(εj,isεj,itE(εj,isεj,it)))2=Op(1), where N=min(N1,N2).

  • For any j, k, 1Ts=1T1Tt=1T(1NjNki1=1Nji2i1Nkλk,i2εk,i2t(εj,i1sεj,i1tE(εj,i1sεj,i1t)))2=Op(1).

Assumption 5(a) is a strengthening of Assumption 3(b) and corresponds to Assumption E.1 of Bai (2003). A similar assumption has been used by AGGR (2019). See in particular their Assumption A.7(c) on βj,t. As explained by Bai (2003), this assumption is satisfied when we rule out serial dependence, implying that γj,st=0 for st. In this case, Assumption 5(a) is equivalent to requiring that 1Nji=1NjE(εj,it2)M. More generally, this condition holds whenever for each panel j and each series i, the autocovariance function of {εj,it} is absolutely summable (thus covering all finite order stationary ARMA models).

To interpret Assumptions 5(b) and 5(c), let vktΛkεktNk and mjk,st=1Tγj,stvkt. With this notation, we can rewrite part (b) as 1Ts=1Tfjsmjk,s=Op(1) and part (c) as 1Ts=1Tmjk,s2=Op(1). The latter condition holds if Emjk,s2M for all j, k, s, which follows if part (a) holds and if we assume that Evkt2M for all k, t. To see this, note that Emjk,s2=E[(t=1Tγj,stvkt)(l=1Tγj,slvkl)]=t=1Tl=1Tγj,stγj,slE(vktvkl), which is bounded by t=1Tl=1T|γj,st||γj,sl|(Evkt2)1/2(Evkl2)1/2 by Cauchy–Schwarz’s inequality. If Evkt2M for all k, t, we can use Assumption 5(a) to verify Assumption 5(c). The assumption that Evkt2M for all k, t is a strengthening of Assumption 4(d) and both are equivalent if we assume stationarity of {εkt}. Hence, Assumption 5(c) holds under general serial and cross-sectional dependence in the idiosyncratic error terms.

A sufficient condition for Assumption 5(b) is that E1Ts=1Tfjsmjk,s2M. We can show that this condition is implied by Assumptions 3(b) and 5(a) if we assume that {fjs} and {εkt} are mutually independent. We can verify Assumptions 5(d) and 5(e) by showing that 1Ts=1Tl=1TE(Ajk,lAjk,s)M and 1Ts=1Tl=1TE(Bjk,lBjk,s)M, where Ajk,s1Tt=1T(1Ni=1Nλk,iεk,it(εj,isεj,itE(εj,isεj,it))) and Bjk,s1Tt=1T1NjNki1=1Nji2i1Nkλk,i2εk,i2t(εj,i1sεj,i1tE(εj,i1sεj,i1t)), which holds for instance if εj,it is i.i.d. with E(εj,it3)=0 and E(εj,it6)M for j =1, 2. Similarly, we can show that Assumptions 5(d) and 5(e) are verified under similar conditions on εj,it.

Our next assumption is a high-level condition that allows us to obtain the asymptotic normal distribution for the AGGR test statistic.

 
Assumption 6

  • Σ˜cc1Tt=1Tftcftc is such that Σ˜ccIkc=Op(T1/2).

  • 12Tt=1T(UtUtE(UtUt))dN(0,ΩU), where UtμNu1t(c)u2t(c) and ujt(c) is a kc×1 vector containing the first kc rows of ujt(ΛjΛjNj)1ΛjεjtNj.

Assumption 6(a) strengthens Assumption 2(a) by requiring that 1Tt=1Tftcftc converges to Ikc at rate Op(T1/2). This assumption is implied by standard mixing conditions on ftc by a maximal inequality for mixing processes and has been used in this literature. See, for example, Gonçalves, McCracken, and Perron (2017). AGGR (2019) assume factors to be mixing, explaining why they do not explicitly write this assumption. It is used to omit Σ˜cc from the term 12Tt=1T(UtUtE(UtUt)) that appears in the asymptotic expansion of the test statistic. Assumption 6(b) is a high-level condition that requires the time series process ZN,t(UtUtE(UtUt)) to satisfy a CLT. AGGR (2019) provide conditions under which this high-level condition holds. See in particular their Assumptions A.5 and A.6, which are used to show that ZN,t is a NED process. Since our contribution is proving the bootstrap validity in this context, we do not provide these more primitive conditions. They are not required to prove our bootstrap theory.

Note that our assumptions (in particular, Assumptions 2(b) and 4(d)) imply that
is O(1). This term enters the bias Btr(Σ˜cc1Σ˜U) that appears in the asymptotic distribution of the test statistic.
A.2. Asymptotic expansion of the sample covariance of the factors estimation error

The main goal of this section is to provide an asymptotic expansion of 1Tt=1T(f^jtHjfjt)(f^ktHkfkt) for j,k{1,2} up to order Op(δNT4), which is then used to characterize the bias term. See Lemma A.5 in  Appendix A.3.

To derive this result, we use the following identity for each group j, which follows from Bai (2003):
(8)
Each of the terms Aj,1t through Aj,4t is defined as follows:

The following auxiliary lemma is used to prove Lemma A.2.

 
Lemma A.1.
Suppose Assumptions 1–4 strengthened by Assumption 5 hold. Then, for any  j,k{1,2}: (a)  1Tt=1TAj,1tAk,1t=Op(δNT4);  (b)  1Tt=1TAj,2tAk,2t=Op(δNT4);  (c)  1Tt=1TAj,4tAk,4t=Op(δNT4);  (d)  1Tt=1TAj,mtAk,nt=Op(δNT4)  for  mn, where  m,n{1,2,3,4}; and (e)  
 
Lemma A.2.
Suppose Assumptions 1–4 strengthened by Assumption 5 hold. Then, for  j,k{1,2},
where ujt is as defined in Lemma A.1.
 
Proof of Lemma A.1.
Part (a): We can bound the norm of 1Tt=1TAj,1tAk,1t by
where the first equality follows by the fact that for any vectors A and B we have that AB2=tr(ABBA)=tr(AABB)=A2B2 given the definitions of the Frobenius norm of a matrix and of the Euclidean norm of a vector. The inequality then follows by Cauchy–Schwartz’s inequality. Next, we show that 1Tt=1TAj,1t2=Op(δNT4) for any j, which implies the result. To show this, write Aj,1t=Aj,1t(1)+Aj,1t(2), where
Since Aj,1t(1)+Aj,1t(2)22(Aj,1t(1)2+Aj,1t(2)2), we have that 1Tt=1TAj,1t221Tt=1TAj,1t(1)2+21Tt=1TAj,1t(2)22I1+2II1. We analyze each term separately. First, by an application of the triangle inequality and Cauchy–Schwartz’s inequality,
since T1s=1T||f^jsHjfjs||2=Op(δNT2) and s=1T|γj,st|2=O(1) given Assumptions 1–5. Similarly, we can show that II1=Op(T2)=Op(δNT4) by using Markov’s inequality and noting that
where

Note that to obtain this last bound, we impose Assumption 5(c), which is a strengthening of Assumption 3.

Part (b): we proceed as in part (a) and show that T1t=1T||Aj,2t||2=Op(δNT4) for any j{1,2}. Adding and subtracting appropriately, T1t=1T||Aj,2t||22T1t=1T||Aj,2t(1)||2+2T1t=1T||Aj,2t(2)||22I2+2II2, where Aj,2t(1)T1s=1T(f^jsHjfjs)ζj,st and Aj,2t(2)T1s=1THjfjsζj,st, with ζj,stNj1i=1Nj(εj,isεj,itE(εj,isεj,it)). First, note that
since 1T2t=1Ts=1T|ζj,st|2=Op(Nj1) by Assumption 3(c). Second, by Assumption 4(b),
Part (c): Following the same arguments as above, the result follows by showing that T1t=1T||Aj,4t||2=Op(δNT4) for any j{1,2}. Adding and subtracting appropriately, we can write Aj,4t=Aj,4t(1)+Aj,4t(2), where Aj,4t(1)1Ts=1T(f^jsHjfjs)ξj,st and Aj,4t(2)1Ts=1THjfjsξj,st, with ξj,stfjtΛjεjsNj. We show that I4T1t=1TAj,4t(1)2 and II4T1t=1TAj,4t(2)2 are both Op(δNT4) under our assumptions. For the first term, using the definition of ξj,st, we have that
since Efjt2Δ, and by Cauchy–Schwartz’s inequality,
given that 1Ts=1T||ΛjεjsNj||2=Op(1) under Assumption 4(d). For II4, using the definition of ξj,stεjsΛjNjfjt, we have that
Part (d): Given parts (a), (b), and (c), all the cross terms that involve Aj,1t,Aj,2t and Aj,4t are Op(δNT4) by an application of Cauchy–Schwartz’s inequality. Hence, we only need to show that T1t=1TAj,mtAk,3t is Op(δNT4) for m =1, 2, 4. Using the definition of Ak,3t, we have that
implying that it suffices to show that T1t=1TAj,mtεktΛkNk=Op(δNT4). To show this, an application of Cauchy–Schwartz’s inequality is not enough because T1t=1TεktΛkNk2=Op(Nk1)=Op(δNT2). Hence, using the fact that T1t=1TAj,mt2=Op(δNT4) for m3 implies by Cauchy–Schwartz’s inequality that the term in square bracket is Op(δNT3), which is larger than Op(δNT4). We need a more refined analysis, which in turn requires a strengthening of Assumptions 1–4 as given by Assumption 5. Starting with m=1, by the definition of Aj,1t, we have that
where
and
Note that we can rewrite (b1) as
if we assume that the term in the square bracket is Op(1). We impose this as a new assumption, cf Assumption 5(b). In addition,
where
provided we assume that the term in square bracket is Op(1). We impose this as a new assumption, cf Assumption 5(c). Consider next m=2. Using the decomposition of Aj,2t=Aj,2t(1)+Aj,2t(2), we can write
where
and
Note that
By Cauchy–Schwartz’s inequality, we can bound (a2) by
where we show that (a2ii)=Op(δNT6) by noting that
Finally, consider m =4. Using the decomposition of Aj,4t=Aj,4t(1)+Aj,4t(2), we can write
where
Note that
In addition,
where

implying that (a4)=Op(δNT1)Op(δNT3)=Op(δNT4).

Part (e): By definition, Aj,3t1Ts=1Tf^jsηj,st, where ηj,stfjsΛjεjtNj. Using the definition of the rotation matrix, HjVj1F^jFjTΛjΛjNj, we can rewrite this term as
by Assumptions 2(b) and 4(d).▪
 
Proof of Lemma A.2.
Using Bai (2003)’s identity to express f^jtHjfjt=Vj1(Aj,1t+Aj,2t+Aj,3t+Aj,4t), we can write
Given Lemma A.1, the dominant term is the third term. All other terms are Op(δNT4) under our assumptions, given Lemma A.1. This implies that
completing the proof.▪
A.3 Proof of Theorem 2.1

Following AGGR (2019), we define R^=V^111V^12V^221V^21, where V^jk=1Tt=1Tf^jtf^kt. The test statistic is given by ξ^(kc)l=1kcρ^l=tr(Λ^1/2), where Λ^=diag(ρ^l2:l=1,,kc) is a kc×kc diagonal matrix containing the kc largest eigenvalues of R^ obtained from the eigenvalue-eigenvector problem R^W^=W^Λ^, where W^ is the k1×kc eigenvector matrix. The main idea of the proof is to obtain an expansion of R^ through order10  Op(δNT2), where δNT=min(N,T), from which we obtain an asymptotic expansion of Λ^ and of tr(Λ^1/2).

The asymptotic expansion of R^ is based on expanding V^jk around V˜jk1Tt=1Tfjtfkt and using the fact that under the null hypothesis fjt and fkt share a set of common factors ftc (ie fjt=(ftc,fjts) for j =1, 2). Adding and subtracting appropriately yield
with V¨jkHjV˜jkHk,V˜jk1Tt=1Tfjtfkt, and X¨jk=HjX^jkHk, where letting ψjtHj1(f^jtHjfjt),

We can show that X^jk=Op(δNT2) under Assumptions 1–4 (see Lemma A.3(a)). Using this result, we can show that R^=R¨+Op(δNT2), where R¨=V¨111V¨12V¨221V¨21=(H1)1R˜H1, where R˜V˜111V˜12V˜221V˜21. The following auxiliary lemma states this result and characterizes the term of order Op(δNT2) under Assumptions 1–4. Note that for this result we do not need Assumptions 5 and 6. Nor do we need to impose the null hypothesis of kc common factors between the two panels.

 
Lemma A.3.
Let Assumptions 1–4 hold. Then, (a)  X¨jk=Op(δNT2)  and  X^jk=Op(δNT2); and (b)  R^=(H1)1[R˜+V˜111Ψ^]H1+Op(δNT4),  where  
 
Remark 1.

Lemma A.3(a) is the analogue of Lemma B.1 of AGGR (2019). Contrary to AGGR (2019), we rely on  Bai (2003)’s asymptotic expansion for  f^jtHjfjt, which explains why our set of assumptions is different from those of AGGR (2019). Lemma A.3(b) is the analogue of Lemma B.2 of AGGR (2019) under our Assumptions 1–4. Note that the order of magnitude of the remainder term follows from expressing  R^  as a function of the inverse matrices of  V^jj=V¨jj(Ikj+V¨jj1X¨jj)  and then using the expansion  (IX)1=I+X+O(X2)  to obtain  (Ikj+V¨jj1X¨jj)1=IkjV¨jj1X¨jj+Op(δNT4)  given that  X¨jk=Op(δNT2). Instead AGGR (2019) use a second-order expansion  (IX)1=I+X+X2+O(X3)  to obtain their equation (B.5). They require a higher order asymptotic expansion than ours because their rate conditions on N and T are weaker than those we assume under Assumption 1.

The next step is to obtain an asymptotic expansion of the kc largest eigenvalues of R^ when the two panels share kc common factors, that is, when fjt=[ftc,fjts] for j =1, 2 (hence, when the null hypothesis of kc common factors is true). We summarize these results in the following lemma.

 
Lemma A.4.
Suppose that Assumptions 1–4 hold and assume that  fjt=[ftc,fjts]  for j = 1, 2. Letting  Ψ^cc  denote the first  kc×kc  block obtained from  Ψ^  defined in Lemma A.3, it follows that  
 
Remark 2.

Lemma A.4 gives the asymptotic expansion of  ξ^(kc)=l=1kcρ^l  through order  Op(δNT2)  under the null hypothesis that there are kc factors that are common between the two groups. This result is a simplified version of equation (B.13) of AGGR since it only contains terms of order  Op(δNT2)  (their expansion contains terms of order  Op(δNT4)).

Next, we can use Lemma A.2 to expand 1Tt=1T(f^jtHjfjt)(f^ktHkfkt) up to a remainder of order Op(δNT4). We can then obtain the following result using the definition of Ψ^cc given above.

 
Lemma A.5.
Suppose Assumptions 1–4 strengthened by Assumption 5 hold. Then, letting  ujt(c)  denote the  kc×1  vector containing the first kc rows of  ujt(ΛjΛjNj)ΛjεjtNj  and defining  UtμNu1t(c)u2t(c), we have that under the null hypothesis of kc common factors,

The asymptotic distribution of the test statistic given Theorem 2.1 follows from the previous lemmas by adding Assumption 6 (in addition to Assumptions 1–5).

 
Proof of Lemma A.3.

Part (a): This follows from Lemma A.2 of GP (2014) and the fact that the rotation matrices are Op(1). Assumptions 1–4 are sufficient to apply this result.

Part (b): We follow AGGR (2019) but only consider a first-order asymptotic expansion of R^. In particular, we write
where we used V^jj=V¨jj(Ikj+V¨jj1X¨jj). We then use the expansion (IX)1=I+X+O(X2) to obtain (Ikj+V¨jj1X¨jj)1=IkjV¨jj1X¨jj+Op(δNT4). Contrary to AGGR (2019), we only keep terms up to order Op(δNT4). Thus, the asymptotic expansion of R^ in part (b) only considers terms that are linear in X¨jk. Terms involving products or squares of X¨jk are of order Op(δNT4)=Op(1/min(N2,T2)), which is either Op(N/T3/2) if δNT=N or Op(T/N) if δNT=T. Since we assume that T/N0 and N/T3/20, the remainder is Op(δNT4)=op(1). ▪
 
Proof of Lemma A.4.
We follow closely the derivations of AGGR (2019) leading to their equation (B.13) in Section B.1.4. Specifically, consider the eigenvector-eigenvalue problem associated with R^,R^W^=W^Λ^, where we let W^ denote the k1×kc matrix containing the kc eigenvectors of R^ associated to its largest kc eigenvalues ρ^12,,ρ^kc2, which we collect into the diagonal matrix Λ^=diag(ρ^l2:l=1,,kc). We can replace R^ from its asymptotic expansion in Lemma A.3(b):
Pre-multiplying this equation by H1 gives
Since Ψ^=Op(δNT2),R^ converges to R˜, implying that they share the same eigenvectors and eigenvalues asymptotically. The next step is to use the fact that under the null when fjt=[ftc,fjts] for j =1, 2, R˜ can be expressed as a block triangular matrix of the form
where R˜cs=Σ˜cc1Σ˜c1(Ik1kcR˜ss), with Σ˜cc=T1t=1Tftcftc,Σ˜c1=T1t=1Tftcf1ts and R˜ss is as defined in Lemma B.3 of AGGR (2019). This result is an algebraic result that only relies on the assumption that fjt=[ftc,fjts] for j=1,2. Hence, it holds under our Assumptions 1–4. The fact that R˜  has this special form is key for deriving the asymptotic distribution of the test statistic under the null hypothesis. In particular, because R˜ is block triangular, its eigenvalues are equal to the eigenvalues of Ikc and R˜ss, and we can show that the largest kc eigenvalues are all equal to 1. Similarly, the first kc eigenvectors of R˜ can be shown to be of the form (xc,0), where xc is a kc×1 vector of constants and 0 is a (k1kc)×1 vector of zeros. Hence, letting
we can follow AGGR (2019) and decompose the eigenvector and eigenvalue matrices of R^ as
where U^ is a kc×kc nonsingular matrix, and M^ and α^ are also stochastic matrices. Because Ec and Es span Rk1, the decomposition of W˜1 is true by definition. The same applies to the decompositon of Λ^. However, under the null hypothesis, and because W˜1 and Λ^ are also the eigenvector and eigenvalue matrices of R˜,α^ and M^ converge to zero at rate Op(δNT2). In particular, replacing W˜1 and Λ^ into the eigenvector–eigenvalue equation for R^ and letting Φ^V111Ψ^ gives:
Using the fact that R˜Ec=Ec under the null hypothesis and the fact that Φ^Esα^ and Esα^M^ are of order Op(δNT4) implies that
(9)
Pre-multiplying Equation (9) by Ec gives
from which we obtain
Pre-multiplying Equation (9) by Es gives R˜ssα^+Φ^scU^=α^+Op(δNT4), from which we obtain
(10)
Plugging α^ into the expansion for M^ gives
(11)
The expansions (10) and (11) correspond to equations (C.61) and (C.62) in AGGR (2019)’s Online Appendix (proof of their Lemma B.4). Given the definition of R˜cs, we can write Σ˜cc1Σ˜c,1=R˜cs(Ik1kcR˜ss)1, from which it follows that Φ^cc+R˜cs(Ik1kcR˜ss)1Φ^sc=Σ˜cc1[Σ˜ccΦ^cc+Σ˜c,1Φ^sc]. Letting f1t=(ftc,f1ts), we can write
Partitioning Φ^ accordingly, that is, letting Φ^=(Φ^ccΦ^csΦ^scΦ^ss), implies that Σ˜ccΦ^cc+Σ˜c,1Φ^sc=(V˜11Φ^)(cc), where we use the notation (A)(cc) to denote the upper-left kc×kc block of any matrix A. Since Φ^=V111Ψ^, we obtain that (V˜11Φ^)(cc)=(Ψ^)(cc)Ψ^cc, the upper-left kc×kc block of Ψ^ as defined in Lemma A.3(b). Hence,
This implies that
from which it follows that
by using the expansion (I+X)1/2=I+12X+Op(X2) with X=M^. Taking the trace of Λ^1/2 yields the asymptotic expansion of ξ^(kc)=l=1kcρ^l. ▪
 
Proof of Lemma A.5.
This result follows by replacing Ψ^cc with the expression from Lemma A.3(b). In particular, recall that Ψ^ is defined as
where B˜V˜221V˜21, and X^jk is as defined in Lemma A.3(b). Under the null hypothesis, both R˜ and B˜ have the same structure [Ec  ], which implies that the upper-left kc×kc block Ψ^cc is equal to
as argued by AGGR (2019) (see their equation (C.69) in the Online Appendix). As explained by AGGR (2019), we can rewrite the expression of Ψ^cc as
where ψjt(c)ψkt(c) denotes the upper-left kc×kc block of the matrix ψjtψkt, where ψjtHj1(f^jtHjfjt). For any j,k{1,2}, we can write

The result follows by replacing 1Tt=1T(f^jtHjfjt)(f^ktHkfkt) with the asymptotic expansion given in Lemma A.2. ▪

 
Proof of Theorem 2.1.

The proof of this result follows from Lemmas A.3, A.4, and A.5 under Assumptions 1–6, when the null hypothesis is true. ▪

Appendix B: Bootstrap results

We organize this appendix as follows. In  Appendix B.1, we provide a set of bootstrap high-level conditions which are the bootstrap analogues of Assumptions 3, 4, and 5. These conditions are used to prove two auxiliary lemmas in  Appendix B.2.  Appendix B.3 provides the proofs of the results in Section 2.

B.1. Bootstrap high-level conditions

Here, we propose a set of high-level conditions on εj,it under which we can characterize the asymptotic distribution of the bootstrap test statistic ξ^(kc). These conditions can be verified for any resampling scheme.

Condition A*

  • E(εj,it)=0, for all i, t.

  • 1Tt=1Ts=1T|γj,st|2=Op(1), where γj,stE(1Nji=1Njεj,isεj,it).

  • 1T2t=1Ts=1TE|1Nji=1Nj(εj,itεj,isE(εj,itεj,is))|2=Op(1).

Condition B*

  • 1Tt=1Ts=1Tf˜jsf˜jtγj,st=Op(1).

  • 1Tt=1TE1TNjs=1Ti=1Njf˜js(εj,isεj,itE(εj,isεj,it))2=Op(1).

  • E1TNjt=1Tf˜jtεjtΛ˜j2=Op(1).

  • 1Tt=1TEΛ˜jεjtNj2=Op(1).

Condition C*

  • 1Tt=1Ts=1Tf˜jsγj,st2=Op(1).

  • 1Ts=1Tf˜jst=1Tγj,stεktΛ˜kNk=Op(1).

  • 1Ts=1TEt=1Tγj,stεktΛ˜kNk2=Op(1).

  • 1Ts=1Tf˜js1Tt=1T(1Ni=1Nλ˜k,iεk,it(εj,isεj,itE(εj,isεj,it)))=Op(1), where N=min(N1,N2).

  • 1Ts=1Tf˜js1Tt=1T(1NjNki1=1Nji2i1Nkλ˜k,i2εk,i2t(εj,i1sεj,i1tE(εj,i1sεj,i1t)))=Op(1).

  • 1Ts=1T1Tt=1T(1Ni=1Nλ˜k,iεk,it(εj,isεj,itE(εj,isεj,it)))2=Op(1), where N=min(N1,N2).

  • 1Ts=1T1Tt=1T(1NjNki1=1Nji2i1Nkλ˜k,i2εk,i2t(εj,i1sεj,i1tE(εj,i1sεj,i1t)))2=Op(1).

 
Remark 3.

Conditions A* and B* are used in GP (2014) and  Gonçalves and Perron (2020)  and have been verified for the wild bootstrap and the CSD bootstrap, respectively, when  f˜jt  and  λ˜j,i  are the PCA estimators. Here, they are obtained as in AGGR (2019) under the null. Condition C* is new to the group factor model and needs to verified.

B.2. Asymptotic expansion of the sample covariance of the bootstrap factors estimation error
For each group j, we have that
(12)
where

First, note that 1Ts=1T||f^jsHjf˜js||2=Op(δNT2) under Conditions A* and B*, which are all from GP (2014).

The following auxiliary lemmas are the bootstrap analogues of Lemmas A.1 and A.2.

 
Lemma B.1.
Suppose Conditions A*, B* and C* hold. Then, for any  j,k{1,2}: (a)  1Tt=1TAj,1tAk,1t=Op(δNT4);  (b)  1Tt=1TAj,2tAk,2t=Op(δNT4);  (c)  1Tt=1TAj,4tAk,4t=Op(δNT4);  (d)  1Tt=1TAj,mtAk,nt=Op(δNT4)  for  mn, where  m,n{1,2,3,4}; and (e)  
 
Lemma B.2.
Suppose Conditions A*, B* and C* hold. Then, for  j,k{1,2},

where  ujt  is as defined in Lemma B.1.

 
Proof of Lemma B.1.
This proof follows closely the proof of Lemma A.1. Part (a): We can bound the norm of 1Tt=1TAj,1tAk,1t by
thus we show that 1Tt=1TAj,1t2=Op(δNT4) for any j. To show this, we write Aj,1t=Aj,1t(1)+Aj,1t(2), where
and we show that I11Tt=1TAj,1t(1)2 and II11Tt=1TAj,1t(2)2 are both of order Op(δNT4) under our bootstrap high-level conditions. First, note that
since T1s=1T||f^jsHjf˜js||2=Op(δNT2) under Condition A*, and T2t=1Ts=1T|γj,st|2=Op(T1) under Condition A*-(i). Similarly, ignoring Hj=Op(1), Condition C*(i) (which is new) implies that II1=Op(T2)=Op(δNT4) since
Part (b): We let I2T1t=1T||Aj,2t(1)||2 and II2T1t=1T||Aj,2t(2)||2, where Aj,2t(1)T1s=1T(f^jsHjf˜js)ζj,st and Aj,2t(2)HjT1s=1Tf˜jsζj,st, with ζj,stNj1i=1Nj(εj,isεj,itE(εj,isεj,it)). First, note that
since 1T2t=1Ts=1T|ζj,st|2=Op(Nj1) as implied by Condition A*(iii). Second, by Condition B*(ii),
Part (c): We let Aj,4t(1)1Ts=1T(f^jsHjf˜js)ξj,st and Aj,4t(2)Hj1Ts=1Tf˜jsξj,st, with ξj,stf˜jtΛ˜jεjsNj. We show that I4T1t=1TAj,4t(1)2 and II4T1t=1TAj,4t(2)2 are both Op(δNT4) under our assumptions. For the first term, using the definition of ξj,st, we have that
since by Cauchy–Schwartz’s inequality,
given that 1Ts=1TΛ˜jεjsNj2=Op(1) under Condition B*(iv). For II4, using the definition of ξj,stεjsΛ˜jNjf˜jt (and ignoring Hj=Op(1)), we have that
Part (d): Given parts (a), (b), and (c), all the cross terms that involve Aj,1t,Aj,2t and Aj,4t are Op(δNT4) by an application of Cauchy–Schwartz’s inequality. Hence, we only need to show that T1t=1TAj,mtAk,3t is Op(δNT4) for m =1, 2, 4. Using the definition of Ak,3t, we have that
Thus, it suffices to show that T1t=1TAj,mtεktΛ˜kNk=Op(δNT4). Starting with m =1, by the definition of Aj,1t, we have that
Note that we can rewrite (b1) as
In addition,
where
provided the term in square bracket is Op(1), which follows under Condition C*(iii). Consider next m =2. Using the decomposition of Aj,2t=Aj,2t(1)+Aj,2t(2), we can write
Note that
By Cauchy–Schwartz’s inequality, we can bound (a2) by
since
Finally, consider m =4. Using using the decomposition of Aj,4t=Aj,4t(1)+Aj,4t(2), we can write
Note that
In addition,
where
implying that (a4)=Op(δNT4).
Part (e): By definition, Aj,3t1Ts=1Tf˜jsηj,st, where ηj,stf˜jsΛ˜jεjtNj. Using the definition of the bootstrap rotation matrix, HjVj1F^jF˜jTΛ˜jΛ˜jNj, we can rewrite this term as
given in particular Condition B*(iv). ▪
 
Proof of Lemma B.2.

This follows immediately from Lemma B.1.

B.3. Proof of bootstrap results in Section 2

The section is organized as follows. First, we state several auxiliary lemmas used to prove Lemma 3.1 and Theorem 3.1, followed by their proofs. Then, we prove Lemma 3.1, Theorem 3.1 and Proposition 3.1.

Following AGGR (2019), we define R^=V^111V^12V^221V^21, where V^jk=1Tt=1Tf^jtf^kt. The test statistic is given by ξ^(kc)l=1kcρ^l=tr(Λ^1/2), where Λ^=diag(ρ^l2:l=1,,kc) is a kc×kc diagonal matrix containing the kc largest eigenvalues of R^ obtained from the eigenvalue–eigenvector problem R^W^=W^Λ^, where W^ is a k1×kc matrix of eigenvectors associated to kc largest eigenvalues. The main idea of the proof is to obtain an expansion of R^ through order Op(δNT2), where δNT=min(N,T), from which we obtain an asymptotic expansion of Λ^ and of tr(Λ^1/2).

The asymptotic expansion of R^ is based on expanding V^jk around11  V˜jk1Tt=1Tf˜jtf˜kt, where f˜jt=(f^tc,f^jts) for j =1, 2. Note that f˜jt imposes the null hypothesis that there are kc common factors among the two panels and it is different from the vector f^jt, which contains the kj largest principal components of Yj. Hence, the need to use different notation. The properties of the bootstrap test rely heavily from imposing the null hypothesis in the bootstrap DGP. Adding and subtracting appropriately yield
where letting ψjtHj1(f^jtHjf˜jt),
Under Conditions A* and B*, we can show that X^jk=Op(δNT2) (this follows from Lemma B.3 of GP (2014)). Using this result, we can show that R^=R¨+Op(δNT2), where R¨=V¨111V¨12V¨221V¨21=(H1)1R˜H1, where R˜V˜111V˜12V˜221V˜21. Note that R˜ is the bootstrap analogue of R˜V˜111V˜12V˜221V˜21 defined in Lemma B.2 of AGGR (2019).

The following auxiliary lemma provides the asymptotic expansion of R^ through order Op(δNT2).

 
Lemma B.3.
Suppose Conditions A* and B* hold. Under Assumption 1,
where  Ψ^X^11R˜+X^12B˜+B˜X^21B˜X^22B˜,  B˜V˜21, and  
 
Remark 4.

Lemma B.3 is the bootstrap analogue of Lemma B.2 of AGGR (2019) when the rate conditions on N and T are as assumed in Assumption 1. Note that under this assumption, we only require an asymptotic expansion through order  Op(δNT2), which means its remainder is of order  Op(δNT4).

 
Remark 5.

Lemma B.3 only requires Conditions A* and B*. Condition C* is not used here. Note that  V˜11=Ik1  and  V˜22=Ik2, which explains the differences between the asymptotic expansions of  R^  and  R^  (in particular, we do not need to pre-multiply  Ψ^  by  V˜111).

Since the bootstrap test statistic is defined as ξ^(kc)tr(Λ^1/2), where Λ^=diag(ρ^l2:l=1,,kc) contains the first kc eigenvalues of R^, our next result provides an asymptotic of Λ^1/2, from which we obtain an asymptotic expansion of ξ^(kc)l=1kcρ^l.

 
Lemma B.4.

Suppose Conditions A* and B* hold. Under Assumption 1,

  • Λ^1/2=Ikc+12U^1Ψ^ccU^+Op(δNT4),  where  Ψ^cc  is upper-left  kc×kc  block of the matrix  Ψ^  defined in Lemma B.3 and  U^  is a  kc×kc  matrix.

  • tr(Λ^1/2)=l=1kcρ^l=kc+12tr(Ψ^cc)+Op(δNT4).

Lemma B.4 is the bootstrap analogue of Lemma B.4 of AGGR (2019) when N and T satisfy the rate conditions of Assumption 1. In contrast to Lemma B.4 in AGGR (2019), which only holds under the null hypothesis, Lemma B.4 holds under both the null and the alternative hypothesis.

Next, we provide an asymptotic expansion of Ψ^cc through order Op(δNT2) (ie with remainder of order Op(δNT4)). This expansion is based on the asymptotic expansion of 1Tt=1T(f^jtHjf˜jt)(f^ktHkf^kt) given in Lemma B.2. This result is in  Appendix B.2, and it requires the strengthening of Conditions A* and B* with Condition C*. We can then obtain the following result using the definition of Ψ^cc given in Lemma B.3.

Recall that UtμNu1t(c)u2t(c), where ujt(c) denotes the kc×1 vector containing the first kc rows of ujt(Λ˜jΛ˜jNj)Λ˜jεjtNj.

 
Lemma B.5.

Suppose Conditions A*, B* and C* hold and assume that Assumption 1 is verified with  N=N2<N1. Defining  UtμNu1t(c)u2t(c), we have that  Ψ^cc=1TNt=1TUtUt+Op(δNT4).

 
Proof of Lemma B.3.
We follow the proof of Lemma B.2 of AGGR (2019), but only consider a first order asymptotic expansion of R^. In particular, we write
where we used V^jj=V¨jj(Ikj+V¨jj1X¨jj). We then use the expansion (IX)1=I+X+O(X2) to obtain (Ikj+V¨jj1X¨jj)1=IkjV¨jj1X¨jj+Op(δNT4), where only terms that are linear in X¨jk are larger than Op(δNT4). Terms involving products or squares of X¨jk are of order Op(δNT4) because we can show that X¨jk is of order Op(δNT2) using Lemma B.3 of GP (2014). ▪
 
Proof of Lemma B.4.
Part (a): We follow closely the proof of Lemma B.4 of AGGR (2019), but rely on Assumption 1 and the following key features of the bootstrap DGP to simplify their proof. First, note that the eigenvector-eigenvalue problem associated with R^ is R^W^=W^Λ^, where Λ^=diag(ρ^l2:l=1,,kc). We can replace R^ from its asymptotic expansion in Lemma B.3:
where we note that V˜111=Ik1 by construction. Pre-multiplying this equation by H1 gives
Note that
where R˜ssΣ˜12Σ˜21, with Σ˜12T1t=1Tf^1tsf^2ts=Σ˜21. This follows by the definition of R˜V˜111V˜12V˜221V˜21 and the fact that V˜jkT1t=1Tf˜jtf˜kt, where f˜jt(f^tc,f^jts) for j =1, 2, with f^tc=W^f^1t and f^jts as defined in Definition 2 of AGGR (2019). As argued by AGGR (2019) (specifically their p. 1271),
which implies that for j =1, 2,
Compared to the matrix R˜ defined in Lemma B.3 of AGGR (2019), here R˜cs, the upper-right block of R˜, is 0 due to the orthogonality between f^tc and f^jts for both j =1, 2. This in turn simplifies the form of R˜ss as compared to R˜ss in AGGR (2019). Importantly, the fact that R˜ is block diagonal implies that its first kc eigenvalues are all equal to 1 (since they correspond to the eigenvalues of Ikc), whereas its remaining k1s eigenvalues are those of R˜ss, which can be shown to be all smaller than one. This can be seen when f1ts and f2ts are both scalars, since then R˜ss=Φ^2, where Φ^=T1t=1Tf^1tsf^2ts is the correlation between the two group-specific factors. Moreover, the eigenvectors associated with the first kc eigenvalues of R˜ are spanned by the columns of the matrix Ec[Ikc,0]. Thus, letting Es(0,Ik1k1s), and following AGGR (2019), we can decompose the eigenvector and eigenvalue matrices of R^ as
Following AGGR (2019), by Lemma B.3, α^ and M^ converge to zero at rate Op(δNT2). Thus, replacing W˜1 and Λ^ into the eigenvector–eigenvalue equation for R^ gives:
Using the fact that R˜Ec=Ec and that Ψ^Esα^ and Esα^M^ are of order Op(δNT4) implies that
(13)
Pre-multiplying this Equation (13) by Ec gives
from which we obtain
(14)
Expansion (14) is the bootstrap analogue of equation (C.62) in AGGR (2019)’s Online Appendix (proof of their Lemma B.4), where we have used the facts that R˜cs=0 and Σ˜ccT1t=1Tf^tcf^tc=Ikc to simplify the expansion in the bootstrap world. Equation (14) implies that
from which it follows that
by using the expansion (I+X)1/2=I+12X+Op(X2) with X=M^. Part (b): This follows by taking the trace of Λ^1/2 and using the properties of the trace operator. ▪
 
Proof of Lemma B.5.
We replace Ψ^cc with the expression from Lemma B.3 and use Lemma B.5. In particular, recall that Ψ^ is defined as
where B˜V˜221V˜21=V˜21 since V˜22=Ik2, and X^jk is as defined in Lemma B.3. Since the bootstrap DGP for each panel generates bootstrap observations on Yj using f˜jt=(f^tc,f^jts), we can show that
where R˜ss=Σ˜12Σ˜21, where Σ˜12T1t=1Tf^1tsf^2ts=Σ˜21. Thus, the upper-left kc×kc block Ψ^cc is equal to
as argued by AGGR (2019) (see their equation (C.69) in the Online Appendix). Given the expressions of X^jk in Lemma B.3 and the fact that f˜jt=(f^tc,f^jts), we can then use the same arguments of AGGR (2019) to rewrite the expression of Ψ^cc as
where ψjt(c)ψkt(c) denotes the upper-left kc×kc block of the matrix ψjtψkt, where ψjtHj1(f^jtHjf˜jt). For any j,k{1,2}, we can write

The desired result follows by Lemma B.2, noting that μN=N2/N1, where Nmin(N1,N2)=N2 (without loss of generality), which implies the definition of UtμNu1t(c)u2t(c). ▪

 
Proof of Lemma 3.1.

This follows from Lemmas B.3, B.4, and B.5 under Conditions A*–C*. ▪

 
Proof of Theorem 3.1.
The asymptotic Gaussianity of the bootstrap test statistic follows from Lemma 3.1 when we add Conditions D* and E*. To see that this implies that the bootstrap p-value converges in distribution to a uniform distribution under the null hypothesis, note that

Since ΩU1/2NT(ξ^(kc)kc+B2N)dN(0,1) under the null hypothesis, the random variable inside Φ(·) in can be written as Φ1(U[0,1]), implying that pdΦ(Φ1(U[0,1]))=U[0,1]. ▪

 
Proof of Proposition 3.1.
We can rewrite p as follows:
where c1 and c2 are positive constants and ϵ is also positive. Note that Equation (1) follows by Lemma 3.1 under Conditions A*- C*, whereas Equation (2) follows by using the fact that under H1,ξ^(kc)kc+B2N=l=1kcρlkc+op(1) (since B=Op(1) and ρ^lpρl), where ρl denotes the true canonical correlations. Since l=1kcρlkc<0 when there are less than kc common factors, NT(ξ^(kc)kc+B2N)NTc1 for c1>0 under H1, as argued by AGGR (2019). Finally, we can bound T(BB) by TN1ϵc2 for some positive constant c2 by using Condition F* and the fact that B and B are positive. Thus, T(BB) is asymptotically negligible with respect to NTc1. This together with the fact that 12Tt=1TZN,t is Op(1) as assumed in Condition F* implies that pp0. ▪

Appendix C: Proof of wild bootstrap results in Section 3.1

In this appendix, we first provide three auxiliary lemmas, followed by their proofs. Then, we prove Theorem 4.1.

 
Lemma C.1.

Suppose Assumptions 1–4 hold. If either (1)  {ftc},{fjts}  and  {εj,it}  are mutually independent and for some  p2,E|εj,it|2pM<  and  E||fjt||2pM<, or (2) for some  p2,E|εj,it|4pM<  and  E||fjt||4pM<, it follows that  

  • 1Tt=1Tf^tcftcp=Op(1), and  1Nji=1Njλ^j,icλj,icp=Op(1);

  • 1Tt=1Tf^jtsHjsfjtsp=Op(1)  and  1Nji=1Njλ^j,is(Hjs)1λj,isp=Op(1);

  • 1NjTi=1Njt=1T|ε˜j,it|p=Op(1),

where  Hjs=(Vjs)1F^jsFjsTΛjsΛjsNj  and  Vjs  is the  kjs×kjs  diagonal matrix containing the  kjs  largest eigenvalues of  ΞjΞj/NjT  on the main diagonal in descending order.

 
Lemma C.2.

Assume that Assumptions 1–6 strengthened by Assumption WB1 and WB2 hold. Then Lemma 3.1 follows for Algorithm 1.

 
Remark 6.

In Lemma C.2, we verify that the bootstrap method generated by Algorithm 1 satisfies Conditions A* through C*. To verify these conditions, we use Lemma C.1 which is valid under  H0  and  H1. Therefore,Lemma C.2 is satisfied regardless of the fact that either  H0  or  H1  is true.

In the following Lemma C.3, we obtain the uniform expansions of the group common factors, factor loadings, group-specific factors, and group-specific factor loadings up to order op(T1/2) under H0 to verify Condition D*. Note that Lemma C.3 is only valid under H0.

 
Lemma C.3.

Assume that Assumptions 1–5 hold and  H0  is true. Then, for j = 1, 2, we have the following:  

  • f^tc=Hc(ftc+1N1u1t(c))+op(T1/2);

  • λ^j,ic=(Hc)1λj,ic+Hc1Tt=1Tftcεj,it+Hc1Tt=1Tftcfjtsλj,is+op(T1/2);

  • f^jts=H˜js(f˜jts+1Njujt(s))+op(T1/2);

  • λ^j,is=(H˜js)1λj,is+H˜js1Tt=1Tf˜jtsεj,it+op(T1/2),

where  f˜jtsfjtsΣ˜j,cΣ˜cc1ftc  and  H˜js=(Vjs)1F^jsF˜jsTΛjsΛjsNj  and  Vjs  is defined in Lemma C.1.

 
Proof of Lemma C.1.
Part (i): Recall that f^tc=W^f^1t, where W^ is s k1×kc matrix collecting the eigenvectors of R^ associated to the kc largest eigenvalues and W^W^=Ikc. By following Proposition 1 in AGGR (2019), ftc=Wf1t, where W is a k1×kc matrix of eigenvectors of R associated to the kc largest eigenvalues. Then, by adding and subtracting appropriately, we can write f^tcftc=W^(f^1tH1f1t)+(H1W^W)f1t. By the cr-inequality, we can bound part (i) as follows:
where we let W˜1=H1W^. It is sufficient to show that 1Tt=1Tf^1tH1f1tp=Op(1). By following the arguments in GP (2014) (i.e., their Lemma C.1(i)), given that E|εj,it|2pM< and Efjt2pM<, we have 1Tt=1Tf^1tH1f1tp=Op(1). If we assume that {fjt} and {εj,it} are independent, then EfjtpM< and E|εj,it|2pM< are sufficient. Next, we show that 1Nji=1Nj||λ^j,icλj,ic||p=Op(1). Since Λ^jc=1TYjF^c and Yj=FcΛjc+FjsΛjs+εj, we can write Λ^jc as follows:
Then, λ^j,icλj,ic=1TF^c(F^cFc)λj,ic+1T(F^cFc)Fjsλj,is+1TFcFjsλj,is+1T(F^cFc)εj,i+1TFcεj,i. We apply the cr-inequality and show that each term is Op(1). In particular,
To see that the first term is bounded, note that
Similarly, for the second term,
where we can show T1/2Fjsp=(1T||Fjs||2)p/21Tt=1T||fjts||pM, provided E||fjts||pM<. We can bound the third term as 1Nji=1Nj1TFcFjsλj,isp1Nji=1Nj||λj,is||p1TFcFjsp, ignoring Hcp=U^p=Op(1). Given ||λj,is||pM, it suffices to show that 1TFcFjsp=Op(1). By Markov’s inequality, this follows from E1Tt=1Tftcfjtsp1Tt=1TE||ftcfjts||p, which is bounded given E||fjt||2pM<, if ftc and fjts are not independent (otherwise, E||fjt||pM< is sufficient). The fourth term can be bounded as 1Nji=1Nj1T(F^cFcHc)εj,ip||T1/2(F^cFcHc)||p1Nji=1Nj||T1/2εj,i||p=Op(1), where
given E|εj,it|pM<. Similarly, we can bound the last term as 1Nji=1Nj||T1Fcεj,i||p=Hcp(1Nji=1Nj||T1/2εj,i||p)(||T1/2Fc||p)=Op(1), given E|εj,it|pM< and E||ftc||pM<.
Part (ii): Note that f^jts is the principal component estimator from Ξjt=yjtΛ^jcf^tc. By using the fact that yjt=Λjcftc+Λjsfjts+εjt, we can write Ξjt as follows:
Then, using the identity from the proof of Theorem 1 in Bai (2003), we have
where Vjs is the kjs×kjs matrix of kjs eigenvalues of ΞjΞj/(TNj) in its diagonal elements and ψj,lts=1Nji=1Njej,ilej,it,ηj,lts=1Nji=1Njλj,isfjlsej,it, and ξj,lts=1Nji=1Njλj,isfjtsej,il. Using this identity and the cr-inequality, we have
where ats=1Tpl=1Tf^jlsψj,ltsp,bts=1Tpl=1Tf^jlsηj,ltsp, and cts=1Tpl=1Tf^jlsξj,ltsp. Let χj,lt denote either ψj,lts,ηj,lts, or ξj,lts. Then, we can show l=1Tf^jlsχj,ltp=(l=1Tf^jlsχj,lt2)1/2(l=1T||f^jls||2l=1T|χj,lt|2)p/2. Under this inequality, we can show that
where we use the fact that 1Tt=1T||f^jts||2=kjs. It suffices to show that 1T2t=1Tl=1T|χj,lt|p=Op(1). Starting with χj,lt=ψj,lts, we can write as
where we let ej,it=εj,it+c^j,it, with c^j,itλj,icftcλ^j,icf^tc. Using the cr-inequality, we can show that 1T2Njt=1Tl=1Ti=1Nj|εj,itεj,il|p1T2Njt=1Ti=1Nj|εj,it|2p=Op(1), given that E|εj,it|2pM<. For the second term, it suffices to show that 1NjTi=1Njt=1T|c^j,it|2p=Op(1), because 1T2Njt=1Tl=1Ti=1Nj|εj,itc^j,il|p(1NjTi=1Njl=1T|εj,il|2p)1/2(1NjTi=1Njl=1T|c^j,it|2p)1/2 by using the cr- and Cauchy–Schwarz’s inequalities. Using the definition of c^j,it, we have
To show that this term is Op(1), it suffices that 1Nji=1Nj||λ^j,icλj,ic||2p=Op(1), and 1Tt=1T||f^tcftc||2p=Op(1), given that ||λj,ic||2pM. Assuming ftc and fjts are independent, provided that E||fjt||2pM and ||λj,is||2pM, we have 1Nji=1Nj||λ^j,icλj,ic||2p=Op(1) (otherwise, we need E||fjt||4pM). Assuming that fjt and εj,it are independent, given that ||λj,ic||2pM,E|εj,it|2pM and E||fjt||2pM, we have 1Tt=1T||f^tcftc||2p=Op(1) (otherwise, we need E|εj,it|4pM and E||fjt||4pM). The remaining terms can be handled similarly by using 1NjTi=1Njt=1T|εj,it|2p=Op(1) and 1NjTi=1Njt=1T|c^j,it|2p=Op(1). For instance, letting χj,lt=ηj,lts and assuming that fjt and εj,it are independent, we have that
which is Op(1) by showing that 1NjTt=1Ti=1Nj|ej,it|2p=Op(1), given that ||λj,is||2pM and E||fjls||M. To show that 1NjTt=1Ti=1Nj|ej,it|2p=Op(1), it is sufficient to have 1NjTt=1Ti=1Nj|εj,it|2p=Op(1) and 1NjTt=1Ti=1Nj|c^j,it|2p=Op(1). We can use a similar argument when χj,lt=ξj,lts.
Next, we show that 1Nji=1Nj||λ^j,is(Hjs)1λj,is||p=Op(1). Note that Λ^js=1TΞjF^js and Ξj=FjsΛjs+ej, where ej=εj+(FcΛjcF^cΛ^jc). Then, we can write λ^j,is as follows:
Under this identity and the cr-inequality, we can bound 1Nji=1Nj||λ^j,is(Hjs)1λj,is||p by
The first term is Op(1) since ||λj,is||pM< and
For the second term, we have
Similarly, the third term can be bounded as
given that E||fjts||pM<.
Part (iii): To show 1NjTi=1Njt=1T|ε˜j,it|p=Op(1), we first rewrite ε˜j,it as follows:
Using the identity above and the cr-inequality, we have
To end the proof, we show that (a) through (e) are Op(1). The fact that (a) is Op(1) follows from E|εj,it|pM<. The term (b) can be bounded by (1Nji=1Nj||λj,ic||p)(1Tt=1T||f^tcHcftc||p)=Op(1) using part (i) and ||λj,ic||pM. We can also bound the term (c) as
where
The term (d) can be bounded by (1Nji=1Nj||λj,is||p)||(Hjs)1||p(1Tt=1T||f^jtsHjsfjts||p)=Op(1), by part (i) and ||λj,is||pM<. Finally, the last term can be bounded as follows:
where
 
Proof of Lemma C.2.
As argued in Remark 3, Condition A*–B* are verified for the wild bootstrap in GP (2014) (for details, see their proof of Theorem 4.1). Therefore, we focus on Condition C*. Part (i): By Cauchy–Schwarz’s inequality, we can show that
Since we have 1Ts=1Tf˜js4=Op(1), by applying Lemma C.1 with p =4, it is sufficient to show that 1Tt=1Ts=1T|γj,st|4=Op(1). Noting that γj,st=0 for st and using cr-inequality, we have that
Lemma C.1 with p =8 implies the last equality. Part (ii): By letting mjk,st=1Tγj,stΛ˜kεktNk, we can write the sufficient condition for part (ii) to be Op(1) as follows:
where
By noting that γj,st=0 if st, it follows that
where the first parenthesis can be bounded as
given Lemma C.1 with p =4. By Cauchy–Schwarz’s inequality, we can also bound the second parenthesis as follows:
where we apply Lemma C.1 with p =16 to obtain 1NjTt=1Ti=1Njε˜j,it16=Op(1). Part (iii): We rewrite the term as follows:
where the third equality follows since E(εk,itεk,ml)=ε˜k,itε˜k,mlE(ηk,itηk,ml)=0 if either im or tl and the fourth equality follows since γj,st=0 for st. To verify part (iv), a sufficient condition is that E1Ts=1Tf˜js1Tt=1T(1Ni=1Nλ˜k,iεk,it(εj,isεj,itE(εj,isεj,it)))2=Op(1). To simplify the notation, we define ψ1,jk,st=1Ni=1Nλ˜k,iεk,it(εj,isεj,itE(εj,isεj,it)).
We simplify the expression of X1 depending on the choices of j, k, i1,i2, and s, t, q, and l. To simplify the notation, we let j = k and ignore the group notation (if jk, under the group independence, the proof is simpler). If i1i2, we have X1=ε˜i1t3ε˜i2l3E(ηi1t3)E(ηi1l3)=0, when s=tl=q or s=t=l=q, since ηiti.i.d.N(0,1). Therefore, we only need to consider the case of i1=i2(=i). For this case, X1 takes a non-zero value for three different cases: s=ltq (X1=ε˜it2ε˜iq2ε˜is2),s=lt=q  (X1=3ε˜it4ε˜is2), and s=l=t=q  (X1=10ε˜it6). Considering these cases and using Cauchy–Schwarz’s inequality and cr-inequality, we can bound the above condition as follows:
By applying Lemma C.1 with p =8, we can show that 1Ts=1T||f˜s||4=Op(1),1Ni=1N||λ˜i||4=Op(1), and 1NTi=1Nt=1T|ε˜it|8=Op(1). To prove the part (v), E1Ts=1Tf˜js1Tt=1Tψ2,jk,st2=Op(1), where ψ2,jk,st=1NjNki1=1Nji2i1Nkλ˜k,i2εk,i2t(εj,i1sεj,i1tE(εj,i1sεj,i1t)). We have that
To show that this is Op(1), we expand the expression for E(ψ2,jk,stψ2,jk,lq). Ignoring the group notation and considering the case where j = k, we can rewrite E(ψ2,jk,stψ2,jk,lq) as
Since X2 is non-zero only if i1=i3i2=i4, we consider X2 depending on s, t, q, and l and i1=i3i2=i4. When t=qs=l,X2=ε˜i2t2ε˜i1s2ε˜i1t2, and when t=q=s=l,X2=2ε˜i1t4 using the fact that ηiti.i.d.N(0,1). For the other combinations of s, t, q, and l, we have X2=0. Considering this and putting the group notation back, we can bound E1Ts=1Tf˜js1Tt=1Tψ2,jk,st2 as follows:
Part (vi): As a sufficient condition, we can show that 1Ts=1TE1Tt=1Tψ1,jk,st2=Op(1). Since E1Tt=1Tψ1,jk,st2=1T2t=1Tq=1TE(ψ1,jk,stψ1,jk,sq), we focus on expanding E(ψ1,jk,stψ1,jk,sq) (ignoring the group notation) as follows:
If i1i2, we have X3=0, since ηiti.i.d.N(0,1). When i1=i2, we have five cases to consider: if stq,X3=ε˜it2ε˜iq2ε˜is2; if s=tq,X3=2ε˜it4ε˜iq2; if s=qt,X3=2ε˜it2ε˜is4; if q=ts,X3=3ε˜it4ε˜is2; and if s = t = q, X3=10ε˜it6. Therefore,
by applying Lemma C.1 with p =12. For part (vii), we use similar arguments and show that 1Ts=1TE1Tt=1Tψ2,jk,st2=Op(1), where ψ2,jk,st is defined in part (v). In particular, ignoring the group notation and considering j=k (Nj=Nk=N), we can write E(ψ2,stψ2,sq) as follows:
where X4=0 when i1=i4i2=i3, and X40 when i1=i3i2=i4 with st=q or s = t = q. It follows that
given Lemma C.1 with p =4. ▪
 
Proof of Lemma C.3.
Part (i): Since f^tc=W^f^1t and we can write the factors estimation error as f^jtHjfjt=Vj1(Aj,1t+Aj,2t+Aj,3t+Aj,4t) as in Lemma A.2 in  Appendix A, we can write f^tc as follows:
where we use W˜1=H1W^ and W˜1=[Ec+Es(Ik1kcR˜ss)1Φ^sc]U^  from the proof of Lemma A.4 in  Appendix A. Note that Ecf1t=ftc and Esf1t=f1ts under H0. By letting Hc=U^, we can rewrite f^tc as follows:
where we use Φ^sc(Ik1kcR˜ss)1=Op(δNT2). For the rest of the terms, we use Lemma A.2 in Bai (2003): 1Ts=1Tf^1sγ1,st=Op(δNT1T1/2); 1Ts=1Tf^1sζ1,st=Op(δNT1N1/2); 1Ts=1Tf^1sη1,st=Op(N1/2); and 1Ts=1Tf^1sξ1,st=Op(δNT1N1/2). Since  Op(δNT1N1/2)=op(T1/2) and  Op(δNT1T1/2)=op(T1/2), we can simplify the asymptotic expansion of f^tc up to order op(T1/2) as follows:
where we use the fact that F^1F1T=V1H1(Λ1Λ1N1)1 by the definition of H1 in the second equality and use the expression for W˜1 in the third equality.
Part (ii): Next, we show the asymptotic expansion of λ^j,ic up to order op(T1/2). In particular, by using the fact that Λ^jc=1TYjF^c and Yj=FcΛjc+FjsΛjs+εj and substituting appropriately, we can write λ^j,ic as follows:
Then, to prove part (ii), we show that the terms (a) through (d) are op(T1/2). Using the expansion of f^tc from part (i), we can rewrite the term (a) as follows:

Similarly, we can show that (b) and (d) are op(T1/2) by replacing f^tcHcftc with its expansion up to order op(T1/2). For example, ignoring Hc=Op(1), the term (d) is 1Tt=1T(f^tcHcftc)ftc=(Λ1Λ1N1)11N1T(1N1Tt=1Tftcε1tΛ1)=Op((TN1)1/2)=op(T1/2) by Assumption 4(c). Using the proof of Lemma A.1(e), we can show that 1NTt=1Tu1tu1t=Op(N1) and show that (e)=op(T1/2). Our asymptotic expansions for f^tc and λ^j,ic are equivalent to those in AGGR (2019) (specifically, (C.92) and (C.94) in their Online Appendix).

Part (iii): To obtain the asymptotic expansion of f^jts, we follow the arguments in AGGR (2019) closely. Recall that f^jts are principal components of the residuals such that ξj,it=yj,itf^tcλ^j,ic. Following the arguments in AGGR (2019), by replacing f^tc and λ^j,ic with their asymptotic expansions of order up to op(T1/2) and using the fact that HcHc=Σ˜cc1+op(T1/2), we can rewrite ξj,it as follows:12  
Using the identity from Bai (2003) as in  Appendix A.2, we can write f^jtsH˜jsf˜jts as follows:
where
We show that Bj,1t and Bj,3t are op(T1/2) and Bj,2t have a bias term of order Op(Nj1/2) up to order op(T1/2) (ignoring (Vjs)1=Op(1)). Letting wj,icΣ˜cc11Tt=1Tftcεj,it, we first rewrite Bj,1t as follows:
We can show that all nine terms in Bj,1t are op(T1/2). We can show that Bj,1t,(1)=op(T1/2) by applying similar arguments in Lemma A.2 in Bai (2003). We can write the next term as follows:
Then, (b1)=Op(T1/2) since it is equivalent to 1T(1TNjl=1Tf˜jlsεjlΛjc), which is Op(T1/2) by Assumption 4(c). By applying Cauchy–Schwarz’s inequality,
Since we can show that max1tTujt(c)=Op(T) by Assumption WB3(c), we have Bj,1t,(2)=op(T1/2). We can use similar arguments to show that Bj,1t,(3)=op(T1/2). Specifically, we can write Bj,1t,(3)=(1Tl=1Tf^jlsujl(c))(1Nji=1Njλj,i(c)εj,it)1Nj. Then, by using the fact that 1Tl=1Tf^jlsujl(c)=Op(δNT1) and maxtΛjcεjtNj=Op(T1/2), we can show that Bj,1t,(3)=op(T1/2). We write the next term as
By Assumption 4(b), we have 1NjTi=1Njs=1Tfsc(εj,isεj,ilE(εj,isεj,il))=Op(1). Then, we can show that the first term in the square bracket is Op(Nj1/2T1/2). We can decompose the second term in the square bracket into two parts as follows:
where we use the fact that 1Tt=1T1Ts=1Tfscγj,sl=Op(T2) by following the arguments in the proof of Lemma A.1(a). Then, by Assumption 2(a), we can show that max1tTftc=Op(T1/4) and hence, Bj,1t,(4)=op(T1/2). Using similar arguments, we can also show that Bj,1t,(5)=op(T1/2). Similar to the arguments to show that Bj,1t,(2)=op(T1/2), we can show that 1Tl=1Tf^jlsujl(c)=Op(δNT1). Then, since Bj,1t,(6) is equivalent to write it as (1Tl=1Tf^jlsujl(c))(ΛjcΛjcNj)1Njujt(c), we can show this term is Op(δNT1Nj1T1/2)=op(T1/2). Next, Bj,1t,(7) can be bounded as follows:
where we use the fact that 1Nji=1Njλj,icwj,ic=1Nj(1NjTi=1Njt=1TftcεjtΛjc)Σ˜cc1=Op(Nj1/2) by Assumption 4(c). Following similar arguments, we have Bj,1t,(8)=Op(Nj1)=op(T1/2). Next, Bj,1t,(9) can be bounded as follows:
where we use the fact that 1Nji=1Njwj,icwj,ic(1Nji=1Nj1Tt=1Tftcεj,it2)Σ˜cc12=Op(1) by Assumption 4(a). Since we show that Bj,1t,(i)=op(T1/2) for i=1,,9, we have Bj,1t=op(T1/2). Next, our goal is to expand Bj,2t up to order op(T1/2). We first rewrite Bj,2t as follows:
We have Bj,2t,(i)=Op(Nj1/2) for i =1, 2 and Bj,2t,(3)=op(T1/2). To see this, we can bound Bj,2t,(3) as follows:
Using the definition of e˜jl, we can decompose Bj,3t into three parts as follows:
Our next goal is to show that Bj,3t,(i)=op(T1/2) for i =1, 2, 3. The first term Bj,3t,(1) can be bounded by 1TNjl=1Tf^jlsεjlΛjsmaxtf˜jts and since we can show that 1TNjl=1Tf^jlsεjlΛjs=Op(δNT2), we have Bj,3t,(1)=op(T1/2). Bj,3t,(2) can be shown as op(T1/2) by applying that 1Tl=1Tf^jlsujl(c)=Op(δNT1). The last term Bj,3t,(3) can be bounded as follows:
by applying that 1Nji=1Njwj,icλj,is=Op(Nj1/2). Therefore, we can expand f^jts up to order op(T1/2) as follows:
Noting that 1Tl=1Tf^jlsf˜jls is equivalent to (Vjs)1H˜js(ΛjsΛjsNj)1 by H˜js=(Vjs)1(F^jsF˜jsT)(ΛjsΛjsNj), we can also write f^jts as follows:

Then, by following the arguments in AGGR (2019) (specifically, the arguments on page 56 in their Online Appendix), we can show that νjts=ujt(s) and this completes the proof of part (iii).

Part (iv): Recall that λ^j,is=1Tt=1Tf^jtsξj,it. By expanding ξj,it up to order op(T1/2) as in the proof of part (iii), we can rewrite λ^j,is as follows:
To expand further, we analyze three terms (a), (b), and (c). By replacing f^jts with (f^jtsH˜jsf˜jts)+H˜jsf˜jts, we can decompose (a) further into two parts: 1Tt=1T(f^jtsH˜jsf˜jts)(f^jtsH˜jsf˜jts) and H˜js1Tt=1Tf˜jts(f^jtsH˜jsf˜jts). By using that 1Tt=1Tujtujt=Op(1), we can show that 1Tt=1T(f^jtsH˜jsf˜jts)(f^jtsH˜jsf˜jts)=1Nj(1Tt=1Tujt(s)ujt(s))+op(T1/2)=op(T1/2). By Assumption 4(c), we can also show that H˜js1Tt=1Tf˜jts(f^jtsH˜jsf˜jts)=Op((TNj)1/2)=op(T1/2). Next, we rewrite (b) as follows:
Since we can write (b1)=(ΛjsΛjsNj)1(1TNjt=1Ti=1Njλj,iεj,it2(ΛjsΛjcNj)1TNjt=1Tεj,itujt(c)), using the similar arguments in the proof of part (ii), we can show that (b1)=Op(δNT2)=op(T1/2). Also, using the fact that 1Tt=1Tujtujt=Op(1),(b2)=Op(Nj1)=op(T1/2). We can bound the term (b3) as follows:
where we use max1iNwj,ic=Op(logNj) by Assumption WB3(b) and 1Tt=1Tujt(s)ftc=Op(T1/2) by Assumption 4(c). Next, we expand the term (c) by using the definition of e˜j,it, ignoring H˜js=Op(1).
We can show the terms (c1) and (c2) are op(T1/2). Using the similar arguments above, we can show that (c1)=Op((TNj)1/2)=op(T1/2) by Assumption 4(c). By using the definition such that f˜jts=fjtsΣ˜j,cΣ˜cc1ftc, we can show that (c2)=0. Finally, by plugging all the terms back into expansion of λ^j,is and keeping only the terms non-negligible up to order op(T1/2), we have the following expansion for λ^j,is:

Since we can show that H˜jsH˜js=(1Tt=1Tf˜jtsf˜jts)1+op(T1/2), we can show this expansion is equivalent to the expansion of λ^j,is in AGGR (2019) (i.e., equation (C.95) in their Online Appendix). ▪

 
Proof of Theorem 4.1.

We first verify Theorem 3.1 and then Proposition 3.1. Given Lemma C.2, it suffices to show that the wild bootstrap in Algorithm 1 satisfies Condition D* and Condition E*.

Condition D*: Recall that B=tr(Σ˜U). By the fact that ηj,it are i.i.d.N(0, 1) across (j, i, t),
Since ujt(Λ˜jΛ˜jNj)1Λ˜jεjtNj, we can write Σ˜U,jj for j =1, 2 as follows:
where we define Γ˜j1Nji=1Njλ˜j,iλ˜j,i1Tt=1Tε˜j,it2. Next, recall that B=tr(Σ˜cc1Σ˜U). For example, by Assumption WB2, we have Σ˜U=μN2Σ˜U,11+Σ˜U,22, where
and γj,ii1Tt=1TE(εj,it2). Our goal is to show that Σ˜U,jj=Σ˜U,jj+op(T1/2). In fact, since the asymptotic expansions of λj,ic and λj,is are equivalent to those in AGGR (2019), by applying their Lemma B.8, we can show that Σ˜U,jj=Σ˜U,jj+op(T1/2). In particular, by using asymptotic expansions in Lemma C.3 and by stacking over i for λ^j,ic and λ^j,is, we have the following expansions:
where Wjc=(1Tt=1Tεjtftc)Σ˜cc1 and Wjs=(1Tt=1Tεjtf˜jts)(1Tt=1Tf˜jtsf˜jts)1. Then, we have the following expansion, which is equivalent to equation (C.98) in AGGR (2019):
where
Similar to the arguments in AGGR (2019) and by our Assumptions 4(a) and 4(c), we can show that
where LΛ,j=(ΛjΛjNj)Qj. The rest of the proof similarly follows the arguments in Lemma B.8 in AGGR (2019).
Condition E*: For simplicity, we assume that kc=1 and kjs=0 for j =1, 2 and N1=N2=N. We first derive ΩU. Using Algorithm 1, we have
where we use Cov(ZN,t,ZN,s)=0 for ts since ujt and uks are independent for either ts or jk under Assumption WB2. We can write Var(ZN,t) as
where zjt=ujt2E(ujt2). By Assumption WB2, E(z1tz2t)=0 and E(zjtu1tu2t)=0. In addition, we can show that E(zjt2)=2(Λ˜jΛ˜jN)41N2i=1Nk=1Njλ˜j,i2λ˜j,k2ε˜j,it2ε˜j,kt2, and E(u1t2u2t2)=(Λ˜1Λ˜1N)2(Λ˜2Λ˜2N)21N2i=1N1j=1N2λ˜1,i2λ˜2,j2ε˜1,it2ε˜2,jt2. Using these expressions, we can rewrite ΩU as follows:
To show that ΩUpΩU, first note that under Assumption WB2, ΩU=12(ΣU,11+ΣU,22)2, where ΣU,jj=limNΣ˜U,jj. The proof follows by showing that (I) and (II) converge in probability to ΣU,112 and ΣU,222, respectively, and (III) converges in probability to 2ΣU,11ΣU,22. For each j =1, 2, we can show that 1N2i=1Nk=1Nλ˜j,i2λ˜j,k21Tt=1Tε˜j,it2ε˜j,kt2=Ωjj2(0)+op(1), where Ωjj1Ni=1Njλj,iγj,ii. By appropriately adding and subtracting, a detailed proof involves three steps (ignoring the group notation): (i) 1N2i,j=1Nλ˜i2λ˜j2(1Tt=1Tε˜it2ε˜jt2εit2εjt2)=op(1), (ii) 1N2i,j=1N(λ˜i2λ˜j2λi2λj2)1Tt=1Tεit2εjt2=op(1), and (iii) 1N2i,j=1Nλi2λj2(1Tt=1Tεit2εjt2γiiγjj)=op(1). By Assumption WB3(a), we can show that 1T(1Tt=1Tεit2εjt2E(εit2εjt2))=Op(T1/2), which gives us (iii) = op(1). Next, to show that (ii) = op(1), we first rewrite the term as follows:
by using that λ˜i2λ˜j2λi2λj2=(λ˜i2λi2)λj2+(λ˜j2λj2)λ˜i2. Using Cauchy–Schwarz’s inequality, we have
For the first term,
where we use the following fact
by Assumption WB3(b) and by following Gonçalves and Perron (2020) (see proof of their Lemma B.2.). Finally, for part (i), using that ε˜it2ε˜jt2εit2εjt2=ε˜it2(ε˜jt2εjt2)+(ε˜it2εit2)εjt2 and decompose (i) into two parts: (i-a) and (i-b). We can rewrite (i-a) as follows:
We can show that (iaa)=Op(1). Since we can write (iab)(1Ni=1N|λ˜i|4)(1NTt=1Ti=1N|ε˜it2εit2|2), our goal is to show that 1NTt=1Ti=1N|ε˜it2εit2|2=op(1). Since ε˜it2εit2=(ε˜itεit)(ε˜it+εit), we have

By applying similar arguments in Gonçalves and Perron (2020) (proof of their Lemma B.2), we can show that 1Tt=1T|maxiN(ε˜itεit)2|2=Op((1N+logNT)2) by Lemma C.1 and Assumption WB3(b). The proof for (i-b) is similar.

Next, we show that 1TΩU1/2t=1TZN,tdpN(0,1). We let ωN,t(ΩU)1/2ZN,t (given that ZN,t depends on η1t and η2t,ZN,t is an independent array) and apply a CLT for heterogeneous independent random vectors on 1Tt=1TωN,t. We have E(ωN,t)=0 and Var(1Tt=1TωN,t)=1. Therefore, it suffices to show that E|ωN,t|2d=Op(1) for some d >1 (Lyapunov’s condition) and it is sufficient to show that E|ZN,t|2d=Op(1) (E|ωN,t|2d|ΩU1/2|2dE|ZN,t|2d). Note that ZN,t=z1,Nt+z2,Nt2u1tu2t, where zjt=ujt2E(ujt2). By applying the cr-inequality, we have
We need to show that $E^*|z^*_{jt}|^{2d} = O_p(1)$ and $E^*|u^*_{1t}u^*_{2t}|^{2d} = O_p(1)$. For the latter, since $u^*_{1t}$ and $u^*_{2t}$ are independent under Assumption WB2, $E^*|u^*_{1t}u^*_{2t}|^{2d} = E^*|u^*_{1t}|^{2d}E^*|u^*_{2t}|^{2d}$, so with $d = 2$ it suffices to show that $E^*|u^*_{jt}|^4 = O_p(1)$, as follows:
where $E^*(\varepsilon^*_{j,i_1t}\varepsilon^*_{j,i_2t}\varepsilon^*_{j,i_3t}\varepsilon^*_{j,i_4t}) \le \bar{\eta}_4 \equiv \max\{E(\eta^4_{j,it}), 1\}$ and $E(\eta^4_{j,it}) < C$. Next, we show that $E^*|z^*_{jt}|^{2d} = O_p(1)$. Since $z^*_{jt} = u^{*2}_{jt} - E^*(u^{*2}_{jt})$, we have
where $C$ is some positive constant. Taking $d = 2$, it is sufficient to show that $E^*|u^*_{jt}|^8 = O_p(1)$.

Since $\eta_{j,it} \overset{i.i.d.}{\sim} N(0,1)$, we have four cases to consider (all other index configurations have zero expectation). If $i_1 = \cdots = i_8$, the contribution is $\frac{1}{N^4}\sum_{i=1}^{N}\tilde{\lambda}^8_{j,i}\tilde{\varepsilon}^8_{j,it}E(\eta^8_{j,it}) = E(\eta^8_{j,it})\frac{1}{N^3}\left(\frac{1}{N}\sum_{i=1}^{N}\tilde{\lambda}^8_{j,i}\tilde{\varepsilon}^8_{j,it}\right) = O_p(1)$, since $E|\eta_{j,it}|^8 < C$. For the second case, $i_1 = i_2, \ldots, i_7 = i_8$ with distinct pairs, the contribution to $E^*|u^*_{jt}|^8$ is, up to a combinatorial constant, $\frac{1}{N^4}\sum_{i\ne m\ne k\ne l}\tilde{\lambda}^2_{j,i}\tilde{\lambda}^2_{j,m}\tilde{\lambda}^2_{j,k}\tilde{\lambda}^2_{j,l}\tilde{\varepsilon}^2_{j,it}\tilde{\varepsilon}^2_{j,mt}\tilde{\varepsilon}^2_{j,kt}\tilde{\varepsilon}^2_{j,lt} \le \left(\frac{1}{N}\sum_{i=1}^{N}\tilde{\lambda}^2_{j,i}\tilde{\varepsilon}^2_{j,it}\right)^4 = O_p(1)$. The third case is $i_1 = i_2 = i_3 = i_4$ and $i_5 = \cdots = i_8$; in this case, we can bound the contribution by $C_1\frac{1}{N^2}\left(\frac{1}{N}\sum_{i=1}^{N}\tilde{\lambda}^4_{j,i}\tilde{\varepsilon}^4_{j,it}\right)^2$, since $E|\eta_{j,it}|^4 < C_1$. Finally, we consider $i_1 = \cdots = i_6$ and $i_7 = i_8$; in this case, we can bound the contribution by $C_2\frac{1}{N^2}\left(\frac{1}{N}\sum_{i=1}^{N}\tilde{\lambda}^6_{j,i}\tilde{\varepsilon}^6_{j,it}\right)\left(\frac{1}{N}\sum_{i=1}^{N}\tilde{\lambda}^2_{j,i}\tilde{\varepsilon}^2_{j,it}\right)$, where we use $E|\eta_{j,it}|^6 < C_2$.
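An equivalent shortcut (our remark, using the expression for $E^*(u^{*2}_{jt})$ implied by the moments above): conditional on the sample, $u^*_{jt}$ is a linear combination of the i.i.d. Gaussian draws $\eta_{j,it}$ and is therefore exactly Gaussian, so
$$E^*|u^*_{jt}|^8 = 105\left[E^*(u^{*2}_{jt})\right]^4 = 105\left[\left(\frac{\tilde{\Lambda}_j'\tilde{\Lambda}_j}{N}\right)^{-2}\frac{1}{N}\sum_{i=1}^{N}\tilde{\lambda}^2_{j,i}\tilde{\varepsilon}^2_{j,it}\right]^4,$$
which is $O_p(1)$ as soon as $\frac{1}{N}\sum_{i=1}^{N}\tilde{\lambda}^2_{j,i}\tilde{\varepsilon}^2_{j,it} = O_p(1)$; the four cases above simply verify this bound term by term.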

Finally, we show that under the alternative hypothesis, the wild bootstrap method satisfies Condition F*, and therefore Proposition 3.1 follows. Under the alternative hypothesis, there is no group common factor in our simple setting, so the single factor that we extract from each group is a group-specific factor. Given that the group common factor, $\hat{f}^c_t$, is estimated using the first group's factor, $\hat{f}_{1t}$, an additional bias term appears only in the second group. In particular, we have
(15)
 
(16)
Note that $\Phi = \operatorname{corr}(f_{1t}, f_{2t})$, and under the alternative hypothesis, the estimated factor loadings in the second group consistently estimate only a portion of the true factor loadings of the second group. Moreover, the residual term in the second group contains the bias term. Using Equations (15) and (16), we can rewrite the term related to the second group in the bootstrap bias $B^*$ as follows:

We can show that all the terms are $O_p(1)$ under Assumptions 2 and 3.
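For intuition, under the normalization $\operatorname{Var}(f_{1t}) = \operatorname{Var}(f_{2t}) = 1$, a schematic reading of Equations (15) and (16) (our paraphrase of the structure described above, not their exact statement) is
$$\tilde{\lambda}_{2,i} \approx \Phi\,\lambda_{2,i}, \qquad \tilde{\varepsilon}_{2,it} \approx \varepsilon_{2,it} + \lambda_{2,i}\left(f_{2t} - \Phi f_{1t}\right),$$
so the second group's loadings capture only the fraction $\Phi$ of the true loadings, while the unexplained component $\lambda_{2,i}(f_{2t} - \Phi f_{1t})$ migrates into the residuals and generates the additional bias term.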

We also need to show that $\frac{1}{\sqrt{T}}\sum_{t=1}^{T}Z^*_{N,t} = O_{p^*}(1)$ under the alternative hypothesis. To show this, it is sufficient to show that $\operatorname{Var}^*\!\left(\frac{1}{\sqrt{T}}\sum_{t=1}^{T}Z^*_{N,t}\right) = O_p(1)$. Under Assumption WB2, we have $\operatorname{Cov}^*(Z^*_{N,t}, Z^*_{N,s}) = 0$ for $t \ne s$, and hence $\operatorname{Var}^*\!\left(\frac{1}{\sqrt{T}}\sum_{t=1}^{T}Z^*_{N,t}\right) = \frac{1}{T}\sum_{t=1}^{T}\operatorname{Var}^*(Z^*_{N,t})$. Since we showed that $\operatorname{Var}^*(Z^*_{N,t}) = E^*(z^{*2}_{1t}) + E^*(z^{*2}_{2t}) + 4E^*(u^{*2}_{1t}u^{*2}_{2t})$ in the proof of Condition E*, where $z^*_{jt} = u^{*2}_{jt} - E^*(u^{*2}_{jt})$, we focus on the three terms $E^*(z^{*2}_{1t})$, $E^*(z^{*2}_{2t})$, and $E^*(u^{*2}_{1t}u^{*2}_{2t})$. We can show that the first term is $O_p(1)$ as follows:
where the second equality follows from $\tilde{\lambda}_{1,i} = \lambda_{1,i} + o_p(1)$ and $\tilde{\varepsilon}_{1,it} = \varepsilon_{1,it} + o_p(1)$, which hold because we use the factor from the first group as the group common factor. Then, we can show that $\frac{1}{T}\sum_{t=1}^{T}E^*(z^{*2}_{1t}) = O_p(1)$ under our assumptions. For the second term, we have
where we use Equations (15) and (16) to obtain the second equality. Using the final equality, we can show that $\frac{1}{T}\sum_{t=1}^{T}E^*(z^{*2}_{2t}) = O_p(1)$. Similarly, we can show that $E^*(u^{*2}_{1t}u^{*2}_{2t}) = O_p(1)$. ▪

References

Andreou, E., P. Gagliardini, E. Ghysels, and M. Rubin. 2019. "Inference in Group Factor Models with an Application to Mixed-Frequency Data." Econometrica 87: 1267–1305.

Andreou, E., P. Gagliardini, E. Ghysels, and M. Rubin. 2022. "Three Common Factors." CEPR Discussion Paper No. 17225. https://ideas.repec.org/p/cpr/ceprdp/17225.html

Andreou, E., P. Gagliardini, E. Ghysels, and M. Rubin. 2024. "Spanning Latent and Observable Factors." Journal of Econometrics, 105743. https://www.sciencedirect.com/science/article/pii/S0304407624000897

Bai, J. 2003. "Inferential Theory for Factor Models of Large Dimensions." Econometrica 71: 135–171.

Bai, J., and S. Ng. 2006. "Confidence Intervals for Diffusion Index Forecasts and Inference for Factor-Augmented Regressions." Econometrica 74: 1133–1150.

Bickel, P. J., and E. Levina. 2008a. "Covariance Regularization by Thresholding." The Annals of Statistics 36: 2577–2604.

Bickel, P. J., and E. Levina. 2008b. "Regularized Estimation of Large Covariance Matrices." The Annals of Statistics 36: 199–227.

Davidson, R., and J. G. MacKinnon. 1999. "The Size Distortion of Bootstrap Tests." Econometric Theory 15: 361–376.

Gonçalves, S., and B. Perron. 2014. "Bootstrapping Factor-Augmented Regression Models." Journal of Econometrics 182: 156–173.

Gonçalves, S., and B. Perron. 2020. "Bootstrapping Factor Models with Cross Sectional Dependence." Journal of Econometrics 218: 476–495.

Gonçalves, S., M. W. McCracken, and B. Perron. 2017. "Tests of Equal Accuracy for Nested Models with Estimated Factors." Journal of Econometrics 198: 231–252. https://www.sciencedirect.com/science/article/pii/S0304407617300180

Kock, A. B., and L. Callot. 2015. "Oracle Inequalities for High Dimensional Vector Autoregressions." Journal of Econometrics 186: 325–344.

Koh, J. 2024. "Bootstrapping Factor-MIDAS Regression Models." Working paper.

Krampe, J., and L. Margaritella. 2021. "Dynamic Factor Models with Sparse VAR Idiosyncratic Components." arXiv preprint arXiv:2112.07149.

Krampe, J., and E. Paparoditis. 2021. "Sparsity Concepts and Estimation Procedures for High-Dimensional Vector Autoregressive Models." Journal of Time Series Analysis 42: 554–579.