Summary

We propose a new and simple framework for dimension reduction in the large p, small n setting. The framework decomposes the data into pieces, thereby enabling existing approaches for n>p to be adapted to n < p problems. Estimating a large covariance matrix, which is a very difficult task, is avoided. We propose two separate paths to implement the framework. Our paths provide sufficient procedures for identifying informative variables via a sequential approach. We illustrate the paths by using sufficient dimension reduction approaches, but the paths are very general. Empirical evidence demonstrates the efficacy of our paths. Additional simulations and applications are given in an on-line supplementary file.

1. Introduction

1.1. Sufficient dimension reduction

Sufficient dimension reduction (SDR) (Li, 1991; Cook, 1994, 1996) is a methodology for reducing the dimension of predictors while preserving the regression relationship with the response. More specifically, let S denote a subspace of Rp, and let PS denote the orthogonal projection onto S. We assume that Y is a scalar response and X is a p × 1 predictor vector. If a subspace S is such that Y and X are independent conditionally on PSX, i.e. Y⨿X|PSX, where '⨿' denotes independence, then PSX can be used as the predictor without loss of regression information. Such subspaces S are called dimension reduction subspaces. The intersection of all such subspaces S, if it itself satisfies the conditional independence, is called the central subspace (CS) (Cook, 1994) and is denoted by SY|X, with its dimensionality denoted by d. Under mild conditions (Cook, 1996; Yin et al., 2008), the CS is well defined and unique.

Many methods have been proposed to estimate SY|X or part of it. Among others, these include inverse approaches, such as sliced inverse regression (SIR) (Li, 1991) and the sliced average variance estimate (Cook and Weisberg, 1991); forward approaches, such as those of Hristache et al. (2001a, 2001b), the minimum average variance estimate (Xia et al., 2002), sliced regression (Wang and Xia, 2008) and Dalalyan et al. (2008); and correlation approaches, such as canonical correlation (Fung et al., 2002), the Kullback–Leibler distance (Yin and Cook, 2005), the Fourier transform (Zhu and Zeng, 2006) and reproducing kernel Hilbert space methods (Fukumizu et al., 2004, 2009).

1.2. Sufficient variable selection

Another issue in modelling data is variable selection. The two terms model selection and variable selection are often used interchangeably in the literature because most approaches are model based. However, there is a distinction between them (Li et al., 2005). We give a formal definition of sufficient variable selection (SVS) that is similar to that of Cook (2004), who proposed methods for testing the significance of variables in the SDR setting.

Definition 1

If there is a p × q matrix α (q ⩽ p), whose columns are unit vectors ej with jth element 1 and all other elements 0, such that Y⨿X|αTX, then the column space of α is called a variable selection space. The intersection of all such spaces, if it itself satisfies the conditional independence condition above, is called the central variable selection space, denoted by SY|XV, with dimension s.

Without loss of generality, we can take α = (Is, 0)T, which is also discussed by Cook (2004). Immediately from this definition, SY|X ⊆ SY|XV and d ⩽ s. Moreover, if SY|X exists, then SY|XV exists and is unique. There are differences between SY|X and SY|XV. For instance, it is possible that the model is not sparse at all, so that a basis of SY|XV is Ip, yet 1 ⩽ d < s: for example, d = 1 with βT = (1,…,1) a basis of SY|X. It is also possible that the model is not sparse, so that a basis of SY|XV is Ip and 1 ⩽ d < s, while the individual directions are sparse: for example, d = 2 with β1T = (1,1,0,…,0) and β2T = (0,0,1,…,1), satisfying β1T + β2T = βT = (1,…,1).

Another difference is that the CS is invariant under affine transformations. However, owing to the covariance matrix Σx, sparsity of SY|XV may not result in sparsity of SY|ZV, or vice versa, i.e. a sparse solution in one scale may correspond to a non-sparse solution in another scale. Thus, estimation in SVS may be even more difficult.

One approach to achieve SVS is to combine SDR with a penalization approach, i.e. first apply an SDR method to estimate SY|X, and then apply a penalization approach to the estimated SY|X to obtain an estimated SY|XV. See Ni et al. (2005), Li and Nachtsheim (2006), Li (2007), Zhou and He (2008), Wang and Yin (2008) and Chen et al. (2010).

1.3. Large p, small n problems

The model-free SDR methods and the SVS methods that combine SDR with a penalization approach have achieved great success, but they fail in the large p, small n setting. This is due to the operational difficulty of calculating a large sample covariance matrix and its inverse. Only limited progress has been made towards overcoming this; see Chiaromonte and Martinelli (2002), Li and Yin (2008), Cook et al. (2007) and Zhu et al. (2010a).

The model-based penalization approaches, such as the lasso (Tibshirani, 1996), smoothly clipped absolute deviation (Fan and Li, 2001) and the Dantzig selector (Candès and Tao, 2007), have been very useful and widely used. However, when the dimension p is much larger than n, as is usually the case with high dimensional data sets in bioinformatics, machine learning and pattern recognition, these methods may not perform well owing to the simultaneous challenges of computational expediency, statistical accuracy and algorithmic stability (Fan et al., 2009). For overviews of the statistical challenges of high dimensionality, see, for example, Donoho (2000) and Fan and Li (2006).

To overcome these problems, Fan and Lv (2008) proposed the sure independence screening (SIS) method to select important variables in ultrahigh dimensional linear models. See also Huang et al. (2008), Fan et al. (2009), Fan and Song (2010) and Hall and Miller (2009). Although screening methods have achieved great success, most of them are model based and restricted to single-index models.

In this paper, we take a completely new approach and propose a novel but simple framework to tackle the large p, small n problem. The framework decomposes the data into different pieces and undertakes sufficient reduction of the data sequentially. We propose two paths. Our paths provide a sufficient way of identifying informative variables by using joint information among the predictors. The proposed paths allow existing methods that have difficulty dealing with large p, small n data to be adapted to overcome this problem. Estimating a large covariance matrix, which is a very difficult task, is avoided. The two paths are very general and provide a novel approach to solving the large p, small n problem. There are many choices for the implementation of the paths. We shall, however, illustrate them by using SDR and SVS approaches.

The paper is organized as follows. In Section 2, we introduce the framework, and then propose path I and path II, followed by discussions on some related issues. In Section 3, we illustrate our paths via simulations and a data analysis. Concluding remarks are given in Section 4. We also provide an on-line supplementary file for additional materials. R code for the algorithms and a sample data set are available from

http://wileyonlinelibrary.com/journal/rss-datasets

2. The proposed framework

Let X  1 and X  2 be random vectors, and R(X  1) be a vector function of X  1. Then, a simple application of proposition 4.6 in Cook (1998) implies the following result.

Proposition 1

Either statement (a) or statement (b) implies statement (c) below:

  • (a) X1⨿(X2,Y)|R(X1);

  • (b) X1⨿X2|{R(X1),Y} and X1⨿Y|R(X1);

  • (c) X1⨿Y|{R(X1),X2}.

Proposition 1 is fundamental and simple. However, it has a very important implication. Suppose that statement (c) holds; then

P(Y|X1,X2) = P(Y|R(X1),X2),    (1)

which implies that the distribution of Y|(X1,X2) is the same as that of Y|{R(X1),X2}. If the dimension of R(X1) is less than that of X1, then we achieve SDR. This interpretation of equation (1) is the key observation for the proposed framework in the case of p ≫ n. Write XT=(X1T,X2T) and select X1 so that its dimension p1 is less than the sample size n and the sample covariance matrix of X1, Σ̂x1, is invertible. Now suppose that we reduce X1 to R(X1). In what follows, consider R(X1) and X2 as the new predictor X, form new X1 and X2, and obtain a new R(X1) as above. Continue in this way until no further reduction is possible. If at any stage the number of reduced variables is already smaller than n, then we do not have to go through this procedure; instead, at this point, treat it as a traditional problem of small p and large n. Thus, the key is to find R(X1) such that statement (c) is satisfied. This can be forced to hold if either statement (a) or statement (b) is true. Therefore, proposition 1 provides a framework that uses the SDR idea as a bridge to overcome the large p, small n problem.
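To make the recursion concrete, here is a minimal R sketch of the sequential scheme. The block size p1, the stopping rule and the toy reducer are illustrative assumptions only; reduce_fn stands in for whatever n > p1 method (for example an SDR estimator) is plugged into the framework.

```r
# Sequential reduction skeleton: repeatedly split X into a leading block X1
# (p1 columns) and the rest X2, reduce X1 to R(X1), and recombine.
sequential_reduce <- function(X, y, p1, reduce_fn) {
  repeat {
    p <- ncol(X)
    if (p <= p1) return(X)          # small enough: stop and treat as a usual n > p problem
    X1 <- X[, 1:p1, drop = FALSE]
    X2 <- X[, (p1 + 1):p, drop = FALSE]
    R1 <- reduce_fn(X1, X2, y)      # returns an n x d1 matrix of reduced variables
    X  <- cbind(R1, X2)             # the reduced block joins the remaining predictors
  }
}

# Toy reducer (for illustration only): keep the single linear combination of X1
# most correlated with y, ignoring X2.
toy_reduce <- function(X1, X2, y) {
  b <- cor(X1, y)
  X1 %*% (b / sqrt(sum(b^2)))
}

set.seed(1)
X <- matrix(rnorm(100 * 500), 100, 500)
y <- X[, 1] + rnorm(100)
Z <- sequential_reduce(X, y, p1 = 50, toy_reduce)
dim(Z)   # far fewer columns than the original 500
```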

For the implementation of the framework proposed, in this paper, we shall explore two paths through statements (a) and (b) in proposition 1. Statement (a) appears to be a natural choice when the response variable is quantitative. We refer to this procedure as path I. Consider (X  2, Y) as a multivariate response; then statement (a) becomes a usual multivariate response problem where any of the existing statistical methods for multivariate responses can be applied. Details about the development of path I are given in Section 2.1. When the response variable is categorical, statement (b) will be the natural choice. We refer to this procedure as path II. Consider X  2 as a multivariate response. Conditioning on Y, the first part of statement (b) is another multivariate response problem, whereas the second part is a usual univariate categorical response problem. The development of path II is given in Section 2.2.

Although we focus on large p, small n problems, the framework can be used when p < n, which opens up two new approaches (paths) for analysing traditional data.

The two paths depend on the choice of p1. Let k = ⌈p/p1⌉, the integer p/p1 rounded up. If p1 = p (and, hence, k = 1), our paths reduce to a traditional approach modelling Y|X directly, which we refer to as a 'one-step' method. If p1 = 1, our paths become testing independence of X1 and (Y, X2) under statement (a), or independence of X1 and X2 given Y together with independence of X1 and Y under statement (b): a 'p-step' method. Note that this p-step method differs from the usual SIS (Fan and Lv, 2008) and related approaches, as those methods use only marginal information about X1 and Y. In general, our k-step approach opens a way to solve multi-index models with n ≪ p, providing a sufficient way to reduce the data.

The proposed paths are very general and do not put any restriction on the form of R(X1). In this paper, we shall use R(X1)=β1TX1, with β1 a matrix for achieving SDR and β1T=(Id1,0) for achieving SVS, respectively. To implement our two paths, we also need methods that can work with multiple responses. There is a huge literature dealing with multiple responses (e.g. Lounici et al. (2009)). We believe that, with suitable modification, many multiresponse methods can be used in our two paths. In this paper, we restrict our attention to SDR approaches that deal with multiple responses.

2.1. Path I

Let XT=(X1T,…,XkT) and R(Xi)=βiTXi, where βi is a matrix. Statement (a) in proposition 1, applied to each block Xi with the remaining blocks and Y treated as (X2, Y), results in

Xi⨿(Y,X1,…,Xi−1,Xi+1,…,Xk)|βiTXi ⇒ Xi⨿Y|(βiTXi,X1,…,Xi−1,Xi+1,…,Xk),  i = 1,…,k.

By setting i = 1, we have X1⨿(Y,X2,…,Xk)|β1TX1 ⇒ X1⨿Y|(β1TX1,X2,…,Xk). Thus, P(Y|X)=P(Y|β1TX1,X2,…,Xk). In general, this leads to P(Y|X)=P(Y|β1TX1,X2,…,Xk)=…=P(Y|β1TX1,β2TX2,…,βkTXk). Let B1=diag(β1,…,βk) be the corresponding block diagonal matrix; then P(Y|X)=P(Y|B1TX), or Y⨿X|B1TX. Hence, the subspace that is spanned by the columns of B1 is an SDR subspace, so SY|X ⊆ span(B1). We take another step from B1TX, if needed. Thus, theoretically, in the population sense, the path will always return a subspace containing the CS. Replacing βiTXi by R(Xi), one can prove the same result for a general function R(X); thus we omit the details.

2.1.1. Projective resampling sliced inverse regression

With R(X1)=β1TX1 and the stacked response YT=(X2T,Y), path I is concerned with the estimation of SY|X, the CS with a multivariate response. Many methods have been developed for estimating such a subspace, for instance, Cook and Setodji (2003), Yin and Bura (2006) and Li et al. (2008). Here, we shall use the projective resampling approach that was developed by Li et al. (2008). More details on the projective resampling approach can be found in the on-line supplementary file.

When applying projective resampling sliced inverse regression (PRSIR) to find the reduced variable β1TX1, we must estimate the dimensionality of β  1. Many methods have been proposed thus far, for instance, the sequential asymptotic test (Li, 1991; Bura and Cook, 2001), the modified Bayes information criterion (Zhu et al., 2006) and the sparse eigendecomposition method (Zhu et al., 2010b). We adopt Li's asymptotic χ  2-test (Li, 1991) in this paper.
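For concreteness, the following self-contained R sketch implements plain SIR on a univariate response together with Li's (1991) sequential χ2-test, in which n times the sum of the p − m smallest eigenvalues is compared with a χ2 distribution on (p − m)(H − m − 1) degrees of freedom. The number of slices H and the significance level are illustrative choices, and this is plain SIR, not the PRSIR estimator itself.

```r
# Sliced inverse regression kernel and Li's (1991) sequential chi-squared test.
sir_fit <- function(X, y, H = 5) {
  n <- nrow(X); p <- ncol(X)
  Xs <- scale(X, center = TRUE, scale = FALSE)
  Sig <- crossprod(Xs) / n
  Rchol <- chol(Sig)
  Z <- Xs %*% solve(Rchol)                    # standardized predictors
  slice <- cut(rank(y, ties.method = "first"), H, labels = FALSE)
  M <- matrix(0, p, p)
  for (h in 1:H) {
    idx <- slice == h
    mh <- colMeans(Z[idx, , drop = FALSE])    # slice mean of standardized X
    M <- M + (sum(idx) / n) * tcrossprod(mh)
  }
  eig <- eigen(M, symmetric = TRUE)
  # Directions in the original X scale are solve(Rchol) %*% eig$vectors[, 1:d].
  list(values = eig$values, vectors = eig$vectors, to_x_scale = solve(Rchol),
       n = n, H = H)
}

# Sequential test of d = m against d > m; returns the estimated dimension.
sir_dim_test <- function(fit, alpha = 0.05) {
  p <- length(fit$values)
  for (m in 0:(p - 1)) {
    stat <- fit$n * sum(fit$values[(m + 1):p])
    df <- (p - m) * (fit$H - m - 1)
    if (df <= 0 || pchisq(stat, df, lower.tail = FALSE) > alpha) return(m)
  }
  p
}
```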

2.1.2. Sufficient variable selection approach

For SVS, path I is to estimate SY|XV. We shall illustrate the estimation of SY|XV by combining PRSIR with penalization approaches. In particular, we adopt Li's (2007) approach. See the on-line supplementary file for details.

2.1.3. Estimation procedure

Having presented all the necessary machinery for the implementation of path I, we now give the explicit estimation procedure. First, order the predictor vector X by a robust marginal utility measure; we use the distance correlation method of Li et al. (2012). The ordered predictor vector is still denoted by X.

  • Step 1:

    decompose X∈Rp into XT=(X1T,X2T), where X1 is a p1×1 vector with n>p1, and consider the problem of estimating X1⨿(X2,Y)|β1TX1.

  • Step 2:

    for the SDR solution, apply PRSIR to the problem of YT=(X2T,Y)|X1, and find the reduced variable β1TX1; for the SVS solution, apply PRSIR incorporated in Li's (2007) approach to the problem of YT=(X2T,Y)|X1, and find the reduced variable β1TX1.

  • Step 3:

    replace predictor X by (β1TX1,X2) and go back to step 1.

Repeat steps 1–3 until all the variables in the original predictor vector X have been used in step 1. A detailed algorithm is given in the on-line supplementary file. Note that our path combines several methods into one procedure so that it can sufficiently recover the informative variables. This may increase the computational complexity, which is discussed in the supplementary file.
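As a rough illustration of step 2, the following R sketch turns the stacked response (X2T, Y) into many univariate pseudo-responses by projecting onto random unit vectors, averages the resulting SIR kernel matrices and keeps the leading directions of X1. This is only a simplified rendering of the projective resampling idea of Li et al. (2008), not their exact estimator; the number of projections m, the number of slices H and the fixed dimension d1 are illustrative assumptions.

```r
# Simplified projective-resampling SIR for step 2 of path I:
# reduce X1 given the stacked multivariate response (X2, Y).
pr_sir_step <- function(X1, X2, y, d1 = 1, m = 200, H = 5) {
  n <- nrow(X1); p1 <- ncol(X1)
  Ystack <- cbind(X2, y)                       # stacked response (X2^T, Y), used as is
  Xs <- scale(X1, center = TRUE, scale = FALSE)
  Rchol <- chol(crossprod(Xs) / n)
  Z <- Xs %*% solve(Rchol)                     # standardized X1
  M <- matrix(0, p1, p1)
  for (j in 1:m) {
    v <- rnorm(ncol(Ystack)); v <- v / sqrt(sum(v^2))    # random unit projection
    u <- as.numeric(Ystack %*% v)                        # univariate pseudo-response
    slice <- cut(rank(u, ties.method = "first"), H, labels = FALSE)
    for (h in 1:H) {
      idx <- slice == h
      mh <- colMeans(Z[idx, , drop = FALSE])
      M <- M + (sum(idx) / n) * tcrossprod(mh) / m       # average the SIR kernels
    }
  }
  eta <- eigen(M, symmetric = TRUE)$vectors[, 1:d1, drop = FALSE]
  beta1 <- solve(Rchol) %*% eta                # back to the original X1 scale
  X1 %*% beta1                                 # the reduced variable beta1^T X1
}
```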

2.2. Path II

We defer the theoretical justification of path II to the on-line supplementary file. Next we describe the procedure of path II.

2.2.1. Partial sufficient dimension reduction

In path II, we must deal with X1⨿X2|(β1TX1,Y) and X1⨿Y|β1TX1. The former is to identify SX2|X1(Y), the partial CS (Chiaromonte et al., 2002) but with multivariate responses, whereas the latter is to identify SY|X1, the usual CS (Cook, 1994, 1996). To the best of our knowledge, the only available multivariate response SDR in the presence of categorical predictors is the projective resampling partial sliced inverse regression (PRPSIR), proposed by Hilafu and Yin (2013). Here, we shall use PRPSIR for estimating SX2|X1(Y); see the on-line supplementary file for more details on this method. For estimating SY|X1, we simply adopt SIR (Li, 1991).

2.2.2. Partial sufficient variable selection

With R(X1)=β1TX1 where β1T=(Id1,0), and YT=(X2T,Y), path II seeks to estimate SY|XV. However, in path II we must deal with two parts: SX2|X1(Y) and SY|X1. The estimation of SX2|X1(Y) becomes a new estimation problem of partial sufficient variable selection, which is parallel to what we described in Section 1.2. For this, we adopt PRPSIR (Hilafu and Yin, 2013) incorporated in Li (2007). To estimate SY|X1V, we use SIR (Li, 1991) incorporated in Li (2007).

2.2.3. Estimation procedure

Our estimation procedure of path II is as follows.

  • Step 1:

    decompose X∈Rp into XT=(X1T,X2T), where X1 is a p1×1 vector with n>p1, and consider the problems of estimating SX2|X1(Y) and SY|X1.

  • Step 2:

    for the SDR solution, apply PRPSIR to the problem of SX2|X1(Y), and find the reduced variable α1TX1; for the SVS solution, apply PRPSIR incorporated in Li's (2007) approach to the problem of SX2|X1(Y), and find the reduced variable α1TX1.

  • Step 3:

    for the SDR solution, apply SIR to the problem of SY|X1, and find the reduced variable α2TX1; for the SVS solution, apply SIR incorporated in Li's (2007) approach to the problem of SY|X1, and find the reduced variable α2TX1.

  • Step 4:

    set β  1 = (α  1, α  2), replace predictor X by (β1TX1,X2) and go back to step 1.

When we combine α1 and α2 to obtain β1, we may simply use the singular value decomposition to remove redundant directions in case α1 and α2 have common estimated directions, though other refined methods could be used. We repeat steps 1–4 until all the variables in the original predictor vector X have been used in step 1. A detailed algorithm is given in the on-line supplementary file.
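A small R sketch of this combination step (the tolerance used to declare a singular value negligible is an arbitrary choice):

```r
# Combine the two estimated direction sets and remove redundant directions
# via a singular value decomposition of the concatenated matrix.
combine_directions <- function(alpha1, alpha2, tol = 1e-8) {
  B <- cbind(alpha1, alpha2)
  sv <- svd(B)
  keep <- sv$d > tol * sv$d[1]      # drop directions already spanned by the others
  sv$u[, keep, drop = FALSE]        # orthonormal basis for span(alpha1, alpha2)
}

# Example: the second column of alpha2 duplicates a direction in alpha1.
alpha1 <- cbind(c(1, 0, 0, 0), c(0, 1, 0, 0))
alpha2 <- cbind(c(0, 0, 1, 0), c(1, 0, 0, 0))
beta1 <- combine_directions(alpha1, alpha2)
ncol(beta1)   # 3, not 4
```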

2.3. Effective use of path I and path II

Our two paths may depend on how the predictors are grouped. Grouping is determined by two factors: the choice of p  1 and the order of X.

For path II, we find that, if we try different partitions of the original predictor set, the solutions are very stable, i.e. changing p1 within a reasonable range (for which the selected method works well) does not affect the results (see the on-line supplementary file). A different ordering of the predictor vector does not change the results significantly either. Therefore, path II is quite stable. The reason is that the second part of statement (b), which deals with the marginal relationship between X1 and Y, plays an important role in retaining informative variables. In proposition 1, the second part of statement (b) indicates that marginally dependent variables will be in the sufficient part of statement (c). Therefore, marginal dependence may be critical and, regardless of the order of the variables, path II will use this information.

For path I, the order of the predictors is important, whereas the choice of p1 is much less critical (see the empirical evidence in the on-line supplementary file). If we spread the informative variables out over different blocks, then the signal in each block may be weak, making the informative variables difficult to recover: when we condition on one block, the informative variables in the other blocks increase the magnitude of the error in the analysis at this step. Thus, in theory, we would like to put relevant variables together. But we do not know which variables are relevant. Therefore, we borrow strength from the second part of statement (b) in path II and devise an ordering procedure, using marginal utility measures to put variables that are informative in the marginal sense together. Ordering using marginal information is important because the marginally dependent variables are in the sufficient set of statement (c).
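For concreteness, here is a self-contained R sketch of such an ordering based on the sample distance correlation, implemented directly (the dcor function of the energy package could be used instead). Predictors are ranked from the weakest to the strongest marginal association, so that the marginally informative variables end up together at the end of the ordered vector.

```r
# Sample distance correlation between a single predictor x and the response y.
dcor1 <- function(x, y) {
  dcenter <- function(v) {
    D <- as.matrix(dist(v))
    sweep(sweep(D, 1, rowMeans(D)), 2, colMeans(D)) + mean(D)   # double centring
  }
  A <- dcenter(x); B <- dcenter(y)
  dcov <- sqrt(mean(A * B))
  dvar <- sqrt(sqrt(mean(A * A)) * sqrt(mean(B * B)))
  if (dvar == 0) 0 else dcov / dvar
}

# Reorder the columns of X from weakest to strongest marginal association with y.
order_by_dcor <- function(X, y) {
  scores <- apply(X, 2, dcor1, y = y)
  X[, order(scores), drop = FALSE]
}
```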

For path I, it is desirable to put potentially informative variables towards the end. If we put them at the front (in the first block), they could be more difficult to recover, because at that point more noise variables are in the response. This, after some steps, potentially adds noise variables to the estimate, diluting its accuracy. However, when the potentially informative variables are at the end, the method will probably first discard irrelevant variables (reducing noise) and then, towards the end, recover the important variables with less noise and more accurate estimates. This, unfortunately, means that path I relies on the marginal order of the predictors. However, a possible solution is to turn the response into a categorical variable (by slicing) and to use path II.

Although the choice of p1 is less critical, we should choose p1 to be as large as possible while the method still works well in a traditional setting (i.e. n>p1). For instance, for SIR (Li, 1991), p1 can run from 2 to n−5 (Cook et al., 2012). This is because, if a model is abundant, a larger p1 will reduce the magnitude of the conditional error, i.e. the variance of Y|X, and strengthen the signals. If a model is sparse, a larger p1 still strengthens the signal by reducing the dimension of the long response vector. Consequently, if a one-step solution is available, i.e. p1 = p works, then this is preferred.

In conclusion, to use the two paths effectively, we suggest that one should pick a larger p  1 for a method that works well in a traditional setting (n>p  1), and use it as the block size. For path I, order the predictors by using a marginal utility measure (e.g. Zhu et al. (2011) and Li et al. (2012)) by ranking from the weakest association to the strongest association, whereas, for path II, no particular ordering is necessary. Nevertheless, as a referee pointed out, the stability, or, say, the difference between two estimates from two different partitions, is difficult to quantify rigorously. Although we agree with this, we borrow the ensemble idea from Yin and Li (2011) to provide a partial remedy (see limited evidence in the on-line supplementary file).

2.4. Theoretical properties

Theoretical properties, such as the asymptotic properties that a referee pointed out, are difficult to establish for our sequential procedure, unlike for most studies of large p, small n problems, which offer a one-step solution. Nevertheless, we offer some explanations of the difficulties that are posed by the sequential approach, but we leave it largely as an open problem. To our limited knowledge, we can consider three cases: A1, s ⩽ p < n, where p is fixed and n → ∞; A2, s < n < p, with n and p → ∞; A3, n < s = αp for some α ∈ (0,1], with p → ∞.

Asymptotic results for case A1 can be obtained straightforwardly following from Li et al. (2008) as described in the on-line supplementary file.

Let p  1 and q  1 be the respective dimensions of X  1 and (Y, X  2). Consider the simplest case of the two-step solution. Note that p  1 + q  1 = p + 1; when p → ∞, we can have three different scenarios:

  • a.

    q  1 fixed but p  1→∞;

  • b.

    q  1→∞, but p  1 fixed;

  • c.

    p  1, q  1→∞.

For scenario (a), because q  1 is fixed, by a suitable adaptation of the projective resampling idea by Li et al. (2008), we can convert scenario (a) into a univariate response and adapt the one-step approach of either A2 where Wu and Li (2011) developed results, or A3 where Cook et al. (2012) developed results. Although the idea seems straightforward, the technicalities are still difficult as this is a two-step solution combining projective resampling, and A2 or A3. It is even more difficult to deal with scenarios (b) and (c) because they bring a new situation where the dimension of the response tends to ∞. Although we can use the projective resampling idea to convert the response vector into a univariate response, we do not know yet what conditions to put on the response before we study the properties for scenario (b) and even more so for scenario (c), where the predictor vector tends to ∞ as well.

We believe that each set-up can result in an independent research paper, if results can be obtained. Owing to such difficulties and fresh challenges, it is best to leave a complete asymptotic study when n and p tend to ∞ as a future research topic.

3. Simulations and applications

In this section, we assess the efficacy of the proposed paths through simulations and application to real data.

3.1. Simulations

In the simulations, for an estimate B̂ of B0, both assumed to be semiorthogonal without loss of generality, the estimation error is measured by the distance between the subspaces spanned by these matrices, Δf(B̂, B0) = ‖PB̂ − PB0‖, where ‖·‖ is the Frobenius norm (Li et al., 2005). The corresponding benchmark distance in this paper is the average of 10000 simulated distances between two random matrices, following Li et al. (2008). Another measure is the absolute correlation coefficient |r| between the true sufficient predictor and its estimate, whose benchmark is the average over 10000 simulated values of |r|, each calculated by randomly simulating the direction and the data vector.
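A direct R translation of this distance, assuming the bases are supplied as p × d matrices of full column rank:

```r
# Frobenius-norm distance between the subspaces spanned by B_hat and B0.
delta_f <- function(B_hat, B0) {
  proj <- function(B) B %*% solve(crossprod(B)) %*% t(B)   # projection onto span(B)
  norm(proj(B_hat) - proj(B0), type = "F")
}

# Two unrelated random directions in p = 1000 give a distance close to sqrt(2),
# consistent with the benchmark value 1.4138 quoted below Table 1.
set.seed(1)
B0 <- qr.Q(qr(matrix(rnorm(1000), 1000, 1)))
Bh <- qr.Q(qr(matrix(rnorm(1000), 1000, 1)))
delta_f(Bh, B0)
```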

The accuracy of sparsity recovery is measured by the true positive rate TPR, defined as the ratio of the number of correctly identified active predictors to the number of truly active predictors, and the false positive rate FPR, defined as the ratio of the number of falsely identified active predictors to the total number of inactive predictors. The measures TPR and FPR are also known as the sensitivity and 1 − specificity and, ideally, we wish TPR to be close to 1 and FPR to be close to 0 at the same time.
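These two rates reduce to simple comparisons of index sets; a two-function R version, where selected and active are integer vectors of predictor positions and p is the total number of predictors:

```r
# True and false positive rates for a selected set of predictor indices.
tpr <- function(selected, active) length(intersect(selected, active)) / length(active)
fpr <- function(selected, active, p) length(setdiff(selected, active)) / (p - length(active))
```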

For each model setting, 100 replicates of the data are generated, unless stated otherwise. In each model we consider three cases: case 1, the xi and ɛ are independent and identically distributed N(0, 1); case 2, the predictor xi is generated as xi = Σx^{1/2}W, where Σx is a positive definite matrix with (j1, j2)th entry 0.5^{|j1−j2|} and W is N(0, Ip), and the ɛ are independent and identically distributed N(0, 1), independent of the xi; case 3, the predictor xi is generated as xi = Σx^{1/2}W, where Σx is a positive definite matrix with (j1, j2)th entry 0.9^{|j1−j2|} and W is N(0, Ip), and the ɛ are independent and identically distributed N(0, 1), independent of the xi.
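As an illustration of the predictor designs, a short R sketch that draws case 2 predictors and errors (case 3 changes 0.5 to 0.9, and case 1 corresponds to the identity covariance); the default n and p are the model 1 settings:

```r
# Generate n observations of a p-dimensional predictor with
# cov(x_j1, x_j2) = rho^|j1 - j2| (rho = 0.5 for case 2), plus i.i.d. N(0,1) errors.
gen_case2 <- function(n = 200, p = 1000, rho = 0.5) {
  Sigma <- rho^abs(outer(1:p, 1:p, "-"))
  W <- matrix(rnorm(n * p), n, p)
  X <- W %*% chol(Sigma)          # rows of X then have covariance Sigma
  eps <- rnorm(n)
  list(X = X, eps = eps)
}
```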

In our simulations, we use a fixed p1 = 20 (the dimension of X1) in each step. In the PRSIR step we use m = 2000, and in the PRPSIR step we take m = 500 to save some computing time. In addition, when we compare the accuracy of the estimates, in each step we retain only the directions corresponding to a non-zero estimated dimension, i.e., if the estimated dimension is 0, we delete all the variables that are used as predictors in this step; if it is bigger than 0, we retain either the true d (path I) or the actual estimated dimensions (path II). When we assess the accuracy of the estimated dimensions, we use the actual estimated dimension in each step. For testing the dimensionality in both paths, the χ2-test (Li, 1991) is set at α = 0.05, unless otherwise stated. In the tables reported, 'non-sparse estimate' means that we used only the SDR method in the paths, whereas 'sparse estimate' means that we used the SDR method with the penalization approach.

We study the following two models: model 1 is a sparse non-linear example for path I with continuous response, whereas model 2 is a sparse example for path II with discrete response but two dimensional.

3.1.1. Model 1

Model 1 is the sparse non-linear model
with p = 1000, n = 200 and the number of active variables s = 4.

Table 1 reports the results for model 1. As indicated in the last column, the χ  2-test tends to overestimate the dimension. But TPR and FPR are very accurate for all three cases, indicating that the procedure does a good job of identifying the active predictors. For independent (case 1) and moderately correlated data (case 2), the non-sparse solutions are very accurate, in terms of both Δf and |r|. When the predictors are highly correlated (case 3), the estimation is less accurate as expected, as PRSIR is not specially devised to deal with such data. The sparse solutions for all cases are very reasonable, in terms of Δf and |r|, but are generally inferior to non-sparse estimates. This agrees with the general fact that sparse estimates are good in selecting variables but tend to induce bias in estimation.

Table 1. Accuracy for model 1

Case | Non-sparse Δf | Non-sparse |r| | Sparse Δf | Sparse |r| | TPR | FPR | % of estimated d
1 | 0.2482 (0.0552) | 0.9852 (0.0068) | 0.5721 (0.0915) | 0.9081 (0.0458) | 1 | 0.0244 | 8 (d = 1), 63 (d = 2), 29 (d = 3)
2 | 0.3642 (0.0792) | 0.9862 (0.0066) | 0.8744 (0.2580) | 0.8280 (0.1165) | 0.9975 | 0.0241 | 12 (d = 1), 53 (d = 2), 35 (d = 3)
3 | 1.0032 (0.1399) | 0.9749 (0.0121) | 1.3078 (0.1236) | 0.7545 (0.1861) | 1 | 0.1277 | 7 (d = 1), 51 (d = 2), 42 (d = 3)

For Δf and |r|, the results reported are the average over 100 replicates; the values in parentheses are the standard errors. Benchmark Δf = 1.4138; |r| = 0.0442 for case 1, 0.0468 for case 2 and 0.0679 for case 3. The χ2-test is at α = 0.05.


3.1.2. Model 2

Model 2 is the sparse categorical two-dimensional model
where β1T=(1,1,1,1,0,…,0) and β2T=(0,0,0,0,0,0,1,1,1,1,0,…,0), I(·) is an indicator function, p = 1000 and n = 200. This model is an example from Zhu and Zeng (2006). The response variable Y takes four values: 0, 1, 2 or 3.

Table 2 reports the results for model 2. This is a sparse model but two dimensional. The estimated dimensions are very accurate as indicated in the last column. Again TPR and FPR for all three cases show the efficacy of path II. As expected, when the dimension increases, the estimation accuracy in terms of both Δf and |r| decreases.

Table 2. Accuracy for model 2

Case | Non-sparse Δf | Non-sparse |r| | Sparse Δf | Sparse |r| | TPR | FPR | % of estimated d
1 | 1.5626 (0.0894) | 0.4253/0.6518 (0.1444/0.2106), 0.3687/0.4922 (0.1732/0.1696) | 1.6126 (0.1009) | 0.4011/0.5132 (0.1252/0.1963), 0.2811/0.4232 (0.2311/0.1423) | 0.9887 | 0.1750 | 97 (d = 2), 3 (d = 3)
2 | 1.6582 (0.0912) | 0.3779/0.7343 (0.1144/0.1591), 0.4736/0.4351 (0.1424/0.1504) | 1.878 (0.1207) | 0.2464/0.4053 (0.1267/0.1940), 0.4078/0.5318 (0.1119/0.1452) | 1 | 0.1676 | 2 (d = 1), 92 (d = 2),
3 | 1.8427 (0.0822) | 0.3545/0.6032 (0.1918/0.1580), 0.4112/0.4559 (0.1424/0.1120) | 1.9268 (0.0820) | 0.2312/0.3901 (0.0967/0.2146), 0.4117/0.3618 (0.1212/0.1124) | 1 | 0.2172 | 3 (d = 1), 94 (d = 2),

For Δf and |r|, the results reported are the average over 100 replicates; the values in parentheses are the standard errors. Benchmark Δf = 1.9993; |r| = 0.0595 for case 1, 0.0607 for case 2 and 0.0716 for case 3. The χ2-test is at α = 0.05.


In all our simulations, we report the ‘raw’ results from the path once it finishes the first cycle. These reported ‘raw’ results already seem sufficiently accurate. However, re-estimation may be used to improve the results, i.e., once the path stops after the first cycle, we use the remaining active variables to obtain another solution. An illustration of our approach for an abundant model and a comparison of our approach with the one-step estimation method can be found in the on-line supplementary file.

3.2. Leukaemia data

In this section, we shall use path II to analyse a leukaemia data set, which comes from high density Affymetrix oligonucleotide arrays. This data set was first analysed by Golub et al. (1999), and later by Chiaromonte and Martinelli (2002), Dudoit et al. (2002) and Fan and Fan (2008), among others. The data are available from http://www.broadinstitute.org/cgi-bin/cancer/datasets.cgi. There are 72 samples from two classes of acute leukaemia: 47 in acute lymphoblastic leukaemia and 25 in acute myeloid leukaemia. There are 7129 genes. Before we apply our path II approach to the data, we preprocess them following Golub et al. (1999). Three preprocessing steps were applied:

  • a.

    thresholding, gene expression readings of 100 or fewer were set to 100 and expression readings of 16000 or more were set to 16000;

  • b.

    screening, only genes with max − min > 500 and max/min > 5 were included, where max and min refer to the maximum and minimum readings of a gene expression among the 72 samples respectively;

  • c.

    transformation, gene expression readings of the genes selected were log-transformed.

The data were then summarized by a 72 × 3571 matrix X = (x  ij), where x  ij denotes the expression level for gene j in messenger ribonucleic acid sample i. We further standardized the data so that observations have mean 0 and variance 1 across variables (genes).
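The three preprocessing steps and the standardization translate into a few lines of R; here expr is assumed to be the raw 72 × 7129 expression matrix with samples in rows, and standardizing each gene column is one common reading of the standardization described above:

```r
# Preprocess the raw 72 x 7129 leukaemia expression matrix (samples in rows).
preprocess_leukaemia <- function(expr) {
  expr <- pmin(pmax(expr, 100), 16000)              # (a) floor at 100, cap at 16000
  mx <- apply(expr, 2, max); mn <- apply(expr, 2, min)
  expr <- expr[, mx - mn > 500 & mx / mn > 5]       # (b) keep genes with max - min > 500 and max/min > 5
  scale(log10(expr))                                # (c) log-transform, then standardize each gene
}
```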

In the application of our method, we run the SIR and PRPSIR steps and form the estimated space by combining directions from these steps, if any. At each step, we use Li's χ  2-test (Li, 1991) to determine the structural dimension. Because of the computational cost, tuning parameters for the sparse SDR estimation of Li (2007) are chosen only at the outset, the first step, and are applied to the remaining steps. We set the number of predictors (genes) that are considered at each step to be 20 (p  1 = 20), and the number of slices to be 5.

3.2.1. Initial stage estimation

When path II stops, the initial estimate has dimension 1 and it contains 28 non-zero coefficients—it selects 28 genes. Fig. 1(a) shows a boxplot of the sparse estimate of the direction. Fig. 1(b) presents a boxplot of the non-sparse estimate of the direction with all the selected 28 genes by refitting the 72 × 28 data.

Fig. 1. Sufficient summary plot for the leukaemia data with two categories

3.2.2. Refined stage estimation

In the refined stage we form a new data matrix of size 72 × 28. Since now n = 72 and p = 28, we can simply apply the usual SIR method with the shrinkage procedure applied. The estimate contains 11 non-zero coefficients; it selects 11 genes. Fig. 1(c) shows a boxplot of the sufficient predictor that is formed by using this sparse estimate of the direction. Fig. 1(d) presents a boxplot of the sufficient predictor from a non-sparse estimate using only the 11 selected genes, by refitting the 72 × 11 data. There is no further reduction from the 11 genes, and thus we have the final estimate of 11 genes. Fig. 1(d) is our final summary plot, which clearly shows the separation of the two groups.

Since we used SIR and its variations, we would like to check the conditional normality of X|Y. Note that these methods require a linearity condition (Cook, 1998), which is implied by conditional normality. We use the Shapiro–Wilk test for normality, with p-values of 0.4569 or greater, and the Bartlett test for constant variance, with p-value 0.2955. These results and a QQ-plot (not reported here) suggest that a conditional normality assumption is very reasonable.
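Such checks are straightforward with the base R functions shapiro.test and bartlett.test; a sketch applied gene by gene, where X is assumed to be the reduced 72 × 11 data matrix and y the two-class label:

```r
# Check conditional normality of X | Y: Shapiro-Wilk within each class for every gene,
# and Bartlett's test of equal variances across the two classes.
check_conditional_normality <- function(X, y) {
  shapiro_p  <- apply(X, 2, function(x) min(tapply(x, y, function(v) shapiro.test(v)$p.value)))
  bartlett_p <- apply(X, 2, function(x) bartlett.test(x, y)$p.value)
  c(min_shapiro = min(shapiro_p), min_bartlett = min(bartlett_p))
}
```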

In this two-group analysis, re-estimation using the reduced variables is applied; the improvement that it brings supports our earlier comments in Section 3.1.

4. Discussion

Analysis of large p, small n data is important but difficult, for at least two reasons: the sample covariance matrix is not invertible, and manipulating a very large predictor vector is demanding. In this paper we proposed a simple framework to tackle large p, small n problems. The framework differs from the usual methodologies in that it decomposes the predictor vector into pieces, so that inverting the sample covariance matrix is not an issue and a large response vector can be handled. It provides a sufficient way to solve the problem. We illustrated the framework via two paths using SDR approaches. However, the framework does not have to be restricted to our chosen approaches: many methods that work for n>p data can be adopted in the framework, as long as a multiple-response vector can be handled. Hence, it opens up possible research directions for large p, small n problems. However, the asymptotic properties of the framework when p → ∞, extended from the one-step solution, are not yet clear. In addition, differences between solutions from different partitions of the predictor vector may be difficult to quantify rigorously, though an ensemble idea can provide a promising solution. We leave these as open problems.

Acknowledgements

The authors thank the Joint Editor, an Associate Editor and two referees for their valuable comments, which greatly improved the paper. This work is supported in part by National Science Foundation grant DMS-1205564.

References

1. Bura, E. and Cook, R. D. (2001) Extending sliced inverse regression: the weighted chi-squared test. J. Am. Statist. Ass., 96, 996–1003.
2. Candès, E. and Tao, T. (2007) The Dantzig selector: statistical estimation when p is much larger than n. Ann. Statist., 35, 2313–2351.
3. Chen, X., Zou, C. and Cook, R. D. (2010) Coordinate-independent sparse sufficient dimension reduction and variable selection. Ann. Statist., 38, 3696–3723.
4. Chiaromonte, F., Cook, R. D. and Li, B. (2002) Sufficient dimension reduction in regression with categorical predictors. Ann. Statist., 30, 475–497.
5. Chiaromonte, F. and Martinelli, J. (2002) Dimension reduction strategies for analyzing global gene expression data with a response. Math. Biosci., 176, 123–144.
6. Cook, R. D. (1994) On the interpretation of regression plots. J. Am. Statist. Ass., 89, 177–190.
7. Cook, R. D. (1996) Graphics for regressions with a binary response. J. Am. Statist. Ass., 91, 983–992.
8. Cook, R. D. (1998) Regression Graphics: Ideas for Studying Regressions through Graphics. New York: Wiley.
9. Cook, R. D. (2004) Testing predictor contributions in sufficient dimension reduction. Ann. Statist., 32, 1062–1092.
10. Cook, R. D., Forzani, L. and Rothman, A. (2012) Estimating sufficient reductions of the predictors in abundant high-dimensional regressions. Ann. Statist., 40, 353–384.
11. Cook, R. D., Li, B. and Chiaromonte, F. (2007) Dimension reduction in regression without matrix inversion. Biometrika, 94, 569–584.
12. Cook, R. D. and Setodji, C. (2003) A model-free test for reduced rank in multivariate regression. J. Am. Statist. Ass., 98, 340–351.
13. Cook, R. D. and Weisberg, S. (1991) Discussion of Li (1991). J. Am. Statist. Ass., 86, 328–332.
14. Dalalyan, A., Juditsky, A. and Spokoiny, V. (2008) A new algorithm for estimating the effective dimension-reduction subspace. J. Mach. Learn. Res., 9, 1647–1678.
15. Donoho, D. L. (2000) High-dimensional data analysis: the curses and blessings of dimensionality. American Mathematical Society Conf. Math Challenges of the 21st Century.
16. Dudoit, S., Fridlyand, J. and Speed, T. P. (2002) Comparison of discrimination methods for the classification of tumors using gene expression data. J. Am. Statist. Ass., 97, 77–87.
17. Fan, J. and Fan, Y. (2008) High-dimensional classification using features annealed independence rules. Ann. Statist., 36, 2605–2637.
18. Fan, J. and Li, R. (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Statist. Ass., 96, 1348–1360.
19. Fan, J. and Li, R. (2006) Statistical challenges with high dimensionality: feature selection in knowledge discovery. In Proc. Int. Congr. Mathematicians (eds M. Sanz-Sole, J. Soria, J. L. Varona and J. Verdera), vol. III, pp. 595–622. Freiburg: European Mathematical Society.
20. Fan, J. and Lv, J. (2008) Sure independence screening for ultrahigh dimensional feature space (with discussion). J. R. Statist. Soc. B, 70, 849–911.
21. Fan, J., Samworth, R. and Wu, Y. (2009) Ultrahigh dimensional feature selection: beyond the linear model. J. Mach. Learn. Res., 10, 2013–2038.
22. Fan, J. and Song, R. (2010) Sure independence screening in generalized linear models with NP-dimensionality. Ann. Statist., 38, 3567–3604.
23. Fukumizu, K., Bach, F. and Jordan, M. (2004) Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces. J. Mach. Learn. Res., 5, 73–99.
24. Fukumizu, K., Bach, F. and Jordan, M. (2009) Kernel dimension reduction in regression. Ann. Statist., 37, 1871–1905.
25. Fung, W. K., He, X., Liu, L. and Shi, P. (2002) Dimension reduction based on canonical correlation. Statist. Sin., 12, 1093–1113.
26. Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M. A., Bloomfield, C. D. and Lander, E. S. (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286, 531–537.
27. Hall, P. and Miller, H. (2009) Using generalised correlation to effect variable selection in very high dimensional problems. J. Computnl Graph. Statist., 18, 533–550.
28. Hilafu, H. and Yin, X. (2013) Sufficient dimension reduction and statistical modeling of plasma concentrations. Computnl Statist. Data Anal., 63, 139–147.
29. Hristache, M., Juditsky, A., Polzehl, J. and Spokoiny, V. (2001a) Structure adaptive approach for dimension reduction. Ann. Statist., 29, 1537–1566.
30. Hristache, M., Juditsky, A. and Spokoiny, V. (2001b) Direct estimation of the index coefficient in a single-index model. Ann. Statist., 29, 595–623.
31. Huang, J., Horowitz, J. and Ma, S. (2008) Asymptotic properties of bridge estimators in sparse high-dimensional regression models. Ann. Statist., 36, 587–613.
32. Li, K.-C. (1991) Sliced inverse regression for dimension reduction (with discussion). J. Am. Statist. Ass., 86, 316–342.
33. Li, L. (2007) Sparse sufficient dimension reduction. Biometrika, 94, 603–613.
34. Li, L., Cook, R. D. and Nachtsheim, C. J. (2005) Model-free variable selection. J. R. Statist. Soc. B, 67, 285–299.
35. Li, L. and Nachtsheim, C. J. (2006) Sparse sliced inverse regression. Technometrics, 48, 503–510.
36. Li, B., Wen, S. and Zhu, L. (2008) On a projective resampling method for dimension reduction with multivariate responses. J. Am. Statist. Ass., 103, 1177–1186.
37. Li, L. and Yin, X. (2008) Sliced inverse regression with regularizations. Biometrics, 64, 124–131.
38. Li, B., Zha, H. and Chiaromonte, F. (2005) Contour regression: a general approach to dimension reduction. Ann. Statist., 33, 1580–1616.
39. Li, R., Zhong, W. and Zhu, L.-P. (2012) Feature screening via distance correlation learning. J. Am. Statist. Ass., 107, 1129–1139.
40. Lounici, K., Pontil, M., Tsybakov, A. and van de Geer, S. (2009) Taking advantage of sparsity in multitask learning. In Proc. Conf. Learning Theory, Montréal, June 18th–21st.
41. Ni, L., Cook, R. D. and Tsai, C. L. (2005) A note on shrinkage sliced inverse regression. Biometrika, 92, 242–247.
42. Tibshirani, R. (1996) Regression shrinkage and selection via the lasso. J. R. Statist. Soc. B, 58, 267–288.
43. Wang, H. and Xia, Y. (2008) Sliced regression for dimension reduction. J. Am. Statist. Ass., 103, 811–821.
44. Wang, Q. and Yin, X. (2008) A nonlinear multi-dimensional variable selection method for high dimensional data: sparse MAVE. Computnl Statist. Data Anal., 52, 4512–4520.
45. Wu, Y. and Li, L. (2011) Asymptotic properties of sufficient dimension reduction with a diverging number of predictors. Statist. Sin., 21, 707–730.
46. Xia, Y., Tong, H., Li, W. K. and Zhu, L.-X. (2002) An adaptive estimation of dimension reduction space (with discussion). J. R. Statist. Soc. B, 64, 363–410.
47. Yin, X. and Bura, E. (2006) Moment based dimension reduction for multivariate response regression. J. Statist. Planng Inf., 136, 3675–3688.
48. Yin, X. and Cook, R. D. (2005) Direction estimation in single-index regressions. Biometrika, 92, 371–384.
49. Yin, X. and Li, B. (2011) Sufficient dimension reduction based on an ensemble of minimum average variance estimators. Ann. Statist., 39, 3392–3416.
50. Yin, X., Li, B. and Cook, R. D. (2008) Successive direction extraction for estimating the central subspace in a multiple-index regression. J. Multiv. Anal., 99, 1733–1757.
51. Zhou, J. and He, X. (2008) Dimension reduction based on constrained canonical correlation and variable filtering. Ann. Statist., 36, 1649–1668.
52. Zhu, L.-P., Li, L., Li, R. and Zhu, L. (2011) Model-free feature screening for ultrahigh-dimensional data. J. Am. Statist. Ass., 106, 1464–1475.
53. Zhu, L., Miao, B. and Peng, H. (2006) On sliced inverse regression with large dimensional covariates. J. Am. Statist. Ass., 101, 630–643.
54. Zhu, L.-P., Yin, X. and Zhu, L. (2010a) Dimension reduction for correlated data: an alternating inverse regression. J. Computnl Graph. Statist., 19, 887–899.
55. Zhu, L.-P., Yu, Z. and Zhu, L. (2010b) A sparse eigen-decomposition estimation in semi-parametric regression. Computnl Statist. Data Anal., 54, 976–986.
56. Zhu, Y. and Zeng, P. (2006) Fourier methods for estimating the central subspace and the central mean subspace in regression. J. Am. Statist. Ass., 101, 1638–1651.
