SurvJamda: an R package to predict patients' survival and risk assessment using joint analysis of microarray gene expression data

Abstract

Summary: SurvJamda (Survival prediction by joint analysis of microarray data) is an R package that uses joint analysis of microarray gene expression data to predict patient survival and assess risk. Joint analysis can be performed by merging datasets or by meta-analysis, in order to increase the sample size and improve survival prognosis. The prognostic performance derived from the combined datasets can be assessed to determine which feature selection approach, joint analysis method and bias estimation technique provide the most robust prognosis for a given set of datasets.

Availability: The survJamda package is available at the Comprehensive R Archive Network, http://cran.r-project.org.

Contact: [email protected]

1 INTRODUCTION

The survJamda package was developed for survival prediction and risk assessment based on microarray data. It allows datasets to be analyzed jointly through data merging or meta-analysis. Data merging combines the data into one set prior to analysis, whereas meta-analysis integrates only the results. In addition to different joint analysis methods, survJamda contains various feature selection approaches and bias estimation techniques, which enable the user to determine which combination of methods provides the most robust prediction for a given set of datasets.

A few other R packages, such as ipdmeta (Broeze et al., 2009) and survcomp (Haibe-Kains et al., 2008), have been created for joint analysis; they relate more specifically to meta-analysis of censored data with time-to-event outcomes.

2 DATA

The functions and algorithms developed in survJamda can be assessed on the datasets of the survJamda.data package. SurvJamda.data, created by the author, is a data package of 18 Mb containing three breast cancer datasets, GSE3143, GSE1992 and GSE4335, which were analyzed in Yasrebi et al. (2009). SurvJamda.data is also available on the Comprehensive R Archive Network.

3 METHODS

3.1 Feature selection

Top-ranking (Yasrebi et al., 2009). The multiple hypothesis testing correction implemented in the p.adjust function in the R stats package can also be applied to the top-ranking method.
User-defined method.
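
As a rough illustration of how a multiple testing correction can be combined with top-ranking, the sketch below ranks genes by a per-gene p-value and optionally filters them by their Benjamini-Hochberg adjusted value, which is one of the corrections offered by p.adjust. It is written in Python for illustration only and is not the package's implementation; the function names (bh_adjust, top_ranking), the dictionary input and the assumption that each gene already has a p-value (for instance from a univariate survival model) are all hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, in the original order.

    Intended to mirror the "BH" correction; this is an illustrative
    re-implementation, not the code used by the package.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def top_ranking(pvalues_by_gene, n_top, alpha=None):
    """Return up to n_top genes with the smallest p-values.

    If alpha is given, only genes whose adjusted p-value is <= alpha are kept.
    """
    genes = list(pvalues_by_gene)
    raw = [pvalues_by_gene[g] for g in genes]
    adjusted = dict(zip(genes, bh_adjust(raw)))
    ranked = sorted(genes, key=lambda g: pvalues_by_gene[g])
    if alpha is not None:
        ranked = [g for g in ranked if adjusted[g] <= alpha]
    return ranked[:n_top]


if __name__ == "__main__":
    pvals = {"geneA": 0.01, "geneB": 0.04, "geneC": 0.03, "geneD": 0.005}
    print(top_ranking(pvals, n_top=2))
```

Ranking uses the raw p-values and filtering uses the adjusted ones, so ties introduced by the adjustment do not change the order of the selected genes.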

3.2 Joint analysis methods

Merging method:
- ComBat (Johnson et al., 2007).
- Z-score normalization. This method is applied in two ways:
  1. Z-score1 normalization: in this approach, all datasets are Z-score normalized (Larsen et al., 2000) prior to their selection for the training and testing sets and their combination into one set (Yasrebi et al., 2009).
  2. Z-score2 normalization: in this method, the datasets are initially selected for the training and testing sets. Then, the datasets composing the training set are merged together and Z-score normalized. The testing set is also Z-score normalized independently and separately from the training set.
Meta-analysis. The inverse normal method (Hedges et al., 1985) is used for meta-analysis.
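
The difference between the two Z-score variants, and the idea behind the inverse normal method, can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the package's code: each dataset is assumed to be a dictionary mapping a gene name to its expression values across samples, the sample standard deviation is used, and the meta-analysis step is assumed to combine one-sided p-values with equal weights unless weights are passed. All function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def zscore(dataset):
    """Z-score each gene across the samples of one dataset.

    Assumes every gene has at least two samples and non-zero variance.
    """
    out = {}
    for gene, values in dataset.items():
        mu, sd = mean(values), stdev(values)
        out[gene] = [(v - mu) / sd for v in values]
    return out


def merge(datasets):
    """Concatenate the samples of several datasets, keeping common genes only."""
    common = set.intersection(*(set(d) for d in datasets))
    return {g: [v for d in datasets for v in d[g]] for g in sorted(common)}


def zscore1(train_sets, test_set):
    """Z-score1: normalize every dataset on its own, then merge the training sets."""
    return merge([zscore(d) for d in train_sets]), zscore(test_set)


def zscore2(train_sets, test_set):
    """Z-score2: merge the training sets first, then normalize the merged set.

    The testing set is normalized separately from the training set.
    """
    return zscore(merge(train_sets)), zscore(test_set)


def inverse_normal(pvalues, weights=None):
    """Combine one-sided p-values (strictly between 0 and 1) from several datasets.

    Each p-value is mapped to a standard normal quantile; the (weighted) sum is
    rescaled to unit variance and converted back to a p-value.
    """
    weights = weights or [1.0] * len(pvalues)
    nd = NormalDist()
    z = sum(w * nd.inv_cdf(1.0 - p) for w, p in zip(weights, pvalues))
    z /= sqrt(sum(w * w for w in weights))
    return 1.0 - nd.cdf(z)


if __name__ == "__main__":
    a = {"g1": [1, 2, 3], "g2": [2, 4, 6]}
    b = {"g1": [10, 20, 30], "g2": [5, 7, 9]}
    c = {"g1": [0, 1, 5], "g2": [3, 3, 6]}
    print(zscore1([a, b], c)[0]["g1"])
    print(zscore2([a, b], c)[0]["g1"])
    print(inverse_normal([0.05, 0.05]))
```

With Z-score1 the scale difference between the two training datasets is removed before merging, whereas with Z-score2 it is still present when the merged training set is normalized, so the two variants generally give different training matrices.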

3.3 Validation frameworks

Cross validation (CV) nested in 10 iterations.
Independent validation.
- Pair-wise mode: two datasets are selected at a time, one of which is used as the training set and the other as the testing set. This process is iterated until all datasets are used as the training and testing sets (Yasrebi et al., 2009).
- Leave one dataset out: all datasets except one are merged together to form the training set and the left-out set is used as the testing set. Similarly, this process is iterated until all datasets are used as the training and testing sets (Yasrebi et al., 2009).
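
The two independent validation modes amount to different ways of enumerating training and testing sets. The sketch below, in Python and for illustration only, generates those splits from a list of dataset names; the function names are hypothetical, and the pair-wise mode is assumed to cover every ordered pair so that each dataset serves both as training set and as testing set.

```python
from itertools import permutations


def pairwise_splits(names):
    """Yield (training datasets, testing dataset) for every ordered pair."""
    for train, test in permutations(names, 2):
        yield [train], test


def leave_one_dataset_out(names):
    """Yield (training datasets, testing dataset), holding out each dataset once."""
    for held_out in names:
        yield [n for n in names if n != held_out], held_out


if __name__ == "__main__":
    datasets = ["GSE3143", "GSE1992", "GSE4335"]
    for train, test in pairwise_splits(datasets):
        print("pair-wise:", train, "->", test)
    for train, test in leave_one_dataset_out(datasets):
        print("leave one dataset out:", train, "->", test)
```

For the three datasets of survJamda.data this enumerates six pair-wise splits and three leave-one-dataset-out splits.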

3.4 Performance measures

Time-dependent area under the receiver operating characteristic curve (Heagerty et al., 2000), which expresses survival prediction, and hazard ratio, which measures risk assessment (Yasrebi et al., 2009).
Concordance index (Haibe-Kains et al., 2008).
Brier score (Haibe-Kains et al., 2008).
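
To make one of these measures concrete, the following Python sketch computes a concordance index for right-censored data using the common pairwise definition (often attributed to Harrell). It is an assumption that this matches the variant used in the package; the function name and argument layout are hypothetical. A pair of patients is counted only when the one with the shorter follow-up time had an observed event, and it is concordant when that patient also has the higher predicted risk.

```python
def concordance_index(times, events, risks):
    """Pairwise concordance index for right-censored survival data.

    times  : follow-up time per patient
    events : 1 (or True) if the event was observed, 0 if censored
    risks  : predicted risk score per patient (higher means worse prognosis)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Usable pair: patient i had an observed event strictly before time j.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    times = [2, 4, 6, 8]
    events = [1, 1, 0, 1]
    risks = [0.9, 0.7, 0.5, 0.1]
    print(concordance_index(times, events, risks))
```

A value of 0.5 corresponds to a prediction no better than chance, and 1 to a risk score that orders all comparable pairs correctly.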

Conflict of Interest: none declared.

REFERENCES

Broeze et al. (2009) Individual patient data meta-analysis of diagnostic and prognostic studies in obstetrics, gynaecology and reproductive medicine. BMC Med. Res. Methodol.

Haibe-Kains et al. (2008) A comparative study of survival models for breast cancer prognostication based on microarray data: does a single gene beat them all? Bioinformatics, 2200-2208.

Heagerty P.J. et al. (2000) Time-dependent ROC curves for censored survival data and a diagnostic marker. Biometrics, 337-344.

Hedges L.V. and Olkin (1985) Statistical Methods for Meta-Analysis. Academic Press.

Ishwaran et al. (2008) Random survival forests. Ann. Appl. Statist., 841-860.

Johnson E.W. et al. (2007) Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics, 118-127.

Larsen R.J. and Marx M.L. (2000) An Introduction to Mathematical Statistics and Its Applications. 3rd edn. Prentice Hall.

Yasrebi et al. (2009) Can survival prediction be improved by merging gene expression data sets? PLoS ONE, e7431.

Author notes

Associate Editor: Joaquin Dopazo
