Molecular subtyping of cancer: current status and moving toward clinical applications

Table 1

A comparison of different techniques for molecular profiling of cancer

	Platform
Characteristic	Microarray	RNA sequencing	qPCR	NanoString	Tissue microarray
Accuracy [37, 166, 167]	Medium	Medium	High	High	Low
Sensitivity [36, 167, 39]	Medium	High	High	High	Low
Specificity [38, 167, 168]	Medium	Medium	High	High	Low
Speed [36, 167]	Slow	Slow	Fast	Medium	Slow
Cost (per sample)	$300 [163]	$1000 [163]	$280 [164]	$800 [39]	$100 [165]
Sample requirement [167]	FFPE/fresh-frozen	FFPE/fresh-frozen	Fresh-frozen	FFPE/fresh-frozen	FFPE
Genome-wide coverage	Yes	Yes	No	No	No
Quantitative	Yes	Yes	Yes	Yes	No
Single-base resolution	No	Yes	No	No	No
Low sample input	No	No	Yes	No	Yes
Reproducibility [168]	Medium	Medium	High	High	Low

Gene expression-based subtyping of cancer was first proposed by Golub et al. [13] in leukemia. They measured the expression patterns of the 50 most informative genes and applied a two-cluster self-organizing map (SOM) [40] to group 38 samples into two classes, acute myeloid leukemia and acute lymphoblastic leukemia, with 100% accuracy. This demonstrated that cancers can be subtyped on the basis of gene expression patterns alone [13]. Gene expression-based subtyping has since been extended to many cancer types [11, 14, 16, 17, 19, 21, 41].
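To make the class-discovery step concrete, the sketch below trains a two-node, one-dimensional SOM on a samples × genes matrix and assigns each sample to its nearest node. It is a minimal illustration under stated assumptions, not Golub et al.'s implementation: the function name `som_two_clusters`, the PCA-based (linear) initialization and the linear decay of the learning rate and neighborhood width are our own choices, and the input is assumed to be already restricted to the informative genes.

```python
import numpy as np

def som_two_clusters(X, n_epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a 1-D self-organizing map with two nodes; return (labels, node weights).

    X: samples x genes matrix (assumed already restricted to informative genes).
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Linear initialization (an assumption): place the two nodes one standard
    # deviation either side of the mean along the first principal component.
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    spread = s[0] / np.sqrt(X.shape[0])
    W = np.vstack([mean - spread * vt[0], mean + spread * vt[0]])
    grid = np.array([0.0, 1.0])                    # node coordinates on the 1-D map
    for epoch in range(n_epochs):
        frac = epoch / n_epochs
        lr = lr0 * (1.0 - frac)                    # learning rate decays toward 0
        sigma = sigma0 * (1.0 - frac) + 1e-3       # neighborhood width shrinks
        for i in rng.permutation(X.shape[0]):
            x = X[i]
            bmu = np.argmin(((W - x) ** 2).sum(axis=1))          # best-matching unit
            h = np.exp(-(grid - grid[bmu]) ** 2 / (2 * sigma ** 2))
            W += lr * h[:, None] * (x - W)         # pull BMU (and neighbor) toward x
    # Assign every sample to its nearest node
    dist = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1), W
```

On a toy matrix with two well-separated groups of samples, the two nodes settle near the group means, so each group receives its own label; which group is called 0 or 1 depends on the sign of the principal component and carries no meaning.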

Multi-platform profiling data for cancer subtyping

In addition to gene expression profiles, many other types of molecular profiling data, such as mutation, miRNA expression, copy number variation (CNV) and DNA methylation data, can be used to identify and characterize cancer subtypes (Table 2) [43, 44, 50, 52–55]. Because all cancers arise as a result of DNA sequence changes [56], gene mutation patterns are informative and a natural basis for stratifying cancer patients into homogeneous groups [57, 58]. MiRNAs are small noncoding RNAs about 20–22 nucleotides in length that play key roles in the regulation of gene expression. Alterations in miRNA expression are involved in the initiation and progression of human cancer [59–61], and miRNA expression profiling is now used as a tool for studying cancer onset and for subtyping [15, 62]. Compared with mRNAs, miRNAs are more stable, and a small number of miRNAs (∼200 in total) is sufficient to classify human cancers [63]. CNVs are structural genomic alterations that affect DNA segments ranging from approximately 1 kb to 3 Mb [64]. CNVs are associated with many complex diseases, such as neuropsychiatric disorders [65], HIV infection [66], familial pancreatitis [67] and cancers [68, 69]. Comparative genomic hybridization (CGH) can detect CNVs at the genome-wide level, and array-based CGH increases the resolution for more detailed genomic studies. Epigenetic changes such as DNA methylation also play a significant role in the development and progression of cancer [70]. Bisulfite sequencing [71] and differential methylation hybridization [72] can be used to scan gene methylation status at the genome-wide level.

Table 2

Molecular subtyping studies mentioned in the review

Cancer type	Discovery sample size	Molecular data type	Clustering method	Determinative score	Number of subtypes	Classification method	Reference
Breast cancer	65	mRNA	Hierarchical clustering	NA	4	NA	Perou et al. [16]
Breast cancer	85	mRNA	Hierarchical clustering	NA	5	NA	Sorlie et al. [42]
Breast cancer	825	Five platforms	Cluster of clusters	NA	4	NA	TCGA [43]
Breast cancer	2000	mRNA + CNV	iCluster	ARI	10	PAM	Curtis et al. [44]
CRC	62	mRNA	Iterative NMF	Cophenetic coefficient	5	NA	Schlicker et al. [45]
CRC	443	mRNA	Orig. cons. clustering	CDF area	6	Centroid-based	Marisa et al. [46]
CRC	90	mRNA	Orig. cons. clustering	Gap statistic	3	PAM	De Sousa E Melo et al. [20]
CRC	445	mRNA	NMF cons. clustering	Cophenetic coefficient	5	PAM	Sadanandam et al. [47]
CRC	1113	mRNA	Orig. cons. clustering	Dynamic cut tree	5	Multiclass LDA	Budinska et al. [48]
CRC	188	mRNA	k-means	NA	3	Single-sample centroid based	Roepman et al. [49]
CRC	4151	mRNA	Markov Cluster Algorithm	Inflation factor	4	Random Forest	Guinney et al. [11]
PDAC	185	miRNA	Hierarchical clustering	CDF area	2	SVM	Bauer et al. [50]
PDAC	66	mRNA	NMF cons. clustering	Cophenetic coefficient	3	NTP	Collisson et al. [19]
PDAC	223	mRNA	NMF cons. clustering	Cophenetic coefficient	2	Rank-based classifier	Moffitt et al. [51]
Pancreatic cancer	96	mRNA	NMF cons. clustering	Cophenetic coefficient	4	NA	Bailey et al. [12]
Leukemia	38	mRNA	SOM	NA	2	NA	Golub et al. [13]
Leukemia	200	Methylation	PCA	NA	16	NA	Figueroa et al. [169]
Lymphoma	42	mRNA	Hierarchical clustering	NA	2	NA	Alizadeh et al. [14]
GBM	35	miRNA	PCA	Ratio of intracluster to intercluster correlation	2	LDA	Marziali et al. [170]
Lung	67	mRNA	Hierarchical clustering	NA	4	NA	Garber et al. [17]
12 cancer types	3527	Five platforms	COCA	NA	11	NA	Hoadley et al. [32]

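Note: ARI, adjusted Rand index; CDF, cumulative distribution function; CNV, copy number variation; COCA, Cluster-Of-Cluster-Assignments; CRC, colorectal cancer; GBM, glioblastoma multiforme; iCluster, integrative clustering framework; LDA, linear discriminant analysis; NA, not applicable; NMF, nonnegative matrix factorization; NTP, nearest template prediction; Orig. cons., original consensus; PAM, prediction analysis of microarrays; PCA, principal component analysis; PDAC, pancreatic ductal adenocarcinoma; SOM, self-organizing map; SVM, support vector machine.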
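Table 2 lists the adjusted Rand index (ARI) as the determinative score in one study. The ARI measures agreement between two partitions of the same samples, corrected for chance: 1 means identical partitions up to relabeling, and values near 0 mean chance-level agreement. A minimal pure-Python sketch follows; the function name is illustrative, and scikit-learn's `adjusted_rand_score` offers an established implementation.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same samples."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same samples")
    n = len(labels_a)
    # Contingency-table cell counts and their row/column margins
    cell_counts = Counter(zip(labels_a, labels_b))
    a_counts = Counter(labels_a)
    b_counts = Counter(labels_b)
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_a = sum(comb(c, 2) for c in a_counts.values())
    sum_b = sum(comb(c, 2) for c in b_counts.values())
    expected = sum_a * sum_b / comb(n, 2)      # expected index under chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:                  # degenerate case (e.g. one cluster each)
        return 1.0
    return (sum_cells - expected) / (max_index - expected)
```

For example, `adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2])` gives 4/7 ≈ 0.571, reflecting that one cluster of the first partition was split in two.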

Integrative analyses of multiple genomic data types, such as gene expression with CNV [44], miRNA with gene expression [73] and the five-platform combined subtyping study [32], can provide better insights into tumor biology, and more accurate predictions, than analysis at a single molecular level [74]. As advances in high-throughput profiling technologies continue to lower the cost per sample, multi-platform identification and characterization of cancer is likely to become the norm.

Low- and medium-throughput molecular data for clinical tests

Biomarkers identified from subtyping studies can be used in clinical practice. In typical clinical settings, at most a few dozen of these predefined biomarkers are measured, to minimize the time and cost of the tests [75]. In addition, most cancer specimens are formalin-fixed paraffin-embedded (FFPE), and only a few are freshly prepared or snap frozen [76]. In contrast to the high-throughput approaches described above, some low- and medium-throughput profiling techniques, such as qPCR, NanoString and tissue microarrays (TMA), allow meaningful analysis of clinical specimens and are well suited for clinical biomarker assays. These techniques are frequently used when results are needed quickly and when sample volume and cost must be kept low. Sensitivity and specificity are the two measures commonly used to evaluate a clinical test. Sensitivity refers to the ability of a test to correctly identify individuals with the disease; specificity refers to its ability to correctly identify individuals without the disease [77]. A third important measure is accuracy, which reflects how often a test errs when differentiating between individuals with and without the disease [78]. In the following, we compare these three techniques (qPCR, NanoString and TMA) in terms of accuracy, sensitivity, specificity and other practical considerations for clinical tests. Researchers can choose appropriate techniques for their clinical assays based on the comparisons provided in Table 1.
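The three measures can be written in terms of the counts of a 2 × 2 confusion matrix: true positives (TP), false negatives (FN), true negatives (TN) and false positives (FP). The short sketch below computes them; the function name is illustrative and the counts in the usage note are hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from 2 x 2 confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                  # diseased correctly called positive
    specificity = tn / (tn + fp)                  # non-diseased correctly called negative
    accuracy = (tp + tn) / (tp + fn + tn + fp)    # all correct calls
    return {"sensitivity": sensitivity,
            "specificity": specificity,
            "accuracy": accuracy}
```

With the hypothetical counts TP = 45, FN = 5, TN = 80 and FP = 20, sensitivity is 0.90, specificity 0.80 and accuracy about 0.83. Because accuracy is a prevalence-weighted average of sensitivity and specificity, the same test can show different accuracy in populations with different disease prevalence.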

qPCR is commonly used to measure biomarker expression levels or to assess CNVs. Because the PCR amplification step greatly increases the amount of target nucleic acid, only a small quantity of sample is needed. Other advantages of qPCR include its speed and its high sensitivity, specificity and accuracy, which make it the routine method for validating results initially obtained with high-throughput methods such as microarrays and RNA-Seq [79]. However, whereas other techniques can assay hundreds to thousands of biomarkers, a qPCR-based assay can handle only a limited number of biomarkers in a single test. qPCR-based tests also require high-quality nucleic acids from the sampled material, so fresh-frozen tissues are typically preferred for qPCR, although some clinical assays, such as Oncotype DX, are run on RNA extracted from FFPE tissue.
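Relative quantification with qPCR is often done with the comparative Ct method, 2^(−ΔΔCt), which normalizes the threshold cycle (Ct) of a target gene to a reference gene and then to a control sample. The sketch below is one common choice rather than a method the text prescribes; it assumes roughly 100% amplification efficiency (one doubling per cycle) by default, and the function name and Ct values are hypothetical.

```python
def fold_change_ddct(ct_target_sample, ct_ref_sample,
                     ct_target_control, ct_ref_control, efficiency=2.0):
    """Relative expression of a target gene by the comparative Ct (2^-ddCt) method.

    efficiency=2.0 assumes the amplicon doubles every cycle (100% efficiency).
    """
    delta_ct_sample = ct_target_sample - ct_ref_sample      # normalize to reference gene
    delta_ct_control = ct_target_control - ct_ref_control
    delta_delta_ct = delta_ct_sample - delta_ct_control     # calibrate to control sample
    return efficiency ** (-delta_delta_ct)
```

A tumor sample with hypothetical target/reference Ct values of 24/18, compared with a control at 26/18, gives ΔΔCt = −2 and a fold change of 4: the target reaches threshold two cycles earlier relative to the reference gene.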

The NanoString nCounter analysis system can measure the expression levels of up to 800 genes [80]. Developed by Geiss et al. [39], the nCounter system is more sensitive than microarrays and similar in sensitivity to qPCR [39]. The technology uses digital molecular barcodes and microscopic imaging to detect and count the transcripts of many genes in a single assay without enzymatic reactions [39, 81]. Other advantages of this technique include high accuracy and specificity [38]. Its main disadvantage is the high cost of the required reagents and instruments [80].

TMA is a histology-based technique, developed by Kononen et al. [82], that allows up to 1000 tumor specimens to be analyzed simultaneously in a single paraffin block [37]. Molecular targets can be analyzed at the DNA, mRNA and protein levels. Once constructed, a TMA block can be sectioned hundreds of times (provided all cores are sufficiently deep), and each section can be used for biomarker analysis. The most significant advantage of TMA is that all samples on the array are treated in an identical fashion [83]. TMA is also cost-effective (Table 1), because only a small amount of reagent is required to analyze all the samples on one slide [83]. Unlike qPCR, which typically uses fresh-frozen tissues, TMA uses FFPE tissues, which are the major source of material in the clinic. TMA also has limitations. Low sensitivity, specificity and accuracy are typical of TMA tests [84]. Other disadvantages are that results usually take several days to obtain [85], only a limited number of analytes can be tested, and the small tissue cores analyzed may not represent the entire tumor [83]. In addition, the available tissue is progressively depleted as the block is repeatedly sectioned for staining [86].

Subtype identifications and characterizations

Molecular subtyping (or molecular classification) is the process of assigning data objects to groups so that objects in the same group are more similar to each other than to those in other groups. There are two kinds of classification strategies: supervised classification, in which class labels (such as tumor or normal tissue) are known beforehand, and unsupervised classification of unlabeled data. Subtyping is a general term for classification and can be either supervised or unsupervised. Unsupervised classification is increasingly popular in biomedical research [87] and has been used successfully in many cancer subtyping studies [11, 13, 15, 17, 41, 51, 88, 89]. From these studies, we summarize a four-step workflow for molecular subtyping of cancer: data preprocessing, cluster analysis, supervised classification and subtype characterization (Figure 1). In the following, we focus on subtype identification and characterization, two key aspects of this workflow.

Figure 1

Molecular subtyping of cancer workflow. The workflow consists of four major steps: (A) Data preprocessing. Array data preprocessing includes image analysis, data normalization and transformation. Next-generation sequencing data preprocessing comprises quality control, read alignment, expression quantification, data normalization and transformation. (B) Cluster analysis. A first feature selection is performed with a cutoff on the SD (e.g. SD > 0.8) or the median absolute deviation (MAD) (e.g. MAD > 0.5). Clustering is usually applied to either the feature or the sample dimension; biclustering operates on both dimensions, and triclustering on three dimensions (feature, sample and time). After (bi/tri)clustering, the optimal number of (bi/tri)clusters is determined by measures such as the gap statistic, cophenetic coefficients and the CDF. Ensemble and consensus clustering have also been proposed to enhance the robustness of (bi/tri)clustering. (C) Supervised classification. To build the best possible classifier, a sample selection step (silhouette width > 0) and a second feature selection step (SAM/Limma) are applied. Various algorithms such as PAM, SVM, Random Forests (RF) and K-nearest neighbors can be used to build classifiers. (D) Subtype characterization. A heatmap is used to represent the molecular characteristics, in which rows are subtype-specific features (genes, miRNAs, pathways, etc.) and columns are samples sorted by subtype. A Kaplan–Meier survival plot is used to represent the clinical characteristics, in which the x-axis is survival time and the y-axis is the estimated probability of remaining event-free (e.g. of surviving).
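The MAD-based feature filter of step (B) and the silhouette-based sample selection of step (C) can be sketched as follows. This is an illustration under assumptions: the data form a features × samples matrix, silhouette widths use Euclidean distance, the default cutoff of 0.5 simply mirrors the example value above, and the function names are hypothetical. Note that R's `mad()` scales the MAD by 1.4826 by default; the sketch uses the unscaled MAD, and which convention a given cutoff assumes must be checked against the original study.

```python
import numpy as np

def mad_filter(X, cutoff=0.5):
    """Keep features (rows) whose median absolute deviation (MAD) exceeds cutoff.

    X is a features x samples matrix; returns the filtered matrix and the row mask.
    """
    X = np.asarray(X, dtype=float)
    med = np.median(X, axis=1, keepdims=True)
    mad = np.median(np.abs(X - med), axis=1)      # unscaled MAD per feature
    keep = mad > cutoff
    return X[keep], keep

def silhouette_widths(X, labels):
    """Silhouette width of every sample (column of X) under Euclidean distance.

    Assumes at least two clusters are present in labels.
    """
    S = np.asarray(X, dtype=float).T              # samples x features
    labels = np.asarray(labels)
    D = np.sqrt(((S[:, None, :] - S[None, :, :]) ** 2).sum(axis=2))
    widths = np.zeros(len(labels))
    for i, own in enumerate(labels):
        same = labels == own
        same[i] = False                           # exclude the sample itself
        if not same.any():                        # singleton cluster: width 0
            continue
        a = D[i, same].mean()                     # mean distance within own cluster
        b = min(D[i, labels == other].mean()      # mean distance to nearest other cluster
                for other in np.unique(labels) if other != own)
        denom = max(a, b)
        widths[i] = (b - a) / denom if denom > 0 else 0.0
    return widths
```

Samples with a positive silhouette width are, on average, closer to their own subtype than to the nearest other subtype; selecting `silhouette_widths(X, labels) > 0` would keep them as core samples for classifier training.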

Subtype identifications

High-throughput molecular data are usually arranged as matrices in which rows are features (genes, miRNAs or DNA methylation markers) and columns are samples. Such matrices have largely been analyzed in two dimensions (2D), the feature dimension and the sample dimension [90], and clustering is usually applied to one of them. However, subsets of features may be active or suppressed only under certain experimental conditions and behave almost independently under others. To identify such local patterns in the data matrix, biclustering (or subspace clustering), which discovers biclusters, was first applied to gene expression data by Cheng and Church [91]. Various biclustering methods have since been developed to efficiently identify ‘homogeneous’ submatrices in data, such as singular value decomposition [22], nonnegative matrix factorization (NMF) [23] and geometric-based biclustering [92, 93]. With the rapid development of data profiling technologies, it is now possible to measure numerous features in many samples across multiple time points or experimental conditions. Such data can be arranged into three-dimensional (3D) matrices, with the first two dimensions representing the samples and features, respectively, and the third dimension representing time or experimental conditions [94]. To find feature groups along the feature–sample–time (or –condition) dimensions, triclustering has been proposed to mine triclusters in the data [95]. Because a tensor can be thought of as an organized multidimensional array of numerical values, tensor-based triclustering [96, 97] has become a promising solution for analyzing such longitudinal and spatial data.
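Cheng and Church scored a candidate bicluster (rows I, columns J) by its mean squared residue, H(I, J) = (1/(|I||J|)) Σ (a_ij − a_iJ − a_Ij + a_IJ)², where a_iJ, a_Ij and a_IJ are the row, column and overall means of the submatrix; H is zero when every entry is explained by a row effect plus a column effect, and a submatrix with H(I, J) ≤ δ is called a δ-bicluster. The sketch below computes only this score; their greedy node-deletion and node-addition search is omitted, and the function name is our own.

```python
import numpy as np

def mean_squared_residue(A, rows, cols):
    """Cheng-Church mean squared residue H(I, J) of the submatrix A[rows, cols]."""
    sub = np.asarray(A, dtype=float)[np.ix_(rows, cols)]
    row_means = sub.mean(axis=1, keepdims=True)     # a_iJ
    col_means = sub.mean(axis=0, keepdims=True)     # a_Ij
    residue = sub - row_means - col_means + sub.mean()
    return float((residue ** 2).mean())
```

A purely additive block (each entry equal to a row effect plus a column effect) scores zero, while a single outlying entry inside the block raises the score sharply.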

The optimal number of clusters is commonly determined using measures such as the gap statistic [98], the cophenetic coefficient [99] and the cumulative distribution function (CDF). Because cluster analysis methods are based on different algorithms, they can yield different results in terms of cluster numbers and assignments [100]. To enhance the robustness of clustering, a method called cluster ensemble has been proposed, which combines results from different clustering runs into a single consensus result [100]. A related methodology is consensus clustering, which uses resampling to reach consensus across multiple runs of the same clustering method [101]. The major difference is thus that ensemble clustering integrates results from multiple clustering methods, whereas consensus clustering runs a single clustering method multiple times on resampled data. Ensemble and consensus clustering methods are also applicable to biclustering and triclustering, and have been widely used in cancer subtyping studies [19, 20, 46, 102].
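A minimal sketch of resampling-based consensus clustering: cluster repeated 80% subsamples with k-means, record how often each pair of samples falls in the same cluster, and summarize the resulting consensus matrix by the area under the empirical CDF of its off-diagonal entries. The choice of k-means, the subsampling fraction, the number of resamples, samples-as-rows orientation and all function names are assumptions for illustration; in practice the CDF area would be compared across candidate values of k.

```python
import numpy as np

def kmeans_labels(X, k, rng, n_iter=100):
    """Lloyd's k-means on the rows of X; returns one integer label per row."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster becomes empty
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def consensus_matrix(X, k, n_resamples=100, frac=0.8, seed=0):
    """Fraction of resamples in which each pair of samples (rows) co-clusters."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(X)
    together = np.zeros((n, n))   # times a pair fell in the same cluster
    sampled = np.zeros((n, n))    # times a pair was drawn together
    for _ in range(n_resamples):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        labels = kmeans_labels(X[idx], k, rng)
        same = (labels[:, None] == labels[None, :]).astype(float)
        together[np.ix_(idx, idx)] += same
        sampled[np.ix_(idx, idx)] += 1.0
    return np.divide(together, sampled, out=np.zeros_like(together), where=sampled > 0)

def cdf_area(M, n_points=101):
    """Area under the empirical CDF of the off-diagonal consensus values."""
    values = M[np.triu_indices_from(M, k=1)]
    grid = np.linspace(0.0, 1.0, n_points)
    cdf = np.array([(values <= g).mean() for g in grid])
    return float(np.sum((cdf[1:] + cdf[:-1]) / 2 * np.diff(grid)))  # trapezoid rule
```

When the data contain k well-separated groups, the consensus matrix is close to 0/1 and its CDF rises steeply near 0 and 1; less stable structure shows up as intermediate consensus values and a smoother CDF.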

Subtype characterizations

Subtype characterization relies heavily on genomic and clinical data; one of its purposes is to investigate the associations between the identified subtypes and their molecular and clinical relevance [103]. Subtype characterization can also help to identify consensus subtypes within and between cancers, which we cover in detail in the ‘Cancer consensus molecular subtypes’ section.

Pathways, mutations, structural variations and methylation patterns can serve as molecular characteristics. Characterizations of cancer subtypes have implications for patient outcome and targeted therapies. Lex et al. [104] developed an integrative visualization tool called StratomeX, which helps researchers explore the relationships between subtypes and multiple genomic data types such as gene expression, DNA methylation or copy number data. These genomic data, discussed in the ‘High-throughput molecular data for cancer subtyping’ section, can be used not only to identify robust cancer subtypes but also to better understand and interpret the molecular characteristics of the subtypes. In addition, gene set enrichment analysis (GSEA) is usually performed to characterize the biology underlying the identified subtypes. GSEA interprets expression data at the level of gene sets, that is, groups of genes that share the same biological function, chromosomal location or regulation [105]. Annotated gene sets with specific biological meanings can be obtained, for example, from the Gene Ontology (GO) [106] and KEGG [107] databases.
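At the core of GSEA is a running-sum enrichment score: walking down the genes ranked by a statistic (for example, differential expression between one subtype and the rest), the sum increases at genes in the set, weighted by their scores, and decreases at genes outside it; the enrichment score is the maximum deviation from zero. The sketch below follows this weighted, Kolmogorov–Smirnov-like definition with weight exponent p = 1, but omits the permutation-based significance testing and score normalization of full GSEA; the function name is illustrative.

```python
import numpy as np

def enrichment_score(genes, scores, gene_set, p=1.0):
    """GSEA-style running-sum enrichment score of gene_set in a ranked gene list."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)[::-1]                # rank from highest to lowest score
    ranked = np.asarray(genes)[order]
    weights = np.abs(scores[order]) ** p            # hits are weighted by |score|^p
    in_set = np.isin(ranked, list(gene_set))
    n_miss = len(ranked) - in_set.sum()
    if not in_set.any() or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    hit_step = np.where(in_set, weights, 0.0) / weights[in_set].sum()
    miss_step = np.where(in_set, 0.0, 1.0 / n_miss)
    running = np.cumsum(hit_step - miss_step)       # running enrichment sum
    return float(running[np.argmax(np.abs(running))])  # maximum deviation from zero
```

A positive score indicates that the set is concentrated at the top of the ranking (e.g. genes up-regulated in the subtype), and a negative score that it is concentrated at the bottom.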

Clinical data include patient information such as age, gender, race, tumor grade, tumor size, time of diagnosis, smoking history, treatment strategies, relapse information and follow-up time, which should be well preserved and managed for clinical characterization of the identified subtypes. Survival analysis is widely used to compare survival times between subtypes. The Kaplan–Meier estimator [108] can be used to generate survival curves, and the log-rank test provides a statistical comparison of survival between two or more subtypes [109].
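A compact sketch of the Kaplan–Meier product-limit estimator and the two-group log-rank test, using only the Python standard library. Follow-up times and event indicators (1 = event, 0 = censored) are the assumed inputs; function names are illustrative, all records sharing a time are processed together, and the p-value uses the chi-square distribution with one degree of freedom. Established tools such as R's survival package or the Python lifelines package would normally be used instead.

```python
import math

def kaplan_meier(times, events):
    """Product-limit survival estimate; returns [(time, S(time))] at event times.

    events: 1 = event (e.g. death) observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv, curve, i = 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        tied = [e for tt, e in data[i:] if tt == t]   # all records at time t
        deaths = sum(tied)
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= len(tied)       # events and censorings both leave the risk set
        i += len(tied)
    return curve

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value) with 1 df."""
    records = ([(t, e, 0) for t, e in zip(times_a, events_a)] +
               [(t, e, 1) for t, e in zip(times_b, events_b)])
    observed_minus_expected, variance = 0.0, 0.0
    for t in sorted({tt for tt, e, _ in records if e}):
        n = sum(1 for tt, _, _ in records if tt >= t)               # at risk overall
        n_a = sum(1 for tt, _, g in records if tt >= t and g == 0)  # at risk in group A
        d = sum(e for tt, e, _ in records if tt == t)               # events overall
        d_a = sum(e for tt, e, g in records if tt == t and g == 0)  # events in group A
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))   # upper tail of chi-square with 1 df
    return chi2, p_value
```

For example, with follow-up times [1, 2, 2, 3, 4] and events [1, 1, 0, 1, 0], the estimated survival drops to 0.8, 0.6 and 0.3 at times 1, 2 and 3; the censored record at time 2 leaves the risk set without lowering the curve.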

Subtype characterization is necessary and important. It not only helps us understand the characteristics of the subtypes but also provides a means of validating them. Ideally, the identified subtypes have distinct molecular and clinical characteristics. Often, however, subtypes differ statistically but not biologically. In such cases, reclustering and reclassification should be performed until more interpretable results are obtained.

Moving toward clinical applications

Building on high-throughput molecular data, molecular subtyping of cancer and the development of marker panels using low- and medium-throughput methods, clinicians are beginning to embrace cancer subtyping studies and to base treatment decisions for cancer patients on them [110, 111]. In the following, we provide a few examples of subtyping studies that have been applied to cancer diagnosis, prognosis, response prediction and drug design. Specifically, we focus on biomarkers for diagnostic and prognostic purposes in the ‘Biomarkers identified from subtyping studies for cancer diagnosis and prognosis’ section, and on cancer subtypes for therapy response prediction and drug development in the ‘Cancer subtypes for predicting therapy response and drug design’ section.

Biomarkers identified from subtyping studies for cancer diagnosis and prognosis

Biomarkers identified from subtyping studies with specific indications for cancer diagnosis and prognosis are now widely applied in clinical research and are increasingly combined with conventional histology to improve diagnostic accuracy [112]. Examples include TLE1 as a diagnostic marker for synovial sarcoma [113], and CD10, BCL6 and MUM1 as diagnostic markers for the germinal center B-cell-like (GCB) subtype of diffuse large B-cell lymphoma [114]. Biomarkers can also be used directly to detect cancer. For instance, Bauer et al. [50] analyzed the complete miRNA repertoire of 136 pancreatic ductal adenocarcinoma (PDAC) samples, 27 pancreatitis samples and 22 normal controls. Using hierarchical clustering and an SVM classifier, they found that analysis of only five miRNAs in blood and tissues can distinguish PDAC from pancreatitis and normal controls, which may aid PDAC diagnosis.

Several multigene predictors have been developed for breast cancer patients [115], including MammaPrint, Oncotype DX and simplified MapQuant Dx. These predictors are now widely used in the clinic to classify breast cancer patients and treat them accordingly. MammaPrint, which uses a 70-gene signature, was the first successfully applied microarray-based prognostic test for breast cancer. To identify these genes, hierarchical clustering was used to classify 98 breast cancer patients into good and poor prognosis groups. This was followed by a three-step supervised classification method to reliably stratify patients into good and poor prognostic categories, which ultimately identified 70 prognostic genes for breast cancer [116]. MammaPrint is a US Food and Drug Administration-approved molecular test that predicts the risk of breast cancer metastasis, and its result can help physicians to determine the appropriate treatment strategy. Most patients with early-stage breast cancer receive adjuvant chemotherapy, but only a subset of them benefit from the treatment. Paik et al. [117] therefore developed a 21-gene qPCR assay called Oncotype DX, which calculates a recurrence score for early-stage breast cancer and predicts the likelihood of benefit from chemotherapy. Simplified MapQuant Dx is also a qPCR-based prognostic test for breast cancer. It was developed by Toussaint et al. [118] and is based on the expression of four representative genes from the genomic grade index [119] and four reference genes. The prognostic information provided by the test applies only to estrogen receptor-positive breast cancer patients [120].
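The text does not describe how such signature-based tests assign an individual patient. One common approach, sketched here as an assumption rather than as the algorithm of MammaPrint or any other commercial test, is to compute the centroid (mean profile) of each class over the signature genes in a training set and assign a new sample to the class whose centroid it correlates with most strongly (Pearson correlation). The function name and the toy data are hypothetical.

```python
import numpy as np

def centroid_correlation_classify(train_X, train_labels, new_X):
    """Assign each new sample to the class whose centroid it correlates with best.

    Matrices are samples x signature genes; each new sample is scored on its own.
    """
    train_X = np.asarray(train_X, dtype=float)
    train_labels = np.asarray(train_labels)
    classes = sorted(set(train_labels.tolist()))
    centroids = [train_X[train_labels == c].mean(axis=0) for c in classes]
    predictions = []
    for x in np.asarray(new_X, dtype=float):
        r = [np.corrcoef(x, c)[0, 1] for c in centroids]   # Pearson correlation
        predictions.append(classes[int(np.argmax(r))])
    return predictions
```

Because each new sample is compared only with fixed centroids, the rule classifies one patient at a time, without re-normalizing against other test samples.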

Cancer subtypes for predicting therapy response and drug design

Subtyping studies are potentially well suited to selecting the subset of patients who may benefit from certain drugs or therapies. For instance, Rouzier et al. [121] examined whether the four subtypes of breast cancer [16] respond differently to chemotherapy. The basal-like and ERBB2-overexpressing subtypes were more sensitive to paclitaxel- and doxorubicin-containing preoperative chemotherapy than the luminal and normal-like subtypes.

Tumor specimens for laboratory research are often limited in quantity and infiltrated with nontumor cells, and their use can raise ethical issues. Cancer models, such as cell lines and patient-derived xenografts (PDXs), have been established as in vitro and in vivo platforms that overcome these shortcomings and are now widely used. For instance, Ross et al. [122] provided a molecular characterization of the NCI (National Cancer Institute)-60 cancer cell line panel and demonstrated that these cell lines correspond to their tumors of origin. Gao et al. [123] established about 1000 PDXs, which provide excellent in vivo platforms for screening novel therapies. Cancer cell lines and PDXs can also be classified into subtypes. For example, Kao et al. [124] classified 52 commonly used breast cancer cell lines into five subtypes [42] and identified the cell line subtypes that most faithfully capture the known heterogeneity of breast cancer. Moffitt et al. [51] sequenced 37 PDXs from PDAC and demonstrated that these models can recapitulate tumor-specific subtypes. Cell line and PDX models therefore also provide a valuable opportunity to investigate subtype-specific therapies.

Recent developments in high-throughput technologies have allowed large-scale screening of chemicals and drugs on cell line panels [125]. For example, the NCI-60 cancer cell line panel mentioned above [126] has served as a standard platform on which >40 000 chemicals have been screened over the past few decades [125]. In addition, Garnett et al. [127] screened a panel of several hundred cancer cell lines with 130 drugs in clinical use or under preclinical investigation, which provides a powerful strategy for identifying subtype-specific cancer therapies and the biomarkers to guide them. Drug development is shifting away from cytotoxic agents toward drugs designed to target the specific molecules that drive malignant progression [128]. Although this remains challenging, subtype-specific biomarkers are potential targets for drug design and should be investigated and validated further [129, 130].

Challenges

We see four major challenges in cancer subtyping studies that preclude clinical implementation (Figure 2). The first is data acquisition, curation and management. The second is tumor microenvironment (TME) heterogeneity. The third is the lack of consensus molecular subtypes, and the fourth is problems with single-sample classification.

Figure 2

Four major challenges in the molecular subtyping of cancer and associated solutions/problems. The first challenge is data acquisition, curation and management. Publicly available data sets such as ICGC, TCGA and GEO can be used to increase sample size or as validation data sets. Low tumor cellularity can be addressed by physical and virtual microdissection. The second challenge is TME heterogeneity. The TME includes immune cells, blood vessels, fibroblasts and ECM, all of which exhibit heterogeneity at some level. The third challenge is the lack of consensus molecular subtypes. Currently, there are only three examples of consensus subtyping studies: colorectal cancer, breast cancer and the TCGA pan-cancer study. The last challenge is single-sample classification: currently applied SSPs may yield inconsistent classification results.


Data acquisition, curation and management

Many cancer subtyping studies use a multiple random training-validation strategy [131], in which a training data set is used to identify molecular signatures and validation data sets are used to assess classification performance. Typically, researchers use their own data as the training set and publicly available data sets for validation. Public resources such as the International Cancer Genome Consortium (ICGC, www.icgc.org) and The Cancer Genome Atlas (TCGA, http://cancergenome.nih.gov/) provide coordinated, large-scale cancer genomic data that can be accessed online. ICGC holds genomic, transcriptomic, epigenomic and clinical data from 50 different cancer types and subtypes, and data for >25 000 tumor genomes are currently available on its website [132]. TCGA likewise contains a collection of cancer genomic data; so far, >30 human tumor types have been analyzed through large-scale genome sequencing of samples from 11 000 patients [133]. In addition, the Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/) is a public repository that archives and freely distributes gene expression data from numerous studies [134]. Researchers can upload their own data to GEO or download data from it for use as validation data sets.
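The training-validation idea can be sketched in Python as follows. This is a minimal illustration under stated assumptions: a nearest-centroid classifier stands in for whatever classifier a study actually trains, the random splits are drawn from a single hypothetical cohort rather than from separate public data sets, and all function names are our own.

```python
import random
from statistics import mean


def fit_centroids(samples, labels):
    """Mean profile per subtype; a simple stand-in for any trained classifier."""
    groups = {}
    for x, y in zip(samples, labels):
        groups.setdefault(y, []).append(x)
    return {y: [mean(col) for col in zip(*xs)] for y, xs in groups.items()}


def predict(centroids, x):
    """Assign x to the subtype with the nearest centroid (squared Euclidean)."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))


def random_train_validation(samples, labels, n_rounds=20, train_frac=0.7, seed=0):
    """Repeat random training/validation splits; return mean validation accuracy."""
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    accuracies = []
    for _ in range(n_rounds):
        rng.shuffle(idx)
        cut = int(train_frac * len(idx))
        train, valid = idx[:cut], idx[cut:]
        cents = fit_centroids([samples[i] for i in train], [labels[i] for i in train])
        hits = [predict(cents, samples[i]) == labels[i] for i in valid]
        accuracies.append(sum(hits) / len(hits))
    return mean(accuracies)


# Hypothetical two-feature data with two well-separated subtypes.
samples = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.1, 0.0], [0.0, 0.1],
           [5.0, 5.0], [5.1, 4.9], [4.9, 5.2], [5.2, 5.1], [5.0, 5.1]]
labels = ["A"] * 5 + ["B"] * 5
accuracy = random_train_validation(samples, labels)
print(accuracy)
```

Replacing the random split with a fixed one (own cohort for training, a public cohort for validation) gives the external-validation setup described in the text.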

Subtyping study cohorts typically range from dozens to several hundred tumors (Table 2). Identification of cancer subtypes has been hampered by a lack of tumor samples available for study [19]. For instance, because <20% of PDAC patients have resectable tumors at the time of diagnosis, material for profiling is typically limited [135]. Some studies have overcome this problem by integrating data from different sources to increase sample size [19, 136]. The resulting batch effects (nonbiological differences between data sets) can be removed by methods such as empirical Bayes [137], surrogate variable analysis [138] or Distance Weighted Discrimination [139].
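The batch-correction step can be illustrated with a simplified location/scale adjustment. This Python sketch is a stand-in, not the empirical Bayes method of [137]: it removes per-batch differences in each gene's mean and spread, but it does not borrow information across genes to shrink the per-batch estimates, which is what the empirical Bayes approach adds. The data and the function name are hypothetical.

```python
from math import sqrt
from statistics import mean


def location_scale_adjust(matrix, batches):
    """Per-gene location/scale batch adjustment WITHOUT empirical Bayes shrinkage.

    matrix:  list of genes (rows), each a list of values, one per sample (column)
    batches: batch label for each sample, in column order
    Each gene is standardized within every batch and then mapped back onto the
    gene's overall mean and its pooled within-batch standard deviation.
    """
    groups = {}
    for j, b in enumerate(batches):
        groups.setdefault(b, []).append(j)
    n = len(batches)
    adjusted = []
    for row in matrix:
        stats, ss_within = {}, 0.0
        for b, cols in groups.items():
            m = mean([row[j] for j in cols])
            ss = sum((row[j] - m) ** 2 for j in cols)
            stats[b] = (m, sqrt(ss / len(cols)))  # batch mean and SD
            ss_within += ss
        pooled_sd = sqrt(ss_within / n)
        grand_mean = mean(row)
        new_row = [0.0] * n
        for b, cols in groups.items():
            m, s = stats[b]
            for j in cols:
                z = (row[j] - m) / s if s > 0 else 0.0
                new_row[j] = grand_mean + z * pooled_sd
        adjusted.append(new_row)
    return adjusted


# Hypothetical gene measured in two batches; batch "b2" is shifted up by 3 units.
matrix = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]]
batches = ["b1", "b1", "b1", "b2", "b2", "b2"]
adjusted = location_scale_adjust(matrix, batches)
print(adjusted)  # both batches now share the same mean
```

Any adjustment of this kind assumes that batch is not confounded with biology: if one batch contained mostly one subtype, removing batch means would also remove genuine subtype differences.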

Another common problem is the low tumor cellularity of patient samples, which makes the molecular data noisy and tumor-specific patterns difficult to capture. Because cancer cells are tightly connected to and interact with surrounding cells, conventional physical separation techniques such as laser capture microdissection [140] cannot perfectly separate tumor cells from nontumor cells. Computational approaches, such as virtual microdissection [51] and algorithms like ESTIMATE [141] and qpure [142], can therefore be used to assess tumor cellularity and to deconvolve tumor-specific contributions.
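To make the idea of deconvolution concrete, here is a deliberately simplified Python sketch that is neither ESTIMATE, qpure nor virtual microdissection (ESTIMATE, for example, works from stromal and immune gene signatures rather than reference profiles). It assumes that a bulk profile b is a linear mixture p·t + (1 − p)·s of a known pure-tumor profile t and a known nontumor profile s, estimates the tumor fraction by least squares, p = ⟨b − s, t − s⟩ / ‖t − s‖², clips it to [0, 1], and then removes the nontumor contribution. The reference profiles and values are hypothetical.

```python
def estimate_tumor_fraction(bulk, tumor_ref, normal_ref):
    """Least-squares tumor fraction p for bulk ~ p * tumor_ref + (1 - p) * normal_ref."""
    diff = [t - n for t, n in zip(tumor_ref, normal_ref)]
    resid = [b - n for b, n in zip(bulk, normal_ref)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("identical reference profiles: tumor fraction not identifiable")
    p = sum(r * d for r, d in zip(resid, diff)) / denom
    return min(1.0, max(0.0, p))  # a fraction must lie in [0, 1]


def tumor_specific_profile(bulk, normal_ref, p):
    """Subtract the estimated nontumor contribution and rescale by the tumor fraction."""
    if p == 0:
        raise ValueError("no tumor signal to recover when p == 0")
    return [(b - (1 - p) * n) / p for b, n in zip(bulk, normal_ref)]


# Hypothetical reference profiles for four genes; the bulk sample is 40% tumor.
tumor_ref = [10.0, 2.0, 8.0, 1.0]
normal_ref = [2.0, 6.0, 2.0, 5.0]
bulk = [0.4 * t + 0.6 * n for t, n in zip(tumor_ref, normal_ref)]
p = estimate_tumor_fraction(bulk, tumor_ref, normal_ref)
print(round(p, 3))  # 0.4
print([round(v, 3) for v in tumor_specific_profile(bulk, normal_ref, p)])
```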

In summary, dissecting the genetic heterogeneity of tumor cells requires that molecular and clinical data be well processed and managed. Because abundant publicly available data sets and data processing tools can help answer such questions, researchers should take full advantage of them.

TME heterogeneity

Heterogeneity exists not only in the tumor cell compartment but also in the TME. The TME, which encompasses the environment surrounding tumor cells and the interactions between the two, plays an important role in tumor development, progression and response to therapy. It includes immune cells, blood vessels, fibroblasts and the extracellular matrix (ECM). Stroma is part of the TME: a histological unit consisting of connective tissue, fat tissue, fibroblasts, ECM and immune cells within an extracellular scaffold [143]. Stroma as a whole can be classified into subtypes with clinical implications. For instance, Moffitt et al. [51] applied NMF-based consensus clustering to hundreds of PDAC tumors and cell lines and identified two stroma subtypes, named normal and activated. The activated stroma subtype is associated with poor clinical outcome.
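As a sketch of the NMF step only, and not of Moffitt et al.'s virtual microdissection pipeline (which involves considerably more steps, including consensus clustering over many runs), the Python code below implements the Lee-Seung multiplicative updates [23] for the squared Euclidean error and assigns each sample to the factor with its largest coefficient. The toy matrix, rank and iteration count are hypothetical choices.

```python
import random


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def nmf(V, k, n_iter=1000, seed=0, eps=1e-12):
    """Factor nonnegative V (genes x samples) as W (genes x k) times H (k x samples)
    using Lee-Seung multiplicative updates for the squared Euclidean error."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)  # H <- H * (W'V) / (W'WH)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))  # W <- W * (VH') / (WHH')
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return W, H


def nmf_clusters(V, k, **kwargs):
    """Assign each sample (column) to the factor with its largest coefficient in H."""
    _, H = nmf(V, k, **kwargs)
    return [max(range(k), key=lambda i: H[i][j]) for j in range(len(V[0]))]


# Hypothetical 4-gene x 6-sample matrix with two groups of samples.
V = [[5, 5, 4, 0, 0, 1],
     [4, 5, 5, 1, 0, 0],
     [0, 1, 0, 5, 4, 5],
     [1, 0, 0, 4, 5, 5]]
labels = nmf_clusters(V, 2)
print(labels)
```

Because a single NMF run depends on its random initialization, consensus approaches repeat it with different seeds and aggregate how often samples are grouped together.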

Heterogeneity has also been observed in other components of the TME, such as tumor-infiltrating immune cells, fibroblasts and ECM [144–148]. Solid tumors are infiltrated by various immune cells, including T and B lymphocytes and mast cells [149]. These immune cells may either inhibit cancer cell growth or drive tumor-associated chronic inflammation. The presence of a T-cell-infiltrated TME can serve as a predictive biomarker for response to immunotherapies [144]. However, in many tumor types, only a subset of patients mounts a tumor antigen-specific T-cell response; the remaining patients lack an appropriate T-cell phenotype and resist immunotherapeutic interventions [144]. Selecting the patients who are likely to benefit from immunotherapies is therefore a challenge. One way to address it is to identify T-cell response genes and build a binary gene expression classifier that distinguishes responders from nonresponders. The ECM is the collection of extracellular proteins present in all tissues that supports the tissue's cells [150]. Recent studies have found considerable heterogeneity in the ECM, and clinical outcome is often related to ECM characteristics. For instance, Bergamaschi et al. [147] identified 278 ECM-related genes and used them to classify primary breast tumors into four groups (ECM1–4) with distinct clinical outcomes.
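A binary classifier of the kind suggested above could take many forms. One minimal Python sketch, assuming a set of T-cell response genes has already been identified, scores each sample by the mean z-score of those genes and predicts a responder when the score reaches a threshold learned from labeled training patients. The gene names and values are hypothetical placeholders, and this is not a validated immunotherapy predictor.

```python
from statistics import mean, pstdev


def fit_gene_scaling(train_expr):
    """Per-gene mean and SD across training samples (train_expr: gene -> values)."""
    return {g: (mean(v), pstdev(v) or 1.0) for g, v in train_expr.items()}


def signature_score(sample, genes, scaling):
    """Mean z-score of the signature genes in one sample (sample: gene -> value)."""
    return mean([(sample[g] - scaling[g][0]) / scaling[g][1] for g in genes])


def fit_threshold(scores, responded):
    """Midpoint between the mean scores of training responders and nonresponders."""
    pos = [s for s, y in zip(scores, responded) if y]
    neg = [s for s, y in zip(scores, responded) if not y]
    return (mean(pos) + mean(neg)) / 2


# Hypothetical training data: two placeholder T-cell response genes, six patients.
genes = ["TCELL_GENE_1", "TCELL_GENE_2"]
train_expr = {"TCELL_GENE_1": [8, 9, 7, 2, 1, 3],
              "TCELL_GENE_2": [6, 7, 8, 1, 2, 2]}
responded = [True, True, True, False, False, False]

scaling = fit_gene_scaling(train_expr)
train_scores = [signature_score({g: train_expr[g][j] for g in genes}, genes, scaling)
                for j in range(len(responded))]
threshold = fit_threshold(train_scores, responded)

new_patient = {"TCELL_GENE_1": 8.5, "TCELL_GENE_2": 7.5}
print(signature_score(new_patient, genes, scaling) >= threshold)  # True: predicted responder
```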

Although tumor and stromal cells interact closely, stromal cells differ from tumor cells in their genetic architecture. Stromal cells are mostly genetically intact [143, 151], which suggests that the stroma could be a target of therapy. Heterogeneity in both tumor cells and the TME raises questions for future cancer treatment. Which of the two is easier to target? How should this two-dimensional heterogeneity be interpreted, and how are its two dimensions related? Can both be incorporated into a single system? These questions remain to be answered.

Cancer consensus molecular subtypes

Currently, there are six subtyping systems for colorectal cancer (CRC) [20, 45–49], which classify CRC into three to six subtypes (Table 2). To identify robust consensus subtypes, a consensus effort was initiated by the Colorectal Cancer Subtyping Consortium (CRCSC), which developed a network-based approach to investigate the associations between the six independent classification systems. The resulting multi-class classifier assigns CRC to one of four consensus molecular subtypes (CMS1–4) [152]. CMS1 tumors are hypermutated and microsatellite unstable and show strong immune activation. CMS2 tumors are characterized by marked Wnt and Myc signaling activation. CMS3 tumors are metabolically dysregulated. CMS4 tumors feature transforming growth factor-β activation, stromal invasion and angiogenesis signatures. These consensus results will aid future clinical stratification and subtype-based targeted interventions for CRC. Such collaborations should serve as a model for other cancer subtyping studies, both to accelerate our understanding of cancer biology [152] and to develop more effective treatments.

The use of different patient cohorts, platforms and clustering methods for a given tumor type typically yields divergent subtyping results. Breast cancer (Table 2) was first classified by Perou et al. [16] into four subtypes: luminal, basal-like, normal-like and ERBB2-overexpressing. Sørlie et al. [42] then used complementary DNA microarrays to profile 85 samples from breast cancer patients and normal controls, and applied hierarchical clustering to divide them into five subtypes: luminal A, luminal B, HER2-overexpressing, basal and normal-like. The more recent TCGA breast cancer study [43] again reported four subtypes: luminal A, luminal B, HER2-positive and triple-negative. Despite inconsistent naming and numbers of clusters across studies [16, 42, 43], breast tumors fall primarily into three major subtypes: luminal, HER2-overexpressing and triple-negative breast cancer (TNBC) [89]. The luminal subtype is the most common and carries a good prognosis; these tumors express hormone receptors, which makes them responsive to hormone therapies. HER2-overexpressing breast cancer is more sensitive to trastuzumab (Herceptin) and chemotherapy than the luminal subtype. TNBC is resistant to standard targeted therapies and carries the worst prognosis.

The next important consideration is consensus subtyping across cancer types. Although cancers are traditionally categorized by tissue of origin, similarities can be observed between them. The TCGA pan-cancer classification study [32] is a good example. It integrated data from six 'omic' platforms for 3527 tumor specimens across 12 cancer types and constructed a unified classification system comprising 11 major subtypes. Five of these subtypes were strongly associated with a single tissue of origin, whereas the remaining subtypes were not. For instance, bladder cancers split into three pan-cancer subtypes, and lung squamous, head and neck, and a subset of bladder cancers coalesced into a single subtype. This study not only provided a new classification system for multiple cancers but also demonstrated that cancers traditionally considered distinct entities share general characteristics.

Cancer is a complex disease, and without a systematic understanding of its characteristics we cannot develop effective therapies against it. Shared characteristics within and across cancer types provide opportunities to identify consensus molecular subtypes. For example, basal subtypes have been defined in breast cancer [42], bladder cancer [88] and pancreatic cancer [51], and mesenchymal subtypes in glioblastoma (GBM) [41], NPC [15], breast [153], pancreatic [19] and colon cancers [20]. Basal subtypes usually express genes such as those encoding laminins and keratins, and they have the worst prognosis among subtypes. Mesenchymal subtypes are characterized by a mesenchymal phenotype, high expression of proliferation genes, poor prognosis, high malignant potential and resistance to current therapies. Devising treatments that are effective against multiple cancer types with shared characteristics may therefore be a promising strategy for future cancer treatment.

Single-sample classification

The classifiers (or predictors) described above are mainly built from a large number of training samples; for this reason, we call them population-based predictors. In contrast, single-sample predictors (SSPs) are classification models that assign a single sample to one of the molecular subtypes of a given cancer type [154, 155]. Classifying a new sample with a population-based predictor traditionally requires reanalyzing it together with a large data set. SSPs, by contrast, assign a sample to a subtype independently of other samples, which makes them more useful and practical for individual patients. SSPs have been built for several cancer types. For instance, Sørlie et al. [154] constructed the first SSP for breast cancer, Stratford et al. [136] developed an SSP for PDAC and Ringnér et al. [156] derived an SSP for lung adenocarcinoma.
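The defining property of an SSP, that a sample is classified using only itself and fixed, pre-trained subtype centroids, can be sketched in Python with a correlation-based nearest-centroid rule. The centroids, gene count and sample values below are hypothetical; published SSPs, such as those cited above, use their own gene lists and centroids.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ssp_classify(sample, centroids):
    """Assign ONE sample to the subtype whose fixed centroid it correlates with best.

    Only the sample itself and pre-trained centroids are used; no other samples
    from the same cohort are needed.
    """
    scores = {name: pearson(sample, c) for name, c in centroids.items()}
    return max(scores, key=scores.get), scores


# Hypothetical centroids over five signature genes, and one new sample.
centroids = {"subtype_1": [3.0, 1.0, 0.0, -1.0, -3.0],
             "subtype_2": [-1.0, 3.0, 1.0, -3.0, 0.0]}
sample = [2.5, 1.2, 0.3, -0.8, -2.9]
label, scores = ssp_classify(sample, centroids)
print(label)  # subtype_1
```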

SSPs are built on tumor-intrinsic signatures and on the similarity between a given sample and the centroids of the molecular subtypes [154, 155]. Methods used in population-based predictors, such as hierarchical clustering and the nearest centroid classification method [157], can also be used in SSPs. One of the most important requirements for an SSP is that it cannot rely on row-centered (mean- or median-centered) data [158]. Molecular data matrices normally contain features in rows and samples in columns, and row-centering subtracts each feature's mean or median across the samples of a cohort, removing differences in baseline level between features. SSPs are constructed without a row-centering step, and studies have reported inconsistent classification results from them [158–160]. Sørlie et al. [161] accepted Weigelt et al.'s [158] conclusions and comments and explained the inconsistencies as follows. In the three single-channel data sets, most of the variation was caused by differences between genes rather than between samples, so in the uncentered data the correlation values fall within a much smaller range. Therefore, for a sample to be correctly assigned to a subtype, it must be centered against an appropriately large and heterogeneous sample set. Sørlie et al. [161] thus highlighted the importance of row-centering in molecular data processing.
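The cohort dependence behind the centering problem can be shown with a toy Python example (hypothetical values, with a dot product used as a simple similarity instead of correlation). The same raw sample is median-centered against two different cohorts; its centered profile, and therefore its nearest-centroid call, flips.

```python
from statistics import median


def row_center(matrix):
    """Median-center each feature (row) across the samples (columns) of one cohort."""
    centered = []
    for row in matrix:
        m = median(row)  # depends on which samples are in the cohort
        centered.append([v - m for v in row])
    return centered


def nearest_centroid(profile, centroids):
    """Pick the centroid with the largest dot product (a simple similarity)."""
    return max(centroids, key=lambda k: sum(a * b for a, b in zip(profile, centroids[k])))


# Two features x three samples. Column 0 is the SAME raw sample, [5, 5], in both cohorts.
cohort_a = [[5, 1, 1],
            [5, 9, 9]]
cohort_b = [[5, 9, 9],
            [5, 1, 1]]
centroids = {"subtype_1": [1, -1], "subtype_2": [-1, 1]}

profile_a = [row[0] for row in row_center(cohort_a)]  # [4, -4]
profile_b = [row[0] for row in row_center(cohort_b)]  # [-4, 4]
print(nearest_centroid(profile_a, centroids), nearest_centroid(profile_b, centroids))
# subtype_1 subtype_2
```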

In summary, building SSPs is a challenging but important task, and to date there is no effective way to deal with the centering problem. Although current results are not encouraging, we hope that applicable SSPs can be developed and applied in the clinic in the near future.

Conclusions and outlook

Heterogeneity renders cancer more than a single disease, which poses a significant challenge to its traditional management. With the advent of genome-wide molecular profiling, and especially with advances in high-throughput profiling technologies, researchers can now investigate the full set of genomic and epigenomic changes in cancer. In contrast with traditional classification methods, molecular classification can assign cancers to subgroups with distinct molecular characteristics, tumor biology and clinical presentation.

The most important step in the molecular subtyping of cancer is cluster analysis. Different clustering methods can produce different results, many cluster analyses are unstable, and cluster analysis is a purely exploratory method [162]. It is hard to say which algorithm is better, as this largely depends on the question asked. It is therefore important to ensure proper preprocessing and normalization of the data and to consider ensemble and consensus clustering methods. Another important step is subtype characterization. The identified subtypes should be both statistically significant and biologically relevant, which means that collecting both molecular and clinical data is mandatory to truly characterize them. Publicly available data sets can also be used to evaluate the classification performance of the classifiers.
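Consensus clustering can be sketched as follows, assuming a resampling scheme with plain k-means as the base algorithm (published pipelines often use hierarchical clustering or NMF instead). Each round clusters a random 80% subsample, and the consensus matrix records how often each pair of samples lands in the same cluster among the rounds that contain both. Values near 1 or 0 indicate stable assignments, while intermediate values flag unstable ones. The data and parameter values are hypothetical.

```python
import random


def kmeans(points, k, rng, n_iter=50):
    """Plain Lloyd's k-means with centers initialized from random data points."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def consensus_matrix(points, k, n_resamples=50, frac=0.8, seed=0):
    """Entry (i, j): fraction of the resamples containing both i and j in which
    they were assigned to the same cluster."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for _ in range(n_resamples):
        idx = rng.sample(range(n), int(frac * n))
        labels = kmeans([points[i] for i in idx], k, rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                together[i][j] += labels[a] == labels[b]
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]


# Hypothetical 2-feature data: samples 0-4 and 5-9 form two well-separated groups.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (0.15, 0.05),
          (10.0, 10.0), (10.2, 9.9), (9.9, 10.3), (10.1, 10.1), (9.8, 9.9)]
M = consensus_matrix(points, k=2)
print(M[0][1], M[0][9])  # 1.0 0.0
```

In practice the consensus matrix is computed for several values of k, and its distribution of entries is used to judge which number of clusters gives the most stable partition.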

Although numerous molecular subtyping studies have identified subtypes for various cancer types, cancer patient stratification still relies largely on traditional histopathological observation and assessment, and several challenges remain (Figure 2). The gap between research findings (identified subtypes) and clinical applications can be bridged by improved statistical methods and better interpretation of the results. Once cancers are correctly separated into subtypes, the next important step is to interpret these subtypes from a biological point of view and then move toward clinical applications. Clinical tests have already been applied successfully in breast cancer, and we hope that other cancer types will follow.

In summary, cancer should not be treated as a single disease. Molecular subtyping can identify distinct cancer subtypes, which may shed new light on treatment strategies for cancer patients. Several challenges must be addressed before molecular subtyping can be successfully applied in the clinic.

Key Points

Heterogeneity renders cancer more than a single disease. Molecular subtyping can be used to assign cancers to subgroups with distinct molecular characteristics, tumor biology and clinical presentation.
Unsupervised classification schemes have been successfully applied to identify subtypes in a large number of malignancies. From these studies, we summarize a workflow for the molecular subtyping of cancer comprising data preprocessing, cluster analysis, supervised classification and subtype characterization.
We identified and described four major challenges in cancer subtyping studies that preclude clinical implementation: data acquisition, curation and management; TME heterogeneity; the lack of consensus molecular subtypes; and problems with single-sample classification.
We suggest that standardized methods should be established to help identify intrinsic subgroup signatures and to build robust classifiers that pave the way toward stratified treatment of cancer patients.

Lan Zhao is a PhD candidate at the Department of Electronic Engineering, City University of Hong Kong. Her research interests are in the areas of machine learning, cancer genomics and computational biology.

Victor H. F. Lee is currently a Clinical Associate Professor of the Department of Clinical Oncology, the University of Hong Kong. His current interests include clinical and genetic studies on nasopharyngeal cancer, head and neck cancers, lung cancers, liver cancers and gastrointestinal cancers.

Michael K. Ng is the Head and Chair Professor of the Department of Mathematics, and Chair Professor (Affiliate) of the Department of Computer Science at the Hong Kong Baptist University. As an applied mathematician, his main research areas include bioinformatics, data mining, operations research and scientific computing.

Hong Yan received his PhD degree from Yale University. He was a Professor of imaging science at the University of Sydney and is currently Chair Professor of computer engineering at City University of Hong Kong. His research interests include bioinformatics, image processing and pattern recognition.

Maarten F. Bijlsma is an Associate Professor at the Academic Medical Center with the University of Amsterdam. His research focuses on pancreatic and esophageal cancer, from the most fundamental mechanisms that underlie aberrant signaling in these diseases, to the development of serum-borne markers in patient cohorts to predict treatment response and disease outcome. Furthermore, he is a Biomarker/Imaging Program leader for the AMC/VUmc Cancer Center Amsterdam.

Acknowledgement

The authors thank Xin Wang from Department of Biomedical Sciences of the City University of Hong Kong for comments on an earlier version of the manuscript.

Funding

This work was supported by Hong Kong Research Grants Council (RGC) (Project C1007-15G) and City University of Hong Kong (Project 7004862).

References

1

Campbell

PJ

,

Pleasance

ED

,

Stephens

PJ

, et al.

Subclonal phylogenetic structures in cancer revealed by ultra-deep sequencing

.

Proc Nat Acad Sci USA

2008

;

105

(

35

):

13081

–

6

.

2

Shipitsin

M

,

Campbell

LL

,

Argani

P

, et al.

Molecular definition of breast tumor heterogeneity

.

Cancer Cell

2007

;

11

(

3

):

259

–

73

.

3

Macintosh

CA

,

Stower

M

,

Reid

N

, et al.

Precise microdissection of human prostate cancers reveals genotypic heterogeneity

.

Cancer Res

1998

;

58

:

23

–

8

.

4

González-García

I

,

Solé

RV

,

Costa

J.

Metapopulation dynamics and spatial heterogeneity in cancer

.

Proc Natl Acad Sci USA

2002

;

99

(

20

):

13085

–

9

.

5

Iacobuzio-Donahue

CA.

Genetic evolution of pancreatic cancer: lessons learnt from the pancreatic cancer genome sequencing project

.

Gut

2012

;

61

(

7

):

1085

–

94

.

6

Penchev

VR

,

Rasheed

ZA

,

Maitra

A

, et al.

Heterogeneity and targeting of pancreatic cancer stem cells

.

Clin Cancer Res

2012

;

18

(

16

):

4277

–

84

.

7

Burrell

RA

,

McGranahan

N

,

Bartek

J

, et al.

The causes and consequences of genetic heterogeneity in cancer evolution

.

Nature

2013

;

501

(

7467

):

338

–

45

.

8

McGranahan

N

,

Swanton

C.

Biological and therapeutic impact of intratumor heterogeneity in cancer evolution

.

Cancer Cell

2015

;

27

(

1

):

15

–

26

.

9

Duggan

DJ

,

Bittner

M

,

Chen

Y

, et al.

Expression profiling using cDNA microarrays

.

Nat Genet

1999

;

21(Suppl 1)

:

10

–

14

.

10

Metzker

ML.

Sequencing technologies—the next generation

.

Nat Rev Genet

2010

;

11

(

1

):

31

–

46

.

11

Guinney

J

,

Dienstmann

R

,

Wang

X

, et al.

The consensus molecular subtypes of colorectal cancer

.

Nat Med

2015

;

21

(

11

):

1350

–

6

.

12

Bailey

P

,

Chang

DK

,

Nones

K

, et al.

Genomic analyses identify molecular subtypes of pancreatic cancer

.

Nature

2016

;

531

(

7592

):

47

–

52

.

13

Golub

TR

,

Slonim

DK

,

Tamayo

P

, et al.

Molecular classification of cancer: class discovery and class prediction by gene expression monitoring

.

Science

1999

;

286

(

5439

):

531

–

7

.

14

Alizadeh

AA

,

Eisen

MB

,

Davis

RE

, et al.

Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling

.

Nature

2000

;

403

(

6769

):

503

–

11

.

15

Zhao

L

,

Fong

AHW

,

Liu

N

, et al.

Molecular subtyping of nasopharyngeal carcinoma (NPC) and a microRNA-based prognostic model for distant metastasis

.

J Biomed Sci

2018

;

25

:

16

.

16

Perou

CM

,

Sørlie

T

,

Eisen

MB

, et al.

Molecular portraits of human breast tumours

.

Nature

2000

;

406

(

6797

):

747

–

52

.

17

Garber

ME

,

Troyanskaya

OG

,

Schluens

K

, et al.

Diversity of gene expression in adenocarcinoma of the lung

.

Proc Natl Acad Sci USA

2001

;

98

(

24

):

13784

–

9

.

18

Chen

X

,

Cheung

ST

,

So

S

, et al.

Gene expression patterns in human liver cancers

.

Mol Biol Cell

2002

;

13

(

6

):

1929

–

39

.

19

Collisson

EA

,

Sadanandam

A

,

Olson

P

, et al.

Subtypes of pancreatic ductal adenocarcinoma and their differing responses to therapy

.

Nat Med

2011

;

17

:

500

–

3

.

20

Felipe De Sousa

EM

,

Wang

X

,

Jansen

M

, et al.

Poor-prognosis colon cancer is defined by a molecularly distinct subtype and develops from serrated precursor lesions

.

Nat Med

2013

;

19

:

614

–

18

.

21

Nielsen

TO

,

West

RB

,

Linn

SC

, et al.

Molecular characterisation of soft tissue tumours: a gene expression study

.

Lancet

2002

;

359

(

9314

):

1301

–

7

.

22

Kluger

Y

,

Basri

R

,

Chang

JT

, et al.

Spectral biclustering of microarray data: coclustering genes and conditions

.

Genome Res

2003

;

13

(

4

):

703

–

16

.

23

Lee

DD

,

Seung

HS.

Learning the parts of objects by non-negative matrix factorization

.

Nature

1999

;

401

(

6755

):

788

–

91

.

24

Tibshirani

R

,

Hastie

T

,

Narasimhan

B

, et al.

Diagnosis of multiple cancer types by shrunken centroids of gene expression

.

Proc Natl Acad Sci USA

2002

;

99

(

10

):

6567

–

72

.

25

Hearst

MA

,

Dumais

ST

,

Osuna

E

, et al.

Support vector machines

.

IEEE Intell Syst Their Appl

1998

;

13

(

4

):

18

–

28

.

26

Khan

J

,

Wei

JS

,

Ringner

M

, et al.

Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks

.

Nat Med

2001

;

7

:

673

–

9

.

27

Nutt

CL

,

Mani

DR

,

Betensky

RA

, et al.

Gene expression-based classification of malignant gliomas correlates better with survival than histological classification

.

Cancer Res

2003

;

63

:

1602

–

7

.

28

Eisen

MB

,

Spellman

PT

,

Brown

PO

, et al.

Cluster analysis and display of genome-wide expression patterns

.

Proc Natl Acad Sci USA

1998

;

95

(

25

):

14863

–

8

.

29

Pena

JM

,

Lozano

JA

,

Larranaga

P.

An empirical comparison of four initialization methods for the k-means algorithm

.

Pattern Recognit Lett

1999

;

20

:

1027

–

40

.

30

Breiman

L.

Random forests

.

Mach Learn

2001

;

45

:

5

–

32

.

31

Fukunaga

K

,

Narendra

PM.

A branch and bound algorithm for computing k-nearest neighbors

.

IEEE Trans Comput

1975

;

100

:

750

–

3

.

32

Hoadley

KA

,

Yau

C

,

Wolf

DM

, et al.

Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin

.

Cell

2014

;

158

(

4

):

929

–

44

.

33

Siang

TC

,

Soon

TW

,

Kasim

S

, et al.

A review of cancer classification software for gene expression data

.

Int J Biosci Biotechnol

2015

;

7

(

4

):

89

–

108

.

34

Wang

Z

,

Gerstein

M

,

Snyder

M.

RNA-seq: a revolutionary tool for transcriptomics

.

Nat Rev Genet

2009

;

10

:

57

–

63

.

35

Guo

Y

,

Sheng

Q

,

Li

J

, et al.

Large scale comparison of gene expression levels by microarrays and RNAseq using TCGA data

.

PLoS One

2013

;

8

(

8

):

e71462

.

36

Zhao

S

,

Fung-Leung

WP

,

Bittner

A

, et al.

Comparison of RNA-seq and microarray in transcriptome profiling of activated T cells

.

PLoS One

2014

;

9

(

1

):

e78644

.

37

Shergill

IS

,

Shergill

NK

,

Arya

M

, et al.

Tissue microarrays: a current medical research tool

.

Curr Med Res Opin

2004

;

20

:

707

–

12

.

38

Veldman-Jones

MH

,

Brant

R

,

Rooney

C

, et al.

Evaluating robustness and sensitivity of the nanostring technologies ncounter platform to enable multiplexed gene expression analysis of clinical samples

.

Cancer Res

2015

;

75

(

13

):

2587

–

93

.

39

Geiss

GK

,

Bumgarner

RE

,

Birditt

B

, et al.

Direct multiplexed measurement of gene expression with color-coded probe pairs

.

Nat Biotechnol

2008

;

26

:

317

–

25

.

40

Tamayo

P

,

Slonim

D

,

Mesirov

J

, et al.

Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation

.

Proc Natl Acad Sci USA

1999

;

96

(

6

):

2907

–

12

.

41

Verhaak

RGW

,

Hoadley

KA

,

Purdom

E

, et al.

Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1

.

Cancer Cell

2010

;

17

(

1

):

98

–

110

.

42

Sørlie

T

,

Perou

CM

,

Tibshirani

R

, et al.

Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications

.

Proc Natl Acad Sci USA

2001

;

98

(

19

):

10869

–

74

.

43

Cancer Genome Atlas Network

Comprehensive molecular portraits of human breast tumours

.

Nature

2012

;

490

:

61

–

70

.

44

Curtis

C

,

Shah

SP

,

Chin

SF

, et al.

The genomic and transcriptomic architecture of 2, 000 breast tumours reveals novel subgroups

.

Nature

2012

;

486

(

7403

):

346

–

52

.

45

Schlicker

A

,

Beran

G

,

Chresta

CM

, et al.

Subtypes of primary colorectal tumors correlate with response to targeted treatment in colorectal cell lines

.

BMC Med Genomics

2012

;

5

:

66

.

46

Marisa

L

,

de Reyniès

A

,

Duval

A

, et al.

Gene expression classification of colon cancer into molecular subtypes: characterization, validation, and prognostic value

.

PLoS Med

2013

;

10

(

5

):

e1001453

.

47

Sadanandam

A

,

Lyssiotis

CA

,

Homicsko

K

, et al.

A colorectal cancer classification system that associates cellular phenotype and responses to therapy

.

Nat Med

2013

;

19

:

619

–

25

.

48

Budinska

E

,

Popovici

V

,

Tejpar

S

, et al.

Gene expression patterns unveil a new level of molecular heterogeneity in colorectal cancer

.

J Pathol

2013

;

231

(

1

):

63

–

76

.

49

Roepman

P

,

Schlicker

A

,

Tabernero

J

, et al.

Colorectal cancer intrinsic subtypes predict chemotherapy benefit, deficient mismatch repair and epithelial-to-mesenchymal transition

.

Int J Cancer

2014

;

134

(

3

):

552

–

62

.

50

Bauer

AS

,

Keller

A

,

Costello

E

, et al.

Diagnosis of pancreatic ductal adenocarcinoma and chronic pancreatitis by measurement of microRNA abundance in blood and tissue

.

PLoS One

2012

;

7

(

4

):

e34151

.

51

Moffitt

RA

,

Marayati

R

,

Flate

EL

, et al.

Virtual microdissection identifies distinct tumor-and stroma-specific subtypes of pancreatic ductal adenocarcinoma

.

Nat Genet

2015

;

47

:

1168

–

78

.

52

Marcucci

G

,

Mrózek

K

,

Bloomfield

CD.

Molecular heterogeneity and prognostic biomarkers in adults with acute myeloid leukemia and normal cytogenetics

.

Curr Opin Hematol

2005

;

12

:

68

–

75

.

53

Nones

K

,

Waddell

N

,

Song

S

, et al.

Genome-wide DNA methylation patterns in pancreatic ductal adenocarcinoma reveal epigenetic deregulation of SLIT-ROBO, ITGA2 and MET signaling

.

Int J Cancer

2014

;

135

(

5

):

1110

–

18

.

54

Waddell

N

,

Pajic

M

,

Patch

AM

, et al.

Whole genomes redefine the mutational landscape of pancreatic cancer

.

Nature

2015

;

518

(

7540

):

495

–

501

.

55

Daemen

A

,

Peterson

D

,

Sahu

N

, et al.

Metabolite profiling stratifies pancreatic ductal adenocarcinomas into subtypes with distinct sensitivities to metabolic inhibitors

.

Proc Natl Acad Sci USA

2015

;

112

(

32

):

E4410

–

17

.

56

Stratton

MR

,

Campbell

PJ

,

Futreal

PA.

The cancer genome

.

Nature

2009

;

458

(

7239

):

719

–

24

.

57

Finkelstein

SD

,

Sayegh

R

,

Christensen

S

, et al.

Genotypic classification of colorectal adenocarcinoma. Biologic behavior correlates with K-ras-2 mutation type

.

Cancer

1993

;

71

(

12

):

3827

–

38

.

58

Vural

S

,

Wang

X

,

Guda

C.

Classification of breast cancer patients using somatic mutation profiles and machine learning approaches

.

BMC Syst Biol

2016

;

10(Suppl 3)

:

62

.

59

Calin

GA

,

Liu

CG

,

Sevignani

C

, et al.

MicroRNA profiling reveals distinct signatures in B cell chronic lymphocytic leukemias

.

Proc Natl Acad Sci USA

2004

;

101

:

11755

–

60

.

60

Calin

GA

,

Croce

CM.

MicroRNA signatures in human cancers

.

Nat Rev Cancer

2006

;

6

(

11

):

857

–

66

.

61

Calin

GA

,

Garzon

R

,

Cimmino

A

, et al.

MicroRNAs and leukemias: how strong is the connection?

Leuk Res

2006

;

30

(

6

):

653

–

5

.

62

Cantini

L

,

Caselle

M

,

Forget

A

, et al.

A review of computational approaches detecting microRNAs involved in cancer

.

Front Biosci

2017

;

22

:

1774

–

91

.

63

Lu J, Getz G, Miska EA, et al. MicroRNA expression profiles classify human cancers. Nature 2005;435(7043):834–8.

64

Feuk L, Carson AR, Scherer SW. Structural variation in the human genome. Nat Rev Genet 2006;7(2):85–97.

65

Cook EH Jr, Scherer SW. Copy-number variations associated with neuropsychiatric conditions. Nature 2008;455(7215):919–23.

66

Gonzalez E, Kulkarni H, Bolivar H, et al. The influence of CCL3L1 gene-containing segmental duplications on HIV-1/AIDS susceptibility. Science 2005;307(5714):1434–40.

67

Le Maréchal C, Masson E, Chen JM, et al. Hereditary pancreatitis caused by triplication of the trypsinogen locus. Nat Genet 2006;38(12):1372.

68

Kallioniemi OP, Kallioniemi A, Piper J, et al. Optimizing comparative genomic hybridization for analysis of DNA sequence copy number changes in solid tumors. Genes Chromosomes Cancer 1994;10(4):231–43.

69

Sebat J, Lakshmi B, Troge J, et al. Large-scale copy number polymorphism in the human genome. Science 2004;305(5683):525–8.

70

Baylin SB. DNA methylation and gene silencing in cancer. Nat Clin Pract Oncol 2005;2:S4–S11.

71

Frommer M, McDonald LE, Millar DS, et al. A genomic sequencing protocol that yields a positive display of 5-methylcytosine residues in individual DNA strands. Proc Natl Acad Sci USA 1992;89(5):1827–31.

72

Huang THM, Perry MR, Laux DE. Methylation profiling of CpG islands in human breast cancer cells. Hum Mol Genet 1999;8:459–70.

73

Kwon MS, Kim Y, Lee S, et al. Integrative analysis of multi-omics data for identifying multi-markers for diagnosing pancreatic cancer. BMC Genomics 2015;16:S4.

74

Zhao Q, Shi X, Xie Y, et al. Combining multidimensional genomic measurements for predicting cancer prognosis: observations from TCGA. Brief Bioinform 2015;16:291–303.

75

Wang Y. Development of cancer diagnostics—from biomarkers to clinical tests. Transl Cancer Res 2015;4:270–9.

76

Corless CL, Spellman PT. Tackling formalin-fixed, paraffin-embedded tumor tissue with next-generation sequencing. Cancer Discov 2012;2(1):23–4.

77

Lalkhen AG, McCluskey A. Clinical tests: sensitivity and specificity. Contin Educ Anaesth Crit Care Pain 2008;8(6):221–3.

78

Linnet K, Bossuyt PMM, Moons KGM, et al. Quantifying the accuracy of a diagnostic test or marker. Clin Chem 2012;58(9):1292–301.

79

Prokopec SD, Watson JD, Waggott DM, et al. Systematic evaluation of medium-throughput mRNA abundance platforms. RNA 2013;19(1):51–62.

80

Kulkarni MM. Digital multiplexed gene expression analysis using the NanoString nCounter system. Curr Protoc Mol Biol 2011;Chapter 25:Unit25B.10.

81

Payton JE, Grieselhuber NR, Chang LW, et al. High throughput digital quantification of mRNA abundance in primary human acute myeloid leukemia samples. J Clin Invest 2009;119(6):1714–26.

82

Kononen J, Bubendorf L, Kallioniemi A, et al. Tissue microarrays for high-throughput molecular profiling of tumor specimens. Nat Med 1998;4:844–7.

83

Rimm DL, Camp RL, Charette LA, et al. Amplification of tissue by construction of tissue microarrays. Exp Mol Pathol 2001;70:255–64.

84

Schmidt LH, Biesterfeld S, Kümmel A, et al. Tissue microarrays are reliable tools for the clinicopathological characterization of lung cancer tissue. Anticancer Res 2009;29:201–9.

85

Camp RL, Neumeister V, Rimm DL. A decade of tissue microarrays: progress in the discovery and validation of cancer biomarkers. J Clin Oncol 2008;26(34):5630–7.

86

Hoos A, Cordon-Cardo C. Tissue microarray profiling of cancer specimens and cell lines: opportunities and limitations. Lab Invest 2001;81:1331–8.

87

Xu R, Wunsch DC. Clustering algorithms in biomedical research: a review. IEEE Rev Biomed Eng 2010;3:120–54.

88

Cancer Genome Atlas Research Network. Comprehensive molecular characterization of urothelial bladder carcinoma. Nature 2014;507:315–22.

89

Dai X, Li T, Bai Z, et al. Breast cancer intrinsic subtype classification, clinical use and future trends. Am J Cancer Res 2015;5:2929–43.

90

Madeira SC, Oliveira AL. Biclustering algorithms for biological data analysis: a survey. IEEE/ACM Trans Comput Biol Bioinform 2004;1(1):24–45.

91

Cheng Y, Church GM. Biclustering of expression data. Proc Int Conf Intell Syst Mol Biol 2000;8:93–103.

92

Gan X, Liew AW-C, Yan H. Discovering biclusters in gene expression data based on high-dimensional linear geometries. BMC Bioinformatics 2008;9(1):209.

93

Zhao H, Liew AW-C, Xie X, et al. A new geometric biclustering algorithm based on the Hough transform for analysis of large-scale microarray data. J Theor Biol 2008;251:264–74.

94

Mankad S, Michailidis G. Biclustering three-dimensional data arrays with plaid models. J Comput Graph Stat 2014;23:943–65.

95

Narmadha N, Rathipriya R. Triclustering: an evolution of clustering. In: 2016 Online International Conference on Green Engineering and Technologies (IC-GET). IEEE, Coimbatore, India, 2016, 1–4.

96

Li Y, Ngom A. Classification of clinical gene-sample-time microarray expression data via tensor decomposition methods. In: Computational Intelligence Methods for Bioinformatics and Biostatistics. Springer-Verlag Berlin, Heidelberg, Palermo, Italy, 2011, 275–86.

97

Luo Y, Wang F, Szolovits P. Tensor factorization toward precision medicine. Brief Bioinform 2017;18:511–4.

98

Tibshirani R, Walther G, Hastie T. Estimating the number of clusters in a data set via the gap statistic. J R Stat Soc Series B Stat Methodol 2001;63:411–23.

99

Brunet J-P, Tamayo P, Golub TR, et al. Metagenes and molecular pattern discovery using matrix factorization. Proc Natl Acad Sci USA 2004;101(12):4164–9.

100

Vega-Pons S, Ruiz-Shulcloper J. A survey of clustering ensemble algorithms. Int J Pattern Recognit Artif Intell 2011;25(03):337–72.

101

Monti S, Tamayo P, Mesirov J, et al. Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data. Mach Learn 2003;52:91–118.

102

Mukhopadhyay A, Bandyopadhyay S, Maulik U. Multi-class clustering of cancer subtypes through SVM based ensemble of Pareto-optimal solutions for gene marker identification. PLoS One 2010;5(11):e13803.

103

Wang X, Markowetz F, De Sousa E Melo F, et al. Dissecting cancer heterogeneity–an unsupervised classification approach. Int J Biochem Cell Biol 2013;45:2574–9.

104

Lex A, Streit M, Schulz H-J, et al. StratomeX: visual analysis of large-scale heterogeneous genomics data for cancer subtype characterization. Comput Graph Forum 2012;31:1175–84.

105

Subramanian A, Tamayo P, Mootha VK, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA 2005;102:15545–50.

106

Ashburner M, Ball CA, Blake JA, et al. Gene Ontology: tool for the unification of biology. Nat Genet 2000;25:25–9.

107

Kanehisa M, Goto S, Hattori M, et al. From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res 2006;34(Database issue):D354–7.

108

Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. J Am Stat Assoc 1958;53:457–81.

109

Mantel N. Evaluation of survival data and two new rank order statistics arising in its consideration. Cancer Chemother Rep 1966;50:163–70.

110

Shen T, Pajaro-Van de Stadt SH, Yeat NC, et al. Clinical applications of next generation sequencing in cancer: from panels, to exomes, to genomes. Front Genet 2015;6:215.

111

Peyser ND, Grandis JR. Cancer genomics: spot the difference. Nature 2017;541(7636):162–3.

112

Voduc D, Kenney C, Nielsen TO. Tissue microarrays in clinical oncology. Semin Radiat Oncol 2008;18(2):89–97.

113

Terry J, Saito T, Subramanian S, et al. TLE1 as a diagnostic immunohistochemical marker for synovial sarcoma emerging from gene expression profiling studies. Am J Surg Pathol 2007;31:240–6.

114

Hans CP, Weisenburger DD, Greiner TC, et al. Confirmation of the molecular classification of diffuse large B-cell lymphoma by immunohistochemistry using a tissue microarray. Blood 2004;103(1):275–82.

115

Yersal O, Barutca S. Biological subtypes of breast cancer: prognostic and therapeutic implications. World J Clin Oncol 2014;5:412–24.

116

van 't Veer LJ, Dai H, van de Vijver MJ, et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature 2002;415(6871):530–6.

117

Paik S, Shak S, Tang G, et al. A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N Engl J Med 2004;351(27):2817–26.

118

Toussaint J, Sieuwerts AM, Haibe-Kains B, et al. Improvement of the clinical applicability of the Genomic Grade Index through a qRT-PCR test performed on frozen and formalin-fixed paraffin-embedded tissues. BMC Genomics 2009;10:424.

119

Sotiriou C, Wirapati P, Loi S, et al. Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. J Natl Cancer Inst 2006;98(4):262–72.

120

Wirapati P, Sotiriou C, Kunkel S, et al. Meta-analysis of gene expression profiles in breast cancer: toward a unified understanding of breast cancer subtyping and prognosis signatures. Breast Cancer Res 2008;10:R65.

121

Rouzier R, Perou CM, Symmans WF, et al. Breast cancer molecular subtypes respond differently to preoperative chemotherapy. Clin Cancer Res 2005;11:5678–85.

122

Ross DT, Scherf U, Eisen MB, et al. Systematic variation in gene expression patterns in human cancer cell lines. Nat Genet 2000;24(3):227–35.

123

Gao H, Korn JM, Ferretti S, et al. High-throughput screening using patient-derived tumor xenografts to predict clinical trial drug response. Nat Med 2015;21:1318–25.

124

Kao J, Salari K, Bocanegra M, et al. Molecular profiling of breast cancer cell lines defines relevant tumor models and provides a resource for cancer gene discovery. PLoS One 2009;4(7):e6146.

125

Kim N, He N, Yoon S. Cell line modeling for systems medicine in cancers (Review). Int J Oncol 2014;44:371–6.

126

Shoemaker RH, Monks A, Alley MC, et al. Development of human tumor cell line panels for use in disease-oriented drug screening. Prog Clin Biol Res 1987;276:265–86.

127

Garnett MJ, Edelman EJ, Heidorn SJ, et al. Systematic identification of genomic markers of drug sensitivity in cancer cells. Nature 2012;483(7391):570–5.

128

Workman P, Kaye SB. Translating basic cancer research into new cancer therapeutics. Trends Mol Med 2002;8(4):S1–9.

129

Clarke PA, te Poele R, Workman P. Gene expression microarray technologies in the development of new therapeutic agents. Eur J Cancer 2004;40:2560–91.

130

Hijazi H, Wu M, Nath A, et al. Ensemble classification of cancer types and biomarker identification. Drug Dev Res 2012;73:414–19.

131

Michiels S, Koscielny S, Hill C. Prediction of cancer outcome with microarrays: a multiple random validation strategy. Lancet 2005;365(9458):488–92.

132

Hudson TJ, Anderson W, Aretz A, et al. International network of cancer genome projects. Nature 2010;464(7291):993–8.

133

Tomczak K, Czerwińska P, Wiznerowicz M. The Cancer Genome Atlas (TCGA): an immeasurable source of knowledge. Contemp Oncol 2015;19(1A):A68.

134

Barrett T. Gene Expression Omnibus (GEO). 2013. https://www.ncbi.nlm.nih.gov/books/NBK159736/.

135

Neoptolemos JP, Stocken DD, Friess H, et al. A randomized trial of chemoradiotherapy and chemotherapy after resection of pancreatic cancer. N Engl J Med 2004;350:1200–10.

136

Stratford JK, Bentrem DJ, Anderson JM, et al. A six-gene signature predicts survival of patients with localized pancreatic ductal adenocarcinoma. PLoS Med 2010;7(7):e1000307.

137

Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 2007;8(1):118–27.

138

Leek JT, Storey JD. Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet 2007;3(9):e161.


139

Benito M, Parker J, Du Q, et al. Adjustment of systematic microarray data biases. Bioinformatics 2004;20(1):105–14.

140

Emmert-Buck MR, Bonner RF, Smith PD, et al. Laser capture microdissection. Science 1996;274(5289):998–1001.

141

Yoshihara K, Shahmoradgoli M, Martínez E, et al. Inferring tumour purity and stromal and immune cell admixture from expression data. Nat Commun 2013;4:2612.

142

Song S, Nones K, Miller D, et al. qpure: a tool to estimate tumor cellularity from genome-wide single-nucleotide polymorphism profiles. PLoS One 2012;7(9):e45835.

143

Bhome R, Bullock MD, Al Saihati HA, et al. A top-down view of the tumor microenvironment: structure, cells and signaling. Front Cell Dev Biol 2015;3:33.

144

Gajewski TF, Schreiber H, Fu Y-X. Innate and adaptive immune cells in the tumor microenvironment. Nat Immunol 2013;14:1014–22.

145

Jiménez-Sánchez A, Memon D, Pourpe S, et al. Heterogeneous tumor-immune microenvironments among differentially growing metastases in an ovarian cancer patient. Cell 2017;170:927–38.e20.

146

Orimo A, Weinberg RA. Heterogeneity of stromal fibroblasts in tumor. Cancer Biol Ther 2007;6(4):618–9.

147

Bergamaschi A, Tagliabue E, Sørlie T, et al. Extracellular matrix signature identifies breast cancer subgroups with different clinical outcome. J Pathol 2008;214(3):357–67.

148

Pickup MW, Mouw JK, Weaver VM. The extracellular matrix modulates the hallmarks of cancer. EMBO Rep 2014;15(12):1243–53.

149

Pagès F, Galon J, Dieu-Nosjean MC, et al. Immune infiltration in human tumors: a prognostic factor that should not be ignored. Oncogene 2010;29(8):1093–102.

150

Frantz C, Stewart KM, Weaver VM. The extracellular matrix at a glance. J Cell Sci 2010;123(Pt 24):4195–200.

151

Allinen M, Beroukhim R, Cai L, et al. Molecular characterization of the tumor microenvironment in breast cancer. Cancer Cell 2004;6(1):17–32.

152

Guinney J, Dienstmann R, Wang X, et al. The consensus molecular subtypes of colorectal cancer. Nat Med 2015;21:1350–6.

153

Lehmann BD, Bauer JA, Chen X, et al. Identification of human triple-negative breast cancer subtypes and preclinical models for selection of targeted therapies. J Clin Invest 2011;121(7):2750.

154

Sørlie T, Tibshirani R, Parker J, et al. Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc Natl Acad Sci USA 2003;100(14):8418–23.

155

Hu Z, Fan C, Oh DS, et al. The molecular portraits of breast tumors are conserved across microarray platforms. BMC Genomics 2006;7:96.

156

Ringnér M, Jönsson G, Staaf J. Prognostic and chemotherapy predictive value of gene-expression phenotypes in primary lung adenocarcinoma. Clin Cancer Res 2016;22:218–29.

157

Haibe-Kains B, Desmedt C, Loi S, et al. A three-gene model to robustly identify breast cancer molecular subtypes. J Natl Cancer Inst 2012;104(4):311–25.

158

Weigelt B, Mackay A, A'hern R, et al. Breast cancer molecular profiling with single sample predictors: a retrospective analysis. Lancet Oncol 2010;11(4):339–49.

159

Lusa L, McShane LM, Reid JF, et al. Challenges in projecting clustering results across gene expression–profiling datasets. J Natl Cancer Inst 2007;99(22):1715–23.

160

Guiu S, Michiels S, Andre F, et al. Molecular subclasses of breast cancer: how do we define them? The IMPAKT 2012 Working Group Statement. Ann Oncol 2012;23:2997–3006.

161

Sørlie T, Borgan E, Myhre S, et al. The importance of gene-centring microarray data. Lancet Oncol 2010;11:719–20.

162

Allison DB, Page GP, Beasley TM, et al. DNA Microarrays and Related Genomics Techniques: Design, Analysis, and Interpretation of Experiments. 2005.


163

Mantione KJ, Kream RM, Kuzelova H, et al. Comparing bioinformatic gene expression profiling methods: microarray and RNA-Seq. Med Sci Monit Basic Res 2014;20:138–42.

164

Khansarinejad B, Soleimanjahi H, Mirab Samiee S, et al. Monitoring human cytomegalovirus infection in pediatric hematopoietic stem cell transplant recipients: using an affordable in-house qPCR assay for management of HCMV infection under limited resources. Transpl Int 2015;28:594–603.

165

Pires ARC, Andreiuolo F da M, de Souza SR. TMA for all: a new method for the construction of tissue microarrays without recipient paraffin block using custom-built needles. Diagn Pathol 2006;1:14.

166

SEQC/MAQC-III Consortium. A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium. Nat Biotechnol 2014;32:903–14.

167

Singh A, Sau AK. Tissue microarray: a powerful and rapidly evolving tool for high-throughput analysis of clinical specimens. IJCRI 2010;1:1–11.