Deep Subspace Mutual Learning for cancer subtypes prediction

Stacked convolutional auto-encoder structures (Du et al., 2017; Masci et al., 2011) are selected because they indirectly reflect the interactions between genes. The Rectified Linear Unit (Krizhevsky et al., 2012) is adopted as the non-linear activation function in the convolutional layers. The nodes in the self-expressive layer are fully connected by linear weights, i.e. C, without bias or non-linear activation. The input of the self-expressive layer is the output of the encoder layers, which involve non-linear activation functions. Hence, although only linear connections are used in the self-expressive layer, the network as a whole still achieves a non-linear self-expression of the data. The weight connecting each point to itself in the self-expressive layer is set to zero, i.e. the constraint $diag (C) = 0$ in Equations (4) and (5), denoted by red dashed lines in Figure 1.
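
As a minimal sketch of the self-expressive layer described above, the following pure-Python snippet applies a weight matrix C with a zeroed diagonal to a small matrix of latent codes. The function names and the toy values are illustrative assumptions, not taken from the authors' implementation; columns are treated as samples.

```python
def zero_diagonal(C):
    """Return a copy of C with its diagonal set to zero (diag(C) = 0)."""
    n = len(C)
    return [[0.0 if i == j else C[i][j] for j in range(n)] for i in range(n)]


def self_expressive(Z, C):
    """Return Z C, where Z is d x N (one column per sample) and C is N x N."""
    n = len(C)
    return [[sum(row[k] * C[k][j] for k in range(n)) for j in range(n)]
            for row in Z]


# Toy latent codes: 2 features, 3 samples; sample 3 equals sample 1 + sample 2.
Z = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]

# Hypothetical weights; the diagonal entries (9.0) are removed by the constraint.
C = zero_diagonal([[9.0, 0.0, 1.0],
                   [0.0, 9.0, 1.0],
                   [0.0, 0.0, 9.0]])

# Column 3 of Z_hat rebuilds sample 3 from samples 1 and 2 only.
Z_hat = self_expressive(Z, C)
```

Without the zero diagonal, the identity matrix would reconstruct every sample from itself and reveal nothing about similarity between samples.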

2.1.2 Multi-level omics data representation learning model

$X = \{X^{(1)}, X^{(2)}, \dots, X^{(V)}\}$ denotes a set of multi-view samples, where each view corresponds to one level of omics data. $X^{(v)} = [x_{1}^{(v)}, x_{2}^{(v)}, \dots, x_{N}^{(v)}] \in R^{D_{v} \times N}$ is the data of the vth view, where $v = 1, 2, \dots, V$, with V and $D_{v}$ being the number of views and the data dimensionality of the vth view, respectively. The architecture of the proposed DSML is illustrated in Figure 2. It jointly learns a latent individual representation and similarity for each single view in the branch parts, as well as a holistic representation and similarity across multiple views in the concentrated main-stem part. As the figure shows, the branch and main-stem parts are each built from DSCNs. Specifically, the intrinsic representation of each view is automatically extracted via specific-view encoding; meanwhile, the similarity of intra-view data is obtained via specific-view self-expressiveness. In other words, a branch, i.e. a DSCN, is constructed for each view. The representations extracted from the views are then concatenated and input to the main-stem part, where the intact representation and similarity of the data from all views are learnt by the multi-view encoding and multi-view self-expressiveness parts in Figure 2, respectively. This joint optimization problem can be formulated as follows:

\begin{matrix} \min_{Z^{(v)}, Z^{(M)}, C^{(v)}, C^{(M)}} \left( \sum_{v = 1}^{V} \| X^{(v)} - \hat{X}^{(v)} \|_{F}^{2} + \| X^{(M)} - \hat{X}^{(M)} \|_{F}^{2} \right) \\ + λ_{1} \left( \sum_{v = 1}^{V} \| C^{(v)} \|_{F}^{2} + \| C^{(M)} \|_{F}^{2} \right) \\ + λ_{2} \left( \sum_{v = 1}^{V} \| Z^{(v)} - Z^{(v)} C^{(v)} \|_{F}^{2} + \| Z^{(M)} - Z^{(M)} C^{(M)} \|_{F}^{2} \right) \\ \text{s.t.} \ diag (C^{(v)}) = 0, \ diag (C^{(M)}) = 0. \end{matrix}

(6)
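
To make the three groups of terms in Equation (6) concrete, the sketch below evaluates the objective for given matrices. It only computes the value of the loss; it does not train anything. The dictionary layout, the function names and the toy numbers are assumptions introduced for illustration.

```python
def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def fro2(A):
    """Squared Frobenius norm of A."""
    return sum(a * a for row in A for a in row)


def fro2_diff(A, B):
    """Squared Frobenius norm of A - B."""
    return sum((a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def dsml_objective(parts, lam1, lam2):
    """Value of Equation (6); `parts` holds every branch plus the main-stem.

    Each part is a dict with keys "X", "X_hat", "Z" and "C" (hypothetical layout).
    """
    recon = sum(fro2_diff(p["X"], p["X_hat"]) for p in parts)
    reg = sum(fro2(p["C"]) for p in parts)
    selfexp = sum(fro2_diff(p["Z"], matmul(p["Z"], p["C"])) for p in parts)
    return recon + lam1 * reg + lam2 * selfexp


# Toy part with one feature and two identical samples, so Z = Z C holds exactly.
toy = {"X": [[1.0, 2.0]], "X_hat": [[1.0, 1.0]],
       "Z": [[1.0, 1.0]], "C": [[0.0, 1.0], [1.0, 0.0]]}

# One branch and one main-stem, both using the toy values.
value = dsml_objective([toy, toy], lam1=1.0, lam2=100.0)
```

Here the reconstruction terms contribute 2, the regularization terms contribute 4 and the self-expressive terms vanish, so `value` is 6.0.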

Fig. 2.

Overview of DSML. DSML is composed of several branches (pink, orange and green standing for different views) to achieve specific-view encoding and specific-view self-expressiveness, and a concentrated main-stem (shown in blue) to realize multi-view encoding and multi-view self-expressiveness. Specific-view encoding extracts latent feature representations automatically from each view and specific-view self-expressiveness uncovers the intra-view similarity. Accordingly, the holistic representations from different views are connected and integrated via multi-view encoding. The holistic similarity learnt from multi-view self-expressiveness can be used for the subsequent clustering task.

Each notation in Equation (6) has a meaning similar to that in Equation (5). In the multi-view context, however, the superscript v indexes the V branches for the V individual levels of omics data, and M denotes the main-stem for the integrated data. That is, $X^{(M)} = {[{Z^{(1)}}^{T}, {Z^{(2)}}^{T}, \dots, {Z^{(V)}}^{T}]}^{T}$, where $Z^{(v)}$ is the output of the encoder in the vth branch, i.e. the extracted features of the vth level of omics data. The notation T denotes the transpose of a vector or a matrix.
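
The construction of the main-stem input amounts to stacking the branch outputs feature-wise while keeping one column per patient. A small sketch, assuming each branch output is stored as a list of rows (features) with one column per sample; `stack_views` is a hypothetical helper name:

```python
def stack_views(Z_list):
    """Stack branch outputs vertically: [Z1^T, Z2^T, ...]^T.

    Every Z in Z_list must have the same number of columns (samples).
    """
    n_samples = len(Z_list[0][0])
    for Z in Z_list:
        if any(len(row) != n_samples for row in Z):
            raise ValueError("all views must cover the same samples")
    return [list(row) for Z in Z_list for row in Z]


# Two toy branch outputs for N = 3 samples: 2 latent features and 1 latent feature.
Z1 = [[1.0, 2.0, 3.0],
      [4.0, 5.0, 6.0]]
Z2 = [[7.0, 8.0, 9.0]]

X_M = stack_views([Z1, Z2])  # 3 rows (2 + 1 features), 3 columns (samples)
```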

A network with this branch and main-stem structure, optimized jointly, can realize mutual learning. The branches can be seen as a pool of students. The independent learning aim of each branch is to obtain the individual representation and similarity of one omics level, while the consistent learning aim of the main-stem is to obtain the similarity across all omics levels. DSML is a feed-forward neural network, hence the representation of each omics level, i.e. $Z^{(v)}$, affects the connection weights within the main-stem part. DSML is optimized by back propagation, hence the learning of the main-stem part in turn affects the $Z^{(v)}$ of each branch. Furthermore, the representation $Z^{(v)}$ also influences the similarity relationship, i.e. the self-expressive weights $C^{(v)}$. Mutual learning is thus carried out among specific-view encoding and self-expressiveness as well as multi-view encoding and self-expressiveness, and all of them are improved during training. Besides, each branch of a trained DSML can be employed as an independent model for uncovering the representation and similarity of single-level data. Since multi-level omics data are involved in training, each trained branch already contains complementary information from the other levels. In practice, even if a patient has only one level of data at test time, the prediction made by the corresponding trained branch can still achieve satisfactory results.

Algorithm 1

The DSML training algorithm.

Input: Multi-level data $X$, trade-off parameters $λ_{1}$ and $λ_{2}$.

Output: Self-expressive weights $C^{(v)}, C^{(M)}, v = 1, 2, \dots, V$.

1: Construct and train auto-encoder networks $A^{(v)}$ for vth level data by minimizing reconstruction error ${‖ X^{(v)} - {\hat{X}}^{(v)} ‖}_{F}^{2}$.

2: Initialize specific-view encoding and decoding parts with $A^{(v)}$ for each level data.

3: Learn specific-view self-expressive weights and fine-tune all branches by Equation (5).

4: Connect representation $Z^{(v)}$ from each branch to form input data $X^{(M)}$ of main-stem part.

5: Construct and train auto-encoder networks $A^{(M)}$ by minimizing reconstruction error ${‖ X^{(M)} - {\hat{X}}^{(M)} ‖}_{F}^{2}$.

6: Initialize multi-view encoding and decoding with $A^{(M)}$.

7: Learn and fine-tune self-expressive weights in main-stem by Equation (5).

8: Fine-tune overall DSML networks by Equation (6).

9: return $C^{(v)}, C^{(M)}$

Training the proposed model involves two processes, pre-training and fine-tuning, applied in turn to the branch parts, the main-stem part and the overall network. In the pre-training stage, only the auto-encoder, without the self-expressive structure, is used. The network weights are obtained by using Restricted Boltzmann Machines and the back propagation algorithm with stochastic gradient descent on mini-batches. In the fine-tuning stage, the encoder and decoder layers are initialized with the weights obtained in the pre-training stage. The weights of the self-expressive layer are learnt from the corresponding loss function by back propagation. Since C represents the self-expressive layer parameters and $c_{ii} = 0$, the weights connecting each point to itself in the self-expressive layer are set to zero and are not changed in later updates, shown as the red dashed lines in Figures 1 and 2. Additionally, in fine-tuning all the data should be used as a single large batch, because each node in the self-expressive layer denotes a data sample and all samples must be involved in updating the network weights. The detailed training procedure is described in Algorithm 1.
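
The two practical points of fine-tuning, a diagonal that stays at zero and updates computed on all samples at once, can be illustrated on the self-expressive weights alone. The sketch below is an assumption-laden simplification: it runs plain gradient descent on $λ_{1} ‖C‖_{F}^{2} + λ_{2} ‖Z - ZC‖_{F}^{2}$ for a fixed Z, with the diagonal entries masked out of every update. It is not the authors' training code, it omits the encoder and decoder entirely, and the step size and toy values are arbitrary.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def residual(Z, C):
    """Z - Z C, computed on the full batch (all N samples at once)."""
    ZC = matmul(Z, C)
    return [[z - zc for z, zc in zip(rz, rzc)] for rz, rzc in zip(Z, ZC)]


def loss(Z, C, lam1, lam2):
    reg = sum(c * c for row in C for c in row)
    fit = sum(r * r for row in residual(Z, C) for r in row)
    return lam1 * reg + lam2 * fit


def masked_step(Z, C, lam1, lam2, lr):
    """One gradient step on C; diagonal entries are pinned to zero."""
    G = matmul(transpose(Z), residual(Z, C))  # Z^T (Z - Z C), an N x N matrix
    n = len(C)
    return [[0.0 if i == j
             else C[i][j] - lr * (2 * lam1 * C[i][j] - 2 * lam2 * G[i][j])
             for j in range(n)] for i in range(n)]


Z = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
C = [[0.0] * 3 for _ in range(3)]
initial = loss(Z, C, lam1=1.0, lam2=1.0)

for _ in range(50):
    C = masked_step(Z, C, lam1=1.0, lam2=1.0, lr=0.05)

final = loss(Z, C, lam1=1.0, lam2=1.0)
```

Because each column of C couples all samples through Z, the gradient needs every sample in the same update, which is why a mini-batch split is not used here.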

2.1.3 DSML for partial-level omics data

DSML incorporates a mutual learning mechanism, so it can handle datasets in which only a subset of the omics was measured for some samples, i.e. partial-level omics data. As shown in Figure 2, each branch learns the representation and similarity of the data from one omics level, and the main-stem controls the consensus learning by fusing the representations from all branches. Consequently, each branch can be seen as an independent model for single-level omics data. In clinical applications, even when the patient to be diagnosed has only single-level omics data, the corresponding branch within DSML could still achieve satisfactory prediction results, since this branch has already absorbed information from the other omics during training via mutual learning. Moreover, if the ith patient has several omics but lacks the vth, we can set $x_{i}^{(v)}$ to the all-zero vector and input it directly to the complete DSML model. The lost omics data would not produce a noticeable effect on the representation obtained by the overall data fusion. DSML thereby effectively ignores the lost omics data and uses the available partial-level omics data to predict the cancer subtypes in a natural manner.
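
The zero-filling of a lost omics level can be sketched as follows. The record layout (a dictionary per patient with `None` for an unmeasured view), the view names and the dimensionalities are hypothetical; only the all-zero replacement itself comes from the text.

```python
def fill_missing_views(sample, dims):
    """Return a copy of `sample` in which every missing view is an all-zero vector.

    sample: dict mapping view name -> list of features, or None if not measured
    dims:   dict mapping view name -> dimensionality D_v of that view
    """
    filled = {}
    for view, d in dims.items():
        values = sample.get(view)
        filled[view] = list(values) if values is not None else [0.0] * d
    return filled


dims = {"mRNA": 4, "miRNA": 3, "DNAm": 2}          # toy dimensionalities
patient = {"mRNA": None, "miRNA": [0.2, -0.1, 0.7], "DNAm": [0.5, 0.3]}

complete = fill_missing_views(patient, dims)        # mRNA becomes [0.0] * 4
```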

2.2 Spectral clustering

A similarity matrix $S$ is constructed, where $S_{i j} = \frac{1}{2} (| C_{i j}^{(M)} | + | C_{j i}^{(M)} |)$. The corresponding diagonal degree matrix $D$ and the Laplacian matrix $L$ are defined as follows:

L = I - D^{- 1 / 2} S D^{- 1 / 2}, \quad D_{i i} = \sum_{j} S_{i j} .

(7)
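
A short sketch of this construction, taking $D_{ii}$ to be the ith row sum of S (the usual degree, stated here as an assumption about the intended definition). The weight matrix is a toy example and the function names are illustrative.

```python
import math


def similarity(C):
    """S_ij = (|C_ij| + |C_ji|) / 2, a symmetric non-negative matrix."""
    n = len(C)
    return [[0.5 * (abs(C[i][j]) + abs(C[j][i])) for j in range(n)]
            for i in range(n)]


def normalized_laplacian(S):
    """L = I - D^{-1/2} S D^{-1/2}, with D_ii the ith row sum of S."""
    n = len(S)
    degree = [sum(row) for row in S]
    # Isolated samples (degree 0) get a zero scaling factor to avoid dividing by zero.
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in degree]
    return [[(1.0 if i == j else 0.0) - inv_sqrt[i] * S[i][j] * inv_sqrt[j]
             for j in range(n)] for i in range(n)]


# Toy self-expressive weights for three samples (zero diagonal).
C_M = [[0.0, 0.8, 0.0],
       [0.4, 0.0, 0.2],
       [0.0, 0.6, 0.0]]

S = similarity(C_M)
L = normalized_laplacian(S)
```

Symmetrizing with absolute values is what turns the possibly asymmetric, signed weights into a valid affinity for spectral clustering.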

The spectral clustering (Ng et al., 2001) results are obtained by solving the following optimization problem,

\begin{matrix} \min_{B} Trace (B^{T} L B), \\ \text{s.t.} \ B^{T} B = I, \end{matrix}

(8)

where I is the identity matrix, $B = Y {(Y^{T} Y)}^{- 1 / 2}$ and $Y = {[y_{1}^{T}, y_{2}^{T}, \dots, y_{N}^{T}]}^{T}$. Each $y_{i}$ encodes the clustering result, e.g. $y_{i} (k) = 1$ indicates that the ith patient belongs to the kth cancer subtype.
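
As background on how a problem of this form is usually solved (a standard result stated here for orientation, not a step spelled out in the text): if the discrete structure of Y is dropped and only the orthogonality constraint is kept, the minimizer is given by eigenvectors of L. The symbols $u_{k}$, $\mu_{k}$ and K (the number of subtypes) are introduced here for illustration.

```latex
% Relaxed form of Equation (8): B is any N x K matrix with orthonormal columns.
L u_{k} = \mu_{k} u_{k}, \qquad \mu_{1} \le \mu_{2} \le \dots \le \mu_{N},
\qquad
B^{\ast} = [u_{1}, u_{2}, \dots, u_{K}],
\qquad
Trace ({B^{\ast}}^{T} L B^{\ast}) = \sum_{k = 1}^{K} \mu_{k} .
```

In the procedure of Ng et al. (2001), the rows of $B^{\ast}$ are then normalized to unit length and grouped with k-means to recover discrete assignments.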

2.3 Materials

In this article, five publicly available benchmark datasets from TCGA are used to validate the ability of different integrative algorithms. These datasets cover the following cancer types: Breast Invasive Carcinoma (BIC), COlon ADenocarcinoma (COAD), GBM, Kidney Renal Clear Cell Carcinoma (KRCCC) and Lung Squamous Cell Carcinoma (LSCC). Three levels of omics data, mRNA expression, miRNA expression and DNA methylation, are used to analyse each cancer type. All datasets used in this article were preprocessed as in Rappoport and Shamir (2018, 2019). The corresponding code can be downloaded from the NEMO website (http://acgt.cs.tau.ac.il/nemo/). The number of patients ranges from 184 for KRCCC to 621 for BIC.

3 Results

The proposed method DSML is compared with six multi-omics prediction algorithms on five full multi-level cancer datasets, and then with selected methods on the same cancer datasets with only partial-level data.

3.1 Full multi-level omics datasets

Several experiments were performed to demonstrate the effectiveness of integrating and clustering multi-level omics data for cancer subtype prediction. We compare DSML on each dataset with six different methods. We select the classical method SNF, as well as other relevant approaches, including Consensus Clustering (CC) and SNF.CC, which are implemented in the R package CancerSubtypes (Xu et al., 2017). Moreover, we adopt some late integrative methods, such as PINS, LRAcluster and NEMO.

Differential survival between the clusters and enrichment of clinical labels are used to assess the clustering performance (Rappoport and Shamir, 2019). For survival analysis, the P-value of the logrank test of the Cox regression (Hosmer and Lemeshow, 1999) model is used to assess the significance of the difference in survival profiles between subtypes. This P-value is the probability that a difference in survival at least as large as the observed one would arise by chance. For enrichment analysis, the same set of clinical information is adopted for all cancers: age at initial diagnosis, gender, and four discrete clinical pathological parameters, which quantify the progression of the tumor (pathologic T), cancer in lymph nodes (pathologic N), metastases (pathologic M) and total progression (pathologic stage).

Different algorithms use their own strategies to estimate the number of clusters, and usually obtain different results. For a consistent comparison, we adopt the numbers of clusters suggested by Wang et al. (2014) for all methods in the experiments. Hence, the number of clusters is set to five for BIC, three for COAD, three for GBM, three for KRCCC and four for LSCC. The values of the data features are normalized between −1 and 1 before training. We use the publicly available code of the competing methods and follow the conventional parameter settings therein. For DSML, several values of each parameter are tested, and the best one is selected according to the silhouette value of the clustering results. There is one convolutional layer in both the encoder and the decoder. The number of filters is set to 15 and the filter size is set to 1 × 5. The learning rate is set to 0.001. We always set the trade-off parameter $λ_{1}$ to one for simplicity, and pick the value of $λ_{2}$ from the candidate set $\{20, 50, 100, 150, 200, 250, 350\}$. We find that $λ_{2} = 100$ achieves satisfactory performance in most cases.
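
The text does not say how the feature normalization is carried out. One common choice, shown here purely as an assumption, is per-feature min-max scaling onto the interval from −1 to 1; the upper bound of 1, the per-feature treatment and the handling of constant features are all choices made for this sketch.

```python
def scale_feature(values, lo=-1.0, hi=1.0):
    """Min-max scale one feature (its values across all patients) onto [lo, hi]."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        # A constant feature carries no information; map it to the midpoint.
        return [0.5 * (lo + hi)] * len(values)
    return [lo + (hi - lo) * (v - vmin) / (vmax - vmin) for v in values]


scaled = scale_feature([2.0, 4.0, 6.0])   # smallest -> -1, largest -> 1
flat = scale_feature([3.0, 3.0])          # constant feature -> midpoint 0.0
```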

Figure 3 and Table 1 show the prediction performance of the seven algorithms on the cancer datasets. DSML discovers clusters with a significant difference in survival for four of the five cancer types. DSML has an average −log10 logrank P-value of 2.2, and the second-best method, SNF.CC, has 2.0. Moreover, the average number of enriched clinical parameters of DSML is 2.0, while PINS and CC are tied for second with 1.8. Therefore, DSML can produce coherent and clinically relevant patient subtypes.

Fig. 3.

Mean performance of the different algorithms on five cancer datasets. Y-axis represents average −log10 logrank test’s P-values and X-axis represents average number of enriched clinical parameters in the clusters. The red dotted lines highlight DSML’s performance

Table 1.

Prediction performance comparison of integrative algorithms on multi-omics cancer datasets

Alg./cancer	BIC	COAD	GBM	KRCCC	LSCC	Mean
SNF	2/1.7	1/0.3	1/3.2	2/1.0	1/1.3	1.4/1.5
CC	2/2.2	1/0.1	1/1.6	4/2.3	1/0.8	1.8/1.4
SNF.CC	2/3.8	1/0.4	1/2.9	3/0.9	1/1.8	1.6/2.0
PINS	3/3.4	1/0.4	1/2.7	3/0.3	1/1.2	1.8/1.6
LRAcluster	3/2.0	1/0.3	1/1.6	2/3.8	1/1.1	1.6/1.8
NEMO	2/1.9	1/0.1	1/3.9	2/1.1	1/1.2	1.4/1.6
DSML	3/3.9	1/1.0	2/2.3	3/1.9	1/1.9	2.0/2.2

Note: Within each cell, the first number indicates significant clinical parameters detected, the second number is −log10 P-value for survival, 0.05 is selected as the threshold for significance and the significant results are shown in bold. Mean is algorithm average.
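
The DSML row of Table 1 can be checked against the significance threshold and the reported means with a few lines of arithmetic. The values below are copied from the table; the variable names are illustrative.

```python
import math

# P < 0.05 corresponds to -log10(P) > -log10(0.05), roughly 1.301.
THRESHOLD = -math.log10(0.05)

# DSML row of Table 1: (enriched clinical parameters, -log10 logrank P-value).
dsml = {"BIC": (3, 3.9), "COAD": (1, 1.0), "GBM": (2, 2.3),
        "KRCCC": (3, 1.9), "LSCC": (1, 1.9)}

significant = [cancer for cancer, (_, neg_log_p) in dsml.items()
               if neg_log_p > THRESHOLD]
mean_enriched = sum(e for e, _ in dsml.values()) / len(dsml)
mean_neg_log_p = sum(p for _, p in dsml.values()) / len(dsml)
```

COAD, at 1.0, is the only dataset below the threshold, which matches the four significant cancer types and the 2.0/2.2 entry in the Mean column.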

To evaluate the robustness of the proposed DSML to the parameter $λ_{2}$, we vary its value over the candidate set and run DSML on the BIC dataset with three omics. Figure 4 shows the number of significantly enriched clinical parameters and the −log10 logrank P-value for varying $λ_{2}$. It can be seen from the figure that changing $λ_{2}$ has little effect on the prediction performance. Hence, we can conclude that DSML is relatively robust to the choice of $λ_{2}$.

Fig. 4.

Robustness analysis for DSML

3.2 Partial-level omics datasets

In order to evaluate the performance and the flexibility of the algorithm on partial-level omics datasets, we apply DSML in two scenarios.

In the first scenario, only single-level omics data are available at diagnosis. Since the branch parts in DSML communicate with and learn from each other during training, the weights of a single branch network already contain information from multiple omics. Thus, even when only one omics level is available at diagnosis, the branch part can still handle the situation. Specifically, the corresponding branch within the trained DSML is used to obtain the similarity matrix of the given single-level data. Then, spectral clustering based on this similarity matrix is used to identify cancer subtypes. For comparison, we apply conventional spectral clustering to the original single-level data. The results are presented in Table 2. Clustering with the similarity matrix obtained by the branch model performs better than clustering the original single-level data, with higher mean values for both measures on mRNA and DNA methylation and a higher mean −log10 P-value on miRNA. This indicates that the mutual learning mechanism can improve the data representation for subtype prediction. Even when only one level of data is used for subtype prediction, DSML still achieves satisfactory results.

Table 2.

Prediction performance comparison of spectral clustering and branches within DSML on each single omics data

	Spectral clustering			Branches within DSML
	mRNA	miRNA	DNAm	mRNA	miRNA	DNAm
BIC	2/1.4	2/0.5	1/0.7	3/2.9	2/1.8	2/1.7
COAD	0/0.3	0/0.4	0/0.6	1/0.9	0/0.5	1/0.6
GBM	1/0.9	1/0.8	1/0.4	1/2.0	1/1.9	2/1.8
KRCCC	1/0.6	1/1.0	1/0.1	2/1.9	1/2.0	2/1.9
LSCC	1/0.2	1/0.5	1/0.3	1/1.0	1/0.9	1/1.7
Mean	1.0/0.7	1.0/0.6	0.8/0.4	1.6/1.7	1.0/1.4	1.6/1.5

Note: mRNA, miRNA and DNAm denote mRNA expression, miRNA expression and DNA methylation data, respectively. The numbers in each cell have the same meaning as in Table 1.

In the second scenario, some patients lack some omics measurements. In the experiments, we randomly sample a fraction θ of the patients and remove their mRNA expression data, as described for NEMO (Rappoport and Shamir, 2019). This procedure is repeated five times. Survival analysis and enrichment of clinical labels are again used to measure the quality of the predictions. The average results of DSML and NEMO over all five cancer types are shown in Figure 5. The figure shows that DSML performs better than NEMO with respect to both survival and enrichment analysis at all missing rates. These results suggest that DSML can be robustly applied to partial-level omics datasets.
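
The sampling step of this experiment could be set up along the following lines. The patient identifiers, the cohort size, the seeds and the rounding of θ times the cohort size are all assumptions made for the sketch; the text specifies only the random fraction θ and the five repetitions.

```python
import random


def patients_without_mrna(patient_ids, theta, seed):
    """Pick a random fraction theta of patients whose mRNA data will be removed."""
    rng = random.Random(seed)           # a fixed seed makes each repetition reproducible
    k = round(theta * len(patient_ids))
    return set(rng.sample(sorted(patient_ids), k))


patients = [f"P{i:03d}" for i in range(200)]   # hypothetical cohort of 200 patients

# Five repetitions at a missing rate of 0.3, one seed per repetition.
runs = [patients_without_mrna(patients, theta=0.3, seed=s) for s in range(5)]
```

The selected patients would then have their mRNA view treated as missing before the model is applied, and the evaluation measures would be averaged over the five runs.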

Fig. 5.

Average performance as a function of the fraction of samples missing data in mRNA expression. The top plot shows the results of enriched clinical parameters and the bottom plot shows the results of survival analysis

In general, the proposed DSML can obtain the cancer subtypes with statistically significant difference in survival profiles and significant clinical enrichment. Moreover, DSML can effectively solve the problem of partial-level omics data. Hence, DSML is a powerful framework for predicting cancer subtypes.

4 Conclusion

Cancer subtype prediction plays an important role in personalized medicine, since correctly stratifying patients into subtypes enables more targeted treatment and would ultimately lead to better survival rates. Integrating multiple levels of omics data can significantly improve clinical outcome predictions, since cancer is a phenotypic end-point event accumulated across multiple levels of the biological system, from genome to proteome. In this study, a method called DSML has been proposed for subtype prediction by integrating multi-level omics data. DSML employs deep neural networks incorporating subspace learning and mutual learning to recover the intrinsic similarity relationships within and across levels of data, and then adopts spectral clustering to predict patient subtypes. DSML extracts discriminative features simultaneously in multiple branch parts and fuses them in the main-stem part. The mutual learning strategy provides an effective solution to the problem of missing partial-level data. Experimental results on five TCGA multi-omics datasets indicate that DSML has better integrative performance than the other methods compared. Moreover, DSML also handles prediction from single-level and partial-level omics data. Thus, DSML is a more general framework for integrative analysis of multi-level omics data. In future work, we plan to incorporate protein–protein interaction networks to improve the interpretability of the integrative strategy.

Acknowledgements

We would like to thank Shuhui Liu for useful conversations. We are grateful to anonymous reviewers for their many helpful and constructive comments that improved the presentation of the article.

Funding

This work was supported by National Natural Science Foundation of China (NSFC) Grant (61806159, 61972312); Xi’an Municipal Science and Technology Program (2020KJRC0027); Natural Science Basic Research Program of Shaanxi (2020JM-575); and Doctoral Scientific Research Foundation of Xi’an Polytechnic University (BS202108).

Conflict of Interest: none declared.

References

Akavia U.D. et al. (2010) An integrated approach to uncover drivers of cancer. Cell, 143, 1005–1017.

Alizadeh A.A. et al. (2015) Toward understanding and exploiting tumor heterogeneity. Nat. Med., 846–853.

Bailey et al.; Australian Pancreatic Cancer Genome Initiative. (2016) Genomic analyses identify molecular subtypes of pancreatic cancer. Nature, 531.

Beroukhim et al. (2010) The landscape of somatic copy-number alteration across human cancers. Nature, 463, 899–905.

Chin, Gray J.W. (2008) Translating insights from the cancer genome into clinical practice. Nature, 452, 553–563.

Collisson E.A. et al. (2019) Molecular subtypes of pancreatic cancer. Nat. Rev. Gastroenterol. Hepatol., 207–220.

Croce C.M. (2008) Oncogenes and cancer. N. Engl. J. Med., 358, 502–511.

Davis-Dusenbery B.N., Hata (2010) MicroRNA in cancer: the involvement of aberrant microRNA biogenesis regulatory pathways. Genes Cancer, 1100–1114.

Du et al. (2017) Stacked convolutional denoising auto-encoders for feature representation. IEEE Trans. Cybern., 1017–1027.

Elhamifar, Vidal (2013) Sparse subspace clustering: algorithm, theory, and applications. IEEE Trans. Pattern Anal. Mach. Intell., 2765–2781.

Gao, Church (2005) Improving molecular cancer class discovery through sparse non-negative matrix factorization. Bioinformatics, 3970–3975.

Hanash (2004) Integrated global profiling of cancer. Nat. Rev. Cancer, 638–644.

Heiser L.M. et al. (2012) Subtype and pathway specific responses to anticancer compounds in breast cancer. Proc. Natl. Acad. Sci. USA, 109, 2724–2729.

Hosmer D.W., Lemeshow (1999) Applied Survival Analysis: Regression Modeling of Time to Event Data. Wiley-Interscience, New York, USA.

Jahid M.J. et al. (2014) A personalized committee classification approach to improving prediction of breast cancer metastasis. Bioinformatics, 1858–1866.

et al. (2017) Deep subspace clustering networks. In: Neural Information Processing Systems. Long Beach, USA.

Kanaci et al. (2019) Multi-task mutual learning for vehicle re-identification. In: Proceedings IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, USA.

Kim et al. (2012) Synergistic effect of different levels of genomic data for cancer clinical outcome prediction. J. Biomed. Inform., 1191–1198.

Krizhevsky et al. (2012) ImageNet classification with deep convolutional neural networks. In: Proceedings Neural Information Processing Systems. pp. 1097–1105. Lake Tahoe, USA.

Lanckriet G.R. et al. (2004) A statistical framework for genomic data fusion. Bioinformatics, 2626–2635.

Lerman, Maunu (2018) An overview of robust subspace recovery. Proc. IEEE, 106, 1380–1410.

Liu et al. (2018) Low rank subspace clustering via discrete constraint and hypergraph regularization for tumor molecular pattern discovery. IEEE/ACM Trans. Comput. Biol. Bioinform., 1500–1512.

Liu, Shang (2018) Hierarchical similarity network fusion for discovering cancer subtypes. In: International Symposium on Bioinformatics Research and Applications. Beijing, China.

et al. (2005) MicroRNA expression profiles classify human cancers. Nature, 435, 834–838.

Masci et al. (2011) Stacked convolutional auto-encoders for hierarchical feature extraction. In: Proceedings International Conference on Artificial Neural Networks. Granada, Spain.

et al. (2013) Pattern discovery and cancer gene identification in integrated cancer genomic data. Proc. Natl. Acad. Sci. USA, 110, 4245–4250.

Ng A.Y. et al. (2001) On spectral clustering: analysis and an algorithm. In: Proceedings Neural Information Processing Systems. pp. 849–856. British Columbia, Canada.

Nguyen et al. (2017) A novel approach for data integration and disease subtyping. Genome Res., 2025–2039.

Nigro J.M. et al. (2005) Integrated array-comparative genomic hybridization and expression array profiles identify clinically relevant molecular subtypes of glioblastoma. Cancer Res., 1678–1686.

Noushmehr et al.; Cancer Genome Atlas Research Network. (2010) Identification of a CpG island methylator phenotype that defines a distinct subgroup of glioma. Cancer Cell, 510–522.

Patel V.M., Vidal (2014) Kernel sparse subspace clustering. In: IEEE International Conference on Image Processing. pp. 2849–2853. Paris, France.

Peng et al. (2018) Structured autoencoders for subspace clustering. IEEE Trans. Image Process., 5076–5086.

Peng et al. (2020) Deep subspace clustering. IEEE Trans. Neural Netw. Learn. Syst., 5509–5521.

Prat et al. (2010) Phenotypic and molecular characterization of the claudin-low intrinsic subtype of breast cancer. Breast Cancer Res., R68–R85.

Rappoport, Shamir (2018) Multi-omic and multi-view clustering algorithms: review and cancer benchmark. Nucleic Acids Res., 10546–10562.

Rappoport, Shamir (2019) NEMO: cancer subtyping by integration of partial multi-omic data. Bioinformatics, 3348–3356.

Ritchie M.D. et al. (2015) Methods of integrating data to uncover genotype-phenotype interactions. Nat. Rev. Genet.

Sanai (2010) Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma. World Neurosurg.

Shen et al. (2009) Integrative clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis. Bioinformatics, 2906–2912.

Soltanolkotabi et al. (2013) Robust subspace clustering. Ann. Stat., 669–699.

Speicher N.K., Pfeifer (2015) Integrating different data types by regularized unsupervised multiple kernel learning with application to cancer subtype discovery. Bioinformatics, i268–i275.

Viale (2012) The current state of breast cancer classification. Ann. Oncol., 23 (Suppl. 10), x207–x210.

Wang et al. (2014) Similarity network fusion for aggregating data types on a genomic scale. Nat. Methods, 333–337.

Wang Y.X. (2016) Noisy sparse subspace clustering. J. Mach. Learn. Res.