Abstract

We present a survey of the current state of the art in breast cancer detection and prognosis. We analyze the evolution of Artificial Intelligence (AI)-based approaches from uni-modal to multi-modal information for detection, and how this paradigm shift improves the efficacy of detection, consistent with clinical observations. We conclude that interpretable AI-based predictions and the ability to handle class imbalance should be treated as priorities.

INTRODUCTION

Carcinogenesis, influenced by genetics and the environment, affects all organs and tissues, leading to various cancers. Key factors in its progression include evasion of cell death, uncontrolled cell division, increased blood vessel formation, resistance to anti-growth signals, self-sufficiency in growth signals and metastatic potential [1]. Breast and skin cancers are among the most common: breast cancer shows high incidence and mortality rates, while skin cancer has shown a high incidence with moderate mortality in recent decades. GLOBOCAN 2020 data [2] reported 2.3 million new breast cancer cases globally and ranked breast cancer as the fifth leading cause of cancer-related deaths. The Global Cancer Observatory [3] offers comprehensive global cancer statistics for research and control efforts. Its 2020 estimate showed 19 292 789 new cancer cases, of which breast cancer accounted for 11.7%. This share rises to 24.5% when only female cancer patients are considered. By 2040, new diagnoses are projected to reach 3.2 million per year, with 1 million deaths [4].

Breast cancer mortality is a serious concern, and it is higher in developing countries. The mortality-to-incidence ratio, which serves as a proxy for the 5-year survival rate of breast cancer patients, was 0.30 globally in 2020. In countries with developed and less developed health care, the 5-year survival rates are 89.6% and 76.3%, respectively, for localized breast cancer, and 75.4% and 47.4%, respectively, for regional breast cancer [5]. Even when patients survive the disease, their quality of life remains compromised by its aftereffects and financial burden; this loss is measured in Disability-Adjusted Life Years (DALYs). According to the World Health Organization, breast cancer alone contributes 19.6 million DALYs [6].

Unlike male breast cells, female breast cells are hormonally sensitive, particularly to estrogen, androgen and progesterone [7]. The risk of breast cancer increases with age, with a 1.5% risk at age 40, 3% at age 50 and over 4% at age 70 [8]. Most breast cancer patients (80%) are over 50 years old, and 40% of them are aged 65 or older [9–11]. A family history of breast or ovarian cancer due to BRCA1 and BRCA2 mutations elevates the risk of breast cancer [12, 13]. Besides BRCA1 and BRCA2 [14], high-penetrance genes such as TP53, CDH1, PTEN and STK11 [15–19], as well as moderate-penetrance DNA-repair genes such as ATM, PALB2, BRIP1, CHEK2 and XRCC2 [20–24], are linked to increased breast cancer risk. White non-Hispanic women have the highest breast cancer incidence, whereas Black women have a higher mortality rate [25, 26]. Because females are more susceptible to breast cancer, research on pregnancy [27, 28], breastfeeding [29], the timing of first menstruation and menopause [30] and hormonal imbalances is crucial to understanding breast carcinogenesis. Females with higher breast tissue density, a history of any non-cancerous alteration in the breast, or radiation therapy before the age of 30 are also at a higher risk of breast cancer [31–35].

Access to diverse multi-modal data is crucial for predicting breast cancer clinical outcomes. These data originate from various sources and are referred to as multi-modal or multi-view data. The sources encompass genomic profiles, clinical details, whole slide images (WSIs) of microscopic tissue, ultrasound (US), Magnetic Resonance Imaging (MRI) and mammograms, all of which are heterogeneous and complex. Breast cancer prognosis plays a vital role in guiding treatment decisions: it enables oncologists to anticipate treatment outcomes and plan accordingly, ultimately reducing patient suffering by avoiding unnecessary, toxic therapies and lowering the economic burden [36]. Molecular subtype identification and survival prognosis prediction have become central concerns in modern breast cancer research because of the heterogeneity and complexity resulting from multi-modality and diverse clinical outcomes [37]. Addressing this challenge requires intelligent, automated systems capable of analyzing complex data and delivering accurate cancer prognoses.

This survey discusses breast cancer-related topics, including available databases, feature selection, dimensionality reduction and the shift from uni-modal to multi-modal machine learning (ML) and deep learning (DL) for molecular subtype and survival prognosis predictions. It also discusses the challenges that ML algorithms face in survival prognosis and how our interventions mitigate them.

DATABASES OF BREAST CANCER

Several organizations provide publicly accessible breast cancer databases for widespread use, including uni-modal and multi-modal cohorts of breast cancer patients. TCGA (The Cancer Genome Atlas) is the most widely used database in cancer studies. It is a collaborative outcome of the National Human Genome Research Institute and the National Cancer Institute, and it provides a multi-modal PAN-Cancer database. The TCGA database includes samples from 33 different types of cancer, such as breast, lung, kidney, liver, prostate and cervix. In this study, we focus only on the breast cancer samples from the TCGA database, which are referred to as TCGA-BRCA. The dataset is multi-modal in that it contains clinical data, multi-omics data and histopathological whole slide images. The multi-omics data are further categorized as RNA-Seq (mRNA), microRNA (miRNA), DNA methylation, copy number variations (CNV), somatic copy number alterations (CNA), sequencing data and Reverse Phase Protein Array (RPPA). Gene Expression Omnibus (GEO) is a second database of cancer patients. For breast cancer studies, GEO offers accessible microarray-based expression profiling datasets generated using the Affymetrix, Illumina and Agilent platforms (the Supplementary file has further details).

FEATURE SELECTION AND DIMENSIONALITY REDUCTION IN MULTI-MODAL BREAST CANCER DATA

DL-based classifiers can learn from raw features and do not require feature selection or feature engineering. However, in the low-sample regime with a large number of features, designing a model that generalizes becomes a serious challenge. This scenario is unavoidable in high-dimensional breast cancer multi-omics datasets, so training a DL classifier directly on multi-omics data is not advisable. A smaller feature space obtained through representation learning on multi-omics data aids interpretation and speeds up training, while an adequate representation helps capture hidden biological and technical patterns. In this regard, the widely accepted methods follow feature selection, feature extraction or a hybrid of the two. Support vector machine-based recursive feature elimination (SVM-RFE) [38] was proposed as a better feature selection method that removes redundant and irrelevant gene signatures. It identified 50 gene signatures that outperformed the previously highlighted 70 gene signatures [39].

Cristovao et al. [40] follow a naive feature selection on mRNA expression and miRNA from the ARCHS4 and TCGA-BRCA databases to pick a common set of genes. Pouryahya et al. [41] apply a gene selection technique to multi-omics features at the gene level and select the common set of genes present across all the modalities (mRNA expression, CNA and DNA methylation) and the PPI network.

Feature selection advanced further with [42], which uses the Chi-squared test to obtain the top 200 relevant gene features from gene expression and CNA profiles via a wrapper method that combines minimum redundancy maximum relevance (mRMR) [43] with a radial basis function SVM. Sun et al. [44] use mRMR, while Guo et al. [45] use a modified mRMR method (fast-MRMR) to select features from genomic profiles. Lin et al. [46] combine feature selection and feature extraction to obtain the top 5000 features from each omics profile (mRNA, DNA methylation and CNV).
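The greedy idea behind mRMR can be illustrated with a minimal sketch. It assumes features have already been discretized, scores each candidate as relevance minus mean redundancy (the difference form of the criterion), and estimates both terms with plug-in mutual information. The function names and the toy gene names are hypothetical; this is not the implementation used in [43–45].

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_select(features, labels, k):
    """Greedily pick k feature names from a dict of name -> discrete values.

    Each step picks the feature maximizing
    MI(feature, labels) - mean MI(feature, already selected features).
    """
    relevance = {
        name: mutual_information(col, labels) for name, col in features.items()
    }
    selected = []
    remaining = set(features)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): gene_b duplicates gene_a, gene_c is weaker but new.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "gene_a": [0, 0, 0, 1, 1, 1, 1, 1],
    "gene_b": [0, 0, 0, 1, 1, 1, 1, 1],
    "gene_c": [0, 0, 1, 0, 0, 1, 1, 1],
}
print(mrmr_select(features, labels, 2))
```

In this toy case the duplicated feature is skipped in favor of the less relevant but non-redundant one, which is the behavior the redundancy term is meant to produce.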

In the context of extracting meaningful patterns from raw data, Liu et al. [47] combine the raw features of mRNA and CNV from the TCGA-BRCA database. Viaud et al. [48] first perform feature selection by retaining the features with the highest variance/mean ratios, and then explore group factor analysis methods and autoencoders to obtain an integrative representation of DNA methylation, miRNA expression, mRNA expression and RPPA expression. Reference [40] explores the Variational Autoencoder (VAE) [49] and the Conditional Variational Autoencoder (CVAE) [50], conditioned on the tissue type, to obtain dimension-reduced representative features of multi-omics data.
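Selecting features by variance/mean ratio, as described for [48], can be sketched in a few lines. The sketch assumes non-negative measurements arranged as one row per sample, uses the population variance, and skips features whose mean is not positive; these choices and all names are illustrative assumptions, not details taken from [48].

```python
from statistics import mean, pvariance


def top_dispersion_features(matrix, names, k):
    """Return the k feature names with the largest variance/mean ratio.

    matrix: list of samples, each a list of values aligned with `names`.
    """
    ratios = []
    for j, name in enumerate(names):
        column = [row[j] for row in matrix]
        mu = mean(column)
        if mu <= 0:
            continue  # ratio undefined or not meaningful for this feature
        ratios.append((pvariance(column) / mu, name))
    # Highest ratio first; ties broken by name for reproducibility.
    ratios.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in ratios[:k]]


# Toy expression matrix (hypothetical values), four samples by three features.
names = ["g1", "g2", "g3"]
matrix = [
    [1, 0, 4],
    [1, 10, 6],
    [1, 0, 4],
    [1, 10, 6],
]
print(top_dispersion_features(matrix, names, 2))
```

The constant feature g1 has zero dispersion and is dropped first, which is the intended effect of this filter before any representation learning step.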

UNI-MODAL ARCHITECTURES AND ML ALGORITHMS

This section covers the challenges of applying predictive inference techniques, including classical ML and DL architectures, to uni-modal data (mammograms, US, MRI, etc.). Mammography is one of the most effective ways of detecting early breast cancer. A mammogram is an X-ray image of the breast tissue. The two standard views captured by radiologists are the craniocaudal (CC) view and the mediolateral oblique (MLO) view. Mammograms reveal the presence of masses (soft tissues) and microcalcifications, the two most important indicators of malignancy. Berkman et al. [51] used background-corrected images with a CNN to classify regions of interest (ROIs) on mammograms as masses or normal tissue. The dataset was small, with 85 benign and 83 malignant masses. Carneiro et al. [52] used both the CC and MLO views on the publicly available InBreast and DDSM datasets with mass and microcalcification segmentation. Classifying lesions as malignant or benign with the CNN-F model gave reasonably good results. Wei et al. [53] investigated ML algorithms for automatically classifying clustered microcalcifications as malignant or benign using the two mammogram views. The dataset was built at the Department of Radiology at the University of Chicago and comprised eight features computed from the mammogram to characterize a microcalcification cluster. The authors used several classical and DL models and found that SVM achieved the highest AUC.

The appearance of tumors (spiculated masses in particular) may give useful insight into malignancy. Kooi et al. [54] built a CNN and used Gaussian kernels for the detection of solid, malignant lesions, outperforming computer-aided diagnosis and detection (CAD) systems on a large dataset of 45 000 images. Recently, a few large studies have shown that women with dense breasts have a higher risk of breast cancer and that mammographic density works as a risk marker for breast cancer. The work in [55] classifies breast densities into ‘scattered density’ and ‘heterogeneously dense’ (the two difficult-to-distinguish categories). A similar work by Lehman et al. [56] used 58 894 digital mammograms with a ResNet-18 architecture to categorize breast density into four categories: almost entirely fatty, scattered areas of fibroglandular tissue, heterogeneously dense and extremely dense. The categories follow the standards of the BI-RADS lexicon. A binary grouping of the four categories, dense (categories c and d) versus non-dense (categories a and b), is also evaluated. Arora et al. [57] trained multiple DL architectures (AlexNet, VGG16, etc.) on the CBIS-DDSM (Curated Breast Imaging Subset of Digital Database for Screening Mammography) dataset. Lie et al. [58] presented the DenseNet II architecture to distinguish benign from malignant cases in breast cancer diagnosis.

US and MRI modalities

In the past, US and MRI images have been used independently for breast cancer prognosis. However, these individual modalities have not been used frequently in recent multi-modal fusion approaches.

US: US uses high-frequency sound waves to image lesions and tissue in the breast. The waves capture the details of the region of interest and convert them into an image, which is later analyzed by an experienced radiologist. Byra et al. [59] used VGG19 on two publicly available datasets, UDIAT and the Open Access Series of Breast Ultrasonic Data. Choi et al. [60] proposed a DL-based CAD that uses B-mode US images to differentiate lesions into ‘possibly benign’ and ‘possibly malignant’ categories. Huang et al. [61] explored a 2-stage CNN (G-CNN) for classifying cancer into five BI-RADS assessment categories: probably benign, low suspicion for malignancy, moderate suspicion for malignancy, high suspicion for malignancy and highly suggestive of malignancy.

MRI: Multiparametric MRI images taken before, during and after Neoadjuvant Chemotherapy (NAC) are also used to predict tumor response to NAC. Breast cancer patients who undergo NAC may achieve a pathologic complete response (pCR), a partial response or no response. Predicting the response to NAC is important to avoid delays in surgical decisions. Aghaei et al. [62] give an interesting example of predicting tumor response from pretreatment MRIs alone. They used quantitative kinetic imaging features from the pretreatment MRI scans of cancer patients. Yuhong et al. [63] incorporated both pre-NAC and post-NAC MRI datasets to predict the pCR to NAC.

SHIFTING THE PARADIGM FROM UNI-MODAL TO MULTIMODAL FOR MOLECULAR SUBTYPE CLASSIFICATION

Breast cancer can be broadly categorized into molecular subtypes, which primarily depend on the expression levels of Estrogen Receptor, Progesterone Receptor, Human Epidermal growth factor Receptor 2 (HER2) and Ki-67. There are four main molecular subtypes: Luminal A, Luminal B, HER2 and Triple Negative (TN). Individualized treatment plans for breast cancer become effective once the molecular subtype is identified preoperatively. In general, Luminal A is treated with endocrine therapy; Luminal B can be treated with endocrine therapy and cytotoxic chemotherapy; HER2 is treated with targeted therapy and cytotoxic chemotherapy; and cytotoxic chemotherapy is the main treatment for TN breast cancer. Various ML and DL models have been developed and trained to predict molecular subtypes.

Uni-modal approaches

It is important to investigate how a model performs when a single modality is used for molecular subtype classification. Ha et al. [64] report reasonable accuracy using a CNN on MRI images. Likewise, Zhe et al. [65] trained VGGNet with SVMs to separate the Luminal A subtype from the other subtypes on MRI images. However, Zhang et al. [66] obtained superior results with a more sophisticated approach that uses bounding boxes to capture the ROI and classifies the HR+/HER2-, HER2+ and TN groups from dynamic contrast-enhanced MRI images. Perou et al. [67] used hierarchical clustering to group genes by the similarity of the pattern with which their expression varied over all samples. They found that tumors show great variation in their patterns of gene expression, which helps identify connections between specific genes and specific tumors. They also developed a system for classifying tumors as ‘basal type’ or ‘luminal type’ on the basis of their gene expression patterns. An extension of this study [68] gave a broader picture of gene expression pattern-based breast tumor subtyping and its clinical implications.

Multimodal approaches

If we consider molecular subtyping of breast cancer as an unsupervised learning task, then clustering the samples into different subgroups becomes the main objective. In this direction, Viaud et al. [48] conducted an extensive study of breast cancer subtype clustering on multi-omics data, using an integrative representation of DNA methylation, miRNA, mRNA and RPPA from the TCGA-BRCA database. The resulting clusters provide an opportunity to gain biological insight and to identify biological markers that characterize different cancer subtypes. Pouryahya et al. [41] proposed an integrative network-based framework over multi-omics data (mRNA, DNA methylation and CNA) from the TCGA-BRCA database. To compute the integrative measures, this method uses a PPI network from the Human Protein Reference Database. The measure defines a weighted network for each sample, considering the concordance of all three omics of a gene and its neighbors in the interaction network. The final outcome is a set of clusters concordant with the PAM50 molecular subtypes of breast cancer. Furthermore, survival analysis of each cluster shows significant variation in survival rates, which establishes the validity of the clusters and their usefulness for clinical outcome analysis such as survival rate, and for GO enrichment analysis to discover biomarkers associated with each subtype.

Other studies use the clinical information of the patients and treat breast cancer subtyping as a supervised learning task. Here, researchers focus on the usefulness of DNNs in multi-class classification. Zeng et al. [69] proposed a multi-modal CNN, ‘CNN5’, to classify breast cancer samples from mRNA and CNA (TCGA-BRCA) into five subtypes. The non-uniform data distribution induced a class imbalance problem, which was tackled with a loss function weighted in proportion to the subtypes. Liu et al. [47] proposed a hybrid DL framework for mRNA, CNV and WSI data. Lin et al. [46] proposed DeepMO, a DL-based model that uses multi-omics data for breast cancer subtype classification. The model applies late fusion to integrate the features of the multi-omics data (mRNA, DNA methylation and CNV) encoded by deep neural networks. Guo et al. [45] proposed an attention-based GCN for breast cancer subtype classification. The model has two parts, a feature fusion module and a prediction module.
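A class-weighted loss of the kind mentioned for subtype imbalance can be sketched as follows. The sketch assumes inverse-frequency class weights and a weight-normalized categorical cross-entropy; the exact weighting scheme used in [69] is not stated here, so both functions and the toy subtype counts are illustrative assumptions.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weight-normalized categorical cross-entropy.

    probs: one dict per sample mapping class -> predicted probability.
    """
    total = sum(
        weights[y] * -math.log(p[y]) for p, y in zip(probs, labels)
    )
    return total / sum(weights[y] for y in labels)


# Toy imbalanced cohort (hypothetical): 8 majority and 2 minority samples.
labels = ["LumA"] * 8 + ["Basal"] * 2
weights = inverse_frequency_weights(labels)
uniform = [{"LumA": 0.5, "Basal": 0.5} for _ in labels]
print(weights, weighted_cross_entropy(uniform, labels, weights))
```

With these weights, each class contributes equally to the total loss, so a misclassified minority sample costs more than a misclassified majority sample.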

The paucity of publicly available labeled data motivated a switch from supervised learning to semi-supervised learning, for which the TCGA PAN-Cancer and ARCHS4 databases have been used [40]. In that work, multi-class logistic regression and a feed-forward neural network are used to classify breast cancer samples into five popular subtypes. The model is trained on the more generic problem of identifying 32 non-BRCA tumor types and 5 BRCA subtypes. The authors also employed VAE and CVAE to learn meaningful and simplified low-dimensional features of the multi-omics data. Two independent studies on molecular subtype classification by Jiang et al. [70] and Meng et al. [71] used multi-modal data (US images and MRI images). A recent study by Zhou et al. [72] used an assembled CNN (incorporating DenseNet 121, ResNet 50 and SENet 50) to predict the breast cancer molecular subtype from uni-, bi- and multi-modal data, namely greyscale US images, Colour Doppler Flow Imaging and Shear-Wave Elastography images of breast cancer patients. Another study by Zhang et al. [73] used mammography and US with ResNet50, incorporating inter-modality and intra-modality attention models. Another article combined MRI and mammography images with clinical features [74] and used a decision tree; it found that an individual modality (MRI alone or mammography alone) could not perform better than the combined modalities.

ML- AND DL-BASED APPROACHES FOR BREAST CANCER SURVIVAL PROGNOSIS

Besides molecular subtype and signature gene identification in breast cancer cases, survival prediction helps in selecting aggressive or regressive treatments.

Uni-modal approaches

Uni-modal breast cancer prognosis refers to the prognostication of patients using a single source of information. The early literature identifies gene expression profiles as a useful modality for the prognostication of breast cancer patients. A search for genes correlated with patient survival in locally advanced breast cancer showed significantly different outcomes for patients belonging to the various groups. Initial research on gene expression profiles [39] identified 70 gene signatures for the prognostic or diagnostic classification of breast tumors. Van de Vijver et al. [75] further validated these 70 gene signatures as a predictor of breast cancer survival in 295 women using a supervised classification method. Uni-modal studies have also proposed SVM-based classifiers combined with other algorithms [76–78] to distinguish between benign and malignant breast tumors on the Wisconsin Breast Cancer Diagnosis and Prognosis dataset [79].

Uni-modal studies were not limited to gene expression; they also explored the usefulness of histopathology in cancer prognosis prediction. Applications of WSIs include an integrated framework that suggests markers from histopathology slides for AI-aided diagnosis and survival [80], and the extraction of 9879 quantitative image features from H&E-stained histopathology WSIs of cancer patients followed by traditional ML algorithms for survival prognosis [81, 82].

Multi-modal approaches

Early multi-modal cancer prognosis and diagnosis studies were mostly bi-modal. Sun et al. [83] integrated gene and clinical signatures into a hybrid signature-based model, in which they further reduced the 70 gene signatures identified by [39] to three and the clinical features to two. In a similar direction, [84] and [85] proposed probabilistic models that integrate microarray and clinical data for breast cancer prognosis and diagnosis.

Subsequently, the shift from bi-modal to multi-modal approaches [42, 44, 86–89] employed multi-modal fusion techniques with DL to improve the survival prognosis of breast cancer. Alkhateeb et al. [42] proposed combining self-organizing maps (SOM) and CNNs in a majority voting setting to predict the 5-year survival of breast cancer patients in METABRIC, which has gene expression, CNA and clinical modalities. The method trains separate CNNs on RGB images of SOM-derived relational networks to classify patients as dead or alive at a survival cut-off of 5 years. Sun et al. [44] proposed ‘MDNNMD’ to integrate the information from the same three heterogeneous modalities. The authors used a score-level fusion technique to obtain the final survival classification at a 5-year cut-off, and the results are superior to those of uni-modal approaches. Zhang et al. [86] proposed an elastic net framework that integrates gene measurements (e.g. gene expression and DNA methylation from TCGA-BRCA) and gene–gene interactions for gene signature identification. Two risk groups, low and high, are defined by whether the patient’s survival exceeds the median overall survival, and the selected genes are employed for survival prediction. Peng et al. [87] used raw and z scores of mRNA expression, DNA methylation and two forms of CNAs as the multi-omics data from the TCGA-BRCA database. The top 17.5% of the related genes were present in the probabilistic scores generated by CapsNetMMD, which is higher than the popular baselines. Furthermore, gene-level survival analysis is performed on the top 10 identified genes to establish their potential in the study of breast cancer.
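The two recurring ingredients above, a fixed survival cut-off for labeling and score-level fusion of per-modality predictions, can be sketched as follows. The sketch assumes follow-up time in months, marks patients censored before the cut-off as unlabeled, and fuses modality scores by a weighted average. The label convention, the censoring rule, the modality keys and the weights are illustrative assumptions, not the settings of [42] or [44].

```python
def five_year_label(survival_months, deceased, cutoff_months=60):
    """Binary survival label at a fixed cut-off.

    Returns 1 if the patient is known to have survived past the cut-off,
    0 if the patient died before it, and None if follow-up ended before
    the cut-off without an event (outcome unknown).
    """
    if survival_months >= cutoff_months:
        return 1
    return 0 if deceased else None


def score_level_fusion(scores, weights):
    """Weighted average of per-modality predicted probabilities."""
    total_weight = sum(weights[m] for m in scores)
    return sum(weights[m] * scores[m] for m in scores) / total_weight


# Hypothetical per-modality probabilities of long-term survival.
scores = {"expression": 0.8, "cna": 0.6, "clinical": 0.4}
equal = {"expression": 1.0, "cna": 1.0, "clinical": 1.0}
print(five_year_label(30, True), score_level_fusion(scores, equal))
```

Moving `cutoff_months` from 60 toward 108 changes how many patients fall into each class, which is how a longer survival window alters the class balance discussed later in the survey.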

Although breast cancer survival prognosis using multi-modal data improved, these methods applied simple concatenation for fusion. The basic problem with this kind of fusion is that the differing distributions of heterogeneous data from different modalities make it difficult to extract the additional cross-modality information that is essential for an overall interpretation of multi-modal information. Most previous work neither highlights the variation in the importance of modalities nor learns a modality-invariant embedding space that matches the multi-modal distributions. To solve this, Guo et al. [88] proposed a novel multimodal affinity fusion network to integrate multi-modal data for breast cancer survival prediction. The effectiveness of GANs (specifically adversarial learning) in mapping one data distribution to another motivated Du and Zhao [89] to develop a multi-modal adversarial representation framework for breast cancer prognosis prediction (the Supplementary file, Tables 2–5, has further details).

ML CHALLENGES

Survival prediction and prognosis of breast cancer over different survival windows of 5–9 years has been studied mostly in our work. This setting triggers a unique class imbalance problem in the binary classification of breast cancer survival, because the survival class becomes heavily skewed as the survival window is stretched from 5 to 9 years. The resulting class imbalance is severe, and traditional approaches such as SMOTE cannot handle it without distorting the class prediction probabilities.

Mitigating imbalanced healthcare datasets—SMOTE and Beyond

Healthcare data often suffer from imbalance, especially in tasks such as disease diagnosis, where the number of positive (disease present) cases can be much smaller than the number of negative (disease absent) cases. This imbalance can lead to ML models that are biased toward the majority class. Working with such imbalanced datasets increases both type-1 and type-2 errors, which may delay treatment and can be fatal in diseases such as breast cancer, where early detection is key to survival. Reference [90] proposed SMOTE, a popular oversampling technique that tackles the class imbalance problem in most ML datasets. It increases the representation of the minority classes by creating synthetic samples in the feature space from two or more similar instances. One major issue is over-generalization, which occurs when SMOTE generates the same number of synthetic instances for different regions of the feature space without taking the distribution of genuine instances into account. This can result in the over-representation of certain minority areas. For instance, if a minority region has only a few genuine instances but SMOTE creates many synthetic instances in that region, it can skew the true distribution and cause that area to be over-represented. Additionally, synthetic instances created close to the decision boundary between two classes can make the boundary even more ambiguous. Unclear class boundaries in healthcare analytics may lead to erroneous predictions. For instance, in a two-class problem differentiating between benign and malignant tumors, synthetic instances that blur the boundary may cause a model to misclassify a benign tumor as malignant or vice versa. This lack of confidence in class boundaries is a consequence of the oversampling in SMOTE. Alternative and more theoretically grounded approaches include incorporating class-probability-based weights into a weighted categorical cross-entropy loss, which can automatically resolve class imbalance to a certain extent [91] (the Supplementary file contains a detailed discussion of the advancements and challenges of integrating LLMs and VLMs in breast cancer prognosis).
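The interpolation step at the core of SMOTE-style oversampling, and the reason synthetic points can land in sparse or boundary regions, can be seen in a minimal sketch. It assumes numeric feature tuples, Euclidean distance and at least two minority samples; the function name and toy points are hypothetical, and this is a simplification rather than the full algorithm of [90].

```python
import random


def smote_sample(minority, rng, k=2):
    """Create one synthetic point between a minority sample and a neighbour.

    A base sample is drawn at random, one of its k nearest minority
    neighbours is chosen, and the new point is placed at a random position
    on the line segment joining them.
    """
    base = rng.choice(minority)
    others = [p for p in minority if p is not base]
    others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
    neighbour = rng.choice(others[:k])
    gap = rng.random()  # position along the segment, in [0, 1)
    return tuple(a + gap * (b - a) for a, b in zip(base, neighbour))


# Toy minority class (hypothetical): three clustered points and one outlier.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
rng = random.Random(0)
synthetic = [smote_sample(minority, rng) for _ in range(50)]
print(synthetic[:3])
```

Whenever the isolated point at (5.0, 5.0) is drawn as the base, the synthetic sample falls in the empty region between it and the cluster, which illustrates how oversampling can populate areas with no genuine instances.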

OUR CONTRIBUTION: THE WAY FORWARD

Our contribution focuses primarily on an effective classification paradigm with minority class preservation, on the formulation of interpretable classifiers consistent with changes in the survival window, and on the introduction of multi-modal feature fusion and new modes. We claim that the combination of the proposed methods helps calibrate classifier confidence while processing new modes and features efficiently, and that it can handle non-linear decision boundaries in the small-sample regime. To this end, we introduce a combinatorial method to handle class imbalance, ‘bootstrap minority class balancing’, which relies on the fact that the efficacy of classification is guaranteed to increase as a small number of samples is added to the minority class. Millions of subsets are created organically within the training set using combinatorial splitting and the binomial theorem. The majority class is down-sampled in each subset in such a way that the union of the subsets does not miss a single pattern or data instance of the training data. This differs from creating synthetic samples as in SMOTE. The training is slow but fair for imbalanced datasets.
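The coverage property described above, where every subset down-samples the majority class yet the union of subsets retains every training instance, can be illustrated with a deliberately simplified sketch. It partitions a shuffled majority class into chunks of the minority-class size and pairs each chunk with the full minority class. This is only an assumption-laden illustration of the coverage idea; it does not reproduce the combinatorial splitting of the proposed method, and the function name is hypothetical.

```python
import random


def balanced_subsets(majority, minority, seed=0):
    """Split the majority class into chunks, each paired with all minority samples.

    Every majority sample appears in exactly one subset, so the union of
    the subsets covers the whole training set without synthetic samples.
    """
    rng = random.Random(seed)
    shuffled = list(majority)
    rng.shuffle(shuffled)
    size = max(1, len(minority))
    return [
        shuffled[start:start + size] + list(minority)
        for start in range(0, len(shuffled), size)
    ]


# Toy training set (hypothetical): 10 majority and 3 minority samples.
majority = [("survived", i) for i in range(10)]
minority = [("deceased", i) for i in range(3)]
subsets = balanced_subsets(majority, minority)
print(len(subsets), [len(s) for s in subsets])
```

One classifier can then be trained per subset and the predictions combined, for example by voting. Note that the last chunk may hold fewer majority samples than the others.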

Extending [44], Arya and Saha proposed techniques [92, 93] for the multi-modal survival prognosis of breast cancer patients. The proposed models are generic and have the flexibility to use any ML classifier or deep neural network in the second stage. The latter architecture, SiGaAtCNN + Input STACKED RF, follows a similar approach and adds an attention-based gating mechanism to the CNNs to learn better feature embeddings while handling class imbalance; it was validated on the TCGA-BRCA database.

Figure 1. Evolution of AI-driven breast cancer detection research.

The investigation of multi-modal breast cancer databases from Section 2, specifically TCGA-BRCA, showed the availability of six modalities (mRNA, miRNA, CNV, DNA methylation and WSI). This persuaded us to explore the effectiveness of additional modalities for survival prognosis, which had not been explored together in any study to date. Unfortunately, the ‘curse of dimensionality’ caused by high-dimensional data in the low-sample regime became more obvious. To tackle this issue, Arya et al. [94] proposed the Logcosh VAE as a dimensionality reduction technique. A very common problem when incorporating multiple modalities into multi-omics or multi-modal systems is incomplete multi-view data, which arises when samples have missing data in certain modalities. Most studies tackled this scenario with a naive approach: they considered only the intersection of complete multi-view samples and discarded the incomplete ones. This approach shrank the dataset and resulted in models that did not generalize. Arya and Saha [95] developed a GAN-based generative architecture to solve this issue. Moreover, the proposed method integrates the generated data with the raw data for breast cancer survival prediction.
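The naive complete-case strategy described above amounts to intersecting the sample identifiers across modalities, and a short sketch makes the resulting data loss explicit. The data layout (one dictionary of sample id to features per modality), the function name and the toy patient ids are assumptions for illustration.

```python
def complete_case_ids(modalities):
    """Return the sorted sample ids that are present in every modality.

    modalities: dict mapping modality name -> dict of sample id -> features.
    """
    id_sets = [set(samples) for samples in modalities.values()]
    if not id_sets:
        return []
    return sorted(set.intersection(*id_sets))


# Toy cohort (hypothetical): each patient is missing from some modality
# except p3, so only one of four patients survives the intersection.
modalities = {
    "mRNA": {"p1": [0.2], "p2": [0.5], "p3": [0.9]},
    "CNV": {"p1": [1], "p3": [0], "p4": [2]},
    "WSI": {"p2": [0.1], "p3": [0.7]},
}
kept = complete_case_ids(modalities)
all_ids = set().union(*(set(s) for s in modalities.values()))
print(kept, f"{len(kept)}/{len(all_ids)} samples retained")
```

The more modalities are added, the smaller this intersection tends to become, which is the motivation for generating the missing views instead of discarding samples.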

We pose the relevant question: is state-of-the-art (SOTA) prediction of breast cancer survival the only objective? Since the end goal is accurate detection of breast cancer that agrees with clinical expertise, confidence in the label prediction probabilities after multiple modalities of data are incorporated becomes an issue of paramount importance. In our case, and in many public datasets available for this task, the volume of data is not suitable for deep neural network-based predictions. Because many features are considered for such low-sample-regime data, interpretable methods such as SVM should be considered instead of tree-based ensemble methods such as RF, which are perhaps set up for better performance. We mitigated this challenge by proposing a novel SVM utility kernel that guarantees SOTA performance with confidence bounds for label predictions [96]. The inspiration was drawn from production functions in Production Economics, which is reflected in the classification kernel through the adaptation of a class of utility functions to the properties of the inner product.

CONCLUSION

The key points that summarize our article are the variations in survival prediction windows, the subsequent handling of the minority class, and the utility of multi-modal approaches over reliance on single modalities such as genomics alone. These points, along with correct classification of molecular sub-types, are critical for early and effective treatment of breast cancer. We observe that multi-modal feature fusion is critical in that regard. Figure 1 succinctly captures the spirit of the survey.

Key Points
  • Variations in survival prediction windows.

  • Minority class problem.

  • Multi-modal approaches are better than single modalities.

  • Classification of molecular sub-types.

ACKNOWLEDGMENTS

The authors thank the anonymous reviewers for their valuable suggestions. N.A. and K.P. thank the School of Information Technology, King Mongkut’s Institute of Technology Ladkrabang, for partially supporting this work under the Strategic Promotion Program, FY2023.

AUTHOR CONTRIBUTIONS

N.A. and A.M. conducted an exhaustive literature search and contributed to the first three sections of the survey. S.R. contributed to the section on the minority class and to referencing. S.S. and K.P. reviewed the manuscript. S.S. (corresponding) conceptualized the flow of ideas, the organization of the sections and the contents of each section with important references, and wrote the ‘our contribution’ section. Archana Mathur (Conceptualization [supporting], Investigation [equal], Methodology [equal], Project administration [supporting], Writing – original draft [equal], Writing – review & editing [equal]), Nikhilanand Arya (CRediT contribution not specified), Kitsuchart Pasupa (Project administration [supporting], Writing – review & editing [supporting]), Sriparna Saha (Conceptualization [supporting], Supervision [supporting], Writing – review & editing [supporting]), Sudeepa Roy Dey (Investigation [supporting], Methodology [supporting], Writing – original draft [supporting], Writing – review & editing [supporting]), and Snehanshu Saha (Conceptualization [lead], Data curation [supporting], Formal analysis [equal], Funding acquisition [equal], Investigation [supporting], Project administration [lead], Resources [equal], Supervision [lead], Validation [lead], Writing – original draft [equal], Writing – review & editing [lead]).

FUNDING

S.S. thanks the funding agencies, DBT-Builder project, BITS Pilani K K Birla Goa Campus (No. BT/INF/22/SP42543/2021), SERB SURE-DST, GoI (SUR/2022/001965) and SERB CRG-DST (CRG/2023/003210) for partial support.

Author Biographies

Archana Mathur is an Associate Professor at the Nitte Meenakshi Institute of Technology, Bangalore. Her PhD is in Machine Learning and Scientometrics. She has worked as a research assistant at Indian Statistical Institute, Bangalore.

Nikhilanand Arya is an Assistant Professor at KIIT, Bhubaneshwar, and researches multi-modal DL and ML techniques in computational biology. His MTech and PhD are from IIT Patna, India.

Kitsuchart Pasupa is an Associate Professor with the School of IT, KMITL, Thailand, and researches ML techniques for real-world applications. He received the APNNS Young Researcher Award in 2019.

Sriparna Saha is currently an Associate Professor at the Computer Science and Engineering Department of Indian Institute of Technology Patna, India. Her major research interests are evolutionary machine learning, deep learning, natural language processing and bioinformatics.

Sudeepa Roy Dey is an Associate Professor in the Department of CSE, PES University, EC campus. Her PhD is in the area of scientometrics, network mining and graph analytics.

Snehanshu Saha is a Professor in the Department of CS&IS and Center Head-APPCAIR (AI Research Center), BITS Pilani K K Birla Goa Campus. He is a co-founder and Director of Research at HappyMonk AI, an AI product company. His research interests include bioinformatics and healthcare analytics in the big data regime.

References

1. Hanahan D, Weinberg RA. The hallmarks of cancer. Cell 2000;100(1):57–70.

2. Sung H, Ferlay J, Siegel R, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2021;71(3):209–49.

3. Ferlay J, Ervik M, Lam F, Laversanne M, Colombet M, Mery L, Piñeros M, Znaor A, Soerjomataram I, Bray F. Global Cancer Observatory: Cancer Today. Lyon, France: International Agency for Research on Cancer, 2024. Available from: https://gco.iarc.who.int/today (24 April 2024, date last accessed).

4. Ferlay J, Laversanne M, Ervik M, Lam F, Colombet M, Mery L, Piñeros M, Znaor A, Soerjomataram I, Bray F. Global Cancer Observatory: Cancer Tomorrow (version 1.1). Lyon, France: International Agency for Research on Cancer, 2024. Available from: https://gco.iarc.fr/tomorrow (24 April 2024, date last accessed).

5. Sankaranarayanan R, Swaminathan R, Brenner H, et al. Cancer survival in Africa, Asia, and central America: a population-based study. Lancet Oncol 2010;11(2):165–73.

6. Global Health Estimates: Life expectancy and leading causes of death and disability. WHO (20 August 2023, date accessed).

7. Endogenous Hormones and Breast Cancer Collaborative Group; T J Key, P N Appleby, G K Reeves, R C Travis, A J Alberg, A Barricarte, F Berrino, V Krogh, S Sieri, L A Brinton, J F Dorgan, L Dossus, M Dowsett, A H Eliassen, R T Fortner, S E Hankinson, K J Helzlsouer, J Hoffman-Bolton, George W Comstock, R Kaaks, L L Kahle, P Muti, K Overvad, P H M Peeters, E Riboli, S Rinaldi, D E Rollison, F Z Stanczyk, D Trichopoulos, S S Tworoger, P Vineis. Sex hormones and risk of breast cancer in premenopausal women: a collaborative reanalysis of individual participant data from seven prospective studies. Lancet Oncol 2013;14(10):1009–19.

8. Bite S. Lifetime probability among females of dying of cancer. JNCI-J Natl Cancer Inst 2004;96(11):818–8.

9. Benz CC. Impact of aging on the biology of breast cancer. Crit Rev Oncol Hematol 2008;66(1):65–74.

10. Siegel R, Ma J, Zou Z, Jemal A. Cancer statistics. CA Cancer J Clin 2014;64(1):9–29.

11. McGuire A, Brown J, Malone C, et al. Effects of age on the detection and management of breast cancer. Cancer 2015;7(2):908–29.

12. Hedenfalk I, Duggan D, Chen Y, et al. Gene-expression profiles in hereditary breast cancer. N Engl J Med 2001;344(8):539–48.

13. Çelik A, Acar M, Erkul CM, et al. Relationship of Breast Cancer with Ovarian Cancer. In: Mehmet Gunduz (ed). A Concise Review of Molecular Pathology of Breast Cancer. London, UK: IntechOpen, 2015.

14. Shiovitz S, Korde LA. Genetics of breast cancer: a topic in evolution. Ann Oncol 2015;26(7):1291–9.

15. Shahbandi A, Nguyen HD, Jackson JG. TP53 mutations and outcomes in breast cancer: reading beyond the headlines. Trends in Cancer 2020;6(2):98–110.

16. Corso G, Veronesi P, Sacchini V, Galimberti V. Prognosis and outcome in CDH1-mutant lobular breast cancer. Eur J Cancer Prev 2018;27(3):237–8.

17. Corso G, Intra M, Trentin C, et al. CDH1 germline mutations and hereditary lobular breast cancer. Fam Cancer 2016;15(2):215–9.

18. Kechagioglou P, Papi RM, Provatopoulou X, et al. Tumor suppressor PTEN in breast cancer: heterozygosity, mutations and protein expression. Anticancer Res 2014;34(3):1387–400.

19. Chen J, Lindblom A. Germline mutation screening of the STK11/LKB1 gene in familial breast cancer with LOH on 19p: germline mutation screening of the STK11/LKB1 gene. Clin Genet 2000;57(5):394–7.

20. Renwick A, Thompson D, Seal S, et al. ATM mutations that cause ataxia-telangiectasia are breast cancer susceptibility alleles. Nat Genet 2006;38(8):873–5.

21. Rahman N, Seal S, Thompson D, et al. PALB2, which encodes a BRCA2-interacting protein, is a breast cancer susceptibility gene. Nat Genet 2007;39(2):165–7.

22. Seal S, Thompson D, Renwick A, et al. Truncating mutations in the Fanconi anemia J gene BRIP1 are low-penetrance breast cancer susceptibility alleles. Nat Genet 2006;38(11):1239–41.

23. Meijers-Heijboer H, van den Ouweland, Klijn J, et al. Low-penetrance susceptibility to breast cancer due to CHEK2*1100delC in noncarriers of BRCA1 or BRCA2 mutations. Nat Genet 2002;31(1):55–9.

24. Park DJ, Lesueur F, Nguyen-Dumont T, et al. Rare mutations in XRCC2 increase the risk of breast cancer. Am J Hum Genet 2012;90(4):734–9.

25. Hill DA, Prossnitz ER, Royce M, Nibbe A. Temporal trends in breast cancer survival by race and ethnicity: a population-based cohort study. PLoS One 2019;14(10):e0224064.

26. Yedjou CG, Sims JN, Miele L, et al. Health and racial disparity in breast cancer. In: Ahmad A (ed). Breast Cancer Metastasis and Drug Resistance: Challenges and Progress, Advances in Experimental Medicine and Biology. Cham: Springer International Publishing, 2019, 31–49.

27. Bernstein L. Epidemiology of endocrine-related risk factors for breast cancer. J Mammary Gland Biol Neoplasia 2002;7(1):3–15.

28. Albrektsen G, Heuch I, Hansen S, Kvåle G. Breast cancer risk by age at birth, time since birth and time intervals between births: exploring interaction effects. Br J Cancer 2005;92(1):167–75.

29. Ursin G, Bernstein L, Lord SJ, et al. Reproductive factors and subtypes of breast cancer defined by hormone receptor and histology. Br J Cancer 2005;93(3):364–71.

30. Titus-Ernstoff L, Longnecker MP, Newcomb PA, et al. Menstrual factors in relation to breast cancer risk. Cancer Epidemiol Biomarkers Prev 1998;7(9):783–9.

31. Kim EY, Chang Y, Ahn J, et al. Mammographic breast density, its changes, and breast cancer risk in premenopausal and postmenopausal women. Cancer 2020;126(21):4687–96.

32. Hartmann LC, Sellers TA, Frost MH, et al. Benign breast disease and the risk of breast cancer. N Engl J Med 2005;353(3):229–37.

33. Dyrstad SW, Yan Y, Fowler AM, Colditz GA. Breast cancer risk associated with benign breast disease: systematic review and meta-analysis. Breast Cancer Res Treat 2015;149(3):569–75.

34. Wang J, Costantino JP, Tan-Chiu E, et al. Lower-category benign breast disease and the risk of invasive breast cancer. JNCI J Natl Cancer Inst 2004;96(8):616–20.

35. Ng J, Shuryak I. Minimizing second cancer risk following radiotherapy: current perspectives. Cancer Manag Res 2014;7:1–11.

36. Clark GM. Do we really need prognostic factors for breast cancer? Breast Cancer Res Treat 1994;30(2):117–26.

37. Martin LR, Williams SL, Haskard KB, Dimatteo MR. The challenge of patient adherence. Ther Clin Risk Manag 2005;1(3):189–99.

38. Xu X, Zhang Y, Zou L, et al. A gene signature for breast cancer prognosis using support vector machine. In: 2012 5th International Conference on BioMedical Engineering and Informatics. Chongqing, China: IEEE, 2012, pp. 928–31.

39. Van ‘t Veer LJ, Dai H, Marc J, et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature 2002;415(6871):530–6.

40. Cristovao F, Cascianelli S, Canakoglu A, et al. Investigating deep learning based breast cancer subtyping using pan-cancer and multi-Omic data. IEEE/ACM Trans Comput Biol Bioinform 2022;19(1):121–34.

41. Pouryahya M, Oh JH, Javanmard P, et al. aWCluster: a novel integrative network-based clustering of multiomics for subtype analysis of cancer data. IEEE/ACM Trans Comput Biol Bioinform 2022;19(3):1472–83.

42. Alkhateeb A, Zhou L, Tabl AA, et al. Deep learning approach for breast cancer InClust 5 prediction based on multiomics data integration. In: Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, pp. 1–6, Virtual Event USA, 2020. ACM, New York.

43. Radovic M, Ghalwash M, Filipovic N, Obradovic Z. Minimum redundancy maximum relevance feature selection approach for temporal gene expression data. BMC Bioinformatics 2017;18(1):9.

44. Sun D, Wang M, Li A. A multimodal deep neural network for human breast cancer prognosis prediction by integrating multi-dimensional data. IEEE/ACM Trans Comput Biol Bioinform 2019;16(3):841–50.

45. Guo H, Lv X, Li Y, Li M. Attention-based GCN integrates multi-omics data for breast cancer subtype classification and patient-specific gene marker identification. Brief Funct Genomics 2023;22(5):463–74.

46. Lin Y, Zhang W, Cao H, et al. Classifying breast cancer subtypes using deep neural networks based on multi-omics data. Genes 2020;11(8):888.

47. Liu T, Huang J, Liao T, et al. A hybrid deep learning model for predicting molecular subtypes of human breast cancer using multimodal data. IRBM 2021;43:62–74.

48. Viaud G, Mayilvahanan P, Cournede P-H. Representation learning for the clustering of multi-omics data. IEEE/ACM Trans Comput Biol Bioinform 2022;19(1):135–45.

49. Kingma DP, Welling M. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114. 2013.

50. Sohn K, Lee H, Yan X. Learning structured output representation using deep conditional generative models. In: Cortes C, Lawrence N, Lee D, et al. (eds). Advances in Neural Information Processing Systems. MIT Press, Cambridge, MA, United States: Curran Associates, Inc., 28, 2015.

51. Sahiner B, Chan H-P, Petrick NA, et al. Classification of mass and normal breast tissue: a convolution neural network classifier with spatial domain and texture images. IEEE Trans Med Imaging 1996;15(5):598–610.

52. Carneiro G, Nascimento JC, Bradley AP. Unregistered multiview mammogram analysis with pre-trained deep learning models. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2015.

53. Wei L, Yang Y, Nishikawa RM, Jiang Y. A study on several machine-learning methods for classification of malignant and benign clustered microcalcifications. IEEE Trans Med Imaging 2005;24:371–80.

54. Kooi T, Litjens GJS, Ginneken BV, et al. Large scale deep learning for computer aided detection of mammographic lesions. Med Image Anal 2017;35:303–12.

55. Mohamed AA, Berg WA, Peng H, et al. A deep learning method for classifying mammographic breast density categories. Med Phys 2018;45:314–21.

56. Lehman CD, Yala A, Schuster T, et al. Mammographic breast density assessment using deep learning: clinical implementation. Radiology 2019;290(1):52–8.

57. Arora R, Rai PK, Raman B. Deep feature–based automatic classification of mammograms. Med Biol Eng Comput 2020;58:1199–211.

58. Li H, Zhuang S, Li D-a, et al. Benign and malignant classification of mammogram images based on deep learning. Biomed Signal Process Control 2019;51:347–54.

59. Byra M, Galperin MY, Ojeda-Fournier H, et al. Breast mass classification in sonography with transfer learning using a deep convolutional neural network and color conversion. Med Phys 2019;46:746–55.

60. Choi JS, Han B-K, Ko ES, et al. Effect of a deep learning framework-based computer-aided diagnosis system on the diagnostic performance of radiologists in differentiating between malignant and benign masses on breast ultrasonography. Korean J Radiol 2019;20:749–58.

61. Huang Y, Han L, Dou H, et al. Two-stage CNNs for computerized BI-RADS categorization in breast ultrasound images. Biomed Eng Online 2019;18:8.

62. Aghaei F, Tan M, Hollingsworth AB, et al. Computer-aided breast MR image feature analysis for prediction of tumor response to chemotherapy. Med Phys 2015;42(11):6520–8.

63. Yuhong Q, Zhu H-T, Cao K, et al. Prediction of pathological complete response to neoadjuvant chemotherapy in breast cancer using a deep learning (DL) method. Thoracic Cancer 2020;11:651–8.

64. Ha RS, Mutasa S, Karcich J, et al. Predicting breast cancer molecular subtype with MRI dataset utilizing convolutional neural network algorithm. J Digit Imaging 2019;32:276–82.

65. Zhu Z, Albadawy E, Saha A, et al. Breast cancer molecular subtype classification using deep features: preliminary results. In: Medical Imaging. Proceedings of the SPIE 2018;10575:6.

66. Zhang Y, Chen JH, Lin Y, et al. Prediction of breast cancer molecular subtypes on DCE-MRI using convolutional neural network with transfer learning between two centers. Eur Radiol 2020;31:2559–67.

67. Perou CM, Sørlie T, Eisen MB, et al. Molecular portraits of human breast tumours. Nature 2000;406(6797):747–52.

68. Sørlie T, Perou CM, Tibshirani R, et al. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci 2001;98(19):10869–74.

69. Zeng J, Cai H, Akutsu T. Breast Cancer Subtype by Imbalanced Omics Data through A Deep Learning Fusion Model. In: Proceedings of the 2020 10th International Conference on Bioscience, Biochemistry and Bioinformatics, pp. 78–83, Kyoto, Japan, 2020. ACM, New York.

70. Jiang M, Zhang D, Tang S, et al. Deep learning with convolutional neural network in the assessment of breast cancer molecular subtypes based on US images: a multicenter retrospective study. Eur Radiol 2020;31:3673–82.

71. Meng W, Sun YD, Qian HY, et al. Computer-aided diagnosis evaluation of the correlation between magnetic resonance imaging with molecular subtypes in breast cancer 7, 13818 (2017). Front Oncol 2021;11.

72. Zhou B, Wang L-F, Yin H, et al. Decoding the molecular subtypes of breast cancer seen on multimodal ultrasound images using an assembled convolutional neural network model: a prospective and multicentre study. EBioMedicine 2021;74:103684.

73. Zhang T, Tan T, Han L, et al. Predicting breast cancer types on and beyond molecular level in a multi-modal fashion. NPJ Breast Cancer 2023;9(1):16.

74. Mingxiang W, Zhong X, Peng Q, et al. Prediction of molecular subtypes of breast cancer using BI-RADS features based on a “white box” machine learning approach in a multi-modal imaging setting. Eur J Radiol 2019;114:175–84.

75. van de Vijver, He YD, et al. A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med 2002;347(25):1999–2009.

76. Krishnan MMR, Banerjee S, Chakraborty C, et al. Statistical analysis of mammographic features and its classification using support vector machine. Expert Syst Appl 2010;37(1):470–8.

77. Stoean R, Stoean C. Modeling medical decision making by support vector machines, explaining by rules of evolutionary algorithms with feature selection. Expert Syst Appl 2013;40(7):2677–86.

78. Tingting M, Nandi AK. Breast cancer detection from FNA using SVM with different parameter tuning systems and SOM–RBF classifier. J Franklin Inst 2007;344(3–4):285–311.

79. Nguyen C, Wang Y, Nguyen HN. Random forest classifier combined with feature selection for breast cancer diagnosis and prognostic. J Biomed Sci Eng 2013;06(5):551–60.

80. Wang H, Xing F, Hai S, et al. Novel image markers for non-small cell lung cancer classification and survival prediction. BMC Bioinform 2014;15(1):310.

81. Kun-Hsing Y, Zhang C, Berry GJ, et al. Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features. Nat Commun 2016;7:12474.

82. Tang B, Li A, Li B, Wang M. CapSurv: capsule network for survival analysis with whole slide pathological images. IEEE Access 2019;7:26022–30.

83. Sun Y, Goodison S, Li J, et al. Improved breast cancer prognosis through the combination of clinical and genetic markers. Bioinformatics 2007;23(1):30–7.

84. Gevaert O, De Smet, Timmerman D, et al. Predicting the prognosis of breast cancer by integrating clinical and microarray data with Bayesian networks. Bioinformatics (Oxford, England) 2006;22(14):e184–90.

85. Khademi M, Nedialkov NS. Probabilistic Graphical Models and Deep Belief Networks for Prognosis of Breast Cancer. In: 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA). IEEE, Miami, Florida, USA, pp. 727–32, 2015.

86. Zhang L, Liu H, Huang Y, et al. Cancer progression prediction using gene interaction regularized elastic net. IEEE/ACM Trans Comput Biol Bioinform 2017;14(1):145–54.

87. Peng C, Zheng Y, Huang D-S. Capsule network based modeling of multi-omics data for discovery of breast cancer-related genes. IEEE/ACM Trans Comput Biol Bioinform 2020;17(5):1605–12.

88. Guo W, Liang W, Deng Q, Zou X. A multimodal affinity fusion network for predicting the survival of breast cancer patients. Front Genet 2021;12:709027.

89. Xiuquan D, Zhao Y. Multimodal adversarial representation learning for breast cancer prognosis prediction. Comput Biol Med 2023;157:106765.

90. Chawla N, Bowyer K, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority over-sampling technique. ArXiv abs/1106.1813 2002;16:321–57.

91. Tyagi M, Roy S, Bansal V. Custom weighted balanced loss function for covid 19 detection from an imbalanced cxr dataset. In: 2022 26th International Conference on Pattern Recognition (ICPR). IEEE, Montreal, QC, Canada, 2022, pp. 2707–13.

92. Arya N, Saha S. Multi-modal classification for human breast cancer prognosis prediction: proposal of deep-learning based stacked ensemble model. IEEE/ACM Trans Comput Biol Bioinform 2020;19:1–1041.

93. Arya N, Saha S. Multi-modal advanced deep learning architectures for breast cancer survival prediction. Knowl-Based Syst 2021;221:106965.

94. Arya N, Saha S, Mathur A, Saha S. Improving the robustness and stability of a machine learning model for breast cancer prognosis through the use of multi-modal classifiers. Sci Rep 2023;13(1):4079.

95. Arya N, Saha S. Generative incomplete multi-view prognosis predictor for breast cancer: GIMPP. IEEE/ACM Trans Comput Biol Bioinform 2022;19(4):2252–63.

96. Arya N, Mathur A, Saha S, Saha S. Proposal of SVM utility kernel for breast cancer survival estimation. IEEE/ACM Trans Comput Biol Bioinform 2022;20:1372–83.

Author notes

Equal Contribution.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://dbpia.nl.go.kr/pages/standard-publication-reuse-rights)

Supplementary data