Abstract

A growing number of studies have revealed the critical roles of the human microbiome in a wide variety of disorders. Identifying disease-associated microbes can improve our understanding of disease pathogenesis and treatment. Computational prediction of microbe-disease associations provides helpful guidance for further biomedical screening and has attracted considerable research interest in bioinformatics. In this study, a deep learning-based computational approach named SGJMDA is presented for predicting microbe-disease associations. Specifically, SGJMDA first fuses multiple similarities of microbes and diseases using a nonlinear strategy, and extracts feature information from homogeneous networks composed of the fused similarities via a graph convolution network. Second, a heterogeneous microbe-disease network is built to further capture the structural information of microbes and diseases by employing a multi-neighborhood graph convolution network and a jumping knowledge network. Finally, potential microbe-disease associations are inferred by computing the linear correlation coefficients of their embeddings. Results from cross-validation experiments show that SGJMDA outperforms six state-of-the-art computational methods. Furthermore, we carry out case studies on three important diseases using SGJMDA, in which 19, 20, and 11 of the top 20 predictions, respectively, are confirmed by the latest databases. The strong performance of SGJMDA suggests that it could be a valuable tool for inferring disease-associated microbes.

Introduction

Microbes, which mainly refer to bacteria but also include fungi and viruses, live in and on human body sites, including the urogenital tract, stomach, and skin [1]. The human body contains an estimated 350 trillion microbial cells [2]. Advances in metagenomic and metatranscriptomic analysis technologies have enabled the scientific community to explore the functions of the human microbiome. Investigation into the human microbiome has revealed that it has a significant impact on our health. For instance, emerging evidence indicates that the gut microbiota is crucial in supporting health [3]. Another study found that a decrease in the abundance of Faecalibacterium prausnitzii, an anti-inflammatory commensal bacterium, is linked to a higher chance of recurrence of ileal Crohn’s disease (CD) [4].

Identification of human disease-associated microbes would provide a better understanding of disease etiology, which might lead to novel medical treatments [5]. Given the significance of microbes in human health, researchers have searched published papers and established online databases [6, 7, 8] to systematically curate disease-associated microbes for further studies. Nevertheless, our knowledge of microbe-disease associations remains limited. Meanwhile, validating disease-associated microbes through in vivo studies takes time and money. Computational prediction of such associations for further biomedical screening would be a cost-effective alternative.

To date, computational models for predicting microbe-disease associations have attracted considerable research interest in the bioinformatics field, and algorithms with improved prediction accuracy are constantly being proposed [9]. These computational techniques can be divided into three main groups: network-based, matrix factorization-based, and machine learning-based.

Network-based methods apply graph theory to prioritize unknown microbe-disease associations at the network level. For example, bi-random walk [10], the KATZ measure combined with network topology information [11] and network consistency projection in conjunction with label propagation [12] were utilized for inferring new microbe-disease associations. Network-based approaches provide good interpretability of prediction results. However, there is still room for improvement in their performance.

Matrix factorization-based approaches, usually under the low-rank assumption, have been widely used to recover the user-item preference matrix in recommender systems [13]. Analogously, computational methods [14–16] were developed that apply matrix factorization to fill in the unknown elements of the original microbe-disease association matrix and thereby predict new associations. Because of the significant computational complexity of matrix operations, these approaches are challenging to apply to large-scale datasets.

More recently, rapid advances in machine learning, especially deep learning, have enabled the development of efficient algorithms for the prediction of microbe-disease associations. For instance, adaptive boosting [17], a back-propagation neural network [18] and a deep sparse autoencoder neural network [19] were applied to infer new microbe-disease associations. These computational methods have achieved encouraging prediction results.

Despite the success of the above methods in identifying microbe-disease associations, some challenges remain to be tackled for better predictions. First, as biomedical technologies advance, more and more data features of microbes and diseases are becoming available. Integrating these heterogeneous features for more reliable and accurate prediction is a challenging task. Second, biomedically confirmed negative samples are not available for supervised learning methods. Existing methods usually select negative samples randomly from the unlabeled microbe-disease pairs, which introduces noise and may lead to inaccurate results. Finally, the successful applications of deep learning encourage us to develop more robust and precise algorithms to predict microbe-disease associations.

In this study, we develop an algorithm named SGJMDA, based on similarity fusion, graph convolution networks and jumping knowledge networks, for microbe-disease association prediction. We first collect four categories of similarities for microbes and diseases, respectively. A non-linear method is applied to fuse these similarities. Graph convolution networks and jumping knowledge networks are then used to extract features of microbes and diseases. Linear correlation coefficients between microbes and diseases are finally calculated as the prediction results. We comprehensively test and compare the performance of SGJMDA using benchmark datasets and cross-validation. We also conduct case studies to showcase the prediction ability of SGJMDA in real situations. Given its strong performance, we expect SGJMDA to be helpful for biomedical researchers in predicting microbe-disease associations.

Materials and methods

Datasets

In this study, we download the benchmark datasets from reference [19] for performance analysis. We give a brief explanation of the datasets below.

Human microbe–disease associations

In reference [19], the authors retrieved human microbe-disease associations from three existing databases (i.e. HMDAD [6], Disbiome [7] and Peryton [8]). After removing redundant entries and merging the information, 4499 experimentally validated microbe-disease associations, covering 1177 microbes and 134 diseases, were collected as gold standard data. We use $N_m$ and $N_d$ to denote the numbers of microbes and diseases, respectively. Meanwhile, $A \in \mathbb{R}^{N_m \times N_d}$ is used to represent the adjacency matrix of the microbe-disease associations, where $N_m$ (= 1177) is the number of rows (microbes) and $N_d$ (= 134) is the number of columns (diseases). $A(i,j) = 1$ indicates that an association between microbe $m_i$ and disease $d_j$ exists; otherwise $A(i,j) = 0$.
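As a minimal sketch of how such an adjacency matrix can be assembled, assume the associations are available as (microbe index, disease index) pairs. The function name `build_adjacency` and the toy pairs are illustrative and are not taken from reference [19].

```python
def build_adjacency(pairs, n_microbes, n_diseases):
    """Return an n_microbes x n_diseases 0/1 matrix as a list of rows."""
    A = [[0] * n_diseases for _ in range(n_microbes)]
    for i, j in pairs:
        A[i][j] = 1  # microbe i is associated with disease j
    return A


# Toy example with 3 microbes and 2 diseases (hypothetical data).
toy_pairs = [(0, 1), (2, 0)]
A = build_adjacency(toy_pairs, n_microbes=3, n_diseases=2)
```

For the benchmark data the matrix would have 1177 × 134 = 157718 entries, of which 4499 (roughly 2.9%) are ones, so the matrix is very sparse.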

Similarity calculation and fusion

In reference [19], the authors utilized four methods to compute similarities for microbes and diseases. First, they calculated the semantic similarity (DS) of diseases. Based on this, the functional similarity (FS) of microbes was then derived. Subsequently, they computed the cosine similarity (COS_MS, COS_DS), Gaussian interaction profile similarity (GIP_MS, GIP_DS), and sigmoid kernel function similarity (SIG_MS, SIG_DS) for both microbes and diseases. We download the similarities from reference [19].

As data from different sources can provide complementary information, while containing potential noise [20, 21], we apply a non-linear strategy, motivated by reference [20], to fuse these similarities for both microbes and diseases. Firstly, we standardize COS_MS, GIP_MS, and SIG_MS of microbes. Taking COS_MS as an example, the normalization process is computed as follows:

(1)

Here, all diagonal elements of the matrix are set to 1/2, so that the elements in each row sum to 1. We obtain $SM_{COS}$ by normalizing COS_MS. Meanwhile, we process GIP_MS and SIG_MS in the same way to obtain $SM_{GIP}$ and $SM_{SIG}$, respectively. Next, we use the k-nearest neighbors (KNN) algorithm to calculate the local affinity $SKNN_{COS}$ between microbe i and microbe j as follows:

(2)

$N_i$ is the set of k nearest neighbors of a given node, where the size of $N_i$ is equal to the total number of microbes divided by 10. Based on the principle that the closer the distance, the higher the similarity, we set the similarity between a given node and the nodes outside its neighborhood to 0. Similarly, we can obtain $SKNN_{GIP}$ and $SKNN_{SIG}$.
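The two preprocessing steps described above can be sketched as follows. Because equations (1) and (2) are not reproduced in this text, the sketch assumes the forms commonly used in similarity network fusion: off-diagonal entries are divided by twice the off-diagonal row sum, and each node keeps only its k most similar other nodes (whether a node counts as its own neighbor is an assumption here). The function names and the toy matrix are illustrative.

```python
def normalize_half_diagonal(W):
    """Row-normalize W so each diagonal entry is 1/2 and each row sums to 1."""
    n = len(W)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        off_diag = sum(W[i][j] for j in range(n) if j != i)
        for j in range(n):
            if i == j:
                P[i][j] = 0.5
            elif off_diag > 0:  # an isolated node keeps only its diagonal
                P[i][j] = W[i][j] / (2.0 * off_diag)
    return P


def knn_affinity(W, k):
    """Keep each node's k most similar other nodes, renormalize, zero the rest."""
    n = len(W)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: W[i][j], reverse=True)
        neighbors = others[:k]
        total = sum(W[i][j] for j in neighbors)
        for j in neighbors:
            if total > 0:
                S[i][j] = W[i][j] / total
    return S


# Toy 3-node similarity matrix (hypothetical values).
W = [[1.0, 0.8, 0.1],
     [0.8, 1.0, 0.3],
     [0.1, 0.3, 1.0]]
P = normalize_half_diagonal(W)
S = knn_affinity(W, k=1)
```

With 1177 microbes, the neighborhood size described in the text would be about 117.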

We simultaneously update the three similarity networks as follows:

(3)
(4)
(5)

Here, m represents the number of different similarity networks of microbes; since we use three similarity networks, m = 3. k indexes the selected microbe similarity network, and t denotes the iteration number. The similarity matrix SM is calculated as:

(6)

The iteration ends when the condition $\frac{\lVert SM_k^{(t)} - SM_k^{(t-1)} \rVert}{\lVert SM_k^{(t-1)} \rVert} < 10^{-6}$ is met. We obtain SM through the following equation:

(7)
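The iterative update and stopping rule can be illustrated with a small pure-Python sketch. It assumes the cross-diffusion update commonly used in similarity network fusion, in which each network is replaced by S_k · (mean of the other networks) · S_k^T, followed by a final average over the m networks. The exact forms of equations (3) to (7) are not reproduced in this text, so the update, the Frobenius norm, the `max_iter` guard and all names are assumptions.

```python
def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def frobenius(X):
    return sum(v * v for row in X for v in row) ** 0.5


def mean_matrix(mats):
    """Entry-wise mean of equally shaped matrices."""
    return [[sum(vals) / len(mats) for vals in zip(*rows)]
            for rows in zip(*mats)]


def fuse(P_list, S_list, tol=1e-6, max_iter=100):
    """Cross-diffuse m normalized networks P_k with local affinities S_k."""
    P = [[row[:] for row in Pk] for Pk in P_list]
    m = len(P)
    for _ in range(max_iter):
        new_P = []
        for k in range(m):
            others = mean_matrix([P[v] for v in range(m) if v != k])
            Sk = S_list[k]
            new_P.append(matmul(matmul(Sk, others), transpose(Sk)))
        # Relative change of every network must fall below the tolerance.
        converged = all(
            frobenius([[a - b for a, b in zip(rn, ro)]
                       for rn, ro in zip(new_P[k], P[k])])
            <= tol * frobenius(P[k])
            for k in range(m)
        )
        P = new_P
        if converged:
            break
    return mean_matrix(P)


# Toy inputs: three 3x3 normalized networks sharing one affinity matrix.
S = [[0.0, 0.8, 0.2], [0.6, 0.0, 0.4], [0.2, 0.8, 0.0]]
P_a = [[0.5, 0.4, 0.1], [0.3, 0.5, 0.2], [0.1, 0.4, 0.5]]
P_b = [[0.5, 0.3, 0.2], [0.25, 0.5, 0.25], [0.2, 0.3, 0.5]]
P_c = [[0.5, 0.25, 0.25], [0.4, 0.5, 0.1], [0.3, 0.2, 0.5]]
fused = fuse([P_a, P_b, P_c], [S, S, S])
```

Because every row of S is non-negative and sums to 1, each updated entry is a weighted average of previous entries, so the fused values stay within the range of the inputs.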

Finally, we set a hyperparameter α to merge FS with SM as follows:

(8)
Figure 1

The workflow of SGJMDA in microbe-disease association inference.

The hyperparameter α is a weighting factor. SM represents the final result of fused similarity feature matrices. Similarly, we can use the above methods to obtain the similarity feature matrix SD of diseases.
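One simple way to realize such a weighted merge is a convex combination. Equation (8) is not reproduced in this text, so the form α·FS + (1 − α)·SM below, including which matrix α multiplies, is an assumption made for illustration only.

```python
def merge_similarity(FS, SM, alpha):
    """Blend two similarity matrices entry-wise: alpha*FS + (1 - alpha)*SM."""
    return [[alpha * f + (1.0 - alpha) * s for f, s in zip(row_f, row_s)]
            for row_f, row_s in zip(FS, SM)]


# Toy 2x2 matrices (hypothetical values).
FS = [[1.0, 0.2], [0.2, 1.0]]
SM = [[1.0, 0.6], [0.6, 1.0]]
merged = merge_similarity(FS, SM, alpha=0.28)
```

In this toy case the off-diagonal entry becomes 0.28 × 0.2 + 0.72 × 0.6 = 0.488.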

Method architecture

The architecture of SGJMDA for association prediction is presented below. The computational framework mainly consists of two parts. The first part is feature extraction and the other one is association prediction. We illustrate the workflow of SGJMDA in Fig. 1.

After similarity fusion, we apply GCN [22] to extract features for both microbes and diseases. The layer-wise propagation rule of the GCN used in our study is similar to that in reference [23], and is defined as follows:

(9)

where $H^{(l)}$ denotes the embeddings of nodes at the l-th layer, $D = \mathrm{diag}(\sum_j G_{ij})$ is the degree matrix of G, $W^{(l)}$ is the trainable weight matrix of the l-th layer and $\sigma(\cdot)$ is a non-linear activation function.

Taking microbes as an example, we define the input graph G as:

(10)

Then, our first layer GCN can be formulated as:

(11)

where $W^{(0)} \in \mathbb{R}^{N_m \times k}$ represents the first-layer weight matrix, and k sets the dimension of the embedding. Because our model only uses one layer of GCN in the homogeneous network, the feature output of microbes obtained through GCN is $SM \in \mathbb{R}^{N_m \times k}$. Similarly, we can obtain the feature output of diseases $SD \in \mathbb{R}^{N_d \times k}$.
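A single GCN layer of this kind can be sketched in plain Python. The sketch assumes the symmetric normalization D^(-1/2) G D^(-1/2) with D built from the row sums of G, uses ReLU as the activation (the text only says non-linear), and takes the rows of the similarity matrix as input features. The weight values are arbitrary toy numbers, not trained parameters.

```python
def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def gcn_layer(G, H, W):
    """One layer: ReLU(D^-1/2 G D^-1/2 H W), with D = diag(row sums of G)."""
    n = len(G)
    deg = [sum(row) for row in G]
    inv_sqrt = [d ** -0.5 if d > 0 else 0.0 for d in deg]
    G_norm = [[inv_sqrt[i] * G[i][j] * inv_sqrt[j] for j in range(n)]
              for i in range(n)]
    Z = matmul(matmul(G_norm, H), W)
    return [[max(0.0, z) for z in row] for row in Z]


# Toy graph with 2 nodes; features are the similarity rows themselves.
G = [[1.0, 0.5],
     [0.5, 1.0]]
W0 = [[1.0],
      [-1.0]]  # maps 2 input features to an embedding of size 1
out = gcn_layer(G, G, W0)
```

The weight matrix has as many rows as there are nodes, matching the N_m × k shape given for the first-layer weights.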

Meanwhile, inspired by MINIMDA [24], we construct a microbe-disease heterogeneous network to integrate mixed high-order neighborhood information and further obtain representations of microbes and diseases. We use the microbe-disease adjacency matrix and the fully connected SM and SD for feature extraction as follows:

(12)
(13)
(14)

where K represents the powers of $\tilde{D}^{-1/2}\tilde{G}\tilde{D}^{-1/2}$, and $K = \{0, 1, 2, \ldots, k\}$ denotes the orders of neighbors used for information propagation between features. When $k = 2$, $K$ is $\{0, 1, 2\}$, which means that the model only receives information from the 0th-, 1st-, and 2nd-order neighborhoods.
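The idea of mixing neighborhood orders can be illustrated by propagating features with successive powers of a normalized adjacency matrix and concatenating the results. This is only a sketch: the trainable weights and activations of equations (12) to (14) are omitted, and concatenation is an assumption about how the orders are combined.

```python
def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def mixed_order_features(G_norm, H, max_order):
    """Concatenate G_norm^j H for j = 0..max_order along the feature axis."""
    blocks = []
    current = [row[:] for row in H]
    for _ in range(max_order + 1):
        blocks.append(current)
        current = matmul(G_norm, current)  # move one hop further out
    return [sum((block[i] for block in blocks), []) for i in range(len(H))]


# Toy graph: two nodes joined by one edge, so propagation swaps the rows.
G_norm = [[0.0, 1.0],
          [1.0, 0.0]]
H = [[1.0],
     [2.0]]
mixed = mixed_order_features(G_norm, H, max_order=2)
```

Each node ends up with its own feature (order 0), its neighbor's feature (order 1), and its own feature again (order 2), which shows how different orders expose different parts of the graph.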

In order to effectively aggregate the representations of intermediate layers into the final layer, we apply the jumping knowledge (JK) network mechanism [25] to aggregate these different layers, which is calculated as follows:

(15)

where $\omega_i$ represents the weighting coefficient for feature aggregation in the i-th layer. The features $\widetilde{SM}$ and $\widetilde{SD}$ of microbes and diseases are then obtained, respectively.
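A weighted aggregation over layers can be sketched as below. It assumes that equation (15) is a weighted sum of per-layer embeddings of equal shape; the weights shown are fixed toy values, whereas in a trained model they would presumably be learned.

```python
def jk_aggregate(layer_outputs, weights):
    """Weighted sum of per-layer embeddings that share the same shape."""
    n = len(layer_outputs[0])
    d = len(layer_outputs[0][0])
    return [[sum(w * H[i][j] for w, H in zip(weights, layer_outputs))
             for j in range(d)]
            for i in range(n)]


# Two toy layer outputs for 2 nodes with 2 features each.
H1 = [[1.0, 0.0], [0.0, 1.0]]
H2 = [[0.0, 2.0], [2.0, 0.0]]
agg = jk_aggregate([H1, H2], weights=[0.5, 0.25])
```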

We use linear correlation coefficients to infer possible microbe-disease associations [26], computed according to the following equation:

(16)

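where $\lambda_i \in [SM \,\Vert\, \widetilde{SM}]$ and $\lambda_j \in [SD \,\Vert\, \widetilde{SD}]$ are vectors representing the features of the i-th microbe and the j-th disease, respectively, and $\mu_i$ and $\mu_j$ are the average values of $\lambda_i$ and $\lambda_j$.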

Finally, we use the sigmoid function to reconstruct the microbe-disease matrix $\hat{A}$, according to the following formula:

(17)

Optimization of model parameters is based on binary cross entropy loss (see equation 18):

(18)

where $\gamma^+$ and $\gamma^-$ represent the sets of positive and negative instances used in training, $(i,j)$ indicates the pair of microbe i and disease j, and $\hat{A}_{ij}$ denotes the predicted score between microbe i and disease j.
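The decoder and loss described above can be sketched with the standard library. The sketch assumes that the linear correlation coefficient is the Pearson coefficient, that the sigmoid is applied directly to it, and that the binary cross entropy is averaged over the training pairs (it could equally be summed). The feature vectors are toy values.

```python
import math


def pearson(u, v):
    """Pearson correlation between two equal-length feature vectors."""
    mu_u, mu_v = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v))
    var_u = sum((a - mu_u) ** 2 for a in u)
    var_v = sum((b - mu_v) ** 2 for b in v)
    denom = math.sqrt(var_u * var_v)
    return cov / denom if denom > 0 else 0.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bce_loss(samples, scores):
    """samples: list of ((i, j), label); scores: dict (i, j) -> probability."""
    eps = 1e-12  # keeps log() away from 0
    total = 0.0
    for pair, y in samples:
        p = min(max(scores[pair], eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(samples)


# Toy features: 2 microbes and 1 disease, each with 3 feature values.
microbe_feats = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
disease_feats = [[2.0, 4.0, 6.0]]
scores = {(i, j): sigmoid(pearson(m, d))
          for i, m in enumerate(microbe_feats)
          for j, d in enumerate(disease_feats)}
samples = [((0, 0), 1), ((1, 0), 0)]  # one positive, one negative pair
loss = bce_loss(samples, scores)
```

Since a correlation coefficient lies in [-1, 1], sigmoid scores computed this way fall between about 0.27 and 0.73.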

Results

Experimental setting

We use 5-fold and 10-fold cross-validation (5-CV and 10-CV) to evaluate the performance of our model. For 5-CV, we randomly divide the 4499 microbe-disease associations into five roughly equal parts, with four parts used for training and the remaining one for testing. Similar steps are taken in the 10-CV tests. We further calculate AUC, AUPR, recall, precision (Pre), accuracy (ACC) and F1-score as indicators for performance comparison.
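The fold construction and the threshold-based indicators can be sketched as follows. This is a minimal illustration that only splits the positive associations; how negative samples are drawn for each fold is not shown. The function names, the seed and the confusion counts are hypothetical.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and deal them into k roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[f::k] for f in range(k)]


def classification_metrics(tp, fp, tn, fn):
    """Precision, recall, accuracy and F1-score from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"Pre": precision, "Recall": recall, "ACC": accuracy, "F1": f1}


folds = k_fold_indices(4499, k=5)
test_fold = folds[0]
train = [idx for fold in folds[1:] for idx in fold]
metrics = classification_metrics(tp=90, fp=10, tn=80, fn=20)
```

With 4499 associations and five folds, four folds contain 900 associations and one contains 899.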

Hyperparameter analysis

Our method SGJMDA contains the following hyperparameters: (i) the proportion coefficient of feature fusion α, (ii) the embedding dimension layer_size, (iii) the number of layers in multi-neighborhood GCN n, and (iv) the number of head nodes in multi-neighborhood GCN k. We analyse the impact of different parameter settings on prediction performance based on 10-CV using the benchmark datasets.

First, for the proportion coefficient of feature fusion α, we compare the values 0.1, 0.2, 0.25, 0.28, 0.3 and 0.4 and find that the model achieves the best performance when α = 0.28 (see Fig. 2).

Figure 2

Performance analysis on the proportion coefficient α.

Second, we vary the embedding dimension layer_size in the GCN and analyse its impact on prediction performance. As shown in Fig. 3, we set its value to 32, 64, 128 and 256, and the results show that our model performs best when layer_size = 128.

Figure 3

Performance analysis on the dimension of layer_size.

Third, we set the number of layers of the multi-neighborhood GCN in our model to 1, 2, 4, and 6. Figure 4 shows that our model performs best when the number is 2.

Figure 4

Performance analysis on the number of layers in multi-neighborhood.

Finally, we test the impact of the number of head nodes k in multi-neighborhood GCN. We set its value as 2, 3, 4, 5, and 6 respectively. As shown in Fig. 5, when k equals 4, our model receives its best performance.

Figure 5

Performance analysis on the number of head nodes in multi-neighborhood.

Based on the above experimental tests, we set α to 0.28, layer_size to 128, n to 2 and k to 4 in SGJMDA. In addition, we empirically set the learning rate lr = 0.001, the weight decay wd = 1e-5, and the number of training epochs epoch = 2000 in our study.
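For reference, the selected values and the candidate values examined above can be collected in one place. The class and field names below are illustrative and do not come from a released SGJMDA implementation; only the numbers are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SGJMDAConfig:
    alpha: float = 0.28     # proportion coefficient of feature fusion
    layer_size: int = 128   # embedding dimension
    n_layers: int = 2       # layers in the multi-neighborhood GCN (n)
    k_heads: int = 4        # head nodes in the multi-neighborhood GCN (k)
    lr: float = 0.001       # learning rate
    wd: float = 1e-5        # weight decay
    epochs: int = 2000      # training epochs


# Candidate values compared in the hyperparameter analysis.
SEARCH_GRID = {
    "alpha": [0.1, 0.2, 0.25, 0.28, 0.3, 0.4],
    "layer_size": [32, 64, 128, 256],
    "n_layers": [1, 2, 4, 6],
    "k_heads": [2, 3, 4, 5, 6],
}

config = SGJMDAConfig()
```

Varying one hyperparameter at a time, as the analysis above appears to do, requires 6 + 4 + 4 + 5 = 19 settings, compared with 6 × 4 × 4 × 5 = 480 for an exhaustive grid.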

Ablation experiments

There are five key modules in our method SGJMDA: feature fusion, GCN fused with homogeneous networks, multi-neighborhood GCN, jumping knowledge and decoder. We remove each component from SGJMDA separately to investigate their impacts on prediction ability. Here are the five models we test and compare:

SGJMDA-SF model: We remove the feature fusion component used in this paper and replace it with averaging, while keeping the rest unchanged.

SGJMDA-Hom model: We preserve feature fusion, multi-neighborhood GCN, jumping knowledge and decoder, and replace GCN with fully connected networks.

SGJMDA-Het model: We remove multi-neighborhood GCN and jumping knowledge, while keeping the other modules unchanged.

SGJMDA-JK model: We remove the jumping knowledge module, leaving all other modules unchanged.

SGJMDA-Dec model: The decoder is replaced by multiplying the fused microbe feature matrix by the transpose of the fused disease feature matrix, and then applying the sigmoid function to generate the final score matrix.

As shown in Table 1, the performance of SGJMDA-SF is inferior to that of SGJMDA, indicating that our non-linear fusion strategy integrates the features of microbes and diseases more effectively and achieves better prediction performance. The performance of SGJMDA-Hom is worse than that of SGJMDA, indicating that the homogeneous GCN can better learn the features of microbes and diseases. SGJMDA-Het is only slightly better than SGJMDA on Recall, while its other indicators are lower, demonstrating that the multi-neighborhood GCN also contributes to the embedding of microbes and diseases. The performance of SGJMDA-JK is worse than that of SGJMDA, suggesting that using jumping knowledge can improve prediction performance. SGJMDA-Dec performs much worse than SGJMDA, indicating that the linear correlation coefficients are suitable for predicting microbe-disease associations in our study.

Table 1

Results of ablation tests based on 10-CV

| Method | AUC | AUPR | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|---|---|
| SGJMDA | 0.9509 | 0.9450 | 0.8914 | 0.8677 | 0.9251 | 0.8951 |
| SGJMDA-SF | 0.9295 | 0.9297 | 0.8608 | 0.8548 | 0.8695 | 0.8620 |
| SGJMDA-Hom | 0.9364 | 0.9293 | 0.8728 | 0.8579 | 0.8949 | 0.8755 |
| SGJMDA-Het | 0.9495 | 0.9410 | 0.8908 | 0.8628 | 0.9299 | 0.8945 |
| SGJMDA-JK | 0.9404 | 0.9273 | 0.8821 | 0.8593 | 0.9155 | 0.8861 |
| SGJMDA-Dec | 0.8433 | 0.8650 | 0.7912 | 0.8107 | 0.7633 | 0.7853 |

Comparison with other methods

We compare SGJMDA with six baseline methods (i.e. DSAE_RF [19], AMHMDA [27], MHCLMDA [28], MNNMDA [29], LRLSHMDA [30], and NTSHMDA [31]) using the same benchmark datasets and cross-validations.

We plot the ROC and PR curves of these methods based on 5-CV tests in Fig. 6 for comparison. The average AUC value of SGJMDA is 0.9479, which is 2.41% (DSAE_RF), 6.68% (AMHMDA), 6.38% (MHCLMDA), 2.30% (MNNMDA), 12.39% (LRLSHMDA), and 15.12% (NTSHMDA) higher than those of the other six methods, respectively. The average AUPR value of SGJMDA is 0.9410, which is 2.32% (DSAE_RF), 6.96% (AMHMDA), 6.45% (MHCLMDA), 0.54% (MNNMDA), 14.94% (LRLSHMDA), and 16.83% (NTSHMDA) higher than those of the other methods, respectively. The results of other performance indicators based on 5-CV tests are shown in Table 2. These results suggest that SGJMDA outperforms the other six methods overall.

Figure 6

ROC and PR curves of different methods in association prediction based on 5-CV.

Table 2

Performance comparison with the baseline methods based on 5-CV

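| Method | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|
| SGJMDA | 0.8836 | 0.8507 | 0.9318 | 0.8890 |
| DSAE_RF | 0.8484 | 0.8490 | 0.8482 | 0.8482 |
| AMHMDA | 0.7742 | 0.8379 | 0.6918 | 0.7467 |
| MHCLMDA | 0.7178 | 0.7788 | 0.8635 | 0.8187 |
| MNNMDA | 0.8775 | 0.8861 | 0.8666 | 0.8762 |
| LRLSHMDA | 0.7683 | 0.7324 | 0.8504 | 0.7860 |
| NTSHMDA | 0.7076 | 0.6559 | 0.8780 | 0.7504 |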

Similarly, we plot the 10-CV results in Fig. 7. The average AUC value of SGJMDA is 0.9509, which surpasses those of the other six methods by 2.55% (DSAE_RF), 6.67% (AMHMDA), 6.92% (MHCLMDA), 2.37% (MNNMDA), 12.14% (LRLSHMDA), and 15.65% (NTSHMDA), respectively. Meanwhile, the average AUPR value of SGJMDA is 0.9450, which is 2.51% (DSAE_RF), 7.04% (AMHMDA), 7.73% (MHCLMDA), 0.87% (MNNMDA), 14.89% (LRLSHMDA), and 17.51% (NTSHMDA) higher, respectively. Other performance indicators are provided in Table 3. Results from the 10-CV tests again confirm the strong performance of SGJMDA.

Figure 7

ROC and PR curves of different methods in association prediction based on 10-CV.

Table 3

Performance comparison with the baseline methods based on 10-CV

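| Method | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|
| SGJMDA | 0.8914 | 0.8677 | 0.9251 | 0.8951 |
| DSAE_RF | 0.8481 | 0.8486 | 0.8480 | 0.8477 |
| AMHMDA | 0.7974 | 0.8264 | 0.7565 | 0.7870 |
| MHCLMDA | 0.7295 | 0.7723 | 0.8844 | 0.8237 |
| MNNMDA | 0.8803 | 0.8835 | 0.8789 | 0.8801 |
| LRLSHMDA | 0.7723 | 0.7304 | 0.8680 | 0.7923 |
| NTSHMDA | 0.7175 | 0.6706 | 0.8595 | 0.7526 |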

In addition, we adopt the same negative sample selection strategy as DSAE_RF [19] (k-means clustering selection), followed by 10-CV tests, and compare the prediction performance with that of DSAE_RF. The results are shown in Fig. 8. The results in Table 3 and Fig. 8 show that SGJMDA achieves better prediction performance than DSAE_RF [19] when k-means is used to select negative samples.

Figure 8

Performance comparison between SGJMDA and DSAE_RF when using the same k-means clustering for negative sample selection.

Case studies

We further carry out case studies on three important diseases (i.e. obesity, Crohn’s disease and colorectal cancer) to test SGJMDA’s prediction ability in real situations. Specifically, we first exclude the association information for each specific disease from the benchmark datasets. Then, we train SGJMDA to infer disease-associated microbes. Finally, we select the top 20 predictions for validation. We use the latest versions of HMDAD, Disbiome, and Peryton to confirm the results.
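The masking and ranking steps of this procedure can be sketched as follows. The score matrix stands in for the output of a trained model and is filled with toy values; the function names are illustrative.

```python
def mask_disease(A, disease_idx):
    """Copy A with every known association of one disease set to 0."""
    return [[0 if j == disease_idx else v for j, v in enumerate(row)]
            for row in A]


def top_k_microbes(scores, disease_idx, k=20):
    """Indices of the k microbes with the highest score for one disease."""
    column = [(row[disease_idx], i) for i, row in enumerate(scores)]
    column.sort(key=lambda t: (-t[0], t[1]))  # high score first, ties by index
    return [i for _, i in column[:k]]


# Toy data: 3 microbes, 2 diseases; disease 0 is the case-study disease.
A = [[1, 0],
     [1, 1],
     [0, 1]]
masked = mask_disease(A, disease_idx=0)

# Hypothetical predicted scores from a model trained on the masked matrix.
scores = [[0.2, 0.9],
          [0.7, 0.1],
          [0.5, 0.3]]
ranking = top_k_microbes(scores, disease_idx=0, k=2)
```

The ranked microbes would then be checked against the databases, as done for the top 20 predictions in the case studies.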

As an epidemic worldwide, obesity increases the incidence of diabetes, heart disease, high blood pressure, and cancer [32]. Conventional knowledge suggests that behaviors that lead to overeating and inactivity are the main contributors to obesity. However, there are several microorganisms that have been linked to obesity in humans [33]. The findings suggest that obesity has a microbial factor, which may have potential therapeutic implications [34]. We use SGJMDA to infer obesity-associated microbes. We select the top 20 predictions and discover that 19 of them have been confirmed in the HMDAD, Disbiome and Peryton databases. We showcase the results in Table 4.

Table 4

The top 20 predicted obesity-associated microbes

| Ranking | Microbe | Evidence |
|---|---|---|
| 1 | Corynebacterium | PMID:30654751 |
| 2 | Peptostreptococcaceae | PMID:30572569 |
| 3 | Streptococcus gordonii | PMID:19587155 |
| 4 | Ruminococcus | PMID:31399369 |
| 5 | Ruminococcaceae | PMID:29280312 |
| 6 | Eubacterium | PMID:23055155 |
| 7 | Bacteroides eggerthii | PMID:29388394 |
| 8 | Coprococcus | PMID:30572569 |
| 9 | Prevotella | PMID:31024514 |
| 10 | Faecalibacterium | PMID:23985870 |
| 11 | Lactobacillus | PMID:23631345 |
| 12 | Streptococcus | PMID:29576948 |
| 13 | Bacteroides uniformis | PMID:29338886 |
| 14 | Collinsella aerofaciens | NA |
| 15 | Streptococcus oralis | PMID:29520825 |
| 16 | Proteobacteria | PMID:30386323 |
| 17 | Bifidobacterium | PMID:29280312 |
| 18 | Escherichia | PMID:23055155 |
| 19 | Blautia | PMID:31530820 |
| 20 | Prevotella melaninogenica | PMID:19587155 |

Note: NA indicates not available.


For Crohn’s disease [35], we first remove the association information from the benchmark datasets, and apply SGJMDA to predict its related microbes. We find that all the top 20 predictions are validated by the latest databases. We list the results in Table 5.

Table 5

The top 20 predicted Crohn’s disease-associated microbes

| Ranking | Microbe | Evidence |
|---|---|---|
| 1 | Akkermansia | PMID:28222161 |
| 2 | Ruminococcaceae | PMID:25121355 |
| 3 | Prevotella | PMID:24013298 |
| 4 | Alistipes | PMID:20816835 |
| 5 | Faecalibacterium | PMID:17119388 |
| 6 | Corynebacterium | PMID:22068912 |
| 7 | Lactobacillus | PMID:17897884 |
| 8 | Faecalibacterium prausnitzii | PMID:19235886 |
| 9 | Ruminococcus | PMID:22068912 |
| 10 | Fusobacterium | PMID:30927743 |
| 11 | Bifidobacterium | PMID:26789999 |
| 12 | Coprococcus | PMID:30478724 |
| 13 | Blautia | PMID:31899727 |
| 14 | Collinsella | PMID:20816835 |
| 15 | Megasphaera | PMID:27083382 |
| 16 | Rothia | PMID:26288001 |
| 17 | Collinsella aerofaciens | PMID:26804920 |
| 18 | Pseudomonas | PMID:26574491 |
| 19 | Anaerostipes | PMID:26313691 |
| 20 | Streptococcus | PMID:30545401 |

Similarly, for colorectal cancer [36], we apply SGJMDA to predict its potentially associated microbes. Among the top 20 predictions, we find that 11 associations are confirmed (Table 6).

Table 6

The top 20 predicted colorectal cancer-associated microbes

| Ranking | Microbe | Evidence |
|---|---|---|
| 1 | Micrococcus | PMID:28600626 |
| 2 | Erysipelotrichaceae incertae sedis | NA |
| 3 | Aggregatibacter | PMID:27742762 |
| 4 | Arthrospira | PMID:25150117 |
| 5 | Alloscardovia | NA |
| 6 | Propionicimonas | NA |
| 7 | Peptostreptococcaceae incertae sedis | PMID:22114001 |
| 8 | Pandoraea | PMID:35672730 |
| 9 | Mycobacterium tuberculosis | PMID:36183156 |
| 10 | Delftia tsuruhatensis | NA |
| 11 | Propionimicrobium lymphophilum | NA |
| 12 | Varibaculum cambriense | NA |
| 13 | Acidovorax | PMID:36717544 |
| 14 | Pseudothermotoga | NA |
| 15 | Lactobacillus taiwanensis | PMID:29650970 |
| 16 | Anaerococcus tetradius | NA |
| 17 | Wolbachia | NA |
| 18 | Rhodobacteraceae | PMID:37317301 |
| 19 | Mycoplasma | PMID:37772998 |
| 20 | Treponema | PMID:35664963 |

Note: NA indicates not available.

Moreover, we use SGJMDA to predict microbe-disease associations from all the information in the benchmark datasets and select the top 20 predictions for validation. A search of PubMed (https://pubmed.ncbi.nlm.nih.gov/) verifies 14 of them (Table 7).

Table 7

The top 20 predictions by SGJMDA

Ranking  Microbe                             Disease                 Evidence
1        Blautia                             Chronic kidney disease  PMID:33101877
2        Micrococcus                         Colorectal cancer       PMID:28600626
3        Alistipes                           Chronic kidney disease  PMID:37809388
4        Klebsiella                          Chronic kidney disease  PMID:37284390
5        Sutterella                          Chronic kidney disease  PMID:36718700
6        Fusobacterium                       Chronic kidney disease  PMID:33435396
7        Erysipelotrichaceae incertae sedis  Colorectal cancer       NA
8        Aggregatibacter                     Colorectal cancer       NA
9        Odoribacter                         Chronic kidney disease  PMID:37011727
10       Megasphaera                         Chronic kidney disease  PMID:34357944
11       Oscillibacter                       Chronic kidney disease  PMID:32560104
12       Arthrospira                         Colorectal cancer       PMID:35946342
13       Staphylococcus                      Chronic kidney disease  NA
14       Veillonella                         Chronic kidney disease  PMID:38095826
15       Lachnospiraceae                     Chronic kidney disease  PMID:37065213
16       Coriobacteriaceae                   Chronic kidney disease  PMID:33681383
17       Alloscardovia                       Colorectal cancer       NA
18       Propionicimonas                     Colorectal cancer       NA
19       Proteobacteria                      Chronic kidney disease  PMID:29444477
20       Lachnospiraceae incertae sedis      Chronic kidney disease  NA

Note: NA indicates not available.
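
The top-20 selection behind Table 7 can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `top_k_predictions`, the toy score matrix, and the placeholder microbe and disease names are hypothetical, and the sketch assumes that pairs already recorded in the benchmark dataset are excluded from the ranking.

```python
import heapq


def top_k_predictions(scores, known, microbes, diseases, k=20):
    """Return the k highest-scoring (microbe, disease, score) triples
    among pairs that are not already known associations.

    scores   -- list of rows, scores[i][j] for microbe i and disease j
    known    -- set of (i, j) index pairs with a recorded association
    """
    candidates = (
        (scores[i][j], microbes[i], diseases[j])
        for i in range(len(microbes))
        for j in range(len(diseases))
        if (i, j) not in known  # assumption: rank only unobserved pairs
    )
    best = heapq.nlargest(k, candidates, key=lambda t: t[0])
    return [(m, d, s) for s, m, d in best]


# Toy data with made-up scores; names are placeholders, not real results.
microbes = ["microbe_A", "microbe_B", "microbe_C"]
diseases = ["disease_X", "disease_Y"]
scores = [
    [0.91, 0.99],
    [0.12, 0.87],
    [0.85, 0.30],
]
known = {(0, 1)}  # microbe_A / disease_Y is already in the dataset

top = top_k_predictions(scores, known, microbes, diseases, k=3)
for rank, (microbe, disease, score) in enumerate(top, start=1):
    print(rank, microbe, disease, score)
```

The highest raw score in the toy matrix belongs to a known pair, so it is skipped and the ranking starts from the best unobserved pair. Each returned pair would then be checked manually against the literature, as done for the Evidence column.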

Discussion and conclusions

Studies have demonstrated that the human microbiome has a profound impact on health. Differences in microbial abundance and diversity can help explain susceptibility or resistance to certain diseases. Identifying disease-associated microbes would therefore improve our understanding of disease pathogenesis and support treatment. In this study, we develop SGJMDA, a deep learning-based computational framework, to infer new microbe-disease associations. Comprehensive experiments, including ablation tests, comparisons with other methods, and case studies, are carried out. The results show that our method outperforms the compared methods in association prediction.

The good performance of our method can be attributed to three factors. First, we use a nonlinear strategy for similarity fusion; comparative results show that the similarities fused in this way generate more accurate predictions. Second, we apply both GCN and a jumping knowledge network to extract features for microbes and diseases, which captures high-order neighborhood information. Finally, we compute prediction scores as linear correlation coefficients rather than by matrix multiplication; ablation tests demonstrate that this choice improves prediction performance.
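
The difference between the two scoring choices can be illustrated with a small sketch. It assumes that "linear correlation coefficient" means the Pearson coefficient between a microbe embedding and a disease embedding; the function names and toy embeddings are hypothetical and are not taken from the SGJMDA source code.

```python
import math


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    denom = math.sqrt(var_u * var_v)
    # A constant vector has zero variance; treat its correlation as 0.
    return cov / denom if denom > 0 else 0.0


def dot(u, v):
    """Inner product, i.e. one entry of a matrix-multiplication score."""
    return sum(a * b for a, b in zip(u, v))


def score_matrix(microbe_emb, disease_emb, score=pearson):
    """Score every microbe embedding against every disease embedding."""
    return [[score(m, d) for d in disease_emb] for m in microbe_emb]


# Toy embeddings with made-up values (assumption: one row per entity).
microbe_emb = [
    [1.0, 2.0, 3.0, 4.0],
    [10.0, 20.0, 30.0, 40.0],  # same direction as row 0, ten times larger
    [4.0, 3.0, 2.0, 1.0],      # reversed trend
]
disease_emb = [
    [2.0, 4.0, 6.0, 8.0],
    [1.0, 0.0, 1.0, 0.0],
]

corr_scores = score_matrix(microbe_emb, disease_emb)
dot_scores = score_matrix(microbe_emb, disease_emb, score=dot)
```

In this toy example the first two microbe embeddings point in the same direction but differ tenfold in magnitude: their dot-product scores against the first disease differ by a factor of ten, whereas both correlation scores equal 1, because the correlation coefficient is invariant to shifting and positive rescaling of an embedding and is always bounded in [-1, 1].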

Although SGJMDA performs well, it still has some limitations. For example, the number of experimentally verified microbe-disease associations is limited, which affects prediction performance. We expect to address this issue by integrating more reliable association information into our model as it is discovered. Meanwhile, optimizing the hyperparameters of SGJMDA is also challenging, a problem common to deep learning methods. These issues are directions for our future work.

Key Points
  • We apply a nonlinear strategy to fuse similarities for both microbes and diseases.

  • SGJMDA can effectively extract embeddings for both microbes and diseases using GCN and jumping knowledge networks.

  • SGJMDA outperforms existing methods and improves prediction accuracy in association prediction.

Conflict of interest: None declared.

Funding

Jiangxi Provincial Natural Science Foundation, China (20242BAB25083).

Data availability

The benchmark datasets and source codes used in our study are freely accessible at https://github.com/IamChenHailin/SGJMDA.

Author Biographies

Hailin Chen, PhD, is an associate professor at the School of Information and Software Engineering, East China Jiaotong University. His research interests include data mining and bioinformatics.

Kuan Chen is a graduate student at the School of Information and Software Engineering, East China Jiaotong University. His research interests are deep learning and bioinformatics.

References

1. Cénit M, Matzaraki V, Tigchelaar E, et al. Rapidly expanding knowledge on the role of the gut microbiome in health and disease. Biochimica et Biophysica Acta (BBA)-Molecular Basis of Disease 2014;1842:1981–92.

2. Ley RE, Peterson DA, Gordon JI. Ecological and evolutionary forces shaping microbial diversity in the human intestine. Cell 2006;124:837–48.

3. O'Hara AM, Shanahan F. The gut flora as a forgotten organ. EMBO Rep 2006;7:688–93.

4. Sokol H, Pigneur B, Watterlot L, et al. Faecalibacterium prausnitzii is an anti-inflammatory commensal bacterium identified by gut microbiota analysis of Crohn disease patients. Proc Natl Acad Sci 2008;105:16731–6.

5. Althani AA, Marei HE, Hamdi WS, et al. Human microbiome and its association with health and diseases. J Cell Physiol 2016;231:1688–94.

6. Ma W, Zhang L, Zeng P, et al. An analysis of human microbe-disease associations. Brief Bioinform 2017;18:85–97.

7. Janssens Y, Nielandt J, Bronselaer A, et al. Disbiome database: linking the microbiome to disease. BMC Microbiol 2018;18:50.

8. Skoufos G, Kardaras FS, Alexiou A, et al. Peryton: a manual collection of experimentally supported microbe-disease associations. Nucleic Acids Res 2021;49:D1328–33.

9. Wen Z, Yan C, Duan G, et al. A survey on predicting microbe-disease associations: biological data and computational methods. Brief Bioinform 2021;22:bbaa157.

10. Zou S, Zhang J, Zhang Z. A novel approach for predicting microbe-disease associations by bi-random walk on the heterogeneous network. PLoS One 2017;12:e0184394.

11. Chen X, Huang Y-A, You Z-H, et al. A novel approach based on KATZ measure to predict associations of human microbiota with non-infectious diseases. Bioinformatics 2017;33:733–9.

12. Yin M-M, Liu J-X, Gao Y-L, et al. NCPLP: a novel approach for predicting microbe-associated diseases with network consistency projection and label propagation. IEEE Transact Cybernet 2020;52:5079–87.

13. Bennett J, Lanning S. The Netflix prize. In: Proceedings of KDD Cup and Workshop, p. 35. New York: Association for Computing Machinery (ACM), 2007.

14. Wu C, Gao R, Zhang Y. mHMDA: human microbe-disease association prediction by matrix completion and multi-source information. IEEE Access 2019;7:106687–93.

15. Yang X, Kuang L, Chen Z, et al. Multi-similarities bilinear matrix factorization-based method for predicting human microbe-disease associations. Front Genet 2021;12:754425.

16. Xu D, Xu H, Zhang Y, et al. Novel collaborative weighted non-negative matrix factorization improves prediction of disease-associated human microbes. Front Microbiol 2022;13:834982.

17. Peng LH, Yin J, Zhou L, et al. Human microbe-disease association prediction based on adaptive boosting. Front Microbiol 2018;9:2440.

18. Li H, Wang Y, Zhang Z, et al. Identifying microbe-disease association based on a novel back-propagation neural network model. IEEE/ACM Trans Comput Biol Bioinform 2020;18:2502–13.

19. Wang L, Wang Y, Xuan C, et al. Predicting potential microbe-disease associations based on multi-source features and deep learning. Brief Bioinform 2023;24:1–14.

20. Wang B, Mezlini AM, Demir F, et al. Similarity network fusion for aggregating data types on a genomic scale. Nat Methods 2014;11:333–7.

21. Chen H, Zhang Z, Zhang J. In silico drug repositioning based on the integration of chemical, genomic and pharmacological spaces. BMC Bioinformatics 2021;22:52.

22. Kipf TN, Welling M. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907. 2016.

23. Yu Z, Huang F, Zhao X, et al. Predicting drug-disease associations through layer attention graph convolutional network. Brief Bioinform 2021;22:1–11.

24. Lou Z, Cheng Z, Li H, et al. Predicting miRNA-disease associations via learning multimodal networks and fusing mixed neighborhood information. Brief Bioinform 2022;23:1–15.

25. Xu K, Li C, Tian Y, et al. Representation learning on graphs with jumping knowledge networks. In: International Conference on Machine Learning, pp. 5453–62. Stockholm, Sweden: PMLR, 2018.

26. Peng W, Liu H, Dai W, et al. Predicting cancer drug response using parallel heterogeneous graph convolutional networks with neighborhood interactions. Bioinformatics 2022;38:4546–53.

27. Ning Q, Zhao Y, Gao J, et al. AMHMDA: attention aware multi-view similarity networks and hypergraph learning for miRNA-disease associations identification. Brief Bioinform 2023;24:1–11.

28. Peng W, He Z, Dai W, et al. MHCLMDA: multihypergraph contrastive learning for miRNA-disease association prediction. Brief Bioinform 2023;25:1–12.

29. Liu H, Bing P, Zhang M, et al. MNNMDA: predicting human microbe-disease association via a method to minimize matrix nuclear norm. Comput Struct Biotechnol J 2023;21:1414–23.

30. Wang F, Huang ZA, Chen X, et al. LRLSHMDA: Laplacian regularized least squares for human microbe-disease association prediction. Sci Rep 2017;7:7601.

31. Luo J, Long Y. NTSHMDA: prediction of human microbe-disease association based on random walk by integrating network topological similarity. IEEE/ACM Trans Comput Biol Bioinform 2020;17:1341–51.

32. Bray GA. Medical consequences of obesity. J Clin Endocrinol Metab 2004;89:2583–9.

33. Hegde V, Dhurandhar NV. Microbes and obesity--interrelationship between infection, adipose tissue and the immune system. Clin Microbiol Infect 2013;19:314–20.

34. Ley RE, Turnbaugh PJ, Klein S, et al. Human gut microbes associated with obesity. Nature 2006;444:1022–3.

35. Torres J, Mehandru S, Colombel JF, et al. Crohn's disease. Lancet 2017;389:1741–55.

36. Center MM, Jemal A, Smith RA, et al. Worldwide variations in colorectal cancer. CA Cancer J Clin 2009;59:366–78.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.