Predicting disease-associated microbes based on similarity fusion and deep learning

Abstract

Increasing studies have revealed the critical roles of human microbiome in a wide variety of disorders. Identification of disease-associated microbes might improve our knowledge and understanding of disease pathogenesis and treatment. Computational prediction of microbe-disease associations would provide helpful guidance for further biomedical screening, which has received lots of research interest in bioinformatics. In this study, a deep learning-based computational approach entitled SGJMDA is presented for predicting microbe-disease associations. Specifically, SGJMDA first fuses multiple similarities of microbes and diseases using a nonlinear strategy, and extracts feature information from homogeneous networks composed of the fused similarities via a graph convolution network. Second, a heterogeneous microbe-disease network is built to further capture the structural information of microbes and diseases by employing multi-neighborhood graph convolution network and jumping knowledge network. Finally, potential microbe-disease associations are inferred through computing the linear correlation coefficients of their embeddings. Results from cross-validation experiments show that SGJMDA outperforms 6 state-of-the-art computational methods. Furthermore, we carry out case studies on three important diseases using SGJMDA, in which 19, 20, and 11 predictions out of their top 20 results are successfully checked by the latest databases, respectively. The excellent performance of SGJMDA suggests that it could be a valuable and promising tool for inferring disease-associated microbes.

microbe-disease association, similarity fusion, graph convolution networks, jumping knowledge networks

Issue Section:

Problem solving protocol

Introduction

Microbes, which mainly refer to bacteria, but also include fungi and viruses, have been observed to live in and on human body sites, including urogenital tract, stomach, and skin [1]. The human body contains an estimated 350 trillion microbial cells [2]. Advances in metagenomics and metatranscriptomic analysis technologies have enabled the scientific community to explore the functions of human microbiome. Investigation into the human microbiome has revealed that they have a significant impact on our health. For instance, emerging evidence indicated that the gut microbiota is crucial in supporting health [3]. Another study found that a decrease in the amount of Faecalibacterium prausnitzii, an anti-inflammatory commensal bacterium, is linked to a higher chance of the recurrence of ileal Crohn’s disease (CD) [4].

Identification of human disease-associated microbes would provide a better understanding of disease etiology, which might lead to novel medical treatments [5]. Due to the significance of microbes in human health, researchers have searched published papers and established online databases [6, 7, 8] to systematically curate disease-associated microbes for further studies. Nevertheless, our knowledge of the microbe-disease associations has until now been limited. Meanwhile, it takes time and money to validate disease-associated microbes through in vivo studies. Computational predictions of such associations for further biomedical screening would be an excellent cost-effective alternative.

Till now, computational models to predict microbe-disease associations have garnered lots of research interest in bioinformatics field, and algorithms are constantly proposed with improved prediction accuracy [9]. These computational techniques can be mainly divided into three groups: network-based, matrix factorization-based, and machine learning-based.

Network-based methods apply graph theories to prioritize the unknown microbe-disease associations at the network level. For example, bi-random walk [10], KATZ measure combining network topology information [11] and network consistency projection in conjunction with label propagation [12] were utilized for inferring new microbe-disease associations. The network-based approaches can provide good interpretability of prediction results. However, there is still opportunity for improvements in their performance.

Matrix factorization-based approaches, usually under the low-rank assumption, have been widely used to recover user-item preference matrix in recommender systems [13]. Analogously, computational methods [14–16] were developed to apply matrix factorization to fill out the unknown elements in the original microbe-disease association matrix for new association predictions. Because of significant computational complexity of matrix operations, challenges would exist when these matrix factorization approaches are applied to large-scale datasets.

More recently, the fast advances in machine learning, especially deep learning, enable the development of efficient algorithms for the prediction of microbe-disease associations. For instance, adaptive boosting [17], back-propagation neural network [18] and deep sparse autoencoder neural network [19] were applied to infer new microbe-disease associations. These computational methods are receiving encouraging prediction results.

Despite of success of the above methods in identifying microbe-disease associations, some challenges should be tackled for better predictions. First, as biomedical technologies advance, more and more data features of microbes and diseases are available. Integrating these heterogeneous features for more reliable and accurate prediction is a challenging task. Second, biomedically confirmed negative samples cannot be obtained when using supervised learning methods for association prediction. Some proposed methods usually select negative samples randomly from the unlabeled microbe-disease pairs, in which noise exists and inaccurate results would be received. Finally, the successful applications of deep learning are encouraging us to develop more robust and precise algorithms to predict microbe-disease associations.

In this study, we develop an algorithm named SGJMDA based on similarity fusion using graph convolution networks and jumping knowledge networks for microbe-disease association predictions. We first collect four categories of similarities for microbes and diseases, respectively. A non-linear method is applied to fuse these similarities. Graph convolution networks and jumping knowledge networks are then used to extract features of microbes and diseases, respectively. Linear correlation coefficients between microbes and diseases are finally calculated as the prediction results. We comprehensively test and compare the performance of SGJMDA based on benchmark datasets and cross validations. We also conduct case studies to showcase the prediction ability of SGJMDA in real situations. With excellent performance received, we expect our method SGJMDA would be helpful for biomedical researchers in predicting microbe-disease associations.

Materials and methods

Datasets

In this study, we download the benchmark datasets from reference [19] for performance analysis. We give a brief explanation of the datasets below.

Human microbe–disease associations

In reference [19], authors retrieved human microbe-disease associations from three existing databases (i.e. HMDAD [6], Disbiome [7] and Peryton [8]). After deleting the redundant data and information merging, 4499 experimentally validated microbe-disease associations, which contain 1177 microbes and 134 diseases, were collected as gold standard data. We use $N_{m}$ and $N_{d}$ to denote the numbers of microbes and diseases, respectively. Meanwhile, A $\in R^{N_{m} \times N_{d}}$ is used to represent the adjacency matrix of the microbe-disease associations, where $N_{m}$ (=1177) is the number of rows (microbes) and $N_{d}$ (=134) denotes the number of columns (diseases). A(i,j) = 1 indicates an association between microbe $m_{i}$ and disease $d_{j}$ exists. Otherwise A(i,j) = 0.

Similarity calculation and fusion

In reference [19], authors utilized four methods to compute similarities for microbes and diseases. Firstly, they calculated the semantic similarity(DS)of diseases. Based on this, the functional similarity(FS)of microbes was then derived. Subsequently, they further computed the cosine similarity (COS_MS，COS_DS), Gaussian interaction profile similarity(GIP_MS, GIP_DS), and sigmoid kernel function similarity(SIG_MS，SIG_DS)for both microbes and diseases. We download the similarities from reference [19].

As data from different sources can provide complementary information, while containing potential noise [20, 21], we apply a non-linear strategy, motivated by reference [20], to fuse these similarities for both microbes and diseases. Firstly, we standardize COS_MS, GIP_MS, and SIG_MS of microbes. Taking COS_MS as an example, the normalization process is computed as follows:

C O S_M S (i, j) = {\begin{array}{cc} \frac{C O S_M S (i, j)}{2 * \sum_{k \neq i} C O S_M S (i, k)}, & i \neq j \\ \frac{1}{2}, & i = j \end{array}

(1)

Set all diagonal elements of the matrix to 1/2, and the total sum of elements in each row is equal to 1. We can obtain $S M_{C O S}$ by normalizing COS_MS. Meanwhile, we process GIP_MS and SIG_MS in the same way to obtain $S M_{G I P}$ and $S M_{S I G}$ ⁠, respectively. Next, we use the KNN algorithm to calculate the local affinity $S {K N N}_{C O S}$ for microbe $i$ and microbe $j$ as follows:

S K N N_{C O S} (i, j) = {\begin{array}{cc} \frac{C O S_M S (i, j)}{\sum_{k \in N_{i}} C O S_M S (i, k)}, & j \in N_{i} \\ 0, & o t h e r w i s e \end{array}

(2)

$N_{i}$ is the set of KNNs for a given node, where $N_{i}$ is equal to the total number of microbes divided by 10. Based on the principle that the closer the distance, the higher the similarity, we set the similarity between nodes far away from a given node to 0. Similarly, we can obtain $S {K N N}_{G I P}$ and $S {K N N}_{S I G}$ ⁠.

We simultaneously update the three similarity networks as follows:

S M_{C O S}^{(t)} = S K N N_{C O S} \times (\frac{\sum_{k \neq C O S_M S} S M_{k}^{(t - 1)}}{m - 1}) \times {(S K N N_{C O S})}^{T},

(3)

S M_{G I P}^{(t)} = S K N N_{G I P} \times (\frac{\sum_{k \neq G I P_M S} S M_{k}^{(t - 1)}}{m - 1}) \times {(S K N N_{G I P})}^{T},

(4)

S M_{S I G}^{(t)} = S K N N_{S I G} \times (\frac{\sum_{k \neq S I G_M S} S M_{k}^{(t - 1)}}{m - 1}) \times {(S K N N_{S I G})}^{T},

(5)

Among them, m represents the number of different similarity networks of microbes. Since we use three similarity networks, therefore m = 3. K is the selected microbe similarity network. T denotes the times of iterations. The similarity matrix SM is calculated as:

S M = \frac{S M_{C O S}^{(t)} + S M_{G I P}^{(t)} + S M_{S I G}^{(t)}}{3},

(6)

When the condition $\frac{‖ S M_{k}^{(t)} - S M_{k}^{(t - 1)} ‖}{‖ S M_{k}^{(t - 1)} ‖} < 10^{- 6}$ is met, the iteration will end. We obtain $S M^{'}$ through the following equation:

S M^{'} = \frac{S M + S M^{T}}{2},

(7)

Finally, we set a hyperparameter $α$ to merge FS with $S M^{'}$ as follows:

S M^{''} = α * S M^{'} + (1 - α) * F S,

(8)

Figure 1

The workflow of SGJMDA in microbe-disease association inference.

Open in new tab Download slide

The hyperparameter α is a weighting factor. $S M^{''}$ represents the final result of fused similarity feature matrices. Similarly, we can use the above methods to obtain the similarity feature matrix $S D^{''}$ of diseases.

Method architecture

The architecture of SGJMDA for association prediction is presented below. The computational framework mainly consists of two parts. The first part is feature extraction and the other one is association prediction. We illustrate the workflow of SGJMDA in Fig. 1.

After similarity fusion, we apply GCN [22] for feature extraction for both microbes and diseases. The progressive spread rule formula of GCN used in our study is similar to reference [23], which is defined as follows:

H^{(l + 1)} = f (H^{(l)}, G) = σ (D^{- \frac{1}{2}} G D^{\frac{1}{2}} H^{(l)} W^{(l)}),

(9)

Where $H^{(l)}$ denotes the embeddings of nodes at the l-th layer, $D = diag (\sum_{j} G_{i j})$ is the degree matrix of $G$ ⁠, $W^{(l)}$ is the trainable weight matrix to the l-th layer and $σ (\cdot)$ is a non-linear activation function.

Taking microbes as an example, we define the input graph G as:

G = [S M^{''}],

(10)

Then, our first layer GCN can be formulated as:

H^{(1)} = σ (D^{- \frac{1}{2}} G D^{\frac{1}{2}} {(S M^{''})}^{(0)} W^{(0)}),

(11)

where $W^{(0)} \in R^{N_{m} \times k}$ represents the first layer weight matrix, and k is used to change the dimension of the embedding. Due to the fact that our model only uses one layer of GCN in the homogeneous network, the feature output of microbes obtained through GCN is $S M^{'''} \in R^{N_{m} \times k}$ ⁠. Similarly, we can obtain the feature output of diseases $S D^{'''} \in R^{N_{d} \times k}$ ⁠.

Meanwhile, inspired by MINIMDA [24], we construct microbe-disease heterogeneous networks to integrate mixed high-order neighborhood information to further obtain representation of microbes and diseases. We use the microbe-disease adjacency matrix and fully connected $S M^{'''}$ and $S D^{'''}$ for feature extraction as follow:

H^{(l + 1)} = ‖_{i \in K^{σ}} ({({\tilde{D}}^{- \frac{1}{2}} \tilde{G} {\tilde{D}}^{\frac{1}{2}})}^{i} H^{(l)} W_{i}^{(l)}),

(12)

\tilde{G} = [A],

(13)

H^{(0)} = [\begin{matrix} S M^{'''} \\ S D^{'''} \end{matrix}],

(14)

where $K$ represents the power of ${\tilde{D}}^{- \frac{1}{2}} \tilde{G} {\tilde{D}}^{\frac{1}{2}}$ ⁠, and $K = {0, 1, 2, . . ., k}$ denotes the order of neighbors for information propagation between features. When $k = 2$ ⁠, $K$ will be ${0, 1, 2}$ ⁠, which means that the model will only receive information from the 0, 1st, and 2nd order neighborhoods.

In order to effectively aggregate the representation of intermediate layers to the final layer, we apply the mechanism of jumping knowledge (JK) network [25] to aggregates these different layers, which is calculated as follows:

\tilde{H} = \sum_{i = 1}^{l} ω_{i} H^{(i)},

(15)

where $ω_{i}$ represents the weighting coefficient for feature aggregation in the $i$ -th layer. The features $\tilde{S M}$ and $\tilde{S D}$ of microbes and diseases will then be obtained, respectively.

We use linear correlation coefficients to infer possible microbe-disease associations [26], which is computed according to the following equation:

C o r r (λ_{i}, λ_{j}) = \frac{(λ_{i} - μ_{i}) {(λ_{j} - μ_{j})}^{T}}{\sqrt{(λ_{i} - μ_{i}) {(λ_{i} - μ_{i})}^{T}} \sqrt{(λ_{j} - μ_{j}) {(λ_{j} - μ_{j})}^{T}}},

(16)

where

λ_{i} \in [\begin{matrix} S M^{'''} \\ \tilde{S M} \end{matrix}]

and

λ_{j} \in [\begin{matrix} S D^{'''} \\ \tilde{S D} \end{matrix}]

are vectors representing the features of the

i

-th microbe and

j

-th diseases, respectively, and

μ_{i}

and

μ_{j}

are the average values of

λ_{i}

and

λ_{j}

⁠.

Finally, we use the sigmoid function to reconstruct the microbe-disease matrix $\hat{A}$ ⁠, according to the following formula:

\hat{A} = φ (C o r r ([\begin{matrix} S M^{'''} \\ \tilde{S M} \end{matrix}], [\begin{matrix} S D^{'''} \\ \tilde{S D} \end{matrix}])),

(17)

Optimization of model parameters is based on binary cross entropy loss (see equation 18):

L = - \sum_{(i, j) \in γ^{+} \cup γ -} [A_{i j} \ln \hat{A_{i j}} + (1 - A_{i j}) \ln (1 - \hat{A_{i j}})],

(18)

where $γ^{+}$ and $γ^{-}$ represent the sets of positive and negative instances used in training. $(i, j)$ indicates microbe $i$ and disease $j$ ⁠. ${\hat{A}}_{i j}$ denotes the predicted score between microbe $i$ and disease $j$ ⁠.

Results

Experimental setting

We use 5-fold and 10-fold cross-validations (5-CV and 10-CV) to evaluate the performance of our model. For 5-CV, we randomly divide the 4499 microbe-disease associations into five roughly equal parts, with four parts for training and the remaining one for testing. The similar steps are taken in the 10-CV tests. We further calculate AUC, AUPR, Recall, precision (Pre), accuracy (ACC) and F1-score as indicators for performance comparison.

Hyperparameter analysis

Our method SGJMDA contains the following hyperparameters: (i) the proportion coefficient of feature fusion $α$ ⁠, (ii) the embedding dimension layer_size, (iii) the number of layers in multi-neighborhood GCN n, and (iv) the number of head nodes in multi-neighborhood GCN $k$ ⁠. We analyse the impact of different parameter setting on prediction performance based on 10-CV using the benchmark datasets.

Firstly, for the proportion coefficient of feature fusion $α$ ⁠, we select its value as 0.1, 0.2, 0.25, 0.28, 0.3 and 0.4 for comparison and find that when the proportion coefficient $α$ of feature fusion equals 0.28, the model receives the best performance (see Fig. 2).

$Performance analysis on the proportion coefficient $\alpha$.$

Figure 2

Performance analysis on the proportion coefficient $α$ ⁠.

Open in new tab Download slide

Secondly, we change the dimension of layer_size embedded in GCN and analyse its impact on prediction performance. As shown in Fig. 3, we set its value to be 32, 64, 128 and 256 and results show that when layer_size = 128, our model performs best.

Figure 3

Performance analysis on the dimension of layer_size.

Open in new tab Download slide

Thirdly, we select the number of layers for multi-neighborhood GCN in our model to be 1, 2, 4, and 6. Figure 4 shows that when the number is 2, our model performances best.

Figure 4

Performance analysis on the number of layers in multi-neighborhood.

Open in new tab Download slide

Finally, we test the impact of the number of head nodes $k$ in multi-neighborhood GCN. We set its value as 2, 3, 4, 5, and 6 respectively. As shown in Fig. 5, when $k$ equals 4, our model receives its best performance.

Figure 5

Performance analysis on the number of head nodes in multi-neighborhood.

Open in new tab Download slide

Based on the above experimental tests, we set $α$ to 0.28, layer_size to 128, n to 2 and $k$ to 4 in SGJMDA. In addition, we empirically set lr = 0.001, wd = 1e-5, and epoch = 2000 in our study.

Ablation experiments

There are five key modules in our method SGJMDA: feature fusion, GCN fused with homogeneous networks, multi-neighborhood GCN, jumping knowledge and decoder. We remove each component from SGJMDA separately to investigate their impacts on prediction ability. Here are the five models we test and compare:

SGJMDA-SF model: We remove the feature fusion component used in this paper and replace it with averaging, while keeping the rest unchanged.

SGJMDA-Hom model: We preserve feature fusion, multi-neighborhood GCN, jumping knowledge and decoder, and replace GCN with fully connected networks.

SGJMDA-Het model: We remove multi-neighborhood GCN and jumping knowledge, while keep other modules unchanged.

SGJMDA-JK model: We remove the jumping knowledge module, leaving all other modules unchanged.

SGJMDA-Dec model: The decoder is replaced by utilizing fused matric of microbial feature and the transposition of fused matric of disease features for matrix multiplication. Subsequently, applying the sigmoid function to generate the final score matrix.

As shown in Table 1, the performance of SGJMDA-SF is inferior to SGJMDA, indicating that our non-linear fusion strategy can more effectively integrate features of microbes and diseases and achieve better prediction performance; The performance of SGJMDA-Hom is worse when compared with SGJMDA, indicating that homogeneous GCN can better learn the features of microbes and diseases; SGJMDA-Het is only slightly better than SGJMDA on Recall, while other indicators are lower, demonstrating that multi-neighborhood GCN also contributes to the embedding of microbes and diseases; The performance of SGJMDA-JK is worse than that of SGJMDA, suggesting that the using jumping knowledge can improve prediction performance; SGJMDA-Dec performs much worse than SGJMDA, indicating that the linear correlation coefficients are suitable to predict microbe-disease associations in our study.

Table 1

Open in new tab

Results of ablation tests based on 10-CV

Method	AUC	AUPR	Accuracy	Precision	Recall	F1-score
SGJMDA SGJMDA-SF SGJMDA-Hom SGJMDA-Het SGJMDA-JK SGJMDA-Dec	0.9509 0.9295 0.9364 0.9495 0.9404 0.8433	0.9450 0.9297 0.9293 0.9410 0.9273 0.8650	0.8914 0.8608 0.8728 0.8908 0.8821 0.7912	0.8677 0.8548 0.8579 0.8628 0.8593 0.8107	0.9251 0.8695 0.8949 0.9299 0.9155 0.7633	0.8951 0.8620 0.8755 0.8945 0.8861 0.7853

Method	AUC	AUPR	Accuracy	Precision	Recall	F1-score
SGJMDA SGJMDA-SF SGJMDA-Hom SGJMDA-Het SGJMDA-JK SGJMDA-Dec	0.9509 0.9295 0.9364 0.9495 0.9404 0.8433	0.9450 0.9297 0.9293 0.9410 0.9273 0.8650	0.8914 0.8608 0.8728 0.8908 0.8821 0.7912	0.8677 0.8548 0.8579 0.8628 0.8593 0.8107	0.9251 0.8695 0.8949 0.9299 0.9155 0.7633	0.8951 0.8620 0.8755 0.8945 0.8861 0.7853

Table 1

Open in new tab

Results of ablation tests based on 10-CV

Method	AUC	AUPR	Accuracy	Precision	Recall	F1-score
SGJMDA SGJMDA-SF SGJMDA-Hom SGJMDA-Het SGJMDA-JK SGJMDA-Dec	0.9509 0.9295 0.9364 0.9495 0.9404 0.8433	0.9450 0.9297 0.9293 0.9410 0.9273 0.8650	0.8914 0.8608 0.8728 0.8908 0.8821 0.7912	0.8677 0.8548 0.8579 0.8628 0.8593 0.8107	0.9251 0.8695 0.8949 0.9299 0.9155 0.7633	0.8951 0.8620 0.8755 0.8945 0.8861 0.7853

Method	AUC	AUPR	Accuracy	Precision	Recall	F1-score
SGJMDA SGJMDA-SF SGJMDA-Hom SGJMDA-Het SGJMDA-JK SGJMDA-Dec	0.9509 0.9295 0.9364 0.9495 0.9404 0.8433	0.9450 0.9297 0.9293 0.9410 0.9273 0.8650	0.8914 0.8608 0.8728 0.8908 0.8821 0.7912	0.8677 0.8548 0.8579 0.8628 0.8593 0.8107	0.9251 0.8695 0.8949 0.9299 0.9155 0.7633	0.8951 0.8620 0.8755 0.8945 0.8861 0.7853

Comparison with other methods

We compare SGJMDA with six baseline methods (i.e. DSAE_RF [19], AMHMDA [27], MHCLMDA [28], MNNMDA [29], LRLSHMDA [30], and NTSHMDA [31]) using the same benchmark datasets and cross-validations.

We plot the ROC and PR curves of these methods based on 5-CV tests in Fig. 6 for comparison. It can be found that the average AUC value of SGJMDA is 0.9479, which is 2.41% (DSAE_RF), 6.68% (AMHMDA), 6.38% (MHCLMDA), 2.30% (MNNMDA), 12.39% (LRLSHMDA), and 15.12% (NTSHMDA) higher than the other 6 methods, respectively. We can also see the average AUPR value for SGJMDA is 0.9410, which is 2.32% (DSAE_RF), 6.96% (AMHMDA), 6.45% (MHCLMDA), 0.54% (MNNMDA), 14.94% (LRLSHMDA), and 16.83% (NTSHMDA) higher than other methods, respectively. The results of other performance indicators based on 5-CV tests are shown in Table 2. These results suggest that SGJMDA outperforms the other six methods, significantly.

Figure 6

ROC and PR curves of different methods in association prediction based on 5-CV.

Open in new tab Download slide

Table 2

Open in new tab

Performance comparison with the baseline methods based on 5-CV

Method	Accuracy	Precision	Recall	F1-score
SGJMDA	0.8836	0.8507	0.9318	0.8890
DSAE_RF	0.8484	0.8490	0.8482	0.8482
AMHMDA	0.7742	0.8379	0.6918	0.7467
MHCLMDA	0.7178	0.7788	0.8635	0.8187
MNNMDA	0.8775	0.8861	0.8666	0.8762
LRLSHMDA	0.7683	0.7324	0.8504	0.7860
NTSHMDA	0.7076	0.6559	0.8780	0.7504

Method	Accuracy	Precision	Recall	F1-score
SGJMDA	0.8836	0.8507	0.9318	0.8890
DSAE_RF	0.8484	0.8490	0.8482	0.8482
AMHMDA	0.7742	0.8379	0.6918	0.7467
MHCLMDA	0.7178	0.7788	0.8635	0.8187
MNNMDA	0.8775	0.8861	0.8666	0.8762
LRLSHMDA	0.7683	0.7324	0.8504	0.7860
NTSHMDA	0.7076	0.6559	0.8780	0.7504

Table 2

Open in new tab

Performance comparison with the baseline methods based on 5-CV

Method	Accuracy	Precision	Recall	F1-score
SGJMDA	0.8836	0.8507	0.9318	0.8890
DSAE_RF	0.8484	0.8490	0.8482	0.8482
AMHMDA	0.7742	0.8379	0.6918	0.7467
MHCLMDA	0.7178	0.7788	0.8635	0.8187
MNNMDA	0.8775	0.8861	0.8666	0.8762
LRLSHMDA	0.7683	0.7324	0.8504	0.7860
NTSHMDA	0.7076	0.6559	0.8780	0.7504

Method	Accuracy	Precision	Recall	F1-score
SGJMDA	0.8836	0.8507	0.9318	0.8890
DSAE_RF	0.8484	0.8490	0.8482	0.8482
AMHMDA	0.7742	0.8379	0.6918	0.7467
MHCLMDA	0.7178	0.7788	0.8635	0.8187
MNNMDA	0.8775	0.8861	0.8666	0.8762
LRLSHMDA	0.7683	0.7324	0.8504	0.7860
NTSHMDA	0.7076	0.6559	0.8780	0.7504

Similarly, we plot the 10-CV results in Fig. 7. The average AUC value of SGJMDA is 95.09%, which surpasses the other six methods by 2.55% (DSAE_RF), 6.67% (AMHMDA), 6.92% (MHCLMDA), 2.37% (MNNMDA), 12.14% (LRLSHMDA), and 15.65% (NTSHMDA), respectively. Meanwhile, the average AUPR value of SGJMDA is 0.9450, which is 2.51% (DSAE_RF), 7.04% (AMHMDA), 7.73% (MHCLMDA), 0.87% (MNNMDA), 14.89% (LRLSHMDA), and 17.51% (NTSHMDA) higher, respectively. Other performance indicators are provided in Table 3. Results from 10-CV tests again confirm the superior performance of our method SGJMDA.

Figure 7

ROC and PR curves of different methods in association prediction based on 10-CV.

Open in new tab Download slide

Table 3

Open in new tab

Performance comparison with the baseline methods based on 10-CV

Method	Accuracy	Precision	Recall	F1-score
SGJMDA	0.8914	0.8677	0.9251	0.8951
DSAE_RF	0.8481	0.8486	0.8480	0.8477
AMHMDA	0.7974	0.8264	0.7565	0.7870
MHCLMDA	0.7295	0.7723	0.8844	0.8237
MNNMDA	0.8803	0.8835	0.8789	0.8801
LRLSHMDA	0.7723	0.7304	0.8680	0.7923
NTSHMDA	0.7175	0.6706	0.8595	0.7526

Method	Accuracy	Precision	Recall	F1-score
SGJMDA	0.8914	0.8677	0.9251	0.8951
DSAE_RF	0.8481	0.8486	0.8480	0.8477
AMHMDA	0.7974	0.8264	0.7565	0.7870
MHCLMDA	0.7295	0.7723	0.8844	0.8237
MNNMDA	0.8803	0.8835	0.8789	0.8801
LRLSHMDA	0.7723	0.7304	0.8680	0.7923
NTSHMDA	0.7175	0.6706	0.8595	0.7526

Table 3

Open in new tab

Performance comparison with the baseline methods based on 10-CV

Method	Accuracy	Precision	Recall	F1-score
SGJMDA	0.8914	0.8677	0.9251	0.8951
DSAE_RF	0.8481	0.8486	0.8480	0.8477
AMHMDA	0.7974	0.8264	0.7565	0.7870
MHCLMDA	0.7295	0.7723	0.8844	0.8237
MNNMDA	0.8803	0.8835	0.8789	0.8801
LRLSHMDA	0.7723	0.7304	0.8680	0.7923
NTSHMDA	0.7175	0.6706	0.8595	0.7526

Method	Accuracy	Precision	Recall	F1-score
SGJMDA	0.8914	0.8677	0.9251	0.8951
DSAE_RF	0.8481	0.8486	0.8480	0.8477
AMHMDA	0.7974	0.8264	0.7565	0.7870
MHCLMDA	0.7295	0.7723	0.8844	0.8237
MNNMDA	0.8803	0.8835	0.8789	0.8801
LRLSHMDA	0.7723	0.7304	0.8680	0.7923
NTSHMDA	0.7175	0.6706	0.8595	0.7526

In addition, we adopt the same strategy of selecting negative samples as DSAE_RF [19] (k-means clustering selection), followed by 10-CV tests, and compare the prediction performance with DSAE_RF. The results are listed in Fig. 8. Results from Table 3 and Fig. 8 show SGJMDA receives better prediction performance than DSAE_RF [19] when using k-means to select negative samples.

Figure 8

Performance comparison between SGJMDA and DSAE_RF when using the same k-means clustering for negative sample selection.

Open in new tab Download slide

Case studies

We further carry out case studies on three important diseases (i.e. obesity, Crohn’s disease and colorectal cancer) to test SGJMDA’ s prediction ability in real situations. Specifically, we first exclude the association information for each specific disease from the benchmark datasets. Then, we train SGJMDA to infer disease-associated microbes. Finally, we select the top 20 predictions for validation. We use the latest versions of HMDAD, Disbiome, and Peryton to confirm the results.

As an epidemic worldwide, obesity increases the incidence of diabetes, heart disease, high blood pressure, and cancer [32]. Conventional knowledge suggests that behaviors that lead to overeating and inactivity are the main contributors to obesity. However, there are several microorganisms that have been linked to obesity in humans [33]. The findings suggest that obesity has a microbial factor, which may have potential therapeutic implications [34]. We use SGJMDA to infer obesity-associated microbes. We select the top 20 predictions and discover that 19 of them have been confirmed in the HMDAD, Disbiome and Peryton databases. We showcase the results in Table 4.

Table 4

Open in new tab

The top 20 predicted obesity-associated microbes

Ranking	Microbe	Evidence
1	Corynebacterium	PMID:30654751
2	Peptostreptococcaceae	PMID:30572569
3	Streptococcus gordonii	PMID:19587155
4	Ruminococcus	PMID:31399369
5	Ruminococcaceae	PMID:29280312
6	Eubacterium	PMID:23055155
7	Bacteroides eggerthii	PMID:29388394
8	Coprococcus	PMID:30572569
9	Prevotella	PMID:31024514
10	Faecalibacterium	PMID:23985870
11	Lactobacillus	PMID:23631345
12	Streptococcus	PMID:29576948
13	Bacteriodes uniformis	PMID:29338886
14	Collinsella aerofaciens	NA
15	Streptococcus oralis	PMID:29520825
16	Proteobacteria	PMID:30386323
17	Bifidobacterium	PMID:29280312
18	Escherichia	PMID:23055155
19	Blautia	PMID:31530820
20	Prevotella melaninogenica	PMID:19587155

Ranking	Microbe	Evidence
1	Corynebacterium	PMID:30654751
2	Peptostreptococcaceae	PMID:30572569
3	Streptococcus gordonii	PMID:19587155
4	Ruminococcus	PMID:31399369
5	Ruminococcaceae	PMID:29280312
6	Eubacterium	PMID:23055155
7	Bacteroides eggerthii	PMID:29388394
8	Coprococcus	PMID:30572569
9	Prevotella	PMID:31024514
10	Faecalibacterium	PMID:23985870
11	Lactobacillus	PMID:23631345
12	Streptococcus	PMID:29576948
13	Bacteriodes uniformis	PMID:29338886
14	Collinsella aerofaciens	NA
15	Streptococcus oralis	PMID:29520825
16	Proteobacteria	PMID:30386323
17	Bifidobacterium	PMID:29280312
18	Escherichia	PMID:23055155
19	Blautia	PMID:31530820
20	Prevotella melaninogenica	PMID:19587155

Note: NA indicates not available.

Table 4

Open in new tab

The top 20 predicted obesity-associated microbes

Ranking	Microbe	Evidence
1	Corynebacterium	PMID:30654751
2	Peptostreptococcaceae	PMID:30572569
3	Streptococcus gordonii	PMID:19587155
4	Ruminococcus	PMID:31399369
5	Ruminococcaceae	PMID:29280312
6	Eubacterium	PMID:23055155
7	Bacteroides eggerthii	PMID:29388394
8	Coprococcus	PMID:30572569
9	Prevotella	PMID:31024514
10	Faecalibacterium	PMID:23985870
11	Lactobacillus	PMID:23631345
12	Streptococcus	PMID:29576948
13	Bacteriodes uniformis	PMID:29338886
14	Collinsella aerofaciens	NA
15	Streptococcus oralis	PMID:29520825
16	Proteobacteria	PMID:30386323
17	Bifidobacterium	PMID:29280312
18	Escherichia	PMID:23055155
19	Blautia	PMID:31530820
20	Prevotella melaninogenica	PMID:19587155

Ranking	Microbe	Evidence
1	Corynebacterium	PMID:30654751
2	Peptostreptococcaceae	PMID:30572569
3	Streptococcus gordonii	PMID:19587155
4	Ruminococcus	PMID:31399369
5	Ruminococcaceae	PMID:29280312
6	Eubacterium	PMID:23055155
7	Bacteroides eggerthii	PMID:29388394
8	Coprococcus	PMID:30572569
9	Prevotella	PMID:31024514
10	Faecalibacterium	PMID:23985870
11	Lactobacillus	PMID:23631345
12	Streptococcus	PMID:29576948
13	Bacteriodes uniformis	PMID:29338886
14	Collinsella aerofaciens	NA
15	Streptococcus oralis	PMID:29520825
16	Proteobacteria	PMID:30386323
17	Bifidobacterium	PMID:29280312
18	Escherichia	PMID:23055155
19	Blautia	PMID:31530820
20	Prevotella melaninogenica	PMID:19587155

Note: NA indicates not available.

For Crohn’s disease [35], we first remove the association information from the benchmark datasets, and apply SGJMDA to predict its related microbes. We find that all the top 20 predictions are validated by the latest databases. We list the results in Table 5.

Table 5

Open in new tab

The top 20 predicted Crohn’s disease-associated microbes

Ranking	Microbe	Evidence
1	Akkermansia	PMID:28222161
2	Ruminococcaceae	PMID:25121355
3	Prevotella	PMID:24013298
4	Alistipes	PMID:20816835
5	Faecalibacterium	PMID:17119388
6	Corynebacterium	PMID:22068912
7	Lactobacillus	PMID:17897884
8	Faecalibacterium prausnitzii	PMID:19235886
9	Ruminococcus	PMID:22068912
10	Fusobacterium	PMID:30927743
11	Bifidobacterium	PMID:26789999
12	Coprococcus	PMID:30478724
13	Blautia	PMID:31899727
14	Collinsella	PMID:20816835
15	Megasphaera	PMID:27083382
16	Rothia	PMID:26288001
17	Collinsella aerofaciens	PMID:26804920
18	Pseudomonas	PMID:26574491
19	Anaerostipes	PMID:26313691
20	Streptococcus	PMID:30545401

Ranking	Microbe	Evidence
1	Akkermansia	PMID:28222161
2	Ruminococcaceae	PMID:25121355
3	Prevotella	PMID:24013298
4	Alistipes	PMID:20816835
5	Faecalibacterium	PMID:17119388
6	Corynebacterium	PMID:22068912
7	Lactobacillus	PMID:17897884
8	Faecalibacterium prausnitzii	PMID:19235886
9	Ruminococcus	PMID:22068912
10	Fusobacterium	PMID:30927743
11	Bifidobacterium	PMID:26789999
12	Coprococcus	PMID:30478724
13	Blautia	PMID:31899727
14	Collinsella	PMID:20816835
15	Megasphaera	PMID:27083382
16	Rothia	PMID:26288001
17	Collinsella aerofaciens	PMID:26804920
18	Pseudomonas	PMID:26574491
19	Anaerostipes	PMID:26313691
20	Streptococcus	PMID:30545401

Table 5

Open in new tab

The top 20 predicted Crohn’s disease-associated microbes

Ranking	Microbe	Evidence
1	Akkermansia	PMID:28222161
2	Ruminococcaceae	PMID:25121355
3	Prevotella	PMID:24013298
4	Alistipes	PMID:20816835
5	Faecalibacterium	PMID:17119388
6	Corynebacterium	PMID:22068912
7	Lactobacillus	PMID:17897884
8	Faecalibacterium prausnitzii	PMID:19235886
9	Ruminococcus	PMID:22068912
10	Fusobacterium	PMID:30927743
11	Bifidobacterium	PMID:26789999
12	Coprococcus	PMID:30478724
13	Blautia	PMID:31899727
14	Collinsella	PMID:20816835
15	Megasphaera	PMID:27083382
16	Rothia	PMID:26288001
17	Collinsella aerofaciens	PMID:26804920
18	Pseudomonas	PMID:26574491
19	Anaerostipes	PMID:26313691
20	Streptococcus	PMID:30545401

Ranking	Microbe	Evidence
1	Akkermansia	PMID:28222161
2	Ruminococcaceae	PMID:25121355
3	Prevotella	PMID:24013298
4	Alistipes	PMID:20816835
5	Faecalibacterium	PMID:17119388
6	Corynebacterium	PMID:22068912
7	Lactobacillus	PMID:17897884
8	Faecalibacterium prausnitzii	PMID:19235886
9	Ruminococcus	PMID:22068912
10	Fusobacterium	PMID:30927743
11	Bifidobacterium	PMID:26789999
12	Coprococcus	PMID:30478724
13	Blautia	PMID:31899727
14	Collinsella	PMID:20816835
15	Megasphaera	PMID:27083382
16	Rothia	PMID:26288001
17	Collinsella aerofaciens	PMID:26804920
18	Pseudomonas	PMID:26574491
19	Anaerostipes	PMID:26313691
20	Streptococcus	PMID:30545401

Similarly, for colorectal cancer [36], we apply SGJMDA to predict its potentially associated microbes. For the predicted 20 predictions, we find that 11 associations are confirmed (Table 6).

Table 6

Open in new tab

The top 20 predicted colorectal cancer-associated microbes

Ranking	Microbe	Evidence
1	Micrococcus	PMID:28600626
2	Erysipelotrichaceae incertae sedis	NA
3	Aggregatibacter	PMID:27742762
4	Arthrospira	PMID:25150117
5 6	Alloscardovia propionicimonas	NA NA
7	Peptostreptococcaceae incertae sedis	PMID:22114001
8	Pandoraea	PMID:35672730
9	Mycobacterium tuberculosis	PMID:36183156
10	Delftia tsuruhatensis	NA
11	Propionimicrobium lymphophilum	NA
12	Varibaculum cambriense	NA
13	Acidovorax	PMID:36717544
14	Pseudothermotoga	NA
15	Lactobacillus taiwanensis	PMID:29650970
16	Anaerococcus tetradius	NA
17	Wolbachia	NA
18	Rhodobacteraceae	PMID:37317301
19	Mycoplasma	PMID:37772998
20	Treponema	PMID:35664963

Ranking	Microbe	Evidence
1	Micrococcus	PMID:28600626
2	Erysipelotrichaceae incertae sedis	NA
3	Aggregatibacter	PMID:27742762
4	Arthrospira	PMID:25150117
5 6	Alloscardovia propionicimonas	NA NA
7	Peptostreptococcaceae incertae sedis	PMID:22114001
8	Pandoraea	PMID:35672730
9	Mycobacterium tuberculosis	PMID:36183156
10	Delftia tsuruhatensis	NA
11	Propionimicrobium lymphophilum	NA
12	Varibaculum cambriense	NA
13	Acidovorax	PMID:36717544
14	Pseudothermotoga	NA
15	Lactobacillus taiwanensis	PMID:29650970
16	Anaerococcus tetradius	NA
17	Wolbachia	NA
18	Rhodobacteraceae	PMID:37317301
19	Mycoplasma	PMID:37772998
20	Treponema	PMID:35664963

Note: NA indicates not available.

Table 6

Open in new tab

The top 20 predicted colorectal cancer-associated microbes

Ranking	Microbe	Evidence
1	Micrococcus	PMID:28600626
2	Erysipelotrichaceae incertae sedis	NA
3	Aggregatibacter	PMID:27742762
4	Arthrospira	PMID:25150117
5 6	Alloscardovia propionicimonas	NA NA
7	Peptostreptococcaceae incertae sedis	PMID:22114001
8	Pandoraea	PMID:35672730
9	Mycobacterium tuberculosis	PMID:36183156
10	Delftia tsuruhatensis	NA
11	Propionimicrobium lymphophilum	NA
12	Varibaculum cambriense	NA
13	Acidovorax	PMID:36717544
14	Pseudothermotoga	NA
15	Lactobacillus taiwanensis	PMID:29650970
16	Anaerococcus tetradius	NA
17	Wolbachia	NA
18	Rhodobacteraceae	PMID:37317301
19	Mycoplasma	PMID:37772998
20	Treponema	PMID:35664963

Ranking	Microbe	Evidence
1	Micrococcus	PMID:28600626
2	Erysipelotrichaceae incertae sedis	NA
3	Aggregatibacter	PMID:27742762
4	Arthrospira	PMID:25150117
5 6	Alloscardovia propionicimonas	NA NA
7	Peptostreptococcaceae incertae sedis	PMID:22114001
8	Pandoraea	PMID:35672730
9	Mycobacterium tuberculosis	PMID:36183156
10	Delftia tsuruhatensis	NA
11	Propionimicrobium lymphophilum	NA
12	Varibaculum cambriense	NA
13	Acidovorax	PMID:36717544
14	Pseudothermotoga	NA
15	Lactobacillus taiwanensis	PMID:29650970
16	Anaerococcus tetradius	NA
17	Wolbachia	NA
18	Rhodobacteraceae	PMID:37317301
19	Mycoplasma	PMID:37772998
20	Treponema	PMID:35664963

Note: NA indicates not available.

Moreover, we use SGJMDA to make comprehensive microbe-disease association predictions based on the whole information in the benchmark datasets. We select the top 20 predicted results for validation. We search PubMed (https://pubmed.ncbi.nlm.nih.gov/) for confirmation, and discover that 14 predictions have been verified (Table 7).

Table 7

Open in new tab

The top 20 predictions by SGJMDA

Ranking	Microbe	Disease	Evidence
1	Blautia	Chronic kidney disease	PMID:33101877
2	Micrococcus	Colorectal cancer	PMID:28600626
3	Alistipes	Chronic kidney disease	PMID:37809388
4	Klebsiella	Chronic kidney disease	PMID:37284390
5	Sutterella	Chronic kidney disease	PMID:36718700
6	Fusobacterium	Chronic kidney disease	PMID:33435396
7	Erysipelotrichaceae incertae sedis	Colorectal cancer	NA
8	Aggregatibacter	Colorectal cancer	NA
9	Odoribacter	Chronic kidney disease	PMID:37011727
10	Megasphaera	Chronic kidney disease	PMID:34357944
11	Oscillibacter	Chronic kidney disease	PMID:32560104
12	Arthrospira	Colorectal cancer	PMID:35946342
13	Staphylococcus	Chronic kidney disease	NA
14	Veillonella	Chronic kidney disease	PMID:38095826
15	Lachnospiraceae	Chronic kidney disease	PMID:37065213
16	Coriobacteriaceae	Chronic kidney disease	PMID:33681383
17	Alloscardovia	Colorectal cancer	NA
18	Propionicimonas	Colorectal cancer	NA
19	Proteobacteria	chronic kidney disease	PMID:29444477
20	Lachnospiraceae incertae sedis	Chronic kidney disease	NA

Ranking	Microbe	Disease	Evidence
1	Blautia	Chronic kidney disease	PMID:33101877
2	Micrococcus	Colorectal cancer	PMID:28600626
3	Alistipes	Chronic kidney disease	PMID:37809388
4	Klebsiella	Chronic kidney disease	PMID:37284390
5	Sutterella	Chronic kidney disease	PMID:36718700
6	Fusobacterium	Chronic kidney disease	PMID:33435396
7	Erysipelotrichaceae incertae sedis	Colorectal cancer	NA
8	Aggregatibacter	Colorectal cancer	NA
9	Odoribacter	Chronic kidney disease	PMID:37011727
10	Megasphaera	Chronic kidney disease	PMID:34357944
11	Oscillibacter	Chronic kidney disease	PMID:32560104
12	Arthrospira	Colorectal cancer	PMID:35946342
13	Staphylococcus	Chronic kidney disease	NA
14	Veillonella	Chronic kidney disease	PMID:38095826
15	Lachnospiraceae	Chronic kidney disease	PMID:37065213
16	Coriobacteriaceae	Chronic kidney disease	PMID:33681383
17	Alloscardovia	Colorectal cancer	NA
18	Propionicimonas	Colorectal cancer	NA
19	Proteobacteria	chronic kidney disease	PMID:29444477
20	Lachnospiraceae incertae sedis	Chronic kidney disease	NA

Note: NA indicates not available.

Table 7

Open in new tab

The top 20 predictions by SGJMDA

Ranking	Microbe	Disease	Evidence
1	Blautia	Chronic kidney disease	PMID:33101877
2	Micrococcus	Colorectal cancer	PMID:28600626
3	Alistipes	Chronic kidney disease	PMID:37809388
4	Klebsiella	Chronic kidney disease	PMID:37284390
5	Sutterella	Chronic kidney disease	PMID:36718700
6	Fusobacterium	Chronic kidney disease	PMID:33435396
7	Erysipelotrichaceae incertae sedis	Colorectal cancer	NA
8	Aggregatibacter	Colorectal cancer	NA
9	Odoribacter	Chronic kidney disease	PMID:37011727
10	Megasphaera	Chronic kidney disease	PMID:34357944
11	Oscillibacter	Chronic kidney disease	PMID:32560104
12	Arthrospira	Colorectal cancer	PMID:35946342
13	Staphylococcus	Chronic kidney disease	NA
14	Veillonella	Chronic kidney disease	PMID:38095826
15	Lachnospiraceae	Chronic kidney disease	PMID:37065213
16	Coriobacteriaceae	Chronic kidney disease	PMID:33681383
17	Alloscardovia	Colorectal cancer	NA
18	Propionicimonas	Colorectal cancer	NA
19	Proteobacteria	chronic kidney disease	PMID:29444477
20	Lachnospiraceae incertae sedis	Chronic kidney disease	NA

Ranking	Microbe	Disease	Evidence
1	Blautia	Chronic kidney disease	PMID:33101877
2	Micrococcus	Colorectal cancer	PMID:28600626
3	Alistipes	Chronic kidney disease	PMID:37809388
4	Klebsiella	Chronic kidney disease	PMID:37284390
5	Sutterella	Chronic kidney disease	PMID:36718700
6	Fusobacterium	Chronic kidney disease	PMID:33435396
7	Erysipelotrichaceae incertae sedis	Colorectal cancer	NA
8	Aggregatibacter	Colorectal cancer	NA
9	Odoribacter	Chronic kidney disease	PMID:37011727
10	Megasphaera	Chronic kidney disease	PMID:34357944
11	Oscillibacter	Chronic kidney disease	PMID:32560104
12	Arthrospira	Colorectal cancer	PMID:35946342
13	Staphylococcus	Chronic kidney disease	NA
14	Veillonella	Chronic kidney disease	PMID:38095826
15	Lachnospiraceae	Chronic kidney disease	PMID:37065213
16	Coriobacteriaceae	Chronic kidney disease	PMID:33681383
17	Alloscardovia	Colorectal cancer	NA
18	Propionicimonas	Colorectal cancer	NA
19	Proteobacteria	chronic kidney disease	PMID:29444477
20	Lachnospiraceae incertae sedis	Chronic kidney disease	NA

Note: NA indicates not available.

Discussion and conclusions

Studies have demonstrated that human microbiome has a profound impact on health. Their differences in abundance and diversity can help explain the susceptibility or resistance to certain diseases. Identifying disease-associated microbes would therefore boost our understanding of the pathogenesis of diseases and promote treatment to diseases. In this study, we develop a deep learning-based computational framework SGJMDA to infer new microbe–disease associations. Comprehensive experiments, including ablation tests, comparison with other methods and case studies, are carried out. Results show the superiority of our method in association prediction.

The factors that lead to the good performance of our method can be summarized as follows. First, we use a nonlinear strategy for similarity fusion. Comparative results show the similarity fused by our method can generate more accurate predictions. Second, we apply both GCN and jumping knowledge network to exact features from microbes and diseases, which can obtain high-order neighborhood representation information for them. Finally, we calculate linear correlation coefficients not matrix multiplication as prediction scores. Ablation tests demonstrate prediction performance can be improved by calculating linear correlation coefficients as prediction results.

Although our method SGJMDA performs well in terms of prediction performance, there are still some limitations. For example, the number of experimentally verified microbe-disease associations is limited, which would affect the prediction performance. We expect to solve this issue by integrating more reliable association information discovered in the future in our model. Meanwhile, optimizing the hyperparameters in SGJMDA is also a challenging task, which is a common problem in deep learning methods. These issues are further directions we need to study.

Key Points

We apply a non-linear strategy for similarity fusion for both microbes and diseases.
SGJMDA can effectively extract embeddings for both microbes and diseases using GCN and jumping knowledge networks.
SGJMDA outperforms existing methods and improves prediction accuracy in association prediction.

Conflict of interest: None declared.

Funding

Jiangxi Provincial Natural Science Foundation, China (20242BAB25083).

Data availability

The benchmark datasets and source codes used in our study are freely accessible at https://github.com/IamChenHailin/SGJMDA.

Author Biographies

Hailin Chen, PhD, is an associate professor at School of Information and Software Engineering, East China Jiaotong University. His research interest includes data mining and bioinformatics.

Kuan Chen is a graduate student at School of Information and Software Engineering, East China Jiaotong University. His research interest is deep learning and bioinformatics.

References

Cénit

Matzaraki

Tigchelaar

. et al.

Rapidly expanding knowledge on the role of the gut microbiome in health and disease

Biochimica et Biophysica Acta (BBA)-molecular basis of disease

2014

;

1842

1981

–

Google Scholar

Crossref

WorldCat

Ley

Peterson

Gordon

Ecological and evolutionary forces shaping microbial diversity in the human intestine

Cell

2006

;

124

837

–

10.1016/j.cell.2006.02.017

O'Hara

Shanahan

The gut flora as a forgotten organ

EMBO Rep

2006

;

688

–

10.1038/sj.embor.7400731

Sokol

Pigneur

Watterlot

. et al.

Faecalibacterium prausnitzii is an anti-inflammatory commensal bacterium identified by gut microbiota analysis of Crohn disease patients

Proc Natl Acad Sci

2008

;

105

16731

–

10.1073/pnas.0804812105

Althani

Marei

Hamdi

. et al.

Human microbiome and its association with health and diseases

J Cell Physiol

2016

;

231

1688

–

Zhang

Zeng

. et al.

An analysis of human microbe-disease associations

Brief Bioinform

2017

;

–

Janssens

Nielandt

Bronselaer

. et al.

Disbiome database: linking the microbiome to disease

BMC Microbiol

2018

;

10.1186/s12866-018-1197-5

Skoufos

Kardaras

Alexiou

. et al.

Peryton: a manual collection of experimentally supported microbe-disease associations

Nucleic Acids Res

2021

;

D1328

–

Wen

Yan

Duan

. et al.

A survey on predicting microbe-disease associations: biological data and computational methods

Brief Bioinform

2021

;

bbaa157

10.

Zou

Zhang

A novel approach for predicting microbe-disease associations by bi-random walk on the heterogeneous network

PloS One

2017

;

e0184394

10.1371/journal.pone.0184394

11.

Chen

Huang

Y-A

You

Z-H

. et al.

A novel approach based on KATZ measure to predict associations of human microbiota with non-infectious diseases

Bioinformatics

2017

;

733

–

10.1093/bioinformatics/btw715

12.

Yin

M-M

Liu

J-X

Gao

Y-L

. et al.

NCPLP: a novel approach for predicting microbe-associated diseases with network consistency projection and label propagation

IEEE Transact Cybernet

2020

;

5079

–

10.1109/TCYB.2020.3026652

Google Scholar

Crossref

WorldCat

13.

Bennett

Lanning

. The netflix prize. In:

Proceedings of KDD Cup and Workshop

, p.

New York: Association for Computing Machinery (ACM)

2007

Google Scholar

Google Preview

OpenURL Placeholder Text

WorldCat

14.

Gao

Zhang

mHMDA: human microbe-disease association prediction by matrix completion and multi-source information

IEEE Access

2019

;

106687

–

10.1109/ACCESS.2019.2930453

Google Scholar

Crossref

WorldCat

15.

Yang

Kuang

Chen

. et al.

Multi-similarities bilinear matrix factorization-based method for predicting human microbe-disease associations

Front Genet

2021

;

754425

10.3389/fgene.2021.754425

16.

Zhang

. et al.

Novel collaborative weighted non-negative matrix factorization improves prediction of disease-associated human microbes

Front Microbiol

2022

;

834982

10.3389/fmicb.2022.834982

17.

Peng

Yin

Zhou

. et al.

Human microbe-disease association prediction based on adaptive boosting

Front Microbiol

2018

;

2440

10.3389/fmicb.2018.02440

18.

Wang

Zhang

. et al.

Identifying microbe-disease association based on a novel back-propagation neural network model

IEEE/ACM Trans Comput Biol Bioinform

2020

;

2502

–

10.1109/TCBB.2020.2986459

Google Scholar

Crossref

WorldCat

19.

Wang

Xuan

. et al.

Predicting potential microbe-disease associations based on multi-source features and deep learning

Brief Bioinform

2023

;

:1–14.

10.1093/bib/bbad255

Google Scholar

OpenURL Placeholder Text

WorldCat

Crossref

20.

Wang

Mezlini

Demir

. et al.

Similarity network fusion for aggregating data types on a genomic scale

Nat Methods

2014

;

333

–

21.

Chen

Zhang

In silico drug repositioning based on the integration of chemical, genomic and pharmacological spaces

BMC Bioinformatics

2021

;

10.1186/s12859-021-03988-x

22.

Kipf

Max

Semi-supervised classification with graph convolutional networks

. arXiv preprint arXiv:1609.02907.

2016

23.

Huang

Zhao

. et al.

Predicting drug-disease associations through layer attention graph convolutional network

Brief Bioinform

2021

;

:1–11.

10.1093/bib/bbaa243

Google Scholar

OpenURL Placeholder Text

WorldCat

Crossref

24.

Lou

Cheng

. et al.

Predicting miRNA-disease associations via learning multimodal networks and fusing mixed neighborhood information

Brief Bioinform

2022

;

:1–15.

Google Scholar

OpenURL Placeholder Text

WorldCat

25.

Tian

. et al. Representation learning on graphs with jumping knowledge networks. In:

International Conference on Machine Learning

, pp.

5453

–

Stockholm, Sweden: PMLR

2018

Google Scholar

Google Preview

OpenURL Placeholder Text

WorldCat

26.

Peng

Liu

Dai

. et al.

Predicting cancer drug response using parallel heterogeneous graph convolutional networks with neighborhood interactions

Bioinformatics

2022

;

4546

–

10.1093/bioinformatics/btac574

27.

Ning

Zhao

Gao

. et al.

AMHMDA: attention aware multi-view similarity networks and hypergraph learning for miRNA-disease associations identification

Brief Bioinform

2023

;

:1–11.

10.1093/bib/bbad094

Google Scholar

OpenURL Placeholder Text

WorldCat

Crossref

28.

Peng

Dai

. et al.

MHCLMDA: multihypergraph contrastive learning for miRNA-disease association prediction

Brief Bioinform

2023

;

:1–12.

10.1093/bib/bbad524

Google Scholar

OpenURL Placeholder Text

WorldCat

Crossref

29.

Liu

Bing

Zhang

. et al.

MNNMDA: predicting human microbe-disease association via a method to minimize matrix nuclear norm

Comput Struct Biotechnol J

2023

;

1414

–

10.1016/j.csbj.2022.12.053

30.

Wang

Huang

Chen

. et al.

LRLSHMDA: Laplacian regularized least squares for human microbe-disease association prediction

Sci Rep

2017

;

7601

10.1038/s41598-017-08127-2

31.

Luo

Long

NTSHMDA: prediction of human microbe-disease association based on random walk by integrating network topological similarity

IEEE/ACM Trans Comput Biol Bioinform

2020

;

1341

–

10.1109/TCBB.2018.2883041

32.

Bray

Medical consequences of obesity

J Clin Endocrinol Metab

2004

;

2583

–

33.

Hegde

Dhurandhar

Microbes and obesity--interrelationship between infection, adipose tissue and the immune system

Clin Microbiol Infect

2013

;

314

–

34.

Ley

Turnbaugh

Klein

. et al.

Human gut microbes associated with obesity

Nature

2006

;

444

1022

–

10.1038/4441022a

35.

Torres

Mehandru

Colombel

. et al.

Crohn's disease

Lancet

2017

;

389

1741

–

10.1016/S0140-6736(16)31711-1

36.

Center

Jemal

Smith

. et al.

Worldwide variations in colorectal cancer

CA Cancer J Clin

2009

;

366

–

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Download all slides

Month:	Total Views:
November 2024	356
December 2024	118
January 2025	91
February 2025	101
March 2025	133
April 2025	37

Article Contents

Predicting disease-associated microbes based on similarity fusion and deep learning

Abstract

Introduction

Materials and methods

Datasets

Human microbe–disease associations

Similarity calculation and fusion

Method architecture

Results

Experimental setting

Hyperparameter analysis

Ablation experiments

Comparison with other methods

Case studies

Discussion and conclusions

Funding

Data availability

Author Biographies

References

Citations

Views

Altmetric

Email alerts

Citing articles via

Latest

Most Read

Most Cited

Article Contents

Predicting disease-associated microbes based on similarity fusion and deep learning

Abstract

Introduction

Materials and methods

Datasets

Human microbe–disease associations

Similarity calculation and fusion

Method architecture

Results

Experimental setting

Hyperparameter analysis

Ablation experiments

Comparison with other methods

Case studies

Discussion and conclusions

Funding

Data availability

Author Biographies

References

Citations

Views

Altmetric

Email alerts

Citing articles via

Latest

Most Read

Most Cited

This Feature Is Available To Subscribers Only