MHCLMDA: multihypergraph contrastive learning for miRNA–disease association prediction

Abstract

The correct prediction of disease-associated miRNAs plays an essential role in disease prevention and treatment. Current computational methods to predict disease-associated miRNAs construct different miRNA views and disease views based on various miRNA properties and disease properties and then integrate the multiviews to predict the relationship between miRNAs and diseases. However, most existing methods ignore the information interaction among the views and the consistency of miRNA features (disease features) across multiple views. This study proposes a computational method based on multiple hypergraph contrastive learning (MHCLMDA) to predict miRNA–disease associations. MHCLMDA first constructs multiple miRNA hypergraphs and disease hypergraphs based on various miRNA similarities and disease similarities and performs hypergraph convolution on each hypergraph to capture higher order interactions between nodes, followed by hypergraph contrastive learning to learn the consistent miRNA feature representation and disease feature representation under different views. Then, a variational auto-encoder is employed to extract the miRNA and disease features in known miRNA–disease association relationships. Finally, MHCLMDA fuses the miRNA and disease features from different views to predict miRNA–disease associations. The parameters of the model are optimized in an end-to-end way. We applied MHCLMDA to the prediction of human miRNA–disease association. The experimental results show that our method performs better than several other state-of-the-art methods in terms of the area under the receiver operating characteristic curve and the area under the precision-recall curve.

hypergraph contrastive learning, hypergraph convolution, multiview learning, MiRNA–disease association prediction, multiomics data fusion

INTRODUCTION

MiRNA (microRNA) is a class of noncoding single-stranded RNA molecules with a length of 22 nucleotides [1]. MiRNAs are involved in many critical biological processes, including controlling cell proliferation, differentiation, metabolism and apoptosis by repressing the expression of mRNAs in the organism [2]. Therefore, abnormal expression or dysfunction of miRNAs is closely related to many diseases, including cancer, Parkinson’s and neurodegenerative diseases. For example, overexpression of hsa-mir-142 is associated with lymphoma [3]. miR-101-2, miR-125b-2 and miR-451a function as potential tumor suppressors in Gastric cancer [4]. miR-130a expression is closely associated with the development and progression of nonsmall cell lung cancer [5]. Since miRNAs will be potential biomarkers for various diseases, further study of the relationship between miRNAs and diseases is essential for disease prevention, diagnosis and treatment. Many biological methods are currently available to detect disease-associated miRNAs, such as quantitative real-time Polymerase Chain Reaction (PCR), high-throughput sequencing of RNA isolated by crosslinking immunoprecipitation, Photoactivatable Ribonucleoside-Enhanced crosslinking and immunoprecipitation and individual-nucleotide resolution UV crosslinking and immunoprecipitation [6–9]. Although these biological methods can accurately determine the relationship between miRNAs and diseases, they take time and effort. Therefore, there is a need to develop effective computational models to predict disease-associated miRNAs.

Most current computational methods to predict disease-associated miRNAs rely on the assumption that functionally related miRNAs tend to be associated with similar diseases and vice versa. The process involves constructing miRNA and disease views based on various properties and then inferring new miRNA–disease associations using known relationships [10, 11]. Based on the inference methods employed, these methods can be classified as network path-based, network diffusion-based, network representation learning and machine learning-based methods.

Network path-based methods construct a heterogeneous network that includes both miRNAs and diseases. The network’s edges represent known association relationships between miRNAs and diseases and edges connecting similar miRNAs and diseases. Pathway-based methods assess the closeness of miRNA and disease associations by exploring the pathways of miRNAs and diseases in this network. For example, You et al. [12] discover paths between miRNAs and diseases by performing a depth-first search in a heterogeneous network and filtering out long paths and finally predicted potential miRNA–disease associations by merging all paths. Network diffusion-based approaches use network diffusion algorithms to propagate information through the miRNA–disease network [13, 14]. For example, Luo et al. [15] construct a heterogeneous network containing a disease similarity network, a miRNA similarity network and a known miRNA–disease association network. An unbalanced double random wandering model is proposed based on the differences between the disease similarity network and the miRNA similarity network. The miRNAs are linked to diseases by walking different steps in these two similarity networks to discover potential miRNA–disease associations. However, many false positive and false negative data in heterogeneous networks can reduce the performance of these path-based and network diffusion-based approaches. Network embedding-based methods aim to learn low-dimensional representations or embedding of miRNAs and diseases in a latent space. These methods can predict novel associations by capturing the underlying relationships and similarities between miRNAs and diseases. Network embedding-based methods often use techniques such as least squares [16, 17], matrix decomposition [18–21], matrix completion [22–24], variational auto-encoders [18, 19] and graph convolutional neural networks (GCNs) [25]. Chen et al. [16] use a least-squares approach to predict miRNA–disease associations based on known miRNA–disease association information and miRNA similarity and disease similarity networks. Xiao et al. [18] propose a nonnegative matrix decomposition approach (GRNMF) to identify potential miRNA–disease associations using miRNA, disease similarity relationship as a regularization constraint term. Ding et al. [20] combine a matrix decomposition with a variational auto-encoder to predict new miRNA–disease associations. Chen et al. [24] use known miRNA–disease associations, miRNA similarity and disease similarity to learn feature representations of miRNAs and diseases in the latent space based on matrix completion models to predict new miRNA and disease associations. GCNs have received attentions from researchers because of their ability to consider both the network structure and the properties of the nodes themselves when learning node representations. Li et al. [25] design the NIMCGCN model to run GCNs on miRNA similarity networks and disease similarity networks to learn the feature representations of miRNAs and diseases and then input the learned features into a matrix completion model to predict new miRNA–disease associations. The model GCAEMDA proposed by Li et al. [26] also executes GCNs on the miRNA and disease subnetwork to learn miRNA and disease features separately. Unlike NIMCGCN, it leverages known miRNA–disease associations and miRNA or disease similarities as the initial node features and performs model parameter learning by reconstructing the miRNA and disease subnets. Wang et al. [27] propose a model named MKGAT method to perform attention graph convolution to learn feature representations of miRNAs and diseases, and the model uses the miRNA similarity network and the disease similarity network as regular terms to constrain the model. The deep learning-based approaches concatenate the miRNA and disease features into the deep learning model to predict miRNA–disease association labels. For example, Peng et al. [28] and Li et al. [29] calculate miRNA features and disease features based on the association information of miRNAs with genes and disease with genes and then combined the two types of features into deep learning models such as convolutional neural network (CNN) to predict miRNA–disease associations.

The above computational methods for predicting disease-associated miRNAs often utilize multiple miRNA and disease views to understand the underlying relationships comprehensively. These views can include various properties of miRNAs and diseases, such as miRNA sequence information, functional annotations, semantic associations, etc. Since different views of miRNAs and diseases can complement each other, many methods integrate multiple miRNA and disease views to improve the accuracy of predictions. Most current methods compute the weighted average of the views. These weighted views are then used as feature inputs or constraints for the miRNA and disease prediction model [20, 22, 24, 28]. Some methods take into account the differences between individual views. They learn miRNA and disease features based on each view and then fuse these multiple features to perform the prediction task. For example, MMGCN [30] first uses GCNs to learn their multiview features from various miRNA and disease networks, respectively. Then, the attention mechanism is employed to adaptively extract the importance of different features, and CNNs are used to realize the final predictions. MiRNA target gene and disease-related gene information are often used to construct miRNA and disease views. Because miRNAs play an essential role in cellular functions and metabolic processes by regulating the expression of target genes, abnormal functions of genes may lead to the development of diseases. Thus, genes play an essential role in bridging the gap between miRNAs and diseases. There are also methods to construct heterogeneous networks containing miRNAs, genes and diseases. Random walking [15], GCN [31], typical correlation analysis [32], etc., are executed on the network to learn the miRNA and disease features to predict the miRNA–disease associations.

Existing methods integrate multiple views of miRNAs and diseases to predict miRNA–disease associations and achieve good results. However, most of these methods ignore the information interaction between views. The features learned from different views should maintain consistency. In addition, most existing methods construct miRNA views and disease views based on the properties of miRNAs and diseases, ignoring the high-order relationships between miRNAs and diseases [33]. Multiple similar miRNAs will affect the same disease. On the other hand, various diseases with similar symptoms are related to the same miRNA [34, 35]. Therefore, in addition to considering the lower order relationship between a pair of miRNAs and a pair of diseases, we leverage their higher order relationships. In this work, we propose a multihypergraph contrastive learning model to predict miRNA–disease associations (MHCLMDA). At first, the method constructs multiple miRNA hypergraphs and disease hypergraphs using various miRNA and disease properties, such as miRNA sequences, disease semantics, miRNA target genes, disease-associated genes and known disease–miRNA associations. The nodes of the hypergraphs are miRNA or disease, and the edges of the hypergraphs contain multiple similar miRNA or disease nodes. Then, we perform hypergraph convolution on each hypergraph to learn the node feature representations. Contrastive learning is then employed to facilitate information interaction between the hypergraphs, ensuring consistency in the learned features across different views. MHCLMDA recognizes the importance of known associations between miRNAs and diseases. It utilizes a variational auto-encoder to capture nonlinear features of miRNAs and diseases based on known associations, enabling the model to effectively incorporate prior knowledge into the prediction process. Finally, MHCLMDA fuses the learned features for predicting the association between miRNAs and diseases. Compared with previous work, our main contributions are as follows.

(i) We constructed multiple miRNA and disease hypergraphs containing miRNA sequences, disease semantics, miRNA target genes, disease-associated genes and known disease–miRNA associations. Then, we performed hypergraph convolution on each hypergraph. This approach allows for capturing higher order relationships between miRNAs and diseases and learning a comprehensive representation of miRNAs and diseases from different perspectives.
(ii) By leveraging multihypergraph contrastive learning, MHCLMDA considers the interaction between views, maintaining feature consistency.
(iii) We applied MHCLMDA to predict human miRNA–disease association under different cross-validation settings. The results showed that our method outperformed other advanced techniques. The results of the ablation experiments also prove that MHCLMDA provides a novel and promising approach for predicting miRNA–disease associations.

MATERIALS AND METHODS

Datasets

The known associations of human miRNAs with diseases used in this paper were obtained from the HMDD v3.2 database [36]. MiRNA target gene information was obtained from the mirTarbase database [37]. Information on disease-associated genes was obtained from the DisGeNET database [38]. MiRNA sequence information was from the miRBase database [39]. We focused only on miRNAs with target genes and diseases with associated genes. Therefore, 757 miRNAs, 435 diseases and 11 216 genes were involved in this experiment, 7694 experimentally validated miRNA–disease association data, 48 775 miRNA and gene associations and 154 131 disease and gene associations. All experimental data were obtained from the literature [31].

Overview

MHCLMDA takes three steps to predict miRNA–disease associations. First, it constructs multiple miRNA and disease hypergraphs based on various miRNA and disease similarities. Second, MHCLMDA applies hypergraph convolution on each hypergraph to capture higher order interactions between nodes and learns consistent miRNA and disease feature representations across the hypergraphs through hypergraph contrastive learning. Meanwhile, MHCLMDA leverages a variational auto-encoder to extract key features from the known miRNA–disease associations. Finally, MHCLMDA fuses various miRNA and disease features learned from the hypergraphs to predict novel miRNA–disease associations. Figure 1 illustrates the framework of MHCLMDA.

Figure 1

Architecture of MHCLMDA.

Open in new tab Download slide

Preprocessing data

Association cosine similarity for miRNA/disease

Let |${A}_{md}\in{\left\{0,1\right\}}^{n_m\times{n}_d}$| be the miRNA–disease association matrix. If there is an association between miRNA i and disease j, then A_ij = 1, otherwise 0. n_m and n_d denote the number of miRNAs and diseases, respectively. Based on whether there are similar disease associations between miRNAs, we calculate the association cosine similarity between miRNAs, |${S}_{am}\in{R}^{n_m\times{n}_m}$|

$$ \begin{equation} {S}_{am}=\frac{\left({A}_{md}\left({i}_m,:\right)\right){\left({A}_{md}\left({j}_m,:\right)\right)}^T}{\parallel{A}_{md}\left({i}_m,:\right)\parallel \parallel{A}_{md}\left({j}_m,:\right)\parallel }, \end{equation} $$

(1)

where |${i}_m$| and |${j}_m$|∊ |${n}_m$|⁠. |${A}_{md}\left({i}_m,:\right)$| denotes the number i_m of row vector of the matrix |${A}_{md}$|⁠.|$\parallel{A}_{md}\left({i}_m,:\right)\parallel$| denotes modulus of the row vector.

Similarly, the association cosine similarity for disease, |${S}_{ad}\in{R}^{n_d\times{n}_d}$| is calculated as follows:

$$ \begin{equation} {\displaystyle \begin{array}{c}{S}_{ad}=\frac{\big({A}_{md}\left(:,{i}_d\right){\left({A}_{md}\left(:,{j}_d\right)\right)}^T}{\parallel{A}_{md}\left(:,{i}_d\right)\parallel \parallel{A}_{md}\left(:,{j}_d\right)\parallel },\end{array}} \end{equation} $$

(2)

where |${i}_d$| and |${j}_d$| ∊ |${n}_d$|⁠.

MiRNA sequence second-order similarity

We obtained the sequence information of miRNAs from the miRBase database [39] and used the Needleman–Wunsch algorithm [40] to calculate the sequence similarity between miRNAs and miRNAs to construct a miRNA sequence similarity matrix |$SM\in{R}^{n_m\times{n}_m}$|⁠. SMS(m_i, m_j) represents sequence similarity between miRNA m_i and miRNA m_j. Sequence similarity between miRNAs was obtained from the supplementary files of [31]. Considering that miRNAs in the same family have similar sequences and functions, forming miRNA modules. In order to capture the modularity between miRNAs more effectively, we calculate the miRNA sequence second-order similarity based on the miRNA sequence similarity SMS(m_i, m_j), which is calculated as follows:

$$ \begin{equation} {\displaystyle \begin{array}{c} SM S\left({m}_i,{m}_j\right)=\frac{\ SM\left({m}_i,:\right) SM{\left({m}_j,:\right)}^{\top }}{\parallel SM\left({m}_i,:\right)\parallel \parallel SM\left({m}_j,:\right)\parallel }.\end{array}} \end{equation} $$

(3)

Disease semantic second-order similarity

The hierarchical relationships between diseases are described in the Mesh database [41] using directed acyclic graphs (DAGs). The degree of semantic similarity between the two diseases is calculated based on the shared nodes in their DAG structures. The more parts shared by the DAG between diseases indicate the greater semantic similarity between the two diseases.

For a disease d, the semantic contribution D_d(n) of disease d to n in the DAG can be calculated as follows:

$$ \begin{equation} {\displaystyle \begin{array}{c}{D}_d(n)=\left\{\begin{array}{c}1\ \mathrm{if}\ n=d\\{}\max \left\{\Delta \ast{D}_d\left({n}^{\prime}\right)\left|{n}^{\prime}\in \mathrm{children}\ \mathrm{of}\ n\right.\right\}\end{array},\right.\end{array}} \end{equation} $$

(4)

where ∆ is the semantic contribution factor and we set the ∆ value to 0.5 by referring to previous work [31]. Let |$SD\in{R}^{n_d\times{n}_d}$| be the disease semantic similarity matrix. The semantic similarity SD(d_i, d_j) between disease d_i and disease d_j can be calculated as follows:

$$ \begin{equation} {\displaystyle \begin{array}{c} SD\left({d}_i,{d}_j\right)=\frac{\sum_{t\in N\left({d}_i\right)\cap N\left({d}_j\right)}\left({D}_{d_i}(t)+{D}_{d_j}(t)\right)}{DV\left({d}_i\right)+ DV\left({d}_j\right)},\end{array}} \end{equation} $$

(5)

where N(d_i) represents the set of all ancestors of d_i (including itself) in the DAG(d). |$DV\left({d}_i\right)=\sum_{d\in \mathrm{N}\left(\mathrm{di}\right)}{D}_{d_i}(d)$|⁠. Since the diseases with high semantic similarity will have similar phenotype and pathogenesis and will often belong to the same disease type, we calculate the disease semantic second-order similarity SDS(d_i, d_j) based on the diseases’ semantic similarity as follows:

$$ \begin{equation} {\displaystyle \begin{array}{c} SD S\left({d}_i,{d}_j\right)=\frac{\ SD\left({d}_i,:\right) SD{\left({d}_j,:\right)}^{\top }}{\parallel SD\left({d}_i,:\right)\parallel \parallel SD\left({d}_j,:\right)\parallel }.\end{array}} \end{equation} $$

(6)

MiRNA cotarget genes similarity

We obtain the information on miRNA target genes from the mirTarbase database [37] and construct a miRNA-gene association matrix |${A}_{mg}\in{\left\{0,1\right\}}^{n_m\times{n}_g}$|⁠, where n_m and n_g represent the number of miRNAs and genes, respectively. According to whether the miRNAs share common target genes, we calculate the similarity matrix between miRNAs, |$GM\in{R}^{n_m\times{n}_m}$|⁠. GM(m_i, m_j) represent the cotarget genes similarity between miRNAs m_i and miRNA m_j. Its calculation formula is as follows:

$$ \begin{equation} {\displaystyle \begin{array}{c} GM\left({m}_i,{m}_j\right)=\frac{\left({A}_{mg}\left({m}_i,:\right)\right){\left({A}_{mg}\left({m}_j,:\right)\right)}^T}{\parallel{A}_{mg}\left({m}_i,:\right)\parallel \parallel{A}_{mg}\left({m}_j,:\right)\parallel }.\end{array}} \end{equation} $$

(7)

Disease coassociation gene similarity

We construct a disease–gene association matrix, |${A}_{dg}\in{\left\{0,1\right\}}^{n_d\times{n}_g}$| according to the data from the DisGeNET database [38], whose element is 1 if the gene is related to the disease. Otherwise, the element is zeros. n_d and n_g represent the number of diseases and genes, respectively. Based on whether the diseases share a common associated gene, we calculate the cosine similarity between diseases and constructed the similarity matrix |$GD\in{R}^{n_d\times{n}_d}$|⁠. GD(d_i, d_j) denotes the coassociated gene similarity between disease d_i and disease d_j. Its calculation formula is as follows:

$$ \begin{equation} {\displaystyle \begin{array}{c} GD\left({d}_i,{d}_j\right)=\frac{\left({A}_{dg}\left({d}_i,:\right)\right){\left({A}_{dg}\left({d}_j,:\right)\right)}^T}{\parallel{A}_{dg}\left({d}_i,:\right)\parallel \parallel{A}_{dg}\left({d}_j,:\right)\parallel }.\end{array}} \end{equation} $$

(8)

Construction of the hypergraph

A hypergraph is a special form of a graph. In a regular graph, an edge connects only two nodes. In a hypergraph, a hyperedge can connect any number of nodes, representing a common association between these nodes. Therefore, hypergraphs can represent the higher order relationships between nodes. Let G = (V, E) be the hypergraph, where V = {v₁, v₂, v₃,…, v_n} represents the set of nodes and E = {e₁, e₂, e₃, …, e_m} represents the set of hypergraph edges. Let |$H\in{R}^{\left|V\right|\times \left|E\right|}$| be the adjacency matrix of the hypergraph G, which is defined as

$$ \begin{equation} {\displaystyle \begin{array}{c}H\left(v,e\right)=\left\{\begin{array}{c}1,v\in e\\{}0,v\notin e\end{array}\right..\end{array}} \end{equation} $$

(9)

Suppose the hyperedge e_j contains a vertex v_i, then H(v_i, e_j) = 1; otherwise 0. There is n_m of miRNAs, and we build n_m of hyperedges for the hypergraph by collecting each miRNA and its most similar K of miRNAs. Hence, the miRNA hypergraph contains n_m hyperedges. We construct three miRNA hypergraphs based on the three similarity matrices of miRNAs mentioned above, namely, hypergraph H_mg is based on the similarity of the cotarget genes, hypergraph H_md is based on miRNA cosine similarity in disease associations, and hypergraph H_SM is based on the miRNA sequences second-order similarity. H_dg is based on the similarity of disease coassociation genes, H_dm is based on disease cosine similarity in miRNA associations and H_SD is based on disease semantic second-order similarity.

Hypergraph contrastive learning

We normalize the adjacency matrix H of each hypergraph before implementing feature learning on the hypergraph. For example, the miRNA hypergraph H_md is normalized as follows:

$$ \begin{equation} {\displaystyle \begin{array}{c}{\hat{H}}_{md}={D}_{md v}^{-\frac{1}{2}}{H}_{md}{D}_{md e}^{-\frac{1}{2}},\end{array}} \end{equation} $$

(10)

where D_mdv is the vertex degree matrix of H_md, and D_mde is the hyperedge degree matrix of H_md.

For any vertex |$v\in{V}_{md}$|⁠, the degree of the vertices is defined as the number of hyperedges containing to the node. D_mdv(ν) is defined as follows:

$$ \begin{equation} {\displaystyle \begin{array}{c}{D}_{md v}(v)=\sum_{e\in{E}_{md}}{H}_{md}\left(v,e\right).\end{array}} \end{equation} $$

(11)

For any hyperedge |$e\in{E}_{md}$|⁠, the degree of the hyperedge |$e$| is defined as

$$ \begin{equation} {\displaystyle \begin{array}{c}{D}_{md e}(e)=\sum_{v\in{V}_{md}}{H}_{md}\left(v,e\right).\end{array}} \end{equation} $$

(12)

The other hypergraphs are normalized similarly to obtain the respective normalized adjacency matrix |${\hat{H}}_{mg}$|⁠, |${\hat{H}}_{SM}$|⁠, |${\hat{H}}_{dg}$|⁠, |${\hat{H}}_{dm}$| and |${\hat{H}}_{SD}$|⁠.

After constructing the hypergraph, we learn the feature representation of the miRNAs and diseases in the corresponding hypergraph by performing hypergraph convolution. The hypergraph convolution is defined as follows:

$$ \begin{equation} {\displaystyle \begin{array}{c} Hconv\left(\hat{H},{X}^{(l)}\right)=\hat{H}{\hat{H}}^T{X}^{(l)}{W}^{(l)}+{X}^{(l)}{W}^{(l)},\end{array}} \end{equation} $$

(13)

where |$\hat{H}$| = {|${\hat{H}}_{md}$|⁠, |${\hat{H}}_{mg}$|⁠, |${\hat{H}}_{SM}$|⁠, |${\hat{H}}_{dm}$|⁠, |${\hat{H}}_{dg}$|⁠, |${\hat{H}}_{SD}$|} is the normalized hypergraph adjacency matrix. X = {X_md, X_mg, X_ms, X_dm, X_dg, X_ds} is the node feature matrix of miRNA and disease in each hypergraph. In this study, the miRNAs and diseases in different hypergraphs take the same one-hot encoding strategies to keep their initial features in different hypergraphs consistent. W is the learnable weight. l is the number of layers of hypergraph convolution. Hence, X⁽⁰⁾ is the one-hot encoding for miRNAs or diseases. To emphasize the node features and avoid the over-smoothing problem caused by multilayer graph convolution, we add the node features in the hypergraph convolution process.

Taking miRNA as an example, its three normalized hypergraphs adjacency matrix |${\hat{H}}_{md}$|⁠, |${\hat{H}}_{mg}$|⁠, |${\hat{H}}_{SM}$| with their initial features are input to the convolution module to obtain three miRNA features, m₁, m₂ and m₃.

$$\begin{equation} {\displaystyle \begin{array}{c}{m}_1= Hconv\left({\hat{H}}_{mg},{X}_{mg}\right)\end{array}} \end{equation}$$

(14)

$$\begin{equation} {\displaystyle \begin{array}{c}{m}_2= Hconv\left({\hat{H}}_{md},{X}_{md}\right)\ \end{array}} \end{equation}$$

(15)

$$ \begin{equation} {\displaystyle \begin{array}{c}{m}_3= Hconv\left({\hat{H}}_{SM},{X}_{SM}\right).\end{array}} \end{equation} $$

(16)

According to the same approach, we learn the corresponding three feature representations d₁, d₂ and d₃ from the three hypergraphs |${\hat{H}}_{dg}$|, |${\hat{H}}_{dm}$| and |${\hat{H}}_{SD}$| of the disease, respectively.

Next, we view different hypergraphs of miRNAs or diseases as different data enhancements and perform contrastive learning on miRNA and disease representations after hypergraph convolution to achieve information interaction among different hypergraphs and learn consistent miRNA and disease feature representations in different hypergraphs. The hypergraph contrastive loss for learning miRNA feature representations can be defined as follows:

$$\begin{equation} {\displaystyle \begin{array}{c}{l}_i^{21}=-\mathit{\log}\frac{\Sigma_{\mathrm{j}\in{\mathrm{e}}_{\mathrm{i}}}\exp \left( sim\left({m}_{2i},{m}_{1j}\right)/\tau \right)}{\kern0.1em {\Sigma}_{\mathrm{k}=1}^{\mathrm{N}}\exp \left( sim\left({m}_{2i},{m}_{1k}\right)/\tau \right)}\ \end{array}} \end{equation}$$

(17)

$$\begin{equation} {\displaystyle \begin{array}{c}{l}_i^{23}=-\mathit{\log}\frac{\Sigma_{\mathrm{j}\in{\mathrm{e}}_{\mathrm{i}}}\exp \left( sim\left({m}_{2i},{m}_{1j}\right)/\tau \right)}{\kern0.1em {\Sigma}_{\mathrm{k}=1}^{\mathrm{N}}\exp \left( sim\left({m}_{2i},{m}_{1k}\right)/\tau \right)}\end{array}} \end{equation}$$

(18)

$$ \begin{equation} {\displaystyle \begin{array}{c}{l}_c={l}_i^{21}+{l}_j^{23}.\end{array}} \end{equation} $$

(19)

Because this work aims to predict the association between miRNAs and diseases, we take the feature m₂ learned from miRNA hypergraph based on miRNA cosine similarity in disease associations as the axis, and learn it in contrast with the feature m₁ learned from miRNA hypergraph based on cotarget gene similarity and the feature m₃ learned from miRNA hypergraph based on miRNA second-order sequence similarity, respectively. When calculating the contrastive loss, the positive samples consist of the miRNA in different hypergraphs in addition to the nodes in the hyperedges centered on that miRNA in the control hypergraph, i.e. |${\hat{H}}_{mg}$| and |${\hat{H}}_{SM}$|⁠, and the rest of the nodes are all negative samples. In Equations (17 and 18), e_i represents the hyperedge constructed with node i as the center node in |${\hat{H}}_{mg}$| or |${\hat{H}}_{SM}$|⁠, and τ is the temperature hyperparameter. Function sim() represents the cosine similarity.

Similarly, let d₂ conduct contrastive learning with d₁ and d₃, respectively. Next, we use the miRNA features m₁, m₂ and m₃ learned in the miRNA hypergraph to reconstruct the miRNA–disease association matrix (⁠|${MD}_H^1$|⁠, |${MD}_H^2$|⁠, |${MD}_H^3$|⁠) by multiplying them with the disease features d₁, d₂ and d₃ learned in the disease hypergraph, respectively.

$$ M{D}_H^1= sigmoid\left({m}_1{d}_1^T\right) $$

$$\begin{equation} M{D}_H^2= sigmoid\left({m}_2{d}_2^T\right) \end{equation}$$

(20)

$$ M{D}_H^3= sigmoid\left({m}_3{d}_3^T\right). $$

The final miRNA–disease association matrix is obtained by fusing the matrices|$M{D}_H^1$|⁠, |$M{D}_H^2$|⁠, |$M{D}_H^3$| as follows:

$$ \begin{equation} {\displaystyle \begin{array}{c}M{D}_H=\frac{M{D}_H^1+M{D}_H^2+M{D}_H^3}{3}.\end{array}} \end{equation} $$

(21)

To learn miRNA and disease features from the hypergraph that are suitable for identifying miRNA and disease associations, we design loss constraints for the three miRNA–disease scoring matrices obtained from the hypergraph. Here is the description of the loss constraints

$$ \begin{equation} {\begin{array}{c}{l}_H\!=\!-\sum_{k=1}^h{M}_{ij}\left[{A}_{md}\right.\log \left(\!\sigma \left({MD}_H^k\!\right)\right)+\left(1-{A}_{md}\right)\left.\log \left(\!1-\sigma \left({MD}_H^k\!\right)\right)\right]\!,\end{array}} \end{equation} $$

(22)

where h ∊ {1, 2, 3} and |$\sigma$| is the sigmoid activation function. A_md is the known correlation matrix. M is the indicator matrix, M_ij = 1 when the association between the ith miRNA and the jth disease is in the training set, otherwise M_ij = 0.

variational auto-encoder–based learning of miRNA and disease features

Considering the importance of known associations between miRNAs and diseases, MHCLMDA adopts a variational auto-encoder (VAE) to learn miRNA and disease nonlinear key features. Given a known miRNA–disease association matrix A_md, the row vectors of the matrix can be considered as the initial features of the miRNA (A_md) and the column vectors of the matrix can be considered as the initial features of the disease (A_dm). The initial features of miRNA and disease based on the known miRNA–disease association are input into the variational auto-encoder to obtain the miRNA and disease nonlinear key features, respectively. Taking miRNA as an example, the nonlinear key features of miRNA m_v is obtained as follows:

$$ \begin{equation} {\displaystyle \begin{array}{c}{m}_v^{\prime }=\mathit{\tanh}\left({A}_{md}{W}_{md}+{b}_{md}\right),\end{array}} \end{equation} $$

(23)

where W_md is the weight and b_md is the bias.

Next for the obtained features |${m}_v^{\prime }$|⁠, we use two independent fully connected layers to obtain the means |${\mu}_{md}$| and variances |${\sigma}_{md}$| of the features and they are calculated as follows:

$$\begin{equation} {\displaystyle \begin{array}{c}{\mu}_m=\mathit{\tanh}\left({m}_v^{\prime }{W}_{\mu m}+{b}_{\mu m}\right)\end{array}} \end{equation}$$

(24)

$$ \begin{equation} {\displaystyle \begin{array}{c}{\sigma}_m=\mathit{\tanh}\left({m}_v^{\prime }{W}_{\sigma m}+{b}_{\sigma m}\right).\end{array}} \end{equation} $$

(25)

Similarly, |${W}_{\mu m}$| and |${W}_{\sigma m}$| are the learnable weights. |${b}_{\mu m}$| and |${b}_{\sigma m}$| are the biases. The final miRNA key features is calculated by the following equation:

$$ \begin{equation} {\displaystyle \begin{array}{c}{m}_v={u}_m+{\sigma}_m\odot \varepsilon, \end{array}} \end{equation} $$

(26)

where |$\varepsilon$| is a random vector ε (ε ~ N(0, 1)) sampled from the standard normal distribution. |$\odot$| denotes element-by-element multiplication. The potential characteristics of the disease d_v are also obtained in the same way.

After obtaining the key feature of the miRNA (m_v) and disease(d_v), the prediction score of miRNA–disease associations based on the VAE is calculated as follows:

$$ \begin{equation} {MD}_V= sigmoid\left({m}_v{d}_v^T\right). \end{equation} $$

(27)

The loss constraints for the VAE component are as follows:

$$ {l}_v=-{M}_{ij}\left[{A}_{md} log\sigma \left({MD}_v\right)+\left(1-{A}_{md}\right)\mathit{\log}\left(1-\sigma \left({MD}_v\right)\right)\right] $$

$$ \begin{equation} -\left( KL\left(q\left({m}_V|{MD}_V\right)\left\Vert p\left({m}_v\right)\right.\right)+ KL\left(q\left({d}_v\left|{MD}_V\right.\right)\left\Vert p\left({d}_v\right)\right.\right)\right). \end{equation} $$

(28)

The final prediction score MD is obtained by weighted summation of MD_H and MD_V

$$ \begin{equation} {\displaystyle \begin{array}{c} MD=\lambda{MD}_H+\left(1-\lambda \right){MD}_V,\end{array}} \end{equation} $$

(29)

where λ is a hyperparameter balancing the contribution of the two parts. The loss constraints for the final prediction are as follows:

$$ \begin{equation} {\displaystyle \begin{array}{c}{l}_r=-{M}_{ij}\left[{A}_{md} log\sigma \left((MD)\right)+\left(1-{A}_{md}\right)\mathit{\log}\left(1-\sigma (MD)\right)\right].\end{array}} \end{equation} $$

(30)

Then, the total losses of the model is as follows:

$$ \begin{equation} {\displaystyle \begin{array}{c}l={l}_r+{l}_H+\alpha{l}_v+\left(1-\alpha \right){l}_c.\end{array}} \end{equation} $$

(31)

In summary, following is the pseudo-code of the MHCLMDA.

Algorithm MHCLMDA.

Open in new tab

Algorithm MHCLMDA
Input: Known miRNA–disease association matrix A_md; association cosine similarity matrix for miRNA(S_am) and disease(S_ad); miRNA second-order sequence similarity matrix SMS; disease semantic second-order similarity matrix SDS; miRNA cotarget gene similarity matrix GM; disease coassociation gene similarity matrix GD; Number of neighbors K; Learning rate \|$\mathrm{\eta};$\| fusion hyperparameter λ; temperature hyperparameter τ; loss hyperparameter \|$\alpha$\|⁠; number of hypergraph convolution layers l. Output: Reconstructed miRNA–disease association matrix MD. 1: Constructing hypergraphs H_mg, H_md and H_SM based on the three similarity matrices of miRNAs. Constructing the hypergraphs H_dg, H_dm and H_SD based on the three similarity matrices of diseases. 2: Obtaining \|${\hat{H}}_{mg}$\|⁠, \|${\hat{H}}_{md}$\|⁠, \|${\hat{H}}_{SM}$\|⁠, \|${\hat{H}}_{dg}$\|⁠, \|${\hat{H}}_{dm}$\|⁠, \|${\hat{H}}_{SD}$\| by Eqs (10–12). 3: Learning features m₁, m₂, m₃ and d₁, d₂, d₃ from the hypergraphs by Eqs (13–16). 4: Calculating miRNA–disease association score matrices \|${MD}_H^1$\|⁠, \|${MD}_H^2$\|⁠, \|${MD}_H^3$\| by Eq. (20). 5: Obtaining the integrated miRNA–disease association matrix MD_H by Eq. (21). 6: Obtain the key features m_v and d_v by Eqs (23–26). 7: Calculating the predicted scores for the VAE component by Eq. (27). 8: Obtaining the final prediction scores MD by Eq. (29). 9: Calculating the loss by Eq. (31).

Algorithm MHCLMDA

Input: Known miRNA–disease association matrix A_md; association cosine similarity matrix for miRNA(S_am) and disease(S_ad); miRNA second-order sequence similarity matrix SMS; disease semantic second-order similarity matrix SDS; miRNA cotarget gene similarity matrix GM; disease coassociation gene similarity matrix GD; Number of neighbors K; Learning rate |$\mathrm{\eta};$| fusion hyperparameter λ; temperature hyperparameter τ; loss hyperparameter |$\alpha$|⁠; number of hypergraph convolution layers l.
Output: Reconstructed miRNA–disease association matrix MD.
1: Constructing hypergraphs H_mg, H_md and H_SM based on the three similarity matrices of miRNAs. Constructing the hypergraphs H_dg, H_dm and H_SD based on the three similarity matrices of diseases.
2: Obtaining |${\hat{H}}_{mg}$|⁠, |${\hat{H}}_{md}$|⁠, |${\hat{H}}_{SM}$|⁠, |${\hat{H}}_{dg}$|⁠, |${\hat{H}}_{dm}$|⁠, |${\hat{H}}_{SD}$| by Eqs (10–12).
3: Learning features m₁, m₂, m₃ and d₁, d₂, d₃ from the hypergraphs by Eqs (13–16).
4: Calculating miRNA–disease association score matrices |${MD}_H^1$|⁠, |${MD}_H^2$|⁠, |${MD}_H^3$| by Eq. (20).
5: Obtaining the integrated miRNA–disease association matrix MD_H by Eq. (21).
6: Obtain the key features m_v and d_v by Eqs (23–26).
7: Calculating the predicted scores for the VAE component by Eq. (27).
8: Obtaining the final prediction scores MD by Eq. (29).
9: Calculating the loss by Eq. (31).

Open in new tab

Algorithm MHCLMDA
Input: Known miRNA–disease association matrix A_md; association cosine similarity matrix for miRNA(S_am) and disease(S_ad); miRNA second-order sequence similarity matrix SMS; disease semantic second-order similarity matrix SDS; miRNA cotarget gene similarity matrix GM; disease coassociation gene similarity matrix GD; Number of neighbors K; Learning rate \|$\mathrm{\eta};$\| fusion hyperparameter λ; temperature hyperparameter τ; loss hyperparameter \|$\alpha$\|⁠; number of hypergraph convolution layers l. Output: Reconstructed miRNA–disease association matrix MD. 1: Constructing hypergraphs H_mg, H_md and H_SM based on the three similarity matrices of miRNAs. Constructing the hypergraphs H_dg, H_dm and H_SD based on the three similarity matrices of diseases. 2: Obtaining \|${\hat{H}}_{mg}$\|⁠, \|${\hat{H}}_{md}$\|⁠, \|${\hat{H}}_{SM}$\|⁠, \|${\hat{H}}_{dg}$\|⁠, \|${\hat{H}}_{dm}$\|⁠, \|${\hat{H}}_{SD}$\| by Eqs (10–12). 3: Learning features m₁, m₂, m₃ and d₁, d₂, d₃ from the hypergraphs by Eqs (13–16). 4: Calculating miRNA–disease association score matrices \|${MD}_H^1$\|⁠, \|${MD}_H^2$\|⁠, \|${MD}_H^3$\| by Eq. (20). 5: Obtaining the integrated miRNA–disease association matrix MD_H by Eq. (21). 6: Obtain the key features m_v and d_v by Eqs (23–26). 7: Calculating the predicted scores for the VAE component by Eq. (27). 8: Obtaining the final prediction scores MD by Eq. (29). 9: Calculating the loss by Eq. (31).

Algorithm MHCLMDA

EXPERIMENT AND RESULT

To evaluate the effectiveness of MHCLMDA in predicting miRNA–disease associations, we compared it against six state-of-the-art baselines: MKGAT [27], VGAMF [20], MSGCL [33], HGCNMDA [31], AMHMDA [42] and MLRDFM [43].

MKGAT [27] utilizes graph attention networks (GATs) and double Laplacian regularization to predict miRNA–disease associations. It combines multiple miRNA and disease similarities as initial features in GATs and incorporates similarity constraint terms in regularization.

VGAMF [20] fuses multiple miRNA and disease views using linear weighting. It then extracts linear and nonlinear features for miRNAs and diseases through matrix decomposition and variational auto-encoder to predict potential miRNA–disease associations.

MSGCL [33] generates a graph enhancement through the miRNA similarity and disease similarity and conducts contrastive learning between the known miRNA–disease association network and the graph enhancement to learn miRNA and disease features for association predictions.

HGCNMDA [31] constructs a miRNA–gene–disease heterogeneity network and employs a multirelational graph convolution network model to learn miRNA and disease feature embedding.

AMHMDA [42] fuses multiple miRNA similarity views and disease similarity views using an attention mechanism. It introduces super-nodes to construct a heterogeneous graph and performs graph convolution to learn miRNA and disease features.

MLRDFM [43] is a modification of the traditional DeepFM model. It combines multiple miRNA and disease similarities and uses them as regularization constraints for learned miRNA and disease embedding features. The modified DeepFM model considers low-order and high-order features to predict miRNA–disease associations.

For a fair comparison, all methods utilize the same input similarity data, including miRNA sequence similarity, miRNA cogene target similarity, miRNA related-disease cosine similarity, disease semantic similarity, disease corelated gene similarity, and disease and miRNA-related cosine similarity.

Experiment setting

To validate the effectiveness of the MHCLMDA method, we conducted 10 times 5-fold cross-validation test on the HMDD v3.2 database. Three experimental setups were employed for this purpose.

(i) Random zeroing cross-validation: All known miRNA–disease associations were considered positive samples. The positive samples were randomly divided into five mutually nonoverlapping fractions or subsets. In each cross-validation iteration, one of the subsets and the same number of randomly selected negative samples were used as the test data. The remaining positive samples with the negative samples were used as training data. A randomly cleared cross-validation method tests each model’s ability to find missing miRNA–disease associations.

(2) Random multicolumn zeroing cross-validation: The miRNA–disease association matrix was used, where the columns correspond to diseases. We randomly selected and cleared the values of 1/5 columns from the miRNA–disease association matrix. The cleared columns were used as the test data to evaluate the models’ ability to discover miRNAs for new disease associations. The remaining columns were used as training.

(3) Random multirow zeroing cross-validation: The rows of the miRNA-disease association matrix correspond to miRNA. We randomly selected and cleared the values of 1/5 rows from the association matrix to test the ability of each model to discover new miRNA-associated diseases. The remaining rows were as training.

We adopted Adam as the optimizer of our method. The optimal combination of hyperparameters were as follows: learning rate of 0.002, temperature hyperparameter τ for contrastive loss of 0.7. Evaluation metrics, including AUC, AUPR, PRECISION, RECALL and F1 SCORE, were calculated by comparing the prediction results with the known miRNA–disease associations in the benchmark dataset. All parameters of the baseline method were set under the recommendations in the paper or adjusted appropriately to achieve the best performance (See supplementary files for details).

Parameter discussion

In this study, we utilized a hypergraph convolutional network (Hconv) model to learn the node representation of each hypergraph. The parameter l controls the number of layers of the Hconv. To investigate how the parameter l affects the prediction performance, we set different values of l, ranging from 1 to 5, to test the effect of different l values on the results of the random zeroing cross-validation. As shown in Figure 2, our model has the best performance when l = 2. Hence, we set l to 2 in all experiments of this work.

Figure 2

The prediction performance of different |$Hconv$| layers.

Open in new tab Download slide

The parameter K represents the number of neighbors selected when constructing the hypergraph. By adjusting the value of K ranging from 10 to 60, we explored its impact on the model’s performance. We also tested our model under randomly zeroing cross-validation with smaller values of K ranging from 1 to 9 (see Table S7 in supplementary files). According to the results in Figure 3 and Table S7, our model achieved a relatively high AUC value when the value of K was set to 10. Hence, we set K to 10 in all experiments of this work.

Figure 3

The prediction performance of different K.

Open in new tab Download slide

The parameter λ balances the contribution of the VAE and hypergraph contrastive learning to the final prediction scores of miRNA–disease associations. By adjusting the value of λ between 0.1 and 0.9, we explored its effect on the model performance. According to the results in Figure 4 and Table S8, our model achieved relatively high AUC and AUPR values when the λ value was set to 0.6 under randomly zeroing cross-validation and 0.9 under multicolumn and multirows zeroing cross-validation. Since limited known miRNA–disease associations are available under multicolumn and multirows zeroing cross-validations, we increase the λ value to adjust the model to be less reliant on the VAE component when calculating the final prediction scores of miRNA–disease associations.

Figure 4

The prediction performance on different λ.

Open in new tab Download slide

Performance comparison with other method

We compared MHCLMDA with the baselines under three different cross-validation settings. The 5-fold cross-validation was repeated 10 times for all experimental settings. Table 1 indicates the average AUC, AUPR, PRECISION, RECALL and F1 SCORE for each model under random zeroing cross-validation. We provided AUC variance and AUPR variance in all comparative experiments in the supplementary file. The AUC and AUPR of MHCLMDA reached 94.54 and 94.55%, respectively, which were better than the second best method, HGCNMDA, by 0.98 and 1.0%, respectively. The results indicate that MHCLMDA exhibited better overall performance in predicting miRNA–disease associations compared with the baseline methods under the random zeroing cross-validation setting. We selected the best F1 SCORE values and the corresponding PRECISION and RECALL values of each method for comparison. Our F1 SCORE value is 1.47% higher than the second-best method, HGCNMDA.

Table 1

Open in new tab

Performance comparison of every method under randomly zeroing cross validation

Methods	AUC	AUPR	PRECISION	RECALL	F1SCORE
MHCLMDA	0.9454	0.9455	0.8698	0.8801	0.8749
HGCNMDA	0.9356	0.9355	0.8326	0.8879	0.8602
MKGAT	0.8614	0.8912	0.8425	0.8058	0.8425
MLRDFM	0.9367	0.9359	0.8524	0.8649	0.8586
VGAMF	0.9318	0.9315	0.8771	0.8181	0.8466
AMHMDA	0.9159	0.9084	0.8236	0.8789	0.8504
MSGCL	0.8060	0.8285	0.7315	0.7681	0.7493

Methods	AUC	AUPR	PRECISION	RECALL	F1SCORE
MHCLMDA	0.9454	0.9455	0.8698	0.8801	0.8749
HGCNMDA	0.9356	0.9355	0.8326	0.8879	0.8602
MKGAT	0.8614	0.8912	0.8425	0.8058	0.8425
MLRDFM	0.9367	0.9359	0.8524	0.8649	0.8586
VGAMF	0.9318	0.9315	0.8771	0.8181	0.8466
AMHMDA	0.9159	0.9084	0.8236	0.8789	0.8504
MSGCL	0.8060	0.8285	0.7315	0.7681	0.7493

Table 1

Open in new tab

Performance comparison of every method under randomly zeroing cross validation

Methods	AUC	AUPR	PRECISION	RECALL	F1SCORE
MHCLMDA	0.9454	0.9455	0.8698	0.8801	0.8749
HGCNMDA	0.9356	0.9355	0.8326	0.8879	0.8602
MKGAT	0.8614	0.8912	0.8425	0.8058	0.8425
MLRDFM	0.9367	0.9359	0.8524	0.8649	0.8586
VGAMF	0.9318	0.9315	0.8771	0.8181	0.8466
AMHMDA	0.9159	0.9084	0.8236	0.8789	0.8504
MSGCL	0.8060	0.8285	0.7315	0.7681	0.7493

Methods	AUC	AUPR	PRECISION	RECALL	F1SCORE
MHCLMDA	0.9454	0.9455	0.8698	0.8801	0.8749
HGCNMDA	0.9356	0.9355	0.8326	0.8879	0.8602
MKGAT	0.8614	0.8912	0.8425	0.8058	0.8425
MLRDFM	0.9367	0.9359	0.8524	0.8649	0.8586
VGAMF	0.9318	0.9315	0.8771	0.8181	0.8466
AMHMDA	0.9159	0.9084	0.8236	0.8789	0.8504
MSGCL	0.8060	0.8285	0.7315	0.7681	0.7493

To validate the ability of MHCLMDA to predict miRNA associations for new diseases, we performed cross-validation under multiple-column clearing. The performance comparison of each method under this cross-validation setting is presented in Table 2. We observed that MHCLMDA achieved the highest AUC and AUPR values of 88.64 and 25.88%, respectively, among all the compared methods. It had an improvement of 1.45% in AUC and 6.47% in AUPR compared with the second-best method, HGCNMDA. The MHCLMDA’s F1 SCORE value reached 32.39%, which was 3.86% higher than the F1 SCORE of the HGCNMDA method.

Table 2

Open in new tab

Performance comparison of every method under randomly zeroing out multicolumn cross validation

Methods	AUC	AUPR	PRECISION	RECALL	F1SCORE
MHCLMDA	0.8864	0.2588	0.3715	0.2934	0.3239
HGCNMDA	0.8719	0.1941	0.2352	0.3625	0.2853
MKGAT	0.5734	0.1264	0.2129	0.2612	0.2366
MLRDFM	0.8533	0.1180	0.3476	0.0427	0.0761
VGAMF	0.6775	0.0602	0.0446	0.4421	0.0810
AMHMDA	0.8050	0.1494	0.4302	0.0048	0.0094
MSGCL	0.5011	0.0672	0.1301	0.1211	0.1244

Methods	AUC	AUPR	PRECISION	RECALL	F1SCORE
MHCLMDA	0.8864	0.2588	0.3715	0.2934	0.3239
HGCNMDA	0.8719	0.1941	0.2352	0.3625	0.2853
MKGAT	0.5734	0.1264	0.2129	0.2612	0.2366
MLRDFM	0.8533	0.1180	0.3476	0.0427	0.0761
VGAMF	0.6775	0.0602	0.0446	0.4421	0.0810
AMHMDA	0.8050	0.1494	0.4302	0.0048	0.0094
MSGCL	0.5011	0.0672	0.1301	0.1211	0.1244

Table 2

Open in new tab

Performance comparison of every method under randomly zeroing out multicolumn cross validation

Methods	AUC	AUPR	PRECISION	RECALL	F1SCORE
MHCLMDA	0.8864	0.2588	0.3715	0.2934	0.3239
HGCNMDA	0.8719	0.1941	0.2352	0.3625	0.2853
MKGAT	0.5734	0.1264	0.2129	0.2612	0.2366
MLRDFM	0.8533	0.1180	0.3476	0.0427	0.0761
VGAMF	0.6775	0.0602	0.0446	0.4421	0.0810
AMHMDA	0.8050	0.1494	0.4302	0.0048	0.0094
MSGCL	0.5011	0.0672	0.1301	0.1211	0.1244

Methods	AUC	AUPR	PRECISION	RECALL	F1SCORE
MHCLMDA	0.8864	0.2588	0.3715	0.2934	0.3239
HGCNMDA	0.8719	0.1941	0.2352	0.3625	0.2853
MKGAT	0.5734	0.1264	0.2129	0.2612	0.2366
MLRDFM	0.8533	0.1180	0.3476	0.0427	0.0761
VGAMF	0.6775	0.0602	0.0446	0.4421	0.0810
AMHMDA	0.8050	0.1494	0.4302	0.0048	0.0094
MSGCL	0.5011	0.0672	0.1301	0.1211	0.1244

Furthermore, we observed from Table 3 that MHCLMDA still has the highest AUC, AUPR and F1 SCORE values of 87.62, 25.41 and 32.82% among all methods when recommending associated diseases for new miRNAs, with a 3.19% improvement in AUC over the second-best method HGCNMDA, a 6.52% improvement in AUPR over the second-best method HGCNMDA and a 5.38% improvement in F1 SCORE over the second-best method HGCNMDA. These results indicate that MHCLMDA consistently demonstrated superior performance in predicting miRNA–disease associations, both for new diseases and new miRNAs, compared with the baseline. It achieved the highest AUC and AUPR values in multiple cross-validation settings, confirming its effectiveness in discovering associations and outperforming the compared methods.

Table 3

Open in new tab

Performance comparison of every method under randomly zeroing out multirows cross validation

Methods	AUC	AUPR	PRECISION	RECALL	F1
MHCLMDA	0.8762	0.2541	0.3038	0.3601	0.3282
HGCNMDA	0.8443	0.1889	0.2631	0.2867	0.2744
MKGAT	0.6157	0.1794	0.2608	0.2467	0.2536
MLRDFM	0.8430	0.1798	0.3266	0.1511	0.2066
VGAMF	0.6578	0.0689	0.0539	0.4076	0.0952
AMHMDA	0.7684	0.0817	0.1771	0.0137	0.0254
MSGCL	0.7414	0.1692	0.1881	0.2307	0.2074

Methods	AUC	AUPR	PRECISION	RECALL	F1
MHCLMDA	0.8762	0.2541	0.3038	0.3601	0.3282
HGCNMDA	0.8443	0.1889	0.2631	0.2867	0.2744
MKGAT	0.6157	0.1794	0.2608	0.2467	0.2536
MLRDFM	0.8430	0.1798	0.3266	0.1511	0.2066
VGAMF	0.6578	0.0689	0.0539	0.4076	0.0952
AMHMDA	0.7684	0.0817	0.1771	0.0137	0.0254
MSGCL	0.7414	0.1692	0.1881	0.2307	0.2074

Table 3

Open in new tab

Performance comparison of every method under randomly zeroing out multirows cross validation

Methods	AUC	AUPR	PRECISION	RECALL	F1
MHCLMDA	0.8762	0.2541	0.3038	0.3601	0.3282
HGCNMDA	0.8443	0.1889	0.2631	0.2867	0.2744
MKGAT	0.6157	0.1794	0.2608	0.2467	0.2536
MLRDFM	0.8430	0.1798	0.3266	0.1511	0.2066
VGAMF	0.6578	0.0689	0.0539	0.4076	0.0952
AMHMDA	0.7684	0.0817	0.1771	0.0137	0.0254
MSGCL	0.7414	0.1692	0.1881	0.2307	0.2074

Methods	AUC	AUPR	PRECISION	RECALL	F1
MHCLMDA	0.8762	0.2541	0.3038	0.3601	0.3282
HGCNMDA	0.8443	0.1889	0.2631	0.2867	0.2744
MKGAT	0.6157	0.1794	0.2608	0.2467	0.2536
MLRDFM	0.8430	0.1798	0.3266	0.1511	0.2066
VGAMF	0.6578	0.0689	0.0539	0.4076	0.0952
AMHMDA	0.7684	0.0817	0.1771	0.0137	0.0254
MSGCL	0.7414	0.1692	0.1881	0.2307	0.2074

Ablation study

To assess the impact of each module on the performance of MHCLMDA, we set up various model variants and conducted random clearing experiments. These experiments involved removing specific modules from the MHCLMDA framework to evaluate their contributions while keeping the values for the parameter |$\alpha$| (here |$\alpha$| = 0.7) in the total loss l. The details of the model variants are as follows:

(i) MHCLMDA_NVAE: This variant does not utilize the VAE for learning nonlinear key features of miRNAs and diseases.
(ii) MHCLMDA_NG: This variant constructs hypergraph without considering miRNA target genes and disease-related genes.
(iii) MHCLMDA_NC: This variant removes the multiple hypergraph contrastive learning module. Without this module, the model does not incorporate the contrastive loss.
(iv) MHCLMDA_NSMSD: This variant constructs hypergraph without considering miRNA sequence second-order similarity and disease semantic second-order similarity.
(v) MHCLMDA_NMD: This variant constructs hypergraph without considering known miRNA–disease association relationships.
(vi) MHCLMDA_NLH: This variant removes the loss of reconstruction of hypergraph parts l_H.

We also did a t-test between the model variants and our original model in the ablation experiments. The corresponding P-values were all less than 0.05 and included in the supplementary file. The results indicate that the model variants in ablation experiments significantly differ from the original model in AUC and AUPR values. As shown in Table 4, we observed that contrastive loss contributed to the performance of the overall model. The construction of hypergraphs considering known miRNA–disease association relationships has a significant impact on the performance of the MHCLMDA model, which suggests that incorporating information about the existing associations between miRNAs and diseases is crucial in improving the model’s predictive capability. On the other hand, hypergraphs that consider information about disease-associated genes and miRNA target genes have relatively less impact on the model’s overall performance. Furthermore, the incorporation of miRNA–disease association reconstruction loss constraints on the learned miRNA and disease features from each hypergraph positively impacts the model’s performance.

Table 4

Open in new tab

Ablation study

Methods	AUC	AUPR
MHCLMDA_NVAE	0.9275	0.9317
MHCLMDA_NG	0.9319	0.9350
MHCLMDA_NC	0.9335	0.9303
MHCLMDA_NSMSD	0.9333	0.9372
MHCLMDA_NMD	0.8856	0.9077
MHCLMDA_NLH	0.9354	0.9342
MHCLMDA	0.9454	0.9455

Methods	AUC	AUPR
MHCLMDA_NVAE	0.9275	0.9317
MHCLMDA_NG	0.9319	0.9350
MHCLMDA_NC	0.9335	0.9303
MHCLMDA_NSMSD	0.9333	0.9372
MHCLMDA_NMD	0.8856	0.9077
MHCLMDA_NLH	0.9354	0.9342
MHCLMDA	0.9454	0.9455

Table 4

Open in new tab

Ablation study

Methods	AUC	AUPR
MHCLMDA_NVAE	0.9275	0.9317
MHCLMDA_NG	0.9319	0.9350
MHCLMDA_NC	0.9335	0.9303
MHCLMDA_NSMSD	0.9333	0.9372
MHCLMDA_NMD	0.8856	0.9077
MHCLMDA_NLH	0.9354	0.9342
MHCLMDA	0.9454	0.9455

Methods	AUC	AUPR
MHCLMDA_NVAE	0.9275	0.9317
MHCLMDA_NG	0.9319	0.9350
MHCLMDA_NC	0.9335	0.9303
MHCLMDA_NSMSD	0.9333	0.9372
MHCLMDA_NMD	0.8856	0.9077
MHCLMDA_NLH	0.9354	0.9342
MHCLMDA	0.9454	0.9455

In conclusion, our model performance benefits from combining multiple similarities to construct multiple hypergraphs, which helps to learn valid miRNA and disease features to predict their associations. Incorporating contrastive loss and reconstruction loss constraints, along with learning key features of diseases and miRNAs, contributes to the model’s effectiveness in predicting miRNA–disease associations.

Case study

To further validate MHCLMDA performance in miRNA–disease association prediction, we applied it to predict miRNAs associated with lymphoma and lung neoplasms. We removed known miRNAs related to lymphoma or lung cancer in the training process and then assessed the ability of the model to recover the deleted associations in the prediction process. Besides HMDD v3.2, we also used dbDEMC [44] database as a benchmark. Lymphoma is a malignant tumor whose abnormal proliferation and spread may cause severe damage to the body. Lymphoma directly affects the lymphatic system, which may impair immune system function, making the body more susceptible to infections and other diseases. Table 5 shows the top 30 lymphoma-associated miRNAs detected by MHCLMDA based on the HMDD v3.2 dataset. As shown in Table 5, MHCLMDA successfully identified lymphoma-associated miRNAs documented in both data.

Table 5

Open in new tab

Top 30 Lymphoma-related miRNAs predicted by MHCLMDA on HMDDv3.2 Dataset

Top 1–15	Evidence	Top 16–30	Evidence
hsa-mir-21	dbDEMC, HMDDv3.2	hsa-mir-146a	dbDEMC, HMDDv3.2
hsa-mir-17	dbDEMC, HMDDv3.2	hsa-mir-192	dbDEMC, HMDDv3.2
hsa-mir-34a	dbDEMC	hsa-mir-29c	dbDEMC, HMDDv3.2
hsa-mir-155	dbDEMC, HMDDv3.2	hsa-let-7 g	dbDEMC, HMDDv3.2
hsa-mir-126	dbDEMC, HMDDv3.2	hsa-mir-143	dbDEMC, HMDDv3.2
hsa-mir-145	dbDEMC	hsa-mir-146b	dbDEMC
hsa-mir-221	dbDEMC, HMDDv3.2	hsa-mir-132	dbDEMC
hsa-mir-19a	dbDEMC, HMDDv3.2	hsa-mir-223	dbDEMC
hsa-let-7b	dbDEMC	hsa-mir-130a	dbDEMC, HMDDv3.2
hsa-mir-29a	dbDEMC	hsa-mir-93	dbDEMC, HMDDv3.2
hsa-mir-106b	dbDEMC	has-mir-106a	dbDEMC
hsa-mir-27b	dbDEMC, HMDDv3.2	hsa-mir-18a	dbDEMC, HMDDv3.2
hsa-mir-222	dbDEMC, HMDDv3.2	hsa-mir-200a	dbDEMC, HMDDv3.2
hsa-mir-30a	dbDEMC	hsa-mir-150	dbDEMC, HMDDv3.2
hsa-mir-20a	dbDEMC, HMDDv3.2	has-mir-15a	dbDEMC, HMDDv3.2

Top 1–15	Evidence	Top 16–30	Evidence
hsa-mir-21	dbDEMC, HMDDv3.2	hsa-mir-146a	dbDEMC, HMDDv3.2
hsa-mir-17	dbDEMC, HMDDv3.2	hsa-mir-192	dbDEMC, HMDDv3.2
hsa-mir-34a	dbDEMC	hsa-mir-29c	dbDEMC, HMDDv3.2
hsa-mir-155	dbDEMC, HMDDv3.2	hsa-let-7 g	dbDEMC, HMDDv3.2
hsa-mir-126	dbDEMC, HMDDv3.2	hsa-mir-143	dbDEMC, HMDDv3.2
hsa-mir-145	dbDEMC	hsa-mir-146b	dbDEMC
hsa-mir-221	dbDEMC, HMDDv3.2	hsa-mir-132	dbDEMC
hsa-mir-19a	dbDEMC, HMDDv3.2	hsa-mir-223	dbDEMC
hsa-let-7b	dbDEMC	hsa-mir-130a	dbDEMC, HMDDv3.2
hsa-mir-29a	dbDEMC	hsa-mir-93	dbDEMC, HMDDv3.2
hsa-mir-106b	dbDEMC	has-mir-106a	dbDEMC
hsa-mir-27b	dbDEMC, HMDDv3.2	hsa-mir-18a	dbDEMC, HMDDv3.2
hsa-mir-222	dbDEMC, HMDDv3.2	hsa-mir-200a	dbDEMC, HMDDv3.2
hsa-mir-30a	dbDEMC	hsa-mir-150	dbDEMC, HMDDv3.2
hsa-mir-20a	dbDEMC, HMDDv3.2	has-mir-15a	dbDEMC, HMDDv3.2

Table 5

Open in new tab

Top 30 Lymphoma-related miRNAs predicted by MHCLMDA on HMDDv3.2 Dataset

Top 1–15	Evidence	Top 16–30	Evidence
hsa-mir-21	dbDEMC, HMDDv3.2	hsa-mir-146a	dbDEMC, HMDDv3.2
hsa-mir-17	dbDEMC, HMDDv3.2	hsa-mir-192	dbDEMC, HMDDv3.2
hsa-mir-34a	dbDEMC	hsa-mir-29c	dbDEMC, HMDDv3.2
hsa-mir-155	dbDEMC, HMDDv3.2	hsa-let-7 g	dbDEMC, HMDDv3.2
hsa-mir-126	dbDEMC, HMDDv3.2	hsa-mir-143	dbDEMC, HMDDv3.2
hsa-mir-145	dbDEMC	hsa-mir-146b	dbDEMC
hsa-mir-221	dbDEMC, HMDDv3.2	hsa-mir-132	dbDEMC
hsa-mir-19a	dbDEMC, HMDDv3.2	hsa-mir-223	dbDEMC
hsa-let-7b	dbDEMC	hsa-mir-130a	dbDEMC, HMDDv3.2
hsa-mir-29a	dbDEMC	hsa-mir-93	dbDEMC, HMDDv3.2
hsa-mir-106b	dbDEMC	has-mir-106a	dbDEMC
hsa-mir-27b	dbDEMC, HMDDv3.2	hsa-mir-18a	dbDEMC, HMDDv3.2
hsa-mir-222	dbDEMC, HMDDv3.2	hsa-mir-200a	dbDEMC, HMDDv3.2
hsa-mir-30a	dbDEMC	hsa-mir-150	dbDEMC, HMDDv3.2
hsa-mir-20a	dbDEMC, HMDDv3.2	has-mir-15a	dbDEMC, HMDDv3.2

Top 1–15	Evidence	Top 16–30	Evidence
hsa-mir-21	dbDEMC, HMDDv3.2	hsa-mir-146a	dbDEMC, HMDDv3.2
hsa-mir-17	dbDEMC, HMDDv3.2	hsa-mir-192	dbDEMC, HMDDv3.2
hsa-mir-34a	dbDEMC	hsa-mir-29c	dbDEMC, HMDDv3.2
hsa-mir-155	dbDEMC, HMDDv3.2	hsa-let-7 g	dbDEMC, HMDDv3.2
hsa-mir-126	dbDEMC, HMDDv3.2	hsa-mir-143	dbDEMC, HMDDv3.2
hsa-mir-145	dbDEMC	hsa-mir-146b	dbDEMC
hsa-mir-221	dbDEMC, HMDDv3.2	hsa-mir-132	dbDEMC
hsa-mir-19a	dbDEMC, HMDDv3.2	hsa-mir-223	dbDEMC
hsa-let-7b	dbDEMC	hsa-mir-130a	dbDEMC, HMDDv3.2
hsa-mir-29a	dbDEMC	hsa-mir-93	dbDEMC, HMDDv3.2
hsa-mir-106b	dbDEMC	has-mir-106a	dbDEMC
hsa-mir-27b	dbDEMC, HMDDv3.2	hsa-mir-18a	dbDEMC, HMDDv3.2
hsa-mir-222	dbDEMC, HMDDv3.2	hsa-mir-200a	dbDEMC, HMDDv3.2
hsa-mir-30a	dbDEMC	hsa-mir-150	dbDEMC, HMDDv3.2
hsa-mir-20a	dbDEMC, HMDDv3.2	has-mir-15a	dbDEMC, HMDDv3.2

Lung cancer is a malignant tumor with serious risks, which is also one of the most deadly cancers worldwide. Table 6 shows the top 30 lung cancer-associated miRNAs detected by MHCLMDA based on the HMDD v3.2 dataset. As shown in Table 6, the top 30 miRNAs predicted by our model associated with lung cancer were all documented in both databases. These two case studies further demonstrate the excellent performance and validity of the MHCLMDA prediction model.

Table 6

Open in new tab

Top 30 lung cancer-related miRNAs predicted by MHCLMDA on HMDDv3.2 Dataset

Top 1–15	Evidence	Top 16–30	Evidence
hsa-mir-17	dbDEMC, HMDDv3.2	hsa-mir-19b-1	dbDEMC, HMDDv3.2
hsa-mir-21	dbDEMC, HMDDv3.2	hsa-mir-200a	dbDEMC, HMDDv3.2
hsa-mir-155	dbDEMC, HMDDv3.2	hsa-mir-122	dbDEMC, HMDDv3.2
hsa-mir-20a	dbDEMC, HMDDv3.2	hsa-mir-221	dbDEMC, HMDDv3.2
hsa-mir-34a	dbDEMC, HMDDv3.2	hsa-mir-143	dbDEMC, HMDDv3.2
hsa-mir-126	dbDEMC, HMDDv3.2	hsa-mir-222	dbDEMC, HMDDv3.2
hsa-mir-146a	dbDEMC, HMDDv3.2	hsa-mir-31	dbDEMC, HMDDv3.2
hsa-mir-223	dbDEMC, HMDDv3.2	hsa-mir-200b	dbDEMC, HMDDv3.2
hsa-mir-106b	dbDEMC, HMDDv3.2	hsa-mir-19a	dbDEMC, HMDDv3.2
hsa-let-7b	dbDEMC, HMDDv3.2	hsa-mir-195	dbDEMC, HMDDv3.2
hsa-mir-29a	dbDEMC, HMDDv3.2	hsa-mir-29c	dbDEMC, HMDDv3.2
hsa-mir-15a	dbDEMC, HMDDv3.2	hsa-mir-200c	dbDEMC, HMDDv3.2
hsa-mir-106a	dbDEMC, HMDDv3.2	hsa-mir-145	dbDEMC, HMDDv3.2
hsa-let-7d	dbDEMC, HMDDv3.2	hsa-mir-27a	dbDEMC, HMDDv3.2
hsa-mir-150	dbDEMC, HMDDv3.2	hsa-mir-182	dbDEMC, HMDDv3.2

Top 1–15	Evidence	Top 16–30	Evidence
hsa-mir-17	dbDEMC, HMDDv3.2	hsa-mir-19b-1	dbDEMC, HMDDv3.2
hsa-mir-21	dbDEMC, HMDDv3.2	hsa-mir-200a	dbDEMC, HMDDv3.2
hsa-mir-155	dbDEMC, HMDDv3.2	hsa-mir-122	dbDEMC, HMDDv3.2
hsa-mir-20a	dbDEMC, HMDDv3.2	hsa-mir-221	dbDEMC, HMDDv3.2
hsa-mir-34a	dbDEMC, HMDDv3.2	hsa-mir-143	dbDEMC, HMDDv3.2
hsa-mir-126	dbDEMC, HMDDv3.2	hsa-mir-222	dbDEMC, HMDDv3.2
hsa-mir-146a	dbDEMC, HMDDv3.2	hsa-mir-31	dbDEMC, HMDDv3.2
hsa-mir-223	dbDEMC, HMDDv3.2	hsa-mir-200b	dbDEMC, HMDDv3.2
hsa-mir-106b	dbDEMC, HMDDv3.2	hsa-mir-19a	dbDEMC, HMDDv3.2
hsa-let-7b	dbDEMC, HMDDv3.2	hsa-mir-195	dbDEMC, HMDDv3.2
hsa-mir-29a	dbDEMC, HMDDv3.2	hsa-mir-29c	dbDEMC, HMDDv3.2
hsa-mir-15a	dbDEMC, HMDDv3.2	hsa-mir-200c	dbDEMC, HMDDv3.2
hsa-mir-106a	dbDEMC, HMDDv3.2	hsa-mir-145	dbDEMC, HMDDv3.2
hsa-let-7d	dbDEMC, HMDDv3.2	hsa-mir-27a	dbDEMC, HMDDv3.2
hsa-mir-150	dbDEMC, HMDDv3.2	hsa-mir-182	dbDEMC, HMDDv3.2

Table 6

Open in new tab

Top 30 lung cancer-related miRNAs predicted by MHCLMDA on HMDDv3.2 Dataset

Top 1–15	Evidence	Top 16–30	Evidence
hsa-mir-17	dbDEMC, HMDDv3.2	hsa-mir-19b-1	dbDEMC, HMDDv3.2
hsa-mir-21	dbDEMC, HMDDv3.2	hsa-mir-200a	dbDEMC, HMDDv3.2
hsa-mir-155	dbDEMC, HMDDv3.2	hsa-mir-122	dbDEMC, HMDDv3.2
hsa-mir-20a	dbDEMC, HMDDv3.2	hsa-mir-221	dbDEMC, HMDDv3.2
hsa-mir-34a	dbDEMC, HMDDv3.2	hsa-mir-143	dbDEMC, HMDDv3.2
hsa-mir-126	dbDEMC, HMDDv3.2	hsa-mir-222	dbDEMC, HMDDv3.2
hsa-mir-146a	dbDEMC, HMDDv3.2	hsa-mir-31	dbDEMC, HMDDv3.2
hsa-mir-223	dbDEMC, HMDDv3.2	hsa-mir-200b	dbDEMC, HMDDv3.2
hsa-mir-106b	dbDEMC, HMDDv3.2	hsa-mir-19a	dbDEMC, HMDDv3.2
hsa-let-7b	dbDEMC, HMDDv3.2	hsa-mir-195	dbDEMC, HMDDv3.2
hsa-mir-29a	dbDEMC, HMDDv3.2	hsa-mir-29c	dbDEMC, HMDDv3.2
hsa-mir-15a	dbDEMC, HMDDv3.2	hsa-mir-200c	dbDEMC, HMDDv3.2
hsa-mir-106a	dbDEMC, HMDDv3.2	hsa-mir-145	dbDEMC, HMDDv3.2
hsa-let-7d	dbDEMC, HMDDv3.2	hsa-mir-27a	dbDEMC, HMDDv3.2
hsa-mir-150	dbDEMC, HMDDv3.2	hsa-mir-182	dbDEMC, HMDDv3.2

Top 1–15	Evidence	Top 16–30	Evidence
hsa-mir-17	dbDEMC, HMDDv3.2	hsa-mir-19b-1	dbDEMC, HMDDv3.2
hsa-mir-21	dbDEMC, HMDDv3.2	hsa-mir-200a	dbDEMC, HMDDv3.2
hsa-mir-155	dbDEMC, HMDDv3.2	hsa-mir-122	dbDEMC, HMDDv3.2
hsa-mir-20a	dbDEMC, HMDDv3.2	hsa-mir-221	dbDEMC, HMDDv3.2
hsa-mir-34a	dbDEMC, HMDDv3.2	hsa-mir-143	dbDEMC, HMDDv3.2
hsa-mir-126	dbDEMC, HMDDv3.2	hsa-mir-222	dbDEMC, HMDDv3.2
hsa-mir-146a	dbDEMC, HMDDv3.2	hsa-mir-31	dbDEMC, HMDDv3.2
hsa-mir-223	dbDEMC, HMDDv3.2	hsa-mir-200b	dbDEMC, HMDDv3.2
hsa-mir-106b	dbDEMC, HMDDv3.2	hsa-mir-19a	dbDEMC, HMDDv3.2
hsa-let-7b	dbDEMC, HMDDv3.2	hsa-mir-195	dbDEMC, HMDDv3.2
hsa-mir-29a	dbDEMC, HMDDv3.2	hsa-mir-29c	dbDEMC, HMDDv3.2
hsa-mir-15a	dbDEMC, HMDDv3.2	hsa-mir-200c	dbDEMC, HMDDv3.2
hsa-mir-106a	dbDEMC, HMDDv3.2	hsa-mir-145	dbDEMC, HMDDv3.2
hsa-let-7d	dbDEMC, HMDDv3.2	hsa-mir-27a	dbDEMC, HMDDv3.2
hsa-mir-150	dbDEMC, HMDDv3.2	hsa-mir-182	dbDEMC, HMDDv3.2

CONCLUSION

In this work, we proposed a computational method based on multiple hypergraph contrastive learning (MHCLMDA) to predict miRNA–disease associations. Compared with existing methods that integrate multiviews to implement prediction tasks, MHCLMDA constructs multiple miRNA hypergraphs and disease hypergraphs based on different properties of miRNAs and diseases and performs hypergraph convolution operations on each hypergraph, which can consider higher order information of miRNA and diseases to learn miRNA and disease feature representations. Contrastive learning is then employed to facilitate information interaction between the hypergraphs, ensuring consistency in the learned features across different views. After that, MHCLMDA uses the VAE to learn miRNA and disease nonlinear key features. Finally, we use the learned miRNA features and disease features for miRNA–disease association matrix reconstruction and fusion of multiple reconstructed matrices to reveal miRNA–disease associations. We tested the model on the human miRNA–disease association dataset. Our model outperformed the state-of-the-art methods in discovering missing miRNA–disease associations. In addition, our model also performed well in predicting associations for new diseases and new miRNAs. These results further validate the effectiveness of our model. In our future work, we will consider feature complementarity between multiple hypergraphs and build heterogeneous hypergraphs whose hyperedges contain other nodes besides miRNAs or diseases to improve the prediction in miRNA–disease associations.

Key Points

We constructed multiple miRNA and disease hypergraphs containing miRNA sequences, disease semantics, miRNA target genes, disease-associated genes and known disease–miRNA associations. Then, we performed hypergraph convolution on each hypergraph. This approach allows for capturing higher order relationships between miRNAs and diseases and learning a comprehensive representation of miRNAs and diseases from different perspectives.
By leveraging multihypergraph contrastive learning, MHCLMDA considers the interactions between views, maintaining feature consistency.
We applied MHCLMDA to predict human miRNA–disease association under different cross-validation settings. The results showed that our method outperformed other advanced techniques. The results of the ablation experiments also prove that MHCLMDA provides a novel and promising approach for predicting miRNA–disease associations.

FUNDING

National Natural Science Foundation of China (grant no. 61972185 and 62072124); Natural Science Foundation of Yunnan Province of China (grant no. 2019FA024); Yunnan Ten Thousand Talents Plan young.

DATA AVAILABILITY

The source code and data can be obtained from https://github.com/weiba/MHCLMDA.

Author Biographies

Wei Peng received the PhD degree in computer science from Central South University, China, in 2013. Currently, she is a professor of the Kunming University of Science and Technology, China. Her research interests include bioinformatics and data mining.

Zhichen He is a master student in the Kunming University of Science and Technology, China. His research interests include bioinformatics, feature extraction and data mining.

Wei Dai received his PhD degree in computer application from the University of Chinese Academy of Sciences, China, in 2018. Currently, he is an associate professor in the Kunming University of Science and Technology. His research interests include bioinformatics, distributed and cloud computing, data mining.

Wei Lan received the PhD in computer science from Central South University, China, in 2016. Currently, he is an associate professor in the School of Computer, Electronic and Information in Guangxi University, Nanning, China. His current research interests include bioinformatics and machine learning.

Reference

Bartel

MicroRNAs: genomics, biogenesis, mechanism, and function

Cell

2004

;

116

(

281

–

Month:	Total Views:
January 2024	415
February 2024	265
March 2024	265
April 2024	241
May 2024	162
June 2024	155
July 2024	168
August 2024	190
September 2024	215
October 2024	261
November 2024	228
December 2024	168
January 2025	143
February 2025	184
March 2025	225
April 2025	160

Article Contents

MHCLMDA: multihypergraph contrastive learning for miRNA–disease association prediction

Abstract

INTRODUCTION

MATERIALS AND METHODS

Datasets

Overview

Preprocessing data

Association cosine similarity for miRNA/disease

MiRNA sequence second-order similarity

Disease semantic second-order similarity

MiRNA cotarget genes similarity

Disease coassociation gene similarity

Construction of the hypergraph

Hypergraph contrastive learning

variational auto-encoder–based learning of miRNA and disease features

EXPERIMENT AND RESULT

Experiment setting

Parameter discussion

Performance comparison with other method

Ablation study

Case study

CONCLUSION

FUNDING

DATA AVAILABILITY

Author Biographies

Reference

Supplementary data

Citations

Views

Altmetric

Email alerts

Citing articles via

Latest

Most Read

Most Cited

This Feature Is Available To Subscribers Only