Abstract

Motivation

Exploring human-virus protein–protein interactions (PPIs) is crucial for unraveling the pathogenic mechanisms of viruses. The limited coverage and scalability of high-throughput experimental approaches have impeded the identification of certain key interactions. Popular computational methods adopt a two-stream pipeline in which the two proteins of a pair are encoded separately, so the relationship between them is modeled only at the classification stage. However, the fitting capacity of the classifier is insufficient to comprehensively mine the complex interaction patterns between protein pairs.

Results

In this study, we propose HBFormer, a pioneering single-stream framework that combines a hybrid attention mechanism with a multimodal feature fusion strategy to identify human-virus PPIs. The hybrid-attention Transformer bridges the bidirectional information flows between human and viral proteins, thus unifying joint feature learning and relation modeling of protein pairs. Experimental results demonstrate that HBFormer achieves superior performance on multiple human-virus PPI datasets and outperforms five other state-of-the-art human-virus PPI identification methods. Moreover, ablation studies and scalability experiments further validate the effectiveness of our single-stream framework.

Availability and implementation

Codes and datasets are available at https://github.com/RmQ5v/HBFormer.

1 Introduction

Viruses are among the major pathogenic agents of infectious diseases worldwide. Within the intricate human-virus interaction system, protein–protein interactions (PPIs) are tightly linked to multiple pivotal biological processes underlying viral infection (Dyer et al. 2008, Lasso et al. 2019). Viral surface proteins can bind to specific receptors on human cells, initiating viral invasion and triggering intracellular signaling cascades that, for example, mediate cytoskeleton remodeling, activate protein kinases, and suppress the host immune response (Grove and Marsh 2011, Dey and Mondal 2024). Moreover, because of their small genomes, viruses must exploit the established functions of human cellular proteins to complete their life cycles (Yang et al. 2019). Identifying human-virus PPIs is therefore critical for unraveling the pathogenic mechanisms of viruses and for developing targeted antiviral therapies.

Affinity purification coupled with mass spectrometry (AP-MS) and the yeast two-hybrid (Y2H) system have been extensively employed to enrich and identify stable interaction partners of specific proteins (Brückner et al. 2009, Qin et al. 2021). The Y2H system additionally offers the advantage of capturing weak or transient protein interactions that occur in actual cellular processes (Stynen et al. 2012). Such finely regulated interactions occur in many important signaling proteins, such as receptor kinases, and in drivers of cellular genealogy (Xing et al. 2014). However, even with substantial investments of time and resources, these experiments inevitably generate many false positives and frequently miss known interactions (Peng et al. 2017). Consequently, computational approaches for identifying potential PPIs have been on the rise, providing testable hypotheses that complement experimental efforts.

Numerous traditional machine learning approaches have made progress in PPI identification (Yang et al. 2010, Cui et al. 2012, Dey et al. 2020). For instance, Dey et al. combined random forest and support vector machine classifiers to predict virus-host PPIs from amino acid composition, pseudo amino acid composition, and conjoint triad features. Yang et al. encoded protein sequences into vectors with local descriptors and used a k-nearest neighbor model to detect protein interaction patterns. However, these methods rely on hand-crafted features, which have limited capacity to construct complex hierarchical representations and thus restrict the classification performance of the models. Because feature extraction is key to mining the information patterns of proteins, several toolkits that integrate protein feature extraction utilities have emerged, such as BioSeq-BLM (Li et al. 2021) and iLearnPlus (Chen et al. 2021).

In recent years, deep learning has become a promising strategy for accelerating PPI identification research (Yang et al. 2022). DPPI (Hashemifar et al. 2018) was the first approach to apply deep learning to PPI identification; it stacked convolutional modules to detect local patterns in sequence profiles and then performed binary classification with linear layers. PIPR (Chen et al. 2019) subsequently introduced a residual recurrent convolutional neural network (CNN) (LeCun and Bengio 1995) to aggregate sequence features at multiple granularities. TransPPI (Yang et al. 2021) used a Siamese CNN to generate high-dimensional embeddings from sequence profiles for human-virus PPI identification. Nonetheless, local convolution operations have difficulty capturing global dependencies in protein sequences, and the lack of global features degrades the performance of these approaches. To model long-distance dependencies efficiently, some works introduced long short-term memory (LSTM) networks (Hochreiter and Schmidhuber 1997). For instance, DNN-PPI (Li et al. 2018) built an LSTM network to capture short-term dependencies at the amino acid level and long-term dependencies at the motif level, and LSTM-PHV (Tsukiyama et al. 2021) combined an LSTM with word2vec embeddings to learn the contextual information of amino acid sequences, producing predictions through fully connected layers.

Although these methods have achieved breakthroughs in PPI identification, they still have limitations. First, they focus solely on protein sequences and often overlook other relevant factors that act on PPIs, such as biological processes and molecular functions. For instance, post-translational modifications (PTMs) can alter the spatial conformation of proteins and thereby affect their ability to bind other proteins (Wang et al. 2022). Second, existing approaches adopt a two-stream pipeline: features are extracted separately (in two streams) from each of the two input proteins, and the two feature vectors are then concatenated and fed into a classifier that captures the relationship between the proteins. Because the input proteins in the two streams may originate from different species in different datasets, this design poses a great challenge to the generalization ability of the feature extractor. Moreover, the two-stream framework can model the relationship between the proteins of a pair only at the classification stage. Common classifiers essentially aggregate features through weighted combinations, and this aggregation capability is relatively limited; the fitting capacity of the classifier is insufficient to comprehensively mine the complex interaction patterns between protein pairs.

To address these challenges, we propose HBFormer, a single-stream framework for identifying human-virus PPIs. The predictor is built on a hybrid attention mechanism and a multimodal feature fusion strategy for proteins. Protein biological annotation features are tokenized so that they can interact with sequence features, comprehensively characterizing the intrinsic relationship between protein pairs and improving prediction performance. Meanwhile, the hybrid-attention Transformer bridges the bidirectional information flows between human and viral proteins, thereby unifying joint feature learning and relation modeling of protein pairs. Experimental results indicate that HBFormer outperforms other state-of-the-art methods for human-virus PPI identification.

2 Materials and methods

2.1 Overview of HBFormer

As shown in Fig. 1, the HBFormer framework comprises three main parts: a sequence embedding and feature extraction module, a hybrid attention module based on multimodal features, and an interaction prediction module.

Figure 1.

The framework of HBFormer. (a) Sequence embedding and feature extraction module. The pre-trained protein language model ProtT5-XL-Uniref50 and decoupled positional encodings are applied to generate informative sequence feature representations of human and viral proteins. (b) Hybrid attention module based on multimodal features. Protein biological annotation features are tokenized so that they interact with sequence features, comprehensively characterizing the intrinsic relationship between protein pairs. The hybrid attention performs two attention operations: self-attention learns and extracts features from each protein itself, while cross-attention models the relationship between the human protein and the viral protein. This module produces the dual-perceived features of each protein pair. (c) Interaction prediction module. A multilayer perceptron learns hidden representations from the dual-perceived features and calculates the predicted interaction probability.

2.2 Sequence feature representation

Pre-trained protein language models are a powerful paradigm for creating effective embeddings and learning context-aware data representations, and they have been successfully used in various downstream protein understanding tasks (Liu and Tian 2023, Wang et al. 2023). In this study, we adopt ProtT5-XL-Uniref50 (PT5) (Elnaggar et al. 2022) to embed human and viral proteins into more informative sequence representations. PT5 is based on the Text-to-Text Transfer Transformer (T5) (Raffel et al. 2020) architecture and is pre-trained in a self-supervised manner through masked language modeling. It not only captures complex patterns in sequence contexts but also generates different embeddings for the same amino acid depending on its context. The model comprises a 24-layer encoder–decoder with 3B parameters. Its pretext task is to reconstruct masked tokens (amino acids) in the input, which trains a more powerful feature encoder. PT5 is first trained on the BFD (Steinegger and Söding 2018) protein sequence corpus, which contains 393 billion amino acids, and then fine-tuned on the UniRef50 (Suzek et al. 2015) dataset. In our implementation, the weights of the fine-tuned PT5 model are frozen and used only to encode sequences, so each amino acid in a human or viral protein sequence is represented by a 1024-dimensional embedding vector.
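The snippet below is a minimal sketch of obtaining per-residue PT5 embeddings with the Hugging Face `transformers` library, assuming the public `Rostlab/prot_t5_xl_uniref50` checkpoint and the input conventions of the ProtTrans examples (rare residues U, Z, O and B mapped to X, residues separated by spaces). The helper names are hypothetical and the HBFormer code may prepare its inputs differently; `embed_per_residue` needs `torch`, `transformers` and `sentencepiece` plus a large model download, so only the preprocessing step runs without them.

```python
import re

def prepare_for_prott5(sequence: str) -> str:
    """Format a protein sequence as in the ProtTrans examples:
    rare amino acids (U, Z, O, B) become X and residues are space-separated."""
    return " ".join(re.sub(r"[UZOB]", "X", sequence.upper()))

def embed_per_residue(sequences, model_name="Rostlab/prot_t5_xl_uniref50"):
    """Hypothetical helper: return one (len(seq), 1024) array per sequence
    from the frozen PT5 encoder. Requires torch, transformers and sentencepiece."""
    import torch
    from transformers import T5EncoderModel, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained(model_name, do_lower_case=False)
    model = T5EncoderModel.from_pretrained(model_name).eval()  # inference only
    batch = tokenizer([prepare_for_prott5(s) for s in sequences],
                      add_special_tokens=True, padding="longest",
                      return_tensors="pt")
    with torch.no_grad():  # weights stay frozen; no gradients are tracked
        hidden = model(input_ids=batch["input_ids"],
                       attention_mask=batch["attention_mask"]).last_hidden_state
    # Keep one row per residue: drop the trailing </s> token and any padding.
    return [hidden[i, :len(s)].cpu().numpy() for i, s in enumerate(sequences)]

print(prepare_for_prott5("MKUVB"))  # M K X V X
```

Only the encoder is loaded because per-residue features come from the encoder's last hidden state; slicing to `len(s)` discards the end-of-sequence token and padding, leaving one 1024-dimensional row per amino acid.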

We denote each human protein sequence embedded by PT5 as [a^h_1, a^h_2, a^h_3, ..., a^h_m] and, similarly, each viral protein sequence as [a^v_1, a^v_2, a^v_3, ..., a^v_n], where m and n are the lengths of the human and viral protein sequences, respectively, and a denotes an amino acid encoding. Next, CNNs with shared weights project the sequence embeddings of the human and viral proteins from the initial dimension into a D-dimensional latent space. Meanwhile, we construct decoupled learnable positional encodings Ph and Pv and add them to the corresponding embeddings. This step is necessary because the subsequent attention mechanism is essentially an order-independent set operation, so explicit positional encoding is required to preserve and distinguish the information of amino acid residues at different positions. Moreover, because protein sequences vary in length, proximal (nearest-neighbor) interpolation is used to resize the learnable positional embedding matrix along its sequence dimension N to match the length of each protein sequence. The final sequence feature representations of the human and viral proteins are as follows:
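$$F^{seq}_{hum} = \mathrm{CNN}\left([a^h_1, a^h_2, \ldots, a^h_m]\right) + \tau(P_h) \in \mathbb{R}^{m \times D} \tag{1}$$
$$F^{seq}_{vir} = \mathrm{CNN}\left([a^v_1, a^v_2, \ldots, a^v_n]\right) + \tau(P_v) \in \mathbb{R}^{n \times D} \tag{2}$$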
where τ(·) is the proximal interpolation and D is set to 360.
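To make Eqs (1) and (2) concrete, here is a NumPy sketch of the shared-weight projection and the decoupled positional encodings. The kernel size K = 3, the base table length N = 128 and the random inputs are illustrative assumptions, not values reported for HBFormer. τ is implemented as nearest-neighbor lookup with the index rule floor(i · N / length), which to our understanding is the rule PyTorch applies for `mode='nearest'`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
D = 360    # latent dimension used in the text
N = 128    # hypothetical length of each learnable positional table
K = 3      # hypothetical convolution kernel size

def conv1d_same(x, W, b):
    """1-D convolution along the residue axis with 'same' padding.
    x: (L, C_in), W: (K, C_in, C_out), b: (C_out,) -> (L, C_out)."""
    k = W.shape[0]
    left = k // 2
    xp = np.pad(x, ((left, k - 1 - left), (0, 0)))
    windows = sliding_window_view(xp, k, axis=0)          # (L, C_in, k)
    return np.einsum("lck,kcd->ld", windows, W, optimize=True) + b

def tau(P, length):
    """Proximal (nearest-neighbor) interpolation of an (N, D) table to (length, D)."""
    idx = np.floor(np.arange(length) * P.shape[0] / length).astype(int)
    return P[idx]

# One CNN shared by both proteins; one positional table per protein type (decoupled).
W_cnn = rng.normal(scale=0.02, size=(K, 1024, D))
b_cnn = np.zeros(D)
P_h = rng.normal(scale=0.02, size=(N, D))
P_v = rng.normal(scale=0.02, size=(N, D))

def sequence_features(E, P):
    """E: (L, 1024) PT5 embeddings -> (L, D) sequence features, cf. Eqs (1)-(2)."""
    return conv1d_same(E, W_cnn, b_cnn) + tau(P, E.shape[0])

E_h = rng.normal(size=(250, 1024))  # stand-in for a 250-residue human protein
E_v = rng.normal(size=(90, 1024))   # stand-in for a 90-residue viral protein
F_h, F_v = sequence_features(E_h, P_h), sequence_features(E_v, P_v)
print(F_h.shape, F_v.shape)  # (250, 360) (90, 360)
```

Sharing `W_cnn` keeps human and viral residues in one latent space, while separate `P_h` and `P_v` let the model learn position patterns specific to each protein type.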

2.3 Biological annotation feature representation

In this section, we first use one-hot encoding to convert biological annotation information into numerical features. The biological annotation data of human and viral proteins are collected from the UniProt (Consortium 2019) database, whose keyword entries fall into six major categories: biological process, cellular component, disease, domain, molecular function, and PTM. Assuming a total of k entries across human and viral proteins, the biological annotation feature of a protein can be denoted as follows:
$$F_{bio} = [e_1, e_2, \ldots, e_k], \quad e_i \in \{0, 1\} \tag{3}$$
where ei represents the ith biological annotation property: ei = 1 if the human or viral protein has this property, and ei = 0 otherwise.
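A small sketch of the binary encoding in Eq. (3), assuming each protein's UniProt keywords are available as a set of strings. The keyword names are illustrative and `encode_keywords` is a hypothetical helper; because a protein can carry several keywords, a row may contain more than one 1.

```python
import numpy as np

def encode_keywords(protein_keywords, vocabulary):
    """Binary annotation matrix.
    protein_keywords: list of keyword sets, one per protein.
    vocabulary: ordered list of the k keyword entries.
    Returns E of shape (n_proteins, k) with E[p, i] = 1 iff protein p has entry i."""
    column = {kw: i for i, kw in enumerate(vocabulary)}
    E = np.zeros((len(protein_keywords), len(vocabulary)), dtype=np.float32)
    for p, keywords in enumerate(protein_keywords):
        for kw in keywords:
            if kw in column:          # entries outside the vocabulary are ignored
                E[p, column[kw]] = 1.0
    return E

# Illustrative keyword sets for two proteins.
proteins = [{"Apoptosis", "Membrane", "Phosphoprotein"},
            {"Membrane", "Host-virus interaction"}]
vocab = sorted(set().union(*proteins))
E = encode_keywords(proteins, vocab)
print(vocab)
print(E)
```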

Because the biological annotation data encompass a large number of entries, the resulting features are high-dimensional. We therefore apply principal component analysis to reduce the dimensionality of the biological annotation features and to mitigate potential noise. For both human and viral proteins, we retain 98% of the information (variance) in the biological annotation features.
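The PCA step can be sketched with NumPy's SVD, assuming that "98% of the information" means 98% of the explained variance; scikit-learn's `PCA(n_components=0.98)` applies a similar selection rule. The toy binary matrix below is random and only stands in for the real annotation features.

```python
import numpy as np

def pca_keep_variance(X, variance_to_keep=0.98):
    """Project X (n_samples, k) onto the fewest principal components whose
    cumulative explained-variance ratio reaches `variance_to_keep`."""
    Xc = X - X.mean(axis=0)                      # PCA operates on centered data
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = np.cumsum(S**2) / np.sum(S**2)       # cumulative explained variance
    n = min(int(np.searchsorted(ratio, variance_to_keep)) + 1, len(S))
    return Xc @ Vt[:n].T, n

rng = np.random.default_rng(0)
# Toy binary annotation matrix: 300 proteins x 50 keyword entries.
X = (rng.random((300, 50)) < 0.1).astype(float)
Z, n_components = pca_keep_variance(X)
print(X.shape, "->", Z.shape)
```

In practice the projection learned on the training proteins would be reused, unchanged, for the test proteins to avoid information leakage.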

2.4 Transformer based on hybrid attention mechanism with multimodal features

Attention is a highly flexible architectural component with dynamic and global modeling capacity, and it has been successfully applied across diverse domains, including natural language processing and image recognition (Vaswani et al. 2017, Dosovitskiy et al. 2021). Inspired by this, we propose a single-stream Transformer framework whose core objective is to use the hybrid attention mechanism to bridge the bidirectional information flows between human and viral proteins while performing feature extraction and relation modeling simultaneously. Here, the hybrid attention performs two attention operations: self-attention learns and extracts features from each protein itself, while cross-attention models the mutual interaction patterns between the human protein and the viral protein. Below, we derive the computation of the hybrid attention module and analyze why it achieves both goals. The input of the hybrid attention module is the concatenation of Fhum, Fvir, and the CLS token, denoted as [Fhum; Fvir; CLS], where Fhum is the concatenation of the sequence feature and the biological annotation feature of the human protein, and Fvir is defined analogously for the viral protein. The CLS (class) token is a special learnable token that aggregates information from the other tokens and provides a global feature representation. For simplicity, we omit the class token in the derivations. The output of the attention operation can be calculated as:
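$$\mathrm{Attention}(Q,K,V) = \mathrm{Softmax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V, \quad Q = \begin{bmatrix} Q_{hum} \\ Q_{vir} \end{bmatrix},\; K = \begin{bmatrix} K_{hum} \\ K_{vir} \end{bmatrix},\; V = \begin{bmatrix} V_{hum} \\ V_{vir} \end{bmatrix} \tag{4}$$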
where Q, K, and V denote the query, key, and value matrices, respectively, and the subscripts hum and vir denote the blocks corresponding to the human protein and the viral protein. Expanding the attention weights in Equation (4) gives:
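$$\mathrm{Softmax}\left(\frac{1}{\sqrt{d_k}}\begin{bmatrix} Q_{hum}K_{hum}^{\top} & Q_{hum}K_{vir}^{\top} \\ Q_{vir}K_{hum}^{\top} & Q_{vir}K_{vir}^{\top} \end{bmatrix}\right) = \begin{bmatrix} Y_{hum\_hum} & Y_{hum\_vir} \\ Y_{vir\_hum} & Y_{vir\_vir} \end{bmatrix} \tag{5}$$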
where Yvir_hum and Yhum_vir denote the weights that quantify the mutual relation patterns between the human and viral proteins (relation modeling), while Yvir_vir and Yhum_hum denote the weights that aggregate features within each protein (feature extraction). Note that the attention mechanism assigns these weights dynamically during information aggregation: the more critical a feature is for the interaction, the higher its corresponding weight in the attention map. The output can then be calculated as follows:
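$$\mathrm{Attention}(Q,K,V) = \begin{bmatrix} Y_{hum\_hum}V_{hum} + Y_{hum\_vir}V_{vir} \\ Y_{vir\_hum}V_{hum} + Y_{vir\_vir}V_{vir} \end{bmatrix} \tag{6}$$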
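The derivation in Eqs (4) to (6) can be checked numerically. The sketch below, with arbitrary toy sizes and random matrices, computes one attention map over the concatenated human and viral tokens and then splits it into four blocks, using the convention (our assumption) that rows index queries and columns index keys. Recombining the blocks reproduces the joint attention output exactly.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract row max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
m, n, d_k = 6, 4, 8                  # human tokens, viral tokens, key dimension
Q = rng.normal(size=(m + n, d_k))    # rows 0..m-1: human tokens, rows m..: viral tokens
K = rng.normal(size=(m + n, d_k))
V = rng.normal(size=(m + n, d_k))

# Joint attention over the concatenated sequence, Eq. (4).
A = softmax(Q @ K.T / np.sqrt(d_k))
out_joint = A @ V

# Block view of the same attention map, Eqs (5)-(6).
Y_hum_hum, Y_hum_vir = A[:m, :m], A[:m, m:]
Y_vir_hum, Y_vir_vir = A[m:, :m], A[m:, m:]
V_hum, V_vir = V[:m], V[m:]
out_hum = Y_hum_hum @ V_hum + Y_hum_vir @ V_vir   # feature extraction + relation modeling
out_vir = Y_vir_hum @ V_hum + Y_vir_vir @ V_vir
print(np.allclose(np.vstack([out_hum, out_vir]), out_joint))  # True
```

Because the softmax normalizes each row across both blocks, Y_hum_hum and Y_hum_vir compete for the same probability mass: a human token that attends more strongly to viral residues necessarily attends less to its own protein. This coupling is what lets a single attention operation perform feature extraction and relation modeling at the same time.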
We further extend this attention operation to multiple attention heads, which lets the model consider various attention distributions and focus on different aspects of the multimodal protein features:
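$$\mathrm{MultiHead}(Q,K,V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)W^{O}, \quad \mathrm{head}_i = \mathrm{Attention}(QW_i^{Q}, KW_i^{K}, VW_i^{V}) \tag{7}$$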
where WQ, WK, WV, and WO are learnable linear projection matrices, and the number of attention heads h is set to 12. After the attention operation, the output of the multi-head attention mechanism, denoted as Z, is processed as follows:
(8)
(9)
(10)
where FFN(·) and LN(·) denote the feed-forward layer and layer normalization, respectively, and X is [Fhum; Fvir; CLS]. This yields the dual-perceived feature Z of the human-virus protein pair.
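For orientation, the sketch below assembles one encoder layer over [Fhum; Fvir; CLS] with D = 360 and h = 12 heads (30 dimensions per head). Several details are assumptions rather than reported choices: a pre-norm residual layout (Eqs (8) to (10) may order LN, attention and FFN differently), a feed-forward width of 4D, the tanh approximation of GELU, and a single annotation token appended to each protein's residue tokens.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 360, 12                     # model width and number of heads from the text
d_h = D // H                       # 30 dimensions per head

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    # Learnable gain and bias omitted for brevity.
    mu = x.mean(-1, keepdims=True)
    return (x - mu) / np.sqrt(x.var(-1, keepdims=True) + eps)

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def init(*shape):
    return rng.normal(scale=0.02, size=shape)

params = {"Wq": init(D, D), "Wk": init(D, D), "Wv": init(D, D), "Wo": init(D, D),
          "W1": init(D, 4 * D), "b1": np.zeros(4 * D),
          "W2": init(4 * D, D), "b2": np.zeros(D)}

def multi_head(X, p):
    """Eq. (7): project, split into H heads, attend over all tokens, merge."""
    T = X.shape[0]
    split = lambda M: M.reshape(T, H, d_h).transpose(1, 0, 2)    # (H, T, d_h)
    Q, K, V = split(X @ p["Wq"]), split(X @ p["Wk"]), split(X @ p["Wv"])
    heads = softmax(Q @ K.transpose(0, 2, 1) / np.sqrt(d_h)) @ V  # (H, T, d_h)
    return heads.transpose(1, 0, 2).reshape(T, D) @ p["Wo"]

def encoder_block(X, p):
    """Assumed pre-norm residual layout around attention and FFN."""
    Z = X + multi_head(layer_norm(X), p)
    return Z + gelu(layer_norm(Z) @ p["W1"] + p["b1"]) @ p["W2"] + p["b2"]

# Hypothetical inputs: residue tokens plus one projected annotation token per protein.
F_hum = np.vstack([init(250, D), init(1, D)])   # 250 residues + 1 annotation token
F_vir = np.vstack([init(90, D), init(1, D)])    # 90 residues + 1 annotation token
cls = init(1, D)                                # learnable class token
X = np.vstack([F_hum, F_vir, cls])              # [F_hum; F_vir; CLS]
Z = encoder_block(X, params)
print(Z.shape)  # (343, 360)
```

Since every token attends to every other token in one sequence, the human, viral and CLS representations are updated jointly in a single stream; stacking several such layers would give the dual-perceived feature Z.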

2.5 Prediction by multi-layer perceptron

After extracting the dual-perceived features of protein pairs with the hybrid attention mechanism, we build the prediction module on a multilayer perceptron. The probability score output by the last fully connected layer is used to estimate whether the two proteins interact. The multilayer perceptron is formulated as:
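$$h_0 = \mathrm{Flatten}(\mathrm{GAP}(Z)) \tag{11}$$
$$h_1 = \sigma_1(W_1 h_0 + b_1) \tag{12}$$
$$\hat{y} = \sigma_2(W_2 h_1 + b_2) \tag{13}$$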
where GAP(·) is the global average pooling operation, which provides a compact feature representation, and Flatten(·) denotes the flatten operator. For the ith layer, Wi denotes the weight matrix, bi the bias vector, and σi the activation function. In our study, σ1 is the GELU function and σ2(xi) = 1/(1 + e^(−xi)) is the sigmoid function. Furthermore, the positive and negative samples in the dataset are extremely imbalanced, which may bias the model's predictions toward the majority class. To alleviate this problem, we use the Focal Loss (Lin et al. 2017) to supervise model training. As an improvement on binary cross entropy, its core idea is to down-weight the loss assigned to easily classified examples during training and to focus more on hard-to-classify examples. This enhances the model's ability to discriminate between classes and makes the predictions more robust and reliable. The focal loss function is defined as:
$$\mathrm{FL}(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t) \tag{14}$$
where pt is the probability that the model predicts for the correct class, αt ∈ (0, 1) is the balancing factor that adjusts the importance of positive and negative samples, and γ > 0 is the focusing parameter that down-weights easy examples.
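The prediction head and the loss can be sketched in NumPy as follows. The pooling axis (averaging over tokens), the hidden width of 128, and the values α = 0.25 and γ = 2 (the defaults suggested by Lin et al. 2017) are assumptions; the settings used for HBFormer are not stated in this section.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict(Z, W1, b1, W2, b2):
    """Hypothetical head for Eqs (11)-(13): average the T token features (GAP),
    then a GELU hidden layer and a sigmoid output. Z: (T, D) -> probability."""
    h0 = Z.mean(axis=0).ravel()          # GAP over tokens, flattened to (D,)
    h1 = gelu(W1 @ h0 + b1)              # sigma_1 = GELU
    return float(sigmoid(W2 @ h1 + b2))  # sigma_2 = sigmoid

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss, Eq. (14): FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t).
    p: predicted probability of the positive class; y: labels in {0, 1}."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    y = np.asarray(y)
    p_t = np.where(y == 1, p, 1 - p)               # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)   # class-balancing weight
    return np.mean(-alpha_t * (1 - p_t) ** gamma * np.log(p_t))

rng = np.random.default_rng(0)
D, hidden = 360, 128                               # hidden width is an assumption
p = predict(rng.normal(size=(343, D)),
            rng.normal(scale=0.05, size=(hidden, D)), np.zeros(hidden),
            rng.normal(scale=0.05, size=hidden), 0.0)

# An easy positive (p = 0.95) contributes far less loss than a hard one (p = 0.3).
print(focal_loss([0.95], [1]), focal_loss([0.3], [1]))
```

With γ = 0 and α = 0.5 the focal loss reduces to half the binary cross entropy; increasing γ shrinks the contribution of confidently correct examples relative to hard ones.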

3 Results

3.1 Dataset

  • Benchmark dataset

     To measure the performance of HBFormer in identifying human-virus PPIs, we adopt the high-quality benchmark dataset assembled by Tsukiyama et al. (2021). The positive samples comprise 22 383 experimentally validated PPIs involving 5882 human proteins and 996 viral proteins, each with a confidence score greater than 0.3. All proteins are non-redundant, composed of standard amino acids, and 30 to 1000 residues long. Negative samples are constructed with the dissimilarity negative sampling method (Eid et al. 2016), which effectively mitigates the impact of noisy negatives that resemble positive samples. The ratio of positive to negative samples in this benchmark dataset is 1:10. All samples are divided into a training set (used for cross-validation) and an independent test set at a ratio of 8:2.

  • Specific types of human-virus PPI dataset

     To evaluate HBFormer more comprehensively, we also use the dataset from the TransPPI work (Yang et al. 2021). The TransPPI dataset contains experimentally validated interactions between human proteins and the proteins of eight specific viruses: HIV, Herpes, Papilloma, Influenza, Hepatitis, Dengue, Zika, and SARS-CoV-2, with 9880, 5966, 5099, 3044, 1300, 927, 709, and 568 positive samples, respectively. As in the benchmark dataset, negative samples are obtained by the dissimilarity negative sampling method at a positive-to-negative ratio of 1:10.

  • Non-viral pathogen PPI dataset

     To investigate the applicability of our method to other types of pathogens, we use the interaction datasets from Kösesoy et al. (2019), which include PPIs between human and Bacillus anthracis and between human and Yersinia pestis. The Human-Bacillus anthracis PPI dataset contains 3090 positive and 9500 negative interactions, and the Human-Yersinia pestis PPI dataset contains 4097 positive and 12 500 negative samples. Negative samples for both datasets are generated by random sampling.

3.2 Comparison with state-of-the-art methods

To verify the effectiveness of our model, we compare HBFormer with five other state-of-the-art human-virus PPI identification methods on the benchmark dataset. As shown in Table 1, HBFormer with the multimodal feature fusion strategy surpasses all competing methods on every metric. When HBFormer uses only sequence features, it still achieves the highest AUC and ties the best AUPRC (0.948, equal to STEP), suggesting that the single-stream framework identifies positive examples effectively even without annotation features. The AUPRC of HBFormer with multimodal features is 23.5 percentage points higher than that of DeepViral (Liu-Wei et al. 2021), and HBFormer also attains higher MCC and F1 scores, metrics that are more informative than accuracy under class imbalance. Moreover, HBFormer clearly outperforms the sequence-based approaches of Yang et al. (2020), LSTM-PHV (Tsukiyama et al. 2021), TransPPI (Yang et al. 2021), and STEP (Madan et al. 2022), with AUPRC improvements of 18.2, 5.3, 4.9, and 4.4 percentage points, respectively. We attribute this to two key factors. First, we incorporate comprehensive protein characterization that includes not only sequences but also multifaceted biological annotation information, giving the model multidimensional insights into proteins. Second, our single-stream framework performs additional relation modeling between the human and viral proteins before classification, which improves the model's ability to discriminate interaction patterns.
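Because the benchmark is imbalanced (1:10), AUPRC, F1 and MCC separate the methods in Table 1 more clearly than accuracy or AUC do. The sketch below implements these metrics in NumPy under their standard definitions; `average_precision` is a step-wise AUPRC estimate that assumes no tied scores, which may differ slightly from the estimator used to produce Table 1.

```python
import numpy as np

def confusion(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return tp, tn, fp, fn

def accuracy(y_true, y_pred):
    tp, tn, fp, fn = confusion(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)

def f1(y_true, y_pred):
    tp, _, fp, fn = confusion(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(y_true, y_pred):
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return (tp * tn - fp * fn) / denom if denom else 0.0   # 0 by convention

def average_precision(y_true, scores):
    """Step-wise AUPRC estimate: mean precision at the rank of each positive."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    y = np.asarray(y_true)[order]
    precision_at_k = np.cumsum(y) / np.arange(1, len(y) + 1)
    return float(np.sum(precision_at_k * y) / np.sum(y))

# At a 1:10 ratio, predicting "no interaction" for every pair looks accurate but is useless.
y = [1] + [0] * 10
always_negative = [0] * 11
print(accuracy(y, always_negative), f1(y, always_negative), mcc(y, always_negative))
```

The all-negative predictor reaches about 0.909 accuracy yet scores zero on F1 and MCC, illustrating why these metrics are more informative on imbalanced PPI data.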

Table 1.

Comparison of different methods on the benchmark dataset.

Method          AUC     AUPRC   ACC     F1      MCC
DeepViral       0.981   0.757   0.952   0.688   0.674
Yang            0.963   0.810   0.947   0.724   0.697
LSTM-PHV        0.976   0.939   0.984   0.910   0.903
TransPPI        0.982   0.943   0.984   0.896   0.886
STEP            0.983   0.948   0.983   0.908   0.902
HBFormer (a)    0.985   0.948   0.984   0.907   0.898
HBFormer (b)    0.998   0.992   0.993   0.962   0.958

(a) Based on single-modal (sequence) features.
(b) Based on multimodal (sequence + biological annotation) features.

3.3 Comparison across specific types of PPI datasets

To further assess the predictive ability of HBFormer on specific types of human-virus PPIs, we compare it with the three other methods that performed best on the benchmark dataset (TransPPI, LSTM-PHV, and STEP). Table 2 shows the results of 5-fold cross-validation. HBFormer achieves the highest AUC on all eight human-virus PPI datasets (tied with STEP on HIV), with the largest margins on the smaller datasets (Hepatitis, Dengue, Zika, and SARS-CoV-2). Its AUPRC is slightly lower than that of STEP (Madan et al. 2022) on the Herpes and SARS-CoV-2 datasets, ties STEP on HIV, and is the highest on the remaining five datasets. These results suggest that our single-stream model performs robustly and reliably uncovers interaction patterns, which is important for advancing the identification of human-virus PPIs.

Table 2.

Comparison results across specific types of human-virus PPI datasets.

Dataset           Method    AUC     AUPRC   ACC
Human-HIV         TransPPI  0.995   0.974   0.986
                  LSTM-PHV  0.994   0.959   0.981
                  STEP      0.996   0.976   0.988
                  HBFormer  0.996   0.976   0.989
Human-Herpes      TransPPI  0.941   0.768   0.952
                  LSTM-PHV  0.936   0.722   0.935
                  STEP      0.951   0.802   0.956
                  HBFormer  0.961   0.781   0.954
Human-Papilloma   TransPPI  0.959   0.818   0.959
                  LSTM-PHV  0.954   0.749   0.942
                  STEP      0.962   0.823   0.957
                  HBFormer  0.975   0.835   0.961
Human-Influenza   TransPPI  0.964   0.834   0.961
                  LSTM-PHV  0.960   0.781   0.948
                  STEP      0.959   0.843   0.958
                  HBFormer  0.965   0.849   0.962
Human-Hepatitis   TransPPI  0.917   0.636   0.934
                  LSTM-PHV  0.908   0.559   0.920
                  STEP      0.901   0.639   0.926
                  HBFormer  0.935   0.685   0.940
Human-Dengue      TransPPI  0.917   0.636   0.934
                  LSTM-PHV  0.891   0.457   0.905
                  STEP      0.924   0.638   0.933
                  HBFormer  0.958   0.683   0.940
Human-Zika        TransPPI  0.926   0.746   0.954
                  LSTM-PHV  0.867   0.571   0.924
                  STEP      0.937   0.757   0.949
                  HBFormer  0.951   0.802   0.953
Human-SARS-CoV-2  TransPPI  0.805   0.329   0.906
                  LSTM-PHV  0.776   0.253   0.869
                  STEP      0.837   0.428   0.911
                  HBFormer  0.879   0.424   0.895

3.4 Comparison with popular embedding methods

To probe how well different embedding methods characterize protein sequences, we train our model with four other widely used encoding schemes. To ensure an unbiased evaluation, HBFormer uses only sequence features in this experiment. For the static embedding methods Word2vec (Mikolov et al. 2013) and FastText (Joulin et al. 2016), we additionally fine-tune the models with the gensim library for a more comprehensive comparison. The results in Table 3 show that HBFormer with PT5 embeddings outperforms the other encoding approaches. Specifically, without any fine-tuning on the benchmark dataset, PT5 has a significant AUPRC advantage over Word2vec and FastText. Fine-tuning on the dataset markedly improves the performance obtained with these static embeddings, but it remains inferior to that of the self-supervised PT5. We attribute this to PT5's ability to learn context-specific semantic information, which yields robust representations enriched with global contextual information and long-distance dependencies. Compared with ESM-1B (Rives et al. 2021), another pre-trained protein language model based on self-supervised training, PT5 performs better on all metrics, indicating that it produces more informative sequence representations. Furthermore, the AUC and AUPRC obtained with PT5 are 2.9 and 12.7 percentage points higher than those obtained with the Bepler embedding (Bepler and Berger 2019), which uses global structural similarity between proteins for weakly supervised embedding training, further illustrating the expressive power of PT5-derived sequence features.

Table 3.

Comparison results of different embedding methods.

Methods         AUC     AUPRC   ACC     F1      MCC
Word2vec        0.890   0.519   0.889   0.526   0.482
Word2vec (a)    0.971   0.926   0.978   0.878   0.867
FastText        0.907   0.580   0.889   0.550   0.515
FastText (a)    0.974   0.924   0.979   0.873   0.861
ESM-1B          0.949   0.761   0.924   0.656   0.639
Bepler          0.956   0.821   0.944   0.726   0.700
PT5             0.985   0.948   0.984   0.907   0.898

(a) Fine-tuned on the benchmark dataset.

3.5 Ablation study

To investigate the contribution of the hybrid attention mechanism, we test the performance of HBFormer with its self-attention module or cross-attention module removed. In this section, we train HBFormer using only sequence features to eliminate potential biases introduced by biological annotation features. The results of the ablation study on the independent test set are shown in Fig. 2a. The performance of HBFormer decreases when either attention component is removed. This indicates that both self-attention and cross-attention contribute to interaction identification. The drop is larger when cross-attention is removed, possibly because cross-attention provides additional relation modeling between the two proteins and thus mines their relationships more effectively. Notably, HBFormer reduces to a two-stream framework when only self-attention is used, because the Transformer then performs separate feature extraction for each of the two proteins. This further suggests that the single-stream framework outperforms the two-stream framework on the PPI identification task.
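This section does not detail how HBFormer wires its self-attention and cross-attention modules. One simple way to picture the three ablation settings, assumed here purely for illustration, is a single attention layer over the concatenated human and viral tokens with a block mask: keeping only the within-protein blocks behaves like a two-stream model, keeping only the between-protein blocks gives cross-attention alone, and keeping all blocks gives joint (hybrid) attention. The functions `block_mask` and `masked_attention` are hypothetical, and learned query/key/value projections are omitted to keep the sketch small.

```python
import numpy as np


def block_mask(n_h: int, n_v: int, mode: str) -> np.ndarray:
    """Boolean (n, n) mask over n = n_h + n_v concatenated tokens.

    Tokens [0, n_h) belong to the human protein and [n_h, n) to the viral
    protein; True means the query (row) may attend to the key (column).
      'self'   -> only within-protein blocks (behaves like a two-stream model)
      'cross'  -> only between-protein blocks
      'hybrid' -> all four blocks (joint single-stream attention)
    """
    n = n_h + n_v
    is_human = np.arange(n) < n_h
    same_protein = is_human[:, None] == is_human[None, :]
    if mode == "self":
        return same_protein
    if mode == "cross":
        return ~same_protein
    if mode == "hybrid":
        return np.ones((n, n), dtype=bool)
    raise ValueError(f"unknown mode: {mode!r}")


def masked_attention(x: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product attention with Q = K = V = x.

    Disallowed query-key pairs get a score of -inf and therefore zero weight.
    """
    d = x.shape[-1]
    scores = np.where(mask, x @ x.T / np.sqrt(d), -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_h, n_v, d = 4, 3, 8
    tokens = rng.normal(size=(n_h + n_v, d))  # toy human + viral token embeddings
    shifted = tokens.copy()
    shifted[n_h:] += 1.0                      # change only the viral tokens

    for mode in ("self", "cross", "hybrid"):
        mask = block_mask(n_h, n_v, mode)
        before = masked_attention(tokens, mask)[:n_h]
        after = masked_attention(shifted, mask)[:n_h]
        # True only in 'self' mode: human outputs then ignore the viral protein.
        print(mode, np.allclose(before, after))
```

The 'self' case makes the two-stream argument concrete: the human-protein outputs are unchanged when the viral protein changes, so any relation modeling must happen later, in the classifier.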

Figure 2.

Ablation studies. (a) Performance comparison using different attention mechanisms. (b) Attention entropy distribution across all attention heads using different attention mechanisms.

We further analyze the information aggregation capability of the different attention modules. Specifically, we introduce the attention entropy metric to evaluate the attention maps, which are computed from the compatibility of queries and keys via dot products. The attention entropy is defined as follows:
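$$\mathrm{Ent}(A_h)_i = -\sum_{j=1}^{l} \mathrm{softmax}(A_h)_{ij}\,\log \mathrm{softmax}(A_h)_{ij} \tag{15}$$

where $A_h$ is the attention map for the $h$th attention head, $i$ and $j$ index the $i$th and $j$th tokens, $l$ is the number of tokens, and $\mathrm{softmax}(x_i)=e^{x_i}/\sum_{j} e^{x_j}$ is applied over the keys in each row of $A_h$. For the $h$th head, the average attention entropy across all tokens can be calculated as:

$$\mathrm{Ent}(A_h) = \frac{1}{l}\sum_{i=1}^{l} \mathrm{Ent}(A_h)_i \tag{16}$$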
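The attention entropy in Eqs (15) and (16) can be computed in a few lines. The sketch below assumes, following the definitions above, that $A_h$ holds the raw query-key dot products of each head and that the softmax runs over the keys of each query row; the function names are illustrative and not taken from the HBFormer code. A uniform row gives the maximum entropy $\log l$ (broad focus), and a one-hot row gives roughly 0 (concentrated focus).

```python
import numpy as np


def row_softmax(x: np.ndarray) -> np.ndarray:
    """Softmax over the last axis (the keys), stabilised by subtracting the max."""
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def mean_attention_entropy(logits: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Average attention entropy for each head.

    logits: shape (H, l, l), query-key dot products of H heads over l tokens.
    Each query row is normalised with a softmax, its entropy is computed
    (Eq. 15), and row entropies are averaged over the l tokens (Eq. 16).
    Returns shape (H,); values range from about 0 (one-hot rows) to log(l)
    (uniform rows).
    """
    probs = row_softmax(logits)
    per_token = -(probs * np.log(probs + eps)).sum(axis=-1)  # shape (H, l)
    return per_token.mean(axis=-1)                            # shape (H,)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_heads, l = 12, 50  # 12 heads as in the text; l is an arbitrary toy length
    # Hypothetical logits: a larger scale yields sharper, lower-entropy heads.
    scales = np.linspace(0.1, 5.0, n_heads)[:, None, None]
    logits = scales * rng.normal(size=(n_heads, l, l))
    ent = mean_attention_entropy(logits)
    print(ent.shape)                        # (12,)
    print(ent.min(), ent.max(), np.log(l))  # spread across heads vs. the log(l) bound
```

Plotting the 12 per-head values for each model variant gives a distribution like the one summarized in Fig. 2b; a wider spread means some heads attend broadly while others focus on a few tokens.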

In this study, we use 12 attention heads in the multi-head attention mechanism. The distribution of average attention entropy across all attention heads is shown in Fig. 2b. HBFormer with the hybrid attention mechanism exhibits a broader distribution of attention entropy than the models using only self-attention or only cross-attention. This suggests that the attention heads take on different specialized roles, allowing the model to aggregate both local and global tokens with both concentrated and broad focus. These results indicate that the hybrid attention mechanism plays a key role in our single-stream framework, facilitating the discrimination between interacting and non-interacting protein pairs.

3.6 Identification of non-viral pathogen PPIs

To further assess the generalizability of our proposed framework, we apply HBFormer to identify interactions between human proteins and proteins of non-viral pathogens. We perform 5-fold cross-validation on the Human-Bacillus anthracis (Human-B) and Human-Yersinia pestis (Human-Y) PPI datasets and average the performance metrics of the five fold models to obtain more reliable estimates. The results are presented in Table 4. HBFormer still performs well on both cross-species datasets, with AUC, AUPRC, and ACC all above 0.9, suggesting that HBFormer can also be effectively extended to PPI identification between humans and other types of pathogens.
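The section does not say how the five folds were drawn. A stratified split, assumed here, keeps the positive-to-negative ratio the same in every fold, which matters for imbalanced PPI data. The sketch below deals the shuffled indices of each class round-robin across folds and then summarizes per-fold metrics with their mean and standard deviation; `stratified_folds` and `summarize` are illustrative names, and the 1:9 label ratio in the usage example is a toy value, not the ratio of the Human-B or Human-Y datasets.

```python
import random
import statistics


def stratified_folds(labels: list[int], k: int = 5, seed: int = 0) -> list[list[int]]:
    """Split sample indices into k folds that keep the class ratio.

    Indices of each class are shuffled and dealt round-robin across folds,
    so every fold receives about 1/k of the positives and 1/k of the negatives.
    """
    rng = random.Random(seed)
    folds: list[list[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def summarize(per_fold: list[dict[str, float]]) -> dict[str, tuple[float, float]]:
    """Mean and sample standard deviation of each metric across folds."""
    return {
        name: (
            statistics.mean([f[name] for f in per_fold]),
            statistics.stdev([f[name] for f in per_fold]),
        )
        for name in per_fold[0]
    }


if __name__ == "__main__":
    labels = [1] * 20 + [0] * 180  # toy 1:9 positive-to-negative ratio
    folds = stratified_folds(labels, k=5)
    for f_idx, test_idx in enumerate(folds):
        n_pos = sum(labels[i] for i in test_idx)
        print(f"fold {f_idx}: {len(test_idx)} samples, {n_pos} positives")  # 40 samples, 4 positives each
```

Each fold serves once as the held-out split while a model is trained on the other four; `summarize` then averages the five sets of metrics.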

Table 4.

Results on non-viral pathogen datasets.

Dataset   AUC    AUPRC  ACC    F1     MCC
Human-B   0.973  0.928  0.949  0.874  0.825
Human-Y   0.961  0.909  0.935  0.853  0.792
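Tables 3 and 4 report ACC, F1, and MCC alongside the threshold-free AUC and AUPRC. The sketch below gives the standard confusion-matrix definitions of the three threshold-dependent metrics; the function name and the counts in the usage example are made up for illustration. AUC and AUPRC are instead computed from ranked prediction scores (for example with scikit-learn's `roc_auc_score` and `average_precision_score`), though this section does not say which implementation the authors used.

```python
import math


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """ACC, F1 and MCC from confusion-matrix counts (positive = interacting pair)."""
    total = tp + fp + tn + fn
    acc = (tp + tn) / total
    f1_den = 2 * tp + fp + fn                  # F1 = 2TP / (2TP + FP + FN)
    f1 = 2 * tp / f1_den if f1_den else 0.0
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0  # 0 by convention when undefined
    return {"ACC": acc, "F1": f1, "MCC": mcc}


if __name__ == "__main__":
    # Made-up counts for 100 pairs with 13 true interactions (not from the paper).
    m = confusion_metrics(tp=8, fp=2, tn=85, fn=5)
    print({k: round(v, 3) for k, v in m.items()})  # {'ACC': 0.93, 'F1': 0.696, 'MCC': 0.664}
```

The example also shows why ACC alone can flatter a model on imbalanced data: the many true negatives keep ACC high while F1 and MCC, which focus on the minority class, are noticeably lower, a pattern also visible between the ACC and F1 columns of both tables.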

4 Discussion

In this paper, we propose HBFormer, a single-stream computational framework based on the hybrid attention mechanism for identifying human-virus PPIs. Unlike other deep learning approaches for PPI identification, our pioneering single-stream framework performs additional relation modeling between human and viral proteins simultaneously with feature extraction, thereby improving the model’s ability to discriminate interaction patterns. Moreover, the multimodal feature fusion strategy is another important contributor to the strong performance of HBFormer, as it provides the model with comprehensive, multidimensional information about proteins. Extensive experiments validate the effectiveness of the proposed framework and demonstrate its extensibility to other types of PPI identification. However, this study has some limitations. The imbalance learning strategy we adopt may not fully resolve the problem of skewed sample distributions in some categories, so further ways to correct the imbalance need to be explored to improve model performance. Furthermore, the Transformer architecture has high computation and memory demands, so insufficient computational resources may limit the application of the model. Using tools such as FlashAttention to relieve these speed and memory bottlenecks is a direction for future work.
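This section does not name the imbalance learning strategy HBFormer uses. The reference list includes focal loss (Lin et al. 2017), a common option for skewed class distributions, so the sketch below assumes a binary focal loss. The defaults alpha = 0.25 and gamma = 2 are the values recommended in the focal loss paper and serve here only as placeholders, not as HBFormer's settings; `binary_focal_loss` is an illustrative name.

```python
import numpy as np


def binary_focal_loss(p: np.ndarray, y: np.ndarray, alpha: float = 0.25,
                      gamma: float = 2.0, eps: float = 1e-7) -> float:
    """Mean binary focal loss, FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).

    p: predicted probability of the positive (interacting) class.
    y: labels in {0, 1}. p_t is p for positives and 1 - p for negatives;
    alpha_t is alpha for positives and 1 - alpha for negatives.
    With gamma = 0 this reduces to alpha-weighted binary cross-entropy.
    """
    p = np.clip(p, eps, 1.0 - eps)          # avoid log(0)
    p_t = np.where(y == 1, p, 1.0 - p)
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))


if __name__ == "__main__":
    y = np.array([1, 1])
    p = np.array([0.95, 0.30])  # one easy and one hard positive example
    for g in (0.0, 2.0):
        per_example = [binary_focal_loss(p[i:i + 1], y[i:i + 1], gamma=g) for i in range(2)]
        print(f"gamma={g}: easy={per_example[0]:.5f}, hard={per_example[1]:.5f}")
```

The factor (1 - p_t)^gamma is what targets imbalance: with gamma = 2 the easy example's loss is scaled by 0.05^2 = 0.0025, while the hard one is scaled by only 0.7^2 = 0.49, so the abundant, easily classified negatives contribute far less to the gradient.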

Conflict of interest: The authors declare no competing interests.

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 62331012).

Data availability

All data and source code used in the present study are available at https://github.com/RmQ5v/HBFormer.

References

Bepler T, Berger B. Learning protein sequence embeddings using information from structure. In: International Conference on Learning Representations, New Orleans, LA, USA: ICLR, 2019.

Brückner A, Polge C, Lentze N et al. Yeast two-hybrid, a powerful tool for systems biology. Int J Mol Sci 2009;10:2763–88.

Chen M, Ju CJ-T, Zhang T et al. Multifaceted protein–protein interaction prediction based on Siamese residual RCNN. Bioinformatics 2019;35:i305–i314.

Chen Z, Zhao P, Li C et al. iLearnPlus: a comprehensive and automated machine-learning platform for nucleic acid and protein sequence analysis, prediction and visualization. Nucleic Acids Res 2021;49:e60.

UniProt Consortium. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res 2019;47:D506–D515.

Cui G, Fang C, Han K. Prediction of protein-protein interactions between viruses and human by an SVM model. BMC Bioinformatics 2012;13:S5.

Dey L, Chakraborty S, Mukhopadhyay A. Machine learning techniques for sequence-based prediction of viral–host interactions between SARS-CoV-2 and human proteins. Biomed J 2020;43:438–50.

Dey S, Mondal A. Unveiling the role of host kinases at different steps of influenza A virus life cycle. J Virol 2024;98:e0119223.

Dosovitskiy A, Beyer L, Kolesnikov A et al. An image is worth 16x16 words: transformers for image recognition at scale. In: International Conference on Learning Representations, Virtual Event, Austria: ICLR, 2021.

Dyer MD, Murali TM, Sobral BW. The landscape of human proteins interacting with viruses and other pathogens. PLoS Pathog 2008;4:e32.

Eid F-E, ElHefnawi M, Heath LS. DeNovo: virus-host sequence-based protein-protein interaction prediction. Bioinformatics 2016;32:1144–50.

Elnaggar A, Heinzinger M, Dallago C et al. ProtTrans: toward understanding the language of life through self-supervised learning. IEEE Trans Pattern Anal Mach Intell 2022;44:7112–27.

Grove J, Marsh M. Host–pathogen interactions: the cell biology of receptor-mediated virus entry. J Cell Biol 2011;195:1071–82.

Hashemifar S, Neyshabur B, Khan AA et al. Predicting protein–protein interactions through sequence-based deep learning. Bioinformatics 2018;34:i802–i810.

Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput 1997;9:1735–80.

Joulin A, Grave E, Bojanowski P et al. Bag of tricks for efficient text classification. In: Proceedings of the Fifteenth Conference of the European Chapter of the Association for Computational Linguistics, Valencia, Spain: EACL, 2017.

Kösesoy İ, Gök M, Öz C. A new sequence based encoding for prediction of host–pathogen protein interactions. Comput Biol Chem 2019;78:170–7.

Lasso G, Mayer SV, Winkelmann ER et al. A structure-informed atlas of human-virus interactions. Cell 2019;178:1526–41.e16.

LeCun Y, Bengio Y. Convolutional networks for images, speech, and time series. In: Arbib MA (ed.), The Handbook of Brain Theory and Neural Networks, Vol. 3361. Cambridge: MIT Press, 1995.

Li H, Gong X-J, Yu H et al. Deep neural network based predictions of protein interactions using primary sequences. Molecules 2018;23:1923.

Li H-L, Pang Y-H, Liu B. BioSeq-BLM: a platform for analyzing DNA, RNA and protein sequences based on biological language models. Nucleic Acids Res 2021;49:e129.

Lin T-Y, Goyal P, Girshick R et al. Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy: ICCV, 2017.

Liu Y, Tian B. Protein–DNA binding sites prediction based on pre-trained protein language model and contrastive learning. Brief Bioinform 2023;25:bbad488.

Liu-Wei W, Kafkas Ş, Chen J et al. DeepViral: prediction of novel virus–host interactions from protein sequences and infectious disease phenotypes. Bioinformatics 2021;37:2722–9.

Madan S, Demina V, Stapf M et al. Accurate prediction of virus-host protein-protein interactions via a Siamese neural network using deep protein sequence embeddings. Patterns 2022;3:100551.

Mikolov T, Chen K, Corrado G et al. Efficient estimation of word representations in vector space. In: International Conference on Learning Representations, Scottsdale, Arizona, USA: ICLR, 2013.

Peng X, Wang J, Peng W et al. Protein–protein interactions: detection, reliability assessment and applications. Brief Bioinform 2017;18:798–819.

Qin W, Cho KF, Cavanagh PE et al. Deciphering molecular interactions by proximity labeling. Nat Methods 2021;18:133–43.

Raffel C, Shazeer N, Roberts A et al. Exploring the limits of transfer learning with a unified text-to-text transformer. J Mach Learn Res 2020;21:1–67.

Rives A, Meier J, Sercu T et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc Natl Acad Sci USA 2021;118:e2016239118.

Steinegger M, Söding J. Clustering huge protein sequence sets in linear time. Nat Commun 2018;9:2542.

Stynen B, Tournu H, Tavernier J et al. Diversity in genetic in vivo methods for protein-protein interaction studies: from the yeast two-hybrid system to the mammalian split-luciferase system. Microbiol Mol Biol Rev 2012;76:331–82.

Suzek BE, Wang Y, Huang H et al.; UniProt Consortium. UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches. Bioinformatics 2015;31:926–32.

Tsukiyama S, Hasan MM, Fujii S et al. LSTM-PHV: prediction of human-virus protein–protein interactions by LSTM with word2vec. Brief Bioinform 2021;22:bbab228.

Vaswani A, Shazeer N, Parmar N et al. Attention is all you need. Adv Neural Inf Process Syst 2017;30:5998–6008.

Wang B, Xie Q, Pei J et al. Pre-trained language models in biomedical domain: a systematic survey. ACM Comput Surv 2023;56:1–52.

Wang S, Osgood AO, Chatterjee A. Uncovering post-translational modification-associated protein–protein interactions. Curr Opin Struct Biol 2022;74:102352.

Xing Q, Huang P, Yang J et al. Visualizing an ultra-weak protein–protein interaction in phosphorylation signaling. Angew Chem Int Ed Engl 2014;53:11501–5.

Yang L, Xia J-F, Gui J. Prediction of protein-protein interactions from protein sequence using local descriptors. Protein Pept Lett 2010;17:1085–90.

Yang S, Fu C, Lian X et al. Understanding human-virus protein-protein interactions using a human protein complex-based analysis framework. mSystems 2019;4:10–1128.

Yang X, Yang S, Li Q et al. Prediction of human-virus protein-protein interactions through a sequence embedding-based machine learning method. Comput Struct Biotechnol J 2020;18:153–61.

Yang X, Yang S, Lian X et al. Transfer learning via multi-scale CNN for human-virus protein-protein interaction prediction. Bioinformatics 2021;37:4771–8.

Yang X, Yang S, Ren P et al. Deep learning-powered prediction of human-virus protein-protein interactions. Front Microbiol 2022;13:842976.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Associate Editor: Jonathan Wren