Abstract

As one of the most important tasks in protein structure prediction, protein fold recognition has attracted increasing attention, and several computational predictors have been proposed with the development of machine learning and artificial intelligence techniques. However, these existing methods still suffer from several shortcomings. Here we propose a new network-based predictor called ProtFold-DFG for protein fold recognition. We propose the Directed Fusion Graph (DFG) to fuse the ranking lists generated by different methods; it employs the transitive closure to incorporate more relationships among proteins and uses the KL divergence to calculate the relationship between two proteins so as to improve its generalization ability. Finally, the PageRank algorithm is performed on the DFG to accurately recognize the protein folds by considering the global interactions among proteins in the DFG. Tested on a widely used and rigorous benchmark data set, the LINDAHL data set, ProtFold-DFG outperforms the other 35 competing methods, indicating that it will be a useful method for protein fold recognition. The source code and data of ProtFold-DFG can be downloaded from http://bliulab.net/ProtFold-DFG/download

Introduction

Different tertiary structures of proteins lead to various protein functions. A fundamental step in studying protein structures is to identify the protein folds [1]. In this regard, different computational methods have been proposed to solve the protein fold recognition problem. They can be divided into three categories: (i) alignment methods; (ii) machine-learning-based methods; (iii) network-based methods.

The alignment methods directly calculate the similarity between two proteins, based on which the folds can be predicted. For example, CLUSTAL W [2] employs position-specific gap penalties, HHsearch [3] is based on Hidden Markov Model-Hidden Markov Model (HMM-HMM) alignments, and PSI-BLAST [4] considers evolutionary information via Position-Specific Score Matrices (PSSMs). ACCFold [5] combines the Autocross-covariance (ACC) transformation with PSSMs. Because the residue–residue contacts contain predicted structure information [6], they have been incorporated into computational methods to further improve the predictive performance [7]. The local and nonlocal interactions among residues or motifs have been incorporated into predictors as well [8–10]. For example, MotifCNN-fold [11] combines motif-based kernels with a Convolutional Neural Network (CNN) and a Deep Convolutional Neural Network (DCNN) to extract fold-specific features.

Figure 1

The flowchart of the ProtFold-DFG predictor for protein fold recognition.

Figure 2

The process of generating the DFG. (A) The basic ranking lists are generated by four methods. (B) The four ranking lists are merged. (C) The Directed Fusion Graph is constructed based on the combined ranking list. (D) The PageRank is performed on the Directed Fusion Graph to recognize the protein folds.

Based on the aforementioned features, various machine-learning-based methods have been proposed [12]. For example, FOLDpro [13] combines different alignment methods and structural information with Support Vector Machines (SVMs) [14]. The ensemble classifier of [15] fuses nine basic OET-KNN classifiers into a more accurate classifier. Deep learning techniques [16, 17] have also been applied to this field. For example, DeepFRpro [10] employs a CNN to generate fold-specific features, based on which a random forest model is constructed to predict the folds. MV-fold [18] combines different features via multiview modeling.

The relationships among proteins can be considered as a protein similarity network, and some network-based methods extract the global interactions among proteins from this network to recognize the protein folds. For example, the enrichment of network topological similarity is performed on the protein similarity network to detect the protein folds [19]. CMsearch [20] learns the sequence–structure correlation based on cross-modal learning. Fold-LTR-TCP [21] is a re-ranking model combining the Triadic Closure Principle (TCP) algorithm with Learning to Rank (LTR) [22].

As discussed above, the network-based methods achieve the state-of-the-art performance. However, two bottleneck problems still exist in the network-based methods, which should be further explored: (i) how to incorporate the relationships among different proteins into the protein similarity network? (ii) how to accurately recognize the proteins sharing the same fold with the query protein based on the network? To answer these two questions, in this study we propose the Directed Fusion Graph (DFG) to incorporate the relationships among proteins detected by different methods. The PageRank algorithm [23] is then performed on the DFG for protein fold recognition by considering the global relationships among proteins. The new computational predictor is called ProtFold-DFG.

Materials and methods

Benchmark data set

In order to facilitate fair performance comparison with other competing methods, ProtFold-DFG is evaluated on a widely used and rigorous benchmark data set, the LINDAHL data set [24], which contains 321 proteins covering 38 protein folds extracted from the SCOP database. In order to rigorously simulate the protein fold recognition task, these proteins are divided into two subsets sharing no proteins from the same family or superfamily. Based on these two subsets, 2-fold cross-validation is used to evaluate the performance of various methods. For more information on the benchmark data set, please refer to [24].

The framework of ProtFold-DFG

Previous studies show that different methods for protein fold recognition are complementary [18, 25]; therefore, it is possible to further improve the predictive performance by fusing these complementary methods. The protein fold recognition task can be treated as an information retrieval problem: each query protein is searched against the data set by a computational method to generate a corresponding ranking list, and the query protein is more likely to share the same fold with the top hits in the ranking list. Across multiple ranking lists, the common top hits are more likely to be in the same fold as the query protein. In this regard, we propose the DFG to incorporate these characteristics.

In this study, we fuse the ranking lists generated by six computational methods. The DFG is constructed to reflect the relationships among proteins in the network based on the ranking lists generated by the different methods. Finally, the PageRank algorithm [23] is performed on the DFG to re-rank the proteins so as to generate the final ranking list. The proposed method is called ProtFold-DFG, whose overall flowchart is shown in Figure 1. The following sections describe each step in detail.

Figure 3

The pseudo-code of ProtFold-DFG predictor by combining PageRank and DFG.

Generate the basic ranking lists

As shown in Figure 2A, the basic ranking lists are generated by different computational methods to reflect the relationships between the query protein and the template proteins, including DeepFR [10], DeepSVM-fold (CCM) [7], DeepSVM-fold (PSFM) [7], MotifCNN-fold (CCM) [11], MotifCNN-fold (PSFM) [11] and Fold-LTR-TCP [21]. Among these methods, DeepFR, DeepSVM-fold (CCM), DeepSVM-fold (PSFM), MotifCNN-fold (CCM) and MotifCNN-fold (PSFM) are based on deep learning techniques that extract fold-specific features to improve the predictive performance, whereas Fold-LTR-TCP is based on the LTR [22].

DeepFR [10] extracts the fold-specific features by using DCNN, based on which the proteins are converted into feature vectors. The cosine similarity between two feature vectors is used to measure the similarity between two proteins.
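For illustration, the cosine similarity between two such feature vectors can be computed as follows (a minimal sketch; the function name is ours, not part of DeepFR):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two protein feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

Parallel vectors score 1 and orthogonal vectors score 0, so the measure depends only on the direction of the learned feature vectors, not their magnitude.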

DeepSVM-fold [7] calculates the feature vectors of proteins based on the fold-specific features generated by a convolutional neural network–bidirectional long short-term memory network (CNN-BLSTM), and the similarity between two proteins is then calculated as the cosine similarity between their corresponding feature vectors. DeepSVM-fold (CCM), based on residue–residue contacts (CCM), and DeepSVM-fold (PSFM), based on the Position-Specific Frequency Matrix (PSFM), are employed in this study.

MotifCNN-fold [11] is based on structural motif kernels to construct motif-based CNNs. In this study, MotifCNN-fold (CCM) and MotifCNN-fold (PSFM) are used, which are constructed based on the CCM and PSFM, respectively.

Fold-LTR-TCP [21] searches the query protein against the template proteins so as to find, in a supervised manner, the template proteins in the same fold as the query protein. The ranking list generated by Fold-LTR-TCP without the TCP step is used in this study.

Merge multiple ranking lists

After generating the basic ranking lists, we merge these ranking lists into a digraph G by:
(1)
where $\mathbf{M}_l$ represents the ranking list generated by method $l$ and $k$ represents the total number of ranking lists. The proteins in each ranking list are sorted in ascending order according to their similarities with the query protein. An edge $\langle u,v\rangle$ indicates that protein $v$ is more likely to be in the same fold as the query protein than protein $u$ is. This process is shown in Figure 2B. The protein similarity network can be constructed by using the aforementioned approach, but it fails to measure the subtle relationships among proteins, which is addressed in the following section.
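The merging step can be sketched as follows (our reading of Equation 1: each adjacent pair in a ranking list contributes one directed edge; the function name and set-of-edges representation are ours):

```python
def merge_ranking_lists(ranking_lists):
    """Merge basic ranking lists into one set of directed edges.

    Each list is assumed sorted in ascending order of similarity to the
    query, so for every adjacent pair (u, v) the edge <u, v> records
    that v is the better candidate for the query's fold.
    """
    edges = set()
    for ranked in ranking_lists:
        for u, v in zip(ranked, ranked[1:]):
            edges.add((u, v))
    return edges
```

Because the edges from all lists land in one set, agreements between methods collapse into a single edge while disagreements (e.g. opposite orderings of the same pair) are both retained in the digraph.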

Construct the DFG

The digraph G only considers the relationships between adjacent proteins in the basic ranking lists and is therefore not yet suitable for protein fold recognition. In order to incorporate more interactions among proteins in the ranking lists, the transitive closure [26] is used to update the digraph G (Figure 2C). A ranking list $\mathbf{M}_l$ is first normalized by:
(2)
where $\Vert\mathbf{M}_l\Vert_1$, $\Vert\mathbf{M}_l\Vert_2$ and $\Vert\mathbf{M}_l\Vert_{\infty}$ are the 1-norm, 2-norm and infinity norm of the ranking list generated by method $l$, respectively. The depth-first transitive closure algorithm [26] is then employed to update the digraph G:
(3)

This updated digraph G represents the reachability among vertices, incorporating the relationships among proteins in the basic ranking lists.
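A depth-first construction of the (unweighted) closure can be sketched as follows; the set-of-edges representation and function name are ours, and the edge weights of Equation 2 are omitted for clarity:

```python
def transitive_closure(edges):
    """Depth-first transitive closure of a set of directed edges: an
    edge (u, w) is added whenever w is reachable from u, so proteins
    that are not adjacent in a ranking list still become related."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set())
    closure = set()
    for start in adj:
        stack, seen = list(adj[start]), set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            stack.extend(adj[node])
    return closure
```

For example, the chain a → b → c gains the extra edge a → c, which is exactly the kind of non-adjacent relationship the DFG needs.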

Table 1

Performance of the ProtFold-DFG predictors based on different ranking lists on the LINDAHL data set evaluated by using 2-fold cross-validation

Method           Accuracy (%)
ProtFold-DFG^a   77.88
ProtFold-DFG^b   78.82
ProtFold-DFG^c   71.03
ProtFold-DFG^d   76.65
ProtFold-DFG^e   76.65
ProtFold-DFG^f   81.93
ProtFold-DFG^g   80.69

^a Fusing the rank lists generated by DeepSVM-fold (CCM), DeepSVM-fold (PSFM) and Fold-LTR-TCP.

^b Fusing the rank lists generated by MotifCNN-fold (CCM), MotifCNN-fold (PSFM) and Fold-LTR-TCP.

^c Fusing the rank lists generated by DeepFR and Fold-LTR-TCP.

^d Fusing the rank lists generated by DeepSVM-fold (CCM), DeepSVM-fold (PSFM), DeepFR and Fold-LTR-TCP.

^e Fusing the rank lists generated by MotifCNN-fold (CCM), MotifCNN-fold (PSFM), DeepFR and Fold-LTR-TCP.

^f Fusing the rank lists generated by DeepSVM-fold (CCM), DeepSVM-fold (PSFM), MotifCNN-fold (CCM), MotifCNN-fold (PSFM) and Fold-LTR-TCP.

^g Fusing the rank lists generated by DeepSVM-fold (CCM), DeepSVM-fold (PSFM), MotifCNN-fold (CCM), MotifCNN-fold (PSFM), DeepFR and Fold-LTR-TCP.


Table 2

Performance of the ProtFold-DFG predictors with and without the transitive closure on the LINDAHL data set evaluated by using 2-fold cross-validation

Method           Accuracy (%)
ProtFold-DFG^a   54.20
ProtFold-DFG^b   81.93

^a DFG $\overline{\mathbf{\mathcal{F}}}$ without the transitive closure [26].

^b DFG $\overline{\mathbf{\mathcal{F}}}$ with the transitive closure [26].


Based on the KL divergence [27], the DFG $\overline{\mathbf{\mathcal{F}}}$ is constructed by assigning a weight to each edge in the digraph G, which can be represented as:
(4)
where $\overline{\mathbf{M}}_{l,u}$ represents the similarity between the template protein $u$ and the query protein calculated by method $l$, and $k$ represents the total number of ranking lists. The KL divergence [27] measures the relationship between two proteins, which is then mapped to the edge weight. As a result, the DFG can accurately represent the probability of mutual recommendation among proteins. Because we cannot ensure that $\overline{\mathbf{\mathcal{F}}}$ is stochastic and irreducible [28], $\overline{\mathbf{\mathcal{F}}}$ is further converted following the Google matrix [23] by:
(5)
where $n$ is the number of training samples. This conversion ensures that $\overline{\mathbf{\mathcal{F}}}$ is both stochastic and irreducible [28].
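The standard Google-matrix construction referenced here can be sketched as follows. The damping factor d = 0.85 is a common default and an assumption of ours, since its value is not stated in the surrounding text:

```python
def to_google_matrix(F, d=0.85):
    """Convert a weighted adjacency matrix into a stochastic and
    irreducible one, following the Google-matrix construction: each
    row is normalized to sum to 1, then mixed with a uniform teleport
    term so every entry is strictly positive."""
    n = len(F)
    G = []
    for row in F:
        s = sum(row)
        # a dangling row (no out-edges) becomes the uniform distribution
        stochastic = [w / s if s > 0 else 1.0 / n for w in row]
        G.append([d * w + (1.0 - d) / n for w in stochastic])
    return G
```

Row-normalization makes the matrix stochastic; the uniform teleport term makes every state reachable from every other, which is what irreducibility requires.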

Perform PageRank on DFG

The PageRank algorithm is able to find the most cited literature from a global perspective [29], and it has also been successfully applied to protein remote homology detection [30]. The PageRank algorithm assumes that the accuracy of a ranking is positively correlated with the relationships among proteins in the graph. Therefore, as shown in Figure 2D, the proteins in this study are treated as the literature, and the proteins in the same fold can be considered as the most relevant literature. Based on this analogy, PageRank is performed on the DFG for protein fold recognition.

Table 3

Performance of the ProtFold-DFG predictors using KL divergence and out-degree on the LINDAHL data set evaluated by using 2-fold cross-validation

Method                           Accuracy (%)
ProtFold-DFG (out-degree)^a      80.06
ProtFold-DFG (out-degree)^b      7.47
ProtFold-DFG (KL divergence)^c   81.93
ProtFold-DFG (KL divergence)^d   81.93

^a The evaluation score between two proteins is the reciprocal of the vertex's out-degree, and the initial value is the average of the basic methods.

^b The evaluation score between two proteins is the reciprocal of the vertex's out-degree, and the initial value is $\mathbf{R}^{(0)}$ (Equation 8).

^c The evaluation score between two proteins is calculated by the KL divergence, and the initial value is the average of the basic methods.

^d The evaluation score between two proteins is calculated by the KL divergence, and the initial value is $\mathbf{R}^{(0)}$ (Equation 8).


Table 4

Performance of the ProtFold-DFG predictors based on three link analysis algorithms on the LINDAHL data set evaluated by using 2-fold cross-validation

Method           Link analysis algorithm          Accuracy (%)
ProtFold-DFG^a   Hyperlink Induced Topic Search   80.37
ProtFold-DFG^b   Triadic Closure Principle        79.43
ProtFold-DFG^c   PageRank                         81.93

^a The Hyperlink Induced Topic Search (HITS) algorithm is performed on the DFG.

^b The Triadic Closure Principle (TCP) algorithm is performed on the DFG.

^c The PageRank algorithm is performed on the DFG.


Figure 4

The performance of four ProtFold-DFG predictors based on the out-degree and KL divergence with different iterations. For the meanings of a, b, c and d, please refer to the footnote of Table 3.

Table 5

Performance of 36 computational methods for protein fold recognition on LINDAHL data set via 2-fold cross-validation

Methods                  Category^a         Accuracy (%)   Source
PSI-BLAST                Alignment          4              [4]
HMMER                    Alignment          4.4            [33]
SAM-T98                  Alignment          3.4            [34]
BLASTLINK                Alignment          6.9            [24]
SSEARCH                  Alignment          5.6            [35]
SSHMM                    Alignment          6.9            [35]
THREADER                 Alignment          14.6           [36]
FUGUE                    Alignment          12.5           [37]
RAPTOR                   Alignment          25.4           [38]
SPARKS                   Alignment          24.3           [39]
SP3                      Alignment          28.7           [51]
FOLDpro                  Machine learning   26.5           [13]
HHpred                   Alignment          25.2           [40]
SP4                      Alignment          30.8           [41]
SP5                      Alignment          37.9           [42]
BoostThreader            Machine learning   42.6           [43]
SPARKS-X                 Alignment          45.2           [44]
FFAS-3D                  Machine learning   35.8           [45]
RF-Fold                  Machine learning   40.8           [46]
DN-Fold                  Machine learning   33.6           [47]
RFDN-Fold                Machine learning   37.7           [47]
DN-FoldS                 Machine learning   33.3           [47]
DN-FoldR                 Machine learning   27.4           [47]
HH-fold                  Machine learning   42.1           [48]
TA-fold                  Machine learning   53.9           [48]
dRHP-PseRA               Machine learning   34.9           [49]
MT-fold                  Machine learning   59.1           [18]
DeepFR (strategy 1)      Machine learning   44.5           [10]
DeepFR (strategy 2)      Machine learning   56.1           [10]
DeepFRpro (strategy 1)   Machine learning   57.6           [10]
DeepFRpro (strategy 2)   Machine learning   66.0           [10]
DeepSVM-fold             Machine learning   67.3           [7]
MotifCNN-fold            Machine learning   72.6           [11]
Fold-LTR-TCP             Network            73.2           [21]
FoldRec-C2C              Network            77.88          [50]
ProtFold-DFG^b           Network            81.93          This study

^a ‘Alignment’, ‘Machine learning’ and ‘Network’ mean that the corresponding predictors are alignment methods, machine-learning-based methods and network-based methods, respectively, according to the three categories introduced in the Introduction section.

^b The ProtFold-DFG predictor is based on the KL divergence, PageRank and the DFG generated by DeepSVM-fold (CCM), DeepSVM-fold (PSFM), MotifCNN-fold (CCM), MotifCNN-fold (PSFM) and Fold-LTR-TCP.


The final ranking list $\mathbf{R}$ generated by the ProtFold-DFG predictor based on PageRank and the DFG $\overline{\mathbf{\mathcal{F}}}$ can be represented as [31]:
(6)
or [31]:
(7)
where $\mathbf{I}$ is the identity matrix. The power method [28] is employed to solve this problem by:
(8)
where $\mathbf{R}^{(t)}$ is the result of multiplying the initial vector $\mathbf{R}^{(0)}$ by the matrix $\overline{\mathbf{\mathcal{F}}}$ $t$ times, $\mathbf{R}^{(0)}$ is set as $\frac{2+n}{3n}\mathbf{1}$ at the start and $n$ is the number of training samples. $\mathbf{R}^{(t)}$ is normalized after each matrix multiplication by using Equation 2. The pseudocode of ProtFold-DFG is shown in Figure 3.
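The power method described above can be sketched as follows. The initial vector (2 + n)/(3n) · 1 follows the text; renormalizing with the 1-norm after every multiplication is our simplification, since the paper's Equation 2 involves several norms:

```python
def pagerank_power_method(G, t=50):
    """Power iteration on a row-stochastic matrix G (a list of rows):
    repeatedly propagate rank mass along the directed edges, then
    renormalize, until the vector settles."""
    n = len(G)
    r = [(2.0 + n) / (3.0 * n)] * n  # initial vector from the paper
    for _ in range(t):
        # r <- G^T r: each node collects mass from its in-neighbours
        r = [sum(G[j][i] * r[j] for j in range(n)) for i in range(n)]
        s = sum(r)
        r = [x / s for x in r]
    return r
```

For a stochastic, irreducible G this iteration converges to the dominant eigenvector, i.e. the final ranking scores of the template proteins.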

Evaluation methodology

The overall accuracy is used to measure the performance of the different methods, which can be calculated by:

(9) $\mathrm{Accuracy} = \frac{CN}{N} \times 100\%$

where $CN$ represents the number of correctly predicted query proteins and $N$ is the total number of query proteins to be predicted. Note that a query protein is correctly predicted if its fold is the same as that of the top-ranked template protein in the training set; otherwise, it is incorrectly predicted.
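This top-1 evaluation can be sketched as follows (the function name and the mapping-based inputs are ours):

```python
def overall_accuracy(top_hit, fold_of):
    """Overall accuracy as in Equation 9: a query protein counts as
    correct when the fold of its top-ranked template matches its own
    fold.  `top_hit` maps each query to its top-ranked template;
    `fold_of` maps any protein to its fold label."""
    correct = sum(1 for query, template in top_hit.items()
                  if fold_of[template] == fold_of[query])
    return 100.0 * correct / len(top_hit)
```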

Results and discussion

The influence of the different combinations of the basic ranking lists on the performance of the ProtFold-DFG predictors

In this section, we investigate the influence of the basic predictors on the predictive performance. As introduced in the method section, six computational methods are selected to generate the basic ranking lists. There are two reasons for selecting these predictors: (i) they are state-of-the-art methods in the field of protein fold recognition; (ii) they are based on different theories and techniques and are therefore complementary.

The results of the ProtFold-DFG predictors fusing different ranking lists are shown in Table 1, from which we can see the following: (i) the performance of ProtFold-DFG can be improved by adding more complementary ranking lists. (ii) The ProtFold-DFG predictor based on the five ranking lists generated by DeepSVM-fold (CCM), DeepSVM-fold (PSFM), MotifCNN-fold (CCM), MotifCNN-fold (PSFM) and Fold-LTR-TCP achieves the best performance. (iii) The ProtFold-DFG predictor achieves lower performance when the ranking list generated by DeepFR is added. The reason is that Fold-LTR-TCP is also based on DeepFR, so their predictive results are not complementary, and noise is introduced when incorporating the predictive results of DeepFR into the ProtFold-DFG predictor. This conclusion is further supported by the fact that the ProtFold-DFG predictor based only on DeepFR and Fold-LTR-TCP achieves an accuracy of just 71.03%.

Figure 5

The ProtFold-DFG prediction visualization. The best ProtFold-DFG predictor, based on the five basic ranking lists, is analyzed. The ranking lists generated by DeepSVM-fold (CCM), DeepSVM-fold (PSFM), Fold-LTR-TCP, MotifCNN-fold (CCM) and MotifCNN-fold (PSFM) are visualized in (A–E), respectively. Based on the basic ranking lists, the corresponding DFG is generated (F), and then PageRank is performed on the DFG to detect the template proteins sharing the same protein fold with the query protein (G). In these figures, the blue and gray lines represent the relationships among proteins: the relationship between two proteins linked by a blue line is closer than that between two proteins linked by a gray line. The red lines indicate the final prediction results: the query proteins and the template proteins are predicted to be in the same protein folds. The figures were plotted with the software tool Gephi [52].

The impact of the transitive closure on the performance of ProtFold-DFG

The main contribution of this study is the DFG, which fuses different ranking lists so as to model the global interactions among proteins. Different from the Google matrix [23], the DFG incorporates more interactions among proteins in the ranking lists by using the transitive closure [26]. In this section, we explore the influence of the transitive closure on the performance of ProtFold-DFG. The results, listed in Table 2, show that the transitive closure clearly improves the performance of ProtFold-DFG. The reason is that with the transitive closure the DFG not only considers the relationships between adjacent proteins in the ranking lists but also incorporates the local and global interactions among proteins, which provides valuable information for PageRank to re-rank the proteins so as to accurately recognize the protein folds.

ProtFold-DFG based on KL divergence can improve its generalization ability

In this study, we employ the KL divergence [27] to assign weights to the edges in the DFG $\overline{\mathbf{\mathcal{F}}}$. Besides the KL divergence, the out-degree [29] can also be used to calculate the weight of each edge. The performance of ProtFold-DFG predictors based on these two methods with different node initial values is shown in Table 3 and Figure 4, from which we can see the following: (i) the two ProtFold-DFG predictors based on the KL divergence achieve stable performance, indicating that the DFG constructed by the KL divergence is insensitive to the node initial values and the number of iterations $t$ (Equation 8). (ii) In contrast, the two ProtFold-DFG predictors based on the out-degree achieve lower performance, and the initial value and the number of iterations obviously impact their performance.

As discussed above, PageRank updates the ranking list during each iteration; therefore, a suitable node initial value helps PageRank to find an optimal ranking list, although the initial value has little impact on the results as the number of iterations grows. Compared with the out-degree, the KL divergence considers the subtle interactions among proteins in the ranking list. Therefore, the two ProtFold-DFG predictors based on the KL divergence achieve higher and more stable performance than the two ProtFold-DFG predictors based on the out-degree.

The performance of three link analysis algorithms performed on the DFG

Inspired by the Google matrix, we propose the DFG to fuse different ranking lists and incorporate the local and global interactions among proteins. A link analysis algorithm is then performed on the DFG for protein fold recognition. In this section, the performance of three link analysis algorithms is explored, including PageRank [23], Hyperlink-Induced Topic Search (HITS) [32] and the Triadic Closure Principle (TCP) [21]. The predictive results of the three ProtFold-DFG predictors based on these algorithms are shown in Table 4. The ProtFold-DFG predictor based on PageRank achieves the best performance. The reason is that PageRank searches the related proteins in the DFG globally, making full use of the mutual recommendation relationships among all the proteins in the DFG. In contrast, HITS only models the local interactions among proteins, and TCP only considers the pairwise interactions between two proteins.
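As an illustration of how PageRank globally propagates scores over a weighted directed graph such as the DFG, the following sketch implements a basic weighted PageRank with uniform handling of dangling nodes; the node names, edge weights and damping factor are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of weighted PageRank, analogous to re-ranking proteins on
# the DFG. Node names and weights below are hypothetical.

def pagerank(nodes, edges, d=0.85, iters=50, init=None):
    """edges: dict (u, v) -> weight. Returns a score per node."""
    n = len(nodes)
    scores = dict(init) if init else {v: 1.0 / n for v in nodes}
    out_w = {u: 0.0 for u in nodes}           # total outgoing weight per node
    for (u, v), w in edges.items():
        out_w[u] += w
    for _ in range(iters):
        new = {v: (1 - d) / n for v in nodes}  # teleportation term
        for (u, v), w in edges.items():
            new[v] += d * scores[u] * w / out_w[u]
        # dangling nodes (no outgoing edges): spread their mass uniformly
        dangling = sum(scores[u] for u in nodes if out_w[u] == 0)
        for v in nodes:
            new[v] += d * dangling / n
        scores = new
    return scores

nodes = ["q", "p1", "p2"]
edges = {("q", "p1"): 0.7, ("q", "p2"): 0.3, ("p1", "p2"): 1.0}
s = pagerank(nodes, edges)
ranking = sorted(nodes, key=s.get, reverse=True)
print(ranking[0])  # the top-ranked node after score propagation
```

Because every node's score depends on the scores of all nodes pointing to it, the final ranking reflects the whole graph rather than any single pairwise similarity, which is the property exploited here.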

Performance comparison with 35 other related methods

In order to evaluate the performance of ProtFold-DFG, it is compared with 35 other computational methods for protein fold recognition, including PSI-BLAST [4], HMMER [33], SAM-T98 [34], BLASTLINK [24], SSEARCH [35], SSHMM [35], THREADER [36], FUGUE [37], RAPTOR [38], SPARKS [39], SP3 [39], FOLDpro [13], HHpred [40], SP4 [41], SP5 [42], BoostThreader [43], SPARKS-X [44], FFAS-3D [45], RF-Fold [46], DN-Fold [47], RFDN-Fold [47], DN-FoldS [47], DN-FoldR [47], HH-fold [48], TA-fold [48], dRHP-PseRA [49], MT-fold [18], DeepFR (strategy 1) [10], DeepFR (strategy 2) [10], DeepFRpro (strategy 1) [10], DeepFRpro (strategy 2) [10], DeepSVM-fold [7], MotifCNN-fold [11], Fold-LTR-TCP [21] and FoldRec-C2C [50].

The results of the 36 predictors are listed in Table 5, from which we can see the following: (i) ProtFold-DFG outperforms the computational methods used for generating the basic ranking lists, including DeepSVM-fold, Fold-LTR-TCP and MotifCNN-fold. These results further confirm that fusing the different basic ranking lists with the DFG improves the predictive performance. (ii) The three network-based methods (Fold-LTR-TCP, FoldRec-C2C and ProtFold-DFG) show better performance than the alignment methods and the machine-learning-based methods. The reason is that the network-based methods incorporate the global relationships among proteins into the prediction, which is more effective than considering only the pairwise protein similarities or the local relationships among proteins. (iii) ProtFold-DFG achieves the best performance among the 36 predictors. The reason is that ProtFold-DFG more accurately measures the relationships among proteins by fusing complementary ranking lists, and the PageRank link analysis algorithm globally analyzes the relationships among proteins in the DFG to accurately detect the protein folds.

Feature analysis

In order to further explore why the ProtFold-DFG predictor is able to accurately detect protein folds, a query protein 1bdo-d1bdo from protein fold 2_59 (SCOP ID) is selected as an example, and the prediction results of ProtFold-DFG are visualized in Figure 5, from which we can see the following: (i) according to the top hits in the five basic ranking lists shown in Figure 5A–E, the fold type of the query protein is incorrectly predicted by all five computational predictors. (ii) The DFG is able to fuse the five ranking lists by considering the local and global relationships among proteins (Figure 5F). (iii) Performed on the DFG, the PageRank algorithm re-ranks the proteins in the DFG and correctly predicts the fold of the query protein 1bdo-d1bdo as 2_59 (Figure 5G). These results are not surprising: although the top hits in the basic ranking lists of the five predictors are incorrect, these errors can be corrected by fusing the complementary ranking lists and comprehensively analyzing their relationships. ProtFold-DFG treats proteins like documents in a citation network and finds the most closely linked proteins by combining PageRank with the DFG; these proteins can be considered to belong to the same protein fold.

Conclusion

Protein fold recognition is one of the key tasks in bioinformatics. In this regard, some computational methods have been proposed. Among these methods, the network-based methods achieve the state-of-the-art performance, because they can more effectively model the relationships among proteins with the help of the protein similarity network [50]. In this study, we propose a new network-based computational predictor called ProtFold-DFG. Compared with other existing methods, it has the following advantages: (i) the proposed DFG is able to fuse the complementary ranking lists generated by different methods, and more interactions can be incorporated via the transitive closure. (ii) The KL divergence more accurately calculates the weights of the edges in the DFG and is insensitive to the initial values and the number of iterations. As a result, the performance and generalization ability of ProtFold-DFG are clearly improved. (iii) The PageRank algorithm can correct the prediction errors in the basic ranking lists by comprehensively considering the protein interactions in the DFG. Because the DFG is able to fuse the results of different ranking methods, other applications in bioinformatics can be anticipated, such as protein remote homology detection [53].

Key Points
  • Protein fold recognition is a fundamental task for detecting protein structures and determining protein functions. More accurate computational predictors are highly desirable.

  • We propose the ProtFold-DFG predictor for protein fold recognition. It is based on the DFG generated by different basic ranking methods. In order to globally consider the interactions among proteins in the DFG, the PageRank is performed on DFG to re-rank the ranking lists so as to improve the predictive performance.

  • Experimental results show that the ProtFold-DFG predictor outperforms the other 35 state-of-the-art computational methods. The source code and data of the ProtFold-DFG can be accessed from http://bliulab.net/ProtFold-DFG/download/

Acknowledgements

The authors are much indebted to the three anonymous reviewers, whose constructive comments were very helpful in strengthening the presentation of this article.

Funding

The National Key R&D Program of China (No. 2018AAA0100100); National Natural Science Foundation of China (No. 61672184, 61822306, 61702134, 61732012, 61861146002); the Beijing Natural Science Foundation (No. JQ19019); Fok Ying-Tung Education Foundation for Young Teachers in the Higher Education Institutions of China (161063).

Jiangyi Shao is a master student at the School of Computer Science and Technology, Beijing Institute of Technology, China. His expertise is in bioinformatics.

Bin Liu is a professor at the School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China. His expertise is in bioinformatics, natural language processing and machine learning.

References

1. Chothia C, Finkelstein AV. The classification and origins of protein folding patterns. Annu Rev Biochem 1990;59:1007–39.
2. Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 1994;22:4673–80.
3. Söding J. Protein homology detection by HMM–HMM comparison. Bioinformatics 2005;21:951–60.
4. Altschul SF, Madden TL, Schaffer AA, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997;25:3389–402.
5. Dong Q, Zhou S, Guan J. A new taxonomy-based protein fold recognition approach based on autocross-covariance transformation. Bioinformatics 2009;25:2655–62.
6. Gromiha MM, Selvaraj S. Inter-residue interactions in protein folding and stability. Prog Biophys Mol Biol 2004;86:235–77.
7. Liu B, Li C, Yan K. DeepSVM-fold: protein fold recognition by combining support vector machines and pairwise sequence similarity scores generated by deep learning networks. Brief Bioinform. doi: .
8. Han KF, Baker D. Recurring local sequence motifs in proteins. J Mol Biol 1995;251:176–87.
9. Simons KT, Kooperberg C, Huang E, et al. Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J Mol Biol 1997;268:209–25.
10. Zhu J, Zhang H, Li SC, et al. Improving protein fold recognition by extracting fold-specific features from predicted residue-residue contacts. Bioinformatics 2017;33:3749–57.
11. Li C-C, Liu B. MotifCNN-fold: protein fold recognition based on fold-specific features extracted by motif-based convolutional neural networks. Brief Bioinform. doi: .
12. Wei L, Zou Q. Recent progress in machine learning-based methods for protein fold recognition. Int J Mol Sci 2016;17:2118.
13. Cheng J, Baldi P. A machine learning information retrieval approach to protein fold recognition. Bioinformatics 2006;22:1456–63.
14. Li D, Ju Y, Zou Q. Protein folds prediction with hierarchical structured SVM. Curr Proteomics 2016;13:79–85.
15. Shen H-B, Chou K-C. Ensemble classifier for protein fold pattern recognition. Bioinformatics 2006;22:1717–22.
16. Peng L, Peng MM, Liao B, et al. The advances and challenges of deep learning application in biological big data processing. Curr Bioinform 2018;13:352–9.
17. Lv Z, Ao C, Zou Q. Protein function prediction: from traditional classifier to deep learning. Proteomics 2019;19:1900119.
18. Yan K, Fang X, Xu Y, et al. Protein fold recognition based on multi-view modeling. Bioinformatics 2019;35:2982–90.
19. Lhota J, Hauptman R, Hart T, et al. A new method to improve network topological similarity search: applied to fold recognition. Bioinformatics 2015;31:2106–14.
20. Cui X, Lu Z, Wang S, et al. CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction. Bioinformatics 2016;32:i332–40.
21. Liu B, Zhu Y, Yan K. Fold-LTR-TCP: protein fold recognition based on triadic closure principle. Brief Bioinform. doi: .
22. Burges CJC. From RankNet to LambdaRank to LambdaMART: an overview. Microsoft Research Technical Report MSR-TR-2010-82, 2010.
23. Haveliwala T, Kamvar S. The second eigenvalue of the Google matrix. Technical Report, Stanford University, 2003. http://ilpubs.stanford.edu:8090/582/
24. Lindahl E, Elofsson A. Identification of related proteins on family, superfamily and fold level. J Mol Biol 2000;295:613–25.
25. Patil K, Chouhan U. Relevance of machine learning techniques and various protein features in protein fold classification: a review. Curr Bioinform 2019;14:688–97.
26. Ioannidis YE, Ramakrishnan R. Efficient transitive closure algorithms. In: Proceedings of the 14th VLDB Conference. 1988, p. 382–94.
27. Goldberger J, Gordon S, Greenspan H. An efficient image similarity measure based on approximations of KL-divergence between two gaussian mixtures. In: Proceedings Ninth IEEE International Conference on Computer Vision. France, 2003, p. 487–93.
28. Langville AN, Meyer CD. Deeper inside PageRank. Internet Math 2004;1:335–80.
29. Page L, Brin S, Motwani R, et al. The PageRank citation ranking: bringing order to the web. Technical Report, 1999. http://ilpubs.stanford.edu:8090/422/
30. Liu B, Jiang S, Zou Q. HITS-PR-HHblits: protein remote homology detection by combining PageRank and hyperlink-induced topic search. Brief Bioinform 2020;21:298–308.
31. Brin S, Page L. The anatomy of a large-scale hypertextual web search engine. Comput Netw ISDN Syst 1998;30:107–17.
32. Kleinberg JM. Authoritative sources in a hyperlinked environment. J ACM 1999;46:604–32.
33. McClure MA, Smith C, Elton P. Parameterization studies for the SAM and HMMER methods of hidden Markov model generation. Proc Int Conf Intell Syst Mol Biol 1996;4:155–64.
34. Karplus K, Barrett C, Hughey R. Hidden Markov models for detecting remote protein homologies. Bioinformatics 1998;14:846–56.
35. Hargbo J, Elofsson A. Hidden Markov models that use predicted secondary structures for fold recognition. Proteins 1999;36:68–76.
36. Jones DT, Taylor WR, Thornton JM. A new approach to protein fold recognition. Nature 1992;358:86–9.
37. Shi J, Blundell TL, Mizuguchi K. FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. J Mol Biol 2001;310:243–57.
38. Xu J, Li M, Kim D, et al. RAPTOR: optimal protein threading by linear programming. J Bioinform Comput Biol 2003;1:95–117.
39. Zhou H, Zhou Y. Single-body residue-level knowledge-based energy score combined with sequence-profile and secondary structure information for fold recognition. Proteins 2004;55:1005–13.
40. Soding J, Biegert A, Lupas AN. The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Res 2005;33:W244–8.
41. Liu S, Zhang C, Liang S, et al. Fold recognition by concurrent use of solvent accessibility and residue depth. Proteins 2007;68:636–45.
42. Zhang W, Liu S, Zhou Y. SP5: improving protein fold recognition by using torsion angle profiles and profile-based gap penalty model. PLoS One 2008;3:e2325.
43. Peng J, Xu J. Boosting protein threading accuracy. Res Comput Mol Biol 2009;5541:31–45.
44. Yang Y, Faraggi E, Zhao H, et al. Improving protein fold recognition and template-based modeling by employing probabilistic-based matching between predicted one-dimensional structural properties of query and corresponding native properties of templates. Bioinformatics 2011;27:2076–82.
45. Xu D, Jaroszewski L, Li Z, et al. FFAS-3D: improving fold recognition by including optimized structural features and template re-ranking. Bioinformatics 2014;30:660–7.
46. Jo T, Cheng J. Improving protein fold recognition by random forest. BMC Bioinform 2014;15(Suppl 11):S14.
47. Jo T, Hou J, Eickholt J, et al. Improving protein fold recognition by deep learning networks. Sci Rep 2015;5:17573.
48. Xia J, Peng Z, Qi D, et al. An ensemble approach to protein fold classification by integration of template-based assignment and support vector machine classifier. Bioinformatics 2017;33:863–70.
49. Chen J, Long R, Wang XL, et al. dRHP-PseRA: detecting remote homology proteins using profile-based pseudo protein sequence and rank aggregation. Sci Rep 2016;6:32333.
50. Shao J, Yan K, Liu B. FoldRec-C2C: protein fold recognition by combining cluster-to-cluster model and protein similarity network. Brief Bioinform. doi: .
51. Zhou H, Zhou Y. Fold recognition by combining sequence profiles derived from evolution and from depth-dependent structural alignment of fragments. Proteins 2005;58:321–8.
52. Bastian M, Heymann S, Jacomy M. Gephi: an open source software for exploring and manipulating networks. In: Third International AAAI Conference on Weblogs and Social Media. 2009.
53. Liu B, Li S. ProtDet-CCH: protein remote homology detection by combining Long Short-Term Memory and ranking methods. IEEE/ACM Trans Comput Biol Bioinform 2019;16:1203–10.
