Inter-kingdom prediction certainty evaluation of protein subcellular localization tools: microbial pathogenesis approach for deciphering host microbe interaction

Khan, Abdul Arif; Khan, Zakir; Kalam, Mohd Abul; Khan, Azmat Ali

doi:10.1093/bib/bbw093

Abstract

Microbial pathogenesis involves several aspects of host–pathogen interaction, including the targeting of microbial proteins to host subcellular compartments and the subsequent effects on host physiology. Such studies are supported by experimental data, but detecting bacterial protein localization with computational tools designed to predict eukaryotic subcellular targeting has also recently come into practice. We evaluated the inter-kingdom prediction certainty of these tools. Bacterial proteins experimentally known to target host subcellular compartments were analyzed with eukaryotic subcellular targeting prediction tools, and the certainty of the predictions was assessed. The results indicate that these tools alone are not sufficient for inter-kingdom protein targeting prediction. Correct prediction of a pathogen protein's subcellular targeting depends on several factors, including the presence of a localization signal, transmembrane domains and molecular weight, in addition to the approach used for subcellular targeting prediction. Protein targeting to the endomembrane system is comparatively difficult to detect, as proteins in this location are channeled to different compartments. In addition, the high specificity of the training data sets lowers inter-kingdom prediction accuracy. The present data can help in devising a strategy for correctly predicting the subcellular localization of bacterial proteins in the host cell.

protein targeting, microbial pathogenesis, in silico, nuclear proteins, mitochondrial proteins

Introduction

Microbial pathogenesis involves a highly coordinated interaction between pathogen and host that supports the pathogen's survival, growth and reproduction. This coordination is multifaceted and involves microbial attachment to the host and subsequent signaling with the host cell machinery. These events are managed through multiple processes, including the targeting of pathogen proteins to the host cell. These targeted proteins localize to several host subcellular compartments [1]. The most important among these are the nucleus and mitochondria, which carry genetic material and control host cell survival and death. Bacterial proteins that migrate to the host nucleus are also known as nucleomodulins [2]. The nucleus is the core of the eukaryotic cellular machinery and controls gene expression, which governs the physiology of the whole cell. The mitochondrion is another critically important organelle of the eukaryotic cell; it controls the energy supply of the cell and also regulates the intrinsic pathway of apoptosis, thereby controlling cellular senescence and death. These two organelles have in common their own genetic material, which is susceptible to several bacterial genetic modulator proteins. In addition, several microbial proteins are known to target the host cell endomembrane system and cytoplasm. The endomembrane system comprises the various membrane-bound compartments of the eukaryotic cell, including the nuclear membrane, rough and smooth endoplasmic reticulum, Golgi apparatus and cytoplasmic vesicles, which are connected to each other either directly or by vesicle transport. During microbial pathogenesis, these membrane-bound compartments communicate with each other, and pathogen proteins are targeted among the components of the endomembrane system [3–5]. Targeting of bacterial proteins to the host cell cytoplasm is a common event that affects the host cell machinery. 
For example, anthrax lethal toxin, produced by the bacterium Bacillus anthracis, migrates to the host cell cytoplasm, affects several host proteins, including mitogen-activated protein kinase, and kills macrophages and macrophage-like cell lines [6].

Several studies have attempted to detect host-targeted pathogen proteins in order to decipher their roles in microbial pathogenesis and in the regulation of host cell physiology, including cell death and proliferation [7–9]. Because experimental analysis of a whole microbial proteome is labor-intensive and expensive, and not every laboratory can afford it, computational prediction of microbial proteins targeting the host cell is now routine practice [10–14]. Several computational tools are available for predicting the subcellular targeting of proteins. However, these tools are trained on data sets derived from the same type of organism for which they are designed to predict subcellular targeting, and their capability for inter-organism prediction needs to be investigated. These tools work on a variety of principles, including detection of localization signals, evolutionary information, amino acid composition, dipeptide composition, sequence similarity and transmembrane segments (Figure 1). Each method has its own advantages and limitations, and each tool reports a certain prediction ability depending on its type (Table 1). Although the prediction reliability of these tools has been assessed for the types of organisms included in their training data sets, their prediction reliability for microbial proteins still needs to be evaluated.
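Composition-based tools in Table 1, such as HSLPred, ESLPred and SubLoc, first convert each sequence into a fixed-length numeric vector that an SVM can classify. The following is a minimal sketch of the two encodings named above, assuming plain relative frequencies over the 20 standard residues; the exact normalization and treatment of non-standard residues inside each tool may differ, and the function names are hypothetical.

```python
from itertools import product

# The 20 standard amino acids in a fixed order, so vector positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs (AA, AC, ..., YY).
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def _standard_residues(seq: str) -> str:
    """Upper-case the sequence and drop non-standard symbols (X, B, Z, U, ...).

    Dropping a residue joins its neighbours, which slightly alters dipeptide
    counts; the real tools may handle such symbols differently.
    """
    return "".join(r for r in seq.upper() if r in AMINO_ACIDS)


def aa_composition(seq: str) -> list[float]:
    """20-dimensional vector: fraction of each standard residue."""
    s = _standard_residues(seq)
    if not s:
        raise ValueError("sequence contains no standard residues")
    return [s.count(a) / len(s) for a in AMINO_ACIDS]


def dipeptide_composition(seq: str) -> list[float]:
    """400-dimensional vector: fraction of each ordered residue pair."""
    s = _standard_residues(seq)
    if len(s) < 2:
        raise ValueError("need at least two standard residues")
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(s) - 1):
        counts[s[i:i + 2]] += 1
    n_pairs = len(s) - 1
    return [counts[d] / n_pairs for d in DIPEPTIDES]


features = aa_composition("MKKLLA") + dipeptide_composition("MKKLLA")
print(len(features))  # 420
```

Concatenating the two vectors gives a 420-dimensional input; a hybrid scheme would add further features (for example, similarity-search results) to the same vector.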

Table 1

Prediction tools used in this study, with their prediction approach, training data set and reliability as reported in the literature

Sr. No.	Prediction tool	Prediction approach, training data set and validation	Reported reliability/prediction performance (as per literature)
1	cNLS mapper [15, 16]	Predicts NLSs in the query protein. NLS activity is scored instead of using conventional sequence similarity or a machine-learning strategy; every amino acid residue at a given position contributes to the NLS activity score. Predictions were validated in budding yeast by replacing each individual amino acid and measuring the effect on NLS activity for each class; each amino acid within an NLS was found to contribute to the overall activity independently. Training data and limitations: NLS profiles were derived from budding yeast data, given the conserved nature of the importin α/β pathway in eukaryotes, but predictions for more distant organisms may be less efficient. It cannot predict proteins that bind importin β directly or that use importin α-independent NLSs.	Sensitivity/specificity/accuracy (%): class 1/2: 99/94/98; class 3: 100/100/100; class 4: 87/97/92; bipartite: 87/82/85. Values are based on test peptide sequences from synthetic NLS mutants.
2	PSORT II [17, 18]	Detects sorting signal sequences plus transmembrane segments and membrane topology. Training data: 1531 yeast sequences from Swiss-Prot.	57% for yeast sequences and 86% for Escherichia coli sequences
3	WOLF PSORT [19]	Uses amino acid composition in addition to PSORT features. Training data set: fungi: 2113; plant: 2333; animal: 12 771 proteins.	70% sensitivity and specificity for mitochondria, nucleus, cytosol, PM, EC and chloroplast; low sensitivity for other sites
4	TargetP [20]	Uses N-terminal sequence information only. Training data set: plant: chloroplast transit peptide (cTP): 141; mitochondrial targeting peptide (mTP): 368; secretory: 269; nuclear: 102; cytosolic: 195. Non-plant: cytosolic: 438; mTP: 371; secretory: 715; nuclear: 1214.	Plants: 85%; non-plants: 90% (on redundancy-reduced test sets)
5	MitoProt [21]	Evaluates 47 parameters over a large set of mitochondrial proteins in Swiss-Prot. Training data set: 12 432 non-mitochondrial and 607 mitochondrial proteins.	Considering only the amino acid sequence: 75–97%; with mitochondrial targeting sequence (MTS): 76–94%
6	BaCeILo [22]	Evaluates residue sequences and alignment profiles of the N- and C-termini as well as of the whole protein sequence. Predictions are balanced across categories to avoid the effect of a biased training data set, and data set redundancy was reduced so that no protein shares >30% identity with another. Training data set: 2597 animal, 1198 fungal and 491 plant proteins.	Animal: 74%; fungi: 76%; plants: 67%
7	HSLPred [23]	Uses SVMs on amino acid composition, dipeptide composition and PSI-BLAST output, plus a hybrid method combining all of these approaches. Training data set: 3532 human proteins (cytoplasmic: 840; mitochondrial: 315; nuclear: 858; PM: 1519; endoplasmic reticulum: 63; EC: 48; peroxisome: 25; lysosome: 51; Golgi: 32; centrosome: 8; microsome: 21).	Amino acid composition: 76.6%; dipeptide composition: 77.8%; similarity based: 73.3%; hybrid approach: 84.9%
8	ESLPred [24]	Uses multiple approaches: SVMs based on amino acid composition, physicochemical properties, dipeptide composition and PSI-BLAST output, and a hybrid approach combining all of these methods. Training data set: 2427 eukaryotic proteins (cytosol: 684; mitochondrial: 321; nuclear: 1097; EC: 325).	Amino acid composition: 78.1%; physicochemical properties: 77.8%; dipeptide based: 82.4%; hybrid module: 88.0%
9	SubLoc v1.0 [25]	Analyzes sequence composition using an SVM. Training data set: prokaryotic (cytosol: 688; periplasmic: 202; EC: 107); eukaryotic (nuclear: 1097; cytosol: 684; mitochondrial: 321; EC: 325).	Three prokaryotic locations: 91.4%; four eukaryotic locations: 79.4%
10	EffectiveDB [26]	A combination of tools that predict secretion of bacterial proteins and their subsequent localization in host subcellular compartments. We used EffectiveT3 (predicts type III secretion signals; training data set: 504 T3SS-secreted proteins), T4SEPre (predicts type IV secretion system effectors; training data set: 1913 T4SS effectors from 10 genera) and Predotar (predicts N-terminal targeting sequences for host subcellular targeting; training data set: 13 668 proteins with known subcellular location in Swiss-Prot).	EffectiveT3: specificity 93%, sensitivity 73%, accuracy 86%, Matthews correlation coefficient (MCC) = 0.66; T4SEPre: sensitivity 89%, specificity 97%; Predotar: plant: 91.62%, non-plant: 94.00% [27]
11	TMPred [28]	Predicts membrane-spanning regions of a protein and their orientation.	Average prediction reliability for photosynthetic reaction centre, bacteriorhodopsin and cytochrome c oxidase: 84.5% [29]

EC = extracellular; PM = plasma membrane.

Figure 1

Graphical outline of the prediction methods used by the different tools and their training data sets.

Estimating the reliability and accuracy of such inter-organism predictions is a challenging task. We therefore designed this study to evaluate the ability of eukaryotic subcellular localization prediction tools to predict the localization of prokaryotic proteins submitted as queries. Such calibration is important for ensuring the prediction accuracy of these tools when they are used in microbial pathogenesis studies.

Materials and methods

Protein sequences

A total of 119 bacterial proteins experimentally known to target host subcellular compartments were selected for the study: 44 nuclear, 29 mitochondrial, 32 endomembrane system and 14 cytosolic proteins, each known either to target or to interact with the respective subcellular location in the host cell. Care was taken to avoid including the same sequence more than once under multiple accession numbers when it was derived from the same bacterial strain. In some cases, related proteins from two different organisms were included, as their origin from different bacteria made them suitable candidates for the study. Protein sequences were retrieved from UniProt, and sequences not found in UniProt were retrieved from the NCBI protein database (details available in Supplementary tables). Both plant and animal (including human) pathogens were selected for prediction.
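The curation rule above (one entry per sequence and strain, even when that sequence carries several accession numbers, while keeping the same protein from different bacteria) can be expressed as a small filter. This is only a sketch: the record fields are hypothetical, and the actual curation for this study may well have been done by hand. The category counts reported above are included as a consistency check.

```python
from collections import Counter

# Category counts reported for the 119 proteins in this study.
REPORTED_COUNTS = {"nuclear": 44, "mitochondrial": 29,
                   "endomembrane": 32, "cytosolic": 14}
assert sum(REPORTED_COUNTS.values()) == 119


def deduplicate(records):
    """Keep the first record for each (strain, sequence) pair.

    Each record is a dict with hypothetical keys 'accession', 'strain',
    'sequence' and 'category'. Identical sequences from different strains
    or organisms are retained.
    """
    seen = set()
    kept = []
    for rec in records:
        key = (rec["strain"], rec["sequence"].upper())
        if key in seen:
            continue  # same strain, same sequence, another accession number
        seen.add(key)
        kept.append(rec)
    return kept


def category_counts(records):
    """Number of retained proteins per host compartment category."""
    return Counter(rec["category"] for rec in records)


toy = [
    {"accession": "A1", "strain": "S1", "sequence": "MKV", "category": "nuclear"},
    {"accession": "A2", "strain": "S1", "sequence": "mkv", "category": "nuclear"},
    {"accession": "B1", "strain": "S2", "sequence": "MKV", "category": "nuclear"},
]
kept = deduplicate(toy)
print([r["accession"] for r in kept])  # ['A1', 'B1']
```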

Selection of tools

Pathogen protein targeting in the host cell is governed by multiple host and pathogen factors. Under certain conditions, pathogen proteins can localize passively to host subcellular compartments, a property governed by their molecular weight [30, 31]; we therefore determined the molecular weight of each protein to assess the possibility of passive subcellular targeting. Targeting of pathogen proteins to host subcellular compartments is also regulated by the presence of localization signals, so tools predicting these signals were included in the study. Because tools based on a single prediction approach cannot account for the influence of other factors on subcellular targeting, we also used tools that detect bacterial protein secretion mechanisms and tools that predict host subcellular targeting by multiple approaches, including transmembrane helix detection, evolutionary information and sequence similarity. As the aim of this study was to determine the prediction certainty of prokaryotic protein targeting in eukaryotic host cells, tools working on diverse principles and training data sets were selected (Figure 1). In total, 11 tools based on different prediction approaches and training data sets were used to predict pathogen protein targeting in the host cell (Table 1; Figure 1). Among these, the classical nuclear localization signal (cNLS) mapper detects nuclear targeting and was therefore used only for nuclear proteins, and MitoProt, which detects mitochondrial targeting, was used only for host mitochondria-targeted proteins. The remaining seven prediction tools predict both nuclear and mitochondrial subcellular targeting and were therefore used for all proteins irrespective of their type. TargetP detects only mitochondrial, chloroplast and secretory pathway localization signals, but its training data set also includes nuclear proteins, so it was also used for all types of proteins to understand its effect on protein localization prediction (Table 1).
TMPred was used to detect transmembrane helices in the query proteins.
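The assignment of tools to protein categories described above can be summarized as a lookup. This sketch rests on our reading of Table 1: the text does not name the "remaining seven" tools, so we assume they are PSORT II, WOLF PSORT, BaCeILo, HSLPred, ESLPred, SubLoc and EffectiveDB (the 11 tools minus cNLS mapper, MitoProt, TargetP and TMPred). The category labels are hypothetical.

```python
# Tools applied to every protein, irrespective of its experimental location.
# ASSUMPTION: this list of seven is inferred from Table 1, not named in the text.
GENERAL_TOOLS = ["PSORT II", "WOLF PSORT", "BaCeILo", "HSLPred",
                 "ESLPred", "SubLoc", "EffectiveDB"]
# TargetP (localization signals) and TMPred (transmembrane helices) were
# also run on all proteins.
ALWAYS_RUN = ["TargetP", "TMPred"]
# Compartment-specific tools, used only for proteins of the matching category.
SPECIFIC_TOOLS = {"nuclear": ["cNLS mapper"], "mitochondrial": ["MitoProt"]}


def tools_for(category: str) -> list[str]:
    """Return the prediction tools to run for a protein of this category."""
    return SPECIFIC_TOOLS.get(category, []) + GENERAL_TOOLS + ALWAYS_RUN


all_tools = {t for c in ("nuclear", "mitochondrial", "endomembrane", "cytosolic")
             for t in tools_for(c)}
print(len(all_tools))  # 11
```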

Host subcellular targeting prediction

The bacterial proteins known to target host subcellular compartments were submitted as queries to the above tools. Default parameters were used for prediction, as these are the most frequently used. For cNLS mapper, the prediction was performed on the entire protein with an NLS cutoff score of 2.0. Proteins of plant and animal/human pathogens were searched against the respective organism-specific database wherever required. Some tools, such as ESLPred and HSLPred, detect protein subcellular localization through individual prediction approaches based on different properties of the query sequence, whereas their hybrid method combines all of these approaches. We used the hybrid method for prediction of subcellular targeting, as it is reported to have the highest prediction accuracy compared with the individual approaches (Table 1). TargetP has a companion tool, SignalP, which predicts signal peptides and can also be applied to bacterial proteins. Nevertheless, we used only TargetP, as we wanted to predict the targeting of bacterial proteins in a eukaryotic system [20].
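Some of the tools in Table 1 take an organism-group setting; for instance, TargetP and Predotar distinguish plant from non-plant models, and WOLF PSORT offers animal, plant and fungi models. The sketch below picks that setting from the pathogen's host, treating human pathogens as animal pathogens. The option strings are illustrative labels, not the exact values each web server accepts, and tools without such a switch return None.

```python
# Illustrative organism-group settings (hypothetical strings, not server API values).
ORGANISM_OPTIONS = {
    "TargetP": {"plant": "plant", "animal": "non-plant"},
    "Predotar": {"plant": "plant", "animal": "non-plant"},
    "WOLF PSORT": {"plant": "plant", "animal": "animal"},
}


def organism_setting(tool: str, host: str):
    """Return the organism-group setting for `tool` given the pathogen's host.

    Human pathogens are mapped to the animal model. Returns None when the
    tool has no organism-group option in this sketch; raises KeyError for an
    unsupported host on a tool that does have options.
    """
    host = host.lower()
    if host == "human":
        host = "animal"
    options = ORGANISM_OPTIONS.get(tool)
    if options is None:
        return None
    return options[host]


print(organism_setting("TargetP", "Human"))  # non-plant
```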

Detection of bacterial protein secretion

Some tools can detect the release of a protein through a potential bacterial secretion system, which can also indicate whether certain proteins reach host subcellular compartments. Considering protein secretion together with subcellular targeting gives a more realistic picture of actual protein targeting. We therefore predicted bacterial protein secretion using EffectiveDB.
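Combining a secretion call with a predicted host location, as described above, amounts to a joint filter. A minimal sketch, assuming each protein has already been given boolean secretion calls (for example from EffectiveT3 and T4SEPre) and one predicted host compartment (for example from Predotar); the field names are hypothetical and no EffectiveDB score thresholds are implied.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    protein_id: str
    t3ss_secreted: bool   # hypothetical: type III secretion call
    t4ss_secreted: bool   # hypothetical: type IV secretion call
    host_location: str    # hypothetical: predicted host compartment


def secreted_and_targeted(preds, location):
    """IDs of proteins predicted to be secreted (T3SS or T4SS) AND
    targeted to `location` in the host cell."""
    return [p.protein_id for p in preds
            if (p.t3ss_secreted or p.t4ss_secreted)
            and p.host_location == location]


preds = [
    Prediction("P1", True, False, "mitochondrion"),
    Prediction("P2", False, False, "mitochondrion"),  # not predicted as secreted
    Prediction("P3", False, True, "nucleus"),
]
print(secreted_and_targeted(preds, "mitochondrion"))  # ['P1']
```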

Results

Nuclear targeting prediction

The results indicate that detecting an NLS is insufficient to guarantee nuclear localization. Taking an NLS cutoff value of 5 as a strong nuclear targeting signal, we found monopartite and bipartite NLSs in only 27% of nuclear proteins. Of these NLS-containing proteins, 83.3% had a molecular weight >40 kDa. In contrast, 56.25% of proteins without an NLS had a molecular weight <40 kDa. Transmembrane helices were distributed almost equally between NLS-containing and NLS-lacking proteins. Figure 2 shows the prediction performance of different protein subcellular localization prediction tools on bacterial nuclear proteins used as queries. BaCelLo and ESLPred gave better inter-kingdom prediction reliability for nuclear proteins >40 kDa. For proteins <40 kDa that carried transmembrane helices, BaCelLo could not accurately predict nuclear targeting. The prediction reliability of PSORT II and WOLF PSORT was poor unless the second prediction choice was also considered significant. For each candidate location of the query protein, PSORT II reports a percent chance and WOLF PSORT reports a number of hits. Including the second predicted location markedly increased prediction reliability (Figure 2). Even so, these tools did not reach 100% prediction certainty, and some false-negative predictions occurred depending on various properties of the query protein; ESLPred stood out among the tools analyzed by maintaining almost uniform prediction reliability. Supplementary Table S1 details the overall prediction of bacterial proteins experimentally known to target the host nucleus. Analysis of NLS distribution across molecular weights showed that most nuclear-targeted proteins lie between 0 and 150 kDa. Among these, the protein with the highest molecular weight (accession number Q2GGH1) had good mono- and bipartite NLS cutoff values. With few exceptions, monopartite NLS cutoff values were lower than bipartite ones; however, as molecular weight increased, the monopartite NLS cutoff value exceeded the bipartite NLS cutoff value (Figure 3).
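To make the two thresholds used above concrete (an NLS cutoff value of 5 and the 40 kDa size limit for passive nuclear entry), the sketch below computes an average molecular weight from sequence and assigns each query to one of four size/NLS strata. It is a minimal illustration under stated assumptions: the residue masses are standard average values, the NLS score is assumed to come from an external predictor such as cNLS Mapper, and `NuclearCandidate` and `stratify` are hypothetical names.

```python
from dataclasses import dataclass

# Standard average residue masses in daltons (free amino acid minus one water).
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # added once for the free N- and C-termini

PASSIVE_LIMIT_KDA = 40.0  # below this size, passive nuclear entry is assumed
NLS_STRONG_CUTOFF = 5.0   # NLS score treated as a strong nuclear signal in the text


def molecular_weight_kda(seq: str) -> float:
    """Average molecular weight of an unmodified protein sequence, in kDa."""
    seq = seq.upper()
    unknown = set(seq) - set(AVG_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return (sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER) / 1000.0


@dataclass
class NuclearCandidate:  # hypothetical record type
    accession: str
    sequence: str
    best_nls_score: float  # highest mono- or bipartite score from an external NLS predictor


def stratify(candidate: NuclearCandidate) -> str:
    """Assign a query protein to one of four size/NLS strata."""
    mw = molecular_weight_kda(candidate.sequence)
    size = "small (<40 kDa)" if mw < PASSIVE_LIMIT_KDA else "large (>=40 kDa)"
    nls = "NLS" if candidate.best_nls_score >= NLS_STRONG_CUTOFF else "no NLS"
    return f"{size}, {nls}"
```

Counting candidates per stratum yields percentages of the kind reported above; both thresholds are kept as parameters rather than treated as fixed biological constants.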

Figure 2

Prediction of host nuclear targeting for bacterial proteins experimentally known to localize to the host nucleus, and its relation to associated factors.

Figure 3

Prediction of NLSs in bacterial proteins experimentally known to target the host nucleus, and their relation to protein molecular weight.

Mitochondrial targeting prediction

In the analysis of bacterial proteins known to target host mitochondria, the MitoProt P value strongly influenced the performance of the subcellular localization prediction tools. For proteins with a MitoProt P value >0.5, prediction reliability increased with sorting-signal detection tools such as TargetP, PSORT II, WOLF PSORT and EffectiveDB, and also with tools combining multiple approaches, such as BaCelLo. The presence of transmembrane segments reduced prediction reliability with TargetP, PSORT II, WOLF PSORT, EffectiveDB and BaCelLo. In contrast, HSLPred, ESLPred and SubLoc could not predict the mitochondrial targeting of proteins without transmembrane helices (Figure 4). WOLF PSORT and PSORT II improved in prediction accuracy when the second choice was counted as significant. In certain situations, the first prediction choice of these tools missed almost all mitochondrial proteins, which suggests that the second prediction choice should not be neglected in such predictions, as it can also carry valuable information. No consistent relation was found between MitoProt P value and the molecular weight of bacterial proteins known to target host mitochondria, except that proteins >250 kDa gave low MitoProt P values (Figure 5). Supplementary Table S2 details the overall prediction of bacterial proteins experimentally known to target host cell mitochondria.
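The effect of counting the second choice can be reproduced with a simple top-k scoring rule. The sketch below assumes each tool's output has already been collected as location-to-score pairs (a percent chance or a number of hits); the accessions, location labels and scores in the demo are hypothetical.

```python
def rank_locations(scores: dict[str, float]) -> list[str]:
    """Order locations by descending score (percent chance or number of hits)."""
    return sorted(scores, key=scores.get, reverse=True)


def top_k_hit(ranked: list[str], known: str, k: int) -> bool:
    """True if the experimentally known location is among the first k choices."""
    return known in ranked[:k]


def reliability(predictions: dict[str, list[str]], truth: dict[str, str], k: int) -> float:
    """Percentage of proteins whose known location appears within the top-k choices."""
    if not truth:
        raise ValueError("no reference proteins")
    hits = sum(top_k_hit(predictions[acc], loc, k) for acc, loc in truth.items())
    return 100.0 * hits / len(truth)


if __name__ == "__main__":
    # Hypothetical per-location scores for three proteins known to target mitochondria.
    scores = {
        "P1": {"cyto": 14, "mito": 9, "nucl": 4},
        "P2": {"mito": 20, "cyto": 7},
        "P3": {"nucl": 12, "cyto": 10, "mito": 5},
    }
    predictions = {acc: rank_locations(s) for acc, s in scores.items()}
    truth = {acc: "mito" for acc in scores}
    print(round(reliability(predictions, truth, k=1), 1))  # 33.3
    print(round(reliability(predictions, truth, k=2), 1))  # 66.7
```

In this toy case only one protein is right at the first choice, while a second one is recovered once the second choice is admitted, which is the pattern described for WOLF PSORT and PSORT II.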

Figure 4

Prediction of host mitochondrial targeting for bacterial proteins experimentally known to localize to host mitochondria, and its relation to associated factors.

Figure 5

MitoProt P values of bacterial proteins experimentally known to target host mitochondria, and their relation to protein molecular weight.

Endomembrane system and cytoplasmic targeting prediction

Detection of bacterial protein targeting to endomembrane system components was strongly influenced by the presence of transmembrane helices in the query proteins. Prediction performance for these locations was poor overall, and for proteins without transmembrane helices it was 0%. For cytoplasmic proteins, correct prediction was highest with SubLoc v1.0, followed by BaCelLo and HSLPred (Figure 6). Supplementary Tables S3 and S4 detail the protein targeting predictions of the different subcellular localization prediction tools for proteins targeting the host endomembrane system and cytosol, respectively.

Figure 6

Prediction of host endomembrane system targeting for bacterial proteins experimentally known to localize to the host endomembrane system, and its relation to associated factors.

In detecting the secretion systems of query proteins, the secretion system listed for each protein (based on the literature) agreed with the EffectiveDB-predicted secretion system for 90.9% (nuclear), 83.33% (mitochondrial), 91.66% (endomembrane system) and 66.6% (cytosol) of proteins. The overall prediction performance of these subcellular localization prediction tools is presented as a heat map (Figure 7).
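The agreement rates above, and the cells of the Figure 7 heat map, are simple proportions of correct calls. A minimal sketch of how a tool-by-compartment matrix and the secretion-system agreement rate could be tabulated follows; the record layout, tool names and secretion-system labels are hypothetical, and plotting is left to any heat-map library.

```python
from collections import defaultdict


def accuracy_matrix(records):
    """Build {tool: {compartment: percent correct}} from (tool, compartment, is_correct) records."""
    tallies = defaultdict(lambda: [0, 0])  # (tool, compartment) -> [correct, total]
    for tool, compartment, is_correct in records:
        cell = tallies[(tool, compartment)]
        cell[0] += int(is_correct)
        cell[1] += 1
    matrix = defaultdict(dict)
    for (tool, compartment), (correct, total) in tallies.items():
        matrix[tool][compartment] = 100.0 * correct / total
    return dict(matrix)


def agreement_percent(literature, predicted):
    """Percentage of shared proteins whose predicted secretion system matches the literature."""
    shared = literature.keys() & predicted.keys()
    if not shared:
        raise ValueError("no proteins in common")
    matches = sum(literature[p] == predicted[p] for p in shared)
    return 100.0 * matches / len(shared)
```

Keeping raw counts alongside the percentages is a deliberate choice here: with small per-compartment sample sizes, a single protein can shift a cell by many percentage points.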

Figure 7

Overall prediction performance of in silico tools for analysis of bacterial protein localization in host subcellular compartments.

Discussion

Pathogen protein targeting of the host cell is an important part of microbial pathogenesis. Among host subcellular compartments, the nucleus and mitochondria are key components at the core of cell survival [14, 32, 33]. Pathogens try to hijack the host cellular machinery so that host cell survival and death are coordinated according to pathogen requirements. In addition, pathogen proteins targeting the host cytoplasm and other membrane-bound organelles have several implications in microbial pathogenesis [1]. Several subcellular protein targeting prediction tools are available, and both their number and their prediction performance are gradually improving. Tools for this study were selected to cover diverse prediction approaches as well as variable training data sets (Figure 1; Table 1).

Protein targeting can be passive, through diffusion, in which a protein travels through available space and stops wherever it finds a suitable retaining target [30]. In most cases, however, targeting depends on localization signals present in the protein itself [34]. We tried to cover both strategies in this study. We recorded the molecular weight of nuclear proteins, because proteins <40 kDa are known to enter the host cell nucleus passively [31]. Prediction tools based on detecting localization signals were included to cover signal-dependent targeting (Figure 1). However, such signals are not always present in a given category of proteins; for example, only a subset of nuclear proteins carries an NLS, so additional factors must be involved in their subcellular localization [35]. We therefore also included tools based on support vector machines (SVMs) that consider multiple factors in predicting protein subcellular targeting. SVMs are supervised learning models that classify data using a maximum-margin decision boundary learned from labeled training examples [36]. The SVM-based tools differ in prediction approach and training data set, each aiming to maximize prediction accuracy (Table 1). Because the subcellular localization of a query protein is also influenced by transmembrane domains, these were analyzed as well.
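The titles of the HSLPred and ESLPred papers describe SVM methods built on amino acid composition and dipeptide composition, respectively [23, 24]. To show what such sequence-derived inputs look like, the sketch below computes the 20-dimensional amino acid composition and the 400-dimensional dipeptide composition of a sequence. It only illustrates the feature vectors under these assumptions; it is not either tool's actual pipeline, and training an SVM on the vectors is not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs
DIPEPTIDE_INDEX = {d: i for i, d in enumerate(DIPEPTIDES)}


def _clean(seq: str) -> str:
    # Drop non-standard symbols (X, B, Z, *); a real pipeline might handle them differently.
    return "".join(aa for aa in seq.upper() if aa in AMINO_ACIDS)


def aa_composition(seq: str) -> list[float]:
    """20-dimensional vector: fraction of each standard residue."""
    seq = _clean(seq)
    if not seq:
        raise ValueError("empty sequence")
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def dipeptide_composition(seq: str) -> list[float]:
    """400-dimensional vector: fraction of each ordered pair of adjacent residues."""
    seq = _clean(seq)
    if len(seq) < 2:
        raise ValueError("need at least two residues")
    counts = [0] * len(DIPEPTIDES)
    for i in range(len(seq) - 1):
        counts[DIPEPTIDE_INDEX[seq[i:i + 2]]] += 1
    n_pairs = len(seq) - 1
    return [c / n_pairs for c in counts]
```

Both vectors have a fixed length regardless of protein size, which is what lets a single classifier accept sequences of any length.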

Although protein localization depends on the multiple factors mentioned above, the targeting of pathogen proteins into host cells requires additional consideration. Pathogens use specialized secretion systems to export their proteins into host cells [37]. Subcellular targeting prediction for pathogen proteins in the host cell is therefore incomplete without predicting the secretion system that exports them. For this reason, we included EffectiveDB, which combines tools for detecting secretion systems with prediction of the query protein's subcellular targeting.

Our study predicted an NLS in only 27% of proteins. This is consistent with the finding that only a subset of nuclear proteins carries an NLS and that such proteins can use alternative strategies to target the host nucleus [38]. Low molecular weight proteins can be translocated into the host nucleus because proteins <40 kDa can enter passively, whereas high molecular weight proteins required higher NLS cutoff values (Figure 3), which is consistent with the results. NLS detection has both advantages and disadvantages. NLSs are simple, consisting of one (monopartite) or two (bipartite) stretches of basic amino acids, but in proteins with multiple predicted NLSs the actual functional NLS cannot be identified, which may contribute to inaccurate predictions. Moreover, NLS activity is calculated for an isolated peptide rather than in the context of the native protein structure. cNLS Mapper is based on a yeast data set for predicting nuclear localization [39]. Nuclear proteins sometimes target multiple locations [40], creating complex situations for prediction tools. This problem is far more serious when the query protein comes from a distant organism. In our study, we used bacterial proteins as queries to detect their targeting in eukaryotic host cells. Discrepancies in targeting prediction are therefore to be expected, and additional parameters should be measured to obtain more precise predictions, as indicated in Figure 2.
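The difficulty of choosing among multiple predicted NLSs can be illustrated with a simplified motif scan. The patterns below are textbook-style approximations of classical NLS consensus sequences (a K-(K/R)-X-(K/R) monopartite core, and a basic pair, 10 to 12 residue linker and basic triplet for bipartite signals); they are an assumption for illustration only, and cNLS Mapper scores candidate signals rather than matching fixed patterns like these. Overlapping lookahead matches show how a single basic stretch yields several candidates, none of which is marked as the functional one.

```python
import re

# Simplified consensus patterns (illustrative only, not cNLS Mapper's scoring).
MONOPARTITE = re.compile(r"(?=(K[KR].[KR]))")            # K-(K/R)-X-(K/R)
BIPARTITE = re.compile(r"(?=([KR]{2}.{10,12}[KR]{3}))")  # basic pair, 10-12 aa linker, basic triplet


def scan_nls(seq: str) -> list[tuple[str, int, str]]:
    """Return (class, 1-based start, motif) for every overlapping candidate NLS."""
    seq = seq.upper()
    hits = []
    for label, pattern in (("monopartite", MONOPARTITE), ("bipartite", BIPARTITE)):
        for match in pattern.finditer(seq):
            hits.append((label, match.start(1) + 1, match.group(1)))
    return hits


if __name__ == "__main__":
    # SV40 large T antigen NLS core: one basic stretch, two overlapping candidates.
    print(scan_nls("PKKKRKV"))
    # [('monopartite', 2, 'KKKR'), ('monopartite', 3, 'KKRK')]
```

Because the patterns ignore structural context, they share the limitation noted above: activity is judged on a linear peptide, not on the folded native protein.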

According to the results, ESLPred gave comparatively consistent inter-kingdom prediction accuracy for nuclear proteins. This may have several causes, including the multiple approaches that ESLPred uses. HSLPred and ESLPred use almost identical prediction approaches but differ in their training data sets: HSLPred is trained on a specific human protein data set, whereas ESLPred uses a broad eukaryotic protein data set (Table 1). As shown in Figure 2, the inter-kingdom prediction performance of ESLPred for nuclear proteins was always higher than that of HSLPred. This indicates that, although a highly specific training data set can give high prediction accuracy for query proteins from that particular organism [41, 42], it yields low inter-kingdom prediction accuracy when detecting prokaryotic pathogen protein targeting in a eukaryotic host cell. Therefore, organism-specific protein subcellular targeting prediction tools cannot solve the problem of in silico detection of one organism's protein targeting in another, distant organism.

In detecting pathogen protein targeting to host mitochondria, tools that detect mitochondrial targeting signals (e.g. TargetP, PSORT II, WOLF PSORT and EffectiveDB) gave good inter-kingdom prediction performance. It has already been suggested that bacteria use mitochondrial targeting signals to direct their proteins to host mitochondria [1]. This fairly supports our result that proteins with high MitoProt P values show good inter-kingdom mitochondrial targeting prediction accuracy (Figure 4), whereas mitochondrial proteins with low MitoProt P values are poorly predicted.

Prediction performance for proteins with transmembrane helices was comparatively poor (Figure 4). Perhaps transmembrane helices add complexity to the query proteins and reduce the tools' ability to predict mitochondrial targeting. For example, the Legionella pneumophila protein LncP, which is experimentally known to target host mitochondria, has four strong transmembrane segments (Supplementary Table S2). This protein has been shown to target mitochondria and form a specific channel for metabolite transfer, and it is involved in the evacuation of adenosine triphosphate from the mitochondrial matrix during infection [43]. It thus appears that targeting of host subcellular compartments is comparatively easier to detect for pathogen proteins with a strong mitochondrial sorting signal and no transmembrane domain than for proteins with the opposite properties. The influence of the mitochondrial targeting signal on prediction ability is further supported by the higher performance of BaCelLo, among tools of similar category, in detecting the targeting of bacterial proteins with MitoProt P values >0.5. BaCelLo considers N- and C-terminal sequences in addition to evolutionary information for its SVM, whereas ESLPred and HSLPred use different prediction approaches (Table 1). MitoProt P values were lower for mitochondrial proteins >250 kDa (Figure 5). This indicates that alternative mechanisms of mitochondrial targeting are possible and should be covered by prediction tools detecting pathogen protein targeting to host mitochondria. Analysis with TMPred showed that all of these proteins contain transmembrane domains and may use a mechanism similar to that of L. pneumophila LncP.
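The study counted transmembrane segments with TMPred. As a simpler, openly specified stand-in (a different method, named here plainly), the sketch below applies the classic Kyte–Doolittle sliding-window hydropathy approach: any 19-residue window whose mean hydropathy exceeds 1.6 is flagged as a candidate membrane-spanning segment, and overlapping windows are merged. It is intended only to show how a transmembrane flag could be derived from sequence; its calls will not necessarily match TMPred's.

```python
# Kyte-Doolittle hydropathy index (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window along the sequence."""
    seq = seq.upper()
    unknown = set(seq) - set(KD)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return [sum(KD[aa] for aa in seq[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


def candidate_tm_segments(seq: str, window: int = 19,
                          threshold: float = 1.6) -> list[tuple[int, int]]:
    """1-based inclusive (start, end) spans of merged windows above the threshold."""
    segments: list[tuple[int, int]] = []
    for i, score in enumerate(hydropathy_profile(seq, window)):
        if score <= threshold:
            continue
        start, end = i + 1, i + window
        if segments and start <= segments[-1][1]:
            segments[-1] = (segments[-1][0], end)  # overlapping window: extend current segment
        else:
            segments.append((start, end))
    return segments
```

The merged spans are deliberately coarse; the count of spans is what a downstream rule would use as the "transmembrane segments present" parameter.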

For detecting pathogen protein targeting to host mitochondria, we conclude that transmembrane helices and mitochondrial targeting signals should be used as additional parameters to customize predictions. Tools for predicting host–pathogen protein targeting should also incorporate these parameters to improve prediction accuracy in microbial pathogenesis studies.
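One way to act on this conclusion is to attach both extra parameters to every in silico mitochondrial call. The function below is purely hypothetical: the tier names, the 0.5 MitoProt cutoff (taken from the results) and the majority-vote rule over sorting-signal tools are illustrative choices, not a validated scoring scheme.

```python
from dataclasses import dataclass


@dataclass
class MitoEvidence:  # hypothetical container for per-protein evidence
    mitoprot_p: float      # MitoProt probability of mitochondrial targeting
    tm_segments: int       # predicted transmembrane segments (e.g. from TMPred)
    signal_votes: int      # sorting-signal tools placing the protein in mitochondria
    signal_tools_run: int  # number of sorting-signal tools queried


def mito_confidence(ev: MitoEvidence, p_cutoff: float = 0.5) -> str:
    """Illustrative confidence tier for an in silico mitochondrial call."""
    if ev.signal_tools_run <= 0:
        raise ValueError("at least one tool must have been run")
    majority = ev.signal_votes / ev.signal_tools_run >= 0.5
    if ev.tm_segments > 0:
        # Transmembrane proteins (e.g. carrier-like proteins) were poorly predicted.
        return "uncertain: transmembrane segments present"
    if ev.mitoprot_p <= p_cutoff:
        return "low: weak targeting signal"
    return "high" if majority else "moderate"
```

Checking transmembrane segments first reflects the observation that they lowered reliability even for tools that otherwise performed well.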

Prediction performance for endomembrane system proteins was the poorest of all subcellular locations analyzed, especially for proteins without transmembrane helices. None of the tools correctly predicted the subcellular targeting of endomembrane system proteins lacking transmembrane helices (Figure 6). Performance for proteins with transmembrane helices was comparatively higher, but still not good. Several reasons may underlie this poor inter-kingdom prediction reliability. The endomembrane system involves vesicle-mediated protein trafficking among multiple compartments [44], and pathogen proteins are known to be trafficked through the endomembrane system during infection [45, 46]. For this reason, we treated the endomembrane system as a whole, with the intention of obtaining better prediction reliability: a prediction of any endomembrane system component was counted as correct. Prediction performance nevertheless remained poor. Most tools used in the study do not predict endomembrane system targeting; only PSORT II and WOLF PSORT predict this location on the basis of sorting signals, and HSLPred predicts only host plasma membrane targeting. This may explain the low inter-kingdom prediction certainty for this location. Efficient prediction of pathogen protein targeting in the host cell requires a tool that incorporates endomembrane compartment sorting signals, transmembrane domains and other parameters. The poor prediction performance for such proteins warrants an independent study of their properties and of their inclusion in prediction algorithms to increase prediction certainty.
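The pooling rule described above (any endomembrane compartment counts as correct) amounts to collapsing several labels into one class before scoring. The sketch below assumes tool outputs have already been normalized to the hypothetical labels shown; the compartment set follows the endomembrane definition given in the Introduction, and whether to add the plasma membrane is left as an open choice.

```python
# Hypothetical normalized labels; raw tool outputs must first be mapped onto them.
# Add "plasma membrane" here if your definition of the endomembrane system includes it.
ENDOMEMBRANE = frozenset({
    "nuclear membrane", "endoplasmic reticulum", "rough endoplasmic reticulum",
    "smooth endoplasmic reticulum", "golgi", "cytoplasmic vesicle",
})


def collapse(location: str) -> str:
    """Map any endomembrane compartment onto a single pooled class."""
    location = location.strip().lower()
    return "endomembrane system" if location in ENDOMEMBRANE else location


def percent_correct(pairs: list[tuple[str, str]]) -> float:
    """pairs: (predicted, experimentally known) locations, with endomembrane hits pooled."""
    if not pairs:
        raise ValueError("no predictions")
    correct = sum(collapse(pred) == collapse(known) for pred, known in pairs)
    return 100.0 * correct / len(pairs)
```

Pooling can only raise the score of a tool that emits endomembrane labels at all; a tool without such labels still scores zero, which matches the explanation offered above.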

For pathogen protein targeting to the host cytoplasm, SubLoc v1.0 showed comparatively higher inter-kingdom prediction certainty. SubLoc is also SVM-based. Although it has separate variants for prokaryotic and eukaryotic proteins, it analyzes a query protein without asking for its source (animal, plant, bacteria, etc.) [25], so comparatively good accuracy for prokaryotic proteins in a eukaryotic system is plausible. This may explain the comparatively good performance of SubLoc for cytoplasmic proteins.

Secretion system prediction by EffectiveDB was comparatively good. This tool is primarily designed to detect secretion of the query protein by bacteria, but it also predicts host subcellular targeting [26], which makes it a strong candidate for microbial pathogenesis studies. Bacteria use several secretion systems to transport effectors into host cells, and, particularly for extracellular pathogens, secretion prediction can help infer subcellular targeting. However, although secretion system prediction adds valuable input, EffectiveDB's reliance on N-terminal targeting sequences alone for subcellular targeting may explain its limited subcellular targeting prediction certainty [27]. A similar limitation is reflected in another study of the k-nearest neighbors classifier (the PSORT II method), which reported 60% prediction accuracy for 10 yeast classes [47]; this may explain some of the false predictions of PSORT II.
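The k-nearest neighbors idea behind PSORT II [47] can be stated briefly: a query is placed in feature space, its k closest training proteins are found, and their known locations are tallied. Such tallies behave much like the number of hits that WOLF PSORT reports, which is also why a second choice can carry information. The sketch below uses Euclidean distance and toy two-dimensional features as assumptions; the real tools' feature sets, distance measures and values of k are not reproduced.

```python
import math
from collections import Counter


def knn_votes(query, training, k):
    """Tally the known locations of the k training proteins closest to the query.

    training: list of (feature_vector, location) pairs.
    """
    if k < 1 or k > len(training):
        raise ValueError("k must be between 1 and the training set size")
    nearest = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    return Counter(location for _, location in nearest)


def knn_ranked(query, training, k):
    """Locations ordered by vote count; index 0 is the first choice, index 1 the second."""
    return [location for location, _ in knn_votes(query, training, k).most_common()]
```

When the training proteins are all eukaryotic, a bacterial query may sit far from every neighbor, so its votes can be split in ways that say little about its true host location.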

The variation in host subcellular targeting predictions among these tools indicates that in silico prediction tools can miss many nuclear and mitochondrial proteins by predicting them to localize elsewhere. This does not summarily invalidate previous studies predicting bacterial protein targeting in host cells, but it raises the concern that such predictions should be further validated by examining actual protein localization and its subsequent effects on the host cell through protein–protein interactions (PPIs). Several factors certainly contribute to the low inter-kingdom prediction accuracy of these tools. For example, proteins are sometimes not localized exclusively to one location (especially multi-pass membrane proteins), which makes prediction uncertain; additional measures are therefore needed to increase prediction certainty for such proteins. In addition, the database used for prediction differs in origin from the query sequence, which can be another reason for variable inter-kingdom prediction accuracy.

This study also revealed several measures that can improve the prediction accuracy of in silico tools. Each type of subcellular targeting location can be analyzed with tools based on different approaches. ESLPred was consistent for nuclear proteins with or without transmembrane (TM) domains or NLSs, and across molecular weights. In contrast, detection of pathogen protein targeting to host mitochondria should be coupled with the additional parameters of MitoProt P value and presence of TM domains, and here tools detecting sorting signals performed better than SVM-based tools using other approaches. Detection of pathogen protein targeting to the endomembrane system remains challenging; tools based on sorting signal detection perform slightly better, but methods incorporating additional parameters are still needed for accurate prediction of pathogen protein targeting in the host cell (Figure 7). It has also been experimentally verified that, for proteins with multiple subcellular targeting locations, in silico targeting prediction should be customized [48]. The data generated in this study can inform the customization of prokaryotic protein subcellular targeting prediction in eukaryotic host cells. They identify factors with positive and negative effects on the reliability values of these tools, which will also help in developing prediction tools for such complex situations. In addition, researchers predicting bacterial protein localization with existing tools can apply these recommendations to obtain better prediction outcomes. A further major improvement would be to predict host–pathogen PPIs of query proteins by homology modeling, hidden Markov models or gold standard PPI data. Among these PPI methods, gold standard PPI, which relies on experimentally validated PPIs, is the most reliable. Several gold standard PPI databases are available and should be incorporated to increase the validity of predictions. High specificity of the training data set and a small number of prediction approaches within a tool also lead to low inter-kingdom prediction certainty; tools with these characteristics should therefore be avoided in certain cases.
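Coupling a localization call with gold standard PPI data can be sketched as a simple cross-check: collect the host partners experimentally shown to bind the pathogen protein, pool their annotated compartments, and ask whether the predicted compartment is among them. The identifiers and data structures below are hypothetical; in practice the pairs would come from a curated PPI database and the host annotations from a localization resource.

```python
from collections import Counter


def partner_compartments(pathogen_id, ppi_pairs, host_localization):
    """Count compartments of host proteins experimentally shown to bind pathogen_id.

    ppi_pairs: iterable of (pathogen_protein, host_protein) pairs.
    host_localization: mapping host_protein -> list of annotated compartments.
    """
    partners = {host for pathogen, host in ppi_pairs if pathogen == pathogen_id}
    counts = Counter()
    for host in partners:
        counts.update(host_localization.get(host, []))
    return counts


def corroborated(predicted_location, counts):
    """True if at least one interacting host protein resides in the predicted compartment."""
    return counts[predicted_location] > 0
```

A prediction without any corroborating partner is not thereby wrong, but under the recommendation above it would be flagged for experimental follow-up rather than used to infer effects on host physiology.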

The standard in silico approach involves evaluating host–pathogen PPIs by multiple methods and detecting host subcellular targeting through significant interacting proteins. Evaluating the influence of microbial proteins on host physiology from in silico subcellular targeting data alone should therefore be discouraged. In conclusion, this article is not intended to criticize the actual function of the tools analyzed, which we did not test and which were presumably tested by their developers before public release. Nevertheless, the results indicate that using these tools to predict the localization of bacterial query proteins should be done carefully and further validated with PPI data. The results also offer a glimpse of how parameters can be customized for inter-kingdom protein subcellular targeting prediction in microbial pathogenesis studies.

Key Points

Pathogen protein targeting of host subcellular compartments is an important part of microbial pathogenesis.
Computational prediction of this targeting is now common practice.
Prediction of prokaryotic protein targeting in eukaryotic host cells should be done carefully.
Host subcellular compartment targeting can be analyzed with tools based on different approaches, taking several additional host–pathogen factors into account.

Supplementary data

Supplementary data are available online at http://bib.oxfordjournals.org/.

Abdul Arif Khan is an Assistant Professor in the Department of Pharmaceutics, College of Pharmacy, King Saud University, Riyadh, Saudi Arabia. He has a strong research interest in cancer-associated infections, including the study of host–pathogen interactions using systems biology approaches. He uses computational approaches to decipher the role of microbes in cancer etiology and diagnosis.

Zakir Khan is a Scientist in the Department of Biomedical Sciences, Cedars-Sinai Medical Center, Los Angeles, USA. His major research interest is understanding molecular mechanisms in order to identify novel targets and strategies for cancer treatment. He also uses computational approaches to understand the molecular mechanisms behind cancer etiology.

Mohd Abul Kalam is an Assistant Professor in the Department of Pharmaceutics, College of Pharmacy, King Saud University, Riyadh, Saudi Arabia. His research focuses on rigorous translational nanomedicine aimed at improving potential therapeutics. His expertise is in nanotechnology, including the use of computational approaches in nanotechnology research.

Azmat Ali Khan is an Assistant Professor in the Pharmaceutical Biotechnology Laboratory, Department of Pharmaceutical Chemistry, College of Pharmacy, King Saud University, Riyadh, Saudi Arabia. His research focuses on drug delivery via lipid nanoparticles. He also studies host–pathogen interactions using computational and wet-lab tools.

Acknowledgments

The authors are grateful to the Deanship of Scientific Research and the Research Centre, College of Pharmacy, King Saud University.

References

1. Escoll P, Mondino S, Rolando M, et al. Targeting of host organelles by pathogenic bacteria: a sophisticated subversion strategy. Nat Rev Microbiol 2016;14:5–19.

2. Bierne H, Cossart P. When bacteria target the nucleus: the emerging family of nucleomodulins. Cell Microbiol 2012;14:622–33.

3. Herweg JA, Hansmeier N, Otto A, et al. Purification and proteomics of pathogen-modified vacuoles and membranes. Front Cell Infect Microbiol 2015;5:48.

4. Yu X, Decker KB, Barker K, et al. Host-pathogen interaction profiling using self-assembling human protein arrays. J Proteome Res 2015;14:1920–36.

5. Caillaud MC, Piquerez SJ, Fabro G, et al. Subcellular localization of the Hpa RxLR effector repertoire identifies a tonoplast-associated protein HaRxL17 that confers enhanced plant susceptibility. Plant J 2012;69:252–65.

6. Tang G, Leppla SH. Proteasome activity is required for anthrax lethal toxin to kill macrophages. Infect Immun 1999;67:3055–60.

7. Hou M, Chen R, Yang D, et al. Identification and functional characterization of EseH, a new effector of the type III secretion system of Edwardsiella piscicida. Cell Microbiol 2016, doi: 10.1111/cmi.12638.

8. Zupan JR, Citovsky V, Zambryski P. Agrobacterium VirE2 protein mediates nuclear uptake of single-stranded DNA in plant cells. Proc Natl Acad Sci USA 1996;93:2392–7.

9. Pennini ME, Perrinet S, Dautry-Varsat A, et al. Histone methylation by NUE, a novel nuclear effector of the intracellular pathogen Chlamydia trachomatis. PLoS Pathog 2010;6:e1000995.

10. Khan S, Zakariah M, Rolfo C, et al. Prediction of Mycoplasma hominis proteins targeting in mitochondria and cytoplasm of host cells and their implication in prostate cancer etiology. Oncotarget 2016, doi: 10.18632/oncotarget.8306.

11. Khan S, Zakariah M, Palaniappan S. Computational prediction of Mycoplasma hominis proteins targeting in nucleus of host cell and their implication in prostate cancer etiology. Tumour Biol 2016;37:10805–13.

12. Khan S, Imran A, Khan AA, et al. Systems biology approaches for the prediction of possible role of Chlamydia pneumoniae proteins in the etiology of lung cancer. PLoS One 2016;11:e0148530.

13. Xie LP, Gao Y, Tian SW, et al. Bioinformatics analysis on homology of CagM protein in Helicobacter pylori Cag Pathogenicity Island. Adv Mat Res 2014;926–30:1081–4.

14. Moreno-Altamirano MM, Paredes-Gonzalez IS, Espitia C, et al. Bioinformatic identification of Mycobacterium tuberculosis proteins likely to target host cell mitochondria: virulence factors? Microb Inform Exp 2012;2:9.

15. Kosugi S, Hasebe M, Tomita M, et al. Systematic identification of cell cycle-dependent yeast nucleocytoplasmic shuttling proteins by prediction of composite motifs. Proc Natl Acad Sci USA 2009;106:10171–6.

16. Kosugi S, Hasebe M, Matsumura N, et al. Six classes of nuclear localization signals specific to different binding grooves of importin alpha. J Biol Chem 2009;284:478–85.

17. Nakao MC, Nakai K. Improvement of PSORT II protein sorting prediction for mammalian proteins. Genome Inform 2002;13:441–2.

18. Nakai K, Horton P. PSORT: a program for detecting sorting signals in proteins and predicting their subcellular localization. Trends Biochem Sci 1999;24:34–6.

19. Horton P, Park KJ, Obayashi T, et al. WoLF PSORT: protein localization predictor. Nucleic Acids Res 2007;35:W585–7.

20. Emanuelsson O, Nielsen H, Brunak S, et al. Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. J Mol Biol 2000;300:1005–16.

21. Claros MG, Vincens P. Computational method to predict mitochondrially imported proteins and their targeting sequences. Eur J Biochem 1996;241:779–86.

22. Pierleoni A, Martelli PL, Fariselli P, et al. BaCelLo: a balanced subcellular localization predictor. Bioinformatics 2006;22:e408–16.

23. Garg A, Bhasin M, Raghava GP. Support vector machine-based method for subcellular localization of human proteins using amino acid compositions, their order, and similarity search. J Biol Chem 2005;280:14427–32.

24. Bhasin M, Raghava GP. ESLpred: SVM-based method for subcellular localization of eukaryotic proteins using dipeptide composition and PSI-BLAST. Nucleic Acids Res 2004;32:W414–9.

25. Hua S, Sun Z. Support vector machine approach for protein subcellular localization prediction. Bioinformatics 2001;17:721–8.

26. Eichinger V, Nussbaumer T, Platzer A, et al. EffectiveDB-updates and novel features for a better annotation of bacterial secreted proteins and Type III, IV, VI secretion systems. Nucleic Acids Res 2016;44:D669–74.

27. Small I, Peeters N, Legeai F, et al. Predotar: a tool for rapidly screening proteomes for N-terminal targeting sequences. Proteomics 2004;4:1581–90.

28. Hofmann K, Stoffel W. TMbase - a database of membrane spanning proteins segments. Biol Chem Hoppe-Seyler 1993;374:166.

29. Gromiha MM. A simple method for predicting transmembrane alpha helices with better accuracy. Protein Eng 1999;12:557–61.

30. Rudner DZ, Losick R. Protein subcellular localization in bacteria. Cold Spring Harb Perspect Biol 2010;2:a000307.

31. Tran EJ, Wente SR. Dynamic nuclear pore complexes: life on the edge. Cell 2006;125:1041–53.

32. Canonne J, Rivas S. Bacterial effectors target the plant cell nucleus to subvert host transcription. Plant Signal Behav 2012;7:217–21.

33. Jiang JH, Tong J, Gabriel K. Hijacking mitochondria: bacterial toxins that modulate mitochondrial function. IUBMB Life 2012;64:397–401.

34. Rusch SL, Kendall DA. Protein transport via amino-terminal targeting sequences: common themes in diverse systems. Mol Membr Biol 1995;12:295–307.

35. Freitas N, Cunha C. Mechanisms and signals for the nuclear import of proteins. Curr Genomics 2009;10:550–7.

36. Cortes C, Vapnik V. Support-vector networks. Mach Learn 1995;20:273–97.

37. Costa TR, Felisberto-Rodrigues C, Meir A, et al. Secretion systems in Gram-negative bacteria: structural and mechanistic insights. Nat Rev Microbiol 2015;13:343–59.

38. Macara IG. Transport into and out of the nucleus. Microbiol Mol Biol Rev 2001;65:570–94.

39. Khan AA. In silico prediction of Escherichia coli proteins targeting the host cell nucleus, with special reference to their role in colon cancer etiology. J Comput Biol 2014;21:466–75.

40. Rivas S. Nuclear dynamics during plant innate immunity. Plant Physiol 2012;158:87–94.

41. Kaundal R, Raghava GP. RSLpred: an integrative system for predicting subcellular localization of rice proteins combining compositional and evolutionary information. Proteomics 2009;9:2324–42.

42. Kaundal R, Saini R, Zhao PX. Combining machine learning and homology-based approaches to accurately predict subcellular localization in Arabidopsis. Plant Physiol 2010;154:36–54.

43. Dolezal P, Aili M, Tong J, et al. Legionella pneumophila secretes a mitochondrial carrier protein during infection. PLoS Pathog 2012;8:e1002459.

44. Hsu VW, Lee SY, Yang JS. The evolving understanding of COPI vesicle formation. Nat Rev Mol Cell Biol 2009;10:360–4.

45. Alexander MM, Cilia M. A molecular tug-of-war: global plant proteome changes during viral infection. Current Plant Biol 2016;5:13–24.

46. Lu YJ, Schornack S, Spallek T, et al. Patterns of plant subcellular responses to successful oomycete infections reveal differences in host cell reprogramming and endocytic trafficking. Cell Microbiol 2012;14:682–97.

47. Horton P, Nakai K. Better prediction of protein cellular localization sites with the k nearest neighbors classifier. Proc Int Conf Intell Syst Mol Biol 1997;5:147–52.

48. Fuss J, Liegmann O, Krause K, et al. Green targeting predictor and ambiguous targeting predictor 2: the pitfalls of plant protein targeting prediction and of transient protein expression in heterologous systems. New Phytol 2013;200:1022–33.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://dbpia.nl.go.kr/journals/pages/about_us/legal/notices)


