mUSP: a high-accuracy map of the in situ crosstalk of ubiquitylation and SUMOylation proteome predicted via the feature enhancement approach

Data collection and processing

In this study, ubiquitylation and SUMOylation, the most widely studied ubiquitin family conjugations, were investigated. From a home-built database of PLMD [24], we collected 121 742 and 8115 lysine ubiquitylation and SUMOylation sites. By integrating these two data sets, 3363 UBS crosstalk sites in 1302 proteins were detected. In our system, a local sliding window size of a maximum number of 31 residues flanking each lysine was chosen. We used multiply-modified sites as the positive data, but it would be difficult to prove definitively that a lysine residue not have occurred crosstalk under any conditions. To solve this problem of choosing the negative sets, the single modification sites were used as negative data. Therefore, in this study, we can develop a model to determine whether experimental ubiquitination site can be regulated by SUMOylation, or vice versa. And such a procedure can efficiently exclude potentially non-functional UBS crosstalk mainly generated by chance. Eventually, the positive set was composed of 3363 crosstalk sites, whereas the negative sets included 123 131 single modification sites. All of the benchmark data sets can be downloaded from http://bioinfo.ncu.edu.cn/mUSP/Download.html.

Sequence-derived characteristics

Binary

For the feature encodings, the UBS crosstalk sequence UCS (m, n) was defined as a lysine flanked by m residue upstream and n residue downstream, and UCS (15, 15) was adopted. Because a number of UBS crosstalk sites are located at the N- or C-terminus of proteins, special characters ‘*’ was added to complement the full UCS (15, 15) peptides. Binary encoding [25] calculates the position-specific composition of the residue in the UCS (15, 15). Each residue is encoded by a 21-dimensional binary vector, i.e. A is encoded to (1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), C is encoded to (0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), …, …, Y is encoded by (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1), respectively. For a UCS (15, 15), a digital vector was encoded from the flanking residue:

$$\begin{eqnarray*} &&{B}_i=\left({b}_{n1},{b}_{n2},{b}_{n3},{b}_{n4},\dots, {b}_{n31}\ \right),b\in \left\{\begin{array}{@{}c}\mathrm{A}:100000000000000000000\\{}\mathrm{C}:010000000000000000000\\{}\dots \\{}\mathrm{Y}:000000000000000000010\\{}\ast :000000000000000000001\end{array}\right. \\ &&n\in \left\{A,C,\dots, Y,\ast \right\} \end{eqnarray*}$$

Composition of K-spaced amino acid pairs

Composition of K-spaced amino acid pairs (CKSAAP) feature encoding [26] represents the composition of amino acid pairs that are separated by K (K = 0, 1, 2, …, 5. The default maximum value of K is 5) other residues within the UCS (15, 15), which reflects the short-range interactions of amino acid within the sequences. Suppose K = 0 as an example, 441 0-spaced amino acid pairs (i.e. ‘AC’, ‘AD’, ‘AE’, ‘AF’, ‘AG’, ‘AH’, ‘AI’, ‘AK’, …, ‘**’)₄₄₁ are generated. Then, the feature vector can be defined as:

$$ {\left(\frac{N_{\mathrm{AC}}}{N_{\mathrm{Total}}},\frac{N_{\mathrm{AD}}}{N_{\mathrm{Total}}},\frac{N_{\mathrm{AE}}}{N_{\mathrm{Total}}},\frac{N_{\mathrm{AF}}}{N_{\mathrm{Total}}}, \dots, \frac{N_{\ast \ast }}{N_{\mathrm{Total}}}\right)}_{K=0} $$

All amino acid pair for K = (0, 1, …, 5) are considered. Each UCS (15, 15) is encoded as a digital vector as follows:

$$\begin{eqnarray*} &&{\left(\frac{N_{\mathrm{AC}}}{N_{\mathrm{Total}}},\frac{N_{\mathrm{AD}}}{N_{\mathrm{Total}}},\frac{N_{\mathrm{AE}}}{N_{\mathrm{Total}}},\dots, \frac{N_{\ast \ast }}{N_{\mathrm{Total}}}\right)}_{K=0},\dots, \\ && {\left(\frac{N_{\mathrm{AC}}}{N_{\mathrm{Total}}},\frac{N_{\mathrm{AD}}}{N_{\mathrm{Total}}},\frac{N_{\mathrm{AE}}}{N_{\mathrm{Total}}},,\dots, \frac{N_{\ast \ast }}{N_{\mathrm{Total}}}\right)}_{K=5} \end{eqnarray*}$$

Amino acid composition

Amino acid composition (AAC) encoding [27, 28] represents the frequency of each type of residues in a UCS (15, 15). The frequencies of all 20 natural residues (‘ACDEFGHIKLMNPQRSTVWY’) can be calculated as:

$$ f(i)=\frac{N_{(i)}}{N},i\in \left\{\mathrm{A},\mathrm{C},\mathrm{D},\mathrm{E},\mathrm{F},\mathrm{G},\mathrm{H},\mathrm{I},\mathrm{K},\mathrm{L},\mathrm{M},\mathrm{N},\mathrm{P},\mathrm{Q},\mathrm{R},\mathrm{S},\mathrm{T},\mathrm{V},\mathrm{W},\mathrm{Y}\right\} $$

where N_(i) donates the number of residues type and N represents the length of a UCS (15, 15).

Enhanced amino acid composition

The enhanced amino acid composition (ENAC) encoding [29] represents the local AAC for the fixed-length sequence window (we set the window as 5) that continuous slides from the 5′ to 3′ terminus of each protein sequence. For more information on this approach, please refer to AAC feature encoding. Thus, a UCS (15, 15) corresponded to 21 * (31 – 5 + 1) sliding windows and its vector dimension of the EAAC encoding was 21 × 27 = 567.

Evolutionary-derived information

Local sequence clusters (LSC) often exist around PTM sites because substrates of the same enzyme usually shares similar patterns. To take full advantage of such cluster information for predicting UBS crosstalk sites, we adopted the LSC score algorithm [25] to extract features from its similar sequences in both positive and negative data sets. For a query sequence, we first detected its K nearest neighbors in both positive and negative data sets based on the local sequence similarity. For example, for two local sequence fragments S₁ and S₂ (the window size is 2p + 1), we define the distance D (S₁, S₂) between S₁ and S₂ as

$$ Dist\left({S}_1,{S}_2\right)=1-\frac{\sum_{i=-p}^p Sim\left( Si(i),S2(i)\right)}{2p+1} $$

where p denotes the number of flanking residues from the central site in the protein sequence fragment and i denotes the position of an amino acid in the target sequence segment; Sim, the amino acid similarity matrix, is derived from the BLOSUM62 substitution matrix as:

$$ \mathrm{Sim}\left(a,b\right)=\frac{M\left(a,b\right)-\operatorname{Min}\left[M\right]}{\operatorname{Max}\left[M\right]-\operatorname{Min}\left[M\right]} $$

where a and b are two amino acids, M is the substitution matrix and max/min [M] represents the largest/smallest number in the matrix, respectively. After that, the LSC score is then extracted as follows: (i) calculate the average distances from the query sequence fragment S to all the training set (contain the positive and negative sets), (ii) sort the neighbors by the distances and choose the K nearest neighbors and (iii) calculate the percentage of positive neighbors in its K nearest neighbors as the LSC score. Last, to take advantage of different properties of neighbors with various similarities, we chose different K values to obtain multiple scores.

Structural-derived information

Accessible surface area

Accessible surface area (ASA) encoding [30] represents the approximately exposed area of an amino acid to solvent. A widely used tool, SPIDER2 [31], was adopted to calculate the potential ASA values for residues in a UCS (15, 15). Thus, each UCS (15, 15) peptide i is characterized by a 21-dimensional digital vector as

$$\begin{eqnarray*} &&A(i)=\left({a}_1,{a}_2,\dots, {a}_{31}\right),\\ && a\in \left\{\mathrm{A},\mathrm{C},\mathrm{D},\mathrm{E},\mathrm{F},\mathrm{G},\mathrm{H},\mathrm{I},\mathrm{K},\mathrm{L},\mathrm{M},\mathrm{N},\mathrm{P},\mathrm{Q},\mathrm{R},\mathrm{S},\mathrm{T},\mathrm{V},\mathrm{W},\mathrm{Y}\right\} \end{eqnarray*}$$

where a_i donates the value of each residue.

Local backbone angles

The local backbone angles (LBA) encoding [32] represents the detailed geometric information at specific positions for each residue in a protein sequence. Then, our indexes, i.e. the backbone torsion angles φ and Ψ, the angle between Cαi − 1 − Cαi − Cαi + 1(θ) and the dihedral angle rotated around the Cαi − Cαi + 1 bond (τ) were computed by SPIDER2. Each UCS (15, 15) peptide i is transformed into a dimensional vector as

$$ L(i)={\left({L}_1,{L}_2,\cdots\!, {L}_{31}\right)}_{\varphi }{\left({L}_1,{L}_2,\cdots\!, {L}_{31}\right)}_{\varPsi }{\left({L}_1,{L}_2,\cdots\!, {L}_{31}\right)}_{\theta }{\left({L}_1,{L}_2,\cdots\!, {L}_{31}\right)}_{\tau } $$

Secondary structure

For the protein SS information, the probability score of α-helix, β-strand or coil was computed by SPIDER2 for each residue in the UCS (15, 15). Each UCS (15, 15) peptide i is transformed into a dimensional vector as

$$ S(i)={\left({s}_1,{s}_2,\cdots, {s}_{31}\right)}_{\alpha -\mathrm{helix}}{\left({s}_1,{s}_2,\cdots, {s}_{31}\right)}_{\beta -\mathrm{strand}}{\left({s}_1,{s}_2,\cdots, {s}_{31}\right)}_{\mathrm{coil}} $$

Two-step feature selection strategy via recursive feature elimination

Feature selection was a critical step to eliminate the noisy features and improve performance. In this study, we performed a two-step feature selection method to capture the most prominent features and then to construct the final model with the optimal feature group. In the first step, the chi-square statistics [33] between every feature variable and the target labels were calculated to explore the feature important. If the feature was independent of the target, we discard the feature variable. Otherwise, the feature was considered important and reserved. This procedure could generate an index of feature ranking regarding their classification importance. In the second step, the method of recursive feature elimination was adopted to determine the optimal feature representations by recursively eliminating a small number of weakest features per loop [34]. More specifically, to determine the optimal group, features from the ranked index were group eliminated ten-by-ten from lower rank to higher rank each time, whereas the least important features are gradually pruned. The remaining features were selected to rebuild random forests (RF)-based prediction model on the 10-fold CV repeatedly. Finally, the feature subset with the best performance was selected as the optimal feature subset to build the prediction model. The methodology of the strategy was illustrated as follows:

Open in new tab

Inputs:

Training data set D.

Set of i features F = {f₁, f₂, …, f_i}.

Ranking method R (D, F).

Outputs:

Final ranking FR.

Algorithms:

Repeat for k in {1: F}:

Rank set F using R (D, F)

f^* = last ranked feature in F.

FR (F – K + 1) = f ^*.

F = F – f ^*.

Figure 2

Sequence logo illustration generated by pLogo to show the amino acids with statistically significant differences surrounding the UBS crosstalk sites with ubiquitylation (A) or SUMOylation (B) data as background. (C, D) The comparison of LSC scores between UBS crosstalk sites and ubiquitylation (C) or SUMOylation (D) data.

The implementation of the web service

The online service of mUSP was constructed in an easy-to-use manner using PHP and JavaScript. Many annotated information was integrated into mUSP. In addition to a ‘general search’ option, where you can retrieve information for all columns by imputing keyword, researchers can search each column individually, and multiple columns or multiple conditions search at the same time. The website of mUSP was extensively tested on various web browsers including Internet Explorer, Mozilla Firefox and Google Chrome to provide a robust service. For convenience, the online service of mUSP was implemented and freely available at http://bioinfo.ncu.edu.cn/mUSP/index.html.

Results

Motifs analysis for the UBS crosstalk site compared to single modification

Based on the data sets, the graphical sequence logo (P-value <0.05) was generated by pLogo tool to reveal the statistical differences of position-specific amino acid frequencies between the UBS crosstalk and single modification sites (Figure 2A, B). Compared to ubiquitylation sites, we found lysine (K), glutamic acid (E), aspartic acid (D) and valine (V) were significantly enriched in the fragments containing UBS crosstalk sites, whereas ubiquitylation modification fragments contained a large number of leucine (L) in the protein sequence. When using SUMOylation modification data as the background, it was found that glycine (G), leucine (L) and alanine (A) were significantly enriched in the crosstalk fragments. Moreover, UBS crosstalk events might be facilitated by specific consensus sequences or motifs that have biological significances. Therefore, using MEME tool, we analyzed the UBS crosstalk data against the background set of single modification sequences individually and three significant motifs of UBS crosstalk were discovered, respectively [35] (see Supplementary Figure S1 available online at https://dbpia.nl.go.kr/bib). From the result, ΨKxE is the most predominant motif of UBS crosstalk sites compared to ubiquitylation sites (where Ψ represents hydro-phobic amino acid) such as IKxE and VKxE. In addition, GTYG with remarkable enrichment of glycine (G) at upstream of the UBS crosstalk was also detected as the most consensus motif with SUMOylation data as the background. The analysis suggests that sequence patterns of the flanking regions of UBS crosstalk sites were more conservative with higher specificity than those of single modification sites.

Figure 3

The structural annotations regarding the accessible surface area (A), local backbone angles (B) and SS (C) between UBS crosstalk and ubiquitylation or SUMOylation data.

LSC scores analysis between UBS crosstalk and single modification site

The Local Sequence Clusters (LSC) scores can measure whether a query sequence is more similar to the UBS crosstalk sequence or not. A higher LSC score always means that the query sequence was more similar to the positive samples, whereas a lower LSC score means that the query sequence was more likely to the negative samples. Figure 2C, D compared the LSC score between UBS crosstalk and single modification sequences individually. More specifically, the average LSC scores of UBS crosstalk with different sizes of nearest neighbors were within 0.55–0.63 and 0.47–0.54. From the result, we found there were statistically significant differences among the various size of nearest neighbors of the LSC score between UBS crosstalk and ubiquitylation (P-value, 1.94e − 4) or SUMOylation sites (P-value, 2.67e − 4). UBS crosstalk sequences own larger LSC scores than single modification sequences. Hence, the local sequences surrounding identified UBS crosstalk sites were more similar to their nearest neighbors in the positive set. In other words, for query sequence, a high LSC score always means an increased likelihood that the site was to be some known UBS crosstalk sites. This result confirms that UBS crosstalk-related clusters exist in local sequences around crosstalk sites. For single modification sites, it was not surprising that the average LSC scores were comparatively low, which demonstrated that the global sequences in the negative set were not mainly similar to the nearest neighbors in either the positive or the negative data sets. Sequences in the negative data set possess the same opportunities to find nearest neighbors in either the positive or the negative data set. In summary, the LSC scores can capture the cluster information in the local sequence around UBS crosstalk sites and hence distinguish them from the negative data. Therefore, LSC score was adopted as features for UBS crosstalk site prediction.

Figure 4

(A) The correlation between every feature vector in the training data and the corresponding label based on chi-square statistics. (B) The AUC value of 10-fold cross-validation with the decrease of feature dimension. (C) The t-distributed Stochastic Neighbor Embedding (t-SNE) visualization of the benchmark data set in a 2D feature space before and after feature optimization.

Structural annotations between UBS crosstalk and single modification site

Structural annotations regarding the SS, accessible surface area and local backbone angles between UBS crosstalk and single modification site were explored (Figure 3). For the results, it was found that UBS crosstalk sites existed a higher accessible surface area value compared to ubiquitylation and SUMOylation sites, whereas ubiquitylation site has the lowest value (Figure 3A). For the local backbone torsion angles, our results show that UBS crosstalk was at an intermediate value between ubiquitylation and SUMOylation in terms of the properties of φ, θ and τ and do not show special fluctuation. But for the Ψ, we found that UBS crosstalk sites owned a higher value compared to ubiquitylation and SUMOylation sites (Figure 3B). The SS represents the local 3D structure of proteins. Each residue can contribute to one of the three defined local structures of a protein, i.e. helix, strand and coil. The information of protein SS can provide a potent trail towards understanding their general 3D configuration. In this study, the SS between the UBS crosstalk and single modification was compared. It was found that UBS crosstalk tends to have a slightly higher helix structure probability, whereas the prediction probability of the strand and coil structure were located in between ubiquitylation and SUMOylation (Figure 3C). Taken together, the results of structural annotations demonstrated that there are structural differences between UBS crosstalk and single modification site and can be used to build models to distinguish positive and negative samples.

Two-step feature selection strategy to determine the optimal feature subset

As demonstrated above, the UBS crosstalk sites exhibited great difference on the primary sequence’s motifs, local evolutionary scores and structural annotations compared to single modification site. But multiple features include many dimensions, which were not equivalently important for the model performance, and some of the feature vectors might be unwanted noise. In this study, a two-step feature selection method was used via the recursive feature elimination to select the optimal feature group. First, we use the chi-square statistics to calculate the correlation between every feature vector and the corresponding label to quantify its important (Figure 4A). The least important features are gradually pruned round by round. We plotted the change of AUC value based on 10-fold CV with the decrease of feature dimension (Figure 4B). We observed the performance of the model increased quickly at the beginning, then reached the highest point of performance and gradually decreasing, which suggested that the recursive feature elimination contributes feature representation and was effective to improve performance. To explore the inner cause of performance improvement, the distribution of feature space between the initial and optimal status was investigated and visualized using t-distributed Stochastic Neighbor Embedding (t-SNE) (Figure 4C). We first adopted principal components analysis method to reduce the number of dimensions of all features to 50-dimensions. We then computed the t-SNE implemented in scikit-learn 0.22.1 with the default parameters (n_components = 2, perplexity = 30, and learning rate = 200). The distribution of positive and negative samples in the two-dimensional (2D) feature space was illustrated. We found a clear inhomogeneity without clear border in the distribution pattern of positive (UBS crosstalk sites) and negative data (single modification sites), indicating that the original feature discriminating capability was less. By implementing the recursive feature elimination process, the feature space tends to be relatively stable, in which the distinction between the positive samples and negative samples in feature space was clearer.

Figure 5

The performance evaluation and comparison of mUSP, mUSP-S and mUSP-U, based on multiple cross-validations. For the models of mUSP-U and mUSP-S, the negative samples in training data only contained ubiquitylation and SUMOylation sites. (A) The performance comparison between the integration of optimal features and the individual feature. (B) The ROC curves and AUC values of mUSP based on multiple cross-validations. (C) The ROC curves and AUC values of mUSP-S and comparison of the mUSP-S with GPS-SUMO. (D) The ROC curves and AUC values of mUSP-U and comparison of the mUSP-U with MUscADEL.

Construction and evaluation of mUSP predictor

After exhaustively testing, RF classification algorithm with 100-dimension representative features was implemented as the mUSP classifier. To assess the prediction performance of mUSP, the 4-, 6-, 8- and 10-fold CVs were applied and the ROC curves were drawn (Figure 5B) and the corresponding AUC values were calculated as 0.8362 (4-fold), 0.8396 (6-fold), 0.8433 (8-fold) and 0.8472 (10-fold), respectively. Since the prediction performance of the 4-, 6-, 8- and 10-fold CVs were closely similar and average AUC values of various CVs also as high as 0.8416, it is evident that the model was a robust predictor. In fact, researchers can directly input some ubiquitylation or SUMOylation sites to the SUMOylation or ubiquitylation prediction tools to cross-detect the potential UBS crosstalk sites. But this is often inaccurate and biased due to the original intention of these tools. Therefore, to exhibit the reliability of mUSP, another two-gold data sets with only negative data that contain only ubiquitylation and SUMOylation sites were constructed, which was used to train two new models mUSP-U and mUSP-S, separately. The 4-, 6-, 8- and 10-fold CVs were performed, and the ROC curves were drawn for the two new models (Figure 5C, D). The average AUC values of mUSP-U and mUSP-S were calculated as 0.8616 and 0.8404. Multiple CV results of these two models were high congruence, which indicates the promising accuracy and the two models are robust. Moreover, the benchmark data set of mUSP-U and mUSP-S were directly submitted to MUscADEL [36] and GPS-SUMO [37], i.e. two most popular prediction tools for identifying the lysine ubiquitylation and SUMOylation sites for cross-prediction, and the cross-prediction results were 0.6072 and 0.5866 for the GPS-SUMO and MUscADEL. The mUSP achieved significantly better performance with the 38.41 and 51.48% AUC values improvement compared to the results by GPS-SUMO and MUscADEL in the benchmark data. Overall, our results demonstrated the robustness and superiority of mUSP, mUSP-U and mUSP-S.

Discussion

Recent studies impressively demonstrated that various PTMs can work together to regulate specific biological pathways [38,39]. The combination of different PTMs on protein surfaces can create a ‘PTM code’, which regulated protein functions and cellular signaling, represent just another level of complexity and pose a major challenge to ongoing and future PTM research. Ubiquitylation and SUMOylation are the discrete modifications, which can bring different outcomes. In situ crosstalk of ubiquitylation and SUMOylation has been identified and functionally studied. For example, mono-ubiquitylated at Lys164 of proliferating cell nuclear antigen (PCNA) facilitates translation synthesis, a DNA repair pathway that is prone to error; however, construction of a poly-ubiquitin chain at this site promotes an error-free repair process by PCNA. Conversely, SUMOylation at Lys164 during S-phase prevents PCNA-induced DNA recombination [40].

In this study, we systematically explored the in situ crosstalk of ubiquitylation and SUMOylation co-occurring on the same residue. In total, 3363 UBS crosstalk sites in 1302 proteins were detected from the housekeeping of PLMD database. By investigating sequence information, we found UBS crosstalk sites exhibited great differences in the prime sequence characteristics, local sequence evolutionary scores compared to the single modification sites. Also, there was a difference for the distribution of structural annotations between UBS crosstalk and single modification sites either on central modification sites or on crosstalk sequences. Thus, we integrated sequence-derived characteristics, local sequence evolutionary scores and structural annotations with a two-step feature selection strategy via recursive feature elimination approach to build the mUSP model for predicting UBS crosstalk sites.

Our analysis provided important clues to the feature selection and led to the satisfying predictive performance. We observed that the two-step feature selection strategy enhances feature representation and contributes to performance improvement for the in situ crosstalk of ubiquitylation and SUMOylation prediction. The following performance evaluation showed that our method achieved relatively high satisfaction. With the mUSP model, 11 444 high-confidence potential UBS crosstalk sites were excavated and added to the mUSP web server along with annotation information for researchers to query. With the great development of mass spectrometry-based proteomics experiment, a mass of ubiquitylation and SUMOylation sites was experimentally verified, but their crosstalk and its properties remain to be systematically characterized. Thus, it is anticipated that the classifiers constructed in this work could be useful for biomedical research and guide the related experimental validations.

Key Points

Reversible post-translational modification (PTM) crosstalk plays important role in regulating diverse biological processes. This study conducted a comprehensive survey of properties of UBS crosstalk regarding the sequence motifs, local evolutionary scores and distribution of structural annotations.
Two-step feature selection strategy enhances feature representation and contributes to performance improvement for the in situ crosstalk of ubiquitylation and SUMOylation prediction.
A web portal (http://bioinfo.ncu.edu.cn/mUSP/index.html) has been made available to facilitate the query of our high-accuracy map of the UBS crosstalk results.

Funding

National Natural Science Foundation of China (21665016 and 21175064).

Conflict of interest

The authors declare that they have no competing interests.

Hao-Dong Xu is a graduate student at the department of chemistry, Nanchang University. His research interest includes machine learning & deep learning algorithms development and database construction.

Ru-Ping Liang is a professor at the department of chemistry, Nanchang University.

You-Gan Wang is a graduate student at the department of chemistry, Nanchang University. His research interest includes machine learning and database construction.

Jian-Ding Qiu is a professor at the department of chemistry, Nanchang University. His research interest includes the development of novel algorithms, online tools, and databases for protein structure and function prediction.

References

1.

Gallego

M

,

Virshup

DM

.

Post-translational modifications regulate the ticking of the circadian clock, at

.

Nat Rev Mol Cell Biol

2007

;

8

:

139

.

2.

Harmel

R

,

Fiedler

D

.

Features and regulation of non-enzymatic post-translational modifications

.

Nat Chem Biol

2018

;

14

:

244

.

3.

Westermann

S

,

Weber

K

.

Post-translational modifications regulate microtubule function

.

Nat Rev Mol Cell Biol

2003

;

4

:

938

.

4.

Xu

H

,

Wang

Y

,

Lin

S

, et al.

PTMD: a database of human disease-associated post-translational modifications

.

Genomics Proteomics Bioinformatics

2018

;

16

:

244

–

51

.

5.

Hunter

T

.

The age of crosstalk: phosphorylation, ubiquitination, and beyond

.

Mol Cell

2007

;

28

:

730

–

8

.

6.

Swaney

DL

,

Beltrao

P

,

Starita

L

, et al.

Global analysis of phosphorylation and ubiquitylation cross-talk in protein degradation

.

Nat Methods

2013

;

10

:

676

.

7.

Yang

X-J

,

Grégoire

S

.

A recurrent phospho-sumoyl switch in transcriptional repression and beyond

.

Mol Cell

2006

;

23

:

779

–

86

.

8.

Xu

HD

,

Wang

LN

,

Wen

PP

, et al.

Site-specific systematic analysis of lysine modification crosstalk

.

Proteomics

2018

;

18

:

1700292

.

9.

Pan

Z

,

Liu

Z

,

Cheng

H

, et al.

Systematic analysis of the in situ crosstalk of tyrosine modifications reveals no additional natural selection on multiply modified residues

.

Sci Rep

2014

;

4

:

7331

.

10.

Huang

J

,

Perez-Burgos

L

,

Placek

BJ

, et al.

Repression of p53 activity by Smyd2-mediated methylation

.

Nature

2006

;

444

:

629

.

11.

Gao

D

,

Wan

L

,

Inuzuka

H

, et al.

Rictor forms a complex with Cullin-1 to promote SGK1 ubiquitination and destruction

.

Mol Cell

2010

;

39

:

797

–

808

.

12.

Shi

X

,

Kachirskaia

I

,

Yamaguchi

H

, et al.

Modulation of p53 function by SET8-mediated methylation at lysine 382

.

Mol Cell

2007

;

27

:

636

–

46

.

13.

Bergink

S

,

Jentsch

S

.

Principles of ubiquitin and SUMO modifications in DNA repair

.

Nature

2009

;

458

:

461

–

7

.

14.

Ahner

A

,

Gong

X

,

Frizzell

RA

.

Cystic fibrosis transmembrane conductance regulator degradation: cross-talk between the ubiquitylation and SUMOylation pathways

.

FEBS J

2013

;

280

:

4430

–

8

.

15.

Lamoliatte

F

,

Caron

D

,

Durette

C

, et al.

Large-scale analysis of lysine SUMOylation by SUMO remnant immunoaffinity profiling

.

Nat Commun

2014

;

5

:

5409

.

16.

Weinert

BT

,

Schölz

C

,

Wagner

SA

, et al.

Lysine succinylation is a frequently occurring modification in prokaryotes and eukaryotes and extensively overlaps with acetylation

.

Cell Rep

2013

;

4

:

842

–

51

.

17.

Hendriks

IA

,

D'souza

RC

,

Yang

B

, et al.

Uncovering global SUMOylation signaling networks in a site-specific manner

.

Nat Struct Mol Biol

2014

;

21

:

927

.

18.

Trinidad

JC

,

Barkan

DT

,

Gulledge

BF

, et al.

Global identification and characterization of both O-GlcNAcylation and phosphorylation at the murine synapse

.

Mol Cell Proteomics

2012

;

11

:

215

–

29

.

19.

van Noort

V

,

Seebacher

J

,

Bader

S

, et al.

Cross-talk between phosphorylation and lysine acetylation in a genome-reduced bacterium

.

Mol Syst Biol

2012

;

8

.

20.

Gray

VE

,

Liu

L

,

Nirankari

R

, et al.

Signatures of natural selection on mutations of residues with multiple posttranslational modifications

.

Mol Biol Evol

2014

;

31

:

1641

–

5

.

21.

Yao

H

,

Li

A

,

Wang

M

.

Systematic analysis and prediction of in situ cross talk of O-GlcNAcylation and phosphorylation

.

Biomed Res Int

2015

;

2015

.

22.

Huang

Y

,

Xu

B

,

Zhou

X

, et al.

Systematic characterization and prediction of post-translational modification cross-talk

.

Mol Cell Proteomics

2015

;

14

:

761

–

70

.

23.

Huang

R

,

Huang

Y

,

Guo

Y

, et al.

Systematic characterization and prediction of post-translational modification cross-talk between proteins

.

Bioinformatics

2019

;

35

:

2626

–

2633

.

PubMed

24.

Xu

H

,

Zhou

J

,

Lin

S

, et al.

PLMD: an updated data resource of protein lysine modifications

.

J Genet Genomics

2017

;

44

:

243

–

50

.

25.

Xu

H-D

,

Shi

S-P

,

Wen

P-P

, et al.

SuccFind: a novel succinylation sites online prediction tool via enhanced characteristic strategy

.

Bioinformatics

2015

;

31

:

3748

–

50

.

PubMed

26.

Xu

H-D

,

Shi

S-P

,

Chen

X

, et al.

Systematic analysis of the genetic variability that impacts SUMO conjugation and their involvement in human diseases

.

Sci Rep

2015

;

5

:

10900

.

27.

Cao

M

,

Chen

G

,

Yu

J

, et al.

Computational prediction and analysis of species-specific fungi phosphorylation via feature optimization strategy

.

Brief Bioinform

2018

. doi:

10.1093/bib/bby122

.

28.

Wen

P-P

,

Shi

S-P

,

Xu

H-D

, et al.

Accurate in silico prediction of species-specific methylation sites based on information gain feature optimization

.

Bioinformatics

2016

;

32

:

3107

–

15

.

29.

Chen

Z

,

He

N

,

Huang

Y

, et al.

Integration of a deep learning classifier with a random forest approach for predicting malonylation sites

.

Genom Proteom Bioinf

2018

;

16

:

451

–

9

.

30.

Taherzadeh

G

,

Yang

Y

,

Xu

H

, et al.

Predicting lysine-malonylation sites of proteins using sequence and predicted structural features

.

J Comput Chem

2018

;

39

:

1757

–

63

.

31.

Yang

Y

,

Heffernan

R

,

Paliwal

K

, et al.

Spider2: a package to predict secondary structure, accessible surface area, and main-chain torsional angles by deep neural networks

.

Methods Mol Biol

2017

;

1484

:

55

–

63

.

32.

López

Y

,

Sharma

A

,

Dehzangi

A

, et al.

Success: evolutionary and structural properties of amino acids prove effective for succinylation site prediction

.

BMC Genomics

2018

;

19

:

923

.

33.

Moh'd

A

,

Mesleh

A

.

Chi square feature extraction based svms arabic language text categorization system

.

J Comput Sci

2007

;

3

:

430

–

5

.

34.

Yan

K

,

Zhang

D

.

Feature selection and analysis on correlated gas sensor data with recursive feature elimination

.

Sensor Actuat B-Chem

2015

;

212

:

353

–

63

.

35.

Chou

MF

,

Schwartz

D

.

Biological sequence motif discovery using motif-x

.

Curr Protoc Bioinformatics

2011

;

35

:

13.15.11

–

24

.