MutateX: an automated pipeline for in silico saturation mutagenesis of protein structures and structural ensembles

Tiberti, Matteo; Terkelsen,  Thilde; Degn,  Kristine; Beltrame,  Ludovica; Cremers,  Tycho Canter; da Piedade, Isabelle; Di Marco,  Miriam; Maiani,  Emiliano; Papaleo,  Elena

doi:10.1093/bib/bbac074

Abstract

Mutations, which result in amino acid substitutions, influence the stability of proteins and their binding to biomolecules. A molecular understanding of the effects of protein mutations is both of biotechnological and medical relevance. Empirical free energy functions that quickly estimate the free energy change upon mutation (ΔΔG) can be exploited for systematic screenings of proteins and protein complexes. In silico saturation mutagenesis can guide the design of new experiments or rationalize the consequences of known mutations. Often software such as FoldX, while fast and reliable, lack the necessary automation features to apply them in a high-throughput manner. We introduce MutateX, a software to automate the prediction of ΔΔGs associated with the systematic mutation of each residue within a protein, or protein complex to all other possible residue types, using the FoldX energy function. MutateX also supports ΔΔG calculations over protein ensembles, upon post-translational modifications and in multimeric assemblies. At the heart of MutateX lies an automated pipeline engine that handles input preparation, parallelization and outputs publication-ready figures. We illustrate the MutateX protocol applied to different case studies. The results of the high-throughput scan provided by our tools can help in different applications, such as the analysis of disease-associated mutations, to complement experimental deep mutational scans, or assist the design of variants for industrial applications. MutateX is a collection of Python tools that relies on open-source libraries. It is available free of charge under the GNU General Public License from https://github.com/ELELAB/mutatex.

free energy, mutations, post-translational modifications, structural ensembles, binding, stability

Introduction

Advances in proteomics are now providing a massive amount of data on protein–protein interaction or post-translationally modified proteins [1–5] that benefit from structural studies to be rationalized [6, 7]. On the other hand, genomic initiatives allow the identification of missense mutations [8–10] in the coding region of genes that need to be understood at a structural and functional level [11–13]. Indeed, single amino acid substitutions (i.e. mutations) or post-translational modifications (PTMs) in proteins may alter structural stability and intermolecular interactions, impacting protein activity, function and cellular signaling. An accurate and systematic prediction of changes in stability and binding upon mutations or PTMs is thus crucial to understand protein variants at the molecular level [14–16]. Moreover, the possibility to predict mutational effects also strongly impacts protein engineering for biotechnological and industrial applications [17, 18]. It has also been shown that disease-causing mutations are characterized by changes in structural stability or binding affinity [19–21]. Most prediction tools are designed to work on a single conformation at a time. However, proteins are not static entities, and they can attain multiple states in solutions that are tightly related to their function [22, 23]. Some proteins are conformationally heterogeneous and undergo a myriad of structural changes, including sparsely populated states that are stabilized upon interaction with a biological partner [24, 25]. It is thus important to account for the conformational ensembles of proteins to achieve a proper understanding of the impact of mutations and PTMs. Protein ensembles can be obtained by experimental techniques such as NMR [26, 27], simulations [28–30] or an integration of the two [31–33].

Several different methods have been developed to estimate changes in folding and binding free energies upon amino-acidic substitutions from a structure, many of which are extensively summarized in a recent review article [34]. Synthetic benchmarks of these methods have been performed over time, along with comparisons with experimental data. Structure-based methods were generally considered to be moderately accurate in terms of classification between stabilizing, neutral or destabilizing mutations [35, 36] and in terms of correlation between experimental and computed ΔΔG values [37, 38]. FoldX [39] ranked among the best-performing methods [40] and has been applied with success to many biological questions. Experimental validation of FoldX predictions, along with a comparison to existing experimental data, also supports the notion that this tool is valuable for mutation classification, design and prediction [34, 40–45].

The FoldX method predicts changes in free energy of folding upon mutation as the difference between the estimated free energy of folding of the mutant and the reference wild-type (WT) variants. This is done by using an empirical free energy function, which includes different energetic terms for main-chain and side-chain contributions. Some of these terms are weighted by a coefficient, which was determined using a fitting procedure over a database of free energy differences from single-point mutants, meaning that FoldX free energies are not necessarily physical. However, the calculated differences are comparable to those identified experimentally. The energy function is implemented in the software suite, FoldX, which is free for academic users. The suggested FoldX mutation protocol includes a repair step, in which an energy optimization is performed that removes interatomic clashes or optimizes the initial conformation of side chains. Next, a mutation step is performed (BuildModel) in which the model of the mutant variant and corresponding WT models are generated. Finally, the respective energies are calculated and subtracted to obtain the final difference in free energy upon mutation. FoldX can also estimate the free energy of interaction between pairs of molecules in the structure through the AnalyseComplex command. FoldX supports the twenty canonical amino acids, a few post-translationally modified ones and other chemical moieties such as nucleic acids and metal ions.

The accuracy of FoldX has been independently investigated with SDs reported in the range of 1.0–1.78 kcal mol⁻¹ between the FoldX ΔΔGs and experimental data, and an average accuracy of 0.69 [40]. Recent benchmarking and thorough analysis of FoldX and other prediction methods highlighted possible sources of bias for the predictions, most likely due to the limited or unbalanced dataset used to parametrize such methods. One of the most discussed is the bias toward identifying destabilizing mutations [46–48]. Nonetheless, among the methods tested for systematic bias, FoldX was in the top five least biased methods [48]. The bias can be alleviated through a backbone relaxation step [47]. Other factors such as the difference in volume between WT and mutant residue types can be an additional source of bias [48]. A recent investigation comparing several prediction methods, including FoldX, pointed out that mutation type (i.e. the type of the specific WT and mutant residue considered) is also a significant source of bias. In addition, the same study pointed out that such methods perform better in predicting neutral or destabilizing mutations than for stabilizing or strongly destabilizing mutations [49].

Moreover, FoldX is sensitive to the input structure. Therefore, as a solution, the calculations could be done on multiple conformations of the same protein [49]. In this regard, a modified protocol in which ensemble averages of free energies predictions calculated on conformers from molecular dynamics (MD) simulations has been proposed [44], with the observation that, while predictions performed on the single MD conformation had relatively poor correlation with experimental ΔΔG, using ensemble average predictions restored the correlation and reduced the spread of the ΔΔG values. This indicates that ensemble averaging in FoldX constitutes a viable line of research that could improve the results obtainable by the software.

The energy functions mentioned above, and FoldX in the first place, are, in general, promising approaches since they, in principle, allow for a fast but yet semiquantitative estimate of the changes in stability and at the interaction interface for other biological partners of all possible mutations in a protein structure. Despite their shortcomings, they still capture a correct trend in their prediction and are useful to obtain a large-scale picture of a protein’s mutational landscape. Nevertheless, elegant computational solutions to design flexible and comprehensive pipelines to carry out in silico deep mutational scans in a high-throughput manner are still missing, along with the extension of these functions to a systematic study of protein ensembles taking advantage of parallelization on modern multicore hardware.

Results

Design and implementation

Overview on MutateX

MutateX includes a wrapper in Python for the FoldX suite accompanied by plotting and post-processing Python scripts. The main program implements an in silico saturation mutagenesis protocol for proteins and protein complexes that uses the popular FoldX program to predict the effect of mutations. It allows calculating the free energy change associated with the systematic substitution of each protein residue to any of the natural amino acids or post-translationally modified residues, such as phospho-residues. MutateX automates and streamlines the mutation process, making running a full mutational scan straightforward, even for several conformers of the same protein. From a single three-dimensional (3D) protein structure or an ensemble of structures, MutateX provides the researcher with textual and graphical representations of the mutational data to explore and make sense of the large amount of information derived from such an exhaustive scan. This overview is possible through visualization tools, which produce publication-ready figures.

At the heart of the MutateX software lies a pipeline engine that handles input preparation, parallel runs of available programs able to estimate the free energy, coupled with data gathering and visualization tools. Currently, the supported software for calculating free energies is the FoldX Suite to determine the energetic effect of point mutations, along with the binding energies associated with the formation of multimeric protein complexes. MutateX is compatible with FoldX Suite versions 4 and 5 [50].

The MutateX protocol

In its most straightforward applications, the MutateX main wrapper requires as an input a single file with the coordinates of a protein 3D structure in the protein data bank (PDB) format. Moreover, the user should provide the FoldX running input files, for which templated are included by default in the MutateX distribution. The tool also provides templates for the other calculations are used by FoldX, such as Repair, BuildModel and (optionally) AnalyseComplex, as detailed in the user guide. Advanced users are invited to create their own templates or modify the existing ones as they see fit.

The MutateX software implements the protocol summarized in Figure 1. First, the main MutateX program checks that the input PDB files can be parsed correctly (i.e. they are in the proper format) and saves one file per model in the PDB format. Next, MutateX prepares and performs a Repair FoldX run on each model. For each repaired model, MutateX generates all the directories and files required to perform the mutational scan, including structure files, run files for FoldX and lists of mutations. Before attempting mutation, the software checks that all the input protein models have the same primary sequence. All substitutions on a given site are treated as a single independent FoldX run, making it possible to run several instances of the FoldX software simultaneously, which is necessary to take advantage of modern multicore processors and effectively reduce the execution time of the whole pipeline. MutateX takes care of scheduling the FoldX runs, keeping track of their status, and checking the output files. During the preparatory step, the software automatically recognizes if two or more identical protein chains are present in the input structure files and prepare the mutation runs accordingly, by mutating all the chains having the same sequence at the same time. This is the only instance in which multiple mutations are performed, meaning in all other cases only one mutation at the time is considered. It also supports setting the number of runs per mutation, so that cases in which more local structural variability is expected can be taken into account. Once all the mutation runs have been performed, MutateX reads the output of the FoldX runs and summarizes the estimated average free energy differences, the associated SDs and minimum and maximum values in separate text files for each residue. The MutateX processing tools can be used on these files to obtain representations of the effects of the mutations, such as probability density plots of ΔΔG values calculated with Kernel density estimation (ddg2density), heatmaps (ddg2heatmap), histograms relative to single mutation sites (ddg2histo), logo plots (ddg2logo) and per-site distributions of ΔΔGs in different representations (ddg2distribution). The user can specify custom labels by filling in a template generated using the pdb2labels tool for plotting. The ddg2dg tool applies systematic linear corrections to the calculated ΔΔG values. The ddg2excel tool converts the dataset to Microsoft Excel-compatible files. PDB files with relevant information about the mutagenesis results can be stored in the B-factor field (ddg2pdb). These files can be further visualized and processed using protein structure viewers, such as PyMOL (https://pymol.org). The same data can also be written as the diagonal of a residue matrix file, compatible with the xPyder plugin [51]. The ddg2summary tools generate ad hoc reports on specific mutations. Most of these tools support extensive plotting options, the most important of which is the range of ΔΔG values to be considered. In fact, according to our experience, the ΔΔGs predicted by FoldX can go up a few tens kcal mol⁻¹. Therefore, if the entire range of predicted ΔΔGs is considered, identifying smaller but still meaningful differences becomes difficult. We suggest limiting the range of ΔΔG values between −3.0 and 5.0 kcal mol⁻¹, as this is the range in which most mutations were experimentally found (see Results for details). ΔΔG values below or above this are likely to be overestimated by the method.

Figure 1

Scheme of the workflow implemented in MutateX, as described in the Methods section.

Open in new tab Download slide

Self-mutations, i.e. those mutations that replace a residue with itself, can also be performed independently with respect to the rest of the protocol and are especially useful as a sanity check of the method for every single mutation site, in each structure. In fact, the free energy difference calculated when mutating a residue to itself acts as a lower bound of the prediction error, as we would generally expect ΔΔG close to 0 for such a mutation.

MutateX allows customizing some aspects of the calculations. For instance, it is possible to provide the list of target residue types that should be considered in the scan, and to include multiple models from one or more PDB files. Indeed, MutateX supports the saturation scan and the free energy calculations over structural ensembles, an approach that in principle might mitigate the inaccuracies associated with the limited conformational variability that FoldX allows for during modeling of the mutant variants. Furthermore, the software can introduce simultaneously each mutation on more than one chain in the case homo-dimers or homo-multimers are present in the PDB file. Finally, MutateX is designed to handle, calculate and plot the changes of free energies of binding between complexes upon mutation, both considering homo-multimers and hetero-multimers.

Performance

Most of the run time for the MutateX scan is taken by the runs of the ΔΔG estimation engine, as the operations of preparation and post-processing performed by MutateX are computationally inexpensive. Therefore, the performance of MutateX is tied to that of the underlying mutation engine - FoldX in this case. MutateX performs a complete mutational scan of a 300-residues protein to all the 20 natural variants in less than a day using four cores for parallel execution, which can routinely be done on a desktop class machine (Figure 2). As the FoldX runs performed by MutateX are completely independent of one another, the execution time scales roughly linearly with the number of cores as well as with the number of input structures per run. For instance, considering the p53 MD ensembles below (199 residues), the execution time using 10 processor cores was 51.14, 103.40 and 203.6 h for 25, 50 and 100 frames, respectively. With 32 processor cores, 100 frames can also be processed in approximately 72.5 h.

Figure 2

Scaling behavior of MutateX with increasing number of residues. These run times have been obtained using MutateX in combination with FoldX Suite version 4.

Open in new tab Download slide

Results

Selection of a cutoff for truncation of ΔΔGs values for plotting purposes

The predicted FoldX ΔΔGs could reach up a few tens of kcal mol⁻¹ in some cases, making visualizing a complete mutational scan challenging due to scale effects, for instance, when using heatmaps. Using large scales would mask smaller but meaningful differences, giving more importance to very large energy differences that are likely to be overestimations. Therefore, we, sought to identify a range of free energy differences in which small differences are considered meaningful, while smaller or larger values than that can be flattened out as largely destabilizing or stabilizing. To this extent, we have used the ProTherm database [52] to check how experimentally derived ΔΔG values are distributed, after filtering out those values for which an energy unit was not specified in the database. Figure 3 shows that most mutations fall within the −3.0 to 5.0 kcal mol⁻¹ range, which can be used as a cutoff for truncation of ΔΔG values when visualizing MutateX mutation results, which correspond roughly to the second and 98th percentile, respectively. We recommend using these values as range limits for visualization and plotting purposes.

Figure 3

Distribution of experimental ΔΔG values, upon sign change, as found in the ProTherm database [52] for which a unit was specified.

Open in new tab Download slide

Saturation mutation scan of the cancer-related protein phosphatase and tensin homolog (PTEN): relationship between protein stability and disease

PTEN is a tumor suppressor with homology to tyrosine phosphatases and the cytoskeletal protein tensin [53]. The PTEN monomer consists of two major functional domains, i.e. the phosphatase domain (protein tyrosine phosphatase (PTP), residues 14–185) and the C2 domain (residues 190–285). PTEN is involved in regulating cellular growth and suppresses cell migration and invasion [54]. Mutations in PTEN have been identified in various human cancers [55]. Most reported missense mutations of PTEN are located within the exon encoding the PTP domain.

PTEN has been used in a study in which other methods have been used to predict changes in free energy upon mutations and then compared to multiplexed experiments [56]. The authors found that the loss of PTEN stability was a driving factor for disease-causing mutations, with 60% of the pathogenic variants causing the loss of function because the protein is destabilized and degraded. PTEN is thus a useful case study to illustrate the possible application of the scan for changes in stability with the MutateX protocol. Indeed, we performed a saturation mutagenesis scan of PTEN to identify the most damaging mutations and relate them with cancer data publicly available.

We employed the most complete structure of PTEN available at the time of analyses (PDB ID 1D5R [57]). Notably, the PDB entry lacks atomic coordinates for residues 286–309 in a disordered region—however, given the local sampling carried out by FoldX, the missing residues are likely to affect only residues in the proximity of the loop. We did not model the missing region since this calculation serves only for illustrative purposes.

Figure 4

Saturation mutagenesis scan of the tumor suppressor PTEN by MutateX. The heatmap (A) and box plot (B) of the mutational scan corresponding to the residues 64–113 is shown as an example. It includes the folding ΔΔGs estimated for the mutation of every mutation site (x-axis) to every natural amino acid (y-axis) using FoldX. The values of free energy difference in A have been limited to the range −3 to 5 kcal mol⁻¹, meaning that values above 5 or below −3 kcal mol⁻¹ are shown as the extremes of the range. In panel D, the average ΔΔG for the mutation of each residue to all the 20 natural residue types is shown; then the ten sites with the largest average ΔΔG are labeled. The same mutations are shown on the structure of PTEN in panel C, produced with PyMOL.

Open in new tab Download slide

The results from the scan were visualized exploiting different post-processing tools available within MutateX. We first analyzed the results using heatmaps (produced by ddg2heatmap) and boxplots (ddg2distribution), which provide a comprehensive overview of the free energy changes upon mutations along the protein sequence. In particular, the ddg2distribution tool has been designed to produce different representations of the obtained ΔΔGs on a per-residue distribution basis. The heatmap and box plots are shown in Figure 4A and B with changes of ΔΔGs in the region 64–113 of PTEN, corresponding to the middle part of the phosphatase domain. The results may be classified into the following scenarios:

(i) the ΔΔG is not influenced by the substitution of the WT amino acid. An overall lack of change in ΔΔGs at a certain site indicates that the WT residue is not essential for protein stability. This is often the case when residues are located at the protein surface, or in the flexible region (such as D77) or if they lack interactions with neighboring residues.
(ii) The ΔΔG increases upon mutations of the WT residue for a certain subset of other residues. This subset is likely to have something in common, e.g. residues are hydrophobic, hydrophilic, charged, small/large or have an aromatic side chain. Substituting a small uncharged residue with another similar one is unlikely to alter protein stability, whereas substitutions to large aromatic side chains could have a marked effect (such as in the case of N69).
(iii) The ΔΔG increases regardless of what the WT amino acid is mutated to, indicating that the native residue may play an essential/specific role in preserving the protein structure (such as the example of P95 and P96).
(iv) The ΔΔG decreases when the WT residue is mutated to one or a subset of other residues—e.g. the substitutions result in a more stable protein variant. An observed decrease in ΔΔG is expected to be significantly more modest in size and observed less often as protein structure/stability is evolutionarily optimized (for example the substitution of residue Q with Y or F at position 87).

Table 1

Open in new tab

Residues with largest average ΔΔG annotated with information from cBioPortal

Residue position	Wild-type amino acid	Annotated mutant amino acid	Number of missense annotations	Cancer type	Location within PTEN structure
83	C	NA	NA	NA	Phosphatase domain
95	P	S, L	5	Glioblastoma multiforme, uterine endometrioid carcinoma, colorectal adenocarcinoma and mixed germ cell tumor	Phosphatase domain
96	P	A, L, S, T	4	Uterine endometrioid carcinoma, chromophobe renal cell carcinoma, glioblastoma multiforme and uterine endometrioid carcinoma	Phosphatase domain
127	G	E, R, V	15	Lung adenocarcinoma, glioblastoma multiforme, hepatocellular carcinoma, cutaneous melanoma, uterine endometrioid carcinoma, colorectal adenocarcinoma, ampullary carcinoma, teratoma, uterine clear cell carcinoma and uterine serous carcinoma	Phosphatase domain
132	G	A, C, D, S, V	23	Solitary fibrous tumor, uterine carcinosarcoma, glioblastoma multiforme, cutaneous melanoma, colorectal adenocarcinoma, lung adenocarcinoma, astrocytoma, uterine endometrioid carcinoma, esophageal squamous cell carcinoma, medulloblastoma, prostate adenocarcinoma and rectal adenocarcino	Phosphatase domain
151	A	G, T, D, V	4	Breast invasive lobular carcinoma, colorectal adenocarcinoma, uterine carcinosarcoma and glioblastoma multiforme	Phosphatase domain
165	G	E, R	15	Small cell lung cancer, renal non-clear cell carcinoma, medulloblastoma, cutaneous melanoma, glioblastoma multiforme, rectal adenocarcinoma and uterine endometrioid carcinoma	Phosphatase domain
204	P	L, R, S	7	Melanoma, glioblastoma multiforme, uterine endometrioid carcinoma, uterine carcinosarcoma, colorectal adenocarcinoma and cutaneous melanoma	C2 domain
213	P	R	1	Prostate adenocarcinoma	C2 domain
328	A	NA	NA	NA	C2 domain

Residue position	Wild-type amino acid	Annotated mutant amino acid	Number of missense annotations	Cancer type	Location within PTEN structure
83	C	NA	NA	NA	Phosphatase domain
95	P	S, L	5	Glioblastoma multiforme, uterine endometrioid carcinoma, colorectal adenocarcinoma and mixed germ cell tumor	Phosphatase domain
96	P	A, L, S, T	4	Uterine endometrioid carcinoma, chromophobe renal cell carcinoma, glioblastoma multiforme and uterine endometrioid carcinoma	Phosphatase domain
127	G	E, R, V	15	Lung adenocarcinoma, glioblastoma multiforme, hepatocellular carcinoma, cutaneous melanoma, uterine endometrioid carcinoma, colorectal adenocarcinoma, ampullary carcinoma, teratoma, uterine clear cell carcinoma and uterine serous carcinoma	Phosphatase domain
132	G	A, C, D, S, V	23	Solitary fibrous tumor, uterine carcinosarcoma, glioblastoma multiforme, cutaneous melanoma, colorectal adenocarcinoma, lung adenocarcinoma, astrocytoma, uterine endometrioid carcinoma, esophageal squamous cell carcinoma, medulloblastoma, prostate adenocarcinoma and rectal adenocarcino	Phosphatase domain
151	A	G, T, D, V	4	Breast invasive lobular carcinoma, colorectal adenocarcinoma, uterine carcinosarcoma and glioblastoma multiforme	Phosphatase domain
165	G	E, R	15	Small cell lung cancer, renal non-clear cell carcinoma, medulloblastoma, cutaneous melanoma, glioblastoma multiforme, rectal adenocarcinoma and uterine endometrioid carcinoma	Phosphatase domain
204	P	L, R, S	7	Melanoma, glioblastoma multiforme, uterine endometrioid carcinoma, uterine carcinosarcoma, colorectal adenocarcinoma and cutaneous melanoma	C2 domain
213	P	R	1	Prostate adenocarcinoma	C2 domain
328	A	NA	NA	NA	C2 domain

Table 1

Open in new tab

Residues with largest average ΔΔG annotated with information from cBioPortal

Residue position	Wild-type amino acid	Annotated mutant amino acid	Number of missense annotations	Cancer type	Location within PTEN structure
83	C	NA	NA	NA	Phosphatase domain
95	P	S, L	5	Glioblastoma multiforme, uterine endometrioid carcinoma, colorectal adenocarcinoma and mixed germ cell tumor	Phosphatase domain
96	P	A, L, S, T	4	Uterine endometrioid carcinoma, chromophobe renal cell carcinoma, glioblastoma multiforme and uterine endometrioid carcinoma	Phosphatase domain
127	G	E, R, V	15	Lung adenocarcinoma, glioblastoma multiforme, hepatocellular carcinoma, cutaneous melanoma, uterine endometrioid carcinoma, colorectal adenocarcinoma, ampullary carcinoma, teratoma, uterine clear cell carcinoma and uterine serous carcinoma	Phosphatase domain
132	G	A, C, D, S, V	23	Solitary fibrous tumor, uterine carcinosarcoma, glioblastoma multiforme, cutaneous melanoma, colorectal adenocarcinoma, lung adenocarcinoma, astrocytoma, uterine endometrioid carcinoma, esophageal squamous cell carcinoma, medulloblastoma, prostate adenocarcinoma and rectal adenocarcino	Phosphatase domain
151	A	G, T, D, V	4	Breast invasive lobular carcinoma, colorectal adenocarcinoma, uterine carcinosarcoma and glioblastoma multiforme	Phosphatase domain
165	G	E, R	15	Small cell lung cancer, renal non-clear cell carcinoma, medulloblastoma, cutaneous melanoma, glioblastoma multiforme, rectal adenocarcinoma and uterine endometrioid carcinoma	Phosphatase domain
204	P	L, R, S	7	Melanoma, glioblastoma multiforme, uterine endometrioid carcinoma, uterine carcinosarcoma, colorectal adenocarcinoma and cutaneous melanoma	C2 domain
213	P	R	1	Prostate adenocarcinoma	C2 domain
328	A	NA	NA	NA	C2 domain

Residue position	Wild-type amino acid	Annotated mutant amino acid	Number of missense annotations	Cancer type	Location within PTEN structure
83	C	NA	NA	NA	Phosphatase domain
95	P	S, L	5	Glioblastoma multiforme, uterine endometrioid carcinoma, colorectal adenocarcinoma and mixed germ cell tumor	Phosphatase domain
96	P	A, L, S, T	4	Uterine endometrioid carcinoma, chromophobe renal cell carcinoma, glioblastoma multiforme and uterine endometrioid carcinoma	Phosphatase domain
127	G	E, R, V	15	Lung adenocarcinoma, glioblastoma multiforme, hepatocellular carcinoma, cutaneous melanoma, uterine endometrioid carcinoma, colorectal adenocarcinoma, ampullary carcinoma, teratoma, uterine clear cell carcinoma and uterine serous carcinoma	Phosphatase domain
132	G	A, C, D, S, V	23	Solitary fibrous tumor, uterine carcinosarcoma, glioblastoma multiforme, cutaneous melanoma, colorectal adenocarcinoma, lung adenocarcinoma, astrocytoma, uterine endometrioid carcinoma, esophageal squamous cell carcinoma, medulloblastoma, prostate adenocarcinoma and rectal adenocarcino	Phosphatase domain
151	A	G, T, D, V	4	Breast invasive lobular carcinoma, colorectal adenocarcinoma, uterine carcinosarcoma and glioblastoma multiforme	Phosphatase domain
165	G	E, R	15	Small cell lung cancer, renal non-clear cell carcinoma, medulloblastoma, cutaneous melanoma, glioblastoma multiforme, rectal adenocarcinoma and uterine endometrioid carcinoma	Phosphatase domain
204	P	L, R, S	7	Melanoma, glioblastoma multiforme, uterine endometrioid carcinoma, uterine carcinosarcoma, colorectal adenocarcinoma and cutaneous melanoma	C2 domain
213	P	R	1	Prostate adenocarcinoma	C2 domain
328	A	NA	NA	NA	C2 domain

To determine which of the 351 PTEN amino acids were most important for protein stability, we have used ddg2distribution to plot the per-site average ΔΔG over all the residues of PTEN. We have further annotated with residue names the ten positions with the largest average ΔΔG (Figure 4C and D). On average, positions 83, 95, 96, 127, 132, 151, 165, 204, 213 and 328 featured the more pronounced effects on PTEN stability upon mutation. The top 10 residues are highlighted with dots/sticks on the 3D structure of the protein reported in Figure 4C. The mutation predicted to be most sensitive to substitution is the glycine residue at position 127, which is in a loop right before the third α-helix in PTEN. Glycine residues located in disordered regions such as loops provide a certain degree of flexibility, which is important for the proper functional breathing motions of the protein [29]. It should also be noted that, since FoldX does not relax the protein backbone upon mutation, replacing a residue without side chain with larger ones can result in steric clashes that artificially increase the obtained ΔΔG values.

PTEN functions as a tumor suppressor in many cancer types. Thus, we extracted 1116 missense cancer-associated mutations of PTEN using the cBioPortal database [9] (August 2019)—a comprehensive collection of cancer mutations identified by genomic initiatives of high-throughput profiling of samples from cancer patients. Intriguingly, eight out of ten of the top ten candidates identified in the MutateX scan were annotated in CBioPortal and are reported in Table 1. Furthermore, G127, G132 and G165 were among the most frequent mutation sites of PTEN in cancer patients.

Figure 5

Saturation mutagenesis for the p53 DBD tetramer. (panels A and C) Heatmaps of interaction ΔΔG for p53 generated using MutateX. Only the residues that take part in the interaction between the different chains are shown. On the left hand side (panels A and B), the difference interaction energies are between chains A and B are shown, while the right hand side panels (panels C and D) are for chains A and D. Heatmaps for both cases (panels A and C) have been generated using the ddg2heatmap tool, white a violin plot (panel B) and a stem plot (panel D) have been generated using ddg2distribution. Each point in the stem plot shows the average difference in interaction energy of that mutation site on the 20 residue types it has been mutated to, with the SD shown as a vertical bar.

Open in new tab Download slide

Saturation mutagenesis to identify hotspot residues at the binding interface for tetramerization of p53

The tumor suppressor p53 is known as the guardian of the human genome. It is a stress response transcription factor involved in different cellular functions such as cell cycle arrest, senescence and apoptosis [58]. Mutations in the p53 gene occur at high frequency in many cancers [59–61]. p53 exerts its function as a transcription factor by binding to DNA through its central folded DNA-binding domain (DBD, residues 91–289), which is also able to bind other proteins and acts as a signal integration hub [58, 62, 63]. The activity of p53 depends on its conformation. P53 is active in a tetrameric (dimer of dimers) conformation required for its ability to bind with high affinity to DNA and interact with various proteins [58]. Formation of the tetramer happens through a specialized tetramerization (TET) domain; nonetheless, the DBD of the four p53 units in the tetramer still interact with one another [64]. Many of the cancer-related p53 mutations are in the DBD and have destabilizing effects, but there are also mutations that do not fall within this class. To properly understand the effects of mutations on p53 and their roles in cancer, it is thus essential to identify hotspot mutations that could induce structural changes at the binding interface for TET of p53.

Here, we used MutateX to study the effects of mutations on the binding interfaces between DBDs in the p53 tetramer. We performed a saturation mutagenesis scan of p53 tetramer, mutating each residue to all the 20 natural amino acids, using the X-ray structure of the p53 DBD as a self-assembled tetramer (PDB ID 3KMD) [65] and calculated the differences in binding energies upon mutation. As the structure contains four protein chains with identical sequences, MutateX automatically recognized it as a tetramer and performed the same mutation of each chain for every run. The software then calculated differences in binding energy considering every possible combination of protein chains (A–B, B–C, C–D, A–D, A–C and B–D). The p53 dimer of dimer has a planar structure shaped like a parallelogram, with no contacts between monomers A–C and B–D. Interface A–B connects two p53 monomers into a dimer (dimer interface), while interface A–D connects two dimers into the tetramer (dimer–dimer interface). We here considered only differences in free energy of binding between pairs A–B and A–D, as the other two interfaces are identical to these. We first plotted a heatmap of binding ΔΔGs using ddg2heatmap, which provides a comprehensive visual representation of the free energy changes, limiting the plotted values between −3.0 and 5 kcal mol⁻¹ (Figure 5A and C). We also plotted the per-site ΔΔG distributions using ddg2distribution, obtaining either a violin or a stem plot (Figure 5B and D). The sites with mutations causing marked changes in binding free energy between chains A and B are in the regions C176-R181 and S241-C245 (Figure 5A). These correspond to the dimerization interface of the two p53 chains A and B, which include part of L2 and L3, close to where the zinc-binding site is located. Residues that were previously identified as important for the interaction interface, such as P177, H178, E180, M243 and G244 and surrounding residues, featured the most destabilizing effects in terms of binding ΔΔG when mutated. This is the case of mutations of residue C176 to bulky aromatic residue (F, W and Y) or charged ones (K, R and H) or of mutation of P177 to S,T, G and R. Replacing key residues on the other side of the binding interface is predicted to be damaging in general, given the high ΔΔG values for mutations at M243 and G244.

Next, we analyzed the interaction surface between chains A and D, which corresponds to the dimer–dimer interface. The interface is extended and includes two patches of residues, formed by loops connecting the β-stands, the DBD N-terminal tail and loop L2. One patch of this interface (patch I) is described as formed by L93, S94, S95, S166, Q167, T170 and F212 of monomer A and L201, and T140, E198, G199, M200, R202 and H233 of monomer D. Of these residues, L93, Q167, G199, L201 and H233 included the most destabilizing mutations according to our scan. G199 is deeply buried within patch 1 and faces T170 directly; its mutation to bulky amino acids (such as F, W, Y, H, but also D and E) is found to be highly destabilizing. It should be noted that FoldX does not rearrange the protein backbone after side-chain replacement and remodeling, meaning the mutation might be predicted as more destabilizing than it would be if the whole structure could be fully relaxed. Patch 2 is smaller and composed of residue Q100 and K101 on monomer A and E224 and V225 on chain D. Of these potential mutation sites, E224 is predicted to be the most sensitive, with most mutations predicted to be destabilizing.

Saturation mutagenesis of protein-peptide complexes: identifying binding specificities of SH2 domains

Protein PTMs play a critical role in cellular transduction. Phosphorylation is the most well-characterized PTM and consists of transferring the terminal phosphate group from adenosine triphosphate (ATP) to the hydroxyl group of serine, threonine or tyrosine residues. Protein kinases and phosphatases control the levels of protein phosphorylation, regulating several processes such as proliferation, DNA repair, programmed cell death, differentiation and immune responses [66]. Indeed, the physiological phosphorylation pattern is frequently altered in pathological conditions such as cancer [67]. Phosphorylation of short linear motifs creates docking sites recognized by binding domains, such as Src homology 2 (SH2) and others [68]. SH2 domains specifically bind pTyr-containing linear motifs and represent the largest class of pTyr-recognition domains [69]. SH2 domains have been studied with biochemical assays and high-throughput approaches, clarifying the determinants of binding specificity for different members of the SH2 family [69–72].

We used MutateX to evaluate how phosphorylations can modulate the interaction between SH2 domains and peptides, aiming at predicting the change of free energy of binding upon mutation of phosphorylated and non-phosphorylated peptides. This allowed us to understand which mutations influence the binding and to correlate our results with the experimentally known binding specificities of phosphopeptides to SH2 domains.

We performed a MutateX saturation mutagenesis scan of different SH2 containing proteins complexed with phospho-tyrosine peptides: CDPK-related protein kinase (CRK) protooncogene bound to a CRK-derived phosphopeptide (PDB ID 1JU5 [73]), GRB2 SH2 domain bound to a phosphorylated peptide (PDB ID 1JYR [74]) and SHP2 phosphatase complexed with a with RLNpYAQLWHR peptide (PDB ID 3TL0 [75]). We selected these complexes because of their known heterogeneous binding specificities [69, 71] and because 3D experimental structures were available.

We compared our results with the target consensus described in Tinti et al. [71], in which high-density peptide chip technology has been used to define the binding specificity of 70 different SH2 domains to pTyr peptides. Results from this article indicate that: (i) CRK SH2 domain has a strong preference for Proline in position +3 (from phospho-tyrosine), (ii) GRB2 SH2 domain has a strong preference for Asparagine in position +2 and (iii) N-terminal SH2 domain of SHP2 has a preference for hydrophobic residues at positions −2, +1 and +3.

Following the MutateX run, we analyzed the results by using the post-processing scripts from the MutateX suite of tools. At first, we used ddg2distribution to analyze the site averages of all the calculated ΔΔGs focusing our attention on the phosphopeptides. This tool was used to identify the residue positions that if mutated were impacting more on the binding free energy states of the complex. (Figure 6, center). Next, we used the ddg2heatmap tool to generate heatmaps of binding interface ΔΔGs, including all the 20 naturally occurring amino acids (Figure 6, left). This tool provided an immediate graphical representation of the residues, which were either more or less perturbing the interaction between the SH2 domains and the phosphopeptides, and gave more details on which substitutions are more likely to affect the interaction. We also used the ddg2histo tool to plot in graph bars the binding interface ΔΔGs for each residue in the phosphopeptides from positions −2 to +3 (Figure 6, right).

Figure 6

Changes in binding free energy in complexes of SH2-peptide. For each of the SH2 domain complexes, we have plotted the calculated ΔΔG for the mutation of key residues of the phosphopeptide. These are shown as heatmaps, generated by ddg2heatmap, box plots, generated by ddg2distribution. The ΔΔG of single key mutation sites are then shown more in detail as histograms, generated using the ddg2histo tool.

Open in new tab Download slide

As it is clearly shown in Figure 6, in all the SH2/phosphopeptide complexes, mutations of the pTyr residues are associated with high ΔΔGs, indicating a strong impact on the binding affinity. The substitution tyrosine/pTyr shows elevated ΔΔGs, in agreement with experimental data according to which the tyrosine phosphorylation is a prerequisite for the binding. Interestingly, the ΔΔGs were higher upon mutations in the residue positions that were shown to be mostly conserved in the different SH2 domain target consensus.

More in detail, the analysis showed that (i) CRK SH2 domain displays a strong preference for Proline residue in position +3, while all the other mutations affect the interaction, except for branched-chain amino acids and Met, which present lower ΔΔGs; (ii) GRB2 SH2 domain shows a strong preference in position +2 for Asn, with aromatic and charged amino acids being the most disrupting substitutions (iii) SHP2 N terminal SH2 shows a moderate preference of hydrophobic residues in position −2 and +1 and a more pronounced preference in position +3 for Leu and Met.

Figure 7

Performance of MutateX predictions when ran on p53 DNA-binding domain molecular dynamics ensembles. The plots illustrate the correlation between available experimental ΔΔG values and MutateX predictions. We performed MutateX scans on the original X-ray experimental structure, the starting structure of an MD simulation and MD simulation ensembles of 25, 50 and 100 frames extracted from the same MD trajectory. For the ensemble runs, we show the average ΔΔG values calculated over independent FoldX runs performed on each ensemble structure. The horizontal lines show the SD of experimental measurements when multiple values were found for any given mutation, while vertical lines account for the SD of the predicted values on the ensemble.

Open in new tab Download slide

Our predictions were consistent with experimental data which defined the SH2 domains consensus. Summarizing, this example clearly indicates that MutateX is a useful tool for the prediction of binding specificities and protein–peptide interaction analysis.

Comparison of MutateX-predicted changes in free energy for the p53 DBD in experimental structures and ensembles from molecular dynamics

As discussed in the introduction, FoldX estimates changes of free energy upon mutation by side-chain replacement, keeping the protein main-chain rigidly fixed. This, joined with the observation that the algorithm is quite sensitive to the input structure, led to the suggestion that performing mutational scans on ensembles of conformations, either from experiments or simulations, might be useful to improve the predictions. Indeed, MutateX supports performing mutational scans on multiple structures as a single job. It also provides the average ΔΔG values independently predicted in each structural scan. Here, we have used the p53 DBD as a case study and performed saturation mutagenesis scans on conformational ensembles from an MD simulation as well as the starting structure of such a simulation and the corresponding crystal structure (PDB ID 2XWR [76]). The ensembles were extracted as PDB files from the same 500 ns MD trajectory of p53 DBD [77], equally spaced as to obtain ensembles of 100, 50 and 25 frames, meaning the 100-frames ensemble is a perfect superset of the other two. Each PDB file was then used as input for MutateX directly. We then selected a set of mutations of the p53 DBD whose experimentally determined ΔΔG values were available in ThermoMutDB [78] and computed the Pearson correlation coefficient r between this dataset and the corresponding ensemble-averaged ΔΔG values from our mutational scans.

As shown in Figure 7, we obtain a good correlation (r = 0.71) between experimental and predicted free energy changes when using the experimental structure. Using just a single structure from MD we obtain a significantly worse correlation (r = 0.47), while using our MD ensembles results in a small improvement in the correlation (r = 0.75–0.77), with 50 frames being the best choice for this case. It should be noted that the difference in correlation is mostly due to a single outlier (R175H, ΔΔG = 9.9 kcal mol⁻¹ for the X-ray structure scan); removing it from the dataset makes r very similar for all the datasets (r = 0.75–0.76) except for the single-MD frame one (r = 0.58). These results suggest that using MD ensembles of multiple structures was able to improve our prediction and make the analysis more robust.

Figure 8

ΔΔG of interaction calculated by MutateX saturation scans of the Spike protein belonging to the WT and Omicron variants. Panels on the left and right refer to the WT and Omicron structures, respectively. We report values for residues of the Spike protein within 6 Å from ACE2 in both structures. Positive ΔΔG values indicate a detrimental effect for binding. Boxplots (top) show the distribution of the predicted ΔΔG values for each site, while heatmaps (bottom) illustrate the predicted ΔΔG of interaction in kcal mol⁻¹ for the spike residues for each mutation.

Open in new tab Download slide

Identification of variants in the Coronavirus Disease-19 (COVID-19) spike-ACE2 complex

Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2) is the pathogenic agent of the current COVID-19 pandemic. The viral spike protein mediates cell entry by first contacting molecules on the cell surface. Specifically, spike is characterized by a receptor binding domain (RBD) capable of interacting with the human receptor Angiotensin-Converting Enzyme 2 (ACE2). The recently reported Omicron (B.1.1.529) variant, which emerged in November 2021 in South Africa, has quickly become a variant of concern because of its high infectivity and immune evasion capability. Omicron spike protein is characterized by 37 mutations with respect to the original Wuhan-Hu-1 strain (WT variant here on), with 15 of them located in the RBD [79]. Omicron spike has an overall increased apparent affinity for ACE2 with respect to the WT spike and a comparable affinity with respect to the Delta variant [79]. Among the 15 mutations of the Omicron variant, previous studies including a high-throughput surface plasmon resonance (SPR) assay [80] of the WT variant has shown that some of the Omicron variant mutations mildly decrease (S371L, S373P, S375F, K417N, G446S, E484A, Q493R, G496S, Q498R and Y505H) or increase (G339D, N440K, S477N, T478K and N501Y) the binding with the RBD. A recently published cryoelectron microscopy 3D structure of the Omicron spike complexed with ACE2 has shed light on the structural details of the structural differences imparted by the mutations [79].

Here we used MutateX on the spike RBD in complex with ACE2 receptor as a case study aiming at identifying hotspot residues involved in the spike-receptor binding as well as evaluating the effect of the Omicron variant mutations. We performed both a MutateX saturation scan and a MutateX self-scan on the most complete (at the best of our knowledge) WT spike RBD in complex with ACE2 (PDB ID 6LZG [81]) and on the only currently available cryo-EM structure of the Omicron variant RBD interacting with the receptor (PDB ID 7T9L [79]). As we were only interested in the residues of the spike–ACE2 interaction surface, we performed partial mutational scans by specifying a list of mutation sites of interest as supported by the software. For this purpose, we selected in both cases all the residues of the spike RBD chain and the ACE2 residues that were at a distance lower than 12 Å from spike. We then considered for further analysis those residues of spike that were in close contact with the ACE2 (i.e. within 6 Å in both the WT and Omicron structures). We also performed a self-scan, mutating all residues to their own WT variants as quality control, finding that all the residues folding ΔΔGs were low (|ΔΔG| < 1.2 kcal mol⁻¹) except for E406 and C432 in the Omicron scan.

Subsequently, we visualized the results of the saturation scan with different post-processing tools provided by MutateX. To gain a general overview of the effect of the spike mutations on the ΔΔG of interaction, we focused on the spike residues placed at the interface with ACE2 and we plotted heatmaps (ddg2heatmap) restricting the plotted values to a range of −3.0 to 5 kcal mol⁻¹ and boxplots (ddg2distribution) for both the WT and the Omicron variant (Figure 8).

Our MutateX scan showed no effect on mutation on either WT or Omicron variant for a few sites of interest (339, 371, 373 and 375; see dataset on GitHub). These sites were indeed not in contact with the ACE2 protein in our structures. Overall, our scan identified several sites that were highly sensitive to mutations in terms of interaction free energy, most of them identified both in the WT and the Omicron scans (G447, Y449, L455, F456, A475, F486, N487, Y489, G496, N501, G502 and Y505). Positions 496, 501 and 505 coincide with mutation sites identified in the Omicron variant.

We have then used the ddg2summary tool to extract specifically the predicted effect on the binding of the Omicron variant mutations. Most of them have a small effect on the ΔΔG of interaction, with only a few values with |ΔΔG| > 1.0 kcal mol⁻¹. This is consistent with SPR data that also showed small effects for most of the mutations [80]. The scan of our WT structure highlighted the slight destabilizing effect of mutations K417N (ΔΔG = 0.67 kcal mol⁻¹) and Y505H (ΔΔG = 1.77 kcal mol⁻¹) in agreement with experimental data. However, it unexpectedly predicted mutation G496S to be stabilizing (ΔΔG = −1.04) [80]. This is probably due to the bias inherent in replacing a small residue with a larger one. Similarly, N501Y was predicted to be highly destabilizing (ΔΔG = 5.49 kcal mol⁻¹) while it is experimentally known to increase binding [80]. The latter result is not entirely surprising given the method’s bias toward destabilizing mutations. In addition, we found that the inverse Y501N mutation from the Omicron variant scan was also predicted to be destabilizing, in agreement with the experimental data [80]. Moreover, the stabilizing effect of this mutation was proposed to be due to pi-stacking interactions which we do not expect to be fully represented in the FoldX force field. Furthermore, it should be noted that mutations can have cooperative effects. For example, Q498R increases binding affinity to ACE2 when in combination with N501Y [82], which cannot be adequately captured by our scan.

Limitations, future directions and availability

MutateX is a collection of Python-based scripts and requires Python 2.7 and 3 or higher to be available on the system together with open-source Python libraries, whose installation is straightforward on macOS and Linux distributions. MutateX comes with a standard installation script that takes care of installing all the necessary requirements as well. It is available at https://github.com/ELELAB/mutatex. The package also includes templates for input preparation, in-depth documentation and test cases. It should be noted however that to perform the mutagenesis calculation, the most recent version of the FoldX Suite needs to be available in the system. The FoldX Suite is free of charge for academic use, and it is currently available athttp://foldxsuite.crg.eu. While MutateX currently supports the FoldX Suite only, we plan on adding support for other software packages or web servers in the future, along with providing customized support to other energy functions available in the literature and of potential interest for the user community.

The potential of MutateX is attested by recent publications where a preliminary suite of tools designed by us was employed to perform saturation mutational scans for several systems of biological interest, such as enzymes of industrial and pharmacological interest [45, 83], the Parkinson-related DJ1 protein [84], transcription factors as MZF1 [85], proteins involved in mismatch repair [41, 42] and other disease-related targets [41]. The aforementioned studies, along with the examples reported in this manuscript, provide insight into the mutational landscape of known cancer- or disease-related proteins. This was in fact an important driving force behind the development of MutateX. The wide-scale and high-throughput prediction of the effects on stability and binding of disease-related targets can help predict the effect of cancer-related missense variants found in mutational data or cancer mutation initiatives (such as catalogue of somatic mutations in cancer (COSMIC) [86]). The scans that MutateX provides can contribute to identifying mutational hotspots in cancer targets. Even in this context, however, the limitations of the tool should be taken into consideration. FoldX relies on local side-chain rearrangement, and we thus don’t expect predictions to reliably represent long-range allosteric effects, which are important in the context of different diseases [87, 88]. For similar reasons, we don’t expect predictions to fully incorporate epistatic effects due to distant mutations when estimating the combined effect of multiple mutations (e.g. concerning ‘latent driver’ mutations [89]). Moreover, predictions of changes in stability only give a partial view of the effect of the amino acid changes disregarding important functional consequences. The use of other computational tools, either using structural data or not, can complement these predictions and obtain a broader picture of the effect of cancer mutations and of the mutational landscape overall [90, 91]. For instance, tools designed to investigate protein structure networks [30, 92, 93] or to quantify allosteric modulation [94] can complement MutateX data and partially address its shortcomings.

MutateX also has a relevant technological interest as it can be used in the context of protein engineering. It is a promising tool for designing enzymes with different thermal stabilities, as FoldX has been previously used in different protein engineering studies assessing protein (de)stabilization upon mutagenesis [41–44, 95–97]. However, given the aforementioned bias of FoldX toward destabilizing mutations, we advise carefully considering its results and combining them with experiments or other more accurate predictions of free energy changes on interesting candidates. Lastly, the FoldX Suite allows runs on structures containing DNA and RNA molecules as well as other molecules such ATP and GTP, and adding their support will be one of the further developments of MutateX.

MutateX provides a powerful complementary tool in experimental biochemistry, biotechnology and molecular/cellular biology, along with the molecular understanding and helping classification of disease-causing mutations.

Key Points

Automatization of in silico structure-based mutational scans for high-throughput applications
Possibility to estimate changes in both stability and binding (i.e. function) upon mutations
Possibility to apply the method to conformational ensembles of proteins and protein complexes
Different tools for visualization and analyses of the results

Author contributions

Conceived and designed the project: M.T., E.P. perform calculations and data visualization: T.T. (PTEN), M.D.M. (PTEN), I.P. (p53), E.M. (SH2), K.D. (MD ensembles) and L.B. (spike). Analyzed the data: T.T. (PTEN), M.T. (all), E.P. (all), I.P. (p53), E.M. (SH2, spike), K.D. (p53 ensembles) and L.B. (spike). Contributed analysis tools, programming: T.T., M.T., T.C. and I.P. Wrote the paper: M.T. and E.P. with inputs from all the coauthors.

Acknowledgements

Space limitation precludes the citations of many excellent works by colleagues in the fields discussed. We acknowledge these efforts here.

The authors are also extremely grateful to all the current and former Computational Biology Laboratory (now Cancer Structural Biology) members at the Danish Cancer Society Research Center for extensive testing and usage of the software, which allowed us to optimize it. We also would like to thank Kresten Lindorff-Larsen, Amelie Stein and Rasmus Hartmann-Petersen for the interest and support that their groups have demonstrated for our project, and for having used the tool for their proteins of interest. We are also grateful for the interest demonstrated by the FoldX developers in Luis Serrano’s group.

Data availability

To use for the Data Availability Section Software and data used in this publication is available at our GitHub repository https://github.com/ELELAB/mutatex.

Funding

The result of this research has been achieved using the DECI-PRACE15th and 16th HPC Grants on Archer. The research has been also carried out thanks to the access to the Danish HPC Infrastructure Computerome2. The research has been supported by grants from LEO Foundation (LF17006, LF17024), Carlsberg Distinguished Fellowship (CF18-0314), Danmarks Grundforskningsfond (DNRF125), NovoNordisk Fonden in Bioscience and Basic Biomedicine (0065262) and Biotechnology-based Synthesis and Production (17OC0027588) and Hartmann Foundation (A33877). This work is part of Interregional Childhood Oncology Precision Medicine Exploration, a cross-Oresund collaboration between University Hospital Copenhagen, Rigshospitalet, Lund University, Region Skåne and Technical University Denmark, supported by the European Regional Development Fund.

Author Biographies

Matteo Tiberti is a staff scientist at the Cancer Structural Biology group (Danish Cancer Society Research Center, DCRC, Copenhagen, Denmark) and his research focuses on software development for structural bioinformatics and annotations of cancer mutations.

Thilde Terkelsen worked as PhD student at the Cancer Structural Biology group at DCRC and she is now Data Scientist at the Center for Health and Data Science at the University of Copenhagen. At DCRC, she worked on the implementation of MutateX functions for downstream analyses and the analyses of -omics data.

Kristine Degn is a PhD student in the Cancer Systems Biology group (Department of Health and Technology, Technical University of Denmark, DTU, Lyngby, Denmark) and her research focuses on structure-based methods to characterize and classify variants found in cancer samples with impact on the protein products.

Ludovica Beltrame is an Erasmus exchange student at the Cancer Structural Biology group at DCRC. Her research is focusing on free energy calculations and molecular simulations to study the effect of mutations on protein structures.

Tycho Canter Cremers carried out his Master Studies in the Cancer Structural Biology group at DCRC, focusing on implementing new functions for MutateX and he is currently working as Bioinformatician at the Center of Medical Genetics, Antwerp.

Isabelle da Piedade worked as a Post-Doctoral Researcher at the Cancer Structural Biology Group at DCRC with a focus on analyses of genomics and transcriptomics data. She is currently working at Senior Bioinformatician at the Danish National Genome Center (Copenhagen, Denmark).

Miriam Di Marco worked as a pre-graduate fellow and master's student at the Cancer Structural Biology group at DCRC. She applied free energy calculations to characterize disease-related proteins and the impact of mutations on protein stability.

Emiliano Maiani worked as a Post-Doctoral Researcher and Senior Scientist at the Cancer Structural Biology group of DCRC. His main research focus in the group has been in the area of experimental structural and cellular studies of short linear motifs. He is currently an Adjunct Professor at UniCamillus–Saint Camillus International University of Health Sciences, Rome, Italy. His current research focuses on DNA damage response and autophagy.

Elena Papaleo is Associate Professor and leader of the Cancer Systems Biology group at DTU and group leader of the Cancer Structural Biology group at DCRC. Her research focuses on applying -omics bioinformatics and structural methods, accompanied by experimental validation with in vitro and cellular assays. These methodologies are used to unravel the effects of disease-related mutations and post-translational modifications on protein structures and to characterize protein-protein interactions mediated by disordered regions.

References

1.

Legrain

P

,

Wojcik

J

,

Gauthier

JM

.

Protein-protein interaction maps: a lead towards cellular functions

.

Trends Genet

2001

;

17

:

346

–

52

.

2.

Vaudel

M

,

Verheggen

K

,

Csordas

A

, et al.

Exploring the potential of public proteomics data

.

Proteomics

2015

;

16

:

214

–

25

.

3.

Boersema

PJ

,

Kahraman

A

,

Picotti

P

.

Proteomics beyond large-scale protein expression analysis

.

Curr Opin Biotechnol

2015

;

34

:

162

–

70

.

4.

Yurkovich

JT

,

Palsson

BO

.

Quantitative -omic data empowers bottom-up systems biology

.

Curr Opin Biotechnol

2018

;

51

:

130

–

6

.

5.

Riley

NM

,

Coon

JJ

.

Phosphoproteomics in the age of rapid and deep proteome profiling

.

Anal Chem

2015

;

88

:

74

–

94

.

6.

Laddach

A

,

Fung Ng

JC

,

Fraternali

F

.

Pathogenic missense protein variants affect different functional pathways and proteomic features than healthy population variants

.

PLoS Biol

2021

;

19

:

e3001207

.

7.

Laddach

A

,

Ng

JCF

,

Chung

SS

, et al.

Genetic variants and protein–protein interactions: a multidimensional network-centric view

.

Curr Opin Struct Biol

2018

;

50

:

82

–

90

.

8.

Forbes

SA

,

Beare

D

,

Boutselakis

H

, et al.

COSMIC: somatic cancer genetics at high-resolution

.

Nucleic Acids Res

2017

;

45

:

D777

–

83

.

9.

Cerami

E

,

Gao

J

,

Dogrusoz

U

, et al.

The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data

.

Cancer Discov

2012

;

2

:

401

–

4

.

10.

Moore

L

,

Cagan

A

,

Coorens

THH

, et al.

The mutational landscape of human somatic and germline cells

.

Nature

2021

;

597

:

381

–

6

.

11.

Gong

S

,

Worth

CL

,

Cheng

TMK

, et al.

Meet me halfway: when genomics meets structural bioinformatics

.

J Cardiovasc Transl Res

2011

;

4

:

281

–

303

.

12.

Meyer

MJ

,

Beltrán

JF

,

Liang

S

, et al.

Interactome INSIDER: a structural interactome browser for genomic studies

.

Nat Methods

2018

;

15

:

107

–

14

.

13.

Muiños

F

,

Martínez-Jiménez

F

,

Pich

O

, et al.

In silico saturation mutagenesis of cancer genes

.

Nature

2021

;

596

:

428

–

32

.

14.

Thusberg

J

,

Vihinen

M

.

Pathogenic or not? And if so, then how? Studying the effects of missense mutations using bioinformatics methods

.

Hum Mutat

2009

;

30

:

703

–

14

.

15.

Reva

B

,

Antipin

Y

,

Sander

C

.

Predicting the functional impact of protein mutations: application to cancer genomics

.

Nucleic Acids Res

2011

;

39

:

e118

.

16.

Timson

DJ

.

Value of predictive bioinformatics in inherited metabolic diseases

.

World J Med Genet

2015

;

5

:

46

–

51

.

Google Scholar

Crossref

WorldCat

17.

Bornscheuer

UT

,

Pohl

M

.

Improved biocatalysts by directed evolution and rational protein design

.

Curr Opin Chem Biol

2001

;

5

:

137

–

43

.

18.

Bommarius

AS

,

Blum

JK

,

Abrahamson

MJ

.

Status of protein engineering for biocatalysts: how to design an industrially useful biocatalyst

.

Curr Opin Chem Biol

2011

;

15

:

194

–

200

.

19.

Laskowski

RA

,

Tyagi

N

,

Johnson

D

, et al.

Integrating population variation and protein structural analysis to improve clinical interpretation of missense variation: application to the WD40 domain

.

Hum Mol Genet

2016

;

25

:

927

–

35

.

20.

Laskowski

RA

,

Thornton

JM

.

Understanding the molecular machinery of genetics through 3D structures

.

Nat Rev Genet

2008

;

9

:

141

–

51

.

21.

Stein

A

,

Fowler

DM

,

Hartmann-Petersen

R

, et al.

Biophysical and mechanistic models for disease-causing protein variants

.

Trends Biochem Sci

2019

;

44

:

575

–

88

.

22.

Villali

J

,

Kern

D

.

Choreographing an enzyme’s dance

.

Curr Opin Chem Biol

2011

;

14

:

636

–

43

.

Google Scholar

Crossref

WorldCat

23.

Karplus

M

,

Kuriyan

J

.

Molecular dynamics and protein function

.

Proc Natl Acad Sci U S A

2005

;

102

:

6679

–

85

.

24.

Baldwin

AJ

,

Kay

LE

.

NMR spectroscopy brings invisible protein states into focus

.

Nat Chem Biol

2009

;

5

:

808

–

14

.

25.

Vendruscolo

M

.

Determination of conformationally heterogeneous states of proteins

.

Curr Opin Struct Biol

2007

;

17

:

15

–

20

.

26.

Torchia

DA

.

NMR studies of dynamic biomolecular conformational ensembles

.

Prog Nucl Magn Reson Spectrosc

2015

;

84–85

:

14

–

32

.

Google Scholar

PubMed

OpenURL Placeholder Text

WorldCat

27.

Vögeli

B

,

Orts

J

,

Strotz

D

, et al.

Towards a true protein movie: a perspective on the potential impact of the ensemble-based structure determination using exact NOEs

.

J Magn Reson

2014

;

241

:

53

–

9

.

28.

Dror

RO

,

Dirks

RM

,

Grossman

JP

, et al.

Biomolecular simulation: a computational microscope for molecular biology

.

Annu Rev Biophys

2012

;

41

:

429

–

52

.

29.

Papaleo

E

,

Saladino

G

,

Lambrughi

M

, et al.

The role of protein loops and linkers in conformational dynamics and allostery

.

Chem Rev

2016

;

116

:

6391

–

423

.

30.

Papaleo

E

.

Integrating atomistic molecular dynamics simulations, experiments, and network analysis to study protein dynamics: strength in unity

.

Front Mol Biosci

2015

;

2

:

1

–

6

.

31.

Lehtivarjo

J

,

Tuppurainen

K

,

Hassinen

T

, et al.

Combining NMR ensembles and molecular dynamics simulations provides more realistic models of protein structures in solution and leads to better chemical shift prediction

.

J Biomol NMR

2012

;

52

:

257

–

67

.

32.

Lindorff-Larsen

K

,

Best

RB

,

Depristo

MA

, et al.

Simultaneous determination of protein structure and dynamics

.

Nature

2005

;

433

:

128

–

32

.

33.

Esteban-Martín

S

,

Bryn Fenwick

R

,

Salvatella

X

.

Synergistic use of NMR and MD simulations to study the structural heterogeneity of proteins

.

Wiley Interdiscip Rev Comput Mol Sci

2012

;

2

:

466

–

78

.

Google Scholar

Crossref

WorldCat

34.

Geng

C

,

Xue

LC

,

Roel-Touris

J

, et al.

Finding the ΔΔG spot: are predictors of binding affinity changes upon mutations in protein–protein interactions ready for it?

Wiley Interdiscip Rev Comput Mol Sci

2019

;

9

:

e1410

.

Google Scholar

Crossref

WorldCat

35.

Thusberg

J

,

Olatubosun

A

,

Vihinen

M

.

Performance of mutation pathogenicity prediction methods on missense variants

.

Hum Mutat

2011

;

32

:

358

–

68

.

36.

Khan

S

,

Vihinen

M

.

Performance of protein stability predictors

.

Hum Mutat

2010

;

31

:

675

–

84

.

37.

Nisthal

A

,

Wang

CY

,

Ary

ML

, et al.

Protein stability engineering insights revealed by domain-wide comprehensive mutagenesis

.

Proc Natl Acad Sci

2019

;

116

:

16367

–

77

.

38.

Potapov

V

,

Cohen

M

,

Schreiber

G

.

Assessing computational methods for predicting protein stability upon mutation: good on average but not in the details

.

Protein Eng Des Sel

2009

;

22

:

553

–

60

.

39.

Guerois

R

,

Nielsen

JE

,

Serrano

L

.

Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations

.

J Mol Biol

2002

;

320

:

369

–

87

.

40.

Buß

O

,

Rudat

J

,

Ochsenreither

K

.

FoldX as protein engineering tool: better than random based approaches?

Comput Struct Biotechnol J

2018

;

16

:

25

–

33

.

41.

Scheller

R

,

Stein

A

,

Nielsen

SV

, et al.

Toward mechanistic models for genotype-phenotype correlations in phenylketonuria using protein stability calculations

.

Hum Mutat

2019

;

40

:

444

–

57

.

42.

Nielsen

SV

,

Stein

A

,

Dinitzen

AB

, et al.

Predicting the impact of lynch syndrome-causing missense mutations from structural calculations

.

PLoS Genet

2017

;

13

:

e1006739

.

43.

Sánchez

IE

,

Beltrao

P

,

Stricher

F

, et al.

Genome-wide prediction of SH2 domain targets using structural information and the FoldX algorithm

.

PLoS Comput Biol

2008

;

4

:

e1000052

.

44.

Christensen

NJ

,

Kepp

KP

.

Accurate stabilities of laccase mutants predicted with a modified FoldX protocol

.

J Chem Inf Model

2012

;

52

:

3028

–

42

.

45.

Jónsdóttir

LB

,

Ellertsson

B

,

Invernizzi

G

, et al.

The role of salt bridges on the temperature adaptation of aqualysin I, a thermostable subtilisin-like proteinase

.

Biochim Biophys Acta Proteins Proteomics

2014

;

1844

:

2174

–

81

.

Google Scholar

Crossref

WorldCat

46.

Pancotti

C

,

Benevenuta

S

,

Birolo

G

, et al.

Predicting protein stability changes upon single-point mutation: a thorough comparison of the available tools on a new dataset

.

Brief Bioinform

2022

;

1–12

:

bbab555

.

Google Scholar

OpenURL Placeholder Text

WorldCat

47.

Usmanova

DR

,

Bogatyreva

NS

,

Bernad

JA

, et al.

Self-consistency test reveals systematic bias in programs for prediction change of stability upon mutation

.

Bioinformatics

2018

;

34

:

3653

–

8

.

48.

Pucci

F

,

Bernaerts

KV

,

Kwasigroch

JM

, et al.

Quantification of biases in predictions of protein stability changes upon mutations

.

Bioinformatics

2018

;

34

:

3659

–

65

.

49.

Caldararu

O

,

Blundell

TL

,

Kepp

KP

.

A base measure of precision for protein stability predictors: structural sensitivity

.

BMC Bioinformatics

2021

;

22

:

88

.

50.

Delgado

J

,

Radusky

LG

,

Cianferoni

D

, et al.

FoldX 5.0: working with RNA, small molecules and a new graphical interface

.

Bioinformatics

2019

;

35

(

20

):

4168

–

9

.

51.

Pasi

M

,

Tiberti

M

,

Arrigoni

A

, et al.

xPyder: a PyMOL plugin to analyze coupled residues and their networks in protein structures

.

J Chem Inf Model

2012

;

279

:

1

–

6

.

Google Scholar

OpenURL Placeholder Text

WorldCat

52.

Kumar

MDS

,

Bava

KA

,

Gromiha

MM

, et al.

ProTherm and ProNIT: thermodynamic databases for proteins and protein-nucleic acid interactions

.

Nucleic Acids Res

2006

;

34

:

D204

–

6

.

53.

Pulido

R

.

PTEN: a yin-yang master regulator protein in health and disease

.

Methods

2015

;

77–78

:

3

–

10

.

Google Scholar

PubMed

OpenURL Placeholder Text

WorldCat

54.

Yamada

KM

,

Araki

M

.

Tumor suppressor PTEN: modulator of cell signaling, growth, migration and apoptosis

.

J Cell Sci

2001

;

114

:

2375

–

82

.

55.

Chalhoub

N

,

Baker

SJ

.

PTEN and the PI3-kinase pathway in cancer

.

Annu Rev Pathol Mech Dis

2009

;

4

:

127

–

50

.

Google Scholar

Crossref

WorldCat

56.

Jepsen

MM

,

Fowler

DM

,

Hartmann-Petersen

R

, et al.

Classifying disease-associated variants using measures of protein activity and stability

.

Protein Homeost Dis

2020

;

91

–

107

.

Google Scholar

OpenURL Placeholder Text

WorldCat

57.

Lee

JO

,

Yang

H

,

Georgescu

MM

, et al.

Crystal structure of the PTEN tumor suppressor: implications for its phosphoinositide phosphatase activity and membrane association

.

Cell

1999

;

99

:

323

–

34

.

58.

Joerger

AC

,

Fersht

AR

.

The p53 pathway: origins, inactivation in cancer, and emerging therapeutic approaches

.

Annu Rev Biochem

2016

;

85

:

375

–

404

.

59.

Kim

MP

,

Zhang

Y

,

Lozano

G

.

Mutant p53: multiple mechanisms define biologic activity in cancer

.

Front Oncol

2015

;

85

:

375

–

404

.

Google Scholar

OpenURL Placeholder Text

WorldCat

60.

Freed-pastor

WA

,

Prives

C

.

Mutant p53: one name, many proteins

.

Genes Dev

2012

;

26

:

1268

–

86

.

61.

Vogelstein

B

,

Lane

D

,

Levine

AJ

.

Surfing the p53 network

.

Nature

2000

;

408

:

307

–

10

.

62.

Joerger

AC

,

Fersht

AR

.

Structural biology of the tumor suppressor p53

.

Annu Rev Biochem

2008

;

77

:

557

–

82

.

63.

Okorokov

AL

,

Orlova

EV

.

Structural biology of the p53 tumour suppressor

.

Curr Opin Struct Biol

2009

;

19

:

197

–

202

.

64.

Kitayner

M

,

Rozenberg

H

,

Kessler

N

, et al.

Structural basis of DNA recognition by p53 tetramers

.

Mol Cell

2006

;

22

:

741

–

53

.

65.

Chen

Y

,

Dey

R

,

Chen

L

.

Crystal structure of the p53 Core domain bound to a full consensus site as a self-assembled tetramer

.

Structure

2010

;

18

:

246

–

56

.

66.

Hunter

T

.

Signaling - 2000 and beyond

.

Cell

2000

;

100

:

113

–

27

.

67.

Reimand

J

,

Wagih

O

,

Bader

GD

.

The mutational landscape of phosphorylation signaling in cancer

.

Sci Rep

2013

;

3

:

2651

.

68.

Yaffe

MB

.

Phosphotyrosine-binding domains in signal transduction

.

Nat Rev Mol Cell Biol

2002

;

3

:

177

–

86

.

69.

Liu

BA

,

Jablonowski

K

,

Raina

M

, et al.

The human and mouse complement of SH2 domain proteins—establishing the boundaries of phosphotyrosine signaling

.

Mol Cell

2006

;

22

:

851

–

68

.

70.

Machida

K

,

Thompson

CM

,

Dierck

K

, et al.

High-throughput phosphotyrosine profiling using SH2 domains

.

Mol Cell

2007

;

26

:

899

–

915

.

71.

Tinti

M

,

Kiemer

L

,

Costa

S

, et al.

The SH2 domain interaction landscape

.

Cell Rep

2013

;

3

:

1293

–

305

.

72.

Palma

A

,

Tinti

M

,

Paoluzi

S

, et al.

Both intrinsic substrate preference and network context contribute to substrate selection of classical tyrosine phosphatases

.

J Biol Chem

2017

;

292

:

4942

–

58

.

73.

Donaldson

LW

,

Gish

G

,

Pawson

T

, et al.

Structure of a regulatory complex involving the Abl SH3 domain, the Crk SH2 domain, and a Crk-derived phosphopeptide

.

Proc Natl Acad Sci U S A

2002

;

99

:

14053

–

8

.

74.

Nioche

P

,

Liu

WQ

,

Broutin

I

, et al.

Crystal structures of the SH2 domain of Grb2: highlight on the binding of a new high-affinity inhibitor

.

J Mol Biol

2002

;

315

:

1167

–

77

.

75.

Zhang

Y

,

Zhang

J

,

Yuan

C

, et al.

Simultaneous binding of two peptidyl ligands by a Src homology 2 domain

.

Biochemistry

2011

;

50

:

7637

–

46

.

76.

Natan

E

,

Baloglu

C

,

Pagel

K

, et al.

Interaction of the p53 DNA-binding domain with its n-terminal extension modulates the stability of the p53 tetramer

.

J Mol Biol

2011

;

409

:

358

–

68

.

77.

Lambrughi

M

,

De Gioia

L

,

Gervasio

FL

, et al.

DNA-binding protects p53 from interactions with cofactors involved in transcription-independent functions

.

Nucleic Acids Res

2016

;

44

:

9096

–

109

.

Google Scholar

PubMed

OpenURL Placeholder Text

WorldCat

78.

Xavier

JS

,

Nguyen

TB

,

Karmarkar

M

, et al.

ThermoMutDB: a thermodynamic database for missense mutations

.

Nucleic Acids Res

2021

;

49

:

D475

–

9

.

79.

Mannar

D

,

Saville

JW

,

Zhu

X

, et al.

SARS-CoV-2 omicron variant: antibody evasion and cryo-EM structure of spike protein–ACE2 complex

.

Science

2022

;

375

(6582):

760

–

4

.

80.

Starr

TN

,

Greaney

AJ

,

Hilton

SK

, et al.

Deep mutational scanning of SARS-CoV-2 receptor binding domain reveals constraints on folding and ACE2 binding

.

Cell

2020

;

182

:

1295

–

1310.e20

.

81.

Wang

Q

,

Zhang

Y

,

Wu

L

, et al.

Structural and functional basis of SARS-CoV-2 entry by using human ACE2

.

Cell

2020

;

181

:

894

–

904

.

82.

Zahradník

J

,

Marciano

S

,

Shemesh

M

, et al.

SARS-CoV-2 variant prediction and antiviral drug design are enabled by RBD in vitro evolution

.

Nat Microbiol

2021

;

6

:

1188

–

98

.

83.

Papaleo

E

,

Parravicini

F

,

Grandori

R

, et al.

Structural investigation of the cold-adapted acylaminoacyl peptidase from Sporosarcina psychrophila by atomistic simulations and biophysical methods

.

Biochim Biophys Acta Proteins Proteomics

2014

;

1844

:

2203

–

13

.

Google Scholar

Crossref

WorldCat

84.

Mathiassen

SG

,

Larsen

IB

,

Poulsen

EG

, et al.

A two-step protein quality control pathway for a misfolded DJ-1 variant in fission yeast

.

J Biol Chem

2015

;

290

:

21141

–

53

.

85.

Nygaard

M

,

Terkelsen

T

,

Olsen

AV

, et al.

The mutational landscape of the oncogenic MZF1 SCAN domain in cancer

.

Front Mol Biosci

2016

;

3

:

1

–

18

.

86.

Tate

JG

,

Bamford

S

,

Jubb

HC

, et al.

COSMIC: the catalogue of somatic mutations in cancer

.

Nucleic Acids Res

2019

;

47

:

D941

–

7

.

87.

Guarnera

E

,

Berezovsky

IN

.

Allosteric drugs and mutations: chances, challenges, and necessity

.

Curr Opin Struct Biol

2020

;

62

:

149

–

57

.

88.

Nussinov

R

,

Jang

H

,

Tsai

CJ

, et al.

Precision medicine review: rare driver mutations and their biophysical classification

.

Biophys Rev

2019

;

11

:

5

–

19

.

89.

Nussinov

R

,

Tsai

C-J

.

‘Latent drivers’ expand the cancer mutational landscape

.

Curr Opin Struct Biol

2015

;

32

:

25

–

32

.

90.

Fas

BA

,

Maiani

E

,

Sora

V

, et al.

The conformational and mutational landscape of the ubiquitin-like marker for autophagosome formation in cancer

.

Autophagy

2021

;

17

(10):

2818

–

41

.

91.

Kumar

M

,

Papaleo

E

.

A pan-cancer assessment of alterations of the kinase domain of ULK1, an upstream regulator of autophagy

.

Sci Rep

2020

;

10

:

14874

.

92.

Bhattacharyya

M

,

Ghosh

S

,

Vishveshwara

S

.

Protein structure and function: looking through the network of side-chain interactions

.

Curr Protein Pept Sci

2016

;

17

:

4

–

25

.

93.

Tiberti

M

,

Invernizzi

G

,

Lambrughi

M

, et al.

PyInteraph: a framework for the analysis of interaction networks in structural ensembles of proteins

.

J Chem Inf Model

2014

;

54

:

1537

–

51

.

94.

Guarnera

E

,

Berezovsky

IN

.

Toward comprehensive allosteric control over protein activity

.

Structure

2019

;

27

:

866

–

878.e1

.

95.

Dvorak

P

,

Bednar

D

,

Vanacek

P

, et al.

Computer-assisted engineering of hyperstable fibroblast growth factor 2

.

Biotechnol Bioeng

2018

;

115

:

850

–

62

.

96.

Song

X

,

Wang

Y

,

Shu

Z

, et al.

Engineering a more thermostable blue light photo receptor Bacillus subtilis YtvA LOV domain by a computer aided rational design method

.

PLoS Comput Biol

2013

;

9

:

e1003129

.

97.

Heselpoth

RD

,

Yin

Y

,

Moult

J

, et al.

Increasing the stability of the bacteriophage endolysin PlyC using rationale-based FoldX computational modeling

.

Protein Eng Des Sel

2015

;

28

:

85

–

92

.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://dbpia.nl.go.kr/journals/pages/open_access/funder_policies/chorus/standard_publication_model)

Download all slides

Month:	Total Views:
March 2022	165
April 2022	142
May 2022	83
June 2022	130
July 2022	65
August 2022	51
September 2022	51
October 2022	63
November 2022	50
December 2022	45
January 2023	48
February 2023	90
March 2023	68
April 2023	81
May 2023	128
June 2023	65
July 2023	32
August 2023	36
September 2023	43
October 2023	84
November 2023	26
December 2023	48
January 2024	137
February 2024	185
March 2024	220
April 2024	235
May 2024	275
June 2024	151
July 2024	82
August 2024	69
September 2024	89
October 2024	102
November 2024	62
December 2024	93
January 2025	92
February 2025	213
March 2025	291
April 2025	256

Article Contents

MutateX: an automated pipeline for in silico saturation mutagenesis of protein structures and structural ensembles

Abstract

Introduction

Results

Design and implementation

Overview on MutateX

The MutateX protocol

Performance

Results

Selection of a cutoff for truncation of ΔΔGs values for plotting purposes

Saturation mutation scan of the cancer-related protein phosphatase and tensin homolog (PTEN): relationship between protein stability and disease

Saturation mutagenesis to identify hotspot residues at the binding interface for tetramerization of p53

Saturation mutagenesis of protein-peptide complexes: identifying binding specificities of SH2 domains

Comparison of MutateX-predicted changes in free energy for the p53 DBD in experimental structures and ensembles from molecular dynamics

Identification of variants in the Coronavirus Disease-19 (COVID-19) spike-ACE2 complex

Limitations, future directions and availability

Author contributions

Acknowledgements

Data availability

Funding

Author Biographies

References

Citations

Views

Altmetric

Email alerts

Citing articles via

Latest

Most Read

Most Cited

Article Contents

MutateX: an automated pipeline for in silico saturation mutagenesis of protein structures and structural ensembles

Abstract

Introduction

Results

Design and implementation

Overview on MutateX

The MutateX protocol

Performance

Results

Selection of a cutoff for truncation of ΔΔGs values for plotting purposes

Saturation mutation scan of the cancer-related protein phosphatase and tensin homolog (PTEN): relationship between protein stability and disease

Saturation mutagenesis to identify hotspot residues at the binding interface for tetramerization of p53

Saturation mutagenesis of protein-peptide complexes: identifying binding specificities of SH2 domains

Comparison of MutateX-predicted changes in free energy for the p53 DBD in experimental structures and ensembles from molecular dynamics

Identification of variants in the Coronavirus Disease-19 (COVID-19) spike-ACE2 complex

Limitations, future directions and availability

Author contributions

Acknowledgements

Data availability

Funding

Author Biographies

References

Citations

Views

Altmetric

Email alerts

Citing articles via

Latest

Most Read

Most Cited

This Feature Is Available To Subscribers Only