ConPlex: a server for the evolutionary conservation analysis of protein complex structures

Abstract

Evolutionary conservation analyses are important for the identification of protein–protein interactions. For protein complex structures, sequence conservation has been applied to determine protein oligomerization states, to distinguish native interfaces from non-specific crystal contacts, and to discriminate near-native structures from docking artifacts. However, a user-friendly web-based service for evolutionary conservation analysis of protein complexes has not been available. Therefore, we developed ConPlex (http://sbi.postech.ac.kr/ConPlex/), a web application that enables evolutionary conservation analyses of protein interactions within protein quaternary structures. Users provide protein complex structures; ConPlex automatically identifies protein interfaces and carries out evolutionary conservation analyses for the interface regions. Moreover, ConPlex allows the results of the residue-specific conservation analysis to be displayed on the protein complex structure and provides several options to customize the display output to fit each user’s needs. We believe that ConPlex offers a convenient platform to analyze protein complex structures based on evolutionary conservation of protein–protein interface residues.

INTRODUCTION

Functionally important amino acids in a protein sequence are conserved through selective evolutionary pressure, such as constraints on protein folding, involvement in enzymatic activity, and the maintenance of ligand binding or protein–protein interactions. Evolutionary conservation analyses have been widely applied to characterize functionally/structurally important residues, to identify protein–protein interfaces, or to predict interactions between a protein and its ligand (1–3). Various methods to automatically calculate conservation scores from the primary amino-acid sequences or protein structures have been developed. Rate4Site was developed to identify functionally important regions in proteins by estimating the evolutionary rates of each amino acid among homolog proteins (2). Based on Rate4Site, a web-based tool, called ConSeq, was introduced to perform conservation analysis of protein sequences (4). Furthermore, ConSurf was developed to enable automation of the conservation analysis of protein tertiary structures (a single chain of a protein structure) (5,6).

When it comes to protein ‘quaternary’ structures, evolutionary conservation analysis has been successfully applied to predict interfaces of protein complex structures. For example, McCammon’s and Thornton’s groups showed that analysis of protein interface conservation is effective for identifying protein oligomerization states and for discriminating true oligomeric contacts from non-specific crystal contacts (7,8). Additionally, our previous study showed that protein interfaces are more conserved than the rest of the surface (ROS) and revealed that docking artifacts could be effectively eliminated by comparing conservation levels between the interface and the ROS in a protein docking complex (9). Furthermore, Guharoy and Chakrabarti demonstrated that, within a protein–protein interface, comparing the conservation levels of residues that are fully buried upon binding (core) with those of partially buried residues (rim) is an effective means to discriminate biological interfaces from crystal contacts, because core residues are significantly more conserved than rim residues (10). Therefore, a web application that automatically identifies interface, ROS, core and rim residues within protein complex structures, and calculates evolutionary conservation scores for these residues, would be very useful for characterizing protein oligomerization states, distinguishing native interfaces from crystal contacts, and discriminating near-native structures from docking decoys.

In this report, we present the ConPlex server for evolutionary conservation analyses of protein complex structures. With ConPlex, the user inputs protein complex structures and the server automatically calculates position-specific conservation scores based upon evolutionary relationships among the query protein and its homologs. Then, ConPlex finds protein interfaces and other biologically important regions of the protein quaternary structure. By mapping the calculated residue-specific conservation scores onto the complex structure, ConPlex enables users to comprehensively analyze the evolutionary conservation of the protein interface, surface, rim and core regions. The results of the evolutionary conservation analyses are displayed on the 3D structures with a user-friendly interface. The ConPlex server also provides users with the results of each intermediate calculation, including compilation of primary amino acid sequences homologous to the query, a multiple sequence alignment, and the residue-specific evolutionary score. Finally, ConPlex has a flexible output display and an ability to use two different visualization scripts for further analyses on a user’s own computer. We believe ConPlex will be an essential web tool for protein biochemists in a wide range of fields, such as experimental biology and bioinformatics, to assess protein complex structures based on evolutionary conservation.

METHODS

A brief description of the methodology is provided here. More detailed information is available at http://sbi.postech.ac.kr/ConPlex/, under the ‘Instructions’ and ‘Output Example’ menus.

ConPlex input

ConPlex takes as input protein complex structures in Protein Data Bank (PDB) format, together with chain identifiers indicating the interacting partners. Users can either upload a PDB-format file or enter the four-character PDB accession code. For evaluation of protein docking models, ConPlex also allows users to upload multiple docking decoys in a compressed file.
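
As an illustration of how a PDB-format input and its chain identifiers relate to the per-chain sequences used later in the protocol, the following is a minimal Python sketch that reads the amino-acid sequence of each chain from the ATOM records. It is not the ConPlex implementation: the function name `chain_sequences` is hypothetical, and the sketch assumes standard fixed-column PDB records, keeps only the first model, and writes non-standard residues as `X`.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chain_sequences(pdb_text):
    """Return {chain_id: one-letter sequence} from PDB-format ATOM records.

    Hypothetical helper: one residue is counted per C-alpha atom.
    """
    sequences = {}
    seen = set()
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # assumption: only the first model is analyzed
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        if line[12:16].strip() != "CA":
            continue  # count each residue once, via its C-alpha atom
        chain = line[21]
        residue_key = (chain, line[22:27])  # residue number + insertion code
        if residue_key in seen:
            continue  # skip alternate locations of the same residue
        seen.add(residue_key)
        letter = THREE_TO_ONE.get(line[17:20], "X")
        sequences.setdefault(chain, []).append(letter)
    return {chain: "".join(letters) for chain, letters in sequences.items()}
```

Calling `chain_sequences(open("complex.pdb").read())` (with a file name of the user's choosing) returns one sequence per chain identifier, which makes it easy to check that the chain identifiers supplied to the server match the intended binding partners.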

ConPlex protocol

Given a protein complex structure as input, ConPlex automatically carries out the processes itemized below (Figure 1). To analyze the evolutionary conservation of the protein complex, ConPlex performs two separate calculations: (i) calculation of residue-specific conservation scores and (ii) identification of interface residues. The results of these two calculations are then integrated to perform more detailed analyses.

Figure 1. A flow chart of the ConPlex calculation processes.

To calculate the sequence conservation scores, the amino-acid sequence of each chain is extracted from the PDB file.
Based on the extracted sequence, homologous sequences are collected from the SWISS-PROT database (11) using PSI-BLAST (12). For identification of homologous sequences, both E-values and sequence identities are considered. The default E-value cutoff is 0.001, and only proteins with a sequence identity over 20% to the query sequence are selected. Cutoff values are adjustable through the advanced options. If too few homologs are identified, additional PSI-BLAST iterations can be carried out. The default minimum number of homologs is five and the default number of PSI-BLAST iterations is one; both values are adjustable through the advanced options.
A multiple sequence alignment (MSA) of the retrieved homologous sequences is constructed by MUSCLE (13), which is one of the most rapid and accurate sequence alignment programs available (14).
Using the MSA, residue-specific conservation scores are calculated by Rate4Site (2), a tool that determines the evolutionary rate of each amino-acid position using the maximum likelihood principle.
ConPlex simultaneously identifies interface residues from the query protein complex structure. The solvent accessible surface area (SASA) of each residue is calculated for both the monomer and complex states (9). Based on the difference in the SASA of each residue upon binding (ΔSASA), surface residues are classified into interface and ROS. If the ΔSASA of a residue is >1 Å², the residue is classified as an interface residue, a threshold also used in other studies (8,15).
Interface residues are sub-classified into core and rim residues based upon the following criterion: if the SASA of a residue after complex formation is <10 Å², it is defined as a core residue (fully buried interface residue); otherwise it is classified as a rim residue (partially buried interface residue). The SASA thresholds for core and rim residues are adjustable through the advanced options.
Evolutionary conservation analyses of the protein complex structure are carried out by combining the identified interface residues and residue-specific conservation scores. The conservation scores of the protein interface (CSV_int), ROS (CSV_ROS), core (CSV_core) and rim (CSV_rim) residues are calculated as described in our previous study (9):
CSV_region = Σ_i (CSV_i × ΔSASA_i) / Σ_i ΔSASA_i
where i indicates a residue in the region of interest, ΔSASA_i is the SASA of residue i that becomes buried upon binding, and CSV_i is the conservation score of residue i. For residues in the ROS, where ΔSASA_i = 0, the SASA of the unbound state is used instead. The score of each defined region (i.e. interface, ROS, rim and core) is thus the weighted average of the conservation scores of all residues participating in that region. The defined interface regions are surface patches composed of the residues whose ΔSASA_i is >1 Å² upon complex formation. The size of the patch is Σ_i ΔSASA_i, where each residue i contributes to the patch size by its ΔSASA_i. In this way, the contribution of each residue is weighted by its relative contribution to the total solvent accessible area of each defined region [see (7,9) for more details]. Note that a smaller CSV means a more conserved region (9).
The ratios of the conservation level between interface and ROS residues [CSV_{ratio(int,ROS)}] and between core and rim residues [CSV_{ratio(core,rim)}] are calculated as follows:
CSV_{ratio(int,ROS)} = CSV_int / CSV_ROS
CSV_{ratio(core,rim)} = CSV_core / CSV_rim
For example, if CSV_{ratio(int,ROS)} is <1, the protein interface is considered evolutionarily more conserved than the ROS. Otherwise, the protein interface is considered less conserved than the ROS.
Finally, the results from the conservation analyses are displayed on the 3D structure. Normalized residue-specific conservation scores are projected onto each residue in the protein complex structure using a 20-color gradient, from red (conserved) to blue (variable). Interface residues of each chain are represented as spheres.
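
The residue classification and region scoring described in the protocol can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the ConPlex source code: the `Residue` class and the function names are hypothetical, the region score is taken to be the SASA-weighted average of residue scores (weights are ΔSASA for interface, core and rim residues, and unbound SASA for ROS residues), and residues with no exposed surface in the unbound chain are assumed to be excluded from every region. The per-residue SASA values and conservation scores are expected to come from external tools.

```python
from dataclasses import dataclass


@dataclass
class Residue:
    sasa_monomer: float  # SASA in the unbound chain (square angstroms)
    sasa_complex: float  # SASA within the complex (square angstroms)
    csv: float           # conservation score (lower = more conserved)

    @property
    def delta_sasa(self) -> float:
        """Surface area buried upon complex formation."""
        return self.sasa_monomer - self.sasa_complex


def classify(res, interface_cutoff=1.0, core_cutoff=10.0):
    """Label a residue as 'core', 'rim', 'ros' or 'buried'."""
    if res.sasa_monomer <= 0.0:
        return "buried"  # assumption: not a surface residue, ignored
    if res.delta_sasa > interface_cutoff:
        return "core" if res.sasa_complex < core_cutoff else "rim"
    return "ros"


def region_csv(residues, region):
    """SASA-weighted average conservation score of one region.

    region is 'interface', 'core', 'rim' or 'ros'.
    """
    numerator = denominator = 0.0
    for res in residues:
        label = classify(res)
        if region == "interface":
            in_region = label in ("core", "rim")
        else:
            in_region = label == region
        if not in_region:
            continue
        # ROS residues bury no area, so their unbound SASA is the weight.
        weight = res.sasa_monomer if region == "ros" else res.delta_sasa
        numerator += res.csv * weight
        denominator += weight
    if denominator == 0.0:
        raise ValueError(f"no residues with non-zero weight in region {region!r}")
    return numerator / denominator


def csv_ratio(residues, region_a, region_b):
    """Ratio of two region scores, e.g. ('interface', 'ros') or ('core', 'rim')."""
    return region_csv(residues, region_a) / region_csv(residues, region_b)
```

With this sketch, `csv_ratio(residues, "interface", "ros") < 1` corresponds to an interface that is more conserved than the rest of the surface, and `csv_ratio(residues, "core", "rim") < 1` to a core that is more conserved than the rim.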

Advanced input options

ConPlex provides a variety of advanced input options, such as the PSI-BLAST E-value cutoff, the sequence identity cutoff, the minimum/maximum number of homologs required, the maximum PSI-BLAST iteration number and the core/rim SASA cutoff. Most advanced input options concern the homologous sequence search, since retrieving appropriate homologous sequences is critical for calculating conservation scores correctly. These options may need to be adjusted to obtain enough suitable homologous sequences for reliable results.
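
To make the interplay of these options concrete, the following Python sketch shows one way the homolog selection logic could work. It is an assumption-laden illustration rather than the server's code: the `Hit` class, the function names and the caller-supplied `search` callable (standing in for a PSI-BLAST run with a given number of iterations) are hypothetical, the E-value test is assumed to be inclusive, and the default of three maximum iterations is chosen only for the example. The E-value cutoff (0.001), identity cutoff (20%), minimum number of homologs (five) and starting iteration number (one) are the defaults stated in the text.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    accession: str
    e_value: float
    identity: float  # percent sequence identity to the query


def select_homologs(hits, e_cutoff=0.001, identity_cutoff=20.0):
    """Keep hits that pass both the E-value and the identity cutoff."""
    return [
        hit for hit in hits
        if hit.e_value <= e_cutoff and hit.identity > identity_cutoff
    ]


def collect_homologs(search, min_homologs=5, start_iterations=1,
                     max_iterations=3, e_cutoff=0.001, identity_cutoff=20.0):
    """Increase the iteration count until enough homologs are selected.

    search(n) is a hypothetical callable returning the hits of a search
    run with n iterations. Returns (selected_hits, iterations_used).
    """
    selected = []
    used = start_iterations
    for used in range(start_iterations, max_iterations + 1):
        selected = select_homologs(search(used), e_cutoff, identity_cutoff)
        if len(selected) >= min_homologs:
            break
    return selected, used
```

If the returned list is still shorter than `min_homologs` after the last allowed iteration, loosening the cutoffs through the advanced options is the remaining way to enlarge the homolog set.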

OUTPUT

ConPlex provides a variety of output formats for the evolutionary conservation analyses of the protein complex (Figure 2). The main results page is divided into four sections: Output File, Intermediate File, Result Analysis and Structural Visualization (Figure 2A).

Figure 2. ConPlex server output formats. (A) The main output page of ConPlex. (B) Conservation scores of interface, ROS, core and rim residues of the protein complex structure. (C) Intermediate calculation results, including homologous sequences, multiple sequence alignment and residue-specific conservation scores. (D) Visualization script for structure viewer programs. (E) Color-coded sequence conservation analyses of the primary amino-acid sequence. (F) List of the interface residues and their conservation scores.

Output file

In the ‘Output File’ section, users can download the conservation scores of the protein complexes (Figure 2B). The file contains the conservation scores of the interface (CSV_int) and the ROS (CSV_ROS), and the CSV ratio of each binding partner [CSV_{ratio(int, ROS)}]. It also includes the conservation scores of core (CSV_core) and rim residues (CSV_rim), as well as their ratio [CSV_{ratio(core, rim)}]. For example, a CSV_{ratio(int, ROS)} score <1 implies that the interface is more conserved than the ROS (9). If the user uploaded an archive containing multiple docking decoys, the output file contains multiple lines of results, one line per docking model. The file is TAB-delimited so that it can be easily analyzed in a spreadsheet program.
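
Because the output file is TAB-delimited, it can also be processed with a short script. The Python sketch below keeps the docking models whose interface is more conserved than the ROS (ratio <1) and sorts them by that ratio. The column names `model` and `CSV_ratio_int_ROS`, and the presence of a header row, are assumptions made for this example only; the actual layout should be checked against the ‘Output Example’ page before use.

```python
import csv
import io


def conserved_models(tsv_text, name_column="model",
                     ratio_column="CSV_ratio_int_ROS"):
    """Return (model, ratio) pairs with ratio < 1, most conserved first.

    The column names are hypothetical placeholders for the real header.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = [(row[name_column], float(row[ratio_column])) for row in reader]
    kept = [row for row in rows if row[1] < 1.0]
    return sorted(kept, key=lambda row: row[1])


# Made-up example content, for illustration only.
example = (
    "model\tCSV_ratio_int_ROS\n"
    "decoy1\t1.20\n"
    "decoy2\t0.75\n"
    "decoy3\t0.90\n"
)
print(conserved_models(example))
```

A ratio-based filter of this kind mirrors the use of interface versus ROS conservation to discard docking artifacts (9); the threshold of 1 is the one given in the text.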

Intermediate files

In the ‘Intermediate Files’ section, intermediate calculation results are provided (Figure 2C). Based on each sequence extracted from the query protein complex, ConPlex collects homologous sequences using PSI-BLAST (12), carries out a multiple sequence alignment using MUSCLE (13), and calculates the residue-specific evolutionary score using Rate4Site (2). Users can download these three intermediate calculation results and carry out more detailed analyses of each calculation in order to adjust the default setting in the advanced input options.

Result analysis and visualization scripts

In the ‘Result Analysis’ section, ConPlex provides various results for further analysis (Figure 2A). Both the CSV_{ratio(int, ROS)} and CSV_{ratio(core, rim)} scores are provided for the conservation analyses of interface/ROS and core/rim residues. If the interface is more conserved than the ROS, or the core residues are more conserved than the rim residues, the value is shown in red. The analysis also reports the calculated protein interface sizes (SASA), since interface size is an important criterion for distinguishing true protein–protein interactions from non-specific crystal contacts (16–18).

ConPlex also provides convenient visualization scripts so that users can visualize the results of the analyses on their local computers (Figure 2D). By default, the complex structure is displayed in the webpage (‘Structural Visualization’ section). For added flexibility, users can download visualization scripts and analyze the results on their own computer. Visualization scripts are provided for two structure viewer programs: Jmol (http://www.jmol.org/) and PyMOL (http://pymol.sourceforge.net/). An example of applying the PyMOL script is shown in Figure 2D, in which the residue-specific conservation scores and interface residues of each chain are automatically displayed (see http://sbi.postech.ac.kr/ConPlex/, under the ‘Output Example’ menu, for detailed instructions). In this display, the degree of conservation is represented by a red-to-blue color gradient. A deeper red color indicates a more conserved residue and a deeper blue color represents a more variable residue. Interface residues are represented as spheres.

In the ‘Info’ menu, under the ‘Result Analysis’ section, users can open a popup window that shows detailed information on the residue-specific conservation scores for each chain (Figure 2E). At the top of the popup window, users can download the list of interface residues which includes the residue number, chain identifier, conservation score (CSV) and ΔSASA upon complex formation for each interface residue (Figure 2F).

Structural visualization

Under the ‘Structural Visualization’ section, the 3D complex structure displayed is color-graded according to the residue-specific conservation scores and visualized using Jmol (Figure 3). As indicated by the scale-bar, the degree of conservation is represented by a red-to-blue color gradient.
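
The mapping from conservation scores to a 20-step red-to-blue gradient can be sketched in Python as below. This is only an illustration of the idea: the exact palette and binning used by ConPlex are not specified here, so the sketch assumes a linear interpolation between pure red and pure blue and equal-width bins between the lowest and highest score, with lower scores (more conserved) mapped toward red. The function names are hypothetical.

```python
def gradient(n_colors=20):
    """RGB tuples interpolated linearly from red to blue (n_colors >= 2)."""
    return [
        (1 - k / (n_colors - 1), 0.0, k / (n_colors - 1))
        for k in range(n_colors)
    ]


def color_bins(scores, n_colors=20):
    """Assign each score a bin index: 0 = most conserved (red end)."""
    lowest, highest = min(scores), max(scores)
    if highest == lowest:
        return [0] * len(scores)  # all scores equal: use a single color
    span = highest - lowest
    return [
        min(int((score - lowest) / span * n_colors), n_colors - 1)
        for score in scores
    ]
```

For a list of residue scores, `gradient()[index]` for each index in `color_bins(scores)` yields the color of that residue, which could then be written into a viewer script of the user's choice.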

Figure 3. Display options for complex structure analysis. The Rho–RhoGAP complex (PDB id: 1TX4) is used as an example. (A) Default display format. Users have the option to toggle conservation colors (B), interface residues (C), and each chain in structures with two binding partners (D). (E) Users can rotate and enlarge/reduce the structure in the display. (F) Clicking a residue in the sequence window highlights the corresponding residue on the structure with a yellow outline.

In Figure 3, various display options for the structural analysis are shown using the Rho–RhoGAP complex (PDB id: 1TX4) as an example. Rho is a small G protein that acts as a molecular switch regulating phosphorylation pathways of cytoskeleton formation and cell proliferation. Through the interaction with RhoGAP, the active GTP-bound form of Rho is converted to the inactive GDP-bound form by GTP hydrolysis (19). This Rho–RhoGAP interaction plays a crucial role in regulating Rho-mediated signaling pathways and is evolutionarily conserved across many species. ConPlex successfully visualizes the highly conserved Rho–RhoGAP interface and should be useful to guide further analysis of the interface.

In the structural visualization, users can toggle the red-to-blue conservation color representation with the ‘Conservation Color’ option (Figure 3B). When conservation color is disabled, each chain of the complex is shown in a distinct color so that the complex components are easy to tell apart. In this example, RhoGAP (chain A) and Rho (chain B) are shown in light blue and light green, respectively. The ‘Interface Residues’ option toggles the display of the interface residues between the sphere and ribbon representations (Figure 3C). Finally, the ‘Chain’ option allows each chain to be shown or hidden (Figure 3D). Users can rotate and resize the protein structure to analyze the protein interface patch in detail (Figure 3E).

The protein sequences in the pop-up window shown in Figure 2E are linked to the structure display: if users click on a residue of interest in the primary sequence, the corresponding residue is highlighted in the structure with a yellow outline (Figure 3F). Using the display options available in ConPlex, users can perform a detailed analysis of the complex structure and gain broader insights into the interactions.

PROGRAMMATIC INTERFACE

ConPlex provides a programmatic interface, built on the soaplib library and described in Web Services Description Language (WSDL). Users can automatically analyze large numbers of complexes either by using the script we provide or by modifying the script for their specific purpose. For advanced users who want to build their own ConPlex web-service client in other development environments that support a SOAP library, such as C++, Java or Ruby, we provide a standard WSDL interface file in XML-Cascade. See the ‘Programmatic Interface’ section of the main webpage for more detailed information.

PRE-CALCULATED LIBRARY

ConPlex offers a library of pre-calculated results for PDB binary complexes. Currently ConPlex stores pre-calculated results for 3376 PDB complexes (20) and provides them under the ‘Pre-calculated Library’ menu. The current results were calculated with default options. Moreover, the server stores the results from users’ PDB inputs so that a result can be returned faster when it is found in the job history. When users submit a job with a PDB identifier, ConPlex first searches the pre-calculated library and the job history for the same PDB complex, avoiding redundant calculations. ConPlex starts the whole calculation process only when the submitted job is found to be new. The pre-calculated library will be expanded as calculation results from users’ inputs accumulate.

CONCLUSIONS

Evolutionary conservation has been widely applied to analyze protein sequences, as well as protein tertiary and quaternary structures. Conservation analyses have been shown to successfully identify functionally and structurally important residues (2,21), and several tools and web servers, such as Rate4Site (2), ConSeq (4) and ConSurf (5,6), have been introduced to perform these calculations. However, a user-friendly web server for evolutionary conservation analyses of protein quaternary structures has not been available.

ConPlex is the first web application for evolutionary conservation analyses of protein–protein complex structures. It offers a one-stop calculation of residue-specific conservation scores within the quaternary structure by automatically collecting homologous sequences of the query protein chains and identifying biologically important regions of the complex structures. Additionally, ConPlex provides a variety of display options that enable users to easily and thoroughly analyze their protein complex of interest. Furthermore, users are provided with the results of each intermediate calculation step and with visualization scripts for further analysis. Thus, ConPlex offers a convenient platform for biologists to perform evolutionary conservation analyses of protein complex structures.

FUNDING

National Research Foundation grant funded by the Korea government (MEST) (20090084155, 20090091503); FPR08B1-300 21st frontier functional Proteomics Project from the Korean Ministry of Education, Science and Technology; World Class University program (R31-2008-000-10100-0). Funding for open access charge: World Class University program (R31-2008-000-10100-0).

Conflict of interest statement. None declared.

ACKNOWLEDGEMENTS

The authors thank the SBI lab members for critical discussion and comments on the manuscript.

REFERENCES

1. Magliery, Regan. Sequence variation in ligand binding sites in proteins. BMC Bioinformatics, 2005, p. 240.

2. Pupko, Bell, Mayrose, Glaser, Ben-Tal. Rate4Site: an algorithmic tool for the identification of functional regions in proteins by surface mapping of evolutionary determinants within their homologues. Bioinformatics, 2002, Suppl. 1, pp. S71–S77.

3. Bradford, Westhead. Improved prediction of protein-protein binding sites using a support vector machines approach. Bioinformatics, 2005, pp. 1487–1494.

4. Berezin, Glaser, Rosenberg, Paz, Pupko, Fariselli, Casadio, Ben-Tal. ConSeq: the identification of functionally and structurally important residues in protein sequences. Bioinformatics, 2004, pp. 1322–1324.

5. Glaser, Pupko, Paz, Bell, Bechor-Shental, Martz, Ben-Tal. ConSurf: identification of functional regions in proteins by surface-mapping of phylogenetic information. Bioinformatics, 2003, pp. 163–164.

6. Landau, Mayrose, Rosenberg, Glaser, Martz, Pupko, Ben-Tal. ConSurf 2005: the projection of evolutionary conservation scores of residues on protein structures. Nucleic Acids Res., 2005, pp. W299–W302.

7. Elcock, McCammon. Identification of protein oligomerization states by analysis of interface conservation. Proc. Natl Acad. Sci. USA, 2001, pp. 2990–2994.

8. Valdar, Thornton. Conservation helps to identify biologically relevant crystal contacts. J. Mol. Biol., 2001, vol. 313, pp. 399–416.

9. Choi, Yang, Choi, Ryu, Kim. Evolutionary conservation in multiple faces of protein interaction. Proteins, 2009.

10. Guharoy, Chakrabarti. Conservation and relative importance of residues across protein-protein interfaces. Proc. Natl Acad. Sci. USA, 2005, vol. 102, pp. 15447–15452.

11. Boeckmann, Bairoch, Apweiler, Blatter, Estreicher, Gasteiger, Martin, Michoud, O'Donovan, Phan, et al. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res., 2003, pp. 365–370.

12. Altschul, Madden, Schaffer, Zhang, Miller, Lipman. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 1997, pp. 3389–3402.

13. Edgar. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res., 2004, pp. 1792–1797.

14. Edgar, Batzoglou. Multiple sequence alignment. Curr. Opin. Struct. Biol., 2006, pp. 368–373.

15. Chakrabarti, Janin. Dissecting protein-protein recognition sites. Proteins, 2002, pp. 334–343.

16. Bradford, Needham, Bulpitt, Westhead. Insights into protein-protein interfaces using a Bayesian network prediction method. J. Mol. Biol., 2006, vol. 362, pp. 365–386.

17. Janin. Specific versus non-specific contacts in protein crystals. Nat. Struct. Biol., 1997, pp. 973–974.

18. Ponstingl, Henrick, Thornton. Discriminating between homodimeric and monomeric proteins in the crystalline state. Proteins, 2000.

19. Rittinger, Walker, Eccleston, Smerdon, Gamblin. Structure at 1.65 Å of RhoA and its GTPase-activating protein in complex with a transition-state analogue. Nature, 1997, vol. 389, pp. 758–762.

20. Mintz, Shulman-Peleg, Wolfson, Nussinov. Generation and analysis of a protein-protein interface data set with similar chemical and spatial patterns of interactions. Proteins, 2005.

21. Mayrose, Graur, Ben-Tal, Pupko. Comparison of site-specific rate-inference methods for protein sequences: empirical Bayesian methods are superior. Mol. Biol. Evol., 2004, pp. 1781–1791.

Author notes

The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.

© The Author(s) 2010. Published by Oxford University Press.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
