ProCMD: a database and 3D web resource for protein C mutants

Copyright © 2007 D'Ursi et al; licensee BioMed Central Ltd.

1 Department of Science and Biomedical Technologies, University of Milano, Italy
2 Institute of Biomedical Technologies, National Research Council, Segrate (Mi), Italy
3 Hematology and Thrombosis Unit, DMCO - University of Milano and Az. Ospedaliera San Paolo, Italy

Corresponding author.
Pasqualina D'Ursi: pasqualina.dursi/at/itb.cnr.it; Francesca Marino: silefra/at/libero.it; Andrea Caprera: andrea.caprera/at/itb.cnr.it; Luciano Milanesi: luciano.milanesi/at/itb.cnr.it; Elena M Faioni: elena.faioni/at/unimi.it; Ermanna Rovida: ermanna.rovida/at/itb.cnr.it

Supplement: Italian Society of Bioinformatics (BITS): Annual Meeting 2006. Rita Casadio, Manuela Helmer-Citterich, Graziano Pesole
Conference: Italian Society of Bioinformatics (BITS): Annual Meeting 2006, 28–29 April 2006, Bologna, Italy

This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background
Activated Protein C (ProC) is an anticoagulant plasma serine protease which also plays an important role in controlling inflammation and cell proliferation. Several mutations of the gene are associated with phenotypic functional deficiency of protein C, and with the risk of developing venous thrombosis. Structure prediction and computational analysis of the mutants have proven to be a valuable aid in understanding the molecular aspects of clinical thrombophilia.

Results
We have built a specialized relational database and a search tool for natural mutants of protein C. It contains 195 entries that include 182 missense and 13 stop mutations.
A menu-driven search engine allows the user to retrieve the information stored for each variant, which includes genetic as well as structural data and a multiple alignment highlighting the substituted position. Molecular models of variants can be visualized with interactive tools; PDB coordinates of the models are also available for further analysis. Furthermore, an automatic modelling interface allows the user to generate multiple alignments and 3D models of new variants.

Conclusion
ProCMD is an up-to-date interactive mutant database that integrates phenotypical descriptions with functional and structural data obtained by computational approaches. It will be useful in the research and clinical fields to help elucidate the chain of events leading from a molecular defect to the related disease. It is available for academics at the URL http://www.itb.cnr.it/procmd/.

Background

Activated protein C (APC) is a vitamin K-dependent serine protease that plays a central role in the regulation of blood coagulation. It is the key component of an anticoagulant system that provides an essential mechanism for preventing thrombosis and controlling the inflammatory response. APC binds to its cofactor protein S, and the complex inactivates the two cofactors involved in the clotting cascade, factors Va and VIIIa, leading to efficient inhibition of the coagulation process. Furthermore, APC has direct and indirect anti-inflammatory actions. It prevents leukocyte rolling, tissue factor exposure and tumour necrosis factor production by monocytes, thrombin-mediated inflammatory actions and apoptosis of endothelial cells [1]. The anti-thrombotic and anti-inflammatory actions of APC have been therapeutically exploited in severe sepsis [2].

Protein C, the zymogen of APC, is synthesized as a single-chain precursor containing an amino-terminal leader sequence followed by a propeptide, both of which are cleaved upon secretion.
It circulates in plasma mostly as a two-chain zymogen, obtained by removal of the dipeptide Lys198-Arg199, in which the light chain (21 kDa) and the heavy chain (41 kDa) are linked by a disulphide bridge between Cys183 and Cys320. The zymogen is activated by thrombin through the proteolytic cleavage of a 12 amino acid peptide (Asp200-Arg211) [3]. Activation occurs on the endothelium of blood vessels by the thrombin-thrombomodulin complex [4], and the process is enhanced by the endothelial cell protein C receptor (EPCR) [5].

Protein C has a multi-domain structure: the light chain contains a γ-carboxyglutamic acid (Gla)-rich membrane binding domain and two epidermal growth factor (EGF)-like modules, while the heavy chain has the form of a typical trypsin-like serine protease domain [3]. The structure of the Gla-domainless APC has been solved by X-ray crystallography [6], while the structure of the Gla domain is available in the complex with the endothelial protein C receptor [7]. Protein C shares homology with other vitamin K-dependent coagulation proteins as a result of a common evolutionary pathway.

Mutations in the gene have been found in patients with protein C deficiency (OMIM 176860). In the homozygous form, the disorder is associated with the development of purpura fulminans and severe recurrent thrombotic events; in the heterozygous form, it is responsible for an increased risk of venous thromboembolism in early adulthood [8,9]. Phenotypically, two distinct types of protein C deficiency are recognized: type I deficiency, the most common, is characterized by a parallel reduction in protein C concentration and function (measured in plasma by amidolytic or anticoagulant methods); type II deficiency is identified by normal or increased concentration and reduced function [10].
A large number of the mutations that contribute to protein C deficiency fall in the structurally solved domains and are thus amenable to homology modelling and computational analysis. Structural models can be useful in the research and clinical fields to elucidate how a mutation may interfere with enzymatic activity, ligand binding and cofactor interaction, and to relate the effect to patient phenotype.

In the last published database of mutations of the protein C gene [10,11], 161 different mutations (corresponding to 351 entries) are reported. Additional variants can be obtained from other sources (the Human Gene Mutation Database [12,13] and the Swiss-Prot Variant Page [14,15]). However, to our knowledge, none of the available collections includes data on the structural-functional interpretation of the reported variants.

We describe here an updated, 3D-structure oriented database of protein C that associates clinical and phenotypical descriptions with functional and structural data obtained by computational approaches. It includes the description of 21 new variants that we identified and analyzed in previous work [16,17]. The database is integrated with an interactive search interface and with tools for structure visualization and mutant modelling.

Construction and content

Dataset

We collected a total of 195 naturally occurring mutations in the coding region of the protein C gene, comprising 182 missense and 13 stop mutations. Of these, 21 variants were identified by our group through the screening of the protein C gene of 42 patients with a phenotypic functional deficiency of protein C [17]. The remainder of the dataset consists of previously reported missense and nonsense variants, obtained from three sources: the database of mutations of the protein C gene [10,11], the Human Gene Mutation Database [12,13] and the Swiss-Prot Variant Page [14,15]. Additional variants, not included in the above databases, were obtained from the literature.
The entries were manually extracted and filtered to avoid duplicates. General information associated with each entry was derived from literature sources. A multiple alignment obtained with CLUSTALW [18], using a set of orthologous and paralogous sequences, was also associated with each entry.

Variants with substitutions in the structurally solved regions of protein C were modelled starting from the X-ray coordinates (PDB entry 1AUT). Molecular modelling of the 21 variants identified by our group was achieved by residue replacement using InsightII (Accelrys Inc., San Diego, CA, USA). The lowest-energy rotamer was chosen as the starting side chain position, followed by an energy minimization consisting of 500 steps of steepest descent with the backbone kept fixed and then 500 steps of conjugate gradient on the whole structure. A detailed computational analysis of these variants, including electrostatic potential calculations, had been carried out previously, and the results are also stored in the database.

All other variants were modelled using an automatic approach based on an adapted Python script of Modeller [19]. The script replaces the side chain of the mutated residue in the PDB file (1AUT) and optimizes the conformation by energy minimization and molecular dynamics.

For each modelled variant, a set of 3D molecular representations was constructed using the programs PyMOL (PyMOL Molecular Graphics System, DeLano Scientific, San Carlos, CA, USA) and MOLSCRIPT [20] to obtain fixed images and VRML (Virtual Reality Modelling Language) files. VRML files can be viewed through a browser in a dynamic, interactive way using a player such as CORTONA2 [21].

Database and User Interface

The data are stored in a relational database managed by a MySQL Database Management System [22].
The database currently contains four tables: a "Mutations" table with all the single point mutation entries; a "Variants" table with clinical comments and, if available, literature data concerning each mutant; and two other tables with structural information about domains, chains, secondary structures, molecular modelling results and structure-function relationships. We created a web-based interface to help users search for specific information, browse the entire database or visualize 2D and 3D images of the variants. The web site is also a source of documentation about protein C and a point of access to external resources and databases related to protein C. The user interface has been built with PHP scripts [23] on an Apache Web Server [24].

Entry description

All entries in the database are labelled by a unique identifier and may include the following fields:

- sequence position of the mutated residue, numbered according to UniProtKB/Swiss-Prot entry P04070, Foster's codon numbering [25] and, when the substituted residue is included in the X-ray structure of protein C, the chymotrypsin numbering used in PDB entry 1AUT;
- wild type and mutated residues;
- gene localization;
- clinical or laboratory phenotype data as obtained from other database reports or from the literature;
- links to the PUBMED database;
- cross-references to other databases reporting the mutation.

There is also a structural information section that assigns the mutated residue to its specific chain, secondary structure and domain. A multiple alignment of homologous sequences, a 3D gallery of structural images and the PDB coordinates of the mutants are also present. Results of computational analysis are collected in a 3D-notes page that includes considerations on the physico-chemical properties of the mutant residue compared to the wild type (i.e. charge, hydrophobicity, solvent accessibility) and a list of hydrogen bonds and hydrophobic interactions.
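The table layout and entry fields described above could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses an in-memory SQLite database in place of MySQL so that it is self-contained, it covers only the two tables that the text names ("Mutations" and "Variants"), and every column name, helper function and example value is hypothetical rather than taken from the actual ProCMD schema.

```python
import sqlite3

# Hypothetical schema: only the table names "Mutations" and "Variants"
# come from the text; all column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE Mutations (
    entry_id    TEXT PRIMARY KEY,  -- unique identifier of the entry
    uniprot_pos INTEGER NOT NULL,  -- numbering of UniProtKB/Swiss-Prot P04070
    foster_pos  INTEGER,           -- Foster's codon numbering
    chymo_pos   TEXT,              -- chymotrypsin numbering (PDB 1AUT), if in the structure
    wild_type   TEXT NOT NULL,     -- wild-type residue
    mutated     TEXT NOT NULL      -- substituted residue
);
CREATE TABLE Variants (
    entry_id    TEXT NOT NULL REFERENCES Mutations(entry_id),
    phenotype   TEXT,              -- clinical or laboratory comment
    pubmed_id   TEXT               -- literature link, if available
);
"""


def create_database() -> sqlite3.Connection:
    """Create an empty in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_entry(conn, entry_id, uniprot_pos, wild_type, mutated, phenotype=None):
    """Insert one mutation together with its clinical comment."""
    conn.execute(
        "INSERT INTO Mutations (entry_id, uniprot_pos, wild_type, mutated) "
        "VALUES (?, ?, ?, ?)",
        (entry_id, uniprot_pos, wild_type, mutated),
    )
    conn.execute(
        "INSERT INTO Variants (entry_id, phenotype) VALUES (?, ?)",
        (entry_id, phenotype),
    )
    conn.commit()


def lookup(conn, entry_id):
    """Return (id, position, wild type, mutant, phenotype) or None."""
    return conn.execute(
        "SELECT m.entry_id, m.uniprot_pos, m.wild_type, m.mutated, v.phenotype "
        "FROM Mutations m LEFT JOIN Variants v ON v.entry_id = m.entry_id "
        "WHERE m.entry_id = ?",
        (entry_id,),
    ).fetchone()
```

Keeping the clinical comments in a separate table keyed by the entry identifier mirrors the split between mutation data and variant annotations described in the text; how the real database links its tables is not stated in the article.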
Additional information resulting from further computational studies, such as electrostatic potential calculations and the prediction of structural-functional effects of the mutation, is associated with some entries.

Utility and Discussion

The ProCMD database aims to provide a summary of the sequence and structure information on variants with substitutions in the coding region of the protein C gene. The database is interfaced with a fully interactive website, through which the user can retrieve entries of interest, find cross-references and visualize structural models with interactive tools. The home page of the ProCMD web tool is shown in Figure 1.
Data search and retrieval

A query page allows the user to retrieve entries by the sequence position of a mutated residue, by amino acid substitution and by domain localization. The results of the query are listed in a table showing the amino acid substitution and the sequence position of each entry, with links to detail pages summarizing all the associated data. An example of the output is shown in Figure 2.
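The three search criteria of the query page can be illustrated with a small filter over in-memory records. This is a minimal sketch, not the PHP/MySQL implementation used by the site: the `Entry` class, the `search` function, the `R>W` notation for substitutions and the example records are all hypothetical, and the records are made-up placeholders rather than ProCMD entries.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Entry:
    position: int    # sequence position of the mutated residue
    wild_type: str   # one-letter code of the wild-type residue
    mutated: str     # one-letter code of the substituted residue
    domain: str      # domain in which the residue is located

    @property
    def substitution(self) -> str:
        # Illustrative notation only, e.g. "R>W"
        return f"{self.wild_type}>{self.mutated}"


def search(
    entries: Iterable[Entry],
    position: Optional[int] = None,
    substitution: Optional[str] = None,
    domain: Optional[str] = None,
) -> List[Entry]:
    """Return the entries matching every criterion that was supplied."""
    hits = []
    for entry in entries:
        if position is not None and entry.position != position:
            continue
        if substitution is not None and entry.substitution != substitution:
            continue
        if domain is not None and entry.domain.lower() != domain.lower():
            continue
        hits.append(entry)
    return hits


# Made-up placeholder records, for demonstration only.
EXAMPLE_ENTRIES = [
    Entry(10, "R", "W", "Gla"),
    Entry(10, "R", "Q", "Gla"),
    Entry(250, "G", "S", "serine protease"),
]
```

For instance, `search(EXAMPLE_ENTRIES, position=10)` returns both placeholder substitutions at position 10, and adding `substitution="R>Q"` narrows the result to a single record. Criteria left as `None` are simply ignored, so the same function also serves for browsing the whole list.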
Results of more detailed computational analysis and interpretation of the effect of the mutation, when available, appear in a 3D-notes page with the corresponding images (Figure 3).
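The 3D-notes page compares physico-chemical properties such as charge and hydrophobicity between the wild-type and mutant residues. The sketch below shows one simple way such a comparison might be computed. It assumes the Kyte-Doolittle hydropathy scale and formal side chain charges at neutral pH (histidine treated as neutral); the article does not say which scale or charge model ProCMD uses, and the function name `compare_residues` is hypothetical.

```python
# Kyte-Doolittle hydropathy index per residue (one-letter codes).
# Assumption: the article does not specify the scale used by ProCMD.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Formal side chain charge at neutral pH; residues not listed are neutral.
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def compare_residues(wild_type: str, mutated: str) -> dict:
    """Return the change in charge and hydropathy caused by a substitution."""
    for residue in (wild_type, mutated):
        if residue not in KYTE_DOOLITTLE:
            raise ValueError(f"unknown residue code: {residue!r}")
    return {
        # Positive value: the mutant side chain is more positively charged.
        "charge_change": CHARGE.get(mutated, 0) - CHARGE.get(wild_type, 0),
        # Positive value: the mutant side chain is more hydrophobic.
        "hydropathy_change": round(
            KYTE_DOOLITTLE[mutated] - KYTE_DOOLITTLE[wild_type], 1
        ),
    }
```

Under these assumptions, replacing an arginine with a tryptophan removes one positive charge and raises the hydropathy index by 3.6. Solvent accessibility, hydrogen bonds and hydrophobic contacts depend on the 3D model and are not covered by this residue-level sketch.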
Taken together, the data related to each variant are useful for understanding the relationship between mutation and phenotype, and help to elucidate the role of specific residues in protein function.

Analysis tools for new mutations

For user-defined missense mutants not present in the database, the site provides tools for evaluating residue conservation and for homology modelling of the variant. Selecting "Multiple alignments" from the home page (Figure 1) [...] Molecular models can be obtained, if the residue falls within the 3D structure, by the same automatic procedure based on the Modeller script [19] that was used to prepare the database entries. Models can be visualized, and 3D coordinates can be downloaded for further studies. Because the models are obtained in a completely automatic way, the user is cautioned that the results may be inaccurate. Careful inspection of the outcome is therefore recommended.

Data submission

ProCMD supports online submission of mutation data. New mutations in the coding region of the protein C gene can be sent by filling out the fields of the online submission form. After submission, a text e-mail is automatically generated by the server and sent to the authors. The database curators verify the submitted data and incorporate them after annotation according to the database format.

Conclusion

This database provides a tool, complementary to other mutant collections of protein C, that is especially devoted to structural analysis and interpretation. A great effort has been put into the production of 3D images associated with the molecular models and of input files for interactive viewers, which visualize the models in 3D space. The availability of structural models can be useful in the research and clinical fields both to elucidate how a mutation may interfere with enzymatic activity, ligand binding and cofactor interaction, and to relate the effect to patient phenotype.
The present resource can help to predict the effect of a mutation, to clarify the role of specific residues in protein function and, hopefully, to give hints for the rational design of specific variants of protein C for therapeutic use.

Availability and requirements

The database is maintained on the server of the Institute of Biomedical Technologies - National Research Council, Segrate (MI), Italy, and is available at the following URL: http://www.itb.cnr.it/procmd

Abbreviations

ProCMD: protein C mutation database; PDB: Protein Data Bank; APC: Activated Protein C; HGMD: Human Gene Mutation Database; VRML: Virtual Reality Modelling Language.

Authors' contributions

PD conceived and designed the database and drafted the manuscript. FM carried out the data collection and annotation, implemented the SQL database and the web server pages and drafted the manuscript. AC structured the database and supervised the implementation. LM contributed to manuscript revision. EMF contributed to data communication and critically revised the manuscript. ER coordinated and supervised the project and prepared the manuscript. All authors read and approved the final manuscript.

Acknowledgements

We are indebted to Chiara Bishop for the critical reading of the manuscript. This work was supported by European Project BioinfoGRID (Bioinformatics Application for Life Science).

This article has been published as part of BMC Bioinformatics Volume 8, Supplement 1, 2007: Italian Society of Bioinformatics (BITS): Annual Meeting 2006. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/8?issue=S1.