Nucleic Acids Res. Jan 2013; 41(D1): D306–D311.
Published online Nov 27, 2012. doi:  10.1093/nar/gks1230
PMCID: PMC3531129

PTMcode: a database of known and predicted functional associations between post-translational modifications in proteins

Abstract

Post-translational modifications (PTMs) are involved in the regulation and structural stabilization of eukaryotic proteins. The combination of individual PTM states is key to modulating cellular functions, as has become evident in a few well-studied proteins. This combinatorial setting, dubbed the PTM code, has been proposed to extend to whole proteomes in eukaryotes. Although we are still far from deciphering such a complex language, thousands of protein PTM sites are being mapped by high-throughput technologies, thus providing sufficient data for comparative analysis. PTMcode (http://ptmcode.embl.de) aims to compile known and predicted PTM associations to provide a framework that enables hypothesis-driven experimental or computational analysis at various scales. In its first release, PTMcode provides functional associations between PTMs of 13 different types within proteins in 8 eukaryotes. They are based on five evidence channels: a literature survey, residue co-evolution, structural proximity, PTMs at the same residue and location within protein regions highly enriched in PTMs (hotspots). PTMcode is presented as a protein-based searchable database with an interactive web interface providing the context of the co-regulation of nearly 75 000 residues in >10 000 proteins.

INTRODUCTION

Most eukaryotic proteins are targeted by a multitude of post-translational modifications (PTMs) that fine-tune their function as a rapid response to stimuli without involvement of genomic, transcriptomic or translational regulation. These PTMs are present in various types and combinations, and their on–off status can vary during the lifetime of a protein, thereby fine-tuning its function, localization and interaction with other molecules. Specific mechanisms have been associated with particular pairs of PTM types, such as the competition for serine and threonine residues between phosphorylation and O-linked glycosylation (1,2) or the promotion of ubiquitination by phosphorylation that leads to protein degradation (3). There are also extensive studies describing the regulation of individual proteins by PTM interplay, such as the tumor suppressor p53 (4) or the well-known compilation of molecular switches that occur within histone tails (5). Together, these suggest the existence of a ‘PTM code’ (6–9) based on the presence and association of several PTMs that lead to a particular function (10). Most of the studies trying to decipher this molecular barcode are based on single proteins and few PTM types (4,11); however, thanks to recent technological advances in mass spectrometry-based detection methods (12), an increasing amount of data about protein modifications is becoming available, increasing the diversity and abundance of reported PTMs, although the catalogue is far from complete. Yet, deciphering a potential PTM code remains a difficult challenge, as the collection of the parts list, the PTM repertoire, is only a first step, and individual PTMs have to be functionally associated, somewhat analogous to the delineation of proteomes whereby individual proteins are involved in complex protein–protein interactions.

Only recently, several studies have explored the association of several PTM types in whole proteomes based on, for instance, the study of acetylation status by the systematic perturbation of kinases (13), structural changes in modified residues in response to in-silico perturbations of other modification sites (14), the competition of several PTMs for a residue (15), the presence of clusters of PTMs within the protein sequence (16) or the co-evolution of the modified residues, where pairs of particular PTM types were associated with specific protein localizations, functions and protein functional units such as short linear motifs and globular domains (17). The latter was the first attempt to characterize the interplay between a large number of PTM types in several eukaryotes. Based on these recent independent developments, one should be able to derive coherent functional associations between modified residues, thus adding value to information about individual modifications stored in classical protein resources (18) or PTM-specific databases (19–22), again analogous to protein sequence and protein-interaction databases. A functional association between two PTMs should be seen here as a broad concept that not only stands for a physical interaction or a competition (PTM crosstalk) but also covers broader associations, e.g. PTMs that are not present in the protein at the same time but are involved in the same protein function.

The PTMcode database combines results from several large-scale analyses that identify known and predicted functionally associated PTMs of 13 different types from 8 eukaryotes. As the first large-scale public database providing this kind of information, we believe that PTMcode will enable both computational and molecular biology laboratories to further PTM research in many ways and at various scales, ranging from individual mechanistic studies to global network analyses towards a global PTM code.

RESULTS

Available PTMs and their functional associations

We extracted all experimentally validated PTMs available at UniProt (18), PHOSIDA (19), PhosphoSite (20), PhosphoELM (21), O-GlycBase (22), dbPTM (23) and HPRD (24) and pre-processed them to avoid redundancy and non-matching modifications. First, protein IDs from the sources were converted to a reference ID, taken from the STRING database (25). Second, the modified residues were required to match the amino acids in a reference sequence, taken from the eggNOG database (26), which in turn fetches the longest protein isoform from the Ensembl database (27). In total, we integrated 136 258 experimentally determined, non-redundant PTMs of 13 different types in 25 765 proteins of 8 different eukaryotes (Homo sapiens, Mus musculus, Rattus norvegicus, Bos taurus, Gallus gallus, Drosophila melanogaster, Caenorhabditis elegans and Saccharomyces cerevisiae).
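The residue-matching and redundancy steps described above can be sketched as follows. This is only a minimal illustration: the record layout, the function name filter_ptms and the toy sequence are assumptions made here, not the pipeline actually used to build PTMcode.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout:
# (reference protein ID, 1-based position, one-letter residue, PTM type)
PtmRecord = Tuple[str, int, str, str]


def filter_ptms(records: Iterable[PtmRecord],
                reference: Dict[str, str]) -> List[PtmRecord]:
    """Keep non-redundant PTMs whose residue matches the reference sequence."""
    seen = set()
    kept = []
    for protein_id, position, residue, ptm_type in records:
        sequence = reference.get(protein_id)
        if sequence is None:
            continue  # protein ID could not be mapped to a reference
        if not 1 <= position <= len(sequence):
            continue  # position lies outside the reference sequence
        if sequence[position - 1] != residue:
            continue  # reported residue does not match the reference
        key = (protein_id, position, ptm_type)
        if key in seen:
            continue  # same PTM already reported by another source
        seen.add(key)
        kept.append((protein_id, position, residue, ptm_type))
    return kept
```

Records that fail any check are silently dropped in this sketch; a real pipeline would probably log them so that mismatches caused by isoform differences can be inspected.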

We combine five different channels for the extraction of known or predicted functional associations between PTMs (Figure 1): (i) sites that co-evolve across many eukaryotes as described in (17); (ii) PTMs associated based on their proximity in the protein structure, as modified sites that tightly co-operate seem to be clustered (4,28); (iii) PTMs known to modify the same residue in the protein sequence; (iv) a manual annotation survey to identify known PTM crosstalks in the literature; and (v) PTMs that are located within PTM ‘hotspots’, regions of significantly high modification density within the protein sequence (16). From the four channels that assign pairwise associations (co-evolution, structural distance, PTMs modifying the same residue and manual annotation), PTMcode holds 401 690 distinct functional associations describing the co-regulation of 74 839 residues in 10 410 proteins. In addition, 7400 residues have been extracted from 1635 computed regions with high PTM density. More details on the content of the PTM associations are given in Table 1.
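To illustrate how pairwise associations from several channels can be pooled and how pairs supported by at least two channels can be identified, here is a minimal sketch for a single protein. The data layout and the function name merge_channels are assumptions made for illustration; the source does not describe how PTMcode stores or merges its evidence.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Site = Tuple[int, str]   # (position in the protein, PTM type)
Pair = FrozenSet[Site]   # unordered pair of modified sites


def merge_channels(
    channels: Dict[str, Iterable[Tuple[Site, Site]]]
) -> Dict[Pair, Set[str]]:
    """Map each unordered PTM pair to the set of channels supporting it."""
    evidence: Dict[Pair, Set[str]] = defaultdict(set)
    for channel, pairs in channels.items():
        for site_a, site_b in pairs:
            if site_a == site_b:
                continue  # a site cannot be associated with itself
            evidence[frozenset((site_a, site_b))].add(channel)
    return dict(evidence)
```

Using an unordered pair as the key means that (A, B) reported by one channel and (B, A) reported by another are counted as the same association.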

Figure 1.
PTMcode integrates five types of evidence channels to collect known and predicted functionally associated PTMs within the same protein, named here as: (A) ‘Co-evolution’ where two modified residues are found to be significantly co-evolving ...
Table 1.
Number of functional associations that are predicted by each of the evidence channels and those supported by at least two different types of evidence, only considering the four types of evidence that assign concrete pairwise associations (co-evolution, structural ...

PTMs in the context of protein domains and structures

PTMcode is not a resource aimed at collecting PTMs, but one for the exploration, retrieval and analysis of predicted and known functional associations between PTMs within proteins. Yet, we map PTMs onto curated protein domains and unstructured regions as defined by the SMART database (29). PTMs then become part of the functional annotation, and their associations can be seen as part of the overall regulation of protein domains. We classify PTMs into three categories: ‘regulatory’ (those involved in regulation of protein function), ‘stabilizing’ (those that are not involved in regulation of function but required for conformational purposes) and ‘uncharacterized’ (those with unknown or unclear function, as in the case of C-linked glycosylations). In addition, we added, when applicable, a view of the three-dimensional structure of the domain or the entire protein, highlighting any two modified residues that are predicted to be functionally associated by any of our five source channels. The structure is displayed interactively using the popular Java viewer Jmol (http://www.jmol.org) and can be explored using all Jmol features. Information about modifying enzymes (such as protein kinases) is also provided, extracted from several sources (19–21,24).

PTMcode also supplies a score for each PTM, the relative Residue Conservation Score (rRCS), which measures the conservation of the modified residue over the oldest eukaryotic orthologous group where the protein is present. It takes into account both the conservation of the residue within the orthologous proteins and the evolutionary distance between the species with the conserved residue; for full details on the rRCS algorithm and its performance see (17). An rRCS >95 means that the modified residue is more conserved than 95% of the same amino acids within the same type of protein region. The rRCS can be used as a proxy to hint at PTM functionality in the absence of other data. Conservation has been used before for the same purpose (30–32), although caution is required in its interpretation (33). Several other PTM databases also provide simple information about protein and residue conservation to be used as a filter for functional sites (19,20).
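The percentile reading of the rRCS can be illustrated with a small sketch. It only shows how a raw conservation value is turned into a rank relative to a background of comparable residues; how the raw value itself is computed is described in (17) and is not reproduced here. The function name relative_score and the use of plain floats as raw scores are assumptions made for illustration.

```python
from typing import Sequence


def relative_score(raw_score: float, background: Sequence[float]) -> float:
    """Percentage of background residues that are strictly less conserved.

    `background` is assumed to hold raw conservation scores of residues of
    the same amino acid found in the same type of protein region.
    """
    if not background:
        raise ValueError("background must not be empty")
    lower = sum(1 for value in background if value < raw_score)
    return 100.0 * lower / len(background)
```

Under this reading, a returned value above 95 corresponds to a modified residue that is more conserved than 95% of its background.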

Co-evolution

We used the co-evolution of two modified residues to predict their functional association as described in (17). This strategy has already shown that co-evolving pairs of different PTM types can be specifically linked in proteins with certain functionalities and localizations, and can even potentially co-regulate protein interactions through their association with particular protein domains and short linear motifs (17). The functional associations provided by this prediction channel should be considered from a broad perspective, as they can range from physical interactions (as seen by the overlap with pairs of PTMs found close together in the protein structure) to participation in the same protein functionality, although not necessarily at the same stage. The species where both amino acids are conserved over the protein orthologous groups are shown in the co-evolution pop-up window, in which the protein alignment with the respective columns highlighted can be visualized using Jalview (34).

Structural distance

A straightforward mechanism by which two PTMs can be associated is their proximity (4,28), measured here using the 3D structure of the protein. If they are close enough, they could be either competing for the same space, e.g. methylation inhibiting the phosphorylation of adjacent serines (35), or co-operating in the regulation of the same protein region [e.g. the highly modified cassette of amino acids in p53 (4)]. We mapped PTM residues to three-dimensional structures of proteins from the Protein Data Bank (36) and calculated the spatial distance between pairs of modified residues. To obtain a first estimate of an appropriate distance cut-off for inferring physical interaction, we measured the average distance for 12 pairs of associated modifications reported in the literature to physically interact. Thus, modified residues closer than 4.69 Å are predicted either to be in physical contact or to be mutually exclusive, competing for the same protein niche; their conformation can be visualized using the Jmol plugin.
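The distance criterion can be sketched as below. The text does not state which atoms are used to measure the distance between two residues, so this sketch assumes the minimum distance over all atom pairs; the input layout (a mapping from residue position to atom coordinates in Å) and the function names are likewise hypothetical. Only the 4.69 Å cut-off comes from the text.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Coordinate = Tuple[float, float, float]

CONTACT_THRESHOLD = 4.69  # Å, the cut-off quoted in the text


def min_distance(atoms_a: Sequence[Coordinate],
                 atoms_b: Sequence[Coordinate]) -> float:
    """Smallest Euclidean distance between any atom of A and any atom of B."""
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)


def close_pairs(residue_atoms: Dict[int, List[Coordinate]],
                threshold: float = CONTACT_THRESHOLD
                ) -> List[Tuple[int, int, float]]:
    """Return (position_a, position_b, distance) for residues closer than threshold."""
    pairs = []
    for pos_a, pos_b in combinations(sorted(residue_atoms), 2):
        distance = min_distance(residue_atoms[pos_a], residue_atoms[pos_b])
        if distance < threshold:
            pairs.append((pos_a, pos_b, distance))
    return pairs
```

In practice the coordinates would be read from a Protein Data Bank entry after mapping the modified positions onto the structure numbering, a step this sketch leaves out.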

PTMs modifying the same residue

The simplest evidence for a direct crosstalk of two PTMs is a modification of the same residue in the protein sequence, which would reflect either that they compete for the same amino acid (mutually exclusive PTMs) or that they co-operate for the same function if the modifications happen sequentially in time. Two well-known associations between PTM types are described to follow a competition strategy. Phosphorylation and O-linked glycosylation modify serine and threonine amino acids and constitute molecular switches that co-regulate protein function and localization within the so-called yin-yang sites (1,2). The promiscuous amino acid lysine can be acetylated, SUMOylated, ubiquitinated and methylated, and it has been described to be co-regulated by several PTM types at the same position, for example, during the regulation of histone tails (5).

We identified 576 residues regulated by this channel, mostly involving the cases reported above but also other pairs of PTM types [e.g. 10 instances of hydroxylated and O-linked glycosylated lysines, modifications that happen sequentially in time in collagen proteins (37), or 7 instances involving phosphorylation and sulfation]. PTMcode provides a tentative annotation for the pairs of associated PTMs predicted by this channel, classifying them as ‘competing’, ‘co-operating’ or ‘uncharacterized’ associations.
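Detecting residues that carry more than one PTM type amounts to grouping the PTM list by protein and position. The following is a minimal sketch under an assumed input layout of (protein ID, position, PTM type) tuples; the function name shared_residues is hypothetical, and the classification into ‘competing’ or ‘co-operating’ associations is not attempted here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def shared_residues(
    ptms: Iterable[Tuple[str, int, str]]
) -> Dict[Tuple[str, int], List[str]]:
    """Return residues modified by two or more distinct PTM types."""
    by_site: Dict[Tuple[str, int], Set[str]] = defaultdict(set)
    for protein_id, position, ptm_type in ptms:
        by_site[(protein_id, position)].add(ptm_type)
    return {
        site: sorted(types)
        for site, types in by_site.items()
        if len(types) > 1
    }
```

Collecting the types in a set means that the same PTM reported twice for a residue does not create a spurious association.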

Manual annotation

PTMcode holds not only predicted associations but also PTM sites that are reported in the literature to crosstalk. They were extracted from the literature review of the interplay between PTM types in Minguez et al. (17) and were introduced into the database after mapping them onto the correct protein sequences. Links to the scientific articles through PubMed and a short description of the mechanism of action are provided for the 57 associations found using this channel.

Hotspots

PTMs can also be part of regulatory hotspots, small regions in the protein sequence that are enriched in modifications (4). Such regions have been recently defined (16), and rules have been established and benchmarked. For example, modified lysines are more likely to be located within 15 amino acids of a phosphorylated residue, thereby forming hotspot regions where PTMs tend to cluster. According to the established rules, for each of the modified residues in a protein, we define a window of 31 amino acids (the residue itself plus 15 downstream and 15 upstream), count the number of modifications there and compare it, using a Fisher exact test, to the number of modifications in the whole protein. All resulting P-values were adjusted by False Discovery Rate, and overlapping regions were collapsed to give a total of 1635 hotspots that are visible and explorable within the PTMcode web interface.
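The windowing procedure above can be sketched in a few steps: test each window for enrichment, adjust the P-values and merge overlapping significant windows. The sketch rests on several assumptions that the text does not specify: a one-sided Fisher exact test computed as a hypergeometric tail (modified versus unmodified residues, inside versus outside the window), the Benjamini-Hochberg procedure as the False Discovery Rate adjustment and a significance level of 0.05. All function names are hypothetical; the rules actually applied are those defined in (16).

```python
from math import comb
from typing import List, Sequence, Tuple


def fisher_enrichment(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher exact P-value: probability of at least k modified
    residues in a window of n, given K modified residues among N in total."""
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def find_hotspots(length: int, modified: Sequence[int],
                  flank: int = 15, alpha: float = 0.05) -> List[Tuple[int, int]]:
    """Return merged (start, end) regions, 1-based, enriched in modifications."""
    sites = sorted(set(modified))
    windows, p_values = [], []
    for site in sites:
        start, end = max(1, site - flank), min(length, site + flank)
        in_window = sum(1 for s in sites if start <= s <= end)
        windows.append((start, end))
        p_values.append(
            fisher_enrichment(in_window, end - start + 1, len(sites), length))
    hotspots: List[Tuple[int, int]] = []
    for (start, end), q in zip(windows, benjamini_hochberg(p_values)):
        if q > alpha:
            continue  # window is not significantly enriched
        if hotspots and start <= hotspots[-1][1]:
            # overlaps the previous significant window: collapse the two
            hotspots[-1] = (hotspots[-1][0], max(hotspots[-1][1], end))
        else:
            hotspots.append((start, end))
    return hotspots
```

Windows are truncated at the protein termini, so residues near either end are tested against fewer than 31 positions; whether the original rules treat the termini this way is not stated in the text.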

PTMcode web interface. Query, results and availability

PTMcode is accessible through the URL http://ptmcode.embl.de. The web interface provides a browser to access all proteins that have some known or predicted functional associations between PTMs (Figure 2) and a search engine where the user can enter a protein sequence or any protein ID. In addition, the user can restrict the search to a particular residue or protein region. PTMcode is a protein-oriented database, as one of the major motivations for the resource is to help experts on particular proteins or research fields to explore functional hypotheses. A Flash-based graphical interface enables intuitive interactivity. One or many functional associations can be explored, and supporting information (alignment, structure, etc.) for each of the five channels can be easily called, all within the context of the protein domain architecture with links to the respective SMART entries. Figure 3 shows an overview of the predicted functional co-regulation by several post-translational modifications of a protein within the PTMcode environment.

Figure 2.
The PTMcode database uses in its first release 13 different types of PTMs that are abbreviated in a two letter code as: Ph (phosphorylation), NG (N-linked glycosylation), Ac (acetylation), OG (O-linked glycosylation), Ub (ubiquitination), Me (methylation), ...
Figure 3.
PTMcode offers the exploration of post-translational regulation within thousands of proteins. (A) Interactive graphical display of functional associations between PTMs within the human EGF receptor (EGFR). The protein is represented by the grey line at ...

Conclusion and future plans

PTMcode is a unique database in that it goes beyond mere compilation of PTMs, which is already covered by useful resources dedicated either to a single PTM type (21) or to various PTM types (19,20,23). The aim is to put these PTMs into context, and the focus of the first PTMcode release is to capture known and predicted functional associations between PTMs within proteins. Other databases have started to implement functional context for individual PTMs by providing, for instance, conservation information or mappings of PTMs onto protein domains or 3D structures. PTMfunc (16), for example, provides information about the regulation of binding interfaces and protein domains and even about presence within a hotspot, although it lacks a graphical display of the protein. PTMcode goes a step further and is PTM association-centric, aiming at a description of the co-regulation landscape of a protein by means of its modified sites. In the future, we plan to develop PTMcode further by introducing more PTMs and more accurate prediction methods, and by extending it to associations between proteins, in order to prepare the ground for deciphering the global PTM code.

FUNDING

EMBL; Marie Curie IEF fellowship (VII Framework Program) (to P.M.). Funding for open access charge: EMBL.

Conflict of interest statement: None declared.

ACKNOWLEDGEMENTS

We thank Yan Yuan for all his help and support on all technical and infrastructure issues we encountered during this project.

REFERENCES

1. Zeidan Q, Hart GW. The intersections between O-GlcNAcylation and phosphorylation: implications for multiple signaling pathways. J. Cell Sci. 2010;123:13–22. [PMC free article] [PubMed]
2. Butt AM, Khan IB, Hussain M, Idress M, Lu J, Tong Y. Role of post translational modifications and novel crosstalk between phosphorylation and O-beta-GlcNAc modifications in human claudin-1, -3 and -4. Mol. Biol. Rep. 2011;39:1359–1369. [PubMed]
3. Vodermaier HC. APC/C and SCF: controlling each other and the cell cycle. Curr. Biol. 2004;14:R787–R796. [PubMed]
4. Brooks CL, Gu W. Ubiquitination, phosphorylation and acetylation: the molecular basis for p53 regulation. Curr. Opin. Cell Biol. 2003;15:164–171. [PubMed]
5. Latham JA, Dent SYR. Cross-regulation of histone modifications. Nat. Struct. Mol. Biol. 2007;14:1017–1024. [PubMed]
6. Hunter T. The age of crosstalk: phosphorylation, ubiquitination, and beyond. Mol. Cell. 2007;28:730–738. [PubMed]
7. Yang XJ, Seto E. Lysine acetylation: codified crosstalk with other posttranslational modifications. Mol. Cell. 2009;31:449–461. [PMC free article] [PubMed]
8. Creixell P, Linding R. Cells, shared memory and breaking the PTM code. Mol. Syst. Biol. 2012;8:598. [PMC free article] [PubMed]
9. Benayoun BA, Veitia RA. A post-translational modification code for transcription factors: sorting through a sea of signals. Trends Cell Biol. 2009;19:189–197. [PubMed]
10. Seet BT, Dikic I, Zhou MM, Pawson T. Reading protein modifications with interaction domains. Nat. Rev. Mol. Cell Biol. 2006;7:473–483. [PubMed]
11. Choudhary C, Kumar C, Gnad F, Nielsen ML, Rehman M, Walther TC, Olsen JV, Mann M. Lysine acetylation targets protein complexes and co-regulates major cellular functions. Science. 2009;325:834–840. [PubMed]
12. Choudhary C, Mann M. Decoding signalling networks by mass spectrometry-based proteomics. Nat. Rev. Mol. Cell Biol. 2010;11:427–439. [PubMed]
13. van Noort V, Seebacher J, Bader S, Mohammed S, Vonkova I, Betts MJ, Kühner S, Kumar R, Maier T, O’Flaherty M, et al. Cross-talk between phosphorylation and lysine acetylation in a genome-reduced bacterium. Mol. Syst. Biol. 2012;8:571. [PMC free article] [PubMed]
14. Lu Z, Cheng Z, Zhao Y, Volchenboum SL. Bioinformatic analysis and post-translational modification crosstalk prediction of lysine acetylation. PLoS One. 2011;6:e28228. [PMC free article] [PubMed]
15. Danielsen JM, Sylvestersen KB, Bekker-Jensen S, Szklarczyk D, Poulsen JW, Horn H, Jensen LJ, Mailand N, Nielsen ML. Mass spectrometric analysis of lysine ubiquitylation reveals promiscuity at site level. Mol. Cell. Proteomics. 2011;10 M110.003590. [PMC free article] [PubMed]
16. Beltrao P, Albanèse V, Kenner LR, Swaney DL, Burlingame A, Villén J, Lim WA, Fraser JS, Frydman J, Krogan NJ. Systematic functional prioritization of protein posttranslational modifications. Cell. 2012;150:413–425. [PMC free article] [PubMed]
17. Minguez P, Parca L, Diella F, Mende DR, Kumar R, Helmer-Citterich M, Gavin AC, van Noort V, Bork P. Deciphering a global network of functionally associated post-translational modifications. Mol. Syst. Biol. 2012;8:599. [PMC free article] [PubMed]
18. The UniProt Consortium. Reorganizing the protein space at the Universal Protein Resource (UniProt). Nucleic Acids Res. 2012;40:D71–D75. [PMC free article] [PubMed]
19. Gnad F, Gunawardena J, Mann M. PHOSIDA 2011: the posttranslational modification database. Nucleic Acids Res. 2010;39:D253–D260. [PMC free article] [PubMed]
20. Hornbeck PV, Kornhauser JM, Tkachev S, Zhang B, Skrzypek E, Murray B, Latham V, Sullivan M. PhosphoSitePlus: a comprehensive resource for investigating the structure and function of experimentally determined post-translational modifications in man and mouse. Nucleic Acids Res. 2012;40:D261–D270. [PMC free article] [PubMed]
21. Dinkel H, Chica C, Via A, Gould CM, Jensen LJ, Gibson TJ, Diella F. Phospho.ELM: a database of phosphorylation sites—update 2011. Nucleic Acids Res. 2010;39:D261–D267. [PMC free article] [PubMed]
22. Gupta R, Birch H, Rapacki K, Brunak S, Hansen JE. O-GLYCBASE version 4.0: a revised database of O-glycosylated proteins. Nucleic Acids Res. 1999;27:370–372. [PMC free article] [PubMed]
23. Lee TY, Huang HD, Hung JH, Huang HY, Yang YS, Wang TH. dbPTM: an information repository of protein post-translational modification. Nucleic Acids Res. 2006;34:D622–D627. [PMC free article] [PubMed]
24. Keshava Prasad TS, Goel R, Kandasamy K, Keerthikumar S, Kumar S, Mathivanan S, Telikicherla D, Raju R, Shafreen B, Venugopal A, et al. Human protein reference database—2009 update. Nucleic Acids Res. 2009;37:D767–D772. [PMC free article] [PubMed]
25. Szklarczyk D, Franceschini A, Kuhn M, Simonovic M, Roth A, Minguez P, Doerks T, Stark M, Muller J, Bork P, et al. The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic Acids Res. 2011;39:D561–D568. [PMC free article] [PubMed]
26. Muller J, Szklarczyk D, Julien P, Letunic I, Roth A, Kuhn M, Powell S, von Mering C, Doerks T, Jensen LJ, et al. eggNOG v2.0: extending the evolutionary genealogy of genes with enhanced non-supervised orthologous groups, species and functional annotations. Nucleic Acids Res. 2010;38:D190–D195. [PMC free article] [PubMed]
27. Flicek P, Amode MR, Barrell D, Beal K, Brent S, Carvalho-Silva D, Clapham P, Coates G, Fairley S, Fitzgerald S, et al. Ensembl 2012. Nucleic Acids Res. 2012;40:D84–D90. [PMC free article] [PubMed]
28. Christensen B, Nielsen MS, Haselmann KF, Petersen TE, Sørensen ES. Post-translationally modified residues of native human osteopontin are located in clusters: identification of 36 phosphorylation and five O-glycosylation sites and their biological implications. Biochem. J. 2005;390:285–292. [PMC free article] [PubMed]
29. Letunic I, Doerks T, Bork P. SMART 7: recent updates to the protein domain annotation resource. Nucleic Acids Res. 2012;40:D302–D305. [PMC free article] [PubMed]
30. Boekhorst J, van Breukelen B, Heck A, Snel B. Comparative phosphoproteomics reveals evolutionary and functional conservation of phosphorylation across eukaryotes. Genome Biol. 2008;9:R144. [PMC free article] [PubMed]
31. Landry CR, Levy ED, Michnick SW. Weak functional constraints on phosphoproteomes. Trends Genet. 2009;25:193–197. [PubMed]
32. Chen SC, Chen FC, Li WH. Phosphorylated and nonphosphorylated serine and threonine residues evolve at different rates in mammals. Mol. Biol. Evol. 2010;27:2548–2554. [PMC free article] [PubMed]
33. Tan CS, Bodenmiller B, Pasculescu A, Jovanovic M, Hengartner MO, Jørgensen C, Bader GD, Aebersold R, Pawson T, Linding R. Comparative analysis reveals conserved protein phosphorylation networks implicated in multiple diseases. Sci. Signal. 2009;2:ra39. [PubMed]
34. Waterhouse AM, Procter JB, Martin DMA, Clamp M, Barton GJ. Jalview Version 2—a multiple sequence alignment editor and analysis workbench. Bioinformatics. 2009;25:1189–1191. [PMC free article] [PubMed]
35. Zhang K, Lin W, Latham JA, Riefler GM, Schumacher JM, Chan C, Tatchell K, Hawke DH, Kobayashi R, Dent SYR. The Set1 methyltransferase opposes Ipl1 aurora kinase functions in chromosome segregation. Cell. 2005;122:723–734. [PMC free article] [PubMed]
36. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. The Protein Data Bank. Nucleic Acids Res. 2000;28:235–242. [PMC free article] [PubMed]
37. Oikarinen A, Anttinen H, Kivirikko KI. Hydroxylation of lysine and glycosylation of hydroxylysine during collagen biosynthesis in isolated chick-embryo cartilage cells. Biochem. J. 1976;156:545–551. [PMC free article] [PubMed]

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press