Evolutionary constraint and disease associations of post-translational modification sites in human genomes

PLoS Genet. 2015 Jan 22;11(1):e1004919. doi: 10.1371/journal.pgen.1004919. eCollection 2015 Jan.

Abstract

Interpreting the impact of human genome variation on phenotype is challenging. The functional effect of protein-coding variants is often predicted using sequence conservation and population frequency data; however, other factors are likely relevant. We hypothesized that variants in protein post-translational modification (PTM) sites contribute to phenotype variation and disease. We analyzed the fraction of rare variants and the ratio of non-synonymous to synonymous variants (Ka/Ks) in 7,500 human genomes and found a significant negative selection signal in PTM regions that is independent of six factors, including conservation, codon usage, and GC-content, and is widely distributed across tissue-specific genes and function classes. PTM regions are also enriched in known disease mutations, suggesting that PTM variation is more likely to be deleterious. PTM constraint also affects the flanking sequence around modified residues and increases around clustered sites, indicating the presence of functionally important short linear motifs. Using target site motifs of 124 kinases, we predict at least ∼180,000 motif-breaker amino acid residues that disrupt PTM sites when substituted, and highlight kinase motifs that show specific negative selection and enrichment of disease mutations. We provide this dataset with corresponding hypothesized mechanisms as a community resource. As an example of our integrative approach, we propose that PTPN11 variants in Noonan syndrome aberrantly activate the protein by disrupting an uncharacterized cluster of phosphorylation sites. Further, as PTMs are molecular switches that are modulated by drugs, we study mutated binding sites of PTM enzymes in disease genes and define a drug-disease network containing 413 novel predicted disease-gene links.
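The two selection statistics named in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Variant` record, the 0.1% rare-allele threshold, and the toy data are all assumptions made for illustration, and the ratio below is a raw count of non-synonymous to synonymous variants, whereas a proper Ka/Ks estimate also normalizes by the number of non-synonymous and synonymous sites.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal variant record (not the paper's data model)."""
    allele_frequency: float  # population allele frequency, 0..1
    nonsynonymous: bool      # True if the variant changes the amino acid
    in_ptm_region: bool      # True if it falls in a PTM site or its flank


def rare_fraction(variants: List[Variant], threshold: float = 0.001) -> float:
    """Fraction of variants below an assumed rare-allele threshold."""
    if not variants:
        return float("nan")
    rare = sum(1 for v in variants if v.allele_frequency < threshold)
    return rare / len(variants)


def nonsyn_syn_ratio(variants: List[Variant]) -> float:
    """Raw count ratio of non-synonymous to synonymous variants.

    Simplification: no per-site normalization, so this is only a
    stand-in for Ka/Ks.
    """
    nonsyn = sum(1 for v in variants if v.nonsynonymous)
    syn = sum(1 for v in variants if not v.nonsynonymous)
    return nonsyn / syn if syn else float("nan")


def summarize(variants: Iterable[Variant]) -> Dict[str, Dict[str, float]]:
    """Compute both statistics separately for PTM and non-PTM regions."""
    variants = list(variants)
    groups = {
        "ptm": [v for v in variants if v.in_ptm_region],
        "non_ptm": [v for v in variants if not v.in_ptm_region],
    }
    return {
        name: {
            "rare_fraction": rare_fraction(group),
            "nonsyn_syn_ratio": nonsyn_syn_ratio(group),
        }
        for name, group in groups.items()
    }


# Toy data, invented for illustration only.
TOY_VARIANTS = [
    Variant(0.0002, True, True),
    Variant(0.0005, True, True),
    Variant(0.2, False, True),
    Variant(0.0001, False, True),
    Variant(0.05, True, False),
    Variant(0.3, True, False),
    Variant(0.0004, True, False),
    Variant(0.1, False, False),
]

if __name__ == "__main__":
    for region, stats in summarize(TOY_VARIANTS).items():
        print(region, stats)
```

Under stronger negative selection, one would expect a region to show a higher fraction of rare variants and a lower non-synonymous to synonymous ratio than the background. Establishing that such a difference is independent of conservation, codon usage, GC-content and the other factors requires the controlled analysis described in the paper, which this sketch does not attempt.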

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Base Composition
  • Binding Sites
  • Codon / genetics
  • Conserved Sequence / genetics
  • Genome, Human*
  • Humans
  • Noonan Syndrome / etiology
  • Noonan Syndrome / genetics
  • Protein Processing, Post-Translational / genetics*
  • Protein Tyrosine Phosphatase, Non-Receptor Type 11 / genetics
  • Proteins / genetics*
  • Proteins / metabolism
  • Selection, Genetic / genetics*

Substances

  • Codon
  • Proteins
  • PTPN11 protein, human
  • Protein Tyrosine Phosphatase, Non-Receptor Type 11