Format

Send to

Choose Destination
J Proteome Res. 2014 Apr 4;13(4):2028-44. doi: 10.1021/pr401191w. Epub 2014 Mar 10.

Annotating N termini for the human proteome project: N termini and Nα-acetylation status differentiate stable cleaved protein species from degradation remnants in the human erythrocyte proteome.

Author information

1
Centre for Blood Research, University of British Columbia , 2350 Health Sciences Mall, Vancouver, British Columbia V6T 1Z3, Canada.

Abstract

A goal of the Chromosome-centric Human Proteome Project is to identify all human protein species. With 3844 proteins annotated as "missing", this is challenging. Moreover, proteolytic processing generates new protein species with characteristic neo-N termini that are frequently accompanied by altered half-lives, function, interactions, and location. Enucleated and largely void of internal membranes and organelles, erythrocytes are simple yet proteomically challenging cells due to the high hemoglobin content and wide dynamic range of protein concentrations that impedes protein identification. Using the N-terminomics procedure TAILS, we identified 1369 human erythrocyte natural and neo-N-termini and 1234 proteins. Multiple semitryptic N-terminal peptides exhibited improved mass spectrometric identification properties versus the intact tryptic peptide enabling identification of 281 novel erythrocyte proteins and six missing proteins identified for the first time in the human proteome. With an improved bioinformatics workflow, we developed a new classification system and the Terminus Cluster Score. Thereby we described a new stabilizing N-end rule for processed protein termini, which discriminates novel protein species from degradation remnants, and identified protein domain hot spots susceptible to cleavage. Strikingly, 68% of the N-termini were within genome-encoded protein sequences, revealing alternative translation initiation sites, pervasive endoproteolytic processing, and stabilization of protein fragments in vivo. The mass spectrometry proteomics data have been deposited to ProteomeXchange with the data set identifier <PXD000434>.

PMID:
24555563
PMCID:
PMC3979129
DOI:
10.1021/pr401191w
[Indexed for MEDLINE]
Free PMC Article

Supplemental Content

Full text links

Icon for American Chemical Society Icon for PubMed Central
Loading ...
Support Center