Proc Natl Acad Sci U S A. 1995 Mar 28;92(7):2879-83.

Diverse incidences of individual oligopeptides (dipeptidic to hexapeptidic) in proteins of human, bakers' yeast, and Escherichia coli origin registered in the Swiss-Prot data base.

Author information

Biological Informatics Section, Fujitsu Labs, Ltd., Chiba, Japan.


Oligopeptidic permutations of the 20 amino acid residues give rise to proteins of diverse functions. Our long-term goal is to produce a lexicon of oligopeptides, classifying them into at least five categories: (i) ubiquitous, (ii) function specific, (iii) group specific, (iv) species specific, and (v) nonexistent. To begin with, we report on the varying frequencies of individual oligopeptides (dipeptidic to hexapeptidic in length) found among 2862 human proteins, 1942 Saccharomyces cerevisiae proteins, and 2672 Escherichia coli proteins registered in the Swiss-Prot data base (version 29.0, released in June 1994). At all lengths (dipeptides to hexapeptides), homooligopeptides were very prominent among the most frequently occurring varieties in proteins of human and bakers' yeast origin. However, this was not the case with E. coli. While all 20³ (8000) possible tripeptides were found among human proteins, three tripeptides (Cys-Cys-Trp, Trp-Trp-Cys, and Trp-Trp-His) were missing from the bakers' yeast proteins. Likewise, three different tripeptides (Cys-Ile-Trp, Cys-Met-Tyr, and Cys-Trp-Trp) were absent from E. coli proteins. Inasmuch as the Swiss-Prot data base already contained 67% of the expected total of 4000 E. coli proteins, it is virtually certain that the 96,000 varieties of hexapeptides containing at least one of these three missing tripeptides are nonexistent in E. coli. Furthermore, the observation of different missing tripeptides in the bakers' yeast proteins suggests that nonexistent hexapeptides will be highly phylum specific. Because of the limited sample size, only a small fraction of the 20⁶ (64,000,000) possible hexapeptides was expected to be encountered in the present survey. Indeed, only 1.2-1.5% of the possible hexapeptides were found, and the average copy number of observed hexapeptides varied between 1.06 and 1.25. Nevertheless, 33 varieties of hexapeptides occurred in 102-169 copies among human proteins. Of these 33 varieties, 15 contained such rarely used residues as Tyr, His, Cys, and Trp.
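The survey's core operations (counting overlapping oligopeptides, listing the ones that never occur, measuring how much of the 20^k space is covered, and counting hexapeptides that contain a missing tripeptide) can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' method: it assumes each protein is a plain string of one-letter residue codes, and the function names and toy sequences are hypothetical. `count_containing` uses a dynamic program over the last two residues to count exactly, instead of the simple estimate 3 × 4 × 20³ = 96,000 (three tripeptides, four positions in a hexapeptide, 20³ choices for the remaining residues), which the abstract's figure appears to match.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids in one-letter code (assumed input alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def count_kmers(sequences, k):
    """Count every overlapping oligopeptide of length k across all sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def missing_kmers(counts, k):
    """Return, sorted, the k-mers over the 20 standard residues never observed."""
    all_kmers = ("".join(p) for p in product(AMINO_ACIDS, repeat=k))
    return sorted(kmer for kmer in all_kmers if kmer not in counts)


def coverage_stats(counts, k):
    """Fraction of the 20**k possible k-mers observed, and their mean copy number.

    k-mers containing non-standard letters (e.g. X) are ignored.
    """
    alphabet = set(AMINO_ACIDS)
    standard = {kmer: n for kmer, n in counts.items() if set(kmer) <= alphabet}
    observed = len(standard)
    fraction = observed / len(AMINO_ACIDS) ** k
    mean_copies = sum(standard.values()) / observed if observed else 0.0
    return fraction, mean_copies


def count_containing(patterns, length):
    """Count peptides of the given length that contain at least one pattern.

    All patterns must share one length m. The dynamic program tracks the last
    m - 1 residues, counts peptides avoiding every pattern, and returns the
    complement of that count.
    """
    patterns = set(patterns)
    m = len(next(iter(patterns)))
    assert all(len(p) == m for p in patterns), "patterns must share one length"
    states = Counter({"": 1})  # suffix of the last m - 1 residues -> count
    for _ in range(length):
        nxt = Counter()
        for suffix, n in states.items():
            for aa in AMINO_ACIDS:
                extended = suffix + aa
                if len(extended) >= m and extended[-m:] in patterns:
                    continue  # this residue would complete a forbidden pattern
                nxt[extended[-(m - 1):] if m > 1 else ""] += n
        states = nxt
    avoiding = sum(states.values())
    return len(AMINO_ACIDS) ** length - avoiding


if __name__ == "__main__":
    # Hypothetical toy sequences, not Swiss-Prot entries.
    toy = ["MKTAYIAKQR", "MKTWWCAHHH"]
    tri = count_kmers(toy, 3)
    print(tri["MKT"], len(missing_kmers(tri, 3)), coverage_stats(tri, 3))
    # Tripeptides the abstract reports missing from E. coli, in one-letter code:
    # Cys-Ile-Trp = CIW, Cys-Met-Tyr = CMY, Cys-Trp-Trp = CWW.
    print(count_containing({"CIW", "CMY", "CWW"}, 6))  # 95991
```

For the three E. coli tripeptides the exact count is 95,991. Because all three start with Cys and none ends with Cys, two of them can share a hexapeptide only at positions 1-3 and 4-6; those 9 hexapeptides are counted twice in the 96,000 estimate. For the yeast set (CCW, WWC, WWH) the patterns can overlap, so the same dynamic program is the safer way to obtain an exact figure.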

[Indexed for MEDLINE]
Free PMC Article
