J Theor Biol. 1994 Mar 21;167(2):161-6.

Analysis on the distribution of bases in 1487 human protein coding sequences.

Department of Physics, Tianjin University, China.

Abstract

The occurrence frequencies of bases A, C, G and T, denoted by a, c, g and t, respectively, have been calculated and analyzed in 1487 human protein coding sequences. The analysis was performed using a recently presented diagrammatic method in which each coding sequence is represented by a point in 3-D space. The distribution of points gives the observer an overall, intuitive picture of the base frequencies. The distance between a point and the origin of the coordinate system, which corresponds to the case a = c = g = t = 1/4, is called the radial distance. The radial distribution of the 1487 points in 3-D space has been found to be normal, with its center essentially coinciding with the origin. We have found that among the 1487 coding sequences, the empirical rule a² + c² + g² + t² < 1/3 holds for 1486. The only sequence for which the rule does not hold is the one coding for the human parathymosin protein. The amino acid composition and the structural class of this protein have been studied in some detail.
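A short worked check of the empirical rule, added for illustration and not taken from the paper: because a + c + g + t = 1, the identity (a - 1/4)² + (c - 1/4)² + (g - 1/4)² + (t - 1/4)² = a² + c² + g² + t² - 1/4 holds, so the rule a² + c² + g² + t² < 1/3 is equivalent to requiring the squared Euclidean distance of the frequency vector (a, c, g, t) from the uniform point (1/4, 1/4, 1/4, 1/4) to be below 1/12. The Python sketch below computes the base frequencies of a sequence and tests the rule. It is a minimal sketch under stated assumptions: the function names are hypothetical, symbols other than A/C/G/T (e.g. N) are simply ignored, and the paper's own 3-D coordinate mapping is not reproduced.

```python
from collections import Counter


def base_frequencies(seq: str) -> dict:
    """Return the frequencies a, c, g, t of A, C, G, T in seq.

    Assumption (not from the paper): input is case-insensitive and any
    non-ACGT symbol is skipped rather than counted.
    """
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return {b: counts[b] / total for b in "ACGT"}


def sum_of_squares(freqs: dict) -> float:
    """a^2 + c^2 + g^2 + t^2, the quantity in the empirical rule."""
    return sum(f * f for f in freqs.values())


def squared_distance_from_uniform(freqs: dict) -> float:
    """Squared distance of (a, c, g, t) from (1/4, 1/4, 1/4, 1/4).

    Since the frequencies sum to 1, this equals sum_of_squares(freqs) - 1/4.
    """
    return sum((f - 0.25) ** 2 for f in freqs.values())


def satisfies_empirical_rule(seq: str) -> bool:
    """Hypothetical helper: True if a^2 + c^2 + g^2 + t^2 < 1/3 for seq."""
    return sum_of_squares(base_frequencies(seq)) < 1 / 3


if __name__ == "__main__":
    for s in ("ACGT", "AAAC"):
        f = base_frequencies(s)
        print(s, round(sum_of_squares(f), 4), satisfies_empirical_rule(s))
```

A perfectly balanced sequence such as "ACGT" gives a sum of squares of 1/4 and satisfies the rule, while a strongly A-biased one such as "AAAC" gives 0.625 and violates it; a sequence made of a single base reaches the maximum value of 1.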

PMID: 8207944
DOI: 10.1006/jtbi.1994.1060
[Indexed for MEDLINE]
