Send to

Choose Destination
Comput Biol Chem. 2004 Apr;28(2):149-53.

A nucleotide composition constraint of genome sequences.

Author information

Department of Physics, Tianjin University, Tianjin 300072, China.


Let a, c, g and t denote the occurrence frequencies of A, C, G and T, respectively, in a genome. We calculated the statistical quantity S = a2 + c2 + g2 + t2 for each of 809 genomes (11 archaea, 42 bacteria, 3 eukaryota, 90 phages, 36 viroids and 627 viruses) and 236 plasmids. We found that S < 1/3 is strictly valid for almost all of the above genomes or plasmids. As a direct deduction of the above observation, it is shown that (i) the statistical quantity S is a kind of genome order index, which is negatively correlated with the Shannon H function; (ii) S < 1/3 suggests that a minimal value of the Shannon H function is required for each genome; (iii) S defined above would be a new biological statistical quantity, useful to describe the composition features of genomes; (iv) By jointly considering the Chargaff Parity Rule 2, it is shown that the genomic G + C content should be in between 0.211 and 0.789.

[Indexed for MEDLINE]

Supplemental Content

Full text links

Icon for Elsevier Science
Loading ...
Support Center