Send to

Choose Destination
DNA Res. 2004 Dec 31;11(6):361-70.

Gene recognition based on nucleotide distribution of ORFs in a hyper-thermophilic crenarchaeon, Aeropyrum pernix K1.

Author information

Department of Physics, Tianjin University, Tianjin, 300072, China.


The 2694 ORFs originally annotated as potential genes in the genome of Aeropyrum pernix can be categorized into three clusters (A, B, C), according to their nucleotide composition at three codon positions. Coding potential was found to be responsible for the phenomenon of three clusters in a 9-dimensional space derived from the nucleotide composition of ORFs: ORFs assigned to cluster A are coding ones, while those assigned to clusters B and C are non-coding ORFs. A "codingness" index called the AZ score is defined based on a clustering method used to recognize protein-coding genes in the A. pernix genome. The criterion for a coding or non-coding ORF is based on the AZ score. ORFs with AZ > 0 or AZ < 0 are coding or non-coding, respectively. Consequently, 620 out of 632 ORFs with putative functions based on the original annotation are contained in cluster A, which have positive AZ scores. In addition, all 29 ORFs encoding putative or conserved proteins newly added in RefSeq annotation also have positive AZ scores. Accordingly, the number of re-recognized protein-coding genes in the A. pernix genome is 1610, which is significantly less than 2694 in the original annotation and also much less than 1841 in the RefSeq annotation curated by NCBI staff. Annotation information of re-recognized genes and their AZ scores are available at:

[Indexed for MEDLINE]

Supplemental Content

Loading ...
Support Center