Display Settings:

Format

Send to:

Choose Destination
    Biol Direct. 2010 Feb 17;5:10.

    'Genome order index' should not be used for defining compositional constraints in nucleotide sequences--a case study of the Z-curve.

    Source

    McKusick - Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA. eelhaik@gmail.com

    Abstract

    BACKGROUND:

    The Z-curve is a three dimensional representation of DNA sequences proposed over a decade ago and has been extensively applied to sequence segmentation, horizontal gene transfer detection, and sequence analysis. Based on the Z-curve, a "genome order index," was proposed, which is defined as S = a2+ c2+t2+g2, where a, c, t, and g are the nucleotide frequencies of A, C, T, and G, respectively. This index was found to be smaller than 1/3 for almost all tested genomes, which was taken as support for the existence of a constraint on genome composition. A geometric explanation for this constraint has been suggested. Each genome was represented by a point P whose distance from the four faces of a regular tetrahedron was given by the frequencies a, c, t, and g. They claimed that an inscribed sphere of radius r = 1/ square root 3 contains almost all points corresponding to various genomes, implying that S <r2. The distribution of the points P obtained by S was studied using the Z-curve.

    RESULTS:

    In this work, we studied the basic properties of the Z-curve using the "genome order index" as a case study. We show that (1) the calculation of the radius of the inscribed sphere of a regular tetrahedron is incorrect, (2) the S index is narrowly distributed, (3) based on the second parity rule, the S index can be derived directly from the Shannon entropy and is, therefore, redundant, and (4) the Z-curve suffers from over dimensionality, and the dimension stands for GC content alone suffices to represent any given genome.

    CONCLUSION:

    The "genome order index" S does not represent a constraint on nucleotide composition. Moreover, S can be easily computed from the Gini-Simpson index and be directly derived from entropy and is redundant. Overall, the Z-curve and S are over-complicated measures to GC content and Shannon H index, respectively.

    PMID:
    20158921
    [PubMed - indexed for MEDLINE]
    PMCID: PMC2841071
    Free PMC Article

    Images from this publication.See all images (4) Free text

    Figure 2
    Figure 4
    Figure 3
    Figure 1

      Supplemental Content

      Click here to read Click here to read

      Recent activity

      Your browsing activity is empty.

      Activity recording is turned off.

      Turn recording back on

      See more...
      Write to the Help Desk