Send to:

Choose Destination
See comment in PubMed Commons below
J Comput Biol. 2007 Jun;14(5):655-68.

Effects of long-range correlations in DNA on sequence alignment score statistics.

Author information

  • 1Max Planck Institute for Molecular Genetics, Berlin, Germany.


Long-range correlations in genomic base composition are a ubiquitous statistical feature among many eukaryotic genomes. In this article, these correlations are shown to substantially influence the statistics of sequence alignment scores. Using a Gaussian approximation to model the correlated score landscape, we calculate the corrections to the scale parameter lambda of the extreme value distribution of alignment scores. Our approximate analytic results are supported by a detailed numerical study based on a simple algorithm to efficiently generate long-range correlated random sequences. We find both, mean and exponential tail of the score distribution for long-range correlated sequences to be substantially shifted compared to random sequences with independent nucleotides. The significance of measured alignment scores will therefore change upon incorporation of the correlations in the null model. We discuss the magnitude of this effect in a biological context.

[PubMed - indexed for MEDLINE]
PubMed Commons home

PubMed Commons

How to join PubMed Commons

    Supplemental Content

    Full text links

    Icon for Mary Ann Liebert, Inc.
    Loading ...
    Write to the Help Desk