    BMC Bioinformatics. 2010 Dec 16;11:599.

    A genome alignment algorithm based on compression.

    Source

    Clayton School of Information Technology, Monash University, Clayton 3800, Australia. minhduc@monash.edu

    Abstract

    BACKGROUND:

    Traditional genome alignment methods treat sequence alignment as a variation of the string edit distance problem and align two sequences by matching their characters. These methods are often computationally expensive and cannot handle low-information regions. Furthermore, they lack a well-principled objective function for evaluating sets of parameters. Since genomic sequences carry genetic information, this article proposes that the information content of the nucleotide at each position should be considered in sequence alignment. An information-theoretic approach to pairwise local genome alignment, namely XMAligner, is presented. Instead of comparing sequences at the character level, XMAligner considers a pair of nucleotides from two sequences to be related if their mutual information in context is significant. The information content of nucleotides in sequences is measured by a lossless compression technique.
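
    The idea of measuring information content by compression can be illustrated with a minimal sketch. It is not XMAligner's actual model: a simple adaptive order-k Markov model stands in for the lossless compressor, and the function names (per_base_bits, mutual_information), the default order, and the example sequence are all hypothetical. The cost of a base is the number of bits needed to encode it given its preceding context, and the mutual information is approximated as the bits saved when the model has already seen the other sequence.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumption: sequences contain only these four bases


def _new_model():
    # One count table per context, starting at 1 per base (Laplace smoothing).
    return defaultdict(lambda: {base: 1 for base in ALPHABET})


def _update(model, seq, i, order):
    # Record that seq[i] followed the (up to) `order` bases before it.
    context = seq[max(0, i - order):i]
    model[context][seq[i]] += 1


def per_base_bits(target, order=3, background=""):
    """Bits needed to encode each base of `target` with an adaptive
    order-k Markov model, optionally primed on `background` first."""
    model = _new_model()
    for i in range(len(background)):
        _update(model, background, i, order)
    bits = []
    for i in range(len(target)):
        table = model[target[max(0, i - order):i]]
        probability = table[target[i]] / sum(table.values())
        bits.append(-math.log2(probability))
        _update(model, target, i, order)  # learn only after encoding
    return bits


def mutual_information(source, target, order=3):
    """Per-base bits saved on `target` when `source` is already known."""
    alone = per_base_bits(target, order)
    given = per_base_bits(target, order, background=source)
    return [a - g for a, g in zip(alone, given)]


if __name__ == "__main__":
    x = "ACGTTGCAAGCTTAGGCTAACGTAGCTAGGATCCGATTACA"  # made-up sequence
    gain = mutual_information(x, x)
    print(f"total bits saved: {sum(gain):.2f}")
```

    In this sketch, positions where the saving is large are the ones a compression-based aligner would treat as related. What counts as a significant saving is left open here, since the abstract does not state a threshold.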

    RESULTS:

    Experiments on both simulated and real data show that XMAligner is superior to conventional methods, especially on distantly related sequences and statistically biased data. XMAligner can align sequences of eukaryote genome size with only modest hardware requirements. Importantly, the method has an objective function that can obviate the need to choose parameter values for high-quality alignment. The alignment results from XMAligner can be integrated into a visualisation tool for viewing.

    CONCLUSIONS:

    The information-theoretic approach to sequence alignment is shown to overcome the aforementioned problems of conventional character-matching alignment methods. The article shows that, because genomic sequences carry information, considering the information content of nucleotides is helpful for genomic sequence alignment.

    AVAILABILITY:

    Downloadable binaries, documentation and data can be found at ftp://ftp.infotech.monash.edu.au/software/DNAcompress-XM/XMAligner/.

    PMID:
    21159205
    [PubMed - indexed for MEDLINE]
    PMCID:
    PMC3022628
    Free PMC Article
