EURASIP J Bioinform Syst Biol. 2012 Nov 27;2012(1):18. doi: 10.1186/1687-4153-2012-18.

Optimal reference sequence selection for genome assembly using minimum description length principle.

Author information

Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843-3128, USA. bilalwajidabbas@hotmail.com.

Abstract

Reference-assisted assembly uses a reference sequence as a model to assist in the assembly of a novel genome. The standard method for identifying the best reference sequence counts the number of reads that align to each candidate reference and chooses the reference with the highest count. This article explores the use of the minimum description length (MDL) principle and two of its variants, two-part MDL and sophisticated MDL, in identifying the optimal reference sequence for genome assembly. Comparing the proposed MDL-based scheme with the standard method, the article concludes that "counting the number of reads of the novel genome present in the reference sequence" is not a sufficient condition. The proposed MDL scheme therefore includes the standard method of "counting the number of reads that align to the reference sequence" and goes further by also considering the model, that is, the reference sequence itself, when identifying the optimal reference sequence. The proposed scheme not only provides a sufficient criterion for identifying the optimal reference sequence for genome assembly but also improves the reference sequence so that it becomes more suitable for the assembly of the novel genome.
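
The contrast between the two selection criteria can be illustrated with a minimal Python sketch. It is not the scheme from the article: exact substring matching stands in for read alignment, and the two-part code length (2 bits per reference base for the model, plus a flag and either a position or the raw bases for each read) is a toy encoding assumed here only for illustration. All function names are hypothetical.

```python
import math


def count_aligned_reads(reads, reference):
    """Count reads that occur as exact substrings of the reference.

    Assumption: exact matching is a stand-in for a real read aligner.
    """
    return sum(1 for read in reads if read in reference)


def pick_by_read_count(reads, references):
    """Standard method: name of the reference with the most aligned reads."""
    return max(references, key=lambda name: count_aligned_reads(reads, references[name]))


def two_part_mdl_bits(reads, reference):
    """Toy two-part code length in bits: L(model) + L(data | model).

    Assumed encoding (not taken from the article):
    - model: 2 bits per base of the reference
    - aligned read: 1 flag bit + a position within the reference
    - unaligned read: 1 flag bit + 2 bits per base of the read
    """
    bits_per_base = 2
    model_bits = bits_per_base * len(reference)
    position_bits = max(1, math.ceil(math.log2(len(reference))))
    data_bits = 0
    for read in reads:
        if read in reference:
            data_bits += 1 + position_bits
        else:
            data_bits += 1 + bits_per_base * len(read)
    return model_bits + data_bits


def pick_by_mdl(reads, references):
    """MDL-style choice: name of the reference with the shortest total code."""
    return min(references, key=lambda name: two_part_mdl_bits(reads, references[name]))


# Hypothetical toy data: the long reference matches one more read,
# but most of its bases are unrelated to the reads.
reads = ["ACGT", "CGTA", "GTAC", "TTTT"]
references = {
    "long": "ACGTAC" + "T" * 50,
    "short": "ACGTAC",
}

print(pick_by_read_count(reads, references))
print(pick_by_mdl(reads, references))
```

In this toy data the read count favors the long reference because it matches one extra read, while the description-length score also charges for the size of the reference itself and so favors the short one. This mirrors, in a simplified way, the abstract's point that the read count alone ignores the model.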
