Format

Send to

Choose Destination
ISRN Bioinform. 2012 Apr 23;2012:816402. doi: 10.5402/2012/816402. eCollection 2012.

Enhancing de novo transcriptome assembly by incorporating multiple overlap sizes.

Author information

1
Department of Computer Science and Information Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Rd., Taipei 10617, Taiwan.
2
Institute of Plant and Microbial Biology, Academia Sinica, 128 Academia Road, Section 2, Nankang, Taipei 115, Taiwan.
3
Institute of Information Science, Academia Sinica, 128 Academia Road, Section 2, Nankang, Taipei 115, Taiwan.

Abstract

BACKGROUND:

The emergence of next-generation sequencing platform gives rise to a new generation of assembly algorithms. Compared with the Sanger sequencing data, the next-generation sequence data present shorter reads, higher coverage depth, and different error profiles. These features bring new challenging issues for de novo transcriptome assembly.

METHODOLOGY:

To explore the influence of these features on assembly algorithms, we studied the relationship between read overlap size, coverage depth, and error rate using simulated data. According to the relationship, we propose a de novo transcriptome assembly procedure, called Euler-mix, and demonstrate its performance on a real transcriptome dataset of mice. The simulation tool and evaluation tool are freely available as open source.

SIGNIFICANCE:

Euler-mix is a straightforward pipeline; it focuses on dealing with the variation of coverage depth of short reads dataset. The experiment result showed that Euler-mix improves the performance of de novo transcriptome assembly.

Supplemental Content

Full text links

Icon for PubMed Central
Loading ...
Support Center