Bioinformatics. 2016 Sep 1;32(17):i545-i551. doi: 10.1093/bioinformatics/btw463.

CoLoRMap: Correcting Long Reads by Mapping short reads.

Author information

1 School of Computing Sciences MADD-Gen Graduate Program, Simon Fraser University, Burnaby, BC V5A 1S6, Canada.
2 School of Computing Sciences Vancouver Prostate Centre, Vancouver, BC V6H 3Z6, Canada.
3 School of Computing Sciences Vancouver Prostate Centre, Vancouver, BC V6H 3Z6, Canada, School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA.
4 Department of Mathematics, Simon Fraser University, Burnaby, BC V5A 1S6, Canada.

Abstract

MOTIVATION:

Second-generation sequencing technologies paved the way for an exceptional increase in the number of sequenced genomes, both prokaryotic and eukaryotic. However, short reads are difficult to assemble and often lead to highly fragmented assemblies. Recent developments in long-read sequencing methods offer a promising way to address this issue. So far, however, long reads are characterized by a high error rate, and assembling from long reads requires a high depth of coverage. This motivates the development of hybrid approaches that leverage the high quality of short reads to correct errors in long reads.

RESULTS:

We introduce CoLoRMap, a hybrid method for correcting noisy long reads, such as those produced by PacBio sequencing technology, using high-quality Illumina paired-end reads mapped onto the long reads. Our algorithm is based on two novel ideas: (i) using a classical shortest path algorithm to find a sequence of overlapping short reads that minimizes the edit score to a long read, and (ii) extending corrected regions by local assembly of unmapped mates of mapped short reads. Our results on bacterial, fungal and insect data sets show that CoLoRMap compares well with existing hybrid correction methods.
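
The shortest-path idea can be illustrated with a minimal sketch. It is not the CoLoRMap implementation: the `Aln` record, the `overlaps` rule, and the choice of edge weight (the full edit distance of the next short read) are assumptions made for illustration, since the abstract does not specify how the graph is weighted. Each short-read alignment is a node, an edge joins two alignments that overlap on the long read, and Dijkstra's algorithm finds the chain with the smallest total edit score.

```python
import heapq
from typing import NamedTuple, Optional


class Aln(NamedTuple):
    """Hypothetical record for one short read aligned to the long read."""
    start: int  # first long-read position covered (inclusive)
    end: int    # last long-read position covered (inclusive)
    edits: int  # edit distance between the short read and that segment


def overlaps(a: Aln, b: Aln) -> bool:
    # b starts inside a and extends past its end (assumed overlap rule)
    return a.start < b.start <= a.end < b.end


def min_edit_chain(alns: list[Aln], source: int, target: int
                   ) -> Optional[tuple[int, list[int]]]:
    """Dijkstra over the overlap graph.

    Returns (total edit score, chain of alignment indices), or None if
    no chain of overlapping alignments links source to target.
    """
    dist = {source: alns[source].edits}
    prev: dict[int, int] = {}
    heap = [(dist[source], source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        if u == target:
            break
        for v, b in enumerate(alns):
            if overlaps(alns[u], b):
                nd = d + b.edits  # assumed weight: edit distance of next read
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    prev[v] = u
                    heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


# Toy input: four alignments along a long read (made-up numbers).
example = [Aln(0, 99, 3), Aln(50, 149, 10), Aln(60, 159, 2), Aln(120, 219, 1)]
result = min_edit_chain(example, source=0, target=3)
```

In this toy input the chain through alignment 2 is preferred over the one through alignment 1 because it accumulates fewer edits. Stretches of the long read that no such chain covers are the kind of region the second idea, local assembly of unmapped mates, is meant to address.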

AVAILABILITY AND IMPLEMENTATION:

The source code of CoLoRMap is freely available for non-commercial use at https://github.com/sfu-compbio/colormap

CONTACT:

ehaghshe@sfu.ca or cedric.chauve@sfu.ca

SUPPLEMENTARY INFORMATION:

Supplementary data are available at Bioinformatics online.

PMID: 27587673
DOI: 10.1093/bioinformatics/btw463
[Indexed for MEDLINE]
