IEEE/ACM Trans Comput Biol Bioinform. 2018 Jan 5. doi: 10.1109/TCBB.2018.2789909. [Epub ahead of print]

GapReduce: a gap filling algorithm based on partitioned read sets.

Abstract

Advances in sequencing and assembly technologies have made draft sequences available for a growing number of genomes. However, these draft sequences commonly contain gaps, which affect various downstream analyses in biological studies. Gap filling methods can shorten gaps and improve the completeness of draft genome sequences. Although several gap filling tools have been developed, their effectiveness and accuracy need improvement. In this study, we develop a novel tool, called GapReduce, which fills gaps using paired reads. For each gap, GapReduce selects the reads whose mates align to the left or the right flanking region, and partitions these reads into two sets. GapReduce then uses different k values and k-mer frequency thresholds to iteratively construct De Bruijn graphs, which are used to find the correct path to fill the gap. To overcome the branching problems caused by repetitive regions and sequencing errors during path selection, GapReduce introduces a novel approach that simultaneously considers the frequency and distribution of paired reads based on the partitioned read sets. We compare the performance of GapReduce with that of current popular gap filling tools. The experimental results demonstrate that GapReduce produces satisfactory gap filling results, especially for datasets with long insert sizes. GapReduce is publicly available for download at https://github.com/bioinfomaticsCSU/GapReduce.
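The iterative graph construction described in the abstract can be illustrated with a minimal sketch. This is not GapReduce's implementation: it takes the varied values to be k-mer lengths, and the function names (`build_de_bruijn`, `extend_unambiguous`, `iterative_fill`), the order in which parameters are tried, and the rule of stopping at any branch are all assumptions made for illustration. The partitioning of paired reads and the branch-resolution step based on read frequency and distribution are not modeled here.

```python
from collections import Counter, defaultdict


def build_de_bruijn(reads, k, min_freq):
    """Build a De Bruijn graph from k-mers seen at least min_freq times.

    Nodes are (k-1)-mers; each retained k-mer adds an edge prefix -> suffix.
    """
    counts = Counter(
        read[i:i + k] for read in reads for i in range(len(read) - k + 1)
    )
    graph = defaultdict(set)
    for kmer, freq in counts.items():
        if freq >= min_freq:
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_unambiguous(graph, start, max_steps=1000):
    """Walk from `start` while each node has exactly one successor.

    Stops at a dead end, a branch, or a revisited node (hypothetical rule;
    a real tool would try to resolve branches instead of stopping).
    """
    path = start
    node = start
    seen = {start}
    for _ in range(max_steps):
        successors = graph.get(node, set())
        if len(successors) != 1:
            break
        node = next(iter(successors))
        if node in seen:
            break
        seen.add(node)
        path += node[-1]
    return path


def iterative_fill(reads, left_flank, k_values, freq_thresholds):
    """Try each (k, threshold) pair and keep the longest extension found.

    Assumes every k is at least 2 and len(left_flank) >= k - 1.
    """
    best = ""
    for k in k_values:
        for min_freq in freq_thresholds:
            graph = build_de_bruijn(reads, k, min_freq)
            start = left_flank[-(k - 1):]
            extension = extend_unambiguous(graph, start)[k - 1:]
            if len(extension) > len(best):
                best = extension
    return best
```

In this sketch, a strict frequency threshold removes low-coverage k-mers that may be sequencing errors but can also break the path, which is one reason to retry with other parameter settings.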

PMID:
29993951
DOI:
10.1109/TCBB.2018.2789909
