Bioinformatics. 2014 Jun 15;30(12):i302-9. doi: 10.1093/bioinformatics/btu280.

Ragout: a reference-assisted assembly tool for bacterial genomes.

Author information

1
St. Petersburg University of the Russian Academy of Sciences, Bioinformatics Institute, St. Petersburg, Russia, UCSC, 1156 High Street, Santa Cruz, CA and Department of Computer Science and Engineering, UCSD, 9500 Gilman Drive, La Jolla, CA, USA.
2
St. Petersburg University of the Russian Academy of Sciences, Bioinformatics Institute, St. Petersburg, Russia, UCSC, 1156 High Street, Santa Cruz, CA and Department of Computer Science and Engineering, UCSD, 9500 Gilman Drive, La Jolla, CA, USA.

Abstract

SUMMARY:

Bacterial genomes are simpler than mammalian ones, yet assembling them from the data currently generated by high-throughput short-read sequencing machines still results in hundreds of contigs. To improve assembly quality, recent studies have used longer Pacific Biosciences (PacBio) reads or jumping libraries to connect contigs into larger scaffolds or to help assemblers resolve ambiguities in repetitive regions of the genome. However, the popularity of these technologies in contemporary genomic research is still limited by their high cost and error rates. In this work, we explore the possibility of improving assemblies by using complete genomes from closely related species or strains. We present Ragout, a genome rearrangement approach, to address this problem. In contrast with most reference-guided algorithms, which use only one reference genome, Ragout uses multiple references along with the evolutionary relationships among them to determine the correct order of the contigs. Additionally, Ragout uses the assembly graph and multi-scale synteny blocks to reduce assembly gaps caused by small contigs in the input assembly. Based on results from simulations as well as real datasets, we believe that for common bacterial species, for which many complete genome sequences from related strains are available, the current high-throughput short-read sequencing paradigm is sufficient to obtain a single high-quality scaffold for each chromosome.
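
To make the idea of ordering contigs with several references more concrete, the following toy sketch in Python may help. It is not Ragout's algorithm: the abstract describes a genome rearrangement approach that uses the evolutionary relationships among references, whereas this sketch substitutes a much simpler weighted adjacency vote followed by greedy chaining. The function names, the example contig orders, and the per-reference weights (a hypothetical stand-in for evolutionary closeness to the target) are all assumptions made for illustration, and contig orientation is ignored.

```python
from collections import defaultdict


def vote_adjacencies(reference_orders, weights):
    """Sum the weight of every reference in which two contigs are neighbours.

    reference_orders: hypothetical mapping of reference name -> list of contig
    names in the order they map onto that reference.
    weights: hypothetical mapping of reference name -> closeness to the target.
    """
    scores = defaultdict(float)
    for name, order in reference_orders.items():
        for left, right in zip(order, order[1:]):
            if left != right:
                scores[frozenset((left, right))] += weights[name]
    return scores


def greedy_scaffold(contigs, scores):
    """Chain contigs by accepting the highest-scoring adjacencies first.

    A contig may have at most two neighbours, and a union-find structure
    rejects adjacencies that would close a cycle, so every result is a path.
    """
    degree = {c: 0 for c in contigs}
    parent = {c: c for c in contigs}
    neighbours = defaultdict(list)

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
    for pair, _score in ranked:
        a, b = sorted(pair)
        if a not in degree or b not in degree:
            continue  # adjacency mentions a contig outside the assembly
        if degree[a] >= 2 or degree[b] >= 2 or find(a) == find(b):
            continue  # would branch a chain or close a cycle
        parent[find(a)] = find(b)
        degree[a] += 1
        degree[b] += 1
        neighbours[a].append(b)
        neighbours[b].append(a)

    # Walk each chain starting from one of its endpoints.
    chains, seen = [], set()
    for start in sorted(contigs):
        if start in seen or degree[start] == 2:
            continue
        chain, prev, cur = [start], None, start
        seen.add(start)
        while True:
            nxt = [n for n in neighbours[cur] if n != prev]
            if not nxt:
                break
            prev, cur = cur, nxt[0]
            chain.append(cur)
            seen.add(cur)
        chains.append(chain)
    return chains


# Hypothetical input: three references, two of which agree on an inversion.
reference_orders = {
    "ref1": ["A", "B", "C", "D"],
    "ref2": ["A", "C", "B", "D"],
    "ref3": ["A", "C", "B", "D"],
}
weights = {"ref1": 0.5, "ref2": 0.3, "ref3": 0.3}

scaffolds = greedy_scaffold(["A", "B", "C", "D"],
                            vote_adjacencies(reference_orders, weights))
print(scaffolds)  # [['A', 'C', 'B', 'D']]
```

In this example the two lower-weight references outvote the single higher-weight one, so the chain follows their order. A single-reference approach would simply copy whichever reference it was given, which is the limitation the abstract contrasts Ragout against.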

AVAILABILITY:

The Ragout software is freely available at: https://github.com/fenderglass/Ragout.

PMID:
24931998
PMCID:
PMC4058940
DOI:
10.1093/bioinformatics/btu280
[Indexed for MEDLINE]
Free PMC Article
