Bioinformatics. 2013 Nov 15;29(22):2826-34. doi: 10.1093/bioinformatics/btt502. Epub 2013 Sep 20.

Exploring variation-aware contig graphs for (comparative) metagenomics using MaryGold.

Author information

Department of Intelligent Systems, The Delft Bioinformatics Lab, Delft University of Technology, 2628 CD Delft, The Netherlands; Kluyver Centre for Genomics of Industrial Fermentation, 2600 GA Delft, The Netherlands; and Department of Computer Science, Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA.

Abstract

MOTIVATION:

Although many tools are available to study variation and its impact in single genomes, there is a lack of algorithms for finding such variation in metagenomes. This hampers the interpretation of metagenomics sequencing datasets, which are increasingly acquired in research on the (human) microbiome, in environmental studies and in the study of processes in the production of foods and beverages. Existing algorithms often depend on the use of reference genomes, which pose a problem when a metagenome of a priori unknown strain composition is studied. In this article, we develop a method to perform reference-free detection and visual exploration of genomic variation, both within a single metagenome and between metagenomes.

RESULTS:

We present the MaryGold algorithm and its implementation, which efficiently detects bubble structures in contig graphs using graph decomposition. These bubbles represent variable genomic regions in closely related strains in metagenomic samples. The variation found is presented in a condensed Circos-based visualization, which allows for easy exploration and interpretation. We validated the algorithm on two simulated datasets containing three and seven Escherichia coli genomes, respectively, and showed that finding allelic variation in these genomes improves assemblies. Additionally, we applied MaryGold to publicly available real metagenomic datasets, enabling us to find within-sample genomic variation in the metagenomes of a kimchi fermentation process, the microbiome of a premature infant and in microbial communities living on acid mine drainage. Moreover, we used MaryGold for between-sample variation detection and exploration by comparing sequencing data sampled at different time points for both of these datasets.
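
To make the notion of a bubble in a contig graph concrete, the following is a minimal Python sketch that finds only the simplest kind of bubble: a source contig whose outgoing branches are single contigs that all rejoin at the same sink contig. It is an illustration only and is not the MaryGold algorithm, which the abstract describes as using graph decomposition. The function name `find_simple_bubbles`, the edge-list input format and the contig labels are assumptions made for this example.

```python
from collections import defaultdict


def find_simple_bubbles(edges):
    """Return (source, sink, branches) for each simple bubble.

    Hypothetical helper for illustration; `edges` is a list of
    (from_contig, to_contig) pairs describing a directed contig graph.
    """
    succ = defaultdict(set)  # contig -> successors
    pred = defaultdict(set)  # contig -> predecessors
    for u, v in edges:
        succ[u].add(v)
        pred[v].add(u)

    bubbles = []
    # sorted() copies the keys, so lookups below cannot disturb the iteration
    for source in sorted(succ):
        branches = sorted(succ[source])
        if len(branches) < 2:
            continue  # no divergence at this contig
        # Each branch must be reached only from the source and must
        # continue to exactly one contig.
        if any(len(pred[b]) != 1 or len(succ[b]) != 1 for b in branches):
            continue
        sinks = {next(iter(succ[b])) for b in branches}
        if len(sinks) == 1:
            sink = sinks.pop()
            if sink != source:
                bubbles.append((source, sink, branches))
    return bubbles


# Toy graph: contigs B and C are alternative variants between A and D.
toy_edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
print(find_simple_bubbles(toy_edges))
```

In this toy graph, B and C would correspond to two alleles of the same region carried by closely related strains. Bubbles whose branches span several contigs or are nested inside one another are not found by this sketch and need a more general approach.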

AVAILABILITY:

MaryGold has been written in C++ and Python and can be downloaded from http://bioinformatics.tudelft.nl/software

PMID: 24058058
PMCID: PMC3916741
DOI: 10.1093/bioinformatics/btt502
[Indexed for MEDLINE]
Free PMC Article