Microbiome. 2019 Feb 8;7(1):17. doi: 10.1186/s40168-019-0633-6.

CAMISIM: simulating metagenomes and microbial communities.

Author information

1. Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, 38124, Germany.
2. Formerly Department of Algorithmic Bioinformatics, Heinrich-Heine University Düsseldorf, Düsseldorf, 40225, Germany.
3. German Center for Infection Research (DZIF), partner site Hannover-Braunschweig, Braunschweig, 38124, Germany.
4. Center for Biotechnology and Faculty of Technology, Bielefeld University, Bielefeld, 33615, Germany.
5. The ithree institute, University of Technology Sydney, Sydney NSW, 2007, Australia.
6. Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, 38124, Germany. alice.mchardy@helmholtz-hzi.de.
7. Formerly Department of Algorithmic Bioinformatics, Heinrich-Heine University Düsseldorf, Düsseldorf, 40225, Germany. alice.mchardy@helmholtz-hzi.de.

Abstract

BACKGROUND:

Shotgun metagenome data sets of microbial communities are highly diverse, not only due to the natural variation of the underlying biological systems, but also due to differences in laboratory protocols, replicate numbers, and sequencing technologies. Accordingly, to effectively assess the performance of metagenomic analysis software, a wide range of benchmark data sets are required.

RESULTS:

We describe the CAMISIM microbial community and metagenome simulator. The software can model different microbial abundance profiles, multi-sample time series, and differential abundance studies, includes real and simulated strain-level diversity, and generates second- and third-generation sequencing data from taxonomic profiles or de novo. Gold standards are created for sequence assembly, genome binning, taxonomic binning, and taxonomic profiling. CAMISIM generated the benchmark data sets of the first CAMI challenge. For two simulated multi-sample data sets of the human and mouse gut microbiomes, we observed high functional congruence to the real data. As further applications, we investigated the effect of varying evolutionary genome divergence, sequencing depth, and read error profiles on two popular metagenome assemblers, MEGAHIT and metaSPAdes, using several thousand small data sets generated with CAMISIM.

CONCLUSIONS:

CAMISIM can simulate a wide variety of microbial communities and metagenome data sets together with standards of truth for method evaluation. All data sets and the software are freely available at https://github.com/CAMI-challenge/CAMISIM.

KEYWORDS:

Benchmarking; CAMI; Genome binning; Metagenome assembly; Metagenomics software; Microbial community; Simulation; Taxonomic binning; Taxonomic profiling

PMID: 30736849
PMCID: PMC6368784
DOI: 10.1186/s40168-019-0633-6