To evaluate the performance of de novo assemblers and sequence clustering tools on metagenomic data, genomic DNA from 20 known bacterial strains was mixed at three different abundance ratios. For each mixture sample, paired-end (insert size, 550 bp) and mate-pair (insert size, 4 kbp and 8 kbp) libraries were constructed and sequenced on Illumina sequencers. The resulting datasets are treated as synthetic human gut data. The strains included are as follows: Bacteroides caccae ATCC43185 JCM9498, Parabacteroides distasonis ATCC8503, Bacteroides eggerthii ATCC27754 DSM20697, Bacteroides fragilis YCH46, Parabacteroides merdae ATCC43184 JCM9497, Bacteroides ovatus ATCC8483, Bacteroides stercoris ATCC43183 JCM9496, Bacteroides thetaiotaomicron VPI-5482, Bacteroides uniformis ATCC8492, Bacteroides vulgatus ATCC8482, Clostridium acetobutylicum ATCC824, Clostridium cellulolyticum ATCC35319 H10, Clostridium difficile 630, Clostridium hylemonae DSM15053 JCM10539, Clostridium pasteurianum ATCC6013 DSM525, Clostridium perfringens ATCC13124, Clostridium ramosum DSM1402 JCM1298, Escherichia coli K12 MG1655, Pseudomonas aeruginosa PAO1, Serratia marcescens Db1.