The goal of this project is to provide sequence data for benchmarking metagenomics protocols. We mixed DNA from ten organisms with completed genome sequences and sequenced libraries constructed from this mixture. The data comprise three small-insert libraries, sequenced by Sanger (capillary-based) sequencing, with ~12,000 sequences each, and one 454 run with ~500,000 sequences.
The ten organisms are:
Acidothermus cellulolyticus 11B (NCBI Reference Sequence: NC_008578.1)
Shewanella amazonensis SB2B (NCBI Reference Sequence: NC_008700.1)
Pediococcus pentosaceus ATCC 25745 (NCBI Reference Sequence: NC_008525.1)
Lactobacillus casei ATCC 334 (NCBI Reference Sequence: NC_008526.1)
Lactococcus lactis subsp. cremoris SK11 (NCBI Reference Sequence: NC_008527.1)
Lactobacillus brevis ATCC 367 (NCBI Reference Sequence: NC_008497.1)
Lactococcus lactis subsp. lactis Il1403 (NCBI Reference Sequence: NC_002662.1)
Halobacterium sp. NRC-1 (NCBI Reference Sequence: NC_002607.1)
Myxococcus xanthus DK 1622 (NCBI Reference Sequence: NC_008095.1)
Saccharomyces cerevisiae (NCBI Reference Sequences: NC_001133-48)
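The Saccharomyces cerevisiae entry gives its reference sequences as a shorthand range rather than as individual accessions. The following is a minimal sketch of how that shorthand could be expanded for scripted retrieval. It assumes that "NC_001133-48" abbreviates NC_001133 through NC_001148, with the trailing digits replacing the end of the first number. The function name expand_accession_range is hypothetical and not part of any NCBI tool, and version suffixes are not given in the list for this entry, so none are added.

```python
import re
from typing import List


def expand_accession_range(shorthand: str) -> List[str]:
    """Expand a shorthand such as 'NC_001133-48' into individual accessions.

    Assumption: the digits after the hyphen replace the same number of
    trailing digits in the starting accession number.
    Anything that does not match the shorthand pattern is returned unchanged.
    """
    match = re.fullmatch(r"([A-Z]{2}_)(\d+)-(\d+)", shorthand)
    if match is None:
        return [shorthand]
    prefix, start, suffix = match.groups()
    if len(suffix) > len(start):
        return [shorthand]
    # Build the full end number by swapping in the trailing digits.
    end = start[: len(start) - len(suffix)] + suffix
    width = len(start)  # keep the zero padding of the original accession
    return [f"{prefix}{n:0{width}d}" for n in range(int(start), int(end) + 1)]


yeast_accessions = expand_accession_range("NC_001133-48")
print(len(yeast_accessions), yeast_accessions[0], yeast_accessions[-1])
```

Accessions that are already complete, such as NC_008578.1, pass through unchanged, so the same function can be applied to every entry in the list.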
Sequences deposited into the Sequence Read Archive can be found using the Project data link.