Global Ocean Sampling Expedition Microbial Metagenome

Microbial DNA was isolated and sequenced from seawater samples collected during a global circumnavigation aboard the Sorcerer II, with the aim of analyzing the genomic and functional diversity of marine microbial communities. Starting in Halifax, Canada, samples were collected at sites along the U.S. east coast, the Gulf of Mexico, the Galapagos Islands, the central and south Pacific Ocean, Australia, the Indian Ocean, South Africa, and across the Atlantic back to the U.S. The resulting dataset allows investigators to study the genetic and biochemical diversity of microbes in the marine environment.


A total of 41 different samples were taken from a variety of aquatic habitats across 8,000 km. Size-fractionated samples produced 7.7 million sequencing reads, yielding 6.4 million contiguous sequences that total 5.9 Gbp of nonredundant sequence. These were further processed into about 3 million assemblies (scaffolds).

Isolation Source

Samples were collected as part of the Sorcerer II expedition between August 8, 2003, and May 22, 2004. Most specimens were collected from marine surface waters at intervals of approximately 320 km. In total, 44 samples were obtained from 41 sites, covering a wide range of distinct surface marine environments as well as a few nonmarine aquatic samples for contrast.

Genome Assembly

The 4,124,495 contigs were further assembled into 3,087,206 WGS scaffolds using an overlap cutoff of 98%. At the 98% identity cutoff, 85% of the assembled sequences and 57% of the unassembled data are unique. Based on clustering and HMM profiling, more than 6.1 million proteins were annotated in this dataset, including bacterial as well as viral sequences. These are defined as "marine metagenome" within the source. Sixty highly abundant ribotypes were identified and found to be associated with open ocean and aquatic samples.
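As an illustration of what a 98% identity cutoff means in practice, the following minimal sketch computes percent identity over an already-aligned, gap-free overlap and checks it against the threshold. It is not the assembler used for this dataset; the function names percent_identity and passes_cutoff, the toy sequences, and the simple position-by-position comparison are all assumptions made for the example.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of matching positions between two equal-length, pre-aligned sequences.

    Assumption for this sketch: the overlap is already aligned and contains
    no gaps, so identity is a plain position-by-position comparison.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must be non-empty")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def passes_cutoff(a: str, b: str, cutoff: float = 98.0) -> bool:
    """Return True if the overlap identity meets or exceeds the cutoff (hypothetical helper)."""
    return percent_identity(a, b) >= cutoff


# Toy 100-base overlap: one mismatch gives 99% identity, three give 97%.
reference = "ACGT" * 25
one_mismatch = "T" + reference[1:]
three_mismatches = "TTT" + reference[3:]

print(passes_cutoff(reference, one_mismatch))      # 99% identity, meets the cutoff
print(passes_cutoff(reference, three_mismatches))  # 97% identity, below the cutoff
```

Under this definition, two 100-base overlapping regions may differ at no more than two positions to be merged at a 98% cutoff. Real assemblers also have to handle gaps, quality values, and overlap length, which this sketch ignores.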


Sequence Analysis

