In this issue

Transitioning from LocusLink to Entrez Gene

Cancer Chromosomes: a New Entrez Database

HomoloGene: An Entrez Database with a New Look

BLAST Link (BLink) to Protein Alignments and Structures

Debut of the HCT Database and Anthropology/Allele Frequencies in dbMHC

350kb Sequence Length Limit Removed by Sequence Database Collaboration

New Eukaryotic Genomes at NCBI

Environmental Samples Make Big Splash

HIV Protein-Interaction Database

e-PCR and Reverse e-PCR: Greater Sensitivity, More Options

New Organisms in UniGene

RefSeq Accession Numbers Get Longer as Rat Gets Last 6-digit Accession

Slots available for FieldGuidePlus Training Course Onsite at NCBI

RefSeq Release 6 on FTP Site

Exponential Growth of GenBank Continues with Release 142

Entrez Tools is a 'Hot Spot'

BLAST Lab: Using BLASTClust

New Microbial Genomes in GenBank

Entrez Quiz


Environmental Samples Make Big Splash

Whole Genome Shotgun (WGS) sequencing is now being applied to quickly assemble large sets of genomic sequences taken from organisms inhabiting a particular ecological niche. Sequence data collected in this manner provide a snapshot of the genetic diversity existing at a particular locale and are especially important for organisms that are difficult or impossible to culture in the laboratory. Recently, the Institute for Biological Energy Alternatives sampled water from the Sargasso Sea, one of the best-characterized regions of the world's oceans.1 The larger of two sets of samples collected produced over 1.3 gigabases of sequence in the form of 1.66 million WGS reads. These reads were assembled into contigs containing about 1 gigabase of non-redundant sequence. In addition, over 1 million protein sequences were derived from the annotation of open reading frames on the genomic sequences.

Contigs constructed from the WGS reads, together with the remaining single reads, have been deposited in the WGS division of GenBank under the project accession number AACY01000000. Scaffolds assembled from these contigs are available within the accession ranges CH004436-CH004736 and CH004737-CH236877. The raw sequencing data are available in the Trace Archive.
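As an illustration of working with the scaffold accession ranges quoted above, the following Python sketch tests whether a given accession falls inside either range. The helper names (`split_accession`, `in_scaffold_ranges`, `range_size`) are hypothetical and are not part of any NCBI tool; the sketch assumes accessions of the form "letters followed by digits", with an optional ".version" suffix that is ignored.

```python
import re

# Scaffold accession ranges quoted in the article (both ends inclusive).
SCAFFOLD_RANGES = [("CH004436", "CH004736"), ("CH004737", "CH236877")]

# Assumed accession shape: an alphabetic prefix followed by digits.
_ACCESSION = re.compile(r"^([A-Z]+)(\d+)$")


def split_accession(accession):
    """Return (prefix, number, digit_count) for an accession string."""
    base = accession.strip().upper().split(".")[0]  # drop any ".version" suffix
    match = _ACCESSION.match(base)
    if match is None:
        raise ValueError(f"unrecognized accession: {accession!r}")
    prefix, digits = match.groups()
    return prefix, int(digits), len(digits)


def in_scaffold_ranges(accession):
    """True if the accession lies inside one of the quoted scaffold ranges."""
    prefix, number, width = split_accession(accession)
    for first, last in SCAFFOLD_RANGES:
        first_prefix, low, first_width = split_accession(first)
        _, high, _ = split_accession(last)
        same_series = (prefix, width) == (first_prefix, first_width)
        if same_series and low <= number <= high:
            return True
    return False


def range_size(first, last):
    """Number of accession slots between first and last, inclusive."""
    _, low, _ = split_accession(first)
    _, high, _ = split_accession(last)
    return high - low + 1
```

For example, `in_scaffold_ranges("CH004436")` is true, while the project accession `AACY01000000` is rejected because its prefix differs. Note that `range_size` counts accession slots in a range, which equals the number of scaffolds only if every accession in the range was actually assigned.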

The Sargasso Sea dataset, along with other environmental sample datasets such as sequences from an acid mine drainage biofilm submitted by the DOE Joint Genome Institute,2 can be queried using the new “Environmental Samples” BLAST page.

Environmental sample data can also be searched using two newly created standard BLAST databases, “env_nt” and “env_nr”, for nucleotide and protein sequences respectively. The environmental sample data in these two new databases are no longer included in the “nt” or “nr” BLAST databases.
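Because environmental sequences now live only in “env_nt” and “env_nr”, a search script has to choose the database by both sequence type and data source. The Python sketch below shows one way to make that choice; the function `pick_database` and its `environmental` flag are hypothetical, and the only assumption beyond the text is the standard pairing of BLAST programs with nucleotide or protein databases.

```python
# Database names taken from the article.
ENVIRONMENTAL_DATABASES = {"nucleotide": "env_nt", "protein": "env_nr"}
GENERAL_DATABASES = {"nucleotide": "nt", "protein": "nr"}

# Type of database each standard BLAST program searches against.
PROGRAM_DATABASE_TYPE = {
    "blastn": "nucleotide",   # nucleotide query vs. nucleotide database
    "blastp": "protein",      # protein query vs. protein database
    "blastx": "protein",      # translated nucleotide query vs. protein database
    "tblastn": "nucleotide",  # protein query vs. translated nucleotide database
    "tblastx": "nucleotide",  # translated query vs. translated nucleotide database
}


def pick_database(program, environmental=True):
    """Return the database name to search for a given BLAST program.

    Hypothetical helper: environmental=True selects env_nt/env_nr,
    environmental=False selects the general nt/nr databases.
    """
    try:
        database_type = PROGRAM_DATABASE_TYPE[program.lower()]
    except KeyError:
        raise ValueError(f"unknown BLAST program: {program!r}") from None
    table = ENVIRONMENTAL_DATABASES if environmental else GENERAL_DATABASES
    return table[database_type]
```

A script that wants hits from both cultured and environmental sources would call the helper twice, once with `environmental=True` and once with `environmental=False`, and run a search against each database, since the two sets no longer overlap.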


1 Venter, J.C., et al., Environmental Genome Shotgun Sequencing of the Sargasso Sea, Science, 2004 Apr 2;304(5667):66-74. PMID 15001713
2 Tyson, G.W., et al., Community structure and metabolism through reconstruction of microbial genomes from the environment, Nature, 2004 Mar 4;428(6978):37-43. PMID 14961025


NCBI News | Fall/Winter 2002 NCBI News: Spring 2003