In this issue

Influenza Database and Tools

Trace Archives at 1 Billion

Entrez Nucleotide Split Database

Third Party Annotation Database

RefSeq Release 18

1918 Killer Flu Virus


GenBank Release 155

Mammoths and Moas at NCBI

Recent NCBI Publications

NCBI Papers Most Cited

NCBI Courses


Genome Builds and Map Viewer


Trace Archive Tops 1 Billion Records

NCBI’s Trace Archive now contains over 1.2 billion entries, making it one of the largest publicly accessible biological databases in the world. It is also one of the most valuable to the medical research community, because it holds the genetic blueprints of hundreds of organisms used in biomedical research.

A Collaborative Effort

The Trace Archive was established in 2001 as a collaborative effort between NCBI and the European Molecular Biology Laboratory (EMBL/ENSEMBL) to collect the raw data produced at sequencing centers around the world. Today, these data are submitted to one of two central processing centers: NCBI or the Wellcome Trust Sanger Centre. The archive has doubled in size every 10 months since 2001 and now holds 22 trillion bytes, enough to fill a stack of compact discs 10 stories high. New sequencing technologies promise an even sharper increase in data volume. NCBI works closely with the groups pioneering these techniques to develop the necessary processing, storage, and retrieval technologies before the anticipated influx of data arrives.
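The storage and growth figures above can be checked with some back-of-envelope arithmetic. The Python sketch below is for illustration only. The 650 MB capacity and 1.2 mm thickness per disc are assumed values for a standard CD, not NCBI figures, and the helper names are hypothetical. Growth under a fixed doubling period T follows S(t) = S0 · 2^(t/T).

```python
import math

# Back-of-envelope check of the Trace Archive figures.
# ASSUMED values (not from NCBI): 650 MB per CD, 1.2 mm per bare disc.
ARCHIVE_BYTES = 22e12        # "22 trillion bytes", decimal units
CD_CAPACITY_BYTES = 650e6    # assumed CD-ROM capacity
CD_THICKNESS_M = 1.2e-3      # assumed thickness of one disc, in metres


def discs_needed(total_bytes: float, per_disc: float) -> int:
    """Whole discs required to hold total_bytes (rounded up)."""
    return math.ceil(total_bytes / per_disc)


def stack_height_m(n_discs: int, thickness_m: float = CD_THICKNESS_M) -> float:
    """Height of a stack of n_discs bare discs, in metres."""
    return n_discs * thickness_m


def projected_size(current: float, months: float, doubling_months: float = 10.0) -> float:
    """Size after `months` at a constant doubling period: S0 * 2**(t / T)."""
    return current * 2 ** (months / doubling_months)


if __name__ == "__main__":
    n = discs_needed(ARCHIVE_BYTES, CD_CAPACITY_BYTES)
    print(f"{n} discs, stack about {stack_height_m(n):.1f} m tall")
    print(f"growth per year: {projected_size(1.0, 12):.2f}x")
```

Under these assumptions the stack comes to roughly 40 m, which is about 10 stories if each story is taken to be around 4 m. A 10-month doubling time works out to growth of about 2.3× per year.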

Traces are Pieces of a Puzzle

NCBI’s Trace Archive provides direct access to the raw traces, typically between 300 and 1,000 DNA letters in length.

Researchers can view and evaluate over 850 assemblies of trace-derived influenza virus sequences, such as the one shown in Fig. 1.

Figure 1. Assembly Viewer display for a neuraminidase assembly from influenza virus traces. The overlapping traces comprising the assembly are shown in panel A. Detailed alignments are shown in panel B with mismatches highlighted. The assembled virus sequence is one of over 850 trace assemblies available in NCBI's Assembly Archive.

These assemblies are stored in the Assembly Archive, a database that builds on the sequences in the Trace Archive to provide a higher-level view.
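Assembling traces is a lot like fitting puzzle pieces together: reads whose ends overlap are merged into a longer contiguous sequence. The sketch below is a toy greedy overlap merger in Python that illustrates the idea. It is not the method NCBI uses to build the Assembly Archive. It assumes error-free reads from a single strand, with no read contained inside another. Real assemblers must also handle sequencing errors, base-quality scores, and reverse complements. The function names and the `min_len` cutoff are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that matches a prefix of `b`,
    or 0 if no such overlap is at least `min_len` letters long."""
    start = 0
    while True:
        # Look for b's first min_len letters somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # The leftmost hit whose tail matches b's head is the longest overlap.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap.
    Returns the resulting contigs (a single one if everything joins up)."""
    contigs = list(reads)
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs  # no remaining overlaps: stop
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    traces = ["ATGCGTAC", "GTACCTTA", "CTTAGGCA"]
    print(greedy_assemble(traces))
```

Run on the three overlapping example reads, it produces the single contig `ATGCGTACCTTAGGCA`. The greedy choice keeps the sketch short, but it can mis-join reads in repetitive regions, which is one reason production assemblers are far more elaborate.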

A Vital Resource in the Fight Against Disease

Sequencing traces are vital to the hunt for polymorphisms: variations in gene sequences that are linked to disease when they occur in human DNA, or to virulence when they occur in a viral genome. To further support studies of DNA sequence variability, NCBI maintains the core dbSNP database, which holds detailed information on over 25 million genetic variations, predominantly single-letter DNA changes called single nucleotide polymorphisms (SNPs). Combined with dbSNP, the trace data are a boon to medical researchers seeking greater insight into the impact of genetic variation on health. Trace sequences may be searched using MegaBLAST, or via the web-based form at

(see the ‘Mammoth found in Trace Archive’ section of the “Mammoths and Moas. . .” article in this issue.)
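In its simplest form, looking for polymorphisms in aligned traces means finding alignment columns where the reads disagree, much like the highlighted mismatches in panel B of Fig. 1. The Python sketch below flags columns where a second base appears in at least a given fraction of the reads. It is a simplification for illustration, not how dbSNP variants are called. It ignores base-quality scores and assumes the reads are already aligned to equal length, with `-` marking gaps. The function name and the 0.2 default threshold are hypothetical.

```python
from collections import Counter


def variable_columns(aligned_reads: list[str], min_minor_fraction: float = 0.2):
    """Return (1-based column, base counts) for columns where the second most
    common base makes up at least min_minor_fraction of the A/C/G/T calls."""
    length = len(aligned_reads[0])
    if any(len(r) != length for r in aligned_reads):
        raise ValueError("reads must be aligned to the same length")
    hits = []
    for col in range(length):
        # Count only unambiguous bases; gaps ('-') and 'N' are skipped.
        counts = Counter(r[col] for r in aligned_reads if r[col] in "ACGT")
        if len(counts) < 2:
            continue  # uniform (or empty) column: no candidate variant
        total = sum(counts.values())
        minor = counts.most_common(2)[1][1]  # count of the second most common base
        if minor / total >= min_minor_fraction:
            hits.append((col + 1, dict(counts)))
    return hits


if __name__ == "__main__":
    reads = ["ACGTAC", "ACGTAC", "ACTTAC", "ACTTA-"]
    print(variable_columns(reads))
```

On the four example reads, it reports only column 3, where two reads carry G and two carry T. The gap in the last read's final column is ignored.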

NCBI News: Spring 2003