SNP Population Grows at NCBI

SNPs, or single nucleotide polymorphisms, are variations in genomic DNA sequences within a population of organisms. These genetic changes occur at a frequency of over 1 percent in the human genome, and are important because they are sometimes linked to heritable phenotypes. Knowledge of SNPs is useful for physical mapping, disease association, and surveys of population structure. The dbSNP database was developed at NCBI to facilitate the management of SNP data, integrate this data with other NCBI resources, and distribute the information to the scientific community.

Composition of the Database

The data in dbSNP includes SNPs, microsatellite repeats, and small insertion/deletion polymorphisms. There is no minimum allele frequency or requirement that a SNP result in a measurable phenotype for submission to dbSNP, and a large portion of the polymorphisms in the database are neutral polymorphisms. Currently, dbSNP contains predominantly human data, but variation information for several other organisms can be found in the database. Release 106 of dbSNP contained 4.5 million SNPs, and the database is growing at a rate of 90 SNPs per month.

Although dbSNP accepts submissions from any laboratory or individual, the bulk of the submissions are derived from large-scale contributors associated with the National Human Genome Research Institute’s (NHGRI) grants program that aims to catalog 50,000 SNPs by 2005. SNPs are submitted to dbSNP using a special procedure that involves registering a submission “handle” with the NCBI SNP group, followed by the preparation of a set of structured submission files. Instructions on how to submit to dbSNP are located on the dbSNP home page. Each SNP in the database is given an identifier beginning with “ss”, for “submitted SNP”. If there are multiple submissions for the same SNP, then a reference SNP cluster is created, to incorporate information from the multiple submitters. The reference SNP clusters, given “rs” identifiers, are used in the annotation of reference genome sequences.
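The relationship between "ss" and "rs" identifiers can be illustrated with a minimal sketch. It assumes that two submissions count as the same SNP when they share a locus key; the key format, the function name, and the cluster numbering are hypothetical and do not reflect how dbSNP actually matches submissions or assigns IDs. For simplicity the sketch gives every locus a cluster, including loci with a single submission.

```python
from collections import defaultdict

def cluster_submissions(submissions):
    """Group submitted SNPs (ss IDs) that describe the same locus.

    `submissions` is a list of (ss_id, locus_key) pairs. The locus key is a
    hypothetical stand-in for whatever criterion decides that two
    submissions refer to the same SNP.
    """
    by_locus = defaultdict(list)
    for ss_id, locus_key in submissions:
        by_locus[locus_key].append(ss_id)

    clusters = {}
    # Number clusters in order of first appearance (illustrative numbering only).
    for n, ss_ids in enumerate(by_locus.values(), start=1):
        clusters[f"rs{n}"] = sorted(ss_ids)
    return clusters

# Two laboratories report the same locus; a third reports a different one.
example = [
    ("ss101", "chr1:1000"),
    ("ss102", "chr2:500"),
    ("ss103", "chr1:1000"),
]
clusters = cluster_submissions(example)
```

Here the two submissions at the shared locus end up under one reference cluster, which is the identifier that would then be used when annotating a reference sequence.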

The SNP Record

A SNP record contains the observed alleles at a particular locus, the flanking sequence that surrounds the variation, the experimental method used to assay the variation, including protocols and conditions, and cross-references to associated GenBank records or UniGene clusters. Other types of data that can be included are genetic map locations, population-specific frequencies, individual-specific genotype information, relevant publications that document the details of the methodologies or populations, known genes in the region, synonyms for a submitter’s SNP ID used in the submission, and validation information to describe the quality of the frequency data.
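As a rough picture of how these pieces fit together, a SNP record might be modeled as follows. This is only a sketch: the class name, field names, and display format are assumptions chosen for illustration, not the dbSNP submission format or schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class SNPRecord:
    # Core data described in the text (field names are hypothetical).
    ss_id: str                       # submitted SNP identifier
    alleles: List[str]               # observed alleles at the locus
    flank_5prime: str                # flanking sequence before the variation
    flank_3prime: str                # flanking sequence after the variation
    assay_method: str                # experimental method used for the assay
    cross_references: List[str] = field(default_factory=list)

    # Optional data that a record may also carry.
    map_location: Optional[str] = None
    population_frequencies: Dict[str, Dict[str, float]] = field(default_factory=dict)
    publications: List[str] = field(default_factory=list)
    synonyms: List[str] = field(default_factory=list)
    validation: Optional[str] = None

    def display(self) -> str:
        """Show the alleles in the context of their flanking sequence."""
        return f"{self.flank_5prime}[{'/'.join(self.alleles)}]{self.flank_3prime}"

# Made-up example values, not a real dbSNP entry.
record = SNPRecord(
    ss_id="ss0000001",
    alleles=["A", "G"],
    flank_5prime="ACGT",
    flank_3prime="TTGA",
    assay_method="hypothetical assay",
)
shown = record.display()
```

Keeping the optional fields empty by default mirrors the text: only the alleles, flanking sequence, method, and cross-references are described as part of every record.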

Searching dbSNP

Searches of dbSNP may be limited to Entrez fields such as allele variation, validation status, chromosome on which the SNP is mapped, and many others. SNP records retrieved in Entrez are displayed in a summary format tailored to the structure of a SNP record. There are, however, several additional display formats, such as a graphical summary and a chromosome report. Entrez SNP results may also be sorted by various fields including organism, SNP ID, success rate, and heterozygosity.
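The idea of limiting a search by field and then sorting the results can be sketched locally, without contacting Entrez. The dictionary keys and the function name below are hypothetical and are not Entrez field names; the records are made up.

```python
def search_snps(records, chromosome=None, validated=None, sort_by="snp_id"):
    """Filter SNP summaries by optional field limits, then sort the hits.

    A limit left as None is not applied. `sort_by` names the key to sort on.
    """
    hits = [
        r for r in records
        if (chromosome is None or r["chromosome"] == chromosome)
        and (validated is None or r["validated"] == validated)
    ]
    return sorted(hits, key=lambda r: r[sort_by])

# Made-up summaries for illustration only.
records = [
    {"snp_id": "rs3", "chromosome": "7", "validated": True, "heterozygosity": 0.42},
    {"snp_id": "rs1", "chromosome": "7", "validated": False, "heterozygosity": 0.10},
    {"snp_id": "rs2", "chromosome": "X", "validated": True, "heterozygosity": 0.25},
    {"snp_id": "rs4", "chromosome": "7", "validated": True, "heterozygosity": 0.05},
]

# Validated SNPs on chromosome 7, ordered by heterozygosity.
hits = search_snps(records, chromosome="7", validated=True, sort_by="heterozygosity")
hit_ids = [r["snp_id"] for r in hits]
```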

Special SNP query services offer pre-formulated search methods by Submitter, New Batches, Method type, Population Detail, Publication, Locus Information, and STS Markers; two Free Form Search services are also offered. In addition, the dbSNP data can be searched using a special BLAST page. These search options are linked from the blue sidebar menu of the dbSNP home page.

The integration of SNPs into other resources, such as the Map Viewer, provides a way to see them in their genomic context. When the “SNP” Master Map is viewed in the Map Viewer, a graphical summary showing mapping information, associated gene features, and marker heterozygosity is provided.

Downloading SNPs

Batch downloads of SNPs can be performed using Batch Entrez or via an e-mail-mediated query service that allows for the retrieval of a large number of SNPs by using individual submissions (ss#), submitter IDs, or dbSNP RefSNP cluster IDs (rs#). The SNP batch query service is accessible from

The SNP data may also be downloaded at

The SNP home page is found at

The Entrez SNP page can be reached from the “Hotspots” column on the NCBI home page.


NCBI News | Spring 2002 NCBI News