NCBI News






Submitting Sequence Polymorphisms to NCBI's dbSNP

Small genetic variations at specific positions in the genome, called single nucleotide polymorphisms or SNPs, are often responsible for phenotypic differences. The identification and analysis of SNPs in human and other complex genomes have become one of the major themes of biomedical research since the completion of the human genome sequence. The NCBI database of Single Nucleotide Polymorphisms (dbSNP) provides a public repository for this rapidly growing set of primary data and now contains over 40 million submitted SNPs from 33 different species. SNPs are submitted using a specialized protocol that involves generating a set of files and transmitting them to the NCBI SNP submission group. A brief outline of the SNP submission protocol and the file types needed is presented below.

Submitters should begin by navigating to the detailed 'Quick Start' section on the SNP submissions page:

The SNP submission process is modeled on that of the GenBank bulk divisions: sequence tagged site (STS), genome survey sequence (GSS) and expressed sequence tag (EST). In fact, it is possible to submit polymorphism data simultaneously as an STS and a SNP. In all submission scenarios, the submitter creates a text file made up of a combination of required and optional sections (Contact, Publications, Method, Population Description and Assay, among others), each holding a different type of information. Each section of the file is broken up into a set of fields, and each field is identified by a capitalized tag followed by a colon. The SNP submissions page mentioned above provides more information, including examples of the submission file format, and shows the possible sections and fields.
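The tag-and-field layout described above can be sketched in a few lines of Python. This is only an illustration of the general idea of capitalized, colon-terminated tags: the tag names used here (TYPE, NAME, INSTITUTION) and the blank line placed between sections are assumptions made for the example, not the official dbSNP specification, so the submissions page should be consulted for the actual tags and layout.

```python
from typing import Iterable, Sequence, Tuple

Field = Tuple[str, str]


def render_section(fields: Iterable[Field]) -> str:
    """Render (tag, value) pairs as 'TAG: value' lines, one field per line.

    Tags are upper-cased so that every field starts with a capitalized,
    colon-terminated tag.
    """
    return "\n".join(
        f"{tag.strip().upper()}: {value.strip()}" for tag, value in fields
    )


def render_submission(sections: Sequence[Sequence[Field]]) -> str:
    """Join rendered sections into one plain-text document.

    Separating sections with a blank line is an assumption of this sketch.
    """
    return "\n\n".join(render_section(s) for s in sections) + "\n"


# Hypothetical example data; tag names are illustrative only.
contact = [("type", "Contact"), ("name", "Jane Doe"), ("institution", "Example Lab")]
method = [("type", "Method"), ("name", "Example sequencing method")]

if __name__ == "__main__":
    print(render_submission([contact, method]))
```

Keeping the fields as ordered (tag, value) pairs, rather than a dictionary, preserves the order in which fields appear and allows a tag to repeat within a section if the format calls for it.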

The submission file can be created using any standard text editor. However, spreadsheet software can also be used to prepare the submission and can make the process easier. Figure 1 shows a portion of a spreadsheet used to generate a SNP submission. Data for each field are placed on the same row as the field tag and can continue on subsequent rows. Before submission, the file must be saved from the spreadsheet software in plain text format.


Figure 1: A section of a spreadsheet for creating a SNP submission showing the Contact, Publications, Method and Population sections.
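For submitters who prefer to script the final "save as plain text" step, the following is a minimal sketch that converts a spreadsheet exported as CSV into plain text. It assumes that the field tag (including its colon) sits in the first column, that data cells follow on the same row, and that rows with an empty first column are continuation lines. The function names and the sample data are hypothetical, and the result should still be checked against the format examples on the SNP submissions page.

```python
import csv
import io
from typing import Iterable, List


def rows_to_text(rows: Iterable[List[str]]) -> str:
    """Turn spreadsheet rows into plain-text lines.

    Non-empty cells on a row are joined with single spaces, so a row such as
    ['NAME:', 'Jane Doe'] becomes 'NAME: Jane Doe'. Rows with no content are
    kept as blank lines.
    """
    out = []
    for row in rows:
        cells = [cell.strip() for cell in row if cell.strip()]
        out.append(" ".join(cells))
    return "\n".join(out) + "\n"


def csv_to_submission_text(csv_text: str) -> str:
    """Parse CSV text exported from a spreadsheet and flatten it to plain text."""
    return rows_to_text(csv.reader(io.StringIO(csv_text)))


if __name__ == "__main__":
    # Hypothetical export: tag in column 1, data in column 2, one continuation row.
    sample = "NAME:,Jane Doe\n,Example Lab\n\nTYPE:,Method\n"
    print(csv_to_submission_text(sample), end="")
```

Using the `csv` module rather than splitting on commas by hand means that cells containing commas or quotes, which spreadsheets export in quoted form, are read correctly.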

The completed submission file should be emailed to:


SNP submissions can be made for either published or unpublished data.

Each submitted SNP is assigned an identifier of the form ss#, where “#” represents an integer identifier. The ss identifier serves the same purpose as an accession number for a GenBank sequence. NCBI also builds a non-redundant Reference SNP (RefSNP) database. Each RefSNP cluster, which is given an identifier of the form rs#, contains polymorphisms that map to the same position in the genome. RefSNPs are available as part of the Entrez database system and are linked to the primary SNP records as well as sequence, gene, genome, structure and functional information.
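The relationship between submitted SNPs and RefSNP clusters can be pictured with a toy grouping step. The sketch below simply groups submitted records that map to the same genomic position; it is not NCBI's clustering procedure, which is not described here, and the identifiers, chromosome names and positions are made-up placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (submitted SNP identifier, chromosome, position); all values are toy data.
SubmittedSNP = Tuple[str, str, int]


def cluster_by_position(
    submitted: Iterable[SubmittedSNP],
) -> Dict[Tuple[str, int], List[str]]:
    """Group submitted SNP identifiers by the position they map to.

    Each resulting group plays the role of one non-redundant cluster.
    """
    clusters: Dict[Tuple[str, int], List[str]] = defaultdict(list)
    for ss_id, chromosome, position in submitted:
        clusters[(chromosome, position)].append(ss_id)
    return dict(clusters)


if __name__ == "__main__":
    toy = [
        ("ss1", "chrA", 100),
        ("ss2", "chrA", 100),  # same position as ss1, so same cluster
        ("ss3", "chrB", 250),
    ]
    for location, members in cluster_by_position(toy).items():
        print(location, members)
```

In this picture, two independent submissions of the same polymorphism keep their own ss identifiers but fall into a single cluster, which is the role the rs identifier plays.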

An example of a submitted SNP record that includes population and other detailed information for the human gene alcohol dehydrogenase 2 can be seen on the following Web page:



Questions concerning SNP submissions should be directed to:


—MR


NCBI News | Fall/Winter 2002 NCBI News: Spring 2003