
  NCBI Reference Sequences

The Reference Sequence (RefSeq) collection aims to provide a comprehensive, integrated, non-redundant, well-annotated set of sequences, including genomic DNA, transcripts, and proteins. RefSeq is a foundation for medical, functional, and diversity studies; it provides a stable reference for genome annotation, gene identification and characterization, mutation and polymorphism analysis (especially RefSeqGene records), expression studies, and comparative analyses.

Scope

NCBI provides RefSeqs for taxonomically diverse organisms including eukaryotes, bacteria, and viruses. Additional records are added to the collection as data become publicly available.

Announcements

September 2009: An update for the human CCDS set was released. This update adds 3,852 CCDS IDs, bringing the total to 23,739 consistently annotated coding regions that pass all CCDS QA tests.
  
November 11, 2009: RefSeq Release 38 available for FTP

This release includes:

Proteins: 9,325,214
Organisms: 9,166
Available at: ftp://ftp.ncbi.nih.gov/refseq/release/

To receive announcements of future RefSeq releases and large incremental updates, please subscribe to NCBI's refseq-announce mailing list: refseq-announce

  

Announcing the Consensus Coding Sequence (CCDS) database. More information is available at: http://www.ncbi.nlm.nih.gov/CCDS/

BLAST databases: Formatted genomic, mRNA, and protein RefSeq BLAST databases are available for FTP.

Announcing HIV-1 protein interaction data. More information is available at: http://www.ncbi.nlm.nih.gov/RefSeq/HIVInteractions/index.html



Data Access and Availability

RefSeq is accessible via BLAST, Entrez, and the NCBI FTP site. Information is also available in Entrez Genomes and Entrez Gene, and for some genomes additional information is available in the Map Viewer. Special properties have been defined to facilitate Entrez-based retrieval. Also see: Entrez Query Hints
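As an illustration of property-based retrieval, the sketch below builds (but does not send) an Entrez search URL restricted to RefSeq records. It rests on assumptions not stated on this page: the E-utilities `esearch.fcgi` endpoint, the `srcdb_refseq[prop]` property term, and the helper name `refseq_search_url`, which is hypothetical. Consult Entrez Query Hints for the properties actually defined.

```python
from urllib.parse import urlencode

# Assumed endpoint: NCBI E-utilities ESearch (not named on this page).
ESEARCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def refseq_search_url(db: str, extra_term: str = "") -> str:
    """Build an ESearch URL limited to RefSeq records.

    The 'srcdb_refseq[prop]' term is an assumed example of the special
    properties mentioned above; extra_term is ANDed onto it if given.
    """
    term = "srcdb_refseq[prop]"
    if extra_term:
        term = f"{term} AND {extra_term}"
    # urlencode percent-escapes the brackets and spaces in the query term.
    return f"{ESEARCH_BASE}?{urlencode({'db': db, 'term': term})}"


if __name__ == "__main__":
    print(refseq_search_url("protein", "Homo sapiens[orgn]"))
```

The function only constructs the URL, so it can be inspected or logged before any request is made.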


Distinguishing Features

The main features of the RefSeq collection include:
  - non-redundancy
  - explicitly linked nucleotide and protein sequences
  - updates to reflect current knowledge of sequence data and biology
  - data validation and format consistency
  - distinct accession series (all accessions include an underscore '_' character)
  - ongoing curation by NCBI staff and collaborators, with reviewed records indicated
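The underscore convention in the list above lends itself to a quick programmatic check. The following is a minimal sketch, not an official validator: the function names are hypothetical, the example accessions are for illustration only, and the assumption that a version follows a '.' is not stated on this page.

```python
from typing import Optional, Tuple


def has_refseq_style(accession: str) -> bool:
    """Return True if the accession contains an underscore.

    Per the text, every RefSeq accession includes '_'. This is a necessary
    condition only; it does not prove a string is a valid RefSeq accession.
    """
    return "_" in accession.strip()


def split_accession(accession: str) -> Tuple[str, str, Optional[str]]:
    """Split an accession into (prefix, number, version).

    Assumes a 'PREFIX_NUMBER.VERSION' layout, with the version optional.
    Without an underscore, the whole base lands in prefix and number is ''.
    """
    base, _, version = accession.strip().partition(".")
    prefix, _, number = base.partition("_")
    return prefix, number, version or None


if __name__ == "__main__":
    for acc in ("NM_000546.5", "NP_000537", "U12345"):
        print(acc, has_refseq_style(acc), split_accession(acc))
```

A check like this is useful for separating RefSeq records from other accessions in a mixed list before further processing.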

References

Please refer to the Publications page for a full list of articles describing or using the RefSeq dataset. When using the RefSeq database, please cite one of the following:

The NCBI handbook [Internet]. Bethesda (MD): National Library of Medicine (US), National Center for Biotechnology Information; 2002 Oct. Chapter 17, The Reference Sequence (RefSeq) Project. Available from http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Books

NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins
Pruitt KD, Tatusova T, Maglott DR
Nucleic Acids Res 2007 Jan 1;35(Database issue):D61-5



Last updated November 13, 2009