Format of Sequence Record

Three Main Parts of a Sequence Record *
  • Header

    • title of record ("Definition Line," or "Def Line")
    • unique identifiers (accession number, sequence ID number, etc.)
    • summary information, such as length of sequence, molecule type, GenBank division, date of last modification

  • Descriptive information

    • source organism
    • published references, or tentative title for an unpublished reference
    • submitter block (last reference, with contact information for submitter)
    • biological features of sequence, such as coding region (CDS) and its amino acid translation; the Feature Key field provides a 'road map' or 'index' to the sequence data that follows

  • Sequence Data

    • single, contiguous sequence from a single molecule type
    • experimentally sequenced in a lab; not a hypothetical or consensus sequence based on the analysis or merger of third-party data

* Regular font items represent similarities with bibliographic records; italicized items represent differences. The main difference between a sequence record and a bibliographic record is that a sequence record contains sequence data rather than an abstract at the end.
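
As a rough illustration of the three-part layout described above, the following Python sketch splits a GenBank-style flat-file record into header, descriptive information, and sequence data. The record is a toy example invented for this sketch, not a real GenBank entry, and the function names (split_record, sequence_string) are hypothetical. Treating SOURCE, REFERENCE, and FEATURES as the start of the descriptive part is an assumption made here to mirror the outline above.

```python
# Toy record, invented for illustration only; it is not a real GenBank entry.
TOY_RECORD = """\
LOCUS       TOY00001                  24 bp    DNA             PLN 01-JAN-2000
DEFINITION  Hypothetical example gene, partial cds.
ACCESSION   TOY00001
SOURCE      Saccharomyces cerevisiae
REFERENCE   1  (bases 1 to 24)
  TITLE     Direct Submission
FEATURES             Location/Qualifiers
     CDS             1..24
                     /translation="MKTAYIAK"
ORIGIN
        1 atgaaaaccg cgtatattgc gaaa
//
"""

# Assumption: these keywords mark where the descriptive part begins.
DESCRIPTIVE_START = ("SOURCE", "REFERENCE", "FEATURES")


def split_record(text):
    """Split one flat-file record into its three main parts."""
    parts = {"header": [], "descriptive": [], "sequence": []}
    current = "header"
    for line in text.splitlines():
        if line.startswith("//"):        # end-of-record marker
            break
        if line.startswith("ORIGIN"):    # sequence data follows this line
            current = "sequence"
            continue
        if current == "header" and line.startswith(DESCRIPTIVE_START):
            current = "descriptive"
        parts[current].append(line)
    return parts


def sequence_string(sequence_lines):
    """Join sequence lines, dropping position numbers and spaces."""
    return "".join(ch for line in sequence_lines for ch in line if ch.isalpha())


parts = split_record(TOY_RECORD)
print(len(parts["header"]), "header lines")
print(len(parts["descriptive"]), "descriptive lines")
print(sequence_string(parts["sequence"]))
```

The sketch handles a single record only; a real parser would also need to cope with multi-record files and wrapped field values.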

Sample Archival Database Records

Database            Accession            GI Number
GenBank             U49845 (detailed)    1293613
GenPept, CDS #1     AAA98665             1293614
GenPept, CDS #2     AAA98666             1293615
GenPept, CDS #3     AAA98667             1293616
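
In the sample archival records, the GI numbers of the nucleotide record and its three coding region translations run consecutively. A minimal Python sketch of that check follows, using only the GI numbers listed for U49845 and its GenPept records. The function name is hypothetical, and reading a break in the run as a sign that a sequence was updated is an assumption taken from the hint in the exercise, not a rule stated here.

```python
def proteins_follow_nucleotide(nucleotide_gi, protein_gis):
    """Return True if the protein GIs directly follow the nucleotide GI, in order."""
    expected = list(range(nucleotide_gi + 1, nucleotide_gi + 1 + len(protein_gis)))
    return list(protein_gis) == expected


# GI numbers for U49845 and its three GenPept records.
print(proteins_follow_nucleotide(1293613, [1293614, 1293615, 1293616]))
```

The same function could be applied to the GI numbers found in another record to see whether the run is still unbroken.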

Sample Curated Database Records

Database     Accession     GI Number
SwissProt    P39076        730917 (contains xref to gi|1293614)
RefSeq       NM_002111     4755137
RefSeq       NP_002102     4755137
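
The accession numbers in the two sample tables have visibly different shapes. The Python sketch below guesses the database from that shape alone. The patterns are heuristics based on commonly documented accession formats rather than anything stated on this page, they cover only the kinds of accession shown above, and the function name and labels are hypothetical. The SwissProt pattern is tested before the GenBank nucleotide pattern because an accession such as P39076 would otherwise also match the one-letter, five-digit form.

```python
import re

# Checked in order; the first matching pattern wins.
# Assumption: these shapes are illustrative heuristics, not an official specification.
PATTERNS = [
    ("RefSeq mRNA", re.compile(r"^NM_\d+$")),
    ("RefSeq protein", re.compile(r"^NP_\d+$")),
    ("SwissProt", re.compile(r"^[OPQ]\d[A-Z0-9]{3}\d$")),
    ("GenPept protein", re.compile(r"^[A-Z]{3}\d{5}$")),
    ("GenBank nucleotide", re.compile(r"^(?:[A-Z]\d{5}|[A-Z]{2}\d{6})$")),
]


def classify_accession(accession):
    """Return a best-guess database label for an accession string."""
    for label, pattern in PATTERNS:
        if pattern.match(accession):
            return label
    return "unknown"


for acc in ["U49845", "AAA98665", "P39076", "NM_002111", "NP_002102"]:
    print(acc, "->", classify_accession(acc))
```

A shape match is only a hint; the record itself remains the authority on which database it belongs to.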


Exercise

For an example of a GenBank record that was updated with new sequence data, see U46667.
  • How many nucleotide sequences are in the record?

  • How many protein sequences are in the record?

  • How many corresponding GenPept records will there be?

  • Notice that the GI numbers of the nucleotide and protein sequences are no longer consecutive. Which sequences in the record have been changed? How can you tell?

  Revised January 20, 2000
Comments/questions about course to Renata Geer renata@ncbi.nlm.nih.gov
Questions about NCBI resources to info@ncbi.nlm.nih.gov