Search Field Descriptions for Sequence Database

Last Update: February 9, 2011.

Table 1.

Fields available for all Sequence Databases (Nucleotide, Protein, EST, GSS). Fields only available for the EST and GSS databases are given in Table 2.

Search Field | Short Field Specifier | Definition
[Accession] | [ACCN] | The accession number assigned by NCBI.


AF123456[ACCN] Nucleotide
NP_000240[ACCN] Protein
[All Fields] | [ALL] | All terms from all search fields in the database.


human[All Fields] Nucleotide Protein EST GSS

(Compare with human[Organism], see [Organism] entry in this table.)
[Author] | [AUTH] | All authors from all references in the records. The format is last name [space] first initial(s), without punctuation.


venter jc[AUTH] Nucleotide Protein
[EC/RN Number] | [ECNO] | Enzyme Commission (EC) number for an enzyme activity.

Example:[ECNO]) Protein Nucleotide
(glucose-6-phosphate isomerase)
[Feature Key] (Nucleotide, Protein, GSS) | [FKEY] | Biological features listed in the Feature Table of the sequence records.


polya signal[FKEY] Nucleotide
nonstdres[FKEY] Protein
gene[FKEY] GSS

The GenBank feature table definition has more information on available features.
[Filter] | [FILT] | Filtered subsets of the database. An important kind of filter is based on the presence of links to other records. Other filters create useful subsets of data, such as those set as Filters in the Discovery column of search results.



nucleotide_protein[Filter] Nucleotide
protein_structure[Filter] Protein
nucest_unigene[Filter] EST
nucgss_unists[Filter] GSS

Organism or properties subsets

all[filter] Nucleotide Protein EST GSS
mrna[filter] Nucleotide
refseq[filter] Nucleotide Protein
mammals[filter] Nucleotide Protein EST GSS
[Gene Name] | [GENE] | Gene names annotated on database records. For NCBI Reference Sequences, these names correspond to official nomenclature guidelines when possible. Submitters provide the gene names on GenBank/GenPept records. Gene names on submitted records may be historical names or vary from official guidelines for other reasons.


BRCA1[GENE] Nucleotide Protein
[Genome Project] | - | The numeric unique identifier for the genome project that produced the sequence records.


13139[Genome Project] Nucleotide Protein
(Oryza sativa Japonica)

21117[Genome Project] Nucleotide EST GSS
(Pelagic Microbial Assemblages in the Oligotrophic Ocean)
[Issue] | [ISS] | The issue number of the journals cited on sequence records. Not generally useful in sequence databases.
[Journal] | [JOUR] | The name of the journals cited on sequence records. Journal names are indexed in the database in abbreviated form, although many full titles are mapped to their abbreviations. Journals are also indexed by their International Standard Serial Number (ISSN).


proceedings of the national academy of sciences of the united states of america[Journal] Nucleotide Protein EST GSS
Proc Natl Acad Sci U S A[Journal] Nucleotide Protein EST GSS
0027-8424[Journal] Nucleotide Protein EST GSS
[Keyword] | [KYWD] | Keywords applied by submitter or from controlled vocabularies applied by NCBI or other databases. Except for specific kinds of records, such as the examples given below, the terms in this index are not well controlled. This field is unpopulated for many GenBank/GenPept records.


BARCODE[KYWD] Nucleotide Protein
HTG[KYWD] Nucleotide
RefSeqGene[KYWD] Nucleotide
[Modification Date] | [MDAT] | The date of most recent modification of a sequence record. The date format is YYYY/MM/DD. Only the year is required. The Modification Date is often used as a range of dates. The colon ( : ) separates the beginning and end of a date range.


2009/01/08[MDAT] Nucleotide Protein EST GSS
1995/09[MDAT] Nucleotide Protein EST GSS
2010/01:2010/12/31[MDAT] Nucleotide Protein EST GSS
[Molecular Weight] (Protein only) | [MOLWT] | The molecular weight in Daltons of the protein chain calculated from the amino acids only. This may not correspond to the molecular weight of the protein obtained from biological samples because of incomplete data or post-translational modifications of the protein in living systems. The colon ( : ) separates the beginning and end of a molecular weight range.


3039[MOLWT] Protein
25000:75000[MOLWT] Protein
[Organism] | [ORGN] | The scientific and common names for the complete taxonomy of organisms that are the source of the sequence records. This vocabulary includes all available nodes in the NCBI taxonomy database.


cellular organisms[ORGN] Nucleotide Protein EST GSS
firmicutes[ORGN] Nucleotide Protein
human[ORGN] Nucleotide Protein EST GSS
Escherichia coli O157:H7[ORGN] Nucleotide Protein
[Page Number] | [PAGE] | The page numbers of the articles that are cited on the sequence record. Not generally useful in sequence databases.
[Primary Accession] | [PACC] | The primary accession number of the sequence record. This is the first one appearing on the ACCESSION line in the GenBank/GenPept format. Many records have additional secondary accessions representing records that have been merged. The Accession field indexes both primary and secondary accessions.


U01317[PACC] Nucleotide
M18047[PACC] Nucleotide
(Compare: M18047[ACCN] Nucleotide, see [Accession] entry in this table.)
[Primary Organism] | [PORGN] | The primary organism when there is more than one source organism.


human[PORGN] Nucleotide
(Compare with human[ORGN], see [Organism] entry in this table.)
[Properties] | [PROP] | Molecular type, source database, and other properties of the sequence record. Terms indexed for this field are a useful classification system for sequence records.


Molecule type

biomol_crna[PROP] Nucleotide
biomol_genomic[PROP] Nucleotide
biomol_mrna[PROP] Nucleotide

Cellular location

gene_in_genomic[PROP] Nucleotide Protein
gene_in_mitochondrion[PROP] Nucleotide Protein

GenBank division

gbdiv_htg[PROP] Nucleotide
gbdiv_vrt[PROP] Nucleotide Protein

(These GenBank division queries must be combined with srcdb_genbank[PROP] to retrieve only GenBank records.)

Database source

srcdb_genbank[PROP] Nucleotide Protein EST GSS
srcdb_ddbj/embl/genbank[PROP] Nucleotide Protein EST GSS
srcdb_refseq_known[PROP] Nucleotide Protein
srcdb_refseq_predicted[PROP] Nucleotide Protein
srcdb_swiss-prot[PROP] Protein
srcdb_pdb[PROP] Nucleotide Protein
[Protein Name] | [PROT] | The names of protein products as annotated on sequence records. The content of this field is not well controlled for GenBank/GenPept records and may contain inaccurate or incomplete information.


aldolase[Protein Name] Nucleotide Protein
[Publication Date] | [PDAT] | The date that records were made public in Entrez. The date format is YYYY/MM/DD. The colon ( : ) separates the beginning and end of a date range.


2009/01/08[PDAT] Nucleotide EST GSS
2009/01/10[PDAT] Protein
1995/09[PDAT] Nucleotide Protein EST GSS
2010/01:2010/12/31[PDAT] Nucleotide Protein EST GSS
[SeqID String] | [SQID] | The NCBI identifier string for the sequence record. This is a brief structured format used by NCBI software.


gnl asm gca 000000215 2 chr3 45328308[SeqID String] Nucleotide
[Sequence Length] | [SLEN] | The total length of the sequence, that is, the number of nucleotides or amino acids in the sequence. The colon ( : ) separates the beginning and end of a length range.


755[SLEN] Nucleotide Protein EST GSS
100:1000[SLEN] Nucleotide Protein EST GSS
[Substance Name] | [SUBS] | The names of chemical substances associated with a record. This field is populated only for sequences extracted from structure records, that is, PDB-derived sequences. The associated residue position is often included.


mg, 1010[Substance Name] Nucleotide
atp[Substance Name] Protein
[Text Word] | [WORD] | Text on a sequence record that is not indexed in other fields. Terms indexed here are included in an All Fields search. Not generally useful.
[Title] | [TI] or [TITL] | Words and phrases found in the title of the sequence record. The title is the DEFINITION line of the GenBank/GenPept format of the record. This line summarizes the biology of the sequence and includes the organism, product name, gene symbol, molecule type, and sequence completeness.

complete cds[TI] Nucleotide
kinesin[TI] Nucleotide Protein
liver[TI] Nucleotide Protein EST
uncultured[TI] Nucleotide Protein EST GSS
[Volume] | [VOL] | The volume number of the journals in references on the sequence record. Not generally useful in the sequence databases.
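
The colon-separated range syntax described for [Modification Date], [Publication Date], [Molecular Weight], and [Sequence Length] can be sketched in Python. The helper names `date_range_term` and `numeric_range_term` are hypothetical and introduced only for illustration; the code simply assembles query strings in the form shown in the examples of Table 1 and does not contact NCBI.

```python
from datetime import date


def date_range_term(start: date, end: date, field: str = "MDAT") -> str:
    """Build a date-range term such as 2010/01/01:2010/12/31[MDAT].

    Dates are written in the YYYY/MM/DD format; the colon separates
    the beginning and end of the range.
    """
    if start > end:
        raise ValueError("start date must not be after end date")
    fmt = "%Y/%m/%d"
    return f"{start.strftime(fmt)}:{end.strftime(fmt)}[{field}]"


def numeric_range_term(low: int, high: int, field: str) -> str:
    """Build a numeric range term such as 100:1000[SLEN]."""
    if low > high:
        raise ValueError("low must not be greater than high")
    return f"{low}:{high}[{field}]"


# Records modified during 2010, and sequences 100 to 1000 residues long.
modified_2010 = date_range_term(date(2010, 1, 1), date(2010, 12, 31))
mid_length = numeric_range_term(100, 1000, "SLEN")
```

Passing `"PDAT"` or `"MOLWT"` as the field gives the corresponding publication-date or molecular-weight range.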

Queries using any term followed by the full name of the indexed field in square brackets retrieve only records with the term indexed in that field. For example, a search with apolipoprotein[Title] finds only records with “apolipoprotein” indexed in their Title field. Some fields have shorter names that can be used instead of the full name. These are listed in the Short Field Specifier column of Table 1 when available.
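
As a rough illustration of field-qualified queries, the Python sketch below builds terms of the form term[Field] and joins them with the Entrez Boolean operator AND, as the GenBank division note in Table 1 requires. The function names are hypothetical. The E-utilities ESearch URL and the database name `protein` are assumptions that do not appear in this document; the code only constructs the URL and does not send a request.

```python
from urllib.parse import urlencode

# Assumption: the public E-utilities ESearch endpoint (not named in this document).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def field_term(term: str, field: str) -> str:
    """Return a field-qualified term, e.g. apolipoprotein[Title]."""
    return f"{term.strip()}[{field}]"


def all_of(*terms: str) -> str:
    """Join terms with AND so that every term must match."""
    return " AND ".join(terms)


def esearch_url(db: str, term: str) -> str:
    """Build (but do not fetch) an ESearch URL for the given query."""
    return f"{ESEARCH_URL}?{urlencode({'db': db, 'term': term})}"


# The full field name and the short specifier are interchangeable.
full = field_term("apolipoprotein", "Title")
short = field_term("apolipoprotein", "TI")

# GenBank division queries combined with srcdb_genbank[PROP].
genbank_vrt = all_of(
    field_term("gbdiv_vrt", "PROP"),
    field_term("srcdb_genbank", "PROP"),
)
url = esearch_url("protein", genbank_vrt)
```

`urlencode` percent-encodes the square brackets and turns spaces into `+`, so the query can be placed in a URL without further escaping.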

Table 2.

Fields available only for EST and GSS databases.

Index Search Field | Description
[Clone ID] | The clone identifier provided by the submitter of the EST or GSS records.


image 1000232[Clone ID] EST
ZMMBBb0001G04f[Clone ID] GSS
[EST Name], [GSS Name] | The name given to the EST or GSS record by the submitter.


R-OVA-119[EST Name] EST
DKFZP761J17121[GSS Name] GSS
[EST ID], [GSS ID] | Legacy dbEST or dbGSS unique identifier provided by NCBI.


2081316[EST ID] EST
14283478[GSS ID] GSS
[Library Class] (GSS Only) | Information about the kind of genomic DNA library that was the source of the clone.


bac ends[Library Class] GSS
methylation filtered[Library Class] GSS
cosmid ends[Library Class] GSS
shotgun[Library Class] GSS
[Library Name] (EST Only) | The name given to the cDNA library that is the source of the clone, provided by the submitter and taken verbatim from the record. May contain useful information about the cell, tissue, or organ source.


soares fetal liver spleen 1nfls[Library Name] EST
full length enriched swine cdna library, adult adrenal gland[Library Name] EST
[Submitter Name] | Submitter name of EST and GSS records. Unlike [Author Name], the Submitter Name content is not controlled and is taken verbatim from the EST or GSS record.


smith tpl[Submitter Name] EST GSS
david severson[Submitter Name] EST
da lightfoot and chris town[Submitter Name] GSS
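
Because each Table 2 field is limited to the EST database, the GSS database, or both, a script that assembles queries may want to check availability first. The Python sketch below records the availability given in Table 2 in a dictionary. The names `TABLE2_FIELDS`, `is_available`, and `qualified_term` are hypothetical and serve only as an illustration.

```python
# Databases in which each Table 2 field is available, as listed in Table 2.
TABLE2_FIELDS = {
    "Clone ID": {"EST", "GSS"},
    "EST Name": {"EST"},
    "GSS Name": {"GSS"},
    "EST ID": {"EST"},
    "GSS ID": {"GSS"},
    "Library Class": {"GSS"},
    "Library Name": {"EST"},
    "Submitter Name": {"EST", "GSS"},
}


def is_available(field: str, database: str) -> bool:
    """Return True if the Table 2 field can be searched in the database."""
    return database in TABLE2_FIELDS.get(field, set())


def qualified_term(term: str, field: str, database: str) -> str:
    """Build term[Field], refusing fields the database does not index."""
    if not is_available(field, database):
        raise ValueError(f"[{field}] is not available in {database}")
    return f"{term}[{field}]"


bac_ends = qualified_term("bac ends", "Library Class", "GSS")
```

Asking for `qualified_term("bac ends", "Library Class", "EST")` raises `ValueError`, since [Library Class] is listed as GSS only.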
