Entrez provides integrated access to:
- Nucleotide sequences: GenBank/EMBL/DDBJ, PDB, RefSeq (NM_* and NT_*)
- Protein sequences: GenPept (translated coding regions from DNA), PIR, Swiss-Prot, PRF, PDB, RefSeq (NP_*)
- 3-D Structures: Molecular Modeling Database (MMDB), derived from the Protein Data Bank (PDB)
- Genomes: Complete genomes and schematics of entire chromosomes as well as associated mapping information, RefSeq (NC_*)
- Taxonomy: names and lineages of the more than 75,000 organisms represented in the genetic databases by at least one nucleotide or protein sequence
- PopSet: aligned sequences submitted as a set from a population, phylogenetic, or mutation study describing events such as evolution and population variation; includes both nucleotide and protein sequence data
- PubMed: bibliographic records from MEDLINE, as well as additional publisher-supplied citations
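The RefSeq accession prefixes in the list above indicate which Entrez data domain a record is listed under. As a minimal sketch, the mapping can be written as a lookup on the first three characters of an accession. The helper refseq_domain and its table are hypothetical, and they cover only the four prefixes named in this handout, not every RefSeq prefix.

```python
from typing import Optional

# Hypothetical lookup table built only from the prefixes listed above;
# RefSeq defines other prefixes that this sketch does not cover.
REFSEQ_DOMAINS = {
    "NM_": "Nucleotide",
    "NT_": "Nucleotide",
    "NP_": "Protein",
    "NC_": "Genomes",
}


def refseq_domain(accession: str) -> Optional[str]:
    """Return the Entrez domain listed for a RefSeq accession prefix,
    or None when the accession does not use one of the prefixes above."""
    prefix = accession.strip().upper()[:3]
    return REFSEQ_DOMAINS.get(prefix)


for acc in ("NM_000492", "NP_000483", "NC_000007", "U49845"):
    print(acc, "->", refseq_domain(acc))
```

A non-RefSeq accession such as U49845 (a GenBank-style accession) falls through the table and returns None.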
|Text Term Searches
- Search fields vary by database
- Some fields are common to all Entrez databases, such as accession number (or in PubMed, UID), author name, text word, journal name, publication date, volume, etc.
- Other fields are present in most of the Entrez databases, such as title word, which appears in all except Structures.
- Some fields that are particularly useful for searching the nucleotide and protein sequence databases include:
- Sequence Length
- Feature Key (nucleotides only)
- Molecular Weight (proteins only)
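Field-restricted terms are written as term[qualifier], and numeric fields such as Sequence Length can take a low:high range. The sketch below assumes the qualifier abbreviations [slen] (Sequence Length), [fkey] (Feature Key), and [molwt] (Molecular Weight); check them against the "Search Fields and Qualifiers" section of the Entrez help before relying on them. The helper names are hypothetical, and the code only builds query text.

```python
def field(term: str, tag: str) -> str:
    """Restrict a term to one search field, quoting multi-word phrases."""
    term = term.strip()
    if " " in term:
        term = f'"{term}"'
    return f"{term}[{tag}]"


def field_range(low: int, high: int, tag: str) -> str:
    """Restrict a numeric field to a range, written low:high[tag]."""
    if low > high:
        raise ValueError("low must not be greater than high")
    return f"{low}:{high}[{tag}]"


def all_of(*parts: str) -> str:
    """Join search components with an upper-case AND."""
    return " AND ".join(parts)


# Nucleotide example: insulin in the title, 1000 to 2000 bases, with a CDS
# feature. slen and fkey are assumed qualifier abbreviations.
print(all_of(field("insulin", "titl"), field_range(1000, 2000, "slen"), field("CDS", "fkey")))
# insulin[titl] AND 1000:2000[slen] AND CDS[fkey]

# Protein example: molwt is likewise an assumed abbreviation.
print(all_of(field("insulin", "titl"), field_range(5000, 12000, "molwt")))
# insulin[titl] AND 5000:12000[molwt]
```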
|The "Properties" Search Field
The Properties field is very useful for searching the Entrez nucleotide and protein sequence databases. It allows you to limit searches by a number of different record attributes, including:
- molecule type -- e.g., genomic DNA is represented in the index as "biomol_genomic"[prop] and mRNA as "biomol_mRNA"[prop] (the sample GenBank record includes additional information about molecule types)
- GenBank division -- e.g., the GenBank EST division is represented as "gbdiv_EST"[prop]; you can exclude ESTs with a search such as: insulin[titl] NOT gbdiv_est[prop] (the sample GenBank record includes additional information about GenBank divisions)
- gene location -- e.g., a mitochondrial gene is represented as "gene in mitochondrion"[prop] and a plasmid gene as "gene in plasmid"[prop]
- source database -- e.g., a RefSeq record is represented as "srcdb_refseq"[prop] and a Swiss-Prot record as "srcdb_swiss_prot"[prop]
The most commonly used properties are shown on the Limits page as check boxes or in pop-up menus (see the Entrez Help document for more details).
To see a complete list of properties, browse the index of that field in the database of interest. For example, select the Entrez Nucleotide database, follow the Preview/Index link in the grey bar beneath the search box, select Properties from the Search Field pop-up menu, and press the Index button. Use the Up and Down buttons to scroll through the index.
Entrez also provides:
- Related records, or "neighbors", within each database
- Hard links to associated records in the other Entrez databases
- Record formats for display/save functions that vary by data domain
- ASN.1: all records are stored in Abstract Syntax Notation One (ASN.1) format, which is key to integrating multiple different databases within a single search system (example: U49845 in ASN.1 format)
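The [prop] values described under "The Properties Search Field" behave like any other field-restricted term: AND one in to require a property, or use NOT to exclude one, as in the insulin[titl] NOT gbdiv_est[prop] example. Below is a minimal sketch with hypothetical helper names; it only builds the query string and does not check that a property value actually exists in the index.

```python
import re

# Detects an upper-case Boolean operator surrounded by whitespace.
_BOOLEAN = re.compile(r"\s(AND|OR|NOT)\s")


def prop(value: str) -> str:
    """Write a property value as a [prop] term, quoting values with spaces."""
    return f'"{value}"[prop]' if " " in value else f"{value}[prop]"


def with_properties(base: str, include=(), exclude=()) -> str:
    """Require the properties in include and exclude those in exclude.
    A base query that already contains Boolean operators is parenthesized
    so the property filters apply to the whole of it."""
    query = f"({base})" if _BOOLEAN.search(base) else base
    for value in include:
        query += f" AND {prop(value)}"
    for value in exclude:
        query += f" NOT {prop(value)}"
    return query


print(with_properties("insulin[titl]", exclude=["gbdiv_est"]))
# insulin[titl] NOT gbdiv_est[prop]
print(with_properties("cytochrome[titl]", include=["gene in mitochondrion"]))
# cytochrome[titl] AND "gene in mitochondrion"[prop]
```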
|Three Levels of Search Complexity
- Basic
Just enter search terms without specifying search fields, other limits, or Boolean operators. E.g., in the Entrez Nucleotide database,
Enter: cystic fibrosis human
Entrez will search All Fields for many terms by default, and will map other terms to search fields that it selects. Click on the Details button to see
exactly how Entrez executed your search.
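Outside the Web interface, NCBI's E-utilities accept the same query text. The sketch below assumes the ESearch endpoint and its db and term parameters; it only builds the request URL and reads the QueryTranslation element from an ESearch XML response, which plays a role similar to the Details button by showing how Entrez interpreted the terms. Fetching the URL (e.g., with urllib.request.urlopen) is left out so the sketch runs offline; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

# ESearch endpoint of NCBI's E-utilities (assumed; see the E-utilities docs).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term: str, db: str = "nucleotide") -> str:
    """Build an ESearch request URL for a basic, unqualified query."""
    return f"{ESEARCH_URL}?{urlencode({'db': db, 'term': term})}"


def query_translation(esearch_xml: str) -> Optional[str]:
    """Return the QueryTranslation text from an ESearch XML response,
    i.e. Entrez's rendering of how it interpreted the search terms."""
    root = ET.fromstring(esearch_xml)
    return root.findtext("QueryTranslation")


print(esearch_url("cystic fibrosis human"))
# https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=cystic+fibrosis+human
```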
- Advanced
Control your search to a greater degree by using the "Limits," "Index," and "History" options in New Entrez. These let you select search fields or other limits, view the index of a field, or combine components of your search in various ways. E.g.,
Select "Limits" option
Enter: cystic fibrosis
Select "Title Word" as search field
Select "Limits" option
Enter: human
Select "Organism" as search field
Select "History" option
Enter: #1 AND #2
(Note that Boolean operators must be in upper case.
The OR and NOT operators are also available,
and parentheses can be used to nest the search.)
Compare the search results with the Basic search.
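The History step substitutes the earlier searches for #1 and #2. Entrez itself handles the stored searches; as a rough, hypothetical sketch of the idea, the class below only rewrites query text, wrapping each referenced search in parentheses so the Boolean operator applies to it as a whole.

```python
import re
from typing import List


class SearchHistory:
    """Hypothetical stand-in for the History page: each search gets a
    number, and later searches may refer to it as #1, #2, and so on."""

    def __init__(self) -> None:
        self._queries: List[str] = []

    def add(self, query: str) -> int:
        """Store a search (expanding any #n references) and return its number."""
        self._queries.append(self.expand(query))
        return len(self._queries)

    def expand(self, query: str) -> str:
        """Replace each #n with the parenthesized text of search n."""

        def substitute(match):
            n = int(match.group(1))
            if not 1 <= n <= len(self._queries):
                raise KeyError(f"no search #{n} in history")
            return f"({self._queries[n - 1]})"

        return re.sub(r"#(\d+)", substitute, query)


history = SearchHistory()
history.add("cystic fibrosis[titl]")  # #1: Limits, Title Word
history.add("human[orgn]")            # #2: Limits, Organism
print(history.expand("#1 AND #2"))
# (cystic fibrosis[titl]) AND (human[orgn])
```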
- Complex Boolean
Enter your search in command language, indicating
field qualifiers in square brackets.
If no field qualifier is indicated for a term, All Fields
will be assumed. The "Search Fields and Qualifiers" section of the Entrez help documentation provides a brief description of each field and its corresponding abbreviation. It is not necessary to include
a space between the search term and the field qualifier,
although spaces must surround Boolean operators.
Booleans must be written in upper case, and parentheses
can be used for nesting. E.g.,
cystic fibrosis[titl] AND human[orgn]
These results should be identical to those of the Advanced search.
The main advantage is that you can insert your complete
search into the Entrez query box in a single step.
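Two of the rules above are easy to break when typing a long query by hand: Booleans must be upper case, and spaces must surround them. The hypothetical helpers below build nested queries that follow those rules and flag lower-case words that look like operators. The check is a simple heuristic and would also flag a genuine search word such as "not" inside a phrase.

```python
import re
from typing import List

OPERATORS = ("AND", "OR", "NOT")
# An upper-case operator with whitespace on both sides.
_HAS_OPERATOR = re.compile(r"\s(AND|OR|NOT)\s")


def combine(op: str, *parts: str) -> str:
    """Join parts with a Boolean operator surrounded by spaces, wrapping any
    part that already contains an operator in parentheses for nesting."""
    if op not in OPERATORS:
        raise ValueError("operator must be AND, OR, or NOT in upper case")
    wrapped = [f"({p})" if _HAS_OPERATOR.search(p) else p for p in parts]
    return f" {op} ".join(wrapped)


def lowercase_operators(query: str) -> List[str]:
    """Return words such as 'and' or 'Or' that look like Boolean operators
    but are not upper case (Entrez requires upper-case Booleans)."""
    words = re.findall(r"\b(and|or|not)\b", query, flags=re.IGNORECASE)
    return [w for w in words if w not in OPERATORS]


print(combine("AND", "cystic fibrosis[titl]", "human[orgn]"))
# cystic fibrosis[titl] AND human[orgn]
print(combine("AND", combine("OR", "cystic fibrosis[titl]", "mucoviscidosis[titl]"), "human[orgn]"))
# (cystic fibrosis[titl] OR mucoviscidosis[titl]) AND human[orgn]
print(lowercase_operators("cystic fibrosis[titl] and human[orgn]"))
# ['and']
```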
See the Entrez Exercises handout, or the exercises in the Entrez help document for the sequence databases. (The exercises in the handout are taken from the help document.)
The NCBI Education Web page also provides access to several Entrez tutorials, including:
- Nucleotides -- search to see whether the database contains the sequence of a penicillin-binding gene (which renders penicillin ineffective) from Mycobacterium tuberculosis, the organism that causes tuberculosis.
- OMIM -- search for information about epilepsy
WWW Entrez Help documentation is accessible from the sidebar of each Entrez Web page. Note that there are separate help documents for PubMed and for the molecular databases in Entrez.
McEntyre, J. 1998. Linking up with Entrez. Trends in Genetics 14(1):39-40.
Book Chapter (a bit dated, but discusses principles behind Entrez):
Schuler, G.D., J.A. Epstein, H. Ohkawa, and J.A. Kans. 1996. Entrez: Molecular biology database and retrieval systems. Chap. 10 in Methods in Enzymology, Vol. 266. San Diego: Academic Press.