Entrez

Entrez Databases

Integrated access to:
  • Nucleotide sequences: GenBank/EMBL/DDBJ, PDB, RefSeq (NM_* and NT_*)

  • Protein sequences: GenPept (translated coding regions from DNA), PIR, Swiss-Prot, PRF, PDB, RefSeq (NP_*)

  • 3-D Structures: Molecular Modeling Database (MMDB), derived from the Protein Data Bank (PDB)

  • Genomes: Complete genomes and schematics of entire chromosomes as well as associated mapping information, RefSeq (NC_*)

  • Taxonomy: names and lineages of the >75,000 organisms that are represented in the genetic databases with at least one nucleotide or protein sequence

  • PopSet: sets of aligned sequences submitted together from a population, phylogenetic, or mutation study, describing such events as evolution and population variation. Includes both nucleotide and protein sequence data.

  • PubMed: bibliographic records from MEDLINE, plus additional publisher-supplied citations.

Text Term Searches

  • Search fields vary by database.

  • Some fields are common to all Entrez databases, such as accession number (or in PubMed, UID), author name, text word, journal name, publication date, volume, etc.

  • Other fields are present in most of the Entrez databases, such as title word, which appears in all except Structures.

  • Some fields that are particularly useful for searching the nucleotide and protein sequence databases include:
    • Organism
    • Properties
    • Sequence Length
    • Feature Key (nucleotides only)
    • Molecular Weight (proteins only)
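For those who later script their searches, the field-qualified terms above are just text. The sketch below composes them as strings; the helper names (tag, tag_range, all_of) are hypothetical, and the abbreviations titl, orgn, slen, fkey, and molwt, along with the low:high range syntax, are assumptions to check against the "Search Fields and Qualifiers" list in Entrez Help.

```python
# Hypothetical helpers for writing Entrez search terms with field qualifiers.
# The tags used below (titl, orgn, slen, fkey, molwt) are assumed abbreviations;
# check the "Search Fields and Qualifiers" list in Entrez Help before relying on them.


def tag(term: str, field: str) -> str:
    """Attach a field qualifier in square brackets, e.g. tag('human', 'orgn') -> 'human[orgn]'."""
    return f"{term.strip()}[{field}]"


def tag_range(low: int, high: int, field: str) -> str:
    """Build a numeric range term (assumed low:high[field] syntax) for length or weight fields."""
    if low > high:
        raise ValueError("range start must not exceed range end")
    return f"{low}:{high}[{field}]"


def all_of(*terms: str) -> str:
    """Join terms with an upper-case AND, as Entrez requires."""
    return " AND ".join(terms)


# Human nucleotide records with "cystic fibrosis" in the title, 1,000 to 5,000 bases long,
# that carry a CDS feature (Feature Key applies to nucleotides only).
nucleotide_query = all_of(
    tag("cystic fibrosis", "titl"),
    tag("human", "orgn"),
    tag_range(1000, 5000, "slen"),
    tag("CDS", "fkey"),
)

# Protein records in a molecular-weight window (Molecular Weight applies to proteins only).
protein_query = all_of(tag("insulin", "titl"), tag_range(5000, 15000, "molwt"))

print(nucleotide_query)
# cystic fibrosis[titl] AND human[orgn] AND 1000:5000[slen] AND CDS[fkey]
print(protein_query)
# insulin[titl] AND 5000:15000[molwt]
```

Building terms this way keeps each field qualifier attached to the right word and the Boolean operators in upper case; the resulting string can be pasted into the Entrez query box unchanged.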

The "Properties" Search Field

The Properties field is very useful for searching the Entrez nucleotide and protein sequence databases. It allows you to limit searches by a number of different record attributes, including:

  • molecule type -- e.g., genomic DNA is represented in the index as "biomol_genomic"[prop]; mRNA is represented as "biomol_mRNA"[prop]
    (the sample GenBank record includes additional information about molecule types)

  • GenBank division -- e.g., the GenBank EST division is represented as "gbdiv_EST"[prop]; note that you can exclude ESTs by using a search such as: insulin[titl] NOT gbdiv_est[prop]
    (the sample GenBank record includes additional information about GenBank divisions)

  • gene location -- e.g., a mitochondrial gene is represented as "gene in mitochondrion"[prop]; a plasmid gene is represented as "gene in plasmid"[prop]

  • source database -- e.g., a RefSeq record is represented as "srcdb_refseq"[prop]; a Swiss-Prot record is represented as "srcdb_swiss_prot"[prop]

The most commonly used properties are shown on the Limits page as check boxes or in pop-up menus (see the Entrez Help documentation for more details).

To see a complete list of properties, browse the index of that field in the database of interest. For example, select the Entrez Nucleotide database, follow the link for Preview/Index in the grey bar beneath the search box, select Properties from the Search Field pop-up menu, and press the Index button. Use the Up and Down buttons to scroll through the index.
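Property limits are added to a query in the same way as any other field-qualified term. The following is a minimal sketch, assuming the property values shown in the examples above; the helpers prop and limit_by_properties are hypothetical. It quotes multi-word values such as "gene in mitochondrion", ANDs in wanted properties, and uses NOT to drop unwanted ones, relying on Entrez Help's statement that Boolean operators are processed left to right.

```python
# Hypothetical helpers for adding Properties limits to an Entrez query string.
# Property values (biomol_mrna, gbdiv_est, srcdb_refseq, "gene in mitochondrion")
# come from the examples above; browse the Properties index for the full list.

BOOLEAN_OPERATORS = (" AND ", " OR ", " NOT ")


def prop(value: str) -> str:
    """Format a Properties index entry, quoting values that contain spaces."""
    return f'"{value}"[prop]' if " " in value else f"{value}[prop]"


def limit_by_properties(query: str, include=(), exclude=()) -> str:
    """AND in each included property and NOT out each excluded one.

    Entrez evaluates Boolean operators left to right, so appended limits
    apply to everything before them; parentheses around a compound base
    query only make that grouping explicit.
    """
    compound = any(op in query for op in BOOLEAN_OPERATORS)
    result = f"({query})" if compound else query
    for value in include:
        result += f" AND {prop(value)}"
    for value in exclude:
        result += f" NOT {prop(value)}"
    return result


# The EST-exclusion search from the GenBank division example:
print(limit_by_properties("insulin[titl]", exclude=["gbdiv_est"]))
# insulin[titl] NOT gbdiv_est[prop]

# Human mitochondrial genes from RefSeq:
print(limit_by_properties("human[orgn]", include=["gene in mitochondrion", "srcdb_refseq"]))
# human[orgn] AND "gene in mitochondrion"[prop] AND srcdb_refseq[prop]
```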

Special Features

  • Related records, or "neighbors" within each database

  • Hard links to associated records in the other Entrez databases

  • Display and save formats vary by data domain

  • ASN.1: all records are stored in Abstract Syntax Notation One (ASN.1) format,
    which is key to integrating multiple different databases within a single search system
    (example: U49845 in ASN.1 format)

Three Levels of Search Complexity

  • Basic

    Just enter search terms without specifying search fields, other limits, or Boolean operators. E.g., in the Entrez Nucleotide database,
    Enter:  cystic fibrosis human
    Entrez will search All Fields for many terms by default, and will map other terms to search fields that it selects. Click on the Details button to see exactly how Entrez executed your search.
  • Advanced

    Control your search to a greater degree by using the "Limits," "Index," and "History" options in New Entrez. These enable you to select search fields or other limits, view the index of a field, or combine components of your search in various ways. E.g.,
    Step 1:
    Select "Limits" option
    Enter:  cystic fibrosis
    Select "Title Word" as search field
    Press "Go"
    
    Step 2:
    Select "Limits" option
    Enter:  human
    Select "Organism" as search field
    Press "Go"
    
    Step 3:
    Select "History" option
    Enter:  #1 AND #2
    (Note that Boolean operators must be in upper case.
    The OR and NOT operators are also available,
    and parentheses can be used to nest the search.)
    
    Compare the search results with the Basic search.
  • Complex Boolean

    Enter your search in command language, indicating field qualifiers in square brackets []. If no field qualifier is indicated for a term, All Fields will be assumed. The "Search Fields and Qualifiers" section of the Entrez help documentation provides a brief description of each field and its corresponding abbreviation. It is not necessary to include a space between the search term and the field qualifier, although spaces must surround the Boolean operator. Booleans must be written in upper case, and parentheses can be used for nesting. E.g.,
    cystic fibrosis[titl] AND human[orgn]
    
    These results should be identical to the Advanced search. The main advantage is that you can insert your complete search into the Entrez query box in a single step.
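The Advanced and Complex Boolean levels produce the same search because a History combination such as #1 AND #2 is simply shorthand for the earlier queries. The sketch below illustrates that equivalence: a hypothetical SearchHistory class expands #n references into parenthesized queries, and esearch_url builds, without fetching, a URL for NCBI's E-utilities ESearch service. It assumes "nuccore" as the E-utilities name for Entrez Nucleotide, and it expands history numbers locally rather than using the server-side History (WebEnv/query_key) mechanism.

```python
import re
from urllib.parse import urlencode

# Assumed E-utilities ESearch endpoint; the URL is only built here, never fetched.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


class SearchHistory:
    """Hypothetical local stand-in for the Entrez History page."""

    def __init__(self):
        self._queries = []

    def add(self, query: str) -> int:
        """Record a search (expanding any #n references first) and return its number."""
        self._queries.append(self.expand(query))
        return len(self._queries)

    def expand(self, query: str) -> str:
        """Replace each #n with the parenthesized text of search n."""
        def substitute(match):
            n = int(match.group(1))
            if not 1 <= n <= len(self._queries):
                raise KeyError(f"no search #{n} in history")
            return f"({self._queries[n - 1]})"

        return re.sub(r"#(\d+)", substitute, query)


def esearch_url(db: str, term: str) -> str:
    """Build an ESearch URL for a complete Boolean query (spaces become '+')."""
    return f"{ESEARCH_URL}?{urlencode({'db': db, 'term': term})}"


history = SearchHistory()
history.add("cystic fibrosis[titl]")    # Step 1 -> #1
history.add("human[orgn]")              # Step 2 -> #2
combined = history.expand("#1 AND #2")  # Step 3
print(combined)
# (cystic fibrosis[titl]) AND (human[orgn])
print(esearch_url("nuccore", combined))
```

Because the expanded string differs from the Complex Boolean example only by redundant parentheses, typing cystic fibrosis[titl] AND human[orgn] directly should retrieve the same records.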

Four Versions

Exercises

See the Entrez Exercises handout, or the exercises in the Entrez help document for the sequence databases. (The exercises in the handout are taken from the help document.)

The NCBI Education Web page also provides access to several Entrez tutorials, including:
  • Nucleotides -- search to see if the database contains the sequence of a penicillin-binding gene (which renders penicillin ineffective) from Mycobacterium tuberculosis, the organism that causes tuberculosis.

  • OMIM -- search for information about epilepsy

References

Help Documentation:

WWW Entrez Help Documentation is accessible from the sidebar of each Entrez Web page. Note that there are separate help documents for PubMed and for the molecular databases in Entrez.

Articles:

McEntyre, J. 1998. Linking up with Entrez. Trends in Genetics 14(1):39-40.

Book Chapter (a bit dated, but discusses principles behind Entrez):

Schuler, G.D., J.A. Epstein, H. Ohkawa, and J.A. Kans. 1996. Entrez: Molecular biology database and retrieval systems. Chap. 10 in Methods in Enzymology, Vol. 266. San Diego: Academic Press.

  Revised January 17, 2007
Comments/questions about course to Renata Geer renata@ncbi.nlm.nih.gov
Questions about NCBI resources to info@ncbi.nlm.nih.gov