Information Hubs

Retrieve all data from a given organism or taxon


Sample User Question

I will soon begin a postdoctoral fellowship in a lab studying the dog genome. In preparation, I would like to assess the quantity and types of data currently available for that organism. What is the easiest way to do that?

Comments / Analysis

The Organism field in Entrez databases such as Nucleotide, Protein, and Structure allows you to retrieve data from a specific source organism, or for any other node in the taxonomic hierarchy. For example, searching the Entrez Nucleotide database for dog[orgn] will retrieve all the nucleotide sequence records currently available for the species Canis familiaris. (Whenever possible, common names are mapped to scientific names, so searching for either name will retrieve the same results.) However, searching each individual database would be somewhat cumbersome.
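The per-database searches described above can also be written as query URLs. The following is a minimal sketch, assuming the NCBI E-utilities ESearch service (which this tutorial does not cover) and its db, term, and retmax parameters; the helper name organism_search_url is hypothetical. It only builds the URLs and does not contact NCBI.

```python
from urllib.parse import urlencode

# Assumption: the E-utilities ESearch endpoint, not mentioned in the original text.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def organism_search_url(db: str, organism: str) -> str:
    """Build an ESearch URL that limits a search to one source organism."""
    params = {
        "db": db,
        "term": f"{organism}[orgn]",  # the Organism field tag used in the text
        "retmax": 0,                  # request the hit count only, no record IDs
    }
    return f"{ESEARCH}?{urlencode(params)}"


# One query per database, as in the manual approach.
for db in ("nucleotide", "protein", "structure"):
    print(organism_search_url(db, "dog"))
```

Even in this form, the approach still requires one query for every database of interest.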

In contrast, the NCBI Taxonomy Browser provides a broad view of all the types and quantities of data available for any organism within the Entrez system. (See note about scope of taxonomy database.) An "Entrez Records" summary table allows you to instantly retrieve data for that organism from any Entrez database in which it has records.

The Taxonomy Browser also allows you to browse up and down the taxonomic tree to view similar information (and retrieve associated data) for any node in the taxonomic hierarchy.

Step By Step Guide

  1. Taxonomy Browser - retrieve the Taxonomy database record for dog

    • enter dog in the search box
      (Whenever possible, common names are mapped to scientific names, so you can search for either and get the same results.)
    • leave the search mode set to complete name
      (other search modes such as token set, etc. are explained under additional tips)
    • press Go
    • the initial search results display page shows brief information such as:
      • lineage of the organism, showing higher taxonomic nodes (see tip on browsing up and down the taxonomic tree)
      • scientific name
      • taxonomic nodes that fall beneath the organism in the taxonomic hierarchy

  2. Display complete record for dog and view summary table of "Entrez Records"

    • click on the organism name, Canis familiaris, to display the full taxonomy database record
    • the complete record includes detailed information such as:
      • scientific name
      • taxonomy ID (TaxID, the unique identifier for the taxonomic node)
      • synonyms and common names
      • lineage
      • information about genetic codes
      • a summary table of all data currently available in Entrez for that organism or taxonomic node
      • links to additional data and resources

  3. Follow links of interest in the "Entrez Records" summary table to retrieve the corresponding data

    • numbers in the subtree links column represent all records for that taxonomic node and all nodes that fall beneath it
    • numbers in the direct links column represent records that have the specific organism name (or taxonomic node) listed as the source organism for the data.
      For Canis familiaris, the numbers in both columns are the same at the time of this writing. View the taxonomy database record for the genus Canis to see an example of how the numbers in those columns can differ.

Additional Tips

Scope of NCBI Taxonomy Database

The NCBI Taxonomy Database contains the names and lineages of >130,000 organisms, both living and extinct, that are represented in the molecular biology databases with at least one nucleotide or protein sequence. New organisms are added to the database as sequence data are deposited for them. The purpose of the taxonomy project at NCBI is to build a consistent phylogenetic taxonomy for the sequence databases.

Taxonomy Browser Search Modes (complete name, token set, etc.)

The Taxonomy Browser offers several search modes, which can be selected from the pop-up menu beside the text box:

  • complete name looks for a complete common or scientific name of an organism, or the complete name of any other taxonomic node, e.g., dog, Canis familiaris, Canidae. However, a partial term such as Canid will not retrieve any records because it is not a complete name.
  • wild card allows searching with an asterisk as a wild card anywhere in the string, e.g.: Canid* or *anid*, or Ca*id
  • token set searches for a term anywhere in a name, whether at the beginning, in the middle, or at the end, e.g., dog. The term must appear in the retrieved names as a complete word, not as a word stem. For example, the search will retrieve dog hookworm and black-tailed prairie dog, but it will not retrieve dogfish sharks or Doguera baboon.
  • phonetic name searches for names that sound like the search term, e.g., "drosofila" will retrieve Drosophila.
  • taxonomy id searches by TaxID, which is a unique identifier assigned to every node in the taxonomic hierarchy. Some examples of TaxIDs: human (9606), Mammalia (40674), Canidae (9608), and Canis familiaris (9615).

The "lock" check box preserves your selected search mode for subsequent searches. If the box is not checked, the Taxonomy Browser will return to the default search mode of complete name after each search.
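The difference between the complete name, wild card, and token set modes can be approximated in a few lines of Python. This is only a sketch of the behavior described above, run against a small hand-picked list of names taken from the examples; it is not how the Taxonomy Browser is implemented, and the function names are hypothetical. Phonetic and TaxID searches are left out.

```python
import re
from fnmatch import fnmatchcase

# Names taken from the examples in the text; the real database is far larger.
NAMES = [
    "dog", "Canis familiaris", "Canidae",
    "dog hookworm", "black-tailed prairie dog",
    "dogfish sharks", "Doguera baboon",
]


def complete_name(query, names=NAMES):
    """Match only when the query equals the whole name (case-insensitive)."""
    return [n for n in names if n.lower() == query.lower()]


def wild_card(pattern, names=NAMES):
    """Match with * standing for any run of characters (fnmatch-style)."""
    return [n for n in names if fnmatchcase(n.lower(), pattern.lower())]


def token_set(query, names=NAMES):
    """Match when the query is a whole word in the name, not a word stem."""
    return [n for n in names
            if query.lower() in re.split(r"[\s-]+", n.lower())]


print(complete_name("Canid"))   # a partial name matches nothing
print(wild_card("Canid*"))
print(token_set("dog"))
```

Splitting names on whitespace and hyphens is an assumption made here so that "black-tailed prairie dog" yields the word "dog"; the browser's own tokenization rules are not documented in this tutorial.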

Browsing up and down the taxonomic tree

The NCBI Taxonomy Browser displays the lineage of every organism and taxonomic node. (The full lineage is displayed by default. Click the "Lineage" link to toggle between the full and abbreviated lineage displays.)

Browsing Up the Tree:  Click on any link in the lineage to display the complete record for a higher taxonomic node. For example, click on the order Carnivora in the dog's lineage to display information about that taxon.

Browsing Down the Tree:  The default display for a taxon will show three levels of organisms that fall under the taxon being displayed. If desired, change the number of levels displayed in the taxonomic tree. For example, when viewing the tree for Carnivora, change the number of levels from 3 to 5 and press the Display button. Click on any taxon name to display its complete record, including its "Entrez Records" summary table.
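Browsing up and down the tree can be pictured with a small parent map. The sketch below uses a heavily simplified fragment of the dog's lineage (intermediate ranks are omitted, and Felidae is added only to give Carnivora a second branch); the function names and the exact way levels are counted are assumptions, not the browser's own logic.

```python
# Simplified child -> parent map; intermediate ranks are omitted.
PARENT = {
    "Canis familiaris": "Canis",
    "Canis": "Canidae",
    "Canidae": "Carnivora",
    "Felidae": "Carnivora",
    "Carnivora": "Mammalia",
}


def lineage(taxon):
    """Browse up: list the ancestors of a taxon, highest node first."""
    path = []
    while taxon in PARENT:
        taxon = PARENT[taxon]
        path.append(taxon)
    return path[::-1]


def subtree(taxon, levels, depth=0):
    """Browse down: yield (depth, name) for up to `levels` levels below taxon."""
    if depth > 0:
        yield depth, taxon
    if depth < levels:
        for child in sorted(c for c, p in PARENT.items() if p == taxon):
            yield from subtree(child, levels, depth + 1)


print(" > ".join(lineage("Canis familiaris")))
for depth, name in subtree("Carnivora", levels=3):
    print("  " * depth + name)
```

Raising levels shows more of the tree beneath the starting taxon, which mirrors changing the number of levels and pressing Display.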

"Subtree Links" vs. "Direct Links" in "Entrez Records" summary table

When viewing the complete record for any organism or node in the taxonomic hierarchy, the "Entrez Records" summary table includes columns labeled:

  • subtree links -- all records for that taxonomic node and all nodes that fall beneath it
  • direct links -- records that have the specific organism name (or taxonomic node) listed as the source organism for the data.

For example, click on the link for the genus Canis (in the lineage section of the dog record) and view the complete record for that taxonomic node. Follow the number for Nucleotide records in the "direct links" column to see sequence records for which the submitters noted only the genus Canis as the source organism. Follow the number for PopSet records to see population or phylogenetic studies that include sequences from several species in the genus Canis.
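The relationship between the two columns can be sketched as a small tree in which each node stores only its direct count and the subtree count is computed by summing downward. The record counts below are invented purely for illustration, and the TaxonNode class is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TaxonNode:
    name: str
    direct: int                       # records naming exactly this node as the source
    children: list = field(default_factory=list)

    def subtree(self) -> int:
        """Records for this node plus every node that falls beneath it."""
        return self.direct + sum(child.subtree() for child in self.children)


# Invented counts, for illustration only.
dog = TaxonNode("Canis familiaris", direct=1000)
coyote = TaxonNode("Canis latrans", direct=200)
canis = TaxonNode("Canis", direct=5, children=[dog, coyote])

for node in (canis, dog):
    print(f"{node.name}: subtree links={node.subtree()}, direct links={node.direct}")
```

In this sketch a node with nothing beneath it, such as the dog, shows the same number in both columns, while the genus shows a small direct count and a much larger subtree count.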

Revised 09/07/2006