Retrieve all data from a given organism or taxon
Sample User Question
I will soon begin a postdoctoral fellowship in a lab studying the dog
genome. In preparation, I would like to assess the quantity and types of data
currently available for that organism. What is the easiest way to do that?
Comments / Analysis
The Organism field in Entrez databases such as Nucleotide, Protein, and
Structure allows you to retrieve data for a specific source organism, or for any
other node in the taxonomic hierarchy. For example, searching the Entrez
Nucleotide database for dog[orgn] will retrieve all the nucleotide sequence
records currently available for the species Canis familiaris.
(Whenever possible, common names are mapped to scientific names, so searching for
either name will retrieve the same results.) However, searching each individual
database would be somewhat cumbersome.
In contrast, the NCBI
Taxonomy Browser provides a broad view of all the types and quantities
of data available for any organism within the Entrez system. (See note about
scope of taxonomy database.) An "Entrez Records" summary
table allows you to instantly retrieve data for that organism from any Entrez
database in which it has records.
The Taxonomy Browser also allows you to browse up and down the taxonomic
tree to view similar information (and retrieve associated data) for any node
in the taxonomic hierarchy.
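The per-database Organism searches described above can also be scripted. The following is a minimal sketch, assuming the NCBI E-utilities ESearch service, which this guide does not itself cover. The function names and the database identifiers ("nucleotide", "protein", "structure") are assumptions chosen for illustration. The code only builds the request URLs and does not contact NCBI.

```python
from urllib.parse import urlencode

# Assumption: the public E-utilities ESearch endpoint (not part of this guide).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def organism_count_url(db, organism):
    """Build an ESearch URL that requests only the record count
    for an Organism-field search (e.g. dog[orgn]) in one database."""
    params = {
        "db": db,
        "term": f"{organism}[orgn]",  # same query syntax as the web search box
        "rettype": "count",           # ask for the count, not the ID list
        "retmode": "json",
    }
    return f"{ESEARCH}?{urlencode(params)}"


def organism_count_urls(organism, dbs=("nucleotide", "protein", "structure")):
    """Return one count URL per database (hypothetical helper)."""
    return {db: organism_count_url(db, organism) for db in dbs}


if __name__ == "__main__":
    for db, url in organism_count_urls("dog").items():
        print(db, url)
```

Fetching each URL would return one count per database, which is roughly the information the "Entrez Records" summary table gathers in a single view.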
Step By Step Guide
- Taxonomy Browser - retrieve the Taxonomy database
record for dog
- enter dog in the search box
(Whenever possible, common names are mapped to scientific names,
so you can search for either and get the same results.)
- leave the search mode set to complete name
(other search modes such as token set, etc.
are explained under additional tips)
- press Go
- the initial search results display page shows brief information, including:
- lineage of the organism, showing higher taxonomic nodes (see
tip on browsing up and down the taxonomic tree)
- scientific name
- taxonomic nodes that fall beneath the organism in the taxonomic
hierarchy
- Display complete record for dog and view summary table of "Entrez
Records"
- click on the organism name, Canis familiaris, to display the
full taxonomy database record
- the complete record includes detailed information such as:
- scientific name
- taxonomy ID (TaxID, the unique identifier for the taxonomic node)
- synonyms and common names
- information about genetic codes
- a summary table of all data currently available in
Entrez for that organism or taxonomic node
- links to additional data and resources
- follow links of interest in the "Entrez Records" summary table to
retrieve the corresponding data
- numbers in the subtree links column represent all records for that
taxonomic node and all nodes that fall beneath it
- numbers in the direct links column represent records that have the
specific organism name (or taxonomic node) listed as the source
organism for the data.
For Canis familiaris, the numbers in both columns are the
same at the time of this writing. View the taxonomy database record for the genus
Canis to see an example of how the numbers in those columns can differ.
Scope of NCBI Taxonomy Database
The NCBI Taxonomy Database contains the names and lineages of >130,000
organisms, both living and extinct, that are represented in the molecular biology
databases with at least one nucleotide or protein sequence. New organisms are
added to the database as sequence data are deposited for them. The purpose of the
taxonomy project at NCBI is to build a consistent phylogenetic taxonomy for the
sequence databases.
Taxonomy Browser Search Modes (complete name, token set, etc.)
The Taxonomy Browser offers several search modes, which can be selected
from the pop-up menu beside the text box:
- complete name looks for a complete common or scientific name of an
organism, or the complete name of any other taxonomic node, e.g., dog, Canis
familiaris, Canidae. However, a partial term such as Canid
will not retrieve any records because it is not a complete name.
- wild card allows searching with an asterisk as a wild card anywhere
in the string, e.g., Canid*, *anid*, or Ca*id
- token set searches for a term anywhere in a name, whether at the
beginning, in the middle, or at the end, e.g., dog. However, "dog" must appear as
a complete word, and not as a word stem, in the retrieved records. For example,
the search will retrieve dog hookworm and black-tailed prairie dog, but it will
not retrieve dogfish sharks.
- phonetic name searches for names that sound like the search term, e.g., a
search for "drosofila" will retrieve Drosophila.
- taxonomy id searches by TaxID, which is a unique identifier assigned
to every node in the taxonomic hierarchy. Some examples of TaxIDs: human (9606),
Mammalia (40674), Canidae (9608), and Canis familiaris (9615).
The "lock" check box preserves your selected search mode for subsequent
searches. If the box is not checked, the Taxonomy Browser will return to
the default search mode of complete name after each search.
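The difference between the complete name, wild card, and token set modes can be illustrated with a small sketch. This is not the Taxonomy Browser's implementation. The name list, the function names, and the case-insensitive comparison are all assumptions made for illustration.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical sample of names, taken from the examples in this guide.
NAMES = [
    "dog",
    "Canis familiaris",
    "Canidae",
    "dog hookworm",
    "black-tailed prairie dog",
    "dogfish sharks",
]


def complete_name(term, names=NAMES):
    """Match only when the whole name equals the term."""
    return [n for n in names if n.lower() == term.lower()]


def wild_card(pattern, names=NAMES):
    """Match names against a pattern where * stands for any run of characters."""
    return [n for n in names if fnmatchcase(n.lower(), pattern.lower())]


def token_set(term, names=NAMES):
    """Match names that contain the term as a complete word (token)."""
    t = term.lower()
    return [n for n in names if t in re.findall(r"[a-z0-9]+", n.lower())]


if __name__ == "__main__":
    print(complete_name("Canid"))  # partial name: no matches
    print(wild_card("Canid*"))
    print(token_set("dog"))
```

In this sketch, token_set("dog") keeps "dog hookworm" and "black-tailed prairie dog" but drops "dogfish sharks", because "dog" is only a word stem there.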
Browsing up and down the taxonomic tree
The NCBI Taxonomy Browser displays the lineage of every organism and
taxonomic node. (The full lineage is displayed by default. Click the
"Lineage" link to toggle between the full and abbreviated lineage displays.)
Browsing Up the Tree: Click on any link in the lineage to
display the complete record for a higher taxonomic node. For example, click on
the order Carnivora in the dog's lineage to display information about that
order.
Browsing Down the Tree: The default display for a taxon
will show three levels of organisms that fall under the taxon being
displayed. If desired, change the number of levels displayed in the taxonomic
tree.
For example, when viewing the tree for Carnivora, change the number of levels from
3 to 5 and press the Display button. Click on any taxon name to display its
complete record, including its "Entrez Records" summary table.
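Browsing up (following the lineage) and browsing down (showing a fixed number of levels) are both simple tree walks. The sketch below uses a deliberately simplified, hypothetical slice of the hierarchy. The PARENT table and function names are assumptions, and the real lineage of the dog contains more ranks than shown here.

```python
# Hypothetical, simplified slice of the tree: child -> parent.
PARENT = {
    "Carnivora": "Mammalia",
    "Canidae": "Carnivora",
    "Felidae": "Carnivora",
    "Canis": "Canidae",
    "Vulpes": "Canidae",
    "Canis familiaris": "Canis",
    "Canis latrans": "Canis",
}


def lineage(node):
    """Browse up: list ancestors from the top of the tree down to the parent."""
    path = []
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path[::-1]


def subtree(node, levels=3):
    """Browse down: (depth, name) pairs for up to `levels` levels under node."""
    found = []
    frontier = [node]
    for depth in range(1, levels + 1):
        # Children of everything found at the previous depth.
        frontier = sorted(c for c, p in PARENT.items() if p in frontier)
        found.extend((depth, c) for c in frontier)
    return found


if __name__ == "__main__":
    print(lineage("Canis familiaris"))
    for depth, name in subtree("Carnivora", levels=3):
        print("  " * depth + name)
```

Raising levels from 3 to 5 in this toy tree returns the same nodes, because the sample data is only three levels deep below Carnivora. In the real browser, a larger value exposes more of the tree.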
"Subtree Links" vs. "Direct Links" in "Entrez Records" summary table
When viewing the complete record for any organism or node in the taxonomic
hierarchy, the "Entrez Records" summary table includes columns labeled:
- subtree links -- all records for that taxonomic node and all nodes
that fall beneath it
- direct links -- records that have the specific organism name
(or taxonomic node) listed as the source organism for the data.
For example, click on the link for the genus Canis (in the lineage
section of the dog record) and view the complete record for that taxonomic node.
Follow the number for Nucleotide records in the "direct links" column to see
sequence records for which the submitters noted only the genus Canis as the
source organism. Follow the number for PopSet records to see population or
phylogenetic studies that include sequences from several species in the genus
Canis.
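The same subtree versus direct distinction can be expressed as Entrez query terms. The sketch below assumes the Organism field's ":exp" (include descendant nodes) and ":noexp" (this node only) qualifiers, which this guide does not describe. The function names are hypothetical, and the TaxIDs are the ones listed under the search modes above.

```python
def subtree_term(taxid):
    """Query term for records at this node and every node beneath it
    (assumed to correspond to the "subtree links" column)."""
    return f"txid{taxid}[Organism:exp]"


def direct_term(taxid):
    """Query term for records annotated with exactly this node
    (assumed to correspond to the "direct links" column)."""
    return f"txid{taxid}[Organism:noexp]"


if __name__ == "__main__":
    canidae = 9608  # TaxID for Canidae, as given in this guide
    print(subtree_term(canidae))
    print(direct_term(canidae))
```

Pasting either term into an Entrez database search box should reproduce the corresponding column, though the counts change as new records are added.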