Statistics for NCBI Resources

General tips for obtaining Entrez database statistics

You can determine the number of records in a given Entrez database by viewing the index of the Filter field. Each database has the term "all" in its Filter field. The number in parentheses beside that term is the number of records currently present in the database.

For example, to see the number of records in the PubMed database, follow the steps below. Similar steps show the number of records in other Entrez databases, such as PubMed Central or the MMDB Structure database.
  • From the Entrez home page, follow the link for the PubMed database
  • On the PubMed database page, select Preview/Index from the grey area under the search box
    There are two search boxes on the Preview/Index page: (a) the search box near the top of the page shows the active query; (b) the search box near the bottom of the page is like a "worksheet" that allows you to browse the index of a search field of interest and/or to select one or more terms from the index for addition to your active query
  • Select the Filter field from the pop-up menu of searchable fields that is shown beside the lower search box.
  • Enter "all" (without quotes) as the search term and press the Index button
    A window will appear at the bottom of the page that allows you to see your term in the index of the search field, and to browse up and down the index. (Tip: If no term is entered in the search box before pressing the "Index" button, the system will automatically take you to the first term in the index. Entering a search term simply forces the system to jump to a specific part of the index.)
  • The number in parentheses beside the term "all" is the number of records currently in the PubMed database.
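
The same lookup can also be scripted. The sketch below is not part of the procedure described above: it assumes the NCBI E-utilities ESearch service, its current public URL, and its `rettype=count` option, none of which are mentioned on this page. It builds a request for `all[filter]` in a chosen database and reads the `<Count>` element from the XML reply. No network call is made here; the sample reply and its number are made up for illustration.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumed endpoint (E-utilities ESearch); not taken from this page.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_count_url(db: str) -> str:
    """Build an ESearch URL asking how many records match all[filter] in `db`."""
    params = {"db": db, "term": "all[filter]", "rettype": "count"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


def parse_count(xml_text: str) -> int:
    """Extract the integer in the top-level <Count> element of an ESearch reply."""
    root = ET.fromstring(xml_text)
    count = root.findtext("Count")
    if count is None:
        raise ValueError("no <Count> element in response")
    return int(count)


# Made-up reply body; 12345 is a placeholder, not a real record count.
sample_reply = "<eSearchResult><Count>12345</Count></eSearchResult>"

print(build_count_url("pubmed"))
print(parse_count(sample_reply))
```

To use it against the live service, fetch the URL returned by `build_count_url` with any HTTP client and pass the response text to `parse_count`.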
Additional statistics for some Entrez databases are presented on special web pages accessible through the links given in the sections below.

Additional statistics web pages for specific databases

Consensus CDS (CCDS) Database Statistics
The Consensus CDS (CCDS) Database home page includes a link to statistics.

As noted on the home page, the Consensus CDS (coding sequence) project is a collaborative effort to identify a core set of human protein-coding regions that are consistently annotated and of high quality. The long-term goal is to support convergence towards a standard set of gene annotations on the human genome.

dbEST Statistics
The Database of Expressed Sequence Tags (dbEST) home page includes a link to information about the current release, which summarizes the number of publicly available ESTs by organism.

dbGSS Statistics
The Database of Genome Survey Sequences (dbGSS) home page includes a link to information about the current release, which summarizes the number of publicly available GSSs by organism.

dbSNP Statistics
The dbSNP Summary page displays statistics for the current dbSNP release and also lets you view summary information for previous releases.

GenBank Growth Statistics
Graph
Numbers - Current GenBank Release Notes
Numbers - Past GenBank Release Notes

The top of each GenBank Release Notes file shows the number of sequence records and bases in that release.
Specific sections of the Release Notes include additional statistics:

2.2.6 (per division statistics) -- for current release only
2.2.7 (per organism statistics) -- for current release only
2.2.8 (growth of GenBank) -- from December 1982 through the present

To plot the growth of data for specific GenBank divisions or organisms, compare the statistics in section 2.2.6 or 2.2.7, respectively, from current and past Release Notes.
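
The comparison described above comes down to subtracting the figures of a past release from those of the current one. The following is a minimal sketch under stated assumptions: the base counts are hypothetical placeholders standing in for values copied by hand from section 2.2.6 of two Release Notes files, and `growth_between` is an illustrative helper, not an NCBI tool. The division codes (PRI, BCT, ENV) are used only as example keys.

```python
def growth_between(past: dict[str, int], current: dict[str, int]) -> dict[str, int]:
    """Return the change in base count per division between two releases.

    A division missing from the past release is treated as having had 0 bases.
    """
    return {name: current[name] - past.get(name, 0) for name in current}


# Hypothetical per-division base counts; not real GenBank statistics.
past_release = {"PRI": 1000, "BCT": 400}
current_release = {"PRI": 1500, "BCT": 650, "ENV": 80}

growth = growth_between(past_release, current_release)
for division, delta in sorted(growth.items()):
    print(f"{division}: {delta:+d} bases")
```

Repeating the subtraction across a series of past releases gives the points for a growth plot; the same approach works for the per-organism figures in section 2.2.7.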

Entrez Gene database statistics
The blue sidebar of the Entrez Gene database homepage provides a link to statistics summarizing the number of organisms represented in Entrez Gene from major taxonomic groups such as Archaea, Bacteria, Eukaryota, Viroids, and Viruses, and the number of gene records currently available for each of the organisms. Following the link for an individual organism name, such as Homo sapiens, will display a table showing the current as well as previous number of gene records for that organism.

Gene Expression Omnibus (GEO) Statistics
The upper right corner of the GEO home page provides statistics summarizing the number of platforms, samples, and series currently available in the database.

OMIM Statistics
The blue sidebar of the Online Mendelian Inheritance in Man (OMIM) home page includes a link to OMIM statistics. The statistics page shows the total number of records in the database, broken down into categories that correspond to the MIM number prefixes:

*          Genes with known sequence
+          Genes with known sequence and phenotype
#          Phenotype description, molecular basis known
%          Mendelian phenotype or locus, molecular basis unknown
no prefix  Other, mainly phenotypes with suspected Mendelian basis
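
The prefix scheme above can be expressed as a small lookup table. This sketch is only an illustration of how a list of prefixed MIM numbers could be tallied by category: the function name and the entries in `sample_entries` are made up, and the numbers are placeholders rather than real OMIM records.

```python
from collections import Counter

# Prefix-to-category mapping, taken from the list above.
MIM_PREFIXES = {
    "*": "genes with known sequence",
    "+": "genes with known sequence and phenotype",
    "#": "phenotype description, molecular basis known",
    "%": "Mendelian phenotype or locus, molecular basis unknown",
}
NO_PREFIX = "other, mainly phenotypes with suspected Mendelian basis"


def category_for(mim_entry: str) -> str:
    """Return the category for an entry such as '*123456' or '123456'."""
    entry = mim_entry.strip()
    # entry[:1] is the first character, or '' for an empty string.
    return MIM_PREFIXES.get(entry[:1], NO_PREFIX)


# Placeholder entries for illustration only; not real MIM numbers.
sample_entries = ["*000001", "#000002", "#000003", "%000004", "000005"]

tally = Counter(category_for(entry) for entry in sample_entries)
for category, count in tally.items():
    print(f"{count}  {category}")
```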

RefSeq Statistics
The NCBI FTP site for RefSeq includes statistics for the current release and past releases.

Taxonomy Statistics
The NCBI Taxonomy home page includes a link to taxonomy statistics. By default, the cumulative, current statistics are shown for the number of higher taxa, genera, species, and lower taxa represented in NCBI's taxonomy database. The number of taxa that were added in any particular year can be viewed by following the link for the year of interest.

As noted in the Taxonomy database summary description in the Resource Guide, the NCBI Taxonomy Database contains the names and lineages of living and extinct organisms that are represented in the genetic databases with at least one nucleotide or protein sequence. New organisms are added to the database as sequence data are deposited for them. The purpose of the taxonomy project at NCBI is to build a consistent phylogenetic taxonomy for the sequence databases.

Genome Statistics

Entrez Genome Database Statistics
The number of records available in the Entrez Genome database can be determined using the approach described under "General tips for obtaining Entrez database statistics" (by looking up the term "all" in the index of the Filter field). Note that an organism can have multiple records in the Entrez Genome database (for example, one for each chromosome and one for each plasmid or organellar genome).

The number of organisms represented in Entrez Genome from each main domain of life -- archaea, bacteria, and eukaryota -- can be viewed by clicking on the domain of interest in the blue sidebar of the page. The top of the resulting page will show the total number of organisms in the group that have records in Entrez Genome, followed by a list of the organism names. The blue sidebar of the Entrez Genome home page provides similar links for viruses, viroids, and organelles.

Whenever possible, statistics for individual genomes are provided as well, such as the size of the genome (and/or individual chromosomes) in base pairs, the number of various features annotated on a genome, etc. The method by which these statistics can be accessed depends upon the software used to view a particular genome. Entrez Genome employs two graphical viewing software programs: (1) an original, basic viewer that is used to show smaller genomes such as bacteria, viruses, and organelles, and (2) a more powerful Map Viewer for larger and more complex genomes, such as those of eukaryotes. Additional information is provided in the next two sections on statistics for individual genomes.

Statistics for Individual Prokaryotic and Viral Genomes
For many organisms in Entrez Genome, the number of bases in the genome is shown on the organism's overview page. For example, the Escherichia coli K12, complete genome page shows that there are 4,639,675 bp in the genome. The page also includes a table listing statistics for the features that were annotated on the genome (e.g., all genes, protein-coding genes, structural RNAs, pseudogenes, and other features).

Statistics for Individual Eukaryotic Genomes
The Map Viewer is a software program that provides special searching, browsing, and viewing capabilities for a growing subset of organisms in Entrez Genome. Each organism name shown on the Map Viewer home page leads to a "genome view" for that organism. Whenever possible, an organism's genome view page provides a link to the statistics for that organism's current genome build.

For example, the top of the human genome view page includes a link to the current build statistics (under the header "Homo_sapiens genome view", follow the link for "Build XX.X statistics"). The build statistics summarize the types and quantities of data used in the genome build, and the types and quantities of objects (e.g., genes, markers, ESTs, phenotypes) placed on different types of maps (e.g., sequence, genetic, cytogenetic). The chromosome lengths are displayed in the detailed graphical views of individual chromosomes (also known as "map views"). The "View Summary" at the bottom of each map view shows the number of objects on each map in the display. More maps and objects can be displayed using the "Maps&Options" dialog box.

There is also an umbrella page that provides access to the build statistics for every organism in Map Viewer for which statistics are available.

Usage Statistics

PubMed Usage Statistics
PubMed usage statistics show the number of searches from January 1997 through the present. (The section "General tips for obtaining Entrez database statistics", above, explains how to determine the number of records in PubMed.)

Revised October 19, 2006
Questions about NCBI resources to info@ncbi.nlm.nih.gov