Nucleic Acids Res. Jan 1, 2001; 29(1): 1–10.

The Molecular Biology Database Collection: an updated compilation of biological database resources


The Molecular Biology Database Collection is an online resource listing key databases of value to the biological community. This Collection is intended to bring fellow scientists’ attention to high-quality databases that are available throughout the world, rather than merely to provide a lengthy listing of all available databases. As such, this up-to-date listing is intended to serve as a starting point for finding specialized databases that may be of use in biological research. The databases included in this Collection add value to the underlying data through curation, new data connections or other innovative approaches. Short, searchable summaries of each of the databases included in the Collection are available through the Nucleic Acids Research Web site, at http://www.nar.oupjournals.org.

With the advent of the new millennium, the scientific community marked a significant milestone in the study of biology: the completion of the ‘working draft’ of the human genome (1). Amid much fanfare, the completion of the working draft was announced by President Clinton at a White House ceremony on June 26, 2000 (http://www.whitehouse.gov/WH/New/html/20000626.html). This announcement signaled that the majority of biological and biomedical research would now be conducted in a ‘sequence-based’ fashion. This new approach, long awaited and much debated, promises to lead quickly to advances not just in the understanding of basic biological processes, but also in the prevention, diagnosis and treatment of many genetic and genomic disorders. While the full fruits of sequencing the human genome may not be known or appreciated for another hundred years, the implications for the way in which medicine will be practised in the future are staggering.

At the time of writing, the International Human Genome Sequencing Consortium had finished 24.7% of the human sequence, with another 66.2% available in draft form. In the course of this sequencing, two human chromosomes, 21 and 22, have been finished (2,3). Even with most of the chromosomes incomplete, some interesting insights into the structure of the human genome have already emerged, such as a marked downward revision in the estimated number of human genes. While most of the attention of the scientific community and the public at large has focused on the human sequence, the genomes of a number of model organisms have also been sequenced, including that of the fruit fly (Drosophila melanogaster) in 2000 (4); the complete genomes of organisms such as the rat and the mouse will follow quickly over the next several years. Efforts are also focused on sequence variation, with the SNP Consortium anticipating the identification of a million single nucleotide polymorphisms (SNPs) by the end of 2000, far ahead of the initial goal of discovering 100 000 SNPs by 2003 (1).

Database efforts have kept pace with the furious rate at which these sequence data are being generated, giving investigators practically instantaneous access to all public data (5). While most biologists are familiar with the databases comprising the International Nucleotide Sequence Database Collaboration (DDBJ, EMBL and GenBank), numerous other specialized databases have emerged. These specialized databases often arise out of a particular need, whether to address a specific biological question or to better serve a particular segment of the biological community. For the last several years, this journal has devoted its first issue of each year to documenting the availability and features of these specialized databases, in order to better serve its readership and to promote the use of these resources in the design and analysis of experiments. These reviewed databases are collectively listed in the Molecular Biology Database Collection.

The databases included in the current version of the Collection are shown in Table 1. This year, 55 new entries have been added, bringing the total number of databases listed to 281. While this number may seem large for a ‘curated collection’, these databases distinguish themselves by how they present the underlying data: for example, by adding value through curation, by providing new types of data connections or by implementing other innovative approaches that facilitate biological discovery. The individual entries are classified by type, but the reader should recognize that the distinctions between these classes are often arbitrary, and that many of these databases provide more than one type of information to the user.

Table 1.
Molecular Biology Database Collection

In addition to the list presented in this paper, an electronic version of the Database Issue and Collection is freely available online to everyone, regardless of subscription status, at http://www.nar.oupjournals.org. The list includes the databases described in the papers comprising the current issue, but there are simply not enough pages in this journal to accommodate full-length printed descriptions of all 281 databases featured here. To address this, the online version of the Collection now includes short summaries of many of the databases, provided directly by the investigators responsible for them. It is hoped that these summaries will give the reader an additional source of information for finding and selecting the data sources of most value in addressing a specific biological problem. Contributors will be encouraged to keep their entries up to date, as the online descriptions will be updated on a regular basis.

Suggestions for the inclusion of additional database resources in this Collection are encouraged and may be directed to the author (andy@nhgri.nih.gov).



I wish to thank Yi-Chi Barash for designing the Web-based submission tool for this Collection as well as for her technical support.


1. Collins,F.S., Patrinos,A., Jordan,E., Chakravarti,A., Gesteland,R., Walters,L. and members of the DOE and NIH Planning Groups (1998) New goals for the U.S. Human Genome Project: 1998–2003. Science, 282, 682–689.
2. Hattori,M., Fujiyama,A., Taylor,T.D., Watanabe,H., Yada,T., Park,H.S., Toyoda,A., Ishii,K., Totoki,Y., Choi,D.K. et al. (2000) The DNA sequence of human chromosome 21. The chromosome 21 mapping and sequencing consortium. Nature, 405, 311–319.
3. Dunham,I., Shimizu,N., Roe,B.A., Chissoe,S., Hunt,A.R., Collins,J.E., Bruskiewich,R., Beare,D.M., Clamp,M., Smink,L.J. et al. (1999) The DNA sequence of human chromosome 22. Nature, 402, 489–495.
4. Adams,M.D., Celniker,S.E., Holt,R.A., Evans,C.A., Gocayne,J.D., Amanatides,P.G., Scherer,S.E., Li,P.W., Hoskins,R.A., Galle,R.F. et al. (2000) The genome sequence of Drosophila melanogaster. Science, 287, 2185–2195.
5. Guyer,M.S. (1998) Statement on the rapid release of genomic DNA sequence. Genome Res., 8, 413.
