J Med Libr Assoc. Apr 2002; 90(2): 253–256.
PMCID: PMC100774

Multiple database coverage of structural biology*

Claudia Lascar, M.P.A., M.S.L.I.S., AHIP, Science Reference Librarian, and Loren D. Mendelsohn, M.S., A.M.L.S., Chief

INTRODUCTION

Sir John C. Kendrew, founder and past editor-in-chief of the Journal of Molecular Biology, described molecular biology as comprising two subjects rather than one, which developed in relative isolation from each other. One aspect of the discipline he called the “informational school,” which is concerned with the genetic information stored in nucleic acids and the expression of that information. The other aspect he referred to as the “conformational school,” which is concerned primarily with structure (i.e., sequence and conformation) and its relation to molecular function in the living cell [1]. This second aspect became known as “structural molecular biology” or simply “structural biology.” The ultimate goal of structural biologists is to use the sequence data collected by their colleagues in the informational school to predict the three-dimensional structure and physiological function of biological macromolecules.

Over the past few years, sequence databases have experienced extremely rapid growth. This growth has been documented in numerous articles discussing both genome and proteome research [2–4]. With the growth of these databases, one of the great challenges remaining for molecular biologists is to correlate such sequences with the function of the molecules that they encode. Among other things, these efforts will result in a more complete understanding of the genetic defects that are the ultimate causes of such diseases as muscular dystrophy, diabetes, mental disorders, and the like. This clearer understanding of the genetic bases of such diseases will expedite the development of better and earlier gene tests and gene therapies. Moreover, it will advance the field of preventive medicine and contribute to the development of customized therapies. Related research on protein structure-function relationships has already facilitated the drug discovery process [5]. Medical librarians play a decisive role in supporting such biomedical research by developing strong print and electronic collections in structural biology and by instructing researchers on how to search electronic databases to identify articles of interest. Additionally, they may assist researchers in identifying other tools such as sequence databases and structural modeling tools.

In late 1998, a consortium of ten major research institutions was formed under the aegis of The New York City Partnership Policy Center. This consortium designated the City College of New York as the site for the consortium's magnetic resonance research laboratory. The facility was to be named the New York Structural Biology Center (hereinafter referred to as “the center”). The City College Library was asked to identify key library resources to support this center [6]. In the course of this project, the authors developed a strategy for collecting and analyzing data from multiple databases.

Such searching of multiple databases for this purpose has been advocated in the library research literature [7] and is essential in this subject area because of structural biology's multidisciplinary character. This paper demonstrates the need for searching multiple databases for structural biology information. It also identifies which of several indexes—PubMed, Biological Abstracts WebSPIRS version (BA), and Web of Science (WoS)—provides the most complete coverage for this subject in terms of number of citations retrieved when searching. These databases were selected because of their availability to our user community.

METHODOLOGY

These databases differed in many respects: subject coverage,† number of journals indexed,‡ source material,§ and searching capabilities.** We therefore devised a methodology that could be applied consistently to each index. The first step was to identify a common searching mode, one that could be used across all three databases. Author searching was the obvious choice, because it was supported by all three databases. Next, we took as our point of reference a brochure describing the initiative to establish the center at the City College of New York and selected all eleven scientists listed in it; they were identified as leading researchers from the participating medical institutions and universities. To ensure representation for every scientist in our sample, we limited ourselves to a relatively narrow range of publication years (1995–2000). This range was selected because a preliminary search indicated that some of the more junior scientists in the group had only started publishing in 1995. All searches were performed at the end of April 2001.

The three databases chosen for evaluation have significant differences in their respective approaches to author and subject indexing. Thus, we needed to develop a strategy to compensate for such differences. In the case of PubMed and WoS, authors were indexed only by last name and initials. As a result, author-based searches for structural biology information generated numerous false drops. In PubMed, we dealt with this problem by examining the subject indexing to determine relevance to structural biology. In WoS, we were able to weed out false drops by using author's address as one of our search terms. Because BA indexes on the full name of authors, such filters were less necessary. We found the major and miscellaneous BA subject fields inadequate to ensure relevancy to structural biology, because they lacked the necessary specificity. In all of the databases, we examined abstracts to confirm relevance. If necessary, we examined the original paper to confirm authorship.

Finally, to measure with fairness the search performance of each of the databases, we needed to take into account differences in indexing coverage. We normalized our data by examining only those classes of publications that were indexed by all three databases, namely research journals. We excluded meeting abstracts, because only WoS indexed them, and we excluded review journals, because BA did not index them.
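The normalization step described above can be sketched as a simple filter. This is only an illustration under stated assumptions: the publication-type labels, the `COVERAGE` table, and the sample records are hypothetical and stand in for whatever record metadata a searcher actually has; they are not data from the study.

```python
# Illustrative sketch only: the publication-type labels and sample records
# are assumptions, not data from the study.
COVERAGE = {
    "PubMed": {"research journal article", "review journal article"},
    "BA": {"research journal article"},
    "WoS": {"research journal article", "review journal article", "meeting abstract"},
}

def common_types(coverage):
    """Return the publication types indexed by every database."""
    return set.intersection(*coverage.values())

def normalize(records, coverage):
    """Keep only records whose type is indexed by all databases."""
    shared = common_types(coverage)
    return [r for r in records if r["type"] in shared]

# Hypothetical retrieved records.
sample = [
    {"id": "a1", "type": "research journal article"},
    {"id": "a2", "type": "meeting abstract"},
    {"id": "a3", "type": "review journal article"},
]
kept = normalize(sample, COVERAGE)
print([r["id"] for r in kept])
```

Restricting the comparison to the shared publication types means that a database is not credited (or penalized) for material the others never attempt to index.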

RESULTS

WoS listed 240 articles by our sample group of structural biologists, more than either of the other databases that we examined. PubMed listed 207 articles, and BA listed 188 articles. More significant, however, is the difference in journal coverage among the three databases. WoS, which aims to provide comprehensive coverage of all major scientific and medical journals, comes closest to indexing all of the journals in which members of our sample group published. While PubMed is weak in its coverage of chemistry journals (as would be expected), it nonetheless provides comprehensive indexing of the journals that it does cover. The real (and somewhat dismaying) surprise is that BA falls short in its coverage of biology journals with respect to this study. Most likely, this is because BA indexes many of the titles relevant to structural biology only selectively.
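As a rough aid to reading these counts, the totals can be expressed relative to the largest one. The sketch below uses only the three counts reported above; the percentages are a derived illustration, not figures reported in the study, and they are relative to the WoS count rather than to the (unknown) total number of articles published by the sample group. The function name `relative_coverage` is hypothetical.

```python
# Counts reported in the text; everything derived from them is illustrative.
counts = {"WoS": 240, "PubMed": 207, "BA": 188}

def relative_coverage(counts):
    """Express each count as a percentage of the largest count."""
    top = max(counts.values())
    return {name: round(100 * n / top, 2) for name, n in counts.items()}

print(relative_coverage(counts))
```

On these numbers, PubMed retrieved about 86 percent and BA about 78 percent as many articles as WoS.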

Table 1 compares and contrasts the index terms that were associated with or assigned to the articles published by our sample group. The first group listed are molecular or substance descriptors, the second group covers structural considerations, and the third group covers methods. The similarity between PubMed, BA, and WoS terms is misleading, because there are large variations in the indexing practices of these databases. PubMed uses a controlled vocabulary, Medical Subject Headings (MeSH), which ensures identical indexing for articles that use different terminology for the same concept. Such a controlled vocabulary guarantees uniform and consistent indexing, because it is based on predetermined terms and follows rigorous standards and rules.

Table 1 Structural biology subject indexing

BA uses both a controlled and uncontrolled vocabulary. Because none of the terms in the controlled vocabulary is applicable to structural biology, we used several of the subfields that are features of the BA database to determine how structural biology articles are indexed. These subfields include the Miscellaneous Descriptors (MI), which are effectively free-text terms derived from article titles and abstracts; the Chemicals and Biochemicals (CB) field, which contains chemical names as given by the author; and the Methods and Equipment (MQ) field, which includes the apparatus and techniques identified in the source document.

As compared to PubMed and BA, WoS indexing is the least controlled. ISI uses a combination of keywords supplied by the article authors (Keywords) and keywords generated from the titles of references cited in individual articles (KeywordsPlus). Even though the system generated by this method is post-coordinate (keyword-based and non-hierarchical), ISI retains significant control over the index terms because of its rigorous standards. Nevertheless, the system retains the inconsistencies typical of post-coordinate systems, with some variation in indexing from one article to the next. The word variations in the BA and WoS databases are represented in Table 1 by a string of periods. Finding term variations is feasible in BA, because users can browse the word index and explode the terms. It is not possible to do this in WoS, because users can only use truncation or wildcards to introduce word variations. Thus, WoS articles may be accessed by topic only by using text-word searching. The related record feature available in PubMed is also available in WoS, but criteria for determining the relationship are very different. WoS relates articles if they have at least one reference in common, while PubMed requires that related articles be topically connected. Thus, a topic search can be done much more effectively in PubMed. On the other hand, WoS is more effective for author searching, partially due to its broader coverage of the journal source material. Also, its cited reference searching capabilities make it possible to determine relationships between articles that share such cited references.
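The WoS criterion described above, that two articles are related if they have at least one cited reference in common, can be illustrated with a short sketch. This is a simplification for illustration only: the record layout, identifiers, and function name are hypothetical, and it does not model how either WoS or PubMed actually ranks or computes related records.

```python
def related_by_shared_references(target, candidates):
    """Return ids of candidates sharing at least one cited reference with target."""
    target_refs = set(target["cited"])
    return [
        c["id"]
        for c in candidates
        if c["id"] != target["id"] and target_refs & set(c["cited"])
    ]

# Hypothetical records; "cited" holds made-up reference identifiers.
articles = [
    {"id": "A", "cited": ["r1", "r2", "r3"]},
    {"id": "B", "cited": ["r3", "r9"]},
    {"id": "C", "cited": ["r7", "r8"]},
]
print(related_by_shared_references(articles[0], articles))
```

Because the link depends only on overlapping reference lists, two articles can be related in this sense without addressing the same topic, which is consistent with the observation that topic searching is more effective in PubMed.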

CONCLUSION

In view of these findings, we recommend searching multiple databases when looking for structural biology information. Those libraries that do not have access to WoS or BA (which are very expensive databases) should complement PubMed by searching journal tables of contents. Such table-of-contents searching should focus on those chemistry journals that are especially important in the field of structural biology (e.g., Journal of the American Chemical Society and Journal of Physical Chemistry). We further recommend using the Web versions of these journals to scan their tables of contents. In this way, relevant articles on structural biology can be quickly identified and obtained for researchers through document delivery services. We also recommend establishing instructional programs to educate physicians and students in using the full range of structural biology resources. The importance of such resources has been strongly emphasized by Magee, Gordon, and Whelan [8] in their recent article on teaching bioinformatics.

Research on a larger scale using a random sample of scientists is needed to pinpoint more clearly the strengths and weaknesses of these databases with respect to structural biology. We hope that the MeSH index terms from Table 1 will be useful to medical librarians conducting research on this important topic.

Footnotes

* Based on a presentation at the 101st Annual Meeting of the Medical Library Association, Orlando, Florida; May 28, 2001.

† Subject coverage: PubMed covers preclinical and health sciences; Biological Abstracts (BA) covers life sciences including biomedical areas; Web of Science (WoS) covers all scientific disciplines.

‡ Number of journals indexed: PubMed indexes approximately 4,000 journals; BA indexes approximately 5,200 journals; WoS indexes more than 8,700 journals.

§ Source material: PubMed indexes research articles, review articles, notes, letters, short papers, editorials discussing actual research data, and errata. BA Web edition covers all articles published in research journals, including reviews, letters, notes, editorials, and errata. Reviews published in review journals, papers published in conference proceedings, and monographs are indexed in a separate file, BA/RRM, which was not available to us. WoS indexes all the categories found in PubMed and BA as well as many other categories such as meeting abstracts, software reviews, book reviews, etc.

** Searching capabilities: capabilities vary greatly among the databases due to the manner in which the data elements of each record have been chosen and can be manipulated. PubMed supports author searching, keyword searching, and controlled vocabulary searching based on an extensive list of Medical Subject Headings (MeSH). BA supports author searching, keyword searching, and subject searching based on a combination of controlled vocabulary and free-text terms. WoS supports author searching, cited author searching, and keyword-based subject searching. In some cases, keywords are provided by individual authors.

REFERENCES

1. Kendrew JC. Some remarks on the history of molecular biology. Biochem Soc Symp. 1970;30:5–10.
2. Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Rapp BA, Wheeler DL. GenBank. Nucleic Acids Res. 2000 Jan 1;28(1):15–8.
3. Pandey A, Lewitter F. Nucleotide sequence databases: a gold mine for biologists. Trends Biochem Sci. 1999 Jul;24(7):276–80.
4. Pennisi E. So many choices, so little money. Science. 2001 Oct 5;294(5540):82–5.
5. Winkler FK, Banner DW, Böhm HJ. Structure-based approaches in modern drug discovery. In: Schlichting I, Egner U, eds. Data mining in structural biology: signal transduction and beyond. Berlin, Germany: Springer-Verlag, 2001:123–42.
6. Lascar C, Mendelsohn LD. An analysis of journal use by structural biologists with applications for journal collection development decisions. Coll Res Libr. 2001 Sep;62(5):422–33.
7. Kushkowski JD, Gerhard KH, Dobson C. A method for building core journal lists in interdisciplinary subject areas. J Doc. 1998 Sep;54(4):477–88.
8. Magee J, Gordon JI, Whelan A. Bringing the human genome and the revolution in bioinformatics to the medical school classroom: a case report from Washington University School of Medicine. Acad Med. 2001 Aug;76(8):852–5.

Articles from Journal of the Medical Library Association : JMLA are provided here courtesy of Medical Library Association