J Med Libr Assoc. Jan 2004; 92(1): 97–100.
PMCID: PMC314110

An examination of PubMed's ability to disambiguate subject queries and journal title queries

Aida Marissa Smith, M.L.I.S., AHIP, Medical Reference Librarian

INTRODUCTION

Term mapping in bibliographic databases optimizes user search results by matching a search term to an appropriate controlled vocabulary list. Term mapping in MEDLINE interfaces is not unusual [1]. PubMed's term-mapping system has evolved since its debut, but has consistently compared favorably to other MEDLINE search interfaces [2]. Two characteristics, employed through the query box, that contribute to its strength are its automatic application and the robustness of its mapping capabilities. Terms entered into PubMed's query box are automatically and seamlessly mapped to terms in one of several tables containing a controlled vocabulary list. It is not apparent to users that automatic term mapping is being applied to a search. An intuitive searcher may recognize that term mapping is being used by inferring from the search instructions located below the search query box or verify that it is used by clicking on the details button. In addition, information about term mapping can be obtained by pursuing the help links.

PubMed's mapping system goes beyond Medical Subject Headings (MeSH) by employing four vocabulary-controlled mapping tables: MeSH Translation Table, Journals Translation Table, Phrase List, and Author Index. When no qualifiers are present in the search phrase, these four tables are applied sequentially. If a match is not found in the MeSH Translation Table, PubMed moves to the Journals Translation Table to look for a match, then to the Phrase List, and finally to the Author Index [3]. If no match is found the first time the mapping tables are applied, the phrase is broken apart into individual terms, from left to right, and the mapping table process is repeated. If a match still cannot be found using the translation tables, each term in the phrase is searched individually as an All Fields keyword search, and the terms are combined with a Boolean AND [4]. Single-word journal titles are searched first in the MeSH Translation Table and second as an All Fields keyword search [5]. Stop words are ignored throughout the mapping and search process [6]. The application and use of these robust mapping tables are the foundation of PubMed's single search point query box for subject, journal title, and author queries.
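
The sequential lookup described above can be illustrated with a minimal sketch. It is a toy model, not PubMed's implementation: the table contents, the stop word list, the names `TABLES`, `lookup`, and `translate`, and the output format are all assumptions made for illustration. The single-word rule is simplified to "skip the Journals table for one-word terms."

```python
# Toy model of sequential term mapping; all table contents are hypothetical.
STOP_WORDS = {"a", "an", "and", "of", "the", "in", "for"}

# Tables are consulted in this order: MeSH, Journals, Phrase List, Author Index.
TABLES = [
    ("MeSH Terms", {"blood", "nutrition", "research", "emergency medical services"}),
    ("Journal", {"nutrition research", "physical therapy", "lancet"}),
    ("Phrase", {"cold compress"}),
    ("Author", {"smith am"}),
]


def lookup(term):
    """Return the tag of the first table containing the term, or None."""
    single_word = " " not in term
    for tag, entries in TABLES:
        # Simplifying assumption: one-word terms never match the Journals table.
        if single_word and tag == "Journal":
            continue
        if term in entries:
            return tag
    return None


def translate(query):
    """Map a query to a tagged search string, following the cascade above."""
    words = [w for w in query.lower().split() if w not in STOP_WORDS]
    phrase = " ".join(words)
    tag = lookup(phrase)
    if tag is not None:
        return f'"{phrase}"[{tag}]'
    # No whole-phrase match: map each term, left to right, and AND the results.
    parts = [f"{word}[{lookup(word) or 'All Fields'}]" for word in words]
    return " AND ".join(parts)
```

In this toy model, `translate("nutrition research")` resolves to the Journals table because the whole phrase is absent from the MeSH entries, which mirrors the ambiguity this study examines.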

Attention has been given in the literature to identifying and mapping journal title abbreviations to their full forms [7] and to mapping free-text to medical vocabularies [8]. Little attention has been given, however, to the methodology employed by search engines to disambiguate terms that could be construed as either a subject query or a journal title query.

Given the correlation between entries in the MeSH Translation Table and journal titles, the disambiguation of whether a user intends to perform a subject query or a journal title query can be crucial to returning the appropriate search results. For example, a student searching for research articles on the topic of nutrition might search PubMed's query box using the following natural language search terms: “nutrition research.” PubMed's search engine would apply the search terms against its translation tables looking for a match. The search phrase “nutrition research” is not in the MeSH Translation Table and, therefore, would not find a match there. The phrase would find a match in the subsequent Journals Translation Table, since it is the title of a journal. As a result, the student who had intended to retrieve research articles on the topic of nutrition would actually execute a journal title search on Nutrition Research. PubMed has established a set of rules governing journal title searching from the query box [9]. However, casual users would likely overlook PubMed's journal title search nuances, particularly when they interfere with an intended subject search.

Improper disambiguation of subject queries and journal title queries has the potential to mislead PubMed users. A closer look at this issue is warranted. The purpose of this study is to examine PubMed's ability to disambiguate subject searches and journal title searches.

METHODS

Journals listed in any one of the three Brandon/Hill Selected lists—small medical library, nursing, and allied health—were identified for potential use in this study [10]. The Brandon/Hill lists are a recognized collection development authority tool within the medical librarianship community. Additionally, the Brandon/Hill lists cover a wide scope of clinical health care, encompassing medicine, nursing, and the allied health fields.

To arrive at a list of ambiguous search terms, journals were excluded from the search term list if their titles contained words associated with journal titles: annals, archives, association, bulletin, clinics, foundation, institute, journal, magazine, quarterly, society, topics, or proper names. Journals were also removed from the list if PubMed did not index them. The remaining search term list contained 106 journal titles (Table 1).
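
The exclusion rule above can be sketched as a simple filter. This is only an illustration under stated assumptions: the function name `is_ambiguous` and the `indexed` flag are hypothetical, "archives" and "association" are read as two separate words, and proper names are not detected because that step required manual judgment.

```python
import re

# Words associated with journal titles, taken from the exclusion list above.
# Assumption: "archives" and "association" are separate entries.
TITLE_WORDS = {
    "annals", "archives", "association", "bulletin", "clinics", "foundation",
    "institute", "journal", "magazine", "quarterly", "society", "topics",
}


def is_ambiguous(title, indexed=True):
    """Keep a title only if PubMed indexes it and it has no title-like word.

    Proper names are NOT checked here; that exclusion was a manual step.
    """
    words = set(re.findall(r"[a-z]+", title.lower()))
    return indexed and not (words & TITLE_WORDS)
```

For example, `is_ambiguous("Nutrition Research")` is true, while `is_ambiguous("Annals of Surgery")` is false because the title contains "annals."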

Table 1 Search terms tested in PubMed's query box

Each of the 106 search terms was individually entered into PubMed's query box. After every query, the details button was selected from the results page. The presence of the [MeSH Terms] tag or [Journal] tag was noted. The [MeSH Terms] tag indicated a map to the MeSH Translation Table. The [Journal] tag indicated a map to the Journals Translation Table.
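
The tag check described above could be automated along the following lines. This is a hedged sketch: the function name `classify_details` and the sample strings in the usage note are hypothetical, and the real text shown by the details button may be formatted differently.

```python
def classify_details(details):
    """Report which mapping a details string indicates, based on its tag.

    The MeSH tag is checked first because a MeSH mapping may appear
    alongside other tags in the same translation.
    """
    if "[MeSH Terms]" in details:
        return "MeSH Translation Table"
    if "[Journal]" in details:
        return "Journals Translation Table"
    if "[All Fields]" in details:
        return "All Fields keyword search"
    return "unclassified"
```

With illustrative inputs, `classify_details('"Nutr Res"[Journal]')` reports the Journals Translation Table and `classify_details("gut[All Fields]")` reports an All Fields keyword search.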

RESULTS

From the initial list of 302 Brandon/Hill journal titles, 196 titles (64.9%) were removed from the search term list for containing terms closely associated with journal titles or because they were not indexed by PubMed. Of the remaining 106 ambiguous journal titles that were used as search terms, 56 (52.83%) were mapped to the Journals Translation Table and 43 (40.57%) were mapped to the MeSH Translation Table (Table 1).

Seven of the single-word search terms—angiology, cutis, disease-a-month, gut, lancet, physiotherapy, and transfusion—did not match in the MeSH Translation Table. For these search terms, the details button revealed the [All Fields] tag.
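
The counts reported in this section can be cross-checked with a few lines of arithmetic. The variable and function names are illustrative; the figures come from the text. Note that 43 of 106 is 40.566...%, which is 40.57% when rounded to two decimals and 40.56% when truncated.

```python
def pct(part, whole):
    """Percentage of whole, rounded to two decimal places."""
    return round(100 * part / whole, 2)


initial_titles = 302
removed = 196
tested = initial_titles - removed  # titles used as search terms

# Outcome counts reported for the tested search terms.
counts = {"Journal": 56, "MeSH Terms": 43, "All Fields": 7}

# The three outcomes should account for every tested term.
assert sum(counts.values()) == tested

removed_share = pct(removed, initial_titles)
shares = {tag: pct(n, tested) for tag, n in counts.items()}
```

The three outcome counts sum to the 106 tested terms, so every search term is accounted for by one of the three tags.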

DISCUSSION

An examination of the search terms table reveals that a surprising number of subject queries and journal title queries have the potential to be mapped to a translation table that supplies results contrary to a user's intent. An intended subject search from PubMed's query box without qualifiers on the search phrase “physical therapy” would result in a journal title search for Physical Therapy. An intended journal search from PubMed's query box without qualifiers on Emergency Medical Services would result in a subject search on the subject of emergency medical services (Table 1).

The single-word search terms also have the potential to mislead users. An intended journal title search from PubMed's query box without qualifiers on the search term Blood would result in a subject search on “blood.” A single-word search term searched from PubMed's query box without qualifiers, intended as a subject search, could not result in a journal title search. Single-word journal titles, when searched from the query box, are either treated as a subject search by being mapped to the MeSH Translation Table, or as an All Fields keyword search.

Since it is not evident from PubMed's query box that term mapping is applied to search terms, users are unlikely to notice when a journal title search or a subject search has been interpreted incorrectly. Additionally, a misinterpreted journal title search or subject search would likely retrieve relevant but incomplete results, misleading the user into thinking the search was performed successfully.

The weight of this finding is tempered by the likelihood of its occurrence. For this limited study, over half of the journal titles (64.9%) were removed from the initial list of 302 Brandon/Hill journal titles for containing words associated with journal titles or because they were not indexed by PubMed.

CONCLUSION

PubMed has difficulty disambiguating subject searches from journal title searches when there is a strong similarity between the natural language expression of a subject and an established journal title. Users intending to perform a subject search may retrieve the results of a journal title search. Users intending to perform a journal title search may retrieve the results of a subject search.

The similarity between the natural-language expression of subject terms and the corresponding journal titles makes the task of disambiguation difficult, but crucial, for a search engine employing a single search point for subject searches and journal title searches. Since it is not clear to users that term mapping is being employed from the query box, the need for accurate disambiguation by the search engine is even greater. Additional investigation is warranted to determine the scope and depth of this problem from a variety of perspectives.


REFERENCES


Articles from Journal of the Medical Library Association : JMLA are provided here courtesy of Medical Library Association