J Am Med Inform Assoc. 2001 Jul-Aug; 8(4): 391–397.
PMCID: PMC130084

Searching for Clinical Prediction Rules in Medline


Objectives: Clinical prediction rules have been advocated as a possible mechanism to enhance clinical judgment in diagnostic, therapeutic, and prognostic assessment. Despite renewed interest in their use, inconsistent terminology makes them difficult to index and retrieve with computerized search systems. No validated approaches to locating clinical prediction rules appear in the literature. The objective of this study was to derive and validate an optimal search filter for retrieving clinical prediction rules, using the National Library of Medicine's medline database.

Design: A comparative, retrospective analysis was conducted. The “gold standard” was established by a manual search of all articles from selected print journals for the years 1991 through 1998, which identified articles covering various aspects of clinical prediction rules, such as derivation, validation, and evaluation. Search filters were derived from the articles in the July through December issues of the journals (derivation set) by analyzing the textwords (words in the title and abstract) and the medical subject headings (from the MeSH thesaurus) used to index each article. The accuracy of these filters in retrieving clinical prediction rules was then assessed using articles in the January through June issues (validation set).

Measurements: The sensitivity, specificity, positive predictive value, and positive likelihood ratio of several different search filters were measured.

Results: The filter “predict$ OR clinical$ OR outcome$ OR risk$” retrieved 98 percent of clinical prediction rules. Four filters, such as “predict$ OR validat$ OR rule$ OR predictive value of tests,” had both sensitivity and specificity above 90 percent. The top-performing filter for positive predictive value and positive likelihood ratio in the validation set was “predict$.ti. AND rule$.”

Conclusions: Several filters with high retrieval value were found. Depending on the goals and time constraints of the searcher, one of these filters could be used.

Clinical prediction rules, otherwise known as clinical decision rules, are tools designed to assist health care professionals in making decisions when caring for their patients. They comprise variables, obtained from the history, physical examination, and simple diagnostic tests, that show strong predictive value.1 The use of these rules to assist in decision making relevant to diagnosis, treatment, and prognosis has been the subject of increasing discussion over the past 15 years.

Wasson et al.2 brought clinical prediction rules to the forefront in a seminal article on their evaluation, validation, and application to medical practice. Twelve years later, Laupacis et al.,1 building on the work of Wasson et al., raised awareness of this topic by reviewing the quality of published rules and suggesting further modifications of methodological standards.

With the advent of managed care and evidence-based medicine, interest in easily administered and valid rules that are applicable in various clinical settings has increased.

A recent addition to the JAMA Users' Guides to the Medical Literature3 again highlights the use of clinical decision rules. As the article illustrates, the National Library of Medicine's medline database is often used to locate articles that discuss the derivation, validation, and use of such rules. Because of inconsistent use of terminology in describing clinical prediction rules, the rules are difficult to index and retrieve by computerized search systems.

In 1994, Haynes et al.4 developed optimal search strategies for locating clinically sound studies in medline by determining sensitivities, specificities, precision, and accuracy for multiple combinations of terms and medical subject headings (from the MeSH thesaurus) and comparing them with results of a manual review of the literature, which provided the “gold standard.” They showed that medline retrieval of these studies could be enhanced by utilizing combinations of indexing terms and textwords.

In 1997, van der Weijden et al.5 furthered the research called for in the article by Haynes et al. by determining the performance of a diagnostic search filter combined with use of disease terms (e.g., urinary tract infections) and content terms (e.g., erythrocyte sedimentation rate, dipstick). This study, comparing filter with gold standard, again confirmed that the combination of MeSH terms with textwords resulted in higher sensitivity than the use of subject headings alone.

A second study, published in 1994 by Dickersin et al.,6 found that sensitivities of search strategies for a specific study type, randomized clinical trials, also benefited from the use of textwords and truncation. The present work has incorporated the findings of these studies and has again extended the analytic survey of search strategies for clinical studies by Haynes et al. to include clinical prediction rules.


This study was designed with two phases, derivation and validation (Figure 1). Initially, a gold standard for identifying clinical prediction rules was established through a manual review of print journals from 1991 to 1998. In the derivation phase, the performance of the search filters was determined in terms of accuracy measures. In the validation phase, a different set of journal articles for the same years was used to assess the external validity of the derived filters. Medline, accessed through the OVID Technologies (New York, NY) Web interface, was used in both phases. The comparison (filter vs. gold standard) yielded values for sensitivity, specificity, positive predictive value, and positive likelihood ratio, as shown in Figure 2.

Figure 1
Flow diagram of derivation and validation phases.
Figure 2
Calculation of performance measures. Sensitivity equals a/(a+c), the proportion of articles with clinical prediction rules that were retrieved by the filter. Specificity equals d/(b+d), the proportion of articles without clinical prediction rules that were not retrieved by the filter.
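The arithmetic behind the Figure 2 measures can be sketched in code; the counts below are hypothetical, chosen only to illustrate the calculations, not taken from the study's tables:

```python
def performance_measures(a, b, c, d):
    """Compute retrieval accuracy from a 2x2 table.

    a: relevant articles retrieved by the filter (true positives)
    b: irrelevant articles retrieved (false positives)
    c: relevant articles missed (false negatives)
    d: irrelevant articles not retrieved (true negatives)
    """
    sensitivity = a / (a + c)   # proportion of relevant articles retrieved
    specificity = d / (b + d)   # proportion of irrelevant articles not retrieved
    ppv = a / (a + b)           # proportion of retrieved articles that are relevant
    lr_pos = sensitivity / (1 - specificity)  # positive likelihood ratio
    return sensitivity, specificity, ppv, lr_pos

# Hypothetical counts for illustration only:
sens, spec, ppv, lr = performance_measures(a=55, b=45, c=8, d=10770)
print(round(sens, 3), round(spec, 4), round(ppv, 3), round(lr, 1))
```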

Gold Standard

The foundation of the data set consisted of 34 reports on clinical prediction rules found by Laupacis et al.1 in their manual review of the literature from Jan 1, 1991 through Dec 31, 1994. Building on this work, all articles in every issue of the same four journals—Annals of Internal Medicine, BMJ, JAMA, and New England Journal of Medicine—from Jan 1, 1995 through Dec 31, 1998 were manually reviewed to identify clinical prediction rules. To ensure a sufficient data pool, two additional journals—Annals of Emergency Medicine and the Journal of General Internal Medicine—were reviewed from Jan 1, 1991 through Dec 31, 1998.

The reviewers used the Laupacis definition of clinical prediction rules as the standard and criteria for inclusion in the present study; that is, studies that contained a “prediction-making tool that included three or more variables obtained from the history, physical examination, or simple diagnostic tests and that either provided the probability of an outcome or suggested a diagnostic or therapeutic course of action”1 were selected. This included articles that derived, evaluated, or validated clinical prediction rules.

The initial review identified 211 potential rules. These were independently read by a librarian and a physician, both with expertise in evidence-based medicine. Disagreements were resolved by completion of a third review, based on the Laupacis definition, followed by discussion and consensus.

Eighty-five articles met the criteria for inclusion. These, plus the 34 reports from the Laupacis study, resulted in a data set of 119 articles on clinical prediction rules—63 in the derivation set (July through December 1991 to 1998) and 56 in the validation set (January through June 1991 to 1998). This longitudinal division allowed the search filter to incorporate changes in indexing and terminology over time.

Of the 63 studies in the derivation set, 27 (43 percent) described the development of a particular clinical prediction rule, 8 (13 percent) involved the validation of a previously derived rule, and 20 (32 percent) described both development and validation of a clinical prediction rule. Of the 56 studies in the validation set of this report, 19 (34 percent) included the development of a clinical prediction rule, 8 (14 percent) involved validation of a previously derived rule, and 21 (38 percent) included both development and validation of a clinical prediction rule.

Both phases used articles from the six reviewed journals. In the derivation phase, all 10,877 articles from July through December 1991 to 1998 were used; the remaining 10,878 articles (January through June 1991 to 1998) were used for the validation phase.

Comments, letters, editorials, and animal studies were excluded from the total number of articles, as in the study by Boynton et al.7


Titles and abstracts of the reports in the derivation set were downloaded from medline and imported as 63 unique records into SimStat statistical software (Provalis Research, Montreal, Canada). WordStat, the content-analysis module for SimStat, was used to calculate the word frequencies and record occurrences for every textword (word used in the title or abstract of articles) of the 63 records. Word frequency is equal to the total number of times each word was used in the 63 records. Record occurrence is equal to the total number of unique records that contained each word.

It is possible for different words to have the same word frequency but different record occurrences, as shown in Table 1. Words that retrieved more records were considered higher impact terms with the potential for greater retrieval. The record occurrence value was used as the first filter in the derivation process. Textwords occurring in less than 15 percent of the records were excluded. This reduced the total number of words from 1,458 to 169.

Table 1
Content Analysis of Retrieval Value
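The distinction between word frequency and record occurrence can be sketched as follows. The three toy abstracts are hypothetical; with only three records the 15 percent cutoff keeps every word, whereas across the study's 63 records it pruned 1,458 words to 169:

```python
from collections import Counter

# Three hypothetical abstracts standing in for the 63 derivation records.
records = [
    "a clinical prediction rule to predict outcome",
    "validation of a prediction rule for risk of outcome",
    "risk factors and risk scores for outcome in a cohort",
]

word_freq = Counter()   # total times each word appears across all records
record_occ = Counter()  # number of distinct records containing each word

for text in records:
    words = text.lower().split()
    word_freq.update(words)
    record_occ.update(set(words))  # set(): count each word once per record

# "risk" appears three times overall, but in only two records.
print(word_freq["risk"], record_occ["risk"])

# Keep only textwords occurring in at least 15 percent of the records.
threshold = 0.15 * len(records)
kept = {w for w, n in record_occ.items() if n >= threshold}
```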

An exclusion dictionary, created in WordStat, filtered out terms that were common to many studies, leaving those terms that were unique to clinical prediction rules. Terms in the following categories were excluded: adverbs, articles, conjunctions, prepositions, pronouns, and words for anatomic parts, demographics, locations (e.g., emergency room, hospital), measurements (e.g., positive, high), named populations (e.g., physicians, nurses), numbers, analytic methods (e.g., multivariate, statistical), and study types (e.g., cohort, trials) as well as single- or two-letter abbreviations. Since the goal was to retrieve all clinical prediction rules, specific diagnoses, treatment procedures, and outcomes (e.g., mortality, hospitalization) were also excluded. Filtration through the exclusion dictionary reduced the 169 terms to 66. To accommodate variations in grammatical form and maximize retrieval potential, terms were truncated using the symbol $. For example, predict$ retrieves predicted, prediction, and predicts.
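OVID resolves $-truncation on the server side; purely as a sketch, the same prefix matching can be emulated with a regular expression (the helper name here is our own, not part of any search system):

```python
import re

def truncate(term):
    """Translate an OVID-style truncated term (e.g., predict$) into a regex."""
    if term.endswith("$"):
        # $ matches any run of word characters after the stem.
        return re.compile(r"\b" + re.escape(term[:-1]) + r"\w*", re.IGNORECASE)
    return re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)

pattern = truncate("predict$")
text = "We derived a prediction rule; it predicts and predicted outcomes."
matches = sorted({m.group(0).lower() for m in pattern.finditer(text)})
print(matches)
```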

The retrieval value of each textword was determined using accuracy measures. Of the 66 textwords, only those with a sensitivity of at least 20 percent and a positive predictive value of at least 1.5 percent remained under consideration; this narrowed the set to 22 terms. Sensitivity was defined as the proportion of clinical prediction rules in the gold standard that were retrieved using a given filter (Figure 2). Positive predictive value was the proportion of retrieved articles that contained clinical prediction rules.

The MeSH terms, used to index the 63 reports, were considered separately. Subject headings were downloaded from medline into Microsoft Excel. They were sorted alphabetically and then filtered using the exclusion dictionary. Based on highest frequency, four headings were included: Decision Support Techniques, Predictive Value of Tests, Logistic Models, and Risk Factors.

The list of search terms totaled 26 (22 single textwords and 4 MeSH terms). Using the search operators AND and OR, two-term combinations were created. All 650 possible combinations were searched in the derivation set, and accuracy measures of these combinations were calculated (Figure 2). Based on the performance measures of the two-term combinations (e.g., high sensitivity), 18 search strategies were developed using three or more combinations of textwords and MeSH terms. Therefore, using the derivation set, a total of 694 search filters were assessed for usefulness in retrieving clinical prediction rules.
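The count of 650 follows from 325 unordered pairs of the 26 terms, each joined by either AND or OR. A minimal sketch (the term names are placeholders, not the study's actual list):

```python
from itertools import combinations

# Placeholders for the 22 textwords plus 4 MeSH headings.
terms = [f"term{i}" for i in range(26)]

# Every unordered pair of terms, combined with each Boolean operator.
filters = [
    f"({a}) {op} ({b})"
    for a, b in combinations(terms, 2)  # C(26, 2) = 325 pairs
    for op in ("AND", "OR")             # x 2 operators = 650 filters
]
print(len(filters))
```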


The filters from the derivation phase were searched in medline using the journal articles from the validation set. Accuracy measures (Figure 2) and 95 percent confidence intervals were calculated.
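The article does not state which confidence-interval method was used; as one plausible sketch, a normal-approximation 95 percent interval for a filter's sensitivity might be computed as below. The 44-of-56 example matches the 78.6 percent validation-set sensitivity reported for “predict$”:

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Normal-approximation 95 percent confidence interval for a proportion.

    This is an assumed method; the study may have used a different interval.
    """
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)  # half-width of the interval
    return max(0.0, p - half), min(1.0, p + half)

# e.g., a filter retrieving 44 of the 56 validation-set rules (78.6 percent)
low, high = proportion_ci(44, 56)
print(round(low, 3), round(high, 3))
```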


The filter “predict$ OR clinical$ OR outcome$ OR risk$” yielded the highest sensitivity, 98.4 percent. Top filters ranked by sensitivity are listed in Table 2. The single term with the greatest sensitivity (78.6 percent) in the validation set was “predict$.” The filter with the highest specificity (99.97 percent) in the validation set was “predict$.ti. AND rule$,” although the sensitivity was 16.1 percent. Of single textwords or MeSH terms, “Decision Support Techniques” yielded the highest specificity (99.5 percent), followed by the single term “rule$” (99.3 percent).

Table 2
Performance Measures for Search Filters with Highest Sensitivities

Four filters were found to have both sensitivity and specificity greater than 90 percent; these are listed in Table 3. Using the top two filters in Table 3, searches were conducted in PubMed (the National Library of Medicine's medline retrieval service) to assess retrieval for a specific example. As in the example described in the JAMA guide for clinical decision rules, the filters were combined with the MeSH term Ankle Injuries and limited to human studies in English published from 1995 through 2000. These searches yielded 55 and 67 articles, respectively.

Table 3
Performance Measures for Search Filters with High-accuracy Measures

Of the 55 articles retrieved, 26 (47 percent) discussed clinical prediction rules; of the 67 articles, 28 (42 percent) discussed clinical prediction rules. In contrast, the filter with the highest sensitivity but a lower positive predictive value, “predict$ OR clinical$ OR outcome$ OR risk$,” retrieved 345 articles, of which 29 (8 percent) discussed prediction rules.

Filters with the highest positive predictive value and positive likelihood ratios (Table 3) had low sensitivities. Of the three shown in Table 3, “predict$.ti. AND rule$” performed most consistently in both the derivation and validation sets. Of single terms, “validat$” and “rule$” were top performers: “validat$” had a positive predictive value of 23.5 percent and a positive likelihood ratio of 59.3 in the validation set, and “rule$” had a positive predictive value of 19.1 percent and a positive likelihood ratio of 45.6.


The goal of this study was to develop and validate an optimal search filter for the retrieval of clinical prediction rules. During the course of the study, it became clear that no single filter could address the needs of the researcher, clinician, and educator simultaneously. For researchers concerned with maximum retrieval of studies containing clinical prediction rules, a search filter with high sensitivity would be advisable (Table 2). As noted previously, the filter with the highest sensitivity captured 98 percent of such studies.

A researcher may want the highest sensitivity to avoid missing a valuable article, whereas a busy clinician may be willing to sacrifice some sensitivity for a higher positive predictive value, resulting in fewer nuisance articles (those that do not contain clinical prediction rules). Several filters performed well for both sensitivity and specificity and had higher positive predictive values (Table 3). Depending on the level of searching expertise and the medline search system available, the user may find one of these four filters a good starting point.

For the medical educator who wants to quickly retrieve a clinical prediction rule for illustrative purposes, “predict$.ti. AND rule$” would be a wise choice, yielding three relevant articles for every four retrieved (positive predictive value of 75.0 percent in the validation set). Although such a search will not yield a comprehensive list of articles containing clinical prediction rules (i.e., its sensitivity is low), it is well suited to finding a single article quickly for educational demonstration.

Computer database searching is integral to the practice of evidence-based medicine. Optimal retrieval of the best evidence rests on the formulation of a well-defined question, which includes population, intervention, comparison, and outcome, and its translation into a searchable strategy. This was evidenced by the ankle injuries search, in which the positive predictive value increased from less than 10 percent to over 40 percent. The number of nuisance articles may be reduced when a specific disease, therapy, intervention, or outcome—or a combination of these—is incorporated into the search, as needed.


While the lack of standard nomenclature in articles describing clinical prediction rules and the current vocabulary used for indexing do not offer a simple mechanism for their retrieval, several validated filters perform quite well. The choice of the filter is dependent on the goal of the searcher.


The authors thank Carol Lefebvre, Information Specialist at the UK Cochrane Centre, for suggesting that the software SimStat/WordStat, being investigated by colleagues Victoria White and Julie Glanville (NHS Centre for Reviews and Dissemination, University of York), might be suitable for this project.


This work was supported in part by a grant received through the New York State/United University Professions, Joint Labor-Management Committees, Individual Development Awards Program.


1. Laupacis A, Sekar N, Stiell IG. Clinical prediction rules: a review and suggested modifications of methodological standards. JAMA. 1997;277:488–94. [PubMed]
2. Wasson JH, Sox HC, Neff RK, Goldman L. Clinical prediction rules: applications and methodological standards. N Engl J Med. 1985;313:793–9. [PubMed]
3. McGinn TG, Guyatt GH, Wyer PC, Naylor CD, Stiell IG, Richardson WS. Users' guides to the medical literature, XXII: How to use articles about clinical decision rules. Evidence-based Medicine Working Group. JAMA. 2000;284:79–84. [PubMed]
4. Haynes RB, Wilczynski N, McKibbon KA, Walker CJ, Sinclair JC. Developing optimal search strategies for detecting clinically sound studies in medline. J Am Med Inform Assoc. 1994;1:447–58. [PMC free article] [PubMed]
5. van der Weijden T, Ijzermans CJ, Dinant GJ, van Duijn NP, de Vet R, Buntinx F. Identifying relevant diagnostic studies in medline: the diagnostic value of the erythrocyte sedimentation rate (ESR) and dipstick as an example. Fam Pract. 1997;14:204–8. [PubMed]
6. Dickersin K, Scherer R, Lefebvre C. Identifying relevant studies for systematic reviews. BMJ. 1994;309:1286–91. [PMC free article] [PubMed]
7. Boynton J, Glanville J, McDaid D, Lefebvre C. Identifying systematic reviews in medline: developing an objective approach to search strategy design. J Info Sci. 1998;24:137–57.


