Hedging their Mets: The Use of Uncertainty Terms in Clinical Documents and its Potential Implications when Sharing the Documents with Patients
David A. Hanauer, MD, MS, Yang Liu, MSI, [...], and Kai Zheng, PhD
Abstract
In this study, we quantified the use of uncertainty expressions, referred to as ‘hedge’ phrases, in a corpus of 100,000 clinical documents retrieved from our institution’s electronic health record system. The frequency of each hedge phrase appearing in the corpus was characterized across document types and clinical departments. We also used a natural language processing tool to identify clinical concepts that were spatially, and potentially semantically, associated with the hedge phrases identified. The objective was to delineate the prevalence of hedge phrase usage in clinical documentation, which may have a profound impact on patient care and provider–patient communication, and may become a source of unintended consequences when such documents are made directly accessible to patients via patient portals.
Introduction
Uncertainties pervade medicine.1 They originate in multiple sources ranging from clinicians’ course of investigation for pinpointing the right diagnoses2 3 to patients’ pondering whether to start a medical treatment.4 Expressions found in clinicians’ written or oral communications describing such uncertainties are known as ‘hedging,’ herein defined as “any term or phrase that is ambiguous or lacks clear precision.” Hedging may take multiple forms such as those related to probability (e.g., likely, probable, possible), frequency (e.g., often, occasionally, sometimes), or quantity (e.g., many, few, some). These concepts have also been described as being ‘underspecified’ in relation to their use in clinical practice guidelines.5
Numerous attempts have been made to study clinicians’ use of hedging phrases, particularly with respect to probability expressions. Some early studies, for example, investigated how clinicians interpret the meaning of hedge expressions and how this interpretation might in turn affect intra-clinician communication and distributed medical decision-making.6–14 Recent studies explored the interpretation of probability expressions in clinical documents such as pathology reports15–17 and radiology reports.18 19
Similar investigations have also been conducted to study how patients and families interpret the meaning of hedge phrases.12 20–25 These studies may have been driven by the medical establishment’s growing acknowledgement of the importance of information transparency and patient autonomy. A recent survey study, for example, assessed whether physicians and patients interpret hedge phrases differently. The results suggested that a great deal of variability exists in the quantitative values that physicians and patients associate with hedge phrases.26
It has been demonstrated that uncertainty expressions used by clinicians could result in negative patient perceptions during patient–provider encounters.27 28 Nonetheless, such encounters are almost always conducted through face-to-face (or telephone) contact, in which any hedge phrases used can be clarified promptly during the interaction. Written notes in patient charts, on the other hand, have traditionally served to record data in support of intra-clinician communication and were not intended to be directly viewable by patients. This premise could soon change, however, with increasing recognition of the value of making clinician notes available to patients to improve information transparency. While most commercial electronic health record (EHR) systems share primarily structured, non-narrative data through patient portals, the OpenNotes Project (http://www.myopennotes.org/) has been experimenting with the concept of letting patients and families have direct access to their data, including clinician notes.29–31 In such scenarios, clinicians’ ‘hedging’ language is directly exposed to patients, which could result in both confusion and negative patient perceptions.
Therefore, while often justified by the inherent uncertainties in medicine, the use of hedge phrases, combined with hastily written medical shorthand (e.g., “mets” for cancer “metastases”), could create confusion among clinicians, and could become even more problematic when the communication is between healthcare providers and their patients. In this study, we sought to take a fresh, comprehensive look at clinicians’ use of hedge phrases in an EHR. We first collected a wide range of hedge phrases drawn from the literature, and determined the frequency of these terms appearing in a corpus of clinical documents retrieved from an EHR. We then applied natural language processing (NLP) to identify key clinical concepts to assess how often they appeared in association with the hedge phrases. The results could provide valuable insights into understanding clinicians’ use of uncertainty expressions in clinical documents and its potential implications when sharing such documents with patients.
Methods
A. Literature Review and List Compilation
We first conducted a literature search in PubMed to identify published studies that had described hedge phrases used in clinical documentation and communication. Keywords searched included “hedging,” “uncertainty,” “ambiguous,” and “probability expressions” (and their alternative spellings), in combination with relevant terms such as “clinical,” “pathology reports,” and “medical records.” All hedge phrases mentioned in the papers we found were extracted. In addition, we developed variants based on the phrases identified. For example, “can’t rule out” was reported in two papers15 16 while “unable to rule out” was never mentioned; the latter was deemed as a reasonable alternative and was added to our list.
B. Empiric Dataset
In this study, we analyzed a dataset consisting of 100,000 clinical documents retrieved randomly from our institution’s EHR system. They belonged to hematology/oncology patients whose decedent status was double confirmed using our EHR and tumor registry. These documents represent a wide variety of note types ranging from admission notes to discharge summaries. All documents were de-identified prior to analysis using the MITRE Identification Scrubber Toolkit (MIST).32 The use of these documents was reviewed and determined to be exempt as nonhuman subject research by our institutional review board.
C. Frequencies of Hedge Phrases
First, we split the 100,000 documents into a total of 3,863,418 sentences. Then, we identified those sentences containing at least one hedge phrase. Regular expressions were used to identify the longest possible match of these phrases. In other words, the phrase “evidence of” would not be double counted if the longer phrase “no evidence of” was identified.
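The longest-match rule described above can be illustrated with a minimal Python sketch. This is not the code used in the study; the short phrase list, the function names, and the choice to sort alternatives by length are assumptions made for illustration only.

```python
import re

# Hypothetical subset of the hedge phrase list, for illustration only.
HEDGE_PHRASES = ["evidence of", "no evidence of", "likely", "possible",
                 "unable to rule out"]


def build_hedge_pattern(phrases):
    """Compile one regex in which longer phrases are tried before shorter ones."""
    ordered = sorted(phrases, key=len, reverse=True)
    # Allow any run of whitespace between the words of a multi-word phrase.
    alternatives = [r"\s+".join(re.escape(word) for word in phrase.split())
                    for phrase in ordered]
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)


def find_hedges(sentence, pattern):
    """Return the non-overlapping hedge phrases found in a sentence."""
    # finditer never reuses text already consumed by a match, so
    # "evidence of" is not counted again inside "no evidence of".
    return [match.group(0).lower() for match in pattern.finditer(sentence)]


PATTERN = build_hedge_pattern(HEDGE_PHRASES)
print(find_hedges("There is no evidence of metastatic disease, likely benign.",
                  PATTERN))
```

Because matches are non-overlapping and the longer alternative is tried first, each stretch of text contributes to at most one phrase count. The word-boundary anchors also keep ‘likely’ from matching inside ‘unlikely’.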
Because the word “may” was considered to be ambiguous beyond its hedging connotation (“May” can also be a month of the year), we excluded that term from our analysis when it was capitalized unless it was the first word of a sentence. Document metadata were included in the analysis to examine differences in hedge phrase usage across document type, clinical department, and data entry method. Percentages of hedge phrase usage between departments (reported in Table 4) were compared using a ‘2-sample test for equality of proportions with continuity correction’ in R version 2.13.2 for Mac OS X.
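Both steps in this paragraph can be sketched in Python. The first function applies the capitalization rule for “may”; the second mirrors the chi-square form of a two-sample test of proportions with Yates continuity correction, which is what R's prop.test computes for two groups. The function names and the example counts are hypothetical, and the sketch assumes the pooled proportion is strictly between 0 and 1.

```python
import math
import re


def count_hedging_may(sentence):
    """Count 'may' occurrences kept under the capitalization rule."""
    kept = 0
    for match in re.finditer(r"\bmay\b", sentence, flags=re.IGNORECASE):
        at_sentence_start = sentence[:match.start()].strip() == ""
        # A capitalized "May" in mid-sentence is treated as the month and skipped.
        if match.group(0)[0].isupper() and not at_sentence_start:
            continue
        kept += 1
    return kept


def two_sample_prop_test(x1, n1, x2, n2):
    """Chi-square statistic and two-sided p-value, with continuity correction."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    # The correction is capped so it never exceeds the observed difference.
    yates = min(0.5, abs(p1 - p2) / (1 / n1 + 1 / n2))
    observed = [x1, n1 - x1, x2, n2 - x2]
    expected = [n1 * pooled, n1 * (1 - pooled), n2 * pooled, n2 * (1 - pooled)]
    chi2 = sum((abs(o - e) - yates) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 90 of 100 notes vs. 10 of 100 notes with a hedge phrase.
statistic, p_value = two_sample_prop_test(90, 100, 10, 100)
print(round(statistic, 2), p_value < 0.001)
```

The counts in the usage lines are invented to exercise the function and are not taken from Table 4.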
D. Concept Identification using Natural Language Processing
We used the MetaMap software released by the National Library of Medicine in 2011 to determine the types of clinical concepts frequently associated with the hedging phrases.33 34 MetaMap is an NLP software program that identifies clinical concepts and maps them to other relevant concepts contained in the Unified Medical Language System (UMLS). The sentences containing at least one hedge term, as previously identified, were processed using MetaMap, and all phrases in the sentence that mapped to one or more UMLS concepts were identified and mapped to a specific semantic type. Only those phrases with a score of 1000, i.e., a perfect match, were retained to help ensure reliable results. For cases in which a phrase had more than one semantic type with a perfect score, both were retained.
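The filtering step can be sketched as follows. The tuples below are a simplified, hypothetical stand-in for MetaMap output, not its actual output format, and the example phrases and scores are invented for illustration.

```python
from collections import defaultdict

PERFECT_SCORE = 1000  # score treated as a perfect match in the text above


def perfect_semantic_types(candidates):
    """Keep only perfect-score mappings, retaining every semantic type per phrase.

    `candidates` is an iterable of (phrase, score, semantic_type) tuples,
    a hypothetical simplification of the tool's output.
    """
    kept = defaultdict(list)
    for phrase, score, semantic_type in candidates:
        if score == PERFECT_SCORE and semantic_type not in kept[phrase]:
            kept[phrase].append(semantic_type)
    return dict(kept)


# Invented example mappings for a single sentence.
candidates = [
    ("fever", 1000, "Sign or Symptom"),
    ("fever", 1000, "Finding"),            # second perfect type is also retained
    ("pneumonia", 1000, "Disease or Syndrome"),
    ("infection", 861, "Disease or Syndrome"),  # imperfect score, dropped
]
print(perfect_semantic_types(candidates))
```

Keeping a list per phrase, rather than a single value, reflects the rule that every semantic type with a perfect score is retained.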
Then, we excluded all mapped phrases that were beyond a window size of 4 with respect to the hedge phrase. This window size was chosen because it had been shown to be ideal in work related to assertion classification.35 Finally, we summarized the frequency of the semantic types that were associated with the hedge phrases. Negation by MetaMap was not considered in this analysis since many of our terms were themselves negation terms (e.g., “no evidence for,” “does not appear,” “definitely not”).
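A minimal sketch of the window filter and the frequency summary is given below. It assumes whitespace tokenization, half-open token spans, and a distance of 1 for a concept immediately adjacent to the hedge phrase; the text does not specify these details, and the function names and example sentence are hypothetical.

```python
from collections import Counter

WINDOW_SIZE = 4  # window size stated in the text


def token_distance(hedge_span, concept_span):
    """Token distance between two half-open (start, end) spans; 0 if they overlap."""
    hedge_start, hedge_end = hedge_span
    concept_start, concept_end = concept_span
    if concept_start >= hedge_end:      # concept follows the hedge phrase
        return concept_start - hedge_end + 1
    if concept_end <= hedge_start:      # concept precedes the hedge phrase
        return hedge_start - concept_end + 1
    return 0


def count_semantic_types(hedge_span, concepts, window=WINDOW_SIZE):
    """Count semantic types of concepts lying within the window of the hedge."""
    counts = Counter()
    for span, semantic_type in concepts:
        if token_distance(hedge_span, span) <= window:
            counts[semantic_type] += 1
    return counts


# Tokens: This(0) is(1) likely(2) pneumonia(3) with(4) no(5) fever(6)
#         or(7) other(8) acute(9) process(10)
hedge = (2, 3)  # "likely"
concepts = [
    ((3, 4), "Disease or Syndrome"),   # "pneumonia", distance 1
    ((6, 7), "Sign or Symptom"),       # "fever", distance 4
    ((10, 11), "Finding"),             # "process", distance 8, excluded
]
print(count_semantic_types(hedge, concepts))
```

As the example shows, proximity alone decides inclusion: “fever” is counted with “likely” even though the sentence negates it, which is one reason a purely positional association is only approximate.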
Results
A total of 31 papers were identified through the literature search,6–20 22–26 35–45 which provided a list of 313 distinct hedge phrases (available upon request). An additional 28 terms were added, most of which represented slight variations from the phrases in the literature (e.g., including ‘not much of a chance’ based on the published phrase ‘not much chance’). Among the 313 phrases directly drawn from the literature, 33 appeared in 4 or more papers, as shown in Table 1.
Most of the papers quantified the meaning of expressions through the use of surveys and interviews. Among them, we found considerable variability in the approaches used to perform the quantification and/or in the manner in which the results were reported. An example showing the results of studying the word ‘probable’ is provided in Figure 1. This figure displays visual comparisons of the results and reporting approaches from each study, and also demonstrates the wide range of probabilities assigned by subjects.

Among our list of 341 hedge phrases, 265 appeared in the corpus at least once. Examples of phrases that did not appear even once include ‘not compatible with’, ‘more often than not’, and ‘moderate probability.’ The 30 hedge phrases that appeared in the largest number of documents are listed in Table 2. Among the 30 most frequently used hedge phrases discovered in this study, only 10 were commonly reported in the literature. The rest were mentioned in 3 or fewer publications.
Table 3 lists the 20 most frequently occurring document types in our sample, representing 82.8% of all documents we analyzed. Nearly all (98.1%) of ‘Admission History & Physical’ documents contained at least one hedge phrase whereas the nursing notes (both standard and procedure notes) contained far fewer hedge phrases (31.5% and 13.5%, respectively). Other nursing notes not shown in the table also contained few hedge phrases. For example, only 14.4% of the ‘Nursing Progress Note’ documents contained a hedge phrase. Table 4 lists some of the clinical departments in our health system and the percentage of documents containing hedge phrases created by each department. Notes generated by the Department of Psychiatry frequently contained hedge phrases whereas those generated by Nursing did not, consistent with the note types in Table 3.
Table 5 shows the top 30 UMLS semantic types associated with a hedge phrase identified by MetaMap. Many of these high level categories are clinically relevant including ‘Disease or Syndrome,’ ‘Finding,’ and ‘Sign or Symptom.’ Figure 2 displays a network graph showing the relationships between the most common semantic types identified by MetaMap and the most frequent hedge phrases associated with those semantic types. The term ‘likely’ was associated with 9 of the top 10 semantic types, followed by ‘may’ and ‘possible’ which were each associated with 8 of them. Table 6 provides examples of hedge phrases associated with clinical concepts identified with our approach, drawn from our document corpus.

In a prior study using a subset of the corpus, we showed that there are linguistic property differences in documents that were typed versus dictated/transcribed that could influence the performance of NLP systems.46 The current study included 47,665 documents that were typed by clinicians and 19,882 documents created via dictation/transcription. The remainder of the documents were added from other source systems but we could not ascertain how they were created based on the metadata. Among the typed notes, 66.52% had at least one hedge phrase, whereas 86.81% of the dictated notes had at least one phrase, and this difference was highly significant (p < 0.001).
Discussion
It is not surprising that clinicians’ frequent use of hedge phrases has continued into the era of the EHR. With the availability of a large corpus, our study was able to further quantify several nuances in hedge phrase utilization. For example, the results show that physicians express uncertainties far more frequently than nurses. While hedge phrases are present in a variety of documents, they appear at distinct rates.
Among the semantic types detected using MetaMap, many of them are clinically relevant. There were nearly 35,000 instances in which a hedge phrase was associated with a ‘Disease or Syndrome.’ Such instances must be carefully communicated to patients as it has been demonstrated that “in situations where there is substantial uncertainty, extra vigilance is required to ensure that patients are given the tools and information they need to participate in cooperative decision making about their care.”47
It should be noted that disclosing uncertainty to patients, rather than concealing it, can yield better results. It has been shown that educating patients regarding the limitations of medicine (such as the inherent uncertainty in diagnoses and prediction of disease progression) can improve clinician–patient communication as well as increase patient involvement and understanding of their illness.48 For example, a study conducted in a general medicine clinic found that physicians who expressed uncertainty and educated patients about it received higher patient satisfaction scores.49 In another study setting, similar results were reported that clinicians’ willingness to disclose and openly discuss uncertainty with patients led to better patient satisfaction.50 Nonetheless, with a growing interest in making clinical data more accessible to patients via patient portals, clinical documents containing hedge phrases may now be directly exposed to patients without adequate clarification, which could result in unintended adverse consequences. Specific strategies, therefore, need to be developed to better handle hedge phrases in patient-facing scenarios. Such strategies may include providing computerized alerting mechanisms to detect certain hedge phrase usage that may be problematic, or including contextual educational materials that provide explanations to patients about the cause of the uncertainty and its implications.42 44 51–54 Involving patients in developing strategies for handling these commonly used phrases will be important.
This study has several limitations. First, we quantified hedge phrase usage without manually reviewing each instance to verify what the context was and whether the use was appropriate. For example, we did see that the word “may” also appeared in non-ambiguous contexts related to its meaning of ‘permission’ such as “I explained that he may call back at any time with further questions.” Adding a qualitative component to the study wherein patients and physicians interpreted a selection of hedge phrases from our corpus would provide additional data about the use and meaning of the phrases. Furthermore, our analysis was based on data collected at a single institution, and we were therefore unable to study local differences in how hedge phrases are used or accepted, which limits the generalizability of our findings. Finally, the association of a semantic entity and a hedge phrase was determined approximately; a hedge phrase appearing within 4 words of a semantic entity might not necessarily have been used to modify that entity. Further work with manual verification would help to improve the precision of the analysis.
Conclusion
We empirically evaluated clinicians’ use of hedge phrases in their clinical documentation by examining the frequency of these phrases and their association with clinical concepts identified through natural language processing. We found that hedge phrases are used in a substantial proportion and variety of clinical documents, and many are associated with clinically relevant concepts. The use of hedge phrases in clinical notes has the potential for unintended consequences as patients begin to gain direct access to their notes through patient portals.
Acknowledgments
This work was supported in part by Grant HHSN276201000032C received from the National Library of Medicine, the UMCC Support Grant CA46592 received from the National Cancer Institute, and Grant UL1RR024986 received from the National Center for Research Resources (NCRR). We thank Lei Yang for his assistance in implementing MetaMap.


