Reproduced from Saracci58 with permission
The Agency for Healthcare Research and Quality (AHRQ), through its Evidence-Based Practice Centers (EPCs), sponsors the development of evidence reports and technology assessments to assist public- and private-sector organizations in their efforts to improve the quality of health care in the United States. The reports and assessments provide organizations with comprehensive, science-based information on common, costly medical conditions and new health care technologies. The EPCs systematically review the relevant scientific literature on topics assigned to them by AHRQ and conduct additional analyses when appropriate prior to developing their reports and assessments.
To bring the broadest range of experts into the development of evidence reports and health technology assessments, AHRQ encourages the EPCs to form partnerships and enter into collaborations with other medical and research organizations. The EPCs work with these partner organizations to ensure that the evidence reports and technology assessments they produce will become building blocks for health care quality improvement projects throughout the Nation. The reports undergo peer review prior to their release.
AHRQ expects that the EPC evidence reports and technology assessments will inform individual health plans, providers, and purchasers as well as the health care system as a whole by providing important information to help improve health care quality.
We welcome written comments on this evidence report. They may be sent to: Acting Director, Center for Practice and Technology Assessment, Agency for Healthcare Research and Quality, 6010 Executive Blvd., Suite 300, Rockville, MD 20852.
Carolyn M. Clancy, M.D.
Acting Director
Agency for Healthcare Research and Quality
Robert Graham, M.D.
Director, Center for Practice and Technology Assessment
Agency for Healthcare Research and Quality
The authors of this report are responsible for its content. Statements in the report should not be construed as endorsement by the Agency for Healthcare Research and Quality or the U.S. Department of Health and Human Services of a particular drug, device, test, treatment, or other clinical service.
Context. Despite an extensive literature documenting multiple benefits of the autopsy, only approximately 5% of deaths nationally undergo autopsy. Many of the autopsy's benefits—relating to medical education, characterization of new diseases, and advancing the understanding of disease pathogenesis—are difficult to quantify and thus less likely to change the attitudes, reimbursement mechanisms, and other factors that have produced this dramatic decline in autopsy performance. This report, therefore, focuses on more quantifiable benefits—to individual clinicians, institutions, and the healthcare system as a whole. Specifically, the report reviews the literature addressing the extent to which the autopsy reveals important errors in clinical diagnosis and the roles these data have in measuring and improving clinical performance.
Methods. We conducted an extensive search of the MEDLINE® database, supplemented by hand searches of article bibliographies and consultation with experts in the field. Included studies were required to have well-defined patient samples, clinical diagnoses derived from autopsy request forms or chart review (rather than death certificates), and identification of diagnostic errors using well-defined classification schemes.
Main Results. Multiple regression analysis incorporating study period, autopsy rate, country (U.S. vs. non-U.S.) and case mix as predictors showed that diagnostic errors that may have affected patient outcome (“Class I errors”) are detected in 10.2% (95% CI: 6.7–15.3%) of autopsies performed in the base time (1980) and country (U.S.), and with the reference case mix (general autopsies) and mean autopsy rate. The prevalence of other “major errors” related to the principal diagnosis or underlying cause of death was 25.6% (95% CI: 20.8–31.2%). When changes in autopsy rates are taken into account, these error rates showed modest decreases with time. Specifically, Class I errors exhibited a relative decrease of 26.2% per decade (p=0.10), and major errors decreased at a rate of 28.0% per decade (p=0.006). Nonetheless, Class I errors still occur in 3.8–7.9% of cases and major errors in 8.0–22.8%, with these ranges reflecting the impact of variations in autopsy rates. Studies specifically addressing the issue of clinical selection indicate that clinicians cannot reliably predict which autopsies will be of high diagnostic yield. No intervention study has directly addressed the impact of autopsy findings on clinical practice or performance improvement. However, the existing evidence strongly suggests substantial inaccuracies in death certificates and hospital discharge data, both of which play important roles in epidemiologic research and healthcare policy decisions.
Conclusions. At the level of the individual clinician, the chance that autopsy will reveal important unsuspected diagnoses in a given case remains significant. Moreover, clinicians do not seem able to predict the cases in which such findings are likely to occur. There is no evidence to determine whether findings from autopsy improve subsequent clinical performance. The existing literature does demonstrate that clinical diagnoses, whether obtained from death certificates or hospital discharge data, contain major inaccuracies compared with autopsy diagnoses. The health care system as a whole can thus benefit enormously from autopsy data, by substantially enhancing the accuracy of vital statistics, which play important roles in research, funding, and other policy decisions. Future research opportunities include characterizing the factors leading to errors in clinical diagnosis, establishing optimal means of using autopsy data in performance improvement strategies, and exploring different mechanisms for encouraging autopsies.
An extensive literature documents a high prevalence of errors in clinical diagnosis discovered at autopsy. Multiple studies have suggested no significant decrease in these errors over time. Despite these findings, autopsies have dramatically decreased in frequency in the United States and many other countries. In 1994, the last year for which national U.S. data exist, the autopsy rate for all non-forensic deaths fell below 6%. The marked decline in autopsy rates from previous rates of 40–50% undoubtedly reflects various factors, including reimbursement issues, the attitudes of clinicians regarding the utility of autopsies in the setting of other diagnostic advances, and general unfamiliarity with the autopsy and techniques for requesting it, especially among physicians-in-training.
The autopsy is valuable for its role in undergraduate and graduate medical education, the identification and characterization of new diseases, and contributions to the understanding of disease pathogenesis. Although extensive, these benefits are difficult to quantify. This systematic review studied the more easily quantifiable benefits of the autopsy as a tool in performance measurement and improvement. Such benefits largely relate to the role of the autopsy in detecting errors in clinical diagnosis and unsuspected complications of treatment. It is hoped that characterizing the extent to which the autopsy provides data relevant to clinical performance measurement and improvement will help inform strategies for preserving the benefits of routinely obtained autopsies and for considering its wider use as an instrument for quality improvement.
This report does not attempt to address the roles of the autopsy in medical education; furthering medical research; quality control within pathology; verification, second-opinion consultations, and legal documentation of findings; the bereavement process for surviving family members; or other benefits that are described in many of the sources listed in the bibliography (Appendix F). In addition to being difficult to quantify, these benefits apply primarily to teaching hospitals. To address the role of the autopsy as an outcome measure and tool for quality improvement, the report focuses on benefits likely to apply to all hospitals, such as the detection of important diagnostic errors and related quality problems.
This report synthesizes the autopsy literature as it relates to the following four key questions:
To what extent does the autopsy reveal important diagnoses that were clinically unsuspected prior to death?
To what extent does the autopsy provide a useful performance measure or audit of clinical diagnosis in general?
What impact do autopsy findings have on clinical performance improvement?
To what extent are vital statistics compromised by low autopsy rates?
To address the above questions adequately, we also sought evidence pertaining to the properties of the autopsy as a diagnostic test. Specifically, we looked for any information describing autopsy quality, accuracy, and precision or reproducibility.
It is important to note that, though the phrase “diagnostic error” appears throughout this report, the discrepancies between clinical and autopsy diagnoses to which we refer do not necessarily represent errors in the sense of mistakes, “slips,” or other such terms. Some of these discrepancies do undoubtedly result from failures to consider an appropriately broad differential diagnosis, misinterpretation of test results, and other quality problems, so that resulting discrepant diagnoses detected at autopsy do warrant the label “diagnostic errors.” However, other such discrepancies clearly represent acceptable limits to clinical diagnosis, based on the performance of current technologies or the occurrence of atypical clinical presentations. (In fact, one of the areas of future research identified by this report involves characterizing the relative distribution of these two types of clinical-autopsy diagnostic discrepancies.) Despite these considerations, we use the term “diagnostic errors” because it appears so commonly in the autopsy literature.
The patient population covered in this report includes all patients (e.g., adult and pediatric, male and female, and so on) in various settings, although predominantly consisting of hospitalized patients. We did not specifically exclude medical examiner cases, but few studies from the forensic literature addressed the specific questions posed in this report.
We conducted an extensive search of the MEDLINE® database, supplemented by hand searches of article bibliographies and consultation with experts in the field. For articles published in languages other than English, we reviewed the abstract (if available) to determine whether or not the study reported methodologies or findings qualitatively different from those described in the English-language literature.
The autopsy literature consists entirely of observational studies, rendering problematic the development of appropriate inclusion and exclusion criteria, as the vast majority of systematic reviews involve at least some randomized controlled trials. In the absence of relevant and well-established quality scoring systems, we adopted fairly minimal inclusion and exclusion criteria. For studies reporting diagnostic error rates detected at autopsy, we required:
Well-defined patient samples consisting of consecutive or randomly sampled autopsies meeting explicit criteria—convenience samples were excluded.
Clinical diagnoses derived from autopsy request forms submitted by clinicians or chart review performed by the study investigators—clinical diagnoses derived solely from death certificates were excluded.
Classification schemes for discrepancies between clinical and autopsy diagnoses conforming to one of three categories—potentially treatable causes of death (“Class I”), other major missed diagnoses, and discrepant disease categorizations based on standard international classification coding. These classifications (defined further in the report) encompass the majority of studies reported in the literature. Studies that reported clinical diagnoses simply as “correct/incorrect” or “confirmed/unconfirmed” were excluded.
Articles identified from the literature search were stored in a reference database and categorized according to the study questions addressed. Structured abstraction forms were then used to collect demographic data (pertaining to patients and institutions), salient methodologic features and results. Each article was abstracted by at least two of the four reviewers, including three physicians and one non-physician research assistant. One of the physicians reviewed all of the articles.
To address the first key question pertaining to the extent to which autopsies reveal clinically unsuspected important diagnoses, we reviewed studies assessing the performance of the autopsy as a diagnostic test. Given the generally accepted role of the autopsy as the ultimate diagnostic standard for many aspects of clinical care, the test characteristics of the autopsy have received surprisingly little attention.
The quality of the autopsy has received little systematic study, with the only evidence pertaining to perinatal autopsies, where two studies show that deficiencies relative to reporting standards (i.e., a proxy measure for potentially inadequate quality) appear to be common.
The potential for error or disagreement in autopsy interpretations has been assessed in only one small study. In relation to the determination of principal diagnoses relating to the cause of death in technically adequate autopsies, diagnostic uncertainty persists in 1–5% of cases, although rates of up to 40% have been reported, depending on the type of autopsy case (e.g., perinatal). Importantly, errors in classification of autopsy diagnoses involving even a few percent of cases substantially distort estimates of the performance of clinical diagnosis when autopsy is used as the gold standard.
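The distortion described above can be illustrated with a minimal sketch. It assumes a simple right-or-wrong diagnosis and that clinical and autopsy errors occur independently. Both assumptions, the function name, and the 10% clinical error rate are hypothetical and introduced only for illustration; they are not taken from the studies reviewed.

```python
def observed_discrepancy_rate(clinical_error, autopsy_error):
    """Probability that the clinical and autopsy diagnoses disagree.

    Illustrative assumptions (not from the report): the diagnosis is
    binary and the two sources err independently, so a disagreement
    occurs when exactly one of them is wrong.
    """
    return (clinical_error * (1 - autopsy_error)
            + (1 - clinical_error) * autopsy_error)


true_clinical_error = 0.10  # hypothetical true clinical error rate

for autopsy_error in (0.0, 0.01, 0.03, 0.05):
    observed = observed_discrepancy_rate(true_clinical_error, autopsy_error)
    print(f"autopsy error {autopsy_error:.0%}: "
          f"observed discrepancy rate {observed:.1%}")
```

Under these assumptions, an autopsy misclassification rate of a few percent shifts the apparent clinical error rate by a comparable number of percentage points, which is large relative to a true rate of 10%.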
The reproducibility of judgments about errors in clinical diagnosis as indicated by autopsy findings has only been mentioned in passing in the autopsy literature. Studies from the health care quality and medical error literature suggest that reproducibility of similar types of judgments is likely fair to moderate at best.
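Descriptions such as "fair to moderate" are commonly attached to chance-corrected agreement statistics such as Cohen's kappa. The sketch below assumes that kappa is the measure in question, which the text does not state, and shows how it could be computed for two reviewers classifying the same cases. The ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty case list")
    n = len(rater_a)
    # Proportion of cases on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical ratings for 20 autopsied cases: does the case involve a
# clinically important diagnostic error?
reviewer_a = ["error"] * 6 + ["none"] * 14
reviewer_b = ["error"] * 3 + ["none"] * 3 + ["error"] * 2 + ["none"] * 12

print(f"kappa = {cohens_kappa(reviewer_a, reviewer_b):.3f}")
```

In this invented example the reviewers agree on 75% of cases, yet kappa is well below 0.75 because much of that agreement would be expected by chance alone.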
There is insufficient literature to address: a) the quality of the autopsy, b) the technical adequacy in interpreting autopsy findings, and c) the reliability of judgments made regarding autopsy-detected discrepancies. There is also no literature that addresses the quality of training in autopsy pathology or the ability of physicians to utilize autopsy findings.
In terms of the four main study questions:
To what extent does the autopsy reveal important diagnoses that were clinically unsuspected prior to death?
The chance that autopsy will reveal a misdiagnosis that may have affected outcome (i.e., a Class I error) was 10.2% (95% CI: 6.7–15.3%) using data from all studies and the base values of time (1980), autopsy rate (overall mean rate of 44.3%), country (U.S.) and case mix (general autopsies). Restricting the analysis to data from U.S. institutions only yielded a slightly higher point estimate but almost entirely overlapping confidence interval, 11.2% (95% CI: 6.9–17.5%). Adjusting for changes in autopsy rates, and the effects of case mix and the country, the probability of a Class I error showed a relative decrease of 26.2% per decade (p=0.10).
The base probability of the autopsy detecting a major error in a given case was 25.6% (95% CI: 20.8–31.2%) when data from all institutions were included. Using data from U.S. institutions only, the probability of the autopsy detecting a major error in a given case was slightly lower at 24.0%, but with an almost entirely overlapping 95% CI of 17.6–31.5%. Major error rates also showed a similar decrease over time, but, in contrast to the results for Class I errors, this relationship was statistically significant. Relative to the base rate in 1980, the prevalence of major errors exhibited a relative decrease of 28.0% (95% CI: 9.8–42.6%) per decade.
The regression analysis supported the expected inverse correlation between error rate and autopsy rate (i.e., that lower autopsy rates produce higher error rates due to selection of diagnostically challenging cases), but this effect is relatively modest. Specifically, every 10% increase in the autopsy rate is associated with a relative decrease in Class I errors of 7.8% (p=0.18). For major errors, this relationship was more substantial and statistically significant, with every 10% increase in autopsies associated with a relative decrease in major errors of 12% (p=0.0003).
Using the regression model to compute rates of autopsy-detected diagnostic errors over a range of autopsy rates and as a function of time, contemporary (year 2000) autopsies detect Class I errors in 3.8–7.9% of cases and major errors in 8.0–22.8% of cases. These ranges reflect variations in autopsy rates from 5–100%.
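The calculation behind these ranges can be sketched as follows. The sketch rests on two assumptions that the summary does not state: that the reported relative decreases act multiplicatively on the odds of an error (as they would in a logistic regression), and that "every 10% increase in the autopsy rate" means 10 percentage points. The function name and parameter names are hypothetical.

```python
def project_error_rate(base_rate, decline_per_decade, decline_per_10_points,
                       year, autopsy_rate,
                       base_year=1980, base_autopsy_rate=44.3):
    """Project an error rate from the base case to another year and autopsy rate.

    Assumption (not confirmed by the report): each relative decrease is
    applied to the odds of an error rather than to the probability.
    """
    odds = base_rate / (1 - base_rate)
    odds *= (1 - decline_per_decade) ** ((year - base_year) / 10)
    odds *= (1 - decline_per_10_points) ** ((autopsy_rate - base_autopsy_rate) / 10)
    return odds / (1 + odds)


# Base rates and relative decreases as reported in the text above.
for label, base, per_decade, per_10_points in (
    ("Class I errors", 0.102, 0.262, 0.078),
    ("Major errors", 0.256, 0.280, 0.12),
):
    low = project_error_rate(base, per_decade, per_10_points, 2000, 100)
    high = project_error_rate(base, per_decade, per_10_points, 2000, 5)
    print(f"{label}, year 2000: {low:.1%} to {high:.1%}")
```

Under these assumptions the projected year-2000 values come out close to the reported ranges, with the lower bound corresponding to a 100% autopsy rate and the upper bound to a 5% rate.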
The weak relationship between autopsy rates and error rates in the general analysis was supplemented by review of studies specifically addressing the issue of clinical selection of diagnostically challenging or uncertain cases. These studies indicated that clinicians cannot reliably predict which autopsies will be of high diagnostic yield, reinforcing the conclusion that the relatively unchanged diagnostic error rates do not simply reflect competing effects of medical progress (leading to fewer errors) and fewer autopsies (leading to selection for cases likely to have errors).
Because of the recent interest in medical error and patient safety, we specifically looked for studies that reported the proportion of autopsies that detected clinically unsuspected complications of care. These data were usually mentioned in passing in these studies, with no study specifically focusing on this issue. Thus, the extent to which these complications contributed to death (and even the extent to which they were truly unsuspected) was often unclear. For this reason, and because of the heterogeneity of the case mix in the relatively small sample of studies reporting the relevant data, we did not pool estimates for rates of autopsy-detection of unsuspected complications of care. Nonetheless, the 11 studies that did provide data on this point indicated that approximately 1–5% of autopsies disclose unsuspected complications of care.
To what extent does the autopsy provide a useful performance measure or audit of clinical diagnosis in general?
Autopsy studies commonly report diagnostic “error rates,” but these error rates involve autopsied cases only. It is commonly assumed that the true denominator of interest is all deaths; hence the interest in increased autopsy rates. However, the denominator of interest for clinical performance measurement is, in fact, all patients receiving care during the autopsy observation period. Only one autopsy study provides any data on clinical diagnoses for patients discharged alive from the hospital during the same observation period as for the autopsy series. Because of the importance of this question, we searched extensively outside the autopsy literature per se for potentially relevant studies.
Specifically, we looked for studies reporting clinical diagnoses and other follow-up data on cohorts of patients (e.g., all patients admitted to a given hospital during a defined observation period), not just the diagnoses obtained for patients who died and went to autopsy. Supplementing autopsy findings with the results of ante mortem diagnostic testing and/or clinical follow-up for patients who did not die permits determination of the numerator and denominator required to assess the sensitivity of clinical diagnosis. Despite an extensive search, we found appropriate studies for only five target conditions: pulmonary embolism (PE), acute myocardial infarction (MI), acute appendicitis, aortic dissection, and active tuberculosis.
Among these five conditions, the performance of clinical diagnosis exhibited substantial variation, with excellent performance only for acute MI and to a lesser extent PE. Even for these two conditions, the high sensitivities obtained likely overstate clinical performance, as focusing on the dichotomous outcome of correct or incorrect identification of one target condition (PE or MI) obscures the extent to which other important conditions are missed once these target diagnoses are ruled out. A patient who is correctly identified as not having an MI counts as a success, regardless of whether or not the underlying cause of the patient's presenting complaint is ever diagnosed.
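The numerator and denominator issue can be made concrete with a small sketch. All counts below are hypothetical and chosen only to show how the same missed cases look against two different denominators; the function name is likewise an illustration.

```python
def clinical_sensitivity(diagnosed_in_life, first_found_at_autopsy):
    """Share of all identified cases of a condition that were diagnosed clinically."""
    total_cases = diagnosed_in_life + first_found_at_autopsy
    if total_cases == 0:
        raise ValueError("no cases of the target condition")
    return diagnosed_in_life / total_cases


# Hypothetical cohort for one target condition during one observation period.
diagnosed_in_life = 180       # confirmed by ante mortem testing or follow-up
first_found_at_autopsy = 20   # clinically unsuspected, detected only at autopsy
autopsied_with_condition = 30 # autopsied patients who had the condition

# Autopsy-only view: missed cases among autopsied patients with the condition.
missed_among_autopsies = first_found_at_autopsy / autopsied_with_condition

# Cohort view: missed cases among all identified patients with the condition.
sensitivity = clinical_sensitivity(diagnosed_in_life, first_found_at_autopsy)

print(f"missed among autopsied cases: {missed_among_autopsies:.0%}")
print(f"sensitivity across the cohort: {sensitivity:.0%}")
```

Because unautopsied deaths may include further missed cases, a sensitivity computed this way is best read as an upper bound.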
What impact do autopsy findings have on clinical performance improvement?
No intervention study has directly addressed the impact of autopsy findings on clinical practice or performance improvement. Consequently, the study objectives in this regard were not met. In particular, a cost-effectiveness analysis could not be performed, because the effectiveness of the autopsy in reducing errors and other quality problems remains unknown. This does not invalidate the potential role of the autopsy in relation to clinical practice or performance improvement, but it does reveal an important gap in the literature.
To what extent are vital statistics compromised by low autopsy rates?
Major error rates detected by autopsy indicate substantial inaccuracies in death certificates and hospital discharge data, both of which play important roles in epidemiologic research and health care policy decisions. Previous studies have suggested that these errors roughly cancel each other out (i.e., for a given condition, false positive and false negative diagnoses are roughly equal). However, this finding has not been consistent across studies. Even when present, this balancing effect applies only when considering the most general of diagnostic categories (i.e., cardiovascular, neoplastic, infectious, metabolic, and so on). Thus, the current evidence is adequate to suggest that the epidemiologic data for important diseases such as myocardial infarction, breast cancer, pneumonia, stroke, and so on, all contain substantial inaccuracies—in the 20–30% range reported for major errors.
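The balancing effect, and why it offers limited reassurance, can be shown with hypothetical counts. In the sketch below the false positive and false negative diagnoses for one condition are equal, so the total number of deaths attributed to it is correct even though a quarter of the individual diagnoses are wrong. The counts and names are invented for illustration.

```python
def summarize_certificate_accuracy(true_pos, false_pos, false_neg):
    """Compare death certificate diagnoses with autopsy diagnoses for one condition."""
    certified = true_pos + false_pos  # deaths attributed to the condition
    actual = true_pos + false_neg     # deaths with the condition at autopsy
    return {
        "net_count_error": certified - actual,
        "share_of_certified_wrong": false_pos / certified,
        "share_of_actual_missed": false_neg / actual,
    }


# Hypothetical counts: 150 deaths correctly certified, 50 certified in
# error, and 50 true cases certified as something else.
summary = summarize_certificate_accuracy(true_pos=150, false_pos=50, false_neg=50)

print(f"net error in total count: {summary['net_count_error']}")
print(f"certified diagnoses that were wrong: {summary['share_of_certified_wrong']:.0%}")
print(f"true cases that were missed: {summary['share_of_actual_missed']:.0%}")
```

The aggregate count is unaffected only when the two kinds of error happen to be equal, and any analysis that depends on which individuals had the condition remains affected.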
The findings of this review have different implications depending on the level of analysis—individual clinicians, hospitals, or the health care system as a whole. From the point of view of the individual clinician, the chance that autopsy will reveal important unsuspected diagnoses in a given case remains significant. Moreover, clinicians do not seem able to predict reliably cases in which such findings are more likely to occur. Thus, clinicians have compelling reasons to request autopsies far more often than currently occurs.
At the institutional level, the role of the autopsy is less clear. The prevalence of missed diagnoses among autopsied patients (or even all deaths) provides a numerator, but not a denominator with which to assess the rate at which patients with a given condition remain undiagnosed until death. Using autopsy results to track hospital quality requires not only explicitly defined error rates, but also data on the number of patients discharged alive with diagnoses that appear among the list of conditions first detected at autopsy. Clearly, though, the unexpected findings at autopsy in specific cases are of interest to institutions as a whole and not just the individual treating clinicians. However, no study has ever examined the impact of performing autopsies (and communicating autopsy findings back to clinicians) on institutional performance improvement. This represents a major area for future research, but should not detract from the finding that many institutions perform too few autopsies to allow any meaningful assessment of local diagnostic performance and other quality problems, no matter how communication and feedback to clinicians occurs.
At the level of the entire health care system, existing literature provides two compelling reasons to pursue autopsies. First, results for the five conditions examined in this report suggest that clinical diagnosis in routine practice may not perform as well as is generally believed by clinicians or as suggested by the literature assessing specific aspects of clinical diagnosis (e.g., new tests) in research settings. Better characterizing the performance of clinical diagnosis for common conditions would clearly benefit the entire health system and identify important targets for quality improvement that could be pursued in a concerted manner.
The second benefit to the health care system as a whole relates to vital statistics and other epidemiologic data. Vital statistics impact important decisions about allocation of funding for research and other aspects of health care policy. The existing literature demonstrates that clinical diagnoses, whether obtained from death certificates or hospital discharge data, contain major inaccuracies compared with diagnoses generated from postmortem findings. The use of autopsy data to correct inaccuracies in epidemiologic data would likely confer multiple benefits on the health care system as a whole.
Various aspects of the performance of the autopsy as a diagnostic test (e.g., the reproducibility of findings between pathologists) remain undefined and represent areas for further research. More specifically relevant to the present review is the inter-rater reliability for error classifications in specific cases, i.e., establishing the extent to which pathologists, clinicians or other peer reviewers agree that a particular case does or does not involve a clinically important diagnostic error.
The causes of important diagnostic discrepancies remain uncharacterized. This represents a very important area of investigation. Discrepancies between efficacy and effectiveness (i.e., differences between the performance of a diagnostic or therapeutic procedure in routine practice compared to the result in the research literature) have diverse causes. Broadly speaking, though, discrepancies are caused by a) quality problems related to underuse, overuse and misuse of diagnostic or therapeutic procedures, and b) patient factors, including atypical presentations and complex interactions between comorbid conditions and patient demographic factors. Neither of these categories is captured in the “efficacy literature” (i.e., clinical trials), as the nature of research settings makes underuse, overuse or misuse unlikely, and stringent patient selection reduces the complexities of comorbid conditions and multiple competing diagnostic considerations.
Autopsy data provide a window into discrepancies between efficacy and effectiveness both for therapeutics (by detecting clinically unsuspected complications of care) and diagnostics (by detecting the diagnostic discrepancies discussed in this report). In both cases, but perhaps especially the latter, the autopsy can play a pivotal role in spearheading investigations into the causes of these discrepancies. Where discrepancies prove to represent quality problems, the institution benefits and, where they reflect differences between the types of patients receiving care in routine practice and clinical trials, the whole health system may benefit from awareness of these findings.
Future research should establish strategies for optimizing the utility of the autopsy at the institutional level. No study has ever directly assessed the impact of detecting errors in clinical diagnosis on subsequent clinical performance. Thus, future research should establish optimal methods of involving clinicians in the autopsy process (or communicating its results to them) and effective ways of stimulating change based on autopsy findings. Until such research is performed, it is not clear to what extent autopsy rates need to be increased, as opposed to improving the communication and use of information generated from autopsies performed at current rates.
Future research should establish the optimal means of using autopsy data to provide more accurate vital statistics and other important epidemiologic data. The first step might be to validate the findings suggested in this review, namely that current vital statistics contain substantial inaccuracies. Such an undertaking might involve funding a small number of demographically diverse institutions to achieve high institutional autopsy rates, with prospectively determined protocols for autopsy performance and error classification. Even one year's worth of data from such a project would likely document substantial inaccuracies in vital statistics. Continuing such a project could also provide ongoing epidemiologic data, as well as more meaningful error rates that could be used to fuel quality improvement efforts throughout the health system. Such a program would not replace autopsies as routinely performed elsewhere; that is, this suggested research program would not be equivalent to a system of regional autopsy centers performing autopsies on behalf of other institutions. Rather, these centers would act as surveillance centers for basic causes of death and detection of quality problems and present numerous opportunities for basic research into the pathogenesis of acute and chronic illnesses.
The nomination for this report came from the College of American Pathologists (CAP), which sought a formal assessment of the extensive literature on the role of the autopsy as an outcome and performance measure. This literature, beginning in 1912 (ref. 1) and continuing to the present,2, 3 documents the prevalence of significant diagnostic errors discovered at autopsy, and forms the basis for regarding the autopsy as a potential tool in clinical audit and quality assessment. The autopsy has numerous other known and suggested benefits,4–10 with specific examples including: more accurate vital statistics11–16; provision of accurate prevalence data for specific target conditions17–22; pathologic descriptions of new diseases23–26; teaching gross anatomy, disease progression and pathology in both undergraduate and graduate medical education27, 28; and comfort to family members in knowing the cause of death or allaying fears regarding heritable conditions.29–31
Despite the continued fulfillment of these roles, the frequency of autopsies has steadily declined in this country and elsewhere (Appendix Figures 1 and 2). In 1994, the last year for which there are national data, the autopsy rate for all non-forensic deaths fell below 6%.32 A recent survey found that over half of all hospitals in one state reported performing no autopsies during a one-year period.33 The marked decline in autopsy rates from previous rates as high as 40–50% undoubtedly reflects various factors, including the lack of direct or indirect economic incentives and absence of regulatory requirements for minimum autopsy rates.32–34 Other factors include the attitudes of clinicians regarding the utility of autopsies in the setting of other diagnostic advances, and general unfamiliarity with the autopsy and techniques for requesting it, especially among physicians-in-training.28, 35–41
Although the autopsy plays a number of potentially valuable roles, quantification of this value is difficult except in relation to the impact of autopsy performance on clinical diagnostics or therapeutics. In this way, assessments of the value of postmortem examination are handicapped in a fashion similar to assessments of the physical examination performed by clinicians. The act of examining patients has a number of benefits—focusing the clinician on the patient as an individual, reassuring the patient, and various roles in medical education.42, 43 Nevertheless, assessments of the value of the physical examination inevitably focus on its impact on medical decision-making related to diagnosis or treatment.44–50
Recognizing at the outset, then, that our review would not take into account less quantifiable attributes of the autopsy (e.g., its value in medical education), we set out to assess the extent to which the autopsy provides a valid measure of clinical performance, especially as regards diagnosis.
An extensive literature has focused on the prevalence of clinically important diagnoses that remained undetected until autopsy. Several well-known studies from this literature have compared rates of “diagnostic errors” detected by autopsy from different time periods and found surprisingly little change.51–55 It is important to note that the discrepancies between clinical and autopsy diagnoses discussed here and in these studies do not necessarily represent errors per se. In some cases, misinterpretation of test results, failure to respond appropriately to abnormal clinical findings and other such mistakes or “slips” in the diagnostic process do produce diagnostic discrepancies warranting the term “error.” In other cases, however, atypical clinical presentations or the limits of current diagnostic techniques result in misdiagnoses without any true “errors” on the part of clinicians. Despite this important distinction, we refer to “diagnostic errors” throughout this report because of the ubiquitous use of this phrase in the autopsy literature.
Discussions of the significance of autopsy-detected diagnostic errors have generally focused on the degree to which they reflect increased selection by clinicians.56, 57 Given that autopsy rates have declined, it is quite plausible to suggest that cases for which clinicians request autopsy represent precisely those that have presented diagnostic difficulties. This question is obviously important (and one which we address in this report), but it is also important to clarify how autopsy-detected diagnostic error rates should be interpreted.
Reproduced from Saracci58 with permission
Autopsied patients represent a nonrandom sample of all deaths, which in turn constitute a non-random sample of all patients. This non-random sampling is further complicated by the fact that most autopsy series involve hospital deaths only, adding a further level of selection. Thus, the interpretation of autopsy-detected diagnostic error rates is complicated by secular changes in the proportion and demographics of patients passing through these successive selections. Fewer and sicker patients are admitted to the hospital for shorter periods of time.59–67 For many conditions, overall mortality has decreased, and, as already noted, fewer deceased patients undergo autopsy. Before attempting to unravel these complicating factors, it is worth considering the extent to which one ought to expect the rate of autopsy detected errors in clinical diagnosis to decrease over time (as commonly assumed in studies reporting these rates).
Patients who die generally fall into one of two categories - treatment failures (e.g., complications from treatments, ineffective treatment available, etc.) or diagnostic failures (e.g., missed or incorrect diagnoses).1 While improvements in care reduce overall mortality for an increasing number of conditions, the proportion of deaths in these two categories may remain relatively unchanged. In fact, if the number of patients in the first category (treatment failures) decreases over time for many conditions, the relative proportion of deaths due to diagnostic failures might increase. Alternatively, if diagnostic advances outpace improvements in therapy, more patients will be diagnosed with a condition, but then not respond to treatment and therefore count as treatment failures. This issue is discussed in greater detail in the section on autopsy studies of the sensitivity of clinical diagnosis for pulmonary embolism.
The main point is that the relationship between diagnostic errors detected at autopsy and the overall performance of clinical diagnosis is quite complex. For a given condition it may be possible to predict trends in diagnostic error rates, but for clinical diagnosis taken as a whole, there is no reason to expect a decrease in error rates over time. The specific study questions below attempt to clarify the significance of autopsy-detected errors in clinical diagnosis in the context of this complexity.
This report primarily addresses the following questions:
To what extent does the autopsy reveal important diagnoses that were clinically unsuspected prior to death?
To what extent does the autopsy provide a useful performance measure or audit of clinical diagnosis in general?
What impact do autopsy findings have on clinical performance improvement?
Do autopsy findings have a role to play in generating accurate epidemiologic data (e.g., vital statistics)?
Although the first two questions are often considered the same, they are in fact quite distinct. The first question addresses the degree to which clinicians have correctly identified important potentially treatable diagnoses (e.g., pulmonary embolism, dissecting aortic aneurysm, bowel perforation) prior to death. Using the autopsy literature to answer this question requires assessing the extent to which the rate of diagnostic errors detected at autopsy simply reflects selection by clinicians of diagnostically challenging cases. Studies directly relevant to this assessment include autopsy series in which close to 100% of deaths were autopsied and studies in which clinicians explicitly state their diagnostic confidence prior to autopsy results becoming known.
The second question asks how common it is for important treatable diagnoses to escape detection during life and therefore first come to attention at autopsy. Answering this question requires taking into account performance of clinical diagnosis among all patients with conditions of interest, not just those who died. Precise evaluations of the overall performance of clinical diagnosis are complicated by the lack of a uniform diagnostic standard for (presumptive) clinical diagnoses among patients who survive. Nevertheless, long-term clinical follow-up of patients who survive combined with autopsy results for patients who die provides an estimate of the true sensitivity of clinical diagnosis for a given condition of interest.
The third question asks what impact the autopsy has on quality improvement. In other words, to what extent does knowledge of clinically missed diagnoses or unrecognized complications of care reduce the subsequent occurrence of similar quality problems? Many commentators have responded to studies of diagnostic error rates detected at autopsy by calling for an increase in the rate at which autopsies are performed. But increasing autopsy rates without appropriate utilization of the information derived from autopsies may achieve little benefit. If hospitals are to invest in increasing current autopsy rates, there would also need to be simultaneous improvements in the utilization of the information derived from autopsies, in order to have a chance of realizing a clinical benefit for future patients.
The above questions focus on clinical applications of autopsy-detected errors, but the autopsy undoubtedly has multiple other values. One of the more quantifiable or tangible potential applications of autopsy-detected diagnostic errors is in generating accurate vital statistics, which is the target of the fourth question.
Writing as an epidemiologist with an interest in the autopsy, Saracci58, 68, 69 suggested several changes in the approach to the autopsy in order to foster its valid use as a “monitor of the quality of clinical diagnosis.” First, he suggested shifting away from the common use of agreement or confirmation rates, and instead using autopsy findings to calculate the sensitivity and specificity of clinical diagnoses. Second, he stressed the importance of well-defined protocols for autopsy performance and a systematic approach to assessing discrepancies between autopsy findings and clinical diagnoses. Third, he outlined steps that ought to be taken to account for the distorting effects of non-random selection of cases for autopsy. Such steps included tracking the severity of illness and spectrum of diseases seen at the hospital in question and the percentage of cases with different diagnoses that undergo autopsy.
The issues highlighted by Saracci58 inform much of the methods used to evaluate the literature summarized in this report. For instance, as described below, autopsy studies that merely dichotomize clinical diagnoses as confirmed/not confirmed (or correct/incorrect, etc.) were not included in the section on diagnostic errors detected at autopsy. Second, we review the literature addressing the properties of the autopsy as a diagnostic test. Thus, we consider questions relating to the quality of autopsy performance, variability among pathologists in interpreting autopsy results, and the frequency with which diagnoses remain unclear despite adequate postmortem examination. Third, in reviewing the literature on diagnostic errors detected at autopsy, we discuss in detail issues related to the non-random selection of cases for autopsy.
In establishing the specific questions addressed in this report, we started with an initial set of objectives proposed by the College of American Pathologists to the Agency for Healthcare Research and Quality (AHRQ). Preliminary reviews of the literature, discussions with a Technical Expert Advisory Group (Appendix F) for the project, and input from AHRQ resulted in refinement of the original questions. For example, determination of the cost-effectiveness of autopsy was proposed in the initial questions, but not included in the final set of study objectives. To accomplish a cost-effectiveness analysis for the autopsy requires quantifying the benefits associated with autopsy or interventions using information derived from autopsy findings. As stated previously, many of the benefits of the autopsy are difficult to quantify at all. The benefit with the most potential for quantification to the extent required of a cost-effectiveness analysis is the impact of autopsy on reducing diagnostic errors or complications of care. However, no study has ever addressed this question, so the effectiveness of the autopsy is unknown in this regard. In addition, preliminary reviews of the literature retrieved inadequate data on the true costs of the autopsy. It is possible that data on autopsy-detected errors in clinical diagnosis collected for this report might provide part of the foundation for some modeling of the possible cost effectiveness of the autopsy, but generating such estimates was beyond the scope of the current project which is primarily a systematic review of the literature.
We conducted an extensive electronic literature search of the MEDLINE® database, supplemented by hand searches of article bibliographies and information from experts in the field. The MEDLINE® search strategy involved the Medical Subject Heading (MeSH) terms autopsy and postmortem changes, as well as the following title words: autopsy, post-mortem, postmortem, necropsy, and posthumous. We searched the more than 35,000 citations retrieved with this initial broad search using terms that captured aspects of study design (e.g., epidemiologic studies, clinical trials, comparative study) or topics relating to diagnosis (e.g., diagnostic errors, diagnostic techniques and procedures, diagnosis, differential) or error (e.g., medical error, iatrogenic disease, sentinel surveillance, safety).
The Cochrane Library was searched using similar key terms and title words. Preliminary searches of the CINAHL® database revealed extensive overlap with articles already retrieved from MEDLINE®, so this database was not searched further.
Reference lists from all relevant articles were reviewed to identify additional studies. The extensive search was completed in September 2000, but periodic electronic searches were conducted throughout the project until November 1, 2001 as relevant studies continued to appear during this time. During the final preparation of the report, four additional studies were identified after a repeat search was conducted in April 2002.319–322
Unlike systematic reviews of treatment efficacy, reviews related to health services research must take into account differences in practice patterns between countries, as well as different economic structures to health care systems. We were concerned that U.S. clinicians might regard patients in other countries as less likely to undergo extensive sophisticated imaging procedures (e.g., magnetic resonance imaging) and thus question the relevance of these studies to U.S. practice. On the other hand, we did not want to miss the opportunity to review as much of the world literature as possible, especially as this view likely represents a misconception. As a result, we did not rule out the possibility of including non-English studies, especially when English language abstracts were available and indicated relevant data addressing a research question not answered by data from English-language articles. However, we did not search other databases known to capture more of the non-English published literature (e.g., EMBASE).
The autopsy literature consists entirely of observational studies, rendering problematic the development of appropriate inclusion and exclusion criteria as most systematic reviews usually involve at least some randomized controlled trials. In the absence of quality scoring systems or other well-established criteria, we adopted fairly minimal inclusion and exclusion criteria.
For studies of diagnostic errors detected at autopsy, we required the following:
Well-defined patient samples consisting of consecutive or randomly sampled autopsies meeting explicit criteria (e.g., all adults dying after hospital arrival during a specified period of time);
Clinical diagnoses derived from autopsy request forms submitted by clinicians or chart review performed by the study investigators;
Classification schemes for discrepancies between clinical and autopsy diagnoses conforming to one of the following three categories:
Class I equivalent—missed diagnoses that “would,” “could,” “possibly” or “might” have affected patient “prognosis” or “outcome” had they been detected during life. (At a minimum, such impact involved discharge from the hospital alive). This category also includes any scheme explicitly identified as equivalent to the classification proposed by Goldman et al 51 or the subsequent modification by Battle et al.70 Error schemes that make no distinction between changes in management and changes in outcome were classified under “major” below.
Major errors—missed diagnoses that, while important, likely had no effect on patient outcome; changes in management could be assessed as possible or even expected, but impacts on outcome for errors in this category were judged to have been “equivocal,” “doubtful,” “unlikely,” or otherwise unexpected. (Errors for which expected impacts on patient outcomes were explicitly restricted to symptom palliation were included in this category.) This category also includes the sum of Class I and Class II errors from articles that explicitly reference the classification proposed by Goldman et al 51 or the subsequent modification by Battle et al.70 The reason for including Class I and Class II together in this category is that this seemed more comparable to the many articles that did not distinguish between changes in management and changes in outcome - i.e., errors that, had they been diagnosed would likely have led to changes in treatment or improved survival.
Discrepant ICD major disease categories—errors in this category include cases in which major clinical and autopsy diagnoses fall in different disease categories in the International Classification of Diseases (ICD). These categories consist of groupings of three-digit ICD codes into general headings such as “infective,” “neoplastic,” “cardiovascular,” and so on. Although this classification of clinical-autopsy discrepancies makes no mention of changes in outcome or treatment, such changes would be likely in many, if not most cases. Moreover, the classification scheme is well defined and likely involves less subjectivity than the other two error categories above.
For studies of diagnostic errors detected at autopsy, we excluded those with any of the following features:
An autopsy series equivalent to a “convenience sample” (consecutive series missing more than 20% of eligible cases were also excluded)
Assessments of clinical diagnoses based primarily on death certificates. (This criterion did not apply to death certificates created especially for a study, for instance, by having an investigator code new death certificates for all included cases using clinical information.14)
Dichotomous error classification schemes for clinical diagnoses such as “correct/incorrect” and “confirmed/unconfirmed.”
Studies excluded from consideration with regard to the performance of clinical diagnosis could still be considered in addressing other study questions, such as the performance of the autopsy as a diagnostic test, trends in autopsy rates and attitudes towards the autopsy.
Articles identified from the literature search were stored in a reference database and categorized according to the study questions addressed (Appendix C). Structured abstraction forms (Appendix D) were then used to collect demographic data (pertaining to patients and institutions), salient methodologic features and results. Each article was abstracted by at least two of four reviewers, including three physicians and one non-physician research assistant. One of the physicians reviewed all of the articles.
Mean error rates were calculated using logistic regression methods. Specifically, the probability of autopsy detected errors served as the dependent variable in a regression model that included as predictors: study period, autopsy rate, case mix and country (US or non-US). Appendix B provides further details regarding the statistical methods.
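As a rough sketch of the kind of model described above, and not the exact specification used (which is given in Appendix B), a logistic regression with these predictors might be written as follows. All symbols are illustrative assumptions: $p_i$ denotes the probability of an autopsy-detected error in study $i$, the case-mix term is assumed to enter as indicator variables for each category other than a reference category, and $\text{US}_i$ is 1 for U.S. studies and 0 otherwise.

$$
\ln\frac{p_i}{1-p_i} = \beta_0 + \beta_1\,\text{period}_i + \beta_2\,\text{autopsy rate}_i + \sum_{k}\gamma_k\,\text{case mix}_{ik} + \delta\,\text{US}_i
$$

Under this form, a mean error rate for a chosen combination of predictor values would be recovered by inverting the logit, $p = 1/(1+e^{-\eta})$, where $\eta$ is the right-hand side evaluated at those values.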
A preliminary draft of the report was submitted for peer review and commentary by experts chosen by the Evidence-Based Practice Center (EPC) team as well as by the College of American Pathologists. These reviewers provided many helpful comments and suggestions, which were incorporated into the final draft. These reviewers, as well as the many advisors to this project, are listed in the Acknowledgments (Appendix G).
Over 500 articles presented data addressing some aspect of the study questions. To address the first question related to autopsy-detected errors in clinical diagnosis, we reviewed 225 English-language studies. Abstracts for 34 additional articles in languages other than English were reviewed, but none added to or substantially changed the information already available in the English-language literature. For other study questions, review of the non-English abstracts suggested a possible benefit from translating one article. A full translation was obtained for this Danish article presenting data on the impact of legislation on autopsy rates.71
Four basic methodological issues significantly impact the ability to address this question about autopsy-detected error rates in clinical diagnosis, but have received little to no attention in the autopsy literature. These issues and the literature that addresses them are summarized below.
Audits of the quality of autopsy reports by panels of reviewers have shown substantial deficiencies in terms of the report content and interpretation of autopsy findings. For instance, an audit of 104 perinatal autopsies in northern England reported low compliance with minimum autopsy reporting standards proposed by the Royal College of Pathologists.72 These standards consisted of a 100-point scoring system (to achieve a maximum score of 600 points) for six factors identified in the report: body measurements (crown to rump or heel, head circumference, foot length, and body weight); organ weights (lung, heart, liver, brain); quoting of normal values; histological report on main organs; radiology report; microbiology report; and other relevant findings (e.g., cytogenetics). The minimum acceptable score based on the Royal College of Pathologists guidelines was set at 250 total points. Only 51% of cases met or exceeded the minimum score, and adequate interpretative comments were recorded in only 49%. Also important, the reported autopsy rate of 45% for this study was well below the College's recommended rate of 75%, and most of the autopsy cases were classified as Coroner's cases.72
An audit of 314 perinatal autopsies in Wales similarly reported achievement of minimum reporting standards in only 54% of cases and showed a clear difference in quality scores for autopsies performed at a regional pediatric referral center, where only 8% failed to meet the minimum score compared to smaller community hospitals in which 72% failed to meet the minimum score.73, 74 In this case, the authors conducted a follow-up study and noted substantial improvements in autopsy quality as a result of general educational efforts but also in the context of a shift in autopsies to a regional center.75
The quality of adult autopsies has received less attention. Only one study explicitly addressed autopsy quality for adult cases, and this study focused on the completeness of autopsy requests and the timeliness of autopsy reports.76 Two other studies provide some information relevant to autopsy quality in adults.77, 78 In one series of 1000 adult autopsies, the authors listed 73 cases (7.3%, 95% CI: 5.8–9.1%) as too poorly performed or reported to include in their analysis of diagnostic errors.77 In another study of 111 autopsies, the authors stated that in 3 cases (2.7%, 95% CI: 0.7–8.3%) deficiencies in autopsy performance resulted in failure of the autopsy to answer important clinical questions.78
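The confidence intervals quoted above appear consistent with a Wilson score interval with continuity correction, the method this report names for its confidence limits in the table footnote on observer agreement. The following Python sketch recomputes them under that assumption, with z = 1.96; the function name `wilson_cc_interval` is illustrative and not taken from the report.

```python
import math


def wilson_cc_interval(events, n, z=1.96):
    """Wilson score interval with continuity correction for a proportion.

    Assumption: this is the interval used for the proportions in the text;
    z = 1.96 corresponds to a two-sided 95% interval.
    """
    if n <= 0 or not 0 <= events <= n:
        raise ValueError("need n > 0 and 0 <= events <= n")
    p = events / n
    q = 1.0 - p
    z2 = z * z
    denom = 2.0 * (n + z2)
    lower = (2 * n * p + z2 - 1
             - z * math.sqrt(z2 - 2 - 1 / n + 4 * p * (n * q + 1))) / denom
    upper = (2 * n * p + z2 + 1
             + z * math.sqrt(z2 + 2 - 1 / n + 4 * p * (n * q - 1))) / denom
    # By convention the limits are fixed at 0 or 1 when no or all cases are events.
    if events == 0:
        lower = 0.0
    if events == n:
        upper = 1.0
    return max(0.0, lower), min(1.0, upper)


# The two adult autopsy series described above: (events, autopsies).
for events, n in [(73, 1000), (3, 111)]:
    lo, hi = wilson_cc_interval(events, n)
    print(f"{events}/{n}: {100 * events / n:.1f}% "
          f"(95% CI: {100 * lo:.1f}-{100 * hi:.1f}%)")
```

Run as written, the sketch should reproduce the limits quoted in the text to one decimal place, which supports (but does not prove) the assumption about the method.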
While appropriate documentation is certainly a component of healthcare quality and performance, caution must be used in interpreting overall quality from the quality of record keeping. Similar problems exist for documenting or reporting different components within the patient medical record including: clinical discharge summaries, history and physical examinations, operative reports, and pharmacy and clinic notes.79–89 More importantly, the level of completeness of such reports or summaries may affect medical decision making and patient care. Nonetheless, retrospective evaluation of this type of data as a surrogate for measuring quality often occurs, and the potential limitations must be acknowledged when drawing conclusions about the quality of care, whether for clinical activities or autopsy performance.
Like most complex procedures involving multiple observational and cognitive elements, the autopsy almost certainly has an error rate of its own,68 despite its role as a diagnostic standard. All diagnostic tests attempt to capture the “true diagnosis,” and test characteristics such as accuracy are defined in terms of this true diagnosis. For a “gold standard,” assessing the extent to which the test falls short of capturing the true diagnosis presents practical difficulties because confirmatory tests generally rely on the gold standard for their own accuracy.
| Clinical Diagnosis | “True Diagnosis” (Observed Autopsy Diagnosis): Lung Cancer | “True Diagnosis” (Observed Autopsy Diagnosis): Other Major Diagnosis | Total |
|---|---|---|---|
| Lung Cancer | 180 (182) | 90 (88) | 270 (270) |
| Other Major Diagnosis | 20 (54) | 1710 (1676) | 1730 (1730) |
| Total | 200 (236) | 1800 (1764) | 2000 (2000) |
Adapted from Saracci 58
Using the true diagnosis as the diagnostic standard, the sensitivity of clinical diagnosis for detecting lung cancer as the cause of death is 180/(180+20) = 0.90. When the observed autopsy results are taken as the diagnostic standard, the same sensitivity is 182/(182+54) = 0.77. Thus, even a relatively small autopsy misclassification rate of 2% would reduce the observed sensitivity of clinical diagnosis for lung cancer from its “true” value of 90% to an apparent value of 77%.
This illustration assumes that all of the misclassified cases have the same distribution as those cases already in the left-hand column. In fact, it could be that none of these cases would be diagnosed clinically as lung cancer, so that the observed sensitivity would correspond to 180/236 = 76%. At the other extreme, all of these misclassified cases might be diagnosed clinically as lung cancer, resulting in an observed sensitivity of 216/236 = 92%. Given this range of values, the conclusion remains that small rates of misclassification at autopsy can produce significant changes in the apparent sensitivity of clinical diagnosis.
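The arithmetic in this illustration can be checked with a few lines of Python. The counts are taken directly from the table adapted from Saracci; the variable and function names are illustrative only.

```python
def sensitivity(true_positives, false_negatives):
    """Proportion of cases with the condition that were diagnosed clinically."""
    return true_positives / (true_positives + false_negatives)


# Lung cancer column of the table: "true" counts, then observed autopsy counts.
true_tp, true_fn = 180, 20   # clinical lung cancer / clinical other, true diagnosis
obs_tp, obs_fn = 182, 54     # same cells, observed autopsy diagnosis

true_sens = sensitivity(true_tp, true_fn)
observed_sens = sensitivity(obs_tp, obs_fn)

# Cases the autopsy labels lung cancer beyond the true lung cancer deaths.
observed_total = obs_tp + obs_fn
misclassified = observed_total - (true_tp + true_fn)

# Extremes: none, or all, of the misclassified cases were called lung cancer clinically.
lowest_sens = true_tp / observed_total
highest_sens = (true_tp + misclassified) / observed_total

print(f"true sensitivity:     {true_sens:.3f}")
print(f"observed sensitivity: {observed_sens:.3f}")
print(f"misclassified cases:  {misclassified}")
print(f"possible range:       {lowest_sens:.3f} to {highest_sens:.3f}")
```

The sketch simply restates the table; it makes no assumption about how the misclassified cases are actually distributed beyond the two extremes described in the text.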
In the absence of data on the accuracy of the autopsy, one can still ask about its reproducibility - i.e., the extent to which pathologists agree in their identification of the main autopsy diagnoses. Multiple studies within the pathology literature address the reproducibility for diagnoses in surgical pathology and cytology. These studies suggest a wide range of values for measures of agreement from excellent to poor, but the majority falls in the range conventionally regarded as indicating moderate to substantial agreement. 96–107 The range of values varies depending on the specific test evaluated (e.g., cervical cytology, 96, 97 prostate 98, 99 or liver biopsies101) and the evaluation setting (e.g., general pathologists versus specialists98, 99).
The above studies do not assess the extent to which pathologists confronted with the results of the diverse investigations that make up a given autopsy independently arrive at the same conclusions. Despite an extensive search, we found only three studies that address the issue of reproducibility among pathologists in interpreting autopsy findings.108–110 The oldest of these studies110 compared the judgments of three reviewers in assigning a principal cause of death based on the findings presented in 50 autopsy reports. Because the three reviewers were third-year medical students (trained to abstract autopsy reports and code death certificates for the purposes of this study), we did not regard the briefly summarized data on inter-rater agreement, presented in passing in this study of the accuracy of death certificates, as indicative of agreement among pathologists in ordinary clinical practice.
A second study109 involved the analysis of a series of 39 patients who died in an intensive care unit and underwent autopsy specifically to address pulmonary pathology. The study excluded patients with HIV infection, bilateral chest tubes or bilateral pleural infection. Four pathologists (who were blinded to clinical and microbiological data) independently examined postmortem lung biopsies from these patients with the purpose of judging the presence or absence of pneumonia. The authors reported these independent judgments as well as the diagnoses derived from the application of pre-selected criteria for the histologic diagnosis of pneumonia.111
| Observer pair | Observed Kappa (95% CI)* | Maximum possible Kappa† |
|---|---|---|
| A,B | 0.72 (0.49–0.95) | 0.83 |
| A,C | 0.65 (0.39–0.91) | 0.65 |
| A,D | 0.52 (0.22–0.82) | 0.52 |
| B,C | 0.81 (0.59–1.0) | 0.81 |
| B,D | 0.66 (0.38–0.94) | 0.66 |
| C,D | 0.84 (0.63–1.0) | 0.84 |
| D,D (6 months later) | 0.82 (0.59–1.0) | 1.0 |
* Cohen's kappa,112, 113 with 95% confidence-interval limits calculated according to the Wilson efficient-score method, corrected for continuity
† Maximum possible kappa, given the observed marginal frequencies
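For readers unfamiliar with the two statistics in this table, the following Python sketch shows how Cohen's kappa and the maximum possible kappa given the marginal frequencies are computed from a square agreement table. The 2×2 counts used here are hypothetical (chosen only to sum to 39 cases); the study's underlying counts are not reproduced in this report, and the function names are illustrative.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square table of counts (rows: rater 1, columns: rater 2)."""
    n = sum(sum(row) for row in table)
    k = len(table)
    row_totals = [sum(table[i]) for i in range(k)]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    observed = sum(table[i][i] for i in range(k)) / n
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    return (observed - expected) / (1 - expected)


def max_possible_kappa(table):
    """Largest kappa attainable with the observed marginal totals held fixed."""
    n = sum(sum(row) for row in table)
    k = len(table)
    row_totals = [sum(table[i]) for i in range(k)]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement in each category cannot exceed the smaller of its two marginals.
    max_observed = sum(min(row_totals[i], col_totals[i]) for i in range(k)) / n
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    return (max_observed - expected) / (1 - expected)


# HYPOTHETICAL counts for two observers judging pneumonia present/absent in 39 cases.
hypothetical = [
    [15, 3],   # observer 1 "present": observer 2 present / absent
    [2, 19],   # observer 1 "absent":  observer 2 present / absent
]

print(f"kappa:         {cohens_kappa(hypothetical):.2f}")
print(f"maximum kappa: {max_possible_kappa(hypothetical):.2f}")
```

When two observers call the condition present at different overall rates, the maximum possible kappa falls below 1, which is why the table reports it alongside the observed value.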
The third study108 is the only one to address agreement among pathologists with respect to the autopsy as a whole - i.e., in determining principal underlying diseases and causes of death. This study involved four pathologists examining 35 autopsies. The authors reported excellent (near perfect) agreement for all pathologist pairs independently determining the principal disease (i.e., underlying cause of death), with kappa values between 0.83 and 0.97. Specifically, there were four cases that generated disagreement in the principal underlying disease. In three of these cases, only one pathologist disagreed with the other three, while in the fourth case two pathologists shared one diagnosis, while two shared a different diagnosis.
For assignments of the immediate cause of death, the authors reported kappa values ranging from 0.43 to 0.75. Importantly, this level of agreement discounts the roughly 30% of cases for which two of the pathologists could not decide on an immediate cause of death, since in these cases agreement was not measured. Listings of minor diseases showed much less agreement.
Overall, it is likely that pathologists exhibit substantial agreement in generating the principal autopsy diagnoses, although the literature contains little data addressing this issue.1
Numerous studies report group consensus for important discrepancies between clinical and autopsy diagnoses related to the cause of death.2, 51, 54, 55, 116–133 Unfortunately, only five studies55, 117, 124, 130, 131 mention the issue of reproducibility for these judgments of diagnostic discrepancies. In one of these studies,55 three investigators independently classified each case, with a fourth reviewer for cases with discordant classifications. Reclassification occurred for 20% of Class I and II errors, but the authors provided no further details on the consensus process nor did they calculate a kappa score or other measure of inter-rater agreement. In another study,124 the authors report that in 25 of 338 autopsies (7.4%, 95% CI: 4.9–10.9%) two reviewers could not reach consensus, so classification of clinical-autopsy discordances required a panel of peer reviewers. The other three studies117, 130, 131 provided too little detail regarding the assessment of agreement between reviewers to permit meaningful assessments of the reproducibility of judgments pertaining to diagnostic errors detected at autopsy.
Because these five studies provide so few data on the frequency of differences in opinion between reviewers assessing clinical-autopsy discrepancies, we reviewed studies from the medical error literature that provide data on comparable assessments. Several well-known studies have examined the reproducibility of peer review classifications of hospital deaths as “preventable” or “not preventable.” 134–137 These studies are relevant to the present question because the definition of preventability lies at the heart of the two most commonly used classifications of discrepancies between clinical and autopsy diagnoses.51, 70
Assessments of reviewer agreement from the medical error literature indicate that achieving anything beyond fair or modest inter-rater reliability for judgments about error or preventability for major adverse outcomes requires reviewer training and, in some cases, dropping reviewers that consistently yield divergent judgments.136 In the most recent of these studies (and the one which most specifically focused on the issue of reproducibility), the authors noted that “if one reviewer rated a death as definitely or probably preventable, the probability that the next reviewer would rate that case as definitely not preventable (18%) was actually slightly higher than the probability that the second reviewer would agree with the first (16%).” 137
The quality of the autopsy has received little systematic study except in the case of perinatal autopsies, where deficiencies in the quality of reporting appear to be common.
In at least 1–5% of cases, diagnostic uncertainty persists despite technically adequate autopsy. Classification errors affecting autopsy diagnoses at even this relatively small rate can substantially distort estimates of the performance of clinical diagnosis.
Only one small study has specifically assessed the reproducibility of autopsy diagnoses among pathologists. This study showed excellent inter-rater agreement for the principal diagnosis, but data on agreement for other major diagnoses were not presented.
The reproducibility of judgments about errors in clinical diagnosis as indicated by autopsy findings has received almost no attention. Extrapolation from studies of inter-rater reliability for peer review assessments of case notes in the healthcare quality and medical error literature suggests that reproducibility of such judgments is likely to be fair to moderate at best.
Notwithstanding the existing deficiencies in standardizing a systematic means of using the autopsy as a measure of clinical diagnostic performance, the extensive literature on clinically missed diagnoses detected at autopsy nevertheless bears attention from clinicians and others interested in measures of clinical quality.
One would expect diagnostic error rates to decrease over time due to medical advances, and thus would expect an inverse correlation between error rates and study period. Appendix Figures 5 & 6 present time trends for Class I and major error rates, respectively. Inspection of these figures indicates no clear changes in error rates over time, but it is possible that trends are obscured by differences in case mix, country effects and changes in autopsy rates. The latter possibility is of considerable interest, as clinicians generally believe that autopsy cases represent precisely those cases most likely to have unexpected findings detected at autopsy. Thus, true improvements in clinical diagnosis might be offset by increased selection of autopsy cases by clinicians. We addressed this possibility as well as the effects of case mix and study country by performing logistic regression using a model in which study period, autopsy rate, case mix (using 7 different categories) and country (U.S. vs. non-U.S.) were included as predictors of error rates.
The results of the quantitative analysis are presented in detail in Appendix B and summarized below. Briefly, we performed logistic regression for each of the three definitions of error (Class I, major and discrepancies in major ICD disease categories). In each case, the analysis began with a model that included all predictors: study period, autopsy rate, case mix (general adult inpatients, medical patients, surgical patients, pediatric, etc.) and country (U.S., non-U.S.). Hospital teaching status was not included as a predictor in any of the models, because too few studies involved non-teaching hospitals and because the nature of the teaching status (academic medical center, community teaching vs. tertiary referral center, presence of residents across a broad range of specialties) was usually unclear.
| Error Classification | Error rate* (95% CI) | Change in error rate with time** | Change in error rate with increased autopsy rate† |
|---|---|---|---|
| Class I | 10.2% (6.7–15.3%) | 26.2% relative decrease per decade (p=0.1) | 7.8% relative decrease for each 10% increase in autopsy rate (p=0.2) |
| Major errors | 25.6% (20.8–31.2%) | 28% relative decrease per decade (p=0.0006) | 12% relative decrease for each 10% increase in autopsy rate (p=0.0003) |
| Discrepant ICD Disease Categories | 11.7% (9.7–13.9%) | 28% increase per year (p<0.0001) | 1.4% decrease for each 5% increase in autopsy rate (p=0.1) |
The error rates listed here represent the probability of a diagnostic error at time zero (1980 for Class I and major errors, 1975 for ICD discrepancies), with the referent case mix category (general inpatients or adult inpatients), reference country (U.S.) and unweighted mean autopsy rate (44.3%) for included studies.
Changes are all relative to the value for the base year and mean autopsy rate.
Estimated Class I error rates by autopsy rate (rows) and study year (columns)

| Autopsy Rate | 1970 | 1980 | 1990 | 2000 |
|---|---|---|---|---|
| 5% | 17.6% | 13.6% | 10.4% | 7.9% |
| 10% | 17.0% | 13.1% | 10.0% | 7.6% |
| 20% | 15.9% | 12.2% | 9.3% | 7.1% |
| 40% | 13.9% | 10.6% | 8.0% | 6.1% |
| 80% | 10.4% | 7.9% | 5.9% | 4.5% |
| 100% | 9.0% | 6.8% | 5.1% | 3.8% |
Estimated major error rates by autopsy rate (rows) and study year (columns)

| Autopsy Rate | 1970 | 1980 | 1990 | 2000 |
|---|---|---|---|---|
| 5% | 44.3% | 36.4% | 29.2% | 22.9% |
| 10% | 42.7% | 34.9% | 27.9% | 21.8% |
| 20% | 39.6% | 32.1% | 25.4% | 19.7% |
| 40% | 33.7% | 26.8% | 20.8% | 15.9% |
| 80% | 23.3% | 17.9% | 13.6% | 10.2% |
| 100% | 19.0% | 14.5% | 10.8% | 8.1% |
Using the base values for case mix, country, autopsy rate (the overall mean autopsy rate of 44.3%), and time (1980, the midpoint of the period of analysis), the rate of autopsy-detected diagnostic errors was 10.2% (95% CI: 6.7–15.3%) and 25.6% (95% CI: 20.8–31.2%) for Class I and major errors, respectively. Restricting the analysis to U.S. institutions yielded similar point estimates and almost entirely overlapping confidence intervals—11.2% (95% CI: 6.9–17.5%) and 24.0% (95% CI: 17.3–32.3%) for Class I and major errors, respectively.
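The intervals above are not symmetric around their point estimates, which is what one would expect if they were computed on the log-odds (logit) scale of the logistic model and then transformed back to percentages. The following Python sketch checks that reading against the reported figures. The helper names and the use of a 1.96 multiplier (a Wald-type interval) are assumptions made for illustration; the report's actual computation is the one described in Appendix B.

```python
import math

def logit(p):
    """Log-odds of a probability p (0 < p < 1)."""
    return math.log(p / (1.0 - p))

def logit_gaps(estimate, lower, upper):
    """Distances from the point estimate to each CI limit on the logit scale."""
    centre = logit(estimate)
    return centre - logit(lower), logit(upper) - centre

def implied_standard_error(lower, upper, z=1.96):
    """Logit-scale standard error implied by a 95% CI (assumes a Wald interval)."""
    return (logit(upper) - logit(lower)) / (2.0 * z)

# Point estimates and 95% confidence limits as reported in the text
reported = {
    "Class I": (0.102, 0.067, 0.153),
    "Major": (0.256, 0.208, 0.312),
}

for label, (estimate, lower, upper) in reported.items():
    below, above = logit_gaps(estimate, lower, upper)
    se = implied_standard_error(lower, upper)
    print(f"{label}: gap below={below:.3f}, gap above={above:.3f}, implied SE={se:.3f}")
```

If the two gaps printed for each error class are nearly equal, the interval is approximately symmetric on the logit scale, which is consistent with (but does not prove) the assumed construction.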
The expected inverse correlation between error rate and study period (i.e., the more recent the study the lower the error rate) was observed for Class I and major errors, though this relationship was statistically significant only for the latter. Specifically, Class I errors decreased at a rate of 26.2% per decade relative to the base prevalence in 1980 (p=0.10). The probability of a major error exhibited a relative decrease of 28.0% (95% CI: 9.8–42.6%) per decade.
One would expect the rate of decrease for Class I errors and major errors to be similar. As this was, in fact, observed (with relative decreases of 26.2% and 28.0% per decade for Class I and major error rates, respectively), it is likely that the decrease in Class I errors is real, and that the statistically non-significant result reflects inadequate power. In fact, the 95% confidence interval for this relationship between Class I errors and time extends from a 48.8% decrease to a 6.3% increase (i.e., the 95% CI lies predominantly on the side of a relative decrease), reinforcing the interpretation that the time trend for Class I errors is real.
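As a rough consistency check on this argument, the reported interval for the Class I time trend can be converted back into an approximate standard error and p-value. The sketch below assumes that the interval is a Wald interval symmetric on the log scale of the relative change; this is an assumption made for illustration, not something the report states, and the function names are hypothetical.

```python
import math

def log_ratio(relative_change):
    """Log of the rate ratio for a relative change (e.g. -0.262 for a 26.2% decrease)."""
    return math.log(1.0 + relative_change)

def approx_p_value(estimate, lower, upper, z_crit=1.96):
    """Two-sided p-value implied by a point estimate and its 95% CI.

    All three arguments are relative changes. The CI is assumed to be
    symmetric on the log scale (an illustrative assumption).
    """
    standard_error = (log_ratio(upper) - log_ratio(lower)) / (2.0 * z_crit)
    z = abs(log_ratio(estimate)) / standard_error
    # Two-sided tail area of the standard normal distribution
    return math.erfc(z / math.sqrt(2.0))

# Class I errors: 26.2% decrease per decade,
# 95% CI from a 48.8% decrease to a 6.3% increase
p_class1 = approx_p_value(-0.262, -0.488, 0.063)
print(f"Approximate p-value for the Class I time trend: {p_class1:.2f}")
```

Under these assumptions the result lands close to the p=0.10 reported for Class I errors, which fits the reading that the interval only narrowly includes no change.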
The expected inverse correlation between error rate and autopsy rate (i.e., the higher the autopsy rate, the lower the error rate) was also observed, though this relationship was again statistically significant only for major errors. Specifically, for every 10% increase in autopsies, the Class I error rates exhibited a relative decrease of 7.8% (p=0.2) and major errors decreased by 12.0% (p=0.0003) relative to the base error rate.
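To make the combined effect of study period and autopsy rate concrete, the following Python sketch projects an error rate from a base rate and the two relative changes quoted above. It is only an illustration: it assumes that each relative change applies at the base rate and acts multiplicatively on the odds, and that a "10% increase in autopsy rate" means 10 percentage points. The fitted coefficients are in Appendix B, so the output should not be expected to reproduce the report's tables exactly. All function and parameter names are hypothetical.

```python
import math

def logit(p):
    """Log-odds of a probability p."""
    return math.log(p / (1.0 - p))

def inv_logit(x):
    """Probability corresponding to log-odds x."""
    return 1.0 / (1.0 + math.exp(-x))

def odds_ratio_for_relative_change(base_rate, relative_change):
    """Odds ratio that moves base_rate to base_rate * (1 + relative_change)."""
    new_rate = base_rate * (1.0 + relative_change)
    return (new_rate / (1.0 - new_rate)) / (base_rate / (1.0 - base_rate))

def projected_error_rate(base_rate, change_per_decade, decades,
                         change_per_10_points, autopsy_rate,
                         mean_autopsy_rate=44.3):
    """Project an error rate on the logit scale (illustrative assumption)."""
    or_time = odds_ratio_for_relative_change(base_rate, change_per_decade)
    or_autopsy = odds_ratio_for_relative_change(base_rate, change_per_10_points)
    steps = (autopsy_rate - mean_autopsy_rate) / 10.0
    log_odds = (logit(base_rate)
                + decades * math.log(or_time)
                + steps * math.log(or_autopsy))
    return inv_logit(log_odds)

# Major errors: base rate 25.6% in 1980 at the mean autopsy rate of 44.3%,
# 28% relative decrease per decade, 12% relative decrease per 10 points
for year in (1970, 1980, 1990, 2000):
    rate = projected_error_rate(0.256, -0.28, (year - 1980) / 10.0, -0.12, 20.0)
    print(year, f"{rate:.1%}")
```

Working on the odds scale keeps projected rates between 0 and 100% even when the predictors move far from their base values, which is the usual reason for presenting logistic regression results this way.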
In contrast with the other two definitions of errors, discrepancies in major ICD disease categories (between clinical and autopsy diagnoses) showed an increase over time, and this relationship was statistically significant. Specifically, the error rate increased by roughly 28% per year (p<0.0001). We had included this definition because, in principle, it represents a relatively objective type of error and because several of the studies using this definition were generally well designed and had high autopsy rates. Unfortunately, these positive features were likely overwhelmed by the heterogeneous effects introduced by variations in coding practices in different countries over different time periods.
Using the regression model and standard error to calculate 95% confidence intervals (as shown in Appendix B), we can estimate the error rates expected in an academic medical center in the U.S., where one might find an autopsy rate of 20%. (In fact, many academic centers have even lower rates than this.) The specific error rates to be expected with an autopsy rate of 20% are 6.7% (95% CI: 3.9–9.2%) and 20.3% (95% CI: 16.7–24.5%) for Class I and major errors, respectively.
Publication bias tends to operate in such a way that ‘negative’ results are less likely to appear in the peer-reviewed literature. For studies evaluating therapeutic interventions, this bias results in an under-representation of studies indicating no benefit for the intervention.138, 139 Autopsy studies do not involve an intervention. Nonetheless, a study showing a low rate of important errors in clinical diagnosis might be regarded as analogous to a ‘negative study’. One can imagine such a bias occurring if, as with intervention studies, studies showing no significant errors at autopsy are considered uninteresting.
On the other hand, one could imagine publication bias operating in this case in the opposite direction, with studies showing high error rates less likely to appear. In fact, the bias need not operate at the level of publication per se. Institutions with particularly high error rates might choose not to submit such results for publication out of concern for the scrutiny or criticism it might bring. Alternatively, institutions with such high error rates might be less likely to conduct performance evaluation studies of this type in the first place.
To evaluate publication bias in this review, we constructed funnel plots (Appendix Figures 9 & 10). Because trim-and-fill techniques are more appropriate for assessing systematic reviews of studies reporting odds ratios140–142 than for those reporting proportions (such as rates of diagnostic errors), we assessed publication bias using the graphical assessment of funnel plots.141, 143, 144 These plots have as their x- and y-axes the reported error rates and autopsy rates, respectively. In the absence of publication bias, such plots have the appearance of an inverted funnel.
In Appendix Figure 9a, there is a fairly definite appearance of a half-funnel. As it is the right half that is missing (i.e., the half with higher than average error rates), it would appear that publication bias is operating against studies reporting higher error rates. This appearance likely reflects the inclusion of studies reporting errors among different patient populations, which undermines the expectation of a symmetric funnel distribution. When only studies reporting general inpatients are included, the distribution of error rates is more symmetric (Appendix Figure 9b). The funnel plot for all studies reporting major errors (Appendix Figure 10a) shows greater symmetry than the plot in Appendix Figure 9a, but it still appears that the right half of the plot has fewer data points. When inclusion is limited to general inpatient samples only (Appendix Figure 10b), there is still some asymmetry. In this case, it appears that there is an under-representation of large studies with low error rates and small studies with high error rates.
The analysis for publication bias is substantially limited by the complexities introduced by differences in case mix between studies and by the competing directions in which publication bias might operate for autopsy studies, in contrast to most systematic review topics. Overall, though, it appears unlikely that publication bias has substantially affected the results presented in this report. Even if substantial publication bias were present, it is reasonable to expect this bias to remain stable over time, in which case the observed trend of a decrease in published error rates over time would still be meaningful. Finally, the results of the regression analysis indicating trends towards decreased error rates over time are corroborated by the single study55 that assessed trends in autopsy-detected error rates over time and maintained high (and approximately equal) autopsy rates throughout the time periods assessed.
The Class I error rate for the base time (1980), autopsy rate, case mix and country in a multivariate model was 10.2% (95% CI: 6.7–15.3%).
The probability of the autopsy detecting a major error in a given case was 25.6% (95% CI: 20.8–31.2%).
Restricting the analysis to U.S. institutions yielded similar point estimates and almost entirely overlapping confidence intervals for both Class I and major errors.
These rates decreased by approximately 25% per decade over the 40-year observation period, though these decreases were statistically significant only for major errors.
The combined effects of study period and autopsy rate on error rates are still noticeable, such that the overall rates of Class I and major errors do appear to be decreasing over time, though likely less so than expected by clinicians.
Some publication bias may operate against studies reporting high error rates, but this bias is unlikely to account for the modest trend toward decreased diagnostic errors over time.
If the above analysis had shown a decrease in error rates over time by adjusting for decreases in autopsy rates, there would have been indirect support for the hypothesis that unchanged error rates reflect increased selection of cases by clinicians. In other words, as fewer autopsies are performed, the proportion of cases that presented diagnostic challenges to clinicians may increase, offsetting any improvements in overall diagnostic performance. Such an effect was not demonstrable in the above analysis, but is not ruled out by these results. Thus, we looked for studies specifically addressing the issue of clinical selection of cases for autopsy.
One study commonly cited as evidence that clinical selection does not occur simply demonstrated that major diagnostic groups did not differ between autopsied and non-autopsied cases.57 Thus, what this study really shows is that patients with neoplastic diseases are no more or less likely to undergo autopsy than patients with cardiovascular disease, infectious diseases, and so on. Even in this limited sense of selection, other studies have shown a bias towards certain diagnostic categories (e.g., trauma and “ill-defined conditions”) and a bias against other diagnostic categories (e.g., infectious and neoplastic) in referral for autopsy.14
More important than the diagnostic categories for principal causes of death is the level of diagnostic certainty, and whether or not clinicians refer cases for autopsy primarily in uncertain cases. Addressing this question requires assessing diagnostic certainty using a chart-based tool or prospectively eliciting from clinicians their level of diagnostic certainty prior to autopsy results becoming available.
| Diagnostic Certainty | Autopsy Rate (95% CI) |
|---|---|
| Completely Certain (Level 1) | 53.1% (41.7–64.1%) |
| Almost Certain (Level 2) | 61.3% (56.2–66.2%) |
| Probable or suspected (Level 3) | 73% (62.4–81.6%) |
| Clinical Confidence | Total Number of deaths | Autopsy rate | Disagreement rate |
|---|---|---|---|
| Fairly certain | 9,248 | 57% | 16% |
| Probable | 3,694 | 76% | 33% |
| Uncertain | 1,282 | 88% | 50% |
| Not stated | 393 | 67% | 28% |
| Total | 14,617 | 65% | 25% |
| Clinical Diagnosis | Age < 70 | Age ≥ 70 | Total |
|---|---|---|---|
| Fairly certain - # (% erroneous) | 91 (15%) | 91 (34%) | 182 (25%) |
| Probable - # (% erroneous) | 60 (33%) | 91 (53%) | 151 (45%) |
| Total - # (% erroneous) | 151 (23%) | 182 (43%) | 333 (34%) |
| Clinicians' Assessments of Principal Diagnosis | Percent of Cases | Confirmation of Principal Diagnosis at Autopsy |
|---|---|---|
| Fairly certain | 47% | 75% |
| Probable | 35% | 55% |
| Uncertain | 16% | 30% |
| Number of cases (percent of cases with autopsy confirmation of principal clinical diagnosis) | 5/75–6/77 (autopsy rate 30%) | 6 months in 1978 (autopsy rate 63%) |
|---|---|---|
| Total # of autopsies | 326 | 154 |
| Main diagnosis “certain” or “fairly certain” | 168 (52%) | 144 (94%) |
| Autopsy confirmation of main diagnosis | 182 (56%) | 131 (85%) |
| Autopsy confirmation of causes of death | 60 (18%) | 90 (58%) |
| Autopsy would normally NOT be requested | ----- | 44 (86%) |
| Autopsy would normally be requested | ----- | 110 (85%) |
| Confidence in clinical diagnosis | Autopsy essential (clinicians' assessment) | Autopsy desirable (clinicians' assessment) | Autopsy of little interest (clinicians' assessment) |
|---|---|---|---|
| Certain | 22% | 70% | 8% |
| Uncertain | 45% | 54% | 1% |
| Diagnosis | Autopsy 1975–7: confirmed | Autopsy 1975–7: overturned | Autopsy 1975–7: missed | Principal discharge diagnosis, same hospitals, 1976: alive | Principal discharge diagnosis, same hospitals, 1976: died |
|---|---|---|---|---|---|
| Pulmonary TB | 7 | 8 | 7 | 138 | 5 |
| PE | 44 | 35 | 99 | 140 | 18 |
| Acute appendicitis | 3 | 1 | 1 | 441 | 4 |
| Bowel obstruction | 6 | 3 | 0 | 159 | 13 |
| Acute MI | 198 | 58 | 51 | 1452 | 258 |
| Cirrhosis | 22 | 5 | 13 | 125 | 12 |
TB-tuberculosis; PE-pulmonary embolism; MI-myocardial infarction
Heasman MA, Lipworth L: Accuracy of certification of cause of death. In: General Register Office, ed. Studies on medical and population subjects, No. 20. London: HMSO, 1966.
This prospective study, conducted by government departments of health and vital statistics in the United Kingdom, involved 75 British hospitals, including 23 in London and 52 scattered throughout England and Wales. The 9,501 autopsies included in this study were performed on adult inpatients who died during a 6-month period in 1959.
As expected, the less certain clinicians were regarding their diagnoses, the more likely they were to obtain an autopsy and the higher the chance of detecting an error in clinical diagnosis.
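The "Total" row of the table reporting 14,617 deaths (which appears to belong to this study, since a 65% autopsy rate among 14,617 deaths gives about 9,501 autopsies) can be recovered from the individual confidence strata. The Python sketch below does this under one stated assumption: that the disagreement rate in each stratum is a proportion of autopsied cases rather than of all deaths. The variable and function names are illustrative.

```python
# (confidence level, deaths, autopsy rate, disagreement rate) as given in the table
strata = [
    ("Fairly certain", 9248, 0.57, 0.16),
    ("Probable", 3694, 0.76, 0.33),
    ("Uncertain", 1282, 0.88, 0.50),
    ("Not stated", 393, 0.67, 0.28),
]

def pooled_rates(rows):
    """Return total deaths, overall autopsy rate and overall disagreement rate."""
    total_deaths = sum(deaths for _, deaths, _, _ in rows)
    # Estimated number of autopsies in each stratum
    autopsies = [deaths * autopsy_rate for _, deaths, autopsy_rate, _ in rows]
    # Assumption: disagreement rates apply to autopsied cases only
    disagreements = [n * row[3] for n, row in zip(autopsies, rows)]
    overall_autopsy_rate = sum(autopsies) / total_deaths
    overall_disagreement_rate = sum(disagreements) / sum(autopsies)
    return total_deaths, overall_autopsy_rate, overall_disagreement_rate

total, autopsy_rate, disagreement_rate = pooled_rates(strata)
print(total, round(autopsy_rate * 100), round(disagreement_rate * 100))
# prints: 14617 65 25
```

Because the overall disagreement rate is weighted by the number of autopsies in each stratum, the strata with more uncertain diagnoses contribute more than their share of deaths alone would suggest.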
i) Britton M. Diagnostic errors discovered at autopsy. Acta Med Scand 1974;196:203–10. ii) Britton M. Clinical diagnostics: experience from 383 autopsied cases. Acta Med Scand 1974;196:211–9.
The study author reported comparisons of each italicized cell to the adjacent cell (vertically or horizontally) as statistically significant (p<0.05). The relevant comparisons for the totals were also significant (i.e., the total error rate for patients < 70 years of age was significantly lower than for older patients, and the total error rate for “fairly certain” cases was significantly lower than the error rate for the “probable” cases). Overall, the prevalence of errors in the cases in which the clinical diagnosis was only “probable” was almost double that seen in the “fairly certain” group, even when age is taken into account.
i) Cameron HM, McGoogan E: A prospective study of 1152 hospital autopsies: I. Inaccuracies in death certification. J Pathol 1981; 133: 273–83.
ii) Cameron HM, McGoogan E: A prospective study of 1152 hospital autopsies: II. Analysis of inaccuracies in clinical diagnoses and their significance. J Pathol 1981; 133: 285–300.
Source: Cameron HM, McGoogan E, Watson H: Necropsy: a yardstick for clinical diagnoses. Br Med J 1980; 281: 985–8.
The authors conducted a follow-up to the previous study in which they worked with clinicians to try to increase the autopsy rate at one of the institutions from the previous study. As before, clinicians were asked to record their confidence in the clinical diagnoses, but they were also asked to state whether or not an autopsy would normally have been obtained (i.e., in the absence of the initiative to increase autopsy frequency).
Hartveit F. Clinical and Post-Mortem Assessment of the Cause of Death. J Pathol 1977;4:193–210.
Landefeld CS, Chren MM, Myers A, Geller R, Robbins S, Goldman L. Diagnostic yield of the autopsy in a university hospital and a community hospital. N Engl J Med 1988; 318: 1249–54.
Investigators in this prospective case-control study conducted daily reviews of in-hospital deaths in order to contact clinicians prior to receipt of autopsy results. Patients undergoing autopsy were paired with the next adult who died in the same hospital without undergoing autopsy.
For each death, regardless of whether or not autopsy was eventually performed, clinicians were asked to record their assessment of the probability that the autopsy would reveal a major undiagnosed cause of death (i.e., a Class I or II finding). The questionnaire asked for an absolute probability (percentage from 0–100) and a qualitative estimate (5-point scale: “much more likely than usual,” “more likely than usual,” “as likely as usual,” “less likely than usual,” “much less likely than usual”). The chi-square test for linear trend revealed no significant relationship between the physicians' estimated probability of Class I or II errors and the observed prevalence of such findings. The relationship between qualitative estimates and unexpected findings was not presented, but was reported as non-significant as well.
The studies discussed above generally bear out the intuitively plausible claim that diagnostic errors occur more commonly in cases that clinicians identify as diagnostically uncertain. However, important diagnostic errors were still found in cases clinicians rated as diagnostically certain. Moreover, the relationship between clinicians' reported confidence in antemortem diagnoses and autopsy-detected error rate is complex. On the one hand, some studies show that clinicians can identify levels of diagnostic certainty that correlate with the rate of unexpected findings at autopsy. On the other hand, clinicians show little ability to predict the utility of the autopsy. One possible explanation is that the decision to request autopsy is not completely determined by clinicians' clinical certainty. Another possibility is that, in the context of a study explicitly targeting the accuracy of clinical diagnosis, clinicians reflected more carefully on the confidence of the clinical diagnosis (or even downplayed it) than they would routinely or when considering the decision to pursue autopsy. Probably both explanations play a role.
The relationship between clinicians' reported confidence in antemortem diagnoses and autopsy-detected error rate is complex.
To some extent, clinicians can identify levels of diagnostic certainty.
Cases with greater uncertainty are more likely to go for autopsy and, in some studies, are more likely to have diagnostic errors.
However, even in cases clinicians identify as “certain” with respect to confidence in diagnosis, the autopsy reveals major errors in roughly 5–15% of cases.
Moreover, when asked specifically to predict unexpected findings or to state if the autopsy would be important/useful in a particular case, clinicians do not appear able to predict the findings (or utility) of the postmortem examination.
The above sections have indicated that rates of diagnostic errors detected at autopsy have shown little change over time, and that this lack of change likely reflects an effect other than decreased autopsy rates and increased clinical selection. It is possible that, while the rates of errors have remained fairly constant, the nature of the errors has changed. For instance, cirrhosis, bacterial pneumonia, and tumors were commonly identified as missed clinical diagnoses in autopsy studies prior to 1965. 51, 145 Subsequent studies were more likely to reveal previously uncommon diagnoses such as systemic fungal infections, but also pulmonary emboli.51, 145
The small numbers of specific diagnoses in each autopsy study (e.g., 1 missed aortic dissection, 5 missed pulmonary emboli, 2 missed cases of pneumonia) and inconsistent reporting between studies preclude formal analysis of trends in the nature of the missed diagnoses. Further complicating such analysis is the question of whether to report that, for example, 5 of the 100 autopsies were cases of missed pulmonary emboli or that, of the 10 cases of pulmonary emboli detected at autopsy, 5 were clinically missed. Most studies did not report sufficient data to explore these two ways of summarizing error rates for specific diseases and assessing trends over time.
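The two ways of summarizing a disease-specific error rate differ only in the denominator, as the short Python sketch below shows using the hypothetical pulmonary embolism numbers from the example above. The function names are illustrative and not taken from any study.

```python
def share_of_all_autopsies(missed_cases, total_autopsies):
    """Missed cases of a disease as a proportion of all autopsies performed."""
    return missed_cases / total_autopsies

def share_of_disease_cases(missed_cases, cases_found_at_autopsy):
    """Missed cases as a proportion of all cases of that disease found at autopsy."""
    return missed_cases / cases_found_at_autopsy

missed_pe = 5            # clinically missed pulmonary emboli
total_autopsies = 100    # all autopsies in the series
pe_at_autopsy = 10       # all pulmonary emboli detected at autopsy

print(f"{share_of_all_autopsies(missed_pe, total_autopsies):.0%} of autopsies")
print(f"{share_of_disease_cases(missed_pe, pe_at_autopsy):.0%} of pulmonary emboli")
```

The first figure describes how often the autopsy yields this particular surprise; the second describes how often clinicians miss the disease when it is present among autopsied patients. A study must report both counts for either trend to be followed over time.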
While “newer diseases” such as opportunistic infections have undoubtedly increased in recent decades and account for some of the misdiagnoses detected at autopsy, clinically missed diagnoses continue to include common diagnoses such as myocardial infarction, pulmonary embolism, bowel perforation, and other common conditions.
The autopsy has traditionally played a prominent role in morbidity and mortality rounds, in which clinicians review cases with adverse outcomes for potential quality problems. Unfortunately, even in surgical departments, where these rounds have retained their traditional focus to a greater extent than in many departments of medicine, only a fraction of the complications on a given surgical service typically result in discussion.146 Moreover, many clinicians may assume that autopsies simply confirm recognized complications and do not detect unsuspected ones.
Nonetheless, it is interesting to note that this complication rate is similar to the rate of serious complications of care noted in population based studies such as the Harvard Medical Practice Study134 and the more recent Colorado-Utah Study.136 In other words, despite the various selection forces operating on the autopsy decision, the rate at which it reveals complications of care is comparable to the rates detected by chart-based studies.
Fear of litigation is often cited as a reason for the decline in autopsy rates, i.e., the presumption that clinicians fear that autopsy-detected errors will lead to malpractice claims. Pathologists have offered anecdotal evidence to the contrary, i.e., that autopsy findings can resolve potential malpractice claims by definitively refuting alleged diagnoses or complications.149–151 Only one study of diagnostic discrepancies detected at autopsy directly addressed the relationship between postmortem findings and medicolegal exposure for clinicians and healthcare organizations. This study152 of diagnostic errors in 176 autopsies was conducted at the University of Pittsburgh Medical Center in 1994. In addition, the authors reviewed all cases after the two-year statute of limitations on malpractice suits (in Pennsylvania) had expired. Only one malpractice lawsuit had been filed, and the intent to proceed to litigation in that case had been declared prior to the patient's death.
A second study explicitly addressed the issue of the role of the autopsy in malpractice suits. Unfortunately, all 15 cases reviewed in this study153 involved autopsies requested after the filing of suit. Thus, the information from this small series does not address the issue of a possible increase in medicolegal exposure due to routine autopsy performance.
A third survey study assessed the contribution of the autopsy to loss control/risk management.154 The author's principal interest lay in determining the extent to which non-forensic autopsies increased or reduced institutional medicolegal exposure. The survey of pathologists at 183 teaching institutions had a response rate of only 31%, and the survey instrument appeared quite informal. In response to the main survey question, 36 of 58 respondents (62%) indicated that autopsies reduce overall institutional medicolegal losses.
Summary of the autopsy as a diagnostic test
Only two studies have addressed the relationship between routine autopsy performance and medicolegal exposure.
One of these studies indicated no increased risk of malpractice as a result of autopsy findings.
The other study, a survey of pathologists' impressions of the relationship between the autopsy and risk management activities, suggested that the autopsy may reduce legal exposure more often than it increases it.
Overall, the relative frequencies with which the autopsy generates information that is legally harmful or helpful to an institution remain unclear and deserve further study.
The findings reviewed in the previous section indicate that, for a given patient who has died, there is a roughly 10% chance that autopsy will detect a clinically significant misdiagnosis. Moreover, clinicians' ability to predict cases likely to have unexpected findings at autopsy is at best weak to moderate. On an individual level, these data are quite compelling - i.e., from the point of view of an individual clinician deciding whether or not to pursue autopsy in the case of an individual patient death. But what about at the hospital or population level? To what extent do clinicians miss certain diagnoses in all patients, not just patients who die? As outlined in the Objectives section, this question can be understood as asking what proportion of clinically important diagnoses are first detected at autopsy.
Despite the extensive literature on diagnostic errors detected at autopsy, this question remains largely unanswerable with the existing literature, especially the autopsy literature per se. This gap in the literature does not result from the problem of selection bias - i.e., that clinicians may request autopsies precisely because they are uncertain about clinical diagnoses, so that autopsied cases are not representative of all deaths. Rather, this basic question remains largely unanswerable because the prevalence of clinically missed diagnoses at autopsy is not the same as the prevalence of diagnostic errors among all clinical cases, which is the true error rate of interest.
A typical study of diagnostic errors found at autopsy reports the number of clinically missed diagnoses and often lists the particularly common missed diagnoses (e.g., myocardial infarction, pulmonary embolism, systemic fungal infection, perforated bowel, and so on). With only one exception,155, 156 these studies include no information relevant to estimating the relevant denominator, namely all cases of myocardial infarction, pulmonary embolism, fungal infection, and so on. If patient deaths were a random sample of all patients, error rates from autopsies alone would provide reasonable estimates of error rates for clinical diagnosis. However, precisely because some deaths represent diagnostic failures (not just treatment failures), autopsies do not provide a random sample with which to gauge the performance of clinical diagnosis.
Because of the importance of this issue, the one study from the autopsy literature that addresses the overall performance of clinical diagnosis is discussed in detail below. Before doing so, we address a point that will arise repeatedly in the discussion of estimates for clinical diagnostic performance, namely how to handle the non-autopsied deaths.
Even if a study provides data on the patients with a given diagnosis who left the hospital alive, estimating the sensitivity of clinical diagnosis is hampered by the possibility of a significant number of unidentified cases with this diagnosis among the non-autopsied deaths. Thus, if a study has an autopsy rate of 20%, we need to have a reasonable approximation for the prevalence of errors among the 80% of deaths that did not undergo autopsy.
In developing this approximation, we need to consider the possibility that non-autopsied deaths might have even more errors than observed among autopsied cases. For instance, clinicians might have a bias against referring cases with errors for autopsy (e.g., to avoid medicolegal exposure), resulting in an under-representation of diagnostic errors among cases referred for autopsy. This possibility stands diametrically opposed to the view many clinicians have, which is to assume that the few (non-forensic) cases sent for autopsy are requested precisely because of diagnostic uncertainty.
From the previous section, it appears that clinicians do not avoid sending cases with diagnostic uncertainty for autopsy, so over-representation of errors among the non-autopsied cases is unlikely. On the other hand, clinicians have a very limited ability to predict the detection of diagnostic errors at autopsy. Thus, the assumption of random distribution between autopsied and non-autopsied cases represents a reasonable lower bound for estimating the sensitivity of clinical diagnosis. In other words, if a series with an autopsy rate of 20% reported 5 Class I errors, the upper bound for the sensitivity of clinical diagnosis would consider these as the only “false negatives” for clinical diagnosis. As one (admittedly still somewhat arbitrary) way to estimate the lower bound, the false negatives would be taken as 5 times this number (i.e., 25), to account for the possibility that the 80% of non-autopsied deaths included the same proportion of errors as the autopsied cases.
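The bounding argument in this paragraph can be sketched in a few lines of Python. The function names are illustrative, and the count of correctly diagnosed cases (200) is a hypothetical value chosen only to make the example run; the 5 errors and the 20% autopsy rate come from the example in the text.

```python
def extrapolated_false_negatives(errors_at_autopsy, autopsy_rate):
    """Scale autopsy-detected errors up to all deaths.

    Assumes non-autopsied deaths carry the same proportion of errors
    as autopsied deaths (the random-distribution assumption).
    """
    return errors_at_autopsy / autopsy_rate

def sensitivity(true_positives, false_negatives):
    """Proportion of true cases that were diagnosed clinically."""
    return true_positives / (true_positives + false_negatives)

def sensitivity_bounds(true_positives, errors_at_autopsy, autopsy_rate):
    """Return (lower, upper) bounds on the sensitivity of clinical diagnosis."""
    # Upper bound: the errors found at autopsy are the only false negatives
    upper = sensitivity(true_positives, errors_at_autopsy)
    # Lower bound: errors are spread evenly over all deaths
    lower = sensitivity(
        true_positives,
        extrapolated_false_negatives(errors_at_autopsy, autopsy_rate),
    )
    return lower, upper

# 5 Class I errors at a 20% autopsy rate; 200 true positives is hypothetical
lower, upper = sensitivity_bounds(true_positives=200, errors_at_autopsy=5, autopsy_rate=0.20)
print(f"false negatives for lower bound: {extrapolated_false_negatives(5, 0.20):.0f}")
print(f"sensitivity between {lower:.1%} and {upper:.1%}")
```

The width of the resulting range depends directly on the autopsy rate: the lower the rate, the larger the extrapolation and the wider the gap between the two bounds.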
Cameron HM, McGoogan E: A prospective study of 1152 hospital autopsies: I. Inaccuracies in death certification. J Pathol 1981; 133: 273–83.
Cameron HM, McGoogan E: A prospective study of 1152 hospital autopsies: II. Analysis of inaccuracies in clinical diagnoses and their significance. J Pathol 1981; 133: 285–300.
The authors of the two companion articles155, 156 describing this study reported the total number of patients discharged (alive or dead) with the principal diagnoses highlighted among the autopsy cases, permitting estimation of the sensitivity and specificity of clinical diagnosis.
Using these data to estimate diagnostic performance presents problems, as we do not know the accuracy of the clinical diagnoses in patients except in the case of the patients who went to autopsy. However, reasonable estimates are possible for TB and appendicitis because the ultimate clinical diagnoses for these conditions are often unequivocal (microbiologic results, findings at laparotomy and subsequent pathology). For other of these diagnoses, it is difficult to make comparable calculations because the certainty of these clinical diagnoses exhibits greater heterogeneity, with many diagnoses reflecting a general clinical impression rather than definitive results from specific investigations (e.g., pulmonary angiography, liver biopsy).
Pulmonary TB. In 1976 (the middle study year), 138 patients were discharged alive with pulmonary TB as the principal diagnosis and 5 patients died with this as the main diagnosis. Among the 15 patients who died in the 3-year study period with a principal clinical diagnosis of pulmonary TB, autopsy confirmed this diagnosis in 7 cases and rejected it in 8. In an additional 7 deaths, autopsy disclosed pulmonary TB as the main diagnosis or the immediate cause of death. As a rough approximation, then, it appears we can assume that the patients who die with a clinical diagnosis of pulmonary TB are approximately equal in number to the missed deaths due to pulmonary TB. This estimate ignores patients who died but did not undergo autopsy (75% of deaths in this study). Unfortunately, we do not know if clinicians are more or less likely to autopsy cases of TB. If we discount the possibility of a bias against autopsying such cases2, then a reasonable lower bound for clinical diagnosis is that in which the diagnosis of TB is evenly distributed between autopsied and non-autopsied deaths. With an autopsy rate of 25%, this assumption implies an additional 3 × 7 = 21 clinically undetected cases. Thus, the lower bound for the sensitivity of clinical diagnosis for pulmonary TB can be estimated as 138/(143 + 21) = 84%.
This estimate ignores patients discharged alive with a missed diagnosis of pulmonary TB, but presumably most of these patients are either subsequently diagnosed or eventually counted among the deaths. The above estimate also assumes that all of the patients discharged alive in fact had pulmonary TB. The rate of false positive diagnoses for pulmonary TB is low, as the false positive rate for TB culture is extremely low. In fact, most false positives consist of culture-negative cases treated on clinical grounds. Assuming even 10% false-positives, the sensitivity of clinical diagnosis for pulmonary TB remains relatively unchanged at 124/150 = 83%.
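The pulmonary TB arithmetic above can be retraced in a short sketch. The variable names are illustrative, not taken from the study, and the code simply reproduces the calculation as stated in the text, including the 10% false-positive scenario.

```python
alive_with_tb = 138      # discharged alive with pulmonary TB (1976)
died_with_tb = 5         # died with pulmonary TB as main diagnosis (1976)
missed_at_autopsy = 7    # TB first disclosed at autopsy
autopsy_rate = 0.25

# Assumption from the text: missed cases are evenly distributed between
# autopsied and non-autopsied deaths, so the 75% of deaths without autopsy
# hold 3 times as many missed cases as the 25% with autopsy.
extra_missed = missed_at_autopsy * (1 - autopsy_rate) / autopsy_rate  # 21

lower_bound = alive_with_tb / (alive_with_tb + died_with_tb + extra_missed)

# Scenario with 10% false positives among patients discharged alive.
true_alive = round(alive_with_tb * 0.90)  # 124
lower_bound_fp = true_alive / (true_alive + died_with_tb + extra_missed)

print(f"{lower_bound:.0%}, {lower_bound_fp:.0%}")
```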
Acute appendicitis. This example illustrates the complete disconnect between the “error rates” and sensitivity values presented in autopsy studies as compared with the actual sensitivity of clinical diagnosis. Focusing on the autopsy cases only, the “sensitivity” of clinical diagnosis for acute appendicitis would be calculated as 3/4=75%. In fact, though, the above data suggest sensitivity closer to 98%.
Given standard practice in the developed world, patients discharged alive with the diagnosis of appendicitis have undergone surgery (as opposed to empiric antibiotic treatment without operation) so the diagnosis has been confirmed. Autopsies were performed on all four cases of patients who died before going to surgery. Thus, for appendicitis, we can calculate the sensitivity of clinical diagnosis quite precisely. True positives = 441 (alive) + 4 (dead) - 1 (overturned) = 444. Since one case was missed, the total number of cases is 444 + 1, so that the sensitivity of clinical diagnosis is 444/445 = 99.8%. Again, however, we have to take into account the possibility of missed cases among non-autopsied deaths. The 25% of deaths that underwent autopsy contained 1 clinically missed case. If we assume that the other 75% of deaths included 6 cases (i.e., same proportion as observed among autopsied deaths), the sensitivity would still be 444/451 = 98%.
This sensitivity estimate has a wide confidence interval (due to the small number of autopsied cases and the autopsy rate of only 25%), so the true sensitivity may be lower. Moreover, the above estimate does not address the detection of the diagnosis in a timely enough fashion to avoid perforation. Cases in which clinicians recognized the diagnosis only after perforation had occurred represent potential quality problems. Also, the specificity of the diagnosis of acute appendicitis is an important performance measure in practice, as a high sensitivity should not be achieved by submitting a large number of patients to unnecessary laparotomy. These two issues are discussed in the section below, in which appendicitis is one of the five specific conditions reviewed.
Because this one study provides the only data from the autopsy literature per se relevant to assessing the extent to which the autopsy provides a valid measure of clinical diagnostic performance, we conducted ancillary searches from the general clinical literature. Our goal consisted of finding population level studies in which, through a combination of autopsy for patients who die and unequivocal diagnostic testing or long-term clinical follow-up for patients who do not, error rates for clinical diagnosis can be established.
Given the breadth of the potentially relevant literature, our search may not be completely exhaustive. However, we found at least one study providing an estimate of the sensitivity of clinical diagnosis in detecting:
acute pulmonary embolism among patients presenting to the hospital
cases of myocardial infarction among patients presenting to the hospital with chest pain
acute appendicitis
acute dissection of the aorta
active tuberculosis
Many general autopsy series list pulmonary embolism as a significant percentage of the major clinically missed diagnoses. 2, 3, 51, 54, 55, 70, 116, 119, 122, 129, 131, 152, 155–160 Additionally, multiple autopsy studies have specifically assessed the prevalence of clinically significant (but clinically undetected) PE at autopsy.161–173 The most recent of these studies is representative of the general problem with using such data to assess the performance of clinical diagnosis. Pineda et al173 reviewed 778 autopsy reports from 1991-96 at a teaching hospital and identified 67 patients with PE as the primary or major cause of death. Review of the clinical records in these 67 cases of fatal PE indicated that the diagnosis had been suspected clinically in only 30 (45%) patients. The authors compared this prevalence of clinically missed fatal PE with data from previous studies,161, 162, 171 and concluded that the clinical suspicion of fatal PE has shown only minimal improvement over time.
This conclusion involves an important misconception that illustrates a problem common to many of the studies in the autopsy literature. The benchmark for the sensitivity of clinical diagnosis of PE is generally taken to be 96%,174, 175 as the implied false negative rate of 4% corresponds to the prevalence of PE among patients with normal lung ventilation-perfusion scans. 176 No benchmark exists, however, for the sensitivity of clinical diagnosis in detecting fatal PE. One obvious reason for the lack of such a benchmark is that clinicians do not attempt to diagnose fatal PE; they attempt to diagnose PE among patients before they die.
In fact, it is not clear what the ideal sensitivity for the clinical detection of fatal PE ought to be (or, equivalently, what an acceptable “miss rate” is for fatal PE). As stated more generally in the introduction, patients who die of PE generally fall into one of two categories: treatment failures and diagnostic failures. Contrary to the expectation implied by the authors of many autopsy studies of fatal PE, 161, 162, 171, 173 improvements in care need not reduce the relative proportion of the latter. For instance, if the number of patients in the first category (treatment failures) decreases over time (due to earlier and more aggressive anticoagulation), one might even expect an increase in the relative proportion of diagnostic failures.3
In addition to the above misconception, the rate of clinically missed fatal PE bears little connection to the rate of greater interest, namely the sensitivity of clinical diagnosis in detecting acute PE among all patients. Assessing the overall sensitivity of clinical diagnosis for PE in routine practice requires estimating a ratio that has as its numerator the “true positives” recognized during life (regardless of whether or not they subsequently died). The denominator consists of all cases of PE, i.e., true positives plus false negatives. The false negatives include those cases of PE detected only at autopsy plus the additional clinically missed cases among patients who died but did not undergo autopsy.
Reasonable assumptions about the prevalence of PE among non-autopsied cases combined with published data permit an estimate of this numerator and denominator. For instance, the International Cooperative Pulmonary Embolism Registry included 2454 patients with suspected or confirmed PE, with 2110 cases confirmed by autopsy, angiography or venous ultrasound combined with a high clinical suspicion.177 In 61 cases, the diagnosis was detected only at autopsy, so the clinically detected “true positives” included 2049 cases. This estimate ignores the clinical false positives among the 2049 clinically detected cases. It also ignores the additional true positives among the 344 remaining patients with less clear-cut diagnoses of PE. Applying rates of false positive and false negative diagnoses from large clinical studies175, 176 suggests that these two factors roughly cancel each other out (as they affect the estimate in opposite directions), but this is a rough approximation.
At first glance, the above results suggest that the sensitivity of clinical diagnosis could be as high as 2049/2110=97%. However, this estimate does not take into account missed PEs among non-autopsied deaths. If we assume an average autopsy rate of 20% (the participating centers were all teaching hospitals), then the number of clinically missed cases could be as high as 305, rather than 61 (i.e., assuming the same proportion of missed cases among non-autopsied and autopsied deaths). The resulting estimate for sensitivity is 2049/2354=87%. Overall, then, it appears that the sensitivity of clinical diagnosis for detecting PE falls within the range of 87–97%.
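As a check on the ICOPER-based range, the following sketch reproduces the two bounds from the counts quoted above. The variable names are illustrative, and the 20% autopsy rate is the assumed value stated in the text, not a figure reported by the registry.

```python
confirmed_pe = 2110        # cases confirmed by autopsy, angiography or ultrasound
autopsy_only = 61          # cases detected only at autopsy
assumed_autopsy_rate = 0.20  # assumption stated in the text

detected_in_life = confirmed_pe - autopsy_only  # 2049 clinical true positives

# Upper bound: non-autopsied deaths contain no missed PE.
upper = detected_in_life / confirmed_pe

# Lower bound: non-autopsied deaths contain the same proportion of missed PE,
# so total missed cases = 61 / 0.20 = 305.
missed_all_deaths = autopsy_only / assumed_autopsy_rate
lower = detected_in_life / (detected_in_life + missed_all_deaths)

print(f"sensitivity range: {lower:.0%} to {upper:.0%}")
```

Varying `assumed_autopsy_rate` shows how strongly the lower bound depends on an input that the registry data do not supply.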
This range suggests that the performance of clinical diagnosis may fall short of the ideal benchmark of 96%. Nonetheless, by taking into account all cases of PE (even with the above pessimistic assumptions) the estimated performance of clinical diagnosis is strikingly different from the picture suggested by autopsy data alone, in which 30–50% of cases are clinically missed.
Only one autopsy study172 attempted to assess the performance of clinical diagnosis of PE among all patients and not just those who die. The authors reviewed autopsy data from their institution during the period in which their institution participated in the Prospective Investigation of Pulmonary Embolism Diagnosis (PIOPED) study.176 The authors reviewed all autopsies for cases of PE and also analyzed data from PIOPED patients at their institution, with the primary objective of estimating the prevalence of acute PE among hospitalized patients. Unfortunately, this study did not take into account the roughly 45% of all patients with suspected PE who were not eligible for inclusion in PIOPED, nor the roughly 50% of eligible patients who declined participation or the 10–40% of patients (depending on the institution) who consented but were not selected for assessment of the sensitivity and specificity of ventilation-perfusion scans.176 These exclusions make assessments of the sensitivity of clinical diagnosis for PE based on these data much more speculative than the estimate generated above from the ICOPER data.
Many general studies of diagnostic errors detected at autopsy report cases of myocardial infarction (MI) as a condition contributing to death but unsuspected antemortem. 2, 3, 51, 54, 55, 70, 116, 119, 122, 129, 131, 155, 156, 159, 160 Other studies have specifically addressed clinicopathologic discrepancies in the diagnosis of myocardial infarction. 178, 179 As in the discussion of PE above, the true question of interest is the sensitivity of clinical diagnosis in detecting MI among all patients, rather than the sensitivity for fatal MI.
Analyses of large cohort studies (e.g., the Framingham Study),180–187 have allowed estimates of the prevalence of unrecognized MI. As reviewed recently, 188 these studies suggest that clinically unrecognized myocardial infarction accounts for at least 25% of all myocardial infarctions. The high prevalence of unrecognized MI has clear implications for public health strategies and clinical practice in terms of screening and diagnosing patients. Importantly, though, these data combine clinically silent MI (i.e., patients who experience asymptomatic MI) with symptomatic, but undiagnosed patients. As the former group accounts for the majority of unrecognized MI,188 these data do not address the issue of errors in clinical diagnosis, as many of these patients do not present to medical attention.
More appropriate for the assessment of the sensitivity of clinical diagnosis for acute MI are the several large studies of myocardial ischemia among patients presenting to the hospital with acute symptoms.189–192 The three more recent of these studies189–191 indicate that 2–4% of patients who present to the hospital with symptoms related to acute MI are clinically missed and inappropriately discharged from the emergency department.
Appendicitis is a common condition, with a lifetime risk of 8.6% for males and 6.7% for females. Approximately 12% of males and 23% of females undergo appendectomy (with the difference in rate due in large part to more frequent opportunities for incidental appendectomies among women).193 Clinical performance measures for acute appendicitis have typically included rates of perforation (reflecting delayed diagnosis and increased complications) and rates of normal appendix at laparotomy (reflecting unnecessary exposure of patients to risks of surgery).194 With the introduction of helical CT as a sensitive and specific test for acute appendicitis,195, 196 it seems reasonable to hypothesize that rates of appendiceal perforation (reflecting delayed diagnosis) and normal appendices discovered at laparotomy should have decreased in recent years.
One recent study197 used procedure codes from a state hospital discharge database to identify patients undergoing appendectomy from 1987-1998. Among 63,707 non-incidental appendectomy patients, 84.5% had appendicitis (25.8% with perforation) and 15.5% had no associated diagnosis of appendicitis. Adjusting for important demographic features (age, gender) showed that the population-based incidence of unnecessary appendectomy and of appendicitis with perforation had not changed significantly over time. While many new technologies take time to diffuse into routine practice, ten years is a fairly long time period, especially for an easily implemented diagnostic test (a type of computed tomography scan) affecting such a common condition.
Alternatively, the lack of a time trend might be due to inaccuracies in administrative data. While such inaccuracies are well-known, there is no reason to suppose that they significantly changed during this study period, so that time trend analyses may be quite accurate. Moreover, another recent study reported similar findings using clinical data. This study194 included two cohorts, one of which consisted of 1,026 patients undergoing appendectomy. The authors reported a normal appendix in 10.5% of cases, with a range of 4.7–19.5% for the 12 institutions participating in the study.
This study194 included a second cohort of consecutive patients presenting to the emergency department (ED) with abdominal pain. The 1,118 patients identified for this cohort included 44 patients who ultimately proved to have appendicitis. Focusing on physicians initially assessing these patients (in the office or ED), the sensitivity of clinical diagnosis was 81.4%, but with an institutional range from 72.2–89.4%. Moreover, perforation was observed in the concurrent appendectomy cohort in 20.3% of cases with a range from 6.9–33%.
Another recent study did show reductions in appendiceal perforation and false positive rates (i.e., normal appendix at laparotomy).198 This study may accurately reflect institutional expertise with appendiceal CT and/or the development of an efficient protocol for the work-up of suspected appendicitis. On the other hand, this study did not include all patients presenting with abdominal pain, so that patients who subsequently re-presented to another hospital (i.e., false negatives) would not be captured.
General autopsy series commonly list cases of missed aortic dissection among the major diagnostic errors.2, 3, 116, 120, 199 A number of older studies from the clinical literature report 50% or more cases as clinically missed.200–202 One of the first of the more modern studies203 provides a misleadingly high estimate of clinical sensitivity because, as the authors note, the series was predominantly clinical, with less systematic attempts to identify cases detected only at autopsy. Three other recent studies more comprehensively look for cases detected only at autopsy.204–206 The most recent of these studies 206 focused only on clinical diagnosis or suspicion by physicians in the emergency department, so this study was not considered further.
One study204 reports the clinical findings in a series of 235 patients with aortic dissection seen at the Mayo Clinic in Rochester from 1980-90. The diagnosis was confirmed by surgery (162), autopsy (27) or radiologic testing without surgery (47); iatrogenic dissections (e.g., cases that occurred during vascular surgery or catheterization) were excluded. This series included 59 patients referred from outside facilities with the radiologic diagnosis of aortic dissection. Including these patients significantly overestimates sensitivity of clinical diagnosis, because clinically missed cases from these referring institutions are not included. Among the 176 patients presenting initially to the Mayo Clinic, 17 cases were not identified until autopsy, suggesting a sensitivity of 159/176= 90%. This estimate does not take into account missed cases among non-autopsied deaths. The study does not report the autopsy rate. If we use published data on autopsy rates in Olmsted county323, we can estimate the autopsy rate as approximately 30%. As a rough estimate, then, the non-autopsied deaths together with the autopsied cases included as many as 56 (i.e., 17/0.3) clinically missed aortic dissections, suggesting the sensitivity of clinical diagnosis of 159/215=74%. Thus, the sensitivity of clinical diagnosis for dissection of the aorta has 90% as its upper bound (if the non-autopsied deaths contained no missed cases) and as its lower bound 74% (if the non-autopsied deaths contained the same proportion of missed cases as detected among the autopsied patients).
Another article205 describes the clinical findings and outcomes for 258 patients with aortic dissection from 1966 to 1986. (This series includes the patients reported in an earlier study from the same institution, so this earlier article is not discussed.207) In this series of 258 patients with 259 dissections, 69 cases were identified only at autopsy (including 58 acute Type A dissections). Thus, the sensitivity of clinical diagnosis in this series was no higher than 190/259=73%. Again, this estimate is an upper limit because it does not include missed cases among non-autopsied deaths. If we attribute an autopsy rate as high as that attributed above (i.e., 30%), then the total number of missed cases would be 69/0.3=230, for an overall sensitivity of 190/(190+230)=45%. If the institution involved had a lower autopsy rate, the estimated sensitivity would be even lower. Thus, this study205 suggests that clinical diagnosis detects no more than 73% of cases and possibly as few as 45%.
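The bounds for the two aortic dissection series can be placed side by side in a brief sketch. The function `sensitivity_bounds` and the series labels are hypothetical names used for illustration; the 30% autopsy rate is the assumed value applied to both series in the text.

```python
AUTOPSY_RATE = 0.30  # assumed for both series, as in the text

series = {
    "Mayo Clinic, 1980-90": {"detected": 159, "autopsy_only": 17},
    "Second series, 1966-86": {"detected": 190, "autopsy_only": 69},
}


def sensitivity_bounds(detected, autopsy_only, autopsy_rate):
    """Return (lower, upper) bounds on the sensitivity of clinical diagnosis."""
    # Upper bound: non-autopsied deaths contain no missed cases.
    upper = detected / (detected + autopsy_only)
    # Lower bound: missed cases scaled to all deaths. The text truncates
    # 17/0.3 to 56; the unrounded value gives the same percentage.
    missed_all_deaths = autopsy_only / autopsy_rate
    lower = detected / (detected + missed_all_deaths)
    return lower, upper


results = {
    name: sensitivity_bounds(s["detected"], s["autopsy_only"], AUTOPSY_RATE)
    for name, s in series.items()
}
for name, (lower, upper) in results.items():
    print(f"{name}: {lower:.0%} to {upper:.0%}")
```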
Many general autopsy series list tuberculosis (TB) among the clinically significant missed diagnoses first detected at autopsy.116, 127, 129, 131, 156, 208–211 Studies specifically assessing the prevalence of TB among autopsied patients have reported that roughly 50% of cases were not detected antemortem.212–217
The difference between the proportion of diagnostic failures among fatal cases and the failure rate for clinical diagnosis for all cases is particularly striking for an eminently treatable infectious disease such as TB. As discussed previously, the single autopsy study reporting diagnoses for patients discharged alive suggested a sensitivity of clinical diagnosis in detecting pulmonary TB of greater than 83%.155, 156 We identified two other studies providing population level data on all diagnoses of TB. 218, 219 The more recent study 219 reviewed all cases of TB reported to the San Francisco Department of Health from 1986-95. Among 3102 reported cases of TB, 120 (3.9%) met the definition for diagnosis after death. (This definition included patients who literally were diagnosed after death with postmortem tissue cultures, but also patients who were not receiving therapy with more than one agent at the time of death.) The earlier study at the national level reported approximately the same result, with 5.1% of cases meeting the same definition of diagnosis after death.
Unfortunately, neither of these results takes into account missed cases among non-autopsied deaths. Moreover, we do not know the rate of autopsy for the catchment areas involved. Given the variation in autopsy rates between institutions and the predominance of non-teaching hospitals (typically with low autopsy rates) it would be unlikely for the effective autopsy rate to be higher than 10% (see national autopsy rates in Appendix Figure 2). As before, regarding missed cases of TB as evenly distributed between autopsied and non-autopsied deaths provides a reasonable lower bound for the sensitivity of clinical diagnosis. With the non-autopsied cases thus including as many as 9 × 120 = 1080 additional cases, the sensitivity of clinical diagnosis for TB could be as low as 39%.
In summary, the overall performance of clinical diagnosis may diverge significantly from that suggested by autopsy results alone. Unfortunately, few data assess the performance of clinical diagnosis (i.e., answering the question of how often cases of a given diagnosis remain undetected during a patient's life), and studies generally come from the clinical literature, rather than the autopsy studies. (Finding such studies thus involved extensive searches outside the originally targeted autopsy literature.)
Among the five conditions for which we found relevant data, the sensitivity of clinical diagnosis substantially exceeded the performance suggested by autopsy studies for clinical detection of PE and acute MI. The sensitivity of clinical diagnosis for acute appendicitis is also relatively high, but not necessarily as high as many clinicians may believe. For the other two diagnoses (TB and aortic dissection), clinical or population level studies confirm the findings of the autopsy literature, with a substantial number of cases clinically missed. The example of acute appendicitis also illustrates that advances in diagnostic testing may not always translate into advances in overall diagnostic sensitivity.
Even for the two conditions for which clinical diagnosis appears to perform relatively well, it is possible that these high sensitivities overstate clinical performance. By focusing on the identification of target conditions such as PE and MI, clinicians may miss other important conditions once these target diagnoses are ruled out. Cohort studies of patients being investigated for MI or PE do not evaluate the extent to which other diagnoses are missed as a result of focusing on the identification of these diagnoses. A patient who is correctly identified as not having an MI or PE counts as a success, regardless of whether or not the patient eventually receives a diagnosis other than “non-cardiac chest pain” or “unexplained dyspnea.”
The use of autopsy data, whether as conventionally reported error rates (proportion of autopsies with diagnostic errors) or as rates with the true denominator of interest (all cases treated during the observation period), presents formidable practical problems in terms of clinical performance measurement. Only a small percentage of patients admitted to the hospital die, only a minority of these patients undergo autopsy, and only a minority of these autopsies reveal clinically important misdiagnoses. Moreover, only a portion of these misdiagnoses represent true “errors” (and therefore quality problems), as many represent atypical presentations or deliberate decisions not to pursue aggressive diagnostic work-ups (e.g., in chronically ill patients approaching the end of life); in fact, cases in which the clinicians were frankly unsure of the diagnosis prior to death were counted as “errors” in most of the autopsy studies. The opportunity to detect true quality problems using the autopsy data at a single institution is thus quite small, and error rates derived solely from autopsy data are unlikely to generate statistically meaningful measures for comparing or benchmarking institutional performance.
The overall performance of clinical diagnosis may diverge significantly from that suggested by autopsy results alone.
Assessing the performance of clinical diagnosis requires supplementing autopsy data with discharge diagnoses or other follow-up information concerning patients who do not die.
We found studies providing data on the performance of clinical diagnosis for only five conditions: acute MI, PE, aortic dissection, acute appendicitis, and active tuberculosis. Among these five conditions, clinical diagnosis exhibited substantial variation in sensitivity, with satisfactory values only for acute MI and to a lesser extent PE. Moreover, these results may overstate the performance of clinical diagnosis as they focus on the dichotomous outcome of correct recognition or exclusion of a specific target condition, rather than successful diagnosis of whatever condition the patients may have.
The existing literature indicates that antemortem diagnosis likely does not perform as well as generally believed by clinicians. Unfortunately, effective use of autopsy data for institutional performance measurement is hampered by problems with small numbers and the difficulties of obtaining the true error rates of interest (diagnostic errors in all cases). At the level of the health care system, though, autopsy data could likely be combined with other clinical data to provide meaningful estimates of the performance of clinical diagnosis in routine practice.
Two examples from the medical error and patient safety literature illustrate the type of study required to demonstrate a benefit for performance improvement. In a prospective study aimed at reducing errors in radiographic interpretation by emergency physicians, example errors were used to create a teaching file.220 Moreover, a formal protocol for interpreting radiographs was adopted, and attending radiologists read all films within 12 hours to provide feedback and quality control. Subsequent outcome measures showed a significant decrease in the rate of significant missed diagnoses from 3% (95% CI: 2.8–3.2%) to 1.2% (95% CI: 1.03% to 1.37%). Further process redesign achieved a further reduction to 0.3% (0.26% to 0.34%). On a larger scale (comparable to what would be involved in an autopsy performance improvement project), a program at an Australian hospital 221 showed a reduction in adverse events over an 8-year period, attributable to an intensive system for incident reporting and conducting root cause analyses of serious events.
No intervention study has examined the impact of autopsy-detected errors in clinical diagnosis on subsequent clinical performance. Given the absence of any studies of the impact of postmortem findings, the current literature provides no direct evidence for or against an impact of autopsy-detected errors on performance improvement at the level of individual practitioners or institutions. This is not to say that there is no valid role for the autopsy in relation to clinical practice or performance improvement, but instead reveals that the current literature is insufficient to address these issues.
Dissemination of findings from studies of autopsy-detected errors may, however, exert an impact on clinical practice at a broader level. While such an impact is plausible, the existing literature does not lend itself to analyzing trends in specific misdiagnoses, and certainly does not allow for inferences about causality. For instance, cirrhosis, bacterial pneumonia and tumors were commonly missed by clinicians and thus detected as major errors in autopsy studies prior to 1965. 51, 145 Subsequent studies were more likely to reveal previously uncommon diagnoses such as systemic fungal infections and pulmonary emboli,51, 145 suggesting some improvements in clinical practice. Similarly, even though specific tumor misdiagnoses continue to occur,222 the recognition of the presence of a malignancy has almost certainly improved over time.51, 145, 223 These improvements, though difficult to document with existing data, plausibly result from general dissemination of the results of autopsies in the medical literature, even if local impacts remain unclear.
In the event that some connection could be shown between the identification of diagnostic errors at autopsy and subsequent performance, it is unclear how best to utilize the autopsy as a performance improvement tool at the institutional level. Discussions in the literature have generally assumed that the persistence of significant rates of diagnostic errors at autopsy implies that autopsy rates must be increased. However, measuring the benefit of investing resources in such efforts is currently not feasible. Studies would be needed to establish optimal ways of utilizing the information generated by autopsies as currently performed or, at the least, to document the effectiveness of current methods.
Taking into account overall diagnostic performance does not substantially improve the power to detect differences in diagnostic performance. Consider, for instance, a relatively large hospital with 20,000 admissions per year.4 Based on national discharge data,224 this hospital would have a crude mortality rate of 2%, so the total number of deaths would be 400. Even if we attribute to this hospital a reasonably high (by contemporary standards) autopsy rate of 25%, so that 100 autopsies would be performed in one year, we would expect to observe only 5 Class I errors among these autopsies (using the regression model to predict Class I error rate for a U.S. institution with autopsy rate = 0.25 and year = 2000, as shown in Appendix B), for an error rate of 5% (95% CI: 1.9–11.8%).
Error rates for specific diagnoses have even wider confidence intervals. Suppose the above 5 Class I errors included two missed cases of major PE. Based on national hospital discharge data,224 we can estimate a hospital with 20,000 annual discharges to include 40 cases of clinically diagnosed PE. Putting aside the complicating issue of false positive clinical diagnoses, we would thus estimate clinical diagnosis as having a false negative rate of 2/42 or 4.8% for the detection of PE, with a 95% confidence interval extending from 0.8% to 17.4%. Taking into account the possibility of missed cases among non-autopsied deaths widens the confidence interval even further. As an approximation, we can regard the non-autopsied deaths as including the same proportion of Class I errors as the autopsied deaths. Thus, with a 25% autopsy rate, there might be a total of 4 × 2 = 8 missed PEs. The upper limit for the 95% CI for the error rate of 8/48 would then be 31%.
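The point estimates in this hypothetical hospital example can be retraced as follows. This is a sketch of the arithmetic only, with illustrative variable names; it does not attempt to reproduce the confidence intervals quoted above, since the method used to compute them is not stated.

```python
diagnosed_pe = 40          # clinically diagnosed PE per year (from the text)
missed_pe_at_autopsy = 2   # missed major PE among the 5 Class I errors
autopsy_rate = 0.25

# False negative rate counting only autopsy-detected misses: 2/42.
fn_rate_autopsy_only = missed_pe_at_autopsy / (diagnosed_pe + missed_pe_at_autopsy)

# Assume non-autopsied deaths hold the same proportion of missed PE,
# giving 2 / 0.25 = 8 missed cases in total and a rate of 8/48.
missed_pe_all_deaths = missed_pe_at_autopsy / autopsy_rate
fn_rate_scaled = missed_pe_all_deaths / (diagnosed_pe + missed_pe_all_deaths)

print(f"{fn_rate_autopsy_only:.1%}, {fn_rate_scaled:.1%}")
```

Both rates rest on single-digit counts of missed cases, which is why the accompanying confidence intervals are so wide.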
The previous sections have outlined the weak connection between autopsy-detected errors and overall clinical performance and the lack of any evidence for an impact of autopsy-detected errors on performance improvement. The prevalence of autopsy-detected errors in clinical diagnosis does, however, have major implications for vital statistics and other epidemiologic data, and these are discussed below.
Vital statistics such as mortality data provide a fundamental source of death information for different demographic groups (by age, gender, race, geographic area) across the country. One of the few sources of such data are the vital statistics collected by the National Center for Health Statistics.225 These data permit characterization of the leading causes of death, calculation of life expectancy, and comparisons of mortality trends with other countries.
The National Vital Statistics System compiles these national mortality data from death certificates. In the United States, state laws require completion of death certificates for all deaths, and federal law mandates national collection and publication of deaths and other vital statistics data.226 Importantly, though, these activities all depend on the accuracy of death certificates.
Death certificates primarily reflect clinical diagnoses, because the majority of deaths do not undergo autopsy and because death certificates are typically completed before autopsy results become available and are seldom corrected in light of postmortem findings.14, 254 Thus, vital statistics based on death certificates contain all of the errors in clinical diagnosis reviewed in the literature on autopsy-detected diagnostic errors. In fact, because the magnitude of these errors remains independent of impacts on prognosis, death certificates would be expected to have errors in the range we found for major errors, which in the regression model ranged from 8.1–23.8% in the year 2000, depending on the autopsy rate (Appendix Figure 8). Variations in coding of death certificates introduce additional inaccuracies,14, 15, 110, 114, 227–229, 231–235 further increasing the error rate for death certificates. Clinicians unfamiliar with protocols for assigning causes of death undoubtedly contribute to much of this variation, but significant variation occurs even among pathologists and medical examiners.114
Several studies have suggested that errors in death certificates roughly cancel each other out. In other words, for a given disease category (e.g., infectious, neoplastic, cardiovascular), the false positives approximate the false negatives, resulting in little net change in major causes of death at a population level.14, 15, 254 However, the specific disease categories that remain relatively unchanged are not the same across studies, and for some disease categories the changes have been large. Moreover, for most epidemiologic purposes, more than a major disease category is required (e.g., emphysema or myocardial infarction, not simply death due to respiratory disease or cardiovascular disease).
Mortality data, such as those contained in death certificates225, 226 or large administrative databases,224, 255 are commonly used in outcomes research.256–266 Moreover, major decisions related to the allocation of healthcare research funds derive in part from estimates of disease burden.267–271 The degree of correlation between disease burden and research funding depends on the specific measure—prevalence, incidence, mortality, disability-adjusted life-years and economic costs.269, 271 Nonetheless, all of these measures depend on accurate vital statistics and other epidemiologic data derived from clinical diagnoses, both of which are known to contain major inaccuracies.
One epidemiologic application of the autopsy consists of the use of incidental findings or ‘necropsy surprises’ in routine autopsies to gauge the prevalence of important chronic diseases. For instance, patients who die of conditions unrelated to gallstones or an abdominal aortic aneurysm provide a quasi-random sample of the general population from the perspective of attempting to measure the prevalence of these conditions. This approach to using the autopsy as an epidemiologic tool has been applied to diseases of the biliary tract and abdominal aorta, as well as several common forms of cancer.17–19, 21, 22, 272, 273, 274
Another epidemiologic application of the autopsy is in identifying or helping to characterize new and emerging diseases. Prominent examples of this role of the autopsy have consisted of infectious diseases,275 such as Legionnaires' disease, AIDS, and pulmonary hantavirus.23–26, 276–281 Outbreaks, such as the West Nile virus epidemic, and, more recently, the use of anthrax as a biological weapon, emphasize the importance of surveillance. Routine autopsies may detect cases that would otherwise escape investigation.
Overall, this review confirmed data from innumerable reports and the impression of many advocates of the autopsy, that the autopsy continues to detect important errors in clinical diagnosis. Errors that might have affected prognosis (i.e., “Class I” errors) have remained relatively stable over time, occurring in an average of 10.2% (95% CI: 6.7–15.3%) of autopsies. Some selection does occur, so that cases for which clinicians had persistent diagnostic uncertainty are more likely to undergo autopsy. However, this selection does not explain away the persistently observed Class I error rates. The regression model derived from the data reviewed indicates that Class I error rates at US institutions in the year 2000 range from 3.8% to 7.9%, depending on the autopsy rate.
Despite the relative robustness of the above findings, the conclusions that follow from them differ depending on the level of analysis—individual clinicians, hospitals, or the healthcare system as a whole.
We take as a given that many (ideally all) clinicians will have an intrinsic interest in the autopsy given the possibility of learning of important misdiagnoses in roughly 25% of cases, with roughly one third of these potentially having affected patient outcome. Importantly, though, as with the physical examination, quantitatively justifying this interest is difficult. (Interestingly, the quantifiable benefit of the physical examination is similar to that of the autopsy, with roughly 10% of diagnoses directly suggested by findings from patient examination.44–46) While the autopsy has interest for individual clinicians (not to mention students and trainees) beyond what can easily be measured, the most quantifiable benefit of the autopsy at the individual practitioner's level is the potential impact on subsequent diagnostic performance.
Because physicians often fail to recognize diagnostic errors in the first place and thus miss the opportunity to change practice, the autopsy represents a potentially invaluable quality improvement tool, and demonstrating this value constitutes a crucial area of future research. Such research will face several important obstacles, though, including the need for substantially increasing autopsy rates in the first place, the relatively small number of cases (and even smaller number of errors) individual clinicians encounter, and substantial evidence that physicians tend not to change or improve their practice in response to interventions that consist only of the provision of new information.282–289 On the other hand, by implementing strategies other than traditional conferences, pathologists and clinicians may achieve demonstrable effects on performance improvement, as observed with other more interactive ways of stimulating change, such as educational outreach programs290 and involvement of local opinion leaders.291
The extensive literature search retrieved no reports of evaluations of interventions to improve clinical diagnostic performance based on autopsy-detected errors. Even assuming that such a benefit from autopsy findings is possible, increasing autopsy rates will not necessarily achieve this benefit without an established and effective mechanism for feeding back autopsy findings to clinicians and stimulating performance improvement.
Using institutional error rates for performance measurement is possible in principle, but meaningful comparisons are unlikely to occur even with modest increases in autopsy rates. For most hospitals, the error rates will have confidence intervals too wide to detect significant deviations from benchmark values in all but extreme cases.
The existing literature provides two compelling reasons to pursue autopsies in order to benefit the healthcare system as a whole. First, results for the 5 conditions examined in this report suggest that clinical diagnosis in routine practice may not perform as well as is generally believed by clinicians or as suggested by the literature assessing specific aspects of clinical diagnosis (e.g., new tests) in research settings. Better characterizations of the performance of clinical diagnosis for common conditions would clearly benefit the entire health system and identify important targets for quality improvement that could be pursued in a concerted manner.
The second benefit to the entire health care system relates to vital statistics and other epidemiologic data. Vital statistics impact important decisions about allocation of funding for research and other aspects of healthcare policy. The existing literature demonstrates that clinical diagnoses, whether obtained from death certificates or hospital discharge data, contain major inaccuracies compared with diagnoses generated from postmortem findings. Because the accuracy of vital statistics is independent of consideration of impacts on prognosis, the error rate of interest is that found for major errors. Consequently, the existing evidence strongly suggests substantial inaccuracies in 8–23% of diagnoses listed as causing or contributing to death. Given the importance of vital statistics and other epidemiologic data in conducting outcomes research, allocating research funding, and making other important policy decisions, using autopsy data to rectify this problem has the potential to have multiple benefits for the health care system as a whole.
Various aspects of the performance of the autopsy as a diagnostic test (e.g., the reproducibility of findings between pathologists) remain undefined and represent areas for further research. More specifically relevant to the present review is the inter-rater reliability for error classifications in specific cases - i.e., establishing the extent to which pathologists, clinicians or other peer reviewers agree that a particular case does or does not involve a clinically important diagnostic error.
The causes of important diagnostic discrepancies remain uncharacterized. This represents a very important area of investigation. Discrepancies between efficacy and effectiveness (i.e., differences between the performance of a diagnostic or therapeutic procedure in routine practice compared to the result in the research literature) have diverse causes. Broadly speaking, though, factors consist of quality problems, related to underuse, overuse and misuse of diagnostic or therapeutic procedures, and patient factors, including atypical presentations and complex interactions between comorbid conditions and patient demographic factors. Neither of these categories is captured in the “efficacy literature” (i.e., clinical trials), as the nature of research settings makes underuse, overuse or misuse unlikely, and stringent patient selection reduces the complexities of comorbid conditions and multiple competing diagnostic considerations.
Autopsy data provide a window into discrepancies between efficacy and effectiveness both for therapeutics (by detecting clinically unsuspected complications of care) and diagnostics (by detecting the diagnostic discrepancies discussed in this report). In both cases, but perhaps especially the latter, the autopsy can play a pivotal role in spearheading investigations into the causes of these discrepancies. Where they prove to represent quality problems, the institution benefits and, where they reflect differences between the types of patients receiving care in routine practice and clinical trials, the whole health system may benefit from awareness of these findings.
Future research is needed to establish strategies for optimizing the utility of the autopsy at the institutional level. No study has ever directly assessed the impact of detecting errors in clinical diagnosis on subsequent clinical performance. Thus, future research is needed to determine optimal methods of involving clinicians in the autopsy process (or communicating its results to them) and effective ways of stimulating change based on autopsy findings. Until such research is performed it is not clear to what extent autopsy rates need to be increased as opposed to achieving improvements in communication and utilization of information generated from autopsies performed at current rates.
Another opportunity for future research would be to establish the optimal means of using autopsy data to provide more accurate vital statistics and other important epidemiologic data. The first step might be to validate the findings suggested in this review, namely that current vital statistics contain substantial inaccuracies. Such an undertaking might involve funding a small number of demographically diverse institutions to achieve high autopsy rates, with prospectively determined protocols for autopsy performance and error classification. Such a program would not replace autopsies as routinely performed elsewhere (i.e., this suggested research program in no way represents a system of regional autopsy centers). Rather, these centers would act as surveillance centers for basic causes of death and detection of quality problems, and would offer numerous opportunities for basic research into the pathogenesis of acute and chronic illnesses. Even one year's worth of data from such a project would likely document substantial inaccuracies in vital statistics. If continued, it could provide ongoing epidemiologic data, as well as more meaningful error rates that could be used to fuel quality improvement efforts throughout the health system.
| Institution | Autopsy Rate | Case Mix (Number of Autopsies) | Persistent Uncertainty After Autopsy (95% CI) |
|---|---|---|---|
| Paddington General Hospital (UK): 195890 | 53% | General inpatientsa (265) | 21.1% (16.5–26.6%) |
| Woodend General Hospital and Aberdeen City Hospital (Scotland): 1972-74 77 | 37% | General inpatients (1,000) | 5.1%b (3.9–6.7%) |
| Oulu University Hospital (Finland): 1972-74292 | 24%c | Oncology (377) | 1.1% (0.3–2.9%) |
| Holstebro Central Hospital (Denmark): 1974 199 | 85% | General inpatients (266) | 0.4% (0.02–2.4%) |
| University of Bergen (Norway): 1975293 | 75% | General inpatients (742) | 4.3% (3.0–6.1%) |
| Huddinge University Hospital (Sweden): 1977-78, 1987-8853,294 | 76%, 42%d | General inpatients (3,042) | 2.4% (1.9–3.1%) |
| Salford Health Authority (UK): 1981295 | 35% | Inpatients age >65 (332) | 3.6% (2.0–6.4%) |
| University of Colorado (Denver, CO): 1981-8391 | 81% | Fetal deaths (64) | 46.2% (32.5–60.4%) |
| University of Colorado (Denver, CO): 1981-8391 | 80% | Neonatal/infant deaths (108) | 23.0% (14.9–33.5%) |
| White River Junction VA Medical Center (White River Junction, VT): 198378 | 60% | General inpatients (111) | 3.6% (1.2–9.5%) |
| Hospital Central de la Cruz Roja (Spain): 1983-85125 | 51% | ICU (100) | 6% (2.5–13.1%) |
| Sumitomo Hospital (Japan): 1983-97296 | 40% | General inpatientsa (1,044) | 0.6% (0.2–1.3%) |
| Leiden University Hospital (Netherlands): 1984 118 | 47% | Medicine patients (133) | 9% (5–16%) |
| Northwestern Memorial Hospital (Chicago): 1985 130 | 36% | General inpatients (151) | 6% (3–11%) |
| Institute Jules Bordet (Belgium): 1985-86 297 | 69% | Adult Oncology ICU (34) | 17.6% (7.4–35.2%) |
| Johns Hopkins Hospital (Baltimore, MD): 1985-9594 | 24% | Adult cardiac surgery (147) | 25.2% (18.6–33.1%) |
| Women and Infants Hospital (Providence, RI): 1991-9293 | 83% | Fetal deaths (77) | 37.7% (27.1–49.5%) |
| Women and Infants Hospital (Providence, RI): 1991-9293 | 74% | Neonatal deathse (47) | 2.1% (0.1–12.7%) |
| Lund University Hospital (Sweden): 1991-9292 | 50% | Fetal deathsf (54) | 42.6% (29.5–56.7%) |
| Lund University Hospital (Sweden): 1991-9292 | 50% | Neonatal and infant deaths (31) | 3.2% (0.2–18.5%) |
| Lund University Hospital (Sweden): 1991-9292 | 90% | Medical abortions (19) | 0% (0–20.9%) |
| 248 Departments of Pathology (USA): 1993 298 | ----1 | General inpatients (2,479) | 5.2%2 (4.1–6.6%) |
| Prince of Wales Hospital (Hong Kong): 1997211 | 18% | General inpatients and forensic cases (332) | 1.8% (0.7–4.1%) |
(a) Presumed from description of study and results, but not explicitly stated.
(b) The authors reported that “the solution of a clinical problem proved impossible despite an adequately performed and reported autopsy” in 71 cases. The rate listed in the table derives from these cases, but excluding the 20 cases in which this failure to solve the clinical problem involved failure to perform biochemical or hematological tests.
(c) This rate reflects cancer patients only, which was the focus of the study.
(d) The rate of indeterminate autopsies was reported for the study overall and not separated by time periods.
(e) Defined by authors as deaths within 48 hrs of birth.
(f) Included 40 spontaneous abortions and 14 intra-uterine deaths.
(1) Rate not calculable given study design in which each of 248 institutions contributed a small, consecutive sample of its autopsies.
(2) This study reported the number of clinical questions answered by the autopsy, with each of the 2,479 cases having a mean of 2.6 associated questions. The listed figure for “persistent uncertainty” after autopsy reflects the percentage of the cases in which the autopsy did not establish the cause of death. (The answer to this question was provided by respondents for 1,330 of the total of 2,479 autopsies.)
| Institution | Autopsy Rate | Case Mix (Number of Autopsies) | Error Rate (95% CI) |
|---|---|---|---|
| Peter Bent Brigham Hospital (Boston): 1960 51 | 75% | Adult inpatients (100) | 8% (4–16%) |
| Christian-Albrechts University Hospital (Germany): 1959 54 | 88% | General inpatients (100) | 7% (3–14%) |
| Christian-Albrechts University Hospital (Germany): 196954 | 82% | General inpatients (100) | 12% (7–20%) |
| Peter Bent Brigham Hospital (Boston): 1970 51 | 71% | Adult inpatients (100) | 12% (7–20%) |
| Zurich University Hospital (Switzerland): 197255 | 94% | Adult medical (100) | 16% (10–25%) |
| Winnipeg Health Sciences Center (Canada): 1978-79299 | 26%3 | Adult inpatients (200) | 1.5% (0.4–4.7%) |
| Royal Adelaide Hospital (Australia): 1979300 | 45% | Adult inpatients (99) | 2.0% (0.4–7.8%) |
| Christian-Albrechts University Hospital (Germany): 197954 | 58% | General inpatients (100) | 12% (7–20%) |
| Peter Bent Brigham Hospital (Boston): 198051 | 38% | Adult inpatients (100) | 11% (6–19%) |
| 39 Connecticut institutions performing autopsies: 198014 | 14% | General (272) | 4.0% (2.1–7.3%) |
| Belgrade University School of Medicine (Yugoslavia): 1981-84301 | 12% | General inpatients (2,145) | 29% (26.9–30.8%) |
| Zurich University Hospital (Switzerland): 198255 | 89% | Adult medical (100) | 9% (4–17%) |
| 32 US hospitals (21 university and 11 community non-teaching): 1984 70 | 30% | General inpatients (2,067)4 | 13% (12–15%) |
| Brigham & Women's Hospital (Boston, MA): 1984-85 129 | 37% | Adult inpatients (175) | 11% (7–17%) |
| Emerson Hospital (Concord, MA): 1984-85 129 | 26% | General inpatients (58) | 12% (5–24%) |
| Medical Center of Delaware: 1988-91 210 | 17% | General inpatients (145) | 3% (1–8%) |
| St. Vincent's Hospital (Australia): 1988160 | 22% | General inpatients (139) | 6% (3–11%) |
| Christian-Albrechts University Hospital (Germany): 198954 | 36% | General inpatients (100) | 11% (6–19%) |
| Zurich University Hospital (Switzerland): 199255 | 89% | Adult medical (100) | 7% (3–14%) |
| Ben Taub General Hospital (Houston, TX): 1992-3 116 | 16% | Adult medical5 (110) | 10.9% (6.3–16.8%) |
| San Francisco General Hospital and 31 hospitals in Orange County, CA: 1974-756,302 | 100% | Adult and pediatric motor vehicle accident fatalities (182) | 12.6% (8.3–18.6%) |
| Mason F. Lord Nursing Home (Baltimore, MD):1981-88303 | 3.5%7 | Nursing home residents (34) | 32.4% (18–50.6%) |
| Hospital Central de la Cruz Roja (Spain): 1983-85125 | 51% | Adult ICU (100) | 7% (3.1–14.4%) |
| Six Department of Veterans Affairs Hospitals: 1986-87131 | 43% | Adult medical ICU (172) | 12% (8–18%) |
| Hartford Hospital (New Haven, CT): 1986-92304 | 29% | Adult surgical ICU patients (149) | 11% (6.5–17.1%) |
| Gloucestershire Royal Hospital (U.K.): 1996-993 | 40% | Adult ICU patients (97) | 4.1% (1.3–10.8%) |
| University of Massachusetts Medical Center (Worcester, MA): 1984-88120 | 32% | Adult ED patients8 (244) | 1.6% (0.5–4.4%) |
| University of Texas Medical Branch (Galveston, TX): 1984-88305 | 73% | Adult and pediatric surgical patients (409) | 7.8% (5.5–11%) |
| Royal Victoria Hospital (Northern Ireland): 1986-88306 | 23% | Adult and pediatric perioperative deaths (213) | 20.7% (15.6–26.8%) |
| Ben Taub General Hospital (Houston, TX): 1992-3 116 | 16%9 | Patients with AIDS (42) | 9.5% (3.1–23.6%) |
| Lutheran General Children's Hospital (Park Ridge, Ill): 1984-93128 | 36% | Pediatric inpatients (107)10 | 6.5% (2.9–13.5%) |
| Children's Hospital of Western Ontario (Canada): 1985-89307 | 75% | Pediatric deaths in the ED (52) | 0%11 (0–8.6%) |
| Toronto Hospital for Sick Children (Canada): 1985-90124 | 62% | Neonatal ICU (338) | 2.1% (0.9–4.4%) |
| North Shore University Hospital (Manhasset, NY): 1985-92132 | 26% | Pediatric ICU (50) | 10% (3.7–22.6%) |
| Royal Alexandra Hospital for Children (Australia): 1991-97308 | 40% | Neonatal ICU (91) | 5.5% (2.0–12.9%) |
| General Hospital Celje (Slovenia): 1998-99320 | 47% | Medical ICU (126) | 9.5% (5.2–16.4%) |
| Ryder Trauma Center (Miami): 1997-98321 | 97% | Adult & pediatric trauma/burn patients dying in ICU (153) | 2.6% (0.8–7.0%) |
| King Edward Memorial Hospital (India): 1995-96319 | 82% | Neonatal ICU (197) | 12.2% (8.1–17.8%) |
(3) Rate reflects “teaching cases” only, with medicolegal cases excluded; overall autopsy rate for this period including medicolegal/forensic cases was 44%.
(4) Not clear how many pediatric patients included as study states only that 2,067 autopsies reviewed included 360 cases under 20 years of age.
(5) Patients with Acquired Immunodeficiency Syndrome (AIDS) were reported separately; these data are presented in the line below this entry.
(6) The authors of this study were focused primarily on demonstrating the difference in quality of care between a region with a Level 1 trauma center (San Francisco County) and a region (Orange County) where trauma patients are sent to the nearest available facility. Chart reviews were conducted by the authors and were not blinded to patient location. Because of the significant potential for bias introduced by non-blinded peer review, the results for the two regions were combined rather than presenting separate Class I error rates for the two regions.
(7) This value represents an average over the study period, with yearly rates ranging from 1.6–10.8%. In fact, all but the last year had rates under 3%, with the increase in the final year attributed to targeted efforts to increase autopsies.
(8) Study focused on patients who died in the Emergency Department (ED) or within 7 days of hospitalization after admission through the ED; forensic cases were included.
(9) Listed autopsy rate is for all medical patients; specific autopsy rate for AIDS patients not stated.
(10) Deaths in neonatal ICU were excluded.
(11) As noted by the authors of this study, the lack of Class I errors likely reflects the high proportion of deaths due to cardiac arrest in the ED, making changes in outcome unlikely even in the presence of major unrecognized diagnoses.
Studies with fewer than 50 cases were included only if they involved a specific patient population (e.g., perinatal autopsies, ICU patients, oncology, etc.). Such studies are listed in the table below.
| Institution | Autopsy Rate | Case Mix (Number of Autopsies) | Error Rate (95% CI) |
|---|---|---|---|
| Peter Bent Brigham Hospital (Boston): 1960 51 | 75% | Adult inpatients (100) | 22% (15–32%) |
| Peter Bent Brigham Hospital (Boston): 1970 51 | 71% | Adult inpatients (100) | 23% (15–33%) |
| Zurich University Hospital (Switzerland): 197255 | 94% | Adult medical (100) | 30% (21.4–40.1%) |
| South Lothian District hospitals (Scotland): 1975-7 155,156 | 25% | General inpatients (1,152) | 38% (36–41%) |
| University of Edinburgh teaching hospital (Scotland): 1978 123 | 64% | General inpatients (154) | 15% (10–22%) |
| Peter Bent Brigham Hospital (Boston): 198051 | 38% | Adult inpatients (100) | 21% (14–30%) |
| Zurich University Hospital (Switzerland): 198255 | 89% | Adult medical (100) | 18% (11.3–27.2%) |
| White River Junction VA Medical Center (White River Junction, VT): 198378 | 60% | General inpatients (111) | 12.6% (7.3–20.6%) |
| Chandigarh Postgraduate Institute of Medical Education and Research (India): 1983-88309 | 23–27%12 | Adult inpatients13 (1,000) | 31.7% (28.8–34.7%) |
| 32 US hospitals (21 university and 11 community non-teaching): 1984 70 | 30% | General inpatients (2,067) | 34% (32–36%) |
| Brigham & Women's Hospital (Boston): 1984-85 129 | 37% | General inpatients (175) | 23% (17–30%) |
| Northwestern Memorial Hospital (Chicago): 1985 130 | 36% | General inpatients (142) | 23% (17–31%) |
| Medical Center of Delaware: 1988-91 210 | 17% | General inpatients (145) | 12% (7–18%) |
| Peterborough District Health Authority (England): 1990 310 | 13% | General inpatients (63) | 19% (11–31%) |
| Zurich University Hospital (Switzerland): 199255 | 89% | Adult medical (100) | 14% (8.1–22.7%) |
| University of Pittsburgh Medical Center (Pittsburgh): 1994 152 | 19% | General inpatients (172) | 34% (27–42%) |
| Prince of Wales Hospital (Hong Kong): 1997211 | 17.7% | General inpatients and forensic cases (332) | 23.5% (19.1–28.5%) |
| Mason F. Lord Nursing Home (Baltimore, MD):1981-88303 | 3.5%14 | Nursing home residents (34) | 47.1% (30.2–64.6%) |
| Leiden University Hospital (Netherlands): 1984118 | 47% | Adult medical (133) | 41% (33–50%) |
| Ben Taub General Hospital (Houston, TX): 1992-3 116 | 16% | Medical patients15 (110) | 23.6% (16.3–32.9%) |
| Ben Taub General Hospital (Houston, TX): 1992-3 116 | 16%16 | Patients with AIDS (42) | 33.3% (20.0–49.6%) |
| Salford Health Authority (UK): 1981295 | 35%17 | Inpatients over 85 yrs old (129) | 31% (23.3–39.8%) |
| Beth Israel Hospital (Boston, MA): 1981-83158 | 27% | Adult inpatient deaths after undergoing CPR18 (130) | 13.8% (8.6–21.3%) |
| University of Massachusetts Medical Center (Worcester, MA): 1984-88120 | 32% | Emergency patients19 (244) | 4.1% (2.1–7.6%) |
| Johns Hopkins Hospital (Baltimore, MD): 1985-9594 | 24% | Adult cardiac surgery (147) | 38.8% (31.0–47.2%) |
| Royal Victoria Hospital (Northern Ireland): 1986-88306 | 23% | Adult and pediatric perioperative deaths (213) | 49.8% (42.9–56.6%) |
| Hospital Central de la Cruz Roja (Spain): 1983-85125 | 51% | Adult ICU (100) | 22% (14.6–31.6%) |
| Hartford Hospital (New Haven, CT): 1986-92304 | 29% | Adult surgical ICU (149) | 23% (16.5–30.6%) |
| Hershey Medical Center (Pennsylvania, PA): 1994-5 117 | 31% | Adult ICU (41) | 27% (15–43%) |
| Cleveland Clinic (Cleveland, OH): 1994-952 | 23% | Medical ICU (91) | 19.8% (12.4–29.7%) |
| Gloucestershire Royal Hospital (U.K.): 1996-993 | 40% | ICU patients (97) | 23.7% (15.9–33.6%) |
| University of Rochester Medical Center (Rochester, NY): 1989-94126 | 74% | Pediatric inpatients including forensic cases (157) | 6.4% (3.3–11.7%) |
| Children's Hospital of New Jersey (Newark, NJ): 1992133 | 29% | General Pediatrics20 (23) | 13.0% (3.4–34.7%) |
| University of Texas Medical Branch (Galveston, TX): 1984-88305 | 73% | Adult and pediatric surgical patients (409) | 30.3% (26–35.1%) |
| Lutheran General Children's Hospital (Park Ridge, Ill): 1984-93128 | 36% | Pediatric inpatients (107)21 | 13.1% (7.6–21.3%) |
| Children's Hospital of Western Ontario (Canada): 1985-89307 | 75% | Pediatric Deaths in the Emergency Department (52) | 15.4% (7.3–28.6%) |
| Hospital for Sick Children (Canada): 1985-90124 | 62% | Neonatal ICU (338) | 18.9% (15.0–23.6%) |
| North Shore University Hospital (Manhasset, NY): 1985-92132 | 26% | Pediatric ICU (50) | 28.0% (16.7–42.7%) |
| Ryder Trauma Center (Miami): 1997-98321 | 97% | Adult & pediatric trauma/burn patients dying in ICU (153) | 15.7% (10.5–22.6%) |
| King Edward Memorial Hospital (India): 1995-96319 | 82% | Neonatal ICU (197) | 26.9% (21.0–33.8%) |
(12) Stated range for annual autopsy rates during study period; overall autopsy rate not stated.
(13) Aged 15 years and older.
(14) This value represents an average over the study period, with yearly rates ranging from 1.6–10.8%. In fact, all but the last year had rates under 3%, with the increase in the final year attributed to targeted efforts to increase autopsies.
(15) Patients with Acquired Immunodeficiency Syndrome (AIDS) were reported separately; these data are presented in the line below this entry.
(16) Listed autopsy rate is for all medical patients; specific autopsy rate for AIDS patients not stated.
(17) The autopsy rate is for patients aged 65 and over; the rate specifically for the group over 85 years is unclear.
(18) Study autopsies drawn from inpatients who died after undergoing cardiopulmonary resuscitation (CPR); patients who experienced cardiac arrest outside the hospital, in the Emergency Department or ambulatory clinics were excluded.
(19) Study focused on patients who died in the Emergency Department or within 7 days of hospitalization after admission through the Emergency Department; forensic cases were included.
(20) Deaths in the Emergency Department were excluded.
(21) Deaths in neonatal ICU were excluded.
As explained in the text, “major errors” included missed diagnoses that, while important, likely had no effect on patient outcome. Changes in management could be assessed as possible or even expected, but impacts on outcome for errors in this category were judged to have been “equivocal,” “doubtful,” “unlikely,” or otherwise unexpected. (Errors for which expected impacts on patient outcomes were explicitly restricted to symptom palliation were included in this category.) This category also included the sum of Class I and Class II errors from articles that explicitly reference the classifications of Goldman et al51 or Battle et al.70
| Institution | Autopsy Rate | Case Mix (Number of Autopsies) | Error Rate (95% CI) |
|---|---|---|---|
| Serafimerlasarettet (Sweden): 1970-1122 | 96% | General inpatients (383) | 7% (5–10%) |
| Holstebro Central Hospital (Denmark): 1974 199 | 85% | General inpatients (266) | 10% (6.9–14.6%) |
| University of Bergen (Norway): 1975293 | 75% | General inpatients (742) | 18.7% (16–21.8%) |
| South Lothian District hospitals (Scotland): 1975-7 155,156 | 25% | General inpatients (1,152) | 22% (20–24%) |
| Anonymous community teaching hospital (New York State): 1977-7815 | 37% | General inpatients (130)a | 17.7% (11.8–25.6%) |
| University of Sao Paulo (Brazil): 1978-80311 | 84% | General inpatients (997) | 33.7% (30.8–36.7%)b |
| 39 Connecticut institutions performing autopsies: 198014 | 14% | General (272) | 26.8%c (21.8–32.6%) |
(a) This study was primarily focused on the accuracy of death certificates. Two anonymous hospitals were involved. At one of the hospitals, 130 of the 229 autopsies had death certificates that the treating clinicians had completed prior to autopsy. The data from these 130 autopsies at one of the two study hospitals provide the basis for the rate of diagnostic discrepancies indicated in the table. The autopsy rate listed for this study is based on all of the autopsies, as it is not possible to calculate a rate applicable to just the 130 cases for which the death certificate was completed prior to autopsy.
(b) This outlier result is largely attributable to the high misclassification rate involving cases of Chagas' disease, which was quite prevalent in this Brazilian autopsy series. Clinical diagnosis was regarded as having missed 71 cases of Chagas' disease, so that 71 cases were assigned to some major disease grouping other than “infectious”, most often the “circulatory” category.
(c) This relatively high result partially reflects the shift of all 13 cases in the “ill-defined” category to the trauma category and failure to code 15 deaths as due to alcoholism, all of which then counted as missed “mental disorders.” Excluding all 28 of these cases lowers the observed error rate to 16.5% (95% CI: 12.4–21.6%).
As explained in the text, studies included in this table reported the prevalence of cases for which major clinical and autopsy diagnoses fell in different disease categories in the International Classification of Diseases (ICD). Although this classification of clinical-autopsy discrepancies makes no mention of changes in outcome or treatment, such changes would be likely in many, if not most cases. Moreover, the classification scheme is well defined and likely involves less subjectivity than the other two error categories above.
| Study | Major Findings |
|---|---|
| 75 hospitals in England and Wales (U.K.): 1959127 | Autopsy rates inversely related to clinicians' diagnostic certainty |
| | Rates of disagreement between autopsy and clinical diagnoses increased as clinicians became less certain of the cause of death. |
| | Significant clinical-autopsy disagreements still noted in 16% of cases clinicians rated with the high level of confidence. |
| Serafimerlasarettet (Sweden): 1970-1122 | Rates of autopsy-detected diagnostic errors higher in cases with clinical uncertainty |
| | Major diagnostic discrepancies still noted in 15% of clinically confident cases under 70 years of age and 34% of confident cases over age 70. |
| University of Bergen (Norway): 1975293 | Autopsy rated as “essential” in only 45% of cases identified by clinicians as diagnostically “uncertain” |
| | Clinicians appeared to base their impression of autopsy necessity on more than just diagnostic confidence |
| South Lothian District hospitals (Scotland): 1975-7 155,156 | Autopsy rates and error rates inversely correlated with clinical confidence |
| | Failure to confirm clinical diagnoses still noted in 25% of clinically certain cases |
| One University of Edinburgh teaching hospital from above study (Scotland): 1978 123 | Study again documented that clinicians pursue autopsies less frequently and make fewer diagnostic errors when they are confident of their diagnoses |
| | By contrast, there was no difference in diagnostic error rates between autopsies clinicians viewed as necessary and those they viewed as unnecessary |
| Leiden University Hospital (Netherlands): 1984 118 | Clinicians identified cases as certain (185), probable (54) or uncertain (50). Major diagnostic errors were identified in 4% of the first category, 15% of the second and 46% of the third. |
| Brigham & Women's Hospital (Boston, MA) and Emerson Hospital (Concord, MA): 1984-85 129 | Clinicians' prospective impressions of the likelihood of unexpected findings at autopsy showed no significant relationship to the occurrence of major unexpected findings (Class I or II errors) |
| Institution | Autopsy Rate | Case Mix (Number of Autopsies) | Unsuspected complications (95% CI) |
|---|---|---|---|
| Peter Bent Brigham Hospital (Boston): 196051 | 38% | Adult inpatients (100) | 1% (0.05–6.2%) |
| Peter Bent Brigham Hospital (Boston): 197051 | 38% | Adult inpatients (100) | 3% (0.8–9.2%) |
| Evanston Hospital (Evanston, IL): 1973-82 | Not stated | General inpatients and forensic cases (2,537) | 3.1% (2.5–3.9%) |
| Peter Bent Brigham Hospital (Boston): 198051 | 38% | Adult inpatients (100) | 0% (0–4.6%) |
| Brigham & Women's Hospital (Boston, MA): 1984-85 129 | 37% | Adult inpatients (175) | 0.6% (0.03–3.6%) |
| Emerson Hospital (Concord, MA): 1984-85 129 | 26% | Adult inpatients (58) | 0% (0–7.7%) |
| Hospital for Sick Children (Canada): 1985-90124 | 62% | Neonatal ICU (338) | 15.4% (11.8–19.8%) |
| Johns Hopkins Hospital (Baltimore, MD): 1985-9594 | 24% | Adult cardiac surgery (147) | 1.4% (0.2–5.3%) |
| Royal Alexandra Hospital for Children (Australia): 1991-97308 | 40% | Neonatal ICU (91) | 4.4% (1.4–11.5%) |
| Prince of Wales Hospital (Hong Kong): 1997211 | 17.7% | General inpatients and forensic cases (332) | 3.6% (2.0–6.4%) |
| Central Hospital of Akershus (Norway): 1993-95312 | 78% | Adult medicine (572) | 7.7%d (5.7–10.3%) |
This rate refers to adverse drug events (ADEs) only, not all complications. On the other hand, it is not clear how “unsuspected” these complications were. The authors stated that the autopsy was “decisive for identifying fatal ADEs” in 44 autopsies (from a total of 572).
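This section does not state how the 95% confidence intervals in these tables were computed. As a minimal sketch, the following assumes a Wilson score interval with continuity correction and z = 1.96; both the method and the function name `wilson_cc_interval` are assumptions made for illustration, not the report's documented procedure. Under that assumption, the sketch gives bounds close to those tabulated for the three 100-autopsy Peter Bent Brigham series (1, 3 and 0 unsuspected complications).

```python
import math


def wilson_cc_interval(events, n, z=1.96):
    """Wilson score interval with continuity correction for events/n.

    Assumed method for illustration only; the report does not name
    the interval it used.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = events / n
    q = 1.0 - p
    denom = 2.0 * (n + z * z)
    # Terms under the square roots for the lower and upper bounds.
    lower_root = z * z - 2.0 - 1.0 / n + 4.0 * p * (n * q + 1.0)
    upper_root = z * z + 2.0 - 1.0 / n + 4.0 * p * (n * q - 1.0)
    lower = (2.0 * n * p + z * z - 1.0 - z * math.sqrt(lower_root)) / denom
    upper = (2.0 * n * p + z * z + 1.0 + z * math.sqrt(upper_root)) / denom
    # By convention the bound is pinned when no (or all) cases are events.
    if events == 0:
        lower = 0.0
    if events == n:
        upper = 1.0
    return max(0.0, lower), min(1.0, upper)


for events in (1, 3, 0):
    low, high = wilson_cc_interval(events, 100)
    print(f"{events}/100: {100 * low:.2f}% to {100 * high:.1f}%")
```

Other interval methods (for example, exact binomial intervals) would give somewhat different bounds, especially for the smallest series.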
| Condition | Estimated Sensitivity of Clinical Diagnosis | Source of Estimate |
|---|---|---|
| Acute PE | 87–97% | A prospective clinical registry of 2,454 patients with suspected or confirmed PE included 61 major PE cases first detected at autopsy.177 As a high proportion of the remaining cases were diagnosed with high probability, these data permit a reasonable estimate of the sensitivity of clinical diagnosis in detecting acute PE. (The assumptions required to generate this estimate are discussed in the text.) |
| Acute myocardial infarction | 96–98% | Four large cohort studies of patients presenting to the hospital189–192 have reported that 2–4% with acute MI or unstable angina are clinically missed and inappropriately discharged from the emergency departmente |
| Acute appendicitis | ~98% | This estimate comes from the one autopsy study155,156 that includes data on discharge diagnoses for all patients, not just those who died. The high sensitivity listed here does not take into account whether or not perforation had already occurred by the time the clinical diagnosis was made. |
| | ~90% | This estimate reflects the false-negative rate for physicians assessing patients with acute abdominal pain in the office or ED, i.e., it does not take into account eventual clinical diagnosis by consulting surgeons or other physicians seeing these patients after hospital admission. As explained in the text, the study included a cohort of 1,118 consecutive patients with abdominal pain, 44 of whom proved to have appendicitis.194 |
| Acute dissection of the aorta | 45–90% | Data from two large clinical series indicate that 10–55% of patients with aortic dissection are not identified until autopsy.204,205 |
| Active tuberculosis | >83% | One autopsy study155,156 included data on discharge diagnoses for all patients, not just those who died, allowing estimation of the sensitivity of clinical diagnosis using reasonable assumptions discussed in the text. |
| | 39–96% | Data from a department of health indicated that the diagnosis of TB followed death in 4% of 3,102 reported cases.219 |
As explained in the text, population level studies of “unrecognized MI” were not included here, because the majority of patients with this entity have minimal or no symptoms and so do not seek medical attention.
Legend: These autopsy rates are those reported from all included studies of autopsy-detected diagnostic errors and thus reflect rates from institutions around the world. Appendix Figure 2 shows US data only and is limited to institutions with more than one time point, so that the general trend toward fewer autopsies is more apparent.
NB: This figure does not include the 3 most recent studies,319–321 though these studies were used in the regression analyses and have been included in several of the other figures and tables.
Legend: Trends derived from the following sources: 54 US Hospitals in Professional Activity Study (PAS)313; aggregate US autopsy rate32; 79 university hospitals listed in the American Medical Association directory of approved internships and residencies314; Yale-New Haven Hospital (New Haven, CT)28, 315; Brigham & Women's Hospital (Boston, MA)51, 129
The table on the following page shows additional data on US autopsy rates.
Addendum to Appendix Figure 2: Trends in Autopsy Rates - Studies from a single time period, but involving more than 2 US institutions
| Place/Institution | Observation Period | Mean Autopsy Rate |
|---|---|---|
| 18 States316 | 1967 | 26.7% |
| 19 States (only partial overlap with above)316 | 1968 | 20.2% |
| 1,828 US Hospitals in Professional Activity Study (PAS)317 | 1975 | 21.7% (28.4% for teaching hospitals) |
| 21 university teaching hospitals70 | 1984 | 32% |
| 11 community non-teaching hospitals70 | 1984 | 20% |
| Six Veterans Affairs Hospitals131 | 1986-7 | 43% |
| 256 Pathology departments in College of American Pathologists Q-probe study (49% teaching hospitals)318 | 1993 | 12.4% (median 8.5%) |
The numbering of the autopsy series here corresponds to the vertical axis in Appendix Figure 3 and in Appendix Figure 4
| Autopsy Series | Year | Autopsy Rate | Case Mix (No. of autopsies) | Major Error Rate (95% CI) | Class I Error Rate (95% CI) |
|---|---|---|---|---|---|
| 1. Christian-Albrechts University Hospital (Germany)54 | 1959 | 88% | General inpatients (100) | ---- | 7% (3–14%) |
| 2. Peter Bent Brigham Hospital (Boston)51 | 1960 | 75% | Adult inpatients (100) | 22% (15–32%) | 8% (4–16%) |
| 3. Christian-Albrechts University Hospital (Germany)54 | 1969 | 82% | General inpatients (100) | ---- | 12% (7–20%) |
| 4. Peter Bent Brigham Hospital (Boston)51 | 1970 | 71% | Adult inpatients (100) | 23% (15–33%) | 12% (7–20%) |
| 5. South Lothian District hospitals (Scotland)3,4 | 1975-77 | 25% | General inpatients (1,152) | 38% (36–41%) | ---- |
| 6. University of Edinburgh (Scotland)5 | 1978 | 64% | General inpatients (154) | 15% (10–22%) | ---- |
| 7. Winnipeg Health Sciences Center (Canada)299 | 1978-79 | 26%22 | Adult inpatients (200) | ---- | 1.5% (0.4–4.7%) |
| 8. Royal Adelaide Hospital (Australia)300 | 1979 | 45% | Adult inpatients (99) | ---- | 2.0% (0.4–7.8%) |
| 9. Christian-Albrechts University Hospital (Germany)54 | 1979 | 58% | General inpatients (100) | ---- | 12% (7–20%) |
| 10. Peter Bent Brigham Hospital (Boston)51 | 1980 | 38% | Adult inpatients (100) | 21% (14–30%) | 11% (6–19%) |
| 11. Thirty-nine Connecticut institutions performing autopsies 14 | 1980 | 14% | General autopsies (272) | ---- | 4.0% (2.1–7.3%) |
| 12. Belgrade University School of Medicine (Yugoslavia)301 | 1981-84 | 12% | General inpatients (2,145) | 29% (26.9–30.8%) | ---- |
| 13. White River Junction VA Medical Center 78 | 1983 | 60% | General inpatients (111) | 12.6% (7.3–20.6%) | ---- |
| 14. Chandigarh Postgraduate Institute (India)309 | 1983-88 | 23–27%23 | Adult inpatients24 (1,000) | 31.7% (28.8–34.7%) | ---- |
| 15. 32 U.S. university and community hospitals 70 | 1984 | 30% | General inpatients (2,067)25 | 34% (32–36%) | 13% (12–15%) |
| 16. Brigham & Women's Hospital (Boston)129 | 1984-85 | 37% | Adult inpatients (175) | 23% (17–30%) | 11% (7–17%) |
| 17. Emerson Hospital (Concord, MA)129 | 1984-85 | 26% | Adult inpatients (58) | 33% (21–46%) | 12% (5–24%) |
| 18. Northwestern Memorial Hospital (Chicago)130 | 1985 | 36% | General inpatients (142) | 23% (17–31%) | ---- |
| 19. Medical Center of Delaware 210 | 1988-91 | 17% | General inpatients (145) | 12% (7–18%) | 3% (1–8%) |
| 20. St. Vincent's Hospital (Australia)160 | 1988 | 22% | General inpatients (139) | ---- | 6% (3–11%) |
| 21. Christian-Albrechts University Hospital (Germany)54 | 1989 | 36% | General inpatients (100) | ---- | 11% (6–19%) |
| 22. Peterborough District Health Authority (UK)310 | 1990 | 13% | General inpatients (63) | 19% (11–31%) | ---- |
| 23. University of Pittsburgh Medical Center150 | 1994 | 19% | General inpatients (172) | 34% (27–42%) | ---- |
| 24. Prince of Wales Hospital (Hong Kong)211 | 1997 | 17.7% | General autopsies (332) | 23.5% (19.1–28.5%) | ---- |
| 25. Zurich University Hospital (Switzerland)55 | 1972 | 94% | Medical patients (100) | 30% (21.4–40.1%) | 16% (10–25%) |
| 26. Zurich University Hospital (Switzerland)55 | 1982 | 89% | Medical patients (100) | 18% (11.3–27.2%) | 9% (4–17%) |
| 27. Leiden University Hospital (Netherlands)118 | 1984 | 47% | Medical patients (133) | 41% (33–50%) | ---- |
| 28. Zurich University Hospital (Switzerland)55 | 1992 | 89% | Medical patients (100) | 14% (8.1–22.7%) | 7% (3–14%) |
| 29. Ben Taub General Hospital (Houston)116 26 | 1992-93 | 16% | Medical patients (110) | 23.6% (16.3–32.9%) | 10.9% (6.3–16.8%) |
| 30. Hospital Central de la Cruz Roja (Spain)125 | 1983-8527 | 51% | Adult ICU (100) | 22% (14.6–31.6%) | 7% (3.1–14.4%) |
| 31. Six Department of Veterans Affairs Hospitals131 | 1986-87 | 43% | Medical ICU (172) | 27.9% (21.5–35.3%) | 12% (8–18%) |
| 32. Hartford Hospital (Hartford, CT) 304 | 1986-92 | 29% | Surgical ICU (149) | 23% (16.5–30.6%) | 11% (6.5–17.1%) |
| 33. Cleveland Clinic2 | 1994-95 | 23% | Medical ICU (91) | 19.8% (12.4–29.7%) | ---- |
| 34. Hershey Medical Center (Hershey, PA)117 | 1994-95 | 31% | Medical-Coronary ICU (41) | 27% (15–43%) | ---- |
| 35. Gloucestershire Royal Hospital (UK)3 | 1996-99 | 40% | Adult ICU (97) | 23.7% (15.9–33.6%) | 4.1% (1.3–10.8%) |
| 36. General Hospital Celje (Slovenia)320 | 1998-99 | 47% | Medical ICU (126) | ---- | 9.5% (5.2–16.4%) |
| 37. San Francisco General Hospital and 31 hospitals in Orange County, CA302 28 | 1974-75 | 100% | Motor vehicle fatalities (182) | ---- | 12.6% (8.3–18.6%) |
| 38. University of Texas Medical Branch (Galveston)305 | 1984-88 | 73% | Adult and pediatric surgery (409) | 30.3% (26–35.1%) | 7.8% (5.5–11%) |
| 39. Johns Hopkins Hospital (Baltimore)94 | 1985-95 | 24% | Cardiac surgery (147) | 38.8% (31.0–47.2%) | ---- |
| 40. Royal Victoria Hospital (Northern Ireland)306 | 1986-88 | 23% | Post-operative deaths (213)29 | 49.8% (42.9–56.6%) | 20.7% (15.6–26.8%) |
| 41. Ryder Trauma Center (Miami)321 | 1997-98 | 97% | Adult & pediatric trauma/burn patients dying in ICU30 (153) | 15.7% (10.5–22.6%) | 2.6% (0.8–7.0%) |
| 42. Children's Hospital of New Jersey (Newark)133 | 1992 | 29% | General pediatrics31 (23) | 13.0% (3.4–34.7%) | 4.3% (0.2–24.0%) |
| 43. Lutheran General Children's Hospital (Park Ridge, IL)128 | 1984-93 | 36% | General pediatrics32 (107) | 13.1% (7.6–21.3%) | 6.5% (2.9–13.5%) |
| 44. University of Rochester Medical Center (Rochester, NY)126 | 1989-94 | 74% | General pediatrics (157) | 6.4% (3.3–11.7%) | ---- |
| 45. North Shore University Hospital (Manhasset, NY)132 | 1985-92 | 26% | Pediatric ICU (50) | 28.0% (16.7–42.7%) | 10% (3.7–22.6%) |
| 46. Toronto Hospital for Sick Children (Canada)124 | 1985-90 | 62% | Neonatal ICU (338) | 18.9% (15.0–23.6%) | 2.1% (0.9–4.4%) |
| 47. Royal Alexandra Hospital for Children (Australia)308 | 1991-97 | 40% | Neonatal ICU (91) | ---- | 5.5% (2.0–12.9%) |
| 48. King Edward Memorial Hospital (India)319 | 1995-9633 | 82% | Neonatal ICU (197) | 26.9% (21.0–33.8%) | 12.2% (8.1–17.8%) |
| 49. Salford Health Authority (UK)295 | 1981 | 35%34 | Inpatients over 85 yrs old (129) | 31% (23.3–39.8%) | ---- |
| 50. Mason F. Lord Nursing Home (Baltimore)303 35 | 1981-88 | 3.5% | Nursing home residents (34) | 47.1% (30.2–64.6%) | 32.4% (18–50.6%) |
| 51. Beth Israel Hospital (Boston)158 | 1981-83 | 27% | Adult inpatients post-CPR36 (130) | 13.8% (8.6–21.3%) | ---- |
| 52. University of Massachusetts Medical Center (Worcester, MA)120 | 1984-88 | 32% | Deaths in ED37 (244) | 4.1% (2.1–7.6%) | 1.6% (0.5–4.4%) |
| 53. Children's Hospital of Western Ontario (Canada)307 | 1985-89 | 75% | Pediatric Deaths in ED (52) | 15.4% (7.3–28.6%) | 0% (0–8.6%) |
| 54. Ben Taub General Hospital (Houston)116 | 1992-93 | 16%38 | Adult inpatients with AIDS (42) | 33.3% (20.0–49.6%) | 9.5% (3.1–23.6%) |
Rate reflects “teaching cases” only, with medicolegal cases excluded; overall autopsy rate for this period including medicolegal/forensic cases was 44%.
Stated range for annual autopsy rates during study period; overall autopsy rate not stated.
Aged 15 years and older.
Not clear how many pediatric patients included as study states only that 2,067 autopsies reviewed included 360 cases under 20 years of age.
Patients with Acquired Immunodeficiency Syndrome (AIDS) were reported separately — series 54, the last entry in this table.
May 1983-Dec. 1985; in calculating the midpoint, the value was rounded to the nearest year or mid-year, giving 1984.5.
The authors of this study were focused primarily on demonstrating the difference in quality of care between a region with a Level 1 trauma center (San Francisco County) and a region (Orange County) where trauma patients are sent to nearest available facility. Chart reviews were conducted by the authors and were not blinded to patient location. Because of the significant potential for bias introduced by non-blinded peer review, the results for the two regions were combined rather than presenting separate Class I error rates for the two regions.
“Perioperative” was defined as deaths during surgery or within 30 days after surgery. Thus, this sample essentially represents deaths among patients who have undergone surgery, as opposed to patients who die on the surgical service without having undergone operation.
Age range 9–104; mean = 50 y.o.
Deaths in the Emergency Department were excluded.
Deaths in neonatal ICU were excluded.
July 1995-August 1996, so only one year, but midpoint is January 1996.
The listed autopsy rate is for patients aged 65 and over; the rate specifically for the group over 85 years is unclear.
This value represents an average over the study period, with yearly rates ranging from 1.6–10.8%. In fact, all but the last year had rates under 3%, with the increase in the final year attributed to targeted efforts to increase autopsies.
Study autopsies drawn from inpatients who died after undergoing cardiopulmonary resuscitation (CPR); patients who experienced cardiac arrest outside the hospital, in the Emergency Department or ambulatory clinics were excluded.
Study focused on patients who died in the Emergency Department (ED) or within 7 days of hospitalization after admission through the ED; forensic cases were included.
Listed autopsy rate is for all medical patients; specific autopsy rate for AIDS patients not stated.
The numbering of the autopsy series here corresponds to the vertical axis in Appendix Figure 4
| Institution | Autopsy Rate | Case Mix (Number of Autopsies) | Error Rate (95% CI) |
|---|---|---|---|
| 1. Peter Bent Brigham Hospital (Boston): 196051 | 75% | Adult inpatients (100) | 22% (15–32%) |
| 2. Peter Bent Brigham Hospital (Boston): 197051 | 71% | Adult inpatients (100) | 23% (15–33%) |
| 3. South Lothian District hospitals (Scotland): 1975-7155,156 | 25% | General inpatients (1152) | 38% (36–41%) |
| 4. University of Edinburgh teaching hospital (Scotland): 1978123 | 64% | General inpatients (154) | 15% (10–22%) |
| 5. Peter Bent Brigham Hospital (Boston): 198051 | 38% | Adult inpatients (100) | 21% (14–30%) |
| 6. Belgrade University School of Medicine (Yugoslavia): 1981-84301 | 12% | General inpatients (2,145) | 29% (26.9–30.8%) |
| 7. White River Junction VA Medical Center (White River Junction, VT): 198378 | 60% | General inpatients (111) | 12.6% (7.3–20.6%) |
| 8. Chandigarh Postgraduate Institute of Medical Education and Research (India): 1983-88309 | 23–27% | Adult inpatients (1000) | 31.7% (28.8–34.7%) |
| 9. 32 US hospitals (21 university and 11 community non-teaching): 198470 | 30% | General inpatients (2067) | 34% (32–36%) |
| 10. Brigham & Women's Hospital (Boston): 1984-85129 | 37% | General inpatients (175) | 23% (17–30%) |
| 11. Emerson Hospital (Concord, MA): 1984-85129 | 26% | General inpatients (58) | 33% (21–46%) |
| 12. Northwestern Memorial Hospital (Chicago): 1985130 | 36% | General inpatients (142) | 23% (17–31%) |
| 13. Medical Center of Delaware: 1988-91210 | 17% | General inpatients (145) | 12% (7–18%) |
| 14. Peterborough District Health Authority (England): 1990310 | 13% | General inpatients (63) | 19% (11–31%) |
| 15. University of Pittsburgh Medical Center (Pittsburgh): 1994152 | 19% | General inpatients (172) | 34% (27–42%) |
| 16. Prince of Wales Hospital (Hong Kong): 1997211 | 17.7% | General inpatients and forensic cases (332) | 23.5% (19.1–28.5%) |
| 17. Zurich University Hospital (Switzerland): 197255 | 94% | Adult medical (100) | 30% (21.4–40.1%) |
| 18. Mason F. Lord Nursing Home (Baltimore, MD): 1981-88303 | 3.5% | Nursing home residents (34) | 47.1% (30.2–64.6%) |
| 19. Zurich University Hospital (Switzerland): 198255 | 89% | Adult medical (100) | 18% (11.3–27.2%) |
| 20. Leiden University Hospital (Netherlands): 1984118 | 47% | Adult medical (133) | 41% (33–50%) |
| 21. Zurich University Hospital (Switzerland): 199255 | 89% | Adult medical (100) | 14% (8.1–22.7%) |
| 22. Ben Taub General Hospital (Houston, TX): 1992-3116 | 16% | Medical patients (110) | 23.6% (16.3–32.9%) |
| 23. Ben Taub General Hospital (Houston, TX): 1992-3116 | 16% | Patients with AIDS (42) | 33.3% (20.0–49.6%) |
| 24. Salford Health Authority (UK): 1981295 | 35% | Inpatients over 85 yrs old (129) | 31% (23.3–39.8%) |
| 25. Beth Israel Hospital (Boston, MA): 1981-83158 | 27% | Adult inpatient deaths after undergoing CPR (130) | 13.8% (8.6–21.3%) |
| 26. University of Massachusetts Medical Center (Worcester, MA): 1984-88120 | 32% | Emergency patients (244) | 4.1% (2.1–7.6%) |
| 27. Johns Hopkins Hospital (Baltimore, MD): 1985-9594 | 24% | Adult cardiac surgery (147) | 38.8% (31.0–47.2%) |
| 28. Royal Victoria Hospital (Northern Ireland): 1986-88306 | 23% | Adult and pediatric perioperative deaths (213) | 49.8% (42.9–56.6%) |
| 29. Hospital Central de la Cruz Roja (Spain): 1983-85125 | 51% | Adult ICU (100) | 22% (14.6–31.6%) |
| 30. Hartford Hospital (Hartford, CT): 1986-92304 | 29% | Adult surgical ICU (149) | 23% (16.5–30.6%) |
| 31. Hershey Medical Center (Hershey, PA): 1994-5117 | 31% | Adult ICU (41) | 27% (15–43%) |
| 32. Cleveland Clinic (Cleveland, OH): 1994-952 | 23% | Medical ICU (91) | 19.8% (12.4–29.7%) |
| 33. Gloucestershire Royal Hospital (U.K.): 1996-993 | 40% | ICU patients (97) | 23.7% (15.9–33.6%) |
| 34. University of Rochester Medical Center (Rochester, NY): 1989-94126 | 74% | Pediatric inpatients including forensic cases (157) | 6.4% (3.3–11.7%) |
| 35. Children's Hospital of New Jersey (Newark, NJ): 1992133 | 29% | General Pediatrics (23) | 13.0% (3.4–34.7%) |
| 36. University of Texas Medical Branch (Galveston, TX): 1984-88305 | 73% | Adult and pediatric surgical patients (409) | 30.3% (26–35.1%) |
| 37. Lutheran General Children's Hospital (Park Ridge, IL): 1984-93128 | 36% | Pediatric inpatients (107) | 13.1% (7.6–21.3%) |
| 38. Children's Hospital of Western Ontario (Canada): 1985-89307 | 75% | Pediatric Deaths in the Emergency Department (52) | 15.4% (7.3–28.6%) |
| 39. Hospital for Sick Children (Canada): 1985-90124 | 62% | Neonatal ICU (338) | 18.9% (15.0–23.6%) |
| 40. North Shore University Hospital (Manhasset, NY): 1985-92132 | 26% | Pediatric ICU (50) | 28.0% (16.7–42.7%) |
Legend: Class I error rates reported in autopsy series included in review without adjustment for variations in autopsy rate or case mix.
NB: This figure does not include the 3 most recent studies,319–321 though these studies were used in the regression analyses and have been included in several of the other figures and tables.
Legend: Major error rates reported in autopsy series included in review without adjustment for variations in autopsy rate or case mix.
NB: This figure does not include the 3 most recent studies,319–321 though these studies were used in the regression analyses and have been included in several of the other figures and tables.
Legend: Class I error rates derived from regression model over a range of autopsy rates in 4 different time periods with a case mix consisting of general inpatients.
Legend: The above scatter plot graphs the observed error rate against sample size for studies reporting Class I errors. For studies of therapeutic interventions, an inverted funnel distribution is expected if there is no publication bias, as the reported effect sizes are expected to be symmetrically distributed about the true effect. In this case, the inclusion of studies with different patient populations introduces additional asymmetry (e.g., the data point furthest to the right represents the only study involving nursing home patients). Therefore, we constructed a funnel plot limited to studies reporting autopsies among general inpatients (Appendix Figure 9b).
NB: Figures 9a & 9b and Figures 10a & 10b do not include the 3 most recent studies,319–321 though these studies were used in the regression analyses and have been included in several of the other figures and tables.
Legend: This scatter plot appears more symmetric than that in Appendix Figure 9a. However, because the issue of asymmetry due to inclusion of different patient populations still applies, a separate plot involving general inpatient studies only was again created (Appendix Figure 10b).
NB: Figures 9a & 9b & Figures 10a & 10b do not include the 3 most recent studies,319–321 though these studies were used in the regression analyses and have been included in several of the other figures and tables.
Legend: Restricting the funnel plot to studies reporting general inpatients (as shown above) indicates a relative absence of large studies with low error rates and small studies with high error rates. As explained in the text, the present review differs from systematic reviews of therapeutic studies in that publication bias might be operating in opposing directions (as suggested here). On the one hand, studies reporting very low error rates might be considered uninteresting and subject to publication bias in a manner similar to “negative” therapeutic trials. On the other hand, studies reporting high error rates might be self-censored by the institutions themselves. It is even possible that institutions with high error rates are less likely to conduct these types of studies in the first place (e.g., because they have less interest in performance measurement and improvement).
The error rate was modeled from country, time, case mix and autopsy rate using a logistic model with a random study effect. Case mix and country were treated as categorical effects. More specifically, if $X_1, X_2, \ldots$ are the above predictor variables and $N_e$ is the number of errors found for a study with $N_a$ autopsies, then the error rate $p$ ($N_e / N_a$) is modeled as $p = 1 / (1 + e^{-\lambda})$, where $\lambda = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + u$, with the random study effect, $u$, having a normal distribution with mean 0 and variance $\sigma^2$. Computations were done using the SAS NLMixed procedure. [SAS software, version 8.2, SAS Institute: Cary, NC.]
In the tables and analysis below, Case Mix (CM) was categorized as follows:
General inpatients or general adult inpatients
Adult medical
Adult ICU
Adult or pediatric surgery
Pediatric inpatients
Neonatal or pediatric ICU
Other
Country was treated as a dichotomous variable, with U.S. studies assigned the value –1 and non-U.S. studies the value +1.
Time was defined as the midpoint of the study period (using whole or half years) and centered with 1980 as T=0. Thus a study reporting autopsies over the calendar year 1979 would be centered at –0.5; a study with a 2-year study period including 1982 and 1983 would be centered at +3.
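The centering rule for time can be sketched as follows. This is a minimal illustration that assumes study periods are given as whole calendar years; the function name `centered_time` and its parameters are hypothetical and do not come from the report.

```python
def centered_time(first_year, last_year, origin=1980):
    """Midpoint of a study period, rounded to the nearest half year
    and expressed relative to `origin` (1980 corresponds to T = 0).

    Assumes the period runs from the start of `first_year` to the
    end of `last_year`.
    """
    if last_year < first_year:
        raise ValueError("last_year must not precede first_year")
    # A period covering calendar years a..b spans a.0 to (b + 1).0.
    midpoint = (first_year + (last_year + 1)) / 2.0
    # Round to whole or half years, as described in the text.
    midpoint = round(midpoint * 2.0) / 2.0
    return midpoint - origin


print(centered_time(1979, 1979))  # single calendar year 1979
print(centered_time(1982, 1983))  # two-year period 1982-1983
```

The two calls reproduce the examples in the text: the calendar year 1979 is centered at –0.5 and the period 1982-83 at +3.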
Autopsy rates were also centered, using the unweighted mean autopsy rate for the included studies, which was 44.3% for studies reporting Class I and major errors.
For each of the error definitions (Class I, major errors, discrepant major ICD disease classifications), the analysis began with a model that included all predictors: study period, autopsy rate, case mix (general adult inpatients, medical patients, surgical patients, pediatric, etc.) and country (U.S., non-U.S.). We then compared this model with models in which one variable at a time was dropped.
The differences between the results produced by these models and the more complete model above were not statistically significant for Class I errors, but they were for the other two error definitions (data not shown). Even for Class I errors, the contributions of these other factors (autopsy rate, country, case mix) were noticeable, even if not statistically significant, and were clearly plausible. Therefore, we used the more complete model to compute the mean error rate and the range of error rates shown in the analyses and tables below for all three error definitions.
| Variable | Estimate | Standard Error | Odds Ratio | Lower 95% CI | Upper 95% CI | P-value |
|---|---|---|---|---|---|---|
| Intercept | -2.2622 | 0.2175 | 0.10412 | 0.06699 | 0.1618 | <.0001 |
| Time | -0.03041 | 0.01825 | 0.97005 | 0.9348 | 1.0066 | 0.1044 |
| Autopsy rate | -0.8122 | 0.6007 | 0.44389 | 0.1313 | 1.5008 | 0.1848 |
| Country | -0.09338 | 0.1322 | 0.91085 | 0.6967 | 1.1909 | 0.4844 |
| CM 2 vs 1 | 0.4969 | 0.3513 | 1.64370 | 0.8061 | 3.3516 | 0.1658 |
| CM 3 vs 1 | 0.1985 | 0.3121 | 1.21955 | 0.6476 | 2.2968 | 0.5289 |
| CM 4 vs 1 | 0.3589 | 0.3449 | 1.43173 | 0.7114 | 2.8815 | 0.3050 |
| CM 5 vs 1 | -0.3306 | 0.6389 | 0.71850 | 0.1966 | 2.6254 | 0.6080 |
| CM 6 vs 1 | 0.03348 | 0.3634 | 1.03405 | 0.4948 | 2.1610 | 0.9271 |
| CM 7 vs 1 | -0.4348 | 0.4007 | 0.64742 | 0.2873 | 1.4592 | 0.2851 |
I. Time Trend
Coefficient for Year = -0.03041 → -0.3041 to use decade as unit of analysis instead of single year
Odds ratio = exp (–0.3041)= 0.7378
<Subtract above from 1.0 to obtain relative decreases>
→ 0.2622
Class I error rates showed relative decrease of 26.2% per decade (p=0.1044)
95% CI = exp (-0.3041 ± 2*0.1825)
= exp (-0.3041 ± 0.3650)
= exp (0.0609, -0.6691)
= (1.0628,0.5122)
<Subtract above from 1.0 to obtain relative decreases>
→ (-0.0628,0.4878)
→ Class I error rates showed relative decrease of 26.2% (95% CI: 48.8% decrease to 6.3% increase) per decade
II. Relationship to Autopsy Rate
Coefficient Autopsy Rate = -0.8122
→ -0.8122/10 = -0.08122 to calculate relationship as per 10% change in Autopsy Rate
Odds ratio = exp (-0.08122) = 0.9220
<Subtract above from 1.0 to obtain relative decrease>
→ 0.0780
For every 10% increase in autopsy rate, Class I error rate decreased by 7.8% (p=0.1848)
95% CI = exp (-0.08122 ± 2*0.6007/10)
= exp (-0.08122 ± 0.12014)
= exp (-0.20136, 0.03892)
= (0.8176, 1.04)
<Subtract above from 1.0 to obtain relative decreases>
→ (0.1824, -0.0400)
→ Class I error rate exhibited 7.8% relative decrease (95% CI: 18.2% decrease to 4.0% increase) for each 10% increase in autopsies.
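The arithmetic in parts I and II can be checked with a short sketch. It follows the text in using a multiplier of 2 (rather than 1.96) for the 95% CI and in reading 1 minus the odds ratio as the relative decrease in error rate. The function name `relative_decrease` and the `scale` parameter are hypothetical; the coefficients and standard errors are the ones used in the calculations above.

```python
import math


def relative_decrease(coef, se, scale, z=2.0):
    """Return (point, lower, upper) relative decrease implied by a
    logit coefficient after rescaling its unit by `scale`.

    Negative values indicate a relative increase.
    """
    b = coef * scale
    s = se * scale
    point = 1.0 - math.exp(b)
    bound_a = 1.0 - math.exp(b + z * s)
    bound_b = 1.0 - math.exp(b - z * s)
    return point, min(bound_a, bound_b), max(bound_a, bound_b)


# I. Time trend: per-year coefficient rescaled to one decade.
per_decade = relative_decrease(-0.03041, 0.01825, scale=10)
# II. Autopsy rate: coefficient per unit (100%) rescaled to 10 points.
per_10_points = relative_decrease(-0.8122, 0.6007, scale=0.1)

for label, (point, low, high) in (("decade", per_decade),
                                  ("10% autopsy rate", per_10_points)):
    print(f"per {label}: {100 * point:.1f}% "
          f"({100 * low:.1f}% to {100 * high:.1f}%)")
```

Because the coefficients act on the odds, reading them as changes in the error rate itself is an approximation that works best when error rates are low.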
III. Calculation of “mean” Class I error rate
Because error rates varied with time and autopsy rate (as well as case mix, though to a lesser extent), a true “mean” error rate does not exist. However, we can estimate a “base error rate,” if the predictor variables are all set to their base values (i.e., time=1980, autopsy rate = mean rate of 44.3%, country=U.S., and case mix=general autopsies).
A point estimate for the base error rate can then be obtained from the regression equation,
Prob (Class I error) = $1 / (1 + e^{-\lambda})$, where $\lambda = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + u$
Intercept= -2.2622 and, for the base probability, the terms for time, autopsy rate and case mix all equal zero, because the equation is centered on these values.
Therefore,
λ= -2.2622 + (1980-1980)(–0.03041) + (-0.8122)(0.443–0.443) + (-0.09338)(-1), where the –1 in the country term reflects the value assigned to U.S. (non-U.S.=+1).
Thus, λ = -2.16882, so that base prob (Class I error) = $1 / (1 + e^{-(-2.16882)})$
→ Base prob (Class I error) = 0.10258→ 10.2%
Base error and 95% CI obtained from software were: 0.1023 and (0.06701, 0.1532)
Therefore, base rate of Class I errors was: 10.2% (95% CI: 6.7–15.3%)
Because the trends over time and the relationship to autopsy rate were not statistically significant for Class I errors, this base error rate provides a reasonable overall mean rate for Class I errors. It is clear, though (as shown in the table below), that the effects of study period (time) and autopsy rate are noticeable, even if not statistically significant for Class I errors. Therefore, a true “mean error rate” is not meaningful in the absence of stipulated values for time and autopsy rate. The table below shows how the Class I error rate varies with time and autopsy rate using the regression model above, with country equal to U.S. and case mix equal to general autopsies.
| Autopsy Rate | 1970 | 1980 | 1990 | 2000 |
|---|---|---|---|---|
| 5% | 17.6% | 13.6% | 10.4% | 7.9% |
| 10% | 17.0% | 13.1% | 10.0% | 7.6% |
| 15% | 16.5% | 12.7% | 9.7% | 7.3% |
| 20% | 15.9% | 12.2% | 9.3% | 7.1% |
| 25% | 15.4% | 11.8% | 9.0% | 6.8% |
| 30% | 14.9% | 11.4% | 8.7% | 6.5% |
| 40% | 13.9% | 10.6% | 8.0% | 6.1% |
| 50% | 12.9% | 9.9% | 7.5% | 5.6% |
| 60% | 12.0% | 9.2% | 6.9% | 5.2% |
| 70% | 11.2% | 8.5% | 6.4% | 4.8% |
| 80% | 10.4% | 7.9% | 5.9% | 4.5% |
| 100% | 9.0% | 6.8% | 5.1% | 3.8% |
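The entries in the table above can be approximated directly from the regression equation. The following is a minimal sketch, assuming the coefficients quoted in the worked calculation (intercept, time, autopsy rate and country) and case mix terms equal to zero; the function name `class1_error_prob` is illustrative and not part of the original analysis. Results may differ from the table in the last digit because the published coefficients are rounded.

```python
import math

# Coefficients copied from the worked calculation above (Class I errors, all countries).
INTERCEPT = -2.2622
B_YEAR = -0.03041      # per calendar year, centered at 1980
B_AUTOPSY = -0.8122    # per unit autopsy rate (as a proportion), centered at 0.443
B_COUNTRY = -0.09338   # country coded U.S. = -1, non-U.S. = +1


def class1_error_prob(year, autopsy_rate, country=-1):
    """Predicted probability of a Class I error for general autopsies (case mix terms = 0)."""
    lam = (INTERCEPT
           + B_YEAR * (year - 1980)
           + B_AUTOPSY * (autopsy_rate - 0.443)
           + B_COUNTRY * country)
    return 1.0 / (1.0 + math.exp(-lam))


# Print a few rows of the grid, in percent, for U.S. general autopsies.
years = (1970, 1980, 1990, 2000)
print("rate   " + "   ".join(str(y) for y in years))
for rate in (0.05, 0.25, 0.50, 1.00):
    row = "   ".join(f"{100 * class1_error_prob(y, rate):4.1f}" for y in years)
    print(f"{int(rate * 100):3d}%   {row}")
```

Calling `class1_error_prob(1980, 0.443)` reproduces the base error rate, since every centered term is then zero except the country term.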
| Variable | Estimate | Standard Error | Odds Ratio | Lower 95% CI | Upper 95% CI | P-value |
|---|---|---|---|---|---|---|
| Intercept | -2.0678 | 0.2501 | 0.12647 | 0.07477 | 0.2139 | <.0001 |
| Time | -0.05501 | 0.02860 | 0.94647 | 0.8913 | 1.0051 | 0.0704 |
| Autopsy rate | -1.6629 | 1.2590 | 0.18958 | 0.01346 | 2.6703 | 0.2031 |
| CM 2 vs 1 | 0.1857 | 0.5808 | 1.20405 | 0.3554 | 4.0796 | 0.7529 |
| CM 3 vs 1 | 0.3223 | 0.3966 | 1.38027 | 0.6000 | 3.1754 | 0.4270 |
| CM 4 vs 1 | 0.5144 | 0.6996 | 1.67261 | 0.3846 | 7.2734 | 0.4717 |
| CM 5 vs 1 | -0.2763 | 0.5884 | 0.75861 | 0.2204 | 2.6113 | 0.6443 |
| CM 6 vs 1 | 0.01320 | 0.6391 | 1.01329 | 0.2646 | 3.8802 | 0.9837 |
| CM 7 vs 1 | -0.3228 | 0.4533 | 0.72410 | 0.2794 | 1.8769 | 0.4855 |
Coefficient for Time is –0.05501
→ -0.5501 to use decade as unit of analysis instead of single year
odds ratio = exp (–0.5501)= 0.57689
<Subtract above from 1.0 to obtain relative decrease>
→ 0.42311
→ Error rate showed relative decrease of 42.3% per decade, but this relationship was not statistically significant (p=0.07)
Coefficient Autopsy Rate = -1.6629
→ -1.6629/10 = -0.16629 to calculate relationship as per 10% change in Autopsy Rate
odds ratio = exp (-0.16629) = 0.8468
<Subtract above from 1.0 to obtain relative decrease>
→ For every 10% increase in autopsy rate, error rate decreases by approximately 15.3%, but this relationship was not statistically significant (p=0.2).
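The two conversions above follow the same pattern: rescale the coefficient to the unit of interest, exponentiate to get an odds ratio, and subtract 1. A minimal sketch, assuming the U.S.-only Class I coefficients quoted above; the helper name `relative_change` is illustrative. Strictly, the result is a relative change in the odds of an error, which is close to the relative change in the error rate when the rate is low.

```python
import math


def relative_change(coef, units=1.0):
    """Relative change in the odds for a `units`-sized change in the predictor.

    Negative values are decreases; e.g. -0.42 means a 42% relative decrease.
    """
    return math.exp(coef * units) - 1.0


# The time coefficient is per year, so a decade is 10 units.
per_decade = relative_change(-0.05501, units=10)
# Autopsy rate is a proportion, so a 10-percentage-point change is 0.1 units.
per_10_points = relative_change(-1.6629, units=0.1)

print(f"per decade: {per_decade:+.1%}")
print(f"per 10-point rise in autopsy rate: {per_10_points:+.1%}")
```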
Prob (Class I error) = $1 / (1 + e^{-\lambda})$, where $\lambda = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + u$
Intercept= -2.0678
For the base probability, the time, autopsy rate and case mix terms all equal zero, and there is no country term in the analysis restricted to the U.S. only, so
λ = -2.0678
Therefore,
→ Base prob (Class I error) = $1 / (1 + e^{-(-2.0678)})$
→ Base prob (Class I error) = 0.1123 → 11.2%
The value calculated using the statistical software corroborated this estimate and provided the corresponding confidence interval.
Thus, the mean Class I error rate using data from U.S. only is 11.2% (95% CI: 6.9–17.5%).
In the model adjusting for study period, variations in autopsy rates, differences in case mix and study country (U.S. vs. non-U.S.), the probability of the autopsy detecting a Class I error in a given case was 10.2% (95% CI: 6.7–15.3%). Restricting the analysis to data from U.S. institutions only yielded a similar point estimate, but a slightly wider confidence interval: 11.2% (95% CI: 6.9–17.5%).
The expected inverse correlation between error rate and study period (i.e., the more recent the study the lower the error rate) was modest and not statistically significant. Specifically, the probability of a Class I error showed a relative decrease of 28.0% per decade (p=0.1; 95% CI: 48.8% decrease to 6.3% increase).
The expected inverse correlation between error rate and autopsy rate (i.e., the higher the autopsy rate, the lower the error rate) was relatively weak and not statistically significant. Specifically, for every 10% increase in autopsies, the Class I error rate exhibited a relative decrease of 7.8% (p=0.2).
| Variable | Estimate | Standard Error | Odds Ratio | Lower 95% CI | Upper 95% CI | P-value |
|---|---|---|---|---|---|---|
| Intercept | -0.9773 | 0.1238 | 0.37633 | 0.2931 | 0.4831 | <.0001 |
| a_rate | -1.2846 | 0.3211 | 0.27677 | 0.1448 | 0.5291 | 0.0003 |
| year | -0.03288 | 0.01131 | 0.96766 | 0.9458 | 0.9900 | 0.0058 |
| country | 0.08833 | 0.07323 | 1.09235 | 0.9423 | 1.2663 | 0.2345 |
| cm2 | 0.2505 | 0.2024 | 1.28464 | 0.8538 | 1.9329 | 0.2228 |
| cm3 | 0.09321 | 0.1796 | 1.09770 | 0.7639 | 1.5773 | 0.6065 |
| cm4 | 0.7518 | 0.1969 | 2.12083 | 1.4254 | 3.1555 | 0.0004 |
| cm5 | -0.7406 | 0.2842 | 0.47680 | 0.2687 | 0.8461 | 0.0126 |
| cm6 | 0.01956 | 0.2773 | 1.01975 | 0.5827 | 1.7847 | 0.9441 |
| cm7 | -0.2192 | 0.1781 | 0.80320 | 0.5607 | 1.1506 | 0.2253 |
I. Time Trend
Coefficient for Time is –0.03288 → -0.3288 to use decade as unit of analysis instead of single year
Odds ratio = exp (–0.3288 ) = 0.719787
<Subtract above from 1.0 to obtain relative decrease>
→ 0.2802
→ Major error rates showed decrease of 28.0% per decade (p=0.0058) relative to the base time period of 1980.
Coefficient Autopsy Rate = -1.2846
10% change → -1.2846/10= -0.12846
Odds ratio = exp (-0.12846) = 0.879449
<Subtract above from 1.0 to obtain relative decrease>
→ 0.120551
For every 10% increase in autopsy rate, major error rate decreases by 12.0% (p=0.0003)
The models dropping case mix, country and autopsy rate (one at a time) produced results with statistically significant differences from the above. Consequently, we used Model 1 (including autopsy rate, country and case mix as predictors of major errors) to compute the range of error rates shown in the table below and the mean major error rate of 25.6% (95% CI: 20.8–31.2%). The point estimate can be obtained manually using the calculation shown below and was compared with the value generated by the statistical software. (The software was required to calculate the confidence interval.)
Prob (major error) = $1 / (1 + e^{-\lambda})$, where $\lambda = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + u$
Substituting in Intercept (β0)= -0.9773
and base values of time (1980), autopsy rate (overall mean rate of 44.3%), country (U.S.=-1) and case mix (general autopsies, so that case mix terms CM2,…,CM7 all equal zero), then
λ = -0.9773 + (–0.03288)(1980-1980) + (-1.2846)(0.443–0.443) + (0.08833)(-1)
λ = -1.06563
→ Base prob (major error) = $1 / (1 + e^{-(-1.06563)})$
→ Base prob (major error) = 0.256235 → 25.6%
Calculation performed with statistical software confirmed this estimate and provided the corresponding confidence interval of (0.2077, 0.3116).
Thus, base probability of major error was: 25.6% (95% CI: 20.8–31.2%)
| Autopsy Rate | 1970 | 1980 | 1990 | 2000 |
|---|---|---|---|---|
| 5% | 44.3% | 36.4% | 29.2% | 22.9% |
| 10% | 42.7% | 34.9% | 27.9% | 21.8% |
| 15% | 41.2% | 33.5% | 26.6% | 20.7% |
| 20% | 39.6% | 32.1% | 25.4% | 19.7% |
| 25% | 38.1% | 30.7% | 24.2% | 18.7% |
| 30% | 36.6% | 29.4% | 23.0% | 17.7% |
| 40% | 33.7% | 26.8% | 20.8% | 15.9% |
| 50% | 30.9% | 24.3% | 18.8% | 14.3% |
| 60% | 28.2% | 22.0% | 16.9% | 12.8% |
| 70% | 25.7% | 19.9% | 15.2% | 11.4% |
| 80% | 23.3% | 17.9% | 13.6% | 10.2% |
| 100% | 19.0% | 14.5% | 10.8% | 8.1% |
| Variable | Estimate | Standard Error | Odds Ratio | Lower 95% CI | Upper 95% CI | P-value |
|---|---|---|---|---|---|---|
| Intercept | -1.1499 | 0.2002 | 0.31666 | 0.2098 | 0.4779 | <.0001 |
| Time | -0.02460 | 0.01844 | 0.97570 | 0.9394 | 1.0134 | 0.1937 |
| Autopsy rate | -1.0411 | 0.5382 | 0.35306 | 0.1168 | 1.0673 | 0.0640 |
| CM 2 vs 1 | -0.00791 | 0.4640 | 0.99212 | 0.3823 | 2.5750 | 0.9865 |
| CM 3 vs 1 | 0.1429 | 0.2625 | 1.15363 | 0.6726 | 1.9787 | 0.5908 |
| CM 4 vs 1 | 0.6615 | 0.3243 | 1.93774 | 0.9950 | 3.7738 | 0.0516 |
| CM 5 vs 1 | -0.7628 | 0.3381 | 0.46637 | 0.2328 | 0.9344 | 0.0327 |
| CM 6 vs 1 | 0.2219 | 0.4953 | 1.24850 | 0.4510 | 3.4561 | 0.6578 |
| CM 7 vs 1 | -0.2093 | 0.2599 | 0.81118 | 0.4754 | 1.3841 | 0.4281 |
Coefficient for Time is –0.0246 → -0.246 to use decade as unit of analysis instead of single year
Odds ratio = exp (–0.246)= 0.78192
→ Error rate exhibited relative decrease of 21.8% per decade, but this relationship was not statistically significant (p=0.2).
Coefficient Autopsy Rate = -1.0411
10% change → -1.0411/10 = -0.10411
Odds ratio = exp (-0.10411) = 0.90113
For every 10% increase in autopsy rate, the major error rate decreased by 9.9% (p=0.06)
In the model adjusting for study period, variations in autopsy rates, differences in case mix and study country (U.S. vs. non-U.S.), the base probability of the autopsy detecting a major error in a given case was 25.6% (95% CI: 20.8–31.2%). Using data from U.S. institutions only, the base probability of the autopsy detecting a major error was slightly lower at 24.0%, but with an almost entirely overlapping confidence interval (95% CI: 17.3–32.3%).
The expected inverse correlation between error rate and autopsy rate (i.e., the higher the autopsy rate, the lower the error rate) was relatively weak, but in contrast to the results for Class I errors, this relationship was statistically significant. Specifically, for every 10% increase in the autopsy rate, the major error rate decreased by 12.0% (95% CI: 6.2–17.5%).
The expected inverse correlation between error rate and study period (i.e., the more recent the study the lower the error rate) was modest and, in contrast to the results for Class I errors, this relationship was statistically significant. Specifically, the probability of a major error exhibited a relative decrease of 28.0% per decade (95% CI: 9.8–42.6%).
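The confidence intervals quoted above for the relative decreases in major errors can be approximated from the odds-ratio limits in the major-error regression table (rows `a_rate` and `year`). A minimal sketch, assuming those published limits; the function names are illustrative. Because the table values are rounded, the output agrees with the reported intervals only to within a few tenths of a percentage point.

```python
def rescale_odds_ratio(odds_ratio, units):
    """Odds ratio for a `units`-sized change, given the odds ratio for a 1-unit change."""
    return odds_ratio ** units


def relative_decrease_ci(or_lower, or_upper, units):
    """Return (smallest, largest) relative decrease implied by the odds-ratio limits.

    Assumes `units` is positive, so the upper odds-ratio limit gives the smallest decrease.
    """
    smallest = 1.0 - rescale_odds_ratio(or_upper, units)
    largest = 1.0 - rescale_odds_ratio(or_lower, units)
    return smallest, largest


# Autopsy rate: limits are per unit of proportion; 10 percentage points = 0.1 units.
lo_a, hi_a = relative_decrease_ci(0.1448, 0.5291, units=0.1)
print(f"autopsy rate, per 10 points: {lo_a:.1%} to {hi_a:.1%}")

# Study year: limits are per year; a decade = 10 units.
lo_t, hi_t = relative_decrease_ci(0.9458, 0.9900, units=10)
print(f"time, per decade: {lo_t:.1%} to {hi_t:.1%}")
```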
N.B.: For the analysis below, time was centered at 1975, rather than 1980, as the study periods ranged from 1970 to 1980. Also, all of the studies using this error classification involved general inpatients or adult inpatients (Case Mix = 1), so case mix was not included in the models below.
| Variable | Estimate | Standard Error | Odds Ratio | Lower 95% CI | Upper 95% CI | P-value |
|---|---|---|---|---|---|---|
| Intercept | -2.0252 | 0.1024 | 0.132 | 0.103 | 0.170 | <.0001 |
| Time | 0.2468 | 0.02098 | 1.280 | 1.216 | 1.347 | <.0001 |
| Autopsy rate | -0.2642 | 0.1541 | 0.768 | 0.527 | 1.119 | 0.1373 |
| Country | 0.4187 | 0.08126 | 1.520 | 1.246 | 1.854 | 0.0021 |
Intercept= -2.0252; Standard error = 0.1024
95% CI= -2.0252 ± 2*0.1024 → -2.230, -1.8204
probability of error = 0.1166 (95% CI: 0.0971 -0.1394)
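The interval above is built on the log-odds scale and then transformed to a probability, which keeps both limits between 0 and 1. A minimal sketch of that back-transformation, assuming the intercept and standard error quoted above and the same multiplier of 2; the function names are illustrative.

```python
import math


def inv_logit(x):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def prob_with_ci(estimate, std_error, multiplier=2.0):
    """Point estimate and interval limits on the probability scale.

    The interval is formed on the log-odds scale (estimate +/- multiplier * SE)
    and each limit is then passed through the inverse logit.
    """
    lower = estimate - multiplier * std_error
    upper = estimate + multiplier * std_error
    return inv_logit(estimate), inv_logit(lower), inv_logit(upper)


p, p_lo, p_hi = prob_with_ci(-2.0252, 0.1024)
print(f"{p:.4f} ({p_lo:.4f} to {p_hi:.4f})")
```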
Coefficient for Time is 0.2468
Odds ratio = exp (0.2468)= 1.2800
→ Error rate increased by roughly 28% per year (p<0.0001).
Coefficient Autopsy Rate = -0.2642
5% change → -0.2642/20 = -0.01321
Odds ratio = exp(-0.01321) = 0.9869
For every 5% increase in autopsy rate, error rate decreases by approximately 1.3%, but this relationship was not statistically significant (p=0.1).
In the model adjusting for study period, variations in autopsy rates and study country (U.S. vs. non-U.S.), the autopsy and clinical diagnoses fell in different major ICD categories in 11.7% (95% CI: 9.7–13.9%) of cases at the base time period, base country and base autopsy rate.
In contrast with the other two definitions of errors, ICD discrepancies showed an increase over time, and this relationship was statistically significant. Specifically, the error rate increased by roughly 28% per year (p<0.0001).
The relationship between ICD discrepancies and autopsy rate did show the expected inverse correlation (as with the other two definitions of errors), but it was weak and not statistically significant: for every 5% increase in autopsy rate, the error rate decreased by approximately 1.3% (p=0.1).
One of the fields in the reference database was used to index articles according to the study topics addressed and the specific type of information provided.
Diagnostic Errors (“Dx errors”) - studies of clinical-autopsy diagnostic discrepancies
Because of the complexity of this topic, articles tended to be indexed with one or more of the following sub-headings.
Spec Dz's - studies reporting diagnostic discrepancies for specific diseases or diagnoses.
Examples
Bobrowitz ID. Active tuberculosis undiagnosed until autopsy. Am J Med. 1982;72:650–8.
Zarling EJ, et al. Failure to diagnose acute myocardial infarction: the clinicopathologic experience at a large community hospital. JAMA. 1983;250:1177–81.
Goldhaber SZ, et al. Factors associated with correct antemortem diagnosis of major pulmonary embolism. Am J Med. 1982;73:822–6.
Population - studies providing data relevant to clinical diagnostic performance in a general population, not just among deaths or autopsies (e.g., by providing discharge diagnoses for the same time period, or de facto, by achieving an autopsy rate close to 100%).
Examples
DeRiemer K, et al. The epidemiology of tuberculosis diagnosed after death in San Francisco, 1986-1995. Int J Tuberc Lung Dis. 1999;3(6):488–93.
McCarthy BD, et al. Missed diagnoses of acute myocardial infarction in the emergency department: results from a multicenter study. Ann Emerg Med. 1993;22:579–82.
Flum DR, et al. Has misdiagnosis of appendicitis decreased over time? A population-based analysis. JAMA. 2001;286:1748–53.
Selection - studies assessing clinical selection bias of cases for autopsy (e.g., by prospectively asking clinicians about their expectation of new information at autopsy)
Example
Cameron HM, McGoogan E. A prospective study of 1152 hospital autopsies: I. Inaccuracies in death certification. J Pathol. 1981;133:273–83.
Predictors - studies reporting factors other than clinical selection (above) and time of study (below) that predict the occurrence of diagnostic errors. Examples include demographics (age, gender, race, socioeconomic status, religion), hospital length of stay (LOS), clinical service (service), clinical diagnosis (Dx), and presence of a DNR order.
Examples
Landefeld CS, et al. Diagnostic yield of the autopsy in a university hospital and a community hospital. N Engl J Med. 1988;318:1249–54.
McFarlane MJ. Clinical diagnosis is not a source of bias in selection for necropsy. Arch Pathol Lab Med. 1989;113:64–7.
Setting - studies comparing diagnostic error rates in different settings (e.g., teaching vs. non-teaching hospital, nursing home vs. hospital)
Examples
Landefeld CS, et al. Diagnostic yield of the autopsy in a university hospital and a community hospital. N Engl J Med. 1988;318:1249–54.
Time - studies comparing diagnostic error rates in different time periods
Examples
Goldman L, et al. The value of the autopsy in three medical eras. N Engl J Med. 1983;308:1000–5.
Kirch W, Schafii C. Misdiagnosis at a university hospital in 4 medical eras. Medicine (Baltimore). 1996;75:29–40.
Sonderegger-Iseli K, et al. Diagnostic errors in three medical eras: a necropsy study. Lancet. 2000;355:2027–31.
Error analysis - studies including some sort of analysis of why misdiagnoses were made (anecdotal discussion of selected cases did not count as sufficient)
Example
Middleton K, et al. An autopsy-based study of diagnostic errors in geriatric and nongeriatric adult patients. Arch Intern Med. 1989;149(8):1809–12.
Dx tests - studies examining the impact of antemortem diagnostic testing on autopsy-detected errors
Examples
Goldman L, et al. The value of the autopsy in three medical eras. N Engl J Med. 1983;308(17):1000–5.
Kirch W, Schafii C. Misdiagnosis at a university hospital in 4 medical eras. Medicine (Baltimore). 1996;75:29–40.
Complications - studies of the role of the autopsy in detecting complications of care, including many of the general autopsy series that happen to mention complications as a specific type of missed diagnosis, but also other studies focused specifically on this issue.
Examples
Ebbesen J, et al. Drug-related deaths in a department of internal medicine. Arch Intern Med. 2001;161:2317–23.
Gotti EW. Adverse drug reactions and the autopsy. Prevalence and perspective. Arch Pathol. 1974;97:201–4.
Medicolegal - studies including any information on the impact of autopsy-detected errors on legal actions
Examples
Juvin P, et al. Postoperative death and malpractice suits: is autopsy useful? Anesth Analg. 2000;91:344–6.
Nichols L, et al. Are autopsies obsolete? Am J Clin Pathol. 1998;110:210–8.
Performance - studies containing information on the “test characteristics” of the autopsy itself (e.g., quality of the autopsy, number of cases in which a diagnosis could not be established despite adequate autopsy)
Examples
Veress B, et al. The reliability of autopsy diagnostics: inter-observer variation between pathologists, a preliminary report. Qual Assur Health Care. 1993;5:333–7.
Schned AR, et al. A comprehensive quality assessment program on the autopsy service. Am J Clin Pathol. 1986;86:133–8.
Fowler EF, et al. Evaluation of a teaching hospital necropsy service. J Clin Pathol. 1977;30:575–8.
Attitudes - articles describing attitudes towards the autopsy on the part of patients, their family members, or the general public; clinicians (including students, post-graduate trainees and practicing physicians); and pathologists.
Examples
McPhee SJ, et al. To redeem them from death. Reactions of family members to autopsy. Am J Med. 1986;80(4):665–71.
Wilke A, French F. Attitudes toward autopsy refusal by young adults. Psychological Reports. 1990;67(1):81–2.
Sanner M. A comparison of public attitudes toward autopsy, organ donation, and anatomic dissection. A Swedish survey. JAMA. 1994;271(4):284–8.
Rosenbaum GE, et al. Autopsy consent practice at US teaching hospitals: results of a national survey. Arch Intern Med. 2000;160(3):374–80.
Stolman CJ, et al. Attitudes of pediatricians and pediatric residents toward obtaining permission for autopsy. Arch Pediatr Adolesc Med. 1994;148(8):843–7.
Trelstad RL, et al. The role for regional autopsy centers in the evaluation of covered deaths. Survey of opinions of U.S. and Canadian chairs of pathology and major health insurers in the United States. Arch Pathol Lab Med. 1996;120(8):753–8.
Autopsy rates - observational studies of trends in autopsy rates over time or intervention studies attempting to increase the autopsy rate
Examples
Cameron HM, McGoogan E, Clarke J, Wilson BA. Trends in hospital necropsy rates: Scotland 1961-74. BMJ. 1977;1:1577–80.
Sinard JH. Factors affecting autopsy rates, autopsy request rates, and autopsy findings at a large academic medical center. Exp Mol Pathol. 2001;70:333–43.
Epidemiology - studies describing the role of the autopsy in tracking the epidemiology of target conditions
Examples
Welch HG, Black WC. Using autopsy series to estimate the disease “reservoir” for ductal carcinoma in situ of the breast: how much more breast cancer can we find? Ann Intern Med. 1997. p. 1023–8.
Rakar S, Sinagra G, Di LA, Poletti A, Bussani R, Silvestri F, et al. Epidemiology of dilated cardiomyopathy. A prospective post-mortem study of 5252 necropsies. The Heart Muscle Disease Study Group. Eur Heart J. 1997;18:117–23.
McFarlane MJ, Feinstein AR, Wells CK, Chan CK. The ‘epidemiologic necropsy’. Unexpected detections, demographic selections, and changing rates of lung cancer. JAMA. 1987;258:331–8.
Death Certificates - studies addressing the correlation between diagnoses on death certificates and autopsies and/or formal chart review
Examples
Kircher T, Nelson J, Burdo H. The autopsy as a measure of accuracy of the death certificate. N Engl J Med. 1985;313:1263–9.
Jordan JM, Bass MJ. Errors in death certificate completion in a teaching hospital. Clin Invest Med. 1993;16:249–55.
The first half of the form applies to all of the study topics (Appendix C), but the second half (Results section) is tailored to studies addressing the rate of autopsy-detected diagnostic errors. The heterogeneity of studies addressing other topics precluded the development of a practical template. Thus, for these other topics, reviewers summarized the results as free text.
A Technical Expert Advisory Group was assembled to provide guidance to the project team. The Advisors included pathologists, internists, a surgeon, and researchers with expertise in critical appraisal of the literature, health economics, patient perspective, and ethnicity. The Advisors were provided with the original project proposal and study questions, as well as a set of questions tailored to their areas of expertise. These questions were formulated to allow the project team to gather background from a variety of perspectives in order to inform a feasible and worthwhile direction for the systematic evidence review of autopsy. The responses to these questions therefore provide important background to the project, and are summarized broadly below.
On reviewing the study questions proposed for the project, were there any clear gaps or omissions?
Most of the advisors thought that the study questions covered important topics. Gathering data concerning the diagnostic yield of the autopsy, factors influencing selection of cases for autopsy, and documentation of complications of care were considered to be most salient if incorporated into the broader question about “How can this information be used in quality improvement, outcome analyses, performance measurement initiatives and error reduction?” In other words, the critical question that this entire study should answer is “Does the autopsy have the potential to improve quality and reduce errors?”
Additionally, one of the clinician advisors discussed the issue of distinguishing instances in which diagnostic errors would have affected therapy, but would likely not have altered patient outcome. This issue was illustrated with two cases from this advisor's recent experience. In the first case, a patient had been transferred from another hospital with altered mental status and fever and underwent several lumbar punctures that were unrevealing. The patient progressively deteriorated and died after a respiratory arrest led to anoxic brain damage. Because of the patient's cachexia, the clinical team presumed an underlying malignancy or undiagnosed HIV infection. Autopsy revealed tuberculous meningitis. Although knowing the diagnosis would have resulted in more appropriate treatment, the outcome would have likely remained the same given the advanced state of his CNS infection. By contrast, the second case was a patient who had Noonan's Syndrome and a variety of chronic problems including a seizure disorder and recurrent pulmonary infections. His terminal hospitalization for an acute febrile illness was presumed to reflect another pneumonia. Autopsy revealed endocarditis. This was not only undiagnosed, but also completely unsuspected and untreated. In this case knowing the diagnosis would not only have altered therapy, but also would very likely have changed the outcome.
Regarding the “diagnostic yield of the autopsy,” many clinicians will undoubtedly believe that cases taken for autopsy are “pre-selected” (by the treating physicians) to represent a high potential for unexpected or erroneous diagnoses. We know of only one study that specifically asked clinicians prospectively whether or not the clinical diagnoses were particularly uncertain prior to autopsy performance. (No correlation was found between clinicians' expectation and the finding of significant errors.) Because of the importance of addressing this significant potential for bias, we would like to know if you know of any other studies relevant to this question. (We also know that “diagnostic error rates” are significant even at institutions with higher autopsy rates, but institutions with higher autopsy rates tend to be non-U.S. centers, and so clinicians may wonder if these patients undergo fewer sophisticated diagnostic tests.)
One Advisor noted that the Royal College of Pathologists (UK), in their August 1991 report entitled “The Autopsy and Audit” (available on their web site www.rcpath.org or from their offices at 2 Carlton House Terrace, London SW1Y 5AF), addressed sampling of hospital deaths. They suggested that autopsies should be done in all “problematic” deaths and, in addition, that autopsies should be performed on 10% of random general hospital deaths in which there is no perceived “problem”. Another Advisor suggested that clinicians ask for autopsies in cases of diagnostic uncertainty, but also emphasized that it is not uncommon to uncover diagnostic errors in cases that clinicians regarded as routine, as noted in the article by Landefeld et al. from the New England Journal of Medicine. Similarly, another Advisor agreed that some selection likely occurs, but strongly believed that the observed error rates could not be explained solely as an artifact of this selection; i.e., the autopsy does detect important quality problems in clinical diagnosis, even when selection is taken into account. In contrast, the surgical Advisors believed strongly that diagnostic uncertainty was the main, if not only, reason for requesting autopsy.
What do you anticipate to be the major challenge in conducting a systematic review such as this one in which the target literature consists entirely of observational studies? Are there some special concerns regarding the observational nature of the literature given that we are trying to evaluate the performance of what amounts to a diagnostic test?
The biggest challenge for both of the above questions will be avoiding publication bias. Observational studies tend to be far more subject to publication bias than randomized controlled trials.
Because of the importance of addressing this significant potential for bias, we would like to know if you could suggest any other means of assessing the degree to which persistent significantly high rates of “diagnostic errors” can be explained by selection bias.
The only hope of assessing errors related to pre-selection is if, within the U.S., you have differential autopsy rates across studies that are not totally confounded with time. Then, you can use rate meaningfully as a factor in your regression.
Do you think that the current autopsy rate in the U.S. (or in U.S. hospitals) is appropriate? If so, why so?
Some Advisors thought the current rate is too low to allow detection of important quality problems at a given institution. The literature fairly consistently indicates important diagnostic errors in a small, but still significant, proportion of deaths. Autopsy rates < 5% (as occur at many hospitals) clearly do not permit one to notice trends in such errors that might alert one to local quality problems. Others pointed out that a specific or required rate is inappropriate, and supported JCAHO's having dropped this as a regulatory requirement, since autopsy is not necessary in “most” cases.
What do you think the autopsy rate should be - higher, lower or about the same as current rates?
One Advisor agreed with the Royal College of Pathologists that autopsies should be performed on a minimum of 10% of random deaths and on all “problematic” cases. These could be defined locally, such as all perinatal deaths, all deaths following new or experimental treatments and also all deaths which have educational or research value. (Of course, often one does not know that a case is interesting until after an autopsy is performed, thus the rationale for the 10% random, unsolicited autopsies.) Another Advisor could not state a specific rate (or offer any evidence for such a rate), but thought that the “right rate” was around 30%. The major factor in offering this target was capturing diagnostic errors. Also, at the current rate at this Advisor's institution (20%), it is difficult to achieve the requirement that pathology residents participate in 50 autopsies over the 2 years of training in anatomic pathology. Therefore, 30% would remove the current difficulties in achieving this educational goal. The current 20% rate also makes it difficult to achieve the goal of having each medical student see one autopsy. Medical students attend autopsies in groups of 6–8 students, but are not available all year, and are not invited to cases where there are major infectious risks (e.g., known TB) or to fetal cases. One other opinion was that, although a “right rate” cannot be defined, the rate clearly needs to be higher than it is now.
If you do think increasing autopsy rates (or at least preventing further decreases) is important, which of the following issues do you regard as crucial to address - physician understanding of the persistent importance of the autopsy? Patient perceptions of requests for autopsies? Support for activities related to the autopsy within pathology departments? Reimbursement for autopsy performance?
All are considered important, although clearly each presents its own problems. Residents also do not know how to request autopsies, but an in-service could easily be put together in which the recommendations in the literature (e.g., from the Archives of Internal Medicine series on the autopsy) regarding answers to commonly asked questions from patients were presented to house staff. Reimbursement could be an issue, with a belief that the declining rate of autopsies is a victim of our current financial system. It is a problem for clinical medicine in general because there are times when one really does learn from patients. There is some downside also with regard to research, and perhaps to education. The big issue relates to quality within an institution and in the management of a particular problem. Addressing the decline is important for both clinical medicine and for quality of care within an institution. An Advisor noted that community pathologists have no financial incentive to perform autopsies and may even find the procedure itself “distasteful,” compared to other pathology procedures.
Another Advisor also added that pursuit of the autopsy out of academic interest has to be tempered by recognition of the potential for malpractice, and that the legal environment is an important consideration with respect to obtaining autopsies.
Are there certain groups that you think are under-represented in selection of autopsy cases (e.g., in terms of age, gender, socioeconomic class, ethnicity, religion)?
Advisors were not aware of any groups that are under-represented, except perhaps some religious groups, and possibly Chinese patients. In addition, one Advisor thought there is over-representation of perinatal autopsies, due in large part to local or state laws and regulations mandating such, probably based on the reaction to SIDS.
Do you think selection biases (based on patient or provider factors) or missed opportunities for quality improvement are significant enough to warrant random selection of autopsy cases? Do you think such a system could be developed in the US?
While one Advisor favored such a system, others pointed out a number of logistic and patient-related barriers.
Do you think clinicians benefit from autopsy findings? To your knowledge, has the impact of autopsy findings on clinicians ever been studied?
Of course clinicians could benefit from autopsy findings, but it requires a well-trained, knowledgeable pathologist working with a concerned clinician. Advisors knew of no studies to document this. Clinicians probably do not derive optimal benefit from current autopsies, and Advisors believed this should be addressed. Advisors did not think that the quality improvement impact of the autopsy has ever been studied.
What could be done to increase the benefits/impact of autopsy findings for clinicians?
One Advisor suggested improving the training of pathologists in autopsy performance, as well as updating the methods and techniques used in the autopsy. Modern and up-to-date techniques of molecular pathology and immunohistochemistry as well as imaging, probe and physico-chemical methods should be incorporated in the performance of autopsy. Because of the inherent costs of such advances, innovative solutions might include centralization or regionalization of both training and performance of autopsies. Another Advisor pointed out that clinicians may not benefit from the current system in place because of time delays between autopsy request and reporting. An alternative to focusing on time would be to have periodic conferences with clinical departments in which the cases with diagnostic errors could be reviewed. One Advisor's department already has such conferences on a regular basis with the Coronary Care Unit. All deaths are reviewed and cases in which an error has occurred are always presented. In addition, the chief of the medical service is notified whenever important diagnostic errors are detected on autopsy, but there is no formal mechanism for ensuring that these cases are presented at the Department of Medicine's Morbidity and Mortality (M&M) rounds.
From a policy or economic point of view, would you place greater emphasis on autopsy rates or the use that is made of autopsy information as currently performed?
Advisors felt that both could be considered. The current low rate only provides information for individual families or clinicians. The numbers are too small for any valid statistical evaluation about quality, outcomes or performance.
As a practicing clinician, what would you like to see changed/improved about procedures and protocols related to reporting the results of autopsied cases?
Key features would include: detailed descriptions of clinical and demographic characteristics of consecutive autopsied and non-autopsied deaths (in order to clarify selection bias and other features distinguishing these two groups); assessments of reproducibility for judgments of diagnostic errors (i.e., demonstrated discrepancies between clinical and autopsy diagnoses); and better quantification of the likely impact of autopsy-detected errors on patient outcome (not just antemortem therapy).
Are there factors that make surgeons more or less likely to request autopsies than their non-surgical clinical colleagues? Do you think timing of death (e.g., soon after surgery) exerts a special effect on surgeons' tendency to request autopsy?
The main factor is the unexpectedness of death or complication, and the adequacy of the diagnostic evaluation the patient had undergone prior to death.
In surgery, do you think that autopsies play an important role in Morbidity and Mortality (M&M) conferences, or do discussions center more on aspects of antemortem care (i.e., use of appropriate diagnostics, aspects of surgical care, etc.)?
Although the Surgeon Advisor strongly supports the continued role of M&M rounds as part of surgical quality assurance, he believes that autopsies play a very minor role in these proceedings, with patient care problems generally apparent prior to death or autopsy. He based this opinion primarily on advances in diagnostic imaging. As a result of this comment, the analogy with unexpected findings at laparotomy was discussed. Anecdotally, surgeons generally believe that advances in diagnostic imaging have made discrepancies between pre-operative and post-operative diagnoses less common. Moreover, the “diagnostic laparotomy” has become a very uncommon operation. One of the other Advisors confirmed these impressions, but none of the Advisors (nor the core project team) could find any studies documenting this trend.
How do you think patients' attitudes towards requests for autopsy performance might be affected by ethnicity or culture?
One Advisor pointed out that ethnicity should be defined based on self-identification, and that defining “culture” is a major challenge. Perhaps thinking of it in terms of an individual's national origin, religious background, and social class would be one approach. This Advisor suggested that not violating the physical remains of a deceased person might be important to many people (e.g., Catholics who want an open casket funeral). In decision making about consenting to an autopsy, one has to consider cultural roles and norms. In the Latino culture, for example, the extended family or family members would need to be agreeable that an autopsy is important. This kind of groundwork may need to be established ahead of time if possible, but clearly would not just be a simple “let's go in and consent for an autopsy” approach. Much respect and consideration has to be given to this decision, which is often a collective one, not an individual one. Similarly, the trust in the physician and respect for the physician having taken care of the deceased family member would, in the view of this Advisor, play a major role in consenting or not.
In considering the above question, is there a particular aspect of the autopsy that you think is most relevant - e.g., what the procedure entails, its purpose?
One Advisor thought that the altruistic goal of defining what we can learn from it (how can we do this better next time, how can the system learn from it) is probably the most compelling argument. Clearly, nothing can be done to help the individual. However, there are benefits for the system, the public, and the clinicians. Taking another perspective, as alluded to in Steve McPhee's study, it may be very relevant to know what the diagnosis really is because of concerns about genetic susceptibility or infectious diseases. Another Advisor noted that unsuspected hemochromatosis (and other genetic diseases) are detected, and this impacts family members.
Are there any areas of research involving the impact of ethnicity on patient attitudes that might shed light on attitudes towards the autopsy (e.g., requests for organ donation)?
One Advisor suggested consideration of attitudes about family and how collective decision making is important in certain cultures. This may be particularly true in Asian and Latino immigrant cultures, but to some extent is universal. This is not an individual decision or an individual family member's decision much of the time. A second area is what degree of trust and respect the family has for the clinicians and the institution taking care of the patient. There is a widespread perception that African Americans have less trust in the system, and although less documented, it probably is also true for other groups such as Latinos and Asians. However, individual physicians with a strong therapeutic relationship can overcome this. Trust and respect would seem to be important areas to pursue.
The following studies were suggested as relevant to this and related questions:
Connell C, Avey H, Holmes S. Attitudes about autopsy: Implications for educational interventions. Gerontologist. 1994 Oct. 34(5): p. 665–673.
Sanner M. Attitudes toward organ donation and transplantation: A model for understanding reactions to medical procedures after death. Social Science & Medicine. 1994 Apr. 38(8): p. 1141–1152.
Kotch J, Cohen S. SIDS counselors' reports of own and parents' reactions to reviewing the autopsy report. Omega: Journal of Death & Dying. 1985-1986. 16(2): p. 129–139.
Are there any areas of research on patient attitudes to other medical requests or procedures that you think might shed some light on patients' attitudes towards the autopsy?
The Patient Perspective Advisor commented that important patient factors to consider regarding autopsy consent processes might include the existence of a prior relationship with the requesting clinician (e.g., primary care physician versus attending physician versus unknown covering physician), religious/spiritual beliefs, and trust in the healthcare system, which in turn is very likely to be affected by ethnicity and economic status.
The two areas of study of patient perspectives mentioned were organ donation and cancer screening, with organ donation obviously being the more directly related area. The notable difference between organ donation and autopsies that the Advisor pointed out was that patients probably do not regard autopsies as having a clear goal (e.g., helping people). The general area of patient perspectives of healthcare choices could also be researched.
One of the questions that the existing literature probably will not answer is whether or not there is a “right rate” for the autopsy? As a health economist, how might you frame this question or consider answering it?
The Health Economics Advisor agreed with the general approach of the report: that one has to identify quantifiable benefits and then demonstrate cost-effectiveness in achieving these benefits with a certain autopsy rate.
The literature will almost certainly furnish evidence that the autopsy provides multiple benefits for different “users” of the autopsy, including patients' family members, clinical staff, pathologists, researchers, public health officials, hospitals and health care organizations, and health care payers. Unfortunately, the information for families regarding heritable diseases, the use of the autopsy as a means of detection for public health officials in monitoring important trends, the multiple roles in medical education, and other benefits attributable to the autopsy have no clear “dollar value.” Do you have any thoughts on how we might attempt to quantify the benefits of the autopsy?
Benefits other than improved diagnosis and more accurate vital statistics are difficult to quantify, and even the latter benefit is not easily quantified.
The costs of the autopsy are presumably more straightforward: other than the time spent by the pathologists performing and interpreting autopsies and associated use of supplies/equipment, are there any significant costs associated with autopsy performance? Do you think medicolegal exposure for society/hospitals/physicians represents a substantial cost of the autopsy?
One Advisor noted that there are anecdotal reports of autopsy findings actually helping hospitals/MDs in defending the care they delivered, but no real data addressing the issue of costs incurred or saved by routine autopsy performance.
Another Advisor pointed out that the College of American Pathologists has done some work on specifying the cost elements for autopsy. Baseline costs include: 1.) Space - detail original cost, depreciation and maintenance; 2.) Utilities (i.e., Air - Pressure Gradient Requirements, Filtration - Laminar Flow, Venting, Plumbing, Waste disposal - liquid and solid); 3.) Capital Equipment (i.e., Depreciation, Interest loss, Lease or rental costs, Maintenance, Insurance, Licensing fees for computers and other equipment); 4.) Supplies (i.e., Disposables, Cleaning Supplies, Histology, Secretarial and computer, Photography and educational); 5.) Personnel – Technical (i.e., Salaries, Benefits, Pension; Autopsy Assistants; Histopathology Technicians; Laboratory Assistants; Nurses - in Decedent Affairs Office); 6.) Personnel – Professional (i.e., Pathologists; Residents; Autopsy Assistants - if employees of the Professional Group; Costs of Autopsy Performance -- Chart Review, Gross Dissection, Microscopic review, Dictation, Formulation of Final Diagnosis; Costs of Educational Services -- For attending physician, For Hospital/Medical Staff); 7.) Indirect Costs (i.e., Laboratory Administration and supervision; General Laboratory and Office Supplies; General Maintenance; Computer services; Continuing Education - including subscriptions, dues and travel; Licensing fees; Quality control, quality assurance and laboratory accreditation costs; Other); and 8.) Allocated Expenses from Hospital. Additionally, there are Special Costs Related To OSHA Regulations: 1.) Personal Protective Equipment; 2.) Decontamination and Housekeeping; 3.) Ducted exhaust and air ventilation system; 4.) Waste disposal; 5.) Containment Equipment (Biologic Safety Cabinets); 6.) Employee Vaccination; 7.) Employee Education; 8.) Construction and Capital costs; 9.) Other Personnel costs; 10.) Other Supply Costs; and 11.) Indirect and allocated costs.
In conducting this analysis, from a policy perspective, would you recommend targeting research toward determining an optimal autopsy rate, or instead considering the strategy of directing resources at increasing the impact of the information derived from autopsies obtained under the current system, or both?
All of the Advisors agreed that both are important.
Most of our analysis will be from the societal perspective, but can you think of any financial incentives for institutions or payers to maintain their current autopsy rates (or increase them) under the current reimbursement system? For example, can autopsy results be used to modify discharge diagnoses so as to increase hospital reimbursement for individual patients?
The Health Economics Advisor knew of no data addressing this possibility. (We subsequently identified two papers that discuss this issue, but no good studies demonstrating a systematic impact on DRGs—and therefore reimbursement—by routinely including autopsy findings.)
The authors gratefully acknowledge the guidance provided by members of the project advisory panel listed below. In addition, Preetha A. Basaviah, MD, Thomas E. Baudendistel, MD, and Chad Roach, BA performed many of the article abstractions/reviews, while Susan B. Nguyen, BA assisted with development and organization of the literature database and numerous other technical aspects of the project.
Pathologist Advisors
Richard E. Horowitz, MD—Clinical Professor of Pathology, University of Southern California and UCLA Schools of Medicine; project representative from the College of American Pathologists
George Lundberg, MD - Executive Vice-President and Editor-in-Chief, Medscape
Jean Olson, MD—Professor of Pathology and Director of the Autopsy Service at Moffitt-Long Hospitals, University of California San Francisco
Methodologist Advisors & Quantitative Analysis
Lisa A. Bero, PhD—Associate Professor of Clinical Pharmacy and Health Policy, University of California San Francisco; Co-Director, San Francisco Cochrane Center
Alan Bostrom, PhD - Department of Epidemiology & Biostatistics, University of California San Francisco
Gordon Guyatt, MD—Professor, Departments of Medicine and Clinical Epidemiology & Biostatistics, McMaster University
Charles E. McCulloch, PhD - Professor of Biostatistics, Department of Epidemiology & Biostatistics, University of California San Francisco
Surgeon Advisor
Thomas Russell, MD - Executive Director, American College of Surgeons
Healthcare Economist
Peter Nemetz, PhD—Professor, Faculty of Commerce and Business Administration, University of British Columbia; Senior Visiting Scientist, Department of Health Sciences Research, Mayo Clinic
Patient Perspective (Including Impact of Race/Ethnicity)
Eliseo Perez-Stable, MD—Professor of Medicine and Chief, Division of General Internal Medicine, University of California San Francisco
Ellen Shaffer, PhD—Director of Policy, Robert Wood Johnson Patient-Provider Relationship Initiative; Consultant, Consumer Health Concerns
Peer Reviewers
Three of the advisors listed above (Richard Horowitz, MD, George Lundberg, MD, and Peter Nemetz, PhD) participated in the peer review process, but we also solicited reviews from the individuals listed below, whom we gratefully acknowledge for their thoughtful comments.
Stephen A. Geller, MD - Professor and Chairman, Pathology and Laboratory Medicine, Cedars Sinai Medical Center, Los Angeles, CA
Randy L. Hanzlick, MD - Chief Medical Examiner, Fulton County Medical Examiner, Atlanta, Georgia
Richard J. Hausner, MD - Medical Director, Clinical Laboratory, Cypress Fairbanks Medical Center Hospital, Houston, TX
Lee H. Hilborne, MD, MPH - Professor of Pathology and Laboratory Medicine, Director, UCLA Center for Patient Safety and Quality, University of California Los Angeles, Los Angeles, CA
Grover M. Hutchins, MD - Professor of Pathology, Johns Hopkins University School of Medicine
Joanne Lynn, MD - Director, RAND-Center to Improve Care of the Dying, Arlington, VA.
A number of Fellows of the College of American Pathologists (CAP), in addition to those already listed above, also provided very helpful comments, including Paul A. Raslavicus, MD, President, CAP, Kevin Bove, MD, Chair, CAP Autopsy Committee, Victor Weedn, MD, JD, Co-Chair Forensic Pathology/Forensic Identity Committee, and Patricia Styer, PhD of the CAP biostatistics department.