Health Serv Res. Author manuscript; available in PMC Oct 1, 2013.
PMCID: PMC3448279

Composite measures for rating hospital quality with major surgery

Justin B. Dimick, MD, MPH, Assistant Professor of Surgery
University of Michigan 2800 Plymouth Road; Building 520, Office 3144; Ann Arbor, MI 48109
Douglas O. Staiger, PhD, John French Professor of Economics
Dartmouth College, 301 Rockefeller Hall; Hanover, NH 03755 Phone: (603) 646-2979; Fax: (603) 646-2979; Douglas.O.Staiger@dartmouth.edu
Nicholas H. Osborne, MD, MS, Surgical House Officer
University of Michigan 2800 Plymouth Road; Building 520, Office 3144; Ann Arbor, MI 48109 Phone: (734) 998-7470; Fax: (734) 998-7473; nichosbo@med.umich.edu
Lauren H. Nicholas, PhD, Faculty Research Fellow
Institute for Social Research, University of Michigan 426 Thompson Street, 4205 MISQ; Ann Arbor, MI 48104 Phone: (734) 764-2562; Fax: (734) 647-1186; lnichola@umich.edu
John D. Birkmeyer, MD, George Zuidema Professor of Surgery



Objective

To assess the value of a novel composite measure for identifying the best hospitals for major procedures.

Data Source

We used national Medicare data for patients undergoing 5 high-risk surgical procedures between 2005 and 2008.

Study Design

For each procedure, we used empirical Bayes techniques to create a composite measure combining hospital volume, risk-adjusted mortality with the procedure of interest, risk-adjusted mortality with other related procedures, and other variables. Hospitals were ranked based on 2005–06 data and placed in one of 3 groups: 1-star (bottom 20%), 2-star (middle 60%), and 3-star (top 20%). We assessed how well these ratings forecasted risk-adjusted mortality rates in the next two years (2007–08), compared to other measures.

Principal Findings

For all 5 procedures, the composite measures based on 2005–06 data performed well in predicting future hospital performance. Compared to 1-star hospitals, risk-adjusted mortality was much lower at 3-star hospitals for esophagectomy (6.7% vs. 14.4%), pancreatectomy (4.7% vs. 9.2%), coronary artery bypass surgery (2.6% vs. 5.0%), aortic valve replacement (4.5% vs. 8.5%), and percutaneous coronary interventions (2.4% vs. 4.1%). Compared to individual surgical quality measures, the composite measures were better at forecasting future risk-adjusted mortality. These measures also outperformed the Center for Medicare and Medicaid Services (CMS) Hospital Compare ratings.


Conclusions

Composite measures of surgical quality are very effective at predicting hospital mortality rates with major procedures. Such measures would be more informative than existing quality indicators in helping patients and payers identify high quality hospitals with specific procedures.


With wide recognition that surgical outcomes vary across hospitals, information on surgical quality is in high demand. Patients, families, and referring physicians are looking for information to help them select the best hospital for specific procedures [1]. Payers and large health care purchasers need accurate hospital ratings to inform their selective referral and value-based purchasing programs [2]. To meet these demands, several organizations are publicly reporting measures of surgical quality. The Leapfrog Group, a large coalition of public and private purchasers, reports information on mortality and hospital volume with major surgical procedures [3]. As its primary measure of surgical quality, the Center for Medicare and Medicaid Services (CMS) publicly reports hospital compliance with several process measures on its Hospital Compare website [4].

Whether such measures are optimal for helping patients and payers identify the safest hospitals for major procedures is uncertain, however. Measures based on structure, process, and outcomes all have distinct limitations [5]. Hospital volume, for example, is a useful predictor of outcomes for some operations, but it is a relatively weak proxy of quality for most procedures [6, 7]. Outcome measures, such as mortality and morbidity, are often too “noisy” to reliably measure hospital quality due to small sample sizes and low event rates [8, 9]. Process measures, such as those of the Surgical Care Improvement Program (SCIP) reported by CMS on Hospital Compare, are only weakly related to hospital outcomes with surgery [4]. Providing patients with all available information may be better than any single indicator, but they would still require guidance on how to weight various measures, particularly when they conflict.

In previous work, we have evaluated several different approaches for improving hospital quality measurement for surgery. Using national Medicare data, we recently described the value of a simple composite measure for profiling hospital quality with surgery based on volume and mortality alone [3]. This measure has since become the quality standard for the Leapfrog Group and was recently endorsed by the National Quality Forum (NQF) for use with esophagectomy, pancreatectomy, and abdominal aortic aneurysm repair. In another recent publication, we demonstrated the value of empirical Bayes techniques for filtering out statistical “noise” in surgical mortality measurement [10]. Finally, we have demonstrated the feasibility of a more comprehensive composite measure that takes into account a broader array of measures, albeit in a narrow clinical context [11].

In this paper, we demonstrate the value of a comprehensive composite measure for a broad range of surgical conditions. We combine the insights from our prior work, and others, to create a composite measure that incorporates all sources of information, including quality indicators for other, related operations, and uses empirical Bayes techniques to filter out statistical “noise”. Given the manner in which hospital ratings are used, we examined the ability of these measures to forecast future hospital mortality, compared to several other existing quality measures, including hospital volume, risk-adjusted mortality, simple empirical Bayes methods, and the process measures reported by the Center for Medicare and Medicaid Services (CMS) on its Hospital Compare website.


Data source and study population

We used data from the Medicare Provider Analysis and Review (MEDPAR) files for 2005–08 to create our main analysis datasets. This dataset contains hospital discharge abstracts for all fee-for-service acute care hospitalizations of US Medicare recipients, which account for approximately 70% of such admissions among Medicare patients. The Medicare eligibility file was used to assess patient vital status at 30 days. The study protocol was approved by the Institutional Review Board at the University of Michigan.

Using appropriate procedure codes from the International Classification of Diseases, version 9 (ICD-9), we identified all patients aged 65 to 99 undergoing five high-risk surgical procedures that are targeted by the Leapfrog Group as part of their evidence-based hospital referral program: coronary artery bypass grafting, aortic valve replacement, percutaneous coronary interventions, and resection of pancreatic and esophageal cancers [3]. To enhance the homogeneity of hospital case mix, we excluded small patient subgroups with much higher baseline risks, including those with procedure codes indicating that other operations were simultaneously performed (e.g., coronary artery bypass and valve surgery) or were performed for emergent indications (e.g., acute myocardial infarction with percutaneous coronary intervention) [7].

Hospital ratings

Hospitals were rated using information from 2005–06. In creating our composite measures, we considered several individual quality measures, including measures of hospital structure (volume, teaching status, and nurse-to-patient ratios), and outcomes (risk-adjusted mortality and nonfatal complications). For each operation, we considered hospital volume and mortality not only for the index operation but also for other, related procedures (e.g., coronary bypass mortality and volume were tested as inputs to the composite measure for aortic valve replacement). These quality indicators were selected because they can be readily estimated from administrative data and they have been shown in previous studies to correlate with mortality for many surgical procedures [12, 13]. In preliminary analyses, we also considered other quality indicators such as patient length of stay and alternative definitions of nonfatal complications, but these indicators did not correlate with mortality and were therefore not included.

Hospital volume was calculated as the number of Medicare cases performed during a two year period (2005–06). We constructed 3 separate measures: volume for the index procedure, volume for all related procedures (i.e., procedures in the same clinical specialty), and total hospital volume (i.e., including procedures from other specialties). After testing several transformations, we used the natural log of the continuous volume variable for each operation in our analyses. Hospital teaching status (membership in the Council of Teaching Hospitals) and nurse ratios (Registered Nurse hours per patient day) were assessed using data from the 2005 American Hospital Association annual survey.

Risk-adjusted mortality and non-fatal complication rates were calculated at all hospitals over a 2-year period (2005–06). Mortality was defined as death occurring before discharge or within 30 days of surgery. Because of the well-known limitations of ICD-9 coding for complications, we focused on a subset of complications from the Complications Screening Project that have been demonstrated to have good sensitivity and specificity for use with surgical patients [14, 15]. Complication rates were calculated as the proportion of patients with one or more complications. For mortality and non-fatal complications, risk-adjustment was performed using logistic regression on patient covariates, including age, gender, race, urgency of admission, socioeconomic position, and comorbid diseases. To adjust for socioeconomic position, we used a zip code-level measure derived from the most recent US census data. Comorbid diseases were ascertained from secondary ICD-9 diagnosis codes using the methods of Elixhauser [16]. Each comorbid disease was entered into the risk-adjustment model as an independent variable. We used the logistic regression model to estimate the predicted probability of the outcome for each patient. These probabilities were then summed to generate the number of expected deaths. Risk-adjusted mortality was then calculated by dividing the observed by the expected deaths and multiplying by the average procedure-specific mortality rate.
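The observed-to-expected calculation above (indirect standardization) reduces to a few lines once patient-level predicted probabilities are in hand. The following Python sketch is purely illustrative (the study's analyses were done in Stata), and all numbers in the example are hypothetical:

```python
def risk_adjusted_mortality(deaths, predicted_probs, avg_rate):
    """Indirect standardization: (observed / expected) * average rate.

    deaths: 0/1 death indicator per patient at one hospital
    predicted_probs: model-predicted death probability per patient
    avg_rate: average procedure-specific mortality rate
    """
    observed = sum(deaths)            # observed deaths at the hospital
    expected = sum(predicted_probs)   # summed patient-level predicted probabilities
    return (observed / expected) * avg_rate

# Hypothetical hospital: 3 deaths among 100 patients, each with a 2% predicted
# risk, against a 2.5% average procedure-specific mortality rate.
rate = risk_adjusted_mortality([1] * 3 + [0] * 97, [0.02] * 100, 0.025)
# observed/expected = 3 / 2.0 = 1.5, so the risk-adjusted rate is 1.5 * 2.5% = 3.75%
```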

We developed a composite measure that incorporates information from multiple quality indicators to optimally predict “true” risk-adjusted mortality. Our composite measure is a generalization of the standard shrinkage estimator that places more weight on a hospital's own mortality rate when it is measured reliably, but shrinks back towards the average mortality when a hospital's own mortality is measured with error (e.g. for hospitals with small numbers of patients undergoing the procedure) [11]. While the simple shrinkage estimator is a weighted average of a single mortality measure of interest and its mean, our composite measure is a weighted average of all available quality indicators – the mortality and complication rates for all procedures along with all of the observable hospital structural characteristics (hospital volume, nurse-staffing ratios, and teaching status) that are thought to be related to patient outcomes. The weight on each quality indicator is determined for each hospital to minimize the expected mean squared prediction error, using an empirical Bayes methodology.
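The simple (single-measure) shrinkage estimator underlying the composite can be sketched in one line of Python. This is an illustrative simplification of the full empirical Bayes weighting, with hypothetical numbers; the actual composite generalizes this to a weighted average over all quality indicators:

```python
def shrinkage_estimate(own_rate, overall_mean, weight):
    """Simple empirical Bayes shrinkage: a weighted average of the hospital's
    own mortality rate and the overall mean, where `weight` reflects how
    reliably the hospital's own rate is measured."""
    return weight * own_rate + (1 - weight) * overall_mean

# A small hospital with an observed 10% mortality rate, a 4% overall mean, and
# low reliability (0.2) is shrunk most of the way back toward the mean.
estimate = shrinkage_estimate(0.10, 0.04, 0.2)  # 0.2*0.10 + 0.8*0.04 = 0.052
```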

Although the statistical methods used to create these measures are described in detail elsewhere [11], we will provide a brief conceptual overview (full mathematical details can be found in the APPENDIX). Perhaps the best way to conceptualize these methods is to imagine a simple regression equation where the outcome (e.g., a hospital's mortality rate in the next year) is not known. The goal of the analyses described below is to estimate the regression coefficients (i.e., the weights on different quality indicators) to best predict next year's mortality. We then evaluate our “model”, the composite measure, based on how well it actually predicts mortality in a subsequent time period.

The first step in creating the composite measure was to determine the extent to which each individual quality indicator predicts risk-adjusted mortality for the index operation. To evaluate the importance of each potential input, we first estimated the proportion of systematic variation in risk-adjusted mortality explained by each individual quality indicator (Table 1), where systematic variation is defined as the hospital-level variation (i.e., between-hospital variance) derived from hierarchical logistic regression models. We then entered each quality indicator into the model and assessed the degree to which it reduced the hospital-level variance, as described in our prior work [11]. Thus, when using the mortality from the index operation itself as the quality indicator, this estimate reflects the reliability of this outcome measure. We included any quality indicator in the composite measure that explained more than 10% of hospital variation in risk-adjusted mortality over a two-year period (2005–06).
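The variance-reduction bookkeeping in this step is straightforward once the hierarchical models have been fit. A minimal sketch, with hypothetical variance values (the hierarchical logistic models themselves are fit separately and are not shown):

```python
def proportion_explained(var_empty, var_with_indicator):
    """Share of between-hospital (systematic) variance accounted for by one
    quality indicator: compare the hospital-level variance from a hierarchical
    model without the indicator to one that includes it."""
    return (var_empty - var_with_indicator) / var_empty

# Hypothetical: adding an indicator drops the hospital-level variance from
# 0.40 to 0.14, so it explains 65% of systematic variation and would pass
# the 10% inclusion threshold described in the text.
share = proportion_explained(0.40, 0.14)
```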

Table 1
Components of the composite measure for all 5 procedures are shown, along with the proportion of non-random hospital-level mortality explained by each.

Next, we calculated weights for each quality indicator. The weight placed on each quality indicator in our composite measure was based on 2 factors [11]. The first is the hospital-level correlation of each quality indicator with the mortality rate for the index operation. The strength of these correlations indicates the extent to which other quality indicators can be used to help predict mortality for the index operation. The second factor affecting the weight placed on each quality indicator is the reliability with which each indicator is measured. Reliability ranges from 0 (no reliability) to 1 (perfect reliability). The reliability of each quality indicator refers to the proportion of the overall variance that is attributable to true hospital-level variation in performance, as opposed to estimation error (“noise”). For example, in smaller hospitals, less weight is placed on mortality and complication rates because they are less reliably estimated. We assume that structural characteristics of each hospital (such as hospital volume) are not estimated with error and, therefore, have reliability equal to 1.
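The reliability concept above (signal variance over total variance) can be sketched as follows. This is an illustrative simplification with hypothetical values, using a simple binomial approximation for estimation noise rather than the study's full empirical Bayes machinery:

```python
def reliability(signal_var, noise_var):
    """Reliability = true hospital-level (signal) variance divided by total
    variance (signal plus estimation noise). Ranges from 0 to 1."""
    return signal_var / (signal_var + noise_var)

def binomial_noise_var(p, n):
    """Sampling noise in an observed event rate; shrinks with caseload n."""
    return p * (1 - p) / n

# Same true between-hospital variance, different volumes: the smaller
# hospital's mortality rate is noisier and therefore receives less weight.
small = reliability(0.001, binomial_noise_var(0.04, 25))
large = reliability(0.001, binomial_noise_var(0.04, 400))
```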

Comparing performance of hospital ratings

We determined the value of our composite measure by determining how well it predicted future risk-adjusted mortality compared to several other approaches. For each operation (data from years 2005–06), hospitals were ranked based on the composite measure. Each hospital was assigned one of three rankings (1-star, 2-star, and 3-star). The “worst” hospitals (bottom 20%) received a 1-star rating, the middle of the distribution (60%) received a 2-star rating, and the “best” hospitals (top 20%) received a 3-star rating. Many hospital rating systems determine tiers of performance by designating high and low outliers by testing for statistically significant differences from the average. Because we used empirical Bayes methods, which adjust each hospital's composite for imprecision (i.e., hospital rankings are a valid indicator of relative performance), we used percentile cut-offs for this analysis. We then calculated the risk-adjusted mortality rates for 1-star, 2-star, and 3-star hospitals during the subsequent two years (data from years 2007–08).
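The percentile-based star assignment described above can be sketched in a few lines. This is an illustrative Python sketch (not the authors' code), ignoring ties and any minimum-caseload rules:

```python
def star_ratings(predicted_mortality):
    """Percentile cut-offs: the best (lowest predicted mortality) 20% of
    hospitals get 3 stars, the middle 60% get 2 stars, and the worst 20%
    get 1 star."""
    n = len(predicted_mortality)
    # Rank hospitals from best (lowest predicted mortality) to worst.
    order = sorted(range(n), key=lambda i: predicted_mortality[i])
    stars = [0] * n
    for rank, i in enumerate(order):
        pct = rank / n
        stars[i] = 3 if pct < 0.20 else (2 if pct < 0.80 else 1)
    return stars

# Ten hypothetical hospitals ranked by composite-predicted mortality:
ratings = star_ratings([0.02, 0.09, 0.03, 0.05, 0.04, 0.06, 0.07, 0.08, 0.10, 0.045])
```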

We next assessed the ability of our composite measure to predict future performance compared to other widely used surgical quality measures. As with the composite measure, for each operation (data from years 2005–06), hospitals were ranked based on hospital volume, risk-adjusted mortality, “reliability adjusted” mortality (i.e., empirical Bayes shrinkage alone), and process measures from Hospital Compare. Hospital Compare reports data on each hospital's compliance with processes of care from the Surgical Care Improvement Program (SCIP). These measures track adherence to processes of care that have been shown to prevent common surgical complications, including infection, venous thromboembolism, and cardiac complications [4]. For all of these analyses, we evaluated the discrimination in future risk-adjusted mortality, comparing the 1-star hospitals (bottom 20%) to the 3-star hospitals (top 20%) for each of the measures.

We also compared the ability of the composite measure and these other quality measures to explain future (2007–08) hospital-level variation in risk-adjusted mortality. To avoid problems with “noise variation” in the subsequent time period, we determined the proportion of systematic hospital-level variation explained. We generated hierarchical models with mortality as the dependent variable (2007–08) and used them to estimate the hospital-level variance. We first used an “empty model” that contained only patient variables for risk-adjustment. We then entered each historical quality measure (assessed in 2005–06) into the model. We then calculated the degree to which the historical quality measures reduced the hospital-level variance, an approach described in our prior work [11]. All statistical analyses were conducted using Stata 10.0 (StataCorp, College Station, Texas).


Components of the composite measure

For each of the 5 procedures, several individual measures explained a significant proportion of hospital-level variation in risk-adjusted mortality (Table 1). The importance of each individual measure varied across procedures. For example, hospital volume with the procedure of interest explained 65% of the variation in risk-adjusted mortality for pancreatic resection, but only 11% for coronary artery bypass surgery (Table 1). Hospital volume with other operations was also important in explaining variation in mortality with all 5 procedures, explaining from 11% (coronary artery bypass surgery) to 43% (pancreatic resection). Other structural measures, including hospital teaching status and nurse-to-patient ratios, explained a large proportion of the variation in mortality for pancreatic resection and esophageal resection, but were not important for other procedures (Table 1).

The amount of hospital-level variation explained by each procedure's own mortality rate varied, ranging from 50% with coronary artery bypass surgery to only 19% for esophageal resection (Table 1). Mortality with other, related procedures was important in explaining hospital-level variation for all 5 procedures (Table 1). For example, mortality with aortic valve surgery and mitral valve surgery explained 25% and 20% of the hospital-level variation in risk-adjusted mortality with coronary artery bypass surgery, respectively. Rates of non-fatal complications were not important in explaining variation in mortality rates for any of the 5 procedures.

The average weights on each input measure also varied across procedures (Table 2). For each procedure, most weight was placed on the structural variables such as hospital volume. The weight placed on structural variables such as volume also includes the weight placed on the average mortality (consistent with empirical Bayes “shrinkage”) and therefore appears relatively high for all operations. Nonetheless, the amount of weight placed on each condition's own mortality rate varied from 46% with percutaneous coronary interventions to only 14% with esophageal resection (Table 2). For esophagectomy, the least common operation, the weight placed on mortality with another, related operation (colon resection) was higher than the weight placed on its own mortality (29% vs. 14%).

Table 2
Weights placed on each input measure for the composite measure for all 5 procedures are shown.

Ability of the composite to predict future performance

The composite score created by combining these individual measures performed well at predicting future hospital performance (Figure 1). Compared to 1-star hospitals, risk-adjusted mortality was much lower at 3-star hospitals for esophagectomy (6.7% vs. 14.4%), pancreatectomy (4.7% vs. 9.2%), coronary artery bypass surgery (2.6% vs. 5.0%), aortic valve replacement (4.5% vs. 8.5%), and percutaneous coronary interventions (2.4% vs. 4.1%). These differences in mortality were not explained by observable differences in patient severity of illness, as the differences in patient characteristics shown in Table 3 were adjusted for in all comparisons.

Figure 1
Future risk-adjusted mortality rates (2007–08) for 1-star, 2-star, and 3-star hospitals as assessed using the composite measure from the two previous years (2005–06).
Table 3
Patient characteristics for 1-star, 2-star, and 3-star hospitals in 2005–06.

For all 5 procedures, the composite measure based on 2005–06 data was better at discriminating future performance in 2007–08 when compared to risk-adjusted mortality (Table 4). For example, with coronary artery bypass surgery, historical mortality predicted a smaller difference between the 1-star (bottom 20%) and 3-star (top 20%) hospitals (OR, 1.61; 95% CI, 1.48 to 1.75) when compared to the composite measure (OR, 2.10; 95% CI, 1.93 to 2.28) (Table 4). For all 5 procedures, the composite measures based on 2005–06 data were also better at discriminating future performance in 2007–08 when compared to hospital volume (Table 4). For example, with aortic valve replacement, hospital volume predicted a smaller difference between the 1-star (bottom 20%) and 3-star (top 20%) hospitals (OR, 1.65; 95% CI, 1.89 to 2.34) when compared to the composite measure (OR, 2.10; 95% CI, 1.89 to 2.34) (Table 4). Although reliability adjusted mortality (i.e., empirical Bayes shrinkage) performed better than risk-adjusted mortality assessed using standard techniques, the composite measure was better at discriminating future performance for all 5 procedures (Table 4).

Table 4
Relative ability of hospital rankings based on individual and composite measures from 2005–06 to forecast risk-adjusted mortality in 2007–08.

Composite measures were also much better at discriminating future performance than the measures publicly reported on the Hospital Compare website. When comparing the 1-star (bottom 20%) to 3-star (top 20%) hospitals, the composite predicted much larger differences than the Hospital Compare ratings for all 5 procedures (Table 4). For example, with pancreatic cancer resection, the difference in the risk of future mortality between 1-star and 3-star hospitals based on the Hospital Compare ratings was much smaller (OR, 1.40; 95% CI, 1.02 to 1.91) than the difference based on the composite measure (OR, 2.28; 95% CI, 1.55 to 3.36).

The composite measures were also much better at explaining systematic hospital-level variation in risk-adjusted mortality in the next two years (Table 5). Although the ability to explain future risk-adjusted mortality varied across measures, the composite measure outperformed all individual measures, including reliability adjustment (i.e., simple empirical Bayes shrinkage) (Table 5).

Table 5
Proportion of subsequent (2007–08) hospital-level variation in risk-adjusted mortality explained by rankings for each quality measure assessed in the prior two years (2005–06).


In this study, we investigated the value of empirically weighted composite measures for assessing surgical performance. We found that several input measures explained a large proportion of hospital-level variation in risk-adjusted mortality, but the relative importance of each measure varied across procedures. Composite measures combining various types of information about quality were better at forecasting future performance than existing quality indicators, including hospital volume, risk-adjusted mortality, reliability adjusted mortality, and the quality ratings reported by the Center for Medicare and Medicaid Services (CMS) on its Hospital Compare website.

There is growing interest in composite measures of performance in healthcare [17]. Recent pay-for-performance programs, including the Center for Medicare and Medicaid Services (CMS)/Premier pilot program, use composite quality measures for several medical conditions and surgical procedures [18]. The Society of Thoracic Surgeons (STS), which maintains a large national registry in cardiac surgery, recently created a composite measure of hospital performance by combining process and outcome measures [19]. Like our composite measures, these approaches are created by combining multiple input measures. However, they were designed with distinctly different goals in mind. The CMS and STS composite scores aim to provide a summary score of multiple domains of quality. In contrast, our measure was designed to optimize the prediction of one particularly central measure of quality—risk-adjusted mortality. As a result, we use a different approach to weighting input measures. Many existing approaches for creating composite measures, including those of the CMS and STS, assign equal weight to all measures (i.e., the all or none approach) or weight measures according to expert opinion. Our method differs from existing composites by providing an empirical weighting process that takes into account the importance of each input.

Our findings suggest that the weights applied to composite measures need to be tailored to the specific operation. We found that the reliability of mortality as a hospital quality measure varies dramatically across procedures. For very common operations, such as coronary artery bypass, more weight is placed on the mortality rate, largely because it is measured with more precision. At the other end of the spectrum, less common operations like pancreatic and esophageal cancer resection are not performed often enough to measure mortality precisely, and very little weight should be placed on the mortality rate. Another reason that weighting of inputs to a composite measure should be tailored to the procedure is that hospital volume matters more for some operations than others. It is well known that the strength of the volume-outcome relationship varies across procedures (e.g., very strong for pancreatic and esophageal resection, much less so for coronary artery bypass) [6]. Prior to weighting measures in a composite score, this relationship should be systematically evaluated and used to guide the empirical weighting of input measures.

Given the value of our composite measure in predicting future hospital outcomes, we believe our measures would be particularly valuable for public reporting or value-based purchasing. In such contexts, arguably the most important criterion of their usefulness is the extent to which measures based on historical information can predict outcomes here and now [20]. In quality improvement contexts, however, information about past performance is arguably most relevant to help hospitals target quality improvement efforts. While our composite measures perform well in discriminating hospitals on their historical performance, their summary nature makes them more limited (i.e., less actionable) for purposes of quality improvement.

We should acknowledge several limitations to this study. Because we used Medicare claims data, our adjustment for patient case mix is limited. Although we adjusted for several patient variables, including age, gender, race, urgency of admission, socioeconomic status, and secondary diagnoses, problems with risk-adjustment using administrative data are well known [21]. If differences in patient risk varied systematically across hospitals, our analysis would tend to overestimate the ability of hospital ratings to forecast future mortality. However, random, year-to-year differences in patient risk would bias our results towards a null finding and lead to an underestimation of the predictive power of composite measures. While there is little empirical data establishing whether these differences are random or systematic, there is a growing body of evidence suggesting that hospital case-mix may not vary substantially, especially among patients undergoing the same surgical procedure [22].

The use of Medicare claims data limits our study in several other ways. First, we used Medicare fee-for-service volume rather than total hospital volume. Although these volume measures are highly correlated, a more complete ascertainment of hospital volume would likely further improve the predictive ability of our measure. Second, administrative data are limited in their ability to accurately ascertain non-fatal complications. We focused on a subset of complications previously shown to have a high sensitivity and specificity on medical chart review [14, 15]. Using data from a clinical registry, where complications are determined from the medical chart based on rigorous definitions, could potentially enhance the ability of our composite measure to predict future performance. Although there are a growing number of hospitals participating in clinical registries in surgery (e.g., the National Surgical Quality Improvement Program), such detailed data are not currently available for most US hospitals [23].

Our findings also suggest that surgical quality measures publicly reported by CMS on the Hospital Compare website are not ideal for helping patients identify the safest hospitals for surgery. The Hospital Compare measures are processes of care related to preventing surgical complications, as developed for the Surgical Care Improvement Program (SCIP). Although these measures were selected because of clinical trials linking them to better outcomes, there is growing evidence that these processes do not account for hospital level variations in important surgical outcomes, such as complications and mortality [24]. The composite measures described in this study would be much better at helping patients and payers identify low mortality hospitals for high-risk surgery.

Our findings may also have implications for the Hospital Compare measures for non-surgical conditions. CMS publicly reports risk-standardized mortality and readmission rates for several common, inpatient medical conditions, such as acute myocardial infarction, congestive heart failure, and pneumonia. The modeling strategy used by CMS for these measures addresses the problem of small hospital size (i.e., statistical “noise”) using Bayesian analyses somewhat analogous to those in this paper, with one key difference. The Hospital Compare measures do not include hospital volume or other structural variables in their modeling strategy. Because of this exclusion, they implicitly make the assumption that small hospitals have average performance, which is not supported by the empirical data [25]. As shown in this paper, hospital volume and other structural variables are important for optimally predicting a hospital's future performance. Silber et al. have recently brought attention to this issue and assert that a measure based on mortality and hospital volume combined would more accurately reflect “true” hospital performance [26]. Thus, the methods presented in this paper may have important implications outside surgery.

Numerous stakeholders would benefit from better measures of surgical quality. Patients would benefit by having access to data that could help them increase their chances of surviving surgery by choosing the right hospital. Payers and health care purchasers would benefit by having reliable measures—created using readily available data—that would help them identify high quality hospitals for their selective referral and value-based purchasing programs. Although these composite measures could no doubt be improved with better inputs, they represent a significant advance over current surgical quality indicators.

Supplementary Material

Supp App S1


The authors would like to acknowledge Wenying Zhang at the Center for Healthcare Outcomes and Policy for assistance with data management and analysis.

Funding: This study was supported by a career development award to Dr. Dimick from the Agency for Healthcare Research and Quality (K08 HS017765) and a grant to Dr. Birkmeyer and Dr. Staiger from the National Institute on Aging (P01AG019783). The views expressed herein do not necessarily represent the views of the Centers for Medicare and Medicaid Services or the United States Government.


Prior presentation: This work was presented at the American College of Surgeons Clinical Congress Papers Session in Washington, D.C., October, 2010.

Disclosures: Drs. Dimick, Staiger, and Birkmeyer are paid consultants and have an equity interest in ArborMetrix, Inc, a company that provides software and analytics for assessing hospital quality and efficiency. The company had no role in the conduct of the study herein.


References

1. Kizer KW. Establishing health care performance standards in an era of consumerism. JAMA. 2001;286(10):1213–7.
2. Galvin RS. The business case for quality. Health Aff (Millwood). 2001;20(6):57–8.
3. Dimick JB, et al. Composite measures for predicting surgical mortality in the hospital. Health Aff (Millwood). 2009;28(4):1189–98.
4. Stulberg JJ, et al. Adherence to surgical care improvement project measures and the association with postoperative infections. JAMA. 2010;303(24):2479–85.
5. Birkmeyer JD, Dimick JB, Birkmeyer NJ. Measuring the quality of surgical care: structure, process, or outcomes? J Am Coll Surg. 2004;198(4):626–32.
6. Halm EA, Lee C, Chassin MR. Is volume related to outcome in health care? A systematic review and methodologic critique of the literature. Ann Intern Med. 2002;137(6):511–20.
7. Birkmeyer JD, et al. Hospital volume and surgical mortality in the United States. N Engl J Med. 2002;346(15):1128–37.
8. Dimick JB, Welch HG, Birkmeyer JD. Surgical mortality as an indicator of hospital quality: the problem with small sample size. JAMA. 2004;292(7):847–51.
9. Shahian DM, et al. Cardiac surgery report cards: comprehensive review and statistical critique. Ann Thorac Surg. 2001;72(6):2155–68.
10. Dimick JB, Staiger DO, Birkmeyer JD. Ranking hospitals on surgical mortality: the importance of reliability adjustment. Health Serv Res. 2010;45(6 Pt 1):1614–29.
11. Staiger DO, et al. Empirically derived composite measures of surgical performance. Med Care. 2009;47(2):226–33.
12. Dimick JB, Staiger DO, Birkmeyer JD. Are mortality rates for different operations related? Implications for measuring the quality of noncardiac surgery. Med Care. 2006;44(8):774–8.
13. Goodney PP, et al. Do hospitals with low mortality rates in coronary artery bypass also perform well in valve replacement? Ann Thorac Surg. 2003;76(4):1131–6; discussion 1136–7.
14. Weingart SN, et al. Use of administrative data to find substandard care: validation of the complications screening program. Med Care. 2000;38(8):796–806.
15. Lawthers AG, et al. Identification of in-hospital complications from claims data: is it valid? Med Care. 2000;38(8):785–95.
16. Southern DA, Quan H, Ghali WA. Comparison of the Elixhauser and Charlson/Deyo methods of comorbidity measurement in administrative data. Med Care. 2004;42(4):355–60.
17. Peterson ED, et al. ACCF/AHA 2010 position statement on composite measures for healthcare performance assessment: a report of the American College of Cardiology Foundation/American Heart Association Task Force on Performance Measures. Circulation. 2010;121(15):1780–91.
18. O'Brien SM, et al. Exploring the behavior of hospital composite performance measures: an example from coronary artery bypass surgery. Circulation. 2007;116(25):2969–75.
19. Shahian DM, et al. Quality measurement in adult cardiac surgery: part 1--conceptual framework and measure selection. Ann Thorac Surg. 2007;83(4 Suppl):S3–12.
20. Birkmeyer JD, Dimick JB, Staiger DO. Operative mortality and procedure volume as predictors of subsequent hospital performance. Ann Surg. 2006;243(3):411–7.
21. Iezzoni LI. Assessing quality using administrative data. Ann Intern Med. 1997;127(8 Pt 2):666–74.
22. Dimick JB, Birkmeyer JD. Ranking hospitals on surgical quality: does risk-adjustment always matter? J Am Coll Surg. 2008;207(3):347–51.
23. Khuri SF, et al. Successful implementation of the Department of Veterans Affairs' National Surgical Quality Improvement Program in the private sector: the Patient Safety in Surgery study. Ann Surg. 2008;248(2):329–36.
24. Hawn MT. Surgical care improvement: should performance measures have performance measures? JAMA. 2010;303(24):2527–8.
25. Ross JS, et al. Hospital volume and 30-day mortality for three common medical conditions. N Engl J Med. 2010;362(12):1110–8.
26. Silber JH, et al. The Hospital Compare mortality model and the volume-outcome relationship. Health Serv Res. 2010;45(5 Pt 1):1148–67.