NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

National Research Council (US) Panel to Advance a Research Program on the Design of National Health Accounts. Accounting for Health and Health Care: Approaches to Measuring the Sources and Costs of Their Improvement. Washington (DC): National Academies Press (US); 2010.


3 Allocating Medical Expenditures: A Treatment-of-Disease Organizing Framework


Developing a national health account—whether the satellite medical care version of the Bureau of Economic Analysis (BEA) or a broader version designed to track population health status and its determinants—requires defining useful expenditure categories and then devising a method for allocating economy-wide spending on medical care into those categories. In addition, units of output that are meaningful from a consumer standpoint must be identified in such a way that price and quantity measures can be attached.1

In Chapter 2, we described the two existing accounts for medical care—(1) the National Income and Product Accounts (NIPAs) and (2) the National Health Expenditure Accounts (NHEAs)—and developed the profile of an improved and more adequate account that links medical care inputs with medical care output. Section 2.5 specifies the output concept for measuring the production of medical care: it is an episode of treatment for a disease. In the two existing accounts for medical care, however, the output concept is not fully developed, as neither account presents clear information on what the medical care system actually produces.

The NHEAs have been compiled and maintained by the Office of the Actuary at the Centers for Medicare & Medicaid Services (CMS) since 1960. The accounts track the flow of funds into and out of the health care system, providing information on payer type (e.g., Medicare, out of pocket) and services purchased (e.g., hospital care, pharmaceuticals) in a series of standardized tables published annually on the CMS website. A typical NHEA table forms a “sources and uses” matrix, imposing a specific set of accounting principles for who pays and how much and ensuring that all subtotals add up in a consistent manner.

While providing essential information on health care spending trends, the NHEAs have historically revealed little about the output of the sector—what is being bought—in terms that are meaningful for assessing medical care productivity and the impact on population health. The highly aggregated NHEAs data leave gaps that need filling if a number of critical health policy questions are to be resolved. Does an expensive new medical technology provide enough added health benefit to justify its use when compared with less costly alternatives? How do the public and private sectors encourage or limit adoption and diffusion of new technologies? And, more generally, which medical treatments are the most productive in terms of generating improved population health, and which are the least?

With the NHEAs alone, it is not possible to determine whether medical costs are increasing more because of cardiovascular disease treatments or because of cancer prevention activities. It is also largely unknown who is affected, and how, by the spending. Are vulnerable populations benefiting or suffering from current resource allocation strategies? Simply put, health care cost containment strategies in the United States are debated and pursued with inadequate information about what (or on whom) money is being spent (Triplett, 2001; Triplett and Bosworth, 2008). Addressing critical health policy questions requires more disaggregated data. Recognizing this need, researchers have argued strongly for integrating cost-of-illness (COI) data into the NHEAs (Thorpe, 1999; Rosen and Cutler, 2007), linking microdata from national expenditure surveys to the macrodata in these accounts.

A similar deficiency limits the value of the medical care information in the NIPAs. Output estimates exist for the medical care sector and subsectors (for example, the ambulatory care subsector), but nowhere in the NIPAs is information presented on the products that the medical care sectors produce. Adding COI estimates to the NHEAs and the NIPAs can provide this critical information. Thus, a central issue in expanding either account of medical care is adding the disaggregated microdata needed to estimate treatment of disease costs.

As discussed in Chapter 2, linking health care spending to the treatment of specific diseases is useful in several respects. It provides a framework for understanding changes in the cost and quantity of health care, and it makes it possible to distinguish the effects of increasing prices for health care from the effects of increasing provision of services. Disease-based accounts also provide useful indicators of the economic burden individual diseases place on society; they can also be used to help identify how health resources are currently allocated, including across different population subgroups (informing questions of distributional equity). In addition, estimating health care expenditures by disease permits linkages with other essential information. For example, the effectiveness of therapies and the outcomes of care are measured this way, so a disease-based classification of spending is more clinically relevant and understandable to providers and to patients.

Of course, health care expenditures by themselves, even if grouped by disease, tell us little about health system performance or about priorities for resource allocation. Ultimately, if spending on treatments and prevention (the inputs) can be successfully linked to resultant changes in health status (outcomes), policy makers will be armed with a powerful tool: the information needed to better target spending to its most efficient uses. This tool will also help determine who (the wealthy, the vulnerable, the elderly, the young) is benefiting or suffering from current resource allocation strategies.2 Developing this information base is a complex, multistage process.

In shifting the output concept, we noted in Chapter 2 that an episode of treatment may not apply neatly to a category, such as preventive care, that is beyond those explicitly designated for diseases. Likewise, episodes may need to be specified differently for acute care than for chronic care of the same disease. In short, there are multiple ways in which episodes of care can be conceptualized, categorized, and put into practice for attributing spending across the range of medical services. In this chapter, we sort through some of these options and describe issues that must be resolved in order to move toward a treatment-of-disease framework for a national health account.3


While the NHEAs measure spending broadly by source and recipient, a separate literature has focused on measuring the costs of particular illnesses using more disaggregated data. These COI studies quantify the economic impact of a disease and, along with information on prevalence, morbidity, and mortality, help portray the overall burden of disease in the population. Clabaugh and Ward (2008) provide a recent review of the COI literature.

3.2.1. Historical Context—United States and Abroad

The initial landmark studies to distribute total personal health care spending by diagnosis were published by Dorothy Rice in the 1960s (1966; Rice and Horowitz, 1967). These efforts were followed by a series of general COI studies estimating disease costs (Cooper and Rice, 1976; Berk, Paringer, and Mushkin, 1978; Rice and MacKensie and Associates, 1989; Rice et al., 1991; Hodgson and Cohen, 1999). Since the Medical Expenditure Panel Survey (MEPS) began, COI studies reporting direct medical costs have become more common (e.g., see Druss et al., 2001; Cohen and Krauss, 2003; Thorpe, Florence, and Joski, 2004; Roehrig et al., 2009).

COI studies in the United States have been influential in efforts to define disease burden, justify policy interventions, assist in the allocation of research dollars to specific diseases, provide an economic framework for program evaluation, and provide a basis for policy and planning activities (Rice, 2000). Responding to congressional requests, the National Institutes of Health (NIH) have produced several COI summaries (Varmus, 1995, 1997, 2000), and such estimates have been cited in congressional testimony, official reports, and other publications (Englander, Hodgson, and Terragrossa, 1996; Graham et al., 1997; Medicare Payment Advisory Commission [MedPAC], 2006). They have also served as justification for the expansion of research funding for specific disease areas. Indeed, Congress has expressed interest in using COI estimates as a tool for allocating research dollars among NIH, and panels of the Institute of Medicine have recommended their routine production (Institute of Medicine, 1998).

Beyond the United States (specifically in Australia, Canada, France, Germany, Japan, the Netherlands, Spain, Sweden, and the United Kingdom), researchers have developed COI estimates to account for national health expenditures; some have done this within national health accounting frameworks, in anticipation of further development of disease-based satellite accounts (Heijink, Koopmanschap, and Polder, 2006). Indeed, several recent cross-national comparisons have been performed (Polder et al., 2005; Heijink, Koopmanschap, and Polder, 2006; Heijink et al., 2008), and the comparability of data and methods has improved with each subsequent study. The Organisation for Economic Co-operation and Development (OECD) has sponsored work to develop a conceptual framework to account for expenditures by patient age, gender, and disease in the system of health accounts (Slobbe, Heijink, and Polder, 2007; Organisation for Economic Co-operation and Development, 2009).

Although their potential role in a health accounting data infrastructure is clear, COI studies have limitations. A single COI estimate is insufficient by itself as a policy tool. Rather, it needs to be embedded in a framework that allows disease-based cost estimates to be meaningfully connected to changes in population health (discussed in Chapter 5) and, in turn, to health policies. In addition, COI studies have to date mostly been one-period snapshots, without the necessary time-series comparability that would maximize their usefulness (a welcome recent exception is Roehrig et al., 2009). Some of the better cost estimates can provide inputs to cost-effectiveness analyses that are essential to better informed resource allocation strategies. That said, the methods employed by COI studies—many of which are conducted at aggregated levels—can vary substantially, and no single approach is at this time considered the gold standard, provoking ongoing debate about their usefulness for policy purposes (Koopmanschap, 1998; Bloom et al., 2001; Akobundu, Blatt, and Mullins, 2006; Clabaugh and Ward, 2008). Clouding this debate is an important but frequently overlooked distinction between two very different types of COI studies: disease-specific studies, which estimate the cost of a single disease, and general COI studies, which attempt to allocate total expenditures to multiple diseases.

3.2.2. Disease-Specific COI Studies

The vast majority of published COI studies are disease-specific (Koopmanschap, 1998; Bloom et al., 2001; Akobundu, Blatt, and Mullins, 2006; Clabaugh and Ward, 2008), and it is to these that most COI methodological concerns refer.4 It is often difficult, if not impossible, to compare cost estimates within a single disease or across different diseases because no standard COI methodology exists. Some studies produce prevalence-based (annual) COI estimates, while others produce incidence-based (lifetime) estimates (Hodgson, 1988). Some studies include direct costs only, while others include indirect costs as well. Studies vary in their perspective, time horizon, discounting practices, data sources, and underlying purpose. Frequently, studies do not include all critical components of direct spending and may therefore underestimate COI. For example, a COI study using Medicare claims data before 2006 would fail to include full costs of most prescription drugs (out-of-pocket costs, ignoring bad debt, are not explicitly on the claims5). At the same time, disease-specific studies risk double counting the costs of comorbidities and disease complications that are common to multiple diseases. If, for example, the costs of heart attacks are attributed to diabetes in one study, hypertension in another, and preexisting coronary heart disease in yet another, the total cost of all diseases will be overestimated. Indeed, a systematic review of COI studies by Bloom et al. (2001) found up to a seven-fold difference in estimated direct costs within a given disease. Furthermore, the total median direct medical cost of the 80 diagnoses reviewed was more than twice the total actual 1992 U.S. health expenditures (Bloom et al., 2001). There has been little effort in this single-disease study literature to reconcile or to explain to users the sources of differences.

Calls have been made for the development of standardized guidelines for performing and reporting COI studies (Hodgson and Meiners, 1982; Bloom et al., 2001; Ettaro et al., 2004; Clabaugh and Ward, 2008), analogous to those published for cost-effectiveness analyses (Gold, 1996). However, while standards may improve the comparability of these disease-specific studies, conceptually they are not as well suited as general COI studies are for the broader type of satellite health accounts described herein.

3.2.3. General COI Studies

General COI studies allocate total expenditures for a population to a group of diseases. Costs are usually estimated using a top-down approach in which total costs for the health care sector are used as the starting point and some fraction of the sector’s costs is attributed to each of the diseases of interest. Because they are constrained to national expenditure totals, general COI studies are considered more methodologically sound (Koopmanschap, 1998) and are certainly more readily aligned with the NHEAs than are disease-specific studies (Slobbe, Heijink, and Polder, 2007; Organisation for Economic Co-operation and Development, 2009). However, they too are not without limitations. As with the disease-specific estimates, costs must be constrained to a national total to avoid double counting. General COI studies reduce this risk (but do not preclude it) by creating disease groups that are usually mutually exclusive and exhaustive.

Comorbidities complicate the allocation of expenditures. If a person has diabetes and a prior heart attack and is now taking an ACE inhibitor, it is not obvious how the costs of the drug should be divided among diseases. The most common methodology for dealing with comorbidities is to assign each service to one condition—typically, the principal diagnosis (in the example above, most likely the heart attack)—but other methods, discussed below, are feasible. How to think about conditions—such as diabetes or hypertension—that are risk factors for later problems depends on the goals of the exercise. For dividing up current expenditures, these later problems are irrelevant, but for studying the impact of current treatments on future health, current costs may be a small part of the total impact (Lee, Meyer, and Clouse, 2001; Norlund et al., 2001).
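The principal-diagnosis rule described above can be illustrated with a minimal sketch. The records, costs, and the function name `allocate_to_principal` are hypothetical and are not drawn from any survey or from the studies cited; the sketch assumes that each service lists its diagnoses with the principal one first.

```python
from collections import defaultdict

# Hypothetical service records: (cost in dollars, diagnoses with the principal one first).
services = [
    (120.0, ["heart attack", "diabetes"]),
    (80.0, ["diabetes"]),
    (50.0, ["heart attack", "diabetes", "hypertension"]),
]

def allocate_to_principal(records):
    """Assign the full cost of each service to its first-listed (principal) diagnosis."""
    totals = defaultdict(float)
    for cost, diagnoses in records:
        totals[diagnoses[0]] += cost
    return dict(totals)

by_disease = allocate_to_principal(services)
```

Because every service is counted exactly once, the disease totals sum to total spending, which is the property that guards against double counting. The cost of this simplicity is that secondary diagnoses (here, hypertension) receive no spending at all.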

Beyond comorbidities, ill-defined diseases pose a more general problem (e.g., how are costs for chronic fatigue syndrome or stress ulcers parceled out?). By and large, the existing COI studies to date have examined the “easy” disease cases; they have also typically dealt with diseases associated with high aggregate expenditures. The hard-to-classify treatments are an important and difficult problem, and tracking this subset of expenditures in a detailed way will require much more study.

Another issue common to both disease-specific and general COI studies is how to separate prevention and screening costs from treatment costs. One would not want to consider fecal occult blood testing for colorectal cancer screening in the same spending category as chemotherapy to treat a diagnosed case. Both apply to the same disease, but they have very different implications for medical spending. This need to separate other types of medical spending from disease treatment is part of the rationale underlying Recommendation 2.11, which calls for research to begin on estimating the costs of (and eventually the health return from) nontreatment (i.e., management, preventive, diagnostic, screening), nondisease-specific (e.g., the cost of a physical), and long-term medical services.


The value of organizing a national health account around treatment-of-disease measurement units lies in its potential to better inform the policy process than do either NHEAs and NIPAs (as currently specified) or COI studies alone. While the NHEAs and economic accounts include comprehensive health care expenditures, the high level of aggregation and the lack of information on health preclude many policy analyses. National survey data (described below and in Appendix A) produce detailed information on both costs and health that can support COI estimation and microsimulation modeling of policy alternatives, but care must be taken to ensure that estimates do not exceed national expenditure totals. Institutionalizing the expenditure surveys’ disaggregated data within an economic accounting framework would ensure that estimates from COI studies and from microsimulation modeling link to and are constrained by the more aggregated totals in NHEAs (Thorpe, 1999). A combined analytic dataset would build on the strengths of each while addressing their weaknesses as stand-alone data sources.

Disease-based accounts would supplement, rather than substitute for, the NHEAs or the NIPAs. The disease-based estimates can also be attached to the BEA industry account, as suggested in Chapter 2. The basic framework could build on NHEA (and NIPA) sources and on their matrix structures by adding disease categories as a third dimension. This three-way matrix would support multiple data tables: total expenditures by disease, payers by disease, and services purchased by disease, among others. The goal for the accounts would be to allocate total personal health care expenditures to a mutually exclusive, exhaustive set of disease categories. While tables would follow NHEA standards for classification and completeness, the dimensions of the tables would largely be dictated by data availability. Therefore, while it would not be necessary to include every category of spending in a table, those categories that are shown would need to be distributed completely.
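As a rough illustration of how a three-way matrix could yield several tables, the sketch below stores spending in cells keyed by payer, service, and disease, then sums over the unwanted dimensions. The categories, dollar amounts, and the helper `margin` are hypothetical and chosen only for illustration; they do not reproduce any NHEA table.

```python
from collections import defaultdict

# Hypothetical spending cells keyed by (payer, service, disease), in billions of dollars.
cells = {
    ("Medicare", "hospital care", "heart disease"): 40.0,
    ("Medicare", "pharmaceuticals", "diabetes"): 10.0,
    ("out of pocket", "pharmaceuticals", "diabetes"): 5.0,
    ("out of pocket", "hospital care", "heart disease"): 8.0,
}

def margin(matrix, *axes):
    """Sum the three-way matrix down to the chosen axes (0=payer, 1=service, 2=disease)."""
    table = defaultdict(float)
    for key, amount in matrix.items():
        table[tuple(key[a] for a in axes)] += amount
    return dict(table)

total_by_disease = margin(cells, 2)       # total expenditures by disease
payer_by_disease = margin(cells, 0, 2)    # payers by disease
service_by_disease = margin(cells, 1, 2)  # services purchased by disease
```

Every table derived this way sums to the same grand total, so the tables remain mutually consistent in the "sources and uses" sense.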

As noted throughout this report, one of the most difficult conceptual issues in allocating expenditures to diseases is dealing with comorbidities; when patients utilize medical care services for multiple conditions, it becomes much more complicated to assign costs and attribute treatment effects to predefined categories accurately. And, as noted elsewhere, comorbidities are only one source of heterogeneity in groups of individuals who might be classified as having a single diagnosis. Others are stage, severity, and even different diseases that might share the same disease classification code.

One aspect of these complications associated with comorbidities involves determining whether they are independent phenomena, which they seldom are. Researchers are rightly interested in the marginal effect of a treatment of a given disease on a comorbidity, or conversely, the effect of treating a comorbidity on a disease. For example, a patient admitted to a hospital for treatment of a heart condition may also suffer from Alzheimer’s disease and cancer. The comorbidities may not be risk factors for heart disorders, but their presence may contribute to a higher than average length of stay and related treatment costs for the heart condition. Some portion of the extra cost could legitimately be attributed to Alzheimer’s disease and cancer rather than solely to heart disease.

In considering the feasibility of a national health account and the rapidity with which it can be developed, it is important to point out that potential solutions do exist for problems created by comorbidities. For example, if a case involves heart disease and lung cancer, then the cost of heart disease could be assigned based on data on heart disease cases in which there were no comorbidities present; the same could be done to assign the cost of cancer. The sum of these two stand-alone costs will either exceed the actual cost of the case (if there are economies of scope in treating patients with multiple conditions) or fall short of it. Thus, to estimate costs for cases with comorbidities, the stand-alone figures are compressed or inflated so that they agree with the actual cost. Of course this is not quite right, but the error is nowhere near large enough to justify giving up the project. This method is borrowed from nonmedical applications in which more than one determinant is present, and it can be applied without data on individuals; many ways of doing this have been developed by risk-adjustment researchers.
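One simple way to compress or inflate the figures, assumed here purely for illustration, is to scale each stand-alone cost by a common factor so that the pieces sum to the actual cost of the case. The dollar amounts and the function name `scale_to_actual` are hypothetical; the text does not specify the scaling rule, and risk-adjustment researchers have developed other approaches.

```python
def scale_to_actual(standalone_costs, actual_cost):
    """Rescale stand-alone disease costs by one common factor to match the actual cost.

    standalone_costs: typical cost of each disease when no comorbidities are present.
    actual_cost: observed total cost of the case with comorbidities.
    """
    benchmark_total = sum(standalone_costs.values())
    if benchmark_total <= 0:
        raise ValueError("stand-alone costs must sum to a positive amount")
    factor = actual_cost / benchmark_total  # < 1 compresses, > 1 inflates
    return {disease: cost * factor for disease, cost in standalone_costs.items()}

# Hypothetical case: stand-alone costs sum to 80,000 but the case actually cost 72,000.
benchmarks = {"heart disease": 30000.0, "lung cancer": 50000.0}
allocated = scale_to_actual(benchmarks, actual_cost=72000.0)
```

In this made-up case the common factor is 0.9, so the heart disease and lung cancer shares are compressed to roughly 27,000 and 45,000. Their ratio is unchanged, which is the assumption that makes the method "not quite right" when one condition drives most of the extra cost.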


An important task in proceeding toward the production of a health account is to develop a methodologically rigorous, empirically feasible way of bringing NHEAs (and NIPAs) and COI studies together within a common framework for categorically allocating medical expenditures. The remainder of this chapter focuses on the steps needed to achieve this goal. It draws substantially from two sources: the first is a conceptual paper by Rosen and Cutler (2007) that outlines much of their work on this front to date. The second source is a report commissioned by the OECD6 titled “Draft Guidelines for Estimating Expenditure by Disease, Age and Gender Under the System of Health Accounts Framework” (2009); this report, developed by Slobbe, Heijink, and Polder (2007), is based largely on a 2003 COI study in the Netherlands, which itself draws on a wealth of experience accumulated since 1991, when efforts began to systematically estimate COI in that country (Koopmanschap, van Roijen, and Bonneux, 1991).

National health accounts organized around a disease-based framework require individual-level data indicating how much is spent for particular conditions. At the same time, as noted above, figures derived from the microdata must add up to national expenditure totals. Therefore, a central challenge for disease-based national health accounts is identifying individual-level data of sufficiently broad scope that can be linked across surveys and to NHEAs. While no single source of data will provide all of the information desired, the national expenditure surveys can provide a nationally representative sample with sufficient clinical detail to allow attribution of costs to different diseases. Indeed, several recent COI studies have used the Agency for Healthcare Research and Quality (AHRQ) MEPS (Druss et al., 2001; Thorpe, Florence, and Joski, 2004; Roehrig et al., 2009). However, MEPS underestimates national spending and requires adjustment if it is to match NHEA totals. In 2002, for example, national cost estimates from MEPS accounted for less than 70 percent of NHEA totals, partly due to the MEPS restriction to the noninstitutionalized population (Sing, Banthin, and Selden, 2006). In turn, the Medicare Current Beneficiary Survey (MCBS) collects data on institutionalized Medicare beneficiaries that could be used to supplement MEPS. However, there is no straightforward way to link these surveys.

There are a number of challenges that accompany the task of linking disparate national surveys. Combining the MEPS and MCBS for meaningful analyses requires more than simply concatenating two sets of survey data (Schenker et al., 2002). Each survey employs its own complex, multistage sample design that involves stratification, clustering, and oversampling of certain subpopulations of particular interest. Unique sampling strategies are then used to calculate a series of survey weights. Each survey also develops detailed design variables—frequently masked to protect respondent confidentiality—reflecting several nested levels of sampling strata and sampling units. The weights and survey design effects must be applied properly to ensure valid point and variance estimates.

If these surveys are to be used as source data for a national health account, it will become increasingly important that AHRQ and CMS work together to (1) develop standardized methods for linking MEPS and MCBS and (2) develop standard methods for reconciling the linked MEPS-MCBS data set to NHEAs. AHRQ and CMS have made significant strides in reconciling MEPS data to the NHEAs (Selden et al., 2001; Sing, Banthin, and Selden, 2006). Additional work by Rosen and Cutler (2007) has focused on linking MEPS and MCBS data in order to expand the scope of the covered population for reconciliation to NHEAs. It is encouraging that work to construct these data set linkages and reconciliations is progressing, but more research, with careful attention to detail, is required before they will be ready for use in an official statistical series. In the remainder of this section, we identify some of the more pressing challenges that will need to be addressed as reconciliation efforts mature.

Ensuring transparency about the scope of a national health account requires that decisions about which NHEAs cost categories to include be made explicit. Studies in some countries, such as the Netherlands, have often defined health care and costs of illness using a broad societal perspective (that may include “welfare” elements such as those related to informal care or the reduced well-being of family members due to morbidity and premature death) for their COI studies. U.S. studies, in contrast, including the research by Rosen and Cutler, have favored restricting the analysis to personal health care expenditures. There is no single right answer, but inclusion of nonpersonal health care does have one potential drawback when extended to health accounting: the method typically involves estimating the costs for a disease, not for persons with the disease. This implies that total costs for a disease can be translated to costs per capita but not so easily to costs per prevalent case of a disease. It also introduces types of spending and population groups that are most likely out-of-scope in the national surveys.

The implications of all the current and imminent data sets should be considered—for example, provider-side data would presumably pick up some excluded groups, but other coverage problems exist there as well. Researchers at the agencies working on this task will also confront the fact that, even for the covered populations, the scope of spending included in NHEAs and the surveys may differ. For example, NHEAs include total net revenues for all U.S. hospitals, but also government tax appropriations, nonpatient operating revenues (such as from gift shops), and nonoperating revenues (such as interest income) (Centers for Medicare & Medicaid Services, 2008). MEPS and MCBS, on the other hand, are event driven; most of these expenditures would not get picked up in the surveys, as they are not associated with discrete patient utilization events. Expenditures associated with discrete patient events (such as those going to freestanding labs and prescription medications) are underestimated as well. An approach implied by these data source characteristics is that provider-side data could be used as a control total, and the survey data on COI would then be used to help allocate across categories.
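The control-total approach can be sketched as follows, under the assumption that survey data are used only for the disease shares within each service category while the level of spending comes from the control total. All figures, category names, and the function `allocate_by_survey_shares` are hypothetical.

```python
# Hypothetical control totals by service category (e.g., from provider-side data), in billions.
control_totals = {"hospital care": 600.0, "prescription drugs": 200.0}

# Hypothetical survey-based spending by (service category, disease), in billions.
survey = {
    ("hospital care", "heart disease"): 250.0,
    ("hospital care", "cancer"): 150.0,
    ("prescription drugs", "heart disease"): 60.0,
    ("prescription drugs", "diabetes"): 90.0,
}

def allocate_by_survey_shares(controls, survey_spending):
    """Distribute each service's control total across diseases using survey shares."""
    survey_totals = {}
    for (service, _disease), amount in survey_spending.items():
        survey_totals[service] = survey_totals.get(service, 0.0) + amount
    return {
        (service, disease): controls[service] * amount / survey_totals[service]
        for (service, disease), amount in survey_spending.items()
    }

allocated = allocate_by_survey_shares(control_totals, survey)
```

Scaling within each service category, rather than applying one economy-wide factor, allows for the possibility that the surveys undercount some services more than others.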

We discuss the characteristics and coverage of these national surveys in Chapter 6 in more detail because both expenditure- and outcomes-side data are needed. The remainder of this chapter focuses primarily on attributing expenditures to diseases assuming that national survey data will be the primary source of person-level estimates. We comment on some of the resultant challenges, leaving detailed discussion of data challenges and needs to the final chapter of the report.


A wide variety of disease classification schemas exist, all differing with respect to the requisite data elements, populations covered, units of analysis, time period to which the assessment is applicable, and, at the most basic level, the types of analyses each is designed to support. For example, the cost category for diabetes could separate or combine type I and type II cases. Furthermore, patients with complications could be differentiated from those without, or everyone could be left in one spending category. There is no obvious rule about which strategy is best. Most systems use the International Classification of Diseases (ICD), 9th/10th revision codes; however, the number of disease categories and the combination of codes mapping to a given disease can vary significantly across systems, pointing to a need for comparative research. Furthermore, it appears that many systems start with the ICD chapters or with some other existing classification schema and then add or subtract categories to adapt to local conditions, such as clinical practice. While this may help tailor the classification system to users’ needs, it makes standardization efforts difficult.

The validity of disease classifications can be optimized, in part, by grouping diagnoses into homogeneous, mutually exclusive, exhaustive categories. However, the first-level categorization of the International Classification of Diseases-Ninth Revision, Clinical Modification (ICD-9-CM) (the most frequently used system in the United States) violates this rule, as do even the most detailed system entries. ICD-9-CM codes are organized into 17 broad categories or chapters: some represent organ systems (e.g., circulatory diseases, respiratory diseases); others represent conditions that span multiple organ systems (e.g., infectious and parasitic diseases, neoplasms); and one additional category is reserved for “symptoms, signs, and ill-defined conditions.” As a result, for many purposes, the chapters range from too narrow to too broad. Recognizing that the chapters, or an appropriate combination of chapters and subchapters, make up the schema for publication does not imply that they are adequate for grouping observations, which in principle should be done at a much lower level. A related problem is that two different, not fully compatible versions of the ICD are in common use (ICD-9 and ICD-10).

One categorization schema, AHRQ’s Clinical Classifications Software (CCS) (Elixhauser, Steiner, and Palmer, 2006), is unique in that it groups diseases with similar etiologies together, regardless of whether they cross organ system (or ICD-9 chapter) boundaries. This consistency, along with AHRQ’s ongoing and timely maintenance of the CCS (updated annually to capture the frequent changes to ICD-9 codes), makes it an appealing instrument for standardization efforts. At the same time, though, the inconsistencies in many of the other grouping systems have made mapping to CCS challenging as well (Lu and Tsiatis, 2005).

A variety of commercial risk-adjustment tools, such as Medical Episode Groups, Episode Treatment Groups, and Diagnosis Cost Groups, have also been used as the basis for disease categorization schemas. While, to our knowledge, no comprehensive catalogue exists, there have been two excellent recent reviews of many of these disease classification systems, one developed for clinical outcomes (Lu and Tsiatis, 2005) and the other for risk-adjusting costs (Winkelman and Mehmud, 2007). In the first study, seven grouping schemes (five for mortality and two for morbidity) were evaluated, and poor comparability was found to exist between them. The various schemas used different grouping logic, covered different ranges of codes, and named some groups the same but defined them with entirely different diagnostic codes. It is noteworthy that this set of divergent grouping schemes is the one used to make most international mortality comparisons (Lu and Tsiatis, 2005). The second review, by the Society of Actuaries, made side-by-side comparisons of 12 largely commercial claims-based risk-adjustment models. The models varied markedly in the data fields used to define patient risk categories and in their output. For example, risk-adjustment tools may or may not include age, sex, and secondary diagnoses. Some included pharmacy and laboratory data, while others did not. The number of risk categories varied substantially, as did the proportion of expenditures that could be allocated to disease groups (Winkelman and Mehmud, 2007).

Problems associated with this kind of modeling might largely be solved through adoption of a standardized list of diseases, which would make it possible to map local classifications onto the list. Unfortunately, such a list does not yet exist, although some progress has been made. The World Health Organization has, in collaboration with OECD, Eurostat, and the Nordic Medico-Statistical Committee, recently developed an international short list for the tabulation of hospital data (World Health Organization, 2009). This is a useful point of departure for discussion, but the disadvantage for use in a COI analysis is that it was developed specifically for use with hospital data and may not be well suited for use with other providers.

In identifying an appropriate disease classification system, the number of categories will depend on the available data and current scientific knowledge. If this number is large, a two-level classification can be developed with a more aggregated level analogous to ICD-9 chapters and a disease level within these chapters (Slobbe, Heijink, and Polder, 2007; Heijink et al., 2008; Organisation for Economic Co-operation and Development, 2009). For each chapter, key diseases can be broken out (e.g., diabetes) with others mapped to a residual or “other” group (e.g., other endocrine diseases). When attributing expenditures to diseases, two additional categories will be needed as well: “disease unknown” and “not disease-related” (Slobbe, Heijink, and Polder, 2007).
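As a rough illustration of the two-level structure described above, the following Python sketch maps diagnosis codes to a chapter and a disease, with a residual "other" group per chapter and the two special categories. The chapter contents, the choice of key diseases, and the rule that unmapped codes fall into "disease unknown" are all assumptions made for the example; this is not an official or proposed mapping.

```python
# Illustrative two-level classification: chapter -> key diseases -> code prefixes.
# The groupings are hypothetical and cover only a handful of ICD-9 prefixes.
KEY_DISEASES = {
    "endocrine": {"diabetes": {"250"}},
    "circulatory": {
        "hypertension": {"401"},
        "coronary heart disease": {"410", "414"},
    },
}
# All code prefixes assigned to each chapter in this toy example.
CHAPTER_CODES = {
    "endocrine": {"244", "250", "272"},
    "circulatory": {"401", "410", "414", "428"},
}
# Codes for services that are not treatment of a disease (e.g., a checkup).
NOT_DISEASE_CODES = {"V70"}


def classify(code):
    """Return (chapter, disease) for a diagnosis code such as '250.00'."""
    if not code:
        return ("disease unknown", "disease unknown")
    prefix = code.split(".")[0]
    if prefix in NOT_DISEASE_CODES:
        return ("not disease-related", "not disease-related")
    for chapter, prefixes in CHAPTER_CODES.items():
        if prefix in prefixes:
            for disease, disease_prefixes in KEY_DISEASES[chapter].items():
                if prefix in disease_prefixes:
                    return (chapter, disease)
            # In the chapter, but not broken out as a key disease.
            return (chapter, f"other {chapter} diseases")
    # Simplification: codes outside the toy mapping are treated as unknown.
    return ("disease unknown", "disease unknown")
```

For example, `classify("250.00")` falls under diabetes, while `classify("244.9")` lands in the residual "other endocrine diseases" group.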

The key point here is that, given the number of different disease classification schemas currently in use in the U.S. health care system, it is essential for all players, in both public and private sectors, to participate in and come to a consensus on the development of a single unified version.

Recommendation 3.1: A concerted effort is needed to reach consensus on how to classify diseases and about what the criteria are by which diseases are disaggregated from the very broad International Classification of Diseases chapters. The National Center for Health Statistics should lead the effort, working with the Agency for Healthcare Research and Quality, the Centers for Medicare & Medicaid Services, the Bureau of Economic Analysis, and other relevant statistical agencies. As part of this effort, U.S. agencies should participate in ongoing standardization efforts (such as those sponsored by the Organisation for Economic Co-operation and Development or the World Health Organization) to benefit from international expertise, to consider these as the basis for a national system, and to facilitate international comparisons.

The basic principles underlying the groupings are that they should be clinically meaningful, derived from routinely collected data (to the extent possible), and limited to a manageable number of categories. The criteria must ensure practicality as well as acceptability by the medical and economic communities.

Conceptually, the idea of having to choose groupings of ICD chapters and subchapters is not a major hurdle. The ICD already has a rich disaggregation below the hierarchy; however, resources are needed to sample entries for detail below the chapter headings.

Once a common disease classification system is chosen, the next step will be to implement it in order to generate data that are useful for medical care and health accounting purposes.

Recommendation 3.2: Using a population subsample for which good data exist, a pilot study should be undertaken by the Bureau of Economic Analysis using a proposed classification system with the goal of identifying adaptations needed for a national health account. At the point when the classification schema has undergone initial rounds of revisions and modifications, a concerted effort should be undertaken to consistently measure these diseases in the national health surveys in order to more accurately capture their epidemiologies.

Implementing these recommendations will create a foundation from which more targeted research can be conducted to attribute spending to the diseases. This step can be separated into the aggregation task, which can be thought of as the unit of analysis, and attribution, which is the method by which spending gets assigned to specific diseases. Because the two concepts are largely inseparable—the attribution method will almost always determine the level of aggregation—it is important to consider the conceptual basis for each before proceeding. Section 3.6 describes the attribution of spending to encounters, episodes, and persons. Section 3.7 discusses the conceptual basis for selecting the output measures and, thereby, the units of analysis.


There are three distinct conceptual approaches to attributing costs to illnesses using medical claims data. While each approach has implications for the unit of analysis, this is rarely explicitly indicated in the literature. The first approach is an encounter-based method in which spending is attributed to one or to several diagnoses as reflected by data extracted from patient claims for that one encounter (or visit). A second approach constructs episodes of treatment—estimating the spending on all services considered to be involved in the diagnosis, management, and treatment of a condition. The unit of analysis—an episode—may have variable lengths of time. The third method takes a person-based approach, tracking individuals for a set period of time (often 1 year) and then attributing each individual’s spending to different disease treatments. Each approach tends to be used in different settings, and each has its own advantages and drawbacks.

3.6.1. Encounter-Based Approach

Conceptually, most COI studies have used an encounter-based approach, estimating disease-specific spending by diagnoses listed on medical claims and assigning each claim to a spending category (Rice, 1966; Rice and Horowitz, 1967; Cooper and Rice, 1976; Berk, Paringer, and Mushkin, 1978; Rice and MacKensie and Associates, 1989; Druss et al., 2001; Cohen and Krauss, 2003; Thorpe, Florence, and Joski, 2004). In this approach, it is easy to see that comorbidities create problems. A common practice is to assign each service claim to one condition, generally the primary diagnosis, but this dilutes the apparent cost impact of many important risk factors. For example, if a person with diabetes, hypertension, and coronary heart disease visits a doctor, to which disease should the costs be attributed? What if only coronary heart disease is listed on the encounter despite the fact that the diabetes likely contributed to the coronary heart disease? Likewise, this method has difficulty accounting for downstream complications. If a person who has been treated for diabetes later has a heart attack, is the subsequent spending a result of the former or the latter? Most analyses assign the downstream costs to the heart attack, which underweights the future costs of diabetes (Lee, Meyer, and Clouse, 2001; Norlund et al., 2001). These issues are particularly important for individuals with such conditions as coronary heart disease, in which multiple chronic diseases and risk factors are the rule, rather than the exception.

Research by the Altarum Institute used primary or first-listed diagnostic categories to allocate expenditures. In this method, disease categories can be allocated at varying levels of detail. For the Altarum project, 660 clinical classification categories were used based on AHRQ-created groupings.7 Figures for these categories can, if desired, be aggregated into a smaller number of categories.

The main advantage of the encounter-based approach is that, when claims data are available, it is relatively easy to attribute costs to diagnostic categories (it is essentially an accounting exercise). At the same time, however, a nontrivial portion of spending has no associated claims or valid diagnosis codes, such that those costs cannot be allocated to diseases. Furthermore, the encounter-based COI estimates are not readily compared with health outcomes, which are measured at the person level.
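The accounting logic of the encounter-based approach, and two of its weaknesses, can be seen in a minimal Python sketch. The claims, amounts, and category names below are invented for illustration, and the rule shown (assign the whole claim to the first-listed diagnosis) is only the common practice described above, not the method of any particular study.

```python
from collections import defaultdict

# Hypothetical claims: (claim id, diagnoses in listed order, paid amount).
claims = [
    ("c1", ["coronary heart disease", "diabetes"], 1200.0),
    ("c2", ["diabetes"], 300.0),
    ("c3", [], 500.0),  # no valid diagnosis code on the claim
    ("c4", ["hypertension", "diabetes"], 150.0),
]


def attribute_by_primary_diagnosis(claims):
    """Assign each claim's full amount to its first-listed diagnosis."""
    totals = defaultdict(float)
    for _claim_id, diagnoses, amount in claims:
        category = diagnoses[0] if diagnoses else "unallocated"
        totals[category] += amount
    return dict(totals)


totals = attribute_by_primary_diagnosis(claims)
unallocated_share = totals.get("unallocated", 0.0) / sum(totals.values())
```

In this toy data, diabetes is listed on three of the four claims but receives only the $300 from the claim on which it is listed first, and the claim with no diagnosis cannot be allocated at all.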

3.6.2. Episode-Based Approach

An episode of care involves a set of services whose beginning and end are defined in parallel with a patient’s course of treatment (which may or may not coincide with the patient’s discharge). The concept of an episode of care as a unit of measurement dates back to the 1960s. Its most widespread use has been for hospital reimbursement based on diagnosis-related groups, in which payments are based on the inpatient episode of care (Hornbrook, Hurtado, and Johnson, 1985; Rosen and Mayer-Oakes, 1999). Theoretically, a full episode of care runs from the initial diagnosis of a condition to the completion of all treatment for that condition. In practice, the feasibility of measuring complete treatments largely depends on the degree of fragmentation of the services making up the treatment and the availability of data (Hyman, 2009).

Payers are increasingly using episode-of-care “grouper” software in an attempt to profile physician efficiency (Pacific Business Group on Health, 2005; McGlynn, 2008; Sandy, Rattray, and Thomas, 2008; Miller, 2009b). Grouper software is so named because it sorts through millions of claims records, and it groups together all of a patient’s claims related to a given diagnosis over a set time window (the episode of care). The key piece to this (at least for price index development) is the grouping of patients’ clinical conditions into discrete, clinically homogenous disease categories with similar expected resource consumption. Commercial episode groupers differ in their input data (e.g., Current Procedural Terminology, ICD-9-CM, Healthcare Common Procedure Coding System, National Drug Code, hospital revenue codes), the number of categories into which diagnoses and procedures are assigned, and the way they identify increasing medical complexity and illness severity (e.g., whether the presence of a procedure in an episode is used to define severity). They also differ in the length of the “clean periods” that signal the end of an episode and the beginning of another.
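To make the role of the "clean period" concrete, here is a simplified Python sketch that groups one patient's claims for a single condition into episodes using only a date-gap rule. It is an assumption-laden toy: commercial groupers are proprietary and apply clinical logic well beyond this, and the function name, data, and thresholds are hypothetical.

```python
from datetime import date


def build_episodes(claims, clean_period_days):
    """Group (service_date, amount) claims into episodes.

    A new episode starts whenever the gap since the previous claim
    exceeds clean_period_days; otherwise the claim extends the
    current episode.
    """
    episodes = []
    for service_date, amount in sorted(claims):
        if episodes and (service_date - episodes[-1]["end"]).days <= clean_period_days:
            episodes[-1]["end"] = service_date
            episodes[-1]["cost"] += amount
        else:
            episodes.append({"start": service_date, "end": service_date, "cost": amount})
    return episodes


# Hypothetical claims for one patient and one condition.
claims = [
    (date(2003, 1, 10), 100.0),
    (date(2003, 1, 25), 250.0),
    (date(2003, 6, 1), 80.0),
    (date(2003, 6, 20), 40.0),
]
short_gap = build_episodes(claims, clean_period_days=30)
long_gap = build_episodes(claims, clean_period_days=180)
```

With a 30-day clean period the four claims form two episodes; with a 180-day clean period they collapse into one. The same spending therefore yields different episode counts and per-episode costs depending on a parameter that varies across groupers.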

To date, these episode groupers have not been adequately vetted by research (McGlynn, 2008). Although a number of alternative episode groupers are already in wide use, they have received little scientific evaluation and have not been extensively tested for reliability, validity, or agreement with each other (McGlynn, 2008). A small but growing body of research by CMS (MaCurdy et al., 2008) and others points to significant variation in the output of different vendors’ groupers. Perhaps most problematic is that the episode-grouping algorithms are proprietary and largely a black box, making it difficult to use them for public work.

Beyond the grouper-specific issues, comorbidities and the resulting joint costs are major challenges with the episode-based approach as well. It is common for individuals to receive treatment for multiple diseases simultaneously, and these comorbidities can lead to a very complicated picture of episode definitions and measurement (Hornbrook et al., 1985). Even in the absence of comorbidities, other challenges arise. It is often difficult (or not possible) to link data when the episode’s services are supplied by several different providers. For chronic disease episodes, length of the episode must be determined (it is often set arbitrarily at one year). Complications of treatment for one condition may lead to the development of another. Should these be treated as a new episode or an old one? And, as we have pointed out, medical treatments do not always fall neatly into a disease category.

3.6.3. Person- (or Population-)Based Approach

A conceptually similar, but alternative, way of attributing expenditures on medical care to disease categories is to use a person-based approach. The distinction here is that spending is assigned to the entire set of diseases a person has, not simply to the primary diagnosis listed on a claim. Indeed, person-based (or population-based) measures were the norm before the introduction and rapid gain in use of the proprietary episode groupers. The key feature of these case-mix measures is that individuals—rather than episodes—are classified into clinical categories based on similar demographic and clinical characteristics (and grouper software exists for this purpose as well). Again, the goal is to categorize patients into relatively homogenous groups with respect to resource needs over some specified time window (usually 1 year).

In this approach, an individual’s total health care spending over the period is regressed on indicators for the presence of all medical conditions. This approach is designed to produce more valid estimates for patients with multiple chronic conditions, as it better captures expenditures for comorbidities and complications. That said, the regression specification typically assumes that comorbidities have an independent effect on spending with few interaction terms included in the models. An empirical issue is what interaction terms to include. For the most part, clinical expertise is needed to identify the appropriate groups of co-occurring diseases; while clinical insight is likely to result in better estimates, the need for clinical expertise represents a limitation as well, particularly for federal agencies that have not typically had access to clinician health services researchers. There is also the issue of how to allocate the intercepts for the base spending for the year.
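The regression described above can be sketched in a few lines of Python. The example assumes a purely additive specification with no interaction terms, uses four invented persons and two conditions, and solves ordinary least squares through the normal equations. The `ols` helper and all figures are hypothetical, and attributing "coefficient times number of people with the condition" is one simple convention, not necessarily the one used in the work discussed in this chapter.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical persons: [intercept, has_diabetes, has_chd] and annual spending.
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1]]
y = [1000.0, 3000.0, 6000.0, 8000.0]

base, beta_diabetes, beta_chd = ols(X, y)

# Attributed spending: coefficient times the number of people with the condition.
attributed = {
    "diabetes": beta_diabetes * sum(row[1] for row in X),
    "coronary heart disease": beta_chd * sum(row[2] for row in X),
    "base (intercept)": base * len(X),
}
```

Because the toy data are exactly additive, the three attributed amounts sum to total spending. The "base (intercept)" line is the portion that still has to be allocated somehow, and if the person with both conditions spent more than the additive model predicts, the excess would be absorbed into the coefficients unless an interaction term were added.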

These criticisms aside, an attractive conceptual feature of person-based cost estimates is that they can be readily matched to health outcomes (a topic of Chapter 6), such as mortality and quality of life, thereby providing the critical link between spending and health needed to measure value more systematically. Furthermore, unlike the encounter- and episode-costing approaches, a person approach can conceivably attribute the costs of cases for which there are no valid claims or ICD-9 codes.

The person-based measures have been more thoroughly studied than have the episode-based measures, largely because they are older and many were developed by health services researchers. Many of the measures are statistically valid, are easy to implement, and have good predictive ability for explaining variation in utilization (Ellis et al., 1996; Weiner et al., 1996; Rosen, 2001; Iezzoni, 2003) when used in the populations for which they were developed (Arbitman, 1986; Hornbrook et al., 1991; Rosen, 2001; Iezzoni, 2003).

Person-based measures have limitations as well. First, the different groupers vary markedly in the data inputs required to define patient risk categories and then, in turn, outputs. For example, the groupers may or may not include age, gender, or secondary diagnoses. Some include laboratory, pharmacy, and/or procedure data, while others do not (Rosen, 2001; Zhao et al., 2001; Grazier, Thomas, and Ward, 2002, 2006; Iezzoni, 2003). Second, the effective sample size is smaller with person-based than with episode-based measures (e.g., one individual can have multiple episodes in a year). Because of this, while more patient groups are desirable to increase the homogeneity of expected resource use within groups, this must be balanced against the smaller sample sizes.

3.6.4. Comparing Episode- and Person-Based Methods

Although the evidence base is limited, researchers have begun comparing the different measures for attributing costs to illnesses. Thomas, Grazier, and Ward (2004) tested the consistency of six groupers (some episode-based and some person-based) for measuring the costs of primary care providers and found moderate to high agreement (weighted kappa = 0.51 to 0.73) between physician efficiency rankings using the different measures. In contrast, the Medicare Payment Advisory Commission (2006) compared episode- and person-based measures in area-level analyses and found they can produce different results. For example, compared with Minneapolis, Miami had lower average costs per coronary artery disease episode but higher average per capita costs due to a higher volume of episodes. Box 3-1 provides a more detailed description of several commercial episode- and person-based case-mix measures (McGlynn, 2008).8

BOX 3-1

Examples of Episode- and Person-Based Case-Mix Measures.

Episode-Based Measures

Episode Treatment Groups (ETG). The ETG methodology identifies and classifies episodes of care, defined as unique occurrences of clinical conditions for individuals and the ...
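The area-level contrast reported above (lower cost per episode but higher per capita cost) follows from a simple identity: per capita cost equals cost per episode times episodes per capita. The Python sketch below works through that arithmetic with invented figures for two unnamed areas; the numbers are purely hypothetical and are not the Medicare Payment Advisory Commission's data.

```python
# Hypothetical figures for two areas; not actual Miami or Minneapolis data.
areas = {
    "area_a": {"episodes": 1500, "population": 1000, "total_cost": 3_000_000.0},
    "area_b": {"episodes": 1000, "population": 1000, "total_cost": 2_500_000.0},
}


def summarize(area):
    """Return (cost per episode, episodes per capita, per capita cost)."""
    cost_per_episode = area["total_cost"] / area["episodes"]
    episodes_per_capita = area["episodes"] / area["population"]
    per_capita_cost = cost_per_episode * episodes_per_capita
    return cost_per_episode, episodes_per_capita, per_capita_cost


summary_a = summarize(areas["area_a"])
summary_b = summarize(areas["area_b"])
```

Here area A looks cheaper on an episode basis but more expensive on a person basis, because it generates more episodes per person. An episode-based measure alone would miss that volume effect.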

Over the past year or two, the Cutler-Rosen group has been working with BEA to empirically assess quantitative differences in the various approaches to allocating medical care expenditures by disease; Rosen reported some preliminary findings at the panel’s workshop. The research objective is to reconcile disease categories among the encounter-, episode-, and person-based regression approaches; to simulate costs of diseases using each; and to compare and contrast the findings. For this project, health claims data from Pharmetrics, Inc., for the period 2003–2005 were used. For 2003, the data cover just over 3 million lives and include total spending of $9.09 billion on inpatient and outpatient services, office visits, prescription drugs, skilled nursing facilities, and laboratory services. Up to four ICD-9 diagnoses are present on a given claim, although only the primary diagnosis is listed for hospital claims. Symmetry software from Ingenix was used to link medical expenditures to disease categories.

In order to reconcile the three approaches to common disease categories, the researchers first mapped ICD-9 codes into CCS categories. These were aggregated into 65 clinically meaningful groups that had been developed earlier based on clinical advice from physicians. Cost categories were created primarily for diseases with known treatments that have led to health benefits and for which more detailed analyses could be done matching quality to costs.

For the person-based regression approach, the authors were able to use all listed diagnoses on claims in a given year. For the encounter approach, an algorithm was used to determine the diagnosis (usually the first listed) to which the majority of spending went, and the dollars were assigned to that category. For the episode-based approach, each episode treatment group (ETG) was allocated to the clinical group that accounted for the largest share of spending. There were a few problems with the ETG approach. For example, there was no ETG for cervical cancer; also, the transparency issue remained—the method of aggregating data into the ETG was still essentially a black box.

Under the encounter approach, about 19 percent of the spending recorded from claims had no listed diagnosis, and the dollars could not be allocated to a condition.9 Using the episode-based approach, only 1 percent of spending originated from claims with no ETG. Using the person-based approach, expenditures for individuals with no diagnoses accounted for only about 0.6 percent of the total. The problem of unlinkable spending was clearly most serious with the encounter-based approach.

The Cutler-Rosen work demonstrates that the cost of illness can be estimated by each of the proposed methods. The total dollar amount that can be allocated differs and, certainly, the fact that noncomparable data sets are being used for the different methods also has an impact on the results. Table 3-1 shows estimates of annual spending by condition. The spending estimates varied significantly for some disease categories. For example, the person-based approach yielded very high annual expenditures for dementia—on the order of $9,000—relative to the encounter- and episode-based approaches. The likely reason is that the regression used does not include all of the needed interaction terms, so the estimates essentially capture unobserved correlates of spending. Instead of getting just spending on dementia, the coefficient is picking up aspiration pneumonias, feeding tube treatments, and all sorts of other things for which clinically meaningful categories need to be created. This illustrates why it is important to bring clinical insight into analyses.

TABLE 3-1. Annual Per-Patient Cost for Selected Diseases by Method, 2003.



For some of the same disease categories, the encounter-based approach appears to underestimate expenditures. This may have something to do with the way risk factors for diseases are commonly coded. For example, physicians may be more likely to code coronary heart disease than they are diabetes, hypertension, or hyperlipidemia. In contrast to the encounter-based approach, which relies entirely on physician coding on claims, one useful feature of the person-based approach is that coding can be captured over time, so more information about multiple conditions can be obtained; the approach also allows the claims data to be supplemented with surveys, injecting information from patients that can enrich the picture.

The research team has not yet done any time series with MEPS. Making some direct comparisons, they have found that some of the same things—for example, dementia or acute renal failure, which tend to occur in patients being treated for other conditions simultaneously—end up being much higher with the person-based regression approach than the others.

Ultimately, the choice of episode-based versus population-based measures will depend on the context in which they are to be used (Luft, 2006; Davis, 2007; McGlynn, 2008; Miller, 2008, 2009a; Mechanic and Altman, 2009). For example, while care of acute conditions may best be understood at the episode level, chronic disease care (and the provision of preventive services, such as cancer screening) may be better understood at the person level. For a given setting, the predominant provider payment approach may also impact the choice of measure. Whereas fee-for-service payments make episodes somewhat easier to interpret, capitation could be more readily evaluated at the person level.

So, which approach is best? At this point, the panel cannot definitively endorse one method for allocating expenditures to diseases over others. Rather, the best method depends largely on the question at hand and the needs of the target audience. For example, if the goal is to compare costs and health effects for a given disease, as is done in cost-effectiveness analyses, a person-based approach is likely to be most appropriate. In contrast, if price index construction is the goal, federal agencies may find an episode-of-treatment approach more meaningful. For a manager of a health plan trying to understand why emergency room spending patterns have changed, real-time answers may be possible only with an encounter-based approach. The choice of method will also invariably be constrained by the availability of data.

In the long term, what is needed is more empirical work to compare different approaches and to determine more definitively which is best under different conditions. BEA, for its part, is working both internally and with the Cutler-Rosen team to establish what the allocations end up looking like under the different methods and whether it matters for estimating expenditures and prices. These researchers have already begun producing disease-based cost estimates; spending could also be further broken out into subcategories along functional lines, such as disease prevention, diagnosis, and screening activities. This is important since, as noted earlier, not all medical spending can be attributed specifically to the treatment of a disease or condition. Whichever method of allocating expenditures is used, it has to offer a solution to the comorbidity problem.

As different output measures are developed, some mechanism will be needed for commissioning demonstration projects to determine what actually makes a difference in practice.

Recommendation 3.3: The Bureau of Economic Analysis (working with academic researchers and with the Bureau of Labor Statistics) should continue to investigate the impact of different expenditure allocation approaches—particularly the episode- and person-based methods—on price index construction and performance. Research is needed to determine which method is best under different circumstances.

As part of this effort, BEA (perhaps in coordination with AHRQ and the National Institute on Aging) should sponsor a workshop for the three vendors of episode grouping software and the top three or four person-based case-mix system vendors to present their products, how they are used in the marketplace, and the underlying rules and logic.

At this point, a cautious approach is warranted, as it is too early for BEA to buy into a particular method for aggregating treatment costs. This means that there may be a need for parallel sets of accounts, at least on a research basis, for some time. BEA researchers should continue to experiment with competing methods (some, such as regression techniques, would be statistical; others may be deterministic) of parsing expenditures into disease groups using different kinds of data (e.g., claims records, survey data). It would be helpful to get a practical sense of how different the results would be using various approaches and data sources. Results of comparisons will depend on the level of disaggregation. It will be difficult to determine which method, in the abstract, is best—there will inevitably be some joint production with arbitrary allocations of dollars. And there are practical considerations—for example, the proprietary nature of the grouper software—that may steer the work toward particular approaches and away from others.

Recommendation 3.4: The Bureau of Economic Analysis, working with academic researchers (and perhaps other agencies, such as the Centers for Medicare & Medicaid Services and other parts of the Department of Health and Human Services), should collaborate on work to move incrementally toward the goal of creating disease-based expenditure accounts by attempting a “proof of concept” prototype. Using a subgroup of the population with good data coverage, the prototype would attempt to demonstrate that dollars spent in the economy on medical care can be allocated into disease categories in a fashion that yields meaningful information.

Choices will have to be made about how to aggregate rare events and unusual comorbidity combinations. The project should attempt to determine how sensitive expenditure allocation figures are to alternative choices.

Selection of an appropriate group for the pilot should be based on data quality and completeness. The Medicare population, the military, or veterans—groups for which spending and health data are available (and for whom a good deal of the medical care action takes place)—would be logical choices. Alternatively, a disease-costing pilot could be done for a well-defined, geographically (and administratively) complete group, such as found in parts of Intermountain Healthcare, Geisinger Health System, or one of the Hawaiian islands, before attempting it on a national basis.



Health care purchasers are struggling with a distinct but not dissimilar challenge—determining how best to measure efficiency within the health care system (Leapfrog Group for Patient Safety and Bridges to Excellence, 2004; Pacific Business Group on Health, 2005; McGlynn, 2008; Physician Consortium for Performance Improvement® Work Group on Efficiency and Cost of Care, 2008). These efforts also require identifying meaningful (i.e., measurable and actionable) measures of health care output.


COI estimates have been made for population subgroups. Moreover, the Bureau of Labor Statistics already controls for at least some demographic aspects when it prices diagnoses for the Producer Price Index. However, if such estimates are made by population subgroups, we might envision needing a price index for each, e.g., females with heart disease. Also, it is difficult to say whether there would be significant gains from adding demographic breakouts within the existing disease grouping detail—we do not know (very well) the variation in health status gains from various treatments across groups. Our intuition is to give priority to additional disease disaggregation over disaggregation by demographic group, but it is premature to answer, and perhaps even to consider, this empirical question now.


Even if agreement is reached that episodes of care should be the unit of output, questions still exist about how to attribute the expenditures. Aggregation must take place at the person level if measures of health care output and health outcomes are to be comparable. It follows from our definition of quality that the unit for measuring medical sector output should be the patient treated. This makes it necessary to link the activities directed at treatment of a patient. For example, a patient undergoing treatment for heart disease would receive prescriptions for various drugs, attend outpatient clinics, and have lab tests. This topic is discussed in greater detail later in the chapter.


Also, authors of these studies sometimes have conflicts of interest—funders may want to establish very high costs for their disease.


However, one knows the Medicare allowable cost and what Medicare paid. The difference is the beneficiary’s share, but it could be covered by supplementary insurance or could turn into bad debt or could be paid out of pocket.


Details about the Altarum Institute project are described in National Research Council (2009).


The other potential unit of output, the visit or encounter (discussed earlier), also uses groupers (e.g., case-mix measure) to classify the resources utilized. As with the other measures, the spending sorted into each visit-based group has similar diagnostic codes, and the hope is that the groups would have similar expected resource use and cost. An example is Ambulatory Patient Groups, which was developed by 3M (Averill et al., 1990). The system was designed to explain the amount and types of resources used in Medicare hospital-based outpatient visits. A shortcoming is its restriction to outpatient care.


Hodgson and Cohen (1999), using an encounter approach, reported only 10 percent of expenditures with unallocated diagnoses. It would be enlightening to compare these results.

Copyright © 2010, National Academy of Sciences.
Bookshelf ID: NBK53341

