NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

National Research Council (US) Panel to Advance a Research Program on the Design of National Health Accounts. Accounting for Health and Health Care: Approaches to Measuring the Sources and Costs of Their Improvement. Washington (DC): National Academies Press (US); 2010.

5 Defining and Measuring Population Health


5.1.1. Motivation

The novel challenge of a national health account is measuring health. In order to answer the question “What are people getting for their health care dollar?” it is necessary to be able to track the health of the population and its subgroups accurately, including those in vulnerable segments of the population. The broad health account requires data on medical care expenditures (and other nonmedical and nonmarket health-affecting inputs) and on the health benefits derived, which are what patients and, collectively, society seek to purchase. The output side of the account is quantified in terms represented by the population’s health. Monitoring changing population health on a disease-by-disease basis is also relevant to medical care accounting in the National Income and Product Accounts (NIPAs) because, as discussed in Chapters 2 and 4, tracking health outcomes will ultimately play a key role in quality adjusting the price of medical treatments. Data on health inputs and outputs are also crucial for researchers attempting to link the two sides of the equation—that is, to attribute deaths and impairment to diseases, medical conditions, and other causal factors.

In a satellite health account, output associated with investments in health should be measured independently of inputs. Previous chapters of this report have discussed options for measuring the output of medical care (the treatments), which is an input to health. Independent measurement of health, however, means going beyond simply adding up the value of inputs to yield a value for the output side of the account. In estimating that value, both components of output—the consumption flow of good health and the additional (or reduced) income that a healthier (or less healthy) population generates—should be measured (National Research Council, 2005, pp. 131–132).

In this chapter, we describe and assess various approaches for measuring population health (mortality and nonfatal/impairment dimensions), acknowledging that best measures may vary by purpose and for different populations along the health spectrum. We focus on generic health here, realizing that there is also interest in disease-specific indicators. Our recommendations in Chapter 6 for data collection on major chronic diseases to facilitate research on spending and health linkages would provide the data for a more detailed annual report, as recommended in the Institute of Medicine’s State of the USA Health Indicators letter report (2009).

In addition to current health status, we also consider risk factors that impact future health. Current health status does not capture the health effects of such risks as hypertension, which does not cause symptoms today but may do so in the future. In principle, these risk factors, together with age, current health, and a variety of other determinants (see Chapter 6) go into health capital (a stock of health), which can be defined as the present discounted value of future health consumption flows (Grossman, 1972). Some portion of a person’s health capital is determined genetically, but health can also be affected through investments in inputs ranging from medical care to personal behaviors, such as consumption habits and exercise. While calculating national health capital is not feasible at this time, data on some of the risk factors that go into those calculations could be collected—or, at least, data that are already collected could be extended and more systematically compiled period by period.

Monitoring the population’s health status requires metrics that combine quantity and quality of life such as quality-adjusted life years (QALYs). A number of such measures are already widely used to identify unmet health needs and to guide policies for addressing those needs. The multiplicity of measures reflects the lack of an undisputed definition of and method for measuring population health (Kindig and Stoddart, 2003; McDowell, Spasoff, and Kristjansson, 2004).

Nonetheless, some broad generalizations are possible. For the purpose of developing a health account used to evaluate productivity of direct medical and public health services, the health of a population or subgroup within the population is taken to be some combination of survival probabilities and the sum (or, equivalently, the average) of the health of survivors in the population or subgroup. Measuring the health of the population under this assumption reduces to calculating death rates, measuring the health of individual survivors, and aggregating across individuals.

5.1.2. Mortality and Life Expectancy

The oldest and simplest measures of population health are death records. These were used, for example, to calculate the impact in excess deaths of the great plague of 1665–1666 in London (Champion, 1993). With the decline in deaths from infectious disease in the developed world, crude death rates are primarily determined by the age composition of a population. For example, about 0.4 percent of the Mexican population died in 2003 compared with 1 percent of the Italian population. The major reason for the difference was age, with 6 percent of Mexicans compared with 19 percent of Italians age 65 or older (mortality rates at any given age were lower in Italy than in Mexico). So, despite mortality’s importance, measures of it must be standardized to reflect age composition in order to be useful for comparative purposes. One possibility is to pick a baseline, such as the year 2000 U.S. population, and calculate directly standardized death rates for all other populations of interest (subgroups, other countries, other years, etc.). These rates are the deaths that would occur if the people of each age in the standard population died at the same rate as those of the same age in the population of interest. However, life-expectancy methods provide an approach to standardizing death rates that avoids picking a particular reference population; because of this, life expectancy is the most widely used method of summarizing population mortality.
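The contrast between crude and directly standardized death rates can be sketched in a few lines. The example below is a minimal illustration, not an official procedure: the two age groups, the population counts, and the death rates are hypothetical numbers chosen only to reproduce the pattern described above, in which an older population has the higher crude rate even though its rate at every age is lower. The function names are likewise assumptions made for this sketch.

```python
def crude_death_rate(age_specific_rates, population_by_age):
    """Deaths per person in a population, given its own age structure."""
    deaths = sum(age_specific_rates[g] * population_by_age[g] for g in population_by_age)
    return deaths / sum(population_by_age.values())


def directly_standardized_rate(age_specific_rates, standard_population):
    """Deaths per person that would occur if the standard population
    experienced the age-specific death rates of the population of interest."""
    if age_specific_rates.keys() != standard_population.keys():
        raise ValueError("age groups must match between rates and standard population")
    # Same arithmetic as the crude rate, but weighted by the standard age structure.
    return crude_death_rate(age_specific_rates, standard_population)


# Hypothetical data for illustration only (not real country statistics).
standard = {"0-64": 870_000, "65+": 130_000}

younger_rates = {"0-64": 0.003, "65+": 0.050}
younger_sizes = {"0-64": 940_000, "65+": 60_000}

older_rates = {"0-64": 0.002, "65+": 0.040}   # lower at every age
older_sizes = {"0-64": 810_000, "65+": 190_000}

crude_younger = crude_death_rate(younger_rates, younger_sizes)
crude_older = crude_death_rate(older_rates, older_sizes)
std_younger = directly_standardized_rate(younger_rates, standard)
std_older = directly_standardized_rate(older_rates, standard)
```

With these made-up inputs the crude rate is higher in the older population, while the standardized rate is lower, which is the reversal that standardization is meant to expose.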

Typically, life expectancy is computed using a period life table, which summarizes the age-specific mortality experience of a population over a short time, usually 1 or 3 years. The National Center for Health Statistics (NCHS) receives mortality information for all deaths in the United States through a cooperative reporting program involving all states and territories. Age-specific mortality rates are computed by dividing the number of deaths of persons at a specific age by the estimated midyear age-specific population provided by the U.S. Census Bureau. From these mortality rates, life expectancies can be simply calculated.
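As a rough sketch of the two steps just described, the code below divides deaths by midyear population to obtain age-specific rates and then runs a hypothetical cohort through those rates. It rests on simplifying assumptions that are not part of any official method: each rate is treated directly as the probability of dying during that year of age, people who die are credited with half a year, and the last age in the list closes the table with a death probability of 1. The function names are illustrative.

```python
def age_specific_rates(deaths, midyear_population):
    """Deaths at each age divided by the estimated midyear population of that age."""
    if len(deaths) != len(midyear_population):
        raise ValueError("deaths and population must cover the same ages")
    return [d / p for d, p in zip(deaths, midyear_population)]


def period_life_expectancy(death_probs):
    """Average years lived by a hypothetical cohort exposed to death_probs.

    Assumes death_probs[x] is the probability of dying during age x and
    that those who die live, on average, half of that year.
    """
    alive = 1.0   # share of the hypothetical cohort still alive
    years = 0.0   # person-years lived per member of the cohort
    for q in death_probs:
        deaths = alive * q
        years += alive - 0.5 * deaths
        alive -= deaths
    return years


# Tiny made-up inputs to show the mechanics.
rates = age_specific_rates(deaths=[10, 20], midyear_population=[1000, 500])
toy_expectancy = period_life_expectancy([0.5, 1.0])
```

Published life tables handle infancy and the oldest ages with more care than this sketch does.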

Despite their name, life expectancies are not primarily estimates of the future experience of individuals of a given age. Instead, they provide a method for summarizing mortality experience in the population during the period for which the deaths used in the computations were observed. They are the average number of years a hypothetical cohort of people with a particular starting age who have the current age-specific mortality rates at each future age would live. By contrast, a cohort life table gives the mortality experience of a fixed cohort of people (for example, all those born in a given year) until the entire cohort has died. So, for a cohort born in 1950, the death rates at ages 30 and 50 would be based on death rates observed around 1980 and 2000.

Only under the heroic assumption that age-specific mortality rates will not change for the next 100 years do life tables predict how many years newborns will live. Because age-specific mortality rates have fallen steadily and are expected to continue to fall, current age-specific life expectancy should underestimate the expected years of survival for an individual who is now that age. Analysts who need to predict future survival accurately, such as Social Security actuaries, must focus carefully on assumptions about how fast age-specific mortality will fall in future decades to obtain good predictions.
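The gap between a period life expectancy and the survival a real cohort might experience can be illustrated with a toy calculation. Everything numeric here is an assumption made for the sketch: the mortality schedule is an invented curve that rises with age, and the 1 percent annual decline in mortality is a hypothetical rate, not a forecast. The function names are illustrative as well.

```python
def life_expectancy_at_birth(death_probs):
    """Years lived per member of a cohort facing death_probs, one entry per age.

    Assumes those who die live half of their final year.
    """
    alive, years = 1.0, 0.0
    for q in death_probs:
        deaths = alive * q
        years += alive - 0.5 * deaths
        alive -= deaths
    return years


def cohort_death_probs(period_probs, annual_decline):
    """Death probabilities faced by a cohort born today if mortality at every
    age falls by a constant fraction each calendar year (an assumption).

    The cohort reaches age x after x years of decline. The final age keeps
    its original probability so that the table still closes.
    """
    last = len(period_probs) - 1
    return [
        q if age == last else q * (1.0 - annual_decline) ** age
        for age, q in enumerate(period_probs)
    ]


# Invented mortality schedule for ages 0 to 109, closed at the last age.
period_probs = [min(1.0, 0.0005 * 1.09 ** age) for age in range(110)]
period_probs[-1] = 1.0

period_le = life_expectancy_at_birth(period_probs)
cohort_le = life_expectancy_at_birth(cohort_death_probs(period_probs, 0.01))
```

Under any positive rate of decline, `cohort_le` exceeds `period_le`; how large the gap is depends entirely on the assumed pace of improvement, which is why that assumption matters so much to actuaries.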

Because life tables use only mortality rates, they are not affected by the age composition of the population. For many purposes this is desirable, but some statistics, such as per capita medical spending and death or disability rates, depend heavily on a population’s age composition. And for some planning and budgeting purposes, it is necessary to collect or predict actual mortality, disability, and spending, rather than age-adjusted statistics. Analysts must be consistent in standardization for age when comparing trends or levels in expenditures and health. Stratification is a good alternative to standardization and would also prevent misleading confounding-by-age differences in comparisons. One could include either total or per capita health and expenditure flows stratified by age to adjust for population heterogeneity for a given disease.

Until the 1960s, gains in life expectancy were cited as the leading indicator of improving population health. Life expectancy in the United States is defined by the period life tables produced by NCHS, and the trend toward increasing life expectancy is the first health statistic cited in its recent annual report of health in the United States (National Center for Health Statistics, 2007). The statistic may be reported in a number of ways. Life expectancy at birth is often taken as an overall measure of population health, because it aggregates mortality rates for all ages. Life expectancy may also be reported as conditional on achieving a specific age or for subsets of the population. For example, the period from 1970 to 2006 saw an increase in life expectancy at birth in the United States from 70.8 years to 78.1 years (Arias, 2007). In 2005, life expectancy at age 65 was 18.7 years; at age 75, it was 12.0 years. Female life expectancy at birth in the United States was 80.4, exceeding male life expectancy by 5.2 years (National Center for Health Statistics, 2007). In 2003 the United States ranked 26th in female life expectancy at birth among 37 selected countries and territories ranging from Japan (ranked 1st at 85.3 years) to the Russian Federation (ranked 37th at 71.8 years) (National Center for Health Statistics, 2007).

5.1.3. Morbidity and Health-Related Quality of Life

In the modern era of health care, American society has become as concerned with health-related quality of life as with life expectancy (Linder, 1966). A great deal of modern medical care is directed toward reducing morbidity and increasing functioning, thus improving health-related quality of life. Accordingly, researchers have sought to measure not only changes in length of life affected by medical care, but also changes in morbidity and functioning affecting quality of life.

At this point, no consensus summary measure of an individual’s health, as affected by morbidity, has emerged. However, most researchers agree that a general measure of health for an individual at a point in time should reflect physical and mental functioning and account for the degree to which a person is affected by pain. These “within-the-skin” attributes form the core of most measures (Patrick and Erickson, 1993). Sometimes additional measures of social and role functioning are included (Stewart and Ware, 1992), which may be affected by physical and mental functioning (as well as other health factors) but are important enough to be valued and measured in their own right. There is dispute about including “beyond-the-skin” attributes—such as social support and the physical and socioeconomic environment within which an individual lives—the dispute being whether these are actual attributes of health or determinants of health. The inventory of attributes of health-related quality of life devised by the World Health Organization (WHO) is perhaps the most comprehensive in inclusion of beyond-the-skin attributes (World Health Organization Quality of Life Group, 1998a, 1998b).

To measure the impact of medical care on health, it seems reasonable to begin by focusing on measures of within-the-skin attributes of health and functioning. Even within this category, many choices of measures remain. An important consideration is whether to use condition- or organ-specific measures or generic health measures. Because any particular health condition or disease may have very specific effects, condition- or organ-specific measures can be quite detailed and sensitive to small changes in how a certain condition affects an individual. As a result, they are more likely than a generic health measure to be the primary outcome in trials of management for particular diseases.

However, the condition-specific options have major problems as measures of general population health. The first problem is logistical. There are many different conditions, and collecting data using a different index for each condition presents practical problems since all persons would need to be asked to identify all instruments relevant to them and to complete those questionnaires. A second and more important problem is that we do not know how to combine the scores from multiple disease-specific instruments into a single summary score to represent the health of an individual who suffers from several conditions (as is the norm for aging people) nor how to compare, say, an increase on an asthma index for one person to a decrease on a diabetes instrument for another person. In addition to the practical difficulty of collecting the data for such a large panel of measures, which simply may not be feasible, is a fundamental conceptual challenge. Converting condition-specific measures into an overall measure of health or well-being requires that they be comprehensive, or nearly so. Furthermore, it presupposes a way to combine them. That is, it is necessary to assign appropriate weights to each condition-specific measure to convert the group of them into an aggregate measure of health. They cannot simply be added together. Aggregation would require substantial new research, including the development of techniques to validate the aggregate measure.

Researchers interested in summary generic measures have instead concentrated on developing measures that cover the major domains of health but do not provide a great deal of detail relevant to any one health condition. The past 35 years have seen significant progress in the development of self-reported, preference-based summary measures. Advances include a growing number of available indexes and the use of generic health outcome measures in clinical and cost-effectiveness studies, creating a legacy of published data (Patrick, Bush, and Chen, 1973; Ware et al., 1981; Fryback et al., 1993; Gold, 1996; Gold, Stevenson, and Fryback, 2002). More recently, interpretation of the measures has been improved by their inclusion in large health surveys, thereby creating population norms against which to evaluate results obtained in clinical studies. These surveys also provide points for population tracking and comparison.

5.1.4. Summary Measures of Health-Related Quality of Life

Health-related quality of life (HRQoL) measures have been developed to capture both morbidity and mortality in a single number, to capture the cumulative effects of multiple conditions, and to reflect both psychological and physiological dimensions of illness. This section describes indexes of HRQoL that have been collected in population data sets in the United States. Any of these could, in principle, be used to measure HRQoL for a health account. Each has advantages and drawbacks for such use based on response burden for collecting the data, the proprietary versus nonproprietary nature of the measure, the existence of a scoring system based on U.S. population data, and how much detail the measure provides for models associating medical inputs to health.

Generic HRQoL indexes measure health using standardized weighting systems representing community preferences for health states on a scale anchored by 0 (dead) and 1 (full health). Representing community preferences is important for cost-effectiveness analysis in health and medicine (Gold et al., 1996). There are currently six such indexes used in the United States and elsewhere around the globe. Each index emphasizes overlapping, but not identical, health concepts from the other indexes. Each measure involves a generic multidimensional health state classification, or descriptive system, employing multiple health domains to classify health broadly (i.e., these are formulated in generic, not disease- or organ-specific terms), and a standardized weighting (or “scoring”) system derived from a community preference valuation of health states. The health state classification system is most commonly a set of health domains, or “attributes” or “dimensions” (such as pain), which have predefined levels (e.g., “none,” “moderate,” “severe”) that can be selected. Levels range from a fully healthy state to a very unhealthy state in each domain. The measures are self-reported; a person’s answers to a standardized questionnaire are used in a prescribed manner to specify the level of each domain in the index’s descriptive system with which to associate the respondent. We refer to each specific combination of questionnaire, health state classification system, and standardized scoring system as an “index” or “measure” synonymously and to numbers assigned to individuals as “scores.”


EQ-5D

Permission to administer EQ-5D is given without charge by the EuroQol Group. The questions refer to “your health today,” and the descriptive system uses 5 domains (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression), each with 3 response options (no problems, moderate problems, severe problems), defining a total of 243 unique health states (Rabin and de Charro, 2001). There is a scoring algorithm for the U.S. general population that was derived from time trade-off assessments of EQ-5D health states made by a population sample of 4,000 U.S. adults in face-to-face household interviews administered in English (Shaw, Johnson, and Coons, 2005).
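The count of 243 health states follows from 5 domains with 3 levels each (3 to the 5th power). The short sketch below enumerates the descriptive system to make that concrete. The numeric coding of levels as 1, 2, and 3 is an assumption made for illustration, and no preference weights are included because the scoring algorithm is not reproduced here.

```python
from itertools import product

DOMAINS = (
    "mobility",
    "self-care",
    "usual activities",
    "pain/discomfort",
    "anxiety/depression",
)

# Illustrative coding: 1 = no problems, 2 = moderate problems, 3 = severe problems.
LEVELS = (1, 2, 3)

# Every combination of one level per domain is one health state.
health_states = list(product(LEVELS, repeat=len(DOMAINS)))


def describe(state):
    """Pair each domain name with the level recorded for it."""
    return dict(zip(DOMAINS, state))


best_state = describe(health_states[0])    # no problems in any domain
worst_state = describe(health_states[-1])  # severe problems in every domain
```

A scoring algorithm then assigns one number to each of these states, which is what turns the descriptive system into an index.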

The EQ-5D is used widely around the world; for example, it is used in the U.S. Medical Expenditure Panel Survey (MEPS). As of January 2008, there were 81 official language versions, and an additional 38 had been completed and were awaiting official ratification by the EuroQol Group’s Translation Committee.

Health Utilities Index Marks 2 and 3

License to use the proprietary Health Utilities Index Marks 2/3 (HUI2/HUI3) questionnaire and mapping algorithm must be purchased from Health Utilities, Inc., which distributes and supports more than 16 standard questionnaire versions in English. A specific version is determined by a number of factors including mode of administration (self- or interviewer-administered); length of health-status recall period (1 week, 2 weeks, 4 weeks, or usual health); and assessment viewpoint (self- or proxy-assessment). Many of these questionnaires are also available in Chinese (traditional and simplified characters), Czech, Danish, Dutch, French (continental or European French and Canadian French), German, Hebrew, Italian, Japanese, Korean, Portuguese (European and Brazilian), Spanish (European and Latin/South American), Swedish, and Turkish. A condition of the license is that users not reveal the content of the questions or the mapping algorithm. Respondents are asked to consider “your level of ability or disability during the past week.” Scoring algorithms for both HUI2 and HUI3 were derived from standard gamble assessments made by adults in community samples in Hamilton, Ontario, and employ multiplicative multiattribute utility functions. The algorithms map data from the same 40-item interviewer-administered questionnaire to each of the HUI2 and HUI3 classification systems. The HUI2 defines health status on six attributes (sensation, mobility, emotion, cognition, self-care, and pain; an optional fertility dimension is excluded, as is usual in the literature). Each attribute is divided into 4 or 5 levels, resulting in 8,000 unique health states (Feeny, Torrance, and Furlong, 1996). The HUI3 defines health on 8 attributes (vision, hearing, speech, ambulation, dexterity, emotion, cognition, and pain), each having 5 or 6 levels and jointly describing 972,000 unique health states (Feeny et al., 2002).

Both HUI2 and HUI3 scoring functions have health states scored less than 0 (dead). HUI2 scores range from −0.03 to 1.0; HUI3 scores range from −0.36 to 1.0.


QWB-SA

Permission to use the QWB-SA can be obtained free of charge from the University of California, San Diego, Health Services Research Center. Usually self-administered using a two-sided optical scan form, the QWB-SA assesses health over the past 3 days. It combines three domains of functioning (mobility, physical activity, and social activity) with a lengthy list of symptoms and health problems, each assigned a weight, using an algorithm that yields a single summary score (Kaplan, Sieber, and Ganiats, 1997; Andresen, Rothenberg, and Kaplan, 1998) based on the presence or absence of activities and symptoms on each of the past 3 days. The final score is the average of the three single-day scores. The original QWB algorithm was developed using visual analog scale (VAS) ratings of health state descriptions by a community sample of adults in the San Diego area. The QWB-SA algorithm is conceptually similar to that of the original QWB but was derived from ratings by a convenience sample of people in family medicine clinics around San Diego; VAS ratings were used to rate domain levels and some case descriptions formed from special combinations of domains in a multiattribute utility elicitation process. Excluding dead (0), the minimum possible QWB-SA score is 0.09 and the maximum is 1.0.

SF-6D and SF-36v2

License to administer the SF-36v2 must be purchased from its vendor. The SF-36v2 refers to several time frames. One question asks for self-rated health “in general.” Some questions ask how much one’s health “now limits” doing certain activities. Other questions refer to the “past four weeks.” The SF-6D is computed from a subset of 11 of the 36 questions in the proprietary questionnaire. While the SF-36v2 yields a health profile summary using 8 domains, the SF-6D has reduced this to 6 domains (physical function, role limitation, social function, pain, mental health, and vitality), each with 5 to 6 levels and jointly defining about 18,000 health states (Brazier, Roberts, and Deverill, 2002). Scoring was derived from standard gamble assessments by a population sample from the United Kingdom. The SF-6D scoring algorithm is distributed by the SF-36v2 vendor. The scoring algorithm produces scores ranging from 0.30 to 1.0.

The SF-6D may also be computed using the RAND-36 questionnaire, which is equivalent to the SF-36 version 1 and is available without charge. A license to use the SF-6D may be obtained from its developers free of charge for non-commercial applications and for government agencies.


HALex

No permission is needed to use the HALex. It is the only summary index available that uses data directly from the National Health Interview Survey (NHIS) instead of its own free-standing system. It is used to track years of healthy life in Healthy People 2000 and 2010. HALex questions refer to “your health in general.” It consists of 2 domains: 6 levels of activity limitation (ranging from “no limitations” to “unable to perform activities of daily living”) and 5 levels of self-reported health (“excellent,” “very good,” “good,” “fair,” and “poor”), jointly defining 30 health states. This is the only one of the six indexes to use self-rated health to describe health states. Appendix 1 of Erickson (1998) abstracts from NHIS the questions used to rate a person in the activity limitation domain. The self-reported health question may be administered alongside these questions to complete an ad hoc questionnaire for the HALex.

The scoring algorithm was developed ad hoc without actual preference survey data using correspondence analysis to the HUI Mark 1. The worst of the 30 health states is scored 0.10, and the best is scored 1.0. Table 5-1 summarizes the pros and cons of the various indexes for use in a health accounts data system. No one measure is uniformly best, and each has strengths and weaknesses.

TABLE 5-1. Pros (+) and Cons (−) of the Various Indexes.



The measures of health-related quality-of-life described in the previous section may be used to derive QALY compatible estimates. A QALY is a summary measure of health—based on subjective quantification of illness—that includes both morbidity and mortality. A year in perfect health is equal to 1.0 QALY. The value of a year in ill health is discounted to reflect the relative utility of the ill state versus perfect health; for example, a year bedridden may be valued at 0.5 QALY. In cost-effectiveness analysis of health care interventions, QALYs are now the standard metric for health impacts (Gold, 1996). These impacts are calculated for both individuals getting new treatments and populations with some changes in their health inputs.

A second acronym—QALE (for quality-adjusted life expectancy)—is used in the population health literature as a summary measure of current health status. QALE is life expectancy adjusted for the quality of surviving years and so is measured in QALYs. QALE is by definition less than life expectancy computed in unadjusted years. The discrepancy between QALE and unadjusted life expectancy reflects the relative perceived desirability people place on living a given length of time with morbidity versus living that time in perfect health. In a life-table representation of population health, QALEs are reported undiscounted for time (i.e., just as life expectancies in a population actuarial table are undiscounted).

QALEs and the methods described below to compute them have several good properties. First, they are independent of the age composition of the population. Other measures, such as crude death rates, disease prevalence, or restricted activity days, are highly dependent on age and must be stratified or standardized for many comparisons. Life-table methods are a natural method of standardization that does not require any particular population (such as the U.S. population in 2000) to be chosen. Second, with no additional work, the tables that compute QALE at birth can be used to compute QALE and nonquality-adjusted life expectancy for any age group (e.g., life expectancy at age 65 as recommended by status of health indicators 2008). Ignoring the health adjustments, these methods compute classic measures of population health, such as life expectancy at birth, to compare with historical data from the United States and other countries. However, using this type of actuarial QALE as a descriptive summary of population health requires cross-sectional surveys of HRQoL, as discussed in the next section.

5.2.1. Cross-Sectional Surveys of HRQoL

Medical care may affect life expectancy, HRQoL, or both. For the purpose of cost-effectiveness analysis of medical interventions, these two are generally combined into one QALY measure (Gold, 1996). At the individual person level, generic health over time may be represented as a function of HRQoL over time, $q(t)$. Once one has observed the individual’s HRQoL over time from $t_0$ to $t_1$, one can compute the QALYs accrued by the individual between time $t_0$ and $t_1$ as the integral $\int_{t_0}^{t_1} q(x)\,dx$, where the function $q$ is empirically defined by the observations.
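When HRQoL is observed only at a handful of time points, the integral has to be approximated. One simple possibility, sketched below, is the trapezoidal rule, which assumes that HRQoL changes linearly between consecutive observations. That interpolation rule is an assumption of this sketch, as are the function name and the sample values.

```python
def qalys_accrued(times, hrqol):
    """Approximate the integral of q(t) from times[0] to times[-1].

    times: observation times in years, strictly increasing.
    hrqol: HRQoL score observed at each time (0 = dead, 1 = full health).
    Uses the trapezoidal rule, i.e., linear change between observations.
    """
    if len(times) != len(hrqol) or len(times) < 2:
        raise ValueError("need at least two observations of equal length")
    total = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        if width <= 0:
            raise ValueError("observation times must be strictly increasing")
        total += 0.5 * (hrqol[i - 1] + hrqol[i]) * width
    return total


# One year at a constant HRQoL of 0.5 yields half a QALY.
half_qaly = qalys_accrued([0.0, 1.0], [0.5, 0.5])

# Full health declining to 0.5 over the first year, then stable for a second year.
two_years = qalys_accrued([0.0, 1.0, 2.0], [1.0, 0.5, 0.5])
```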

More often, empirically defined QALYs are computed by weighting time intervals lived, such as 1-year intervals, by an observed or estimated HRQoL average for the interval, then summing products of HRQoL and interval length across intervals. In this fashion, consider $q_a$ to be the average HRQoL for a person in the year interval from $a$ to $a + 1$. Let ${}_tp_x$ be the probability that a person age $x$ will survive to age $x + t$. The empirical QALE, conditioned on current age being $a$, will be $\mathrm{QALE}_a = \sum_{t=1}^{\infty} q_{a+t-1}\,{}_tp_a$. This computation is a variation on standard life-table calculation of age-specific life expectancy where each year of life is weighted by age-specific HRQoL. This method is widely attributed to Sullivan (1971).
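The HRQoL-weighted sum described above can be written as a short loop. The sketch below assumes that survival is supplied as one-year survival probabilities by age, so that the probability of surviving t years from age a is the running product of those values; that input format, the function name, and the sample numbers are all assumptions made for illustration. The table is truncated at the last age supplied.

```python
def qale(age, hrqol_by_age, one_year_survival):
    """HRQoL-weighted life expectancy at a given age.

    hrqol_by_age[x]: average HRQoL in the year of age from x to x + 1.
    one_year_survival[x]: probability that a person age x survives to x + 1.
    Each term is the HRQoL of a year of age times the probability of
    surviving to the end of that year, summed over all remaining ages.
    """
    if len(hrqol_by_age) != len(one_year_survival):
        raise ValueError("inputs must cover the same ages")
    total = 0.0
    survival = 1.0
    for x in range(age, len(hrqol_by_age)):
        survival *= one_year_survival[x]   # probability of surviving to age x + 1
        total += hrqol_by_age[x] * survival
    return total


# Made-up three-age example.
hrqol = [1.0, 0.8, 0.5]
survival = [0.9, 0.5, 0.0]

qale_at_birth = qale(0, hrqol, survival)
unadjusted = qale(0, [1.0, 1.0, 1.0], survival)   # all weights 1: unweighted years
```

Setting every weight to 1 recovers an unadjusted expectancy, and supplying only 0 or 1 weights gives a disability-free variant. Written this way, the sum credits a year only to those who survive it in full; a complete life table also credits part of a year to those who die during it.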

Rosenberg, Fryback, and Lawrence (1999) demonstrate this technique using $q_a$ values measured with the QWB index estimated in a community population and combine these with life-table survival probabilities to compute $\mathrm{QALE}_a$ for males and females ages 55 and 65. Others have used a binary-valued (0 or 1) HRQoL function that assigns disability an HRQoL of 0 to calculate disability-free life expectancy (Molla, Wagener, and Madans, 2001). A similar method that allows for a few states between life and death is multistate life tables (Cai et al., 2003).

These techniques can estimate QALE for a health account. As with ordinary life expectancy, QALE measures do not predict future health, but instead summarize health in the current year. Two inputs are needed: (1) a life table describing mortality experience in the population and (2) average HRQoL at each year of age in a population. Age-specific death rates from NCHS would be needed as the first input. Data from population surveys using any of the HRQoL indexes described above would suffice for the second. In the next section we list existing surveys using one or more of these measures. Table 5-2 shows how the necessary life-table calculations would be structured.

TABLE 5-2. Life-Table Calculation Framework.


Data for column 1 come from NCHS vital statistics. NCHS calculates the age-specific death rates from data from the decennial census on the midyear population of each age, together with the deaths by age that it collects. Column 2 shows how many people of each age remain alive with these death rates at each age, assuming an initial hypothetical cohort of 1,000 births. Column 3 is new: it would require a population survey of HRQoL. Columns 4–9 are computed quantities using standard life-table techniques augmented for HRQoL weighting, as described above. Column 4 is the product of population × (1 – death rate × fraction of year lost to death). The fraction of the year lost to death is usually very close to one-half except for infants because, on average, people die halfway through the year. The row corresponding to 100, the largest age in the table, is special, as it covers more than 1 year: the death rate is 100 percent and years lived means expected future years for those exactly 100 years old. Column 5 is the years lived at that age × average HRQoL at that age, so it equals the QALYs lived by the remaining hypothetical population at that age. Columns 6 and 7 are added from the bottom to get cumulative health in years and in QALYs from each age to death in the hypothetical cohort. Finally, columns 8 and 9 divide the remaining years and remaining QALYs by column 2, the number of people of that age alive, to get life expectancy and QALE.
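The column-by-column description above translates fairly directly into code. The sketch below is one possible reading of it, under stated assumptions: the fraction of a year lost to death is fixed at one-half for every age (the text notes that infants differ), the expected remaining years at the final age are supplied by the caller, and the function name, dictionary keys, and three-age sample data are invented for illustration.

```python
def qale_life_table(death_rates, hrqol, final_age_remaining_years,
                    radix=1000.0, fraction_lost=0.5):
    """Build rows holding the nine columns described in the text.

    death_rates[x], hrqol[x]: columns 1 and 3 for each age x.
    final_age_remaining_years: expected future years for those exactly at
    the last age, whose row covers more than 1 year.
    """
    if len(death_rates) != len(hrqol):
        raise ValueError("death_rates and hrqol must cover the same ages")
    rows = []
    alive = radix
    last = len(death_rates) - 1
    for age, (rate, q) in enumerate(zip(death_rates, hrqol)):
        if age == last:
            rate = 1.0                                   # table closes here
            years = alive * final_age_remaining_years    # column 4, special row
        else:
            years = alive * (1.0 - rate * fraction_lost)  # column 4
        rows.append({
            "age": age,
            "death_rate": rate,           # column 1
            "alive": alive,               # column 2
            "hrqol": q,                   # column 3
            "years_lived": years,         # column 4
            "qalys_lived": years * q,     # column 5
        })
        alive *= 1.0 - rate
    cum_years = cum_qalys = 0.0
    for row in reversed(rows):            # columns 6 and 7: sum from the bottom
        cum_years += row["years_lived"]
        cum_qalys += row["qalys_lived"]
        row["remaining_years"] = cum_years
        row["remaining_qalys"] = cum_qalys
        # Columns 8 and 9: divide by the number alive at that age.
        row["life_expectancy"] = cum_years / row["alive"] if row["alive"] > 0 else 0.0
        row["qale"] = cum_qalys / row["alive"] if row["alive"] > 0 else 0.0
    return rows


# Made-up three-age table to show the mechanics.
table = qale_life_table(
    death_rates=[0.1, 0.5, 1.0],
    hrqol=[1.0, 0.8, 0.5],
    final_age_remaining_years=2.0,
)
```

Reading `life_expectancy` and `qale` from any row gives the two expectancies conditional on reaching that age, which is why a single table serves for birth, age 65, or any other starting age.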

These tables also can be used to calculate other generic measures that have been collected for years in many countries, such as infant mortality and (nonquality-adjusted) life expectancy from birth and at other ages, such as 21 and 65. The national health account would need to do so to facilitate historical and cross-country comparisons, although we expect other Western countries to begin calculating and reporting QALE also.

Although restricted activity presumably is reflected in HRQoL, there might be some interest in these numbers and trends as well. One might use life-table methods to standardize other age-dependent measures, such as restricted activity days, calculating the expected lifetime-restricted activity days for a period cohort with death rates and restricted activity days by age as in the current year, but it seems more natural just to report the actual number of restricted activity days, perhaps stratified into large age groups such as children and adults over and under age 65.

Several cross-sectional surveys of HRQoL currently exist; Box 5-1 is a list of data sources. In Table 5-2, new data in column 3 would be the result of periodic HRQoL surveys of the population. Several one-time national data sets and at least two continuing periodic surveys collect one or more of the HRQoL indexes described earlier. However, without augmentation, none of these is entirely sufficient for an ongoing and detailed health account computation of QALE. Three one-time surveys have collected systematic HRQoL data. Although these studies can be used to estimate age-specific HRQoL of community living adults in the United States, they all miss persons younger than age 18, institutionalized persons, and persons living in the community but unable to respond to a survey for physical or cognitive reasons. The Joint Canada/United States Survey of Health was conducted in English, French, and Spanish. The National Health Measurement Study was conducted in English, and the U.S. Valuation of the EQ-5D was conducted in English and Spanish. Hanmer, Hays, and Fryback (2007) discuss similarities and differences of these surveys and implications for HRQoL estimates.


BOX 5-1

Cross-Sectional Surveys of Health-Related Quality of Life. One-Time Surveys The Joint Canada/United States Survey of Health was conducted in 2002–2003 by the U.S. National Center for Health Statistics and Statistics Canada. Approximately 3,500 (more...)

Two ongoing surveys of the U.S. adult population collect information for one or more of the HRQoL indexes. Other studies of note that have included HRQoL indexes include the Health and Retirement Study (HRS), which administered HUI3 in 2000 as a module for approximately one-twelfth of the full HRS sample, or about 1,600 individuals. The Centers for Medicare & Medicaid Services are required by law to survey a sample of 1,000 Medicare recipients from each participating Medicare Advantage plan. The resulting sample is nearly 100,000 persons per year, and the survey has been ongoing since 1998. This survey was formerly known as the Health of Seniors study, but in 1999 was renamed the Medicare Health Outcomes Survey (HOS). HOS included the SF-36 version 1 questionnaire through 2007; this questionnaire serves for calculating SF-6D HRQoL scores. In 2008, the HOS plans to change the questionnaire to a version of SF-36 developed for the U.S. Department of Veterans Affairs (Kazis et al., 2004a, 2004b, 2004c). Whether SF-6D scores can be calculated after this change is not known at this time.

None of the surveys mentioned above consistently includes actual medical examination of participants but, beginning two waves ago, the HRS has been collecting blood and some anthropometric measures. The ongoing National Health and Nutrition Examination Survey (NHANES) does include a medical examination and testing of participants. To assist in modeling relationships of proven medical conditions (versus self-reported ones) to HRQoL, it would be highly desirable for NHANES to include at least one of the HRQoL indexes in its protocol. More broadly, given that multiple surveys (each with a somewhat different population scope) are required to adequately measure health across the population broadly, there should be some effort by the statistical agencies to pick a common quality of life instrument to use in the different surveys.

Recommendation 5.1: A committee of members from agencies responsible for collecting population health data (Agency for Healthcare Research and Quality, National Center for Health Statistics, Census Bureau, etc.) should be charged with identifying and putting in place a single standard population health measurement tool (or set of tools) to use in a wide range of surveys. The best instrument, which is situation specific, may simply be the one that can be added to enough surveys collected over time so that most of the population is covered.

Ideally, agencies would collaborate and choose one instrument that can be followed over time, for the purpose of having at least one comparable measure for different years, but others could be considered as well. For example, it would be very useful if a generic quality of life measure were added to the Medicare Current Beneficiary Survey (MCBS). The MEPS and MCBS should pick at least one instrument in common that both will consistently use over time. Although the choice of instrument should be left to the agencies, the system should strive for consistency of instrument use over time. The main reason for standardization is that chaining from one instrument to another is problematic. If a change does occur, an overlap period is needed for the transition.

5.2.2. Disease-Specific Measures of Health

In addition to generic measures of health, the national health accounts should also contain information on a set of specific diseases. WHO, for example, collects data on the incidence, prevalence, patterns of treatment, and mortality from tuberculosis for almost all countries in the world, and these are reported annually (World Health Organization, 2008). In the United States, separate data collection efforts are ongoing for cancer, HIV/AIDS, end-stage renal disease, and other diseases, the results of which can be used by researchers and policy makers. The databases may include information on stage at diagnosis, treatments, disease severity, and mortality. Is it reasonable to try to include such data on all major diseases in the health accounts?

Disease-specific data are considered by many to be the cornerstones of health research as they can provide the basis for learning, for example, if the incidence or prevalence of Alzheimer’s, HIV/AIDS, or lung cancer is going up or down; how limiting arthritis or asthma is; or how many people are dying from food poisoning or heart attacks. Disease-specific data may fit more closely with the intermediate outcome of expenditures organized by disease-specific episodes of treatment that are used to estimate improvement per expenditure. They are vital in planning policies and research on these diseases and in evaluating current programs.

However, as a general population-wide measure of health, disease-specific data have several problems. To be comprehensive, an enormous number of diseases would need to be covered, so the price tag would be high. Aggregation into a single or a few measures seems very difficult. How would one add up cases of asthma and breast cancer, or even cases of breast cancers of different severity? How would one treat cases of cancer in remission or aggregate cases with different stages? Also, each disease-specific health unit is on its own scale. Researchers are now in the early stages of trying to estimate the QALY impact for specific diseases on future health and survival. In one of these initial studies, Eggleston et al. (2009) measured the net economic value of improvements in health status (to an admittedly homogeneous population), using a QALY metric defined in terms of 10-year cardiovascular risk, from spending on care for patients with type 2 diabetes.3

Another issue is the overlap of diseases, which raises the problem of allocating the total health decrement to each that is present. It is even difficult to split up deaths in this way: given that multiple possible causes of death are commonplace, information is currently not precise enough to confidently estimate death trends by disease. It is also useful when these measures include cognitive impairment. This is a large problem for coverage of the elderly and also for populations outside the United States (for example, deaths of children in developing countries from malnutrition).

Recommendation 5.2: Recognizing the difficulties in estimating the incidence and prevalence of disease, the National Center for Health Statistics should commission research on selecting and specifying a set of important acute and chronic diseases and feasible methods for estimating acute incidence and chronic prevalence that might be part of a national health data system. These counts would be a supplement to the systematic data systems to measure health-related quality of life using measures that transcend single diseases (Recommendation 5.1).

Such estimates should prove useful for looking at the impact of trends in screening, expenditures per case, and deaths.

5.2.3. QALYs and Disability-Adjusted Life Years

A possible alternative to the QALY approach for measuring population health stock would be to base it instead on the disability-adjusted life year (DALY) methodology (Murray, 2002). These measures were developed to help WHO and others working in the developing world understand the burden of (generally infectious) diseases, plan and evaluate policies to reduce the impact of those diseases, and assess health using data that might be available in those countries.

The DALY approach to measuring health is based on disease incidence and prevalence: a fixed burden per person is computed based on the methodology, and then total burden is counted by multiplying this by the number of people affected. Early versions of the methodology used expert opinion to set the burden per person of morbidity, but the DALY weights and methods overall have been fluid in the past several years in response to criticisms, and they are now much closer to a QALY. One minor difference is that the DALY method assesses a health gap: the difference between a person’s life and a life at good health to some maximal age, whereas QALY methods typically say how much life is achieved. If the methods for assessing HRQoL were identical, there would be the identity that the QALYs (at less than maximal age) + DALY gap = maximal life span. The developing world’s focus on DALYs may be partly driven by the enormous data gaps that exist, and it makes sense conceptually for those trying to maximize the reductions in disease-based disability and death (burden of disease) for a given cost outlay. However, they are not sufficient for that purpose, because estimates of the effect of any program on each DALY are also necessary.4
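The identity between QALYs and the DALY gap can be checked with a small numerical sketch. It rests on simplifying assumptions that are not part of the DALY methodology as actually applied: the same HRQoL weights are used for both measures, there is no discounting or age weighting, and each weight covers one full year of life. The function names and the example weights are hypothetical.

```python
def qalys_lived(weights):
    """QALYs achieved: the sum of yearly HRQoL weights over the years lived."""
    return sum(weights)


def daly_gap(weights, max_age):
    """Health gap relative to full health up to max_age (simplified).

    The gap is the health lost while alive (1 - weight in each year lived)
    plus the years of life lost between death and the maximal age.
    """
    years_lived = len(weights)
    if years_lived > max_age:
        raise ValueError("years lived cannot exceed the maximal age")
    lost_while_alive = sum(1.0 - w for w in weights)
    years_of_life_lost = max_age - years_lived
    return lost_while_alive + years_of_life_lost


# Made-up example: four years lived, maximal life span of six years.
weights = [1.0, 0.9, 0.8, 0.5]
total = qalys_lived(weights) + daly_gap(weights, max_age=6)
print(total)  # equals the maximal life span, up to floating-point rounding
```

Under these assumptions the two measures carry the same information, which is why the text treats the difference between them as minor once the weighting methods converge.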

The QALY method is much more widely used than are DALY methods in the United States and among the developed countries in the Organisation for Economic Co-operation and Development. In these countries, the major health problems are no longer childhood infectious diseases but the chronic conditions that people accumulate with age and need to manage to preserve their health. Also, the QALY methods have been the subject of extensive scientific investigation and were, in particular, developed for the health questions typically asked on the ongoing surveys. Economic accounts are in the business of organizing data for measuring the value of goods and services generated in the economy, so it makes sense to think in terms of QALYs (the good that is being generated from medical care). And for a U.S. application, one would want data of sufficient quality to estimate QALE using QALY methods.

All things considered, for the purpose of measuring population health for a national health account, QALY is the most appropriate metric currently available.

Recommendation 5.3: Initially the national health account should focus on quality-adjusted life expectancy measured in quality-adjusted life years as the best summary measure of health in each year.

Data on important risk factors that impact future health should be collected, but more research is needed before useful calculations can be made of the future life expectancy of the current population.5 Expected future QALYs will need to be revised as the effects of treatments become known; this is similar to other parts of the NIPAs.


The computations of QALE outlined above are not forecasts of future health. They provide a picture of current health in the United States, a snapshot of current health accrual. For the reasons discussed, QALE computed from period life tables is not a forecast of experience of the population in the future. At this point, a national health account should focus more on current health than on predicted future health, mainly because data will be easier to collect, assemble, and understand. Much can be said about particular risk factors in terms of their future impact in a national report, but forecasting the overall life expectancy of the current population is a stretch.

Data should include generic health in the current year and information on risk factors and disease. Average HRQoL, including death, is appropriate for the current year, and it may be age-adjusted or not to track over time. Currently life expectancy is based on this year’s outcomes, not on projections, for the practical reason that such estimates are easy to compute. Population-based QALE is a good metric (and an improvement over life expectancy) for current health stock, although it is data-intensive to compute.

That said, it is important to recognize that many medical, personal, and public health interventions are investments in future health, not just current health. For example, one of the most successful interventions of the past four decades to which a good proportion of current gains in health is attributed is detection and treatment of high blood pressure. The purpose of diagnosing and treating high blood pressure is not to improve current health but to improve future health; indeed, many drugs aimed at preventing future heart disease may even have side effects that reduce quality of life in the short run. Similarly, many environmental policies are aimed at lowering the burden of future chronic disease by reducing exposures to carcinogens or other air pollutants, even when those exposures have no impact on present health (Gold, 1996).

The purchased medical services tracked in the medical expenditures accounts are an important input into the production of future health, but there are many other inputs. The National Research Council report Beyond the Market (National Research Council, 2005) lists five other major inputs: (1) medical care services provided without payment; (2) time and effort that individuals invest in their own and relatives’ health; (3) consumption of nonmedical goods, such as food, tobacco, and alcohol; (4) research and development that may lead to improvements in medical technology; and (5) environment and disease state factors and shocks. In addition, one might add other influences, such as immigration, education, social capital, culture and desires, and genetics. Finally, the success of earlier treatments leads to populations that are older and have a history of disease. These factors aggregated from conception to the present all impact health.

Cost-effectiveness studies of medical technology do need to forecast future health and usually do so through use of mathematical modeling of treatment results and personal risk factors associated with patients. Risk factors are characteristics of individuals or individual behaviors that are associated with future health gains in the population. A comprehensive list of these is beyond the scope of this report. However, an accounting of health gains from current medical investments needs to recognize that reductions in risk factors in the present can improve QALE in the future. We suggest several criteria for deciding which, if any, inputs to future health to collect:

  • There should be feasible or actual reasonable national estimates available.
  • Solid scientific links should exist between the risk factor and future health.
  • The impact on future health should be large enough to make data collection worthwhile.
  • Interventions to change the risk factor should be reflected in the expenditures part of the national health account.

By these criteria, data concerning risk factors in the United States today should include information related to:

  • smoking and tobacco use;
  • physical activity;
  • sleep habits;
  • obesity (i.e., body mass index, although waist/height may work better as a predictor);
  • measures of diet (apart from obesity);
  • high blood pressure;
  • cholesterol;
  • alcohol use, especially when driving;
  • education;
  • birth weight and prematurity;
  • participation in screening (e.g., mammography, colorectal cancer screening);
  • vaccination (e.g., childhood, pneumococcal, human papillomavirus);
  • oral hygiene;
  • preventive interventions (e.g., seat belt use);
  • exposure to environmental factors (e.g., ultraviolet A and ultraviolet B light); and
  • diseases such as diabetes, chronic kidney disease, cardiovascular disease, and cancer, which have a major impact on future as well as present health.

Most of these are collected regularly through the Behavioral Risk Factor Surveillance System (BRFSS) of the Centers for Disease Control and Prevention (CDC). BRFSS is a state-based system of health surveys conducted by telephone. CDC assists states to track health risks in the United States and makes the data widely and easily available. A number of states use the opportunity to make BRFSS a rolling cross-sectional survey; however, CDC publishes the data as annual surveys, allowing tracking of trends over time and comparisons to be made at the state level.

Most (not all) of these risk factors are collected as self-reports. The NHIS asks if a doctor or medical professional has told the person that they have high blood pressure or high cholesterol, and many surveys have copied this format when it is not possible to document with actual physical or laboratory measurement of blood pressure or cholesterol. NHANES makes actual measurements on participants for these data. From NHANES one could add measures of stress and of the incidence and prevalence of such diseases as diabetes and cancer that presage future poor health. It is unlikely that self-reports of biological variables such as cholesterol, cancer, or other lab work will typically yield anything like complete or accurate results. NHANES, together with an algorithm, has been used for adults to define chronic kidney disease in a microsimulation model; it uses a correction factor in which a percentage is taken off for early-stage chronic kidney disease because proteinuria is measured only once, but that correction seems solidly based (Coresh et al., 2007).

In general, however, alternatives based on self-report (e.g., “Has a doctor told you that you have diabetes?”) are unlikely to give rise to very good models of future health, which means that the predictions of future health in these cases will be somewhat speculative. The spread of electronic medical records may facilitate data collection on the factors that are usually in such records. Clearly, better markers are needed of current population health and risk factors to be able to project future health better.

Attribution of health trends to changes in inputs is discussed in Chapter 6. In reports on risk factors in the population, rough interpretation of trends using the current epidemiological literature may be applicable. For example, one might say that the observed y percent lower hypertension prevalence will lead to x percent fewer deaths using the formulas derived from Framingham Heart Study data. One such formula calculates risk of cardiovascular disease based on smoking, diabetes, systolic blood pressure, cholesterol, high-density lipoproteins, gender, and age (D’Agostino et al., 2008).


There are three problems with currently collected HRQoL data:

  1. Sample sizes, although large for most health research, are small for making detailed analyses.
  2. National surveys are mainly limited to adults living in the community and miss children as well as institutionalized persons and those living in the community but who are cognitively impaired.
  3. Release of national survey data often lags 1–3 years from the time they were collected.

Unlike data for national economic accounts, for which data are often collected and reported quarterly, health data are generally collected and reported much more sparsely. BRFSS data are collected annually; however, the BRFSS website in April 2008 reported trend data only up to 2003. NHIS and NHANES data are available for 2006, with selected estimates recently released from 2007 data. MEPS released 2006 data in phases throughout calendar 2008. Mortality data typically lag 2–3 years. A lag of 1–3 years for data reporting may be sufficient for a satellite health account. However, if the nation wants health accounting anywhere near as current as economic accounts, then the collection of health data needs a paradigm shift. And there may be such a paradigm shift on the way.

As part of the “Roadmap” of the National Institutes of Health (NIH), which is a priority path for medical research, NIH has funded a consortium of researchers to develop a standardized system for collecting patient-reported health outcomes data: the Patient-Reported Outcomes Measurement Information System (PROMIS). PROMIS researchers have defined a generic set of health domains to broadly cover physical, mental, and social health outcomes and several specialized subdomains such as sleep/wake function, sexual function, fatigue, and pain, among others. For each domain the PROMIS project is developing an item bank (a set of psychometrically validated questions posed with standardized response scales) to be used in measuring patient function. The questionnaires may be in the traditional format of paper and pencil or delivered as a computerized adaptive test (CAT) via the Internet. CATs reduce response burden by minimizing the number of questions respondents answer while still maintaining a high degree of psychometric measurement precision.

As envisioned under the plan of the NIH Roadmap, the instruments developed by PROMIS are intended as a common suite of measures to be administered in all NIH-funded trials and eventually to be part of the routine medical record. Because of the structure of its domain system, PROMIS may be adapted to HRQoL measurement simply by developing a scoring summary reflecting preferences of the U.S. population for health and well-being.

Although PROMIS is targeted to patients now, the methods are generalizable and the measures could well serve as the basis of a population survey administered via the Internet. Surveys using Internet panels are currently being tested as a new vehicle for population surveys. Should they prove valid for self-reported health, then it may well be possible to conduct large HRQoL surveys of the U.S. population with relatively low expense compared with current face-to-face and telephone surveys and with perhaps quarterly reporting of HRQoL and risk factors. Of course Internet surveys may have the same limitations as current surveys: they miss very sick people, cognitively impaired individuals, and children.


The earlier chapters of this report have been about how to estimate quantities and prices of inputs to health; most of that discussion has focused on one of these inputs—medical care. Satellite accounts will be most useful if their structure is as consistent as possible with the NIPAs; one element of this consistency entails using a monetary metric to value account components, whether it be expenditure or income, inputs or outputs (National Research Council, 2005, p. 5). This chapter is mainly about estimating the quantity portion of the broad health account’s output. This quantity is natural and interesting in itself, but it is useful for some purposes to value it. To do so, one must attach a price to the unit of output—a healthy year of life.

Of course, health is not a market good and, as such, its valuation is anything but straightforward. As noted in Beyond the Market (National Research Council, 2005, pp. 33–34):

Developing a market-based measure of the marginal value of additional years of life that may flow from health care investments is controversial, though labor market [and many other kinds of] data have proven useful for this purpose. Specifically, the fact that different occupations are associated both with different risks of fatal injury and different relative wage rates has been exploited to derive estimates of the value of an additional year of life. Such measures, while far from perfect, have the advantage of being based on real-world decisions that yield observable market outcomes and, for that reason, they have appeal.

In a mature national health account, health-output quantities could be derived in terms of QALEs while, as suggested above, the value placed on a year of healthy life would be established by the literature. Viscusi and Aldy (2003) include a review of literature estimating the value of a statistical life. While the variation across studies in these kinds of estimates is high, the estimates generally translate to the $75,000 to $150,000 range (in year 2000 dollars, based on age-specific wages) for an additional healthy year of life. We do not endorse any one value for a year of life, but, for many policy purposes, it is necessary to compare the return of different options using a monetary metric. Indeed, this is one reason why the NIPAs have been so useful over the years.

Thus, while we believe physical quantity estimates of health are clearer, we also endorse Recommendation 6.8 from Beyond the Market (National Research Council, 2005, p. 139):

Recognizing that there is a range of uncertainty, a satellite health account should be based on a dollar figure for the value of a year in perfect health derived from estimates in the literature. That value should be updated as further research indicates.

The report goes on to note that, due to the controversial nature of this kind of valuation, both physical quantity estimates and monetary measures of their value should be presented (p. 139). We refer readers to Beyond the Market for a discussion on the full set of issues to be considered when estimating marginal valuations on a year of life.
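The practice of presenting both physical quantities and their monetary value can be illustrated with simple arithmetic. The sketch below multiplies a QALY quantity by the low and high ends of the $75,000 to $150,000 range cited above (year 2000 dollars) and reports the physical quantity alongside the dollar figures. The function name and the example quantity are hypothetical, and the range is used only as an illustration, not as an endorsed value.

```python
def value_health_output(qalys, low_per_year=75_000, high_per_year=150_000):
    """Report a QALY quantity together with a low-high monetary range.

    low_per_year and high_per_year are dollar values per healthy year of
    life; the defaults are the illustrative range cited in the text.
    """
    if low_per_year > high_per_year:
        raise ValueError("low_per_year must not exceed high_per_year")
    return {
        "qalys": qalys,                         # physical quantity
        "low_dollars": qalys * low_per_year,    # value at the low end
        "high_dollars": qalys * high_per_year,  # value at the high end
    }


# Made-up example: a gain of 2.0 QALYs.
print(value_health_output(2.0))
```

Keeping the physical quantity in the result means the monetary figures can be recomputed whenever the value per healthy year is updated.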



The remainder of this section draws heavily from Fryback et al. (2007).


Assuming that “one life-year is worth $200,000 and accounting for changes in modifiable cardiovascular risk,” the authors found that “the net value of changes in health care for patients with type 2 diabetes was $10,911 per patient (95% CI, −$8,480 to $33,402) between 1997 and 2005.”


There is also dispute about the DALY disease weights. For a good summary of differences between QALYs and DALYs, see Sassi (2006).


This recommendation is consistent with a recommendation of an Institute of Medicine panel on valuing health for the purpose of cost-effectiveness analysis (Institute of Medicine, 2006). That panel recommended the use of QALYs when evaluating medical and public health programs that primarily reduce both mortality and morbidity: “Regulatory [cost-effectiveness analyses] that integrate morbidity and mortality impacts in a single effectiveness measure should use the QALY to represent net health effects” (p. 161).

Copyright © 2010, National Academy of Sciences.
Bookshelf ID: NBK53336

