NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

National Research Council (US) Committee on National Statistics; National Research Council (US) Committee on Population. Improving the Measurement of Late-Life Disability in Population Surveys: Beyond ADLs and IADLs, Summary of a Workshop. Washington (DC): National Academies Press (US); 2009.

2 Challenges to Improving Measurement of Late-Life Functioning and Disability

This chapter summarizes the first workshop session designed to explore the challenges to improving current measurement of late-life functioning and disability in population surveys. The session opened with an overview of the background paper commissioned for the workshop (see Appendix A). It was followed by presentations on four topics:

  1. Developing questions in surveys to identify people early in the disablement process
  2. Enhancing the ascertainment of disability
  3. Self-responses versus proxy responses in surveys
  4. Expanding modes of survey administration


Barbara Altman (disability statistics consultant) presented a brief overview of her background paper (Appendix A). She noted that including measures of disability in population surveys involves many disciplinary approaches and many research agendas. As Nagi pointed out in 1991, the history of the involvement of many disciplines in the development of theory, policy, and programs to address disability issues—medicine, education, social work, psychology, sociology, vocational counseling, occupational and physical therapy, and others—sets the stage for attempts at conceptual distinctions to delineate measures of the disability process. There has been a tremendous richness in the work and the number of disciplines involved in trying to understand disability, trying to develop models of disability, and trying to identify measures of it. Yet this variety in approaches, language (and jargon), orientation, and focus complicates efforts at measurement and sometimes confuses interpretation of the results of the measurement.

Altman stated three objectives of her paper:

  1. To examine the disability conceptualization transition to measurement within the general theory of disability and to compare it across areas of application
  2. To define the sources and types of measurement that have been developed for the various theoretical concepts and examine what measures are available and how well they represent the concepts
  3. To reintroduce the important contribution of the social and environmental context, not only to the conceptualization of disability, but also to its measurement

Altman then gave an overview of the measurement elements that are necessary to fully understand disability, to reveal the strengths and weaknesses of existing measures, and to identify the gaps in measurement that remain.

Transition from Concepts to Measures

Multiple theoretical models provide the conceptual basis for understanding the disablement process. The Nagi model (1965), developed by a sociologist, is one of the earliest and most widely known coherent organizations of the conceptual components and their relationships. It was revisited and expanded in 1991 by Nagi and also elaborated by Verbrugge and Jette (1994).

Subsequently, conceptual elements and relationships have been expanded with models from the Institute of Medicine (1991, 1997) and the International Classification of Functioning, Disability and Health (ICF) model of the World Health Organization (2001), but their elaborations do not really take new directions. They do make the components more understandable to a wider audience, provide a standardization of the language, and make things more accessible to the people who are using the models. The ICF model also provides an accompanying classification scheme that is a listing of domains for consideration when one is operationalizing a measure. It has been a very useful tool. While each of these succeeding models has made contributions, the original model is still very visible.

On the basis of these models, the major conceptual elements that make up the experience of disability and need to be measured in population surveys include the following (the background paper and this discussion focus only on the starred concepts):

  • pathology or impairment,*
  • personal factors,
  • functioning of the whole person,
  • actions or activities,*
  • participation or disability,*
  • environment,* and
  • quality of life.

Personal factors, at least in terms of demographic characteristics, are commonly measured in population surveys, although some additions could be made for the purposes of studying disability. Quality of life reflects a variety of conceptual composites that have been constructed differently in different research situations.

The transition from theory to measurement involves several steps and several areas of decision making that are not always thought through when the process is happening. Yet this transition is the point at which the theoretical elements are converted to operational choices, either specific characteristics of, or observations about, individual respondents. Altman noted that the process in effect creates the breadth or limitations of the data to reflect the concept being addressed and, as such, requires conscious consideration and forethought.

The primary conceptual components of disability models have become familiar to people at the abstract and theoretical levels. However, translation of the concepts into concrete (i.e., everyday reality) measurement in a population survey involves decision points, and with each decision point the initial basic concept is narrowed. Because of the limitations associated with population surveys—time, space, and cost—it is hard to include all the conceptual elements identified in a full theoretical model of disability.

Briefly, the process of moving from theory to measurement involves

  • identifying the purpose of data collection,
  • identifying the appropriate conceptual component related to the purpose of the data collection,
  • operationalizing the theoretical concepts in real-world terms—deciding what actual behavior or characteristic will represent the larger concept, and
  • locating the unit of analysis and the type or level of measurement.

Purposes for Measurement

The Washington Group on Disability Statistics, an international organization that is seeking to develop comparable measures of disability internationally, has identified three major purposes for data collection in population surveys.

The first purpose is to identify trends in prevalence rates of impairments, social limitations, or levels of participation. Trends can be developed for almost anything that is measured, so the construction of the measure sets the bounds of the population examined. For example, if ADLs and IADLs are measured, one identifies trends only in the prevalence of limitations in ADLs or IADLs. One cannot assume that those identified by the measure also represent all persons with physical or sensory limitations. While it is likely that all the persons who indicate that they have ADL and IADL limitations also have physical, sensory, emotional, or cognitive limitations, they do not represent the total population with all those kinds of limitations. Similarly, if the measures focus on physical or sensory limitations, the resulting trend data document the change over time in those limitations. Such measures are also likely to include most people with ADL or IADL limitations, but because the measures are of physical or sensory functioning, they may cover a much larger population. People with ADL or IADL limitations can then only be assumed to be a smaller and less identifiable portion of the population that has been defined. The purpose of trend data is simply to monitor the changes in prevalence of a certain conceptual element of disability over time.

The second purpose of collecting data in a population survey is to demonstrate the provision of service and programmatic needs of a population. Measures of service needs are generally focused on particular types of impairment, functional limitation, or age groupings and usually involve such subpopulations as wheelchair users, people who have problems communicating, or people with spinal cord injuries. Much more detailed information is needed about those subpopulations in order to provide the information required to develop programs or document that programs are working. National (or general) population surveys are not necessarily an appropriate vehicle for that kind of purpose because of an insufficient sample size for the specific problem or because of the number of questions needed to provide the necessary detail. Such information is best obtained in a medical setting or in a special survey.

The third purpose of collecting data in a population survey is to assess the integration in or the equalization of opportunity for a population of concern as compared to the general population. This is a new approach to understanding disability and is related to the social model. It addresses the interactions of people in their communities and calls for a measure that identifies the full range of possible candidates for discrimination—the “at risk” population. It derives from the Americans with Disabilities Act, which guarantees the rights of all people with disabilities, including access to buildings, doctors’ offices, stores, jobs, and all other public places without discrimination because of physical, mental, or emotional difficulties. It is very similar to what has been done relative to race, gender, and age discrimination. Disability measurement for this purpose takes on a much broader focus than is found in the other two purposes. It calls for measures that identify a full range of people at risk of discrimination because of their limitations.

The questions in the American Community Survey that went into use in 2008 were specifically developed to satisfy this third purpose of monitoring equalization of opportunity. Respondents are asked about functioning limitations with the assumption that those who have functioning difficulties of any kind are the population who are most at risk for limitations in social functioning because of the social or physical structure around them. The measures are then used as a demographic context to examine differences in access to education or participation in employment in the same way that one would examine differences in access between men and women or among racial and ethnic groups.

Operationalization of Measures

Altman noted that when operationalizing a theoretical concept, such as functioning, one moves from the conceptual definition, which incorporates all possibilities, to the single question or observation that represents one possibility or a small group of them. One has to decide what measure is going to represent functioning.

The measurement of disability also involves various levels of complexity. Some things are relatively simple, such as whether an impairment or a condition is present. That is a yes-or-no question, and whether a self-report or a doctor’s diagnosis is used, the answer gives a good idea of a person’s general health condition. However, the current models of disability reflect a hierarchy or an increasing complexity of the components associated with the disability process. In addition to the representation of the presence of an impairment or an impairing condition, there are at least four levels of measurement reflected in disability conceptual models: basic action, specific task, organized activity, and role participation. Each of these represents a more complex level of action or activity. The areas of activities and participation identified in the ICF model, known as domains, include all four levels of complexity in a particular area. An example would be the area (or domain) of mobility, which incorporates activities such as changing body positions, moving and handling objects, walking or moving to different locations, and moving around using transportation.

The levels of complexity of conceptual components influence how a measure is constructed. Generally, questions about basic actions or movement or use of the body or mind represent the simplest level of action or activities. A more complicated level of action or activity that goes beyond coordination of bodily movement is represented by the specific task that an individual is motivated to accomplish and requires a combination of physical movements, sensory perception, intellectual activity, and possibly the use of assistive devices in order to complete the task.

The next level involves combining basic actions and multiple tasks to accomplish what may be considered a behavioral element of an ongoing role. For example, using a motor vehicle is an organized activity that is useful in many roles. For a parent, taking children to school fulfills one of many parental roles, but for a bus driver the ability to drive a motor vehicle is one of the central and necessary elements of the job. Finally, participation represents the accomplishment of enough organized activities to satisfy minimum role requirements to be considered integrated in a specific role.

Measures in Population Surveys

Almost all population surveys with disability measures include impairment measures. Measurement of functioning is frequently associated with questions developed by Nagi to measure physical functioning in the employment context; such measures represent the whole person’s ability to accomplish basic actions, such as walking, seeing, and communicating. In contrast, measures of behavior or functioning in social roles are much more complex. Disability is experienced when the person with the functioning limitation interacts with the cultural expectations or the physical environment. There are far fewer measures of this type in surveys, and developing them is probably the direction that measurement work should take to understand the full effects of functional limitations.

In summary, Altman said, there is a rich set of survey measures on the task level that represent dependence in basic task activities that are necessary for maintaining independence. There are also some good representations of physical functioning, although not all areas of physical functioning are well represented. Other types of functioning are not well represented. Mental health or intellectual functioning is not well represented, although there are some intellectual tests that are included in surveys of older people. Large gaps exist in measuring role participation, as well as both the physical and social environmental contexts in which all action takes place. There is a great lack of standardization of core measures that would permit a continued search for uniform concepts and a common language. Without standardized core measures, one cannot accumulate knowledge in a way that is useful because each measure represents a somewhat different segment of the population with disabilities.


Linda Fried (Mailman School of Public Health, Columbia University) began her presentation with a question: How can one identify people early in the disablement process? In other words, is there life before ADL and IADL limitations (looking at disease, predicting impairments, predicting functional limitations, predicting disability)? The natural history of disability indicates that at least half of disability in older adults is chronic and progressive (Ferrucci et al., 1996). Progressive mobility difficulties, and other factors, predict difficulty or dependence in ADLs and IADLs (Harris et al., 1989), and having difficulty in a task predicts dependency (Gill et al., 1998). Given this chronic, progressive course, the key to prevention and compression of morbidity is early ascertainment.

A number of survey measures have been developed over the past several years to ascertain individuals at an early stage of disablement, those who are likely to be most amenable to interventions, with the goal of effective targeting as a basis for intervention. They include

  • a focus on mobility, as well as social roles and compromise of them as disability progresses;
  • “life-space diameter” and its constriction over time in terms of the activities in which people engage and the geographic perimeter of their lives, as markers of people who are on a disablement pathway;
  • disability in more demanding tasks, as a predictor of disability in less demanding tasks;
  • fatigue and tiredness as early indicators of functional decline and predictors of incipient disability; and
  • preclinical disability, including survey or objective measures, as well as mixed measures combining both survey and objective performance-based measures, such as screening nomograms for preclinical disability and measures of frailty, both of which predict incident disability.

To illustrate the disablement process, Fried offered a story of a 75-year-old woman with arthritis of the knees and fear of falling: About 10 years ago, she started having difficulty walking a half mile and since has stopped being able to do that. Then, 2–3 years ago, she started having difficulty climbing stairs and would clutch the handrail. About 2 years ago, she stopped being able to do heavy housework and cut back on light housework, and about a year ago she stopped being able to carry groceries from the store. She said that difficulty in each of these tasks was due to the same reasons: arthritis of her knees and her fear of falling (Fried and Herdman, 1992). This story provides a sense of what the disablement pathway might feel like to an individual.

Much interest exists in whether self-reported task difficulty in more demanding tasks (such as this woman was describing) can, itself, identify people early in the disablement process and at risk of progression to difficulty in less demanding tasks. It raises the question of whether there is a hierarchy in a group of tasks, such as mobility. This, in turn, raises a further question: Can optional participation in higher level, complex tasks be used for ascertainment of those at risk of progression to disability? Reuben and colleagues (1990) proposed looking at advanced ADLs (for example, how much exercise people are engaging in) as a predictor of their level of activity.

Self-reported task difficulty in more demanding tasks can, itself, identify people early in a disablement process and possibly presage incipient disease. For example, unpublished data from the Cardiovascular Health Study (Fried et al., 1991a)—a prospective, observational cohort study of older adults in four U.S. communities—suggest that onset of reported difficulty in these tasks might have the added interesting feature of predicting incipient disease events. The study looked at physical function before and after cardiovascular events, and tasks were divided into three categories as being more or less demanding in terms of exercise tolerance, but all requiring some mobility. One category involved more demanding tasks, such as walking a half mile and climbing steps; another category included medium-demand activities, such as shopping and preparing meals; and a third category included low-demand activities, such as walking around the home or getting out of a bed or chair. Looking at the months before or after the onset of a cardiovascular event—coronary heart disease, congestive heart failure, or stroke—a precipitous decline occurred in the proportion of the population without difficulty in demanding tasks, although there was little change in the proportion with regard to low- or medium-demand tasks before the event itself. This suggests that the onset of difficulty in high-energy-demand tasks indicates a decline in physiological status before the cardiovascular event itself. There is a need to characterize what that is: Is there a preclinical stage of change in physiological function marked by onset of some disability? Would identification of that stage help find the people who are really at very high risk of progression of disability? Those people would be a particularly desirable target for intervention and likely to be much more amenable to improvement than after they have become severely disabled.

Going back to the example of the woman with arthritis, Fried said that there were a number of less demanding mobility tasks that the woman had no difficulty performing, but over the same 10-year period she had started to change the way she did them (Fried et al., 1991b). This phenomenon suggests that there is a progressive process even among people who have not reported difficulty following a hierarchy of mobility tasks. At early stages, people may be able to successfully compensate for the effects of disease and maintain their function without difficulty.

Fried stated that many years ago she and her colleagues hypothesized that there might be observable preclinical changes in function that could identify an early stage of disablement and that they might be able to ascertain such changes through survey methods. This intermediate stage of function between high function and difficulty in a task would be measured by self-reports of whether people modify task performance or its frequency as a result of underlying changes in health, even though they reported no difficulty. What would really be measured is adaptation to physical limitations in order to preserve task performance.
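The three stages described above (high function, an intermediate preclinical stage, and task difficulty) can be sketched as a simple classification from two self-report items. This is only an illustrative sketch: the function name, the stage labels, and the two boolean inputs are assumptions made here, not the wording or coding used in any of the studies cited.

```python
from enum import Enum


class FunctionStage(Enum):
    HIGH_FUNCTION = "high function"
    PRECLINICAL = "preclinical disability"
    DIFFICULTY = "task difficulty"


def classify_stage(reports_difficulty: bool, reports_modification: bool) -> FunctionStage:
    """Classify one respondent for one task from two hypothetical survey items.

    reports_difficulty: respondent says the task is difficult.
    reports_modification: respondent says they have changed how, or how often,
    they do the task because of their health.
    """
    if reports_difficulty:
        return FunctionStage.DIFFICULTY
    if reports_modification:
        # No reported difficulty, but the task is being adapted to preserve it.
        return FunctionStage.PRECLINICAL
    return FunctionStage.HIGH_FUNCTION


# Example: no reported difficulty walking a half mile, but walks more slowly.
print(classify_stage(reports_difficulty=False, reports_modification=True).value)
```

The ordering of the checks reflects the definition in the text: modification counts as preclinical only when no difficulty is reported.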

The Women’s Health and Aging Study (Fried et al., 2000) found that, among the two-thirds least disabled older women living in the community who had no difficulty at the beginning of the study, those who were reporting task modification were at fourfold higher risk of developing difficulty walking a half mile over 18 months (adjusting for a number of factors). Walking speed was also an independent predictor, with a twofold increased risk for lower walking speeds in these models. Interestingly, strength and other covariates in these higher functioning women were not predictors of subsequent changes for them.

Another example is a series of studies by Douglas Miller and Fred Wolinsky in St. Louis (see Miller et al., 2005), which looked at inner-city African Americans in comparison with older African Americans living in suburbia and whites living in suburbia. In the first study, the authors found that inner-city African Americans who were 65 and older were already so substantially disabled that preclinical disability could not even be ascertained. They found that disability was occurring about 10 years earlier in inner-city African Americans than in suburban African Americans or whites. They then did another study to look at middle-aged African Americans (ages 49–65) living in inner cities and found that 60 percent already had one or more disabilities and that about 33 percent already were reporting preclinical disability in mobility tasks.

They also found that preclinical status in terms of mobility in these middle-aged African Americans living in the inner city predicted a four-and-a-half-fold increased risk of onset of difficulty in walking a half mile, with preclinical disability again defined as self-report of modifications in the way people went about doing mobility tasks but without difficulty. An interesting finding from this population is that there were no physical performance tests that were predictive of incident mobility difficulty, while the self-report survey methods were highly predictive in the same model. These findings differ somewhat from those of the Women’s Health and Aging Study, which involved an older group of women. Wolinsky and colleagues (2007) concluded that the preclinical disability survey measure they used was a highly effective early warning system and a target for prevention.

Other work in the Women’s Health and Aging Study suggests that the way people modify mobility tasks varies considerably, but in general it points to a hierarchy of compensation use, going from the very intrinsic ones (such as change in pace or biomechanics) that do not threaten people’s own perception of whether or not they are having difficulty, to highly extrinsic ones for which the modifications are more evident. Basically, people report doing the task more slowly, changing their body position, or doing the activity less frequently; then they cut out parts of the activity that they would normally do in a day; and then they start using assistive devices and human assistance.

This work offers one perspective on using surveys to identify people with earlier changes in function in a way that can be used for targeting for prevention of disability.


In his presentation, Thomas Gill (Yale University School of Medicine) summarized some of the results from the Precipitating Events Project, a prospective cohort study of 754 initially nondisabled persons aged 70 years and older living in the community that he has been leading for the past decade (for further information, see Gill et al., 2002; Gill and Gahbauer, 2008; Gill et al., in press). Monthly telephone interviews have been conducted for up to 130 months assessing disability in ADLs, IADLs, and mobility. Gill focused on the four essential ADLs: bathing, dressing, walking, and transferring.

A standard strategy for ascertaining the occurrence of disability does not currently exist. In most longitudinal studies, an incident episode of disability is noted when a nondisabled person reports disability “at the present time” during a subsequent follow-up assessment. Yet it has been shown previously that incident episodes of disability are often not ascertained by longitudinal studies with assessment intervals longer than 6 months. Using the traditional strategies, for individuals nondisabled at baseline who are assessed again 1 year later, the incidence rate of disability might be about 2.5 percent. Yet by evaluating them every month, as was done in this study, at the 12-month mark the disability rate was 10 percent. This finding indicates a substantial underestimate of the incidence of disability in studies with infrequent assessments.

The question is: What is driving these underestimates? Analysis of data for 24 months showed that for each of the three risk groups—low, intermediate, and high—the difference between the cumulative disability rates increased progressively as the length of the assessment interval increased. Although these differences in rates were attributable almost exclusively to recovery from disability from 1 month to the next in the first 6 months, they were due increasingly to deaths and some losses to follow-up over the next 18 months, especially among participants in the high-risk group.
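One way to see why longer assessment intervals miss episodes is a small worked example. The sketch below uses entirely hypothetical monthly histories for four people (not data from the Precipitating Events Project) and counts who would be identified as having incident disability when status is observed only at scheduled assessments. It illustrates only the effect of recovery between assessments; deaths and losses to follow-up are not modeled, and all names are illustrative.

```python
def episode(start, end, total_months=12):
    """Return a monthly history that is True (disabled) from start to end, inclusive."""
    return [start <= month <= end for month in range(1, total_months + 1)]


def ascertained_incidence(histories, interval_months):
    """Share of people reporting disability at one or more scheduled assessments.

    histories[i][m] is True if person i is disabled in month m + 1.
    Only months that are multiples of interval_months are observed.
    """
    detected = 0
    for months in histories:
        observed = months[interval_months - 1::interval_months]
        if any(observed):
            detected += 1
    return detected / len(histories)


# Hypothetical cohort, all nondisabled at baseline.
histories = [
    [False] * 12,     # never disabled
    episode(3, 4),    # disabled in months 3-4, then recovers
    episode(6, 6),    # disabled in month 6 only
    episode(10, 12),  # disabled from month 10 onward
]

for interval in (1, 6, 12):
    print(interval, ascertained_incidence(histories, interval))
# prints:
# 1 0.75
# 6 0.5
# 12 0.25
```

In this toy cohort, monthly assessment identifies three of four people, while a single assessment at 12 months identifies only the person who is still disabled at that time.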

To evaluate whether the ascertainment of disability could be improved in longitudinal studies, the researchers added several questions that had not been included in the prior assessments to the comprehensive assessment at 72 months. In addition to asking about disability at the present time, for each of the four essential ADLs, participants who did not need help from another person “at the present time” were asked to recall whether they needed help from another person to complete the relevant task “at any time” during the past 1 month, 3 months, 6 months, and 12 months, respectively. Focusing on the 12-month results, up to one-half of the incident disability episodes, which would otherwise have been missed by asking only about disability at the present time, could have been ascertained if nondisabled persons had also been asked to recall whether they had had a disability at any time since the prior assessment.

At this new 72-month baseline, 370 people were not disabled in their basic ADLs when first surveyed. One year later, 14.3 percent (53 people) said they were disabled at the present time. That is the standard incidence rate for disability that would be determined from traditional surveys. However, when those individuals who were not disabled at the present time were asked the additional question, “At any time during the past year have you been disabled?” almost 13 percent more (48 people) responded in the affirmative, for a total incidence rate of 27 percent, almost double what would be found using the traditional approach.
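The arithmetic behind these rates can be checked directly from the counts reported above: 370 people nondisabled at the 72-month baseline, 53 disabled at the present time one year later, and 48 more identified only by recall. The sketch below is a worked check; the variable names are illustrative and do not come from the study.

```python
nondisabled_at_baseline = 370
disabled_at_present = 53   # identified by the traditional "at the present time" question
recalled_only = 48         # identified only by the "at any time during the past year" question

present_rate = disabled_at_present / nondisabled_at_baseline
recall_rate = recalled_only / nondisabled_at_baseline
combined_rate = (disabled_at_present + recalled_only) / nondisabled_at_baseline

print(f"present-time incidence: {present_rate:.1%}")   # 14.3%
print(f"added by recall:        {recall_rate:.1%}")    # 13.0%
print(f"combined incidence:     {combined_rate:.1%}")  # 27.3%
print(f"combined / present:     {combined_rate / present_rate:.2f}")  # 1.91
```

The ratio of about 1.9 is what underlies the statement that the combined approach yields almost double the traditional estimate.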

Gill said that the next question was how to determine whether these reports of disability at any time over some period were accurate. People who were not currently disabled were evaluated in two groups: those who said they had not had any disability at any time during the past year and those who said they had. The two groups were followed forward for another 18 months, with the hypothesis that people who reported having had a disability at any time over the previous year would have worse outcomes, and that is what was found. Specifically, the additional disability episodes ascertained only by the person’s recall predicted high risk for the subsequent development of chronic disability (a major determinant for the use of long-term care services), even after accounting for potential confounders. This finding provides some evidence that the reports of disability at any time are valid.

Despite this potential advance in the assessment of disability, a large proportion of older persons do not recall episodes of disability that in fact occurred during the prior year. An effort to identify the factors that are associated with accurate recall of prior disability found that participants’ education was the strongest predictor: more highly educated individuals were more likely to accurately recall having had a disability over the preceding 12 months. Cognition was also associated with accurate recall, but the association was not statistically significant. Although education could be a proxy for cognitive status, the effect of education was not attenuated in the multivariate analysis, which included cognition as a covariate. The validation effort also found that disability-specific factors had some relationship to accurate recall. People who had a disability episode more recently, say within the past 3 months, were more likely to recall having it than those with less recent episodes. Those who had had at least one severe episode (defined as having disability in three or more ADLs) were more likely to recall it than others. If a disability persisted for more than 1 month, the likelihood of recall was higher.

In summing up the results, Gill offered ways to enhance the ascertainment of disability. If an individual is not disabled at the present time, ask whether he or she has had a disability episode at any time since the prior assessment. If the response is no, probe further, using a standard protocol focusing on major illnesses or injuries that have occurred since the prior assessment. Special attention may be warranted for people with low levels of education and perhaps those who are cognitively impaired. Lastly, another way to possibly enhance the ascertainment of disability would be to adopt a calendar approach, which has been used successfully to ascertain falls. Some variation of this approach could be implemented for the ascertainment of the incidence of disability.
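The first two suggestions amount to a short question sequence, which can be sketched as follows. This is a hypothetical illustration of the flow described above, not the instrument used in the study; the function name, parameters, and return labels are assumptions introduced here.

```python
from typing import Optional


def ascertain_disability(disabled_now: bool,
                         recalled_since_last: bool = False,
                         recalled_after_probe: bool = False) -> Optional[str]:
    """Return how a disability episode was ascertained, or None if none was found.

    disabled_now: respondent reports disability at the present time.
    recalled_since_last: respondent recalls an episode at any time since the
    prior assessment.
    recalled_after_probe: respondent recalls an episode only after a standard
    probe about major illnesses or injuries since the prior assessment.
    """
    if disabled_now:
        return "present-time report"
    if recalled_since_last:
        return "recall since prior assessment"
    if recalled_after_probe:
        return "recall after illness/injury probe"
    return None


print(ascertain_disability(disabled_now=False, recalled_since_last=True))
```

Each later question is reached only when the earlier one is answered no, which mirrors the order in which the suggestions are given.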


Jay Magaziner’s (School of Medicine, University of Maryland, Baltimore) presentation dealt with the use of proxies to obtain information on health and functioning of older persons in population surveys, describing some of the issues, suggesting some practical strategies for using data from proxies, and identifying areas for additional study. He noted that there have not been any major breakthroughs in the use of proxies in the past several years, and so a lot of the information comes from work that was done some time ago.

The significance of the problem is obvious. The number of older people in the population has increased substantially and is projected to keep growing. There is an increased need to conduct clinical and population research on this group; the omission of persons who cannot respond for themselves limits generalizability in research. There is a lot of effort toward improving measurement of disability, but a key question is how to get information about disability from people who cannot report for themselves.

Proxies are used to obtain information about people who cannot respond for themselves, will not respond for themselves, or are difficult to locate initially or for follow-up. At times, proxies may be used to obtain information in a less costly way. However, if proxies are used, how would one factor the information and what would be its real utility?

The extent of the problem of nonrespondents varies depending on the group of interest. Among people aged 65 years and older, about 5–10 percent of community-dwelling people are unable to provide reliable information for themselves because of cognitive limitations. As many as 40 percent of people who are hospitalized are unable to provide information for themselves. For nursing home residents, this number is well over 50 percent. Depending on the group of interest, one is dealing with a fairly large problem. In addition to people who cannot respond because of cognitive limitations, there are people who cannot respond for other health reasons, people who refuse, and people who cannot be located. Thus, there is a fairly sizable problem of missing information if it cannot be obtained from some other source. Areas of measurement for which proxies may be needed include measures of health status, including information on reported diagnoses and symptoms, and a variety of areas of functioning—physical, instrumental, affective, cognitive, social, and economic. There may be other areas, but these are some that have been examined.

The major issues related to subject and proxy agreement are no different from the kinds of issues faced in any kind of scientific measurement. These are issues of precision and bias. The level of agreement between subjects and proxies is a function of both precision and bias (bias here refers to the discrepancy between the two reports, not to whether either one is right or wrong). Most studies have focused on agreement; because agreement is a composite of precision and bias, less has been done on bias itself. Magaziner noted that researchers need to pay more attention to the magnitude and direction of bias. Precision is important, but when using proxies, one must consider bias. Agreement and bias are both functions of the question asked, characteristics of the subject, characteristics of the proxy, and characteristics of the context and culture.
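One way to see agreement as a composite of precision and bias is to split the mean squared difference between proxy and subject reports into a squared mean difference (bias) and a variance of the differences (imprecision). The sketch below does this in Python. The paired scores are invented for illustration, the function name is hypothetical, and this decomposition is a standard identity rather than the method used in the studies summarized here.

```python
from statistics import mean, pvariance

# Hypothetical paired scores, e.g., number of ADLs reported as difficult.
subject_scores = [0, 1, 2, 0, 3, 1, 4, 2]
proxy_scores = [1, 1, 3, 0, 4, 2, 4, 3]


def discrepancy_summary(subject, proxy):
    """Summarize proxy-minus-subject differences for paired reports."""
    diffs = [p - s for s, p in zip(subject, proxy)]
    bias = mean(diffs)                      # direction and size of discrepancy
    variance = pvariance(diffs)             # spread of differences (imprecision)
    msd = mean([d * d for d in diffs])      # overall disagreement
    return {"bias": bias, "variance": variance, "mean_squared_difference": msd}


summary = discrepancy_summary(subject_scores, proxy_scores)
# Overall disagreement equals bias squared plus the variance of the differences.
recomposed = summary["bias"] ** 2 + summary["variance"]
```

A positive `bias` here means the proxies report more disability than the subjects do. Two sets of reports can show the same overall disagreement while differing greatly in how much of it is systematic.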

Magaziner next highlighted some of the findings of studies of patient and proxy responses. In a study of community-dwelling women aged 65 years and older, those who had been hospitalized for hip fracture were asked about their ADLs, walking, and how they were before they were hospitalized. When both subjects and proxies were asked about a simple task, such as walking ability, the measure of agreement was fairly good, with 10 percent of the patients reporting they were unable to perform a walking task or needed a lot of assistance with it, and proxies reporting about 10.8 percent (Magaziner et al., 1996). The level of agreement declines in an ordered manner as one moves from walking to bathing, shopping, preparing meals, dressing, handling money, and grooming. The level of agreement for handling money was not very good, possibly because of the complexity of the question. The question itself may not be tapping into the same thing for a self-report and a proxy: 15 percent of subjects reported that they could not handle money on their own, while 20 percent of proxies said they could not. Whenever there is a bias, it tends to be in the direction of more disability reported by proxies.

With regard to affective status, the subjects were asked about depressive symptoms using a measure of the Center for Epidemiologic Studies Depression Scale type. Then the proxies were asked how they thought the person would respond to those particular questions. The same was done with cognitive status, using the Mini Mental State Exam. The bias was quite small, but in this community survey it was negative: the proxies underreported both depressive symptoms and cognitive status compared to the subjects (Bassett et al., 1990). A similar analysis was done with data on a post-hip fracture group, for which proxies overreported depressive symptoms but clearly underreported cognitive problems; that is, proxies said that the person actually performed better than was shown in a test of cognition.

For people with ADL and IADL limitations due to chronic conditions, there is a tendency for proxies to overreport disability (Magaziner et al., 1996). For physical symptoms experienced within the past month, there is no consistent pattern. Many of these symptoms are private symptoms. They are not things that a proxy would know easily, which may result in a lack of agreement.

Some characteristics of the proxy make a difference. For example, female proxies tend to report more disability than male proxies when their responses are compared with those of the subjects themselves. Proxies who live with the subjects report more disability than those who do not. Those who assist subjects report more disability, and those who claim to have good knowledge of the subject generally report more disability than those who do not.

To summarize, proxies can provide answers that agree with subject reports for objective, observable items, such as walking, and chronic disease states. Proxies are poor reporters of private unobservable items, such as the use of a urinary catheter or symptoms. Proxies are poor reporters of complex tasks when the questions are asked in a global manner, such as handling money.

When there is disagreement, proxies generally report higher levels of disability than subjects report for themselves, with the notable exception of cognitive function. Female proxies, those living with subjects, and those providing care report more disability than subjects report for themselves. Agreement and bias are functions of the question, subject characteristics, and proxy characteristics.

There is some practical advice about using proxies that one can take away from this work: Develop more objective questions that do not call for judgments by proxies. Conduct pilot studies for questions to be used and proxies to be encountered in the population under study, and try to understand how the proxy would perform in that particular study and then use that information for interpretation of results. Evaluate agreement and bias. Consider using only proxies, which might be useful if a large percentage of dropouts is expected: Why introduce another level of bias if one can get all the information consistently over time from proxies?

Further research is needed to evaluate proxy data for those who cannot respond for themselves. Most of the research to date is based on subjects who can respond for themselves. It is important to develop and test better questions and determine whether data adjustments can be made from knowledge about questions, subjects, and proxies. Evaluate the effects of substituting proxy data on parameter estimates; evaluate the effect of using only proxies, especially when bias is significant; and evaluate the effects of using information from multiple sources in the same analysis to arrive at an assessment of functional status.

In conclusion, proxies can be used with a reasonable degree of reliability for some questions. More research is needed on the use of proxies for measuring functional status in those who cannot provide information for themselves. Proxies must frequently be used in place of subjects in studies of older persons, until some good methods are developed for obtaining information in a reliable way about those who cannot provide it for themselves. Scientists have an obligation to report on their use of proxies and describe the possible effects that they can have on study results.


Arie Kapteyn (Labor and Population, RAND) addressed the use of the Internet for survey administration, which is the focus of most of the innovation at this time, although some of the issues he raised are also relevant to other modes. He discussed Internet interviewing in general and Internet use among the elderly in particular, because disability is clearly most prevalent among the elderly, the age group for which Internet use is still more problematic than for others. He also discussed new technologies and offered some perspectives on what is coming next.

Internet penetration in the United States is probably about 75 percent. In Europe the Internet penetration is probably about 50 percent, with large variations among countries. In the Netherlands it is about 90 percent, probably even higher. Scandinavia is also very high. In Southern Europe it is much lower: In Greece penetration is only 35 percent. In countries with low Internet penetration, using the Internet as the only survey mode would lead to coverage error. Yet other modes, such as the telephone, also have problems; telephone interviews that use only land lines increasingly suffer from the same problem of coverage, and there are also problems because of answering machines that screen calls.

Internet coverage is directly related to age. In a study using the Internet mode, researchers found that in 2002 almost half of respondents under 60 years of age had Internet access. This number declined quickly until only about 10 percent of the respondents 76 years and older had Internet access (Couper et al., 2007).
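The coverage error that arises when the Internet is the only survey mode can be illustrated with a standard identity: the gap between an estimate based on the covered population and the full-population value equals the uncovered share times the difference between covered and uncovered groups. The Python sketch below assumes this identity. The 10 percent coverage figure echoes the oldest age group mentioned above, but both prevalence values are invented and the function name is hypothetical.

```python
def coverage_bias(coverage_rate: float,
                  prevalence_covered: float,
                  prevalence_uncovered: float) -> float:
    """Bias of a covered-only prevalence estimate relative to the full population."""
    if not 0.0 <= coverage_rate <= 1.0:
        raise ValueError("coverage_rate must be between 0 and 1")
    return (1.0 - coverage_rate) * (prevalence_covered - prevalence_uncovered)


# Hypothetical disability prevalence: 20% among those with Internet access,
# 40% among those without, with only 10% of the age group covered.
bias_oldest = coverage_bias(0.10, prevalence_covered=0.20, prevalence_uncovered=0.40)

# Check against the full-population prevalence computed directly.
full_population = 0.10 * 0.20 + 0.90 * 0.40
direct_gap = 0.20 - full_population
```

Under these made-up numbers, an Internet-only survey would understate disability prevalence in the oldest group by 18 percentage points. The bias shrinks as coverage rises or as covered and uncovered people become more alike.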

Data from a new panel set up in the Netherlands (the Longitudinal Internet Studies for the Social Sciences) provide some information on how representative a study can be using the Internet for survey administration. For this Internet panel, respondents received broadband Internet access if they did not have it yet. In collaboration with Statistics Netherlands, the researchers used population registers as a sampling frame. Kapteyn remarked that one of the great things about northern Europe is that there are population registers, which make nice sampling frames. The baseline response rate of this panel was 50 percent, which for a panel in the Netherlands is quite good.

One of the things that people always talk about is the mode effect: What is different between the Internet and other modes? Internet and written interviews are similar, and computer-assisted telephone interview (CATI) and computer-assisted personal interview (CAPI) are also quite similar. Essentially, the distinction is whether an interviewer is present.

Some features of the Internet that make it attractive are speed and cost-effectiveness, especially for panels. Once people are in the sample, questions can be asked at any time of the day and any time of the week. However, this ease is also dangerous in the sense that anyone can do Internet surveys. Arduous tasks can be broken up into modest-sized bits: For example, if surveys of no longer than 30 minutes are administered over several weekends, a lot of information can be amassed. A total of 5 weekend sessions of 30 minutes each would yield 2.5 hours of interviews. Other attractive features are quick turnaround, feedback, flexibility, and high-frequency and event-related interviewing (e.g., following the onset of disability or some illness). In terms of automation, the Internet mode has all the advantages of CATI and CAPI. One application of this approach is the American Life Panel, which since November 2008 has been monitoring via the Internet how households are faring in the financial crisis. Various experiments have been done on elicitation of probabilities and expectations, portfolio choice and presentation of information, a sequence of vignettes in the Netherlands and the United States (including tests of response consistency), and animation. The Health and Retirement Study (HRS) instrument is being migrated to the Internet; it is expected to be completely on the Internet in about a year. It is administered to respondents in chunks: respondents get a module, and a couple of weeks later they get another one.
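The idea of breaking a long instrument into sessions of no more than 30 minutes can be sketched as a simple greedy grouping of modules. The Python example below is illustrative only: the module names and durations are invented, the function name is hypothetical, and it does not describe how the HRS or any other study actually schedules its modules.

```python
def split_into_sessions(modules, max_session_minutes=30):
    """Group (name, minutes) modules, in order, into sessions within the time cap."""
    sessions, current, used = [], [], 0
    for name, minutes in modules:
        if minutes > max_session_minutes:
            raise ValueError(f"module {name!r} exceeds the session limit")
        # Start a new session when adding this module would exceed the cap.
        if used + minutes > max_session_minutes:
            sessions.append(current)
            current, used = [], 0
        current.append(name)
        used += minutes
    if current:
        sessions.append(current)
    return sessions


# Hypothetical modules with estimated completion times in minutes.
modules = [("demographics", 10), ("adl_iadl", 20), ("health", 15),
           ("cognition", 15), ("finances", 25)]
sessions = split_into_sessions(modules)

# The arithmetic from the text: 5 sessions of 30 minutes each, in hours.
total_hours = 5 * 30 / 60
```

Keeping modules in their original order preserves any intended question sequence, at the cost of sometimes leaving a session shorter than the cap.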

Some examples of future possibilities are a heart rate monitor and an actigraph device that measures individual activity level, energy/caloric expenditure, duration and intensity of sustained activity, daily activity profile, limb extremity movements, sleep patterns and night activity, steps taken, and heart rate (in at least some models). The input from these devices can be combined with the Internet. Respondents are asked to wear the device, say for a week, and the measurements can then be transferred wirelessly, uploaded from a computer using a USB key, or retrieved after the device is mailed back. These measurements can be combined with self-reports of activities or stress, time-use data, self-reports of subjective well-being, experience sampling, anchoring vignettes, etc.

One reason for the interest in using these devices is a result of findings from the HRS and its English equivalent, the English Longitudinal Study of Ageing: In questions about physical activity, Americans say they are more physically active than the English, and somehow the English do not believe it. There is currently a proposal to use these devices in the United States and England and find out whether the English are too modest, the Americans are bragging, or something else is going on.

In conclusion, Kapteyn observed that because Internet penetration is related to age, it is likely to grow substantially, even among the elderly, as cohorts age. In addition, the user friendliness of devices is improving quickly. Finally, more attention should be devoted to design of websites intended for the elderly.


In the discussion, the topic of proxies and proxy measurement drew the most comments. Other issues of note were use of the Internet for data collection, the role of the home environment for conducting performance tests, and phobias in old age.

Use of Internet for Data Collection

A participant noted an interesting aspect of using the Internet for data collection, and of any research on age differences in Internet use: there are age differences in sensory perception and in the ability to physically use a computer interface. Arie Kapteyn was asked whether design issues related to each of the cohorts had been examined. Although many of these concerns will become moot in the next several years as greater familiarity with computers and the Internet moves through the population, design issues are important when the Internet is the mode of data collection, especially for the older cohorts.

He responded that they had not addressed that issue in their study, but there are people working on website design for all age groups. The Internet is still very much attuned to young people. The smaller devices with a lot of information are really difficult to read. For very old respondents, the first thing needed is that the letters have to be big, and the screen should not be cluttered—it has to be as simple as possible, otherwise people get confused.

Role of the Home Environment for Conducting Performance Tests

Linda Fried was asked if in her studies she and her colleagues had studied the differences that the home environment makes in conducting performance tests with different populations. Lack of space to set up the walking speed course could limit the ability to do the performance tests.

Fried responded that she was not aware of anyone having looked into this issue. In the design of the Women’s Health and Aging Study, investigators spent a lot of time on the design of those performance measures, and they were able to do performance-based measures on highly disabled older women in some pretty constrained homes. In the Whitehall Study of British civil servants (Brunner et al., 2009), they took what was then the standard 4-meter walk and if the space was too limited in the home, designed a way to do it in just 3 meters in a very standardized way. They were then able to compare both the 4-meter and 3-meter walks in the same data set.

Phobias in Old Age


Robert Wallace (University of Iowa) asked Fried if the fear of falling expressed by an individual in her study was due to a phobia about falling or was the result of disability. He said that there may be many phobias in old age, warranted or not, to which researchers do not pay much attention: fear of falling, fear of crowds, fear of noise, and fear of going out. Such phobias, rather than actual mechanical problems, may be much of the reason for a “disability.” Phobias without a physical basis create a fear that may then affect functioning. A whole range of other things also affect behavior and what may be perceived as disability.

Fried responded that in that particular case the person’s fear of falling was entirely due to the instability of her knees from osteoarthritis. She was not phobic. However, Fried agreed that there are many phobias without a physical basis that may in fact be modifiers. Other factors matter as well. For example, does a person have a reason to get up in the morning? Depression aside, are there activities available that one cares about? Are there places to go? Even absent psychiatric illness, all of those things affect motivation and are also huge modifiers of behavior.

Proxies and Proxy Measurement

Robert Hauser (University of Wisconsin-Madison) commented on the importance of the gold standard with respect to proxy measurement. All survey responses are subject to error. If the correlation between self- and proxy reports is high, it means that people are reporting perfectly up to the level of reliability of the instruments. One needs to think about measurement error on both sides and about a gold standard for the value of proxy measurement.

Jay Magaziner responded that the need for a gold standard gets at a more fundamental question that cuts across all of what has been discussed: one must establish the purpose of measuring something before one can talk about how best to measure it. If there is a reason behind what one is measuring, whatever it is, then one might be able to approach some kind of a gold standard for that specified purpose. So the key question is what does one really want to know and why? The gold standard is now a mixture of the environment, the social situation, the nature of the items we are asking, and so on.

Fried commented that this issue of a gold standard is something that has many different dimensions. It goes to the issue of both what one wants to understand and also the experience of the individual. There are many contextual factors that shape or modify and exacerbate or minimize that human experience. The human experience of what people are able to do is the core issue. There is as yet no conceptual agreement about measuring disability.

In addition to the discussion of a gold standard for proxy measurement, other issues about proxies drew lively discussion. Participants reiterated several points made in the presentation: One of the reasons for bias in responses in proxy measurement might be characteristics of the proxy, such as gender. Is there similar information on cultural or ethnic differences that might account for different perceptions of disability in the subject? Magaziner said that female proxies tend to report more disability in the subjects than male proxies do. Those who provide more care for the person have higher ratings of disability of the subject than those who do not provide care. The gender issue may be tied to the fact that women are also providing care.

Should there be a rule that if one is going to be doing a study in which proxies will be considered, the researchers should have a subsample in which they interview both proxies and subjects? For participants who cannot self-report, how relevant is that kind of methodology, because they are really different from the participants for whom one can query both subject and proxy pairs? If people cannot report for themselves, how does one know what somebody else would really be reporting for them? Magaziner responded that he does not know empirically whether that would work. Should there be a subsample? Given currently available information, yes, it would be worthwhile. At least it would help with sensitivity analysis or setting some boundaries on what is learned.

For the oldest old, the proxy frequently is more accurate because in many cases people will underplay their disabilities because of fear of being moved from independent living to assisted living or a nursing home. Another factor is that the elderly person may report that he or she is independent if a caregiver is making activities feasible that would not otherwise be feasible.

Magaziner commented that the direction of the discrepancy becomes important in a population survey in which people of all ages are reporting. One wants to know about that 96-year-old person who cannot quite self-report because he or she does not understand the question. When he or she cannot give what a reasonable person would believe is a reasonable reply, one asks the proxy. Often researchers make simple substitutions, but maybe that is not what one wants to do. Researchers do not have an answer, but that is what needs to be addressed if they do not want to lose people in their attempt to obtain information about the whole population, and not just those who can provide an answer for themselves.

Should one be guided by the findings on the characteristics of the proxies associated with discrepancies in selecting people to serve as proxies? The answer is yes, if one can find the perfect proxy, but one has to work with what is available in the real world. The choice may depend on the question to be asked and on who has the best opportunity to observe the subject. For example, in a nursing home, the best proxy may be not a family member but someone who sees the subject on a daily basis.

In closing, Andrew Houtenville (New Editions Consulting) informed the participants about two research efforts under way—one led by Mathematica Policy Research and the other by New Editions Consulting, both funded by the National Institute on Disability and Rehabilitation Research, and both about proxy response. Mathematica is going to be working on the question of what protocol is best for a given situation, using an experimental design. New Editions Consulting is going to look at administrative data as a third source of information. Work has been done on this by some economists at the National Bureau of Economic Research using Canadian data in the reporting of diabetes in a working-age population. That work did not involve proxies, but it gave the degree of reliability of reporting diabetes as well as an association with the reporting of a work limitation among the working-age population.



The background paper (see Appendix A) has an extensive examination of these measures in tabular form for the most frequently used population surveys with disability measures.

Copyright © 2009, National Academy of Sciences.
Bookshelf ID: NBK28473

