J Gerontol A Biol Sci Med Sci. Author manuscript; available in PMC 2009 Jul 30.
Published in final edited form as:
PMCID: PMC2718692

Creating a Computer Adaptive Test Version of the Late-Life Function & Disability Instrument

Alan M. Jette, PhD, PT,1 Stephen M. Haley, PhD, PT,1 Pengsheng Ni, MD, MPH,1 Sippy Olarsch, MS PT,1 and Richard Moed, MPA2



This study applied Item Response Theory (IRT) and Computer Adaptive Test (CAT) methodologies to develop a prototype function and disability assessment instrument for use in aging research. Herein, we report on the development of the CAT version of the Late-Life Function & Disability instrument (Late-Life FDI) and evaluate its psychometric properties.


We employed confirmatory factor analysis, IRT methods, validation, and computer simulation analyses of data collected from 671 older adults residing in residential care facilities. We compared accuracy, precision, and sensitivity to change of scores from CAT versions of two Late-Life FDI scales with scores from the fixed-form instrument. Score estimates from the prototype CAT versus the original instrument were compared in a sample of 40 older adults.


Distinct function and disability domains were identified within the Late-Life FDI item bank and used to construct two prototype CAT scales. Using retrospective data, scores from computer simulations of the prototype CAT scales were highly correlated with scores from the original instrument. The results of computer simulation, accuracy, precision, and sensitivity to change of the CATs closely approximated those of the fixed-form scales, especially for the 10- or 15-item CAT versions. In the prospective study each CAT was administered in less than 3 minutes and CAT scores were highly correlated with scores generated from the original instrument.


CAT scores of the Late-Life FDI were highly comparable to those obtained from the full-length instrument with a small loss in accuracy, precision, and sensitivity to change.

Keywords: outcome assessment (Health Care), geriatrics, rehabilitation


Difficulty with physical functioning and disability are widely recognized as serious problems among older adults and predict nursing home admission, hospitalization, physician utilization, overall dependency, and mortality (1–4). Consequently, physical functioning and disability assessment have become standards in the evaluation of older persons in gerontological research.

While the field has seen great improvement over the past 30 years in patient-reported instruments that assess function and disability, the length and complexity of many fixed-form instruments are problematic and raise concerns over respondent burden and administration costs (5, 6). The shift to shorter fixed-form versions of patient-reported instruments has raised concern over resultant loss of precision and insensitivity to clinically-meaningful change (7). These well-recognized limitations are largely due to the defining signature of traditional instruments, which consist of a fixed set of questions that must be administered to all subjects. Respondents are often frustrated by redundant items and items that to them are of low salience and relevance (8–10). Consequently, no function and disability measure can be considered as a “gold standard.”

Contemporary methods of instrument construction coupled with new data collection models provide a promising means for simultaneously achieving measurement breadth, precision, and sensitivity across the broad range of function for older adults while reducing the burden and cost of data collection. Outcome measurement has seen important advances with the introduction of Item Response Theory (IRT) methods (11–14), which have allowed researchers to develop quantitative scales that are sensitive to the smaller functional change often seen in older adults. Nevertheless, IRT methods alone have been insufficient to balance comprehensiveness of coverage, precision, and sensitivity against assessment feasibility. Recently, computer adaptive testing (CAT) methods have been introduced in the health field as a potential solution to this measurement dilemma (15, 16). Adaptive testing approaches tailor the assessment to the current level of function or disability of the older adult so that only items that are neither too hard nor too easy are administered. In CAT administration, the program uses the response to an initial question to establish a general range of likely function. Subsequent questions are selected through application of algorithms to progressively refine the estimated score to the range of precision established a priori by the examiner. Regardless of the actual items administered all scores are on the same scale, which supports comparisons across time or across groups of individuals with different levels of current functional performance. A detailed discussion of IRT and CAT methodology is available elsewhere (17).

The development of a CAT requires: 1) a set of items (item bank) that examine each outcome; 2) items that scale consistently on a single dimension from low to high ability; and 3) rules to guide starting, stopping, and scoring. Although CAT offers a potential solution to the conflict between psychometric adequacy and measurement feasibility, the psychometric properties of CAT instruments must be demonstrated empirically. In the present study we conducted analyses to identify distinct conceptual domains within the Late-Life FDI, calibrated quantitative scales, and created a prototype CAT version of the instrument. We conducted computer simulation studies of two CAT scales to assess their accuracy, precision and sensitivity to change. In a small prospective study we estimated the comparability of CAT scores with fixed-form scales of the Late-Life FDI and calculated the time to complete each CAT.



CAT Development sample

Subjects for confirmatory factor analysis, IRT analyses, and simulation studies come from the “Promoting Independence in Residential Care” (PIRC) study that examined outcomes of a functional training intervention in 671 subjects residing in 32 residential care facilities in Auckland and Christchurch, New Zealand. Baseline data from the PIRC study were used to assess the underlying structure of the Late-Life FDI and for simulation studies of the accuracy and precision of the CAT version of the Late-Life FDI. While 671 subjects completed the baseline full-length instrument, there were 476 subjects who finished the 12 month follow-up and were used to assess the CAT scales’ sensitivity to change. Since the attrition of subjects could affect the generalizability of findings, comparisons of background information between completers and non-completers were performed.

Pilot Study sample

We recruited a sample of 40 older adults from the greater Boston area with limitations in function and/or physical disabilities who were subjects in previous studies conducted by the BU Health & Disability Research Institute. These were used to calculate the time needed to complete each CAT and to compare scores generated from the CAT and fixed-form versions of the Late-Life FDI.


The Late Life FDI

The fixed-form Late-Life FDI consists of 64 items that provide a patient-reported assessment of distinct function and disability domains in older adults (18–22). The Function component assesses patient-reported difficulty in performing 32 functional tasks. The Disability component, which contains 32 life activities, assesses two dimensions: Frequency: 16 items – “how often…” and Limitation: 16 items – “how limited…”. Two approaches to assessing disability allow older persons to respond differently to questions of what they actually do in daily life versus what they are capable of doing. Raw logit scores for each domain were transformed to summary scale scores, which were in turn placed on a T-scale with a mean of 50 and a standard deviation of 10. Based on a one-parameter Rasch model, each domain was scored on a similar metric where higher scores reflected better function or less disability (23). The Late-Life FDI is being used in gerontological research (24), although some concern has been reported regarding the 20–30 minute administration time (25).
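The T-scale transformation described above can be illustrated with a minimal Python sketch. It assumes the logit scores are standardized against the mean and sample standard deviation of the scores supplied; the function name `to_t_scale` and the example values are hypothetical and are not taken from the study.

```python
import statistics

def to_t_scale(logit_scores):
    """Rescale logit scores to a T-scale (mean 50, SD 10).

    Assumption: standardization uses the mean and the sample standard
    deviation of the scores passed in.
    """
    mean = statistics.mean(logit_scores)
    sd = statistics.stdev(logit_scores)
    return [50.0 + 10.0 * (score - mean) / sd for score in logit_scores]

# Hypothetical logit scores for three respondents.
t_scores = to_t_scale([-1.0, 0.0, 1.0])
print(t_scores)
```

In practice the reference mean and standard deviation would come from the calibration sample rather than from whichever subset of respondents is being scored.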


Late-Life FDI Domains

We tested the underlying structure of the Late-Life FDI items in a series of confirmatory factor analyses (26) and used MPlus software to evaluate item loadings and residual correlations between items (27). Because the data were not normally distributed, we used weighted least squares estimation, which is based on polychoric correlations. We used multiple criteria to check model fit: 1) fit statistics, namely the chi-square to degrees of freedom ratio, Comparative Fit Index (CFI), Tucker–Lewis Index (TLI), and Root Mean Square Error of Approximation (RMSEA); 2) the magnitude of the factor loadings on the primary factor; and 3) residual correlations greater than +/− 0.2. CFI and TLI compare the tested model to a baseline null model; values range from 0 to 1, and 0.95 or higher suggests acceptable fit. RMSEA assesses misfit per degree of freedom; values less than 0.08 suggest an acceptable fit, whereas values less than 0.05 suggest very good fit. A higher residual correlation indicates that the primary factor could not fully explain the correlation between items, which violates the local independence assumption.
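For readers less familiar with these fit indices, the sketch below computes RMSEA, CFI, and TLI from chi-square statistics using their common textbook definitions. It is an illustration only: the function name and the input values are hypothetical, and software such as MPlus applies adjustments under weighted least squares estimation that this sketch does not reproduce.

```python
import math

def fit_indices(chi2_model, df_model, chi2_null, df_null, n):
    """Textbook RMSEA, CFI, and TLI from model and null-model chi-squares."""
    # RMSEA: misfit per degree of freedom, scaled by sample size.
    rmsea = math.sqrt(max(chi2_model - df_model, 0.0) / (df_model * (n - 1)))
    # CFI: relative reduction in noncentrality versus the null model.
    d_model = max(chi2_model - df_model, 0.0)
    d_null = max(chi2_null - df_null, d_model)
    cfi = 1.0 - d_model / d_null if d_null > 0 else 1.0
    # TLI: compares chi-square/df ratios of the null and tested models.
    ratio_null = chi2_null / df_null
    ratio_model = chi2_model / df_model
    tli = (ratio_null - ratio_model) / (ratio_null - 1.0)
    return rmsea, cfi, tli

# Hypothetical chi-square values; only n = 671 echoes the baseline sample size.
rmsea, cfi, tli = fit_indices(100.0, 50, 1000.0, 66, 671)
print(round(rmsea, 3), round(cfi, 3), round(tli, 3))
```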

We used weighted least squares means and variance adjusted estimation methods, which are more precise when analyzing moderate-sized samples with skewed categorical data (26, 28). To determine the extent to which a unidimensional model adequately represented scale structure we considered the eigenvalues associated with each factor extracted, item loadings on the primary factor, and results from the overall model fit tests.

Item calibrations

The item calibrations for each scale were estimated using the Rasch IRT model, which estimates the item difficulty parameters (29–31). The Rasch model was selected as the best solution for this phase of the project because of simplicity in interpretation and flexibility about the underlying form of the population or trait distributions. Item calibrations from a Rasch partial credit model are estimates of each item’s level of difficulty based on the item response pattern in the sample data. Using item calibrations for all items, we estimated IRT-based scores for each function and disability domain using Weighted Maximum Likelihood estimation methods (23, 28). We evaluated fit using the INFIT and OUTFIT statistics for each item based on the comparison of expected and observed values across the distribution of each latent variable; Bonferroni corrected p-values were used for significance testing. The scores estimated from the IRT model were standardized to have a mean of 50 and standard deviation of 10.
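As an illustration of what the calibrations represent, the following sketch computes response category probabilities for a single item under the Rasch partial credit model, given a person's ability and the item's step difficulties (both in logits). The function name and the example step difficulties are hypothetical and are not calibrations from the Late-Life FDI.

```python
import math

def pcm_probabilities(theta, step_difficulties):
    """Category probabilities for one item under the partial credit model.

    theta: person ability in logits.
    step_difficulties: delta_1..delta_m in logits; categories run 0..m.
    """
    # Cumulative sums of (theta - delta_k); category 0 has sum 0.
    cumulative = [0.0]
    for delta in step_difficulties:
        cumulative.append(cumulative[-1] + (theta - delta))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(cumulative)
    weights = [math.exp(c - peak) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical 4-category item (three steps) for a person at theta = 0.5.
probs = pcm_probabilities(0.5, [-1.0, 0.0, 1.5])
print([round(p, 3) for p in probs])
```

A person whose ability lies well above all step difficulties is most likely to choose the highest category, which is why such an item contributes little information for that person.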

Differential item functioning

In IRT a subject’s score on an item should depend entirely on the latent variable being measured. Significant differential item functioning (DIF) indicates that variables other than the latent variable, such as age or gender, are likely influencing the response (32). There are two kinds of DIF: uniform DIF, in which the item response difference is constant across subjects’ ability levels, and non-uniform DIF, in which the item response difference changes across ability levels. Logistic regression was used to detect uniform and non-uniform DIF across gender, age and between the baseline and follow-up assessments using statistical significance and a cutoff of R-square change greater than 0.02. In these regressions the dependent variable was the function/disability item score and the independent variables were the ability level as assessed by the Late-Life FDI, the background variable being examined for DIF (e.g., cognitive level), and the interaction between the background variable and the ability level estimate. If the background variable effect is significant and the interaction term is not, the item displays uniform DIF. If the interaction term is significant, the item displays non-uniform DIF. We assessed statistical significance with the likelihood ratio test and used Bonferroni corrected p-values. R-square change was used to quantify the effect size of uniform and non-uniform DIF using the criterion based on Jodoin and Gierl’s work (33).
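The decision rule described above can be summarized in a short sketch. It assumes that the p-values for the background variable and for the interaction term, and the corresponding R-square changes, have already been obtained from the nested logistic regressions. The function name, argument names, the nominal alpha of 0.05, and the example values are all hypothetical.

```python
def classify_dif(p_group, p_interaction, r2_change_group,
                 r2_change_interaction, n_tests,
                 alpha=0.05, r2_cutoff=0.02):
    """Label an item as 'non-uniform', 'uniform', or 'none' for DIF."""
    # Bonferroni correction: divide the nominal alpha by the number of tests.
    alpha_adj = alpha / n_tests
    # Interaction term significant and its effect size above the cutoff.
    if p_interaction < alpha_adj and r2_change_interaction > r2_cutoff:
        return "non-uniform"
    # Background variable significant, interaction not flagged.
    if p_group < alpha_adj and r2_change_group > r2_cutoff:
        return "uniform"
    return "none"

# Made-up values for one item tested alongside 31 others.
print(classify_dif(1e-6, 0.5, 0.03, 0.001, n_tests=32))
```

Requiring both statistical significance and a minimum R-square change guards against flagging trivially small effects that reach significance only because the sample is large.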

Development of the CAT program

Once a final item pool was identified and item calibrations were generated for each domain, we constructed the Late-Life FDI CAT algorithms on the HDRI™ software developed at the Boston University Health and Disability Research (HDR) Institute. The CATs were designed to be patient-reported and are administered from a stand-alone or laptop computer. We programmed the CATs to use weighted maximum likelihood (WML) score estimation and we selected initial items from those in the middle of the function and disability range. The response to the first item was fed into the CAT algorithm and the application calculated a probable score as well as a person-specific measure of how precise that score was. If the score was not estimated with sufficient precision, according to internal guidelines, additional questions were selected and administered until either the precision standard was reached or the defined maximum number of items had been administered. The CAT process is illustrated in Figure 1.

Figure 1
Example Late-Life FDI CAT, using 4 items to meet stopping rule

Psychometric Evaluation of the CAT

We conducted a series of simulations to estimate the CAT scales’ accuracy, precision, validity and sensitivity to change. In these simulations, responses to items selected by the CAT software were obtained for cases in the PIRC data set and "fed" to the computer to simulate the conditions of an actual CAT assessment. As in an actual CAT, the simulation used the IRT model to select the best item to administer next, i.e. the one with the highest information function given the current score level, re-estimated the domain score and confidence interval (CI), and decided whether or not to continue testing. To compare results from the CAT and fixed-length scales in the simulation studies we used a fixed-stopping rule of 5-, 10- and 15- items for the 32 item function scale, and 5- and 10- items for the disability scale since the disability limitation scale had only 16 items. For the prospective evaluation we employed a 10-item CAT and compared that to each fixed-form function and disability limitation scale.
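The simulation logic described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the HDRI software: it uses dichotomous Rasch items rather than partial credit items, and it estimates scores with an expected a posteriori (EAP) grid estimate under a standard normal prior rather than weighted maximum likelihood. All function names, the item bank, and the stored responses are hypothetical.

```python
import math

def rasch_p(theta, b):
    """Probability of endorsing a dichotomous Rasch item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_information(theta, b):
    """Fisher information of a dichotomous Rasch item at theta."""
    p = rasch_p(theta, b)
    return p * (1.0 - p)

def eap_estimate(responses, difficulties, grid):
    """Posterior mean and SD of theta on a grid, standard normal prior."""
    log_post = []
    for t in grid:
        value = -0.5 * t * t  # log of the unnormalized normal prior
        for x, b in zip(responses, difficulties):
            p = rasch_p(t, b)
            value += math.log(p if x == 1 else 1.0 - p)
        log_post.append(value)
    peak = max(log_post)
    weights = [math.exp(v - peak) for v in log_post]
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)

def simulate_cat(stored_responses, bank, max_items=10, se_target=0.0):
    """Replay stored fixed-form responses through a CAT-style loop."""
    grid = [g / 10.0 for g in range(-40, 41)]
    theta, se = 0.0, float("inf")  # start in the middle of the range
    used, answers, diffs = [], [], []
    while len(used) < min(max_items, len(bank)):
        remaining = [i for i in range(len(bank)) if i not in used]
        # Pick the unused item that is most informative at the current score.
        nxt = max(remaining, key=lambda i: item_information(theta, bank[i]))
        used.append(nxt)
        answers.append(stored_responses[nxt])  # "feed" the stored answer
        diffs.append(bank[nxt])
        theta, se = eap_estimate(answers, diffs, grid)
        if se <= se_target:  # precision rule; 0.0 leaves only the length rule
            break
    return theta, se, used

# Hypothetical 5-item bank (difficulties in logits) and one stored response set.
bank = [-2.0, -1.0, 0.0, 1.0, 2.0]
stored = [1, 1, 1, 0, 0]
theta, se, used = simulate_cat(stored, bank, max_items=3)
print(round(theta, 2), round(se, 2), used)
```

Setting `se_target` to a positive value would mimic a precision-based stopping rule, while leaving it at zero mimics the fixed-length rules used to compare the 5-, 10-, and 15-item versions.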

The accuracy and precision simulations were conducted in the baseline PIRC sample (N=671). To assess CAT accuracy, Pearson correlations were calculated between each of the CAT generated scores and the fixed-form Late-Life FDI domain scales to assess the extent to which simulated CAT scores were consistent with scores from the full length instrument. To compare the relative precision of the CAT scores to scores from the full length scales we plotted the standard errors in relation to the ability scores in the sample.

The comparability of simulated CAT-based estimates in measuring change over time was examined within the development sample who had been administered each Late-Life FDI scale at baseline and at a 12 month follow-up assessment. Average scores, change scores, effect sizes, and standardized response means were compared (34). We did not examine responsiveness in relation to an external standard.
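As a minimal sketch of the change statistics named above, the code below computes the mean change, an effect size, and a standardized response mean (SRM) from paired baseline and follow-up scores. It assumes the common definitions (effect size as mean change divided by the baseline standard deviation, SRM as mean change divided by the standard deviation of the change scores); these definitions, the function name, and the example scores are assumptions rather than details quoted from the study.

```python
import statistics

def sensitivity_to_change(baseline, follow_up):
    """Return (mean change, effect size, standardized response mean)."""
    change = [f - b for b, f in zip(baseline, follow_up)]
    mean_change = statistics.mean(change)
    # Effect size: mean change relative to the spread of baseline scores.
    effect_size = mean_change / statistics.stdev(baseline)
    # SRM: mean change relative to the spread of the change scores.
    srm = mean_change / statistics.stdev(change)
    return mean_change, effect_size, srm

# Hypothetical T-scores for three subjects at baseline and 12 months.
print(sensitivity_to_change([40.0, 50.0, 60.0], [45.0, 52.0, 65.0]))
```

Applying the same function to the CAT-based and fixed-form scores for the same subjects gives directly comparable values for each version.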

Known-groups validity was assessed by calculating Spearman correlation coefficients (±95% confidence intervals) between Late-Life FDI CAT scales and measures of physical performance in the Elderly Mobility Scale (EMS) (40). We used quartiles to group scores from the EMS into less than 25th, 25th-50th, 50th-75th, and greater than 75th.

Pilot Study

The Late-Life FDI CATs and fixed-form scales were administered by telephone interview with a sample of community dwelling older adults with disabilities. Pearson correlations were calculated between each of the CAT generated scores and the fixed-form Late-Life FDI scales to assess the extent to which simulated CAT scores were consistent with scores from the original instrument.

We provided interviewers with formal training in the administration of the CAT. We had an internal clock to track the amount of time and the number of items needed to meet pre-set levels of precision for each CAT. Subjects were told they were being asked to help us evaluate a new instrument and that similar items would be asked. Demographic information (ethnicity, sex, age, and functional level) was available for each subject. All procedures were approved by the Institutional Review Board at Boston University.


In the baseline PIRC sample, the average age was 84.1 (sd = 7.4) and 73% were female (Table 1). There were no statistically significant differences in mean age, gender, or Late-Life FDI scores between those who completed the 12 month follow-up and those who dropped out.

Table 1
Demographic characteristics of the study samples

Function and Disability Domains

We tested several different models within each domain of the Late-Life FDI. In the function domain, a one dimension model across all 32 items achieved an acceptable level of fit, explained 57% of the variance, and was easily interpretable (Table 2). In this one dimensional function model, 2.2% of the residual covariances were greater than +/− 0.2, which indicated that local independence was acceptable. The 16 items in the disability limitation scale also fit a one dimensional model (Table 2). In this model, there was no residual covariance value greater than +/− 0.2, which means the local independence assumption was satisfied. In contrast, the 16 item disability frequency scale did not fit an acceptable one dimension model. Even when the scale was broken into 2 scales, the level of fit was not acceptable. After removing 5 items from the frequency scale, the one dimensional model achieved an acceptable degree of fit, resulting in an 11 item disability frequency scale. With only 11 items, a disability frequency CAT was not developed.

Table 2
Model fit statistics for confirmatory factor analysis of Late-Life-FDI domains

Item calibrations

The distribution of the Late-Life FDI function and disability scale scores in this sample is displayed in Figure 2 and Figure 3 along with the distribution of response categories at each level of the scoring distribution. For the function scale at baseline, one member of the sample displayed floor and ceiling effects while 2 subjects had a ceiling effect at 12 months. At baseline, in the disability limitation scale, 17 subjects (2.5% of the sample) exhibited a ceiling effect and no floor effects were seen at either time point. For the disability frequency scale, one subject had a floor effect while no subjects displayed a ceiling effect at baseline or the 12 month follow-up. Item calibrations in each domain of the Late-Life FDI are displayed in Table 3.

Figure 2
Late-Life Sample and Function Item Bank Distributions
Table 3
Item Calibrations for the Disability and Function Domains

Differential item functioning

One item in the function scale (unscrewing a jar lid) displayed gender DIF. Since this item displayed only marginal DIF (an R-square change of less than 0.0284) and because of the importance of the content of this item, we decided to retain the item in the function scale. The effect of this item on the CAT simulation results was very small. If gender DIF is identified in future analyses of an expanded function item bank, the analysis can be stratified by gender. There were no function scale items that showed DIF across level of cognitive impairment or across the two assessment time points.

There were no items that showed gender, cognitive impairment, or age DIF within the disability limitation scale. The logistic regression results did detect significant DIF between the baseline and 12 month assessments for the ‘ACTIVE RECREATION’ item.

Accuracy, precision, validity, and sensitivity to change of the CAT

As Table 4 displays, the correlation between scores based on the fixed-form versions and the 5-, 10-, 15-item CATs for the function version and the 5- and 10-item versions in the disability limitation domain ranged from 0.90 – 0.99 in the baseline sample and at comparably high levels in the 12 month sample, indicating a high and consistent degree of accuracy using CATs.

Table 4
Intraclass correlations between fixed-form Late-Life FDI and CAT simulation scores in the baseline and follow up samples

In the function and disability limitation CATs, the standard errors of the 5-item CAT were consistently larger than those of the 10-item and 15-item CATs across all ranges (Figure 4 and Figure 5), indicating less precision for the 5-item CAT. The CAT standard errors were only slightly larger than those from the full-length version, reflecting the smaller number of items used to calculate the overall score. For both scales the standard errors were greater at extreme score ranges.

Figure 4
Plot of standard error of subject scores based on 5-, 10- item Late-Life Disability CAT compared with all items for that scale
Figure 5
Plot of standard error of subject scores based on 5-, 10-,15-item Late-Life Function CAT compared with all items for that scale

The ability of each CAT version to discriminate between groups of older adults on the basis of mobility limitations was evaluated by comparing average scores on the function and disability CATs across subject groups who scored in different quartiles on the EMS. As hypothesized, the CAT average scores were significantly different for those in lower EMS quartiles as compared with those in higher EMS quartiles, an indication of the CAT’s validity (Table 5).

Table 5
Known Groups Validation of the Late-Life FDI CAT scales

The descriptive statistics and standard deviations of baseline and follow-up scores from the 5- and 10-item CATs were quite similar to those for the original version of the Late-Life FDI for all domains (Table 6). The effect sizes and standardized response mean values for both CAT versions are somewhat smaller than those for the full length version of the instrument.

Table 6
Sensitivity to change of the fixed-form Late-Life FDI versus the CAT versions

Pilot Study

The function CAT took 2.5 (sd=0.87) minutes and the disability limitation CAT took 2.8 (sd=1.4) minutes, on average, to complete by telephone administration, compared with 20 minutes, on average, to complete the fixed-form versions of both scales of the Late-Life FDI. Scores estimated by each CAT scale were highly correlated with the scores generated by the fixed-form versions of each scale. Scores from the function CAT were correlated 0.94 with those generated by the fixed-form scale, and the correlation between the disability CAT and the disability fixed-form scale was 0.82.


The results of these analyses revealed that prototype CAT models built for the function and disability domains of the Late-Life FDI instrument provided accurate, precise, valid, and sensitive estimates of late-life function and disability while reducing administrative burden. Two 10-item CATs were administered by telephone, on average, in 5.4 minutes compared with 20 minutes or more for the fixed-length scales. Although preliminary, the results from the present study are encouraging in that they demonstrate that the goal of a patient-reported function and disability assessment is achievable with little sacrifice of psychometric quality while reducing data collection time and administrative burden. Although further work is clearly needed to expand the Late-Life FDI item banks to improve breadth and depth of coverage and to refine the CAT scales, these results provide evidence that a CAT version of the Late-Life FDI offers the possibility of a patient-reported outcome measure that could be usefully applied across diverse populations of older adults to monitor change in function and disability. These results are consistent with previous research with CAT models for function (35, 36).

In comparing the Late-Life FDI structure revealed in this sample with that found in others, it is interesting to note that previous factor analysis and IRT analyses suggested a more complex structure underlying the Late-Life FDI than was revealed in this study. In prior work, the function component consisted of three sub-scales: Upper Extremity Function, Basic Lower Extremity Function, and Advanced Lower Extremity Function. Analysis of the disability limitation domain revealed two sub-domains, Instrumental Role and Management Role, while two different sub-domains were identified in the disability frequency domain: Social Role and Personal Role. Other investigators suggested a somewhat different structure across the instrument (25, 37). In contrast to these previous analyses, the confirmatory factor analysis findings in this study revealed a more parsimonious structure consisting of 3 sub-domains: function (32 items), disability limitation (16 items), and disability frequency (11 items). However, given the small number of items that fit a one dimensional solution in the disability frequency scale, we do not recommend the construction of a CAT version of this domain at this time. Further research is particularly needed to expand the disability frequency item bank before a useful CAT version can be constructed.

One option would be to delete entirely the disability frequency domain within the Late-Life FDI. We believe, however, that the distinction between disability limitation and frequency of daily activity performance may be important and worth preserving. Recent analyses of a sample with knee pain and functional limitations, for example, revealed that factors within a person’s community environment had a differential impact on frequency of activities in contrast to their perceived limitation in the same activities (38). Specifically, the presence of transportation facilitators in one’s community was associated with less limitation in disability but was not associated with the frequency with which persons did those same activities. The finding may suggest that regardless of the level of transportation resources available in one’s environment, older persons continue to participate at the level of frequency they desire. Yet, more transportation resources make it easier for them to do so, resulting in feeling less limited.

One potential concern with the introduction of CAT instruments is that administering fewer items may contribute to a loss of sensitivity to change, which would greatly dampen enthusiasm for their use. As has been reported for other CAT instruments, the present findings reveal that modest, if any, sensitivity to change is lost compared with the full length instrument as long as the CAT program has 10 or more items. This finding has been replicated in prospective studies of other CAT instruments that assess function in different samples (39, 40). However, it should be noted that the 5-item CATs were less accurate, precise, and sensitive and therefore would not be recommended for many research applications. One of the advantages of CAT is that it allows the user to specify the level of score precision desired for a particular application. In a CAT model using a stopping rule based on a relaxed level of score precision, therefore, it is quite possible that the scores of some individuals might be estimated with fewer than 10 items. Also, in individual assessment, where high precision is desirable, a 15-item or greater stopping rule or a criterion reflecting a smaller degree of measurement error might be more desirable. On the other hand, for large scale studies where efficiency of administration is essential and less precision is required, even the 5-item CAT might be acceptable. Scores from the function CAT were correlated 0.94 with those generated by the fixed form scale while a correlation of 0.82 was observed between the disability CAT and the disability fixed-form scale. Although a correlation of this magnitude between a CAT version and item bank is not unusual (40, 41), since the pilot sample size was small the observed correlations were influenced by a small number of subjects. We found 3 subjects that displayed response patterns that were inconsistent between the fixed-form and the CAT version of the Late-Life FDI. When we removed those 3 subjects, the correlation between the disability CAT and the disability fixed-form scale rose to 0.934.

We note a number of limitations to this study. First, simulations of CAT scores, such as those reported in this study for the PIRC sample, are possible whenever datasets include responses to all items in an item pool, in this case, the fixed-form version of the Late-Life FDI. Simulations are based on the assumption that the answers to a subset of those items selected using CAT would be identical to the answers given if they were embedded in a larger fixed-form instrument. Such simulations are likely good (but not perfect) approximations of actual CAT administrations (42, 43). Results from the pilot study sample which completed both a CAT version of the Late-Life FDI and the fixed-form version of the instrument provide preliminary real-time evidence of levels of scoring comparability. Clearly the results of this real-time comparison need to be replicated in a larger and more representative sample of older adults.

The findings from this study are further limited by the relatively small number of questionnaire items available in the fixed-form version of the Late-Life FDI. In future work we plan to expand considerably the number of functions and disabilities in the Late-Life FDI item banks and construct an updated CAT instrument that will be able to draw from a more comprehensive pool of function and disability items to estimate an older person’s function and disability levels. One of the advantages of CAT-based instruments is the ability to replenish the items pools underlying a CAT instrument on a regular basis (44). An important advantage of this process is that CAT instruments can be improved relatively quickly based on data from ongoing outcome assessments while scores from the different CAT versions of the same instrument are made comparable along a common scoring metric.

And finally, it should be noted that this analysis was done on a sample of older adults living within residential care facilities in New Zealand. Both the New Zealand cohort and the Boston pilot study sample included persons with a restricted range of health and functional status and were not representative. This study needs to be replicated in repeated administrations of CAT instruments in prospective field studies with different types of older samples that are more representative of defined populations.


This study revealed that CAT methodology can be applied successfully to assess patient-reported functioning and disability in older adults. These preliminary findings suggest that the application of CAT methodology can reduce the time required for administration without significant loss of accuracy, precision or sensitivity to change. Although further work is needed to expand and refine the item pools in all three outcome domains, the results suggest that the CAT approach offers a viable solution to the long-standing conflict between the need for accuracy in clinical assessment and the equal need for practicality of administration.

Figure 3
Late Life Sample and Disability Item Bank Distributions
Table 7
Comparison of scores from prototype CAT and fixed-form Late-Life FDI instrument (N=40)


Supported by the NIA/NIH R41 AG027620-01 and an Independent Scientist Award (K02 HD45354-01) to Dr. Haley.


1. Department of Health and Human Services (U.S.). 2nd ed. Washington: U.S. Government Printing Office; 2000. Healthy People 2010: With Understanding and Improving Health and Objectives for Improving Health.
2. Guralnik JM, Simonsick EM, Ferrucci L, et al. A short physical performance battery assessing lower extremity function: association with self-reported disability and prediction of mortality and nursing home admission. J Gerontol. 1994;49:M85–M94.
3. Guralnik JM, Ferrucci L, Simonsick EM, et al. Lower-extremity function in persons over the age of 70 years as a predictor of subsequent disability. N Engl J Med. 1995;332:556–561.
4. Dunlop DD, Hughes SL, Manheim LM. Disability in activities of daily living: Patterns of change and a hierarchy of disability. Am J Public Health. 1997;87:378–383.
5. McHorney CA. Generic health measurement: past accomplishments and a measurement paradigm for the 21st century. Ann Intern Med. 1997;127:743–750.
6. Ware JE Jr. Conceptualization and measurement of health-related quality of life: comments on an evolving field. Arch Phys Med Rehabil. 2003;84 Supplement 2:S43–S51.
7. Rubenach S, Shadbolt B, McCallum J, et al. Assessing health-related quality of life following myocardial infarction: Is the SF-12 useful? J Clin Epidemiol. 2002;55:306–309.
8. Chen AL-T, Broadhead WE, Doe EA, Broyles WK. Patient acceptance of two health status measures: The Medical Outcomes Study Short-Form General Health Survey and the Duke Health Profile. Fam Med. 1993;25:536–539.
9. Beaton DE, Richards RR. Measuring Function of the Shoulder. J Bone Joint Surg. 1996;78-A(6):882–890.
10. McHorney CA, Bricker DE Jr. A qualitative study of patients' and physicians' views about practice-based functional health assessment. Med Care. 2002;40:1113–1125.
11. Lord F. Hillsdale, NJ: Erlbaum Associates; 1990. Applications of Item Response Theory to practical testing problems.
12. van der Linden W, Hambleton R. New York: Springer-Verlag New York, Inc; 1997. Handbook of Modern Item Response Theory.
13. Hambleton R, Swaminathan H, Rogers H. Newbury Park, CA: Sage Publications; 1991. Fundamentals of Item Response Theory.
14. Hambleton RK, Pitoniak MJ. Testing and measurement: Advances in item response theory and selected testing practices. In: Pashler H, Yantis S, Medin D, Gallistel R, Wixted J, editors.
15. Jette AM, Haley SM. Contemporary measurement techniques for rehabilitation outcomes assessment. J Rehabil Med. 2005;37:339–345.
16. Cella D, Gershon R, Lai J-S, Choi S. The future of outcomes measurement: item banking, tailored short forms, and computerized adaptive assessment. Qual Life Res. 2007;16:133–141.
17. Wainer H. Mahwah, NJ: Lawrence Erlbaum Associates; 2000. Computerized Adaptive Testing: A Primer.
18. Haley SM, Ludlow LH, Kooyoomjian JT. Extending the Range of Functional Assessment in Older Adults: Development of the Late-Life Function and Disability Instrument. J Aging Phys Act. 2002;10:453–465.
19. Haley SM, Jette AM, Coster WJ, et al. Late Life Function and Disability Instrument: II. Development and evaluation of the function component. J Gerontol A Biol Sci Med Sci. 2002;57:M217–M222.
20. Jette AM, Haley SM, Coster WJ, et al. Late Life Function and Disability Instrument: I. Development and evaluation of the disability component. J Gerontol A Biol Sci Med Sci. 2002;57:M209–M216.
21. Sayers SP, Jette AM, Haley SM, et al. Validation of the Late-Life Function and Disability Instrument (LLFDI). J Am Geriatr Soc. 2004;52:1–6.
22. Dubuc N, Haley SM, Ni PS, et al. Function and disability in late life: comparison of the Late-Life Function and Disability Instrument to the Short-Form-36 and the London Handicap Scale. Disabil Rehabil. 2004;26:362–370.
23. Warm TA. Weighted likelihood estimation of ability in item response theory. Psychometrika. 1989;54:427–450.
24. Ouellette M, LeBrasseur NK, Bean JF, et al. High-Intensity Resistance Training Improves Muscle Strength, Self-Reported Function and Disability in Long-Term Stroke Survivors. Stroke. 2004;35:1404–1409.
25. McAuley E, Konopack JF, Motl RW, et al. Measuring Disability and Function in Older Women: Psychometric Properties of the Late-Life Function and Disability Instrument. J Gerontol. 2005;60:901–909.
26. Mislevy RJ. Recent developments in the factor analysis of categorical variables. J Educ Stat. 1986;11:3–31.
27. Muthen B, Muthen L. Los Angeles: Muthen & Muthen; 1998. Mplus user's guide.
28. Beauducel A, Herzberg PY. On the performance of maximum likelihood versus means and variance adjusted weighted least squares estimation in CFA. Struct Equat Model. 2006;13:186–203.
29. Fischer G, Molenaar I. Berlin: Springer-Verlag; 1995. Rasch models: Foundations, recent developments, and applications.
30. Andrich D. Beverly Hills, CA: Sage Publications; 1998. Rasch models for measurement.
31. Masters GN. A Rasch model for partial credit scoring. Psychometrika. 1982;47:149–174.
32. Swaminathan H, Rogers HJ. Detecting differential item functioning using logistic regression procedures. J Educ Meas. 1990;27:361–370.
33. Jodoin MG, Gierl MJ. Evaluating Type I error and power rates using an effect size measure with logistic regression procedure for DIF detection. Appl Meas Educ. 2001;14:329–349.
34. Haley SM, Fragala-Pinkham MA. Interpreting change scores of tests and measures used in physical therapy. Phys Ther. 2006;86:735–743.
35. Jette A, Haley S, Tao W, et al. Prospective evaluation of the AM-PAC-CAT in outpatient rehabilitation settings. Phys Ther. 2007;87:385–398.
36. Haley SM, Coster WJ, Andres PL, et al. Score comparability of short-forms and computerized adaptive testing: an illustration with the Activity Measure for Post-Acute Care (AM-PAC). Arch Phys Med Rehabil. 2004;85:661–666.
37. Haley S, Ni P, Hambleton R, Slavin M, Jette A. Computer Adaptive Testing Improved Accuracy and Precision of Scores over Random Item Selection in a Physical Functioning Item Bank. J Clin Epidemiol. 2006;59:1174–1182.
38. Bean JF, Olveczky D, Sharon L, et al. Self-Reported vs Observed Mobility Performance: Are the Underlying Factors Identical? Presented at the 2007 Annual Meeting of the American Geriatrics Society.
39. Keysor J, Jette A, LaValley M, et al. Are Community Environmental Factors associated with Disability in Older Adults with Functional Limitations? Under review. 2007.
40. Haley S, Siebens H, Coster W, et al. Computerized adaptive testing for follow-up after discharge from inpatient rehabilitation: I. Activity outcomes. Arch Phys Med Rehabil. 2006;87:1033–1042.
41. Haley S, Gandek B, Siebens H, Black-Schaeffer R, Sinclair J, Tao W, Coster W, Ni P, Jette A. Computer adaptive testing follow-up after discharge from inpatient rehabilitation: II. Participation outcomes. Arch Phys Med Rehabil. 2008;89:275–283.
42. Ware J, Bjorner J, Kosinski M. Practical implications of item response theory and computerized adaptive testing: a brief summary of ongoing studies of widely used headache impact scales. Med Care. 2000;38(9 Suppl):I173–I182.
43. Ware JE Jr. Conceptualization and measurement of health-related quality of life: comments on an evolving field. Arch Phys Med Rehabil. 2003;84:S43–S51.
44. Haley S, Ni P, Jette A, et al. Replenishing a Computerized Adaptive Test (CAT) of Patient Reported Outcomes. Qual Life Res. 2008 (in press).