Phys Ther. 2009 Jun; 89(6): 589–600.
PMCID: PMC2689784

Evaluation of an Item Bank for a Computerized Adaptive Test of Activity in Children With Cerebral Palsy


Background: Contemporary clinical assessments of activity are needed across the age span for children with cerebral palsy (CP). Computerized adaptive testing (CAT) has the potential to efficiently administer items for children across wide age spans and functional levels.

Objective: The objective of this study was to examine the psychometric properties of a new item bank and simulated computerized adaptive test to assess activity level abilities in children with CP.

Design: This was a cross-sectional item calibration study.

Methods: The convenience sample consisted of 308 children and youth with CP, aged 2 to 20 years (mean=10.7, SD=4.0), recruited from 4 pediatric hospitals. We collected parent-report data on an initial set of 45 activity items. Using an Item Response Theory (IRT) approach, we compared estimated scores from the activity item bank with concurrent instruments, examined discriminant validity, and developed computer simulations of a CAT algorithm with multiple stop rules to evaluate scale coverage, score agreement with CAT algorithms, and discriminant and concurrent validity.

Results: Confirmatory factor analysis supported scale unidimensionality, local item independence, and invariance. Scores from the computer simulations of the prototype CATs with varying stop rules were consistent with scores from the full item bank (r=.93–.98). The activity summary scores discriminated across levels of upper-extremity and gross motor severity and were correlated with the Pediatric Outcomes Data Collection Instrument (PODCI) physical function and sports subscale (r=.86), the Functional Independence Measure for Children (Wee-FIM) (r=.79), and the Pediatric Quality of Life Inventory–Cerebral Palsy version (r=.74).

Limitations: The sample size was small for such IRT item banks and CAT development studies. Another limitation was oversampling of children with CP at higher functioning levels.

Conclusions: The new activity item bank appears to have promise for use in a CAT application for the assessment of activity abilities in children with CP across a wide age range and different levels of motor severity.

The importance of using valid and reliable measures to evaluate the impact of surgical, pharmacological, and therapeutic interventions for children with cerebral palsy (CP) is well accepted.1–4 Over the past decade, clinicians and researchers have placed a renewed emphasis on activity-level measures to evaluate the impact of health care interventions on children's physical functioning in home, school, and community settings.5–7 The World Health Organization's International Classification of Functioning, Disability and Health (ICF) defines activity as the execution of specific tasks or actions by an individual.8 We used this definition to guide the development of an activity scale highlighting physical functioning. We included physical tasks and skills from the mobility; self-care; domestic life; and community, social, and civic life components of the ICF. We excluded tasks and skills that had a primary focus on cognition and behavior.

The use of parent-report measures is an accepted method for documenting the physical functioning of a child with CP in home and community environments.6,9–11 Parent-report measures such as the Pediatric Evaluation of Disability Inventory (PEDI)12 and the Pediatric Outcomes Data Collection Instrument (PODCI)13 are commonly used in clinic and research settings to measure activity-level abilities in children with CP. Other instruments, such as the Activity Scale for Kids (ASK),14 the Pediatric Quality of Life–Cerebral Palsy version (PedsQL-CP),15 the Functional Assessment Questionnaire (FAQ),16 and the Functional Mobility Scale (FMS),17 also have been used to measure activity-level changes in children with CP. These measures yield data that are reliable and valid; however, as individual instruments, all have limitations. The FAQ and FMS concentrate on functional mobility only, whereas the ASK, PEDI, and PODCI include other areas of function such as self-care and play, but may not assess mobility skills in sufficient depth. The PODCI combines health-related quality-of-life questions with physical function questions, thereby limiting content breadth and specificity.18 The PEDI and PODCI can take 30 minutes or more to administer, which often creates a substantial response burden in a busy clinic or a research study with multiple outcome measures. Other limitations of these scales include ceiling and floor effects when used across wide age and ability ranges and limited item content, especially in the areas of more-complex tasks.3

There is a need for a parent-report measure that: (1) can be used to document activity abilities at program and individual child levels, (2) is inclusive of young children and teenage groups, and (3) is feasible to administer in a clinical setting. Yet, fixed-length instruments that cover a broad range of ages and functional abilities often have too many items and are overly burdensome. Work with traditional methods has made it clear that no single, fixed-length instrument can meet these content and psychometric standards for children and youth with CP throughout a wide age range (2–21 years).19

The use of computerized adaptive testing (CAT) provides an alternative approach to these measurement and practical issues.20 Computerized adaptive testing uses a software algorithm that selects questions appropriate to the child's functional ability, using previous responses to create a score estimate. Item information functions,20 based on the location and discrimination of each item, form the basis for selecting each new item within the CAT program: the item with the maximum information at the current score estimate is chosen. In effect, the items administered in a CAT are tailored to the child's functional level as reported by the parent, skipping items that are clearly too easy or too difficult given the previous responses. Computerized adaptive testing software shortens or lengthens the test to achieve the desired precision and scores all children on a standard metric so that results can be compared across children. The potential advantages of CAT programs in the evaluation of functional abilities of children have been documented previously.21–28
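To make the maximum-information selection rule concrete, the following minimal Python sketch assumes dichotomous items under a two-parameter logistic (2PL) model, for which item information at θ is a²P(θ)(1 − P(θ)). This is a simplification for illustration only: the study's items were polytomous and calibrated with a graded response model. The function names and the five-item bank are hypothetical.

```python
import math


def p_2pl(theta, a, b):
    """Probability of a positive response under a 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def select_next_item(theta_hat, bank, administered):
    """Index of the unadministered item with maximum information at theta_hat."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta_hat, *bank[i]))


# Hypothetical bank of (discrimination a, location b) pairs on a logit scale.
bank = [(1.2, -2.0), (1.5, -0.5), (1.0, 0.0), (1.8, 0.7), (1.3, 2.2)]
print(select_next_item(0.6, bank, administered={2}))  # prints 3
```

Items already administered are skipped, so each call returns the most informative remaining item at the current estimate; items located far from the estimate contribute little information and are effectively passed over.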

Starting from a list of items adapted from existing instruments and newly created items, we collected parent-report data on a new item bank to measure the ability of children with CP to perform activities in their home and community environments. Our goal was to build an item bank that would cover relevant clinical ages (2–21 years) and levels of severity of children and youth with CP typically seen at the Shriners Hospitals for Children (SHC) orthopedic hospitals. In order to fully assess the potential usefulness of the item bank, we created simulations of CAT scores based on the data collected during the item bank calibration phase. Computerized adaptive testing simulations are a common approach for investigating the merits of an item bank and its potential for providing the foundation for a CAT program.29 During the simulation program, as items are selected for administration to parents, responses are taken directly from the actual data set. The complete set of activity item responses and subsequent score estimates serve as the criteria against which CAT-based scores are compared.

The purpose of this study was to examine initial psychometric properties (unidimensionality; local independence; item invariance, including stability across groups and differential item functioning [DIF]; scale coverage; score agreement with CAT algorithms; and discriminant and concurrent validity) of an activity item bank and resulting simulated CAT program designed specifically to assess activity level and physical function in children with CP. Our long-term goal was to create a series of multifaceted item banks that can assess global physical health, upper- and lower-extremity skills, and activity via CAT technology for the monitoring of functional outcomes and the assessment of change with interventions.30 In this article, we report on the results of the activity scale.



Parent-report data were collected on a convenience sample of 308 children and youth with CP. Inclusion criteria were: a diagnosis of CP, ages 2 to 20 years, and parents with a primary language of English. Participants were excluded if they had had surgical or pharmacological interventions within the past 6 months, because deterioration in ambulation or hand function due to orthopedic restrictions is often observed after these interventions, and children's functional abilities may then not correlate with their baseline abilities or overall severity level. Data were collected at 3 SHC orthopedic hospitals (Philadelphia, Pennsylvania; Montreal, Canada; and Springfield, Massachusetts) and at Franciscan Hospital for Children (FHC), Boston, Massachusetts.

The mean age of the sample was 10.7 years (SD=4.0). Demographic characteristics of the sample are presented in Table 1. Our sample is not fully representative of population data reported elsewhere,31 as we have underrepresented children with more-severe gross motor disabilities. Because 2 of the 3 SHC sites were motion laboratory-based facilities, it was typical at those sites to recruit primarily ambulatory participants.

Table 1.
Demographic Characteristics of Sample

Activity Item Bank

The original activity item pool contained 70 items that sampled physical functioning in home and community settings. These items were identified from review of similar instruments and related literature, as well as through discussion with clinicians and families. Conceptually, we decided that daily activity tasks typically involved a combination of upper- and lower-extremity skills, required several steps to complete, and included skills needed for family routines, play, and school activities. The areas of focus in this construct included basic and instrumental activities of daily living (ADL) and sports, play, and recreation activities. Based on the judgment of clinicians and researchers at SHC and FHC and the results of cognitive testing,32 we reduced the number of items to be tested from 70 to 45. Items were removed if their wording appeared ambiguous, their content was similar to that of items in the lower- or upper-extremity skills scales, they depended on adaptive devices, or the activities were not relevant for all children (such as putting on a brace or doing a home exercise program).


Activity items along with items from the 3 other scales (global physical health, lower-extremity and mobility, and upper-extremity skills) were administered to parents using a PC-based tablet. The global physical health scale33 assesses pain and fatigue, the upper-extremity skills scale34 samples functional status in hand and dexterity skills, and the lower-extremity and mobility scale35 examines lower-extremity functioning and mobility, including mobility with devices. In a few cases (n=11), parents who were unable to complete the survey during the clinic visit completed it at home using a Web-based interface. For the calibration testing, the activity items were rated by a parent or caregiver and were judged on the basis of the following 5-point rating scale: 0=“unable to do,” 1=“with much difficulty,” 2=“with some difficulty,” 3=“with a little difficulty,” and 4=“without any difficulty.”

External Measures

Severity of CP initially was rated by the parents and then confirmed by the research staff using the Gross Motor Function Classification System (GMFCS),36 which rates children on a 5-point severity scale based primarily on ambulatory ability, and the Manual Ability Classification System (MACS),37 which categorizes children on a 5-point severity scale based on hand function and dexterity. Both classification systems have been shown to have reliability and validity for use in children and youth with CP.37–39

Concurrent validity comparisons were chosen on the basis of whether the instruments were currently used across many of the SHC hospitals. To serve as concurrent validity comparisons, subsets of parents also completed the PODCI40 (n=168), the PedsQL-CP15 (n=77), and the Functional Independence Measure for Children (Wee-FIM)41 (n=113). The PODCI was developed specifically to assess changes following pediatric orthopedic interventions for a broad range of diagnoses, including CP. The dimension most similar to the new activity scale was the subdomain of physical function and sports. The Wee-FIM is a standard outcome measure used in many of the SHC hospitals, and its scores include a motor function score. The PedsQL-CP is an adapted form of the generic PedsQL developed specifically for children with CP. The most relevant subscale within the PedsQL-CP is daily activity.

Data Analysis


Item Response Theory (IRT) and CAT methods assume certain measurement properties of item sets. These include the assumptions of unidimensionality, local independence, and stability of item parameters (item invariance) across groups (eg, types of CP). Item sets that violate these assumptions may be less effective in modeling the latent variable and may limit the accuracy of a CAT instrument.42,43 We tested the latent structure of the activity items by confirmatory factor analyses (CFAs)44 and evaluated item loadings and residual correlations among items using Mplus software.45,* Model fit was assessed by multiple fit indexes, such as the Comparative Fit Index (CFI), the Tucker-Lewis Index (TLI), and the root mean square error of approximation (RMSEA). Recent simulation reports suggest that for most of the fit indexes, it is difficult to establish strict cutoff criteria.46,47 We used unweighted least squares mean- and variance-adjusted (ULSMV) estimation, which is more precise when analyzing small to moderate-size samples with skewed categorical data.48,49 Four pieces of evidence were reviewed to determine the extent to which a unidimensional model adequately represented the activity scale: (1) item loadings on the primary factor, the percentage of variance attributed to the first factor, and the ratio of eigenvalues between the first and second factors; (2) results from overall model fit tests; (3) residual correlations between all possible pairs of items; and (4) the patterns of inter-item correlations among items. We retained items with factor loadings greater than 0.4 in the item bank. We considered items with residual correlations greater than 0.2 to be locally dependent items.50
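The first and third pieces of evidence can be illustrated with a minimal NumPy sketch. As an assumption for illustration, it works from a given item correlation matrix and uses first-principal-component loadings as a stand-in for the CFA loadings that were estimated in Mplus; residual correlations are computed as r_ij − λ_iλ_j under a single common factor. The 0.4 loading and 0.2 residual cutoffs come from the text; the function name and example matrix are hypothetical.

```python
import numpy as np


def unidimensionality_summary(R, loading_cut=0.4, resid_cut=0.2):
    """Eigenvalue-based checks on an item correlation matrix (a PCA stand-in for CFA)."""
    R = np.asarray(R, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(R)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Loadings on the first component; flip sign so they are positive overall.
    loadings = np.sqrt(eigvals[0]) * eigvecs[:, 0]
    if loadings.sum() < 0:
        loadings = -loadings
    # Residual correlations after removing the single common component.
    resid = R - np.outer(loadings, loadings)
    np.fill_diagonal(resid, 0.0)
    i, j = np.where(np.triu(np.abs(resid) > resid_cut, k=1))
    k = R.shape[0]
    return {
        "prop_var_first": eigvals[0] / k,       # trace of R equals k
        "eigen_ratio": eigvals[0] / eigvals[1],  # first-to-second eigenvalue ratio
        "low_loading_items": np.where(loadings <= loading_cut)[0].tolist(),
        "dependent_pairs": list(zip(i.tolist(), j.tolist())),
    }


# Hypothetical 4-item matrix with all inter-item correlations equal to .70.
R = np.full((4, 4), 0.7)
np.fill_diagonal(R, 1.0)
summary = unidimensionality_summary(R)
print(summary)
```

With equal inter-item correlations of .70, one component absorbs most of the variance and no residual correlation approaches the 0.2 cutoff, which is the pattern the criteria above look for.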

Item calibrations, fit, and score estimates.

The item parameters for each scale were estimated using the Graded Response Model (GRM), with the slope parameter restricted to a single common value.51 This one-parameter logistic form of the GRM was selected as the best solution for this phase of the project because of the relatively small sample size and the observation that most of the items had high, but similar, point-biserial correlations, suggesting that discrimination did not vary much across items. The item parameters and fit statistics were calculated using PARSCALE,51 which is based on marginal maximum likelihood estimation. We evaluated item fit using the likelihood ratio chi-square statistic; probability values less than .05 suggest item misfit. We estimated individual scores using weighted maximum likelihood estimation.52 The individual scores were standardized to a mean of 50 and a standard deviation of 10 (T-scale).
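As an illustration of the model form, the following minimal Python sketch computes category probabilities for one item under a graded response model in which every item shares the same slope, and rescales a latent score to the T metric. The slope value, thresholds, and function names are hypothetical; the study's parameters were estimated in PARSCALE, and this sketch does not perform any estimation.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities (0..m-1) for one item under a graded response model."""
    # Cumulative P*(X >= k), bracketed by 1.0 (k = 0) and 0.0 (k = m).
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


def to_t_score(theta, mean, sd):
    """Rescale a latent score to the T-scale (mean 50, SD 10)."""
    return 50.0 + 10.0 * (theta - mean) / sd


COMMON_SLOPE = 1.5  # hypothetical single slope shared by every item
item_thresholds = [-2.0, -1.0, 0.0, 1.0]  # hypothetical; must be increasing

probs = grm_category_probs(0.0, COMMON_SLOPE, item_thresholds)
print([round(p, 3) for p in probs])
print(to_t_score(1.0, mean=0.0, sd=2.0))  # prints 55.0
```

Fixing the slope leaves only the thresholds to vary by item, which is why this form needs fewer parameters than a freely estimated graded response model.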

Item invariance.

To examine item parameter stability, we grouped the participants into high- and low-function groups according to activity ability level and calculated the correlation between the item parameters estimated based on those 2 groups. With IRT, the child's score on an item should depend entirely on the latent variable (ability to perform activities). Significant DIF indicates that variables other than the activity variable, such as age (<11 years, ≥11 years), type (hemiplegic, diplegic, quadriplegic), or severity of CP (as assessed with the GMFCS or the MACS), are likely influencing the responses.53 The analysis of DIF was conducted using ordinal logistic regression.54 If a variable produced significant model coefficients and explained more than 2% of additional variance after the total score was accounted for, the item was considered to exhibit DIF. Because of the number of items that were analyzed in the final item bank (36 items), the alpha level was set to .0014 (using the Bonferroni adjustment, .05/36). If the likelihood ratio test was statistically significant and the R-square change was greater than .07, we designated that as large DIF. If the likelihood ratio test was statistically significant and the R-square change was between .035 and .07, we designated that as moderate DIF. Otherwise, values indicated small DIF.
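The three-level DIF rule above can be written as a small decision function. This minimal Python sketch assumes the likelihood-ratio p-value and R-square change have already been obtained from the ordinal logistic regression models, which it does not fit; treating the .035 and .07 bounds as inclusive for moderate DIF is an assumption, and the function name is hypothetical.

```python
def classify_dif(p_value, r2_change, n_items=36, alpha=0.05):
    """Label DIF magnitude from a likelihood-ratio p-value and R-square change."""
    adjusted_alpha = alpha / n_items  # Bonferroni: .05 / 36 is about .0014
    significant = p_value < adjusted_alpha
    if significant and r2_change > 0.07:
        return "large"
    if significant and 0.035 <= r2_change <= 0.07:
        return "moderate"
    return "small"


print(classify_dif(0.0005, 0.05))  # prints moderate
```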

Scale coverage.

To evaluate the matching of item content with the estimated activity scores of the sample, we produced parallel item maps in which item category expected values55 and person scores were plotted on the same metric (mean=50, SD=10). The expected value is the sum of the category values multiplied by their probabilities:

$$E_i(\theta) = \sum_{j=0}^{m-1} j \, P_{ij}(\theta)$$

where Ei(θ) is the expected value for item i at score level θ, m is the number of rating scale categories, and Pij(θ) is the category probability for item i category j at score θ.

Each response category is located at the score at which the item's expected value falls in the middle of that category's range. For each item, the locations of all 5 response categories were plotted in this item map. These expected item response category values are used rather than step estimates because the expected values are more representative of the full content range of each item. The content range was based on the estimated locations of the item-response categories that represent the lowest and highest levels of ability of the sample. In addition, we identified the number of individuals who received the highest possible score (ceiling) and the lowest possible score (floor).
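Locating a category on the item map amounts to solving E_i(θ) = target for θ. Because the expected value increases monotonically with θ under a graded response model, bisection is enough. This minimal Python sketch assumes a common-slope graded response model with categories coded 0 to m − 1; the exact target value used to represent the middle of each category is not specified here, so the example uses a hypothetical interior target, and the item parameters and function names are hypothetical.

```python
import math


def expected_score(theta, a, thresholds):
    """E_i(theta) = sum_j j * P_ij(theta) for categories 0..m-1 (graded response model)."""
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    probs = [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]
    return sum(j * p for j, p in enumerate(probs))


def theta_for_expected(target, a, thresholds, lo=-10.0, hi=10.0, tol=1e-8):
    """Find theta where E_i(theta) equals target, by bisection (E_i increases with theta)."""
    if not 0.0 < target < len(thresholds):
        raise ValueError("target must lie strictly between 0 and m - 1")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_score(mid, a, thresholds) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


a, thresholds = 1.5, [-1.5, -0.5, 0.5, 1.5]  # hypothetical symmetric item
theta_mid = theta_for_expected(2.0, a, thresholds)
print(theta_mid)
```

The resulting logit location would then be placed on the same T metric as the person scores with the same linear rescaling.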

CAT real data simulations.

We based the activity CAT algorithms on the HDRI software24 developed at the Health and Disability Research Institute, Boston, Massachusetts. The CAT software includes options for item selection, score estimation using weighted likelihood,52 and stop rules based on the number of items, the level of precision, or both. We used a real data simulation approach for investigating the merits of CAT; that is, the score estimated from each parent's complete set of actual item responses (the IRT criterion score) served as the criterion against which scores from the CAT were compared. As items were selected for administration in the simulation, responses were taken from the actual data set. We selected the item “getting up from the floor” as the first activity item administered to all participants because its difficulty parameter was in the middle of the range and its content seemed appropriate for most children. After each response, an estimated score based on all items administered to that point in the simulation, and its associated standard error, were calculated. The next item selected was the one that provided the highest information at the estimated score. We established specific stop rules based on the number of items (5, 10, or 15) and did not use precision for stop rule decisions. The validity of this real data simulation approach for studying CAT estimated scores assumes that people respond in much the same way to items regardless of their context; that is, neither the items that precede or follow an item nor the use of short versus long forms influences a person's responses. Basically, this is the assumption of independence of item responses that is made with all common IRT models. In the present study, we developed 3 CAT scores in the simulations to reflect the 3 stop rules based on number of items (CAT-15, CAT-10, and CAT-5).
These simulated scores were compared with a “gold standard” (ie, the actual IRT latent trait score for activity estimated by the full item bank).
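The real data simulation can be sketched as a loop that replays one respondent's complete response vector. This is a minimal Python sketch, not the HDRI software: it assumes a graded response model with a common slope and selects items by maximum Fisher information, and, as a simplification, it scores with an expected a posteriori (EAP) estimate on a grid with a standard normal prior instead of the weighted likelihood estimator the study used. The bank, the response vector, and all function names are hypothetical; the fixed first item stands in for “getting up from the floor.”

```python
import math

# Quadrature grid on the logit scale (an assumption for this sketch).
GRID = [g / 10.0 for g in range(-60, 61)]


def cumulative(theta, a, thresholds):
    """P*(X >= k) for k = 0..m, bracketed by 1.0 and 0.0 (graded response model)."""
    return [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]


def category_probs(theta, a, thresholds):
    cum = cumulative(theta, a, thresholds)
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


def information(theta, a, thresholds):
    """Item Fisher information: sum over categories of (dP_k/dtheta)^2 / P_k."""
    cum = cumulative(theta, a, thresholds)
    deriv = [a * c * (1.0 - c) for c in cum]  # equals 0 at the 1.0 and 0.0 bounds
    info = 0.0
    for k in range(len(cum) - 1):
        p = cum[k] - cum[k + 1]
        if p > 1e-12:
            info += (deriv[k] - deriv[k + 1]) ** 2 / p
    return info


def eap(responses, bank):
    """EAP score under a standard normal prior (stand-in for weighted likelihood)."""
    num = den = 0.0
    for t in GRID:
        w = math.exp(-0.5 * t * t)
        for item, x in responses.items():
            a, thresholds = bank[item]
            w *= category_probs(t, a, thresholds)[x]
        num += t * w
        den += w
    return num / den


def simulate_cat(person, bank, first_item, max_items):
    """Replay one respondent's complete responses through a fixed-length CAT."""
    answered = {first_item: person[first_item]}
    stop_at = min(max_items, len(bank))  # stop rule: number of items only
    while len(answered) < stop_at:
        theta = eap(answered, bank)
        remaining = [i for i in range(len(bank)) if i not in answered]
        nxt = max(remaining, key=lambda i: information(theta, *bank[i]))
        answered[nxt] = person[nxt]  # response looked up in the real data set
    return eap(answered, bank), sorted(answered)


# Hypothetical 6-item bank: common slope 1.5, thresholds shifted by item location.
locations = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
bank = [(1.5, [loc + d for d in (-1.5, -0.5, 0.5, 1.5)]) for loc in locations]
person = [4, 4, 3, 2, 2, 1]  # one parent's full response vector (0-4 ratings)

criterion = eap(dict(enumerate(person)), bank)  # full item bank score
for n in (2, 3, 6):
    score, used = simulate_cat(person, bank, first_item=2, max_items=n)
    print(n, used, round(score, 3), round(criterion, 3))
```

Comparing each stop rule's score with `criterion` mirrors the comparison of CAT-5, CAT-10, and CAT-15 scores against the full item bank score; when the stop rule equals the bank size, the two estimates coincide.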

Discriminant and concurrent validity.

Our logic in analyzing the concurrent and discriminant validity was to determine whether the person score estimates from the full item bank and the 3 CAT versions could produce interpretable scores. The ability of the full item bank and each CAT version (5-, 10-, and 15-item stop rules) to discriminate between groups of children based on levels of severity was evaluated by comparing average scores across the MACS and GMFCS levels using one-way analysis-of-variance tests with post hoc comparisons. Because of the relatively small numbers in GMFCS and MACS levels IV and V, we combined them. To assess concurrent validity, Pearson correlations were calculated between the full item bank and the PODCI physical function and sports skills summary score, the WeeFIM motor score, and the PedsQL-CP daily activity and school activity subscales.
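The group comparison and concurrent validity statistics can be sketched in plain Python. This minimal sketch assumes scores and severity levels are supplied as parallel lists; it merges levels IV and V as described, computes a one-way analysis-of-variance F statistic with its degrees of freedom, and computes a Pearson correlation. Post hoc comparisons are not included, and the function names and example data are hypothetical.

```python
from collections import defaultdict


def combine_levels(level):
    """Merge GMFCS/MACS levels IV and V into one group."""
    return "IV-V" if level in ("IV", "V") else level


def one_way_anova(scores, levels):
    """Return (F, df_between, df_within) for scores grouped by combined severity level."""
    groups = defaultdict(list)
    for s, lv in zip(scores, levels):
        groups[combine_levels(lv)].append(s)
    n, k = len(scores), len(groups)
    grand = sum(scores) / n
    means = {g: sum(v) / len(v) for g, v in groups.items()}
    ss_between = sum(len(v) * (means[g] - grand) ** 2 for g, v in groups.items())
    ss_within = sum((x - means[g]) ** 2 for g, v in groups.items() for x in v)
    df_b, df_w = k - 1, n - k
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


scores = [60, 62, 50, 52, 40, 42, 30, 34]  # hypothetical T-scores
levels = ["I", "I", "II", "II", "III", "III", "IV", "V"]
print(one_way_anova(scores, levels))
```

With four groups after merging, the numerator degrees of freedom are k − 1 = 3 and the denominator degrees of freedom are N − 4.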

Funding Source for the Study

This study was supported by the Shriners Hospital for Children Foundation (grant 8957) and an Independent Scientist Award to Dr Haley (National Center on Medical Rehabilitation Research/National Institute of Child Health and Human Development/National Institutes of Health, grant K02 HD45354–01A1).


We reduced the activity item set from 45 to 36 items to form the final item bank. These 36 items met all of the criteria for the final item bank and were incorporated into the CAT simulation program. Items were removed based on poor item fit, low discrimination, or redundant content, as described below. The Cronbach alpha for the 36-item scale was .98.
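For reference, Cronbach alpha for k items is k/(k − 1) multiplied by (1 minus the sum of the item variances divided by the variance of the total scores). The minimal Python sketch below assumes a complete respondents-by-items matrix with no missing ratings and uses sample variances; the function name and example ratings are hypothetical.

```python
def cronbach_alpha(rows):
    """Cronbach alpha for a respondents x items matrix of item scores."""
    k = len(rows[0])

    def var(values):  # sample variance with an n - 1 denominator
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values) / (len(values) - 1)

    item_vars = [var([r[j] for r in rows]) for j in range(k)]
    total_var = var([sum(r) for r in rows])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


# Hypothetical 0-4 ratings: two perfectly consistent items give alpha = 1.
print(cronbach_alpha([[0, 0], [2, 2], [4, 4]]))  # prints 1.0
```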


Based on the final item bank of 36 items, one factor explained 72% of the item variance, and all of the factor loadings were moderate to very high (range=0.607–0.918). The ratio of the first to the second eigenvalue was 18.8:1. The CFI and TLI values indicated acceptable fit; the RMSEA of 0.103 was higher than the acceptable range. A summary of the CFA results for the 45- and 36-item scales is provided in Table 2. The average inter-item correlation was .70 (SD=.09). Based on the multiple fit indexes, factor analysis, and inter-item correlation results, we concluded that the unidimensionality assumption of the activity scale was met. There were no residual correlations greater than .2, so the local independence assumption also was satisfied.

Table 2.
Confirmatory Factor Analysis Results From the Activity Item Bank

Item Calibrations and Fit

The data fit the fixed-slope GRM (χ²=842, df=815, P=.243). Item fit was generally acceptable in the final item bank, with the exception of 2 items (“eats a meal” [P=.014] and “gets into and out of a car” [P=.02]). We chose to retain these items in the bank because of the importance of their content or their location along the activity scale.

Item Invariance

We examined item parameter stability by grouping participants into high- and low-function groups according to activity ability level; the correlation between the item parameters estimated based on those 2 groups was .88. The activity items in the final item bank are listed in the Appendix.

No DIF was noted for MACS level. Only one item (“climbs and moves on high playground equipment”) showed moderate DIF for age. Two items (“hops and skips while playing games with other children of similar age, such as during hopscotch or a relay race” and “prepares to eat a meal”) showed moderate DIF by CP diagnosis. Children with quadriplegia had more difficulty with these 2 items compared with children with either hemiplegia or diplegia. Three items (“prepares to eat a meal,” “crosses a quiet 2-lane neighborhood street,” and “keeps up with other children of similar age while walking up stairs”) showed moderate DIF across GMFCS levels, with children with greater gross motor severity having more difficulty with these items. Because of their important content, as indicated during the cognitive testing sessions, we did not remove any of the DIF items at this time. In the future, these items may be removed or revised.

Scale Coverage

We found generally good coverage of the sample with the 36 items, as indicated in Figure 1, which displays the item map of person scores and expected response category values for all 36 items. For example, the expected category value for “unable to do” for the item “eats a meal” (easiest item) is about 26 on the mean of 50 (SD=10) metric. The expected category value for “without any difficulty” for the most-difficult item (“hikes or jogs for 2 or more miles”) is about 75. Locations for all of the other categories for the other items in between are depicted in Figure 1. There were minimal ceiling effects (n=3 [1%]) and floor effects (n=11 [3.6%]). A couple of areas along the vertical item category scale (around a score of 50 to 60 in Fig. 1) had some small gaps in content coverage.

Figure 1.
Item map for activity scale. The item categories are represented by the activity summary score that produces the expected value in the middle of each category range. The expected value is defined by the sum of the category values multiplied by their probabilities. ...

CAT Simulations

As reported in Table 3, the descriptive statistics for scores from the 10- and 15-item CAT simulations were quite similar to those for the full item bank score. The mean score of the 5-item CAT was slightly lower than the full item bank score, and the variance and range of the 5-item CAT scores were identical. The Pearson correlations between CAT scores and the full item bank scores were quite strong, even in the 5-item CAT simulation, indicating that the CAT scores accurately captured the information from the entire item bank.

Table 3.
Comparison of Scores From Simulated Computerized Adaptive Testing (CAT) and Full Item Bank

Discriminant and Concurrent Validity

Because of the relatively small number of participants in GMFCS and MACS levels IV and V, the groups were combined for discriminant analyses. The full activity scale with 36 items was able to discriminate across known groups of upper-extremity (F(3,284)=53.59, P<.0001) and gross motor (F(3,287)=99.58, P<.0001) severity levels (Figs. 2 and 3). Post hoc tests yielded significant results across all categories. In addition, 5-item (F(3,286)=92.16, P<.0001), 10-item (F(3,286)=101.99, P<.0001), and 15-item (F(3,287)=98.55, P<.0001) simulated CAT scores were able to discriminate among GMFCS categories, and 5-item (F(3,283)=46.82, P<.0001), 10-item (F(3,283)=53.53, P<.0001), and 15-item (F(3,284)=53.67, P<.0001) simulated CAT scores were able to discriminate among MACS categories. In 5-item CAT simulations, the only adjacent pair that did not show a significant difference was the comparison between MACS levels II and III.

Figure 2.
Discriminant validity of the activity scale with the Manual Ability Classification System. Bar graphs depict average score (95% confidence interval) at different manual severity levels with different Computerized Adaptive Test item stop rules ...
Figure 3.
Discriminant validity of the activity scale with the Gross Motor Function Classification System. Bar graphs depict average score (95% confidence interval) at different gross motor severity levels with different Computerized Adaptive Test item ...

When compared with the PODCI physical function and sports subscale, the activity scale correlation was r=.86 for scores from the full item bank. The Wee-FIM motor scores (r=.79) and the PedsQL-CP daily activity scores (r=.74) also were strongly correlated with the new activity item bank scores.


Assessment of the conduct of daily activities at home and school is a key component of an overall health and functional evaluation for children with CP.6,18 Performing age-related daily activities can pose significant challenges for children with CP, and measuring this ability can help determine the long-term effects of specific interventions and the quality of an overall rehabilitation and physical therapy program. Determining the impact of interventions, both at the program level and for individual children with CP, has become complicated: different hospitals, clinics, and programs, both within and outside the SHC system, use different parent-report instruments whose results cannot easily be compared. The results of this study suggest that a single activity scale might provide a uniform assessment approach for children with CP across a wide age range and across multiple levels of severity.

Based on the CFAs and item fit analyses, the final set of activity items was sufficiently unidimensional to meet the assumption of IRT modeling. A number of items with large misfit were removed. For example, “social dancing” could be performed at a very rudimentary level by children in wheelchairs, whereas children with high levels of physical functioning may choose not to take part in dancing because of a lack of peer acceptance or discomfort with the activity. We did keep in the bank a few items that exceeded the threshold for DIF or fit; these items were retained mainly for their content. Keeping items that exhibit either DIF or misfit may have negatively affected the estimation of scores. However, these items appeared to have a trivial effect on the CAT-15 scores and likely a small effect on the CAT-10 scores.

Using the full set of 36 final items, we found minimal ceiling and floor effects across a very diverse sample of children with CP with a wide age range. None of the children with the minimum score (n=11) were 2 to 5 years of age; the youngest child who received a minimum score was 7 years of age, and the average age of children who received a minimum score was 12 years. These findings indicate that the coverage for all ages was appropriate, although we had some difficulty covering children with the most-severe activity restrictions. We did have relatively small numbers of children at both the low and high ends of the age range of 2 to 21 years, which limits what we know about how well the scale works across the entire age range. We hope in the future to sample more children in the lower (2–5 years) and higher (14–21 years) age groups to make sure the item bank is robust for these age groups. As shown in Figure 1, we found only small gaps in item coverage along the full activity scale. A previous study19 has reported major problems with ceiling effects for children with CP at GMFCS level I and floor effects for children with CP at GMFCS levels IV and V. We will need to address the floor effects for children who are most severely involved in future item bank development.

The results of the CAT simulations indicate that the 5-, 10-, and 15-item models yield accurate estimates of activity in children with CP. Several other studies on the development of CAT models also have conducted simulations using real data sets, comparing responses to all items in the item bank with CAT simulations of various lengths.21,24,27,28 We believe these simulations are likely good approximations of actual CAT administrations, yet some overestimation is possible. In future studies, administration of the full item bank along with the CAT models is recommended in prospective clinical studies to examine accuracy in real clinical situations.

These preliminary data indicate that the full item bank and all 3 CAT versions can discriminate among GMFCS and MACS levels for children with CP. It is noteworthy that all of the simulated CAT versions were able to discriminate across the 4 categories (combined IV and V) of the GMFCS and MACS. One limitation we should note is that although the GMFCS is valid for children up through the age of 12 years, we applied it to the entire sample. Future work should use the expanded and revised version of the GMFCS.56

The activity scale did not discriminate between MACS levels II and III, whereas the companion scale of upper-extremity skills clearly discriminated between these 2 levels.34 The PODCI physical function and sports subscale, the PedsQL-CP daily activity scale, and the Wee-FIM motor scale appear to measure activity concepts related to those of our activity scale, as indicated by their high positive correlations with the activity summary scores.
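The concurrent-validity evidence rests on Pearson correlations between the activity summary scores and the other instruments. A minimal sketch with hypothetical paired scores is shown below. The Fisher z confidence interval is added for illustration only and is not reported in this section.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for r via the Fisher z transformation."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Hypothetical paired scores: activity CAT score vs another activity measure.
activity = [35, 42, 50, 55, 61, 68, 74]
other = [40, 45, 47, 58, 60, 71, 70]
r = pearson_r(activity, other)
print(round(r, 3), fisher_ci(r, len(activity)))
```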

Our sample size was relatively small for the analyses we conducted. We minimized the effects of the small sample by using a one-parameter model, which requires fewer item parameters to be estimated than a 2-parameter model. In the future, we will use a larger calibration sample so that we can be more confident of the findings and, if warranted, extend our analyses to 2-parameter models. We decided to publish the results at this stage because the findings were compelling and this work is one of the first examples of developing a CAT for children with CP. Our data are not fully representative of population-based samples of children with CP, because children with more-severe mobility limitations were underrepresented. Nevertheless, we had sufficient low-level activity items for most of the children with low activity levels.
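The trade-off between one- and 2-parameter models can be seen in the category response functions. The sketch below assumes ordinal items modeled with the partial credit model (a one-parameter polytomous model) and its generalized form, which adds an item discrimination (slope) parameter; this section does not name the specific models, and the thresholds and 4-category response format are hypothetical. The parameter counts show why a one-parameter model asks less of a small calibration sample.

```python
import math


def category_probs(theta, thresholds, a=1.0):
    """Response-category probabilities for one ordinal item.

    With a fixed at 1.0 this is the partial credit model (one-parameter);
    estimating a separately for each item gives the generalized partial
    credit model (2-parameter).
    """
    logits = [0.0]  # category 0 corresponds to an empty sum
    for d in thresholds:
        logits.append(logits[-1] + a * (theta - d))
    top = max(logits)  # subtract the maximum for numerical stability
    expo = [math.exp(z - top) for z in logits]
    total = sum(expo)
    return [e / total for e in expo]


def n_item_parameters(n_items, n_thresholds, free_slopes):
    """Item parameters to estimate: thresholds, plus one slope per item if free."""
    return n_items * n_thresholds + (n_items if free_slopes else 0)


thresholds = [-1.0, 0.0, 1.2]  # hypothetical 4-category item
print([round(p, 3) for p in category_probs(0.5, thresholds)])
print([round(p, 3) for p in category_probs(0.5, thresholds, a=1.8)])
print(n_item_parameters(36, 3, free_slopes=False), n_item_parameters(36, 3, free_slopes=True))  # 108 144
```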

We acknowledge an additional concern in interpreting the data. We combined what are arguably 2 distinct modes of data collection: data collection in the clinic and data collection by parents at home via the Internet (n=11). We combined these modes primarily to increase our sample size. The difference in mode may have introduced some error and bias, although we believe any such error or bias was small.

Although most of the indicators we used (TLI, CFI, inter-item correlations, and factor analysis results) suggested good fit to a unidimensional scale, the RMSEA was higher than ideal. This may have been due to our inclusion of some nontraditional activity items in the item pool, such as taking part in indoor games and sports. We considered these items important to the activity construct and expected them to fit within a single activity scale. Future work will be needed to confirm these findings.
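For reference, the fit indices named above are commonly computed from the model and baseline chi-square statistics, as sketched below. These are the standard textbook formulas offered as an illustration. The chi-square values are hypothetical; software packages differ in details (for example, some use N rather than N − 1 in the RMSEA); and robust estimators for categorical data apply adjustments that these hand calculations ignore. Lower RMSEA values indicate better fit, whereas CFI and TLI approach 1 as fit improves. Frequently cited guidelines (CFI and TLI of about .95 or higher, RMSEA of about .06 or lower) are rules of thumb, and the use of fixed RMSEA cutoffs has been questioned.47

```python
import math


def rmsea(chi2, df, n):
    """Root mean square error of approximation (N - 1 convention)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_m, df_m, chi2_b, df_b):
    """Comparative fit index relative to a baseline (independence) model."""
    d_model = max(chi2_m - df_m, 0.0)
    d_base = max(chi2_b - df_b, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


def tli(chi2_m, df_m, chi2_b, df_b):
    """Tucker-Lewis index; being non-normed, it can fall outside [0, 1]."""
    ratio_b = chi2_b / df_b
    return (ratio_b - chi2_m / df_m) / (ratio_b - 1.0)


# Hypothetical values: model chi2 = 100 on 50 df, baseline 1050 on 60 df, N = 301.
print(round(rmsea(100, 50, 301), 3), round(cfi(100, 50, 1050, 60), 3), round(tli(100, 50, 1050, 60), 3))
# 0.058 0.949 0.939
```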

Several additional steps in CAT development are needed before the activity CAT can be considered ready for widespread use. First, we have conducted a test-retest reliability study of the activity CAT; the results will be presented in a future report. Second, we are currently testing the sensitivity and responsiveness of the activity CAT in a series of children undergoing lower-extremity surgery. Third, we are expanding the item bank to include children with conditions other than CP; in the first phase, we are building and testing new items for children with brachial plexus birth palsy. This effort may also supply items that are needed at the lower end of the activity continuum.
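Test-retest reliability of CAT scores is often summarized with an intraclass correlation coefficient. The report does not state which form will be used; the sketch below assumes the Shrout and Fleiss ICC(2,1) (two-way random effects, absolute agreement, single measurement) and uses hypothetical scores from 2 administrations.

```python
def icc_2_1(data):
    """ICC(2,1) for a list of rows, one per child, each holding k repeated scores."""
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between children
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between occasions
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical test and retest CAT scores for 5 children.
scores = [[48, 50], [55, 54], [62, 65], [40, 41], [70, 69]]
print(round(icc_2_1(scores), 3))
```

Because this form measures absolute agreement, a uniform shift between sessions (for example, every child scoring 1 point higher at retest) lowers the coefficient, which is usually desirable when the same scale will be used to track change.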

If these steps are successful, CAT versions will be used within the SHC system to evaluate changes in activity in children with CP after orthopedic surgery and after conservative interventions such as bracing, therapy, and spasticity (hypertonicity) management with medications and injections. Using CAT versions of the other scales (global physical health and upper- and lower-extremity skills) together with the activity scale will help physical therapists and other clinicians understand how changes in different functional areas relate to one another after rehabilitation interventions.

Conclusions

The activity item bank met the required IRT assumptions and covered the range of activity seen in children with CP between the ages of 2 and 20 years. Based on the CAT simulations, CAT versions of the activity scale yielded summary scores comparable to those estimated using all items and discriminated across CP severity levels to the same degree as the full set of items. We conclude that this initial item bank development work has the potential to produce a CAT that efficiently assesses activity functioning in children with CP.


Item Characteristics and Differential Item Functioning (DIF)


Dr Haley, Ms Fragala-Pinkham, Ms Dumas, Dr Ni, Mr Gorton, Dr Watson, and Dr Tucker provided concept/idea/research design. Dr Haley, Ms Fragala-Pinkham, Ms Dumas, Dr Ni, and Dr Tucker provided writing. Ms Fragala-Pinkham, Ms Dumas, Dr Ni, Mr Gorton, Dr Watson, Ms Montpetit, Ms Bilodeau, and Dr Tucker provided data collection. Dr Haley, Dr Ni, Mr Gorton, Dr Hambleton, and Dr Tucker provided data analysis. Dr Haley and Dr Tucker provided project management. Dr Tucker provided fund procurement and facilities/equipment. Ms Fragala-Pinkham, Ms Dumas, Mr Gorton, Ms Montpetit, Ms Bilodeau, and Dr Tucker provided participants. Ms Dumas, Ms Montpetit, and Dr Tucker provided institutional liaisons. Ms Fragala-Pinkham, Ms Dumas, Dr Ni, Ms Montpetit, Ms Bilodeau, Dr Hambleton, and Dr Tucker provided consultation (including review of manuscript before submission).

Human subject approval was obtained at each participating institution and through the Boston University Institutional Review Board.

This study was supported by the Shriners Hospital for Children Foundation (grant 8957) and an Independent Scientist Award to Dr Haley (National Center on Medical Rehabilitation Research/National Institute of Child Health and Human Development/National Institutes of Health, grant K02 HD45354-01A1).

*Muthén & Muthén, 3463 Stoner Ave, Los Angeles, CA 90066.

Scientific Software International Inc, 7383 N Lincoln Ave, Ste 100, Lincolnwood, IL 60712-1747.

References

1. Oeffinger DJ, Tylkowski CM, Rayens MK, et al. Gross Motor Function Classification System and outcome tools for assessing ambulatory cerebral palsy: a multicenter study. Dev Med Child Neurol. 2004;46:311–319.
2. Calmes J, Damiano D, Oeffinger DJ, et al; Group TFAR. Relationship and redundancy among measures of pediatric health-related function in ambulatory children with cerebral palsy. Dev Med Child Neurol. 2005;47(suppl 102):24–25.
3. Sullivan E, Douglas B, Linton JL, et al. Relationships among functional outcome measures used for assessing children with ambulatory CP. Dev Med Child Neurol. 2007;49:338–349.
4. Oeffinger DJ, Gorton GE, Bagley A, et al. Outcome assessments in children with cerebral palsy, part I: descriptive characteristics of GMFCS levels I to III. Dev Med Child Neurol. 2007;49:172–180.
5. Palisano RJ, Copeland WP, Galuppi BE. Performance of physical activities by adolescents with cerebral palsy. Phys Ther. 2007;87:77–87.
6. Schenker R, Coster W, Parush S. Participation and activity performance of students with cerebral palsy within the school environment. Disabil Rehabil. 2005;27:539–552.
7. Msall ME, Avery RC, Tremont MR, et al. Functional disability and school activity limitations in 41,300 school-age children: relationship to medical impairments. Pediatrics. 2003;111:548–553.
8. International Classification of Functioning, Disability and Health (ICF). Geneva, Switzerland: World Health Organization; 2001.
9. Tervo RC, Symons F, Stout J, Novacheck T. Parental report of pain and associated limitations in ambulatory children with cerebral palsy. Arch Phys Med Rehabil. 2006;87:928–934.
10. Pirpiris M, Gates PE, McCarthy JJ, et al. Function and well-being in ambulatory children with cerebral palsy. J Pediatr Orthop. 2006;26:119–124.
11. Majnemer A, Mazer B. New directions in the outcome evaluation of children with cerebral palsy. Semin Pediatr Neurol. 2004;11:11–17.
12. Haley SM, Coster WJ, Ludlow LH, et al. Pediatric Evaluation of Disability Inventory: Development, Standardization, and Administration Manual. Boston, MA: Trustees of Boston University; 1992.
13. Daltroy LH, Cats-Baril WL, Katz JN, et al. The North American Spine Society Outcome Assessment Instrument: reliability and validity tests. Spine. 1996;21:741–749.
14. Young N, Williams J, Yoshida K, Wright J. Measurement properties of the Activities Scale for Kids. J Clin Epidemiol. 2000;53:125–137.
15. Varni JW, Burwinkle TM, Berrin SJ, et al. The PedsQL in pediatric cerebral palsy: reliability, validity, and sensitivity of the Generic Core Scales and Cerebral Palsy Module. Dev Med Child Neurol. 2006;48:442–449.
16. Novacheck TF, Stout JL, Tervo R. Reliability and validity of the Gillette Functional Assessment Questionnaire as an outcome measure in children with walking disabilities. J Pediatr Orthop. 2000;20:75–81.
17. Graham HK, Harvey A, Rodda J, et al. The Functional Mobility Scale. J Pediatr Orthop. 2004;24:514–520.
18. Harvey A, Robin J, Morris M, et al. A systematic review of measures of activity limitation for children with cerebral palsy. Dev Med Child Neurol. 2008;50:190–198.
19. McCarthy ML, Silberstein CE, Atkins EA, et al. Comparing reliability and validity of pediatric instruments for measuring health and well-being of children with spastic cerebral palsy. Dev Med Child Neurol. 2002;44:468–476.
20. Wainer H. Computerized Adaptive Testing: A Primer. Mahwah, NJ: Lawrence Erlbaum Associates; 2000.
21. Coster WJ, Haley SM, Ni P, et al. Assessing self-care and social function using a computer adaptive testing version of the pediatric evaluation of disability inventory. Arch Phys Med Rehabil. 2008;89:622–629.
22. Mulcahey MJ, Haley SM, Duffy T, et al. Measuring physical functioning in children with spinal impairments with computerized adaptive testing. J Pediatr Orthop. 2008;28:330–335.
23. Jacobusse G, van Buuren S. Computerized adaptive testing for measuring development of young children. Stat Med. 2007;26:2629–2638.
24. Haley SM, Ni P, Ludlow LH, Fragala-Pinkham MA. Measurement precision and efficiency of multidimensional computer adaptive testing of physical functioning using the Pediatric Evaluation of Disability Inventory. Arch Phys Med Rehabil. 2006;87:1223–1229.
25. Haley SM, Fragala-Pinkham MA, Ni P. Sensitivity of a computer adaptive assessment for measuring functional mobility changes in children enrolled in a community fitness programme. Clin Rehabil. 2006;20:616–622.
26. Haley SM, Ni P, Hambleton RK, et al. Computer adaptive testing improves accuracy and precision of scores over random item selection in a physical functioning item bank. J Clin Epidemiol. 2006;59:1174–1182.
27. Haley SM, Ni P, Fragala-Pinkham MA, et al. A computer adaptive testing approach for assessing physical function in children and adolescents. Dev Med Child Neurol. 2005;47:113–120.
28. Haley SM, Raczek AE, Coster WJ, et al. Assessing mobility in children using a computer adaptive testing version of the Pediatric Evaluation of Disability Inventory (PEDI). Arch Phys Med Rehabil. 2005;86:932–939.
29. Sands W, Waters BK, McBride JR. Computerized Adaptive Testing: From Inquiry to Operation. Washington, DC: American Psychological Association; 1997.
30. Tucker CA, Haley SM, Watson K, et al. Physical function for children and youth with cerebral palsy: item bank development for computer adaptive testing. J Pediatr Rehabil Med. 2008;1:237–244.
31. Howard J, Soo B, Graham H, et al. Cerebral palsy in Victoria: motor types, topography and gross motor function. J Paediatr Child Health. 2005;41:479–483.
32. Dumas HM, Watson K, Fragala-Pinkham MA, et al. Cognitive interviewing to elicit parent feedback of test items for assessing physical function in children with cerebral palsy. Pediatr Phys Ther. 2008;20:356–362.
33. Haley SM, Ni P, Dumas HM, et al. Measuring global physical health in children with cerebral palsy: illustration of a bi-factor model and computerized adaptive testing. Qual Life Res. 2009 Feb 17 [Epub ahead of print].
34. Tucker CA, Montpetit K, Bilodeau N, et al. Assessment of children with cerebral palsy using a parent-report computer adaptive test, I: upper extremity skills. Dev Med Child Neurol. In press.
35. Tucker CA, Gorton GE, Watson K, et al. Assessment of children with cerebral palsy using a parent-report computer adaptive test, II: lower extremity and mobility skills. Dev Med Child Neurol. In press.
36. Palisano RJ, Rosenbaum PL, Walter S, et al. Development and reliability of a system to classify gross motor function in children with cerebral palsy. Dev Med Child Neurol. 1997;39:214–223.
37. Eliasson A, Krumlinde-Sundholm L, Rösblad B, et al. The Manual Ability Classification System (MACS) for children with cerebral palsy: scale development and evidence of validity and reliability. Dev Med Child Neurol. 2006;48:549–554.
38. Morris C, Kurinczuk JJ, Fitzpatrick R, Rosenbaum PL. Who best to make the assessment? Professionals’ and families’ classifications of gross motor function in cerebral palsy are highly consistent. Arch Dis Child. 2006;91:675–679.
39. Wood E, Rosenbaum PL. The Gross Motor Function Classification System for cerebral palsy: a study of reliability and stability over time. Dev Med Child Neurol. 2000;42:292–296.
40. Daltroy LH, Liang MH, Fossel AH, Goldberg MJ; Group POID. The POSNA Pediatric Musculoskeletal Functional Health Questionnaire: report on reliability, validity, and sensitivity to change. J Pediatr Orthop. 1998;18:561–571.
41. Guide for the Functional Independence Measure for Children (WeeFIM) of the Uniform Data System for Medical Rehabilitation, Version 4.0: Community/Outpatient. Buffalo, NY: State University of New York at Buffalo; 1993.
42. Hambleton RK, Swaminathan H, Rogers H. Fundamentals of Item Response Theory. Newbury Park, CA: Sage Publications; 1991.
43. van der Linden W, Hambleton RK. Handbook of Modern Item Response Theory. Berlin, Germany: Springer; 1997.
44. Mislevy RJ. Recent developments in the factor analysis of categorical variables. J Ed Stat. 1986;11:3–31.
45. Muthen BO, Muthen L. Mplus User's Guide. Los Angeles, CA: Muthen & Muthen; 1998.
46. Fan X, Sivo S. Sensitivity of fit indices to model misspecification and model types. Multivariate Behav Res. 2007;42:509–529.
47. Chen F, Curran P, Bollen K, et al. An empirical evaluation of the use of fixed cutoff points in RMSEA test statistic in structural equation models. Sociol Methods Res. 2008;36:462–494.
48. Ximénez C. A Monte Carlo study of recovery of weak factor loadings in confirmatory factor analysis. Structural Equation Modeling. 2006;13:587–614.
49. Maydeu-Olivares A. Limited information estimation and testing of Thurstonian models for paired comparison data under multiple judgment sampling. Psychometrika. 2001;66:209–228.
50. Reeve BB, Hays RD, Bjorner JB, et al. Psychometric evaluation and calibration of health-related quality of life item banks: plans for the Patient-Reported Outcomes Measurement Information System (PROMIS). Med Care. 2007;45(5 suppl 1):S22–S31.
51. Muraki E, Bock RD. PARSCALE: IRT Item Analysis and Test Scoring for Rating-Scale Data. Chicago, IL: Scientific Software International; 1997.
52. Warm TA. Weighted likelihood estimation of ability in item response theory. Psychometrika. 1989;54:427–450.
53. Swaminathan H, Rogers H. Detecting differential item functioning using logistic regression procedures. J Ed Meas. 1990;27:361–370.
54. Crane PK, Gibbons LE, Ocepek-Welikson K, et al. A comparison of three sets of criteria for determining the presence of differential item functioning using ordinal logistic regression. Qual Life Res. 2007;16(suppl 1):69–84.
55. Lai J-S, Cella D, Chang C-H, et al. Item banking to improve, shorten and computerize self-reported fatigue: an illustration of steps to create a core item bank from the FACIT-Fatigue Scale. Qual Life Res. 2003;12:485–501.
56. Palisano RJ, Rosenbaum P, Bartlett D, Livingston MH. Content validity of the expanded and revised Gross Motor Function Classification System. Dev Med Child Neurol. 2008;50:744–750.

Articles from Physical Therapy are provided here courtesy of American Physical Therapy Association
