NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Berkman ND, Wallace I, Watson L, et al. Screening for Speech and Language Delays and Disorders in Children Age 5 Years or Younger: A Systematic Review for the U.S. Preventive Services Task Force [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2015 Jul. (Evidence Syntheses, No. 120.)


3 Results

This chapter provides a comprehensive presentation of the evidence from the 2006 report and our updated searches. The KQs in this update are similar to those in the 2006 report, and we added three descriptive contextual questions. The contextual questions describe techniques used for speech and language screening, risk factors associated with speech and language delays, and the role of the primary care provider in screening when the screening occurs in other venues, such as daycare. The inclusion criteria across the two reviews are generally the same. Exceptions include the type of screening studies allowed. In this review, we limited the administration time of the instrument when used by a primary care provider; included studies with a broad range of children’s ages only if there were separate data for children age 5 years or younger; excluded studies of children with known conditions such as cleft palate; and required reference standards to be instruments known to be used by speech and language practitioners to diagnose speech and language delays or disorders in either research or clinical venues (Table 1). We limited treatment studies to RCTs, and only those with no treatment comparisons, because “usual care” implies inclusion of a treatment arm. To be comparable to the United States, we required the setting to be in countries with a very high Human Development Index.

Table 1. Differences in Included Studies in the 2006 Review and Current Review.


We first report on the yields from our literature searches. The results presented below first summarize and then describe new studies identified by the updated search. Next, we summarize studies from the 2006 report that continue to meet inclusion and quality criteria. In relation to screening, we included 16 good- or fair-quality studies (in 26 publications) of the 35 studies included in the prior report, and in relation to treatment, we included seven good- or fair-quality studies of 14 earlier included studies. Table 2 lists all studies included for analysis in this review. Reasons for study exclusion are detailed below. We follow with a synthesis of the overall (new, then old) evidence, noting results for subgroups when such data are available. Appendix D contains full evidence tables for each KQ.

Table 2. Comparison of Studies Meeting Inclusion and Key Question Quality Criteria in Previous and Present USPSTF Reviews.


Literature Search

Figure 2 illustrates the yield at each stage of the review process for the update search. We reviewed 1,497 titles and abstracts dually and independently, and identified 555 studies for full-text review. Evidence to answer KQs was obtained from 38 studies (in 40 articles) and two systematic reviews. Fifty-five additional studies were used solely to answer contextual questions. More specifically, of the 52 fair- or good-quality studies on screening or intervention included in the previous review, 27 studies (28 articles) met the inclusion criteria for this review. Four studies rated as good or fair quality in the earlier review were newly rated as poor quality and were not included in our analysis.66–69 Eight new screening studies (in nine publications) and six new treatment studies met our inclusion and quality eligibility criteria following dual independent review.

This figure shows the flow of articles through the systematic review process. 1,556 records were identified through database searching: 906 records through PubMed, 7 through Cochrane, 221 through PsycInfo, and 212 through CINAHL; 210 instruments were identified. Instruments were searched by name across the database. Additionally, 67 records were identified from the previous report and 35 records were found through a hand search. A total of 161 duplicates were removed. 1,497 records were screened and 942 records were excluded. This included 6 irretrievable abstracts. 555 full-text articles were assessed for eligibility and 436 full-text articles were excluded. The reason for exclusion and number of articles excluded is as follows: not original research (70); not published in English (2); wrong age range, probable reason for delay or disorder identified prior to speech and language diagnostic procedure, or wrong population of interest (125); wrong comparison (136); wrong design (20); no speech or language component (50); wrong geographic setting (10); no accuracy information (13); and article irretrievable (10). 115 studies in 119 articles were included in the systematic review. This includes 26 studies that were rated as poor quality.

Figure 2

Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) Tree.
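The counts in the Figure 2 description can be cross-checked with simple arithmetic. The sketch below is illustrative only: the variable names are ours, and it assumes that the 210 records from instrument-name searches are part of the 1,556 records attributed to database searching, which is the reading under which the reported totals reconcile.

```python
# Counts transcribed from the Figure 2 description in this chapter.
database_records = {"PubMed": 906, "Cochrane": 7, "PsycInfo": 221, "CINAHL": 212}
instrument_name_records = 210  # assumed to be included in the 1,556 total
previous_report_records = 67
hand_search_records = 35
duplicates_removed = 161
excluded_at_abstract = 942

full_text_exclusions = {
    "not original research": 70,
    "not published in English": 2,
    "wrong age range, prior diagnosis, or wrong population": 125,
    "wrong comparison": 136,
    "wrong design": 20,
    "no speech or language component": 50,
    "wrong geographic setting": 10,
    "no accuracy information": 13,
    "article irretrievable": 10,
}

# Walk through the flow stage by stage.
identified = sum(database_records.values()) + instrument_name_records
screened = (identified + previous_report_records + hand_search_records
            - duplicates_removed)
full_text_assessed = screened - excluded_at_abstract
full_text_excluded = sum(full_text_exclusions.values())
included_articles = full_text_assessed - full_text_excluded

print(identified, screened, full_text_assessed,
      full_text_excluded, included_articles)
```

Under that assumption, each computed stage matches the figure description: 1,556 identified, 1,497 screened, 555 assessed at full text, 436 excluded, and 119 included articles.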

Key Question 1. Does Screening for Speech and Language Delays or Disorders Lead to Improved Speech and Language Outcomes as Well as Improved Outcomes in Domains Other Than Speech and Language?

Although one new study met our inclusion criteria,70,71 it was rated as poor quality, resulting in no evidence being available to answer this KQ. The study randomized a large sample of children in The Netherlands who attended regularly scheduled visits at child health centers. Children were randomized at age 15 months to receive screening/no screening at ages 18 and 24 months, and then followed to age 8 years. The study found no significant differences between the two arms in language performance at age 36 months. At age 8 years, children in the screening arm were less likely to be in a special school but not less likely to have repeated a grade because of language problems. A comparison of children screened versus not screened found that children who were screened were less likely to be in the lowest 10th percentile for oral language testing.

Of primary concern in this study was a large attrition rate. Of 6,485 children randomized to the screening group, 3,776 were fully screened. The study obtained outcomes for 3,118 children in this arm, only 1,980 of whom had been fully screened. Of 4,955 children randomized to the control arm, outcome measures were obtained for 2,288 children.
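To make the scale of attrition concrete, the proportions implied by these counts can be computed directly. This is a minimal sketch using only the numbers reported above; the helper name `retention` is hypothetical and not part of the original study's analysis.

```python
def retention(n_with_outcomes, n_randomized):
    """Proportion of randomized children for whom outcomes were obtained."""
    return n_with_outcomes / n_randomized

# Counts reported in the text for the two trial arms.
screening_retention = retention(3118, 6485)
control_retention = retention(2288, 4955)

# Share of the screening arm that was fully screened, and share of
# screening-arm outcomes that came from fully screened children.
fully_screened_share = 3776 / 6485
outcomes_from_fully_screened = 1980 / 3118

for label, value in [
    ("screening arm with outcomes", screening_retention),
    ("control arm with outcomes", control_retention),
    ("screening arm fully screened", fully_screened_share),
    ("screening-arm outcomes from fully screened children",
     outcomes_from_fully_screened),
]:
    print(f"{label}: {value:.1%}")
```

In both arms, outcomes were available for fewer than half of the children randomized.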

Breaking the randomization, cohort analyses were conducted comparing children who were screened (a subgroup of the intervention arm) versus not screened (obtained from the intervention and control arms). This analysis did not control for other possible differences between children in the two groups that could result in poorer outcomes, such as autism spectrum disorder or hearing, developmental, and emotional problems that may have arisen following the initial screening.

Key Question 2. Do Screening Evaluations in the Primary Care Setting Accurately Identify Children for Diagnostic Evaluations and Interventions?

Key Question 2a. What Is the Accuracy of These Screening Techniques, and Does It Vary By Age, Cultural/Linguistic Background, Whether the Screening Is Conducted in a Child’s Native Language, or How the Screening Is Administered?

Summary of Newly Identified Evidence on Accuracy of Screening

Fourteen new studies (15 articles) on the accuracy of screening instruments met our inclusion criteria since the prior review.72–85 In addition, we found three older studies and a systematic review through hand searches/peer reviewer recommendation that were not included in the previous review.86–89 We used both the systematic review by Law et al89 and the previous USPSTF review90 to hand-search for relevant studies. Of the 14 newly identified studies, we rated eight as poor quality (Appendix C); two because the reference test was not independent of the screening instrument,79,80 three because the reference test was inappropriate (i.e., either another screening instrument or a measure of cognitive ability),81–83 two because an inappropriate reference standard was used and the reference was not independent of the screening instrument,84,85 and one because no information was given on the reference standard and there was limited information on the screening instrument.87

Study Characteristics of Newly Identified Evidence on Accuracy of Screening

Characteristics of the eight newly identified studies rated fair- or good-quality are shown in Table 3. Of these, only the study by Sachse and Von Suchodoletz76 was rated as good quality. Three studies72–74 examined the accuracy of the ASQ, including a Spanish translation of the instrument. Five studies73–76,78 examined different versions of the CDI, including translations in Spanish, German, and Swedish and shortened versions; two of these studies73,74 also examined the ASQ. One of the studies72 that reported on the ASQ also examined the accuracy of the Battelle Developmental Inventory Screening Test (BDIST) Communication Domain, the Brigance Preschool Screen (BPS), and the Early Screening Profile (ESP). One study by Rigby and Chesham86 reported on a trial speech screening test. Another reported on the Infant-Toddler Checklist (ITC), a component of the Communication and Symbolic Behavior Scales–Developmental Profile.88 As part of these studies, children ages 18 months,78 2 years,74–76 3 to 5 years,73 and 4.5 years were screened.72,86 Five of the studies were conducted in the United States; the remaining three were located in Canada, the United Kingdom, and Sweden. Recruitment techniques and venues included advertisements, birth registries, early childhood programs, medical practices, and university research programs. Venues for the studies included primary care practices, early childhood centers, health centers, hospitals, and university research laboratories.

Table 3. Screening Accuracy Studies From 2006 Review and New Review.


Description of Previously Identified Studies on Screening That Continue to Meet Current Inclusion and Quality Criteria

We examined all 42 studies (in 43 articles) identified in the 2006 review. Of these 42 studies, 23 continued to meet the inclusion criteria for this update.54,66–68,91–108 Nineteen studies were excluded at the full-text level. One study109 was not original research but rather a letter to an editor, and another110 examined the accuracy of a diagnostic test rather than a screening instrument. Eight studies111–118 included children who either had a prior diagnosis or were older than our age criteria and did not include an analysis by subgroup that met our inclusion criteria. Six studies included screening instruments that did not focus on speech or language, did not have a speech and language component, or could not be administered or interpreted in the required timeframe. Three studies119–121 did not include accuracy information about the screening instrument. We rated seven of the remaining 23 studies66–68,96,101,107 as poor quality. Reasons for these ratings may be found in Appendix C.

Characteristics of the 16 good- or fair-quality studies included in our analysis are shown in Table 3. Three studies (in four articles)54,98,102,122 examined the accuracy of the LDS. Two studies95,104 examined the General Language Screen (GLS) (formerly known as the Parent Language Checklist). Two studies92,105 examined the Fluharty Preschool Speech and Language Screening Test (FPSLST) and its earlier version, the Fluharty Preschool Screening Test. Two studies reported on the Structured Screening Test (SST)99 and its previous version, the Hackney Early Language Screening Test (HELST).100 No other screening instruments were examined in more than one study. Nine studies examined one or more instruments that were not assessed in any other study; many of these instruments have not been published or used widely outside of the study that reported their use or were older versions of a currently used instrument. These include the Davis Observation Checklist for Texas (DOCT),91 the Northwestern Syntax Screening Test (NSST),92 the Screening Kit of Language Development (SKOLD),93 the Denver Developmental Screening Test (DDST),94 the Denver Articulation Screening Exam (DASE),97 the Developmental Nurse Screen (DNS) and the Parent Questionnaire,103 the Sentence Repetition Screening Test (SRST),106 and Ward’s unnamed screening tool.108

The ages of the children screened in these studies varied; the majority focused on children ages 2 and 3 years. One study focused on children age 9 months,108 and four were limited to those ages 4 and 5 years.91,105,106 Nine of the studies were conducted in the United States, and the remaining seven in other English-speaking countries, including the United Kingdom, Canada, and Australia. Recruitment techniques and venues included advertisements, birth registries, early childhood programs, university research programs, medical practices, and school registration and entrance medical examinations.

Detailed Synthesis of Evidence on Screening

Table 4 provides a description of each screening instrument included in addressing KQ 2. We present the skills screened, the summary scores, time to complete, appropriate ages for administration, source of the screening information, and when available, reliability data. In some cases, we obtained the reliability information from test manuals. We review the evidence on the accuracy of screening by considering who does the screening and whether demographics, such as age, race, and ethnicity, and risk factors facilitate screening. Table 5 provides accuracy statistics for parent-rated instruments, and Table 6 provides statistics for those administered by trained examiners. We report sensitivity, specificity, prevalence, positive and negative predictive values, and positive and negative likelihood ratios (LRs), as well as 95% confidence intervals (CIs) for sensitivity and specificity. However, we caution that the positive and negative predictive values are virtually meaningless in studies where the prevalence exceeded 10 percent, because investigators chose a random sample from among children with negative screens to complete the reference measures. Therefore, we do not discuss them in the text. When accuracy statistics were not provided by the author, we calculated them ourselves using an online calculator123 (Appendix E).
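For readers unfamiliar with how these statistics relate to one another, the sketch below derives them from a single 2x2 table of screening result against reference standard. It is illustrative only: the counts are hypothetical, the function names are ours, and the Wilson score interval is one common choice for a 95% CI on a proportion; the source does not state which interval method the online calculator uses.

```python
from math import sqrt

def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a proportion (an assumed CI method)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

def accuracy_statistics(tp, fp, fn, tn):
    """Screening accuracy statistics from a 2x2 table of counts."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "prevalence": (tp + fn) / (tp + fp + fn + tn),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # LR+ is undefined when specificity is 100%; report infinity.
        "lr_positive": sens / (1 - spec) if spec < 1 else float("inf"),
        "lr_negative": (1 - sens) / spec,
        "sensitivity_ci": wilson_ci(tp, tp + fn),
        "specificity_ci": wilson_ci(tn, tn + fp),
    }

# Hypothetical counts, not taken from any included study.
stats = accuracy_statistics(tp=40, fp=20, fn=10, tn=130)
for name, value in stats.items():
    print(name, value)
```

Sensitivity, specificity, and the LRs are each computed within the reference-positive or reference-negative group, whereas the predictive values depend on the mix of the two groups in the verified sample. This is why predictive values become hard to interpret when only a subsample of screen-negative children completes the reference measure.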

Table 4. Screening Instruments for Speech and Language Delays and Disorders in Children Age 5 Years and Younger.


Table 5. Accuracy of Parent-Rated Screening Instruments for Speech and Language Delays and Disorders.


Table 6. Accuracy of Professional/Paraprofessional-Administered Screening Instruments for Speech and Language Delays and Disorders.


We calculated median test statistics across all parent-rated and trained examiner instruments separately. In some cases, a study calculated separate statistics for each reference measure; we calculated the median accuracy statistics across all measurements across all studies. We calculated the median rather than the mean because the accuracy statistics were somewhat skewed. When more than one study examined the accuracy of an instrument, we determined the median of the accuracy statistics for that instrument and discuss it separately in the text. We report the accuracy statistics by age when there was variation by age at screening.
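A brief sketch of why the median is preferred for skewed values follows. The first list is hypothetical and chosen only to show the effect of one low outlier; the second and third use the SST and HELST sensitivity and specificity values reported later in this chapter.

```python
from statistics import mean, median

# Hypothetical specificity values (percent) with one low outlier.
hypothetical_specificities = [45, 81, 84, 86, 88, 96]
skewed_mean = mean(hypothetical_specificities)
skewed_median = median(hypothetical_specificities)
print(skewed_mean, skewed_median)

# Values reported in this chapter for the SST and HELST studies.
sst_helst_sensitivity = median([66, 98])
sst_helst_specificity = median([89, 69])
print(sst_helst_sensitivity, sst_helst_specificity)
```

In the hypothetical list the single outlier pulls the mean down to 80 while the median stays at 85. With only two measurements, the median is simply their midpoint, which reproduces the 82 and 79 percent figures given for the SST and HELST.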

Parent-Rated Screening Instruments

Fourteen studies (in 16 articles)54,72–78,88,95,98,102–104,108,122 examined the accuracy of screening instruments in which parents rated their child’s speech and language skills (Table 5). The instruments included are the ASQ, CDI, GLS, ITC, LDS, Parent Questionnaire, and Ward’s screening tool. Most children in these studies were age 2 or 3 years (toddlers). Cutoff scores for positive screening, when provided, varied as a function of the instrument but were usually the scores recommended by the developer.

Sensitivity for detecting a true speech and language delay or disorder using parent-reported instruments ranged from 50 to 94 percent, with a median of 80 percent, based on data from 19 measurements of accuracy that include 12 different reference standards in the 14 studies. Data from one study98 were not included, as they concerned the same sample with a different cutpoint. The specificity of the screening test for detecting a child without a speech and language delay or disorder ranged from 45 to 96 percent, with a median of 81 percent. Based on the Michigan State University Evidence-based Medicine Course124 criteria for interpreting LRs (Appendix E), positive LRs in at least one study each of the ASQ, CDI, LDS, Parent Questionnaire, and Ward’s screening tool indicated at least a moderate increase in the likelihood of a language delay for children who screened positive. Inspection of the negative LRs suggests that in at least one study each of the CDI, ITC, and LDS, there was at least a moderate decrease in the likelihood of language delay for children who screened negative.

Figures 3a and 3b present CIs for sensitivity and specificity of the parent-rated instruments, by study. As Figure 3a demonstrates, the CIs for sensitivity of the different instruments overlap, suggesting no clear difference in sensitivity between them. In contrast, Figure 3b shows that the CI for the specificity of the Parent Questionnaire does not overlap with that of other instruments, suggesting that this measure is less able to detect children without language delays than the others.

This figure is a graph depicting the sensitivity values for parent-reported screening instruments found in the included studies. Each point shown on the graph provides the point value of sensitivity. In some cases, one instrument is presented with different reference measures, and these are presented separately. Such instruments include the Spanish ASQ, ASQ, Infant-Toddler Checklist, and the LDS. The upper and lower limit of the 95% confidence interval is shown for the sensitivity for each screening instrument. All the values presented in this figure can be found in Table 5.

Figure 3a

Parent-Reported Screening Instruments: Sensitivity Values. Figure Notes: [a] Guiberson et al, 2011; [b] Guiberson and Rodriguez, 2010; [c] Frisk et al, 2009, with reference measure PLS-4, Comprehension; [d] Frisk et al, 2009, with reference measure PLS-4, (more...)

This figure is a graph depicting the specificity values for parent-reported screening instruments found in the included studies. Each point shown on the graph provides the point value of specificity. In some cases, one instrument is presented with different reference measures, and these are presented separately. Such instruments include the Spanish ASQ, ASQ, Infant-Toddler Checklist, and the LDS. The upper and lower limit of the 95% confidence interval is shown for the specificity for each screening instrument. All the values presented in this figure can be found in Table 5.

Figure 3b

Parent-Reported Screening Instruments: Specificity Values. Figure Notes: [a] Guiberson et al, 2011; [b] Guiberson and Rodriguez, 2010; [c] Frisk et al, 2009, with reference measure PLS-4, Comprehension; [d] Frisk et al, 2009, with reference measure PLS-4, (more...)

Accuracy data for all screening instruments are presented in Tables 5 and 6. In addition, when there was more than one study that assessed an instrument, we provide results below.

Ages and Stages Questionnaire. Children in the three studies72–74 evaluating the ASQ ranged in age from 24 to 54 months. The median sensitivity of the ASQ was 63 percent and the median specificity across the different studies was 84 percent. In the two studies using the Spanish version of the ASQ,73,74 the positive LR indicates moderate to large increases in the likelihood of a language delay for those children who screened positive.

MacArthur-Bates Communication Development Inventory. Five studies (six articles)73–78 examined the accuracy of the CDI. This instrument has versions for infants, toddlers, and preschool children. In the toddler and preschool versions, parents report their child’s use of words and sentences. All but one of the studies in this review73 used the toddler version; children ranged in age from 18 to 62 months. In addition to the original English-language version of the CDI, studies included translated versions in Spanish, German, and Swedish. The median sensitivity of the CDI across studies was 82 percent and the median specificity was 86 percent. The positive and negative LRs in the German version of the CDI76,77 indicate a moderate increase in the likelihood of a language delay for those children who screened positive and a large decrease in the likelihood of language delay for those who scored negative. The CDI Words and Sentences75 and the short form of the Spanish version of the CDI74 also had moderately positive LRs; the Spanish short-form version also had a moderately negative LR.

General Language Screen/Parent Language Checklist. The Parent Language Checklist is an earlier version of the GLS and is essentially the same. Children in the two studies95,104 evaluating this instrument were age 36 months. The median sensitivity across three measurements was 75 percent and the median specificity was 68 percent. The CI for the specificity of the Parent Language Checklist did not overlap with that of other parent-rated instruments, indicating that its specificity is lower than others.

Language Development Survey. Three studies (four articles)54,98,102,122 reported on the LDS, in which parents indicate which of 310 words their child produces, as well as whether the child produces two-word and longer sentences. Children in these studies ranged in age from 24 to 34 months. The median sensitivity of the LDS was 91 percent, based on data from three measurements; data from Klee et al98 were not included, as they concerned the same sample with a different cutpoint. The median specificity across three measurements was 86 percent. In one study of the LDS,122 the positive LR was 24.1, indicating that children who screened positive were very likely to have a language delay. In addition, in each of the studies that investigated the LDS, the negative LRs were moderate to strong, indicating that children who screened negative on the LDS were highly likely not to have a language delay.
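To illustrate what a positive LR of this size implies, the sketch below converts a pre-test probability into a post-test probability using the standard odds form of Bayes' theorem (post-test odds = pre-test odds × LR). The LR of 24.1 is the value reported above; the 10 percent pre-test probability is a hypothetical figure chosen for illustration and is not an estimate from this review.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Update a probability with a likelihood ratio via odds."""
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)

lds_positive_lr = 24.1            # reported for one LDS study
assumed_pre_test_probability = 0.10  # hypothetical, for illustration

probability_after_positive_screen = post_test_probability(
    assumed_pre_test_probability, lds_positive_lr
)
print(f"{probability_after_positive_screen:.1%}")
```

Under this assumed starting probability, a positive screen would raise the probability of language delay from 10 percent to roughly 73 percent. A different pre-test probability would give a different result.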

Accuracy of Parent-Rated Screening Instruments By Child Age

Ages and Stages Questionnaire. Comparison of the CIs of the ASQ in older children (age 4.5 years) in the Frisk et al study72 with those in children ages 2 to 3 years in two other studies73,74 suggests that there are few differences in sensitivity as a function of age. However, as the CIs indicate, specificity is higher for the Spanish ASQ in younger children: the median specificity for detecting the absence of speech and language delays or disorders in children ages 2 to 3 years was 94 percent compared with 74 percent in children age 4.5 years. Moreover, the positive LRs indicate at least a moderate increase in the likelihood of a language delay for children ages 2 to 3 years who screened positive, but only a small increase for older children. The negative LRs were small and equivalent for both younger and older samples.

MacArthur-Bates Communication Development Inventory. Four of the five studies (five articles)74–78 that examined the accuracy of the toddler version of the CDI included children ages 18 to 36 months. One study73 used the preschool version with children ages 36 to 62 months. Comparison of the accuracy of the toddler version with the preschool version indicates that they are fairly comparable. The median sensitivity of the toddler version was 84 percent compared with 82 percent for the preschool version; the median specificity was 87 percent for the toddler version versus 81 percent for the preschool version. However, as Figures 3a and 3b show, sensitivity was lower in one study78 of toddlers than in all the others.

Infant-Toddler Checklist. The one study of the ITC88 included separate accuracy statistics for children in two age groups: younger toddlers (ages 12 to 17 months) and older toddlers (ages 18 to 24 months). Accuracy results were similar, as shown in Figures 3a and 3b. Sensitivity was 89 percent for younger toddlers and 86 percent for older toddlers. Specificity was 74 percent for younger toddlers and 77 percent for older toddlers. In both samples, negative LRs indicate a moderate decrease in the likelihood of language delay in those children who tested negative.

Accuracy of Parent-Rated Screening Instruments By Child Race/Ethnicity

No studies provided evidence for accuracy as a function of race/ethnicity.

Accuracy of Parent-Rated Screening Instruments By Prediction Length

Two studies in four articles76,77,98,122 examined the accuracy of screening instruments for predicting future language delay or disorder. In both studies, the accuracy of the instrument administered at age 2 years was examined in relation to the reference standard at both ages 2 and 3 years, allowing a comparison of longer-term versus more immediate sensitivity and specificity. In a study (one of two articles)122 that examined the LDS, sensitivity for detecting a language delay or disorder at age 3 years was 67 percent compared with 91 percent at age 2 years. Specificity for detecting typical language at age 3 years was 93 percent compared with 96 percent at age 2 years. In a second study that examined the German version of the CDI, sensitivity for detecting a language delay or disorder at age 3 years was 94 percent compared with 93 percent at age 2 years. Specificity for detecting typical language at age 3 years was 61 percent compared with 88 percent at age 2 years.
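The change in accuracy with prediction length can also be expressed as likelihood ratios, using the standard definitions LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity. The sketch below applies these to the rounded percentages quoted in this paragraph. Because the inputs are rounded, the results are approximations and will not necessarily match the LRs tabulated in the report; the function and variable names are ours.

```python
def likelihood_ratios(sensitivity_pct, specificity_pct):
    """Return (LR+, LR-) from sensitivity and specificity in percent."""
    sens = sensitivity_pct / 100
    spec = specificity_pct / 100
    return sens / (1 - spec), (1 - sens) / spec

# (sensitivity, specificity) in percent, as quoted in the text above.
reported_accuracy = {
    ("LDS", "reference at age 2"): (91, 96),
    ("LDS", "reference at age 3"): (67, 93),
    ("German CDI", "reference at age 2"): (93, 88),
    ("German CDI", "reference at age 3"): (94, 61),
}

derived = {}
for key, (sens, spec) in reported_accuracy.items():
    derived[key] = likelihood_ratios(sens, spec)
    lr_pos, lr_neg = derived[key]
    print(f"{key[0]}, {key[1]}: LR+ {lr_pos:.2f}, LR- {lr_neg:.2f}")
```

On these approximate figures, the positive LR falls for both instruments when the reference standard is applied a year after screening. For the LDS the drop comes mainly from lower sensitivity, and for the German CDI from lower specificity.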

Trained Examiner Screening Instruments

Twelve studies72,86,91–94,97,99,100,103,105,106 examined the accuracy of screening tests designed to be completed by trained examiners, including nurses, primary care providers, teachers, and paraprofessionals (Table 6). Evidence includes data on the following instruments: the BPS, BDIST Communication Domain, DOCT, DASE, DDST Language, DNS, ESP, FPSLST, NSST, SRST, SKOLD, SST, and Rigby’s trial speech screening test. Several studies included more than one screening instrument. All but two of the instruments (DNS103 and DOCT91) require at least some direct testing of the child; DNS and DOCT are completed after observing the child. In comparison with the studies of parent-rated instruments, these studies tended to focus on older preschool-age children, ranging from 18 to 72 months. Three studies99,100,103 focused on children ages 2 to 3 years, one study92 included children ages 3 to 4 years, five studies72,86,91,105,106 included children ages 4 to 5 years, and three studies93,94,97 included children across the age span.

Four instruments included at least a component to screen for articulation delays or disorders (FPSLST, DASE, SRST, and Rigby’s trial speech screening test). Four instruments included separate components for language expression and language comprehension (SKOLD, BDIST Communication Domain, BPS, and SST). Two instruments measured grammar (NSST and SRST), and one assessed vocabulary knowledge (ESP). Two instruments measured global speech and language skills (DOCT and DDST Language). The DNS includes a single question about the child’s communication that is answered after a period of observation.

Many studies either included multiple screening instruments or examined accuracy in relation to more than one reference test; we include all of these measurements in our analysis. Based on 27 measurements (in the 11 studies using accuracy from all reference tests), sensitivity of a screening test administered by a trained examiner for detecting a true speech and language delay or disorder ranged from 17 to 100 percent (median, 74%); specificity ranged from 46 to 100 percent (median, 91%). In studies of the BDIST,72 DOCT,91 SKOLD,93 SRST,106 SST,99 and Rigby’s trial speech screening test,86 positive LRs indicated at least a moderate increase in the likelihood of language delay for children who screened positive; studies of the BPS,72 DOCT,91 ESP,72 NSST,92 SKOLD,93 and HELST100 indicated at least a moderate decrease in the likelihood of language delay for those who screened negative. Accuracy results for instruments that appeared in more than one study are presented below.

Figures 4a and 4b display the sensitivity and specificity of the trained examiner screening instruments. The CIs for sensitivity indicate great variability among the instruments. However, the CIs for the Standard English (SE) version of the SKOLD and the HELST did not overlap with those of several other instruments (BDIST Receptive, BDIST Expressive, BPS Receptive, DDST, FPST, SRST, SST, and Rigby’s trial speech screening test), indicating that the latter are less sensitive than SKOLD and HELST for detecting language delays. The figures also show that the DDST is less sensitive than several other instruments. CIs around the specificity point estimates were somewhat tighter. Some instruments demonstrated better ability than others to detect typical speech or language; namely, the SE version of the SKOLD-30, DOCT, DDST, SRST (for typical articulation), SST, and Rigby’s trial speech screening test had higher specificity than the BDIST (for typical receptive language), BPS, ESP (for typical receptive language), NSST, and HELST.

This figure is a graph depicting the sensitivity values for trained examiner screening instruments found in the included studies. Each point shown on the graph provides the point value of sensitivity. In some cases, one instrument is presented with different reference measures, and these are presented separately. Such instruments include the BDIST Receptive, the ESP Verbal, and the SRST. The upper and lower limit of the 95% confidence interval is shown for the sensitivity for each screening instrument. All the values presented in this figure can be found in Table 6.

Figure 4a

Trained Examiner Screening Instruments: Sensitivity Values. Figure Notes: [a] Frisk et al, 2009, with reference measure PLS-4, Comprehension; [b] Frisk et al, 2009, with reference measure PLS-4, Expression; [c] Frisk et al, 2009, with reference measure (more...)

This figure is a graph depicting the specificity values for trained examiner screening instruments found in the included studies. Each point shown on the graph provides the point value of specificity. In some cases, one instrument is presented with different reference measures, and these are presented separately. Such instruments include the BDIST Receptive, the ESP Verbal, and the SRST. The upper and lower limit of the 95% confidence interval is shown for the specificity for each screening instrument. All the values presented in this figure can be found in Table 6.

Figure 4b

Professional/Paraprofessional-Reported Screening Instruments: Specificity Values. Figure Notes: [a] Frisk et al, 2009, with reference measure PLS-4, Comprehension; [b] Frisk et al, 2009, with reference measure PLS-4, Expression; [c] Frisk et al, 2009, (more...)

Fluharty Preschool Speech and Language Screening Test. Two studies92,105 examined the accuracy of the FPSLST and its precursor, the Fluharty Preschool Screening Test, in children ages 3 and 4 to 5 years. The FPSLST provides separate scores for articulation and language and an overall composite. Across the five measurements (all reference tests included) in these two studies, sensitivity ranged from 17 to 74 percent, with a median of 43 percent; specificity ranged from 81 to 97 percent, with a median of 93 percent.

Structured Screening Test. Two studies evaluated the SST and its precursor, the HELST,99,100 each in children age 30 months. Designed for health visitors to administer during routine developmental assessments, this instrument includes items measuring language expression and comprehension. In the two studies, sensitivity was 66 and 98 percent (median, 82%) and specificity was 89 and 69 percent (median, 79%) for the SST and HELST, respectively. It should be noted that the SST maximized specificity rather than sensitivity.

Accuracy of Trained Examiner Screening Instruments By Child Age and Language Dialect

Screening Kit of Language Development. One study93 assessed the SKOLD in children ages 30 to 48 months. The SKOLD measures both language comprehension and expression, and includes separate subtests for different ages and for speakers of African American dialect and SE. Because the instrument has separate subtests by age and linguistic background, we could examine accuracy as a function of these two characteristics.

Across the two dialect versions, the median sensitivity was 94, 94, and 97 percent for children ages 30 to 36 months, 37 to 42 months, and 43 to 48 months, respectively; the median specificity was 92, 88, and 85 percent, respectively.

Across the three age levels, the median sensitivity for SE subtests was 100 percent compared with 88 percent for African American dialect, and the median specificity for SE was 93 percent compared with 86 percent for African American dialect. As noted above, the SE version of the SKOLD displays higher sensitivity for detecting language delays than several other measures.

Except for African American children screened at ages 43 to 48 months, positive LRs indicate a large increase in the likelihood of a language delay among children who scored positive in any age/dialect group. Across all ages and both dialect groups, negative LRs indicate a large decrease in the likelihood of a language delay among those children who scored negative.
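For readers less familiar with likelihood ratios, the sketch below shows how they follow from sensitivity and specificity. It is an illustration under stated assumptions: the function name is hypothetical, the input values are the median sensitivity and specificity reported above for children ages 30 to 36 months (not the group-specific values in Table 6 from which the report's LRs derive), and the cutoffs for a "large" change (positive LR of at least 10, negative LR of at most 0.1) are a commonly used rule of thumb that the report may or may not have applied in exactly this form.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (positive LR, negative LR) from proportions in [0, 1]."""
    if not (0 <= sensitivity <= 1 and 0 <= specificity <= 1):
        raise ValueError("sensitivity and specificity must be proportions")
    # LR+ = P(test positive | delay) / P(test positive | no delay)
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else float("inf")
    # LR- = P(test negative | delay) / P(test negative | no delay)
    lr_neg = (1 - sensitivity) / specificity if specificity > 0 else float("inf")
    return lr_pos, lr_neg


# Median values for ages 30 to 36 months across the two dialect versions.
lr_pos, lr_neg = likelihood_ratios(0.94, 0.92)

# Assumed rule-of-thumb cutoffs for a "large" shift in likelihood.
large_increase = lr_pos >= 10
large_decrease = lr_neg <= 0.1
print(round(lr_pos, 2), round(lr_neg, 3), large_increase, large_decrease)
```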

No other screening instrument provided separate data by racial/age groups.

Key Question 2b. What Are the Optimal Ages and Frequency for Screening?

There is no evidence to answer this question.

Key Question 2c. Is Selective Screening Based on Risk Factors More Effective Than Unselected, General-Population Screening?

There is no evidence to answer this question.

Key Question 2d. Does the Accuracy of Selective Screening Vary Based on Risk Factors? Is the Accuracy of Screening Different for Children With an Inherent Language Disorder Compared With Children Whose Language Delay Is Due to Environmental Factors?

There is no evidence to answer this question.

Key Question 3. What Are the Adverse Effects of Screening for Speech and Language Delays or Disorders?

There is no evidence to answer this question.

Key Question 4. Does Surveillance (Active Monitoring) By Primary Care Clinicians Play a Role in Accurately Identifying Children for Diagnostic Evaluations and Interventions?

There is no evidence to answer this question.

Key Question 5. Do Interventions for Speech and Language Delays or Disorders Improve Speech and Language Outcomes?

In this review, we organize our summary of treatment evidence around three broad outcome categories: language (including expressive and receptive language and more specific aspects of language, such as vocabulary, syntax/morphology, and narratives), speech sounds (including articulation, phonology, phonological awareness, and speech intelligibility), and fluency (stuttering). Among both the newly and previously identified evidence, some studies report outcomes in more than one of these three broad categories.

Summary of Newly Identified Evidence on Treatment

We include in our analysis six trials testing treatment for speech and language delays or disorders that met the inclusion criteria and were not included in the previous review.125–130 Also, we identified one systematic review of the literature on treatment of childhood apraxia of speech.131

We identified two additional studies that we rated as poor quality (Appendix C). One study did not state how the groups were randomized or whether the researchers used any procedures to address missing data and intention to treat, and presented no participant characteristics beyond pretest scores.132 The other study did not state how study assignments were made and did not include baseline characteristics or independent measures of the outcome.133

Study Characteristics of Newly Identified Evidence on Treatment

The newly identified evidence includes one good-quality cluster RCT128 and five fair-quality parallel RCTs125–127,129,130 (Table 7). The systematic review of the literature on treatment of childhood apraxia of speech found no studies that met the inclusion criteria.

Table 7. Characteristics of Randomized, Controlled Trials of Speech and Language Interventions.

Among the six newly identified trials, four examined language outcomes,125,128–130 including three that also examined aspects of speech sound outcomes.125,129,130 The other two newly identified studies focused on fluency outcomes (Table 8).126,127

Table 8. Outcomes of Randomized, Controlled Trials of Speech and Language Interventions.

Summary of Previously Identified Studies on Treatment That Continue to Meet Current Inclusion and Quality Criteria

Of the 14 fair- or good-quality trials identified in the previous review (two of which we concluded were the same study), seven trials reported in eight publications met the inclusion criteria for this update134–141 (Table 7). One of these was evaluated as being of good quality.135

We excluded five treatment studies that were included in the 2006 review because we considered them to be comparative effectiveness studies.142–146 One additional article from the previous review was excluded because it was irretrievable.147

Language Outcomes

New Studies

Wake et al128 tested the effects of a modified Hanen Parent Program called “You Make the Difference”148 for children served by maternal and child health centers in Melbourne, Australia. Child eligibility at age 18 months was based on a score at or below the 20th percentile on a parent-completed vocabulary checklist; 301 children were randomized by the maternal and child health center in which they were served. Treatment was provided by three professionals trained in the intervention model (one speech-language pathologist and two psychologists) through six weekly 2-hour parent group sessions. For the first 1.5 hours, the group leader facilitated a review of the previous week’s home practice, followed by a participatory presentation on optimizing responsive interactions and providing a rich language environment for young children. For the last 30 minutes, parents were videotaped practicing new strategies with their children, with coaching as needed from the group leader. The report does not state if any children received speech/language services in the community. Outcomes were measured at ages 2 and 3 years, and included broad measures of expressive and receptive language (the Preschool Language Scale expressive communication and auditory comprehension subscales) and the Expressive Vocabulary Test.

Fricke et al125 recruited 180 children (mean age, 4 years) with the lowest scores on a composite measure of expressive language from nursery school programs in Yorkshire, England. For children in the treatment group, teaching assistants provided a 30-week manualized oral language program modified from a previous intervention study.149 This program was compiled from a variety of sources and has not been widely disseminated or evaluated as a specific treatment package outside of the studies conducted by this group of researchers, thereby limiting its immediate applicability to other settings. Lessons covered vocabulary and narratives, and in the last 10 weeks of the program, also covered the emergent literacy skills of letter sounds and phonological awareness. The children participated in three 15-minute small-group sessions per week for 10 weeks in nursery school classrooms (for children ages 3 and 4 years), and three 30-minute small-group sessions and two 15-minute individual sessions per week for 20 weeks in reception classrooms (in which children are enrolled the year they turn age 5). A large number of individual language outcome measures were gathered, and through latent variable analysis, the researchers identified four constructs (language, narrative, phoneme awareness, literacy) for which effects were examined at immediate posttest and at a maintenance followup 6 months after the end of the intervention. No information was provided regarding whether any children received speech/language treatment in the community.

Wake et al130 recruited 200 children at age 4 years from the greater Melbourne, Australia area. Eligible children had receptive and/or expressive language scores at least 1.25 standard deviations below the mean on the Clinical Evaluation of Language Fundamentals–Preschool, second edition. Children were excluded if they had a known intellectual disability, major medical condition, autism spectrum disorder, hearing loss greater than 40 dB in the better ear, or parents with insufficient English to participate. Children were randomized to an intervention (n=99) or control (n=101) group. The intervention was planned to comprise 18 1-hour sessions, occurring in three blocks of six 1-hour sessions across 6 weeks, with a 6-week break between session blocks. The intervention was adapted from a manualized program developed for an earlier RCT by a different team of investigators.150 Trained language assistants provided the intervention, which included phonological awareness activities and storybook reading targeting print awareness, initial phoneme isolation, and letter knowledge for all children, and also included specific language targets selected for each child individually based on the child’s language profile. Examples of individualized targets include vocabulary expansion, sentence structure, and comprehension and use of morphological markers (e.g., plurals, possessives, past-tense verb endings). The intervention manual supported implementation of the intervention by the language assistants, who were trained and had ongoing guidance from a supervising speech pathologist. Parents of children in the control group were informed by mail of group allocation and the availability of local speech pathology services. However, no data were reported on local speech pathology services actually received by the control group or on community speech pathology services received by the experimental group, if any.

Yoder et al129 recruited 52 preschool children with specific speech and language impairments (mean age, 43.8 months). Included children had a nonverbal IQ above 80 and scores at least 1.3 standard deviations below the mean on either a mean length of utterance measure or the expressive subscale of the Preschool Language Scale, third edition,151 and a score of at least 1.3 standard deviations below the mean on the Arizona Articulation Proficiency Profile.152 The intervention consisted of broad target recasting, a strategy characterized by an interventionist providing additional information when a child uses an immature form of speech or language. Interventionists provided speech recasts (providing an appropriately articulated repetition of an utterance the child used with immature articulation, but without adding additional grammatical structure) or sentence-length recasts (expanding a syntactically immature structure used by the child to a syntactically complete sentence). Individualized treatment was conducted three times per week for 30 minutes per session for 6 months. Intervention effects were examined at immediate posttreatment and 8 months after the treatment ended. All study participants were free to participate in community interventions. The treatment and control groups did not differ in the amount of speech and language treatment they received in the community, but the control group participated in more treatments targeting areas other than speech and language.

Studies From the Previous Review

All seven previously identified trials included in this update reported language outcomes.134–137,139–141

One trial by Glogowska et al examined children younger than age 42 months (n=159) who were identified as having a delay in general language, expressive language, or phonological development at any of 16 clinics in Bristol, England.135 Treatment consisted of immediate speech and language therapy services, usually provided by the clinic. Some children in both arms did not fulfill the protocol. In the therapy group, three of 71 children failed to attend any therapy sessions; in the control group (n=88), one family requested therapy within 1 month of randomization and 17 requested therapy at the end of 6 months. Intervention treatment services were provided for an average of 8.4 months (range, 0.9 to 12), for 8.1 contacts (range, 0 to 17) and 6.2 total hours per participant (range, 0 to 15). Outcome measures were collected at 6 and 12 months after randomization.

Robertson and Ellis Weismer139 examined the effects of a clinician-delivered intervention on the expressive and receptive language skills of toddlers (ages 21 to 30 months) who were identified as late talkers based on parent-reported expressive vocabulary scores below the 10th percentile (n=21). Speech-language pathologists directed therapy in small groups of no more than four children for 150 minutes per week for 12 weeks. Aspects of the intervention included establishing routines, using theme-based materials, increasing the salience of linguistic input through modifications of stress vocabulary and pitch, modeling language, and providing interaction opportunities and feedback. Three key strategies used for language modeling were 1) parallel talk, or providing a verbal description of the child’s actions in the absence of a child verbalization; 2) expansion/expatiation, or repeating a child’s utterance with the addition of content that extends that given by the child; and 3) recast, defined here as repeating a child’s utterance with modification of syntactic elements of modality or voice.

One previously included Canadian trial evaluated the effects of the Hanen Parent Program on language outcomes137 among children ages 23 to 33 months with expressive language delays (i.e., at no higher than the one-word stage). The Hanen Parent Program comprises eight parent group sessions of 2.5 hours each and three home visits. Parents were taught to provide linguistic input to their children contingent on their child’s interests. For this study, the usual Hanen Parent Program was modified to coach parents on focused stimulation of 10 target words; replacing acquired words with new, parent-identified target words; and modeling two-word utterances.

Gibbard136 evaluated a parent training program for parents of toddlers ages 26 to 39 months with limited expressive vocabulary (30 words or less) but without evidence of global developmental delays. Parents attended sessions for 60 to 75 minutes every other week for 11 weeks. The primary objective for parents was to increase their child’s language development to the point that the child was producing three- to four-word utterances. During the parent group meetings, the group leader emphasized games and activities that could be used to help the children meet these objectives and how to transfer the language skills achieved during the games to daily life activities.

A second trial conducted by Robertson and Ellis Weismer140 randomized 20 children ages 44 to 61 months with an SLI to a peer model or control group. All children were enrolled in a language-based early childhood classroom throughout the study. Children in the peer model group played house in their classroom with language-typical peers at least four times for 15 minutes per play session over a 3-week period. Children in the control group were monitored to ensure that they played in the house area at least 60 minutes during the same 3-week interval, but without language-typical peer models. Language measures were all tied to the play house scripts and included gain scores in 1) the number of words included in a script describing how to play house, 2) the number of different words in the script, 3) the number of play-theme–related acts described in the script, and 4) the number of linguistic markers used in the script. Group comparisons were made on these content and structural indexes at immediate posttest and at 3-week followup. No comparisons were made on language measures apart from those in the play house scripts, which were tied to the specific context in which the experimental group interacted with language-typical peer models.

Finally, two studies that focused on treating children with speech sound disorders also included language outcome measures.134,141 These studies are described in more detail in the section on speech sound outcomes. Almost and Rosenbaum134 included mean length of utterance as an outcome measure of expressive language. Shelton et al141 included the NSST153 and the Auditory Association Subtest of the Illinois Test of Psycholinguistic Abilities as language outcome measures.

Speech Sound Outcomes

New Studies

Three of the new studies described in the section on language outcomes included speech sound outcome measures as well.125,129,130 In their study of broad target recast treatment, Yoder et al129 evaluated speech intelligibility, measured as acceptable (“intelligible”) word approximations in a 20-minute speech sample. Two other new studies examined outcomes related to phonological awareness.125,130 Phonological awareness is the ability to recognize the variety of sound units that make up spoken words. Slow development of phonological awareness often occurs in children with other speech and language delays or disorders, and is associated with difficulty in the development of early literacy skills.154,155

Studies From the Previous Review

Three of the trials described in detail in the section on language outcomes also reported speech sound outcomes. Glogowska et al135 included a phonology error rate156 to measure the effects of usual speech and language therapy services on speech sounds. Girolametto et al138 evaluated the effects of the Hanen Parent Program, adapted to include focused stimulation of language targets on three measures related to speech sounds: syllable structure level, consonant inventory, and percentage of consonants correct. Robertson and Ellis Weismer139 included a measure of percentage of intelligible utterances in their study of small-group language therapy for late-talking toddlers.

Two additional trials focused primarily on speech sound outcomes,134,141 although both included measures of language outcomes as well. Almost and Rosenbaum134 evaluated the efficacy of a modified “cycles” approach to phonological therapy,157 wherein rule-based errors in the child’s speech sound production are treated through recursive cycles of therapy targeting particular rules (also known as phonological processes). In a trial of 26 children with severe phonological disorders, outcomes were measured for those randomized to the intervention group following 4 months of treatment. Speech sound outcome measures included the Assessment of Phonological Processing–Revised,158 the Goldman-Fristoe Test of Articulation,159 and percentage of consonants correct.

Shelton et al141 identified 45 preschoolers (mean age, 47 months) through articulation screening, matched trios of children on a measure of receptive vocabulary, and then randomly assigned each member of the trio to one of three groups: a listening intervention that focused on speech sound discrimination activities, a reading and talking intervention that focused on storybook interactions, or control. Parents conducted activities with their children in the two active treatment groups for 57 days, for 5 minutes per day in the listening group and 15 minutes per day in the reading and talking group. Speech sound outcomes included measures of speech sound discrimination in quiet and in noise, speech sound error recognition, and articulation.

Fluency Outcomes

New Studies

Two newly identified studies focused only on fluency outcomes.126,127 Both of these studies examined the Lidcombe Program of Early Stuttering Intervention.160 The manual for the Lidcombe program can be downloaded from the Web site of the Australian Stuttering Research Centre (www.fhs.usyd.edu.au/asrc). In this program, parents are trained to provide differential verbal contingencies for stutter-free speech and for unambiguously stuttered speech for prescribed periods each day. In the original version of the program, the parent and child attend sessions with a speech-language pathologist for up to 1 hour per week during stage 1 of the treatment, in which the parent learns and practices the contingencies and learns to rate the severity of the child’s stuttering. The speech-language pathologist also evaluates the child’s stuttering during each weekly visit, using a measure of percentage of syllables stuttered. When the child is stuttering on less than 1 percent of all syllables uttered, the treatment progresses to stage 2. During stage 2, the parent gradually withdraws the contingencies, and clinic visits decrease in frequency over a period of at least 1 year. If the child’s percentage of syllables stuttered is greater than 1 percent for two consecutive visits, then the treatment returns to stage 1 until stuttering again decreases to the criterion level.
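The stage progression described above can be sketched as a simple rule over successive clinic measurements. This is a simplified illustration based only on the two criteria stated in this section; the program manual may specify additional criteria that are not modeled here. The function names and the sequence of percentage-of-syllables-stuttered values are hypothetical.

```python
def next_stage(stage, pct_ss_history, criterion=1.0):
    """Return the treatment stage after the most recent clinic visit.

    stage: current stage (1 or 2).
    pct_ss_history: percentage of syllables stuttered at each visit so far,
        most recent last.
    criterion: the 1 percent threshold described in the text.
    """
    latest = pct_ss_history[-1]
    if stage == 1:
        # Progress to stage 2 once stuttering falls below the criterion.
        return 2 if latest < criterion else 1
    # Stage 2: return to stage 1 after two consecutive visits above criterion.
    last_two = pct_ss_history[-2:]
    if len(last_two) == 2 and all(value > criterion for value in last_two):
        return 1
    return 2


def simulate(visits, criterion=1.0):
    """Track the stage after each visit, starting in stage 1."""
    stage, history, stages = 1, [], []
    for pct_ss in visits:
        history.append(pct_ss)
        stage = next_stage(stage, history, criterion)
        stages.append(stage)
    return stages


# Hypothetical measurements across seven visits.
stages = simulate([4.2, 2.0, 0.8, 0.5, 1.4, 1.6, 0.9])
print(stages)
```

In this hypothetical sequence, a single visit above 1 percent during stage 2 does not trigger a return to stage 1; only the second consecutive visit above the criterion does.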

Jones et al126 evaluated the Lidcombe program in New Zealand based on a trial that recruited 54 children ages 36 to 72 months. The control group parents were told they would receive the Lidcombe intervention at the end of the trial should it prove to be efficacious and their children were still stuttering; they were also free to seek other treatment for their children during the trial, provided it was not the Lidcombe program. In violation of the protocol, four of the 25 children in the control group received some Lidcombe treatment; three others received alternative treatments for stuttering. Outcomes were measured at 9 months after randomization.

The second study of the Lidcombe program was conducted in Australia and involved telehealth delivery of the treatment.127 The 22 included children were ages 36 to 54 months, with a history of stuttering for longer than 6 months and no previous or current treatment for stuttering. Adaptations for telehealth delivery of the intervention included regularly scheduled telephone consultations in place of weekly clinic visits, videotaped demonstrations of the use of contingent feedback, parent training in rating stuttering severity via audiotaped speech samples and telephone conversations, audio-recorded parent-child interactions mailed to the speech-language pathologist for evaluation of parent implementation, and audio-recorded speech samples of the child mailed to the speech-language pathologist for computation of the percentage of syllables stuttered. Although parents of children in the control group were offered the Lidcombe program after the posttest, unlike in the Jones study, it was not reported whether any families sought other treatment during the trial.

Studies From the Previous Review

No previously included trials measured fluency outcomes.

Detailed Synthesis of Prior Evidence With New Findings on Treatment

In synthesizing the evidence across studies, we first organized the trials based on the type(s) of outcomes reported—language, speech sounds, or fluency. Within each group of studies reporting the same type of outcomes, we considered treatment heterogeneity, including the agent (teacher/clinician, parent, peer), strategies, and dosage/intensity. We also considered the characteristics of the children, including age range, and their speech and language abilities and disabilities.

In our synthesis, to aid in readability, we refer descriptively to the types of outcomes but in general do not name each specific outcome. Details for results of specific outcome measures are given in Table 8. In addition, we characterize outcomes as statistically significant or nonsignificant, and we use Cohen’s161 conventions for referring to effect sizes as small, medium, or large based on the variance explained by treatment group assignment. For Cohen’s d, a statistic representing the distance in standard deviation units between two means, the conventions we use are: small (0.2 to <0.5), medium (0.5 to <0.8), and large (≥0.8). For odds ratios giving the differential likelihood of a dichotomous outcome, the conventions we use are: small (1.44 to <2.47), medium (2.47 to <4.25), and large (≥4.25). Although we use Cohen’s conventions for characterizing effect sizes as small, medium, or large, we acknowledge and agree with the caution that these conventions may not be equated with the clinical significance of the differences.162 When standardized effect sizes were provided in the publications, we used the reported effect size. For trials not reporting standardized effect sizes, we computed effect sizes when the published data permitted these computations.
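The conventions above can be expressed as a short sketch. The thresholds are those stated in the text; everything else is an assumption made for illustration: the function names, the use of a pooled standard deviation to compute Cohen's d (the report does not specify its computation), the label "negligible" for values below the small threshold, and the inversion of odds ratios below 1 so that only magnitude is labeled. The numbers in the usage lines are hypothetical.

```python
import math


def cohens_d(mean_treat, mean_ctrl, sd_treat, sd_ctrl, n_treat, n_ctrl):
    """Standardized mean difference using a pooled standard deviation."""
    pooled_var = ((n_treat - 1) * sd_treat ** 2 + (n_ctrl - 1) * sd_ctrl ** 2) / (
        n_treat + n_ctrl - 2
    )
    return (mean_treat - mean_ctrl) / math.sqrt(pooled_var)


def label_d(d):
    """Label the magnitude of Cohen's d; the sign gives direction only."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"  # assumed label for values below the small threshold


def label_odds_ratio(odds_ratio):
    """Label the magnitude of an odds ratio."""
    if odds_ratio <= 0:
        raise ValueError("odds ratio must be positive")
    # Assumption: invert ratios below 1 so direction does not affect the label.
    size = odds_ratio if odds_ratio >= 1 else 1 / odds_ratio
    if size >= 4.25:
        return "large"
    if size >= 2.47:
        return "medium"
    if size >= 1.44:
        return "small"
    return "negligible"


# Hypothetical group summaries: means 105 vs. 100, both SDs 10, 30 per group.
d = cohens_d(105, 100, 10, 10, 30, 30)
print(d, label_d(d), label_odds_ratio(3.0))
```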

Table 7 provides information on specific ages of children in the included trials. In the text, we use “toddlers” to refer to children younger than age 3 years and “preschoolers” to refer to children ages 3 to 6 years.

Studies Reporting Language Outcomes

Eleven trials reported on language outcomes (Table 8). Among these, four used parents as the primary intervention agent.128,136,137,141 Two trials tested the effects of variations of the Hanen Parent Program128,137 on outcomes of toddlers with language delays, with divergent findings. The trial by Girolametto et al137 (n=25) found moderate to large effects favoring the treatment group on five of six expressive language outcome measures, in contrast with no significant differences and negligible effect sizes on three expressive language measures and one receptive language measure in the trial by Wake et al128 (n=301). Compared with the trial by Girolametto et al, the trial by Wake et al128 provided a lower dosage of parent training (720 vs. >1,200 minutes), enrolled younger children (age 18 vs. 23 to 33 months) who were selected based on less stringent criteria for language delay (lowest 20th vs. lowest 5th percentile for expressive vocabulary), and did not include any home visits for coaching purposes but included some individual parent coaching at the end of the parent group meetings. In the study by Girolametto et al, the parent group facilitators made three home visits. The differences in eligibility criteria for the two studies may be relevant to the divergent findings. Whereas Wake et al considered the possibility that the tested treatment was not sufficiently intensive to produce an effect, they concluded that the null findings in their study were more likely the result of natural resolution of the initial symptoms of delayed language, based on finding that the mean language scores were in the normal range (and very close to the standardized mean scores) for children in both groups at age 3 years. Children in the study by Girolametto et al, who were selected based on expressive vocabulary in the lowest 5th percentile, may have been less likely to experience a natural resolution of their language delay than those in the trial by Wake et al.

In a small trial involving parent training (n=36), Gibbard136 tested group training for parents of toddlers (ages 27 to 39 months) with limited expressive language. The total intensity of the intervention (780 to 975 minutes) was relatively low, similar to that in the study by Wake et al,128 although the parent group meetings in the Gibbard trial were scheduled over a 6-month period compared with a 6-week period in the study by Wake et al.128 The content of the training was focused on activities parents could do with their children to promote specific language objectives, an approach that seemed more similar to the adaptation of the Hanen Parent Program by Girolametto et al137 than to the trial by Wake et al,128 which focused on more general language stimulation strategies. However, we could not fully assess the comparability of the content of the Gibbard intervention with that of either adaptation of the Hanen Parent Program from information available in the publication or online. Similar to Girolametto et al and in contrast to Wake et al, Gibbard reported large effects across seven language outcome measures, including six measures of expressive language and one of receptive language.

Shelton et al141 also had parents provide interventions for their children (ages 27 to 55 months) in a small trial (n=45 in three groups). They were primarily interested in the treatment of children with speech sound disorders; however, in addition to a listening treatment group exposed to speech discrimination activities designed to target speech sound outcomes, they included a second reading and talking treatment group, in which parents read and talked about storybooks with their children, a treatment that might be expected to positively affect children’s language outcomes. No significant effects were found for either treatment group compared with the control group on expressive syntax (small effect sizes favoring the control vs. listening group, and favoring the reading and talking vs. control group). Also, no significant effects were found on an auditory association measure tapping children’s semantic knowledge (medium effect sizes in favor of the listening vs. control group, as well as for the reading and talking vs. control group).

Two trials tested treatments primarily or exclusively delivered in a small-group format to toddlers139 and preschoolers125 with speech and language delays or disorders. In addition to small-group intervention, the trial by Fricke et al included two 15-minute individual treatment sessions per week during the last 20 weeks of the 30-week program. The intensity of both interventions was relatively high—2,850 total minutes in the trial by Fricke et al125 and 1,800 minutes in the trial by Robertson and Ellis Weismer.139 In both studies, the researchers specified the components of the intervention and trained the interventionists (teaching assistants in Fricke et al and speech-language pathologists in Robertson and Ellis Weismer) to implement the program. Both trials reported significant and large effects on measures of language skills. Fricke et al also reported a significant but small effect for a construct measuring narrative language.
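The total-minutes figures quoted above follow from the session schedules described earlier in this chapter. The sketch below reproduces that arithmetic; it assumes every scheduled session was delivered at its planned length, and the function name and the tuple layout are hypothetical conveniences rather than anything used in the trials.

```python
def total_minutes(blocks):
    """Sum planned minutes over (sessions_per_week, minutes_per_session, weeks)."""
    return sum(sessions * minutes * weeks for sessions, minutes, weeks in blocks)


# Fricke et al: 10 weeks in nursery classrooms, then 20 weeks in reception.
fricke = total_minutes([
    (3, 15, 10),  # three 15-minute small-group sessions per week
    (3, 30, 20),  # three 30-minute small-group sessions per week
    (2, 15, 20),  # two 15-minute individual sessions per week
])

# Robertson and Ellis Weismer: 150 minutes per week for 12 weeks,
# expressed here as a single weekly block.
robertson = total_minutes([(1, 150, 12)])

print(fricke, robertson)
```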

Four trials reporting language outcomes tested treatments provided to children on an individual basis by research staff or speech-language pathologists129,130,134,135 but are not otherwise very comparable with one another. Glogowska et al135 examined the effects of providing young children (ages 18 to 42 months) with clinically significant delays in language or phonological development immediate access to usual speech-language therapy services in the community. Over the 12 months of the trial, children received an average of 372 minutes of treatment and showed significant gains relative to the control group in receptive language, with a small effect size (d=0.3), but did not differ at the end of treatment on expressive language measures, for which effect sizes were negligible. Wake et al130 tested a manualized intervention for 4-year-olds with specific language impairments that included a focus on phonological awareness, print awareness, and letter knowledge for all children but also addressed individualized language goals based on each child’s profile of language impairments. Children received an average of 1,020 minutes of treatment over a 30-week period (approximately 7 months). The intervention had no significant effect on the primary outcomes of expressive or receptive language or on the secondary outcome of pragmatic language, with small to negligible effect sizes for all three variables. Yoder et al129 tested the effects of an intervention strategy called recasting (repeating what is said by a child, but with correct articulation or with a grammatical expansion of the child’s utterance). The total amount of treatment was 2,340 minutes provided over 6 months. The intervention had no significant effect on the outcome measure of language (mean length of utterance); the publication did not report data sufficient to allow for the computation of an effect size. Yoder et al did report an interaction between the treatment group and the pretreatment articulation skills of the child, with a significant treatment effect on mean length of utterance at posttest and at followup for children with the lowest baseline articulation skills. Almost and Rosenbaum134 tested whether an individualized treatment for children with speech sound disorders had an effect on the language outcome measure of mean length of utterance but found no significant language effect (small effect size). More information about this study is provided in the following section.
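
The descriptive labels attached to effect sizes in this synthesis (negligible, small, medium, large) can be read against conventional benchmarks. The following is a minimal sketch in Python, assuming Cohen's commonly cited cutoffs for d (0.2, 0.5, and 0.8). The cutoffs and the function name are assumptions made for illustration; they are not taken from this review, whose own thresholds may differ.

```python
def label_effect_size(d: float) -> str:
    """Label a standardized mean difference (Cohen's d).

    Cutoffs are Cohen's conventional benchmarks, assumed here for
    illustration only; the review may have used different thresholds.
    """
    magnitude = abs(d)  # direction of the effect does not change its size label
    if magnitude < 0.2:
        return "negligible"
    if magnitude < 0.5:
        return "small"
    if magnitude < 0.8:
        return "medium"
    return "large"


# d=0.3 is the receptive language effect size reported for Glogowska et al.
print(label_effect_size(0.3))  # small
```

Under these assumed cutoffs, the reported d of 0.3 falls in the small range, which matches the label given in the text.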

Finally, the trial in which preschoolers with language impairments played with peers who had age-appropriate language skills in the house play area of the preschool classroom at least four times over a 3-week period found large and significant effects on four measures of expressive language taken from samples in which the children were asked specifically to talk about playing house.140

Studies Reporting Speech Sound Outcomes

We included eight trials that reported outcomes related to speech sounds (measures of articulation, phonology, phonological/phonemic awareness, or intelligibility)125,129,130,134,135,138,139,141 (Table 8). All of these trials also reported language outcomes.

In two trials, the treatment was parent mediated. Girolametto et al138 examined speech sound outcomes in addition to language outcomes for toddlers whose parents participated in the modified Hanen Parent Program. They reported significant effects on consonant inventory and syllable structure for the treatment group compared with the control group, and the effect sizes were large in both cases. Although parent mediated, the approach examined by Shelton et al141 was quite different in content. The primary research question in their study was whether children (ages 27 to 55 months) would benefit from a listening treatment in which parents focused the child’s attention on consonant sounds in syllables and words, and engaged the child in activities directed at discrimination of sounds, including correctly and incorrectly articulated sounds. The total intensity of the treatment was 1,425 minutes, delivered 5 minutes per day 5 days per week for a total of 57 sessions. One significant difference emerged in comparing the listening treatment with a control condition: children in the control condition made more improvements in auditory discrimination in noise. Although effects on articulation were nonsignificant, there was a medium-sized effect in favor of the listening group on one articulation measure (Templin-Darley Articulation Screening Test), but only a small effect on a second articulation measure (McDonald Screening Deep Test of Articulation). Shelton et al also reported results on articulation measures for the reading and listening treatment, described in the section on language outcomes; this group did not differ significantly from the control group on articulation outcomes, with small effects for both measures. Further, the effect favored the control group for one measure (McDonald Screening Deep Test of Articulation).

Robertson and Ellis Weismer139 evaluated a speech sound outcome (percentage of intelligible utterances) for toddlers who participated in a small-group speech and language program provided by speech-language pathologists. They found a significant effect of large magnitude in favor of the treated children compared with the control group.

Two studies examined effects on speech sounds for children treated individually by speech-language pathologists. Almost and Rosenbaum134 examined the effects of a now well-known “cycles” approach to phonological therapy for preschoolers with severe phonological disorders. The treatment was provided by speech-language pathologists in 30-minute sessions twice a week across 4 months (total of 1,040 minutes of treatment). There were significant effects, with large effect sizes, on three speech sound outcome measures, including two standardized tests, as well as the percentage of consonants correct during a speech sample. Glogowska et al135 found no improvement in phonology error rate for young children randomized to usual community speech-language pathology services for a year; however, after 12 months, treated children were 2.7 times more likely than control children to no longer exhibit the criterion severity of speech sound problems used to initially determine eligibility for the trial, a significant effect of medium size. As mentioned previously, the total average amount of treatment time in that trial was less than 7 hours.
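
The treatment totals quoted in this section are products of session length, session frequency, and program duration. The sketch below, in Python, works backward from two figures in the text: the 1,040 minutes of twice-weekly 30-minute sessions in Almost and Rosenbaum, and the 372-minute average reported earlier for Glogowska et al. The function names are illustrative assumptions, not terms used by the trials.

```python
def total_minutes(minutes_per_session: float, sessions_per_week: float, weeks: float) -> float:
    """Total treatment time in minutes for a fixed weekly schedule."""
    return minutes_per_session * sessions_per_week * weeks


def implied_weeks(total: float, minutes_per_session: float, sessions_per_week: float) -> float:
    """Number of weeks implied by a reported total and a weekly schedule."""
    return total / (minutes_per_session * sessions_per_week)


# Almost and Rosenbaum: 1,040 minutes in 30-minute sessions twice a week.
weeks = implied_weeks(1040, 30, 2)
print(round(weeks, 1))  # 17.3 weeks, roughly 4 months

# Glogowska et al: average of 372 minutes, expressed in hours.
hours = 372 / 60
print(round(hours, 1))  # 6.2 hours, i.e., less than 7 hours
```

Expressing each trial's dose in the same unit makes the contrast in intensity between the two individual-treatment trials easier to see.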

The individual treatment trial of preschoolers by Yoder et al129 included the strategy of speech recast, which involved repeating a child’s incorrect speech production with correct articulation. There were no main effects of treatment on child intelligibility; however, there was an interaction between treatment and the pretreatment articulation skills of the child, with a significant treatment effect on intelligibility at followup for children with the lowest baseline articulation skills.

Two studies that focused primarily on language outcomes examined the effects of speech and language interventions on phonological/phonemic awareness skills as secondary outcomes for preschoolers.125,130 The study by Fricke et al,125 in which preschoolers participated in small-group and individual speech and language lessons delivered by teaching assistants, found significant effects, with a small to medium effect size both in the immediate posttest and at 6-month followup for a construct representing measures of phonemic awareness. Phonological awareness was also measured in the study by Wake et al,130 in which language assistants provided individual home-based intervention focusing on language and emergent literacy skills to preschoolers with language impairments, with findings of a significant effect of moderate size on this outcome.

Studies Reporting Fluency Outcomes

Two trials focused only on fluency outcomes126,127 (Table 8), examining the Lidcombe Program of Early Stuttering Intervention.160

Jones et al,126 who delivered the treatment to parents and their children ages 3 to 6 years in a clinic setting, found that the Lidcombe group showed a greater decrease in the percentage of syllables stuttered than the control group after 9 months. The odds ratio for this finding is large: children in the Lidcombe program were 7.7 times more likely than those in the control group to stutter on less than 1 percent of syllables after 9 months.
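
An odds ratio such as the one reported for Jones et al compares the odds of reaching a criterion in the treatment group with the odds in the control group. The sketch below, in Python, shows the calculation from a 2x2 table. The counts are hypothetical placeholders chosen for easy arithmetic; they are not data from Jones et al or any other included trial.

```python
def odds_ratio(treated_met: int, treated_not_met: int,
               control_met: int, control_not_met: int) -> float:
    """Odds ratio for meeting a criterion, treatment versus control.

    Uses the cross-product form (a*d)/(b*c) of the 2x2 table.
    """
    denominator = treated_not_met * control_met
    if denominator == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    return (treated_met * control_not_met) / denominator


# Hypothetical counts, NOT trial data: 10 of 20 treated children and
# 5 of 20 control children meet the criterion.
print(odds_ratio(10, 10, 5, 15))  # 3.0
```

In this made-up example the odds of meeting the criterion are 1.0 in the treated group and about 0.33 in the control group, giving an odds ratio of 3.0.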

The trial by Lewis et al,127 using telehealth delivery of the Lidcombe program to parents and their preschool children, found that the treatment group showed a significantly greater reduction in the percentage of syllables stuttered, 69 percent less than in the control group (95% CI, 13 to 89).

Key Question 6. Do Interventions for Speech and Language Delays or Disorders Improve Other Outcomes, Such as Academic Achievement, Behavioral Competence, Socioemotional Development, or Health Outcomes, Such as Quality of Life?

Summary of Newly Identified Evidence on Other Outcomes

We identified three trials that met the inclusion criteria, contribute evidence relevant to this KQ, and were not included in the previous review.125,128,130 All three trials examined speech or language measures as primary outcomes, and thus they were included in the synthesis of evidence related to KQ 5 (Table 7).

Study Characteristics of Newly Identified Evidence on Other Outcomes

Two newly identified trials, both rated fair-quality, measured outcomes related to literacy.125,130 One of these trials also included a secondary measure of health-related quality of life.130 That trial and one other128 included outcomes related to child problem behaviors.

Description of Previously Identified Studies on Other Outcomes That Continue to Meet Current Inclusion and Quality Criteria

Two previously identified studies met inclusion criteria for the current review135,139 and provide evidence relevant to this KQ. Both also measured speech or language outcomes and thus were included in the results for KQ 5. Glogowska et al135 measured well-being, attention level, play level, and adaptive socialization skills as secondary outcomes. Robertson and Ellis Weismer139 measured adaptive socialization skills and parental stress as outcomes.

Detailed Synthesis of Prior Evidence With New Findings on Other Outcomes

Two trials examined the effects of language treatments on socialization, either among children receiving community-based speech-language pathology services135 or among language-delayed toddlers receiving small-group therapy.139 The former trial produced no significant differences between children in the treatment and control groups in socialization outcomes, while the latter produced significant differences in favor of children in the treatment group, with large effect sizes.

Of the two trials reporting outcomes related to child behavior problems, one tested a low-intensity parent group program for parents of slow-to-talk toddlers,128 and the other provided up to 18 one-hour in-home speech and language treatment sessions for preschoolers with specific language impairment, with the sessions conducted by a language assistant.130 Neither found treatment to have a significant effect on children’s problem behaviors, with very small effect sizes. Similarly, two trials reporting secondary outcome measures of well-being (in toddlers)135 and health-related quality of life (in preschoolers)130 reported nonsignificant effects of treatment and very small effect sizes in both cases.

Contrasting with these null findings, two trials measured outcomes related to emergent literacy skills for speech and language treatments conducted in preschoolers125,130 and found significant improvement in letter knowledge in both cases, with small effect sizes. Although one of these studies failed to find a significant treatment effect for a broader construct of literacy,125 the researchers found a significant treatment effect of moderate size on a measure of reading comprehension first administered at a 6-month followup. Further, these differences were mediated by differences in oral language associated with being in the treatment group.

Several other outcomes were examined only in single trials. Glogowska et al135 found no significant advantages in favor of toddlers randomized to receive speech-language pathology services versus those in the control condition on measures of well-being, attention level, or play. Robertson and Ellis Weismer139 found that parents of language-delayed toddlers randomized to participate in small-group language therapy reported significantly greater improvements in parental stress than parents of toddlers in the control condition; the effect size for this finding was large.

Key Question 7. What Are the Adverse Effects of Interventions for Speech and Language Delays or Disorders?

Three studies examined potential adverse effects of interventions.130,135,139 The small-group intervention study conducted by Robertson and Ellis Weismer139 found greater improvement in parent stress, as measured by the Parental Stress Index, in the intervention group. Glogowska et al135 found no differences in well-being between a group receiving individual treatment and the control group, and Wake et al130 found no differences in health-related quality of life.
