Received 28 December 2015; accepted 13 February 2016; published 16 February 2016
During the child’s second year of life, parents often become concerned if their child does not talk as other children of the same age do. Many studies have shown how difficult it is to predict early language delay (Henrichs, Rescorla, Schenk, Schmidt, Jaddoe, Hofman et al., 2011; Klee, Carson, Gavin, Hall, Kent, & Reece, 1998; Nelson, Nygren, Walker, & Panoscha, 2006; Reilly, Wake, Bavin, Prior, Williams, Bretherton et al., 2007; Zubrick, Taylor, Rice, & Slegers, 2007). Screening can be used to select individuals from a given population who are at risk for an impairment (Law, Boyle, Harris, Harkness, & Nye, 1998). By 24 months, most children have acquired a basic vocabulary and use a varying number of words per utterance, but some children still produce no word combinations (Fenson, Marchman, Thal, Dale, Reznick, & Bates, 2007). Even slow learners are known to improve their vocabulary from 2 to 3 years, and many of them are reported to have roughly normal language skills at 5 or 6 years (Rescorla & Lee, 1999; Rescorla, Roberts, & Dahlgaard, 1997). Despite this natural recovery, some children with early speech and language delay have an increased risk for learning disabilities at school (Bishop & Clarkson, 2003; Catts, Fey, Tomblin, & Zhang, 2002; Rescorla, 2009). Language-delayed children also have behavioral and psychosocial problems more frequently than their typically developing age-mates (Cohen, Menna, Vallance, Barwick, Im, & Horodezky, 1998).
The estimated prevalence of speech and language delay ranges from 5% to 8% for children at 2 to 4.5 years (Burden, Stott, Forge, & Goodyer, 1996; Law et al., 1998). Reilly and co-workers (2007) reported a much higher prevalence, 20%, for late talkers at 2 years. In the study of Zubrick and co-workers (2007), 13% of children at 2 years were identified as late talkers. According to Klee and colleagues (1998), about 15% of 24-month-old children may screen positive, but only about 3% to 8% will show evidence of language impairment at a later age.
Besides the actual language evaluation, it is important to identify risk and protective factors in order to understand an individual child’s challenges and resiliencies. Family history is often mentioned as an important risk factor for language delay (Cambell, Dollaghan, Rockette, Paradise, Feldman, Shriberg et al., 2003; Choudhury & Benasich, 2003; Horwitz, Irwin, Briggs-Gowan, Bosson Heenan, Mendoza, & Carter, 2003; Tomblin, Smith, & Zhang, 1997; Zubrick, Taylor, Rice, & Slegers, 2007). In many studies, low maternal education is associated with the child’s language delay (Cambell et al., 2003; Tallal, Ross, & Curtiss, 1989; Tomblin, Smith, & Zhang, 1997). Early neurobiological development (e.g., low birth weight and prematurity) was a consequential predictor of language delay in the study by Zubrick et al. (2007). Childhood illnesses, low socio-economic status and non-traditional family structure are less often included in the list of risk factors (Singer, Siegel, Lewis, Hawkins, Yamashita, & Baley, 2001). Many studies have reported a significant gender effect in early lexical skills, with girls showing advanced language development relative to boys (Cambell et al., 2003; Choudhury & Benasich, 2003; Zubrick et al., 2007). To conclude, there seems to be a cumulative effect: children with multiple risk factors more often need special support than children with the same level of language disability but a better resourced background (Crais, 2011).
Parental reports and questionnaires are used for language screening within general health care, even though some clinicians doubt the accuracy of these methods. However, many studies have shown their validity and reliability (Camaioni, Castelli, Lombardi, & Volterra, 1991; Dale, Bates, Reznick, & Morisset, 1989; Ring & Fenson, 2000; van Agt, van der Stege, de Ridder-Sluiter, & de Koning, 2007). When parents complete word checklists over the course of several days, they do not have to rely on their memory. The MacArthur Communicative Development Inventories (CDI) is one of the most commonly used parent-reported instruments for screening early language acquisition (Fenson, Dale, Reznick, Thal, Bates, Hartung, Pethick, & Reilly, 1993; Fenson et al., 2007; Klee et al., 1998). The CDI-T (words and sentences) is used to assess language skills in toddlers from 16 to 30 months. Ring and Fenson (2000) tested children’s lexicon at 25 months using two methods: word checklists completed by the parents and a laboratory assessment using corresponding pictures. The study concluded that parents were reliable evaluators of their child’s lexical skills.
The child’s vocabulary is a commonly accepted criterion of language acquisition and an important component of cognitive development. On the other hand, there is very little, if any, knowledge of language competence as a whole in toddlers (Crais, 2011) . Language competence consists of many sub-skills, with each having its own developmental pathways (Poll & Miller, 2013) .
In this study, we examined whether early language screening can be used to identify children who need special support or intervention. The study was designed to compare two differently constructed screening instruments applied at two age points: 1) a parent-reported vocabulary checklist at 24 months; and 2) a battery of language skills at 36 months administered by a clinical nurse. The sensitivity and specificity of these screening methods were compared using clinical language test results at 36 months as outcome measures.
2.1. Study Design and Participants
Participants were children from a Finnish cohort study entitled Steps to the Healthy Development and Well-being of Children (the STEPS study) (Lagström, Rautava, Kaljonen, Räihä, Pihlaja, Korpilahti et al., 2013). The STEPS study is a population-based longitudinal study that investigates children’s physical, psychological and social development, starting from pregnancy and continuing until adolescence. One aim of the STEPS study is to identify precursors and causes of problems in children’s health and well-being. The study population is two-tiered: the cohort group (N = 9,936 children) and the intensive follow-up group (N = 1,827 children, 18.3% of the cohort). The cohort group consists of all children born in the Hospital District of Southwest Finland between January 2008 and April 2010 and their parents (children acquiring Finnish or Swedish as their native language). Mothers were recruited to the intensive follow-up group during the first trimester of pregnancy or soon after delivery.
The Let’s Talk STEPS study sample for this paper consists of 226 children (120 boys and 106 girls). Participants were randomly selected from the intensive follow-up group (N = 1,827), excluding the Swedish-speaking mothers (N = 74) (Figure 1).
Children’s early expressive vocabulary was reported by parents with the MacArthur Communicative Development Inventories (CDI-T) (Fenson et al., 1993; Finnish version and normative data by Lyytinen, 1999), and coded into the STEPS cohort database. The CDI-T word checklists were mailed to the parents when the child was 24 months old (the toddler form, CDI-T, is usable from 16 to 30 months). Parents could also complete the vocabulary checklists online. A child was categorized as having delayed language at 24 months when the CDI-T score fell below the 10th percentile. CDI-T information was received for 188 (83%) children of the LT-STEPS study group (N = 226).
Figure 1. The flow chart describing the selection process of the study.
Screening was performed for the second time when the family visited the Turku Institute for Child and Youth Research (CYRI) for a health check at 36 months. A trained nurse carried out the screening with the Fox Language Inventory (FLI) (Korpilahti & Eilomaa, 2002) individually for each child during a session of 10 to 15 minutes. The FLI is designed for screening speech and language at 36 months. It is based on the Finnish language and is widely used as a screening instrument within daycare and general health care. It measures 8 linguistic sub-skills: naming, language comprehension, following instructions, intelligibility of the child’s speech, narration, morphology, knowledge of numbers and colors, and sentence length. The screening material consists of 8 black and white line-drawn pictures, small toys, and 8 color-drawn pictures presenting the main character, Fox, in everyday situations. The FLI provides sub-area scores and a total score for the child’s language competence. The normative data for the FLI come from 100 healthy, monolingual Finnish-speaking children aged 3;0 to 3;6 years (Korpilahti & Eilomaa, 2002). A child is classified as being at risk for language disorder when the total score is below 45 points (maximum 60 points). In our study, FLI scores were received for 146 (65%) children of the LT-STEPS study group.
At the age of 36 months ±2 weeks, children were invited for a language assessment at the Turku University Clinic of Speech-Language Pathology. With the invitation letter, the parents received an informed consent form describing the purpose of the language assessment and providing information about its content and duration. The Renfrew Word Finding Vocabulary Test (RWF) (Renfrew, 1995) and the language comprehension scale of the Reynell Developmental Language Scales III (RDLS III) (Edwards, Fletcher, Garman, Hughes, Letts, & Sinka, 1985) served as measures of the child’s language competence.
The RWF contains 50 line-drawn picture cards representing everyday lexicon, arranged in order of difficulty. The testing is conducted by presenting one picture at a time until the child fails to name six pictures in a row. The assessment then continues with the next sets of six pictures until the child can no longer name any of the pictures correctly. For the RWF the original scoring was used. The RDLS III language comprehension sub-test contains single words and sentences in order of difficulty and growing demand for reasoning. The RDLS III has standardized Finnish norms (Kortesmaa, Heimonen, Merikoski, Warma, & Varpela, 2001).
Language testing was performed by a trained research assistant under the supervision of an experienced speech-language therapist. Some children were very shy and refused testing. For statistical analyses, we obtained test results from 217 (96%) children for the RWF and from 218 (96%) children for the RDLS III. After testing, the family was provided with written feedback on the assessment results. In the case of a severe speech or language delay, the parents were instructed to contact their local health care center for further diagnosis and support.
2.3. Statistical Analyses
Statistical analyses were partly based on non-parametric methods because not all of the data met the assumptions of parametric statistics. Spearman’s rho was used to calculate the correlations between the CDI-T, the FLI and the language outcome measures. Analyses were complemented by either the Mann-Whitney U test or the Kruskal-Wallis one-way analysis of variance (ANOVA) to test the differences between the study groups. Receiver operating characteristic (ROC) analysis was used to analyze the cost and benefit of the two screening methods in decision making (Griner, Mayewski, Mushlin, & Greenland, 1981). ROC analysis indicates the sensitivity and specificity of the research method. The ROC curve was used to visualize the fraction of true positives out of all actual positives (sensitivity) against the fraction of false positives out of all actual negatives (1 - specificity). The area under the curve (AUC) gives the probability that a randomly chosen positive case is ranked higher than a randomly chosen negative case. The level of statistical significance was set at 0.05 in all analyses. Statistical analyses were carried out using SAS version 9.3 (SAS Institute, Cary, NC, USA).
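As an illustration of the quantities named above, the following Python sketch computes sensitivity, specificity and the AUC from screening scores. It is not the SAS procedure used in the study: the function names, the cutoff and the scores are hypothetical, and the sketch assumes that a lower screening score indicates a higher risk of language delay.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    scores_delayed: Sequence[float],
    scores_typical: Sequence[float],
    cutoff: float,
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for a 'score below cutoff' screen.

    Assumption: a child screens positive when the score is below the cutoff.
    """
    true_positives = sum(score < cutoff for score in scores_delayed)
    true_negatives = sum(score >= cutoff for score in scores_typical)
    return (
        true_positives / len(scores_delayed),
        true_negatives / len(scores_typical),
    )


def auc(scores_delayed: Sequence[float], scores_typical: Sequence[float]) -> float:
    """Probability that a delayed child scores lower than a typical child.

    Every delayed/typical pair is compared; ties count as half a win.
    This pairwise definition is equivalent to the area under the ROC curve.
    """
    wins = 0.0
    for delayed in scores_delayed:
        for typical in scores_typical:
            if delayed < typical:
                wins += 1.0
            elif delayed == typical:
                wins += 0.5
    return wins / (len(scores_delayed) * len(scores_typical))


# Hypothetical vocabulary scores, not data from the study.
delayed_scores = [20, 50, 120]
typical_scores = [100, 200, 300, 400]

sens, spec = sensitivity_specificity(delayed_scores, typical_scores, cutoff=110)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
print(f"AUC={auc(delayed_scores, typical_scores):.3f}")
```

Moving the cutoff trades sensitivity against specificity, whereas the AUC summarizes the screen over all possible cutoffs.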
2.4. Ethics Approval
The Ministry of Social Affairs and Health and the Ethics Committee of the Hospital District of Southwest Finland have approved the STEPS Study (February 2007). The Ethics Committee of the University of Turku has approved the Let’s Talk STEPS study (March 2011). The parents have given their written informed consent and have been informed of their right to withdraw from the study at any point. The description of the scientific data file is formulated according to guidelines issued by the Office of the Data Protection Ombudsman. The data are stored in a secure manner at the Turku Institute for Child and Youth Research (CYRI), University of Turku.
At 24 months 9.6% of the target children were categorized as language delayed, using the CDI-T as a screening instrument. Individual differences in the size of expressive vocabulary were quite notable. Some children had reached almost the maximal value of the screening instrument (595 words). On the other hand, there were some children who did not have any expressive words (mean 340 words; range 0 - 587 words).
Screening with the FLI showed that 10.9% of the study population had delayed language at 36 months. At this age, 8.6% of children reached the maximum score for the FLI (60 points). The test results were skewed toward high scores (mean 56 points; range 31 to 60 points) (Table 1).
The language skills at 36 months were tested for 226 children, using the RWF (Renfrew, 1995) for measuring the vocabulary and the RDLS III (Edwards et al., 1985) for language comprehension. Statistical analyses involved 96% of the tested children: lexical test results were obtained from 217 children and language comprehension test results from 218 children. The results of the language tests at 36 months showed that altogether 19 children, 15 boys and 4 girls (8.8%), had delayed language. They had not reached age norms for vocabulary (n = 12, score < 9) or language comprehension (n = 10; score < 80). Three of these children had difficulties in both sub-skills.
The ROC analysis confirmed that expressive vocabulary at 24 months (CDI-T; n = 188) was already a reliable indicator of language delay as shown by language test scores a year later (Table 2). The AUC for the CDI-T was 80.1%, that is, the estimated probability that the screen ranks a language-delayed child as more at risk than a typically developing child. The sensitivity for the CDI-T was 84.6%, the specificity 70.9%, and the positive likelihood ratio (LR, the ratio of the true-positive rate to the false-positive rate) 2.91. These results also suggest that parents were competent in evaluating their child’s language acquisition.
The ROC analysis indicated that a broader task battery and the professional expertise of the examiner, together with screening a year later, slightly improved the reliability of screening. The AUC for the FLI (n = 146) was 80.6%. The sensitivity for the FLI was 85.7%, the specificity 73.7%, and the LR 3.26. The areas between the ROC curve and the no-discrimination line demonstrate the efficacy of the two screening methods (Figure 2 and Figure 3).
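The reported LR values agree with the standard definition of the positive likelihood ratio, sensitivity divided by (1 - specificity). The short Python check below reproduces them from the reported sensitivities and specificities; the function name is hypothetical, and the assumption that this definition was the one applied is inferred from the matching numbers rather than stated in the text.

```python
def positive_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """LR+ = true-positive rate / false-positive rate."""
    return sensitivity / (1.0 - specificity)


# Sensitivity and specificity as reported for the two screens.
lr_cdi = positive_likelihood_ratio(0.846, 0.709)
lr_fli = positive_likelihood_ratio(0.857, 0.737)

print(round(lr_cdi, 2))  # 2.91
print(round(lr_fli, 2))  # 3.26
```

An LR of about 3 means that a positive screen is roughly three times as likely in a child with delayed language as in a typically developing child.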
If the child had not reached age-typical vocabulary at 24 months, he or she had a 22% probability of having poor language skills at 36 months as well, compared with an 8% probability for typically developing children (p = 0.0118). A child categorized as having delayed language at 24 months was also likely to belong to the delayed language (DL) group at 36 months, as indicated by the FLI total score (p = 0.0002). If the child was found to have language delay on the FLI, he or she had a 12% probability of also having poor language skills in the language tests at 36 months, compared with a 1% probability for a child from the typically developing group. The difference between the delayed and the typically developing group was statistically significant, p = 0.0024 (Fisher’s Exact Test).
Table 1. Screening scores at 24 months (CDI-T) and at 36 months (FLI), and outcome measures for word finding (RWF) and for language comprehension (RDLS-III) at 36 months.
Figure 2. The receiver operating characteristic (ROC) curve of the CDI-T at 24 months, dotted line for random guess. The outcome measures at 36 months were RWF (word finding) and RDLS-III (language comprehension).
Figure 3. The receiver operating characteristic (ROC) curve of the FLI at 36 months, dotted line for random guess. The outcome measures at 36 months were RWF (word finding) and RDLS-III (language comprehension).
Table 2. ROC analyses and confidence levels for the two screening methods, CDI-T at 24 months and FLI at 36 months.
Screening scores at 24 and at 36 months both correlated significantly with word finding (RWF) and language comprehension (RDLS III) (p < 0.0001). The Fox Language Inventory was found to be a consistent screening method at 36 months (Cronbach’s alpha = 0.66). In the FLI task battery, sentence length was the only sub-skill that stood apart from the other language abilities (Table 3).
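For readers unfamiliar with the consistency measure, Cronbach’s alpha for k sub-skill scores is k/(k - 1) multiplied by (1 - sum of the item variances / variance of the total score). The Python sketch below implements this standard formula; the function name and the scores are hypothetical, and it is not the computation run in the study.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(item_scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha from per-item score lists.

    item_scores[i][j] is the score of child j on sub-skill i.
    Sample variances are used for both the items and the totals.
    """
    k = len(item_scores)
    # Total score per child: sum across sub-skills.
    totals = [sum(child_scores) for child_scores in zip(*item_scores)]
    item_variance_sum = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical scores for three sub-skills and four children.
example_items = [
    [5, 6, 7, 8],
    [4, 6, 6, 8],
    [5, 5, 7, 7],
]
print(f"alpha={cronbach_alpha(example_items):.2f}")
```

Alpha approaches 1 when the sub-skills rise and fall together across children and decreases when they vary independently.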
This study focused on children with primary language delay, in the absence of a clear etiology. At 36 months of age, 12 (63%) of the language-delayed children had some markers of familial risk: 3 mothers and 3 fathers had had learning problems at school, 6 mothers and 7 fathers had needed special education, and in 4 cases the family history showed language-related problems among more distant relatives. A significant gender effect in favor of girls was found already at 24 months (p = 0.0072), and it became even more prominent for language comprehension at 36 months (p = 0.0005).
Table 3. Spearman correlations between the parent reported vocabulary (CDI-T) at 24 months, FLI at 36 months, and language test results RWF and RDLS at 36 months.
p < 0.05*, p < 0.01**, p < 0.001***. FLI: I = naming, II = language comprehension, III = following instructions, IV = intelligibility of child’s speech, V = narration, VI = morphology, VII = knowledge of numbers and colors, VIII = sentence length.
We compared two methods to determine their reliability as screening instruments and the appropriate age point for screening the risk of language delay. ROC analysis was used to assess the sensitivity and specificity of each screening tool. ROC analysis is a powerful method for quantifying false-negative and false-positive results, which may cause anxiety for parents and lead to unnecessary interventions. We found that parent-reported vocabulary scores at 24 months can already be used as a reliable screening measure for later language development. The CDI-T can be recommended as a simple yet effective screening instrument because it does not require professional training. However, a wider language battery, the FLI, applied at the later age point of 36 months had slightly higher screening sensitivity and specificity. Both screening instruments were able to identify those children who needed special support in their language development.
According to the outcome test results, 8.8% of the children had delayed language at 36 months and might be at risk for later language disorders. This prevalence is lower than that reported for 2-year-old children by Reilly et al. (2007) (20%) and Zubrick et al. (2007) (13%). One reason for the lower prevalence in our data may be that we did not test articulation as a part of the outcome measures. Children from 2 to 3 years are just learning the phonological structure of their native language, and this is reflected in their oral-motor skills and pronunciation. Our data were based on voluntary participation and a healthy population, which may also have affected the incidence of delayed language (Lagström et al., 2013).
The data of this study are derived from a large population-based cohort, the STEPS study. Even though there are several studies on early language acquisition, screening results are seldom based on population data. In our study, randomly selected children (N = 226) were invited to a language assessment at 36 months in order to test their vocabulary and language comprehension, the two main indicators of future language learning. Unfortunately, we did not have screening data for the whole research group, which weakened the statistical analyses. We found that a later time point and a wider test battery improved the validity of the screening instrument. One limitation of the present study is that the early screening was based only on parent reports, although many studies have shown their validity and reliability (Camaioni, Castelli, Lombardi, & Volterra, 1991; Dale, Bates, Reznick, & Morisset, 1989). Further research is also needed into the predictive value of early screening when compared with language skills at pre-school and school age. Many language-delayed children are found to catch up with their age-mates and to have approximately normal language skills at 5 or 6 years of age (Rescorla, 2011).
5. Conclusions and Practical Implications
We conclude that expressive vocabulary at 24 months has a strong prognostic value for language competence at 36 months. However, language screening based on a wider task battery a year later gives an even more reliable basis for diagnostic decision making. The results of this study demonstrate that severe language delay can be identified at two to three years of age. Language screening, clinical assessment and test results serve as gateways to speech-language therapy and intervention. Accurate identification of language delay is important from clinical, educational and economic perspectives. Reliable screening methods are needed to allocate health care resources to children at risk of language delay. Multidisciplinary research, and especially knowledge of the neural bases of language, is needed to gain a deeper understanding of language delay and permanent disability.
We wish to thank the University of Turku, Åbo Academy University, the Turku University Hospital and the City of Turku for funding this research. The first author was supported by the Emil Aaltonen Foundation. We wish to thank all our collaboration partners at the University of Turku and Turku University Hospital. The authors are grateful to all the families who took part in this study, the midwives for their help in recruiting them, and the entire STEPS Study team.
Conflict of Interests