The predictive value of cognitive ability tests for criteria such as academic success and different aspects of job performance is normally attributed to the effects of the latent abilities that cognitive tests are supposed to measure. However, concomitant relations between test scores and non-ability factors (e.g., affective traits, socio-economic indicators) continue to support the idea that cognitive ability test scores and/or associated predictive validity coefficients may be biased.
In particular, substantial attention has been given to the examination of the influence of test anxiety. Although test anxiety is a complex construct with multiple cognitive, affective and behavioral components, it is well known that it may decrease cognitive ability test scores and college grades. The correlation between test anxiety scores and intelligence test scores is negative and substantial: r = −0.23 (p < 0.01) or r = −0.33 (p < 0.01). These correlations are open to alternative interpretations.
According to the interference model, test anxiety artificially lowers the performance on cognitive ability tests. In this case, test anxiety introduces measurement bias into the test scores. That is to say, two test takers who have the same level of general cognitive ability (g), but who differ in test anxiety, will differ in their expected test score, such that the person with the higher level of test anxiety has a lower expected test score than the person with the lower level of test anxiety.
On the other hand, the deficit model states that test anxiety does not cause lower test scores, but rather that the correlation between (measures of) test anxiety and test scores exists because people with lower ability levels tend to be higher in test anxiety. In this case, there is no measurement bias because of test anxiety. Two test takers with the same level of (g) will have the same expected score on the test, regardless of their (relative) level of test anxiety.
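The contrast between the two models can be illustrated with a small simulation. The sketch below is not part of the study: the coefficients (0.8, −0.3, −0.4), the function names, and the treatment of (g) as a directly observed quantity are all illustrative assumptions. It generates scores under each model and then estimates the effect of anxiety on the score while holding (g) constant.

```python
import random

def simulate(n, interference, seed=0):
    """Generate (g, anxiety, score) lists under one of the two models (hypothetical values)."""
    rng = random.Random(seed)
    g, anxiety, score = [], [], []
    for _ in range(n):
        gi = rng.gauss(0, 1)
        if interference:
            # Interference: anxiety is unrelated to ability but lowers the score directly.
            ai = rng.gauss(0, 1)
            xi = 0.8 * gi - 0.3 * ai + rng.gauss(0, 0.5)
        else:
            # Deficit: lower ability raises anxiety; the score depends on g only.
            ai = -0.4 * gi + rng.gauss(0, 1)
            xi = 0.8 * gi + rng.gauss(0, 0.5)
        g.append(gi)
        anxiety.append(ai)
        score.append(xi)
    return g, anxiety, score

def cov(u, v):
    """Sample covariance of two equally long lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

def corr(u, v):
    """Pearson correlation."""
    return cov(u, v) / (cov(u, u) * cov(v, v)) ** 0.5

def direct_effect(g, anxiety, score):
    """OLS slope of score on anxiety with g held constant (two-predictor regression)."""
    sgg, saa, sga = cov(g, g), cov(anxiety, anxiety), cov(g, anxiety)
    sgx, sax = cov(g, score), cov(anxiety, score)
    return (sax * sgg - sgx * sga) / (saa * sgg - sga ** 2)

for label, flag in (("interference", True), ("deficit", False)):
    g, a, x = simulate(20000, interference=flag, seed=1)
    print(label, "raw r =", round(corr(a, x), 2),
          "| effect given g =", round(direct_effect(g, a, x), 2))
```

In both simulated scenarios the raw correlation between anxiety and score is negative, but only in the interference scenario does anxiety retain an effect once (g) is held constant. With real data (g) is latent rather than observed, which is why a latent variable model is needed to separate the two accounts.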
1.1. The Present Study
These alternative models can be readily distinguished within the Confirmatory Factor Analysis (CFA) framework, given multivariate test scores and an identified CFA model.
To our knowledge, despite the aforementioned studies, little research has directly investigated whether test anxiety and/or its components can induce measurement bias in cognitive ability tests. Therefore, the aim of the present study was to test for measurement bias in cognitive ability tests, due to a cognitive component of test anxiety: namely, cognitive interference.
1.1.1. Cognitive Interference
The term cognitive interference refers to intrusive thoughts, that is, thoughts that are unwanted, undesirable and perhaps disturbing. The bulk of research on cognitive interference has examined its role in test-taking situations. Intrusive thoughts occurring in academic situations are hypothesized to be a function of test anxiety. Sarason, Keefe, Hayes & Shearin view cognitive interference as a mediator of the performance deficits associated with test anxiety. In this context, cognitive interference refers to thoughts that intrude during exams but have no functional value in solving the cognitive task at hand.
Both situational factors and individual differences in test anxiety are thought to play a crucial role in the likelihood of occurrence of interfering thoughts. In highly test-anxious individuals, task-irrelevant processing is claimed to consume working memory capacity that, in less anxious individuals, remains available for task performance. Field evidence and lab studies suggest that cognitive interference may be a key factor in reducing the quality or efficiency of exam performance, and numerous studies have documented the debilitating effects of cognitive interference on task performance. Students whose cognitions during actual course examinations are characterized by intrusive off-task thoughts achieve lower grades than do more task-focused students, a finding that has been replicated in several countries.
Although results of research on test anxiety are consistent with cognitive interference as an explanatory mechanism, little research has investigated whether differences in cognitive interference result in measurement bias on cognitive ability tests. Furthermore, it would be useful to assess as directly as possible the cognitive interference people experience on particular tasks. Therefore, the specific aim of the present study was to test for measurement bias in cognitive ability tests due to cognitive interference as a state, using a structural equation modeling technique.
1.1.2. Measurement Invariance Analyses
Measurement invariance analyses address the question of whether, and if so, how, “groups differ in the way the measurement of a psychological construct (e.g., mathematics test score) is related to that construct (e.g., mathematical ability)”. Measurement invariance is said to exist if the manifest random variable(s) (i.e., observed item or test scores) are a function of only the latent ability variables and are conditionally independent from scores on an external variable. Conversely, measurement bias is present if the observed scores are functions of an external variable in addition to the latent variable underlying the manifest random variable(s).
According to Reeve & Bonaccio, when the external variable is continuous, measurement bias can be assessed using the following single-group Structural Equation Modeling (SEM) approach:
One first estimates a “General Bias Model” which includes paths from the latent variables to their respective observed indicators, as well as from the putative external biasing variable(s) to the latent variables and the observed indicators. For the current study, cognitive interference is included as the external variable in the model. The critical question for measurement bias is whether the external variable has any direct effects on the observed indicators.
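The verbal definition above can be written compactly as a conditional independence statement. The symbols are notation introduced here for illustration and are not taken from the original text: X denotes the observed scores, η the latent ability variable(s), V the external variable, and f a conditional distribution. Measurement invariance with respect to V holds when

```latex
\[
  f(X \mid \eta, V) \;=\; f(X \mid \eta)
\]
```

for all values of η and V. Measurement bias corresponds to any violation of this equality, that is, to V carrying information about X beyond what η already provides.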
Second, this “General Bias Model” can be compared to a “No Bias Model” in which the external variable only influences the latent variables and has no direct impact on the observed indicators (i.e., these paths are constrained to zero). The “No Bias Model” is nested within the “General Bias Model”, thus they can be tested for significant differences in fit due to the additional constraints.
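One common way to formalize these two steps, given here as a sketch under assumed notation rather than as the specification actually used in the software, is the following. Let x_j be the j-th observed indicator, η_k the latent ability factors, and v the external variable; the symbols τ_j, λ_jk, β_j, γ_k, ε_j and ζ_k are introduced only for this illustration.

```latex
\begin{align}
  x_j    &= \tau_j + \sum_{k} \lambda_{jk}\,\eta_k + \beta_j\, v + \varepsilon_j , \\
  \eta_k &= \gamma_k\, v + \zeta_k .
\end{align}
```

In this notation the “General Bias Model” allows direct effects β_j of the external variable on the indicators, the “No Bias Model” imposes β_j = 0 on those paths, and the γ_k capture the association between the external variable and the latent abilities in both models. The difference in χ2 between the two nested models, with degrees of freedom equal to the number of constrained paths, then tests whether the direct effects are needed.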
The single-group SEM approach used in the present study was similar to the aforementioned approach, which Reeve & Bonaccio proposed in order to test for measurement bias due to test anxiety in cognitive ability tests.
The total sample consisted of 231 volunteer undergraduate students: 124 (53.7%) men and 107 (46.3%) women. Their ages ranged from 19 to 38 years. Of the participants, 114 (49.6%) were attending the School of Education and the remaining 117 (50.4%) were attending Schools of Social Sciences, Mathematics, Physical Sciences, Informatics, Engineering and Life Sciences at Greek Universities. As regards class level, the sample included 123 (54.2%) freshmen, 16 (7.0%) sophomores, 32 (14.1%) juniors and 56 (24.6%) seniors, who were attending the 1st, 3rd, 5th and 7th semester of their studies, respectively. Exclusion criteria were a history of neurological conditions or psychiatric diseases, alcohol or drug abuse, and profound visual impairments.
2.2. Psychometric Instruments
Cognitive Ability Tests:
(a) The Paper Folding (PF) test is a visualization task that assesses visuo-spatial ability. It involves 10 items which require mental folding and unfolding of pieces of paper.
(c) The Number Series (NS) test is an inductive task. The NS test contains 20 items in which a series of five or six numbers is given, and the task requires two more numbers to be added at the end of the series. The NS test assesses fluid intelligence. Fluid intelligence has been identified in the past as (g).
Cognitive Interference Questionnaire (CIQ):
The CIQ is a 22-item questionnaire designed to measure, following performance on a task, the degree to which people experienced various types of thoughts while working on it, and the degree to which these thoughts are viewed as interfering with concentration.
The CIQ is a state measure of two types of thoughts in a specific situation: task-oriented worries and off-task thoughts. In the present study, the “Task-oriented Worries” dimension was chosen to test for measurement bias due to cognitive interference in cognitive ability tests. For the purposes of a previous study, the “Task-oriented Worries” dimension (the first 10 items) of the CIQ had been translated into Greek and the single factor structure of the Greek version was verified with CFA. In our sample, internal consistency was satisfactory (Cronbach’s α = 0.83).
Sample item: “I thought about how others have done on this task.”
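For readers unfamiliar with the internal consistency coefficient reported above, the following minimal sketch shows the standard formula for Cronbach’s α, namely k/(k − 1) × (1 − Σ item variances / variance of the total score). The data are synthetic and the function names are hypothetical; nothing here reproduces the study’s responses.

```python
import random

def variance(values):
    """Unbiased sample variance."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - 1)

def cronbach_alpha(items):
    """items: list of k lists, each holding one item's responses across all respondents."""
    k = len(items)
    totals = [sum(responses) for responses in zip(*items)]  # total score per respondent
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

# Synthetic example: 10 items that share a common component plus item-specific noise.
rng = random.Random(42)
n_respondents, n_items = 2000, 10
common = [rng.gauss(0, 1) for _ in range(n_respondents)]
items = [[c + rng.gauss(0, 1) for c in common] for _ in range(n_items)]
print(round(cronbach_alpha(items), 2))
```

The coefficient rises with the number of items and with the average correlation among them, so a 10-item scale can reach a satisfactory value even when individual items are only moderately correlated.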
Data were collected across multiple sessions ranging in size from 15 to 20 participants. The Cognitive Interference Questionnaire (CIQ) was administered after participants had completed the cognitive ability tests. Participants also provided demographic information, including age, gender and class level, prior to completing the questionnaire. Participation in the study was voluntary and participants were informed that all results were confidential.
2.4. Statistical Analyses
The cognitive ability model employed in this research was fitted following Gustafsson & Balke’s “nested-factor” measurement model method. As the name implies, “the less general factors are nested within the more general factors”, and it should be noted that: (1) as the nested-model approach is used, a separate factor for inductive ability was not specified because this factor is essentially indistinguishable from the (g) factor, and (2) rather than using full-scale scores or individual items in the SEM analyses, item parcels were used as indicators in the measurement model. The “nested-factor” measurement model fit our sample data acceptably: χ2 (80, N = 231) = 144.25, p < 0.001, χ2/df = 1.80, CFI = 0.97, SRMR = 0.05, RMSEA = 0.06 (CI90% 0.04 - 0.07). Confirmatory factor analysis was conducted in EQS 6.1. As regards the sample size requirements for SEM techniques, it is recommended as a rule of thumb that there be at least five observations per estimated parameter. In the three “nested-factor” measurement models that were estimated in the present study, the number of free parameters ranged from 40 to 44. Hence, the sample size for these models had to be at least 220 (5 × 44). With N = 231, the sample exceeded the minimum recommended level for performing the “nested-factor” measurement model technique.
In order to test the degree to which the item parcels were unbiased measures of the latent ability variables, we applied the aforementioned single-group SEM approach and fit two models to the data using EQS 6.1. In the first model (hereafter called the “General Bias Model”), the cognitive interference measure was specified as a variable external to the latent ability factors, which in turn were specified as the determinants of performance on the item parcels. In addition, we freely estimated paths from the cognitive interference variable to the item parcels. This model was then compared to a second model in which the paths from cognitive interference to the item parcels were constrained to zero (hereafter referred to as the “No Bias Model”).
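The sample size arithmetic can be spelled out in a few lines. This is only a restatement of the rule of thumb cited above; the function name and the default of five observations per parameter are labels chosen for this sketch.

```python
def minimum_n(free_parameters, observations_per_parameter=5):
    """Smallest sample size allowed by an observations-per-parameter rule of thumb."""
    return free_parameters * observations_per_parameter

largest_model_parameters = 44   # upper end of the reported range of free parameters
sample_n = 231                  # reported sample size

required_n = minimum_n(largest_model_parameters)
meets_rule = sample_n >= required_n
print(required_n, meets_rule)
```

Because the requirement grows with the number of free parameters, checking the largest of the estimated models is sufficient to cover the smaller ones.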
The Goodness-of-Fit Indexes for the two models were the following:
“General Bias Model”: χ2 (92, N = 231) = 151.46, p < 0.001, χ2/df = 1.65, CFI = 0.97, SRMR = 0.05, RMSEA = 0.05 (CI90% 0.04 - 0.07). The “General Bias Model” is displayed in Figure 1.
“No Bias Model”: χ2 (93, N = 231) = 158.88, p < 0.001, χ2/df = 1.71, CFI = 0.97, SRMR = 0.05, RMSEA = 0.06 (CI90% 0.04 - 0.07). The “No Bias Model” is displayed in Figure 2.
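Two of the reported indices can be recomputed from the χ2 statistics alone. The sketch below uses the textbook point estimate RMSEA = sqrt(max(χ2 − df, 0) / (df × (N − 1))); whether EQS 6.1 uses exactly this variant (for example, N instead of N − 1) is an assumption, and the function names are hypothetical. CFI and SRMR cannot be recovered this way because they require the baseline model and the residual matrix.

```python
import math

def chi2_df_ratio(chi2, df):
    """Relative chi-square."""
    return chi2 / df

def rmsea(chi2, df, n):
    """Textbook RMSEA point estimate (assumes the N - 1 convention)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

N = 231
models = {
    "measurement model": (144.25, 80),
    "General Bias Model": (151.46, 92),
    "No Bias Model": (158.88, 93),
}
for name, (chi2, df) in models.items():
    print(f"{name}: chi2/df = {chi2_df_ratio(chi2, df):.2f}, "
          f"RMSEA = {rmsea(chi2, df, N):.2f}")
```

Rounded to two decimals, the recomputed values agree with the χ2/df ratios and RMSEA point estimates reported for the three models.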
Judging by their CFI values and their low RMSEA and SRMR values, both models showed an acceptable overall fit. However, there was a slight decrement in fit when the additional constraints of the “No Bias Model” were imposed. The difference in chi-squares between the two models [Δχ2 (Δdf = 1) = 7.42, p < 0.01] was significant, and the χ2/degrees of freedom ratio was slightly lower for the “General Bias Model”.
Figure 1. The “General Bias Model”. *NS = Number Series, PF = Paper Folding, NF = Number Facility, VSA = Visuo-Spatial Ability. **All loadings drawn indicate significant associations (p < 0.05).
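The nested model comparison can be verified from the reported fit statistics. The sketch below is an illustration with hypothetical function names; it relies on the identity that, for exactly one degree of freedom, the upper-tail probability of a χ2 variate equals erfc(sqrt(x / 2)), so it does not generalize to other degrees of freedom.

```python
import math

def chi2_difference(chi2_restricted, df_restricted, chi2_general, df_general):
    """Difference test statistic and its degrees of freedom for two nested models."""
    return chi2_restricted - chi2_general, df_restricted - df_general

def chi2_upper_tail_df1(x):
    """P(X > x) for a chi-square variate with exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

# Reported values: "No Bias Model" (restricted) versus "General Bias Model".
delta_chi2, delta_df = chi2_difference(158.88, 93, 151.46, 92)
p_value = chi2_upper_tail_df1(delta_chi2)
print(round(delta_chi2, 2), delta_df, p_value < 0.01)
```

The difference of 7.42 on one degree of freedom exceeds the 0.01 critical value of about 6.63, which is consistent with the reported p < 0.01.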
Figure 2. The “No Bias Model”. *NS = Number Series, PF = Paper Folding, NF = Number Facility, VSA = Visuo-Spatial Ability. **All loadings drawn indicate significant associations (p < 0.05).
When the “General Bias Model” was fit to the data, none of the parameter estimates for the paths from cognitive interference to the item parcels reached statistical significance, except for the path from the cognitive interference variable to one of the five item parcels that were used as indicators of fluid intelligence (inductive ability). This effect of cognitive interference was positive and small (0.14), indicating that cognitive interference did not have a strong direct effect on the observed indicator.
Additionally, in both models, cognitive interference was significantly associated with the latent variables reflecting (g) (−0.32 for the “General Bias Model” and −0.29 for the “No Bias Model”) and visuo-spatial ability (−0.37 for the “General Bias Model” and −0.38 for the “No Bias Model”).
In conclusion, when the single-group SEM approach was applied to the data of the present study, the best-fitting model was the “General Bias Model”, which suggested that differences in cognitive interference were mainly associated with differences in the latent variables reflecting (g) and visuo-spatial ability, but were also directly associated with differences in one of the observed indicators (item parcels) of the g-factor. Specifically, these results showed that our participants’ observed performance on one of the item parcels of the Number Series (NS) test was influenced directly by cognitive interference.
These findings are not consistent with previous findings supporting the deficit model. On the contrary, they partly support the predictions of the interference model and suggest that task-oriented worries affected the degree to which at least one of the item parcels was a valid and unbiased measure of the latent g-factor. Consequently, our results suggest that a cognitive ability test similar to those typically used in personnel and educational contexts, namely the Number Series (NS) test, displays measurement bias due to cognitive interference as a state.
The use of cognitive ability tests is common in both educational and employment settings, due to their robust capability to predict important outcomes. However, according to the findings of the present study, substantial attention should be given to the examination of the influence of affect on ability tests’ performance, since the use of these tests in applied settings may result in the biased assessment, placement, or selection of test-takers. For example, in the context of college admissions, high ability applicants who suffer from test anxiety may be inappropriately rejected. Similarly, in the context of cognitive education using information and communication technologies, high ability students who suffer from test anxiety may be inappropriately assessed.
A limitation of this study is the use of a self-report measure of cognitive interference. Affective computing research could possibly contribute to creating new techniques that take into consideration the effect of test-takers’ emotions on their test performance. The restricted nature of the sample should also be noted, especially with regard to age and class level. Furthermore, it is unknown whether the same pattern of results would be obtained if more students from schools other than the School of Education were included in the sample. Finally, it should be noted that this paper has focused on linear effects of cognitive interference on test performance, and it is quite likely that test anxiety effects are non-linear. Thus, more work on such effects is needed, as they require more elaborate psychometric models.