Few research studies have been conducted to compare the validity of a cognitive ability test used in a proctored environment and that of the same test used in an unproctored environment. Kaminsky and Hemingway (2009) found comparable validities for the same test administered in proctored and unproctored conditions. Beaty and colleagues’ (2011) meta-analysis also showed that the validities of the proctored and unproctored tests were similar. Despite the lack of many comparative studies, the validity of an unproctored test is typically assumed to be lower than that of the same test administered under proctored conditions because of cheating. Nevertheless, when tests administered in unproctored environments are validated, their validity is usually at an acceptable level for use in pre-employment selection programmes.
Because of the challenges of collecting criterion data to conduct a validity study, another approach to comparing validities of proctored and unproctored testing is to evaluate the extent of cheating and impute lower validity when cheating occurs. The underlying assumption is that the more that cheating occurs, the lower the validity of the unproctored test is likely to be. When researchers compare scores of individuals who took a test under proctored conditions to those who took the same test in an unproctored setting, higher scores in the unproctored setting are presumed to indicate that some form of cheating has occurred and thus some negative impact on the validity of the test has also occurred.
Despite the opportunity to cheat, the incidence of higher scores in the unproctored setting compared to the proctored setting is relatively low. Arthur and colleagues (2009) compared test scores of individuals who took a speeded cognitive ability test in proctored and unproctored conditions and estimated an upper limit of 7.7% of test-takers cheating. In a slight twist of the typical research protocol, Lievens and Burke (2011) compared test scores on a timed cognitive ability test consisting of both numerical and verbal items obtained in an unproctored setting to those obtained in a verification testing session. They corrected for regression to the mean and found small d scores across four levels of jobs. At the individual test score level, fewer than 2.2% of those who passed the unproctored test and were invited to take the proctored test exhibited a negative score change, and some proctored scores were actually higher than unproctored scores. In a similar study, Kantrowitz and Dainis (2014) compared unproctored scores on a cognitive test to the proctored cognitive test scores of those who passed the battery of which the cognitive ability test was a part. Again, the incidence of significant score differences was very low (259 of 4,026 at the 0.05 level and 78 at the 0.01 level). Caution in extending the results of both these studies to the entire distribution of test scores is warranted because only people at the top of the score distribution from the unproctored test were invited to take the proctored test. An unanswered question is whether the rate of cheating is consistent at all score levels.
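The individual-level flagging described above rests on asking whether a score drop from the unproctored to the proctored administration exceeds what measurement error alone would produce. The exact procedures of the cited studies are not specified here; the sketch below shows one common approach, a one-tailed test against the standard error of the difference between two administrations. The standard deviation and reliability values are illustrative assumptions, not figures from the studies discussed.

```python
from math import sqrt
from statistics import NormalDist

def significant_decline(unproctored, proctored, sd, reliability, alpha=0.05):
    """Flag a proctored retest score as a significant decline when the drop
    exceeds the one-tailed critical value of the standard error of the
    difference between two administrations of the same test.

    Note: one generic flagging method, not necessarily the procedure used
    in the studies cited in the text."""
    # Standard error of the difference between two parallel administrations
    se_diff = sd * sqrt(2 * (1 - reliability))
    z_crit = NormalDist().inv_cdf(1 - alpha)  # e.g. 1.645 at alpha = 0.05
    return (unproctored - proctored) > z_crit * se_diff

# Illustrative (assumed) test parameters: sd = 5, reliability = .85
flagged = significant_decline(unproctored=35, proctored=28, sd=5, reliability=0.85)
not_flagged = significant_decline(unproctored=30, proctored=29, sd=5, reliability=0.85)
```

Under these assumed parameters, a seven-point drop exceeds the error band and is flagged, while a one-point drop is not, which is consistent with the low flagging rates the studies report when most retest changes are small.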
Several factors are particularly relevant to the degree of difference between proctored and unproctored test scores. Researchers use various statistical indicators of a difference. Some correct for regression to the mean. Few appear to correct for practice effects, although they acknowledge that practice may attenuate the observed score differences in the second administration. Lievens and Burke (2011) identify differences in test administration conditions other than the degree of proctoring and note that those who were asked to take a verification test under proctored conditions might be more motivated to concentrate on the test because they had passed the earlier hurdle of the unproctored test.
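The group-level comparisons in this literature are typically reported as a standardized mean difference (the d scores mentioned above). A minimal sketch of that computation, using a pooled standard deviation; the score samples are fabricated for illustration and are not data from any cited study:

```python
from statistics import mean, stdev

def cohens_d(group_a, group_b):
    """Standardized mean difference (Cohen's d) between two score samples,
    dividing the mean difference by the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * stdev(group_a) ** 2 +
                  (nb - 1) * stdev(group_b) ** 2) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5

# Fabricated example scores: slightly higher unproctored means, as the
# cheating hypothesis would predict.
unproctored = [28, 31, 25, 33, 29, 27, 30, 32]
proctored = [26, 29, 24, 31, 27, 26, 28, 30]
d = cohens_d(unproctored, proctored)
```

A positive d here would indicate higher unproctored scores; the studies above interpret small d values (after corrections such as regression to the mean) as evidence that cheating has little aggregate effect.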
Based on limited research, work simulations appear to show response patterns across proctored and unproctored environments similar to those of cognitive ability tests. For example, Hense and colleagues (2009) reported an effect size of 0.32 between scores on a proctored and an unproctored job simulation.
In contrast to cognitive ability tests, which have right and wrong answers, other types of tests generally show similar score distributions regardless of the environment in which they are administered. Nye and colleagues (2008), for example, found no differences in scores between unproctored and proctored internet versions of a speeded perceptual accuracy test.
Personality tests in particular seem to show little or no score difference across administration conditions. Arthur and colleagues (2009) found little evidence of response distortion when they compared the mean scores from a low-stakes, speeded personality test with those from high-stakes administrations of personality measures reported in the literature; they estimated that 30-50% of individual test-takers had elevated scores. Although response distortions are common in high-stakes testing, there appears to be little difference in the extent of distortion between proctored and unproctored settings.
In summary, there are few research studies comparing the validity of proctored and unproctored tests; however, in the published studies, the low rates of cheating on different measures (cognitive ability tests, perceptual accuracy tests, simulations) suggest there is little impact on the tests’ validity.