Situational judgement tests

In their 2001 narrative review, Hough and colleagues estimated Black-White ds to be about 0.61 and 0.43 (both favouring Whites) for written- and video-based situational judgement tests, respectively. In 2008, Ployhart and Holtz reported lower estimates of about 0.40 and 0.31, respectively. Whetzel, McDaniel and Nguyen (2008), in one of the few meta-analyses exploring Black-White differences on situational judgement tests, reported a Black-White d of 0.38 (k = 62, N = 42,178). These authors also found minimal differences between the ds for knowledge-focused and behavioural tendency-focused situational judgement tests.

Bobko and Roth (2013) note that previous research exploring Black-White differences on situational judgement tests is generally based on incumbent samples. They cautioned that Whetzel and colleagues’ (2008) meta-analytically derived estimate of 0.38 is likely downwardly biased. Additionally, job level and construct saturation (in particular, cognitive ability) affect these estimates. Thus, the actual magnitude of the score differences for situational judgement tests is currently unclear. Based on a summary of various primary studies using job applicants, Bobko and Roth provide the following examples based on construct saturation: d = 0.19 for interpersonal skills, d = 0.65 for cognitive ability and job knowledge, and d = 1.02 for leadership.

In summary, the size of the observed score differences between Whites and African-Americans has received considerable research attention. The research described in this chapter shows that the size of the differences and who is advantaged vary depending on what is measured. In general, non-cognitive constructs show near-zero differences and, when the differences are greater than zero, they advantage Whites or African-Americans depending on the specific facet of the construct. The observed score differences tend to be larger for cognitive constructs and more consistently favour Whites. However, the size of the difference tends to vary with the specific test or cognitive domain measured, with domains requiring acquired knowledge showing the largest differences, a point we return to later in this chapter. For the various measurement methods, the existence and size of the differences vary widely and appear to reflect an interaction between the method of measurement and the relative saturation of the construct that the method measures. Thus, contrary to the frequent claim that there are intractable differences between groups, the observed differences appear to be far more complicated and non-uniform, and the measurement method may play a role.

