Argument-Based Approach to Validity as the Foundation for Assessment Design
The argument-based approach to validity, which entails both an interpretive and use (IU) argument and a validity argument, provides a foundation for assessment design considerations (Kane, 2006, 2013, this volume). An IU argument explicitly links the inferences from performance to conclusions and decisions, including the actions that result from those decisions. Therefore, the choices made in the design phase of performance assessments and tasks have direct implications for the validity of score interpretations and uses. A validity argument provides a structure for evaluating the merits of the IU argument and requires the accumulation of both theoretical and empirical support for the appropriateness of the claims (Kane, 2006). Each inference in the validity argument is based on a proposition or claim that requires support. The validity argument entails an overall evaluation of the plausibility of the proposed claims and interpretations and uses of test scores by providing a coherent analysis of the evidence for and against the proposed interpretations and uses (AERA et al., 2014; Kane, 2006, this volume; Messick, 1989). The intended score inferences and uses inform the design of performance assessments and tasks, and the documentation of the procedures and materials used in their design can provide evidence to support the score interpretations and uses.
In the design of performance assessments, it is important to consider the evidence that is needed to support the validity of the score inferences (Standard 1.1, AERA et al., 2014, p. 23). Two sources of potential threat to the validity of score inferences are construct underrepresentation and construct-irrelevant variance (AERA et al., 2014; Messick, 1989). Construct underrepresentation occurs when a test does not capture the targeted construct or domain, jeopardizing the generalizability of the score inferences to the larger domain. More specifically, it occurs when the test does not fully represent the intended construct, does not evoke the intended cognitive skills, or does not elicit ways of responding that are essential to the construct. This implies that test developers need to ensure that the knowledge and skills being assessed by the tasks, and reflected in the scoring rubrics, represent the targeted knowledge and skills. Construct-irrelevant variance occurs when one or more extraneous constructs are assessed along with the intended construct; sources include task wording, task context, response mode, testwiseness, student motivation, and raters' or computers' attention to irrelevant features of responses. Consequently, test scores may be artificially inflated or deflated, which is a serious threat to the validity of score inferences. The contextual, open-ended, and often lengthy nature of performance tasks renders them particularly susceptible to construct-irrelevant variance, implying that design procedures should be in place to maximize standardization and minimize sources of this validity threat.
Providing educators equal access to a representative sample of tasks, allowing for multiple opportunities for student practice, is essential for promoting test fairness and the validity of score interpretations and uses (see Zieky, this volume, for a discussion of fairness in testing). Disclosure of, and equal access to, a sample of test material are particularly important with performance tasks because the format may be unfamiliar to students and may itself become a source of construct-irrelevant variance. As indicated in the Standards, test developers must provide test consumers with ample notice of the knowledge and skills to be measured, as well as opportunities to become familiar with the item formats and mode of test administration (Standard 12.8, AERA et al., 2014, p. 197).