Requirements for Testing Program Documentation in the No Child Left Behind Peer Review Guidance
The No Child Left Behind Act of 2001 (No Child Left Behind, 2002) requires states to adopt rigorous academic standards, to implement statewide testing systems that include all public school students in Grades 3-8 and high school, and to meet other provisions. In 2004, the U.S. Department of Education first provided guidance to “help States develop comprehensive assessment systems that provide accurate and valid information for holding districts and schools accountable for student achievement against State standards” (U.S. Department of Education, 2007, p. 1). The peer review guidance advises that “the requirements are interrelated and that decisions about whether a State has met the requirements will be based on a comprehensive review of the evidence submitted” (U.S. Department of Education, 2007, p. 1); the “Guidance is a framework used to make a series of analytic judgments” (p. 8); and a state “should organize its evidence with a brief narrative response to each of the ‘peer reviewer questions’ in the Guidance” (p. 9; italics in original). These statements suggest that collections of evidence summaries are inadequate unless they are built around a line of reasoning, or argument, about the accuracy and validity of interpretations of test scores. Despite that suggestion, the practice of organizing technical reports as collections of evidence prevails, as we will show in a subsequent section.
The peer review guidance is organized as a set of questions for each critical element required to make a comprehensive evaluation. For example, the question pertaining to critical element 1.1(a) asks, “Has the State formally approved/adopted . . . challenging academic content standards?” (U.S. Department of Education, 2007, p. 11). The questions are accompanied by brief examples of acceptable evidence (e.g., “The State has formally adopted/approved academic content standards . . .”), of possible evidence (e.g., written documentation of State Board of Education meeting minutes) and of incomplete evidence (e.g., “The State has developed academic content standards but these standards have not been formally approved/adopted by the State”). The critical elements are organized into sections, some of which resemble the kinds of evidence included in technical reports even before the appearance of the peer review guidance in 2004. The seven sections require evidence of content standards, performance standards (referred to as “academic achievement standards”), a statewide assessment system, technical quality, alignment between the test and content standards, inclusion of all students in the assessment system and score reporting requirements. State testing programs are required to provide narrative responses to each critical element question in each section and to provide supporting evidence; this organization corresponds to the current practice of organizing technical reports as collections of evidence.
The No Child Left Behind peer review process has nudged technical report writers in the K-12 sphere toward making validity arguments by, for example, requiring them to relate technical evidence to the questions in the peer review guidance (U.S. Department of Education, 2007). Questions like “Has the State taken steps to ensure consistency of test forms over time?” (U.S. Department of Education, 2007, p. 46) require a response in the form of an explanation or argument based on evidence from, for example, test form equating analyses. However, the guidance requires these arguments within technical categories (e.g., inclusion of all students in Grades 3-8 and high school, score reliability), which reinforces the organization of technical reports as collections of evidence and does not promote comprehensive, coherently organized lines of validity argument. We discuss validity argumentation and its implications for technical reporting and documentation in detail later in this chapter.
At the time of writing this chapter, the U.S. Department of Education had gathered comments in preparation for considering revisions to the peer review guidance. We proposed that the department organize the critical elements checklist around lines of argument for the validity of interpretations and uses of scores from K-12 reading, mathematics and science tests. Later in this chapter we describe how testing programs can do that.