Introduction: The Changing Context of Educational Testing
The field of educational testing is in the midst of dramatic changes. These changes can be characterized, first and foremost, by how exams are administered. Test administration marks a significant paradigm shift. Because the printing, scoring, and reporting of paper-based tests require tremendous time, effort, and expense, it is neither feasible nor desirable to administer tests in this format. Moreover, as the demand for more frequent testing continues to escalate, the cost of administering paper-based tests will also continue to increase. The obvious solution for cutting some of the administration, scoring, and reporting costs is to migrate to a computer-based testing (CBT) system (Drasgow & Olson-Buchanan, 1999; Mills, Potenza, Fremer, & Ward, 2002; Parshall, Spray, Kalohn, & Davey, 2002; Ras & Joosten-Ten Brinke, 2015; Susanti, Tokunaga, & Nishikawa, 2020; Zilles, West, Herman, & Bretl, 2019). CBT offers important economic benefits for test delivery because it eliminates the need for paper-based production, distribution, and scoring. In addition, CBT can be used to support teaching and promote learning. For instance, computers permit testing on demand, thereby allowing students to take the exam on a more frequent and flexible schedule. CBTs are created in one central electronic location, but they can be deployed to students locally, nationally, or internationally. Items on CBTs can be scored immediately, thereby providing students with instant feedback while, at the same time, reducing the time teachers would normally spend on marking tests (Bartram & Hambleton, 2006; Drasgow, 2016; van der Linden & Glas, 2010; Wainer, Dorans, Eignor, Flaugher, Green, Mislevy, Steinberg, & Thissen, 2000). Because of these important benefits, the wholesale transition from paper-based testing to CBT is now underway.
Adopting CBT will have a cascading effect that changes other aspects of educational testing, such as why we test and how many students we test. As the importance of technology in society continues to increase, countries require a skilled workforce that can make new products, provide new services, and create new industries. The ability to create these products, services, and industries will be determined, in part, by the effectiveness of our educational programs. Students must acquire the knowledge and skills required to think, reason, solve complex problems, communicate, and collaborate in a world that is increasingly shaped by knowledge services, information, and communication technologies (e.g., Ananiadou & Claro, 2009; Auld & Morris, 2019; OECD, 2018; Binkley, Erstad, Herman, Raizen, Ripley, Miller-Ricci, & Rumble, 2012; Chu, Reynolds, Notari, & Lee, 2017; Darling-Hammond, 2014; Griffin & Care, 2015). Educational testing has an important role to play in helping students acquire these foundational skills and competencies. The 1990s marked a noteworthy shift in why we test: the objectives of testing retained the historically important focus on summative outcomes but were broadened to include procedures that yield explicit evidence to help teachers monitor their instruction and to help students improve how and what they learn. That is, researchers and practitioners began to focus on formative assessment (see, for example, Black & Wiliam, 1998; Sadler, 1989). Formative assessment is a process used during instruction to produce the feedback required to adjust teaching and improve learning so that students can better achieve the intended outcomes of instruction. Feedback has maximum value when it yields specific information in a timely manner that can direct instructional decisions intended to help each student acquire different types of knowledge and skills more effectively.
Outcomes from empirical research consistently demonstrate that formative feedback can produce noteworthy student achievement gains (Bennett, 2011; Black & Wiliam, 1998, 2010; Hattie & Timperley, 2007; Kluger & DeNisi, 1996; Shute, 2008). As a result, our educational tests, once developed exclusively for the purposes of accountability and outcomes-based summative testing, are now expected to also provide teachers and students with timely, detailed feedback to support teaching and learning (Drasgow, Luecht, & Bennett, 2006; Ferrara, Lai, Reilly, & Nichols, 2017; Nicol & Macfarlane-Dick, 2006; Nichols, Kobrin, Lai, & Koepfler, 2017; Pellegrino & Quellmalz, 2010).
With enhanced delivery systems and a broader mandate for why we evaluate students, educational testing now appeals to a global audience, and, therefore, it also affects how many students are tested (Gregoire & Hambleton, 2009; Hambleton, Merenda, & Spielberger, 2005; International Test Commission Guidelines for Translating and Adapting Tests, 2017). As a case in point, the world's most popular and visible educational achievement test, the Programme for International Student Assessment (PISA), which is developed, administered, and analyzed by the Organisation for Economic Co-operation and Development (OECD), is computerized. The OECD (2019a, p. 1) asserted,
Computers and computer technology are part of our everyday lives and it is appropriate and inevitable that PISA has progressed to a computer-based delivery mode. Over the past decades, digital technologies have fundamentally transformed the ways we read and manage information. Digital technologies are also transforming teaching and learning, and how schools assess students.
OECD member countries initiated PISA in 1997 as a way to measure the knowledge, skills, and competencies of 15-year-olds in the core content areas of mathematics, reading, and science. To cover a broad range of content, a sophisticated test design was used in which examinees wrote different combinations of items. The outcome of this design is a basic knowledge and skill profile for a typical 15-year-old within each country. To accommodate the linguistic diversity among member countries, exams were created, validated, and administered in 47 languages. The results from these tests are intended to allow educators and policy makers to compare the performance of students from around the world and to guide future educational policies and practices. While the first five cycles were paper based, 54 of the 72 (81%) participating countries in PISA 2015 took the first computer-based version. The number of countries which opted for CBT increased to 89% (70 of the 79 participating countries) for PISA 2018 in keeping with the OECD view that CBT has become part of the educational experience for most students (OECD, 2019b). In short, testing is now a global enterprise that includes large numbers of students immersed in different educational systems located around the world.