Purpose of Chapter

Haladyna (2013) presented a comprehensive overview of AIG history, beginning in the 1950s with the development of Louis Guttman’s facet theory. But the last decade was characterized by a flurry of AIG research. This research has focused, in part, on design-related issues, such as cognitive model development (e.g., Embretson & Yang, 2007; Gierl & Lai, 2013b; Gierl, Lai & Turner, 2012), item model development (e.g., Bejar & Cooper, 2013; Gierl & Lai, 2012b; Gierl, Zhou & Alves, 2008) and test designs for AIG (e.g., Bejar et al., 2003; Embretson & Yang, 2007; Huff, Alves, Pellegrino & Kaliski, 2013; Lai & Gierl, 2013; Luecht, 2013). Research has also focused on technological advances for AIG (e.g., Gierl et al., 2008; Gutl, Lankmayr, Weinhofer & Hofler, 2011; Higgins, 2007; Higgins, Futagi & Deane, 2005; Mortimer, Stroulia & Yazdchi, 2013), including the use of language-based approaches for item generation that draw on natural language processing and rule-based artificial intelligence (e.g., Aldabe & Maritxalar, 2010; Gutl et al., 2011; Karamanis, Ha & Mitkov, 2006; Mitkov, Ha & Karamanis, 2006; Moser, Gutl & Lui, 2012), frame-semantic representations (e.g., Cubric & Toasic, 2010; Deane & Sheehan, 2003; Higgins et al., 2005), schema theory (e.g., Singley & Bennett, 2002) and semantic web-rule language (Zoumpatianos, Papasalouros & Kotis, 2011). AIG research has also focused on estimating the psychometric characteristics of the generated items (e.g., Cho, DeBoeck, Embretson & Rabe-Hesketh, in press; Embretson, 1999; Geerlings, Glas & van der Linden, 2011; Glas & van der Linden, 2003; Sinharay & Johnson, 2008, 2013; Sinharay, Johnson & Williams, 2003). Because of these important developments, AIG has been used to create millions of new items in diverse content areas, including but not limited to K-12 levels in subjects such as language arts, social studies, science, mathematics (Gierl et al., 2008; Gierl & Lai, 2012b, 2013b) and advanced placement (AP) biology (Alves, Gierl & Lai, 2010); in psychological domains, such as spatial (Bejar, 1990), abstract (Embretson, 2002), figural inductive (Arendasy, 2005) and quantitative reasoning (Arendasy & Sommer, 2007; Cho, DeBoeck, Embretson & Rabe-Hesketh, in press; Embretson & Daniels, 2008; Sinharay & Johnson, 2008, 2013) as well as situational judgment (Bejar & Cooper, 2013), word fluency (Arendasy, Sommer & Mayr, 2012), visual short-term memory (Hornke, 2002), vocabulary recall (Brown, Frishhoff & Eskenazi, 2005), cloze tasks (Goto, Kojiri, Watanabe, Iwata & Tamada, 2010), analogies (Alsubait, Parsia & Sattler, 2012) and mental rotation (Arendasy & Sommer, 2010); and in licensure and certification content areas, such as nursing, architecture and medicine (Karamanis et al., 2006; Gierl et al., 2008; Gierl, Lai & Turner, 2012; Wendt, Kao, Gorham & Woo, 2009).

The purpose of this chapter is to describe and illustrate a practical method for generating test items. We will present the basic logic required for generating items using a template-based method that provides the basis for understanding other AIG approaches. By template-based AIG, we mean methods that draw on item models to guide the generative process. An item model is comparable to a mold, rendering or prototype that highlights the features in an assessment task that must be manipulated to produce new items. To ensure our description is both concrete and practical, we illustrate template-based item generation using an example from a medical licensure exam. This example was selected to highlight the applicability and the generalizability of template-based AIG using a complex problem-solving domain. A three-step process is presented. In step 1, the content required for item generation is identified by domain specialists. In step 2, an item model is developed to specify where this content is placed in each generated item. In step 3, computer-based algorithms are used to place the content specified in step 1 into the item model developed in step 2. Using this three-step method, large numbers of items can be generated using a single item model.
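The three-step logic described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the authors' implementation: the content values, the item model wording, and the names `CONTENT`, `ITEM_MODEL`, `generate_items` and `is_valid` are all hypothetical, and the clinical details are invented placeholders rather than validated exam content.

```python
from itertools import product

# Step 1 (hypothetical content): values a domain specialist might identify.
CONTENT = {
    "age": [25, 45, 65],
    "sex": ["man", "woman"],
    "symptom": ["a persistent cough", "acute abdominal pain"],
}

# Step 2 (hypothetical item model): a stem whose named placeholders mark
# where the content from step 1 is placed in each generated item.
ITEM_MODEL = (
    "A {age}-year-old {sex} presents with {symptom}. "
    "What is the most appropriate next step?"
)


def generate_items(item_model, content, is_valid=lambda combo: True):
    """Step 3: place every permitted combination of content into the item model."""
    names = list(content)
    items = []
    for values in product(*(content[name] for name in names)):
        combo = dict(zip(names, values))
        # is_valid is an assumed hook for excluding implausible combinations.
        if is_valid(combo):
            items.append(item_model.format(**combo))
    return items


items = generate_items(ITEM_MODEL, CONTENT)
print(len(items))   # 3 ages x 2 sexes x 2 symptoms = 12 items
print(items[0])
```

Even this toy item model yields 12 items from a single stem, which shows why one item model can produce large numbers of items once the content lists grow. In practice, a constraint function such as the assumed `is_valid` hook would be needed to remove combinations that domain specialists judge implausible.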

While AIG provides a method for producing new items, the psychometric properties (e.g., item difficulty) of these newly generated items must also be evaluated. Item quality is often determined through a field-testing process, where each item is administered to a sample of examinees so the psychometric characteristics of the item can be evaluated. This typical solution may not be feasible or desirable when thousands of new items have been generated. An alternative is to estimate the psychometric properties of the generated items with statistical models that permit item precalibration. With precalibration, the psychometric properties of the items can be estimated during the item generation process, without administering every item to examinees. A description of precalibration statistical methods is beyond the scope of this chapter. However, a recent review of these methods is presented in Sinharay and Johnson (2013).
