1 Of all the AIG methods described in this book, distractor generation was the most challenging for us to address. The approach described in this chapter required almost two years of sustained work to create and evaluate, and many setbacks and dead ends were encountered along the way.


Ananiadou, K., & Claro, M. (2009). 21st century skills and competences for new millennium learners in OECD countries. OECD Education Working Papers, 41. OECD Publishing. doi:10.1787/218525261154

Bejar, I. I., Williamson, D. M., & Mislevy, R. J. (2006). Human scoring. In D. M. Williamson, R. J. Mislevy, & I. I. Bejar (Eds.), Automated Scoring of Complex Tasks in Computer-Based Testing (pp. 49-82). Mahwah, NJ: Erlbaum.

Binkley, M., Erstad, O., Herman, J., Raizen, S., Ripley, M., Miller-Ricci, M., & Rumble, M. (2012). Defining twenty-first century skills. In P. Griffin, B. McGaw, & E. Care (Eds.), Assessment and Teaching of 21st Century Skills (pp. 17-66). New York: Springer.

Bock, R. D. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37, 29-51.

Briggs, D. C., Alonzo, A. C., Schwab, C., & Wilson, M. (2006). Diagnostic assessment with ordered multiple-choice items. Educational Assessment, 11, 33-63.

Chu, S., Reynolds, R., Notari, M., & Lee, C. (2017). 21st Century Skills Development through Inquiry-Based Learning. New York: Springer.

Collins, J. (2006). Writing multiple-choice questions for continuing medical education activities and self-assessment modules. Radiographics, 26, 543-551.

Darling-Hammond, L. (2014). Next Generation Assessment: Moving Beyond the Bubble Test to Support 21st Century Learning. San Francisco, CA: Jossey-Bass.

de la Torre, J. (2009). A cognitive diagnosis model for cognitively based multiple choice options. Applied Psychological Measurement, 33, 163-183.

Downing, S. M. (2006). Selected-response item formats in test development. In S. M. Downing & T. Haladyna (Eds.), Handbook of Test Development (pp. 287-302). Mahwah, NJ: Erlbaum.

Gierl, M. J. (1997). Comparing the cognitive representations of test developers and students on a mathematics achievement test using Bloom's taxonomy. Journal of Educational Research, 91, 26-32.

Guttman, L., & Schlesinger, I. M. (1967). Systematic construction of distractors for ability and achievement test items. Educational and Psychological Measurement, 27, 569-580.

Haladyna, T. M. (2016). Item analysis for selected-response test items. In S. Lane, M. Raymond, & T. Haladyna (Eds.), Handbook of Test Development (2nd ed., pp. 392-409). New York, NY: Routledge.

Haladyna, T. M., & Downing, S. M. (1989). Validity of a taxonomy of multiple choice item-writing rules. Applied Measurement in Education, 2, 37-50.

Haladyna, T. M., & Rodriguez, M. C. (2013). Developing and Validating Test Items. New York, NY: Routledge.

Hambleton, R. K., & Jirka, S. J. (2006). Anchor-based methods for judgmentally estimating item statistics. In S. M. Downing & T. Haladyna (Eds.), Handbook of Test Development (pp. 399-420). Mahwah, NJ: Erlbaum.

Lai, H., Gierl, M. J., Touchie, C., Pugh, D., Boulais, A., & De Champlain, A. (2016). Using automatic item generation to improve the quality of MCQ distractors. Teaching and Learning in Medicine, 28, 166-173.

Moreno, R., Martínez, R. J., & Muñiz, J. (2006). New guidelines for developing multiple-choice items. Methodology, 2, 65-72.

Moreno, R., Martínez, R. J., & Muñiz, J. (2015). Guidelines based on validity criteria for the development of multiple choice items. Psicothema, 27, 388-394.

Paniagua, M., & Swygert, K. (2016). Constructing Written Test Questions for the Basic and Clinical Sciences (4th ed.). Philadelphia, PA: National Board of Medical Examiners.

Penfield, R. D. (2008). An odds ratio approach for assessing differential distractor functioning effects under the nominal response model. Journal of Educational Measurement, 45, 247-269.

Penfield, R. D. (2010). Modelling DIF effects using distractor-level invariance effects: Implications for understanding the causes of DIF. Applied Psychological Measurement, 34, 151-165.

Rodriguez, M. C. (2011). Item-writing practice and evidence. In S. N. Elliott, R. J. Kettler, P. A. Beddow, & A. Kurz (Eds.), Handbook of Accessible Achievement Tests for All Students: Bridging the Gaps Between Research, Practice, and Policy (pp. 201-216). New York, NY: Springer.

Rodriguez, M. C. (2016). Selected-response item development. In S. Lane, M. Raymond, & T. Haladyna (Eds.), Handbook of Test Development (2nd ed., pp. 259-273). New York, NY: Routledge.

Shermis, M. D., & Burstein, J. (2013). Handbook of Automated Essay Evaluation: Current Applications and New Directions. New York: Routledge.

Shermis, M. D., Burstein, J., Brew, C., Higgins, D., & Zechner, K. (2016). Recent innovations in machine scoring of student- and test taker-written and -spoken responses. In S. Lane, M. Raymond, & T. Haladyna (Eds.), Handbook of Test Development (2nd ed., pp. 335-354). New York, NY: Routledge.

Shin, J., Guo, Q., & Gierl, M. J. (2019). Multiple-choice item distractor development using topic modelling approaches. Frontiers in Psychology, 10, 825.

Tarrant, M., Ware, J., & Mohammed, A. M. (2009). An assessment of functioning and non-functioning distractors in multiple-choice questions: A descriptive analysis. BMC Medical Education, 9, 1-8.

Thissen, D., Steinberg, L., & Fitzpatrick, A. R. (1989). Multiple-choice models: The distractors are also part of the item. Journal of Educational Measurement, 26, 161-176.

Vacc, N. A., Loesch, L. C., & Lubik, R. E. (2001). Writing multiple-choice test items. In G. R. Walz & J. C. Bleuer (Eds.), Assessment: Issues and Challenges for the Millennium (pp. 215-222). Greensboro, NC: ERIC Clearinghouse on Counseling and Student Services. Retrieved from fulltext/ED457440.pdf

Williamson, D. M., Mislevy, R. J., & Bejar, I. I. (2006). Automated Scoring of Complex Tasks in Computer-Based Testing. Mahwah, NJ: Erlbaum.

