Item Writing: A Collaborative Effort

The process of item development is arduous, and it begins during the earliest stages of test development. Generally, test specifications and item specifications are developed to give item developers the information they need to support the process. Item writing is part of the larger process of item development, which is in turn part of the larger process of test development.

Lane, Raymond and Haladyna (this volume) provide an excellent summary of the test development process. An important component of this process for the item writer is the design of item specifications, which detail the framework used by item writers to write or create test items and tasks. The most important feature of item specifications is that they tie item writing to content specifications, focusing attention on the target knowledge, skills and abilities. Standard 4.12 (AERA et al., 2014, p. 89) specifies the documentation of the extent to which test item content represents the domain as defined by the test specifications; although this belongs to the larger test design process, it is imperative that item development proceed in a way that supports this effort (see also Standard 7.4, AERA et al., 2014, p. 126).

Advances in item development models include evidence-centered design (ECD) task models and other approaches, including engineering frameworks, for developing multiple items from a common specification. Examples of item specifications can be found at the websites of state testing programs in Florida ( asp), New Jersey (, Minnesota (, Washington ( Mathematics/TestItemSpec.aspx) and others, and of larger testing programs, including initial guidance from the Common Core State Standards effort through Smarter Balanced (http://www.smarterbalanced.org/smarter-balanced-assessments) and PARCC ( test-specs), and the National Assessment of Educational Progress ( frameworks.htm).[1] Raymond (this volume) also describes the important role of practice analysis in creating test specifications. Similarly, a parallel approach can take advantage of ECD, including the specification of task models for item production.

Item specifications should include the components listed at the end of this section (Haladyna & Rodriguez, 2013).

The process of item development requires collaboration among a number of individuals and groups, including general test developers, item writers who typically are content or subject-matter experts, measurement specialists (who may be psychometricians) and relevant specialists in areas such as culture, language development, gender issues and cognitive, emotional/behavioral or physical disabilities. This last group of specialists is often brought into the process for the purpose of sensitivity review of items, but their involvement in the item development process from the beginning is potentially powerful (Zieky and Abedi, this volume, address related issues). Standard 4.8 (AERA et al., 2014, p. 88) specifies that empirical analysis and/or expert judgment be included in item reviews, with the qualifications, experiences and backgrounds of judges documented.

The empirical review is typically based on evidence gathered from item tryouts or field trial administrations. Standard 4.9 (AERA et al., 2014, p. 88) requires clear documentation of these procedures as well as sample selection and representativeness of the intended populations. Standard 4.10 (AERA et al., 2014, p. 88) then specifies that the statistical model(s) used for item review should be documented, including evidence to defend the use of such models to the extent they support the validity of intended inferences and uses of resulting test scores. The life cycle of an item, from prewriting to retirement, includes the following steps:

1. Test purpose, uses and specifications are defined.

2. Item specifications are developed. Here we assume that the item specifications call for selected-response (SR) item formats. The decision to use SR items should be documented, presenting the argument supporting the appropriate and meaningful use of SR items to achieve the test’s purpose.

3. Item writers are identified, selected and trained, including a comprehensive introduction to steps 1 and 2, and training regarding item writing for various subgroups, including students with disabilities or English language learners. This may include the use of item-generation techniques, such as the use of item shells or other models (see Gierl, this volume), including task models with the ECD approach.

4. Item writers engage in supervised item writing, iteratively writing and reviewing items with their peers, with the support of an item-writing leader.

5. Item writers continue in the process of item writing. Items are reviewed, potentially by multiple groups, prior to piloting:

a. Peer item writers

b. Senior content specialists

c. Sensitivity reviewers (for bias and fairness), including experts on relevant subgroups such as persons with disabilities and English language learners

d. Measurement specialists

e. Copy editor.

6. Items are piloted or field-tested, ideally as embedded items in operational tests. Items are then reviewed in several ways:

a. Item analysis is conducted, including a review of the item difficulty and discrimination;

b. Distractor analysis is conducted to assess the functioning of the distractors (each distractor should be selected at a relatively uniform rate, and more often by test takers scoring lower on the overall measure) (see Haladyna, this volume, for distractor analysis methods);

c. Item analysis should include some form of differential item functioning (DIF) analysis, examining functioning across gender, race and language status (perhaps others as required by the testing authority);

d. For new item types, consider conducting think-aloud cognitive interviews to establish (confirm) the cognitive task elicited by the item.

7. Decisions are made regarding the disposition of the item:

a. Edit and revise

b. Eliminate

c. Select for operational use.

8. Items selected for operational use are placed in the item bank, become available for operational tests and are monitored for performance over time, until released or retired.
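
The item and distractor analyses in step 6 can be illustrated with a small sketch. This is not the procedure of any particular testing program: the function names, the eight-person response data and the answer key below are hypothetical, and the statistics are the classical ones (proportion correct for difficulty, a corrected point-biserial correlation for discrimination, and per-option selection rates with mean rest scores for distractors). Operational programs often use other models, and DIF analysis is not shown.

```python
from statistics import mean, pstdev


def item_difficulty(item_scores):
    """Classical difficulty: proportion of test takers answering correctly."""
    return mean(item_scores)


def pearson(x, y):
    """Pearson correlation using population standard deviations."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when either variable is constant
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (sx * sy)


def item_discrimination(item_scores, total_scores):
    """Corrected point-biserial: item score vs. total score minus the item."""
    rest = [t - i for i, t in zip(item_scores, total_scores)]
    return pearson(item_scores, rest)


def distractor_table(choices, rest_scores):
    """Per option: (proportion selecting it, mean rest score of selectors)."""
    n = len(choices)
    table = {}
    for option in sorted(set(choices)):
        scores = [s for c, s in zip(choices, rest_scores) if c == option]
        table[option] = (len(scores) / n, mean(scores))
    return table


# Hypothetical data: one four-option item, eight test takers, keyed answer "B".
key = "B"
choices = ["B", "B", "A", "B", "C", "B", "D", "A"]
total_scores = [9, 8, 4, 7, 3, 10, 2, 5]  # total test scores, item included

item_scores = [1 if c == key else 0 for c in choices]
rest_scores = [t - i for i, t in zip(item_scores, total_scores)]

difficulty = item_difficulty(item_scores)
discrimination = item_discrimination(item_scores, total_scores)
table = distractor_table(choices, rest_scores)

print(f"difficulty (p): {difficulty:.2f}")
print(f"discrimination (corrected point-biserial): {discrimination:.2f}")
for option, (prop, avg_rest) in table.items():
    flag = "key" if option == key else "distractor"
    print(f"{option} ({flag}): selected by {prop:.2f}, mean rest score {avg_rest:.1f}")
```

In a review, a functioning distractor would show a lower mean rest score than the keyed option, which is the pattern described in step 6b. With samples this small the statistics are unstable, so the data here serve only to show the computation.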

Once an item has survived this long process, it is entered into an item bank. Vale (2006) provided a comprehensive discussion regarding item banks and the many decisions relevant for successful item banking systems (see Muckle, this volume).

The primary message here is that each step of the item development process should be completed in support of the purpose of the test. Decisions should be made explicitly, and the reasoning behind each one should be documented along the way. Such documentation is consistent with the technical documentation required by the Standards (AERA et al., 2014) and provides validity evidence in support of score interpretation and related inferences and uses.

1. Content domain and cognitive tasks to be included. In ECD, this includes the domain analysis of content and skills and specification of the domain model, including claims and evidence.
a. Description of the precise domains of knowledge and skills to be assessed, guides regarding grade-level requirements, or target job tasks;
b. Guidance to support construct representation and comparability across tasks;
c. Guidance for cognitive complexity;
d. Intended targets for item difficulty;
e. Standards and core elements of practice of professional societies and organizations.

2. Item formats allowed and the parameters around their structure. In ECD, this includes the assessment framework, including the student, evidence and task models.
a. Sample or model items in each allowable format;
b. Number of allowable options for each item format;
c. Sources of and characteristics of reading passages;
d. Sources and characteristics of stimulus materials (illustrations, figures, graphs);
e. Sources and characteristics of practice-based cases, scenarios, vignettes;
f. Issues related to diversity and local, regional or national content relevance.

3. Item writing guidelines to be followed.

4. Item editing style guide.

5. Process and criteria for item reviews.

6. Criteria for item selection.