Content Coding: Challenges Inherent to Managing Generated Items in a Bank
We have described and illustrated a three-step method for generating items using a systematic process. It can be used to produce a large number of test items. While testing organizations have an insatiable appetite for producing large numbers of items using AIG, managing these items after they are created presents many novel and unexpected challenges. In this chapter, we describe the issues that arise when attempting to manage a large item bank, and we provide strategies that can be used to organize a generated bank.
In a traditional bank, items are imported, stored, and administered individually. Depending on the type of administration (e.g., adaptive testing, automated test assembly), additional information must be appended to items so that they can be identified and differentiated. A bank is a repository of items that can also serve as a catalogue for organizing these items. Currently, hundreds of items are needed to create the operational exams used in most testing programs. For this type of use, a bank can be developed that ranges from a simple spreadsheet to a customized database. But with the availability of generated items, the requirements and challenges for a banking system increase because thousands or even millions of items must be organized and managed.
There are at least two important challenges that must be addressed when managing a bank of generated items. First, the sheer volume of items produced using AIG creates a situation that many testing organizations have long dreamt of but have yet to encounter (see, for example, Cole, Lima-Walton, Brunnert, Vesey, & Raha, 2020). Managing and populating a bank containing hundreds of items is complex. But when the bank is expanded dramatically to include hundreds of thousands or millions of items, problems related to memory limits, search criteria, and content review quickly arise. Second, the shift in the unit of analysis from the individual item to the item model solves the problem of content review (see Chapter 7) but creates a new challenge that many organizations have not experienced, because these models must be created, organized, and managed. While traditional development relies on processes in which items are written, reviewed, and revised individually, AIG uses processes in which cognitive and item models are written, reviewed, and revised in order to generate items. One important consequence of cognitive and item modelling is that the generated items may be closely related to one another, which means that enemy items (i.e., items that cannot appear together on a test) and duplicate items must be tracked and monitored. Similarities among the generated items should, in fact, be expected because large numbers of new items are constantly being created. As a result, a process is needed for selecting among items that are produced from the same model. In short, the joy of producing large numbers of items is quickly tempered by the reality of needing to efficiently organize and manage this massive new inventory of assessment content.
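The enemy- and duplicate-tracking problem can be made concrete with a small sketch. The similarity measure below (token-overlap, or Jaccard, similarity) and the 0.8 threshold are illustrative assumptions, not a recommended standard; operational programs would use more sophisticated measures, but the principle of flagging highly similar pairs from the same model is the same:

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Token-overlap similarity between two item stems (0.0 to 1.0)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

def flag_pairs(items: dict[str, str], threshold: float = 0.8):
    """Flag item pairs whose similarity meets the threshold as
    candidate enemy or duplicate pairs needing SME review."""
    ids = sorted(items)
    flagged = []
    for i, id_a in enumerate(ids):
        for id_b in ids[i + 1:]:
            score = jaccard_similarity(items[id_a], items[id_b])
            if score >= threshold:
                flagged.append((id_a, id_b, round(score, 2)))
    return flagged

# Two items generated from the same model share most of their stem;
# a third, from a different model, shares almost nothing.
items = {
    "Q1": "A class has 12 pencils shared equally among 4 students. "
          "How many pencils does each student receive?",
    "Q2": "A class has 12 pencils shared equally among 3 students. "
          "How many pencils does each student receive?",
    "Q3": "Name the capital city of France.",
}
flagged = flag_pairs(items)  # only the (Q1, Q2) pair is flagged
```

Because every item from one model differs from its siblings in only a few elements, pairwise screening of this kind is typically run within a model first, where near-duplicates are most likely to occur.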
These challenges, which are inherent to managing generated items, warrant a shift in banking practice. With a traditional item development approach, items are managed individually in the bank. With AIG, items are managed at the model level. This management task is accomplished with the use of metadata. Metadata permits the SME to organize, track, and manage important characteristics of the generated items. It also permits the SME to answer important questions, such as: How similar can two items be before they are considered enemies? When can two items that contain a similar stem but different options be included on the same test? Metadata can be used to describe the characteristics of each generated item, thereby allowing the SME to differentiate one generated item from another. This type of differentiation is essential when attempting to organize and manage a large bank of generated items. To produce this level of differentiation, the generated items in a bank require metadata so that they can be organized at both the model and item levels.
Managing Generated Items With Metadata
Metadata is a set of data that is used to describe other data. For example, digital image files from your smartphone contain embedded descriptive information, such as image resolution, smartphone model, and colour depth. The files on your computer contain embedded descriptive information, such as author, date created, file size, and date modified. Digital video files contain embedded descriptive information, such as the length of the video, how it was captured, the compression method used, and the frame rate of the video. In short, metadata is data about data that is intended to provide a more complete description of existing information. Metadata can also be used for managing generated test items. A bank is a repository of test items, which includes both the individual items and data about their characteristics. These characteristics are the metadata for the items. The type of metadata that is commonly used in a traditional bank to describe individual items includes information such as item format, number of options, correct response, item difficulty, and item discrimination. Information about whether an item can appear with another item (i.e., enemy item pairs) can also be captured using metadata. The type of metadata that is used in a bank containing AIG items can include conventional descriptors, such as item format, number of options, correct response, item difficulty, and item discrimination. But additional information in the form of content codes can also be included.
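One way to picture the two levels of organization described above is as a pair of linked records: conventional item-level descriptors alongside a model identifier that ties each generated item back to its parent model. The field names below are hypothetical, chosen to mirror the descriptors listed in this section, and are not a standard banking schema:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ItemModel:
    """Model-level metadata: one record per item model in the bank."""
    model_id: str
    content_code: str       # code from the test specification or framework
    cognitive_skill: str    # skill the model is designed to elicit

@dataclass
class GeneratedItem:
    """Item-level metadata: conventional descriptors plus a link
    to the parent model via model_id."""
    item_id: str
    model_id: str
    item_format: str        # e.g., "multiple-choice"
    n_options: int
    correct_response: str
    difficulty: Optional[float] = None       # unknown until field-tested
    discrimination: Optional[float] = None
    enemy_items: list = field(default_factory=list)  # item_ids that cannot co-occur

model = ItemModel("M-017", "LS-SI", "using scientific inquiry")
item = GeneratedItem("I-000123", model.model_id, "multiple-choice", 4, "B")
```

Storing the `model_id` on every item is what allows the bank to be queried at either level: all items from one model can be retrieved together, or an individual item can be inspected on its own.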
Content coding is a method of appending metadata to existing items. The content needed to create this data can be found in different sources. One common source is the test specifications or blueprint of the exam (see Chapter 2). These specifications are often represented in a two-way matrix, where one dimension represents content areas and/or learning outcomes, and the other dimension represents cognitive skills. Another common source is an assessment framework. This framework includes a description of the content areas measured in each subject, as well as the knowledge and skills students require to solve the test items in each content area (Jago, 2009). The Science Framework for the 2019 National Assessment of Educational Progress, as an example, contains a description of the science content areas (physical science, life science, Earth and space science) and the science practices (identifying science principles, using science principles, using scientific inquiry, using technological design) required by students to solve items on the exam. Test items are then written by SMEs to measure this content and these practices for students in grades 4, 8, and 12 (NAEP, 2019). Medical educators use similar frameworks. For instance, the Royal College of Physicians and Surgeons of Canada developed CanMEDS, which contains a comprehensive list of competencies physicians are expected to acquire and demonstrate during their training (Frank, Snell, & Sherbino, 2015). These competencies are organized thematically according to seven roles (i.e., medical expert, communicator, collaborator, leader, health advocate, scholar, professional) in which a competent physician is one who integrates and achieves all of the competencies across all of the roles. Test items can be described using this list of competencies.
Once the metadata source is identified, the most important challenge the SME must address when attempting to use this information is the timing of the metadata assignment. Should the content codes be identified and assigned before cognitive modelling or after item generation? The timing of content coding presents the SME with different advantages and disadvantages. Applying content codes after the items are generated has the advantage of flexibility because it does not require any planning prior to item generation. The disadvantage of ad hoc coding is that the content definitions may be overly broad, subject to change, and, therefore, challenging to interpret consistently. In our experience, ad hoc content coding can be particularly problematic if large numbers of SMEs are responsible for creating the codes. For example, if different SMEs describe the same skill "division" as "divide", "dividing", or "division without remainder", searching for generated items with these codes becomes difficult, particularly as more item models involving division become available. Because the terms are not specific, and the concepts and rules that differentiate each skill are not defined, different search terms can yield different sets of generated items.
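The search problem created by ad hoc coding can be sketched directly. The bank contents and the normalization map below are hypothetical; the point is that an exact-match search retrieves a different item set for each SME's label until a canonical code is imposed after the fact:

```python
# Ad hoc codes assigned by different SMEs for the same division skill.
bank = {
    "item_001": ["division"],
    "item_002": ["divide"],
    "item_003": ["dividing"],
    "item_004": ["division without remainder"],
}

def search_exact(bank, code):
    """Exact-match search: each SME's label retrieves a different item set."""
    return sorted(item for item, codes in bank.items() if code in codes)

# A remedy applied after the fact: map every ad hoc label onto one
# canonical code (this mapping itself must be built and agreed on by SMEs).
NORMALIZE = {
    "divide": "division",
    "dividing": "division",
    "division without remainder": "division",
}

def search_normalized(bank, code):
    """Search after collapsing ad hoc labels onto a canonical code."""
    canon = NORMALIZE.get(code, code)
    return sorted(item for item, codes in bank.items()
                  if any(NORMALIZE.get(c, c) == canon for c in codes))
```

An exact search for "division" retrieves only one of the four division items, while the normalized search retrieves all four; building and maintaining the normalization map is precisely the work that ad hoc coding defers rather than avoids.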
The alternative to applying codes at the end of the generation process is to identify the content codes that will be used before the cognitive models are created. Typically, these codes are drawn from a predefined nomenclature based on an existing test specification or framework. The disadvantage of using a test specification or framework is that it constrains the coding task for SMEs because they are limited to the existing content. In other words, it is not very flexible. Novel cognitive models could be created to produce unique items that fall outside the range of the existing content codes described in the test specifications or frameworks, resulting in items that cannot be coded and classified. The advantage of using existing test specifications or frameworks is that it allows SMEs to have a common understanding of the content that will be used to describe a specific domain, thereby ensuring that the coded items are interpretable. But for this advantage to be realized, an appropriate content-coding system associated with an existing test specification or framework must first exist. The codes must be applicable to all of the items that will be created using the AIG methodology. The SMEs must also be trained to use the content-coding system reliably and to interpret the content codes consistently.