Test Format and Method of Delivery
Tests are utilitarian objects. The form of PPTs and CBTs should follow from their function. A test format should help test takers to do their best work without interference. Although tests are complicated documents, they must be presented simply if they are to be fair and to support valid interpretations. Examinees should not be hampered by confusing directions, unwieldy layouts, busy typefaces, mazelike illustrations or glitchy on-screen features. Making the complex simple is the art and craft of designing test formats, and there is no reason the results should lack aesthetic value.
The guiding principle should be Thoreau’s “Simplify, simplify,” or better yet, “Simplify.” But, increasingly in the digital matrix, the devil is in the details, and simplification is itself a complicated process. Fortunately, design guidelines older than Gutenberg and as recent as the latest edition of the CMS help. The main ideas are to use familiar layouts and sturdy fonts, minimize page turning and scrolling, make the presentation as intuitive as possible and clearly state all directions. Examinees then know what to expect and can easily read the test, navigate conveniently and get through the exercise with the least fuss.
Good test design is inclusive. In accord with “universal design” principles, tests should be “designed and developed from the beginning to allow participation of the widest possible range of students, and to result in valid inferences about performance for all students who participate in the assessment” (Thompson, Johnstone & Thurlow, 2002, p. 5). Designing tests from the beginning to accommodate the needs of virtually all students makes for fair tests, as advocated by the Standards for Educational and Psychological Testing (AERA et al., 2014, pp. 50-53). The definition of universal design speaks as well to efficiency and economy as it does to fairness: “the design of products and environments to be usable by all people, to the greatest extent possible, without the need for adaptation or specialized design” (Center for Universal Design, 1997, p. 1).
As its advocates acknowledge, however, “the concept of universally designed assessments is still a work in progress” (Thompson, Johnstone & Thurlow, 2002, “Acknowledgments,” n.p.). And as practitioners recognize, “the Principles of Universal Design address only universally usable design, while the practice of design involves more than consideration for usability. Designers must also incorporate other considerations such as economic, engineering, cultural, gender, and environmental concerns in their design processes” (Center for Universal Design, 1997, p. 3).
Some tests require technology-enhanced (TE) items of various kinds—but in many areas, what is tested now is the same as in past generations. Developers must ask, “Is TE really needed to test this content? Does a simple format do just as well?” Developers must also ask what QC and other resources are required to support TE formats, and whether they are cost-effective.
Test design and assembly should be preceded by careful consideration of how the test will be administered. Will the means of presentation be print, computer or both? Does socioeconomic fairness demand that print editions be provided to users who do not have the resources to deliver computerized tests? Will there be large-print editions? Audio versions? Reader scripts? A Braille edition? Will there be artwork? If so, will it need to be rendered in raised-line drawings for use with the audio and Braille editions? Considering the range of formats in advance reduces or obviates the need to retrofit. However, it may not always be possible to foretell all the editions. That is why it is best to design tests for maximum flexibility to meet the widest range of needs. For example, if a test is to have a Braille edition, the original test design should avoid these features: construct-irrelevant graphs or pictures, vertical or diagonal text, keys and legends positioned left of or below a figure or item, items that depend on reading graphics for which no textual description is supplied, and purely decorative design features (Thompson, Johnstone & Thurlow, 2002, p. 12). In general, these features are better avoided for all readers. Good design tends to be, indeed, universal.
In planning variant editions, carefully consider space needs. Will the test fit on the page or the computer screen? A large table or graph that runs across two pages in a regular-type booklet will not fit in a large-type booklet or on-screen in such a way that all of it is visible at once. Large, detailed artwork that fits on one page in a regular-type booklet may not translate well into large type or on-screen, or it may require page turning or scrolling. When the space available in each format is evaluated, economy of means in graphics is best: effective “design strategies are transparent and self-effacing in character” (Tufte, 1990, p. 33). Guidelines should specify default maximum dimensions for tables and artwork and should specify zoom features to enlarge images for examinees testing by computer.
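To make such guidelines checkable, a production team might record each format's usable area and the enlargement it applies to artwork, then flag art that would not be visible all at once. The Python sketch below is a minimal illustration under stated assumptions: the format names, the dimensions (in inches) and the 1.8 enlargement factor for large print are hypothetical placeholders, not recommended values.

```python
from dataclasses import dataclass

# Hypothetical per-format limits: usable area (width, height) in inches
# and the enlargement factor applied to artwork in that format.
# These numbers are placeholders for a program's own guidelines.
FORMAT_LIMITS = {
    "regular_print": {"max_w": 7.0, "max_h": 9.0, "scale": 1.0},
    "large_print": {"max_w": 10.0, "max_h": 13.0, "scale": 1.8},
    "cbt_window": {"max_w": 9.0, "max_h": 5.5, "scale": 1.0},
}


@dataclass
class Artwork:
    item_id: str
    width_in: float   # width as drawn for the regular-type booklet
    height_in: float  # height as drawn for the regular-type booklet


def formats_needing_redesign(art: Artwork) -> list[str]:
    """Return the formats in which the artwork, once scaled, would exceed
    the usable area and so force page turning or scrolling."""
    problems = []
    for name, lim in FORMAT_LIMITS.items():
        w = art.width_in * lim["scale"]
        h = art.height_in * lim["scale"]
        if w > lim["max_w"] or h > lim["max_h"]:
            problems.append(name)
    return problems


# A 6.5 x 4 in. graph fits the regular booklet and the screen window,
# but at 1.8x enlargement it is 11.7 in. wide, too wide for large print.
print(formats_needing_redesign(Artwork("M-12", 6.5, 4.0)))
```

A check like this only identifies candidates for redesign; whether an enlarged or zoomed figure remains legible still calls for human review.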
The same idea holds true for text. If a reading test passage fits on one page in a regular-type booklet, will it also fit in one window on screen? If not, is validity or comparability compromised, or does the testing protocol compensate for the difference (e.g., by giving examinees extra time to navigate)? If a mathematics PPT has blank space for scratch work, is there an equivalent in the CBT? Thinking across platforms (e.g., desktops, tablets, smartphones) helps keep a testing program from falling between them.
Provide clear instructions to test takers (Standards 4.16 and 6.5). Directions must be as nearly identical as feasible across formats and presented equivalently (e.g., timed or untimed).
Besides considering the operational versions of a test, developers need to gauge how closely an operational test must resemble the antecedent pilot test, field test and pretest. Similarly, the presentation of test materials itself should be as equivalent as possible across formats (e.g., in the alignment of numerical responses), as should the documents for recording examinee responses. It is important to consider format effects (changes in item performance associated with altered formatting, from screen or window sizes to page breaks). Studies show that variation in screen layout and navigation can have an unpredictable impact on examinee scores (Pommerich, 2004). It is prudent for a testing program to undertake studies to verify that its item performance remains stable across the formats it employs.
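As one simple screen within such a study, a program could compare each item's proportion correct (its p-value in classical test theory) in the print and computer groups and flag large, statistically notable shifts. The Python sketch below assumes randomly equivalent groups and uses a two-proportion z statistic; the thresholds (a difference of 0.05 and |z| of at least 1.96) are hypothetical choices, and operational comparability studies typically rely on more thorough designs.

```python
import math


def p_value_shift(correct_a: int, n_a: int, correct_b: int, n_b: int) -> tuple[float, float]:
    """Return (difference in proportion correct, two-proportion z statistic)
    for one item taken in format A (e.g., PPT) and format B (e.g., CBT)."""
    p_a = correct_a / n_a
    p_b = correct_b / n_b
    # Pooled proportion under the hypothesis that format has no effect.
    pooled = (correct_a + correct_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se if se > 0 else 0.0
    return p_a - p_b, z


def flag_format_effects(items, min_diff=0.05, z_crit=1.96):
    """items maps item_id -> (correct_ppt, n_ppt, correct_cbt, n_cbt).
    Flag items whose difficulty differs by at least min_diff AND whose
    z statistic is at least z_crit in absolute value."""
    flagged = []
    for item_id, counts in items.items():
        diff, z = p_value_shift(*counts)
        if abs(diff) >= min_diff and abs(z) >= z_crit:
            flagged.append(item_id)
    return flagged


# Hypothetical counts: (correct on paper, n paper, correct on computer, n computer)
items = {
    "A1": (720, 1000, 710, 1000),
    "A2": (720, 1000, 640, 1000),
    "A3": (60, 100, 52, 100),
}
print(flag_format_effects(items))
```

In the example data, items A2 and A3 both drop by about 8 percentage points on computer, but only A2 is flagged: with 100 examinees per format, A3's difference is too small to distinguish from chance.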
Before test developers begin to consider the details particular to PPTs and to CBTs respectively, they must consider some details of test formatting and production across both platforms.
The use of computerized item banks is nearly universal (Muckle, this volume). An item bank should be designed with care regarding the required file formats for both input and output. Great care must also be taken to ensure that special characters, symbols and formatting instructions (e.g., underlining, highlighting, required line breaks, line numbering) are preserved in the output to PPTs and CBTs.
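One way to catch losses in the bank-to-test conversion is to compare a “fingerprint” of each item before and after export: a count of its non-ASCII characters and of its formatting tags. The Python sketch below is illustrative only; the tag names in FORMAT_TAG are hypothetical stand-ins for whatever markup a particular item bank actually uses.

```python
import re
import unicodedata
from collections import Counter

# Hypothetical inline formatting tags used in the item bank
# (underline, emphasis, bold, line break, highlight, line numbering).
FORMAT_TAG = re.compile(r"</?(?:u|em|strong|br|mark|linenum)\b[^>]*>")


def fingerprint(text: str) -> Counter:
    """Count the special characters and formatting tags in a piece of item text.
    Text is NFC-normalized so that composed and decomposed forms of the same
    character (e.g., 'é' vs. 'e' plus a combining accent) compare equal."""
    text = unicodedata.normalize("NFC", text)
    counts = Counter(ch for ch in text if ord(ch) > 127)
    counts.update(FORMAT_TAG.findall(text))
    return counts


def lost_in_export(source: str, exported: str) -> Counter:
    """Return what appears in the source more often than in the export.
    Counter subtraction keeps only positive counts."""
    return fingerprint(source) - fingerprint(exported)


source = "Solve for <u>x</u>: 3x \u2265 12 \u2212 y"
exported = "Solve for x: 3x >= 12 - y"  # underline and math symbols were dropped
print(lost_in_export(source, exported))
```

An empty result does not prove the output is correct (a symbol could be replaced by a different non-ASCII character), so such a check supplements proofreading rather than replacing it.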
It is also important to plan and maintain an archive of the materials that were used to produce the finished product. Without such an archive, costly work must be redone when the occasion for reuse arises.
Figure 27.1 illustrates the basic options of test production for MC, CR and essay tests. In a digital production work flow, a major question is, who marks up what coding (writing and inserting the tags that direct devices), at which steps? The CMS observes that coding markup “requires an immersion in software and programming that is generally the arena of IT specialists” (2010, pp. 878-880). Whoever does the markup, the CMS also notes the great advantage of having a single source file that is publishable on various platforms and says that Extensible Markup Language (XML), an open, nonproprietary markup standard, “provides the most promising means to date for achieving such flexibility” (p. 863). If XML is used, then each medium in which the file is to be published will need its own cascading style sheet (CSS), a set of formatting rules that the display or printing software applies to the tagged elements in the file (p. 865). If a PDF file, which supplies additional formatting codes, is used for publication, it must be optimized for print or electronic use as appropriate. The CMS sections on digital publishing, especially Appendix A, “Production and Digital Technology” (pp. 861-890), describe these processes.
Figure 27.1 Tree diagram of digital production work flow.
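The single-source idea can be sketched in a few lines of Python using only the standard library: one XML item is converted to HTML once, and each edition differs only in the style sheet attached to it. The element names (item, stem, emph, option), the edition names and the CSS rules are hypothetical, and converting XML with hand-written Python is a simplification; production workflows commonly use XSLT or dedicated publishing tools for that step.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical single-source item markup; the tags describe structure only.
ITEM_XML = """
<item id="R-101">
  <stem>Which word in line 3 is used <emph>ironically</emph>?</stem>
  <option key="A">brave</option>
  <option key="B">quiet</option>
</item>
"""

# One style sheet per edition (hypothetical rules); the content is never edited per edition.
EDITION_CSS = {
    "regular_print": "body { font: 12pt serif; } em { text-decoration: underline; }",
    "large_print": "body { font: 18pt sans-serif; } em { text-decoration: underline; }",
    "screen": "body { font: 1rem sans-serif; max-width: 40em; } em { font-weight: bold; }",
}


def inline_html(elem: ET.Element) -> str:
    """Convert an element's mixed content to HTML, mapping <emph> to <em>."""
    parts = [html.escape(elem.text or "")]
    for child in elem:
        if child.tag == "emph":
            parts.append(f"<em>{inline_html(child)}</em>")
        else:
            parts.append(inline_html(child))  # unknown tags: keep their text only
        parts.append(html.escape(child.tail or ""))
    return "".join(parts)


def render(xml_source: str, edition: str) -> str:
    """Render one XML item as a standalone HTML page for the given edition."""
    item = ET.fromstring(xml_source.strip())
    stem = inline_html(item.find("stem"))
    options = "".join(
        f'<p class="option">{opt.get("key")}. {inline_html(opt)}</p>'
        for opt in item.iter("option")
    )
    return (
        f"<html><head><style>{EDITION_CSS[edition]}</style></head><body>"
        f"<p>{stem}</p>{options}</body></html>"
    )


pages = {edition: render(ITEM_XML, edition) for edition in EDITION_CSS}
print(pages["large_print"])
```

Because the item content lives in one place, a correction to the stem reaches every edition the next time the pages are rendered; only the style sheets vary by medium.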
Because of the complexity and pace of change in digital publishing, test developers should communicate early and frequently with IT specialists and printers with regard to planning, design and the interoperability of software and hardware. Known glitches and their remedies, from elegant solutions to kludges (make-do fixes), should be candidly discussed:
1. What software and hardware are in use; should any be updated or replaced, and what compatibility issues are there? (Be sure to include text, graphics and any multimedia programs and devices.)
2. What accommodated versions are needed, with what hardware and software?
3. What are the numbers and kinds of conversions needed, and what known issues are involved?
4. What are the metadata needs?
5. What is the scale of the test: small and focused, or large and diverse? (Scaling up hardware and software from the former to the latter is challenging.)
6. What is the scale of the test administration, and is there sufficient device and server capacity?
7. Will the test be delivered by CBT or PPT or both? If CBT, will the test be streamed over the web or downloaded to users?
8. What range of devices may test takers use to take the test?
9. What will be the division of labor among test developers, proofreaders and IT? For example, will editors be responsible for all the results visible to test takers and for metadata verification, while IT specialists are responsible for all underlying tags (computer coding)? Who is responsible for QC at each step?
10. What are the security needs, and how robust are the safeguards (secure file transfer protocol [FTP] sites, firewalls, contractual agreements for the disposition of all secure materials, etc.)?
Comprehensive planning is indispensable. Needs, costs and schedules should be developed early, shared with staff and vendors and monitored regularly. Otherwise, servers may overload, software misbehave, lead time disappear and budgets balloon.