Multilingual Item Generation: Beyond Monolingual Item Development
For much of the 20th century, tests were developed and administered in the language of the local or regional culture. But profound global, technological, and economic changes occurring at the end of the 20th century have resulted in a dramatic increase in multilingual testing. We noted in Chapter 1 that globalization is one outcome that helps characterize this noteworthy period of transition that is now occurring in educational testing. Computerized testing is replacing paper-based assessment, thereby creating the foundation for the widespread use of modern testing systems. Modern computer delivery systems are being used to implement test designs that permit examiners to collect information during the assessment process that supports both formative and summative inferences. These developments are unfolding throughout the world and thus affect large numbers of students who are educated in different cultures, geographic regions, and economic systems. To support these changes, educational tests must be developed in many different languages.
As an example, consider the multilingual item development challenges inherent to international achievement testing as conducted by OECD as part of PISA. OECD member countries initiated PISA in 1997 as a way to measure the knowledge, skills, and competencies of 15-year-olds in mathematics, reading, and science. The results from these tests are intended to allow educators and policy makers to compare the performance of students from around the world and to guide future educational practices and policies. The first data collection occurred in 2000 with 32
countries. The sixth and most recent data collection occurred in 2018 with 79 countries. The student samples within these countries ranged from 4,500 to 10,000. CBT is now the norm in PISA, with students writing items in both the selected- and constructed-response formats. To measure performance across a broad range of content, a complex test design is used in which students write different combinations of items, leading to a basic knowledge and skill profile for the typical 15-year-old within each country. To account for the linguistic diversity that exists among member countries, items are created, validated, and administered in 47 different languages (OECD, 2019).
At the other end of the continuum, consider the multilingual item development challenges inherent to local achievement testing as conducted by the government in the Canadian province of Alberta. Alberta Education conducts the provincial achievement testing program each year. The purpose of this program is to determine if students have acquired the knowledge, skills, and competencies outlined in the provincial curriculum. Achievement tests are administered using a CBT system in both of Canada's official languages of English and French across the four content areas of language arts, mathematics, science, and social studies in Grades 6 and 9. Separate tests are developed for language arts in English and French, while the tests in mathematics, science, and social studies are initially developed in English and then translated into French. In total, ten different forms across two grades and two languages are required each year. At Grade 6, more than 49,000 students wrote the English versions of the language arts, mathematics, science, and social studies achievement tests in 2019, with an additional 3,479 students writing the French language arts test and more than 4,000 students writing the French versions of the mathematics, science, and social studies tests. At Grade 9, more than 41,000 students wrote the English versions of the language arts, mathematics, science, and social studies achievement tests in 2019, with an additional 2,720 students writing the French language arts test and more than 2,800 students writing the French versions of the mathematics, science, and social studies tests (Alberta Education, 2019). These two examples demonstrate that creating and administering items in different languages is a common challenge for testing organizations on both a global and local scale.
Challenges With Writing Items in Different Languages
Computer-based educational testing in a multilingual context, as illustrated in the previous two examples, relies on the availability of diverse, multilingual, high-quality test items. Many items are needed for each content area in each required language in order to make inferences about examinees' knowledge, skills, and competencies. One way to address the challenge of scaling item development to create items in multiple languages is to hire a large number of SMEs and translators to use the traditional item development and item translation methods (e.g., Hambleton, Merenda, & Spielberger, 2005). But as we noted in Chapter 1, this strategy for scaling item development is costly and time-consuming because each item is created by the SME and then edited, reviewed, and revised until it meets a specific quality standard. The process of translating items adds considerable time and effort to an already time-consuming and resource-intensive development process (Ganji & Esfandiari, 2020). The need to translate each item after it is written adds a new and complex step to the process because test translation is characterized by its own challenges and problems that must be addressed to produce items that are considered to be equivalent across languages (see, for example, Chapter 5 "Translation and Verification of the Survey Materials" in the PISA 2018 Technical Report). The task of translating items is complex, expensive, and time-consuming when two languages are required, as presented in the Alberta Education example. This task becomes mind-boggling when 47 languages are required, as described in the OECD example.
Multilingual item development using the traditional one-item-at-a-time approach is susceptible to the same problems as monolingual item development. When the item is the unit of analysis, the SME's task is time-consuming and expensive because each item is unique and, therefore, the content in each item must be individually written, edited, reviewed, and revised. Added to this challenge in a multilingual context is the fact that each language is unique. As a result, the language used for the content in each item must be individually reviewed, edited, and revised. If we return to the ambitious SME working in the medical education context introduced in Chapter 1, she aspired to have a very large inventory with over half a million items. Added to her challenge is the fact that she works in Canada.
As a result, her testing program is required to provide examinees with the option of writing the test in either one of Canada's two official languages, English or French. Therefore, the 500,000 items in her bank must be written in both English and French—which means that the bank will actually contain 1,000,000 items (500,000 English, 500,000 French). We also noted that her task of writing 500,000 items was challenging because it required her to produce about 41,700 items per month if she is required to produce the items in a year. In a multilingual context, these 41,700 items will also need to be translated into French each month if she is to meet her item banking goals. One way to address the challenge of creating more items in multiple languages is to scale the current item development and item translation methods by hiring a larger number of SMEs to create the source language content and then hiring a large number of translators to convert the source language content into the target language. But these traditional, manual development and translation processes are slow, costly, and labour-intensive. Moreover, it is difficult to envision how the translation process would be scaled from two languages, as required in Canada, to 47 languages, as required by PISA, in an efficient and cost-effective way.
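The scale of this banking goal is easy to verify with back-of-the-envelope arithmetic. The short sketch below simply restates the figures from the example above (500,000 source-language items, two languages, a one-year window); the variable names are illustrative, not part of any testing system.

```python
# Bilingual item-banking arithmetic for the medical education example.
# All figures come from the scenario in the text; names are illustrative.
target_items = 500_000   # English (source-language) items required
languages = 2            # English and French versions of every item
months = 12              # one-year development window

total_bank_size = target_items * languages      # items stored in the bank
items_per_month = target_items / months         # monthly writing (and translation) load

print(total_bank_size)         # 1000000
print(round(items_per_month))  # 41667, i.e., the "about 41,700" cited above
```

The monthly figure of roughly 41,700 items must be produced twice over in effect: once by SMEs writing the English source items and again by translators producing the French versions, which is why the manual process scales so poorly.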