Two Important Insights Involving Cognitive and Item Modelling in AIG

AIG is capable of producing an impressive number of test items. In Chapter 1, we claimed that the three-step AIG method could be used to generate hundreds or thousands of new test items using a single cognitive model. This statement is accurate. However, this claim is sometimes interpreted to also mean that one cognitive model can be used to produce the content needed for an entire test. This interpretation is inaccurate. The cognitive model will faithfully produce the content that is specified in the top panel. Recall that the top panel includes the problem and scenarios. For example, the problem and scenarios in the medical cognitive model include respiratory illness and the common cold and seasonal flu, respectively (see Chapter 2, Figure 2.4). This modelling structure means that regardless of whether the three-step AIG process is used to generate 500 or 5,000 items, these items will all be related to the problem of respiratory illness and measure the two scenarios of the common cold and seasonal flu. If the purpose of a test were to measure a single problem with a small number of scenarios, then only one cognitive model would be needed to produce the content for such a test. But in practice, this type of test is never created. Instead, the test specifications we described in step one for developing a cognitive model in Chapter 2 are used to identify many different content areas. These content areas must be carefully aligned with the problem and scenarios in our cognitive models to produce the content for the test. Hence, in most operational testing situations, many different cognitive models will be required to implement the objectives described in the test specifications because the specifications tend to be multifaceted and complex. We advise that when developers or users want a quick summary of the generated content, they need only review the problem and scenarios panel in the cognitive model.
This panel provides a succinct and accurate snapshot of the general content domain that will be represented by all of the generated items from a single cognitive model.

The topic of item diversity serves as a closely related second insight. Again, we return to the claim that AIG is capable of generating an impressively large number of items. While the content represented by these items is outlined in the problem and scenario panel of the cognitive model, the diversity that can be captured with these items is dictated by the item modelling approach. A 1-layer item model produces new items by manipulating a small number of elements at a single level in the model. This type of item model is a good starting point for a novice AIG developer because it is relatively simple. Using our medical example, the cognitive model with a respiratory illness problem that includes the common cold and seasonal flu scenarios can be used to generate diagnosis (i.e., What is the most likely diagnosis?) items using a 1-layer item model. Regardless of whether these cognitive and item models generate 500 or 5,000 items, all of the generated items will be related to the examinee's ability to diagnose the cold or flu under the general problem area of respiratory illness. N-layer item modelling helps diversify the generation process. An n-layer item model produces new items by manipulating a relatively large number of elements at two or more layers in the model. This type of modelling is appropriate for the experienced SME because it is more complex than 1-layer modelling. But the complexity offers the benefit of generating more diverse items. Using our medical example, a single n-layer item model was used to generate both diagnosis (i.e., What is the most likely diagnosis?) and management (i.e., What is the best next step?) items, two entirely different item types. Because the layering process is unlimited, other types of items (e.g., treatment) could also be included in this model.
Therefore, we advise that developers or users review the layering approach when they want to understand and anticipate the kind of diversity that an item model is capable of producing. N-layer models are capable of generating diverse items, but the content for these items will always remain within the domain defined by the problem and scenarios panel in the cognitive model.
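To make the 1-layer idea concrete, here is a minimal sketch in Python that fills a single stem template with every combination of element values. The stem wording and the element lists are hypothetical illustrations, not content from an actual medical cognitive model:

```python
from itertools import product

# Hypothetical 1-layer item model: one stem whose elements are
# manipulated at a single level (illustrative values only).
STEM = ("A patient presents with {symptom} lasting {duration}. "
        "What is the most likely diagnosis?")

ELEMENTS = {
    "symptom": ["a runny nose", "a sore throat", "a high fever"],
    "duration": ["two days", "one week"],
}

def generate_items(stem, elements):
    """Yield one item for every combination of element values."""
    keys = list(elements)
    for combo in product(*(elements[key] for key in keys)):
        yield stem.format(**dict(zip(keys, combo)))

items = list(generate_items(STEM, ELEMENTS))
print(len(items))  # 3 symptoms x 2 durations = 6 items
```

Enlarging the element lists is what turns a single model into hundreds or thousands of items; an n-layer model would, in addition, vary the stem itself (e.g., swapping the diagnosis question for a management question) at a second layer.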

Non-template AIG: A Review of the State of the Art

It is important to recognize that AIG can be conducted in many different ways.1 Our book is focused on template-based AIG using item modelling. But non-template AIG approaches also exist. Now that we have described template-based AIG, we provide a brief summary of non-template AIG, as described in three recent studies. We consider these three studies to represent the state of the art for non-template AIG. We also explain why the template-based approach is preferred for operational testing applications, at least at this point in the history of AIG.

Non-template AIG can be guided by the syntactic, semantic, or sequential structure of a text. Non-template AIG, which relies heavily on natural language processing (NLP) techniques and knowledge bases, can be used to directly generate statements, questions, and options from inputs such as texts, databases, and corpora of existing information. With this approach, templates are not required for generating content. The first commonly used non-template AIG approach is syntax-based. This approach operates at the syntax level, where language representations and transformations are defined using syntax. Syntax is a description of how words are combined in sentences, where the syntactic structure of a sentence conveys meaning. The typical syntax-based approach requires tagging parts of speech, defining syntactic categories (e.g., verbs, noun phrases), and constructing syntax trees. Danon and Last (2017) described a syntax-based approach to automatically generate factual questions in the content area of cybersecurity. Syntax-based question generation extracted key statements from a text and then transformed the syntactic structure to directly produce factual questions, which served as the generated items (Heilman, 2011). To build on the work first described in Heilman's (2011) dissertation research, Danon and Last introduced a new language processing technique to provide richer contextual information to the questions in order to improve question-generation quality. The Danon and Last system started by training word embeddings using Word2vec with over one million cyber-related documents to accurately represent the syntactic relationships expressed in the corpus. Then a set of sentences was selected from the corpus as input to the generation system. Initial question-answer pairs were generated by identifying possible answer phrases and creating suitable questions for each answer type.
Finally, the quality of the generated questions from the previous step was improved by adding extra contextual information to the stem. Item generation was conducted in the content area of cybersecurity. An evaluation corpus composed of 69 sentences was created using 30 articles on cybersecurity. The syntax-based AIG system was provided with a statement such as "Polymorphic virus infects files with an encrypted copy of itself" in order to generate questions such as, "What infects files with an encrypted copy of itself?" One hundred and forty-eight questions were generated. The generated questions were then evaluated by two cybersecurity SMEs using question quality from Heilman's 2011 study as the baseline. The evaluation was based on the fluency, clarity, and semantic correctness of the questions. The results indicated that 68% of the questions initially generated by Heilman's (2011) system could be modified using the current system. Among the modified questions generated by Danon and Last, 42% were identified as acceptable by the SMEs.
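As a rough, self-contained illustration of the syntax-level idea (a simplification, not the actual Danon and Last pipeline, which uses full parse trees and trained embeddings), the sketch below applies a single transformation rule: replace the subject of a declarative statement with "What" to produce a factual question:

```python
# Simplified syntax-style transformation: swap the subject noun
# phrase for "What". A real system would locate the subject with a
# parser; here the subject is supplied by the caller.
def statement_to_question(sentence, subject):
    """Turn 'SUBJECT VERB ...' into 'What VERB ...?'."""
    sentence = sentence.rstrip(".")
    if not sentence.startswith(subject):
        raise ValueError("subject must begin the sentence")
    return "What" + sentence[len(subject):] + "?"

stmt = "Polymorphic virus infects files with an encrypted copy of itself"
print(statement_to_question(stmt, "Polymorphic virus"))
# -> What infects files with an encrypted copy of itself?
```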

The second commonly used non-template AIG approach is semantic-based. This approach generates questions at the semantic level. Semantics focuses on the meaning of a sentence, which is often expressed using a structured combination of words. Semantic-related techniques include finding synonyms (e.g., Gütl, Lankmayr, Weinhofer, & Höfler, 2011), processing word sense disambiguation (e.g., Susanti, Iida, & Tokunaga, 2015), and translating words and sentences from one language to another. Flor and Riordan (2018) described a semantic-based AIG system that generates factual questions using semantic role labelling. Semantic role labelling is the process of assigning particular labels to different parts of a sentence in order to represent their semantic roles. The system used the information gathered from role labelling with rule-based models to create constituent or wh-questions and yes/no questions. Two steps were required. To begin, Flor and Riordan used open-source language processing tools to analyze the grammatical structure of sentence parts required for assigning particular semantic roles. SENNA, for instance, is a popular tool for identifying generalizable core arguments in a sentence to provide specific labels, such as agent (or subject), patient (or object), location, and time of the event. Then the system directly generated constituent questions from the labelled sentences, focusing on a focal label (e.g., agent) to select the most appropriate question type (e.g., what, who, where). To prevent the generation of erroneous questions, rule-based decisions were used. For example, the system sub-classified question types based on prepositions (e.g., on, for, in) and provided do-support for certain cases. Generating yes/no questions followed a similar framework by providing do-support to the original statement. Item generation was conducted in the content area of education. A corpus of 171 sentences was created using 5 educational expository texts.
The semantic-based AIG system was provided with a statement such as "Peter called on Monday" or "Peter called for six hours". In this example, the semantic role labelling identified several focal points in the sentence, such as "Peter" as the agent and "on Monday" and "for six hours" as the time information. Using this information, the system could generate constituent questions such as "When did Peter call?" and "How long did Peter call?" For the yes/no question type, a statement such as "The particles from the Sun also carry an electric charge" could be used to generate a question such as "Do the particles from the Sun carry an electric charge?" Eight hundred and ninety constituent and yes/no questions were generated. Question quality was evaluated, first, by generating comparable questions using a neural network AIG system described by Du and Cardie (2017), which served as the baseline, and, second, by asking two linguistic SMEs to compare the neural network and the semantic-based generated questions on their relevance, grammaticality, and semantic soundness. Both the neural network and the semantic-based AIG systems were required to generate questions using the 171-sentence corpus. The results indicated that constituent (i.e., 10.20 out of 14, where a higher score indicates a better result) questions and yes/no (i.e., 11.41) questions generated from the semantic-based AIG system were more relevant, grammatical, and semantically sound compared to the questions generated from the neural network (i.e., 8.01).
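The rule-based question-building step can be sketched as a mapping from role labels to wh-words, assuming the semantic roles have already been assigned (the stage handled by tools such as SENNA). The role names and the simple past-tense do-support rule below are hypothetical stand-ins for the system's actual rules:

```python
# Hypothetical mapping from semantic-role labels to wh-question types.
ROLE_TO_WH = {
    "agent": "Who",
    "location": "Where",
    "time": "When",
    "duration": "How long",
}

def constituent_question(agent, verb, focal_role):
    """Ask about one labelled role, with do-support for a simple
    past-tense verb (base form supplied by the caller)."""
    return f"{ROLE_TO_WH[focal_role]} did {agent} {verb}?"

# "Peter called on Monday"     -> focal role: time
# "Peter called for six hours" -> focal role: duration
print(constituent_question("Peter", "call", "time"))      # When did Peter call?
print(constituent_question("Peter", "call", "duration"))  # How long did Peter call?
```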

The third commonly used non-template AIG approach is sequence-based. Sequence describes a set of letters or words that follow one another in a specific order to produce meaning. Hence, sequence-based AIG focuses on mapping content to sequential numeric vectors and then using neural networks to predict the sequence of letters and words in order to create new content. Von Davier (2018) described a sequence-based AIG approach for generating personality statements. His item generation system used a language modelling approach. Language modelling captures sequential information in a text using letter and word probabilities. Language models are commonly used in NLP tasks that require predicting the future occurrence of letters and words using word history. Von Davier used a deep learning neural network algorithm to train a language model based on a corpus created using existing personality items. Then the carefully trained language modelling system, based on a recurrent neural network algorithm, was used to predict the most probable next character given a sequence of input characters. Each character was passed into the neural network cell to predict the most probable next character and thereby form a complete sentence. Because of the configuration in his system, the output sequences of characters reflected the language structures of the input sequences. Item generation was conducted in the content area of personality testing. A corpus of 3,320 personality statements was used to identify probabilistic sequential dependencies among the words. The sequence-based AIG system was provided with the existing personality statements from the corpus to generate 24 new personality items. The evaluation of the generated items focused on comparing their statistical qualities with those of existing personality items. The 24 generated personality items were combined with 17 existing personality items to produce an inventory that was administered to 277 participants.
Then exploratory factor analysis was conducted to identify the factor structure for the items. The generated items were not distinguishable from the existing items in the factor analysis. For example, generated items such as "I usually control others" and "I prefer to be the perfect leader" loaded on the "extraversion" factor, along with the existing extraversion personality items. Similarly, generated items such as "I often do not know what happens to me" and "I rarely consider my actions" loaded on the "neuroticism" factor, along with the existing neuroticism personality items.
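To show the sequential idea in miniature, the sketch below substitutes a character bigram model for the recurrent network: it counts which character follows which in a tiny, invented corpus and then returns the most probable next character, which is the same prediction task the neural network performs with a much longer character history:

```python
from collections import defaultdict

# Count character-to-next-character transitions in a toy corpus
# (invented statements, not Von Davier's actual item bank).
corpus = [
    "I usually control others.",
    "I prefer to lead.",
    "I rarely consider my actions.",
]

counts = defaultdict(lambda: defaultdict(int))
for statement in corpus:
    for prev, nxt in zip(statement, statement[1:]):
        counts[prev][nxt] += 1

def most_probable_next(char):
    """Return the character most often observed after `char`."""
    followers = counts[char]
    return max(followers, key=followers.get) if followers else None

print(repr(most_probable_next("I")))  # 'I' is always followed by ' '
```

Generating a full statement repeats this prediction character by character, which is why the output sequences mirror the language structures of the input corpus.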
