In neurocognitive psychology, manually collected cloze completion probabilities (CCPs) are the standard approach to quantifying a word’s predictability from sentence context [KLI 04, KUT 84, REI 03]. Here, we test how well a series of language models can account for CCPs, as well as for the data that CCPs are typically used to explain, i.e. electroencephalographic (EEG) and eye movement (EM) data. With this, we hope to render time-consuming CCP procedures unnecessary. We test a statistical n-gram language model [KNE 95], a Latent Dirichlet Allocation (LDA) topic model [BLE 03], and a recurrent neural network (RNN) language model [BEN 03, ELM 90] for correlation with the neurocognitive data.
CCPs have traditionally been used to account for N400 responses as an EEG signature of a word’s integration into sentence context [DAM 06, KUT 84]. Moreover, they have been used to quantify the concept of word predictability from sentence context in models of eye movement control [ENG 05, REI 03]. However, because CCPs must be effortfully collected from samples of up to 100 participants [KLI 04], they severely limit the ability of a model to generalize to novel stimuli [HOF 14], which also prevents their ubiquitous use in technical applications.
To quantify how well computational models of word recognition can account for human performance, Spieler and Balota [SPI 97] proposed that a model should explain variance at the item-level, i.e. latencies averaged across a number of participants. Therefore, a predictor variable is fitted to the mean word naming latency $y$ as a function $y = f(x) = \sum_{i=1}^{n} a_i x_i + b + \text{error}$ for a number of $n$ predictor variables $x_i$ that are scaled by a slope factor $a_i$, an intercept of $b$, and an error term. The Pearson correlation coefficient $r$ is calculated, and squared to determine the amount of explained variance $r^2$. Models with a larger number of $n$ free parameters are more likely to (over-)fit error variance, and thus fewer free parameters are preferred (e.g. [HOF 14]).
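The item-level fit described above can be illustrated with a minimal sketch for the simplest case of a single predictor (n = 1). The data values and the names `predictor`, `mean_latency`, `fit_line` and `pearson_r` are hypothetical and serve only as illustration; they are not taken from any of the cited studies. With several predictors, a multiple regression would be needed instead of the closed-form line fit shown here.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Least-squares slope a and intercept b for y = a * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var_x
    b = mean_y - a * mean_x
    return a, b


# Hypothetical item-level data: one predictor value per word (e.g. a
# predictability score) and one latency per word, already averaged
# across participants (in ms). These numbers are invented.
predictor = [0.1, 0.4, 0.5, 0.8, 0.9]
mean_latency = [620.0, 580.0, 590.0, 540.0, 530.0]

a, b = fit_line(predictor, mean_latency)
r = pearson_r(predictor, mean_latency)
r_squared = r ** 2  # proportion of item-level variance explained

print(f"slope a = {a:.2f}, intercept b = {b:.2f}, r^2 = {r_squared:.3f}")
```

In the single-predictor case, the squared Pearson correlation between predictor and latency equals the proportion of variance explained by the fitted line, which is why no separate residual computation is needed in this sketch.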
While the best cognitive process models can account for 40-50% of variance in behavioral naming data [PER 10], neurocognitive data are noisier. The only interactive activation model for which the amount of explained variance in EEG data has been reported [BAR 07, MCC 81] is that of Hofmann et al. [HOF 08], who account for 12% of the N400 variance. Although models of eye movement control use item-level CCPs as predictor variables [ENG 05, REI 03], such models have, to our knowledge, hardly been benchmarked at the item-level [DAM 07].
While using CCP data increases the comparability of many studies, collecting such data is expensive, and CCPs exist for only a few languages [KLI 04, REI 03]. If it were possible to derive this information automatically from (large) natural language corpora, this would considerably expedite the process of experimentation for under-resourced languages. Comparability would not be compromised when using standard corpora, such as those available through Goldhahn et al. [GOL 12] in many languages. However, it is not yet clear what kind of corpus is most appropriate for this enterprise, and whether corpora differ in how well they explain human performance data.