Experiment setup

Engbert et al.’s [ENG 05] data are organized in 144 short German sentences with an average length of 7.9 tokens, and provide features such as freq, the corpus frequency in occurrences per million [BAA 95], pos and pred. We test whether corpus-based predictors can account for predictability, and compare how well the different approaches account for EEG and EM data. For training n-gram and topic models, we used three different corpora that differ in size and cover different aspects of language. Further, the units on which the topic models are computed differ in size; a brief training sketch follows the corpus descriptions below.

NEWS: a large corpus of German online newswire from 2009, collected by the Leipzig Corpora Collection (LCC) [GOL 12], comprising 3.4 million documents / 30 million sentences / 540 million tokens. This corpus is not balanced, i.e. important events in the news are covered more extensively than other themes. The topic model was trained at the document level.

WIKI: a recent German Wikipedia dump of 114,000 articles / 7.7 million sentences / 180 million tokens. This corpus is rather balanced, as each concept or entity is described in a single article, independent of its popularity, and it spans all sorts of topics. The topic model was trained at the article level.

SUB: German subtitles from a recent dump of opensubtitles.org, containing 7,420 movies / 7.3 million utterances / 54 million tokens. While this corpus is much smaller than the others, it is closer to colloquial language use. Brysbaert et al. [BRY 11] showed that word frequency measures derived from subtitles yield numerically higher correlations with word recognition speed than those from larger corpora of written language. The topic model was trained at the movie level.
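To make the training setup concrete, the following is a minimal sketch of how count-based trigram probabilities and a topic model over corpus-specific units (newswire documents for NEWS, articles for WIKI, movies for SUB) could be computed. It is illustrative only: the toy inputs, the add-one smoothing and the topic count are assumptions, not the configuration used in our experiments.

```python
from collections import Counter
from gensim.corpora import Dictionary
from gensim.models import LdaModel

# Toy placeholder inputs; in the experiments these come from NEWS, WIKI or SUB.
sentences = [["das", "ist", "ein", "satz"], ["ein", "kurzer", "satz"]]
docs = [["politik", "wahl", "partei"], ["fussball", "tor", "spiel"]]  # one token list per unit

# --- n-gram model: trigram counts with add-one smoothing (an assumed scheme) ---
trigrams, bigrams, vocab = Counter(), Counter(), set()
for sent in sentences:
    toks = ["<s>", "<s>"] + sent + ["</s>"]
    vocab.update(toks)
    for i in range(2, len(toks)):
        trigrams[(toks[i - 2], toks[i - 1], toks[i])] += 1
        bigrams[(toks[i - 2], toks[i - 1])] += 1

def trigram_prob(w2, w1, w):
    """P(w | w2, w1) with add-one smoothing over the vocabulary."""
    return (trigrams[(w2, w1, w)] + 1) / (bigrams[(w2, w1)] + len(vocab))

# --- topic model: LDA over the corpus-specific units ---
dictionary = Dictionary(docs)
bow = [dictionary.doc2bow(d) for d in docs]
lda = LdaModel(bow, num_topics=2, id2word=dictionary)  # toy topic count
```

The one substantive choice mirrored from the setup above is the unit of the topic model: each element of docs corresponds to one document, article or movie, depending on the corpus.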

Pearson’s product-moment correlation coefficient (e.g. [COO 10, p. 293]) was calculated and squared, for the N = 1,138 predictability scores [ENG 05] as well as for the N = 343 N400 amplitudes and single-fixation durations (SFDs) [DAM 07]. To address overfitting, we randomly split the material into two halves and test how much variance can be reproducibly predicted on the two subsets of 569 items; a sketch of this procedure follows below. For N400 amplitudes and SFDs, we used the full set, because one half was too small for reproducible predictions. The correlations between all predictor variables are given in Table 10.1. We observe very high correlations between the n-gram and RNN predictions, both within and across corpora. The correlations involving topic-based predictions are smaller, supporting our hypothesis that they reflect a somewhat different neurocognitive process.
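The split-half computation described above can be sketched as follows; the arrays are placeholders standing in for a model’s per-item predictions and the empirical predictability scores, and the fixed seed and synthetic values are assumptions for illustration.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)  # fixed seed, assumed for a reproducible split

def split_half_r2(predictor, target):
    """Squared Pearson correlation, computed separately on two random halves."""
    n = len(target)
    idx = rng.permutation(n)
    halves = (idx[: n // 2], idx[n // 2 :])
    return [pearsonr(predictor[h], target[h])[0] ** 2 for h in halves]

# Placeholder data standing in for the N = 1,138 items; the real values come
# from the trained language models and the [ENG 05] sentence material.
predictor = rng.normal(size=1138)
target = 0.8 * predictor + 0.6 * rng.normal(size=1138)

r2_a, r2_b = split_half_r2(predictor, target)  # two subsets of 569 items each
print(f"half A: r^2 = {r2_a:.3f}, half B: r^2 = {r2_b:.3f}")
```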

|      |           | 1.   | 2.   | 3.   | 4.   | 5.   | 6.   | 7.   | 8.   | 9.   |
|------|-----------|------|------|------|------|------|------|------|------|------|
| NEWS | 1. n-gram | –    | 0.65 | 0.87 | 0.87 | 0.56 | 0.84 | 0.83 | 0.59 | 0.80 |
|      | 2. topic  | 0.65 | –    | 0.68 | 0.66 | 0.78 | 0.70 | 0.61 | 0.77 | 0.61 |
|      | 3. neural | 0.87 | 0.68 | –    | 0.84 | 0.59 | 0.88 | 0.77 | 0.62 | 0.79 |
| WIKI | 4. n-gram | 0.87 | 0.66 | 0.84 | –    | 0.61 | 0.90 | 0.79 | 0.59 | 0.78 |
|      | 5. topic  | 0.56 | 0.78 | 0.59 | 0.61 | –    | 0.65 | 0.55 | 0.75 | 0.55 |
|      | 6. neural | 0.84 | 0.70 | 0.88 | 0.90 | 0.65 | –    | 0.76 | 0.64 | 0.79 |
| SUB  | 7. n-gram | 0.83 | 0.61 | 0.77 | 0.79 | 0.55 | 0.76 | –    | 0.61 | 0.85 |
|      | 8. topic  | 0.59 | 0.77 | 0.62 | 0.59 | 0.75 | 0.64 | 0.61 | –    | 0.61 |
|      | 9. neural | 0.80 | 0.61 | 0.79 | 0.78 | 0.55 | 0.79 | 0.85 | 0.61 | –    |

Table 10.1. Correlations between the language model predictors
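A matrix like Table 10.1 can be obtained in a single call once the per-item predictions of all nine models are stacked; the random array below is a placeholder for those predictions.

```python
import numpy as np

# Rows: the nine predictors (n-gram, topic, neural for NEWS, WIKI and SUB);
# columns: items. Placeholder values instead of real model outputs.
predictions = np.random.default_rng(1).normal(size=(9, 1138))
corr = np.corrcoef(predictions)  # 9 x 9 matrix of pairwise Pearson correlations
```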

 