
Single-fixation duration (SFD) results

Finally, we examine how well the corpus-based predictors model the mean single-fixation durations (SFDs) for 343 words. For this target, the pos+freq baseline explains r² = 0.021, whereas predictability, alone or combined with the baseline, explains r² = 0.184.

Predictors                       NEWS    WIKI    SUB
n-gram                           0.225   0.140   0.126
topic                            0.135   0.140   0.100
neural                           0.242   0.190   0.272
base+n-gram                      0.239   0.226   0.226
base+topic                       0.152   0.154   0.127
base+neural                      0.265   0.204   0.284
base+n-gram+topic                0.260   0.262   0.246
base+n-gram+neural               0.287   0.238   0.297
base+neural+topic                0.279   0.235   0.298
base+n-gram+topic+neural         0.295   0.265   0.307
base+n-gram+pred                 0.273   0.274   0.258
base+topic+pred                  0.235   0.250   0.229
base+neural+pred                 0.314   0.267   0.320
base+n-gram+topic+pred           0.297   0.301   0.275
base+n-gram+neural+pred          0.319   0.283   0.322
base+neural+topic+pred           0.319   0.289   0.329
base+n-gram+topic+neural+pred    0.323   0.304   0.330

Table 10.4. Explained variance (r²) of the single-fixation durations for various combinations of baseline, predictability and corpus-based predictors; columns give the training corpus (NEWS, WIKI, SUB)
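Each entry of Table 10.4 is the explained variance of one predictor set. Assuming, as a simplification, that an entry is the in-sample r² of an ordinary least-squares regression of the per-word mean SFD on the listed predictors (this section does not spell out the fitting procedure), the computation can be sketched in plain Python as follows. The function `ols_r2` and the toy numbers are hypothetical and only illustrate the mechanics; they are not data from the study.

```python
from typing import List, Sequence


def ols_r2(predictors: Sequence[Sequence[float]], y: Sequence[float]) -> float:
    """Fit y ~ intercept + predictors by ordinary least squares; return in-sample r^2.

    predictors: one sequence per predictor (e.g. hypothetical pos, freq or
    n-gram columns), each holding one value per word.
    y: the observed value per word (e.g. a mean single-fixation duration).
    """
    n = len(y)
    # Design matrix: a leading 1.0 for the intercept, then one column per predictor.
    X: List[List[float]] = [[1.0] + [float(col[i]) for col in predictors] for i in range(n)]
    k = len(X[0])
    # Augmented normal equations [X^T X | X^T y], a k x (k + 1) matrix.
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))]
         for i in range(k)]
    # Solve for the coefficients by Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    beta = [A[i][k] for i in range(k)]
    # Explained variance: 1 - residual sum of squares / total sum of squares.
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Toy illustration (made-up numbers, not data from the study).
x = [1.0, 2.0, 3.0, 4.0]   # a first hypothetical predictor
z = [0.0, 1.0, 0.0, 1.0]   # a second hypothetical predictor
y = [1.0, 3.0, 2.0, 4.0]   # hypothetical per-word target values
print(round(ols_r2([x], y), 2))     # 0.64
print(round(ols_r2([x, z], y), 2))  # 1.0
```

Under this in-sample OLS assumption, adding a predictor can never lower r², so a larger value for a larger predictor set is expected by construction; differences between nested rows are therefore best judged with a significance test, as done in the text.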

The experiments confirm the utility of n-gram models in accounting for eye-movement data. The n-gram model alone (r² = 0.225 when trained on NEWS) explains even more variance than predictability (r² = 0.184); however, the difference is not significant (P > 0.46).

In contrast to the previous analyses of predictability and N400 amplitudes, however, the recurrent neural network outperformed the n-gram model at a descriptive level, accounting for up to 3% more variance than the n-gram model. This best performance was obtained not with the largest corpus, NEWS, but with the smaller SUB corpus. This suggests that, for SFD data, the dimension reduction may compensate for the greater amount of noise in the smaller training dataset (see [BUL 07, GAM 16, HOF 14]). The neural model may therefore provide a better fit for such early neurocognitive processes when it is trained on colloquial language [BRY 11].

The topic model seems to have a stronger impact on SFDs than on the other neurocognitive benchmark variables, suggesting a greater influence of long-range semantics on SFDs than on predictability or the N400. Taken together, these findings suggest that SFDs reflect different cognitive processes from those reflected in the N400 (see [DAM 07]).

Last but not least, although adding predictability again increased the total explained variance by 2%, the language models on their own did an excellent job of accounting for the SFD data. Taken together, all language-model-based predictors account for significantly more variance than the standard model using predictability (see Figure 10.3).
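Both comparisons in the paragraph above can be read off Table 10.4. The sketch below is a simple illustration, not the authors' analysis code: it hard-codes the r² values of the full language-model set (base+n-gram+topic+neural), of the same set plus predictability, and of the base+pred model (r² = 0.184), and prints the raw differences per training corpus. The names `ALL_LM`, `ALL_LM_PRED` and `gains` are illustrative.

```python
from typing import Tuple

# r^2 values copied from Table 10.4 (single-fixation durations).
BASE_PRED = 0.184  # baseline + predictability, identical for all corpora

# base+n-gram+topic+neural, per training corpus
ALL_LM = {"NEWS": 0.295, "WIKI": 0.265, "SUB": 0.307}
# base+n-gram+topic+neural+pred, per training corpus
ALL_LM_PRED = {"NEWS": 0.323, "WIKI": 0.304, "SUB": 0.330}


def gains(corpus: str) -> Tuple[float, float]:
    """Return (r^2 gain of all LMs over base+pred, r^2 gain from adding pred to all LMs)."""
    lm = ALL_LM[corpus]
    over_standard = round(lm - BASE_PRED, 3)
    from_pred = round(ALL_LM_PRED[corpus] - lm, 3)
    return over_standard, from_pred


for corpus in ALL_LM:
    over_standard, from_pred = gains(corpus)
    print(f"{corpus}: all LMs vs. base+pred {over_standard:+.3f}, "
          f"adding pred to all LMs {from_pred:+.3f}")
```

These are raw r² differences; the size of the increment from predictability varies somewhat across corpora, and whether a difference is significant requires a test such as the Fisher r-to-z comparison reported for Figure 10.3.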


Figure 10.3. Prediction models exemplified for the SUB corpus, with the predicted values on the x-axes and the N = 334 mean SFD scores on the y-axes. A) Prediction by the baseline + all three language models (r² = 0.295); B) a standard approach to SFD data, using the baseline and predictability as predictors of SFDs (r² = 0.184). Fisher's r-to-z test revealed a significant difference in explained variance (z = 1.95; P = 0.05). For a color version of this figure, see www.iste.co.uk/sharp/cognitive.zip
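The caption compares two explained variances with Fisher's r-to-z test. A minimal sketch of that test follows, assuming the standard form for two independent correlations with N items each (the caption does not state which variant was used) and taking each correlation as the positive square root of r². The function name `fisher_r_to_z` is hypothetical.

```python
import math
from typing import Tuple


def fisher_r_to_z(r2_a: float, r2_b: float, n_a: int, n_b: int) -> Tuple[float, float]:
    """Fisher's r-to-z test for two independent correlations given as r^2 values.

    Each correlation is taken as r = sqrt(r^2), i.e. assumed positive.
    Returns (z, two-tailed p).
    """
    z_a = math.atanh(math.sqrt(r2_a))  # Fisher transformation of r_a
    z_b = math.atanh(math.sqrt(r2_b))  # Fisher transformation of r_b
    se = math.sqrt(1.0 / (n_a - 3) + 1.0 / (n_b - 3))  # standard error of z_a - z_b
    z = (z_a - z_b) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Values from the caption of Figure 10.3: baseline + all three language models
# (r^2 = 0.295) vs. baseline + predictability (r^2 = 0.184), N = 334 words each.
z, p = fisher_r_to_z(0.295, 0.184, 334, 334)
print(f"z = {z:.2f}, two-tailed p = {p:.3f}")
```

Because the caption's r² values are rounded to three decimals, the printed statistic depends on that rounding and on the N used, and need not reproduce the reported z to the second decimal.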
