
Methodology

Human performance measures

This study proposes that language models can be benchmarked by item-level performance on three datasets that are openly available in online databases. Predictability was taken from the Potsdam Sentence Corpus [1], first published by Kliegl et al. [KLI 04]. The 144 sentences consist of 1,138 tokens, available in Appendix A of [DAM 09], and the logit-transformed cloze completion probability (CCP) measures of word predictability were retrieved from Ralf Engbert’s homepage [1] [ENG 05]. For instance, in the sentence “Manchmal sagen Opfer vor Gericht nicht die volle Wahrheit” [Sometimes victims do not tell the whole truth in court], the last word has a CCP of 1. N400 amplitudes were taken from the 343 open-class words published in Dambacher and Kliegl [DAM 07]. These are available from the Potsdam Mind Research Repository [2]. The EEG data published there are based on a previous study (see [DAM 06] for method details). To quantify the N400, the voltage of 10 centroparietal electrodes was averaged from 300 to 500 ms after word presentation and across up to 48 artifact-free participants. Single-fixation durations (SFDs) are based on the same 343 words from Dambacher and Kliegl [DAM 07] and are available from the same source URL. Data were included only when the word was fixated exactly once, and these SFDs ranged from 50 to 750 ms. The SFD was averaged across up to 125 German native speakers [DAM 07].
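The two numeric steps mentioned above, the logit transform of cloze probabilities and the averaging of voltage over the 300 to 500 ms window, can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the `n_raters` parameter, the 0.5 scaling factor and the boundary correction of 1 / (2 * n_raters) are all assumptions made for illustration; the section itself does not specify how the logit was computed.

```python
import math


def logit_ccp(p, n_raters, scale=0.5):
    """Logit-transform a cloze completion probability p in [0, 1].

    Assumptions (not taken from the text): probabilities of exactly 0 or 1
    are pulled inward by 1 / (2 * n_raters) so the logarithm stays finite,
    and the result is multiplied by scale (0.5 by default).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    eps = 1.0 / (2 * n_raters)
    p = min(max(p, eps), 1.0 - eps)
    return scale * math.log(p / (1.0 - p))


def mean_window_amplitude(samples, times_ms, start_ms=300, end_ms=500):
    """Average the voltage samples whose time stamps lie in [start_ms, end_ms]."""
    window = [v for v, t in zip(samples, times_ms) if start_ms <= t <= end_ms]
    if not window:
        raise ValueError("no samples inside the window")
    return sum(window) / len(window)
```

With a hypothetical `n_raters=20`, a perfectly predictable word (CCP of 1) maps to a large but finite positive logit, and a CCP of 0.5 maps to 0. Averaging the window mean across electrodes and participants would then be a further step on top of `mean_window_amplitude`.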

  • [1] http://mbd.unipotsdam.de/EngbertLab/Software.html
  • [2] http://read.psych.unipotsdam.de
 