Data and design


For this case study, we collected 6,863 occurrences of the constructions with doen and laten from two large corpora of Dutch: the Twente News Corpus (Ordelman et al. 2007) and the Leuven News Corpus.[1] The newspaper data consisted of equal samples of articles about politics, the economy, music and football. Because the corpora were syntactically parsed, the lexical fillers of the three main slots (the Causer, the Causee and the Effected Predicate) were extracted automatically and then checked manually. Some of the nominal slots were empty, especially the Causee in transitive constructions, as in (9):

(9) Ik liet het huis bouwen.

I let the house build

‘I had the house built.’

The objects of classification were all explicit non-pronominal slot fillers, treated as types (lemmata). In total, we obtained 2,700 common and proper nouns functioning as the Causer, 1,810 nouns filling the Causee slot, and 1,155 verbs functioning as the Effected Predicate.
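
The type counts reported above can be illustrated with a minimal sketch, assuming each occurrence is stored as a record with one (lemma, is_pronoun) pair per slot and None for an empty slot. The record format, the slot keys and the function name count_types are hypothetical and chosen only for illustration; they do not describe the extraction pipeline actually used. Apart from example (9), the sample records are invented.

```python
from collections import defaultdict

# Hypothetical slot names; the actual annotation scheme may differ.
SLOTS = ("causer", "causee", "predicate")


def count_types(occurrences):
    """Count distinct explicit, non-pronominal lemmata (types) per slot."""
    types = defaultdict(set)
    for occ in occurrences:
        for slot in SLOTS:
            filler = occ.get(slot)
            if filler is None:
                # Empty slot, e.g. the omitted Causee in example (9).
                continue
            lemma, is_pronoun = filler
            if is_pronoun:
                # Pronominal fillers are excluded from the classification.
                continue
            types[slot].add(lemma)
    return {slot: len(types[slot]) for slot in SLOTS}


occurrences = [
    # Example (9): pronominal Causer, no Causee.
    {"causer": ("ik", True), "causee": None, "predicate": ("bouwen", False)},
    # Invented records, for illustration only.
    {"causer": ("regering", False), "causee": ("burger", False),
     "predicate": ("betalen", False)},
    {"causer": ("regering", False), "causee": None,
     "predicate": ("bouwen", False)},
]

counts = count_types(occurrences)
print(counts)
```

Because each slot's fillers are collected in a set, a lemma that recurs across occurrences is counted once, which is what treating fillers as types rather than tokens amounts to.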

  • [1] The Leuven News Corpus is a large corpus of contemporary newspaper Dutch in Flanders. The corpus was compiled by the Quantitative Lexicology and Variational Linguistics Research Unit at the University of Leuven.
