The analysis presented in this paper is based on usage data drawn from historical and modern corpora (summarised in Table 1), supplemented by attestations from broadcast media and the web.

Table 1: The corpora used in this investigation (arranged chronologically by start date)

| Corpus | Period | Written or spoken | Text types | Total tokens |
|---|---|---|---|---|
| Early Modern Dutch Corpus (EMDC) | 16th–19th century | written | diaries, drama, prose (fiction, academic, non-academic) | c. 300,000 |
| | | written (83.3%), spoken (16.7%) | journalism, popular science, fiction, speech | c. 720,000 |
| Corpus Gesproken Nederlands (CGN) | | spoken | conversations, broadcasts, lectures, speeches; read-aloud texts | c. 9 million |
| INL 27 Miljoen Woorden Krantencorpus (INL 27 Mil.) | | written | journalism (NRC Handelsblad) | c. 27 million |

The Early Modern Dutch Corpus (EMDC) was compiled specifically for the project of which the research reported here forms a part. The aim was to produce a balanced corpus of written language use covering a variety of text types, from relatively informal egodocuments to formal academic prose. To this end, texts were drawn from the online Digitale Bibliotheek voor de Nederlandse Letteren (DBNL). The corpus contains three sub-corpora of 100,000 tokens each: the Gouden Eeuw 'golden age' (the 16th and 17th centuries combined, following the distinction made in the DBNL), the 18th century and the 19th century. Each of these period sub-corpora in turn consists of five genre-defined sub-corpora of 20,000 tokens each: diaries, drama, fictional prose, academic prose and non-academic prose, giving a total of 3 × 5 × 20,000 = 300,000 tokens. The genres and registers were chosen so as to exploit, as fully as possible, the range of text types available in the DBNL.
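The balanced design described above can be checked with a short Python sketch. It is illustrative only and not the procedure used to compile the EMDC: it enumerates the fifteen period-by-genre cells and confirms the token totals. The helper `fill_cell`, and the whitespace tokenisation it relies on, are hypothetical assumptions added for illustration. The source does not say how texts were cut to size or how tokens were counted.

```python
from itertools import product

# EMDC design as described in the text: three period sub-corpora,
# each split into five genre sub-corpora of 20,000 tokens.
PERIODS = ["Gouden Eeuw (16th-17th c.)", "18th century", "19th century"]
GENRES = ["diaries", "drama", "fictional prose", "academic prose", "non-academic prose"]
TOKENS_PER_CELL = 20_000


def design_cells():
    """Return a dict mapping each (period, genre) cell to its target token count."""
    return {(p, g): TOKENS_PER_CELL for p, g in product(PERIODS, GENRES)}


def fill_cell(texts, quota=TOKENS_PER_CELL):
    """Hypothetical helper: take whitespace-separated tokens from each text
    in turn until the quota is reached. Whitespace tokenisation is an
    assumption for illustration, not the EMDC's actual method."""
    sample = []
    for text in texts:
        for token in text.split():
            if len(sample) == quota:
                return sample
            sample.append(token)
    return sample


cells = design_cells()
period_totals = {p: sum(n for (pp, _), n in cells.items() if pp == p) for p in PERIODS}
print(len(cells))           # 15
print(sum(cells.values()))  # 300000
```

Each period total in `period_totals` comes to 100,000 tokens, matching the three sub-corpora described above.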

The orthography of the corpus examples, both historical and modern, is reproduced unaltered throughout this chapter.

