A core question in the usage-based approach to language acquisition is that of grammatical productivity. How does a learner, whether artificial or of flesh and blood, know, after having seen a number of exemplars, which patterns to use to produce and interpret novel utterances? Although many informal discussions of this process have been given (Tomasello 2003; Goldberg 2006), often with reference to Gentner’s more formalized work on analogy (Gentner 1983), few models for discovering the productive grammatical units of a language have been developed so far. Similarly, no existing description of construction grammar’s parsing principles offers an account of how these productive fragments are recombined into analyses of novel utterances. Such an account is desirable, as it can help validate learnability claims and widens the possibilities for evaluating the theory against data. It should be noted that construction grammar and usage-based theories are not alone in lacking precise definitions; it seems that every current linguistic theory has given up on constructing a precise, testable model of language use and language acquisition.
Formalizations of learning mechanisms for acquiring a grammar, such as Embodied Construction Grammar (Chang 2008) and Fluid Construction Grammar (van Trijp et al. 2009), have been developed over the last decade. Other systems that have claimed relevance to usage-based theorizing are Memory-Based Learning (Daelemans and Van den Bosch 2005) and the memory-access and parsing framework developed by Jurafsky (1996). All of these add to our understanding, and insights from these different approaches complement the contribution of Data-Oriented Parsing (DOP), namely a precise account of how Gestalt-like linguistic units can be discovered in the data. The proposed DOP mechanisms obviously cannot capture in full detail the wealth of linguistic phenomena that have been described. They aim instead to give insight into how complex representations can be acquired from the input data, and as such can help us understand the domain-general learning processes that still await further specification.