Analogy, acquisition and the unlearnable
If we think about DOP as a model of language acquisition, the model effectively says that children acquire grammar by constructing analogies with previous utterances, guided by statistical generalization. This starting point, of using analogy to construct a grammar, has not gone unchallenged in linguistic theorizing. Examples of linearly similar but structurally different sentences, as discussed by Pinker (1979) and Chomsky (1986), show how proportional analogy, the simplest format of analogical reasoning, might lead a learner to wrong conclusions (this particular example being Pinker’s):
- (2) John likes fish : John likes chicken :: John might fish : John might chicken
- (3) Swimming in the sea is dangerous : The sea is dangerous :: Swimming in the rivers is dangerous : The rivers is dangerous
As Pinker and Chomsky correctly point out, analogies like these do not hold because there is no notion of structural dependency, nor a concept of syntactic category that would be required to make them work. However, this is not a problem of analogical reasoning per se, but rather of the structure and content of the input that analogical reasoning applies to. Analogical reasoning can be described as trying to solve a problem (categorizing an object, parsing an utterance) by comparing our knowledge about the object to a knowledge base of similar objects. If we grant people the ability to infer grammatical categories and hierarchical representations for sentences, analogical reasoning over such a knowledge base would not come up with the erroneous prediction that John might chicken is grammatical. If we let the learner make the analysis without anything like grammatical categories or a notion of hierarchical structure, we do arrive at this prediction. Hence, it is not the mechanism that yields ungrammatical results, but the nature of the content.
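To make the point concrete, the following is a minimal sketch of proportional analogy over flat word strings, that is, without categories or hierarchical structure. The function name `proportional_analogy` and its word-substitution strategy are our own illustrative assumptions, not a procedure taken from Pinker or Chomsky.

```python
def proportional_analogy(a, b, c):
    """Solve a : b :: c : ? by purely linear word substitution.

    Illustrative only: the learner sees flat word strings, with no
    syntactic categories and no hierarchical structure.
    """
    a_words, b_words, c_words = a.split(), b.split(), c.split()
    if len(a_words) != len(b_words):
        raise ValueError("a and b must have the same number of words")
    # Positions at which a and b differ.
    diffs = [i for i, (x, y) in enumerate(zip(a_words, b_words)) if x != y]
    if len(diffs) != 1:
        raise ValueError("a and b must differ in exactly one word")
    i = diffs[0]
    if i >= len(c_words) or c_words[i] != a_words[i]:
        raise ValueError("c does not share the substituted word with a")
    # Carry the substitution over to c, ignoring what kind of word it is.
    return " ".join(c_words[:i] + [b_words[i]] + c_words[i + 1:])


result = proportional_analogy("John likes fish", "John likes chicken", "John might fish")
print(result)
```

Because the substitution only looks at linear position, it treats the noun fish and the verb fish alike and so produces the ungrammatical string of example (2).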
An extension of the original DOP model presented in the previous sections, Unsupervised Data-Oriented Parsing, or U-DOP, has been developed as an attempt to address this issue. Unsupervised techniques, developed in machine learning, allow a learner to build up some representation (of structure or categories, for instance) without having ‘correct’ representations for a set of training items (which would be supervised learning). Instantiated in U-DOP, these techniques grant the learner the domain-general starting point of understanding data as hierarchically structured, that is, as containing different levels of analysis, in which a concept on one level is triggered by (communicatively, cf. Verhagen 2009) or consists of (cognitively) smaller, less inclusive parts. They do not, however, give the learner the correct analyses of the structure to train on. These assumptions are not language-specific: we can apply the template of meronymy (the consists-of relation) to our understanding of body parts, artefacts, and grouping relations of identical individuals; only in language do we combine it with symbolic understanding (the triggered-by relation).
Using the idea that language is hierarchical and a stricter notion of analogical reasoning, U-DOP can be shown to predict the ungrammaticality of The rivers is dangerous (Bod 2009). It also predicts that a child can learn that, to form a polar interrogative of a sentence like (4), it is not the first is (as in example 5) but the second is (as in example 6) that is produced at the front of the utterance.
- (4) The man who is sick is singing.
- (5) *Is the man who sick is singing?
- (6) Is the man who is sick singing?
How does the model acquire these dependencies correctly? Unlike DOP, U-DOP assumes that the learner does not know how to interpret its initial input. Instead, the learner stores all possible analyses of the input and uses them as a basis for extracting subtrees and estimating their probabilities. Furthermore, we leave the problem of syntactic categories out of scope for now, focussing solely on the hierarchical structure. Effectively, all nodes in the tree representation, except for the lexical leaves, are of the same category, say ‘X’, and can thus be substituted for one another with the combination operation. So, suppose the U-DOP learner has heard the two utterances the dog walks and watch the dog. Each of these has two possible analyses:
Figure 5: The analysis of the two utterances the dog walks and watch the dog
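The set of possible analyses for a short utterance can be enumerated mechanically. The sketch below assumes, as the two-analyses-per-utterance count suggests, that analyses are binary-branching trees with unlabelled ('X') nodes; the function name `binary_trees` and the nested-tuple encoding are our own illustrative choices.

```python
def binary_trees(words):
    """Return every binary bracketing of a list of words.

    Leaves are strings; internal (category 'X') nodes are 2-tuples.
    """
    if len(words) == 1:
        return [words[0]]
    trees = []
    # Try every split point and combine all left and right analyses.
    for split in range(1, len(words)):
        for left in binary_trees(words[:split]):
            for right in binary_trees(words[split:]):
                trees.append((left, right))
    return trees


for utterance in ["the dog walks", "watch the dog"]:
    for tree in binary_trees(utterance.split()):
        print(tree)
```

For the dog walks this yields the two bracketings ('the', ('dog', 'walks')) and (('the', 'dog'), 'walks'), and likewise two for watch the dog.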
From this collection of all possible analyses, we can extract all possible subtrees, just like we did with DOP. What becomes immediately clear is that a subtree like [ [ the ]X [ dog ]X ]X forms a reliable constituent, being found in two out of four parse trees. A less reliable subtree is [ [ dog ]X [ walks ]X ]X , which is only found in one parse tree. Using these subtrees, then, we can infer the hierarchical structure of an unseen utterance.
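The counts mentioned above can be reproduced with a small sketch. As a simplification, it counts only fully lexicalized constituents rather than all subtrees that DOP would extract (which also include partial subtrees with open X nodes); the helper names are our own and hypothetical.

```python
from collections import Counter


def binary_trees(words):
    """Return every binary bracketing of a list of words (leaves are strings)."""
    if len(words) == 1:
        return [words[0]]
    return [(left, right)
            for split in range(1, len(words))
            for left in binary_trees(words[:split])
            for right in binary_trees(words[split:])]


def constituents(tree):
    """Yield every internal node (nested tuple) of a tree, including the root."""
    if isinstance(tree, tuple):
        yield tree
        for child in tree:
            yield from constituents(child)


def constituent_counts(utterances):
    """Count in how many parse trees each constituent occurs."""
    counts = Counter()
    for utterance in utterances:
        for tree in binary_trees(utterance.split()):
            counts.update(set(constituents(tree)))
    return counts


counts = constituent_counts(["the dog walks", "watch the dog"])
print(counts[("the", "dog")], counts[("dog", "walks")])
```

Here ('the', 'dog') occurs in two of the four parse trees and ('dog', 'walks') in one, matching the contrast in reliability described in the text.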
In order to do this, we use a stricter notion of analogy than in DOP. U-DOP starts from the insight that the more similar a novel analysis is to earlier analyses, the better an analysis it is. The model will therefore first choose the parse tree that has the shortest derivation, where the length of a derivation is the number of subtrees used in it. Often, there are multiple parse trees that have equally short derivations. In that case, the learner selects the most probable parse among the ones that have the shortest derivations. The probability of the parse is calculated as in DOP. This idea of selecting the most probable parse from among the shortest derivations (MPSD) is in essence a probability-driven model of analogy.
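The selection step can be sketched as follows. We assume that the candidate derivations for each parse are already given, that each derivation is represented simply as a list of subtree probabilities, and that the probability of a parse is the sum over its derivations of the product of their subtree probabilities (the usual DOP definition). The function name `mpsd`, the parse labels and all numbers are hypothetical and serve only to illustrate the procedure.

```python
from math import prod


def mpsd(derivations_by_parse):
    """Pick the most probable parse among those with a shortest derivation.

    derivations_by_parse maps a parse label to a list of derivations;
    each derivation is a list of subtree probabilities.
    """
    # Length of the shortest derivation available for each parse.
    shortest = {parse: min(len(d) for d in derivations)
                for parse, derivations in derivations_by_parse.items()}
    best_length = min(shortest.values())
    candidates = [p for p, n in shortest.items() if n == best_length]

    def parse_probability(parse):
        # Sum over derivations of the product of subtree probabilities.
        return sum(prod(d) for d in derivations_by_parse[parse])

    return max(candidates, key=parse_probability)


# Hypothetical example: A and B both have a two-subtree derivation,
# C needs three subtrees and is excluded despite its high probability.
example = {
    "A": [[0.2, 0.5], [0.1, 0.3, 0.4]],
    "B": [[0.3, 0.5]],
    "C": [[0.9, 0.9, 0.9]],
}
print(mpsd(example))
```

In this toy case parse C is ruled out by derivation length alone, and B is preferred over A because its probability is higher; probability thus acts only as a tie-breaker among the shortest derivations.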
When we train this model on child-directed speech, we can simulate the acquisition of hierarchical structure and grammatical dependency. The Adam corpus (Brown 1973) consists of two hours of child-directed speech per fortnight over the course of approximately two years, which constitutes only a fraction of a child’s input. If we train U-DOP on this corpus, the model correctly assigns more probability to a sentence like (6) than to one with the wrong auxiliary in sentence-initial position, as in (5) (Bod and Smets 2012).
Why is this of interest? The issue of auxiliary fronting with subjects that have relative clauses has been a parade case for nativist approaches to grammar. Crain (1991) and others have argued that the fact that children make no errors like the one in (5) when learning these patterns shows that they directly home in on the correct hypothesis, i.e. that there is a main clause and a subordinate clause, and that the auxiliary in the main clause ought to be fronted. U-DOP uses no concept of ‘clause’ to explain the phenomenon, but grounds it in the experience of a learner and its attempt to stick as closely as possible to that experience. A typical nativist argument against the use of experience is that this specific construction is rarely, if ever, observed in the primary data, yet children seem to be sensitive to the difference in grammaticality between examples (6) and (5). U-DOP tackles this problem by saying that we can combine subtrees from different processed utterances. Although a learner may never have seen a case of auxiliary fronting with a subject containing a relative clause, it will probably have processed some auxiliary fronting without subjects with a relative clause, as well as some relative clauses in other grammatical constellations.
Subtrees from these parses enable the learner to produce a more probable short derivation for sentences like example (6), but not for cases like (5). A pattern like the X who is X might be used, along with is X singing. However, the model will never, or only very rarely, have seen patterns like who sick or is X is singing. Because of this, the model will find more probable short derivations for the good pattern, and longer or less likely derivations for the erroneous one. The observed behavior of the model is in line with the pattern of errors observed in Ambridge, Rowland, and Pine (2008).
Moreover, this manner of analyzing the learnability of complex grammatical phenomena can be extended to other cases. Whereas most empiricist computational linguists use a specific model to address a specific phenomenon in order to refute nativist explanations (e.g., auxiliary fronting (Clark and Eyraud 2006) or anaphoric one (Foraker, Regier, Khetarpal, Perfors, and Tenenbaum 2009)), Bod and Smets (2012) show that a single, unified model, viz. U-DOP, can learn (virtually) all existing cases of hierarchical dependencies that are thought of as unlearnable. Work like this complements the analysis done by Pullum and Scholz (2002) and shows how general learning and structuring principles may lead to the behavior or judgements we can observe. As such, these principles provide us with a cognitively leaner, simpler and hence a priori more likely model of the acquisition of grammatical structure.
-  See Clark and Lappin (2011) for an overview of different refutations using different models.