Data-Oriented Parsing as a constructional learner
Suppose a learner has processed several exemplars, or structured representations of utterances. When processing novel utterances, the learner can draw on this inventory by recombining its parts in order to come up with an analysis of the novel utterance.
Now, a novel utterance, say the one in example (1), can be analyzed using parts of the processed utterances in Figure 1. Let us define a legal part, or subtree of a tree representation to be a connected subgraph in which all sisterhood relations of the original tree hold (see Bod and Kaplan 1998 for a more precise definition). Maintaining the sisterhood means that given the first tree in Figure 1, we can have a fragment of S going to NP and VP, but not S going just to NP (without its VP sister). But most importantly, it is not just the small parts that can be re-used, larger fragments can be used in analyzing novel utterances as well. The main claim of Data-Oriented Parsing is that all fragments, irrespective of size, can be used in analyzing novel utterances. Given this starting point, we have many ways of analyzing sentence (1). Some of these are given in Figure 2. We analyze novel utterances by using a legal part of an utterance the learner has seen and “substituting” its leftmost non-terminal symbol (that is: grammatical symbols, such as ‘S’, ‘Det’ and ‘PP’) with another part the learner has already seen. This procedure is repeated until there are no non-terminal symbols left, that is: all words of the utterance are present in the analysis. The symbol of this substitution operation is v.
(1) She saw the dress with the telescope
Figure 1: A corpus of two processed utterances
What we see in Figure 2 is that there are multiple analyses possible for the novel utterance, and there are multiple ways to arrive at a single analysis. We call each of these ways of arriving at an analysis a derivation and a resulting analysis (which can emerge through different derivations) a parse. The first and third derivation thus give us the same parse tree, but get there in different ways. The first tree uses only the smallest fragments possible, while the third tree re-uses larger fragments of the earlier processed experiences.
Figure 2: Three derivations of she saw the dress with the telescope
Now, given this structural ambiguity (traditionally: the PP-attachment problem), the learner has to choose which of the possible analyses to consider as the right one. This is where the frequencies of the fragments come in. First, consider a derivation to be a complex event consisting of a number of smaller events, viz. the subtrees. Suppose that each of these subtrees has a certain probability. This would mean that the probability of the event of them occurring together would be the joint probability of all of the individual events of selecting that subtree. The joint probability of a derivation can thus be given by the product of the probabilities of the individual subtrees t1... tn that make up the derivation d:
The probability of a subtree, then, can be estimated by the number of times it occurs in the corpus of processed utterances, divided by the number of times a fragment is found with the same syntactic category at the root of that subtree. This is to say that the event of drawing a specific subtree from a bag of subtrees with the same syntactic label in the root node is the number of times that subtree occurs divided by the number of subtrees in the bag. With this estimation procedure, the subtree competes with all other subtrees that can occur at the same place in a derivation, viz. at an open position in another fragment that has the syntactic category of that subtree. Stated more formally:
The estimation of these probabilities thus involves finding all possible subtree types in all trees in the corpus and establishing their frequency. For a simple tree such as the one in Figure 3, we can extract all possible subtrees and arrive at the set shown in Figure 4.
Figure 3: A simple parse tree
Figure 4: All subtrees of the tree in Figure 3
Now, we have multiple derivations leading to the same parse of an utterance, as we have seen in Figure 2. We determine the probability of a parse to be the sum of the probabilities of all derivations yielding that parse tree. Again, this is grounded in relatively simple probability theory: the probability of a parse is the probability of either derivation one leading to that parse, or derivation two, or derivation three, and so forth, and so it is the sum of the probabilities of the individual events. For any parse tree t we can thus calculate the probability as follows:
We then select the parse with the highest probability mass to be the most likely analysis of the novel utterance (cf. Zuidema 2006 for a discussion of this and other estimation and evaluation methods).
It is important to note that, using the same mechanisms, DOP can also generate utterances from the grammar. This process boils down to probabilistically selecting a subtree fragment rooted in the starting label, and expanding its open substitution sited by other subtrees. When coupled with meaning, this approach can give us a Data-Oriented generator as well.
-  This procedure may strike some readers as very top-down. Although it is presented as such,the same principles can be applied in a bottom-up parsing algorithm equally well.