Perspective 3: CA and its results, interpretation, and motivation
As outlined above, CA returns ranked lists of (distinctive) collexemes, which are analyzed in terms of functional characteristics. For the ditransitive data discussed above with Table 2, the rank-ordering in (1) emerges:
(1) give, tell, send, offer, show, cost, teach, award, allow, lend, deny, owe, promise, earn, grant, allocate, wish, accord, pay, hand, ...
Obviously, the verbs are not distributed randomly across constructions but reveal semantic characteristics of the constructions they occupy. Here, the verbs in (1) clearly reflect the ditransitive's central meaning of transfer (the most strongly attracted verbs involve transfer), but also other, related senses of this construction (cf. Goldberg 1995: Ch. 5): (non-)enablement of transfer, communication as transfer, perceiving as receiving, etc.
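How such a ranked list comes about can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation that produced (1): collostruction strength is assumed to be the negative base-10 logarithm of a one-tailed Fisher-Yates exact p-value for attraction, and all counts, the construction frequency, the corpus size, and the verb labels (verb_A to verb_D) are invented for illustration.

```python
import math


def log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k), computed via lgamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def log_p_attraction(a: int, word_total: int, cx_total: int, corpus_size: int) -> float:
    """Natural log of the one-tailed Fisher-Yates exact p-value for attraction:
    P(X >= a) for a hypergeometric X, i.e. the probability of finding the word
    in the construction at least `a` times given the table's marginal totals."""
    upper = min(word_total, cx_total)
    log_terms = [
        log_comb(cx_total, k)
        + log_comb(corpus_size - cx_total, word_total - k)
        - log_comb(corpus_size, word_total)
        for k in range(a, upper + 1)
    ]
    m = max(log_terms)  # log-sum-exp: tiny p-values do not underflow to 0
    return m + math.log(sum(math.exp(t - m) for t in log_terms))


def collostruction_strength(a: int, word_total: int, cx_total: int, corpus_size: int) -> float:
    """-log10 of the attraction p-value; larger values = stronger attraction."""
    return -log_p_attraction(a, word_total, cx_total, corpus_size) / math.log(10)


# Hypothetical setup (NOT the data behind Table 2 or list (1)).
CX_TOTAL = 1_000        # hypothetical tokens of the construction
CORPUS_SIZE = 100_000   # hypothetical verb tokens in the corpus
# verb -> (tokens in the construction, tokens in the whole corpus); invented
toy_counts = {
    "verb_A": (120, 600),
    "verb_B": (40, 150),
    "verb_C": (60, 6_000),
    "verb_D": (5, 20),
}

strengths = {
    verb: collostruction_strength(a, total, CX_TOTAL, CORPUS_SIZE)
    for verb, (a, total) in toy_counts.items()
}
ranking = sorted(strengths, key=strengths.get, reverse=True)
for verb in ranking:
    print(f"{verb}: {strengths[verb]:.2f}")
```

In this toy setup, verb_C occurs in the construction more often than verb_B or verb_D, but only about as often as chance predicts given its overall frequency, so it ends up at the bottom of the ranking.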
Similarly clear results are obtained from comparing the ditransitive and the prepositional dative discussed above with Table 3. The following rank-orderings emerge for the ditransitive (cf. (2)) and the prepositional dative (cf. (3)):
(2) give, tell, show, offer, cost, teach, wish, ask, promise, deny, ...
(3) bring, play, take, pass, make, sell, do, supply, read, hand, ...
Again, the verbs preferring the ditransitive strongly evoke the notion of transfer, but they also contrast nicely with the verbs preferring the prepositional dative, which match the proposed constructional meaning of 'continuously caused (accompanied) motion.' Several verbs even provide empirical support for the iconicity account of the dative alternation proposed by Thompson & Koide (1987): verbs such as bring, play, take, and pass involve a greater distance between the agent and the recipient (pass here mostly refers to passing a ball in soccer), certainly greater than the distance prototypically implied by give and tell.
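The comparison of two constructions that yields rank-orderings such as (2) and (3), i.e., distinctive collexeme analysis, can likewise be sketched in Python. This is a hedged illustration rather than the procedure that produced (2) and (3). It assumes that each verb gets a 2x2 table (this verb vs. all other verbs, construction 1 vs. construction 2), that the strength of a verb's preference is -log10 of a one-tailed Fisher-Yates exact p-value in the direction of the observed deviation, and that the listed verbs exhaust both constructions. The counts and verb labels (verb_P to verb_S) are invented.

```python
import math


def log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k), computed via lgamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def log_tail_p(x: int, n: int, K: int, N: int, upper: bool) -> float:
    """Natural log of a one-tailed Fisher-Yates exact p-value for a verb seen
    x times in construction 1: P(X >= x) if `upper`, else P(X <= x), where
    X ~ Hypergeometric(population N, K in construction 1, n draws)."""
    lo, hi = max(0, n - (N - K)), min(n, K)
    ks = range(x, hi + 1) if upper else range(lo, x + 1)
    terms = [log_comb(K, k) + log_comb(N - K, n - k) - log_comb(N, n) for k in ks]
    m = max(terms)  # log-sum-exp to avoid underflow
    return m + math.log(sum(math.exp(t - m) for t in terms))


def distinctive_collexemes(counts):
    """Split verbs by the construction they prefer and rank each group by
    strength = -log10(one-tailed p in the direction of the preference)."""
    K = sum(c1 for c1, _ in counts.values())        # construction 1 total
    N = K + sum(c2 for _, c2 in counts.values())    # both constructions
    prefers_1, prefers_2 = [], []
    for verb, (c1, c2) in counts.items():
        n = c1 + c2
        upper = c1 * N > n * K  # observed > expected (n * K / N) -> prefers 1
        strength = -log_tail_p(c1, n, K, N, upper) / math.log(10)
        (prefers_1 if upper else prefers_2).append((verb, strength))

    def by_strength(pair):
        return pair[1]

    return (sorted(prefers_1, key=by_strength, reverse=True),
            sorted(prefers_2, key=by_strength, reverse=True))


# Invented toy counts: verb -> (freq in construction 1, freq in construction 2).
toy_counts = {
    "verb_P": (50, 5),
    "verb_Q": (10, 40),
    "verb_R": (20, 20),
    "verb_S": (5, 30),
}

cx1_ranking, cx2_ranking = distinctive_collexemes(toy_counts)
print("construction 1:", cx1_ranking)
print("construction 2:", cx2_ranking)
```

Because the direction of each preference is determined by comparing observed with expected frequencies, verb_R, which is only slightly more frequent in construction 1 than expected, surfaces as a weak distinctive collexeme of construction 1 rather than being excluded.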
By now, this method has been used successfully on data from different languages (e.g., English, German, Dutch, Swedish, ...) and in different contexts (e.g., constructional description in synchronic data, syntactic 'alternations' (Gilquin 2006), priming phenomena (Szmrecsanyi 2006), second language acquisition (Gries & Wulff 2005, 2009, Deshors 2010), and diachronic language change (Hilpert 2006, 2008)). However, while the above examples and many other applications show that CA rankings reveal functional patterns, one may still wonder why this works. This question may arise especially because the most widely used (though not prescribed) collostructional measure is in fact the p-value of a significance test. Apart from the two mathematical motivations for this p-value approach mentioned in the previous section, there is also a more conceptual reason.
Like all p-values, such (logged) p-values are determined by both effect size and sample size or, in other words, the p-value "weighs the effect on the basis of the observed frequencies such that a particular attraction (or repulsion, for that matter) is considered more noteworthy if it is observed for a greater number of occurrences of the [word] in the [constructional] slot" (Stefanowitsch & Gries 2003: 239, n. 6). For instance, all other things being equal, a given percentage of occurrence o of a word w in c (e.g., 40%) is 'upgraded' in importance if it is based on more tokens (e.g., 14/35) rather than fewer (e.g., 8/20). This cannot be emphasized enough, given that proponents of CA have been (wrongly) accused of downplaying the role of observed frequencies. CA has in fact been used most often with the Fisher-Yates exact test (FYE), which affords an important role to observed frequencies because it integrates two important pieces of information: (i) how often something happens, i.e., w's frequency of occurrence in c, which is what proponents of observed frequencies rely on, but also (ii) how exclusive w's occurrence is to c and c's occurrence is to w. Now why would it be useful to combine these two pieces of information? For instance,
- (i) because "frequency plays an important role for the degree to which constructions are entrenched and the likelihood of the production of lexemes in individual constructions (cf. Goldberg 1999)" (Stefanowitsch & Gries 2003: 239, n. 6, my emphasis);
- (ii) because we know how important frequency is for learning in general (cf., e.g., Ellis 2007);
- (iii) because "collostructional analysis goes beyond raw frequencies of occurrence, [...] determining what in psychological research has become known as one of the strongest determinants of prototype formation, namely cue validity, in this case, of a particular collexeme for a particular construction" (cf. Stefanowitsch & Gries 2003: 237, my emphasis).
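How the two pieces of information interact can be illustrated with the 14/35 vs. 8/20 example. In the following Python sketch, the construction frequency (1,000) and the corpus size (100,000) are hypothetical, and collostruction strength is again assumed to be -log10 of a one-tailed Fisher-Yates exact p-value for attraction. Both words have the same cue validity P(c|w) = 0.4, but the word observed 14 times in c receives the higher collostruction strength.

```python
import math


def log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k), computed via lgamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def collostruction_strength(a: int, word_total: int, cx_total: int, corpus_size: int) -> float:
    """-log10 of the one-tailed Fisher-Yates exact p-value for attraction,
    i.e. of P(X >= a) for a hypergeometric X given the marginal totals."""
    log_terms = [
        log_comb(cx_total, k)
        + log_comb(corpus_size - cx_total, word_total - k)
        - log_comb(corpus_size, word_total)
        for k in range(a, min(word_total, cx_total) + 1)
    ]
    m = max(log_terms)  # log-sum-exp to avoid underflow
    log_p = m + math.log(sum(math.exp(t - m) for t in log_terms))
    return -log_p / math.log(10)


CX_TOTAL = 1_000        # hypothetical frequency of construction c
CORPUS_SIZE = 100_000   # hypothetical corpus size

results = {}
for word, a, word_total in [("w1", 14, 35), ("w2", 8, 20)]:
    results[word] = {
        "freq_in_c": a,                 # (i) raw frequency of w in c
        "p_c_given_w": a / word_total,  # (ii) cue validity of w for c
        "p_w_given_c": a / CX_TOTAL,    # (ii) how exclusive c is to w
        "strength": collostruction_strength(a, word_total, CX_TOTAL, CORPUS_SIZE),
    }

for word, r in results.items():
    print(word, r)
```

The percentages alone cannot distinguish the two words; the p-value-based measure does, because it weighs the same proportion by the number of tokens it rests on.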
Despite these promising characteristics, Bybee (2010) criticizes CA from each of the three perspectives outlined above: its goals, its mathematical side, and its results/interpretation. In her claims, Bybee also touches upon the more general issue of frequencies vs. association measures (AMs) as used in many corpus- and psycholinguistic studies. In this chapter, I will refute Bybee's points of critique and discuss a variety of related points of more general importance to cognitive/usage-based linguists.