Perspective 3: CA and its results, interpretation, and motivation
The perceived lack of semantics
I find it hard to make sense of Bybee's first objection to CA, the alleged lack of consideration of semantics discussed in Section 3.3: (i) her claim appears to contradict the exemplar-model perspective that permeates both her whole book and much of my own work; (ii) it does not engage fully with the literature; (iii) it is based on a partial representation of CA, and so it is really arguing against a straw man.
As for (i), Bybee's statement that "[s]ince no semantic considerations go into the analysis, it seems plausible that no semantic analysis can emerge from it" is false. There is a whole body of work in, e.g., computational (psycho)linguistics where purely frequency-based distributional analyses reveal functionally interpretable clusters. Two classics are Redington, Chater & Finch (1998) and Mintz, Newport & Bever (2002). Both discuss how multidimensional distributional analyses of co-occurrence frequencies reveal clusters that resemble something that, in cognitive linguistics, is considered to have semantic import, namely parts of speech. And even if one did not postulate a relation between parts of speech and semantics, both reveal that something can emerge from a statistical analysis (parts of speech) that did not enter into the analysis. Even more paradoxically, it is a strength of exactly the type of usage-/exemplar-based models that Bybee and I both favor that they can explain such processes as the emergence of categories of any kind from processing and representing vast numbers of usage events in multidimensional memory space.
As for (ii), it is even less clear how anyone can imply having read CA studies and yet claim that collostructional results do not reveal semantic patterns. For example, there are the (discussions of the) lists of collexemes presented in Stefanowitsch & Gries (2003) (recall the ditransitive and the dative alternation from Section 2.3 above), and there are many other studies aside from Stefanowitsch & Gries (2003) and Gries & Stefanowitsch (2004) (cf. Sections 4.2.2 and 4.3.2 for many examples), nearly all of which have discussed at length functional patterns in the top-ranked collexemes. This lack of engagement with the literature extends even to the CA work speaking most directly to this question: Gries & Stefanowitsch (2010), first presented in 2004 and available online since 2006, clustered the first verbs in the into-causative (cf. (6)) based on the ing-verbs,[1] and the verbs in the way-construction (cf. (7)) based on the prepositions.
(6) a. V NP_Direct Object into V-ing
b. He tricked her into believing him.
c. They talked you into giving up.
(7) a. V [Direct Object POSS way] PP
b. She fought her way to the stage.
c. He argued his way out of the situation.
Specifically, for each construction they computed a table with all verbs in the construction in the rows, the ing-verbs (for the into-causative) or the prepositions (for the way-construction) in the columns, and the collostructional strengths in the cells. Then, the verbs in the rows (for each construction) were clustered on the basis of the collostructional preferences in the columns using a hierarchical cluster analysis, and the resulting tree plot was interpreted in terms of which verbs were grouped together based on similar preferences. These cluster analyses, into which semantics did not enter as data, produced clear semantic patterns. For the into-causative, the cluster analysis revealed groups of (more) physical force verbs, of provoking, of trickery, of verbs providing positive stimuli, and of verbs providing negative stimuli. For the way-construction, the clustering revealed a cluster of two highly frequent all-purpose verbs, again a group of (more) physical force verbs, and three different clusters reflecting different kinds of slow motion.
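The logic of such a clustering can be sketched in a few lines. The following is a minimal illustration only, not the original analysis: the verbs, the three unnamed columns, and all scores are invented, and the choice of Euclidean distance with average linkage is an assumption made for the example, since the distance measure and amalgamation rule are not specified here.

```python
from itertools import combinations
from math import dist

# Hypothetical toy data (invented for illustration): rows are verbs in a
# construction, columns are three unnamed column items (e.g., ing-verbs or
# prepositions), cells are made-up association scores.
strengths = {
    "force": [5.0, 0.2, 0.1],
    "push":  [4.5, 0.4, 0.0],
    "trick": [0.1, 6.0, 0.3],
    "fool":  [0.3, 5.6, 0.2],
    "tempt": [0.2, 0.1, 4.8],
}

def average_linkage(a, b):
    """Mean Euclidean distance over all cross-cluster pairs of verbs."""
    total = sum(dist(strengths[x], strengths[y]) for x in a for y in b)
    return total / (len(a) * len(b))

def agglomerate(n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [(verb,) for verb in strengths]
    while len(clusters) > n_clusters:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(*pair))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return sorted(tuple(sorted(c)) for c in clusters)

clusters = agglomerate(3)
print(clusters)
```

No semantic labels enter this computation; any grouping of, say, force verbs versus trickery verbs arises solely from similar row profiles, which is the point at issue.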
In sum, the statement that "[s]ince no semantic considerations go into the analysis, it seems plausible that no semantic analysis can emerge from it" can only be upheld by ignoring both the distributional linguistics literature that Bybee is otherwise sympathetic towards and the specific collostructional literature that she means to criticize and that shows the opposite.
As for (iii), Bybee's comparison of her and Eddington's approach and collostructional data is misleading. Recall the four-step characterization of CA in Section 2.1. On that level of abstraction, Bybee and Eddington's approach consists of the following steps:
- generating a concordance of two words in question (ponerse and quedarse);
- retrieving frequency data for twelve adjectival collocates of each verb;
- carefully categorizing the adjectives on the basis of their semantic characteristics and frequencies.
Table 4. Bybee's 'Collostructional Analysis'
Bybee then compares the results of her full-fledged, linguistically informed analysis not to the results of an equally full-fledged CA; rather, she compares them to nothing more than the result of applying only step (ii) of a full-fledged CA, as represented in Table 4, which, of course, delivers results that do not have academic merit. To have offered a genuine comparison, Bybee should have computed collostruction strengths of all verbs, not just a small selection, and in particular not a selection of collexemes occurring maximally once. Only then could she have computed the intended rank-ordering, which takes high frequencies into consideration and allows for the follow-up semantic analysis of highly-ranked collexemes that many studies have offered. In short, Bybee compares her full analysis to only the numerical output of what she calls a CA rather than to the semantic classes of the top-ranked words of a real CA.
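To make concrete what computing and rank-ordering collostruction strengths for all collexemes involves, here is a minimal sketch. It assumes that strength is measured as the negative base-10 logarithm of a one-tailed Fisher-Yates exact p-value, which is one common choice in CA work but is not specified in this section; the item names and all frequencies are invented placeholders, not data from either study.

```python
from math import comb, log10

def attraction_strength(a, item_total, cx_total, corpus_total):
    """-log10 of P(X >= a) under the hypergeometric distribution, where a is
    the observed frequency of the item in the construction."""
    upper = min(item_total, cx_total)
    numerator = sum(
        comb(cx_total, k) * comb(corpus_total - cx_total, item_total - k)
        for k in range(a, upper + 1)
    )
    return -log10(numerator / comb(corpus_total, item_total))

# Hypothetical frequencies: (frequency in construction, frequency in corpus).
CX_TOTAL, CORPUS_TOTAL = 100, 10_000
items = {
    "adj_a": (40, 50),   # frequent and strongly attracted
    "adj_b": (1, 1),     # occurs once in the whole corpus
    "adj_c": (10, 500),  # frequent overall, mildly attracted
}

scores = {
    name: attraction_strength(a, total, CX_TOTAL, CORPUS_TOTAL)
    for name, (a, total) in items.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy setting the frequent, strongly attracted item heads the ranking, whereas the item that occurs only once receives a modest score; a ranking of this kind over all collexemes is what the subsequent semantic analysis of top-ranked items would start from.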
- [1] Bybee (2010: 81) quotes Gries et al. (2005) for "verbs occurring in the into-causative," but these are not discussed in that paper (but in Gries & Stefanowitsch 2004b, 2010).
- [2] More precisely, it is unclear whether this step was undertaken or not, but no data/analysis is offered of collexemes other than the 24 mentioned and it is possible to compute Bybee's collostruction strengths on the basis of the lexical frequencies of ponerse, quedarse, and the 24 adjectives.