Let us now turn to some of the more technical arguments regarding CA's input data and choice of measure.

The issue of the corpus size

Let us begin with the issue of Bybee's "fourth factor", the corpus size in constructions. Yes, an exact number of constructions for a corpus cannot easily be generated because

i. "a given clause may instantiate multiple constructions" (Bybee 2010: 98);

ii. researchers will disagree on the number of constructions a given clause instantiates;

iii. in a framework that does away with a separation of syntax and lexis, researchers will even disagree on the number of constructions a given word instantiates.

However, this is much less of a problem than it seems. First, this is a problem nearly all AMs have faced and addressed successfully. The obvious remedy is to choose a level of granularity close to that of the phenomenon under study. For the last 30 years, collocational statistics have used the number of lexical items in the corpus as n, and collostructional studies on argument structure constructions have used the number of verbs. Many CA studies, none of which are cited by Bybee or other critics, have shown that this yields meaningful results with much predictive power (cf. also Section 4.2.2 below).

Second, CA rankings are remarkably robust. Bybee herself pointed out that different corpus sizes yield similar results, and a more systematic test supports that. I took Stefanowitsch & Gries's (2003) original results for the ditransitive construction and increased the corpus size from the number used in the paper by a factor of ten (138,664 to 1,386,640), and I decreased the observed frequencies used in the paper by a factor of 0.5 (with n's = 1 being set to 0 / omitted). Then I computed four CAs:

- one with the original data;

- one with the original verb frequencies but the larger corpus size;

- one with the halved verb frequencies and the original corpus size;

- one in which both frequencies were changed.

In Figure 1, the pairwise correlations of the collostruction strengths of the verbs are computed (Spearman's rho) and plotted. The question of which verb frequencies and corpus size to use turns out to be fairly immaterial: Even when the corpus size is de-/increased by one order of magnitude and/or the observed frequencies of the words in the constructional slots are halved/doubled, the overall rankings of the words are robustly intercorrelated (all rho > 0.87). Thus, this 'issue' is unproblematic when the corpus size is approximated at some appropriate level of granularity and, trivially, consistently, in one analysis.

Figure 1. Pairwise comparisons between (logged) collostruction values, juxtaposing corpus sizes (138,664 and 1,386,640) and observed frequencies (actually observed ones and values half that size, with n's = 1 being omitted)
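The logic of this robustness check can be sketched in a few lines of Python. This is a minimal illustration, not the original analysis: the verb counts and the construction total are hypothetical, only the two corpus sizes come from the text, a one-tailed Fisher-Yates exact test is assumed, and "halving" is assumed to apply to the verbs' frequencies in the construction.

```python
from math import exp, lgamma, log

def log_choose(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def collostruction_strength(a, word_total, cxn_total, corpus_size):
    """-log10 of the one-tailed Fisher-Yates exact p-value, P(X >= a)."""
    upper = min(word_total, cxn_total)
    log_denom = log_choose(corpus_size, cxn_total)
    log_terms = [
        log_choose(word_total, k)
        + log_choose(corpus_size - word_total, cxn_total - k)
        - log_denom
        for k in range(a, upper + 1)
    ]
    peak = max(log_terms)  # log-sum-exp keeps tiny probabilities from underflowing
    log_p = peak + log(sum(exp(t - peak) for t in log_terms))
    return -log_p / log(10)

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rho, computed as Pearson's r on the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((p - mx) * (q - my) for p, q in zip(rx, ry))
    var_x = sum((p - mx) ** 2 for p in rx)
    var_y = sum((q - my) ** 2 for q in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical verbs: (frequency in the construction, overall frequency).
VERBS = {
    "verb_a": (400, 1100), "verb_b": (120, 800), "verb_c": (60, 350),
    "verb_d": (40, 200), "verb_e": (45, 650), "verb_f": (8, 2000),
    "verb_g": (3, 40), "verb_h": (1, 30),
}
CXN_TOTAL = 1000                    # hypothetical construction frequency
SMALL, LARGE = 138_664, 1_386_640   # the two corpus sizes named in the text

def strengths(corpus_size, halve=False):
    """Collostruction strengths per verb; halving omits verbs with frequency 1."""
    result = {}
    for verb, (in_cxn, overall) in VERBS.items():
        if halve:
            if in_cxn == 1:
                continue
            in_cxn //= 2
        result[verb] = collostruction_strength(in_cxn, overall, CXN_TOTAL, corpus_size)
    return result

original = strengths(SMALL)
changed = strengths(LARGE, halve=True)
shared = sorted(original.keys() & changed.keys())
rho = spearman_rho([original[v] for v in shared], [changed[v] for v in shared])
print(f"rho (original vs. both changed): {rho:.2f}")
```

Only verbs present in both conditions enter the correlation, since halving removes verbs that occur once in the construction.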

The distribution of pFYE

Another aspect of how CA is computed concerns its 'response' to observed frequencies of word w in construction c and w's overall frequency. Relying on frequencies embodies the assumption that effects are linear: If something is observed twice as often as something else (in raw numbers or percent), it is, unless another transformation is applied, two times as important/entrenched/... However, many effects in learning, memory, and cognition are not linear:

- the power law of learning (cf. Anderson 1982, cited by Bybee herself);

- word frequency effects are logarithmic (cf. Tryk 1986);

- forgetting curves are logarithmic (as in priming effects; cf. Gries 2005, Szmrecsanyi 2006), ...

Given such and other cases and Bybee's emphasis on domain-general processes (which I agree with), it seems odd to rely on frequencies, which have mathematical characteristics that differ from those of many general cognitive processes. It is therefore useful to briefly discuss how frequencies, collostruction strengths, and other measures are related to each other, by exploring systematically-varied artificial data and authentic data from different previous studies.

As for the former, it is easy to show that the AM used in most CAs, pFYE, is not a straightforward linear function of the observed frequencies of words in constructions but rather varies as a function of w's frequency in c as well as w's and c's overall frequencies, as Figure 2 partially shows for systematically varied data. The frequency of w in c is on the x-axis, different overall frequencies of w are shown in differently grey-shaded points/lines and with numbers, and -log10 pFYE is shown on the y-axis.

Figure 2. The interaction between the frequency of w, the overall frequencies of w and c, and their collostruction strengths
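The non-linearity can be reproduced with small artificial tables. The sketch below is only an illustration under stated assumptions: the corpus size, the construction frequency, and the overall word frequencies are hypothetical, and a one-tailed Fisher-Yates exact test is assumed. Exact integer arithmetic is used because the tables are small.

```python
from math import comb, log10

def neg_log10_p_fye(a, word_total, cxn_total, corpus_size):
    """-log10 of the one-tailed Fisher-Yates exact p-value, P(X >= a)."""
    upper = min(word_total, cxn_total)
    favourable = sum(
        comb(word_total, k) * comb(corpus_size - word_total, cxn_total - k)
        for k in range(a, upper + 1)
    )
    # log10 accepts arbitrarily large Python integers, so nothing overflows.
    return log10(comb(corpus_size, cxn_total)) - log10(favourable)

CORPUS_SIZE = 10_000   # hypothetical
CXN_TOTAL = 500        # hypothetical overall frequency of c

# One curve per (hypothetical) overall frequency of w.
for word_total in (100, 200, 400):
    curve = [
        neg_log10_p_fye(a, word_total, CXN_TOTAL, CORPUS_SIZE)
        for a in range(10, 101, 10)
    ]
    print(word_total, [round(value, 1) for value in curve])
```

Equal steps in the frequency of w in c do not produce equal steps in -log10 pFYE, and the same frequency in c yields a weaker association the more frequent w is overall.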

I am not claiming that logged pFYE-values are the best way to model cognitive processes (for example, a square root transformation makes the values level off more like a learning curve), but clearly a type of visual curvature we know from many other cognitive processes is obtained. Also, pFYE-values are highly correlated with statistics we know are relevant in cognitive contexts and that may, therefore, serve as a standard of comparison. Ellis (2007) and Ellis & Ferreira-Junior (2009: 198 and passim) discuss a unidirectional AM called ΔP, which has been used successfully in the associative-learning literature. Interestingly, for the data represented in Figure 2, the correlation of pFYE with ΔPword-to-construction is extremely significant (p < 10^-15) and very high (rho = 0.92), whereas the correlations of the observed frequencies or their logs with ΔPword-to-construction are significant (p < 10^-8) but much smaller (rho = 0.65). Again, pFYE is not necessarily 'the optimal solution', but it exhibits appealing theoretical characteristics ([transformable] curvature, high correlations with measures from the learning literature, responsiveness to frequency) that make one wonder how Bybee can just dismiss them.
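For readers unfamiliar with the unidirectional measure from the associative-learning literature (conventionally written ΔP), a minimal sketch of its two directions follows. The cell labels a, b, c, d and the counts are assumptions made for illustration: a = w in c, b = w in other constructions, c = other words in c, d = other words in other constructions.

```python
def delta_p_word_to_cxn(a, b, c, d):
    """p(construction | word) - p(construction | other words)."""
    return a / (a + b) - c / (c + d)

def delta_p_cxn_to_word(a, b, c, d):
    """p(word | construction) - p(word | other constructions)."""
    return a / (a + c) - b / (b + d)

# Hypothetical 2x2 table for one word and one construction.
a, b, c, d = 40, 160, 960, 98_840

print(f"word-to-construction: {delta_p_word_to_cxn(a, b, c, d):.3f}")
print(f"construction-to-word: {delta_p_cxn_to_word(a, b, c, d):.3f}")
```

The two directions generally differ for the same table, which is what makes the measure unidirectional: how well the word predicts the construction is kept apart from how well the construction predicts the word.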

Let us now also at least briefly look at authentic data, some here and some further below (in Section 4.3.2). The first result is based on an admittedly small comparison of three different measures of collostruction strengths: For the ditransitive construction, I computed three different CAs, one based on -log10 pFYE, one on an effect size (logged odds ratio), and one on Mutual Information (MI). Consider the three panels in Figure 3 for the results, where the logged frequencies of the verbs in the ditransitive are on the x-axes, the three AMs are on the y-axes, and the verbs are plotted at the x/y-values reflecting their frequencies and AM values. The correlation between the frequencies and AMs is represented by a polynomial smoother and on the right, I separately list the top 11 collexemes of each measure.

Comparing these results to each other and to Goldberg's (1995) analysis of the ditransitive suggests that, of these measures, pFYE performs best. Starting on the right, MI's results are suboptimal because the prototypical ditransitive verb, give, is not ranked highest (let alone by a distinct margin) but only third, and other verbs in the top five are, while compatible with the ditransitive's semantics, rather infrequent and certainly not ones that come to mind first when thinking of the ditransitive. The log odds ratio fares a bit better because give is the strongest collexeme, but otherwise its problems are similar to MI's.
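Why MI tends to promote infrequent verbs can be seen from how the two alternative measures are computed. The sketch below uses common textbook definitions (MI as the binary log of observed over expected frequency, and the natural log of the odds ratio without a continuity correction), which are assumed here and may differ in detail from the computations behind Figure 3. The two verbs and all counts are hypothetical, and all cells are assumed to be non-zero.

```python
from math import log, log2

def mutual_information(a, b, c, d):
    """log2 of observed over expected co-occurrence frequency."""
    n = a + b + c + d
    expected = (a + b) * (a + c) / n
    return log2(a / expected)

def log_odds_ratio(a, b, c, d):
    """Natural log of the odds ratio (a * d) / (b * c)."""
    return log((a * d) / (b * c))

# Hypothetical tables (a, b, c, d) from a corpus of 138,664 items in which
# the construction occurs 1,000 times.
FREQUENT_VERB = (400, 700, 600, 136_964)   # 400 of its 1,100 tokens are in c
RARE_VERB = (3, 1, 997, 137_663)           # 3 of its 4 tokens are in c

for label, table in (("frequent", FREQUENT_VERB), ("rare", RARE_VERB)):
    mi = mutual_information(*table)
    lor = log_odds_ratio(*table)
    print(f"{label}: MI = {mi:.2f}, log odds ratio = {lor:.2f}")
```

With these counts, a verb attested only four times scores higher on MI than a verb that accounts for 40% of all instances of the construction, because the ratio of observed to expected frequency carries no information about how much evidence supports it.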

The pFYE-values arguably fare best: give is ranked highest, and by a fittingly huge margin. The next few verbs are intuitively excellent fits for the polysemous ditransitive and match all the senses Goldberg posited: the metaphor of communication as transfer (tell), caused reception (send), satisfaction conditions implying transfer (offer), the metaphor of perceiving as receiving (show), etc.; cf. Stefanowitsch & Gries (2003: 228f.) for more discussion. Note that pFYE also exhibits a behavior that should please those arguing in favor of raw observed frequencies: As the polynomial smoother shows, it is pFYE that is most directly correlated with frequency. At the same time, and this is only a prima facie piece of evidence, it is also the pFYE-values that result in a curve with the Zipfian shape one would expect for such data (given Ellis & Ferreira-Junior's (2009) work; cf. also below).

Finally, there is Wiechmann's (2008) comprehensive study of how well more than 20 AMs predict experimental results regarding lexico-constructional co-occurrence. Raw co-occurrence frequency scores rather well but this was in part because several outliers were removed. Crucially, pFYE ended up in second place and the first-ranked measure, Minimum Sensitivity (MS), is theoretically problematic. Using the notation of Table 1, it is computed as shown in (5), i.e. as the minimum of two conditional probabilities:

(5) MS = min(p(word|construction), p(construction|word))

One problem here is that some collexemes' positions in the ranking order will be due to p(word|construction) while others' will be due to p(construction|word). Also, the value for give in Table 2 is 0.397, but that does not reveal which conditional probability that value is: p(word|construction) or p(construction|word). In fact, this can lead to cases where two words get the same MS-value, but in one case it is p(word|construction) and in the other it is p(construction|word). This is clearly undesirable, which is why pFYE, while 'only' second, is more appealing. As an alternative, a unidirectional measure such as ΔP is more useful (cf. Gries to appear).

Figure 3. Output scores for the ditransitive of three different AMs (left: pFYE, middle: log odds ratio, right: MI)
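The ambiguity of MS is easy to demonstrate with two constructed cases. In the following sketch, the two words and all counts are hypothetical; the function returns the MS-value together with the conditional probability that produced it, which the bare MS-value does not reveal.

```python
def minimum_sensitivity(a, word_total, cxn_total):
    """Return the MS-value and the conditional probability it stems from."""
    p_word_given_cxn = a / cxn_total
    p_cxn_given_word = a / word_total
    if p_word_given_cxn <= p_cxn_given_word:
        return p_word_given_cxn, "p(word|construction)"
    return p_cxn_given_word, "p(construction|word)"

CXN_TOTAL = 1000  # hypothetical frequency of the construction

# Hypothetical words: (frequency in the construction, overall frequency).
word_x = minimum_sensitivity(200, 250, CXN_TOTAL)
word_y = minimum_sensitivity(300, 1500, CXN_TOTAL)

print("word_x:", word_x)
print("word_y:", word_y)
```

Both words receive the same MS-value, but for the first it reflects how much of the construction the word accounts for, and for the second how strongly the word is tied to the construction.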
