The average conditional entropy between cells provides a particularly transparent measure of the implicational structure that underpins the classical factorization of inflectional systems into exemplary paradigms and principal parts. It is highly suggestive that the effects of this structure are so strong even under the simplest idealization of a paradigmatic system. Moreover, just as the use of more informative data sources will improve the precision of entropy estimations, the use of more refined or, perhaps, more appropriate techniques of analysis may also be expected to contribute to better measures. Refinements may also be required in cases where, as Bonami and Luis (2013, 2014) have argued, paradigm cells do not always correspond to random variables.

The use of different measures may likewise offer more accurate estimations or more transparent perspectives. For example, instead of using conditional entropy to measure remaining uncertainty, one might use techniques to estimate degrees of informativeness. The notion of mutual information in Figure 7.8 provides a measure of the mutual dependence between two variables. As its name suggests, mutual information is a symmetrical measure of the information shared by variables. For a pair of variables X and Y, the relations between their mutual information, entropies and conditional entropies are given in Figure 7.9. If a pair of variables are independent, their mutual information will be 0. If they are completely interdependent, their mutual information will be identical to their common entropy value. Between these extremes lies a range of values that are useful for characterizing the proportion of uncertainty that one cell eliminates about another or the cohesiveness of sets of cells.
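For reference, the standard definitions that these two figures cite from Cover and Thomas (1991) can be written as follows. The notation p(x, y), p(x) and p(y) for joint and marginal probabilities is the usual one and is assumed here rather than copied from the figures.

```latex
\begin{align*}
I(X;Y) &= \sum_{x}\sum_{y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)} \\
I(X;Y) &= H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) \\
I(X;Y) &= H(X) + H(Y) - H(X,Y)
\end{align*}
```

The two limiting cases follow directly. If X and Y are independent, p(x, y) = p(x)p(y), so every logarithm is log 1 = 0 and I(X;Y) = 0. If they are completely interdependent, H(X|Y) = H(Y|X) = 0, so I(X;Y) = H(X) = H(Y).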

The proportion of uncertainty that a cell C_{1} eliminates about a cell C_{2} can be obtained by dividing their mutual information, I(C_{1}; C_{2}), by the entropy of C_{2}, as in Figure 7.10. Dividing I(C_{1}; C_{2}) by the entropy of C_{1} likewise defines the proportion of uncertainty about C_{1} eliminated by knowledge of C_{2}.
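Under the standard identity I(C_{1}; C_{2}) = H(C_{2}) - H(C_{2}|C_{1}), this ratio can be unpacked as below. The derivation is a sketch that assumes H(C_{2}) > 0, since the proportion is undefined for a cell with no uncertainty.

```latex
\[
\frac{I(C_1; C_2)}{H(C_2)}
  = \frac{H(C_2) - H(C_2 \mid C_1)}{H(C_2)}
  = 1 - \frac{H(C_2 \mid C_1)}{H(C_2)},
  \qquad H(C_2) > 0 .
\]
```

The value is 0 when C_{1} leaves the uncertainty of C_{2} untouched and 1 when H(C_{2}|C_{1}) = 0. Although mutual information is symmetrical, the two proportions generally differ, because the denominators H(C_{1}) and H(C_{2}) differ.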

Figure 7.8 Mutual information (Cover and Thomas 1991:18)

Figure 7.9 Mutual information and conditional entropy (Cover and Thomas 1991:19)

Figure 7.10 Proportional uncertainty reduction

Comparing the proportional values for all pairs of cells in a paradigm or inflection class will determine a ranking of cells in terms of relative informativeness. This ranking provides one means of identifying the best candidates for principal parts in a system. There is of course no guarantee that any single cell will eliminate all uncertainty in other cells. So if the complete elimination of uncertainty is the criterion for the selection of principal parts, ‘principal part cohorts’ can be defined as all the sets of forms C_{1} ... C_{m} such that the conditional value H(C_{j}|C_{1} ... C_{m}) approaches zero for all cells C_{j} in a paradigm, class or system. In either case, the use of information-theoretic measures offers a solution, in principle, to the traditional problem of selecting principal parts:
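The ranking and cohort criteria can be made concrete with a minimal Python sketch. Everything in it is an assumption made for illustration: the four-class fragment, its exponents, the equiprobable weighting of classes, and the helper names (`reduction`, `ranking`, `cohorts`) are hypothetical and are not taken from the tables or figures of this chapter.

```python
from collections import Counter
from itertools import combinations
from math import log2

# Hypothetical toy fragment: four inflection classes, assumed equiprobable.
# Exponents are illustrative only, not the data behind the chapter's tables.
CELLS = ["nom.sg", "gen.sg", "dat.sg", "ins.pl"]
CLASSES = [
    ("-0", "-a", "-u", "-ami"),
    ("-a", "-i", "-e", "-ami"),
    ("-0", "-i", "-i", "-ami"),
    ("-o", "-a", "-u", "-ami"),
]

def entropy(cells):
    """Joint entropy in bits of a tuple of cell names."""
    idx = [CELLS.index(c) for c in cells]
    counts = Counter(tuple(row[i] for i in idx) for row in CLASSES)
    n = len(CLASSES)
    return -sum(k / n * log2(k / n) for k in counts.values())

def cond_entropy(target, given):
    """H(target | given) = H(target, given) - H(given)."""
    given = tuple(given)
    return entropy((target,) + given) - entropy(given)

def mutual_info(c1, c2):
    """I(c1; c2) = H(c1) + H(c2) - H(c1, c2)."""
    return entropy((c1,)) + entropy((c2,)) - entropy((c1, c2))

def reduction(c1, c2):
    """Proportion of the uncertainty about c2 eliminated by c1."""
    h2 = entropy((c2,))
    return mutual_info(c1, c2) / h2 if h2 > 0 else 0.0

def ranking():
    """Cells ordered by mean proportional reduction over the other cells."""
    scores = {}
    for c1 in CELLS:
        # Cells with no uncertainty are skipped as targets.
        targets = [c for c in CELLS if c != c1 and entropy((c,)) > 0]
        scores[c1] = sum(reduction(c1, c) for c in targets) / len(targets)
    return sorted(scores, key=scores.get, reverse=True)

def cohorts(eps=1e-9):
    """Smallest cell sets that leave (near) zero uncertainty in every cell."""
    for size in range(1, len(CELLS) + 1):
        found = [s for s in combinations(CELLS, size)
                 if all(cond_entropy(c, s) < eps for c in CELLS)]
        if found:
            return found
    return []

print(ranking())   # most informative cell first
print(cohorts())   # minimal 'principal part cohorts' for the toy fragment
```

In this toy fragment no single cell removes all uncertainty, so `cohorts()` returns pairs of cells, while `ranking()` still orders the individual cells by how much they reveal on average. Weighting classes by type frequency instead of treating them as equiprobable would change the numbers but not the procedure.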

One objection to the Priscianic model... was that the choice of leading form was inherently arbitrary: the theory creates a problem which it is then unable, or only partly able, to resolve (Matthews 1972: 74).

An element of arbitrariness remains, in that the selection of principal parts reflects pedagogical goals that have no clear relevance for the native speaker. The relative informativeness of particular cells or forms can be expected to have behavioural correlates in a speaker’s willingness to produce novel forms or the speed or confidence with which they identify forms that are presented to them. But there is no reason to believe that speakers isolate the cells or forms that correspond to principal parts from other, partially informative, elements.

The mutual information between cells or forms also determines a natural measure of system cohesion. The cohesion of a pair of cells C_{1} and C_{2} correlates with the degree to which they are mutually informative (i.e., the proportion of their cumulative uncertainty that they share). The cumulative uncertainty of C_{1} and C_{2} is defined by their joint entropy in Figure 7.11. In the limiting case where C_{1} and C_{2} are independent, their joint entropy is just the sum of their entropies. However, a central assumption of WP models is that morphological systems never consist of independent cells, so the cumulative uncertainty of cells is always less than the sum of their individual uncertainties.
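The relation between joint entropy and the sum of the individual entropies can be stated explicitly. The formulation below uses the standard definition of joint entropy; the symbols r_{1} and r_{2} for the realizations of the two cells are introduced here for illustration.

```latex
\begin{align*}
H(C_1, C_2) &= -\sum_{r_1}\sum_{r_2} p(r_1, r_2)\,\log p(r_1, r_2) \\
H(C_1, C_2) &= H(C_1) + H(C_2) - I(C_1; C_2) \;\le\; H(C_1) + H(C_2)
\end{align*}
```

Equality holds exactly when I(C_{1}; C_{2}) = 0, that is, when the cells are independent. Any positive mutual information makes the cumulative uncertainty strictly smaller than the sum.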

Cohesion is then measured by dividing the mutual information of C_{1} and C_{2} by their joint entropy, as in Figure 7.12. In the case where C_{1} and C_{2} are independent, their mutual information and cohesion will approach zero. In the case where C_{1} and C_{2} are identical or perfectly intercorrelated, their cohesion will approach 1. Averaging this value over the cells in a paradigm, class or system will determine a measure of system cohesion which represents how tightly the cells are integrated into a network of mutual implication.
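A minimal Python sketch of this calculation is given below. It assumes a hypothetical four-class fragment with equiprobable classes, and it averages over unordered pairs of distinct cells, which is one possible reading of 'averaging over the cells'. The function names and the data are illustrative assumptions, not the chapter's own.

```python
from collections import Counter
from itertools import combinations
from math import log2

# Hypothetical toy fragment: four inflection classes, assumed equiprobable.
CELLS = ["nom.sg", "gen.sg", "dat.sg", "ins.pl"]
CLASSES = [
    ("-0", "-a", "-u", "-ami"),
    ("-a", "-i", "-e", "-ami"),
    ("-0", "-i", "-i", "-ami"),
    ("-o", "-a", "-u", "-ami"),
]

def entropy(*cells):
    """Joint entropy in bits of one or more cells."""
    idx = [CELLS.index(c) for c in cells]
    counts = Counter(tuple(row[i] for i in idx) for row in CLASSES)
    n = len(CLASSES)
    return -sum(k / n * log2(k / n) for k in counts.values())

def cohesion(c1, c2):
    """Mutual information of c1 and c2 divided by their joint entropy."""
    joint = entropy(c1, c2)
    if joint <= 0:
        # Assumption: two invariant cells share no uncertainty; report 0.
        return 0.0
    mutual = entropy(c1) + entropy(c2) - joint
    return mutual / joint

def system_cohesion():
    """Mean cohesion over all unordered pairs of distinct cells."""
    pairs = list(combinations(CELLS, 2))
    return sum(cohesion(a, b) for a, b in pairs) / len(pairs)

for a, b in combinations(CELLS, 2):
    print(f"{a:7} {b:7} {cohesion(a, b):.3f}")
print(f"system  {system_cohesion():.3f}")
```

In the toy data the invariant instrumental plural has cohesion 0 with every other cell, which lowers the system average. A cell paired with itself yields cohesion 1, matching the limiting case of perfectly intercorrelated cells.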

Figure 7.11 Joint entropy (cf. Cover and Thomas 1991:15)

Figure 7.12 Cell cohesion

The traditional problem of ‘validating’ analogical deductions can also be approached from an uncertainty-based perspective. The validity of a four-part proportion of the form a : b = c : X depends on how well a predicts b (and how reliably a can be matched with c). Predictability can again be formulated in information-theoretic terms, most straightforwardly in terms of conditional entropy. This strategy can be illustrated with reference to the proportions in (7.1). As in traditional formulations, the terms in these proportions represent forms with implicit (but unambiguous) interpretations.

(7.1) Valid and spurious declensional analogies in Russian

a. slovarja : slovarju = portfelja : portfelju

b. nedeljami : nedelja = tetradjami : tetradja

The antecedent of the proportion in (7.1a) establishes a pattern between the genitive and dative singular forms of slovar’ ‘dictionary’ from Table 7.5. Since both of these forms are unique within the paradigm of slovar’, the forms alone suffice to pick out a unique cell-form pair.^{[1]} The pattern in the antecedent then sanctions the deduction of the dative singular form of the first declension noun portfel’ ‘briefcase’ from its genitive singular form. The spurious proportion in (7.1b) establishes a similar pattern between unambiguous instrumental plural and nominative singular forms of nedelja ‘week’ and uses this relation to deduce an incorrect nominative singular of tetrad’ ‘notebook’.

The relationships between the ‘a’ and ‘c’ terms are parallel in these proportions: slovarja and portfelja are both genitive singulars and nedeljami and tetradjami are both instrumental plurals. The contrast between the proportions in (7.1) reflects a difference in the relations between the ‘a’ and ‘b’ terms. It is this difference that can be expressed in terms of conditional entropy. Proportional analogies do not normally apply to all realizations of a given cell, since patterns of that generality can be specified by two terms. For example, the fact that the nominative plural of an Estonian noun corresponds to the genitive singular plus -d can be expressed by a ‘two-part’ analogy: any noun whose genitive singular is realized by ‘X’ has the nominative plural Xd.^{[2]} Hence the pattern established in the antecedent of a four-part proportion relates a cell to a specific realization. In the case of (7.1a), the antecedent picks out nouns whose genitive singular conforms to the pattern Xa.

The reduction in the uncertainty of a cell C_{2} that is attributable to knowledge of a cell C_{1} with a realization r can be measured by the specific conditional entropy H(C_{2}|C_{1} = r). The reliability of the proportion in (7.1a) correlates with the value of H(Dat.Sg|Gen.Sg = Xa). Significantly, this use of specific conditional entropy preserves the representational neutrality of traditional proportions (cf. Morpurgo Davies 1978). The relevant subclass of the genitive singular is defined by a pattern, expressed by the inflectional ending -a. But the pattern could also include theme vowels, segments from the stem or root, or anything else that is of predictive value. Specific conditional entropy can be calculated from conditional probabilities in the same way as the conditional entropy values in Section 7.2.1. As exhibited in Figure 7.7, knowledge of the genitive singular eliminates roughly two thirds of the uncertainty associated with the dative singular. The remaining uncertainty derives from the fact that a genitive singular in -i can be associated with a dative singular in -e or -i, as shown in Table 7.8. However, since a genitive singular in -a only co-occurs with a dative singular in -u, knowledge of a genitive singular in -a eliminates the uncertainty in the dative singular form in this fragment of the Russian declensional system. Hence the reliability of the proportion (7.1a) is reflected in the fact that the value of H(Dat.Sg|Gen.Sg = Xa) approaches 0.

In contrast, since the instrumental plural has a single realization, knowledge of an instrumental plural in -ami is the same as knowledge of the instrumental plural. Knowledge of the instrumental plural preserves all of the uncertainty associated with the nominative singular in (7.1b), just as the uncertainty of the dative singular is preserved in Figure 7.5. Hence the spurious nature of the proportion (7.1b) is reflected in the fact that the value of H(Nom.Sg|Inst.Pl = Xami) is the same as the value of H(Nom.Sg).
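The contrast between the two proportions can be illustrated with a short Python sketch of specific conditional entropy. The fragment is a hypothetical stand-in with equiprobable classes. It mimics the qualitative pattern described above (genitive -a always with dative -u, genitive -i with dative -e or -i, a single instrumental plural in -ami) but is not the data of Tables 7.5 or 7.8, and `specific_cond_entropy` is a name introduced only for this example.

```python
from collections import Counter
from math import log2

# Hypothetical toy fragment: four inflection classes, assumed equiprobable.
CELLS = ["nom.sg", "gen.sg", "dat.sg", "ins.pl"]
CLASSES = [
    ("-0", "-a", "-u", "-ami"),
    ("-a", "-i", "-e", "-ami"),
    ("-0", "-i", "-i", "-ami"),
    ("-o", "-a", "-u", "-ami"),
]

def entropy(values):
    """Entropy in bits of the empirical distribution over a list of values."""
    n = len(values)
    return sum(-(k / n) * log2(k / n) for k in Counter(values).values())

def specific_cond_entropy(target, given, realization):
    """H(target | given = realization) over the classes matching the pattern."""
    g, t = CELLS.index(given), CELLS.index(target)
    rows = [row for row in CLASSES if row[g] == realization]
    if not rows:
        raise ValueError(f"no class realizes {given} as {realization}")
    return entropy([row[t] for row in rows])

# Reliable pattern: a genitive singular in -a fixes the dative singular.
print(specific_cond_entropy("dat.sg", "gen.sg", "-a"))
# Residual uncertainty: a genitive singular in -i leaves two datives open.
print(specific_cond_entropy("dat.sg", "gen.sg", "-i"))
# Uninformative pattern: the invariant instrumental plural changes nothing.
print(specific_cond_entropy("nom.sg", "ins.pl", "-ami"))
print(entropy([row[CELLS.index("nom.sg")] for row in CLASSES]))
```

The first value is 0, the counterpart of a valid proportion, and the last two values coincide, the counterpart of a spurious one. The pattern is matched here by exact string equality on the exponent; a richer matcher over stems or theme vowels could be substituted without changing the entropy calculation.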

As with principal part selection, the formulation and validation of proportional analogies can be expressed in information-theoretic terms. Proportions can be defined over any stable property-form pairs, so that the role of words can be seen to reflect the traditional claim that words are the most morphosyntactically stable units. The reliability of an analogical deduction is likewise defined with reference to the totality of relevant property-form pairs in a language. To the extent that proportions make reference to words, their validation requires a lexicon containing units that are (at least) word-sized. However, the fact that these traditional notions can be reconstructed in terms of uncertainty reduction does not imply that they have a privileged status in a modern WP model. Again like principal parts or exemplary paradigms, proportions isolate individual deductive patterns within a larger network of mutually interdependent elements. Predictive value is a matter of degree and, as suggested by the average conditional entropy measures of Ackerman and Malouf (2013), most inflectional variants are at least partially informative about other forms of an item. Proportions tend to identify patterns that are of particular relevance or salience for language descriptions, but there is no reason to believe that they are distinguished from other patterns by the speakers of a language.

[1] The use of unambiguous forms to represent cell-form pairs in proportional analogies should not be allowed to create the false impression that they express pure form-based deductions.

[2] Similar remarks apply to the Priscianic patterns in (6.2).