The Low Conditional Entropy Conjecture
In the examples considered in Section 7.2.1, it is not absolute entropy values that are of interest, but the relations between values. As more accurate distributional information is incorporated into an analysis, estimates of uncertainty become more precise and increasingly useful for modelling a range of behavioural responses, as discussed in Milin et al. (2009b). What emerges from the present illustration is the general usefulness of information-theoretic notions for representing morphological variation and structure. A standard entropy measure defines an uncertainty-based notion of variation, while conditional entropy measures the reduction in uncertainty that underlies the implicational structure assigned by a classical WP model.
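The two notions can be made concrete in a short sketch. The suffix data below are purely illustrative (hypothetical pairings of dative and genitive singular exponents across four equiprobable classes, not the actual Russian distribution); the point is only that conditional entropy quantifies how much knowing one cell reduces uncertainty about another:

```python
from collections import Counter
from math import log2

def entropy(outcomes):
    """Shannon entropy (in bits) of a list of outcomes, using relative frequencies."""
    counts = Counter(outcomes)
    n = len(outcomes)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def conditional_entropy(pairs):
    """H(X|Y) for a list of (x, y) pairs: the expected uncertainty
    about x that remains once y is known."""
    n = len(pairs)
    by_y = {}
    for x, y in pairs:
        by_y.setdefault(y, []).append(x)
    return sum(len(xs) / n * entropy(xs) for xs in by_y.values())

# Hypothetical (dative sg, genitive sg) exponent pairs, one per class:
pairs = [("-u", "-a"), ("-e", "-y"), ("-i", "-y"), ("-i", "-i")]

h_dat = entropy([x for x, _ in pairs])   # 1.5 bits of uncertainty in isolation
h_dat_given_gen = conditional_entropy(pairs)  # 0.5 bits once the gen sg is known
```

Here knowledge of the (hypothetical) genitive singular eliminates a full bit of the uncertainty about the dative singular, which is the kind of reduction the conditional entropy measure is designed to capture.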
The cells in Table 7.5 with the largest number of realizations exhibit the greatest variation and complexity. In isolation, the entropy of these cells is determined by the number of realizations and their distribution. If, as assumed by post-Bloomfieldian accounts such as Müller (2004:191), paradigms were “pure epiphenomena”, paradigm affiliation would not be expected to make a significant contribution to constraining the entropy of individual cells. For some cells this is essentially true. Given the invariant realization of the instrumental plural -ami, there is no uncertainty to resolve and, conversely, no pattern of covariation that reduces uncertainty about the realization of other cells. The dative and locative plural are similarly autonomous in the declensional system of Russian. In a system in which the realizations of all cells were mutually independent, it would be fair to describe the organization of forms into paradigms as a taxonomic artifact or matter of descriptive convenience.
Yet, as expressed in Matthews’s observation that “one inflection tends to predict another”, inflectional systems do typically exhibit a high degree of interdependence. In the Russian system, the dative, instrumental and locative plural cells are atypical, achieving an unambiguous encoding of case and number by providing no information about class. Nearly every other cell in a paradigm is at least partially informative about other cells. In the examples considered above, the genitive singular eliminated roughly two-thirds of the uncertainty associated with the dative singular, and knowledge of the instrumental singular was sufficient to identify the dative singular realization.
As discussed in Section 7.1.2, this implicational structure underpins the use of exemplary paradigms, principal parts and analogical extension in a classical grammar. At the same time, the fact that diagnostic features and patterns tend to be specific to individual languages, morphological systems and even inflection classes inhibits cross-linguistic generalization of patterns within the classical WP tradition. Hence it is not until Ackerman and Malouf (2013) that there is a systematic attempt within this tradition to measure the informativeness of implicational structure within a range of languages. Ackerman and Malouf (2013) start by identifying four factors that contribute to paradigmatic uncertainty and complexity. They then classify the declensional systems of 10 genetically unrelated and areally scattered languages in terms of these factors. The results are repeated in Table 7.9. The first column (following the language name) specifies the total number
Table 7.9 Entropies for a 10-language sample (Ackerman and Malouf 2013:443)

Language   Total   Total   Max    Decl    Decl      Paradigm   Avg Cond
           Cells   Real    Real   Class   Entropy   Entropy    Entropy
Amele        3      30      14     24     4.585     2.882      1.105
Arapesh      2      41      26     26     4.700     4.071      0.630
Burmeso     12       6       2      2     1.000     1.000      0.000
Fur         12      50      10     19     4.248     2.395      0.517
Greek        8      12       5      8     3.000     1.621      0.644
Kwerba      12       9       4      4     2.000     0.864      0.428
Mazatec      6     356      94    109     6.768     4.920      0.709
Ngiti       16       7       5     10     3.322     1.937      0.484
Nuer         6      12       3     16     4.000     0.864      0.793
Russian     12      14       3      4     2.000     0.911      0.538
of cells in a paradigm. The second column identifies the total number of different (morphologically conditioned) cell realizations. The third column identifies the largest number of realizations of any one cell. The fourth column specifies the number of declension classes defined by the variation in cell realizations.
The last three columns in Table 7.9 give entropy measures calculated from the values in the preceding columns.^{[1]} The first measure represents what Ackerman and Malouf (2013) term “Declensional Entropy”, indicating the uncertainty associated with guessing the class of a random noun. The value of this measure depends directly on the number of classes; the more classes there are, the harder it is to guess. The next column measures the “Paradigm Entropy”, corresponding to the average uncertainty associated with guessing the realization of a randomly chosen cell of a noun’s paradigm. The more realizations per cell, the higher this value will be. The final column gives the “Average Conditional Entropy”, the uncertainty associated with guessing the realization of one cell based on knowing the realization of another, averaged over all pairs of cells.
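Under the equiprobability assumption used throughout this illustration, all three measures can be computed from a class-by-cell table of exponents. The sketch below uses an invented four-class, three-cell plat (the class labels and suffixes are hypothetical, not the data behind any row of Table 7.9); the three formulas follow the descriptions above:

```python
from math import log2
from itertools import permutations

# Hypothetical plat: rows are declension classes (assumed equiprobable),
# columns are paradigm cells, entries are exponents. Illustrative data only.
plat = {
    "I":   {"nom sg": "-0", "gen sg": "-a", "dat sg": "-u"},
    "II":  {"nom sg": "-a", "gen sg": "-y", "dat sg": "-e"},
    "III": {"nom sg": "-0", "gen sg": "-i", "dat sg": "-i"},
    "IV":  {"nom sg": "-o", "gen sg": "-a", "dat sg": "-u"},
}

def H(values):
    """Entropy (bits) of a column of exponents, classes equiprobable."""
    n = len(values)
    return -sum(p * log2(p) for p in (values.count(v) / n for v in set(values)))

def cond_H(xs, ys):
    """H(X|Y): residual uncertainty about column xs given column ys."""
    n = len(xs)
    return sum(
        ys.count(y) / n * H([x for x, yy in zip(xs, ys) if yy == y])
        for y in set(ys)
    )

cells = list(next(iter(plat.values())))
cols = {c: [row[c] for row in plat.values()] for c in cells}

decl_entropy = log2(len(plat))  # uncertainty about class membership
paradigm_entropy = sum(H(v) for v in cols.values()) / len(cells)
avg_cond = sum(cond_H(cols[a], cols[b])
               for a, b in permutations(cells, 2)) / (len(cells) * (len(cells) - 1))
```

For this toy plat the declension entropy is 2 bits, the paradigm entropy 1.5 bits, and the average conditional entropy only a third of a bit, reproducing in miniature the pattern visible in Table 7.9: knowledge of one cell sharply reduces uncertainty about the others.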
Since Russian is included in this 10-language sample, we can continue with the earlier illustration and use the values in the final row of Table 7.9 to exemplify each of these measures. Consider the first four columns. As shown in Table 7.5, Russian declensions have 12 cells, defined by the six cases and two numbers. The full declensional system contains a total of 14 case-number realizations, adding four to the 10 listed in Table 7.7. No cell has more than three distinct realizations, and the recognition of four major declension classes accords with the number proposed in Corbett (1991).
From this description of Russian declensions, we can calculate the entropy measures. As in the previous illustration, we are dealing with entropy ceilings here, since the descriptions contain no information about the relative frequency of cells, realizations or classes. Given four declension classes and a provisional assumption of equiprobability, the uncertainty associated with guessing the class of a random noun is log_{2}(4), or 2 bits. The paradigm entropy will be lower, since the most highly allomorphic cell has only three distinct realizations (roughly 1.6 bits of uncertainty), and the fact that other cells have even fewer realizations brings the average in Table 7.9 under 1 bit.
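These ceiling values can be checked directly; a minimal arithmetic sketch, assuming the equiprobable four-class, three-realization description above:

```python
from math import log2

n_classes = 4          # four major declension classes (Corbett 1991)
max_realizations = 3   # most highly allomorphic cell

# Ceiling on declension entropy: matches the 2.000 in Table 7.9's Russian row.
decl_ceiling = log2(n_classes)

# Ceiling for any single cell: ~1.585 bits; averaging in cells with fewer
# realizations pulls the paradigm entropy below 1 bit (0.911 in Table 7.9).
cell_ceiling = log2(max_realizations)
```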
As in the earlier illustration, the actual values in Table 7.9 are less important than the relations between values. Of particular interest is the contrast between the last two measures in Table 7.9. In every language of the sample, the uncertainty associated with deducing the realization of a cell was reduced, in some cases significantly, by information about the realization of another cell. This pattern is all the more striking given that the omission of frequency information from this sample not only overestimates the overall uncertainty of a system by excluding distributional biases but also underestimates the potential value of these biases for deducing one form from another.
The limitations due to the lack of frequency information reflect a basic trade-off between the genetic and areal diversity of a language sample and the kinds of information currently available for the languages. Subsequent comparisons based on larger samples with more distributional information may permit more secure estimations of the cross-linguistic variation in entropy values. However, even this initial comparison highlights a number of important points. The most significant is that the reduction in uncertainty across the final three columns correlates with an increase in psychological plausibility.
Guessing the class of an item is irreducibly difficult because this is not the sort of task that a speaker is ever confronted with. This reflects the fact that classes are not in a language per se but, like ‘blocks’, ‘templates’ and similar constructs, form part of the scaffolding of a language description. Class assignment is a metatask performed by linguists for descriptive and pedagogical purposes. Hence assessments of morphological complexity based on the structure of class systems are principally relevant to the classification of grammatical descriptions, not the languages they are meant to describe. Guessing the realization of a cell on the basis of no prior acquaintance with the item is nearly as unrealistic. As Ackerman and Malouf (2013:443f.) remark, “native speakers are not confronted with situations where it is necessary to guess declension class membership or the cell for particular word forms of novel lexemes without having encountered some actual word form of that lexeme”. Consequently, notions of complexity determined by sheer numbers of realization patterns are again largely orthogonal to the factors that are relevant for speakers.
It is the final measure, which estimates the difficulty of deducing novel forms of an item from knowledge of other forms of the item, that is most clearly relevant to the analysis of the paradigmatic complexity of morphological systems. From a classical WP perspective, it is no accident that the average conditional entropy is the measure that is most constrained. It is of course true that speakers are not, in a strict sense, presented with the task of guessing the value of paradigm cells. However, the features associated with cells are a realistic proxy for the grammatical properties that would guide the choice of inflected form in production or partition the interpretative space in comprehension.
 [1] Unfortunately, Table 3 on p. 443 of Ackerman and Malouf (2013) transposes the labels on the final two columns, so that the “Paradigm (Cell) Entropy” column gives the average conditional entropy and the “Avg Entropy” column gives the paradigm entropy.