The Low Conditional Entropy Conjecture

In the examples considered in Section 7.2.1, it is not absolute entropy values that are of interest, but the relations between values. As more accurate distributional information is incorporated into an analysis, estimations of uncertainty become more precise and increasingly useful for modelling a range of behavioural responses, as discussed in Milin et al. (2009b). What emerges from the present illustration is the general usefulness of information-theoretic notions for the purposes of representing morphological variation and structure. A standard entropy measure defines an uncertainty-based notion of variation, while conditional entropy measures the reduction in uncertainty that underlies the implicational structure assigned by a classical WP model.
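
For reference, the two quantities at issue can be stated in their standard form: the entropy of a variable X (here, the realization of a paradigm cell) and the conditional entropy of X given a second variable Y (the realization of another cell):

$$H(X) = -\sum_{x} p(x)\,\log_2 p(x), \qquad H(X \mid Y) = -\sum_{x,y} p(x,y)\,\log_2 p(x \mid y)$$

Since H(X | Y) = H(X, Y) − H(Y), the difference H(X) − H(X | Y) quantifies the reduction in uncertainty about one cell that knowledge of another cell provides.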

The cells in Table 7.5 with the largest number of realizations exhibit the greatest variation and complexity. In isolation, the entropy of these cells is determined by the number of realizations and their distribution. If, as assumed by Post-Bloomfieldian accounts such as Müller (2004:191), paradigms were “pure epiphenomena”, paradigm affiliation would not be expected to make a significant contribution to constraining the entropy of individual cells. For some cells this is essentially true. Given the invariant realization of the instrumental plural -ami, there is no uncertainty to resolve and, conversely, no pattern of covariation that reduces uncertainty about the realization of other cells. The dative and locative plural are similarly autonomous in the declensional system of Russian. In a system in which the realizations of all cells were mutually independent, it would be fair to describe the organization of forms into paradigms as a taxonomic artifact or a matter of descriptive convenience.

Yet, as expressed in Matthews’s observation that “one inflection tends to predict another”, inflectional systems do typically exhibit a high degree of interdependence. In the Russian system, the dative, instrumental and locative plural cells are atypical, achieving an unambiguous encoding of case and number by providing no information about class. Nearly every other cell in a paradigm is at least partially informative about other cells. In the examples considered above, the genitive singular eliminated roughly two thirds of the uncertainty associated with the dative singular, and knowledge of the instrumental singular was sufficient to identify the dative singular realization.

As discussed in Section 7.1.2, this implicational structure underpins the use of exemplary paradigms, principal parts and analogical extension in a classical grammar. At the same time, the fact that diagnostic features and patterns tend to be specific to individual languages, morphological systems and even inflection classes inhibits cross-linguistic generalization of patterns within the classical WP tradition. Hence it is not until Ackerman and Malouf (2013) that there is a systematic attempt within this tradition to measure the informativeness of implicational structure across a range of languages. Ackerman and Malouf (2013) start by identifying four factors that contribute to paradigmatic uncertainty and complexity. They then classify the declensional systems of 10 genetically unrelated and areally scattered languages in terms of these factors. The results are reproduced in Table 7.9.


Table 7.9 Entropies for a 10-language sample (Ackerman and Malouf 2013:443)

Language   Total Cells   Total Real   Max Real   Decl Class   Decl Entropy   Paradigm Entropy   Avg Cond Entropy
Amele            3            30          14         24           4.585             2.882              1.105
Arapesh          2            41          26         26           4.700             4.071              0.630
Burmeso         12             6           2          2           1.000             1.000              0.000
Fur             12            50          10         19           4.248             2.395              0.517
Greek            8            12           5          8           3.000             1.621              0.644
Kwerba          12             9           4          4           2.000             0.864              0.428
Mazatec          6           356          94        109           6.768             4.920              0.709
Ngiti           16             7           5         10           3.322             1.937              0.484
Nuer             6            12           3         16           4.000             0.864              0.793
Russian         12            14           3          4           2.000             0.911              0.538


The first column (following the language name) specifies the total number of cells in a paradigm. The second column identifies the total number of different (morphologically conditioned) cell realizations. The third column identifies the largest number of realizations of any one cell. The fourth column specifies the number of declension classes defined by the variation in cell realizations.

The last three columns in Table 7.9 give entropy measures calculated from the values in the preceding columns.[1] The first of these represents what Ackerman and Malouf (2013) term “Declensional Entropy”, the uncertainty associated with guessing the class of a random noun. The value of this measure depends directly on the number of classes: the more classes there are, the harder the guess. The next column gives the “Paradigm Entropy”, the average uncertainty associated with guessing the realization of a given cell for a random noun. The more realizations per cell, the higher this value will be. The final column gives the “Average Conditional Entropy”, the uncertainty associated with guessing the realization of one cell given knowledge of the realization of another, averaged over all pairs of cells.
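
A minimal computational sketch may help make the three measures concrete. The fragment below uses an invented three-class, three-cell system (the class labels, cells and exponents are purely illustrative and are not drawn from Table 7.9), treats classes as equiprobable, and follows the informal characterizations given here, so the values it computes are entropy ceilings in the sense discussed below.

```python
from collections import Counter
from itertools import permutations
from math import log2

def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)

# Invented three-class, three-cell fragment; illustrative only.
classes = {
    "I":   {"nom.sg": "-o", "gen.sg": "-a", "dat.sg": "-u"},
    "II":  {"nom.sg": "-a", "gen.sg": "-y", "dat.sg": "-e"},
    "III": {"nom.sg": "-o", "gen.sg": "-i", "dat.sg": "-i"},
}
cells = list(next(iter(classes.values())))

# "Declensional entropy": uncertainty about the class of a random noun,
# treating the classes as equiprobable (an entropy ceiling).
decl_entropy = log2(len(classes))

# "Paradigm entropy": average uncertainty about a cell's realization.
def cell_entropy(cell):
    return entropy(Counter(c[cell] for c in classes.values()).values())

paradigm_entropy = sum(cell_entropy(c) for c in cells) / len(cells)

# "Average conditional entropy": uncertainty about one cell's realization
# given another cell's, averaged over ordered pairs of distinct cells,
# using H(target | known) = H(known, target) - H(known).
def cond_entropy(known, target):
    joint = entropy(Counter((c[known], c[target])
                            for c in classes.values()).values())
    return joint - cell_entropy(known)

pairs = list(permutations(cells, 2))
avg_cond_entropy = sum(cond_entropy(a, b) for a, b in pairs) / len(pairs)

print(decl_entropy, paradigm_entropy, avg_cond_entropy)
```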

Since Russian is included in this 10-language sample, we can continue with the earlier illustration and use the values in the final row of Table 7.9 to exemplify each of these measures. Consider the first four columns. As shown in Table 7.5, Russian declensions have 12 cells, defined by the six cases and two numbers. The full declensional system contains a total of 14 case-number realizations, adding four to the 10 listed in Table 7.7. No cell has more than three distinct realizations, and the recognition of four major declension classes accords with the number proposed in Corbett (1991).

From this description of Russian declensions, we can calculate the entropy measures. As in the previous illustration, we are dealing with entropy ceilings here, since the descriptions do not contain information about the relative frequency of cells, realizations or classes. Given four declension classes and a provisional assumption of equiprobability, the uncertainty associated with guessing the class of a random noun is log2(4), or 2 bits. The paradigm entropy will be lower: the most highly allomorphic cell has only three distinct realizations (roughly 1.6 bits of uncertainty), and the fact that other cells have even fewer realizations brings the average in Table 7.9 under 1 bit.
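
The arithmetic behind these ceilings is simply the base-2 logarithm of the number of equiprobable alternatives, as in the following two-line check of the Russian figures (assuming, as above, equiprobability):

```python
from math import log2

print(log2(4))  # declension entropy ceiling for 4 classes: 2.0 bits
print(log2(3))  # ceiling for a cell with 3 realizations: ~1.585 bits
```

The reported paradigm entropy of 0.911 bits falls below the 1.585-bit cell ceiling because it averages over all twelve cells, most of which distinguish fewer than three realizations.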

As in the earlier illustration, the actual values in Table 7.9 are less important than the relations between values. Of particular interest is the contrast between the last two measures in Table 7.9. In every language of the sample, the uncertainty associated with deducing the realization of a cell was reduced, in some cases significantly, by information about the realization of another cell. This pattern is all the more striking given that the omission of frequency information from this sample not only overestimates the overall uncertainty of a system by excluding distributional biases but also underestimates the potential value of these biases for deducing one form from another.

The limitations due to the lack of frequency information reflect a basic tradeoff between the genetic and areal diversity of a language sample and the kinds of information currently available for the languages. Subsequent comparisons based on larger samples with more distributional information may permit more secure estimations of the cross-linguistic variation in entropy values. However, even this initial comparison highlights a number of important points. The most significant is that the reduction in uncertainty across the final three columns correlates with an increase in the psychological plausibility of the corresponding tasks.

Guessing the class of an item is irreducibly difficult because this is not the sort of task that a speaker is ever confronted with. This reflects the fact that classes are not in a language per se but, like ‘blocks’, ‘templates’ and similar constructs, form part of the scaffolding of a language description. Class assignment is a meta-task performed by linguists for descriptive and pedagogical purposes. Hence assessments of morphological complexity based on the structure of class systems are principally relevant to the classification of grammatical descriptions, not the languages they are meant to describe. Guessing the realization of a cell on the basis of no prior acquaintance with the item is nearly as unrealistic. As Ackerman and Malouf (2013:443f.) remark, “native speakers are not confronted with situations where it is necessary to guess declension class membership or the cell for particular word forms of novel lexemes without having encountered some actual word form of that lexeme”. Consequently, notions of complexity determined by sheer numbers of realization patterns are again largely orthogonal to the factors that are relevant for speakers.

It is the final measure, which estimates the difficulty of deducing novel forms of an item from knowledge of other forms of the item, that is most clearly relevant to the analysis of the paradigmatic complexity of morphological systems. From a classical WP perspective, it is no accident that the average conditional entropy is the measure that is most constrained. It is of course true that speakers are not, in a strict sense, presented with the task of guessing the value of paradigm cells. However, the features associated with cells are a realistic proxy for the grammatical properties that would guide the choice of inflected form in production or partition the interpretative space in comprehension.

  • [1] Unfortunately, Table 3 on p. 443 of Ackerman and Malouf (2013) transposes the labels on the final two columns, so that the “Paradigm (Cell) Entropy” column gives the average conditional entropy and the “Avg Entropy” column gives the paradigm entropy.
 