
A reconstruction of the semantics of the Dative Subject Construction in Indo-European

The predicates instantiating the Dative Subject Construction in the Indo-European languages fall into two main semantic categories, experience-based predicates and happenstance predicates (Barðdal 2004, 2008, 2011). The experience-based predicates are verbs of emotion, bodily states, cognition, attitudes, and perception. The happenstance predicates are verbs of gain, success, happening, hindrance, ontological states, speaking, and possession. In addition, verbs of modality and evidentiality, as well as possessives, also occur in the Dative Subject Construction, and these are not readily classified as either experience or happenstance.

The predicates can also be divided into event-type categories, in that some are stative and others are eventive. However, the division does not simply align with the semantic one, with experience-based predicates being stative and happenstance predicates eventive: verbs of emotion are very often stative, but there are also some inchoative predicates among them. For instance, 'occur to one's mind' would be inchoative and not stative. The same is true for happenstance predicates: some of them are eventive, like 'happen' and 'lead to death', while others are stative. The semantic class of Verbs of Ontological States is one such class where the predicates are stative. The predicate 'be similar to', from Verbs of (Dis)Similarity, would be a case in point.

We will now present three different semantic maps of the lexical semantic classes instantiating the Dative Subject Construction: one, manually drawn, based on the 14 higher-level semantic categories, and two others, computationally drawn, based on the narrowly-circumscribed semantic verb classes given in the Appendix. We start with the manually-drawn map, shown in Figure 7. Here one can see that the Dative Subject Construction has the widest semantic scope in Ancient Greek, where all the categories are represented. Old Norse-Icelandic has all the subconstructions except Possession; Latin has all but Verbs of Speaking; Old Russian has all but Verbs of Speaking and Verbs of Perception; and Old Lithuanian is missing Verbs of Speaking. The graphics in Figure 7 are therefore most reminiscent of the graphics in Figure 6, which we claimed above in Section 3 might be typical for a scenario where the construction is an early inheritance but has not been productive since the languages split. It is of course entirely possible that the lack of sememe attestation in specific language branches is due to accidental gaps in the data. Only further research will reveal whether that is the case. In the meantime, let us continue with our methodological exercise.

Notice, moreover, that the semantic map in Figure 7 is based on the 14 higher-level semantic categories presented in Section 4, and that such a semantic map abstracts away from the lower-level predicate-specific classes given in the Appendix. The distribution of languages across the lower-level classes will be addressed below, through a Principal Component Analysis and the semantic map derived from that method.

Except for Verbs of Speaking, which are attested only in Old Norse-Icelandic and Ancient Greek, every semantic category in Figure 7 is found in at least four of the five branches. The semantic map in Figure 7 therefore suggests a remarkable stability within the semantic field of the Dative Subject Construction in the history of Indo-European.
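The branch counts behind this observation can be tallied with a minimal sketch. The 14 category labels below are an assumption, taken from the list of predicate types given at the start of this section; the text does not reproduce the labels used in Figure 7 itself. The gaps per branch are those stated in the description of Figure 7, and the names CATEGORIES, GAPS, and branch_counts are illustrative only.

```python
# Assumed labels for the 14 higher-level categories (taken from the
# introductory list of predicate types, not from Figure 7 itself).
CATEGORIES = [
    "Emotion", "Bodily States", "Cognition", "Attitudes", "Perception",
    "Gain", "Success", "Happening", "Hindrance", "Ontological States",
    "Speaking", "Possession", "Modality", "Evidentiality",
]

# Categories reported as missing in each branch in the description of Figure 7.
GAPS = {
    "Ancient Greek": set(),
    "Old Norse-Icelandic": {"Possession"},
    "Latin": {"Speaking"},
    "Old Russian": {"Speaking", "Perception"},
    "Old Lithuanian": {"Speaking"},
}


def branch_counts(categories, gaps):
    """Return, for each category, the number of branches attesting it."""
    return {
        category: sum(category not in missing for missing in gaps.values())
        for category in categories
    }


counts = branch_counts(CATEGORIES, GAPS)

# Categories attested in fewer than four of the five branches.
below_four = [category for category, n in counts.items() if n < 4]

for category, n in counts.items():
    print(f"{category:20s} {n} of {len(GAPS)} branches")
print("Fewer than four branches:", below_four)
```

Under these assumptions, Verbs of Speaking is the only category that falls below the four-branch threshold.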

Figure 7. A comparison of the higher-level semantic categories across Old Norse-Icelandic, Ancient Greek, Latin, Old Russian, and Old Lithuanian.

The Semantic Map Model is frequently used in typological and cross-linguistic comparisons. Its adequacy lies in its ability to model implicational hierarchies and diachronic predictions, as well as its ability to capture the relatedness of grammatical categories and their structure through adjacent regions in conceptual space. The last property is often referred to as the Semantic Map Connectivity Hypothesis (Croft 2001: 96). However, most semantic maps are intended to capture grammatical categories and not lexical categories (with the notable exceptions of Barðdal 2007; Francois 2008; Majid, Boster & Bowerman 2008; and Barðdal, Kristoffersen & Sveen 2011). We believe that semantic maps of lexical verb classes should be regarded as discrete networks rather than continuous categories, but the details of such networks are hard to represent given 232 sememes. For practical reasons we therefore present our lexical categories as continuous spaces on the semantic map in Figure 7, although in reality they form a discrete network rather than a set of continuous categories. The discrete nature of lexical categories is better captured by maps like those in Figures 8-9, although semantic maps like the one in Figure 7 are good for illustrative purposes, showing very clearly where some of the similarities and differences between languages lie (cf. Janda 2009). At the present stage, then, we do not intend the semantic map in Figure 7 to have diachronic or typological validity, as only through a careful examination of a set of (unrelated) languages is it possible to order the categories on the map in such a way that they represent an implicational hierarchy or diachronic predictions.

Let us now consider the computationally generated semantic maps in Figures 8-9, based on the 49 narrowly-circumscribed semantic verb classes. The maps were created using Principal Component Analysis (PCA), an unsupervised dimension-reduction method closely related to techniques such as Multidimensional Scaling (MDS) and Correspondence Analysis (CA). All three techniques are used to tease out the most important patterns in a multidimensional dataset (i.e. one with several row- and column-variables) so that it can be plotted in a two-dimensional graph. For a more thorough technical description of the methods and how they can be applied in linguistic research, see e.g. Jenset & McGillivray (2012).

Table 2. A portion of the by-language-and-class centroids that were used to cluster the branches according to their attestations in specific classes. The full data set has a total of 49 rows, each corresponding to a narrowly-circumscribed verb class.

Verb class               Greek   Latin   OLith   ON-I    OR
Anger/Irritation         0.00    0.75    0.00    0.50    0.00
Be Allowed               0.33    0.67    0.00    0.33    0.33
Be Surprised/Confused    0.33    0.00    0.17    0.67    0.17
Be In/Determined         0.29    0.71    0.00    0.29    0.00
Benefit                  0.33    0.44    0.11    0.67    0.33
Bitterness/Hate          0.75    0.50    0.00    0.00    0.25

In our case, the choice of method was motivated by the data at hand, which consists of binary (0 and 1) indications of whether a narrowly-circumscribed verb class is attested in our five Indo-European language branches or not. For each language branch we calculated the centroid of the verb class, essentially the mean (see e.g. Everitt & Hothorn 2006). To take the like/please class as an example, we find that there are eight sememes included in the class in the dataset. Ancient Greek, with three attested sememes in the class, hence gets a centroid of 3/8 = 0.375 as the best approximation of the lexical attestation of that specific class in that branch. The process was carried out for all classes and all branches, and the resulting data were used for the Principal Component Analysis. Table 2 shows a portion of the data as an illustration.
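The centroid calculation described above can be sketched in a few lines. The attestation vector is hypothetical in its ordering: the text states only that three of the eight like/please sememes are attested in Ancient Greek, not which ones, and the function name centroid is illustrative.

```python
def centroid(attestations):
    """Return the mean of a list of binary (0/1) sememe attestations."""
    if not attestations:
        raise ValueError("a verb class needs at least one sememe")
    return sum(attestations) / len(attestations)


# Hypothetical ordering: 3 of the 8 like/please sememes attested in Ancient Greek.
greek_like_please = [1, 1, 1, 0, 0, 0, 0, 0]

greek_centroid = centroid(greek_like_please)
print(f"like/please, Ancient Greek: {greek_centroid}")
```

Repeating this for every class and every branch would yield a 49-by-5 table of the kind excerpted in Table 2.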

PCA arranges the data according to their importance, or strength of contribution, to a set of dimensions called Principal Components (PCs). The PCs can be compared to the degree of association between categories in the data, and they are ordered so that the most important relationships are expressed in PC 1 (as a percentage of the total variation), the second most important in PC 2, and so on, until all variation is accounted for. Figure 8 shows the first two, i.e. the most important, dimensions. In this case, the first two dimensions together account for 67% of the total variation in the data (PC 1 + PC 2), a result which we consider more than acceptable, even if there is clearly more variation than the map shows.
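How the percentages per dimension arise can be illustrated with a small sketch that runs a PCA on the six-row excerpt in Table 2. Several choices here are assumptions rather than the procedure actually used: verb classes are treated as observations and branches as variables, the data are centred but not standardised, and the eigenvalues are found with a plain Jacobi rotation routine instead of a statistics package. Because only six of the 49 rows are used, the output will not reproduce the 67% and 82% reported for the full data.

```python
import math

# Six-row excerpt of Table 2; columns follow the order in BRANCHES.
BRANCHES = ["Greek", "Latin", "OLith", "ON-I", "OR"]
EXCERPT = {
    "Anger/Irritation":      [0.00, 0.75, 0.00, 0.50, 0.00],
    "Be Allowed":            [0.33, 0.67, 0.00, 0.33, 0.33],
    "Be Surprised/Confused": [0.33, 0.00, 0.17, 0.67, 0.17],
    "Be In/Determined":      [0.29, 0.71, 0.00, 0.29, 0.00],
    "Benefit":               [0.33, 0.44, 0.11, 0.67, 0.33],
    "Bitterness/Hate":       [0.75, 0.50, 0.00, 0.00, 0.25],
}


def covariance_matrix(rows):
    """Sample covariance of the columns (branches) across the rows (classes)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    centred = [[x - m for x, m in zip(row, means)] for row in rows]
    p = len(means)
    return [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(p)]
            for i in range(p)]


def symmetric_eigenvalues(matrix, sweeps=100, tol=1e-18):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that zeroes the (p, q) entry.
                theta = (math.pi / 4 if diff == 0
                         else 0.5 * math.atan(2 * a[p][q] / diff))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def explained_variance_ratios(rows):
    """Share of the total variance carried by each principal component."""
    eigenvalues = symmetric_eigenvalues(covariance_matrix(rows))
    total = sum(eigenvalues)
    return [value / total for value in eigenvalues]


ratios = explained_variance_ratios(list(EXCERPT.values()))
cumulative = [sum(ratios[: k + 1]) for k in range(len(ratios))]
for k, (ratio, running) in enumerate(zip(ratios, cumulative), start=1):
    print(f"PC {k}: {ratio:6.1%} of variance, cumulative {running:6.1%}")
```

The cumulative column corresponds to the kind of figure quoted in the text: the share of variation that remains visible when only the first few components are plotted.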

Figure 8. PCA map of the verb class data exemplified in Table 2, showing the first and the second dimension with a cumulative explained variation of 67%. Note in particular the opposition between Old Norse-Icelandic and Old Russian.

Figure 9. PCA map of dimensions 2 and 3 in the PCA solution shown in Figure 8, bringing the cumulative explained variation to 82%. Note the opposition between Ancient Greek/Latin and Old Russian.

The most striking feature of Figure 8 is the position of Old Norse-Icelandic as being distinct from the other branches, shown in PC 1. If we next turn to PC 2, we find that this dimension is distinguished by the difference between Old Russian and the other branches. Since Figure 8 only accounts for 67% of the variation, we have also included a plot of PC 3, shown in Figure 9.

Including a third dimension brings the cumulative explained variation up to 82%, which must be considered a good result. In Figure 9 we recognize PC 2 from Figure 8 (now represented as the horizontal axis) and PC 3 on the vertical axis. The interpretation of PC 2 is the same as in Figure 8: Old Russian stands out from the rest. Turning to PC 3, we find that Greek and Latin together stand out from the other branches, indicating a close relationship in terms of verb class attestations. Since the dimensions in the PCA solution are ordered, we can draw the following conclusions about the patterning in the verb-class data. First and most importantly, we see that Old Norse-Icelandic is clearly distinct from all the other languages. Secondarily, we note that Old Russian shows signs of important differences from the other languages and Old Norse-Icelandic in particular. Finally, we see that Ancient Greek and Latin together form a third dimension where they stand out from the others. It is also evident from Figures 8-9 that Old Lithuanian contributes little in terms of explained variation, suggesting that it shares several features with all the other branches.

Table 3. Summary of the multinomial logistic analysis of the similarities and differences between Old Norse-Icelandic, Ancient Greek, Latin, Old Russian, and Old Lithuanian with respect to the narrowly-circumscribed semantic verb classes.

Branch                 Chisq   Df   P-value   Sig-level
Ancient Greek          62.65   49   0.0911    n.s.
Latin                  74.92   49   0.0100    0.05
Old Lithuanian         56.63   49   0.2118    n.s.
Old Norse-Icelandic    75.47   49   0.0089    0.01
Old Russian            74.36   49   0.0112    0.05

The verb classes themselves are harder to interpret, since there is obviously much overlap, as seen in the two plots. However, this is hardly surprising, given both linguistic realities and the nature of the data. First, it would be surprising to find clear, unique, and unambiguous clouds of verb classes clustered with specific languages, given that the languages at hand are genetically related. Second, and perhaps more importantly, the underlying binary nature of the data makes a faithful representation more difficult. From this perspective, it should be noted that the 82% of the variation captured across the two maps (PCs 1-3 combined) is probably a very good result, indicating that there are real differences among the languages. This picture is therefore highly compatible with a scenario of early common development where the languages have had time to develop in different directions.

To follow up on the PCA results, we have also carried out a Multinomial Logistic Regression analysis on the data. Multinomial Logistic Regression is used to predict the probability of an outcome with more than two values (in our case, the 49 verb classes), which is estimated as a weighted average of the input data (in our case the binary, language-specific attestations of the sememes). For our purposes, the method is used to further help us distinguish among the language branches.

Figure 10. Reconstruction of the semantics of the Dative Subject Construction for a common proto-stage

The summary of the analysis shown in Table 3 corroborates the PCA results. Only Old Norse-Icelandic, Old Russian, and Latin are statistically significantly different from the overall average, supporting the view presented above that these are the branches with the strongest attestations of sememes with respect to verb classes.
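The P-values in Table 3 can be checked against the Chisq and Df columns with a short sketch. It assumes that each P-value is the upper-tail probability of a chi-square distribution with 49 degrees of freedom, which the text does not state explicitly. The function name chi2_upper_tail_odd_df is illustrative; it uses the standard closed-form series for odd degrees of freedom and nothing beyond the Python standard library.

```python
import math


def chi2_upper_tail_odd_df(x, df):
    """P(X > x) for a chi-square variable with an odd number of df.

    Uses the finite series 2*Q(chi) + 2*phi(chi) * sum_r chi^(2r-1) / (2r-1)!!,
    where chi = sqrt(x), Q is the standard normal upper tail, and phi is the
    standard normal density.
    """
    if df < 1 or df % 2 == 0:
        raise ValueError("this closed form only covers odd df >= 1")
    root = math.sqrt(x)
    normal_tails = math.erfc(root / math.sqrt(2.0))  # equals 2 * Q(chi)
    density = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    term = root  # first series term: chi^1 / 1
    series = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        series += term
        term *= x / (2 * r + 1)  # next term: chi^(2r+1) / (2r+1)!!
    return normal_tails + 2.0 * density * series


# (Chisq, reported P-value) per branch, copied from Table 3; Df is 49 throughout.
TABLE_3 = {
    "Ancient Greek": (62.65, 0.0911),
    "Latin": (74.92, 0.0100),
    "Old Lithuanian": (56.63, 0.2118),
    "Old Norse-Icelandic": (75.47, 0.0089),
    "Old Russian": (74.36, 0.0112),
}

recomputed = {
    branch: chi2_upper_tail_odd_df(chisq, 49)
    for branch, (chisq, _) in TABLE_3.items()
}
for branch, (chisq, reported) in TABLE_3.items():
    print(f"{branch:20s} chisq={chisq:6.2f} "
          f"recomputed p={recomputed[branch]:.4f} reported p={reported:.4f}")
```

Comparing the recomputed and reported columns is a quick consistency check on the table; it says nothing about how the regression itself was fitted.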

Figure 9 also shows that the earliest attested languages, Latin and Ancient Greek, cluster together, while the three more recently documented languages, Old Norse-Icelandic, Old Lithuanian, and Old Russian, deviate from these, and from each other. This is in fact highly compatible with the scenario presented in Figure 4 above, of early inheritance and productivity, suggesting that the construction is inherited but that it has become productive in the history of Germanic, Baltic, and Slavic, albeit in different ways. This is exactly what one would expect from an ancient inherited category, namely that it has developed and not remained static; moreover, if the developments are independent of each other, they are expected to go in different directions.

On the basis of all this, we would like to suggest a reconstruction of the semantics of the Dative Subject Construction for a common Indo-European proto-stage, as shown in Figure 10. We have excluded Verbs of Speaking, which are only documented in two language branches, Old Norse-Icelandic and Ancient Greek, but we have included all the other high-level semantic classes, as all of these are found in at least four language branches.

We would like to emphasize that we are not making claims about the structure of the semantic space of the Dative Subject Construction for Proto-Indo-European, as our investigation is based on only five Indo-European subbranches. In order to make such claims, more Indo-European language branches must be investigated. One could claim, however, that our reconstruction may be valid for a common West-Indo-European stage, provided that such a stage existed (cf. Kulikov 2009). What we have shown is how a reconstruction of constructional semantics may be accomplished within historical-comparative linguistics, irrespective of whether one can reconstruct any specific lexical predicates for the relevant proto-stage (for a reconstruction of a specific cognate set for the Dative Subject Construction in Proto-Indo-European, see Barðdal & Smitherman 2013).

To conclude, the present comparison of sememes found with the Dative Subject Construction across five different Indo-European language branches does not suggest an independent development but either a recent common development or an early inheritance. However, the fact that there are not very many cognate predicates found across these branches, instantiating the Dative Subject Construction, suggests an early inheritance rather than a recent common development. Furthermore, a Principal Component Analysis shows that Latin and Ancient Greek, the oldest attested branches in our database, are most similar to each other, while the later attested branches, Germanic, Slavic, and Baltic, all show signs of independent developments. In other words, our preliminary results, based on five branches of Indo-European, certainly suggest that the Dative Subject Construction is inherited in these branches, as opposed to being borrowed or having arisen independently in the different daughters (cf. Barðdal & Eythorsson 2009, 2012b, Barðdal 2013, Barðdal & Smitherman 2013, Barðdal et al. 2013). This inherited Dative Subject Construction has certain semantic properties, which are in principle reconstructable as a semantic space, meaning that even though we are not reconstructing any individual verbs and their lexical semantic meaning, it is still possible to reconstruct the meaning of more abstract argument structure constructions for earlier language stages and dead languages, given the tools of Cognitive Construction Grammar in combination with the Semantic Map Model. Finally, Principal Component Analysis and related techniques may be used to detect more fine-grained differences between the language branches; in this case it has shown that the later-documented languages have partly developed in different directions, while at the same time maintaining a common semantic core.

 