Practical implementation

In practice, SVSs record the co-occurrence frequencies of a set of target words with a large set of context words in a given window around the target words.[1] The choice of target words depends on the task: it can include all words in the corpus (e.g. to automatically identify taxonomic relations across the whole vocabulary), or it can be limited to a specific set of words, as in our case, where we want to group only the nouns and verbs occurring in the Dutch causative constructions into semantic classes. The choice of context words depends on how well they are able to represent the semantics of the target words. Highly frequent (function) words are usually excluded by means of a stop list, because they occur with almost all target words and therefore cannot discriminate one set of semantically related words from another. Context words with very low frequencies simply do not occur with enough target words to serve as a basis of comparison. Most SVSs therefore use a few thousand highly frequent context words minus a stop list of function words. The co-occurrence frequencies between target and context words are stored in vectors and collected in a large matrix. Table 2 illustrates such a co-occurrence matrix with a handful of context words. In reality, the matrix is high-dimensional, with thousands of target and context words: the length of the vectors, i.e. the number of columns, is equal to the number of context words, and the number of vectors (the rows) equals the number of target words. As shown, many co-occurrence counts are zero, making such matrices usually quite sparse. This is because words tend to co-occur with a limited set of context words, which is exactly the property that allows the technique to capture word semantics through context. In the toy matrix in Table 2, it is clear that kiss and hug must be semantically related, because they have high co-occurrence frequencies with the same context words (lovingly, mother, lovers). The same holds for kill and murder, which share high co-occurrence frequencies for gun, psychopath, knife and cruelly. Soap, on the other hand, has high co-occurrence frequencies with very different context words and is thus related to neither group.

Table 2: A matrix with imaginary co-occurrence frequencies of target words (rows) and contextual features (columns)

          gun  psychopath  knife  cruelly  lovingly  mother  lovers  detergent
kiss        2           2      0        0        89      56      98          0
hug         3           1      2        5        77      49      88          0
kill       10          59     67       69         0       8      12          1
murder     97          65     58       81         0       9       9          0
soap        0           0      0        0         1       0       1         67
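
To make the counting step concrete, the following minimal sketch shows how co-occurrence frequencies such as those in Table 2 can be collected; the toy corpus, the window size and the word lists are hypothetical illustrations, not the actual settings used in the study.

from collections import defaultdict

corpus = ["the", "psychopath", "killed", "his", "victim",
          "with", "a", "large", "knife"]
target_words = {"killed", "kissed"}                    # words to be characterised
context_words = {"psychopath", "knife", "lovingly"}    # contextual features
window = 3                                             # words to the left and right

counts = defaultdict(lambda: defaultdict(int))
for i, word in enumerate(corpus):
    if word not in target_words:
        continue
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if j != i and corpus[j] in context_words:
            counts[word][corpus[j]] += 1

# counts["killed"] now holds {"psychopath": 1}: one row of a matrix like Table 2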

To capture these collocational properties even better, the raw co-occurrence frequencies are usually weighted to represent collocational strength (e.g. Pointwise Mutual Information or the Log Likelihood Ratio). This has the effect of giving a higher weight to very informative context words, i.e. those that co-occur only with a limited set of semantically related target words. For the vector comparison, Semantic Vector Spaces use a geometrical approach (hence Vector Space): the weighted co-occurrence frequencies can be seen as co-ordinates defining a point in a high-dimensional context feature space. Points closer together in the space are then semantically more related. Figure 1 shows a 2D subspace of the high-dimensional space where kiss and hug are close together based on their shared relatively high co-occurrence frequency with lovingly and relatively low co-occurrence frequency with cruelly, and vice versa for kill and murder.

Figure 1: An imaginary 2D subspace of semantic vectors
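
As a rough illustration of the weighting step described above, the sketch below computes Pointwise Mutual Information values for the imaginary counts of Table 2 with NumPy; clipping negative values to zero (so-called positive PMI) is one common choice, not necessarily the one used in the study.

import numpy as np

# imaginary co-occurrence counts of Table 2 (rows: kiss, hug, kill, murder, soap)
counts = np.array([[ 2,  2,  0,  0, 89, 56, 98,  0],
                   [ 3,  1,  2,  5, 77, 49, 88,  0],
                   [10, 59, 67, 69,  0,  8, 12,  1],
                   [97, 65, 58, 81,  0,  9,  9,  0],
                   [ 0,  0,  0,  0,  1,  0,  1, 67]], dtype=float)

total = counts.sum()
p_joint = counts / total                                 # P(target, context)
p_target = counts.sum(axis=1, keepdims=True) / total     # P(target)
p_context = counts.sum(axis=0, keepdims=True) / total    # P(context)

with np.errstate(divide="ignore"):                       # log2(0) for zero counts
    pmi = np.log2(p_joint / (p_target * p_context))
ppmi = np.maximum(pmi, 0)                                # keep positive associations only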

As a proximity measure, most implementations use the cosine of the angle between the word vectors. If the angle is small, as between kill and murder, the cosine will be close to 1, indicating high similarity. If the angle is large, as between kill and kiss, the cosine is close to 0, indicating low similarity.[2] The computation of all pairwise cosine similarities between word vectors results in a word-by-word similarity matrix, shown in Table 3. Since the cosine is a symmetric similarity measure (the cosine of vector A with vector B is the same as the cosine of vector B with vector A), the matrix as a whole is also symmetric, with 1s on the diagonal (a target word is always completely similar to itself). Based on the similarity matrix, we can now derive for each target word a similarity ranking of all other target words. Depending on the application, the most similar words can then be synonym candidates for inclusion in a thesaurus, or possible replacements in a search query to a Question Answering system. What is important for this paper is that the similarity matrix can also serve as input for a clustering algorithm that tries to identify groups of similar target words, and hence semantic classes.

Table 3: Pairwise cosine similarities between word vectors (imaginary data)

        kiss   hug    kill   murder  soap
kiss    1      0.88   0.24   0.19    0.08
hug     0.88   1      0.18   0.26    0.11
kill    0.24   0.18   1      0.91    0.14
murder  0.19   0.26   0.91   1       0
soap    0.08   0.11   0.14   0       1
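
The cosine computation itself is straightforward; the sketch below derives a word-by-word similarity matrix of the kind shown in Table 3 and a similarity ranking with NumPy. Purely for illustration it uses the raw counts of Table 2 rather than weighted frequencies.

import numpy as np

words = ["kiss", "hug", "kill", "murder", "soap"]
vectors = np.array([[ 2,  2,  0,  0, 89, 56, 98,  0],
                    [ 3,  1,  2,  5, 77, 49, 88,  0],
                    [10, 59, 67, 69,  0,  8, 12,  1],
                    [97, 65, 58, 81,  0,  9,  9,  0],
                    [ 0,  0,  0,  0,  1,  0,  1, 67]], dtype=float)

unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
similarities = unit @ unit.T            # symmetric, 1s on the diagonal

# similarity ranking for one target word, most similar first
i = words.index("kill")
ranking = sorted(((similarities[i, j], w) for j, w in enumerate(words) if j != i),
                 reverse=True)
# 1 - similarities can serve as a distance matrix for a clustering algorithm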

[1] See Turney and Pantel (2010) for an overview of implementations and applications.
[2] See Weeds, Weir and McCarthy (2004) for an overview and precise mathematical characterisation of different similarity measures, including the cosine.
 