Performance by machine
Although we have presented simulation results concerning the reverse association task in section 4.3, these results were obtained using a much smaller test set than the one used for the human survey. We therefore conducted an additional system run using the human test data. The parameters were the same as described previously (section 4.2). However, as our corpus we did not use the BNC but rather ukWaC, a web-derived corpus of about 2 billion words. This has the advantage that our results can also be compared with those of the CogALex shared task [RAP 14], where the ukWaC corpus has also been used.
As done previously for the BNC, we lemmatized the ukWaC corpus and removed stop words. As our vocabulary we used a list of words which in the BNC had an occurrence frequency of 100 or higher. According to our definition of a word (string of alpha or of non-alpha characters), this was the case for 36,097 words. As with the ukWaC corpus, we then lemmatized this list and removed stop words. We also eliminated any strings containing non-alpha characters. This reduced the vocabulary to 22,578 words.
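The vocabulary filtering described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `build_vocabulary`, the lemma lookup table `lemma_of` and the set `stop_words` are hypothetical stand-ins, since the lemmatizer and stop word list actually used are not specified here.

```python
def build_vocabulary(freqs, lemma_of, stop_words, min_freq=100):
    """Filter a frequency list into a lemmatized, alphabetic vocabulary.

    freqs:      dict mapping word form -> corpus frequency
    lemma_of:   dict mapping word form -> lemma (hypothetical lookup table)
    stop_words: set of lemmas to discard (hypothetical list)
    """
    vocab = set()
    for word, freq in freqs.items():
        if freq < min_freq:
            continue  # frequency threshold (100 in the text)
        lemma = lemma_of.get(word, word)  # unknown forms are kept unchanged
        if lemma in stop_words:
            continue  # drop stop words
        if not lemma.isalpha():
            continue  # drop strings containing non-alpha characters
        vocab.add(lemma)  # inflected forms collapse onto one lemma
    return vocab


toy_freqs = {"houses": 150, "house": 900, "the": 50000,
             "42": 300, "rare": 5, "co-op": 120}
toy_vocab = build_vocabulary(toy_freqs, {"houses": "house"}, {"the"})
```

Because several word forms can map to the same lemma, the filtered list is necessarily shorter than the original frequency list, in line with the reduction reported above.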
Using this list, we built a co-occurrence matrix by counting word co-occurrences in the ukWaC corpus within a window of two words on either side of a given word. The resulting co-occurrence matrix was converted into a weight matrix by applying the log-likelihood ratio [DUN 93] to each value in the matrix. Finally, the product-of-ranks algorithm was applied to compute the results for each item of five words in the test set.
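The three steps of this pipeline (window-based counting, log-likelihood weighting and product-of-ranks) can be sketched as follows. This is a minimal sketch and not the original implementation: all function names are hypothetical, the 2x2 contingency table is derived from the pair counts alone, ties are broken alphabetically, and excluding the stimulus words from the candidate list is an assumption.

```python
import math
from collections import Counter


def cooccurrence_counts(tokens, vocab, window=2):
    """Count pairs (w, c) where c occurs within +/- `window` positions of w."""
    pair_counts = Counter()
    for i, w in enumerate(tokens):
        if w not in vocab:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] in vocab:
                pair_counts[(w, tokens[j])] += 1
    return pair_counts


def log_likelihood(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) of a 2x2 contingency table."""
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    cells = ((k11, rows[0], cols[0]), (k12, rows[0], cols[1]),
             (k21, rows[1], cols[0]), (k22, rows[1], cols[1]))
    # Cells with an observed count of zero contribute nothing to the sum.
    return 2.0 * sum(o * math.log(o * n / (r * c)) for o, r, c in cells if o > 0)


def llr_weights(pair_counts):
    """Replace each co-occurrence count by its log-likelihood score."""
    row, col = Counter(), Counter()
    for (w, c), k in pair_counts.items():
        row[w] += k
        col[c] += k
    n = sum(pair_counts.values())
    weights = {}
    for (w, c), k11 in pair_counts.items():
        k12, k21 = row[w] - k11, col[c] - k11
        weights[(w, c)] = log_likelihood(k11, k12, k21, n - k11 - k12 - k21)
    return weights


def product_of_ranks(stimuli, weights, vocab):
    """Rank candidates by the product of their ranks over all stimulus words."""
    candidates = sorted(vocab)  # alphabetical order makes tie-breaking deterministic
    product = {c: 1 for c in candidates}
    for s in stimuli:
        ordered = sorted(candidates, key=lambda c: -weights.get((s, c), 0.0))
        for rank, c in enumerate(ordered, start=1):
            product[c] *= rank
    # Assumption: a stimulus word is never returned as its own solution.
    return sorted((c for c in candidates if c not in stimuli),
                  key=lambda c: product[c])


toy_weights = {("a", "x"): 5.0, ("a", "y"): 3.0, ("a", "z"): 1.0,
               ("b", "x"): 7.0, ("b", "y"): 6.0, ("b", "z"): 1.0}
toy_ranking = product_of_ranks(["a", "b"], toy_weights, {"a", "b", "x", "y", "z"})
```

In this toy example the candidate `x` is ranked first for both stimulus words, so it has the smallest rank product and heads the returned list. Multiplying ranks rather than adding raw weights keeps a single very strong association from dominating the result.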
Of the 2,000 items, the system got 613 right, i.e. the word at rank 1 of the computed list was identical to the expected word as provided in the test set. This corresponds to an accuracy of 30.65%. Despite the larger corpus (ukWaC is about 20 times larger than the BNC), this accuracy is considerably lower than the top performance figure of 54% reported in section 4.3. However, the discrepancy is not surprising as, in section 4.3, the test was conducted with the 100 words from the Kent-Rosanoff test. This test contains mostly very common words well known for their salient associations, which are comparatively easy to predict. Also, in section 4.3, the words in the test set had been lemmatized, which is not the case here.
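The evaluation criterion used here, an exact match at rank 1, can be made explicit with a short sketch. The function name `accuracy_at_rank_1` is hypothetical and the example data are invented for illustration; only the figures 613 and 2,000 come from the text.

```python
def accuracy_at_rank_1(ranked_lists, expected):
    """Share of items whose top-ranked word equals the expected word exactly."""
    correct = sum(1 for ranking, gold in zip(ranked_lists, expected)
                  if ranking and ranking[0] == gold)
    return correct / len(expected)


# Invented toy data: only the first item is solved at rank 1.
toy_accuracy = accuracy_at_rank_1([["cold", "ice"], ["dog"], []],
                                  ["cold", "cat", "sun"])

# Figures from the text: 613 correct items out of 2,000.
reported_accuracy = 100 * 613 / 2000  # 30.65 (percent)
```

Since the comparison is on exact strings, an inflected form in the test set cannot be matched by a lemmatized vocabulary entry, which is one reason why the missing lemmatization of the test set matters.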
Therefore, let us compare our results with those of the CogALex shared task [RAP 14]. There, on exactly the same test set and based on the same corpus, the best system showed a performance of 30.45%, which almost exactly matches the performance of our system. However, while this system used sophisticated technology involving word embeddings (neural network technology), we achieved a similar result using very simple technology. It should also be noted that we did not do any parameter optimization and simply used the parameters from a previous paper [RAP 13], i.e. to obtain the current results, we did not even look at the training set, which was also provided for the CogALex shared task.
It should also be noted that our selection of vocabulary was completely unrelated to the current task. Our word list was derived from the BNC rather than from ukWaC, and the fact that the test set was derived from the EAT had no influence on our choice of words. We also used an arbitrary frequency threshold (BNC frequency of 100) for the selection of our vocabulary, rather than trying to optimize this threshold using the training set.
The choice of vocabulary is very important for this task, and much better results could be achieved by making “informed guesses” about it. Of the 2,000 unique solutions in the test set, only 1,482 occurred in our vocabulary of 22,578 words, i.e. for 518 items our system had no chance of producing the correct solution.
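The effect of vocabulary coverage on the achievable accuracy can be illustrated with a short calculation. The function name `coverage` is hypothetical and the toy data are invented; the figures 1,482, 2,000 and 613 are those given in the text.

```python
def coverage(solutions, vocabulary):
    """Return how many expected solutions are in the vocabulary, and their share."""
    vocab = set(vocabulary)
    found = sum(1 for s in solutions if s in vocab)
    return found, found / len(solutions)


# Invented toy data: two of the four solutions are in the vocabulary.
toy_coverage = coverage(["a", "b", "c", "d"], {"a", "c", "x"})

# Figures from the text.
upper_bound = 100 * 1482 / 2000       # 74.1 (percent): best accuracy reachable
accuracy_on_covered = 613 / 1482      # about 0.414 of the reachable items
```

With this vocabulary, no system can exceed 74.1% on the test set, and the 613 correct answers correspond to roughly 41% of the 1,482 items whose solution was in the vocabulary at all.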