 # Evaluation results

The results of the evaluation are presented in Figures 6.1 and 6.2 for the two initializations described in section 6.4.2: uniform (Figure 6.1) and geometric (Figure 6.2).

Figure 6.1. Uniform distribution. Results as F1, varying the number of labeled nodes. For a color version of this figure, see www.iste.co.uk/sharp/cognitive.zip

Figure 6.2. Geometric distribution. Results as F1, varying the number of labeled nodes.

As the plots show, the performance of our system on S7CG is very different from its performance on the other datasets. This is because S7CG is coarse-grained, which means that the disambiguation of each word is not restricted to a single sense, as in the fine-grained datasets, but to a set of similar senses.

An important aspect to note is that the performance of the system increases consistently as the number of labeled points grows. This is particularly evident on S7, where the performance rises from 0.43 to 0.57 using the uniform distribution and from 0.55 to 0.63 using the geometric distribution. For the other datasets, the improvements given by the labeled points are in the range of 3 to 5%.

Figure 6.3. Results as F1 using the geometric distribution and considering the labeled nodes as correct. The results are compared with the best supervised system on each dataset.

The information given by labeled points is more effective when we use a uniform distribution to initialize the strategy space of the system. This can be explained by the fact that the uniform initialization carries less prior information, so the labeled points can compensate for what is missing.
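To make the difference between the two initializations concrete, the following is a minimal sketch rather than the system's actual code. It assumes that the senses of a word are ranked (for example by frequency), that the geometric initialization assigns a mass proportional to p(1 - p)^r to the sense at rank r before normalizing, and that a labeled point is represented by putting all the mass on its known sense. The function names and the value of `p` are hypothetical.

```python
def uniform_init(n_senses):
    """Every sense starts with the same probability (no prior information)."""
    return [1.0 / n_senses] * n_senses


def geometric_init(n_senses, p=0.4):
    """Mass decays geometrically with the sense rank (rank 0 = top-ranked sense).

    The parameter p is an illustrative assumption, not a value from the text.
    """
    mass = [p * (1.0 - p) ** rank for rank in range(n_senses)]
    total = sum(mass)
    # Normalize so the truncated geometric masses sum to 1.
    return [m / total for m in mass]


def labeled_init(n_senses, correct_index):
    """Assumed representation of a labeled point: all mass on the known sense."""
    return [1.0 if i == correct_index else 0.0 for i in range(n_senses)]


if __name__ == "__main__":
    print(uniform_init(4))
    print(geometric_init(4))
    print(labeled_init(4, 2))
```

Under these assumptions, the uniform vector is maximally uninformative, whereas the geometric vector already favors the top-ranked senses, which is consistent with labeled points adding more when the uniform initialization is used.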

# Comparison with state-of-the-art algorithms

Figure 6.3 compares our system, using a geometric distribution to initialize the strategy space of the games, with supervised systems. For each dataset, we compared our results with the best system that participated in the corresponding competition, provided that its performance is higher than that obtained with It Makes Sense [ZHO 10], a well-known supervised system.
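The scoring convention used in Figure 6.3, where labeled nodes are considered correct, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the evaluation script used for the experiments: it assumes the usual definitions of precision (correct over attempted), recall (correct over total) and F1, and it represents the gold standard as a set of acceptable senses per word, so that a coarse-grained dataset is handled by larger sets. All function and variable names are hypothetical.

```python
def f1_score(n_correct, n_attempted, n_total):
    """Harmonic mean of precision and recall; returns 0.0 for empty inputs."""
    if n_attempted == 0 or n_total == 0:
        return 0.0
    precision = n_correct / n_attempted
    recall = n_correct / n_total
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f1_with_labeled(predictions, gold, labeled=frozenset()):
    """F1 where every labeled word counts as attempted and correct.

    predictions: dict mapping word id -> predicted sense (missing = no answer)
    gold:        dict mapping word id -> set of acceptable senses
    labeled:     set of word ids whose sense was given to the system
    """
    correct = 0
    attempted = 0
    for word_id, gold_senses in gold.items():
        if word_id in labeled:
            # Labeled nodes are considered correct by construction.
            correct += 1
            attempted += 1
            continue
        predicted = predictions.get(word_id)
        if predicted is None:
            continue  # unanswered words lower recall only
        attempted += 1
        if predicted in gold_senses:
            correct += 1
    return f1_score(correct, attempted, len(gold))


if __name__ == "__main__":
    gold = {"w1": {"a"}, "w2": {"b"}, "w3": {"c", "d"}, "w4": {"e"}}
    predictions = {"w1": "a", "w2": "x", "w3": "d"}
    print(f1_with_labeled(predictions, gold))
    print(f1_with_labeled(predictions, gold, labeled={"w4"}))
```

In this toy example, labeling `w4` raises both precision and recall, which mirrors the way the curves in the figures grow with the number of labeled nodes.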

From the plots, we can see that, on S7CG, our system outperforms the supervised systems even without using labeled points. This setting is the same as the one proposed in [TRI 17]. On the other datasets, the performance of our system follows a similar trend once labeled points are added: on S2 and S3, we require 50 labeled points to outperform the supervised systems and, on S7, 15. These numbers correspond to 2.09, 2.49 and 3.29 percent of S2, S3 and S7, respectively.

•  This system achieves higher results on S7CG and S3. 