Human Action Recognition in Football

Unlike the golf putting described in the previous section, football involves athletes running in a straight line, side-running, and changing direction quickly, at different velocities and rhythms (Dicharry, 2010). Such a variety of movements within a single action shows how challenging football can be for action recognition. This area of study has gained popularity with the promise of helping coaches, physiotherapists, and the rest of the team staff understand how players behave during matches and, with that, make faster and more efficient decisions. To understand an athlete’s behaviour, regardless of the sport practised, feature selection is key, since some features can be more informative than others, as addressed in Chapter 4.

This section intends to:

  1. Investigate the feasibility of combining and fusing data from different wearable technologies, namely Myontec’s Mbody3 and Ingeniarius’s TraXports, to recognize actions.
  2. Evaluate the most adequate features for identifying actions such as (i) running, (ii) running with the ball, (iii) walking, (iv) walking with the ball, (v) passing, (vi) shooting, and (vii) jumping.
  3. Compare the two sequence classification algorithms presented earlier in this chapter: LSTM and DBMM.

This study, fully described in Rodrigues et al. (2020), consisted of recording four futsal matches over a two-day tournament. Each of the four games had two 10-minute halves. Twenty-two injury-free males (22.2 ± 4.5 years; minimum 19, maximum 39 years) participated in this study. However, only one of them was equipped with the combination of wearable technologies mentioned above, which allowed tracking both positional and physiological data. A video camera was also employed for posterior ground-truth analysis: the recorded videos were synchronized with the wearables, making it possible to manually label the trials according to the action performed by the athlete.

Both LSTM and DBMM were adopted and compared to investigate the feasibility of this solution. As both are supervised methods, labelled training and testing datasets were needed to compare each model’s output with the expected result (the ground-truth label), as described previously. Table 5.1 presents the total number of trials for each action. This allows us to

TABLE 5.1 Number of trials for each different action/class

Action                   Number of trials
Running                  —
Running with the ball    690
Walking                  —
Walking with the ball    —
Passing                  —
Shooting                 15
Jumping                  —
assess the representativeness of each action, or class, which explains, to some extent, how well the methods identify each of them.

As input data for the sequence classification methods, a total of nine time-dependent features were considered: the athlete’s (absolute) velocity, distance, and orientation towards the opponent’s goal, computed from kinematical data extracted from TraXports, and the normalized muscle activation values acquired from the athlete’s lower limbs with Mbody3 (see the previous chapter). Feature selection was carried out before training and testing the model. Afterwards, feature extraction was applied to the entire dataset, processing the raw data into the relevant time series used as input.
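A minimal sketch of this feature-extraction step, assuming 2-D positional samples, six EMG channels, and a fixed goal position (the function name, column order, and normalization scheme are illustrative assumptions, not the study’s exact pipeline):

```python
import numpy as np

def extract_features(pos, emg, goal, dt):
    # pos: (N, 2) positions; emg: (N, C) raw EMG; goal: (2,); dt: sample period
    disp = np.diff(pos, axis=0, prepend=pos[:1])        # per-sample displacement
    velocity = np.linalg.norm(disp, axis=1) / dt        # absolute velocity
    to_goal = goal - pos                                # vectors towards the goal
    distance = np.linalg.norm(to_goal, axis=1)          # distance to the goal
    heading = np.arctan2(disp[:, 1], disp[:, 0])        # direction of motion
    goal_dir = np.arctan2(to_goal[:, 1], to_goal[:, 0])
    # signed orientation towards the goal, wrapped to [-pi, pi]
    orientation = (goal_dir - heading + np.pi) % (2 * np.pi) - np.pi
    # peak-normalized muscle activation per channel
    emg_norm = emg / (np.abs(emg).max(axis=0) + 1e-9)
    return np.column_stack([velocity, distance, orientation, emg_norm])
```

With six EMG channels this yields the nine time-dependent features mentioned above (velocity, distance, orientation, plus six activations).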

First, the impact of each feature on the classification result was assessed by performing five tests with the following combinations as the LSTM classifier input:

Test 1 - EMG features only

Test 2 - EMG features and velocity

Test 3 - EMG features and distance to the goal

Test 4 - EMG features and orientation towards the goal

Test 5 - All features
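The five input combinations can be sketched as column selections over the feature matrix; the column layout below is an assumption for illustration, not the study’s actual ordering:

```python
import numpy as np

# Assumed column layout for a (timesteps x 9) feature matrix:
# 0 velocity, 1 distance to goal, 2 orientation towards goal, 3-8 EMG channels.
VEL, DIST, ORI, EMG = [0], [1], [2], list(range(3, 9))

TESTS = {
    "Test 1": EMG,                      # EMG features only
    "Test 2": EMG + VEL,                # EMG and velocity
    "Test 3": EMG + DIST,               # EMG and distance to the goal
    "Test 4": EMG + ORI,                # EMG and orientation towards the goal
    "Test 5": EMG + VEL + DIST + ORI,   # all features
}

def select(X, columns):
    # Keep only the columns used by a given test
    return X[:, columns]
```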

From the results obtained, it was possible to observe that the accuracy of the LSTM increased from 59.60% to 60.13% when the player’s velocity was added to the EMG features. The same happened with the distance and orientation towards the opponent’s goal, which increased the model’s accuracy by 1.07% and 0.21%, respectively. When all the features were used, the accuracy improvement of the LSTM was 1.32% (see Figure 5.11). Refer to Sokolova et al. (2006) for further classification performance metrics and to Rodrigues et al. (2020) for an in-depth discussion of these results.

Subsequently, the input dataset containing all the features was adopted to compare the LSTM and the DBMM. To train and test the models more diversely, the process of randomly splitting the data into training (70%) and test (30%) sets was repeated 30 times. Within each split, no trial appeared in both the training and test sets, avoiding overfitting. A performance comparison between the DBMM and the LSTM is shown in Figure 5.12; for simplicity, a single representative train-and-test run is depicted. It shows that the DBMM outperformed the LSTM, reaching an overall accuracy of 88.5% against 66.1%.
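The repeated random 70/30 split can be sketched as follows (a minimal illustration, not the study’s exact procedure; the function name and seed are assumptions):

```python
import numpy as np

def repeated_splits(n_samples, n_repeats=30, train_frac=0.7, seed=0):
    # Each repeat draws a fresh random permutation of the trial indices;
    # within a split, the training and test indices are disjoint by construction.
    rng = np.random.default_rng(seed)
    cut = int(round(train_frac * n_samples))
    for _ in range(n_repeats):
        idx = rng.permutation(n_samples)
        yield idx[:cut], idx[cut:]
```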

This is an interesting use case since, in general, one would expect deep learning to outperform an ensemble approach. That might have been the case had the dataset contained a more consistent representation of each class; here, however, some classes were underrepresented compared with others. For instance, the overall dataset had only 15 trials of the shooting action, against 690 trials of ‘running with the ball’. Put differently, the LSTM, as a deep learning approach, is known to require a large, representative dataset, yet while some actions were well represented in this dataset, others were not. This justifies the behaviour observed in the confusion matrix of the LSTM model (Figure 5.12), where the accuracy discrepancies between actions were substantial, making the learning


FIGURE 5.11 Evaluation metrics per test (adapted from Rodrigues et al. (2020)).


FIGURE 5.12 Confusion matrices: (top) DBMM, (bottom) LSTM.

process of the algorithm, and its consequent decisions, strongly influenced by the classes/actions containing the largest amounts of data. This is the main reason why the LSTM network presented an inferior performance compared to the DBMM.
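One standard mitigation for such imbalance (not applied in the study, shown here only to make the 690-versus-15 disparity concrete) is to weight each class inversely to its frequency during training:

```python
import numpy as np

def class_weights(labels):
    # Inverse-frequency weights: rare classes get proportionally larger
    # weights, so the loss does not favour the majority class.
    classes, counts = np.unique(labels, return_counts=True)
    weights = counts.sum() / (len(classes) * counts)
    return dict(zip(classes, weights))
```

With 690 ‘running with the ball’ trials and 15 ‘shooting’ trials, the shooting class receives a weight roughly 46 times larger than the running class.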


Two types of AI approaches for pattern recognition in sports were introduced: non-sequence and sequence classification. Non-sequence classification deals with ‘static’ data, where each sample is described by a set of features of fixed dimension. Sequence classification, in contrast, is characterized by variable-sized inputs, where each sample is represented by one or more features whose values change over time. Most of the time, non-sequence classification problems involve independent situations; in a sports context, this can be the number of times a movement is performed. Nevertheless, certain dynamic movements can still be modelled by extracting summary variables from them and feeding those to non-sequence classifiers. Sequence classification problems, in turn, involve evolving situations, i.e., a movement performed over a certain time interval.
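The distinction can be made concrete with two toy samples (all values are hypothetical): a non-sequence sample is a single fixed-length vector, whereas sequence samples are time series whose lengths differ from trial to trial:

```python
import numpy as np

# Non-sequence sample: one fixed-length feature vector per example,
# e.g. summary statistics of a movement.
static_sample = np.array([1.8, 0.4, 7.0])  # mean speed, speed std, repetitions

# Sequence samples: variable-length time series, one row per timestep;
# trials of the same action may have different durations.
sequence_a = np.zeros((120, 9))  # a 120-timestep trial with nine features
sequence_b = np.zeros((310, 9))  # a longer trial with the same feature set
```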

As expected, the more powerful the machine learning algorithm, the more complex the problems it can deal with; deep learning methods are known for their outstanding performance. However, this benefit comes at the cost of requiring more training data and, consequently, larger computational resources. Deep learning approaches are known for their high GPU requirements, training being the most resource-intensive task in the whole pattern recognition cycle. Moreover, if the data is not well represented, deep learning methods may even perform worse than other approaches: highly imbalanced datasets pose added difficulty, leading such methods to exhibit a bias towards the majority class and, in extreme cases, to ignore the minority class altogether.

