Empowering Human Intelligence: The Ecological Dynamics Approach to Big Data and Artificial Intelligence in Sport Performance Preparation
Big Data in Sport
Digital technology has had a profound impact on sport (Miah, 2017). Athletes and coaches rely on digital data to monitor and enhance performance. Officials use tracking systems to augment their judgement. Audiences use collectively shared data, which purportedly expands the places in which sport can be watched and experienced.
Nowadays, technology enables practitioners, performers, and spectators to collect and store massive amounts of data faster, in greater abundance, and in more diverse forms than ever before. Data can be collected in different formats from various sensors and devices, through independent or connected applications. This data avalanche has outpaced the human capability to process, analyse, store, and understand the information contained in these datasets. Moreover, people and devices are becoming increasingly interconnected. The growing number of connected components generates massive datasets, and valuable information must be discovered from patterns within these data to help improve performance, safety, health, and well-being. Technological advancements have not only led to an abundance of new data streams, repositories, and computational power, but have also driven advances in statistical and computational techniques, such as artificial intelligence, that have enabled widespread analysis of such datasets in many domains, including sport, improving our ability to plan, prepare, and predict performance outcomes. It is therefore unsurprising that big data is also entering research programmes in the sport sciences (Goes et al., 2020; Rein & Memmert, 2016; Chapter 2). Big data broadly refers to proliferating, multiform data (e.g., structured, unstructured), together with their supporting technological infrastructure (i.e., capture, storage, processing) and the analytic techniques that can enhance research (Woo, Tay, & Proctor, 2020).
Big data, a term probably coined by John Mashey in the mid-1990s (Gandomi & Haider, 2015), identifies datasets that, owing to their size and complexity, cannot be managed with the traditional methodologies of a particular problem domain to obtain meaning (Proctor & Xiong, 2020). Volume, Variety, and Velocity (the three Vs) have consequently emerged as a common framework for describing big data. The three Vs can be defined as follows (Gandomi & Haider, 2015): (i) Volume relates to the size of the data (many terabytes and even exabytes); (ii) Variety refers to the types of data (e.g., text, physical sensor data, audio, video, graphs) and their structure (e.g., structured or unstructured); (iii) Velocity indicates the continuous generation of data streams and the speed at which those data should be analysed. Additional Vs are now being discussed (Proctor & Xiong, 2020), such as Variability (variation in the data flow), Veracity (imprecision of the data), and Value (obtaining meaning that informs decisions in ways only possible with big data). Relatedly, big data mining is the capability of obtaining useful information from these large datasets (Fan & Bifet, 2014). One way of mining big data is by means of artificial intelligence, as described in the remaining chapters of this book.
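To make the Velocity dimension more concrete: when data arrive as a continuous stream, summaries often have to be updated as each sample arrives rather than recomputed over a stored dataset. The minimal Python sketch below illustrates one such technique, Welford's online algorithm for a running mean and variance. It is offered only as an illustration, not as a method proposed in this chapter; the names `RunningStats` and `summarise_stream`, and the simulated speed readings, are hypothetical.

```python
import math
from typing import Iterable, Tuple


class RunningStats:
    """Welford's online algorithm: updates mean and variance one sample at a time,
    so the full stream never needs to be stored."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean          # deviation from the old mean
        self.mean += delta / self.n    # incremental mean update
        self._m2 += delta * (x - self.mean)  # uses old and new mean for stability

    @property
    def variance(self) -> float:
        # Sample variance (n - 1 denominator); undefined for fewer than two samples.
        return self._m2 / (self.n - 1) if self.n > 1 else float("nan")


def summarise_stream(samples: Iterable[float]) -> Tuple[int, float, float]:
    """Consume an iterable of readings once; return (count, mean, sample SD)."""
    stats = RunningStats()
    for x in samples:
        stats.update(x)
    return stats.n, stats.mean, math.sqrt(stats.variance)


if __name__ == "__main__":
    # Hypothetical stream of speed readings (m/s); a generator is used so
    # the readings are never held in memory all at once.
    readings = (5.0 + 0.1 * (i % 7) for i in range(100_000))
    n, mean, sd = summarise_stream(readings)
    print(f"n={n}, mean={mean:.3f} m/s, sd={sd:.3f} m/s")
```

The design choice here is that memory use stays constant regardless of how long the stream runs, which is the property that matters when data are generated faster than they can be stored and re-analysed.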
For sport scientists and practitioners, the challenges begin with understanding how to obtain and access data, followed by how to process and clean big data into formats usable for research and athlete support goals (Endel & Piringer, 2015). At the same time, the collected data may be incomplete, which requires methods to detect, deal with, and transform missing data. Also, the traditional statistical method of null hypothesis testing at an alpha level of 0.05 loses its meaning, because with the very large sample sizes involved in big datasets even very small, practically negligible differences become statistically significant. Thus, an obvious accompanying challenge is to understand how to obtain meaningful information and predictions from big data. One solution is to place more emphasis on statistics and computational modelling (Proctor & Xiong, 2020), such as machine learning (e.g., Couceiro, Dias, Mendes, & Araujo, 2013; see Chapter 2 for a review). Another possible, complementary solution, discussed at the end of this chapter, is to become theoretically informed about what data to obtain, how to process it, and how to interpret it, instead of simply relying on computational brute force.
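The sample-size problem with null hypothesis testing can be illustrated with a short worked computation. The Python sketch below assumes a hypothetical comparison of two equal-sized groups with a fixed, practically negligible mean difference (0.02 units) and a known common standard deviation (1.0 unit), and applies a two-sided two-sample z-test computed from these summary values. The numbers are illustrative rather than data from any study, and the function name `two_sample_z_test` is hypothetical.

```python
import math


def two_sample_z_test(mean_diff: float, sd: float, n_per_group: int) -> tuple[float, float]:
    """Two-sided z-test for a difference in means between two equal-sized groups
    that share a known standard deviation `sd`. Returns (z, p)."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = mean_diff / standard_error
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value under N(0, 1)
    return z, p


# Hypothetical scenario: the same tiny difference evaluated at growing sample sizes.
MEAN_DIFF = 0.02           # assumed mean difference between groups
SD = 1.0                   # assumed common standard deviation
cohens_d = MEAN_DIFF / SD  # standardised effect size; independent of n

for n in (100, 10_000, 1_000_000):
    z, p = two_sample_z_test(MEAN_DIFF, SD, n)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"n={n:>9,}  d={cohens_d:.2f}  z={z:6.2f}  p={p:.3g}  -> {verdict} at alpha = 0.05")
```

With these assumed values, the effect size stays at d = 0.02 for every sample size, yet p is roughly 0.89 at n = 100 per group and roughly 0.16 at n = 10,000, and falls far below 0.05 at n = 1,000,000. Reporting an effect size alongside p is one common way of keeping such results interpretable.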