Perspectives for Applying Machine Learning and Big Data Analytics to OSH Management Processes
As mentioned in section 6.2.2, recent progress in ICT, in particular the development of applications based on cloud and edge computing, makes it possible to collect and process ever larger amounts of data. This includes data collected through smart PPE and workplace wearables, which allow real-time monitoring of working conditions and of workers’ behaviour with respect to safety and health at work. Moreover, the growing volume of data obtained by these systems makes it possible to use artificial intelligence methods, including machine learning and big data analytics algorithms, to obtain new knowledge and useful insights that can support advanced OSH management functions. The purpose of this section is to present some leading concepts in this field, review selected examples of their practical application, and discuss perspectives for their further development and use in supporting key processes of OSH management systems.
Basic Notions and Underlying Concepts
From the very beginning of the development and promotion of the concept of the digital transformation of industry, terms such as “artificial intelligence”, “machine learning”, “deep learning”, “big data analytics” and “predictive analytics” have appeared more and more frequently, both in the professional literature on new industrial technologies and in the scientific literature on computer science and related domains, as well as in other media. These concepts are largely inter-related and often used interchangeably; therefore, to discuss them properly, brief definitions are presented below.
Artificial intelligence (AI) is defined in the literature in many ways, depending on the needs of the authors or users of a given definition. For example, the Merriam-Webster Dictionary (2020a) defines it as: “1: a branch of computer science dealing with the simulation of intelligent behaviour in computers, 2: the capability of a machine to imitate intelligent human behaviour”. The Council of the Organisation for Economic Co-operation and Development (OECD) gives the following definition: “An AI system is a machine-based system that can, for a given set of human-defined objectives, make predictions, recommendations, or decisions influencing real or virtual environments. AI systems are designed to operate with varying levels of autonomy” (OECD 2019). The second definition refers to the development and deployment of “intelligent” systems and therefore seems more appropriate to bear in mind when reading this chapter.
Machine learning (ML) is a dynamically developing sub-area of AI that deals with algorithms for analysing and learning from data and then using the acquired knowledge to produce useful insights and/or to predict the future course of various phenomena. Machine learning algorithms are commonly divided into three main categories, i.e. (1) supervised learning, (2) unsupervised learning and (3) reinforcement learning, although many scholars propose more complex taxonomies. For example, Ayodele (2010) extends this division with three additional categories, namely semi-supervised learning, transduction, and learning to learn, while Dey (2016) lists another four: multi-task learning, ensemble learning, neural networks, and instance-based learning. A description and discussion of the different types and capabilities of machine learning algorithms is outside the purpose and scope of this chapter, but interested readers can find more in the extensive literature and many Internet resources on the subject, e.g. Ayodele (2010), Dey (2016), Kelleher et al. (2015), and Kubat (2017).
Deep learning is in turn a sub-field of machine learning that can be defined as “an artificial intelligence function that imitates the workings of the human brain in processing data and creating patterns for use in decision making. Deep learning is a subset of machine learning in artificial intelligence (AI) that has networks capable of learning unsupervised from data that is unstructured or unlabelled. Also known as deep neural learning or deep neural network” (Investopedia 2020b). Deep learning methods make it possible to solve complex cognitive problems such as object detection, speech recognition, translation of texts into other languages, recognition and classification of images, etc.
Predictive analytics is a form of advanced data analytics that aims to extract information from the analysed data sets in order to identify specific trends and patterns, which can then be used to predict future trends or events. The techniques of predictive analytics can generally be divided into two main groups: the first aims at discovering historical patterns in the outcome variables and extrapolating them to the future, while the second aims at capturing the interdependencies between outcome variables and explanatory variables and exploiting them to make predictions (Gandomi and Haider 2015). Predictive analytics may be applied to both small and very large data sets.
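To make the distinction between the two groups more concrete, the Python sketch below applies ordinary least squares in both ways: first to extrapolate the trend of an outcome variable over time, then to relate an outcome variable to an explanatory variable. It is a minimal illustration under stated assumptions: the figures (yearly near-miss counts, weekly overtime hours, incident rates) are invented, and a straight-line fit stands in for the much richer models used in practice.

```python
"""Two families of predictive analytics techniques, sketched with least squares.

All numbers are invented illustrative values, not real OSH data.
"""


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# (1) Extrapolate a historical pattern of the outcome variable itself:
# hypothetical yearly counts of recorded near misses, indexed by year number.
years = [1, 2, 3, 4, 5]
near_misses = [40, 36, 33, 29, 26]
a1, b1 = fit_line(years, near_misses)
forecast_year_6 = a1 + b1 * 6

# (2) Exploit the dependence of the outcome on an explanatory variable:
# hypothetical weekly overtime hours vs. incidents per 100 workers.
overtime_hours = [0, 2, 4, 6, 8]
incident_rate = [1.0, 1.4, 1.8, 2.2, 2.6]
a2, b2 = fit_line(overtime_hours, incident_rate)
predicted_rate_at_5h = a2 + b2 * 5

print(f"trend forecast for year 6: {forecast_year_6:.1f}")
print(f"predicted incident rate at 5 h overtime: {predicted_rate_at_5h:.2f}")
```

The first fit uses only the history of the outcome, so it cannot anticipate a change in working conditions; the second can, but only for the explanatory variables it was given.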
Big data refers to high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation (Gartner IT Glossary 2020a); however, the literature contains many similar, although differently formulated, definitions of this term. Big data was initially described by three main determinants, often referred to as the three Vs (Gandomi and Haider 2015), i.e. volume (the amount of data, e.g. in terabytes or petabytes), variety (structured and/or unstructured data in a variety of forms), and velocity (the speed of data generation: real time, near real time or off-line). More recently, however, some authors and influencers in the field of big data analytics and related domains (e.g. Marr 2014) have pointed to two further Vs as important features of big data, namely Veracity, which refers to the messiness or trustworthiness of the data, and Value, which emphasises the need to collect data that have business value, so that the benefits of processing them outweigh the costs incurred.
Big data analytics concerns the application of specific methods to the analysis of data sets that match the above definition of big data, for which analysis with traditional statistical methods is impractical or impossible. The aim of big data analytics is to obtain useful, actionable insights that can support decisions leading to the development of new products and services, more efficient processes and lower business costs, and, more generally, to acquire new knowledge that can then be used for many different purposes. The process of big data analytics can be roughly divided into two stages: data management and data analytics (Gandomi and Haider 2015). Data management includes data acquisition and recording; data extraction, cleaning and annotation; and data integration, aggregation and representation. The second stage consists of data modelling and analysis, followed by the interpretation of the results.
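The two stages can be illustrated with a deliberately small Python sketch operating on hypothetical air-temperature readings from workplace wearables. The worker IDs, zone names, plausible-value range and the 28 °C action level are assumptions made only for this example; a real big data pipeline would run on distributed infrastructure rather than on in-memory lists.

```python
"""Toy two-stage pipeline: data management, then data analytics.

Hypothetical data: air-temperature readings (deg C) from workplace wearables.
Worker IDs, zones, the plausible range and the 28 deg C level are assumptions.
"""
from collections import defaultdict
from statistics import mean

# Stage 1: data management
# Acquisition and recording: (worker_id, zone, temperature); None = lost sample.
raw = [
    ("w1", "foundry", 33.0), ("w1", "foundry", None),
    ("w2", "foundry", 35.0), ("w3", "assembly", 22.0),
    ("w3", "assembly", 24.0), ("w4", "assembly", -99.0),  # sensor fault code
]


def clean(records, low=-30.0, high=60.0):
    """Cleaning: drop missing readings and physically implausible values."""
    return [r for r in records if r[2] is not None and low <= r[2] <= high]


def annotate(records, action_level=28.0):
    """Annotation: flag readings above an assumed heat-stress action level."""
    return [(w, z, t, t > action_level) for w, z, t in records]


def aggregate_by_zone(records):
    """Integration and aggregation: collect readings and flags per zone."""
    zones = defaultdict(lambda: {"temps": [], "exceedances": 0})
    for _, zone, temp, above in records:
        zones[zone]["temps"].append(temp)
        zones[zone]["exceedances"] += int(above)
    return dict(zones)


# Stage 2: data analytics
def analyse(zones):
    """Modelling and analysis: mean temperature and exceedances per zone,
    ranked from the hottest zone down."""
    summary = [(z, mean(d["temps"]), d["exceedances"]) for z, d in zones.items()]
    return sorted(summary, key=lambda row: row[1], reverse=True)


ranking = analyse(aggregate_by_zone(annotate(clean(raw))))
# Interpretation stays with people, e.g. deciding which zone to assess first.
for zone, avg, n_above in ranking:
    print(f"{zone}: mean {avg:.1f} °C, {n_above} reading(s) above action level")
```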
In the literature and practice of AI, machine learning and big data analytics, one often finds another term, “data mining”, which can be defined as “the process of discovering meaningful correlations, patterns and trends by sifting through large amounts of data stored in repositories. Data mining employs pattern recognition technologies, as well as statistical and mathematical techniques” (Gartner IT Glossary 2020b). This definition points to the close inter-relations between this concept and those presented above: for example, machine learning and deep learning techniques can be used in support of data mining, while data mining methods can be part of the predictive and big data analytics methodology.
Applications of Machine Learning and Big Data Analytics in the Field of OSH
The dynamic development of AI methods and technologies, in particular increasingly capable machine learning and big data analytics techniques and tools that make it possible to acquire new knowledge and useful insights from both historical data and data acquired on an ongoing basis, has enabled their practical application in many different sectors of business activity, including support for activities in the area of OSH (Ajayi et al. 2018). This section presents some examples of applications of machine learning and big data analytics in off-line mode, i.e. those aimed at acquiring new knowledge that can later be used by employers, safety managers and other stakeholders to better plan and perform OSH-related activities. The next section is devoted to the application of artificial intelligence combined with ICT and IoT networks to support selected OSH management processes carried out in real or near-real time.
One of the first achievements in this field was the use of a supervised learning technique, support vector machines (SVMs), to predict accidents at work on the basis of input variables describing working conditions, such as employment status, occupation details, seniority in the company, main accident hazards, physical demands, psychosocial possibilities, work rhythm determining factors, fit between working hours and family or social commitments, and recent workplace risk assessment. The study conducted by Suarez Sanchez et al. (2011) showed, inter alia, that on the basis of data on working conditions the SVM technique was able to distinguish workers who had suffered an accident at work in the previous year from those who had not had any accidents.
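As a rough illustration of how an SVM separates two classes of workers, the following Python sketch trains a linear SVM (hinge loss with an L2 penalty) using Pegasos-style subgradient updates. This is not the setup of Suarez Sanchez et al.: the two working-condition scores, the labels and the regularisation parameter `lam` are invented, and for simplicity the bias is learned as an ordinary, regularised weight.

```python
"""Linear SVM (hinge loss + L2 penalty) trained with Pegasos-style updates.

Hypothetical setup: each worker is described by two working-condition scores
scaled to [0, 1] (e.g. hazard exposure, physical demands) and labelled +1 if
they had an accident at work in the previous year, -1 otherwise. The data are
invented for illustration.
"""


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def augment(x):
    """Append a constant 1 so the bias is learned as an ordinary weight."""
    return list(x) + [1.0]


def train_linear_svm(X, y, lam=0.01, epochs=500):
    """Minimise lam/2 * ||w||^2 + mean hinge loss by subgradient steps."""
    data = [(augment(x), label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        for xi, yi in data:  # fixed cyclic order keeps the sketch deterministic
            t += 1
            eta = 1.0 / (lam * t)                    # Pegasos step size
            violated = yi * dot(w, xi) < 1.0         # inside margin or misclassified
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink (L2 term)
            if violated:
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def predict(w, x):
    return 1 if dot(w, augment(x)) >= 0 else -1


X = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.9],   # accident in previous year
     [0.1, 0.2], [0.2, 0.1], [0.2, 0.3]]   # no accident
y = [1, 1, 1, -1, -1, -1]

w = train_linear_svm(X, y)
print([predict(w, x) for x in X])
```

With real survey data, categorical variables such as employment status would first have to be encoded numerically, and the model would be validated on workers not used for training.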
Another example of using machine learning to predict accidents at work was presented by Tixier et al. (2016), who applied Random Forest and Stochastic Gradient Tree Boosting algorithms to a data set of binary attributes and categorical safety outcomes extracted from a large pool of textual construction injury reports by means of a highly accurate natural language processing (NLP) tool. The developed model was able to predict injury type, energy type, and body part affected with a predictive skill exceeding that of previous parametric models.
A further example of using big data methods to improve workplace safety in off-line mode is the platform developed by Guo et al. (2016), which makes it possible to classify, collect and store data on unsafe behaviours of construction workers. An intelligent video surveillance system and a mobile application were used to collect information on workers’ unsafe behaviours. Behaviour-related data could be retrieved with complete semantic information, including identification, time, image, source, location, description, unsafe behaviour type and possible injury. The deployment of this system on a metro construction site demonstrated that it could effectively analyse the semantic information contained in the collected images, automatically extract workers’ unsafe behaviours and quickly retrieve this information from the dedicated database system.
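The idea of storing each observed unsafe behaviour together with its semantic information can be sketched with Python’s built-in `sqlite3` module. The record fields mirror those listed above, but the table and column names, the sample records and the `find_by_type` helper are hypothetical and do not describe the actual schema of the platform of Guo et al. (2016).

```python
"""In-memory store for unsafe-behaviour records (hypothetical schema).

Fields follow the semantic information listed in the text; table and column
names, image paths and sample records are invented for illustration.
"""
import sqlite3
from dataclasses import dataclass, astuple


@dataclass
class UnsafeBehaviourRecord:
    worker_id: str        # identification
    timestamp: str        # time (ISO 8601, so string order = time order)
    image_path: str       # image (reference to a stored frame)
    source: str           # e.g. "video" or "mobile app"
    location: str
    description: str
    behaviour_type: str   # unsafe behaviour type
    possible_injury: str


SCHEMA = """CREATE TABLE unsafe_behaviour (
    worker_id TEXT, timestamp TEXT, image_path TEXT, source TEXT,
    location TEXT, description TEXT, behaviour_type TEXT, possible_injury TEXT)"""


def open_store():
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def add_record(conn, rec):
    conn.execute("INSERT INTO unsafe_behaviour VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
                 astuple(rec))


def find_by_type(conn, behaviour_type):
    """Retrieve records of one behaviour type, oldest first."""
    rows = conn.execute(
        "SELECT * FROM unsafe_behaviour WHERE behaviour_type = ? "
        "ORDER BY timestamp", (behaviour_type,))
    return [UnsafeBehaviourRecord(*row) for row in rows]


conn = open_store()
add_record(conn, UnsafeBehaviourRecord(
    "W-017", "2024-05-03T10:42:00", "frames/cam2/000123.jpg", "video",
    "shaft 2, level -1", "worker without helmet near hoist",
    "no helmet", "head injury"))
add_record(conn, UnsafeBehaviourRecord(
    "W-042", "2024-05-03T09:15:00", "frames/app/000007.jpg", "mobile app",
    "tunnel segment B", "worker on scaffold without harness",
    "no harness at height", "fall from height"))
add_record(conn, UnsafeBehaviourRecord(
    "W-005", "2024-05-02T14:05:00", "frames/cam1/000981.jpg", "video",
    "station entrance", "worker without helmet in lifting zone",
    "no helmet", "struck by falling object"))

for rec in find_by_type(conn, "no helmet"):
    print(rec.timestamp, rec.worker_id, rec.location)
```

Keeping the semantic fields as separate columns, rather than as free text, is what allows such targeted queries (by type, location or time) to be answered quickly.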
An example of a machine learning application in the area of OSH at the level of governmental administration is the Risk Group Prediction Tool (RGPT), developed by the Norwegian Labour Inspection Authority to assist labour inspectors in selecting enterprises with regard to workplace risks (Dahl and Starren 2019). The tool covers approximately 230,000 enterprises in Norway and divides them into four groups according to their OSH-related risks. It is assumed that the higher the risk group, the higher the probability that a future inspection of working conditions will detect deviations from OSH regulations in the company. RGPT was built by means of predictive modelling with a machine learning algorithm based on binary logistic regression, which belongs to the class of supervised learning algorithms. As the number of performed inspections increases, the predictions made by RGPT become gradually more precise, because the algorithm adjusts itself on the basis of feedback (correct or erroneous forecasts) registered in the database containing data on already performed inspections.
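The mechanism described above (a logistic regression model whose outputs are grouped into four risk classes and refined as new inspection results arrive) can be sketched in Python as follows. Everything specific here is an assumption: the three binary enterprise features, the training data, the stochastic gradient training, the learning rate and the probability cut-offs of 0.25, 0.5 and 0.75 are illustrative choices, not details of RGPT, whose grouping rules are not described in this chapter.

```python
"""Binary logistic regression with four risk groups and feedback updates.

Hypothetical sketch loosely following the description of RGPT; features,
data, training method, learning rate and cut-offs are all assumptions.
"""
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class RiskModel:
    def __init__(self, n_features, lr=0.5):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def prob(self, x):
        """Estimated probability that an inspection will find a deviation."""
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def update(self, x, found_deviation):
        """One stochastic gradient step on the log-loss for one inspection."""
        err = self.prob(x) - (1.0 if found_deviation else 0.0)
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err

    def fit(self, X, y, epochs=200):
        for _ in range(epochs):
            for x, outcome in zip(X, y):
                self.update(x, outcome)

    def risk_group(self, x, cutoffs=(0.25, 0.5, 0.75)):
        """Map the probability to group 1 (lowest) ... 4 (highest risk)."""
        p = self.prob(x)
        return 1 + sum(p >= c for c in cutoffs)


# Hypothetical binary features: [high-risk sector, >50 employees, past deviations]
X = [[1, 1, 1], [1, 0, 1], [1, 1, 0], [0, 1, 1],
     [0, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
y = [True, True, True, True, False, False, False, False]  # deviation found?

model = RiskModel(n_features=3)
model.fit(X, y)
print(model.risk_group([1, 1, 1]), model.risk_group([0, 0, 0]))

# Feedback: a newly registered inspection outcome nudges the model.
before = model.prob([0, 0, 1])
model.update([0, 0, 1], found_deviation=True)
after = model.prob([0, 0, 1])
print(f"P(deviation | [0, 0, 1]): {before:.3f} -> {after:.3f}")
```

Each call to `update` plays the role of the feedback loop mentioned in the text: an inspection that contradicts the forecast shifts the weights so that similar enterprises receive a higher (or lower) estimated probability next time.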