Big Data Analytics and Machine Learning for Industry 4.0: An Overview

Nguyen Tuan Thanh Le and Manh Linh Pham

Big Data Analytics for Industry 4.0

Characteristics of Big Data

The concept of “big data” was mentioned for the first time by Roger Mougalas in 2005 [1]. It refers to large-scale data, one of the characteristics of Industry 4.0, that cannot be stored on a single computer and is almost impossible to handle using traditional data analytics approaches. The explosion of big data applications after 2011 is related to improvements in computing power and storage, the falling cost of sensors and communication, and, more recently, the development of the Internet of Things (IoT). These advances have led to the utilization of multiple sources (sensors, applications, people, and animals) in the generation of data. In 2011, big data was defined by [2] using four characteristics (the 4Vs): Volume, Velocity, Variety, and Value. A fifth V, Veracity, was introduced in 2012 [3], as shown in Fig. 1.1.


FIGURE 1.1 5Vs Characteristics of Big Data

Volume refers to the size and/or scale of datasets. To date, there is no universal threshold above which a data volume counts as big data, because thresholds shift over time and vary across datasets. Generally, big data volumes start from the exabyte (EB) or zettabyte (ZB) range [4].

Variety implies the diversity of data in different forms: structured, semi-structured, or unstructured. Real-world datasets, coming from heterogeneous sources, are mostly in unstructured or semi-structured form, which makes analysis challenging because of inconsistency, incompleteness, and noise. Therefore, data preprocessing is needed to remove noise; it includes steps such as data cleaning, data integration, and data transformation [5].

Velocity indicates the speed at which data is processed. Processing can fall into three categories: streaming processing, real-time processing, or batch processing. This characteristic emphasizes that the speed of processing data should keep up with the speed at which data is produced [4].

Value alludes to the usefulness of data for decision making. Giant companies (e.g., Amazon, Google, Facebook) analyze large-scale datasets of users and their behavior daily to give recommendations, improve location services, or provide targeted advertising [3].

Veracity denotes the quality and trustworthiness of datasets. Because of the variety of data, accuracy and trust become harder to achieve, yet they play an essential role in applications of big data analytics (BDA). When analyzing millions of health care records to respond to an outbreak that affects a huge number of people (e.g., the COVID-19 pandemic), or veterinary records to predict disease in swine herds (e.g., African swine fever or porcine reproductive and respiratory syndrome), any ambiguities or inconsistencies in the datasets can impair the precision of the analytic process [3], leading to a catastrophic situation.

Generally, big data in the context of Industry 4.0 can originate from many different sources, such as product or machine design data, machine-operation data from control systems, manual-operation records made by staff, product-quality and process-quality data, manufacturing execution systems, system-monitoring and fault-detection deployments, information on operational and manufacturing costs, logistics information from partners, information from customers on product utilization and feedback, and so on [6]. Some of these datasets are semi-structured (e.g., manual-operation records), some are structured (e.g., sensor signals), and others are completely unstructured (e.g., images). Therefore, an Industry 4.0 enterprise requires cutting-edge technologies that can take full advantage of this valuable manufacturing data, including machine learning (ML) and BDA.

Characteristics of Big Data Analytics

BDA can be referred to as “the process of analyzing large scale datasets in order to find unknown correlations, hidden patterns, and other valuable information which is not able to be analysed using conventional data analytics” [7], as the conventional data analysis techniques are no longer effective because of the special characteristics of big data: massive, heterogeneous, high dimensional, complex, erroneous, unstructured, noisy, and incomplete [8].

BDA has attracted attention from both academic and industrial scientists as the need to discover hidden trends in large-scale datasets grows. Reference [9] compared the impact of BDA on Industry 4.0 with that of the invention of the microscope and the telescope on biology and astronomy, respectively. Recently, considerable development in the ubiquitous Internet of Things (IoT), sensor networks, and cyber-physical systems (CPS) has expanded data collection to an enormous scale in numerous domains, including social media, smart cities, education, health care, finance, agriculture, etc. [3].

Various advanced data analysis techniques (e.g., ML, computational intelligence, data mining, natural language processing) and potential strategies (e.g., parallelization, divide and conquer, granular computing, incremental learning, instance selection, feature selection, and sampling) can help handle big data issues. These techniques and strategies also enable more efficient processing and better decision making [3].

Divide and conquer helps to reduce the complexity of computing problems. It is composed of three phases: firstly, it divides the large, complex problem into several smaller, easier ones; secondly, it solves each smaller problem; and finally, it combines the solutions of the smaller problems to solve the original problem [3].
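The three phases can be sketched in a few lines of Python; the dataset, chunk size, and helper names here are invented for illustration:

```python
# Divide-and-conquer sketch: computing the mean of a dataset that is
# (hypothetically) too large to process in one pass.

def chunked(data, size):
    """Phase 1: divide the large problem into smaller sub-problems."""
    for i in range(0, len(data), size):
        yield data[i:i + size]

def solve_chunk(chunk):
    """Phase 2: solve each smaller problem independently."""
    return sum(chunk), len(chunk)

def combine(partials):
    """Phase 3: combine partial solutions to solve the original problem."""
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count

data = list(range(1_000_000))
partials = [solve_chunk(c) for c in chunked(data, 10_000)]
mean = combine(partials)
```

Because each chunk is solved independently, the second phase also lends itself naturally to parallelization.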

Parallelization improves computation time by dividing big problems into smaller instances, distributing the resulting tasks across multiple threads, and executing them simultaneously. This strategy decreases total computation time rather than the total amount of work, because tasks run concurrently instead of sequentially [10].
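A minimal illustration of this strategy, with invented task and chunk sizes: the work is split into chunks whose partial results are computed concurrently and then merged. A thread pool is used here for portability; CPU-bound workloads would more typically use a process pool.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum_of_squares(chunk):
    # one small task; several of these run simultaneously
    return sum(x * x for x in chunk)

data = list(range(100_000))
chunks = [data[i:i + 10_000] for i in range(0, len(data), 10_000)]

with ThreadPoolExecutor(max_workers=4) as pool:
    # map each chunk to a worker, then merge the partial results
    parallel_result = sum(pool.map(partial_sum_of_squares, chunks))
```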

Incremental learning is widely used to handle streaming data. An incremental learning algorithm can be trained continuously with additional data rather than being retrained from scratch on the full dataset. In the learning process, this strategy tunes its parameters each time new input data arrives [10].
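As a toy sketch of the idea (the "model" here is just a running mean, not a real learner), the parameter is tuned each time a new observation arrives, without revisiting earlier data:

```python
class IncrementalMean:
    """A minimal incremental estimator: updates its single parameter online."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def update(self, x):
        self.n += 1
        # Online update: shift the current estimate toward the new observation.
        self.mean += (x - self.mean) / self.n
        return self.mean

model = IncrementalMean()
for x in [2.0, 4.0, 6.0, 8.0]:   # stand-in for a data stream
    model.update(x)
```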

Granular computing helps to simplify the elements of a large space by grouping them into subsets, or granules [11, 12]. By reducing a large set of elements to a smaller search space, uncertainty about the elements in that space can be identified effectively [13].

Feature selection is useful for preparing high-dimensional datasets. This strategy handles big data by determining a subset of features relevant to a given analysis task; despite working with fewer features, the data representation remains precise in this strategy [14].
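A minimal filter-style sketch, with a made-up dataset and threshold: features whose variance is (near) zero carry no information for analysis and can be dropped:

```python
from statistics import pvariance

rows = [  # each row: [feature_0, feature_1, feature_2] (invented values)
    [1.0, 0.0, 10.0],
    [2.0, 0.0, 20.0],
    [3.0, 0.0, 30.0],
]

def select_features(rows, threshold=1e-6):
    """Keep only feature columns whose variance exceeds the threshold."""
    columns = list(zip(*rows))
    keep = [i for i, col in enumerate(columns) if pvariance(col) > threshold]
    reduced = [[row[i] for i in keep] for row in rows]
    return keep, reduced

kept, reduced = select_features(rows)   # feature_1 is constant, so it is dropped
```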

Instance selection is a major approach for pre-processing data. It helps to shrink training sets and reduce run-time in the training phase [15].

Sampling is a data reduction method that helps derive patterns in big datasets by generating, manipulating, and analyzing subsets of the original data [10].
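For example, a statistic of a large dataset can be estimated from a random subset instead of a full scan (the dataset here is synthetic):

```python
import random

random.seed(42)
population = list(range(1_000_000))        # stand-in for a big dataset
sample = random.sample(population, 1_000)  # analyze a manageable subset

estimate = sum(sample) / len(sample)       # statistic from the sample
exact = sum(population) / len(population)  # statistic from the full data
```

The estimate approximates the exact value at a fraction of the cost, with an error that shrinks as the sample grows.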

Machine Learning for Industry 4.0

Machine learning (ML), a state-of-the-art subfield of Artificial Intelligence (AI) - Fig. 1.4, now powers several aspects of our society: information search on the Internet, content filtering on social networks, recommendations on e-commerce platforms, accurate language translation, virtual classrooms in education, support for diagnosing diseases in medicine, etc. [16]. It has been applied successfully to solve several real problems, such as transcribing speech into text, matching news items, identifying objects, and selecting relevant search results [16].

The goal of a typical ML algorithm is to find a mathematical formula (i.e., the model) which, when applied to a collection of inputs (i.e., the training data), produces the desired outputs [17]. The resulting formula is also expected to generate the “correct” outputs for most other new inputs (distinct from the training data), on the assumption that those inputs come from the same or a similar statistical distribution as the training data [17].

In order to teach the machine, three components are needed, including:

• (1) data - the more diverse and the bigger the data, the better the result; (2) features - also known as parameters or variables (e.g., age, gender, stock price), i.e., the factors the machine looks at; and (3) algorithms - the steps followed to solve the given problem, which affect the precision, performance, and size of the model [18].
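A toy end-to-end example tying the three components together, with made-up data points: the data is a handful of (feature, target) pairs, the feature is a single variable x, and the algorithm is closed-form least squares producing the model y = a * x + b:

```python
# (1) data: a few invented (feature, target) observations
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.1), (4.0, 8.0)]

# (2) feature: the single variable x; (3) algorithm: ordinary least squares
n = len(data)
sx = sum(x for x, _ in data)
sy = sum(y for _, y in data)
sxx = sum(x * x for x, _ in data)
sxy = sum(x * y for x, y in data)

a = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # slope
b = (sy - a * sx) / n                           # intercept

def predict(x):
    """The learned model: applies y = a * x + b to a new input."""
    return a * x + b
```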

Generally, ML algorithms can be classified into four main types: (1) supervised learning, (2) unsupervised learning, (3) semi-supervised learning, and (4) reinforcement learning [17, 19], as shown in Fig. 1.2.


FIGURE 1.2 Four Main Categories of Machine Learning

Supervised Learning

The purpose of a supervised learning algorithm is to find a mathematical formula that maps inputs to already-known outputs when provided with a set of human-annotated examples. In this case, we have a “supervisor” or a “teacher” who gives the machine all the answers, such as whether a given picture shows a cat or a dog [18], i.e., a classification problem. The “teacher” has already labeled the input datasets, and the machine learns from these examples [18].
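A hypothetical sketch of this setting: a one-nearest-neighbour classifier simply memorizes the “teacher’s” labeled examples and labels a new input by its closest known example (the features and values are invented):

```python
# Labeled training examples provided by the "teacher":
# (weight_kg, ear_length_cm) -> label (made-up values)
labeled = [
    ((4.0, 6.0), "cat"),
    ((3.5, 5.5), "cat"),
    ((20.0, 12.0), "dog"),
    ((25.0, 11.0), "dog"),
]

def classify(point):
    """Label a new input by its nearest labeled example."""
    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    features, label = min(labeled, key=lambda ex: sq_dist(ex[0], point))
    return label
```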

A specific kind of supervised learning is self-supervised learning, in which the machine learns without human-annotated labels [19]. There are still associated labels, but they are generated from the input data typically using a heuristic algorithm [19].

Unsupervised Learning

In contrast to the former, in this case the machine has no “supervisor” or “teacher”. The input data is not labeled; the machine is left on its own to find hidden patterns or structures in the datasets. For example, in a clustering problem, the model (i.e., a mathematical formula) outputs a cluster identifier for each input example. In a dimensionality reduction problem, the model outputs a new vector with fewer features than the original input vector [17].
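A minimal sketch of this setting: k-means clustering (k = 2) on unlabeled one-dimensional data, where the model outputs a cluster identifier for each input without any teacher-provided labels (data values invented):

```python
def kmeans_1d(points, k=2, iters=20):
    """Tiny k-means for 1-D data; the min/max initialization assumes k = 2."""
    centers = [min(points), max(points)]
    for _ in range(iters):
        # assignment step: each point joins its nearest center's cluster
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: abs(p - centers[c]))
            clusters[i].append(p)
        # update step: move each center to the mean of its cluster
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    labels = [min(range(k), key=lambda c: abs(p - centers[c])) for p in points]
    return centers, labels

points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]   # two obvious groups, no labels given
centers, labels = kmeans_1d(points)
```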

Semi-Supervised Learning

In this case, the input data contains both labeled and unlabeled examples. The purpose of semi-supervised learning is the same as that of supervised learning [17]. By using many unlabeled examples (i.e., adding more information about the problem, which better reflects the probability distribution of the data), we can help our learning algorithm find better models [17].

Reinforcement Learning

In this case, the machine, also called an agent, is embedded in an environment. It is capable of observing states of that environment as a vector of features, and then can perform actions in response to these states [17]. Different actions can give different rewards to the agent and could also move it to another state of the environment, and so on. In contrast to supervised learning and unsupervised learning, where we operate with static datasets, in reinforcement learning, we work with dynamic datasets collected repeatedly from a dynamic environment.

The target of a reinforcement learning algorithm is to learn an optimal policy, i.e., a function that takes the feature vector of a state as input and outputs an optimal action to execute in that state [17], so as to maximize the long-term accumulated reward. For instance, a sequence of scaling actions, such as adding or removing virtual machines/containers to keep up with the fluctuating resource demand of a big data analytics application, can be the result of a reinforcement learning algorithm. The feature vector here consists of information from the application itself (e.g., current computing power, workload) and the surrounding environment (e.g., type of media, other co-located applications).
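As a hedged illustration of the scaling scenario above (the environment, reward, and “ideal capacity” are entirely made up), tabular Q-learning can learn a policy that moves the number of running machines toward the ideal:

```python
import random

random.seed(0)
N_STATES = 5            # state = number of running machines, 0..4
ACTIONS = (-1, +1)      # remove one machine / add one machine
IDEAL = 3               # hypothetical ideal capacity

def step(state, action):
    """Toy environment: reward is higher the closer we are to IDEAL."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = -abs(nxt - IDEAL)
    return nxt, reward

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.2

state = 0
for _ in range(5000):
    # epsilon-greedy action selection
    if random.random() < eps:
        a = random.choice(ACTIONS)
    else:
        a = max(ACTIONS, key=lambda x: Q[(state, x)])
    nxt, r = step(state, a)
    # Q-learning update toward reward plus discounted best future value
    best_next = max(Q[(nxt, b)] for b in ACTIONS)
    Q[(state, a)] += alpha * (r + gamma * best_next - Q[(state, a)])
    state = nxt

# the learned policy: preferred scaling action in each state
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)}
```

After training, the policy scales up whenever the system is below the ideal capacity.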

Machine Learning for Big Data

Conventional ML approaches cannot efficiently handle big data problems because of the 5V characteristics of big data (i.e., high speeds, diverse sources, low-value density, large volumes, and incompleteness) [3]. Therefore, several advanced ML techniques for BDA have been proposed: feature learning, transfer learning, distributed learning, active learning, and deep learning [3].

Feature learning empowers a system to figure out automatically the representations required for feature detection or classification from raw datasets [3].

Transfer learning allows the machine to employ knowledge learned in one context in new contexts. By transferring useful information from similar domains, it efficiently improves a learner in a specific target domain [20].

Distributed learning aims to alleviate the scalability issue of conventional ML by distributing computations on datasets among multiple machines to scale up the learning process [21]. One of the platforms using this distributed approach to resolve scaling problems for multi-cloud applications was proposed by [22].

Active learning aims to employ adaptive data collection, in which parameters are adjusted automatically to gather the most useful data as quickly as possible, accelerating ML tasks and overcoming the labeling problem [23].


FIGURE 1.3 (a) Typical Architecture of Deep Learning Neural Network with One Output, One Input, and K Hidden Layers; (b) Artificial Neuron: Basic Computational Building Block for Neural Networks

Deep learning (DL) can be employed to extract complex, high-level abstractions as data representations. This is done through a hierarchical, layered learning architecture, in which more abstract (i.e., higher-level) features are defined and computed on top of less abstract (i.e., lower-level) ones [24] - see Fig. 1.3(a). DL techniques can analyze and learn from enormous amounts of unsupervised data, making them suitable for BDA, where raw data is largely unlabeled and uncategorized [24]. We will focus on DL for Industry 4.0 in the next section.

Deep Learning for Industry 4.0: State of the Art

Deep learning, the most exciting branch of ML - Fig. 1.4, has grown out of classic Artificial Neural Networks (NNs). It supports computational models that, in contrast to shallow NN-like models with only a few layers, consist of multiple (non-linear) processing layers. Each layer takes charge of a different level of abstraction, which helps to learn hierarchical representations of data. The functionality of DL emulates the way networks of neurons in the human brain process ambient signals [25], with the notions of axon, synapse, and dendrite - see Fig. 1.3(b). DL, in its different types (e.g., Recurrent Neural Networks, Autoencoders, Convolutional Neural Networks, Deep Belief Networks, etc.), has outperformed other conventional ML techniques and dramatically improved the state of the art on real problems in object recognition, speech recognition, object detection, and language translation, as well as in several other areas such as self-driving cars, genomics, games, and drug discovery [26, 16].

DL helps to discover convoluted structures in large-scale datasets by using an optimization algorithm called backpropagation (“backward propagation of errors”), which specifies how a model should change its (up to billions of) internal parameters. In each layer, these parameters are used to compute the representation from the representation in the previous layer [16]. Most contemporary DL algorithms are trained with Stochastic Gradient Descent (SGD) [27].
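A toy illustration of this parameter-update rule (not from the chapter): a single sigmoid neuron fitted to the logical AND function with stochastic gradient descent, where the gradient is obtained by the chain rule, i.e., the one-layer case of backpropagation:

```python
import math
import random

random.seed(1)
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]  # logical AND
w = [random.uniform(-0.5, 0.5), random.uniform(-0.5, 0.5)]
b = 0.0
lr = 0.5

def forward(x):
    z = w[0] * x[0] + w[1] * x[1] + b
    return 1.0 / (1.0 + math.exp(-z))        # sigmoid activation

def loss():
    return sum((forward(x) - y) ** 2 for x, y in data) / len(data)

initial_loss = loss()
for _ in range(2000):
    x, y = random.choice(data)                # stochastic: one example at a time
    p = forward(x)
    grad_z = 2 * (p - y) * p * (1 - p)        # chain rule (error propagated back)
    w[0] -= lr * grad_z * x[0]                # update each internal parameter
    w[1] -= lr * grad_z * x[1]
    b -= lr * grad_z
final_loss = loss()
```

Deep networks repeat exactly this update across many layers, propagating the error gradient backward from the output layer to the input layer.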


FIGURE 1.4 Relationships between DL, ML, and AI [27]

In addition, DL requires fairly little manual engineering, as it can profit from increases in the amount of available data and computation [16], and thus is suitable for BDA. In consequence, several hidden features, which might not be seen obviously by a human, can be exposed by using a DL model [25].

For the industry sector, in order to accelerate technologies toward smart manufacturing, equipping intelligent, high-precision systems is very important because they directly affect the efficiency of related products, reinforce productivity, and reduce operating costs as well as maintenance expenses [25]. In this context, a DL model can play an essential role. Indeed, a wide range of industrial applications, such as controlling robots, object detection and tracking, visual inspection of product lines, fault diagnosis, etc., can benefit from applying a DL model [25].

Luckow et al. [28] investigated visual inspection of product lines using Convolutional Neural Network architectures, including AlexNet and GoogLeNet, over different DL platforms such as TensorFlow, Caffe, and Torch. In this work, images of vehicles in the assembly line, along with their annotations, are submitted to a DL system. They achieved the best performance, an accuracy of 94%, using the TensorFlow platform.

Lee et al. [29] tackled fault detection and classification in noisy settings for the process of transferring geometric shapes on a mask to the surface of a silicon wafer, employing Stacked Denoising Auto-Encoders (SdA). SdA help lower the noise in descriptive sensory data caused by electrical and mechanical disturbances, and carry out fault classification. The results showed that, in comparison with baseline methods (e.g., Support Vector Machines or K-Nearest Neighbors), SdA achieve about 14% higher accuracy in noisy situations.

Another work that involved SdA is that of Yan and Yu [30]. They detected abnormal behavior of combustion gas turbines by applying extreme learning machines combined with SdA. Their results showed that the features detected by SdA led to better classification than hand-crafted features.

Shao et al. [31] extracted features for a fault diagnosis system for rotating devices, with vibration data as input, by applying Deep Neural Networks. The authors combined Denoising Auto-Encoders with Contractive Auto-Encoders. To diagnose faults, they refined the learned features using Locality Preserving Projection and then fed them into a softmax classifier. Seven conditions were considered in their system: a rubbing fault, compound faults (rub and unbalance), four levels of imbalance faults, and normal operation. The diagnosis system identifies the device status by exploiting the vibration data, determining whether the device is in a faulty or normal condition. In experiments on fault diagnosis of locomotive bearing devices and rotors, their approach beat Convolutional Neural Networks and other shallow learning methods.

Lee [32] addressed the detection of several defect types that often appear on headlight modules of cars in a vehicle-manufacturing setting by proposing a Deep Belief Network (DBN) model together with a cloud platform and an IoT deployment. The results showed that the DBN model outperformed two baseline methods (i.e., Radial Basis Function networks and Support Vector Machines) with regard to the error rate on test datasets.


Conclusion

In this chapter, we have reviewed two promising technologies for Industry 4.0, namely BDA and ML. We focused on the data aspect of smart manufacturing, which is fast and massive and cannot be handled efficiently by conventional approaches. Indeed, a wide range of industrial applications has been shown to be accelerated by employing BDA and ML, especially DL. Although only a few successful works have been reported in the literature so far, we believe that optimized and fully automated production on a large scale could be achieved in the very near future thanks to these promising advanced technologies.


Acknowledgment

This work has been partly supported by Vietnam National University, Hanoi (VNU), under Project No. QG.20.55.


References

  • 1. R. Magoulas and B. Lorica. Introduction to big data. O’Reilly Media, Sebastopol, CA, February 2009.
  • 2. J. Gantz and D. Reinsel. Extracting value from chaos. IDC iView, 1142(2011):1-12, 2011.
  • 3. R. H. Hariri, E. M. Fredericks, and K. M. Bowers. Uncertainty in big data analytics: survey, opportunities, and challenges. Journal of Big Data, 6(1):44, 2019.
  • 4. M. Chen, S. Mao, and Y. Liu. Big data: A survey. Mobile Networks and Applications, 19(2):171-209, 2014.
  • 5. J. Han, J. Pei, and M. Kamber. Data mining: concepts and techniques. Elsevier, 2011.
  • 6. M. Schuldenfrei. Big data challenges of industry 4.0. Datanami, April 25 2019.
  • 7. N. Golchha. Big data - the information revolution. International Journal of Advanced Research, 1(12):791-794, 2015.
  • 8. C.-W. Tsai, C.-F. Lai, H.-C. Chao, and A. V. Vasilakos. Big data analytics: A survey. Journal of Big Data, 2(1):21, 2015.
  • 9. M. Hilbert. Big data for development: A review of promises and challenges. Development Policy Review, 34(1):135-174, 2016.
  • 10. X. Wang and Y. He. Learning from uncertainty for big data: Future analytical challenges and strategies. IEEE Systems, Man, and Cybernetics Magazine, 2(2):26-31, 2016.
  • 11. A. Bargiela and W. Pedrycz. Granular computing. In Handbook on Computational Intelligence: Volume 1: Fuzzy Logic, Systems, Artificial Neural Networks, and Learning Systems, pages 43-66. World Scientific, 2016.
  • 12. J. Kacprzyk, D. Filev, and G. Beliakov. Granular, Soft and Fuzzy Approaches for Intelligent Systems. Springer International Publishing, New York City, NY, 2017.
  • 13. R. R. Yager. Decision making under measure-based granular uncertainty. Granular Computing, 3(4):345-353, 2018.
  • 14. H. Liu and H. Motoda. Computational Methods of Feature Selection. CRC Press, Boca Raton, FL, 2007.
  • 15. J. A. Olvera-Lopez, J. A. Carrasco-Ochoa, J. F. Martinez-Trinidad, and J. Kittler. A review of instance selection methods. Artificial Intelligence Review, 34(2):133-143, 2010.
  • 16. Y. LeCun, Y. Bengio, and G. Hinton. Deep learning. Nature, 521(7553):436-444, 2015.
  • 17. A. Burkov. The Hundred-Page Machine Learning Book. Andriy Burkov, Quebec City, 2019.
  • 18. V. Zubarev. Machine learning for everyone: In simple words, with real-world examples, yes, again, 2019.
  • 19. F. Chollet. Deep Learning with Python. Manning Publications Co., 2017.
  • 20. K. Weiss, T. M. Khoshgoftaar, and D. Wang. A survey of transfer learning. Journal of Big Data, 3(1):9, 2016.
  • 21. J. Qiu, Q. Wu, G. Ding, Y. Xu, and S. Feng. A survey of machine learning for big data processing. EURASIP Journal on Advances in Signal Processing, 2016(1):67, 2016.
  • 22. L. M. Pham and T.-M. Pham. Autonomic fine-grained migration and replication of component-based applications across multi-clouds. In 2015 2nd National Foundation for Science and Technology Development Conference on Information and Computer Science (NICS), pages 5-10. IEEE, 2015.
  • 23. S. Athmaja, M. Hanumanthappa, and V. Kavitha. A survey of machine learning algorithms for big data analytics. In 2017 International Conference on Innovations in Information, Embedded and Communication Systems (ICIIECS), pages 1-4. IEEE, 2017.
  • 24. M. M. Najafabadi, F. Villanustre, T. M. Khoshgoftaar, N. Seliya, R. Wald, and E. Muharemagic. Deep learning applications and challenges in big data analytics. Journal of Big Data, 2(1):1, 2015.
  • 25. M. Mohammadi, A. Al-Fuqaha, S. Sorour, and M. Guizani. Deep learning for IoT big data and streaming analytics: A survey. IEEE Communications Surveys & Tutorials, 20(4):2923-2960, 2018.
  • 26. J. Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85-117, 2015.
  • 27. I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT Press, 2016.
  • 28. A. Luckow, M. Cook, N. Ashcraft, E. Weill, E. Djerekarov, and B. Vorster. Deep learning in the automotive industry: Applications and tools. In 2016 IEEE International Conference on Big Data (Big Data), pages 3759-3768. IEEE, 2016.
  • 29. H. Lee, Y. Kim, and C. O. Kim. A deep learning model for robust wafer fault monitoring with sensor measurement noise. IEEE Transactions on Semiconductor Manufacturing, 30(1):23-31, 2016.
  • 30. W. Yan and L. Yu. On accurate and reliable anomaly detection for gas turbine combustors: A deep learning approach. arXiv preprint arXiv:1908.09238, 2019.
  • 31. H. Shao, H. Jiang, F. Wang, and H. Zhao. An enhancement deep feature fusion method for rotating machinery fault diagnosis. Knowledge-Based Systems, 119:200-220, 2017.
  • 32. H. Lee. Framework and development of fault detection classification using IoT device and cloud environment. Journal of Manufacturing Systems, 43:257-270, 2017.