Technologies of IoT Big Data Analytics
Big data analytics is used along with the IoT in many applications to predict outcomes, estimate the likelihood of events that may have happened in the past, and suggest alternative results. In this part, we will discuss the major technologies used for performing big data analytics, such as cloud computing, fog computing, edge computing, relational databases, non-relational databases, and related tools.
9.3.1 Cloud Computing
It is on-demand computing, which provides resources such as servers with storage space and virtual operating systems. Applications and tools that are needed on a rental basis are given to the user by the cloud service provider over the Internet. The major requirement in cloud computing is that the systems must be connected to the Internet at all times to ensure minimal downtime and high uptime (Doukas et al. 2012). It came into existence when it became harder to manage files in the traditional file system (Cai et al. 2016).
The IoT data is stored on the cloud server, from where it can be retrieved by the IoT user at any point in time and irrespective of geographical location (Stergiou et al. 2018; Cai et al. 2016).
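The store-then-retrieve pattern described above can be sketched as follows. `CloudStore` is a hypothetical in-memory stand-in for a real cloud storage service; an actual deployment would use a provider's SDK or API instead.

```python
class CloudStore:
    """In-memory stand-in for a cloud storage service (assumed example)."""
    def __init__(self):
        self._objects = {}

    def put(self, key, value):
        # The IoT device uploads a recorded reading to the cloud.
        self._objects[key] = value

    def get(self, key):
        # The user retrieves it later, from any geographical location.
        return self._objects.get(key)

cloud = CloudStore()
cloud.put("sensor-42/2024-01-01T00:00", {"temp_c": 21.5})
reading = cloud.get("sensor-42/2024-01-01T00:00")
```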
9.3.2 Fog Computing
It is also called fogging or a fog network. It transmits data by bridging the communication between the user's devices, such as mobile phones, and the cloud server (Aazam et al. 2014). These devices, also called edge devices, perform computation and store data. In simple terms, it is the process of extending cloud computing to the end users or enterprises (Bonomi et al. 2012). It is closer to the IoT users: the cloud server can be in any geographical location, but the fog sits near the user, so information reaches it faster, as explained in Figure 9.2. The advantages of fog computing include the following:
■ Reduced latency
■ Faster response time
■ Increased efficiency
■ Continued operation even when bandwidth is limited or unavailable
■ Improved consistency
■ Better security
■ Increased business agility
■ Lower operating cost
Figure 9.2 Architecture of edge, cloud, and fog computing.
Examples of Applications
■ Electrical grid
■ Wind farms
■ Oil wells
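The latency benefit described above can be sketched as a fog node that caches recent readings close to the user, so that repeated requests never travel to the distant cloud. The class and field names are assumed examples, not part of any real fog platform.

```python
class FogNode:
    """Caches readings near the user; cache misses fall back to the cloud."""
    def __init__(self, cloud):
        self.cloud = cloud   # the distant cloud store (here, a plain dict)
        self.cache = {}      # data held close to the user

    def read(self, key):
        if key in self.cache:            # served locally: low latency
            return self.cache[key], "fog"
        value = self.cloud.get(key)      # fetched from the distant cloud
        self.cache[key] = value
        return value, "cloud"

cloud = {"sensor-1": 21.5}
fog = FogNode(cloud)
first = fog.read("sensor-1")   # first access goes to the cloud
second = fog.read("sensor-1")  # repeat access is served by the nearby fog node
```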
9.3.3 Edge Computing
As IoT devices are increasing exponentially every day, it is hard to compute, store, process, and retrieve the data from the centralized cloud due to the load on the network (Satyanarayanan 2017). Here edge computing is used to reduce the latency between the devices and the cloud. Even though the data centers have many resources, the transfer rate and response time remain challenges in the network topology of the IoT environment (Shi et al. 2016).
Edge computing uses small data centers that store the data locally and send the recorded IoT data to the central IoT cloud. Its benefits include the following:
■ Reduction in traffic
■ Less response time
■ Easy access to data
■ Increased performance
■ Local storage
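The traffic reduction listed above comes from aggregating raw readings at the edge and forwarding only a compact summary to the central cloud. A minimal sketch, with assumed field names:

```python
def summarise(readings):
    """Aggregate raw sensor readings locally at the edge device."""
    return {
        "count": len(readings),
        "min": min(readings),
        "max": max(readings),
        "mean": sum(readings) / len(readings),
    }

# Many raw samples stay at the edge; one small record goes to the cloud.
raw = [20.1, 20.3, 35.0, 20.2]
payload = summarise(raw)
```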
9.3.4 Data Virtualization
It is the process of managing data by storing and retrieving it without physical access to it, saving memory space, as virtual memory is used instead of physical memory. It consolidates the data from multiple sources to develop a single point of access, so the user does not need to know the locations of the servers (Pangal et al. 2008).
An organization uses different types of servers; by using virtualization, it is possible to make the data available in a single location (Weng et al. 2004).
It plays a role in the following:
■ Business integration
■ Data integration
■ Enterprise search
■ Decrease in data errors
■ Better workload management
■ Reduction in data storage
■ Improved access speed
■ Reduced support time
9.3.5 Data Quality Assurance
It is the process of assessing data to check whether it behaves as intended. It consists of the following steps (Gudivada et al. 2017):
1. Defining the objective
2. Assessing the previous state
3. Verifying the gap between the objective and the achieved state
4. Improving the current plans as per the needs
5. Implementing the mechanism to address the problem
6. Checking whether the data works as intended
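The six steps above can be sketched in miniature. The objective here is an assumed required-fields rule with default values; a real quality-assurance process would use far richer objectives and gap measures.

```python
def quality_check(data, objective):
    """Compare data against an objective, close the gap, and verify."""
    required = objective["required_fields"]           # 1. define the objective
    missing = [f for f in required if f not in data]  # 2-3. assess state / gap
    for f in missing:                                 # 4-5. improve & implement
        data[f] = objective["defaults"][f]
    ok = all(f in data for f in required)             # 6. check the result
    return ok, data

objective = {"required_fields": ["id", "temp"], "defaults": {"temp": 0.0}}
ok, fixed = quality_check({"id": 7}, objective)
```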
9.3.6 Data Preprocessing
It is the process in which raw data is converted into a meaningful form so that decisions can be taken based on it.
The following steps are carried out in this process:
■ Data cleansing
■ Data transformation
■ Data reduction
9.3.7 Data Cleansing
Data may contain a lot of unrelated information, and some values may also be missing. To solve this issue, data cleansing can be used (Maletic et al. 2000).
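A minimal cleansing sketch, assuming example field names: records with missing values are dropped, and fields unrelated to the analysis are stripped.

```python
def cleanse(records):
    """Drop incomplete records and fields unrelated to the analysis."""
    relevant = {"device_id", "temp"}   # assumed set of fields of interest
    cleaned = []
    for rec in records:
        rec = {k: v for k, v in rec.items() if k in relevant}
        # Keep only complete records with no missing values.
        if len(rec) == len(relevant) and all(v is not None for v in rec.values()):
            cleaned.append(rec)
    return cleaned

raw = [
    {"device_id": 1, "temp": 21.5, "debug_note": "x"},  # unrelated field stripped
    {"device_id": 2, "temp": None},                      # missing value: dropped
]
clean = cleanse(raw)
```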
9.3.8 Data Transformation
Here the cleansed data is converted into a form that is easy to manipulate. The conversion of data from one form to another is called data transformation.
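A small transformation sketch, with assumed field names: a Fahrenheit reading is converted to Celsius and a string timestamp to an integer hour, both easier to manipulate downstream.

```python
def transform(record):
    """Convert a cleansed record into an easier-to-manipulate form."""
    return {
        "device_id": record["device_id"],
        "temp_c": round((record["temp_f"] - 32) * 5 / 9, 2),   # F -> C
        "hour": int(record["timestamp"].split("T")[1][:2]),    # hour of day
    }

row = transform({"device_id": 3, "temp_f": 98.6, "timestamp": "2024-01-01T14:30"})
```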
9.3.9 Data Reduction
The process of removing duplicates from the database is called data reduction. Here all the redundant values are removed from the system.
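Duplicate removal as described above can be sketched in a few lines; this version preserves the first occurrence of each record.

```python
def reduce_duplicates(records):
    """Remove redundant (duplicate) records, keeping first-seen order."""
    seen = set()
    unique = []
    for rec in records:
        key = tuple(sorted(rec.items()))   # hashable fingerprint of the record
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

rows = [{"id": 1}, {"id": 2}, {"id": 1}]   # the second {"id": 1} is redundant
deduped = reduce_duplicates(rows)
```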
9.3.10 Data Fabric
A distributed cache makes copies of data in memory, which are then made available to users across the cluster. A data grid partitions the data in memory, so that each cluster node contains only a subset of the data (Chung et al. 2011).
The following features are needed for the functioning of data grids:
■ Query distribution
■ Transaction distribution
■ Computation of the available data
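The partitioning idea above can be sketched with a simple hash scheme; `DataGrid` is an assumed toy class, not a real data-grid product, and real grids add replication and rebalancing.

```python
class DataGrid:
    """Partitions data across nodes so each node holds only a subset."""
    def __init__(self, n_nodes):
        self.nodes = [{} for _ in range(n_nodes)]

    def _node_for(self, key):
        # Route a key to exactly one partition.
        return self.nodes[hash(key) % len(self.nodes)]

    def put(self, key, value):
        self._node_for(key)[key] = value   # stored on a single node

    def get(self, key):
        return self._node_for(key).get(key)  # query routed, not broadcast

grid = DataGrid(3)
grid.put("sensor-1", 21.5)
value = grid.get("sensor-1")
stored = sum(len(node) for node in grid.nodes)  # record lives on one node
```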