Machine Learning for Big Data Analytics, Interactive and Reinforcement
- Big Data
- Characteristics of Big Data
- The Big Data Revolution
- Why Is Big Data Analytics Important?
- Challenges Faced by Big Data
- Machine Learning
- Types of Machine Learning
- Supervised Learning
- Unsupervised Learning
- Semi-Supervised Learning
- Reinforcement Learning
- Machine Learning Steps
- Collection of Data
- Preparing Data
- Selecting a Model
- Training Model
- Evaluation of the Model
- Tuning of Parameter
- Making Predictions (Matthew Mayo, 2018)
The term "big data" refers to the very large amounts of data that are growing at an explosive rate, together with the ways of collecting and organizing such large data sets. Data analytics is the process of examining these data sets in order to extract the necessary information by building all possible relations among the various data; through these relations, big data yields far more insight than its raw size suggests. Machine learning (ML) is an application of artificial intelligence (AI) that gives a system the ability to learn on its own from observed experience. The volume of data termed big data is far too much information for a person to examine, so ML as a service in data analytics has helped in managing this huge amount of data so that it can be processed easily.
The big data trend promises to change how we live, work, and think by providing access to a mode of optimization, enabling insight discovery, and improving decision-making. The ability to make sense of such huge data depends on how well it has been drawn out using data analytics.
Characteristics of Big Data
The characteristics of big data are as follows (Figure 13.1):
- 1. Volume: Data is extracted from a large number of places, both online and offline, so there is a huge amount of information to store and maintain precisely. The name "big data" itself refers to volume. The more data, and the better its quality, the better the analysis will be. However, storing these huge volumes of data is often not easy.
- 2. Velocity: Velocity is the speed at which data is transmitted. It deals with measuring how swiftly data arrives from different sources in the real world, online and offline. For large organizations, this incoming stream of data is very large.
FIGURE 13.1 Characteristics of big data.
- 3. Variety: Data reaches us in many forms, such as text, images, videos, document records, audio, and online sources. The different types of data available to us constitute its variety (Gunjan Dogra, 2018; Shweta Mittal, Om Prakash Sangwan, 2019).
The Big Data Revolution
Figure 13.2 shows how big data has been perceived over the past decade. Concealed within the tremendous quantity of amorphous junk data are scraps of information that allow us to infer user preferences, trends in the world, and other precious information that helps businesses grow. But why has big data become so vitally important so quickly? Data has always existed, but it was not as easily available to everyone as it is now, nor present in such quantities that it mattered. Now that the Internet has become a huge part of our lives, we unknowingly leave our impressions in technological form all over the network. Just as a police detective excavates clues from a crime scene, organizations analyze these various technological impressions, building a profile that allows them to compute our habits and know exactly what we want to do at a particular time. Emerging organizations use big data from social media, web browsing, industry predictions, and existing customer records to examine, visualize, and base their business decisions on. Different organizational sectors use big data to solve their business challenges, transform their processes, and bring about innovation. Retail, medical, banking, government, and security sectors use the big data strategy of trust, availability, and speed to
FIGURE 13.2 Big data revolution.
improve profits. Hence there has been a rush to hire data analysts, who use complex languages to analyze millions of pieces of data to shape future products, or who tap into the internal human-resources data of companies to assess employee engagement and retain talent. Big data is among the few technologies with the capability to reshape the entire look of tech as we know it. Becoming a part of this revolution will help us shape our future, as we are now aware that big data is the future ahead (Shweta Iyer, 2019).
Why Is Big Data Analytics Important?
The main function of big data analytics is to solve problems such as reducing costs, saving time, and lowering the risks of decision-making. Organizations profit by combining ML and data analytics:
- 1. Managing risk and calculating potential causes of risk.
- 2. Determining causes of business failure and removing them in future.
- 3. Continuously offering users options in accordance with their purchasing style and preference.
- 4. Identifying any fraudulent activity by cross-checking data.
Challenges Faced by Big Data
Big data is facing a lot of challenges. Working with big data has become a normal part of business, but this doesn’t mean that big data can be handled easily (Figure 13.3). It is quite understandable that one difficulty associated with big data is simply storing and reviewing the data. The amount of information kept in online storage devices
FIGURE 13.3 Challenges of big data.
is doubling every two years. In the near future, the total amount would fill a stack of tablets reaching from the earth to the moon 6.6 times. Enterprises are responsible for about 85 percent of that data. Much of this data does not reside in a database because it is unstructured, and analyzing and searching through documents, photos, audio, videos, and other unstructured data can be difficult. To handle the increasing amount of data, organizations are developing other options. Where storage is concerned, converged and hyper-converged infrastructure and software-defined storage can make it easier for organizations to scale out their hardware. The costs of big data storage, and the space associated with it, can be reduced using technologies such as compression, deduplication, and tiering. Enterprises take the help of tools such as NoSQL databases, Hadoop, Spark, big data analytics software, business intelligence applications, AI, and ML to dig deeply through the data. Big data is a very appealing target for hackers, so security is also a major focus for companies with big data stores (Cynthia Harvey, 2017).
The goals associated with big data:
- 1. Decreasing expenses through operational cost efficiencies.
- 2. Establishing a data-driven culture.
- 3. Creating new avenues for innovation and disruption.
- 4. Accelerating the speed with which new capabilities and services are deployed.
- 5. Launching new product and service offerings.
ML is the field in which computers are capable of learning by themselves from experience without being explicitly programmed. It is one of the most remarkable technologies of recent generations. Using ML, machines are capable of learning by observing and analyzing data.
Types of Machine Learning
The main types of ML are:
- 1. Supervised ML
- 2. Unsupervised ML
Other types of ML are (Figure 13.4):
- • Semi-supervised ML
- • Reinforcement learning (usually treated as a category of its own, distinct from supervised and unsupervised learning) (Wikipedia; https://en.wikipedia.org/wiki/Reinforcement_learning)
In supervised learning we have a complete, labeled picture of the data and we know how the output must look. We have complete knowledge of and control over the result, and the process is continuously supervised (Figure 13.5).
FIGURE 13.5 Supervised learning.
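The supervised setting can be illustrated with a minimal sketch: a 1-nearest-neighbour classifier that predicts a label for a new point from labelled training examples. The toy data and names here are hypothetical, chosen only for illustration.

```python
# Minimal illustration of supervised learning: a 1-nearest-neighbour
# classifier "learns" from labelled examples (hypothetical toy data).

def nearest_neighbour_predict(train, point):
    """Return the label of the training example closest to `point`."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    features, label = min(train, key=lambda ex: dist(ex[0], point))
    return label

# Labelled training data: (features, label) pairs.
train = [((1.0, 1.0), "small"), ((1.2, 0.8), "small"),
         ((8.0, 9.0), "large"), ((9.0, 8.5), "large")]

print(nearest_neighbour_predict(train, (1.1, 0.9)))  # prints "small"
```

Because every training example carries a known label, the model's output can be checked directly against the expected answer, which is the defining property of supervised learning.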
Unsupervised learning is similar to supervised learning; however, unlike in supervised learning, we have little or no idea how our output is likely to appear. We can only draw a rough idea from the structure of the data, which we obtain by grouping the data based on relationships among the variables in it (Figure 13.6).
NOTE: The act of grouping a collection of objects in such a manner that objects in the same group (known as a cluster) are more alike (in some sense) to one another than to objects in other groups (clusters) is known as clustering.
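A minimal k-means sketch (with k = 2) illustrates clustering: points are repeatedly assigned to the nearer of two centroids, and each centroid is moved to the mean of its group. The one-dimensional toy data and starting centroids are hypothetical.

```python
# Toy k-means clustering (k = 2) on 1-D data: no labels are given;
# the grouping emerges purely from the structure of the data.

def kmeans_1d(points, c1, c2, iters=10):
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid's cluster.
        g1 = [p for p in points if abs(p - c1) <= abs(p - c2)]
        g2 = [p for p in points if abs(p - c1) > abs(p - c2)]
        # Update step: move each centroid to the mean of its cluster.
        c1 = sum(g1) / len(g1) if g1 else c1
        c2 = sum(g2) / len(g2) if g2 else c2
    return sorted(g1), sorted(g2)

points = [1.0, 1.5, 2.0, 10.0, 11.0, 12.0]
print(kmeans_1d(points, 0.0, 5.0))  # two clearly separated clusters
```

Note that the algorithm never sees a "correct" grouping; it discovers the two clusters from the distances alone.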
FIGURE 13.7 Semi-supervised learning.
ML problems fall into this category when we have very few labeled variables and most of the target variables are unlabeled. We take the help of those few labeled targets to decide classes for the unlabeled targets (Figure 13.7).
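This idea can be sketched as a simple self-labelling step: the few labelled examples assign pseudo-labels to the unlabelled ones by nearest distance. The data is a hypothetical toy set; a real semi-supervised system would iterate with confidence thresholds rather than label everything in one pass.

```python
# Sketch of semi-supervised learning: a handful of labelled targets
# decide classes for the unlabelled ones via nearest-neighbour distance.

def self_label(labelled, unlabelled):
    out = []
    for x in unlabelled:
        # Find the closest labelled example and borrow its class.
        _, lab = min(labelled, key=lambda ex: abs(ex[0] - x))
        out.append((x, lab))
    return out

labelled = [(1.0, "A"), (9.0, "B")]   # the few labelled targets
unlabelled = [2.0, 3.0, 8.0]          # the (many) unlabelled targets
print(self_label(labelled, unlabelled))
```
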
Reinforcement learning is about taking suitable actions so as to maximize the reward obtainable in a specified task. It is achieved using specialized software and machines that help find the best result or path for a given task (J. Qui, Q. Wu, G. Ding, Y. Xu, S. Feng, 2016) (Figure 13.8).
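A minimal tabular Q-learning sketch makes this concrete: an agent in a hypothetical five-cell corridor earns a reward only at the last cell, and by trial and error learns that stepping right in every cell is the best path. The environment, constants, and episode count are all illustrative assumptions.

```python
import random

# Toy Q-learning: states 0..4, actions -1 (left) and +1 (right),
# reward 1.0 only for reaching the goal cell. The update rule is
#   Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).

N, GOAL = 5, 4
Q = {(s, a): 0.0 for s in range(N) for a in (-1, 1)}

def step(s, a):
    s2 = min(max(s + a, 0), N - 1)        # walls clip the move
    return s2, (1.0 if s2 == GOAL else 0.0)

random.seed(0)
for _ in range(500):                      # training episodes
    s = 0
    while s != GOAL:
        if random.random() < 0.2:         # explore occasionally
            a = random.choice((-1, 1))
        else:                             # otherwise act greedily
            a = max((-1, 1), key=lambda act: Q[(s, act)])
        s2, r = step(s, a)
        best_next = max(Q[(s2, -1)], Q[(s2, 1)])
        Q[(s, a)] += 0.5 * (r + 0.9 * best_next - Q[(s, a)])
        s = s2

# The greedy policy after training: the learned action in each cell.
policy = [max((-1, 1), key=lambda act: Q[(s, act)]) for s in range(GOAL)]
print(policy)
```

No labelled answers are ever provided; the reward signal alone shapes the policy, which is what distinguishes reinforcement learning from the other paradigms above.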
Machine Learning Steps
Collection of Data
- • The accuracy of the model is determined by the quantity and quality of the data collected.
- • The result of this step is a representation of the data that will be used for training.
- • Using previously collected data, through datasets from Kaggle, UCI, etc., also takes place in this step.
Preparing Data
- • Wrangle the data and prepare it for training.
- • Randomize data, which erases the effects of the particular order in which we collected and/or otherwise prepared our data.
- • Visualize data to help detect relevant relationships between variables or class imbalances (bias alert!), or perform other exploratory analysis.
- • Divide the data into training and evaluation sets.
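The preparation steps above can be sketched in a few lines: shuffle the rows to erase collection-order effects, then split them into training and evaluation sets. The 80/20 split and fixed seed are common but arbitrary choices, assumed here for illustration.

```python
import random

# Sketch of the "prepare data" step: randomize, then split.

def shuffle_split(rows, train_frac=0.8, seed=42):
    rows = list(rows)
    random.Random(seed).shuffle(rows)     # erase the collection order
    cut = int(len(rows) * train_frac)
    return rows[:cut], rows[cut:]         # (training set, evaluation set)

data = list(range(10))
train, evaluation = shuffle_split(data)
print(len(train), len(evaluation))  # prints: 8 2
```

Splitting before training ensures the evaluation set stays unseen, which matters for the evaluation step below.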
Selecting a Model
- • Choosing the right algorithm is necessary, as different algorithms are made for different purposes.
Training Model
- • The aim of training is to provide an accurate result to the problem.
- • Linear regression example: the algorithm would need to learn values for m (or W) and b (x is the input, y is the output).
- • Each iteration of the process is a training step.
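The linear-regression example can be sketched as a gradient-descent training loop that learns m and b for y = m * x + b. The toy data is generated from y = 2x + 1 (a hypothetical choice), so training should recover m close to 2 and b close to 1; the learning rate and step count are illustrative.

```python
# Sketch of training for linear regression: each loop iteration is one
# training step that nudges m and b down the gradient of squared error.

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]              # targets from y = 2x + 1

m, b = 0.0, 0.0
lr = 0.02                                 # learning rate (assumed)
for _ in range(5000):                     # each pass is a training step
    grad_m = sum(2 * (m * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (m * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    m -= lr * grad_m
    b -= lr * grad_b

print(round(m, 2), round(b, 2))  # approximately 2.0 and 1.0
```
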
Evaluation of the Model
- • Uses some metric, or combination of metrics, to "compute" the objective performance of the model.
- • Test the model against previously unseen data.
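Evaluation on unseen data usually reduces to comparing predictions against true labels with a metric; accuracy is the simplest such metric. The labels below are hypothetical.

```python
# Sketch of model evaluation: score held-out predictions with accuracy,
# i.e. the fraction of predictions that match the true labels.

def accuracy(y_true, y_pred):
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

y_true = ["cat", "dog", "cat", "dog", "cat"]   # held-out true labels
y_pred = ["cat", "dog", "dog", "dog", "cat"]   # the model's predictions
print(accuracy(y_true, y_pred))  # prints 0.8
```
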
Tuning of Parameter
- • This step refers to hyperparameter tuning, which is more of an art than a science. It tunes the model for enhanced performance.
- • The tuned model's performance should improve as a result.
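Hyperparameter tuning can be sketched as a simple grid search: train the same model with several candidate learning rates and keep the one with the lowest held-out error. The candidate values, data, and error measure are all illustrative assumptions.

```python
# Sketch of hyperparameter tuning: grid search over the learning rate
# for a tiny one-parameter regression (true slope is 3).

xs = [0.0, 1.0, 2.0, 3.0]
ys = [3 * x for x in xs]

def train_and_score(lr, steps=200):
    """Train slope m with the given learning rate; return its error."""
    m = 0.0
    for _ in range(steps):
        grad = sum(2 * (m * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        m -= lr * grad
    return abs(m - 3.0)                   # validation-style error

candidates = [0.001, 0.01, 0.1]           # the hyperparameter grid
best_lr = min(candidates, key=train_and_score)
print(best_lr)
```

Here the "art" lies in choosing which hyperparameters and candidate values to search; the search itself is mechanical.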