The Relationship between Data Mining, Machine Learning, and Artificial Intelligence

Figure 1.3 represents the relationship among data mining, Artificial Intelligence (AI), data science, and machine learning. Artificial Intelligence can be defined as the study of training computers to accomplish tasks which, at present, humans do better. Machine learning is a sub-field of data science that focuses on the development of algorithms that learn from data in order to make predictions based on the given information.

FIGURE 1.3

Relationship between AI and machine learning.

TABLE 1.1

Difference between Machine Learning and AI

| Artificial Intelligence (AI) | Machine Learning (ML) |
| --- | --- |
| Focus is on increasing the chance of success, not on accuracy. | Main focus is to get the maximum accuracy. |
| The goal of AI is to imitate human intelligence in order to solve complex problems. | In ML, the primary goal is to learn from data on a specific task so as to maximize the machine's performance on that task. |
| AI leads to intelligence. | ML leads to knowledge. |
| It aims to build a system that imitates humans and behaves similarly in a particular circumstance. | It involves developing self-learning algorithms which can learn independently. |

Table 1.1 summarizes the difference between machine learning and AI.

Applications of Machine Learning

Whenever the term machine learning is used, people generally think of "AI", "neural networks that can simulate human brains", self-driving cars, and more. But machine learning reaches well beyond these examples. Below we describe both the expected and the unexpected areas of contemporary computing where machine learning is applied.

Machine Learning: The Expected

  • 1. Speech Recognition
  • 2. Computer Vision (Facial Recognition, Pattern Recognition, and Character Recognition Techniques belong to Computer Vision)
  • 3. Google's Self-Driving Car
  • 4. Web Search Engine
  • 5. Photo Tagging Applications
  • 6. Spam Detector
  • 7. Database Mining for Growth of Automation
  • 8. Understanding Human Learning

Machine Learning: The Unexpected

  • 1. YouTube/Netflix
  • 2. Data Mining/Big Data
  • 3. Amazon's Product Recommendations
  • 4. Stock Market/Housing Finance/Real Estate

Types of Machine Learning

Machine learning can be defined as learning from past experience with respect to some task, and it takes one of the types shown in Figure 1.4.

Supervised Learning

This is the most popular paradigm for machine learning, which learns from labeled data. A function is inferred from the data that maps each input to its target, f: x → y, where f is the function learned from input-output pairs (x, y). It is further of two types: classification and regression. Classification predicts categorical answers, and the function learns the class codes of the different classes, for example (0/1) or (yes/no). Naive Bayes, decision tree (Batra and Agrawal 2018), k-nearest neighbor (Agrawal 2019), and support vector machines (SVM) are frequently used algorithms for classification. Regression predicts a numerical response, e.g., the future value of stock prices. Linear regression, neural networks, and regularized regression are algorithms used for regression. Table 1.2 (A and B) shows the difference between classification and regression.

Table 1.2A represents the classification task by showing the dataset of a shopping store with input variables as user ID, gender, age, and salary.

FIGURE 1.4

Types of machine learning.

TABLE 1.2A

Classification

| User ID | Gender | Age | Salary | Buy Product (Yes/No) |
| --- | --- | --- | --- | --- |
| 101 | M | 42 | 15k | Yes |
| 102 | M | 65 | 55k | No |
| 103 | F | 65 | 50k | Yes |
| 105 | F | 35 | 20k | Yes |

TABLE 1.2B

Regression

| Temp | Pressure | Relative Humidity | Wind Direction | Wind Speed |
| --- | --- | --- | --- | --- |
| 17.70 | 988.11 | 39.11 | 192.92 | 2.973 |
| 24.23 | 988.24 | 19.74 | 318.32 | 0.32 |
| 22.54 | 989.56 | 22.81 | 44.66 | 0.264 |

Based on the input variables in Table 1.2A, the machine learning algorithm will predict whether the customer will buy a product or not (0 for no, 1 for yes). Table 1.2B shows the data of a meteorological department with input variables of temperature, pressure, relative humidity, and wind direction; after applying regression techniques, wind speed is determined.

In classification, the goal is to predict a discrete value that identifies a specific class, and the model is evaluated on the basis of accuracy. The output is 0 or 1 (yes or no) in binary classification, while multi-class classification has more than two classes. In regression, the output takes continuous values.
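The contrast between the two output types can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows come from Tables 1.2A and 1.2B, but the choice of a 1-nearest-neighbor rule on (age, salary), the choice of relative humidity as the single regression input, and the function names `classify_1nn`, `fit_line`, and `predict_wind_speed` are all made up for this example and are not the method used in the text.

```python
import math

# Rows from Table 1.2A: (age, salary in thousands) -> Buy Product label.
TRAIN_CLS = [
    ((42, 15), "Yes"),
    ((65, 55), "No"),
    ((65, 50), "Yes"),
    ((35, 20), "Yes"),
]

def classify_1nn(query, train=TRAIN_CLS):
    """Return the label of the training row closest to `query` (Euclidean distance)."""
    _, label = min(train, key=lambda row: math.dist(row[0], query))
    return label

# Rows from Table 1.2B: (relative humidity, wind speed).
# Using humidity alone is an assumption made to keep the sketch short.
TRAIN_REG = [(39.11, 2.973), (19.74, 0.32), (22.81, 0.264)]

def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept with one input."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict_wind_speed(humidity, points=TRAIN_REG):
    """Predict a continuous wind speed from relative humidity."""
    slope, intercept = fit_line(points)
    return slope * humidity + intercept

print(classify_1nn((40, 18)))    # classification: the output is a class label
print(predict_wind_speed(30.0))  # regression: the output is a real number
```

With three or four training rows neither model is meaningful; the point is only that the classifier returns one of a fixed set of labels while the regressor returns a number on a continuous scale.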

Table 1.3 summarizes the supervised algorithms which are used in machine learning.

A confusion matrix is a method of performance measurement for machine learning classification. Table 1.4 shows one for a binary classification problem, and Table 1.5 shows one for a three-class problem. It is very useful for evaluating Precision, Recall, Specificity, the AUC-ROC curve, and Accuracy.

Each entry Cij of Tables 1.4 and 1.5 depicts the number of records from class i predicted to be of class j. For example, C10 is the number of records from class 1 incorrectly predicted as class 0. On the other hand, C00 represents the number of records from class 0 that were correctly predicted as class 0. From the confusion matrix we can find the total number of correct predictions made by the classification model as (C11 + C00) and the total number of incorrect predictions as (C10 + C01).

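A good classification model is expected to have more records in cells C11 and C00 and fewer records in C01 and C10. The most popular performance metric for evaluating the merit of a classifier is the accuracy, defined by:

$$\text{Accuracy} = \frac{C_{11} + C_{00}}{C_{11} + C_{10} + C_{01} + C_{00}}$$

Similarly, to find the error rate of the classification model, we use the following equation:

$$\text{Error rate} = \frac{C_{10} + C_{01}}{C_{11} + C_{10} + C_{01} + C_{00}}$$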
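As a rough sketch of how these quantities are computed, the Python below counts the four cells of a binary confusion matrix and derives accuracy and error rate from them, together with the standard definitions of precision, recall, and specificity mentioned earlier. The label lists `ACTUAL` and `PREDICTED` are invented for illustration and do not come from the text; the function names are likewise hypothetical.

```python
def confusion_counts(actual, predicted):
    """Count Cij = number of records of class i predicted as class j (classes 0 and 1)."""
    counts = {(1, 1): 0, (1, 0): 0, (0, 1): 0, (0, 0): 0}
    for a, p in zip(actual, predicted):
        counts[(a, p)] += 1
    return counts

def accuracy(c):
    """(C11 + C00) divided by the total number of records."""
    return (c[(1, 1)] + c[(0, 0)]) / sum(c.values())

def error_rate(c):
    """(C10 + C01) divided by the total number of records."""
    return (c[(1, 0)] + c[(0, 1)]) / sum(c.values())

def precision(c):
    """Fraction of records predicted as class 1 that really are class 1."""
    return c[(1, 1)] / (c[(1, 1)] + c[(0, 1)])

def recall(c):
    """Fraction of class 1 records that were predicted as class 1."""
    return c[(1, 1)] / (c[(1, 1)] + c[(1, 0)])

def specificity(c):
    """Fraction of class 0 records that were predicted as class 0."""
    return c[(0, 0)] / (c[(0, 0)] + c[(0, 1)])

# Made-up labels, for illustration only.
ACTUAL = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
PREDICTED = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

cells = confusion_counts(ACTUAL, PREDICTED)
print(cells)
print(accuracy(cells), error_rate(cells))
print(precision(cells), recall(cells), specificity(cells))
```

Accuracy and error rate always sum to 1, since every record is counted in exactly one of the four cells.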

TABLE 1.3

Supervised Algorithms

| Algorithm | Type | Description |
| --- | --- | --- |
| Linear regression | Regression | This technique correlates each feature to the output, which helps to predict future values. |
| Logistic regression | Classification | This technique is an extension of linear regression, used for classification tasks, and takes its output variable as binary. |
| Decision tree | Regression/Classification | It is a model that predicts values by splitting nodes into child nodes, forming the structure of a tree. |
| Support Vector Machine (SVM) | Classification or Regression | This algorithm is best used with a non-linear solver. It finds a hyperplane that separates the classes optimally. |
| Naive Bayes | Classification or Regression | The Naive Bayes classification technique combines the prior probability of an event with the probability of each feature, treating the features as independent. |
| AdaBoost | Classification or Regression | It combines an ensemble of models to reach a decision, weighting each model by its prediction accuracy. |
| Random forest | Classification or Regression | Random forest uses the "majority vote" method on multiple decision trees to label the output. |
| Gradient-boosting | Classification or Regression | It focuses on the error generated by the preceding trees to update the results. |
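The "majority vote" step listed for random forest can be illustrated with a short Python sketch. It shows only the voting step, not how a random forest trains its trees: the three "trees" below are hand-written threshold rules invented for this example (hypothetical names `tree_age`, `tree_salary`, `tree_gender`), and the rows are taken from Table 1.2A.

```python
from collections import Counter

def majority_vote(labels):
    """Return the label predicted by the largest number of trees."""
    return Counter(labels).most_common(1)[0][0]

# Hypothetical one-rule "trees". A real random forest learns its trees
# from bootstrap samples of the data; these rules are assumptions.
def tree_age(row):
    return "Yes" if row["age"] < 60 else "No"

def tree_salary(row):
    return "Yes" if row["salary_k"] <= 50 else "No"

def tree_gender(row):
    return "Yes" if row["gender"] == "F" else "No"

FOREST = [tree_age, tree_salary, tree_gender]

def forest_predict(row, forest=FOREST):
    """Collect one vote per tree and return the majority label."""
    return majority_vote([tree(row) for tree in forest])

# Rows of Table 1.2A (salary in thousands).
ROWS = [
    {"user_id": 101, "gender": "M", "age": 42, "salary_k": 15},
    {"user_id": 102, "gender": "M", "age": 65, "salary_k": 55},
    {"user_id": 103, "gender": "F", "age": 65, "salary_k": 50},
    {"user_id": 105, "gender": "F", "age": 35, "salary_k": 20},
]

for row in ROWS:
    print(row["user_id"], forest_predict(row))
```

Using an odd number of voters avoids ties when there are only two labels.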

TABLE 1.4

Confusion Matrix for a Binary Classification Problem

| | Predicted Class 1 | Predicted Class 0 |
| --- | --- | --- |
| Actual Class 1 | C11 | C10 |
| Actual Class 0 | C01 | C00 |

TABLE 1.5

Confusion Matrix for a Three-Class Problem

| Actual Class | Predicted C1 | Predicted C2 | Predicted C3 |
| --- | --- | --- | --- |
| C1 | C11 | C12 | C13 |
| C2 | C21 | C22 | C23 |
| C3 | C31 | C32 | C33 |

The key objective of a classification model is to achieve the highest accuracy and the lowest error rate.
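The same idea extends to more than two classes: correct predictions lie on the diagonal of the confusion matrix, so accuracy is the diagonal sum divided by the total number of records. The Python below is a minimal sketch of this for a three-class problem; the class names follow Table 1.5, while the label lists and the function names are made up for illustration.

```python
def confusion_matrix(actual, predicted, classes):
    """Build a matrix whose entry [i][j] counts records of class i predicted as class j."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

def accuracy(matrix):
    """Correct predictions (the diagonal) divided by all records."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

def error_rate(matrix):
    """Incorrect predictions (off-diagonal cells) divided by all records."""
    return 1.0 - accuracy(matrix)

# Made-up labels, for illustration only.
CLASSES = ["C1", "C2", "C3"]
ACTUAL = ["C1", "C1", "C1", "C2", "C2", "C2", "C3", "C3", "C3", "C3"]
PREDICTED = ["C1", "C1", "C2", "C2", "C2", "C3", "C3", "C3", "C3", "C1"]

m = confusion_matrix(ACTUAL, PREDICTED, CLASSES)
for name, row in zip(CLASSES, m):
    print(name, row)
print(accuracy(m), error_rate(m))
```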

Supervised Learning Use Cases

  • 1. Cortana: this automated speech system is used in mobile applications. It first trains itself on voice samples from mobile phones and then makes predictions based on this data.
  • 2. Weather Apps: weather apps predict future weather conditions for a given time, based on previous data.
  • 3. Biometric Attendance: the machine can be trained on biometric identifiers such as the iris, thumbprint, or earlobe. After training, the machine can easily identify the person.
 