Data Analytics and AI

Data Analytics: Descriptive vs. Predictive vs. Prescriptive

The term “Data Analytics” has been used in the business world for decades. It refers to the use of various techniques and processes to find meaningful patterns in data. It can be as simple as computing statistics such as the average age of customers. It may also involve applying advanced AI algorithms to extract insights from massive amounts of data and predict business trends. Data analytics tools enable us to describe what happened in the past, draw insights about the present, and make predictions about the future.

The type of data analytics that describes what happened is called “Descriptive Analytics.” It consolidates data into a form that supports reporting and analysis. As the simplest class of analytics, it usually takes place as the first step in a data analysis process. For example, it helps answer questions such as “What is our sales growth this month compared with last month? Which customers have required the most customer service help? What is the total revenue per subscriber?” Such initial questions must be answered regardless of an organization’s other, more advanced analytics capabilities, and the answers establish a common ground for applying other types of analytics. The findings from descriptive analytics are frequently presented through data visualizations such as reports, dashboards, and scorecards.
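As a minimal sketch of the descriptive questions above, the following Python computes month-over-month sales growth, the customers with the most service requests, and revenue per subscriber. All figures, customer names, and variable names are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical figures for illustration only.
monthly_sales = {"2024-01": 120_000.0, "2024-02": 135_000.0}
support_tickets = ["acme", "globex", "acme", "initech", "acme", "globex"]  # one entry per request
total_revenue = 135_000.0
subscribers = 450


def growth_rate(previous: float, current: float) -> float:
    """Percentage change from the previous period to the current one."""
    return (current - previous) / previous * 100.0


# What is our sales growth this month compared with last month?
mom_growth = growth_rate(monthly_sales["2024-01"], monthly_sales["2024-02"])
# Which customers have required the most customer service help?
top_customers = Counter(support_tickets).most_common(2)
# What is the total revenue per subscriber?
revenue_per_subscriber = total_revenue / subscribers

print(f"Sales growth: {mom_growth:.1f}%")
print("Most service requests:", top_customers)
print(f"Revenue per subscriber: {revenue_per_subscriber:.2f}")
```

With these made-up numbers the sketch reports 12.5% growth, acme and globex as the heaviest support users, and 300.00 in revenue per subscriber; in practice such results would feed a report or dashboard.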

Predictive analytics uses statistical modeling and machine learning techniques to make predictions about future outcomes based on patterns found in existing data. For example, a predictive churn model looks at data from customers who have already churned (that is, previously active users who stopped using the service) and at their characteristics and behaviors in order to identify current customers who are likely to churn. This method requires customer data from various sources and statistical modeling techniques to estimate the probability that a user will churn. Some techniques can also estimate, probabilistically, the steps and stages at which a customer is likely to leave.
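The churn model described above can be sketched as a logistic regression, one common way (though not the only one) to estimate a churn probability. This minimal version, trained by batch gradient descent, assumes two hypothetical features per customer (months since last login and number of support calls) on a toy labeled data set; a real model would draw on many more features and data sources.

```python
import math

# Toy labeled data (hypothetical). Features per customer:
# [months since last login, support calls in the last quarter]
# Label: 1 = churned, 0 = still active.
X = [[0.0, 1.0], [1.0, 0.0], [1.0, 2.0], [5.0, 4.0], [6.0, 3.0], [7.0, 5.0]]
y = [0, 0, 0, 1, 1, 1]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def churn_probability(w, b, x):
    """Estimated probability that a customer with features x will churn."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            error = churn_probability(w, b, xi) - yi  # derivative of log loss w.r.t. the score
            for j in range(d):
                grad_w[j] += error * xi[j]
            grad_b += error
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


w, b = train_logistic(X, y)
p_at_risk = churn_probability(w, b, [6.0, 4.0])  # long inactive, many support calls
p_engaged = churn_probability(w, b, [0.0, 0.0])  # recently active, no support calls
print(f"at-risk customer: {p_at_risk:.2f}, engaged customer: {p_engaged:.2f}")
```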

Prescriptive analytics uses the results of descriptive and predictive analysis to make prescriptions (or recommendations) about the optimal actions for achieving business objectives such as customer service, profits, and operational efficiency. It goes beyond predicting future outcomes by suggesting actions that take advantage of the predictions and by showing the implications of each decision option.[1]
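To illustrate how a prescriptive step can build on a predictive result, the sketch below takes a churn probability as given and compares the expected value of a few hypothetical retention actions, reporting the implication of each option and recommending the best one. The options, costs, and success rates are made-up assumptions, and the simple expected-value rule is only one possible decision criterion.

```python
# Hypothetical inputs: a churn probability from a predictive model and the
# revenue kept if the customer stays. All numbers are illustrative.
churn_prob = 0.6
customer_value = 500.0

# Each option: (cost of the action, probability the action prevents churn).
options = {
    "do nothing": (0.0, 0.0),
    "send email": (5.0, 0.10),
    "offer discount": (60.0, 0.40),
}


def expected_value(cost: float, save_rate: float) -> float:
    """Expected retained revenue minus the cost of the action."""
    p_stay = (1.0 - churn_prob) + churn_prob * save_rate
    return p_stay * customer_value - cost


scores = {name: expected_value(cost, rate) for name, (cost, rate) in options.items()}
for name, score in scores.items():
    print(f"{name}: expected value {score:.2f}")  # implication of each option
best = max(scores, key=scores.get)
print("recommended action:", best)
```

With these assumptions the three options score 200.00, 225.00, and 260.00, so the discount is recommended; a costlier discount or a lower churn probability could change the recommendation.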

Optimization and decision modeling technologies are used to solve complex decision problems involving millions of decision variables, constraints, and tradeoffs.[2]
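Real decision problems of this scale are usually handled by dedicated optimization solvers. As a small stand-in, the sketch below frames a hypothetical retention-budget decision as a 0/1 knapsack problem (choose which customers to target so that total cost stays within budget and expected saved revenue is maximized) and solves it exactly with dynamic programming. The candidates, costs, and values are invented for illustration.

```python
# Hypothetical candidates: (customer, offer cost, expected revenue saved).
candidates = [("A", 40, 120.0), ("B", 30, 80.0), ("C", 50, 150.0), ("D", 20, 45.0)]
budget = 100


def best_allocation(items, capacity):
    """Solve a 0/1 knapsack: maximize total value with total cost <= capacity."""
    # best[c] holds (value, chosen names) for a total cost of at most c.
    best = [(0.0, [])] * (capacity + 1)  # safe to share: entries are replaced, never mutated
    for name, cost, value in items:
        # Walk capacities downward so each item is chosen at most once.
        for c in range(capacity, cost - 1, -1):
            candidate = best[c - cost][0] + value
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - cost][1] + [name])
    return best[capacity]


value, chosen = best_allocation(candidates, budget)
print(f"target {chosen} for an expected {value:.0f} saved")
```

Here the optimum targets B, C, and D (cost 100, value 275), whereas greedily taking the most valuable customers first (C, then A) would yield only 270, which is why exact optimization methods matter even in small cases.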

As the starting step in data analytics, descriptive analytics has been the major form of analytics in traditional business intelligence. However, with the availability of large volumes of data and advanced analysis techniques, more effort is going toward predictive and prescriptive analytics.

Advanced Analytics toward Machine Learning and Artificial Intelligence

If traditional analytics helps answer “What happened?”, advanced analytics focuses more on predicting the future and finding patterns and insights that can help shape it. Advanced analytics refers to the set of sophisticated analytical tools, techniques, and methods designed to uncover deep trends, patterns, and insights hidden within data, predict the future, and drive change using data-driven information. It gives organizations far more control and a better-informed perspective when making business-critical decisions. It encompasses predictive analytics, prescriptive analytics, and other analytics that involve high-level analytical methods.

The development of advanced analytics has greatly benefited from advances in machine learning, a field at the intersection of statistics and computer science. As a subset of artificial intelligence, machine learning fits large-scale statistical models to large data sets, enabling computers to learn automatically from data and improve with experience. Today, machine learning (especially deep learning), together with big data, is becoming the main driver of artificial intelligence applications in many different fields.

Machine Learning Approaches

Generally speaking, there are four categories of machine learning algorithms.

Supervised Learning. Supervised learning aims to find patterns in data that can be used for predictive analytics. To do so, a “well-understood” data set is first established. For example, there could be millions of email messages, each of which is either spam or non-spam. By labeling each message with the type it belongs to and defining which features help distinguish between the two types, you obtain a well-understood data set that a supervised machine learning algorithm can work with. In supervised learning, such a data set is called a “training data set.” If the label is a discrete value such as “Yes” (or 1) or “No” (or 0), the task is known as “Classification.” If the label is continuous, the task is known as “Regression.”
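To make the spam example concrete, here is a minimal sketch of a supervised classifier trained on a tiny hypothetical labeled data set, using the words of each message as features. It uses multinomial naive Bayes with add-one (Laplace) smoothing, one standard choice for text classification; the messages and labels are invented.

```python
import math
from collections import Counter

# A tiny hypothetical training data set: (email text, label).
training_data = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]


def train_naive_bayes(data):
    """Count documents per label and word frequencies per label."""
    label_counts = Counter(label for _, label in data)
    word_counts = {label: Counter() for label in label_counts}
    for text, label in data:
        word_counts[label].update(text.split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return label_counts, word_counts, vocab


def classify(text, model):
    """Return the label with the highest log posterior (add-one smoothing)."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in label_counts.items():
        score = math.log(doc_count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.split():
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_naive_bayes(training_data)
print(classify("claim your free prize", model))
print(classify("team meeting on monday", model))
```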

Classification is one of the most common machine learning tasks. A supervised machine learning algorithm is “trained” on the training data set, which means that a statistical model is built by modeling the relationship (the “correlation”) between the label and the features. The model is then evaluated on a “test” data set. A good test data set should contain a sufficient number of “unseen” examples (data points that did not occur in the training data set) so that “overfitting,” where a model fits the training data well but performs poorly on new data, can be detected. Evaluating on unseen data in this way helps ensure that the resulting model generalizes.
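The following sketch shows why evaluating on unseen data matters. It holds out part of a hypothetical toy data set as a test set, then compares a model that simply memorizes the training labels with a model that learns a single cutoff. Both score perfectly on the training data, but only the held-out test set reveals that the memorizer does not generalize. The fixed every-fourth-example split is an assumption made to keep the result reproducible; in practice the data is usually shuffled before splitting.

```python
from collections import Counter

# Hypothetical toy data: one numeric feature x, label 1 if x >= 10 else 0.
data = [(x, int(x >= 10)) for x in range(20)]

# Hold out every fourth example as unseen test data.
test_set = [ex for i, ex in enumerate(data) if i % 4 == 0]
train_set = [ex for i, ex in enumerate(data) if i % 4 != 0]


def accuracy(model, examples):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(model(x) == label for x, label in examples) / len(examples)


def fit_memorizer(train):
    """Overfit on purpose: recall each training label, else guess the majority label."""
    table = dict(train)
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda x: table.get(x, majority)


def fit_threshold(train):
    """Learn one cutoff t (predict 1 when x >= t) that best fits the training data."""
    best_t = max(range(21), key=lambda t: accuracy(lambda x: int(x >= t), train))
    return lambda x: int(x >= best_t)


for name, fit in [("memorizer", fit_memorizer), ("threshold", fit_threshold)]:
    model = fit(train_set)
    print(name, "train:", accuracy(model, train_set), "test:", accuracy(model, test_set))
```

With this data both models reach 1.0 accuracy on the training set, while on the test set the memorizer scores 0.4 and the threshold model scores 1.0.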

Some of the most widely used supervised learning methods include linear regression, decision trees, random forests, naive Bayes, logistic regression, and support vector machines (SVM). They have been broadly applied to a variety of business applications, including fraud detection, recommendation systems, risk analysis, and sentiment analysis.
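As a small example of the regression task and of linear regression, the first method listed above, the sketch below fits a line by ordinary least squares using the closed-form formulas for a single feature. The advertising-spend and sales figures are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: monthly ad spend vs. sales, both in thousands.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.2, 4.8, 7.1, 9.0, 10.9]

slope, intercept = fit_simple_linear_regression(ad_spend, sales)
predicted = slope * 6.0 + intercept  # continuous label: a regression prediction
print(f"slope={slope:.2f}, intercept={intercept:.2f}, prediction for 6.0: {predicted:.2f}")
```

For this data the fitted slope is 1.96 and the intercept is 1.12, so a spend of 6 predicts sales of about 12.88.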

Unsupervised Learning. Unsupervised learning applies when not enough labeled data is available. In other words, no well-understood data set is available, or labeling a new data set is too costly. By applying a well-designed distance metric, an unsupervised learning algorithm can strategically group the data into clusters: data points that are close enough to one another under that metric are grouped together. This process iterates until the number of clusters stabilizes or the cluster memberships no longer change. A widely used clustering algorithm is k-means. The k-nearest neighbors (KNN) algorithm, although often mentioned alongside k-means, is a supervised method for classification and regression rather than a clustering algorithm.
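The sketch below implements the standard k-means loop (Lloyd’s algorithm) with Euclidean distance as the metric: each point is assigned to its nearest cluster center, the centers are recomputed as cluster means, and the loop stops once no assignment changes. The points and the starting centers are hypothetical; library implementations typically choose starting centers randomly or with k-means++.

```python
import math


def k_means(points, initial_centroids, max_iters=100):
    """Lloyd's k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its points, until assignments stop changing."""
    centroids = list(initial_centroids)
    assignments = None
    for _ in range(max_iters):
        new_assignments = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster: converged
        assignments = new_assignments
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments


# Hypothetical 2-D points forming two loose groups, and two starting centroids.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, assignments = k_means(points, [(0.0, 0.0), (10.0, 10.0)])
print(assignments)
print([tuple(round(v, 2) for v in c) for c in centroids])
```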

Reinforcement Learning. Unlike supervised learning, where labeled data is given to “teach” or “train” the system, in reinforcement learning, learning takes place through trial-and-error interaction between an agent and its environment. This learning process fundamentally mimics how humans (and animals) learn: we perform actions and observe the results of those actions on the environment. This idea, commonly known as “cause and effect,” can be translated into the following steps in reinforcement learning:[3]

  • 1) The agent observes a state in the environment.
  • 2) An action is selected using a decision-making function (policy).
  • 3) The action is performed.
  • 4) The agent receives a scalar reward (reinforcement) or penalty from the environment.
  • 5) The policy is updated/fine-tuned for that particular state accordingly (learning step).
  • 6) The above steps are iterated until an optimal policy is found.

Reinforcement learning has been widely applied in robotics and game playing. It is also used for complex tasks such as autonomous driving.
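The six steps listed above can be sketched with tabular Q-learning, one classic reinforcement learning algorithm, on a hypothetical five-cell corridor where the agent earns a reward of 1 only when it reaches the rightmost cell. The environment, the epsilon-greedy policy, and the hyperparameters (alpha, gamma, epsilon) are illustrative assumptions, and the loop runs a fixed number of episodes rather than testing whether the policy is optimal.

```python
import random

N_STATES = 5       # corridor cells 0..4; cell 4 is the goal (hypothetical environment)
ACTIONS = (0, 1)   # 0 = move left, 1 = move right


def step(state, action):
    """Environment: move one cell; reward 1.0 on reaching the goal, else 0.0."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # value estimates that define the policy
    for _ in range(episodes):                  # 6) iterate over many episodes
        state = 0
        for _ in range(100):                   # cap the episode length
            # 1) observe the state; 2) select an action (epsilon-greedy policy)
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            # 3) perform the action; 4) receive a scalar reward
            next_state, reward, done = step(state, action)
            # 5) learning step: move the estimate toward the observed target
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(policy)  # greedy action in each non-goal cell
```

After training, the greedy policy should choose action 1 (move right) in every non-goal cell, which is the shortest path to the reward.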

Neural Networks and Deep Learning. Neural networks and deep learning currently provide cutting-edge solutions to many problems in traditional AI fields such as image recognition and speech recognition. Neural networks were first developed in the 1950s. Their interconnected structures were designed to emulate how the human brain works so that computers can learn and make decisions in a human-like manner. A neural network consists of an input layer, one or more hidden layers, and an output layer. Because of their heavy demand for computational resources, conventional neural networks had only one to three hidden layers.

The term “deep learning” refers to using a neural network with more hidden layers. Today, a typical deep neural network may consist of thousands or even millions of densely interconnected nodes, and powerful programming tools and computing power are available to support the computations such a large network requires. Compared with traditional supervised learning methods, where features must be carefully selected by hand, deep learning can automatically learn features and model high-level representations of data through its multiple processing layers and the non-linear transformations that take place across them. Data is often high-dimensional and too complicated to be represented by a simple model; in that sense, deep neural networks can provide more expressive and descriptive models for many problems.
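To show what the input, hidden, and output layers compute, the sketch below runs a forward pass through a tiny network with one hidden layer and sigmoid non-linearities. The weights are hand-picked (not learned) so that the network computes XOR, which is not linearly separable and therefore cannot be computed by a single linear unit. A real deep network stacks many such layers and learns its weights from data, typically with backpropagation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum per node, then a sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


# Hand-picked (hypothetical, not trained) weights for a 2-2-1 network.
HIDDEN_W = [[20.0, 20.0], [-20.0, -20.0]]  # node 1 acts like OR, node 2 like NAND
HIDDEN_B = [-10.0, 30.0]
OUTPUT_W = [[20.0, 20.0]]                  # output acts like AND of the hidden nodes
OUTPUT_B = [-30.0]


def forward(x):
    hidden = dense(x, HIDDEN_W, HIDDEN_B)        # input layer -> hidden layer
    return dense(hidden, OUTPUT_W, OUTPUT_B)[0]  # hidden layer -> output layer


for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, round(forward(x), 3))
```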

  • [1] https://en.wikipedia.org/wiki/Prescriptive_analytics#cite_note-6
  • [2]
  • [3]