# Generative Models

These types of models make up the bulk of Unsupervised Learning models. The primary reason for this is that they can generate brand new data samples from the same distribution as the training dataset. These kinds of models are created and implemented to learn the underlying structure of the datasets, that is, information about the data itself. This is very often referred to as the “Metadata.”

# Data Compression

This refers to the process of keeping the datasets as small as possible, so that they can be stored and processed efficiently without draining the processing power of the Machine Learning system. This is very often done through what is known as the “Dimensionality Reduction Process.” Two techniques that are commonly used in this regard are “Singular Value Decomposition” and “Principal Component Analysis.”

Singular Value Decomposition mathematically factors a data matrix into a product of three other matrices, using the concepts of Matrix Algebra. With Principal Component Analysis, Linear Combinations of the original variables are used to find the directions along which the datasets show the greatest statistical variance.
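The two techniques above can be illustrated with a short sketch. It assumes NumPy is available, and the small matrix `X` (6 samples, 3 features) is a made-up example rather than data from this text. PCA is computed here by way of the SVD of the mean-centered data, which is one common approach among several.

```python
import numpy as np

# Hypothetical toy dataset: 6 samples (rows) and 3 features (columns).
X = np.array([
    [2.5, 2.4, 1.2],
    [0.5, 0.7, 0.3],
    [2.2, 2.9, 1.1],
    [1.9, 2.2, 0.9],
    [3.1, 3.0, 1.6],
    [2.3, 2.7, 1.0],
])

# Center each feature so the principal directions describe variance.
mean = X.mean(axis=0)
X_centered = X - mean

# SVD factors the matrix into three matrices: U, diag(s), and Vt.
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)

# The rows of Vt are the principal directions; the squared singular
# values give the variance captured along each direction.
explained_variance = s**2 / (X.shape[0] - 1)
explained_ratio = explained_variance / explained_variance.sum()

# Keep only the first k directions to compress 3 features down to k.
k = 2
X_reduced = X_centered @ Vt[:k].T          # shape (6, 2)
X_approx = X_reduced @ Vt[:k] + mean       # reconstruction from k components

print("explained variance ratio:", np.round(explained_ratio, 3))
print("reduced shape:", X_reduced.shape)
```

The reconstruction `X_approx` shows what is lost by discarding the remaining components; the smaller the discarded variance, the closer it is to the original `X`.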

# Association

As its name implies, this is a Rule-based Machine Learning methodology which can be used to find both hidden and obvious relationships among the items in the datasets. In order to accomplish this, the “Association Rule” is typically applied. It consists of an antecedent (the “If” part) and a consequent (the “Then” part). An example of this is given in the matrix below:

| Transaction | Items That Are Present |
|---|---|
| 1 | Bread, Milk |
| 2 | Bread, Biscuits, Drink, Eggs |
| 3 | Milk, Biscuits, Drink, Diet Coke |
| 4 | Bread, Milk, Biscuits, Diet Coke |
| 5 | Bread, Milk, Diet Coke, Coke |

(SOURCE: 2).

There are two very important properties to be aware of here:

■ The Support Count:

This is the number of transactions in the above matrix that contain a given Item Set. For example, Support Count({Milk, Bread, Biscuits}) = 1, because only transaction 4 contains all three of these items. An Association Rule itself is represented as follows:

X → Y, where X (the antecedent) and Y (the consequent) are Item Sets drawn from the above matrix that have no items in common. For example, {Milk, Biscuits} → {Drink}.

■ The Frequent Item:

This is an Item Set whose Support is equal to or greater than a chosen minimum threshold. In this regard, there are three key metrics that one needs to be aware of:

1) The Support:

This metric describes how frequently an Item Set occurs across all of the transactions. For a rule X → Y, the mathematical formula to calculate this level of occurrence is as follows:

$$\text{Support}(X \rightarrow Y) = \frac{\text{number of transactions containing both } X \text{ and } Y}{\text{total number of transactions}}$$

2) Confidence:

This metric is used to gauge how likely it is that the consequent (Y) occurs in a transaction, given that the antecedent (X) occurs in it. The mathematical formula to calculate this is as follows:

$$\text{Confidence}(X \rightarrow Y) = \frac{\text{number of transactions containing both } X \text{ and } Y}{\text{number of transactions containing } X}$$

3) Lift:

This metric compares the conditional probability of the consequent (Y) given the antecedent (X) with the overall frequency of (Y). More specifically, it can be defined as the statistical rise in the probability of (Y) occurring when (X) is known to be present, relative to the probability of (Y) occurring on its own. A value greater than 1 indicates that X and Y occur together more often than would be expected if they were independent. The mathematical formula to calculate this is as follows:

$$\text{Lift}(X \rightarrow Y) = \frac{\left(\text{number of transactions containing both } X \text{ and } Y\right) / \left(\text{number of transactions containing } X\right)}{\text{fraction of transactions containing } Y}$$
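As a worked illustration of the Support Count and the three metrics, the following minimal sketch applies them to the five transactions in the matrix above, using the rule {Milk, Biscuits} → {Drink}. The function names are hypothetical and chosen only for this example.

```python
# The five transactions from the matrix above, one set of items each.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Biscuits", "Drink", "Eggs"},
    {"Milk", "Biscuits", "Drink", "Diet Coke"},
    {"Bread", "Milk", "Biscuits", "Diet Coke"},
    {"Bread", "Milk", "Diet Coke", "Coke"},
]

def support_count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)

def support(x, y):
    """Fraction of all transactions that contain both X and Y."""
    return support_count(set(x) | set(y)) / len(transactions)

def confidence(x, y):
    """Of the transactions containing X, the fraction that also contain Y."""
    return support_count(set(x) | set(y)) / support_count(x)

def lift(x, y):
    """Confidence of X -> Y divided by the fraction of transactions containing Y."""
    return confidence(x, y) / (support_count(y) / len(transactions))

x, y = {"Milk", "Biscuits"}, {"Drink"}
print("support count of X:", support_count(x))      # 2 (transactions 3 and 4)
print("support:", round(support(x, y), 2))          # 0.2
print("confidence:", round(confidence(x, y), 2))    # 0.5
print("lift:", round(lift(x, y), 2))                # 1.25
```

Here the Lift of 1.25 is greater than 1, so in this small sample {Drink} appears alongside {Milk, Biscuits} somewhat more often than its overall frequency alone would suggest.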

It should be noted that the Association Rule relies heavily upon using data patterns as well as statistical co-occurrences. Very often in these situations, “If/Then” statements are utilized. There are also three other Machine Learning algorithms that fit into this category, and they are as follows:

1) The AIS Algorithm:

With this, the Machine Learning system scans the transactions that are being fed into it, and generates and counts the Candidate Item Sets as each transaction is read.

2) The SETM Algorithm:

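This also generates the Candidate Item Sets as the transactions are scanned, but it separates the generation of candidates from the counting of them, so that the processing of the transactions can be expressed as relational (SQL-style) operations.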

3) The Apriori Algorithm:

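This builds the set of Candidate Item Sets (denoted here as “S”) for each pass using only the Large (frequent) Item Sets that were found in the previous pass. In this way, support amounts are generated only for those candidates that could still turn out to be a Large Item Set within the datasets.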
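The level-by-level idea behind Apriori can be sketched as follows. This is a simplified illustration rather than a reference implementation: the function name `apriori`, the use of a minimum support count (instead of a fraction), and the reuse of the five transactions from the matrix above are all choices made for this example.

```python
from itertools import combinations

def apriori(transactions, min_support_count):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count the individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support_count}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        # Count only the surviving candidates against the transactions.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support_count}
        result.update(frequent)
        k += 1
    return result

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Biscuits", "Drink", "Eggs"},
    {"Milk", "Biscuits", "Drink", "Diet Coke"},
    {"Bread", "Milk", "Biscuits", "Diet Coke"},
    {"Bread", "Milk", "Diet Coke", "Coke"},
]

frequent_itemsets = apriori(transactions, min_support_count=3)
for itemset, count in sorted(frequent_itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The prune step is what distinguishes this approach: a candidate such as {Bread, Milk, Diet Coke} is discarded without being counted, because its subset {Bread, Diet Coke} is not frequent at this threshold.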

# The Density Estimation

This is the estimation of the statistical relationship between the observations in the datasets and their associated levels of probability. It should be noted here that, across the outputs that have been derived from the Machine Learning system, the probability density can vary from high to low, and anything in between.

But in order to fully ascertain this, one also needs to estimate how likely it is that a given statistical observation will actually occur.

# The Kernel Density Function

This mathematical function is used to estimate the Probability Density Function of a Continuous Variable in the datasets. In these instances, one Kernel Function is centered on each observation, and the sum of all of these Kernel Functions is divided by their total number (and by the Kernel Width). This is meant to provide assurances that the estimated Probability Density Function remains non-negative, and that it integrates to one over the range of the datasets that are used by the Machine Learning system.

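In Python-style pseudocode, the procedure is as follows:

    for i in range(n):          # each Data Instance, data[i]
        for x in X:             # each point x at which the density is evaluated
            Dens[x] += (1 / n) * (1 / w) * K((x - data[i]) / w)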

Where:

■ The Input = the Kernel Function K(x), with the Kernel Width of w, and the Data Instances x1 through xn.

■ The Output = the estimated Probability Density Function that underlies the training datasets.

■ The Process: Dens(X) is initialized to 0 at all points of “X” at which the density is evaluated, and the Kernel contributions are then accumulated.
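The procedure described above can be turned into a small runnable sketch. It assumes a Gaussian Kernel, which is only one possible choice of K, and the data values, the Kernel Width of 0.5, and the evaluation grid are made up for illustration.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kernel_density(points, data, w, kernel=gaussian_kernel):
    """Estimate the density at each evaluation point in `points`."""
    n = len(data)
    dens = [0.0] * len(points)          # Dens(X) = 0 at every point
    for x_i in data:                    # for i = 1 to n
        for j, x in enumerate(points):  # for all X
            dens[j] += (1.0 / n) * (1.0 / w) * kernel((x - x_i) / w)
    return dens

# Hypothetical one-dimensional sample and an evenly spaced evaluation grid.
data = [1.0, 1.5, 2.0, 4.0, 4.2]
step = 0.01
points = [-5.0 + step * j for j in range(1501)]   # covers -5.0 to 10.0

dens = kernel_density(points, data, w=0.5)

# A simple Riemann sum checks that the estimate integrates to about one.
area = sum(dens) * step
print("minimum density:", min(dens))
print("approximate area under the curve:", round(area, 4))
```

A smaller Kernel Width produces a spikier estimate that follows the individual observations closely, while a larger one produces a smoother curve; the area under the curve stays close to one in either case.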

# Latent Variables

These are variables that are not directly observed in the datasets. Instead, they are statistically inferred from other variables that are observed. Because they are never measured directly, they do not appear as recorded values in the training sets. They are often qualitative by nature (for example, the Cluster to which an observation belongs), although they can be quantitative as well.

# Gaussian Mixture Models

These are deemed to be Latent Variable models as well. They are widely used in Machine Learning applications because they can model the overall distribution of the data in the datasets, including those that contain Clusters. Each of the latter can be represented by its own Gaussian distribution, N1, ... NK, and the distribution of the data as a whole is a weighted mixture of these K Gaussians. The Latent Variable in this case is the Cluster to which each observation belongs.
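To make the role of the Latent Variable concrete, the sketch below evaluates a one-dimensional mixture of two Gaussians and computes, for a given observation, the probability that it belongs to each Cluster. The weights, means, and standard deviations are hypothetical values chosen for illustration; in practice they would be estimated from the data, typically with the Expectation-Maximization algorithm, which is not shown here.

```python
import math

def normal_pdf(x, mean, std):
    """Density of a Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

# Hypothetical mixture with K = 2 components (N1 and N2).
weights = [0.6, 0.4]   # mixing proportions, which must sum to one
means = [0.0, 5.0]
stds = [1.0, 1.5]

def mixture_pdf(x):
    """Overall density: the weighted sum of the component densities."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds))

def responsibilities(x):
    """Probability that x came from each component (the latent Cluster)."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    total = sum(joint)
    return [j / total for j in joint]

for x in (0.5, 2.5, 5.0):
    r = responsibilities(x)
    print(f"x = {x}: density = {mixture_pdf(x):.4f}, "
          f"P(cluster 1) = {r[0]:.3f}, P(cluster 2) = {r[1]:.3f}")
```

The Cluster membership is never observed directly; it is inferred from the observed value of x, which is what makes it a Latent Variable.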