KNN is a supervised classifier that is well suited to classification problems. To predict the class label of a new test data point, KNN computes the distance between the test point and the training data points, selects the K training points closest to it, and assigns the class label that occurs most often among those K neighbors. The distance is typically measured with the Euclidean distance (ED) function for continuous variables and the Hamming distance function for categorical variables, and K is usually a small positive integer (for example, between 1 and 10).
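As a minimal sketch, the two distance functions named above can be written in plain Python. The function names `euclidean_distance` and `hamming_distance` are illustrative choices, not part of any library, and both vectors are assumed to have the same length.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two continuous feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hamming_distance(a, b):
    """Number of positions at which two categorical feature vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)

print(euclidean_distance([1.0, 2.0], [4.0, 6.0]))                              # 5.0
print(hamming_distance(["red", "small", "round"], ["red", "large", "oval"]))   # 2
```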


The working procedure of the KNN algorithm is described as follows.

Consider a training data sample of n points. Every training data point x has an associated class label c. For ease of understanding, the training data points and their class labels are plotted on an (x, y) graph, and the new test data point whose class label is to be predicted is placed on the same graph. The distance between the test data point and every training data point is then calculated using one of the distance functions mentioned above. The distance values are sorted in ascending order, so that the nearest training points come first, and the first K of them are taken as the nearest neighbors of the test point. The class label held by the majority of these K neighbors is assigned to the new test data point (Park and Lee, 2018).
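The procedure above can be sketched in a few lines of Python. This is an illustrative implementation, not the one used by Park and Lee (2018); the helper `knn_predict` and the six two-dimensional training points are hypothetical, and Euclidean distance is assumed.

```python
import math
from collections import Counter

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_points, train_labels, test_point, k):
    """Predict a class label for test_point by majority vote of its k nearest neighbors."""
    # 1. Distance from the test point to every training point.
    distances = [
        (euclidean_distance(p, test_point), label)
        for p, label in zip(train_points, train_labels)
    ]
    # 2. Sort in ascending order so the closest points come first.
    distances.sort(key=lambda pair: pair[0])
    # 3. Keep the k nearest neighbors and count their class labels.
    votes = Counter(label for _, label in distances[:k])
    # 4. The most common label among the neighbors is the prediction.
    return votes.most_common(1)[0][0]

X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 5.0), (7.0, 7.0), (6.5, 6.0)]
y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(X, y, (2.0, 2.0), k=3))  # A
print(knn_predict(X, y, (6.0, 6.0), k=3))  # B
```

Using an odd K with two classes avoids tied votes; if a tie does occur, `Counter.most_common` returns the tied label that was counted first, i.e., the one that appears first among the sorted neighbors.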

  • 1. Choosing the K Value: Choosing the K value is the difficult part of the KNN algorithm. A small K value makes the prediction sensitive to noise in the training data (overfitting), whereas a very large K value over-smooths the decision and lets distant points from other classes outvote the true neighbors (underfitting). A large K value can also increase the computation time. A common rule of thumb is $K = \sqrt{n}$, where n is the number of training samples. To optimize the result, cross-validation is performed on the training data with different K values, and the K value that gives the best accuracy is chosen.
  • 2. Condensed Nearest Neighbor: It is the process of removing unwanted data (outliers and redundant points) from the training data, which can improve accuracy and reduces the amount of data that must be stored and searched.
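The K-selection procedure in item 1 can be sketched as follows, assuming leave-one-out cross-validation (one particular form of cross-validation; the text does not specify which form) and a small hypothetical two-cluster data set. The helper names `knn_predict`, `loocv_accuracy`, and `choose_k` are illustrative.

```python
import math
from collections import Counter

def knn_predict(points, labels, query, k):
    """Majority label among the k training points closest to query."""
    nearest = sorted(zip(points, labels), key=lambda pl: math.dist(pl[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def loocv_accuracy(points, labels, k):
    """Leave-one-out cross-validation accuracy of KNN for a given k."""
    correct = 0
    for i in range(len(points)):
        # Hold out point i and use all remaining points as training data.
        rest_points = points[:i] + points[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        if knn_predict(rest_points, rest_labels, points[i], k) == labels[i]:
            correct += 1
    return correct / len(points)

def choose_k(points, labels, candidates):
    """Return the candidate k with the best accuracy (smallest k wins ties) and all scores."""
    scores = {k: loocv_accuracy(points, labels, k) for k in candidates}
    best = max(candidates, key=lambda k: (scores[k], -k))
    return best, scores

X = [(1, 1), (1, 2), (2, 1), (2, 2), (6, 6), (6, 7), (7, 6), (7, 7)]
y = ["A", "A", "A", "A", "B", "B", "B", "B"]

k_rule = max(1, round(math.sqrt(len(X))))   # square-root rule of thumb -> 3
best_k, scores = choose_k(X, y, [1, 3, 5, 7])
print(k_rule, best_k, scores)
```

On this toy data every K up to 5 classifies perfectly, so the smallest one wins the tie; at K = 7 each held-out point is outvoted by the four points of the other cluster, which illustrates how an overly large K underfits. The square-root rule only gives a starting point for the search.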

The steps for condensing the data include:

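  • 1. Outliers: Identify and remove abnormal data points, that is, training points that would not be classified correctly by their nearest neighbors.
  • 2. Prototypes: Find the minimum subset of training points needed to classify all the non-outlier points correctly.
  • 3. Absorbed Points: Identify the non-outlier points that are already classified correctly by the prototypes alone; these can be discarded.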
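The prototype and absorbed-point steps correspond closely to Hart's classic condensed nearest neighbor procedure, sketched below under the assumption that outliers have already been removed and that 1-NN with Euclidean distance is used. Points that are never added to the store are the absorbed points. The function names `nearest_label` and `condense` and the toy data are hypothetical.

```python
import math

def nearest_label(store, point):
    """Label of the stored prototype closest to point (1-NN, Euclidean distance)."""
    _, label = min(store, key=lambda item: math.dist(item[0], point))
    return label

def condense(points, labels):
    """Keep only the prototypes 1-NN needs to classify every training point
    correctly; points never added to the store are the absorbed points."""
    store = [(points[0], labels[0])]     # start from an arbitrary first point
    changed = True
    while changed:                       # repeat passes until no point is added
        changed = False
        for p, label in zip(points, labels):
            # A misclassified point becomes a new prototype. The membership test
            # guards against looping forever on duplicate points with conflicting labels.
            if nearest_label(store, p) != label and (p, label) not in store:
                store.append((p, label))
                changed = True
    return store

X = [(1, 1), (1, 2), (2, 1), (2, 2), (6, 6), (6, 7), (7, 6), (7, 7)]
y = ["A", "A", "A", "A", "B", "B", "B", "B"]
prototypes = condense(X, y)
print(prototypes)   # [((1, 1), 'A'), ((6, 6), 'B')]
```

Here two prototypes are enough to classify all eight training points, so the other six are absorbed and need not be stored.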
2.4.3 ADVANTAGES

The advantages of KNN include the following:

  • It is robust when the training data set is large.
  • It is simple and flexible in its choice of attributes and distance functions.
  • It naturally supports multi-class data sets.

The limitations of KNN include:

  • Finding a suitable K value is difficult.
  • It is difficult to choose the distance function and to judge its effect on a specific data set.
  • The computation cost at prediction time is high, because the distance from the test point to every training point must be calculated.
  • It is a lazy learner: it builds no model during training and simply stores the data, relying only on the most common class label among the K nearest neighbors at prediction time.
  • A change in the K value can change the predicted class label.
  • It requires large storage space, because the entire training set must be kept.
  • It needs a large number of samples to achieve high accuracy.

The algorithm is well suited to the following real-time applications:

  • Text and handwriting mining;
  • Agriculture;
  • Finance;
  • Medicine;
  • Credit ratings;
  • Image and video recognition.


The tools used for the implementation of KNN include:

  • Weka;
  • scikit-learn (Python);
  • R.



The Naive Bayes (NB) algorithm performs classification tasks in machine learning (ML). It classifies data sets well even when they contain a very large number of records, for both binary and multi-class classification problems. The main applications of Naive Bayes are text analysis and natural language processing (NLP).


The working procedure of the NB algorithm is described as follows.

An understanding of Bayes’ theorem is required to work with the NB algorithm efficiently. NB is not a single algorithm but a family of classifiers that share a common principle: each applies Bayes’ theorem under the assumption that the attributes are independent (Wu et al., 2019).

1. Bayes Theorem: It is based on conditional probability, that is, the probability that an event will occur given that another (conditioning) event has already occurred. Bayes’ theorem gives the conditional probability of event A given event B as:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

P(A): Prior probability of event A, that is, the probability of A before event B is taken into account.

P(A|B): Conditional (posterior) probability of event A given that event B has already occurred.

P(B|A): Conditional probability of event B given that event A has already occurred (the likelihood).

P(B): Prior (marginal) probability of event B, that is, the probability of B before event A is taken into account (the evidence).
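A small worked example of Bayes’ theorem, using hypothetical numbers: let A be "an email is spam" and B be "the email contains the word offer", with P(A) = 1/5, P(B|A) = 3/5, and P(B) = 3/20. Python's `fractions.Fraction` keeps the arithmetic exact; the helper name `posterior` is illustrative.

```python
from fractions import Fraction

def posterior(likelihood, prior, evidence):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence

p_a = Fraction(1, 5)           # P(A): prior probability that an email is spam
p_b_given_a = Fraction(3, 5)   # P(B|A): fraction of spam emails containing "offer"
p_b = Fraction(3, 20)          # P(B): fraction of all emails containing "offer"
print(posterior(p_b_given_a, p_a, p_b))  # 4/5
```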

2. Naive Bayes Classifier: This classifier treats all the features (attributes) of the data set as independent of one another when classifying new data, even if the attributes actually have some dependency. That is, the probability of one attribute is assumed not to affect the probability of any other attribute, and each attribute contributes equally to predicting the class label of new data. In Bayes’ theorem, P(A|B) is called the posterior probability. In the NB classifier, A corresponds to a class and B to the observed attribute values; because of the independence assumption, P(B|A) is computed as the product of the individual attribute probabilities given the class. The posterior probability is calculated for every class, and the class with the highest posterior probability is taken as the most likely class. This decision rule is called maximum a posteriori (MAP).

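Here, P(B) acts as the evidence probability. It has the same value for every class and only normalizes the result, so it can be omitted without changing which class maximizes the posterior. So:

$$\mathrm{MAP}(A) = \arg\max_{A} P(A \mid B) = \arg\max_{A} \frac{P(B \mid A)\,P(A)}{P(B)} = \arg\max_{A} P(B \mid A)\,P(A)$$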
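The MAP rule can be sketched for categorical attributes by estimating the class prior and each P(attribute value | class) from counts and multiplying them, with P(B) dropped. The weather-style data set and the helper names `train_nb` and `predict_nb` are hypothetical, and no smoothing is applied.

```python
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Count class frequencies and, per class, how often each attribute value occurs."""
    class_counts = Counter(labels)
    # value_counts[c][i][v]: number of class-c rows whose attribute i equals v
    value_counts = defaultdict(lambda: defaultdict(Counter))
    for row, c in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[c][i][v] += 1
    return class_counts, value_counts, len(labels)

def predict_nb(model, row):
    """Return the MAP class, argmax over c of P(c) * prod_i P(x_i | c), and all scores."""
    class_counts, value_counts, n = model
    scores = {}
    for c, count in class_counts.items():
        score = count / n                              # prior P(c)
        for i, v in enumerate(row):
            score *= value_counts[c][i][v] / count     # P(x_i = v | c), independence assumed
        scores[c] = score
    return max(scores, key=scores.get), scores

rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "hot"),
        ("rainy", "mild"), ("rainy", "cool"), ("sunny", "cool")]
labels = ["no", "no", "no", "yes", "yes", "yes"]
model = train_nb(rows, labels)
print(predict_nb(model, ("rainy", "mild")))
print(predict_nb(model, ("sunny", "hot")))
```

For ("rainy", "mild") the "yes" score is 0.5 × 2/3 × 1/3 against 0.5 × 1/3 × 1/3 for "no", so "yes" is the MAP class.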

Types of NB Algorithm

There are three commonly used Naive Bayes algorithms:

  • Gaussian Naive Bayes;
  • Multinomial Naive Bayes; and
  • Bernoulli Naive Bayes.

i. Gaussian Naive Bayes: If all the attribute values are continuous, the Gaussian NB classifier is useful. It assumes that, within each class, the values of every attribute follow a normal (Gaussian) distribution, and it calculates the mean and variance of each attribute for each class.
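A minimal sketch of the Gaussian variant, assuming continuous attributes and hypothetical two-class data: for each class it estimates a prior plus the mean and variance of every attribute, then scores a new point with the normal density. Working with log-probabilities and the small variance floor `eps` are implementation choices made here, not something the text prescribes; the function names are illustrative.

```python
import math
from statistics import mean, pvariance

def log_gaussian(x, mu, var):
    """Log of the normal density N(x; mu, var)."""
    return -((x - mu) ** 2) / (2 * var) - 0.5 * math.log(2 * math.pi * var)

def fit_gaussian_nb(rows, labels, eps=1e-9):
    """Per class: prior, plus mean and variance of every continuous attribute.
    eps is a small hypothetical floor that keeps variances above zero."""
    model = {}
    for c in sorted(set(labels)):
        class_rows = [r for r, lab in zip(rows, labels) if lab == c]
        columns = list(zip(*class_rows))           # one tuple per attribute
        params = [(mean(col), pvariance(col) + eps) for col in columns]
        model[c] = (len(class_rows) / len(rows), params)
    return model

def predict_gaussian_nb(model, row):
    """MAP class, computed with log-probabilities to avoid numerical underflow."""
    def log_posterior(c):
        prior, params = model[c]
        return math.log(prior) + sum(
            log_gaussian(x, mu, var) for x, (mu, var) in zip(row, params))
    return max(model, key=log_posterior)

rows = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (5.0, 6.0), (5.2, 5.8), (4.8, 6.2)]
labels = ["a", "a", "a", "b", "b", "b"]
model = fit_gaussian_nb(rows, labels)
print(predict_gaussian_nb(model, (1.1, 2.1)))
print(predict_gaussian_nb(model, (5.1, 5.9)))
```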

ii. Multinomial Naive Bayes: It is useful when attribute values are counts that follow a multinomial distribution, such as word frequencies in a document.

iii. Bernoulli Naive Bayes: This classifier is useful when attribute values are binary-valued.
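To contrast the two text-oriented variants, the sketch below computes a document's log-likelihood under one class. The multinomial form uses how many times each word occurs; the Bernoulli form uses only whether each word occurs, and absent words also contribute through 1 - p. The three-word vocabulary and all probabilities are hypothetical; in practice they would be estimated from training counts.

```python
import math

def multinomial_log_likelihood(counts, word_probs):
    """log P(document | class), up to a class-independent constant.
    counts[i]: occurrences of word i; word_probs[i]: P(word i | class), summing to 1."""
    return sum(n * math.log(p) for n, p in zip(counts, word_probs))

def bernoulli_log_likelihood(present, word_probs):
    """log P(document | class) from word presence only.
    present[i]: 1 if word i occurs, else 0; word_probs[i]: P(word i appears | class)."""
    return sum(math.log(p) if x else math.log(1 - p)
               for x, p in zip(present, word_probs))

# Hypothetical vocabulary: ["offer", "meeting", "free"]
counts = [2, 0, 1]                                # word counts in the new document
present = [1 if n > 0 else 0 for n in counts]     # binary presence vector

spam_multi, ham_multi = [0.5, 0.1, 0.4], [0.1, 0.7, 0.2]
spam_bern, ham_bern = [0.6, 0.2, 0.5], [0.1, 0.8, 0.3]

print(multinomial_log_likelihood(counts, spam_multi) > multinomial_log_likelihood(counts, ham_multi))
print(bernoulli_log_likelihood(present, spam_bern) > bernoulli_log_likelihood(present, ham_bern))
```

With equal class priors, the class with the larger log-likelihood is also the MAP class.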


The advantages of the NB algorithm include the following:

  • It is fast, highly scalable, and simple.
  • It can handle continuous, binary, and multinomially distributed attribute values.
  • It is a strong choice for text classification.
  • Understanding and building the model is simple for both small and big data (BD) sets.
  • It is not sensitive to irrelevant attributes.

The limitations of the NB algorithm include:

  • It cannot capture relationships among the attributes, because it treats all the attributes as independent.
  • A “zero conditional probability problem” can occur if an attribute value never appears with a class in the training data (zero frequency), which makes the whole product of probabilities zero for that class.
  • The strong assumption that the attribute variables are independent rarely holds exactly in real life.
  • It is not suitable for regression problems.
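The zero conditional probability problem can be seen with hypothetical counts: if a value never occurs with a class in the training data, its estimated probability is 0, and because NB multiplies the per-attribute probabilities, the whole score for that class becomes 0 whatever the other attributes say. Additive (Laplace) smoothing, which the text does not discuss, is shown only as one commonly used remedy; `alpha` is its smoothing parameter and `estimate` is an illustrative helper name.

```python
from fractions import Fraction

def estimate(count_value_and_class, count_class, n_values, alpha=0):
    """P(value | class) from counts; alpha > 0 applies additive (Laplace) smoothing."""
    return Fraction(count_value_and_class + alpha, count_class + alpha * n_values)

# Hypothetical counts: class "yes" has 3 training rows, and "hot" (one of 3
# possible temperature values) never occurs together with it.
other_factors = Fraction(1, 2) * Fraction(2, 3)   # prior times another likelihood

unsmoothed = other_factors * estimate(0, 3, 3)          # zero: the class can never win
smoothed = other_factors * estimate(0, 3, 3, alpha=1)   # uses (0 + 1) / (3 + 3) = 1/6
print(unsmoothed, smoothed)   # 0 1/18
```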

The algorithm is well suited to the following real-time applications:

  • Real-time prediction;
  • Multiclass prediction;
  • Text, email, symbol, and name classification/spam filtering/sentiment analysis (SA);
  • Recommendation systems.

The tools used for the implementation of the Naive Bayes algorithm include:

  • Java classifier based on the NB approach;
  • Weka;
  • Python;
  • R.