Linear Kernel SVM Classification
First, we look at the Linear Kernel SVM. The linear kernel is the simplest kernel function and is pre-implemented in almost every library. After tuning the C parameter on the dataset, we obtained the best results with C set to 1.0. With this setting, we achieve an accuracy of 87% on our training dataset.
Next, we apply the model to our test dataset and obtain an Insurance Claim probability of 82%. The probability array [0.1774705, 0.8225295] determines the Insurance Claim Prediction: the test instance has an 18% chance of being class 0 and an 82% chance of being class 1.
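How a probability pair of this kind translates into a class label can be sketched in a few lines. The helper name `predict_class` and the 0.5 decision threshold are assumptions made for illustration; the text does not state which threshold was used.

```python
# Class probabilities reported in the text for one test instance:
# index 0 -> class 0 (no claim), index 1 -> class 1 (claim).
probabilities = [0.1774705, 0.8225295]


def predict_class(probs, threshold=0.5):
    """Return 1 if the class-1 probability reaches the threshold, else 0.

    The 0.5 default is an assumption, not a value taken from the text.
    """
    return 1 if probs[1] >= threshold else 0


predicted = predict_class(probabilities)
print(f"P(class 0) = {probabilities[0]:.0%}, P(class 1) = {probabilities[1]:.0%}")
print("predicted class:", predicted)
```

Raising the threshold would trade false positives for false negatives, which is the trade-off that the ROC curve visualizes.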
Next, regarding feature importance for the linear SVM classifier (Figure 8.4b), the following conclusions can be drawn. First, CC/PII and KRITIS have a significant influence, as we saw with Logistic Regression, and Cyber Investment again has a negative influence on the prediction. Here, however, Rating has a higher importance than Turnover.
Having seen the model's accuracy, let us now look at the ROC curve (Figure 8.4c). As mentioned before, the curve of a good model should be as far away from the dotted line and as close to the upper left corner as possible. The ROC curve shows that the linear SVM model is less effective on the Cyber Insurance dataset than the achieved accuracy suggested. Although the curve does not touch the dotted line, as it does for Logistic Regression, for example, the distance between the two lines is not large enough. Therefore, we adjust the kernel by considering other SVM models.
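To make the reading of a ROC curve concrete, the following minimal sketch computes the curve's points and the area under it by sweeping the decision threshold. The labels and scores are invented for illustration and are not taken from the Cyber Insurance dataset; the function names `roc_points` and `auc` are likewise our own.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs obtained by sweeping the decision threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    # Each distinct score is used once as a threshold, from highest to lowest.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical labels and classifier scores, for illustration only.
y_true = [0, 0, 1, 1, 0, 1, 1, 0]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7]

curve = roc_points(y_true, scores)
print(curve)
print("AUC:", auc(curve))
```

The dotted diagonal corresponds to an area of 0.5, so the further the area exceeds 0.5, the further the curve lies from that line.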
Polynomial Kernel SVM
The Polynomial Kernel is a frequently used kernel for SVM classification. It is especially suitable for problems where all training data has been normalized (Goldberg and Elhadad, 2008). It is implemented in many libraries, such as Scikit-Learn.
After applying this kernel to the training dataset, we obtained an accuracy of 100%. We used the poly kernel with a degree of 5 and made several adjustments to the C parameter. The best results were obtained with C equal to 100. Since the accuracy on the training set was perfect, we compare these results with the next SVM algorithm, because the polynomial method is generally very computationally expensive.
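As a rough illustration of what a polynomial kernel computes, the sketch below evaluates the kernel directly and compares it with a dot product over explicitly constructed features. The parameter names `degree`, `gamma`, and `coef0` mirror those of Scikit-Learn's SVC; the helper functions and sample points are our own, and degree 2 is used instead of 5 so that the explicit feature map stays small enough to check by hand.

```python
import math


def polynomial_kernel(x, y, degree=2, gamma=1.0, coef0=0.0):
    """K(x, y) = (gamma * <x, y> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree


def explicit_degree2_features(x):
    """Feature map for 2-D input whose dot product equals the degree-2
    kernel with gamma=1 and coef0=0."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


# Hypothetical sample points, for illustration only.
a, b = (1.0, 2.0), (3.0, 0.5)

kernel_value = polynomial_kernel(a, b, degree=2)
explicit_value = sum(p * q for p, q in zip(explicit_degree2_features(a),
                                           explicit_degree2_features(b)))
print(kernel_value, explicit_value)
```

Both values agree up to floating-point rounding, yet the kernel never builds the expanded features. The number of such features grows quickly with the degree and the input dimension, which is one reason a degree-5 model is costly.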
Gaussian RBF Kernel SVM
Although linear SVM classifiers are very efficient and work well in many cases, many datasets are not linearly separable. Like the polynomial feature method, the similarity feature method can be very useful for ML algorithms. The main issue is the computational cost of such a model. To get this problem under control, we can use the kernel trick with SVM. It has the same effect as adding many similarity features without actually adding them (Muller, 2012). The Radial Basis Function (RBF) Kernel, or Gaussian Kernel, is one of the most widely used kernel methods for SVM classification and is therefore implemented in many libraries (Chang et al., 2010). The choice of sigma plays an important role in the performance of the kernel, because the kernel behaves like the Linear Kernel if the value is too high (Geron, 2017).
After applying this algorithm to the training dataset, we again obtained an accuracy of 100%, as with the polynomial kernel. We used the Gaussian RBF kernel of the Scikit-Learn Support Vector Classifier (SVC) class and made several adjustments to the gamma and C parameters. The best results were obtained with C equal to 100. Since we have not yet explained gamma, a short description is needed. The effect of gamma is like that of a regularization parameter. This means that if the model is overfitting, gamma should be reduced, and if it is underfitting, gamma should be increased (Raschka and Mirjalili, 2019). After implementing this approach on the dataset, the following results are obtained on the test dataset.
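The role of sigma can be seen by evaluating the Gaussian kernel directly. The sketch below assumes the common parameterization K(x, y) = exp(-||x - y||^2 / (2 sigma^2)), which corresponds to gamma = 1 / (2 sigma^2) in Scikit-Learn's notation; the sample points and sigma values are invented for illustration.

```python
import math


def rbf_kernel(x, y, sigma):
    """Gaussian RBF kernel K(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


# Hypothetical points: one close to the origin, one far from it.
origin = (0.0, 0.0)
near, far = (1.0, 1.0), (4.0, 5.0)

for sigma in (0.5, 2.0, 50.0):
    print(f"sigma={sigma}: "
          f"K(origin, near)={rbf_kernel(origin, near, sigma):.4f}, "
          f"K(origin, far)={rbf_kernel(origin, far, sigma):.4f}")
```

With a small sigma only very close points count as similar, while with a very large sigma almost all points look alike, so the kernel loses its ability to form a strongly curved decision boundary.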
The Insurance Claim Probability array [0.1198319, 0.88051681] determines the Insurance Claim Prediction. These results mean that the test instance has a 12% chance of being class 0 and an 88% chance of being class 1. As expected, our result is better than the one obtained with the Linear Kernel.
After showing the models' accuracy, we now analyze the ROC curves (Figure 8.4b, Figure 8.4c). As mentioned before, the curve of a good model should be as far away from the dotted line and as close to the upper left corner as possible. According to the ROC curves, both models are much more effective on the Cyber Insurance dataset than the Linear Kernel SVM; this time, accuracy was a good measure. If cost is not a consideration, the more expensive polynomial kernel should be chosen here, as its performance is slightly better than that of the RBF kernel. If the dataset grows, however, the RBF kernel should be preferred.
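A workflow of this kind might look roughly as follows in Scikit-Learn. This is a sketch under several assumptions: the data is synthetic (the Cyber Insurance dataset is not reproduced here), the features are standardized first, and `gamma="scale"` is the library default rather than the value chosen in the text, which is not reported. Only C = 100 and the RBF kernel follow the description above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data, for illustration only.
X, y = make_classification(n_samples=400, n_features=6, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# C=100 follows the text; gamma="scale" and the scaling step are assumptions.
model = make_pipeline(
    StandardScaler(),
    SVC(kernel="rbf", C=100, gamma="scale", probability=True, random_state=0),
)
model.fit(X_train, y_train)

train_accuracy = model.score(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
# Probability pair [P(class 0), P(class 1)] for the first test instance.
probabilities = model.predict_proba(X_test[:1])[0]

print("train accuracy:", train_accuracy)
print("test accuracy:", test_accuracy)
print("probabilities:", probabilities)
```

Comparing `train_accuracy` with `test_accuracy` is a quick check for the overfitting that gamma and C are meant to control.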
In the following, we will use the K-Nearest Neighbors (KNN) algorithm. The KNN algorithm is certainly one of the simplest and most efficient algorithms in ML (Altman, 1992). As with many non-parametric algorithms, the instances are represented here as points in multidimensional space. An instance is defined by its attributes. Each attribute represents an axis in multidimensional space, and the values of the attributes are merged into a point vector. With the KNN method, as the name already suggests, the decision is made based on the data closest to the point to be assigned (Samworth, 2012).
When we applied KNN to our Cyber Insurance dataset, we achieved an accuracy of 85% on the training data. Again, we want to evaluate this performance more thoroughly. Therefore, we look again at the ROC curve, but this time also at the confusion matrix (Fawcett, 2006). With this approach, we count how many times a class is classified as a claim and how often it is not. To perform this task on the dataset, we use the cross-validation prediction function in Scikit-Learn. This function performs the validation and outputs the prediction for each training set fold. It is used to track the overfitting problem described before. This is done by using the cross-validation parameter k in the library. We achieved the best result with k = 4. The performance obtained in this case is as follows:
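The decision rule described above can be sketched in plain Python as a majority vote among the nearest training points. The function name `knn_predict`, the Euclidean distance, the choice of three neighbors, and the sample points are all assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair every training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical two-attribute instances: class 0 clusters low, class 1 high.
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0),
                (6.0, 6.0), (7.0, 5.5), (6.5, 7.0)]
train_labels = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_points, train_labels, (2.0, 2.0)))
print(knn_predict(train_points, train_labels, (6.0, 5.0)))
```

Because the method stores the training points and defers all work to prediction time, there is no training phase in the usual sense.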
KNN Confusion Matrix ([[239, 141], [76, 494]])
Each row in the confusion matrix represents an actual class, and each column represents a predicted class. The first row of the matrix covers the non-claims (negative class): 239 were correctly classified as non-claims (true negatives), but 141 were wrongly classified as claims (false positives). The second row covers the claims (positive class): 76 were wrongly classified as non-claims (false negatives), and 494 were correctly classified as claims (true positives). A perfect classifier would produce only true positives and true negatives.
In the ROC curve (Figure 8.4d), we can see that the performance is close to that of the polynomial SVM, but KNN achieves this score at a much lower computational cost.
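The four counts described above can be turned into summary metrics with simple arithmetic. The sketch below uses the counts from the text; the metric definitions are the standard ones, and the variable names are our own.

```python
# Rows: actual class (non-claim, claim); columns: predicted class.
confusion = [[239, 141],
             [76, 494]]
(tn, fp), (fn, tp) = confusion

total = tn + fp + fn + tp
accuracy = (tp + tn) / total             # share of all cases classified correctly
precision = tp / (tp + fp)               # share of predicted claims that are claims
recall = tp / (tp + fn)                  # share of actual claims that were found
false_positive_rate = fp / (fp + tn)     # share of non-claims flagged as claims

print(f"total cases:         {total}")
print(f"accuracy:            {accuracy:.3f}")
print(f"precision:           {precision:.3f}")
print(f"recall (TPR):        {recall:.3f}")
print(f"false positive rate: {false_positive_rate:.3f}")
```

The recall and the false positive rate computed here give the single point on the ROC curve that corresponds to the classifier's default decision threshold.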
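The cross-validated predictions and the confusion matrix used in this evaluation could be produced along the following lines with Scikit-Learn's `cross_val_predict` and `confusion_matrix`. The data is synthetic, `n_neighbors=5` is simply the library default, and we read the value 4 reported in the text as the number of cross-validation folds (`cv=4`); all three are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data, for illustration only.
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           random_state=0)

# n_neighbors=5 is the scikit-learn default, not a value from the text.
knn = KNeighborsClassifier(n_neighbors=5)

# Each instance is predicted by a model trained on the other folds (cv=4 assumed).
y_pred = cross_val_predict(knn, X, y, cv=4)

# Rows are actual classes, columns are predicted classes.
matrix = confusion_matrix(y, y_pred)
print(matrix)
```

Because every prediction comes from a model that did not see that instance during fitting, the resulting matrix gives a less optimistic picture than the accuracy measured on the training data.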