# Machine Learning Algorithms for Cyber Insurance Decision-Making Process

## Analysis with Regression Algorithms

The goal of this procedure is to perform a regression for every class, setting the output to 1 for training instances that belong to the class and to 0 for those that do not (classification and clustering). The result is a linear expression for each class. Given a test instance of unknown class, the value of each linear expression can be calculated and the class with the largest value chosen. This is often called multi-response linear regression (Breiman and Friedman, 1997). In effect, multi-response linear regression approximates a membership function for each class: for a new instance, the membership value for each class is calculated and the best one is selected. Although multi-response linear regression often produces good results, it also has some drawbacks. The produced values can fall outside the range of 0 to 1, so they are not proper probabilities. In addition, least squares regression assumes that the errors are not only statistically independent but also normally distributed with the same standard deviation.
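The procedure can be sketched in a few lines. The following is a minimal illustration, assuming NumPy and a made-up one-feature dataset with two classes; the function names `fit_multiresponse`, `membership_scores`, and `predict` are hypothetical and not taken from the source. It also shows the first drawback: membership values for instances far from the training data fall outside the range 0 to 1.

```python
import numpy as np

def fit_multiresponse(X, y, n_classes):
    """Fit one least-squares regression per class on 0/1 membership targets."""
    # Append a constant column so that each class gets its own intercept.
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    # Target matrix: 1 where the instance belongs to the class, 0 otherwise.
    Y = np.eye(n_classes)[y]
    # Solve all per-class least-squares problems in one call.
    W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
    return W

def membership_scores(W, X):
    """Evaluate every class's linear expression for each instance."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ W

def predict(W, X):
    """Choose the class whose linear expression gives the largest value."""
    return membership_scores(W, X).argmax(axis=1)

# Made-up toy data (assumption): one feature, two classes.
X = np.array([[0.0], [1.0], [2.0], [8.0], [9.0], [10.0]])
y = np.array([0, 0, 0, 1, 1, 1])

W = fit_multiresponse(X, y, n_classes=2)
train_pred = predict(W, X)

# Instances far from the training range receive scores below 0 or above 1.
far_scores = membership_scores(W, np.array([[-5.0], [15.0]]))
```

Because the scores are plain linear outputs, they should be read as relative memberships rather than probabilities.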

In this investigation, we are looking for a regression algorithm that is as simple as possible yet provides enough relevant information to predict the dependent variable as well as possible. The target value and the input values have to be related to each other. It should also be noted that some points cannot be described perfectly by a straight line, so not every error can be eliminated. For regression, it therefore makes sense to use the root mean square error (RMSE) as a performance measure, since it indicates how much error the system makes in its predictions. In the following sections, we first compare the Linear Regression algorithms with each other and then conclude with Logistic Regression.
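As a small illustration of the performance measure, the RMSE can be computed as follows. This is a sketch using only the Python standard library; the function name `rmse` and the example values are assumptions made for illustration, not data from this chapter.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between actual and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Squaring keeps positive and negative deviations from cancelling out.
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Illustrative values (assumption): 1 = claim, 0 = no claim.
actual = [1, 0, 1, 0]
predicted = [0.5, 0.5, 0.5, 0.5]
error = rmse(actual, predicted)
```

A model that always predicts 0.5 for a 0/1 target is off by 0.5 on every instance, so its RMSE is 0.5; a perfect model would score 0.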

## Linear Regression

This algorithm is the simplest and one of the most popular in ML, as it can be fit very quickly and is easy to interpret (Yan, 2009). This supports our aim of predicting Insurance Claims as close as possible to the actual claims data. A regression function describes the trend or the average relationship between numerical attributes. Linear Regression assumes that the relationship between the x and y values, that is, between the data and the associated function values, is linear and can therefore be described by a straight line.

The aim is to minimize the error between the actual target value and the calculated value. The errors are squared so that positive and negative deviations do not cancel each other out. For this reason, the mean squared error is examined.
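Written out for a single input variable, the straight-line model and its mean squared error take the following form. The notation ($n$ instances, inputs $x_i$, targets $y_i$, intercept $w_0$, slope $w_1$) is assumed here for illustration and does not come from the source.

$$
\hat{y}_i = w_0 + w_1 x_i, \qquad
\mathrm{MSE}(w_0, w_1) = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
$$

Linear Regression selects the $w_0$ and $w_1$ that minimize this quantity, and the RMSE is its square root.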

Linear Regression itself is prone to overfitting when basis functions are used (Harrell, 2001). Such behavior is a problem, and it would be better to have ways to limit overfitting or to penalize it. Plain Linear Regression has no parameters that can influence this, but regularization makes it possible (Bühlmann and van de Geer, 2011). Therefore, we consider regularization in the following methods when validating the best approach.

## Ridge Regression (L2)

Ridge Regression, also called L2 regularization, regularizes the Linear Regression algorithm during training (Tikhonov, 1963). The strength of the regularization is controlled by a hyperparameter.
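The effect of the hyperparameter can be illustrated with the closed-form ridge solution. The sketch below assumes NumPy, synthetic data, and a hypothetical helper `ridge_fit` whose `alpha` argument plays the role of the regularization hyperparameter. It is not the Scikit-Learn model used in this chapter, only an illustration of how a larger `alpha` shrinks the weights.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge solution on centered data; the intercept is not penalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    n_features = X.shape[1]
    # Solve (Xc^T Xc + alpha * I) w = Xc^T yc for the weight vector w.
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    # Recover the intercept from the feature and target means.
    b = y_mean - x_mean @ w
    return w, b

# Synthetic data (assumption): three features with known true weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)

w_plain, _ = ridge_fit(X, y, alpha=0.0)    # alpha = 0: ordinary least squares
w_ridge, _ = ridge_fit(X, y, alpha=10.0)   # larger alpha: stronger shrinkage
```

With `alpha=0.0` the result is the ordinary least-squares fit; increasing `alpha` reduces the norm of the weight vector.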

## Lasso Regression (L1)

The only difference from Ridge Regression is that Lasso uses the L1 norm of the weight vector (Santosa and Symes, 1989; Tibshirani, 1996) instead of the L2 norm to regularize the linear regression model.
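The two penalties can be compared side by side. The notation below is an assumption made for illustration ($w_1, \dots, w_p$ are the feature weights and $\alpha \ge 0$ is the regularization hyperparameter), and the exact scaling of the terms varies between textbooks and libraries.

$$
J_{\text{Ridge}}(w) = \mathrm{MSE}(w) + \alpha \sum_{j=1}^{p} w_j^2, \qquad
J_{\text{Lasso}}(w) = \mathrm{MSE}(w) + \alpha \sum_{j=1}^{p} \lvert w_j \rvert
$$

In both cases, $\alpha = 0$ recovers unregularized Linear Regression. A practical difference is that the L1 penalty can drive some weights exactly to zero, while the L2 penalty only shrinks them toward zero.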

In the binary case, we encode an Insurance Claim that occurred with a 1 and the absence of a claim with a 0. The decision threshold is therefore 0.5.

## Regression Model Claim Prediction

First, a model without regularization was used, then one with Lasso regularization (L1), and finally one with Ridge regularization (L2), all using Scikit-Learn. Figures 8.3a to 8.3c show that each model performs well on the training data only when few training instances are used (the training curve starts at 0). As more instances are added, no model can fit the training data acceptably. The training error reaches a plateau at a certain point, after which adding more data makes no difference to the average error. The behavior on the validation data is different. With only a few instances, the model generalizes poorly, which is why the validation curve starts well above 0. As the number of instances increases, each model learns and the validation error decreases. It has become clear that our Cyber Insurance dataset cannot be analyzed very well with an unregularized linear regression model.
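Learning curves of this kind can be reproduced in outline as follows. This is a sketch under stated assumptions: it uses plain NumPy least squares rather than the Scikit-Learn models from this chapter, the data is synthetic and deliberately non-linear, and the names `rmse` and `learning_curve_rmse` are hypothetical. It fits a straight line on growing subsets of the training data and records the training and validation RMSE for each subset size.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def add_bias(X):
    """Append a constant column so the fitted line has an intercept."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

def learning_curve_rmse(X_train, y_train, X_val, y_val, sizes):
    """Fit a linear model on the first m training instances for each m in sizes."""
    train_err, val_err = [], []
    for m in sizes:
        Xm, ym = add_bias(X_train[:m]), y_train[:m]
        w, *_ = np.linalg.lstsq(Xm, ym, rcond=None)
        train_err.append(rmse(ym, Xm @ w))
        val_err.append(rmse(y_val, add_bias(X_val) @ w))
    return train_err, val_err

# Synthetic quadratic data (assumption): a straight line cannot fit it well.
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + 2 + rng.normal(size=200)
X_train, y_train, X_val, y_val = X[:160], y[:160], X[160:], y[160:]

sizes = [2, 5, 10, 20, 40, 80, 160]
train_err, val_err = learning_curve_rmse(X_train, y_train, X_val, y_val, sizes)

for m, tr, va in zip(sizes, train_err, val_err):
    print(f"{m:4d} instances  train RMSE {tr:.2f}  validation RMSE {va:.2f}")
```

With two training instances the line passes through both points, so the training error starts at 0. With the full training subset the training error stays well above 0, which is the underfitting pattern described above.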

This is made clear by the noticeably large RMSE (see Table 8.2). Regularizing the linear model by means of L1 and L2 makes sense at first glance, since the RMSE could be reduced significantly. The best result was achieved with Ridge Regression. However, unrestricted further use of the model is not advisable. The present dataset for Cyber Insurance cannot be separated linearly very well. This is most evident in Figure 8.3c, where the error does not fall below a certain level and that level is very close to the training set error. Furthermore, the model is underfitting. As long as a model underfits the training set, tuning its parameters does not help, and neither does adding further data (Burnham and Anderson, 2002). One can try to obtain more features, but the much more efficient approach is to choose a more complex model and algorithm (Table 8.3).

Figure 8.3 Accuracy of the different approaches and relevance of parameters.

Table 8.3 Results of Algorithms

| Linear | Lasso (L1) | Ridge (L2) |
| --- | --- | --- |
| RMSE: 0.72 | RMSE: 0.44 | RMSE: 0.42 |