Cyber risks often cause financial damage of unknown extent to companies. Technical progress in information systems, which leads to interconnected systems, data storage and sharing in databases, and complex technology architectures, is increasing cyber exposure. Cyber risks primarily affect digital information security: when they materialize, they can jeopardize the availability, integrity, and confidentiality of sensitive data, processes, and information. The success of new online and cloud-based technologies and services, such as cloud computing or the Internet of Things, has pushed cyber risk into the realm of general risk perception, especially as more and more end customers are affected by risk exposure. Compared to traditional risks, however, cyber risks are still a relatively new threat to companies, especially small ones.

Cyber incidents and massive data breaches at several companies have received considerable attention in the scientific and research community and in the media, especially in recent years, as incidents have accumulated. One example is the recent hacker attack on British Airways, which resulted in the disclosure of sensitive customer information.

For insurance companies offering cyber protection to their customers, this is a major challenge. In lines such as private liability or motor vehicle insurance, insurers evaluate risk using data that have been collected for decades. Since no such data exist for cyber risks, their underwriters work together with technical IT security experts to evaluate the respective customer's situation. This takes place within a multi-tier process that includes a risk dialogue (Bartolini et al., 2017), a questionnaire-based approach (Bartolini et al., 2018) for evaluating the cyber risks, and a macroeconomic assessment of the client, including the client's industry sector.

This chapter aims to show how this macroeconomic assessment can be carried out by means of a novel approach based on machine learning (ML). The remainder of this chapter is structured as follows: Section 2 introduces the general approach. Section 3 describes several ML algorithms and their results. Section 4 compares the models and carries out a final evaluation. Finally, Section 5 provides some concluding remarks.

Evaluation of Technical and Business Risk Features for a Machine Learning Approach

Business Intelligence (BI) focuses on the independent analysis of data (Rud, 2009). It aims at predicting the outcome of a business strategy, which for an insurer, in terms of risk management, means the selection of customer-specific factors. The economic risk assessment therefore first requires these data. BI is a good concept for insurance companies to use information intelligently: the results of strategies are derived from data analysis. ML works differently. It focuses more on understanding the system itself (Bishop, 2006). The two are therefore complementary, and an insurance company needs both to make an underwriting decision. ML focuses on learning patterns from the data collected in the system databases and on transforming those data into information and decisions, which is the main concern of the investigation introduced in this chapter.

Since no databases of cyber incidents and loss data are available on which insurance companies could base their risk management, insurers specialized in Cyber Insurance can at least use their own data to derive evaluations and features for ML. We selected such an available dataset to develop a Cyber Insurance ML claims prediction model. The features must be selected so that they help to assess the cyber risk better, that is, to predict the possible occurrence of claims. The chosen approach explores which factors play a significant role in insurers' economic risk assessment. The following feature selection criteria have been applied:

  • Turnover of the insured company. The greater the company's Turnover, the greater its size and thus its IT infrastructure. In addition, a large and well-known company is more exposed to a cyber attack than a small one.
  • Other IT insurances. For an economic as well as a risk-based approach, it is relevant whether the company holds further IT insurances besides the Cyber Insurance and whether this additional risk transfer shows a positive correlation.
  • Credit card/cardholder and personal identifying information (CC/PII) data. Data breaches in which criminals steal personal data as well as cardholder data have dramatic consequences. Claims payments can arise when the company is forced to pay fines in the legal context (e.g., General Data Protection Regulation, 2016) or in the regulatory context (e.g., PCI Security Standards Council, 2018).
  • The result of the technical cyber risk assessment. As described in related work (Bartolini et al., 2018), a company is technically insurable if the result of the cyber risk assessment (technical risk assessment) reaches a Rating of at least 2.00. Therefore, this criterion is also relevant for the approach.
  • The customer is one of the critical infrastructures. As critical infrastructures are a prime target for criminals, these companies need to have a high maturity level.
  • Investments in IT/Cyber Programs, another economic figure for correlation analysis. In general, high investment in this area appears to be a positive risk factor.
  • Cyber damage has already occurred. Whether a Cyber Insurance claim has already occurred at the company is a very important factor for the approach.
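
To make the seven features concrete, the following minimal sketch models one customer record in Python. The class name `CustomerRecord`, the field names, and the example values are illustrative assumptions and do not come from the insurer's dataset; only the minimum Rating of 2.00 is taken from the text.

```python
from __future__ import annotations

from dataclasses import dataclass

# Minimum technical Rating for insurability, as stated in the text.
MIN_INSURABLE_RATING = 2.00


@dataclass(frozen=True)
class CustomerRecord:
    """One insured company. Field names are hypothetical, chosen for illustration."""

    turnover_eur: float        # annual Turnover
    other_it_insurance: bool   # holds IT insurance besides Cyber Insurance
    cc_pii_records: int        # stored cardholder / personal data records
    rating: float              # result of the technical cyber risk assessment
    kritis: bool               # belongs to the critical infrastructures
    cyber_invest_eur: float    # yearly investment in IT/Cyber Programs
    insurance_claim: bool      # target: has a Cyber Insurance claim occurred?

    def is_technically_insurable(self) -> bool:
        """Apply the minimum-Rating rule to this record."""
        return self.rating >= MIN_INSURABLE_RATING

    def feature_vector(self) -> list[float]:
        """Return the six predictors in a fixed order (the claim flag is the target)."""
        return [
            float(self.turnover_eur),
            float(self.other_it_insurance),
            float(self.cc_pii_records),
            float(self.rating),
            float(self.kritis),
            float(self.cyber_invest_eur),
        ]


# Made-up example values, not a row from the real dataset.
example = CustomerRecord(
    turnover_eur=25_000_000.0,
    other_it_insurance=True,
    cc_pii_records=8_000,
    rating=2.4,
    kritis=False,
    cyber_invest_eur=500_000.0,
    insurance_claim=True,
)
print(example.is_technically_insurable(), example.feature_vector())
```

Keeping the claim flag out of `feature_vector` separates the six predictors from the value to be predicted, which is how a supervised model would consume the record.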

To illustrate the seven described features, we have extracted the first five records of the dataset and present their values in Table 8.1.

The dataset of insured companies includes a total of 1,295 customers. The seven features of the ML approach (see Table 8.2 and the figures that follow) are described next. In total, 758 customers have already raised an insurance claim, while 537 have not (Table 8.2). The annual Turnover of the insured companies varies widely. There are only a few companies with a Turnover exceeding 100M EUR.

Table 8.1 Relevant Features Affecting Insurability

[Table: first five records of the dataset. Recoverable column headers: Other IT Insurance, CC/PII, Cyber Invest, Insurance claim.]

Table 8.2 Dataset Features. Insurance Claim

Companies that experienced at least one Cyber Insurance claim: 758
Companies without any Cyber Insurance claim: 537
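
The class counts reported for the dataset also fix a simple reference point for any claims predictor. The short sketch below only uses the counts given in the text (758 and 537); the idea of comparing models against a majority-class baseline is an added illustration, not a step described in the chapter.

```python
# Counts taken from the text: customers with and without a Cyber Insurance claim.
n_claim = 758
n_no_claim = 537
n_total = n_claim + n_no_claim  # 1,295 customers in total

# Share of customers that already raised a claim.
claim_share = n_claim / n_total

# Accuracy of a trivial model that always predicts the majority class.
baseline_accuracy = max(n_claim, n_no_claim) / n_total

print(f"total customers:   {n_total}")
print(f"claim share:       {claim_share:.1%}")
print(f"baseline accuracy: {baseline_accuracy:.1%}")
```

A model that does not clearly exceed this baseline accuracy has learned little beyond the class proportions.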



However, most companies have a Turnover significantly below 60M EUR per year (Figure 8.1a). As explained in the authors' previous work on the Cyber Insurance Risk Assessment (Bartolini et al., 2018), a rating result must be at least 2.00 for a company to be eligible for Cyber Insurance. This is reflected in Figure 8.1b, as no rating below 2.00 exists in the dataset. Most of the insured companies have been rated between 2.00 and 3.00, and only very few have a better rating. Most companies store well below 10,000 records of sensitive data such as cardholder data or personal data, but some companies hold well over 50,000 such records (Figure 8.1c).

Figure 8.1 Dataset features for the different parameters.

Such a large volume of sensitive data is critical, especially with regard to regulatory and legal requirements: in the event of a data protection violation, a huge number of customers may be affected. For most companies, investments in cyber protection and programs amount to up to 1M EUR per year; few companies invest more than 4M EUR a year (Figure 8.1d). Fewer than 300 companies in the dataset belong to the so-called critical infrastructures (KRITIS). One of the reasons for this is that certain risks are excluded from insurance cover because they cannot be accepted. Finally, the insured companies are quite evenly split between those that use additional other IT insurance as protection against cyber risk and those that use only the Cyber Insurance as a risk transfer measure.

Based on these data, the first question is how every pair of these variables (features) is correlated. As can be seen from the heatmap in Figure 8.2, there is a strong correlation between CC/PII and Insurance claims as well as between KRITIS and Insurance claims.
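
The pairwise correlations behind such a heatmap can be sketched as follows. This is a minimal example on synthetic data: the column names, the distributions, and the rule that generates claims are assumptions made for illustration, so the resulting numbers say nothing about the real dataset. Pearson correlation is assumed here because the chapter does not name the coefficient it uses.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500  # synthetic sample size, not the real dataset

kritis = rng.random(n) < 0.2
cc_pii = rng.integers(0, 60_000, size=n)
# Synthetic rule: claims become more likely for KRITIS and for large CC/PII stores.
p_claim = 0.3 + 0.3 * kritis + 0.3 * (cc_pii > 10_000)

df = pd.DataFrame({
    "turnover_meur": rng.lognormal(mean=3.0, sigma=0.8, size=n),
    "other_it_insurance": rng.random(n) < 0.5,
    "cc_pii_records": cc_pii,
    "rating": rng.uniform(2.0, 4.0, size=n),
    "kritis": kritis,
    "cyber_invest_meur": rng.exponential(scale=0.8, size=n),
    "insurance_claim": rng.random(n) < p_claim,
})

# Cast booleans to floats so that every column enters the Pearson correlation.
corr = df.astype(float).corr(method="pearson")

# Rank the six predictors by absolute correlation with the target column.
ranking = (
    corr["insurance_claim"]
    .drop("insurance_claim")
    .abs()
    .sort_values(ascending=False)
)
print(ranking)
```

A plotting library could then render `corr` as a heatmap; `seaborn.heatmap(corr, annot=True)` is one common option.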

Turnover and the Rating also have a certain relevance for Insurance claims. Although there is a certain correlation between the investment in cyber security and KRITIS in companies, this fact generally has little significance for Insurance claims. The same conclusion can in principle be drawn for Other IT Insurance.

Figure 8.2 Heatmap. Correlation between features.

As we want to predict numeric target values, a supervised learning method must now be chosen. In this chapter, we will focus on the most relevant algorithms: Logistic Regression, Linear Regression, Random Forests, Gradient Boosted Trees, etc. The Python Scikit-Learn library is used for all these algorithms. To process the collected data in Scikit-Learn (2019), the data have to be in binary form (as already shown in the heatmap). Therefore, One Hot Encoding is used. One way to evaluate a model is to split the training set into smaller training sets and evaluate against a validation set. Therefore, we use Scikit-Learn cross-validation, and for all models in this chapter the data are divided into three sets: a training set (containing 80% of the data), a test set (10%), and the mentioned cross-validation set (10%). Cross-validation is used to reduce the problem of overfitting. The aim is to find out which algorithm is the best claims predictor.
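
The preprocessing and evaluation steps described above can be sketched with Scikit-Learn as follows. This is a minimal sketch under several assumptions: the data are synthetic, the column names are hypothetical, the 10% cross-validation set is treated as a held-out validation set, and 5-fold cross-validation is additionally run on the training portion. Only classifiers are shown, since the synthetic target is a binary claim flag; the chapter's actual configuration is not specified here.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(42)
n = 1295  # dataset size from the text; all values below are synthetic

kritis = rng.choice(["yes", "no"], size=n, p=[0.2, 0.8])
cc_pii = rng.integers(0, 60_000, size=n)
X = pd.DataFrame({
    "turnover_meur": rng.lognormal(mean=3.0, sigma=0.8, size=n),
    "rating": rng.uniform(2.0, 4.0, size=n),
    "cc_pii_records": cc_pii,
    "cyber_invest_meur": rng.exponential(scale=0.8, size=n),
    "kritis": kritis,
    "other_it_insurance": rng.choice(["yes", "no"], size=n),
})
# Synthetic target: claims are more likely for KRITIS and large CC/PII stores.
p_claim = 0.3 + 0.3 * (kritis == "yes") + 0.3 * (cc_pii > 10_000)
y = (rng.random(n) < p_claim).astype(int)

categorical = ["kritis", "other_it_insurance"]
numeric = ["turnover_meur", "rating", "cc_pii_records", "cyber_invest_meur"]

# 80% training, then the remaining 20% split evenly into validation and test.
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
X_val, X_test, y_val, y_test = train_test_split(
    X_hold, y_hold, test_size=0.5, stratify=y_hold, random_state=0
)


def build_pipeline(model):
    """One-hot encode categorical columns, scale numeric ones, then fit the model."""
    preprocess = ColumnTransformer([
        ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical),
        ("scale", StandardScaler(), numeric),
    ])
    return Pipeline([("preprocess", preprocess), ("model", model)])


candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Mean 5-fold cross-validated accuracy on the training portion for each candidate.
cv_scores = {
    name: cross_val_score(
        build_pipeline(model), X_train, y_train, cv=5, scoring="accuracy"
    ).mean()
    for name, model in candidates.items()
}

best_name = max(cv_scores, key=cv_scores.get)
best = build_pipeline(candidates[best_name]).fit(X_train, y_train)
val_accuracy = best.score(X_val, y_val)
test_accuracy = best.score(X_test, y_test)
print(cv_scores, best_name, val_accuracy, test_accuracy)
```

Wrapping the encoder and the model in one `Pipeline` ensures that the encoding is fitted only on the training folds during cross-validation, so no information from held-out rows leaks into preprocessing.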
