# Diagnosis Using Support Vector Machines

A SVM is a supervised machine learning algorithm proposed by Vapnik in 1995 [5]. It has a number of theoretical and computational merits, for example, the simple geometrical interpretation of the margin, uniqueness of the solution, statistical robustness of the loss function, modularity of the kernel function, and overfitting control through the choice of a single regularization parameter. A brief introduction to SVMs is presented below.

## Support Vector Machines

The goal of SVMs is to define an optimal separating hyperplane (OSH) to separate two classes. The vectors from the same class fall on the same side of the optimal separating hyperplane, and the distance from the closest vectors to the optimal separating hyperplane is the maximum among all the separating hyperplanes. An important and unique feature of this approach is that the solution is only based on those vectors that are the closest to the OSH, calculated in the following way. Let (xi, yi), i = 1, 2,...,n be a set of training examples, and xj e Rd, yj e {-1, +1}. Figure2.1 illustrates a two-class SVM model. The vector xi is considered as input, and d is the dimensionality of the input vectors. Each input vector belongs to one of the two classes. One is labeled by y = +1; the other is labeled by y = -1. If the set can be linearly separated, there must be a hyperplane satisfying Formula (2.1):

Fig. 2.1 Illustration of a 2-class support vector machine model

where sgn refers to the sign of (штх+Ь), ш is a d-dimensional vector, and b is a scalar. Those vectors xi for which f (xi) is positive are placed in one class, while vectors xi for which f (xi) is negative are placed in another class. Based on [5], we define margin as twice the distance from the classifier to the closest data vector, namely the support vector. The margin is a measure of the ability to generate a classifier. The larger the margin is, the better is the generation of the classifier. SVMs maximize the margin between two classes.

Since the margin width equals , the maximum-margin solution is found by solving the following minimization problem:

Subject to

where slack variable ? is introduced to measure the degree of misclassification of data xi and C is the error penalty. We can tune C to adjust the trained SVM model to be either overfitting or underfitting. The parameter p is used for regularization of the weights in the SVM model. Most SVM solvers use standard regularization, i.e., p = 2. Hence we assume p = 2 in our work.

In order to solve the constrained optimization problem described in (2.2), a set of Lagrange multipliers ai, where ai > 0, is used. Each multiplier ai corresponds to a constraint in (2.3) on the support vectors. The optimization problem from (2.2) can now be expressed in its dual form

where Qij = yiyjK(xi, xj), and K is the kernel function described in the next section. Additional mathematical details are omitted here but they can be found in [5]. The weights and offsets are as follows:

Originally, SVMs were designed for linear binary classification problems. In practice, classification problems are not limited by two classes. In board-level fault diagnosis, the number of root cause candidates (classes) is typically in the range of a few hundreds. In [5], a modified design of SVMs was proposed in order to incorporate multiclass learning. Besides this, an alternative approach for handling a large number of classes is to combine multiple binary SVMs. “One against one” provides pairwise comparisons between classes. “One against all” compares a given class with all the other classes. According to a comparison study in [5], the accuracies of these methods are almost the same. Therefore, we choose the “one against all” in our problem, which has the lowest computation complexity.