Parallel Support Vector Machine
Advances in electronics and computer technology have made an enormous amount of data available, and processing and analysing that data has become an important issue for organizations. Demand for techniques that can process large amounts of data is increasing in almost every field, such as medicine, bioinformatics, business, the web, and banking. Parallel algorithms are one solution to the problem of analysing such large volumes of data.
Support Vector Machine (SVM) is a supervised learning method, because the class labels of the training data are known in advance (“Support Vector Machine”, 2015; Han & Kamber, 2001). SVM can be used for both classification and regression. It maps the input training data to a higher dimension and, in that space, computes an optimal linear separating hyperplane, i.e., a plane or boundary that separates the tuples of one class from those of another. The main objective of SVM is to find this hyperplane using support vectors and margins (Butler, 2014).
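To make the terms hyperplane, margin, and support vector concrete, the following minimal sketch evaluates a given hyperplane on a toy data set. It is an illustration only, not an SVM trainer: the data, the hyperplane (w, b), and the function names are all assumptions chosen by hand for this example.

```python
import math

def decision_value(w, b, x):
    """Signed score w.x + b; its sign is the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def margin_and_support_vectors(w, b, samples, tol=1e-9):
    """Return the margin of hyperplane (w, b) and the points that lie on it.

    samples is a list of (x, y) pairs with class label y in {-1, +1}.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    # Distance of each point from the hyperplane (positive if correctly classified).
    distances = [y * decision_value(w, b, x) / norm for x, y in samples]
    margin = min(distances)
    # The closest points determine the margin; they act as support vectors.
    support = [x for (x, _), d in zip(samples, distances) if d - margin <= tol]
    return margin, support

# Toy 2-D data and a hand-picked hyperplane x1 = 0 (both hypothetical).
samples = [((2.0, 1.0), +1), ((1.0, 0.0), +1), ((-1.0, 0.0), -1), ((-3.0, 2.0), -1)]
w, b = (1.0, 0.0), 0.0
margin, support = margin_and_support_vectors(w, b, samples)
```

SVM training searches for the (w, b) that makes this margin as large as possible; only the points returned in `support` influence that choice, which is why the cascade model discussed later can pass support vectors on and discard the remaining tuples.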
Because the amount of data is very large, storage requirements increase rapidly. Parallelization is a proposed solution to this problem: the problem is split into subsets, and the training tuples are assigned to the different subsets. Parallel SVM can be realized with the help of the MapReduce framework.
The parallel SVM can be implemented on a cascade model. The SVM network is trained through the solutions of partial subSVMs, where each subSVM serves as a filter. The solutions obtained from the partial subSVMs help reach the global optimum. In this model, the optimization problem for large-scale data is divided into smaller, independent optimization problems. The support vectors of a previous subSVM are fed as input to the next subSVM (Sun & Fox, 2012).
In this architecture, the support vectors from two subSVMs are combined and given as input to a subSVM at the next level. The process continues until only one SVM remains, and the output of the overall system is obtained at this last level. Each subSVM therefore deals with a training set that is smaller than the training set of the whole problem.
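The cascade described above can be sketched in a few lines. This is a minimal illustration under strong assumptions, not the implementation of Sun and Fox: the trainer `train_sub_svm` is a stand-in that only handles one-dimensional, linearly separable data (where the support vectors are simply the two closest points of opposite class), and all function names and data values are hypothetical.

```python
def train_sub_svm(samples):
    """Stand-in trainer (assumption): 1-D hard-margin SVM on separable data.

    samples is a list of (x, y) pairs with y in {-1, +1}, where every
    negative x lies below every positive x. The support vectors are the
    largest negative point and the smallest positive point.
    """
    negatives = [x for x, y in samples if y == -1]
    positives = [x for x, y in samples if y == +1]
    support = []
    if negatives:
        support.append((max(negatives), -1))
    if positives:
        support.append((min(positives), +1))
    return support

def cascade_svm(partitions, train=train_sub_svm):
    """Train one subSVM per partition, then merge support vectors in pairs,
    level by level, until a single SVM remains.

    Returns the final support vectors and the largest input any subSVM saw.
    """
    largest_input = max(len(part) for part in partitions)
    layer = [train(part) for part in partitions]
    while len(layer) > 1:
        next_layer = []
        for i in range(0, len(layer), 2):
            # Combine the support vectors of two subSVMs (or pass one through).
            merged = [sv for svs in layer[i:i + 2] for sv in svs]
            largest_input = max(largest_input, len(merged))
            next_layer.append(train(merged))
        layer = next_layer
    return layer[0], largest_input

def boundary(support):
    """Decision threshold: midpoint between the two support vectors."""
    (x_neg, _), (x_pos, _) = sorted(support)
    return (x_neg + x_pos) / 2

# Hypothetical training data, already split into four partitions.
partitions = [
    [(-5, -1), (-8, -1), (4, 1)],
    [(-2, -1), (6, 1), (7, 1)],
    [(-7, -1), (3, 1), (10, 1)],
    [(-1, -1), (-4, -1), (9, 1)],
]
all_samples = [s for part in partitions for s in part]
final_support, largest_input = cascade_svm(partitions)
```

In this toy case the cascade ends with the same support vectors as training on all twelve tuples at once, while no single subSVM receives more than four tuples. A real solver working on higher-dimensional data would replace `train_sub_svm`; whether the cascade then reaches the global optimum in one pass depends on the data.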
The parallel SVM architecture is shown in Figure 4. The implementation involves the following steps:
- Configure the computation environment.
- Partition the data and distribute it to the different computation nodes.
- Create a partition file.
The main class for the parallel SVM is shown in Figure 5, and the Map and Reduce classes are shown in Figure 6.
The program works as follows. The large dataset D is partitioned into smaller subsets D1, D2, ..., Dn, which are given as input to different nodes. A partition file is then created with the Twister command; this file is used in the Twister configuration.
jobConf is used to configure parameters, such as the Map and Reduce parameters and class names, that help in the computation. TwisterDriver is used to initiate the MapReduce task. The Map task is performed in each node. Partitioned training data from the local file system is loaded into the first layer. After that, the support vectors of the previous layer are combined in pairs and given as input to the nodes of the next layer. The trained support vectors are given to the Reduce jobs as input. The Reducer's job is to collect all support vectors from all mapper jobs and send them to the client node. This iterative training process stops when the support vectors of all subSVMs have been combined into one SVM (Sun & Fox, 2012).
Figure 4. Parallel SVM architecture based on cascade model (Sun & Fox, 2012)
Figure 5. Parallel SVM main class (Sun & Fox, 2012)
Figure 6. Parallel SVM map class & reduce class (Sun & Fox, 2012)
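The flow of the program described above (partition, Map, Reduce, and iteration on the client) can be imitated in plain Python. The sketch below is an assumption-laden illustration and does not reproduce the code of Figures 5 and 6 or the Twister API: `partition`, `svm_map`, `svm_reduce`, and `run_job` are hypothetical names, the Map and Reduce tasks run sequentially in one process, and the trainer inside `svm_map` is a stand-in for one-dimensional separable data.

```python
def partition(dataset, n):
    """Split the dataset into n subsets D1..Dn (round-robin assignment)."""
    return [dataset[i::n] for i in range(n)]

def svm_map(subset):
    """Map task: train a subSVM on one subset and emit its support vectors.

    Stand-in trainer (assumption): for 1-D separable data the support
    vectors are the largest negative and the smallest positive point.
    """
    negatives = [x for x, y in subset if y == -1]
    positives = [x for x, y in subset if y == +1]
    support = []
    if negatives:
        support.append((max(negatives), -1))
    if positives:
        support.append((min(positives), +1))
    return support

def svm_reduce(mapped):
    """Reduce task: collect the support vectors of all mappers for the client."""
    return list(mapped)

def run_job(dataset, n_nodes):
    """Client loop: repeat MapReduce rounds until one SVM remains."""
    subsets = partition(dataset, n_nodes)
    rounds = 0
    while True:
        collected = svm_reduce(map(svm_map, subsets))
        rounds += 1
        if len(collected) == 1:
            return collected[0], rounds
        # Combine the support vectors in pairs to form the next layer's input.
        subsets = [sum(collected[i:i + 2], []) for i in range(0, len(collected), 2)]

# Hypothetical 1-D training tuples (value, class label).
dataset = [
    (-6, -1), (-3, -1), (-1.5, -1), (-9, -1), (-4, -1), (-7, -1),
    (5, 1), (8, 1), (2.5, 1), (4, 1), (6, 1), (3, 1),
]
support, rounds = run_job(dataset, n_nodes=4)
```

With four nodes the loop needs three rounds (four subSVMs, then two, then one). In a real deployment each call to `svm_map` would run on a separate node, and the framework, not the builtin `map`, would schedule the tasks and move the support vectors between nodes.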
CASE STUDY
Implementation of Back-Propagation Neural Network for Demand Forecasting in Supply Chain
Demand forecasting is a central element of supply chain management in any organization. The forecasting technique must be efficient; otherwise it leads to increased costs, lost customers, and similar problems. This case study, presented by Cheng et al. (2006), trains a neural network using the backpropagation algorithm. The network is trained to make predictions for a corporation in Taiwan that produces electrical connectors. In the study, a tool called AweSim was used to accumulate the orders for different types of connectors, and this data set was used to train the neural network to give better prediction results.
The study demonstrates the feasibility of a backpropagation neural network for predicting demand on a simulated data set (see Figure 7). It uses a simple, sequential backpropagation neural network. To implement the forecasting approach in a parallel manner, however, the parallel backpropagation on MapReduce explained earlier can be used.
Figure 7. Flow chart for the process of demand forecasting using backpropagation (Cheng, Hai-Wei & Chen, 2006)
The study defines five variables that describe the data and are used to train the neural network:
1. T: time interval between two arrival orders
2. Q: quantity of a connector
3. PA: operator time
4. PB: machining capacity
5. PC: inventory level
The order data for electrical connectors provided by the corporation was not sufficient to train a neural network. A simulation tool called AweSim was therefore used to simulate a large amount of data based on the records provided by the company. The data generated by AweSim is normalized so that the network does not become saturated and insensitive to a new set of records.
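The normalization step can be illustrated with a short sketch. The source does not state which normalization was applied, so min-max scaling to the range [0, 1] is assumed here; the function name and the sample records are hypothetical. Scaling matters because sigmoid-type neurons respond almost identically to all large inputs, which is the saturation the text refers to.

```python
def min_max_normalize(rows):
    """Scale every column of rows to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

# Made-up records with the columns T, Q, PA, PB, PC (not data from the study).
records = [
    [2.0, 100.0, 8.0, 50.0, 300.0],
    [4.0, 300.0, 6.0, 70.0, 100.0],
    [3.0, 200.0, 7.0, 60.0, 200.0],
]
normalized = min_max_normalize(records)
```

In practice, the minimum and maximum of each column would be taken from the training data only and then reused for the testing data, so that both sets are scaled in the same way.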
The network architecture for demand forecasting consists of three layers: an input layer, a hidden layer, and an output layer. The input layer consists of five neurons, each corresponding to one of the five variables. The output layer also has five neurons. A gradient descent backpropagation approach is used to train the network. A professional software package called Neural Network Professional II/Plus was used for the training. Once the network has been trained on the training data, it is tested on the testing data, a set of records that has not been used in the training process.
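A network of this shape can be sketched in plain Python to show what gradient descent backpropagation does at each step. This is not the configuration used in the study: the hidden layer size (four neurons), the sigmoid activation, the learning rate, and the training records are all assumptions, and the class name `BackpropNet` is hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class BackpropNet:
    """Three-layer network trained by gradient descent backpropagation."""

    def __init__(self, n_in=5, n_hidden=4, n_out=5, seed=0):
        rng = random.Random(seed)
        # Each row holds one neuron's input weights plus a trailing bias weight.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                      for _ in range(n_out)]

    def forward(self, x):
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0])))
                  for row in self.w_hidden]
        output = [sigmoid(sum(w * v for w, v in zip(row, hidden + [1.0])))
                  for row in self.w_out]
        return hidden, output

    def train_step(self, x, target, lr=0.5):
        hidden, output = self.forward(x)
        # Output deltas: gradient of the squared error through the sigmoid.
        d_out = [(o - t) * o * (1.0 - o) for o, t in zip(output, target)]
        # Hidden deltas: output deltas propagated back through w_out.
        d_hid = [h * (1.0 - h) * sum(d * row[j] for d, row in zip(d_out, self.w_out))
                 for j, h in enumerate(hidden)]
        # Gradient descent update of both weight layers.
        for row, d in zip(self.w_out, d_out):
            for j, v in enumerate(hidden + [1.0]):
                row[j] -= lr * d * v
        for row, d in zip(self.w_hidden, d_hid):
            for j, v in enumerate(x + [1.0]):
                row[j] -= lr * d * v

    def error(self, data):
        """Mean squared error per record."""
        total = sum((o - t) ** 2
                    for x, target in data
                    for o, t in zip(self.forward(x)[1], target))
        return total / len(data)

# Made-up, already normalized records: five inputs (T, Q, PA, PB, PC), five targets.
records = [
    ([0.0, 0.0, 1.0, 0.0, 1.0], [0.1, 0.2, 0.8, 0.1, 0.9]),
    ([1.0, 1.0, 0.0, 1.0, 0.0], [0.9, 0.8, 0.2, 0.9, 0.1]),
    ([0.5, 0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5, 0.5]),
]
net = BackpropNet()
error_before = net.error(records)
for _ in range(2000):
    for x, target in records:
        net.train_step(x, target)
error_after = net.error(records)
```

Comparing `error_after` with `error_before` shows whether training reduced the error on these records. A proper evaluation would, as the text notes, measure the error on testing records that were held out from training.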
The study shows that a backpropagation neural network is capable of predicting demand based on the data for electrical connectors. Similarly, a neural network can be trained to predict the demand for other products. Neural networks have proved to be a powerful tool in several areas of operations management, such as demand forecasting and cost estimation.