# LITERATURE REVIEW

The relevant literature from 2007 to 2018 indicates that the stock market behaves nonlinearly. Researchers have applied a variety of techniques and tools to obtain more reliable predictions. Recent studies show that hybrid techniques can produce better results than single models.

Khemchandani and Chandra [7] proposed the twin support vector machine (TWSVM), which consists of two smaller SVM classifiers and is capable of solving nonlinear problems. It is approximately four times faster than the traditional SVM and generalizes well. Khemchandani and Chandra found that the proposed TWSVM gave better results while requiring the solution of only two smaller quadratic programming problems (QPPs).
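The "four times faster" figure comes from a rough complexity argument. The derivation below assumes, as such estimates usually do, that solving a QPP with $m$ constraints costs on the order of $m^3$ operations and that each of the two TWSVM problems has about $m/2$ constraints (two classes of similar size):

$$
\begin{aligned}
T_{\text{SVM}} &\propto m^{3},\\
T_{\text{TWSVM}} &\propto 2\left(\tfrac{m}{2}\right)^{3} = \tfrac{m^{3}}{4},\\
\frac{T_{\text{SVM}}}{T_{\text{TWSVM}}} &\approx 4.
\end{aligned}
$$

The ratio is only indicative; actual run times depend on the solver and on how unbalanced the two classes are.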

Whereas the traditional SVM uses a single hyperplane, the TWSVM uses two nonparallel hyperplanes, each of which is as close as possible to one of the two classes and as far as possible from the other. In addition, Khemchandani and Chandra observed that the TWSVM is more favorable than a single SVM because a nonlinear kernel can easily be applied to improve the precision of the result.
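After training, the TWSVM assigns a new point to the class whose hyperplane lies nearer, measured by the perpendicular distance $|w^\top x + b| / \lVert w \rVert$. The Python sketch below shows only this decision rule; the plane coefficients are hypothetical values standing in for the solutions of the two QPPs, and the training step is not reproduced.

```python
import math
from typing import Sequence, Tuple

# A hyperplane w.x + b = 0, stored as (w, b).
Hyperplane = Tuple[Sequence[float], float]


def distance_to_plane(x: Sequence[float], plane: Hyperplane) -> float:
    """Perpendicular distance |w.x + b| / ||w|| from point x to the plane."""
    w, b = plane
    dot = sum(wi * xi for wi, xi in zip(w, x))
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(dot + b) / norm


def twsvm_predict(x: Sequence[float], plane_pos: Hyperplane, plane_neg: Hyperplane) -> int:
    """Assign x to the class whose (nonparallel) hyperplane lies closer."""
    d_pos = distance_to_plane(x, plane_pos)
    d_neg = distance_to_plane(x, plane_neg)
    return 1 if d_pos <= d_neg else -1


# Hypothetical planes, as if already obtained from the two QPPs:
# class +1 clusters near the line y = x, class -1 near the line y = -x.
plane_pos = ([1.0, -1.0], 0.0)  # x - y = 0
plane_neg = ([1.0, 1.0], 0.0)   # x + y = 0

print(twsvm_predict([2.0, 2.1], plane_pos, plane_neg))   # 1
print(twsvm_predict([2.0, -1.9], plane_pos, plane_neg))  # -1
```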

Lu et al. [8] proposed a two-stage hybrid framework that combines support vector regression (SVR) and independent component analysis (ICA) for financial prediction. They found that a standalone SVR model can predict financial time series precisely, but it cannot handle the high noise inherent in such data, which degrades its predictive capability and can lead to overfitting or underfitting. This motivated a hybrid framework in which one component handles the inherent noise of the dataset and the other optimizes the prediction. In the proposed model, the first stage uses ICA to deal with the noise, and the second stage uses SVR to predict the financial time series. ICA can separate latent source signals from mixed observations without prior knowledge of the mixing mechanism. In the first stage, ICA is applied to the predictor variables to produce independent components (ICs).

The ICs containing the most noise were then identified and removed, and the remaining ICs served as inputs to the SVR prediction model. The model was applied to two datasets, the Nikkei 225 opening price and the TAIEX closing price, and compared with the traditional model. The experimental results suggest that the proposed model outperforms the standalone SVR model and provides more precise results than the traditional model. The authors suggest that other hybrid mechanisms could be developed in the future to further improve prediction.

Peng [9] proposed twin support vector regression (TSVR) to address the slow learning speed of classical SVR. Classical SVR is formulated as a single convex quadratic programming problem with linear inequality constraints. TSVR instead solves two smaller quadratic programming problems, each of which determines one of a pair of nonparallel functions. The experimental results suggest that TSVR is much faster than classical SVR and solves its twin quadratic programming problems efficiently without equality constraints. It was also observed that TSVR cannot handle the hybridization problem and that it loses sparsity, an issue that requires more in-depth study.
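In TSVR, the two smaller problems yield a down-bound function $f_1$ and an up-bound function $f_2$, and the final regressor is their mean, $f(x) = \tfrac{1}{2}(f_1(x) + f_2(x))$. The sketch below assumes the linear case and uses hypothetical coefficients in place of the QPP solutions, which are not computed here.

```python
from typing import Sequence, Tuple

# A linear function f(x) = w.x + b, stored as (w, b).
Linear = Tuple[Sequence[float], float]


def evaluate(f: Linear, x: Sequence[float]) -> float:
    w, b = f
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def tsvr_predict(x: Sequence[float], down_bound: Linear, up_bound: Linear) -> float:
    """Final TSVR regressor: the mean of the down- and up-bound functions."""
    return 0.5 * (evaluate(down_bound, x) + evaluate(up_bound, x))


# Hypothetical bound functions, standing in for the two QPP solutions.
down = ([2.0], -0.5)  # f1(x) = 2.0x - 0.5
up = ([2.2], 0.5)     # f2(x) = 2.2x + 0.5 (not parallel to f1)
print(tsvr_predict([1.0], down, up))  # approximately 2.1
```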

Wang and Zhu [10] studied many types of machine learning algorithms and found the SVM to be the most promising tool for financial markets. The SVM is a kernel-based algorithm in which a kernel function implicitly maps the input space *X* into a Hilbert feature space *F*. They proposed a two-stage kernel learning mechanism for SVR in financial time-series forecasting. They took a set of candidate kernels and combined them into a mixture kernel capable of producing better predictions, selecting the regularization parameters from all available combinations to optimize the result. The experimental setup worked in two steps. In the first step, a standard SVR was applied with all candidate kernels to find a sparse linear combination of them. In the second step, the regularization parameter was selected automatically using a solution path algorithm. The model was applied to the S&P 500 and NASDAQ markets and gave promising results: it consistently outperformed the market, and the excess return was statistically significant, which is the strongest part of this study.
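A non-negative weighted sum of valid kernels is itself a valid kernel, which is what makes a mixture kernel possible. The sketch below assumes three common candidate kernels (linear, polynomial, and RBF) and hand-picked weights; the candidate set used by Wang and Zhu and the weights their first step would produce are not reproduced here.

```python
import math
from typing import Callable, List, Sequence

Kernel = Callable[[Sequence[float], Sequence[float]], float]


def linear_kernel(x: Sequence[float], z: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(x, z))


def poly_kernel(x: Sequence[float], z: Sequence[float], degree: int = 2, c: float = 1.0) -> float:
    return (linear_kernel(x, z) + c) ** degree


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float = 0.5) -> float:
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def mixture_kernel(kernels: List[Kernel], weights: List[float]) -> Kernel:
    """Return K(x, z) = sum_m beta_m * K_m(x, z) with every beta_m >= 0."""
    if any(w < 0 for w in weights):
        raise ValueError("kernel weights must be non-negative")

    def k(x: Sequence[float], z: Sequence[float]) -> float:
        return sum(w * km(x, z) for w, km in zip(weights, kernels))

    return k


# Hypothetical weights; a sparse solution simply sets some of them to zero.
k_mix = mixture_kernel([linear_kernel, poly_kernel, rbf_kernel], [0.5, 0.0, 0.5])
print(k_mix([1.0, 2.0], [1.0, 2.0]))  # 3.0
```

The resulting `k_mix` can be passed wherever a single kernel function is expected, so the SVR itself does not need to change.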

Wang et al. [11] developed a hybrid model that combines the ARIMA model, the exponential smoothing model (ESM), and the backpropagation neural network (BPNN) to predict stock market prices from time series, drawing on the strengths of all three models. The weights of the hybrid model were determined by a genetic algorithm (GA), so that the hybrid combines the linear ARIMA and ESM models with the nonlinear BPNN. The model was tested on two datasets: the opening price of the Dow Jones Industrial Average Index and the closing price of the Shenzhen Integrated Index. The experimental results show that the proposed hybrid model works better than any individual model. This study therefore demonstrates that a hybrid combination of tools is a powerful prediction technique; such combinations are dominant in management science and can be applied in other fields. However, the model has its own limitations, and the authors recommend developing more powerful hybrid combinations, such as multivariate adaptive regression splines and SVR, to improve time-series and high-frequency forecasting.
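The combination step can be sketched as a weighted sum of the component forecasts, assuming weights that are non-negative and sum to one. As a deliberately simple stand-in for the GA, the sketch searches exhaustively over a coarse weight grid; the forecast values are hypothetical placeholders, not outputs of real ARIMA, ESM, or BPNN models.

```python
from itertools import product
from typing import List, Sequence, Tuple


def combine(forecasts: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Point-by-point weighted sum of several models' forecasts."""
    return [sum(w * f[t] for w, f in zip(weights, forecasts)) for t in range(len(forecasts[0]))]


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def best_weights(actual: Sequence[float], forecasts: Sequence[Sequence[float]],
                 steps: int = 10) -> Tuple[float, float, float]:
    """Grid search over (w1, w2, w3) with w1 + w2 + w3 = 1.

    Stand-in for the GA: a GA would explore the same weight space stochastically.
    """
    best, best_err = None, float("inf")
    for i, j in product(range(steps + 1), repeat=2):
        if i + j > steps:
            continue
        w = (i / steps, j / steps, (steps - i - j) / steps)
        err = mse(actual, combine(forecasts, w))
        if err < best_err:
            best, best_err = w, err
    return best


# Hypothetical one-step forecasts from three component models.
f_arima = [1.0, 2.0, 3.0, 4.0]
f_esm = [3.0, 2.0, 5.0, 4.0]
f_bpnn = [10.0, 0.0, 10.0, 0.0]
actual = [2.0, 2.0, 4.0, 4.0]
print(best_weights(actual, [f_arima, f_esm, f_bpnn]))  # (0.5, 0.5, 0.0)
```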

Devi et al. [12] proposed a fundamental, generalized approach to stock market prediction using time series. In this experimental setup, the authors used five years (2007 to 2011) of historical data for the NSE Nifty Midcap50 companies, selecting the top four companies by Midcap value, for time-series prediction. The ARIMA model was fitted to the dataset under different criteria. The Akaike information criterion (AIC) and the Bayesian information criterion (BIC) were used to assess the adequacy of the model, and the Box-Jenkins methodology was used to analyze it. To evaluate the forecasts, the mean absolute error and the mean absolute deviation were used to analyze the fluctuation in the actual historical data. The authors concluded that more advanced approaches are needed to uncover the information hidden in the stock market.
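The ingredients named above can be illustrated on the simplest nontrivial case, an ARIMA(1,1,0) model without a constant, fitted by least squares. Devi et al.'s actual model orders are not given here, so the order is an assumption. The AIC and BIC helpers use the Gaussian least-squares form $n \ln(\mathrm{SSE}/n)$ plus a penalty, which equals the likelihood-based criterion up to an additive constant; $k$ is the number of estimated parameters.

```python
import math
from typing import List, Sequence


def difference(series: Sequence[float]) -> List[float]:
    """First difference: the 'I' in ARIMA with d = 1."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(x: Sequence[float]) -> float:
    """Least-squares AR(1) coefficient without intercept: x_t ~ phi * x_{t-1}."""
    num = sum(cur * prev for prev, cur in zip(x, x[1:]))
    den = sum(prev * prev for prev in x[:-1])
    return num / den


def arima110_forecast(series: Sequence[float]) -> float:
    """One-step forecast from an ARIMA(1,1,0) model (assumed order, no constant)."""
    d = difference(series)
    phi = fit_ar1(d)
    return series[-1] + phi * d[-1]  # undo the differencing


def mae(actual: Sequence[float], forecast: Sequence[float]) -> float:
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mean_abs_deviation(x: Sequence[float]) -> float:
    """Average absolute distance of the data from its own mean."""
    m = sum(x) / len(x)
    return sum(abs(v - m) for v in x) / len(x)


def aic(sse: float, n: int, k: int) -> float:
    return n * math.log(sse / n) + 2 * k


def bic(sse: float, n: int, k: int) -> float:
    return n * math.log(sse / n) + k * math.log(n)


prices = [0.0, 1.0, 1.5, 1.75]  # toy series whose increments halve each step
print(arima110_forecast(prices))  # 1.875
```

Comparing `aic` or `bic` across candidate orders (lower is better) is how such criteria are typically used to pick a model; BIC penalizes extra parameters more heavily once $n \geq 8$.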

Enke and Mehdiyev [13] developed a hybrid prediction model that combines differential-evolution-based fuzzy clustering with a fuzzy inference neural network to forecast a stock index. First, stepwise regression analysis is used to select the input variables with the strongest predictive capability. Second, a differential-evolution-based fuzzy clustering method is applied to extract the fuzzy rules. Finally, a fuzzy inference neural network produces the final prediction. The simulation results, in terms of a lower root-mean-square error (RMSE), suggest that the proposed model produces better results than linear regression models, probabilistic neural networks, a regression neural network, and a multilayered feedforward neural network (FFNN). The study suggests that augmenting the fuzzy models with type-2 fuzzy sets could improve their computational and expressive power, allowing a modified model to produce better results and better capture the unpredictable nature of the stock market.
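A minimal sketch of the input-selection step, assuming forward selection (one common variant of stepwise regression): at each step the variable that most reduces the residual sum of squares (SSE) is added, until the relative reduction falls below a threshold. Classical stepwise regression typically uses F-tests and may also remove variables; the `min_gain` threshold here is a hypothetical simplification.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_sse(X: Sequence[Sequence[float]], y: Sequence[float], cols: List[int]) -> float:
    """SSE of y regressed (via normal equations) on an intercept plus the chosen columns."""
    rows = [[1.0] + [row[c] for c in cols] for row in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((t - sum(bi * ri for bi, ri in zip(beta, r))) ** 2 for r, t in zip(rows, y))


def forward_stepwise(X: Sequence[Sequence[float]], y: Sequence[float],
                     min_gain: float = 0.05) -> List[int]:
    """Greedily add the column that most reduces SSE; stop when the gain is small."""
    selected: List[int] = []
    current = ols_sse(X, y, selected)
    floor = 1e-12 * current  # treat anything below this as a perfect fit
    remaining = list(range(len(X[0])))
    while remaining and current > floor:
        best = min(remaining, key=lambda c: ols_sse(X, y, selected + [c]))
        new_sse = ols_sse(X, y, selected + [best])
        if (current - new_sse) / current < min_gain:
            break
        selected.append(best)
        remaining.remove(best)
        current = new_sse
    return selected


# Hypothetical data: y = 1 + 2*x0 + 3*x2, while x1 is irrelevant.
X = [[1.0, 1.0, 1.0], [-1.0, -1.0, 1.0], [1.0, -1.0, -1.0],
     [-1.0, 1.0, -1.0], [1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]
y = [6.0, 2.0, 0.0, -4.0, 3.0, -1.0]
print(forward_stepwise(X, y))  # [2, 0]
```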

Patel et al. [14] proposed a two-stage fusion approach. In the first stage, they applied SVR; in the second stage, they applied ANN, random forest (RF), or SVR, resulting in the SVR-ANN, SVR-RF, and SVR-SVR fusion prediction models. These fusion models were compared with single-stage models that use RF, ANN, or SVR alone. The outcomes suggest that two-stage hybrid models are superior to single-stage prediction models, and the authors therefore recommend hybrid methods for obtaining more accurate results.

Sheta et al. [15] combined SVMs and ANNs to build hybrid prediction models, which were compared on the basis of various evaluation criteria. Twenty-seven potentially useful variables that may affect stock movement were selected for the analysis. Under these criteria, the SVM model with a radial basis function (RBF) kernel gave better predictions than the ANN and regression techniques. The study recommends applying other hybrid soft computing tools to improve the precision of stock market prediction.
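A trained kernel SVM classifies a point by the sign of $\sum_i \alpha_i y_i K(x_i, x) + b$, summed over the support vectors $x_i$, where the RBF kernel is $K(x, z) = \exp(-\gamma \lVert x - z \rVert^2)$. The sketch below evaluates that decision function; the support vectors, multipliers $\alpha_i$, bias $b$, and $\gamma$ are hypothetical, since the trained model of [15] is not given.

```python
import math
from typing import List, Sequence, Tuple

# (support vector, label in {-1, +1}, Lagrange multiplier alpha)
SupportVector = Tuple[Sequence[float], int, float]


def rbf(x: Sequence[float], z: Sequence[float], gamma: float) -> float:
    """Radial basis function kernel exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def svm_decision(x: Sequence[float], svs: List[SupportVector], b: float, gamma: float) -> float:
    """Kernel SVM decision value: sum_i alpha_i * y_i * K(sv_i, x) + b."""
    return sum(alpha * label * rbf(sv, x, gamma) for sv, label, alpha in svs) + b


def svm_classify(x: Sequence[float], svs: List[SupportVector], b: float, gamma: float) -> int:
    return 1 if svm_decision(x, svs, b, gamma) >= 0 else -1


# Hypothetical trained model: one support vector per class.
svs = [([0.0, 0.0], 1, 1.0), ([2.0, 2.0], -1, 1.0)]
print(svm_classify([0.2, 0.1], svs, b=0.0, gamma=0.5))  # 1
print(svm_classify([1.9, 2.2], svs, b=0.0, gamma=0.5))  # -1
```

The RBF kernel's output decays with distance, so each support vector mainly influences points near it; $\gamma$ controls how quickly that influence fades.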

Chiang et al. [16] combined an ANN with particle swarm optimization (PSO) to build an adaptive intelligent stock trading decision support system that predicts the future behavior of the stock market. The system is limited in that it requires technical indicators and specific patterns as its inputs.
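PSO moves a swarm of candidate solutions, each pulled toward its own best position and the swarm's best position. Exactly what the swarm optimizes in [16] is not detailed here, so the global-best PSO sketch below minimizes a toy objective; the inertia weight and acceleration coefficients are common textbook values, assumed rather than taken from the paper.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(f: Callable[[Sequence[float]], float], dim: int, n_particles: int = 30,
                 iters: int = 200, bounds: Tuple[float, float] = (-5.0, 5.0),
                 w: float = 0.7, c1: float = 1.5, c2: float = 1.5, seed: int = 0) -> List[float]:
    """Global-best particle swarm optimization; returns the best position found."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]             # each particle's best position
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm's best position

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest


best = pso_minimize(lambda p: sum(x * x for x in p), dim=2)
print(best)  # close to [0.0, 0.0]
```

In an ANN-PSO hybrid, `f` would typically be the network's training error expressed as a function of its weights, so each particle encodes one candidate set of weights.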

Tkac and Verner [17] compared neural networks with Generalized Autoregressive Conditional Heteroskedasticity (GARCH), linear regression, and discriminant analysis, noting that neural networks can find better results without relying on statistical assumptions. The study identified linear regression, logit, discriminant analysis, and the ARIMA model as benchmark methods, and observed that the ANN has better potential than other statistical and soft computing methods in terms of the coefficient of determination, mean squared error, and classification or prediction accuracy. The advantages of conventional models are their transparency, simplicity, generality, and the comprehensibility of their output. By contrast, because of the parallel and complex nature of neural networks, the synaptic weights in the hidden layer have no clear, recognized interpretation, which makes it impossible to establish an explicit relationship between inputs and outputs. This problem motivated hybrid models that combine the ANN with a traditional approach, and the study observed that such hybrid networks perform better than conventional feedforward networks trained with gradient-based techniques. However, each hybridization suits only a particular type of task, so the study suggests the need for metaheuristic methods to optimize performance. Addressing essential problems, such as the lack of analytical abilities and of a formal background, could also improve ANN performance. Universal methodological guidelines are therefore needed for selecting the hidden layers and control variables and for designing the overall topology. The study concluded that the ANN is indisputably better than traditional methods, owing to the user-friendliness of software packages and the general availability of data analysis techniques.

Shrivastav and Kumar [18] proposed a deep neural network (DNN) strategy that can analyze trend data in real time. They defined three parameters, namely the log return, the pseudo log return, and the trade indicator, and formulated each of them in their paper. Using the DNN, they predicted the next 1-min pseudo log return. The architecture was chosen arbitrarily and has one input layer, five hidden layers, and one output layer. The DNN was trained after every 50 epochs.
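Of the three parameters, the log return has a standard definition, $r_t = \ln(P_t / P_{t-1})$; the pseudo log return and trade indicator are defined in [18] and are not reproduced here. The sketch computes log returns from a hypothetical series of 1-min closing prices.

```python
import math
from typing import List, Sequence


def log_returns(prices: Sequence[float]) -> List[float]:
    """r_t = ln(P_t / P_{t-1}) for consecutive prices (e.g., 1-min closes)."""
    if any(p <= 0 for p in prices):
        raise ValueError("prices must be positive")
    return [math.log(cur / prev) for prev, cur in zip(prices, prices[1:])]


# Hypothetical 1-min closing prices.
prices = [100.0, 101.0, 100.5, 100.5]
r = log_returns(prices)
print(r)
# Log returns add up over time: their sum equals ln(P_last / P_first).
print(sum(r), math.log(prices[-1] / prices[0]))
```

Additivity is one reason log returns are preferred as model inputs: multi-period returns are simple sums of one-period returns.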

Chourmouziadis and Chatzoglou [19] proposed a two-stage fuzzy system. In the first stage, a fuzzy system for short-term trading avoids overconfidence in classical data and relies on detailed assessment. In the second stage, a novel trading technique, an "amalgam" of a compromise set of mainly neglected technical indicators, produces warning signals that are fed to the designed fuzzy system, which determines the portion of the portfolio to be invested. The short-term fuzzy system was tested on the ASE General Index over a long period. The model has limitations, such as setting the weights of the fuzzy rules; this is a difficult task because the success of the model depends on selecting the right technical indicators. The proposed strategy analyzes predictions over short time intervals. It is a good strategy for small datasets, with 66% accuracy, and in the testing period it provided 81% accuracy for traders. The DNN of Shrivastav and Kumar [18], on the other hand, is a relatively simple model; its authors therefore recommend other models, such as deep recurrent neural networks, deep belief networks, convolutional DNNs, and deep coding networks, to obtain more precise and accurate results on large datasets.

Qiu et al. [20] proposed a model that combines an ANN with a GA and simulated annealing. It produced satisfactory results with the proposed 18 input sets and successfully predicted stock market returns. The approach can also be used to reduce the dimension of the available input variables. The authors recommended applying ANNs and other models to stock market prediction.
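Simulated annealing accepts worse moves with probability $\exp(-\Delta/T)$, which lets it escape local minima while the temperature $T$ is high, and becomes nearly greedy as $T$ cools. How [20] couples it with the ANN and GA is not reproduced here; the generic sketch below minimizes a 1-D toy function, and its geometric cooling schedule and step size are assumptions.

```python
import math
import random
from typing import Callable


def simulated_annealing(f: Callable[[float], float], x0: float, t0: float = 1.0,
                        cooling: float = 0.995, steps: int = 5000,
                        step_size: float = 0.5, seed: int = 0) -> float:
    """Minimize a 1-D function; returns the best point visited."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best, best_f = x, fx
    t = t0
    for _ in range(steps):
        cand = x + rng.gauss(0.0, step_size)  # random neighbor
        fc = f(cand)
        # Always accept improvements; accept worse moves with prob exp(-delta / T).
        if fc <= fx or rng.random() < math.exp(-(fc - fx) / t):
            x, fx = cand, fc
            if fx < best_f:
                best, best_f = x, fx
        t = max(t * cooling, 1e-12)  # geometric cooling, kept above zero
    return best


print(simulated_annealing(lambda x: (x - 3.0) ** 2, x0=0.0))  # close to 3.0
```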

Zhong and Enke [21] studied three dimension-reduction techniques: fuzzy robust principal component analysis, standard principal component analysis (PCA), and kernel-based PCA (KPCA). These techniques simplify and rearrange the original data structure before it is used by an ANN. Proper kernel selection is essential for good KPCA performance, so Zhong and Enke [21] proposed a mechanism for selecting kernel functions automatically. The simulation results suggested that combining the ANN with PCA gives slightly better classification accuracy than the other two combinations.
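The PCA step can be sketched as finding the leading eigenvector of the sample covariance matrix and projecting each observation onto it; the resulting scores would then serve as the reduced ANN inputs. The sketch below covers standard PCA only (not the fuzzy robust or kernel variants), uses power iteration to stay dependency-free, and keeps a single component; a real pipeline would normally use a linear-algebra library and retain several components.

```python
import math
import random
from typing import List, Sequence


def center(data: Sequence[Sequence[float]]) -> List[List[float]]:
    """Subtract each column's mean."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in data]


def first_principal_component(data: Sequence[Sequence[float]], iters: int = 200,
                              seed: int = 0) -> List[float]:
    """Leading eigenvector of the sample covariance matrix, via power iteration."""
    x = center(data)
    n, p = len(x), len(x[0])
    cov = [[sum(r[i] * r[j] for r in x) / (n - 1) for j in range(p)] for i in range(p)]
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(p)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]  # renormalize to unit length
    return v


def project(data: Sequence[Sequence[float]], component: Sequence[float]) -> List[float]:
    """Reduce each observation to a single score along the component."""
    return [sum(a * b for a, b in zip(row, component)) for row in center(data)]


# Hypothetical two-feature data lying on the line y = 2x.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
pc = first_principal_component(data)
scores = project(data, pc)  # 1-D inputs that could be fed to an ANN
print(pc, scores)
```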

Barak et al. [22] proposed a fusion model based on multiple diverse base classifiers that operate on a common input and a meta-classifier that refines the prediction. Diversity methods such as AdaBoost, Bagging, and Boosting were used to create diversity among the classifiers. The experimental results show that Bagging performed best in fusion, with 83.6% accuracy when combined with Decision Tree, REP Tree, and logical analysis of data (LAD) Tree, and 88.2% accuracy in risk prediction when combined with best-first (BF) Tree, LAD Tree, and DTNB (decision table/naive Bayes). This study helps select prominent individual classifiers and combine them into a fusion model for stock market prediction. The fusion model was compared with a wrapper GA model as a benchmark. Applied to a dataset from the Tehran Stock Exchange, the Bagging-Decision Tree fusion model performed better than the other two algorithms. The study recommends optimizing the classifier parameters, predicting other significant responses, using textual information and technical features, and customizing the proposed approach to improve the predictive capability of the fusion model.
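Bagging trains each base classifier on a bootstrap sample (drawn with replacement) and combines their outputs by voting. The sketch below shows only that idea; it uses depth-one decision trees (stumps) as a stand-in base classifier and simple majority voting in place of a meta-classifier, both assumptions rather than the configuration of [22], and the data are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

# (feature index, threshold, label predicted when the feature value <= threshold)
Stump = Tuple[int, float, int]


def fit_stump(X: Sequence[Sequence[float]], y: Sequence[int]) -> Stump:
    """Pick the single-feature threshold split with the fewest training errors."""
    best, best_err = None, len(y) + 1
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for left in (-1, 1):
                preds = [left if row[j] <= thr else -left for row in X]
                err = sum(p != t for p, t in zip(preds, y))
                if err < best_err:
                    best, best_err = (j, thr, left), err
    return best


def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    j, thr, left = stump
    return left if x[j] <= thr else -left


def bagging_fit(X: Sequence[Sequence[float]], y: Sequence[int],
                n_estimators: int = 25, seed: int = 0) -> List[Stump]:
    """Train each stump on its own bootstrap sample of the training data."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models: List[Stump], x: Sequence[float]) -> int:
    """Majority vote over the ensemble (labels are -1 and +1)."""
    votes = Counter(stump_predict(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Hypothetical one-feature data: low values are class -1, high values class +1.
X = [[0.0], [1.0], [2.0], [3.0], [6.0], [7.0], [8.0], [9.0]]
y = [-1, -1, -1, -1, 1, 1, 1, 1]
models = bagging_fit(X, y)
print(bagging_predict(models, [-1.0]), bagging_predict(models, [10.0]))  # -1 1
```

An odd number of estimators avoids ties between the two labels in the vote.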

Chong et al. [23] applied three data representation methods, namely an autoencoder, a restricted Boltzmann machine, and PCA, to seven feature sets and used the results to build three-layer DNNs that predict future stock returns. Applied to the Korean stock market index, the DNN produced slightly better results than a linear autoregressive model on the training set; however, this advantage mostly disappeared in the testing phase. The approach works well with limited resources, but its performance on high-frequency stochastic datasets may be doubtful.

Despite extensive research in this area, no researcher has yet produced a single established computationally intelligent model that gives optimized and precise stock market predictions. Moreover, high-frequency datasets are rarely used in this area. These gaps call for new or hybrid tools and techniques to be applied to high-frequency datasets, which justifies the significance of this study. This study is a first step toward a computationally intelligent system for stock market prediction. To this end, different tools and techniques should be analyzed and implemented to create a benchmark. Therefore, this study uses the widely used statistical ARIMA model to maintain the quality of the system.