EXPERIMENTAL SETUP AND RESULTS
Adhikari et al. [38] explained that the ARIMA model was introduced by Box and Jenkins in 1970. The method combines autoregressive (AR), differencing (I), and moving average (MA) components of a time series, and it is used mainly for short-term forecasting. The ARIMA(p, d, q) model is applied to the Coca Cola dataset, which was recorded from June 22, 1999 to December 16, 1999 at a fine interval of one minute. At first glance, the original high-volume dataset is not stationary. The dataset is therefore first converted into log(dataset). However, log(dataset) is also not stationary, so the series is differenced to make it stationary. Then, the autocorrelation function (ACF) and the partial autocorrelation function (PACF) are calculated, and finally, the auto ARIMA procedure is applied to select p, d, and q and to produce the prediction.
1.4.1 DATA COLLECTION AND PREPARATION
High-frequency stock market data were collected from June 22, 1999 to December 16, 1999 at a fine interval of one minute. The volume of data is very high: 49,068 observations. The dataset has the attributes Index, Date, Time, Open, High, Low, and Close. However, only the Index (as Minute) and Close attributes are used for this analysis and prediction.
1.4.2 DATA PREPROCESSING AND EXTRACTION
Data preprocessing and extraction are essential steps for precise and accurate prediction. No data are available for holidays, so preprocessing is needed to handle the resulting gaps. For smooth preprocessing, a feature extraction mechanism should be applied to minimize irrelevant or blank data and so produce a more accurate and precise result. Missing values can be detected in RStudio with the built-in function anyNA(dataset).
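The missing-value check can be illustrated with a minimal sketch in base R. The data frame below is a made-up stand-in for the Minute/Close data, and the use of colSums(is.na()) and na.omit() is an assumption about how gaps might be handled; the source names only anyNA(dataset).

```r
# Toy data frame standing in for the Minute/Close data (values are made up).
dataset <- data.frame(
  Minute = 1:6,
  Close  = c(1376, NA, 1377, 1375, NA, 1376)
)

# anyNA() returns TRUE if at least one value is missing.
has_missing <- anyNA(dataset)

# Count the missing values per column, then keep only complete rows.
# Dropping rows is an assumption; the source does not say how gaps are treated.
missing_per_column <- colSums(is.na(dataset))
cleaned <- na.omit(dataset)

print(has_missing)
print(missing_per_column)
print(nrow(cleaned))
```

Dropping incomplete rows is the simplest option; for a time series, interpolation may be preferable if the one-minute spacing has to be preserved.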
1.4.3 SUMMARY OF THE DATASET
summary() is a general-purpose function in R that reports the central tendencies of a dataset: the minimum, first quartile, median, mean, third quartile, and maximum. The values for both attributes, Minute and Close, are presented in Table 1.1.
TABLE 1.1 Summary of the Dataset
| Statistic | Minute | Close |
|---|---|---|
| Min | 1 | 1242 |
| First quartile | 12,277 | 1332 |
| Median | 24,507 | 1366 |
| Mean | 24,528 | 1363 |
| Third quartile | 36,758 | 1404 |
| Max | 49,068 | 1452 |
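As a small illustration of how such a summary is obtained, the sketch below calls summary() on a toy data frame. The five Close values are made up for the example (they merely reuse the landmark values of Table 1.1) and do not reproduce the real dataset.

```r
# Toy data: five made-up observations, not the real 49,068-row dataset.
dataset <- data.frame(
  Minute = 1:5,
  Close  = c(1242, 1332, 1366, 1404, 1452)
)

# summary() on a data frame reports the six statistics for every column.
print(summary(dataset))

# summary() on a single numeric column returns a named vector.
s <- summary(dataset$Close)
print(s[["Median"]])
```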
1.4.4 ALGORITHM DEFINITION AND PARTITION
This step is critical for prediction with the ARIMA model. The data flow adopted for the available dataset is shown in Figure 1.1.
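First, the dataset is divided into two parts: 80% is used for training and the remaining 20% is used in the testing phase.

    library(caret)  # provides createDataPartition()
    # Row indices of the 80% training share, sampled on Close
    # (the source writes y$Close; dataset is assumed to be the same data frame)
    validindex <- createDataPartition(dataset$Close, p = 0.80, list = FALSE)
    valid <- dataset[-validindex, ]   # 20% held out for testing
    dataset <- dataset[validindex, ]  # 80% kept for training
    dim(dataset)
    [1] 39256     2
    dim(valid)
    [1] 9812    2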
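For readers without the caret package, an 80/20 partition can be sketched in base R. This is an illustrative substitute, not the chapter's code: sample() draws plain random row indices, whereas createDataPartition() samples within groups of the outcome. The toy data and the seed are assumptions.

```r
set.seed(42)  # arbitrary seed, only for reproducibility of the sketch

# Toy stand-in for the Minute/Close data (synthetic values).
dataset <- data.frame(Minute = 1:100, Close = 1350 + cumsum(rnorm(100)))

# Draw 80% of the row indices at random for training.
train_index <- sample(seq_len(nrow(dataset)), size = floor(0.80 * nrow(dataset)))

train <- dataset[train_index, ]   # training part
valid <- dataset[-train_index, ]  # testing part

print(dim(train))
print(dim(valid))
```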
FIGURE 1.1 Data flow in the ARIMA model.
1.4.5 FORECASTING EVALUATION AND IMPLEMENTATION
A real high-frequency stochastic dataset is dispersed by nature. The actual fluctuation of the Close value and its deviation from the mean are shown in Figures 1.2 and 1.3, respectively.
FIGURE 1.2 Stock market close fluctuation.
FIGURE 1.3 Stock market close deviation from the mean.
1.4.6 MODEL IMPLEMENTATION BY THE ARIMA MODEL
Implementing the ARIMA model requires following the steps of the algorithm in order. The algorithm is a useful way of exploring both the model behavior and the dataset behavior. It is as follows:
- Find the log(Close) value.
- Obtain the diff(log(Close)), as shown in Figure 1.4.
- Get the ACF of diff(log(Close)).
- Find the PACF of diff(log(Close)), as shown in Figure 1.5.
- Obtain the time series of diff(log(Close)).
- Train the model on the time series of diff(log(Close)) with auto ARIMA.
- Forecast the result up to the desired period, as shown in Figure 1.6.
- Test the model with some hypotheses.
First, we took the Close attribute for training and plotted it, as shown in Figure 1.2. To find its deviation from the mean, we plotted another graph, as in Figure 1.3. To reduce the fluctuation, we took log(Close) and plotted it. The fluctuation still needed to be reduced; therefore, we took diff(log(Close)). We then calculated the ACF and the PACF and tested the series for stationarity, which is required for the further procedure. Finally, we built the time series and applied auto ARIMA to forecast up to the desired period based on the previous training.
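The steps above can be sketched with base R only. This is a minimal illustration under stated assumptions, not the chapter's implementation: the series is synthetic, arima() from the stats package replaces auto ARIMA, and the order (0, 1, 2) is fixed by hand to match the model named in Table 1.2.

```r
set.seed(1)  # arbitrary seed for a reproducible synthetic series

# Synthetic close prices: a geometric random walk around 1350 (made-up data).
n <- 500
close <- 1350 * exp(cumsum(rnorm(n, mean = 0, sd = 0.0005)))

# Steps 1 and 2: log transform, then first difference.
log_close <- log(close)
diff_log_close <- diff(log_close)

# Steps 3 and 4: ACF and PACF of the differenced series (no plotting).
acf_vals  <- acf(diff_log_close, lag.max = 20, plot = FALSE)
pacf_vals <- pacf(diff_log_close, lag.max = 20, plot = FALSE)

# Steps 5 and 6: fit ARIMA(0,1,2) on log(Close); d = 1 performs the
# differencing internally, so the undifferenced log series is passed in.
fit <- arima(log_close, order = c(0, 1, 2))

# Step 7: forecast 10 steps ahead and map back to the price scale.
pred <- predict(fit, n.ahead = 10)
forecast_close <- exp(pred$pred)

print(round(as.numeric(forecast_close), 2))
```

Passing log(Close) with d = 1 is equivalent to modeling diff(log(Close)) with d = 0, but it lets predict() return forecasts on the log-price scale directly.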
FIGURE 1.4 Stock market diff(log(close)).
FIGURE 1.5 Stock market close PACF.
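    library(tseries)   # provides adf.test()
    library(forecast)  # provides auto.arima() and forecast()
    # z holds log(Close) and diffz holds diff(log(Close)), per Table 1.2
    adf.test(z)
    adf.test(diffz)
    # r is the series passed to ts(); its definition is not given in the source.
    # end is omitted because setting it equal to start keeps one observation only.
    closearima <- ts(r, start = c(1999, 6), frequency = 1)
    pclose <- auto.arima(closearima)
    closeforval <- forecast(pclose, h = 9812)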
1.4.7 TEST OF THE ARIMA MODEL
Finally, the series is checked with the augmented Dickey-Fuller (ADF) test, whose null hypothesis is that a unit root is present in the autoregressive model. The test was run with adf.test() on both log(Close) and diff(log(Close)). The two tests agree in lag order (33) and in the alternative hypothesis (stationary), but they differ in p-value. For log(Close), the p-value is 0.6923, so the unit-root null hypothesis cannot be rejected and the series is not stationary. For diff(log(Close)), the p-value is 0.01, so the null hypothesis is rejected and the differenced series can be treated as stationary. This agrees with the single order of differencing in the ARIMA(0,1,2) model and indicates that the model was implemented on a bona fide dataset, as shown in Table 1.2.
TABLE 1.2 Test Result of ARIMA (0,1,2)
| Test Name | Dickey-Fuller | Lag Order | p-Value | Alternative Hypothesis |
|---|---|---|---|---|
| Augmented Dickey-Fuller Test for log(Close) | -1.7325 | 33 | 0.6923 | Stationary |
| Augmented Dickey-Fuller Test for diff(log(Close)) | 36.551 | 33 | 0.01 | Stationary |
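The idea behind the unit-root test can be sketched in base R with a simplified Dickey-Fuller regression. This is an assumption-laden illustration, not a replacement for adf.test(): it uses no lagged difference terms (so it is not "augmented"), the helper name df_statistic is hypothetical, and the data are synthetic. The statistic does not follow the usual t distribution; with an intercept, the asymptotic 5% critical value is roughly -2.86.

```r
# Simplified Dickey-Fuller statistic: regress diff(y) on the lagged level
# and return the t value of the lag coefficient (hypothetical helper).
df_statistic <- function(y) {
  dy <- diff(y)              # y[t] - y[t-1]
  y_lag <- y[-length(y)]     # y[t-1]
  fit <- lm(dy ~ y_lag)
  summary(fit)$coefficients["y_lag", "t value"]
}

set.seed(7)  # arbitrary seed for a reproducible synthetic series
random_walk <- cumsum(rnorm(500))  # has a unit root by construction

stat_level <- df_statistic(random_walk)        # level series
stat_diff  <- df_statistic(diff(random_walk))  # differenced series

print(stat_level)
print(stat_diff)
```

A strongly negative statistic for the differenced series, next to a much less negative one for the level series, mirrors the pattern of p-values reported for diff(log(Close)) and log(Close).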
1.4.8 MODEL IMPLEMENTATION BY THE GLM
Generalized linear models (GLMs) are general-purpose machine learning tools that can be used for regression and classification [39,40]. In classification, they act as binary classifiers. The choice between classification and regression depends on the nature of the dataset: classification is applied to factor-type data, and regression is applied to numeric data. This study uses the GLM with Gaussian regression because the available dataset is numeric. The model is developed with H2O, an advanced, memory-efficient, and fast package for R. The model follows all the preprocessing steps used for the ARIMA model. The experimental result is shown in Figure 1.7.
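A Gaussian GLM of this kind can be sketched with glm() from base R. This is a stand-in for illustration only: the chapter uses the H2O package, whereas the code below uses stats::glm, and the formula Close ~ Minute, the synthetic data, and the seed are all assumptions.

```r
set.seed(3)  # arbitrary seed for a reproducible synthetic dataset

# Synthetic training data: a slow linear drift in Close plus noise (made up).
train <- data.frame(Minute = 1:200)
train$Close <- 1350 + 0.05 * train$Minute + rnorm(200, sd = 2)

# Gaussian family with the identity link, i.e. ordinary linear regression.
fit <- glm(Close ~ Minute, family = gaussian(), data = train)

# Predict Close for minutes that were not part of the training data.
new_minutes <- data.frame(Minute = 201:205)
glm_close <- predict(fit, newdata = new_minutes)

print(coef(fit))
print(round(as.numeric(glm_close), 3))
```

With a single numeric predictor, the Gaussian GLM can only produce a straight line in Minute, which is one reason such a model struggles with the irregular movement of minute-level prices.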
FIGURE 1.7 Forecasting in the GLM.
1.4.9 COMPARATIVE ANALYSIS
The experimental results show that neither the ARIMA model nor the GLM is able to capture the nonlinear and Brownian nature of the large high-frequency dataset. Figure 1.8 and Table 1.3 show the comparative results.
TABLE 1.3 Analysis of 20-Day Data
| Index | Minute | Real Close | ARIMA Close | GLM Close |
|---|---|---|---|---|
| 1 | 5 | 1376 | 1417.51 | 1359.875 |
| 2 | 8 | 1376 | 1417.51 | 1359.875 |
| 3 | 14 | 1376 | 1417.51 | 1359.876 |
| 4 | 16 | 1376 | 1417.51 | 1359.876 |
| 5 | 17 | 1376 | 1417.51 | 1359.876 |
| 6 | 20 | 1376 | 1417.51 | 1359.877 |
| 7 | 21 | 1376 | 1417.51 | 1359.877 |
| 8 | 22 | 1376 | 1417.51 | 1359.877 |
| 9 | 24 | 1376 | 1417.51 | 1359.877 |
| 10 | 26 | 1376 | 1417.51 | 1359.877 |
| 11 | 30 | 1376 | 1417.51 | 1359.878 |
| 12 | 31 | 1376 | 1417.51 | 1359.878 |
| 13 | 32 | 1376 | 1417.51 | 1359.878 |
| 14 | 35 | 1376 | 1417.51 | 1359.879 |
| 15 | 37 | 1376 | 1417.51 | 1359.879 |
| 16 | 39 | 1376 | 1417.51 | 1359.879 |
| 17 | 44 | 1376 | 1417.51 | 1359.88 |
| 18 | 47 | 1376 | 1417.51 | 1359.88 |
| 19 | 53 | 1376 | 1417.51 | 1359.881 |
| 20 | 60 | 1376 | 1417.51 | 1359.882 |
1.4.10 COMPARATIVE PERFORMANCE ESTIMATION
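The root mean square error (RMSE) measures how far the predictions of a model lie from the observed values. A lower RMSE indicates a better model:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(r_i - p_i\right)^2}$$

where n is the total number of samples, r_i is the real sample value, and p_i is the predicted value.

RMSE of the ARIMA model: 70.59261
RMSE of the GLM: 45.23671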
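The RMSE, the square root of the mean squared difference between real and predicted values, can be computed with a short base R function. The sketch below applies it to the first three rows of Table 1.3 only, so its results are not expected to match the RMSE values reported for the full test set; the function name rmse is an illustrative choice.

```r
# RMSE: square root of the mean squared difference between real and predicted.
rmse <- function(real, predicted) {
  stopifnot(length(real) == length(predicted))
  sqrt(mean((real - predicted)^2))
}

# First three rows of Table 1.3 (a small excerpt, not the full test set).
real_close  <- c(1376, 1376, 1376)
arima_close <- c(1417.51, 1417.51, 1417.51)
glm_close   <- c(1359.875, 1359.875, 1359.876)

rmse_arima <- rmse(real_close, arima_close)
rmse_glm   <- rmse(real_close, glm_close)

print(rmse_arima)
print(rmse_glm)
```

On this excerpt, the GLM error is smaller than the ARIMA error, which points in the same direction as the full-sample comparison.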
This performance estimation shows that neither the ARIMA model nor the GLM captures the nature of high-frequency stochastic big data well. However, the GLM gives a better comparative result than the ARIMA model. Other available regression techniques may improve GLM performance further.
CONCLUSION AND FUTURE DIRECTION
This chapter applied the most straightforward ARIMA model and GLM to demonstrate the approach and to understand the stochastic behavior of the stock market. The ARIMA model is regarded as one of the best available statistical models for exploring the nonlinear and Brownian behavior of the stock market. Nevertheless, the experimental results suggest that neither model is able to capture the nature of the high-frequency stock market dataset, although the GLM performs slightly better than the ARIMA model. The techniques and approach discussed here can help students and investors build more reliable and optimized intelligent financial forecasting models. The primary use of this work is to identify the fundamental obstacles in this line of research and to provide future directions and guidelines for it. The simulated results deviated from the actual results. Therefore, this study recommends more in-depth, comparative, and ensemble analysis and simulation to build a more optimized intelligent system that predicts stock market behavior more precisely and accurately. This study also recommends shifting from statistical modeling to machine learning frameworks for more precise, automated, and timely prediction. For this purpose, future studies will need advanced nonparametric machine learning models to improve prediction results on high-frequency stochastic stock market datasets.
KEYWORDS
- high-frequency data
- stock market prediction
- machine learning
- ARIMA model
- artificial neural network
- support vector machine
REFERENCES
- 1. G. S. Atsalakis and K. P. Valavanis, “Surveying stock market forecasting techniques – Part II: Soft computing methods,” Expert Syst. Appl., vol. 36, no. 3, pp. 5932-5941, 2009.
- 2. S. Shen, H. Jiang, and T. Zhang, “Stock market forecasting using machine learning algorithms,” Dept. Elect. Eng., Stanford Univ., Stanford, CA, USA, 2012, pp. 1-5.
- 3. R. Kumar, P. Chandra, and M. Hanmandlu, “Local directional pattern (LDP) based fingerprint matching using SLFNN,” in Proc. IEEE 2nd Int. Conf. Image Inf. Process., 2013, pp. 493-498.
- 4. R. Kumar, “Fingerprint matching using rotational invariant orientation local binary pattern descriptor and machine learning techniques,” Int. J. Comput. Vis. Image Process., vol. 7, no. 4, pp. 51-67, 2017.
- 5. M. Tkáč and R. Verner, “Artificial neural networks in business: Two decades of research,” Appl. Soft Comput., vol. 38, pp. 788-804, 2016.
- 6. J.-J. Wang, J.-Z. Wang, Z.-G. Zhang, and S.-P. Guo, “Stock index forecasting based on a hybrid model,” Omega, vol. 40, no. 6, pp. 758-766, 2012.
- 7. R. Khemchandani and S. Chandra, “Twin support vector machines for pattern classification,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 5, pp. 905-910, 2007.
- 8. C.-J. Lu, T.-S. Lee, and C.-C. Chiu, “Financial time series forecasting using independent component analysis and support vector regression,” Decis. Support Syst., vol. 47, no. 2, pp. 115-125, 2009.
- 9. X. Peng, “TSVR: An efficient twin support vector machine for regression,” Neural Netw., vol. 23, no. 3, pp. 365-372, 2010.
- 10. L. Wang and J. Zhu, “Financial market forecasting using a two-step kernel learning method for the support vector regression,” Ann. Oper. Res., vol. 174, no. 1, pp. 103-120, 2010.
- 11. J. J. Wang, J. Z. Wang, Z. G. Zhang, and S. P. Guo, “Stock index forecasting based on a hybrid model,” Omega, vol. 40, no. 6, pp. 758-766, 2012.
- 12. B. U. Devi, D. Sundar, and P. Alli, “An effective time series analysis for stock trend prediction using ARIMA model for nifty midcap-50,” Int. J. Data Min. Knowl. Manag. Process, vol. 3, no. 1, pp. 65-78, 2013.
- 13. D. Enke and N. Mehdiyev, “Stock market prediction using a combination of stepwise regression analysis, differential evolution-based fuzzy clustering, and a fuzzy inference neural network,” Intell. Autom. Soft Comput., vol. 19, no. 4, pp. 636-648, 2013.
- 14. J. Patel, S. Shah, P. Thakkar, and K. Kotecha, “Predicting stock market index using fusion of machine learning techniques,” Expert Syst. Appl., vol. 42, no. 4, pp. 2162-2172, 2015.
- 15. A. F. Sheta, S. E. M. Ahmed, and H. Faris, “A comparison between regression, artificial neural networks and support vector machines for predicting stock market index,” Soft Comput., vol. 7, no. 8, pp. 55-63, 2015.
- 16. W. C. Chiang, D. Enke, T. Wu, and R. Wang, “An adaptive stock index trading decision support system,” Expert Syst. Appl., vol. 59, pp. 195-207, 2016.
- 17. M. Tkáč and R. Verner, “Artificial neural networks in business: Two decades of research,” Appl. Soft Comput., vol. 38, pp. 788-804, 2016.
- 18. L. K. Shrivastav and R. Kumar, “A novel approach towards the analysis of stochastic high frequency data analysis using ARIMA model,” Int. J. Inf. Syst. Manage. Sci., vol. 2, no. 2, pp. 326-331, 2019.
- 19. K. Chourmouziadis and P. D. Chatzoglou, “An intelligent short term stock trading fuzzy system for assisting investors in portfolio management,” Expert Syst. Appl., vol. 43, pp. 298-311, 2016.
- 20. M. Qiu, Y. Song, and F. Akagi, “Application of artificial neural network for the prediction of stock market returns: The case of the Japanese stock market,” Chaos, Solitons Fractals, vol. 85, pp. 1-7, 2016.
- 21. X. Zhong and D. Enke, “Forecasting daily stock market return using dimensionality reduction,” Expert Syst. Appl., vol. 67, pp. 126-139, 2017.
- 22. S. Barak, A. Arjmand, and S. Ortobelli, “Fusion of multiple diverse predictors in stock market,” Inf. Fusion, vol. 36, pp. 90-102, 2017.
- 23. E. Chong, C. Han, and F. C. Park, “Deep learning networks for stock market analysis and prediction: Methodology, data representations, and case studies,” Expert Syst. Appl., vol. 83, pp. 187-205, 2017.
- 24. B. Al-Hnaity and M. Abbod, “A novel hybrid ensemble model to predict FTSE100 index by combining neural network and EEMD,” in Proc. Eur. Control Conf., 2015, pp. 3021-3028.
- 25. R. Aguilar-Rivera, M. Valenzuela-Rendón, and J. J. Rodríguez-Ortiz, “Genetic algorithms and Darwinian approaches in financial applications: A survey,” Expert Syst. Appl., vol. 42, no. 21, pp. 7684-7697, 2015.
- 26. E. Alpaydin, Introduction to Machine Learning, 2nd ed. Cambridge, MA, USA: MIT Press, 2010.
- 27. E. Guresen, G. Kayakutlu, and T. U. Daim, “Using artificial neural network models in stock market index prediction,” Expert Syst. Appl., vol. 38, no. 8, pp. 10389-10397, 2011.
- 28. A. A. Adebiyi, A. O. Adewumi, and C. K. Ayo, “Comparison of ARIMA and artificial neural networks models for stock price prediction,” J. Appl. Math., vol. 2014, 2014, Art. no. 614342.
- 29. A. Kazem, E. Sharifi, F. K. Hussain, M. Saberi, and O. K. Hussain, “Support vector regression with chaos-based firefly algorithm for stock market price forecasting,” Appl. Soft Comput., vol. 13, no. 2, pp. 947-958, 2013.
- 30. B. M. Henrique, V. A. Sobreiro, and H. Kimura, “Stock price prediction using support vector regression on daily and up to the minute prices,” J. Finance Data Sci., vol. 4, no. 3, pp. 183-201, 2018.
- 31. T. Howley and M. G. Madden, “An evolutionary approach to automatic kernel construction,” in Proc. Int. Conf. Artif. Neural Netw., 2006, pp. 417-426.
- 32. T. Hofmann, B. Schölkopf, and A. J. Smola, “A review of kernel methods in machine learning,” Max Planck Inst., Nijmegen, Germany, Tech. Rep. 156, 2006.
- 33. N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge, UK: Cambridge Univ. Press, 2000.
- 34. J.-J. Wang, J.-Z. Wang, Z.-G. Zhang, and S.-P. Guo, “Stock index forecasting based on a hybrid model,” Omega, vol. 40, no. 6, pp. 758-766, 2012.
- 35. M. Ouahilal, M. E. Mohajir, M. Chahhou, and B. E. E. Mohajir, “A novel hybrid model based on Hodrick-Prescott filter and support vector regression algorithm for optimizing stock market price prediction,” J. Big Data, vol. 4, no. 1, 2017, Art. no. 31.
- 36. R. Khemchandani, P. Saigal, and S. Chandra, “Improvements on v-twin support vector machine,” Neural Netw., vol. 79, pp. 97-107, 2016.
- 37. R. C. Cavalcante, R. C. Brasileiro, V. L. F. Souza, J. P. Nobrega, and A. L. I. Oliveira, “Computational intelligence and financial markets: A survey and future directions,” Expert Syst. Appl., vol. 55, pp. 194-211, 2016.
- 38. R. Adhikari and R. K. Agrawal, “An introductory study on time series modeling and forecasting,” 2013, arXiv:1302.6613.
- 39. [Online]. Available: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/glm.html
- 40. A. Elliot and C. H. Hsu, “Time series prediction: Predicting stock price,” 2017, arXiv:1710.05751.