Oil price forecasting: WTI crude oil
In this chapter, the forecasting power of the candlestick is further scrutinized on the crude oil price. The econometric method used in this chapter is also the DVAR model.
11.1 Introduction
Crude oil, known as the blood of industries, plays an important role in world economies. As one of the main focal points in the world, the oil price has become an increasingly essential topic of concern to governments, enterprises and investors. For example, Hamilton (2011) noted in Macroeconomic Dynamics that, at that time, 10 out of the 11 postwar U.S. recessions had been preceded by a sharp increase in the price of crude petroleum. Improving the forecasting accuracy of the oil price is therefore of great interest to both researchers and practitioners: central banks and private sector forecasters take the oil price as one of the key variables in assessing macroeconomic risks. Thus, a more accurate forecast of the oil price is of great importance to both policymakers and investors.
Influenced mainly by fundamental supply and demand shocks, the crude oil price is also affected by many other factors such as environmental disasters, political events, speculation, and so on. All these factors contribute to the huge volatility of the oil price and make oil price forecasting a difficult and challenging job.
Despite the difficulty, a voluminous literature has been devoted to forecasting and analyzing the crude oil price. Generally, these approaches can be classified into two categories: structural models and data-driven methods. Structural models attempt to analyze and forecast the oil price in terms of a supply-demand equilibrium schedule (e.g. Bacon, 1991; Huntington, 1994; Ye et al., 2004; Yang et al., 2002; He et al., 2010). In contrast, data-driven models try to capture the real data generating process (DGP) from historical price information using either statistical analysis or artificial intelligence. Data-driven approaches include linear models such as Autoregressive Moving Average (ARMA) and Autoregressive Conditional Heteroscedasticity (ARCH) type models (e.g. Sadorsky, 2002; Morana, 2001), and nonlinear models such as Artificial Neural Networks (e.g. Mirmirani and Li, 2004; Moshiri, 2004; Nelson et al., 1994), the pattern matching approach (Fan et al., 2008), the soft computing approach (Ghaffari and Zare, 2009), and wavelet decomposition with neural network modeling (Jammazi and Aloui, 2012). Recent academic research shows a rising interest in using interval data to forecast the oil price, including Sun et al. (2018), Yang et al. (2012), and Qiao et al. (2019). Other references on oil price forecasting and analysis include Abosedra and Baghestani (2004), Stevens (1995), Pindyck (1999), Chaudhuri (2001), Zhang et al. (2008), Yu et al. (2008), Huang et al. (2009), Kang and Yoon (2013), and Yang et al. (2013, 2016).
Unlike the existing methods, this chapter scrutinizes the predictability of the crude oil price with the DVAR model. We compare the DVAR modeling technique with the efficient market model and the classic ARMA model. Previewing the results, we obtain the following findings.
First, both in-sample and out-of-sample forecasts demonstrate that the DVAR model reports statistically significant and informative forecasts. Different evaluation methods are used to check the robustness of the forecasts, and the results confirm that the crude oil price is predictable.
Second, empirical results indicate that the DVAR model performs better when the oil price is in recession, which is consistent with the findings of Li et al. (2013) for the oil futures price and with the information friction theory of Hong et al. (2000, 2007). This finding is important as it indicates that the oil market is not as efficient as it is commonly believed to be, especially when in recession, and that the DVAR model is more applicable in recession.
Finally, out-of-sample forecasting results demonstrate the dominance of the decomposition-based VAR model over the ARMA model, indicating that the candlestick chart is informative in forecasting the crude oil price.
This chapter is organized as follows. Section 2 presents the DVAR model along with some discussions. Section 3 empirically investigates the performance of the DVAR model using the monthly WTI spot oil price data series and compares the results with the efficient market model and with the ARMA model. Section 4 summarizes.
11.2 Econometric method
In this section, we present a brief introduction to the DVAR model and the forecast evaluation criteria.
11.2.1 DVAR model
The decomposition-based VAR (DVAR) model will be used to investigate the predictability of the crude oil price. A DVAR model of order p can be presented as follows:
where Y_{t} = (AR_{t}, AW_{t})^{T} and X_{t} is a vector of exogenous variables. The exogenous variables used in this chapter are the upper shadow and the lower shadow. The forecasts of the return on crude oil are constructed through the following equation
where \hat{r}_{t} is the return forecast, and \hat{AR}_{t} and \hat{AW}_{t} are respectively the forecasts of AR_{t} and AW_{t} reported by the DVAR model.
11.2.2 Forecast evaluation
The commodity market is believed to be efficient and to follow a random walk. To see if the DVAR model beats the simple random walk model in terms of out-of-sample forecasts, the first forecast evaluation criterion used is the out-of-sample R-square, R_{OS}^{2} (Campbell and Thompson, 2008):
where \hat{r}_{t} is the return forecast, and \bar{r}_{t} is the historical mean forecast.
If the DVAR model reports better forecasts, R_{OS}^{2} will be positive, which implies a lower mean-squared forecast error (hereafter MSFE) relative to the forecast based on the historical average return.
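As a minimal sketch, the criterion can be computed under the standard Campbell and Thompson (2008) definition, R_{OS}^{2} = 1 - \sum (r_{t} - \hat{r}_{t})^{2} / \sum (r_{t} - \bar{r}_{t})^{2}, with both sums taken over the out-of-sample period. The function name and the toy numbers below are illustrative assumptions, not the chapter's data.

```python
def out_of_sample_r2(actual, forecast, benchmark):
    """Out-of-sample R-square in the Campbell-Thompson form.

    actual    : realized returns r_t
    forecast  : model forecasts (here: the DVAR forecasts)
    benchmark : historical-mean forecasts
    """
    if not (len(actual) == len(forecast) == len(benchmark)):
        raise ValueError("all inputs must have the same length")
    sse_model = sum((r - f) ** 2 for r, f in zip(actual, forecast))
    sse_bench = sum((r - b) ** 2 for r, b in zip(actual, benchmark))
    return 1.0 - sse_model / sse_bench


# Toy numbers, purely illustrative.
actual = [0.02, -0.01, 0.03, -0.04]
forecast = [0.01, 0.00, 0.02, -0.02]
benchmark = [0.005, 0.005, 0.005, 0.005]
r2_os = out_of_sample_r2(actual, forecast, benchmark)
print(round(r2_os, 4))
```

A positive value means the model's squared forecast errors sum to less than those of the historical mean; a model that simply reproduces the benchmark gives exactly zero.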
The null hypothesis of interest is therefore R_{OS}^{2} \le 0 against the alternative hypothesis that R_{OS}^{2} > 0. We test this hypothesis by using the Clark and West (2007) MSFE-adjusted statistic. Define
then the Clark and West (2007) MSFE-adjusted statistic is the t-statistic from the regression of f_{t} on a constant.
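The statistic can be sketched as follows, assuming the usual Clark and West (2007) adjusted term f_{t} = (r_{t} - \bar{r}_{t})^{2} - [(r_{t} - \hat{r}_{t})^{2} - (\bar{r}_{t} - \hat{r}_{t})^{2}]. The t-statistic here uses the plain OLS standard error of a regression on a constant; whether the chapter uses a robust standard error is not stated, so that choice is an assumption. Function names and numbers are illustrative.

```python
import math
import statistics


def clark_west_terms(actual, forecast, benchmark):
    """Adjusted loss differentials f_t (benchmark loss minus adjusted model loss)."""
    return [
        (r - b) ** 2 - ((r - m) ** 2 - (b - m) ** 2)
        for r, m, b in zip(actual, forecast, benchmark)
    ]


def clark_west_tstat(actual, forecast, benchmark):
    """t-statistic of the mean of f_t, i.e. of a regression of f_t on a constant."""
    f = clark_west_terms(actual, forecast, benchmark)
    spread = statistics.stdev(f)  # sample standard deviation (n - 1 divisor)
    if spread == 0:
        raise ValueError("f_t has zero variance; the t-statistic is undefined")
    return statistics.fmean(f) / (spread / math.sqrt(len(f)))


# Toy numbers, purely illustrative.
actual = [0.02, -0.01, 0.03, -0.04]
forecast = [0.01, 0.00, 0.02, -0.02]
benchmark = [0.005, 0.005, 0.005, 0.005]
cw_terms = clark_west_terms(actual, forecast, benchmark)
cw_t = clark_west_tstat(actual, forecast, benchmark)
```

The test is one-sided: a sufficiently large positive t-statistic rejects the null that the model does no better than the historical mean.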
Following Sadorsky (2002), we also test the market timing ability of the DVAR model in several ways:
where I is an indicator function, which is equal to 1 if its argument is true and 0 otherwise. The BGJ test (Breen et al., 1989) is asymptotically equivalent to a one-tailed test on the significance of the slope coefficient, a_{1}.
The CM test (Cumby and Modest, 1987) extends the BGJ test to include not just market timing, but also the magnitude.
The BH test (Bossaerts and Hillion, 1999) investigates if the forecasts capture any valuable information contained in the real values. If a_{1} is statistically significant and nonzero, the forecasts are said to be informative. If a_{0} = 0 and a_{1} = 1, \hat{r}_{t} is said to be an unbiased forecast of r_{t}.
In both the CM and BH tests, the null hypothesis is that the slope coefficient is equal to zero, and the alternative hypothesis is a one-sided alternative that the slope coefficient is positive.
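The three tests can be sketched as simple regressions. The specifications assumed here are the commonly cited ones: BGJ regresses I(r_{t} > 0) on I(\hat{r}_{t} > 0), CM regresses r_{t} on I(\hat{r}_{t} > 0), and BH regresses r_{t} on \hat{r}_{t}. These forms, the helper names, and the toy data are assumptions for illustration.

```python
import math


def ols_slope(y, x):
    """Return (a0, a1, t-statistic of a1) for y = a0 + a1 * x + e."""
    n = len(y)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a1 = sxy / sxx
    a0 = my - a1 * mx
    rss = sum((yi - a0 - a1 * xi) ** 2 for xi, yi in zip(x, y))
    se_a1 = math.sqrt(rss / (n - 2) / sxx)
    return a0, a1, a1 / se_a1


def market_timing_tests(actual, forecast):
    """BGJ, CM and BH regressions under the assumed specifications."""
    up_actual = [1.0 if r > 0 else 0.0 for r in actual]
    up_forecast = [1.0 if f > 0 else 0.0 for f in forecast]
    return {
        "BGJ": ols_slope(up_actual, up_forecast),  # direction on direction
        "CM": ols_slope(actual, up_forecast),      # return on forecast direction
        "BH": ols_slope(actual, forecast),         # return on forecast level
    }


# Toy numbers, purely illustrative.
actual = [0.02, -0.01, 0.03, -0.04, 0.01, -0.02]
forecast = [0.01, -0.005, 0.02, -0.02, -0.01, 0.005]
results = market_timing_tests(actual, forecast)
```

In each case the quantity of interest is the slope a_{1} and its one-sided t-test; with a binary regressor, the CM slope is simply the difference between the mean return after "up" forecasts and after "down" forecasts.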
In time series forecasting, the ARMA model is among the most commonly used econometric tools. To see if high-low price information helps to improve the forecasting accuracy, we compare the performance of the DVAR model with that of the ARMA model, reporting the root mean square error (RMSE) and the mean absolute error (MAE), i.e.,
where \hat{r}_{t}(m) is the forecast reported by model m.
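Under their standard definitions, the two error measures are RMSE = sqrt(mean((r_{t} - \hat{r}_{t}(m))^{2})) and MAE = mean(|r_{t} - \hat{r}_{t}(m)|), averaged over the out-of-sample period. A minimal sketch, with illustrative function names and toy numbers:

```python
import math


def rmse(actual, forecast):
    """Root mean square error of a forecast series."""
    return math.sqrt(
        sum((r - f) ** 2 for r, f in zip(actual, forecast)) / len(actual)
    )


def mae(actual, forecast):
    """Mean absolute error of a forecast series."""
    return sum(abs(r - f) for r, f in zip(actual, forecast)) / len(actual)


# Toy numbers, purely illustrative.
actual = [0.02, -0.01, 0.03, -0.04]
forecast = [0.01, 0.00, 0.02, -0.02]
print(round(rmse(actual, forecast), 6), round(mae(actual, forecast), 6))
```

Because RMSE squares the errors before averaging, it penalizes occasional large misses more heavily than MAE, which is why both are usually reported together.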
11.3 Empirical results
This section investigates the empirical performance of the DVAR model and compares its forecasting accuracy with that of the random walk model and the ARMA model.
11.3.1 The data
The monthly spot price data used in this analysis is the U.S. Cushing, OK WTI Spot Price (dollars per barrel). The data sample covers 1986.01-2013.01, a total of 325 observations. Figure 11.1 presents the time series for the WTI spot oil price. The data was downloaded from the EIA website.
Since the original data set provides no high and low prices, we constructed the high and low prices from the closing price as follows:
Figure 11.1 Time series of monthly WTI crude oil price over 1986.01-2013.01
Table 11.1 Summary statistics of AR_{t}, AW_{t}, and crude oil return: 1986.01-2013.01
               AR_{t}        AW_{t}        r_{t}
Mean           0.006         0.000         0.006
Median         0.000         0.004         0.012
Maximum        0.148         0.297         0.392
Minimum        -0.174        -0.255        -0.332
Std. Dev.      0.037         0.073         0.082
Skewness       0.480         0.037         0.249
Kurtosis       7.890         4.716         5.593
Jarque-Bera    323.848***    38.463***     90.893***
We use ***, **, and * to denote significance at the 1%, 5%, and 10% levels, respectively.
To be specific, the high price is taken to be the maximum price of the 12 consecutive monthly observations, and the low price is taken to be the minimum price of the 12 consecutive observations.
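The construction can be sketched as a rolling maximum and minimum over the closing prices. The text does not say how the 12-month window is aligned, so the trailing window ending at the current month used here is an assumption, as are the function name and the toy prices.

```python
def rolling_high_low(close, window=12):
    """Rolling high and low from closing prices.

    Assumes a trailing window: the value for month t uses months
    t - window + 1 through t. The first window - 1 months have no value.
    """
    highs, lows = [], []
    for end in range(window, len(close) + 1):
        chunk = close[end - window:end]
        highs.append(max(chunk))
        lows.append(min(chunk))
    return highs, lows


# Short toy series with a 3-month window so the result is easy to check by eye.
toy_close = [3, 1, 4, 1, 5, 9, 2, 6]
toy_high, toy_low = rolling_high_low(toy_close, window=3)
print(toy_high)
print(toy_low)
```

Under this alignment, 325 monthly observations starting in 1986.01 yield 314 high and low values starting in 1986.12, which agrees with the start of the estimation period used in the out-of-sample exercise.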
From the high and low prices, AR_{t} and AW_{t} can be constructed. Table 11.1 reports the summary statistics of AR_{t}, AW_{t}, and r_{t}. The summary statistics indicate great price volatility risk in the crude oil market: within one month, the oil price can rise by as much as 39% and fall by as much as 33%, both of which are more than four times the standard deviation. Consistent with well-documented facts, the oil price changes also exhibit skewness and high kurtosis. The Jarque-Bera statistic rejects normality of AR_{t}, AW_{t}, and r_{t} at a significance level of 1%.
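As a quick consistency check, the Jarque-Bera statistics in Table 11.1 can be approximately reproduced from the reported skewness and kurtosis using the standard formula JB = (n / 6) * (S^{2} + (K - 3)^{2} / 4). The sample size n = 313 is an assumption: it is the value that fits the reported statistics, and it is what remains of 325 observations after losing 11 to a 12-month window and one more to differencing. Small gaps relative to the table come from the rounding of S and K.

```python
def jarque_bera(n, skewness, kurtosis):
    """Jarque-Bera statistic from sample size, skewness and (raw) kurtosis."""
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)


N_OBS = 313  # assumed effective sample size, not stated in the text

# (skewness, kurtosis) as rounded in Table 11.1; the sign of the skewness
# does not matter because it enters squared.
table_moments = {
    "AR_t": (0.480, 7.890),
    "AW_t": (0.037, 4.716),
    "r_t": (0.249, 5.593),
}
jb_values = {
    name: jarque_bera(N_OBS, s, k) for name, (s, k) in table_moments.items()
}
for name, value in jb_values.items():
    print(name, round(value, 2))
```

Under the null of normality the statistic is asymptotically chi-squared with 2 degrees of freedom, whose 1% critical value is about 9.21, so all three series reject normality by a wide margin.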
11.3.2 In-sample model estimation
The in-sample forecast covers the whole sample. Lag order selection is of great importance when estimating a VAR model. As is usual, complex models underperform simple ones when delivering an out-of-sample forecast. Therefore, we use the SIC (Schwarz Information Criterion) as the order selection criterion. Table 11.2 reports the estimates of the DVAR model.
To quantitatively see how well the DVAR model explains the real observations, we perform a linear regression test. The result is presented as follows
where \hat{r}_{t} are the forecasts. The result indicates good in-sample forecasts. Regression analysis shows that the in-sample forecasts of the DVAR model are unbiased: the slope coefficient is equal to 1 and is statistically significant, while the constant is almost 0 and is insignificantly different from 0. The R-square indicates that about 8% of the variance can be explained by the DVAR model.
Table 11.2 Estimates of the DVAR model: WTI oil price
               AR_{t}                      AW_{t}
               Coef.      t-Statistic      Coef.      t-Statistic
c              0.007      2.677            0.005      0.803
a              0.466      8.039.657        0.273      2.536
p              0.119      4.265            0.223      3.578
ls_{t-1}       0.060      3.341            0.045      1.059
us_{t-1}       0.009      0.468            0.034      0.792
R-square       0.300                       0.084
11.3.3 Out-of-sample performance
For the out-of-sample forecast, the total sample of T observations is divided into two portions: the in-sample portion composed of the first M observations and the out-of-sample portion composed of the last T - M observations. We use the static forecasting procedure to produce the out-of-sample forecast. To be specific, the first M observations are used to estimate the parameters in the DVAR model. Keeping these estimated parameters fixed, the out-of-sample forecasts are then produced. The static forecast is employed to check the robustness of the DVAR model. It is commonly accepted that the out-of-sample forecast would be poor if there is any instability in the model structure. Thus, the static forecast offers a nice tool for checking the robustness of the model structure.
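The static procedure can be sketched as follows. For brevity the sketch swaps in a univariate AR(1) fitted by OLS in place of the DVAR model; that substitution, the use of realized lagged values for each one-step-ahead forecast, and all names are assumptions made for illustration only.

```python
def fit_ar1(series):
    """OLS estimates (c, phi) of y_t = c + phi * y_{t-1} + e_t."""
    x = series[:-1]
    y = series[1:]
    n = len(y)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    phi = sxy / sxx
    return my - phi * mx, phi


def static_forecasts(series, m):
    """One-step-ahead forecasts for observations m, ..., T-1.

    Parameters are estimated once on the first m observations and then
    held fixed; they are never re-estimated as new data arrive.
    """
    c, phi = fit_ar1(series[:m])
    return [c + phi * series[t - 1] for t in range(m, len(series))]


# Deterministic toy series y_t = 1 + 0.5 * y_{t-1}, so the fit is exact.
toy = [0.0]
for _ in range(9):
    toy.append(1.0 + 0.5 * toy[-1])
toy_forecasts = static_forecasts(toy, m=6)
```

Because the coefficients are frozen at their in-sample values, any structural change after observation M shows up directly as deteriorating forecasts, which is what makes this scheme a robustness check.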
For this example, the time period 1986.12-2000.12 is used to estimate the parameters, and the time period 2001.01-2013.01 is used for the out-of-sample forecast. The division is typical. For one thing, the cutoff point is almost the middle point of the whole sample; for another, the portion used for the out-of-sample forecast covers both expansion and recession in the crude oil market.
The BGJ, CM, and BH tests are reported as follows:
Contrary to Sadorsky (2002), the BGJ test result indicates that there is direction forecasting ability for monthly returns. Also in contrast to Sadorsky (2002), the CM and BH tests give positive results, which confirms that the DVAR model does report significant out-of-sample forecasts.
To see in what conditions DVAR performs better, recession or expansion, we present the following regression analysis
The test result shows that when the oil price is in expansion (r_{t} > 0), the slope coefficient is positive but not significant, while the slope is significant at a level of 1% when the oil price is in recession (r_{t} < 0). This pattern is consistent with the information friction theory of Hong et al. (2000, 2007), who claim that "bad news travels more slowly", and with the slow information diffusion effect detected by Li et al. (2013) in the oil futures price. This pattern has also been widely observed in the stock markets, for example by Rapach et al. (2010), Henkel et al. (2011), and Dangl and Halling (2012).
To see if the DVAR model significantly outperforms the simple historical mean, we calculate the t-value of the MSFE-adjusted statistic. The t-value is reported to be 2.18, statistically significant at a level of 5%, which confirms that the DVAR model beats the simple historical average. The crude oil market is thus not informationally efficient.
Another interesting question is whether or not the DVAR model outperforms the classic ARMA(p, q) model for out-of-sample forecasting. Following the same rule, we select the order (p, q) based on SIC. The ARMA model selected by the SIC is given by
To compare the relative performance of DVAR and ARMA, the MAE and RMSE are computed. The results are presented in Table 11.3. The DVAR model outperforms the ARMA model in terms of both MAE and RMSE.
Table 11.3 Out-of-sample prediction error comparison: DVAR vs. ARMA
Model     MAE       RMSE
ARMA      0.072     0.091
DVAR      0.066     0.084
Figure 11.2 Out-of-sample forecasting comparison, ARMA vs. DVAR over 2001.01-2013.01
We also run an encompassing regression, and the results are presented as follows
The result shows clear dominance of DVAR over ARMA: once the forecasts of DVAR are included, the forecasts given by ARMA become insignificant. Figure 11.2 presents the cumulative squared forecast error in the left panel and the difference between the cumulative squared forecast error for the ARMA model and the cumulative squared forecast error for the DVAR model in the right panel. We use the legend "ARMA_DVAR" to denote the difference.
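The two quantities plotted in Figure 11.2 can be sketched as running sums of squared forecast errors. The function names and toy numbers below are illustrative assumptions and do not reproduce the chapter's series.

```python
from itertools import accumulate


def cumulative_squared_errors(actual, forecast):
    """Running sum of squared forecast errors."""
    return list(accumulate((r - f) ** 2 for r, f in zip(actual, forecast)))


def cumulative_difference(actual, forecast_arma, forecast_dvar):
    """ARMA cumulative squared error minus DVAR cumulative squared error."""
    cse_arma = cumulative_squared_errors(actual, forecast_arma)
    cse_dvar = cumulative_squared_errors(actual, forecast_dvar)
    return [a - d for a, d in zip(cse_arma, cse_dvar)]


# Toy numbers, purely illustrative.
actual = [0.02, -0.01, 0.03]
arma_forecast = [0.0, 0.0, 0.0]
dvar_forecast = [0.01, -0.01, 0.02]
diff = cumulative_difference(actual, arma_forecast, dvar_forecast)
print([round(d, 6) for d in diff])
```

A difference curve that trends upward indicates that DVAR accumulates squared error more slowly than ARMA over that stretch of the sample; flat or falling stretches mark periods where ARMA does as well or better.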
11.4 Summary
This chapter further scrutinizes the forecasting power of the candlestick for the crude oil price. An empirical study was performed on the WTI crude oil price, and the result confirms that the DVAR model outperforms not only the historical mean model but also the classic ARMA model. This finding indicates that candlestick charts carry additional valuable information for oil price forecasting. In summary, candlestick forecasting is informative.