A Hybrid Autoregressive Integrated Moving Average-phGMDH Model to Forecast Crude Oil Price

Crude oil price fluctuations affect almost every individual and activity on the planet. Forecasting the crude oil price is therefore an important concern especially in economic policy and financial circles as it enables stakeholders estimate crude oil price at a point in time. Autoregressive integrated moving average (ARIMA) has been an effective tool that has been used widely to model time series. Its limitation is the fact that it cannot model nonlinear systems sufficiently. This paper assesses the ability to build a robust forecasting model for the world crude oil price, Brent on the international market using a hybrid of two methods ARIMA and polynomial harmonic group method of data handling. ARIMA methodology is used to model the time series component with constant variance whilst the polynomial harmonic group method of data handling is used to model the harmonic ARIMA model residuals.


INTRODUCTION
Crude oil products fuel most vehicles in air, on water, and on land all over the world. Fuel derived from Crude oil, such as petrol, kerosene, diesel, heating oil, and so on supply 33% of all the energy consumed by households, businesses, and manufacturers in the world, (BP, 2018). Changes in crude oil price is therefore of significant interest to decision makers especially finance practitioners and commodity market participants. Unfortunately, crude oil price is the most complex and difficult to model because the changes are frequent, nonlinear, irregular and nonstationary. Thus, accurate forecasting of the crude oil price time series is one of the greatest challenges and among the most important issues facing energy researchers and economists towards better decisions at several managerial levels. As a result, achieving reliable and highly accurate forecasting models to answer the uncertainties and complexities of crude oil price is necessary and important to policy makers.

LITERATURE REVIEW
Several works have been done on forecasting of crude oil price in recent times. Existing literature review have characterized forecasting techniques into two main groups; quantitative and qualitative methods (Behmiri and Manso, 2013).
The time series of monthly cigarette sales in China have had double trends which include long-term upward trend and seasonal fluctuations trend. The complex time series cannot allow single linear or nonlinear forecasting model capture features of the data, so the results are inaccurate. Autoregressive integrated moving average (ARIMA) and GMDH models were combined to take advantage of their unique strengths in linear and nonlinear modeling respectively. Comparing forecast result with actual data sets confirmed the proposed hybrid model could be an effective way to improve forecasting accuracy This Journal is licensed under a Creative Commons Attribution 4.0 International License achieved by using either of the models separately (Aiyun et al., 2010).
ARIMA modelling and GMDH neural network, which are quantitative methods were individually used to do a short-term forecast from February 2015 to April 2015 of the prices of four crude oil extracts; Diesel, Kerosene, Petrol, and Liquified Petroleum Gas in India. Forecasted results were compared with the actual prices for the above period. The forecasting accuracy for all the four petroleum products showed promising results which justified the ARIMA and GMDH models forecasting the price of the different petroleum products in India (Khan et al., 2015).
The econometric model ARIMA, a linear model has been used widely for forecasting crude oil price. The ARIMA model is not able to forecast the crude oil price accurately which has non-linear characteristics; the ARIMA model cannot capture all the dynamic properties of the crude oil price hence the ARIMA residual. (Wang et al., 2005;Yu et al., 2008) This paper addressed the problem by the modelling the time series of crude oil using the hybrid ARIMA-phGMDH methodology as shown in the flow chart in Figure 1.
The ARIMA residual which was produced as part of the ARIMA model of crude oil price have been observed to have harmonic properties 1 , an example is shown in Figure 2, Residual for ARIMA (0, 2, 2).
phGMDH has the capacity to model harmonic data into a polynomial. (Ivakhnenko and Ivakhnenko, 1995;Nikolaev and Iba, 2003;Onwubolu, 2011;2014b). Hence the proposed hybrid method is designed to improve the accuracy of ARIMA crude oil price forecasting model. This paper focused on the modelling of a hybrid model and is organized into the introduction, literature review, data, methodology, results and discussion and conclusion.

RESEARCH DATA
Daily spot Brent crude oil prices from 2003 to 2017 was used in this paper, (U.S Department of Energy, 2019). On some days, data were not issued due to unknown reasons. These missing data points were filled by the interpolation method (Burden and Faires, 1997a;Moler, 2015a).

RESEARCH METHODS
This research paper applied two methodologies in building the hybrid forecasting model; the ARIMA and the phGMDH methods. The motivation for the adaption for the phGMDH is the fact that it can model harmonic time series (Nikolaev and Iba, 2003). The ARIMA methodology on the other hand can model the crude oil price and have harmonic residuals. The hybrid ARIMA-phGMDH methodology is shown in Figure 1.

The ARIMA Modelling
The ARIMA modeling or Box-Jenkins methodology includes three iterative steps: 1. Model Identification 2. Parameter estimation 3. Diagnostic checking The ARIMA model of order ARIMA (p, d, q) is one of the time series forecasting methods for the non-stationary data series. The ARIMA (p, d, q) can be expressed as in Eq. (1) (BowErman et al., 2005).

phGMDH Modeling on ARIMA Residual
The phGMDH is a variant of the multilayer GMDH network algorithm. It constructs hierarchical layers of bivariate Refer to (Axler et al., 2013) activation polynomials, Eq. (3), in the nodes and variables in the leaves. One or more of the activation polynomials terms is/ are harmonic terms of the form as shown in Eq. (2), (Nikolaev and Iba, 2003).
The phGMDH modelling process uses simple low-order bivariate activation polynomials, of form Eq. (3). Here the nodes are the hidden units, the leaves are inputs and activation polynomial coefficients are weights. The outcome of the activation polynomial, feeds forward to the parent nodes, creating new partial polynomial models. The algorithm thus churns out high-order multivariate polynomials, Eq. (5), by bringing together simple and tractable activation polynomials allocated in the hidden nodes of the network. For a given data series, Eq. (4), of vectors for all Y t and x t that belongs to real number, where α t are term coefficients 1 , b, the number of x variables, the objective of the phGMDH algorithm, F t x ( , ) , is to grow a higher order polynomial equation as shown in Eq. (5). Figure 3 demonstrates the phGMDH modelling process.

D
x y The phGMDH modelling is a two-step process; 1) Network initialization 2) Network construction and weight training (Nikolaev and Iba, 2003).

The Hybrid ARIMA phGMDH Modelling
ARIMA models the linear L t capabilities in the data y t whilst phGMDH approximates the non-linear N t , characteristics of the data using the residual of ARIMA model as input, Thus, y t , the hybrid approximation of the model is achieved by adding the linear and the nonlinear components as shown in Eq. (6), (Zhang, 2003

RESULTS AND DISCUSSION
The structure of the results flows as shown in Figure 1: ARIMA-phGMDH Methodology flow chart.

Model identification
The model is identified when the time series is made stationary. Stationary time series is achieved by differencing the time series.
The series is made stationary on the second differencing. Figure 4, sample autocorrelation (SAC) and sample partial autocorrelation (SPAC) of ARIMA for second differencing of Brent crude oil price confirms the stationarity. The SAC cut off at lag 2 and the sample SPAC dies down slowly. Hence the model was identified at ARIMA (0, 2, 2)
Substituting the parameters from the output of the econometric modeler, using the Brent crude oil price series, produced the tentative Brent ARIMA (0, 2, 2) model, Eq. (8). .
(8) Figure 2 shows the ARIMA residual of Brent series.

Diagnostic checking
Two tests were performed, the residual SAC (RSAC) and residual SPAC (RSPAC) tests, and the Ljung-Box Q-test, (Ljung and Box, 1978). The RSAC and RSPAC in Figure 5 did not show autocorrelation hence the model was exhaustive (Bowerman, et al., 1993). On the other hand, the Ljung-Box Q-test indicated an inadequate model by rejecting the null hypothesis that is the first m autocorrelations of the residuals of ARIMA (0, 2, 2) are jointly zero. The ARIMA (0, 2, 2) was inadequate, hence the residual, Figure 2 was remodeled. It served as input for the phGMDH modelling.
Step 1.1 Data organisation The input data was the ARIMA residual series r t as shown in Figure 2, with 3768 data points. It was organized into a 628 × 6 matrix of form Eq. (4) such that x x x x x x t = ( , , , ) , 1 2 3 4 5 , with dimension of the independent variables, 628 × 5 and y t the dependent variable. Figure 6 shows the assigned residual input data for the phGMDH. The assignment of the residual r t to x t and y t was done using Eq. (9).
The resultant non-multiple frequency function is a 208 th degree polynomial. Newton Raphson's method (Burden and Faires, 1997b) or fzero function in MATLAB (Moler, 2015b) is applied to solve for w i . For, 1 i h   , cos( ) 1 i w  and w i ∈ shows two w i , non-multiple harmonic frequencies, that are identified to be real numbers within the interval [0, π]. The amplitudes A i , B i , resultant amplitude, C i and phase angles ϕ i , for each angular frequency w i is computed (Nikolaev and Iba, 2003). For each of the independent vector variables x t the intensity plot function is produced using Eq. (12).
Dominant harmonics in each independent variable vector, x t is identified by selecting harmonics with the greatest intensity I(w i ), as shown in Table 1. Harmonics with the highest intensities H 1 and H 3 are chosen to replace x 1 and x 3 respectively to form Eq.
(13) the input for the GMDH as shown in Figure 7. Step 1.3 Initializing of parameters Now there are 10 combinations of the input variables for the first layer of the GMDH process. Network parameters width, κ, layer l, lowest error for convergence, ε and activation polynomial are initialized.

Step 2: Network construction and weight training
The input matrix in Figure 7: GMDH input, is fed to the GMDH algorithm in MATLAB (Mohammed, 2014;Onwubolu, 2014a,). The output is the polynomial, Eq. (14). From Eq. (17) is the approximation for the residual phGMDH model, ȓ t .

The Hybrid ARIMA phGMDH
The computation at this stage have been made easier as the ARIMA (0, 2, 2) and phGMDH estimate are at a differencing order of 2. Per definitions of Eq. (6), Eq. (8)

CONCLUSION
Organisations all over the world have a major duty of care to optimise the use of resources at their disposal to assure their going concern. One major resource that directly or indirectly affect profitability and sustainability of organisations is energy. Crude oil provides at least 33% of all energy needs of most organizations. Forecasting crude oil price is a proactive measure in the process of hedging against crude oil price risk, a major influence on most organisations' sustainability.
This study is an attempt to close the gap in forecasting the crude oil price using ARIMA methodology by building a hybrid ARIMA-phGMDH model to forecast Brent crude oil price using historical data. This paper demonstrates that it is feasible to build a hybrid forecasting model using the ARIMA and phGMDH models. The developed model for Brent crude oil as shown in Eq. (20) can be simulated to forecast future crude oil price for short term to medium term periods. This concept can be extended to the other crude oil markets in the oil and gas industry. The data can be updated to accommodate change in current data. Literature have confirmed an above 95% efficacy of the individual model components, the ARIMA and phGMDH models. Future work will concentrate on evaluating the ARIMA-phGMDH model to assess its forecasting efficacy.