Selection of Heteroscedastic Models: A Time Series Forecasting Approach


1. Introduction

Model selection is the act of choosing a model from a class of candidate models as a quest for a true model, a best forecasting model, or both (see also [1], [2], [3]). There are often several competing models that can be used for forecasting a particular time series. Consequently, selecting an appropriate forecasting model is of considerable practical importance [4] [5]. Selecting the model that provides the best fit to historical data generally does not result in a forecasting method that produces the best forecasts of new data. Concentrating too much on the model that produces the best historical fit often leads to overfitting, that is, including too many parameters or terms. The better approach is to select the model that yields the smallest standard deviation or mean squared error of the one-step-ahead forecast errors when the model is applied to a data set that was not used in the fitting process [4]. There are two approaches to model selection in time series: in-sample model selection and out-of-sample model selection. In-sample model selection is targeted at selecting a model for inference, which according to [1] is intended to identify the best model for the data and to provide a reliable characterization of the sources of uncertainty for scientific insight and interpretation. The in-sample model selection criteria include the Akaike information criterion, AIC [6], the Schwarz information criterion, SIC [7], and the Hannan and Quinn information criterion, HQIC [8]. As captured in [9], AIC considers a discrepancy between the true model and a candidate, SIC (also called BIC) approximates the posterior model probabilities in a Bayesian framework, and Hannan and Quinn proposed a related criterion whose smaller penalty, compared to BIC, still permits the strong consistency property (for more details on information criteria, see [10] [11] [12] [13] [14]).
However, the major drawbacks of the in-sample model selection criteria are that they are unstable, that minimizing them over a class of candidate models leads to a selection procedure that is conservative or over-consistent in parameter settings [2] [9], and that they cannot inform directly about the quality of the model [3]. On the other hand, the out-of-sample model selection procedure is applied to achieve the best predictive performance, essentially characterizing future observations without necessarily considering the choice of a true model; rather, the attention shifts to choosing the model with the smallest predictive errors [1] [2] [15] [16]. An out-of-sample forecast is accomplished when the data used for constructing the model differ from those used in forecast evaluation. That is, the data are divided into two portions: the first portion is used for model construction and the second for evaluating forecasting performance, with the possibility of forecasting new future observations that can be checked against what is observed ( [11] [16] [17]). Yet the choice between in-sample and out-of-sample model selection criteria is not without contention, and such contention is well handled in [1] [15] [18] [19] [20].

With respect to heteroscedastic processes (or nonlinear time series), details regarding model selection are available in the studies of [21] - [27]. Meanwhile, in Nigeria, model selection for heteroscedastic processes is mainly based on in-sample criteria. For instance, the studies of [28] - [33] rely on the in-sample procedure to select the best-fitting model. Hence, this study seeks to improve on the work of [28], who applied in-sample model selection criteria to choose the best-fitted heteroscedastic models, by adopting an out-of-sample forecasting approach to select heteroscedastic models that best describe the accuracy and precision of future observations.

This work is organized as follows: materials and methods are treated in Section 2, results and discussion are covered in Section 3, and Section 4 concludes.

2. Materials and Methods

2.1. Return

The return series ${R}_{t}$ can be obtained given that ${P}_{t}$ is the price of a unit share at time t, and ${P}_{t-1}$ is the share price at time $t-1$.

${R}_{t}=\nabla \mathrm{ln}{P}_{t}=\left(1-B\right)\mathrm{ln}{P}_{t}=\mathrm{ln}{P}_{t}-\mathrm{ln}{P}_{t-1}$ (1)

The ${R}_{t}$ in Equation (1) is regarded as a transformed series of the share price, ${P}_{t}$, meant to attain stationarity; that is, both the mean and variance of the series are stable [29]. The letter B is the backshift operator.
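Equation (1) amounts to taking the first difference of the log prices. A minimal sketch in Python (the price values are purely illustrative):

```python
import numpy as np

# Hypothetical unit share prices P_t (illustrative values only)
prices = np.array([10.0, 10.2, 10.1, 10.5, 10.4])

# R_t = ln(P_t) - ln(P_{t-1}): first difference of the log-price series
returns = np.diff(np.log(prices))

print(returns)  # one fewer observation than the price series
```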

2.2. Information Criteria

There are several information criteria available for determining the order, p, of an AR process and the order, q, of an MA(q) process, all of them likelihood based. The well-known Akaike information criterion (AIC) [6] is defined as

$\text{AIC}=\frac{-2}{T}\mathrm{ln}\left(\text{likelihood}\right)+\frac{2}{T}\times \left(\text{number of parameters}\right),$ (2)

where the likelihood function is evaluated at the maximum likelihood estimates and T is the sample size. For a Gaussian AR(p) model, AIC reduces to

$\text{AIC}\left(P\right)=\mathrm{ln}\left({\stackrel{^}{\sigma}}_{P}^{2}\right)+\frac{2P}{T}$ (3)

where ${\stackrel{^}{\sigma}}_{P}^{2}$ is the maximum likelihood estimate of ${\sigma}_{a}^{2}$, the variance of ${a}_{t}$, and T is the sample size. The first term of the AIC in Equation (3) measures the goodness-of-fit of the AR(p) model to the data, whereas the second term is called the penalty function of the criterion because it penalizes a chosen model by the number of parameters used. Different penalty functions result in different information criteria.

The next commonly used criterion function is the Schwarz information criterion (SIC) [7]. For a Gaussian AR(p) model, the criterion is

$\text{SIC}\left(P\right)=\mathrm{ln}\left({\stackrel{^}{\sigma}}_{P}^{2}\right)+\left(\frac{P\mathrm{ln}\left(T\right)}{T}\right)$ (4)

Another commonly used criterion function is the Hannan-Quinn information criterion (HQIC) [8]. For a Gaussian AR(p) model, the criterion is

$\text{HQIC}\left(P\right)=\mathrm{ln}\left({\stackrel{^}{\sigma}}_{P}^{2}\right)+\frac{P\mathrm{ln}\left\{\mathrm{ln}\left(T\right)\right\}}{T}$ (5)

The penalty for each parameter used is 2 for AIC, ln(T) for SIC and ln{ln(T)} for HQIC. These penalty functions help to ensure selection of parsimonious models and to avoid choosing models with too many parameters.
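For a Gaussian AR(p) fit, Equations (3)-(5) are simple functions of the residual variance estimate. A minimal sketch (the function name and input values are ours, purely for illustration):

```python
import math

def information_criteria(sigma2_hat, p, T):
    """AIC, SIC and HQIC for a Gaussian AR(p) model, as in Eqs. (3)-(5).

    sigma2_hat -- maximum likelihood estimate of the innovation variance
    p          -- autoregressive order (number of penalized parameters)
    T          -- sample size
    """
    gof = math.log(sigma2_hat)                        # goodness-of-fit term
    return {
        "AIC": gof + 2 * p / T,                       # penalty 2 per parameter
        "SIC": gof + p * math.log(T) / T,             # penalty ln(T) per parameter
        "HQIC": gof + p * math.log(math.log(T)) / T,  # penalty ln{ln(T)} per parameter
    }

# For T = 500, ln(ln T) < 2 < ln(T), so HQIC applies the mildest penalty here
crit = information_criteria(sigma2_hat=0.04, p=2, T=500)
print(crit)
```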

The AIC criterion asymptotically overestimates the order with positive probability, whereas the SIC and HQIC criteria estimate the order consistently under fairly general conditions ( [11] [17]). Moreover, an in-sample model selection criterion is consistent if, when the true model is among those considered, it chooses the true model with probability approaching unity as the sample size becomes large, and if the true model is not among those considered, it selects the best approximation with probability approaching unity as the sample size becomes large [3]. The AIC is considered inconsistent in that it does not penalize the inclusion of additional parameters heavily enough; relying on this criterion can therefore lead to overfitting. The SIC and HQIC criteria are consistent in that their penalties grow with the sample size. However, consistency is not by itself sufficiently informative: in practice, the true model and any reasonable approximation to it may be very complex. An asymptotically efficient model selection criterion chooses, as the sample size grows, a sequence of models for which the one-step-ahead forecast error variances approach the one-step-ahead forecast error variance of the true model at least as fast as for any other criterion [3]. The AIC is asymptotically efficient, while SIC and HQIC are not. However, one major drawback of in-sample criteria is their inability to evaluate a candidate model's potential predictive performance.

2.3. Model Evaluation Criteria

It is tempting to evaluate performance on the basis of the fit of the forecasting or time series model to historical data [3]. However, the best way to evaluate a candidate model's predictive performance is to apply the out-of-sample forecast technique, which provides a direct estimate of the one-step-ahead forecast error variance and thus guarantees an efficient model selection criterion. Methods of forecast evaluation based on the forecast error include the Mean Squared Error (MSE), Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE). These criteria measure forecast accuracy, while forecast bias is measured by the Mean Error (ME).

The measures are computed as follows:

$\text{MSE}=\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}{e}_{i}^{2}}$ (6)

$\text{RMSE}=\sqrt{\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}{e}_{i}^{2}}}$ (7)

$\text{MAE}=\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}\left|{e}_{i}\right|}$ (8)

$\text{ME}=\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}\left({e}_{i}\right)}$ (9)

where ${e}_{i}$ is the forecast error and n is the number of forecast errors. Also, it should be noted that in this work, the forecasts of the returns are used as proxies for the volatilities, as the latter are not directly observable [34].
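Given one-step-ahead forecasts over a hold-out portion of the series, the measures in Equations (6)-(9) can be computed directly. A minimal sketch (the numerical values are illustrative):

```python
import numpy as np

def forecast_measures(actual, forecast):
    """MSE, RMSE and MAE (accuracy) and ME (bias) from forecast errors e_i."""
    e = np.asarray(actual) - np.asarray(forecast)
    mse = np.mean(e ** 2)
    return {
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "MAE": np.mean(np.abs(e)),
        "ME": np.mean(e),   # sign reveals systematic over- or under-forecasting
    }

actual = [0.012, -0.004, 0.007, 0.001]     # held-out observations
forecast = [0.010, -0.001, 0.005, 0.000]   # one-step-ahead forecasts
measures = forecast_measures(actual, forecast)
print(measures)
```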

2.4. Autoregressive Integrated Moving Average (ARIMA) Model

[10] considered the extension of the ARMA model to deal with homogeneous non-stationary time series in which ${X}_{t}$ itself is non-stationary but its d-th difference, ${\nabla}^{d}{X}_{t}$, is a stationary ARMA process. The resulting ARIMA model can be written as

$\phi \left(B\right){X}_{t}=\varphi \left(B\right){\nabla}^{d}{X}_{t}=\theta \left(B\right){\epsilon}_{t},$ (10)

where $\phi \left(B\right)$ is the nonstationary autoregressive operator such that d of the roots of $\phi \left(B\right)=0$ are unity and the remainder lie outside the unit circle. $\varphi \left(B\right)$ is a stationary autoregressive operator.

2.5. Heteroscedastic Models

Autoregressive Conditional Heteroscedastic (ARCH) Model: The first model that provides a systematic framework for modeling heteroscedasticity is the ARCH model of [35]. Specifically, an ARCH (q) model assumes that,

${R}_{t}={\mu}_{t}+{a}_{t},{a}_{t}={\sigma}_{t}{e}_{t},$

${\sigma}_{t}^{2}=\omega +{\alpha}_{1}{a}_{t-1}^{2}+\cdots +{\alpha}_{q}{a}_{t-q}^{2}$, (11)

where $\left\{{e}_{t}\right\}$ is a sequence of independent and identically distributed (i.i.d.) random variables with mean zero, that is $\text{E}\left({e}_{t}\right)=0$, and variance 1, that is $\text{E}\left({e}_{t}^{2}\right)=1$, $\omega >0$, and ${\alpha}_{1},\cdots ,{\alpha}_{q}\ge 0$ [36]. The coefficients ${\alpha}_{i}$, for $i>0$, must satisfy some regularity conditions to ensure that the unconditional variance of ${a}_{t}$ is finite.

Generalized Autoregressive Conditional Heteroscedastic (GARCH) Model: Although the ARCH model is simple, it often requires many parameters to adequately describe the volatility process of a share price return. Some alternative models must be sought. [37] proposed a useful extension known as the generalized ARCH (GARCH) model. For a return series, ${R}_{t}$, let ${a}_{t}={R}_{t}-{\mu}_{t}$ be the innovation at time t. Then, ${a}_{t}$ follows a GARCH(q, p) model if

${a}_{t}={\sigma}_{t}{e}_{t}$,

${\sigma}_{t}^{2}=\omega +{\displaystyle \underset{i=1}{\overset{q}{\sum}}{\alpha}_{i}{a}_{t-i}^{2}}+{\displaystyle \underset{j=1}{\overset{p}{\sum}}{\beta}_{j}{\sigma}_{t-j}^{2}},$ (12)

where again ${e}_{t}$ is a sequence of i.i.d. random variables with mean 0 and variance 1, $\omega >0,{\alpha}_{i}\ge 0,{\beta}_{j}\ge 0$, and ${\sum}_{i=1}^{\mathrm{max}\left(p,q\right)}\left({\alpha}_{i}+{\beta}_{i}\right)<1$ (see [38]).

Here, it is understood that ${\alpha}_{i}=0$, for $i>q$, and ${\beta}_{j}=0$, for $j>p$. The latter constraint on ${\alpha}_{i}+{\beta}_{i}$ implies that the unconditional variance of ${a}_{t}$ is finite, whereas its conditional variance, ${\sigma}_{t}^{2}$, evolves over time.
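A short simulation illustrates the role of the constraints in Equation (12): with $\omega >0$ and ${\alpha}_{1}+{\beta}_{1}<1$, the simulated series fluctuates around the finite unconditional variance $\omega /\left(1-{\alpha}_{1}-{\beta}_{1}\right)$. A minimal sketch with arbitrarily chosen parameters:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative GARCH(1,1) parameters satisfying omega > 0 and alpha + beta < 1
omega, alpha, beta = 0.05, 0.10, 0.85
uncond_var = omega / (1 - alpha - beta)   # finite unconditional variance of a_t

T = 20_000
a = np.zeros(T)
sigma2 = np.full(T, uncond_var)           # start the recursion at the long-run level

for t in range(1, T):
    sigma2[t] = omega + alpha * a[t - 1] ** 2 + beta * sigma2[t - 1]
    a[t] = np.sqrt(sigma2[t]) * rng.standard_normal()   # a_t = sigma_t * e_t

# The sample variance of a_t should be near omega / (1 - alpha - beta)
print(uncond_var, a.var())
```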

Exponential Generalized Autoregressive Conditional Heteroscedastic (EGARCH) Model: The EGARCH model represents a major shift from ARCH and GARCH models [39]. Rather than modeling the variance directly, EGARCH models the natural logarithm of the variance, and so no parameter restrictions are required to ensure that the conditional variance is positive. The EGARCH(q, p) is defined as,

${R}_{t}={\mu}_{t}+{a}_{t},{a}_{t}={\sigma}_{t}{e}_{t},$

$\mathrm{ln}{\sigma}_{t}^{2}=\omega +{\displaystyle {\sum}_{i=1}^{q}{\alpha}_{i}\left|\frac{{a}_{t-i}}{\sqrt{{\sigma}_{t-i}^{2}}}\right|}+{\displaystyle {\sum}_{k=1}^{r}{\gamma}_{k}\left(\frac{{a}_{t-k}}{\sqrt{{\sigma}_{t-k}^{2}}}\right)}+{\displaystyle {\sum}_{j=1}^{p}{\beta}_{j}\mathrm{ln}{\sigma}_{t-j}^{2}},$ (13)

where again, ${e}_{t}$ is a sequence of i.i.d. random variables with mean 0 and variance 1, and ${\gamma}_{k}$ is the asymmetric coefficient.

Glosten, Jagannathan and Runkle (GJR-GARCH) Model: The GJR-GARCH (q, p) model proposed by [40] is a variant, represented by

${a}_{t}={\sigma}_{t}{e}_{t},$

${\sigma}_{t}^{2}=\omega +{\displaystyle {\sum}_{i=1}^{q}{\alpha}_{i}{a}_{t-i}^{2}}+{\displaystyle {\sum}_{i=1}^{p}{\gamma}_{i}{I}_{t-i}{a}_{t-i}^{2}}+{\displaystyle {\sum}_{j=1}^{p}{\beta}_{j}{\sigma}_{t-j}^{2}},$ (14)

where ${I}_{t-i}$ is an indicator for negative ${a}_{t-i}$, that is,

${I}_{t-i}=\{\begin{array}{l}1\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{if}\text{\hspace{0.17em}}{a}_{t-i}<0,\\ 0\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.05em}}\text{if}\text{\hspace{0.17em}}{a}_{t-i}\ge 0,\end{array}$

and ${\alpha}_{i},{\gamma}_{i}$ and ${\beta}_{j}$ are nonnegative parameters satisfying conditions similar to those of GARCH models. The introduction of the leverage-effect indicator, ${I}_{t-i}$, accommodates the leverage effect, since the effect of ${a}_{t-i}^{2}$ on the conditional variance ${\sigma}_{t}^{2}$ differs according to the sign of ${a}_{t-i}$.

2.6. Parametric Bootstrap

The parametric bootstrap is used to compute nonlinear forecasts, given that the model used in forecasting has been rigorously checked and judged adequate for the series under study [39]. Let T be the forecast origin and k the forecast horizon (k > 0); that is, we are at time index T and interested in forecasting ${R}_{T+k}$. The parametric bootstrap computes realizations ${R}_{T+1},\cdots ,{R}_{T+k}$ sequentially by drawing a new innovation from the specified innovation distribution of the model and computing ${R}_{T+i}$ using the model, the data, and the previous realizations ${R}_{T+1},\cdots ,{R}_{T+i-1}$. This yields one realization of ${R}_{T+k}$. The procedure is repeated M times to obtain M realizations of ${R}_{T+k}$, denoted by ${\left\{{R}_{T+k}^{\left(j\right)}\right\}}_{j=1}^{M}$. The point forecast of ${R}_{T+k}$ is then the sample average of the ${R}_{T+k}^{\left(j\right)}$.
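The procedure above can be sketched for a simple AR(1)-ARCH(1) specification; the model form and parameter values here are our own illustrative assumptions, not estimates from this study:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed fitted model (illustrative, not estimated from the study):
# R_t = mu + phi * R_{t-1} + a_t,  a_t = sigma_t * e_t,
# sigma_t^2 = omega + alpha * a_{t-1}^2,  e_t ~ N(0, 1)
mu, phi, omega, alpha = 0.001, 0.2, 0.0001, 0.3

def bootstrap_forecast(r_T, a_T, k, M=5000):
    """Point forecast of R_{T+k}: average over M simulated forecast paths."""
    realizations = np.empty(M)
    for j in range(M):
        r, a = r_T, a_T
        for _ in range(k):
            sigma2 = omega + alpha * a ** 2              # variance equation
            a = np.sqrt(sigma2) * rng.standard_normal()  # draw a new innovation
            r = mu + phi * r + a                         # step the mean equation
        realizations[j] = r
    return realizations.mean()

point = bootstrap_forecast(r_T=0.01, a_T=0.005, k=3)
print(point)
```

For this linear mean equation the bootstrap average should be close to the analytic mean $\mu \left(1+\varphi +{\varphi}^{2}\right)+{\varphi}^{3}{R}_{T}$, up to Monte Carlo error.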

Consequently, forecasts of the ARCH model are obtained recursively. Let T be the starting date for forecasting, that is, the forecast origin, and let ${F}_{T}$ be the information set available at time T. Then the 1-step-ahead forecast of the conditional variance, ${\sigma}_{T+1}^{2}$, is

${\sigma}_{T}^{2}\left(1\right)=\stackrel{^}{\omega}+{\stackrel{^}{\alpha}}_{1}{\stackrel{^}{a}}_{T}^{2}+\cdots +{\stackrel{^}{\alpha}}_{q}{\stackrel{^}{a}}_{T+1-q}^{2},$ (15)

where ${\stackrel{^}{a}}_{T}$ is the estimated residual. For the 2-step-ahead forecast of ${\sigma}_{T+2}^{2}$, we need a forecast of ${a}_{T+1}^{2}$; it is given by ${\sigma}_{T}^{2}\left(1\right)$. We therefore obtain

${\sigma}_{T}^{2}\left(2\right)=\stackrel{^}{\omega}+{\stackrel{^}{\alpha}}_{1}{\sigma}_{T}^{2}\left(1\right)+{\stackrel{^}{\alpha}}_{2}{\stackrel{^}{a}}_{T}^{2}+\cdots +{\stackrel{^}{\alpha}}_{q}{\stackrel{^}{a}}_{T+2-q}^{2}.$ (16)

The k-step ahead forecast for ${\sigma}_{T+k}^{2}$ is

${\sigma}_{T}^{2}\left(k\right)=\stackrel{^}{\omega}+{\stackrel{^}{\alpha}}_{1}{\sigma}_{T}^{2}\left(k-1\right)+\cdots +{\stackrel{^}{\alpha}}_{q}{\sigma}_{T}^{2}\left(k-q\right),$ (17)

with ${\sigma}_{T}^{2}\left(k-i\right)={\stackrel{^}{a}}_{T+k-i}^{2}$ if $k-i\le 0$.

Forecasts of the GARCH model are obtained recursively in a similar way as that of the ARCH model. Then, the 1-step ahead forecast for ${\sigma}_{T+1}^{2}$ is

${\sigma}_{T}^{2}\left(1\right)=\stackrel{^}{\omega}+{\stackrel{^}{\alpha}}_{1}{\stackrel{^}{a}}_{T}^{2}+{\stackrel{^}{\beta}}_{1}{\stackrel{^}{\sigma}}_{T}^{2}$, (18)

since ${a}_{T}^{2}={\sigma}_{T}^{2}{e}_{T}^{2}$, the GARCH (1,1) model can be rewritten as

${\sigma}_{T}^{2}=\omega +{\alpha}_{1}{a}_{T-1}^{2}+{\beta}_{1}{\sigma}_{T-1}^{2}=\omega +\left({\alpha}_{1}+{\beta}_{1}\right){\sigma}_{T-1}^{2}+{\alpha}_{1}{\sigma}_{T-1}^{2}\left({e}_{T-1}^{2}-1\right)$,

so that, at time $T+2$, we have

${\sigma}_{T+2}^{2}=\omega +\left({\alpha}_{1}+{\beta}_{1}\right){\sigma}_{T+1}^{2}+{\alpha}_{1}{\sigma}_{T+1}^{2}\left({e}_{T+1}^{2}-1\right)$,

with $E\left[\left({e}_{T+1}^{2}-1\right)|{F}_{T}\right]=0$, we deduce the following 2-step-ahead forecast of ${\sigma}_{T+2}^{2}$:

${\sigma}_{T}^{2}\left(2\right)=\stackrel{^}{\omega}+\left({\stackrel{^}{\alpha}}_{1}+{\stackrel{^}{\beta}}_{1}\right){\sigma}_{T}^{2}\left(1\right)$.

Generally speaking, the k-step ahead forecast for ${\sigma}_{T+k}^{2}$ is

${\sigma}_{T}^{2}\left(k\right)=\stackrel{^}{\omega}+\left({\stackrel{^}{\alpha}}_{1}+{\stackrel{^}{\beta}}_{1}\right){\sigma}_{T}^{2}\left(k-1\right),k>1.$ (19)

One of the beauties of GARCH is that volatility forecasts for any horizon can be constructed from the estimated model. The estimated GARCH model is used to get forecasts of instantaneous forward volatilities, that is, the forecast for ${\sigma}_{T+k}^{2}$ made at time T and for every k step ahead.
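The recursion in Equations (18) and (19) is straightforward to implement; under ${\stackrel{^}{\alpha}}_{1}+{\stackrel{^}{\beta}}_{1}<1$ the forecasts converge to the long-run variance $\stackrel{^}{\omega}/\left(1-{\stackrel{^}{\alpha}}_{1}-{\stackrel{^}{\beta}}_{1}\right)$ as the horizon grows. A sketch with illustrative parameter values:

```python
# Illustrative fitted GARCH(1,1) quantities at the forecast origin T
omega_hat, alpha_hat, beta_hat = 0.05, 0.10, 0.85
a_T2, sigma_T2 = 0.30, 0.40      # squared residual and conditional variance at T

def garch_variance_forecasts(k_max):
    """1- to k_max-step-ahead variance forecasts via Eqs. (18) and (19)."""
    f = [omega_hat + alpha_hat * a_T2 + beta_hat * sigma_T2]   # Eq. (18)
    for _ in range(k_max - 1):
        f.append(omega_hat + (alpha_hat + beta_hat) * f[-1])   # Eq. (19)
    return f

long_run = omega_hat / (1 - alpha_hat - beta_hat)
forecasts = garch_variance_forecasts(50)
# Forecasts rise monotonically from 0.42 toward the long-run variance 1.0
print(forecasts[0], forecasts[-1], long_run)
```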

For the EGARCH model, assuming that the model parameters are known and the innovations are standard Gaussian, the EGARCH (1,1) model gives

$\mathrm{ln}{\sigma}_{T}^{2}=\left(1-{\alpha}_{1}\right)\omega +{\alpha}_{1}\mathrm{ln}{\sigma}_{T-1}^{2}+g\left({\u03f5}_{T-1}\right),$

$g\left({\u03f5}_{T-1}\right)=\theta {\u03f5}_{T-1}+\gamma \left(\left|{\u03f5}_{T-1}\right|-\sqrt{2/\text{\pi}}\right)$. (20)

Taking exponentials, the model becomes

${\sigma}_{T}^{2}={\sigma}_{T-1}^{2{\alpha}_{1}}\mathrm{exp}\left[\left(1-{\alpha}_{1}\right)\omega \right]\mathrm{exp}\left[g\left({\u03f5}_{T-1}\right)\right],$

$g\left({\u03f5}_{T-1}\right)=\theta {\u03f5}_{T-1}+\gamma \left(\left|{\u03f5}_{T-1}\right|-\sqrt{2/\text{\pi}}\right)$. (21)

For the 1-step ahead forecast, ${\sigma}_{T+1}^{2}$ we have

${\sigma}_{T}^{2}\left(1\right)={\sigma}_{T}^{2{\alpha}_{1}}\mathrm{exp}\left[\left(1-{\alpha}_{1}\right)\omega \right]\mathrm{exp}\left[g\left({\u03f5}_{T}\right)\right]$. (22)

The 2-step-ahead forecast of ${\sigma}_{T+2}^{2}$ is given by

${\sigma}_{T}^{2}\left(2\right)={\stackrel{^}{\sigma}}_{T}^{2{\alpha}_{1}}\left(1\right)\mathrm{exp}\left[\left(1-{\alpha}_{1}\right)\omega \right]{E}_{T}\left\{\mathrm{exp}\left[g\left({\u03f5}_{T+1}\right)\right]\right\}$,

where ${E}_{T}$ denotes a conditional expectation taken at the time origin T with

$E\left\{\mathrm{exp}\left[g\left({\u03f5}_{T+1}\right)\right]\right\}=\mathrm{exp}\left(-\gamma \sqrt{2/\text{\pi}}\right)\left[{\text{e}}^{{\left(\theta +\gamma \right)}^{2}/2}\Phi \left(\theta +\gamma \right)+{\text{e}}^{{\left(\theta -\gamma \right)}^{2}/2}\Phi \left(\gamma -\theta \right)\right]$,

where $\Phi \left(x\right)$ is the cumulative distribution function of the standard normal distribution (see [39] for more details). Hence,

$\begin{array}{l}{\stackrel{^}{\sigma}}_{T}^{2}\left(2\right)={\stackrel{^}{\sigma}}_{T}^{2{\alpha}_{1}}\left(1\right)\mathrm{exp}\left[\left(1-{\alpha}_{1}\right)\omega -\gamma \sqrt{2/\text{\pi}}\right]\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\times \left\{\mathrm{exp}\left[{\left(\theta +\gamma \right)}^{2}/2\right]\Phi \left(\theta +\gamma \right)+\mathrm{exp}\left[{\left(\theta -\gamma \right)}^{2}/2\right]\Phi \left(\gamma -\theta \right)\right\}\end{array}$

Generally, the k-step-ahead forecast can be obtained as

$\begin{array}{l}{\stackrel{^}{\sigma}}_{T}^{2}\left(k\right)={\stackrel{^}{\sigma}}_{T}^{2{\alpha}_{1}}\left(k-1\right)\mathrm{exp}\left[\left(1-{\alpha}_{1}\right)\omega -\gamma \sqrt{2/\text{\pi}}\right]\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\times \left\{\mathrm{exp}\left[{\left(\theta +\gamma \right)}^{2}/2\right]\Phi \left(\theta +\gamma \right)+\mathrm{exp}\left[{\left(\theta -\gamma \right)}^{2}/2\right]\Phi \left(\gamma -\theta \right)\right\}\end{array}$ (23)

(See also, [34], [38]).
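The expectation term appearing in the EGARCH forecasts can be evaluated with the closed-form expression above and verified by simulation. A sketch using only the Python standard library (the θ and γ values are illustrative):

```python
import math
import random

theta, gamma = -0.1, 0.2    # illustrative EGARCH asymmetry parameters

def Phi(x):
    """Standard normal cumulative distribution function via erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Closed-form E{exp[g(eps)]} for standard Gaussian eps, as in the display above
closed = math.exp(-gamma * math.sqrt(2 / math.pi)) * (
    math.exp((theta + gamma) ** 2 / 2) * Phi(theta + gamma)
    + math.exp((theta - gamma) ** 2 / 2) * Phi(gamma - theta)
)

# Monte Carlo check of the same expectation
random.seed(1)
n = 200_000
mc = sum(
    math.exp(theta * e + gamma * (abs(e) - math.sqrt(2 / math.pi)))
    for e in (random.gauss(0.0, 1.0) for _ in range(n))
) / n

print(closed, mc)   # the two estimates should agree to a couple of decimals
```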

3. Results and Discussion

3.1. Plot Analysis

Figure 1 and Figure 2 show the share prices of Diamond and Fidelity Banks. Their movements appear to fluctuate away from a common mean, indicating the presence of stochastic nonstationarity.

Figure 3 and Figure 4 show the return series of the respective banks, which are found to cluster around a common mean, signifying stationarity.

Figure 1. Share price series of Diamond Bank.

Figure 2. Share price series of Fidelity Bank.

Figure 3. Return series of Diamond Bank.

Figure 4. Return series of Fidelity Bank.

3.2. In-Sample Model Selection

Several models under the normal distribution (norm) and the student-t distribution (std), namely ARIMA (2,1,1)-GARCH (1,0)-std, ARIMA (2,1,1)-GARCH (2,0)-std, ARIMA (2,1,1)-GARCH (1,1)-norm, ARIMA (2,1,1)-EGARCH (1,1)-norm and ARIMA (2,1,1)-EGARCH (1,1)-std, were considered tentatively for the return series of Diamond Bank. ARIMA (2,1,1)-GARCH (2,0)-std was selected based on the minimum information criteria (see Table 1). The model was found to be adequate given that the p-values corresponding to the weighted Ljung-Box Q statistics at lags 1, 8 and 14 on standardized residuals, the weighted

Table 1. Estimation of Heteroscedastic models of return series of diamond bank.

Ljung-Box Q statistics at lags 1, 5 and 9 on standardized squared residuals and the weighted Lagrange Multiplier statistics at lags 3, 5 and 7 are all greater than the 5% significance level [see Table 2]. That is to say, the hypotheses of no autocorrelation and no remaining ARCH effect are not rejected.

Also, for Fidelity Bank, ARIMA (1,1,0)-GARCH (1,0)-norm, ARIMA (1,1,0)-GARCH (1,0)-std, ARIMA (1,1,0)-GARCH (1,1)-norm, ARIMA (1,1,0)-EGARCH (1,1)-norm and ARIMA (1,1,0)-EGARCH (1,1)-std were considered tentatively (Table 3). Based on the smallest information criteria, ARIMA (1,1,0)-EGARCH (1,1)-std was chosen as the appropriate model. The selected model is adequate since all the p-values corresponding to the weighted Ljung-Box Q statistics at lags 1, 2 and 5 on standardized residuals, the weighted Ljung-Box Q statistics at lags 1, 5 and 9 on standardized squared residuals and the weighted Lagrange Multiplier statistics at lags 3, 5 and 7 are greater than the 5% significance level [see Table 4]. That is to say, the null hypotheses of no autocorrelation and no ARCH effect are not rejected at the 5% significance level.

3.3. Out-Of-Sample Forecasting Model Selection

Here, the out-of-sample forecast evaluation criteria MAE, MSE and RMSE for each of the models are considered for the series of the banks. It was found that ARIMA (2,1,1)-EGARCH (1,1)-norm and ARIMA (1,1,0)-EGARCH (1,1)-norm possessed the smallest out-of-sample forecast evaluation criteria (see Table 5 and Table 6); hence they are the most appropriate models for the return series of the respective banks.

Based on our findings, the in-sample model selection procedure favoured the ARIMA (2,1,1)-GARCH (2,0)-std and ARIMA (1,1,0)-EGARCH (1,1)-std models, while the out-of-sample model selection supported the choice of the ARIMA (2,1,1)-EGARCH (1,1)-norm and ARIMA (1,1,0)-EGARCH (1,1)-norm models for the banks considered. Notably, each of the models selected through in-sample criteria is ill-conditioned. For instance, the constant term of the variance equation, ω, of ARIMA (2,1,1)-GARCH (2,0)-std is zero, which violates the constraint that requires $\omega >0$. The implication is that this model is not suitable for forecasting the long-run variance, as the forecast would collapse to zero. Again, in EGARCH (1,1)-std, the stationarity condition, which requires ${\sum}_{j=1}^{p}{\beta}_{j}<1$, is violated. The implication is that forecasting the long-run variance using this model would not be realistic, in that the variance

Table 2. Diagnostic checking for heteroscedastic models of return series of diamond bank.

Table 3. Estimation of heteroscedastic models of return series of fidelity bank.

Table 4. Diagnostic checking for Heteroscedastic models of return series of fidelity bank.

Table 5. Out-of-sample forecast evaluation criteria for diamond bank.

Table 6. Out-of-sample forecast evaluation criteria for fidelity bank.

would diverge to infinity. Moreover, the high significance of the parameters of the models indicates that the models are over-fitted. Meanwhile, the models selected through out-of-sample criteria are characterized by non-significant parameters, yet they possess the smallest predictive errors, and the problem associated with over-fitting is overcome. In particular, this study showed that the study of [28] can be improved by adopting an out-of-sample forecasting procedure. Furthermore, the study is in agreement with the works of [1], [2], [22] in supporting the choice of models based on the smallest predictive errors.

4. Conclusion

In all, our study showed that the out-of-sample model selection approach outperformed its in-sample counterpart in describing the characterization of future observations without necessarily considering the choice of a true model. The major strength of this study lies in combining ARIMA and GARCH-type models to achieve forecast accuracy. Its weakness is in adopting a large training sample against a small sample for forecast evaluation, a split suited to finding the best-fitting models. This weakness could be overcome in a future study by adopting a smaller sample for model formulation and a larger sample for forecast evaluation.

References

[1] Ding, J., Tarokh, V. and Yang, Y. (2018) Model Selection Techniques—An Overview. IEEE Signal Processing Magazine, 21, 1-21.

https://arxiv.org/abs/1810.09583

[2] Leeb, H. (2008) Evaluation and Selection of Models for Out-of-Sample Prediction when the Sample Size Is Small Relative to the Complexity of the Data-Generating Process. Bernoulli, 14, 661-690.

https://doi.org/10.3150/08-BEJ127

[3] Sinharay, S. (2010) An Overview of Statistics in Education. In: Peterson, P., et al., Eds., International Encyclopedia of Education, 3rd Edition, Elsevier Ltd., Amsterdam, 1-11.

https://doi.org/10.1016/B978-0-08-044894-7.01719-X

[4] Montgomery, D.C., Jennings, C.L. and Kulahci, M. (2008) Introduction to Time Series Analysis and Forecasting. John Wiley & Sons, Hoboken, 18-60.

[5] Wei, W.W.S. (2006) Time Series Analysis Univariate and Multivariate Methods. 2nd Edition, Addison Wesley, New York, 33-59.

[6] Akaike, H. (1974) A New Look at the Statistical Model Identification. IEEE Transactions on Automatic Control, 19, 716-723.

https://doi.org/10.1109/TAC.1974.1100705

[7] Schwarz, G. (1978) Estimating the Dimension of a Model. Annals of Statistics, 6, 461-464.

https://www.jstor.org/stable/2958889

https://doi.org/10.1214/aos/1176344136

[8] Hannan, E. and Quinn, B. (1979) The Determination of the Order of an Auto-Regression. Journal of Royal Statistical Society, Series B, 41, 190-195.

https://www.jstor.org/stable/2985032

https://doi.org/10.1111/j.2517-6161.1979.tb01072.x

[9] Zou, H. and Yang, G. (2004) Combining Time Series Models for Forecasting. International Journal of Forecasting, 20, 69-84.

https://doi.org/10.1016/S0169-2070(03)00004-9

[10] Box, G.E.P., Jenkins, G.M. and Reinsel, G.C. (2008) Time Series Analysis: Forecasting and Control. 3rd Edition, John Wiley & Sons, Hoboken, 5-22.

https://doi.org/10.1002/9781118619193

[11] Bozdogan, H. (2000) Akaike’s Information Criteria and Recent Developments Information Complexity. Journal of Mathematical Psychology, 44, 62-91.

https://doi.org/10.1006/jmps.1999.1277

[12] Wasserman, L. (2000) Bayesian Model Selection and Model Averaging. Journal of Mathematical Psychology, 44, 92-107.

https://doi.org/10.1006/jmps.1999.1278

[13] Myung, I.J. (2000) The Importance of Complexity in Model Selection. Journal of Mathematical Psychology, 44, 190-204.

https://doi.org/10.1006/jmps.1999.1283

[14] Zucchini, W. (2000) An Introduction to Model Selection. Journal of Mathematical Psychology, 44, 41-61.

https://doi.org/10.1006/jmps.1999.1276

[15] Pilatowska, M. (2011) Information and Prediction Criteria in Selecting the Forecasting Model. Dynamic Econometric Models, 11, 21-40.

https://doi.org/10.12775/DEM.2011.002

[16] Chatfield, C. (2000) Time Series Forecasting. 5th Edition, Chapman and Hall CRC, New York.

[17] Moffat, I.U. and Akpan, E.A. (2014) Time Series Forecasting: A Tool for Out-Sample Model Selection and Evaluation. American Journal of Scientific and Industrial Research, 5, 185-194.

[18] Mitchell, H. and McKenzie, M.D. (2003) GARCH Model Selection Criteria. Quantitative Finance, 3, 262-284. https://doi.org/10.1088/1469-7688/3/4/303

[19] Brooks, C. and Burke, S.P. (2003) Information Criteria for GARCH Model Selection. The European Journal of Finance, 9, 557-580.

https://doi.org/10.1080/1351847021000029188

[20] Degiannakis, S. and Xekalaki, E. (2005) Predictability and Model Selection in the Context of ARCH Models. Journal of Applied Stochastic Models in Business and Industry, 21, 55-82.

https://doi.org/10.1002/asmb.551

[21] Bal, C., Demir, S. and Aladag, C.H. (2016) A Comparison of Different Model Selection Criteria for Forecasting EURO/USD Exchange Rates by Feed Forward Neural Network. International Journal of Computing, Communication and Instrumentalism Engineering, 3, 271-275.

https://doi.org/10.15242/IJCCIE.U0616010

[22] Psaradakis, Z., Sola, M., Spagnolo, F. and Spagnolo, N. (2009) Selecting Nonlinear Time Series Models Using Information Criteria. Journal of Time Series Analysis, 30, 369-394.

https://doi.org/10.1111/j.1467-9892.2009.00614.x

[23] Pena, D. and Rodriguez, J. (2005) Detecting Nonlinearity in Time Series by Model Selection Criteria. International Journal of Forecasting, 21, 731-748.

https://doi.org/10.1016/j.ijforecast.2005.04.014

[24] Manzan, S. (2004) Model Selection for Non Linear Time Series. Empirical Economics, 29, 901-920.

https://doi.org/10.1007/s00181-004-0207-7

[25] Judd, K. and Mees, A. (1995) On Selecting Models for Nonlinear Time Series. Physica D: Nonlinear Phenomena, 82, 426-444.

https://doi.org/10.1016/0167-2789(95)00050-E

[26] Liu, Y. and Enders, W. (2003) Out-of-Sample Forecasts and Nonlinear Model Selection with an Example of the Term Structure of Interest Rates. Southern Economic Journal, 69, 520-540. https://www.jstor.org/stable/1061692

https://doi.org/10.2307/1061692

[27] Gabriel, A.S. (2012) Evaluating the Forecasting Performance of GARCH Models: Evidence from Romania. Procedia-Social and Behavioral Sciences, 62, 1006-1010.

https://doi.org/10.1016/j.sbspro.2012.09.171

[28] Akpan, E.A., Lasisi, K.E. and Adamu, A. (2018) Modeling Heteroscedasticity in the Presence of Outliers in Discrete-Time Stochastic Series. Academic Journal of Applied Mathematical Sciences, 4, 61-76.

[29] Akpan, E.A. and Moffat, I.U. (2017) Detection and Modeling of Asymmetric GARCH Effects in a Discrete-Time Series. International Journal of Statistics and Probability, 6, 111-119.

https://doi.org/10.5539/ijsp.v6n6p111

[30] Akpan, E.A., Moffat, I.U. and Ekpo, N.B. (2016) Arma-Arch Modeling of the Returns of First Bank of Nigeria. European Scientific Journal, 12, 257-266.

https://doi.org/10.19044/esj.2016.v12n18p257

[31] Onwukwe, C.E., Samson, T.K. and Lipcsey, Z. (2014) Modeling and Forecasting Daily Returns Volatility of Nigerian Banks Stocks. European Scientific Journal, 10, 449-467.

[32] Arowolo, W.B. (2013) Predicting Stock Prices Returns Using GARCH Model. International Journal of Engineering and Science, 2, 32-37.

[33] Emenike, K.O. and Friday, A.S. (2012) Modeling Asymmetric Volatility in the Nigerian Stock Exchange. European Journal of Business and Management, 4, 52-59.

[34] Akpan, E.A., Lasisi, K.E., Adamu, A. and Rann, H.B. (2019) Evaluation of Forecasts Performance of ARIMA-GARCH-Type Models in the Light of Outliers. World Scientific News, 119, 68-84.

[35] Engle, R.F. (1982) Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflations. Econometrica, 50, 987-1007.

https://doi.org/10.2307/1912773

[36] Francq, C. and Zakoian, J. (2010) GARCH Models: Structure, Statistical Inference and Financial Applications. John Wiley & Sons Ltd., Chichester, 19-220.

https://doi.org/10.1002/9780470670057

[37] Bollerslev, T. (1986) Generalized Autoregressive Conditional Heteroscedasticity. Journal of Econometrics, 31, 307-327.

https://doi.org/10.1016/0304-4076(86)90063-1

[38] Tsay, R.S. (2010) Analysis of Financial Time Series. 3rd Edition, John Wiley & Sons Inc., New York, 97-140.

https://doi.org/10.1002/9780470644560

[39] Nelson, D.B. (1991) Conditional Heteroskedasticity in Asset Returns: A New Approach. Econometrica, 59, 347-370.

https://doi.org/10.2307/2938260

[40] Glosten, L.R., Jagannathan, R. and Runkle, D. (1993) On the Relation between the Expected Values and the Volatility of the Nominal Excess Return on Stocks. Journal of Finance, 48, 1779-1801.

https://doi.org/10.1111/j.1540-6261.1993.tb05128.x