Stock index forecasting has been investigated empirically for decades. Practitioners, financial engineers, and academic researchers have all stressed its importance for speculation, hedging, and arbitrage decisions. Because stock index movements are stochastic and closely resemble a random walk, producing efficient forecasts is challenging and requires innovative thinking in investment theory, model settings, and variable selection.
Stock market behavior is a typical financial time-series process that involves issues such as stationarity, serial correlation, heteroscedasticity, nonlinearity, and causality. Although ARIMA models can be used to forecast a stock market index, the results are usually unsatisfactory (Khandelwal et al., 2015; Ariyo et al., 2014; Zhang, 2003). Many researchers have used traditional econometric models with macroeconomic variables to forecast stock returns, but the forecasting power is limited (Laichena & Obwogi, 2015; Ouma & Muriu, 2014; Flannery & Protopapadakis, 2002). Other researchers have used technical indicators to forecast stock returns (Paluch & Jackowska-Strumiłło, 2018; Paluch & Jackowska-Strumiłło, 2012; Sutheebanjard & Premchaiswadi, 2010; Tilakaratne, Morris, Mammadov & Hurst, 2007).
Meanwhile, statistical and heuristic computing algorithms such as the genetic algorithm (GA) and artificial neural networks (ANN) have developed dramatically in the past decades. The improved capability of mathematical optimization to handle complicated, dynamic, and nonlinear functional forms with multivariate datasets could help researchers build better data classification, financial forecasting, and risk management models.
The genetic algorithm (GA) applies the rules of biological evolution to find the optimal number of variables and weighting schemes. Specifically, it searches for the optimal outcome through reproduction, crossover, and mutation, guided by a fitness function over a given number of generations. Previous studies have applied GA techniques to stock price forecasting (Armano, Marchesi, & Murru, 2005; Kim & Han, 2000; Kai & Wenhua, 1997). Artificial neural networks (ANN) imitate the biological neural processing system, using hidden layers and hidden units to find better solutions. Specifically, an ANN forecasting model is built by searching for the optimal hidden layers, hidden units, transformation, and learning coefficient. Previous studies have applied ANN techniques to stock price forecasting (Nayak, Misra & Behera, 2017; Kwon & Moon, 2007; Chen, Leung, & Daouk, 2003).
Previous research has addressed many issues in stock index forecasting. This study re-examines some issues that may not have been addressed. First, the GA and ANN models are integrated so that the GA randomly selects sets of variables through crossover and mutation, the ANN is applied in each simulation to find the optimal simulated parameters, and a one-period-ahead forecast of the stock index is made. Second, randomly selected transformation parameters and learning rates are simulated for both the hidden-layer and final-outcome stages. Third, the stock index forecasting efficiency of macroeconomic factors and technical indicators is compared. Fourth, the focus is placed on the monthly stock index rather than the daily stock index.
The rest of the paper is organized as follows: Section 2 discusses data and methodology; Section 3 provides the empirical results; and Section 4 summarizes the discussion and concludes the paper.
2. Data and Methodology
2.1. Data Description
Monthly data on the Taiwan stock index, electronic index, and financial index from Jan. 2001 to Dec. 2019 are collected as dependent variables. Eight influential macroeconomic factors and seven commonly watched technical indicators are used as independent variables. The total number of months is 228. All of the dependent and independent variables are lagged from t − 1 through t − 6. Thus, there are 54 and 48 predetermined variables for the macroeconomic and technical analysis data sets, respectively.
The stock index return (RET) is computed as the natural log of (Price/lagged_Price). The eight macroeconomic variables are as follows (Kvainickas & Stankevičienė, 2019; Laichena & Obwogi, 2015; Ouma & Muriu, 2014):
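The lagging scheme and the resulting variable counts can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `make_lagged` and the toy series are hypothetical, and the paper's own implementation is not shown in the text.

```python
def make_lagged(series_by_name, max_lag=6):
    """Return a dict of lagged copies (lags 1..max_lag) of every series.

    Each lagged list keeps the original length and is padded with None
    where the lagged value does not exist yet.
    """
    lagged = {}
    for name, values in series_by_name.items():
        for k in range(1, max_lag + 1):
            lagged[f"{name}_lag{k}"] = [None] * k + list(values[:-k])
    return lagged


# Toy data: the dependent series plus 8 macroeconomic or 7 technical series.
toy = [float(i) for i in range(12)]
macro = {name: toy for name in
         ["RET", "GDP", "M1B", "BOND", "UMR", "Wage", "IPI", "CPI", "WPI"]}
tech = {name: toy for name in
        ["RET", "MA5", "MA10", "MA20", "OSC", "BIAS5", "BIAS10", "BIAS20"]}

n_macro = len(make_lagged(macro))  # (1 + 8) series x 6 lags
n_tech = len(make_lagged(tech))    # (1 + 7) series x 6 lags
```

Counting the lagged dependent variable together with the independent variables reproduces the 54 and 48 predetermined variables stated above.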
1) GDP: the growth rate of gross domestic product.
2) M1B: the government defined M1B money supply.
3) BOND: the monthly 10-year long-term government bonds.
4) UMR: the monthly unemployment rate.
5) Wage: the average monthly salary of manufacturing industry.
6) IPI: the industrial production index.
7) CPI: the monthly consumer price index.
8) WPI: the monthly wholesale price index.
The seven technical indicators are as follows (Paluch & Jackowska-Strumiłło, 2018; Paluch & Jackowska-Strumiłło, 2012; Sutheebanjard & Premchaiswadi, 2010; Tilakaratne, Morris, Mammadov, & Hurst, 2007):
1) MA5: the 5-month moving average.
2) MA10: the 10-month moving average.
3) MA20: the 20-month moving average.
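4) OSC: the oscillator indicator, i.e., DIF − MACD, where
$\mathrm{DIF}_t = \mathrm{EMA12}_t - \mathrm{EMA26}_t$;
$\mathrm{MACD}_t = \mathrm{EMA9}_t$ (the 9-month EMA of DIF);
$\mathrm{EMA12}_t = (2 \times P_t + 11 \times \mathrm{EMA12}_{t-1})/13$.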
5) BIAS5: the 5-month BIAS, i.e. PRICE/MA5.
6) BIAS10: the 10-month BIAS, i.e. PRICE/MA10.
7) BIAS20: the 20-month BIAS, i.e. PRICE/MA20.
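A minimal Python sketch of the return and the technical indicators defined above follows. Two points are assumptions rather than statements from the text: each EMA is seeded with the first observation, and MACD is taken as the 9-month EMA of DIF. The function names are illustrative only.

```python
import math


def log_returns(prices):
    """RET_t = ln(P_t / P_{t-1})."""
    return [math.log(p / q) for q, p in zip(prices, prices[1:])]


def moving_average(prices, n):
    """Simple n-month moving average; None until n observations exist."""
    out = []
    for t in range(len(prices)):
        if t + 1 < n:
            out.append(None)
        else:
            out.append(sum(prices[t + 1 - n:t + 1]) / n)
    return out


def ema(values, n):
    """EMA_t = (2 * x_t + (n - 1) * EMA_{t-1}) / (n + 1).

    Seeding with the first observation is an assumption.
    """
    out = [values[0]]
    for x in values[1:]:
        out.append((2 * x + (n - 1) * out[-1]) / (n + 1))
    return out


def oscillator(prices):
    """OSC = DIF - MACD, with DIF = EMA12 - EMA26 and MACD = EMA9 of DIF."""
    dif = [a - b for a, b in zip(ema(prices, 12), ema(prices, 26))]
    macd = ema(dif, 9)
    return [d - m for d, m in zip(dif, macd)]


def bias(prices, n):
    """BIAS_n = PRICE / MA_n, as defined in the list above."""
    ma = moving_average(prices, n)
    return [None if m is None else p / m for p, m in zip(prices, ma)]
```

For a flat price series every return and oscillator value is zero and every BIAS value equals one, which gives a quick sanity check on the formulas.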
2.2.1. Linear and Nonlinear Unit Root Tests
Financial time series often exhibit trending behavior or non-stationarity in the mean. The study conducts the linear unit root tests of the three stock index series by applying the augmented Dickey-Fuller (ADF) test (Dickey & Fuller, 1979; Dickey & Fuller, 1981), the Phillips-Perron (PP) test (Phillips & Perron, 1988), and the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test (Kwiatkowski, Phillips, Schmidt, & Shin, 1992), as well as the nonlinear Kapetanios-Shin-Snell (KSS) test (Kapetanios, Shin, & Snell, 2003). The ADF test regression includes lags of the first differences of $Y_t$, and the corresponding three models are expressed in the following equations:
$$\Delta Y_t = \gamma Y_{t-1} + \sum_{i=1}^{k} \delta_i \Delta Y_{t-i} + \varepsilon_t \quad (1)$$
$$\Delta Y_t = \alpha + \gamma Y_{t-1} + \sum_{i=1}^{k} \delta_i \Delta Y_{t-i} + \varepsilon_t \quad (2)$$
$$\Delta Y_t = \alpha + \beta t + \gamma Y_{t-1} + \sum_{i=1}^{k} \delta_i \Delta Y_{t-i} + \varepsilon_t \quad (3)$$
where $t$ is the time index, $\alpha$ is an intercept constant called a drift, $\beta$ is the coefficient on a time trend, $\gamma$ is the coefficient presenting the process root, i.e., the focus of testing, $k$ is the lag order of the first-differences autoregressive process, and $\varepsilon_t$ is an independent identically distributed residual term.
Model (1) is a pure random walk with the lag terms. Model (2) possesses a drift. Model (3) includes a drift and a time trend. The null hypothesis for the ADF test is $H_0: \gamma = 0$, with the alternative $H_1: \gamma < 0$. The ADF t-test statistic is $t_{\hat{\gamma}} = \hat{\gamma} / \mathrm{SE}(\hat{\gamma})$.
The PP test differs from the ADF test mainly in how it deals with serial correlation and heteroscedasticity in the error term. The PP test does not require specifying the form of the serial correlation in the error term under the null, nor does it require the errors to be conditionally homoscedastic. The ADF and PP unit root tests are for the null hypothesis that a time series is integrated of order one, I(1). The KPSS test, on the other hand, is for the null that the series is integrated of order zero, I(0). In addition, the KSS test is applied because the above linear unit root tests may suffer from important power distortions in the presence of nonlinearities in the data generating process.
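To make the testing idea concrete, the following Python sketch computes the Dickey-Fuller t-statistic for the simplest case only: Model (1) with no drift, no trend, and no lagged differences (k = 0). This is a deliberate simplification rather than the full ADF, PP, KPSS, or KSS procedure used in the study, and the names `dickey_fuller_t` and `gamma` are illustrative.

```python
import math
import random


def dickey_fuller_t(y):
    """OLS estimate and t-statistic of gamma in dY_t = gamma * Y_{t-1} + e_t."""
    lagged = y[:-1]
    diff = [b - a for a, b in zip(y, y[1:])]
    sxx = sum(x * x for x in lagged)
    gamma = sum(x * d for x, d in zip(lagged, diff)) / sxx
    resid = [d - gamma * x for x, d in zip(lagged, diff)]
    s2 = sum(e * e for e in resid) / (len(diff) - 1)  # residual variance
    se = math.sqrt(s2 / sxx)                          # standard error of gamma
    return gamma, gamma / se


# Simulated stationary AR(1) series with coefficient 0.2 (toy data).
rng = random.Random(0)
series = [0.0]
for _ in range(199):
    series.append(0.2 * series[-1] + rng.gauss(0.0, 1.0))

gamma_hat, t_stat = dickey_fuller_t(series)
```

The statistic does not follow the usual t distribution under the null, so it has to be compared with Dickey-Fuller critical values (roughly −1.95 at the 5% level for the no-constant case). A strongly negative value rejects the unit root.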
2.2.2. The ARMA(p, q) Model as the Benchmark
In this study, the ARMA(p, q) model is used as the benchmark model. The stationarity of the returns series is checked using the unit root tests. The estimation of the ARMA models for the three stock index returns involves selecting appropriate ARMA(p, q) orders, setting the sliding window of the training sample, and making one-month-ahead forecasts.
2.2.3. Development of Augmented GA_ANN (AGA_ANN) Model
The traditional genetic algorithm estimation procedure includes initialization, reproduction, genetic operations (including crossover and mutation), heuristics, and termination. As shown in Figure 1, the ANN model consists of three stages, i.e., input, hidden layer, and output. The components of an ANN include neurons, connections and weights, a propagation function, ANN parameters (including the learning rate, the number of hidden layers, and the batch size), weight adjustment, backpropagation, and self-learning.
The rationale of the newly proposed augmented GA_ANN (namely, AGA_ANN) model is to adopt the advantages of GA and ANN so as to improve the forecasting accuracy. The transformation functions from the input node, the hidden layer node, to the output node are as follows: (the λh and λo are transformation parameters.)
$H_j$ is the jth hidden unit; $\hat{Y}_l$ is the forecasted output; $X_i$ is the input variable; $W_{ji}$ is the weight of the input variable; $W_{lj}$ is the weight of the hidden unit.
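As a rough illustration of how an input is propagated through one hidden layer to a single output, the sketch below assumes a logistic transformation whose steepness is set by the transformation parameters (λh for the hidden layer, λo for the output). The functional form is an assumption, since the original equations are not reproduced here, and the function names are hypothetical.

```python
import math


def logistic(z, lam=1.0):
    """Logistic transformation with steepness parameter lam (assumed form)."""
    return 1.0 / (1.0 + math.exp(-lam * z))


def forward(x, w_hidden, w_out, lam_h, lam_o):
    """One forward pass through a single-hidden-layer network.

    x           : list of input variables
    w_hidden[j] : weights W_ji feeding hidden unit j
    w_out[j]    : weight W_lj from hidden unit j to the single output
    """
    hidden = [logistic(sum(w * xi for w, xi in zip(row, x)), lam_h)
              for row in w_hidden]
    output = logistic(sum(w * h for w, h in zip(w_out, hidden)), lam_o)
    return hidden, output


# Toy example: two inputs, two hidden units, one output.
hidden, y_hat = forward([0.0, 0.0], [[0.3, 0.7], [0.5, 0.1]], [1.0, 1.0],
                        lam_h=0.8, lam_o=1.0)
```

With all inputs at zero each hidden unit equals 0.5 regardless of the weights, so the output reduces to the logistic function evaluated at the sum of the output weights times 0.5.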
The detailed AGA_ANN estimation procedure is as follows:
Figure 1. The AGA_ANN model.
1) Variables transformation
a) Dependent variables
To improve the simulated performance, the three stock index return series are transformed by using the following logistic function. The transformed series (Y1) is then converted into a binary (0 or 1) series (Y).
Y is one when Y1 is greater than or equal to 0.5; otherwise Y is zero.
b) Independent variables
The independent variables are standardized to have a mean of zero and a standard deviation of one. The standardized series is then mapped into the interval between zero and one by the logistic function.
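The two transformations can be sketched in Python as follows. The plain logistic function 1/(1 + exp(−x)) and the use of the sample standard deviation are assumptions, because the text does not reproduce the exact formula, and the function names are illustrative.

```python
import math
import statistics


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def transform_dependent(returns):
    """Map returns to Y1 in (0, 1), then to the binary series Y."""
    y1 = [logistic(r) for r in returns]
    y = [1 if v >= 0.5 else 0 for v in y1]
    return y1, y


def transform_independent(values):
    """Standardize to mean 0 and standard deviation 1, then apply the logistic map."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption)
    return [logistic((v - mean) / sd) for v in values]


y1, y = transform_dependent([-0.02, 0.0, 0.03])
x_scaled = transform_independent([1.0, 2.0, 3.0])
```

Under this assumed form, Y1 ≥ 0.5 exactly when the return is non-negative, so Y simply encodes the direction of the monthly move.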
2) The sliding window span parameters
In this study, sliding window spans of 24, 30, 36, 42, and 48 months are simulated as the training base. The base data are used to simulate the AGA_ANN model, and the best simulated parameters are adopted for making the one-month-ahead forecast. The sliding window then moves one period ahead and the AGA_ANN model is simulated again, until the end of the observations.
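The sliding window scheme can be sketched as a simple index generator in Python. The function name `sliding_windows` is hypothetical, and the model fitting itself is left out.

```python
def sliding_windows(n_obs, span):
    """Yield (train_start, train_end, forecast_index), with train_end exclusive.

    Each window trains on `span` consecutive months and forecasts the
    month immediately after the window.
    """
    for start in range(0, n_obs - span):
        yield start, start + span, start + span


N_OBS = 228                    # Jan. 2001 to Dec. 2019
SPANS = (24, 30, 36, 42, 48)   # training bases in months

counts = {span: sum(1 for _ in sliding_windows(N_OBS, span)) for span in SPANS}
total_per_index = sum(counts.values())
```

With 228 months, the five spans give 204, 198, 192, 186, and 180 one-month-ahead forecasts per index, which agrees with the series lengths listed in the note to Table 5. Over the three indices this adds up to 2880 forecasts.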
3) The initialization of Wji and Wlj parameters
The weights Wji and Wlj are initialized with random draws from a uniform distribution between zero and one.
4) The selection of simulated independent variables (IV) and hidden units
In this study, the number of simulated independent variables (M) ranges from 6 to NVAR/2, where NVAR is the total number of predetermined variables. For each simulation, 100 randomly selected sets of variables are generated. The number of hidden units (J) ranges from M/2 to M.
5) The GA procedure
Using the core ANN estimation, the 100 sets are ranked by their hit ratios. The top 10 sets are kept. The variables in the middle 80 sets are exchanged by the crossover method. The worst 10 sets are discarded and 10 new sets are created. The resulting 100 sets are used for the next run.
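Steps 4) and 5) can be sketched in Python as below. This is an illustration under assumptions, not the authors' implementation: the crossover rule (resampling a child from the pooled variables of two parents), the rounding of M/2, and the toy fitness function are all hypothetical. In the study the fitness is the hit ratio from the ANN estimation.

```python
import random


def random_subset(n_var, rng):
    """Draw M in [6, n_var // 2] and pick M distinct variable indices."""
    m = rng.randint(6, n_var // 2)
    return sorted(rng.sample(range(n_var), m))


def random_hidden_units(m, rng):
    """Draw J between M/2 and M (rounding M/2 down is an assumption)."""
    return rng.randint(m // 2, m)


def crossover(a, b, n_var, rng):
    """Hypothetical crossover: resample a child from both parents' variables."""
    pool = sorted(set(a) | set(b))
    m = min((len(a) + len(b)) // 2, n_var // 2)
    return sorted(rng.sample(pool, m))


def next_generation(population, fitness, n_var, rng, n_keep=10, n_new=10):
    """Keep the best sets, cross over the middle sets, replace the worst sets."""
    ranked = sorted(population, key=fitness, reverse=True)
    elite = ranked[:n_keep]
    middle = ranked[n_keep:len(ranked) - n_new]
    rng.shuffle(middle)
    children = []
    for a, b in zip(middle[0::2], middle[1::2]):
        children.append(crossover(a, b, n_var, rng))
        children.append(crossover(b, a, n_var, rng))
    fresh = [random_subset(n_var, rng) for _ in range(n_new)]
    return elite + children + fresh


rng = random.Random(42)
N_VAR = 48  # e.g. the technical-indicator data set
population = [random_subset(N_VAR, rng) for _ in range(100)]
toy_fitness = len  # placeholder only; stands in for the ANN hit ratio
new_population = next_generation(population, toy_fitness, N_VAR, rng)
```

The population size stays at 100 after each run: 10 retained sets, 80 sets produced by crossover, and 10 newly drawn sets.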
6) The randomization of transformation and learning parameters
In this study, the transformation and learning parameters are drawn from a uniform distribution between 0.5 and 1.0. For each simulation, 10 randomly selected sets are generated.
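A minimal Python sketch of this randomization step follows. The parameter labels LAMH, LAMO, ETAH, and ETAO follow the note to Table 5, while the function name and the seed argument are illustrative assumptions.

```python
import random


def draw_parameter_sets(n_sets=10, low=0.5, high=1.0, seed=None):
    """Draw n_sets of transformation (LAM*) and learning (ETA*) parameters."""
    rng = random.Random(seed)
    names = ("LAMH", "LAMO", "ETAH", "ETAO")
    return [{name: rng.uniform(low, high) for name in names}
            for _ in range(n_sets)]


parameter_sets = draw_parameter_sets(seed=0)
```

Each of the 10 sets would then be passed to the ANN estimation, and the set with the best training performance retained.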
7) The one-month ahead forecast
For each simulation, the best simulated parameters are used to make a one-month ahead forecast until the end of observation.
8) The computation of performance indices
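In this study, the four performance indices are as follows:
a) MAPE
The forecasted value Y is converted into a forecasted stock index $\hat{P}_t$. The equation of the mean absolute percentage error (MAPE) is listed below ($P_t$ is the actual stock index at time t and $n$ is the number of forecasted months):
$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \left| \frac{P_t - \hat{P}_t}{P_t} \right|$$
b) Hit ratio
The equation of the hit ratio (HR) is listed below:
$$\mathrm{HR} = \frac{100\%}{n} \sum_{t=1}^{n} \mathrm{HIT}_t$$
where $\mathrm{HIT}_t = 1$ if $\mathrm{RET}_t \times \mathrm{PRET}_t > 0$; $\mathrm{HIT}_t = 0$ otherwise.
c) ARV
The equation of the average relative variance (ARV) is listed below:
$$\mathrm{ARV} = \frac{\sum_{t=1}^{n} (P_t - \hat{P}_t)^2}{\sum_{t=1}^{n} (P_t - \bar{P})^2}$$
where $\bar{P}$ is the monthly average stock index.
d) Theil’s U
The equation of Theil’s U (the U2 measure) is listed below:
$$U_2 = \sqrt{\frac{\sum_{t=1}^{n-1} \left( \frac{\hat{P}_{t+1} - P_{t+1}}{P_t} \right)^2}{\sum_{t=1}^{n-1} \left( \frac{P_{t+1} - P_t}{P_t} \right)^2}}$$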
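The four indices can be computed with a few lines of Python. The sketch uses the standard textbook definitions, which are assumed to match the ones intended here. In particular, Theil's U is implemented as the U2 form that compares the model with a naive no-change forecast. The step that converts the forecasted Y into a forecasted index is not covered.

```python
import math


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - f) / a)
                       for a, f in zip(actual, forecast)) / len(actual)


def hit_ratio(ret, pret):
    """Share of months where actual and predicted returns have the same sign."""
    return sum(1 for r, p in zip(ret, pret) if r * p > 0) / len(ret)


def arv(actual, forecast):
    """Average relative variance: model error relative to the mean forecast."""
    mean = sum(actual) / len(actual)
    num = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    den = sum((a - mean) ** 2 for a in actual)
    return num / den


def theil_u2(actual, forecast):
    """Theil's U2 (assumed form): forecast[t] predicts actual[t], and the
    benchmark is the naive no-change forecast actual[t-1]."""
    num = sum(((f - a) / prev) ** 2
              for prev, a, f in zip(actual, actual[1:], forecast[1:]))
    den = sum(((a - prev) / prev) ** 2
              for prev, a in zip(actual, actual[1:]))
    return math.sqrt(num / den)
```

An ARV of one means the model does no better than always forecasting the sample mean, and a U2 of one means it does no better than the naive no-change forecast. Values below one indicate an improvement over the respective benchmark.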
3. Empirical Results
3.1. Descriptive Statistics
The descriptive statistics are shown in Table 1. Three subjects are under study, namely, the market, electronic, and financial stock indices. Monthly data are listed from Jan. 2000 to Dec. 2019, a total of 240 months for each subject. Seven technical indicators and eight macroeconomic variables are listed. Year 2000 serves as an extra year for creating the lagged values of the predetermined variables, including the lagged dependent and independent variables. The actual simulation starts from Jan. 2001.
3.2. The Results of Linear and Nonlinear Unit Root Tests
A nonstationary time series might lead to spurious regression. The linear ADF, PP, and KPSS unit root tests and the nonlinear KSS unit root test are conducted for the MKT, ELEC, and FINA returns. Tables 2-4 show the results, which indicate that all three series are stationary. Note that an insignificant KPSS test statistic means that the null of stationarity is not rejected, which is consistent with a stationary series.
3.3. The Simulated Parameters of the Three Models
Using SAS-IML and the FARMAFIT function call, the estimation and sliding window simulation of the ARMA(p, q) model yield p = 3 and q = 2 throughout the entire simulation process.
In Table 5, the simulated parameters of the technical indicators (TECH) and macroeconomic factors (MACRO) show that the total number of forecasted observations is 2880. The number of predetermined variables (M) ranges from 6 to 24. The number of hidden units (J) ranges from 3 to 24. The mean values of the training sample’s hit ratios for TECH and MACRO are 71.49% and 71.62%, respectively. The mean values of the transformation parameters (LAMH, LAMO) for TECH and MACRO are (0.8113, 0.7989) and (0.8052, 0.7894), respectively.
Table 1. Descriptive statistics of the variables.
Note: IND = 1 for Market; IND = 2 for ELEC; IND = 3 for FINA.
Table 2. Unit root test results for the MKT returns.
Note: *, **, and *** denote significance at the 10%, 5%, and 1% levels, respectively.
Table 3. Unit root test results for the ELEC returns.
Note: *, **, and *** denote significance at the 10%, 5%, and 1% levels, respectively.
Table 4. Unit root test results for the FINA returns.
Note: *, **, and *** denote significance at the 10%, 5%, and 1% levels, respectively.
Table 5. The simulated parameters for TECH and MACRO.
Note: N = 3 sectors in 5-base forecasted series of 204, 198, 192, 186, 180; IND = 1 for Market; IND = 2 for ELEC; IND = 3 for FINA. YM is the year-month; SDATE is the date of simulated series; M = # of Indep. Var; J = # of hidden units; HR is the training sample’s hit ratio; LAMH and LAMO are the transformation coefficients for hidden and output transformation; ETAH and ETAO are the learning rates for the hidden and output weights; PY is the predicted Y.
3.4. The Performance Comparison of the Three Models
1) The TECH model has the best overall MAPE. The MACRO model has the best overall HR and ARV. The ARMA model has the best THEIL_U.
2) In terms of the market stock index, the ARMA model has the best MAPE. The MACRO model has the best HR. The TECH model has the best ARV and THEIL_U.
3) In terms of the electronic stock index, the TECH model has the best MAPE and HR. The MACRO model has the best ARV. The ARMA model has the best THEIL_U.
4) In terms of the financial stock index, the MACRO model has the best MAPE and HR. The TECH model has the best ARV. The ARMA model has the best THEIL_U.
5) In terms of the training base in MAPE and HR, the best base for the market stock index is between 30 and 48 months. The best base for the electronic stock index is between 42 and 48 months. The best base for the financial stock index is between 42 and 48 months. Thus, a training base of 42 to 48 months exhibits better forecasting performance.
Table 6. The MAPE and HR performance measures.
Table 7. The ARV and THEIL_U performance measures.
In sum, previous studies show that daily stock index forecasts are quite satisfactory. The monthly stock index forecasts here tell a different story, which suggests that forecasting monthly data might be even more difficult than forecasting daily data. The overall forecasting performance of the TECH and MACRO models shows little difference. The electronic and financial stock indices have out-of-sample hit ratios of 77.78% and 68.89%, respectively. Thus, these two stock indices might be suitable for making meaningful investment decisions.
4. Conclusion and Discussion
The study attempted to compare the forecasting efficiency of macroeconomic factors and technical indicators for stock indices by using augmented GA and ANN models. Three models are examined: the ARMA model as the benchmark, GA_ANN with macroeconomic factors (MACRO), and GA_ANN with technical indicators (TECH). The empirical findings are summarized as follows:
1) The overall forecasting performance between MACRO and TECH models shows little difference. The electronic and financial stock indices have the out-of-sample hit ratios of 77.78% and 68.89%, respectively. Thus, these two stock indices may be suitable for making meaningful investment decisions.
2) The best training base for the market stock index is between 30 and 48 months. The best base for the electronic stock index is between 42 and 48 months. The best base for the financial stock index is between 42 and 48 months. Thus, a training base of 42 to 48 months exhibits better forecasting performance.
3) The optimal transformation parameters under ANN may range from 0.50 to 0.99 and may not be a constant parameter.
Due to the complexity of the augmented GA_ANN model, tremendous computing time and effort are involved. The study found that forecasting the monthly stock index may be more challenging than forecasting daily data. Further theoretical and empirical work is needed. Specifically, previous studies have adopted many different types of models, variables, and data frequencies. All of these aspects require extensive and prudent investigation.
References
Chen, A., Leung, M., & Daouk, H. (2003). Application of Neural Networks to an Emerging Financial Market: Forecasting and Trading the Taiwan Stock Index. Computers & Operations Research, 30, 901-923.
Dickey, D. A., & Fuller, W. A. (1979). Distribution of the Estimators for Autoregressive Time Series with a Unit Root. Journal of the American Statistical Association, 74, 427-431.
Kai, F., & Wenhua, X. (1997). Training Neural Network with Genetic Algorithms for Forecasting the Stock Price Index. In Proceedings of the 1997 IEEE International Conference on Intelligent Processing Systems (pp. 401-403).
Khandelwal, I., Adhikari, R., & Verma, G. (2015). Time Series Forecasting Using Hybrid ARIMA and ANN Models Based on DWT Decomposition. Procedia Computer Science, 48, 173-179.
Kim, K. J., & Han, I. (2000). Genetic Algorithms Approach to Feature Discretization in Artificial Neural Networks for the Prediction of Stock Price Index. Expert Systems with Applications, 19, 125-132.
Kvainickas, T. S., & Stankevičienė, J. (2019). Regional Limitations of Stock Indices Prediction Models Based on Macroeconomic Variables. Economics and Culture, 16, 1-16.
Kwiatkowski, D., Phillips, P. C. B., Schmidt, P., & Shin, Y. (1992). Testing the Null Hypothesis of Stationarity against the Alternative of a Unit Root: How Sure Are We That Economic Time Series Have a Unit Root? Journal of Econometrics, 54, 159-178.
Laichena, K. E., & Obwogi, T. N. (2015). Effects of Macroeconomic Variables on Stock Returns in the East African Community Stock Exchange Market. International Journal of Education and Research, 3, 305-320.
Paluch, M., & Jackowska-Strumiłło, L. (2012). Prediction of Closing Prices on the Stock Exchange with the Use of Artificial Neural Networks. Image Processing & Communications, 17, 275-282.
Paluch, M., & Jackowska-Strumiłło, L. (2018). Hybrid Models Combining Technical and Fractal Analysis with ANN for Short-Term Prediction of Close Values on the Warsaw Stock Exchange. Applied Sciences, 8, 2473.
Sutheebanjard, P., & Premchaiswadi, W. (2010). Stock Exchange of Thailand Index Prediction Using Back Propagation Neural Networks. In Proceedings of the Second International Conference on Computer and Network Technology (pp. 377-380).
Tilakaratne, C. D., Morris, S. A., Mammadov, M. A., & Hurst, C. P. (2007). Predicting Stock Market Index Trading Signals Using Neural Networks. In Proceedings of the 14th Annual Global Finance Conference (pp. 171-179).