Gold Price Prediction Based on PCA-GA-BP Neural Network

1. Introduction

Gold has long been a symbol of wealth and is widely used in currency, jewelry and other industries. The current gold market is characterized by the coexistence of high yield and high risk. Through practice, a number of gold price theories have gradually been developed. However, many studies show that, owing to economic, political, human and market factors, the price of gold is highly random and nonlinear [1] [2] [3] [4] .

ARIMA [5] [6] is usually suited only to data with a linear relationship. The grey prediction method [7] treats irregular changes in the data as interference and removes them during forecasting. As a result, grey prediction has strong inertia, does not account for the randomness of the system, is insensitive to fluctuation trends, and is better suited to data with certain characteristic features. Other approaches include the FAR model [8] and the GARCH model [9] . None of these methods is well suited to predicting gold prices.

The opening price, closing price, highest price, lowest price, change amount, change rate, trading volume and turnover of gold are the combined result of various macro and micro factors. A large number of underlying regularities and factor indicators drive changes in the gold price, so studying these data makes it possible to grasp the trend of gold prices to a certain extent. Using historical market data to study gold prices is therefore worthwhile.

PCA selects principal components according to a fixed rule. This simplifies the original multidimensional data, removes the correlation and redundant information among the network inputs, reduces the number of inputs, and retains the main information of the original data. However, PCA cannot capture nonlinear relationships in the data.

The BP neural network is a feedforward neural network [10] . Its simple structure, ease of use and, in particular, its ability to learn complex nonlinear mappings make it a good candidate model for gold price prediction. A BP network does not require a specific mathematical model to be established; it approaches a solution directly by iterating over the input and output data. However, it converges slowly, easily falls into local minima, and oscillates near the optimal solution.

GA can globally optimize the weights and thresholds of the neural network and obtain an approximation of the optimal solution, from which the BP neural network can then refine toward the optimum. GA can thus mitigate these weaknesses of the BP neural network. We therefore propose a PCA-GA-BP neural network model that combines PCA, GA and BP for short-term prediction of gold prices.

2. Related Principles

2.1. Principal Component Analysis (PCA)

PCA is a statistical technique. It analyzes the original data to obtain the cumulative contribution rate and, from it, the principal components. The reconstructed data retain the primary information of the original data, which reduces both the correlation among the original variables and the dimension of the data [11] . The original data are represented by the matrix X (n × p). The calculation steps are as follows:

1) Data standardization: ${Y}_{ij}$ is the standardized variable.

${Y}_{ij}=\frac{{X}_{ij}-{\bar{X}}_{j}}{{S}_{j}}$ , $\left(i=1,2,\ldots ,n;\ j=1,2,\ldots ,p\right)$ (1)

where ${\bar{X}}_{j}$ is the average of the jth variable and ${S}_{j}$ is the standard deviation of the jth variable. The definitions of ${\bar{X}}_{j}$ and ${S}_{j}$ are as follows:

${\bar{X}}_{j}=\frac{1}{n}{\displaystyle \sum_{i=1}^{n}{X}_{ij}}$ , ${S}_{j}=\sqrt{\frac{1}{n-1}{\displaystyle \sum_{i=1}^{n}{\left({X}_{ij}-{\bar{X}}_{j}\right)}^{2}}}$

2) Calculate the correlation coefficient matrix R, as in (2):

$R={Y}^{T}Y/\left(n-1\right)$ (2)

3) Calculate the eigenvector matrix A and the eigenvalue matrix $\lambda $ of R:

$RA=A\lambda $ (3)

4) Determine the principal components by calculating each component's contribution rate and the cumulative contribution rate. Select the first m principal components whose cumulative contribution rate is not less than 85%. The contribution rate of the kth principal component is given by (4):

${\lambda}_{k}/{\displaystyle \underset{j=1}{\overset{p}{\sum}}{\lambda}_{j}}$ , (k = 1, 2, ..., p) (4)

where, $\lambda =\left({\lambda}_{1},{\lambda}_{2},\mathrm{...},{\lambda}_{p}\right)$ and ${\lambda}_{1}\ge {\lambda}_{2}\ge \cdots \ge {\lambda}_{p}$

The cumulative contribution rate of the first k principal components is given by (5):

${\displaystyle \underset{j=1}{\overset{k}{\sum}}{\lambda}_{j}}/{\displaystyle \underset{j=1}{\overset{p}{\sum}}{\lambda}_{j}}$ , (k = 1, 2, ..., p) (5)

5) ${\alpha}_{1},{\alpha}_{2},\mathrm{...},{\alpha}_{m}$ are the eigenvectors corresponding to the eigenvalues ${\lambda}_{1},{\lambda}_{2},\mathrm{...},{\lambda}_{m}$ respectively. The sample data expressed through the principal components are:

$Z={\displaystyle \underset{i=1}{\overset{m}{\sum}}{\alpha}_{i}{Y}_{i}}$ (6)
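Steps 1) to 5) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `pca_by_correlation`, the synthetic data and the use of `numpy.linalg.eigh` are assumptions. The scores are returned as one column per principal component, which is how Section 3 uses them.

```python
import numpy as np

def pca_by_correlation(X, threshold=0.85):
    """Standardize, correlate, decompose, select and project (steps 1 to 5)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Step 1: z-score each column with the sample standard deviation (n - 1), as in (1).
    Y = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Step 2: correlation coefficient matrix R = Y^T Y / (n - 1), as in (2).
    R = Y.T @ Y / (n - 1)
    # Step 3: eigen-decomposition of the symmetric matrix R, sorted in descending order.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Step 4: contribution rates (4), cumulative rates (5), smallest m reaching the threshold.
    contribution = eigvals / eigvals.sum()
    cumulative = np.cumsum(contribution)
    m = min(int(np.searchsorted(cumulative, threshold)) + 1, len(eigvals))
    # Step 5: project the standardized data onto the first m eigenvectors.
    Z = Y @ eigvecs[:, :m]
    return Z, contribution, cumulative, m

# Synthetic stand-in for the 110 x 8 data set (hypothetical values, not the paper's data):
# eight columns driven by three latent factors plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(110, 3))
X_demo = latent @ rng.normal(size=(3, 8)) + 0.01 * rng.normal(size=(110, 8))
Z, contribution, cumulative, m = pca_by_correlation(X_demo)
print(m, Z.shape)
```

Because the synthetic columns are built from three latent factors, at most three components are needed to pass the 85% threshold here.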

2.2. BP Neural Network

The BP neural network uses gradient descent to turn the sample input/output problem into a nonlinear optimization problem [12] . BP is a typical supervised learning algorithm [13] : the weights and thresholds of the network are adjusted repeatedly until the output error reaches its minimum. The process is divided into two steps:

Forward propagation: The data pass through the input layer, hidden layer and output layer in turn. The actual output is compared with the expected output at the output layer. If the output error has not reached the expected error, the network enters back propagation.

Back propagation: The error signal is transmitted from the output layer through the hidden layer to the input layer. During this process, the weight of every neuron in each layer is corrected along the negative gradient of the error function, so that the error is continuously reduced and the actual output approaches the desired output.

After training, the neural network has learned the relationship between the input and output variables, and the trained model can then predict the output from new input variables [14] [15] .
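The two steps can be illustrated with a small NumPy network. This is a sketch under stated assumptions rather than the configuration used in the experiments: one hidden layer with a tanh activation, a linear output, a mean squared error loss, full-batch updates, and toy data are all choices made here for illustration.

```python
import numpy as np

def train_bp(X, y, hidden=5, lr=0.05, epochs=2000, seed=0):
    """Train a one-hidden-layer network by full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], hidden))  # input -> hidden weights
    b1 = np.zeros(hidden)                                  # hidden thresholds
    W2 = rng.normal(scale=0.5, size=(hidden, 1))           # hidden -> output weights
    b2 = np.zeros(1)                                       # output threshold
    history = []
    for _ in range(epochs):
        # Forward propagation: input layer -> hidden layer (tanh) -> output layer (linear).
        H = np.tanh(X @ W1 + b1)
        out = H @ W2 + b2
        err = out - y
        history.append(float(np.mean(err ** 2)))
        # Back propagation: gradient of the mean squared error, layer by layer.
        g_out = 2.0 * err / len(X)
        g_W2 = H.T @ g_out
        g_b2 = g_out.sum(axis=0)
        g_H = (g_out @ W2.T) * (1.0 - H ** 2)  # derivative of tanh is 1 - tanh^2
        g_W1 = X.T @ g_H
        g_b1 = g_H.sum(axis=0)
        # Move every weight and threshold against its gradient.
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2
    return (W1, b1, W2, b2), history

# Toy regression target (hypothetical data): y = x^2 on [-1, 1].
X_toy = np.linspace(-1.0, 1.0, 50).reshape(-1, 1)
y_toy = X_toy ** 2
params, history = train_bp(X_toy, y_toy)
print(history[0], history[-1])
```

The recorded error history plays the role of a training error curve: it should fall as the weights are corrected.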

2.3. GA-Optimized BP

GA simulates the principle of biological evolution [16] . It treats the problem as an evolutionary process and searches the solution space randomly and in parallel. GA repeatedly selects individuals and generates new ones through selection, crossover and mutation until the termination condition is satisfied [17] . The steps for optimizing the BP network with GA are as follows:

1) Encoding and population initialization: Floating-point encoding is used, and each individual contains all the weights and thresholds of the network. $R$ is the number of nodes in the input layer, ${S}_{1}$ is the number of nodes in the hidden layer, ${S}_{2}$ is the number of nodes in the output layer, and $S$ is the length of an individual:

$S=R\ast {S}_{1}+{S}_{1}\ast {S}_{2}+{S}_{1}+{S}_{2}$ (7)

The population size has a great influence on the global search performance of the genetic algorithm, so it must be chosen according to the specific problem. Here the initial population size is 50.

2) Evaluation function: The training samples are input and the error function value is calculated. The reciprocal of the error function value is taken as the fitness, so a smaller error gives a greater fitness, as in (8). E is the sum of squared errors between the predicted output and the expected output, and $\delta $ is a small positive constant that keeps the denominator from being zero.

$f=1/\left(E+\delta \right)$ (8)

3) Selection: Roulette wheel selection is used as the selection operator. ${P}_{i}$ is the probability that individual i is selected, ${f}_{i}$ is the fitness value of individual i, and n is the population size.

${P}_{i}={f}_{i}/{\displaystyle \underset{j=1}{\overset{n}{\sum}}{f}_{j}}$ (9)

4) Crossover: Crossover is the primary search operation of GA. Non-optimal individuals are paired and crossed with a crossover probability of ${P}_{c}$ ( ${P}_{c}$ = 0.7) to produce two new individuals, while the optimal individuals pass directly to the next generation. The maximum number of iterations is 100.

5) Mutation: Non-optimal individuals mutate with a probability of ${P}_{m}$ ( ${P}_{m}$ = 0.1) to produce new individuals, while the optimal individuals pass directly to the next generation.

6) Repeat steps 2) - 5) until the error goal or the maximum number of iterations is reached.
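Steps 1) to 6) can be sketched as follows, using the stated settings (population 50, ${P}_{c}$ = 0.7, ${P}_{m}$ = 0.1, 100 iterations). The text does not specify the crossover or mutation operators, so the arithmetic crossover, the single-gene Gaussian mutation, the initial range of [-1, 1] and the toy error function below are assumptions made for illustration.

```python
import numpy as np

def chromosome_length(R, S1, S2):
    # Equation (7): both weight matrices plus both sets of thresholds.
    return R * S1 + S1 * S2 + S1 + S2

def fitness(E, delta=1e-8):
    # Equation (8): reciprocal of the error, with a small delta in the denominator.
    return 1.0 / (E + delta)

def roulette_select(fit, rng):
    # Equation (9): pick an index with probability proportional to fitness.
    return rng.choice(len(fit), p=fit / fit.sum())

def run_ga(error_fn, length, pop_size=50, pc=0.7, pm=0.1, generations=100, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-1.0, 1.0, size=(pop_size, length))  # floating-point encoding
    best, best_err = None, np.inf
    for _ in range(generations):
        errors = np.array([error_fn(ind) for ind in pop])
        fit = fitness(errors)
        elite = pop[np.argmin(errors)].copy()
        if errors.min() < best_err:
            best, best_err = elite.copy(), float(errors.min())
        children = [elite]  # the best individual passes to the next generation unchanged
        while len(children) < pop_size:
            a = pop[roulette_select(fit, rng)].copy()
            b = pop[roulette_select(fit, rng)].copy()
            if rng.random() < pc:
                # Arithmetic crossover (assumed operator): blend the two parents.
                w = rng.random()
                a, b = w * a + (1.0 - w) * b, w * b + (1.0 - w) * a
            for child in (a, b):
                if rng.random() < pm:
                    # Mutation (assumed operator): perturb one randomly chosen gene.
                    child[rng.integers(length)] += rng.normal(scale=0.1)
                children.append(child)
        pop = np.array(children[:pop_size])
    return best, best_err

def sphere_error(w):
    # Toy error function (hypothetical): sum of squares, minimized at the origin.
    return float(np.sum(w ** 2))

print(chromosome_length(9, 13, 1))
best, best_err = run_ga(sphere_error, length=4)
print(best_err)
```

In the full method, `error_fn` would be the network's sum of squared errors on the training samples, and the best individual would supply the initial weights and thresholds for BP training.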

The PCA-processed data serve as the input of the GA-optimized BP network, which helps it converge quickly. Because PCA greatly reduces the complexity of the input data and of neural network training, it can also improve prediction accuracy. GA overcomes the slow convergence of the BP network and its tendency to fall into local minima. Combining the three algorithms lets each contribute its own advantage: fewer inputs, faster convergence and a search for the global optimum. The combined model also has good nonlinear modeling capability, and the overall performance of the network improves.
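To connect the GA encoding with the network, an individual of length S has to be unpacked into weights and thresholds before its error E can be evaluated. The sketch below assumes the genes are laid out in the order of the terms in (7) and that the hidden layer uses tanh with a linear output; neither detail is stated in the text, and the helper names are hypothetical.

```python
import numpy as np

def decode(chromosome, R, S1, S2):
    """Split a flat individual into weights and thresholds, following the order of (7)."""
    chromosome = np.asarray(chromosome, dtype=float)
    if chromosome.size != R * S1 + S1 * S2 + S1 + S2:
        raise ValueError("chromosome length does not match equation (7)")
    i = 0
    W1 = chromosome[i:i + R * S1].reshape(R, S1)    # input -> hidden weights
    i += R * S1
    W2 = chromosome[i:i + S1 * S2].reshape(S1, S2)  # hidden -> output weights
    i += S1 * S2
    b1 = chromosome[i:i + S1]                       # hidden thresholds
    i += S1
    b2 = chromosome[i:i + S2]                       # output thresholds
    return W1, W2, b1, b2

def network_error(chromosome, X, y, R, S1, S2):
    """Sum of squared errors E used in the fitness function (8)."""
    W1, W2, b1, b2 = decode(chromosome, R, S1, S2)
    hidden = np.tanh(X @ W1 + b1)   # assumed hidden activation
    out = hidden @ W2 + b2          # linear output layer
    return float(np.sum((out - y) ** 2))

# Example with 9 inputs, 13 hidden nodes and 1 output (placeholder data).
R, S1, S2 = 9, 13, 1
length = R * S1 + S1 * S2 + S1 + S2
W1, W2, b1, b2 = decode(np.zeros(length), R, S1, S2)
E = network_error(np.zeros(length), np.ones((5, R)), np.ones((5, 1)), R, S1, S2)
print(length, W1.shape, E)
```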

3. PCA Standardized Raw Data

A total of 110 valid records from the Shanghai Gold Exchange, covering 2016/5/23 to 2016/11/3, are selected as the raw data. The opening price, closing price, highest price, lowest price, change amount, change rate, trading volume and turnover serve as the eight input variables.

1) The raw data are shown in Table 1.

2) Standardize the raw data. Each variable is normalized with (1) to eliminate differences in magnitude and dimension between variables. The standardized data are shown in Table 2.

3) Calculate the correlation coefficient matrix R with (2).

$R=\left\{\begin{array}{llllllll}1.0000\hfill & 0.9591\hfill & -0.1626\hfill & -0.1659\hfill & 0.9800\hfill & 0.9793\hfill & -0.2077\hfill & -0.1160\hfill \\ 0.9591\hfill & 1.0000\hfill & 0.0917\hfill & 0.0885\hfill & 0.9798\hfill & 0.9820\hfill & -0.2093\hfill & -0.1160\hfill \\ -0.1626\hfill & 0.0917\hfill & 1.0000\hfill & 0.9993\hfill & -0.0509\hfill & -0.0196\hfill & -0.0619\hfill & -0.0612\hfill \\ -0.1659\hfill & 0.0885\hfill & 0.9993\hfill & 1.0000\hfill & -0.0557\hfill & -0.0209\hfill & -0.0576\hfill & -0.0572\hfill \\ 0.9800\hfill & 0.9798\hfill & -0.0509\hfill & -0.0557\hfill & 1.0000\hfill & 0.9678\hfill & -0.2165\hfill & -0.1250\hfill \\ 0.9793\hfill & 0.9820\hfill & -0.0196\hfill & -0.0209\hfill & 0.9678\hfill & 1.0000\hfill & -0.1881\hfill & -0.0955\hfill \\ -0.2077\hfill & -0.2093\hfill & -0.0619\hfill & -0.0576\hfill & -0.2165\hfill & -0.1881\hfill & 1.0000\hfill & 0.9942\hfill \\ -0.1160\hfill & -0.1160\hfill & -0.0612\hfill & -0.0572\hfill & -0.1250\hfill & -0.0955\hfill & 0.9942\hfill & 1.0000\hfill \end{array}\right\}$

Matrix R shows that the correlation coefficient between the first variable and the second variable is 0.9591, and the correlation coefficient between the second variable and the fifth variable is 0.9798. This shows that the correlation between these data is strong and the correlation needs to be reduced.

4) Calculate the eigenvalues, contribution rates and cumulative contribution rates with (3), (4) and (5). The results are shown in Table 3.

The cumulative contribution rate of the first three principal components is 99.448%, which exceeds the 85% threshold. The original eight variables are therefore replaced by the first three principal components, denoted ${I}_{1}$ , ${I}_{2}$ and ${I}_{3}$ . The principal component load matrix is shown in Table 4.

The principal component expressions can be obtained from Table 4.

Table 1. The raw data.

Table 2. Standardized data.

Table 3. Characteristic value and contribution rate.

Table 4. Principal component load matrix.

$\begin{array}{l}{I}_{1}=-0.4891{X}_{1}-0.4860{X}_{2}+0.0259{X}_{3}+0.0278{X}_{4}\\ \text{}-0.4894{X}_{5}-0.4869{X}_{6}+0.1734{X}_{7}+0.1287{X}_{8}\end{array}$

$\begin{array}{l}{I}_{2}=0.1128{X}_{1}-0.0333{X}_{2}-0.5956{X}_{3}-0.5944{X}_{4}\\ \text{}+0.0461{X}_{5}+0.0375{X}_{6}+0.3670{X}_{7}+0.3738{X}_{8}\end{array}$

$\begin{array}{l}{I}_{3}=0.045{X}_{1}+0.1511{X}_{2}+0.369{X}_{3}+0.371{X}_{4}\\ \text{}+0.0861{X}_{5}+0.119{X}_{6}+0.5725{X}_{7}+0.5933{X}_{8}\end{array}$

5) The sample data after principal component analysis is shown in Table 5.

Table 5. The sample data after principal component analysis.
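The three expressions for ${I}_{1}$ , ${I}_{2}$ and ${I}_{3}$ amount to one matrix-vector product per sample. The sketch below copies the coefficients from those expressions and assumes that ${X}_{1}$ to ${X}_{8}$ denote the standardized variables; the sample row is hypothetical and is not taken from Table 2.

```python
import numpy as np

# Coefficients of I1, I2 and I3, one row per principal component.
LOADINGS = np.array([
    [-0.4891, -0.4860, 0.0259, 0.0278, -0.4894, -0.4869, 0.1734, 0.1287],
    [0.1128, -0.0333, -0.5956, -0.5944, 0.0461, 0.0375, 0.3670, 0.3738],
    [0.045, 0.1511, 0.369, 0.371, 0.0861, 0.119, 0.5725, 0.5933],
])

def project(x_standardized):
    """Return (I1, I2, I3) for one standardized sample of eight variables."""
    return LOADINGS @ np.asarray(x_standardized, dtype=float)

# Hypothetical standardized sample, used only to show the call.
sample = [0.5, 0.4, -0.1, -0.1, 0.5, 0.45, -0.2, -0.15]
scores = project(sample)
print(scores)

# Each coefficient row has approximately unit length, as expected for an eigenvector.
print(np.linalg.norm(LOADINGS, axis=1))
```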

4. Experiment

4.1. Model Training

A three-layer neural network structure is used. The transfer function of the output layer is purelin, and the other parameters keep their default values. The training target is 0.001, the maximum number of training epochs is 10,000, and the learning rate is 0.01.

The first 100 samples are used as the training set and the last 10 samples as the test set. The principal components of three consecutive days form the input, and the output is the closing price of the fourth day. The input layer therefore has 9 nodes (three principal components for each of three days) and the output layer has 1 node. The number of hidden-layer nodes is usually determined by using an empirical formula to obtain a rough range and then finding the optimal number by trial [18] . The empirical formula is given by (10):

$m=\sqrt{n+l}+a$ (10)

Here m is the number of hidden-layer nodes, n is the number of input-layer nodes, l is the number of output-layer nodes, and $a$ is a constant between 1 and 10. According to the empirical formula, the number of hidden-layer nodes lies between 5 and 14. For each candidate number of hidden-layer nodes, the maximum number of training steps is 2000. Using the same samples, the trial-and-error method first trains the network with fewer hidden-layer nodes and then gradually increases their number. Each candidate node number is trained 70 times in succession, and the node number with the minimum output error, together with the corresponding number of steps, is selected. In the PCA-GA-BP model, the output error is smallest when the hidden layer has 13 nodes. Detailed results are shown in Table 6. With the structure fixed, the training error curve of the PCA-GA-BP model is shown in Figure 1.
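The input layout and the candidate hidden-layer sizes can be sketched as follows. Two points are assumptions: rounding (10) up to the next integer, which reproduces the stated range of 5 to 14 for n = 9 and l = 1, and holding out the last 10 windows as the test set, since the text does not say how the three-day windows are aligned with the 100/10 split. The arrays are placeholders, not the data of Table 5.

```python
import math
import numpy as np

def make_windows(components, closing, lag=3):
    """Stack `lag` consecutive days of components as one input; the target is the next close."""
    components = np.asarray(components, dtype=float)
    closing = np.asarray(closing, dtype=float)
    inputs = np.array([components[t - lag:t].ravel() for t in range(lag, len(components))])
    targets = closing[lag:]
    return inputs, targets

def hidden_node_candidates(n_inputs, n_outputs, a_min=1, a_max=10):
    # Equation (10): m = sqrt(n + l) + a, rounded up (assumed rounding).
    base = math.sqrt(n_inputs + n_outputs)
    return list(range(math.ceil(base + a_min), math.ceil(base + a_max) + 1))

# Placeholder arrays with the paper's dimensions: 110 days, 3 principal components.
components = np.arange(330, dtype=float).reshape(110, 3)
closing = np.arange(110, dtype=float)

inputs, targets = make_windows(components, closing)
train_X, test_X = inputs[:-10], inputs[-10:]
train_y, test_y = targets[:-10], targets[-10:]
candidates = hidden_node_candidates(n_inputs=9, n_outputs=1)
print(inputs.shape, train_X.shape, test_X.shape, candidates)
```

Each candidate size would then be trained repeatedly on the same samples, keeping the size with the smallest output error.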

Two neural network models, GA-BP and BP, are established to compare with the PCA-GA-BP. The GA-BP training error curve is shown in Figure 2, and the BP training error curve is shown in Figure 3.

When the training target is 0.001, Figure 1 shows that the PCA-GA-BP output error is 0.00072166 after 71 training steps, Figure 2 shows that the GA-BP output error is 0.00089814 after 889 training steps, and Figure 3 shows that the BP output error is 0.00097956 after 3880 training steps. By comparison, GA-BP converges faster than BP, and PCA-GA-BP converges faster than GA-BP.

Table 6. Results of different nodes number.

Figure 1. PCA-GA-BP model training error.

Figure 2. The GA-BP training error.

Figure 3. BP model training error.

Figure 4. Actual and predictive closing price.

4.2. Model Test

The data of the last 10 days are input to each of the three trained neural network models. The predicted closing price curves are shown in Figure 4 and the relative error curves in Figure 5. The detailed predicted closing prices and prediction errors are listed in Table 7.

The average relative error of each of the three models is calculated from Table 7, and the results are shown in Table 8.

Figure 5. Relative error of closing price.

Table 7. Predictive value and predictive error of closing price.

Table 8. The average relative error.

Figure 4 and Figure 5 show that the PCA-GA-BP model predicts the gold price more accurately than GA-BP and BP. Table 8 shows that the average relative prediction error of the PCA-GA-BP model is only 1.637%, compared with 3.124% for GA-BP and 5.018% for BP. The PCA-GA-BP prediction is therefore closest to the actual value.
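For reference, the average relative error compared above can be computed as in the following sketch. The function name and the two sample prices are hypothetical; they are not values from Table 7.

```python
import numpy as np

def average_relative_error(actual, predicted):
    """Mean of |predicted - actual| / |actual| over all test samples."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(predicted - actual) / np.abs(actual)))

# Hypothetical prices: relative errors of 10% and 5% average to 7.5%.
are = average_relative_error([100.0, 200.0], [110.0, 190.0])
print(f"{are:.3%}")
```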

5. Conclusion

The algorithm proposed in this paper combines PCA, GA and the BP neural network. PCA reduces the dimension of the input data and simplifies the network structure. The genetic algorithm optimizes the weights and thresholds of the BP neural network and overcomes its tendency to fall into local minima. The BP neural network captures the nonlinear relationships in the gold price. In the experiment, the PCA-GA-BP model predicted the gold price with an average relative error of 1.637%, lower than that of GA-BP and BP, which gives it reference value in the financial field. In future work, we will continue to improve the BP network on the basis of this research and combine it with other algorithms to further improve forecast accuracy.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1] Chen, L. (2013) Gold Price Prediction Model Based on PP-BPNN. Computer Simulation, 30, 354-357.

[2] Serletis, A. and Shintani, M. (2003) No Evidence of Chaos but Some Evidence of Dependence in the US Stock Market. Chaos Solitons & Fractals, 17, 449-454. https://doi.org/10.1016/S0960-0779(02)00387-9

[3] Liu, Y.Q., Song, J.K. and Zhou, X.J. (2003) Research on the Dynamic Trend of Gold Market. Quantitative Economics & Economics Research, 20, 25-29.

[4] Yu, F. (2004) An Empirical Study of Recent Gold Price Fluctuations. Industrial Economics Research, No. 1, 30-40.

[5] Xu, L.P. and Luo, M.Z. (2011) Short-Term Analysis and Forecast of Gold Price Based on ARIMA Model. Finance and Economics, No. 1, 26-34.

[6] Fei, J.W. (2017) Analysis and Forecast of China’s Gold Futures Price Based on ARIMA Model. Contemporary Economics, No. 09, 148-150.

[7] Xu, G.Y. (2014) Chinese Gold Futures Price Forecasting Model Based on Grey Forecasting Method. Gold, 35, 8-11.

[8] Peng, Y.S., Zhang, D.S., Wang, R.X. and Chen, C. (2011) GARCH Prediction Model with Exogenous Variables for International Gold Prices. Gold, 32, 10-14.

[9] Xia, X.Y. (2013) Research on Volatility of China’s Gold Price Based on GARCH Model. Journal of Science and Technology Pioneering, 6, 18-19.

[10] Zhang, K., Yu, Y. and Li, T. (2010) Wavelet Neural Network’s Application in Gold Price Prediction. Computer Engineering and Applications, 46, 224-226+241.

[11] Xu, X. and Lu, X.L. (2016) GA-Optimized Acceleration Feature Selection Method in Behavior Recognition. Computer Engineering and Applications, 5, 139-143+166.

[12] Gao, W.H. (2010) Web Datas Mining Based on BP Neural Network. South-Central University For Nationalities, Wuhan.

[13] Liu, W., Liu, S. and Bai, R.C. (2017) Research on Mutual Learning Neural Network Training Method. Chinese Journal of Computers, 40, 1291-1308.

[14] Ding, H.F. and Li, Y.H. (2016) Research on Travel Time Combination Forecast of Expressway Based on BP Neural Network and SVM. Application Research of Computers, 33, 2929-2932+2936.

[15] Ren, X.L. and Lv, L.Y. (2014) A Survey about Network Important Nodes’ Sorting Methods. Chinese Science Bulletin, 59, 1175-1197. https://doi.org/10.1360/972013-1280

[16] Yan, X., Li, S.Y. and Zhang, Z. (2016) Application of BP Neural Network Based on Genetic Algorithm in City Water Consumption Forecast. Computer Science, 43, 547-550.

[17] Ding, Y., Jiang, F. and Wu, Y.Y. (2016) GA’s Application in Bus Dispatching. Computer Science, 43, 547-550.

[18] Gao, Y.M. and Zhang, R.J. (2014) House Price Prediction and Analysis Based on Genetic Algorithm and BP Neural Network. Computer Engineering, 40, 187-191.