Research on GDP Forecast of Ji’an City Based on ARIMA Model

1. Introduction

Since the reform and opening up, China has made remarkable achievements in economic development. The living standards of the people are constantly improving, and we are heading toward the goal of completing the building of a moderately prosperous society in all respects. We can feel the rapid development of China’s economy in our daily life, but from the national level, we need specific data to analyze the overall state of the national economy. Gross domestic product (GDP) refers to the market value of all final products produced by all resident units of a country or region within a certain period of time. GDP not only plays an irreplaceable role in reflecting a country’s national income, consumption capacity and economic development, but also helps people understand the economic condition of a country or region from a macro perspective. It is the key basis for the formulation of national or regional economic policies, as well as a key means to test whether economic policies are effective and scientific. Therefore, if we use appropriate statistical methods to reveal the law of GDP data changes and make high-precision prediction of short-term GDP, then it will be of great practical significance to the overall planning of macro-economy.

Ji’an City, located in central Jiangxi Province, is a well-known Red City on the banks of the Gan River. In the past, Ji’an City lagged behind because of inconvenient transportation and a weak economic foundation. Ji’an City had 5 national poverty counties, 4 counties in the Luoxiao Mountain area, and 11 counties in the former Central Soviet District, so the task of poverty alleviation was very difficult. According to Jiangxi Daily, after 6 years of hard work, Ji’an City went from 350,000 registered poor households in 2013 to only 6000 people remaining in poverty by the end of 2019. Over these 6 years, the people of Ji’an delivered a resounding answer to the challenge of poverty alleviation with sweat, hard work and blood. Today, Ji’an City is focusing on consolidating and improving the results of poverty alleviation, advancing continuous poverty reduction, improving the long-term mechanism, and ensuring that a well-off society is built in step with the whole country. This year is the final year of the 13th Five-Year Plan and the year of decisive victory in poverty alleviation. However, affected by the global spread of the novel coronavirus pneumonia and the deterioration of Sino-US relations, Ji’an City, like other regions in China, currently faces many economic uncertainties. In this context, research on the GDP of Ji’an City helps the functional departments of the Ji’an City Government to grasp the overall situation of economic growth and to formulate targeted policy measures that promote better and faster economic development.

In recent years, more and more scholars have realized the importance of accurately forecasting GDP, and they have used a variety of methods to forecast GDP from multiple perspectives. The main forecasting methods can be divided into four categories: regression analysis (Jin, 2011; Chen, 2012), gray prediction (Li & Li, 2010; Tian & Liu, 2018; Zhang & Xie, 2019), artificial neural networks (Huang, 2007; Zhao, 2017; He, Wu, & Xia, 2020) and the ARIMA model (Li & Xue, 2013; Zhou, 2015; Yan, 2018). The ARIMA model is popular because of its simple operation and high precision, and it is one of the most commonly used methods to forecast GDP. Li and Xue (2013) established the ARIMA model and used multiple screening criteria to determine the optimal ARIMA(6, 1, 3) model, which proved highly accurate in predicting China’s GDP from 2009 to 2011. Zhou (2015) used an ARIMA(1, 2, 3) model that passed the test to forecast the GDP of Guangxi during the 13th Five Year Plan period, and put forward countermeasures and suggestions to improve Guangxi’s ability to achieve sustained economic growth during that period. Yan (2018) established an ARIMA(1, 1, 1) model and used it to predict the GDP of Shandong Province; the test results showed that the prediction effect was good. To the best of our knowledge, no study has yet forecast the GDP of Ji’an City. In this paper, we establish an ARIMA model based on the GDP of Ji’an City from 1978 to 2018 and use it to forecast the GDP of Ji’an City in the next five years. The results are expected to provide a reference for Ji’an City in formulating economic development goals and macroeconomic decisions.

This paper is organized as follows. In Section 2, we give some definitions and the ARIMA modeling steps that will be used in this paper. In Section 3, we apply the steps and methods of Section 2 to conduct the empirical research. In Section 4, we analyze the prediction results.

2. ARIMA Model

2.1. ARIMA Model

Definition 2.1. For time series $\left\{{X}_{t},t=0,\pm 1,\pm 2,\cdots \right\}$, the mean function is defined as

${\mu}_{t}=E{X}_{t},t=0,\pm 1,\pm 2,\cdots .$

The autocovariance function ${\gamma}_{t,s}$ is defined as

${\gamma}_{t,s}=cov\left({X}_{t},{X}_{s}\right),t,s=0,\pm 1,\pm 2,\cdots .$

The autocorrelation function (ACF) ${\rho}_{t,s}$ is defined as

${\rho}_{t,s}=corr\left({X}_{t},{X}_{s}\right)=\frac{{\gamma}_{t,s}}{\sqrt{{\gamma}_{t,t}\cdot {\gamma}_{s,s}}},t,s=0,\pm 1,\pm 2,\cdots .$

Definition 2.2. (Stationary time series) Time series $\left\{{X}_{t},t=0,\pm 1,\pm 2,\cdots \right\}$ is called stationary time series, if the following three conditions are satisfied:

1) $E{X}_{t}^{2}<+\infty ,t=0,\pm 1,\pm 2,\cdots .$

2) The mean function does not change with time.
$E{X}_{t}=c,t=0,\pm 1,\pm 2,\cdots $, *c* is a constant.

3) The autocovariance function only depends on the time interval, that is

${\gamma}_{t,s}={\gamma}_{t+r,s+r},t,s,r=0,\pm 1,\pm 2,\cdots .$

The autocovariance function and the autocorrelation function describe the degree of correlation between the values of a random process at two different times. In other words, they describe the impact of the historical behavior of a series on its current state. However, this kind of correlation is not pure: when the degree of correlation between ${X}_{t}$ and ${X}_{t-k}$ is measured by the lag-*k* autocorrelation function ${\rho}_{k}$, the influence of the *k* − 1 intermediate random variables ${X}_{t-1},{X}_{t-2},\cdots ,{X}_{t-k+1}$ is mixed in. Therefore, we introduce the concept of partial autocorrelation function.

Definition 2.3. (Partial autocorrelation function) Let $\left\{{X}_{t},t=0,\pm 1,\pm 2,\cdots \right\}$ be a stationary time series, we call

${\varphi}_{k}={\rho}_{{X}_{t},{X}_{t-k}|{X}_{t-1},{X}_{t-2},\cdots ,{X}_{t-k+1}}=\frac{E\left[\left({X}_{t}-\hat{E}{X}_{t}\right)\left({X}_{t-k}-\hat{E}{X}_{t-k}\right)\right]}{E\left[{\left({X}_{t-k}-\hat{E}{X}_{t-k}\right)}^{2}\right]}$

a partial autocorrelation function, where

$\hat{E}{X}_{t}=E\left[{X}_{t}|{X}_{t-1},{X}_{t-2},\cdots ,{X}_{t-k+1}\right],$

$\hat{E}{X}_{t-k}=E\left[{X}_{t-k}|{X}_{t-1},{X}_{t-2},\cdots ,{X}_{t-k+1}\right].$

It can be seen that the partial autocorrelation characterizes the degree of correlation between ${X}_{t}$ and ${X}_{t-k}$ that is not affected by ${X}_{t-1},{X}_{t-2},\cdots ,{X}_{t-k+1}$.
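In practice, the autocorrelation and partial autocorrelation functions are estimated from a finite sample. The following is a minimal sketch, assuming the usual sample autocorrelation estimator and the Durbin-Levinson recursion for the partial autocorrelations; the function names and the short example series are hypothetical and are not taken from the data used in this paper.

```python
def sample_acf(x, max_lag):
    """Sample autocorrelations rho_0, ..., rho_max_lag of the series x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)  # n times the lag-0 sample autocovariance
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]


def sample_pacf(x, max_lag):
    """Sample partial autocorrelations at lags 1..max_lag (Durbin-Levinson)."""
    rho = sample_acf(x, max_lag)
    pacf = []
    prev = []  # coefficients phi_{k-1,1}, ..., phi_{k-1,k-1} of the previous step
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append phi_kk itself.
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


# Hypothetical illustrative series (not the Ji'an GDP data).
x = [1.0, 2.0, 4.0, 3.0, 5.0, 4.0, 6.0]
acf = sample_acf(x, 2)
pacf = sample_pacf(x, 2)
```

At lag 1 the partial autocorrelation equals the autocorrelation, because there are no intermediate variables to remove; the two functions differ only from lag 2 onward.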

Definition 2.4. (White noise) If the time series $\left\{{\epsilon}_{t},t=0,\pm 1,\pm 2,\cdots \right\}$ meets the following two conditions:

1) $E{\epsilon}_{t}=0$,

2) $cov\left({\epsilon}_{t},{\epsilon}_{s}\right)=\left\{\begin{array}{ll}{\sigma}^{2}, & t=s\\ 0, & t\ne s\end{array}\right.,\quad t,s=0,\pm 1,\pm 2,\cdots $,

then $\left\{{\epsilon}_{t}\right\}$ is called white noise, denoted as $\left\{{\epsilon}_{t}\right\}\sim WN\left(0,{\sigma}^{2}\right)$.

White noise is a sequence of uncorrelated random variables with zero mean and equal variances. It is the simplest and most basic stationary time series. Its importance lies in the fact that white noise is the “generator” of many important models or sequences. It follows directly from the definition that the autocovariance function ${\gamma}_{k}$ and the autocorrelation function ${\rho}_{k}$ of white noise $WN\left(0,{\sigma}^{2}\right)$ are:

${\gamma}_{k}=\left\{\begin{array}{ll}{\sigma}^{2}, & k=0\\ 0, & k\ne 0\end{array}\right.$

${\rho}_{k}=\left\{\begin{array}{ll}1, & k=0\\ 0, & k\ne 0\end{array}\right.$

This shows that the items of white noise are uncorrelated.

Definition 2.5. (ARIMA(*p*, *d*, *q*) model) Let
$\left\{{\nabla}^{d}{X}_{t},t=0,\pm 1,\pm 2,\cdots \right\}$ be a stationary time series, we call the model with the following form as ARIMA(*p*, *d*, *q*) model:

$\phi \left(B\right){\nabla}^{d}{X}_{t}=\theta \left(B\right){\epsilon}_{t}.$

where *B* is the back shift operator,
${\nabla}^{d}\equiv {\left(1-B\right)}^{d}$ is the d-order difference operator,
$\phi \left(B\right)\equiv 1-{\phi}_{1}B-{\phi}_{2}{B}^{2}-\cdots -{\phi}_{p}{B}^{p}$ is the autoregressive operator polynomial, and
$\theta \left(B\right)\equiv 1-{\theta}_{1}B-{\theta}_{2}{B}^{2}-\cdots -{\theta}_{q}{B}^{q}$ is the moving average operator polynomial,
$\left\{{\epsilon}_{t}\right\}\sim WN\left(0,{\sigma}^{2}\right)$.
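To make the difference operator concrete, the sketch below applies ${\left(1-B\right)}^{d}$ to a short series by repeated first differencing. The function name `difference` and the example values are hypothetical and serve only as an illustration.

```python
def difference(x, d=1):
    """Apply the d-th order difference operator (1 - B)^d to the list x."""
    for _ in range(d):
        x = [x[t] - x[t - 1] for t in range(1, len(x))]
    return x


# Hypothetical series x_t = t^2, chosen so the differences are easy to check.
series = [1.0, 4.0, 9.0, 16.0, 25.0]
first = difference(series, 1)   # [3.0, 5.0, 7.0, 9.0]
second = difference(series, 2)  # [2.0, 2.0, 2.0]

# (1 - B)^2 x_t expands to x_t - 2 x_{t-1} + x_{t-2}.
expanded = [series[t] - 2.0 * series[t - 1] + series[t - 2]
            for t in range(2, len(series))]
```

Each application of the operator shortens the series by one observation, so a series of length *n* leaves *n* − *d* values for fitting the stationary part of the model.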

2.2. ARIMA Model Modeling Steps

1) Test whether the time series $\left\{{X}_{t},t=0,\pm 1,\pm 2,\cdots \right\}$ to be studied is stationary.

2) Data preprocessing. If the time series data studied is non-stationary, the data must be preprocessed, such as logarithm operation, difference operation and so on, so that the data can be transformed into stationary. Otherwise, move on to the next step.

3) Check whether a stationary time series is a white noise series. If it is, it means that the data does not have any analytical value and the analysis should be stopped; if it is not, then the next step of modeling can be carried out.

4) ARIMA model structure selection. The autocorrelation and partial autocorrelation function graphs of stationary time series are obtained by using statistical analysis software. The model structure is judged by observing the specific characteristics of these two images. The judgment basis is shown in Table 1.

As the observation of images is subjective, the judgment may not be so accurate. Therefore, when determining the structure of the model, it is appropriate to select several values around the determined *p* value and *q* value to construct the ARIMA model.

Table 1. ARIMA model structure decision table.

5) Determine the specific parameter values in ARIMA model. In the process of building multiple ARIMA models, we investigate whether they can pass the significance test, specifically for the model and for the parameters. The models that fail to pass the two types of tests are discarded, and then the model corresponding to the minimum BIC value is determined as the optimal model by comparing their specific BIC values.

6) White noise test of residual sequence. If the residual sequence passes the white noise test, the modeling can be terminated; if the residual is not white noise, it means that there is useful information in the residual and the model needs to be modified.

7) Use the finally established ARIMA model to make short-term forecasts. The predicted value is obtained by substituting the known data into the relation established by the above model.

The visual operation process of the ARIMA model modeling steps is shown in Figure 1.

3. Empirical Analysis

3.1. Data Selection and Preprocessing

In this article, we use the Ji’an Yearbook released by Ji’an City as the data source, and select the Ji’an City GDP data from 1978 to 2018 as a sample to establish an ARIMA model to predict the Ji’an City GDP from 2019 to 2023.

3.2. Stationarity Test

According to the data in Table 2, we use MATLAB software to draw the GDP time series diagram of Ji’an City from 1978 to 2018, as shown in Figure 2. It can be clearly seen that the series has an exponential increasing trend and is a non-stationary time series.

In order to eliminate this exponential trend, we take the natural logarithm of the original data and denote the transformed sample time series as lnGDP. We then use MATLAB again to draw the sequence diagram of the new series, as shown in Figure 3.

It can be seen from Figure 3 that the upward trend of time series lnGDP is more gradual, but still presents a linear upward trend. The first-order difference of the time series lnGDP is denoted as ∇lnGDP, and its sequence diagram is shown in Figure 4.

Figure 1. ARIMA model modeling process.

Table 2. GDP data of Ji’an City from 1978 to 2018 (unit: 100 million yuan).

Figure 2. Sequence diagram of GDP of Ji’an City over the years.

Figure 3. Sequence diagram of lnGDP of Ji’an City over the years.

Figure 4. Sequence diagram of ∇lnGDP.

It can be seen from Figure 4 that the sequence obtained by first differencing oscillates around a constant value, which suggests that the new sequence is roughly stationary. To avoid subjective judgment and obtain a more rigorous conclusion, we also carry out the unit root test, which has become the most widely used statistical test of stationarity by establishing the one-to-one correspondence between the existence of unit roots of the characteristic equation and the stationarity of the sequence. The ADF test was carried out on the ∇lnGDP data of Ji’an City from 1978 to 2018, and the result of the MATLAB program is

$\text{adftest}\left(\nabla \mathrm{ln}\text{GDP}\right)=0$.

A result of 0 means that the null hypothesis of a unit root cannot be rejected, so ∇lnGDP is treated as a non-stationary series. Therefore, to obtain a stationary sequence, further differencing is needed. We take the second-order difference of the time series lnGDP, denoted as ∇^{2}lnGDP. Its sequence diagram is shown in Figure 5.

From Figure 5, it can be preliminarily judged that the second-order difference ∇^{2}lnGDP of the time series lnGDP is stationary. Through the ADF stationarity test of the sequence ∇^{2}lnGDP, the MATLAB program running result is

$\text{adftest}\left({\nabla}^{2}\mathrm{ln}\text{GDP}\right)=1$,

which confirms the stationarity of ∇^{2}lnGDP.
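The decision logic behind such a unit root test can be sketched as follows. This is a simplified Dickey-Fuller regression with no constant and no lagged differences, which is an assumption made for illustration and may differ from the settings behind the results reported above. The 5% critical value of about −1.95 is the commonly tabulated value for this variant; the function names and the two example series are hypothetical.

```python
import math

CRITICAL_5PCT = -1.95  # tabulated 5% value for the no-constant Dickey-Fuller case


def df_tstat(y):
    """t-statistic of gamma in the regression  dy_t = gamma * y_{t-1} + e_t."""
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    ylag = y[:-1]
    sxx = sum(v * v for v in ylag)
    sxy = sum(a * b for a, b in zip(ylag, dy))
    gamma = sxy / sxx                       # OLS slope without intercept
    resid = [d - gamma * l for d, l in zip(dy, ylag)]
    s2 = sum(r * r for r in resid) / (len(dy) - 1)  # residual variance
    return gamma / math.sqrt(s2 / sxx)


def rejects_unit_root(y):
    """True when the unit-root null is rejected at the 5% level (result 1)."""
    return df_tstat(y) < CRITICAL_5PCT


# Hypothetical series: one that reverts quickly to zero, one that trends upward.
reverting = [1.0, -0.5, 0.8, -0.3, 0.6, -0.7, 0.2, -0.4]
trending = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```

A result of 1 corresponds to rejecting the unit-root null hypothesis (the series is treated as stationary), and a result of 0 corresponds to failing to reject it, which matches the way the two test results above are read.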

3.3. Model Recognition and Order Determination

Through the above stationarity analysis, the difference order *d* = 2 of the logarithmic series has been determined. In order to determine *p* and *q* in the ARIMA(*p*, *d*, *q*) model, autocorrelation and partial autocorrelation analysis are performed on the second-order difference sequence ∇^{2}lnGDP. The results are shown in Figure 6 and Figure 7. It can be seen from the figures that the autocorrelation function cuts off after lag 1, while the partial autocorrelation function tails off. According to these characteristics of the autocorrelation function and the partial autocorrelation function, ARIMA(0, 2, 1) is considered for fitting.

Figure 5. Sequence diagram of ∇^{2}lnGDP.

Figure 6. ACF diagram of ∇^{2}lnGDP.

Determining the values of *p* and *q* only from Figure 6 and Figure 7 is not sufficiently accurate. Therefore, according to the BIC criterion, the BIC values of a limited number of candidate models are examined within a certain range, and the relatively optimal order is selected. The corresponding BIC values of each model are shown in Table 3.
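The selection rule can be sketched with the standard definition BIC = −2 ln *L* + *k* ln *n*, where *L* is the maximized likelihood, *k* the number of estimated parameters and *n* the number of observations. In the sketch below, the log-likelihood values are hypothetical placeholders and are not the values behind Table 3; the parameter counts assume that the innovation variance is counted as one estimated parameter.

```python
import math


def bic(log_likelihood, num_params, num_obs):
    """Bayesian information criterion: smaller values indicate a better trade-off."""
    return -2.0 * log_likelihood + num_params * math.log(num_obs)


# HYPOTHETICAL log-likelihoods for three candidate (p, q) orders; these are
# placeholders for illustration, not results from this paper.
candidates = {
    (0, 1): (35.0, 2),  # (log-likelihood, number of estimated parameters)
    (1, 0): (33.5, 2),
    (1, 1): (35.4, 3),
}
n_obs = 39  # 41 annual observations minus 2 lost to second differencing

scores = {order: bic(ll, k, n_obs) for order, (ll, k) in candidates.items()}
best_order = min(scores, key=scores.get)
```

The penalty term *k* ln *n* is what keeps a larger model from being chosen merely because its likelihood is slightly higher.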

Figure 7. PACF diagram of ∇^{2}lnGDP.

Table 3. BIC value of each model.

According to the BIC criterion, the BIC value is smallest when *p* = 0 and *q* = 1, which is consistent with the above image analysis results. Therefore, the optimal fitting model of the time series ∇^{2}lnGDP is ARIMA(0, 0, 1), that is, the optimal fitting model of the time series lnGDP is ARIMA(0, 2, 1). The results of fitting the model with MATLAB are as follows:

The model expression obtained by ARIMA(0, 2, 1) model fitting is:

${\nabla}^{2}\mathrm{ln}{\text{GDP}}_{t}=\left(1-0.5917B\right){\epsilon}_{t}.$

Removing the difference, we get the final model expression:

$\mathrm{ln}{\text{GDP}}_{t}-2\mathrm{ln}{\text{GDP}}_{t-1}+\mathrm{ln}{\text{GDP}}_{t-2}={\epsilon}_{t}-0.5917{\epsilon}_{t-1}$

Therefore, the forecast formula of GDP time series ${\text{GDP}}_{t}$ is:

${\text{GDP}}_{t}=\frac{{\text{GDP}}_{t-1}^{2}}{{\text{GDP}}_{t-2}}{\text{e}}^{{\epsilon}_{t}-0.5917{\epsilon}_{t-1}}$
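The formula above can be turned into a point forecast by replacing future innovations with their expected value of zero. The following is a minimal sketch of that recursion, assuming that the innovation before the start of the sample is zero and that exponentiating the forecast of lnGDP is an acceptable point forecast of GDP (this ignores any retransformation correction). The function name and the input series are hypothetical; only the coefficient 0.5917 comes from the fitted model.

```python
import math

THETA = 0.5917  # MA(1) coefficient of the fitted ARIMA(0, 2, 1) model


def forecast_gdp(gdp, steps, theta=THETA):
    """Point forecasts of GDP for `steps` periods beyond the end of `gdp`."""
    y = [math.log(v) for v in gdp]
    # Recover residuals from  w_t = eps_t - theta * eps_{t-1},
    # where w_t is the second difference of lnGDP (pre-sample eps assumed 0).
    eps = 0.0
    for t in range(2, len(y)):
        w = y[t] - 2.0 * y[t - 1] + y[t - 2]
        eps = w + theta * eps
    path = y[-2:]
    forecasts = []
    for h in range(1, steps + 1):
        # Only the one-step forecast still depends on the last residual.
        ma_term = -theta * eps if h == 1 else 0.0
        nxt = 2.0 * path[-1] - path[-2] + ma_term
        path.append(nxt)
        forecasts.append(math.exp(nxt))
    return forecasts


# Hypothetical series that doubles every period (not the Ji'an GDP data):
# all second differences of the logs vanish, so the forecasts keep doubling.
example = forecast_gdp([1.0, 2.0, 4.0, 8.0], 2)  # approximately [16, 32]
```

Beyond the first step, the recursion reduces to a linear extrapolation of lnGDP, which corresponds to a constant growth rate in GDP itself.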

3.4. White Noise Test

After establishing ARIMA(0, 2, 1), it is necessary to investigate whether ARIMA(0, 2, 1) can extract the information contained in historical data sufficiently. Therefore, the white noise test should be performed on the residual sequence obtained by ARIMA(0, 2, 1). We use the autocorrelation function of the residual sequence to examine its independence, and the result is shown in Figure 8.

As Figure 8 shows, the residual autocorrelation at every lag is smaller than the standard deviation, so the residuals can be regarded as mutually uncorrelated. We then use the Ljung-Box test to check the adequacy of the model. The results of the MATLAB program are as follows:

Figure 8. Autocorrelation function graph of residual sequence.

Table 4. GDP forecast of Ji’an city in 2019-2023 (unit: 100 million yuan).

The results show that *h* = 0 and *p* = 0.8450 > 0.05, so the null hypothesis that the residuals are uncorrelated cannot be rejected, which indicates that the residual sequence is a white noise sequence. Therefore, the ARIMA(0, 2, 1) model we established is feasible and can be used to forecast the GDP of Ji’an City.
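For reference, the Ljung-Box statistic is $Q=n\left(n+2\right){\sum }_{k=1}^{m}\frac{{\hat{\rho }}_{k}^{2}}{n-k}$, which is compared with a chi-square distribution. The sketch below computes *Q* and, for an even number of degrees of freedom only, the upper-tail probability in closed form. Taking the degrees of freedom equal to the number of lags *m* is an assumption of this sketch, and the function names and example residuals are hypothetical.

```python
import math


def ljung_box_q(resid, max_lag):
    """Ljung-Box Q statistic of a residual series over lags 1..max_lag."""
    n = len(resid)
    mean = sum(resid) / n
    dev = [r - mean for r in resid]
    c0 = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, max_lag + 1):
        rho_k = sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        total += rho_k ** 2 / (n - k)
    return n * (n + 2) * total


def chi2_sf_even(x, df):
    """Upper-tail chi-square probability; this closed form holds for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, series_sum = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        series_sum += term
    return math.exp(-half) * series_sum


# Hypothetical residuals (not the residuals of the fitted model).
resid = [0.3, -0.1, 0.2, -0.4, 0.1, 0.0, -0.2, 0.3, -0.1, -0.1]
q = ljung_box_q(resid, 4)
p_value = chi2_sf_even(q, 4)
is_white_noise = p_value > 0.05  # fail to reject "no autocorrelation"
```

A *p*-value above the significance level means there is no evidence of remaining autocorrelation, which is the case reported for the residuals of the ARIMA(0, 2, 1) model.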

3.5. Model Prediction

Now, we use the above ARIMA(0, 2, 1) model to forecast the GDP of Ji’an City in the next 5 years, the results are shown in Table 4.

So far, by transforming the GDP data of Ji’an City from 1978 to 2018 into a stationary series, we have established an ARIMA(0, 2, 1) model. On this basis, we obtain the GDP forecast values of Ji’an City for 2019-2023.

4. Conclusion

In this paper, an ARIMA(0, 2, 1) model is established by using the GDP data of Ji’an City from 1978 to 2018. The adequacy test of the model shows that it can be used for the short-term forecast of the GDP of Ji’an City, which can provide a decision-making reference for Ji’an City when making economic plans. According to the predicted results, the GDP of Ji’an City is expected to keep growing in 2019-2023. In this situation, Ji’an City should increase investment in education, health and other livelihood fields to make itself more attractive to talent. In the short term, this will increase the financial burden of Ji’an City, but in the long run, the increase in population will make Ji’an more competitive, and Ji’an City will achieve more durable and stable economic growth. However, due to the novel coronavirus epidemic and the relationship between China and the United States, whether the GDP of Ji’an City can reach the predicted values remains to be tested in practice.

References

[1] Chen, M. (2012). Analysis on the Influencing Factors of China’s GDP Based on Linear Regression Model. Guide to Business, 9, 20-21.

[2] He, G., Wu, W., & Xia, J. (2020). Application of Grey Neural Network Based on Simpson Formula in GDP Forecast. Statistics and Decision, 2, 43-47.

[3] Huang, X. (2007). Research on Application of Artificial Neural Network in GDP Forecast. M.S. Dissertation, Changchun: Jilin University.

[4] Jin, Y. (2011). Research on Regression Estimation Method of GDP Accounting. M.S. Dissertation, Shanghai: Shanghai Jiao Tong University.

[5] Li, D., & Li, Z. (2010). Research on GDP Forecast of Guangxi Based on GM (1,1) Model and Grey Correlation Analysis. Anhui Agricultural Sciences, 38, 19704-19706+19710.

[6] Li, M., & Xue, J. (2013). China’s GDP Growth Forecast Based on Optimal ARIMA Model. Statistics and Decision, 9, 23-26.

[7] Tian, Z., & Liu, M. (2018). An Empirical Study of GDP Forecast Based on Improved Grey GM (1,1) Model. Statistics and Decision, 34, 83-85.

[8] Yan, Y. (2018). Analysis and Forecast of Shandong Province GDP Based on ARIMA Model. Mathematics in Practice and Theory, 48, 285-292.

[9] Zhang, H., & Xie, X. (2019). Research on Combination Optimization Model Based on Grey Correlation Degree. Statistics and Decision, 35, 19-23.

[10] Zhao, L. (2017). GDP Forecast Based on Particle Swarm Optimization Neural Network. M.S. Dissertation, Beijing: North China Electric Power University.

[11] Zhou, L. (2015). Forecast and Analysis of Guangxi’s Economic Growth during the 13th Five Year Plan Period Based on ARIMA Model. Business Economic Research, 15, 137-139.