The Forecast Research of Linear Regression Forecast Model in National Economy
Abstract: This paper applies linear regression models to national economic forecasting. The Statistical Yearbook of China is a statistical data book published annually in China that records many types of national data. This dissertation studies part of the national economic data in the Yearbook to examine the relationship between GNP (gross national product) and various indicators, and then builds a model to support reasonable suggestions for the future of the national economy. Prediction is a basic method of data mining. The method proposed here combines a multiple linear regression model with grey prediction. Forward or backward stepwise regression is used within multiple linear regression to build the specific data model; fuzzy data are then analysed with a grey prediction model; finally, the two are combined to form the data prediction algorithm. The main calculation tools are SPSS and MATLAB. The prediction results are based on the model obtained in this dissertation. Compared with multiple linear regression alone and the grey prediction model alone, the combined algorithm was found to be more accurate, which supports the rationality of the prediction model.

1. Introduction

National economic analysis has traditionally relied on linear regression or other regression methods, but a single method used alone for forecasting or data analysis often produces large errors or has significant limitations. In such cases a combined model can often give more accurate results. For example, a multiple linear regression model can predict data with a linear relationship but cannot predict data with an exponential relationship, whereas a grey prediction model can only make fuzzy predictions and cannot reflect linear relationships in the data. The model in this article addresses the problem that linear prediction cannot determine the highest power and functional form, and at the same time the problem that the grey prediction model cannot perform linear prediction. Combining the two allows forecasting of sequences that show both linear and exponential growth trends, so it is broadly applicable and can predict the data better.

2. Multiple Linear Regression and Grey Forecast Theory Basis

2.1. Introduction to Multiple Linear Regression

Suppose there is a random variable $y$ and general variables ${x}_{1},{x}_{2},\cdots ,{x}_{p}$. The linear regression model between them is

$y={\beta }_{0}+{\beta }_{1}{x}_{1}+{\beta }_{2}{x}_{2}+\cdots +{\beta }_{p}{x}_{p}+\epsilon$ (1.1)

Here ${\beta }_{0},{\beta }_{1},\cdots ,{\beta }_{p}$ are $p+1$ unknown parameters: ${\beta }_{0}$ is called the regression constant and ${\beta }_{1},\cdots ,{\beta }_{p}$ are the regression coefficients. $y$ is the dependent variable, and ${x}_{1},{x}_{2},\cdots ,{x}_{p}$ are $p$ general variables that can be accurately measured and controlled; they are called independent variables. When $p=1$, Equation (1.1) is called a univariate linear regression model; when $p\ge 2$, it is called a multiple linear regression model. $\epsilon$ is the random error. For the random error term we usually assume $\left\{\begin{array}{l}E\left(\epsilon \right)=0\\ Var\left(\epsilon \right)={\sigma }^{2}\end{array}\right.$, and $E\left(y\right)={\beta }_{0}+{\beta }_{1}{x}_{1}+{\beta }_{2}{x}_{2}+\cdots +{\beta }_{p}{x}_{p}$ is called the theoretical regression equation.

For a practical problem, if we have obtained $n$ sets of observation data $\left({x}_{i1},{x}_{i2},\cdots ,{x}_{ip};{y}_{i}\right)\left(i=1,2,\cdots ,n\right)$, then the linear regression model (1.1) can be expressed as

$\left\{\begin{array}{l}{y}_{1}={\beta }_{0}+{\beta }_{1}{x}_{11}+{\beta }_{2}{x}_{12}+\cdots +{\beta }_{p}{x}_{1p}+{\epsilon }_{1}\\ {y}_{2}={\beta }_{0}+{\beta }_{1}{x}_{21}+{\beta }_{2}{x}_{22}+\cdots +{\beta }_{p}{x}_{2p}+{\epsilon }_{2}\\ \qquad \vdots \\ {y}_{n}={\beta }_{0}+{\beta }_{1}{x}_{n1}+{\beta }_{2}{x}_{n2}+\cdots +{\beta }_{p}{x}_{np}+{\epsilon }_{n}\end{array}\right.$ (1.2)

In matrix form:

$y=X\beta +\epsilon$, $y=\left[\begin{array}{c}{y}_{1}\\ {y}_{2}\\ \vdots \\ {y}_{n}\end{array}\right]$, $X=\left[\begin{array}{ccccc}1& {x}_{11}& {x}_{12}& \cdots & {x}_{1p}\\ 1& {x}_{21}& {x}_{22}& \cdots & {x}_{2p}\\ \vdots & \vdots & \vdots & & \vdots \\ 1& {x}_{n1}& {x}_{n2}& \cdots & {x}_{np}\end{array}\right]$, $\beta =\left[\begin{array}{c}{\beta }_{0}\\ {\beta }_{1}\\ \vdots \\ {\beta }_{p}\end{array}\right]$, $\epsilon =\left[\begin{array}{c}{\epsilon }_{1}\\ {\epsilon }_{2}\\ \vdots \\ {\epsilon }_{n}\end{array}\right]$ (1.3)

Here $X$ is an $n×\left(p+1\right)$ matrix, called the regression design matrix or data matrix. In experimental design the elements of $X$ can be set in advance and controlled, which involves some subjective choices, so $X$ is called the design matrix.
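The matrix form above can be illustrated with a short sketch. It is only an illustration under stated assumptions: it uses NumPy, the helper name `fit_ols` is hypothetical, and the toy data are generated exactly from $y=1+2{x}_{1}+3{x}_{2}$ rather than taken from the paper.

```python
import numpy as np

def fit_ols(x, y):
    """Estimate beta in y = X beta + eps by least squares.

    x: (n, p) array of independent variables; y: (n,) dependent variable.
    Returns the (p + 1,) vector [beta_0, beta_1, ..., beta_p].
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: a leading column of ones carries the regression constant.
    X = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Toy data (illustrative only), generated without noise from y = 1 + 2*x1 + 3*x2.
x_demo = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y_demo = 1.0 + 2.0 * x_demo[:, 0] + 3.0 * x_demo[:, 1]
beta_demo = fit_ols(x_demo, y_demo)
```

Because the toy data contain no error term, the estimate recovers the generating coefficients; with real data the estimate would only approximate them.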

For example, to analyze the relationship between automobile characteristics and automobile sales volume, a multiple linear regression model is established first. The data used by the model are shown in Figure 1.

First, stepwise regression is carried out in SPSS; the variables are then selected, analysed and regressed, giving the results in Tables 1-3.

Figure 1. Relationship between automobile characteristics and automobile sales volume.

Table 1. Model summary table.

a. Predictors: (constant), resale, length, horsepow, width, wheelbas, engineers, price; b. Predictors: (constant), resale, length, horsepow, width, wheelbas, engineers; c. Predictors: (constant), resale, length, horsepow, wheelbas, engineers; d. Predictors: (constant), resale, length, wheelbas, engineers; e. Predictors: (constant), resale, length, wheelbas.

Table 2. Analysis of variance.

a. Dependent variable: Sales; b. Predictors: (constant), resale, length, horsepow, width, wheelbas, engineers, price; c. Predictors: (constant), resale, length, horsepow, width, wheelbas, engineers; d. Predictors: (constant), resale, length, horsepow, wheelbas, engineers; e. Predictors: (constant), resale, length, wheelbas, engineers; f. Predictors: (constant), resale, length, wheelbas.

Table 3. Excluded variables table.

a. Dependent variable: Sales; b. Predictors in the model: (constant), resale, length, horsepow, width, wheelbas, engineers; c. Predictors in the model: (constant), resale, length, horsepow, wheelbas, engineers; d. Predictors in the model: (constant), resale, length, wheelbas, engineers; e. Predictors in the model: (constant), resale, length, wheelbas.
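The table notes show predictors being dropped one at a time, which is the pattern of backward stepwise regression. The sketch below illustrates that idea only; it is not SPSS's procedure. As an assumption, it removes the predictor with the smallest $|t|$ statistic until every remaining $|t|$ is at least 2, whereas SPSS uses a probability-of-F criterion. The function name, threshold and toy data are all hypothetical.

```python
import numpy as np

def backward_eliminate(x, y, names, t_min=2.0):
    """Backward elimination on |t| statistics.

    The |t| >= t_min stopping rule is a stand-in assumption, not SPSS's criterion.
    Returns the names of the predictors that remain in the model.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = list(range(x.shape[1]))
    while keep:
        X = np.column_stack([np.ones(len(y)), x[:, keep]])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        dof = len(y) - X.shape[1]
        sigma2 = resid @ resid / dof               # residual variance estimate
        cov = sigma2 * np.linalg.inv(X.T @ X)      # covariance of the estimates
        t = np.abs(beta[1:]) / np.sqrt(np.diag(cov)[1:])
        weakest = int(np.argmin(t))
        if t[weakest] >= t_min:
            break                                  # every remaining predictor passes
        keep.pop(weakest)
    return [names[j] for j in keep]

# Toy data (illustrative only): y depends strongly on x1 and x2, not on x3.
i = np.arange(12)
x_demo = np.column_stack([i, np.cos(i), i % 3]).astype(float)
y_demo = 5.0 * x_demo[:, 0] - 4.0 * x_demo[:, 1] + 0.05 * np.sin(7 * i)
kept = backward_eliminate(x_demo, y_demo, ["x1", "x2", "x3"])
```

The two strong predictors are retained; whether the irrelevant one is dropped depends on how it happens to correlate with the small disturbance term.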

2.2. Grey Prediction and Testing

Grey prediction is a prediction method for systems with uncertain factors. Grey theory holds that although the behaviour of some systems appears hazy and the data are complex, they are still orderly and have an overall function. The generation of grey numbers therefore seeks regularity in cluttered data: grey theory builds a model on generated data rather than on the original data, so the values used in grey prediction are obtained by inverse processing of the predictions from the GM model of the generated data. Grey prediction is a branch of fuzzy prediction with relatively mature theory and methods.

Model of grey prediction

First, the data are preprocessed.

For example, suppose the original data sequence is

${x}^{\left(0\right)}=\left\{{x}^{\left(0\right)}\left(1\right),{x}^{\left(0\right)}\left(2\right),\cdots ,{x}^{\left(0\right)}\left(N\right)\right\}=\left\{6,3,8,10,7\right\}$

Accumulating this set of data gives

$\begin{array}{l}{x}^{\left(1\right)}\left(1\right)={x}^{\left(0\right)}\left(1\right)=6\\ {x}^{\left(1\right)}\left(2\right)={x}^{\left(0\right)}\left(1\right)+{x}^{\left(0\right)}\left(2\right)=6+3=9\\ {x}^{\left(1\right)}\left(3\right)={x}^{\left(0\right)}\left(1\right)+{x}^{\left(0\right)}\left(2\right)+{x}^{\left(0\right)}\left(3\right)=6+3+8=17\\ {x}^{\left(1\right)}\left(4\right)={x}^{\left(0\right)}\left(1\right)+{x}^{\left(0\right)}\left(2\right)+{x}^{\left(0\right)}\left(3\right)+{x}^{\left(0\right)}\left(4\right)=6+3+8+10=27\\ {x}^{\left(1\right)}\left(5\right)={x}^{\left(0\right)}\left(1\right)+{x}^{\left(0\right)}\left(2\right)+{x}^{\left(0\right)}\left(3\right)+{x}^{\left(0\right)}\left(4\right)+{x}^{\left(0\right)}\left(5\right)=6+3+8+10+7=34\end{array}$

This gives a new data sequence

${x}^{\left(1\right)}=\left\{6,9,17,27,34\right\}$

The above can be summarized as ${x}^{\left(1\right)}\left(i\right)=\underset{j=1}{\overset{i}{\sum }}{x}^{\left(0\right)}\left(j\right)$, $i=1,2,\cdots ,N$. The sequence obtained in this way is called the accumulated generation of the original data sequence (one accumulation); obviously ${x}^{\left(1\right)}\left(1\right)={x}^{\left(0\right)}\left(1\right)$. It can therefore be assumed that the accumulated sequence ${x}^{\left(1\right)}$ is approximated by an exponential curve or even a straight line. To return from the accumulated sequence to the original sequence, a subtraction operation is needed, also called subtraction generation, that is, the difference between two successive values. For the above example:

$\begin{array}{l}\Delta {x}^{\left(1\right)}\left(5\right)={x}^{\left(1\right)}\left(5\right)-{x}^{\left(1\right)}\left(4\right)=34-27=7\\ \Delta {x}^{\left(1\right)}\left(4\right)={x}^{\left(1\right)}\left(4\right)-{x}^{\left(1\right)}\left(3\right)=27-17=10\\ \Delta {x}^{\left(1\right)}\left(3\right)={x}^{\left(1\right)}\left(3\right)-{x}^{\left(1\right)}\left(2\right)=17-9=8\\ \Delta {x}^{\left(1\right)}\left(2\right)={x}^{\left(1\right)}\left(2\right)-{x}^{\left(1\right)}\left(1\right)=9-6=3\\ \Delta {x}^{\left(1\right)}\left(1\right)={x}^{\left(1\right)}\left(1\right)-{x}^{\left(1\right)}\left(0\right)=6-0=6\end{array}$

Summarizing the above formulas, we can get the following results:

$\Delta {x}^{\left(1\right)}\left(i\right)={x}^{\left(1\right)}\left(i\right)-{x}^{\left(1\right)}\left(i-1\right)={x}^{\left(0\right)}\left(i\right)$ (1.4)

where $i=1,2,\cdots ,N$ and ${x}^{\left(1\right)}\left(0\right)=0.$
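The accumulation and subtraction operations can be checked numerically on the example sequence. This is a minimal sketch assuming NumPy; the variable names are illustrative.

```python
import numpy as np

# Original sequence x^(0) from the worked example.
x0 = np.array([6, 3, 8, 10, 7], dtype=float)

# Accumulated generation: running sums of the original sequence.
x1 = np.cumsum(x0)

# Subtraction generation: successive differences, taking x^(1)(0) = 0.
x0_restored = np.diff(x1, prepend=0.0)
```

Here `x1` reproduces the accumulated sequence {6, 9, 17, 27, 34}, and `x0_restored` returns the original values, matching Equation (1.4).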

Modeling principle of grey prediction model.

Given a column of observation data ${x}^{\left(0\right)}=\left\{{x}^{\left(0\right)}\left(1\right),{x}^{\left(0\right)}\left(2\right),\cdots ,{x}^{\left(0\right)}\left(N\right)\right\}$, accumulating it once gives ${x}^{\left(1\right)}=\left\{{x}^{\left(1\right)}\left(1\right),{x}^{\left(1\right)}\left(2\right),\cdots ,{x}^{\left(1\right)}\left(N\right)\right\}$. Suppose ${x}^{\left(1\right)}$ satisfies the first-order differential equation $\frac{\text{d}{x}^{\left(1\right)}}{\text{d}t}+a{x}^{\left(1\right)}=u$ with the initial condition ${x}^{\left(1\right)}={x}^{\left(1\right)}\left({t}_{0}\right)$ when $t={t}_{0}$. Its solution is ${x}^{\left(1\right)}\left(t\right)=\left[{x}^{\left(1\right)}\left({t}_{0}\right)-\frac{u}{a}\right]{\text{e}}^{-a\left(t-{t}_{0}\right)}+\frac{u}{a}.$

Grey modeling estimates the constants $a$ and $u$ by the least squares method from the once-accumulated sequence. Because ${x}^{\left(1\right)}\left(1\right)$ is used as the initial value, ${x}^{\left(1\right)}\left(2\right),{x}^{\left(1\right)}\left(3\right),\cdots ,{x}^{\left(1\right)}\left(N\right)$ are each substituted into the equation, and the difference is used in place of the derivative. Because the data are sampled at equal intervals, $\Delta t=\left(t+1\right)-t=1$, so we obtain

$\frac{\Delta {x}^{\left(1\right)}\left(2\right)}{\Delta t}=\Delta {x}^{\left(1\right)}\left(2\right)={x}^{\left(1\right)}\left(2\right)-{x}^{\left(1\right)}\left(1\right)={x}^{\left(0\right)}\left(2\right)$ (1.5)

Similarly, $\frac{\Delta {x}^{\left(1\right)}\left(3\right)}{\Delta t}={x}^{\left(0\right)}\left(3\right),\cdots ,\frac{\Delta {x}^{\left(1\right)}\left(N\right)}{\Delta t}={x}^{\left(0\right)}\left(N\right)$.

Substituting these into the differential equation gives

$\left\{\begin{array}{l}{x}^{\left(0\right)}\left(2\right)+a{x}^{\left(1\right)}\left(2\right)=u\\ {x}^{\left(0\right)}\left(3\right)+a{x}^{\left(1\right)}\left(3\right)=u\\ \qquad \vdots \\ {x}^{\left(0\right)}\left(N\right)+a{x}^{\left(1\right)}\left(N\right)=u\end{array}\right.$

Moving the $a{x}^{\left(1\right)}\left(i\right)$ term to the right and writing each equation as an inner product of vectors gives

$\left\{\begin{array}{l}{x}^{\left(0\right)}\left(2\right)=\left[-{x}^{\left(1\right)}\left(2\right),1\right]\left[\begin{array}{l}a\\ u\end{array}\right]\\ {x}^{\left(0\right)}\left(3\right)=\left[-{x}^{\left(1\right)}\left(3\right),1\right]\left[\begin{array}{l}a\\ u\end{array}\right]\\ \qquad \vdots \\ {x}^{\left(0\right)}\left(N\right)=\left[-{x}^{\left(1\right)}\left(N\right),1\right]\left[\begin{array}{l}a\\ u\end{array}\right]\end{array}\right.$

Because $\frac{\Delta {x}^{\left(1\right)}}{\Delta t}$ involves the values of the accumulated sequence ${x}^{\left(1\right)}$ at two successive times, it is more appropriate to replace ${x}^{\left(1\right)}\left(i\right)$ by the average of the two, that is, by $\frac{1}{2}\left[{x}^{\left(1\right)}\left(i\right)+{x}^{\left(1\right)}\left(i-1\right)\right]$, $\left(i=2,3,\cdots ,N\right)$. The inner-product form above is then rewritten as the matrix expression

$\left[\begin{array}{c}{x}^{\left(0\right)}\left(2\right)\\ {x}^{\left(0\right)}\left(3\right)\\ ⋮\\ {x}^{\left(0\right)}\left(N\right)\end{array}\right]=\left[\begin{array}{cc}-\frac{1}{2}\left[{x}^{\left(1\right)}\left(2\right)+{x}^{\left(1\right)}\left(1\right)\right]& 1\\ -\frac{1}{2}\left[{x}^{\left(1\right)}\left(3\right)+{x}^{\left(1\right)}\left(2\right)\right]& 1\\ ⋮& ⋮\\ -\frac{1}{2}\left[{x}^{\left(1\right)}\left(N\right)+{x}^{\left(1\right)}\left(N-1\right)\right]& 1\end{array}\right]\left[\begin{array}{l}a\\ u\end{array}\right]$, (1.6)

Let $y={\left({x}^{\left(0\right)}\left(2\right),{x}^{\left(0\right)}\left(3\right),\cdots ,{x}^{\left(0\right)}\left(N\right)\right)}^{\text{T}}$, where T denotes transposition, $B=\left[\begin{array}{cc}-\frac{1}{2}\left[{x}^{\left(1\right)}\left(2\right)+{x}^{\left(1\right)}\left(1\right)\right]& 1\\ -\frac{1}{2}\left[{x}^{\left(1\right)}\left(3\right)+{x}^{\left(1\right)}\left(2\right)\right]& 1\\ \vdots & \vdots \\ -\frac{1}{2}\left[{x}^{\left(1\right)}\left(N\right)+{x}^{\left(1\right)}\left(N-1\right)\right]& 1\end{array}\right]$ and $U=\left[\begin{array}{l}a\\ u\end{array}\right]$. The matrix expression can then be written as $y=BU$, and the least squares estimate is $\hat{U}=\left[\begin{array}{l}\hat{a}\\ \hat{u}\end{array}\right]={\left({B}^{\text{T}}B\right)}^{-1}{B}^{\text{T}}y$.

Substituting the estimates $\hat{a}$ and $\hat{u}$ into the time response equation gives ${\hat{x}}^{\left(1\right)}\left(k+1\right)=\left[{x}^{\left(1\right)}\left(1\right)-\frac{\hat{u}}{\hat{a}}\right]{\text{e}}^{-\hat{a}k}+\frac{\hat{u}}{\hat{a}}$. When $k=1,2,\cdots ,N-1$, ${\hat{x}}^{\left(1\right)}\left(k+1\right)$ is a fitted value; when $k\ge N$, it is a predicted value. These values are relative to the accumulated sequence ${x}^{\left(1\right)}$ and are restored by the subtraction operation: for $k=1,2,\cdots ,N-1$ this gives the fitted values ${\hat{x}}^{\left(0\right)}\left(k+1\right)$ of ${x}^{\left(0\right)}$, and for $k\ge N$ it gives the forecast values of the original series.

This chapter has introduced the theoretical basis of the paper: the concept of the multiple linear regression model with an application example, and the concept and modeling basis of grey theory together with its simple modeling and calculation methods.

3. Prediction Model Based on Multiple Linear Regression and Grey Prediction

3.1. Grey Forecast and Its Advantages

3.1.1. GM (1, 1) Model

GM series models are the basic models of grey prediction theory, and the GM (1, 1) model in particular is widely used. The model used in this paper is the GM (1, 1) model. It has four basic forms: the mean GM (1, 1) model, the original difference GM (1, 1) model, the mean difference GM (1, 1) model and the discrete GM (1, 1) model. They are defined as follows:

Let ${X}^{\left(0\right)}=\left({x}^{\left(0\right)}\left(1\right),{x}^{\left(0\right)}\left(2\right),\cdots ,{x}^{\left(0\right)}\left(n\right)\right)$ be a sequence with ${x}^{\left(0\right)}\left(k\right)\ge 0$, $k=1,2,\cdots ,n$, and let ${X}^{\left(1\right)}$ be the 1-AGO sequence of ${X}^{\left(0\right)}$, where ${x}^{\left(1\right)}\left(k\right)=\underset{i=1}{\overset{k}{\sum }}{x}^{\left(0\right)}\left(i\right)$, $k=1,2,\cdots ,n$. Then

${x}^{\left(0\right)}\left(k\right)+a{x}^{\left(1\right)}\left(k\right)=b$ (2.1)

is called the original form of the GM (1, 1) model; it is essentially a difference equation. The parameter vector $\hat{a}={\left[a,b\right]}^{\text{T}}$ in Equation (2.1) can be estimated by the least squares method

$\hat{a}={\left({B}^{\text{T}}B\right)}^{-1}{B}^{\text{T}}Y$ (2.2)

where Y and B are respectively

$Y=\left[\begin{array}{c}{x}^{\left(0\right)}\left(2\right)\\ {x}^{\left(0\right)}\left(3\right)\\ ⋮\\ {x}^{\left(0\right)}\left(n\right)\end{array}\right]$, $B=\left[\begin{array}{cc}-{x}^{\left(1\right)}\left(2\right)& 1\\ -{x}^{\left(1\right)}\left(3\right)& 1\\ ⋮& ⋮\\ -{x}^{\left(1\right)}\left(n\right)& 1\end{array}\right]$ (2.3)

Estimating the model parameters from the original form of the GM (1, 1) model with Formula (2.2), and using the solution of the original difference Equation (2.1) directly as the time response formula, gives the model called the original difference form of the GM (1, 1) model.
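A sketch of the original difference form follows. The recursion is an assumption made for illustration: since ${x}^{\left(1\right)}\left(k\right)={x}^{\left(1\right)}\left(k-1\right)+{x}^{\left(0\right)}\left(k\right)$, Equation (2.1) rearranges to ${x}^{\left(0\right)}\left(k\right)=\left(b-a{x}^{\left(1\right)}\left(k-1\right)\right)/\left(1+a\right)$, and that step is iterated. The function name and test series are hypothetical.

```python
import numpy as np

def gm11_original_difference(x0, steps=0):
    """Original difference form: estimate (a, b) from (2.2)-(2.3), then iterate (2.1)."""
    x0 = np.asarray(x0, dtype=float)
    x1 = np.cumsum(x0)
    B = np.column_stack([-x1[1:], np.ones(len(x0) - 1)])   # rows [-x1(k), 1], k = 2..n
    Y = x0[1:]
    (a, b), *_ = np.linalg.lstsq(B, Y, rcond=None)
    x0_hat = [x0[0]]
    x1_prev = x0[0]
    for _ in range(len(x0) - 1 + steps):
        # From x0(k) + a * (x1(k-1) + x0(k)) = b, solved for x0(k).
        nxt = (b - a * x1_prev) / (1.0 + a)
        x0_hat.append(nxt)
        x1_prev = x1_prev + nxt
    return a, b, np.array(x0_hat)

# Synthetic geometric series (illustrative only), with one forecast step.
a_od, b_od, x0_od = gm11_original_difference(100.0 * 1.1 ** np.arange(5), steps=1)
```

On an exactly geometric series this form reproduces the inputs and extends them with the same ratio, because the relation between ${x}^{\left(0\right)}\left(k\right)$ and ${x}^{\left(1\right)}\left(k\right)$ is then exactly linear.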

Let ${X}^{\left(0\right)}$ and ${X}^{\left(1\right)}$ be as defined for Equation (2.1), and let

${Z}^{\left(1\right)}=\left({z}^{\left(1\right)}\left(2\right),{z}^{\left(1\right)}\left(3\right),\cdots ,{z}^{\left(1\right)}\left(n\right)\right)$, where ${z}^{\left(1\right)}\left(k\right)=\frac{1}{2}\left({x}^{\left(1\right)}\left(k\right)+{x}^{\left(1\right)}\left(k-1\right)\right)$, $k=2,3,\cdots ,n$. Then

${x}^{\left(0\right)}\left(k\right)+a{z}^{\left(1\right)}\left(k\right)=b$ (2.4)

is called the mean form of the GM (1, 1) model; it is also essentially a difference equation. The parameter vector $\hat{a}={\left[a,b\right]}^{\text{T}}$ in Equation (2.4) can likewise be estimated using Equation (2.2), but note that the matrix $B$ differs from that in Equation (2.3):

$B=\left[\begin{array}{cc}-{z}^{\left(1\right)}\left(2\right)& 1\\ -{z}^{\left(1\right)}\left(3\right)& 1\\ ⋮& ⋮\\ -{z}^{\left(1\right)}\left(n\right)& 1\end{array}\right]$ (2.5)

$\frac{\text{d}{x}^{\left(1\right)}}{\text{d}t}+a{x}^{\left(1\right)}=b$ (2.6)

is called the whitening differential equation of the mean form of the GM (1, 1) model. With the matrix $B$ in Equation (2.2) replaced by Equation (2.5), the parameter vector $\hat{a}={\left[a,b\right]}^{\text{T}}$ is estimated by the least squares method. Using the solution of the whitening differential Equation (2.6) as the time response equation of GM (1, 1) gives a mixed difference and differential model, called the mean mixed form of the GM (1, 1) model, or simply the mean GM (1, 1) model. In the mean GM (1, 1) model, $-a$ is the development coefficient and $b$ is the grey action quantity. The development coefficient $-a$ reflects the development trend of ${\hat{x}}^{\left(1\right)}$ and ${\hat{x}}^{\left(0\right)}$. If the parameters are estimated from the mean form and the solution of the mean difference Equation (2.4) is used directly as the time response formula, the resulting model is called the mean difference form of the GM (1, 1) model, or the mean difference GM (1, 1) model. The form ${x}^{\left(1\right)}\left(k+1\right)={\beta }_{1}{x}^{\left(1\right)}\left(k\right)+{\beta }_{2}$ is called the discrete form of the GM (1, 1) model, or the discrete GM (1, 1) model.
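The discrete form lends itself to a compact sketch. The text does not say how ${\beta }_{1}$ and ${\beta }_{2}$ are estimated, so as an assumption the sketch regresses ${x}^{\left(1\right)}\left(k+1\right)$ on ${x}^{\left(1\right)}\left(k\right)$ by least squares and iterates from ${x}^{\left(1\right)}\left(1\right)$. The function name `dgm11` and the test series are hypothetical.

```python
import numpy as np

def dgm11(x0, steps=0):
    """Discrete GM (1, 1): fit x1(k+1) = beta1 * x1(k) + beta2, then iterate."""
    x0 = np.asarray(x0, dtype=float)
    x1 = np.cumsum(x0)
    A = np.column_stack([x1[:-1], np.ones(len(x1) - 1)])
    (beta1, beta2), *_ = np.linalg.lstsq(A, x1[1:], rcond=None)
    x1_hat = [x1[0]]
    for _ in range(len(x0) - 1 + steps):
        x1_hat.append(beta1 * x1_hat[-1] + beta2)
    # Restore the original-scale sequence by subtraction, with x1(0) = 0.
    x0_hat = np.diff(np.array(x1_hat), prepend=0.0)
    return beta1, beta2, x0_hat

# Synthetic geometric series (illustrative only), with two forecast steps.
beta1_demo, beta2_demo, x0_dgm = dgm11(100.0 * 1.1 ** np.arange(5), steps=2)
```

For this series the accumulated values satisfy ${x}^{\left(1\right)}\left(k+1\right)=1.1{x}^{\left(1\right)}\left(k\right)+100$ exactly, so the fit recovers those two coefficients.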

The GM (1, 1) model builds the prediction model from the system's behaviour data sequence alone, so it is a relatively simple and practical single-sequence modeling method. For time series data it involves only the regular time variable, and for cross-sectional sequence data only the regular serial number of the objects; no other explanatory variables are involved. It is simple yet can reveal valuable information about development and change, so it has been widely used. However, forecasting practice shows that the GM (1, 1) model is sometimes very effective, sometimes unsatisfactory, and sometimes fails altogether. To compensate, combination models can be used to make up for the GM (1, 1) model's inability to describe linear relationships among variables.

3.2. Combining Multiple Linear Regression with Grey Prediction

3.2.1. The Inadequacy of Multiple Linear Regression

Although multiple linear regression is widely used, it has shortcomings. When we cannot determine what functional form the data follow, multiple linear regression cannot determine the specific exponent and cannot describe an exponential growth trend. Grey prediction can complement it in exactly this respect.

3.2.2. Shortcomings of Grey Prediction

Grey prediction sometimes depends too heavily on historical data and does not consider the relationships among the various factors, which leads to large errors in medium- and long-term predictions.

3.2.3. Combining Multiple Linear Regression with Grey Prediction

The grey linear regression combination model makes up for the inability of the linear regression model to describe exponential growth trends and the inability of the GM (1, 1) model to describe linear relationships among variables. It can predict both linear and exponential growth trends, so the data analysis is more accurate.

3.3. Grey Linear Combination Model

By combining the multiple linear regression model with the grey forecasting GM (1, 1) model, this paper proposes a grey linear regression combination model for forecasting: the sum of the linear regression equation $y={\beta }_{0}+{\beta }_{1}{x}_{1}+{\beta }_{2}{x}_{2}+\cdots +{\beta }_{p}{x}_{p}+\epsilon$ and the exponential equation $y={C}_{1}{\text{e}}^{-ak}+{C}_{2}$ is used to fit the 1-AGO sequence ${X}^{\left(1\right)}$ of ${X}^{\left(0\right)}$.

This chapter is central to the paper. It has introduced the advantages and disadvantages of the traditional multiple linear regression model and of the grey GM (1, 1) model. On that basis it proposes a model combining the two, which uses the strengths of each to complement the other and so achieve better predictions. The chapter has also introduced the research significance of the proposed model.

4. Application of Forecasting Model in National Economic Data

4.1. National Economic Index Data and Preprocessing

4.1.1. National Economic Indicators

The national economic indicators used in this paper come from the China Statistical Yearbook 2017 and cover the most recent ten years, that is, gross national product data from 2007 to 2016.

Figure 2 shows the data referenced in this article.

4.1.2. Sample Data Preprocessing

If we want to establish a linear relationship between the gross national product and other values, and hope to achieve the purpose of prediction, then we must establish a multiple linear relationship between the attributes of the current year and the gross national product of the next year.

For example:

$\begin{array}{c}\text{GDP in 2001}=\text{Constant}+{k}_{1}\cdot \text{Gross national income in 2000}\\ \quad +{k}_{2}\cdot \text{Primary industry in 2000}+\cdots +{k}_{5}\cdot \text{GDP per capita}\end{array}$

By analogy, the linear relationship established in this way relates the current year's gross national income, primary industry, secondary industry, tertiary industry and per capita GDP to the next year's GDP; that is, the current year's values are used to predict the next year's GDP. The data in the above table are imported into SPSS as shown in Figure 3.
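The pairing of current-year attributes with next-year GDP can be sketched as below. This is an illustration under assumptions: it uses NumPy, the helper `make_lagged` is hypothetical, and only one predictor (current-year GDP, from the series quoted in Section 4.4) is used because the other indicator columns are not reproduced in the text.

```python
import numpy as np

# GDP series (100 million yuan), 2007-2016, as quoted in Section 4.4.
gdp = np.array([270232.3, 319515.5, 349081.4, 413030.3, 489300.6,
                540367.4, 595244.4, 643974.0, 689052.1, 743585.5])

def make_lagged(features, target):
    """Pair the feature row of year t with the target value of year t + 1."""
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    return features[:-1], target[1:]

# Single illustrative predictor: current-year GDP itself.
X_t, y_next = make_lagged(gdp.reshape(-1, 1), gdp)

# Regression of next-year GDP on a constant and the current-year value.
design = np.column_stack([np.ones(len(X_t)), X_t])
coef, *_ = np.linalg.lstsq(design, y_next, rcond=None)
```

Ten years of data give nine (current year, next year) pairs, which is why the sample available to the regression is one row shorter than the table.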

Figure 2. National economic data table 2007-2016.

Figure 3. SPSS national economic data chart.

4.2. Multiple Linear Regression Forecasting Model Predicts National Economy

The data are imported into SPSS and processed as follows (Figure 4).

“Next year’s gross national product” is entered as the dependent variable, and “GDP, primary industry, secondary industry, tertiary industry and per capita GDP” as the independent variables. Using the Enter method, the multiple linear relationship between them is obtained. The specific operation is shown in Figure 5.

The results are given in Tables 4-7.

4.3. Grey Forecasting Model Predicts National Economy

A grey prediction model is established as follows. Let ${X}^{\left(0\right)}=\left({x}^{\left(0\right)}\left(1\right),{x}^{\left(0\right)}\left(2\right),\cdots ,{x}^{\left(0\right)}\left(n\right)\right)$ be a sequence with ${x}^{\left(0\right)}\left(k\right)\ge 0$, $k=1,2,\cdots ,n$, and let ${X}^{\left(1\right)}$ be the 1-AGO sequence of ${X}^{\left(0\right)}$, where ${x}^{\left(1\right)}\left(k\right)=\underset{i=1}{\overset{k}{\sum }}{x}^{\left(0\right)}\left(i\right)$, $k=1,2,\cdots ,n$. The parameter vector $\hat{a}={\left[a,b\right]}^{\text{T}}$ of ${x}^{\left(0\right)}\left(k\right)+a{x}^{\left(1\right)}\left(k\right)=b$ is then estimated by the least squares method, $\hat{a}={\left({B}^{\text{T}}B\right)}^{-1}{B}^{\text{T}}Y$, where Y and B are

$Y=\left[\begin{array}{c}{x}^{\left(0\right)}\left(2\right)\\ {x}^{\left(0\right)}\left(3\right)\\ ⋮\\ {x}^{\left(0\right)}\left(n\right)\end{array}\right]$, $B=\left[\begin{array}{cc}-{x}^{\left(1\right)}\left(2\right)& 1\\ -{x}^{\left(1\right)}\left(3\right)& 1\\ ⋮& ⋮\\ -{x}^{\left(1\right)}\left(n\right)& 1\end{array}\right]$

The model data are imported into MATLAB, which gives the following graph (Figure 6).

Figure 4. SPSS linear regression operation.

Table 4. Variables entered into table.

a. Dependent variable: Gross national product for the next year (100 million yuan); b. Tolerance = 0.000 limit reached.

Table 5. Model summary table.

a. Predictors: (constant), per capita GDP (yuan), added value of secondary industry (100 million yuan), added value of primary industry (100 million yuan), gross national income (100 million yuan).

Table 6. Coefficient table.

a. Dependent variable: Gross national product for the next year (100 million yuan).

Table 7. Excluded variables table.

a. Dependent variable: Gross national product for the next year (100 million yuan); b. Predictors in the model: (constant), per capita GDP (yuan), added value of secondary industry (100 million yuan), added value of primary industry (100 million yuan), gross national income (100 million yuan).

Figure 5. SPSS linear regression operation.

Figure 6. MATLAB analysis chart.

4.4. Grey Linear Combination Model Predicts National Economy

Because there are many sequences of this model, the model of 2.3 is further improved here, and it is ${X}^{\left(0\right)}=\left\{{x}_{1}^{0},{x}_{2}^{0},\cdots ,{x}_{n}^{0}\right\}$ assumed that it is accumulated to ${X}^{\left(1\right)}=\left\{{x}_{1}^{1},{x}_{2}^{1},\cdots ,{x}_{n}^{1}\right\}$ obtain, in which ${X}_{i}^{1}=\underset{j-1}{\overset{i}{\sum }}{x}_{j}^{0}$ It ${\stackrel{^}{x}}_{k}^{1}={C}_{1}{\text{e}}^{-vk}+{C}_{2}k+{C}_{3}$ is called grey linear regression combination prediction model, where v and ${C}_{1}$ are ${C}_{2}$ undetermined ${C}_{3}$ parameters.

Orde ${z}_{k}={\stackrel{^}{x}}_{k+1}^{1}-{\stackrel{^}{x}}_{k}^{1}={C}_{1}{\text{e}}^{-vk}\left({\text{e}}^{v}-1\right)+{C}_{2}$, ${y}_{m}\left(k\right)={z}_{k+m}-{z}_{k}={C}_{1}{\text{e}}^{-vk}\left({\text{e}}^{vm}-1\right)\left({\text{e}}^{v}-1\right)$

Then

$\frac{y_{m}\left(k+1\right)}{y_{m}\left(k\right)}=\text{e}^{v},$

and from this

$\hat{v}_{m}\left(k\right)=\mathrm{ln}\left(\frac{y_{m}\left(k+1\right)}{y_{m}\left(k\right)}\right).$

All the data in $y_{m}\left(k\right)$ are predicted values $\hat{x}_{k}^{1}$. To make the fitting of v tractable, they are all replaced by the values $x_{k}^{1}$ of the accumulated sequence, so that the fitted values $\hat{v}_{m}\left(k\right)$ of the parameter v to be identified can be determined. Because different values of m give different fitted values $\hat{v}_{m}\left(k\right)$, the values are calculated separately for $m=1,2,\cdots ,n-3$, and the mean of all of them is taken as the fitted value $\tilde{V}$ of the parameter v to be identified:

$\tilde{V}=\frac{\sum_{m=1}^{n-3}\sum_{k=1}^{n-2-m}\hat{v}_{m}\left(k\right)}{\left(n-2\right)\left(n-3\right)/2}$

Once the fitted value $\tilde{V}$ is determined, the grey linear combination prediction model is as follows:

$\hat{x}_{k}^{1}=C_{1}\text{e}^{-\tilde{V}k}+C_{2}k+C_{3}$

Let

$X^{\left(1\right)}=\left[\begin{array}{c}x_{1}^{1}\\ x_{2}^{1}\\ \vdots \\ x_{n}^{1}\end{array}\right]$, $C=\left[\begin{array}{c}C_{1}\\ C_{2}\\ C_{3}\end{array}\right]$, $A=\left[\begin{array}{ccc}\text{e}^{-\tilde{V}}& 1& 1\\ \text{e}^{-2\tilde{V}}& 2& 1\\ \vdots & \vdots & \vdots \\ \text{e}^{-n\tilde{V}}& n& 1\end{array}\right]$

Then $X^{\left(1\right)}=AC$, and the matrix estimation formula of the parameters to be identified, $C_{1}$, $C_{2}$ and $C_{3}$, is

$C=\left(A^{\text{T}}A\right)^{-1}A^{\text{T}}X^{\left(1\right)}$

The prediction of this model can be carried out by the above formulas.
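The least-squares step for the coefficients can be sketched in a few lines of Python. This is only an illustrative sketch, not the SPSS/MATLAB procedure used in the paper: the function names `fit_combination_coefficients` and `predict` are hypothetical, the data are synthetic, and the value of $\tilde{V}$ is assumed to have been estimated already. Row k of the design matrix is taken to be $(\text{e}^{-\tilde{V}k}, k, 1)$, which is what the model equation requires for $X^{(1)}=AC$ to hold.

```python
import numpy as np

def fit_combination_coefficients(x1, v_tilde):
    """Least-squares estimate of C = (C1, C2, C3) for
    x1_k ~ C1 * exp(-v_tilde * k) + C2 * k + C3, with k = 1..n."""
    x1 = np.asarray(x1, dtype=float)
    k = np.arange(1, len(x1) + 1, dtype=float)
    # Design matrix A: one row (exp(-v_tilde * k), k, 1) per observation.
    A = np.column_stack([np.exp(-v_tilde * k), k, np.ones_like(k)])
    # lstsq minimizes ||A C - x1||; for full-rank A this equals
    # (A^T A)^{-1} A^T x1 but is numerically more stable.
    C, *_ = np.linalg.lstsq(A, x1, rcond=None)
    return C

def predict(C, v_tilde, k):
    """Evaluate the fitted model at the (possibly future) indices k."""
    k = np.asarray(k, dtype=float)
    return C[0] * np.exp(-v_tilde * k) + C[1] * k + C[2]

# Synthetic check (assumed values, not the paper's data): generate a
# noise-free sequence from known coefficients and recover them.
true_C = np.array([2.0, 3.0, -1.0])
v_demo = -0.1
k_demo = np.arange(1, 11)
x1_demo = predict(true_C, v_demo, k_demo)
C_hat = fit_combination_coefficients(x1_demo, v_demo)
```

Using `np.linalg.lstsq` instead of forming $(A^{\text{T}}A)^{-1}$ explicitly is a design choice made here for numerical stability; the columns $\text{e}^{-\tilde{V}k}$ and $k$ can be nearly collinear when $\tilde{V}$ is small.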

Taking the gross domestic product for 2007-2016 shown in Figure 7 as the original sequence:

$\begin{array}{l}{X}^{\left(0\right)}=\left\{270232.3,\text{\hspace{0.17em}}319515.5,\text{\hspace{0.17em}}349081.4,\text{\hspace{0.17em}}413030.3,\text{\hspace{0.17em}}489300.6,\text{\hspace{0.17em}}540367.4,\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}595244.4,\text{\hspace{0.17em}}643974,\text{\hspace{0.17em}}689052.1,\text{\hspace{0.17em}}743585.5\right\}\end{array}$

Carrying out a first-order accumulated generation on this sequence gives

$\begin{array}{l}{X}^{\left(1\right)}=\left\{270232.3,\text{\hspace{0.17em}}589747.8,\text{\hspace{0.17em}}938829.2,\text{\hspace{0.17em}}1351859.5,\text{\hspace{0.17em}}1841160.1,\text{\hspace{0.17em}}2381527.5,\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}2976771.9,\text{\hspace{0.17em}}3620745.9,\text{\hspace{0.17em}}4309798.0,\text{\hspace{0.17em}}5053383.5\right\}\end{array}$
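The accumulated generation is simply a running sum of the original sequence. A minimal sketch using NumPy (the variable names are illustrative only) reproduces it from the ten values of $X^{(0)}$ listed above and shows that differencing restores the original data:

```python
import numpy as np

# Original GDP sequence X^(0) for 2007-2016, as listed in the text.
x0 = np.array([270232.3, 319515.5, 349081.4, 413030.3, 489300.6,
               540367.4, 595244.4, 643974.0, 689052.1, 743585.5])

# First-order accumulated generation: x1[k] = x0[0] + ... + x0[k].
x1 = np.cumsum(x0)

# The inverse operation (successive differences) recovers X^(0).
x0_restored = np.diff(x1, prepend=0.0)

print(np.round(x1, 1))
```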

Applying the formulas given in this paper to the accumulated sequence yields the values of $y_{m}\left(k\right)$ and $\hat{v}_{m}\left(k\right)$ shown in Figure 8 and Figure 9.

Taking the mean of all $\hat{v}_{m}\left(k\right)$ gives $\tilde{V}=-0.127.$

Therefore $C=\left[\begin{array}{c}1.3433\\ 2.4962\\ -1.3839\end{array}\right],$ so

$\hat{x}_{k}^{1}=1.3433\text{e}^{0.127k}+2.4962k-1.3839,$

where $\tilde{V}=\frac{\sum_{m=1}^{n-3}\sum_{k=1}^{n-2-m}\hat{v}_{m}\left(k\right)}{\left(n-2\right)\left(n-3\right)/2}$. The accuracy of the fit is measured by the mean relative error over the ten data points:

$error=\frac{1}{10}\sum_{k=1}^{10}\frac{|x_{k}^{1}-\hat{x}_{k}^{1}|}{x_{k}^{1}}.$
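The error measure is the mean of the pointwise relative errors. A small sketch of how it could be computed is given below; the function name `mean_relative_error` is hypothetical, the formula is generalized from 10 points to n points, and the numbers are toy values chosen for easy checking rather than results from this paper.

```python
import numpy as np

def mean_relative_error(x1, x1_hat):
    """Mean of |x1_k - x1_hat_k| / x1_k over all k (x1_k must be nonzero)."""
    x1 = np.asarray(x1, dtype=float)
    x1_hat = np.asarray(x1_hat, dtype=float)
    return float(np.mean(np.abs(x1 - x1_hat) / x1))

# Toy example: relative errors are 0.10, 0.05 and 0.00.
observed = [100.0, 200.0, 400.0]
fitted = [110.0, 190.0, 400.0]
err = mean_relative_error(observed, fitted)
print(err)
```

Computing the same quantity for each candidate model on the same sequence gives a like-for-like basis for comparison.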

Figure 7. Gross domestic product 2007-2016.

Figure 8. Relationship between $y_{m}\left(k\right)$ and m, k.

Figure 9. Relationship between $\hat{v}_{m}\left(k\right)$ and m, k.

4.5. Comparison of Results

The results show that the combined prediction model used in this paper is more accurate than either separate model: its fitted values are closer to the observed values, which shows that it combines the advantages of linear regression with those of grey prediction.

This chapter is the practical part of the paper. The three models described in this paper were each fitted to the data, and a comparison of the results shows that the combined model is the most accurate.

5. Conclusion

Based on the idea of combining linear regression with grey prediction, this article establishes a combined model and compares it with the two separate models. The combined model solves the problem that linear prediction cannot determine the highest power and the functional form, and at the same time solves the problem that the grey prediction model cannot make linear predictions. By combining the two, the model can be applied to sequences that show both linear and exponential growth trends, so it is broadly applicable and can make better predictions for the data.

Cite this paper: Xiao, Y. and Jin, Z.Z. (2021) The Forecast Research of Linear Regression Forecast Model in National Economy. Open Access Library Journal, 8, 1-17. doi: 10.4236/oalib.1107797.
References

   He, X. and Liu, W. (2015) Applied Regression Analysis. 4th Edition, Renmin University of China Press, Beijing.

   Wang, H. and Meng, J. (2007) Predictive Modeling Method of Multiple Linear Regression. Journal of Beihang University, 33, 500-504.

   Zhong, L. and Gao, S. (2017) Application of Multiple Linear Regression Model in Housing Price Trend and Forecast. Science Entrepreneurship Monthly, No. 9, 94-96.

   Leng, J., Gao, X. and Zhu, J. (2016) Application of Multiple Linear Regression Statistical Prediction Model. Statistics and Decision-Making, No. 7, 82-85.

   Liu, S.F., et al. (2017) Grey System Theory and Its Application (8th Edition). Science Press, Beijing.

   Wu, S. (2009) Grey Rough Set Model and Its Application. Science Press, Beijing.

   Deng, J. (2007) Introduction to Grey Mathematical Resource Science. Huazhong University of Science and Technology Press, Wuhan.

   Lu, Y. (2017) Financial Forecasting Method Based on Grey Linear Combination. Statistics and Decision, No. 10, 91-93.

   Wang, B. (2012) Forecasting Application Based on Linear Regression and Neural Network in National Economic Data. Jilin University, Changchun.

   Fisher, R.A. (1936) The Use of Multiple Measures in Taxonomic Problems. Annals of Eugenics, 7, 179-188. https://doi.org/10.1111/j.1469-1809.1936.tb02137.x

   Bagirov, A.M., Julien, U. and Dean, W. (2011) Fast Modified Global k-Means Algorithm for Incremental Cluster Construction. Pattern Recognition, 44, 866-876. https://doi.org/10.1016/j.patcog.2010.10.018

   Zheng, C.L., Fan, J. and Fei, M.R. (2009) PID Neural Network Control Research Based on Fuzzy Neural Network Model. 2009 International Conference on Computational Intelligence and Software Engineering, Wuhan, 11-13 December 2009, 1-4. https://doi.org/10.1109/CISE.2009.5363758
