The objective of estimation procedures is to produce residuals (the estimated noise sequence) with no apparent deviations from stationarity and, in particular, with no dependence among them. If there is no dependence among the residuals, we can regard them as observations of independent random variables, and there is no further modeling to be done except to estimate their mean and variance. If there is significant dependence among the residuals, we need to look for a noise sequence that accounts for the dependence.
In this paper, we examine the covariance structure of powers of the noise sequence when the noise sequence is assumed to be independent and identically distributed normal (Gaussian) random variates with mean zero and finite variance, $\sigma^2$. Some simple tests for checking the hypothesis that the residuals and their powers are observed values of independent and identically distributed random variables are also considered, as are tests for normality of the residuals and their powers.
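The stochastic process $\{X_t, t \in \mathbb{Z}\}$ is said to be strictly stationary if the distribution function is time invariant. That is,

$$F_{X_{t_1}, X_{t_2}, \ldots, X_{t_n}}(x_1, \ldots, x_n) = F_{X_{t_1+k}, X_{t_2+k}, \ldots, X_{t_n+k}}(x_1, \ldots, x_n)$$

In other words, the probability measure for the sequence $\{X_t\}$ is the same as that for $\{X_{t+k}\}$ for all k. If a series satisfies the next three equations, it is said to be weakly or covariance stationary.

$$E(X_t) = \mu \quad \text{for all } t$$

$$\mathrm{Var}(X_t) = \sigma^2 < \infty \quad \text{for all } t$$

$$\mathrm{Cov}(X_t, X_s) = \gamma_{|t-s|} \quad \text{for all } t, s$$

If the process is covariance stationary, all the variances are the same and all the covariances depend on the difference between $t$ and $s$. The moments

$$\gamma_k = \mathrm{Cov}(X_t, X_{t+k}) = E[(X_t - \mu)(X_{t+k} - \mu)], \quad k = 0, \pm 1, \pm 2, \ldots$$

are known as the autocovariance function. The autocorrelations, which do not depend on the units of measurement of $X_t$, are given by

$$\rho_k = \frac{\gamma_k}{\gamma_0}, \quad k = 0, \pm 1, \pm 2, \ldots$$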
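A stochastic process $X_t$, where $t \in \mathbb{Z}$, is called a white noise if, with finite mean and variance, all the autocovariances (1.4) are zero except at lag zero [$\gamma_k = 0$, for $k \neq 0$]. In many applications, $X_t$ is assumed to be normally distributed with mean zero and variance $\sigma^2$, and the series is called a linear Gaussian white noise process if:

$$X_t \sim \text{iid } N(0, \sigma^2)$$

$$\gamma_k = \begin{cases} \sigma^2, & k = 0 \\ 0, & k \neq 0 \end{cases}$$

$$\rho_k = \begin{cases} 1, & k = 0 \\ 0, & k \neq 0 \end{cases}$$

$$\phi_{kk} = \begin{cases} 1, & k = 0 \\ 0, & k \neq 0 \end{cases}$$

where $\phi_{kk}$ is known as the partial autocorrelation function. For large n, the sample autocorrelations:

$$\hat{\rho}_k = \frac{\sum_{t=1}^{n-k}(X_t - \bar{X})(X_{t+k} - \bar{X})}{\sum_{t=1}^{n}(X_t - \bar{X})^2}$$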
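of an iid sequence with finite variance are approximately distributed as $N(0, 1/n)$. We can use this to do significance tests for the autocorrelation coefficients by constructing a confidence interval. If $X_1, X_2, \ldots, X_n$ is a realization of such an iid sequence, about $100(1-\alpha)\%$ of the sample autocorrelations should fall between the bounds:

$$\pm \frac{z_{1-\alpha/2}}{\sqrt{n}}$$

where $z_{1-\alpha/2}$ is the $1-\alpha/2$ quantile of the standard normal distribution. If the null and alternative hypotheses are:

$$H_0: \rho_k = 0 \quad \text{and} \quad H_1: \rho_k \neq 0$$

then $H_0$ is rejected at level $\alpha$ when $|\hat{\rho}_k|$ exceeds this bound, where $\hat{\rho}_k$ are the autocorrelations at lag k computed for $X_t$.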
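The single-lag check described above can be sketched in a few lines. This is a minimal illustration, assuming the usual sample autocorrelation estimator and the large-sample $N(0, 1/n)$ approximation; the function names `sample_acf` and `acf_bound` and the simulated series are hypothetical, not taken from the paper.

```python
import math
import random
from statistics import NormalDist

def sample_acf(x, max_lag):
    """Sample autocorrelations at lags 1..max_lag of the sequence x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(1, max_lag + 1)]

def acf_bound(n, alpha=0.05):
    """Half-width z_{1-alpha/2} / sqrt(n) of the white noise band."""
    return NormalDist().inv_cdf(1 - alpha / 2) / math.sqrt(n)

random.seed(1)  # illustrative data only
x = [random.gauss(0.0, 1.0) for _ in range(500)]
rho = sample_acf(x, 20)
bound = acf_bound(len(x))
outside = [k for k, r in enumerate(rho, start=1) if abs(r) > bound]
print(f"bound = {bound:.4f}; lags outside the band: {outside}")
```

For an iid sequence, roughly 5% of the lags would be expected to fall outside the band at $\alpha = 0.05$, so one or two exceedances out of 20 lags are not by themselves evidence against the null.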
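We can also test the joint hypothesis that all m of the correlation coefficients are simultaneously equal to zero. The null and alternative hypotheses are:

$$H_0: \rho_1 = \rho_2 = \cdots = \rho_m = 0 \quad \text{and} \quad H_1: \rho_k \neq 0 \text{ for some } k \in \{1, 2, \ldots, m\} \qquad (1.11)$$

The most popular test for (1.11) is the Box and Pierce (1970) portmanteau test, which admits the following form

$$Q_{BP} = n \sum_{k=1}^{m} \hat{\rho}_k^2$$

where m is the so-called lag truncation number and is (typically) assumed to be fixed. Under the assumption that $X_t$ is an iid sequence, $Q_{BP}$ is asymptotically a chi-squared random variable with m degrees of freedom. Ljung and Box modified the statistic to increase the power of the test in finite samples as

$$Q_{LB} = n(n+2) \sum_{k=1}^{m} \frac{\hat{\rho}_k^2}{n-k}$$

Several values of m are often used, and simulation studies suggest that the choice of $m \approx \ln(n)$ provides better power performance.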
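The two portmanteau statistics can be computed directly from the sample autocorrelations. The sketch below assumes the standard forms $Q_{BP} = n\sum \hat{\rho}_k^2$ and $Q_{LB} = n(n+2)\sum \hat{\rho}_k^2/(n-k)$; because the Python standard library has no chi-square quantile function, the critical value uses the Wilson-Hilferty approximation, which is an assumption of this example and not part of the paper. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def sample_acf(x, max_lag):
    """Sample autocorrelations at lags 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(1, max_lag + 1)]

def box_pierce(x, m):
    """Box-Pierce statistic n * sum of squared autocorrelations."""
    return len(x) * sum(r * r for r in sample_acf(x, m))

def ljung_box(x, m):
    """Ljung-Box statistic with the finite-sample weights (n + 2) / (n - k)."""
    n = len(x)
    rho = sample_acf(x, m)
    return n * (n + 2) * sum(r * r / (n - k) for k, r in enumerate(rho, start=1))

def chi2_quantile_wh(p, df):
    """Approximate chi-square quantile (Wilson-Hilferty), not an exact value."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * math.sqrt(c)) ** 3

random.seed(2)  # illustrative data only
x = [random.gauss(0.0, 1.0) for _ in range(300)]
m = round(math.log(len(x)))  # lag truncation number near ln(n)
q_bp, q_lb = box_pierce(x, m), ljung_box(x, m)
crit = chi2_quantile_wh(0.95, m)
print(f"m = {m}, Q_BP = {q_bp:.3f}, Q_LB = {q_lb:.3f}, approx. 5% critical value = {crit:.3f}")
```

Since every weight $(n+2)/(n-k)$ exceeds one, the Ljung-Box value is always at least as large as the Box-Pierce value for the same data and m.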
Another portmanteau test, formulated by McLeod and Li, can be used as a further test of the iid hypothesis, since if the data are iid, then the squared data are also iid. It is based on the same statistic used for the Ljung-Box test:

$$Q_{ML} = n(n+2) \sum_{k=1}^{m} \frac{\hat{\rho}_{X^2}^2(k)}{n-k}$$

where the sample autocorrelations of the data are replaced by the sample autocorrelations of the squared data, $\hat{\rho}_{X^2}(k)$.
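The squared-data version needs no new machinery: it is the Ljung-Box statistic applied to $X_t^2$. The sketch below assumes that form; the ARCH(1)-type recursion is an illustrative example (chosen here, not taken from the paper) of a series whose squares are dependent, and the function names are hypothetical.

```python
import random

def sample_acf(x, max_lag):
    """Sample autocorrelations at lags 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(1, max_lag + 1)]

def ljung_box(x, m):
    """Ljung-Box statistic n(n+2) * sum rho_k^2 / (n - k)."""
    n = len(x)
    rho = sample_acf(x, m)
    return n * (n + 2) * sum(r * r / (n - k) for k, r in enumerate(rho, start=1))

def mcleod_li(x, m):
    """Ljung-Box statistic computed on the squared data."""
    return ljung_box([v * v for v in x], m)

random.seed(3)  # illustrative data only
iid = [random.gauss(0.0, 1.0) for _ in range(500)]

# Illustrative ARCH(1)-type series: the conditional variance depends on the
# previous squared value, so the squares are serially dependent.
arch, prev = [], 0.0
for _ in range(500):
    prev = random.gauss(0.0, 1.0) * (0.2 + 0.7 * prev * prev) ** 0.5
    arch.append(prev)

for name, series in (("iid", iid), ("arch", arch)):
    print(f"{name}: Q_LB = {ljung_box(series, 6):.2f}, Q_ML = {mcleod_li(series, 6):.2f}")
```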
According to Shao (2011), the methodology for testing for white noise can be roughly divided into two categories: time domain tests and frequency domain tests. Other time domain tests include the turning point test, the difference-sign test and the rank test. Another time domain test is to fit an autoregressive model to the data and choose the order which minimizes the AICC statistic. A selected order equal to zero suggests that the data are white noise.
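Let

$$f(\omega) = \frac{1}{2\pi} \sum_{k=-\infty}^{\infty} \rho_k e^{-ik\omega}, \quad \omega \in [-\pi, \pi]$$

be the normalized spectral density of $X_t$. The normalized spectral density function for the linear Gaussian white noise process is

$$f(\omega) = \frac{1}{2\pi}, \quad \omega \in [-\pi, \pi]$$

The equivalent frequency domain expressions to H0 and H1 are

H0: $f(\omega) = \frac{1}{2\pi}$ for all $\omega \in [-\pi, \pi]$ and H1: $f(\omega) \neq \frac{1}{2\pi}$ for some $\omega \in [-\pi, \pi]$ (1.17)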
In the frequency domain,  proposed test statistics based on the famous and processes  , and a rigorous theoretical treatment of their limiting distributions was provided by  . Some contributions to the frequency domain tests can be found in  and  , among others. This study will concentrate on the time domain approach only.
A stochastic process may have the covariance structure (1.6) even when it is not the linear Gaussian white noise process. Examples are found in the study of bilinear time series processes. Researchers are often confronted with the choice of the linear Gaussian white noise process for use in constructing time series models or generating other stationary processes in simulation experiments. The question now is, "How do we distinguish the linear Gaussian white noise process from other processes with similar covariance structure?" Additional properties of the linear Gaussian white noise process are needed for proper identification and characterization of the process among other processes with similar covariance structure. Therefore, the ultimate aim of this study is to use higher moments to assess the acceptability of the linear Gaussian white noise process. The first moment (mean) and the second and higher moments (variance, covariances, skewness and kurtosis) of powers of the linear Gaussian white noise process are established in Section 2. The methodology is discussed in Section 3, the results are contained in Section 4, and Section 5 is the conclusion.
2. Mean, Variance and Covariances of Powers of the Linear Gaussian White Noise Process
2.1. Mean of Powers of the Linear Gaussian White Noise Process
Let $Y_t = X_t^d$, $d = 1, 2, 3, \ldots$, where $X_t$ is the linear Gaussian white noise process. The expected values of $Y_t$ are needed for the effective determination of the variance and covariance structure of $Y_t$. Lemma 2.1 gives the required result.
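Lemma 2.1: Let $X_t$ be a linear Gaussian white noise process with mean zero and variance $\sigma^2$ ($X_t$ follows iid $N(0, \sigma^2)$), then

$$E(X_t^d) = \begin{cases} \sigma^d \prod_{k=1}^{d/2} (2k-1), & d \text{ even} \\ 0, & d \text{ odd} \end{cases} \qquad (2.1)$$

By definition,

$$E(X_t^d) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} x^d e^{-x^2/(2\sigma^2)} \, dx$$

Let $z = x/\sigma$, then

$$E(X_t^d) = \frac{\sigma^d}{\sqrt{2\pi}} \int_{-\infty}^{\infty} z^d e^{-z^2/2} \, dz \qquad (2.5)$$

1) Case I: (d even)

The integrand is an even function, so Equation (2.5) reduces to

$$E(X_t^d) = \frac{2\sigma^d}{\sqrt{2\pi}} \int_{0}^{\infty} z^d e^{-z^2/2} \, dz$$

and the substitution $u = z^2/2$ gives

$$E(X_t^d) = \frac{2^{d/2} \sigma^d}{\sqrt{\pi}} \int_{0}^{\infty} u^{(d-1)/2} e^{-u} \, du \qquad (2.8)$$

The integral in Equation (2.8) is a gamma function and by definition

$$\Gamma(\alpha) = \int_{0}^{\infty} u^{\alpha-1} e^{-u} \, du$$

so that

$$E(X_t^d) = \frac{2^{d/2} \sigma^d}{\sqrt{\pi}} \, \Gamma\!\left(\frac{d+1}{2}\right) = \sigma^d \prod_{k=1}^{d/2} (2k-1)$$

2) Case II: (d odd)

The integrand in Equation (2.5) is an odd function, so the integral over $(-\infty, \infty)$ vanishes and $E(X_t^d) = 0$.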
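The moments in Lemma 2.1 are easy to check numerically. The sketch below assumes the standard closed form for a zero-mean normal variable, $E(X^d) = \sigma^d \cdot 1 \cdot 3 \cdots (d-1)$ for even d and zero for odd d, and compares it with the equivalent gamma-function expression $2^{d/2}\sigma^d\,\Gamma((d+1)/2)/\sqrt{\pi}$. The function names are hypothetical.

```python
import math

def odd_product(d):
    """1 * 3 * 5 * ... * (d - 1) for even d (the empty product is 1)."""
    prod = 1
    for k in range(1, d, 2):
        prod *= k
    return prod

def normal_moment(d, sigma=1.0):
    """E(X**d) for X ~ N(0, sigma^2): zero for odd d, closed form for even d."""
    if d % 2 == 1:
        return 0.0
    return sigma ** d * odd_product(d)

def normal_moment_gamma(d, sigma=1.0):
    """The same even-d moment written with the gamma function."""
    return 2 ** (d / 2) * sigma ** d * math.gamma((d + 1) / 2) / math.sqrt(math.pi)

for d in range(1, 9):
    line = f"d = {d}: E(X^d) = {normal_moment(d):g}"
    if d % 2 == 0:
        line += f" (gamma form: {normal_moment_gamma(d):.6f})"
    print(line)
```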
2.2. Variances of Powers of the Linear Gaussian White Noise Process
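Theorem 2.2: Let $X_t$ be a linear Gaussian white noise process with mean zero and variance $\sigma^2$ ($X_t$ follows iid $N(0, \sigma^2)$), then

$$\mathrm{Var}(X_t^d) = \begin{cases} \sigma^{2d} \left[ \prod_{k=1}^{d} (2k-1) - \left( \prod_{k=1}^{d/2} (2k-1) \right)^2 \right], & d \text{ even} \\ \sigma^{2d} \prod_{k=1}^{d} (2k-1), & d \text{ odd} \end{cases}$$

Let $Y_t = X_t^d$, then the expected value of $Y_t$ is given by Equation (2.1), and

$$\mathrm{Var}(Y_t) = E(X_t^{2d}) - \left[ E(X_t^d) \right]^2$$

Since $2d$ is always even, Equation (2.1) gives $E(X_t^{2d}) = \sigma^{2d} \prod_{k=1}^{d} (2k-1)$.

Case I: (d even)

From Equation (2.1), $E(X_t^d) = \sigma^d \prod_{k=1}^{d/2} (2k-1)$, so

$$\mathrm{Var}(Y_t) = \sigma^{2d} \left[ \prod_{k=1}^{d} (2k-1) - \left( \prod_{k=1}^{d/2} (2k-1) \right)^2 \right]$$

Case II: (d odd)

From Equation (2.1), $E(X_t^d) = 0$, so

$$\mathrm{Var}(Y_t) = \sigma^{2d} \prod_{k=1}^{d} (2k-1)$$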
Table 1 summarizes the mean and variances of $Y_t = X_t^d$. The standard deviation of $Y_t$ is also included when $\sigma = 1$. A plot of the standard deviation of $Y_t$ against d for fixed $\sigma$ is given in Figure 1. From Figure 1, we note that for fixed $\sigma$, an increase in d leads to an exponential increase in the standard deviation.
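The quantities of the kind summarized in Table 1 can be regenerated from the identity $\mathrm{Var}(X^d) = E(X^{2d}) - [E(X^d)]^2$. The sketch below is a minimal illustration that assumes the standard normal-moment formula; it does not reproduce the paper's table layout, and the function names are hypothetical.

```python
import math

def normal_moment(d, sigma=1.0):
    """E(X**d) for X ~ N(0, sigma^2)."""
    if d % 2 == 1:
        return 0.0
    prod = 1
    for k in range(1, d, 2):  # 1 * 3 * ... * (d - 1)
        prod *= k
    return sigma ** d * prod

def var_power(d, sigma=1.0):
    """Var(X**d) = E(X**(2d)) - E(X**d)**2."""
    return normal_moment(2 * d, sigma) - normal_moment(d, sigma) ** 2

print(" d   mean   variance   std. dev.  (sigma = 1)")
for d in range(1, 7):
    v = var_power(d)
    print(f"{d:2d} {normal_moment(d):6g} {v:10g} {math.sqrt(v):11.4f}")
```

With $\sigma = 1$ the variances for d = 1, 2, 3, 4 come out as 1, 2, 15 and 96, which shows how quickly the spread of $X_t^d$ grows with d.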
The specific objective of this paper is to investigate whether powers of $X_t$ are also iid and to determine the distribution of $X_t^d$, especially for $d = 2$. The analytical proofs are provided in Section 2.3.
2.3. Covariances of Powers of the Linear Gaussian White Noise Process
Theorem 2.3: If $X_t$ is a linear Gaussian white noise process, then higher powers of $X_t$ are also white noise processes (iid) but not normally distributed.

Figure 1. Plot of standard deviation of $Y_t = X_t^d$ against power (d) for fixed σ = 1.

Table 1. Mean, variance and standard deviation of $Y_t = X_t^d$.
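Since $X_t$ are iid and $Y_t = X_t^d$, we consider

$$\mathrm{Cov}(Y_t, Y_{t+k}) = E(X_t^d X_{t+k}^d) - E(X_t^d) E(X_{t+k}^d)$$

for $k \neq 0$. However, for $k \neq 0$, $E(X_t^d X_{t+k}^d) = E(X_t^d) E(X_{t+k}^d)$ by independence. Hence

$$\mathrm{Cov}(Y_t, Y_{t+k}) = 0, \quad k \neq 0 \qquad (2.20)$$

It is clear from Equation (2.20) that when $X_t$ are iid, the powers of $X_t$ are uncorrelated; moreover, since functions of independent random variables are themselves independent, the powers of $X_t$ are also iid. That is,

$$\rho_k^{(Y)} = \begin{cases} 1, & k = 0 \\ 0, & k \neq 0 \end{cases}$$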
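The probability density function (p.d.f.) of $Y_t = X_t^2$ can be obtained to enable a detailed study of the series. Theorem 2.4 gives the p.d.f. of $Y_t = X_t^2$.

Theorem 2.4: If $X_t$ is a linear Gaussian white noise process, then $Y_t = X_t^2$ has the p.d.f.

$$g(y) = \begin{cases} \dfrac{1}{\sigma\sqrt{2\pi}} \, y^{-1/2} e^{-y/(2\sigma^2)}, & y > 0 \\ 0, & \text{otherwise} \end{cases}$$

If $X \sim N(0, \sigma^2)$ and $Y = X^2$, the distribution function of Y is, for $y \geq 0$,

$$G(y) = P(X^2 \leq y) = P(-\sqrt{y} \leq X \leq \sqrt{y}) = 2 \int_{0}^{\sqrt{y}} \frac{1}{\sigma\sqrt{2\pi}} e^{-x^2/(2\sigma^2)} \, dx$$

Let $x = \sqrt{w}$, then since $dx = \frac{1}{2\sqrt{w}} \, dw$, we have

$$G(y) = \int_{0}^{y} \frac{1}{\sigma\sqrt{2\pi}} \, w^{-1/2} e^{-w/(2\sigma^2)} \, dw, \quad y \geq 0$$

Of course $G(y) = 0$, where $y < 0$. The p.d.f. of Y is $g(y) = G'(y)$ and, by one form of the fundamental theorem of calculus,

$$g(y) = \frac{1}{\sigma\sqrt{2\pi}} \, y^{-1/2} e^{-y/(2\sigma^2)}, \quad 0 < y < \infty$$

Note that the p.d.f. of $Y_t$ is the p.d.f. of a gamma distribution with parameters $\alpha = 1/2$ and $\beta = 2\sigma^2$. That is, $Y_t \sim \mathrm{Gamma}(1/2, 2\sigma^2)$.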
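The gamma form of the density of $X_t^2$ can be verified numerically. The sketch below assumes the shape-scale parameterization with shape $1/2$ and scale $2\sigma^2$, and uses the identity $P(X^2 \leq y) = \mathrm{erf}\left(\sqrt{y/(2\sigma^2)}\right)$ for the distribution function; the function names are hypothetical.

```python
import math

def pdf_square(y, sigma=1.0):
    """Density of Y = X**2 for X ~ N(0, sigma^2)."""
    if y <= 0:
        return 0.0
    return y ** -0.5 * math.exp(-y / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def gamma_pdf(y, shape, scale):
    """Gamma density in the shape-scale parameterization."""
    if y <= 0:
        return 0.0
    return y ** (shape - 1) * math.exp(-y / scale) / (math.gamma(shape) * scale ** shape)

def cdf_square(y, sigma=1.0):
    """P(X**2 <= y) = P(|X| <= sqrt(y)), written with the error function."""
    if y <= 0:
        return 0.0
    return math.erf(math.sqrt(y / (2 * sigma ** 2)))

sigma = 1.5
for y in (0.1, 1.0, 5.0):
    h = 1e-5
    slope = (cdf_square(y + h, sigma) - cdf_square(y - h, sigma)) / (2 * h)
    print(f"y = {y}: g(y) = {pdf_square(y, sigma):.6f}, "
          f"gamma pdf = {gamma_pdf(y, 0.5, 2 * sigma ** 2):.6f}, G'(y) = {slope:.6f}")
```

The three columns agree, which reflects the two steps of the proof: differentiating the distribution function gives the density, and that density is the gamma density with the stated parameters.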
However, for a more detailed study of the behavior of the linear Gaussian white noise process, the coefficients of symmetry and kurtosis for powers of the process are provided in Section 2.4.
2.4. Coefficient of Symmetry and Kurtosis for Powers of the Linear Gaussian White Noise Process
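Non-normality of higher powers of $X_t$ ($Y_t = X_t^d$, $d \geq 2$) can also be confirmed by the coefficient of symmetry and kurtosis defined by

$$\beta_1 = \frac{E\left[(Y_t - E(Y_t))^3\right]}{\left[\mathrm{Var}(Y_t)\right]^{3/2}}$$

$$\beta_2 = \frac{E\left[(Y_t - E(Y_t))^4\right]}{\left[\mathrm{Var}(Y_t)\right]^{2}} \qquad (2.24)$$

Figure 2. Plot of kurtosis coefficient against power of the linear Gaussian white noise process.

Table 2. Coefficient of symmetry and kurtosis for $Y_t = X_t^d$.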
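The coefficients of symmetry and kurtosis of $X_t^d$ follow from the raw moments $E(Y^j) = E(X^{jd})$ by expanding the central moments. The sketch below assumes the usual definitions (third and fourth central moments divided by the variance raised to 3/2 and 2 respectively); `power_shape` is a hypothetical helper name.

```python
from math import comb

def normal_moment(d, sigma=1.0):
    """E(X**d) for X ~ N(0, sigma^2)."""
    if d % 2 == 1:
        return 0.0
    prod = 1
    for k in range(1, d, 2):  # 1 * 3 * ... * (d - 1)
        prod *= k
    return sigma ** d * prod

def power_shape(d, sigma=1.0):
    """Return (skewness, kurtosis) of Y = X**d for X ~ N(0, sigma^2)."""
    raw = [1.0] + [normal_moment(j * d, sigma) for j in range(1, 5)]  # E(Y**j)
    mu = raw[1]

    def central(r):
        # Binomial expansion of E[(Y - mu)**r] in terms of raw moments
        return sum(comb(r, j) * raw[j] * (-mu) ** (r - j) for j in range(r + 1))

    var = central(2)
    return central(3) / var ** 1.5, central(4) / var ** 2

for d in range(1, 6):
    skew, kurt = power_shape(d)
    print(f"d = {d}: skewness = {skew:.4f}, kurtosis = {kurt:.4f}")
```

Only d = 1 returns the normal values (skewness 0, kurtosis 3). Odd powers stay symmetric but have kurtosis far above 3, and even powers are skewed as well. Both coefficients are scale free, so the values do not depend on $\sigma$.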
3.1. Checking for Normality
If the noise process is Gaussian (that is, if all of its joint distributions are normal), then stronger conclusions can be drawn when a model is fitted to the data. We have shown that all higher powers of the linear Gaussian process are non-normal. The only reasonable test is the one that enables us to check whether the observations are from an iid normal sequence. The Jarque-Bera (JB) test (Jarque and Bera, 1980, 1981) for normality can be used. The JB test is based on the fact that the normal distribution (with any mean or variance) has a skewness coefficient of zero and a kurtosis coefficient of three. We can test whether these two conditions hold against a suitable alternative, and the JB test statistic is

$$JB = n \left[ \frac{S^2}{6} + \frac{(K - 3)^2}{24} \right]$$

where n is the sample size, while S and K are the sample skewness and kurtosis coefficients. The asymptotic null distribution of JB is $\chi^2$ with 2 degrees of freedom.
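A minimal sketch of the JB computation follows, assuming the standard form $JB = n[S^2/6 + (K-3)^2/24]$ with moment-based (divisor n) sample skewness and kurtosis. Because the chi-square distribution with 2 degrees of freedom has the closed-form upper tail $e^{-x/2}$, the p-value needs no statistical library. The function name and the simulated data are illustrative only.

```python
import math
import random

def jarque_bera(x):
    """Return (JB statistic, asymptotic p-value) for the sample x."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n * (skew ** 2 / 6 + (kurt - 3) ** 2 / 24)
    return jb, math.exp(-jb / 2)  # chi-square(2) upper tail probability

random.seed(4)  # illustrative data only
x = [random.gauss(0.0, 1.0) for _ in range(2000)]
critical = -2 * math.log(0.05)  # 5% critical value of chi-square(2)
for d in (1, 2, 3):
    jb, p = jarque_bera([v ** d for v in x])
    print(f"d = {d}: JB = {jb:.2f}, p-value = {p:.4g}, 5% critical value = {critical:.3f}")
```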
3.2. White Noise Testing
We have shown that the autocorrelations of $Y_t = X_t^d$ are those of a white noise series if $X_t$ are iid. We will adopt the Ljung-Box test by replacing the sample autocorrelations of the data with those of $Y_t$ and use the statistic

$$Q_{LB}(Y) = n(n+2) \sum_{k=1}^{m} \frac{\hat{\rho}_Y^2(k)}{n-k}$$

The hypothesis of iid data is then rejected at level $\alpha$ if the observed $Q_{LB}(Y)$ is larger than the $(1-\alpha)$ quantile of the $\chi^2$ distribution with m degrees of freedom.
3.3. Determining the Optimal Value of d
Figure 1 suggests two growth models: 1) the quadratic growth model and 2) exponential growth model. We are going to use the behavior of the variance and kurtosis coefficient to determine the optimal value of d. The optimal value is that value of d that gives a perfect fit for either the quadratic or exponential growth curves. Using the standard deviation for , the exponential growth curve performs better than the quadratic growth curve. The quadratic growth curve fitted negative values to positive values at the different data points while the exponential curve fitted only positive values. However, the residual of the resulting exponential curve is very large as measured by the following accuracy measures  .
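Mean Absolute Error (MAE)

$$\mathrm{MAE} = \frac{1}{m} \sum_{d=1}^{m} |e_d|$$

Mean Absolute Percentage Error (MAPE)

$$\mathrm{MAPE} = \frac{100}{m} \sum_{d=1}^{m} \left| \frac{e_d}{y_d} \right|$$

Mean Squared Error (MSE)

$$\mathrm{MSE} = \frac{1}{m} \sum_{d=1}^{m} e_d^2$$

where m is the value of d used in the trend analysis and $e_d = y_d - \hat{y}_d$ is the residual, that is, the difference between the observed value $y_d$ and the fitted value $\hat{y}_d$.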
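The three accuracy measures are straightforward to compute. The sketch below assumes the usual definitions (averages of absolute, absolute percentage and squared residuals); the function names and the small data set are illustrative only.

```python
def mae(actual, fitted):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, fitted)) / len(actual)

def mape(actual, fitted):
    """Mean absolute percentage error; requires nonzero actual values."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, fitted)) / len(actual)

def mse(actual, fitted):
    """Mean squared error."""
    return sum((a - f) ** 2 for a, f in zip(actual, fitted)) / len(actual)

actual = [1.0, 2.0, 3.0]   # illustrative observed values
fitted = [1.0, 2.0, 5.0]   # illustrative fitted values
print(f"MAE = {mae(actual, fitted):.4f}, "
      f"MAPE = {mape(actual, fitted):.4f}%, "
      f"MSE = {mse(actual, fitted):.4f}")
```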
Table 3. Summary of accuracy measures for the exponential and quadratic curves using the standard deviation of for .

Table 4. Fitting exponential and quadratic curves to the standard deviation of powers of linear Gaussian white noise process when and .

*Exponential and Quadratic trend analysis cannot be possible for or .

When , the quadratic growth curve performs better than the exponential curve with minimal residual. Both curves fitted positive values at different data points. We also observed from Table 3 that with $d = 3$, the quadratic growth curve performs better than the exponential growth curve. The resulting quadratic curve yielded zero residual. The implication of the result is that we obtain a perfect fit for the data points when $d = 3$ for the quadratic curve only. Hence, the optimal value of d is 3 when we use the standard deviation curve.
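One way to see where a zero residual can come from is that a quadratic curve has three coefficients and therefore passes exactly through any three points with distinct abscissae. The sketch below illustrates this with the standard deviations of $X_t^d$ for d = 1, 2, 3 at $\sigma = 1$ (1, $\sqrt{2}$, $\sqrt{15}$, computed from the standard normal-moment formula). The Lagrange-form helper is a hypothetical stand-in and is not necessarily the trend-fitting routine used in the paper.

```python
import math

def quadratic_through(points):
    """Return the quadratic (Lagrange form) passing through three (d, y) points."""
    (x0, y0), (x1, y1), (x2, y2) = points

    def q(x):
        return (y0 * (x - x1) * (x - x2) / ((x0 - x1) * (x0 - x2))
                + y1 * (x - x0) * (x - x2) / ((x1 - x0) * (x1 - x2))
                + y2 * (x - x0) * (x - x1) / ((x2 - x0) * (x2 - x1)))

    return q

# Standard deviations of X**d for d = 1, 2, 3 when sigma = 1
sd = {1: 1.0, 2: math.sqrt(2.0), 3: math.sqrt(15.0)}
q = quadratic_through(list(sd.items()))
residuals = [sd[d] - q(d) for d in sd]
print("residuals at d = 1, 2, 3:", residuals)
print(f"extrapolated value at d = 4: {q(4):.4f}; "
      f"standard deviation of X**4: {math.sqrt(96.0):.4f}")
```

The residuals at the three fitted points vanish, while the extrapolated value at d = 4 falls short of the standard deviation of $X_t^4$, so the exact fit holds only for the three points used.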
Figure 2 also suggests two growth models: 1) the quadratic growth model and 2) exponential growth model. Using the kurtosis coefficient for , the exponential growth curve performs better than the quadratic growth curve. The quadratic growth curve fitted negative values to positive values at the different data points while the exponential curve fitted only positive values.
When $d = 3$, the quadratic growth curve performs better than the exponential growth curve. The resulting quadratic curve yielded zero residual, as did that of the standard deviation curve. The implication of these results is that we obtain a perfect fit for the data points when $d = 3$ for the quadratic curve only. Hence, the optimal value of d is 3. Therefore, we recommend that, in order to stop the variance from exploding, the data points should not be raised to a power greater than three.
3.4. On the Use of Higher Moment for the Acceptability of the Linear Gaussian White Noise Process
We have shown that if $X_t$ is a linear Gaussian white noise process, $Y_t = X_t^d$ is also iid but not normally distributed. Using the variances and kurtosis of $Y_t$, we were able to establish that the optimal value of d is three. Variances and kurtosis of $Y_t$ have been given in Table 5 and Table 6 respectively. It is also clear from Equation (2.24) that the kurtosis itself is a function of variances. We, therefore, insist that for a stochastic process to be accepted as a linear Gaussian white noise process, the following variances must be true:

$$\mathrm{Var}(X_t^2) = 2\sigma^4 \quad \text{and} \quad \mathrm{Var}(X_t^3) = 15\sigma^6$$
Table 5. Summary of accuracy measures for the exponential and quadratic curves using the Kurtosis Coefficient of for .
*Exponential and Quadratic trend analysis cannot be possible for or .
Table 6. Fitting exponential and quadratic curves to the kurtosis coefficient of powers of linear Gaussian white noise process when and .
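In view of these, we suggest that the following two null hypotheses be tested before a stochastic process is accepted as a linear Gaussian white noise process:

$$H_0: \mathrm{Var}(X_t^2) = 2\sigma^4 \qquad (3.12)$$

$$H_0: \mathrm{Var}(X_t^3) = 15\sigma^6 \qquad (3.13)$$

Then, the chi-square test statistic for testing (3.12) is

$$\chi^2 = \frac{(n-1)\,\hat{\sigma}^2_{X^2}}{2\sigma^4}$$

while that for (3.13) is

$$\chi^2 = \frac{(n-1)\,\hat{\sigma}^2_{X^3}}{15\sigma^6}$$

where $\hat{\sigma}^2_{X^2}$ and $\hat{\sigma}^2_{X^3}$ are the estimated variances of the second and third powers of the stochastic process, $\sigma^2$ is the null value for the true variance of the stochastic process and n is the number of observations. The null hypothesis is rejected at level $\alpha$ if the observed value of $\chi^2$ is larger than the $(1-\alpha)$ quantile of the chi-square distribution with $n-1$ degrees of freedom.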
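The variance test can be sketched as follows. This is a minimal illustration under stated assumptions: the statistic has the form $(n-1)\hat{\sigma}^2_Y/\sigma_0^2$ with null variances $2\sigma^4$ for the squares and $15\sigma^6$ for the cubes, the sample variance uses the divisor $n-1$, and the chi-square quantile is approximated by the Wilson-Hilferty formula because the Python standard library does not provide one. All names and the simulated series are hypothetical.

```python
import math
import random
import statistics
from statistics import NormalDist

def null_variance(d, sigma2):
    """Theoretical variance of X**d under N(0, sigma2) for d = 2 or 3."""
    if d == 2:
        return 2 * sigma2 ** 2
    if d == 3:
        return 15 * sigma2 ** 3
    raise ValueError("only d = 2 and d = 3 are covered in this sketch")

def chi2_quantile_wh(p, df):
    """Approximate chi-square quantile (Wilson-Hilferty), not an exact value."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * math.sqrt(c)) ** 3

def variance_test(x, d, sigma2, alpha=0.05):
    """Return (statistic, approximate critical value, reject?) for Var(X**d)."""
    y = [v ** d for v in x]
    n = len(y)
    stat = (n - 1) * statistics.variance(y) / null_variance(d, sigma2)
    crit = chi2_quantile_wh(1 - alpha, n - 1)
    return stat, crit, stat > crit

random.seed(5)  # illustrative data only
x = [random.gauss(0.0, 1.0) for _ in range(100)]
for d in (2, 3):
    stat, crit, reject = variance_test(x, d, sigma2=1.0)
    print(f"d = {d}: chi-square = {stat:.2f}, approx. critical value = {crit:.2f}, reject = {reject}")
```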
For an illustration, six (6) series of random digits were simulated using Minitab 16 (see Appendix). The simulated series met the following conditions: 1) the simulated series are normal, and 2) powers of $X_t$ are shown to be iid but not normally distributed (see Table 7).
Table 7. Descriptive statistics and estimate of the test statistic for rejecting the null hypothesis of equality of the variance of higher moment for six simulated series, , as linear Gaussian white noise process.
The values of the chi-square test statistics for testing (3.12) and (3.13) are also shown in Table 7. We observed that the null hypothesis is rejected at the 5% level for two simulated series and is not rejected for the other four. The result clearly shows that testing the variance of higher moments of $X_t$ is a necessary condition for accepting the linear Gaussian white noise process.
We have been able to show that if $X_t$ are iid, then all powers of $X_t$ are also iid but non-normal. We also computed the kurtosis of some higher powers of $X_t$ and established that an increase in the power of $X_t$ leads to an exponential increase in the kurtosis. We recommend that stochastic processes (white noise processes) and processes with similar covariance structure should be checked for normality, tested for white noise, and tested for the variance of higher moments being equal to the theoretical values of Table 1 with .
Table A1. Six simulated white noise series: data.
 Box, G.E.P. and Pierce, D.A. (1970) Distribution of Residual Autocorrelations in Autoregressive Integrated Moving Average Time Series Models. Journal of the American Statistical Association, 65, 1509-1526. https://doi.org/10.1080/01621459.1970.10481180
 Shao, X. (2011) Testing for White Noise under Unknown Dependence and Its Applications to Goodness-of-Fit for Time Series Models. Econometric Theory, 27, 1-32.
 Jarque, C.M. and Bera, A.K. (1980) Efficient Tests for Normality, Homoscedasticity and Serial Independence of Regression Residuals. Economics Letters, 6, 255-259.
 Jarque, C.M. and Bera, A.K. (1981) Efficient Tests for Normality, Homoscedasticity and Serial Independence of Regression Residuals: Monte Carlo Evidence. Economics Letters, 7, 313-318. https://doi.org/10.1016/0165-1765(81)90035-5