A univariate AutoRegressive Integrated Moving Average (henceforth ARIMA) model is a concise quantitative summary of the internal dynamics of a time series in a linear framework and, as such, is useful for several purposes, among them forecasting and model-based decomposition of a time series into unobserved components. This work deals with the former and, in particular, with univariate forecasts, which usually serve as either short-term or benchmark forecasts. However, real-world economic time series are not usually “ready” to be used for forecasting purposes; they need to undergo some statistical preparation and pre-adjustment. One reason is that variance non-stationarity may be present in the raw data. Furthermore, there are very often causes that disrupt the underlying stochastic process (outliers, calendar effects, etc.). Their treatment is known as “linearization”.
Within that line of reasoning, statistical forecasts can be made after a series itself, or some variance-stabilizing transformation of it, is “linearized” according to the general framework:
$$y_t = \omega_t'\beta + C_t'\eta + \sum_{j=1}^{m} \alpha_j \lambda_j(B) I_t(t_j) + x_t$$
where: $y_t = f(z_t)$, and f is some transformation of the raw series $z_t$, which may be necessary to stabilize the variance;
$\beta$ is a vector of regression coefficients;
$\omega_t$ denotes n regression or intervention variables;
$C_t$ denotes the matrix whose columns are possible calendar effect variables (e.g. trading day) and $\eta$ the vector of associated coefficients;
$I_t(t_j)$ is an indicator variable for the possible presence of an outlier at period $t_j$;
$\lambda_j(B)$, a polynomial in the backshift operator B, captures the transmission of the j-th effect and $\alpha_j$ denotes the coefficient of the outlier in the multiple regression model with m outliers;
$x_t$ follows in general a multiplicative seasonal ARIMA(p, d, q)(P, D, Q)s model, $\phi(B)\,\Phi(B^s)\,\nabla^d \nabla_s^D x_t = \theta(B)\,\Theta(B^s)\,\varepsilon_t$, in which:
• $\phi(B)$ is the autoregressive polynomial of order p;
• $\theta(B)$ is the moving average polynomial of order q;
• $\nabla^d = (1 - B)^d$ is the arithmetic difference operator of order d;
• $\nabla_s^D = (1 - B^s)^D$ is the seasonal arithmetic difference operator of order D and seasonality s;
• $\Phi(B^s)$ is the seasonal autoregressive polynomial of order P and seasonality s;
• $\Theta(B^s)$ is the seasonal moving average polynomial of order Q and seasonality s;
• $\varepsilon_t$ is the stochastic disturbance.
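As far as variance stabilization is concerned, if the variance is functionally related to the mean level, it is possible to select a transformation that stabilizes it. Widely used transformations for this purpose belong to the class of Box-Cox power transformations. For example, very often used transformations are given by:
$$f(z_t) = \begin{cases} (z_t^{\lambda} - 1)/\lambda, & \lambda \neq 0 \\ \ln z_t, & \lambda = 0 \end{cases}$$
with $\lambda = 0$ (logarithmic) and $\lambda = 1/2$ (square root) being the most frequent choices.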
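As an illustration of the power transformation discussed above, the following is a minimal sketch in Python of the standard Box-Cox form, (z^λ − 1)/λ for λ ≠ 0 and ln z for λ = 0, together with its inverse. The function names `box_cox` and `inverse_box_cox` are hypothetical and are not part of TSW, which, as noted later in the text, only offers the choice between logs and levels.

```python
import math


def box_cox(values, lam):
    """Apply the standard Box-Cox power transformation.

    values: strictly positive observations; lam: the exponent lambda.
    lam == 0 gives the logarithmic transformation, lam == 0.5 a
    (shifted and scaled) square root transformation.
    """
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def inverse_box_cox(values, lam):
    """Map transformed values back to the original scale."""
    if lam == 0:
        return [math.exp(v) for v in values]
    return [(lam * v + 1.0) ** (1.0 / lam) for v in values]
```

Note that simply inverting the transformation of a forecast, as `inverse_box_cox` does, is the "naive" back-transformation whose optimality is questioned in the theoretical literature cited below.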
On the other hand, outliers are major changes in values that stand out markedly in a time series. In the TSW framework (see footnote 1), which is used in this work, three types of outliers are detected according to their effect on a time series: Additive Outliers (AO), Transitory Changes (TC), and Level Shifts (LS). An additive outlier affects the value of only one observation. In a transitory change the value of one observation is extremely high or low and the size of the deviation then decays gradually. In a level shift the level of the time series changes. Within the TSW framework, outliers are automatically detected, classified and corrected using the Chen and Liu approach (further details in Section 3).
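The three outlier types can be visualized through the deterministic pattern each one adds to a series. The sketch below generates these patterns under the usual textbook formulation: a single pulse for AO, a geometrically decaying pulse for TC, and a permanent step for LS. The function name `outlier_effect` is hypothetical, and the decay rate `delta = 0.7` is an assumption (a commonly quoted default), not a value taken from this paper.

```python
def outlier_effect(kind, n, t0, size=1.0, delta=0.7):
    """Return the effect of one outlier on a series of length n.

    kind:  "AO" (additive), "TC" (transitory change) or "LS" (level shift)
    t0:    zero-based position of the outlier
    size:  magnitude of the initial impact
    delta: decay rate of a transitory change (assumed value)
    """
    if kind not in ("AO", "TC", "LS"):
        raise ValueError("kind must be 'AO', 'TC' or 'LS'")
    effect = [0.0] * n
    for t in range(t0, n):
        k = t - t0  # periods elapsed since the outlier occurred
        if kind == "AO":
            effect[t] = size if k == 0 else 0.0  # one observation only
        elif kind == "TC":
            effect[t] = size * delta ** k        # gradually dying out
        else:
            effect[t] = size                     # permanent new level
    return effect
```

For example, `outlier_effect("LS", 5, 2)` leaves the first two observations untouched and shifts all remaining ones by `size`.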
So, there are two effects with potential influence on forecasting: “linearization” and transformation, each of which, separately as well as in combination, may play an important role in time series forecasting.
At the empirical level, studies which have considered the merits of mathematical transformations on forecasting have demonstrated that transformation often does not have a positive effect on forecast accuracy.
On the other hand, at the theoretical level, Granger and Newbold found that such forecasts are not optimal in terms of minimization of the Mean Square Forecast Error (MSFE). More specifically, for the most popular transformation, namely the logarithmic one, they showed that the minimum MSFE h-step ahead forecast is not equal to $\exp(\hat{y}_T(h))$, as implied by the previous discussion, but is given by the expression $\exp(\hat{y}_T(h) + \sigma_h^2/2)$, where $\hat{y}_T(h)$ is the h-step ahead forecast of the log-transformed series made at time T and $\sigma_h^2$ is the h-step ahead forecast error variance. Pankratz and Dudley, building further on the work of Granger and Newbold, relate the bias incurred by simply using the inversely transformed forecasts of the transformed series (as compared to the minimum MSFE forecast) to, among other things, the value of the exponent λ of the power transformation. The two most frequent transformations, namely the logarithmic and the square root ones, may under certain conditions be associated with serious biases.
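The size of the bias in the logarithmic case can be illustrated numerically. The sketch below contrasts the naive back-transformed forecast exp(ŷ) with the corrected forecast exp(ŷ + σ²/2), assuming Gaussian forecast errors on the log scale. The function name and the numbers in the usage note are hypothetical and serve only as an illustration.

```python
import math


def back_transform_log_forecast(log_forecast, error_variance):
    """Return (naive, corrected) forecasts on the original scale.

    log_forecast:   h-step ahead forecast of the log-transformed series
    error_variance: h-step ahead forecast error variance on the log scale
    """
    naive = math.exp(log_forecast)
    # Lognormal mean correction: adds half the error variance before exp()
    corrected = math.exp(log_forecast + 0.5 * error_variance)
    return naive, corrected
```

With a log-scale forecast error variance of 0.02, for instance, the corrected forecast exceeds the naive one by a factor of exp(0.01), i.e. by about 1%; the gap widens as the forecast horizon, and hence the error variance, grows.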
Regarding time series linearization, the procedure has so far been used mainly as a pre-adjustment step for seasonal adjustment, so its effect on forecasting has not been examined systematically, but only indirectly and in a fragmentary way (see footnote 2). It is also remarked that even studies dealing with forecasting with transformed data focus almost exclusively on point forecasts, by and large disregarding interval forecasts.
Aiming to cover this gap in the literature, the objective of this work is twofold: a) to examine the effect of “linearization” and transformation, separately as well as in combination, on both point forecasts and confidence interval forecasts; b) as a further application, to rank the main economic indicators of the Greek economy in terms of statistical “forecastability”. The intended approach will be practical.
The structure of the paper is as follows: In Section 2 details about the data to be used for the empirical analysis are given; Section 3 presents the empirical results and relevant comments; Section 4 summarizes and concludes the paper.
The data set comprises some of the most important macroeconomic time series for the Greek economy, which refer to: GDP; unemployment; prices of consumer goods and services; monetary aggregates; and balance of payments statistics. In the balance of payments, in particular, a distinction is made between imports-exports of all goods and imports-exports of goods without fuels and ships because, according to a study by the Bank of Greece, the dependence of the Greek economy on oil was high and was rising at the fastest pace among the euro area countries. Furthermore, the same study notes that the sea transport balance is significant in the Greek balance of current transactions (4% of GDP in 2008), so it will be considered separately from other BOP transactions on transport. In total, twenty economic time series were used, of which nineteen were monthly and one was quarterly. The list of time series used is given in Table 1. The data are available at the official websites of the Bank of Greece (BoG) and the Hellenic Statistical Authority (ELSTAT) [http://www.bankofgreece.gr/ and http://www.statistics.gr/, respectively].
The monthly time series data cover the period from January 2004 to August 2018 and consist of one hundred and seventy-six (176) observations, except for the Industrial Production Index, for which data were available from January 2010 to August 2018 (104 observations). The quarterly time series is that of Gross Domestic Product and covers the period from 1995 Quarter 1 to 2018 Quarter 3 (95 observations).
3. Empirical Results and Comments
As mentioned in Section 1, the effect of transformation and the effect of linearization on forecasting will first be examined separately and, subsequently, in combination. The aforementioned effects will be studied utilizing TSW.
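Typical statistics used to assess the quality of point forecasts are the following: 1) the Mean Absolute Percentage Error (MAPE) statistic, given by $\mathrm{MAPE} = \frac{100}{n}\sum_{t=1}^{n}\left|\frac{z_t - \hat{z}_t}{z_t}\right|$; 2) the Mean Square Forecast Error (MSFE) statistic, given by $\mathrm{MSFE} = \frac{1}{n}\sum_{t=1}^{n}(z_t - \hat{z}_t)^2$; and 3) the Mean Absolute Error (MAE) statistic, given by $\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}|z_t - \hat{z}_t|$, where $z_t$ is the actual value, $\hat{z}_t$ is the forecast value and n is the number of forecasts. Furthermore, as far as interval forecasts are concerned, the width of the forecast confidence interval (CI) will be considered.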
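For concreteness, the three point-forecast statistics can be computed over a hold-out sample as in the following minimal sketch. It uses the conventional definitions, with MAPE expressed in percent (an assumption about scaling); the function name `forecast_metrics` is hypothetical.

```python
def forecast_metrics(actual, forecast):
    """Compute MAPE (in percent), MSFE and MAE over a hold-out sample."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty, equal length")
    n = len(actual)
    errors = [a - f for a, f in zip(actual, forecast)]
    # MAPE divides each error by the actual value, so actuals must be non-zero
    mape = 100.0 / n * sum(abs(e / a) for e, a in zip(errors, actual))
    msfe = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    return {"MAPE": mape, "MSFE": msfe, "MAE": mae}
```

Because MSFE squares the errors, it penalizes a few large misses more heavily than MAPE or MAE do, which is one reason the three statistics need not rank competing models identically.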
Table 1. Data.
3.1. The Effect of “Linearization” on Forecast Quality
We investigate how time series linearization affects the quality of both point forecasts and confidence interval forecasts. Here linearization will not be considered in its generality, as described in Section 1, but will be confined to outlier detection and adjustment (see footnote 3). Table 2 presents the number of best forecasts, in terms of minimization of the corresponding statistic, with data in levels. The auxiliary Table 3 presents the number of best forecasts with log-transformed data, applied uniformly to all time series, as log-transformed data are often used in econometric analyses. It is noted that in one time series in levels (unemployment expressed in percentages) and one time series in logs (the industrial production index) no outliers were detected; hence, the total number of time series considered is reduced to nineteen in each case.
From the results of Table 2 and Table 3 it is apparent that, when outliers are considered, forecasts are better in every single case in terms of the width of the forecast confidence interval. In contrast, there is no obvious improvement in point forecasts. It should be stressed that such results generally depend on the specific characteristics of each time series, especially on whether an outlier lies among the first, the middle or the last observations. For this reason, it would be desirable to use a large number of time series, so as to draw firmer conclusions. Although the number of time series used in this work is relatively small (though comparable to that of other similar works, see for instance Nelson and Granger, 1979), the evidence that leads to the above conclusions, in particular regarding the width of the forecast confidence interval, is strong enough to outweigh concerns related to micronumerosity.
Table 2. Summary table, Number of best forecasts (levels) (see footnote 4).
Table 3. Summary table, Number of best forecasts (log-data).
3.2. The Effect of Level Shifts (LS), in Particular, on Forecast Quality
After a level shift outlier, all subsequent observations move to a new level. In contrast to additive and transitory outliers, a level shift reflects a major change in the stochastic process and affects many observations, as its effect is permanent. For this reason, the case with only additive and transitory outliers (i.e. excluding level shifts) was also considered and its effect on forecasts was examined separately, performing the same analysis as in Section 3.1. Only fifteen time series were considered this time, i.e. those including all types of outliers. The results are presented in Table 4 and Table 5.
From the results in Table 4 and Table 5 it is obvious that there is a trade-off: confidence interval forecasts are better with level shift outliers included and, conversely, point forecasts are better when level shifts are excluded. Given the influence of level shift outliers, it would be desirable to consider stricter identification criteria for them relative to the other two types of outliers. It is noted that existing statistical software specializing in time series analysis offers no such option, so a purpose-built routine would have to be created by the researcher.
Table 4. Summary table, Number of best forecasts (levels).
Table 5. Summary table, Number of best forecasts (log-data).
3.3. The Effect of a Data Transformation on Forecast Quality
As far as the effect of data transformation is concerned, it is important to note first that a transformation acts in two ways: 1) directly and 2) indirectly (through its influence on outlier detection). Indeed, regarding the latter, it has been shown that data transformation affects the number and the character of outliers in a time series.
The possible need for a data transformation of the original time series data will be examined using the TSW routine. Once a decision about the proper data transformation is made, TSW will be used for further analysis on statistical forecasting.
Table 6 presents the results of the decision on whether or not to transform the original time series data. The twenty series were analyzed following the standard TSW procedure. It is noted that the only alternatives available in TSW are the log-transformation or no transformation. TSW suggested the logarithmic transformation of the original data in eighteen of the twenty cases. It is remarkable that TSW suggests no transformation only for the two unemployment series.
The possible effect of transforming time series on forecasting quality is examined through Table 7. From these results it is concluded that point forecasts with the TSW transformation method perform the same as those with no transformation in terms of MAPE and MAE, and slightly worse in terms of MSFE. As already explained, forecasts on transformed variables are not optimal in terms of MSFE. Similarly, confidence interval forecasts are shorter in only eight out of the eighteen cases when the TSW transformation is used. Thus, data transformations using the TSW routine do not seem to improve either point forecasts or forecast confidence intervals. We note, however, that a larger data set is needed for more solid conclusions. Moreover, the outcome may result from the restriction of TSW to the logarithmic transformation only. Further research is needed on this matter, allowing for a wider range of transformations.
Table 6. Decision about data transformation.
Table 7. Summary table, Number of best forecasts (TSW versus benchmark).
3.4. The Combined Effect of Linearization and Data Transformation
The results of the examination of the forecasting performance combining both linearization and data transformation are presented in Table 8. The conclusion that is derived is that, by and large, the combined effect does not lead to better point forecasts but leads to improved confidence interval forecasts. The conclusion about the forecast confidence interval is reasonable and, to a large extent, expected, as with the adjustment for outliers the process variance is reduced. It is possible to exploit this reduction in obtaining forecasts with increased confidence.
Appendix 1 presents the ARIMA models for the benchmark model and for the combination of the TSW variance-stabilizing method with linearization. It is noted that the differences in the ARIMA models for the two time series where no transformation was needed (unemployment expressed in percentages and in thousands) should be attributed to the outliers adjusted by linearization.
3.5. Sensitivity Analysis: Outliers (Dependence of Outlier Detection on the Parameter τ)
Let $\hat{z}_{T+1|T}$ denote the optimal one-step-ahead linear prediction of $z_{T+1}$ given the information set $\Omega_T$, which includes information up to time T, let $e_{T+1} = z_{T+1} - \hat{z}_{T+1|T}$ denote the associated forecast error, and let $\sigma_{T+1}^2$ denote the associated variance. The observation $z_{T+1}$ is considered an outlier if the null hypothesis $H_0: e_{T+1} = 0$ is rejected. The appropriate statistic to test $H_0$ is $\tau = e_{T+1}/\sigma_{T+1}$.
Table 8. Summary table, Number of best forecasts (TSW versus benchmark).
However, theory cannot predict the critical value of τ above which the corresponding observation can be considered an outlier. A usual practice is to relate the critical value of τ to the length of the time series. The default values of TSW for τ are presented in Table 9 (see footnote 5). In the course of our experimentation it was observed that outlier detection (as well as the ARIMA models for the linearized-transformed series) was very sensitive to the value of the parameter τ. In order to examine whether the critical τ values could have any noticeable effect on our final conclusions, we used, as an alternative set of critical values, those suggested by Fischer and Planas, who examined a very large number of time series. Their critical values for τ were set at 3.5, 3.7 and 4.0 for series lengths of less than 130 observations, between 131 and 180, and more than 180 observations, respectively.
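The length-dependent decision rule can be sketched as follows. This is only an illustration of the Fischer-Planas thresholds quoted above, not the TSW or TERROR implementation. Two assumptions are made: a series of exactly 130 observations, which the quoted ranges leave unassigned, is placed in the lowest band, and the test is treated as two-sided by comparing the absolute value of τ with the threshold. All function names are hypothetical.

```python
def fischer_planas_critical_tau(n_obs):
    """Critical value of tau as a function of series length."""
    if n_obs <= 130:   # "less than 130"; n_obs == 130 assigned here (assumption)
        return 3.5
    if n_obs <= 180:   # "between 131 and 180"
        return 3.7
    return 4.0         # "more than 180"


def is_outlier(actual, forecast, forecast_se, n_obs):
    """Return (flag, tau) for a single one-step-ahead forecast.

    tau is the forecast error divided by its standard error; the
    observation is flagged when |tau| exceeds the critical value.
    """
    tau = (actual - forecast) / forecast_se
    return abs(tau) > fischer_planas_critical_tau(n_obs), tau
```

For the data used here, the quarterly GDP series (95 observations) and the Industrial Production Index (104 observations) would fall in the 3.5 band, while the remaining monthly series (176 observations) would fall in the 3.7 band.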
The comparison of the results based on the default critical τ values and on the Fischer-Planas recommendations is presented in Table 10, while the outliers detected for each time series and each set of values of the parameter τ are presented in Appendix 2. Appendix 2 shows that the detection of outliers is indeed sensitive even to the small changes in the value of τ examined here. On the other hand, the results of Table 10 show that using the Fischer and Planas critical values for τ leads to mixed results regarding the effect on forecast quality. By and large, there is only very weak evidence of improvement using the Fischer-Planas recommendations (see footnote 6).
3.6. An Ad-Hoc Evaluation of Models’ Forecasting Performance
The skill of a forecast can be assessed by comparing the relative proximity of both the forecast and a benchmark to the observations. A benchmark makes it easier to compare approaches by establishing a common ground for comparison. In the present case an obvious benchmark is the twenty time series described in Section 2, non-linearized and non-transformed. Although established formal tests for forecast evaluation exist, in this work, in line with its practical character, it suffices to use a very simple and transparent ad-hoc forecast evaluation approach based on point and interval forecasts.
Table 9. Critical values for τ.
More specifically, for the point forecasts, each model is assigned, for each time series, a rank in ascending order of the corresponding MSFE value (i.e. 1 for the minimum MSFE value, 2 and 3 for the second and third lowest MSFE values respectively, and 4 for the maximum MSFE value). The sum of these ranks over all series then represents the performance of a particular model, and models are ranked according to this sum: the model with the lowest sum is considered the best one. For interval forecasts the same procedure is followed, replacing the MSFE value with the corresponding standard error around the point forecasts. The results are shown in Table 11 and Table 12 (see footnote 7) and more detailed results are given in Appendix 3. It is clarified that the TSW transformation approach is coupled with the outlier detection-adjustment approach.
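The rank-sum procedure can be sketched in a few lines of Python. The sketch assigns ascending ranks per series, gives tied values the mid-point rank (as in footnote 7), and sums the ranks per model. The function names and any data passed to them are hypothetical; the actual MSFE and standard error values are those of Appendix 3.

```python
def average_ranks(values):
    """Rank values in ascending order (1 = smallest).

    Tied values all receive the mid-point of the ranks they span.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value is tied
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        midpoint = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = midpoint
        pos = end + 1
    return ranks


def rank_sum_scores(stat_by_series, models):
    """Sum per-series ranks for each model; a lower total is better.

    stat_by_series: {series name: {model name: MSFE or standard error}}
    """
    totals = {m: 0.0 for m in models}
    for stats in stat_by_series.values():
        ranks = average_ranks([stats[m] for m in models])
        for m, r in zip(models, ranks):
            totals[m] += r
    return totals
```

The same function serves both rankings: it is called once with MSFE values for the point forecasts and once with forecast standard errors for the interval forecasts.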
From the results of Table 11 and Table 12 it is evident that the performance of the TSW approach for point forecasts is not better than that of the benchmark model (as a matter of fact, it is slightly worse). On the other hand, for the forecast confidence intervals the “Levels-all outliers” method performs better than both TSW and the benchmark model, and TSW outperforms the benchmark model. A rather crude way to proceed to an overall evaluation of the four models is to add up their performances in the two categories (i.e. point and interval forecasts). The addition gives the values of 108, 113, 91 and 88 for the benchmark model, “Logs-no outliers”, “Levels-all outliers” and the TSW method respectively. Since a lower sum is better, the overall performance of the TSW method is slightly better than that of the “Levels-all outliers” model and clearly better than that of the benchmark and the “Logs-no outliers” model.
Nelson and Granger utilized the Box-Cox transformations, among others, for forecasting purposes (point forecasts) using twenty-one actual economic time series. As they failed to obtain superior forecasts, they reached the rather pessimistic conclusion that it is not worthwhile to make use of these transformations, bearing in mind the extra inconvenience, effort and cost. Their point of view was subsequently adopted by other researchers as well, as already mentioned in the introductory section. There are, however, reasons not to be too discouraged. Apart from the fact that cost and effort are much lower nowadays than they were at that time, we further note that Nelson and Granger did not associate forecasts on transformed time series with an outlier detection-adjustment approach. Furthermore, their conclusion was based only on point forecasts, disregarding forecast confidence intervals. The latter are of great importance, especially in cases where the focus is on best-worst forecast scenarios. Such is the case, for instance, with actuarial time series on mortality rates, which may be used further for the construction of pension plans. As shown above, the combination of transformation-linearization leads to shorter forecast confidence intervals.
Table 10. Results based on Fischer-Planas recommendations.
Table 11. Ranking of forecasting performance according to MSFE (point forecasts).
Table 12. Ranking of forecasting performance according to SE (interval forecasts).
It should also be stressed that neither in the existing research works nor in the present one is the treatment of the effect of data transformation on time series forecasting complete, for the simple reason that no work extends the analysis to a bivariate (in general, multivariate) framework. Indeed, variance non-stationarity in time series will contaminate the pre-whitening process (for details about pre-whitening see Box and Jenkins, 1976) and, consequently, the sample cross-correlation function. It will therefore mask the true dynamic relationship between two series, one of which is supposed to be the leading indicator, thus negatively affecting the (in this case conditional) forecasts.
3.7. The Shift towards Normality
Another serious concern expressed by Nelson and Granger was that the acutely non-normal distributions they found in most of the macroeconomic time series they analyzed were remedied only slightly by their use of data transformations. Table 13 presents the results for the Jarque-Bera statistic for normality. Under the null hypothesis of normality this statistic is asymptotically distributed as chi-square with two degrees of freedom. An asterisk next to a value in Table 13 indicates a rejection of the null hypothesis of normality at the 5% significance level (critical value = 5.99).
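For reference, the Jarque-Bera statistic combines the sample skewness S and kurtosis K as JB = (n/6)·(S² + (K − 3)²/4). The following minimal sketch computes it with population (divide-by-n) moments and compares it with the 5% critical value of 5.99 quoted above. The function name `jarque_bera` is hypothetical, and no small-sample correction is applied.

```python
def jarque_bera(residuals, critical_value=5.99):
    """Return (JB statistic, reject_normality) for a list of residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    # Central moments with divisor n
    m2 = sum((x - mean) ** 2 for x in residuals) / n
    m3 = sum((x - mean) ** 3 for x in residuals) / n
    m4 = sum((x - mean) ** 4 for x in residuals) / n
    if m2 == 0:
        raise ValueError("residuals have zero variance")
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    return jb, jb > critical_value
```

A single extreme residual inflates both the skewness and the kurtosis terms, which is consistent with the observation that adjusting for outliers moves the statistic towards the non-rejection region.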
The results of Table 13 allow, again, for a more optimistic view, inasmuch as there is evidently a general shift towards normality from the benchmark model to the TSW transformation-linearization procedure. On some occasions the shift is very pronounced (e.g. in the series of M1 and Balance of Payments-transport-payments). This allows computational algorithms such as maximum likelihood estimation, as well as standard statistical tests, to be legitimately employed with transformed-linearized data.
3.8. Statistical Benchmark Forecasting
Table 13. Values of the Jarque-Bera statistic (statistically significant values are indicated with an asterisk).
Taking the opportunity offered by the above analysis, it is useful to assess the forecastability of the twenty time series of the Greek economy. Here forecastability is assessed in terms of both point and confidence interval forecasts. For the former the MAPE statistic is employed. For the latter the percentage standard error statistic is introduced, defined as the mean of the ratio of the forecast standard error to the corresponding actual value, so as to make forecasts of the various series mutually comparable. In all cases one-step-ahead forecasts are performed. It is stressed that, although these forecasts are technically perfectly acceptable, they are purely statistical, hence a-theoretical, and they can only serve as benchmark forecasts against which to evaluate the merit of more structural econometric forecasts. Table 14 and Table 15 show the results in terms of statistical forecastability, according to the combined transformation-linearization effect (denoted as TSW). More specifically, point forecasts in Table 14 are presented in descending order according to the value of the Mean Absolute Percentage Error (MAPE) statistic for the combined transformation-linearization effect (fourth column), and interval forecasts in Table 15 are presented in descending order according to the value of the Percentage Standard Error statistic for the combined transformation-linearization effect (fourth column).
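The percentage standard error statistic described above can be sketched as follows. The scaling by 100 and the use of the absolute actual value in the denominator are assumptions made for the illustration, and the function name is hypothetical.

```python
def percentage_standard_error(standard_errors, actual):
    """Mean ratio of forecast standard error to actual value, in percent."""
    if len(standard_errors) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    # abs() keeps the ratio positive for series with negative values (assumption)
    return 100.0 / n * sum(se / abs(a) for se, a in zip(standard_errors, actual))
```

Because the statistic is unit-free, a series measured in millions of euros and an index number can be placed in the same forecastability ordering.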
Table 14. Forecastability of main economic indicators. Greece. Point forecasts.
From the results of Table 14 and Table 15 it is observed that, although the two tables have many similarities, the ordering is not exactly the same. For this reason, the linear correlation coefficient between the orderings based on MSFE and on the percentage standard error was computed. In all cases there is a strong positive correlation (see Table 16). The “Levels-all outliers” method has the highest correlation, while TSW has the lowest.
From Tables 14-15 it is also noticeable that the BOP series are the least forecastable in both tables. Regarding imports-exports, it is noted that the former are less forecastable than the latter. Furthermore, imports-exports excluding fuels and ships are clearly more forecastable than imports-exports including them. This justifies, from the statistical point of view, the separate recording and usage of imports-exports without fuels and ships, as presented in the official BOP statistics for Greece.
Table 15. Forecastability of main economic indicators. Greece. Interval forecasts.
Table 16. Linear correlation coefficient between MSFE and percentage SE ordering.
4. Summary and Conclusions
This work dealt with the effect of data transformation for variance stabilization and linearization for outlier adjustment on the quality of univariate time series forecasts, following a practical approach.
There is clear evidence that linearization improves the forecast confidence intervals, but no such evidence for data transformation. Furthermore, no evidence was found that either transformation or linearization leads to better point forecasts. The combined effect of transformation-linearization leads to better forecast confidence intervals and substantially alleviates the non-normality problem encountered in many macroeconomic time series, but worsens point forecasts. There is also evidence that the overall forecasting performance using the TSW data transformation procedure is somewhat better than that of the other models considered.
It must be remarked that the above results regarding the effect of data transformation were obtained within the restrictive framework of TSW, which allows the logarithmic transformation as the only alternative. Further research is needed on this matter using a larger dataset and the whole framework of Box-Cox transformations.
Table A1. Univariate ARIMA models with and without transformation-linearization.
Table A2. Detected outliers for the different values of parameter τ (the first number indicates the serial number of the corresponding observation, followed by the type of outlier and, within the parentheses, the corresponding month or quarter and year).
Table A3. Detailed forecast quality statistics: MSFE, MAE and Forecast Standard Error.
1. TSW stands for TRAMO-SEATS for Windows, a Windows version of the DOS programmes TRAMO and SEATS, and is freely available from the provider (Bank of Spain).
2. An additional advantage of “linearizing” the outliers is that such a procedure shifts the original data distribution closer to normality. This is important, especially for actual economic data, in view of their extreme non-normality in many cases.
3. Calendar effects such as the trading day and leap year effects were considered and were indeed found to be statistically significant on some occasions. All series were properly adjusted for calendar effects before further analysis.
4. In all cases the hold-out sample for ex-post forecasts was set to twelve time periods for the monthly series and ten time periods for GDP.
5. In the TSW framework the subroutine TERROR is designed especially for outlier detection. The volume of incoming data in international institutions like EUROSTAT, ECB, OECD, etc. may be enormous. Such data may be contaminated by errors of various types and origins. Using TERROR is a convenient, yet formal, way to spot aberrant observations (outliers). It is highly likely that, if erroneous data do exist, they will be included in the set of observations characterized as outliers by TERROR; hence, in a second stage, the search for them can focus exclusively on that set. In this work we used the first stage only.
6. Indeed, with the Fischer-Planas critical values instead of the default ones, the results corresponding to those of Table 8 are identical in terms of the standard error, and are 8/18 for MAPE, MAE and MSFE with TSW, as compared to 7/18 using the default critical values.
7. If for two models the value of MSFE or SE is exactly the same, the mid-point will be used for both.
 Nelson, H.L. and Granger, C.W.J. (1979) Experience with Using the Box-Cox Transformation When Forecasting Economic Time Series. Journal of Econometrics, 10, 57-69.
 Meese, R. and Geweke, J. (1984) A Comparison of Autoregressive Univariate Forecasting Procedures for Macroeconomic Time Series. Journal of Business and Economic Statistics, 2, 191-200.
 Milionis, A.E. (2004) The Importance of Variance Stationarity in Economic Time Series Modelling; A Practical Approach. Applied Financial Economics, 14, 265-278.
 Jarque, C. and Bera, A. (1980) Efficient Tests for Normality, Homoscedasticity and Serial Independence of Regression Residuals. Economics Letters, 6, 255-259.