The hydrological modelling of a river system is not only complex but also challenging because of increasingly anthropogenic influence on river flow and climate change coupled with lack of relevant data to characterise river systems. In many parts of the world increases in water demand for irrigation, domestic, mining and industrial uses as well as the conflicting demands for sustainability of the ecosystem have resulted in water stressed river basins being fully developed with highly regulated river flows. An efficient management of available limited water resources needs to be promulgated for sustainable water resource utilisation rather than construction of new hydraulic infrastructure facilities to meet water shortage challenges. The water shortage problems in river basins can be well addressed based on proper understanding of the past, present and the future hydrological patterns of river flow. Real time flow forecasting is the most important application of hydrology for decision making in water resources management ( Dessalegn et al., 2017). Moreover, Musa (2013) opined that in order to combat water shortage issues, maximising water management efficiency based on stream flow forecasting is crucial. Furthermore, Otache et al. (2011a) emphasized that the dynamics and accurate forecasting of stream flow processes of a river are of paramount importance in the management of extreme events such as floods and droughts, optimal designs of water storage structures and drainage networks.
It should be noted however that modelling accuracy of river flow for water resources planning is highly dependent on relevant data availability, which mostly is limited in many river basins especially in Africa. In South Africa, most river basins are water stressed with highly regulated river flows and limited hydrological data. The available historical data are quite short and that if long term records are available, they are mostly associated with long range gaps of inconsistent missing records (missing information) that may lead to discarding of the records. In situations like this, a system designed based on such historical data is subject to inefficiency and may have a great chance of being inadequate for the unknown flow sequence that the system might experience. It should be realised that historical data comprising a single short series do not cover a sequence of low flows as well as high flows. Thus, the reliability of a system has to be evaluated under these conditions which are not possible with historical data alone (Otache et al., 2011a). Moreover, it should be noted that the existing simulation models and software do not effectively support scenario analyses (Peng et al., 2018). Therefore, there is a need for a proper tool that can model river system flow to mimic actual physical natural flow characteristics under prevailing conditions.
In South Africa, an increasing importance is placed on the operation of reservoirs and water resource systems. Since 1985, the water resources yield model (WRYM) and the water resources planning model (WRPM) have been applied to support decisions concerning the operations of water supply to the region. The WRYM and WRPM apply stochastically generated stream flow sequences for operational analysis (Nyabeze et al., 2007). Moreover, the water research commission ( WRC, 2004 ) monthly stochastic stream flow model for South Africa (STOMSA) has been used to generate coefficients for stochastic stream flow. The generated coefficients in STOMSA subsequently are used as input to the WRYM/WRPM models ( Nyabeze et al., 2007), Unfortunately these models are not capable of comprehensively modelling stochastic river/stream flow to provide conclusive results on the best model to be selected.
Over the years, mathematical models have been developed for stochastic modelling of river/stream flow based on either statistical analysis (Mukherjee & Mansour, 1996) or physical considerations (Garrote & Bras, 1995). In statistical modelling, the historical record is considered to be a sample out of a population of natural river flow process. It is not essential to realise that generated flows are neither historical flows nor a prediction of future flows but rather are representative of likely flows in a river system (Otache et al., 2011a).
The history of development of ARMA models started in many decades back, since in 1920s, when Autoregressive (AR) models were introduced for the first time by (Yule, 1926). In 1930s these models gained more strength, when they were supplemented with Moving Average (MA) schemes by (Slutsky, 1937) and when the AR models and MA schemes were treated as a combined model by (Wold, 1938). Wold (1938) proved that it is possible that the ARMA modelling processes can be used to model all stationary time series as long as the appropriate order of p (i.e. the number of AR parameters/terms), and q (i.e. the number of MA parameters/terms) are appropriately specified. This may imply that for any time series data denoted by can be modelled as a combination of past values and/or past errors/residuals denoted by in ARMA (p, q) model family as:
in which is the series of data to be modelled; p is the number of AR parameters; is the jth AR parameter; q is the number of MA parameters; is the jth MA parameter; C is a constant; and is the innovation process (residual series also known as white noise term); and N is sample size (i.e. the total number of observations).
It should be noted that in a real world modelling of time series systems, it requires that four steps should be considered, namely (Makridakis & Hibon, 1995): firstly, the original time series should be transformed to become stationary around its mean and its variance. Secondly, the appropriate order of p and q should be specified. Thirdly, the value of the parameters for and/or for must be estimated using some non-linear optimisation procedure that minimizes the sum of square errors or some other appropriate loss function. Fourthly, practical ways of modelling seasonal series must be envisioned and the appropriate order of such models specified.
It was Box and Jenkins (1976) who popularized the use of ARMA models through their establishment of methodological procedure for making the series stationary in both its mean and variance. They also suggested the use of autocorrelations and partial autocorrelation coefficients for determining appropriate values of p and q and its seasonal equivalent P and Q when the series shows seasonality. Moreover, they provided a set of computer programs that can assist to identify appropriate values of p and q, as well as P and Q, and estimate the parameters involved. Their procedure followed by diagnostic check to determine whether or not the residuals comply with white noise conditions (i.e. whether or not it satisfy the condition specified for white noise). The Box and Jenkins approach came to be known as the Box-Jenkins methodology to ARIMA models, where the letter “I” stand for word “Integrated”, hence it sounds as “Autoregressive Integrated Moving Average” (ARIMA) models. Moreover, generally Box-Jenkins methodology modelling framework, considers three key steps for fitting ARMA models, as: 1) Identification of model; 2) Model parameters estimation; and 3) Diagnostic process. In this case, these steps are repeated until the best model is obtained.
It is important to know that in many cases in order to account for stationary and seasonality issues, ARMA models are used with different transformations of the original (observed) series (Granger & Newbold, 1976). The commonly used transformations are the logarithm transformation (Box & Jenkins, 1977) and square root transformation (Mcleod et al., 1977). Usually these transformations decide the class to which the model belongs. It should be noted that both observed and standardized series establish important classes and that standardization ensures the removal of periodicities inherent in the process (Mujumdar & Kumar, 1990). In many cases, the ARMA model includes the AR and seasonalized ARIMA models (Musa, 2013; Vandaele, 1983; Otache et al., 2011b). The common and easy way to include seasonal ARIMA models as ARMA model is through deseasonalized ARMA model.
In this study, a stochastic approach is presented in view that a time series modelling is achieved through Autoregressive Moving Average (ARMA) model family, the ARIMA model was obtained through a deseasonalized ARMA model. The model that best describes the marginal probability distribution of river flows was selected for representation of the Letaba river flow time series process and for forecasting. The river flow discharge of a 25 years data set was analysed using three different models, the Autoregressive (AR) model, Autoregressive Moving Average (ARMA) model and Autoregressive Integrated Moving Average (ARIMA) model. The Autocorrelation function (ACF) and Partial Autocorrelation function (PACF) were used for initial model identification.
2. Materials and Methods
2.1. Study Area
The Great Letaba River system (also known as Groot Letaba River system) falls within the Olifants River basin Water Management Area (WMA). It is one of the major tributaries of the Olifants River. The Great Letaba River sub-catchment lies on latitude 23˚6'1"S - 24˚5'59"S and longitude 29˚48'6"E and 31˚49'17"E (Figure 1). It is located in the Limpopo Province, north-eastern part of South Africa. The sub-catchment drains a surface area of approximately 13,670 km2 (DWAF, 2006). The Great Letaba, Klein (Little) and Middle Letaba Rivers are the main tributaries of the Letaba River. The Letaba River is a perennial river
Figure 1. The Letaba river system and flow gauging station networks.
contributing over 50% of the downstream flow of the Olifants River into the Kruger National Park (KNP). KNP is one of the world’s renowned conservation tourism areas in South Africa. Hence, an effective management of the Letaba River flow is of paramount importance for the sustenance of the ecosystem and maintenance of downstream water requirements. Generally, the flow gauging stations network in the Letaba River system is poor with most of the stations concentrated in the upper part of the catchment (DWAF, 2006). Annual rainfall ranges from 1000 mm - 2000 mm with higher rainfalls at the mountainous escarpment areas of high altitudes with wet and humid conditions. Mean annual temperature is about 18˚C and relative humidity of about 70% (DWA, 2004).
The monthly time series river flow data of the downstream gauge station B8H034 for the Letaba River system were found to be adequate with minimum missing records compared to the other gauge stations records hence, were used to model the river system flow process.
2.2. Theoretical Formulations and Statistical Procedures
The MATLAB 2014a Econometrics ToolboxTM Package was used to carry out the analysis of a monthly observed time series river flow data set. The observed river flow time series data set of 25 years (i.e. equivalent to a sample size of 300 monthly observations) obtained from Department of water and Sanitation (DWS), South Africa, was first standardized and used for computation of the Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF). The standardised series denoted by , in this case, is defined as the time series, , as:
where is the estimated mean of the river flows of the ith month to which time period t belongs and is the estimated standard deviation of the river flows of the time period i. It was important to standardize time series data to ensure that data set does not have periodicities patterns inherent in the process.
Autoregressive Moving Average (ARMA) modelling process was used to model stationary time series for a given appropriate order of Autoregressive (AR) and Moving Average (MA) (i.e. the parameters p and q) of modelled time series . The time series was modelled as a combination of past observations and past innovations/errors/residuals . Hence, in general form ARMA (p, q) model was modelled as Equation (1) but, for clarity purpose herein rewritten as:
in which all parameters are as previously defined.
It was assumed that the innovation process (i.e. the residual/white noise) term in equation (3) constitutes an independently identically distributed (i. i.d.) random variable with mean zero and uncorrelated individual elements. It should be realised that the AR (p), MA (q) models, and a combined ARMA (p, q) model, usually are expressed in lag operator polynomial notation in a compact form as , in which the ARMA lag operator polynomial function defines both the degree p AR lag operator polynomial function ) and degree q MA lag operator polynomial function ) . Hence, the ARMA (p, q) lag operator polynomial function model is written as:
Dividing each term of Equation (3) by , the equation can be reformulated as:
where is the unconditional mean of the stochastic process, and is a rational infinite degree lag operator polynomial , such that
It should be noted that the constant property of an arima model object function corresponds to a constant C and not to the unconditional mean . Equation (5) is considered as a stationary stochastic process provided that the coefficients are absolutely summable. Note that the stability and invertibility of the ARMA polynomial is enforced in the MATLAB-Econometrics Package and that its estimate imposes stationarity and invertibility constraints during estimation.
In this work, the overall modelling effort was to determine hydrological model that can best describe the Great Letaba river flow. The modelling method used involved two main stages, model selection and forecasting using a regression model with ARIMA (p, D, q) errors. The seasonal ARIMA model was achieved through deseasonalized ARMA model. Generally, model selection was carried out in an iterative process that involved three key steps: specification tests; model comparisons; and diagnostic or goodness-of-fit checks. Specification tests were carried out to identify the model family that describes the data generating process. Initially, models were identified by computing the Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) of the time series data set to qualitatively assess autocorrelation. Due to seasonality effects in observed time series data, deseasonalisation was carried out to remove seasonal trend patterns. Differencing technique was used to remove linear trend and to ensure that a stationary stochastic process is achieved. Differencing was carried out by taking a first difference of the time series data set, and then followed by computation of ACF and PACF autocorrelation functions using differenced time series data set.
Based on the ACF and PACF autocorrelation functions results, model order was specified as degree p AR lag, degree q MA lag, with D differencing degree, that is, an ARIMA (p, D, q) model family. Alternatively, the procedure for specifying model order can be achieved by using a log likelihood measure. One of the most used measures is the Akaike’s Information Criterion (AIC), defined as (Musa, 2013).
in which k is the number of independently adjusted parameters in the model, ML is the maximum likelihood value of the model, and * is multiplication sign. For ARMA (p, q) model family, where , the AIC (p, q) value can be calculated as (Musa, 2013):
where is the variance of the innovation process (i.e. the white noise/errors/ residuals variance). In this case, the best model is the one with the lowest calculated AIC value.
The specified model was then checked by using goodness of fit technique to assess the sample adequacy, to verify that all the model assumptions made are valid and evaluate out of sample forecast performance of the selected model. Parameters of the best selected model were then used to formulate a regression model for generating forecasts time series. Monte Carlo simulation forecasts approach was used to generate forecasts time series for the next 25 years. In Monte Carlo simulation forecasts approach, the regression model with ARIMA (p, D, q) errors was specified and forecasted. The regression model for generating forecasts was simulated with 1000 realisations (sample paths).
3. Results and Discussion
The mean monthly time series river flow observed data (from 1989-2014) for gauge station B8H034 for the Letaba River system were plotted over a time horizon (Figure 2).
From Figure 2, the time series data set appears to show some degree of seasonality. Initially, the Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) computations were carried out on the standardised observed time series (Figure 3).
The initial ACF and PACF computational results (Figure 3) indicate a significant seasonality with decaying ACF and PACF. This means that the river flow time series process is a seasonal and non-stationary stochastic process. To enforce a stationary stochastic process on the mean monthly time series data (from 1989-2014), differencing technique was used and then differenced Time series stochastic process, autocorrelation function (ACF) and partial autocorrelation function (PACF) were plotted (Figure 4 and Figure 5, respectively).
Figure 4 clearly shows that the differenced time series appears more stationary. The differenced time series was then used to compute and plot the sample ACF and PACF to determine the time series behaviour more consistently with a
Figure 2. Mean monthly river flow time series of the Letaba river system.
Figure 3. The ACF and PACF of the time series before differencing process.
Figure 4. Differenced time series stochastic process of the Letaba river flow.
Figure 5. The ACF and PACF of the time series after differencing process.
stationary stochastic process (Figure 5).
It can be observed from Figure 5 that, the ACF of the differenced time series decays more quickly and the PACF cuts off after lag 12. This shows behaviour consistent with AR (12) process (i.e. a twelve degree autoregressive AR (12) model). So the ARIMA (p, D, q) model family was specified and then estimated by considering appropriate p AR and q MA lags. Since, a significant correlation appeared on both the ACF and PACF processes at 1, 3, 6 and 12 lags, the ARIMA (12, 1, 1), ARIMA (12, 1, 3) and ARIMA (12, 1, 6) models were specified and estimated. The results were then compared for the best model selection. The innovation process distribution of the time series was considered to be a Gaussian with a constant variance.
In order to ensure that the selected model actually describes the time series stochastic process of the particular system, each specified model was checked for goodness of fit and analysed. The residuals from the fitted model were inferred and checked to see if they are normally distributed and uncorrelated. It is important to realise that residuals distribution should portray a normally distribution behaviour and must be uncorrelated for the best model to be considered. Figures 6-8 show results of goodness of fit checks for the specified ARIMA (12, 1, 1), ARIMA (12, 1, 3) and ARIMA (12, 1, 6) models, respectively.
From Figures 6-8, the goodness of fit check indicates that the residuals are reasonably normally distributed as well as significantly uncorrelated for the ARIMA (12, 1, 6) model compared to the ARIMA (12, 1, 1) and ARIMA (12, 1, 3) models, hence the ARIMA (12, 1, 6) model was considered as the best model that can well describe the stochastic time series behaviour of the Great Letaba River system. The ARIMA (12, 1, 6) model was therefore used to generate forecasts of the River system for the next 25 years (i.e. 300 months from 361 to 660 periods). Monte Carlo simulation approach was used for forecasting the flow time series of the Great Letaba River flow for the next 25 years (Figure 9).
Figure 6. Goodness of Fit Check for Model of One degree Difference with Twelve AR lags and One MA lag.
Figure 7. Goodness of Fit Check of Model of One degree Difference with Twelve AR lags and Three MA lags.
Figure 8. Goodness of Fit Check for Model of One degree Difference with Twelve AR lags and Six MA lags.
Figure 9. Forecast time series process of the Great Letaba River flow for the next 25 years.
It should be noted that the residuals from the fitted model were inferred and checked to see if they are normally distributed and uncorrelated as previously discussed. In Figure 9, the Minimum Mean Square Error (MMSE) and Monte Carlo (MC) forecasts show the river flow continuing to decline over the forecast time horizon (period). The confidence bounds show that a decline in river flow is plausible, but, since this is a nonstationary process and forecasting with integrated errors, the width of the forecast intervals grows over time horizon. It can also be observed from Figure 9 that, the MMSE and MC forecasts are virtually equivalent. However, there are minor discrepancies in the forecast intervals. The discrepancies are due to some errors in Monte Carlo simulation. Usually the discrepancies decrease as the number of Monte Carlo simulations (realisations/sample path) increases.
Modelling quantitative river flow for efficient water resource management is important for sustainable water resource development. In this work, results revealed that the Great Letaba River flow time series process shows some degree of correlation, seasonality and non-stationary behaviour with decreasing river flow. Hence, in conclusion, the Great Letaba River flow has shown a decreasing trend and therefore, should be effectively used for sustainable future development.
The authors acknowledge the Tshwane University of Technology for financial support and the Department of Water and Sanitation of South Africa for assisting with data.
 Dessalegn, T. A., Moges, M. A., Dagnew, D. C., & Gashaw, A. (2017). Applicability of Galway River Flow Forecasting and Modeling System (GFFMS) for Lake Tana Basin, Ethiopia. Journal of Water Resource and Protection, 9, 1319-1334.
 Garrote, L., & Bras, R. L. (1995). A Distributed Model for Real-Time Flood Forecasting using Digital Elevation Models. Journal of Hydrology, 167, 279-306.
 Mukherjee, D., & Mansour, N. (1996). Estimation of Flood Forecasting Errors and Flow Duration Joint Probabilities of Exceedance. Journal of Hydrological Engineering, 122, 130-140.
 Nyabeze, W. R., Mallory, S., Hallowes, J., Mwaka, B., & Sinha, P. (2007). Determining Operating Rules for the Letaba River System in South Africa Using Three Models. Journal of Physics and Chemistry of the Earth, 32, 1040-1049.
 Otache, Y. M., Sadeeq, A. M., & Isiguzo, E. A. (2011b). ARMA Modelling River Flow Dynamics: Comparative Study of PAR Model. Open Journal of Modern Hydrology, 1, 1-9.
 Peng, G., Lu, F., Song, Z., & Zhang, Z. (2018). Key Technologies for an Urban Overland Flow Simulation System to Support What-If Analysis. Journal of Water Resource and Protection, 10, 699-724.
 Yule, G. U. (1926). Why Do We Sometimes Get Nonsense-Correlations between Time Series? A Study in Sampling and the Nature of Time Series. Journal of Royal Statistical Society, 89, 1-64.