ariable for these analyses was river discharge (disch), measured in cubic metres per second, whereas the independent variables were time (month and year) and space (loc). Space refers to the gauge stations along the Black Volta River considered in this study, namely Lawra, Chache, Bui and Bamboi. The covariates were rainfall (rain) measured in millimetres, relative humidity (humid), elevation (elev) measured in metres, soil type (soil) and land use (luse), which was treated as a random effect. Interactions between some of these variables were also considered, especially space-time interactions.

2.3. Models and Analyses

After examining the relationship between river discharge (disch) and each predictor, a separate model was fitted for each covariate to determine its effect on disch and, if the effect was significant, whether it was linear or nonlinear. A covariate that was significant as a single predictor but not significant jointly with the other predictors was removed. This was the case for elevation (elev): its parameter was significant when it was the only predictor, but it was eliminated from models with several predictors because its effect was no longer significant. This preliminary analysis is omitted from the results for the sake of brevity.
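The screening rule above (keep a covariate only if it is significant both on its own and jointly with the other predictors) can be sketched as follows. This is a simplified stand-in, assuming linear effects fitted by ordinary least squares with normal-approximation p-values; the screening in this study used GAM fits and also assessed nonlinearity, which the sketch does not. The data are simulated, and `elev` is a hypothetical noisy proxy of `rain`, so it is significant alone but has no effect of its own once `rain` is included.

```python
import numpy as np
from math import erfc, sqrt


def ols_pvalues(X, y):
    """Fit y ~ 1 + X by ordinary least squares and return two-sided p-values
    for the columns of X (normal approximation to the t distribution)."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (n - A.shape[1])  # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    z = beta / se
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2))
    pvals = np.array([erfc(abs(zi) / sqrt(2)) for zi in z])
    return pvals[1:]  # drop the intercept


def screen_covariates(covariates, y, alpha=0.05):
    """Keep a covariate only if it is significant alone AND jointly."""
    single = {name: ols_pvalues(x[:, None], y)[0] for name, x in covariates.items()}
    candidates = [name for name, p in single.items() if p < alpha]
    if not candidates:
        return [], single
    X = np.column_stack([covariates[name] for name in candidates])
    joint = ols_pvalues(X, y)
    kept = [name for name, p in zip(candidates, joint) if p < alpha]
    return kept, single


# Simulated example: "elev" is a hypothetical noisy proxy for "rain"
rng = np.random.default_rng(1)
n = 500
rain = rng.normal(size=n)
elev = rain + 0.5 * rng.normal(size=n)
y = 2.0 * rain + rng.normal(size=n)
kept, single = screen_covariates({"rain": rain, "elev": elev}, y)
print(kept, single)
```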

The response variable, river discharge at gauge station i ( $disc{h}_{i}$ ), is modelled using a generalized additive mixed model (GAMM), as shown in Table 1.

2.4. Parameter Estimation

The GAMMs in Table 1 can be expressed as generalized linear mixed models (GLMMs):

$\mathrm{log}\left({\mu }_{i}\right)={X}_{i}\theta$ (1)

where ${X}_{i}$ is a row of the model matrix containing all components of the model, that is, all explanatory variables of the fixed and random effects and all the basis functions evaluated at observation i. The parameter vector $\theta$ contains the coefficients of the fixed terms, the random land use effects and the basis coefficients. The smoothness parameters are estimated by maximum likelihood (ML), by integrating out of the log likelihood function the part of $\theta$ that lies in the range space of the smoothing penalties, as described in  .
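Once the model is written as in Equation (1), each smooth contributes a block of basis-coefficient columns to ${X}_{i}$ together with a penalty on those coefficients. A minimal sketch, assuming a Gaussian response with identity link (for example log-discharge) and a smoothing parameter $\lambda$ that is fixed rather than estimated by ML as in this study, is the penalized least-squares solution $\hat{\theta}={\left({X}^{\top }X+\lambda S\right)}^{-1}{X}^{\top }y$. With the log link of Equation (1), mgcv solves the same kind of penalized problem iteratively (penalized IRLS). The ridge-type penalty matrix `S` and the simulated columns below are illustrative stand-ins, not a spline penalty or the study's model matrix.

```python
import numpy as np


def penalized_fit(X, y, S, lam):
    """Minimise ||y - X theta||^2 + lam * theta' S theta for a FIXED lam.

    X   : n x p model matrix whose rows are the X_i of Equation (1)
    S   : p x p penalty matrix, zero for unpenalized (fixed-effect) coefficients
    lam : smoothing parameter (estimated by ML in the study; fixed here)
    """
    return np.linalg.solve(X.T @ X + lam * S, X.T @ y)


# Illustrative data: an intercept plus three hypothetical basis columns
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 3))])
y = X @ np.array([1.0, 0.5, -0.5, 0.2]) + 0.1 * rng.normal(size=50)
S = np.diag([0.0, 1.0, 1.0, 1.0])  # ridge-type stand-in; intercept unpenalized

for lam in (0.0, 10.0, 1e6):
    print(lam, penalized_fit(X, y, S, lam).round(3))
```

As $\lambda$ grows, the penalized coefficients shrink towards zero while the unpenalized intercept is left free; choosing $\lambda$ is exactly what the ML step above does.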

2.5. Model Selection and Validation

Model selection was based on the Akaike information criterion (AIC), the Bayesian information criterion (BIC), the adjusted R-squared, the root mean squared prediction error (RMSPE) and the Nash-Sutcliffe efficiency (NSE). However, the key performance indicators were the RMSPE, which, unlike AIC and BIC, does not depend on the likelihood, and the NSE. The RMSPE and NSE are calculated using Equations (2) and (3), respectively.
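For a Gaussian model, AIC and BIC follow from the maximised log-likelihood $\mathrm{log}L=-\frac{n}{2}\left\{\mathrm{log}\left(2\pi \right)+\mathrm{log}\left(\text{RSS}/n\right)+1\right\}$ as $\text{AIC}=2k-2\mathrm{log}L$ and $\text{BIC}=k\mathrm{log}n-2\mathrm{log}L$, and the model with the smallest value is preferred. The sketch below assumes $k$ counts all estimated parameters (for a GAMM it would be replaced by the effective degrees of freedom); the RSS and $k$ values are illustrative placeholders, not results behind Table 2.

```python
import math


def gaussian_loglik(rss, n):
    """Maximised Gaussian log-likelihood for a model with residual sum of squares rss."""
    return -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)


def aic(loglik, k):
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    return k * math.log(n) - 2 * loglik


n = 400  # hypothetical number of observations
# (RSS, k) pairs are illustrative placeholders, not the values behind Table 2
candidates = {"Model A": (120.0, 10), "Model B": (80.0, 25)}
scores = {}
for name, (rss, k) in candidates.items():
    ll = gaussian_loglik(rss, n)
    scores[name] = (aic(ll, k), bic(ll, k, n))

best_by_aic = min(scores, key=lambda m: scores[m][0])
best_by_bic = min(scores, key=lambda m: scores[m][1])
print(scores, best_by_aic, best_by_bic)
```

BIC penalises each extra parameter by $\mathrm{log}n$ instead of 2, so for $n>7$ it favours smaller models more strongly than AIC does.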

Table 1. Summary of models considered.

where ${\mu }_{i}=E\left(disc{h}_{i}\right)$ and $\mathrm{log}\left(disc{h}_{i}\right)$ is assumed to follow a Gaussian distribution. ${\beta }_{0}$ is the intercept, ${\beta }_{1}$ and ${\beta }_{2}$ are fixed-effect parameters, and ${f}_{1},\dots ,{f}_{4}$ are smooth functions of the covariates, each represented by a cyclic cubic regression spline. $k\left(i\right)$ indexes the land use at the $i$th gauge station and $lus{e}_{k\left(i\right)}$ is the random land use effect. ${f}_{5}$ is a tensor product of cyclic cubic regression splines.

$\text{RMSPE}=\sqrt{\frac{{\sum }_{i=1}^{n}{\left({y}_{i}-{\stackrel{^}{y}}_{i}\right)}^{2}}{n}}$ (2)

$\text{NSE}=1-\left[\frac{{\sum }_{i=1}^{n}{\left({y}_{i}-{\stackrel{^}{y}}_{i}\right)}^{2}}{{\sum }_{i=1}^{n}{\left({y}_{i}-{y}^{mean}\right)}^{2}}\right]$ (3)

where ${y}_{i}$ and ${\stackrel{^}{y}}_{i}$ are the observed and predicted river discharges for months $i=1,\dots ,n$, and ${y}^{mean}$ is the mean of the observed data.
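Equations (2) and (3) translate directly into code. A minimal NumPy sketch (the function names are illustrative):

```python
import numpy as np


def rmspe(y, y_hat):
    """Root mean squared prediction error, Equation (2)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def nse(y, y_hat):
    """Nash-Sutcliffe efficiency, Equation (3): 1 is a perfect fit, and 0 means
    the model predicts no better than the mean of the observations."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2))


y_obs = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
print(rmspe(y_obs, y_pred))  # 0.5
print(nse(y_obs, y_pred))    # 0.8
```

RMSPE is in the units of discharge, whereas NSE is dimensionless, which is why the two are reported together.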

Several diagnostics were used to check the model and to investigate whether the final selected model had accounted for the spatial and temporal correlation in the residuals. A QQ-plot and a histogram of the residuals were used to check their normality. Homoscedasticity of the residuals was checked using scatter plots of the residuals against the predictors. Finally, a scatter plot of the response against the fitted values served as a visual check of goodness of fit.
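The normal QQ-plot compares the sorted residuals with standard normal quantiles. A small sketch, assuming the common plotting positions $(i-0.5)/n$ (mgcv's own QQ-plot may compute its reference quantiles differently): points close to a straight line, that is a correlation near 1, support the normality assumption. In R, mgcv's `gam.check()` draws diagnostic panels of the kind described above; the simulated residuals below are illustrative only.

```python
from statistics import NormalDist

import numpy as np


def qq_points(residuals):
    """Theoretical standard-normal quantiles and sorted residuals for a QQ-plot,
    using plotting positions (i - 0.5) / n."""
    r = np.sort(np.asarray(residuals, dtype=float))
    n = r.size
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theo = np.array([std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)])
    return theo, r


def qq_correlation(residuals):
    """Correlation of the QQ points; values close to 1 suggest near-normal residuals."""
    theo, r = qq_points(residuals)
    return float(np.corrcoef(theo, r)[0, 1])


rng = np.random.default_rng(7)
print(qq_correlation(rng.normal(size=500)))       # near 1 for normal residuals
print(qq_correlation(rng.exponential(size=500)))  # lower for skewed residuals
```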

2.6. Software

Data analysis was carried out in R version 3.2.4, and the models were fitted using the mgcv package.

3. Results and Discussion

3.1. Descriptive Statistics

In generalized regression models, the distribution of the response variable (disch) must be examined in order to select both the response distribution and the link function. Boxplots of disch and log(disch) are shown in the upper panels of Figure 4, and normal QQ-plots of disch and log(disch) in the lower panels. The figure shows that log(disch) is well approximated by a normal distribution. Hence, a Gaussian distribution with a log link function was adopted for disch in the GAMMs.
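The effect of the log transformation can be quantified with the sample skewness, which is close to 0 for normally distributed data. The sketch below uses synthetic log-normal values as a stand-in for discharge (not the Black Volta data) and compares the skewness before and after taking logs.

```python
import numpy as np


def sample_skewness(x):
    """Moment estimator of skewness: mean((x - mean)^3) / sd^3."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.mean(d ** 3) / np.mean(d ** 2) ** 1.5)


rng = np.random.default_rng(42)
flow = rng.lognormal(mean=3.0, sigma=1.0, size=5000)  # synthetic right-skewed "discharge"
print(sample_skewness(flow))          # strongly positive
print(sample_skewness(np.log(flow)))  # close to 0 after the log transform
```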

The time series plots of river discharge at the four gauge stations are shown in Figure 5; they indicate clear seasonality in discharge at all stations. This suggests that the smooth functions can be represented using cyclic cubic regression splines.
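A cyclic spline basis forces the fitted seasonal curve to join up at the ends of the cycle, so December connects smoothly to January. The sketch below builds a periodic cubic B-spline basis with evenly spaced knots. This is a different parameterisation from mgcv's cyclic cubic regression spline (`bs = "cc"`, which uses function values at the knots), although for evenly spaced knots it should span the same space of periodic cubic splines. It also shows the row-wise Kronecker product used to form a tensor product basis such as ${f}_{5}$. The number of basis functions, the second marginal variable `phase` and all function names are assumptions for illustration, and the identifiability constraints and smoothing penalties that mgcv adds are omitted.

```python
import numpy as np


def cubic_bspline(u):
    """Uniform cubic B-spline centred at 0 with support (-2, 2)."""
    u = np.abs(np.asarray(u, dtype=float))
    out = np.zeros_like(u)
    inner = u < 1
    outer = (u >= 1) & (u < 2)
    out[inner] = (4 - 6 * u[inner] ** 2 + 3 * u[inner] ** 3) / 6
    out[outer] = (2 - u[outer]) ** 3 / 6
    return out


def cyclic_basis(x, period=12.0, n_basis=6):
    """Periodic cubic spline basis with n_basis evenly spaced knots per period.
    Rows for x and x + period coincide, so the curve wraps around the cycle."""
    if n_basis < 4:
        raise ValueError("need at least 4 basis functions")
    h = period / n_basis
    centres = np.arange(n_basis) * h
    d = (np.asarray(x, dtype=float)[:, None] - centres[None, :]) / h
    d = (d + n_basis / 2) % n_basis - n_basis / 2  # wrap distance around the cycle
    return cubic_bspline(d)


def tensor_product(A, B):
    """Row-wise Kronecker product of two marginal bases (n x p and n x q)."""
    return np.einsum("ij,ik->ijk", A, B).reshape(A.shape[0], -1)


months = np.arange(1, 13, dtype=float)
B_month = cyclic_basis(months)                # 12 x 6
print(np.allclose(B_month.sum(axis=1), 1.0))  # basis rows sum to one
# Hypothetical second cyclic variable, only to illustrate the tensor product
phase = np.linspace(0.0, 12.0, 12, endpoint=False)
B_phase = cyclic_basis(phase, n_basis=4)      # 12 x 4
T = tensor_product(B_month, B_phase)          # 12 x 24
```

In R the corresponding terms would be written with `s(..., bs = "cc")` and `te(..., bs = c("cc", "cc"))`, which also take care of the constraints and penalties.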

3.2. Model Selection

The first GAMM (Model 1) included a fixed effect of soil type, a random effect of land use and smooth functions of rainfall and humidity, but excluded space and time effects; it explained only about 19.2% (R-sq. (adj) = 0.192) of the variability in river discharge, as shown in Table 2. The second GAMM (Model 2) added the main effects of space and time to Model 1, which raised the explained variability to about 40.1%. The third GAMM (Model 3) added only the space-time interaction effect to Model 1 and explained about 72.4% of the variability, while the final GAMM (Model 4) added both the main and interaction effects of space and time to Model 1 and explained about 82.1%. This indicates that space-time effects, which are often ignored, play a very important role in modelling river discharge.

Figure 4. In the upper panels: Boxplots for river discharge (on the left) and the log transformation of river discharge (on the right); in the lower panels: Normal QQ-plots for river discharge (on the left) and the log transformation of river discharge (on the right).

Figure 5. Time series plots for river discharge at the various gauge stations.

Table 2. Model selection criteria.
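The R-sq. (adj) values quoted above measure the proportion of variance explained after accounting for model size. A sketch assuming the textbook definition ${R}_{adj}^{2}=1-\frac{\text{RSS}/\left(n-p\right)}{\text{TSS}/\left(n-1\right)}$, with $p$ taken as the effective number of parameters; mgcv's `summary.gam` computes its version from the residual degrees of freedom and may differ in detail (for example with prior weights). The numbers in the usage lines are illustrative.

```python
import numpy as np


def adjusted_r2(y, y_hat, edf):
    """Adjusted R^2 = 1 - [RSS / (n - edf)] / [TSS / (n - 1)], where edf is the
    (effective) number of model parameters."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    n = y.size
    rss = np.sum((y - y_hat) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return float(1 - (rss / (n - edf)) / (tss / (n - 1)))


y_obs = [1.0, 2.0, 3.0, 4.0, 5.0]
y_fit = [1.1, 1.9, 3.2, 3.8, 5.0]
print(adjusted_r2(y_obs, y_fit, edf=2))  # slightly below the plain R^2 of 0.99
```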

Furthermore, the RMSPE and NSE values in Table 2 indicate satisfactory performance for all the GAMMs considered. However, a comparison of their AIC and BIC values clearly indicates that the GAMM with both the main and interaction effects of space and time (Model 4) performs much better than the others.

3.3. Parameter Estimates of the Selected Model

Parameter estimates of the selected GAMM are reported in Table 3 and Table 4. These tables show that all terms, both parametric and smooth, were significant at the 0.05 level.

3.4. Diagnostic Checks

Basic diagnostic plots of the selected GAMM are reported in Figure 6. In the QQ-plot, the residual quantiles lie close to the theoretical normal quantiles, and the histogram of residuals is nearly symmetric. The scatter plot of residuals versus the linear predictor indicates that the residual variance is approximately constant (homoscedastic), while the plot of the response versus the fitted values, used as the visual goodness-of-fit check, is also satisfactory. All in all, the diagnostics of the selected GAMM are quite good.

Table 3. Parameter coefficients.

Table 4. Approximate significance of smooth terms.

Figure 6. Diagnostic plots for the selected GAMM. In the upper panels: QQ-plot of residuals (on the left) and residuals vs. linear predictor (on the right); in the lower panels: histogram of the residuals (on the left) and response vs. fitted values (on the right).

4. Conclusions

We have used GAMMs to model space-time river discharge data in this paper. GAMMs provide a flexible framework that allows for smooth effects of covariates as well as smooth effects of space and time. In other applications, such as repeated observations of weather station data, spatio-temporal dynamic models or state-space models have been proposed. Four GAMMs were explored, two with space-time interactions and two without. A comparison of their performance based on AIC and BIC suggests that, in this application, the models with space-time interactions perform better overall at capturing variations in river discharge. Furthermore, a model with space and time main effects performed better than one without them.

Acknowledgements

This study was supported by the Akosombo Kpong Dams Reoperation and Reoptimization Study hosted by the Water Resources Commission (WRC) of Ghana and funded by the African Development Bank (ADB). We sincerely thank WRC and ADB for the support.

Iddrisu, W., Nokoe, K., Luguterah, A. and Antwi, E. (2017) Generalized Additive Mixed Modelling of River Discharge in the Black Volta River. Open Journal of Statistics, 7, 621-632.
