Water is one of the most important natural resources for sustaining life on Earth. Nonetheless, water scarcity is a global challenge that currently affects more than 40% of the global population. By 2025, an estimated 3.9 billion people (over 60% of the world's population) are expected to live in water-stressed environments. The supply of clean, fresh water is a particular challenge for most African countries.
Although water is one of the most essential resources, with great implications for development in Africa, the freshwater situation is still not encouraging. According to the United Nations, more than 300 million people in Africa currently live in water-scarce environments, and many water requirements for agriculture, sanitation, industry and domestic use cannot be met. The situation is worsening as a result of population growth, rapid urbanization and industrialization, expanding agriculture, and inadequate capacity to manage existing freshwater resources.
In Kenya, the water situation remains critical: the country is classified as water scarce, with only 647 m³ per capita against the global benchmark of 1000 m³. Almost 41% of Kenya's 46 million people still rely on unimproved water sources such as rivers, shallow wells and ponds, while 59% use unimproved sanitation solutions (Kenya water and sanitation report). Athi River is one of Kenya's fastest-growing towns. According to the Republic of Kenya 2009 census data, the town's population is growing at 4% per annum, and the town continues to experience rapid growth and social upheaval owing to a large influx of new residents and the establishment of new industries. These population pressures and industrial development have increased water demand and seriously threatened the Athi River environment.
Mavoko Water and Sewerage Company (MAVWASCO), owned by the Kenyan Government, provides water and sewerage services in Athi River town. The company supplies approximately 3500 m³ of water per day to a population of around 110,396, which meets only about 40% of the water demand; as a result, the area experiences frequent water shortages. To ensure a reliable water supply to the town's residents, water demand estimation and projection are therefore necessary. According to House-Peters et al., water demand estimates are useful for developing alternative water supply sources, integrating water demand management programs, and planning cost-effective and reliable infrastructure.
In recent years there has been growing interest in using Geographic Information System (GIS) based techniques to model water demand. According to various studies, this interest has mostly stemmed from analyzing socioeconomic, climatic, physical, and public policy and strategy related factors to understand water demand in a specific area. Water demand modeling can, however, use as many variables as required that directly or indirectly affect water demand in a particular area. Several studies have also shown the value of the ordinary least squares (OLS) technique for identifying key water demand drivers. In a study in Oregon, USA, GIS and statistical methods were used to identify the determinants of water demand and to determine spatial trends in water use and how they changed over time. Along the same lines, a study by Wentz et al. of residential water demand at the census tract level found a spatial effect above and beyond the effects of household size and pools on water consumption; census tracts exhibited water consumption behavior similar to that of neighboring tracts for the two variables. In Athi River town, the reported critical water shortages have been closely associated with socioeconomic characteristics that indicate the economic status of the region. Modeling these characteristics can help in understanding the key factors in terms of their influence, pattern and relationship to water use. With the rise of new statistical techniques and GIS tools, GIS techniques have become useful for analyzing and estimating water use and demand. Although research on water demand has been carried out in Kenya, most studies have not yet used GIS spatial modeling techniques to their full potential.
Turning to the study area, despite the town's frequent water shortages, few studies have analyzed the situation, and none has used GIS technology to understand this water use challenge in a spatial domain. GIS has only recently been used to analyze and understand the factors behind water demand and to estimate future demand.
Therefore, this study explored the use of GIS spatial modeling to examine the effects of housing characteristics on water use at the zone level in Athi River town. First, we determined the factors that influence domestic water demand in the town by applying a global OLS regression model to select the significant factors from the candidate factors proposed for assessing domestic water demand. Second, a local geographically weighted regression (GWR) model was adopted to reveal the sensitivity of water use to the selected factors and to examine their spatial effects on domestic water use. Finally, the GWR model was used to project the short-term water use of the town. This baseline information can be useful in planning and administering an adequate water supply system.
2. Materials and Methods
2.1. Study Area
Athi River town was selected for the analysis because of its heterogeneous characteristics. For instance, Senior Staff is an area with a very low population that is well supplied with piped water; Ngei 2 represents a middle-class area; Sofia and Slota are slum areas with high populations and fewer piped water connections; and Old Town is a commercial centre. To enhance water service delivery, the town is divided into thirty-nine zones (Figure 1).
2.2. Description of Data
The data collected for this study included both primary and secondary data. Primary data were collected through closed- and open-ended questionnaires, interviews and field observations. Face-to-face interviews were held with senior and junior staff at MAVWASCO, observations were made during several site visits, and questionnaires were administered to residents of the town. This provided first-hand information on key household characteristics hypothesized to be independent variables influencing water consumption in the study area. The variables included the average number of household rooms, the percentage of people holding a diploma or above (education level), average household income, and the percentage of households with a garden. GIS vector data of the meter connections, in shapefile format, were obtained from the GIS Department of MAVWASCO. Population data were obtained from the Kenya GIS Portal, as provided by the Kenya National Bureau of Statistics from the 2009 national census. The population data included attributes such as age and gender that were not required for the analysis, so these columns were deleted.
A geodatabase of domestic water demand for the area was created in ArcGIS 10.2. The information captured included average household size, number of meter connections, number of households, average number of household rooms, average household income, percentage of households with a garden, and percentage of people holding a diploma or above (education level). The geodatabase formed the platform on which the analyses were carried out. Administrative datasets for Mavoko sub-county were obtained from the Machakos County Government and used to delineate the Athi River township boundary. The resulting shapefile outlined the boundaries of all 39 zones supplied by the company. For the dependent variable, we used 2017 billing records of water consumption to determine the factors affecting the total consumption of residential units at the zone level in Athi River town. Water consumption data for all sectors were acquired from the Commercial Department of Mavoko Water and Sewerage Company. We extracted only domestic water users from the database, combined the monthly data for 2017, and aggregated use to the zone level to protect the privacy of individual households. These values were entered into the prepared GIS vector polygon map as non-spatial (attribute) data. Lastly, all data were converted to ESRI shapefiles and transformed to a common coordinate system, Universal Transverse Mercator (UTM) zone 37S on the WGS 1984 datum. The file geodatabase used for the analysis was created in ArcCatalog, where all feature classes were created and stored with regard to the compatibility of the datasets in terms of format, scale and type.
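The aggregation step described above (keep domestic users only, sum the monthly 2017 volumes, report one total per zone) can be illustrated with a minimal Python sketch. The study itself worked in ArcGIS; the record layout, field names (`zone`, `category`, `volume_m3`) and sample rows below are hypothetical and serve only to show the logic.

```python
from collections import defaultdict


def aggregate_to_zone(records):
    """Sum domestic consumption (m3) per zone over all monthly records."""
    totals = defaultdict(float)
    for rec in records:
        # Keep domestic users only; other sectors are excluded from the analysis.
        if rec["category"] != "domestic":
            continue
        totals[rec["zone"]] += rec["volume_m3"]
    return dict(totals)


# Hypothetical billing rows (not data from the study).
records = [
    {"zone": "Zone A", "category": "domestic", "month": 1, "volume_m3": 12.0},
    {"zone": "Zone A", "category": "domestic", "month": 2, "volume_m3": 10.5},
    {"zone": "Zone A", "category": "commercial", "month": 1, "volume_m3": 40.0},
    {"zone": "Zone B", "category": "domestic", "month": 1, "volume_m3": 7.25},
]

zone_totals = aggregate_to_zone(records)
print(zone_totals)
```

Because only zone totals leave this step, individual household records never appear in the modelling dataset, which is the privacy rationale given in the text.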
Figure 1. Map of the study area.
We adopted the approach shown in Figure 2 to examine the effects of housing characteristics on water use at the zone level in Athi River town. First, household characteristics proposed for assessing water demand were identified from previous studies and used as the independent variables. Then, with the identified household characteristics as the explanatory variables and recorded 2017 water consumption at the zone level as the dependent variable, Ordinary Least Squares (OLS) and Geographically Weighted Regression (GWR) models were applied to understand domestic water demand in Athi River. The OLS regression model was used to check for multicollinearity among the explanatory variables and to select the significant variables. In addition, a spatial autocorrelation statistic was applied to detect spatial autocorrelation (clustering) of the residuals, which would violate the assumptions of OLS. Finally, the GWR model was adopted to examine the spatial relationship between the significant variables and water demand and to predict the town's future water demand.
2.4. Analysis Methods
2.4.1. Ordinary Least Square Regression
Several factors have been proposed in the literature for assessing water demand or use in a specific area. Previous studies identified about 10 household characteristics that influence domestic water use: household size, household income, education level, building age, number of rooms, garden, swimming pool, lot size, meter connection and number of households. From field visits and discussions with senior and junior municipal staff and residents, only 7 of the 10 variables were found to be relevant to our study area (Figure 2). To determine which of them were significant, these seven variables were subjected to OLS regression against the dependent variable. OLS is commonly referred to as linear regression because of the form of its model, which can be simple or multiple depending on the number of explanatory variables. The OLS method minimizes the sum of squared differences between the observed and predicted values. As long as a model satisfies the OLS assumptions for linear regression, it yields the best linear unbiased estimates. The study therefore used OLS to select the key predictors of domestic water demand according to the type and strength of their relationship with the dependent variable. The OLS model can be expressed as:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \varepsilon$$
where $y$ is the dependent variable, $\beta_0$ is the intercept, $\beta_1, \ldots, \beta_n$ are the coefficients representing the contribution each independent variable makes to the prediction of the dependent variable, $x_1, \ldots, x_n$ are the predictors, and $\varepsilon$ is the random error term. In our study, water consumption was the dependent variable, with average household size, average household income, education level, number of rooms, garden presence, meter connections and number of households as the independent variables. The OLS model was also used to check for multicollinearity (redundancy among predictors), which was assessed with the variance inflation factor (VIF). The VIF of the $i$-th predictor is given by:
$$\mathrm{VIF}_i = \frac{1}{1 - R_i^2}$$
where $R_i^2$ is the coefficient of determination obtained by regressing the $i$-th predictor on all the other predictors. A VIF value greater than 7.5 indicates multicollinearity among the predictors. A spatial autocorrelation statistic, the global Moran's I, was then applied to the residuals to detect clustering, which would violate the OLS assumption of spatially independent residuals.
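As an illustration of the two steps just described, the following dependency-free Python sketch fits an OLS model through the normal equations and computes a VIF for each predictor as 1 / (1 - R_i^2). It is not the ArcGIS implementation used in the study; the helper names (`solve`, `ols`, `r_squared`, `vif`) and the toy data are assumptions made for the example, and only the 7.5 threshold comes from the text.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(xs, y):
    """Return [intercept, b1, ..., bp] minimising the sum of squared residuals."""
    design = [[1.0] + list(row) for row in xs]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


def r_squared(xs, y):
    """Coefficient of determination of the OLS fit of y on xs."""
    beta = ols(xs, y)
    fitted = [beta[0] + sum(b * v for b, v in zip(beta[1:], row)) for row in xs]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vif(xs):
    """VIF_i = 1 / (1 - R_i^2), regressing predictor i on all other predictors."""
    out = []
    for i in range(len(xs[0])):
        target = [row[i] for row in xs]
        others = [[v for j, v in enumerate(row) if j != i] for row in xs]
        out.append(1.0 / (1.0 - r_squared(others, target)))
    return out


# Toy data (hypothetical): two predictors, response built as y = 2 + 3*x1 + 0.5*x2.
xs = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 7], [6, 5]]
y = [2 + 3 * x1 + 0.5 * x2 for x1, x2 in xs]

coefficients = ols(xs, y)
vifs = vif(xs)
redundant = [i for i, v in enumerate(vifs) if v > 7.5]  # threshold from the text
print(coefficients, vifs, redundant)
```

In practice a numerical library would replace the hand-written solver; it is written out here only so that the sketch runs without extra packages.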
2.4.2. Geographically Weighted Regression
Figure 2. Approach adopted for the research.
GWR is a tool for exploring spatial heterogeneity; it allows the relationships being modelled to vary across the study area. GWR is designed to answer questions such as: Which explanatory variable shows a stronger influence in a certain area? Does the relationship between the dependent variable and the predictors vary across space? Our expectation is that the variation in water demand is a function of both the effects of the independent variables on domestic water demand and the fact that nearby zones are similar with respect to the independent variables. Therefore, GWR was adopted in the study to explore how each explanatory variable related to water use spatially across the whole study area. The GWR equation is represented as:
$$y_i = \beta_0(u_i, v_i) + \sum_{k} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i$$
where $(u_i, v_i)$ is the geographical location of the $i$-th point in space and $\beta_k(u_i, v_i)$ is a realization of the continuous function $\beta_k(u, v)$ at point $i$. GWR results provide coefficients, t-scores, standard errors and R² values at each location, which can be mapped to visualize spatial patterns in the model. One of our aims was to compare the performance of the two models through overall R² values and the corrected Akaike Information Criterion (AICc), and to determine which model better explained variations in water consumption and could project water demand more accurately. In the GWR analysis, we used the same dependent and independent variables as for OLS, and each location was the centroid of a zone of Athi River town.
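To make the idea of location-specific coefficients concrete, the sketch below estimates a separate weighted least-squares fit at each location, with weights that decay with distance. It is a minimal illustration under stated assumptions, not the ArcGIS GWR tool used in the study: the Gaussian kernel, the fixed bandwidth of 2.0, the function names and the toy coordinates are all choices made for the example.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def gwr_coefficients(coords, xs, y, bandwidth):
    """Return one coefficient vector [b0, b1, ...] per location.

    At each location the estimate is (X' W X)^-1 X' W y, where W holds
    Gaussian kernel weights based on distance to that location.
    """
    design = [[1.0] + list(row) for row in xs]
    p = len(design[0])
    local = []
    for ui, vi in coords:
        w = [math.exp(-0.5 * (math.hypot(ui - uj, vi - vj) / bandwidth) ** 2)
             for uj, vj in coords]
        xtwx = [[sum(wk * r[s] * r[t] for wk, r in zip(w, design))
                 for t in range(p)] for s in range(p)]
        xtwy = [sum(wk * r[s] * yk for wk, r, yk in zip(w, design, y))
                for s in range(p)]
        local.append(solve(xtwx, xtwy))
    return local


# Toy data (hypothetical): a western and an eastern cluster of zone centroids.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0), (11, 0), (12, 0), (13, 0)]
xs = [[1.0], [2.0], [3.0], [4.0], [1.0], [2.0], [3.0], [4.0]]
# The response depends on x with slope 2 in the west and slope 4 in the east.
y = [2.0, 4.0, 6.0, 8.0, 4.0, 8.0, 12.0, 16.0]

local = gwr_coefficients(coords, xs, y, bandwidth=2.0)
for (u, v), beta in zip(coords, local):
    print((u, v), [round(b, 3) for b in beta])
```

A single global OLS fit would average the two regimes into one slope, whereas the local fits recover a slope near 2 in the west and near 4 in the east, which is the kind of spatial non-stationarity GWR is meant to expose.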
2.4.3. Geometric Progression Method
To estimate future water use or demand in the town, the identified significant variables (household size, household income, meter connections and household rooms) had to be projected. Average household size was derived from household population and number of households. The geometric progression method was used to estimate population; in this method, the percentage increase in population from year to year is assumed to remain constant, here at 4% per annum. The population at the end of the $n$-th year, $P_n$, can be estimated as:
$$P_n = P\left(1 + \frac{I_G}{100}\right)^n$$
where $I_G$ is the growth rate in percent, $P$ is the present population and $n$ is the forecast period in years. Per capita personal income is often used to measure the economic well-being of a region. Therefore, to determine how average household income in the region would change in the future, the variable was projected using the personal disposable income growth rate of 8.16% per annum. Since increasing the number of meter connections is a policy decision for MAVWASCO, it was reasonable to assume that the company will increase the number of connections in line with population growth. Hence, the number of connections was projected by holding the ratio of population to number of connections constant at its 2017 value. The number of rooms in a household is a decision for the individual householder; the average number of household rooms in each zone was therefore assumed to remain constant during the projection period.
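The projection rules in this section reduce to compound growth and a constant ratio, as the short Python sketch below shows. The 4% and 8.16% growth rates and the five-year horizon come from the text; using the served population of 110,396 as the 2017 base and the figure of 1000 connections are assumptions made only to have numbers to work with.

```python
def project_geometric(present, growth_rate_pct, years):
    """Geometric progression: P_n = P * (1 + IG / 100) ** n."""
    return present * (1.0 + growth_rate_pct / 100.0) ** years


def project_connections(connections_now, population_now, population_future):
    """Keep the population-to-connection ratio constant at its base-year value."""
    return connections_now * population_future / population_now


YEARS = 5  # 2017 to 2022

# Assumption: the served population quoted in the introduction is the 2017 base.
population_2017 = 110_396
population_2022 = project_geometric(population_2017, 4.0, YEARS)

# Multiplier applied to average household income at 8.16% per annum.
income_multiplier = project_geometric(1.0, 8.16, YEARS)

# Hypothetical number of meter connections in 2017 (not a figure from the study).
connections_2017 = 1000
connections_2022 = project_connections(connections_2017, population_2017, population_2022)

print(round(population_2022), round(income_multiplier, 3), round(connections_2022))
```

Because connections are tied to population through a fixed ratio, they grow by the same factor as population, so the two projected variables move together by construction.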
3.1. Selection of Significant Variables by OLS
The OLS model was calibrated to diagnose multicollinearity among the explanatory variables. The OLS diagnostic report showed that the household room and garden presence variables returned VIF values of 18.43 and 19.87 respectively, indicating multicollinearity between the two variables. Since these values were higher than the redundancy threshold of 7.50, the garden presence variable was removed and the model was recalibrated. After recalibration, all variables returned VIF values only slightly greater than 1.00, indicating the absence of multicollinearity. OLS regression was also used to provide insight into the variables that explained the spatial variation of domestic water demand across the entire study region. Probabilities and robust probabilities were used to assess the significance of the explanatory variables; statistically significant probabilities are marked with an asterisk. The OLS model revealed four statistically significant factors: household size, household income, meter connections and household rooms (Table 1).
The OLS regression model explained about 87 percent (adjusted R² = 0.87) of the variation in water demand, with AICc = 807.44 (Table 2). The model reported a joint F-statistic of 46 and a joint Wald statistic of 1445, indicating that the model as a whole was statistically significant. Importantly, the Koenker statistic (chi-squared = 17.17) was statistically significant, indicating that the relationship between some or perhaps all of the explanatory variables and the dependent variable varied across the region. Since the Koenker statistic detected non-stationarity, the model fit was likely to improve with GWR, which allows relationships to vary across space. Finally, the Jarque-Bera statistic returned a non-significant chi-squared value of 14.29, indicating that the model's predictions were free from bias and that the residuals were normally distributed.
3.2. Geographically Weighted Regression (GWR) Analysis
The calibrated GWR results suggested an improvement over the OLS model. Comparing the models using AICc, the value was 807.44 for the OLS model and 803.74 for the GWR model (Table 3). The reduction of about 4 (3.70) in AICc relative to the OLS model suggested that the GWR model performed better. In terms of adjusted R², the GWR model improved on the explanatory power of the OLS model by about 5%, variation that was not accounted for by the global model. The increase in explanation from OLS to GWR also reveals spatial effects: the relationships themselves vary across space, and nearby zones respond similarly to changes in the significant independent variables.
Table 1. OLS regression coefficients.
Table 2. OLS diagnostic statistics.
Table 3. GWR output results.
3.3. OLS and GWR Standardized Residuals
The OLS and GWR residuals were then mapped to investigate their spatial pattern (Figure 3). Visual examination showed that the residuals of both models were randomly distributed, with no clustering of over-predictions or under-predictions. This was confirmed statistically with the global Moran's I spatial autocorrelation statistic. The Moran's I report showed that the residual patterns for OLS and GWR were not significantly different from random, with a Moran's index of 0.07 and z-score of 0.81 for OLS and a Moran's index of −0.08 and z-score of −0.48 for GWR. This indicated that both models were properly specified.
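For readers who want to see how such a residual check works, the following Python sketch computes global Moran's I and a permutation-based pseudo z-score. It is a simplified illustration, not the ArcGIS tool used in the study, which reports an analytical z-score; the binary chain-shaped weights, the 999 permutations, the seed and the toy residuals are all assumptions made for the example.

```python
import random


def morans_i(values, w):
    """Global Moran's I for values and a spatial weights matrix w (zero diagonal)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in w)  # sum of all weights
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(zi * zi for zi in z)
    return (n / s0) * num / den


def permutation_z(values, w, n_perm=999, seed=0):
    """Pseudo z-score of the observed I against randomly permuted values."""
    rng = random.Random(seed)
    observed = morans_i(values, w)
    shuffled = list(values)
    sims = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        sims.append(morans_i(shuffled, w))
    mean = sum(sims) / n_perm
    sd = (sum((s - mean) ** 2 for s in sims) / (n_perm - 1)) ** 0.5
    return (observed - mean) / sd


def chain_weights(n):
    """Hypothetical layout: zones in a row, each neighbouring the next one."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


w = chain_weights(6)
clustered = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]    # similar values sit together
alternating = [1.0, 5.0, 1.0, 5.0, 1.0, 5.0]  # neighbours always differ

i_clustered = morans_i(clustered, w)
i_alternating = morans_i(alternating, w)
z_clustered = permutation_z(clustered, w)
print(i_clustered, i_alternating, z_clustered)
```

Positive values of I indicate clustering of similar residuals and negative values indicate dispersion; an index near zero with a small z-score, as reported for both models above, is what a spatially random residual pattern looks like.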
3.4. GWR Parameter Estimates
Both the OLS and GWR models identified the prominent factors influencing water consumption in the study area. Only the significant predictors were therefore entered into the models: household size, household income, household rooms and meter connections. The coefficient values computed by the GWR tool reflect the direction and strength of the relationship between each factor and water demand in the study area. The coefficients were mapped to reveal their spatial trend, which showed that the factors were strong predictors of water demand in some locations and weak predictors in others. Dark-coloured areas are locations where a variable had the strongest relationship with water demand, whereas light-coloured areas are locations where it had the weakest relationship (Figure 4). This is particularly useful for managing water supply to specific areas, since areas of high consumption can be given priority when a water distribution system is implemented.
3.5. Projection of Water Demand
GWR was used to estimate the town's domestic water demand five years ahead. The GWR model was calibrated using the same explanatory variables as before, while the explanatory variables used for prediction were the household size, household rooms, meter connections and household income projected to 2022. According to the GWR model, domestic water demand in the town was 721,899 m³ in 2017, compared with an estimated 880,769 m³ in 2022, an increase of about 22%. To assess the reliability of the projection, we applied the GWR model to the 2013 consumption data to project what consumption would have been in 2017. The projected result deviated only minimally from the actual consumption recorded by MAVWASCO, which supports the reliability of the model. For this reason, using 2017 as the base year, we projected water consumption for the study region to 2022. The zone-level projections are shown in Figure 5.
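The reported increase can be checked directly from the two totals, and a hindcast such as the 2013-to-2017 test can be summarised with an error measure. The Python sketch below does both. The totals of 721,899 m³ and 880,769 m³ come from the text; the choice of mean absolute percentage error and the three zone values are hypothetical, since the study does not state how the deviation was quantified.

```python
def percent_change(base, projected):
    """Relative change from base to projected, in percent."""
    return (projected - base) / base * 100.0


def mean_abs_pct_error(actual, predicted):
    """Mean absolute percentage error between observed and modelled values."""
    errors = [abs(a - p) / a for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors) * 100.0


demand_2017_m3 = 721_899  # from the text
demand_2022_m3 = 880_769  # from the text
increase_pct = percent_change(demand_2017_m3, demand_2022_m3)

# Hypothetical zone-level values illustrating a hindcast comparison.
actual_2017 = [100.0, 200.0, 400.0]
hindcast_2017 = [110.0, 190.0, 400.0]
mape = mean_abs_pct_error(actual_2017, hindcast_2017)

print(round(increase_pct, 1), round(mape, 1))
```

Reporting a figure such as the error percentage alongside the projection would make the phrase "minimal deviation" verifiable.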
Figure 3. OLS and GWR standardized residuals.
Figure 4. Local parameter estimates of GWR.
Figure 5. A graph of actual water consumption in 2017 and 2022 projections.
The aim of this study was to spatially analyse domestic water demand in Athi River using two GIS-based regression models, OLS and GWR. OLS was first used to determine the significant variables that explain variations in water use. A total of six variables were used in the models: average household income, average household size, education level, meter connections, number of households, and average household rooms. Two of these variables, education level of consumers and number of households, were found to negatively influence water use in the study area, as shown by the negative coefficients returned by the OLS regression model (Table 1). For education level, this implies that an increase in the number of educated people in a household was associated with an overall decrease in water consumption. This agrees with a study in Kathmandu, Nepal, which found that water consumption fell as the education level of consumers rose. One likely explanation is that education increases awareness of water conservation and careful water use. Another surprising outcome was the negative coefficient on the number of households, which implies that water consumption decreased as the number of households in the study region increased. A possible explanation is that newly constructed homes in Athi River place more emphasis on water-saving fixtures and appliances, such as water-saving showerheads, which reduce average water consumption.
In contrast, meter connections, average household income, average household size and average household rooms showed a positive relationship with water consumption and were statistically significant in explaining the variability of water use across Athi River households (Table 1); that is, water use increases as each of these variables increases. The OLS results showed that household size was the most significant determinant of domestic water demand (Table 1). As in other studies, household size significantly explains the variation in household water consumption: the larger the household, the greater its water requirements. At the household scale, as occupancy increases, total water demand increases because more water is used for bathing, laundry, toilet flushing and dish-washing, but per capita use decreases.
Our study also found that household income is a significant factor in explaining water use at the household level (Table 1). This is similar to findings on the difference in water consumption between affluent and non-affluent households in Masvingo, Zimbabwe, where water consumption fluctuated more in affluent households than in non-affluent households. The reason is that households with higher incomes use more water because of their greater ability to pay, whereas lower-income households consume less because of their lower purchasing power. The number of rooms in a household was also an important factor in explaining domestic water use: more rooms imply more water taps and bathtub connections, and hence more water use. Finally, the results of the exploratory GWR model illustrated the spatial relationship between water use and the significant explanatory variables. The influence of household size on water use was stronger in the north-eastern and south-western parts of the town. These areas consist mostly of occupied three- to four-storey apartments and other semi-permanent structures, which makes them densely populated and increases their water requirements. In contrast, the same variable was a weak predictor of water demand along the south-eastern and north-western margins (Figure 4), which are areas with few households and low population.
Another variable with a spatial relationship to water demand was household income. As shown in Figure 4, it has a strong influence on water demand in the north-western part of the town. This region consists of affluent households (mostly occupied by senior staff) that consume more water because of their ability to pay, compared with the central parts of the town, which are occupied by non-affluent households (mostly in slums). Meter connection, also a significant predictor, exhibits a strong positive influence on the dependent variable in the north-western part of the town, which is well connected. In the central part of the town the influence of the variable is very weak, and this continues into the southern parts, where the area is not well connected. The household room variable was a strong predictor of water demand in large parts of the town. As shown in Figure 4, it has a strong influence on water demand in almost all the eastern parts of the town. This region consists of a mixture of house typologies (apartments and detached houses) with a high average number of rooms per household. Households with more rooms have more water connections, which results in higher water usage.
5. Conclusion and Outlook
Using a GIS-based local model and a global statistic to explore the relationship between variation in water use or demand and the household variables affecting it, this study was able to identify and understand stationarity and non-stationarity in a spatial dataset. The results confirmed the significance of household size, household income, meter connections and household rooms for domestic water demand. These findings correspond generally to the determinants of water consumption found in other studies and can be used to analyze and understand water consumption in other areas where the same variables are expected to apply. Further, the GWR results confirmed the importance of spatial effects in influencing water demand for the four variables. Accordingly, the GWR results suggest that water resource planners should consider spatial and neighborhood effects in order to manage limited water resources effectively. The improvement of GWR over OLS indicated that it can project water demand more accurately than the OLS model. Water demand projection is necessary because it provides a basis for planning future system improvements and helps to evaluate the ability of existing sources to meet future water needs. The long-term goal is to use this model to augment a decision support tool with which policy and decision makers can analyze the effects of their decisions on water sustainability. Finally, this study contributes to the field of GIS and domestic water demand modeling by providing evidence on household water demand variation in Athi River town and by showing statistically that a local GWR model fits better than a global OLS model when modeling these spatial data.
We acknowledge the Mavoko Water and Sewerage Company for providing both the spatial and non-spatial datasets used in this research. We also thank the Machakos County Government for providing the administrative boundaries dataset.