Groundwater pollution vulnerability defines the sensitivity of a groundwater body of being adversely affected by an imposed contaminant load . This concept entails two notions: intrinsic and specific vulnerability. Intrinsic vulnerability defines the vulnerability of groundwater to contaminants generated by human activities, depending on the inherent geological, hydrological and hydrogeological characteristics of an area (soil type, topography, recharge, vadose zone, etc.), but independent of the nature of contaminants. For specific vulnerability, specific physicochemical properties from contaminants are considered . Groundwater pollution risk can be defined as the process of estimating the possibility that a particular event may occur under a given set of circumstances  and the assessment is achieved by overlaying hazard and vulnerability .
Several approaches exist for assessing groundwater vulnerability. They can be grouped into methods based on the use of 1) process-based simulation models, 2) statistical models and 3) overlay and index methods  . Alternatively, they can be classified according to the degree of integration of monitoring data in the vulnerability assessment . Hence, distinction can be made between vulnerability assessment methods based on generic data, based on groundwater monitoring data, or hybrid methods based both on monitoring and generic data.
The first and most straightforward method for modelling vulnerability consists of the mapping of pollution as assessed from a monitoring program. In this approach the actually observed pollution is used as a metric of vulnerability. The rational of this model relies on the logic that if pollution is observed, the groundwater body should be vulnerable. Yet, the inverse is not necessarily true.
In Europe, groundwater sites are monitored on a regular basis for a broad set of physicochemical parameters. This monitoring activity is deployed to comply with the current different regulations related to water (e.g. Water Framework Directive in Europe). The data collected in the different monitoring programs can therefore directly be used for making a first assessment of vulnerability. Mapping of the pollution can be done using standard GIS software or using more advanced geostatistical procedures. For instance,  illustrated how time dynamics of pollution parameters can be integrated into the mapping approach. However, monitoring programs suffer from many drawbacks such as limited spatial support, low space-time resolution of observed pollution, limited number of pollution parameters and in particular a high cost . In addition, the results of monitoring programs do not directly allow identifying the origin of the pollution, and does not allow a clear distinction between specific and intrinsic vulnerability. The observed pollution is therefore often a biased metric of the real vulnerability. Groundwater pollution maps therefore only yield a partial image of the real groundwater vulnerability.
Alternatively, vulnerability can be assessed by means of vulnerability models, which integrates the loading, transport, retention and attenuation processes of pollution towards the groundwater body in a mathematical formalism. Use can be made of process-based models , statistical based models   or generic parametric vulnerability models, like the DRASTIC model . In the class of statistical models, use can be made of statistical models based on observation data, or statistical models based on more complex system models, i.e. the so-called meta-models  . The generic parametric methods represent definitely the most utilized approach.
The methods based on the monitoring data or on the modelling data have the potential to predict vulnerability. Most of them can even predict the full statistical distribution (i.e. the expected value and the higher moments) of the vulnerability, which allows to assess the accuracy of the assessment (the precision cannot be assessed as vulnerability cannot be “measured” in a direct way). The accuracy and hence the uncertainty will be region and case specific. Hence methods can be selected and optimized in terms of the uncertainty associated with the vulnerability assessment method. Alternatively, methods can be combined, following the logic that each method allows a partial assessment of the vulnerability of a groundwater body. Such combined methods rely on operational data fusion techniques. Data fusion is a method that can be used to combine optimally various sources of information about groundwater quality and vulnerability in a consistent and accurate model prediction. Among different data fusion techniques, a Bayesian Data Fusion (BDF) approach was recently proposed by . It was especially designed for spatial predictions problems and provides a consistent framework of fusing an arbitrary large number of information sources that are related to a same variable of interest in order to provide a unique spatial prediction. The main advantage of a Bayesian approach is to put the problem of data fusion into a clear probabilistic framework. Recently, the BDF method was also successfully applied to map groundwater pollution in Belgium  and the Democratic Republic of Congo .
Most of these recent vulnerability assessment methodologies reviewed in the previous sections have been developed at the regional scale, national or continental scale, focusing on rural and nature environments. Yet few of these methods have been tested in strongly urbanized and human impacted environments. There is therefore scope to analyze more in detail the vulnerability assessment methodologies for urban and peri-urban environments. This is particularly important given the strong concentration of the global population in urban environments, and the many groundwater function and services developed in urban settings such as the provision of drinking water to the urban population. The study presented in this paper should therefore be considered in this context.
In this study, we present the use of a statistical based modelling approach for assessing groundwater vulnerability of the Brusselean groundwater body in the Brussel’s Capital Region, in Belgium. We focus on the nitrate pollution problem, since the local authorities need to comply with the European Nitrate directive, and therefore sufficient nitrate monitoring data are available to implement statistical modelling approaches. We compare linear and non-linear models. We consider the loading of the independent explanatory variables in these models as indicators of possible pollution sources. The development of a statistical based modelling technique should be considered as an initial step towards a hybrid model that combines process knowledge or data based empirical models with monitoring data.
2. Material and Methods
2.1. The Study Region
The Brussel’s Capital Region (BCR) is situated in the center of Belgium and encompasses 19 communities, a. o. the city of Brussels (Figure 1).
The land use is dominated by urbanization. Forty eight percent of the surface area is urbanized. The non-urbanized area is dominated by forest cover (11% of total area) situated in the South Eastern part of the region, and public parks and gardens (8% of total area). The river Zenne, which is an affluent of the Scheldt river, intersects the region. The river is covered in the major part of the city. The relief is gently sloping from the river banks onwards. Details on the land use of the BCR can be consulted at http://www.geo.irisnet.be/en/maps/new/.
The hydrogeology of the BCR is illustrated in Figure 2. The valley of the Zenne and the left bank is dominated by Lower Eocene clayey formations, while on the right bank the Middle Eocene Brussels sandy formations outcrops. The
Figure 1. Situation of the Brussel’s capital region in the centre of Belgium.
Figure 2. Hydrogeology of the Brusssel’s capital region (Source ).
Brussels sandy formation is partially covered by a less permeable Middle Eocene clayey formation (Maldegem Formation). The unconfined Brusselean groundwater body situates within the Brussels sandy aquifer. This groundwater body is locally exploited for drinking water provision purposes at an average rate of 2.5 mio m3/year.
2.2. The Nitrate Monitoring Data Set
The Brusselean groundwater body needs to comply with current EU environmental regulations, amongst others with the EU nitrate directive. Given the unconfined nature of the water body, the water body is potentially subjected to pollution from point sources (infiltration holes, animal storage facilities, cemeteries, waste water treatment facilities) or from diffuse sources (natural mineralization of soils, leaking sewage systems, fertilization of public parks, urban agriculture, …). Therefore a nitrate monitoring program was initiated, encompassing 48 monitoring points. The positions of these monitoring stations are given in Figure 3 and Figure 4. We consider in this study the monitored yearly averaged nitrate data since 2006. We mapped the mean average annual concentration and
Figure 3. Nitrate contamination observed in the Brussels groundwater body as observed in 50 monitoring stations. In the background, data are projected on a DRASTIC vulnerability map. The latter has been parametrized using standard generic data.
Figure 4. Tau Kendall trend statistics of observed mean annual nitrate concentration in BCR. The colors indicate the direction of the Tau Kendall statistic (orange-red: positive trend; green-dark green: negative trend). The size of the dots indicates the strength (significance level) of the Tau Kendall statistic.
analyzed the statistical trend. The statistical trend analyses were made using the Tau Kendall trend test . A selection of water samples was also analyzed for the stable N and O isotopes. Isotope data were analyzed by means of the Bayesian source identification model SIAR (Stable Isotope Analysis in R) .
2.3. The Statistical Modelling Approach
Statistical models were developed to predict nitrate groundwater pollution in terms of easily available spatial attributes. The dependent variable in these models was the mean of the annual average nitrate concentration at a given location. The independent variables were the values of the representative ancillary variables within the influence zone of the monitoring station. In total 23 spatially distributed ancillary attributes were defined. Ancillary variables were defined on a regular 10 m resolution grid. The selected attributes belonged to 4 categories. The natural hydrogeological environment category comprised variables that are directly related to the basic hydrogeological setting (depth of the aquifer, size of the influence zone of the monitoring well, part of the influence zone confined with Middle Eocene clayey formation, etc.). Second, the urban density category included variables such as population density in the influence zone, percentage of impermeable surface in the influence zone, etc. The authorization category included variables related to specific urbanization permits (e.g. authorization for animal exploitation). The last category, encompassed variables related to the status of the sewage system.
Previous studies illustrated the sensitivity of statistical modelling results on the size and shape of the influence zones of each monitoring point . Influence zones, and hence the independent parameters that are considered in the statistical model, should be defined using hydrogeological criteria, similarly as with methods to delineate groundwater well protection zones. In this study, we used a GIS based simplified approach to model the influence zones. We modelled particle transport from the monitoring wells, using the slope of the flipped aquifer depth map as a proxy for the hydraulic gradient. We considered generic data of the aquifer thickness, porosity and transmissivities, and modelled steady state flow magnitude and direction in the GIS environment. Subsequently, we injected conservative particles in the flow field and evaluated the particle displacement after 5, 10 and 20 years. We added a porous tuff parameter corresponding to a longitudinal dispersivity of 30 m and lateral dispersivity of 10 m. The envelopes of particles displaced after 5, 10 and 20 years were superposed to define the influence zone of each monitoring well.
We implemented linear statistical models using multiple stepwise regression, and non-linear artificial neural networks with one neuron layer and 3 neurons, implemented in JMP SASTM (see e.g. see https://www.jmp.com/support/help/14-2/neural-networks.shtml or https://www.jmp.com/support/help/14-2/multivariate-methods.shtml#). We used the Akaike Information Criterion (AIC) to analyze the robustness of the model and to identify the most appropriate model structure. We separated the data set, so that data from 38 stations could be used for model identification (i.e. the model calibration data set) and 10 stations for model validation (i.e. the model validation data set).
3. Results and Discussion
The mean annual average nitrate concentration in the monitoring wells exceeds the WHO nitrate drinking water norm for nitrate significantly in the Northern urbanized part of the groundwater body (Figure 3). Lower concentrations are observed in the less urbanized Southern part, which is dominated by the presence of the Zoniën forest. Major water exploitation wells and drainage galleries for drinking water provision are concentrated in the Southern part of the study area. The spatial structure of the observed nitrate contamination is generally consistent with the vulnerability predicted from the DRASTIC model, suggesting that nitrate contamination can partially be explained by natural hydrogeological properties determining the groundwater vulnerability. Yet, there is a strong correlation with land use as the urbanized part of the region is concentrated on the alluvial part of the study region.
Generally, only small trends can be observed in the nitrate concentration time series, and only a part of them are significant from a statistical point of view (Figure 4). Results show that the heavily contaminated monitoring points in the Northern part are mostly characterized by a decreasing trend, while the lower contaminated monitoring points in the Southern part are characterized by an increasing trend.
Results of the dual isotope analysis are shown in Figure 5. The results together
Figure 5. Dual isotope plot of nitrate in the Brusselean groundwater body of BCR. The dual isotope results are projected on the possible intervals of pollution sources. Black: Nitrate precipitation source; Red: Nitrate mineral fertilizer source; Green: Ammonia fertilizer source in rain; Dark blue: Soil N mineralisation source; Light blue: Manure and sewage source.
with the more detailed SIAR analysis (results not shown) suggest a strong influence of manure or sewage and soil mineralization to the overall nitrate groundwater loading. Given the small contribution of agriculture in the BCR land use, the contribution of manure in the overall loading is not likely. Hence, the isotope analysis suggests merely a contribution from leaking sewage systems to the observed nitrate loading. Waste water in BCR is majorly collected in a unitary sewage network system, and the status of the unitary system is poor. Some of the sewage lines are more than 100 years old and are partially degraded. The waste water manager recently mapped the status of the sewage network. It is estimated that more than 500 km of the total of 1900 km of sewage lines are in a poor status. Hence leaking of organic matter loaded waste water towards the groundwater body is likely.
The simplified methodology for delineating the influence zones of the monitoring wells was compared with the results of a delineation method using the numerical and mechanistic hydrogeological model code FEFLOW . Results suggest a conservative estimate of the influence zone with the simplified methodology (Figure 6). The methodology can therefore be used to weigh the spatial attributes of the independent variables for constructing the statistical models.
Figure 6. Comparison of the estimates of the influence zone for 4 monitoring wells in the BCR between the simplified GIS based method and a method based on a detailed hydrogeological model implemented in the FEFLOW software.
The Akaike Information Criterion (AIC) and R2 were used to identify the independent parameters to be included in the linear multiple regression and ANN models. Results of the AIC and R2 in terms of the number of independent parameters are shown in Figure 7. A linear model with 5 parameters allows to describe 56% of the observed variation in nitrate concentrations of the calibration data set (Figure 7, Figure 8). When more than 6 parameters are used, the increasing AIC suggests an overfitting of the data.
It is well known that many processes determining nitrate pollution of groundwater are often non-linear. To incorporate possible non-linearity, the model with
Figure 7. Akaike Information Criterion (AIC) and R2 in terms of the number of most significant explanatory variables in the multiple regression model.
Figure 8. Simulated versus observed mean annual nitrate concentration data as simulated with the multivariate linear regression model (calibration data set).
five parameters was refined using a non-linear artificial neural network (ANN) model. With this ANN model, we are able to explain 79% of the variation in mean annual nitrate concentration in training mode, and 60% in validation mode. This is consistent with other studies showing that non-linear statistical models increase the explanatory power of the model .
The 5 most significant parameters explaining observed nitrate concentrations in the linear model are 1) the percentage of impermeable surface in the influence zone of the monitoring well; 2) the density of sewage water collectors that are classified as moderate in these influence zones; 3) the number of environmental permits considered being at risk in the influence zone; 4) the size of the influence zone; and 5) the depth of the monitoring well. The individual effect of these parameters on the predicted mean nitrate concentration for both the linear and the non-linear model is illustrated in Figure 9. The first three parameters are directly related to urban infrastructure in the study area. The impact of urban infrastructure on groundwater quality is consistent with observations in other studies, showing that urban development is often associated with a degradation of groundwater quality  . This is also consistent with the isotope analysis suggesting the influence of degrading sewage infrastructure on the groundwater quality. The impact of the size of the influence zone illustrates a possible dilution effect when the size of the influence zone increases. The effect of the depth of the monitoring well is consistent with the observation that nitrate pollution plumes are more concentrated close to the soil surface and are more diluted when piercing deeper in the geological formation.
Process based models, indicator models and statistical models can be used to asses nitrate contamination vulnerability of groundwater. In this study, we preferred statistical models to asses the vulnerability. Indeed, most of the approaches in the literature deal with nitrate contamination in rural and peri-urban environments, where nitrate contamination from agricultural origin may significantly
Figure 9. Individual effects of the 5 explanatory variables on predicted mean nitrate concentration. RNA: Artificial Neural Network model; RLM: Multiple linear regression model; Perc_surf_imperm: the percentage of impermeable surface; lg_col_4: the density of sewage water collectors that are classified as moderate; nb_permis: the number of environmental permits considered being at risk; surface_ZI: the size of the influence zone; prof_rel_m: the depth of the monitoring well.
contribute to the degradation of groundwater quality. The processes of nitrate fate and transport in agricultural soils and subsoils are well known and are integrated in validated nitrate fate and transport models.
These models can be used to assess the degradation status in rural and peri-urban environments. In urban environments however, the sources and pathways of nitrate contamination are much more complex. Process based modelling approaches are therefore less appropriate for modelling vulnerability of groundwater systems in these environments.
Statistical and machine learning approaches are appropriate to model nitrate contamination of groundwater in complex environments, when sufficient data on the contamination and the possible explanatory variables are available. The individual effects give some insight on the relevance of explanatory variables in the overall process. The overall model structure identified in the present study may be rather generic, since it aligns with other studies on nitrate contamination in urban environments. Yet, the specific model parameters of the statistical models are study site specific and should at best be recalibrated for each new study.
In this study, we implemented statistical models to predict the groundwater nitrate contamination of the unconfined Brusselean groundwater body in the urban environment of the Brussel’s Capital Region. This groundwater body is degraded, with nitrate concentration levels exceeding the drinking water norms in the northern part of the region. The spatial structure of the nitrate contamination is consistent with the predicted vulnerability by means of the DRASTIC model. The temporal structure exhibits small trends that are moderate significant, with an increasing trend for the points with low concentrations and a decreasing trend for the points with high concentrations. The explanatory variables suggest a strong impact of urban infrastructure on groundwater nitrate degradation. This is consistent with many other studies reported in literature and with the dual isotopic analysis of detected nitrate in this groundwater body. This also illustrates the importance of the maintenance and/or rehabilitation of urban infrastructure to preserve groundwater quality. Groundwater restauration in the studied urban environment should primarily focus on the rehabilitation of the degraded part of the sewage water network.
The statistical models were able to explain 56% of total variation of observed nitrate contamination when a linear model structure is used. This level could increase to 60% in validation mode when a non-linear model structure was used. The proposed models can therefore be used to predict spatially distributed nitrate contamination of the groundwater body in terms of available spatially distributed ancillary variables. The predicted contamination can hence be considered as a proxy for the groundwater vulnerability. The statistical basis of the models developed in this study allows also to assess the uncertainty of the contamination predictions, and these uncertainty assessments are based on a solid theoretical basis. Statistical models for predicting vulnerability have therefore a comparative advantage to parametric models as they are data based and as they allow to add a quality label to the vulnerability prediction.
The statistical models presented in this study focus on nitrate contamination. It is expected that similar approaches can be developed for other contaminants. The statistical models can therefore be used to improve the overall mapping of groundwater quality by assimilating available and new groundwater monitoring data with predictions coming to the statistical models. Use can be made of advanced data fusion techniques to generate such type of models. The presented approaches can therefore be used to support the monitoring program, by identifying the locations which currently are poorly sampled, or by interpolation when monitoring data are missing. The resulting maps may be valuable for steering the many groundwater protection or restauration programs in such urban environments.
 Foster, S.S.D. (1987) Fundamental Concepts in Aquifer Vulnerability, Pollution Risk and Protection Strategy. In: Duijvenbooden, W. and Waegeningh, H.G., Eds., Vulnerability of Soil and Groundwater to Pollutants, TNO Committee on Hydrological Research, The Hague, 69-86.
 Gogu, R.C. and Dassargues, A. (2000) Current Trends and Future Challenges in Aquifer Vulnerability Assessment Using Overlay and Index Methods. Environmental Geology, 39, 549-559.
 Voudouris, K. (2009) Assessing Groundwater Pollution Risk in Sarigkiol Basin, NW Greece. In: Gallo, M. and Herrari, M., Eds., River Pollution Research Progress, Nova Science Publishers Inc., Hauppauge, 265-281.
 Uricchio, V.F., Giordano, R. and Lopez, N. (2004) A Fuzzy Knowledge-Based Decision Support System for Groundwater Pollution Risk Evaluation. Journal of Environmental Management, 73, 189-197.
 Farjad, B., Mohamed T., Wijesekara, N., Pirasteh, S. and Shafri, H. (2012) Groundwater Intrinsic Vulnerability and Risk Mapping. Proceedings of the Institution of Civil Engineers—Water Management, 165, 441-450.
 Vanclooster, M., Mfumu Kihumba, A. and Ouedraogo, I. (2014) L’Union Fait la Force, or How Different Approaches Should Be Combined to Assess Groundwater Vulnerability at the Regional Scale. Proceedings of the IAH 2014 Conference, Groundwater: Challenges and Strategies, Marrakech, 15-19 September 2014, 117.
 Mattern, S., Bogaert, P. and Vanclooster, M. (2007) Introducing Time Variability in the Mapping of Groundwater Contamination by Means of the Bayesian Maximum Entropy Method. Proceedings of the International Conference on Water Pollution in Natural Porous Media at Different Scales. Assessment of Fate, Impact and Indicators, Barcelona, April 2007, 181-186.
 Candela, L., Vadillo, P., Aagaard, P., Bedbur, P., Trevisan, M., Vanclooster, M., Viotti, P. and Lopez-Geta, L. (2007) Introduction. Proceedings of the International Conference on Water Pollution in Natural Porous Media at Different Scales. Assessment of Fate, Impact and Indicators, Barcelona, April 2007, 3-5.
 Tiktak, A., Boesten, J.J.T.I., van der Linden, A.M.A. and Vanclooster, M. (2006) Mapping Groundwater Vulnerability to Pesticide Leaching with a Process-Based Metamodel of EuroPEARL. Journal of Environmental Quality, 35, 1213-1226.
 Mattern, S., Fasbender, D. and Vanclooster, M. (2009) Discriminating Sources of Nitrate Pollution in a Sandy Aquifer. Journal of Hydrology, 376, 275-284.
 MfumuKihumba, A., Ndembo Longo, J. and Vanclooster, M. (2015) Modelling Nitrate Pollution Pressure Using a Multivariate Statistical Approach: The Case of Kinshasa Groundwater Body, Democratic Republic of Congo. Hydrogeology Journal, 24, 425-437.
 Aller, L., Lehr, J.H., Petty, R. and Truman, B. (1987) Drastic: A Standardized System to Evaluate Groundwater Pollution Potential Using Hydrogeologic Settings. National Water Well Association, Worthington.
 Tiktak, A., Boesten, J.J.T.I., van der Linden, A.M.A. and Vanclooster, M. (2007) Mapping the Vulnerability of European Groundwater to Leaching of Pesticides with a Process-Based Meta-Model of EuroPEARL. Proceedings of the ACS National Meeting, Chicago, March 2007, 26.
 Pineros-Garcet, J., Ordonez, A., Roosen, J. and Vanclooster, M. (2006) Metamodelling: Theory, Concepts and Application to Nitrate Leaching Modelling. Ecological Modelling, 193, 629-644.
 Bogaert, P. and Fasbender, D. (2007) Bayesian Data Fusion in a Spatial Prediction Context: A General Formulation. Stochastic Environmental Research and Risk Analysis, 21, 695-709.
 Mattern, S., Raouafi, W., Bogaert, P., Fasbender, D. and Vanclooster, M. (2012) Bayesian Data Fusion (BDF) of Monitoring Data with a Statistical Groundwater Contamination Model to Map Groundwater Quality at the Regional Scale. Journal of Water Resource and Protection, 4, 929-943.
 Parnell, A.C., Inger, R., Bearhop, S. and Jackson, A.L. (2010) Source Partitioning Using Stable Isotopes: Coping with Too Much Variation. PLoS ONE, 5, e9672.
 Diersch, H.J.G. (2013) FEFLOW: Finite Element Modeling of Flow, Mass and Heat Transport in Porous and Fractured Media. Springer Science & Business Media, Berlin.
 Maier, H.R. and Graeme, C.D. (1996) The Use of Artificial Neural Networks for the Prediction of Water Quality Parameters. Water Resources Research, 32, 1013-1022.
 Appleyard, S. (1995) The Impact of Urban Development on Recharge and Groundwater Quality in a Coastal Aquifer near Perth, Western Australia. Hydrogeology Journal, 3, 65-75.
 Kulabako, N.R., Nalubega, M. and Thunvik, R. (2007) Study of the Impact of Land Use and Hydrogeological Settings on the Shallow Groundwater Quality in a Peri-Urban Area of Kampala, Uganda. Science of the Total Environment, 381, 180-199.