As energy requirements grow, demand for economical and cleaner energy resources has increased; over the past several decades, this has driven global growth in natural gas production. In 2002–2003, advanced horizontal drilling was combined with hydraulic fracturing and deployed in the exploration and production stages, making it economically feasible to extract natural gas from unconventional sources, primarily shale, throughout the U.S. Since then, shale has been the primary source of natural gas in the U.S., delivering 63% of total natural gas production in 2011, with a growth rate of 48% between 2006 and 2010. Shale gas production is expected to increase by at least 100% from 2011 through 2040.
Hydraulic fracturing fluid is a mixture of 7000–18,000 m³ of water and sand (99.5%) with chemical additives (0.5%), including acid, friction reducers (polyacrylamide, mineral oil), surfactants (isopropanol), salts (potassium chloride), scale inhibitors (ethylene glycol), pH-adjusting agents (sodium hydroxide, potassium carbonate), corrosion inhibitors (N,N-dimethylformamide), and biocides (glutaraldehyde). It is designed specifically to create flow paths between the well and the targeted shale formation, which has very low permeability (k < 0.01 mD).
During hydraulic fracturing, the fracturing fluid is injected through horizontal pipes at high pressure to exert enough force on the shale formation (at depths of 5000–16,000 m and lateral distances of up to 8000 m) to open fractures that let gas held in the shale pores flow to the well, while the proppant (sand) keeps the fractures open.
In the first two weeks after hydraulic fracturing is completed, pressure is released at the wellbore, and 10%–50% of the fracturing fluid injected into the shale returns to the surface (flowback water), carrying chemical additives, total dissolved solids (TDS), gas and oil compounds, and naturally occurring metal ions and radioactive materials; the rest of the fluid returns to the surface with the produced oil and gas over the lifetime of the well (produced water). The returning flow (flowback and produced water) is collected at the surface and processed to be either recycled or disposed of in a Class II injection well. When wells reach the end of their productive life, they are abandoned: filled with cement, sealed, and buried.
Recently, dramatic increases in the number of oil and gas extraction wells in the U.S. have also raised environmental concerns about the potential effects of oil and gas activities, with intense debate over groundwater pollution and safety issues related to hydraulic fracturing. The two most active debates about potential groundwater contamination concern possible methane migration and contamination by flowback and produced water.
Methane formed by bacteria (biogenic methane) is common in anaerobic subsurface environments. However, there is concern that methane formed in deep shale formations (thermogenic methane) can contaminate groundwater through possible connections between the deep shale and the overlying aquifers.
A previous study in Pennsylvania found higher methane concentrations in groundwater near active extraction areas, whereas no relationship was identified between distance from oil and gas wells and groundwater methane concentrations in the Wattenberg field, Colorado. However, thermogenic methane was found in two aquifer wells in the Wattenberg field, implying a possible pathway from the deep shale to the overlying aquifer.
Potential pathways for groundwater contamination by flowback and produced water include improper disposal of saline water produced during oil and gas exploration and production, surface spills and leaks, poorly constructed wells, and well casing failure. However, no evidence of systemic groundwater contamination due to oil and gas activities has been found in the U.S., possibly because of a lack of data.
The importance of monitoring water quality before, during, and after drilling has been emphasized by federal agencies such as the U.S. EPA, and a few states have adopted regular monitoring practices. In February 2013, Colorado became the first state to pass groundwater sampling and monitoring regulations (Rule 609).
Rule 609, administered by the Colorado Oil and Gas Conservation Commission (COGCC), requires up to four baseline samples collected within a 0.8 km radius of a proposed oil and gas well within 12 months before drilling, post-completion sampling 6 to 12 months after drilling, and additional sampling 5 to 6 years after the last sampling event at the initial sample locations. The water quality parameters required for groundwater testing are pH, specific electrical conductivity (EC), total dissolved solids (TDS), dissolved gases such as methane, ethane, and propane, alkalinity, major anions, major cations, bacteria, benzene, toluene, ethylbenzene, xylenes, and total petroleum hydrocarbons (TPH). These data are available on the COGCC website, http://cogcc.state.co.us/.
The most important goal of groundwater quality monitoring in the oil and gas industry is to determine the effects of oil and gas production activities on groundwater by tracking changes in water quality before, during, and after the construction of oil and gas wells in the area. A sufficiently long record of consecutive water quality data is needed to characterize normal trends (e.g., seasonal variation) before the significance of any change can be judged against those normal conditions.
However, regulatory agencies and scientists have acknowledged the limitations of field sampling (ex-situ) methods: they yield insufficient temporal and spatial data for evaluation, and they are resource- and time-intensive as well as costly. For these reasons, wireless real-time technology capable of continuous on-site monitoring of groundwater quality with remote in-situ sensors has been proposed. Despite the growing need for real-time in-situ monitoring, building contaminant-specific sensors and wireless telemetry systems poses technical and economic challenges.
1.1. A Real-Time Groundwater Monitoring System: Colorado Water Watch
In response to perceived industry and community needs, the Colorado Water Watch (CWW) real-time groundwater monitoring system was developed cost-effectively using lab-qualified surrogate technologies and currently available equipment. It bridges the gap between the two monitoring approaches, ex-situ and in-situ, and increases monitoring efficiency (Figure 1).
Using a commercially available in-situ multi-parameter water quality probe, the CWW system monitors groundwater in real time and collects enough historical data to characterize normal water quality conditions (e.g., pH, EC, DO). Once an acceptable baseline has been established, contamination of the groundwater by oil and gas activities should produce changes in the surrogate water quality parameters that the anomaly detection algorithms can flag. Based on these algorithms, the CWW system can decide whether ex-situ monitoring is needed to determine whether a groundwater disturbance is due to oil and gas activities.
The main purpose of the CWW is to enhance the effectiveness of the regulatory agency's monitoring practices, not only through long-term data acquisition but also by screening large amounts of data from large segments of oil and gas operations, by incorporating an event detection system (EDS). The key feature that distinguishes the CWW system from existing monitoring approaches is that it generates information through data evaluation (qualitative monitoring), not just data collection (quantitative monitoring). Information generated by the system will help industry understand normal background conditions and anomalies in groundwater quality, and will provide the lead time needed to sample groundwater for in-depth lab analysis.
Figure 1. The CWW system workflow (event: a period of anomalous water quality).
The differences between the existing COGCC groundwater monitoring rule and the CWW system are summarized in Supporting Information (SI) Table S1.
The objectives of the CWW real-time groundwater monitoring system are to: 1) establish a wireless data network and automate multiple steps of the data flow; 2) develop a real-time groundwater monitoring network in the Denver-Julesburg (DJ) Basin; 3) collect and evaluate long-term water quality data as a decision-making tool for stakeholders; and 4) expand the conversation with local communities by establishing a public web-based information resource on groundwater quality associated with oil and gas production activities.
1.2. Basin Description
The DJ Basin covers approximately 180,000 km² in eastern Colorado, southeastern Wyoming, and southwestern Nebraska. In cross-section, the basin is an asymmetrical bowl that resulted from the uplift of the Rocky Mountains to the west, with its deepest sedimentary rock formations on the western flank, along the axis of the basin.
The first oil and natural gas wells in the DJ Basin were completed in 1881 in the Florence Field, but the Wattenberg field, where the deepest shale formation lies, has been developed only since its discovery in 1970. Petroleum source rocks in the field date mostly from the Cretaceous period, with six potential reservoirs (J Sandstone, Codell, Dakota, Niobrara, Sussex, and Shannon) that range in age from 68 to over 100 million years and are buried at depths of 1.2–2.7 km. The Wattenberg Field is currently the most active oil and gas area in the DJ Basin, with over 22,000 wells that produced 0.02 km³ of gas and 1400 kL of oil per day in 2013.
Groundwater in the area occurs in two aquifers: the South Platte Aquifer and the Laramie-Fox Hills Aquifer. The South Platte Aquifer is a shallow, unconfined alluvial aquifer. It is hydraulically connected to surface water and is recharged by stream infiltration and by percolation of precipitation, irrigation water, and canal and pond seepage.
Depth to water below the ground surface is 0–65 m, and the saturated alluvial deposit is up to 16 km wide and 60 m thick. The aquifer yields up to 11,350 L/min of water and has a transmissivity of 370–10,200 m²/d, a hydraulic conductivity of 30–610 m/d, and a specific capacity of 140–5200 L/min/m. The South Platte Aquifer is the largest source of agricultural water in the area, used primarily for irrigation and livestock, and contains relatively high TDS concentrations.
The Laramie-Fox Hills Aquifer is a confined bedrock aquifer underlying 17,000 km² of the Denver Basin. The maximum depth to water below the ground surface is 730 m, and the saturated thickness is 0–110 m. The Laramie confining unit, an impermeable layer between the Arapahoe Aquifer and the Laramie-Fox Hills Aquifer, obstructs water flow from the Arapahoe Aquifer into the underlying Laramie-Fox Hills Aquifer. The Laramie-Fox Hills Aquifer yields up to 1300 L/min, with a transmissivity of 12,000–87,000 L/min/m, and is a significant source of domestic and municipal water.
In an effort to monitor potential groundwater contamination from oil and gas activities, mainly in the form of surface spills and/or well casing failures, both deep, confined aquifers and shallow alluvial aquifers were targeted for monitoring. Contamination caused by casing and cementing failures of oil and gas wells can be detected through monitoring the deep, confined aquifer (mechanisms 1 and 2 in Figure 2), and contaminants from surface spills are detected by sensors that monitor the shallow alluvial aquifer (mechanism 3 in Figure 2).
1.3. Site Description and Monitor Installation
Groundwater monitors were installed in four shallow alluvial aquifer wells and one deep, confined aquifer well. The monitoring stations in the shallow alluvial aquifer wells are: ARDEC (control), CHILL, LaSalle and Gilcrest, and the deep, confined aquifer well is Galeton (Table 1, SI Figure S1 and SI Figure S2). The surface soil hydraulic profiles are listed in SI Table S2.
Before the monitors were installed, all existing and newly drilled groundwater wells were cleaned, and in-depth baseline water quality tests were performed at each site according to COGCC Rule 609. No oil and gas-related issues were found.
Figure 2. System design of the Colorado Water Watch.
Table 1. CWW site locations and descriptions.
A multi-parameter in-situ probe was installed at the screened level of each monitoring well to measure fresh groundwater and avoid measuring stagnant water in the well. The self-charging cellular data-logger was placed in an enclosed box immediately next to the monitored well (SI, Figure S3).
1.4. Surrogate Sensing Technology
Sensors that measure specific constituents of TDS (e.g., chloride) or dissolved gases (e.g., methane) related to oil and gas activity were considered for the CWW. However, contaminant-specific sensors are relatively expensive, less durable, and require more maintenance.
To resolve these issues, a contaminant-surrogate sensing approach was evaluated in the laboratory at Colorado State University. The chosen contaminant-surrogate sensing technology is a cost-effective, low-maintenance, long-term monitoring method based on the well-known correlation between EC and TDS and on the expected close relationship between oxidation-reduction potential (ORP), dissolved oxygen (DO), and dissolved methane in water (SI, Table S3).
In general, groundwater contamination from oil and gas production activities occurs in two forms: liquid and gas. Liquid contamination occurs when groundwater comes into contact with produced or flowback water, both of which have much higher TDS concentrations than even the highest TDS of drinkable groundwater, approximately 2000 mg/L. Thus, even a small amount of produced and/or flowback water can alter groundwater quality and be detected easily by measuring groundwater EC, which is highly correlated with TDS.
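As a rough illustration of why a small admixture is detectable, the sketch below applies conservative two-end-member mixing to TDS and converts TDS to EC with a linear factor, TDS = k × EC. The end-member values (1000 mg/L background groundwater, 30,000 mg/L produced water) and k = 0.65 are assumptions chosen for illustration, not CWW measurements; in practice k is site-specific and is commonly taken between about 0.55 and 0.75.

```python
# Hypothetical end-member values for illustration only; not CWW data.
GROUNDWATER_TDS = 1000.0       # mg/L, assumed background groundwater
PRODUCED_WATER_TDS = 30000.0   # mg/L, assumed produced/flowback water
K_EC_TO_TDS = 0.65             # assumed factor in TDS (mg/L) = k * EC (uS/cm)


def mixed_tds(fraction, tds_contaminant, tds_background):
    """Conservative two-end-member mixing: TDS of the blended water."""
    return fraction * tds_contaminant + (1.0 - fraction) * tds_background


def tds_to_ec(tds, k=K_EC_TO_TDS):
    """Invert the linear TDS-EC relationship to estimate EC in uS/cm."""
    return tds / k


for f in (0.0, 0.001, 0.01, 0.05):
    tds = mixed_tds(f, PRODUCED_WATER_TDS, GROUNDWATER_TDS)
    print(f"{f:>6.1%} produced water -> TDS {tds:7.0f} mg/L, "
          f"EC ~ {tds_to_ec(tds):7.0f} uS/cm")
```

Under these assumed values, a 1% admixture raises TDS from 1000 to 1290 mg/L, and the estimated EC rises by the same 29% under the linear model.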
Gas contamination can occur through mechanisms similar to those for liquids; in addition, natural gas can migrate upward along the annulus of an improperly sealed casing and wellbore. Using ORP and DO as surrogates for methane, increased methane concentrations in groundwater can be detected, and the origin of the methane can then be determined by ex-situ isotopic lab analysis.
The EC, ORP, and DO sensors are relatively inexpensive and suitable for long-term field deployment. A multi-parameter in-situ probe (Hach Hydrolab, Loveland, CO) was selected on the basis of three requirements (cost-efficiency, durability, and low maintenance) to measure six preferred water quality parameters in the selected groundwater wells: temperature, pH, EC, ORP, DO, and water depth.
1.5. Event Detection System (EDS)
An EDS was adopted for the CWW system to detect changes in the monitored parameters relative to historical data. The EDS chosen was CANARY, a set of algorithms developed by the U.S. EPA to detect contaminants in drinking water resources.
The CWW maintains a real-time connection between the database and the EDS, which runs continuously on the server; the EDS flow diagram is shown in SI, Figure S4. The system defines an "outlier" as an immediate anomaly in the water quality data caused by an incident or an operational error, and an "event" as the occurrence of multiple outliers within a given duration.
To detect outliers in time series data, CANARY uses statistical and mathematical event detection algorithms, such as the linear prediction coefficient filter (LPCF), multivariate near neighbor (MVNN), and set-point proximity (SPP) algorithms.
The LPCF algorithm predicts the next value based on the historical data at each time step and calculates a residual by measuring the distance between the incoming and predicted data. The MVNN calculates a residual by measuring the distance of multiple parameters in a three-dimensional space and comparing these to the historical distances of the parameters. The SPP algorithm is the simplest; it estimates a residual by measuring the distance between the new data and the pre-defined minimum or maximum limit of the parameter.
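The three residual definitions can be sketched in a few lines. This is a simplified illustration, not CANARY's implementation: the LPCF stand-in fits only an order-1 linear predictor by least squares and scales the error by the window standard deviation, the MVNN stand-in takes the Euclidean distance to the nearest standardized historical observation, and the SPP residual is the distance outside an assumed [lower, upper] band. All function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def lpcf_residual(history, new_value):
    """Simplified LPCF-style residual using an order-1 linear predictor.

    Fits (x[t] - m) = a * (x[t-1] - m) by least squares over the history
    window, predicts the next value, and returns |observed - predicted|
    scaled by the window's standard deviation.
    """
    m = mean(history)
    d = [x - m for x in history]
    num = sum(d[t] * d[t - 1] for t in range(1, len(d)))
    den = sum(v * v for v in d[:-1])
    a = num / den if den else 0.0          # AR(1) coefficient
    predicted = m + a * d[-1]
    scale = pstdev(history) or 1.0          # avoid dividing by zero
    return abs(new_value - predicted) / scale


def mvnn_residual(history_rows, new_row):
    """Distance from new_row to its nearest neighbour in the history window,
    after standardising each parameter with the window's mean and std."""
    cols = list(zip(*history_rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]

    def z(row):
        return [(v - mu) / sd for v, mu, sd in zip(row, mus, sds)]

    zn = z(new_row)
    return min(math.dist(z(r), zn) for r in history_rows)


def spp_residual(new_value, lower, upper):
    """Distance outside the [lower, upper] set-point band (0 if inside)."""
    return max(0.0, lower - new_value, new_value - upper)
```

For example, `spp_residual(9.0, 6.5, 8.5)` returns 0.5 for a pH reading above an assumed 6.5–8.5 band.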
The algorithms then classify each new datum as either normal background or an outlier: a datum is an outlier if its estimated residual exceeds a pre-determined outlier threshold, meaning that it differs significantly from the historical data. The data then pass through the binomial event discriminator step, in which the system recognizes an event if the occurrence of outliers within a moving time frame (history window) exceeds a pre-defined event probability threshold. A consensus algorithm, CMAX, reports the maximum event probability across all event detection algorithms applied. An event indicates that "something happened" and therefore requires follow-up water sampling and ex-situ analysis to determine its cause, consistent with the purpose of the CWW system described above.
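The classification and event-discrimination steps can be sketched as follows. The sketch assumes that each residual is compared with a fixed outlier threshold and that, under background conditions, outliers occur independently with probability `p_background` (an assumed parameter); the event probability is then the binomial probability of seeing fewer outliers than were actually observed in the window, and CMAX takes the maximum over algorithms. CANARY's exact formulation may differ; see its user's manual.

```python
from math import comb


def flag_outliers(residuals, outlier_threshold):
    """True where a residual exceeds the outlier threshold."""
    return [r > outlier_threshold for r in residuals]


def event_probability(flags, window, p_background=0.5):
    """Binomial event discriminator (simplified sketch).

    Counts outliers among the last `window` time steps and returns
    P(X < k) for X ~ Binomial(n, p_background): the probability that
    background behaviour alone would have produced fewer outliers than
    were actually observed.
    """
    recent = flags[-window:]
    n, k = len(recent), sum(recent)
    return sum(comb(n, i) * p_background**i * (1 - p_background)**(n - i)
               for i in range(k))


def cmax(probabilities):
    """Consensus: the maximum event probability over all algorithms."""
    return max(probabilities.values())


# Event cut-off; 0.9 matches the detection probability used when counting
# events in the results, but is an assumed setting here.
EVENT_THRESHOLD = 0.9

# Hypothetical residuals from two algorithms over a 5-step window.
per_algorithm = {
    "LPCF": event_probability(flag_outliers([0.2, 3.1, 2.9, 0.4, 3.5], 2.0), window=5),
    "MVNN": event_probability(flag_outliers([0.1, 0.3, 0.2, 0.4, 0.2], 2.0), window=5),
}
is_event = cmax(per_algorithm) >= EVENT_THRESHOLD
```

Here three outliers in five steps give an LPCF event probability of 0.5, so no event is declared; a longer run of outliers pushes the probability toward 1.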
More details about the algorithms and the EDS can be found in the CANARY user's manual. To optimize EDS performance for the CWW system, the required event detection algorithm inputs were determined following the CANARY Testing and Sensitivity Analysis.
Three data sets, each consisting of the first two months of hourly data from a CWW monitoring site, were used in this study. These two-month data sets were assumed to be background data containing no oil and gas-related events, an assumption supported by laboratory ex-situ analysis of the monthly groundwater samples collected during the first stage to characterize groundwater quality at the CWW monitoring sites.
In the LPCF and MVNN algorithms, the window size is defined as the number of previous time steps used to predict the next value (LPCF) and to compare water quality values at each time step (MVNN), respectively. To determine the appropriate window size, the outlier and event probability thresholds were set to infinity so that no outliers or events would be flagged. Window sizes from 168 (one week) to 672 (four weeks) hourly steps were tested in half-week (84-step) increments to find the optimal window size, i.e., the one with the lowest mean and standard deviation of residuals across all measured parameters for each algorithm (SI, Figure S5). A window size of 588 (3.5 weeks) was selected because its residual standard deviation was close to the minimum for both the LPCF and MVNN algorithms.
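The window-size sweep can be sketched as follows, assuming hourly data and using a hypothetical residual (absolute deviation from the mean of the preceding `window` values) in place of the actual LPCF and MVNN residuals. The function returns the window with the lowest residual standard deviation together with the per-window statistics; the study chose 588 because its standard deviation was close to the minimum, a judgment the returned statistics allow an analyst to make. The series must be longer than 672 steps (two months of hourly data is roughly 1440).

```python
from statistics import mean, pstdev

# Candidate history windows in hourly steps: 1 week (168) to 4 weeks (672)
# in half-week (84-step) increments.
CANDIDATE_WINDOWS = list(range(168, 672 + 1, 84))


def moving_mean_residuals(series, window):
    """|x[t] - mean of the previous `window` values| for every t that has a
    full window. A hypothetical stand-in for LPCF/MVNN residuals."""
    prefix = [0.0]
    for x in series:
        prefix.append(prefix[-1] + x)   # prefix[t] = sum of series[:t]
    return [abs(series[t] - (prefix[t] - prefix[t - window]) / window)
            for t in range(window, len(series))]


def sweep_windows(series):
    """Return (window with the lowest residual SD, {window: (mean, sd)})."""
    stats = {}
    for w in CANDIDATE_WINDOWS:
        r = moving_mean_residuals(series, w)
        stats[w] = (mean(r), pstdev(r))
    best = min(CANDIDATE_WINDOWS, key=lambda w: stats[w][1])
    return best, stats
```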
1.6. Event Responses
When the EDS detects an event, the system alerts the registered CWW group to initiate a detailed inspection of the site. The event response workflow is shown in SI, Figure S6. An event can be mechanical (e.g., sensor failure) or seasonal (e.g., water table fluctuations due to irrigation); thus, a primary event analysis is required to classify the event type based on the available data and previous experience. Mechanical or seasonal events can be stored in the historical pattern library to prevent future false positives.
For events that are neither mechanical nor seasonal, the CWW team is deployed to the site within 24 hours of the event alarm to sample groundwater according to COGCC Rule 609 and to conduct a brief site inspection. The samples are then transported to an EPA-certified laboratory for comprehensive analysis, which takes 7–10 days.
The sensor data can trigger the first alert of an "event" through the backend algorithms, but the source of the event cannot be determined until CWW researchers have collected a field grab sample and in-depth lab results are available.
If the event is related to oil and gas operations, the COGCC is notified and begins an extensive investigation of the type, source, and extent of the contamination, among other characteristics; the results are subsequently posted on the CWW website and on the COGCC website (http://cogcc.state.co.us). The overall monitoring scheme is described in SI, Figure S7.
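The response logic of this section can be summarized as a small decision function. The cause labels and action strings are hypothetical; they paraphrase the steps described above rather than reproduce any CWW software.

```python
from enum import Enum


class Cause(Enum):
    """Outcome of the primary event analysis (labels are hypothetical)."""
    MECHANICAL = "mechanical"    # e.g., sensor failure
    SEASONAL = "seasonal"        # e.g., irrigation-driven water-table change
    UNEXPLAINED = "unexplained"  # neither of the above


def next_actions(cause, lab_links_to_oil_gas=None):
    """Follow-up steps for a detected event.

    `lab_links_to_oil_gas` stays None until the lab results are back.
    """
    if cause in (Cause.MECHANICAL, Cause.SEASONAL):
        # Store the pattern so similar signals are not flagged again.
        return ["add to historical pattern library"]
    actions = [
        "deploy team and sample within 24 h per COGCC Rule 609",
        "brief site inspection",
        "EPA-certified lab analysis (7-10 days)",
    ]
    if lab_links_to_oil_gas:
        actions += ["notify COGCC", "post results on CWW and COGCC websites"]
    return actions
```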
2. Real-Time Data and Monitoring Network
The wireless, real-time groundwater monitoring network was designed around four steps: 1) wireless data acquisition and transfer; 2) data management and storage; 3) data processing and analysis; and 4) data interpretation and display of results.
Wireless data acquisition and transfer are achieved with a cellular data logger (Hach Hydromet, Loveland, CO) connected to the multi-parameter probe through an SDI-12 interface. The cellular data logger supports general packet radio service (GPRS), which enables hourly remote transfer of data from the probe in the field to the CWW data server. Power is supplied by a self-charging solar battery. The data are transmitted to a cloud server and then stored automatically in the database, where they pass through the real-time event detection system. The results of the data analysis are then posted in the database and displayed on the CWW website in simplified form, showing the public whether the recent groundwater quality data for each site are normal. When an event is detected in the analysis step, the system alerts the CWW team to inspect groundwater quality further, including the COGCC baseline testing described above.
The end-user interface was built with ASP.NET, a comprehensive framework for building websites and web interfaces using component-based development. Data from the data logger are transmitted to the server as XML files, which are decoded to populate the database hosted securely on a local SQL Server.
The data acquisition infrastructure is primarily an automated telemetry process that sends data to a host address in real time. The host address is set up within the data-processing infrastructure on the server, which maps the raw data to the corresponding database entries. The structure of the current system is shown in SI, Figure S8.
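A minimal sketch of the decode-and-store step follows. It assumes a hypothetical XML layout (one `reading` element per parameter with `time`, `param`, and `value` attributes; the values shown are invented) and uses Python's built-in `sqlite3` in-memory database as a stand-in for SQL Server. The actual logger schema and CWW table design are not described in the text.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical upload format; the real data-logger XML schema is not given.
SAMPLE_XML = """<upload site="ARDEC">
  <reading time="2016-07-01T12:00:00" param="EC" value="1523.0"/>
  <reading time="2016-07-01T12:00:00" param="ORP" value="215.4"/>
</upload>"""


def make_db():
    """In-memory SQLite stand-in for the SQL Server database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE readings (site TEXT, time TEXT, param TEXT, value REAL, "
        "PRIMARY KEY (site, time, param))"
    )
    return conn


def ingest(xml_text, conn):
    """Decode one XML upload and store its readings; returns rows parsed."""
    root = ET.fromstring(xml_text)
    site = root.get("site")
    rows = [(site, r.get("time"), r.get("param"), float(r.get("value")))
            for r in root.iter("reading")]
    # INSERT OR IGNORE skips readings already stored (e.g., a re-sent upload).
    conn.executemany("INSERT OR IGNORE INTO readings VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

The composite primary key makes ingestion idempotent, which matters for cellular links that may re-send an upload after a dropped connection.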
The CWW described in this paper was built at proof-of-concept scale, with only five monitoring sites. It is one of the first real-time groundwater monitoring systems in an active oil and gas production field. The CWW is intended as an early warning system that provides risk management and decision-making tools based on advanced monitoring and information technologies, and it could enhance other groundwater monitoring networks and approaches used by oil and gas regulatory agencies, industry, and communities.
Over a two-year monitoring period, real-time groundwater quality data have been collected hourly, transmitted into the CWW database, and analyzed through the event detection software.
3.1. Statistical Summary of the Real-Time Monitoring Data
The real-time monitoring data analyzed here run from each station's installation date to January 2017, i.e., about two years of hourly groundwater quality observations.
Normality tests showed that none of the water quality variables is normally distributed. As shown in Figure 3, groundwater quality differs significantly among all five monitoring wells (p < 0.01, t-test), except for dissolved oxygen between Galeton and LaSalle (p = 0.1618).
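For reference, a two-sample t-test can be sketched with the standard library alone. The text does not specify the variant, so Welch's unequal-variance form is assumed here. The sketch uses the normal approximation for the two-sided p-value, which is reasonable for large samples such as two years of hourly readings; it also treats readings as independent, which autocorrelated sensor series generally are not, so such p-values are best read as indicative.

```python
import math
from statistics import mean, variance


def welch_t_test(a, b):
    """Welch's two-sample t statistic with a two-sided p-value from the
    normal approximation (adequate only for large samples)."""
    se2 = variance(a) / len(a) + variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2)
    p = math.erfc(abs(t) / math.sqrt(2))   # equals 2 * (1 - Phi(|t|))
    return t, p
```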
All monitoring wells had a relatively stable background signal. Ninety percent of the temperature readings were between 10.39 °C and 16.16 °C; Galeton had the highest mean water temperature (16.05 °C) and Gilcrest the lowest (10.77 °C). pH ranged from 5.47 to 8.88 for 90% of the readings and varied considerably between wells. The shallow wells had lower pH values than the deep well, likely because water reacts with carbonate minerals as it migrates downward and because sodium in clays exchanges with calcium in the water.
A high ORP indicates oxygen-rich water that is capable of oxidizing metals such as manganese and iron; the more electron-donating organic compounds the water contains, the lower its ORP. In addition, according to our surrogate study, low ORP values could indicate the presence of dissolved methane. Over 94% of the ORP measurements at Galeton were negative, whereas fewer than 0.1% of the ORP readings at the other four wells were negative. Similarly, most DO readings at the four shallow wells were greater than zero, while 99% of the DO readings at Galeton were less than zero, indicating a more reducing environment in this well. Galeton shows higher water temperature and pH and lower ORP and DO because it is the only deep monitoring well, with a depth of 114 m, whereas the deepest shallow well is 20 m deep.
Figure 3. Comparison of six water quality variables among the five monitoring wells (from the installation date to January 2017): (a) temperature; (b) pH; (c) conductivity; (d) ORP (oxidation-reduction potential); (e) DO; (f) depth.
The increasingly reducing conditions with depth are likely caused by the absence of dissolved oxygen. The conductivity of groundwater at Galeton is lower than that at the shallow wells. Dissolved solids, which correlate positively and linearly with conductivity, come from several sources: 1) mineral dissolution in the subsurface, 2) infiltration of surface runoff, and 3) saline leakage from other formations. Because rainfall is low in TDS and cross-formation transport of salts is limited by the underlying confined aquifer, the main factors affecting TDS are surface runoff infiltration and mineral dissolution in the subsurface.
The real-time monitoring data show that ORP was not linearly correlated with dissolved oxygen, possibly because of the combined effects of water temperature, pH, dissolved oxygen, and conductivity.
Because the first monitoring well in this study was installed in February 2014 and the last in November 2014, the trends of all water quality parameters can be compared across 2014, 2015, and 2016 (Figure 4). In Figure 4, temperature is in °C, conductivity in units of 100 µS/cm, and ORP in mV.
Figure 4. Comparison of real-time monitoring data, 2014–2016 (the x-axis of each plot is the observation date, from February 2014 to December 2016).
At CHILL and ARDEC, water temperature follows the same pattern, dropping to its lowest level in summer and rising to its highest in winter, whereas Gilcrest shows the opposite trend. Water temperature at LaSalle and Galeton was stable. Galeton, the deepest well, has the highest overall water temperature (over 20 °C) of the five monitoring wells. Most of the events detected by CANARY (detection probability ≥ 0.9) occurred in June–September (summer), with a few detected in February and March. Thus, most events occurred during the irrigation season, which lasts from late April to early October, when the depth to water is at its lowest of the year.
Galeton has the most stable water quality because it is in a confined aquifer and is likely not affected by surface activities. Dissolved oxygen either rises in summer (ARDEC and Gilcrest) or remains stable throughout the year. At each monitoring well, most data for the same month differ between 2014 and 2015, except the water depth at Galeton, which remains the same (p > 0.05, Mann-Whitney-Wilcoxon test), indicating that the water table in this deep monitoring well has not changed and is not affected by surface activities. ORP ranges from 0 to 600 mV at the four shallow wells and from 0 to 150 mV at Galeton.
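A rank-based comparison such as the Mann-Whitney-Wilcoxon test used above can be sketched as follows. The sketch computes the U statistic with average ranks for ties and a two-sided p-value from the normal approximation with tie correction, which is reasonable for large samples such as monthly sets of hourly readings; it is not necessarily the exact procedure or software used in the study.

```python
import math


def mann_whitney_u(a, b):
    """U statistic for sample `a` and a two-sided normal-approximation
    p-value with tie correction."""
    n1, n2 = len(a), len(b)
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1           # 1-based average rank of the tie block
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t                 # accumulates the tie correction
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    n = n1 + n2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    z = (u1 - mu) / sigma if sigma else 0.0
    return u1, math.erfc(abs(z) / math.sqrt(2))
```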
3.2. Trend Analysis and Outlier Detection
Because background groundwater quality data for the Wattenberg field are not available, the trends in water quality must first be learned from the monitoring data. For each monitoring well, the time series of each water quality variable was fitted with a moving median, defined as follows:
y = movmedian(x, k) returns an array of local k-point median values, where each median is calculated over a sliding window of length k across neighboring elements of x. When k is odd, the window is centered on the current element; when k is even, it is centered on the current and previous elements. At the endpoints, where there are not enough elements to fill the window, the window is truncated and the median is taken over the available elements. Equivalently, for x of length n,
$$y_i = \operatorname{median}\{x_j : \max(1,\ i - \lfloor k/2 \rfloor) \le j \le \min(n,\ i + \lceil k/2 \rceil - 1)\}.$$
A window size of k = 5 was selected as optimal; the goodness of fit of the moving median is shown in Table 2.
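The windowing rule above can be implemented directly. The sketch mirrors the described behavior (centered window for odd k, window centered on the current and previous elements for even k, truncation at the endpoints) and adds an R² function of the kind reported in Table 2; the function names are illustrative.

```python
from statistics import median


def movmedian(x, k):
    """Local k-point medians; for odd k the window is centred on x[i], for
    even k on x[i-1] and x[i]; windows are truncated at the endpoints."""
    n = len(x)
    before = k // 2
    after = k // 2 if k % 2 else k // 2 - 1
    return [median(x[max(0, i - before): min(n, i + after + 1)])
            for i in range(n)]


def r_squared(observed, fitted):
    """Coefficient of determination of `fitted` against `observed`."""
    mu = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mu) ** 2 for o in observed)
    return 1 - ss_res / ss_tot
```

For example, `movmedian([4, 8, 6, -1, -2, -3, -1, 3, 4, 5], 3)` gives `[6, 6, 6, -1, -2, -2, -1, 3, 4, 4.5]`; the first and last windows hold only two elements.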
The moving median gives a good fit to temperature, pH, conductivity, and ORP. Using the moving median, outliers that fall outside the range [mean − 4 × standard deviation, mean + 4 × standard deviation] can be filtered out for each water quality variable; for normally distributed data this band would contain about 99.99% of values, so it is considerably more conservative than a 99% interval.
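A sketch of the ±4σ screen follows, assuming it is applied to one variable's series at a time. The text does not say whether the bounds are computed on raw values or on residuals from the moving median, so the function simply takes whichever series is passed in and returns the indices of flagged values.

```python
from statistics import mean, pstdev


def flag_beyond_k_sigma(values, k=4.0):
    """Indices of values outside [mean - k*sd, mean + k*sd]."""
    mu, sd = mean(values), pstdev(values)
    lower, upper = mu - k * sd, mu + k * sd
    return [i for i, v in enumerate(values) if v < lower or v > upper]
```

To screen moving-median residuals instead of raw values, pass `[x - y for x, y in zip(series, smoothed)]`.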
Table 2. Goodness of fit (R²) of the moving median model (k = 5) for each monitored variable at each site.
3.3. Event Detection
"Events" may result from natural processes (such as surface runoff that infiltrates the subsurface and flows toward a point of discharge, e.g., a monitoring site) or from anthropogenic activities. The human activities most relevant to groundwater "anomalies" include:
Non-oil/gas related activities
1) Irrigation: large quantities of irrigation water recharging the subsurface can raise the groundwater table and cause significant changes in groundwater quality, especially in nutrient levels. Irrigation is an important non-point source of surface water and groundwater contamination because of its high concentrations of nutrients, pesticides, salts, and trace elements. Nitrate is a significant soluble nutrient that can change the redox potential and thus affect ORP. The Wattenberg field has over 1100 irrigation wells, an indicator of the extent of agricultural activity in the area.
2) Fertilizer pollution: fertilizer use increases nutrient loading (e.g., nitrate), which could increase EC or TDS.
3) Pumping: extracting groundwater through a pumping well near a monitoring site could lower the regional water table and draw in water from other sources.
4) Instrument malfunctions and sampling disturbance: pH and water temperature signals respond almost without lag to instrument disturbances, including instrument installation, sensor calibration and recalibration, and groundwater sampling (e.g., after an event). An abrupt increase or decrease in water temperature usually indicates an instrument malfunction.
Oil/gas-related activities, such as surface spills, well leakage, or well construction failure, can cause contaminants such as methane and deep formation water to dissolve in and mix with groundwater, changing ORP and/or conductivity. Deep formation water usually contains high concentrations of dissolved solids, with a high proportion of chloride relative to TDS.
At Galeton, the only deep monitoring well, 85% of the “events” detected by both MOVING MEDIAN and CANARY were due to changes in ORP, indicating that groundwater quality in the deep well is influenced more by redox reactions than by physical changes such as water temperature and water depth. In addition, the number of outliers at Galeton is not the highest among the five monitoring wells even though Galeton is surrounded by the highest density of OG wells, suggesting that oil/gas activity might not be the major factor driving short-term changes in groundwater quality. Increases in water temperature coincided with instrument malfunctions and sampling.
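The text does not state how each event was attributed to a single parameter such as ORP. One plausible approach, sketched here purely as an assumption, is to score every parameter's reading against its own baseline with a robust z-score (median and MAD) and credit the event to the parameter with the largest deviation. The functions `robust_z`, `dominant_parameter` and `share_by_parameter` are hypothetical names for this illustration.

```python
from collections import Counter
from statistics import median
from typing import Dict, List, Sequence


def robust_z(value: float, history: Sequence[float]) -> float:
    """Deviation of value from the baseline median, in units of scaled MAD."""
    med = median(history)
    mad = median(abs(x - med) for x in history)
    if mad == 0:
        # Flat baseline: any change at all counts as an infinite deviation.
        return 0.0 if value == med else float("inf")
    # 1.4826 * MAD approximates the standard deviation for normal data.
    return (value - med) / (1.4826 * mad)


def dominant_parameter(event: Dict[str, float], baseline: Dict[str, Sequence[float]]) -> str:
    """Parameter whose reading deviates most (in robust z units) from its baseline."""
    return max(event, key=lambda p: abs(robust_z(event[p], baseline[p])))


def share_by_parameter(events: List[Dict[str, float]],
                       baseline: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Fraction of events attributed to each parameter (empty dict if no events)."""
    counts = Counter(dominant_parameter(e, baseline) for e in events)
    total = sum(counts.values())
    return {p: n / total for p, n in counts.items()} if total else {}
```

A robust scale is used so that each parameter's baseline spread is not inflated by earlier outliers; with this rule, a share such as "85% of events due to ORP" would simply be `share_by_parameter(...)["ORP"]`.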
Table 3 shows when, and how many, outliers were detected by CANARY and moving median. Some outliers cluster in particular months, such as March, July and August, although the wells behave differently. The two methods disagree both in the number of outliers and in the months with the most outliers.
Table 3. Comparison of numbers of outliers detected by CANARY and moving median.
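A month-by-month comparison of the kind summarized in Table 3 can be tabulated directly from the timestamps each detector flags. The sketch below assumes each method's output is available as a list of `datetime` objects; `outliers_per_month` and `compare_methods` are hypothetical helper names.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, Tuple


def outliers_per_month(timestamps: Iterable[datetime]) -> Dict[str, int]:
    """Count flagged outliers per calendar month, keyed as 'YYYY-MM'."""
    return dict(Counter(ts.strftime("%Y-%m") for ts in timestamps))


def compare_methods(canary: Iterable[datetime],
                    moving_median: Iterable[datetime]) -> Dict[str, Tuple[int, int]]:
    """Month -> (CANARY count, moving-median count), with 0 where a method flagged nothing."""
    a = outliers_per_month(canary)
    b = outliers_per_month(moving_median)
    return {m: (a.get(m, 0), b.get(m, 0)) for m in sorted(set(a) | set(b))}
```

Keying by year and month (rather than month alone) keeps, for example, October 2014 and October 2015 separate in multi-year records.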
The moving median is used to provide additional information about outliers and anomalies for comparison with CANARY’s results. Principal component analysis (PCA) is applied to organize the matrix of measured variables and to differentiate groups of data (outliers predicted by MOVING MEDIAN and by CANARY). Six variables were analyzed: water temperature, pH, conductivity, ORP, DO and water depth. PCA reduces the high-dimensional data (six dimensions in this study) to two principal components (PCA1 and PCA2), which are linear combinations of the original variables. The first two principal components, PCA1 (x axis) and PCA2 (y axis), span the two-dimensional plot space in Figure 4; together they explain over 75% of the variance and can therefore be used to analyze the inter-relations among the variables and their effects.
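The projection described above can be sketched with NumPy's singular value decomposition. This is an illustration, not the study's code: the paper does not say whether the six variables were standardized, so standardizing each column (reasonable here because the variables have very different units) is an assumption, as is the function name `pca_2d`.

```python
from typing import Tuple

import numpy as np


def pca_2d(X: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Project an (n_samples, n_vars) matrix onto its first two principal components.

    Columns (e.g. temperature, pH, conductivity, ORP, DO, water depth) are
    standardized first. Returns (scores, explained_variance_ratio_of_PC1_PC2).
    """
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    Z = (X - X.mean(axis=0)) / std
    # Rows of Vt are the principal axes; S**2 is proportional to their variance.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    scores = Z @ Vt[:2].T  # PC1 and PC2 coordinates of every sample
    return scores, explained[:2]
```

Plotting `scores[:, 0]` against `scores[:, 1]` gives a PCA1/PCA2 diagram, and `explained.sum()` is the share of variance the plot captures (the text reports over 75%).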
Real-time data, including the numbers of outliers detected by CANARY and MOVING MEDIAN (left), and the PCA plots (right) are shown in Figures 4(a)-(e). Red and green dots represent the outliers detected by moving median and CANARY, respectively. According to COGCC inspection records, four surface spills occurred less than 2 miles upstream of CHILL in 2014, three at tank batteries and one at a well, on Oct 10th, Oct 13th, Dec 8th and Dec 15th. Moving median detected 139 “events” at CHILL in Oct 2014. These events might reflect these surface spills, since E&P wastes usually have a high BOD concentration, and the resulting oxygen demand can lower DO and ORP in the receiving water. The event detected on Feb 15th 2015 at LaSalle might be a delayed impact of a surface spill that occurred on Dec 23rd 2014 within 2 miles of the monitoring site. The surface spill that occurred on May 28th 2014 near the Galeton site did not influence the water quality because the monitoring well is in a confined aquifer, which is usually not affected by surface activities.
Figure 5. Real-time data vs. time (left) and outliers detected by CANARY and MOVING MEDIAN (right).
As shown in Figures 5(a)-(e), moving median is more sensitive to data changes than the CANARY EDS. Different clusters in the PCA diagrams represent different data types. All data points are drawn in black in the PCA diagrams; points considered “abnormal” (outliers) by MOVING MEDIAN are crossed in red, and points considered outliers by CANARY are circled in green. MOVING MEDIAN detected more outliers than CANARY, and its outliers agree with the PCA analysis, in which outliers appear as data points separated from the group of “normal” data. One possible reason is that the real-time data were not normally distributed, whereas normality is a key default assumption of the CANARY algorithms. In our comparisons, CANARY did not detect most of the abrupt changes that MOVING MEDIAN detected, but MOVING MEDIAN still has some drawbacks. The advantages and disadvantages of CANARY and MOVING MEDIAN are listed below:
1) A window size must be selected for both MOVING MEDIAN and CANARY, but during the first window MOVING MEDIAN can still perform its calculation, whereas CANARY cannot.
2) Background information, such as normal groundwater quality data, is required for both methods. For CANARY, the algorithm can be trained on normal water quality data to optimize settings such as the window size and the expected frequency of abnormal events. MOVING MEDIAN is a good option for picking out unusual events when the data are stable most of the time.
3) CANARY offers multiple algorithms for calculating outliers and events, which can greatly reduce noise, whereas MOVING MEDIAN relies on a single calculation that works well for offline data analysis but whose real-time performance still needs to be determined.
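Items 1 and 2 can be made concrete with a small sketch. The exact MOVING MEDIAN rule used in the study is not given, so this assumes a trailing window and a threshold of k times the scaled median absolute deviation (MAD). `moving_median_flags` starts flagging once three readings exist, i.e. inside the first window, while `full_window_flags` applies the same test but returns `None` until a full window is available. The second function only mimics the warm-up limitation described in item 1; it is not an implementation of CANARY's LPCF or MVNN algorithms. All names and default parameters are hypothetical.

```python
from statistics import median
from typing import List, Optional, Sequence


def _is_outlier(v: float, hist: Sequence[float], k: float, min_scale: float) -> bool:
    """True if v lies more than k robust standard deviations from median(hist)."""
    med = median(hist)
    mad = median(abs(h - med) for h in hist)
    # 1.4826 * MAD estimates the standard deviation for normal data;
    # min_scale stops a perfectly flat history from flagging tiny changes.
    scale = max(1.4826 * mad, min_scale)
    return abs(v - med) > k * scale


def moving_median_flags(x: Sequence[float], window: int = 24, k: float = 3.0,
                        min_scale: float = 1e-6) -> List[bool]:
    """Trailing moving-median detector that starts flagging as soon as
    3 readings of history exist, i.e. also inside the first window."""
    flags = []
    for i, v in enumerate(x):
        hist = x[max(0, i - window):i]
        flags.append(len(hist) >= 3 and _is_outlier(v, hist, k, min_scale))
    return flags


def full_window_flags(x: Sequence[float], window: int = 24, k: float = 3.0,
                      min_scale: float = 1e-6) -> List[Optional[bool]]:
    """Same test, but returns None until a full window of history exists,
    mimicking detectors that cannot run during the first window."""
    flags: List[Optional[bool]] = []
    for i, v in enumerate(x):
        if i < window:
            flags.append(None)
        else:
            flags.append(_is_outlier(v, x[i - window:i], k, min_scale))
    return flags
```

Using the median and MAD instead of the mean and standard deviation means the threshold is not dragged upward by the very spikes it is meant to catch, and it makes no normality assumption, which is consistent with the explanation offered above for why the moving median caught more abrupt changes.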
Although seamless integration of the real-time data flow, the data analysis (event detection) application, event response protocols and results visualization on a user-friendly website was established, additional monitoring stations are required to provide a network with greater resolution and coverage. The CWW is currently being expanded with support from both the oil and gas industry and the Colorado Department of Natural Resources. Additional partnerships are expected as the system becomes more widely accepted by the primary stakeholders, communities and industry.
This project was funded by the Colorado Department of Natural Resources, the Colorado Oil and Gas Conservation Commission, Noble Energy and Colorado State University (CSU). In addition, support was provided by the Center for the New Energy Economy at CSU, Western Resource Advocates, the Colorado Oil and Gas Association, the Colorado Department of Agriculture, the Central Colorado Water Conservancy District and the West Greeley Conservation District.
Table S1. Comparison of the attributes between the existing groundwater monitoring regulation of Colorado Oil and Gas Conservation Commission (COGCC) and the Colorado Water Watch (CWW) system.
Table S2. Surface soil properties of the five monitoring wells.
Table S3. Correlations between the target parameters (dissolved methane and TDS) and their associated surrogates (ORP, DO and EC) (data source: Son and Carlson, 2014).
Figure S1. Study sites in the Denver-Julesburg (DJ) Basin (data source: COGCC  and USGS  GIS data).
Figure S2. A flow chart demonstrating the approach for the determination of optimal sites for real-time groundwater monitoring.
Figure S3. The CWW monitoring station of Gilcrest (left) and Galeton (right).
Figure S4. Event detection system flow chart (modified from USEPA, 2012  ).
Figure S5. Averages of the average deviation and standard deviation of the residuals (prediction errors) of all measured parameters, calculated by (a) the LPCF algorithm and (b) the MVNN algorithm with different window sizes, using data from CHILL (top), ARDEC (middle) and Galeton (bottom).
Figure S6. A schematic diagram of the CWW event response protocol.
Figure S7. The overall monitoring scheme of the CWW.
Figure S8. The CWW data and monitoring network workflow.