Reconstructive Mapping from Sparsely-Sampled Groundwater Data Using Compressive Sensing

Show more

1. Introduction

Fresh water is critical to sustainable living, economic growth, social stability and public health [1]. For management and dispensing of the groundwater, accurate monitoring is of utmost importance. However, existing methods such as drilling are expensive and intrusive, and alternate methods such as gravimetry and seismic sounding are also costly to perform, subject to errors and limited in range. Thus, a method for mapping complete distributions of groundwater based on sparse data sets would be a tremendous asset in groundwater management. Here, we introduce the use of compressed sensing (CS) for generating accurate contour maps of groundwater levels based on sparsely-sampled or incomplete data. Compressive sensing has been developed for signal/image processing technology to enable reconstruction of the source signal based on sparsely sampled data, far below the Nyquist criterion. We have applied this method for recovering turbulent flow structures in so-called “sub-grid” scales [2]. Here, we demonstrate that the compressive sensing can also be used for generating accurate groundwater level maps from sparsely sampled data, by using direct measurement data in Arizona and in other parts of the southwestern United States (California, Nevada, Utah and Colorado). The method is compared with conventional interpolation method, such as kriging, which demonstrates that compressive sensing outperforms by a large margin.

Compressive sensing is a powerful method that allows for the actual signals to be recovered from far fewer samples than what have been possible according to the Nyquist-Shannon sampling theorem [3]. The Nyquist-Shannon sampling theorem states that if a function *f*(*t*) has no frequencies higher than *υ* Hz, then tracing its ordinates at a series of points (spaced at least by 1/2*υ* seconds apart) determines *f*(*t*) perfectly. This requirement had limited the full mapping of data, unless very tightly-spaced data were sampled, either in space or time. Compressive sensing, in this regard, is a transcending method, initially developed in image processing, where a very small amount of data can be effectively used to reconstruct the full mapping [3]. Compressive sensing works in a way similar (but opposite) to image compression methods such as the JPEG protocol. A signal *f*(*x*), in general, can be expressed in a series of orthonormal functions, *Ψ*_{i} (e.g. Fourier series of sinusoidal functions).

$f\left(x\right)={\displaystyle \underset{i=1}{\overset{n}{\sum}}{c}_{i}{\psi}_{i}\left(x\right)}$ (1)

For most signals, only a small number of the coefficients *c _{i}* are of significant magnitude, and most of the others can be discarded with negligible loss of information. Image compression works by having the full data,

Compressive sensing is of course not without limits, and does need the following criteria to work, in applications such as groundwater mapping:

1) The desired groundwater network database is “compressible”, meaning that the discrete data (at sparsely-spaced probe locations) can be approximated as a series of orthogonal functions (e.g. sinusoidal function in a Fourier series). Based on working with most groundwater network data base, we find that this condition is satisfied since most continuous signals can be expressed reasonably accurately using truncated Fourier series.

2) The second condition relates to the coefficient matrix, *c _{i}* in Equation (1) as an example. This matrix needs to have a relatively small number of dominant terms, so that the indeterminate matrix can be numerically found under the “compressibility” constraint. Again, tests with groundwater database suggest that this condition is also satisfied. Technical details can be found in the Appendix and in the references cited above.

If the above criteria are met, then numerical algorithms such as the *l*1-optimization or total variational optimization routines can be used to find highly under-determined data matrix based on sparse input data. The step-by-step computational procedures used in this work follow Boyd and Vandenberghe [13] and Candes *et al.* [3].

2. Compressive Sensing for Reconstructive Groundwater Mapping

For a demonstration of the compressive sensing technique, a sample MODFLOW image (256 by 256 pixel resolution or a total pixel count of 65,536) is used, for groundwater level in Fort Cobb Reservoir experimental watershed (780 km^{2}) [14], as shown in Figure 1(a). The location is in southwestern Oklahoma, USA, and the region shown in Figure 1(a) shows a mapping for a portion of the Rush Spring aquifer [14]. The feed streams supply the groundwater, from which diffusion patterns are observable. First, we start with sampling of only 2025 pixel values (out of 65,536 total) V-shaped (pyramidal) sampling lines (top box in Figure 1(b), which is an extremely sparse pixel sampling of 3%. Even at such low sampling density, remarkably some baseline features of the groundwater distribution in the MODFLOW data are recovered, as shown in Figure 1(c).

Figure 1. (a) Original MODFLOW contour map (256 by 256 pixels) of groundwater [14]; (b) Pyramidal sampling lines at various sampling rates (0.0308, 0.0761, 0.1492, and 0.2857, top to bottom); (c) Reconstructed images.

The sampling density is progressively increased from top to bottom, in Figure 1(b), which results in rapid increase in the fidelity of the reconstructed data (Figure 1(c)). Visually, the original MODFLOW data are nearly fully recovered at sampling density of only 14%, nearly an order of magnitude less than the full data. Thus, Figure 1 demonstrates the utility of compressive sensing where sparsely sampled data can be used to map out the two-dimensional data in spite of the complex feed and distribution patterns for groundwater. Of course, practical monitoring of groundwater does not follow any specific geometrical patterns; nonetheless the compressive sensing operations work in non-geometrical sampling as long as above-mentioned criteria are met [3].

We can track the rate at which the original data can be recovered at different sampling rates, and also test different data sampling geometries (pyramidal, radial and Cartesian), for the same MODFLOW data. The results are shown in Figure 2, where the “error norm” is plotted as a function of the sampling rate. The error norm is the normalized pixel-by-pixel comparison of the reconstructed to original image, and the sampling rate (or density) is the ratio of the read pixels to the total number of pixels.

Figure 2. The error norm (Equation (2)) as a function of the sampling rate for three sampling geometries: pyramidal, Cartesian and radial. The MODFLOW data in Figure 1 is used for this analysis.

$\text{ErrorNorm}=\sqrt{{{\displaystyle \sum \left({I}_{ij}-{I}_{ij}^{TRUE}\right)}}^{2}}$ (2)

*I _{ij}*is the pixel value, relative to the actual data,
${I}_{ij}^{TRUE}$. We can easily see that the pyramidal sampling geometry far outperforms the others, and using this sampling method error norm of 10 or less is possible at only 10% sampling rate, meaning that 1/10

3. Results and Discussion

For the groundwater level measurements, we use the USGS database, augmented by data provided by the state water resources management in the southwestern states (California, Nevada, Arizona, and Utah). We first focus on the groundwater resources in the southwestern United States, where the recent historical drought conditions make the accurate assessments and management important. Compressive sensing can be applied in a similar manner to any other regions where minimal amount of measurement data are available. The groundwater level data are typically generated using a pressure transducer or a floating device in registered wells. In each registered well, depth-to-well (the water head) is measured [15]. Data sets are obtained from USGS and respective water resource departments in Arizona, California, Nevada and Utah (ADWR, CDWR, NDWR and UDWR, respectively). The two pieces of information at each registered well, the depth-to-well and the site elevation (altitude), are converted to obtain the “groundwater level”, referenced to the sea level. For the site elevation, we use the NGVD29 (National Geodetic Vertical Datum of 1929), NAVD88 (National Geodetic Vertical Datum of 1988), or NED (National Elevation Dataset). The data are averaged over 0.25˚ latitude × 0.25˚ longitude areas, for data processing purposes (*i.e.* to arrange the data in manageable array sizes, and also to average the fluctuations), as shown in Figure 3. It can be seen in Figure 3 that even with this segmentation and averaging, the data sampling is quite sparse, and there are large regions (white) where there aren’t any data available (e.g. some swaths in northeastern and southwestern Arizona). Thus, it would of high value to be able to track the groundwater in the entire area, through a minimally installed probe measurement.

For validation of the compressive sensing algorithm, we start with the data set as shown in Figure 3(a), which shows the absolute groundwater level relative to the sea level for the year 2006, averaged to 0.25˚ latitude × 0.25˚ longitude cells. The spatial averaging is performed, to reduce the matrix size which exponentially reduces the computational time. Compressive sensing involves array operation, and the size of the array is one of the determining factors in reliable reconstruction. Again, white cells represent regions where no data are available. From this data set, we intentionally remove the data for 26 sites (cells) for a later comparison. The criterion for selecting these 26 sites is that there should be some nearby data locations, otherwise no validation is possible. Other than that, the selection was designed to be more or less evenly distributed. These sites are marked with inverted triangles in Figure 3(a), and also identified in Figure 3(b). The purpose is to test whether compressive sensing can recover the data at these sites, after “blinding” itself by removing these data from the analysis.

Figure 4 shows the result of this test, where we have not only removed the data at the 26 sites shown in Figure 3(b), but also applied different sampling rates. The latter was done in order to find critical limits in the data sampling rate. Out of several sampling rates that we have applied, we exhibit three: 32 (unacceptable), 59 (acceptable) and 100% (full) sampling rates. The “city index” (the horizontal axis) represents the 26 sites shown in Figure 3(b), while the vertical axis is the “absolute” groundwater level relative to the sea level. At or below 32% sampling rate of the available data, the validation plot shows poor outcome where the compressive sensing data mostly tend to overestimate the groundwater level. However, above 59% sampling rate, the reconstruction becomes almost exact. Note that this 59% sampling rate is relative to the available data set, as

(a)(b)

Figure 3. (a) Groundwater absolute level relative to the sea level, averaged to 0.25 × 0.25 (longitude x latitude) resolution, inverted triangles are the 26 sites removed from the data for internal validation; and (b) Location and names of these sites. CA = California, NV = Nevada, UT = Utah, and AZ = Arizona.

shown in Figure 3(a), which is already under-sampled. This means that even if we had only 59% of the data shown in Figure 3(a), minus the 26 data cells removed, we would still obtain accurate readings of the groundwater levels. Above this sampling rate, the data remain much the same, and overlap with the exact levels.

Thus, Figure 4 shows that compressive sensing can convert sparse data set into a more complete spatial map of the groundwater levels. We can test the compressive sensing to perform this reconstruction over time, by applying it on an annual basis from 1992 to 2012, as shown in Figure 5. Here, we also compare

(a)(b)(c)

Figure 4. Validation of the compressive sensing reconstruction. (a) 32% sampling rate; (b) 59%; and (c) 100%. Circles denote measured data, and asterisks reconstructed data. Due to local altitude, the groundwater level in absolute ft can be high.

the results with traditional methods of recovering missing data, e.g. Delaunay polygon, kriging, radial basis function, and inverse distance weighting methods. In hydrological monitoring, sparsity in the data is evidently a persistent problem, and several methods have been developed to “patch” this deficiency [15] [16]. After testing all of the above “interpolation” methods, kriging worked as well or slightly better (although still far from the actual data as shown in Figure 3) than the others, and therefore we only show the results of kriging method as open squares in Figure 5. Figure 5 shows that both the “under-sampled” (32%) compressive sensing and kriging deviate in a random manner from the actual (filled diamonds), and also from amply sampled (>59%) compressive sensing

(a)(b)

Figure 5. Point validation in (a) Cedar city, Utah and (b) Phoenix, Arizona from 1992 to 2012.

data. When the sampling rate is above 59%, as in Figure 4, the data overlap exactly with the actual at all 26 locations and in time (only 2 sites are shown in Figure 5 as examples). Open triangles (>59%), dark squares (100%), and actual (filled diamonds) are all merged in Figure 5.

Delaunay polygon, kriging, radial basis function, and inverse distance weighting methods have been traditionally used to compensate for lack of data [15] [16]. The main assumption is that water head can be interpolated in a linear or some other interpolation function. The Delaunay interpolation (or sometimes referred to as Thiessen polygon) is for defining regions based on a set of data points in the plane such that regions must be enclosed by a line midway between the measured point under consideration and surrounding points [17]. In geostatistics, kriging is an interpolation method for which the interplated values are modeled by prior covariance-governed Gaussian process, and is known as the best linear unbiased prediction method. Radial basis function uses a real-valued function whose value is dependent only on the distance from the origin. Using this function, sums of radial basis functions are used to approximate given functions, which give interpolated values. Inverse distance weighting is a deterministic multivariate interpolation method, using a known scattered set of points. All of these methods are deterministic methods, whereas compressive sensing is a completely different paradigm, which serves as a data recovery tool for highly indeterminate systems involving sparsely sampled groundwater data.

Now, we can check whether compressive sensing is able to construct two-dimensional mapping of the groundwater level. We start from the 0.25-degree resolution mapping in Figure 6(a), where again the white cells indicate absence of data. Compressive sensing is able to fill in these white cells with approximate groundwater level values, and generate a full map as shown in Figure 6(b). We can see that in the state of California many of the missing cells have been replaced (recovered) with blue cells, indicating low groundwater levels. This appears to be a valid reconstruction as there are generally low groundwater levels

(a)(b)

Figure 6. Construction of groundwater mapping in the southwestern US in 2002, based on localized site measurements.

as shown by the actual data in Figure 6(a). However, we can also see that in other regions the white or empty cells were replaced with data cells that are consistent with adjacent cells in Figure 6(b). A few of the cells have been converted to blue where we would not expect to low levels, due to continuity in the data and also possibly characteristic of the region. This may be the limit of compressive sensing where beyond a certain sampling rate, when the sampling rate is lower than the critical limit, full recovery of the data is not possible.

We can see this effect when we increase the pixel resolution to 0.1˚ longitude × 0.1˚ latitude mapping by averaging the available data over smaller areas, as shown in Figure 7. Thus, we are basically decreasing the sampling rate by reducing the ratio of available data (data cells) to the total number of cells. The full

(a)(b)

Figure 7. Groundwater map in the southwestern US in 2002 with increased resolution (more sparse data).

mapping obtained through compressive sensing shows many data cells that exhibit unreasonably low groundwater levels (blue cells). The method is able to only generate data near the points or cells at which there existed some amount of data.

Figure 8 shows the set of contour maps obtained from compressive sensing reconstruction of groundwater levels in the southwestern United States, from 1954 to 2012. Contour maps are shown at 12-year intervals, for demonstration of the utility of compressive sensing. Although some of the details may not be fully recovered, general trends in the distribution of groundwater can easily be observed in compressive sensing recovered mapping. We can see the expanding regions of low groundwater (deep blue in the contour map) in the southwestern United States, encroaching southern Utah area. Also, pockets of low groundwater levels are observed in Utah and Colorado areas in 2012 data map.

Figure 8. Groundwater contour maps obtained through compressive sensing in the southwestern United States, from 1954 to 2012. The numbers in the parenthesis refer to the number of measurements sites used in the averaging for the given year, for the entire Southwest/Arizona.

4. Conclusion

We have shown that compressive sensing can be an effective method in hydrological monitoring, where the measurements are often sparsely populated. Important parameters, such as the groundwater level, across large regions can be mapped into a more complete contour data, at data sampling rates far below the Nyquist criterion. Although compressive sensing has been developed in image and signal processing, it evidently has significant applications in fluidic systems as well, such as groundwater hydrology since transport processes are continuous through physical principles of convection and diffusion. Fluidic processes mostly involve continuity and smoothness in the data (which is not necessarily the case in general imaging), which tend to satisfy the “compressibility” condition. Therefore, the potential for applications of compressive sensing in hydrology is quite wide and impactful.

Acknowledgements

This work was supported in part by the Ministry of Education (Czech Republic), under the program INTER_ EXCELLENCE (project number LTAUSA19053).

Appendix: Compressive Sensing Algorithm

To illustrate the application of compressive sensing to groundwater data analysis, we can start with a simple data set composed of the size N by 1 matrix, and then *N* nodes are the elements of the groundwater level data. Each node acquires a sample *x _{i}* which is the averaged groundwater level at the specified coordinate ranges. Thus, we have a data set:
$x={\left[{x}_{1},\cdots ,{x}_{n}\right]}^{\text{T}}$. We can approximate data using a basis function (such as sinusoidal function in Fourier series),
$\Psi ={\left[{\psi}_{1},\cdots ,{\psi}_{n}\right]}^{\text{T}}$. Then, the data can be found and it meets that
$x={\displaystyle {\sum}_{i=1}^{m}{z}_{i}{\psi}_{i}}$ as well as
$m\ll n$. Compressive sensing theory [3] suggests that, under satisfaction of the requirements called null space properties (NSP) and restricted isometry properties (RIP), we can collect the sample
$y=\Phi x$, where
$\Phi $ is size

$\mathrm{min}{\Vert x\Vert}_{1}\text{\hspace{0.17em}}\text{such}\text{\hspace{0.17em}}\text{that}\text{\hspace{0.17em}}y=\Phi \Psi z$

There are online, open-source programs for *l*1-optimization, such as Yang *et al.* [18] that can accomplish this task, without having to go through all the programming.

Once one gets to this point, then extensions to two- or three-dimensional problems are straight-forward. For example, we can reconstruct a two-dimensional groundwater network data $f\left({t}_{x},{t}_{y}\right)$ from samples $F\left(x,y\right)$ of its discrete Fourier transform on a radial domain Θ. Two-dimensional Fourier transform is formulated below.

$F\left(x,y\right)={\displaystyle \underset{{t}_{x}=0}{\overset{N-1}{\sum}}{\displaystyle \underset{{t}_{y}=0}{\overset{N-1}{\sum}}f\left({t}_{x},{t}_{y}\right){\text{e}}^{-2\pi i\left({t}_{x}\frac{x}{N}+{t}_{y}\frac{y}{N}\right)}}}$

where,
$f\left({t}_{x},{t}_{y}\right)$ is the two-dimensional groundwater level and
$F\left(x,y\right)$ represents sampled data. For two-dimensional data, *l*1-optimization algorithm is slow, and other methods such as total variation and gradient pursuit algorithms are used, also available as open-source codes [19] with ample, user-friendly descriptions. A similar application in geophysics can be found in Ref. [20].

References

[1] Clarke, R. and King, J. (2004) The Atlas of Water: Mapping the World’s Most Critical Resource. Earthscan Publications, Ltd., London.

[2] Lee, T.-W. and An, K. (2016) Recovering Sub-Grid Scale Features of Turbulent Flows, through Compressive Sensing. Journal of Flow Visualization and Image Processing, 72, 121-127.

[3] Candes, E.J., Romberg, J. and Tao, T. (2006) Robust Uncertainty Principles: Exact Signal Reconstruction from Highly Incomplete Frequency Information. IEEE Transactions on Information Theory, 52, 489-509.

https://doi.org/10.1109/TIT.2005.862083

[4] Elad, M. (2010) Sparse and Redundant Representations. Springer, New York.

https://doi.org/10.1007/978-1-4419-7011-4

[5] Elad, M., Starck, J.-L., Querre, P. and Donoho, D.L. (2005) Simultaneous Cartoon and Texture Image in Painting Using Morphological Component Analysis (MCA). Applied and Computational Harmonic Analysis, 19, 340-358.

https://doi.org/10.1016/j.acha.2005.03.005

[6] Eldar, Y.C. and Kutyniok, G. (2012) Compressed Sensing: Theory and Applications. Cambridge University Press, Cambridge.

[7] Donoho, D.L. (2006) Compressed Sensing. IEEE Transactions on Information Theory, 52, 1289-1306.

https://doi.org/10.1109/TIT.2006.871582

[8] Duarte, M.F. and Eldar, Y.C. (2011) Structured Compressed Sensing: From Theory to Applications. IEEE Transactions on Signal Processing, 59, 4503-4085.

https://doi.org/10.1109/TSP.2011.2161982

[9] DeVore, R.A. (2007) Deterministic Constructions of Compressed Sensing Matrices. Journal of Complexity, 23, 918-925.

https://doi.org/10.1016/j.jco.2007.04.002

[10] Stojnic, M. (2010) l2/l1-Optimization in Block-Sparse Compressed Sensing and Its Strong Thresholds. IEEE Journal of Selected Topics in Signal Processing, 4, 350-357.

https://doi.org/10.1109/JSTSP.2009.2039172

[11] Takhar, D., Laska, J.N., Wakin, M.B., Duarte, M.F., Baron, D., Sarvotham, S., Kelly, K.F. and Baraniuk, R.G. (2006) A New Compressive Sensing Imaging Camera Architecture Using Optical-Domain Compression. Proceedings of Computational Imaging IV at Society of Photo-Optical Instrumentation Engineers Electronic Imaging, 6065, 43-52.

https://doi.org/10.1117/12.659602

[12] Yin, W., Osher, S., Goldfarb, D. and Darbon, J. (2008) Bregman Iterative Algorithms for l1-Minimization with Applications to Compressed Sensing. Journal of Imaging Science in Society for Industrial and Applied Mathematics, 1, 143-168.

https://doi.org/10.1137/070703983

[13] Boyd, S. and Vandenberghe, L. (2004) Convex Optimization. Cambridge University Press, Cambridge.

https://doi.org/10.1017/CBO9780511804441

[14] Guzman, J.A., Moriasi, D.N., Gowda, P.H., et al. (2015) A Model Integration Framework for Linking SWAT and MODFLOW. Environmental Modelling and Software, 73, 103-116.

https://doi.org/10.1016/j.envsoft.2015.08.011

[15] Li, J. and Heap, A.D. (2011) A Review of Comparative Studies of Spatial Interpolation Methods in Environmental Sciences: Performance and Impact Factors. Ecological Informatics, 6, 228-241.

https://doi.org/10.1016/j.ecoinf.2010.12.003

[16] Tibshirani, R. (1996) Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58, 267-288.

https://doi.org/10.1111/j.2517-6161.1996.tb02080.x

[17] Brassel, K.E. and Reif, D. (2010) A Procedure to Generate Thiessen Polygons. Geographical Analysis, 11, 289-303.

https://doi.org/10.1111/j.1538-4632.1979.tb00695.x

[18] Yang, A.Y., Ganesh, A., Zhou, Z., Wagner, A., Shia, V., Sastry, S. and Ma, Y. (2010) Fast l-1 Minimization Algorithms: Homotopy and Augmented Lagrangian Method—Implementation from Fixed-Point MPUs to Many-Core CPUs/GPUs.

https://people.eecs.berkeley.edu/~yang/software/l1benchmark/

[19] Zhang, Y. (2016)

http://www.caam.rice.edu/~zhang/cv/node7.html

[20] Yao, H., Gerstoft, P., Shearer, P.M. and Mecklenbrauker, C. (2011) Compressive Sensing of the Tokoku-Oki Mw 9.0 Earthquake: Frequency-Dependent Rupture Modes. Geophysical Research Letters, 38, L20310.

https://doi.org/10.1029/2011GL049223