GEP Vol. 7 No. 6, June 2019
Application of Surface Water Quality Classification Models Using Principal Components Analysis and Cluster Analysis
Water quality monitoring has one of the highest priorities in surface water protection policy. A variety of approaches are used to interpret and analyze the latent variables that determine the variance in observed water quality at various source points. A considerable proportion of these approaches are based on statistical methods, multivariate statistical techniques in particular. In the present study, multivariate techniques were needed to reduce the large number of Nile River water quality variables upstream of the Cairo Drinking Water Plants (CDWPs) and to determine the relationships among them for easy and robust evaluation. Using principal components analysis (PCA) together with the Fuzzy C-Means (FCM) and K-means clustering algorithms, this study attempted to determine the dominant factors responsible for the variations in Nile River water quality upstream of the CDWPs. Cluster analysis classified 21 sampling stations into three clusters based on similarities in water quality features. The PCA results show that 6 principal components contain the key variables and account for 75.82% of the total variance of surface water quality in the study area. The dominant water quality parameters were Conductivity, Iron, Biological Oxygen Demand (BOD), Total Coliform (TC), Ammonia (NH3), and pH. Both FCM clustering and the K-means algorithm, applied to the concentrations of the dominant parameters, identified 3 cluster groups and produced cluster centers (prototypes). According to the clustering classification, water quality deteriorated markedly as the cluster number increased from 1 to 3. The cluster grouping can also be used to identify the physical, chemical and biological processes that create the variations in the water quality parameters.
This study showed that multivariate analysis techniques, through the extracted dominant water quality parameters and the clustering information, can reduce the number of parameters sampled on the Nile River in a cost-effective and efficient way, without losing much of the information carried by a large parameter set. These techniques can help decision makers obtain an overall view of water quality in any surface water or other water body when analyzing large data sets, especially without a priori knowledge of the relationships among the variables.
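The workflow summarized above (PCA to find the dominant sources of variance, then K-means and FCM to group stations) can be illustrated with a minimal sketch. It is not the study's actual implementation. The data are synthetic (21 hypothetical stations by 6 hypothetical parameters), and the function names, the 75% variance cutoff, the fuzzifier `m = 2`, and the random seeds are all assumptions made for illustration. The source does not state how the number of retained components was chosen.

```python
import numpy as np

def standardize(X):
    """Z-score each column so parameters with different units are comparable."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pca(Z):
    """PCA by eigen-decomposition of the correlation matrix of Z."""
    corr = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)      # ascending order for symmetric input
    order = np.argsort(eigvals)[::-1]            # re-sort to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()          # fraction of total variance per component
    return eigvals, eigvecs, explained, Z @ eigvecs

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with random initial centers (hard assignment)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def fuzzy_cmeans(X, c, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Fuzzy C-Means: each station gets a membership degree in every cluster."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)            # memberships of each row sum to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]   # membership-weighted prototypes
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        inv = np.maximum(dist, 1e-12) ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers

# Synthetic stand-in data: 21 stations x 6 parameters in three loose groups.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(mu, 0.5, size=(7, 6)) for mu in (0.0, 3.0, 6.0)])

Z = standardize(X)
eigvals, eigvecs, explained, scores = pca(Z)

# Assumed rule: keep the fewest components that reach 75% cumulative variance.
n_keep = int(np.searchsorted(np.cumsum(explained), 0.75) + 1)

labels, km_centers = kmeans(Z, k=3)
U, fcm_centers = fuzzy_cmeans(Z, c=3)

print("components kept:", n_keep)
print("K-means labels:", labels)
print("FCM hard labels:", U.argmax(axis=1))
```

K-means assigns each station to exactly one cluster, while FCM returns a membership matrix `U` whose rows sum to 1, so a station near a cluster boundary can be flagged rather than forced into one group. With random initialization either algorithm can settle in a local optimum, so in practice several restarts are usually compared.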
Cite this paper: Hamed, M. (2019) Application of Surface Water Quality Classification Models Using Principal Components Analysis and Cluster Analysis. Journal of Geoscience and Environment Protection, 7, 26-41. doi: 10.4236/gep.2019.76003.
