A Novel Spatial Clustering Algorithm Based on Delaunay Triangulation

ABSTRACT

Exploratory data analysis is increasingly more necessary as larger spatial data is managed in electro-magnetic media. Spatial clustering is one of the very important spatial data mining techniques which is the discovery of interesting rela-tionships and characteristics that may exist implicitly in spatial databases. So far, a lot of spatial clustering algorithms have been proposed in many applications such as pattern recognition, data analysis, and image processing and so forth. However most of the well-known clustering algorithms have some drawbacks which will be presented later when ap-plied in large spatial databases. To overcome these limitations, in this paper we propose a robust spatial clustering algorithm named NSCABDT (Novel Spatial Clustering Algorithm Based on Delaunay Triangulation). Delaunay dia-gram is used for determining neighborhoods based on the neighborhood notion, spatial association rules and colloca-tions being defined. NSCABDT demonstrates several important advantages over the previous works. Firstly, it even discovers arbitrary shape of cluster distribution. Secondly, in order to execute NSCABDT, we do not need to know any priori nature of distribution. Third, like DBSCAN, Experiments show that NSCABDT does not require so much CPU processing time. Finally it handles efficiently outliers.

Exploratory data analysis is increasingly more necessary as larger spatial data is managed in electro-magnetic media. Spatial clustering is one of the very important spatial data mining techniques which is the discovery of interesting rela-tionships and characteristics that may exist implicitly in spatial databases. So far, a lot of spatial clustering algorithms have been proposed in many applications such as pattern recognition, data analysis, and image processing and so forth. However most of the well-known clustering algorithms have some drawbacks which will be presented later when ap-plied in large spatial databases. To overcome these limitations, in this paper we propose a robust spatial clustering algorithm named NSCABDT (Novel Spatial Clustering Algorithm Based on Delaunay Triangulation). Delaunay dia-gram is used for determining neighborhoods based on the neighborhood notion, spatial association rules and colloca-tions being defined. NSCABDT demonstrates several important advantages over the previous works. Firstly, it even discovers arbitrary shape of cluster distribution. Secondly, in order to execute NSCABDT, we do not need to know any priori nature of distribution. Third, like DBSCAN, Experiments show that NSCABDT does not require so much CPU processing time. Finally it handles efficiently outliers.

Cite this paper

nullX. Yang and W. Cui, "A Novel Spatial Clustering Algorithm Based on Delaunay Triangulation,"*Journal of Software Engineering and Applications*, Vol. 3 No. 2, 2010, pp. 141-149. doi: 10.4236/jsea.2010.32018.

nullX. Yang and W. Cui, "A Novel Spatial Clustering Algorithm Based on Delaunay Triangulation,"

References

[1] G. Piatetsky-Shapiro and W. J. Frawley. “Knowledge discovery in databases,” AAAI/MIN Press, 1999.

[2] U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth. “The KDD process for extracting useful knowledge from vol-umes of data,” Communications of ACM, Vol. 39, 1996.

[3] S. Shekhar, C. T. Lu, P. Zhang, and R. Liu, “Data mining for selective visualization of large spatial datasets,” Proc-essing of 14th IEEE international conference on tools with artificial intelligence (ICTAI’02), 2002.

[4] J. Han and M. Kamber, “Data mining: Concepts and Techniques,” Academic Press, 2001.

[5] I. Atsushi and T. Ken, “Graph-based clustering of random point set,” Structural, Syntactic and Statistical Pattern Recognition, Springer Berlin, pp. 948–956, 2004.

[6] R. T. Ng and J. Han, “CLARANS: A method for cluster-ing objects for spatial data mining,” IEEE Transactions on Knowledge and Data Engineering, Vol. 14, No. 5, pp. 1003–1016, 2002.

[7] S. Guha, R. Rastogi, and K. Shim, “CURE: An efficient clustering algorithm for large databases,” Proceedings of the 1998 ACM SIGMOD international conference on Management of data, ACM Press, pp. 73–84, 1998.

[8] T. Zhang, R. Ramakrishnan, and M. Livny, “BIRCH: An efficient data clustering method for very large databases,” Proceedings of the 1996 ACM SIGMOD international conference on Management of data, ACM Press, pp. 103–114, 1996.

[9] L. Kaufman and P. J. Rousseeuw, “Finding Groups in Data: An introduction to cluster analysis,” John Wiley & Sons, 1990.

[10] M. Ester, H. P. Kriegel, J. Sander, and X. Xu, “Density- based algorithm for discovering clusters in large spatial databases with noise,” Proceedings of the 1996 Knowl-edge Discovery and Data Mining (KDD’96) international conference, AAAI Press, pp. 226–231, 1996.

[11] M. Ankerst, M. M. Breunig, H. P. Kriegel, et al., “OP-TICS: Ordering points to identify the clustering struc-ture,” Proceedings of the International Conference on Management of Data (SIGMOD), ACM Press, pp. 49–60, 1999.

[12] J. Sander, M. Ester, H. P. Kriegel, and X. Xu, “Den-sity-Based Clustering in Spatial Databases: The algorithm GDBSCAN and its applications,” Data Mining and Knowledge Discovery, Vol. 2, No. 2, pp.169–194, 1998.

[13] X. Wang and H. J. Hamilton, “DBRS: A density-based spatial clustering method with random sampling,” Pro-ceedings of the 7th PAKDD, Springer, pp. 563–575, 2003.

[14] R. P. Haining, “Spatial data analysis in the social and environmental sciences,” Cambridge University Press, 1990.

[15] J. Han and M. Kamber, “Data Mining: Concepts and techniques,” Second Edition, Morgan Kaufmann, 2006.

[16] V. Estivill-Castro and I. Lee, “AMOEBA: Hierarchical clustering based on spatial proximity using delaunay dia-gram,” Proceedings of the 9th international symposium on spatial data handling, pp. 7a. 26–7a. 41, 2000.

[17] E. Schikuta and M. Erhart, “The BANG-clustering sys-tem: Grid-based data analysis,” Proceedings of the 2nd international symposium IDA-97, Advances in intelligent data analysis, Springer-Verlag, pp. 513–524, 1997.

[18] S. Openshaw, “A mark 1 geographical analysis machine for the automated analysis of point data sets,” Interna-tional Journal of GIS, Vol. 1, No. 4, pp. 335–358, 1987.

[19] In-Soo Kang, Tae-wan Kim, and Ki-Joune Li, “A spatial data mining method by delaunay triangulation,” Proceed-ing of 5th ACM Workshop on Geographic Information Systems, Las Vegas, Nevada, pp. 35–39, 1997.

[20] C. Eldershaw and M. Hegland, “Cluster analysis using triangulation,” Computational Techniques and Applica-tions (CTAC97), World Scientific, Singapore, pp. 201– 208, 1997.

[21] V. Estivill-Castro and I. Lee, “AUTOCLUST: Automatic clustering via boundary extraction for mining massive point-data sets,” Proceedings of the 5th international con-ference on geocomputation, 2000.

[22] H. J. Miller, “Geographic data mining and knowledge discovery,” Handbook of geographic information science. Malden, MA: Blackwell, pp. 149–159, 2009.

[23] V. Estivill-Castro and M. E. Houle., “Robust Distance- based clustering with applications to spatial data mining,” Algorithmica, Vol. 30, No. 2, pp. 216–242, 2001.

[24] W. R. Tobler, “A computer movie simulating urban growth in the Detroit region,” Economic Geography, Vol. 46, No. 2, pp. 234–240, 1970.

[25] B. Delaunay, “Sur la sphère vide, Izvestia Akademii Nauk SSSR, Otdelenie Matematicheskikh i Estestven- nykh Nauk,” pp. 793–800, 1934.

[26] G. Liotta, “Low degree algorithm for computing and checking gabriel graphs”, Report No. CS–96–28, De-partment of Computer Science in Brown University, Providence, 1996.

[27] X. Xu, M. Ester, H. Kriegel, and J. Sander, “A distribu-tion-based clustering algorithm for mining in large spatial databases,” Proceedings of the 14th International Con-ference on Data Engineering (ICDE’98), pp. 324–331, 1998.

[28] D. Y. Ma and A. D. Zhang, “An adaptive density-based clustering algorithm for spatial database with noise,” ICDM, Proceedings Fourth IEEE International Confer-ence, pp. 467–470, 2004.

[1] G. Piatetsky-Shapiro and W. J. Frawley. “Knowledge discovery in databases,” AAAI/MIN Press, 1999.

[2] U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth. “The KDD process for extracting useful knowledge from vol-umes of data,” Communications of ACM, Vol. 39, 1996.

[3] S. Shekhar, C. T. Lu, P. Zhang, and R. Liu, “Data mining for selective visualization of large spatial datasets,” Proc-essing of 14th IEEE international conference on tools with artificial intelligence (ICTAI’02), 2002.

[4] J. Han and M. Kamber, “Data mining: Concepts and Techniques,” Academic Press, 2001.

[5] I. Atsushi and T. Ken, “Graph-based clustering of random point set,” Structural, Syntactic and Statistical Pattern Recognition, Springer Berlin, pp. 948–956, 2004.

[6] R. T. Ng and J. Han, “CLARANS: A method for cluster-ing objects for spatial data mining,” IEEE Transactions on Knowledge and Data Engineering, Vol. 14, No. 5, pp. 1003–1016, 2002.

[7] S. Guha, R. Rastogi, and K. Shim, “CURE: An efficient clustering algorithm for large databases,” Proceedings of the 1998 ACM SIGMOD international conference on Management of data, ACM Press, pp. 73–84, 1998.

[8] T. Zhang, R. Ramakrishnan, and M. Livny, “BIRCH: An efficient data clustering method for very large databases,” Proceedings of the 1996 ACM SIGMOD international conference on Management of data, ACM Press, pp. 103–114, 1996.

[9] L. Kaufman and P. J. Rousseeuw, “Finding Groups in Data: An introduction to cluster analysis,” John Wiley & Sons, 1990.

[10] M. Ester, H. P. Kriegel, J. Sander, and X. Xu, “Density- based algorithm for discovering clusters in large spatial databases with noise,” Proceedings of the 1996 Knowl-edge Discovery and Data Mining (KDD’96) international conference, AAAI Press, pp. 226–231, 1996.

[11] M. Ankerst, M. M. Breunig, H. P. Kriegel, et al., “OP-TICS: Ordering points to identify the clustering struc-ture,” Proceedings of the International Conference on Management of Data (SIGMOD), ACM Press, pp. 49–60, 1999.

[12] J. Sander, M. Ester, H. P. Kriegel, and X. Xu, “Den-sity-Based Clustering in Spatial Databases: The algorithm GDBSCAN and its applications,” Data Mining and Knowledge Discovery, Vol. 2, No. 2, pp.169–194, 1998.

[13] X. Wang and H. J. Hamilton, “DBRS: A density-based spatial clustering method with random sampling,” Pro-ceedings of the 7th PAKDD, Springer, pp. 563–575, 2003.

[14] R. P. Haining, “Spatial data analysis in the social and environmental sciences,” Cambridge University Press, 1990.

[15] J. Han and M. Kamber, “Data Mining: Concepts and techniques,” Second Edition, Morgan Kaufmann, 2006.

[16] V. Estivill-Castro and I. Lee, “AMOEBA: Hierarchical clustering based on spatial proximity using delaunay dia-gram,” Proceedings of the 9th international symposium on spatial data handling, pp. 7a. 26–7a. 41, 2000.

[17] E. Schikuta and M. Erhart, “The BANG-clustering sys-tem: Grid-based data analysis,” Proceedings of the 2nd international symposium IDA-97, Advances in intelligent data analysis, Springer-Verlag, pp. 513–524, 1997.

[18] S. Openshaw, “A mark 1 geographical analysis machine for the automated analysis of point data sets,” Interna-tional Journal of GIS, Vol. 1, No. 4, pp. 335–358, 1987.

[19] In-Soo Kang, Tae-wan Kim, and Ki-Joune Li, “A spatial data mining method by delaunay triangulation,” Proceed-ing of 5th ACM Workshop on Geographic Information Systems, Las Vegas, Nevada, pp. 35–39, 1997.

[20] C. Eldershaw and M. Hegland, “Cluster analysis using triangulation,” Computational Techniques and Applica-tions (CTAC97), World Scientific, Singapore, pp. 201– 208, 1997.

[21] V. Estivill-Castro and I. Lee, “AUTOCLUST: Automatic clustering via boundary extraction for mining massive point-data sets,” Proceedings of the 5th international con-ference on geocomputation, 2000.

[22] H. J. Miller, “Geographic data mining and knowledge discovery,” Handbook of geographic information science. Malden, MA: Blackwell, pp. 149–159, 2009.

[23] V. Estivill-Castro and M. E. Houle., “Robust Distance- based clustering with applications to spatial data mining,” Algorithmica, Vol. 30, No. 2, pp. 216–242, 2001.

[24] W. R. Tobler, “A computer movie simulating urban growth in the Detroit region,” Economic Geography, Vol. 46, No. 2, pp. 234–240, 1970.

[25] B. Delaunay, “Sur la sphère vide, Izvestia Akademii Nauk SSSR, Otdelenie Matematicheskikh i Estestven- nykh Nauk,” pp. 793–800, 1934.

[26] G. Liotta, “Low degree algorithm for computing and checking gabriel graphs”, Report No. CS–96–28, De-partment of Computer Science in Brown University, Providence, 1996.

[27] X. Xu, M. Ester, H. Kriegel, and J. Sander, “A distribu-tion-based clustering algorithm for mining in large spatial databases,” Proceedings of the 14th International Con-ference on Data Engineering (ICDE’98), pp. 324–331, 1998.

[28] D. Y. Ma and A. D. Zhang, “An adaptive density-based clustering algorithm for spatial database with noise,” ICDM, Proceedings Fourth IEEE International Confer-ence, pp. 467–470, 2004.