Addendum to: An Approach to Hierarchical Clustering via Level Surfaces and Convexity

ABSTRACT

This article is an addendum to the 2001 paper [1] which investigated an approach to hierarchical clustering based on the level sets of a density function induced on data points in a d-dimensional feature space. We refer to this as the “level-sets approach” to hierarchical clustering. The density functions considered in [1] were those formed as the sum of identical radial basis functions centered at the data points, each radial basis function assumed to be continuous, monotone decreasing, convex on every ray, and rising to positive infinity at its center point. Such a framework can be investigated with respect to both the Euclidean (L2) and Manhattan (L1) metrics. The addendum here puts forth some observations and questions about the level-sets approach that go beyond those in [1]. In particular, we detail and ask the following questions. How does the level-sets approach compare with other related approaches? How is the resulting hierarchical clustering affected by the choice of radial basis function? What are the structural properties of a function formed as the sum of radial basis functions? Can the levels-sets approach be theoretically validated? Is there an efficient algorithm to implement the level-sets approach?

This article is an addendum to the 2001 paper [1] which investigated an approach to hierarchical clustering based on the level sets of a density function induced on data points in a d-dimensional feature space. We refer to this as the “level-sets approach” to hierarchical clustering. The density functions considered in [1] were those formed as the sum of identical radial basis functions centered at the data points, each radial basis function assumed to be continuous, monotone decreasing, convex on every ray, and rising to positive infinity at its center point. Such a framework can be investigated with respect to both the Euclidean (L2) and Manhattan (L1) metrics. The addendum here puts forth some observations and questions about the level-sets approach that go beyond those in [1]. In particular, we detail and ask the following questions. How does the level-sets approach compare with other related approaches? How is the resulting hierarchical clustering affected by the choice of radial basis function? What are the structural properties of a function formed as the sum of radial basis functions? Can the levels-sets approach be theoretically validated? Is there an efficient algorithm to implement the level-sets approach?

KEYWORDS

Hierarchical Clustering, Level Sets, Level Surfaces, Radial Basis Function, Convex, Heat, Gravity, Light, Cluster Validation, Ridge Path, Euclidean Distance, Manhattan Distance, Metric

Hierarchical Clustering, Level Sets, Level Surfaces, Radial Basis Function, Convex, Heat, Gravity, Light, Cluster Validation, Ridge Path, Euclidean Distance, Manhattan Distance, Metric

Cite this paper

nullJ. Malitz and S. Malitz, "Addendum to: An Approach to Hierarchical Clustering via Level Surfaces and Convexity,"*Intelligent Information Management*, Vol. 2 No. 5, 2010, pp. 299-305. doi: 10.4236/iim.2010.25035.

nullJ. Malitz and S. Malitz, "Addendum to: An Approach to Hierarchical Clustering via Level Surfaces and Convexity,"

References

[1] R. Holley, J. Malitz and S. Malitz, “An Approach to Hierarchical Clustering via Level Surfaces and Convexity,” Discrete and Computational Geometry, Vol. 25, No. 2, 2001, pp. 221-233.

[2] J. Hartigan, “Clustering Algorithms,” Wiley, 1975.

[3] K. Fukunaga and L. Hostler, “The Estimation of the Gradient of a Density Function with Application in Pattern Recognition,” IEEE Transactions on Information Theory, Vol. 21, No. 1, 1975, pp. 32-40.

[4] P. Schnell, “A Method to Find Point-Groups,” Biometrika, Vol. 6, 1964, pp. 47-48.

[5] M. Halkidi, Y. Batistakis and M. Vazirgiannis, “On Clustering Validation Techniques,” Journal of Intelligent Information Systems, Academic Publishers, Vol. 17, No. 2-3, 2001, pp. 107-145.

[6] S. Kotsiantis and P. Pintelas, “Recent Advances in Clustering: A Brief Survey,” WSEAS Transactions on Information Science and Applications, Vol. 1, No. 1, 2004, pp. 73-81.

[7] A. Hinneburg and D. Keim, “An Efficient Approach to Clustering in Large Multimedia Databases with Noise,” Proceedings of 4th International Conference on Knowledge Discovery and Data Mining, AAAI Press, 1998, pp. 58-65.

[8] Y. J. Oyang, C. Y. Chen and T. W. Yang, “A Study on the Hierarchical Data Clustering Algorithm Based on Gravity Theory,” Principles of Data Mining and Knowledge Discovery, Lecture Notes in Computer Science, Springer, Berlin/Heidelberg, 2001, pp. 350-361.

[9] C. Y. Chen, S. C. Hwang and Y. J. Oyang, “An Incremental Hierarchical Data Clustering Algorithm Based on Gravity Theory,” Advances in Knowledge Discovery and Data Mining, Lecture Notes in Computer Science, Sprin- ger, Berlin/Heidelberg, 2002, pp. 237-250.

[10] G. Strang, “Introduction to Applied Mathematics,” Wellesley- Cambridge Press, Wellesley, 1985.

[11] J. Sethian, “Level Set Methods and Fast Marching Methods: Evolving Interfaces in Computational Geometry, Fluid Mechanics, Computer Vision, and Materials Scien- ce,” Cambridge University Press, Cambridge, 2002.

[12] Y. W. Teh, M. Jordan, M. Beal and D. Blei, “Hierarchical Dirichlet Processes,” Journal of the American Statistical Association, Vol. 101, No. 476, 2006, pp. 1566-1581.

[13] S. Axler, P. Bourdon and W. Ramey, “Harmonic Function Theory,” Springer-Verlag, New York, 2001.

[14] M. Kass, A. Witkin and D. Terzopolous, “Snakes: Active Contour Models,” International Journal of Computer Vision, Kluwer Academic Publishers, Norwell, 1988, pp. 321-331.

[15] L. Hyafil and R. Rivest, “Constructing Optimal Binary Decision Trees is NP-Complete,” Information Processing Letters, Vol. 5, No. 1, 1976. pp. 15-17.

[16] M. Garey and D. Johnson, “The Rectilinear Steiner Tree Problem is NP-Complete,” SIAM Journal on Applied Mathematics, Vol. 32, No. 4, 1977, pp. 826-834.

[17] L. Foulds and R. Graham, “The Steiner Problem in Phylogeny is NP-Complete,” Advances in Applied Mathematics, Vol. 3, No. 1, 1982, pp. 43-49.

[18] B. Chor and T. Tuller, “Finding a Maximum Likelihood Tree is Hard,” Journal of the ACM, Vol. 53, No. 5, September 2006, pp. 722-744.

[19] W. Day, “Computationally Difficult Parsimony Problems in Phylogenetics Systematics,” Journal of Theoretical Biology, Vol. 103, No. 3, 1983, pp. 429-438.

[20] C. Aggarwal, A. Hinneburg and D. Keim, “On the Surprising Behavior of Distance Metrics in High Dimensional Space,” Database Theory — ICDT 2001, Lecture Notes in Computer Science, Springer, Berlin/Heidelberg, Vol. 1973, 2001, pp. 420-434.

[1] R. Holley, J. Malitz and S. Malitz, “An Approach to Hierarchical Clustering via Level Surfaces and Convexity,” Discrete and Computational Geometry, Vol. 25, No. 2, 2001, pp. 221-233.

[2] J. Hartigan, “Clustering Algorithms,” Wiley, 1975.

[3] K. Fukunaga and L. Hostler, “The Estimation of the Gradient of a Density Function with Application in Pattern Recognition,” IEEE Transactions on Information Theory, Vol. 21, No. 1, 1975, pp. 32-40.

[4] P. Schnell, “A Method to Find Point-Groups,” Biometrika, Vol. 6, 1964, pp. 47-48.

[5] M. Halkidi, Y. Batistakis and M. Vazirgiannis, “On Clustering Validation Techniques,” Journal of Intelligent Information Systems, Academic Publishers, Vol. 17, No. 2-3, 2001, pp. 107-145.

[6] S. Kotsiantis and P. Pintelas, “Recent Advances in Clustering: A Brief Survey,” WSEAS Transactions on Information Science and Applications, Vol. 1, No. 1, 2004, pp. 73-81.

[7] A. Hinneburg and D. Keim, “An Efficient Approach to Clustering in Large Multimedia Databases with Noise,” Proceedings of 4th International Conference on Knowledge Discovery and Data Mining, AAAI Press, 1998, pp. 58-65.

[8] Y. J. Oyang, C. Y. Chen and T. W. Yang, “A Study on the Hierarchical Data Clustering Algorithm Based on Gravity Theory,” Principles of Data Mining and Knowledge Discovery, Lecture Notes in Computer Science, Springer, Berlin/Heidelberg, 2001, pp. 350-361.

[9] C. Y. Chen, S. C. Hwang and Y. J. Oyang, “An Incremental Hierarchical Data Clustering Algorithm Based on Gravity Theory,” Advances in Knowledge Discovery and Data Mining, Lecture Notes in Computer Science, Sprin- ger, Berlin/Heidelberg, 2002, pp. 237-250.

[10] G. Strang, “Introduction to Applied Mathematics,” Wellesley- Cambridge Press, Wellesley, 1985.

[11] J. Sethian, “Level Set Methods and Fast Marching Methods: Evolving Interfaces in Computational Geometry, Fluid Mechanics, Computer Vision, and Materials Scien- ce,” Cambridge University Press, Cambridge, 2002.

[12] Y. W. Teh, M. Jordan, M. Beal and D. Blei, “Hierarchical Dirichlet Processes,” Journal of the American Statistical Association, Vol. 101, No. 476, 2006, pp. 1566-1581.

[13] S. Axler, P. Bourdon and W. Ramey, “Harmonic Function Theory,” Springer-Verlag, New York, 2001.

[14] M. Kass, A. Witkin and D. Terzopolous, “Snakes: Active Contour Models,” International Journal of Computer Vision, Kluwer Academic Publishers, Norwell, 1988, pp. 321-331.

[15] L. Hyafil and R. Rivest, “Constructing Optimal Binary Decision Trees is NP-Complete,” Information Processing Letters, Vol. 5, No. 1, 1976. pp. 15-17.

[16] M. Garey and D. Johnson, “The Rectilinear Steiner Tree Problem is NP-Complete,” SIAM Journal on Applied Mathematics, Vol. 32, No. 4, 1977, pp. 826-834.

[17] L. Foulds and R. Graham, “The Steiner Problem in Phylogeny is NP-Complete,” Advances in Applied Mathematics, Vol. 3, No. 1, 1982, pp. 43-49.

[18] B. Chor and T. Tuller, “Finding a Maximum Likelihood Tree is Hard,” Journal of the ACM, Vol. 53, No. 5, September 2006, pp. 722-744.

[19] W. Day, “Computationally Difficult Parsimony Problems in Phylogenetics Systematics,” Journal of Theoretical Biology, Vol. 103, No. 3, 1983, pp. 429-438.

[20] C. Aggarwal, A. Hinneburg and D. Keim, “On the Surprising Behavior of Distance Metrics in High Dimensional Space,” Database Theory — ICDT 2001, Lecture Notes in Computer Science, Springer, Berlin/Heidelberg, Vol. 1973, 2001, pp. 420-434.