Particle Swarm Optimized Optimal Threshold Value Selection for Clustering based on Correlation Fractal Dimension

Affiliation(s)

Department of CSE, Jawaharlal Nehru Technological University Hyderabad (JNTUH), Hyderabad, India.

Department of CSE, University College of Engineering Kakinada (UCEK), Jawaharlal Nehru Technological University Kakinada (JNTUK), Kakinada, India.

Abstract

The work presented in this paper focuses on the use of fractal dimension in clustering evolving data streams. Recently, Anuradha *et al.* proposed a new approach based on the Relative Change in Fractal Dimension (RCFD) and the damped window model for clustering evolving data streams. Observations on that work show that the quality of the resulting clusters depends heavily on a suitable choice of the threshold value. In that paper, Anuradha *et al.* fixed the threshold value heuristically. Although the outcome of their approach is acceptable, the choice is based purely on random selection and gives no basis for claiming that it is acceptable in general. In this paper, a novel method is proposed to compute the threshold value optimally using a population-based randomized approach known as particle swarm optimization (PSO). Simulations are performed on two large data sets, the KDD Cup 1999 data set and the Forest Covertype data set, and the resulting cluster quality is compared with that of the fixed-threshold approach. The comparison reveals that the threshold value chosen by Anuradha *et al.* is robust and can be used with confidence.
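The abstract states that PSO is used to search for the threshold value but does not give the fitness function, swarm size or coefficients. As a minimal sketch, assuming a global-best PSO with an inertia weight in the style of Shi and Eberhart [24], and a hypothetical fitness function `fitness(t)` that returns a cluster-quality loss for threshold `t` (lower is better, e.g. one minus a purity score; the abstract does not name the measure), the search over a single scalar threshold could look like the following. The function name `pso_threshold`, the search bounds and all parameter values are illustrative assumptions, not the authors' settings.

```python
import random
from typing import Callable, Tuple


def pso_threshold(
    fitness: Callable[[float], float],
    lo: float,
    hi: float,
    n_particles: int = 20,
    n_iters: int = 100,
    w: float = 0.7,   # inertia weight (assumed value)
    c1: float = 1.5,  # cognitive coefficient: pull toward a particle's own best
    c2: float = 1.5,  # social coefficient: pull toward the swarm's best
    seed: int = 0,
) -> Tuple[float, float]:
    """Minimise `fitness` over a scalar threshold in [lo, hi] with global-best PSO.

    Hypothetical helper for illustration; returns (best_threshold, best_fitness).
    """
    if not lo < hi:
        raise ValueError("lo must be smaller than hi")
    rng = random.Random(seed)
    vmax = hi - lo  # velocity clamp: at most one search-range width per step

    # Random initial positions inside the bounds, small random initial velocities.
    pos = [rng.uniform(lo, hi) for _ in range(n_particles)]
    vel = [0.1 * rng.uniform(-vmax, vmax) for _ in range(n_particles)]

    # Personal bests start at the initial positions; global best is the best of them.
    pbest = pos[:]
    pbest_val = [fitness(x) for x in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            # Standard inertia-weight velocity update.
            vel[i] = (w * vel[i]
                      + c1 * r1 * (pbest[i] - pos[i])
                      + c2 * r2 * (gbest - pos[i]))
            vel[i] = max(-vmax, min(vmax, vel[i]))
            # Move the particle and keep it inside the admissible threshold range.
            pos[i] = max(lo, min(hi, pos[i] + vel[i]))
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i], val
    return gbest, gbest_val


if __name__ == "__main__":
    # Hypothetical stand-in for a cluster-quality loss whose best threshold is 0.3.
    toy_loss = lambda t: (t - 0.3) ** 2
    best_t, best_loss = pso_threshold(toy_loss, 0.0, 1.0)
    print(f"threshold={best_t:.4f} loss={best_loss:.2e}")
```

In a real setting each call to `fitness` would run the clustering once with the candidate threshold and score the result, so evaluations are expensive; the swarm size and iteration count then trade search quality against the number of clustering runs.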

Keywords

Correlation Fractal Dimension, Fractal Dimension, Clustering, Particle Swarm Optimization, Data Stream Clustering

Cite this paper

Yarlagadda, A., Murthy, J. and Prasad, M. (2014) Particle Swarm Optimized Optimal Threshold Value Selection for Clustering based on Correlation Fractal Dimension. *Applied Mathematics*, **5**, 1615-1622. doi: 10.4236/am.2014.510155.

References

[1] Aggarwal, C.C. (2006) Data Streams: Models and Algorithms (Advances in Database Systems). Springer, Secaucus.

[2] Gantz, J., Reinsel, D., Chute, C., Schlichting, W., McArthur, J., Minton, S., Xheneti, I., Toncheva, A. and Manfrediz, A. (2007) The Expanding Digital Universe: A Forecast of Worldwide Information Growth through 2010. Technical Report, 12, 634-638.

[3] Gaber, M.M., Zaslavsky, A. and Krishnaswamy, S. (2005) Mining Data Streams: A Review. SIGMOD Record, 34, 18-26. http://dx.doi.org/10.1145/1083784.1083789

[4] Aggarwal, C.C., Han, J., Wang, J. and Yu, P. (2003) A Framework for Clustering Evolving Data Streams. In: Proceedings of 29th International Conference on Very Large Data Bases (VLDB’03), Berlin, September 2003.

[5] Aggarwal, C.C., Han, J.W., Wang, J.Y. and Yu, P.S. (2006) On Clustering Massive Data Streams: A Summarization Paradigm. In: Aggarwal, C.C., Ed., Data Streams—Models and Algorithms, Springer, Boston, 11-38.

[6] Babcock, B., Datar, M., Motwani, R. and O’Callaghan, L. (2003) Maintaining Variance and k-Medians over Data Stream Windows. Proceedings of the 22nd ACM Symposium on Principles of Database Systems, San Diego, 234-243.

[7] Barbará, D. (2002) Requirements for Clustering Data Streams. SIGKDD Explorations Newsletter, 3, 23-27. http://dx.doi.org/10.1145/507515.507519

[8] Beringer, J. and Hüllermeier, E. (2006) Online Clustering of Parallel Data Streams. Data & Knowledge Engineering, 58, 180-204. http://dx.doi.org/10.1016/j.datak.2005.05.009

[9] Cao, F., Ester, M., Qian, W. and Zhou, A. (2006) Density-Based Clustering over Evolving Data Stream with Noise. Proceedings of the 6th SIAM International Conference on Data Mining (SIAM’06), Bethesda, 326-337.

[10] Charikar, M., O’Callaghan, L. and Panigrahy, R. (2003) Better Streaming Algorithms for Clustering Problems. Proceedings of the 35th Annual ACM Symposium on Theory of Computing (STOC’03), San Diego, 30-39.

[11] Chen, Y. and Li, T. (2007) Density-Based Clustering for Real-Time Stream Data. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’07), ACM, New York, 133-142.

[12] Guha, S., Meyerson, A., Mishra, N., Motwani, R. and O’Callaghan, L. (2003) Clustering Data Streams: Theory and Practice. IEEE Transactions on Knowledge and Data Engineering, 15, 515-528. http://dx.doi.org/10.1109/TKDE.2003.1198387

[13] Gama, J. (2009) An Overview on Mining Data Streams. Springer-Verlag, Berlin, Heidelberg, 29-45.

[14] Zhu, Y.Y. and Shasha, D. (2002) StatStream: Statistical Monitoring of Thousands of Data Streams in Real Time. Proceedings of the 28th International Conference on Very Large Data Bases, Hong Kong, 358-369.

[15] Ester, M., Kriegel, H.-P., Sander, J. and Xu, X. (1996) A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. Proceedings of the 2nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’96), Portland, 373-382.

[16] Tu, L. and Chen, Y.X. (2009) Stream Data Clustering Based on Grid Density and Attractions. ACM Transactions on Knowledge Discovery from Data, 3, 12:1-12:27.

[17] Guha, S., Mishra, N., Motwani, R. and O’Callaghan, L. (2000) Clustering Data Streams. In: Proceedings of the Annual IEEE Symposium on Foundations of Computer Science, Redondo Beach, 12-14 November 2000, 359-366.

[18] O’Callaghan, L., Mishra, N., Meyerson, A., Guha, S. and Motwani, R. (2002) Streaming-Data Algorithms for High-Quality Clustering. Proceedings of the 18th International Conference on Data Engineering (ICDE’02), San Jose, 685-694.

[19] Anuradha, Y., Murthy, J.V.R. and Krishnaprasad, M.H.M. (2014) Clustering Based on Correlation Fractal Dimension over an Evolving Data Stream. Communicated to IJAIT 2014, unpublished.

[20] Anuradha, Y., Murthy, J.V.R. and Krishnaprasad, M.H.M. (2013) Estimating Correlation Dimension Using Multi Layered Grid and Damped Window Model over Data Streams. Procedia Technology, 10, 797-804. http://dx.doi.org/10.1016/j.protcy.2013.12.424

[21] Belussi, A. and Faloutsos, C. (1995) Estimating the Selectivity of Spatial Queries Using the Correlation Fractal Dimension. Proceedings of 21st International Conference on Very Large Data Bases, Zurich, 11-15 September 1995.

[22] Li, G.L., et al. (2011) Fractal-Based Algorithm for Anomaly Pattern Discovery on Time Series Stream. Journal of Convergence Information Technology, 6, 181-187. http://dx.doi.org/10.4156/jcit.vol6.issue3.20

[23] Kennedy, J.F. and Eberhart, R.C. (1995) Particle Swarm Optimization. Proceedings of the IEEE International Conference on Neural Networks, 4, 1942-1948. http://dx.doi.org/10.1109/ICNN.1995.488968

[24] Shi, Y. and Eberhart, R.C. (1998) Parameter Selection in Particle Swarm Optimization. In: Evolutionary Programming VII, Lecture Notes in Computer Science, 1447, Springer, 591-600. http://dx.doi.org/10.1007/BFb0040810