Privacy Preserving Two-Party Hierarchical Clustering Over Vertically Partitioned Dataset

ABSTRACT

Data mining has been a popular research area for more than a decade, and clustering is one of its most interesting problems. Clustering becomes more challenging when the dataset is distributed between different parties who do not want to share their data. In this paper we propose a privacy-preserving two-party hierarchical clustering algorithm for vertically partitioned datasets. Each site learns only the final cluster centers and nothing about any individual's data.
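To make the setting concrete, the following minimal Python sketch illustrates vertical partitioning: two parties hold different attributes of the same records, and the squared Euclidean distance between two records is the sum of the partial distances each party can compute locally. This is not the paper's protocol and offers no privacy, since the partial distances are combined in the clear. The data values, the choice of single linkage, and the function names `partial_sq_dists` and `single_linkage` are assumptions made for illustration only.

```python
# Illustrative only: NOT the paper's protocol and NOT privacy preserving.
# It shows why vertically partitioned distances can be computed per party
# and then added, which is the structure a secure protocol would protect.

def partial_sq_dists(rows):
    """Pairwise squared Euclidean distances over one party's attributes."""
    n = len(rows)
    return [[sum((a - b) ** 2 for a, b in zip(rows[i], rows[j]))
             for j in range(n)] for i in range(n)]

def single_linkage(dist, k):
    """Naive agglomerative clustering: merge the closest pair until k remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters

# Hypothetical data: the same four records, attributes split between parties.
party_a = [[1.0, 2.0], [1.5, 1.8], [8.0, 8.0], [9.0, 9.5]]
party_b = [[0.5], [0.7], [6.0], [6.5]]

d_a = partial_sq_dists(party_a)  # computed locally by party A
d_b = partial_sq_dists(party_b)  # computed locally by party B

# Combined in the clear here; a secure protocol would hide d_a and d_b.
combined = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(d_a, d_b)]
clusters = single_linkage(combined, k=2)
print(clusters)
```

In a privacy-preserving version, the addition and comparison steps would be replaced by secure two-party computations so that neither party sees the other's partial distances.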

Cite this paper

A. Tripathy and I. De, "Privacy Preserving Two-Party Hierarchical Clustering Over Vertically Partitioned Dataset," *Journal of Software Engineering and Applications*, Vol. 6, No. 5, 2013, pp. 26-31. doi: 10.4236/jsea.2013.65B006.
