Attacks on Anonymization-Based Privacy-Preserving: A Survey for Data Mining and Data Publishing

Data mining is the extraction of interesting patterns or
knowledge from huge amounts of data. The initial idea of privacy-preserving
data mining (PPDM) was to extend traditional data mining techniques to work with
the data modified to mask sensitive information. The key issues were how to
modify the data and how to recover the data mining result from the modified
data. Privacy-preserving data mining considers the problem of running data mining
algorithms on confidential data that is not supposed to be revealed even to the
party running the algorithm. In contrast, privacy-preserving
data publishing (PPDP) may not necessarily be tied to a specific data mining
task, and the data mining task may be unknown at the time of data publishing.
PPDP studies how to transform raw data into a version that is immunized against
privacy attacks but that still supports effective data mining tasks. Privacy preservation
for both data mining (PPDM) and data publishing (PPDP) has become increasingly
popular because it allows privacy-sensitive data to be shared for analysis
purposes. One well-studied approach is the k-anonymity model [1],
which in turn led to other models such as confidence bounding, l-diversity,
t-closeness, (α,k)-anonymity, etc. In particular, all known mechanisms try to
minimize information loss, and this very attempt provides a loophole for attacks.
The aim of this paper is to present a survey of the most common attack
techniques against anonymization-based PPDM and PPDP and to explain their effects
on data privacy.
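To make the models named above concrete, the following is a minimal Python sketch on a hypothetical six-record table. The ZIP codes, ages, diseases and the generalization rule are all invented for illustration, and every helper name (`generalize`, `equivalence_classes`, `k_anonymity`, `distinct_l_diversity`, `satisfies_alpha_k`, `homogeneity_attack`) is hypothetical. The sketch groups generalized records into quasi-identifier equivalence classes and checks k-anonymity [1], the simplest "distinct" variant of l-diversity, and (α,k)-anonymity for one sensitive value. It then shows how a 3-anonymous table can still disclose a sensitive value through the homogeneity attack described in the l-diversity paper [2].

```python
from collections import defaultdict

# Hypothetical raw records: (zip_code, age, disease).
# Quasi-identifiers (QIs): zip_code, age. Sensitive attribute: disease.
RAW = [
    ("47677", 29, "Heart Disease"),
    ("47602", 22, "Heart Disease"),
    ("47678", 27, "Heart Disease"),
    ("47905", 43, "Flu"),
    ("47909", 52, "Heart Disease"),
    ("47906", 47, "Cancer"),
]


def generalize(record):
    """Hypothetical generalization rule: mask the last two ZIP digits and
    coarsen age into two ranges. The sensitive value is left unchanged."""
    zip_code, age, disease = record
    return (zip_code[:3] + "**", "<40" if age < 40 else ">=40", disease)


def equivalence_classes(table):
    """Map each generalized QI combination to the list of its sensitive values."""
    classes = defaultdict(list)
    for zip_gen, age_gen, disease in table:
        classes[(zip_gen, age_gen)].append(disease)
    return classes


def k_anonymity(table):
    """Largest k such that every equivalence class has at least k records."""
    return min(len(vals) for vals in equivalence_classes(table).values())


def distinct_l_diversity(table):
    """Largest l such that every class holds at least l distinct sensitive values."""
    return min(len(set(vals)) for vals in equivalence_classes(table).values())


def satisfies_alpha_k(table, sensitive_value, alpha, k):
    """(alpha,k)-anonymity check: every class has >= k records and the relative
    frequency of `sensitive_value` in every class is at most alpha."""
    for vals in equivalence_classes(table).values():
        if len(vals) < k or vals.count(sensitive_value) / len(vals) > alpha:
            return False
    return True


def homogeneity_attack(table, target_qi):
    """If the attacker knows the target's generalized QIs and that class holds a
    single sensitive value, that value is disclosed; otherwise return None."""
    values = set(equivalence_classes(table).get(target_qi, []))
    return values.pop() if len(values) == 1 else None


if __name__ == "__main__":
    published = [generalize(r) for r in RAW]
    print(k_anonymity(published))           # 3
    print(distinct_l_diversity(published))  # 1
    print(satisfies_alpha_k(published, "Heart Disease", 0.5, 3))  # False
    # The attacker knows the target lives in ZIP 47678 and is 27 years old.
    target_qi = generalize(("47678", 27, None))[:2]
    print(homogeneity_attack(published, target_qi))  # Heart Disease
```

Because k-anonymity only constrains the size of each equivalence class, it cannot detect that every record in the ("476**", "<40") class shares the same disease; l-diversity and (α,k)-anonymity add constraints on the sensitive values inside each class, which is why both checks fail on this toy table.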

References

[1] P. Samarati and L. Sweeney, “Protecting Privacy When Disclosing Information: k-Anonymity and Its Enforcement through Generalization and Suppression,” Technical Report SRI-CSL-98-04, 1998.

[2] A. Machanavajjhala, J. Gehrke, et al., “l-Diversity: Privacy beyond k-Anonymity,” Proceedings of ICDE, April 2006.

[3] N. Li, T. Li and S. Venkatasubramanian, “t-Closeness: Privacy Beyond k-Anonymity and l-Diversity,” Proceedings of ICDE, 2007, pp. 106-115.

[4] R. C. Wong, J. Li, A. W. Fu, et al., “(α,k)-Anonymity: An Enhanced k-Anonymity Model for Privacy-Preserving Data Publishing,” In: Proceedings of the 12th ACM SIGKDD, ACM Press, New York, 2006, pp. 754-759.

[5] M. Terrovitis, N. Mamoulis and P. Kalnis, “Privacy Preserving Anonymization of Set-Valued Data,” VLDB, Auckland, 2008, pp. 115-125.

[6] K. LeFevre, D. J. DeWitt and R. Ramakrishnan, “Incognito: Efficient Full-Domain k-Anonymity,” In: Proceedings of the ACM SIGMOD International Conference on Management of Data, Baltimore, June 2005, pp. 49-60.

[7] X. Ye, L. Jin and B. Li, “A Multi-Dimensional K-Anonymity Model for Hierarchical Data,” 2008 International Symposium on Electronic Commerce and Security, Beijing, August 2008, pp. 327-332.

[8] K. LeFevre, D. J. DeWitt and R. Ramakrishnan, “Workload-Aware Anonymization,” Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, August 2006, pp. 277-286. doi:10.1145/1150402.1150435

[9] X. Xiao and Y. Tao, “M-Invariance: Towards Privacy-Preserving Re-Publication of Dynamic Datasets,” In: Proceedings of SIGMOD, ACM Press, New York, 2007, pp. 689-700.

[10] Y. Bu, A. Wai-Chee Fu, et al., “Privacy-Preserving Serial Data Publishing By Role Composition,” VLDB, Auckland, 2008, pp. 845-856.

[11] X. Xiao and Y. Tao, “Personalized Privacy Preservation,” Proceedings of ACM Conference on Management of Data (SIGMOD), ACM Press, New York, 2006, pp. 785-790.

[12] “Business for Social Responsibility,” BSR Report on Privacy, 1999. http://www.bsr.org/

[13] B. Krishnamurthy, “Privacy vs. Security in the Aftermath of the September 11 Terrorist Attacks,” November 2001. http://www.scu.edu/ethics/publications/briefings/privacy.html

[14] J. W. Seifert, “Data Mining and Homeland Security: An Overview,” CRS Report for Congress, (RL31798), January 2006. http://www.fas.org/sgp/crs/intel/RL31798.pdf

[15] T. Fawcett and F. Provost, “Activity Monitoring: Noticing Interesting Changes in Behavior,” Proceedings of the 5th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), San Diego, 1999, pp. 53-62. doi:10.1145/312129.312195

[16] R. Agrawal and R. Srikant, “Privacy-Preserving Data Mining,” Proceedings of ACM International Conference on Management of Data (SIGMOD), Dallas, 2000, pp. 439-450.

[17] C. Clifton, M. Kantarcioglu, J. Vaidya, X. Lin and M. Y. Zhu, “Tools for Privacy-Preserving Distributed Data Mining,” ACM SIGKDD Explorations Newsletter, Vol. 4, No. 2, 2002, pp. 28-34. doi:10.1145/772862.772867

[18] N. R. Adam and J. C. Wortman, “Security Control Methods for Statistical Databases,” ACM Computing Surveys, Vol. 21, No. 4, 1989, pp. 515-556. doi:10.1145/76894.76895

[19] S. Agrawal and J. R. Haritsa, “A Framework for High-Accuracy Privacy-Preserving Mining,” Proceedings of the 21st IEEE International Conference on Data Engineering (ICDE), Tokyo, April 2005, pp. 193-204. doi:10.1109/ICDE.2005.8

[20] A. Evfimievski, “Randomization in Privacy-Preserving Data Mining,” ACM SIGKDD Explorations Newsletter, Vol. 4, No. 2, 2002, pp. 43-48. doi:10.1145/772862.772869

[21] K. Liu, H. Kargupta and J. Ryan, “Random Projection-Based Multiplicative Perturbation for Privacy-Preserving Distributed Data Mining,” IEEE Transactions on Knowledge and Data Engineering (TKDE), Vol. 18, No. 1, 2006, pp. 92-106. doi:10.1109/TKDE.2006.14

[22] A. Shoshani, “Statistical Databases: Characteristics, Problems and Some Solutions,” Proceedings of the 8th Very Large Data Bases (VLDB), Mexico City, September 1982, pp. 208-213.

[23] V. S. Verykios, E. Bertino, I. N. Fovino, L. P. Provenza, Y. Saygin and Y. Theodoridis, “State-of-the-Art in Privacy Preserving Data Mining,” ACM SIGMOD Record, Vol. 33, No. 1, 2004, pp. 50-57. doi:10.1145/974121.974131

[24] B. Pinkas, “Cryptographic Techniques for Privacy-Preserving Data Mining,” ACM SIGKDD Explorations Newsletter, Vol. 4, No. 2, 2002, pp. 12-19. doi:10.1145/772862.772865

[25] J. Vaidya, C. W. Clifton and M. Zhu, “Privacy-Preserving Data Mining,” 2006.

[26] W. Du, Y. S. Han and S. Chen, “Privacy-Preserving Multivariate Statistical Analysis: Linear Regression and Classification,” Proceedings of the SIAM International Conference on Data Mining (SDM), Florida, 2004.

[27] W. Du and Z. Zhan, “Building Decision Tree Classifier on Private Data,” Workshop on Privacy, Security, and Data Mining at the 2002 IEEE International Conference on Data Mining, Maebashi City, December 2002.

[28] A. W. C. Fu, R. C. W. Wong and K. Wang, “Privacy-Preserving Frequent Pattern Mining across Private Databases,” Proceedings of the 5th IEEE International Conference on Data Mining (ICDM), Houston, November 2005, pp. 613-616.

[29] M. Kantarcioglu and C. Clifton, “Privacy-Preserving Data Mining of Association Rules on Horizontally Partitioned Data,” IEEE Transactions on Knowledge and Data Engineering (TKDE), Vol. 16, No. 9, 2004, pp. 1026-1037. doi:10.1109/TKDE.2004.45

[30] M. Kantarcioglu and C. Clifton, “Privately Computing a Distributed k-NN Classifier,” Proceedings of the 8th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD), Pisa, September 2004, pp. 279-290.

[31] J. Vaidya and C. Clifton, “Privacy-Preserving Association Rule Mining in Vertically Partitioned Data,” Proceedings of the 8th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Edmonton, 2002, pp. 639-644.

[32] J. Vaidya and C. Clifton, “Privacy-Preserving k-Means Clustering over Vertically Partitioned Data,” Proceedings of the 9th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Washington, 2003, pp. 206-215.

[33] Z. Yang, S. Zhong and R. N. Wright, “Privacy-Preserving Classification of Customer Data without Loss of Accuracy,” Proceedings of the 5th SIAM International Conference on Data Mining (SDM), Newport Beach, 2005, pp. 92-102.

[34] A. C. Yao, “Protocols for Secure Computations,” Proceedings of the 23rd IEEE Symposium on Foundations of Computer Science, Washington DC, 1982, pp. 160-164.

[35] A. C. Yao, “How to Generate and Exchange Secrets,” Proceedings of the 27th Annual IEEE Symposium on Foundations of Computer Science, 1986, pp. 162-167.

[36] R. Brand, “Microdata Protection through Noise Addition,” Inference Control in Statistical Databases, From Theory to Practice, London, 2002, pp. 97-116.

[37] Confidentiality and Data Access Committee, “Report on Statistical Disclosure Limitation Methodology,” Technical Report 22, Office of Management and Budget, December 2005.

[38] A. Blum, C. Dwork, F. McSherry and K. Nissim, “Practical Privacy: The SuLQ Framework,” Proceedings of the 24th ACM Symposium on Principles of Database Systems (PODS), Baltimore, June 2005, pp. 128-138.

[39] I. Dinur and K. Nissim, “Revealing Information While Preserving Privacy,” Proceedings of the 22nd ACM Symposium on Principles of Database Systems (PODS), San Diego, June 2003, pp. 202-210.

[40] C. Dwork, “Differential Privacy: A Survey of Results,” Proceedings of the 5th International Conference on Theory and Applications of Models of Computation (TAMC), Xi’an, April 2008, pp. 1-19.

[41] A. Blum, K. Ligett and A. Roth, “A Learning Theory Approach to Non-Interactive Database Privacy,” Proceedings of the 40th Annual ACM Symposium on Theory of Computing (STOC), Victoria, 2008, pp. 609-618.

[42] X. Xiao and Y. Tao, “Anatomy: Simple and Effective Privacy-Preservation,” Proceedings of the International Conference on Very Large Data Bases (VLDB), Seoul, 2006, pp. 139-150.

[43] N. Maheshwarkar, K. Pathak and V. Chourey, “Privacy Issues for k-Anonymity Model,” International Journal of Engineering Research, Vol. 1, No. 4, 2011, pp. 1857-1861. doi:10.1109/DBTA.2009.74

[44] X. Hu, Z. Sun, Y. Wu, W. Hu and J. Dong, “k-Anonymity Based on Sensitive Tuples,” First International Workshop on Database Technology and Applications, Wuhan, 25-26 April 2009, pp. 91-94.