JIS Vol. 6, No. 3, July 2015
Utility-Based Anonymization Using Generalization Boundaries to Protect Sensitive Attributes
Abstract: Privacy-preserving data mining (PPDM) has become increasingly important because it allows privacy-sensitive data to be shared for analytical purposes. Many privacy techniques have been developed, most of which rely on the k-anonymity property, which has several shortcomings; other privacy techniques (l-diversity, p-sensitive k-anonymity, (α, k)-anonymity, t-closeness, etc.) were therefore introduced. Although these techniques differ in their methods and in the quality of their results, they all focus first on masking the data and only then on protecting its quality. This paper presents an enhanced privacy technique that combines several anonymity techniques to maintain both privacy and data utility. Each attribute is assigned a sensitivity weight that takes its utility into account, and a threshold is derived from these weights. Only queries involving sensitive attributes whose total weights exceed the threshold are modified using generalization boundaries; all other queries can be published directly. Experimental results obtained with the UT Dallas Anonymization Toolbox on the real Adult data set from the UCI Machine Learning Repository show that the proposed technique preserves privacy while also maintaining the utility of the published data.
Cite this paper: Hussien, A., Darwish, N. and Hefny, H. (2015) Utility-Based Anonymization Using Generalization Boundaries to Protect Sensitive Attributes. Journal of Information Security, 6, 179-196. doi: 10.4236/jis.2015.63019.
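The abstract's core idea — assign each attribute a sensitivity weight, sum the weights of the attributes a record exposes, and generalize only records whose total exceeds a threshold — can be sketched in a few lines. This is a minimal illustrative sketch, not the authors' implementation: the weights, the threshold value, and the specific generalization rules (age intervals, zip-code truncation) are all hypothetical assumptions.

```python
# Hypothetical per-attribute sensitivity weights reflecting each attribute's
# utility; in the paper these would be chosen per data set.
ATTRIBUTE_WEIGHTS = {"age": 0.5, "zipcode": 0.3, "disease": 0.9}
THRESHOLD = 1.0  # records whose total weight exceeds this get generalized


def generalize_age(age, width=10):
    """Generalization boundary for age: replace the exact value with an
    interval no wider than `width` years."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def anonymize(record, sensitive_attrs):
    """Generalize a record only if the summed weights of its exposed
    sensitive attributes exceed the threshold; otherwise publish as-is."""
    total = sum(ATTRIBUTE_WEIGHTS.get(a, 0.0)
                for a in sensitive_attrs if a in record)
    if total <= THRESHOLD:
        return dict(record)  # below threshold: publish directly
    out = dict(record)
    out["age"] = generalize_age(record["age"])
    out["zipcode"] = record["zipcode"][:3] + "**"  # coarsen the zip code
    return out


rec = {"age": 37, "zipcode": "75080", "disease": "flu"}
# All three attributes exposed: total weight 1.7 > 1.0, so generalize.
print(anonymize(rec, ["age", "zipcode", "disease"]))
# Only "age" exposed: total weight 0.5 <= 1.0, so publish unchanged.
print(anonymize(rec, ["age"]))
```

The design point the abstract makes is visible here: records below the threshold skip generalization entirely, so utility is lost only where the sensitivity weights say it must be.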
