Journal of Computer Science and Technology ›› 2018, Vol. 33 ›› Issue (6): 1231-1242.doi: 10.1007/s11390-018-1884-6

• Data Management and Data Mining • Previous Articles     Next Articles

Privacy-Preserving Algorithms for Multiple Sensitive Attributes Satisfying t-Closeness

Rong Wang1, Student Member, CCF, Yan Zhu1,*, Member, CCF, Tung-Shou Chen2, Chin-Chen Chang3, Fellow, IEEE   

  1. 1. School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031, China;
    2. Department of Computer Science and Information Engineering, "National" Taichung University of Science and Technology, Taichung 404, China;
    3. Department of Information Engineering and Computer Science, Feng Chia University, Taichung 40724, China
  • Received:2017-09-18 Revised:2018-07-23 Online:2018-11-15 Published:2018-11-15
  • Contact: Yan Zhu,E-mail:yzhu@swjtu.edu.cn E-mail:yzhu@swjtu.edu.cn
  • About author:Rong Wang received her B.S. degree in computer science and technology from Mianyang Normal College, Mianyang, in 2011. She is now a Ph.D. candidate in the School of Information Science and Technology at Southwest Jiaotong University, Chengdu. Her current research interests include machine learning, data mining with big data, and privacypreserving data mining.
  • Supported by:
    The work was supported by the Academic and Technological Leadership Training Foundation of Sichuan Province of China under Grant Nos. WZ0100112371601/004, WZ0100112371408, and YH1500411031402.

Although k-anonymity is a good way of publishing microdata for research purposes, it cannot resist several common attacks, such as attribute disclosure and the similarity attack. To resist these attacks, many refinements of k-anonymity have been proposed with t-closeness being one of the strictest privacy models. While most existing t-closeness models address the case in which the original data have only one single sensitive attribute, data with multiple sensitive attributes are more common in practice. In this paper, we cover this gap with two proposed algorithms for multiple sensitive attributes and make the published data satisfy t-closeness. Based on the observation that the values of the sensitive attributes in any equivalence class must be as spread as possible over the entire data to make the published data satisfy t-closeness, both of the algorithms use different methods to partition records into groups in terms of sensitive attributes. One uses a clustering method, while the other leverages the principal component analysis. Then, according to the similarity of quasiidentifier attributes, records are selected from different groups to construct an equivalence class, which will reduce the loss of information as much as possible during anonymization. Our proposed algorithms are evaluated using a real dataset. The results show that the average speed of the first proposed algorithm is slower than that of the second proposed algorithm but the former can preserve more original information. In addition, compared with related approaches, both proposed algorithms can achieve stronger protection of privacy and reduce less.

Key words: data privacy; k-anonymity; t-closeness; multiple sensitive attribute;

[1] Sánchez D, Martínez S, Domingo-Ferrer J. Comment on "Unique in the shopping mall:On the reidentifiability of credit card metadata". Science, 2016, 351(6279):1274.
[2] Sweeney L. k-anonymity:A model for protecting privacy. International Journal of Uncertainty Fuzziness and Knowledge-Based Systems, 2002, 10(5):557-570.
[3] LeFevre K, DeWitt D J, Ramakrishnan R. Mondrian multidimensional k-anonymity. In Proc. the 22nd International Conference on Data Engineering, April 2006, p.25.
[4] Machanavajjhala A, Gehrke J, Kifer D. Venkitasubramaniam M. l-diversity:Privacy beyond k-anonymity. In Proc. the 22nd International Conference on Data Engineering, April 2006, p.24.
[5] Li N H, Li T C, Venkatasubramanian S. t-closeness:Privacy beyond k-anonymity and l-diversity. In Proc. the 23rd International Conference on Data Engineering, April 2007, pp.106-115.
[6] Domingo-Ferrer J, Soria-Comas J. From t-closeness to differential privacy and vice versa in data anonymization. Knowledge-Based Systems, 2015, 74:151-158.
[7] Rebollo-Monedero D, Forne J, Domingo-Ferrer J. From tcloseness-like privacy to postrandomization via information theory. IEEE Trans. Knowl. Data Eng., 2010, 22(11):1623-1636.
[8] Cao J N, Karras P, Kalnis P, Tan K L. SABRE:A sensitive attribute bucketization and redistribution framework for t-closeness. The VLDB Journal, 2011, 20:59-81.
[9] Soria-Comas J, Domingo-Ferrer J, Sánchez D, Martínez S. t-closeness through microaggregation:Strict privacy with enhanced utility preservation. IEEE Trans. Knowl. Data Eng., 2015, 27(11):3098-3110.
[10] Sha C F, Li Y, Zhou A Y. On t-closeness with KLdivergence and semantic privacy. In Proc. the 15th International Conference on Database Systems for Advanced Applications, April 2010, pp.153-167.
[11] Zhang J P, Xie J, Yang J, Zhang B. A t-closeness privacy model based on sensitive attribute values semantics bucketization. Journal of Computer Research and Development, 2014, 51(1):126-137. (in Chinese)
[12] Rubner Y, Tomasi C, Guibas L J. The earth mover's distance as a metric for image retrieval. International Journal of Computer Vision, 2000, 40(2):99-121.
[13] Xu J, Wang W, Pei J, Wang X Y, Shi B L, Fu A W C. Utility-based anonymization using local recoding. In Proc. the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2006, pp.785-790.
[14] Ghinita G, Karras P, Kalnis P, Mamoulis N. Fast data anonymization with low information loss. In Proc. the 33rd International Conference on Very Large Data Bases, September 2007, pp.758-769.
[15] LeFevre K, DeWitt D J, Ramakrishnan R. Incognito:Efficient full-domain k-anonymity. In Proc. ACM SIGMOD International Conference on Management of Data, June 2005, pp.49-60.
[16] Li N H, Li T C, Venkatasubramanian S. Closeness:A new privacy measure for data publishing. IEEE Trans. Knowl. Data Eng., 2010, 22(7):943-956.
[17] Fang Y, Ashrafi M Z, Ng S K. Privacy beyond single sensitive attribute. In Proc. the 22nd International Conference on Database and Expert Systems Applications, August 2011, pp.187-201.
[18] Sei Y C, Okumura H, Takenouchi T, Ohsuga A. Anonymization of sensitive quasiidentifiers for l-diversity and tcloseness. IEEE Transactions on Dependable and Secure Computing. doi:10.1109/TDSC.2017.2698472.
[19] Höppner F, Klawonn F. Clustering with size constraints. In Computational Intelligence Paradigms, Jain L C, Sato-Ilic M, Virvou M, Tsihrintzis G A, Balas V E (eds.), Springer, Berlin, Heidelberg, 2008, pp.167-180.
[20] Jolliffe I T, Cadima J. Principal component analysis:A review and recent developments. Philosophical Transactions of the Royal Society A:Mathematical, Physical and Engineering Sciences, 2016, 374(2065):20150202.
[1] Yan-Bo Han (韩燕波), Jun-Yi Sun (孙君意), Gui-Ling Wang (王桂玲) and Hou-Fu Li (李厚福). A Cloud-Based BPM Architecture with User-End Distribution of Non-Compute-Intensive Activities and Sensitive Data [J]. , 2010, 25(6): 1157-1167.
[2] Grigorios Loukides and Jian-Hua Shao. An Efficient Clustering Algorithm for k-Anonymisation [J]. , 2008, 23(2): 188-202 .
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] Liu Mingye; Hong Enyu;. Some Covering Problems and Their Solutions in Automatic Logic Synthesis Systems[J]. , 1986, 1(2): 83 -92 .
[2] Chen Shihua;. On the Structure of (Weak) Inverses of an (Weakly) Invertible Finite Automaton[J]. , 1986, 1(3): 92 -100 .
[3] Gao Qingshi; Zhang Xiang; Yang Shufan; Chen Shuqing;. Vector Computer 757[J]. , 1986, 1(3): 1 -14 .
[4] Chen Zhaoxiong; Gao Qingshi;. A Substitution Based Model for the Implementation of PROLOG——The Design and Implementation of LPROLOG[J]. , 1986, 1(4): 17 -26 .
[5] Huang Heyan;. A Parallel Implementation Model of HPARLOG[J]. , 1986, 1(4): 27 -38 .
[6] Min Yinghua; Han Zhide;. A Built-in Test Pattern Generator[J]. , 1986, 1(4): 62 -74 .
[7] Tang Tonggao; Zhao Zhaokeng;. Stack Method in Program Semantics[J]. , 1987, 2(1): 51 -63 .
[8] Min Yinghua;. Easy Test Generation PLAs[J]. , 1987, 2(1): 72 -80 .
[9] Zhu Hong;. Some Mathematical Properties of the Functional Programming Language FP[J]. , 1987, 2(3): 202 -216 .
[10] Li Minghui;. CAD System of Microprogrammed Digital Systems[J]. , 1987, 2(3): 226 -235 .

ISSN 1000-9000(Print)

         1860-4749(Online)
CN 11-2296/TP

Home
Editorial Board
Author Guidelines
Subscription
Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
Tel.:86-10-62610746
E-mail: jcst@ict.ac.cn
 
  Copyright ©2015 JCST, All Rights Reserved