Journal of Computer Science and Technology ›› 2018, Vol. 33 ›› Issue (6): 1231-1242. doi: 10.1007/s11390-018-1884-6

• Data Management and Data Mining •

Privacy-Preserving Algorithms for Multiple Sensitive Attributes Satisfying t-Closeness

Rong Wang1, Student Member, CCF, Yan Zhu1,*, Member, CCF, Tung-Shou Chen2, Chin-Chen Chang3, Fellow, IEEE   

    1. School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031, China;
    2. Department of Computer Science and Information Engineering, "National" Taichung University of Science and Technology, Taichung 404, China;
    3. Department of Information Engineering and Computer Science, Feng Chia University, Taichung 40724, China
  • Received:2017-09-18 Revised:2018-07-23 Online:2018-11-15 Published:2018-11-15
  • Contact: Yan Zhu
  • About author: Rong Wang received her B.S. degree in computer science and technology from Mianyang Normal College, Mianyang, in 2011. She is now a Ph.D. candidate in the School of Information Science and Technology at Southwest Jiaotong University, Chengdu. Her current research interests include machine learning, data mining with big data, and privacy-preserving data mining.
  • Supported by:
    The work was supported by the Academic and Technological Leadership Training Foundation of Sichuan Province of China under Grant Nos. WZ0100112371601/004, WZ0100112371408, and YH1500411031402.

Although k-anonymity is a good way of publishing microdata for research purposes, it cannot resist several common attacks, such as attribute disclosure and the similarity attack. To resist these attacks, many refinements of k-anonymity have been proposed, with t-closeness being one of the strictest privacy models. While most existing t-closeness models address the case in which the original data have only a single sensitive attribute, data with multiple sensitive attributes are more common in practice. In this paper, we fill this gap by proposing two algorithms that handle multiple sensitive attributes and make the published data satisfy t-closeness. Both algorithms are based on the observation that, for the published data to satisfy t-closeness, the sensitive attribute values in every equivalence class must be spread as widely as possible over the entire dataset. The two algorithms differ in how they partition records into groups in terms of their sensitive attributes: one uses a clustering method, while the other leverages principal component analysis. Then, according to the similarity of their quasi-identifier attributes, records are selected from different groups to construct an equivalence class, which reduces the loss of information as much as possible during anonymization. The proposed algorithms are evaluated on a real dataset. The results show that the first algorithm is slower on average than the second but preserves more of the original information. In addition, compared with related approaches, both proposed algorithms achieve stronger privacy protection and lose less information.
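The observation that sensitive values must be spread across each equivalence class can be illustrated with a small t-closeness check. The sketch below is not the authors' algorithm. It assumes categorical sensitive attributes whose values are all equally far apart, in which case the Earth Mover's Distance used by t-closeness reduces to half the L1 distance between the two distributions, and it assumes that each sensitive attribute is checked separately against the threshold t. The function names, the toy records, and the threshold value are all hypothetical.

```python
from collections import Counter


def variational_distance(class_values, table_values):
    """Half the L1 distance between two empirical distributions.

    Assumption: the attribute is categorical with equal ground distance
    between any two values, so this equals the Earth Mover's Distance.
    """
    p, q = Counter(class_values), Counter(table_values)
    n_p, n_q = len(class_values), len(table_values)
    return 0.5 * sum(abs(p[v] / n_p - q[v] / n_q) for v in p.keys() | q.keys())


def max_closeness(equivalence_class, table, sensitive_attrs):
    """Largest per-attribute distance between one class and the whole table."""
    return max(
        variational_distance(
            [r[a] for r in equivalence_class], [r[a] for r in table]
        )
        for a in sensitive_attrs
    )


def satisfies_t_closeness(classes, table, sensitive_attrs, t):
    """True if every class is within distance t on every sensitive attribute."""
    return all(max_closeness(c, table, sensitive_attrs) <= t for c in classes)


# Toy table (hypothetical): "age" is the quasi-identifier,
# "disease" and "income" are the two sensitive attributes.
table = [
    {"age": 25, "disease": "flu", "income": "low"},
    {"age": 27, "disease": "flu", "income": "high"},
    {"age": 41, "disease": "cancer", "income": "low"},
    {"age": 43, "disease": "cancer", "income": "high"},
    {"age": 60, "disease": "hiv", "income": "low"},
    {"age": 62, "disease": "hiv", "income": "high"},
]
attrs = ["disease", "income"]

# Classes that draw one record from each disease group.
spread = [
    [table[0], table[3], table[4]],
    [table[1], table[2], table[5]],
]
# Classes formed only from neighbouring ages, ignoring sensitive values.
clustered = [table[0:3], table[3:6]]

print(satisfies_t_closeness(spread, table, attrs, t=0.2))
print(satisfies_t_closeness(clustered, table, attrs, t=0.2))
```

In this toy table the classes that take one record from each disease group stay close to the overall distribution, while the classes built purely from neighbouring ages concentrate one disease and drift further from it. The price of spreading is a wider age range inside each class, which is the information loss that the selection by quasi-identifier similarity is meant to limit.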

Key words: data privacy; k-anonymity; t-closeness; multiple sensitive attributes

