• Articles • Previous Articles     Next Articles

An Efficient Clustering Algorithm for k-Anonymisation

Grigorios Loukides and Jian-Hua Shao   

  1. School of Computer Science, Cardiff University, Cardiff, U.K.
  • Revised:2007-12-06 Online:2008-03-15 Published:2008-03-10

K-anonymisation is an approach to protecting individuals from being identified from data. Good $k$-anonymisations should retain data utility and preserve privacy, but few methods have considered these two conflicting requirements together. In this paper, we extend our previous work on a clustering-based method for balancing data utility and privacy protection, and propose a set of heuristics to improve its effectiveness. We introduce new clustering criteria that treat utility and privacy on equal terms and propose sampling-based techniques to optimally set up its parameters. Extensive experiments show that the extended method achieves good accuracy in query answering and is able to prevent linking attacks effectively.

Key words: congestion control; neural network; fuzzy neural network; ATM;



[1] Li N, Li T, Venkatasubramanian S. $t$-closeness: Privacy beyond $k$-anonymity and $l$-diversity. In -\it Proc. ICDE}, Istanbul, Turkey, 2007, pp.106--115.

[2] Loukides G, Shao J. Speeding up clustering-based $k$-anonymisation algorithms with pre-partitioning. In -\it Proc. The 24th British National Conference on Databases}, Glasgow, UK, 2007, pp.203--214.

[3] Loukides G, Shao J. Capturing data usefulness and privacy protection in $K$-anonymisation. In -\it Proc. The 22nd Annual ACM Symposium on Applied Computing}, Seoul, Korea, 2007, pp.370--374.

[4] Sweeney L. $K$-anonymity: A model for protecting privacy. -\it International Journal on Uncertainty, Fuzziness and Knowledge-Based Systems}, 2002, 10(5): 557--570.

[5] Samarati P. Protecting respondents identities in microdata release. -\it IEEE Transactions on Knowledge and Data Engineering}, 2001, 13(9): 1010--1027.

[6] LeFevre K, DeWitt D J, Ramakrishnan R. Mondrian multidimensional $K$-anonymity. In -\it Proc. ICDE}, Atlanta, Georgia, USA, 2006, p.25.

[7] Bayardo R J, Agrawal R. Data privacy through optimal $k$-anonymization. In -\it Proc. ICDE}, Tokyo, Japan, 2005, pp.217--228.

[8] Iyengar V S. Transforming data to satisfy privacy constraints. In -\it Proc. KDD}, Edmonton, Alberta, Canada, 2002, pp.279--288.

[9] LeFevre K, DeWitt D J, Ramakrishnan R. Workload-aware anonymization. In -\it Proc. KDD}, Philadelphia, PA, USA, 2006, pp.277--286.

[10] Fung B C M, Wang K, Yu P S. Top-down specialization for information and privacy preservation. In -\it Proc. ICDE}, Tokyo, Japan, 2005, pp.205--216.

[11] Teng Z, Du W. Comparisons of $k$-anonymization and randomization schemes under linking attacks. In -\it Proc. ICDM}, Hong Kong, China, 2006, pp.1091--1096.

[12] Machanavajjhala A Gehrke J, D Kifer \it et al. \rm $l$-diversity: Privacy beyond $k$-anonymity. In -\it Proc. ICDE}, Atlanta, Georgia, USA, 2006, p.24.

[13] Koudas N, Zhang Q, Srivastava D \it et al. \rm Aggregate query answering on anonymized tables. In -\it Proc. ICDE}, Istanbul, Turkey, 2007, pp.116--125.

[14] Hettich S, Merz C J. UCI Repository of machine learning databases, 1999, http://kdd.ics.uci.edu.

[15] LeFevre K, DeWitt D J, Ramakrishnan R. Incognito: Efficient full-domain K-anonymity. In -\it Proc. SIGMOD}, Baltimore, Maryland, USA, 2005, pp.49--60.

[16] Xu J, Wang W, Pei J \it et al. \rm Utility-based anonymization using local recoding. In -\it Proc. KDD}, Philadelphia, PA, USA, 2006, pp.785--790.

[17] Aggarwal C C, Yu P S. \rm A condensation approach to privacy preserving data mining. In -\it Proc. The 9th International Conference on Extending Database Technology}, Heraklion, Crete, Greece, 2004, pp.183--199.

[18] Byun J, Kamra E, Bertino E \it et al. \rm Efficient $k$-anonymization using clustering techniques. In -\it Proc. The 12th International Conference on Database Systems for Advanced Applications}, 2007, Bangkok, Thailand, pp.188--200.

[19] Zhou J, Sander J. \rm Data bubbles for non-vector data: Speeding-up hierarchical clustering in arbitrary metric spaces. In -\it Proc. VLDB}, Berlin, Germany, 2003, pp.452--463.

[20] Narayan B L, Murthy C A, Pal S K. Maxdiff kd-trees for data condensation. -\it Pattern Recogn. Lett.}, 27(3): 187--200.

[21] Xiao X, Tao Y. \rm Anatomy: Simple and effective privacy preservation. In -\it Proc. VLDB}, Seoul, Korea, 2006, pp.139--150.

[22] Gehrke J, Ramakrishnan R, Ganti V. \rm RainForest --- A Framework for fast decision tree construction of large datasets. In -\it Proc. VLDB}, New York City, USA, 1998, pp.416--427.
[1] Yusuf Fatihu Hamza and Hong-Wei Lin. Conjugate-Gradient Progressive-Iterative Approximation for Loop and Catmull-Clark Subdivision Surface Interpolation [J]. Journal of Computer Science and Technology, 2022, 37(2): 487-504.
[2] Xiao-Zheng Xie, Jian-Wei Niu, Xue-Feng Liu, Qing-Feng Li, Yong Wang, Jie Han, and Shaojie Tang. DG-CNN: Introducing Margin Information into Convolutional Neural Networks for Breast Cancer Diagnosis in Ultrasound Images [J]. Journal of Computer Science and Technology, 2022, 37(2): 277-294.
[3] Xin-Feng Wang, Xiang Zhou, Jia-Hua Rao, Zhu-Jin Zhang, and Yue-Dong Yang. Imputing DNA Methylation by Transferred Learning Based Neural Network [J]. Journal of Computer Science and Technology, 2022, 37(2): 320-329.
[4] Xin Zhang, Siyuan Lu, Shui-Hua Wang, Xiang Yu, Su-Jing Wang, Lun Yao, Yi Pan, and Yu-Dong Zhang. Diagnosis of COVID-19 Pneumonia via a Novel Deep Learning Architecture [J]. Journal of Computer Science and Technology, 2022, 37(2): 330-343.
[5] Dan-Hao Zhu, Xin-Yu Dai, Jia-Jun Chen. Pre-Train and Learn: Preserving Global Information for Graph Neural Networks [J]. Journal of Computer Science and Technology, 2021, 36(6): 1420-1430.
[6] Yi Zhong, Jian-Hua Feng, Xiao-Xin Cui, Xiao-Le Cui. Machine Learning Aided Key-Guessing Attack Paradigm Against Logic Block Encryption [J]. Journal of Computer Science and Technology, 2021, 36(5): 1102-1117.
[7] Feng Wang, Guo-Jie Luo, Guang-Yu Sun, Yu-Hao Wang, Di-Min Niu, Hong-Zhong Zheng. Area Efficient Pattern Representation of Binary Neural Networks on RRAM [J]. Journal of Computer Science and Technology, 2021, 36(5): 1155-1166.
[8] Shao-Jie Qiao, Guo-Ping Yang, Nan Han, Hao Chen, Fa-Liang Huang, Kun Yue, Yu-Gen Yi, Chang-An Yuan. Cardinality Estimator: Processing SQL with a Vertical Scanning Convolutional Neural Network [J]. Journal of Computer Science and Technology, 2021, 36(4): 762-777.
[9] Chen-Chen Sun, De-Rong Shen. Mixed Hierarchical Networks for Deep Entity Matching [J]. Journal of Computer Science and Technology, 2021, 36(4): 822-838.
[10] Yang Liu, Ruili He, Xiaoqian Lv, Wei Wang, Xin Sun, Shengping Zhang. Is It Easy to Recognize Baby's Age and Gender? [J]. Journal of Computer Science and Technology, 2021, 36(3): 508-519.
[11] Yang-Jie Cao, Shuang Wu, Chang Liu, Nan Lin, Yuan Wang, Cong Yang, Jie Li. Seg-CapNet: A Capsule-Based Neural Network for the Segmentation of Left Ventricle from Cardiac Magnetic Resonance Imaging [J]. Journal of Computer Science and Technology, 2021, 36(2): 323-333.
[12] Zhang-Jin Huang, Xiang-Xiang He, Fang-Jun Wang, Qing Shen. A Real-Time Multi-Stage Architecture for Pose Estimation of Zebrafish Head with Convolutional Neural Networks [J]. Journal of Computer Science and Technology, 2021, 36(2): 434-444.
[13] Bo-Wei Zou, Rong-Tao Huang, Zeng-Zhuang Xu, Yu Hong, Guo-Dong Zhou. Language Adaptation for Entity Relation Classification via Adversarial Neural Networks [J]. Journal of Computer Science and Technology, 2021, 36(1): 207-220.
[14] Yue-Huan Wang, Ze-Nan Li, Jing-Wei Xu, Ping Yu, Taolue Chen, Xiao-Xing Ma. Predicted Robustness as QoS for Deep Neural Network Models [J]. Journal of Computer Science and Technology, 2020, 35(5): 999-1015.
[15] Bi-Ying Yan, Chao Yang, Pan Deng, Qiao Sun, Feng Chen, Yang Yu. A Spatiotemporal Causality Based Governance Framework for Noisy Urban Sensory Data [J]. Journal of Computer Science and Technology, 2020, 35(5): 1084-1098.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] Liu Mingye; Hong Enyu;. Some Covering Problems and Their Solutions in Automatic Logic Synthesis Systems[J]. , 1986, 1(2): 83 -92 .
[2] Chen Shihua;. On the Structure of (Weak) Inverses of an (Weakly) Invertible Finite Automaton[J]. , 1986, 1(3): 92 -100 .
[3] Gao Qingshi; Zhang Xiang; Yang Shufan; Chen Shuqing;. Vector Computer 757[J]. , 1986, 1(3): 1 -14 .
[4] Chen Zhaoxiong; Gao Qingshi;. A Substitution Based Model for the Implementation of PROLOG——The Design and Implementation of LPROLOG[J]. , 1986, 1(4): 17 -26 .
[5] Huang Heyan;. A Parallel Implementation Model of HPARLOG[J]. , 1986, 1(4): 27 -38 .
[6] Min Yinghua; Han Zhide;. A Built-in Test Pattern Generator[J]. , 1986, 1(4): 62 -74 .
[7] Tang Tonggao; Zhao Zhaokeng;. Stack Method in Program Semantics[J]. , 1987, 2(1): 51 -63 .
[8] Min Yinghua;. Easy Test Generation PLAs[J]. , 1987, 2(1): 72 -80 .
[9] Zhu Hong;. Some Mathematical Properties of the Functional Programming Language FP[J]. , 1987, 2(3): 202 -216 .
[10] Li Minghui;. CAD System of Microprogrammed Digital Systems[J]. , 1987, 2(3): 226 -235 .

ISSN 1000-9000(Print)

         1860-4749(Online)
CN 11-2296/TP

Home
Editorial Board
Author Guidelines
Subscription
Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
Tel.:86-10-62610746
E-mail: jcst@ict.ac.cn
 
  Copyright ©2015 JCST, All Rights Reserved