Journal of Computer Science and Technology, 2022, Vol. 37, Issue (1): 231-251. DOI: 10.1007/s11390-021-1754-5

Special Issue: Data Management and Data Mining

Regular Paper

Correlated Differential Privacy of Multiparty Data Release in Machine Learning

Jian-Zhe Zhao1 (赵建喆), Xing-Wei Wang2,3,* (王兴伟), Senior Member, CCF, Ke-Ming Mao1 (毛克明), Chen-Xi Huang1 (黄辰希), Yu-Kai Su1 (苏昱恺), and Yu-Chen Li1 (李宇宸)        

  1 Software College, Northeastern University, Shenyang 110169, China
  2 State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, China
  3 College of Computer Science and Engineering, Northeastern University, Shenyang 110819, China
  • Received: 2021-07-01; Revised: 2021-11-04; Accepted: 2021-11-12; Online: 2022-01-28; Published: 2022-01-28
  • Contact: Xing-Wei Wang
  • About author: Xing-Wei Wang received his B.E., M.S., and Ph.D. degrees in computer science from Northeastern University, Shenyang, in 1989, 1992, and 1998, respectively. He is currently a professor with the School of Computer Science and Engineering, Northeastern University, Shenyang. His research interests include cloud computing and future Internet. He has published more than 100 journal articles, book chapters, and refereed conference papers.
  • Supported by:
    This work is supported by the National Natural Science Foundation of China under Grant Nos. 62102074 and 62032013, the Liaoning Revitalization Talents Program under Grant No. XLYC1902010, the Natural Science Foundation of Liaoning Province of China under Grant No. 2020-MS-091, and the Fundamental Research Funds for the Central Universities of China under Grant No. N2017015.

Differential privacy (DP) is widely employed for private data release in single-party scenarios. Ubiquitous data correlation increases the noise that DP must add and thereby degrades data utility, a problem that is often addressed by reducing sensitivity through correlation analysis. However, the growing number of multiparty data release applications poses new challenges for existing methods. In this paper, we propose a novel correlated differential privacy method for multiparty data release (MP-CRDP). It effectively reduces the merged dataset's dimensionality and correlated sensitivity in two steps to optimize utility. We also propose a multiparty correlation analysis technique: based on prior knowledge of the multiparty data, it applies a more reasonable and rigorous standard to measure the degree of correlation, which reduces correlated sensitivity and thus improves data utility. Moreover, by adding noise to the weights of machine learning algorithms and query noise to the released data, MP-CRDP provides release techniques for both low-noise private data and private machine learning algorithms. Comprehensive experiments on the Adult and Breast Cancer datasets demonstrate the effectiveness and practicability of the proposed method.
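The idea of calibrating Laplace noise to a correlated sensitivity, and shrinking that sensitivity with a stricter correlation standard, can be illustrated with a minimal sketch. It assumes the correlated-sensitivity form used in earlier correlated differential privacy work: a record's sensitivity is the correlation-weighted sum of the sensitivities of the records it is correlated with, and the global value is the maximum over records. The threshold below which a correlation is treated as zero is a hypothetical stand-in for MP-CRDP's multiparty correlation standard, and the names `correlated_sensitivity`, `laplace_noise`, `private_count`, and `perturb_weights` are illustrative only, not the paper's interface. The weight-perturbation helper likewise assumes the caller already knows the L1 sensitivity of the trained weight vector.

```python
import random
from typing import List, Sequence


def correlated_sensitivity(
    corr: Sequence[Sequence[float]],
    record_sensitivity: Sequence[float],
    threshold: float = 0.0,
) -> float:
    """Max over records i of sum_j |delta_ij| * s_j.

    Correlations with |delta_ij| < threshold are treated as zero; this
    thresholding is an assumed stand-in for a stricter correlation standard.
    """
    best = 0.0
    for row in corr:
        cs_i = sum(
            abs(d) * s
            for d, s in zip(row, record_sensitivity)
            if abs(d) >= threshold
        )
        best = max(best, cs_i)
    return best


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    # expovariate takes a rate, so rate = 1 / scale gives mean `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(
    true_count: float, sensitivity: float, epsilon: float, rng: random.Random
) -> float:
    """Laplace mechanism: add noise with scale sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def perturb_weights(
    weights: Sequence[float], l1_sensitivity: float, epsilon: float, rng: random.Random
) -> List[float]:
    """Output perturbation of model weights (hypothetical helper).

    Assumes l1_sensitivity bounds the L1 change of the whole weight vector
    when one record changes; each coordinate gets Laplace(l1_sensitivity / epsilon).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = l1_sensitivity / epsilon
    return [w + laplace_noise(scale, rng) for w in weights]
```

For example, with `corr = [[1.0, 0.5, 0.125], [0.5, 1.0, 0.0], [0.125, 0.0, 1.0]]` and unit record sensitivities, `correlated_sensitivity` returns 1.625; with `threshold=0.25` the weak 0.125 correlation is ignored and it returns 1.5, so the Laplace scale for the same ε shrinks. This is the sense in which a tighter measure of correlation can improve utility; how MP-CRDP actually measures the degree of correlation across parties is defined in the paper, not in this sketch.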

Key words: correlated differential privacy; multiparty data release; machine learning


ISSN 1000-9000 (Print)

CN 11-2296/TP

Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China