SCIE, EI, Scopus, INSPEC, DBLP, CSCD, etc.
Citation: | Zhao D, Zhao SY, Chen H et al. Hadamard encoding based frequent itemset mining under local differential privacy. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 38(6): 1403−1422 Nov. 2023. DOI: 10.1007/s11390-023-1346-7. |
Local differential privacy (LDP) approaches to collecting sensitive information for frequent itemset mining (FIM) can reliably guarantee privacy. Most current approaches to FIM under LDP add “padding and sampling” steps to obtain frequent itemsets and their frequencies because each user transaction represents a set of items. The current state-of-the-art approach, namely set-value itemset mining (SVSM), must balance variance and bias to achieve accurate results. Thus, an unbiased FIM approach with lower variance is highly promising. To narrow this gap, we propose an Item-Level LDP frequency oracle approach, named the Integrated-with-Hadamard-Transform-Based Frequency Oracle (IHFO). For the first time, Hadamard encoding is introduced to a set of values to encode all items into a fixed vector, and perturbation can be subsequently applied to the vector. An FIM approach, called optimized united itemset mining (O-UISM), is proposed to combine the padding-and-sampling-based frequency oracle (PSFO) and the IHFO into a framework for acquiring accurate frequent itemsets with their frequencies. Finally, we theoretically and experimentally demonstrate that O-UISM significantly outperforms the extant approaches in finding frequent itemsets and estimating their frequencies under the same privacy guarantee.
[1] |
Li N H, Qardaji W, Su D, Cao J N. PrivBasis: Frequent itemset mining with differential privacy. Proceedings of the VLDB Endowment, 2012, 5(11): 1340–1351. DOI: 10.14778/2350229.2350251.
|
[2] |
Xiong X Y, Chen F, Huang P Z, Tian M M, Hu X F, Chen B D, Qin J. Frequent itemsets mining with differential privacy over large-scale data. IEEE Access, 2018, 6: 28877–28889. DOI: 10.1109/ACCESS.2018.2839752.
|
[3] |
Wang T H, Li N H, Jha S. Locally differentially private frequent itemset mining. In Proc. the 2018 IEEE Symposium on Security and Privacy (SP), May 2018, pp.127–143. DOI: 10.1109/SP.2018.00035.
|
[4] |
Agrawal R, Srikant R. Fast algorithms for mining association rules. In Proc. the 20th International Conference on Very Large Data Bases, Sept. 1994, pp.487–499. DOI: 10.5555/645920.672836.
|
[5] |
Adar E, Weld D S, Bershad B N, Gribble S S. Why we search: Visualizing and predicting user behavior. In Proc. the 16th International Conference on World Wide Web, May 2007, pp.161–170. DOI: 10.1145/1242572.1242595.
|
[6] |
Brin S, Motwani R, Silverstein C. Beyond market baskets: Generalizing association rules to correlations. In Proc. the 1997 ACM SIGMOD International Conference on Management of Data, Jun. 1997, pp.265–276. DOI: 10.1145/253260.253327.
|
[7] |
Dwork C. Differential privacy: A survey of results. In Proc. the 5th International Conference on Theory and Applications of Models of Computation, Apr. 2008. DOI: 10.1007/978-3-540-79228-4_1.
|
[8] |
Dwork C, McSherry F, Nissim K, Smith A. Calibrating noise to sensitivity in private data analysis. In Proc. the 3rd Theory of Cryptography Conference, Mar. 2006, pp.265–284. DOI: 10.1007/11681878_14.
|
[9] |
Wang S W, Huang L S, Nie Y W, Wang P Z, Xu H L, Yang W. PrivSet: Set-valued data analyses with locale differential privacy. In Proc. the 2018 IEEE Conference on Computer Communications, Apr. 2018, pp.1088–1096. DOI: 10.1109/INFOCOM.2018.8486234.
|
[10] |
Su S, Xu S Z, Cheng X, Li Z Y, Yang F C. Differentially private frequent itemset mining via transaction splitting. IEEE Trans. Knowledge and Data Engineering, 2015, 27(7): 1875–1891. DOI: 10.1109/TKDE.2015.2399310.
|
[11] |
Qin Z, Yang Y, Yu T, Khalil I, Xiao X K, Ren K. Heavy hitter estimation over set-valued data with local differential privacy. In Proc. the 2016 ACM SIGSAC Conference on Computer and Communications Security, Oct. 2016, pp.192–203. DOI: 10.1145/2976749.2978409.
|
[12] |
Cormode G, Jha S, Kulkarni T, Li N H, Srivastava D, Wang T H. Privacy at scale: Local differential privacy in practice. In Proc. the 2018 International Conference on Management of Data, May 2018, pp.1655–1658. DOI: 10.1145/3183713.3197390.
|
[13] |
Zeng C, Naughton J F, Cai J Y. On differentially private frequent itemset mining. Proceedings of the VLDB Endowment, 2012, 6(1): 25–36. DOI: 10.14778/2428536.2428 539.
|
[14] |
Vadhan S. The complexity of differential privacy. In Tutorials on the Foundations of Cryptography, Lindell Y (ed.), Springer, 2017, pp.347–450. DOI: 10.1007/978-3-319-57048-8_7.
|
[15] |
Li N H, Lyu M, Su D, Yang W N. Differential privacy: From theory to practice. Synthesis Lectures on Information Security, Privacy, & Trust, 2016, 8(4): 1–138. DOI: 10.1007/978-3-031-02350-7.
|
[16] |
Papernot N, Song S, Mironov I, Raghunathan A, Talwar K, Erlingsson Ú. Scalable private learning with PATE. arXiv: 1802.08908, 2018. https://arxiv.org/abs/1802.08908, Dec. 2023.
|
[17] |
Kulkarni T, Cormode G, Srivastava D. Marginal release under local differential privacy. arXiv: 1711.02952, 2017. https://arxiv.org/abs/1711.02952, Dec. 2023.
|
[18] |
Nguyên T T, Xiao X K, Yang Y, Hui S C, Shin H, Shin J. Collecting and analyzing data from smart device users with local differential privacy. arXiv: 1606.05053, 2016. https://arxiv.org/abs/1606.05053, Dec. 2023.
|
[19] |
Wang T H, Blocki J, Li N H, Jha S. Locally differentially private protocols for frequency estimation. In Proc. the 26th USENIX Conference on Security Symposium, Aug. 2017, pp.729–745. DOI: 10.5555/3241189.3241247.
|
[20] |
Bassily R, Smith A. Local, private, efficient protocols for succinct histograms. In Proc. the 47th Annual ACM Symposium on Theory of Computing, Jun. 2015, pp.127–135. DOI: 10.1145/2746539.2746632.
|
[21] |
Wang T H, Li N H, Jha S. Locally differentially private heavy hitter identification. IEEE Trans. Dependable and Secure Computing, 2021, 18(2): 982–993. DOI: 10.1109/TDSC.2019.2927695.
|
[22] |
Duchi J C, Wainwright M J, Jordan M I. Local privacy and minimax bounds: Sharp rates for probability estimation. In Proc. the 26th International Conference on Neural Information Processing Systems, Dec. 2013, pp.1529–1537. DOI: 10.5555/2999611.2999782.
|
[23] |
Ye Q Q, Hu H B, Meng X F, Zheng H D. PrivKV: Key-value data collection with local differential privacy. In Proc. the 2019 IEEE Symposium on Security and Privacy (SP), May 2019, pp.317–331. DOI: 10.1109/SP.2019.00018.
|
[24] |
Gu X L, Li M, Cheng Y Q, Xiong L, Cao Y. PCKV: Locally differentially private correlated key-value data collection with optimized utility. In Proc. the 29th USENIX Conference on Security Symposium, Aug. 2020, pp.967–984. DOI: 10.5555/3489212.3489267.
|
[25] |
Erlingsson Ú, Feldman V, Mironov I, Raghunathan A, Talwar K, Thakurta A. Amplification by shuffling: From local to central differential privacy via anonymity. In Proc. the 30th Annual ACM-SIAM Symposium on Discrete Algorithms, Jan. 2019, pp.2468–2479. DOI: 10.1137/1.9781611975482.151.
|
[26] |
Duchi J C, Jordan M I, Wainwright M J. Minimax optimal procedures for locally private estimation. Journal of the American Statistical Association, 2018, 113(521): 182–201. DOI: 10.1080/01621459.2017.1389735.
|
[27] |
Wang N, Xiao X K, Yang Y, Zhao J, Hui S C, Shin H, Shin J, Yu G. Collecting and analyzing multidimensional data with local differential privacy. In Proc. the 35th International Conference on Data Engineering (ICDE), Apr. 2019, pp.638–649. DOI: 10.1109/ICDE.2019.00063.
|
[28] |
Liu R X, Cao Y, Chen H, Guo R Y, Yoshikawa M. FLAME: Differentially private federated learning in the shuffle model. In Proc. the AAAI Conference on Artificial Intelligence, May 2021, pp.8688–8696. DOI: 10.1609/aaai.v35i10.17053.
|
[29] |
Chen R, Li H R, Qin A K, Kasiviswanathan S P, Jin H. Private spatial data aggregation in the local setting. In Proc. the 32nd IEEE International Conference on Data Engineering (ICDE), May 2016, pp.289–300. DOI: 10.1109/ICDE.2016.7498248.
|
[30] |
Gursoy M E, Tamersoy A, Truex S, Wei W Q, Liu L. Secure and utility-aware data collection with condensed local differential privacy. IEEE Trans. Dependable and Secure Computing, 2021, 18(5): 2365–2378. DOI: 10.1109/TDSC.2019.2949041.
|
[31] |
Murakami T, Kawamoto Y. Utility-optimized local differential privacy mechanisms for distribution estimation. In Proc. the 28th USENIX Conference on Security Symposium, Aug. 2019, pp.1877–1894. DOI: 10.5555/3361338.3361468.
|
[32] |
Gu X L, Li M, Xiong L, Cao Y. Providing input-discriminative protection for local differential privacy. In Proc. the 36th IEEE International Conference on Data Engineering (ICDE), Apr. 2020, pp.505–516. DOI: 10.1109/ICDE48307.2020.00050.
|
[33] |
Han X, Wang M, Zhang X J, Meng X F. Differentially private top-k query over mapreduce. In Proc. the 4th International Workshop on Cloud Data Management, Oct. 2012, pp.25–32. DOI: 10.1145/2390021.2390027.
|
[34] |
Evfimievski A, Gehrke J, Srikant R. Limiting privacy breaches in privacy preserving data mining. In Proc. the 22nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, Jun. 2003, pp.211–222. DOI: 10.1145/773153.773174.
|
[35] |
Bassily R, Nissim K, Stemmer U, Thakurta A. Practical locally private heavy hitters. In Proc. the 31st International Conference on Neural Information Processing Systems, Dec. 2017, pp.2288–2296. DOI: 10.5555/3294771.3294989.
|
[36] |
Bun M, Nelson J, Stemmer U. Heavy hitters and the structure of local privacy. In Proc. the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, May 2018, pp.435–447. DOI: 10.1145/3196959.3196981.
|
[37] |
Acharya J, Sun Z T, Zhang H. Hadamard response: Estimating distributions privately, efficiently, and with little communication. arXiv: 1802.04705, 2018. https://arxiv.org/abs/1802.04705, Dec. 2023.
|
[38] |
Liu Y H, Suresh A T, Yu F, Kumar S, Riley M. Learning discrete distributions: User vs item-level privacy. arXiv: 2007.13660, 2020. https://arxiv.org/abs/2007.13660, Dec. 2023.
|
[39] |
Kairouz P, Oh S, Viswanath P. Extremal mechanisms for local differential privacy. In Proc. the 27th International Conference on Neural Information Processing Systems, Dec. 2014, pp.2879–2887. DOI: 10.5555/2969033.2969148.
|
[1] | Xiao-Jun Zhu, Li-Jie Xu, Xiao-Bing Wu, Bing Chen. Minimum Time Extrema Estimation for Large-Scale Radio-Frequency Identification Systems[J]. Journal of Computer Science and Technology, 2020, 35(5): 1099-1114. DOI: 10.1007/s11390-020-9828-3 |
[2] | Yu-Geng Song, Hui-Min Cui, Xiao-Bing Feng. Parallel Incremental Frequent Itemset Mining for Large Data[J]. Journal of Computer Science and Technology, 2017, 32(2): 368-385. DOI: 10.1007/s11390-017-1726-y |
[3] | Yong-Xin Tong, Lei Chen, Jieying She. Mining Frequent Itemsets in Correlated Uncertain Databases[J]. Journal of Computer Science and Technology, 2015, 30(4): 696-712. DOI: 10.1007/s11390-015-1555-9 |
[4] | Hai-Bo Tian, Willy Susilo, Yang Ming, Yu-Min Wang. A Provable Secure ID-Based Explicit Authenticated Key Agreement Protocol Without Random Oracles[J]. Journal of Computer Science and Technology, 2008, 23(5): 832-842. |
[5] | Daniel Kunkle, Donghui Zhang, Gene Cooperman. Mining Frequent Generalized Itemsets and Generalized Association Rules Without Redundancy[J]. Journal of Computer Science and Technology, 2008, 23(1): 77-02. |
[6] | Jian-Hua Feng, Qian Qian, Jian-Yong Wang, Li-Zhu Zhou. Efficient Mining of Frequent Closed XML Query Pattern[J]. Journal of Computer Science and Technology, 2007, 22(5): 725-735. |
[7] | Shi-Zhu Liu, He-Ping Hu. Text Classification Using Sentential Frequent Itemsets[J]. Journal of Computer Science and Technology, 2007, 22(2): 334-337. |
[8] | DU XiaoPing, TANG ShiWei, Akifumi Makinouchi. Maintaining Discovered Frequent Itemsets: Cases for ChangeableDatabase and Support[J]. Journal of Computer Science and Technology, 2003, 18(5). |
[9] | LI Qingzhong, WANG Haiyang, YAN Zhongmin, MA Shaohan. Efficient Mining of Association Rules by Reducing the Number of Passes over the Database[J]. Journal of Computer Science and Technology, 2001, 16(2). |
[10] | HUANG Liusheng, CHEN Huaping, WANG Xun, CHEN Guoliang. A Fast Algorithm for Mining Association Rules[J]. Journal of Computer Science and Technology, 2000, 15(6): 619-624. |