Journal of Computer Science and Technology

   

Hadamard Encoding based Frequent Itemset Mining under Local Differential Privacy Constraints

Dan Zhao1,2(赵丹), Su-yun Zhao2(赵素云), Member, CCF, Hong Chen2,∗(陈红), Distinguished Member, CCF, Rui-xuan Liu2(刘睿瑄), Cui-ping Li2(李翠平), Distinguished Member, CCF, and Xiao-ying Zhang2(张晓莹)   

  1. 1Institute of Scientific and Technical Information of China, Beijing 100038, China
    2Key Laboratory of Data Engineering and Knowledge Engineering, Ministry of Education School of Information, Renmin University of China, Beijing 100872, China
  • Received:2021-01-31 Revised:2023-02-11 Accepted:2023-02-21
  • Contact: Hong Chen E-mail:chong@ruc.edu.cn
  • About author:Hong Chen received the B.S., M.S., and Ph.D. degrees, professor, doctoral supervisor, distinguished member of CCF. She is currently with the School of Information, Renmin University of China. Her current research interests include high-performance database system, data warehouse and data mining, stream data management, and data management in wireless sensor networks.

Local differential privacy (LDP) approaches to collect sensitive information for frequent itemset mining (FIM) can reliably guarantee privacy. Most current approaches to FIM under LDP add “padding and sampling” steps to obtain frequent itemsets and their frequencies because each user transaction represents a set of items. The current state-of-the-art approach, namely set-value itemset mining (SVSM), must balance variance and bias to achieve accurate results. Thus, an unbiased FIM approach with lower variance is highly promising. To narrow this gap, we propose an item-level LDP frequency oracle approach, named the integrated-with-Hadamard-transform-based frequency oracle (IHFO). For the first time, Hadamard encoding is introduced to a set of values to encode all items into a fixed vector, allowing perturbation adding in IHFO. For this purpose, an FIM approach, called optimized united itemset mining (O-UISM), is proposed. O-UISM combines the padding-and-sampling-based frequency oracle (PSFO) and the IHFO into a framework to acquire accuracy frequent itemsets with their frequencies. Finally, we theoretically and experimentally demonstrate that O-UISM significantly outperforms the extant approaches in finding frequent itemsets and estimating their frequencies under the same privacy guarantee.


中文摘要

频繁项集挖掘是数据挖掘中的重要问题,而在挖掘中使用本地化差分隐私技术收集敏感信息能有效的保障用户隐私。大部分本地化差分隐私技术主要应对的是用户只有一个项值的情况,而当每个用户都有一组项值时,现有的本地化差分隐私技术使用“填充和采样”的步骤来获得频繁项集及其频率。目前最优的算法SVSM也需要平衡结果的方差和偏差来获得相对准确的结果。如何能同时降低方差和偏差是目前满足本地化差分隐私技术的频繁项集挖掘的挑战。为了解决这个问题,我们提出一种满足值-级别的本地化隐私方法IHFO,它是首次把哈达玛编码引入来应对用户拥有值集的情况,因为哈达玛编码能将任意多的值集编码为一个固定的向量,并同时使用频数估计和均值估计的本地化差分隐私算法对其增加扰动。在此基础上我们提出了一种优化的联合项目集挖掘方法(O-UISM)来进行频繁项集挖掘。O-UISM将基于SVSM中应用的PSFO和本文的IHFO结合到一个框架中,因为我们发现PSFO算法在不考虑频数准确性的前提下,寻找频繁一项集更加准确,而本文提出的IHFO对于获取频率估计更加准确,因此将两者的优势一起来使用,从而可以获得更加准确的频繁项集及其频率。最后我们从理论和实验上证明的在相同的隐私保护条件下,O-UISM在寻找频繁项集和项集的频率估计方面都优于之前的算法。

Key words: Local Differential Privacy; Frequent Itemset Mining; Frequency Oracle;

[1] Yu-Geng Song, Hui-Min Cui, Xiao-Bing Feng. Parallel Incremental Frequent Itemset Mining for Large Data [J]. , 2017, 32(2): 368-385.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!

ISSN 1000-9000(Print)

         1860-4749(Online)
CN 11-2296/TP

Home
Editorial Board
Author Guidelines
Subscription
Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
Tel.:86-10-62610746
E-mail: jcst@ict.ac.cn
 
  Copyright ©2015 JCST, All Rights Reserved