We use cookies to improve your experience with our site.

本地化差分隐私技术下基于哈达玛编码的频繁项集挖掘

Hadamard Encoding Based Frequent Itemset Mining under Local Differential Privacy

  • 摘要:
    频繁项集挖掘是数据挖掘中的重要问题,而在挖掘中使用本地化差分隐私技术收集敏感信息能有效的保障用户隐私。大部分本地化差分隐私技术主要应对的是用户只有一个项值的情况,而当每个用户都有一组项值时,现有的本地化差分隐私技术使用“填充和采样”的步骤来获得频繁项集及其频率。目前最优的算法SVSM也需要平衡结果的方差和偏差来获得相对准确的结果。如何能同时降低方差和偏差是目前满足本地化差分隐私技术的频繁项集挖掘的挑战。
    为了解决这个问题,我们提出一种满足值-级别的本地化隐私方法IHFO,它是首次把哈达玛编码引入来应对用户拥有值集的情况,因为哈达玛编码能将任意多的值集编码为一个固定的向量,并同时使用频数估计和均值估计的本地化差分隐私算法对其增加扰动。
    在此基础上我们提出了一种优化的联合项目集挖掘方法(O-UISM)来进行频繁项集挖掘。O-UISM将基于SVSM中应用的PSFO和本文的IHFO结合到一个框架中,因为我们发现PSFO算法在不考虑频数准确性的前提下,寻找频繁一项集更加准确,而本文提出的IHFO对于获取频率估计更加准确,因此将两者的优势一起来使用,从而可以获得更加准确的频繁项集及其频率。
    最后我们从理论和实验上证明的在相同的隐私保护条件下,O-UISM在寻找频繁项集和项集的频率估计方面都优于之前的算法。

     

    Abstract: Local differential privacy (LDP) approaches to collecting sensitive information for frequent itemset mining (FIM) can reliably guarantee privacy. Most current approaches to FIM under LDP add “padding and sampling” steps to obtain frequent itemsets and their frequencies because each user transaction represents a set of items. The current state-of-the-art approach, namely set-value itemset mining (SVSM), must balance variance and bias to achieve accurate results. Thus, an unbiased FIM approach with lower variance is highly promising. To narrow this gap, we propose an Item-Level LDP frequency oracle approach, named the Integrated-with-Hadamard-Transform-Based Frequency Oracle (IHFO). For the first time, Hadamard encoding is introduced to a set of values to encode all items into a fixed vector, and perturbation can be subsequently applied to the vector. An FIM approach, called optimized united itemset mining (O-UISM), is proposed to combine the padding-and-sampling-based frequency oracle (PSFO) and the IHFO into a framework for acquiring accurate frequent itemsets with their frequencies. Finally, we theoretically and experimentally demonstrate that O-UISM significantly outperforms the extant approaches in finding frequent itemsets and estimating their frequencies under the same privacy guarantee.

     

/

返回文章
返回