Hadamard Encoding based Frequent Itemset Mining under Local Differential Privacy Constraints
Abstract
Local differential privacy (LDP) approaches to collecting sensitive information for frequent itemset mining (FIM) can reliably guarantee privacy. Because each user transaction is a set of items, most current approaches to FIM under LDP add “padding and sampling” steps to obtain frequent itemsets and their frequencies. The current state-of-the-art approach, set-value itemset mining (SVSM), must balance variance against bias to achieve accurate results. An unbiased FIM approach with lower variance is therefore highly desirable. To narrow this gap, we propose an item-level LDP frequency oracle, named the integrated-with-Hadamard-transform-based frequency oracle (IHFO). For the first time, Hadamard encoding is applied to set-valued data: all items are encoded into a fixed vector, to which IHFO then adds perturbation. Building on IHFO, we propose an FIM approach called optimized united itemset mining (O-UISM). O-UISM combines the padding-and-sampling-based frequency oracle (PSFO) and IHFO in a single framework to obtain accurate frequent itemsets together with their frequencies. Finally, we demonstrate theoretically and experimentally that O-UISM significantly outperforms existing approaches in finding frequent itemsets and estimating their frequencies under the same privacy guarantee.
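For readers unfamiliar with Hadamard-based perturbation, the following minimal sketch illustrates the generic single-item Hadamard randomized response frequency oracle on which such encodings build. It is not IHFO itself, which handles sets of items. The function names, the domain size d (assumed to be a power of two), and the item distribution in the usage note are illustrative assumptions.

```python
import math
import random


def hadamard_entry(row, col):
    """Entry (row, col) of the Sylvester Hadamard matrix, in {+1, -1}."""
    # The sign is (-1)^(number of shared 1-bits of row and col).
    return -1 if bin(row & col).count("1") % 2 else 1


def perturb(item, d, epsilon, rng):
    """Client side: report one random Hadamard coefficient, randomly flipped."""
    col = rng.randrange(d)                  # column chosen independently of the item
    bit = hadamard_entry(item, col)
    keep_prob = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    if rng.random() >= keep_prob:           # flip with probability 1 / (e^eps + 1)
        bit = -bit
    return col, bit


def estimate_frequencies(reports, d, epsilon):
    """Server side: unbiased frequency estimate for every item in [0, d)."""
    scale = (math.exp(epsilon) + 1.0) / (math.exp(epsilon) - 1.0)
    n = len(reports)
    estimates = []
    for x in range(d):
        # Rows of the Hadamard matrix are orthogonal, so reports from users
        # holding a different item cancel out in expectation.
        total = sum(bit * hadamard_entry(x, col) for col, bit in reports)
        estimates.append(scale * total / n)
    return estimates
```

As a usage example, one could call `perturb(item, 8, 2.0, random.Random(0))` once per user and pass the collected `(col, bit)` pairs to `estimate_frequencies`. Each user sends a single bit plus a column index, and the estimator is unbiased because the scaling factor cancels the bias introduced by the random flip.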