Privacy-Preserving Algorithms for Multiple Sensitive Attributes Satisfying t-Closeness

doi:10.1007/s11390-018-1884-6

Abstract

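Summary: Although k-anonymity is a commonly used model for anonymized data publishing, it cannot resist common attacks such as attribute disclosure and the similarity attack. To overcome this weakness, many refinements of k-anonymity have been proposed, among which t-closeness is one of the strictest. Most existing t-closeness models focus on privacy protection for datasets with a single sensitive attribute, yet datasets with multiple sensitive attributes are more common in practice. To address this limitation, two privacy-preserving algorithms that satisfy t-closeness for multiple sensitive attributes are proposed. They build on the observation that t-closeness is easier to achieve when the sensitive attribute values of the original dataset are spread evenly across every equivalence class. Both algorithms first partition the original data according to sensitive attribute values, and then select suitable records from different partitions, according to the similarity of their quasi-identifier attributes, to construct equivalence classes. Experimental results show that the two proposed algorithms protect data privacy effectively while reducing unnecessary information loss.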

Abstract: Although k-anonymity is a widely used model for publishing microdata for research purposes, it cannot resist several common attacks, such as attribute disclosure and the similarity attack. To resist these attacks, many refinements of k-anonymity have been proposed, with t-closeness being one of the strictest privacy models. While most existing t-closeness models address the case in which the original data have a single sensitive attribute, data with multiple sensitive attributes are more common in practice. In this paper, we fill this gap with two algorithms that handle multiple sensitive attributes and make the published data satisfy t-closeness. Both algorithms build on the observation that, for the published data to satisfy t-closeness, the sensitive attribute values in every equivalence class must be spread as widely as possible over the values found in the entire dataset. The two algorithms use different methods to partition records into groups according to their sensitive attributes: one uses a clustering method, while the other leverages principal component analysis. Then, according to the similarity of quasi-identifier attributes, records are selected from different groups to construct each equivalence class, which reduces the information loss incurred during anonymization as much as possible. The proposed algorithms are evaluated on a real dataset. The results show that the first algorithm is on average slower than the second, but preserves more of the original information. In addition, compared with related approaches, both proposed algorithms achieve stronger privacy protection and incur less information loss.
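The two-step idea in the abstract (partition by sensitive attributes, then assemble equivalence classes from different groups by quasi-identifier similarity) can be illustrated with a minimal Python sketch. It is not the paper's algorithm: grouping records by their exact tuple of sensitive values stands in for the clustering and PCA-based partitioning, the closeness check uses the variational distance (which equals the Earth Mover's Distance only when all distinct categorical values are treated as equally far apart), and checking each sensitive attribute separately is an assumption. All function names, attribute names, and records below are hypothetical.

```python
from collections import Counter, defaultdict


def variational_distance(class_values, all_values):
    """Half the L1 distance between two empirical distributions."""
    p, q = Counter(class_values), Counter(all_values)
    n_p, n_q = len(class_values), len(all_values)
    return 0.5 * sum(abs(p[v] / n_p - q[v] / n_q) for v in set(p) | set(q))


def qi_distance(a, b, qi_keys):
    """Sum of absolute differences over numeric quasi-identifiers."""
    return sum(abs(a[k] - b[k]) for k in qi_keys)


def build_classes(records, qi_keys, sa_keys):
    # Step 1 (assumed stand-in for clustering or PCA): partition records
    # by their exact tuple of sensitive attribute values.
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[k] for k in sa_keys)].append(rec)
    buckets = list(groups.values())

    # Step 2: build each class from one record per non-empty bucket,
    # choosing the record closest to the seed in quasi-identifier space.
    classes = []
    while any(buckets):
        source = max(buckets, key=len)
        seed = source.pop(0)
        eq_class = [seed]
        for bucket in buckets:
            if bucket is source or not bucket:
                continue
            nearest = min(bucket, key=lambda r: qi_distance(seed, r, qi_keys))
            bucket.remove(nearest)
            eq_class.append(nearest)
        classes.append(eq_class)
    return classes


def max_distance(classes, records, sa_keys):
    """Largest per-attribute distance between any class and the full table."""
    return max(
        variational_distance([r[k] for r in c], [r[k] for r in records])
        for c in classes
        for k in sa_keys
    )


# Hypothetical toy table: two quasi-identifiers, two sensitive attributes.
RECORDS = [
    {"age": 25, "zip": 100, "disease": "flu", "income": "low"},
    {"age": 27, "zip": 101, "disease": "cancer", "income": "high"},
    {"age": 26, "zip": 102, "disease": "asthma", "income": "mid"},
    {"age": 60, "zip": 900, "disease": "flu", "income": "low"},
    {"age": 62, "zip": 905, "disease": "cancer", "income": "high"},
    {"age": 61, "zip": 903, "disease": "asthma", "income": "mid"},
]

classes = build_classes(RECORDS, ["age", "zip"], ["disease", "income"])
t = 0.2  # illustrative threshold, not a value from the paper
print(max_distance(classes, RECORDS, ["disease", "income"]) <= t)
```

On this balanced toy table every class draws one record from each sensitive group, so each class mirrors the overall distribution while still grouping records with nearby ages and zip codes. With unbalanced groups the last classes produced by this greedy loop can be poorly mixed, so a real implementation would need to merge or redistribute leftover records before checking the threshold.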
