
Experiments and Analyses of Anonymization Mechanisms for Trajectory Data Publishing

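  • Summary: 1. Background (Context)
    With advances in location-detection technology and the growing popularity of location-aware devices such as mobile phones, trajectory data keeps growing. Large-scale trajectory data enables a wide range of applications, but it also carries the risk of leaking private information such as individuals' locations.
    2. Objective
    Recently, Science hosted an interesting debate on the re-identifiability of individuals. The main finding of Sánchez et al. is exactly opposite to the view of de Montjoye et al., which raises the first question: in terms of re-identification, what is the true situation of privacy preservation for trajectories? In addition, anonymization usually degrades data utility, so anonymization mechanisms must balance privacy against utility. This raises the second question: what is the true situation of the utility of anonymized trajectories?
    To answer these two questions, we systematically evaluate and analyze existing anonymization mechanisms for trajectory data publishing from two perspectives: privacy and utility.
    3. Method
    Using three real trajectory datasets, we implement five anonymization mechanisms (identifier anonymization, grid-based anonymization, dummy trajectories, k-anonymity, and ε-differential privacy) and systematically evaluate the privacy and utility of the anonymized trajectory data. For privacy, we use the unicity of trajectories as the metric. For utility, we implement two real applications (travel time estimation and window range queries) and use their quantified performance as the metric.
    4. Results and Findings
    We find that, measured by unicity, identifier anonymization, dummy trajectories, and grid-based anonymization do not cope well with re-identification, whereas k-anonymity and differential privacy protect identity well. This confirms, to some extent, the finding of Sánchez et al. and shows the limitations of the anonymization mechanisms used by de Montjoye et al. It also answers the first question, on the true situation of privacy preservation for trajectories in terms of re-identification.
    We find that the utility of anonymized trajectories is determined jointly by the anonymization mechanism and the specific application algorithm run on the trajectory data. Apart from identifier anonymization, no current anonymization mechanism meets the utility requirements of all applications.
    We also find that, apart from ε-differential privacy applied to window range queries, no trajectory anonymization mechanism balances privacy and utility well. This answers the second question, on the true situation of the utility of anonymized trajectories.
    5. Conclusions
    This study shows that, as far as the utility of anonymized trajectories is concerned, the current situation is not encouraging. Trajectory privacy preservation still has a long way to go: on the one hand, we need to design better application algorithms that are more tolerant of anonymization mechanisms; on the other hand, we need to design anonymization mechanisms that affect the utility of anonymized trajectories less.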
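
As a rough illustration of the privacy metric named above, the following Python sketch estimates unicity as the fraction of trajectories that are the only match for p points drawn from their own records. It follows the common definition of unicity. The function names `coarsen` and `unicity`, the (x, y, t) point layout, and the cell and time-slot sizes are assumptions made for illustration, not the evaluation code used in the study.

```python
import math
import random


def coarsen(points, cell_size, slot_length):
    """Snap raw (x, y, t) points to grid cells and time slots.

    The point layout and both resolution parameters are hypothetical.
    """
    return {
        (math.floor(x / cell_size), math.floor(y / cell_size), math.floor(t / slot_length))
        for x, y, t in points
    }


def unicity(trajectories, p, seed=0):
    """Fraction of trajectories that are the only match for p of their own points.

    `trajectories` maps an id to a collection of hashable, sortable points.
    Trajectories with fewer than p distinct points are skipped.
    """
    rng = random.Random(seed)
    point_sets = {uid: set(points) for uid, points in trajectories.items()}
    eligible = [uid for uid, pts in point_sets.items() if len(pts) >= p]
    if not eligible:
        return 0.0
    unique = 0
    for uid in eligible:
        # Simulate an adversary who knows p points of this trajectory.
        known = set(rng.sample(sorted(point_sets[uid]), p))
        # Count every trajectory that is consistent with that knowledge.
        matches = sum(1 for pts in point_sets.values() if known <= pts)
        if matches == 1:
            unique += 1
    return unique / len(eligible)


# Toy data: three users, two points each (x and y in meters, t in minutes).
raw = {
    "u1": [(10, 10, 10), (520, 510, 70)],
    "u2": [(40, 30, 20), (570, 560, 80)],
    "u3": [(900, 900, 10), (200, 750, 70)],
}
fine = {uid: coarsen(pts, cell_size=10, slot_length=1) for uid, pts in raw.items()}
coarse = {uid: coarsen(pts, cell_size=100, slot_length=60) for uid, pts in raw.items()}

print(unicity(fine, p=2))
print(unicity(coarse, p=2))
```

On this toy input the fine grid leaves every trajectory unique (unicity 1.0), while the coarse grid merges u1 and u2 and lowers unicity to 1/3. Real data can behave differently: the findings above report that grid-based anonymization does not cope well with re-identification.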

     

    Abstract: With advances in location-detection technologies and the increasing popularity of mobile phones and other location-aware devices, trajectory data is continuously growing. While large-scale trajectories provide opportunities for various applications, the locations in trajectories pose a threat to individual privacy. Recently, there has been an interesting debate in Science magazine on the re-identifiability of individuals. The main finding of Sánchez et al. is exactly opposite to that of de Montjoye et al., which raises the first question: "What is the true situation of privacy preservation for trajectories in terms of re-identification?" Furthermore, it is known that anonymization typically causes a decline in data utility, and anonymization mechanisms need to consider the trade-off between privacy and utility. This raises the second question: "What is the true situation of the utility of anonymized trajectories?" To answer these two questions, we conduct a systematic experimental study using three real-life trajectory datasets, five existing anonymization mechanisms (i.e., identifier anonymization, grid-based anonymization, dummy trajectories, k-anonymity, and ε-differential privacy), and two practical applications (i.e., travel time estimation and window range queries). Our findings reveal the true situation of privacy preservation for trajectories in terms of re-identification and the true situation of the utility of anonymized trajectories, and essentially close the debate between de Montjoye et al. and Sánchez et al. To the best of our knowledge, this study is among the first systematic evaluations and analyses of anonymized trajectories with respect to individual privacy, measured by unicity, and utility, measured by practical applications.
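
To make the utility side more concrete, the sketch below runs one window range query on original and anonymized trajectories and compares the two answers. The F1 comparison, the function names, and the assumption that trajectory ids stay comparable across the two datasets are illustrative choices; the abstract does not state which error measure the study uses.

```python
def window_range_query(trajectories, x_min, x_max, y_min, y_max, t_min, t_max):
    """Return the ids of trajectories with at least one point inside the window.

    `trajectories` maps an id to a list of (x, y, t) points (hypothetical layout).
    """
    hits = set()
    for uid, points in trajectories.items():
        for x, y, t in points:
            if x_min <= x <= x_max and y_min <= y <= y_max and t_min <= t <= t_max:
                hits.add(uid)
                break  # one point inside the window is enough
    return hits


def f1_score(truth, answer):
    """Harmonic mean of precision and recall between two id sets."""
    if not truth and not answer:
        return 1.0
    true_positives = len(truth & answer)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(answer)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)


# Toy data: anonymization moved "b" into the query window.
original = {"a": [(1.0, 1.0, 5)], "b": [(9.0, 9.0, 5)]}
anonymized = {"a": [(1.2, 0.8, 5)], "b": [(2.0, 2.0, 5)]}
window = dict(x_min=0, x_max=3, y_min=0, y_max=3, t_min=0, t_max=10)

truth = window_range_query(original, **window)
answer = window_range_query(anonymized, **window)
print(sorted(truth), sorted(answer), f1_score(truth, answer))
```

Here the original data returns only "a", the anonymized data returns "a" and "b", and the spurious hit lowers precision to 0.5 and F1 to 2/3. Repeating the comparison over many windows would give one possible utility score per mechanism.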

     
