Abstract The big data era is coming with strong and ever-growing demands on analyzing personal information and footprints in the cyber world. To enable such analysis without privacy leak risk, differential privacy (DP) is quickly rising in recent years, as the first practical privacy protection model with rigorous theoretical guarantee. This paper discusses how to publish differentially private histograms on events in time series domain, with sequences of personal events over graphs with events as edges. Such individual-generated sequences commonly appear in formalized industrial workflows, online game logs and spatial-temporal trajectories, the direct publication of which may compromise personal privacy. While existing DP mechanisms mainly target at normalized domains with fixed and aligned dimensions, our problem raises new challenges when the sequences could follow arbitrary paths on the graph. To tackle the problem, we reformulate the problem with a three-step framework, which 1) carefully truncates the original sequences, trading off errors introduced by the truncation with those introduced by the noise added to guarantee privacy, 2) decomposes the event graph into path sub-domains based on the given query workload, and 3) employs a deeply optimized tree-based histogram construction approach for each sub-domain to benefit with less noise addition. We present a careful analysis on our framework to support thorough optimizations over each step of the framework, and verify the huge improvements of our proposals over state-of-the-art solutions.
This work was supported by the National Natural Science Foundation of China under Grant Nos. 61472071 and 61272179, the Fundamental Research Funds for Central Universities of China under Grant No. 161604005, and the National Basic Research 973 Program of China under Grant No. 2012CB316201.
Corresponding Authors: Ge Yu,email@example.com
About author: Ning Wang is a Ph.D. candidate in computer software and theory, Northeastern University, Shenyang. She received her B.E. and M.E. degrees in computer science both from Northeastern University, Shenyang, in 2011 and 2013 respectively. Her current area of research is data privacy.
Cite this article:
Ning Wang, Yu Gu, Jia Xu, Fang-Fang Li, Ge Yu.Differentially Private Event Histogram Publication on Sequences over Graphs[J] Journal of Computer Science and Technology, 2017,V32(5): 1008-1024
 Poel D V, Buckinx W. Predicting online-purchasing behaviour. European Journal of Operational Research, 2005, 166(2):557-575. Wang J, Song S, Zhu X, Lin X. Efficient recovery of missing events. Proceedings of the VLDB Endowment, 2013, 6(10):841-852. Jeung H, Yiu M L, Zhou X, Jensen C S. Path prediction and predictive range querying in road network databases. The International Journal on Very Large Data Bases, 2010, 19(4):585-602. Song R, Sun W, Zheng B, Zheng Y. Press:A novel framework of trajectory compression in road networks. Proceedings of the VLDB Endowment, 2014, 7(9):661-672. Liu C, White R W, Dumais S T. Understanding web browsing behaviors through Weibull analysis of dwell time. In Proc. the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, July 2010, pp.379-386. Dwork C, McSherry F, Nissim K, Smith A. Calibrating noise to sensitivity in private data analysis. In Proc. the 3rd Conference on Theory of Cryptography, March 2006, pp.265-284. Götz M, Nath S, Gehrke J. Maskit:Privately releasing user context streams for personalized mobile applications. In Proc. the 2012 ACM SIGMOD International Conference on Management of Data, May 2012, pp.289-300. He Y, Barman S, Wang D, Naughton J F. On the complexity of privacy-preserving complex event processing. In Proc. the 30th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, June 2011, pp.165-174. Wang D, He Y, Rundensteiner E A, Naughton J F. Utilitymaximizing event stream suppression. In Proc. the 2013 ACM SIGMOD International Conference on Management of Data, June 2013, pp.589-600. Rastogi V, Nath S. Differentially private aggregation of distributed time-series with transformation and encryption. In Proc. the 2010 ACM SIGMOD International Conference on Management of Data, June 2010, pp.735-746. Chen R, Ács G, Castelluccia C. Differentially private se-quential data publication via variable-length n-grams. In Proc. the 2012 ACM conference on Computer and Communications Security, October 2012, pp.638-649. Zeng C, Naughton J F, Cai J. On differentially private frequent itemset mining. Proceedings of the VLDB Endowment, 2012, 6(1):25-36. Hay M, Rastogi V, Miklau G, Suciu D. Boosting the accuracy of differentially private histograms through consistency. Proceedings of the VLDB Endowment, 2010, 3(1):1021-1032. Qardaji W H, Yang W, Li N. Understanding hierarchical methods for differentially private histograms. Proceedings of the VLDB Endowment, 2013, 6(14):1954-1965. Li H G, Chen S, Tatemura J, Agrawal D, Candan K S, Hsiung W P. Safety guarantee of continuous join queries over punctuated data streams. In Proc. the 32nd International Conference on Very Large Data Bases, September 2006, pp.19-30. Maheshwari P, Tam S S L. Events-based exception handling in supply chain management using Web services. In Proc. the Advanced Int. Conference on Telecommunications and Int. Conference on Internet and Web Applications and Services, February 2006, pp.151-156. Roh G P, Roh J W, Hwang S, Yi B K. Supporting patternmatching queries over trajectories on road networks. IEEE Transactions on Knowledge and Data Engineering, 2011, 23(11):1753-1758. Maruseac M, Ghinita G, Rughinis R. Privacy-preserving publication of provenance workflows. In Proc. the 4th ACM Conference on Data and Application Security and Privacy, February 2014, pp.159-162. Chen R, Fung B C M, Desai B C, Sossou N M. Differentially private transit data publication:A case study on the Montreal transportation system. In Proc. the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2012, pp.213-221. McSherry F, Mahajan R. Differentially-private network trace analysis. In Proc. the ACM SIGCOMM 2010 Conference, August 2010, pp.123-134. He X, Cormode G, Machanavajjhala A, Procopiuc C M, Srivastava D. DPT:Differentially private trajectory synthesis using hierarchical reference systems. Proceedings of the VLDB Endowment, 2015, 8(11):1154-1165. Kellaris G, Papadopoulos S. Practical differential privacy via grouping and smoothing. Proceedings of the VLDB Endowment, 2013, 6(5):301-312. Yuan G, Zhang Z, Winslett M, Xiao X, Yang Y, Hao Z. Lowrank mechanism:Optimizing batch queries under differential privacy. Proceedings of the VLDB Endowment, 2012, 5(11):1352-1363. Li C, Miklau G. An adaptive mechanism for accurate query answering under differential privacy. Proceedings of the VLDB Endowment, 2012, 5(6):514-525. Li C, Hay M, Miklau G, Wang Y. A data-and workloadaware query answering algorithm for range queries under differential privacy. Proceedings of the VLDB Endowment, 2014, 7(5):341-352.