Journal of Computer Science and Technology, 2019, Vol. 34, Issue (4): 775-794. doi: 10.1007/s11390-019-1942-8

Special Topic: Data Management and Data Mining


Searching Activity Trajectories with Semantics

Li-Hua Yin1, Member, CCF, Huiwen Liu2,*

  1 Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou 510006, China;
  2 School of Information System, Singapore Management University, Singapore 188065, Singapore
  • Received: 2019-01-22 Revised: 2019-05-27 Online: 2019-07-11 Published: 2019-07-11
  • Contact: Huiwen Liu E-mail: hwliu.2018@phdis.smu.edu.sg
  • About the author: Li-Hua Yin received her Ph.D. degree in computer science and technology from Harbin Institute of Technology, Harbin, in 2007. She is a professor at Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou. Her research interests include information security, big data privacy protection, etc. She is a member of CCF and CIPS.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China under Grant No. 61872100.

Abstract: With the widespread use of smartphones and the mobile Internet, social network users have generated massive numbers of geo-tagged tweets, photos and videos. These form informative trajectories that reveal not only the users' spatio-temporal dynamics but also their activities in the physical world. Existing studies on spatial trajectory queries mainly analyze the spatio-temporal properties of users' trajectories and leave the understanding of their activities largely untouched. In this paper, we incorporate the semantics of the activity information embedded in trajectories into query modelling and processing, with the aim of providing end users with more informative and meaningful results. To this end, we propose a novel trajectory query that not only considers spatio-temporal closeness but also, more importantly, leverages a proven text-mining technique, probabilistic topic modelling, to capture the semantic relatedness between the activities in the data and those in the query. To support efficient query processing, we design a hierarchical grid-based index that integrates the probabilistic topic distributions of trajectory substructures with their spatio-temporal extent at the corresponding level of the index hierarchy. This specialized structure enables a top-down search algorithm to traverse the index while pruning unqualified trajectories in the spatial and topical dimensions simultaneously. Experimental results on real-world datasets demonstrate that the proposed index and trajectory search methods are efficient and scalable.

Key words: spatio-temporal database, activity trajectory, semantic understanding, trajectory indexing, trajectory query processing
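To make the pruning idea in the abstract concrete, here is a minimal sketch of a search that prunes on space and topics at once. It is not the authors' index or algorithm. It uses a single-level grid rather than a hierarchy, treats each trajectory as one activity point, and assumes topic vectors are already available from some topic model. The scoring function, the weight `alpha`, the cell size `CELL`, and every function name are assumptions made for illustration only.

```python
import math
from collections import defaultdict

CELL = 1.0  # assumed grid cell side length


def cell_of(x, y):
    """Map a location to its grid cell coordinates."""
    return (math.floor(x / CELL), math.floor(y / CELL))


def min_dist_to_cell(qx, qy, cell):
    """Smallest distance from the query point to any location in the cell."""
    x0, y0 = cell[0] * CELL, cell[1] * CELL
    dx = max(x0 - qx, 0.0, qx - (x0 + CELL))
    dy = max(y0 - qy, 0.0, qy - (y0 + CELL))
    return math.hypot(dx, dy)


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def score(dist, topic_sim, alpha):
    """Assumed ranking function: spatial closeness blended with topic similarity."""
    return alpha / (1.0 + dist) + (1.0 - alpha) * topic_sim


def build_index(points):
    """points: list of (id, x, y, topic_vector).

    Each cell stores its members and the component-wise maximum of their
    topic vectors, which later bounds the topic similarity of any member.
    """
    cells = defaultdict(list)
    for p in points:
        cells[cell_of(p[1], p[2])].append(p)
    index = {}
    for cell, members in cells.items():
        topic_max = [max(col) for col in zip(*(m[3] for m in members))]
        index[cell] = (members, topic_max)
    return index


def top1(index, qx, qy, q_topics, alpha=0.5):
    """Return (best id, best score, number of cells actually opened).

    The cell bound is valid only for non-negative query topic weights.
    """
    bounds = sorted(
        (
            (score(min_dist_to_cell(qx, qy, c), dot(q_topics, tmax), alpha), c)
            for c, (_, tmax) in index.items()
        ),
        reverse=True,
    )
    best_id, best_score, opened = None, -1.0, 0
    for upper_bound, c in bounds:
        if upper_bound <= best_score:
            break  # no remaining cell can contain a better item
        opened += 1
        for pid, x, y, topics in index[c][0]:
            s = score(math.hypot(x - qx, y - qy), dot(q_topics, topics), alpha)
            if s > best_score:
                best_id, best_score = pid, s
    return best_id, best_score, opened
```

Each cell's bound combines the closest possible distance with the largest possible topic similarity, so a cell is skipped only when nothing inside it could outrank the current best. A hierarchical version would apply the same bound at each level while descending from coarse cells to fine ones.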
