计算机科学技术学报 ›› 2019,Vol. 34 ›› Issue (4): 747-761.doi: 10.1007/s11390-019-1940-x

所属专题: Data Management and Data Mining

• • 上一篇    下一篇

分布式数据流中轨迹大数据的自适应连接方法

Jun-Hua Fang1,2, Member, CCF, Peng-Peng Zhao1, An Liu1, Member, CCF, Zhi-Xu Li1, Member, CCF, ACM, IEEE, Lei Zhao1, Member, CCF   

  1. 1 Institute of Artificial Intelligence, School of Computer Science and Technology, Soochow University, Suzhou 215006, China;
    2 Neusoft Corporation, Shenyang 110179, China
  • 收稿日期:2019-01-13 修回日期:2019-05-13 出版日期:2019-07-11 发布日期:2019-07-11
  • 作者简介:Jun-Hua Fang is a lecturer in Advanced Data Analytics Group at the School of Computer Science and Technology,Soochow University,Suzhou.Before joining Soochow University,he earned his Ph.D.degree in computer science from East China Normal University,Shanghai,in 2017.He is a member of CCF.His research focuses on distributed database and parallel streaming analytics.
  • 基金资助:
    This work was partially supported by the National Natural Science Foundation of China under Grant Nos. 61802273, 61772356, and 61836007, the Postdoctoral Science Foundation of China under Grant No. 2017M621813, the Postdoctoral Science Foundation of Jiangsu Province of China under Grant No. 2018K029C, and the Natural Science Foundation for Colleges and Universities in Jiangsu Province of China under Grant No. 18KJB520044. This work was also supported by the Open Program of Neusoft Corporation under Grant No. SKLSAOP1801 and Blockshine Technology Corporation of China.

Scalable and Adaptive Joins for Trajectory Data in Distributed Stream System

Jun-Hua Fang1,2, Member, CCF, Peng-Peng Zhao1, An Liu1, Member, CCF, Zhi-Xu Li1, Member, CCF, ACM, IEEE, Lei Zhao1, Member, CCF   

  1. 1 Institute of Artificial Intelligence, School of Computer Science and Technology, Soochow University, Suzhou 215006, China;
    2 Neusoft Corporation, Shenyang 110179, China
  • Received:2019-01-13 Revised:2019-05-13 Online:2019-07-11 Published:2019-07-11
  • Supported by:
    This work was partially supported by the National Natural Science Foundation of China under Grant Nos. 61802273, 61772356, and 61836007, the Postdoctoral Science Foundation of China under Grant No. 2017M621813, the Postdoctoral Science Foundation of Jiangsu Province of China under Grant No. 2018K029C, and the Natural Science Foundation for Colleges and Universities in Jiangsu Province of China under Grant No. 18KJB520044. This work was also supported by the Open Program of Neusoft Corporation under Grant No. SKLSAOP1801 and Blockshine Technology Corporation of China.

近年来,移动对象间的相似度匹配作为基于位置服务的基本操作正在引起广泛关注。由于轨迹数据量的飞速增长和基于轨迹数据的更复杂应用需求出现,人们对轨迹相似度处理的需求在质和量上都有提升,即轨迹大数据的相似度处理正在要求以实时且大吞吐量的方式进行。现有工作已经将轨迹相似度的计算部署至分布式并行计算架构中。然而,这些工作不能保证轨迹相似度的实时处理。具体来说,直接将轨迹处理部署至现有处理系统中会引起系统负载的持续且动态的不均衡。基于此,本文针对实时轨迹相似度操作提出了一种自适应的计算架构。这种架构能够保证系统在动态负载下保持并行处理节点的均衡状态进而最大化系统吞吐量,同时使系统自适应的裁剪掉不必要的负载。实验结果从不同方面证明了本文方案的优越性。

关键词: 实时数据处理, 分布式计算, 轨迹相似度, 负载均衡

Abstract: As a fundamental operation in LBS (location-based services), the trajectory similarity of moving objects has been extensively studied in recent years. However, due to the increasing volume of moving object trajectories and the demand of interactive query performance, the trajectory similarity queries are now required to be processed on massive datasets in a real-time manner. Existing work has proposed distributed or parallel solutions to enable large-scale trajectory similarity processing. However, those techniques cannot be directly adapted to the real-time scenario as it is likely to generate poor balancing performance when workload variance occurs on the incoming trajectory stream. In this paper, we propose a new workload partitioning framework, ART (Adaptive Framework for Real-Time Trajectory Similarity), which introduces practical algorithms to support dynamic workload assignment for RTTS (real-time trajectory similarity). Our proposal includes a processing model tailored for the RTTS scenario, a load balancing framework to maximize throughput, and an adaptive data partition manner designed to cut off unnecessary network cost. Based on this, our model can handle the large-scale trajectory similarity in an on-line scenario, which achieves scalability, effectiveness, and efficiency by a single shot. Empirical studies on synthetic data and real-world stream applications validate the usefulness of our proposal and prove the huge advantage of our approach over state-of-the-art solutions in the literature.

Key words: real-time data processing, distributed computing, trajectory similarity, load balancing

[1] Trasarti R, Pinelli F, Nanni M et al. Mining mobility user profiles for car pooling. In Proc. the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2011, pp.1190-1198.
[2] Zheng K, Zheng Y, Yuan N J et al. Online discovery of gathering patterns over trajectories. IEEE Transactions on Knowledge and Data Engineering, 2014, 26(8):1974-1988.
[3] Li L, Kim J, Xu J et al. Time-dependent route scheduling on road networks. SIGSPATIAL Special, 2018, 10(1):10-14.
[4] Shang S, Ding R, Zheng K et al. Personalized trajectory matching in spatial networks. The International Journal on Very Large Data Bases, 2014, 23(3):449-468.
[5] Shang S, Chen L, Wei Z et al. Collective travel planning in spatial networks. IEEE Transactions on Knowledge and Data Engineering, 2016, 28(5):1132-1146.
[6] Su H, Zheng K, Wang H et al. Calibrating trajectory data for similarity-based analysis. In Proc. the 2013 ACM SIGMOD International Conference on Management of Data, June 2013, pp.833-844.
[7] Su H, Zheng K, Huang J et al. Calibrating trajectory data for spatio-temporal similarity analysis. The International Journal on Very Large Data Bases, 2015, 24(1):93-116.
[8] Sun J, Xu J, Zhou R et al. Discovering expert drivers from trajectories. In Proc. the 34th International Conference on Data Engineering, April 2018, pp.1332-1335.
[9] Shang S, Chen L, Wei Z et al. Parallel trajectory similarity joins in spatial networks. The International Journal on Very Large Data Bases, 2018, 27(3):395-420.
[10] Shang S, Chen L, Wei Z et al. Trajectory similarity join in spatial networks. Proceedings of the VLDB Endowment, 2017, 10(11):1178-1189.
[11] Xie D, Li F, Phillips J M. Distributed trajectory similarity search. Proceedings of the VLDB Endowment, 2017, 10(11):1478-1489.
[12] Toshniwal A, Taneja S, Shukla A et al. Storm@twitter. In Proc. the 2014 ACM SIGMOD International Conference on Management of Data, June 2014, pp.147-156.
[13] Kulkarni S, Bhagat N, Fu M et al. Twitter heron:Stream processing at scale. In Proc. the 2015 ACM SIGMOD International Conference on Management of Data, May 2015, pp.239-250.
[14] Zaharia M, Das T, Li H et al. Discretized streams:Faulttolerant streaming computation at scale. In Proc. the 24th ACM SIGOPS Symposium on Operating Systems Principles, November 2013, pp.423-438.
[15] Nasir M A U, Morales G D F, Garcia-Soriano D et al. The power of both choices:Practical load balancing for distributed stream processing engines. In Proc. the 31st IEEE International Conference on Data Engineering, April 2015, pp.137-148.
[16] Nasir M A U, Morales G D F, Kourtellis N et al. When two choices are not enough:Balancing at scale in distributed stream processing. In Proc. the 32nd IEEE International Conference on Data Engineering, May 2016, pp.589-600.
[17] Lin Q, Ooi B C, Wang Z et al. Scalable distributed stream join processing. In Proc. the 2015 ACM SIGMOD International Conference on Management of Data, May 2015, pp.811-825.
[18] Elseidy M, Elguindy A, Vitorovic A et al. Scalable and adaptive online joins. Proceedings of the VLDB Endowment, 2014, 7(6):441-452.
[19] Wang H, Su H, Zheng K et al. An effectiveness study on trajectory similarity measures. In Proc. the 24th Australasian Database Conference, February 2013, pp.13-22.
[20] Ying R, Pan J, Fox K et al. A simple efficient approximation algorithm for dynamic time warping. In Proc. the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, October 2016, Article No. 21.
[21] Vlachos M, Kollios G, Gunopulos D. Discovering similar multidimensional trajectories. In Proc. the 18th International Conference on Data Engineering, February 2002, pp.673-684.
[22] Chen L, Özsu M T, Oria V. Robust and fast similarity search for moving object trajectories. In Proc. the 2005 ACM SIGMOD International Conference on Management of Data, June 2005, pp.491-502.
[23] Frentzos E, Gratsias K, Theodoridis Y. Index-based most similar trajectory search. In Proc. the 23rd International Conference on Data Engineering, April 2007, pp.816-825.
[24] Yanagisawa Y, Akahani J, Satoh T. Shape-based similarity query for trajectory of mobile objects. In Proc. the 3rd International Conference on Mobile Data Management, January 2003, pp.63-77.
[25] Jiang Y, Li G, Feng J et al. String similarity joins:An experimental evaluation. Proceedings of the VLDB Endowment, 2014, 7(8):625-636.
[26] Gedik B. Partitioning functions for stateful data parallelism in stream processing. The International Journal on Very Large Data Bases, 2014, 23(4):517-539.
[27] Shah M A, Hellerstein J M, Chandrasekaran S et al. Flux:An adaptive partitioning operator for continuous query systems. In Proc. the 19th International Conference on Data Engineering, March 2003, pp.25-36.
[1] Yun-Cong Zhang, Xiao-Yang Wang, Cong Wang, Yao Xu, Jian-Wei Zhang, Xiao-Dong Lin, Guang-Yu Sun, Gong-Lin Zheng, Shan-Hui Yin, Xian-Jin Ye, Li Li, Zhan Song, Dong-Dong Miao. Bigflow:一种分布式计算框架的通用优化层[J]. 计算机科学技术学报, 2020, 35(2): 453-467.
[2] Guo-Wei Wang, Jin-Dou Zhang, Jing Li. 用户异源移动轨迹链接的研究[J]. , 2018, 33(4): 792-806.
[3] Qin Liu, Yuhong Guo, Jie Wu, Guojun Wang. 云计算环境下有效地查询分组策略[J]. 计算机科学技术学报, 2017, 32(6): 1231-1249.
[4] Xin Bi, Xiang-Guo Zhao, Guo-Ren Wang. 基于节点分发的分布式Twig查询处理技术[J]. , 2017, 32(1): 78-92.
[5] Dong-Hong Han, Xin Zhang, Guo-Ren Wang. 利用分布式极限学习机分类不确定演化数据流[J]. , 2015, 30(4): 874-887.
[6] Wen-Yu Li, Xiang Zhang, Shu-Cong Jia, Xin-Yu Gu, Lin Zhang, Xiao-Yu Duan, and Ji. LTE SON网络下一种动态自适应负载均衡与切换的联合最优化算法[J]. , 2013, 28(3): 437-444.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!
版权所有 © 《计算机科学技术学报》编辑部
本系统由北京玛格泰克科技发展有限公司设计开发 技术支持:support@magtech.com.cn
总访问量: