计算机科学技术学报 ›› 2020,Vol. 35 ›› Issue (5): 1084-1098.doi: 10.1007/s11390-020-9724-x

所属专题: Computer Networks and Distributed Computing

• • 上一篇    下一篇

一种基于时空因果性的城市感知数据治理方法

Bi-Ying Yan1,2, Member, CCF, Chao Yang3,4, Senior Member, CCF, Member, ACM, IEEE, Pan Deng5,*, Senior Member, CCF, Member, ACM, Qiao Sun1, Feng Chen1,6, and Yang Yu1,2,7   

  1. 1 Institute of Software, Chinese Academy of Sciences, Beijing 100190, China;
    2 University of Chinese Academy of Sciences, Beijing 100049, China;
    3 School of Mathematical Sciences, Peking University, Beijing 100871, China;
    4 Peng Cheng Laboratory, Shenzhen 518052, China;
    5 Research Institute for Frontier Science, Beihang University, Beijing 100191, China;
    6 Guiyang Academy of Information Technology, Guiyang 550081, China;
    7 Guiyang Municipal Commission of Transport, Guiyang 550003, China
  • 收稿日期:2019-05-18 修回日期:2019-12-09 出版日期:2020-09-20 发布日期:2020-09-29
  • 通讯作者: Pan Deng E-mail:pandeng@buaa.edu.cn
  • 作者简介:Bi-Ying Yan is currently working toward her Ph.D. degree in University of Chinese Academy of Sciences, Beijing. She received her B.S. degree in computer science and technology from Beihang University, Beijing, in 2013. She is currently a research assistant in the Laboratory of Parallel Software and Computational Science, Institute of Software, Chinese Academy of Sciences, Beijing. Her research interests include machine learning, data mining, big data and parallel computing.
  • 基金资助:
    This work was supported in part by the National Key Research and Development Program of China under Grant No. 2018YFC0-831500.

A Spatiotemporal Causality Based Governance Framework for Noisy Urban Sensory Data

Bi-Ying Yan1,2, Member, CCF, Chao Yang3,4, Senior Member, CCF, Member, ACM, IEEE, Pan Deng5,*, Senior Member, CCF, Member, ACM, Qiao Sun1, Feng Chen1,6, and Yang Yu1,2,7        

  1. 1 Institute of Software, Chinese Academy of Sciences, Beijing 100190, China;
    2 University of Chinese Academy of Sciences, Beijing 100049, China;
    3 School of Mathematical Sciences, Peking University, Beijing 100871, China;
    4 Peng Cheng Laboratory, Shenzhen 518052, China;
    5 Research Institute for Frontier Science, Beihang University, Beijing 100191, China;
    6 Guiyang Academy of Information Technology, Guiyang 550081, China;
    7 Guiyang Municipal Commission of Transport, Guiyang 550003, China
  • Received:2019-05-18 Revised:2019-12-09 Online:2020-09-20 Published:2020-09-29
  • Contact: Pan Deng E-mail:pandeng@buaa.edu.cn
  • Supported by:
    This work was supported in part by the National Key Research and Development Program of China under Grant No. 2018YFC0-831500.

城市感知是城市计算的基本组成部分之一。它使用部署在不同地理空间位置的各种传感器,持续监测城市的自然和人文环境。但是,这些传感器由于分布不均匀,采样率低和故障率高等原因,往往读数不太可靠。本文提出了一个创新的框架来检测噪声数据,并从时空因果关系的角度修复它们。该方法使用Skip-gram模型来估计空间相关性,采用长短期记忆神经网络来估计时间相关性。该框架由三个主要模块组成:1)嵌入空间信息的双向LSTM序列标记模块,用于检测噪声数据和潜在丢失数据。2)嵌入空间信息的双向LSTM序列预测模块预测缺失轨迹点的位置。3)融合物体特征的数据修复模块,用于修复错误数据。该方法采用中国某中等规模城市中3000余电子交通卡口设备收集的实际数据进行评估,结果优于多种参考方法。与原始数据相比,修复后的数据准确性提高了12.9%,该方法在城市治理的各种实际应用中发挥了重要作用,例如刑事侦查,交通违规监控和设备维护。

关键词: 轨迹数据, 循环神经网络, 时空大数据, 城市计算

Abstract: Urban sensing is one of the fundamental building blocks of urban computing. It uses various types of sensors deployed in different geospatial locations to continuously and cooperatively monitor the natural and cultural environment in urban areas. Nevertheless, issues such as uneven distribution, low sampling rate and high failure ratio of sensors often make their readings less reliable. This paper provides an innovative framework to detect the noise data as well as to repair them from a spatial-temporal causality perspective rather than to deal with them individually. This can be achieved by connecting data through monitored objects, using the Skip-gram model to estimate spatial correlation and long shortterm memory to estimate temporal correlation. The framework consists of three major modules: 1) a space embedded Bidirectional Long Short-Term Memory (BiLSTM)-based sequence labeling module to detect the noise data and the latent missing data; 2) a space embedded BiLSTM-based sequence predicting module calculating the value of the missing data; 3) an object characteristics fusion repairing module to correct the spatial and temporal dislocation sensory data. The approach is evaluated with real-world data collected by over 3 000 electronic traffic bayonet devices in a citywide scale of a medium-sized city in China, and the result is superior to those of several referenced approaches. With a 12.9% improvement in data accuracy over the raw data, the proposed framework plays a significant role in various real-world use cases in urban governance, such as criminal investigation, traffic violation monitoring, and equipment maintenance.

Key words: trajectory data, recurrent neural network, spatiotemporal (ST) big data, urban computing

[1] Gajda J, Burnos P. Identification of the spatial impulse response of inductive loop detectors. In Proc. the 2015 IEEE International Instrumentation and Measurement Technology Conference, May 2015, pp.1997-2002.
[2] Jin Y, Tan E, Li L, Wang G, Wang J, Wang P. Hybrid traffic forecasting model with fusion of multiple spatial toll collection data and remote microwave sensor data. IEEE Access, 2018, 6(79):211-221.
[3] Cai Y, Tong H, Fan W, Ji P. Fast mining of a network of coevolving time series. In Proc. the 2015 SIAM International Conference on Data Mining, April 2015, pp.298-306.
[4] Nguyen H V, Vreeken J. Linear-time detection of non-linear changes in massively high dimensional time series. In Proc. the 2016 SIAM International Conference on Data Mining, May 2016, pp.828-836.
[5] Méger N, Rigotti C, Pothier C. Swap randomization of bases of sequences for mining satellite image times series. In Proc. the 2015 European Conference on Machine Learning and Knowledge Discovery in Databases, September 2015, pp.190-205.
[6] Shi W, Zhu Y, Yu P S, Zhang J, Huang T, Wang C, Chen Y. Effective prediction of missing data on Apache Spark over multivariable time series. IEEE Transactions on Big Data, 2018, 4(4):473-486.
[7] Shi N. Intelligent detection of license plate-related illegal vehicles using bayonet images[Ph.D. Thesis]. Nanjing Normal University, 2013. (in Chinese)
[8] Yuan W, Deng P, Taleb T, Wan J, Bi C. An unlicensed taxi identification model based on big data analysis. IEEE Trans. Intelligent Transportation Systems, 2016, 17(6):1703-1713
[9] Hsueh Y, Chen H, Huang W. A hidden Markov model-based map-matching approach for low-sampling-rate GPS trajectories. In Proc. the 7th IEEE International Symposium on Cloud and Service Computing, November 2017, pp.271-274.
[10] Zheng K, Zheng Y, Xie X et al. Reducing uncertainty of lowsampling-rate trajectories. In Proc. the 28th International Conference on Data Engineering, April 2012, pp.1144-1155.
[11] Yuan J, Zheng Y, Zhang C, Xie W, Xie X, Sun G, Huang Y. T-drive:Driving directions based on taxi trajectories. In Proc. the 18th ACM SIGSPATIAL International Symposium on Advances in Geographic Information Systems, November 2010, pp.99-108.
[12] Chen M, Liu Y, Yu X. Predicting next locations with object clustering and trajectory clustering. In Proc. the 19th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, May 2015, pp.344-356.
[13] Rendle S, Freudenthaler C, Schemidt-Thieme L. Factoring personalized Markov chains for next-basket recommendation. In Proc. the 19th International Conference on World Wide Web, April 2010, pp.811-820.
[14] Liu Q, Wu S, Wang L et al. Predicting the next location:A recurrent model with spatial and temporal contexts. In Proc. the 13th Conference on Artificial Intelligence, February 2016, pp.12-17.
[15] Al-Molegi A, Jabreel M, Ghaleb B. STF-RNN:Space time features-based recurrent neural network for predicting people next location. In Proc. the 2016 IEEE Symposium Series on Computational Intelligence, December 2016.
[16] Rahm E, Do H H. Data cleaning:Problems and current approaches. IEEE Data Eng. Bull., 2000, 23(4):3-13.
[17] Chomicki J, Marcinkowski J. Minimal-change integrity maintenance using tuple deletions. Information Computation, 2005, 197(1/2):90-121.
[18] Wijsen J. Condensed representation of database repairs for consistent query answering. In Proc. the 9th International Conference on Database Theory, January 2003, pp.375-390.
[19] Bohannon P, Fan W, Geerts F, Jia X, Kementsietsidis A. Conditional functional dependencies for data cleaning. In Proc. the 3rd International Conference on Data Engineering, April 2007, pp.746-755.
[20] Zheng Y. Trajectory data mining:An overview. ACM Transactions on Intelligent Systems and Technology, 2015, 6(3):Article No. 29.
[21] Kong X, Li M, Ma K et al. Big trajectory data:A survey of applications and services. IEEE Access, 2018, 6:58295-58306.
[22] Bian J, Tian D, Tang Y et al. Trajectory data classification:A review. ACM Transactions on Intelligent Systems and Technology, 2019, 10(4):Article No. 33
[23] Gambs S, Killijian M, Cortez M. Next place prediction using mobility Markov chains. In Proc. the 1st Workshop on Measurement Privacy and Mobility, April 2012, Article No. 3.
[24] Dong B, Ju H, Lu Y, Shi Z. CURE:Curvature regularization for missing data recovery. arXiv:1901.09548, 2019. http://arxiv.org/abs/1901.09548, Sept. 2019.
[25] Kong X, Li M, Li J, Tian K, Hu X, Xia F. CoPFun:An urban co-occurrence pattern mining scheme based on regional function discovery. World Wide Web:Internet and Web Information Systems, 2018, 22(3):1029-1054.
[26] Morzy M. Mining frequent trajectories of moving objects for location prediction. In Proc. the 5th International Conference on Machine Learning and Data Mining in Pattern Recognition, July 2007, pp.667-680.
[27] Agrawal R, Srikant R. Mining sequential patterns. In Proc. the 11th Int. Conf. Data Engineering, March 1995, pp.3-14.
[28] Gomariz A, Campos M, Marín R, Goethals B. ClaSP:An efficient algorithm for mining frequent closed sequences. In Proc. the 17th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, April 2013, pp.50-61.
[29] Pinto H, Han J, Dayal U, Wang J, Chen Q, Pei J, Mortazavi-Asl B, Hsu M. Mining sequential patterns by pattern-growth:The prefixspan approach. IEEE Transactions on Knowledge and Data Engineering, 2004, 16(11):1424-1440.
[30] Zaki M J. SPADE:An efficient algorithm for mining frequent sequences. Machine Learning, 2001, 42(1/2):31-60
[31] Ayres J, Flannick J, Gehrke J, Yiu T. Sequential pattern mining using a bitmap representation. In Proc. the 8th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, July 2002, pp.429-435.
[32] Ratinov L, Roth D. Design challenges and misconceptions in named entity recognition. In Proc. the 13th Conference on Computational Natural, June 2009, pp.147-155.
[33] Passos A, Kumar V, McCallum A. Lexicon infused phrase embeddings for named entity resolution. In Proc. the 18th Computational Natural Language Learning, June 2014, pp.78-86.
[34] Lample G, Ballesteros M, Subramanian S, Kawakami K, Dyer C. Neural architectures for named entity recognition. In Proc. the 2016 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies, June 2016, pp.260-270.
[35] Ma X, Hovy E H. End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF. arXiv:1603.01354, 2016. https://arxiv.org/abs/1603.01354, Sept. 2019.
[36] Mikolov T, Chen K, Corrado G, Dean J. Efficient estimation of word representations in vector space. In Proc. the 1st Int. Conf. Learning Representations, May 2013, Article No. 22.
[37] Peters M E, Ammar W, Bhagavatula C, Power R. Semisupervised sequence tagging with bidirectional language models. In Proc. the 55th Annual Meeting of the Association for Computational Linguistics, July 2017, pp.1756-1765.
[38] Huang Z, Xu W, Yu K. Bidirectional LSTM-CRF models for sequence tagging. arXiv:1508.01991, 2015. http://arxiv.org/abs/1508.01991, Sept. 2019.
[1] Guo-Wei Wang, Jin-Dou Zhang, Jing Li. 用户异源移动轨迹链接的研究[J]. , 2018, 33(4): 792-806.
[2] Ai-Wen Jiang, Bo Liu, Ming-Wen Wang. 基于上下文引导型循环注意机制与深度多模态强化网络的图像问答算法[J]. , 2017, 32(4): 738-748.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] 陈世华;. On the Structure of Finite Automata of Which M Is an(Weak)Inverse with Delay τ[J]. , 1986, 1(2): 54 -59 .
[2] 李万学;. Almost Optimal Dynamic 2-3 Trees[J]. , 1986, 1(2): 60 -71 .
[3] 陈肇雄; 高庆狮;. A Substitution Based Model for the Implementation of PROLOG——The Design and Implementation of LPROLOG[J]. , 1986, 1(4): 17 -26 .
[4] 黄河燕;. A Parallel Implementation Model of HPARLOG[J]. , 1986, 1(4): 27 -38 .
[5] 闵应骅;. Easy Test Generation PLAs[J]. , 1987, 2(1): 72 -80 .
[6] 孙永强; 陆汝占; 黄小戎;. Termination Preserving Problem in the Transformation of Applicative Programs[J]. , 1987, 2(3): 191 -201 .
[7] 王永成;. Automatic Extraction of Words from Chinese Textual Data[J]. , 1987, 2(4): 287 -291 .
[8] 戚余禄;. A Systolic Approach for an Improvement of a Finite Field Multiplier[J]. , 1987, 2(4): 303 -309 .
[9] 金凌紫; 朱鸿;. Systems Programming in the Functional Language FP[J]. , 1988, 3(1): 40 -55 .
[10] 冯玉琳;. Hierarchical Protocol Analysis by Temporal Logic[J]. , 1988, 3(1): 56 -69 .
版权所有 © 《计算机科学技术学报》编辑部
本系统由北京玛格泰克科技发展有限公司设计开发 技术支持:support@magtech.com.cn
总访问量: