Journal of Computer Science and Technology ›› 2020, Vol. 35 ›› Issue (5): 1084-1098.doi: 10.1007/s11390-020-9724-x

Special Issue: Computer Networks and Distributed Computing

• Computer Networks and Distributed Computing • Previous Articles     Next Articles

A Spatiotemporal Causality Based Governance Framework for Noisy Urban Sensory Data

Bi-Ying Yan1,2, Member, CCF, Chao Yang3,4, Senior Member, CCF, Member, ACM, IEEE, Pan Deng5,*, Senior Member, CCF, Member, ACM, Qiao Sun1, Feng Chen1,6, and Yang Yu1,2,7        

  1. 1 Institute of Software, Chinese Academy of Sciences, Beijing 100190, China;
    2 University of Chinese Academy of Sciences, Beijing 100049, China;
    3 School of Mathematical Sciences, Peking University, Beijing 100871, China;
    4 Peng Cheng Laboratory, Shenzhen 518052, China;
    5 Research Institute for Frontier Science, Beihang University, Beijing 100191, China;
    6 Guiyang Academy of Information Technology, Guiyang 550081, China;
    7 Guiyang Municipal Commission of Transport, Guiyang 550003, China
  • Received:2019-05-18 Revised:2019-12-09 Online:2020-09-20 Published:2020-09-29
  • Contact: Pan Deng
  • Supported by:
    This work was supported in part by the National Key Research and Development Program of China under Grant No. 2018YFC0-831500.

Urban sensing is one of the fundamental building blocks of urban computing. It uses various types of sensors deployed in different geospatial locations to continuously and cooperatively monitor the natural and cultural environment in urban areas. Nevertheless, issues such as uneven distribution, low sampling rate and high failure ratio of sensors often make their readings less reliable. This paper provides an innovative framework to detect the noise data as well as to repair them from a spatial-temporal causality perspective rather than to deal with them individually. This can be achieved by connecting data through monitored objects, using the Skip-gram model to estimate spatial correlation and long shortterm memory to estimate temporal correlation. The framework consists of three major modules: 1) a space embedded Bidirectional Long Short-Term Memory (BiLSTM)-based sequence labeling module to detect the noise data and the latent missing data; 2) a space embedded BiLSTM-based sequence predicting module calculating the value of the missing data; 3) an object characteristics fusion repairing module to correct the spatial and temporal dislocation sensory data. The approach is evaluated with real-world data collected by over 3 000 electronic traffic bayonet devices in a citywide scale of a medium-sized city in China, and the result is superior to those of several referenced approaches. With a 12.9% improvement in data accuracy over the raw data, the proposed framework plays a significant role in various real-world use cases in urban governance, such as criminal investigation, traffic violation monitoring, and equipment maintenance.

Key words: trajectory data; recurrent neural network; spatiotemporal (ST) big data; urban computing;

[1] Gajda J, Burnos P. Identification of the spatial impulse response of inductive loop detectors. In Proc. the 2015 IEEE International Instrumentation and Measurement Technology Conference, May 2015, pp.1997-2002.
[2] Jin Y, Tan E, Li L, Wang G, Wang J, Wang P. Hybrid traffic forecasting model with fusion of multiple spatial toll collection data and remote microwave sensor data. IEEE Access, 2018, 6(79):211-221.
[3] Cai Y, Tong H, Fan W, Ji P. Fast mining of a network of coevolving time series. In Proc. the 2015 SIAM International Conference on Data Mining, April 2015, pp.298-306.
[4] Nguyen H V, Vreeken J. Linear-time detection of non-linear changes in massively high dimensional time series. In Proc. the 2016 SIAM International Conference on Data Mining, May 2016, pp.828-836.
[5] Méger N, Rigotti C, Pothier C. Swap randomization of bases of sequences for mining satellite image times series. In Proc. the 2015 European Conference on Machine Learning and Knowledge Discovery in Databases, September 2015, pp.190-205.
[6] Shi W, Zhu Y, Yu P S, Zhang J, Huang T, Wang C, Chen Y. Effective prediction of missing data on Apache Spark over multivariable time series. IEEE Transactions on Big Data, 2018, 4(4):473-486.
[7] Shi N. Intelligent detection of license plate-related illegal vehicles using bayonet images[Ph.D. Thesis]. Nanjing Normal University, 2013. (in Chinese)
[8] Yuan W, Deng P, Taleb T, Wan J, Bi C. An unlicensed taxi identification model based on big data analysis. IEEE Trans. Intelligent Transportation Systems, 2016, 17(6):1703-1713
[9] Hsueh Y, Chen H, Huang W. A hidden Markov model-based map-matching approach for low-sampling-rate GPS trajectories. In Proc. the 7th IEEE International Symposium on Cloud and Service Computing, November 2017, pp.271-274.
[10] Zheng K, Zheng Y, Xie X et al. Reducing uncertainty of lowsampling-rate trajectories. In Proc. the 28th International Conference on Data Engineering, April 2012, pp.1144-1155.
[11] Yuan J, Zheng Y, Zhang C, Xie W, Xie X, Sun G, Huang Y. T-drive:Driving directions based on taxi trajectories. In Proc. the 18th ACM SIGSPATIAL International Symposium on Advances in Geographic Information Systems, November 2010, pp.99-108.
[12] Chen M, Liu Y, Yu X. Predicting next locations with object clustering and trajectory clustering. In Proc. the 19th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, May 2015, pp.344-356.
[13] Rendle S, Freudenthaler C, Schemidt-Thieme L. Factoring personalized Markov chains for next-basket recommendation. In Proc. the 19th International Conference on World Wide Web, April 2010, pp.811-820.
[14] Liu Q, Wu S, Wang L et al. Predicting the next location:A recurrent model with spatial and temporal contexts. In Proc. the 13th Conference on Artificial Intelligence, February 2016, pp.12-17.
[15] Al-Molegi A, Jabreel M, Ghaleb B. STF-RNN:Space time features-based recurrent neural network for predicting people next location. In Proc. the 2016 IEEE Symposium Series on Computational Intelligence, December 2016.
[16] Rahm E, Do H H. Data cleaning:Problems and current approaches. IEEE Data Eng. Bull., 2000, 23(4):3-13.
[17] Chomicki J, Marcinkowski J. Minimal-change integrity maintenance using tuple deletions. Information Computation, 2005, 197(1/2):90-121.
[18] Wijsen J. Condensed representation of database repairs for consistent query answering. In Proc. the 9th International Conference on Database Theory, January 2003, pp.375-390.
[19] Bohannon P, Fan W, Geerts F, Jia X, Kementsietsidis A. Conditional functional dependencies for data cleaning. In Proc. the 3rd International Conference on Data Engineering, April 2007, pp.746-755.
[20] Zheng Y. Trajectory data mining:An overview. ACM Transactions on Intelligent Systems and Technology, 2015, 6(3):Article No. 29.
[21] Kong X, Li M, Ma K et al. Big trajectory data:A survey of applications and services. IEEE Access, 2018, 6:58295-58306.
[22] Bian J, Tian D, Tang Y et al. Trajectory data classification:A review. ACM Transactions on Intelligent Systems and Technology, 2019, 10(4):Article No. 33
[23] Gambs S, Killijian M, Cortez M. Next place prediction using mobility Markov chains. In Proc. the 1st Workshop on Measurement Privacy and Mobility, April 2012, Article No. 3.
[24] Dong B, Ju H, Lu Y, Shi Z. CURE:Curvature regularization for missing data recovery. arXiv:1901.09548, 2019., Sept. 2019.
[25] Kong X, Li M, Li J, Tian K, Hu X, Xia F. CoPFun:An urban co-occurrence pattern mining scheme based on regional function discovery. World Wide Web:Internet and Web Information Systems, 2018, 22(3):1029-1054.
[26] Morzy M. Mining frequent trajectories of moving objects for location prediction. In Proc. the 5th International Conference on Machine Learning and Data Mining in Pattern Recognition, July 2007, pp.667-680.
[27] Agrawal R, Srikant R. Mining sequential patterns. In Proc. the 11th Int. Conf. Data Engineering, March 1995, pp.3-14.
[28] Gomariz A, Campos M, Marín R, Goethals B. ClaSP:An efficient algorithm for mining frequent closed sequences. In Proc. the 17th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, April 2013, pp.50-61.
[29] Pinto H, Han J, Dayal U, Wang J, Chen Q, Pei J, Mortazavi-Asl B, Hsu M. Mining sequential patterns by pattern-growth:The prefixspan approach. IEEE Transactions on Knowledge and Data Engineering, 2004, 16(11):1424-1440.
[30] Zaki M J. SPADE:An efficient algorithm for mining frequent sequences. Machine Learning, 2001, 42(1/2):31-60
[31] Ayres J, Flannick J, Gehrke J, Yiu T. Sequential pattern mining using a bitmap representation. In Proc. the 8th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, July 2002, pp.429-435.
[32] Ratinov L, Roth D. Design challenges and misconceptions in named entity recognition. In Proc. the 13th Conference on Computational Natural, June 2009, pp.147-155.
[33] Passos A, Kumar V, McCallum A. Lexicon infused phrase embeddings for named entity resolution. In Proc. the 18th Computational Natural Language Learning, June 2014, pp.78-86.
[34] Lample G, Ballesteros M, Subramanian S, Kawakami K, Dyer C. Neural architectures for named entity recognition. In Proc. the 2016 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies, June 2016, pp.260-270.
[35] Ma X, Hovy E H. End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF. arXiv:1603.01354, 2016., Sept. 2019.
[36] Mikolov T, Chen K, Corrado G, Dean J. Efficient estimation of word representations in vector space. In Proc. the 1st Int. Conf. Learning Representations, May 2013, Article No. 22.
[37] Peters M E, Ammar W, Bhagavatula C, Power R. Semisupervised sequence tagging with bidirectional language models. In Proc. the 55th Annual Meeting of the Association for Computational Linguistics, July 2017, pp.1756-1765.
[38] Huang Z, Xu W, Yu K. Bidirectional LSTM-CRF models for sequence tagging. arXiv:1508.01991, 2015., Sept. 2019.
Full text



[1] Chen Shihua;. On the Structure of Finite Automata of Which M Is an(Weak)Inverse with Delay τ[J]. , 1986, 1(2): 54 -59 .
[2] Li Wanxue;. Almost Optimal Dynamic 2-3 Trees[J]. , 1986, 1(2): 60 -71 .
[3] Chen Zhaoxiong; Gao Qingshi;. A Substitution Based Model for the Implementation of PROLOG——The Design and Implementation of LPROLOG[J]. , 1986, 1(4): 17 -26 .
[4] Huang Heyan;. A Parallel Implementation Model of HPARLOG[J]. , 1986, 1(4): 27 -38 .
[5] Min Yinghua;. Easy Test Generation PLAs[J]. , 1987, 2(1): 72 -80 .
[6] Sun Yongqiang; Lu Ruzhan; Huang Xiaorong;. Termination Preserving Problem in the Transformation of Applicative Programs[J]. , 1987, 2(3): 191 -201 .
[7] Wang Yongcheng;. Automatic Extraction of Words from Chinese Textual Data[J]. , 1987, 2(4): 287 -291 .
[8] Qi Yulu;. A Systolic Approach for an Improvement of a Finite Field Multiplier[J]. , 1987, 2(4): 303 -309 .
[9] Jin Lingzi; Zhu Hong;. Systems Programming in the Functional Language FP[J]. , 1988, 3(1): 40 -55 .
[10] Feng Yulin;. Hierarchical Protocol Analysis by Temporal Logic[J]. , 1988, 3(1): 56 -69 .

ISSN 1000-9000(Print)

CN 11-2296/TP

Editorial Board
Author Guidelines
Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
  Copyright ©2015 JCST, All Rights Reserved