Journal of Computer Science and Technology, 2019, Vol. 34, Issue (4): 727-746. doi: 10.1007/s11390-019-1939-3

Special Issue: Surveys; Data Management and Data Mining

• Special Section on Spatio-Temporal Big Data Analytics

Location and Trajectory Identification from Microblogs

Na Ta¹, Member, CCF, Guo-Liang Li²,*, Member, CCF, ACM, IEEE, Jun Hu³, Jian-Hua Feng²

  ¹ School of Journalism and Communication, Renmin University of China, Beijing 100872, China;
  ² Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China;
  ³ Airbnb Network (Beijing) Inc., Beijing 100020, China
  • Received: 2019-01-31; Revised: 2019-03-10; Online: 2019-07-11; Published: 2019-07-11
  • Contact: Guo-Liang Li, E-mail: liguoliang@tsinghua.edu.cn
  • Supported by:
    This work is supported by the National Natural Science Foundation of China under Grant Nos. 61802414, 61632016, 61521002 and 61661166012, the National Basic Research 973 Program of China under Grant No. 2015CB358700, the Social Science Foundation of Beijing under Grant No. 18XCC011, the Humanities and Social Sciences Base Foundation of Ministry of Education of China under Grant No. 16JJD860008, Huawei, and TAL (Tomorrow Advancing Life) education.

The rapid development of social networks has resulted in a proliferation of user-generated content (UGC), which can benefit many applications. In this paper, we study the problem of identifying a user's locations from microblogs to facilitate effective location-based advertisement and recommendation. Since the location information in an individual microblog is incomplete, an accurate location cannot be obtained from that microblog alone. We therefore propose a global location identification method, Glitter, which combines multiple microblogs of a user and uses them together to identify the user's locations. Glitter not only improves the quality of user location identification but also supplements the incomplete location information of each microblog, so as to obtain an accurate location for that microblog. To facilitate location identification, Glitter organizes points of interest (POIs) into a tree structure in which leaf nodes are POIs and non-leaf nodes are coarser-grained location segments that contain POIs, e.g., countries, cities, and streets. Using this tree, Glitter first extracts from each of a user's microblogs a set of candidate locations, each corresponding to a tree node. Glitter then aggregates these candidate locations and identifies the user's top-k locations. Using the identified top-k user locations, Glitter refines the candidate locations and computes the top-k locations of each microblog. To achieve high recall, we enable fuzzy matching between locations and microblogs, and we propose an incremental algorithm to support dynamic updates of microblogs. We also study how to identify users' trajectories based on the extracted locations and propose an effective algorithm to extract high-quality trajectories. Experimental results on real-world datasets show that our method achieves high quality and good performance, and scales well.
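The abstract outlines Glitter's pipeline but not its scoring functions, so the following Python sketch only illustrates the flow under explicit assumptions: a POI tree, per-microblog candidate extraction, aggregation into a user's top-k locations, refinement of each microblog's locations, and a simple trajectory. Exact substring matching stands in for the paper's fuzzy matching; a candidate's "support" (the number of microblogs mentioning it or one of its ancestors) is a hypothetical score; and the trajectory step just orders microblogs by timestamp and merges consecutive repeats. Incremental updates are not modeled. All names (`Node`, `user_top_k`, `post_top_k`, `trajectory`) are illustrative, not the authors' API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass(eq=False)  # identity-based equality/hash so nodes can live in sets
class Node:
    """A POI-tree node: a region (country, city, street) or a leaf POI."""
    name: str
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

    def add(self, name: str) -> "Node":
        child = Node(name, self)
        self.children.append(child)
        return child

    def path(self) -> Set["Node"]:
        """This node together with all of its ancestors up to the root."""
        node, out = self, set()
        while node is not None:
            out.add(node)
            node = node.parent
        return out


def index_nodes(root: Node) -> Dict[str, List[Node]]:
    """Map each name to its nodes (one name may label several nodes)."""
    index: Dict[str, List[Node]] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        index.setdefault(node.name, []).append(node)
        stack.extend(node.children)
    return index


def candidates(text: str, index: Dict[str, List[Node]]) -> Set[Node]:
    """Exact substring matching; a stand-in for Glitter's fuzzy matching."""
    return {n for name, nodes in index.items() if name in text for n in nodes}


def user_top_k(posts: List[str], index, k: int) -> List[Node]:
    """Rank candidates by how many posts mention them or one of their ancestors."""
    per_post = [candidates(p, index) for p in posts]
    pool = set().union(*per_post)

    def support(c: Node) -> int:
        path = c.path()
        return sum(1 for cands in per_post if cands & path)

    # Higher support first, then deeper (more specific) nodes, then by name.
    ranked = sorted(pool, key=lambda c: (-support(c), -len(c.path()), c.name))
    return ranked[:k]


def post_top_k(post: str, index, user_locs: List[Node], k: int) -> List[Node]:
    """Refine a post's coarse mentions using the user's top-k locations."""
    cands = candidates(post, index)
    # User locations lying at or below a node the post mentions refine it.
    refined = [u for u in user_locs if cands & u.path()]
    return (refined or sorted(cands, key=lambda c: c.name))[:k]


def trajectory(timed_posts: List[Tuple[int, str]], index, user_locs) -> List[Node]:
    """Time-ordered top-1 post locations with consecutive repeats merged."""
    traj: List[Node] = []
    for _, text in sorted(timed_posts):
        top = post_top_k(text, index, user_locs, 1)
        if top and (not traj or traj[-1] is not top[0]):
            traj.append(top[0])
    return traj


# Usage on a toy tree and four posts by one user.
china = Node("China")
beijing = china.add("Beijing")
haidian = beijing.add("Haidian")
haidian.add("Tsinghua University")
beijing.add("Chaoyang").add("Sanlitun")
china.add("Shanghai").add("The Bund")
index = index_nodes(china)

posts = ["Lecture at Tsinghua University today", "Cycling around Haidian",
         "Dinner in Sanlitun", "Back in Haidian, heading to the lab"]
user_locs = user_top_k(posts, index, k=2)
print([n.name for n in user_locs])  # ['Tsinghua University', 'Haidian']
print(post_top_k(posts[1], index, user_locs, 1)[0].name)  # Tsinghua University
timed = [(3, posts[2]), (1, posts[0]), (2, posts[1]), (4, posts[3])]
print([n.name for n in trajectory(timed, index, user_locs)])
# ['Tsinghua University', 'Sanlitun', 'Tsinghua University']
```

Because a POI's support here counts mentions of its ancestors, the coarse mention "Haidian" in the second post reinforces "Tsinghua University" at the user level and is then refined to it at the microblog level. This mirrors, in simplified form, how the abstract describes user-level locations supplementing the incomplete location of an individual microblog.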

Key words: location identification; microblog; trajectory identification


ISSN 1000-9000 (Print), 1860-4749 (Online); CN 11-2296/TP

Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
Tel.:86-10-62610746
E-mail: jcst@ict.ac.cn