|
计算机科学技术学报 ›› 2019,Vol. 34 ›› Issue (4): 727-746.doi: 10.1007/s11390-019-1939-3
所属专题: 综述; Data Management and Data Mining
Na Ta1, Member, CCF, Guo-Liang Li2,*, Member, CCF, ACM, IEEE, Jun Hu3, Jian-Hua Feng2
Na Ta1, Member, CCF, Guo-Liang Li2,*, Member, CCF, ACM, IEEE, Jun Hu3, Jian-Hua Feng2
社交网络的迅速发展导致了用户生成内容(UGC)的激增,UGC数据有利于多种应用。例如,从微博中识别用户的位置对于有效的基于位置的广告和推荐非常重要。本文研究了从微博中识别用户位置的问题。解决这一问题存在的难点在于,微博中的位置信息并不完整,因此无法从单一微博中获得准确的位置。为了应对这一挑战,我们提出了一种全局的位置识别方法Glitter,结合用户的多条微博识别用户位置信息。Glitter不仅提高了用户位置识别的质量,还可以准确地补全微博位置。为了辅助位置识别,Glitter将兴趣点(POI)数据组织成一个树结构,其中叶节点是POI,而非叶节点是POI的分段,例如国家、州、城市、地区和街道。使用树结构,Glitter首先从用户的每条微博中提取对应于某些树节点的候选位置。然后,glitter聚集这些候选位置并识别微博用户的Top-K位置。使用确定的Top-K用户位置,Glitter进一步优化候选位置并计算每条微博的Top-K位置。为了达到高召回率,我们启用了位置和微博之间的模糊匹配,并提出了一种支持微博动态更新的增量算法。此外,我们还研究了如何根据提取的位置来识别用户的轨迹,并提出一种有效的轨迹提取算法,解决高质量用户轨迹的识别问题。在真实数据集的实验结果表明,该方法具有较高的质量和良好的性能,且具有良好的扩展性。
[1] Li R, Wang S, Deng H, Wang R, Chang K. Towards social user profiling:Unified and discriminative influence model for inferring home locations. In Proc. the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2012, pp.1023-1031. [2] Chakrabarti K, Chaudhuri S, Ganti V, Xin D. An efficient filter for approximate membership checking. In Proc. the 2008 ACM SIGMOD International Conference on Management of Data, June 2008, pp.805-818. [3] Li G, Deng D, Feng J. Faerie:Efficient filtering algorithms for approximate dictionary-based entity extraction. In Proc. the 2011 ACM SIGMOD International Conference on Management of Data, June 2011, pp.529-540. [4] Li G, Deng D, Feng J. An efficient trie-based method for approximate entity extraction with edit-distance constraints. In Proc. the 28th International Conference on Data Engineering, April 2012, pp.762-773. [5] Hoffart J, Suchanek F M, Berberich K, Weikum G. YAGO2:A spatially and temporally enhanced knowledge base from Wikipedia. Artificial Intelligence, 2013, 194:28-61. [6] Cheng Z, Caverlee J, Lee K. You are where you tweet:A content-based approach to geo-locating Twitter users. In Proc. the 19th ACM International Conference on Information and Knowledge Management, June 2010, pp.759-768. [7] Chandra S, Khan L, Muhaya F B. Estimating Twitter user location using social interactions-A content based approach. In Proc. the 3rd International Conference on Social Computing, October 2011, pp.838-843. [8] Amitay E, Harel N, Sivan R, Soffer A. Web-a-where:Geotagging web content. In Proc. the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, July 2004, pp.273-280. [9] Backstrom L, Kleinberg J M, Kumar R, Novak J. Spatial variation in search engine queries. In Proc. the 17th International Conference on World Wide Web, April 2008, pp.357-366. [10] Li G, Hu J, Feng J, Tan K. Effective location identification from microblogs. In Proc. the 30th International Conference on Data Engineering, March 2014, pp.880-891. [11] Chandel A, Nagesh P C, Sarawagi S. Efficient batch top-k search for dictionary-based entity recognition. In Proc. the 22nd International Conference on Data Engineering, April 2006, Article No. 28. [12] Sun C, Naughton J F. The token distribution filter for approximate string membership. In Proc. the 14th International Workshop on the Web and Databases, June 2011, Article No. 5. [13] Lu J, Han J, Meng X. Efficient algorithms for approximate member extraction using signature-based inverted lists. In Proc. the 18th ACM Conference on Information and Knowledge Management, November 2009, pp.315-324. [14] Wang W, Xiao C, Lin X, Zhang C. Efficient approximate entity extraction with edit distance constraints. In Proc. the 2009 ACM SIGMOD International Conference on Management of Data, June 2009, pp.759-770. [15] Chaudhuri S, Ganti V, Xin D. Mining document collections to facilitate accurate approximate entity matching. Proceedings of the VLDB Endowment, 2009, 2(1):395-406. [16] Agrawal S, Chakrabarti K, Chaudhuri S, Ganti V. Scalable ad-hoc entity extraction from text collections. Proceedings of the VLDB Endowment, 2008, 1(1):945-957. [17] Deng D, Li G, Feng J, Duan Y, Gong Z. A unified framework for approximate dictionary-based entity extraction. Proceedings of the VLDB Endowment, 2015, 24(1):143-167. [18] Li K, Li G. Approximate query processing:What is new and where to go?-A survey on approximate query processing. Data Science and Engineering, 2018, 3(4):379-397. [19] Gao D, Tong Y, She J, Song T, Chen L, Xu K. Top-k team recommendation and its variants in spatial crowdsourcing. Data Science and Engineering, 2017, 2(2):136-150. [20] Leal F, Malheiro B, González-Vélez H, Burguillo J. Trustbased modelling of multi-criteria crowdsourced data. Data Science and Engineering, 2017, 2(3):199-209. [21] Mei Q, Liu C, Su H, Zhai C. A probabilistic approach to spatiotemporal theme pattern mining on weblogs. In Proc. the 15th International Conference on World Wide Web, May 2006, pp.533-542. [22] Rattenbury T, Good N, Naaman M, Towards automatic extraction of event and place semantics from flickr tags. In Proc. the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, July 2007, pp.103-110. [23] Backstrom L, Sun E, Marlow C. Find me if you can:Improving geographical prediction with social and spatial proximity. In Proc. the 19th International Conference on World Wide Web, April 2010, pp.61-70. [24] Hao Q, Cai R, Wang C, Xiao R, Yang J, Pang Y, Zhang L. Equip tourists with knowledge mined from travelogues. In Proc. the 19th International Conference on World Wide Web, April 2010, pp.401-410. [25] Yin Z, Cao L, Han J, Zhai C, Huang T. Geographical topic discovery and comparison. In Proc. the 20th International Conference on World Wide Web, March 2011, pp.247-256. [26] Hong L, Ahmed A, Gurumurthy S, Smola A, Tsioutsiouliklis K. Discovering geographical topics in the Twitter stream. In Proc. the 21st International Conference on World Wide Web, April 2012, pp.769-778. [27] Li G, Deng D, Wang J, Feng J. PASS-JOIN:A partitionbased method for similarity joins. Proceedings of the VLDB Endowment, 2011, 5(3):253-264. [28] Li G, Deng D, Feng J. A partition-based method for string similarity joins with edit-distance constraints. ACM Transactions on Database Systems, 2013, 38(2):Article No. 9. [29] Wang J, Li G, Feng J. Trie-join:Efficient trie-based string similarity joins with edit-distance constraints. Proceedings of the VLDB Endowment, 2010, 3(1):1219-1230. [30] Xu L, Ling T W, Wu H, Bao Z. DDE:From Dewey to a fully dynamic XML labeling scheme. In Proc. the 2009 ACM SIGMOD International Conference on Management of Data, June 2009, pp.719-730. |
[1] | Fei-Fei Kou, Jun-Ping Du, Cong-Xian Yang, Yan-Song Shi, Wan-Qiu Cui. 基于微博多特征的标签推荐[J]. , 2018, 33(4): 711-726. |
[2] | Jin-Qi Zhu, Li Lu, Chun-Mei Ma. 从兴趣到位置:微博系统中基于近邻关系的朋友推荐[J]. , 2015, 30(6): 1188-1200. |
[3] | Xian Wu, Wei Fan, Jing Gao Zi-Ming Feng, Yong Yu. 检测僵尸用户提高微博数据的可靠度[J]. , 2015, 30(5): 1082-1096. |
[4] | Wu Yang Guo-Wei Shen, Wei Wang, Liang-Yi Gong, Miao Yu, Guo-Zhong Dong. 基于联合聚类的微博异常检测[J]. , 2015, 30(5): 1097-1108. |
[5] | Fei Jiang, Yi-Qun Liu, Huan-Bo LuanJia-Shen Sun, Xuan Zhu, Min Zhang, Shao-Ping Ma. 利用表情符空间进行微博情感分析[J]. , 2015, 30(5): 1120-1129. |
[6] | Yan-Tao Jia, Yuan-Zhuo Wang, Xue-Qi Cheng. 微博中融合结构和交互信息的链接预测学习[J]. , 2015, 30(4): 829-842. |
|
版权所有 © 《计算机科学技术学报》编辑部 本系统由北京玛格泰克科技发展有限公司设计开发 技术支持:support@magtech.com.cn 总访问量: |