2015, Vol. 30, Issue (5): 1130-1140. doi: 10.1007/s11390-015-1588-0
Topic: Data Management and Data Mining
• Special Section on Selected Paper from NPC 2011 •
Yong-Xin Tong1,2* (童咏昕), Member, CCF, ACM, IEEE, Jieying She3 (佘洁莹), Student Member, IEEE, Lei Chen3 (陈雷), Member, ACM, IEEE
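Apps are attracting more and more attention on both mobile and Web platforms. Because of the self-organized nature of current app marketplaces, app descriptions are not written formally and contain many noisy words and sentences. As a result, the functions of most apps are not well documented and are therefore hard for app search engines to capture. This paper studies the problem of inferring the real functions of an app by identifying the most informative words in its description. To use and integrate the diverse information in an app corpus in an appropriate way, we propose a probabilistic topic model that discovers the latent data structure of the corpus. The output of the topic model is then used to identify the functions of an app and its most informative words. We conduct extensive experiments on real datasets crawled from Google Play and Windows Phone Store and verify the effectiveness of the proposed method.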
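As a rough illustration of how the output of a topic model could be used to pick out informative words, the sketch below scores each word of a description by the log-ratio between its probability under the app's topic mixture and its probability in the whole corpus. This is not the model proposed in the paper: the scoring rule, the function name `rank_informative_words`, and all probability values are hypothetical and chosen only for illustration.

```python
import math


def rank_informative_words(description_words, doc_topic, topic_word, background):
    """Rank words by log(p(word | app topics) / p(word | corpus)), highest first.

    All inputs are assumed to come from an already-trained topic model;
    this scoring rule is an illustrative assumption, not the paper's method.
    """
    scores = {}
    for word in set(description_words):
        # p(word | app) under the topic mixture: sum_z p(z | app) * p(word | z)
        p_topic = sum(
            p_z * topic_word[z].get(word, 0.0) for z, p_z in doc_topic.items()
        )
        p_bg = background.get(word, 0.0)
        # Skip words that either distribution assigns zero probability.
        if p_topic > 0.0 and p_bg > 0.0:
            scores[word] = math.log(p_topic / p_bg)
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))


# Hypothetical toy values, not taken from the paper.
topic_word = {
    "photo": {"camera": 0.5, "filter": 0.4, "best": 0.1},
    "noise": {"best": 0.5, "free": 0.4, "camera": 0.1},
}
doc_topic = {"photo": 0.8, "noise": 0.2}
background = {"camera": 0.2, "filter": 0.1, "best": 0.4, "free": 0.3}

ranked = rank_informative_words(
    ["best", "free", "camera", "filter"], doc_topic, topic_word, background
)
print(ranked)
```

In this toy setting, function-bearing words such as "filter" and "camera" rank above generic marketing words such as "best" and "free", because the latter are common across the whole corpus.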
Copyright © Editorial Office of Journal of Computer Science and Technology