Journal of Computer Science and Technology, 2015, Vol. 30, Issue (5): 1130-1140. doi: 10.1007/s11390-015-1588-0

Special Topic: Data Management and Data Mining

• Special Section on Selected Paper from NPC 2011 •

Towards Better Understanding of App Functions

Yong-Xin Tong1,2* (童咏昕), Member, CCF, ACM, IEEE; Jieying She3 (佘洁莹), Student Member, IEEE; Lei Chen3 (陈雷), Member, ACM, IEEE

  1. State Key Laboratory of Software Development Environment, School of Computer Science and Engineering, Beihang University, Beijing 100191, China;
  2. International Research Institute for Multidisciplinary Science, Beihang University, Beijing 100191, China;
  3. Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
  • Received: 2014-11-16; Revised: 2015-07-22; Online: 2015-09-05; Published: 2015-09-05
  • Contact: Yong-Xin Tong, E-mail: yxtong@buaa.edu.cn
  • About the author: Yong-Xin Tong received his Ph.D. degree in computer science and engineering from the Hong Kong University of Science and Technology (HKUST), Hong Kong, in 2014. He is currently an associate professor in the School of Computer Science and Engineering, Beihang University, Beijing. Before that, he served as a research assistant professor and a postdoctoral fellow at HKUST. He is a member of CCF, ACM, and IEEE. His research interests include crowdsourcing, uncertain data mining and management, and social network analysis.
  • Supported by:

    This work is supported in part by the Hong Kong RGC Project under Grant No. N HKUST637/13, the National Basic Research 973 Program of China under Grant No. 2014CB340303, the National Natural Science Foundation of China under Grant Nos. 61328202 and 61502021, Microsoft Research Asia Gift Grant, Google Faculty Award 2013, and Microsoft Research Asia Fellowship 2012.

Abstract: Apps are attracting more and more attention from both mobile and web platforms. Due to the self-organized nature of the current app marketplaces, app descriptions are not formally written and contain many noisy words and sentences. As a result, the functions of most apps are not well documented and cannot easily be captured by app search engines. In this paper, we study the problem of inferring the real functions of an app by identifying the most informative words in its description. To utilize and integrate the diverse information in the app corpus properly, we propose a probabilistic topic model that discovers the latent data structure of the app corpus. The outputs of the topic model are then used to identify the function of an app and its most informative words. We verify the effectiveness of the proposed methods through extensive experiments on two real app datasets crawled from Google Play and Windows Phone Store, respectively.

Copyright © Editorial Office of Journal of Computer Science and Technology