2015, Vol. 30, Issue (5): 1017-1035. doi: 10.1007/s11390-015-1578-2

Topics: Data Management and Data Mining Software Systems

• Special Section on Selected Paper from NPC 2011 •

TagCombine: Recommending Tags to Contents in Software Information Sites

Xin-Yu Wang1(王新宇), Xin Xia1*(夏鑫), Member, CCF, ACM, IEEE, David Lo2, Member, ACM, IEEE   

  • 1 College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China;
    2 School of Information Systems, Singapore Management University, Singapore, Singapore
  • Received: 2015-03-20; Revised: 2015-07-09; Online: 2015-09-05; Published: 2015-09-05
  • Contact: Xin Xia, E-mail: xxia@zju.edu.cn
  • About author: Xin-Yu Wang received his Bachelor's and Ph.D. degrees in computer science from Zhejiang University, Hangzhou, in 2002 and 2007, respectively. He was a research assistant at Zhejiang University from 2002 to 2007. He is currently an associate professor in the College of Computer Science and Technology, Zhejiang University. His research interests include software engineering, formal methods, and very large information systems.
  • Supported by:

    This research was partially supported by China Knowledge Centre for Engineering Sciences and Technology under Grant No. CKCEST-2014-1-5, the National Key Technology Research and Development Program of the Ministry of Science and Technology of China under Grant Nos. 2015BAH17F01 and 2013BAH01B01, and the Fundamental Research Funds for the Central Universities of China.

Abstract: Software engineers use a variety of online media to search for and keep informed of new and interesting technologies, and to learn from and help one another. We refer to online media that help software engineers improve their performance in software development, maintenance, and testing as software information sites. In this paper, we propose TagCombine, an automatic tag recommendation method which analyzes objects in software information sites. TagCombine has three components: 1) a multi-label ranking component, which treats tag recommendation as a multi-label learning problem; 2) a similarity-based ranking component, which recommends tags from similar objects; and 3) a tag-term based ranking component, which considers the relationship between terms and tags, and recommends tags after analyzing the terms in the objects. We evaluate TagCombine on four software information sites: Ask Different, Ask Ubuntu, Freecode, and Stack Overflow. Averaged across the four sites, TagCombine achieves recall@5 and recall@10 of 0.6198 and 0.7625 respectively, which improves on TagRec, proposed by Al-Kofahi et al., by 14.56% and 10.55% respectively, and on the tag recommendation method proposed by Zangerle et al. by 12.08% and 8.16% respectively.
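
As a rough illustration of the ideas in the abstract, the sketch below merges three per-tag score tables into one ranking and computes recall@k for a single object. The abstract does not say how the three components are merged, so the weighted sum, the weights, the function names, and the toy scores are all assumptions made for illustration. The recall@k used here is the common definition (the fraction of an object's true tags found among the top k recommendations) and may differ in detail from the one used in the paper.

```python
from typing import Dict, List, Set


def combine_scores(
    component_scores: List[Dict[str, float]],
    weights: List[float],
) -> Dict[str, float]:
    """Weighted sum of per-tag scores (a hypothetical combination rule)."""
    combined: Dict[str, float] = {}
    for scores, weight in zip(component_scores, weights):
        for tag, score in scores.items():
            combined[tag] = combined.get(tag, 0.0) + weight * score
    return combined


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k highest-scoring tags, breaking ties alphabetically."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ranked[:k]]


def recall_at_k(recommended: List[str], true_tags: Set[str], k: int) -> float:
    """Fraction of the true tags that appear in the first k recommendations."""
    if not true_tags:
        return 0.0
    hits = len(set(recommended[:k]) & true_tags)
    return hits / len(true_tags)


# Toy scores for one object; all values are invented for illustration only.
multi_label = {"python": 0.9, "django": 0.4, "linux": 0.1}
similarity = {"python": 0.7, "flask": 0.5, "django": 0.3}
tag_term = {"django": 0.8, "python": 0.6, "ubuntu": 0.2}

# Assumed weights, one per component; they are not taken from the paper.
combined = combine_scores([multi_label, similarity, tag_term], [0.4, 0.3, 0.3])
recommended = top_k(combined, k=2)
score = recall_at_k(recommended, {"python", "django", "flask"}, k=2)
```

Averaging `recall_at_k` over every object in a test set would give a site-level figure comparable in kind to the recall@5 and recall@10 values reported above.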

[1] Storey M, Treude C, Deursen A, Cheng L. The impact of social media on software engineering practices and tools. In Proc. the FSE/SDP Workshop on Future of Software Engineering Research, November 2010, pp.359-364.

[2] Begel A, DeLine R, Zimmermann T. Social media for software engineering. In Proc. the FSE/SDP Workshop on Future of Software Engineering Research, November 2010, pp.33-38.

[3] Treude C, Storey M. How tagging helps bridge the gap between social and technical aspects in software development. In Proc. the 31st IEEE International Conference on Software Engineering (ICSE), May 2009, pp.12-22.

[4] Treude C, Storey M A. Work item tagging: Communicating concerns in collaborative software development. IEEE Transactions on Software Engineering, 2012, 38(1):19-34.

[5] Tsoumakas G, Katakis I. Multi-label classification: An overview. International Journal of Data Warehousing and Mining (IJDWM), 2007, 3(3):1-13.

[6] Al-Kofahi J, Tamrawi A, Nguyen T T, Nguyen H A, Nguyen T N. Fuzzy set approach for automatic tagging in evolving software. In Proc. the 26th IEEE International Conference on Software Maintenance (ICSM), September 2010.

[7] Zangerle E, Gassler W, Specht G. Using tag recommendations to homogenize folksonomies in microblogging environments. In Proc. the 3rd Int. Conf. Social Informatics, Oct. 2011, pp.113-126.

[8] Xia X, Lo D, Wang X, Zhou B. Tag recommendation in software information sites. In Proc. the 10th Working Conference on Mining Software Repositories, May 2013, pp.287-296.

[9] Baeza-Yates R A, Ribeiro-Neto B A. Modern Information Retrieval: The Concepts and Technology Behind Search (2nd edition). Boston, MA, USA: Addison-Wesley Publishing Company, 2011.

[10] Tsoumakas G, Katakis I, Vlahavas I. Mining multi-label data. In Data Mining and Knowledge Discovery Handbook, Maimon O, Rokach L (eds.), Springer US, 2010, pp.667-685.

[11] Tsoumakas G, Katakis I, Vlahavas I. Random k-labelsets for multilabel classification. IEEE Transactions on Knowledge and Data Engineering, 2011, 23(7):1079-1089.

[12] Zhang M, Zhou Z. Multilabel neural networks with applications to functional genomics and text categorization. IEEE Transactions on Knowledge and Data Engineering, 2006, 18(10):1338-1351.

[13] McCallum A, Nigam K. A comparison of event models for naive Bayes text classification. In Proc. AAAI-98 Workshop on Learning for Text Categorization, July 1998.

[14] Tsoumakas G, Spyromitros-Xioufis L, Vilcek J, Vlahavas I. MULAN: A Java library for multi-label learning. Journal of Machine Learning Research, 2011, 12:2411-2414.

[15] Han J, Kamber M. Data Mining: Concepts and Techniques (2nd edition). San Francisco, CA, USA: Morgan Kaufmann, 2006.

[16] Zhang M, Zhou Z. ML-KNN: A lazy learning approach to multi-label learning. Pattern Recognition, 2007, 40(7):2038-2048.

[17] Wang S, Lo D, Jiang L. Inferring semantically related software terms and their taxonomy by leveraging collaborative tagging. In Proc. the 28th IEEE International Conference on Software Maintenance (ICSM), September 2012, pp.604-607.

[18] Bacchelli A. Mining challenge 2013: Stack Overflow. In Proc. the 10th Working Conference on Mining Software Repositories, June 2013.

[19] Wurst M. The word vector tool: User guide. December 2007. http://wvtool.sf.net, Mar. 2015.

[20] Yang L, Qiu M, Gottipati S, Zhu F, Jiang J, Sun H, Chen Z. CQArank: Jointly model topics and expertise in community question answering. In Proc. the 22nd ACM International Conference on Information & Knowledge Management, October 27-November 1, 2013, pp.99-108.

[21] Surian D, Liu N, Lo D, Tong H, Lim E, Faloutsos C. Recommending people in developers' collaboration network. In Proc. the 18th Working Conference on Reverse Engineering (WCRE), Oct. 2011, pp.379-388.

[22] Xia X, Lo D, Qiu W, Wang X, Zhou B. Automated configuration bug report prediction using text mining. In Proc. the 38th IEEE Annual Computer Software and Applications Conference (COMPSAC), July 2014, pp.107-116.

[23] Shihab E, Ihara A, Kamei Y, Ibrahim W M, Ohira M, Adams B, Hassan A E, Matsumoto K I. Predicting reopened bugs: A case study on the Eclipse project. In Proc. the 17th Working Conference on Reverse Engineering (WCRE), Oct. 2010, pp.249-258.

[24] Osman M H, Chaudron M R, van der Putten P. An analysis of machine learning algorithms for condensing reverse engineered class diagrams. In Proc. the 29th IEEE International Conference on Software Maintenance (ICSM), September 2013, pp.140-149.

[25] Xia X, Feng Y, Lo D, Chen Z, Wang X. Towards more accurate multi-label software behavior learning. In Proc. the 2014 Software Evolution Week - IEEE Conference on Software Maintenance, Reengineering and Reverse Engineering (CSMR-WCRE), Feb. 2014, pp.134-143.

[26] Xia X, Lo D, Wang X, Zhou B. Accurate developer recommendation for bug resolution. In Proc. the 20th Working Conference on Reverse Engineering (WCRE), Oct. 2013, pp.72-81.

[27] Marlow C, Naaman M, Boyd D, Davis M. HT06, tagging paper, taxonomy, Flickr, academic article, to read. In Proc. the 17th Conference on Hypertext and Hypermedia, August 2006, pp.31-40.

[28] Sigurbjörnsson B, Van Zwol R. Flickr tag recommendation based on collective knowledge. In Proc. the 17th International Conference on World Wide Web, April 2008, pp.327-336.

[29] Hong Q, Kim S, Cheung S, Bird C. Understanding a developer social network and its evolution. In Proc. the 27th IEEE International Conference on Software Maintenance (ICSM), September 2011, pp.323-332.

[30] Surian D, Lo D, Lim E P. Mining collaboration patterns from a large developer network. In Proc. the 17th Working Conference on Reverse Engineering (WCRE), Oct. 2010, pp.269-273.

[31] Bougie G, Starke J, Storey M A, German D M. Towards understanding Twitter use in software engineering: Preliminary findings, ongoing challenges and future questions. In Proc. the 2nd International Workshop on Web 2.0 for Software Engineering, May 2011, pp.31-36.

[32] Tian Y, Achananuparp P, Lubis I, Lo D, Lim E. What does software engineering community microblog about? In Proc. the 9th IEEE Working Conference on Mining Software Repositories (MSR), June 2012, pp.247-250.

[33] Achananuparp P, Lubis I N, Tian Y, Lo D, Lim E P. Observatory of trends in software related microblogs. In Proc. the 27th IEEE/ACM International Conference on Automated Software Engineering, September 2012, pp.334-337.

[34] Prasetyo P K, Lo D, Achananuparp P, Tian Y, Lim E P. Automatic classification of software related microblogs. In Proc. the 28th IEEE International Conference on Software Maintenance (ICSM), September 2012, pp.596-599.

[35] Pagano D, Maalej W. How do developers blog? An exploratory study. In Proc. the 8th Working Conference on Mining Software Repositories, May 2011, pp.123-132.

[36] Gottipati S, Lo D, Jiang J. Finding relevant answers in software forums. In Proc. the 26th IEEE/ACM International Conference on Automated Software Engineering, November 2011, pp.323-332.

[37] Henß S, Monperrus M, Mezini M. Semi-automatically extracting FAQs to improve accessibility of software development knowledge. In Proc. the 34th International Conference on Software Engineering, June 2012, pp.793-803.

[38] Thung F, Lo D, Jiang L. Detecting similar applications with collaborative tagging. In Proc. the 28th IEEE International Conference on Software Maintenance (ICSM), September 2012, pp.600-603.

[39] Zhang M, Zhang K. Multi-label learning by exploiting label dependency. In Proc. the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, July 2010, pp.999-1008.

[40] Read J, Pfahringer B, Holmes G, Frank E. Classifier chains for multi-label classification. Machine Learning, 2011, 85(3):333-359.

[41] Deerwester S, Dumais S T, Furnas G W, Landauer T K, Harshman R. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 1990, 41(6):391-407.

[42] Hofmann T. Probabilistic latent semantic indexing. In Proc. the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, August 1999, pp.50-57.

[43] Jolliffe I. Principal Component Analysis. Springer-Verlag New York, 2002.

Copyright © Editorial Office of Journal of Computer Science and Technology