Journal of Computer Science and Technology, 2015, Vol. 30, Issue (5): 1017-1035. doi: 10.1007/s11390-015-1578-2

Special Issue: Data Management and Data Mining; Software Systems

• Special Section on Software Systems •

TagCombine: Recommending Tags to Contents in Software Information Sites

Xin-Yu Wang¹ (王新宇), Xin Xia¹* (夏鑫), Member, CCF, ACM, IEEE, David Lo², Member, ACM, IEEE

  ¹ College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China;
  ² School of Information Systems, Singapore Management University, Singapore, Singapore
  • Received: 2015-03-20 Revised: 2015-07-09 Online: 2015-09-05 Published: 2015-09-05
  • Contact: Xin Xia, E-mail: xxia@zju.edu.cn
  • About author: Xin-Yu Wang received his Bachelor's and Ph.D. degrees in computer science from Zhejiang University, Hangzhou, in 2002 and 2007 respectively. He was a research assistant at Zhejiang University from 2002 to 2007. He is currently an associate professor in the College of Computer Science and Technology, Zhejiang University. His research interests include software engineering, formal methods, and very large information systems.
  • Supported by:

    This research was partially supported by China Knowledge Centre for Engineering Sciences and Technology under Grant No. CKCEST-2014-1-5, the National Key Technology Research and Development Program of the Ministry of Science and Technology of China under Grant Nos. 2015BAH17F01 and 2013BAH01B01, and the Fundamental Research Funds for the Central Universities of China.

Nowadays, software engineers use a variety of online media to search for and keep informed of new and interesting technologies, and to learn from and help one another. We refer to online media that help software engineers improve their performance in software development, maintenance, and testing as software information sites. In this paper, we propose TagCombine, an automatic tag recommendation method that analyzes objects in software information sites. TagCombine has three components: 1) a multi-label ranking component, which treats tag recommendation as a multi-label learning problem; 2) a similarity-based ranking component, which recommends tags from similar objects; and 3) a tag-term based ranking component, which considers the relationship between terms and tags, and recommends tags after analyzing the terms in the objects. We evaluate TagCombine on four software information sites: Ask Different, Ask Ubuntu, Freecode, and Stack Overflow. Averaged across the four sites, TagCombine achieves recall@5 and recall@10 of 0.6198 and 0.7625 respectively, improving on TagRec, proposed by Al-Kofahi et al., by 14.56% and 10.55% respectively, and on the tag recommendation method proposed by Zangerle et al. by 12.08% and 8.16% respectively.
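
To make the evaluation measure concrete, the following Python sketch shows one way recall@k could be computed for a ranked tag list. It rests on several assumptions: the abstract does not say how the three component scores are merged, so a weighted sum is used here purely for illustration; recall@k is taken to be the fraction of an object's true tags found in the top k recommendations, averaged over objects; and the function names, weights, and scores are all hypothetical, not taken from the paper.

```python
from typing import Dict, List, Set


def combine_scores(
    component_scores: List[Dict[str, float]],
    weights: List[float],
) -> Dict[str, float]:
    """Merge per-tag scores from several ranking components.

    Assumption: a weighted sum. The abstract does not state the actual rule.
    """
    combined: Dict[str, float] = {}
    for scores, weight in zip(component_scores, weights):
        for tag, score in scores.items():
            combined[tag] = combined.get(tag, 0.0) + weight * score
    return combined


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k highest-scoring tags (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:k]]


def recall_at_k(
    recommended: List[List[str]],
    actual: List[Set[str]],
    k: int,
) -> float:
    """Average, over objects, of |top-k recommended ∩ true tags| / |true tags|."""
    per_object = [
        len(set(rec[:k]) & truth) / len(truth)
        for rec, truth in zip(recommended, actual)
        if truth  # skip objects with no true tags to avoid division by zero
    ]
    return sum(per_object) / len(per_object) if per_object else 0.0


# Hypothetical scores from the three components for a single object.
multi_label = {"python": 0.9, "django": 0.4}
similarity = {"python": 0.5, "flask": 0.6}
tag_term = {"django": 0.7, "orm": 0.2}

combined = combine_scores([multi_label, similarity, tag_term], [0.5, 0.3, 0.2])
ranking = top_k(combined, 4)
print(ranking)
print(recall_at_k([ranking], [{"python", "flask"}], k=2))
```

In this toy input, only one of the object's two true tags appears in the top two recommendations, so recall@2 is lower than recall@3, where both are covered. Larger k can only keep or raise recall, which is consistent with the reported recall@10 exceeding recall@5.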

[1] Storey M, Treude C, Deursen A, Cheng L. The impact of social media on software engineering practices and tools. In Proc. the FSE/SDP Workshop on Future of Software Engineering Research, November 2010, pp.359-364.

[2] Begel A, DeLine R, Zimmermann T. Social media for software engineering. In Proc. the FSE/SDP Workshop on Future of Software Engineering Research, November 2010, pp.33-38.

[3] Treude C, Storey M. How tagging helps bridge the gap between social and technical aspects in software development. In Proc. the 31st IEEE International Conference on Software Engineering (ICSE), May 2009, pp.12-22.

[4] Treude C, Storey M A. Work item tagging:Communicating concerns in collaborative software development. IEEE Transactions on Software Engineering, 2012, 38(1):19-34.

[5] Tsoumakas G, Katakis I. Multi-label classification:An overview. International Journal of Data Warehousing and Mining (IJDWM), 2007, 3(3):1-13.

[6] Al-Kofahi J, Tamrawi A, Nguyen T T, Nguyen H A, Nguyen T N. Fuzzy set approach for automatic tagging in evolving software. In Proc. the 26th IEEE International Conference on Software Maintenance (ICSM), September 2010.

[7] Zangerle E, Gassler W, Specht G. Using tag recommendations to homogenize folksonomies in microblogging environments. In Proc. the 3rd Int. Conf. Social Informatics, Oct. 2011, pp.113-126.

[8] Xia X, Lo D, Wang X, Zhou B. Tag recommendation in software information sites. In Proc. the 10th Working Conference on Mining Software Repositories, May 2013, pp.287-296.

[9] Baeza-Yates R A, Ribeiro-Neto B A. Modern Information Retrieval-The Concepts and Technology Behind Search (2nd edition). Boston, MA, USA:Addison-Wesley Publishing Company, 2011.

[10] Tsoumakas G, Katakis I, Vlahavas I. Mining multi-label data. In Data Mining and Knowledge Discovery Handbook, Maimon O, Rokach L (eds.), Springer US, 2010, pp.667-685.

[11] Tsoumakas G, Katakis I, Vlahavas I. Random k-labelsets for multilabel classification. IEEE Transactions on Knowledge and Data Engineering, 2011, 23(7):1079-1089.

[12] Zhang M, Zhou Z. Multilabel neural networks with applications to functional genomics and text categorization. IEEE Transactions on Knowledge and Data Engineering, 2006, 18(10):1338-1351.

[13] McCallum A, Nigam K. A comparison of event models for naive Bayes text classification. In Proc. AAAI-98 Workshop on Learning for Text Categorization, July 1998.

[14] Tsoumakas G, Spyromitros-Xioufis L, Vilcek J, Vlahavas I. MULAN:A Java library for multi-label learning. Journal of Machine Learning Research, 2011, 12:2411-2414.

[15] Han J, Kamber M. Data Mining:Concepts and Techniques (2nd edition). San Francisco, CA, USA:Morgan Kaufmann, 2006.

[16] Zhang M, Zhou Z. ML-KNN:A lazy learning approach to multi-label learning. Pattern Recognition, 2007, 40(7):2038-2048.

[17] Wang S, Lo D, Jiang L. Inferring semantically related software terms and their taxonomy by leveraging collaborative tagging. In Proc. the 28th IEEE International Conference on Software Maintenance (ICSM), September 2012, pp.604-607.

[18] Bacchelli A. Mining challenge 2013:Stack Overflow. In Proc. the 10th Working Conference on Mining Software Repositories, June 2013.

[19] Wurst M. The word vector tool:User guide. December 2007. http://wvtool.sf.net, Mar. 2015.

[20] Yang L, Qiu M, Gottipati S, Zhu F, Jiang J, Sun H, Chen Z. CQArank:Jointly model topics and expertise in community question answering. In Proc. the 22nd ACM International Conference on Conference on Information & Knowledge Management, October 27-November 1, 2013, pp.99-108.

[21] Surian D, Liu N, Lo D, Tong H, Lim E, Faloutsos C. Recommending people in developers' collaboration network. In Proc. the 18th Working Conference on Reverse Engineering (WCRE), Oct. 2011, pp.379-388.

[22] Xia X, Lo D, Qiu W, Wang X, Zhou B. Automated configuration bug report prediction using text mining. In Proc. the 38th IEEE Annual Computer Software and Applications Conference (COMPSAC), July 2014, pp.107-116.

[23] Shihab E, Ihara A, Kamei Y, Ibrahim W M, Ohira M, Adams B, Hassan A E, Matsumoto K I. Predicting reopened bugs:A case study on the Eclipse project. In Proc. the 17th Working Conference on Reverse Engineering (WCRE), Oct. 2010, pp.249-258.

[24] Osman M H, Chaudron M R, van der Putten P. An analysis of machine learning algorithms for condensing reverse engineered class diagrams. In Proc. the 29th IEEE International Conference on Software Maintenance (ICSM), September 2013, pp.140-149.

[25] Xia X, Feng Y, Lo D, Chen Z, Wang X. Towards more accurate multi-label software behavior learning. In Proc. the 2014 Software Evolution Week-IEEE Conference on Software Maintenance, Reengineering and Reverse Engineering (CSMR-WCRE), Feb. 2014, pp.134-143.

[26] Xia X, Lo D, Wang X, Zhou B. Accurate developer recommendation for bug resolution. In Proc. the 20th Working Conference on Reverse Engineering (WCRE), Oct. 2013, pp.72-81.

[27] Marlow C, Naaman M, Boyd D, Davis M. HT06, tagging paper, taxonomy, Flickr, academic article, to read. In Proc. the 17th Conference on Hypertext and Hypermedia, August 2006, pp.31-40.

[28] Sigurbjörnsson B, Van Zwol R. Flickr tag recommendation based on collective knowledge. In Proc. the 17th International Conference on World Wide Web, April 2008, pp.327-336.

[29] Hong Q, Kim S, Cheung S, Bird C. Understanding a developer social network and its evolution. In Proc. the 27th IEEE International Conference on Software Maintenance (ICSM), September 2011, pp.323-332.

[30] Surian D, Lo D, Lim E P. Mining collaboration patterns from a large developer network. In Proc. the 17th Working Conference on Reverse Engineering (WCRE), Oct. 2010, pp.269-273.

[31] Bougie G, Starke J, Storey M A, German D M. Towards understanding Twitter use in software engineering:Preliminary findings, ongoing challenges and future questions. In Proc. the 2nd International Workshop on Web 2.0 for Software Engineering, May 2011, pp.31-36.

[32] Tian Y, Achananuparp P, Lubis I, Lo D, Lim E. What does software engineering community microblog about? In Proc. the 9th IEEE Working Conference on Mining Software Repositories (MSR), June 2012, pp.247-250.

[33] Achananuparp P, Lubis I N, Tian Y, Lo D, Lim E P. Observatory of trends in software related microblogs. In Proc. the 27th IEEE/ACM International Conference on Automated Software Engineering, September 2012, pp.334-337.

[34] Prasetyo P K, Lo D, Achananuparp P, Tian Y, Lim E P. Automatic classification of software related microblogs. In Proc. the 28th IEEE International Conference on Software Maintenance (ICSM), September 2012, pp.596-599.

[35] Pagano D, Maalej W. How do developers blog?:An exploratory study. In Proc. the 8th Working Conference on Mining Software Repositories, May 2011, pp.123-132.

[36] Gottipati S, Lo D, Jiang J. Finding relevant answers in software forums. In Proc. the 26th IEEE/ACM International Conference on Automated Software Engineering, November 2011, pp.323-332.

[37] Henß S, Monperrus M, Mezini M. Semi-automatically extracting FAQs to improve accessibility of software development knowledge. In Proc. the 34th International Conference on Software Engineering, June 2012, pp.793-803.

[38] Thung F, Lo D, Jiang L. Detecting similar applications with collaborative tagging. In Proc. the 28th IEEE International Conference on Software Maintenance (ICSM), September 2012, pp.600-603.

[39] Zhang M, Zhang K. Multi-label learning by exploiting label dependency. In Proc. the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, July 2010, pp.999-1008.

[40] Read J, Pfahringer B, Holmes G, Frank E. Classifier chains for multi-label classification. Machine Learning, 2011, 85(3):333-359.

[41] Deerwester S, Dumais S T, Furnas G W, Landauer T K, Harshman R. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 1990, 41(6):391-407.

[42] Hofmann T. Probabilistic latent semantic indexing. In Proc. the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, August 1999, pp.50-57.

[43] Jolliffe I. Principal Component Analysis. Springer-Verlag New York, 2002.

ISSN 1000-9000(Print)

CN 11-2296/TP

Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
E-mail: jcst@ict.ac.cn
  Copyright ©2015 JCST, All Rights Reserved