Mobile apps (applications) have become a popular form of software, and user reviews of apps have become an important source of feedback. In their reviews, users may raise issues they encounter while using an app, such as a functional bug, network lag, or a feature request. Understanding these issues helps developers focus on users' concerns and helps users evaluate similar apps before downloading or purchasing them. However, it is not known in advance which types of issues a given review raises. Moreover, the volume of user reviews is huge, and the review text is unstructured and informal. In this paper, we analyze 3 902 user reviews from 11 mobile apps in a Chinese app store, 360 Mobile Assistant, and uncover 17 issue types. We then propose an approach, CSLabel, that labels user reviews according to the issue types they raise. CSLabel uses a cost-sensitive learning method to mitigate the effects of imbalanced data, and optimizes the kernel function setting of the support vector machine (SVM) classifier. Results show that CSLabel can correctly label reviews with a precision of 66.5%, a recall of 69.8%, and an F1 measure of 69.8%. Compared with the state-of-the-art approach, CSLabel improves precision by 14%, recall by 30%, and the F1 measure by 22%. Finally, we apply our approach to two real scenarios: 1) we provide an overview of 1 076 786 user reviews from 1 100 apps in 360 Mobile Assistant, and 2) we find that some issue types are negatively correlated with users' evaluation of apps.
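The idea of cost-sensitive learning on imbalanced, multi-label review data can be illustrated with a minimal sketch. The abstract does not give the cost values CSLabel uses, so the heuristic below (weighting a missed positive by the negative-to-positive ratio for each issue type) is an assumption made for illustration only, as are the function name `positive_class_costs` and the toy reviews.

```python
def positive_class_costs(review_labels, issue_types):
    """Return, per issue type, a hypothetical cost for missing a positive review.

    Each issue type is treated as its own yes/no problem. The cost is the
    ratio of negative to positive reviews, so rarer issue types are
    penalized more heavily when missed. This ratio is an assumed
    heuristic, not the cost setting reported for CSLabel.
    """
    total = len(review_labels)
    costs = {}
    for issue in issue_types:
        n_pos = sum(1 for labels in review_labels if issue in labels)
        n_neg = total - n_pos
        # Fall back to a neutral cost when one class is absent.
        costs[issue] = n_neg / n_pos if n_pos and n_neg else 1.0
    return costs


# Toy data (hypothetical): each review carries a set of issue labels.
reviews = [
    {"functional bug"},
    {"network lag"},
    {"functional bug", "feature request"},
    set(),
    {"functional bug"},
    set(),
]
issues = ["functional bug", "network lag", "feature request"]
costs = positive_class_costs(reviews, issues)
for issue in issues:
    print(issue, costs[issue])
```

In a setup like this, the resulting costs would be handed to the classifier as per-class misclassification weights, so that errors on rare issue types count for more during training.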
This work is supported by the National Natural Science Foundation of China under Grant No. 61672078, and the State Key Laboratory of Software Development Environment of China under Grant No. SKLSDE-2017ZX-06.
Corresponding author: Jing Jiang
About the author: Li Zhang received her Bachelor's, Master's, and Ph.D. degrees in computer science and technology from Beihang University, Beijing, in 1989, 1992, and 1996, respectively. She is now a professor in the State Key Laboratory of Software Development Environment of Beihang University, Beijing, where she leads the expertise area of system and software modeling.
Li Zhang, Xin-Yue Huang, Jing Jiang, Ya-Kun Hu. CSLabel: An Approach for Labelling Mobile App Reviews[J]. Journal of Computer Science and Technology, 2017, 32(6): 1076-1089