Journal of Computer Science and Technology, 2017, Vol. 32, Issue (4): 785-795. doi: 10.1007/s11390-017-1759-2

Topic: Artificial Intelligence and Pattern Recognition

• Special Section on Selected Paper from NPC 2011 •

Emphasizing Essential Words for Sentiment Classification Based on Recurrent Neural Networks

Fei Hu1,2, Student Member, CCF, Li Li1,*, Senior Member, CCF, Member, ACM, Zi-Li Zhang1, Distinguished Member, CCF, Member, ACM, Jing-Yuan Wang1, Student Member, CCF, Xiao-Fei Xu1, Student Member, CCF

  1 College of Computer and Information Science, Southwest University, Chongqing 400715, China;
  2 Network Centre, Chongqing University of Education, Chongqing 400065, China
  • Received: 2016-12-20 Revised: 2017-06-03 Online: 2017-07-05 Published: 2017-07-05
  • Contact: Li Li E-mail: lily@swu.edu.cn
  • About the author: Fei Hu is a Ph.D. candidate in the College of Computer and Information Science, Southwest University, Chongqing. His research interests include deep learning technologies and natural language processing.
  • Supported by:

    The work was supported by the Scientific and Technological Research Program of Chongqing Municipal Education Commission of China under Grant No. KJ1501405, the National Natural Science Foundation of China under Grant No. 61170192, and the Chongqing Science and Technology Commission (CSTC) under Grant No. cstc2015gjhz40002.

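Chinese abstract (translated): With the spread of online communication, people can easily obtain all kinds of text through the Internet. Much of this text is short and does not follow grammatical conventions, such as forum posts, microblogs and movie reviews, and it is difficult for computers to analyze. For example, traditional bag-of-words (BOW) probabilistic models struggle with short texts, because short texts lack the statistical information that BOW models require. In recent years, many researchers have turned to the dependencies between the words of a text to compensate for the lack of word statistics and thereby mine text semantics more effectively. LSTM is one such model: it can capture the dependencies between words and "remember" them, even when the two words are far apart. Meanwhile, by studying how humans read, we find that people remember certain specific words in a text particularly well. This "differentiated memory" helps them retain the key content of a text and thus understand its semantics better. In this paper, we propose an LSTM-based keyword memory model that imitates this human "differentiated memory" pattern in order to better understand text semantics. To validate it, we apply the model to sentiment classification on two datasets: IMDB and SemEval-2016. The experimental results show that our model is effective: it improves sentiment classification accuracy by 1 to 2 percentage points over the baseline LSTM and far outperforms non-RNN models, especially on short texts. We also apply "differentiated memory" to the GRU model (a variant of LSTM) and again obtain good results.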

Abstract: With the explosion of online communication and publication, texts have become readily obtainable from forums, chat messages, blogs, book reviews and movie reviews. These texts are usually short and noisy, lacking the statistical signals and information needed for good semantic analysis. Traditional natural language processing methods, such as bag-of-words (BOW) based probabilistic latent semantic models, fail to achieve high performance on such short texts. Recent research has focused on the correlations between words, i.e., term dependencies, which can help mine the latent semantics hidden in short texts and help people understand them. The long short-term memory (LSTM) network can capture term dependencies and remember information over long periods of time. LSTM has been widely used and has obtained promising results on a variety of problems involving the latent semantics of texts. At the same time, by analyzing the texts, we find that a number of keywords contribute greatly to their semantics. In this paper, we establish a keyword vocabulary and propose an LSTM-based model that is sensitive to the words in this vocabulary, so that the keywords carry extra weight in the semantics of the full document. The proposed model is evaluated on a short-text sentiment analysis task on two datasets: IMDB and SemEval-2016. Experimental results demonstrate that our model outperforms the baseline LSTM by 1% to 2% in terms of accuracy and delivers a significant performance improvement over several non-recurrent neural network latent semantic models, especially on short texts. We also incorporate the idea into a variant of LSTM, the gated recurrent unit (GRU) model, and achieve good performance, which indicates that our method is general enough to improve different deep learning models.
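The abstract does not say how the LSTM is made sensitive to the keyword vocabulary, so the following is only a rough illustration of the general idea, not the authors' model. It assumes one simple option: a standard LSTM step (biases omitted for brevity) in which the input gate is scaled up whenever the current token belongs to the keyword vocabulary, so that keywords write more strongly into the cell state. All names here (`lstm_step`, `encode`, `boost`, the toy embeddings and the keyword set) are hypothetical and chosen for this sketch.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def init_params(input_dim, hidden_dim, seed=0):
    """Small random weights, one matrix per gate, acting on [h_prev; x]."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {g: mat(hidden_dim, hidden_dim + input_dim) for g in ("i", "f", "o", "c")}


def lstm_step(params, x, h_prev, c_prev, is_keyword, boost=2.0):
    """One LSTM step; biases are omitted to keep the sketch short."""
    z = h_prev + x  # list concatenation of previous hidden state and input
    i = [sigmoid(a) for a in matvec(params["i"], z)]    # input gate
    f = [sigmoid(a) for a in matvec(params["f"], z)]    # forget gate
    o = [sigmoid(a) for a in matvec(params["o"], z)]    # output gate
    g = [math.tanh(a) for a in matvec(params["c"], z)]  # candidate cell values
    if is_keyword:
        # ASSUMPTION (not from the paper): emphasize keywords by scaling the
        # input gate, capped at 1.0 so it remains a valid gate value.
        i = [min(1.0, boost * v) for v in i]
    c = [fv * cp + iv * gv for fv, cp, iv, gv in zip(f, c_prev, i, g)]
    h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
    return h, c


def encode(tokens, embeddings, keywords, params, hidden_dim):
    """Run the LSTM over a token sequence and return the final hidden state."""
    h = [0.0] * hidden_dim
    c = [0.0] * hidden_dim
    for tok in tokens:
        h, c = lstm_step(params, embeddings[tok], h, c, tok in keywords)
    return h


# Toy data: 2-dimensional embeddings and a one-word keyword vocabulary.
embeddings = {
    "the": [0.1, 0.0],
    "movie": [0.0, 0.2],
    "was": [0.1, 0.1],
    "awful": [-0.9, 0.8],
}
tokens = ["the", "movie", "was", "awful"]
params = init_params(input_dim=2, hidden_dim=3)

h_plain = encode(tokens, embeddings, set(), params, 3)      # baseline LSTM
h_emph = encode(tokens, embeddings, {"awful"}, params, 3)   # keyword-aware
print("baseline:", [round(v, 4) for v in h_plain])
print("emphasis:", [round(v, 4) for v in h_emph])
```

With an empty keyword set the function reduces to a plain LSTM, which makes it easy to compare the two document representations before feeding them to a sentiment classifier. In a trained model the weights would be learned rather than random, and the emphasis mechanism could equally be a learned gate instead of a fixed `boost` factor.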
