Journal of Computer Science and Technology, 2015, Vol. 30, Issue (5): 1120-1129. doi: 10.1007/s11390-015-1587-1

Topics: Artificial Intelligence and Pattern Recognition; Data Management and Data Mining

• Special Section on Selected Paper from NPC 2011

Microblog Sentiment Analysis with Emoticon Space Model

Fei Jiang1,2,3 (姜飞), Yi-Qun Liu1,2,3* (刘奕群), Senior Member, CCF, Huan-Bo Luan1,2,3 (栾焕博), Jia-Shen Sun4 (孙甲申), Xuan Zhu4 (朱璇), Min Zhang1,2,3 (张敏), Senior Member, CCF, Shao-Ping Ma1,2,3 (马少平)

    1 State Key Laboratory of Intelligent Technology and Systems, Beijing 100084, China;
    2 Tsinghua National Laboratory for Information Science and Technology, Beijing 100084, China;
    3 Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China;
    4 Language Computing Laboratory, Samsung Research & Development Institute of China, Beijing 100028, China
  • Received: 2014-11-15; Revised: 2015-07-03; Online: 2015-09-05; Published: 2015-09-05
  • Contact: Yi-Qun Liu, E-mail: yiqunliu@tsinghua.edu.cn
  • About the author: Fei Jiang is a master's candidate in the Department of Computer Science and Technology, Tsinghua University, Beijing. He received his B.S. degree in computer science from Tsinghua University in 2013. His research interests include information retrieval and natural language processing.
  • Supported by:

    This work was supported by Tsinghua-Samsung Joint Laboratory, the National Basic Research 973 Program of China under Grant No. 2015CB358700, and the National Natural Science Foundation of China under Grant Nos. 61472206, 61073071, and 61303075.

Abstract: Emoticons have been widely employed to express different types of moods, emotions, and feelings in microblog environments. They are therefore regarded as one of the most important signals for microblog sentiment analysis. Most existing studies use several emoticons that convey clear emotional meanings as noisy sentiment labels or similar sentiment indicators. However, in practical microblog environments, tens or even hundreds of emoticons are frequently adopted and all emoticons have their own unique emotional meanings. Besides, a considerable number of emoticons do not have clear emotional meanings. An improved sentiment analysis model should not overlook these phenomena. Instead of manually assigning sentiment labels to several emoticons that convey relatively clear meanings, we propose the emoticon space model (ESM) that leverages more emoticons to construct word representations from a massive amount of unlabeled data. By projecting words and microblog posts into an emoticon space, the proposed model helps identify subjectivity, polarity, and emotion in microblog environments. The experimental results for a public microblog benchmark corpus (NLP&CC 2013) indicate that ESM effectively leverages emoticon signals and outperforms previous state-of-the-art strategies and benchmark best runs.
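
The abstract does not spell out how the projection is computed, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that words and emoticons already have embedding vectors learned from unlabeled posts, that a word's coordinate along each emoticon axis is its cosine similarity to that emoticon's vector, and that a post is represented by the average of its words' coordinates. The function names, the toy vectors, and the emoticon tokens `[smile]` and `[cry]` are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def project_word(word_vec, emoticon_vecs):
    """Map a word vector to emoticon space: one coordinate per emoticon.

    Assumption: the coordinate is the cosine similarity to the emoticon vector.
    """
    return [cosine(word_vec, e) for e in emoticon_vecs]


def project_post(words, embeddings, emoticon_vecs):
    """Represent a post as the average of its known words' emoticon-space coordinates.

    Assumption: averaging is used here for illustration; words without an
    embedding are skipped, and an empty post maps to the zero vector.
    """
    coords = [project_word(embeddings[w], emoticon_vecs)
              for w in words if w in embeddings]
    if not coords:
        return [0.0] * len(emoticon_vecs)
    return [sum(col) / len(coords) for col in zip(*coords)]


# Toy 2-dimensional embeddings (hypothetical values, for illustration only).
embeddings = {
    "happy": [1.0, 0.0],
    "sad": [0.0, 1.0],
    "[smile]": [1.0, 0.0],
    "[cry]": [0.0, 1.0],
}
emoticons = ["[smile]", "[cry]"]
emoticon_vecs = [embeddings[e] for e in emoticons]

word_features = project_word(embeddings["happy"], emoticon_vecs)
post_features = project_post(["happy", "sad", "unknown"], embeddings, emoticon_vecs)
```

In a setup like this, the resulting post vectors would serve as features for a standard classifier that predicts subjectivity, polarity, or emotion. Because every emoticon contributes one axis, emoticons without a clear polarity can still carry signal without being assigned a label by hand.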
