Microblog Sentiment Analysis with Emoticon Space Model
Abstract: Emoticons are widely used in microblog environments to express moods, emotions, and feelings, and they are therefore regarded as one of the most important signals for microblog sentiment analysis. Most existing studies select a few emoticons that convey clear emotional meanings and use them as noisy sentiment labels or similar sentiment indicators. In practice, however, tens or even hundreds of emoticons are in frequent use, each with its own emotional meaning, and a considerable number of them do not convey a clear emotional meaning at all. A sentiment analysis model should not overlook these phenomena. Instead of manually assigning sentiment labels to a few emoticons with relatively clear meanings, we propose the emoticon space model (ESM), which leverages a larger set of emoticons to construct word representations from a massive amount of unlabeled data. By projecting words and microblog posts into an emoticon space, the model helps identify subjectivity, polarity, and emotion in microblog posts. Experimental results on a public microblog benchmark corpus (NLP&CC 2013) indicate that ESM makes effective use of emoticon signals and outperforms both previous state-of-the-art strategies and the best runs submitted to the benchmark.
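
The idea of projecting words and posts into an emoticon space can be illustrated with a small sketch. The abstract does not specify how the projection is computed, so everything below is an assumption made for illustration: words and emoticons are taken to share one embedding space, a word's coordinate along each emoticon axis is taken to be the cosine similarity between the two vectors, and a post is represented by the mean of its words' coordinates. The names `cosine`, `project_word`, and `project_post`, the bracketed emoticon tokens, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def project_word(word_vec, emoticon_vecs):
    """Assumed projection: one coordinate per emoticon, given by cosine similarity."""
    return [cosine(word_vec, e) for e in emoticon_vecs]


def project_post(tokens, embeddings, emoticon_vecs):
    """Assumed post representation: mean of the coordinates of the known words."""
    coords = [
        project_word(embeddings[t], emoticon_vecs)
        for t in tokens
        if t in embeddings  # skip out-of-vocabulary tokens
    ]
    if not coords:
        return [0.0] * len(emoticon_vecs)
    return [sum(column) / len(coords) for column in zip(*coords)]


# Toy 2-dimensional embeddings; real ones would be learned from unlabeled posts.
EMOTICONS = {"[smile]": [1.0, 0.0], "[angry]": [0.0, 1.0]}
EMBEDDINGS = {"great": [0.9, 0.1], "awful": [0.1, 0.9], "the": [0.5, 0.5]}

emoticon_vecs = list(EMOTICONS.values())
post_vec = project_post(["great", "the", "unknown"], EMBEDDINGS, emoticon_vecs)
print(post_vec)  # first coordinate ([smile] axis) is larger than the second
```

Under these assumptions, a vector such as `post_vec` could serve as the feature input to any standard classifier for subjectivity, polarity, or emotion. The appeal of this design is that no emoticon needs a hand-assigned sentiment label, since each emoticon only contributes an axis.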