A New Method for Sentiment Analysis Using Contextual Auto-Encoders

doi:10.1007/s11390-018-1889-1

Abstract: Sentiment analysis, an active research topic, presents new challenges for understanding the opinions and judgments that users express online. It aims to classify subjective texts by assigning them a polarity label. In this paper, we introduce a novel machine learning framework that uses auto-encoder networks to predict the sentiment polarity label at the word level and the sentence level. Inspired by the dimensionality reduction and feature extraction capabilities of auto-encoders, we propose a new model for distributed word vector representation, "PMI-SA", which takes pointwise mutual information "PMI" word vectors as input. The resulting continuous word vectors are combined to represent a sentence. An unsupervised sentence embedding method, called Contextual Recursive Auto-Encoders "CoRAE", is also developed for learning sentence representations. CoRAE follows the basic idea of recursive auto-encoders, deeply composing the vectors of the words that constitute the sentence, but without relying on any syntactic parse tree. The CoRAE model recursively combines each word with its context words (its neighbors: the previous and the next word) while taking word order into account. A support vector machine classifier with a fine-tuning technique is also used to show that our deep compositional representation model CoRAE significantly improves the accuracy of the sentiment analysis task. Experimental results demonstrate that CoRAE remarkably outperforms several competitive baseline methods on two datasets, namely the Sanders Twitter corpus and a Facebook comments corpus. The CoRAE model achieves an efficiency of 83.28% on the Facebook dataset and 97.57% on the Sanders dataset.
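The abstract names pointwise mutual information (PMI) word vectors as the input to PMI-SA but gives no formula. As a rough illustration only, the sketch below builds PMI vectors from windowed co-occurrence counts using the standard definition PMI(w, c) = log(p(w, c) / (p(w) p(c))). The function name `pmi_vectors`, the symmetric window of one word, and the choice to map unseen pairs to 0 are assumptions made for this example and are not taken from the paper.

```python
import math
from collections import Counter


def pmi_vectors(sentences, window=1):
    """Build one PMI row per word over context-word columns.

    Hypothetical helper for illustration; the paper's exact PMI
    variant and window size are not stated in the abstract.
    """
    pair_counts = Counter()  # (word, context word) co-occurrence counts
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pair_counts[(word, tokens[j])] += 1

    total = sum(pair_counts.values())
    word_counts = Counter()  # marginal count of each target word
    ctx_counts = Counter()   # marginal count of each context word
    for (word, ctx), n in pair_counts.items():
        word_counts[word] += n
        ctx_counts[ctx] += n

    vocab = sorted(word_counts)
    vectors = {}
    for word in vocab:
        row = []
        for ctx in vocab:
            n = pair_counts.get((word, ctx), 0)
            if n == 0:
                row.append(0.0)  # assumption: unseen pairs map to 0
            else:
                # log( p(w,c) / (p(w) p(c)) ) expressed with raw counts
                row.append(math.log(n * total / (word_counts[word] * ctx_counts[ctx])))
        vectors[word] = row
    return vocab, vectors


if __name__ == "__main__":
    toy_corpus = [["good", "movie"], ["bad", "movie"]]
    vocab, vectors = pmi_vectors(toy_corpus)
    for word in vocab:
        print(word, vectors[word])
```

In a pipeline like the one the abstract describes, rows of this kind would be the high-dimensional inputs that an auto-encoder compresses into dense word vectors. That compression step is not shown here.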
