Extracting Feature Words and Opinion Words from Reviews Based on Weak Supervision and Large Data

Lei Fang (房磊); Biao Liu (刘彪); Minlie Huang (黄民烈)

doi:10.1007/s11390-015-1569-3

Leveraging Large Data with Weak Supervision for Joint Feature and Opinion Word Extraction

Abstract (Chinese version, translated)

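Extracting feature words and opinion words is an important task in sentiment analysis. In this paper, we propose a method that extracts the feature words and opinion words of a corpus from a large amount of unlabeled review data, using only a few feature-opinion word pairs as prior knowledge.
Our main contributions are twofold. First, we propose a data-driven representation of the corpus-level relations between feature words and opinion words, which flexibly captures rich linguistic structures. Second, we use a simple unsupervised learning model that incorporates prior knowledge to extract feature words and opinion words, and the extraction process reduces error propagation to some extent. Experimental results show that the proposed method is highly effective for the feature and opinion word extraction task.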

Abstract: Product feature and opinion word extraction is important for fine-grained sentiment analysis. In this paper, we leverage large-scale unlabeled data for the joint extraction of feature and opinion words in a knowledge-poor setting, in which only a few feature-opinion pairs are used as weak supervision. Our major contributions are twofold: first, we propose a data-driven approach that represents product features and opinion words as a list of corpus-level syntactic relations, which captures rich language structures; second, we build a simple yet robust unsupervised model that incorporates prior knowledge to extract new feature and opinion words and performs well across settings. The extraction process is based on a bootstrapping framework which, to some extent, reduces error propagation on large data. Experimental results under various settings, in comparison with state-of-the-art baselines, demonstrate that our method is effective and promising.
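To make the two ideas in the abstract more concrete (a corpus-level relation representation of words, and bootstrapping new words from a few seed feature-opinion pairs), here is a minimal sketch of a generic bootstrapping loop. It is an illustration under stated assumptions, not the authors' model. The input format of `(head, relation, dependent)` dependency triples, the Universal Dependencies label `amod`, the helper names `relation_profiles` and `bootstrap`, and the `min_support` acceptance rule are all hypothetical choices made for this example; the support threshold is just one crude way to limit error propagation.

```python
from collections import Counter

# Hypothetical input: (head, relation, dependent) triples from dependency parses
# of unlabeled reviews; repeated triples encode corpus-level frequency.
Triple = tuple[str, str, str]
FLIP = {"in": "out", "out": "in"}


def relation_profiles(triples: list[Triple]) -> dict[str, Counter]:
    """Represent each word by corpus-level counts of (relation, direction, partner)."""
    profiles: dict[str, Counter] = {}
    for head, rel, dep in triples:
        profiles.setdefault(head, Counter())[(rel, "out", dep)] += 1
        profiles.setdefault(dep, Counter())[(rel, "in", head)] += 1
    return profiles


def bootstrap(triples: list[Triple], seed_pairs: list[tuple[str, str]],
              min_support: int = 2, max_iter: int = 10) -> tuple[set[str], set[str]]:
    """Grow feature/opinion sets from seed pairs (hypothetical scoring rule)."""
    profiles = relation_profiles(triples)
    features = {f for f, _ in seed_pairs}
    opinions = {o for _, o in seed_pairs}

    # Linking patterns learned from the seeds: (relation, direction seen from the feature).
    patterns = {(rel, d) for f, o in seed_pairs
                for (rel, d, partner) in profiles.get(f, Counter())
                if partner == o}

    for _ in range(max_iter):
        feat_support: Counter = Counter()
        op_support: Counter = Counter()
        known = features | opinions
        for word, profile in profiles.items():
            if word in known:
                continue
            for (rel, d, partner), n in profile.items():
                if (rel, d) in patterns and partner in opinions:
                    feat_support[word] += n  # word is linked to a known opinion word
                if (rel, FLIP[d]) in patterns and partner in features:
                    op_support[word] += n  # word is linked to a known feature word
        new_feats = {w for w, n in feat_support.items() if n >= min_support}
        new_ops = {w for w, n in op_support.items() if n >= min_support}
        if not new_feats and not new_ops:
            break  # no candidate passes the support threshold: stop
        features |= new_feats
        opinions |= new_ops
    return features, opinions


if __name__ == "__main__":
    corpus = (
        [("screen", "amod", "great")] * 2
        + [("battery", "amod", "great")] * 2
        + [("battery", "amod", "long")] * 2
        + [("camera", "amod", "long"), ("price", "amod", "low"), ("phone", "det", "the")]
    )
    feats, ops = bootstrap(corpus, seed_pairs=[("screen", "great")])
    print(sorted(feats), sorted(ops))  # ['battery', 'screen'] ['great', 'long']
```

Starting from the single seed pair (`screen`, `great`), the loop first adds `battery` (linked to `great` twice), then `long` (linked to `battery` twice). `camera` is linked to `long` only once, so it is rejected at `min_support=2`; lowering the threshold to 1 admits it, which shows the usual trade-off between recall and error propagation in bootstrapping.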
