A Hybrid Method of Domain Lexicon Construction for Opinion Target Extraction Using Syntax and Semantics
Abstract: Opinion target extraction from Chinese microblogs plays an important role in opinion mining. The area has made significant progress in recent years, particularly with methods based on conditional random fields (CRF). However, these methods consider only lexical features and do not exploit the implicit syntactic and semantic knowledge in the text. We propose a new approach that combines a domain lexicon with groups of syntactic and semantic features. The approach first builds the domain lexicon from syntactic and semantic information: part-of-speech tags, dependency structure, phrase structure, semantic roles, and semantic similarity based on word embeddings. It then combines the domain lexicon with the opinion targets extracted by a CRF trained on different feature groups. Experimental results on the COAE2014 dataset show that the approach outperforms other well-known methods on the opinion target extraction task.
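The two stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`expand_lexicon`, `token_features`), the similarity threshold of 0.8, the toy two-dimensional embeddings, and the English example tokens are all assumptions made for illustration. The sketch covers only the embedding-similarity part of lexicon construction and the step that turns each token into a feature dictionary of the kind a CRF toolkit could consume.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_lexicon(seeds, embeddings, threshold=0.8):
    """Grow a seed lexicon with words whose embedding is close to a seed.

    `threshold` is a hypothetical cut-off, not a value from the paper.
    """
    lexicon = set(seeds)
    for word, vec in embeddings.items():
        for seed in seeds:
            if seed in embeddings and cosine(vec, embeddings[seed]) >= threshold:
                lexicon.add(word)
                break
    return lexicon


def token_features(tokens, i, lexicon):
    """Build one feature dict for token i.

    Each token is a (word, pos_tag, dependency_relation) triple
    (an assumed input format).
    """
    word, pos, dep = tokens[i]
    feats = {"word": word, "pos": pos, "dep": dep, "in_lexicon": word in lexicon}
    if i > 0:
        feats["prev_pos"] = tokens[i - 1][1]
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        feats["next_pos"] = tokens[i + 1][1]
    else:
        feats["EOS"] = True  # end of sentence
    return feats


# Toy data, invented for this example.
embeddings = {"screen": [1.0, 0.0], "display": [0.9, 0.1], "price": [0.0, 1.0]}
lexicon = expand_lexicon({"screen"}, embeddings)

sentence = [("the", "DT", "det"), ("display", "NN", "nsubj"),
            ("is", "VBZ", "cop"), ("sharp", "JJ", "root")]
features = [token_features(sentence, i, lexicon) for i in range(len(sentence))]
```

In this sketch, lexicon membership enters the model as one more token-level feature next to the part-of-speech and dependency features. The abstract also describes combining the lexicon with the CRF output afterwards, which this example does not attempt.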