2015, Vol. 30, Issue (4): 903-916.

• Special Section on Selected Paper from NPC 2011 •

Leveraging Large Data with Weak Supervision for Joint Feature and Opinion Word Extraction

Lei Fang (房磊), Biao Liu (刘彪), Min-Lie Huang* (黄民烈), Member, CCF

1. State Key Laboratory on Intelligent Technology and Systems, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
• Received: 2014-09-12 Revised: 2015-05-04 Online: 2015-07-05 Published: 2015-07-05
• Contact: Min-Lie Huang is an associate professor in the Department of Computer Science and Technology, Tsinghua University, Beijing. E-mail: aihuang@tsinghua.edu.cn
• About author: Lei Fang is a fifth-year Ph.D. student in the Department of Computer Science and Technology, Tsinghua University, Beijing. He received his Bachelor's degree in computer science and technology from Harbin Institute of Technology in 2010. His research interests include natural language processing, data mining, and machine learning.
• Supported by: This work is partly supported by the National Basic Research 973 Program of China under Grant Nos. 2012CB316301 and 2013CB329403, the National Natural Science Foundation of China under Grant Nos. 61332007 and 61272227, and the Beijing Higher Education Young Elite Teacher Project.

Abstract: Product feature and opinion word extraction is important for fine-grained sentiment analysis. In this paper, we leverage large-scale unlabeled data for joint extraction of feature and opinion words in a knowledge-poor setting, in which only a few feature-opinion pairs are used as weak supervision. Our major contributions are twofold. First, we propose a data-driven approach that represents product features and opinion words as lists of corpus-level syntactic relations, which capture rich language structures. Second, we build a simple yet robust unsupervised model that incorporates prior knowledge and extracts new feature and opinion words with consistently high performance. The extraction process is based on a bootstrapping framework which, to some extent, reduces error propagation on large data. Experimental results under various settings, compared with state-of-the-art baselines, demonstrate that our method is effective and promising.
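As a rough illustration of the bootstrapping idea mentioned in the abstract, the following is a minimal sketch and not the authors' model. It assumes that candidate (feature, opinion) pairs have already been extracted from reviews, for example via syntactic relations, and it replaces the paper's unsupervised model with a simple co-occurrence threshold. The function name `bootstrap`, the parameter `min_support`, and the toy data are all hypothetical.

```python
from collections import defaultdict


def bootstrap(pairs, seed_features, seed_opinions, min_support=2, max_iters=10):
    """Expand seed lexicons over co-occurring (feature, opinion) candidates.

    Assumption: `pairs` were extracted beforehand; the acceptance rule below
    (a plain count threshold) is a stand-in, not the method from the paper.
    """
    # Index co-occurrence counts in both directions.
    by_feature = defaultdict(lambda: defaultdict(int))
    by_opinion = defaultdict(lambda: defaultdict(int))
    for feature, opinion in pairs:
        by_feature[feature][opinion] += 1
        by_opinion[opinion][feature] += 1

    features, opinions = set(seed_features), set(seed_opinions)
    for _ in range(max_iters):
        # Accept a candidate only if it co-occurs at least `min_support`
        # times with already accepted words of the other type. Requiring
        # support is one simple way to limit error propagation.
        new_opinions = {
            o for o, fs in by_opinion.items()
            if o not in opinions
            and sum(c for f, c in fs.items() if f in features) >= min_support
        }
        new_features = {
            f for f, ops in by_feature.items()
            if f not in features
            and sum(c for o, c in ops.items() if o in opinions) >= min_support
        }
        if not new_opinions and not new_features:
            break  # Converged: nothing new was accepted in this round.
        opinions |= new_opinions
        features |= new_features
    return features, opinions


# Toy data (hypothetical): one seed feature-opinion pair as weak supervision.
toy_pairs = [
    ("screen", "great"), ("screen", "great"),
    ("battery", "great"), ("battery", "great"),
    ("battery", "poor"), ("battery", "poor"),
    ("camera", "poor"), ("camera", "poor"),
    ("price", "odd"),
]
features, opinions = bootstrap(toy_pairs, {"screen"}, {"great"})
print(sorted(features), sorted(opinions))
```

In this toy run, "battery" is accepted through the seed opinion "great", which in turn admits "poor" and then "camera", while the low-support pair ("price", "odd") is never reached. A stricter `min_support` trades recall for less error propagation.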

