2015, Vol. 30, Issue (5): 1109-1119.

• Special Section on Social Media Processing •

### Clustering Context-Dependent Opinion Target Words in Chinese Product Reviews

Yu Zhang*(张宇), Member, CCF, Miao Liu(刘妙), Hai-Xia Xia(夏海霞)

1. School of Information Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
• Received: 2014-11-15 Revised: 2015-06-08 Online: 2015-09-05 Published: 2015-09-05
• Contact: Yu Zhang, E-mail: yzh@zstu.edu.cn
• About author: Yu Zhang is currently an associate professor at Zhejiang Sci-Tech University, Hangzhou. She received her Ph.D. degree in computer science and technology from Zhejiang University, Hangzhou, in 2009. She is a member of CCF. Her current research interests include data mining, recommender systems, and sentiment analysis.
• Supported by:

This work was supported by the Commonweal Technical Project of Zhejiang Province of China under Grant No. 2013C33063, the National Natural Science Foundation of China under Grant Nos. 61100183 and 61402417, the Natural Science Foundation of Zhejiang Province of China under Grant No. LQ13F020014, and the 521 Talents Project of Zhejiang Sci-Tech University.

In opinion mining of product reviews, an important task is to summarize customers' opinions by opinion target. Because of differing knowledge backgrounds and linguistic habits, customers use a variety of terms to describe the same opinion target. These terms are called context-dependent synonyms. To provide a comprehensive summary, the first step is to group these opinion target words. In this article, we focus on clustering context-dependent opinion target words in Chinese product reviews. We apply three clustering methods based on distributional similarity and experiment with four different co-occurrence matrices. In experiments on a large number of reviews, our proposed heuristic k-means clustering method using the opinion target word co-occurrence matrix achieves the best clustering result with lower time complexity and less memory. Its accuracy is also more stable when different combinations of centroids are chosen. For some kinds of co-occurrence matrices, we also find that small (low-dimensional) matrices achieve higher average clustering accuracy than large (high-dimensional) matrices. Our findings provide a time-efficient and space-efficient way to cluster opinion targets with high accuracy.
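To make the general idea of clustering by distributional similarity concrete, the following is a minimal sketch and not the method evaluated in the article. It assumes a plain k-means loop with cosine similarity and hand-picked seed words as starting centroids; the heuristic variant proposed in the article is not specified in this abstract and is not reproduced here. The function names (`cooccurrence_vectors`, `kmeans`), the toy reviews, and the English tokens standing in for segmented Chinese words are all illustrative assumptions.

```python
import math


def cooccurrence_vectors(reviews, targets, vocab):
    """Represent each target word by how often it shares a review with each vocab word."""
    index = {w: i for i, w in enumerate(vocab)}
    target_set = set(targets)
    vectors = {t: [0.0] * len(vocab) for t in targets}
    for tokens in reviews:
        present = set(tokens)
        for t in present & target_set:
            for w in present:
                if w != t and w in index:
                    vectors[t][index[w]] += 1.0
    return vectors


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def kmeans(vectors, seeds, max_iter=20):
    """Plain k-means with cosine similarity, started from the given seed words (an assumption)."""
    centroids = [list(vectors[s]) for s in seeds]
    assignment = {}
    for _ in range(max_iter):
        # Assign each word to its most similar centroid.
        new_assignment = {
            w: max(range(len(centroids)), key=lambda k: cosine(vec, centroids[k]))
            for w, vec in vectors.items()
        }
        if new_assignment == assignment:
            break  # converged: no word changed cluster
        assignment = new_assignment
        # Recompute each centroid as the mean of its members.
        for k in range(len(centroids)):
            members = [vectors[w] for w, c in assignment.items() if c == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Hypothetical toy data: each review is a list of already-segmented tokens.
reviews = [
    ["screen", "bright", "clear"],
    ["display", "bright", "clear"],
    ["battery", "lasts", "long"],
    ["power", "lasts", "long"],
]
targets = ["screen", "display", "battery", "power"]
vocab = ["bright", "clear", "lasts", "long"]

vectors = cooccurrence_vectors(reviews, targets, vocab)
clusters = kmeans(vectors, seeds=["screen", "battery"])
```

In this toy setting, "screen" and "display" never appear in the same review, yet they end up in one cluster because they share the same context words. The choice of `vocab` determines the dimensionality of the co-occurrence matrix, which is the kind of parameter the abstract reports as affecting accuracy.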

