Citation: Yu Zhang, Miao Liu, Hai-Xia Xia. Clustering Context-Dependent Opinion Target Words in Chinese Product Reviews[J]. Journal of Computer Science and Technology, 2015, 30(5): 1109-1119. DOI: 10.1007/s11390-015-1586-2

Clustering Context-Dependent Opinion Target Words in Chinese Product Reviews

    Abstract: In opinion mining of product reviews, an important task is to summarize customers' opinions by opinion target. Because of differing knowledge backgrounds and linguistic habits, customers use a variety of terms to describe the same opinion target. These terms are called context-dependent synonyms. To provide a comprehensive summary, the first step is to cluster these opinion target words into groups. In this article, we focus on clustering context-dependent opinion target words in Chinese product reviews. We utilize three clustering methods based on distributional similarity and run experiments with four different co-occurrence matrices. Experimental results on a large number of reviews show that our proposed heuristic k-means clustering method, applied to the co-occurrence matrix of opinion target words, achieves the best clustering result with lower time complexity and less memory. In addition, its accuracy is more stable across different combinations of initial centroids. For some kinds of co-occurrence matrices, we also find that small (low-dimensional) matrices yield higher average clustering accuracy than large (high-dimensional) ones. Our findings provide a time-efficient and space-efficient way to cluster opinion targets with high accuracy.
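The abstract does not spell out the heuristic k-means procedure, so the following is only a minimal sketch of the general idea: represent each opinion target word by its co-occurrence counts, then run k-means with cosine similarity starting from hand-picked seed words. Everything here is an assumption made for illustration and is not the authors' method: the function names `cooccurrence_matrix` and `seeded_kmeans`, the target-by-context matrix layout, the choice of seeds, and the toy English reviews standing in for segmented Chinese text.

```python
import math

def cooccurrence_matrix(reviews, targets, contexts):
    """Rows are target words, columns are context words (assumed layout).
    Each cell counts the reviews in which the two words appear together."""
    row = {w: i for i, w in enumerate(targets)}
    col = {w: j for j, w in enumerate(contexts)}
    matrix = [[0.0] * len(contexts) for _ in targets]
    for review in reviews:
        words = set(review)
        for t in words:
            if t not in row:
                continue
            for c in words:
                if c in col:
                    matrix[row[t]][col[c]] += 1.0
    return matrix

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def seeded_kmeans(vectors, seed_indices, iterations=20):
    """k-means with cosine similarity, initialised from chosen seed rows
    (a stand-in for a heuristic centroid choice, not the paper's rule)."""
    centroids = [list(vectors[i]) for i in seed_indices]
    labels = [-1] * len(vectors)
    for _ in range(iterations):
        # Assign each word to the most similar centroid.
        new_labels = [
            max(range(len(centroids)), key=lambda c: cosine(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Recompute each centroid as the mean of its members.
        for c in range(len(centroids)):
            members = [vectors[i] for i, lab in enumerate(labels) if lab == c]
            if members:
                centroids[c] = [sum(column) / len(members) for column in zip(*members)]
    return labels

# Toy reviews (hypothetical data): each review is a list of segmented words.
reviews = [
    ["screen", "bright", "clear"],
    ["display", "bright", "clear"],
    ["screen", "clear", "large"],
    ["display", "large", "bright"],
    ["battery", "lasting", "charge"],
    ["power", "lasting", "charge"],
    ["battery", "charge", "drain"],
    ["power", "drain", "lasting"],
]
targets = ["screen", "display", "battery", "power"]
contexts = ["bright", "clear", "large", "lasting", "charge", "drain"]

matrix = cooccurrence_matrix(reviews, targets, contexts)
labels = seeded_kmeans(matrix, seed_indices=[0, 2])  # seeds: "screen", "battery"
print(dict(zip(targets, labels)))
```

In this toy setup "screen" and "display" share context words, as do "battery" and "power", so each pair ends up in the same cluster even though the two synonyms never appear in the same review. Seeding the centroids from known words rather than at random is one plausible way to make the result less sensitive to initialisation.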

     
