2015, Vol. 30, Issue (5): 1109-1119. doi: 10.1007/s11390-015-1586-2
Topics: Artificial Intelligence and Pattern Recognition; Data Management and Data Mining
• Special Section on Selected Paper from NPC 2011 •
Yu Zhang*(张宇), Member, CCF, Miao Liu(刘妙), Hai-Xia Xia(夏海霞)
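When mining opinions from product reviews, an important task is to summarize user opinions according to the opinion targets they address. Because users differ in knowledge background and language habits, they describe the same opinion target with a wide variety of terms. These terms are called context-dependent synonyms. To provide a comprehensive summary, the first step is to cluster these opinion targets. In this paper, we focus on clustering context-dependent opinion targets in Chinese product reviews. We propose three clustering methods based on the distributional similarity of words and run experiments with four different co-occurrence matrices. Based on experimental results on a large body of review data, we find that the proposed heuristic k-means clustering method using the opinion-target co-occurrence matrix achieves the best clustering results, with low time complexity and a small memory footprint. In addition, its accuracy is more stable across different combinations of initial centroids. For some types of co-occurrence matrices, we also find that low-dimensional matrices yield higher average clustering accuracy. This work provides an efficient, space-saving, and accurate method for clustering opinion targets.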
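The general idea of clustering opinion targets by their co-occurrence profiles can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the heuristic picks centroids, so here the caller supplies seed terms, and cosine similarity, the function names, and the English toy reviews are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def cooccurrence_rows(reviews):
    """Map each opinion target to its vector of co-occurrence counts."""
    vocab = sorted({term for review in reviews for term in review})
    index = {term: i for i, term in enumerate(vocab)}
    rows = {term: [0.0] * len(vocab) for term in vocab}
    for review in reviews:
        # Count each unordered pair of distinct targets once per review.
        for a, b in combinations(sorted(set(review)), 2):
            rows[a][index[b]] += 1.0
            rows[b][index[a]] += 1.0
    return rows


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def seeded_kmeans(rows, seeds, max_iter=20):
    """k-means over co-occurrence rows, initialised from caller-chosen seeds.

    Assumption: seed terms stand in for the paper's heuristic centroid choice.
    """
    centroids = [list(rows[s]) for s in seeds]
    assignment = {}
    for _ in range(max_iter):
        # Assign every term to its most similar centroid.
        new_assignment = {
            term: max(range(len(centroids)),
                      key=lambda k: cosine(vec, centroids[k]))
            for term, vec in rows.items()
        }
        if new_assignment == assignment:
            break  # converged: no term changed cluster
        assignment = new_assignment
        # Recompute each centroid as the mean of its member rows.
        for k in range(len(centroids)):
            members = [rows[t] for t, c in assignment.items() if c == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Hypothetical toy reviews, each reduced to the opinion targets it mentions.
reviews = [
    ["screen", "resolution", "brightness"],
    ["display", "resolution", "brightness"],
    ["battery", "charger", "endurance"],
    ["power", "charger", "endurance"],
]
assignment = seeded_kmeans(cooccurrence_rows(reviews), seeds=["screen", "battery"])
```

In this toy data "screen" and "display" never appear in the same review, yet they share the same co-occurrence profile, which is what lets a distributional method group them.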