›› 2013,Vol. 28 ›› Issue (2): 311-321.doi: 10.1007/s11390-013-1331-7

所属专题: Artificial Intelligence and Pattern Recognition

• Special Section on Selected Paper from NPC 2011 • 上一篇    下一篇

可能性指数模糊聚类研究

Kiatichai Treerattanapitak and Chuleerat Jaruskulchai   

  • 收稿日期:2012-03-06 修回日期:2012-11-02 出版日期:2013-03-05 发布日期:2013-03-05

Possibilistic Exponential Fuzzy Clustering

Kiatichai Treerattanapitak and Chuleerat Jaruskulchai   

  1. Department of Computer Science, Kasetsart University, 50 Ngamwongwan Rd., Jatuchak, Bangkok, Thailand
  • Received:2012-03-06 Revised:2012-11-02 Online:2013-03-05 Published:2013-03-05

通常, 异常点(如噪声和异常)使得聚类分析(尤其是在模糊聚类分析)所得到的结果往往不准确。在聚类分析之后, 这些异常数据不仅停留在聚类之中, 而且使得聚类质心完全偏离其真实值。一方面, 传统的模糊聚类方法, 如模糊C-Means(FCM)方法将所有的数据点归到所有的聚类中。这种做法在某些具体情况下并不合适。通过将目标函数构造为指数形式, 可能性指数模糊聚类(PXFCM)算法竞争性地将数据归到聚类之中。另一方面, 噪声数据和异常数据并未在聚类过程中被恰当处理。由于模糊聚类在概率上要求每个数据点的类别隶属度之和为1, 因此异常数据点也被分配到某些聚类之中。为解决这一不足之处, 可能性方法被引入以期改进隶属度分配。然而, 随之而来的另一问题是可能性聚类方法, 由于其隶属度函数忽略了不同聚类质心之间的距离, 通常会将导致各个聚类之间相互产生重叠(即叠加聚类)。虽然已有的许多可能性聚类方法能够避免叠加聚类问题, 但是它们都需要大量参数进行组合优化。本文对集成可能性方法和指数模糊聚类的可能性指数模糊聚类(PXFCM)算法进行了理论研究。PXFCM算法仅需要一个参数, 它不仅能对数据进行划分也能够过滤噪声数据和探测异常数据。实验表明, 在聚类分析和异常探测方面, PXFCM能产出较高的聚类准确率并且不会产生叠加聚类。

Abstract: Generally, abnormal points (noise and outliers) cause cluster analysis to produce low accuracy especially in fuzzy clustering. These data not only stay in clusters but also deviate the centroids from their true positions. Traditional fuzzy clustering like Fuzzy C-Means (FCM) always assigns data to all clusters which is not reasonable in some circumstances. By reformulating objective function in exponential equation, the algorithm aggressively selects data into the clusters. However noisy data and outliers cannot be properly handled by clustering process therefore they are forced to be included in a cluster because of a general probabilistic constraint that the sum of the membership degrees across all clusters is one. In order to improve this weakness, possibilistic approach relaxes this condition to improve membership assignment. Nevertheless, possibilistic clustering algorithms generally suffer from coincident clusters because their membership equations ignore the distance to other clusters. Although there are some possibilistic clustering approaches that do not generate coincident clusters, most of them require the right combination of multiple parameters for the algorithms to work. In this paper, we theoretically study Possibilistic Exponential Fuzzy Clustering (PXFCM) that integrates possibilistic approach with exponential fuzzy clustering. PXFCM has only one parameter and not only partitions the data but also filters noisy data or detects them as outliers. The comprehensive experiments show that PXFCM produces high accuracy in both clustering results and outlier detection without generating coincident problems.

[1] MacQueen J B. Some methods for classification and analy-sis of multivariate observations. In Proc. the 5th BerkeleySymp. Mathematical Statistics and Probability, June 21-July18, 1965 and Dec.27, 1965-Jan.7, 1966, Vol.1, pp.281-297.

[2] Dunn J C. A fuzzy relative of the ISODATA process and itsuse in detecting compact well-separated clusters. J. Cyber-netics., 1973, 3(3): 32-57.

[3] Bezdek J C. Pattern Recognition with Fuzzy Objective Func-tion Algoritms. New York: Plenum Press, 1993.

[4] Krishnapuram R, Keller J M. A possibilistic approach to clus-tering. IEEE Trans. Fuzzy Systems, 1993, 1(2): 98-110.

[5] Treerattanapitak K, Juruskulchai C. Exponential fuzzy C-means for collaborative filtering. Journal of Computer Sci-ence & Technology, 2012, 27(3): 567-576.

[6] Chandola V, Banerjee A, Kumar V. Anomaly detection: Asurvey. ACM Comput. Surveys, 2009, 41(3): Article No. 15.

[7] Treerattanapitak K, Juruskulchai C. Outlier detection withpossibilistic exponential fuzzy clustering. In Proc. the 8thFSKD, Jul. 2011, pp.453-457.

[8] Barni M, Cappellini V, Mecocci A. Comments on "A possi-bilistic approach to clustering". IEEE Trans. Fuzzy Systems,1996, 4(3): 393-396.

[9] Pal N R, Pal K, Bezdek J C. A mixed c-means clusteringmodel. In Proc. the 6th IEEE Int. Conf. Fuzzy Systems,Jul. 1997, pp.11-21.

[10] Pal N R, Pal K, Keller J M, Bezdek J C. A possibilistic fuzzyc-means clustering algorithm. IEEE Trans. Fuzzy Systems,2005, 13(4): 517-530.

[11] Wachs J, Shapira O, Stern H. A method to enhance the "Pos-sibilistic C-means with repulsion" algorithm based on clustervalidity index. In Proc. the 9th Word Conf. Soft Computingin Industry Application, Sept. 20-Oct. 8, 2004, pp.77-87.

[12] Yang M, Wu K. Unsupervised possibilistic clustering. J. Pat-tern Recognition, 2006, 39(1): 5-21.

[13] Wu X, Wu B, Sun J, Fu H. Unsupervised possibilistic fuzzyclustering. J. Info. and Comp. Sci., 2010, 7(5): 1075-1080.

[14] Hawkins S, He H, Williams G J et al. Outlier detection usingreplicator neural networks. In Proc. the 4th DaWaK, Sept.2002, pp.170-180.

[15] Davy M, Godsill S. Detection of abrupt spectral changes us-ing support vector machines, an application to audio signalsegmentation. In Proc. the 2002 IEEE Int. Conf. Acoustics,Speech, and Signal Processing, May. 2002, pp.1313-1316.

[16] Ramaswamy S, Rastogi R, Shim K. Efficient algorithms formining outliers from large data sets. In Proc. the 6th SIG-MOD Int. Conf. Management of Data, Jun. 2000, pp.427-438.

[17] Breunig M, Kriegel H, Ng R T et al. LOF: Identifying density-based local outliers. In Proc. the 6th ACM SIGMOD Int.Conf. Management of Data, Jun. 2000, pp.93-104.

[18] Ester M, Kriegel H P, Sander J, Xu X. A density-based algo-rithm for discovering clusters in large spatial databases withnoise. In Proc. the 2nd the KDD, Aug. 1996, pp.226-231.

[19] Tang C, Wang S, Xu W. New fuzzy c-means clustering modelbased on the data weighted approach. Data Knowledge En-gineering, 2010, 69(9): 881-900.

[20] Shahi A, Atan R B, Sulaiman M N. Detecting effectiveness ofoutliers and noisy data on fuzzy system using FCM. EuropeanJ. Sci. Research, 2009, 36(4): 627-638.

[21] He Z, Deng S, Xu X. An optimization model for outlier de-tection in categorical data. In Lecture Notes in ComputerScience 3644, Huang D S, Zhang X P, Huang G B (eds.),Springer, 2005, pp.400-409.

[22] Agovic A, Banerjee A, Ganguly A R, Protopopescu V.Anomaly detection in transportation corridors using manifoldembedding. In Proc. the 1st SensorKDD, Aug. 2007.

[23] Jin W, Tung K H, Han J. Mining top-n local outliers in largedatabases. In Proc. the 7th KDD, Aug. 2001, pp.293-298.

[24] Xue Z, Shang Y, Feng A. Semi-supervised outlier detectionbased on fuzzy rough C-means clustering. J. Mathematicsand Computers in Simulation, 2010, 80(9): 1911-1921.

[25] Xie X L, Beni G. A validity measure for fuzzy clustering.TPAMI, 1991, 13(8): 841-847.

[26] Kwon S H. Cluster validity index for fuzzy clustering. Elec-tronics Letters, 1998, 34(22): 2176-2177.

[27] Fukuyama Y, Sugeno M. A new method of choosing the num-ber of clusters for the fuzzy c-means method. In Proc. the5th Fuzzy Systems Symposium, Jun. 1989, pp.247-250.

[28] Gath I, Geva A B. Unsupervised optimal fuzzy clustering.Trans. Pattern Anal. Mach. Intell., 1989, 11(7): 773-781.

[29] Pakhira M K, Bandyopadhyay S, Maulik U. Validity indexfor crisp and fuzzy clusters. Pattern Recognition, 2004, 37(3):487-501.

[30] Wu K L, Yang M S. A cluster validity index for fuzzy cluster-ing. Pattern Recognition Lett., 2005, 26(9): 1275-1291.

[31] Aggarwal C C, Yu P S. Outlier detection for high dimensionaldata. In Proc. the 2001 ACM SIGMOD Int. Conf. Manage-ment of Data, May 2001, pp.37-46.

[32] Williums G J, Baster R A, He H et al. A comparative studyof RNN for outlier detection in data mining. In Proc. the2002 Int. Conf. Data Mining, Dec. 2002, pp.709-712.

[33] He Z, Xu X, Huang J Z, Deng S. A frequent pattern discov-ery method for outlier detection. In Proc. the 5th Int. Conf.Web-Age Info. Management, Jul. 2004, pp.726-732.

[34] He Z, Xu X, Deng S. Discovery cluster-based local outliers.Pattern Recognition Letters, 2003, 24(9/10): 1641-1650.
No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] 金兰; 杨元元;. A Modified Version of Chordal Ring[J]. , 1986, 1(3): 15 -32 .
[2] 范植华;. Vectorization for Loops with Three-Forked Jumps[J]. , 1988, 3(3): 186 -202 .
[3] 郭庆平; Y.Paker;. Communication Analysis and Granularity Assessment for a Transputer-Based System[J]. , 1990, 5(4): 347 -362 .
[4] 周勇; 唐泽圣;. Constructing Isosurfaces from 3D Data Sets Taking Account of Depth Sorting of Polyhedra[J]. , 1994, 9(2): 117 -127 .
[5] 廖乐健; 史忠植;. Minimal Model Semantics for Sorted Constraint Representation[J]. , 1995, 10(5): 439 -446 .
[6] 赵彧; 张琼; 向辉; 石教英; 何志均;. A Simplified Model for Generating 3D Realistic Sound in the Multimedia and Virtual Reality Systems[J]. , 1996, 11(4): 461 -470 .
[7] 汪芸; 顾冠群; 兑继英;. Research on Protocol Migration[J]. , 1996, 11(6): 601 -606 .
[8] 程歧; 朱洪;. MNP: A Class of NP Optimization Problems[J]. , 1997, 12(4): 306 -313 .
[9] 傅育熙;. Constructive Sets in Computable Sets[J]. , 1997, 12(5): 425 -440 .
[10] 陈阳军;. On the Arc Consistency Problem[J]. , 1999, 14(4): 298 -308 .
版权所有 © 《计算机科学技术学报》编辑部
本系统由北京玛格泰克科技发展有限公司设计开发 技术支持:support@magtech.com.cn
总访问量: