|
›› 2013,Vol. 28 ›› Issue (2): 311-321.doi: 10.1007/s11390-013-1331-7
所属专题: Artificial Intelligence and Pattern Recognition
• Special Section on Selected Paper from NPC 2011 • 上一篇 下一篇
Kiatichai Treerattanapitak and Chuleerat Jaruskulchai
Kiatichai Treerattanapitak and Chuleerat Jaruskulchai
通常, 异常点(如噪声和异常)使得聚类分析(尤其是在模糊聚类分析)所得到的结果往往不准确。在聚类分析之后, 这些异常数据不仅停留在聚类之中, 而且使得聚类质心完全偏离其真实值。一方面, 传统的模糊聚类方法, 如模糊C-Means(FCM)方法将所有的数据点归到所有的聚类中。这种做法在某些具体情况下并不合适。通过将目标函数构造为指数形式, 可能性指数模糊聚类(PXFCM)算法竞争性地将数据归到聚类之中。另一方面, 噪声数据和异常数据并未在聚类过程中被恰当处理。由于模糊聚类在概率上要求每个数据点的类别隶属度之和为1, 因此异常数据点也被分配到某些聚类之中。为解决这一不足之处, 可能性方法被引入以期改进隶属度分配。然而, 随之而来的另一问题是可能性聚类方法, 由于其隶属度函数忽略了不同聚类质心之间的距离, 通常会将导致各个聚类之间相互产生重叠(即叠加聚类)。虽然已有的许多可能性聚类方法能够避免叠加聚类问题, 但是它们都需要大量参数进行组合优化。本文对集成可能性方法和指数模糊聚类的可能性指数模糊聚类(PXFCM)算法进行了理论研究。PXFCM算法仅需要一个参数, 它不仅能对数据进行划分也能够过滤噪声数据和探测异常数据。实验表明, 在聚类分析和异常探测方面, PXFCM能产出较高的聚类准确率并且不会产生叠加聚类。
[1] MacQueen J B. Some methods for classification and analy-sis of multivariate observations. In Proc. the 5th BerkeleySymp. Mathematical Statistics and Probability, June 21-July18, 1965 and Dec.27, 1965-Jan.7, 1966, Vol.1, pp.281-297.[2] Dunn J C. A fuzzy relative of the ISODATA process and itsuse in detecting compact well-separated clusters. J. Cyber-netics., 1973, 3(3): 32-57.[3] Bezdek J C. Pattern Recognition with Fuzzy Objective Func-tion Algoritms. New York: Plenum Press, 1993.[4] Krishnapuram R, Keller J M. A possibilistic approach to clus-tering. IEEE Trans. Fuzzy Systems, 1993, 1(2): 98-110.[5] Treerattanapitak K, Juruskulchai C. Exponential fuzzy C-means for collaborative filtering. Journal of Computer Sci-ence & Technology, 2012, 27(3): 567-576.[6] Chandola V, Banerjee A, Kumar V. Anomaly detection: Asurvey. ACM Comput. Surveys, 2009, 41(3): Article No. 15.[7] Treerattanapitak K, Juruskulchai C. Outlier detection withpossibilistic exponential fuzzy clustering. In Proc. the 8thFSKD, Jul. 2011, pp.453-457.[8] Barni M, Cappellini V, Mecocci A. Comments on "A possi-bilistic approach to clustering". IEEE Trans. Fuzzy Systems,1996, 4(3): 393-396.[9] Pal N R, Pal K, Bezdek J C. A mixed c-means clusteringmodel. In Proc. the 6th IEEE Int. Conf. Fuzzy Systems,Jul. 1997, pp.11-21.[10] Pal N R, Pal K, Keller J M, Bezdek J C. A possibilistic fuzzyc-means clustering algorithm. IEEE Trans. Fuzzy Systems,2005, 13(4): 517-530.[11] Wachs J, Shapira O, Stern H. A method to enhance the "Pos-sibilistic C-means with repulsion" algorithm based on clustervalidity index. In Proc. the 9th Word Conf. Soft Computingin Industry Application, Sept. 20-Oct. 8, 2004, pp.77-87.[12] Yang M, Wu K. Unsupervised possibilistic clustering. J. Pat-tern Recognition, 2006, 39(1): 5-21.[13] Wu X, Wu B, Sun J, Fu H. Unsupervised possibilistic fuzzyclustering. J. Info. and Comp. Sci., 2010, 7(5): 1075-1080.[14] Hawkins S, He H, Williams G J et al. Outlier detection usingreplicator neural networks. In Proc. the 4th DaWaK, Sept.2002, pp.170-180.[15] Davy M, Godsill S. Detection of abrupt spectral changes us-ing support vector machines, an application to audio signalsegmentation. In Proc. the 2002 IEEE Int. Conf. Acoustics,Speech, and Signal Processing, May. 2002, pp.1313-1316.[16] Ramaswamy S, Rastogi R, Shim K. Efficient algorithms formining outliers from large data sets. In Proc. the 6th SIG-MOD Int. Conf. Management of Data, Jun. 2000, pp.427-438.[17] Breunig M, Kriegel H, Ng R T et al. LOF: Identifying density-based local outliers. In Proc. the 6th ACM SIGMOD Int.Conf. Management of Data, Jun. 2000, pp.93-104.[18] Ester M, Kriegel H P, Sander J, Xu X. A density-based algo-rithm for discovering clusters in large spatial databases withnoise. In Proc. the 2nd the KDD, Aug. 1996, pp.226-231.[19] Tang C, Wang S, Xu W. New fuzzy c-means clustering modelbased on the data weighted approach. Data Knowledge En-gineering, 2010, 69(9): 881-900.[20] Shahi A, Atan R B, Sulaiman M N. Detecting effectiveness ofoutliers and noisy data on fuzzy system using FCM. EuropeanJ. Sci. Research, 2009, 36(4): 627-638.[21] He Z, Deng S, Xu X. An optimization model for outlier de-tection in categorical data. In Lecture Notes in ComputerScience 3644, Huang D S, Zhang X P, Huang G B (eds.),Springer, 2005, pp.400-409.[22] Agovic A, Banerjee A, Ganguly A R, Protopopescu V.Anomaly detection in transportation corridors using manifoldembedding. In Proc. the 1st SensorKDD, Aug. 2007.[23] Jin W, Tung K H, Han J. Mining top-n local outliers in largedatabases. In Proc. the 7th KDD, Aug. 2001, pp.293-298.[24] Xue Z, Shang Y, Feng A. Semi-supervised outlier detectionbased on fuzzy rough C-means clustering. J. Mathematicsand Computers in Simulation, 2010, 80(9): 1911-1921.[25] Xie X L, Beni G. A validity measure for fuzzy clustering.TPAMI, 1991, 13(8): 841-847.[26] Kwon S H. Cluster validity index for fuzzy clustering. Elec-tronics Letters, 1998, 34(22): 2176-2177.[27] Fukuyama Y, Sugeno M. A new method of choosing the num-ber of clusters for the fuzzy c-means method. In Proc. the5th Fuzzy Systems Symposium, Jun. 1989, pp.247-250.[28] Gath I, Geva A B. Unsupervised optimal fuzzy clustering.Trans. Pattern Anal. Mach. Intell., 1989, 11(7): 773-781.[29] Pakhira M K, Bandyopadhyay S, Maulik U. Validity indexfor crisp and fuzzy clusters. Pattern Recognition, 2004, 37(3):487-501.[30] Wu K L, Yang M S. A cluster validity index for fuzzy cluster-ing. Pattern Recognition Lett., 2005, 26(9): 1275-1291.[31] Aggarwal C C, Yu P S. Outlier detection for high dimensionaldata. In Proc. the 2001 ACM SIGMOD Int. Conf. Manage-ment of Data, May 2001, pp.37-46.[32] Williums G J, Baster R A, He H et al. A comparative studyof RNN for outlier detection in data mining. In Proc. the2002 Int. Conf. Data Mining, Dec. 2002, pp.709-712.[33] He Z, Xu X, Huang J Z, Deng S. A frequent pattern discov-ery method for outlier detection. In Proc. the 5th Int. Conf.Web-Age Info. Management, Jul. 2004, pp.726-732.[34] He Z, Xu X, Deng S. Discovery cluster-based local outliers.Pattern Recognition Letters, 2003, 24(9/10): 1641-1650. |
No related articles found! |
|
版权所有 © 《计算机科学技术学报》编辑部 本系统由北京玛格泰克科技发展有限公司设计开发 技术支持:support@magtech.com.cn 总访问量: |