2013, Vol. 28, Issue (2): 311-321.

Topic: Artificial Intelligence and Pattern Recognition

Special Section on Selected Paper from NPC 2011


Kiatichai Treerattanapitak and Chuleerat Jaruskulchai   

Received: 2012-03-06; Revised: 2012-11-02; Published: 2013-03-05; Online: 2013-03-05

Possibilistic Exponential Fuzzy Clustering

  1. Department of Computer Science, Kasetsart University, 50 Ngamwongwan Rd., Jatuchak, Bangkok, Thailand

Abstract: Generally, abnormal points (noise and outliers) lower the accuracy of cluster analysis, especially in fuzzy clustering. These points not only remain in clusters but also pull the centroids away from their true positions. Traditional fuzzy clustering such as Fuzzy C-Means (FCM) always assigns every data point to all clusters, which is not reasonable in some circumstances. By reformulating the objective function in exponential form, exponential fuzzy clustering assigns data to clusters more aggressively. However, noisy data and outliers still cannot be handled properly by the clustering process: they are forced into a cluster because of the general probabilistic constraint that each point's membership degrees across all clusters sum to one. To address this weakness, the possibilistic approach relaxes this constraint to improve membership assignment. Nevertheless, possibilistic clustering algorithms generally suffer from coincident clusters because their membership equations ignore the distances to other clusters. Although some possibilistic clustering approaches do not generate coincident clusters, most of them work only with the right combination of multiple parameters. In this paper, we theoretically study Possibilistic Exponential Fuzzy Clustering (PXFCM), which integrates the possibilistic approach with exponential fuzzy clustering. PXFCM has only one parameter; it not only partitions the data but also filters noisy data or detects them as outliers. Comprehensive experiments show that PXFCM achieves high accuracy in both clustering and outlier detection without generating coincident clusters.
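To make the sum-to-one constraint and the coincident-cluster problem concrete, here is a minimal sketch using the standard textbook FCM membership update and the standard possibilistic (PCM, Krishnapuram-Keller style) typicality. It is not the PXFCM formulation, whose equations are not given on this page. The 1-D data, the fuzzifier m = 2, and the fixed scale parameter eta = 4 shared by both clusters are illustrative assumptions, and the helper names are hypothetical.

```python
# Illustrative sketch: standard FCM memberships vs. standard PCM typicalities.
# NOT the PXFCM equations; data, m, and eta are hypothetical choices.

def fcm_memberships(x, centroids, m=2.0):
    """FCM membership of 1-D point x in every cluster; the values sum to 1."""
    d = [abs(x - v) for v in centroids]
    if 0.0 in d:
        # Point lies exactly on one or more centroids: share membership among them.
        hits = d.count(0.0)
        return [1.0 / hits if di == 0.0 else 0.0 for di in d]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** p for dk in d) for di in d]


def pcm_typicality(x, v, eta, m=2.0):
    """PCM typicality of x in the cluster at v; depends only on the distance to v."""
    return 1.0 / (1.0 + ((x - v) ** 2 / eta) ** (1.0 / (m - 1.0)))


def pcm_centroids(data, centroids, eta, m=2.0, iters=500):
    """Alternate PCM typicality and centroid updates with one fixed eta.

    Each centroid is updated from its own typicalities only, so the
    clusters never interact with each other.
    """
    vs = list(centroids)
    for _ in range(iters):
        new_vs = []
        for v in vs:
            w = [pcm_typicality(x, v, eta, m) ** m for x in data]
            new_vs.append(sum(wi * x for wi, x in zip(w, data)) / sum(w))
        vs = new_vs
    return vs


centroids = [0.0, 10.0]
eta = 4.0  # hypothetical scale parameter, identical for both clusters

# Part 1: an outlier (x = 100) under FCM vs. PCM.
for x in (1.0, 5.0, 100.0):
    u = fcm_memberships(x, centroids)
    t = [pcm_typicality(x, v, eta) for v in centroids]
    print(f"x={x:6.1f}  FCM u={[round(ui, 3) for ui in u]}  "
          f"PCM t={[round(ti, 4) for ti in t]}")

# Part 2: coincident clusters. Two well-separated groups, but both PCM
# centroids start near the first group and converge to the same point.
data = [-1.0, -0.5, 0.0, 0.5, 1.0, 9.0, 9.5, 10.0, 10.5, 11.0]
v1, v2 = pcm_centroids(data, [1.0, 3.0], eta)
print(f"PCM centroids after iterating: {v1:.6f}, {v2:.6f}")
```

With centroids at 0 and 10, the outlier at x = 100 still receives FCM memberships of roughly 0.45 and 0.55 because they must sum to one, while both of its PCM typicalities fall below 0.001. In the second part, each PCM centroid sees only its own distances, so centroids started at 1 and 3 drift to the same point near 0 and the group around 10 is left without a centroid, which is the coincident-cluster behavior the abstract describes.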
