Squeezer: An Efficient Algorithm for Clustering Categorical Data
-
Abstract
This paper presents Squeezer, a new efficient algorithm for clustering categorical data, which produces high-quality clustering results while exhibiting good scalability. The Squeezer algorithm reads each tuple t in sequence, either assigning t to an existing cluster (initially none) or creating a new cluster containing t; the choice is determined by the similarities between t and the existing clusters. Owing to these characteristics, the proposed algorithm is particularly suitable for clustering data streams, where, given a sequence of points, the objective is to maintain a consistently good clustering of the sequence so far using a small amount of memory and time. Outliers can also be handled efficiently and directly in Squeezer. Experimental results on real-life and synthetic datasets verify the superiority of Squeezer.
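The one-pass procedure described above can be illustrated with a minimal sketch. The abstract does not define the similarity measure or the assignment rule, so everything specific here is an assumption made for illustration: the function names `similarity` and `squeezer_sketch`, the per-attribute value counts kept as a cluster summary, the similarity (the sum, over attributes, of the fraction of cluster members sharing the tuple's value), and the fixed `threshold` parameter.

```python
from collections import Counter


def similarity(summary, size, tup):
    """Assumed similarity: for each attribute, the fraction of the
    cluster's members that share the tuple's value, summed over attributes."""
    return sum(counts[value] / size for counts, value in zip(summary, tup))


def squeezer_sketch(tuples, threshold):
    """Single pass over the tuples; returns one cluster label per tuple.

    `threshold` is a hypothetical parameter: a tuple joins its most
    similar cluster only if the similarity reaches this value.
    """
    clusters = []  # each cluster: {"summary": [Counter per attribute], "size": int}
    labels = []
    for tup in tuples:
        # Find the most similar existing cluster (none exist initially).
        best_idx, best_sim = -1, float("-inf")
        for idx, cluster in enumerate(clusters):
            sim = similarity(cluster["summary"], cluster["size"], tup)
            if sim > best_sim:
                best_idx, best_sim = idx, sim

        if best_idx >= 0 and best_sim >= threshold:
            # Assign the tuple to the existing cluster and update its summary.
            cluster = clusters[best_idx]
            for counts, value in zip(cluster["summary"], tup):
                counts[value] += 1
            cluster["size"] += 1
            labels.append(best_idx)
        else:
            # Otherwise the tuple starts a new cluster of its own.
            clusters.append({"summary": [Counter([v]) for v in tup], "size": 1})
            labels.append(len(clusters) - 1)
    return labels


data = [("a", "x"), ("a", "x"), ("b", "y"), ("a", "y")]
print(squeezer_sketch(data, threshold=1.5))
```

Because each tuple is read once and only the per-cluster summaries are kept, memory in this sketch grows with the number of clusters and distinct attribute values rather than with the number of tuples, which is the property that makes a scheme of this kind attractive for streams. Under the same assumptions, a tuple that ends up alone in a cluster could be treated as a candidate outlier.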