Clustering by Pattern Similarity
-
Abstract
The task of clustering is to identify classes of similar objects among a set of objects. The definition of similarity varies from one clustering model to another. However, in most of these models the concept of similarity is based on metrics such as Manhattan distance, Euclidean distance, or other L_p distances. In other words, similar objects must have close values in at least a set of dimensions. In this paper, we explore a more general type of similarity. Under the pCluster model we propose, two objects are similar if they exhibit a coherent pattern on a subset of dimensions. The new similarity concept models a wide range of applications. For instance, in DNA microarray analysis, the expression levels of two genes may rise and fall synchronously in response to a set of environmental stimuli. Although the magnitudes of their expression levels may not be close, the patterns they exhibit can be very much alike. Discovery of such clusters of genes is essential in revealing significant connections in gene regulatory networks. E-commerce applications, such as collaborative filtering, can also benefit from the new model, because it is able to capture not only the closeness of values of certain leading indicators but also the closeness of (purchasing, browsing, etc.) patterns exhibited by the customers. In addition to the novel similarity model, this paper also introduces an effective and efficient algorithm to detect such clusters, and we perform tests on several real and synthetic data sets to show its performance.
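The contrast between distance-based and pattern-based similarity can be illustrated with a small sketch. The abstract does not state a formal definition, so the coherence check below is an assumption made for illustration: two objects are treated as coherent on a subset of dimensions when the difference between any two dimensions is nearly the same for both objects. The function names, the sample values, and the threshold of 1.0 are all hypothetical.

```python
from itertools import combinations


def euclidean(x, y):
    """Ordinary Euclidean distance over all dimensions."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5


def max_pattern_deviation(x, y, dims):
    """Largest |(x[a] - x[b]) - (y[a] - y[b])| over all pairs of dimensions in dims.

    Illustrative coherence measure (an assumption, not taken from the text):
    a value of 0 means the two objects rise and fall in lockstep on dims,
    regardless of how far apart their magnitudes are.
    """
    return max(
        (abs((x[a] - x[b]) - (y[a] - y[b])) for a, b in combinations(dims, 2)),
        default=0.0,
    )


# Hypothetical expression levels of two genes under four conditions.
gene_1 = [10.0, 40.0, 20.0, 90.0]
gene_2 = [110.0, 140.0, 120.0, 35.0]

# The magnitudes are far apart, so a distance metric reports dissimilarity.
distance = euclidean(gene_1, gene_2)

# On the first three conditions the two genes follow the same pattern.
deviation_subset = max_pattern_deviation(gene_1, gene_2, [0, 1, 2])

# Including the fourth condition breaks the pattern.
deviation_all = max_pattern_deviation(gene_1, gene_2, [0, 1, 2, 3])

is_coherent_on_subset = deviation_subset <= 1.0  # hypothetical threshold
```

In this toy data the two genes are never close in value, yet on the subset of conditions 0 to 2 they move identically. That is the kind of relationship a purely distance-based model would miss.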
-
-