Enhancing Time Series Clustering by Incorporating Multiple Distance Measures with Semi-Supervised Learning

Jing Zhou (周竞); Shan-Feng Zhu (朱山风); Xiaodi Huang (黄晓地); Yanchun Zhang (张彦春)

doi:10.1007/s11390-015-1565-7

Abstract

Time series clustering is widely applied in many areas. Existing research focuses mainly on designing distance measures between time series, such as DTW-based (dynamic time warping) methods, edit-distance-based methods, and shapelet-based methods. In this work, we demonstrate experimentally, for the first time, that no single existing distance measure performs significantly better than the others when time series data sets are clustered with spectral clustering. This raises the question of how to choose an appropriate measure for a given time series data set. To answer this question, we propose an integration scheme that incorporates multiple distance measures using semi-supervised clustering. Our approach integrates all the measures by extracting underlying information that is valuable for clustering. To the best of our knowledge, this work demonstrates for the first time that a constraint-based semi-supervised clustering method can integrate multiple time series distance measures and effectively improve time series clustering performance. Clustering experiments on a variety of time series data sets show that our method outperforms clustering based on any individual distance measure, as well as typical clustering integration approaches.
