Journal of Computer Science and Technology, 2015, Vol. 30, Issue (4): 859-873. doi: 10.1007/s11390-015-1565-7

Special Issue: Artificial Intelligence and Pattern Recognition; Data Management and Data Mining

• Special Section on Data Management and Data Mining

Enhancing Time Series Clustering by Incorporating Multiple Distance Measures with Semi-Supervised Learning

Jing Zhou1,2(周竞), Shan-Feng Zhu1,2(朱山风), Member, CCF, ACM, Xiaodi Huang3(黄晓地), Member, ACM, IEEE, Yanchun Zhang1,4,5(张彦春), Member, CCF   

  1. School of Computer Science, Fudan University, Shanghai 200433, China;
  2. Shanghai Key Laboratory of Intelligent Information Processing, Fudan University, Shanghai 200433, China;
  3. School of Computing and Mathematics, Charles Sturt University, Albury, NSW 2640, Australia;
  4. School of Engineering and Science, Victoria University, Melbourne, Victoria 8001, Australia;
  5. Shanghai Key Laboratory of Data Science, Fudan University, Shanghai 201203, China
  • Received: 2015-02-01; Revised: 2015-03-25; Online: 2015-07-05; Published: 2015-07-05
  • About author: Jing Zhou received his B.S. degree in computer science from Donghua University, Shanghai, in 2012. He is currently a graduate student at the Shanghai Key Laboratory of Intelligent Information Processing, Fudan University, Shanghai. His current research interests include time series analysis, data mining, and bioinformatics.
  • Supported by: The work was partially supported by the National Natural Science Foundation of China under Grant Nos. 61332013, 61272110, and 61370229, and by the National Key Technology Research and Development Program of China under Grant No. 2013BAH72B01.

Time series clustering is widely applied in various areas. Existing research focuses mainly on distance measures between two time series, such as DTW-based (dynamic time warping) methods, edit distance-based methods, and shapelet-based methods. In this work, we experimentally demonstrate, for the first time, that no single distance measure performs significantly better than the others when spectral clustering is used to cluster time series data sets. This raises the question of how to choose an appropriate measure for a given time series data set. To answer it, we propose an integration scheme that incorporates multiple distance measures using semi-supervised clustering. Our approach integrates all the measures by extracting underlying information that is useful for clustering. To the best of our knowledge, this work is the first to demonstrate that constraint-based semi-supervised clustering can enhance time series clustering by combining multiple distance measures. Experiments on various time series data sets show that our method outperforms both the individual measures and typical integration approaches.
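The abstract's starting premise is that different distance measures can judge the same pair of series very differently. The Python sketch below illustrates that premise and one way agreement between measures could be turned into pairwise constraints. It is not the authors' integration scheme, which the abstract does not describe. It compares a lock-step Euclidean distance with dynamic time warping (DTW), then applies a hypothetical rule: a pair becomes a must-link constraint only if every measure names the same nearest neighbour. Assumptions: DTW uses squared point differences with an optional Sakoe-Chiba band of half-width `window`. The names `euclidean`, `dtw`, `distance_matrix`, `nearest_neighbor`, and `agreed_must_links` are illustrative only.

```python
import math
from typing import Callable, List, Optional, Sequence, Set, Tuple

Series = Sequence[float]
Measure = Callable[[Series, Series], float]


def euclidean(a: Series, b: Series) -> float:
    """Lock-step Euclidean distance; requires equal-length series."""
    if len(a) != len(b):
        raise ValueError("Euclidean distance needs equal-length series")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a: Series, b: Series, window: Optional[int] = None) -> float:
    """DTW with squared point costs and an optional Sakoe-Chiba band of half-width `window`."""
    n, m = len(a), len(b)
    # The band must be at least |n - m| wide, or no warping path reaches (n, m).
    w = max(n, m) if window is None else max(window, abs(n - m))
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            step = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


def distance_matrix(data: List[Series], measure: Measure) -> List[List[float]]:
    """Full pairwise distance matrix under one measure."""
    return [[measure(x, y) for y in data] for x in data]


def nearest_neighbor(matrix: List[List[float]], i: int) -> int:
    """Index of the closest other series to series i (ties go to the lowest index)."""
    return min((j for j in range(len(matrix)) if j != i), key=lambda j: matrix[i][j])


def agreed_must_links(data: List[Series], measures: List[Measure]) -> Set[Tuple[int, int]]:
    """Hypothetical rule: link i to its nearest neighbour only if every measure picks the same one."""
    matrices = [distance_matrix(data, m) for m in measures]
    links: Set[Tuple[int, int]] = set()
    for i in range(len(data)):
        neighbors = {nearest_neighbor(mat, i) for mat in matrices}
        if len(neighbors) == 1:
            j = neighbors.pop()
            links.add((min(i, j), max(i, j)))
    return links


if __name__ == "__main__":
    data = [
        [0, 0, 1, 2, 1, 0],  # a bump
        [0, 1, 2, 1, 0, 0],  # the same bump shifted one step left
        [5, 5, 5, 5, 5, 5],  # a flat level
        [5, 5, 5, 5, 5, 6],  # the flat level with a small rise at the end
    ]
    print(euclidean(data[0], data[1]))  # 2.0
    print(dtw(data[0], data[1], window=1))  # 0.0
    print(sorted(agreed_must_links(data, [euclidean, dtw])))  # [(0, 1), (2, 3)]
```

On this toy data, the two shifted bumps are 2.0 apart under Euclidean distance but 0.0 apart under DTW. Both measures still agree on each series' nearest neighbour, so both pairs become must-links. A complete scheme would also need cannot-link constraints and a clustering step, such as spectral clustering, that honours them. Those parts are deliberately left out of this sketch.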

ISSN 1000-9000 (Print), 1860-4749 (Online)
CN 11-2296/TP

Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190, P.R. China
Tel.: 86-10-62610746
E-mail: jcst@ict.ac.cn

Copyright ©2015 JCST, All Rights Reserved