›› 2013,Vol. 28 ›› Issue (2): 300-310.doi: 10.1007/s11390-013-1330-8

所属专题: Artificial Intelligence and Pattern Recognition Data Management and Data Mining

• Special Section on Selected Paper from NPC 2011 • 上一篇    下一篇

无参数的时间序列discord搜索

Wei Luo1, Marcus Gallagher2, Member, IEEE, and Janet Wiles2, Member, IEEE   

  • 收稿日期:2012-02-15 修回日期:2012-11-14 出版日期:2013-03-05 发布日期:2013-03-05

Parameter-Free Search of Time-Series Discord

Wei Luo1, Marcus Gallagher2, Member, IEEE, and Janet Wiles2, Member, IEEE   

  1. 1 School of Information Technology, Deakin University, Geelong, VIC 3220, Australia;
    2 School of Information Technology and Electrical Engineering, University of Queensland, Brisbane, QLD 4072, Australia
  • Received:2012-02-15 Revised:2012-11-14 Online:2013-03-05 Published:2013-03-05
  • Supported by:

    Support by Australian Research Council Linkage Grant No. LP 0776417.

时间序列异常常常在数据挖掘中被用来描述异常子序列。相比一些其他的搜索算法, 基于复原图的直接搜索算法拥快速和无参数等优点。但是直接搜索算法假设时间序列存在准周期性, 因此限制了该算法的适用性。本文提出一个参考函数和基于它的新的取样策略, 从而去除了直接搜索算法中的准周期假设。实验结果显示新的算法具有更高的效率和鲁棒性。

Abstract: Time-series discord is widely used in data mining applications to characterize anomalous subsequences in time series. Compared to some other discord search algorithms, the direct search algorithm based on the recurrence plot shows the advantage of being fast and parameter free. The direct search algorithm, however, relies on quasi-periodicity in input time series, an assumption that limits the algorithm's applicability. In this paper, we eliminate the periodicity assumption from the direct search algorithm by proposing a reference function for subsequences and a new sampling strategy based on the reference function. These measures result in a new algorithm with improved efficiency and robustness, as evidenced by our empirical evaluation.

[1] Shoeb A, Edwards H, Connolly J et al. Patient-specificseizure onset detection. Epilepsy & Behavior, 2004, 5(4): 483-498.

[2] Febrero M, Galeano P, González-Manteiga W. A functionalanalysis of NOx levels: Location and scale estimation andoutlier detection. Computational Statistics, 2007, 22(3): 411-427.

[3] van Wijk J, Van Selow E. Cluster and calendar based visual-ization of time series data. In Proc. the 1999 IEEE Sympo-sium on Information Visualization, Oct. 1999, pp.4-9.

[4] Rebbapragada U, Protopapas P, Brodley C, Alcock C. Find-ing anomalous periodic time series. Machine learning, 2009,74(3): 281-313.

[5] Rousseeuw P, Leroy A. Robust Regression and Outlier Detec-tion. Wiley Online Library, 1987.

[6] Dasgupta D, Forrest S. Novelty detection in time series datausing ideas from immunology. In Proc. the InternationalConference on Intelligent Systems, Dec. 1996, pp.82-87.

[7] Hyndman R, Ullah S. Robust forecasting of mortality andfertility rates: A functional data approach. ComputationalStatistics & Data Analysis, 2007, 51(10): 4942-4956.

[8] Febrero M, Galeano P, González-Manteiga W. Outlier detec-tion in functional data by depth measures, with application toidentify abnormal NOX levels. Environmetrics, 2008, 19(4):331-345.

[9] Keogh E, Lin J, Fu A. HOT SAX: Efficiently finding the mostunusual time series subsequence. In Proc. the 5th IEEE In-ternational Conference on Data Mining, Nov. 2005, pp.226-233.

[10] Lin J, Keogh E, Fu A, Van Herle H. Approximations to magic:Finding unusual medical time series. In Proc. the 18th IEEESymposium on Computer-Based Medical Systems, Jun. 2005,pp.329-334.

[11] Bu Y, Leung T, Fu A, Keogh E, Pei J, Meshkin S. WAT:Finding top-K discords in time series database. In Proc. the7th SIAM International Conference on Data Mining, Apr.2007.

[12] Yankov D, Keogh E, Rebbapragada U. Disk aware discord dis-covery: Finding unusual time series in terabyte sized datasets.Knowledge and Information Systems, 2008, 17(2): 241-262.

[13] Goldberger A, Amaral L, Glass L et al. Physiobank, phys-iotoolkit, and physionet: Components of a new research re-source for complex physiologic signals. Circulation, 2000,101(23): e215-e220.

[14] Shoeb A, Guttag J. Application of machine learning to epilep-tic seizure detection. In Proc. the 27th International Confer-ence on Machine Learning, Jun. 2010, pp.975-982.

[15] Pham N, Le Q, Dang T. HOT aSAX: A novel adaptive sym-bolic representation for time series discords discovery. InProc. the 2nd Intelligent Information and Database Systems,Mar. 2010, pp.113-121.

[16] Fu A, Leung O, Keogh E, Lin J. Finding time series discordsbased on Haar transform. In Proc. the 2nd Advanced DataMining and Applications, Aug. 2006, pp.31-41.

[17] Shieh J, Keogh E. iSAX: Indexing and mining terabyte sizedtime series. In Proc. the 14th ACM SIGKDD InternationalConference on Knowledge Discovery and Data Mining, Aug.2008, pp.623-631.

[18] Son M, Anh D. EWAT+: Finding time series discords basedon new discord measure functions. In Proc. IEEE Inter-national Conference on Computing & Communication Tech-nologies, Research, Innovation, and Vision for the Future,Nov. 2010, pp.1-4.

[19] Luo W, Gallagher M. Faster and parameter-free discordsearch in quasi-periodic time series. In Proc. PAKDD, May2011, pp.135-148.

[20] Eckmann J, Kamphorst S, Ruelle D. Recurrence plots of dy-namical systems. Europhysics Letters, 1987, 4(9): 973-977.

[21] Iwanski J, Bradley E. Recurrence plots of experimental data:To embed or not to embed? Chaos: An InterdisciplinaryJournal of Nonlinear Science, 1998, 8(4): 861-871.

[22] Webber Jr C, Zbilut J. Recurrence quantification analysisof nonlinear dynamical systems. Tutorials in ContemporaryNonlinear Methods for the Behavioral Sciences, 2005, pp.26-94.

[23] Marwan N, Carmen Romano M C, Thiel M, Kurths J. Re-currence plots for the analysis of complex systems. PhysicsReports, 2007, 438(5/6): 237-329.

[24] Marwan N. A historical review of recurrence plots. The Eu-ropean Physical Journal-Special Topics, 2008, 164(1): 3-12.

[25] Chatfield C. The Analysis of Time Series: An Introduction,(6th edition). Chapman & Hall/CRC, 2004.

[26] Keogh E, Lin J, Fu A. The UCR time series discords.http://www.cs.ucr.edu/~eamonn/discords/, April 2010.

[27] Hyndman R. Time series data library. http://data.is/TSDLde-mo, April 2010.
No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] 刘明业; 洪恩宇;. Some Covering Problems and Their Solutions in Automatic Logic Synthesis Systems[J]. , 1986, 1(2): 83 -92 .
[2] 陈世华;. On the Structure of (Weak) Inverses of an (Weakly) Invertible Finite Automaton[J]. , 1986, 1(3): 92 -100 .
[3] 高庆狮; 张祥; 杨树范; 陈树清;. Vector Computer 757[J]. , 1986, 1(3): 1 -14 .
[4] 陈肇雄; 高庆狮;. A Substitution Based Model for the Implementation of PROLOG——The Design and Implementation of LPROLOG[J]. , 1986, 1(4): 17 -26 .
[5] 黄河燕;. A Parallel Implementation Model of HPARLOG[J]. , 1986, 1(4): 27 -38 .
[6] 闵应骅; 韩智德;. A Built-in Test Pattern Generator[J]. , 1986, 1(4): 62 -74 .
[7] 唐同诰; 招兆铿;. Stack Method in Program Semantics[J]. , 1987, 2(1): 51 -63 .
[8] 闵应骅;. Easy Test Generation PLAs[J]. , 1987, 2(1): 72 -80 .
[9] 朱鸿;. Some Mathematical Properties of the Functional Programming Language FP[J]. , 1987, 2(3): 202 -216 .
[10] 李明慧;. CAD System of Microprogrammed Digital Systems[J]. , 1987, 2(3): 226 -235 .
版权所有 © 《计算机科学技术学报》编辑部
本系统由北京玛格泰克科技发展有限公司设计开发 技术支持:support@magtech.com.cn
总访问量: