›› 2014, Vol. 29 ›› Issue (3): 436-448.doi: 10.1007/s11390-014-1441-x

Special Issue: Data Management and Data Mining

• Data Management and Data Mining • Previous Articles     Next Articles

Continuous Outlier Monitoring on Uncertain Data Streams

Ke-Yan Cao1 (曹科研), Guo-Ren Wang1, 2 (王国仁), Senior Member, CCF Dong-Hong Han1, 2 (韩东红), Member, CCF, Guo-Hui Ding3 (丁国辉), Ai-Xia Wang1 (王爱侠) and Ling-Xu Shi4 (石凌旭)   

  1. 1 College of Information Science and Engineering, Northeastern University, Shenyang 110819, China;
    2 Key Laboratory of Medical Image Computing, Ministry of Education, Northeastern University, Shenyang 110819, China;
    3 College of Computer, Shenyang Aerospace University, Shenyang 110819, China;
    4 Department of Command Information System Engineering, Logistic Engineering University of People's Liberation Army Chongqing 400311, China
  • Received:2013-07-02 Revised:2014-04-04 Online:2014-05-05 Published:2014-05-05
  • About author:Ke-Yan Cao is a Ph.D. candidate at Northeastern University, Shenyang. Her research interests include data mining, uncertain data management and data stream management.
  • Supported by:

    The work is supported by the National Natural Science Foundation of China under Grant Nos. 61025007, 61328202, 61173029, 61100024, 61332006, and 61073063, the National High Technology Research and Development 863 Program of China under Grant No. 2012AA011004, and the National Basic Research 973 Program of China under Grant No. 2011CB302200-G.

Outlier detection on data streams is an important task in data mining. The challenges become even larger when considering uncertain data. This paper studies the problem of outlier detection on uncertain data streams. We propose Continuous Uncertain Outlier Detection (CUOD), which can quickly determine the nature of the uncertain elements by pruning to improve the effciency. Furthermore, we propose a pruning approach——Probability Pruning for Continuous Uncertain Outlier Detection (PCUOD) to reduce the detection cost. It is an estimated outlier probability method which can effectively reduce the amount of calculations. The cost of PCUOD incremental algorithm can satisfy the demand of uncertain data streams. Finally, a new method for parameter variable queries to CUOD is proposed, enabling the concurrent execution of different queries. To the best of our knowledge, this paper is the first work to perform outlier detection on uncertain data streams which can handle parameter variable queries simultaneously. Our methods are verified using both real data and synthetic data. The results show that they are able to reduce the required storage and running time.

[1] Niennattrakul V, Keogh E, Ratanamahatana C A. Data editing techniques to allow the application of distance-based outlier detection to streams. In Proc. the 10th International Conference on Data Mining, December 2010, pp.947-952.

[2] Jin C Q, Zhang J W, Zhou A Y. Continuous ranking on uncertain streams. Frontiers of Computer Science, 2012, 6(6): 686-699.

[3] Zhang C, Gao M, Zhou A Y. Tracking high quality clusters over uncertain data streams. In Proc. the 25th Int. Conf. Data Engineering, March 29-April 2, 2009, pp.1641-1648.

[4] Aggarwal C C. On density based transforms for uncertain data mining. In Proc. the 23rd International Conference on Data Engineering, April 2007, pp.866-875.

[5] Barbar D, Garcia-Molina H, Porter D. The management of probabilistic data. IEEE Transactions on Knowledge and Data Engineering, 1992, 4(5): 487-502.

[6] Burdick D, Deshpande P M, Jayram T S, Ramakrishnan R, Vaithyanathan S. OLAP over uncertain and imprecise data. In Proc. the 31st Int. Conf. Very Large Data Bases, August 2005, pp.970-981.

[7] Cheng R, Kalashnikov D V, Prabhakar S. Evaluating probabilistic queries over imprecise data. In Proc. International Conference on Management of Data, June 2003, pp.551-562.

[8] Sarma A D, Benjelloun O, Halevy A, Widom J.Working models for uncertain data. In Proc. the 22nd International Conference on Data Engineering, April 2006, p.7.

[9] Singh S, Mayfield C, Prabhakar S, Shah R, Hambrusch S. Indexing uncertain categorical data. In Proc. the 23rd Int. Conf. Data Engineering, April 2007, pp.616-625.

[10] Tao Y, Cheng R, Xiao X, Ngai W K, Kao B, Prabhakar S. Indexing multi-dimensional uncertain data with arbitrary probability density functions. In Proc. the 31st Int. Conf. Very Large Data Bases, August 2005, pp.922-933.

[11] Chen M, Yu G, Gu Y, Jia Z X,Wang Y Q. An effcient method for cleaning dirty-events over uncertain data in WSNs. J. Computer Science and Technology, 2011, 26(6): 942-953.

[12] Yang D, Rundensteiner E A, Ward M O. Neighbor-based pattern detection for windows over streaming data. In Proc. the 12th International Conference on Extending Database Technology, March 2009, pp.529-540.

[13] Aggarwal C C, Han J, Wang J, Yu P S. A framework for clustering evolving data streams. In Proc. the 29th Int. Conf. Very Large Data Bases, September 2003, pp.81-92.

[14] Babcock B, Babu S, Datar M, Motwani R, Widom J. Models and issues in data stream systems. In Proc. the 21st ACM SIGMOD-SIGART-SIGACT Symposium on Principles of Database Systems, June 2002, pp.1-16.

[15] Knorr E M, Ng R T. Algorithms for mining distance-based outliers in large datasets. In Proc. the 24th International Conference on Very Large Data Bases, August 1998, pp.392403.

[16] Angiulli F, Fassetti F. Detecting distance-based outliers in streams of data. In Proc. the 16th International Conference on Information and Knowledge Management, November 2007, pp.811{820.

[17] Kontaki M, Gounaris A, Papadopoulos A N et al. Continuous monitoring of distance-based outliers over data streams. In Proc. the 27th International Conference on Data Engineering, April 2011, pp.135-146.

[18] Assent I, Kranen P, Baldauf C, Seidl T. AnyOut: Anytime outlier detection on streaming data. In Proc. the 17th International Conference on Databases Systems for Advanced Applications, Vol.1, April 2012, pp.228-242.

[19] Aggarwal C C, Yu P S. Outlier detection with uncertain data. In Proc. SIAM Int. Conf. Data Mining, April 2008, pp.483493.

[20] Wang B, Xiao G, Yu H, Yang X. Distance-based outlier detection on uncertain data. In Proc. the 9th Int. Conf. Comp. and Information Technology, October 2009, pp.293-298.

[21] Jiang B, Pei J. Outlier detection on uncertain data: Objects, instances, and inferences. In Proc. the 27th International Conference on Data Engineering, April 2011, pp.422-433.

[22] Wang B, Yang X C, Wang G R, Yu G. Outlier detection over sliding windows for probabilistic data streams. Journal of Computer Science and Technology, 2010, 25(3): 389-400.

[23] Cao K Y, Han D H, Wang G R, et al. An algorithm for outlier detection on uncertain data stream. In Proc. the 15th Asia-Pacific Web Conference, April 2013, pp.449-460.

[24] Yan C, Chen G L, Shen Y F. Outlier analysis for gene expression data. Journal of Computer Science and Technology, 2004, 19(1): 13-21.

[25] Knorr E M, Ng R T. Finding intensional knowledge of distance-based outliers. In Proc. the 25th International Conference on Very Large Data Bases, Sept. 2009, pp.211-222.

[26] Das Sarma A, Benjelloun O, Halevy A, Widom J. Working models for uncertain data. In Proc. the 22nd International Conference on Data Engineering, April 2006, p.7.
No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] Chen Shihua;. On the Structure of Finite Automata of Which M Is an(Weak)Inverse with Delay τ[J]. , 1986, 1(2): 54 -59 .
[2] Zhu Mingyuan;. Two Congruent Semantics for Prolog with CUT[J]. , 1990, 5(1): 82 -91 .
[3] Yao Xin; Li Guojie;. General Simulated Annealing[J]. , 1991, 6(4): 329 -338 .
[4] Zhang Bo; Zhang Ling;. A Relation Matrix Approach to Labelling Temporal Relations in Scheduling[J]. , 1991, 6(4): 339 -346 .
[5] Harald E. Otto;. UNDO, An Aid for Explorative Learning?[J]. , 1992, 7(3): 226 -236 .
[6] Fei Xianglin; Liao Lei; Wang Hezhen; Wang Chengzao;. Structured Development Environment Based on the Object-Oriented Concepts[J]. , 1992, 7(3): 193 -201 .
[7] Shen Yidong;. Form alizing Incomplete Knowledge in Incomplete Databases[J]. , 1992, 7(4): 295 -304 .
[8] Li Renwei; He Pei; Zhang Wenhui;. An Introduction to IN CAPS System[J]. , 1993, 8(1): 26 -37 .
[9] Zhao Zhaokeng; Dai Jun; Chen Wendan;. Automated Theorem Proving in Temporal Logic:T-Resolution[J]. , 1994, 9(1): 53 -62 .
[10] Xiang Dong; Wei Daozheng;. GLOBAL: A Design for Random Testability Algorithm[J]. , 1994, 9(2): 182 -192 .

ISSN 1000-9000(Print)

         1860-4749(Online)
CN 11-2296/TP

Home
Editorial Board
Author Guidelines
Subscription
Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
Tel.:86-10-62610746
E-mail: jcst@ict.ac.cn
 
  Copyright ©2015 JCST, All Rights Reserved