Ke-Yan Cao, Guo-Ren Wang, Dong-Hong Han, Guo-Hui Ding, Ai-Xia Wang, Ling-Xu Shi. Continuous Outlier Monitoring on Uncertain Data Streams[J]. Journal of Computer Science and Technology, 2014, 29(3): 436-448. DOI: 10.1007/s11390-014-1441-x

Continuous Outlier Monitoring on Uncertain Data Streams

Funds: The work is supported by the National Natural Science Foundation of China under Grant Nos. 61025007, 61328202, 61173029, 61100024, 61332006, and 61073063, the National High Technology Research and Development 863 Program of China under Grant No. 2012AA011004, and the National Basic Research 973 Program of China under Grant No. 2011CB302200-G.
  • Author Bio:

    Ke-Yan Cao is a Ph.D. candidate at Northeastern University, Shenyang. Her research interests include data mining, uncertain data management and data stream management.

  • Received Date: July 01, 2013
  • Revised Date: April 03, 2014
  • Published Date: May 04, 2014
  • Outlier detection on data streams is an important task in data mining. The challenges become even greater when the data are uncertain. This paper studies the problem of outlier detection on uncertain data streams. We propose Continuous Uncertain Outlier Detection (CUOD), which quickly determines the nature of uncertain elements through pruning, improving efficiency. Furthermore, we propose a pruning approach, Probability Pruning for Continuous Uncertain Outlier Detection (PCUOD), to reduce the detection cost. It estimates the outlier probability and thereby effectively reduces the amount of computation. The cost of the incremental PCUOD algorithm meets the demands of uncertain data streams. Finally, a new method for handling parameter-variable queries in CUOD is proposed, enabling the concurrent execution of different queries. To the best of our knowledge, this is the first work on outlier detection over uncertain data streams that can handle parameter-variable queries simultaneously. Our methods are verified using both real and synthetic data. The results show that they reduce the required storage and running time.