We use cookies to improve your experience with our site.
Xi-Te Wang, De-Rong Shen, Mei Bai, Tie-Zheng Nie, Yue Kou, Ge Yu. An Efficient Algorithm for Distributed Outlier Detection in Large Multi-Dimensional Datasets[J]. Journal of Computer Science and Technology, 2015, 30(6): 1233-1248. DOI: 10.1007/s11390-015-1596-0
Citation: Xi-Te Wang, De-Rong Shen, Mei Bai, Tie-Zheng Nie, Yue Kou, Ge Yu. An Efficient Algorithm for Distributed Outlier Detection in Large Multi-Dimensional Datasets[J]. Journal of Computer Science and Technology, 2015, 30(6): 1233-1248. DOI: 10.1007/s11390-015-1596-0

An Efficient Algorithm for Distributed Outlier Detection in Large Multi-Dimensional Datasets

  • The distance-based outlier is a widely used definition of outlier. A point is distinguished as an outlier on the basis of the distances to its nearest neighbors. In this paper, to solve the problem of outlier computing in distributed environments, DBOZ, a distributed algorithm for distance-based outlier detection using Z-curve hierarchical tree (ZH-tree) is proposed. First, we propose a new index, ZH-tree, to effectively manage the data in a distributed environment. ZH-tree has two desirable advantages, including clustering property to help search the neighbors of a point, and hierarchical structure to support space pruning. We also design a bottom-up approach to build ZH-tree in parallel, whose time complexity is linear to the number of dimensions and the size of dataset. Second, DBOZ is proposed to compute outliers in distributed environments. It consists of two stages. 1) To avoid calculating the exact nearest neighbors of all the points, we design a greedy method and a new ZH-tree based k-nearest neighbor searching algorithm (ZHkNN for short) to obtain a threshold LW. 2) We propose a filter-and-refine approach, which first filters out the unpromising points using LW, and then outputs the final outliers through refining the remaining points. At last, the efficiency and the effectiveness of ZH-tree and DBOZ are testified through a series of experiments.
  • loading

Catalog

    /

    DownLoad:  Full-Size Img  PowerPoint
    Return
    Return