Zhi-Hong Chong, Jeffrey Xu Yu, Zhen-Jie Zhang, Xue-Min Lin, Wei Wang, Ao-Ying Zhou. Efficient Computation of k-Medians over Data Streams Under Memory Constraints[J]. Journal of Computer Science and Technology, 2006, 21(2): 284-296.

Efficient Computation of k-Medians over Data Streams Under Memory Constraints

In this paper, we study the problem of efficiently computing k-medians over high-dimensional, high-speed data streams. We focus on minimizing the CPU time needed to handle high-speed data streams while still meeting the requirements of high accuracy and small memory. Our work is motivated by the following observation: existing algorithms show similar approximation behavior in practice, even though they provide noticeably different worst-case theoretical guarantees. The underlying reason is that, to achieve a high approximation level with the smallest possible memory, they rely on rather complex techniques that maintain a sketch along the time dimension by applying existing off-line clustering algorithms. Those clustering algorithms cannot guarantee an optimal clustering of each data segment in a stream; instead, errors accumulate across segments, which in practice makes most algorithms reach about the same approximation level. We propose a new grid-based approach that divides the entire data set into cells (rather than partitioning it along the time dimension). We achieve a high approximation level based on a novel concept called $(1-\epsilon)$-dominant. We further extend the method to the data stream context by leveraging a density-based heuristic and frequent item mining techniques over data streams. We need to apply an existing clustering algorithm only once, on demand, to compute the k-medians, which reduces CPU time significantly. Our extensive experimental studies show that our approaches outperform other well-known approaches.
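The abstract describes the pipeline only at a high level. As a rough illustration of how its pieces could fit together, the following Python sketch maps each streamed point to a uniform grid cell, tracks heavy cells with the Misra-Gries frequent-items summary, and, on demand, runs a single weighted clustering pass over the surviving cell centers. Everything here is an assumption for illustration: the names `cell_of`, `MisraGries`, `weighted_k_medoids`, and `stream_k_medians` and the parameters `width` and `capacity` are hypothetical; Misra-Gries is a stand-in for whatever frequent item mining technique the paper actually uses; a simple weighted k-medoids local search stands in for "an existing clustering algorithm"; and neither the $(1-\epsilon)$-dominant concept nor the density-based heuristic is modeled.

```python
import math
import random


def cell_of(point, width):
    """Hypothetical uniform grid: map a point to the integer index of its cell."""
    return tuple(math.floor(x / width) for x in point)


class MisraGries:
    """Misra-Gries frequent-items summary with at most `capacity` counters
    (a stand-in for the paper's frequent item mining step)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = {}

    def add(self, item):
        if item in self.counts:
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            self.counts[item] = 1
        else:
            # Table full and item unseen: decrement every counter, drop zeros.
            for key in list(self.counts):
                self.counts[key] -= 1
                if self.counts[key] == 0:
                    del self.counts[key]


def weighted_k_medoids(points, weights, k, iters=20, seed=0):
    """Weighted k-medoids local search: centers are chosen among `points`,
    minimizing the weighted sum of Euclidean distances (a k-median heuristic)."""
    if len(points) < k:
        raise ValueError("need at least k candidate points")
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)
    for _ in range(iters):
        # Assign every candidate point to its nearest current medoid.
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            nearest = min(medoids, key=lambda m: math.dist(p, points[m]))
            clusters[nearest].append(i)
        # Within each cluster, pick the member with the smallest weighted cost.
        new_medoids = [
            min(members, key=lambda c: sum(
                weights[j] * math.dist(points[c], points[j]) for j in members))
            for members in clusters.values()
        ]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return [points[m] for m in medoids]


def stream_k_medians(stream, k, width, capacity):
    """One pass over the stream to count cells, then one on-demand clustering."""
    summary = MisraGries(capacity)
    for point in stream:
        summary.add(cell_of(point, width))
    cells = list(summary.counts)
    # Represent each surviving cell by its center, weighted by its counter.
    centers = [tuple((c + 0.5) * width for c in cell) for cell in cells]
    weights = [summary.counts[cell] for cell in cells]
    return weighted_k_medoids(centers, weights, k)


if __name__ == "__main__":
    stream = [(0.2, 0.3), (0.7, 0.1), (0.4, 0.6)] * 50 + [(9.1, 9.2), (9.5, 9.8)] * 50
    print(sorted(stream_k_medians(stream, k=2, width=1.0, capacity=10)))
    # prints [(0.5, 0.5), (9.5, 9.5)]
```

In this toy version the clustering step runs once over at most `capacity` weighted cell centers rather than over every arriving point, so per-point work is a cell lookup plus an amortized constant-time counter update. Its accuracy depends entirely on the hypothetical choices of `width` and `capacity`, and `weighted_k_medoids` raises `ValueError` if fewer than `k` distinct cells survive.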