2017, Vol. 32, Issue (1): 93-109. doi: 10.1007/s11390-017-1708-0

Special Issue: Data Management and Data Mining

• Data Management and Data Mining •

Approximate Continuous Top-k Query over Sliding Window

Rui Zhu, Member, CCF, ACM, Bin Wang*, Member, CCF, Shi-Ying Luo, Member, CCF, ACM, Xiao-Chun Yang, Senior Member, CCF, IEEE, Member, ACM, and Guo-Ren Wang, Member, CCF, ACM, IEEE   

  1. College of Computer Science and Engineering, Northeastern University, Shenyang 110819, China
  • Received: 2016-02-29 Revised: 2016-08-17 Online: 2017-01-05 Published: 2017-01-05
  • Contact: Bin Wang E-mail: binwang@mail.neu.edu.cn
  • About author: Rui Zhu received his M.S. degree in computer science from the Department of Computer Science, Northeastern University, Shenyang, in 2008. He is currently a Ph.D. candidate at Northeastern University, Shenyang. His research interests include the design and analysis of algorithms, databases, data quality, and distributed systems.
  • Supported by:

    This work is partially supported by the National Natural Science Fund for Distinguished Young Scholars of China under Grant No. 61322208, the National Basic Research 973 Program of China under Grant No. 2012CB316201, the National Natural Science Foundation of China under Grant Nos. 61272178 and 61572122, and the Key Program of the National Natural Science Foundation of China under Grant No. 61532021.

Continuous top-k query over a sliding window is a fundamental problem in databases: it retrieves the k objects with the highest scores each time the window slides. Existing studies mainly adopt exact algorithms for this type of query, whose key idea is to maintain a subset of the objects in the window and to retrieve answers from that subset. However, all the existing algorithms are sensitive to query parameters and data distribution. In addition, they incur expensive overhead for incremental maintenance and thus cannot satisfy real-time requirements. In this paper, we define a novel query, the (ε,δ)-approximate continuous top-k query, which returns approximate answers to a top-k query. To support this query efficiently over a sliding window, we propose a framework named PABF (Probabilistic Approximate Based Framework). We first maintain a self-adaptive pruning value, which filters out newly arrived objects whose probability of being a query result is less than 1-δ. Objects that are not filtered out are combined if the score difference among them is less than a threshold. To maintain these combined results efficiently, PABF also includes a multi-phase merging algorithm. Theoretical analysis indicates that, even in the worst case, maintaining each candidate requires only logarithmic complexity.
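To make the "maintain a subset of objects in the window" idea concrete, the following is a minimal Python sketch of an exact candidate-set baseline. It is not PABF and is not taken from the paper. It assumes a count-based window and a simple dominance rule: an object is dropped once k later arrivals score at least as high, since it can then never contribute to a top-k answer. The class name `SlidingTopK` and its parameters are hypothetical, and the per-arrival scan is linear in the number of candidates, which is the kind of maintenance cost the abstract argues against.

```python
from collections import deque
import heapq


class SlidingTopK:
    """Exact top-k scores over a count-based sliding window (illustrative only)."""

    def __init__(self, window_size, k):
        self.n = window_size
        self.k = k
        self.now = 0  # arrival index of the most recent object
        # Each candidate is [arrival_index, score, dominator_count],
        # kept in arrival order so that expiry only touches the left end.
        self.candidates = deque()

    def insert(self, score):
        self.now += 1
        # Drop candidates that have slid out of the window.
        while self.candidates and self.candidates[0][0] <= self.now - self.n:
            self.candidates.popleft()
        survivors = deque()
        for entry in self.candidates:
            # The new object arrives later; if it scores at least as high,
            # it dominates this older candidate.
            if entry[1] <= score:
                entry[2] += 1
            # Keep only candidates with fewer than k dominators.
            if entry[2] < self.k:
                survivors.append(entry)
        survivors.append([self.now, score, 0])
        self.candidates = survivors

    def top_k(self):
        # The k highest scores in the current window, in descending order.
        return heapq.nlargest(self.k, (e[1] for e in self.candidates))


if __name__ == "__main__":
    query = SlidingTopK(window_size=4, k=2)
    for s in [7, 3, 9, 1, 4, 8]:
        query.insert(s)
        print(s, query.top_k())
```

The candidate set is usually much smaller than the window, but its size still depends on k and on the order in which scores arrive, which illustrates the sensitivity to query parameters and data distribution noted above.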

