Continuous top-k query over a sliding window is a fundamental problem in databases: it retrieves the k objects with the highest scores each time the window slides. Existing studies mainly adopt exact algorithms, whose key idea is to maintain a subset of the objects in the window and to retrieve answers from that subset. However, all existing algorithms are sensitive to query parameters and data distribution. In addition, they incur expensive overhead for incremental maintenance and thus cannot satisfy real-time requirements. In this paper, we define a novel query named the (ε,δ)-approximate continuous top-k query, which returns approximate answers for the top-k query. To support this query efficiently over a sliding window, we propose a framework named PABF (Probabilistic Approximate Based Framework). We first maintain a self-adaptive pruning value, which filters out newly arrived objects whose probability of being a query result is less than 1-δ. Objects that are not filtered out are combined together if the score difference among them is less than a threshold. To maintain these combined results efficiently, PABF also uses a multi-phase merging algorithm. Theoretical analysis indicates that, even in the worst case, maintaining each candidate requires only logarithmic complexity.
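As a point of reference for the exact approach mentioned in the abstract (maintaining a subset of the objects in the window and answering from it), the following is a minimal sketch and not the PABF framework itself. It assumes a count-based window and uses the well-known dominance idea: an object that has k newer objects with scores at least as high can never enter the top-k before it expires, so it can be dropped. The class name `SlidingTopK` and its methods are hypothetical names chosen for illustration.

```python
from collections import deque


class SlidingTopK:
    """Exact top-k over a count-based sliding window (illustrative baseline).

    Keeps only candidates that have fewer than k newer objects with a
    score greater than or equal to their own.
    """

    def __init__(self, k, window):
        self.k = k
        self.window = window
        self.now = 0           # arrival counter of the latest object
        self.cands = deque()   # entries [arrival, score, dominators], in arrival order

    def insert(self, score):
        self.now += 1
        # Expire candidates that have left the window.
        while self.cands and self.cands[0][0] <= self.now - self.window:
            self.cands.popleft()
        kept = deque()
        for entry in self.cands:
            # The new object dominates every older candidate it ties or beats.
            if entry[1] <= score:
                entry[2] += 1
            # A candidate with k dominators can never be a result again.
            if entry[2] < self.k:
                kept.append(entry)
        kept.append([self.now, score, 0])
        self.cands = kept

    def top_k(self):
        # Highest score first; ties are broken in favor of newer objects.
        best = sorted(self.cands, key=lambda e: (-e[1], -e[0]))[: self.k]
        return [e[1] for e in best]


if __name__ == "__main__":
    q = SlidingTopK(k=2, window=3)
    for s in [5, 1, 4, 2, 3]:
        q.insert(s)
    print(q.top_k())  # window holds 4, 2, 3, so this prints [4, 3]
```

Each insertion in this sketch scans the whole candidate set, which illustrates the incremental maintenance cost that the abstract attributes to exact methods; the pruning value and merging steps of PABF are not reproduced here.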
This work is partially supported by the National Natural Science Fund for Distinguished Young Scholars of China under Grant No. 61322208, the National Basic Research 973 Program of China under Grant No. 2012CB316201, the National Natural Science Foundation of China under Grant Nos. 61272178 and 61572122, and the Key Program of the National Natural Science Foundation of China under Grant No. 61532021.
Corresponding author: Bin Wang
About the author: Rui Zhu received his M.S. degree in computer science from the Department of Computer Science, Northeastern University, Shenyang, in 2008. Currently, he is a Ph.D. candidate at Northeastern University, Shenyang. His research interests include design and analysis of algorithms, databases, data quality, and distributed systems.
Rui Zhu, Bin Wang, Shi-Ying Luo, Xiao-Chun Yang, Guo-Ren Wang. Approximate Continuous Top-k Query over Sliding Window[J]. Journal of Computer Science and Technology, 2017, 32(1): 93-109.