2017, Vol. 32, Issue (1): 93-109. doi: 10.1007/s11390-017-1708-0

Special Issue: Data Management and Data Mining


Approximate Continuous Top-k Query over Sliding Window

Rui Zhu, Member, CCF, ACM, Bin Wang*, Member, CCF, Shi-Ying Luo, Member, CCF, ACM, Xiao-Chun Yang, Senior Member, CCF, IEEE, Member, ACM, and Guo-Ren Wang, Member, CCF, ACM, IEEE   

  1. College of Computer Science and Engineering, Northeastern University, Shenyang 110819, China
  • Received: 2016-02-29 Revised: 2016-08-17 Online: 2017-01-05 Published: 2017-01-05
  • Contact: Bin Wang, E-mail: binwang@mail.neu.edu.cn
  • About author: Rui Zhu received his M.S. degree in computer science from the Department of Computer Science, Northeastern University, Shenyang, in 2008. He is currently a Ph.D. candidate at Northeastern University, Shenyang. His research interests include the design and analysis of algorithms, databases, data quality, and distributed systems.
  • Supported by:

    This work is partially supported by the National Natural Science Fund for Distinguished Young Scholars of China under Grant No. 61322208, the National Basic Research 973 Program of China under Grant No. 2012CB316201, the National Natural Science Foundation of China under Grant Nos. 61272178 and 61572122, and the Key Program of the National Natural Science Foundation of China under Grant No. 61532021.

Continuous top-k query over a sliding window is a fundamental problem in databases: it retrieves the k objects with the highest scores each time the window slides. Existing studies mainly adopt exact algorithms for this type of query, whose key idea is to maintain a subset of the objects in the window and to retrieve answers from it. However, all existing algorithms are sensitive to query parameters and data distribution. In addition, they incur expensive overhead for incremental maintenance and thus cannot satisfy real-time requirements. In this paper, we define a novel query, named the (ε,δ)-approximate continuous top-k query, which returns approximate answers to the top-k query. To support this query efficiently, we propose a framework named PABF (Probabilistic Approximate Based Framework) for approximate top-k queries over sliding windows. We first maintain a self-adaptive pruning value, which filters out newly arrived objects that have a probability less than 1-δ of being a query result. Objects that are not filtered out are combined if the score difference among them is less than a threshold. To maintain these combined results efficiently, PABF also uses a multi-phase merging algorithm. Theoretical analysis indicates that, even in the worst case, maintaining each candidate requires only logarithmic complexity.
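To make the exact baseline concrete, the following is a minimal sketch of the idea the abstract attributes to existing studies: keep only a subset of the window's objects and answer the top-k query from it. It is not PABF and not taken from the paper. The class name `SlidingTopK`, its methods, and the count-based window are assumptions chosen for illustration. The sketch discards an object once k newer objects score at least as high, since such an object can never re-enter the top-k before it expires.

```python
import heapq


class SlidingTopK:
    """Hypothetical exact baseline: top-k scores over a count-based sliding window."""

    def __init__(self, window_size, k):
        self.window_size = window_size
        self.k = k
        self.clock = 0          # number of objects seen so far
        # Each candidate is [arrival_time, score, dominators], in arrival order.
        self.candidates = []

    def insert(self, score):
        """Slide the window by one object and update the candidate subset."""
        self.clock += 1
        oldest_alive = self.clock - self.window_size + 1
        kept = []
        for cand in self.candidates:
            if cand[0] < oldest_alive:
                continue        # expired: the object has left the window
            if score >= cand[1]:
                cand[2] += 1    # one more newer object scores at least as high
            if cand[2] < self.k:
                kept.append(cand)   # still able to appear in a future answer
        kept.append([self.clock, score, 0])
        self.candidates = kept

    def top_k(self):
        """Return the k highest scores currently in the window, descending."""
        return heapq.nlargest(self.k, (cand[1] for cand in self.candidates))


if __name__ == "__main__":
    window = SlidingTopK(window_size=4, k=2)
    for s in [5, 1, 4, 3, 2, 6]:
        window.insert(s)
        print(s, window.top_k())
```

Each insertion scans the whole candidate list, so the per-object cost of this sketch grows with the number of candidates, which in turn depends on k and on the score distribution. That dependence is the kind of sensitivity the abstract describes as motivation for an approximate approach.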

ISSN 1000-9000 (Print), 1860-4749 (Online)
CN 11-2296/TP

Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences