Journal of Computer Science and Technology
Journal of Computer Science and Technology, 2017, Vol. 32, Issue 1: 93-109. DOI: 10.1007/s11390-017-1708-0
Data Management and Data Mining
Approximate Continuous Top-k Query over Sliding Window
Rui Zhu (Member, CCF, ACM), Bin Wang* (Member, CCF), Shi-Ying Luo (Member, CCF, ACM), Xiao-Chun Yang (Senior Member, CCF, IEEE; Member, ACM), and Guo-Ren Wang (Member, CCF, ACM, IEEE)
College of Computer Science and Engineering, Northeastern University, Shenyang 110819, China

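Abstract (Chinese version, translated): Continuous top-k query over data streams is a classic problem in stream data management. It returns the k objects with the highest scores in the window. The core idea of existing algorithms is to maintain a subset of the stream data, so that when the window slides, the new query results can be found in that subset. However, these algorithms are all sensitive to query parameters and data distribution. Their incremental maintenance cost is high, and they cannot meet users' real-time requirements. To address these problems, this paper first proposes the concept of the (ε,δ)-approximate continuous top-k query. For this query, three filtering algorithms suited to different data distributions are proposed. Theoretical analysis shows that each of the three algorithms can filter out O(s-k) stream objects at a computational cost of O(s). At the same time, they guarantee that the probability of not mistakenly deleting a query result is ε. Next, a multi-phase merging algorithm is proposed, which reduces the maintenance cost of candidate objects through a merging strategy and a compression strategy. Assuming the sliding window has length N, the computational cost of processing N objects with this algorithm is O(N log k + (NK/s) logφ(R/(εk)) + N × costF). Finally, the performance of the proposed algorithms is evaluated through simulation experiments.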
Keywords: sliding window, continuous top-k query, approximate, filtering
Abstract: Continuous top-k query over a sliding window is a fundamental problem in databases: it retrieves the k objects with the highest scores each time the window slides. Existing studies mainly adopt exact algorithms for this type of query, whose key idea is to maintain a subset of the objects in the window and to retrieve answers from it. However, all existing algorithms are sensitive to query parameters and data distribution. In addition, they incur expensive overhead for incremental maintenance and thus cannot satisfy real-time requirements. In this paper, we define a novel query named (ε,δ)-approximate continuous top-k query, which returns approximate answers for the top-k query. To support this query efficiently over a sliding window, we propose a framework named PABF (Probabilistic Approximate Based Framework). We first maintain a self-adaptive pruning value, which filters out newly arrived objects that have a probability less than 1-δ of being a query result. Objects that are not filtered are combined if the score difference among them is less than a threshold. To maintain these combined results efficiently, PABF also includes a multi-phase merging algorithm. Theoretical analysis indicates that, even in the worst case, maintaining each candidate requires only logarithmic complexity.
Keywords: continuous top-k query, approximate, sliding window
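
The candidate-subset idea that the abstract attributes to existing exact algorithms can be illustrated with a minimal Python sketch. This is not PABF and not the authors' code: it assumes a count-based window over plain numeric scores, and the function names `naive_topk` and `candidate_topk` are hypothetical. The pruning rule used here is the standard dominance argument (an object can be dropped once k newer objects have strictly higher scores, because those objects outlive it in the window).

```python
from collections import deque
import heapq


def naive_topk(scores, window, k):
    """Baseline: rescan the whole window after every arrival."""
    buf = deque(maxlen=window)  # count-based sliding window
    for s in scores:
        buf.append(s)
        if len(buf) == window:
            yield heapq.nlargest(k, buf)


def candidate_topk(scores, window, k):
    """Exact top-k that keeps only objects able to appear in a future result.

    Illustrative assumption: an object is discarded once k newer objects
    with strictly higher scores have arrived, since they expire after it.
    """
    # Each entry: [arrival index, score, count of newer higher-scored objects]
    cands = []
    for t, s in enumerate(scores):
        for c in cands:
            if c[1] < s:
                c[2] += 1  # the new arrival dominates this older object
        # Drop expired objects and objects dominated by at least k newer ones.
        cands = [c for c in cands if c[0] > t - window and c[2] < k]
        cands.append([t, s, 0])
        if t + 1 >= window:  # report only once the window is full
            yield heapq.nlargest(k, (c[1] for c in cands))
```

Both generators return the same answers on any input; the second simply answers each query from a smaller set. How small that set stays depends on the arrival order of the scores, which is one way to read the abstract's remark that exact methods are sensitive to data distribution.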
Received: 2016-02-29
Funding:

This work is partially supported by the National Natural Science Fund for Distinguished Young Scholars of China under Grant No. 61322208, the National Basic Research 973 Program of China under Grant No. 2012CB316201, the National Natural Science Foundation of China under Grant Nos. 61272178 and 61572122, and the Key Program of the National Natural Science Foundation of China under Grant No. 61532021.

Corresponding author: Bin Wang. Email: binwang@mail.neu.edu.cn
About the author: Rui Zhu received his M.S. degree in computer science from the Department of Computer Science, Northeastern University, Shenyang, in 2008. He is currently a Ph.D. candidate at Northeastern University, Shenyang. His research interests include the design and analysis of algorithms, databases, data quality, and distributed systems.
Cite this article:
Rui Zhu, Bin Wang, Shi-Ying Luo, Xiao-Chun Yang, Guo-Ren Wang. Approximate Continuous Top-k Query over Sliding Window[J]. Journal of Computer Science and Technology, 2017, 32(1): 93-109
Link to this article:
http://jcst.ict.ac.cn:8080/jcst/CN/10.1007/s11390-017-1708-0