We use cookies to improve your experience with our site.

有效的针对分布型和非分布型聚集函数的增量维护

Efficient Incremental Maintenance for Distributive and Non-Distributive Aggregate Functions

  • 摘要: 为了提高数据立方体的计算效率,人们已经进行了广泛的研究工作。但是,由于数据量太大,要计算一个完整的数据立方体代价总是非常昂贵。最近提出 Quotient 数据立方体采用划分的方法解决了这个由于数据量太大而导致计算效率难以提高的问题。该方法不仅对分布型聚集函数如 Sum 非常有效,而且也可以应用于非分步型聚集函数如 Median 的增量维护。这样的聚集函数需要存储所有的基元组。 然而,当数据源发生改变的时候, Quotient 数据立方体很难进行维护,尤其是针对 Median 这样的非分步型聚集函数,因为当插入新的元组时, Quotient 数据立方体中的等价类有的需要进行拆分,有的还需要重新生成。当从 Quotient 数据立方体中删除元组的时候,有些等价类需要进行合并。而且对 Median 来说,在聚集计算时需要所有的源数据参与,维护代价非常大。 本文针对可分布型聚集函数(如 SUM ),提出了两种有效的增量维护方法,一种是单元组递增更新的方法,另外一种批量更新的方法。针对非分布型聚集函数(如 MEDIAN ),提出了一种有效的滑动窗口技术,该技术和 addset 存储结构的结合极大地减少了维护不可自维护型函数时所需要的存储空间和运行时间。同时提出了对不可自维护型函数进行增量维护的有效方法。实验证明我们设计的算法是有效的。

     

    Abstract: Data cube pre-computation is an important concept for supporting OLAP (Online Analytical Processing) and has been studied extensively. It is often not feasible to compute a complete data cube due to the huge storage requirement. Recently proposed quotient cube addressed this issue through a partitioning method that groups cube cells into equivalence partitions. Such an approach not only is useful for distributive aggregate functions such as SUM but also can be applied to the maintenance of holistic aggregate functions like MEDIAN which will require the storage ofa set of tuples for each equivalence class. Unfortunately, as changes are made to the data sources, maintaining the quotient cube is non-trivial since the partitioning of the cube cells must also be updated. In this paper, the authors design incremental algorithms to update a quotient cube efficiently for both SUM and MEDIAN aggregate functions. Forthe aggregate function SUM, concepts are borrowed from the principle of Galois Lattice to develop CPU-efficient algorithms to update a quotient cube. For the aggregate function MEDIAN, the concept of a pseudo class is introduced to further reduce the size of the quotient cube. Coupled with a novel sliding window technique, an efficient algorithm is developed for maintaining a MEDIAN quotient cube that takes up reasonably small storage space. Performance study shows that the proposed algorithms are efficient and scalable over large databases.

     

/

返回文章
返回