Shuai Han, Xian-Min Liu, Jian-Zhong Li. Efficient Partitioning Method for Optimizing the Compression on Array Data[J]. Journal of Computer Science and Technology, 2022, 37(5): 1049-1067. DOI: 10.1007/s11390-022-2371-7

Efficient Partitioning Method for Optimizing the Compression on Array Data

  • Array partitioning is an important research problem in array management, since partitioning strategies strongly influence storage, query evaluation, and other components of array management systems. Meanwhile, the growing volume of array data makes compression highly necessary. Observing that array partitioning can significantly affect compression performance, this paper aims to design an efficient partitioning method for array data that optimizes compression performance. As far as we know, little research effort has been devoted to this problem. First, the problem of array partitioning for optimizing compression performance (PPCP for short) is proposed. We adopt a popular compression technique that allows queries to be processed on the compressed data without decompression. Second, because this problem is NP-hard, two essential principles for exploring partitioning solutions are introduced, which explain the core idea of the proposed partitioning algorithms. The first principle shows that compression performance can be improved if an array can be partitioned into two parts with different sparsities. The second principle introduces a greedy strategy that supports the heuristic selection of partitioning positions. Based on these two principles, two greedy array partitioning algorithms are designed, one for the independent case and one for the dependent case. Because the algorithm for the dependent case is expensive, a further optimization based on random sampling and dimension grouping is proposed to achieve linear time cost. Finally, experiments are conducted on both synthetic and real-life data, and the results show that the two proposed partitioning algorithms achieve better performance on both compression and query evaluation.
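
The first principle, and the idea of choosing a partitioning position greedily, can be illustrated on a one-dimensional toy array. The sketch below is not the paper's algorithm or its compression technique. It assumes a hypothetical cost model in which each chunk is stored either densely (one unit per cell) or sparsely (two units per non-empty cell), whichever is cheaper. The names `chunk_cost` and `best_split` are illustrative only.

```python
def chunk_cost(cells):
    """Toy storage cost of one chunk (an assumed model, not the paper's).

    Dense layout: 1 unit per cell.
    Sparse layout: 2 units per non-empty cell (index + value).
    The chunk uses whichever layout is cheaper.
    """
    nonzero = sum(1 for c in cells if c != 0)
    return min(len(cells), 2 * nonzero)


def best_split(cells):
    """Try every split position and keep the one with the lowest total cost.

    Returns (position, cost). The position is None when no split is
    cheaper than storing the array as a single chunk.
    """
    best_pos, best_cost = None, chunk_cost(cells)
    for pos in range(1, len(cells)):
        cost = chunk_cost(cells[:pos]) + chunk_cost(cells[pos:])
        if cost < best_cost:
            best_pos, best_cost = pos, cost
    return best_pos, best_cost


# Left half is dense, right half is sparse.
array = [5, 3, 7, 2, 9, 4, 0, 0, 0, 1, 0, 0]

whole = chunk_cost(array)          # cost without partitioning
pos, split = best_split(array)     # cost with the best single split

print(f"unpartitioned cost: {whole}")
print(f"best split at {pos}, cost: {split}")
```

Under this assumed model the unpartitioned array costs 12 units, while splitting at position 6 separates the dense part from the sparse part and lowers the total to 8 units. Applying such a split repeatedly to the resulting parts is one simple way to picture a greedy partitioning strategy.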
