Compressing Data Cubes in Parallel OLAP Systems
Sangster Award Recipient Bo-Yong Liang, PhD candidate, Carleton University
Data warehouses provide the primary support for Decision Support Systems (DSS) and Business Intelligence (BI) systems. One of the most interesting recent themes in this respect has been the computation and manipulation of the data cube, a relational model that can be used to support On-Line Analytical Processing (OLAP). Within the context of massive data volumes, data cube computation has to be very efficient with respect to speed and space. Many research studies have shown that parallel computation effectively speeds up data cube construction. Data cube compression, on the other hand, is not only crucial for computing and storing data cubes in limited space, but also reduces I/O access time. Though compression algorithms are quite common in the literature, most are poorly suited to database/cube environments since they (i) offer relatively poor compression ratios or (ii) result in significant run-time penalties.
My work proposes and implements an efficient algorithm for compressing data
cubes during parallel data cube generation. This low-overhead compression
mechanism provides block-by-block and record-by-record compression using
tuple difference coding techniques, thereby maximizing the compression
ratio while minimizing the run-time decompression penalty.
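As a rough illustration of the general idea (a hypothetical sketch, not the thesis implementation), tuple difference coding exploits the fact that neighbouring records in a sorted block share most of their attribute values: the first tuple is stored verbatim, and each subsequent tuple is stored as its per-dimension difference from the previous one. The function names below are illustrative only.

```python
def encode_block(tuples):
    """Encode a lexicographically sorted block of cube records:
    store the first tuple verbatim, then per-dimension differences
    for each subsequent tuple."""
    if not tuples:
        return []
    encoded = [list(tuples[0])]
    for prev, cur in zip(tuples, tuples[1:]):
        encoded.append([c - p for p, c in zip(prev, cur)])
    return encoded

def decode_block(encoded):
    """Invert encode_block by accumulating the differences,
    so decompression is a single cheap pass over the block."""
    if not encoded:
        return []
    tuples = [tuple(encoded[0])]
    for deltas in encoded[1:]:
        tuples.append(tuple(p + d for p, d in zip(tuples[-1], deltas)))
    return tuples

# Sorting keeps the deltas small, e.g. (1,5,11)-(1,5,9) -> [0,0,2].
block = sorted([(1, 5, 9), (1, 5, 11), (1, 6, 2), (2, 0, 0)])
assert decode_block(encode_block(block)) == block
```

Because the differences are mostly small integers, they can be packed into far fewer bits than the original attribute values (for instance with a variable-length integer code), which is where the compression gain comes from.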
Experimental results demonstrate a typical compression ratio of about 30:1
without sacrificing running time. My work also proposes two data cube
computation algorithms (random query and sub-cube construction) that
operate directly on the compressed data. Moreover, my work demonstrates
that the compression method is also suitable for the Hilbert space-filling
curve, a mechanism widely used in multi-dimensional indexing.