Hierarchical Dwarfs for the Rollup-Cube
Yannis Sismanis, Antonios Deligiannakis, Yannis Kotidis, Nick Roussopoulos
Yannis Sismanis DOLAP
Motivation
- Dimension values are annotated with hierarchies, for example:
  - Time: sec → min → hour
  - Store: code → retailer
- Hierarchies can be flat (linear) or lattices
- Profound effect on cube complexity: with d dimensions and L hierarchy levels per dimension, the number of views grows from 2^d to (L+1)^d
- Traditionally handled externally: hierarchical queries are mapped into queries on the raw data, which requires further aggregation of the results
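The view count above can be written per dimension. Assuming linear hierarchies and that L_i counts the levels of dimension i excluding ALL, every dimension contributes L_i + 1 grouping choices (one per level, plus ALL):

```latex
|\text{views}| = \prod_{i=1}^{d} (L_i + 1)
\quad\text{so}\quad
L_i = L \;\Rightarrow\; (L+1)^d,
\qquad
L_i = 1 \;\Rightarrow\; 2^d
```

For the two example hierarchies, Time (sec, min, hour: L = 3) and Store (code, retailer: L = 2), a two-dimensional cube already has 4 × 3 = 12 views instead of 2^2 = 4.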
Importance
- Online Analytical Processing (OLAP)
- Decision Support Systems (DSS)
- Data mining
- Queries: rollup/drill-down, ad hoc
- It's not just about fast queries: computation, storage, indexing, …
Related Work
- View materialization: selecting the views to materialize is NP-complete; even greedy algorithms are not practical for high-dimensional or hierarchical cubes
- Computing the cube: various techniques, which suffer from the curse of dimensionality
- Storing and indexing: ROLAP, MOLAP, …
- Compressed cubes: Condensed, Dwarf, Quotient
Our Contribution
- Extend the Dwarf architecture [SDRK02]
- Implemented two approaches:
  - Partial view-covering: breaks the problem into sub-problems and solves each one separately
  - Hierarchical: treats the problem as a whole
- Maximize the effects of compression, which matter for all aspects of cube management
- Address both partial and full materialization
- Extensive experimentation with real OLAP data
Dwarf Overview
- Complete system (100% accuracy): compute, store, index, query, update [SDRK02]
- Eliminates structural redundancies:
  - Prefix elimination: very high in dense areas
  - Suffix coalescing (!): orders of magnitude more important in sparse areas
- Partial materialization: minimum granularity, resembling iceberg cubes
- Optimizations: clustering
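To make prefix elimination and suffix coalescing concrete, here is a minimal Python sketch of a Dwarf-like DAG over flat dimensions with a SUM measure. It is not the construction algorithm of [SDRK02]: it recomputes ALL cells naively and approximates suffix coalescing by hash-consing, i.e., storing identical sub-dwarfs of the same dimension only once. The class name `DwarfSketch`, the `ALL` sentinel and the tuple layout are hypothetical.

```python
from collections import defaultdict

ALL = object()  # hypothetical sentinel standing for the ALL value of a dimension


class DwarfSketch:
    """Toy Dwarf-like DAG over d flat dimensions with a SUM measure.

    Prefix elimination: tuples sharing a prefix share the path to it.
    Suffix coalescing (approximated by hash-consing): identical sub-dwarfs
    of the same dimension are stored once and shared by every parent cell.
    """

    def __init__(self, tuples, ndims):
        self.ndims = ndims
        self.index = {}  # canonical node key -> node id
        self.table = []  # node id -> (cells: value -> ref, all_cell ref)
        self.root = self._build(tuples, 0)

    def _build(self, tuples, dim):
        if dim == self.ndims:
            return sum(m for _, m in tuples)  # below the last dimension: the aggregate
        groups = defaultdict(list)
        for keys, m in tuples:
            groups[keys[dim]].append((keys, m))
        cells = {v: self._build(g, dim + 1) for v, g in groups.items()}
        if len(cells) == 1:
            # A single child: the ALL cell aggregates exactly the same tuples,
            # so it points to that child instead of rebuilding it.
            all_cell = next(iter(cells.values()))
        else:
            all_cell = self._build(tuples, dim + 1)
        # The dimension is part of the key so nodes of different levels never merge.
        key = (dim, tuple(sorted(cells.items())), all_cell)
        if key not in self.index:  # otherwise reuse the identical existing sub-dwarf
            self.index[key] = len(self.table)
            self.table.append((cells, all_cell))
        return self.index[key]

    def query(self, coords):
        """Point query: one value (or ALL) per dimension; None if no tuple matches."""
        ref = self.root
        for c in coords:
            cells, all_cell = self.table[ref]
            ref = all_cell if c is ALL else cells.get(c)
            if ref is None:
                return None
        return ref
```

With the three fact tuples from the Hierarchical Dwarf example, `(("S1", "C2", "N1"), 10)`, `(("S2", "C3", "N2"), 30)` and `(("S3", "C1", "N1"), 60)`, the query `(ALL, ALL, "N1")` returns 70, and the cells under the store-level ALL node reuse the nodes already built for the individual stores, so only 9 nodes are stored.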
View-covering Partial Dwarfs
- Use a forest of Dwarfs that together encapsulates all views in the hierarchical cube:
  - the "base dwarf" contains the lowest hierarchical views
  - "partial dwarfs" cover the higher hierarchical views
- A partial dwarf does not store every possible combination of views, which avoids duplication across the final forest of partial dwarfs
- Fast view-covering enumeration process:
  - a single traversal over the view space
  - keeps track of just the last enumerated partial dwarf
Partial Dwarfs (example)
- Store: StoreId → Retailer → ALL
- Product: Code → Group → ALL
- Customer: Name → ALL
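The slides do not spell out which views each partial dwarf stores. The sketch below assumes one hypothetical assignment that meets the two stated goals, covering every view of the hierarchical cube exactly once: a partial dwarf is identified by one non-ALL level per dimension, a dimension at its lowest level may appear as that level or as ALL, and a dimension at a higher level appears only at that level. It enumerates all dwarfs directly and does not implement the single-traversal enumeration described above; `assign_views` is a hypothetical name.

```python
from itertools import product


def assign_views(levels):
    """Hypothetical view-to-dwarf assignment for a hierarchical cube.

    levels[i] is the number of hierarchy levels of dimension i, excluding ALL.
    Levels are numbered 0..levels[i]-1 (0 = lowest); the index levels[i] is ALL.
    Returns {dwarf level combination: list of views stored in that dwarf}.
    """
    forest = {}
    for dwarf in product(*(range(n) for n in levels)):
        # Lowest level: the dwarf stores the view at that level and at ALL.
        # Higher level: the dwarf stores the view only at that level, because
        # the ALL variant is already stored by the dwarf using the lowest level.
        choices = [(0, n) if lv == 0 else (lv,) for lv, n in zip(dwarf, levels)]
        forest[dwarf] = list(product(*choices))
    return forest


forest = assign_views([2, 2, 1])  # Store, Product, Customer from the example
for dwarf, views in forest.items():
    print(dwarf, len(views))
```

For the example hierarchies (Store and Product with two levels each, Customer with one), this yields 4 dwarfs: the base dwarf `(0, 0, 0)` stores the 8 flat views and the others store 4, 4 and 2, for 18 = 3 × 3 × 2 views with no duplicates.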
Hierarchical Dwarf
- Extends the Dwarf model by incorporating hierarchies inside the Dwarf DAG
- Even higher-level aggregates can be reached through a path from the root
- The nature of prefix redundancies changes: common prefixes between partial dwarfs are shared
- Most importantly, suffix redundancies are now exploited in a "global" way
Hierarchical Dwarf (example)
Fact table:
Store  Code  Name  Sales
S1     C2    N1    $10
S2     C3    N2    $30
S3     C1    N1    $60
Store hierarchy (Store → Retailer):
S1     R1
S2     R1
S3     R2
Code hierarchy (Code → Group):
C1     G2
C2     G1
C3     G2
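One way to picture hierarchies embedded inside the DAG, sketched here under an assumed node layout, is to give each dimension node cells for the values of every hierarchy level (S1, S2, S3 and R1, R2 for Store) plus the ALL cell, so that a higher-level aggregate such as (R1, G2, ALL) is reachable by a path from the root. Identical sub-dwarfs are shared by hash-consing, so the R2 cell and the S3 cell point to the same node, since R2 contains only S3. The class `HierDwarfSketch`, its `(level, value)` cell keys and the naive rebuilding of every cell are assumptions for illustration, not the method from the talk.

```python
from collections import defaultdict

ALL = object()  # hypothetical sentinel for the ALL value of a dimension


class HierDwarfSketch:
    def __init__(self, tuples, hierarchies):
        # hierarchies[i]: list of {base value -> ancestor} maps, one per level
        # above the base level of dimension i (empty list for a flat dimension).
        self.h = hierarchies
        self.index, self.table = {}, []
        self.root = self._build(tuples, 0)

    def _level_value(self, dim, level, base):
        return base if level == 0 else self.h[dim][level - 1][base]

    def _build(self, tuples, dim):
        if dim == len(self.h):
            return sum(m for _, m in tuples)  # SUM aggregate
        cells = {}
        for level in range(len(self.h[dim]) + 1):  # every level except ALL
            groups = defaultdict(list)
            for keys, m in tuples:
                groups[self._level_value(dim, level, keys[dim])].append((keys, m))
            for v, g in groups.items():
                cells[(level, v)] = self._build(g, dim + 1)
        all_cell = self._build(tuples, dim + 1)
        key = (dim, tuple(sorted(cells.items())), all_cell)
        if key not in self.index:  # share identical sub-dwarfs across all levels
            self.index[key] = len(self.table)
            self.table.append((cells, all_cell))
        return self.index[key]

    def query(self, coords):
        """coords: one (level, value) pair or ALL per dimension."""
        ref = self.root
        for c in coords:
            cells, all_cell = self.table[ref]
            ref = all_cell if c is ALL else cells.get(c)
            if ref is None:
                return None
        return ref


facts = [(("S1", "C2", "N1"), 10), (("S2", "C3", "N2"), 30), (("S3", "C1", "N1"), 60)]
hierarchies = [[{"S1": "R1", "S2": "R1", "S3": "R2"}],
               [{"C1": "G2", "C2": "G1", "C3": "G2"}],
               []]
d = HierDwarfSketch(facts, hierarchies)
print(d.query(((1, "R1"), (1, "G2"), ALL)))  # 30
```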
Non-linear Hierarchies
Experiments
- Real-world data
- 8 dimensions with cardinalities (7458, 2765, 3857, 3247, 213, 660, 4, 4)
- 4 hierarchies (1×6 levels, 2×4 levels, 1×3 levels)
- 256 views (flat cube) vs. 11,200 views (hierarchical cube)
- Comparison with:
  - the base Dwarf, i.e., all hierarchical queries are mapped to the raw data and then further aggregated
  - the full uncompressed cube (BSF)
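The reported view counts can be checked with the product formula from the Motivation slide. The sketch assumes that "1x6, 2x4, 1x3" means one dimension with a 6-level hierarchy, two with 4 levels and one with 3 (levels counted without ALL), and that the remaining four dimensions are flat; under that reading both counts match the slide exactly. `num_views` is a hypothetical helper name.

```python
from math import prod


def num_views(levels):
    """Number of views of a cube whose dimension i has levels[i] hierarchy
    levels (excluding ALL); ALL adds one more grouping choice per dimension."""
    return prod(n + 1 for n in levels)


flat = [1] * 8                  # the 8 dimensions without hierarchies
hier = [6, 4, 4, 3] + [1] * 4   # assumed reading of "1x6, 2x4, 1x3"
print(num_views(flat), num_views(hier))  # 256 11200
```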
Computation
Storage
Full Cube Statistics
Query Evaluation
- Simulated queries:
  - point/range
  - children
- Effect of rollup/drill-down
Queries – Gmin=1
Queries – Gmin=1000
Conclusions
- Presented two extensions to the Dwarf architecture:
  - decompose the problem into simpler sub-problems
  - embed hierarchies in the structure
- Suffix redundancies are more apparent in hierarchical cubes
- Compression ratio of more than 70 times
- Query response performance improves by ~10 times
- Sparsity exploitation: a minimum granularity of 1,000 reduces computation time (by about 3 times) and increases performance
Questions?