Quotient Cube: How to Summarize the Semantics of a Data Cube Laks V.S. Lakshmanan (Univ. of British Columbia) * Jian Pei (State Univ. of New York at Buffalo)

1 Quotient Cube: How to Summarize the Semantics of a Data Cube. Laks V.S. Lakshmanan (Univ. of British Columbia)*, Jian Pei (State Univ. of New York at Buffalo)*, Jiawei Han (Univ. of Illinois at Urbana-Champaign)+. * The work is partially supported by NSERC and NCE/IRIS. + The work is partially supported by NSF, UI, and Microsoft Research.

2 Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube. Outline: Introduction and motivation; Cube lattice partitions; Semantics preserving partitions; Algorithms; Experimental results; Discussion and summary

3 Data Cube
Base table (dimensions: Store, Product, Season; measure: Sales):
Store  Product  Season  Sales
S1     P1       Spring  6
S1     P2       Spring  12
S2     P1       Fall    9
Aggregation yields the cube (dimensions: Store, Product, Season; measure: AVG(Sales)):
Store  Product  Season  AVG(Sales)
S1     P1       Spring  6
S1     P2       Spring  12
S2     P1       Fall    9
S1     *        Spring  9
…      …        …       …
*      *        *       9
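
The aggregation step on this slide can be sketched in a few lines of Python. This is only an illustrative brute-force enumeration, not the cube algorithms cited later in the talk; the names `BASE`, `generalizations`, and `compute_cube` are made up for this example, and the seasons are abbreviated to `s` and `f` as in the lattice slides.

```python
from itertools import product

# Base table from the slide: (Store, Product, Season) -> Sales.
# Seasons are abbreviated (s = Spring, f = Fall) as on the lattice slides.
BASE = [
    (("S1", "P1", "s"), 6),
    (("S1", "P2", "s"), 12),
    (("S2", "P1", "f"), 9),
]

def generalizations(dims):
    """Yield every cell obtained by replacing any subset of values with '*'."""
    for mask in product((False, True), repeat=len(dims)):
        yield tuple("*" if starred else v for v, starred in zip(dims, mask))

def compute_cube(base):
    """Map each non-empty cell to AVG(Sales) over the base tuples it covers."""
    groups = {}
    for dims, sales in base:
        for cell in generalizations(dims):
            groups.setdefault(cell, []).append(sales)
    return {cell: sum(values) / len(values) for cell, values in groups.items()}

cube = compute_cube(BASE)
print(len(cube))               # 18 non-empty cells
print(cube[("*", "P1", "*")])  # 7.5
print(cube[("*", "*", "*")])   # 9.0
```

The 18 cells are exactly the nodes of the cube lattice drawn on the following slides.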

4 Previous Work: Efficient Cube Computation. Compute a cube from a base table: e.g. (Agarwal et al. 96), (Zhao et al. 97). View materialization with space constraint: e.g. (Harinarayan et al. 96). Handling sparsity: (Ross & Srivastava 97). Cube compression: e.g. (Sismanis et al. 02), (Shanmugasundaram et al. 99), (Wang et al. 02). Approximation: e.g. (Barbara & Sullivan 97), (Barbara & Wu 00), (Vitter et al. 98). Constrained cube construction: e.g. (Beyer & Ramakrishnan 99)

5 Previous Work: Extracting Semantics From Cubes. General contexts of patterns (Sathe & Sarawagi 01); Generalizing association rules (Imielinski et al. 00); Cube gradient analysis (Dong et al. 01)

6 Cube (Cell) Lattice. Many cells have the same aggregate values. Can we summarize the semantics of the cube by grouping cells by aggregate values?
Lattice (one level per line, base cells first; s = Spring, f = Fall):
(S1,P1,s):6  (S1,P2,s):12  (S2,P1,f):9
(S1,*,s):9  (S1,P1,*):6  (*,P1,s):6  (S1,P2,*):12  (*,P2,s):12  (S2,*,f):9  (S2,P1,*):9  (*,P1,f):9
(S1,*,*):9  (*,*,s):9  (*,P1,*):7.5  (*,P2,*):12  (*,*,f):9  (S2,*,*):9
(*,*,*):9

7 A Naïve Attempt. Put all cells having the same aggregate value in a class. [Figure: the cube lattice from slide 6, partitioned into classes C1, C2, C3 and C4.]

8 Problems w/ the Naïve Attempt. The result is not a lattice anymore! –Anomaly –The rollup/drilldown semantics is lost. [Figure: the cube lattice from slide 6 with the naïve classes C1, C2, C3 and C4.]

9 A Better Partitioning. Quotient cube: a partitioning preserving the rollup/drilldown semantics. [Figure: the cube lattice from slide 6, partitioned into classes C1, C2, C3, C4 and C5.]

10 Problem Statement. Given a cube, characterize a good way (quotient cube) of partitioning its cells into classes such that –The partition generates a reduced lattice preserving the rollup/drilldown semantics –The partition is optimal: # classes as small as possible. Compute quotient cubes efficiently.

11 Why Is a Quotient Cube Useful? Semantic compression; Semantic OLAP browsing. [Figure: the cube lattice from slide 6, partitioned into classes C1, C2, C3, C4 and C5.]

12 Why Is a Quotient Cube Useful? Semantic compression; Semantic OLAP browsing. [Figure: the partitioned lattice from slide 11 with classes C1, C2, C4 and C5 collapsed and the remaining class expanded into its cells, level by level: (S2,P1,f):9; (S2,*,f):9, (S2,P1,*):9, (*,P1,f):9; (*,*,f):9, (S2,*,*):9.]

13 Outline: Introduction and motivation; Cube lattice partitions; Semantics preserving partitions; Algorithms; Experimental results; Discussion and summary

14 Convex Partitions. A convex partition retains semantics. [Figure: the cube lattice from slide 6, partitioned into classes C1, C2, C3, C4 and C5.]

15 A Non-convex Partition. Anomaly: the rollup/drilldown semantics is lost. [Figure: the cube lattice from slide 6 with the naïve classes C1, C2, C3 and C4.]
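
A small check makes the anomaly concrete. The sketch below assumes the usual reading of convexity (if two cells of a class are comparable, every cell between them is in the class too) and applies it to the naïve partition by AVG value. The helper names `rolls_up_to`, `avg_cube`, and `is_convex` are illustrative and not taken from the talk.

```python
from itertools import product

# Base table from the slides; seasons abbreviated as in the lattice (s, f).
BASE = {("S1", "P1", "s"): 6, ("S1", "P2", "s"): 12, ("S2", "P1", "f"): 9}

def rolls_up_to(c, d):
    """True if d equals c or generalizes it (d has '*' wherever it differs)."""
    return all(y == "*" or x == y for x, y in zip(c, d))

def avg_cube(base):
    """AVG(Sales) for every non-empty cell of the cube."""
    cells = {tuple("*" if m else v for v, m in zip(t, mask))
             for t in base for mask in product((False, True), repeat=3)}
    cube = {}
    for cell in cells:
        sales = [v for t, v in base.items() if rolls_up_to(t, cell)]
        cube[cell] = sum(sales) / len(sales)
    return cube

def is_convex(members, cube):
    """No 'hole': every cell between two comparable members is also a member."""
    for c in members:
        for d in members:
            if rolls_up_to(c, d):
                for e in cube:
                    if rolls_up_to(c, e) and rolls_up_to(e, d) and e not in members:
                        return False
    return True

CUBE = avg_cube(BASE)

# Naive partition: one class per distinct aggregate value.
naive = {}
for cell, value in CUBE.items():
    naive.setdefault(value, set()).add(cell)

for value, members in sorted(naive.items()):
    print(value, len(members), is_convex(members, CUBE))
```

The class of value 9 fails the check: (S2,P1,f):9 rolls up to (*,P1,*):7.5, which rolls up to (*,*,*):9, so the class has a hole at (*,P1,*).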

16 Connected Partitions. Cells c1 and c2 are connected if a series of rollup/drilldown operations starting from c1 can reach c2. Intuitively, (each class of) a partition should be connected.
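
Connectivity of a class can be tested with a plain graph search. The sketch below assumes that the rollup/drilldown path must stay inside the class and that one step may generalize or specialize any number of dimensions; both are assumptions of this example, and `is_connected` is a hypothetical helper name.

```python
def rolls_up_to(c, d):
    """True if d equals c or generalizes it (d has '*' wherever it differs)."""
    return all(y == "*" or x == y for x, y in zip(c, d))

def is_connected(members):
    """Depth-first search over cells linked by a rollup or a drilldown."""
    members = list(members)
    if not members:
        return True
    seen = {members[0]}
    stack = [members[0]]
    while stack:
        c = stack.pop()
        for d in members:
            if d not in seen and (rolls_up_to(c, d) or rolls_up_to(d, c)):
                seen.add(d)
                stack.append(d)
    return len(seen) == len(members)

# Two base cells alone are not comparable, so the set is disconnected.
split = {("S1", "P1", "s"), ("S2", "P1", "f")}
# Adding a common generalization links them.
joined = split | {("*", "P1", "*")}

print(is_connected(split))   # False
print(is_connected(joined))  # True
```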

17 Cover Partition. For a cell c, a tuple t in the base table is in c’s cover if t can be rolled up to c –E.g., Cov(S1,*,spring) = {(S1,P1,spring), (S1,P2,spring)}. Base table (Store, Product, Season, Sales): (S1, P1, Spring, 6), (S1, P2, Spring, 12), (S2, P1, Fall, 9)
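
The cover of a cell follows directly from this definition. A minimal sketch, with `rolls_up_to` and `cover` as illustrative names:

```python
# Base table from the slide: (Store, Product, Season) -> Sales
BASE = {
    ("S1", "P1", "spring"): 6,
    ("S1", "P2", "spring"): 12,
    ("S2", "P1", "fall"): 9,
}

def rolls_up_to(t, cell):
    """True if tuple t can be rolled up to cell ('*' matches any value)."""
    return all(c == "*" or c == v for v, c in zip(t, cell))

def cover(cell, base=BASE):
    """Set of base tuples that roll up to the given cell."""
    return {t for t in base if rolls_up_to(t, cell)}

print(sorted(cover(("S1", "*", "spring"))))
# [('S1', 'P1', 'spring'), ('S1', 'P2', 'spring')]
```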

18 Cover Partitions Are Convex. All cells having the same cover are in a class. (S1,P2,s) and (*,P2,*) cover the same tuples in the base table ⇒ (S1,P2,*) and (*,P2,s) are in the same class. [Figure: the cube lattice from slide 6.]

19 Cover Partitions Are Connected. Cells c1 and c2 have the same cover ⇒ there must be some common ancestor c3 of c1 and c2 such that c3 has the same cover –Cells c1 and c2 are in the same class and connected. [Figure: the cube lattice from slide 6, with its cells grouped by cover.]

20 Cover Partitions & Aggregates. All cells in a class of a cover partition carry the same aggregate value w.r.t. any aggregate function –But cells in a class of MIN() may have different covers. For COUNT() and SUM() (positive), cover equivalence coincides with aggregate equivalence.
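
Both claims about covers and aggregate values can be checked on the running example. The sketch below builds the cover partition by brute force and tests a handful of aggregate functions; the helper names are illustrative, and the tuple of functions tried is just a sample, not a proof for every aggregate.

```python
from itertools import product

BASE = {("S1", "P1", "s"): 6, ("S1", "P2", "s"): 12, ("S2", "P1", "f"): 9}

def rolls_up_to(c, d):
    """True if d equals c or generalizes it (d has '*' wherever it differs)."""
    return all(y == "*" or x == y for x, y in zip(c, d))

def cover(cell):
    """Base tuples that roll up to the cell, as a hashable set."""
    return frozenset(t for t in BASE if rolls_up_to(t, cell))

# Every non-empty cell: each base tuple with any subset of values starred.
CELLS = {tuple("*" if m else v for v, m in zip(t, mask))
         for t in BASE for mask in product((False, True), repeat=3)}

# Cover partition: cells with identical covers share a class.
cover_classes = {}
for cell in CELLS:
    cover_classes.setdefault(cover(cell), set()).add(cell)

def aggregate(cell, f):
    """Apply aggregate f to the Sales values of the tuples the cell covers."""
    return f([BASE[t] for t in cover(cell)])

def constant_on_classes(f):
    """True if every cover class carries a single value of f."""
    return all(len({aggregate(c, f) for c in members}) == 1
               for members in cover_classes.values())

def avg(xs):
    return sum(xs) / len(xs)

print(len(cover_classes))  # 6
print(all(constant_on_classes(f) for f in (min, max, sum, len, avg)))  # True

# Same MIN() value, different covers:
a, b = ("S1", "P1", "s"), ("*", "*", "*")
print(aggregate(a, min) == aggregate(b, min), cover(a) == cover(b))  # True False
```

On this base table the cover partition has 6 classes, one more than the 5 classes drawn for the AVG quotient cube.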

21 Outline: Introduction and motivation; Cube lattice partitions; Semantics preserving partitions; Algorithms; Experimental results; Discussion and summary

22 Weak Congruence. Weak congruence preserves semantics. [Figure: two diagrams of cells c, c’ and d, d’ joined by rollup edges between Class 1 and Class 2; together they imply Class 1 = Class 2.]

23 Weak Congruence = Convex. Convex ⇔ no “hole” in the class ⇔ weak congruence. They preserve the rollup/drilldown semantics. Quotient cube lattice is the lattice of convex classes. How to derive the coarsest quotient cube?

24 Monotone Aggregate Functions. Monotone functions: –S ⊆ T ⇒ f(S) ≤ f(T), or –S ⊆ T ⇒ f(S) ≥ f(T) –MIN(), MAX(), COUNT(), PSUM(), … The aggregate function f is monotone ⇒ ≡_f (the partition by the value of f) is the unique coarsest partition –MIN(): put all cells having the same MIN() value into a class
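
For the running example, the MIN() case can be checked directly. The sketch below groups cells by their MIN(Sales) value alone and then tests each class for convexity, using the same "no hole" reading of convexity as an assumption; helper names are illustrative.

```python
from itertools import product

BASE = {("S1", "P1", "s"): 6, ("S1", "P2", "s"): 12, ("S2", "P1", "f"): 9}

def rolls_up_to(c, d):
    """True if d equals c or generalizes it (d has '*' wherever it differs)."""
    return all(y == "*" or x == y for x, y in zip(c, d))

CELLS = {tuple("*" if m else v for v, m in zip(t, mask))
         for t in BASE for mask in product((False, True), repeat=3)}

def min_sales(cell):
    """MIN(Sales) over the base tuples covered by the cell."""
    return min(v for t, v in BASE.items() if rolls_up_to(t, cell))

def is_convex(members):
    """Every cell lying between two comparable members is also a member."""
    return not any(
        rolls_up_to(c, e) and rolls_up_to(e, d)
        for c in members for d in members if rolls_up_to(c, d)
        for e in CELLS - members
    )

# MIN() can only decrease when a cell is rolled up (its cover grows).
monotone = all(min_sales(d) <= min_sales(c)
               for c in CELLS for d in CELLS if rolls_up_to(c, d))

# Partition by MIN() value alone.
classes = {}
for cell in CELLS:
    classes.setdefault(min_sales(cell), set()).add(cell)

print(monotone)                                    # True
print({v: len(m) for v, m in classes.items()})     # sizes 8, 4 and 6
print(all(is_convex(m) for m in classes.values())) # True
```

Here three classes suffice for MIN(), compared with six classes in the cover partition of the same base table.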

25 Non-monotone Functions. Bad news: ≡_f may or may not be convex (a weak congruence). Good news: the cover partition is convex (i.e., a weak congruence) and always yields a quotient cube w.r.t. any aggregate function!

26 Outline: Introduction and motivation; Cube lattice partitions; Semantics preserving partitions; Algorithms; Experimental results; Discussion and summary

27 How to Compute a QC. Aggregate functions: –Monotone functions –Non-monotone functions. Settings: –The cube is available –Only the base table is available

28 Monotone Functions. The cube is available ⇒ grab all cells with the same aggregate value and put them into a class. Only the base table is available ⇒ bottom-up, depth-first search: –For a cell, compute its cover, find the upper bound having the same aggregate value –Group lower bounds by upper bounds
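
The "upper bound" step can be illustrated for the cover-based case. The sketch below is a simplification under stated assumptions: it takes the upper bound of a cell to be the most specific cell with the same cover (a dimension keeps its value when all covered tuples agree on it, otherwise it becomes `*`), and it enumerates every cell instead of performing the bottom-up, depth-first search named on the slide. `upper_bound` is a hypothetical helper name.

```python
from itertools import product

BASE = {("S1", "P1", "s"): 6, ("S1", "P2", "s"): 12, ("S2", "P1", "f"): 9}

def rolls_up_to(c, d):
    """True if d equals c or generalizes it (d has '*' wherever it differs)."""
    return all(y == "*" or x == y for x, y in zip(c, d))

def upper_bound(cell):
    """Most specific cell whose cover equals the cover of `cell`.

    Assumes `cell` is non-empty, i.e. covers at least one base tuple.
    """
    cov = [t for t in BASE if rolls_up_to(t, cell)]
    # Keep a dimension's value only if every covered tuple agrees on it.
    return tuple(vals[0] if len(set(vals)) == 1 else "*" for vals in zip(*cov))

CELLS = {tuple("*" if m else v for v, m in zip(t, mask))
         for t in BASE for mask in product((False, True), repeat=3)}

# Group cells (lower bounds) by their upper bound.
by_upper = {}
for cell in CELLS:
    by_upper.setdefault(upper_bound(cell), set()).add(cell)

print(upper_bound(("*", "P2", "*")))  # ('S1', 'P2', 's')
print(upper_bound(("*", "*", "s")))   # ('S1', '*', 's')
print(len(by_upper))                  # 6
```

Grouping by upper bound reproduces the six cover classes of the running example, since two cells share an upper bound exactly when they share a cover.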

29 Example: Cover QC. [Figure: the cube lattice from slide 6, partitioned into cover classes.]

30 Non-monotone Functions: Class merging. Find cover partition classes. Merge classes as long as convexity is retained.
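
A greedy version of class merging can be sketched for AVG on the running example. The merge test used here is an assumption of this sketch rather than the authors' exact procedure: two classes are merged only if they have the same AVG value, some cell of one rolls up to a cell of the other, and their union has no hole. All helper names are illustrative.

```python
from itertools import combinations, product

BASE = {("S1", "P1", "s"): 6, ("S1", "P2", "s"): 12, ("S2", "P1", "f"): 9}

def rolls_up_to(c, d):
    """True if d equals c or generalizes it (d has '*' wherever it differs)."""
    return all(y == "*" or x == y for x, y in zip(c, d))

def cover(cell):
    return frozenset(t for t in BASE if rolls_up_to(t, cell))

CELLS = {tuple("*" if m else v for v, m in zip(t, mask))
         for t in BASE for mask in product((False, True), repeat=3)}
AVG = {c: sum(BASE[t] for t in cover(c)) / len(cover(c)) for c in CELLS}

def is_convex(members):
    """Every cell lying between two comparable members is also a member."""
    return not any(
        rolls_up_to(c, e) and rolls_up_to(e, d)
        for c in members for d in members if rolls_up_to(c, d)
        for e in CELLS - members
    )

def adjacent(a, b):
    """Some cell of one class rolls up to a cell of the other."""
    return any(rolls_up_to(x, y) or rolls_up_to(y, x) for x in a for y in b)

def value(members):
    """AVG value shared by all cells of a class."""
    return AVG[next(iter(members))]

def merge_classes(classes):
    """Greedily merge same-valued, adjacent classes while the union stays convex."""
    classes = [set(c) for c in classes]
    changed = True
    while changed:
        changed = False
        for i, j in combinations(range(len(classes)), 2):
            a, b = classes[i], classes[j]
            if value(a) == value(b) and adjacent(a, b) and is_convex(a | b):
                classes[i] = a | b
                del classes[j]
                changed = True
                break  # restart the scan on the shrunken list
    return classes

# Step 1: cover partition classes.
cover_classes = {}
for cell in CELLS:
    cover_classes.setdefault(cover(cell), set()).add(cell)

# Step 2: merge while convexity is retained.
merged = merge_classes(cover_classes.values())
print(len(cover_classes), len(merged))  # 6 5
```

On this data the only merge that passes the test joins the class of (S1,*,s) with the class of (*,*,*). The class of (S2,P1,f) also has AVG 9 but cannot join them, because (*,P1,*):7.5 would lie between its cells and (*,*,*).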

31 Example: AVG QC. [Figure: the cube lattice from slide 6, partitioned into the classes of the AVG quotient cube.]

32 Outline: Introduction and motivation; Cube lattice partitions; Semantics preserving partitions; Algorithms; Experimental results; Discussion and summary

33 Reduction Ratio vs. Dimensionality (# base tuples = 200k, Zipf factor = 2.0)

34 Reduction Ratio vs. Zipf Factor (# base tuples = 200k, # dimensions = 6)

35 Reduction Ratio vs. Base Table Size (Zipf factor = 2.0, # dimensions = 6)

36 Runtime (Zipf factor = 2.0, # dimensions = 6)

37 Compression Ratio on Weather Data Set

38 Outline: Introduction and motivation; Cube lattice partitions; Semantics preserving partitions; Algorithms; Experimental results; Discussion and summary

39 Semantic Cube Exploration. Theoretical foundation for semantic summarization in data cubes –Concept and properties of quotient cubes. Efficient algorithms for quotient cube construction –Quotient cubes can be computed directly from base tables

40 Ongoing Research. Efficient implementation of a quotient cube-based OLAP system –Data warehouse built using quotient cubes. Hierarchies and constraints. Incremental maintenance. Semantics-based OLAP and mining. Efficient query answering.

41 References (1)
R. Agrawal and R. Srikant. Fast Algorithms for Mining Association Rules in Large Databases. VLDB, 1994.
S. Agarwal, R. Agrawal, P.M. Deshpande, A. Gupta, J.F. Naughton, R. Ramakrishnan, and S. Sarawagi. On the computation of multidimensional aggregates. VLDB, 1996.
D. Barbara and M. Sullivan. Quasi-cubes: Exploiting approximation in multidimensional databases. SIGMOD Record, 26:12-17, 1997.
D. Barbara and X. Wu. Using loglinear models to compress datacube. In WAIM'2000, pages 311-322, 2000.
K. Beyer and R. Ramakrishnan. Bottom-up computation of sparse and iceberg cubes. In SIGMOD'99.

42 References (2)
G. Birkhoff. Lattice Theory, 2nd edition. New York, American Mathematical Society (Colloquium Publications, vol. 25), 1948.
S. Geffner, D. Agrawal, A. El Abbadi, and T. R. Smith. Relative prefix sums: An efficient approach for querying dynamic OLAP data cubes. In ICDE'99.
J. Gray, A. Bosworth, A. Layman, and H. Pirahesh. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total. In ICDE'96.
C.-T. Ho, J. Bruck, and R. Agrawal. Partial-sum queries in data cubes using covering codes. In PODS'97.
J. Han, J. Pei, G. Dong, and K. Wang. Efficient Computation of Iceberg Cubes with Complex Measures. In SIGMOD'01.

43 References (3)
V. Harinarayan, A. Rajaraman, and J. D. Ullman. Implementing data cubes efficiently. In SIGMOD'96.
T. Imielinski, L. Khachiyan, and A. Abdulghani. Cubegrades: Generalizing Association Rules. Technical Report, Rutgers University, August 2000.
H. V. Jagadish, J. Madar, and R.T. Ng. Semantic Compression and Pattern Extraction with Fascicles. In VLDB'99.
K. Ross and D. Srivastava. Fast computation of sparse datacubes. In VLDB'97.
G. Sathe and S. Sarawagi. Intelligent Rollups in Multidimensional OLAP Data. In VLDB'01.

44 References (4)
J. Shanmugasundaram, U.M. Fayyad, and P. S. Bradley. Compressed Data Cubes for OLAP Aggregate Query Approximation on Continuous Dimensions. In SIGKDD'99.
J. S. Vitter, M. Wang, and B. R. Iyer. Data cube approximation and histograms via wavelets. In CIKM'98.
W. Wang, H. Lu, J. Feng, and J. X. Yu. Condensed cube: An effective approach to reducing data cube size. In ICDE'02.
Y. Zhao, P. M. Deshpande, and J. F. Naughton. An array-based algorithm for simultaneous multidimensional aggregates. In SIGMOD'97.
G.K. Zipf. Human Behavior and The Principle of Least Effort. Addison-Wesley, 1949.

