Aggregation in Main Memory
Kenneth A. Ross, Columbia University
DGRC FedStats Visit, March 30 2001

1 Aggregation in Main Memory
Kenneth A. Ross, Columbia University

2 Research Experience
- Complex query processing
- Data warehousing
- Main-memory databases
Students: Kazi Zaman, Junyan Ding

3 Scenario A
[Diagram. Components: User, Mediator, Main-Memory DBMS, Traditional DBMSs. Labeled flows: Query, Unified Results.]

4 Scenario B
[Diagram. Components: User, Web, Mediator, Main Memory DB, Traditional DBMSs. Labeled flows: Data Request, Unified Results, Sequence of Interactive Queries, Queries.]

5 Scenario C
[Diagram. Components: User, Graphical User Interface, Web, Mediator, Main Memory DB, Traditional DBMSs. Labeled flows: Data Request, Unified Results, Dynamic Query.]

6 Outline
- Introduction to datacubes
- Frameworks for querying cubes
- The main-memory framework
- Experimental results
- Conclusions and plan

7 The CUBE BY Operator
Base relation:
State  Year  Grade    Sales
CA     1997  Regular  90
NY     1997  Premium  70
CA     1998  Premium  65
NY     1998  Premium  95
Result of CUBE BY (sum Sales), selected rows:
State  Year  Grade    Sales
CA     1997  Regular  90
CA     1997  ALL      90
ALL    1997  Regular  90
CA     ALL   Regular  90
ALL    1997  ALL      160
ALL    ALL   Regular  90
CA     ALL   ALL      155
ALL    ALL   ALL      320
...    (additional records)
CUBE BY greatly increases the total size, especially with many dimensions: d dimensions give 2^d groupings.
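A minimal Python sketch of what CUBE BY computes for the four-row example above: for every subset of the grouping dimensions it sums Sales and writes ALL into the dimensions left out. The function name cube_by_sum and the dictionary representation are illustrative assumptions, not how any particular DBMS implements the operator.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # placeholder value for a rolled-up dimension

# The four-row example relation from the slide: (State, Year, Grade, Sales)
rows = [
    ("CA", "1997", "Regular", 90),
    ("NY", "1997", "Premium", 70),
    ("CA", "1998", "Premium", 65),
    ("NY", "1998", "Premium", 95),
]
DIMS = ("State", "Year", "Grade")

def cube_by_sum(rows, ndims):
    """Return {(v1, ..., vn): sum} over every subset of grouping dimensions.

    Dimensions outside the subset are replaced by ALL, as CUBE BY does.
    """
    result = defaultdict(int)
    for k in range(ndims + 1):
        for kept in combinations(range(ndims), k):
            for row in rows:
                key = tuple(row[i] if i in kept else ALL for i in range(ndims))
                result[key] += row[ndims]  # the measure (Sales) is the last column
    return dict(result)

cube = cube_by_sum(rows, len(DIMS))
print(len(rows), "base rows ->", len(cube), "cube rows")  # 4 base rows -> 21 cube rows
print(cube[(ALL, "1997", ALL)], cube[("CA", ALL, ALL)], cube[(ALL, ALL, ALL)])  # 160 155 320
```

With 3 dimensions there are 2^3 = 8 groupings, so even this tiny relation grows from 4 rows to 21.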

8 Lattice Representation
[Diagram: lattice of cuboids. Top: (State, Year, Grade); middle: (State, Year), (State, Grade), (Year, Grade); bottom: (State), (Year), (Grade).]
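The lattice can be generated mechanically: each node is a set of grouping dimensions, and each child drops one dimension, so it can be computed by aggregating its parent. This Python sketch (the function name lattice is a hypothetical helper) also includes the empty grouping, i.e. the grand-total row (ALL, ALL, ALL), which the diagram omits.

```python
from itertools import combinations

DIMS = ("State", "Year", "Grade")

def lattice(dims):
    """Map each cuboid (a frozenset of grouping dimensions) to its children.

    A child drops exactly one dimension; the empty set is the grand total.
    """
    nodes = [frozenset(c) for k in range(len(dims), -1, -1)
             for c in combinations(dims, k)]
    return {n: [n - {d} for d in sorted(n)] for n in nodes}

lat = lattice(DIMS)
for node, children in lat.items():
    # Print the empty grouping as ["ALL"] for readability.
    print(sorted(node) or ["ALL"], "->", [sorted(c) or ["ALL"] for c in children])
```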

9 Modeling Queries
Slice queries ask for a single aggregate record, for example:
    SELECT State, Year, SUM(Sales)
    FROM "BLS-12345"
    GROUP BY State, Year
    HAVING State = 'NY' AND Year = '1998'
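To check that the slice query returns exactly one aggregate record, this sketch loads the four illustrative rows from the CUBE BY slide (not real BLS data) into an in-memory SQLite table that borrows the slide's table name, then runs the query through Python's standard sqlite3 module.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# The hyphenated table name must be quoted as an identifier.
con.execute('CREATE TABLE "BLS-12345" (State TEXT, Year TEXT, Grade TEXT, Sales INTEGER)')
con.executemany('INSERT INTO "BLS-12345" VALUES (?, ?, ?, ?)', [
    ("CA", "1997", "Regular", 90),
    ("NY", "1997", "Premium", 70),
    ("CA", "1998", "Premium", 65),
    ("NY", "1998", "Premium", 95),
])

# One group of the (State, Year) cuboid, i.e. a single aggregate record.
slice_rows = con.execute(
    """SELECT State, Year, SUM(Sales) FROM "BLS-12345"
       GROUP BY State, Year
       HAVING State = 'NY' AND Year = '1998'"""
).fetchall()
print(slice_rows)  # [('NY', '1998', 95)]
```

Because both predicates are on grouping columns, putting them in a WHERE clause instead of HAVING returns the same record.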

10 Existing Frameworks
[Diagram: the lattice of cuboids, from (State, Year, Grade) down to (State), (Year), (Grade).]
- Choose a subset of the cube to materialize, based on the workload.
- Materialize it on disk.
- For each incoming slice query, retrieve the appropriate record, or compute it from a materialized cuboid.
Drawbacks:
- Ignores the clustering of the relation on disk.
- The smallest unit of materialization (a whole cuboid) is too big.
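One common way to "compute" a record that is not materialized is to aggregate it from the smallest materialized cuboid that groups on a superset of the query's dimensions. The Python sketch below assumes that rule; the cuboid sizes in MATERIALIZED and the helper name best_source are made up for illustration.

```python
# Hypothetical materialized cuboids: grouping dimensions -> row count.
MATERIALIZED = {
    frozenset({"State", "Year", "Grade"}): 4000,
    frozenset({"State", "Year"}): 600,
    frozenset({"Grade"}): 5,
}

def best_source(query_dims, materialized):
    """Pick the smallest materialized cuboid that can answer a query.

    A cuboid can answer a query grouped on query_dims if it groups on a
    superset of them (the extra dimensions are summed away).
    """
    candidates = [c for c in materialized if query_dims <= c]
    if not candidates:
        return None  # nothing suitable: fall back to the base relation
    return min(candidates, key=materialized.get)

print(sorted(best_source(frozenset({"Year"}), MATERIALIZED)))  # ['State', 'Year']
```

Even for a one-record slice query, the whole chosen cuboid is the unit that was materialized, which is the second drawback listed above.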

11 Our Approach
[Diagram: the lattice of cuboids, with (State, Year, Grade) as the finest granularity.]
- The full cube is often larger than available memory, but the finest-granularity aggregate may fit.
- If it fits, any record of the cube can be computed without going to disk.
- Question: how should the finest granularity be organized?
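A minimal Python sketch of why holding only the finest-granularity cuboid in memory is enough: any coarser record, including ones with ALLs, is the sum of the finest records that match it. The helper name answer is hypothetical, and a full scan like this is the naive baseline that the level-1 and level-2 stores are meant to beat.

```python
ALL = "ALL"

# Finest-granularity cuboid held in memory: (State, Year, Grade) -> sum(Sales).
finest = {
    ("CA", "1997", "Regular"): 90,
    ("NY", "1997", "Premium"): 70,
    ("CA", "1998", "Premium"): 65,
    ("NY", "1998", "Premium"): 95,
}

def answer(query, finest):
    """Sum every finest-level record that matches query; ALL matches any value."""
    total = 0
    for key, value in finest.items():
        if all(q == ALL or q == k for q, k in zip(query, key)):
            total += value
    return total

print(answer(("CA", ALL, ALL), finest))       # 155
print(answer((ALL, ALL, "Premium"), finest))  # 230
```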

12 Framework
[Diagram: query q goes to the level-1 store (selected coarse records in a hash table) and the level-2 store (a slot directory whose slots point to linked lists of records from the finest-granularity cuboid).]

13 The Level-1 Store
- Records are <key, value> pairs stored in a hash table.
- Records can contain ALLs.
- Given query Q, form the composite key and check the level-1 store (constant time).
- If the key is not found, use the level-2 store.
Example:
Key  Value
a1   55
b2   34
c2   12
...
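The lookup order can be sketched in Python with a dict as the hash table. The contents of level1 are a hypothetical selection of coarse records computed from the CUBE BY example, and scan_finest is a stand-in for the level-2 store (a plain scan), so only the level-1 logic here mirrors the slide.

```python
ALL = "ALL"

# Stand-in for the finest-granularity data behind the level-2 store.
FINEST = {
    ("CA", "1997", "Regular"): 90,
    ("NY", "1997", "Premium"): 70,
    ("CA", "1998", "Premium"): 65,
    ("NY", "1998", "Premium"): 95,
}

# Hypothetical level-1 store: selected coarse records, keys may contain ALL.
level1 = {
    ("CA", ALL, ALL): 155,
    (ALL, "1997", ALL): 160,
    (ALL, ALL, ALL): 320,
}

def scan_finest(query):
    """Placeholder for the level-2 store: aggregate the matching finest records."""
    return sum(v for k, v in FINEST.items()
               if all(q == ALL or q == x for q, x in zip(query, k)))

def lookup(query, level1, level2_answer):
    """Check the level-1 hash table first; fall back to the level-2 store."""
    key = tuple(query)  # composite key, e.g. ("CA", "ALL", "ALL")
    if key in level1:   # expected O(1) hash lookup
        return level1[key], "level-1"
    return level2_answer(query), "level-2"

print(lookup(("CA", ALL, ALL), level1, scan_finest))  # (155, 'level-1')
print(lookup(("NY", ALL, ALL), level1, scan_finest))  # (165, 'level-2')
```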

14 The Level-2 Store
[Diagram: slot directory whose slots point to linked lists of records from the finest-granularity cuboid.]
- The slot directory is organized as a multidimensional array: level2[sz1][sz2][sz3][sz4].
- Each slot points to a linked list of records.
- Records are placed according to a set of mapping functions H.
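A Python sketch of building such a slot directory for four dimensions. Several choices are assumptions of this sketch rather than the talk's design: the slot counts in SIZES, using CRC32 modulo the slot count as each mapping function h_i, flattening the 4-D array into one list with row-major addressing, and using Python lists in place of linked lists. The records are hypothetical.

```python
import math
import zlib

SIZES = (5, 5, 8, 2)  # hypothetical slot counts sz1..sz4, one per dimension

def h(dim, value):
    """Assumed mapping function h_dim: CRC32 of the value modulo the slot count."""
    return zlib.crc32(value.encode()) % SIZES[dim]

def offset(coords):
    """Row-major position of level2[c1][c2][c3][c4] in one flat Python list."""
    off = 0
    for c, size in zip(coords, SIZES):
        off = off * size + c
    return off

def build_level2(records):
    """Place each finest-granularity record in the slot its values map to.

    Each slot holds a Python list standing in for the slide's linked list.
    """
    level2 = [[] for _ in range(math.prod(SIZES))]
    for key, value in records:
        coords = [h(i, v) for i, v in enumerate(key)]
        level2[offset(coords)].append((key, value))
    return level2

# Hypothetical finest-granularity cuboid over four dimensions.
RECORDS = [
    (("a3", "b4", "c2", "d5"), 10),
    (("a3", "b1", "c2", "d5"), 7),
    (("a1", "b4", "c2", "d2"), 4),
    (("a3", "b2", "c2", "d1"), 3),
]
level2 = build_level2(RECORDS)
print(len(level2), "slots,", sum(len(b) for b in level2), "records")  # 400 slots, 4 records
```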

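15 Using the Level-2 Store: Query Q without ALLs
- Q = (a3, b4, c2, d5). The mapping functions send a3 to slot 4, b4 to slot 3, c2 to slot 7, and d5 to slot 1.
- Access the list at level2[4][3][7][1] and aggregate the records matching (a3, b4, c2, d5). Other values can map to the same slots, so each record in the list must be checked.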
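A self-contained Python sketch of a query without ALLs, reusing the same assumed setup: CRC32-based mapping functions, a flattened directory, and hypothetical records. Because these mapping functions differ from the talk's H, the slot numbers differ from (4, 3, 7, 1), but the access pattern is the same: one slot, then a filter on the full key. The helper name point_query is illustrative.

```python
import math
import zlib

SIZES = (5, 5, 8, 2)  # hypothetical slot counts sz1..sz4

def h(dim, value):
    # Assumed mapping function: CRC32 modulo the dimension's slot count.
    return zlib.crc32(value.encode()) % SIZES[dim]

def offset(coords):
    # Row-major position of level2[c1][c2][c3][c4] in a flat list.
    off = 0
    for c, size in zip(coords, SIZES):
        off = off * size + c
    return off

def build_level2(records):
    level2 = [[] for _ in range(math.prod(SIZES))]
    for key, value in records:
        level2[offset([h(i, v) for i, v in enumerate(key)])].append((key, value))
    return level2

def point_query(level2, query):
    """Query with no ALLs: visit exactly one slot, then filter by the full key."""
    bucket = level2[offset([h(i, v) for i, v in enumerate(query)])]
    # Different values can share a slot, so compare whole keys.
    return sum(value for key, value in bucket if key == tuple(query))

RECORDS = [
    (("a3", "b4", "c2", "d5"), 10),
    (("a3", "b1", "c2", "d5"), 7),
    (("a1", "b4", "c2", "d2"), 4),
    (("a3", "b2", "c2", "d1"), 3),
]
level2 = build_level2(RECORDS)
print(point_query(level2, ("a3", "b4", "c2", "d5")))  # 10
```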

16 Using the Level-2 Store: Query Q with ALLs
- Q = (a3, ALL, c2, ALL). a3 maps to slot 4 and c2 to slot 7; each ALL dimension covers its whole list of slots.
- Access the lists matching level2[4][*][7][*] and aggregate the records matching (a3, *, c2, *).
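The ALL case in the same assumed setup: a dimension with a value contributes its single slot, an ALL dimension contributes every slot, and itertools.product enumerates the slots to visit. The mapping functions, slot counts, records, and the name slice_query are assumptions of this sketch, as in the previous one.

```python
import math
import zlib
from itertools import product

ALL = "ALL"
SIZES = (5, 5, 8, 2)  # hypothetical slot counts sz1..sz4

def h(dim, value):
    # Assumed mapping function: CRC32 modulo the dimension's slot count.
    return zlib.crc32(value.encode()) % SIZES[dim]

def offset(coords):
    # Row-major position of level2[c1][c2][c3][c4] in a flat list.
    off = 0
    for c, size in zip(coords, SIZES):
        off = off * size + c
    return off

def build_level2(records):
    level2 = [[] for _ in range(math.prod(SIZES))]
    for key, value in records:
        level2[offset([h(i, v) for i, v in enumerate(key)])].append((key, value))
    return level2

def slice_query(level2, query):
    """Visit every slot matching level2[...][*][...] and sum matching records."""
    ranges = [range(SIZES[i]) if v == ALL else [h(i, v)]
              for i, v in enumerate(query)]
    total = 0
    for coords in product(*ranges):
        for key, value in level2[offset(coords)]:
            if all(q == ALL or q == k for q, k in zip(query, key)):
                total += value
    return total

RECORDS = [
    (("a3", "b4", "c2", "d5"), 10),
    (("a3", "b1", "c2", "d5"), 7),
    (("a1", "b4", "c2", "d2"), 4),
    (("a3", "b2", "c2", "d1"), 3),
]
level2 = build_level2(RECORDS)
print(slice_query(level2, ("a3", ALL, "c2", ALL)))  # 20
```

The number of slots visited is the product of the slot counts of the ALL dimensions: here 5 × 2 = 10 of the 400 slots.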

17 Demo
- Shows a multidimensional dataset (a subset of the columns of the 5% Census sample for NY in 1990).
- The user asks queries and gets fast answers.
- Future: the user interface asks many queries, with the display changing interactively.

18 Experimental Results
Scanning all records takes 194 ms.

19 Importance of Work
- Aggregation is fundamental to analysis.
- Make analysis interactive, even for many dimensions.
- Make a variety of aggregate granularities available, where possible.

20 Contributions
- A main-memory framework for answering datacube queries efficiently.
- Query performance in the 2-4 ms range, faster than going to disk and well below the 194 ms needed to scan all records.

21 Plan
- Integrate with a user interface to generate dynamic queries.
- Self-tuning capability.
- Multiple data sets.
- Work with agencies to generate value:
  - for intra-agency analysis
  - for enhanced data dissemination

