Parallel Multi-Dimensional ROLAP Indexing Andrew Rau-Chaplin Faculty of Computer Science Dalhousie University Joint work with Frank Dehne, Carleton Univ.

1 Parallel Multi-Dimensional ROLAP Indexing
Andrew Rau-Chaplin, Faculty of Computer Science, Dalhousie University
Joint work with Frank Dehne, Carleton Univ., and Todd Eavis, Dalhousie Univ.

2 Data Warehousing for Decision Support
- Operational data collected into DW
- DW used to support multi-dimensional views
- Views form the basis of OLAP processing
- Our focus: the OLAP server

3 Multi-dimensional views
- Collection of feature attributes
- Aggregate along one or more measure attributes
- Reduce the granularity by collapsing dimensions
- Points generated by:
  - distributive functions (e.g., sum)
  - algebraic functions (e.g., average)
  - holistic functions (e.g., median)
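The three function classes differ in which partial results can be combined when dimensions are collapsed. The following is a minimal Python sketch under stated assumptions: the two partitions, their values, and the helper names `combine_sum` and `combine_avg` are illustrative and do not come from the slides.

```python
from statistics import median

# Two hypothetical partitions of one measure attribute (illustrative values only).
part_a = [1, 2, 9]
part_b = [3, 4, 5, 6, 7]

def combine_sum(partial_sums):
    """Distributive: the overall sum is the sum of the partial sums."""
    return sum(partial_sums)

def combine_avg(partials):
    """Algebraic: the average is recoverable from fixed-size (sum, count) pairs."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

total = combine_sum([sum(part_a), sum(part_b)])
average = combine_avg([(sum(part_a), len(part_a)), (sum(part_b), len(part_b))])

# Holistic: no fixed-size summary is enough. The median of the partial
# medians generally differs from the true median, so raw values are needed.
true_median = median(part_a + part_b)
median_of_medians = median([median(part_a), median(part_b)])
```

In this example the sum and average computed from partial results match the values computed over all the data, while the median of medians does not match the true median.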

4 Data Cube Generation
- Proposed by Gray et al. in 1995
- Can be generated manually from a relational DB, but this is very inefficient
- Exploit the relationship between cuboids to compute all 2^d cuboids
- In OLAP environments, we typically pre-compute these views to improve query response time
[Figure: cuboid lattice for dimensions A, B, C: ABC; AB, AC, BC; A, B, C; ALL]
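To make the 2^d count concrete, the cuboids of a d-dimensional cube are exactly the subsets of its dimensions. A small Python sketch, where the function name `cuboids` is hypothetical and the dimension names A, B, C follow the lattice figure:

```python
from itertools import combinations

def cuboids(dims):
    """List every cuboid (subset of dimensions), from the base cuboid down to ALL."""
    return [combo
            for k in range(len(dims), -1, -1)
            for combo in combinations(dims, k)]

lattice = cuboids(("A", "B", "C"))
# The empty tuple () plays the role of ALL (every dimension collapsed).
for view in lattice:
    print("".join(view) or "ALL")
```

For d = 3 this yields 8 cuboids, and for the 10-dimensional test data used later in the talk it would yield 2^10 = 1024.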

5 Existing Parallel Results
- Goil & Choudhary
- MOLAP solution
  - in-memory structures
  - global partition + d communication rounds
  - distributed views
- Limitations
  - Memory for multi-dimensional arrays
  - expensive communication for larger d
Reference: J. of Data Mining & Knowledge Discovery 1(4), 1997

6 Our Approach
- ROLAP solution
  - Construct and cost the data cube lattice
  - Find a least cost spanning tree
  - Partition the spanning tree over the processors equally, construct views and distribute
  - Can handle partial cubes
- Limitations
  - What about indexing?
[Figure: cuboid lattice for dimensions A, B, C, D: ABCD; ABC, ABD, ACD, BCD; AB, AC, AD, BC, BD, CD; A, B, C, D; All]
Reference: CCGrid01 + J. Dist. & Parallel Databases 11(2), 2001
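One way to picture the spanning-tree step is the sketch below. It is not the authors' algorithm: it assumes that the cost of computing a view from a parent equals the parent's estimated size, so a least cost tree results from each view picking its smallest parent in the lattice. The function name `cheapest_parents` and all size estimates are made up for illustration, and the partitioning of the tree over processors is omitted.

```python
from itertools import combinations

def cheapest_parents(dims, size):
    """Map each view to the parent (one extra dimension) with the smallest estimated size."""
    full = frozenset(dims)
    parent = {}
    for k in range(len(dims)):
        for combo in combinations(dims, k):
            view = frozenset(combo)
            candidates = [view | {d} for d in full - view]
            # Ties are broken by dimension names so the result is deterministic.
            parent[view] = min(candidates, key=lambda v: (size[v], sorted(v)))
    return parent

# Hypothetical row-count estimates for a three-dimensional cube.
size = {
    frozenset("ABC"): 1000,
    frozenset("AB"): 400, frozenset("AC"): 300, frozenset("BC"): 600,
    frozenset("A"): 10, frozenset("B"): 50, frozenset("C"): 30,
}
tree = cheapest_parents("ABC", size)
for view, par in tree.items():
    print("".join(sorted(view)) or "ALL", "<-", "".join(sorted(par)))
```

With these estimates, A and C are both computed from AC, B from AB, and ALL from A.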

7 Parallel Multi-dimensional Indexing
- Query specifies a range on multiple dimensions
- Forms a hypercube in the point space
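A range query of this kind can be sketched as a pair of corner points, one lower and one upper bound per dimension. The Python below is a brute-force illustration only; the names `in_range` and `range_query` and the sample points are assumptions, and a real index would avoid scanning every point.

```python
def in_range(point, low, high):
    """True if the point lies inside the axis-aligned box [low, high] on every dimension."""
    return all(lo <= x <= hi for x, lo, hi in zip(point, low, high))

def range_query(points, low, high):
    """Brute-force scan: return every point that falls inside the query box."""
    return [p for p in points if in_range(p, low, high)]

# Hypothetical three-dimensional points (one coordinate per feature attribute).
points = [(1, 5, 2), (3, 3, 3), (7, 1, 4), (2, 4, 9)]
hits = range_query(points, low=(1, 3, 2), high=(3, 5, 3))
```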

8 General Approach
- No multidimensional index is universally successful
- Exploit domain specific information and the features of a particular index
- OLAP
  - Data is provided up front
  - Updates are batch oriented

9 Design Goals
- A framework for distributed high-performance indexing of ROLAP cubes
  - Practical to implement
  - Low communication volume
  - Fully adapted to external memory (disks)
  - No shared disk required
  - Incrementally maintainable
  - Efficient for high-dimensional spatial searches
  - Scalable in terms of data size, dimensions, processors

10 Challenge
- How to order and partition data such that
  - the number of records retrieved per node is as balanced as possible
  - the number of disk seeks required to answer a query is minimized
[Figure: view ABC partitioned across processors P1, P2, P3, P4]

11 Indexing the Data Cube
- Combine the strengths of a space-filling curve and an R-tree index
- Use Hilbert curve to load buckets
- Index buckets with R-tree
- Update indexes with merge/sort
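The bucket-loading idea can be sketched in two dimensions as follows. This is a simplified illustration, not the RCUBE implementation: it uses the standard 2-D Hilbert index computation on an n x n grid (n a power of two), packs points into fixed-capacity buckets in curve order, and records one bounding box per bucket as the leaf level an R-tree would be built over. The names `hilbert_index` and `pack_buckets`, the grid size, and the capacity are assumptions.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along the Hilbert curve on an n x n grid (n a power of two)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d

def pack_buckets(points, n, capacity):
    """Sort points in Hilbert order, cut into buckets, and compute each bucket's bounding box."""
    ordered = sorted(points, key=lambda p: hilbert_index(n, p[0], p[1]))
    buckets = [ordered[i:i + capacity] for i in range(0, len(ordered), capacity)]
    boxes = [((min(x for x, _ in b), min(y for _, y in b)),
              (max(x for x, _ in b), max(y for _, y in b)))
             for b in buckets]
    return buckets, boxes

# Every cell of a 4 x 4 grid, packed four points per bucket.
grid = [(x, y) for x in range(4) for y in range(4)]
buckets, boxes = pack_buckets(grid, n=4, capacity=4)
```

Because consecutive positions on the curve are spatially adjacent, each bucket covers a compact region (here, one quadrant of the grid), which keeps the bounding boxes small.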

12 Space Filling Curves & Striping

13 Query Retrieval
[Figure: query on view ABC answered across processors P1, P2, P3, P4]

14 Example
[Figure panels: Original Space, Processor 1, Processor 2]
- 8 points to be reported
- Reports: 2 consecutive blocks & 4 points

15 The Parallel Framework
- A single view is partitioned across p processors
- Partial Hilbert/R-tree indexes are computed locally
- Queries are answered concurrently
- Queries answered individually or piggy-backed
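A rough sketch of how such a framework might distribute and query a view is given below. It assumes round-robin striping of curve-ordered blocks over the processors, which the slides do not spell out, and it simulates the processors sequentially in one process. The names `stripe`, `local_query`, and `parallel_query` and the sample blocks are hypothetical.

```python
def stripe(blocks, p):
    """Assign curve-ordered blocks to p processors round-robin (assumed striping scheme)."""
    return [blocks[i::p] for i in range(p)]

def local_query(blocks, low, high):
    """Each processor scans only its own blocks for points inside the query box."""
    return [pt for block in blocks for pt in block
            if all(lo <= x <= hi for x, lo, hi in zip(pt, low, high))]

def parallel_query(striped, low, high):
    """Collect per-processor answers; on a cluster these would run concurrently."""
    return [local_query(blocks, low, high) for blocks in striped]

# Four hypothetical blocks of 2-D points, already in space-filling-curve order.
blocks = [[(0, 0), (1, 0)], [(1, 1), (0, 1)], [(0, 2), (0, 3)], [(1, 3), (1, 2)]]
striped = stripe(blocks, p=2)
answers = parallel_query(striped, low=(0, 0), high=(1, 1))
```

In this example neighbouring blocks land on different processors, so a query covering a compact region draws the same number of points from each of them.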

16 The Virtual Data Cube
- Problem: Full cube often too large to materialize
- Solution: Use surrogate views
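One reading of the surrogate idea, stated here as an assumption rather than as the authors' design, is that a view that was not materialized can be answered from a materialized view containing all of its dimensions by aggregating away the extra ones. The Python sketch below uses a sum measure; the function name `roll_up` and the sample rows are hypothetical.

```python
from collections import defaultdict

def roll_up(rows, dims, keep):
    """Aggregate a materialized view down to the dimensions in `keep` (sum of the measure)."""
    idx = [dims.index(d) for d in keep]
    totals = defaultdict(int)
    for *key, measure in rows:
        totals[tuple(key[i] for i in idx)] += measure
    return dict(totals)

# Hypothetical materialized view ABC: three dimension values plus one measure per row.
abc_rows = [("a1", "b1", "c1", 5), ("a1", "b1", "c2", 7), ("a2", "b1", "c1", 1)]

# Answer a request for the non-materialized view AB using ABC as its surrogate.
ab_view = roll_up(abc_rows, dims=("A", "B", "C"), keep=("A", "B"))
```

This works directly for distributive measures such as sum. Other measure types would need more information carried in the surrogate.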

17 Surrogate Processing

18 Other issues…
- Dimension ordering
- Query piggybacking
- Batch updating
- Managing hierarchies of views

19 Experimental Results
- Machine
  - 17 node cluster
  - Node = 1.8 GHz Xeon, 1 GB RAM, 2 x 40 GB IDE drives, running Linux
  - Interconnect = Intel Fast Ethernet switch
- Test Data
  - 10 dimensions and 1,000,000 records

20 RCUBE Index Construction
Output: ~640 million rows, 16 gigabytes

21 Distributed Query Resolution
Test: Random queries returning ~15% of points (10 experiments per point)

22 Disk Blocks Retrieved vs. Disk Seeks
Test: Random queries returning 5-15% of points (15 experiments per point)

23 Distributed Query Resolution in Surrogate Group-bys

24 Thank You
Questions?

