Parallel Multi-Dimensional ROLAP Indexing
Andrew Rau-Chaplin, Faculty of Computer Science, Dalhousie University
Joint work with Frank Dehne (Carleton Univ.) and Todd Eavis (Dalhousie Univ.)

Data Warehousing for Decision Support
- Operational data is collected into the data warehouse (DW)
- The DW is used to support multidimensional views
- Views form the basis of OLAP processing
- Our focus: the OLAP server

Multi-dimensional Views
- Collection of feature attributes
- Aggregate along one or more measure attributes
- Reduce the granularity by collapsing dimensions
- Points generated by:
  - distributive functions (e.g., sum)
  - algebraic functions (e.g., average)
  - holistic functions (e.g., median)
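The three classes of aggregate function differ in what must be kept per partition when a view is computed from pieces. The Python sketch below is illustrative only: the three partitions of measure values and the function names are hypothetical. It shows that SUM combines partial sums directly, AVG needs a fixed-size (sum, count) pair per partition, and MEDIAN cannot be assembled from any bounded per-partition summary.

```python
from statistics import median
from typing import List, Tuple

# Hypothetical measure values split across three partitions,
# as happens when a cuboid is computed from pieces.
partitions: List[List[float]] = [[4.0, 8.0], [1.0, 3.0, 5.0], [10.0]]


def distributive_sum(parts: List[List[float]]) -> float:
    """SUM is distributive: the total is the SUM of the per-partition SUMs."""
    return sum(sum(p) for p in parts)


def algebraic_avg(parts: List[List[float]]) -> float:
    """AVG is algebraic: each partition keeps a fixed-size (sum, count) pair."""
    pairs: List[Tuple[float, int]] = [(sum(p), len(p)) for p in parts]
    total = sum(s for s, _ in pairs)
    count = sum(c for _, c in pairs)
    return total / count


def holistic_median(parts: List[List[float]]) -> float:
    """MEDIAN is holistic: no bounded-size summary suffices, so all values are needed."""
    return median([v for p in parts for v in p])


# The median of the per-partition medians is NOT the true median.
median_of_medians = median([median(p) for p in partitions])
```

With these values the true median is 4.5 while the median of the per-partition medians is 6.0, which is why holistic views cannot simply be rolled up from their parents.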

Data Cube Generation
- Proposed by Gray et al. in 1995
- Can be generated manually from a relational DB, but this is very inefficient
- Exploit the relationships between cuboids to compute all 2^d cuboids
- In OLAP environments, these views are typically pre-computed to improve query response time
[Figure: lattice of cuboids for d = 3: ABC; AB, AC, BC; A, B, C; ALL]
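A minimal sketch of exploiting the lattice, assuming a hypothetical three-dimension fact table with a SUM measure: the base cuboid ABC is built once, and every other cuboid is rolled up from an already-computed parent with exactly one extra dimension rather than from the raw data. Parent choice here is simply "first available"; choosing the cheapest parent is the costing problem the "Our Approach" slide addresses. All names and data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

Key = Tuple[str, ...]
Cuboid = Dict[Key, int]  # group-by key -> SUM(measure)

DIMS: Key = ("A", "B", "C")

# Hypothetical fact table rows: (A, B, C) values and one measure.
FACTS: List[Tuple[Key, int]] = [
    (("a1", "b1", "c1"), 5),
    (("a1", "b2", "c1"), 3),
    (("a2", "b1", "c2"), 7),
]


def rollup(parent: Cuboid, parent_dims: Key, child_dims: Key) -> Cuboid:
    """Aggregate a parent cuboid down to child_dims (valid because SUM is distributive)."""
    keep = [parent_dims.index(d) for d in child_dims]
    out: Cuboid = defaultdict(int)
    for key, m in parent.items():
        out[tuple(key[i] for i in keep)] += m
    return dict(out)


def full_cube() -> Dict[Key, Cuboid]:
    base: Cuboid = defaultdict(int)
    for key, m in FACTS:
        base[key] += m
    cube: Dict[Key, Cuboid] = {DIMS: dict(base)}
    # Walk the lattice top-down: d-1 dimensions, then d-2, ..., down to ALL = ().
    for k in range(len(DIMS) - 1, -1, -1):
        for child in combinations(DIMS, k):
            # Use the first already-built parent that has exactly one extra dimension.
            parent = next(p for p in list(cube) if len(p) == k + 1 and set(child) <= set(p))
            cube[child] = rollup(cube[parent], parent, child)
    return cube
```

For d = 3 this produces all 2^3 = 8 cuboids; for example the ALL cuboid holds the grand total 15.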

Existing Parallel Results
- Goil & Choudhary
- MOLAP solution
  - in-memory structures
  - global partition + d communication rounds
  - distributed views
- Limitations
  - memory for multidimensional arrays
  - expensive communication for larger d
Data Mining and Knowledge Discovery 1(4), 1997

Our Approach
- ROLAP solution
  - Construct and cost the data cube lattice
  - Find a least-cost spanning tree
  - Partition the spanning tree equally over the processors, then construct and distribute the views
  - Can handle partial cubes
- Limitations
  - What about indexing?
[Figure: lattice of cuboids for d = 4: ABCD; ABC, ABD, ACD, BCD; AB, AC, AD, BC, BD, CD; A, B, C, D; All]
CCGrid 2001 + J. Distributed & Parallel Databases 11(2), 2001
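A simplified sketch of the first three steps, under loudly stated assumptions: the cost of computing a view from a parent is taken to be the parent's estimated size (the real cost model is not given in the transcript), and views are assigned individually to processors with a greedy largest-first heuristic rather than by partitioning the tree into subtrees. Because the lattice is a DAG rooted at the full view, picking each view's cheapest incoming edge does yield a least-cost spanning tree under this cost model. The size estimates and function names are hypothetical.

```python
import heapq
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

View = FrozenSet[str]


def spanning_tree(dims: Tuple[str, ...], size: Dict[View, int]) -> Dict[View, View]:
    """For every non-root view, choose the parent (one extra dimension) with the
    smallest estimated size; cost(child from parent) is assumed ~ size[parent]."""
    parent: Dict[View, View] = {}
    for k in range(len(dims) - 1, -1, -1):
        for combo in combinations(dims, k):
            child = frozenset(combo)
            candidates = [child | {d} for d in dims if d not in child]
            parent[child] = min(candidates, key=lambda v: size[v])
    return parent


def assign_views(size: Dict[View, int], p: int) -> List[List[View]]:
    """Greedy largest-first assignment of views to p processors by estimated size
    (a simplification: the slides partition the spanning tree itself)."""
    heap = [(0, i) for i in range(p)]  # (current load, processor id)
    bins: List[List[View]] = [[] for _ in range(p)]
    for view in sorted(size, key=lambda v: size[v], reverse=True):
        load, i = heapq.heappop(heap)
        bins[i].append(view)
        heapq.heappush(heap, (load + size[view], i))
    return bins


fs = frozenset
# Hypothetical size estimates for the d = 3 lattice ("" is the ALL view).
sizes = {fs("ABC"): 1000, fs("AB"): 600, fs("AC"): 300, fs("BC"): 900,
         fs("A"): 50, fs("B"): 200, fs("C"): 100, fs(""): 1}
```

With these estimates, A and C are computed from AC, B from AB, and ALL from A; on two processors the greedy assignment yields loads of 1600 and 1551.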

Parallel Multi-dimensional Indexing
- A query specifies a range on multiple dimensions
- The ranges form a hyper-rectangle in the point space
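A short sketch of the two geometric tests such a query needs, with inclusive bounds assumed: whether a point lies inside the query box, and whether two boxes overlap (the test an R-tree applies to decide which child nodes to visit). The type and function names are hypothetical.

```python
from typing import Sequence, Tuple

Box = Tuple[Tuple[int, ...], Tuple[int, ...]]  # (low corner, high corner), inclusive


def contains(box: Box, point: Sequence[int]) -> bool:
    """A point matches iff every coordinate lies within that dimension's range."""
    lo, hi = box
    return all(l <= x <= h for l, x, h in zip(lo, point, hi))


def intersects(a: Box, b: Box) -> bool:
    """Two boxes overlap iff their intervals overlap in every dimension."""
    (alo, ahi), (blo, bhi) = a, b
    return all(al <= bh and bl <= ah for al, ah, bl, bh in zip(alo, ahi, blo, bhi))


# Ranges on dimensions 0 and 1, an exact match on dimension 2.
query: Box = ((2, 0, 5), (4, 9, 5))
```

A dimension the query leaves unconstrained can be modelled by setting its range to the full domain.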

General Approach
- No multidimensional index is universally successful
- Exploit domain-specific information and the features of a particular index
- OLAP
  - Data is provided up front
  - Updates are batch-oriented

Design Goals
- A framework for distributed high-performance indexing of ROLAP cubes
  - Practical to implement
  - Low communication volume
  - Fully adapted to external memory (disks)
  - No shared disk required
  - Incrementally maintainable
  - Efficient for high-dimensional spatial searches
  - Scalable in terms of data size, dimensions, and processors

Challenge
- How to order and partition data such that
  - the number of records retrieved per node is as balanced as possible
  - the number of disk seeks required to answer a query is minimized
[Figure: view ABC partitioned across processors P1 to P4]
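The two goals can be made measurable. The sketch below assumes a simple disk model in which each maximal run of consecutive block ids costs one seek and is then read sequentially, and measures balance as the busiest node's load divided by the average load. Both metrics and the function names are illustrative assumptions, not taken from the slides.

```python
from typing import Iterable, List


def count_seeks(block_ids: Iterable[int]) -> int:
    """Number of maximal runs of consecutive block ids, i.e. seeks, assuming
    each run is read sequentially after a single seek."""
    seeks = 0
    prev = None
    for b in sorted(set(block_ids)):
        if prev is None or b != prev + 1:
            seeks += 1
        prev = b
    return seeks


def imbalance(records_per_node: List[int]) -> float:
    """Busiest node's load divided by the average load (1.0 is perfect balance)."""
    mean = sum(records_per_node) / len(records_per_node)
    if mean == 0:
        return 1.0  # no records retrieved anywhere: trivially balanced
    return max(records_per_node) / mean
```

For instance, reading blocks 3, 4, 5, 9, 10 and 20 costs three seeks under this model, even though six blocks are transferred.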

Indexing the Data Cube
- Combine the strengths of a space-filling curve and an R-tree index
- Use a Hilbert curve to load buckets
- Index the buckets with an R-tree
- Update indexes with merge/sort
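A minimal two-dimensional sketch of Hilbert-ordered bucket loading, in the style of a Hilbert-packed R-tree: points are sorted by their position along the Hilbert curve, cut into fixed-size buckets, and each bucket's minimum bounding rectangle (MBR) becomes a leaf entry for the R-tree built above it. The 2-D restriction, the grid side n (a power of two), and the names hilbert_key and pack_buckets are assumptions for illustration; cube views have up to d dimensions and would need a d-dimensional Hilbert mapping. The key computation follows the well-known iterative (x, y) to distance conversion.

```python
from typing import List, Tuple

Point = Tuple[int, int]
Box = Tuple[int, int, int, int]  # (xmin, ymin, xmax, ymax)


def _rotate(n: int, x: int, y: int, rx: int, ry: int) -> Tuple[int, int]:
    """Flip/transpose a quadrant so the sub-curve has the canonical orientation."""
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        x, y = y, x
    return x, y


def hilbert_key(n: int, x: int, y: int) -> int:
    """Distance of (x, y) along the Hilbert curve filling an n x n grid."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rotate(n, x, y, rx, ry)
        s //= 2
    return d


def pack_buckets(points: List[Point], n: int, bucket_size: int) -> List[Tuple[Box, List[Point]]]:
    """Sort by Hilbert key, cut into fixed-size buckets, record each bucket's MBR."""
    ordered = sorted(points, key=lambda p: hilbert_key(n, p[0], p[1]))
    buckets = []
    for i in range(0, len(ordered), bucket_size):
        chunk = ordered[i:i + bucket_size]
        xs = [p[0] for p in chunk]
        ys = [p[1] for p in chunk]
        buckets.append(((min(xs), min(ys), max(xs), max(ys)), chunk))
    return buckets
```

On a full 4 x 4 grid with bucket_size=4, the buckets come out as the four 2 x 2 quadrants, because the Hilbert curve finishes one quadrant before entering the next; that locality keeps bucket MBRs small and reduces R-tree overlap.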

Space Filling Curves & Striping
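The transcript preserves only this slide's title. The sketch below assumes, as our reading rather than a confirmed detail, that striping deals Hilbert-ordered records round-robin so that record i goes to processor i mod p. The function names stripe and per_processor_hits are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def stripe(hilbert_sorted: Sequence[T], p: int) -> List[List[T]]:
    """Deal Hilbert-ordered records round-robin: record i goes to processor i mod p.
    Each processor's local list stays in Hilbert order."""
    return [list(hilbert_sorted[i::p]) for i in range(p)]


def per_processor_hits(n: int, p: int, start: int, end: int) -> List[int]:
    """How many records of the global Hilbert range [start, end) land on each processor."""
    parts = stripe(range(n), p)
    return [sum(1 for r in part if start <= r < end) for part in parts]
```

Under this assumption, a run of consecutive Hilbert positions maps to a contiguous run on every processor, and the per-processor counts differ by at most one. This addresses both goals from the "Challenge" slide at once: balanced retrieval and few seeks. Real range queries usually cover several Hilbert runs, so the balance is approximate rather than exact.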

Query Retrieval
[Figure: query on view ABC, partitioned across processors P1 to P4]

Example
[Figure: original space, and the partitions stored on Processor 1 and Processor 2]
- 8 points to be reported
- Reports: 2 consecutive blocks & 4 points

The Parallel Framework
- A single view is partitioned across p processors
- Partial Hilbert/R-tree indexes are computed locally
- Queries are answered concurrently
- Queries are answered individually or piggybacked
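A minimal simulation of the framework, under illustrative assumptions: threads stand in for the p processors, a linear scan of each local partition stands in for the local Hilbert/R-tree search, and piggybacking is modelled as one pass over the local data serving a whole batch of queries. Partial answers are then concatenated per query. All names are hypothetical; the slides do not describe the communication layer.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Point = Tuple[int, ...]
Box = Tuple[Point, Point]  # (low corner, high corner), inclusive


def in_box(box: Box, pt: Point) -> bool:
    lo, hi = box
    return all(l <= x <= h for l, x, h in zip(lo, pt, hi))


def answer_batch(partition: Sequence[Point], queries: Sequence[Box]) -> List[List[Point]]:
    """Piggybacking: one pass over the local partition serves every query in the batch."""
    results: List[List[Point]] = [[] for _ in queries]
    for pt in partition:
        for qi, q in enumerate(queries):
            if in_box(q, pt):
                results[qi].append(pt)
    return results


def parallel_query(partitions: Sequence[Sequence[Point]], queries: Sequence[Box]) -> List[List[Point]]:
    """Each 'processor' (a thread here) answers the batch locally; partial answers are merged."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(lambda part: answer_batch(part, queries), partitions))
    merged: List[List[Point]] = [[] for _ in queries]
    for partial in partials:
        for qi, hits in enumerate(partial):
            merged[qi].extend(hits)
    return merged
```

Answering queries individually corresponds to calling parallel_query with a one-element batch. Piggybacking trades a little per-point work for fewer passes over the disk-resident data.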

The Virtual Data Cube
- Problem: the full cube is often too large to materialize
- Solution: use surrogate views

Surrogate Processing
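The transcript does not spell out how a surrogate view is used. The sketch below assumes, for illustration only, that a query on a view that was not materialized (here AB) is answered from a materialized ancestor that contains all of its dimensions (here ABC). It does this by leaving the extra dimension unrestricted and aggregating it away, which is valid for a distributive measure such as SUM. The data and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical materialized surrogate view ABC: (a, b, c) -> SUM(measure).
ABC: Dict[Tuple[int, int, int], int] = {
    (1, 1, 1): 4, (1, 1, 2): 6, (1, 2, 1): 3, (2, 1, 1): 5, (3, 3, 3): 9,
}


def query_ab_via_surrogate(a_range: Tuple[int, int],
                           b_range: Tuple[int, int]) -> Dict[Tuple[int, int], int]:
    """Answer a range query on the unmaterialized view AB from surrogate ABC:
    C is left unrestricted and then aggregated away (valid for SUM)."""
    out: Dict[Tuple[int, int], int] = defaultdict(int)
    for (a, b, _c), m in ABC.items():
        if a_range[0] <= a <= a_range[1] and b_range[0] <= b <= b_range[1]:
            out[(a, b)] += m
    return dict(out)
```

The price of not materializing AB is that every ABC row in the query region is read, so a good surrogate is the smallest materialized ancestor available.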

Other Issues
- Dimension ordering
- Query piggybacking
- Batch updating
- Managing hierarchies of views
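A minimal sketch of batch updating in the merge/sort style named on the "Indexing the Data Cube" slide, assuming records are stored as (Hilbert key, payload) pairs in key order. The incoming batch is sorted once and merged with the existing sorted records in a single linear pass using heapq.merge (whose key argument exists since Python 3.5). The record layout and function name are hypothetical, and rebuilding the buckets and R-tree afterwards is not shown.

```python
import heapq
from typing import List, Tuple

Record = Tuple[int, str]  # (Hilbert key, payload): a hypothetical layout


def batch_update(existing: List[Record], batch: List[Record]) -> List[Record]:
    """Sort the incoming batch by Hilbert key, then merge it with the already
    sorted existing records in one linear pass, as one would with on-disk runs."""
    sorted_batch = sorted(batch, key=lambda r: r[0])
    return list(heapq.merge(existing, sorted_batch, key=lambda r: r[0]))
```

Because OLAP updates arrive in batches, this sort-then-merge approach keeps the cost close to one sequential pass over the existing data instead of many random index insertions.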

Experimental Results
- Machine
  - 17-node cluster
  - Node: 1.8 GHz Xeon, 1 GB RAM, 2 x 40 GB IDE drives, running Linux
  - Interconnect: Intel Fast Ethernet switch
- Test data
  - 10 dimensions and 1,000,000 records

RCUBE Index Construction
Output: ~640 million rows, 16 GB

Distributed Query Resolution
Test: random queries returning ~15% of points (10 experiments per plotted point)

Disk Blocks Retrieved vs. Disk Seeks
Test: random queries returning 5-15% of points (15 experiments per plotted point)

Distributed Query Resolution in Surrogate Group-bys

Thank You
Questions?
