Efficient OLAP Operations for Spatial Data Using P-Trees Baoying Wang, Fei Pan, Dongmei Ren, Yue Cui, Qiang Ding William Perrizo North Dakota State University.

Efficient OLAP Operations for Spatial Data Using P-Trees Baoying Wang, Fei Pan, Dongmei Ren, Yue Cui, Qiang Ding William Perrizo North Dakota State University

OUTLINE Introduction Review Of Peano Trees (P-trees) OLAP Operations Using P-trees  Peano Data Cubes (PD-Cubes)  OLAP Operations Performance Analysis Conclusion

INTRODUCTION Efficient OLAP for spatial data warehouses is in great demand Spatial warehouses is growing with more and more spatial data, such as remotely sensed images, geographical information, digital sky survey data The data in a warehouse are conceptually modeled as data cubes (Gray et al, 1997)

INTRODUCTION (Cont.) OLAP queries are complex and time consuming Two major approaches to speed up OLAP  Using index structures  Operating on compressed data. Bitmap index are space inefficient for high cardinality attributes, and are only suitable for narrow domains.

Our Approach on OLAP A new data warehousing structure, PD- cube, is developed to facilitate OLAP operations and queries Fast logical operations of P-Trees are used to accomplish OLAP operations. Predicate P-trees are used to efficiently reduce data accesses by filtering out “big holes” consisting of consecutive 0’s

REVIEW OF PEANO TREES (Ptrees) The Ptree is a quadrant-based tree structure (assuming a 2-dimensional image; more generally, for n- dimensional data, an n-polytant tree) It is used to facilitate compression and very fast logical operations on bit sequential (bSQ) data (Perrizo, 2001)  Ptrees can be 1-dimensional, 2-dimensional, 3-dimensional… The most useful form of a Ptree is the predicate- Ptree: e.g., Pure1 Ptree (P1tree) and NonPure0 Ptree (NP0-tree)

bSQ File and a Pure1 tree (P1-tree) 0 1000 00101101 11100010110 1 0 0 1 1 1 1 1 1 0 0 1 1 1 1 1 0 0 0 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 0 1 1 1 1 0 0 0 0 0 1 1 1 0 0 0 0 10 0 1 111 11 10 000 10 0 11 0 0 1 0 P1-tree: Tree node=1 iff that sub-quadrant is purely 1-bits. 1111110011111000111111001111111011110000111100001111000001110000

55 1681516 30414434 11100010110 1 An Count Ptree (NOTE: usually counts are the ultimate goal, but Pure1 trees are easier to work with and produce the needed counts quite quickly) Peano or Z-ordering Pure (Pure-1/Pure-0) quadrant Root Count  Level  Fan-out  QID (Quadrant ID) 1 1 1 1 1 1 0 0 1 1 1 1 1 0 0 0 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 0123 111 ( 7, 1 ) ( 111, 001 ) 10.10.11 2 3 2. 2. 3 001

BSQ File and a NP0-tree NP0-tree: Node=1 iff that sub-quadrant is not pure zero. (more general; -Ptree: node=1 iff sub-quad satisfies 1 1 1 1 1 1 0 0 1 1 1 1 1 0 0 0 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 0 1 1 1 1 0 0 0 0 0 1 1 1 0 0 0 0 1 1110 10111111 11100010110 1

Logical Operations of P-trees Operations are level by level Consecutive 0’s holes can be filtered out We only need to load quadrant with Qid 2 for ANDing NP0-tree1 and NP0-tree2.

OLAP OPERATIONS USING P-TREES 1.Peano Data Cube (PD-cube) 2.OLAP Operations 1)Slice/Dice 2)Rollup 3.Performance Analysis

Peano Data Cube (PD-cube) The data cube is partitioned by bit position  Each bit-wised data cube is in Peano order Take advantage of the continuity and sparseness of spatial data An example: a 3-D data cube representing the crop yield with three dimensions: X- coordinate, Y-coordinate, and time T.

A Fact Table and the PD-cubes XYTYield 00015 (1111) 100 010 110 001 101 011 111 200 300 4 (0100) 210 1 (0001) 31012 (1100) 201 301 2 (0010) 21112 (1100) 311 02015 (1111) 120 030 2 (0010) 130 0 (0000) 02115 (1111) 121 031 2 (0010)

OLAP Query Examples “Find all galaxies brighter than magnitude 22.” “Find average crop yield in a field. ” “Find area of the region with the color red.” “Find total traffic flow during a given period.”

Slice/Dice Operations Typical select statements may have a number of predicates in their “where” clause. The predicates may include “=”, “ ”.  These predicates lead to two different query scenarios: equal queries (“=”) and range queries (“ ”).

Equal Select Slice Example Suppose we have a 3-D data cube representing crop yield with dimensions X, Y and T, where X = {0, 1}, Y = {0, 1} and T = {0, 1}. XYTYield 000111 100011 010001 110100 001011 101010 011100 111001 1 4 7 3 3 2 1

Ptrees for 3-D Cube Example P ij is a Ptree for the j th bit of the i th attribute.

Slice: “Get yield where Y = 1” First get Ptree masks, and then trim all Ptrees accordingly. 1 4 7 3 3 2 1

Range Slice: Get yield where Y >1001 Data set {“Y > 1001”} consists of two subsets {“Y = 11**”} and {“Y = 101*”}, where * is 1 or 0. The query clause can be written as “where Y = 11** || Y = 101*”. The query is retrieved by Ptree mask PM gt = PM1 || PM2. PM1 = P21 & P22 PM2 = P21 & P’22 & P27 1111 1110 1101 1100 1011 1010 ------- 1001 11** 101* P21 & P22 P21 & P’22 & P23

Other Properties of Range Queries Combination of an Equality Query and a Range Query  Divide {“T  01011001”} into 2 subsets {“T>01011001”}, {“T=01011001”} Complement of a Range Query  Data set {“T  01011001”} is the complement of {“T>01011001”}  With the result of query “Get yield where T>01011001, we can easily retrieve query “Get yield where T  01011001” by making the complement, i.e. PM le = PM’ gt.

Rollup Operations PD-cube is stored in Peano order rather than in raster order. Therefore, the rollup of PD-cube is accomplished differently from the traditional data cube as a result of different storage models. According to the Peano storage of PD- cube, we develop the recursive rollup algorithm.

Rollup of “Yield” along Dimension T S2[ ] = {1, 1, 1, 0} S1 [ ] = {0, 1, 1, 1} S0 [ ] = {2, 1, 1, 1} S [ ] = {8, 7, 7, 7} S[i] = S2[i] x 2 2 + S1[i] x 2 1 + S0[i] x 2 0 01 10 0 00 11 0 10 11 1 001 100 111011 011 010 0 1 1 0 0 0 1 11 01 21 4 4 3 1 1 7 01 10 0 00 11 0 10 11 1 87 0 1 1 03 2 1 01 7

PERFORMANCE ANALYSIS: Compare our algorithm with bitmap indexed data cube method The data is prepared in five sizes, 128x128, 256x256, 512x512, 1024x1024, and 2048x2048. When cube size > 1900KB, our method outperforms bitmap indexed data cube method. As the cube size increases, there is a drastic increase in response time for bitmap indexed data cube method.

CONCLUSION A general spatial data warehousing structure, PD- cube, is presented to facilitate OLAP operations. The fast logical operations of Ptrees are used to accomplish these operations. Predicate Ptrees are used to find the “big holes” of consecutive 0’s by performing logical operations. Experiments indicate OLAP operations using Ptrees is much faster than traditional data cube methods.

FUTURE WORK One future research direction is to extend our PD-cube into parallel data warehouse systems.  It appears to be particularly promising to partition large cubes horizontally or vertically (or both) into small cubes to improve the query performance through parallelism.

Thank you.

Efficient OLAP Operations for Spatial Data Using P-Trees Baoying Wang, Fei Pan, Dongmei Ren, Yue Cui, Qiang Ding William Perrizo North Dakota State University.

Similar presentations

Presentation on theme: "Efficient OLAP Operations for Spatial Data Using P-Trees Baoying Wang, Fei Pan, Dongmei Ren, Yue Cui, Qiang Ding William Perrizo North Dakota State University."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Efficient OLAP Operations for Spatial Data Using P-Trees Baoying Wang, Fei Pan, Dongmei Ren, Yue Cui, Qiang Ding William Perrizo North Dakota State University.

Similar presentations

Presentation on theme: "Efficient OLAP Operations for Spatial Data Using P-Trees Baoying Wang, Fei Pan, Dongmei Ren, Yue Cui, Qiang Ding William Perrizo North Dakota State University."— Presentation transcript:

Similar presentations

About project

Feedback