Yue (Jenny) Cui and William Perrizo North Dakota State University

1 Aggregate Function Computation and Iceberg Querying in Vertical Databases
Yue (Jenny) Cui and William Perrizo, North Dakota State University

2 Outline
Introduction
Review of Aggregate Functions
Review of P-trees: vertical, compressed, data-mining-ready data structures
Algorithms of Aggregate Function Computation Using P-trees: SUM, COUNT, and AVERAGE; MAX, MIN, MEDIAN, RANK, and TOP-K
Performance Analysis
Conclusion

3 Introduction
Commonly used aggregation functions include COUNT, SUM, AVERAGE, MIN, MAX, MEDIAN, RANK, and TOP-K. Iceberg queries compute aggregate functions across attributes and then eliminate aggregate values that fall below a specified threshold. An example iceberg query:
SELECT Location, Product Type, SUM(# Product)
FROM Relation Sales
GROUP BY Location, Product Type
HAVING SUM(# Product) >= T
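The iceberg query above can be sketched in a few lines of Python. This is only an illustration of the group, sum, and threshold semantics on ordinary horizontal rows, not the P-tree method of the slides; the function name `iceberg_sum` and the sample rows are hypothetical.

```python
from collections import defaultdict

def iceberg_sum(rows, threshold):
    """Group rows by (location, product type), sum the quantities,
    and keep only the groups whose sum is >= threshold."""
    totals = defaultdict(int)
    for location, product_type, quantity in rows:
        totals[(location, product_type)] += quantity
    return {group: total for group, total in totals.items() if total >= threshold}

# Hypothetical rows (not the Sales relation from the slides).
sample_rows = [
    ("New York", "Notebook", 10),
    ("New York", "Notebook", 4),
    ("Chicago", "Fax", 3),
    ("Chicago", "Printer", 9),
]
result = iceberg_sum(sample_rows, threshold=9)
```

With a threshold of 9, the ("Chicago", "Fax") group is dropped because its sum falls below the threshold.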

4 Review of P-trees
A file, R(A1..An), contains horizontal structures (horizontal records) that are processed vertically (vertical scans). P-trees vertically partition the file, compress each vertical bit slice into a basic P-tree, and then horizontally process these basic P-trees using one multi-operand logical AND.
For R(A1 A2 A3 A4), the vertical bit slices are R11 R12 R13, R21 R22 R23, R31 R32 R33, R41 R42 R43, and the corresponding basic P-trees are P11 P12 P13, P21 P22 P23, P31 P32 P33, P41 P42 P43.
1-dimensional P-trees are built by recording the truth of the predicate "pure 1" recursively on halves, until there is purity. For P11:
1. The whole file is not pure1: 0
2. The 1st half is not pure1: 0
3. The 2nd half is not pure1: 0
4. The 1st half of the 2nd half is not pure1: 0
5. The 2nd half of the 2nd half is pure1: 1
6. The 1st half of the 1st half of the 2nd half is pure1: 1
7. The 2nd half of the 1st half of the 2nd half is not pure1: 0
When a half is not pure1 but is pure (pure0), that branch ends.
E.g., to count the occurrences of a given tuple, AND the basic P-trees, using the complement (') wherever the tuple has a 0 bit: P11^P12^P13^P'21^P'22^P'23^P'31^P'32^P33^P41^P'42^P'43 (the bit pattern 111 000 001 100).
[Diagram: bit columns of R11 and the per-level counts of the ANDed P-tree; values not recoverable from the transcript.]
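The recursive "pure 1" construction can be sketched as follows. This is a minimal illustration, assuming a nested-tuple tree representation and an 8-bit example column chosen for the sketch; `build_ptree` and `root_count` are hypothetical helper names, not the authors' implementation.

```python
def build_ptree(bits):
    """Build a 1-dimensional pure-1 tree over a list of 0/1 values.

    A leaf is (1,) for a pure-1 segment or (0,) for a pure-0 segment.
    A mixed segment becomes (0, left_subtree, right_subtree).
    """
    if all(b == 1 for b in bits):
        return (1,)
    if all(b == 0 for b in bits):
        return (0,)  # not pure1, but pure0: the branch ends here
    mid = len(bits) // 2
    return (0, build_ptree(bits[:mid]), build_ptree(bits[mid:]))

def root_count(tree, size):
    """Number of 1 bits represented by a tree that covers `size` positions."""
    if len(tree) == 1:
        return size * tree[0]
    half = size // 2
    return root_count(tree[1], half) + root_count(tree[2], size - half)

# Illustrative bit column (an assumption for this sketch).
column = [0, 0, 0, 0, 1, 0, 1, 1]
tree = build_ptree(column)
ones = root_count(tree, len(column))
```

The first half of this column is pure0, so it is stored as a single leaf and never split further, which is where the compression comes from.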

5 Algorithms of Aggregate Function Computation Using P-trees
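We use the data in relation Sales (Table 1) to illustrate the aggregate function algorithms. Cells marked ? did not survive in the transcript.
Id | Mon | Loc | Type | On line | # Product
1 | Jan | New York | Notebook | Y | 10
2 | ? | Minneapolis | Desktop | N | 5
3 | Feb | ? | Printer | ? | 6
4 | Mar | ? | ? | ? | 7
5 | ? | ? | ? | ? | 11
6 | ? | Chicago | ? | ? | 9
7 | Apr | ? | Fax | ? | 3
Table 1. Relation Sales.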

6 Algorithms of Aggregate Function Computation Using P-trees (Cont.)
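Table 2 shows the binary representation of the data in relation Sales. Each attribute is split into bit slices Pj,i (attribute j, bit position i). Cells marked ? did not survive in the transcript.
Id | Mon (P0,3 P0,2 P0,1 P0,0) | Loc (P1,4 P1,3 P1,2 P1,1 P1,0) | Type (P2,2 P2,1 P2,0) | On line (P3,0) | # Product (P4,3 P4,2 P4,1 P4,0)
1 | 0001 | 00001 | 001 | ? | 1010
2 | ? | 00101 | 010 | ? | 0101
3 | 0010 | ? | 100 | ? | 0110
4 | 0011 | ? | ? | ? | 0111
5 | ? | ? | ? | ? | 1011
6 | ? | 00110 | ? | ? | 1001
7 | 0100 | ? | 101 | ? | 0011
Table 2. Binary Form of Sales.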
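The vertical decomposition of a numeric column into bit slices can be sketched like this. The sketch uses the # Product values of relation Sales (10, 5, 6, 7, 11, 9, 3) and stores each slice as a plain, uncompressed Python list, which is an assumption standing in for a compressed basic P-tree; `bit_slices` is a hypothetical helper name.

```python
def bit_slices(values, width):
    """Return slices where slices[i][r] is bit i (0 = least significant)
    of values[r]. slices[3] plays the role of P4,3 and slices[0] of P4,0."""
    return [[(v >> i) & 1 for v in values] for i in range(width)]

products = [10, 5, 6, 7, 11, 9, 3]  # the # Product column of relation Sales
slices = bit_slices(products, 4)
```

Each slice has one bit per tuple, so every later aggregate can be computed from the slices alone without revisiting the horizontal records.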

7 Algorithm of Aggregate Function COUNT
COUNT function: No special function is needed for COUNT, because the P-tree RootCount function already provides the mechanism to implement it. Given a P-tree Pi, RootCount(Pi) returns the number of 1s in Pi. The slide repeats the binary form of relation Sales (Table 2) for reference.
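A minimal sketch of COUNT through root counts, assuming uncompressed bit lists in place of P-trees. The two slices are bits 3 and 1 of the # Product values 10, 5, 6, 7, 11, 9, 3; the helper names `root_count` and `p_and` are hypothetical.

```python
def root_count(bits):
    """Number of 1s in a bit slice (the role RootCount plays for a P-tree)."""
    return sum(bits)

def p_and(left, right):
    """Position-wise logical AND of two bit slices of equal length."""
    return [a & b for a, b in zip(left, right)]

p43 = [1, 0, 0, 0, 1, 1, 0]  # bit 3 of each # Product value
p41 = [1, 0, 1, 1, 1, 0, 1]  # bit 1 of each # Product value

count_bit3 = root_count(p43)             # tuples with bit 3 set
count_both = root_count(p_and(p43, p41)) # tuples with bits 3 and 1 both set
```

Counting tuples that satisfy a conjunction of bit conditions reduces to one AND followed by one root count.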

8 Algorithm of Aggregate Function SUM
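SUM function: The SUM function totals a field of numerical values. In the algorithm below, Pi is the basic P-tree for bit position i of the field and n is the highest bit position.
Algorithm 4.1. Evaluating sum() with P-trees.
total = 0.00;
For i = 0 to n {
  total = total + 2^i * RootCount(Pi);
}
Return total
Algorithm 4.1. Sum Aggregate.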

9 Algorithm of Aggregate Function SUM
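For example, to find the total number of products sold in relation Sales, take the root count of each bit slice of # Product and weight it by its bit position:
# Product | P4,3 | P4,2 | P4,1 | P4,0
10 | 1 | 0 | 1 | 0
5 | 0 | 1 | 0 | 1
6 | 0 | 1 | 1 | 0
7 | 0 | 1 | 1 | 1
11 | 1 | 0 | 1 | 1
9 | 1 | 0 | 0 | 1
3 | 0 | 0 | 1 | 1
RootCount | {3} | {3} | {5} | {5}
2^3 * {3} + 2^2 * {3} + 2^1 * {5} + 2^0 * {5} = 51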
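The SUM computation over bit slices can be sketched as below, again assuming uncompressed bit lists in place of P-trees. The function name `p_sum` is hypothetical; the data are the # Product values of relation Sales.

```python
def p_sum(slices):
    """Sum of a column given its bit slices: sum over i of 2^i * RootCount(P_i).
    slices[i] holds bit i (0 = least significant) of every value."""
    return sum((1 << i) * sum(bits) for i, bits in enumerate(slices))

products = [10, 5, 6, 7, 11, 9, 3]
# Build the four bit slices of the column (bit 0 first).
slices = [[(v >> i) & 1 for v in products] for i in range(4)]

root_counts = [sum(bits) for bits in slices]  # bit 0 .. bit 3
total = p_sum(slices)
```

The result agrees with summing the values directly, because each value contributes 2^i once for every bit position i where it has a 1.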

10 Algorithm of Aggregate Function AVERAGE
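Average function: The AVERAGE function returns the mean value of a field. It can be calculated from the COUNT and SUM functions: Average() = Sum() / Count(). For # Product in relation Sales, with a sum of 51 over 7 tuples, this gives 51 / 7, or about 7.29.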

11 Algorithm of Aggregate Function MAX
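Max function: The MAX function returns the largest value in a field. The algorithm scans bit positions from most to least significant, keeping in Pc the tuples that are still candidates for the maximum.
Algorithm 4.2. Evaluating max() with P-trees.
max = 0.00; c = 0; Pc is set to all 1s
For i = n to 0 {
  c = RootCount(Pc AND Pi);
  If (c >= 1) {
    Pc = Pc AND Pi;
    max = max + 2^i;
  }
}
Return max;
Algorithm 4.2. Max Aggregate.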

12 Algorithm of Aggregate Function MAX
Steps, using bit slices P4,3 P4,2 P4,1 P4,0 of the # Product values 10, 5, 6, 7, 11, 9, 3:
1. Pc = P4,3; RootCount(Pc) = 3 >= 1, so the bit is {1}.
2. RootCount(Pc AND P4,2) = 0 < 1, so Pc = Pc AND P'4,2 and the bit is {0}.
3. RootCount(Pc AND P4,1) = 2 >= 1, so Pc = Pc AND P4,1 and the bit is {1}.
4. RootCount(Pc AND P4,0) = 1 >= 1, so the bit is {1}.
2^3 * {1} + 2^2 * {0} + 2^1 * {1} + 2^0 * {1} = 11
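The MAX walk-through can be sketched in Python, assuming uncompressed bit lists in place of P-trees. The function name `p_max` is hypothetical. Both the mask update and the addition of 2^i happen only when the root count is at least 1.

```python
def p_max(slices):
    """Maximum of a column from its bit slices.
    slices[i] holds bit i (0 = least significant) of every value."""
    pc = [1] * len(slices[0])  # candidate mask, initially every tuple
    result = 0
    for i in range(len(slices) - 1, -1, -1):  # most significant bit first
        candidate = [a & b for a, b in zip(pc, slices[i])]
        if sum(candidate) >= 1:
            # Some candidate has bit i set, so the maximum has it too.
            pc = candidate
            result += 1 << i
    return result

products = [10, 5, 6, 7, 11, 9, 3]
slices = [[(v >> i) & 1 for v in products] for i in range(4)]
largest = p_max(slices)
```

When no candidate has bit i set, the mask is left unchanged; this matches step 2 of the walk-through, where ANDing with the complement slice removes nothing.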

13 Algorithm of Aggregate Function MIN
Min function: The MIN function returns the smallest value in a field. The algorithm mirrors MAX, but at each bit position it prefers candidates whose bit is 0.
Algorithm. Evaluating min() with P-trees.
min = 0.00; c = 0; Pc is set to all 1s
For i = n to 0 {
  c = RootCount(Pc AND NOT(Pi));
  If (c >= 1)
    Pc = Pc AND NOT(Pi);
  Else
    min = min + 2^i;
}
Return min;
Algorithm. Min Aggregate.

14 Algorithm of Aggregate Function MIN
Steps, using bit slices P4,3 P4,2 P4,1 P4,0 of the # Product values 10, 5, 6, 7, 11, 9, 3:
1. Pc = P'4,3; RootCount(Pc) = 4 >= 1, so the bit is {0}.
2. RootCount(Pc AND P'4,2) = 1 >= 1, so Pc = Pc AND P'4,2 and the bit is {0}.
3. RootCount(Pc AND P'4,1) = 0 < 1, so Pc = Pc AND P4,1 and the bit is {1}.
4. RootCount(Pc AND P'4,0) = 0 < 1, so the bit is {1}.
2^3 * {0} + 2^2 * {0} + 2^1 * {1} + 2^0 * {1} = 3
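A matching sketch for MIN, under the same assumption of uncompressed bit lists standing in for P-trees; `p_min` is a hypothetical name.

```python
def p_min(slices):
    """Minimum of a column from its bit slices.
    slices[i] holds bit i (0 = least significant) of every value."""
    pc = [1] * len(slices[0])  # candidate mask, initially every tuple
    result = 0
    for i in range(len(slices) - 1, -1, -1):  # most significant bit first
        # Candidates whose bit i is 0 (AND with the complemented slice).
        candidate = [a & (1 - b) for a, b in zip(pc, slices[i])]
        if sum(candidate) >= 1:
            pc = candidate    # the minimum has a 0 at bit i
        else:
            result += 1 << i  # every remaining candidate has a 1 at bit i
    return result

products = [10, 5, 6, 7, 11, 9, 3]
slices = [[(v >> i) & 1 for v in products] for i in range(4)]
smallest = p_min(slices)
```

In the else branch the mask needs no update, since every remaining candidate already has bit i set.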

15 Performance Analysis
Figure 15. Iceberg query with multi-attribute aggregation: performance time comparison.

16 Performance Analysis
Our experiments were implemented in C++ on a 1 GHz Pentium PC with 1 GB of main memory running Red Hat Linux. Figure 15 compares the running time of the P-tree method and the bitmap method on a multi-attribute iceberg query. In this case, the P-tree method was substantially faster.

17 Conclusion
We believe our study confirms that the P-tree approach is superior to the bitmap approach for aggregation of all types and for multi-attribute iceberg queries. It also indicates two advantages of basic P-tree representations of files. First, there is no need for redundant, auxiliary structures. Second, basic P-trees are good at calculating multi-attribute aggregations and aggregates over numeric values, and they are fair to all attributes.

