Aggregate Function Computation and Iceberg Querying in Vertical Databases
Yue (Jenny) Cui and William Perrizo, North Dakota State University

Outline
- Introduction
- Review of Aggregate Functions
- Review of P-trees: vertical, compressed, data-mining-ready data structures
- Algorithms of Aggregate Function Computation Using P-trees
    - SUM, COUNT, and AVERAGE
    - MAX, MIN, MEDIAN, RANK, and TOP-K
- Performance Analysis
- Conclusion

Introduction
Commonly used aggregate functions include COUNT, SUM, AVERAGE, MIN, MAX, MEDIAN, RANK, and TOP-K. Iceberg queries compute aggregate functions over groups formed by one or more attributes and then eliminate the groups whose aggregate values fall below a specified threshold T. An example iceberg query:

    SELECT Location, Product Type, SUM(# Product)
    FROM Sales
    GROUP BY Location, Product Type
    HAVING SUM(# Product) >= T
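To pin down the semantics of the example query, here is a minimal horizontal (row-at-a-time) sketch in Python. It is only a reference for what the query returns, not the P-tree method; the function name iceberg and the rows are hypothetical illustrations, not the paper's data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def iceberg(rows: Iterable[Tuple[str, str, int]],
            threshold: int) -> Dict[Tuple[str, str], int]:
    """GROUP BY (location, product type), SUM the counts, keep groups with SUM >= threshold."""
    sums: Dict[Tuple[str, str], int] = defaultdict(int)
    for loc, ptype, n in rows:
        sums[(loc, ptype)] += n
    # HAVING clause: drop every group below the threshold
    return {group: total for group, total in sums.items() if total >= threshold}

# Hypothetical rows: (Location, Product Type, # Product)
rows = [("New York", "Notebook", 10), ("Minneapolis", "Desktop", 5),
        ("New York", "Notebook", 7), ("Chicago", "Desktop", 9)]
print(iceberg(rows, 15))  # {('New York', 'Notebook'): 17}
```

Only the tip of the "iceberg" survives the threshold; the point of the vertical approach discussed below is to get the same answer without scanning rows one at a time.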

A file, R(A1..An), contains horizontal structures (horizontal records) that are processed vertically (vertical scans). P-trees instead vertically partition the file, compress each vertical bit slice into a basic P-tree, and then process these basic P-trees horizontally using one multi-operand logical AND. For example, R(A1 A2 A3 A4) is partitioned into R[A1], R[A2], R[A3], R[A4]; with three bits per attribute this gives the bit slices R11, R12, R13, R21, ..., R43 and the basic P-trees P11, P12, P13, P21, ..., P43.
1-dimensional P-trees are built by recording the truth of the predicate "pure 1" recursively on halves, until there is purity. For P11:
1. The whole file is not pure1: 0
2. The 1st half is not pure1: 0. But it is pure (pure0), so this branch ends.
3. The 2nd half is not pure1: 0
4. The 1st half of the 2nd half is not pure1: 0
5. The 2nd half of the 2nd half is pure1: 1
6. The 1st half of the 1st half of the 2nd half is pure1: 1
7. The 2nd half of the 1st half of the 2nd half is not pure1: 0
E.g., to count the records matching the bit pattern 111 000 001 100, take the "pure 1" count of the multi-operand AND P11 ^ P12 ^ P13 ^ P'21 ^ P'22 ^ P'23 ^ P'31 ^ P'32 ^ P33 ^ P41 ^ P'42 ^ P'43, where P' denotes the complement of P.
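The recursive "pure 1" construction can be sketched in a few lines of Python. This is a minimal sketch under an assumed, hypothetical representation: a node is 1 (pure-1 segment), 0 (pure-0 segment), or a (left, right) pair for a mixed segment. The names build, root_count, p_not and p_and are illustrative, not the authors' implementation, and the 8-record column 0 0 0 0 1 0 1 1 is an assumed R11 for which the construction follows steps 1 to 7.

```python
from typing import List, Tuple, Union

# Hypothetical node representation: 1 = pure-1 segment, 0 = pure-0 segment,
# or a (left, right) pair for a mixed segment that is split in half.
PTree = Union[int, Tuple["PTree", "PTree"]]

def build(bits: List[int]) -> PTree:
    """Record the 'pure 1' predicate recursively on halves until each segment is pure."""
    if all(b == 1 for b in bits):
        return 1
    if all(b == 0 for b in bits):
        return 0  # pure0: this branch ends
    mid = len(bits) // 2
    return (build(bits[:mid]), build(bits[mid:]))

def root_count(t: PTree, size: int) -> int:
    """Number of 1 bits represented by a tree covering `size` records."""
    if t == 1:
        return size
    if t == 0:
        return 0
    mid = size // 2
    return root_count(t[0], mid) + root_count(t[1], size - mid)

def p_not(t: PTree) -> PTree:
    """Complement (P'): swap pure-1 and pure-0 leaves."""
    if t == 1:
        return 0
    if t == 0:
        return 1
    return (p_not(t[0]), p_not(t[1]))

def p_and(a: PTree, b: PTree) -> PTree:
    """AND of two trees over the same records; a pure-0 operand ends the branch."""
    if a == 0 or b == 0:
        return 0
    if a == 1:
        return b
    if b == 1:
        return a
    left, right = p_and(a[0], b[0]), p_and(a[1], b[1])
    if left == 0 and right == 0:
        return 0  # re-collapse a segment that became pure 0
    return (left, right)

p11 = build([0, 0, 0, 0, 1, 0, 1, 1])  # assumed 8-record bit column R11
print(p11)                 # (0, ((1, 0), 1))
print(root_count(p11, 8))  # 3
```

A multi-operand AND such as the pattern count above is then a fold of p_and over the (possibly complemented) basic trees, followed by one root_count.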

Algorithms of Aggregate Function Computation Using P-trees
The dataset used in our examples: we use the data in relation Sales (Table 1) to illustrate the algorithms for aggregate functions.

Table 1. Relation Sales.
Id | Mon | Loc         | Type     | On line | # Product
1  | Jan | New York    | Notebook | Y       | 10
2  | Jan | Minneapolis | Desktop  | N       | 5
3  | Feb | …           | Printer  | …       | 6
4  | Mar | …           | …        | …       | 7
5  | …   | …           | …        | …       | 11
6  | …   | Chicago     | …        | …       | 9
7  | Apr | …           | Fax      | …       | 3

Algorithms of Aggregate Function Computation Using P-trees (Cont.)
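Table 2 shows the binary representation of the data in relation Sales. Each attribute is bit-sliced, and Pi,j denotes the basic P-tree for bit j of attribute i (Mon = 0, Loc = 1, Type = 2, On line = 3, # Product = 4).

Table 2. Binary Form of Sales.
Id | Mon (P0,3..P0,0) | Loc (P1,4..P1,0) | Type (P2,2..P2,0) | On line (P3,0) | # Product (P4,3..P4,0)
1  | 0001             | 00001            | 001               | …              | 1010
2  | 0001             | 00101            | 010               | …              | 0101
3  | 0010             | …                | 100               | …              | 0110
4  | 0011             | …                | …                 | …              | 0111
5  | …                | …                | …                 | …              | 1011
6  | …                | 00110            | …                 | …              | 1001
7  | 0100             | …                | 101               | …              | 0011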
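Bit-slicing the # Product column can be sketched as follows. This is a simplified, assumed stand-in: each slice is an uncompressed Python int whose bit r holds bit i of row r, whereas real basic P-trees are compressed trees. The helper name bit_slices is hypothetical.

```python
from typing import List

def bit_slices(values: List[int], width: int) -> List[int]:
    """slices[i] is a bit vector whose bit r is bit i of values[r]."""
    return [sum(1 << r for r, v in enumerate(values) if (v >> i) & 1)
            for i in range(width)]

product = [10, 5, 6, 7, 11, 9, 3]  # # Product column, rows 1..7
p4 = bit_slices(product, 4)        # p4[i] stands in for P4,i
for i in reversed(range(4)):
    # Print each slice in row order 1..7, like a column of Table 2
    print(f"P4,{i}: {format(p4[i], '07b')[::-1]}")
# P4,3: 1000110
# P4,2: 0111000
# P4,1: 1011101
# P4,0: 0101111
```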

Algorithm of Aggregate Function COUNT
COUNT function: There is no need to write a special function for COUNT, because the P-tree RootCount function already provides the mechanism. Given a P-tree Pi, RootCount(Pi) returns the number of 1s in Pi. For example, RootCount(P4,3) = 3, since three # Product values (10, 11, and 9) have bit 3 set.
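A minimal sketch of COUNT through RootCount, again assuming uncompressed int bit vectors in place of compressed P-trees (the helpers bit_slices, root_count and count_value are hypothetical). COUNT(*) is the root count of an all-ones vector, and counting a specific value ANDs every slice, complemented where the value's bit is 0.

```python
def bit_slices(values, width):
    """slices[i] has bit r set when bit i of values[r] is 1."""
    return [sum(1 << r for r, v in enumerate(values) if (v >> i) & 1)
            for i in range(width)]

def root_count(p):
    """Number of 1 bits in p, mirroring RootCount on a basic P-tree."""
    return bin(p).count("1")

def count_value(slices, target, n_rows):
    """COUNT of rows whose attribute equals target, via one multi-operand AND."""
    all_ones = (1 << n_rows) - 1
    mask = all_ones                      # start with every row
    for i, s in enumerate(slices):
        complement = all_ones ^ s        # P' for this bit position
        mask &= s if (target >> i) & 1 else complement
    return root_count(mask)

product = [10, 5, 6, 7, 11, 9, 3]
p4 = bit_slices(product, 4)
n_rows = len(product)
print(root_count((1 << n_rows) - 1))  # COUNT(*): 7
print(count_value(p4, 7, n_rows))     # rows with # Product = 7: 1
```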

Algorithm of Aggregate Function SUM
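SUM function: The SUM function totals a field of numerical values. When the field is stored as bit slices, every 1 in slice Pi contributes 2^i to the total, so the sum is a weighted sum of the slices' root counts.
Algorithm 4.1. Evaluating sum() with P-trees (Sum Aggregate).
    total = 0.00;
    For i = 0 to n {
        total = total + 2^i * RootCount(Pi);
    }
    Return total;
Here Pi is the basic P-tree for bit i of the field, and n is the position of its most significant bit.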
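Algorithm 4.1 translates almost line for line into the following sketch. It assumes uncompressed int bit vectors as stand-ins for P-trees and uses integers rather than the 0.00 float start value; the helper names are hypothetical.

```python
def bit_slices(values, width):
    """slices[i] has bit r set when bit i of values[r] is 1."""
    return [sum(1 << r for r, v in enumerate(values) if (v >> i) & 1)
            for i in range(width)]

def root_count(p):
    """Number of 1 bits in p (RootCount)."""
    return bin(p).count("1")

def sum_from_slices(slices):
    """Algorithm 4.1: total = sum over i = 0..n of 2^i * RootCount(Pi)."""
    total = 0
    for i, s in enumerate(slices):   # i = 0 .. n
        total += (1 << i) * root_count(s)
    return total

product = [10, 5, 6, 7, 11, 9, 3]
print(sum_from_slices(bit_slices(product, 4)))  # 51
```

Note that no row values are ever reconstructed: the loop touches each slice once and only needs its count.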

Algorithm of Aggregate Function SUM
For example, to find the total number of products sold in relation Sales, take the # Product column (10, 5, 6, 7, 11, 9, 3) and its bit slices P4,3, P4,2, P4,1, P4,0, whose root counts are 3, 3, 5, and 5. Then
$2^3 \times 3 + 2^2 \times 3 + 2^1 \times 5 + 2^0 \times 5 = 24 + 12 + 10 + 5 = 51$.

Algorithm of Aggregate Function AVERAGE
Average function: The AVERAGE function returns the average value in a field. It can be calculated from the COUNT and SUM functions: AVERAGE() = SUM() / COUNT().
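Combining the two previous aggregates gives AVERAGE. In this sketch (uncompressed int bit vectors standing in for P-trees; helper names hypothetical), COUNT is taken as the root count of an all-ones vector over the field's rows, and an empty field is rejected because its average is undefined.

```python
def bit_slices(values, width):
    """slices[i] has bit r set when bit i of values[r] is 1."""
    return [sum(1 << r for r, v in enumerate(values) if (v >> i) & 1)
            for i in range(width)]

def root_count(p):
    """Number of 1 bits in p (RootCount)."""
    return bin(p).count("1")

def average_from_slices(slices, n_rows):
    """AVERAGE() = SUM() / COUNT(), both computed from root counts."""
    count = root_count((1 << n_rows) - 1)  # COUNT(*) over all rows
    if count == 0:
        raise ValueError("AVERAGE of an empty field is undefined")
    total = sum((1 << i) * root_count(s) for i, s in enumerate(slices))
    return total / count

product = [10, 5, 6, 7, 11, 9, 3]
print(average_from_slices(bit_slices(product, 4), len(product)))  # 51 / 7
```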

Algorithm of Aggregate Function MAX
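Max function: The MAX function returns the largest value in a field. Starting from the most significant bit, the algorithm keeps in Pc only the rows that can still hold the maximum: if any remaining row has a 1 in bit i, the maximum has a 1 there, and the rows with a 0 in that bit are discarded.
Algorithm 4.2. Evaluating max() with P-trees (Max Aggregate).
    max = 0.00;
    c = 0;
    Pc = all 1s;
    For i = n to 0 {
        c = RootCount(Pc AND Pi);
        If (c >= 1) {
            Pc = Pc AND Pi;
            max = max + 2^i;
        }
    }
    Return max;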
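A sketch of Algorithm 4.2 under the same assumption (uncompressed int bit vectors instead of compressed P-trees; helper names hypothetical). The update of Pc and the addition of 2^i happen together, only when some remaining row has bit i set.

```python
def bit_slices(values, width):
    """slices[i] has bit r set when bit i of values[r] is 1."""
    return [sum(1 << r for r, v in enumerate(values) if (v >> i) & 1)
            for i in range(width)]

def root_count(p):
    """Number of 1 bits in p (RootCount)."""
    return bin(p).count("1")

def max_from_slices(slices, n_rows):
    """Algorithm 4.2: scan bits from most to least significant, narrowing Pc."""
    if n_rows == 0:
        raise ValueError("MAX of an empty field is undefined")
    pc = (1 << n_rows) - 1                 # Pc starts as all 1s
    result = 0
    for i in reversed(range(len(slices))): # i = n down to 0
        candidates = pc & slices[i]        # Pc AND Pi
        if root_count(candidates) >= 1:
            pc = candidates                # keep only rows with bit i = 1
            result += 1 << i
    return result

product = [10, 5, 6, 7, 11, 9, 3]
print(max_from_slices(bit_slices(product, 4), len(product)))  # 11
```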

Algorithm of Aggregate Function MAX
Steps on the # Product column (10, 5, 6, 7, 11, 9, 3), with bit slices P4,3, P4,2, P4,1, P4,0:
1. Pc = P4,3; RootCount(Pc) = 3 >= 1, so bit 3 of max is 1.
2. RootCount(Pc AND P4,2) = 0 < 1, so Pc = Pc AND P'4,2 and bit 2 is 0.
3. RootCount(Pc AND P4,1) = 2 >= 1, so Pc = Pc AND P4,1 and bit 1 is 1.
4. RootCount(Pc AND P4,0) = 1 >= 1, so bit 0 is 1.
max = $2^3 \times 1 + 2^2 \times 0 + 2^1 \times 1 + 2^0 \times 1 = 11$.

Algorithm of Aggregate Function MIN
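Min function: The MIN function returns the smallest value in a field. It mirrors MAX using complemented slices: if any remaining row has a 0 in bit i, the minimum has a 0 there and Pc keeps only those rows; otherwise bit i of the minimum is 1.
Algorithm. Evaluating min() with P-trees (Min Aggregate).
    min = 0.00;
    c = 0;
    Pc = all 1s;
    For i = n to 0 {
        c = RootCount(Pc AND NOT(Pi));
        If (c >= 1)
            Pc = Pc AND NOT(Pi);
        Else
            min = min + 2^i;
    }
    Return min;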
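The MIN algorithm can be sketched the same way, assuming uncompressed int bit vectors for the P-trees (helper names hypothetical); NOT(Pi) is the slice XORed with an all-ones vector over the field's rows, so it never sets bits beyond the last row.

```python
def bit_slices(values, width):
    """slices[i] has bit r set when bit i of values[r] is 1."""
    return [sum(1 << r for r, v in enumerate(values) if (v >> i) & 1)
            for i in range(width)]

def root_count(p):
    """Number of 1 bits in p (RootCount)."""
    return bin(p).count("1")

def min_from_slices(slices, n_rows):
    """MIN: prefer rows with a 0 in each bit, from most to least significant."""
    if n_rows == 0:
        raise ValueError("MIN of an empty field is undefined")
    all_ones = (1 << n_rows) - 1
    pc = all_ones                              # Pc starts as all 1s
    result = 0
    for i in reversed(range(len(slices))):     # i = n down to 0
        candidates = pc & (all_ones ^ slices[i])  # Pc AND NOT(Pi)
        if root_count(candidates) >= 1:
            pc = candidates                    # keep rows with bit i = 0
        else:
            result += 1 << i                   # every remaining row has bit i = 1
    return result

product = [10, 5, 6, 7, 11, 9, 3]
print(min_from_slices(bit_slices(product, 4), len(product)))  # 3
```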

Algorithm of Aggregate Function MIN
Steps on the # Product column (10, 5, 6, 7, 11, 9, 3), with bit slices P4,3, P4,2, P4,1, P4,0:
1. Pc = P'4,3; RootCount(Pc) = 4 >= 1, so bit 3 of min is 0.
2. RootCount(Pc AND P'4,2) = 1 >= 1, so Pc = Pc AND P'4,2 and bit 2 is 0.
3. RootCount(Pc AND P'4,1) = 0 < 1, so Pc = Pc AND P4,1 and bit 1 is 1.
4. RootCount(Pc AND P'4,0) = 0 < 1, so bit 0 is 1.
min = $2^3 \times 0 + 2^2 \times 0 + 2^1 \times 1 + 2^0 \times 1 = 3$.

Performance Analysis
Figure 15. Iceberg query with multi-attribute aggregation: running-time comparison.

Performance Analysis
Our experiments were implemented in C++ on a 1 GHz Pentium PC with 1 GB of main memory running Red Hat Linux. Figure 15 compares the running times of the P-tree method and the bitmap method for a multi-attribute iceberg query. In this case, the P-tree method was substantially faster.

Conclusion
We believe our study confirms that the P-tree approach is superior to the bitmap approach for aggregation of all types and for multi-attribute iceberg queries. It also shows two advantages of basic P-tree representations of files. First, there is no need for redundant, auxiliary structures. Second, basic P-trees are good at calculating multi-attribute aggregations over numeric values, and they are fair to all attributes.
