A Thin Monitoring Layer for Top-k Aggregation Queries over a Database
Foteini Alvanaki, Sebastian Michel
Saarland University
DBRank 2013, Riva Del Garda, Italy

Data Cube sum(price*quantity)

Data Cube

    Brand    Product Type   Country    sum(Price*Quantity)
    Brand1   Type1          Country1   1234
    Brand1   Type2          Country1   3522
    Brand1   Type1          -          1234
    Brand1   Type2          -          3522
    Brand1   -              Country1   4756
    -        Type1          Country1   1234
    -        Type2          Country1   3522
    Brand1   -              -          4756
    -        Type1          -          1234
    -        Type2          -          3522
    -        -              Country1   4756

1. What are the top-2 product types with the highest revenue of each brand in each country?
2. What are the top-2 brands with the highest revenue in each country?
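The aggregated rows of the cube can be derived from the two finest-grained rows. The following is a minimal sketch of that derivation; the names FACTS, DIMS and data_cube are illustrative and do not come from the slides, and None stands for a dimension that has been aggregated away.

```python
from collections import defaultdict
from itertools import combinations

# Finest-grained rows from the slide: (brand, product type, country, revenue).
FACTS = [
    ("Brand1", "Type1", "Country1", 1234),
    ("Brand1", "Type2", "Country1", 3522),
]
DIMS = ("brand", "type", "country")


def data_cube(facts):
    """Sum revenue for every non-empty subset of the dimensions."""
    cube = defaultdict(int)
    for *values, revenue in facts:
        for size in range(1, len(DIMS) + 1):
            for kept in combinations(range(len(DIMS)), size):
                # None marks a dimension that is aggregated away.
                key = tuple(values[i] if i in kept else None
                            for i in range(len(DIMS)))
                cube[key] += revenue
    return dict(cube)


cube = data_cube(FACTS)
```

Under these assumptions the sketch yields the same eleven rows as the table, for example 4756 for Brand1 in Country1 across all product types.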

Top-k Queries
Primary Attribute: the attribute/dimension over which the selection is performed (e.g. product type)
Secondary Attributes: used to filter specific results (e.g. brand, country)
Aggregated Attributes: used to compute an aggregated score (e.g. price, quantity)
Aggregate Function: e.g. sum
One top-k query for each combination of secondary attribute instances (filtering condition)

Filtering Conditions: Example (1)
brand = {X}, country = {Y, W}
Conditions: brand=X AND country=Y; brand=X AND country=W

    SELECT type, SUM(price*quantity) FROM relation
    WHERE brand=X AND country=Y
    GROUP BY type ORDER BY SUM(price*quantity) DESC LIMIT K

    SELECT type, SUM(price*quantity) FROM relation
    WHERE brand=X AND country=W
    GROUP BY type ORDER BY SUM(price*quantity) DESC LIMIT K

Filtering Conditions: Example (2)
country = {Y, W}, brand = {X}
Conditions: country=Y; country=W; brand=X

    SELECT type, SUM(price*quantity) FROM relation
    WHERE country=Y
    GROUP BY type ORDER BY SUM(price*quantity) DESC LIMIT K

    SELECT type, SUM(price*quantity) FROM relation
    WHERE country=W
    GROUP BY type ORDER BY SUM(price*quantity) DESC LIMIT K

    SELECT type, SUM(price*quantity) FROM relation
    WHERE brand=X
    GROUP BY type ORDER BY SUM(price*quantity) DESC LIMIT K

Filtering Conditions: Example (3)
country = {Y, W}, brand = {X}
No condition on either secondary attribute

    SELECT type, SUM(price*quantity) FROM relation
    GROUP BY type ORDER BY SUM(price*quantity) DESC LIMIT K
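The three examples together cover every way of either fixing a secondary attribute to one of its values or leaving it unrestricted. A small sketch of that enumeration follows; the function names filtering_conditions and to_sql are hypothetical, and the generated SQL text is meant for display only (values are interpolated without escaping).

```python
from itertools import product


def filtering_conditions(domains):
    """Return one condition per combination of (value or unrestricted)."""
    names = list(domains)
    # Each attribute is fixed to one of its values or left unrestricted (None).
    choices = [list(domains[name]) + [None] for name in names]
    conditions = []
    for combo in product(*choices):
        conditions.append({n: v for n, v in zip(names, combo) if v is not None})
    return conditions


def to_sql(condition, k):
    """Render the top-k query for one filtering condition (display only)."""
    where = " AND ".join(f"{attr}='{value}'" for attr, value in condition.items())
    return ("SELECT type, SUM(price*quantity) FROM relation "
            + (f"WHERE {where} " if where else "")
            + f"GROUP BY type ORDER BY SUM(price*quantity) DESC LIMIT {k}")


conditions = filtering_conditions({"brand": ["X"], "country": ["Y", "W"]})
```

For brand = {X} and country = {Y, W} this produces six conditions: two from Example (1), three from Example (2) and the empty condition of Example (3).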

Updates
Insertions into the underlying database that contain all information related to the top-k queries

    INSERT INTO relation (type, brand, country, price, quantity)
    VALUES (T, X, Y, 100, 3)

Problem How to maintain all these queries in the presence of fast updates?

Outline
Setting/Problem
Algorithms
– Naïve Approach
– Estimates Approach
– Groups Approach
Experimental Results
Conclusions

Example

    SELECT type, SUM(price*quantity) FROM relation
    WHERE brand=X AND country=Y
    GROUP BY type ORDER BY SUM(price*quantity) DESC LIMIT 2

Update: (type, X, Y, 300)

Naïve Approach
Case 1: type in the top-2, e.g. (B,X,Y,300)

    Before            After (+300 on B)
    Type  Score       Type  Score
    A     3452        A     3452
    B     2406        B     2706

Case 2: type NOT in the top-2, e.g. (K,X,Y,300)
Verification Query:

    SELECT type, SUM(price*quantity) FROM relation
    WHERE brand=X AND country=Y AND type=K
    GROUP BY type
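The two cases of the Naïve Approach can be sketched as follows, using Python's built-in sqlite3 module. The function name naive_update, the in-memory dictionary ranking and the replacement rule for the weakest entry are assumptions made for illustration; the slides only state that a verification query is issued when the type is not in the top-k.

```python
import sqlite3


def naive_update(conn, ranking, update, k=2):
    """Apply one insertion to the ranking for brand=X AND country=Y.

    `ranking` maps each top-k type to its exact score.
    Returns True when a verification query had to be sent to the database.
    """
    type_, brand, country, price, quantity = update
    conn.execute(
        "INSERT INTO relation (type, brand, country, price, quantity) "
        "VALUES (?, ?, ?, ?, ?)", update)
    if type_ in ranking:
        # Case 1: the score is known exactly, so add the inserted value.
        ranking[type_] += price * quantity
        return False
    # Case 2: the exact score is unknown, so ask the database.
    _, score = conn.execute(
        "SELECT type, SUM(price*quantity) FROM relation "
        "WHERE brand=? AND country=? AND type=? GROUP BY type",
        (brand, country, type_)).fetchone()
    if len(ranking) < k:
        ranking[type_] = score
    else:
        weakest = min(ranking, key=ranking.get)
        if score > ranking[weakest]:
            # Assumed rule: the verified type replaces the weakest entry.
            del ranking[weakest]
            ranking[type_] = score
    return True
```

Every update for a type outside the top-k costs one database round trip, which is the overhead the other two approaches try to avoid.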

Estimates Approach
In-memory Structures
– top-(k+N) instances with exact aggregated scores
– B instances with estimated aggregated scores: best possible score (basic score) + inserted values

    top-5 (exact scores; A and B form the top-2)
    Type  Score
    A     3452
    B     2406
    C     2356
    D     2167
    E     1987

    Buffer (estimated scores)
    Type  Score
    O     1990
    P     2112
    Q     2076
    R     1997

Estimates Approach
Case 1.1: type in the top-2, e.g. (B,X,Y,300)

    Before            After (+300 on B)
    Type  Score       Type  Score
    A     3452        A     3452
    B     2406        B     2706
    C     2356        C     2356
    D     2167        D     2167
    E     1987        E     1987

The top-2 (A, B) is unchanged.

Estimates Approach
Case 1.2: type in the top-5, e.g. (D,X,Y,300)

    Before            After (+300 on D)    Re-ranked
    Type  Score       Type  Score          Type  Score
    A     3452        A     3452           A     3452
    B     2406        B     2406           D     2467
    C     2356        C     2356           B     2406
    D     2167        D     2467           C     2356
    E     1987        E     1987           E     1987

D replaces B in the top-2.
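Cases 1.1 and 1.2 never touch the database, because the score of the updated type is already known exactly. A minimal sketch, assuming the top-(k+N) structure is held as a dictionary from type to exact score (the name apply_exact_update is hypothetical):

```python
def apply_exact_update(exact, type_, delta, k=2):
    """Case 1: add the inserted value to an exact score and re-rank.

    `exact` maps each of the top-(k+N) types to its exact score.
    Returns the re-ranked (type, score) list and the new top-k types.
    """
    exact[type_] += delta
    ranked = sorted(exact.items(), key=lambda item: item[1], reverse=True)
    return ranked, [t for t, _ in ranked[:k]]


exact = {"A": 3452, "B": 2406, "C": 2356, "D": 2167, "E": 1987}
ranked, top2 = apply_exact_update(exact, "D", 300)
```

With the values from the slide, D moves to 2467 and the top-2 becomes A, D.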

Estimates Approach
Case 2: type in the Buffer, e.g. (P,X,Y,300)
The top-5 is unchanged (A 3452, B 2406, C 2356, D 2167, E 1987).

    Buffer before     Buffer after (+300 on P)
    Type  Score       Type  Score
    O     1990        O     1990
    P     2112        P     2412
    Q     2076        Q     2076
    R     1997        R     1997

Verification Query:

    SELECT type, SUM(price*quantity) FROM relation
    WHERE brand=X AND country=Y AND type=P
    GROUP BY type

Estimates Approach
Sub-case 2.1: score(P) < score(E), e.g. score(P) = 756
The top-5 is unchanged (A 3452, B 2406, C 2356, D 2167, E 1987).
P is removed from the Buffer:

    Buffer with verified score    Buffer after
    Type  Score                   Type  Score
    O     1990                    O     1990
    P     756                     Q     2076
    Q     2076                    R     1997
    R     1997

Estimates Approach
Sub-case 2.2: score(P) > score(E), e.g. score(P) = 2178
P enters the top-5 and E drops out; P is removed from the Buffer.

    top-5 before      top-5 after
    Type  Score       Type  Score
    A     3452        A     3452
    B     2406        B     2406
    C     2356        C     2356
    D     2167        P     2178
    E     1987        D     2167

    Buffer after
    Type  Score
    O     1990
    Q     2076
    R     1997

Estimates Approach
Sub-case 2.3: score(P) > score(B), e.g. score(P) = 2407
P enters the top-2 and E drops out of the top-5; P is removed from the Buffer.

    top-5 before      top-5 after
    Type  Score       Type  Score
    A     3452        A     3452
    B     2406        P     2407
    C     2356        B     2406
    D     2167        C     2356
    E     1987        D     2167

    Buffer after
    Type  Score
    O     1990
    Q     2076
    R     1997
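The three sub-cases differ only in where the verified score of P lands relative to the exact scores. A sketch of that step, assuming both structures are dictionaries from type to score and that the verified score has already been returned by the verification query (the name resolve_buffer_hit is hypothetical):

```python
def resolve_buffer_hit(exact, buffer, type_, verified_score):
    """Apply the result of a verification query for a buffered type.

    `exact` holds the top-(k+N) exact scores, `buffer` the estimated ones.
    Returns the re-ranked list of (type, exact score) pairs.
    """
    # The exact score is now known, so the estimate is no longer needed.
    del buffer[type_]
    weakest = min(exact, key=exact.get)
    if verified_score > exact[weakest]:
        # Sub-cases 2.2 and 2.3: the type displaces the weakest exact entry.
        del exact[weakest]
        exact[type_] = verified_score
    # Sub-case 2.1 falls through: the exact structure stays as it was.
    return sorted(exact.items(), key=lambda item: item[1], reverse=True)
```

Whether the top-k itself changes (sub-case 2.3) can then be read off the first k entries of the returned list.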

Estimates Approach
Case 3: type NOT in in-memory structures, e.g. (T,X,Y,300)
Estimated Score(T) = basic score + 300 = 2287
Buffer Full: Reset Query

    SELECT type, SUM(price*quantity) FROM relation
    WHERE brand=X AND country=Y AND type IN (O,P,Q,R)
    GROUP BY type

Estimates Approach
Case 3: type NOT in in-memory structures, e.g. (T,X,Y,300)
Reset Query result: score(O)=1254, score(P)=432, score(Q)=2050, score(R)=1990

    top-5 before      top-5 after
    Type  Score       Type  Score
    A     3452        A     3452
    B     2406        B     2406
    C     2356        C     2356
    D     2167        D     2167
    E     1987        Q     2050

    Buffer before     Buffer after
    Type  Score       Type  Score
    O     1990        T     2287
    P     2112
    Q     2076
    R     1997
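Case 3 can be sketched as below. The callable fetch_exact is a hypothetical stand-in for the Reset Query, and the basic score is taken to be the lowest exact score in the top-(k+N) structure, which matches the slide's numbers (1987 + 300 = 2287).

```python
def handle_unseen(exact, buffer, type_, delta, capacity, fetch_exact):
    """Case 3: the updated type is in neither in-memory structure."""
    # Basic score: best possible score of a type without an exact score.
    basic = min(exact.values())
    if len(buffer) >= capacity:
        # Buffer full: fetch exact scores for all buffered types at once.
        for buffered, score in fetch_exact(list(buffer)).items():
            weakest = min(exact, key=exact.get)
            if score > exact[weakest]:
                del exact[weakest]
                exact[buffered] = score
        buffer.clear()
    # The new type enters the buffer with an estimated score.
    buffer[type_] = basic + delta


exact = {"A": 3452, "B": 2406, "C": 2356, "D": 2167, "E": 1987}
buffer = {"O": 1990, "P": 2112, "Q": 2076, "R": 1997}
reset_result = {"O": 1254, "P": 432, "Q": 2050, "R": 1990}
handle_unseen(exact, buffer, "T", 300, capacity=4,
              fetch_exact=lambda types: {t: reset_result[t] for t in types})
```

With the scores from the slide, Q replaces E in the top-5 and the buffer afterwards holds only T with an estimated score of 2287.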

Queries Characteristics
SAME primary attribute
SAME aggregate attributes
SAME aggregate function
SAME top-k condition
DIFFERENT filtering condition

Lattice organisation

Groups Approach
The updates are forwarded from top to bottom in the lattice
Each ranking forwards the results of its queries to the rankings in the lower levels of the lattice

Groups Approach: Example

    SELECT type, SUM(price*quantity) FROM relation
    WHERE brand=X
    GROUP BY type ORDER BY SUM(price*quantity) DESC LIMIT 2

Update: (type, X, Y, 300)
Ranking: brand=X, country=ANY

Groups Approach
Case 2: type in the Buffer, e.g. (P,X,Y,300)
The top-5 is unchanged (A 3452, B 2406, C 2356, D 2167, E 1987).

    Buffer before     Buffer after (+300 on P)
    Type  Score       Type  Score
    O     1990        O     1990
    P     2112        P     2412
    Q     2076        Q     2076
    R     1997        R     1997

Verification Query

Groups Approach
Case 2: type in the Buffer, e.g. (P,X,Y,300)
Verification Query:

    SELECT type, brand, country, price*quantity FROM relation
    WHERE brand=X AND type=P

Case 4: type NOT in in-memory structures, e.g. (T,X,Y,300)
Buffer Reset Query:

    SELECT type, brand, country, price*quantity FROM relation
    WHERE brand=X AND type IN (O,P,Q,R)

Groups Approach
A ranking's query returns tuples (type, brand, country, price*quantity), limited to those satisfying its filtering condition
The ranking uses them to compute the scores
It forwards them to the rankings lower in the lattice
Rankings receiving tuples use those that satisfy their own filtering condition to compute the scores

Groups Approach: Verification Query

    SELECT brand, country, price*quantity FROM relation
    WHERE brand=X AND type=P

Returns a set of (brand, country, price*quantity) tuples, limited to those that have brand=X
The ranking uses them to compute score(P)
It forwards them to the rankings lower in the lattice ({brand=X, country=Y}, {brand=X, country=W})
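The benefit of returning raw tuples instead of a single sum is that lower rankings can compute their own scores without another query. A sketch of that reuse follows; the tuple values are made up for illustration, and the names matches and scores_from_shared_result are hypothetical.

```python
def matches(condition, row):
    """True when the row satisfies every attribute=value pair of the condition."""
    return all(row[attr] == value for attr, value in condition.items())


def scores_from_shared_result(rows, rankings):
    """Compute one score per ranking from a single verification-query result."""
    return {name: sum(row["value"] for row in rows if matches(cond, row))
            for name, cond in rankings.items()}


# Illustrative result of the verification query for brand=X, type=P.
rows = [
    {"brand": "X", "country": "Y", "value": 1200},
    {"brand": "X", "country": "W", "value": 900},
    {"brand": "X", "country": "Y", "value": 300},
]
rankings = {
    "brand=X": {"brand": "X"},
    "brand=X,country=Y": {"brand": "X", "country": "Y"},
    "brand=X,country=W": {"brand": "X", "country": "W"},
}
scores = scores_from_shared_result(rows, rankings)
```

One database query thus serves the ranking for brand=X and both rankings below it in the lattice.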

Groups Approach
Case 4: type NOT in in-memory structures, e.g. (T,X,Y,300)
Estimated Score(T) = score(E) + 300 = 2287
Buffer Full: Reset Query

    SELECT type, brand, country, price*quantity FROM relation
    WHERE brand=X AND type IN (O,P,Q,R)

Outline
Problem
Algorithms
– Naïve Approach
– Estimates Approach
– Groups Approach
Experimental Results
Conclusions

Experiments (1)
TPC-H data
Select on part.p_partKey (200,000 unique values)
Filter on customer.c_mktsegment, orders.o_orderpriority and region.r_name
Aggregation sum on lineitem.l_quantity
216 total rankings
30,000 updates/insertions
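The figure of 216 rankings is consistent with a simple count, assuming each of the three filter attributes has five distinct values and each attribute is either fixed to one value or left unrestricted. Both assumptions are ours; the slide only gives the total.

```python
# Assumed number of distinct values per filter attribute.
values_per_attribute = {"c_mktsegment": 5, "o_orderpriority": 5, "r_name": 5}

total_rankings = 1
for distinct_values in values_per_attribute.values():
    # One extra choice per attribute for "no restriction".
    total_rankings *= distinct_values + 1
```

Under these assumptions the product is (5 + 1)^3, which equals the 216 rankings reported.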

Experiments (2)
Updates
– Random: inserts a quantity between 1 and 50 for a random part.p_partKey
– 80-20: inserts a quantity between 1 and 50 for a part.p_partKey selected according to the 80-20 rule
N-extra Gap
– Difference between the top-k and top-(k+N) scores
– 100% (1*50) and 200% (2*50), where 50 is the maximum inserted quantity
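The two update workloads could be generated roughly as follows. This sketch reads the 80-20 rule as "80% of the updates target the first 20% of the keys", which is an assumption; the slides do not say how the skew was implemented, and the function name make_update and its parameters are hypothetical.

```python
import random


def make_update(rng, part_keys, skewed, hot_fraction=0.2, hot_probability=0.8):
    """Return (part key, quantity) for one insertion."""
    quantity = rng.randint(1, 50)  # inclusive on both ends
    if not skewed:
        return rng.choice(part_keys), quantity
    hot_size = max(1, int(len(part_keys) * hot_fraction))
    if hot_size >= len(part_keys) or rng.random() < hot_probability:
        # Assumed 80-20 reading: most updates hit the small "hot" prefix.
        key = part_keys[rng.randrange(hot_size)]
    else:
        key = part_keys[rng.randrange(hot_size, len(part_keys))]
    return key, quantity


rng = random.Random(0)
keys = list(range(1000))
updates = [make_update(rng, keys, skewed=True) for _ in range(10000)]
```

Passing skewed=False gives the Random workload, where every key is equally likely.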

80-20 Updates: Queries

80-20 Updates: Time

Random Updates: Queries

Random Updates: Time

Naïve Approach
80-20 updates: 239,985 Verification Queries, 4 secs/update
Random updates: 239,977 Verification Queries, 4 secs/update

Outline
Problem
Algorithms
– Naïve Approach
– Estimates Approach
– Groups Approach
Experimental Results
Conclusions

Conclusion
Two algorithms that maintain top-k rankings in the presence of fast updates arriving at an underlying database
Exact top-k results
Both are faster than the Naïve Approach, and the Groups Approach further limits the communication with the database
Preliminary results provide insights into the impact of the various parameters on the effectiveness of our methods

Thank you!

Additional Instances
