Partial Query-Evaluation in Internet Query Engines. Jayavel Shanmugasundaram, Kristin Tufte, David DeWitt, David Maier, Jeffrey Naughton. University of Wisconsin.


1 Partial Query-Evaluation in Internet Query Engines. Jayavel Shanmugasundaram, Kristin Tufte, David DeWitt, David Maier, Jeffrey Naughton. University of Wisconsin & Oregon Graduate Institute

2 Outline Motivation Desired Operator Properties Implementation Alternatives Performance Evaluation Conclusion

3 Querying the WWW: The Present Who won the Nobel prize for Physics in 1999? Nobel prize physics 1999 www.google.com The Internet Search Engine HTML File

4 Querying the WWW: The Present Want 1998 Red BMW No accidents 20% < avg. model price The Internet Search Engine HTML File 1998 Red BMW price www.google.com HTML File

5 Querying the WWW: The Future? Want 1998 Red BMW No accidents 20% < avg. model price (Queryable) Data Source (Queryable) XML Sources The Internet Internet Query Engine XML Query Language XML Query Engine (e.g., Niagara) High-level Query Through GUI www.google+.com

6 Inside the Internet Query Engine (carId, model, price, otherinfo) Red Used BMW Cars (carId, model, price, otherinfo) Not Exists (carId, model, price, otherinfo) (model, avgprice) Group By Join model = model price <= 0.8 * avgprice Union (carId, model, price, otherinfo) Accident Reports (carId) Union (carId)

7 The Problem Return results to users as soon as possible “Results so far” for queries with blocking operators Arbitrary blocking operators –Not exists, Average, Nest … Blocking operators occurring anywhere in the query –Potentially intermixed with non-blocking operators

8 Outline Motivation Desired Operator Properties Implementation Alternatives Performance Evaluation Conclusion

9 What is a Partial Result of a Query? Let Full Result of Query Q on Inputs A and B be: –Q(A, B) Then Partial Result of Query Q on Inputs A and B is: –Q(PA, PB) –PA ⊆ A –PB ⊆ B
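
To make the definition concrete, here is a minimal Python sketch. It assumes Q is the "cars with no accident report" (Not Exists) step of the running example and that inputs are plain Python lists; the function name `q_not_exists` and the sample rows are illustrative and not taken from the talk. It shows that Q(PA, PB), computed over subsets of the inputs, can contain a tuple that is absent from Q(A, B).

```python
def q_not_exists(cars, accident_ids):
    """Return the cars whose carId has no accident report."""
    reported = set(accident_ids)
    return [car for car in cars if car[0] not in reported]

# Full inputs A (cars) and B (accident reports); the values are made up.
A = [(1, "Z3", 10000), (3, "400i", 20000), (5, "400i", 30000)]
B = [3]

# Partial inputs: PA is a subset of A and PB is a subset of B.
PA = A[:2]
PB = []

full_result = q_not_exists(A, B)
partial_result = q_not_exists(PA, PB)
```

Here car 3 appears in the partial result only because its accident report has not arrived yet, which is why partial results of such queries may need to be corrected later.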

10 Maximal Output Property Produce “correct” results as soon as possible Why? –If query is non-blocking Produces results soon –If query is blocking Return “non-blocking parts” soon (e.g., outer join)

11 Inside the Internet Query Engine (carId, model, price, otherinfo) Red 1998 BMW Cars Accident Reports (carId) (carId, model, price, otherinfo) Not Exists (carId, model, price, otherinfo) (model, avgprice) Group By Join model = model price <= 0.8 * avgprice Union (carId, model, price, otherinfo) Union (carId)

12 Anytime Property Blocking operators should be able to return the “result so far” at any time Why? –User can request partial results at any time
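
As a rough illustration of the anytime property, the following Python sketch shows a group-by average that can report its result so far whenever asked. The class name `AnytimeGroupByAvg` and its methods are hypothetical and are not the interface of Niagara or of any other engine.

```python
from collections import defaultdict

class AnytimeGroupByAvg:
    """Group-by average that can report its result so far at any time."""

    def __init__(self):
        self._sum = defaultdict(float)
        self._count = defaultdict(int)

    def consume(self, model, price):
        """Fold one input tuple into the running state."""
        self._sum[model] += price
        self._count[model] += 1

    def result_so_far(self):
        """Return the current averages without waiting for end of input."""
        return {m: self._sum[m] / self._count[m] for m in self._sum}

op = AnytimeGroupByAvg()
op.consume("Z3", 10000)
op.consume("Z3", 20000)
first = op.result_so_far()    # requested before the input is complete
op.consume("400i", 30000)
second = op.result_so_far()   # requested again after more input
```

Because the operator keeps only running sums and counts, a request for the result so far does not disturb later processing.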

13 Inside the Internet Query Engine (carId, model, price, otherinfo) Red 1998 BMW Cars Accident Reports (carId) (carId, model, price, otherinfo) Not Exists (carId, model, price, otherinfo) (model, avgprice) Group By Join model = model price <= 0.8 * avgprice Union (carId, model, price, otherinfo) Union (carId)

14 Non-Monotonic Input/Output Property Operators should handle “changes”, not just additions to input Similarly, operators should produce “changes”, not just additions to output Both blocking and non-blocking operators Why? –Partial results may represent “wrong” answers –Need to be corrected later

15 Inside the Internet Query Engine (carId, model, price, otherinfo) Red 1998 BMW Cars Accident Reports (carId) (carId, model, price, otherinfo) Not Exists (carId, model, price, otherinfo) (model, avgprice) Group By Join model = model price <= 0.8 * avgprice Union (carId, model, price, otherinfo) Union (carId)

16 Flexible Input Property Should be able to process data from any input at any time Processes data as it becomes available Why? –If query is non-blocking: Can return results soon –If query is blocking Faster partial result response time
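
One operator with this property is the symmetric hash join, which the talk later lists among the known flexible-input implementations. The Python sketch below is a simplified, in-memory version under assumed names (`SymmetricHashJoin`, `insert`); it is meant only to show that a row from either input can be processed the moment it arrives.

```python
from collections import defaultdict

class SymmetricHashJoin:
    """Equi-join that accepts rows from either input in any order."""

    def __init__(self):
        # One hash table per input, keyed on the join attribute.
        self._tables = {"left": defaultdict(list), "right": defaultdict(list)}

    def insert(self, side, key, row):
        """Store the row, probe the other side, and return new join results."""
        if side not in self._tables:
            raise ValueError("side must be 'left' or 'right'")
        other = "right" if side == "left" else "left"
        self._tables[side][key].append(row)
        matches = self._tables[other].get(key, [])
        if side == "left":
            return [(row, match) for match in matches]
        return [(match, row) for match in matches]

join = SymmetricHashJoin()
out = []
out += join.insert("right", "Z3", ("Z3", 15000))      # right input arrives first
out += join.insert("left", "Z3", (1, "Z3", 10000))    # matches immediately
out += join.insert("left", "400i", (5, "400i", 30000))
out += join.insert("right", "400i", ("400i", 25000))  # matches the stored left row
```

Neither input has to be read to completion before the other, so a slow source delays only the results that depend on it.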

17 A Note on Partial Result Accuracy Focus is on producing partial results Architecture is general enough to exploit existing techniques –Online aggregation [Hellerstein et al.] –Nested aggregates [Tan et al.] Accuracy for general blocking operators?

18 Outline Motivation Desired Operator Properties Implementation Alternatives Performance Evaluation Conclusion

19 Where do we start? Use known flexible input, maximal output operator implementations –Non-blocking: select, symmetric hash join, Xjoin –Blocking: group-by, symmetric outer join Blocking operator implementations should satisfy anytime property All operator implementations should satisfy non-monotonic input/output property

20 Non-Monotonic Input/Output Re-evaluation Approach: –On partial result request, compute results “so far” –Then forget all “potentially incorrect” inputs Differential Approach: –On partial result request, compute results “so far” –“Update” incorrect inputs for future result computation

21 Inside the Internet Query Engine (carId, model, price, otherinfo) Red 1998 BMW Cars Accident Reports (carId) (carId, model, price, otherinfo) Not Exists (carId, model, price, otherinfo) (model, avgprice) Group By Join model = model price <= 0.8 * avgprice Union (carId, model, price, otherinfo) Union (carId)

22 Re-evaluation Join (1, Z3, 10000) (Z3, 15000) (19, Z3, 20000) (5, 400i, 30000) (400i, 25000) (1, Z3, 10000) (19, Z3, 20000) (5, 400i, 30000) (Z3, 15000) (400i, 25000) (1, Z3, 10000) (3, 400i, 20000)

23 Re-evaluation Join (1, Z3, 10000) (19, Z3, 20000) (5, 400i, 30000) (1, Z3, 10000) (3, 400i, 20000) (8, 400i, 20000) (Z3, 15000) (400i, 23333)
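
The two re-evaluation slides can be read as the following Python sketch, which reuses the tuples shown above. It is one interpretation of the flattened slide content, not the authors' implementation: it assumes for simplicity that the car input is final and that only the averages are partial, and the names `average_by_model`, `is_bargain` and `ReevaluationJoin` are hypothetical.

```python
def average_by_model(cars):
    """Compute the average price per model over the cars seen so far."""
    sums = {}
    for _, model, price in cars:
        total, count = sums.get(model, (0, 0))
        sums[model] = (total + price, count + 1)
    return {model: total / count for model, (total, count) in sums.items()}

def is_bargain(price, avg):
    # Same condition as price <= 0.8 * avgprice, written with integer
    # factors to avoid floating-point rounding at the boundary.
    return 5 * price <= 4 * avg

class ReevaluationJoin:
    def __init__(self):
        self._cars = []  # assumed-final input, kept across requests

    def add_car(self, car):
        self._cars.append(car)

    def partial_result(self, partial_avgs):
        # The partial averages may be wrong, so they are used for this
        # request only and are not stored: the next request joins from scratch.
        return [car for car in self._cars
                if car[1] in partial_avgs
                and is_bargain(car[2], partial_avgs[car[1]])]

cars = [(1, "Z3", 10000), (19, "Z3", 20000), (5, "400i", 30000), (3, "400i", 20000)]
join = ReevaluationJoin()
for car in cars:
    join.add_car(car)
first = join.partial_result(average_by_model(cars))    # avg 400i = 25000

cars.append((8, "400i", 20000))
join.add_car((8, "400i", 20000))
second = join.partial_result(average_by_model(cars))   # avg 400i is about 23333
```

The second request repeats the join for the Z3 cars even though nothing about them changed, which is the unnecessary computation this approach accepts in exchange for simplicity.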

24 Differential Join (1, Z3, 10000) (Z3, 15000) (19, Z3, 20000) (5, 400i, 30000) (400i, 25000) (1, Z3, 10000) (19, Z3, 20000) (5, 400i, 30000) (Z3, 15000) (400i, 25000) (1, Z3, 10000) (3, 400i, 20000)

25 Differential Join (1, Z3, 10000) (19, Z3, 20000) (5, 400i, 30000) (Z3, 15000) (400i, 25000) (3, 400i, 20000) update (400i, 23333) del (3, 400i, 20000)

26 Differential Join (1, Z3, 10000) (19, Z3, 20000) (5, 400i, 30000) (Z3, 15000) (400i, 23333) (3, 400i, 20000) del (3, 400i, 20000) (8, 400i, 20000)
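
The differential slides can likewise be sketched in Python. The version below is an illustrative reading of the example, with hypothetical names (`DifferentialJoin`, `add_car`, `update_avg`) and a simple ("add" / "del", tuple) delta format chosen for this sketch; the tuple structure used in the actual system is not given in the transcript.

```python
class DifferentialJoin:
    def __init__(self):
        self._cars = {}        # model -> list of car tuples
        self._avg = {}         # model -> current average price
        self._emitted = set()  # car tuples currently in the output

    def _recheck(self, model):
        """Re-test only the cars of one model and return output deltas."""
        deltas = []
        avg = self._avg.get(model)
        for car in self._cars.get(model, []):
            # 5 * price <= 4 * avg is price <= 0.8 * avgprice without
            # floating-point rounding at the boundary.
            qualifies = avg is not None and 5 * car[2] <= 4 * avg
            if qualifies and car not in self._emitted:
                self._emitted.add(car)
                deltas.append(("add", car))
            elif not qualifies and car in self._emitted:
                self._emitted.remove(car)
                deltas.append(("del", car))
        return deltas

    def add_car(self, car):
        self._cars.setdefault(car[1], []).append(car)
        return self._recheck(car[1])

    def update_avg(self, model, avg):
        """Handle an insert or update on the (model, avgprice) input."""
        self._avg[model] = avg
        return self._recheck(model)

join = DifferentialJoin()
deltas = []
for car in [(1, "Z3", 10000), (19, "Z3", 20000), (5, "400i", 30000), (3, "400i", 20000)]:
    deltas += join.add_car(car)
deltas += join.update_avg("Z3", 15000)
deltas += join.update_avg("400i", 25000)
deltas += join.update_avg("400i", 70000 / 3)   # update (400i, 23333)
deltas += join.add_car((8, "400i", 20000))
```

Only the 400i cars are re-tested when the 400i average changes, and the earlier output tuple for car 3 is retracted with a "del" delta rather than by recomputing the whole join.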

27 Re-evaluation vs. Differential Re-evaluation Approach: –Simple – just “forget” partial inputs –Easier to extend (no changes to tuple structure) –Unnecessary computation Differential Approach: –Need to handle deletions/updates of inputs –Changes to tuple structure –Re-computes only what is necessary

28 Outline Motivation Desired Operator Properties Implementation Alternatives Performance Evaluation Conclusion

29 Response Time

30 Outline Motivation Desired Operator Properties Implementation Alternatives Performance Evaluation Conclusion

31 Conclusion New properties for query engine operators Operator implementation alternatives –Re-evaluation –Differential Evaluation –Partial results improve response time –Re-evaluation approach is simpler –Differential approach is more efficient

32 Future Work General GUI Partial result accuracy for general blocking operators Changes at finer granularities Consistent partial results

33 Related Work Online aggregation [Hellerstein et al.] Nested aggregates [Tan et al.] Online reordering [Raman et al.] Symmetric hash join [Wilschut et al.] Adaptive operators [Ives et al.] XJoin [Urhan et al.]

