
1 System R Based Query Execution Optimization for Internet Information Gathering. MS Thesis Defense. Senthil Gnanaprakasam, http://tsangpo.eas.asu.edu. MS Committee: Dr. Subbarao Kambhampati, Dr. Chitta Baral, Dr. Susan D. Urban

2 Organization: Internet Information Gathering; Internet databases; Join ordering issues; Current methods and algorithms; Internet System R Algorithm; Implementation and conclusion

3 Internet Information Gathering

4 Need to order pizzas!

9 Need to order pizzas! (Diagram labels: Information Gatherer; Other Sources)

10 Information Integration: Uniform query interface; Uses mediated schema/virtual relations; Source descriptions & statistics; Query rewriting; Query plan optimization; Query execution engine; Wrapper; Global Data Model

11 Query Rewriting: Sound: Does all returned data satisfy the given query? Complete: Does the query return all possible sound results?

12 Query Plan Optimization: Subsumption of sources; Quality of data; Cost of accessing sources; Ordering for optimized cost; Duplicate sources; Pay-per-use, network costs

13 Internet Databases

14 Types of internet databases: (1) Form-interfaced database; (2) Text database; (3) Intranet databases

15 Source Statistics: Access and transfer time vary widely with: type of source (local, intranet, Internet); time of the day; number and speed of servers; reliability of connection

16 Binding Constraints: What attributes can be bound? (Superscript b = bound, f = free.)
Books[isbn, title, author, publisher, price, pages^f]
YellowPages(lastName^f, firstName, zip, phone^b)
YellowPages(lastName^b, firstName, zip, phone^f)
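
The binding annotations on this slide can be read as access patterns: a source can only be called when every attribute marked b has a value. A minimal sketch of that idea, assuming a hypothetical `BindingPattern` class and `can_call` method that are not part of the thesis:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BindingPattern:
    """One access pattern of a source. All names here are illustrative."""
    source: str
    attributes: tuple
    bound: frozenset  # attributes the caller must supply (marked b on the slide)

    def can_call(self, known_values):
        """True if every required attribute has a value available."""
        return self.bound <= set(known_values)


YP_ATTRS = ("lastName", "firstName", "zip", "phone")
# The two YellowPages patterns from the slide: look up by phone, or by last name.
by_phone = BindingPattern("YellowPages", YP_ATTRS, frozenset({"phone"}))
by_last_name = BindingPattern("YellowPages", YP_ATTRS, frozenset({"lastName"}))

# Hypothetical input: only a last name is known, so only one pattern is usable.
known = {"lastName": "Smith"}
usable = [p for p in (by_phone, by_last_name) if p.can_call(known)]
```

With only a last name known, `usable` contains just the `by_last_name` pattern; the by-phone pattern stays unusable until some other source supplies a phone number.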

17 Join Ordering Issues

18 Importance of join ordering:
Cost(P ⋈ Q): a + 100t + 100a + 10t
Cost(Q ⋈ P): a + 10t + 10a + 100t
Difference: 90a
Traditional: hard disk seek time. Internet: source access time.
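
The 90a figure on this slide follows from collecting terms in the two cost expressions. The worked step below assumes, as the slide's symbols suggest, that a is the per-call access cost and t the per-tuple transfer cost:

```latex
\begin{aligned}
\mathrm{Cost}(P \bowtie Q) &= a + 100t + 100a + 10t = 101a + 110t \\
\mathrm{Cost}(Q \bowtie P) &= a + 10t + 10a + 100t = 11a + 110t \\
\mathrm{Cost}(P \bowtie Q) - \mathrm{Cost}(Q \bowtie P) &= 90a
\end{aligned}
```

The transfer terms cancel, so the two orders differ only in the number of source calls.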

19 Internet Information Gathering (Diagram labels: University (Intranet); Administration; Student; Library (Machine); Borrow; Information Gatherer; Lost; Books)

20 Schema:
Books(isbn^b, title, author, publisher, price, pages)
Student(id^b, firstName, lastName)
Borrow(studentId, isbn, dateIssued)
Lost(isbn)
Query:
SELECT * FROM Student, Books, Lost, Borrow WHERE Borrow.isbn = Lost.isbn AND Books.isbn = Lost.isbn AND Borrow.studentId = Student.id

21 Current Methods & Algorithms

22 Bound Is Easier: Binding values to attributes produces fewer results; a valid heuristic in the absence of source statistics. Example:
|UniversityStudent(Name, Age, Sex, Dept)| = 50,000
|UniversityStudent(Name, Age, Sex, "CS")| = 4,000
|UniversityStudent(Name, Age, "M", "CS")| = 2,000
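
As a rough illustration of the Bound Is Easier heuristic, the sketch below ranks candidate source calls by how many arguments are bound, without consulting any statistics. The function name and the tuple encoding (with `None` marking a free argument) are assumptions made for this example only:

```python
def most_bound_first(candidates):
    """Order candidate calls by number of bound arguments, most bound first.

    Each candidate is (source_name, args); None marks a free argument.
    """
    def bound_count(call):
        _, args = call
        return sum(arg is not None for arg in args)

    return sorted(candidates, key=bound_count, reverse=True)


# The three calls from the slide: (Name, Age, Sex, Dept).
calls = [
    ("UniversityStudent", (None, None, None, None)),
    ("UniversityStudent", (None, None, None, "CS")),
    ("UniversityStudent", (None, None, "M", "CS")),
]
ranked = most_bound_first(calls)
```

Here the call with both Sex and Dept bound is ranked first, which agrees with the slide's sizes (2,000 versus 4,000 versus 50,000 tuples).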

23 Greedy Algorithm: Based on the Bound-Is-Easier heuristic; gives importance to access costs, in keeping with the Internet scenario; maintains a list of feasible binding patterns; views sources/binding patterns as either High Traffic Binding Pattern [HTBP] or Low Traffic; at each iteration, attempts to access the most general feasible binding pattern if it is not in HTBP; sentinel checks ensure the algorithm proceeds to completion

24 Join Ordering Strategies:
Bound is easier: UniversityStudent(Name, Age, Sex, Dept); Student(Name, Id)
Greedy Algorithm: Student(Name, Id, Dept); UniversityStudent(Age, Sex, Dept)
System R: Static query optimization algorithm; exhaustive search; dynamic programming approach; retains candidate trees with smallest cost and prunes others

25 Shortcomings of System R: Binding restrictions not taken care of; bushy trees not considered. (Figure: example left-linear and bushy join trees over relations R1 to R4.)

26 Internet System R Algorithm

27 Internet System R: Update bindings obtained from previous level of subplans; search all types of trees (left linear, bushy & right linear); use full set of statistics to estimate sizes; preserves graceful degradation property; trade-off: planning vs. execution time

28 Algorithm
INPUTS
  S[1..m]: array of all subgoals expanded w.r.t. binding patterns;
  an associated data structure that helps calculate costs.
Initialize NODE with PP = nil; Bindings = {φ}; Cost = 0.
IF S has a corresponding BestPlan
  return the corresponding join order
ENDIF
REPEAT
  FOR i = 1 TO number of feasible leaf nodes
    FOR j = 1 TO C(|Q|, i) DO
      LET LeftSubGoal = jth element in C(|Q|, i)
      LET RightSubGoal = S - LeftSubGoal
      Recursively call this algorithm with LeftSubGoal and RightSubGoal
      CurPlan = make a new plan by joining the above resultant plans
      IF it has a lower cost than current BestPlan THEN
        update BestPlan
      ENDIF
    NEXT j
  NEXT i
UNTIL no child nodes are generated in an entire iteration
return join order of BestPlan
END.
Callouts on the slide: Perform feasibility check; Bushy trees; Pruning
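
The pseudocode above can be approximated by a memoized recursive search over all ordered two-way splits of the subgoals, which is what admits bushy trees. The sketch below is a simplification rather than the thesis implementation: it does not mirror the REPEAT/FOR loop structure, and the catalog, the shared attribute name `id`, the sizes, and the cost model (each leaf costs one access; the right side is re-invoked once per left tuple) are all assumptions made for illustration.

```python
from functools import lru_cache
from itertools import combinations

# Hypothetical catalog: name -> (attributes, attributes that must be bound, size).
# Borrow.studentId is renamed to "id" here so that join attributes share a name.
SOURCES = {
    "Student": (("id", "firstName", "lastName"), {"id"}, 50000),
    "Books": (("isbn", "title", "author"), {"isbn"}, 100000),
    "Borrow": (("id", "isbn", "dateIssued"), set(), 2000),
    "Lost": (("isbn",), set(), 50),
}
ACCESS = 100  # assumed cost units per source call


def attrs_of(subgoals):
    """All attributes produced by a set of sources."""
    return frozenset(a for s in subgoals for a in SOURCES[s][0])


@lru_cache(maxsize=None)
def best_plan(subgoals, bound=frozenset()):
    """Return (cost, size, tree) for the cheapest feasible plan, or None."""
    if len(subgoals) == 1:
        (name,) = subgoals
        _, must_bind, size = SOURCES[name]
        if not must_bind <= bound:  # feasibility check at the leaf
            return None
        return (ACCESS, size, name)
    best = None
    members = sorted(subgoals)
    for k in range(1, len(members)):
        for left in combinations(members, k):  # every ordered split
            left = frozenset(left)
            right = subgoals - left
            lp = best_plan(left, bound)
            if lp is None:
                continue
            # The right subplan may use bindings produced by the left one.
            rp = best_plan(right, bound | attrs_of(left))
            if rp is None:
                continue
            cost = lp[0] + lp[1] * rp[0]  # toy model: right side called per left tuple
            if best is None or cost < best[0]:  # pruning: keep only the cheapest
                best = (cost, min(lp[1], rp[1]), (lp[2], rp[2]))
    return best


lost_books = best_plan(frozenset({"Lost", "Books"}))
full_plan = best_plan(frozenset(SOURCES))
```

In this sketch `best_plan(frozenset({"Books"}))` returns `None` because `isbn` is not bound, while the pair Lost and Books is only feasible with Lost on the left. The memo key includes the set of bound attributes, since the same subgoals can be feasible under one set of bindings and infeasible under another.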

29 Example: ISR (Student, Books, Borrow, Lost) considers:
ISR (Student, Books, Borrow) ⋈ ISR (Lost)
ISR (Student, Books, Lost) ⋈ ISR (Borrow)
ISR (Student, Lost, Borrow) ⋈ ISR (Books)
ISR (Books, Lost, Borrow) ⋈ ISR (Student)
ISR (Lost) ⋈ ISR (Student, Books, Borrow)
ISR (Borrow) ⋈ ISR (Student, Books, Lost)
ISR (Books) ⋈ ISR (Student, Lost, Borrow)
ISR (Student) ⋈ ISR (Books, Lost, Borrow)
ISR (Student, Books) ⋈ ISR (Lost, Borrow)
ISR (Student, Lost) ⋈ ISR (Books, Borrow)
ISR (Student, Borrow) ⋈ ISR (Books, Lost)
ISR (Lost, Borrow) ⋈ ISR (Student, Books)
ISR (Books, Borrow) ⋈ ISR (Student, Lost)
ISR (Books, Lost) ⋈ ISR (Student, Borrow)
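
The slide lists 14 ordered splits of the four relations, which matches 2^4 - 2 (every non-empty proper subset can serve as the left side). A short sketch that enumerates them, using a hypothetical helper name `ordered_splits`:

```python
from itertools import combinations


def ordered_splits(relations):
    """Yield every (left, right) split with both sides non-empty."""
    items = list(relations)
    for k in range(1, len(items)):
        for left in combinations(items, k):
            right = tuple(r for r in items if r not in left)
            yield left, right


splits = list(ordered_splits(["Student", "Books", "Borrow", "Lost"]))
for left, right in splits:
    print(f"ISR{left} ⋈ ISR{right}")
```

Because both (left, right) and (right, left) are generated, mirrored plans are costed separately, which matters when the two sides have different binding requirements.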

30 Feasibility Check:
Books(isbn^b, title, author, publisher, price, pages)
Student(id^b, firstName, lastName)
Borrow(studentId, isbn, dateIssued)
Lost(isbn)
ISR (Student, Books) ⋈ ISR (Lost, Borrow)
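
One plausible reading of this slide is that the split shown fails the feasibility check: Student needs `id` bound and Books needs `isbn` bound, and nothing to their left supplies either. The sketch below encodes that reading. The dictionaries and function names are hypothetical, and Borrow.studentId is written as `id` so that it matches Student.id.

```python
# Hypothetical encoding of the slide's schema.
PROVIDES = {
    "Books": {"isbn", "title", "author", "publisher", "price", "pages"},
    "Student": {"id", "firstName", "lastName"},
    "Borrow": {"id", "isbn", "dateIssued"},
    "Lost": {"isbn"},
}
REQUIRES = {"Books": {"isbn"}, "Student": {"id"}, "Borrow": set(), "Lost": set()}


def side_is_feasible(sources, bound):
    """True if the sources can be called in some order starting from `bound`."""
    remaining, bound = set(sources), set(bound)
    while remaining:
        ready = {s for s in remaining if REQUIRES[s] <= bound}
        if not ready:
            return False  # every remaining source is missing a binding
        for s in ready:
            bound |= PROVIDES[s]
        remaining -= ready
    return True


def join_is_feasible(left, right):
    """Left side must run with no bindings; right side may use left's output."""
    if not side_is_feasible(left, set()):
        return False
    left_bindings = set().union(*(PROVIDES[s] for s in left))
    return side_is_feasible(right, left_bindings)


slide_split = join_is_feasible({"Student", "Books"}, {"Lost", "Borrow"})
mirrored = join_is_feasible({"Lost", "Borrow"}, {"Student", "Books"})
```

Under these assumptions `slide_split` is `False` and `mirrored` is `True`: putting Lost and Borrow first supplies the `isbn` and `id` values that Books and Student require.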

31 Pruning:
Plan | Cost
(Student ⋈ Borrow) ⋈ (Books ⋈ Lost) | 2100
(Books ⋈ Lost) ⋈ (Student ⋈ Borrow) | 2800 (pruned)

32 Implementation & Conclusion

33 Implementation: Java 2 on Sun Solaris; simulated sources with variable statistics; measured time-independent data

34 Experiments: How important is access time? How big is the search space? How much is the overhead? Measure the trade-off between planning and execution time. What if not all statistics are available?

35 Access & Transfer Times: Measured for an intranet source and an Internet source on a T1 line.
Intranet: access time ~92 ms; transfer time ~25 ms/kb
Internet: access time ~4.8 s; transfer time ~25 ms/kb
Access time is a large enough cost to be worth spending time optimizing.
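
To give a sense of scale, the measured access times can be plugged into the 90a difference between the two join orders on slide 18. This combination is an illustration only and assumes the same a applies to every call:

```latex
\begin{aligned}
\text{Intranet: } 90a &\approx 90 \times 0.092\ \mathrm{s} = 8.28\ \mathrm{s} \\
\text{Internet: } 90a &\approx 90 \times 4.8\ \mathrm{s} = 432\ \mathrm{s}
\end{aligned}
```

Under that assumption, a poor join order costs a few seconds on the intranet but several minutes against an Internet source.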

36 ISR vs. SR: Empirical evaluation of search space; trade-off between planning time and execution time.
Larger search space → higher planning cost
More optimal solution → lower execution cost
Fast processors/slower network → lower total cost
It is worth exploring a larger search space.

37 Effect of Binding Patterns: Search space increases with the number of sources; search space becomes smaller and more constrained as the number of binding restrictions increases; size of search space is a non-trivial function of the given parameters

38 Graceful Degradation: Is searching a larger space a good idea when statistics are not fully available? Yes: it preserves the graceful degradation property of traditional System R.

39 Related Work: Florescu, Levy et al.: Do not consider access costs important; bottom-up approach: build partial plans and check which ones lead to complete plans; consider planning time important and use a best-first method to produce plans before the algorithm runs to completion

40 Related Work: Kabra and DeWitt: Generate a seemingly optimal plan; statistics are difficult to gather; run-time collection of statistics and modification of plan

41 Related Work: Urhan, Franklin et al.: Access costs are most important; concentrate on initial delays; change plan if delays exceed a limit

42 Contributions: Analysis of the importance of considering access costs; developed the Internet System R Algorithm (included binding constraints over traditional System R; increased search space to include bushy trees); empirical evaluation of total cost compared to planning and execution costs; examined preservation of graceful degradation with a larger search space and partial statistics

43 Papers Published:
[LKG99] Eric Lambrecht, Subbarao Kambhampati and Senthil Gnanaprakasam. Optimizing Recursive Information Gathering Plans. In Proceedings of the IJCAI-99.
[KG99] Subbarao Kambhampati and Senthil Gnanaprakasam. Optimizing source-call ordering in information gathering plans. In Proceedings of the IJCAI-99 Workshop on Intelligent Information Integration.

44 Thank You. Questions?

45 Where Do I Go From Here?

