Efficient Evaluation of Queries in a Mediator for WebSources Louiqa Raschid University of Maryland Joint work with Zadorozhny, Vidal, Urhan, Bright.


1 Efficient Evaluation of Queries in a Mediator for WebSources Louiqa Raschid University of Maryland Joint work with Zadorozhny, Vidal, Urhan, Bright

2 Wide-Area Applications (WAA)
L. Raschid, University of Maryland
Problem: scalability of wrapper/mediator architectures to WebSources that are accessible via WANs
- Multiple sources and complex computational capabilities (WSI)
- Complex queries on multiple sources, e.g., drug lead discovery using biomolecular sources
- Dissimilar access costs (end-to-end latencies) and other metrics for each WSI
- The dynamic wide-area environment is noisy (it introduces unpredictable delays)

3 Detailed Architecture
[Figure: architecture diagram. Component labels: Query, Mediator, Web Query Broker, Web Query Optimizer, Capability-Based Pre-optimizer, Extended Randomized Optimizer, Execution Engine, Catalog, WCM, WebPT, Wrapper, Web Source.]

4 Relevant Technologies
- Wrapper generation toolkit: CoopIS 1999, VLDB Journal special issue 2000
- Wrapper/mediator architecture (Predator ORDBMS platform): ICDE 2000 demo
- CBR tool, Wrapper Cost Model (WCM): CoopIS 1999
- Web prediction tool (WebPT): VLDB Journal special issue 2000, CoopIS 2001
- Web Query Optimizer (WQO): SIGMOD 2002, ICDCS 2002

5 Outline of the talk
- Motivation and architecture
- Example
- Two-phase optimizer (WQO) and pre-plans
- Heuristics used by WQO to choose pre-plans
- Evaluation
- Related work
- Future work

6 ACM DL Web Source
Schema:
    Paper(FirstAuthor, Title, PaperID, Keywords, PaperPDF)
    Coauthor(PaperId, CName)
    Editor(PaperId, EName)
    Reviewer(PaperId, RName)
Limited capabilities (WSIs), written as input bindings -> outputs:
    ior1  Paper:    FirstAuthor -> Title, PaperID, Keywords, PaperPDF
    ior3  Paper:    FirstAuthor -> Title, PaperID, Keywords
    ior4  Paper:    PaperID -> PaperPDF
    ior2  CoAuthor: PaperID -> CName
    ior5  Editor:   PaperID -> EName
    ior6  Reviewer: {} -> PaperId, RName
Dependencies: (ior1 -> ior2), (ior3 -> ior4), (ior3&ior4 -> ior2)...
Atomic WSI: ior1
Composed WSI: ior3 & ior4
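The binding patterns on this slide can be modeled as simple input/output attribute sets. The following is a minimal Python sketch, not the authors' implementation: the `WSI` class, `callable_with`, and `compose` are hypothetical names, and the attribute spelling is normalized to `PaperID` for the example. It shows why composing ior3 with ior4 yields the same binding pattern as the atomic ior1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WSI:
    """A limited capability: bind `inputs`, receive `outputs` (hypothetical model)."""
    name: str
    relation: str
    inputs: frozenset
    outputs: frozenset


def wsi(name, relation, inputs, outputs):
    return WSI(name, relation, frozenset(inputs), frozenset(outputs))


# Binding patterns copied from the slide (PaperId spelled PaperID throughout).
CATALOG = {
    "ior1": wsi("ior1", "Paper", {"FirstAuthor"},
                {"Title", "PaperID", "Keywords", "PaperPDF"}),
    "ior2": wsi("ior2", "CoAuthor", {"PaperID"}, {"CName"}),
    "ior3": wsi("ior3", "Paper", {"FirstAuthor"},
                {"Title", "PaperID", "Keywords"}),
    "ior4": wsi("ior4", "Paper", {"PaperID"}, {"PaperPDF"}),
    "ior5": wsi("ior5", "Editor", {"PaperID"}, {"EName"}),
    "ior6": wsi("ior6", "Reviewer", set(), {"PaperID", "RName"}),
}


def callable_with(w, bound):
    """A WSI can be called once all of its input attributes are bound."""
    return w.inputs <= set(bound)


def compose(first, second):
    """Chain two WSIs: outputs of `first` may supply inputs of `second`."""
    inputs = first.inputs | (second.inputs - first.outputs)
    return WSI(f"{first.name};{second.name}", first.relation,
               inputs, first.outputs | second.outputs)


composed = compose(CATALOG["ior3"], CATALOG["ior4"])
print(composed.name, sorted(composed.inputs), sorted(composed.outputs))
```

Under this model the composed WSI needs only `FirstAuthor` bound and returns the same attributes as ior1, which is why the two are alternatives for the Paper subgoal.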

7 2-Phase Optimization for WebSources
Objective: generate safe, good plans in a large search space (due to multiple alternative WSIs)
- Pre-plan
- 2-phase optimization
- Cost-based heuristics
- Cost-based optimization (randomized optimizer)

8 CBR Tool and Pre-plan
- Pre-plan: partition the subgoals on mediator relations
  - (Partial) ordering between subgoals (WSI dependencies)
  - Select a WSI for each subgoal
Query:
    SELECT P.Title, P.PaperPDF, CoAuthor.CName
    FROM Paper P, CoAuthor, Editor, Reviewer
    WHERE P.FirstAuthor = 'Franklin'
      AND P.PaperId = CoAuthor.PaperId
      AND P.PaperId = Reviewer.PaperId
      AND P.PaperId = Editor.PaperId
Pre-plan 1:
    {{Paper(ior1), Reviewer(ior6)} {CoAuthor(ior2), Editor(ior5)}}
    Paper(ior1) -> CoAuthor(ior2), Paper(ior1) -> Editor(ior5)
Pre-plan 2:
    {{Paper(ior3;ior4), Reviewer(ior6)} {CoAuthor(ior2), Editor(ior5)}}
    Paper(ior3;ior4) -> CoAuthor(ior2), Paper(ior3;ior4) -> Editor(ior5)
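The partition in a pre-plan can be read as a set of stages in which every WSI's inputs are already bound. The sketch below is an illustration under assumptions, not the CBR tool itself: `stage_preplan` is a hypothetical helper, attribute names are treated as global (so any earlier WSI that outputs `PaperID` satisfies a later one, which is coarser than the per-relation dependencies on the slide), and `PaperId` is spelled `PaperID`.

```python
# name -> (relation, input attributes, output attributes), from the schema slide.
WSIS = {
    "ior1": ("Paper", {"FirstAuthor"},
             {"Title", "PaperID", "Keywords", "PaperPDF"}),
    "ior2": ("CoAuthor", {"PaperID"}, {"CName"}),
    "ior5": ("Editor", {"PaperID"}, {"EName"}),
    "ior6": ("Reviewer", set(), {"PaperID", "RName"}),
}


def stage_preplan(chosen, bound_by_query):
    """Greedily group the chosen WSIs into stages whose inputs are bound.

    Raises ValueError when some WSI can never have its inputs bound,
    i.e. the choice of WSIs does not admit a safe ordering.
    """
    bound = set(bound_by_query)
    remaining = list(chosen)
    stages = []
    while remaining:
        ready = [n for n in remaining if WSIS[n][1] <= bound]
        if not ready:
            raise ValueError(f"unsafe pre-plan: cannot bind inputs of {remaining}")
        stages.append(sorted(ready))
        for n in ready:
            bound |= WSIS[n][2]  # outputs become available to later stages
        remaining = [n for n in remaining if n not in ready]
    return stages


# The query binds FirstAuthor = 'Franklin'.
stages = stage_preplan(["ior1", "ior2", "ior5", "ior6"], {"FirstAuthor"})
print(stages)
```

For the first pre-plan on the slide this groups ior1 and ior6 into the first stage and ior2 and ior5 into the second, matching the partition {{Paper(ior1), Reviewer(ior6)} {CoAuthor(ior2), Editor(ior5)}}.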

9 2-Phase Optimization for WebSources
Objective: choose a good pre-plan that will lead to a good plan
- WQO uses the CBR tool to select WSIs and generate pre-plans
- WQO uses cost-based heuristics to select one or more good pre-plans
- WQO uses a randomized relational optimizer and cost model, and chooses safe, good plans

10 Traditional optimizer behavior
[Figure: query with a 5-way join and 3 WSIs per relation. Details of the relative costs of the WSIs are in the paper.]

11 WQO behavior using heuristics to explore a few good pre-plans

12 WQO functionality
- Metrics of the Wrapper Cost Model (WebPT)
- Ignore local processing costs (assume the optimizer will choose the best plan possible for the pre-plan)
- Choose WSIs and pre-plans to minimize remote costs
- Why use heuristics to choose a pre-plan?
  - Impact of cost / delay on heuristics versus impact on a cost model
  - Impact of noise on heuristics versus a cost model
  - Limitations of heuristics

13 Metrics in Wrapper Cost Model
- WebSource and network costs
  - Remote cost at Web source: TTF
  - Downloading data from Web source (extraction cost)
  - Total cost: TTL
- Wrapper statistics
  - Number of pages accessed
  - Cardinality of result
- Statistics depend on the value of the query binding
- WebPT: a tool that learns from query feedback and predicts access cost based on parameters such as Day, Time, Qty, Cardinality, etc.
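To make "learning from query feedback" concrete, here is a deliberately simple Python sketch. It is not WebPT's algorithm, which the slides do not describe; `FeedbackPredictor`, the six-hour time buckets, and the use of a plain mean are all assumptions chosen for illustration. It records observed latencies keyed by day and time of day, and predicts the mean for the matching bucket.

```python
from collections import defaultdict
from statistics import mean


class FeedbackPredictor:
    """Toy latency predictor keyed on (day, time-of-day bucket). Hypothetical."""

    def __init__(self, hours_per_bucket=6):
        self.hours_per_bucket = hours_per_bucket
        self.samples = defaultdict(list)

    def _key(self, day, hour):
        return (day, hour // self.hours_per_bucket)

    def observe(self, day, hour, latency_ms):
        """Record feedback from an executed query."""
        self.samples[self._key(day, hour)].append(latency_ms)

    def predict(self, day, hour):
        """Mean latency for the matching bucket, else the global mean, else None."""
        bucket = self.samples.get(self._key(day, hour))
        if bucket:
            return mean(bucket)
        everything = [x for xs in self.samples.values() for x in xs]
        return mean(everything) if everything else None


predictor = FeedbackPredictor()
predictor.observe("Mon", 9, 100)
predictor.observe("Mon", 10, 300)
predictor.observe("Sat", 2, 50)
print(predictor.predict("Mon", 11))  # same bucket as the two Monday samples
print(predictor.predict("Sun", 0))   # no samples for this bucket: global mean
```

A fuller model would also condition on the other parameters the slide lists, such as quantity and result cardinality.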

14 WQO Heuristics to reduce remote costs
- Reduce the number of WSI calls
  - Favors atomic WSIs
  - ... but composed WSIs may reduce result cardinality and hence reduce the overall number of calls
- Choose the WSI with the lowest access cost
  - Favors lower-cost WSIs
  - ... but more expensive WSIs may provide more filtering of results and may reduce result cardinality
- Reduce result cardinality
  - More selective WSIs
  - WSIs with more input bindings

15 WQO Heuristics to choose good pre-plans
- Top-down versus bottom-up evaluation
  - Requires knowledge of dependencies
  - Details in paper
  - {R1 R3 R4 R5} {R2(S21)} U R3 -> R2
  - {R1 R2(S22) R3 R4 R5} U ø
- Choice of atomic versus composed WSIs
  - ior1 versus ior3 + ior4
- Cost&selectivity measure
  - Favor both low access costs and low cardinality
  - E.g., if the access costs of 2 WSIs are similar, choose the WSI with greater selectivity
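The cost&selectivity rule can be sketched as a tie-break: among WSIs whose access costs are close to the cheapest, prefer the one returning fewer tuples. This is one possible reading, not the measure defined in the paper; `choose_wsi`, the 10% tolerance used to decide that costs are "similar", and the sample numbers are all assumptions.

```python
def choose_wsi(candidates, tolerance=0.10):
    """Pick a WSI name from (name, access_cost_ms, result_cardinality) tuples.

    WSIs whose cost is within `tolerance` of the cheapest count as having
    similar cost; among those, the lowest cardinality wins (cost breaks ties).
    """
    cheapest = min(cost for _, cost, _ in candidates)
    similar = [c for c in candidates if c[1] <= cheapest * (1 + tolerance)]
    return min(similar, key=lambda c: (c[2], c[1]))[0]


# Hypothetical candidates for one subgoal.
candidates = [
    ("wsi_a", 100, 500),  # cheapest, but returns many tuples
    ("wsi_b", 105, 50),   # similar cost, far more selective
    ("wsi_c", 300, 5),    # most selective, but much more expensive
]
print(choose_wsi(candidates))
```

With these numbers `wsi_b` is chosen: its cost is similar to the cheapest and it is more selective. Setting `tolerance=0` reduces the rule to the plain lowest-access-cost heuristic.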

16 Sample Queries for ACM DL Web Source
Query Q1:
    SELECT Title, PaperPDF, CoAuthorName
    FROM Paper, Coauthor
    WHERE AuthorName = 'Michael Franklin'
      AND Coauthor.PaperID = Paper.PaperID
Query Q2:
    SELECT Title, PaperPDF, CoAuthorName
    FROM Paper, Coauthor
    WHERE AuthorName = 'Michael Franklin'
      AND Coauthor.PaperID = Paper.PaperID
      AND Coauthor.CoAuthorName = 'Stan Zdonik'
Query Q3:
    AND keyword = 'broadcast disks'

17 Execution Plans for Queries Q1 and Q2: Dependent Join
[Figure: three dependent-join plan diagrams over Paper and Coauthor.]
- Plan 1: ior1, ior2. Uses atomic ior1.
- Plan 2: ior3, ior4, ior2. Uses composed (ior3, ior4). Annotations: costly, Q3 good.
- Plan 3: ior3, ior2, ior4. Uses composed (ior3, ior4). Annotations: ior4 costly, Q1 bad, Q2 good.
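One way to read the "Q1 bad, Q2 good" label on Plan 3 is to count calls to the costly ior4 under each join order. The sketch below is an interpretation, not something the slide states: the data is invented, both function names are hypothetical, and it assumes a naive dependent join that issues one ior4 call per input row with no caching of repeated PaperIDs.

```python
# Hypothetical data: PaperID -> coauthor names for papers returned by ior3.
COAUTHORS = {
    "p1": ["Stan Zdonik", "Coauthor A"],
    "p2": ["Coauthor B", "Coauthor C"],
    "p3": ["Coauthor D"],
}


def ior4_calls_plan2(paper_ids):
    """Order ior3, ior4, ior2: ior4 is called once per paper from ior3."""
    return len(paper_ids)


def ior4_calls_plan3(paper_ids, coauthor_filter=None):
    """Order ior3, ior2, ior4: ior4 is called once per (paper, coauthor) row
    that survives the optional filter on the coauthor name."""
    rows = [
        (p, c)
        for p in paper_ids
        for c in COAUTHORS[p]
        if coauthor_filter is None or c == coauthor_filter
    ]
    return len(rows)


papers = list(COAUTHORS)
print("Plan 2:", ioR := ior4_calls_plan2(papers)) if False else None
print("Plan 2:", ior4_calls_plan2(papers))
print("Plan 3, Q1:", ior4_calls_plan3(papers))
print("Plan 3, Q2:", ior4_calls_plan3(papers, "Stan Zdonik"))
```

Under these assumptions Plan 3 issues more ior4 calls than Plan 2 for Q1, because the join with Coauthor multiplies rows, and fewer for Q2, because the coauthor predicate filters rows before the costly call.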

19 Response Time for Queries Q1 and Q2
[Figure: two panels comparing Plan1, Plan2, and Plan3 by query number.]

20 Quantile plots for Queries Q1 and Q2
[Figure: two panels comparing Plan1, Plan2, and Plan3.]

21 Summary of WQO Heuristics
Objective: reduce the amount of data delivered to the mediator and minimize remote access cost
Factors:
- Cardinality / selectivity of the dependent join (binding attributes) of WSIs
- Selectivity / cardinality of the remote relations
- Costly WSIs
A good choice of WSIs and pre-plans relies on the ability to construct a realistic cost model

22 Choice of WSIs for Complex Queries

23 WQO Limitations: Choice of (A)tomic and (C)omposed capability
4-way and 5-way join queries; 2 remote relations, each (A)tomic or (C)omposed: AA, AC, CA, CC

Scenario       Heuristic   WSIs   Best plan (ms)   Worst plan (ms)
Qcard100-AA    poor        AA               2545            200667
                           CC                639            214748
Qcard1000-AA   good        AA              20547            200847
                           CC              24229           2147480
Qcard1000-CC   good        AA              20265            200534
                           CC               2399            466704

24 Related Work
- Capability-based rewriting
- Wrapper cost models: Garlic (IBM), DISCO (INRIA), HP
- Mediator optimizers: Garlic (IBM), WSQ/DSQ (Stanford), IRO-DB (Versailles)
- Adaptive operators: Telegraph, Tukwila, XJoin
- Reactive optimizers: Query scrambling (Maryland), LEC optimizer (Cornell), EC+D optimizer (Maryland)
- Cost/quality trade-off: Nie+Rao (ASU), Naumann+Freitag (Germany)
- ...

25 Current implementation status of WQO
- Extensions to randomized relational optimizer.
- WebPT tool to predict response time from WebSources.
- WebWrapper cost model for WebSources.
- Cost-based heuristics to choose pre-plans.
- Integration into a scrambling-enabled optimizer.
- Study of the effect of pre-plan choice (explored search space) on the choice of a good plan.

26 Summary of Web Query Optimizer
- Integration of capability-based rewriting with relational query optimization
- Extension of the mediator cost model with access costs of Web sources
- Development of cost-based heuristics to choose Web access patterns for mediator relations
- Implemented and tested on complex Web queries

