CSE 636 Data Integration Limited Source Capabilities Slides by Hector Garcia-Molina.


1 CSE 636 Data Integration Limited Source Capabilities Slides by Hector Garcia-Molina

2 Heterogeneous Databases. [Figure: a Distributed Database System over data in DBMS 1, data in DBMS 2, legacy data, and a web site]

3 Limited Capabilities

4 Example: Amazon.com. Search form fields: author, title, subject, format, price. Callouts on the form: must specify at least one of these; this attribute not returned; cannot query on this attribute; menu of choices.

5 Example: BarnesAndNoble.com. Search form fields: author, title, subject, format, price. Callouts on the form: must specify at least one of these; can query if one of other attributes specified; menu of choices.

6 Why Limited Capabilities? Search forms; security; indexes; legacy.

7 Capability vs. Content. Capability description: can only search for subject = “art,” “history,” “science”. Content description: source only contains subject = “art,” “history,” “science”.

8 Outline. Describing source capabilities; extending source capabilities; how mediators cope with limited capabilities; mediator capabilities; other topics. [Figure: Mediator, Wrapper, Source]

9 Describing Query Capabilities. Relation R(X, Y, ..., Z). Adornments: f: may or may not specify; u: cannot be specified; b: must be specified; c[S]: must be specified from list S; o[S]: optional, choose from S.

10 Describing Query Capabilities. Relation R(X, Y, ..., Z). Adornments: f: may or may not specify; u: cannot be specified; b: must be specified; c[S]: must be specified from list S; o[S]: optional, choose from S. With output restriction: f’, u’, b’, c’[S], o’[S].

11 Example. Relation R(X, Y, Z). Description templates: bu’f, uf’c[z1, z2]. Answerable queries: R(x1, Y, Z), R(X, Y, z1). Unanswerable queries: R(X, y1, Z), R(X, Y, z3).
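
The example on slide 11 can be checked mechanically. The following is a minimal sketch, assuming a hypothetical encoding of templates as lists of (adornment, allowed values) pairs; the function names are illustrative and not taken from any of the systems mentioned. Output restrictions (the primed adornments) are ignored here, since they affect what is returned rather than which bindings are accepted.

```python
# Hypothetical encoding of the adornments from slide 9.
# Each template has one (kind, allowed) pair per attribute:
#   "f" may or may not be specified    "u" cannot be specified
#   "b" must be specified              "c" must be specified from `allowed`
#   "o" optional, but if specified must come from `allowed`

def matches(template, query):
    """query holds None for an unspecified attribute, else the constant."""
    for (kind, allowed), value in zip(template, query):
        bound = value is not None
        if kind == "u" and bound:
            return False
        if kind == "b" and not bound:
            return False
        if kind == "c" and (not bound or value not in allowed):
            return False
        if kind == "o" and bound and value not in allowed:
            return False
    return True


def answerable(templates, query):
    """A query is answerable if at least one template accepts it."""
    return any(matches(t, query) for t in templates)


# The two templates of slide 11 (primes dropped): bu'f and uf'c[z1, z2].
TEMPLATES = [
    [("b", None), ("u", None), ("f", None)],
    [("u", None), ("f", None), ("c", {"z1", "z2"})],
]

print(answerable(TEMPLATES, ["x1", None, None]))  # R(x1, Y, Z)
print(answerable(TEMPLATES, [None, None, "z1"]))  # R(X, Y, z1)
print(answerable(TEMPLATES, [None, "y1", None]))  # R(X, y1, Z)
print(answerable(TEMPLATES, [None, None, "z3"]))  # R(X, Y, z3)
```

Under this encoding the first two queries are accepted and the last two are rejected, matching the slide.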

12 Other Description Mechanisms. Tsimmis: query templates; Information Manifold: capability records (# bound attrs, conditions ok, ...); Disco; Garlic: black box; context-free grammars.

13 Extending Source Capabilities. Source (amazon): R(author, price, ...). Template: b, u, ... Query: author = “Freud” AND price > 10. [Figure: Wrapper in front of amazon]

14 Extending Source Capabilities. Source (amazon): R(author, price, ...). Template: b, u, ... Query: author = “Freud” AND price > 10. Source query: author = “Freud”. Wrapper filter: price > 10. [Figure: Wrapper in front of amazon]
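
The wrapper's role on this slide can be illustrated with a small sketch: the supported condition is pushed to the source and the unsupported one is applied locally. The in-memory rows, titles, and function names below are assumptions made up for illustration; a real wrapper would call the source's search form instead.

```python
# Made-up rows standing in for the remote source.
BOOKS = [
    {"author": "Freud", "title": "Book A", "price": 8},
    {"author": "Freud", "title": "Book B", "price": 15},
    {"author": "Jung", "title": "Book C", "price": 20},
]


def source_query(author):
    """The only query the source accepts: author bound, price not specifiable."""
    return [row for row in BOOKS if row["author"] == author]


def wrapper_query(author, min_price):
    """Answer author = ... AND price > ... on top of the limited source."""
    candidates = source_query(author)  # source query: author = ...
    # Wrapper filter: price > min_price, applied after the rows come back.
    return [row for row in candidates if row["price"] > min_price]


result = wrapper_query("Freud", 10)
print([row["title"] for row in result])
```

The filter can only be applied this way if the source returns the price attribute; an attribute with an output restriction could not be filtered in the wrapper.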

15 Another Example. Source (Barnes&Noble): R(author, price, …). No disjunctive conditions; price can only be specified with author. Query: (author = “Freud” OR author = “Jung”) AND price < 10. [Figure: Wrapper in front of Barnes&Noble]

16 Another Example. Source (Barnes&Noble): R(author, price, …). No disjunctive conditions; price can only be specified with author. Query: (author = “Freud” OR author = “Jung”) AND price < 10. Q1: author = “Freud” AND price < 10. Q2: author = “Jung” AND price < 10. The wrapper applies a union operation to the answers of Q1 and Q2. [Figure: Wrapper in front of Barnes&Noble]
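
A sketch of the splitting step, under the same kind of assumptions as before: the rows and function names are invented, and the source is simulated in memory. The wrapper distributes the price condition over the disjunction, issues one conjunctive query per author, and unions the answers.

```python
# Made-up rows standing in for the remote source.
BOOKS = [
    {"author": "Freud", "title": "Book A", "price": 8},
    {"author": "Freud", "title": "Book B", "price": 15},
    {"author": "Jung", "title": "Book C", "price": 9},
    {"author": "Adler", "title": "Book D", "price": 5},
]


def source_query(author, max_price=None):
    """Conjunctive only: one author, and price allowed only alongside it."""
    rows = [row for row in BOOKS if row["author"] == author]
    if max_price is not None:
        rows = [row for row in rows if row["price"] < max_price]
    return rows


def wrapper_query(authors, max_price):
    """Answer (author = a1 OR author = a2 ...) AND price < max_price."""
    seen = set()
    result = []
    for author in authors:  # one source query per disjunct (Q1, Q2, ...)
        for row in source_query(author, max_price):
            key = (row["author"], row["title"])
            if key not in seen:  # union: drop duplicates across fragments
                seen.add(key)
                result.append(row)
    return result


print([row["title"] for row in wrapper_query(["Freud", "Jung"], 10)])
```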

17 Extending Source Capabilities. General scheme: try many query rewritings; check if query fragments are supported by the source; check if the wrapper can combine answer fragments; do all this very efficiently! See H. Garcia-Molina, W. Labio, R. Yerneni: Capability-Sensitive Query Processing on Internet Sources, ICDE 1999. Tsimmis, Info Manifold: no disjunctive queries. DISCO: no query splitting. Garlic: only CNF queries.

18 Mediator Processing. Sources: R(X, Y, Z) with template f, f, b; T(Z, W, U) with template f, u, b. Mediator view: M(X, Y, Z, W, U) = Join(R, T). Query: M(5, Y, Z, W, 3). [Figure: Mediator, Wrapper, Source]

19 Plan 1. Sources: R(X, Y, Z) with template f, f, b; T(Z, W, U) with template f, u, b. Mediator view: M(X, Y, Z, W, U) = Join(R, T). Query: M(5, Y, Z, W, 3). Steps: (1) R(5, Y, Z); (2) T(Z, W, 3); (3) join answers. [Figure: Mediator, Wrapper, Source]
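
Whether each source query in a plan respects its template can be checked directly. The sketch below is an assumption-laden illustration: it reads the templates literally under the adornment definitions of slide 9, encodes them as strings of f, u, b, and uses a made-up helper name. Read this way, R(5, Y, Z) leaves the b-adorned attribute Z unspecified, while T(Z, W, 3) satisfies its template.

```python
def violations(template, bound):
    """Return attribute positions where a binding pattern breaks a template.

    template: string over 'f', 'u', 'b' (one letter per attribute)
    bound:    tuple of booleans, True where the query supplies a constant
    """
    bad = []
    for i, (kind, is_bound) in enumerate(zip(template, bound)):
        if kind == "b" and not is_bound:
            bad.append(i)  # required binding is missing
        elif kind == "u" and is_bound:
            bad.append(i)  # attribute may not be specified
    return bad


# R(X, Y, Z) has template f, f, b; T(Z, W, U) has template f, u, b.
print(violations("ffb", (True, False, False)))  # R(5, Y, Z): Z is unbound
print(violations("fub", (False, False, True)))  # T(Z, W, 3)
print(violations("ffb", (True, False, True)))   # R(5, Y, z) for a known z
```

The last call shows that the R query becomes acceptable once a value for Z is available, for instance from an earlier answer of T.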

20 Plan 2. Sources: R(X, Y, Z) with template f, f, b; T(Z, W, U) with template f, u, b. Mediator view: M(X, Y, Z, W, U) = Join(R, T). Query: M(5, Y, Z, W, 3). Steps: (1) P = T(Z, W, 3); (2) for each (z, w, u) ∈ P: R(5, Y, z); (3) join answers. [Figure: Mediator, Wrapper, Source]
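
Plan 2 is a dependent join: T is queried first, and each Z value in its answer supplies the binding that R requires. A minimal sketch follows, assuming in-memory tuples and hypothetical access functions whose signatures mirror the templates (required arguments for b, no argument for u, optional arguments for f).

```python
# Made-up contents for the two sources.
R_ROWS = [(5, "a", 1), (5, "b", 2), (7, "c", 1)]     # R(X, Y, Z)
T_ROWS = [(1, "w1", 3), (2, "w2", 4), (9, "w3", 3)]  # T(Z, W, U)


def query_T(u, z=None):
    """Template f, u, b: U must be given, W cannot be, Z is optional."""
    return [t for t in T_ROWS if t[2] == u and (z is None or t[0] == z)]


def query_R(z, x=None, y=None):
    """Template f, f, b: Z must be given, X and Y are optional."""
    return [
        r for r in R_ROWS
        if r[2] == z and (x is None or r[0] == x) and (y is None or r[1] == y)
    ]


def plan2(x, u):
    """Answer M(x, Y, Z, W, u) where M = Join(R, T) on Z."""
    answers = []
    p = query_T(u=u)                            # step 1: P = T(Z, W, u)
    for z, w, u_val in p:                       # step 2: one R query per tuple
        for x_val, y, _ in query_R(z=z, x=x):   #         R(x, Y, z)
            answers.append((x_val, y, z, w, u_val))  # step 3: join answers
    return answers


print(plan2(5, 3))
```

The number of R queries equals the number of tuples in P, which is one reason the cost of such plans depends on the order in which sources are visited.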

21 Mediator Plan Generation. Need a feasible and efficient plan. Search space is huge. Tsimmis, Info Manifold, Garlic: exponential algorithms. Polynomial algorithms: often find optimal or near-optimal plan; bounded performance. See R. Yerneni, C. Li, J. D. Ullman, H. Garcia-Molina: Optimizing Large Join Queries in Mediation Systems, ICDT 1999.

22 Conclusion. Not all sources are created equal! Need to: describe what sources can do; efficiently process queries with limited sources; describe what mediators can do; exploit content information; deal with unavailable sources.

23 References. Ramana Yerneni, Chen Li, Hector Garcia-Molina, Jeffrey D. Ullman: Computing Capabilities of Mediators, SIGMOD Conference 1999. Vasilis Vassalos, Yannis Papakonstantinou: Describing and Using Query Capabilities of Heterogeneous Sources, VLDB 1997.

