1 Global-as-View and Local-as-View for Information Integration CS652 Spring 2004 Presenter: Yihong Ding.


2 Common Integration Architecture
– Information integration systems
– Global-as-view (Gav) vs. local-as-view (Lav)
– Query reformulation
– Specification of source descriptions
– Adding new sources

3 Query Reformulation
Problem: rewrite a user query expressed in the mediated schema into a query expressed in the source schema.
Given a query Q in terms of the mediator schema relations, and descriptions of the information sources, find a query Q’ that uses only the source relations, such that
– Q’ ⊆ Q, and
– Q’ provides all possible answers to Q given the sources

4 Solving Queries by Views
[Figure: mediator relations and source relations]

5 Query Rewriting Using Views
Query containment: q’ ⊆ q ⇔ ∀D: q’(D) ⊆ q(D)
Query equivalence: q’ = q ⇔ q’ ⊆ q ∧ q ⊆ q’
Given a query q and view definitions V = {v1, …, vn}:
q’ is an equivalent rewriting of q using V if
– q’ refers only to views in V, and
– q’ = q
q’ is a maximally-contained rewriting of q using V if
– q’ refers only to views in V,
– q’ ⊆ q, and
– there is no rewriting q1 such that q’ ⊆ q1 ⊆ q and q1 is not equivalent to q’

6 Computational Complexity

7 Complexity of Query Containment
Conjunctive queries (CQ): NP-complete
– Q1: p(X,Z) :- a(X,Y) & a(Y,Z)
– Q2: p(X,Z) :- a(X,Y) & a(V,Z)
CQs with negation: Π₂ᵖ-complete
– Q1: p(X,Z) :- a(X,Y) & a(Y,Z) & NOT a(X,Z)
CQs with arithmetic comparison: Π₂ᵖ-complete
– Q1: p(X,Z) :- a(X,Y) & a(Y,Z) & X<Y
Datalog programs
– p(A,C) :- a(A,B) & b(B,C)
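For plain conjunctive queries, containment can be tested by searching for a containment mapping: Q1 ⊆ Q2 holds exactly when some mapping of Q2's variables onto Q1's terms sends Q2's head to Q1's head and every subgoal of Q2 to a subgoal of Q1. The following is a minimal brute-force sketch of that test, applied to Q1 and Q2 from the slide. The tuple encoding of queries and the function names are assumptions made for illustration, and the exhaustive search reflects the NP-complete worst case.

```python
from itertools import product

def is_variable(term):
    # Datalog convention assumed here: variables start with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()

def contained_in(q1, q2):
    """Return True if q1 is contained in q2 (containment mapping from q2 to q1).

    A query is (head_args, body), where body is a list of (predicate, args).
    """
    head1, body1 = q1
    head2, body2 = q2
    if len(head1) != len(head2):
        return False
    vars2 = sorted({t for _, args in body2 for t in args if is_variable(t)}
                   | {t for t in head2 if is_variable(t)})
    terms1 = sorted({t for _, args in body1 for t in args} | set(head1))
    subgoals1 = {(pred, tuple(args)) for pred, args in body1}

    # Try every assignment of q2's variables to q1's terms.
    for image in product(terms1, repeat=len(vars2)):
        mapping = dict(zip(vars2, image))

        def apply(term):
            # Constants map to themselves.
            return mapping.get(term, term)

        if tuple(apply(t) for t in head2) != tuple(head1):
            continue
        if all((pred, tuple(apply(t) for t in args)) in subgoals1
               for pred, args in body2):
            return True
    return False

# Q1: p(X,Z) :- a(X,Y) & a(Y,Z)
Q1 = (("X", "Z"), [("a", ("X", "Y")), ("a", ("Y", "Z"))])
# Q2: p(X,Z) :- a(X,Y) & a(V,Z)
Q2 = (("X", "Z"), [("a", ("X", "Y")), ("a", ("V", "Z"))])

q1_in_q2 = contained_in(Q1, Q2)  # mapping V -> Y, identity elsewhere
q2_in_q1 = contained_in(Q2, Q1)  # no mapping sends a(Y,Z) into Q2's body
```

Here Q1 is contained in Q2 but not the other way around, because Q2 does not require the two `a` subgoals to share a join variable.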

8 Specification of Source Description
Views: resources that the integrator uses to help answer queries
– Gav: each mediator relation is defined as a view over the source relations
– Lav: each source relation is defined as a view over the mediator relations
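Under Gav, a query over mediator relations can be reformulated by unfolding: each mediator subgoal is replaced by the body of a view definition over the sources. The sketch below illustrates this under simplifying assumptions: the relation names (`book`, `s1_book`, `s2_title`, `s2_author`) are hypothetical, rule bodies contain only variables, and each alternative definition yields one conjunctive query in the resulting union.

```python
from itertools import count, product

# Hypothetical Gav definitions: mediator relation -> list of (head_vars, body).
# book(T, A) :- s1_book(T, A)
# book(T, A) :- s2_title(I, T) & s2_author(I, A)
RULES = {
    "book": [
        (("T", "A"), [("s1_book", ("T", "A"))]),
        (("T", "A"), [("s2_title", ("I", "T")), ("s2_author", ("I", "A"))]),
    ],
}

def unfold(query_body, rules):
    """Unfold mediator subgoals into a union of conjunctive source queries."""
    fresh = count()
    options = []
    for pred, args in query_body:
        alternatives = []
        for head_vars, body in rules[pred]:
            substitution = dict(zip(head_vars, args))
            # Variables that appear only in the rule body get a fresh suffix
            # so that separate unfoldings never share them by accident.
            suffix = f"_{next(fresh)}"
            alternatives.append(
                [(p, tuple(substitution.get(t, t + suffix) for t in a))
                 for p, a in body]
            )
        options.append(alternatives)
    # One conjunctive query per choice of definition for each subgoal.
    return [[atom for part in combo for atom in part]
            for combo in product(*options)]

# Query body: book(X, "Aho")
plans = unfold([("book", ("X", "Aho"))], RULES)
```

Adding a source under Gav means editing the definitions in `RULES`, which is why source changes touch the mediator description directly.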

9 Information Integration Systems
Tsimmis
– Stanford and IBM
– Global-as-View (Gav)
– Mediator relations defined as views of source relations
Information Manifold (IM)
– AT&T
– Local-as-View (Lav)
– Description logic
– Source relations defined as views of mediator relations (a collection of global predicates)

10 TSIMMIS – Gav Solution
The Stanford-IBM Manager of Multiple Information Sources (TSIMMIS) offers:
– A flexible data model
– A common query language
– Other supporting tools

11 TSIMMIS – Components
– OEM (Object Exchange Model)
– LOREL (Lightweight Object REpository Language)
– MSL (Mediator Specification Language)
– Wrappers

12 TSIMMIS – OEM
Object Exchange Model
– The data model for TSIMMIS
– “Self-describing” (labels carry all of the information that there is about an object)
– Flexible
– First-order logic

13 TSIMMIS – OEM
An OEM object has four fields:
– OID: the object identifier
– label: human-understandable
– type: “set” or “string”
– value: a set or a string

14 TSIMMIS – OEM
[Figure: a library object (set) contains a book object (set), which contains an author object (string, “Aho”) and a title object (string, “Compilers…”)]

15 TSIMMIS – OEM
First-order predicate logic
Object: OID 123, label author, type string, value Aho
Query: author( T, “Aho” )
This returns the object IDs of all objects with label “author” and value “Aho”.
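The OEM structure on these slides can be mirrored with a small recursive data type. The sketch below is an illustration only: the class name, the `find` helper, and every OID other than 123 are assumptions, not part of TSIMMIS itself.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class OEMObject:
    oid: str                                   # object identifier
    label: str                                 # human-understandable label
    type: str                                  # "set" or "string"
    value: Union[str, List["OEMObject"]]       # a string, or a set of sub-objects

def find(root, label, value):
    """Return the OIDs of all objects under root with the given label and value."""
    hits = []
    stack = [root]
    while stack:
        obj = stack.pop()
        if obj.label == label and obj.type == "string" and obj.value == value:
            hits.append(obj.oid)
        if obj.type == "set":
            stack.extend(obj.value)
    return hits

# The library example from the slides; OIDs other than 123 are hypothetical.
author = OEMObject("123", "author", "string", "Aho")
title = OEMObject("124", "title", "string", "Compilers…")
book = OEMObject("12", "book", "set", [author, title])
library = OEMObject("1", "library", "set", [book])

# Plays the role of the query author( T, "Aho" ).
matching_oids = find(library, "author", "Aho")
```

Because every object carries its own label and type, no separate schema is needed to interpret the tree, which is what “self-describing” refers to.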

16 TSIMMIS – LOREL
Lightweight Object REpository Language
– An OQL for OEM
– The end-user language for TSIMMIS

17 TSIMMIS – LOREL
Example:
select library.book.title
from library
where library.book.author = “Aho”

18 TSIMMIS – LOREL
Partial match semantics:
select R.A
from R, S, T
where R.A = S.A or R.A = T.A
In SQL this returns nothing if either S or T is empty, because the cross product R × S × T is then empty. Under LOREL's partial match semantics the query does not fail in that case.
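The SQL half of this claim can be checked directly with Python's built-in `sqlite3` module. In the sketch below the table contents are made up for illustration, and `partial_match` is only a hypothetical stand-in that approximates the effect described on the slide; it is not LOREL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE R (A INTEGER);
    CREATE TABLE S (A INTEGER);
    CREATE TABLE T (A INTEGER);
    INSERT INTO R VALUES (1), (2);
    INSERT INTO S VALUES (1);
    -- T is left empty on purpose.
""")

# SQL semantics: the FROM clause forms R x S x T, which is empty when T is empty.
sql_rows = conn.execute(
    "SELECT R.A FROM R, S, T WHERE R.A = S.A OR R.A = T.A"
).fetchall()

def partial_match(r, s, t):
    # Approximation of the slide's behaviour: a value of R.A qualifies if it
    # matches in S or in T, regardless of whether the other relation is empty.
    return [a for a in r if a in s or a in t]

partial_rows = partial_match([1, 2], [1], [])
conn.close()
```

With these inputs the SQL query yields no rows, while the partial-match version still reports the value that R shares with S.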

19 TSIMMIS – MSL
Mediator Specification Language
– Allows declarative specification of mediators
– Object-oriented, logical query language
– Targeted to OEM

20 TSIMMIS – MSL
[Figure: Query, Mediator, Wrapper, Source; an MSL rule over source @s1 alongside the OEM tree library (set), book (set), author (string, “Aho”), title (string, “Compilers…”)]

21 TSIMMIS – Wrappers
[Figure: Query, Mediator, Wrapper, Source]
– Wrappers are similar to database drivers
– Wrappers are written with MSL

22 TSIMMIS – Wrappers
Wrappers have the form:
MSL template // action //
Example:
:- }> }>@s1 // sprintf(lookup-query, “find author %s”, $AU) //

23 TSIMMIS – Summary
– End users need to specify their sources with respect to a mediator model (OEM in TSIMMIS)
– Query specification is standard (LOREL)
– Query rewriting is straightforward (MSL and wrappers)
– Adding a new source is not easy: it must be specified in the mediator model

24 Information Manifold
Challenges for information integration:
– Interrelated data spread over multiple information sources
– A large number of sources
– Limited amounts of data in many of the sources
– Widely varying details of interacting with each source

25 IM Architecture
[Figure: IM architecture with three numbered steps; the Bucket algorithm is labeled in the diagram]

26 World View
Virtual relations:
– Product( Model )
– Automobile( Model, Year, Category )
– Motorcycle( Model, Year )
– Car( Model, Year, Category )
– NewCar( Model, Year, Category )
– UsedCar( Model, Year, Category )
– CarForSale( Model, Year, Category, Price, SellerContact )
Classes (class: subclasses):
– Product: Automobile
– Automobile: Car, Motorcycle
– Car: NewCar, UsedCar, CarForSale

27 Source Descriptions
For each source:
– Content record
– Capability record
Web sources for the automobile application

28 Content Records of Auto Sources

29 Capability Records of Auto Sources
Columns: desired input set, possible output set, capable selection set

30 Query Reformulation
Contained rewritings instead of equivalent ones:
– Sources are incomplete
– A useful subset of the answers
Utilizes the plan generator to:
– Prune irrelevant sources
– Split the query into subgoals
– Generate conjunctive query plans
– Find an executable ordering of subgoals

31 The Bucket Algorithm
Given: user query q, source descriptions {Vi}
1. Find relevant sources (fill buckets). For each relation g in query q:
– Find each Vj that contains relation g
– Check that the constraints in Vj are compatible with q
2. Combine source relations {Vj}, one from each bucket, into a conjunctive query q’ and check for containment (q’ ⊆ q)

32 The Bucket Algorithm: Example
q(m,p,r) ← CarForSale(c), Category(c,sportscar), Year(c,y), y ≥ 1992, Model(c,m), Price(c,p), ProductReview(m,y,r)

33 1. Filling the Buckets
q(m,p,r) ← CarForSale(c), Category(c,sportscar), Year(c,y), y ≥ 1992, Model(c,m), Price(c,p), ProductReview(m,y,r)
Buckets, one per subgoal (constraints: y ≥ 1992, t=sportscar):
– CarForSale(c): V1(c1), V2(c2), V3(c3)
– Category(c,t): V1(c1,t1), V2(c2,t2), V3(c3,t3)
– Year(c,y): V1(c1,y1), V2(c2,y2), V3(c3,y3)
– Model(c,m): V1(c1,m1), V2(c2,m2), V3(c3,m3)
– Price(c,p): V1(c1,p1), V2(c2,p2), V3(c3,p3)
– ProductReview(m,y,r): V5(m5,y5,r5)
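The bucket-filling step can be sketched as follows. The source descriptions in `VIEWS` are hypothetical, since the content records themselves are not reproduced in this transcript: each view lists the mediated relations it covers and a year range standing in for its constraints. Only step 1 and the enumeration of candidate combinations are shown; each candidate would still need the containment check of step 2.

```python
import math
from itertools import product

QUERY_SUBGOALS = ["CarForSale", "Category", "Year", "Model", "Price", "ProductReview"]
QUERY_YEAR = (1992, math.inf)  # the query constraint y >= 1992

CAR_RELATIONS = {"CarForSale", "Category", "Year", "Model", "Price"}

# Hypothetical source descriptions (relations covered, year range offered).
VIEWS = {
    "V1": {"relations": CAR_RELATIONS, "year": (1985, math.inf)},
    "V2": {"relations": CAR_RELATIONS, "year": (1990, math.inf)},
    "V3": {"relations": CAR_RELATIONS, "year": (-math.inf, math.inf)},
    "V4": {"relations": CAR_RELATIONS, "year": (-math.inf, 1990)},  # too old for q
    "V5": {"relations": {"ProductReview"}, "year": (-math.inf, math.inf)},
}

def overlaps(a, b):
    # Two closed intervals are compatible if they share at least one point.
    return max(a[0], b[0]) <= min(a[1], b[1])

def fill_buckets(subgoals, views, query_year):
    """Step 1: for each subgoal, collect the views that cover it and are
    compatible with the query's constraints."""
    return {
        g: [name for name, v in views.items()
            if g in v["relations"] and overlaps(v["year"], query_year)]
        for g in subgoals
    }

def candidate_plans(buckets):
    """Enumerate one view per bucket; each result is a candidate for step 2."""
    return [dict(zip(buckets, combo)) for combo in product(*buckets.values())]

buckets = fill_buckets(QUERY_SUBGOALS, VIEWS, QUERY_YEAR)
plans = candidate_plans(buckets)
```

With these assumed descriptions the buckets match the slide (V1, V2, V3 for the car subgoals and V5 for ProductReview), and the number of candidates is the product of the bucket sizes, which is what makes the containment checks in step 2 the expensive part.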

34 2. Checking Containment
User query:
q(m,p,r) ← CarForSale(c), Category(c,sportscar), Year(c,y), y ≥ 1992, Model(c,m), Price(c,p), ProductReview(m,y,r)
Result query:
q’(m,p,r) ← V1(c)({Category(c):sportscar}, {Price(c), Model(c), Year(c)}, {Year(c) ≥ 1992, Category(c)=sportscar}), V5(m,y,r)({m:Model(c), y:Year(c)}, {r}, {})
Expanded query:
q’(m,p,r) ← CarForSale(c), UsedCar(c), Category(c,t), t=sportscar, Model(c,m), Year(c,y), Price(c,p), ProductReview(m,y,r), y ≥ 1992
Containment test: is the expanded q’ ⊆ q?

35 Finding an Executable Ordering
Subgoals: CarForSale(c), Category(c,t), Year(c,y), Model(c,m), Price(c,p), ProductReview(m,y,r), with y ≥ 1992 and t=sportscar
Chosen sources: V1(c), V1(c,t), V1(c,y), V1(c,m), V1(c,p), V5(m,y,r)
The set of available bindings grows as subgoals are placed in the ordering:
– BindAvail 1 = {CarForSale(c,sportscar), Model(c,m), Year(c,y), Price(c,p), SellerContact(c,s)}
– BindAvail 1 = {CarForSale(c,sportscar), Model(c,m), Year(c,y), Price(c,p), SellerContact(c,s), ProductReview(m,y,r)}
– BindAvail 1 = {CarForSale(c,sportscar), Model(c,m), Year(c,y), Price(c,p), SellerContact(c,s), ProductReview(m,y,r), y ≥ 1992}
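One simple way to find an executable ordering is a greedy pass: repeatedly pick a source whose required inputs are already bound, then add its outputs to the set of available bindings. The sketch below assumes each source access is described by a required input set and an output set, loosely following the capability records on the earlier slides. The function name and the variable names in the sets are illustrative, and this is not necessarily the procedure Information Manifold uses.

```python
def executable_order(subgoals, initially_bound):
    """Greedily order source accesses so that every access has its inputs bound.

    subgoals: list of (name, required_inputs, outputs), with sets of variable names.
    Returns a list of names, or None if no executable ordering exists.
    """
    bound = set(initially_bound)
    remaining = list(subgoals)
    order = []
    while remaining:
        # A source is ready when all of its required inputs are already bound.
        ready = next((s for s in remaining if s[1] <= bound), None)
        if ready is None:
            return None
        name, _, outputs = ready
        order.append(name)
        bound |= outputs
        remaining.remove(ready)
    return order

# V1 needs the category (bound by the constant sportscar) and yields c, m, y, p.
# V5 needs m and y and yields the review r.
plan = [
    ("V5", {"m", "y"}, {"r"}),
    ("V1", {"category"}, {"c", "m", "y", "p"}),
]

ordering = executable_order(plan, {"category"})
no_ordering = executable_order(plan, set())
```

V5 cannot run first because its inputs m and y only become available after V1 has been accessed, so the ordering places V1 before V5.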

36 Advantages and Disadvantages
Gav: Tsimmis
– Advantage: query reformulation (rule unfolding)
– Disadvantages: mediation description; adding, removing, and modifying source descriptions
– Better for static, centralized systems
Lav: Information Manifold
– Advantages: adding new sources; mediator (global predicates, source descriptions); query processing
– Disadvantage: query reformulation (Bucket algorithm)
– Better for dynamic, distributed systems

