Navigational Plans For Data Integration. Marc Friedman, Alon Levy, Todd Millstein. Presented by Avinash Ponnala.


1 Navigational Plans For Data Integration
Marc Friedman, Alon Levy, Todd Millstein
Presented by Avinash Ponnala

2 Introduction
Data integration with webs of data as sources.
Previous work is inappropriate for incorporating data webs as sources in data integration.
Data integration systems pose many hard technical problems.
Because the number of sources keeps growing, sources should be modeled as webs of data.

3 Goal
A procedure for modeling data webs, i.e., incorporating them into a data integration system.
The GLAV language for source descriptions.
An algorithm for reformulating user queries into execution plans that both query and navigate the data sources.

4 Incorporating Data Webs
A data web consists of pages and the links between them.
The structure of a data web is represented by a web schema.
In a web schema, nodes are sets of pages and directed edges are sets of directed links between them.

5 Example of a Web Schema

6 Univ represents the home page of a university.
Univ(u1) denotes the home page object of university u1.
Every website has a set of entry points, i.e., nodes that the data integration system can access directly by URL.
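
A web schema with entry points can be pictured as a small directed graph. The sketch below is only an illustration under stated assumptions: the class name `WebSchema`, its methods, and every node name other than Univ (College, Dept, Archive) are hypothetical and do not come from the slides. It records nodes, directed edges, and entry points, and computes which nodes a system could reach by following links from the entry points.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class WebSchema:
    """Hypothetical container: nodes are sets of pages, edges are sets of links."""
    entry_points: Set[str] = field(default_factory=set)
    edges: Dict[str, Set[str]] = field(default_factory=dict)

    def add_edge(self, source: str, target: str) -> None:
        # Register both endpoints as nodes and record the directed edge.
        self.edges.setdefault(source, set()).add(target)
        self.edges.setdefault(target, set())

    def nodes(self) -> Set[str]:
        return set(self.edges)

    def reachable(self) -> Set[str]:
        # Breadth-first traversal starting from the entry points.
        seen = set(self.entry_points)
        queue = deque(self.entry_points)
        while queue:
            node = queue.popleft()
            for nxt in self.edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen


# Assumed example: only Univ is an entry point (directly addressable by URL).
schema = WebSchema(entry_points={"Univ"})
schema.add_edge("Univ", "College")
schema.add_edge("College", "Dept")
schema.add_edge("Archive", "Dept")  # Archive has no incoming path from Univ

reachable_nodes = schema.reachable()
```

In this assumed example, Archive is a node of the schema but cannot be reached by navigation, because no path leads to it from an entry point.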

7 Three kinds of logical information are stored on each page:
1) Ordinary contents of the page: $p(Y_1, Y_2, \dots, Y_k)$
2) Outgoing edges from the page: $p(X, Y) \rightarrow M(Y)$
3) Search forms on the page: $p(X, Y) \xrightarrow{\text{form}} M(Y)$
Search forms map binary relations to other pages.

8 Mediated Schemas
A mediated schema is a set of relations that serves as a uniform query interface for all sources.
Example of a mediated schema for the university domain:
collegeOf(College, University)
deptOf(Department, College)
profOf(Professor, Department)
courseOf(Course, Department)
chairOf(Professor, Department)
prereqOf(Course, Course)

9 The user poses queries in terms of the relations and attributes of a mediated database schema.
The relations in the mediated schema are virtual.
The mediated schema captures the aspects of the domain that interest the users of the application.

10 Source Descriptions
Why source descriptions?
Sample source description

11 The mediated schema relations do not match the source relations one-to-one because:
1) Source schemas contain different levels of detail from each other.
2) Attributes are split into relations differently.
In addition to the mediated schema, the system has a set of source descriptions that specify a semantic mapping between the mediated schema and the source schemas.
The GAV and LAV source description languages address this mismatch.

12 LAV source descriptions have the form
$v(\bar{X}) = r_1(\bar{X}_1, \bar{Z}_1) \wedge \dots \wedge r_k(\bar{X}_k, \bar{Z}_k)$
where $v$ is a source relation and the $r_i$ are mediated schema relations.
LAV can express details that are not present in every source.

13 GAV source descriptions have the form
$v_1(\bar{X}_1, \bar{Y}_1) \wedge \dots \wedge v_j(\bar{X}_j, \bar{Y}_j) \Rightarrow r(\bar{X})$
Using either language alone has undesirable consequences and offers no flexibility.
GLAV combines the expressive power of both GAV and LAV.

14 GLAV source descriptions have the form
$V(\bar{X}, \bar{Y}) \Rightarrow r_1(\bar{X}_1, \bar{Z}_1) \wedge \dots \wedge r_k(\bar{X}_k, \bar{Z}_k)$
GLAV allows source descriptions that contain recursive queries over sources.
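
The three description forms differ only in what is allowed on each side of the implication. The following sketch captures that syntactic shape and nothing more: the classes `Atom` and `SourceDescription`, the `kind` method, and the source relation names `v1` and `v2` are hypothetical, and the left-hand side is restricted to a plain conjunction (the recursive queries that GLAV permits are not modeled).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Atom:
    relation: str
    args: Tuple[str, ...]


@dataclass(frozen=True)
class SourceDescription:
    sources: Tuple[Atom, ...]   # conjunction over source relations (left side)
    mediated: Tuple[Atom, ...]  # conjunction over mediated relations (right side)

    def kind(self) -> str:
        # Classify by shape only: one source atom is LAV-style,
        # one mediated atom is GAV-style, anything else needs GLAV.
        single_source = len(self.sources) == 1
        single_mediated = len(self.mediated) == 1
        if single_source and single_mediated:
            return "LAV and GAV"
        if single_source:
            return "LAV"
        if single_mediated:
            return "GAV"
        return "GLAV"


# Hypothetical sources v1 and v2 described over the university relations.
lav = SourceDescription(
    sources=(Atom("v1", ("P", "K")),),
    mediated=(Atom("profOf", ("P", "D")), Atom("courseOf", ("K", "D"))),
)
gav = SourceDescription(
    sources=(Atom("v1", ("P", "K")), Atom("v2", ("P", "D"))),
    mediated=(Atom("chairOf", ("P", "D")),),
)
glav = SourceDescription(
    sources=(Atom("v1", ("P", "K")), Atom("v2", ("P", "D"))),
    mediated=(Atom("profOf", ("P", "D")), Atom("courseOf", ("K", "D"))),
)
```

Every LAV or GAV description is also a GLAV description; the label "GLAV" here marks the descriptions that neither of the two restricted languages can express.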

15 Data Integration Domain
The mediated schema relations, the web schemas, and the source descriptions together form a data integration domain.
It is denoted by the triple D = (R, {Gi}, SD), where
R is the set of mediated schema relations,
{Gi} is the set of web schemas,
SD is the set of source descriptions.

16 How to Answer a Query?
A query processor is used.
The user query is translated into a lower-level procedural program called an execution plan.
A logical plan is constructed first.
A navigational plan is then formed by augmenting the logical plan with navigational information.
A navigational plan describes how to locate the desired relations in the data webs.

17 Logical Plan
A logical plan is a Datalog program whose EDB relations are the source relations and whose answer predicate is q.
The result of applying a Datalog program to a database is the set of tuples computed for the query predicate.
Given a conjunctive query Q, a sound and complete logical plan is constructed using GlavInverse, an inverse rules algorithm for GLAV.
Let T be the set of sentences in the source descriptions; GlavInverse converts the theory T into a Datalog program.
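
The slides do not spell out GlavInverse, so the sketch below illustrates only the general inverse rules idea, and only for the simplest case where the left side of a description is a single source atom. It is an assumption-laden illustration, not the algorithm from the presentation: the function name `inverse_rules`, the Skolem naming scheme `f_<source>_<variable>`, and the source `v1` are all hypothetical. Each mediated atom on the right side becomes the head of one Datalog rule whose body is the source atom, and each variable that appears only on the right side is replaced by a Skolem term over the source's variables.

```python
from typing import List, Sequence, Tuple


def inverse_rules(
    source: str,
    source_vars: Sequence[str],
    mediated_atoms: Sequence[Tuple[str, Sequence[str]]],
) -> List[str]:
    """Invert  source(source_vars) => conjunction of mediated atoms  into Datalog rules."""
    body_args = ", ".join(source_vars)
    rules = []
    for relation, args in mediated_atoms:
        head_args = []
        for arg in args:
            if arg in source_vars:
                head_args.append(arg)
            else:
                # Existential variable: stand in a Skolem term built from
                # the source's variables (naming scheme is an assumption).
                head_args.append(f"f_{source}_{arg}({body_args})")
        rules.append(f"{relation}({', '.join(head_args)}) :- {source}({body_args}).")
    return rules


# Hypothetical source v1(P, K): professor P works in the department offering course K.
rules = inverse_rules(
    "v1",
    ["P", "K"],
    [("profOf", ["P", "D"]), ("courseOf", ["K", "D"])],
)
for rule in rules:
    print(rule)
```

Both generated rules share the same Skolem term for D, which is what lets a later join recover that P and K belong to the same (unknown) department. Constants and multi-atom left sides are not handled in this sketch.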

18 GlavInverse Algorithm

19 Theorem: Let D = (R, {Gi}, SD) be a data integration domain and let Q be a conjunctive query. Then the logical plan ∆ returned by GlavInverse is sound and complete.

20 Navigational Plan
Logical plans do not explain how to populate the source relations from data webs, so they cannot be executed by themselves.
Logical plans are therefore extended to navigational plans.
Navigational plans are augmented Datalog programs.
Navigational terms specify both the location and the logical content of a relation stored in the data web.

21 A navigational term has the form P:v(X), where P is a path and v is a source relation.
A path P starts at source(P) and ends at target(P).
Trivial paths: if P = [N(X)], where N is a node and X is a variable or constant, then source(P) = target(P) = N(X).

22 Compound paths: $P' = [P \xrightarrow{e} M(Y)]$ is a path if P is a path with target(P) = N(X) and e is an edge from node N(X) to node M(Y); then source(P') = source(P) and target(P') = M(Y).
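
The recursive definition of paths translates directly into a small data structure. The sketch below is a minimal illustration, assuming a web schema given as a set of (source node, target node) pairs; the names `Page`, `Path`, `extend`, and `EDGES`, and the nodes College and Dept, are hypothetical. A trivial path holds a single page, and a compound path is built by extending an existing path along an edge that leaves its target.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass(frozen=True)
class Page:
    node: str  # node of the web schema, e.g. "Univ"
    arg: str   # variable or constant, e.g. "u1"


@dataclass(frozen=True)
class Path:
    end: Page                        # last page on the path
    prefix: Optional["Path"] = None  # None for a trivial path [N(X)]

    def source(self) -> Page:
        # source(P') = source(P) for compound paths; N(X) for trivial ones.
        return self.end if self.prefix is None else self.prefix.source()

    def target(self) -> Page:
        return self.end

    def extend(self, edges: Set[Tuple[str, str]], page: Page) -> "Path":
        # A compound path is valid only if the schema has an edge
        # from the current target's node to the new page's node.
        if (self.target().node, page.node) not in edges:
            raise ValueError(f"no edge from {self.target().node} to {page.node}")
        return Path(end=page, prefix=self)


# Assumed schema edges for the university example.
EDGES = {("Univ", "College"), ("College", "Dept")}

trivial = Path(Page("Univ", "u1"))
compound = trivial.extend(EDGES, Page("College", "C"))
```

Here `trivial.source()` and `trivial.target()` are both Univ(u1), while `compound` keeps Univ(u1) as its source and has College(C) as its target. A navigational term P:v(X) would pair such a path with a source relation; that pairing is left out of the sketch.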

23

24

25 Given a logical plan ∆ and the web schemas, the navigational plan algorithm produces a navigational plan ∆′.
The navigational plan ∆′ it produces is sound and complete.

26 Conclusions
Showed how to extend data integration systems to incorporate data webs.
Studied a formalism for modeling data webs and a language for source descriptions.
Focused on an algorithm for answering queries using GLAV source descriptions.

27 QUERIES?

28 THANK YOU

