Navigational Plans For Data Integration. Marc Friedman, Alon Levy, Todd Millstein. Presented by Avinash Ponnala.



Introduction
Data integration with webs of data as sources.
Previous work is inappropriate for incorporating data webs as sources in data integration.
Data integration systems pose many hard technical problems.
Due to the growing number of sources, they should be modeled as webs of data.

GOAL
A procedure for modeling data webs, i.e., incorporating them into a data integration system.
The GLAV language for source descriptions.
An algorithm for reformulating user queries into execution plans that both query and navigate the data sources.

Incorporating Data Webs
A data web consists of pages and the links between them.
The structure of a data web is represented by a web schema.
In a web schema, nodes represent sets of pages and directed edges represent sets of directed links between them.

Example of a Web Schema

Univ represents the home page of a university.
Univ(u1) denotes the home page object of university u1.
Every website has a set of entry points, i.e., nodes that the data integration system can access directly by URL.
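A web schema with entry points can be sketched as a small graph structure. This is only an illustrative sketch: the class name `WebSchema`, its methods, and every node name other than Univ are assumptions made for the example, not taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class WebSchema:
    """Nodes stand for sets of pages; edges stand for sets of directed links."""
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)          # (source, target) pairs
    entry_points: set = field(default_factory=set)   # nodes reachable directly by URL

    def add_edge(self, source, target):
        # Adding an edge also registers both endpoints as nodes.
        self.nodes.update((source, target))
        self.edges.add((source, target))

    def add_entry_point(self, node):
        self.nodes.add(node)
        self.entry_points.add(node)

    def successors(self, node):
        # Nodes one edge away from `node`, sorted for stable output.
        return sorted(t for s, t in self.edges if s == node)


# Hypothetical university web; only "Univ" is named on the slides.
schema = WebSchema()
schema.add_entry_point("Univ")
schema.add_edge("Univ", "College")
schema.add_edge("College", "Dept")
schema.add_edge("Dept", "Course")
schema.add_edge("Dept", "Prof")
```

Keeping entry points as a separate set makes it explicit which nodes the system may fetch directly and which it can only reach by following links.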

There are three kinds of logical information stored on each page:
1) Ordinary contents of the page: p(Y1, Y2, ..., Yk).
2) Outgoing edges from the page: p(X, Y) --> M(Y).
3) Search forms on the page: p(X, Y) --form--> M(Y).
Search forms map binary relations to other pages.

Mediated Schemas
A mediated schema is a set of relations that serves as a uniform query interface for all sources.
Here is an example of a mediated schema for our university domain:
collegeOf(College, University)
deptOf(Department, College)
profOf(Professor, Department)
courseOf(Course, Department)
chairOf(Professor, Department)
prereqOf(Course, Course)

The user poses queries in terms of the relations and attributes of a mediated database schema.
The relations in the mediated schema are virtual.
The mediated schema captures the aspects of the domain that are of interest to the users of the application.
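To make "queries in terms of the mediated schema" concrete, the sketch below stores a few of the mediated relations with their arities and checks that a conjunctive query body uses only those relations. The query itself, the `teaches` relation, and the helper `invalid_atoms` are hypothetical and serve only as illustration.

```python
# A few mediated-schema relations from the university example, mapped to their arity.
MEDIATED_SCHEMA = {
    "collegeOf": 2,
    "profOf": 2,
    "courseOf": 2,
    "chairOf": 2,
    "prereqOf": 2,
}


def invalid_atoms(body):
    """Return the atoms whose relation name or arity is not in the mediated schema."""
    return [(name, args) for name, args in body
            if MEDIATED_SCHEMA.get(name) != len(args)]


# Hypothetical query: q(P, C) :- chairOf(P, D), courseOf(C, D)
query_body = [("chairOf", ("P", "D")), ("courseOf", ("C", "D"))]

# A body that mentions an unknown relation and uses a wrong arity.
bad_body = [("teaches", ("P", "C")), ("prereqOf", ("C",))]
```

Because the mediated relations are virtual, a check like this only validates the query text; no data is stored under these relation names.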

Source Descriptions
Why source descriptions?
Sample source description

The mediated schema relations do not match the source relations in a one-to-one fashion because:
1) Source schemas contain different levels of detail from each other.
2) The splitting of attributes into relations is different.
In addition to the mediated schema, the system has a set of source descriptions that specify a semantic mapping between the mediated schema and the source schemas.
The problem of mismatch can be addressed by the GAV and LAV source description languages.

LAV source descriptions have the form
$v(\bar{X}) = r_1(\bar{X}_1, \bar{Z}_1) \wedge \dots \wedge r_k(\bar{X}_k, \bar{Z}_k)$
where v is a source relation and the r_i are mediated schema relations.
LAV descriptions can express details that are not present in every source.

GAV source descriptions have the form
$V_1(\bar{X}_1, \bar{Y}_1) \wedge \dots \wedge V_j(\bar{X}_j, \bar{Y}_j) \Rightarrow r(\bar{X})$
Using either language alone has undesirable consequences, and neither offers flexibility.
GLAV combines the expressive power of both GAV and LAV.

GLAV source descriptions have the form
$V(\bar{X}, \bar{Y}) \Rightarrow r_1(\bar{X}_1, \bar{Z}_1) \wedge \dots \wedge r_k(\bar{X}_k, \bar{Z}_k)$
GLAV allows source descriptions that contain recursive queries over sources.

Data Integration Domain
The mediated schema relations, the web schemas, and the source descriptions together form a data integration domain.
It is denoted as a triple D = (R, {Gi}, SD), where
R is the set of mediated schema relations,
{Gi} is the set of web schemas,
SD is the set of source descriptions.

How to answer a query?
Using a query processor.
The user query is translated into a lower-level procedural program called an execution plan.
A logical plan is constructed first.
A navigational plan is formed later by augmenting the logical plan with navigational information.
A navigational plan describes how to locate the desired relations in the data webs.

Logical Plan
A logical plan is a Datalog program whose EDB relations are the source relations and whose answer predicate is q.
The result of applying a Datalog program to a database is the set of tuples computed for the query predicate.
Given a conjunctive query Q, a sound and complete logical plan is constructed using an inverse rules algorithm for GLAV called GlavInverse.
Let T contain the sentences in the source descriptions; GlavInverse converts the theory T into a Datalog program.
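What it means to apply a Datalog program to a database can be illustrated with a naive bottom-up evaluator. This is a generic sketch, not the evaluation strategy of the presented system; the source relation `v_prereq`, the course constants, and the convention that capitalized strings are variables are all assumptions made for the example.

```python
def is_var(term):
    # Assumed convention for this sketch: variables are capitalized strings.
    return isinstance(term, str) and term[:1].isupper()


def match(atom, fact, subst):
    """Extend `subst` so that `atom` equals `fact`, or return None if impossible."""
    (name, args), (fname, fargs) = atom, fact
    if name != fname or len(args) != len(fargs):
        return None
    subst = dict(subst)
    for a, v in zip(args, fargs):
        if is_var(a):
            if subst.setdefault(a, v) != v:
                return None
        elif a != v:
            return None
    return subst


def evaluate(rules, edb, answer="q"):
    """Naive evaluation: apply every rule until no new fact is derived."""
    facts = set(edb)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            substs = [{}]
            for atom in body:
                substs = [s2 for s in substs for f in facts
                          if (s2 := match(atom, f, s)) is not None]
            for s in substs:
                new = (head[0], tuple(s.get(t, t) for t in head[1]))
                if new not in facts:
                    facts.add(new)
                    changed = True
    return {args for name, args in facts if name == answer}


# Hypothetical plan: which courses are direct or indirect prerequisites of cs300?
rules = [
    (("req", ("X", "Y")), [("v_prereq", ("X", "Y"))]),
    (("req", ("X", "Z")), [("v_prereq", ("X", "Y")), ("req", ("Y", "Z"))]),
    (("q", ("X",)), [("req", ("X", "cs300"))]),
]
edb = {("v_prereq", ("cs100", "cs200")), ("v_prereq", ("cs200", "cs300"))}
answers = evaluate(rules, edb)
```

Here `v_prereq` plays the role of an EDB (source) relation and `q` is the answer predicate, matching the roles named on the slide.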

GlavInverse Algorithm

Theorem: Let D = (R, {Gi}, SD) be a data integration domain and let Q be a conjunctive query. Then the logical plan ∆ returned by GlavInverse is sound and complete.
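The inverse-rules idea that GlavInverse builds on can be sketched for the simplest case, a description with a single source atom on the left. This is the classic inverse-rules construction, not the GlavInverse procedure itself; the source `v1`, its description, and the Skolem naming scheme `f_<view>_<variable>` are assumptions chosen for illustration.

```python
def inverse_rules(view_name, view_args, rhs_atoms):
    """Invert  view(view_args) => rhs_atoms  into one Datalog rule per right-hand atom.

    Assumes every term is a variable. Variables that appear only on the
    right-hand side are existential and are replaced by Skolem terms over
    the view's arguments.
    """
    head_vars = set(view_args)

    def skolemize(term):
        if term in head_vars:
            return term
        return (f"f_{view_name}_{term}", tuple(view_args))

    body = [(view_name, tuple(view_args))]
    return [((name, tuple(skolemize(t) for t in args)), body)
            for name, args in rhs_atoms]


# Hypothetical description: v1(P, C) => profOf(P, D) AND courseOf(C, D)
rules = inverse_rules("v1", ["P", "C"],
                      [("profOf", ("P", "D")), ("courseOf", ("C", "D"))])
# rules[0] reads: profOf(P, f_v1_D(P, C)) :- v1(P, C)
# rules[1] reads: courseOf(C, f_v1_D(P, C)) :- v1(P, C)
```

Both inverse rules share the same Skolem term for D, which preserves the fact that the professor and the course belong to the same (unknown) department.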

Navigational Plan
Logical plans do not explain how to populate the source relations from data webs, so they cannot be executed by themselves.
Logical plans are extended to navigational plans.
Navigational plans are augmented Datalog programs.
Navigational terms specify both the location and the logical content of the relation stored in the data web.

A navigational term is of the form P:v(X), where P is a path and v is a source relation.
The path P starts at source(P) and ends at target(P).
Trivial paths: if P = [N(X)], where N is a node and X is a variable or constant, then source(P) = target(P) = N(X).

Compound paths: P′ = [P --e--> M(Y)] is a path if P is a path with target(P) = N(X) and e is an edge from node N(X) to node M(Y). Then source(P′) = source(P) and target(P′) = M(Y).
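The two path definitions translate directly into a recursive data structure. The sketch below is one possible encoding under stated assumptions: the class `Path`, the edge labels `colleges` and `depts`, and the node names College and Dept are hypothetical, and a node N(X) is represented as a (name, argument) pair.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Node = Tuple[str, str]  # (node name, argument), e.g. ("Univ", "u1")


@dataclass(frozen=True)
class Path:
    """A trivial path [N(X)] (no prefix) or a compound path [P --e--> M(Y)]."""
    end: Node
    prefix: Optional["Path"] = None
    edge: Optional[str] = None

    def source(self) -> Node:
        # A trivial path starts where it ends; a compound path inherits its prefix's source.
        return self.end if self.prefix is None else self.prefix.source()

    def target(self) -> Node:
        return self.end

    def extend(self, edge: str, node: Node, schema_edges: set) -> "Path":
        # The extension is valid only if the schema has edge e from target(P) to M.
        if (self.target()[0], edge, node[0]) not in schema_edges:
            raise ValueError(f"no edge {edge!r} from {self.target()[0]} to {node[0]}")
        return Path(end=node, prefix=self, edge=edge)

    def __str__(self) -> str:
        here = f"{self.end[0]}({self.end[1]})"
        if self.prefix is None:
            return here
        return f"{self.prefix} --{self.edge}--> {here}"


# Hypothetical schema edges, written as (source node, edge label, target node).
SCHEMA_EDGES = {("Univ", "colleges", "College"), ("College", "depts", "Dept")}
p = Path(("Univ", "u1")).extend("colleges", ("College", "C"), SCHEMA_EDGES)
p2 = p.extend("depts", ("Dept", "D"), SCHEMA_EDGES)
```

Extending a path never changes its source, which mirrors the rule source(P′) = source(P).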

The navigational plan algorithm produces a navigational plan ∆′ given a logical plan ∆ and the web schemas.
The navigational plan ∆′ that it produces is sound and complete.
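The slides do not spell out the algorithm's steps. One ingredient it plausibly needs is a way to find a route from an entry point to the node that stores a source relation. The following breadth-first search is a sketch of that single step only, under the assumption that the web schema is given as a list of (source, target) node pairs; the function name and node names are hypothetical.

```python
from collections import deque


def find_path(edges, entry_points, goal):
    """Breadth-first search for a shortest node sequence from an entry point to `goal`.

    Returns a list of node names, or None if `goal` is unreachable.
    """
    queue = deque([[n] for n in entry_points])
    seen = set(entry_points)
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for src, dst in edges:
            if src == path[-1] and dst not in seen:
                seen.add(dst)
                queue.append(path + [dst])
    return None


# Hypothetical university web schema.
edges = [("Univ", "College"), ("College", "Dept"), ("Dept", "Course")]
route = find_path(edges, ["Univ"], "Course")
```

A full navigational plan would attach the source relation to the end of such a route as a navigational term P:v(X); this sketch stops at finding the route.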

Conclusions
The work shows how to extend data integration systems to incorporate data webs.
A formalism for modeling data webs and a language for source descriptions were studied.
The focus was on an algorithm for answering queries using GLAV source descriptions.

QUERIES?

THANK YOU