
Mediators, Wrappers, etc. Based on the TSIMMIS project at Stanford. Concepts used in several other related projects. Goal: integrate information in heterogeneous data sources, including flat files, spreadsheets, …. Key idea: write “wrappers” for data sources that export relation-like (or similarly high-level) views. BUT, remember: sources != DBs. Exported views are sets of heterogeneous “lightweight objects”.

II architecture. [Figure: a query, mediators, and sources.] No predefined hierarchy. A mediator talks to sources via translators, and to other mediators.

What data model is appropriate? Remember the role played by the data model now. In DB design, you model the application data first, develop a schema, create tables, and populate them. Here, you are trying to abstract existing data and/or applications using wrappers, and would like to leverage the abstraction for querying (i.e., II) via mediators. So, you don’t get to preach here! The model should be as expressive as possible, yet as flexible as possible; handle missing, repeated (nested), and heterogeneous data; and support meta-data.

What are the architectural requirements? Facilitate easy joining of new mediators and “registration” of new sources. Need for a mediator generator and a wrapper generator.

What sort of query model/language is appropriate? It must understand and be in sync with the expressive but permissive data model sketched earlier. TSIMMIS uses LOREL, but we will keep our discussion more general. In principle, one can use SchemaSQL, XQuery, etc.

More on the data model. Lightweight object model (OEM): an OEM object = OID: <label, type, value>. Self-descriptive (i.e., schema along with data, and for every data item!). Value: atomic or set-valued.

An example OEM database. [Figure: OEM graph. Recoverable labels: guide, resto, objects o1 to o4, edge labels c, n, a, near, s, c, z, address; values gourmet, Three amigos, 1650 ste catherine, montreal, H3G 1M7, westmont.] Not every resto may have an address of the same type. Indeed, some may have no address!
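A minimal sketch of how such self-describing objects could be held in memory. The class name `OEMObject`, the helper `find`, the spelled-out labels (name, category, street, city, zip), and the assignment of values to particular restaurants are assumptions made for illustration; the slide only gives single-letter edge labels and a handful of values.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class OEMObject:
    """Hypothetical OEM object: every item carries its own label and type."""
    oid: str
    label: str
    type: str                                # "string" or "set" in this sketch
    value: Union[str, List["OEMObject"]]     # atomic value or list of sub-objects


def find(obj: OEMObject, label: str) -> List[OEMObject]:
    """Return the sub-objects of a set-valued object that carry the given label."""
    if obj.type != "set":
        return []
    return [child for child in obj.value if child.label == label]


# One resto with a structured address, one with an atomic address, one with none.
r1 = OEMObject("o1", "resto", "set", [
    OEMObject("o1n", "name", "string", "Three amigos"),
    OEMObject("o1c", "category", "string", "gourmet"),
    OEMObject("o1a", "address", "set", [
        OEMObject("o1s", "street", "string", "1650 ste catherine"),
        OEMObject("o1y", "city", "string", "montreal"),
        OEMObject("o1z", "zip", "string", "H3G 1M7"),
    ]),
])
r2 = OEMObject("o2", "resto", "set", [
    OEMObject("o2n", "name", "string", "made-up cafe"),       # invented value
    OEMObject("o2a", "address", "string", "westmont"),
])
r3 = OEMObject("o3", "resto", "set", [
    OEMObject("o3n", "name", "string", "made-up diner"),      # invented value
])
guide = OEMObject("g", "guide", "set", [r1, r2, r3])

# The type of each resto's address, discovered from the data itself.
address_types = [[a.type for a in find(r, "address")] for r in find(guide, "resto")]
print(address_types)
```

Because the label and type travel with every object, a query can discover per object whether an address is structured, atomic, or absent, with no fixed schema declared up front.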

TSIMMIS query model. Each mediator describes its concepts (whatever it can garner from the sources it talks to) using some logical rules. TSIMMIS uses MSL, but we will see that SchemaLog can express it easily.

Information Manifold (IM) approach. Two models. Local as View (LAV): World view = global predicates (like base relations, but they do not exist). Each source = a description of what information it can contribute to the global predicates = a view over the global predicates (derived relations). Query the global predicates. Answer using the views (which are the only ones that hold the data!).

IM approach. Alternative model, Global as View (GAV): global predicates are exported by sources as a view of the data they actually store. Query the global predicates. Answer by expanding the query using the view definitions. IM follows LAV.
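A minimal sketch of GAV-style query expansion (unfolding). The two GAV definitions below, which define phone and office over source relations s1 and s2, are hypothetical and chosen only to illustrate the mechanics; the helper names `is_var` and `unfold` are likewise assumptions. Atoms are `(predicate, args)` tuples, and terms starting with an uppercase letter are treated as variables.

```python
from itertools import count


def is_var(term):
    # Convention assumed here: variables start with an uppercase letter.
    return term[0].isupper()


def unfold(query_body, gav_defs):
    """Replace every global atom by the body of its (single) GAV definition."""
    fresh = count()
    unfolded = []
    for pred, args in query_body:
        head_args, body = gav_defs[pred]
        theta = dict(zip(head_args, args))   # bind head variables to query terms
        tag = next(fresh)                    # rename body-only variables apart
        for bpred, bargs in body:
            unfolded.append((bpred, tuple(
                theta.get(a, f"{a}_{tag}") if is_var(a) else a
                for a in bargs)))
    return unfolded


# Hypothetical GAV definitions:
#   phone(E,P)  <- s1(E,P,M).
#   office(E,O) <- s2(E,O,D).
gav_defs = {
    "phone": (("E", "P"), [("s1", ("E", "P", "M"))]),
    "office": (("E", "O"), [("s2", ("E", "O", "D"))]),
}

# q1(O,P) <- phone(mary,P), office(mary,O).
q1_body = [("phone", ("mary", "P")), ("office", ("mary", "O"))]
unfolded_q1 = unfold(q1_body, gav_defs)
print(unfolded_q1)
```

The sketch handles one definition per global predicate and does not guard against clashes between query variable names and the generated fresh names; several definitions per predicate would yield a union of unfoldings.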

LAV example. Global predicates: emp(E), phone(E,P), office(E,O), mgr(E,M), dept(E,D) (remember they DON’T exist!).
source1(E,P,M) <- emp(E), phone(E,P), mgr(E,M).
source2(E,O,D) <- emp(E), office(E,O), dept(E,D).
source3(E,P) <- emp(E), phone(E,P), dept(E,'toy').
Points to remember: Views are descriptive, not prescriptive. Completeness is not guaranteed. Consistency across sources is not guaranteed.
Example query: q1(O,P) <- phone(mary,P), office(mary,O).

Query answering. How can we answer such a query? We must get all relevant information from the views, i.e., rewrite the query using ONLY source/view predicates. There is more than one possible way. We want ALL possible rewrites (to ensure (near) completeness). Rewritten q1 (s1, s2, s3 abbreviate source1, source2, source3; the employee argument must be bound to mary for the expansion to be contained in q1):
r1q1(O,P) <- s1(mary,P,M), s2(mary,O,D).
r2q1(O,P) <- s3(mary,P), s2(mary,O,D).
There are other rewrites too (e.g., join all three sources), but they are contained in one of the above. So, the above rewrites are all “minimal” answers. Compare expanded r1q1 and r2q1 with q1 (w.r.t. containment). What can you say?
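The comparison asked for above can be mechanized. The sketch below expands a rewrite by its LAV source definitions and then searches for a containment mapping by brute force. The helper names (`is_var`, `expand`, `containment_mapping`) and the tuple encoding of atoms are assumptions made for illustration; the rewrite is taken with the employee argument bound to mary, and an unbound variant is checked as well.

```python
from itertools import count


def is_var(term):
    # Convention assumed here: variables start with an uppercase letter.
    return term[0].isupper()


def expand(body, views):
    """E(r): replace each source atom by the body of its LAV definition."""
    fresh, out = count(), []
    for pred, args in body:
        head_args, view_body = views[pred]
        theta, tag = dict(zip(head_args, args)), next(fresh)
        for bpred, bargs in view_body:
            out.append((bpred, tuple(
                theta.get(a, f"{a}_{tag}") if is_var(a) else a for a in bargs)))
    return out


def containment_mapping(q_head, q_body, r_head, r_body):
    """Find h with h(q_head) == r_head and h(atom) in r_body for all atoms of q.
    If such an h exists, the query (r_head, r_body) is contained in q."""
    def extend(h, src, dst):
        if len(src) != len(dst):
            return None
        h = dict(h)
        for s, d in zip(src, dst):
            if is_var(s):
                if h.setdefault(s, d) != d:
                    return None
            elif s != d:                      # constants must map to themselves
                return None
        return h

    def search(i, h):
        if i == len(q_body):
            return h
        pred, args = q_body[i]
        for rpred, rargs in r_body:
            if rpred == pred:
                h2 = extend(h, args, rargs)
                if h2 is not None:
                    found = search(i + 1, h2)
                    if found is not None:
                        return found
        return None

    h0 = extend({}, q_head, r_head)
    return None if h0 is None else search(0, h0)


views = {
    "s1": (("E", "P", "M"), [("emp", ("E",)), ("phone", ("E", "P")), ("mgr", ("E", "M"))]),
    "s2": (("E", "O", "D"), [("emp", ("E",)), ("office", ("E", "O")), ("dept", ("E", "D"))]),
    "s3": (("E", "P"), [("emp", ("E",)), ("phone", ("E", "P")), ("dept", ("E", "toy"))]),
}
head = ("O", "P")
q1_body = [("phone", ("mary", "P")), ("office", ("mary", "O"))]

e_r1 = expand([("s1", ("mary", "P", "M")), ("s2", ("mary", "O", "D"))], views)
e_r2 = expand([("s3", ("mary", "P")), ("s2", ("mary", "O", "D"))], views)
e_unbound = expand([("s1", ("E", "P", "M")), ("s2", ("E", "O", "D"))], views)

r1_in_q1 = containment_mapping(head, q1_body, head, e_r1) is not None
r2_in_q1 = containment_mapping(head, q1_body, head, e_r2) is not None
q1_in_r1 = containment_mapping(head, e_r1, head, q1_body) is not None
unbound_in_q1 = containment_mapping(head, q1_body, head, e_unbound) is not None
print(r1_in_q1, r2_in_q1, q1_in_r1, unbound_in_q1)
```

Under these definitions both expansions are contained in q1, but q1 is not contained in E(r1): the expansion carries extra subgoals such as emp and mgr that q1 does not require. The rewrites are therefore contained rewritings, not equivalent ones. The unbound variant fails the check because the constant mary cannot be mapped to a variable.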

How do we get minimal rewrites? q: the original query given (a CQ over global predicates). r: a candidate rewrite. It is valid provided r’s expansion (obtained by expanding the source definitions), say E(r), is contained in q. A rewrite r is minimal if E(r) is NOT contained in E(r’) for any other rewrite r’. What does minimality really mean? Example:
s1(X,Y) <- a(X,Y).
s2(X,Y) <- a(X,Y).
query: q(X,Y) <- a(X,Y).
r1q(X,Y) <- s1(X,Y) as well as r2q(X,Y) <- s2(X,Y) are needed to answer it. Why? (s1 and s2 do NOT necessarily provide the same set of tuples. Rules are descriptions, NOT prescriptions!) How many rewrites should we try?
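A tiny sketch of why both rewrites are needed. The source contents are invented for illustration: both sources are described by the same rule over a(X,Y), yet each holds a different, incomplete set of tuples.

```python
# Hypothetical source contents (made up); neither source is complete for a(X,Y).
s1 = {("x1", "y1"), ("x2", "y2")}
s2 = {("x2", "y2"), ("x3", "y3")}


def r1q():
    # r1q(X,Y) <- s1(X,Y)
    return set(s1)


def r2q():
    # r2q(X,Y) <- s2(X,Y)
    return set(s2)


# The answer to q is the union of what every rewrite returns.
answers = r1q() | r2q()
print(sorted(answers))
```

Dropping either rewrite would lose a tuple here, even though the two source descriptions are syntactically identical.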

Levy et al. Theorem. Thm.: if a rewrite r of query q has more subgoals than q, then r can’t be minimal. Proof: assume r is valid (or it’s useless). So E(r) is contained in q; let h be the containment mapping (c.m.). If r has more subgoals than q, there must be a subgoal p in r s.t. h doesn’t map any subgoal of q to any subgoal in E(p). Get rid of all such subgoals to obtain a modified rewrite r’. r’ contains r (trivially). But r’ is contained in q (just use the original c.m. h). QED. Given a q, only consider those sources whose body contains >= 1 global predicate appearing in q. Still an exponential number of choices, but not too terrible in practice.
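The two pruning ideas above (keep only sources that mention a query predicate, and use at most as many source subgoals as q has subgoals) can be sketched as a candidate enumerator. The function names and the extra source `s4` over a `salary` predicate are hypothetical; each candidate bag would still need its variable bindings chosen and its expansion checked for containment in q.

```python
from itertools import combinations_with_replacement

# Global predicates appearing in each source description (LAV example),
# plus one made-up source that is irrelevant to q1.
view_preds = {
    "s1": ["emp", "phone", "mgr"],
    "s2": ["emp", "office", "dept"],
    "s3": ["emp", "phone", "dept"],
    "s4": ["salary"],            # hypothetical source, not in the slides
}
query_preds = ["phone", "office"]    # subgoals of q1


def relevant_sources(view_preds, query_preds):
    """Sources whose body shares at least one global predicate with the query."""
    wanted = set(query_preds)
    return sorted(name for name, body in view_preds.items() if wanted & set(body))


def candidate_bags(view_preds, query_preds):
    """All multisets of relevant sources with at most len(query) subgoals."""
    rel = relevant_sources(view_preds, query_preds)
    for k in range(1, len(query_preds) + 1):
        yield from combinations_with_replacement(rel, k)


relevant = relevant_sources(view_preds, query_preds)
candidates = list(candidate_bags(view_preds, query_preds))
print(relevant)
print(len(candidates))
```

The count grows exponentially with the number of query subgoals, which matches the remark that the search is exponential but often tolerable for small queries.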

Example revisited & expanded. Suppose source 1 instead exported s1(E,P) and source 2 exported s2(E,O). Is q1 answerable using the views? What about q2(E) <- emp(E), mgr(E,'john')? What about q3(E1,E2) <- phone(E1,P), phone(E2,P)? What about q4(E,M) <- emp(E), dept(E,'toy'), mgr(E,M)?

QAV (AQUV) – general story. Why is QAV a worthwhile problem? Speeding up query processing with materialized views: can I answer this query using stored view(s)? Information integration: sources store some data and *describe* (usually using rules) how local data relates to the global schema (i.e., what are their contributions?). Can I answer this query using the available source data (i.e., views)? How best can I answer?

QAV – two models. Classic query optimization context: equivalent rewriting. Used extensively in data warehousing/OLAP. Information integration: maximally contained (also called minimal, maximally sound) rewriting. Excellent survey: Alon Y. Halevy. Answering queries using views: a survey. VLDB Jl