1 Global-as-View and Local-as-View for Information Integration CS652 Spring 2004 Presenter: Yihong Ding.

2 Common Integration Architecture –Information Integration Systems –Global-as-View (Gav) vs. Local-as-View (Lav) –Query Reformulation –Specification of Source Descriptions –Adding New Sources

3 Query Reformulation Problem: rewrite a user query expressed in the mediated schema into a query expressed in the source schema. Given a query Q in terms of the mediator schema relations and descriptions of the information sources, find a query Q’ that uses only the source relations, such that –Q’ ⊆ Q, and –Q’ provides all possible answers to Q given the sources

4 Solving Queries by Views Mediator Relations Source Relations

5 Query Rewriting Using Views Query Containment: q’ ⊆ q ⟺ for every database D, q’(D) ⊆ q(D). Query Equivalence: q’ ≡ q ⟺ q’ ⊆ q ∧ q ⊆ q’. Given a query q and view definitions V = {v1, …, vn}: q’ is an Equivalent Rewriting of q using V if –q’ refers only to views in V, and –q’ ≡ q. q’ is a Maximally-Contained Rewriting of q using V if –q’ refers only to views in V, –q’ ⊆ q, and –there is no rewriting q1 using V such that q’ ⊆ q1 ⊆ q and q1 ⊄ q’

6 Computational Complexity

7 Complexity of Query Containment Conjunctive Queries (CQs) (NP-complete) –Q1: p(X,Z) :- a(X,Y) & a(Y,Z) –Q2: p(X,Z) :- a(X,Y) & a(V,Z) CQs with Negation (Π₂ᵖ-complete) –Q1: p(X,Z) :- a(X,Y) & a(Y,Z) & NOT a(X,Z) CQs with Arithmetic Comparisons (Π₂ᵖ-complete) –Q1: p(X,Z) :- a(X,Y) & a(Y,Z) & X<Y Datalog Programs –p(A,C) :- a(A,B) & b(B,C)
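
For plain conjunctive queries, containment can be decided with a containment mapping: Q1 ⊆ Q2 holds when some mapping of Q2's variables sends Q2's head onto Q1's head and every body atom of Q2 onto a body atom of Q1. The sketch below is a brute-force version of that test, assuming the hypothetical encoding shown in the comments (uppercase terms are variables); it does not handle negation or comparisons. On the slide's example it finds Q1 ⊆ Q2 (map V to Y) but not the reverse.

```python
from itertools import product

# A conjunctive query is (head_atom, [body_atoms]); an atom is (predicate, args).
# Convention for this sketch: terms starting with an uppercase letter are variables.

def is_var(term):
    return term[:1].isupper()

def variables(query):
    head, body = query
    return sorted({t for _, args in [head, *body] for t in args if is_var(t)})

def terms(query):
    head, body = query
    return sorted({t for _, args in [head, *body] for t in args})

def substitute(atom, mapping):
    pred, args = atom
    return (pred, tuple(mapping.get(t, t) for t in args))

def contained_in(q1, q2):
    """True if q1 ⊆ q2, i.e. a containment mapping from q2 to q1 exists."""
    head1, body1 = q1
    head2, body2 = q2
    targets = set(body1)
    vars2 = variables(q2)
    # Try every mapping of q2's variables to q1's terms (exponential in general).
    for image in product(terms(q1), repeat=len(vars2)):
        mapping = dict(zip(vars2, image))
        if substitute(head2, mapping) == head1 and all(
            substitute(atom, mapping) in targets for atom in body2
        ):
            return True
    return False

def equivalent(q1, q2):
    return contained_in(q1, q2) and contained_in(q2, q1)

# Q1 and Q2 from the slide.
Q1 = (("p", ("X", "Z")), [("a", ("X", "Y")), ("a", ("Y", "Z"))])
Q2 = (("p", ("X", "Z")), [("a", ("X", "Y")), ("a", ("V", "Z"))])

print(contained_in(Q1, Q2), contained_in(Q2, Q1))  # True False
```

The exhaustive search over mappings mirrors why containment of conjunctive queries is NP-complete.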

8 Specification of Source Descriptions Views: resources used by the integrator to help answer queries. Gav: each mediator relation is defined as a view over the source relations. Lav: each source relation is defined as a view over the mediator relations.
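
Under Gav, answering a query amounts to unfolding: every mediator atom in the query is replaced by the body of the rule that defines it over the sources. A minimal sketch, assuming one hypothetical rule per mediator relation (no unions) and rule bodies that contain only variables; the relations book and catalog are made up for illustration.

```python
from itertools import count

# Hypothetical Gav mapping: mediator relation book(T, A) is defined as a view
# over a source relation catalog(T, A, Y). Rule bodies contain only variables.
GAV_RULES = {
    "book": (("T", "A"), [("catalog", ("T", "A", "Y"))]),
}

def unfold(query_body, rules):
    """Rewrite a query over mediator relations into one over source relations."""
    fresh = count()
    result = []
    for pred, args in query_body:
        head_vars, definition = rules[pred]
        # Bind the rule's head variables to the query atom's arguments.
        subst = dict(zip(head_vars, args))
        for src_pred, src_args in definition:
            new_args = []
            for term in src_args:
                if term not in subst:
                    # Variable that appears only in the rule body: rename it apart.
                    subst[term] = f"_V{next(fresh)}"
                new_args.append(subst[term])
            result.append((src_pred, tuple(new_args)))
    return result

print(unfold([("book", ("X", "Aho"))], GAV_RULES))  # [('catalog', ('X', 'Aho', '_V0'))]
```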

9 Information Integration Systems Tsimmis –Stanford and IBM –Global-as-View (Gav) –Mediator relations defined as views of source relations Information Manifold (IM) –AT&T –Local-as-View (Lav) –Description logic –Source relations defined as views of mediator relations (a collection of global predicates)

10 TSIMMIS – Gav Solution The Stanford-IBM Manager of Multiple Information Sources (TSIMMIS) Offers: –A flexible data model –A common query language –Other supporting tools

11 TSIMMIS – Components OEM (Object-Exchange Model) LOREL (Lightweight Object REpository Language) MSL (Mediator Specification Language) Wrappers

12 TSIMMIS – OEM Object Exchange Model The data model for TSIMMIS “self-describing” (labels carry all of the information that there is about an object) Flexible First order logic

13 TSIMMIS – OEM Each object has the form (OID, label, type, value): –OID: object identifier –label: human-understandable name –type: “set” or “string” –value: a set or a string

14 TSIMMIS – OEM Example: library (set) contains book (set), which contains author (string) “Aho” and title (string) “Compilers…”

15 TSIMMIS – OEM First-order predicate logic: the object (123, author, string, “Aho”) corresponds to the predicate author(T, “Aho”). This query returns the object IDs of all objects with label “author” and value “Aho”.
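
The predicate view of OEM can be sketched over a small in-memory object store. This assumes a hypothetical encoding (OID mapped to label, type, and value, where a set value is a list of child OIDs); only OID 123 comes from the slide, the other OIDs are made up.

```python
# Hypothetical OEM store: OID -> (label, type, value). A "set" value is a list
# of child OIDs; a "string" value is the atomic string.
OBJECTS = {
    1: ("library", "set", [2]),
    2: ("book", "set", [123, 4]),
    123: ("author", "string", "Aho"),
    4: ("title", "string", "Compilers…"),
}

def holds(objects, label, value):
    """Evaluate label(T, value): every OID T whose object has this label and string value."""
    return [oid for oid, (lab, typ, val) in objects.items()
            if lab == label and typ == "string" and val == value]

print(holds(OBJECTS, "author", "Aho"))  # [123]
```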

16 TSIMMIS – LOREL Lightweight Object REpository Language An OQL for OEM The end-user language for TSIMMIS

17 TSIMMIS – LOREL Example select library.book.title from library where library.book.author = “Aho”
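
The LOREL query above can be read as a walk over labeled edges: start at library, follow book edges, keep books with an author child equal to “Aho”, and return their title values. A minimal sketch, assuming a hypothetical OEM encoding and that the shared prefix library.book denotes the same book in the select and where clauses; the second book is invented to show filtering.

```python
# Hypothetical OEM store: OID -> (label, type, value); set values list child OIDs.
OBJECTS = {
    1: ("library", "set", [2, 5]),
    2: ("book", "set", [3, 4]),
    3: ("author", "string", "Aho"),
    4: ("title", "string", "Compilers…"),
    5: ("book", "set", [6, 7]),              # hypothetical second book
    6: ("author", "string", "Smith"),
    7: ("title", "string", "Another Book"),
}

def children(objects, oid, label):
    """Child OIDs of a set object that carry the given label."""
    _, typ, val = objects[oid]
    return [c for c in val if objects[c][0] == label] if typ == "set" else []

def titles_by(objects, library_oid, author):
    """select library.book.title from library where library.book.author = author"""
    result = []
    for book in children(objects, library_oid, "book"):
        if any(objects[a][2] == author for a in children(objects, book, "author")):
            result.extend(objects[t][2] for t in children(objects, book, "title"))
    return result

print(titles_by(OBJECTS, 1, "Aho"))
```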

18 TSIMMIS – LOREL Partial Match Semantics select R.A from R, S, T where R.A = S.A or R.A = T.A In SQL this query returns nothing if either S or T is empty, because the FROM clause forms the cross product of R, S, and T. Because of partial match semantics, the query does not fail in LOREL.
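
The difference can be illustrated by contrasting cross-product evaluation with evaluating each disjunct existentially over its own relation. This is a simplified sketch of the behavior described on the slide, not LOREL's formal semantics; the relations and values are hypothetical.

```python
# Hypothetical relations; T is empty.
R = [{"A": 1}, {"A": 2}]
S = [{"A": 1}]
T = []

def sql_style(R, S, T):
    # SQL: FROM R, S, T is a cross product, so an empty T yields no rows at all.
    return [r["A"] for r in R for s in S for t in T
            if r["A"] == s["A"] or r["A"] == t["A"]]

def partial_match(R, S, T):
    # Sketch: each disjunct is checked existentially over its own relation,
    # so an empty T only makes the second disjunct false.
    return [r["A"] for r in R
            if any(r["A"] == s["A"] for s in S) or any(r["A"] == t["A"] for t in T)]

print(sql_style(R, S, T), partial_match(R, S, T))  # [] [1]
```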

19 TSIMMIS – MSL Mediator Specification Language –Allows declarative specification of mediators –Object-oriented, logical query language –Targeted to OEM

20 TSIMMIS – MSL Query → Mediator → Wrapper → Source MSL rule: … :- … over OEM data: library (set) contains book (set), which contains author (string) “Aho” and title (string) “Compilers…”

21 TSIMMIS – Wrappers Query → Mediator → Wrapper → Source Wrappers are similar to database drivers. Wrappers are written with MSL.

22 TSIMMIS – Wrappers Wrappers have the form: MSL template // action // Example: … :- … // sprintf(lookup-query, “find author %s”, $AU) //
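
The template/action pairing can be sketched as pattern matching followed by string formatting: if an incoming query fits a template, the variables it binds (here $AU) are substituted into the source's native query. The dictionary-based query format and the names WRAPPER_RULES, match, and translate are hypothetical stand-ins for MSL templates.

```python
# Hypothetical wrapper rule in the spirit of "MSL template // action //":
# the template is a query pattern with $-variables; the action builds a native query.
WRAPPER_RULES = [
    ({"label": "book", "author": "$AU"},
     lambda b: "find author %s" % b["$AU"]),  # like sprintf(lookup-query, "find author %s", $AU)
]

def match(template, query):
    """Return variable bindings if the query fits the template, else None."""
    if set(template) != set(query):
        return None
    bindings = {}
    for key, pattern in template.items():
        value = query[key]
        if isinstance(pattern, str) and pattern.startswith("$"):
            if bindings.setdefault(pattern, value) != value:
                return None  # same variable bound to two different values
        elif pattern != value:
            return None
    return bindings

def translate(query):
    """Run the action of the first matching template; None if no template fits."""
    for template, action in WRAPPER_RULES:
        bindings = match(template, query)
        if bindings is not None:
            return action(bindings)
    return None  # the source cannot answer this query

print(translate({"label": "book", "author": "Aho"}))  # find author Aho
```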

23 TSIMMIS – Summary –End users need to specify their sources with respect to a mediator model (OEM in TSIMMIS) –Query specification is standard (LOREL) –Query rewriting is straightforward (MSL and wrappers) –Adding a new source is not easy: it must be specified in the mediator model

24 Information Manifold Challenges for Information Integration –Interrelated data across multiple information sources –Large number of sources –Limited amount of data in many of the sources –Widely varying details of interacting with each source

25 IM Architecture Bucket algorithm

26 World View Virtual Relations: Product(Model), Automobile(Model, Year, Category), Motorcycle(Model, Year), Car(Model, Year, Category), NewCar(Model, Year, Category), UsedCar(Model, Year, Category), CarForSale(Model, Year, Category, Price, SellerContact). Classes (shown as a class hierarchy): Product, Automobile, Car, Motorcycle, NewCar, UsedCar, CarForSale

27 Source Descriptions For each source: Content Record Capability Record Web Sources for Automobile Application

28 Content Records of Auto Sources

29 Capability Records of Auto Sources Each record specifies: –desired input set –possible output set –capable selection set
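
A capability record lets the plan generator decide whether a source can be called at all. The sketch below uses a simplified, hypothetical rule: a call is feasible when every attribute in the desired input set is bound, every requested output is in the possible output set, and every selection is on an attribute in the capable selection set. The example source and its attributes are made up.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CapabilityRecord:
    inputs: frozenset      # desired input set: attributes that must be bound
    outputs: frozenset     # possible output set
    selections: frozenset  # capable selection set: attributes the source can filter on

def can_answer(cap, bound, wanted, selected):
    """Simplified feasibility test for calling one source."""
    return (cap.inputs <= bound
            and wanted <= cap.outputs
            and selected <= cap.selections)

# Hypothetical used-car source.
source = CapabilityRecord(
    inputs=frozenset({"Category"}),
    outputs=frozenset({"Model", "Year", "Price", "SellerContact"}),
    selections=frozenset({"Year"}),
)
print(can_answer(source, {"Category"}, {"Model", "Price"}, {"Year"}))  # True
print(can_answer(source, set(), {"Model"}, set()))                     # False
```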

30 Query Reformulation Contained rather than equivalent rewritings: –sources are incomplete –a useful subset of the answers suffices Uses a Plan Generator to: –Prune irrelevant sources –Split the query into subgoals –Generate conjunctive query plans –Find an executable ordering of subgoals

31 The Bucket Algorithm Given: user query q, source descriptions {Vi} 1. Find relevant sources (fill buckets): for each relation g in query q, find each Vj that contains relation g, and check that the constraints in Vj are compatible with q. 2. Combine source relations {Vj}, one from each bucket, into a conjunctive query q’ and check for containment (q’ ⊆ q).
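
Step 1 and the combination part of step 2 can be sketched as follows. This is a simplified version under stated assumptions: views are hypothetical conjunctive definitions, "compatible constraints" is reduced to agreeing constants, a view is rejected if it projects away a value the query needs, and comparison constraints (such as y ≥ 1992) are omitted. Each candidate still needs the q’ ⊆ q containment check before it is kept.

```python
from itertools import count, product

# Atoms are (predicate, args). Terms starting with an uppercase letter are
# variables; other terms (e.g. "sportscar") are constants.

def is_var(term):
    return term[:1].isupper()

def unify(qargs, vargs, qhead, vhead):
    """Map view variables to query terms, or None if the view atom cannot cover the subgoal."""
    phi = {}
    for q, v in zip(qargs, vargs):
        if not is_var(v):
            if not is_var(q) and q != v:
                return None  # view fixes a different constant
            continue
        if v not in vhead and (not is_var(q) or q in qhead):
            return None      # value needed by the query is projected out of the view
        if phi.setdefault(v, q) != q:
            return None      # one view variable would need two different query terms
    return phi

def fill_buckets(query, views):
    """Step 1: one bucket per query subgoal, holding view atoms that may cover it."""
    (_, qhead), qbody = query
    fresh = count()
    buckets = []
    for pred, qargs in qbody:
        bucket = []
        for name, (vhead, vbody) in views.items():
            for vpred, vargs in vbody:
                if vpred != pred or len(vargs) != len(qargs):
                    continue
                phi = unify(qargs, vargs, qhead, vhead)
                if phi is not None:
                    args = tuple(phi[h] if h in phi else f"F{next(fresh)}" for h in vhead)
                    bucket.append((name, args))
        buckets.append(bucket)
    return buckets

def candidate_rewritings(buckets):
    """Step 2, first half: pick one view atom per bucket (containment check not done here)."""
    return [list(combo) for combo in product(*buckets)]

# Hypothetical example loosely following the slides (Year constraint omitted).
QUERY = (("q", ("M", "P", "R")), [
    ("CarForSale", ("C",)), ("Category", ("C", "sportscar")), ("Year", ("C", "Y")),
    ("Model", ("C", "M")), ("Price", ("C", "P")), ("ProductReview", ("M", "Y", "R")),
])
VIEWS = {
    "V1": (("C", "T", "Y", "M", "P"), [
        ("CarForSale", ("C",)), ("Category", ("C", "T")), ("Year", ("C", "Y")),
        ("Model", ("C", "M")), ("Price", ("C", "P"))]),
    "V2": (("C", "M"), [
        ("CarForSale", ("C",)), ("Category", ("C", "truck")), ("Model", ("C", "M"))]),
    "V5": (("M", "Y", "R"), [("ProductReview", ("M", "Y", "R"))]),
}

buckets = fill_buckets(QUERY, VIEWS)
print([sorted({name for name, _ in b}) for b in buckets])
print(len(candidate_rewritings(buckets)))
```

Here V2 is kept out of the Category bucket because its constant “truck” conflicts with “sportscar”, which is the pruning the slide calls checking that constraints are compatible.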

32 The Bucket Algorithm: Example q(m,p,r) :- CarForSale(c), Category(c,sportscar), Year(c,y), y ≥ 1992, Model(c,m), Price(c,p), ProductReview(m,y,r)

33 1. Filling the Buckets q(m,p,r) :- CarForSale(c), Category(c,sportscar), Year(c,y), y ≥ 1992, Model(c,m), Price(c,p), ProductReview(m,y,r). Subgoals, with the constant moved into a constraint: CarForSale(c), Category(c,t), Year(c,y), Model(c,m), Price(c,p), ProductReview(m,y,r); y ≥ 1992, t = sportscar. Buckets: –CarForSale(c): V1(c1), V2(c2), V3(c3) –Category(c,t): V1(c1,t1), V2(c2,t2), V3(c3,t3) –Year(c,y): V1(c1,y1), V2(c2,y2), V3(c3,y3) –Model(c,m): V1(c1,m1), V2(c2,m2), V3(c3,m3) –Price(c,p): V1(c1,p1), V2(c2,p2), V3(c3,p3) –ProductReview(m,y,r): V5(m5,y5,r5)

34 2. Checking Containment User query: q(m,p,r) :- CarForSale(c), Category(c,sportscar), Year(c,y), y ≥ 1992, Model(c,m), Price(c,p), ProductReview(m,y,r). Result query: q’(m,p,r) :- V1(c)({Category(c):sportscar}, {Price(c), Model(c), Year(c)}, {Year(c) ≥ 1992, Category(c)=sportscar}), V5(m,y,r)({m:Model(c), y:Year(c)}, {r}, {}). Is q’ ⊆ q? Expanded query: q’(m,p,r) :- CarForSale(c), UsedCar(c), Category(c,t), t=sportscar, Model(c,m), Year(c,y), Price(c,p), ProductReview(m,y,r), y ≥ 1992. Every subgoal and constraint of q appears in the expansion, so q’ ⊆ q.

35 Finding an Executable Ordering Subgoals: CarForSale(c), Category(c,t), Year(c,y), Model(c,m), Price(c,p), ProductReview(m,y,r); y ≥ 1992, t = sportscar. Sources chosen: V1(c), V1(c,t), V1(c,y), V1(c,m), V1(c,p), V5(m,y,r). After calling V1: BindAvail = {CarForSale(c,sportscar), Model(c,m), Year(c,y), Price(c,p), SellerContact(c,s)}. After calling V5: BindAvail = {CarForSale(c,sportscar), Model(c,m), Year(c,y), Price(c,p), SellerContact(c,s), ProductReview(m,y,r)}. After evaluating the constraint: BindAvail = {CarForSale(c,sportscar), Model(c,m), Year(c,y), Price(c,p), SellerContact(c,s), ProductReview(m,y,r), y ≥ 1992}
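
The growth of the bound set can be sketched as a greedy loop: repeatedly pick a subgoal whose required inputs are already bound, call it, and add the variables it binds. Because calls only ever add bindings, if any executable ordering exists this loop finds one. The binding patterns below are hypothetical (V1 needs no inputs, V5 needs m and y, the comparison needs y).

```python
def executable_order(subgoals, bound=()):
    """Greedily schedule subgoals whose required inputs are already bound.

    subgoals maps a name to (needs, gives): variables that must be bound before
    the call, and variables the call binds."""
    bound = set(bound)
    remaining = dict(subgoals)
    order = []
    while remaining:
        ready = [name for name, (needs, _) in remaining.items() if needs <= bound]
        if not ready:
            return None  # some subgoal can never get its inputs bound
        name = ready[0]
        _, gives = remaining.pop(name)
        bound |= gives
        order.append(name)
    return order

# Hypothetical binding patterns for the plan on the slide.
SUBGOALS = {
    "V5(m,y,r)": ({"m", "y"}, {"r"}),
    "y >= 1992": ({"y"}, set()),
    "V1(c)": (set(), {"c", "m", "y", "p", "s"}),
}
print(executable_order(SUBGOALS))  # ['V1(c)', 'V5(m,y,r)', 'y >= 1992']
```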

36 Advantages and Disadvantages Gav (Tsimmis): –Advantage: query reformulation by rule unfolding –Disadvantages: the mediator description; adding, removing, and modifying source descriptions –Better for static, centralized systems Lav (Information Manifold): –Advantages: adding new sources; the mediator (global predicates, source descriptions); query processing –Disadvantage: query reformulation (bucket algorithm) –Better for dynamic, distributed systems