1 Lessons from the TSIMMIS Project Yannis Papakonstantinou Department of Computer Science & Engineering University of California, San Diego.

Presentation transcript:

1 Lessons from the TSIMMIS Project Yannis Papakonstantinou Department of Computer Science & Engineering University of California, San Diego

2 Overview: TSIMMIS’ goals, technical challenges, and solutions; insufficiencies of the TSIMMIS framework; going forward

3 Information Resides on Heterogeneous Information Sources: different interfaces; different data representations; redundant and conflicting information. [Figure: sources WWW, Ticker Tape, Personal database, Dialog]

4 Goal: System Providing Integrated View of Heterogeneous Data. The integration system collects and combines information; provides integrated view, uniform user interface. [Figure: Integration System over sources WWW, Personal database, Ticker Tape, Dialog]

5 The Wrapper and Mediator Architecture Mediator Wrapper Client business reports portfolios for each company stock market prices Ticker Tape Dialog Common Data Model

6 The Data Warehousing Approach to Integration Mediator Wrapper Client Ticker Tape Dialog Stored Integrated View

7 The Lazy Integration Approach Mediator Wrapper Client IBM portfolio IBM price IBM related reports (in common model) IBM related reports Ticker Tape Dialog Query Decomposition, Translation and Result Fusion

8 Mediator Client Wrapper Wrappers & Mediators from High-Level Specifications Mediator Specification Interpreter Wrapper Generator Wrapper Specification Mediator Specification Source

9 Challenge: Sources Without a Well-Structured Schema. Semistructured: irregular, deeply nested, cross-referenced. Incomplete schema knowledge: autonomous, dynamic. Examples: HTML pages, SGML documents, genome data, chemical structures, bibliographic information, results of the integration process

10 Challenge: Different and Limited Source Capabilities Client Wrapper (A) Wrapper (B) Mediator (U = A + B) retrieve IBM data

11 Mediator has to Adapt to Query Capabilities of Sources Client Wrapper (A) Wrapper (B) Mediator (U = A + B) retrieve everything retrieve IBM data (A) does not allow selection

12 Part B Semistructured Data Representation Mediator Generation Wrapper Generation Capabilities-Based Rewriting

13 Representation of Semistructured Information using OEM semantic object-id label Atomic Value Set Value structural object-id

14 Graph Representation of OEM Data faculty first_name “John” last_name “Doe” rank “professor”

15 OEM Structures Represent Arbitrary Labeled Graphs faculty first_name “John” last_name “Doe” rank “professor” faculty name “Mary Smith” project “Air DB” paper author name “John Doe” author name “Mary Smith” title “Thin Air DB”
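A minimal sketch of the OEM structure described on slides 13 to 15, assuming each object carries an object-id, a label, and either an atomic value or a set of sub-objects. The class name `OEMObject`, the `&n` id format, and the helper methods are illustrative assumptions, not TSIMMIS code.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import List, Union

_next_id = count(1)  # hypothetical generator for structural object-ids


@dataclass
class OEMObject:
    """One OEM object: an object-id, a label, and an atomic or set value."""
    label: str
    value: Union[str, int, float, List["OEMObject"]]
    oid: str = field(default_factory=lambda: f"&{next(_next_id)}")

    def is_atomic(self) -> bool:
        return not isinstance(self.value, list)

    def children(self, label: str) -> List["OEMObject"]:
        """Sub-objects with the given label (empty for atomic objects)."""
        if self.is_atomic():
            return []
        return [c for c in self.value if c.label == label]


# The faculty object from the graph slide.
john = OEMObject("faculty", [
    OEMObject("first_name", "John"),
    OEMObject("last_name", "Doe"),
    OEMObject("rank", "professor"),
])

# An irregular sibling: different sub-object labels and no rank.
mary = OEMObject("faculty", [
    OEMObject("name", "Mary Smith"),
    OEMObject("project", "Air DB"),
])
```

Because no schema is declared, asking `mary` for a `rank` simply yields an empty list rather than an error, which is the kind of irregularity the model is meant to tolerate.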

16 Overview Semistructured Data Representation Mediator Generation Example of mediator specification Language expressiveness Implementation and performance Wrapper Generation Capabilities-Based Rewriting

17 Merge Information Relating to a Faculty person name “John Doe” birthday “April 1” s2 faculty name “John Doe” rank “professor” papers... s1 faculty name “John Doe” rank “professor” birthday “April 1” papers...

18 Mediator Specification Example person name “John Doe” birthday “April 1” s2 }> :- }> :- faculty name “John Doe” rank “professor” papers... s1 faculty name “John Doe” rank “professor” birthday “April 1” papers...

19 Mediator Specification Example: Semantics of Rule Bodies }> :- }> :- person name “John Doe” birthday “April 1” s2 faculty name “John Doe” rank “professor” birthday “April 1” papers... faculty name “John Doe” rank “professor” papers... s1

20 Mediator Specification Example: Semantics of Rule Heads }> :- }> :- person name “John Doe” birthday “April 1” s2 “John Doe” faculty name “John Doe” rank “professor” birthday “April 1” papers... faculty name “John Doe” rank “professor” papers... s1

21 Incrementally Add to Semantically Identified Object }> :- }> :- faculty name “John Doe” rank “professor” papers... s1 person name “John Doe” birthday “April 1” s2 “John Doe” faculty name “John Doe” rank “professor” birthday “April 1” papers...

22 Irregularities & Incomplete Schema Knowledge }> :- faculty name “John Doe” rank “professor” papers faculty name “Mary Smith” project “Air DB” s1 person name “John Doe” birthday “April 1” s2 faculty name “John Doe” rank “professor” birthday “April 1” papers faculty name “Mary Smith” project “Air DB” “John Doe” “Mary Smith”

23 Second Rule Attaches More Subobjects to View Objects }> :- }> :- faculty name “John Doe” rank “professor” papers... s1 “John Doe” faculty name “John Doe” rank “professor” birthday “April 1” papers... person name “John Doe” birthday “April 1” s2
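The rule text on slides 18 to 23 did not survive in this transcript, so the following is only a sketch of the behavior the slide titles describe: view objects are identified by a semantic object-id (here assumed to be the name), and each source incrementally attaches sub-objects to them. The dictionary layout of `s1` and `s2` and the function `fuse` are assumptions made for illustration; this is plain Python, not MSL.

```python
from collections import defaultdict

# Hypothetical source contents, modeled on the slides' s1 and s2 examples.
s1 = [
    {"label": "faculty", "name": "John Doe", "rank": "professor"},
    {"label": "faculty", "name": "Mary Smith", "project": "Air DB"},
]
s2 = [
    {"label": "person", "name": "John Doe", "birthday": "April 1"},
]


def fuse(*sources):
    """Build one view object per name; every source adds its sub-objects."""
    # semantic object-id (the name) -> set of (label, value) sub-objects
    view = defaultdict(set)
    for source in sources:
        for obj in source:
            name = obj.get("name")
            if name is None:
                continue  # no semantic id can be derived, so skip the object
            for label, value in obj.items():
                if label != "label":
                    view[name].add((label, value))
    return dict(view)


view = fuse(s1, s2)
```

Using a set per view object means a sub-object reported by both sources appears once, and an object with unexpected labels (such as `project`) is carried into the view without any schema change.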

24 Language Expressiveness. Information fusion problems solved by MSL: irregularities; incomplete knowledge of source structure; transformation of cross-referenced structures; inconsistent and redundant data; use of arbitrary matching criteria. Theoretical analysis of expressiveness: consider the relational representation of OEM graphs; then MSL is equivalent to “SQL + special form of transitive closure”

25 Inconsistent and Redundant Information }> :- }> :- AND NOT faculty name “John Doe” rank “associate” person name “John Doe” rank “assistant” s1 s2 “John Doe” faculty name “John Doe” rank “associate” rank “assistant”

26 Overview Semistructured Data Representation Mediator Generation Example of mediator specification Language expressiveness Implementation and performance Wrapper Generation Capabilities-Based Rewriting

27 Mediator Specification Interpreter Architecture Query Rewriter Cost-Based Optimizer Datamerge Engine Mediator Specification Query logical datamerge program plan Result Queries to Wrappers Results

28 Query Rewriting When Known Origins of Information }> :- :- }> :- }> :- }> AND X>65000

29 Query Rewriter Pushes Conditions to Sources }> :- :- }> :- }> :- }> AND X>65000 logical datamerge program }> :- ( }> AND AND

30 Passing Bindings & Local Join Plans :- <person { }> Passing Bindings Local Join :- }> AND X>65000 :- <person { }> }>:- }> AND X>65000 N s1 s2 s1 s2
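The two plan shapes named on slide 30 can be sketched as follows. The slide's rules are garbled in this transcript, so the data, the `salary` and `birthday` attributes, and the functions `query_s1` and `query_s2` are all assumptions standing in for wrapper calls; only the threshold 65000 and the source names come from the slides.

```python
# Hypothetical source contents; attribute names are assumptions.
S1 = [{"name": "John Doe", "salary": 70000},
      {"name": "Mary Smith", "salary": 60000}]
S2 = [{"name": "John Doe", "birthday": "April 1"},
      {"name": "Ann Lee", "birthday": "May 2"}]


def query_s1(min_salary):
    """Stand-in for a wrapper call that evaluates the selection at s1."""
    return [r for r in S1 if r["salary"] > min_salary]


def query_s2(name=None):
    """Stand-in for a wrapper call; name=None retrieves everything."""
    return [r for r in S2 if name is None or r["name"] == name]


def passing_bindings_plan(min_salary):
    """Send one s2 query per name obtained from s1."""
    result = []
    for left in query_s1(min_salary):
        for right in query_s2(name=left["name"]):
            result.append({**left, **right})
    return result


def local_join_plan(min_salary):
    """Fetch both inputs, then join on name inside the mediator."""
    by_name = {}
    for right in query_s2():
        by_name.setdefault(right["name"], []).append(right)
    return [
        {**left, **right}
        for left in query_s1(min_salary)
        for right in by_name.get(left["name"], [])
    ]
```

Both plans return the same answer. Passing bindings issues many small queries and transfers little data, while the local join issues one query per source and transfers all of s2, which is the trade-off a cost-based optimizer would weigh.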

31 Query Decomposition When Unknown Origins of Information }> :- }> }> :- }> :-

32 Plan Considers All Possible Sources of birthday }> :- }> }> :- }> :- name s2s1 name birthday

33 Overview Semistructured-Data Representation Mediator Generation Wrapper Generation Capabilities-Based Rewriting

34 Query Translation in Wrappers Source SELECT * FROM person WHERE name=“Smith” find -all find -n Smith Query Translator Result Translator Wrapper

35 Rapid Query Translation Using Templates and Actions Source SELECT * FROM person WHERE name=“Smith” find -all find -n Smith Template Interpreter Result Translator SELECT * FROM person {emit “find -all” } SELECT * FROM person WHERE name=$N {emit “find -n $N”}
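The template-and-action idea on slide 35 can be approximated with a few lines of pattern matching. This is a sketch under assumptions: the two (template, action) pairs are copied from the slide, but compiling templates to regular expressions, the straight-quote string syntax, and the function names are illustrative choices, not the TSIMMIS wrapper toolkit.

```python
import re

# (template, action) pairs taken from the slide; $N is a placeholder.
TEMPLATES = [
    ("SELECT * FROM person", "find -all"),
    ("SELECT * FROM person WHERE name=$N", "find -n $N"),
]


def compile_template(template):
    """Turn a template into a regex; each $VAR matches a quoted string."""
    parts = re.split(r"(\$[A-Z]\w*)", template)
    pattern = ""
    for part in parts:
        if part.startswith("$"):
            pattern += f'"(?P<{part[1:]}>[^"]*)"'
        else:
            # Escape literal text but tolerate flexible whitespace.
            pattern += r"\s+".join(re.escape(tok) for tok in part.split(" "))
    return re.compile(rf"^\s*{pattern}\s*$", re.IGNORECASE)


def translate(query):
    """Return the native command for the first matching template, else None."""
    for template, action in TEMPLATES:
        match = compile_template(template).match(query)
        if match:
            command = action
            for var, value in match.groupdict().items():
                command = command.replace(f"${var}", value)
            return command
    return None
```

A query that matches no template returns `None`, which corresponds to the wrapper reporting the query as unsupported rather than guessing a translation.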

36 Description of Infinite Sets of Supported Queries: uses recursive nonterminals. Example: job description contains word w1 and word w2 and ... Template: SELECT subset(person) FROM person WHERE \CJob, with productions \CJob : job LIKE $W AND \CJob and \CJob : TRUE
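The recursive nonterminal on slide 36 describes an unbounded family of conditions with two productions. A small recursive recognizer makes that concrete; the function name `parse_cjob` and the straight-quote string syntax are assumptions for this sketch.

```python
import re

# First production:  \CJob : job LIKE $W AND \CJob
_CONJUNCT = re.compile(r'\s*job\s+LIKE\s+"([^"]*)"\s+AND\s+', re.IGNORECASE)
# Second production: \CJob : TRUE
_TRUE = re.compile(r"\s*TRUE\s*$", re.IGNORECASE)


def parse_cjob(condition):
    """Return the list of words if the condition derives from CJob, else None."""
    if _TRUE.match(condition):
        return []  # base case: the TRUE production ends the chain
    head = _CONJUNCT.match(condition)
    if not head:
        return None  # neither production applies
    rest = parse_cjob(condition[head.end():])  # recurse on the nonterminal
    if rest is None:
        return None
    return [head.group(1)] + rest
```

Any number of `job LIKE` conjuncts is accepted as long as the chain ends in `TRUE`, so two productions cover an infinite set of supported queries.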

37 Overview Semistructured-Data Representation Mediator Generation Wrapper Generation Capabilities-Based Rewriting

38 Wrapper Supported Queries Description Capabilities-Based Rewriter in Mediator Architecture Capabilities- Based Rewriter Query Rewriter Cost-Based Optimizer Datamerge Engine logical datamerge program supported plans optimal plan Mediator Specification Wrapper Supported Queries Description Query

39 Capabilities-Based Rewriter Finds Supported Plans Supported Queries SELECT * FROM A WHERE salary>65000 SELECT * FROM A

40 Capabilities-Based Rewriter Finds Most-Selective Supported Plans Supported Queries SELECT * FROM B WHERE salary>65000 SELECT * FROM B WHERE salary >65000
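Slides 39 and 40 contrast a source that only supports `SELECT * FROM A` with one that also supports the salary selection. The sketch below mimics that choice with a one-bit capability flag, which is a deliberate simplification: the real rewriter works from a description of supported queries, and the class `Source` and function `plan_and_run` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


@dataclass
class Source:
    name: str
    rows: List[Row]
    supports_selection: bool  # hypothetical one-bit capability description

    def execute(self, predicate: Optional[Callable[[Row], bool]] = None) -> List[Row]:
        """Run a query at the source; reject selections it cannot evaluate."""
        if predicate is not None and not self.supports_selection:
            raise ValueError(f"{self.name} cannot evaluate selections")
        return [r for r in self.rows if predicate is None or predicate(r)]


def plan_and_run(source: Source, min_salary: int):
    """Pick the most selective supported source query, then finish locally."""
    def wanted(row: Row) -> bool:
        return row["salary"] > min_salary

    if source.supports_selection:
        plan = f"SELECT * FROM {source.name} WHERE salary>{min_salary}"
        return plan, source.execute(wanted)
    # Fall back to retrieving everything and filtering in the mediator.
    plan = f"SELECT * FROM {source.name}, then filter salary>{min_salary} in mediator"
    return plan, [r for r in source.execute() if wanted(r)]


rows = [{"name": "John Doe", "salary": 70000},
        {"name": "Mary Smith", "salary": 60000}]
A = Source("A", rows, supports_selection=False)
B = Source("B", rows, supports_selection=True)
```

Both sources yield the same answer, but for `A` the filter runs in the mediator after a full retrieval, while for `B` the condition is pushed to the source.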

41 Capabilities-Based Rewriter Architecture Component SubQuery Discovery Plan Construction Plan Refinement Query Capabilities Description Component SubQueries Plans (not fully optimized) Query Algebraically optimal plans

42 What TSIMMIS Achieved: a system for integration of heterogeneous sources. Challenges and solutions: (1) semistructured data & incomplete schema knowledge, addressed by an appropriate specification language and query processing algorithms; (2) limited and different query capabilities, addressed by a query translation algorithm and a capabilities-based query rewriting algorithm

43 Overview: TSIMMIS’ goals, technical challenges, and solutions; insufficiencies of the TSIMMIS framework; going forward

44 Insufficiencies of the TSIMMIS framework: OEM was really unstructured data (some loose and partial schematic info may pay off tremendously); too “databasy” user/mediator/source interaction

45 Overview: TSIMMIS’ goals, technical challenges, and solutions; insufficiencies of the TSIMMIS framework; going forward

46 Web emerges as a Distributed DB and XML as its Data Model Data Source Native XML Database XML View Document(s) XML View Document(s) XML View Document(s) Also export: 1. Schemas & Metadata (XML-Data, RDF,…) 2. Description of supported queries Wrapper Legacy Source XMAS Query Language

47 Definition of Integrated Views Data Source Data Source Data Source Mediator XML View Document(s) Integrated XML View Document(s) XML View Document(s) View Definition in XMAS

48 Non-Materialized Views in the MIX mediator system Blended Browsing & Querying (BBQ) GUI Application DOM for Virtual XML Doc’s MIX Mediator XMAS query XML document DTD Inference Integrated View DTD XML Source Query Processor View Definition in XMAS Source DTD

49 RDB RDB2XML Wrapper DTD Inference Resolution Simplification Execution Unfolded Query Blended Browsing & Querying (BBQ) GUI MIX Mediator XMAS Mediator View Definition View DTD Translation to Algebra Optimization XML Document Fragments XMAS Query XML Source 1 XML Source 2 DTD XMAS Query XML Document Fragments DOM (VXD) Client API Application