Lessons from the TSIMMIS Project. Yannis Papakonstantinou, Department of Computer Science & Engineering, University of California, San Diego.


1 Lessons from the TSIMMIS Project. Yannis Papakonstantinou, Department of Computer Science & Engineering, University of California, San Diego

2 Overview: TSIMMIS’ goals, technical challenges, and solutions; insufficiencies of the TSIMMIS framework; going forward

3 Information Resides on Heterogeneous Information Sources: different interfaces; different data representations; redundant and conflicting information. Example sources: WWW, Ticker Tape, Personal database, Dialog

4 Goal: System Providing Integrated View of Heterogeneous Data. The Integration System collects and combines information, and provides an integrated view and a uniform user interface. Sources: WWW, Personal database, Ticker Tape, Dialog

5 The Wrapper and Mediator Architecture. Diagram labels: Client, Mediator, Wrapper, Common Data Model, Ticker Tape, Dialog, business reports, portfolios for each company, stock market prices

6 The Data Warehousing Approach to Integration. Diagram labels: Client, Mediator, Stored Integrated View, Wrapper, Ticker Tape, Dialog

7 The Lazy Integration Approach. Diagram labels: Client, Mediator, Wrapper, Ticker Tape, Dialog, Query Decomposition, Translation and Result Fusion. Data shown: IBM portfolio, IBM price, IBM related reports, IBM related reports (in common model)

8 Wrappers & Mediators from High-Level Specifications. Diagram labels: Client, Mediator, Mediator Specification Interpreter, Mediator Specification, Wrapper, Wrapper Generator, Wrapper Specification, Source

9 Challenge: Sources Without a Well-Structured Schema. Semistructured: irregular, deeply nested, cross-referenced. Incomplete schema knowledge: autonomous, dynamic. Examples: HTML pages, SGML documents, genome data, chemical structures, bibliographic information, results of the integration process

10 Challenge: Different and Limited Source Capabilities. Diagram labels: Client, Mediator (U = A + B), Wrapper (A), Wrapper (B). Query shown: retrieve IBM data

11 Mediator has to Adapt to Query Capabilities of Sources. Diagram labels: Client, Mediator (U = A + B), Wrapper (A), Wrapper (B). Queries shown: retrieve everything, retrieve IBM data. Note: (A) does not allow selection

12 Part B: Semistructured Data Representation; Mediator Generation; Wrapper Generation; Capabilities-Based Rewriting

13 Representation of Semistructured Information using OEM. Each object has an object-id (semantic or structural), a label, and a value (Atomic Value or Set Value)

14 Graph Representation of OEM Data. Example graph: faculty (first_name “John”, last_name “Doe”, rank “professor”, http://www/~doe)

15 OEM Structures Represent Arbitrary Labeled Graphs. Example graph: faculty (first_name “John”, last_name “Doe”, rank “professor”, http://www/~doe); faculty (name “Mary Smith”, project “Air DB”, http://www/~smith); paper (author name “John Doe”, author name “Mary Smith”, title “Thin Air DB”)
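The OEM object structure on slides 13 to 15 can be sketched in Python roughly as follows. This is an illustrative model only, not TSIMMIS code: the class name `OEMObject`, the helper methods, and the `&f1`-style object-ids are hypothetical names introduced for the sketch.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class OEMObject:
    """One OEM object: an object-id, a label, and an atomic or set value."""
    oid: str                                  # hypothetical id scheme, e.g. "&f1"
    label: str
    value: Union[str, List["OEMObject"]]      # atomic value, or a set of subobjects

    def is_atomic(self) -> bool:
        return not isinstance(self.value, list)

    def children(self, label: str) -> List["OEMObject"]:
        """Subobjects carrying the given label (empty for atomic objects)."""
        if self.is_atomic():
            return []
        return [c for c in self.value if c.label == label]


# The two faculty objects have different subobjects: no fixed schema is required.
doe = OEMObject("&f1", "faculty", [
    OEMObject("&n1", "first_name", "John"),
    OEMObject("&n2", "last_name", "Doe"),
    OEMObject("&r1", "rank", "professor"),
])
smith = OEMObject("&f2", "faculty", [
    OEMObject("&n3", "name", "Mary Smith"),
    OEMObject("&p1", "project", "Air DB"),
])
```

Asking `smith.children("rank")` simply returns an empty list, which is how the sketch tolerates the irregularity between the two faculty objects.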

16 Overview: Semistructured Data Representation; Mediator Generation (example of mediator specification, language expressiveness, implementation and performance); Wrapper Generation; Capabilities-Based Rewriting

17 Merge Information Relating to a Faculty. Source s1: faculty (name “John Doe”, rank “professor”, papers...). Source s2: person (name “John Doe”, birthday “April 1”). View: faculty (name “John Doe”, rank “professor”, birthday “April 1”, papers...)

18 Mediator Specification Example. Rules: }> :- }>@s1 }> :- }>@s2. Source s1: faculty (name “John Doe”, rank “professor”, papers...). Source s2: person (name “John Doe”, birthday “April 1”). View: faculty (name “John Doe”, rank “professor”, birthday “April 1”, papers...)

19 Mediator Specification Example: Semantics of Rule Bodies. Rules: }> :- }>@s1 }> :- }>@s2. Source s1: faculty (name “John Doe”, rank “professor”, papers...). Source s2: person (name “John Doe”, birthday “April 1”). View: faculty (name “John Doe”, rank “professor”, birthday “April 1”, papers...)

20 Mediator Specification Example: Semantics of Rule Heads. Rules: }> :- }>@s1 }> :- }>@s2. Source s1: faculty (name “John Doe”, rank “professor”, papers...). Source s2: person (name “John Doe”, birthday “April 1”). View object “John Doe”: faculty (name “John Doe”, rank “professor”, birthday “April 1”, papers...)

21 Incrementally Add to Semantically Identified Object. Rules: }> :- }>@s1 }> :- }>@s2. Source s1: faculty (name “John Doe”, rank “professor”, papers...). Source s2: person (name “John Doe”, birthday “April 1”). View object “John Doe”: faculty (name “John Doe”, rank “professor”, birthday “April 1”, papers...)

22 Irregularities & Incomplete Schema Knowledge. Rule: }> :- }>@s1. Source s1: faculty (name “John Doe”, rank “professor”, papers); faculty (name “Mary Smith”, project “Air DB”). Source s2: person (name “John Doe”, birthday “April 1”). View objects “John Doe” and “Mary Smith”: faculty (name “John Doe”, rank “professor”, birthday “April 1”, papers); faculty (name “Mary Smith”, project “Air DB”)

23 Second Rule Attaches More Subobjects to View Objects. Rules: }> :- }>@s1 }> :- }>@s2. Source s1: faculty (name “John Doe”, rank “professor”, papers...). Source s2: person (name “John Doe”, birthday “April 1”). View object “John Doe”: faculty (name “John Doe”, rank “professor”, birthday “April 1”, papers...)
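The fusion behaviour on slides 17 to 23 can be approximated in Python as below. This is a sketch and not MSL: objects are flattened to dictionaries, the name plays the role of the semantic object-id, and the choice that only s1 faculty objects create view objects is an assumption made for the example. The function name `fuse` is hypothetical.

```python
from collections import defaultdict

# Flattened stand-ins for the OEM objects exported by the two sources.
s1 = [
    ("faculty", {"name": "John Doe", "rank": "professor"}),
    ("faculty", {"name": "Mary Smith", "project": "Air DB"}),
]
s2 = [
    ("person", {"name": "John Doe", "birthday": "April 1"}),
]


def fuse(s1_objects, s2_objects):
    """Build view objects keyed by name, used here as the semantic object-id."""
    view = defaultdict(dict)
    # First rule (assumed): each s1 faculty object yields a view object and
    # contributes whatever subobjects it happens to have.
    for label, fields in s1_objects:
        if label == "faculty":
            view[fields["name"]].update(fields)
    # Second rule (assumed): an s2 person with the same name attaches its
    # subobjects to the already identified view object.
    for label, fields in s2_objects:
        if label == "person" and fields["name"] in view:
            view[fields["name"]].update(fields)
    return dict(view)


view = fuse(s1, s2)
```

Because both rules address the same identifier, the John Doe object ends up with rank and birthday, while Mary Smith keeps her irregular project subobject untouched.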

24 Language Expressiveness. Information fusion problems solved by MSL: irregularities; incomplete knowledge of source structure; transformation of cross-referenced structures; inconsistent and redundant data; use of arbitrary matching criteria. Theoretical analysis of expressiveness: consider the relational representation of OEM graphs; then MSL is equivalent to “SQL + special form of transitive closure”

25 Inconsistent and Redundant Information. Rules: }> :- }>@s1 }> :- }>@s2 AND NOT }>@s1. Source s1: faculty (name “John Doe”, rank “associate”). Source s2: person (name “John Doe”, rank “assistant”). View object “John Doe”: faculty (name “John Doe”, rank “associate”, rank “assistant”)
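One plausible reading of the "AND NOT ...@s1" guard on slide 25 is that s2 contributes a rank only when s1 has none for the same person. The rule text did not survive in this transcript, so the Python sketch below encodes that reading as an explicit assumption; `resolve_rank` and the sample name "Ann Lee" are hypothetical.

```python
def resolve_rank(s1_ranks, s2_ranks):
    """Prefer s1's rank; fall back to s2 only for people s1 does not cover.

    Assumption: this mirrors a rule of the form "take rank from s2 AND NOT
    (a rank for the same person exists at s1)".
    """
    view = dict(s1_ranks)
    for name, rank in s2_ranks.items():
        if name not in s1_ranks:      # the negated s1 condition
            view[name] = rank
    return view


ranks = resolve_rank(
    {"John Doe": "associate"},
    {"John Doe": "assistant", "Ann Lee": "assistant"},
)
```

Under this reading the conflicting "assistant" value for John Doe is suppressed, and s2 still fills in people that s1 does not know about.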

26 Overview: Semistructured Data Representation; Mediator Generation (example of mediator specification, language expressiveness, implementation and performance); Wrapper Generation; Capabilities-Based Rewriting

27 Mediator Specification Interpreter Architecture. Pipeline: Query and Mediator Specification -> Query Rewriter -> logical datamerge program -> Cost-Based Optimizer -> plan -> Datamerge Engine -> Result. The Datamerge Engine sends Queries to Wrappers and receives Results

28 Query Rewriting When Known Origins of Information. Rules and query: }> :- :- }>@s1 }> :- }>@s2 }> :- }> AND X>65000

29 Query Rewriter Pushes Conditions to Sources. Rules and query: }> :- :- }>@s1 }> :- }>@s2 }> :- }> AND X>65000. Logical datamerge program: }> :- ( }> AND X>65000)@s1 AND }>@s2
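The benefit of pushing the condition X>65000 to s1 can be illustrated with a small Python simulation. It is a sketch under assumptions: X is taken to be a salary attribute held by s1, the two sources are joined on name, and the `Source` class with its `rows_shipped` counter is a hypothetical stand-in for a wrapper.

```python
class Source:
    """Toy wrapper that counts how many rows it ships to the mediator."""

    def __init__(self, rows):
        self.rows = rows
        self.rows_shipped = 0

    def query(self, predicate=None):
        result = [r for r in self.rows if predicate is None or predicate(r)]
        self.rows_shipped += len(result)
        return result


def join_on_name(left, right):
    by_name = {r["name"]: r for r in right}
    return [{**l, **by_name[l["name"]]} for l in left if l["name"] in by_name]


def naive_plan(s1, s2, threshold):
    # Fetch everything, then apply the condition at the mediator.
    joined = join_on_name(s1.query(), s2.query())
    return [r for r in joined if r["salary"] > threshold]


def pushed_plan(s1, s2, threshold):
    # The condition travels to s1, so fewer rows reach the mediator.
    return join_on_name(s1.query(lambda r: r["salary"] > threshold), s2.query())


def make_sources():
    s1 = Source([{"name": "John Doe", "salary": 70000},
                 {"name": "Mary Smith", "salary": 60000}])
    s2 = Source([{"name": "John Doe", "birthday": "April 1"},
                 {"name": "Mary Smith", "birthday": "May 2"}])
    return s1, s2


a1, a2 = make_sources()
naive_result = naive_plan(a1, a2, 65000)
b1, b2 = make_sources()
pushed_result = pushed_plan(b1, b2, 65000)
```

Both plans return the same answer; the pushed plan differs only in how many rows s1 has to ship, which `rows_shipped` makes visible.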

30 Passing Bindings & Local Join Plans. Plan labels: Passing Bindings, Local Join. Rule fragments: :- <person { }>; :- }> AND X>65000; :- <person { }>; }>:- }> AND X>65000. Other labels: N, s1, s2
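The two plan shapes named on slide 30 can be contrasted with a short Python sketch. The assumptions are the same as before and are not taken from the slide: the condition is on a salary attribute from s1, the join variable N is the person's name, and `s2_lookup` is a hypothetical callable standing in for a parameterized query to s2.

```python
def passing_bindings(s1_rows, s2_lookup, threshold):
    """For each qualifying s1 row, send its name N to s2 as a bound parameter."""
    out = []
    for row in s1_rows:
        if row["salary"] > threshold:
            for match in s2_lookup(row["name"]):   # one s2 query per binding of N
                out.append({**row, **match})
    return out


def local_join(s1_rows, s2_rows, threshold):
    """Fetch s2 in full and join at the mediator."""
    index = {}
    for row in s2_rows:
        index.setdefault(row["name"], []).append(row)
    return [
        {**row, **match}
        for row in s1_rows
        if row["salary"] > threshold
        for match in index.get(row["name"], [])
    ]


s1_rows = [{"name": "John Doe", "salary": 70000},
           {"name": "Mary Smith", "salary": 60000}]
s2_rows = [{"name": "John Doe", "birthday": "April 1"},
           {"name": "Mary Smith", "birthday": "May 2"}]

lookups = []


def s2_lookup(name):
    lookups.append(name)              # record each parameterized query
    return [r for r in s2_rows if r["name"] == name]


bound = passing_bindings(s1_rows, s2_lookup, 65000)
joined = local_join(s1_rows, s2_rows, 65000)
```

Passing bindings issues one small query per qualifying name, while the local join issues a single large one; which is cheaper depends on source sizes and selectivity, which is the kind of decision a cost-based optimizer would make.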

31 Query Decomposition When Unknown Origins of Information. Rules and query: }> :- }> }> :- }>@s1 }> :- }>@s2

32 Plan Considers All Possible Sources of birthday. Rules and query: }> :- }> }> :- }>@s1 }> :- }>@s2. Plan labels: name, birthday, s1, s2

33 Overview: Semistructured-Data Representation; Mediator Generation; Wrapper Generation; Capabilities-Based Rewriting

34 Query Translation in Wrappers. Diagram labels: Wrapper, Query Translator, Result Translator, Source. Example query: SELECT * FROM person WHERE name=“Smith”. Source commands shown: find -all, find -n Smith

35 Rapid Query Translation Using Templates and Actions. Diagram labels: Template Interpreter, Result Translator, Source. Example query: SELECT * FROM person WHERE name=“Smith”. Source commands shown: find -all, find -n Smith. Templates with actions: SELECT * FROM person {emit “find -all”}; SELECT * FROM person WHERE name=$N {emit “find -n $N”}
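The template-and-action idea on slide 35 can be mimicked with regular expressions in Python. This is only a sketch of the matching step, not the TSIMMIS template interpreter: the regex encoding of the two templates, the double-quoted literal for $N, and the function name `translate` are all assumptions.

```python
import re

# Each entry pairs a query template (as a regex) with the action's output.
# (?P<N>...) plays the role of the $N placeholder from the slide.
TEMPLATES = [
    (r'SELECT \* FROM person WHERE name="(?P<N>[^"]+)"', "find -n {N}"),
    (r"SELECT \* FROM person", "find -all"),
]


def translate(query: str) -> str:
    """Return the native source command for a supported query."""
    for pattern, action in TEMPLATES:
        match = re.fullmatch(pattern, query.strip())
        if match:
            return action.format(**match.groupdict())
    raise ValueError("query not supported by this wrapper")


command = translate('SELECT * FROM person WHERE name="Smith"')
```

A query that matches no template is rejected rather than guessed at, which is what lets the mediator learn what the wrapper cannot do.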

36 Description of Infinite Sets of Supported Queries: uses recursive nonterminals. Example: job description contains word w1 and word w2 and ... Template: SELECT subset(person) FROM person WHERE \CJob. Nonterminal rules: \CJob : job LIKE $W AND \CJob; \CJob : TRUE
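The recursive nonterminal \CJob on slide 36 describes conditions of any length, and a recursive recognizer follows its two rules directly. The Python sketch below assumes that each $W is written as a single-quoted string; that quoting convention and the function name `parse_cjob` are not from the slide.

```python
import re


def parse_cjob(condition: str):
    """Return the list of words if the condition derives from CJob, else None.

    Grammar from the slide:
        CJob : job LIKE $W AND CJob
        CJob : TRUE
    """
    condition = condition.strip()
    if condition == "TRUE":                       # base rule
        return []
    match = re.match(r"job LIKE '([^']*)' AND (.*)$", condition, re.S)
    if match is None:
        return None
    rest = parse_cjob(match.group(2))             # recursive rule
    if rest is None:
        return None
    return [match.group(1)] + rest


words = parse_cjob("job LIKE 'db' AND job LIKE 'web' AND TRUE")
```

Two short rules therefore accept an unbounded family of queries, one for every number of LIKE conditions.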

37 Overview: Semistructured-Data Representation; Mediator Generation; Wrapper Generation; Capabilities-Based Rewriting

38 Capabilities-Based Rewriter in Mediator Architecture. Pipeline: Query and Mediator Specification -> Query Rewriter -> logical datamerge program -> Capabilities-Based Rewriter (uses the Wrapper Supported Queries Description) -> supported plans -> Cost-Based Optimizer -> optimal plan -> Datamerge Engine

39 Capabilities-Based Rewriter Finds Supported Plans. Label: Supported Queries. Queries shown: SELECT * FROM A WHERE salary>65000; SELECT * FROM A

40 Capabilities-Based Rewriter Finds Most-Selective Supported Plans. Label: Supported Queries. Queries shown: SELECT * FROM B WHERE salary>65000; SELECT * FROM B WHERE salary>65000
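The contrast between sources A and B on slides 39 and 40 can be sketched as a choice of which conditions to push and which to keep at the mediator. The Python model below is deliberately simplified and is an assumption: each source's capabilities are reduced to a set of pushable condition strings, and `choose_plan` is a hypothetical helper.

```python
def choose_plan(table, conditions, pushable):
    """Split conditions into those sent to the source and those kept locally.

    Returns (source_query, residual_conditions). Pushing every supported
    condition gives the most selective query the source can answer.
    """
    pushed = [c for c in conditions if c in pushable]
    residual = [c for c in conditions if c not in pushable]
    source_query = f"SELECT * FROM {table}"
    if pushed:
        source_query += " WHERE " + " AND ".join(pushed)
    return source_query, residual


# A supports no selection, so the mediator must filter after fetching everything.
plan_a = choose_plan("A", ["salary>65000"], pushable=set())
# B supports the salary condition, so it is pushed to the source.
plan_b = choose_plan("B", ["salary>65000"], pushable={"salary>65000"})
```

Both plans answer the same client query; they differ only in where the selection is evaluated.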

41 Capabilities-Based Rewriter Architecture. Pipeline: Query and Query Capabilities Description -> Component SubQuery Discovery -> Component SubQueries -> Plan Construction -> Plans (not fully optimized) -> Plan Refinement -> Algebraically optimal plans

42 What TSIMMIS Achieved: a system for integration of heterogeneous sources. Challenges and solutions: semistructured data & incomplete schema knowledge (appropriate specification language and query processing algorithms); limited and different query capabilities (query translation algorithm, capabilities-based query rewriting algorithm)

43 Overview: TSIMMIS’ goals, technical challenges, and solutions; insufficiencies of the TSIMMIS framework; going forward

44 Insufficiencies of the TSIMMIS framework: OEM was really unstructured data (some loose and partial schematic info may pay off tremendously); too “databasy” user/mediator/source interaction

45 Overview: TSIMMIS’ goals, technical challenges, and solutions; insufficiencies of the TSIMMIS framework; going forward

46 Web emerges as a Distributed DB and XML as its Data Model. Diagram labels: Data Source, Native XML Database, Wrapper, Legacy Source, XML View Document(s), XMAS Query Language. Sources also export: 1. Schemas & Metadata (XML-Data, RDF,…) 2. Description of supported queries

47 Definition of Integrated Views. Diagram labels: Data Source (three), XML View Document(s), Mediator, View Definition in XMAS, Integrated XML View Document(s)

48 Non-Materialized Views in the MIX mediator system. Diagram labels: Blended Browsing & Querying (BBQ) GUI, Application, DOM for Virtual XML Doc’s, MIX Mediator, XMAS query, XML document, DTD Inference, Integrated View DTD, XML Source, Query Processor, View Definition in XMAS, Source DTD

49 Diagram labels: Blended Browsing & Querying (BBQ) GUI, Application, Client API, DOM (VXD), MIX Mediator, XMAS Mediator View Definition, View DTD, DTD Inference, Resolution, Simplification, Unfolded Query, Translation to Algebra, Optimization, Execution, XMAS Query, XML Document Fragments, XML Source 1, XML Source 2, DTD, RDB2XML Wrapper, RDB

