1 Information Integration and Source Wrapping Jose Luis Ambite, USC/ISI.


2 Outline
Information Integration
 – Definition
 – Architectures
 – Domain Models
 – Source wrapping
Application to EIA
 – Example of Multi-State Data reorganization: wrapping, modelling

3 Information Integration: a single interface to multiple sources
[Diagram: decision support and application programs reach, through an information agent, knowledge bases, databases, computer programs, and the Web.]

4 Information Integration
The problem of providing uniform (sources are transparent to the user) access to (queries, and eventually updates too) multiple (even two is hard!) autonomous (integration must not affect the behavior of the sources) heterogeneous (different data models and schemas) structured (and semistructured) data sources (not only databases: also web sources, …).

5 Information Integration in SIMS
To enable query access, SIMS needs to:
 – address semantic heterogeneity => describe the sources in a common domain model
 – address syntactic (format) heterogeneity => standardize access to the sources:
    – Structured (DBMS): Oracle, MS Access, …
    – Semistructured: wrappers for HTML, text, PDF
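As a minimal sketch of the first requirement (describing sources in a common domain model), assume two hypothetical sources that report the same kind of fact under different attribute names. Each source gets a description mapping its local attributes onto shared domain-model attributes, so one query can run over both. The source contents (toy values), the column names, and the domain attributes `area`, `date`, `value` are illustrative assumptions, not SIMS's actual model or interface.

```python
from typing import Dict, List

Row = Dict[str, object]

# Two hypothetical sources with different local schemas for the same kind of data
# (toy values only).
SOURCE_A: List[Row] = [{"state": "CA", "week": "week-1", "price_cents": 100.0}]
SOURCE_B: List[Row] = [{"region": "NY", "period": "week-1", "cents_per_gallon": 200.0}]
SOURCES: Dict[str, List[Row]] = {"A": SOURCE_A, "B": SOURCE_B}

# Source descriptions: map each source's local attribute names onto
# shared domain-model attributes (hypothetical names: area, date, value).
MAPPINGS: Dict[str, Dict[str, str]] = {
    "A": {"state": "area", "week": "date", "price_cents": "value"},
    "B": {"region": "area", "period": "date", "cents_per_gallon": "value"},
}


def to_domain(source: str, row: Row) -> Row:
    """Rename a source row's attributes into domain-model terms."""
    mapping = MAPPINGS[source]
    return {mapping[k]: v for k, v in row.items() if k in mapping}


def query(area: str) -> List[Row]:
    """Answer a domain-model query (all values for one area) over every source."""
    results = []
    for name, rows in SOURCES.items():
        for row in rows:
            record = to_domain(name, row)
            if record["area"] == area:
                results.append({**record, "source": name})
    return results


print(query("NY"))
# [{'area': 'NY', 'date': 'week-1', 'value': 200.0, 'source': 'B'}]
```

The user only ever sees `area`, `date`, and `value`; adding a source means adding one mapping, not changing the query.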

6 Domain Model (for Time Series)
[Diagram: domain model for time series. Concepts: Time Series, Point of Sale, Period (Week, Month, Year), Quality, Price, Volume, Footnote, Area (USA, CA, NY), Unit, Measurement, Value, Date, Text, Tag, and a product hierarchy (Gasoline; G. Unleaded, G. Leaded, G. Regular, G. Premium Unleaded). Sources mapped into the model: CPI, PPI. Link types: Subclass, Part-of, General Relation, Source Mapping.]

7 Integration Architectures
 – Materialized: data warehouse
 – Virtual: mediator
Figure from [Levy2000]
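The difference between the two architectures can be sketched in a few lines. This is an illustrative toy under stated assumptions (a "source" is just a function returning its current rows; the `Warehouse` and `Mediator` classes are hypothetical, not taken from [Levy2000]): the warehouse copies data at load time and answers from the copy, while the mediator keeps no copy and reaches the sources at query time.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Source = Callable[[], List[Row]]  # assumption: a source returns its current rows
Pred = Callable[[Row], bool]


class Warehouse:
    """Materialized: copy source data up front, answer queries from the copy."""

    def __init__(self, sources: List[Source]) -> None:
        self.sources = sources
        self.store: List[Row] = []

    def refresh(self) -> None:
        # Periodic load step; answers are only as fresh as the last refresh.
        self.store = [row for src in self.sources for row in src()]

    def query(self, pred: Pred) -> List[Row]:
        return [row for row in self.store if pred(row)]


class Mediator:
    """Virtual: keep no copy; fetch from the sources when a query arrives."""

    def __init__(self, sources: List[Source]) -> None:
        self.sources = sources

    def query(self, pred: Pred) -> List[Row]:
        return [row for src in self.sources for row in src() if pred(row)]


# A toy source whose contents change over time.
data: List[Row] = [{"area": "CA", "value": 1}]


def source() -> List[Row]:
    return list(data)


def everything(row: Row) -> bool:
    return True


wh, med = Warehouse([source]), Mediator([source])
wh.refresh()
data.append({"area": "NY", "value": 2})  # the source changes after the load

print(len(wh.query(everything)), len(med.query(everything)))  # 1 2
```

The trade-off shows up directly: the warehouse answers without touching the sources but can be stale, while the mediator is always current but every query costs source accesses.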

8 Wrappers
 – Provide a uniform mechanism for extracting data from semi-structured sources (HTML, text, …)
 – Transform semi-structured sources into structured ones
[Diagram: Wrapper]
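A wrapper's job can be sketched as a function from a semi-structured page to records with a fixed set of attributes. The page layout below ("Label: value" lines, records separated by blank lines) and the `wrap` function are hypothetical. The extraction rule here is a hand-written regular expression, not the induced rules the later slides describe.

```python
import re
from typing import Dict, List

# Hypothetical semi-structured page: records separated by blank lines,
# fields written as "Label: value" in no fixed order, some fields optional.
PAGE = """\
Name: Cafe One
Cuisine: Thai

Cuisine: Italian
Name: Trattoria Two
Phone: 555-0100
"""

FIELD = re.compile(r"^(\w+):\s*(.+)$")


def wrap(page: str, attributes: List[str]) -> List[Dict[str, str]]:
    """Turn the page into structured records with a fixed set of attributes.

    Fields not requested are dropped; requested but missing fields become ''.
    """
    records = []
    for block in page.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            m = FIELD.match(line.strip())
            if m:
                fields[m.group(1)] = m.group(2).strip()
        records.append({a: fields.get(a, "") for a in attributes})
    return records


for rec in wrap(PAGE, ["Name", "Cuisine"]):
    print(rec)
# {'Name': 'Cafe One', 'Cuisine': 'Thai'}
# {'Name': 'Trattoria Two', 'Cuisine': 'Italian'}
```

Whatever the page looks like, callers receive uniform rows, which is what lets the integration layer treat a web page like a table.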

9 Wrapper Building Tools
Creating wrappers (semi-)automatically:
 – A demonstration-oriented user interface lets users show the system what to extract, by example
 – The system automatically induces extraction rules
 – Common extraction engine
Benefits:
 – Rapid wrapper creation
 – Simplified wrapper maintenance
Fetch.com
 – Start-up that commercializes the technology [Muslea99]

10 Example: EIA Multi-State Data

11 EIA Multi-State Data: Multiple formats

12 EIA Multi-State Data: Table 31
Text source: formatted text
 – Tables contain national, regional (PADD), and state data => extract the state data
 – Tables contain different measurements
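A minimal sketch of the "extract the state data" step, assuming a hypothetical fixed-layout excerpt with toy numbers in which national, PADD, and state rows are interleaved. The assumption here is that state rows are recognized by looking the row label up in a list of state names (abbreviated to three for the sketch). The real Table 31 layout may call for a different test.

```python
from typing import Dict, List

# Hypothetical excerpt of a formatted-text table (toy numbers): national,
# regional (PADD), and state rows are interleaved, each with two columns.
TABLE = """\
U.S.                100.0   101.0
PADD I              102.0   103.0
  Maine             104.0   105.0
  New Hampshire     106.0   107.0
PADD II             108.0   109.0
  Ohio              110.0   111.0
"""

# Assumption: a row is a state row if its label is a known state name.
STATES = {"Maine", "New Hampshire", "Ohio"}


def is_number(token: str) -> bool:
    try:
        float(token)
        return True
    except ValueError:
        return False


def state_rows(table: str) -> Dict[str, List[float]]:
    """Keep only state rows; skip national and PADD aggregates."""
    rows = {}
    for line in table.splitlines():
        tokens = line.split()
        # The row label is every token before the first numeric column.
        i = next((k for k, t in enumerate(tokens) if is_number(t)), len(tokens))
        label = " ".join(tokens[:i])
        if label in STATES:
            rows[label] = [float(t) for t in tokens[i:]]
    return rows


print(state_rows(TABLE))
# {'Maine': [104.0, 105.0], 'New Hampshire': [106.0, 107.0], 'Ohio': [110.0, 111.0]}
```

Splitting on the first numeric token rather than on fixed column offsets keeps multi-word labels such as "New Hampshire" and "PADD I" intact.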

13 EIA Multi-State: Wrapper Creation
1. Mark up a few examples and assign meaning (map them to attributes from the domain model, e.g., date, value)
2. The system induces extraction rules

14 EIA Multi-State: Metadata Extraction
 – Extract data and metadata
 – Associate them with the domain model

15 Extracted Data + Metadata

16 Domain Model
[Diagram: the time-series domain model extended with the EIA source. Area now includes Maine; the source EIA-T31-1, produced by the EIA Multi-state Wrapper, is mapped into Time Series with Product G. Regular, Unit Cents/gallon, and Point of Sale Retail outlets.]
EIA-T31-1: Regular Gasoline Prices in Maine, sales to end users through retail outlets, measured in cents per gallon.
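Once EIA-T31-1 is described in domain-model terms, a query phrased in general concepts can find it through the subclass hierarchy. The sketch below rests on these assumptions: the subclass links are assumed (the exact hierarchy is not recoverable from the slide), the attribute names are hypothetical, the second source is invented for contrast, and only the product attribute is matched by subsumption.

```python
from typing import Dict, List, Optional

# Assumed subclass links in the product hierarchy (child -> parent).
PARENT: Dict[str, Optional[str]] = {
    "Gasoline": None,
    "G. Unleaded": "Gasoline",
    "G. Leaded": "Gasoline",
    "G. Regular": "G. Unleaded",
    "G. Premium Unleaded": "G. Unleaded",
}

# Source descriptions in domain-model terms. EIA-T31-1 follows the slide;
# the second source is hypothetical.
SOURCES: Dict[str, Dict[str, str]] = {
    "EIA-T31-1": {"measurement": "Price", "product": "G. Regular", "area": "Maine",
                  "unit": "Cents/gallon", "point_of_sale": "Retail outlets"},
    "hypothetical-CA-series": {"measurement": "Price", "product": "G. Leaded",
                               "area": "CA"},
}


def is_a(concept: Optional[str], ancestor: str) -> bool:
    """True if concept equals ancestor or lies below it in the hierarchy."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PARENT.get(concept)
    return False


def relevant_sources(query: Dict[str, str]) -> List[str]:
    """Sources whose description satisfies every attribute of the query."""
    hits = []
    for name, desc in SOURCES.items():
        if all(is_a(desc.get(attr), value) if attr == "product"
               else desc.get(attr) == value
               for attr, value in query.items()):
            hits.append(name)
    return hits


print(relevant_sources({"measurement": "Price", "product": "Gasoline", "area": "Maine"}))
# ['EIA-T31-1']
```

A query for "Gasoline" reaches a series described as "G. Regular" only because the model records that G. Regular is a kind of gasoline. This is the point of associating extracted metadata with the domain model.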

17 Additional slides

18 Example of Extraction Rule [Muslea et al 1999]
Page: Name: Chinois on Main Cuisine : Pacific New Wave
Start: SkipTo(Cuisine :) SkipTo( )
End: SkipTo( )
RULE = sequence of landmarks (e.g., "Cuisine :")
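A landmark rule like the one above can be executed with a few lines of token matching. In this sketch a rule is a list of landmarks, and each landmark is a token sequence that `SkipTo` jumps past. The HTML tags in the tokenized page are hypothetical stand-ins for landmarks. As a simplification, the end rule is applied forward from where the start rule stopped. This may differ from the exact end-rule semantics of the original system [Muslea et al 1999].

```python
from typing import List, Optional, Sequence

Landmark = Sequence[str]
Rule = List[Landmark]  # a rule is a sequence of landmarks


def skip_to(tokens: List[str], pos: int, landmark: Landmark) -> Optional[int]:
    """SkipTo: index just past the first match of landmark at or after pos."""
    n = len(landmark)
    for i in range(pos, len(tokens) - n + 1):
        if tokens[i:i + n] == list(landmark):
            return i + n
    return None


def apply_rule(tokens: List[str], pos: int, rule: Rule) -> Optional[int]:
    """Apply each SkipTo in order; fail if any landmark is missing."""
    for landmark in rule:
        found = skip_to(tokens, pos, landmark)
        if found is None:
            return None
        pos = found
    return pos


def extract(tokens: List[str], start_rule: Rule, end_rule: Rule) -> Optional[str]:
    start = apply_rule(tokens, 0, start_rule)
    if start is None:
        return None
    # Simplification: run the end rule forward from the start position; the
    # field ends where the last end landmark begins.
    after = apply_rule(tokens, start, end_rule)
    if after is None:
        return None
    end = after - len(end_rule[-1])
    return " ".join(tokens[start:end])


# Hypothetical tokenized page (the tags are assumptions).
PAGE = "Name : <b> Chinois on Main </b> Cuisine : <i> Pacific New Wave </i>".split()
START: Rule = [["Cuisine", ":"], ["<i>"]]
END: Rule = [["</i>"]]

print(extract(PAGE, START, END))  # Pacific New Wave
```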

19 Example of Rule Induction [Muslea et al 1999]
Training examples: "Cuisine: Thai Review: Good", "Review: Excellent"
Rule: SkipTo( )

20 Example of Rule Induction [Muslea et al 1999]
Training examples: "Cuisine: Thai Review: Good", "Review: Excellent"
Rule: SkipTo( )
Refinements: SkipTo( )... SkipTo( : ); SkipTo( )... SkipTo( ) SkipTo( )

21 Example of Rule Induction [Muslea et al 1999]
Training examples: "Cuisine: Thai Review: Good", "Review: Excellent"
Rule: SkipTo( )
Refinements: SkipTo( )... SkipTo( : ); SkipTo( )... SkipTo( ) SkipTo( ); …; SkipTo( Review :) SkipTo( )...
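The core idea of refinement can be sketched as "start with the shortest landmark and specialize it until it is correct on every training example". This is a deliberately simplified stand-in: it only grows a single landmark from the tokens just before the target. It is not the full induction algorithm of [Muslea et al 1999], which also adds new landmarks and uses wildcards. The tokenized examples and the target positions are hypothetical, chosen to mirror the slides.

```python
from typing import List, Optional, Sequence, Tuple

Example = Tuple[List[str], int]  # (tokens, index where the target field starts)


def skip_to(tokens: List[str], pos: int, landmark: Sequence[str]) -> Optional[int]:
    """SkipTo: index just past the first match of landmark at or after pos."""
    n = len(landmark)
    for i in range(pos, len(tokens) - n + 1):
        if tokens[i:i + n] == list(landmark):
            return i + n
    return None


def induce_start_rule(examples: List[Example], max_len: int = 3) -> Optional[List[List[str]]]:
    """Find the shortest single-landmark start rule consistent with all examples.

    Candidate landmarks are the last k tokens before the target in the first
    example, for k = 1, 2, ...; a candidate is kept only if SkipTo lands exactly
    on the target start in every example.
    """
    tokens0, target0 = examples[0]
    for k in range(1, max_len + 1):
        if target0 < k:
            break  # not enough context before the target to grow the landmark
        landmark = tokens0[target0 - k:target0]
        if all(skip_to(toks, 0, landmark) == tgt for toks, tgt in examples):
            return [landmark]
    return None


# Hypothetical training examples: extract the Review value.
EXAMPLES: List[Example] = [
    (["Cuisine", ":", "Thai", "Review", ":", "Good"], 5),
    (["Review", ":", "Excellent"], 2),
]

print(induce_start_rule(EXAMPLES))  # [['Review', ':']]
```

The one-token landmark ":" fails on the first example, because the first colon follows "Cuisine". Specializing it to "Review :" fixes that, which parallels the SkipTo( Review :) refinement on the slide.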

22 Mediator Architecture
 – Users pose queries in the global (mediator) schema
 – The mediator translates and decomposes each user query into multiple source queries
Figure from [Levy2000]
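Decomposition can be illustrated with source descriptions that state which part of the global schema each source covers. In this minimal sketch, a global query asks for several areas. The mediator splits the query into one subquery per source that covers any of those areas and translates the global attribute name into each source's local name. The source names, coverage sets, and local attribute names are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical source descriptions: which areas each source can answer for,
# and what the source calls the global 'area' attribute.
SOURCE_COVERAGE: Dict[str, Set[str]] = {
    "state_db": {"CA", "NY"},
    "eia_wrapper": {"Maine"},
}
LOCAL_AREA_ATTR: Dict[str, str] = {"state_db": "state", "eia_wrapper": "region"}


def decompose(areas: Set[str]) -> List[Tuple[str, Dict[str, List[str]]]]:
    """Split a global query over several areas into per-source subqueries."""
    plan = []
    for source, covered in SOURCE_COVERAGE.items():
        wanted = sorted(areas & covered)
        if wanted:
            # Translate the global attribute into the source's local one.
            plan.append((source, {LOCAL_AREA_ATTR[source]: wanted}))
    return plan


print(decompose({"CA", "Maine"}))
# [('state_db', {'state': ['CA']}), ('eia_wrapper', {'region': ['Maine']})]
```

Sources that cannot contribute are never contacted, and the user never sees local names such as `state` or `region`.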

23 System Architecture
[Diagram: system architecture over the sources, in three phases.
Construction phase: deploy DBs, extend ontology. Domain modeling: DB analysis, text analysis. Integrated ontology: global terminology, source descriptions, integration axioms.
Access phase: create DB query, retrieve data. Query processor: reformulation, cost optimization.
User phase: compose query. User interface: ontology browser, query constructor.]
