1 Information Integration and Source Wrapping Jose Luis Ambite, USC/ISI.

Presentation transcript:

1 Information Integration and Source Wrapping Jose Luis Ambite, USC/ISI

2 Outline
- Information Integration
  - Definition
  - Architectures
  - Domain Models
  - Source wrapping
- Application to EIA
  - Example of Multi-State Data re-organization
    - Wrapping
    - Modelling

3 Information Integration: Single Interface to Multiple Sources
[Figure labels: Decision Support, Application Programs, Information Agent, Knowledge Bases, Databases, Computer Programs, The Web]

4 Information Integration
The problem of providing
- uniform (sources are transparent to the user)
- access to (queries, and eventually updates too)
- multiple (even 2 is hard!)
- autonomous (integration must not affect the behavior of the sources)
- heterogeneous (different data models, schemas)
- structured (and semistructured)
- data sources (not only databases: web sources, …)

5 Information Integration in SIMS
To enable query access, SIMS needs to:
- address semantic heterogeneity => describe sources in a common domain model
- address syntactic (format) heterogeneity => standardize access to sources:
  - Structured (DBMS): Oracle, MS Access, …
  - Semistructured: wrappers for HTML, text, PDF

6 Domain Model (for Time Series)
[Figure: domain model diagram. Legend: Subclass, Part-of, General Relation, Source Mapping. Node labels: Time Series, CPI, PPI, Point of Sale, Period, Week, Month, Year, Quality, Price, Volume, Footnote, Text, Tag, Area, USA, CA, NY, Measurement, Unit, Value, Date, Product, Gasoline, G. Leaded, G. Unleaded, G. Regular, G. Premium Unleaded]

7 Integration Architectures
- Materialized: Data Warehouse
- Virtual: Mediator
from [Levy2000]

8 Wrappers
- provide a uniform mechanism for extracting data from semi-structured sources (HTML, text, …)
- transform semi-structured sources into structured ones
[Figure label: Wrapper]
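A minimal sketch of what such a wrapper does, assuming a whitespace-aligned text table as the source. The table layout, the figures, and the names `RAW_TEXT` and `wrap` are invented for illustration and do not come from the slides:

```python
import re
from typing import Dict, List

# Hypothetical formatted-text source; layout and numbers are invented.
RAW_TEXT = """\
State        Jan     Feb
Maine        101.5   102.3
California   120.1   121.8
"""

def wrap(text: str) -> List[Dict[str, object]]:
    """Turn a whitespace-aligned text table into one record per (area, period)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()  # first header cell labels the row-name column
    records: List[Dict[str, object]] = []
    for line in lines[1:]:
        # Columns are assumed to be separated by two or more spaces.
        cells = re.split(r"\s{2,}", line.strip())
        area, values = cells[0], cells[1:]
        for period, value in zip(header[1:], values):
            records.append({"area": area, "period": period, "value": float(value)})
    return records

records = wrap(RAW_TEXT)
```

Each record is now a flat, structured tuple that a database or mediator could query, which is the uniform access the slide describes.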

9 Wrapper Building Tools
Creating wrappers (semi-)automatically:
- A demonstration-oriented user interface lets users show the system what to extract by example
- The system automatically induces extraction rules
- Common extraction engine
Benefits:
- Rapid wrapper creation
- Simplified wrapper maintenance
Fetch.com:
- Start-up that commercializes the technology [Muslea99]

10 Example: EIA Multi-State Data

11 EIA Multi-State Data: Multiple formats

12 EIA Multi-State Data: Table 31
- Text source: formatted text
- Tables contain national, regional (PADDs), and state data => extract state data
- Tables contain different measurements

13 EIA Multi-State: Wrapper Creation
1. Mark up a few examples and assign meaning (map to attributes from the domain model; figure labels: date, value)
2. System induces extraction rules

14 EIA Multi-State: Metadata Extraction
Extract:
- data
- metadata
Associate with domain model

15 Extracted Data + Metadata

16 Domain Model
[Figure: domain model diagram with the EIA source mapped in. Legend: Subclass, Part-of, General Relation, Source Mapping. Node labels: Time Series, EIA-T31-1, EIA Multi-state Wrapper, Point of Sale, Retail outlets, Period, Week, Month, Year, Quality, Price, Volume, Footnote, Text, Tag, Area, USA, CA, Maine, Measurement, Unit, Cents/gallon, Value, Date, Product, Gasoline, G. Leaded, G. Unleaded, G. Regular, G. Premium Unleaded]
EIA-T31-1: Regular Gasoline Prices in Maine, sales to end users through retail outlets, measured in cents per gallon

17 Additional slides

18 Example of Extraction Rule [Muslea et al 1999]
Page: Name: Chinois on Main Cuisine : Pacific New Wave
Start: SkipTo(Cuisine :) SkipTo( )
End: SkipTo( )
RULE = sequence of landmarks (e.g., Cuisine : )
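The idea of a rule as a sequence of landmarks can be sketched as follows. This is a simplified illustration, not the extraction engine from [Muslea et al 1999]: the tags `<b>`, `<i>`, and `</i>` in the sample page are assumed stand-ins, the function names are hypothetical, and the end landmark is searched forward from the start position for brevity:

```python
from typing import List, Optional

def skip_to(text: str, pos: int, landmark: str) -> Optional[int]:
    """Return the position just past the next occurrence of landmark, or None."""
    idx = text.find(landmark, pos)
    return None if idx == -1 else idx + len(landmark)

def extract(text: str, start_rule: List[str], end_landmark: str) -> Optional[str]:
    """Apply each SkipTo in the start rule, then cut at the end landmark."""
    pos = 0
    for landmark in start_rule:
        nxt = skip_to(text, pos, landmark)
        if nxt is None:
            return None  # a landmark is missing, so the rule does not match
        pos = nxt
    end = text.find(end_landmark, pos)
    if end == -1:
        return None
    return text[pos:end].strip()

# Hypothetical markup around the slide's example page; the tags are assumptions.
PAGE = "Name: <b>Chinois on Main</b> Cuisine : <i>Pacific New Wave</i>"

# Start: SkipTo(Cuisine :) SkipTo(<i>)   End: SkipTo(</i>)
cuisine = extract(PAGE, ["Cuisine :", "<i>"], "</i>")
```

Chaining two landmarks matters here: skipping to the tag alone could land on an earlier field, while skipping to `Cuisine :` first narrows the search to the right region of the page.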

19 Example of Rule Induction [Muslea et al 1999]
Training Examples:
Cuisine: Thai Review: Good
Review: Excellent
SkipTo( )

20 Example of Rule Induction [Muslea et al 1999]
Training Examples:
Cuisine: Thai Review: Good
Review: Excellent
SkipTo( )
SkipTo( ) ... SkipTo( : ) SkipTo( ) ... SkipTo( ) SkipTo( )

21 Example of Rule Induction [Muslea et al 1999]
Training Examples:
Cuisine: Thai Review: Good
Review: Excellent
SkipTo( )
SkipTo( ) ... SkipTo( : ) SkipTo( ) ... SkipTo( ) SkipTo( )
… SkipTo( Review :) SkipTo( ) ...
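One way to picture rule induction is a brute-force search over short landmark sequences, keeping the first one that lands on the marked value in every training example. This is a deliberately naive sketch, not the induction algorithm of [Muslea et al 1999]; the training pages, their tags, and all function names are assumptions made for illustration:

```python
import re
from itertools import combinations
from typing import List, Optional, Tuple

Example = Tuple[str, str]  # (page text, value the start rule must land on)

def tokens(text: str) -> List[str]:
    # Tags, words, and single punctuation marks serve as candidate landmarks.
    return re.findall(r"<[^>]+>|\w+|[^\w\s]", text)

def apply_rule(page: str, rule: List[str]) -> Optional[int]:
    """Run the SkipTo sequence; return the final position or None on failure."""
    pos = 0
    for landmark in rule:
        idx = page.find(landmark, pos)
        if idx == -1:
            return None
        pos = idx + len(landmark)
    return pos

def covers(rule: List[str], example: Example) -> bool:
    page, target = example
    pos = apply_rule(page, rule)
    return pos is not None and page[pos:].lstrip().startswith(target)

def induce(examples: List[Example]) -> Optional[List[str]]:
    """Try one-landmark rules, then two-landmark rules, built from the text
    that precedes the marked value in the first example."""
    page, target = examples[0]
    prefix = tokens(page[: page.index(target)])
    for size in (1, 2):
        for combo in combinations(prefix, size):
            rule = list(combo)
            if all(covers(rule, ex) for ex in examples):
                return rule
    return None

# Hypothetical training pages with the review value marked by the user.
TRAINING = [
    ("Cuisine: <i>Thai</i> Review: <b>Good</b>", "Good"),
    ("Name: <b>Spago</b> Review: <b>Excellent</b>", "Excellent"),
]
rule = induce(TRAINING)
```

On these two pages no single landmark works, because `<b>` alone stops at the restaurant name in the second page, so the search settles on a two-landmark rule that first skips to `Review` and then to `<b>`.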

22 Mediator Architecture
- User poses queries in the global (mediator) schema
- Mediator translates and decomposes the user query into multiple source queries
from [Levy2000]
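A minimal sketch of this translate-and-decompose step, assuming each source is described by a mapping from global attributes to its own field names. The `Source` class, the `mediate` function, and the sample data are hypothetical and only illustrate the idea; a real mediator would also handle selections, joins, and cost-based planning:

```python
from typing import Dict, List

Row = Dict[str, object]

class Source:
    """A data source described by a mapping from global attributes to local fields."""

    def __init__(self, name: str, mapping: Dict[str, str], rows: List[Row]):
        self.name = name
        self.mapping = mapping  # global attribute -> source field
        self.rows = rows

    def query(self, fields: List[str]) -> List[Row]:
        # Stand-in for a real source query: project the requested fields.
        return [{f: row[f] for f in fields} for row in self.rows]

def mediate(attributes: List[str], sources: List[Source]) -> List[Row]:
    """Answer a query over global attributes by querying each capable source."""
    answer: List[Row] = []
    for src in sources:
        if not all(a in src.mapping for a in attributes):
            continue  # this source cannot answer the query
        fields = [src.mapping[a] for a in attributes]  # translate to source schema
        for row in src.query(fields):
            # Translate each result back into the global schema.
            answer.append({a: row[src.mapping[a]] for a in attributes})
    return answer

# Invented sources with different local schemas.
db = Source("prices_db", {"area": "state", "price": "cents_per_gallon"},
            [{"state": "Maine", "cents_per_gallon": 101.5}])
web = Source("web_table", {"area": "region", "price": "value"},
             [{"region": "California", "value": 120.1}])

result = mediate(["area", "price"], [db, web])
```

The user only ever names `area` and `price`; the per-source field names stay hidden behind the mappings, which is what makes the sources transparent to the user.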

23 System Architecture
Construction phase: Deploy DBs, Extend ontol.
- Sources
- Domain modeling: DB analysis, text analysis
- Integrated ontology: global terminology, source descriptions, integration axioms
Access phase: Create DB query, Retrieve data
- Query processor: reformulation, cost optimization
User phase: Compose query
- User Interface: ontology browser, query constructor