CSE 636 Data Integration Overview

1 CSE 636 Data Integration Overview

2 Data Warehouse Architecture
[Diagram: Data Sources -> ETL Tools (Extract-Transform-Load) and Data Cleaning -> Relational Database (Warehouse) -> OLAP / Decision Support and Data Cubes / Data Mining -> Users and Applications]

3 Virtual Integration Architecture
Leave the data in the sources.
When a query comes in:
–Determine the sources relevant to the query
–Break the query down into sub-queries for the sources
–Get the answers from the sources, filter them if needed, and combine them appropriately
Data is fresh.
Otherwise known as On-Demand Integration.
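
The three steps above can be sketched in a few lines of Python. This is only an illustrative toy, not the course's mediator: the in-memory `SOURCES` dictionary, the function names, and the sample rows for "Big Sur" and source C are assumptions made for the example.

```python
from typing import Dict, List, Set

# Hypothetical in-memory sources; each one advertises the attributes it can supply.
SOURCES: Dict[str, dict] = {
    "A": {"attrs": {"Title", "Price"},
          "rows": [{"Title": "On the Road", "Price": 10.86},
                   {"Title": "Big Sur", "Price": 9.50}]},
    "B": {"attrs": {"Title", "Price"},
          "rows": [{"Title": "On the Road", "Price": 8.86}]},
    "C": {"attrs": {"Album", "Studio"},
          "rows": [{"Album": "Kind of Blue", "Studio": "Columbia"}]},
}


def relevant_sources(wanted: Set[str]) -> List[str]:
    """Step 1: keep only the sources that can supply every requested attribute."""
    return sorted(name for name, src in SOURCES.items() if wanted <= src["attrs"])


def sub_query(source: str, wanted: List[str], title: str) -> List[dict]:
    """Step 2: the sub-query sent to one source (filter on Title, then project)."""
    rows = SOURCES[source]["rows"]
    return [{a: r[a] for a in wanted} for r in rows
            if r["Title"].lower() == title.lower()]


def answer(wanted: List[str], title: str) -> List[dict]:
    """Step 3: collect the sub-query answers and combine them, dropping duplicates."""
    combined: List[dict] = []
    for name in relevant_sources(set(wanted)):
        for row in sub_query(name, wanted, title):
            if row not in combined:
                combined.append(row)
    return combined


result = answer(["Title", "Price"], "on the road")
```

Because the data is fetched at query time, a price change in source A or B is visible in the next answer without any reload step.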

4 Virtual Integration Architecture (Design-Time)
[Diagram: End Users and Applications work against a Global Schema; Schema Mappings relate the Global Schema to the Local Schema of each Data Source]
Sources can be:
–Relational DBs
–Excel Files
–Web Sites
–Web Services

5 Schema Mappings
Differences in:
–Names in schema
–Attribute grouping
–Coverage of databases
–Granularity and format of attributes
Inventory Database A
  BooksAndMusic(Title, Author, Publisher, ItemID, ItemType, SuggestedPrice, Categories, Keywords)
Inventory Database B
  Books(Title, ISBN, Price, DiscountPrice, Edition)
  Authors(ISBN, FirstName, LastName)
  BookCategories(ISBN, Category)
  CDs(Album, ASIN, Price, DiscountPrice, Studio)
  Artists(ASIN, ArtistName, GroupName)
  CDCategories(ASIN, Category)
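
To make the listed differences concrete, here is a hedged sketch of one possible mapping from Database B's `Books` and `Authors` tables into the shape of Database A's `BooksAndMusic`. The correspondences ISBN to ItemID and Price to SuggestedPrice follow the reformulation example later in the deck; the function name and the way author names are joined are assumptions.

```python
from typing import Dict, List


def books_to_books_and_music(book: Dict, authors: List[Dict]) -> Dict:
    """Map one Books row (plus its Authors rows) to a BooksAndMusic-style row."""
    # Attribute grouping: B splits the name into FirstName/LastName, A has one Author field.
    names = [f"{a['FirstName']} {a['LastName']}"
             for a in authors if a["ISBN"] == book["ISBN"]]
    return {
        "Title": book["Title"],
        "Author": ", ".join(names),
        "ItemID": book["ISBN"],           # naming difference: ISBN vs. ItemID
        "ItemType": "Books",              # coverage: A also holds music, B.Books only books
        "SuggestedPrice": book["Price"],  # naming difference: Price vs. SuggestedPrice
    }


book = {"Title": "On the Road", "ISBN": 123, "Price": 10.86,
        "DiscountPrice": 8.86, "Edition": "2nd"}
authors = [{"ISBN": 123, "FirstName": "Jack", "LastName": "Kerouac"}]
mapped = books_to_books_and_music(book, authors)
```

Attributes with no counterpart in the target (here `DiscountPrice` and `Edition`) are simply dropped by this mapping, which is one reason mappings are usually written per direction.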

6 Issues for Schema Mappings (Design-Time)
What formalisms to express them?
How to create them?
Can we discover them somehow?
How do we use them?
[Diagram: End Users and Applications, Global Schema, Schema Mappings, and the Local Schema of each Data Source, as on slide 4]

7 Virtual Integration Architecture (Run-Time)
[Diagram: a Query enters the Mediator, which works over the Global Schema and performs Reformulation, Optimization, and Execution; a Wrapper connects the Mediator to the Local Schema of each Data Source; a Result is returned]

8 Issues for Query Processing: Reformulation
User queries refer to the global schema.
Data is stored in the sources in a local schema.
Rewriting algorithms.
[Diagram: mediator architecture with the Reformulation step and the Query highlighted]

9 Issues for Query Processing: Reformulation
Global Schema
  Books(Title, ISBN, Price, DiscountPrice, Edition)
Local Schema A
  BooksAndMusic(Title, Author, Publisher, ItemID, ItemType, SuggestedPrice, Categories, Keywords)
Query on the global schema:
  SELECT ISBN, Price FROM Books WHERE Title = 'on the road'
Reformulated query on Local Schema A:
  SELECT ItemID, SuggestedPrice FROM BooksAndMusic WHERE Title = 'on the road' AND ItemType = 'Books'
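
The rewrite on this slide can be reproduced with a toy attribute map and Python's built-in `sqlite3` module. This is a minimal sketch and not a general rewriting algorithm: `ATTRIBUTE_MAP`, `reformulate`, and the extra "Music" row are assumptions added for illustration.

```python
import sqlite3

# Assumed correspondence between global Books attributes and BooksAndMusic columns.
ATTRIBUTE_MAP = {"Title": "Title", "ISBN": "ItemID", "Price": "SuggestedPrice"}


def reformulate(select_attrs):
    """Rewrite 'SELECT attrs FROM Books WHERE Title = ?' for Local Schema A."""
    cols = ", ".join(ATTRIBUTE_MAP[a] for a in select_attrs)
    # Source A also stores music, so the rewrite restricts ItemType to books.
    return f"SELECT {cols} FROM BooksAndMusic WHERE Title = ? AND ItemType = 'Books'"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BooksAndMusic "
             "(Title TEXT, ItemID INTEGER, ItemType TEXT, SuggestedPrice REAL)")
conn.executemany("INSERT INTO BooksAndMusic VALUES (?, ?, ?, ?)", [
    ("on the road", 123, "Books", 10.86),
    ("on the road", 999, "Music", 14.99),  # hypothetical row that must be filtered out
])

sql = reformulate(["ISBN", "Price"])
rows = conn.execute(sql, ("on the road",)).fetchall()
conn.close()
```

Without the added `ItemType = 'Books'` condition the music row would leak into the answer, which is why the rewrite is more than a renaming of attributes.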

10 Issues for Query Processing: Query Translation
Different query languages.
[Diagram: mediator architecture with the Wrapper and the Query highlighted]

11 Issues for Query Processing: Query Translation
Global Schema
  Books(Title, ISBN, Price, DiscountPrice, Edition)
Query on the global schema:
  SELECT ISBN, Price FROM Books WHERE Title = 'on the road'
Translated request for Local Source A:
  http://www.amazon.com/homepage.html?ItemType=Books&Title=on+the+road
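
A wrapper for a web source might build such a request with the standard library's `urllib.parse.urlencode`. The sketch below only constructs the URL shown on the slide; the function name is an assumption, and nothing here claims that the real site accepts these parameters.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.amazon.com/homepage.html"  # URL taken from the slide


def translate_to_url(title: str) -> str:
    """Translate 'SELECT ... FROM Books WHERE Title = <title>' into a web request."""
    # Querying the global Books relation becomes ItemType=Books at this source.
    params = {"ItemType": "Books", "Title": title}
    return f"{BASE_URL}?{urlencode(params)}"


url = translate_to_url("on the road")
```

`urlencode` encodes spaces as `+`, which matches the `Title=on+the+road` form used on the slide.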

12 Issues for Query Processing: Data Translation
Different data models.
[Diagram: mediator architecture with the Wrapper highlighted]

13 Issues for Query Processing: Data Translation
Local Result A:
  On the Road -- by Jack Kerouac; Paperback
  Buy new : $10.86
Global Schema
  Books(Title, ISBN, Price, DiscountPrice, Edition)
Translated result:
  Title        ISBN  Price  …  …
  On the Road  123   10.86  …  …
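
As a rough sketch of what the wrapper does here, the snippet below parses the text returned by the source into a tuple of the global `Books` schema using a regular expression. The pattern and function name are assumptions tailored to this one sample string. The ISBN does not appear in the sample text, so it is passed in separately rather than guessed.

```python
import re
from typing import Optional

# Pattern written for the single sample result on the slide; real pages vary.
RESULT_PATTERN = re.compile(
    r"^(?P<title>.+?) -- by (?P<author>.+?); (?P<binding>\w+) "
    r"Buy new : \$(?P<price>\d+\.\d{2})$"
)


def translate_result(text: str, isbn: Optional[int] = None) -> Optional[dict]:
    """Turn one line of source output into a partial Books tuple."""
    match = RESULT_PATTERN.match(text)
    if match is None:
        return None  # unrecognized layout: the wrapper cannot translate it
    return {
        "Title": match.group("title"),
        "ISBN": isbn,  # not present in the text, supplied by the caller
        "Price": float(match.group("price")),
    }


raw = "On the Road -- by Jack Kerouac; Paperback Buy new : $10.86"
books_tuple = translate_result(raw, isbn=123)
```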

14 Issues for Query Processing: Query Execution
Access as many data sources as needed.
Duplicate/redundant and irrelevant data.
Limited query capabilities.
[Diagram: mediator architecture with the Execution step highlighted]

15 Issues for Query Processing: Limited Query Capabilities
Global Schema
  Books(Title, ISBN, Price, DiscountPrice, Edition)
Local Schema A
  BooksAndMusic(Title, Author, ItemID, ItemType, SuggestedPrice)
  Supported query: SELECT ItemID, SuggestedPrice FROM BooksAndMusic WHERE Title = ?
Local Schema B
  DiscountBooks(Title, Edition, ISBN, GreatPrice)
  Supported query: SELECT GreatPrice FROM DiscountBooks WHERE ISBN = ?
Query on the global schema:
  SELECT ISBN, Price, DiscountPrice FROM Books WHERE Title = 'on the road'
Execution:
  Query sent to A: SELECT ItemID, SuggestedPrice FROM BooksAndMusic WHERE Title = 'on the road'
  Result from A:   ItemID = 123, SuggestedPrice = 10.86
  Query sent to B: SELECT GreatPrice FROM DiscountBooks WHERE ISBN = 123
  Result from B:   GreatPrice = 8.86
  Final answer:    ISBN = 123, Price = 10.86, DiscountPrice = 8.86
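
The execution order on this slide amounts to feeding values returned by one source into the parameter of the next. A minimal sketch, assuming each source is a Python function that accepts only its one supported parameter (the function names and in-memory tables are illustrative):

```python
from typing import Dict, List, Tuple


def source_a(title: str) -> List[Tuple[int, float]]:
    """Source A only answers: SELECT ItemID, SuggestedPrice ... WHERE Title = ?"""
    table = [{"Title": "on the road", "ItemID": 123, "SuggestedPrice": 10.86}]
    return [(r["ItemID"], r["SuggestedPrice"]) for r in table if r["Title"] == title]


def source_b(isbn: int) -> List[float]:
    """Source B only answers: SELECT GreatPrice ... WHERE ISBN = ?"""
    table = [{"ISBN": 123, "GreatPrice": 8.86}]
    return [r["GreatPrice"] for r in table if r["ISBN"] == isbn]


def answer(title: str) -> List[Dict]:
    """Answer SELECT ISBN, Price, DiscountPrice FROM Books WHERE Title = <title>."""
    results = []
    # A must be called first: B cannot be queried by Title, only by ISBN.
    for item_id, price in source_a(title):
        # Each ItemID returned by A is bound to B's ISBN parameter.
        for great_price in source_b(item_id):
            results.append({"ISBN": item_id, "Price": price,
                            "DiscountPrice": great_price})
    return results


final = answer("on the road")
```

The order of the calls is forced by the access restrictions: calling B before A is impossible here because no ISBN is known until A has answered.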

16 Issues for Query Processing: Query Answering
Combine the results and further process them if needed.
Mainly union and merge.
Inconsistencies.
[Diagram: mediator architecture with the Result highlighted]

17 Issues for Query Processing: Query Answering (Union)
Result from one source:
  ItemID  SuggestedPrice
  123     10.86
Result from another source:
  ISBN  GreatPrice
  456   8.86
Union:
  ISBN  Price
  123   10.86
  456   8.86
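
A small sketch of this union step, assuming the source results arrive as lists of dictionaries and that ItemID and SuggestedPrice correspond to ISBN and Price as in the earlier slides:

```python
from typing import Dict, List


def union(a_rows: List[Dict], b_rows: List[Dict]) -> List[Dict]:
    """Rename both results to the global attributes, then union them."""
    unified = [{"ISBN": r["ItemID"], "Price": r["SuggestedPrice"]} for r in a_rows]
    unified += [{"ISBN": r["ISBN"], "Price": r["GreatPrice"]} for r in b_rows]
    # Set semantics: drop exact duplicates while keeping the original order.
    seen, result = set(), []
    for row in unified:
        key = (row["ISBN"], row["Price"])
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


combined = union([{"ItemID": 123, "SuggestedPrice": 10.86}],
                 [{"ISBN": 456, "GreatPrice": 8.86}])
```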

18 Issues for Query Processing: Query Answering (Merge)
Result from one source (primary key: ItemID):
  ItemID  Title
  123     On the Road
Result from another source (primary key: ISBN):
  ISBN  Edition  Price
  123   2nd      8.86
Merged result (primary key: ISBN):
  ISBN  Title        Edition  Price
  123   On the Road  2nd      8.86
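
The merge on this slide is a join on the primary key. A minimal sketch, assuming ItemID and ISBN identify the same book and that results arrive as lists of dictionaries:

```python
from typing import Dict, List


def merge(a_rows: List[Dict], b_rows: List[Dict]) -> List[Dict]:
    """Join the two results on their primary keys (ItemID = ISBN)."""
    b_by_isbn = {r["ISBN"]: r for r in b_rows}  # index one side by its key
    merged = []
    for a in a_rows:
        b = b_by_isbn.get(a["ItemID"])
        if b is not None:  # only books known to both sources are merged
            merged.append({"ISBN": b["ISBN"], "Title": a["Title"],
                           "Edition": b["Edition"], "Price": b["Price"]})
    return merged


merged_rows = merge([{"ItemID": 123, "Title": "On the Road"}],
                    [{"ISBN": 123, "Edition": "2nd", "Price": 8.86}])
```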

19 Issues for Query Processing: Query Answering (Inconsistencies)
Result from one source (primary key: ItemID):
  ItemID  Title        Edition
  123     On the Road  1st
Result from another source (primary key: ISBN):
  ISBN  Edition  Price
  123   2nd      8.86
Merged result (primary key: ISBN):
  ISBN  Title        Edition  Price
  123   On the Road  ???      8.86
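
One way to surface the "???" on this slide is to have the merge step report conflicting values instead of silently picking one. The sketch below is an assumption about how that could look, not a resolution policy from the course: conflicting attributes are set to `None` and returned separately.

```python
from typing import Dict, Tuple


def merge_with_conflicts(a: Dict, b: Dict,
                         shared: Tuple[str, ...] = ("Edition",)) -> Tuple[Dict, Dict]:
    """Merge two rows about the same book and report disagreeing attributes."""
    merged = {"ISBN": b["ISBN"], "Title": a["Title"], "Price": b["Price"]}
    conflicts = {}
    for attr in shared:
        if a[attr] == b[attr]:
            merged[attr] = a[attr]
        else:
            merged[attr] = None                  # the '???' cell on the slide
            conflicts[attr] = (a[attr], b[attr])
    return merged, conflicts


row, conflicts = merge_with_conflicts(
    {"ItemID": 123, "Title": "On the Road", "Edition": "1st"},
    {"ISBN": 123, "Edition": "2nd", "Price": 8.86},
)
```

Returning the conflicting pair leaves the choice of policy (trust one source, ask the user, keep both) to the caller.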

20 Community-Based Integration
[Diagram: four domains. Source Domain: Source Owners, Data Sources with a Source Schema, a New Source. Community Domain: Community Owner, Mediator, Community Schema. Web Domain: Web Services. Application Domain: End Users, Developers, Applications, a New Application, Web Forms & Reports.]
Fairly dynamic environment: new sources register over time and new application queries are formulated.
Allow source owners to easily and independently register their source.
Allow developers to easily build applications based on the community schema.
So that each other's needs are accommodated.

21 Peer-Based Integration
[Diagram: a network of five peers (Peer 1 to Peer 5); a Query is posed at one of the peers]

22 Peer-Based Integration
No need for a central mediated schema.
Peers serve as mediators for other peers.
A peer can be both a server and a client.
Semantic relationships are specified locally (between small sets of peers).
Queries are posed using the peer's schema.
Answers come from anywhere in the system.
This is not P2P file sharing.
–Data has rich semantics
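
A toy sketch of these ideas, under strong simplifying assumptions: each peer holds rows in its own schema, mappings are plain attribute renamings declared only between neighboring peers, and a query is just a list of attributes to project. The `Peer` class and the data for the third peer are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple


class Peer:
    def __init__(self, name: str, rows: List[Dict]):
        self.name = name
        self.rows = rows  # local data, in this peer's own schema
        self.neighbors: List[Tuple["Peer", Dict[str, str]]] = []

    def link(self, other: "Peer", mapping: Dict[str, str]) -> None:
        """Declare a local mapping: my attribute name -> the neighbor's name."""
        self.neighbors.append((other, mapping))

    def query(self, attrs: List[str], visited: Optional[Set[str]] = None) -> List[tuple]:
        """Answer locally, then forward the translated query to neighbors."""
        visited = visited if visited is not None else set()
        if self.name in visited:  # mappings may form cycles; visit each peer once
            return []
        visited.add(self.name)
        answers = [tuple(r[a] for a in attrs)
                   for r in self.rows if all(a in r for a in attrs)]
        for peer, mapping in self.neighbors:
            if all(a in mapping for a in attrs):
                # Act as a mediator: rewrite the query into the neighbor's schema.
                answers += peer.query([mapping[a] for a in attrs], visited)
        return answers


p1 = Peer("Peer 1", [{"ISBN": 123, "Price": 10.86}])
p2 = Peer("Peer 2", [{"ItemID": 456, "SuggestedPrice": 8.86}])
p3 = Peer("Peer 3", [{"Code": 789, "Cost": 5.0}])
p1.link(p2, {"ISBN": "ItemID", "Price": "SuggestedPrice"})
p2.link(p3, {"ItemID": "Code", "SuggestedPrice": "Cost"})
p3.link(p1, {"Code": "ISBN", "Cost": "Price"})  # closes a cycle on purpose

answers = p1.query(["ISBN", "Price"])
```

Peer 1 never sees a mapping to Peer 3, yet Peer 3's data reaches it through Peer 2, which illustrates answers coming from anywhere in the system with no central schema.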

23 References
Information Integration
–Maurizio Lenzerini
–Eighteenth International Joint Conference on Artificial Intelligence, IJCAI 2003
–Invited Tutorial
Data Integration: a Status Report
–Alon Halevy
–German Database Conference (BTW), 2003
–Invited Talk

