CSE 636 Data Integration: Introduction


1 CSE 636 Data Integration Introduction

2 Staff
Instructor: Dr. Michalis Petropoulos
Email: mpetropo@cse.buffalo.edu
Location: 210 Bell Hall
Office Hours: Wednesday & Friday 1:00-2:00pm & By Appointment
Web Page: http://www.cse.buffalo.edu/~mpetropo/CSE636-FA08/
Newsgroup: sunyab.cse.636

3 Course Goals
Data integration applications and architectures
Issues in building such applications
–Really big and currently active research area
Solutions to several of them
Provide foundation for
–understanding current research problems
–criticizing proposed solutions
–proposing your own solution!
Acquire valuable experience by implementing the project

4 Prerequisites
An introductory database course
–CSE 520, CSE 562 or equivalent
Data structures and algorithms
Knowledge Representation
Distributed systems
Complexity theory
Mathematical Logic
Curiosity!
–You should ask a lot of questions
Have a lot of fun!

5 Relevant Material
Textbooks
Database Systems: The Complete Book
–by Garcia-Molina, Ullman and Widom
Database Management Systems
–by Ramakrishnan
Fundamentals of Database Systems
–by Elmasri and Navathe
Foundations of Databases
–by Abiteboul, Hull and Vianu
Data on the Web
–by Abiteboul, Buneman and Suciu

6 Course Format
Assignments: 15%
–Three assignments will be given, 5% each
Final: 20% (take home)
Projects: 60%
–Detailed specs will be given
–Can be used to satisfy the M.S. project requirement
Participation: 5%

7 What is Data Integration?
The problem of providing
–uniform (sources are transparent to users)
–access to (query)
–multiple (even 2 is a problem)
–autonomous (integration must not affect the behavior of the sources)
–heterogeneous (different data models, schemas)
–structured (at least semistructured)
–data sources (not only databases)

8 The Data Integration Problem
[Figure: MyBookstore.com exposes a mediated schema (Books, Inventory, Orders, Shipping, Reviews) over databases (DB), sites and web services (WS) reached through the intranet and the Internet. Source labels in the figure: Morgan Kaufmann, Addison Wesley, Prentice Hall; East, West; Orders; FedEx, UPS; Customer Reviews; NY Times.]
Uniform query capability across autonomous, heterogeneous data sources on the Internet
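
The mediated-schema idea on this slide can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the slides: the Book relation, the two wrapper functions, the Mediator class and the sample rows are all hypothetical. Each wrapper translates one source's native format into the mediated schema, and the mediator answers a query by asking every wrapper, so the user never sees the individual sources.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Book:
    """One row of a hypothetical mediated-schema relation Books."""
    isbn: str
    title: str
    publisher: str


def wrap_publisher_a() -> Iterable[Book]:
    # Hypothetical source A: dictionaries with its own field names.
    rows = [{"ISBN": "111", "name": "Widgets on the Web"}]
    return [Book(r["ISBN"], r["name"], "Publisher A") for r in rows]


def wrap_publisher_b() -> Iterable[Book]:
    # Hypothetical source B: tuples in (title, isbn) order.
    rows = [("Gadget Foundations", "222")]
    return [Book(isbn, title, "Publisher B") for title, isbn in rows]


class Mediator:
    """Answers queries over the mediated schema by consulting every wrapper."""

    def __init__(self, wrappers: Iterable[Callable[[], Iterable[Book]]]):
        self._wrappers = list(wrappers)

    def query(self, predicate: Callable[[Book], bool]) -> List[Book]:
        results: List[Book] = []
        for wrapper in self._wrappers:
            # Data is fetched at query time; the sources stay autonomous.
            results.extend(b for b in wrapper() if predicate(b))
        return results


mediator = Mediator([wrap_publisher_a, wrap_publisher_b])
web_books = mediator.query(lambda b: "Web" in b.title)
print(web_books)
```

A real mediator would rewrite the query for each source instead of filtering after the fact; the sketch only shows the uniform interface.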

9 Motivation
Enterprise data integration
–Web site construction
WWW
–Comparison shopping
–Portals integrating data from multiple sources
–B2B, electronic marketplaces
Sciences
–Geology: integrating geological data across the US continent (text as well as spatial data)
–Biology: integrating genomic data

10 Current Solutions
Mostly ad-hoc programming
–Create a special solution for every case
–Pay consultants a lot of money
Data Warehousing (Data Exchange)
–Load all the data periodically into a warehouse
–Separates operational DBMS from decision support DBMS (not only a solution to data integration)
–Performance is good
–Data may not be fresh
–Need to clean data
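
The warehousing trade-off listed above (good performance, possibly stale data, a cleaning step) can be seen in a small sketch. The Warehouse class, the "orders" source and the sample rows are hypothetical, chosen only for illustration. Queries read a local copy that is refreshed only when load() runs, so a row added to the operational source stays invisible until the next load.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


class Warehouse:
    """Keeps a cleaned local copy of all sources, refreshed by load()."""

    def __init__(self, sources: Dict[str, Callable[[], List[Row]]]):
        self._sources = sources
        self._store: List[Row] = []

    def load(self) -> None:
        # Periodic step: re-extract every source, clean it, replace the copy.
        fresh: List[Row] = []
        for name, extract in self._sources.items():
            for row in extract():
                cleaned = row["title"].strip().title()  # simple cleaning rule
                fresh.append({"source": name, "title": cleaned})
        self._store = fresh

    def query(self, predicate: Callable[[Row], bool]) -> List[Row]:
        # Answered from the local copy only; the sources are not contacted.
        return [r for r in self._store if predicate(r)]


operational = [{"title": "  first order "}]
warehouse = Warehouse({"orders": lambda: list(operational)})

warehouse.load()
operational.append({"title": "second order"})  # source changes after the load
stale = warehouse.query(lambda r: True)        # still only the first row

warehouse.load()                               # next periodic load
fresh = warehouse.query(lambda r: True)        # now both rows
print(len(stale), len(fresh))
```

The mediator approach makes the opposite choice: it fetches from the sources at query time, trading speed for freshness.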

11 Course Outline (Tentative)
Data Integration Scenarios & Architectures
–Find out what the problems are
Data Models & Type Systems
–XML/Semistructured Data, DTDs, XML Schema
Query & Transformation Languages
–Datalog, XPath, XQuery, XSLT
Data Integration Approaches
–Different approaches depending on application characteristics
Schema Integration
–Schema Mapping/Matching
–Semi-automate the discovery of schema mappings

12 Course Outline (cont)
Distributed Query Processing Algorithms
Query Rewriting Algorithms
Limited Query Capabilities
–We don’t have full access to any database
Consistent Query Answers
Web Services
–What can they do for data integration?
Semantic Web
–RDF & SPARQL
Workflow Languages
–How is this related to data integration?

13 References
Data Integration: a Status Report
–Alon Halevy
–German Database Conference (BTW), 2003
–Invited Talk
Lecture Slides
–Alon Halevy
–http://www.cs.washington.edu/education/courses/cse544/00sp/lectures/ps/l12.ps

