Data integration Chitta Baral Arizona State University.

Example 1
Data Source 1
– List of course#s with the title 'Database Systems' taught anywhere, with their instructors and university names.
– View: R1(prof, course#, university)
Data Source 2
– List of Ph.D.-level courses taught at ASU, with professor names and course#s.
– View: R2(title, prof, course#)
Query: List the course#s of courses taught at ASU and the names of the professors who teach them.
Partial answer: obtained by using the following query
  SELECT course#, prof FROM R1 WHERE university = 'ASU'
  UNION
  SELECT course#, prof FROM R2
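
The partial-answer query above can be tried on toy data. The following is a minimal sketch using Python's built-in sqlite3 module; the rows are invented for illustration, and the column is named course_no (an assumption, chosen because `#` is awkward in SQLite identifiers).

```python
import sqlite3

# Hypothetical sample data; only the schemas R1 and R2 come from the slide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE R1 (prof TEXT, course_no INTEGER, university TEXT);
CREATE TABLE R2 (title TEXT, prof TEXT, course_no INTEGER);

-- Source 1: 'Database Systems' courses taught anywhere
INSERT INTO R1 VALUES ('Smith', 510, 'ASU'), ('Jones', 444, 'MIT');

-- Source 2: Ph.D.-level courses taught at ASU
INSERT INTO R2 VALUES ('Data Integration', 'Lee', 591),
                      ('Database Systems', 'Smith', 510);
""")

# The partial-answer query from the slide; UNION removes the duplicate row
# that both sources report for (510, 'Smith').
rows = conn.execute("""
SELECT course_no, prof FROM R1 WHERE university = 'ASU'
UNION
SELECT course_no, prof FROM R2
ORDER BY course_no
""").fetchall()

print(rows)
```

The answer is only partial because an ASU course that is neither Ph.D.-level nor titled 'Database Systems' appears in neither source.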

General question
Given: Several sources and a query.
Problem: How do we best answer this query using the sources that are available?
First step: Need to model the sources; need to have a global picture.
Two basic approaches: Global as View (GaV) and Local as View (LaV).
LaV for the last example:
– Global schema: Teaches(prof, course#, title, semester, university)
– CREATE VIEW R1 AS SELECT prof, course#, university FROM Teaches WHERE title = 'Database Systems'
– CREATE VIEW R2 AS SELECT title, prof, course# FROM Teaches WHERE university = 'ASU' AND course# >= 500 (the condition course# >= 500 encodes "Ph.D. level")
– Now, given a query (in English), we need to express it in terms of the global schema and then reformulate it (to the extent possible) in terms of the sources (R1 and R2 here). To do that, use the relation between R1 (and R2) and the global schema.
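
To make the LaV definitions concrete, here is a small sketch in Python with sqlite3. It materializes Teaches purely to simulate the setting; in a real LaV system the global relation is virtual and only R1 and R2 can be queried. The rows and the column name course_no are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Stand-in for the (normally virtual) global relation; rows are hypothetical.
CREATE TABLE Teaches (prof TEXT, course_no INTEGER, title TEXT,
                      semester TEXT, university TEXT);
INSERT INTO Teaches VALUES
  ('Smith', 510, 'Database Systems', 'Fall',   'ASU'),
  ('Lee',   591, 'Data Integration', 'Spring', 'ASU'),
  ('Patel', 310, 'Algorithms',       'Fall',   'ASU'),
  ('Jones', 444, 'Database Systems', 'Fall',   'MIT');

-- LaV: each source is described as a view over the global schema.
CREATE VIEW R1 AS
  SELECT prof, course_no, university FROM Teaches
  WHERE title = 'Database Systems';
CREATE VIEW R2 AS
  SELECT title, prof, course_no FROM Teaches
  WHERE university = 'ASU' AND course_no >= 500;
""")

# The query expressed over the global schema (the ideal answer).
global_answer = set(conn.execute(
    "SELECT course_no, prof FROM Teaches WHERE university = 'ASU'"))

# The same query reformulated over the sources only.
source_answer = set(conn.execute("""
    SELECT course_no, prof FROM R1 WHERE university = 'ASU'
    UNION
    SELECT course_no, prof FROM R2"""))

# Tuples the sources cannot supply.
missed = global_answer - source_answer
print(sorted(source_answer), sorted(missed))
```

With this data the reformulated query returns a subset of the ideal answer; the undergraduate Algorithms course is missed, which is what "to the extent possible" means here.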

A Global as View (GaV) example
3 movie sources
– S1(title, dir, year, genre) from until
– S2(title, dir) since 1970
– S3(title, year, genre) all movies
A global view: S1 union (S2 join S3)
– SELECT * FROM S1 UNION SELECT S2.title, S2.dir, S3.year, S3.genre FROM S2, S3 WHERE S2.title = S3.title
Another global view: the union of S1, (S2 join S3), and 4-tuples built from tuples in S2 and S3 whose title does not appear in the other source, padded with null values.
– If we have S2(xyz, uvw) and xyz is not a title that appears in S3, then we assume (xyz, uvw, null, null) is part of the global view.
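
Both global views can be written as SQL views over the sources, as in the sketch below (Python with sqlite3). The view names GlobalV1 and GlobalV2 and all movie rows are hypothetical; the second view pads unmatched tuples with NULL using a LEFT JOIN plus a NOT IN branch, which is one of several ways to express it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE S1 (title TEXT, dir TEXT, year INTEGER, genre TEXT);
CREATE TABLE S2 (title TEXT, dir TEXT);
CREATE TABLE S3 (title TEXT, year INTEGER, genre TEXT);

-- Hypothetical rows: 'Movie C' is only in S2, 'Movie D' only in S3.
INSERT INTO S1 VALUES ('Movie A', 'Dir A', 1965, 'drama');
INSERT INTO S2 VALUES ('Movie B', 'Dir B'), ('Movie C', 'Dir C');
INSERT INTO S3 VALUES ('Movie B', 1980, 'comedy'), ('Movie D', 1990, 'horror');

-- First global view: S1 union (S2 join S3).
CREATE VIEW GlobalV1 AS
  SELECT title, dir, year, genre FROM S1
  UNION
  SELECT S2.title, S2.dir, S3.year, S3.genre
  FROM S2, S3 WHERE S2.title = S3.title;

-- Second global view: also keep unmatched S2 and S3 tuples, padded with NULL.
CREATE VIEW GlobalV2 AS
  SELECT title, dir, year, genre FROM S1
  UNION
  SELECT S2.title, S2.dir, S3.year, S3.genre
  FROM S2 LEFT JOIN S3 ON S2.title = S3.title
  UNION
  SELECT S3.title, NULL, S3.year, S3.genre
  FROM S3 WHERE S3.title NOT IN (SELECT title FROM S2);
""")

v1 = conn.execute("SELECT * FROM GlobalV1 ORDER BY title").fetchall()
v2 = conn.execute("SELECT * FROM GlobalV2 ORDER BY title").fetchall()

# A query posed on the global view; the engine unfolds the view definition
# into a query over S1, S2 and S3.
directors = conn.execute(
    "SELECT title, dir FROM GlobalV2 WHERE year >= 1980 ORDER BY title"
).fetchall()
print(v1, v2, directors, sep="\n")
```

The first view drops 'Movie C' and 'Movie D' because neither has a join partner; the second keeps them with NULL in the missing columns.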

LaV vs GaV
Given a query, reformulating it in terms of the sources
– Is easier in GaV (just needs unfolding of the query)
– Is harder in LaV
Adding a new source
– Supposedly easier in LaV (just need to express the new source as a view of the global schema)
– Harder in GaV (as the global schema needs to be revised)

Steps for Projects of type 2
Given: Some NIH/NCBI/other data sources
Goal: Virtual integration of these sources
First step: Explore each data source to figure out the 'view' of each source.
Second step: Come up with a global schema (don't think too much about the sources; or keep the global schema general enough that, if GaV is used, adding new sources does not change the global schema).
Third step (GaV-based approach):
– Define the global schema in terms of the source views.
– Now any global query can be unfolded into a query in terms of the source views (this can be done in real time).
Alternative third step (LaV approach):
– Define each source in terms of the global schema.
– Now any global query needs to be 'reformulated' in terms of the source views.
– Several 'reformulation' techniques are available.
– For the LaV approach, a particular set of queries can be considered a priori and their reformulations prepared beforehand (rather than in real time).
– Havasu and Biohavasu follow this approach.

HW 2 Review: Sample answers
Correct understanding about the queries
List all human genes, the names of their discoverers, and the project through which each was discovered
– In this I tried to find the human genes that are completely discovered.
– This was done by querying the genome database for the human genome.
– But the links field did not give links to the PubMed articles relevant to that specific gene.
– Had to type in another query with the specific gene information for PubMed.
Misunderstanding about what queries are
– How are the DNA probes identified in a DNA chip?
– What are the four nucleotide bases in DNA?