DATA-DRIVEN UNDERSTANDING AND REFINEMENT OF SCHEMA MAPPINGS Data Integration and Service Computing ITCS 6010.

INTRODUCTION
– Data volumes are increasing, and data must be integrated from multiple sources.
– This requires mappings between the source and target schemas.
– Some issues still need to be addressed, especially for the user:
  – Finding the correct mappings for an application is difficult.
  – Schema mappings are complex, and their subtleties are hard to communicate effectively.
  – Understanding the source data is difficult, so a facility for schema and data exploration is needed.
  – Mappings are complex, and alternative mappings may differ only subtly.
  – The user must reason about complex, non-associative operators.
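To make the point about non-associative operators concrete, here is a minimal Python sketch. Relations are lists of dicts with None standing for null; the relations A, B, C and all helper functions are hypothetical, not taken from the paper. It shows that regrouping a left outer join with an inner join can change the result, which is one reason mappings built from such operators are hard to reason about.

```python
# Hypothetical mini example: relations are lists of dicts; None stands for null.

def sql_eq(x, y):
    """SQL-style equality: any comparison involving null is not true."""
    return x is not None and y is not None and x == y


def inner_join(left, right, pred):
    return [{**lrow, **rrow} for lrow in left for rrow in right if pred(lrow, rrow)]


def left_outer_join(left, right, pred, right_attrs):
    """Keep every left tuple; pad it with nulls when no right tuple matches."""
    out = []
    for lrow in left:
        matches = [{**lrow, **rrow} for rrow in right if pred(lrow, rrow)]
        out.extend(matches or [{**lrow, **{attr: None for attr in right_attrs}}])
    return out


A = [{"a": 1}]
B = []              # no B tuple matches A's tuple
C = [{"c": 1}]


def p_ab(lrow, rrow):
    return sql_eq(lrow["a"], rrow["b"])


def p_bc(lrow, rrow):
    return sql_eq(lrow["b"], rrow["c"])


# (A LEFT JOIN B) JOIN C: the null-padded b cannot satisfy p_bc, so A's tuple is lost.
q1 = inner_join(left_outer_join(A, B, p_ab, ["b"]), C, p_bc)
# A LEFT JOIN (B JOIN C): A's tuple is preserved, padded with nulls.
q2 = left_outer_join(A, inner_join(B, C, p_bc), p_ab, ["b", "c"])

print(q1)  # []
print(q2)  # [{'a': 1, 'b': None, 'c': None}]
```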

ILLUSTRATIONS
“The ultimate goal of schema mapping is not to build correct queries but to extract the correct data from the source to populate the target schema.”
– The user is expected to have a thorough understanding of the data.
– The user must debug complex SQL queries or procedural transformations.
– Clio makes this easier.

ILLUSTRATIONS
Source: Ling Ling Yan, Renée J. Miller, Laura M. Haas, and Ronald Fagin. Data-driven understanding and refinement of schema mappings. SIGMOD Rec. 30, 2 (May 2001), DOI= /

MAPPINGS
A mapping is a query on the source schema that produces a subset of the target relation. Creating a mapping involves three main activities:
– determining correspondences;
– data linking;
– data trimming.

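Let 𝒜 be a set of attributes, where each attribute A ∈ 𝒜 has a domain dom(A). A relation on schema S is a named, finite set of tuples on S. For a tuple t and an attribute A, t[A] ∈ dom(A) is the value of t on A.
Assumption: relations in the source database do not contain any tuple that is null on any attribute.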

A predicate P over schema S maps tuples on S to true or false. Two kinds of predicates are used:
– join predicates;
– selection predicates.
A predicate is strong if it evaluates to false for every tuple that is null on all attributes in S. Join predicates are strong; selection predicates are not required to be strong.
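A minimal Python sketch of these definitions, assuming tuples are dicts keyed by attribute name with None as null. It reads the slide's definition literally: the only tuple on S that is null on all attributes of S is the all-null tuple, so a predicate is strong exactly when it is false on that tuple. The attribute names (Kids.MotherID, Parents.ID, Parents.Salary) and the predicates are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable

Row = Dict[str, Any]                 # a tuple t on schema S: t[A] is row[A]; None is null
Predicate = Callable[[Row], bool]


def null_tuple(schema: Iterable[str]) -> Row:
    """The tuple that is null on every attribute of the schema."""
    return {attr: None for attr in schema}


def is_strong(pred: Predicate, schema: Iterable[str]) -> bool:
    """Strong: the predicate evaluates to false on the all-null tuple over its schema."""
    return not pred(null_tuple(schema))


def sql_eq(x: Any, y: Any) -> bool:
    # A comparison with null is unknown, which a WHERE or ON clause treats as false.
    return x is not None and y is not None and x == y


def join_pred(t: Row) -> bool:       # join predicate over {Kids.MotherID, Parents.ID}
    return sql_eq(t["Kids.MotherID"], t["Parents.ID"])


def sel_gt(t: Row) -> bool:          # selection predicate: Parents.Salary > 50000
    return t["Parents.Salary"] is not None and t["Parents.Salary"] > 50000


def sel_is_null(t: Row) -> bool:     # selection predicate: Parents.Salary IS NULL
    return t["Parents.Salary"] is None


print(is_strong(join_pred, ["Kids.MotherID", "Parents.ID"]))  # True
print(is_strong(sel_gt, ["Parents.Salary"]))                  # True
print(is_strong(sel_is_null, ["Parents.Salary"]))             # False
```

The last case shows why selection predicates are not required to be strong: an IS NULL test is true on the all-null tuple.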

Correspondence to Target
A correspondence specifies which target attribute is populated and how its value should appear in the target relation, e.g.: Kids.FamilyIncome = parents.salary + parents2.salary (ref
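A value correspondence can be sketched as a target attribute paired with a function over one joined source row. In this Python sketch, parents2 is assumed to be a second occurrence (alias) of the parents relation. The Kids.Name correspondence, the helper names, and the data are hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Row = Dict[str, Any]
# A value correspondence: (target attribute, function computing it from a joined source row).
Correspondence = Tuple[str, Callable[[Row], Any]]


def sql_add(x: Optional[float], y: Optional[float]) -> Optional[float]:
    """Addition with SQL null semantics: null if either operand is null."""
    return None if x is None or y is None else x + y


def apply_correspondences(row: Row, correspondences: List[Correspondence]) -> Row:
    """Build one target tuple by evaluating every correspondence on the source row."""
    return {target: fn(row) for target, fn in correspondences}


correspondences: List[Correspondence] = [
    ("Kids.FamilyIncome", lambda r: sql_add(r["parents.salary"], r["parents2.salary"])),
    ("Kids.Name", lambda r: r["kids.name"]),
]

# One joined source row: a kid together with two parent tuples (hypothetical data).
joined = {"kids.name": "Maya", "parents.salary": 60000, "parents2.salary": 45000}
print(apply_correspondences(joined, correspondences))
# {'Kids.FamilyIncome': 105000, 'Kids.Name': 'Maya'}
```

Using SQL-style addition means a kid with only one parent tuple gets a null FamilyIncome rather than an error, which is exactly the kind of subtle outcome the user needs to see in examples.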

DATA LINKING

DATA TRIMMING
– Not all tuples produced by the query graph G are semantically meaningful.
– Data associations in some categories may be too incomplete to include.
– The user decides to exclude some categories because of their incomplete coverage.
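One way to picture categories, assumed here for illustration: label each association produced from the query graph G by the set of nodes that actually contributed a tuple (the others being null-padded, as with outer joins), show how many associations fall into each category, and drop the categories the user excludes. The node names, data, and helper functions in this Python sketch are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, FrozenSet, List, Optional, Set

# One data association from the query graph: node name -> source tuple, or None if absent.
Association = Dict[str, Optional[Dict[str, Any]]]


def category(assoc: Association) -> FrozenSet[str]:
    """The set of query-graph nodes that actually contributed a tuple."""
    return frozenset(node for node, tup in assoc.items() if tup is not None)


def group_by_category(assocs: List[Association]) -> Dict[FrozenSet[str], List[Association]]:
    groups: Dict[FrozenSet[str], List[Association]] = defaultdict(list)
    for assoc in assocs:
        groups[category(assoc)].append(assoc)
    return dict(groups)


def trim(assocs: List[Association], excluded: Set[FrozenSet[str]]) -> List[Association]:
    """Drop every association whose category the user chose to exclude."""
    return [a for a in assocs if category(a) not in excluded]


assocs: List[Association] = [
    {"Kids": {"name": "Maya"}, "Parents": {"salary": 60000}, "Parents2": {"salary": 45000}},
    {"Kids": {"name": "Leo"}, "Parents": {"salary": 52000}, "Parents2": None},
    {"Kids": None, "Parents": {"salary": 70000}, "Parents2": None},
]

for cat, members in group_by_category(assocs).items():
    print(sorted(cat), len(members))

# The user excludes parents with no associated kid as too incomplete.
kept = trim(assocs, excluded={frozenset({"Parents"})})
print(len(kept))  # 2
```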

MAPPING DEFINITION

A mapping defines the relationship between a target relation and a set of source relations. It has three main components:
– a query graph G;
– a set V of value correspondences;
– two sets of filters, C_S and C_T, defining the conditions that the source and target data, respectively, must satisfy.
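The three components can be sketched as a Python dataclass. This is a toy evaluator under stated assumptions, not Clio's: it combines the query-graph nodes with a nested-loop cross product, keeps combinations that satisfy every edge (join) predicate and every source filter in C_S, computes the target tuple from V, and keeps it only if it passes every target filter in C_T. Inner-join semantics, the field names, and the data are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Assoc = Dict[str, Row]  # query-graph node name -> tuple drawn from that node's relation


@dataclass
class Mapping:
    nodes: Dict[str, str]                           # G's nodes: node name -> source relation
    edges: List[Callable[[Assoc], bool]]            # G's edges, as join predicates
    values: Dict[str, Callable[[Assoc], Any]]       # V: target attribute -> value expression
    source_filters: List[Callable[[Assoc], bool]] = field(default_factory=list)  # C_S
    target_filters: List[Callable[[Row], bool]] = field(default_factory=list)    # C_T

    def evaluate(self, db: Dict[str, List[Row]]) -> List[Row]:
        names = list(self.nodes)
        result: List[Row] = []
        # Naive nested-loop evaluation: every combination of one tuple per node.
        for combo in product(*(db[self.nodes[n]] for n in names)):
            assoc = dict(zip(names, combo))
            if not all(p(assoc) for p in self.edges + self.source_filters):
                continue
            target = {attr: f(assoc) for attr, f in self.values.items()}
            if all(c(target) for c in self.target_filters):
                result.append(target)
        return result


db = {
    "Kids": [{"name": "Maya", "mid": 1, "fid": 2}],
    "Parents": [{"id": 1, "salary": 60000}, {"id": 2, "salary": 45000}],
}
m = Mapping(
    nodes={"k": "Kids", "p": "Parents", "p2": "Parents"},   # Parents appears twice
    edges=[lambda a: a["k"]["mid"] == a["p"]["id"],
           lambda a: a["k"]["fid"] == a["p2"]["id"]],
    values={"Name": lambda a: a["k"]["name"],
            "FamilyIncome": lambda a: a["p"]["salary"] + a["p2"]["salary"]},
    target_filters=[lambda t: t["FamilyIncome"] > 0],
)
print(m.evaluate(db))  # [{'Name': 'Maya', 'FamilyIncome': 105000}]
```

The two nodes p and p2 over the same relation show that one source relation can appear more than once in a query graph.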

MAPPING EXAMPLES
– A positive example shows how source tuples are combined and successfully contribute to the target relation.
– A negative example shows source tuples that are combined correctly but fail to contribute to the target relation.
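Assuming the associations have already been produced by combining source tuples along the query graph, positive and negative examples can be separated by the filters, as in this Python sketch. The function name, filter conditions, and data are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Assoc = Dict[str, Any]


def split_examples(
    assocs: List[Assoc],
    source_filters: List[Callable[[Assoc], bool]],
    to_target: Callable[[Assoc], Dict[str, Any]],
    target_filters: List[Callable[[Dict[str, Any]], bool]],
) -> Tuple[List[Assoc], List[Assoc]]:
    """Positive: the association reaches the target. Negative: some filter rejects it."""
    positive: List[Assoc] = []
    negative: List[Assoc] = []
    for assoc in assocs:
        ok = all(f(assoc) for f in source_filters)
        if ok:
            target = to_target(assoc)
            ok = all(c(target) for c in target_filters)
        (positive if ok else negative).append(assoc)
    return positive, negative


assocs = [
    {"kid": "Maya", "salary": 60000, "salary2": 45000},
    {"kid": "Leo", "salary": 30000, "salary2": 20000},
]
positive, negative = split_examples(
    assocs,
    source_filters=[lambda a: a["salary"] is not None],
    to_target=lambda a: {"Name": a["kid"], "FamilyIncome": a["salary"] + a["salary2"]},
    target_filters=[lambda t: t["FamilyIncome"] > 60000],
)
print([a["kid"] for a in positive])  # ['Maya']
print([a["kid"] for a in negative])  # ['Leo']
```

Leo's association is formed correctly but is rejected by the target filter, so it is shown to the user as a negative example.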

MAPPING OPERATORS
– Correspondence operators: permit users to change the value correspondences.
– Data trimming operators: modify the source and target filters of a mapping. They do not change the query graph of a mapping.
– Data linking operators: directly change the query graph of a mapping. There are two types: data walk and data chase.
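The division of labor among the operator families can be sketched in Python by treating a mapping as an immutable record, with each operator returning a new mapping and touching only its own component. Representing correspondences and filters as strings, and the specific function names, are hypothetical simplifications.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, Tuple

Edge = Tuple[str, str]


@dataclass(frozen=True)
class Mapping:
    graph: FrozenSet[Edge]                         # query graph G, as join edges between nodes
    correspondences: Tuple[Tuple[str, str], ...]   # V: (target attribute, source expression)
    source_filters: Tuple[str, ...] = ()           # C_S, kept as condition strings
    target_filters: Tuple[str, ...] = ()           # C_T


def set_correspondence(m: Mapping, target_attr: str, expr: str) -> Mapping:
    """Correspondence operator: replace the value correspondence for target_attr."""
    kept = tuple(c for c in m.correspondences if c[0] != target_attr)
    return replace(m, correspondences=kept + ((target_attr, expr),))


def add_source_filter(m: Mapping, cond: str) -> Mapping:
    """Data trimming operator: changes C_S; the query graph is untouched."""
    return replace(m, source_filters=m.source_filters + (cond,))


def add_target_filter(m: Mapping, cond: str) -> Mapping:
    """Data trimming operator: changes C_T; the query graph is untouched."""
    return replace(m, target_filters=m.target_filters + (cond,))


def add_edge(m: Mapping, edge: Edge) -> Mapping:
    """Data linking operator: extends the query graph G directly."""
    return replace(m, graph=m.graph | {edge})


m0 = Mapping(graph=frozenset({("Kids", "Parents")}),
             correspondences=(("Kids.FamilyIncome", "Parents.salary"),))
m1 = add_edge(m0, ("Kids", "Parents2"))
m2 = set_correspondence(m1, "Kids.FamilyIncome", "Parents.salary + Parents2.salary")
m3 = add_source_filter(m2, "Parents.salary IS NOT NULL")

print(m3.graph == m2.graph)   # True: trimming leaves G alone
print(sorted(m1.graph))       # [('Kids', 'Parents'), ('Kids', 'Parents2')]
```

In this sketch, a data walk or a data chase would be two different ways of deciding which edge to pass to add_edge.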

DATA WALK
In a data walk, the user knows where the missing data resides in the source, or more specifically, which source relation(s) contain this data. A data walk makes use of Clio’s knowledge of the source schema, which is gathered from schema and constraint definitions and from mining the source data, views, stored queries, and metadata.
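A data walk can be pictured as a path search over what Clio knows about the source schema. In this Python sketch, that knowledge is assumed to be an adjacency list of join conditions (as might come from foreign keys or mined constraints), and a breadth-first search returns one shortest chain of joins from the relations already in the query graph to the relation the user named. The relation names are hypothetical, and returning only the shortest path is a simplification; a tool could instead offer every alternative path.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical schema knowledge: relation -> [(neighbor relation, join condition)].
SchemaGraph = Dict[str, List[Tuple[str, str]]]
Step = Tuple[str, str, str]  # (from relation, to relation, join condition)


def data_walk(schema: SchemaGraph, current: Set[str], target: str) -> Optional[List[Step]]:
    """Shortest chain of join edges from any relation in `current` to `target`."""
    queue = deque((rel, []) for rel in current)
    seen = set(current)
    while queue:
        rel, path = queue.popleft()
        if rel == target:
            return path
        for nxt, cond in schema.get(rel, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(rel, nxt, cond)]))
    return None  # target is not reachable through known joins


schema: SchemaGraph = {
    "Kids": [("Parents", "Kids.mid = Parents.id"), ("SchoolInfo", "Kids.id = SchoolInfo.kid")],
    "Parents": [("Kids", "Kids.mid = Parents.id"), ("PhoneDir", "Parents.id = PhoneDir.pid")],
}
print(data_walk(schema, {"Kids"}, "PhoneDir"))
# [('Kids', 'Parents', 'Kids.mid = Parents.id'), ('Parents', 'PhoneDir', 'Parents.id = PhoneDir.pid')]
```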

DATA CHASE
In a data chase, the user does not know where the missing data resides and may not know which relations to include in the extended query graph. The chase lets the user explore the source data incrementally to locate the desired data.
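By contrast with a data walk, a chase can start from data rather than schema. This Python sketch assumes the user picks a value in a displayed example (here a hypothetical kid ID) and the system reports every relation and attribute in which that value also occurs, with a count; each hit is a candidate relation to join into the extended query graph. The function name, relation names, and data are invented for illustration.

```python
from typing import Any, Dict, List, Tuple

Database = Dict[str, List[Dict[str, Any]]]


def chase_value(db: Database, value: Any) -> List[Tuple[str, str, int]]:
    """Return (relation, attribute, count) for every place `value` occurs in the source."""
    hits: List[Tuple[str, str, int]] = []
    for rel, rows in db.items():
        counts: Dict[str, int] = {}
        for row in rows:
            for attr, v in row.items():
                if v == value:
                    counts[attr] = counts.get(attr, 0) + 1
        hits.extend((rel, attr, n) for attr, n in counts.items())
    return hits


db: Database = {
    "Kids": [{"id": "k7", "name": "Maya"}],
    "SchoolInfo": [{"kid": "k7", "grade": 3}],
    "BusSchedule": [{"rider": "k7", "route": 12}, {"rider": "k7", "route": 4}],
}
print(chase_value(db, "k7"))
# [('Kids', 'id', 1), ('SchoolInfo', 'kid', 1), ('BusSchedule', 'rider', 2)]
```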

CLIO FOR LARGE MAPPINGS
Clio must manage and manipulate multiple possible mappings while the user explores the data, creates new correspondences, and extends the query graph. The more complex the relationship between source and target, the more possible mappings must be handled. Sources of complexity include:
– large schemas;
– large volumes of data that need to be transformed;
– unfamiliar data sources, where the amount of data itself may be an obstacle to mapping.

CLIO MAPPING FRAMEWORK
Clio provides:
– a Target Viewer, which gives the mapping a “What You See Is What You Get” flavor;
– a Source Viewer, which serves as a palette from which users can choose the relations they want to work with or explicitly select an edge to follow, and which visualizes the query graph being constructed;
– a set of workspaces, each associated with a single mapping alternative.

COMPLEX MAPPINGS
Many of the single-target mappings a user creates will overlap heavily, differing only in a few correspondences or in a small portion of the query graph. The decisions made in creating one mapping can be stored and made available to the user, reducing the burden and overhead of re-creating the bulk of each mapping from scratch.

CLIO FOR COMPLEX MAPPINGS
Clio automatically computes both possible mappings, and the user can accept one or several of them, adding filters as needed. Clio’s rich framework supports the user in specifying complex target mappings.

SUMMARY
The paper presents a new framework that uses examples drawn from the source data to illustrate complex schema mappings. It provides formal definitions of mappings, mapping examples, and mapping operators, and shows how they can be used to help a user understand the data and develop mappings.

QUESTIONS?