Distributed DBMS© M. T. Özsu & P. Valduriez Ch.4/1 Outline Introduction Background Distributed Database Design Database Integration ➡ Schema Matching ➡

Slides:



Advertisements
Similar presentations
Schema Matching and Query Rewriting in Ontology-based Data Integration Zdeňka Linková ICS AS CR Advisor: Július Štuller.
Advertisements

Distributed DBMS©M. T. Özsu & P. Valduriez Ch.15/1 Outline Introduction Background Distributed Database Design Database Integration Semantic Data Control.
Distributed DBMS©M. T. Özsu & P. Valduriez Ch.14/1 Outline Introduction Background Distributed Database Design Database Integration Semantic Data Control.
Outline  Introduction  Background  Distributed DBMS Architecture  Distributed Database Design  Semantic Data Control ➠ View Management ➠ Data Security.
GridVine: Building Internet-Scale Semantic Overlay Networks By Lan Tian.
Distributed DBMSPage 6. 1© 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture Distributed Database Design.
Distributed DBMS© M. T. Özsu & P. Valduriez Ch.6/1 Outline Introduction Background Distributed Database Design Database Integration Semantic Data Control.
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/1 Outline Introduction Background Distributed Database Design Database Integration Semantic Data Control.
Reducing the Cost of Validating Mapping Compositions by Exploiting Semantic Relationships Eduard C. Dragut Ramon Lawrence Eduard C. Dragut Ramon Lawrence.
Interactive Generation of Integrated Schemas Laura Chiticariu et al. Presented by: Meher Talat Shaikh.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 3 The Basic (Flat) Relational Model.
Distributed DBMSPage 4. 1© 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background  Distributed DBMS Architecture  Datalogical Architecture.
1 Lecture 13: Database Heterogeneity Debriefing Project Phase 2.
1 Lecture 13: Database Heterogeneity. 2 Outline Database Integration Wrappers Mediators Integration Conflicts.
1 Minggu 2, Pertemuan 3 The Relational Model Matakuliah: T0206-Sistem Basisdata Tahun: 2005 Versi: 1.0/0.0.
Page 1 Multidatabase Querying by Context Ramon Lawrence, Ken Barker Multidatabase Querying by Context.
Distributed Database Management Systems. Reading Textbook: Ch. 4 Textbook: Ch. 4 FarkasCSCE Spring
6 Chapter 6 Database Design Hachim Haddouti. 6 2 Hachim Haddouti and Rob & Coronel, Ch6 In this chapter, you will learn: That successful database design.
Xiaomeng Su & Jon Atle Gulla Dept. of Computer and Information Science Norwegian University of Science and Technology Trondheim Norway June 2004 Semantic.
Modeling & Designing the Database
1 Chapter 2 Database Environment. 2 Chapter 2 - Objectives u Purpose of three-level database architecture. u Contents of external, conceptual, and internal.
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/1 Outline Introduction Background Distributed Database Design Database Integration Semantic Data Control.
Chapter 2 Database System Concepts and Architecture
Ch 6: ER to Relational Mapping
OMAP: An Implemented Framework for Automatically Aligning OWL Ontologies SWAP, December, 2005 Raphaël Troncy, Umberto Straccia ISTI-CNR
Semantic Interoperability Jérôme Euzenat INRIA & LIG France Natasha Noy Stanford University USA.
Ontology Matching Basics Ontology Matching by Jerome Euzenat and Pavel Shvaiko Parts I and II 11/6/2012Ontology Matching Basics - PL, CS 6521.
Semantic Matching Pavel Shvaiko Stanford University, October 31, 2003 Paper with Fausto Giunchiglia Research group (alphabetically ordered): Fausto Giunchiglia,
Database System Concepts and Architecture Lecture # 3 22 June 2012 National University of Computer and Emerging Sciences.
Lecture 2 The Relational Model. Objectives Terminology of relational model. How tables are used to represent data. Connection between mathematical relations.
Chapter 4 The Relational Model.
Ontology Alignment/Matching Prafulla Palwe. Agenda ► Introduction  Being serious about the semantic web  Living with heterogeneity  Heterogeneity problem.
 Copyright 2005 Digital Enterprise Research Institute. All rights reserved. Towards Translating between XML and WSML based on mappings between.
Implementation Yaodong Bi. Introduction to Implementation Purposes of Implementation – Plan the system integrations required in each iteration – Distribute.
Chapter 2 CIS Sungchul Hong
ITEC224 Database Programming
Machine Learning Approach for Ontology Mapping using Multiple Concept Similarity Measures IEEE/ACIS International Conference on Computer and Information.
DISTRIBUTED DATABASE DESIGN
Software School of Hunan University Database Systems Design Part III Section 5 Design Methodology.
Physical Database Design Chapter 6. Physical Design and implementation 1.Translate global logical data model for target DBMS  1.1Design base relations.
Chapter 16 Methodology – Physical Database Design for Relational Databases.
Semantic Matching Fausto Giunchiglia work in collaboration with Pavel Shvaiko The Italian-Israeli Forum on Computer Science, Haifa, June 17-18, 2003.
10/3/2012ISC329 Isabelle Bichindaritz1 Logical Design.
Minor Thesis A scalable schema matching framework for relational databases Student: Ahmed Saimon Adam ID: Award: MSc (Computer & Information.
Lecture2: Database Environment Prepared by L. Nouf Almujally & Aisha AlArfaj 1 Ref. Chapter2 College of Computer and Information Sciences - Information.
Dimitrios Skoutas Alkis Simitsis
A Classification of Schema-based Matching Approaches Pavel Shvaiko Meaning Coordination and Negotiation Workshop, ISWC 8 th November 2004, Hiroshima, Japan.
DDBMS Distributed Database Management Systems Fragmentation
Part4 Methodology of Database Design Chapter 07- Overview of Conceptual Database Design Lu Wei College of Software and Microelectronics Northwestern Polytechnical.
Chapter 2 Database System Concepts and Architecture Dr. Bernard Chen Ph.D. University of Central Arkansas.
Methodology – Physical Database Design for Relational Databases.
1Mr.Mohammed Abu Roqyah. Database System Concepts and Architecture 2Mr.Mohammed Abu Roqyah.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 2 Database System Concepts and Architecture.
CSE314 Database Systems Lecture 3 The Relational Data Model and Relational Database Constraints Doç. Dr. Mehmet Göktürk src: Elmasri & Navanthe 6E Pearson.
1 Resolving Schematic Discrepancy in the Integration of Entity-Relationship Schemas Qi He Tok Wang Ling Dept. of Computer Science School of Computing National.
Ch- 8. Class Diagrams Class diagrams are the most common diagram found in modeling object- oriented systems. Class diagrams are important not only for.
Issues in Ontology-based Information integration By Zhan Cui, Dean Jones and Paul O’Brien.
Conceptualization Relational Model Incomplete Relations Indirect Concept Reflection Entity-Relationship Model Incomplete Relations Two Ways of Concept.
B. Information Technology (Hons.) CMPB245: Database Design Physical Design.
The Relational Model © Pearson Education Limited 1995, 2005 Bayu Adhi Tama, M.T.I.
Presented by Kyumars Sheykh Esmaili Description Logics for Data Bases (DLHB,Chapter 16) Semantic Web Seminar.
Logical Design 12/10/2009GAK1. Learning Objectives How to remove features from a local conceptual model that are not compatible with the relational model.
Instance Discovery and Schema Matching With Applications to Biological Deep Web Data Integration Tantan Liu, Fan Wang, Gagan Agrawal {liut, wangfa,
Author: Akiyoshi Matonoy, Toshiyuki Amagasay, Masatoshi Yoshikawaz, Shunsuke Uemuray.
Distributed DBMS© 2001 M. Tamer Özsu & Patrick Valduriez Page 1.1 Outline n Introduction Background Distributed DBMS Architecture Distributed Database.
LECTURE TWO Introduction to Databases: Data models Relational database concepts Introduction to DDL & DML.
Of 24 lecture 11: ontology – mediation, merging & aligning.
Outline Background Introduction Distributed DBMS Architecture
Lec 6: Practical Database Design Methodology and Use of UML Diagrams
Chapter 2 Database Environment.
Presentation transcript:

Distributed DBMS© M. T. Özsu & P. Valduriez Ch.4/1 Outline Introduction Background Distributed Database Design Database Integration ➡ Schema Matching ➡ Schema Mapping Semantic Data Control Distributed Query Processing Multimedia Query Processing Distributed Transaction Management Data Replication Parallel Database Systems Distributed Object DBMS Peer-to-Peer Data Management Web Data Management Current Issues

Distributed DBMS© M. T. Özsu & P. Valduriez Ch.4/2 Problem Definition Given existing databases with their Local Conceptual Schemas (LCSs), how to integrate the LCSs into a Global Conceptual Schema (GCS) ➡ GCS is also called mediated schema Bottom-up design process

Distributed DBMS© M. T. Özsu & P. Valduriez Ch.4/3 Integration Alternatives Physical integration ➡ Source databases integrated and the integrated database is materialized ➡ Data warehouses Logical integration ➡ Global conceptual schema is virtual and not materialized ➡ Enterprise Information Integration (EII)

Distributed DBMS© M. T. Özsu & P. Valduriez Ch.4/4 Data Warehouse Approach

Distributed DBMS© M. T. Özsu & P. Valduriez Ch.4/5 Bottom-up Design GCS (also called mediated schema) is defined first ➡ Map LCSs to this schema ➡ As in data warehouses GCS is defined as an integration of parts of LCSs ➡ Generate GCS and map LCSs to this GCS

Distributed DBMS© M. T. Özsu & P. Valduriez Ch.4/6 GCS/LCS Relationship Local-as-view ➡ The GCS definition is assumed to exist, and each LCS is treated as a view definition over it Global-as-view ➡ The GCS is defined as a set of views over the LCSs

Distributed DBMS© M. T. Özsu & P. Valduriez Ch.4/7 Database Integration Process

Distributed DBMS© M. T. Özsu & P. Valduriez Ch.4/8 Recall Access Architecture

Distributed DBMS© M. T. Özsu & P. Valduriez Ch.4/9 Database Integration Issues Schema translation ➡ Component database schemas translated to a common intermediate canonical representation Schema generation ➡ Intermediate schemas are used to create a global conceptual schema

Distributed DBMS© M. T. Özsu & P. Valduriez Ch.4/10 Schema Translation What is the canonical data model? ➡ Relational ➡ Entity-relationship ✦ DIKE ➡ Object-oriented ✦ ARTEMIS ➡ Graph-oriented ✦ DIPE, TranScm, COMA, Cupid ✦ Preferable with emergence of XML ✦ No common graph formalism Mapping algorithms ➡ These are well-known

Distributed DBMS© M. T. Özsu & P. Valduriez Ch.4/11 Schema Generation Schema matching ➡ Finding the correspondences between multiple schemas Schema integration ➡ Creation of the GCS (or mediated schema) using the correspondences Schema mapping ➡ How to map data from local databases to the GCS Important: sometimes the GCS is defined first and schema matching and schema mapping is done against this target GCS

Distributed DBMS© M. T. Özsu & P. Valduriez Ch.4/12 Running Example EMP(ENO, ENAME, TITLE) PROJ(PNO, PNAME, BUDGET, LOC, CNAME) ASG(ENO, PNO, RESP, DUR) PAY(TITLE, SAL) Relational E-R Model

Distributed DBMS© M. T. Özsu & P. Valduriez Ch.4/13 Schema Matching Schema heterogeneity ➡ Structural heterogeneity ✦ Type conflicts ✦ Dependency conflicts ✦ Key conflicts ✦ Behavioral conflicts ➡ Semantic heterogeneity ✦ More important and harder to deal with ✦ Synonyms, homonyms, hypernyms ✦ Different ontology ✦ Imprecise wording

Distributed DBMS© M. T. Özsu & P. Valduriez Ch.4/14 Schema Matching (cont’d) Other complications ➡ Insufficient schema and instance information ➡ Unavailability of schema documentation ➡ Subjectivity of matching Issues that affect schema matching ➡ Schema versus instance matching ➡ Element versus structure level matching ➡ Matching cardinality

Distributed DBMS© M. T. Özsu & P. Valduriez Ch.4/15 Schema Matching Approaches

Distributed DBMS© M. T. Özsu & P. Valduriez Ch.4/16 Linguistic Schema Matching Use element names and other textual information (textual descriptions, annotations) May use external sources (e.g., Thesauri) 〈 SC1.element-1 ≈ SC2.element-2, p, s 〉 ➡ Element-1 in schema SC1 is similar to element-2 in schema SC2 if predicate p holds with a similarity value of s Schema level ➡ Deal with names of schema elements ➡ Handle cases such as synonyms, homonyms, hypernyms, data type similarities Instance level ➡ Focus on information retrieval techniques (e.g., word frequencies, key terms) ➡ “Deduce” similarities from these

Distributed DBMS© M. T. Özsu & P. Valduriez Ch.4/17 Linguistic Matchers Use a set of linguistic (terminological) rules Basic rules can be hand-crafted or may be discovered from outside sources (e.g., WordNet) Predicate p and similarity value s ➡ hand-crafted ⇒ specified, ➡ discovered ⇒ may be computed or specified by an expert after discovery Examples ➡ 〈 uppercase names ≈ lower case names, true, 1.0 〉 ➡ 〈 uppercase names ≈ capitalized names, true, 1.0 〉 ➡ 〈 capitalized names ≈ lower case names, true, 1.0 〉 ➡ 〈 DB1.ASG ≈ DB2.WORKS_IN, true, 0.8 〉

Distributed DBMS© M. T. Özsu & P. Valduriez Ch.4/18 Automatic Discovery of Name Similarities Affixes ➡ Common prefixes and suffixes between two element name strings N-grams ➡ Comparing how many substrings of length n are common between the two name strings Edit distance ➡ Number of character modifications (additions, deletions, insertions) that needs to be performed to convert one string into the other Soundex code ➡ Phonetic similarity between names based on their soundex codes Also look at data types ➡ Data type similarity may suggest stronger relationship than the computed similarity using these methods or to differentiate between multiple strings with same value

Distributed DBMS© M. T. Özsu & P. Valduriez Ch.4/19 N-gram Example 3-grams of string “Responsibility” are the following: Res sib ibi esp bip spo ili pon lit ons ity nsi 3-grams of string “Resp” are ➡ Res ➡ esp 3-gram similarity: 2/12 = 0.17

Distributed DBMS© M. T. Özsu & P. Valduriez Ch.4/20 Edit Distance Example Again consider “Responsibility” and “Resp” To convert “Responsibility” to “Resp” ➡ Delete characters “o”, “n”, “s”, “i”, “b”, “i”, “l”, “i”, “t”, “y” To convert “Resp” to “Responsibility” ➡ Add characters “o”, “n”, “s”, “i”, “b”, “i”, “l”, “i”, “t”, “y” The number of edit operations required is 10 Similarity is 1 − (10/14) = 0.29

Distributed DBMS© M. T. Özsu & P. Valduriez Ch.4/21 Constraint-based Matchers Data always have constraints – use them ➡ Data type information ➡ Value ranges ➡ … Examples ➡ RESP and RESPONSIBILITY: n-gram similarity = 0.17, edit distance similarity = 0.19 (low) ➡ If they come from the same domain, this may increase their similarity value ➡ ENO in relational, WORKER.NUMBER and PROJECT.NUMBER in E-R ➡ ENO and WORKER.NUMBER may have type INTEGER while PROJECT.NUMBER may have STRING

Distributed DBMS© M. T. Özsu & P. Valduriez Ch.4/22 Constraint-based Structural Matching If two schema elements are structurally similar, then there is a higher likelihood that they represent the same concept Structural similarity: ➡ Same properties (attributes) ➡ “Neighborhood” similarity ✦ Using graph representation ✦ The set of nodes that can be reached within a particular path length from a node are the neighbors of that node ✦ If two concepts (nodes) have similar set of neighbors, they are likely to represent the same concept

Distributed DBMS© M. T. Özsu & P. Valduriez Ch.4/23 Learning-based Schema Matching Use machine learning techniques to determine schema matches Classification problem: classify concepts from various schemas into classes according to their similarity. Those that fall into the same class represent similar concepts Similarity is defined according to features of data instances Classification is “learned” from a training set

Distributed DBMS© M. T. Özsu & P. Valduriez Ch.4/24 Learning-based Schema Matching

Distributed DBMS© M. T. Özsu & P. Valduriez Ch.4/25 Combined Schema Matching Approaches Use multiple matchers ➡ Each matcher focuses on one area (name, etc) Meta-matcher integrates these into one prediction Integration may be simple (take average of similarity values) or more complex (see Fagin’s work)

Distributed DBMS© M. T. Özsu & P. Valduriez Ch.4/26 Schema Integration Use the correspondences to create a GCS Mainly a manual process, although rules can help

Distributed DBMS© M. T. Özsu & P. Valduriez Ch.4/27 Binary Integration Methods

Distributed DBMS© M. T. Özsu & P. Valduriez Ch.4/28 N-ary Integration Methods

Distributed DBMS© M. T. Özsu & P. Valduriez Ch.4/29 Schema Mapping Mapping data from each local database (source) to GCS (target) while preserving semantic consistency as defined in both source and target. Data warehouses ⇒ actual translation Data integration systems ⇒ discover mappings that can be used in the query processing phase Mapping creation Mapping maintenance

Distributed DBMS© M. T. Özsu & P. Valduriez Ch.4/30 Mapping Creation Given ➡ A source LCS ➡ A target GCS ➡ A set of value correspondences discovered during schema matching phase Produce a set of queries that, when executed, will create GCS data instances from the source data. We are looking, for each T k, a query Q k that is defined on a (possibly proper) subset of the relations in S such that, when executed, will generate data for T i from the source relations

Distributed DBMS© M. T. Özsu & P. Valduriez Ch.4/31 Mapping Creation Algorithm General idea: Consider each T k in turn. Divide V k into subsets such that each specifies one possible way that values of T k can be computed. Each can be mapped to a query that, when executed, would generate some of T k ’s data. Union of these queries gives