Generic Model Management: A Database Infrastructure for Schema Manipulation. Philip A. Bernstein, Microsoft Corporation, September 6, 2001. © 2001 Microsoft Corp.

Presentation transcript:

Slide 1: Generic Model Management: A Database Infrastructure for Schema Manipulation
Philip A. Bernstein, Microsoft Corporation, September 6, 2001
© 2001 Microsoft Corp.

Slide 2: The Problem
There are 30 years of DB research on meta data, but we don't have great infrastructure to offer
– Most design tools and web services store meta data in files, not DBs
– OODBMSs are not a huge success
– Most meta data driven tools use their own infrastructure
Goal: generic meta data manipulation infrastructure
– Reduce the amount of programming required to build meta data driven applications
Proposal: Model Management
– Define an algebra to manipulate meta data in large chunks, called models and mappings

Slide 3: Outline
Overview of Model Management
Solutions to classical meta data problems
Recent technical results

Slide 4: Models and Mappings
Model: a complex information structure
– XML schema, SQL schema, OO interface, UML model, web site map, make script, …
Mapping: a transformation from one model into another
– Map between two XML schemas
– Map a SQL schema to an XML schema
– Map data sources to a data warehouse
– Map an ER diagram to a SQL schema
– Map a process definition to a workflow script

Slide 5: Representation
A model is a directed graph with one root.
A mapping is a model each of whose nodes connects nodes of two other models.
[Figure: map1 connects a relational schema (Emp, E#, Dept#, Name) to an XSD (Emp, E#, Dept#, Name, First, Last)]
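
As a rough illustration of this representation, the Python sketch below stores a model as a rooted directed graph and a mapping as a list of nodes that each reference elements of two models. The class names (`Model`, `MapNode`, `Mapping`), the nesting of First and Last under Name, and the particular correspondences are assumptions made for the example, not details taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    """A rooted directed graph: parent -> list of children."""
    root: str
    edges: dict = field(default_factory=dict)

    def nodes(self):
        """Return every element reachable from the root."""
        seen, stack = set(), [self.root]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(self.edges.get(node, []))
        return seen


@dataclass(frozen=True)
class MapNode:
    """One mapping node: it connects elements of two other models."""
    left: frozenset
    right: frozenset


@dataclass
class Mapping:
    source: Model
    target: Model
    nodes: list

    def is_well_formed(self):
        """Mapping nodes may only reference elements that exist."""
        src, tgt = self.source.nodes(), self.target.nodes()
        return all(n.left <= src and n.right <= tgt for n in self.nodes)


# Hypothetical encoding of the slide's relational schema and XSD.
rel = Model("Emp", {"Emp": ["E#", "Dept#", "Name"]})
xsd = Model("Emp", {"Emp": ["E#", "Dept#", "Name"], "Name": ["First", "Last"]})

map1 = Mapping(rel, xsd, [
    MapNode(frozenset({"Emp"}), frozenset({"Emp"})),
    MapNode(frozenset({"E#"}), frozenset({"E#"})),
    MapNode(frozenset({"Dept#"}), frozenset({"Dept#"})),
    MapNode(frozenset({"Name"}), frozenset({"First", "Last"})),
])

print(sorted(xsd.nodes()))
print(map1.is_well_formed())
```

Treating a mapping node as a pair of element sets is one possible design; it lets a single node relate Name to both First and Last.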

Slide 6: Model Management Algebra
Match
Merge
Compose
Select
Diff
Enumerate
ApplyFunction
Copy
Update operations

Slide 7: map = Match(M1, M2, χ)
Match(M1, M2, χ) returns the best mapping between M1 and M2 with respect to χ
[Figure: map1 connects M1 (Emp, E#, Dept#, Name, Addr) to M2 (Emp, E#, Dept#, Name, First, Last, Phone) with "=" correspondences]

Slide 8: M3 = Merge(M1, M2, map)
Return the union of models M1 and M2
– Use map to guide the Merge
– If elements x = y in map, then collapse them into one element
[Figure: Emp (Addr, Name) and Emp (Name, Phone), connected by map, merge into Emp (Name, Phone, Addr)]
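
The collapse rule can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: a model is reduced to a set of (parent, child) edges, the mapping is a set of equality pairs, and the function name `merge` and the "M1"/"M2" tags are invented for the example. Elements are tagged with their model of origin so that same-named elements stay separate unless the mapping equates them.

```python
def merge(edges1, edges2, equal_pairs):
    """Union two models, collapsing elements that the mapping equates.

    edges1, edges2: sets of (parent, child) name pairs.
    equal_pairs: set of (element of M1, element of M2) declared equal.
    """
    # Each equated M2 element is replaced by its M1 counterpart.
    canon = {("M2", y): ("M1", x) for x, y in equal_pairs}

    def resolve(name):
        tagged = ("M2", name)
        return canon.get(tagged, tagged)

    tagged1 = {(("M1", p), ("M1", c)) for p, c in edges1}
    tagged2 = {(resolve(p), resolve(c)) for p, c in edges2}
    return tagged1 | tagged2


m1 = {("Emp", "Addr"), ("Emp", "Name")}
m2 = {("Emp", "Name"), ("Emp", "Phone")}
mapping = {("Emp", "Emp"), ("Name", "Name")}

m3 = merge(m1, m2, mapping)
for parent, child in sorted(m3):
    print(parent, "->", child)
```

With the slide's example, the merged Emp keeps Addr and Name from M1 and gains Phone from M2.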

Slide 9: Left Composition ( f )
mapC = mapA f mapB
[Figure: mapA (a1, a2, a3) connects M1 (Emp, Addr, Street, City) to M2 (Emp, Street, City); mapB (b1, b2, b3) connects M2 to M3 (Emp, StAddr, Town); the composed mapC (c1, c2, c3) connects M1 to M3. A Name element also appears, attached to b1.]
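
To make the idea of composing two mappings concrete, here is a small Python sketch that uses plain relational composition over sets of element pairs. This is a simplification and not necessarily the left composition defined on the slide; the function name `compose` and the element-level correspondences are assumptions made for the example.

```python
def compose(map_a, map_b):
    """Relational composition: follow map_a, then map_b.

    Each mapping is a set of (source element, target element) pairs.
    """
    return {(x, z) for x, y in map_a for y2, z in map_b if y == y2}


# Assumed correspondences between M1 and M2, and between M2 and M3.
map_a = {("Emp", "Emp"), ("Street", "Street"), ("City", "City")}
map_b = {("Emp", "Emp"), ("Street", "StAddr"), ("City", "Town")}

map_c = compose(map_a, map_b)
print(sorted(map_c))
```

Elements of M1 with no counterpart in M2, such as Addr here, simply do not appear in the composed mapping.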

Slide 10: Model Management Algebra
map = Match(M1, M2, χ)
M3 = Merge(M1, M2, map)
map3 = Compose(map1, map2)
M2 = Select(M1, pred)
M2 = Diff(M1, map)
list = Enumerate(M)
ApplyFunction(M, f)
M2 = Copy(M1)
Update operations
They're generic = data model independent … well … implemented on an extended ER model with an extensibility story

Slide 11: Example
Given
– map1 from SQL schema rdb1 to xsd1
– xsd2, which is similar to xsd1
Produce
– a map between xsd2 and a relational schema
1. map2 = Match(xsd1, xsd2)
2. map3 = map1 • map2
3. rdb2, map4 = Copy(map3)
4. Use ApplyFunction(map4) to map each x in Diff(xsd2, map4) into rdb2
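
The four steps can be traced with toy stand-ins for the operators. The Python sketch below assumes that a model is just a set of element names and a mapping a set of pairs; `match_by_name`, `compose` and `diff` are hypothetical, heavily simplified versions of Match, composition and Diff, and the element names are invented.

```python
def match_by_name(model1, model2):
    """Toy Match: pair up elements that carry the same name."""
    return {(x, x) for x in model1 & model2}


def compose(map_a, map_b):
    """Relational composition: follow map_a, then map_b."""
    return {(x, z) for x, y in map_a for y2, z in map_b if y == y2}


def diff(model, mapping, side):
    """Elements of model not referenced on the given side of mapping."""
    return model - {pair[side] for pair in mapping}


# Hypothetical element sets; only the schema names come from the slide.
rdb1 = {"Emp", "EmpNo", "Name"}
xsd1 = {"Emp", "EmpNo", "Name"}
xsd2 = {"Emp", "EmpNo", "Name", "Phone"}
map1 = {(x, x) for x in rdb1}                 # given: rdb1 -> xsd1

map2 = match_by_name(xsd1, xsd2)              # step 1
map3 = compose(map1, map2)                    # step 2: rdb1 -> xsd2
rdb2 = {r for r, _ in map3}                   # step 3: copy the rdb1 side
map4 = set(map3)                              #         and the mapping
for x in sorted(diff(xsd2, map4, side=1)):    # step 4: unmapped xsd2 elements
    rdb2.add(x)                               # stand-in for ApplyFunction
    map4.add((x, x))

print(sorted(rdb2))
```

After step 4 every element of xsd2 participates in map4, so a second Diff would return nothing.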

Slide 12: Theme
Classic meta data problems can be solved using Model Management operations
– Schema integration
– Schema evolution
– Data migration
– Reverse engineering
– Data reintegration (3-way merging)
Published solutions to these problems help us produce generic implementations of model management operations

Slide 13: Outline
Overview of Model Management
Solutions to classical meta data problems
– Schema integration
– Schema evolution
– Reverse engineering
– Data reintegration (3-way merging)
– Data migration
Recent technical results

Slide 14: Schema Integration
Given
– two view schemas, V1 and V2
Produce
– an integrated schema, S
1. map = Match(V1, V2)
2. S′ = Merge(V1, V2, map)
3. ApplyFunction(S′) // to resolve conflicts in S′, producing S

Slide 15
1. map = Match(V1, V2)
2. S′ = Merge(V1, V2, map)
3. Use ApplyFunction(S′) to resolve conflicts, producing S
[Figure: V1 and V2 are Emp schemas over E#, Dept#, Addr, Phone, Name, FirstName and LastName; map relates them, and the merged schema contains E#, Dept#, Addr, Phone, Name, FirstName and LastName]

Slide 16: Merging Knowledge Bases (Ontologies)
Same as schema integration, but applied to ontologies
The literature on merging ontologies focuses mostly on Match

Slide 17: Schema Evolution
Given
– map_SV from schema S to view V
– a modified version S′ of S
Produce
– a mapping map_S′V from S′ to V (i.e. a view definition for V over S′)
1. map_S′S = Match(S′, S)
2. map_S′V = map_S′S • map_SV
3. Use ApplyFunction(V) to delete elements not derivable from S′

Slide 18: Outline
Overview of Model Management
Solutions to classical meta data problems
– Schema integration (covered)
– Schema evolution (covered)
– Reverse engineering
– Data reintegration (3-way merging)
– Data migration
Recent technical results

Slide 19: Reverse Engineering
Given
– Model M (e.g., an ER model)
– Model G (e.g., SQL) generated via map_MG from M
– A modified version G′ of G
Produce
– A modified version M′ of M that generates G′
1. map_GG′ = Match(G, G′)
2. map_MG′ = map_MG • map_GG′
3. M′, map_M′G′ = Copy(map_MG′)
4. Use ApplyFunction(map_M′G′) to reverse engineer each g in Diff(G′, map_M′G′) into M′

Slide 20: 3-Way Merge (aka Reintegration)
Given
– a source schema S0
– two derived schemas S1 and S2
Produce
– a schema S3 that merges the changes of S1 and S2
Script (O, A and B stand for S0, S1 and S2):
1. MapOA = Match(O, A) (based on OIDs)
2. MapOB = Match(O, B) (based on OIDs)
3. MapOA′ = ApplyFunction(MapOA) such that for each e ∈ MapOA, if domain(e) = range(e), then delete e (i.e. keep things changed in A)
4. MapOB′ = ApplyFunction(MapOB) such that for each e ∈ MapOB, if domain(e) = range(e), then delete e (i.e. keep things changed in B)
5. ChangedA = range(MapOA′)
6. ChangedB = range(MapOB′)
7. MapChAChB = Match(ChangedA, ChangedB)
8. MapChBChA = invert(MapChAChB)
9. A′ = Diff(ChangedA, ChangedB, MapChAChB) (changed in A but not changed in B)
10. B′ = Diff(ChangedB, ChangedA, MapChBChA)
11. MapAB = Match(A, B) (by OIDs)
12. G = Merge(A, B, MapAB)
13. MapGA′ = Match(G, A′)
14. GA′ = Merge(G, A′, MapGA′) with preference for A′
15. MapGA′B′ = Match(GA′, B′)
16. GA′B′ = Merge(GA′, B′, MapGA′B′) with preference for B′
17. DeletedA = Diff(O, A, MapOA)
18. DeletedB = Diff(O, B, MapOB)
19. MapDeletedAChangedB = Match(DeletedA, ChangedB)
20. MapDeletedBChangedA = Match(DeletedB, ChangedA)
21. ShouldDeleteA = Diff(DeletedA, ChangedB, MapDeletedAChangedB)
22. ShouldDeleteB = Diff(DeletedB, ChangedA, MapDeletedBChangedA)
23. MapGA′B′SDA = Match(GA′B′, ShouldDeleteA)
24. GA′B′SDA = Diff(GA′B′, ShouldDeleteA, MapGA′B′SDA)
25. MapGA′B′SDASDB = Match(GA′B′SDA, ShouldDeleteB)
26. Final result = Diff(GA′B′SDA, ShouldDeleteB, MapGA′B′SDASDB)
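
The intent of the 26-step script can be condensed into a short Python sketch. It assumes each schema version is a dictionary from OID to element definition, so that matching by OID becomes a key lookup; the function name `three_way_merge`, the returned conflict set and the sample data are assumptions made for the example and are not part of the slide.

```python
def three_way_merge(o, a, b):
    """Merge two versions a and b derived from a common original o.

    Each argument maps OID -> element definition. Returns the merged
    dictionary and the set of OIDs changed differently on both sides.
    """
    changed_a = {k for k in a if k in o and a[k] != o[k]}
    changed_b = {k for k in b if k in o and b[k] != o[k]}
    only_a = changed_a - changed_b        # changed in A but not in B
    only_b = changed_b - changed_a        # changed in B but not in A

    merged = {**a, **b}                   # union of both versions
    merged.update({k: a[k] for k in only_a})   # prefer A's own changes
    merged.update({k: b[k] for k in only_b})   # prefer B's own changes

    deleted_a = set(o) - set(a)
    deleted_b = set(o) - set(b)
    # A deletion only takes effect if the other side left the element alone.
    should_delete = (deleted_a - changed_b) | (deleted_b - changed_a)
    for k in should_delete:
        merged.pop(k, None)

    conflicts = {k for k in changed_a & changed_b if a[k] != b[k]}
    return merged, conflicts


original = {1: "Emp", 2: "Name", 3: "Addr", 4: "Phone"}
version_a = {1: "Emp", 2: "FullName", 3: "Addr"}                # renames 2, deletes 4
version_b = {1: "Employee", 2: "Name", 4: "Phone", 5: "Dept"}   # renames 1, deletes 3, adds 5

result, conflicts = three_way_merge(original, version_a, version_b)
print(result)
print(conflicts)
```

Elements changed differently on both sides are reported rather than resolved, since the script gives no rule for that case.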

Slide 21: Data Migration
Given
– a schema S and its database D
– an evolved schema S′
Produce
– a procedure for mapping D into an S′ database D′
1. map_SS′ = Match(S, S′)
2. Use Enum(S) to generate a data migration script
[Figure: Enum generates a migration script, which is run on D to produce D′]
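
As a hedged illustration of step 2, the Python sketch below turns a column-level mapping into a single SQL statement. The function name `migration_script`, the table and column names, and the choice of an INSERT … SELECT statement are all assumptions for the example; the slide does not say what the generated script looks like.

```python
def migration_script(old_table, new_table, column_map):
    """Build one INSERT ... SELECT statement from a column mapping.

    column_map maps old column names to new column names; old columns
    that are absent from the mapping are not migrated.
    """
    old_cols = list(column_map)
    new_cols = [column_map[c] for c in old_cols]
    return (
        f"INSERT INTO {new_table} ({', '.join(new_cols)}) "
        f"SELECT {', '.join(old_cols)} FROM {old_table};"
    )


# Hypothetical result of matching the old and evolved schemas.
map_ss = {"EmpNo": "EmpNo", "DeptNo": "DeptNo", "Name": "FullName"}
print(migration_script("Emp", "Emp_v2", map_ss))
```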

Slide 22: Data Translation
Like data migration, except S and S′ are expressed in different data models.

Slide 23: Outline
Overview of Model Management
Solutions to classical meta data problems
Recent technical results

Slide 24: Status Report
Vision
– [Bernstein, Halevy, & Pottinger, SIGMOD Record 12/00]
Data Warehouse Examples
– [Bernstein & Rahm, ER ’00]
Match Operation
– Survey: [Rahm & Bernstein, MSR Tech Report]
– Prototype: [Madhavan, Bernstein, & Rahm, VLDB ’01]
Merge Operation
– coming soon …
Theory
– [Alagić & Bernstein, DBPL ’01]

Slide 25: Schema Matching Approaches
About a dozen published algorithms. Many good ideas, but none are robust.
Individual matchers
– Schema-based
  – Per-Element
    – Linguistic: Names, Descriptions
    – Constraint-based: Types, Keys
  – Structural
    – Constraint-based: Graph matching
– Content-based
  – Per-Element
    – Linguistic: IR (word frequencies, key terms)
    – Constraint-based: Value pattern and ranges
Combined matchers
– Hybrid
– Composite: Manual composition, Automatic composition

Slide 26: The CUPID Algorithm
Computes linguistic similarity of element pairs
Computes structural similarity of element pairs
Generates a mapping
[Figure: schema PO (POShipTo, POBillTo, each with City and Street) matched against schema PurchaseOrder (DeliverTo, InvoiceTo, Address, City, Street); label "ssim++"]
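
The three phases can be sketched in Python as follows. This is a toy under stated assumptions, not the published algorithm: the thesaurus entries and scores, the thresholds, the flattened PurchaseOrder schema (without the Address level) and all function names are invented for the example. Structural similarity is approximated as the fraction of leaves that have a linguistically similar leaf on the other side, and the two scores are combined by a weighted sum.

```python
# Hypothetical thesaurus entries with similarity scores.
THESAURUS = {
    frozenset({"poshipto", "deliverto"}): 0.8,
    frozenset({"pobillto", "invoiceto"}): 0.8,
}


def lsim(a, b):
    """Linguistic similarity of two element names."""
    a, b = a.lower(), b.lower()
    if a == b:
        return 1.0
    return THESAURUS.get(frozenset({a, b}), 0.0)


def leaves(tree, node):
    """Leaf elements below node (the node itself if it has no children)."""
    children = tree.get(node, [])
    if not children:
        return [node]
    found = []
    for child in children:
        found.extend(leaves(tree, child))
    return found


def ssim(tree1, n1, tree2, n2, strong=0.5):
    """Fraction of leaves with a strongly similar leaf on the other side."""
    l1, l2 = leaves(tree1, n1), leaves(tree2, n2)
    linked1 = sum(any(lsim(x, y) >= strong for y in l2) for x in l1)
    linked2 = sum(any(lsim(x, y) >= strong for x in l1) for y in l2)
    return (linked1 + linked2) / (len(l1) + len(l2))


def wsim(tree1, n1, tree2, n2, w_struct=0.5):
    """Weighted combination of structural and linguistic similarity."""
    return w_struct * ssim(tree1, n1, tree2, n2) + (1 - w_struct) * lsim(n1, n2)


def generate_mapping(tree1, nodes1, tree2, nodes2, accept=0.6):
    """Pick the best-scoring partner for each node, if it is good enough."""
    mapping = {}
    for n1 in nodes1:
        best = max(nodes2, key=lambda n2: wsim(tree1, n1, tree2, n2))
        if wsim(tree1, n1, tree2, best) >= accept:
            mapping[n1] = best
    return mapping


po = {"PO": ["POShipTo", "POBillTo"],
      "POShipTo": ["City", "Street"], "POBillTo": ["City", "Street"]}
purchase_order = {"PurchaseOrder": ["DeliverTo", "InvoiceTo"],
                  "DeliverTo": ["City", "Street"],
                  "InvoiceTo": ["City", "Street"]}

mapping = generate_mapping(po, ["POShipTo", "POBillTo"],
                           purchase_order, ["DeliverTo", "InvoiceTo"])
print(mapping)
```

In this example the structure alone cannot tell the two address blocks apart, because both have City and Street leaves; the linguistic score is what pairs POShipTo with DeliverTo and POBillTo with InvoiceTo.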

Slide 27: M3 = Merge(M1, map, M2)
[Buneman, Davidson, Kosky, EDBT 92]
– Meta-model has aggregation & generalization only
– Do a union and collapse objects having the same name
– Fix-up step for inconsistencies created by merging
– Successive fixups lead to different results
– Batch them at the end, to produce a unique minimal result
Now enrich the meta-model (containment, complex mappings) & merge semantics (conflicts, deletes)
[Figure: example graphs over nodes X, Y, Z and W with edges labeled a, showing a merge and its fix-up]

Slide 28: A Formal Semantics for Model Management
Use category theory for a data-model-independent characterization of models and mappings
– Models and their DBs are categories
– Model and data transformations are morphisms
– Mappings between models & data are functors
Utility
– Define formal semantics for Match and Merge
– Explain when Match & Merge preserve constraints
– Check that implementation satisfies the semantics

Slide 29
Goal: a mathematical semantics of MM algebra
[Figure labels: Categories, Functor, Theory; schemas Sch1, Sch2, Sch_m and Sch12 with morphisms f and g, annotated Match and Merge; the functor Db with Db(Sch1), Db(Sch2), Db(Sch12) and morphisms p and q]

Slide 30: Implementation Vision
[Figure: architecture diagram. Components shown: Generic Tools (Browser, Import/export, Scripting, Editors, Catalogs); Model-Driven UI Generator; Model Manager with the MM Meta-Model, the operations Match, Merge, Compose, Copy, Apply, …, Operation Specializations and an Inferencing Engine; Object-Oriented Repository; OR Mapper; SQL DBMS. Sample models shown: a workflow (Order Entry, Authorize Credit, Schedule Delivery, Bill Customer, Update Marketing, Inventory) and a query ("select all") over cust, emp and dept (dno, dna).]

Slide 31: Related Work
There's a lot of it. Apply it to model management!
– Platforms: OODBs, datalog, deductive OODBs (Telos/ConceptBase, F-Logic)
– Inferencing on mappings: AQUV, description logic
– Transitive closure and recursive QP
– Differencing: text, trees, graphs
– Data translation: algebras, schema evolution
– Data integration: schema match, view generation

Slide 32: Summary
Raise the level of abstraction of meta-data programming by using:
– models and mappings as objects
– an algebra that manipulates models and mappings on a generic meta-model
Classical meta data problems can be expressed using this algebra
Implementations of classic problems offer guidance on implementing the algebra

Slide 33: References
P. Bernstein & E. Rahm, “Data Warehouse Scenarios for Model Management”, ER 2000 Conference
P. Bernstein, A. Levy, R. Pottinger, “A Vision for Management of Complex Models”, SIGMOD Record, Dec. 2000
E. Rahm, P. Bernstein, “On Matching Schemas Automatically”, MSR Tech Report
J. Madhavan, P. Bernstein, E. Rahm, “Generic Schema Matching with Cupid”, VLDB 2001
S. Alagić, P. Bernstein, “A Model Theory for Generic Schema Management”, DBPL 2001