
1 Generic Model Management: A Database Infrastructure for Schema Manipulation
Philip A. Bernstein, Microsoft Corporation
September 6, 2001
© 2001 Microsoft Corp.

2 The Problem
• There are 30 years of DB research on meta data
• But we don’t have great infrastructure to offer
  – Most design tools and web services store meta data in files, not DBs
  – OODBMSs are not a huge success
  – Most meta-data-driven tools use their own infrastructure
• Goal: a generic meta data manipulation infrastructure
  – Reduce the amount of programming required to build meta-data-driven applications
• Proposal: Model Management
  – Define an algebra to manipulate meta data in large chunks, called models and mappings

3 Outline
• Overview of Model Management
• Solutions to classical meta data problems
• Recent technical results

4 Models and Mappings
• Model: a complex information structure
  – XML schema, SQL schema, OO interface, UML model, web site map, make script, …
• Mapping: a transformation from one model into another
  – Map between two XML schemas
  – Map a SQL schema to an XML schema
  – Map data sources to a data warehouse
  – Map an ER diagram to a SQL schema
  – Map a process definition to a workflow script

5 Representation
• A model is a directed graph with one root.
[Figure: a relational schema Emp(E#, Dept#, Name) and an XSD Emp(E#, Dept#, Name(First, Last)), connected by mapping map1]
• A mapping is a model each of whose nodes connects nodes of two other models.
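A minimal sketch of this representation in Python, assuming node names are unique within a model and that each mapping node links exactly one node of the left model to one node of the right model. The classes Model, MapNode, and Mapping are hypothetical illustrations, not part of any published implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    """A rooted directed graph; nodes are identified by name (assumed unique per model)."""
    root: str
    edges: dict[str, list[str]] = field(default_factory=dict)  # parent -> children

    def add(self, parent: str, child: str) -> None:
        self.edges.setdefault(parent, []).append(child)

    def nodes(self) -> list[str]:
        """All nodes reachable from the root, in depth-first preorder."""
        seen: set[str] = set()
        order: list[str] = []
        stack = [self.root]
        while stack:
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            order.append(n)
            stack.extend(reversed(self.edges.get(n, [])))
        return order


@dataclass(frozen=True)
class MapNode:
    """A mapping node: connects one node of the left model to one node of the right model."""
    left: str
    right: str


@dataclass
class Mapping:
    left: Model
    right: Model
    nodes: set[MapNode] = field(default_factory=set)

    def connect(self, left_node: str, right_node: str) -> None:
        # Both endpoints must exist in the models this mapping connects.
        if left_node not in self.left.nodes() or right_node not in self.right.nodes():
            raise KeyError(f"unknown node: {left_node!r} or {right_node!r}")
        self.nodes.add(MapNode(left_node, right_node))


# The slide's example: a relational schema and an XSD for Emp.
rdb = Model("Emp")
for col in ["E#", "Dept#", "Name"]:
    rdb.add("Emp", col)
xsd = Model("Emp")
for el in ["E#", "Dept#", "Name"]:
    xsd.add("Emp", el)
for part in ["First", "Last"]:
    xsd.add("Name", part)

map1 = Mapping(rdb, xsd)
for n in ["Emp", "E#", "Dept#", "Name"]:
    map1.connect(n, n)
print(xsd.nodes())  # ['Emp', 'E#', 'Dept#', 'Name', 'First', 'Last']
```

Making the mapping a collection of first-class nodes, rather than a dictionary, is what lets a mapping itself be treated as a model by the other operators.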

6 Model Management Algebra
• Match
• Merge
• Compose
• Select
• Diff
• Enumerate
• ApplyFunction
• Copy
• Update operations

7 map = Match(M1, M2, ≈)
• Match(M1, M2, ≈) returns the best mapping between M1 and M2 with respect to ≈
[Figure: M1 = Emp(E#, Dept#, Name, Addr) and M2 = Emp(E#, Dept#, Name(First, Last), Phone); map1 links corresponding elements with "=" edges]
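A minimal sketch of Match, assuming the similarity criterion is plain string similarity of element names (Python's difflib ratio) and that "best mapping" means a greedy one-to-one choice of the highest-scoring pairs above a threshold. Structure is ignored; the function names and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two element names (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match(m1: list[str], m2: list[str], threshold: float = 0.8) -> set[tuple[str, str]]:
    """Greedy one-to-one Match: repeatedly take the highest-scoring unused pair."""
    scored = sorted(
        ((name_similarity(a, b), a, b) for a in m1 for b in m2),
        key=lambda t: t[0],
        reverse=True,
    )
    used1: set[str] = set()
    used2: set[str] = set()
    result: set[tuple[str, str]] = set()
    for score, a, b in scored:
        if score < threshold:
            break  # every remaining pair scores lower
        if a not in used1 and b not in used2:
            result.add((a, b))
            used1.add(a)
            used2.add(b)
    return result


m1 = ["Emp", "E#", "Dept#", "Name", "Addr"]
m2 = ["Emp", "E#", "Dept#", "Name", "First", "Last", "Phone"]
map1 = match(m1, m2)
print(sorted(map1))  # [('Dept#', 'Dept#'), ('E#', 'E#'), ('Emp', 'Emp'), ('Name', 'Name')]
```

Addr, First, Last, and Phone stay unmatched because no pair involving them reaches the threshold.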

8 M3 = Merge(M1, M2, map)
• Return the union of models M1 and M2
  – Use map to guide the Merge
  – If elements x = y in map, then collapse them into one element
[Figure: models Emp(Addr, Name) and Emp(Name, Phone), related by a map with "=" edges, merge into Emp(Name, Phone, Addr)]
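A minimal sketch of Merge on tree-shaped models stored as parent-to-children dictionaries, assuming the mapping is a set of (M1 element, M2 element) pairs and that unmapped names never clash across the two models. When two elements are equated, the merged node keeps M1's name; that tie-break is an assumption.

```python
Model = dict[str, list[str]]  # parent -> children
Pair = tuple[str, str]


def merge(m1: Model, m2: Model, mapping: set[Pair]) -> Model:
    """Union of m1 and m2; an m2 node mapped to an m1 node collapses into it (m1's name wins).

    Unmapped node names are assumed not to clash across the two models.
    """
    into_m1 = {b: a for a, b in mapping}  # m2 node -> the m1 node it collapses into

    def canon(n: str) -> str:
        return into_m1.get(n, n)

    merged: Model = {p: list(cs) for p, cs in m1.items()}
    for parent, children in m2.items():
        target = merged.setdefault(canon(parent), [])
        for child in children:
            if canon(child) not in target:
                target.append(canon(child))
    return merged


m1 = {"Emp": ["Addr", "Name"]}
m2 = {"Emp": ["Name", "Phone"]}
m3 = merge(m1, m2, {("Emp", "Emp"), ("Name", "Name")})
print(m3)  # {'Emp': ['Addr', 'Name', 'Phone']}
```

The mapping, not name equality, drives the collapse: an M2 element named Address would merge into Addr only if the mapping pairs them.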

9 Left Composition
[Figure: mapA (mapping elements a1, a2, a3) connects M1 = Emp(Addr(Street, City)) to M2 = Emp(Street, City); mapB (b1, b2, b3) connects M2 to M3 = Emp(StAddr, Town); the composed mapC (c1, c2, c3) connects M1 directly to M3]
mapC = mapA ∘ mapB
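A minimal sketch of composition, assuming mappings are sets of (source element, target element) pairs and using plain relational composition. The "left" composition on the slide may treat elements of mapA that have no continuation in mapB differently; this sketch simply drops them. The element pairs below are an illustration loosely based on the figure.

```python
Pair = tuple[str, str]


def compose(map1: set[Pair], map2: set[Pair]) -> set[Pair]:
    """Relational composition: (a, c) whenever map1 links a to b and map2 links b to c.

    Elements of map1 with no continuation in map2 are dropped.
    """
    successors: dict[str, set[str]] = {}
    for b, c in map2:
        successors.setdefault(b, set()).add(c)
    return {(a, c) for a, b in map1 for c in successors.get(b, set())}


mapA = {("Emp", "Emp"), ("Street", "Street"), ("City", "City")}  # M1 -> M2
mapB = {("Emp", "Emp"), ("Street", "StAddr"), ("City", "Town")}  # M2 -> M3
mapC = compose(mapA, mapB)                                       # M1 -> M3
print(sorted(mapC))  # [('City', 'Town'), ('Emp', 'Emp'), ('Street', 'StAddr')]
```

The argument order follows the algebra's Compose(map1, map2): first map1, then map2.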

10 Model Management Algebra
• map = Match(M1, M2, ≈)
• M3 = Merge(M1, M2, map)
• map3 = Compose(map1, map2)
• M2 = Select(M1, pred)
• M2 = Diff(M1, map)
• list = Enumerate(M)
• ApplyFunction(M, f)
• M2 = Copy(M1)
• Update operations
They’re generic, i.e., data-model independent … well … implemented on an extended ER model, with an extensibility story
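A minimal sketch of three of these operators (Enumerate, Select, Diff) on tree-shaped models stored as parent-to-children dictionaries. The side parameter of diff, which says whether the model is the left or right end of the mapping, is a hypothetical addition; so is the rule that Select drops a rejected node together with its subtree.

```python
from typing import Callable

Model = dict[str, list[str]]  # parent -> children; assumed to be a tree
Pair = tuple[str, str]


def enumerate_model(m: Model, root: str) -> list[str]:
    """Enumerate(M): every node reachable from root, in preorder."""
    out: list[str] = []
    stack = [root]
    while stack:
        n = stack.pop()
        out.append(n)
        stack.extend(reversed(m.get(n, [])))
    return out


def select(m: Model, root: str, pred: Callable[[str], bool]) -> Model:
    """Select(M, pred): keep the root and nodes satisfying pred; a rejected node's subtree goes too."""
    out: Model = {}
    stack = [root]
    while stack:
        n = stack.pop()
        kept = [c for c in m.get(n, []) if pred(c)]
        if kept:
            out[n] = kept
        stack.extend(kept)
    return out


def diff(m: Model, root: str, mapping: set[Pair], side: str = "left") -> list[str]:
    """Diff(M, map): nodes of M that take part in no element of the mapping.

    `side` says whether M is the left or right model of the mapping (hypothetical parameter).
    """
    idx = 0 if side == "left" else 1
    mapped = {pair[idx] for pair in mapping}
    return [n for n in enumerate_model(m, root) if n not in mapped]


xsd = {"Emp": ["E#", "Dept#", "Name", "Phone"], "Name": ["First", "Last"]}
print(enumerate_model(xsd, "Emp"))  # ['Emp', 'E#', 'Dept#', 'Name', 'First', 'Last', 'Phone']
print(select(xsd, "Emp", lambda n: n != "Name"))  # {'Emp': ['E#', 'Dept#', 'Phone']}
mapping = {("Emp", "Emp"), ("E#", "E#"), ("Dept#", "Dept#"), ("Name", "Name")}
print(diff(xsd, "Emp", mapping, side="right"))  # ['First', 'Last', 'Phone']
```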

11 Example
Given
– map1 from SQL schema rdb1 to xsd1
– xsd2, which is similar to xsd1
Produce
– a map between xsd2 and a relational schema
1. map2 = Match(xsd1, xsd2)
2. map3 = Compose(map1, map2)
3. map4 = Copy(map3), which also yields rdb2, a copy of rdb1
4. Use ApplyFunction(map4) to map each x in Diff(xsd2, map4) into rdb2
[Figure: rdb1, xsd1, xsd2, and rdb2, connected by map1 (rdb1 to xsd1), map2 (xsd1 to xsd2), map3 (rdb1 to xsd2), and map4 (rdb2 to xsd2)]
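The four steps can be traced end to end in a small sketch, assuming schemas are flat lists of element names, mappings are sets of pairs, Match is replaced by exact name equality, and ApplyFunction's per-element function is "add a column with the same name". All element names are hypothetical.

```python
Pair = tuple[str, str]


def match_by_name(m1: list[str], m2: list[str]) -> set[Pair]:
    """Stand-in for Match: pair elements whose names are identical."""
    names2 = set(m2)
    return {(a, a) for a in m1 if a in names2}


def compose(map1: set[Pair], map2: set[Pair]) -> set[Pair]:
    """(a, c) whenever map1 links a to b and map2 links b to c."""
    return {(a, c) for a, b in map1 for b2, c in map2 if b == b2}


# Given (all element names are hypothetical)
rdb1 = ["Emp", "EmpNo", "DeptNo", "EName"]
xsd1 = ["Emp", "E#", "Dept#", "Name"]
xsd2 = ["Emp", "E#", "Dept#", "Name", "Phone"]
map1 = {("Emp", "Emp"), ("EmpNo", "E#"), ("DeptNo", "Dept#"), ("EName", "Name")}

map2 = match_by_name(xsd1, xsd2)    # 1. map2 = Match(xsd1, xsd2)
map3 = compose(map1, map2)          # 2. map3 = Compose(map1, map2): rdb1 -> xsd2
rdb2, map4 = list(rdb1), set(map3)  # 3. Copy: rdb2 starts as a copy of rdb1
covered = {x for _, x in map4}
missing = [x for x in xsd2 if x not in covered]  # 4. Diff(xsd2, map4)
for x in missing:                   #    ApplyFunction: add a column for each missing x
    rdb2.append(x)
    map4.add((x, x))
print(rdb2)  # ['Emp', 'EmpNo', 'DeptNo', 'EName', 'Phone']
```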

12 Theme
• Classic meta data problems can be solved using Model Management operations
  – Schema integration
  – Schema evolution
  – Data migration
  – Reverse engineering
  – Data reintegration (3-way merging)
• Published solutions to these problems help us produce generic implementations of model management operations

13 Outline
• Overview of Model Management
• Solutions to classical meta data problems
  – Schema integration
  – Schema evolution
  – Reverse engineering
  – Data reintegration (3-way merging)
  – Data migration
• Recent technical results

14 Schema Integration
Given
– two view schemas, V1 and V2
Produce
– an integrated schema, S′
1. map = Match(V1, V2)
2. S = Merge(V1, V2, map)
3. ApplyFunction(S) // to resolve conflicts in S, producing S′

15 [Figure: views V1 and V2 of Emp, whose elements include E#, Dept#, Addr, Phone, Name, FirstName, and LastName; the merged schema S contains all of them]
1. map = Match(V1, V2), linking matching elements with "=" edges
2. S = Merge(V1, V2, map)
3. Use ApplyFunction(S) to resolve conflicts (e.g., between Name and FirstName/LastName, via a function f), producing S′
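A sketch of the three integration steps. The figure's exact layout did not survive extraction, so this assumes V1 = Emp(E#, Dept#, Addr, Name) and V2 = Emp(E#, Dept#, Phone, FirstName, LastName), matches by identical names, and uses a hypothetical conflict-resolution rule that nests FirstName and LastName under Name.

```python
Model = dict[str, list[str]]  # parent -> children
Pair = tuple[str, str]


def match_by_name(v1: Model, v2: Model) -> set[Pair]:
    """Stand-in for Match: equate elements with identical names."""
    names1 = {n for p, cs in v1.items() for n in [p, *cs]}
    names2 = {n for p, cs in v2.items() for n in [p, *cs]}
    return {(n, n) for n in names1 & names2}


def merge(v1: Model, v2: Model, mapping: set[Pair]) -> Model:
    """Union of v1 and v2, collapsing elements the mapping equates."""
    rename = {b: a for a, b in mapping}
    s: Model = {p: list(cs) for p, cs in v1.items()}
    for p, cs in v2.items():
        target = s.setdefault(rename.get(p, p), [])
        for c in cs:
            c = rename.get(c, c)
            if c not in target:
                target.append(c)
    return s


def resolve_name_conflict(s: Model, whole: str, parts: list[str]) -> Model:
    """ApplyFunction step (hypothetical rule): nest the part elements under the whole."""
    out = {p: [c for c in cs if c not in parts] for p, cs in s.items()}
    out[whole] = out.get(whole, []) + parts
    return out


v1 = {"Emp": ["E#", "Dept#", "Addr", "Name"]}
v2 = {"Emp": ["E#", "Dept#", "Phone", "FirstName", "LastName"]}
mapping = match_by_name(v1, v2)  # 1. Match
s = merge(v1, v2, mapping)       # 2. Merge
s_prime = resolve_name_conflict(s, "Name", ["FirstName", "LastName"])  # 3. ApplyFunction
print(s_prime)  # {'Emp': ['E#', 'Dept#', 'Addr', 'Name', 'Phone'], 'Name': ['FirstName', 'LastName']}
```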

16 Merging Knowledge Bases (Ontologies)
• Same as schema integration, but applied to ontologies
• The literature on merging ontologies focuses mostly on Match.

17 Schema Evolution
Given
– mapSV from schema S to view V
– a modified version S′ of S
Produce
– a mapping mapS′V from S′ to V (i.e., a view definition for V over S′)
1. mapS′S = Match(S′, S)
2. mapS′V = Compose(mapS′S, mapSV)
3. Use ApplyFunction(V) to delete elements not derivable from S′
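A sketch of the three schema-evolution steps, assuming mappings are sets of element pairs and that the result of Match(S′, S) is simply given (here E# was renamed EmpNo and Salary was dropped). All element names are hypothetical, and since Python identifiers cannot contain a prime, S′ is written S2.

```python
Pair = tuple[str, str]


def compose(map1: set[Pair], map2: set[Pair]) -> set[Pair]:
    """(a, c) whenever map1 links a to b and map2 links b to c."""
    return {(a, c) for a, b in map1 for b2, c in map2 if b == b2}


# Given (hypothetical names): S has E#, Name, Salary; view V exposes them as Id, Name, Pay.
map_SV = {("E#", "Id"), ("Name", "Name"), ("Salary", "Pay")}
V = ["Id", "Name", "Pay"]

# 1. Assume Match(S', S) found these correspondences.
map_S2S = {("EmpNo", "E#"), ("Name", "Name")}
# 2. map_S'V = Compose(map_S'S, map_SV)
map_S2V = compose(map_S2S, map_SV)
# 3. ApplyFunction(V): delete view elements that are no longer derivable from S'
derivable = {v for _, v in map_S2V}
V_new = [v for v in V if v in derivable]
print(sorted(map_S2V))  # [('EmpNo', 'Id'), ('Name', 'Name')]
print(V_new)            # ['Id', 'Name']
```

The composed mapping is exactly the new view definition: each surviving view element now points at the S′ element it is derived from.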

18 Outline
• Overview of Model Management
• Solutions to classical meta data problems
  – Schema integration
  – Schema evolution
  – Reverse engineering
  – Data reintegration (3-way merging)
  – Data migration
• Recent technical results

19 Reverse Engineering
Given
– Model M (e.g., an ER model)
– Model G (e.g., SQL) generated via mapMG from M
– A modified version G′ of G
Produce
– A modified version M′ of M that generates G′
1. mapGG′ = Match(G, G′)
2. mapMG′ = Compose(mapMG, mapGG′)
3. mapM′G′ = Copy(mapMG′)
4. Use ApplyFunction(mapM′G′) to reverse engineer each g in Diff(G′, mapM′G′) into M′
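A sketch of the reverse-engineering steps, assuming flat lists of element names, Match replaced by exact name equality, and a hypothetical reverse-engineering rule that turns each new SQL column TABLE.COL into a lower-case ER attribute. The model contents are invented for illustration, and primes are written as 2 in Python names.

```python
Pair = tuple[str, str]


def match_by_name(a: list[str], b: list[str]) -> set[Pair]:
    """Stand-in for Match: pair elements whose names are identical."""
    names_b = set(b)
    return {(x, x) for x in a if x in names_b}


def compose(map1: set[Pair], map2: set[Pair]) -> set[Pair]:
    """(a, c) whenever map1 links a to b and map2 links b to c."""
    return {(a, c) for a, b in map1 for b2, c in map2 if b == b2}


M = ["Employee", "employeeName"]          # ER entity and attribute (hypothetical)
G = ["EMP", "EMP.NAME"]                   # SQL table and column generated from M
map_MG = {("Employee", "EMP"), ("employeeName", "EMP.NAME")}
G2 = ["EMP", "EMP.NAME", "EMP.PHONE"]     # modified SQL schema G'

map_GG2 = match_by_name(G, G2)            # 1. mapGG' = Match(G, G')
map_MG2 = compose(map_MG, map_GG2)        # 2. mapMG' = Compose(mapMG, mapGG')
M2, map_M2G2 = list(M), set(map_MG2)      # 3. Copy: M' starts as a copy of M
covered = {g for _, g in map_M2G2}
for g in [g for g in G2 if g not in covered]:  # 4. Diff(G', mapM'G')
    attr = g.split(".")[-1].lower()       #    hypothetical rule: column -> attribute
    M2.append(attr)
    map_M2G2.add((attr, g))
print(M2)  # ['Employee', 'employeeName', 'phone']
```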

20 3-Way Merge (aka Reintegration)
Given
– a source schema S0
– two derived schemas S1 and S2
Produce
– a schema S3 that merges the changes of S1 and S2
In the steps below, O is the source schema S0, A and B are the derived schemas S1 and S2, and the final result is S3.
1. MapOA = Match(O, A) (based on OIDs)
2. MapOB = Match(O, B) (based on OIDs)
3. MapOA′ = ApplyFunction(MapOA) such that for each e ∈ MapOA, if domain(e) = range(e), then delete e (i.e., keep only the things changed in A)
4. MapOB′ = ApplyFunction(MapOB) such that for each e ∈ MapOB, if domain(e) = range(e), then delete e (i.e., keep only the things changed in B)
5. ChangedA = range(MapOA′)
6. ChangedB = range(MapOB′)
7. MapChAChB = Match(ChangedA, ChangedB)
8. MapChBChA = invert(MapChAChB)
9. A′ = Diff(ChangedA, ChangedB, MapChAChB) (changed in A but not changed in B)
10. B′ = Diff(ChangedB, ChangedA, MapChBChA) (changed in B but not changed in A)
11. MapA′B′ = Match(A′, B′) (by OIDs)
12. G = Merge(A′, B′, MapA′B′)
13. MapGA = Match(G, A)
14. GA′ = Merge(G, A, MapGA) with preference for A
15. MapGA′B′ = Match(GA′, B′)
16. GAB = Merge(GA′, B′, MapGA′B′) with preference for B
17. DeletedA = Diff(O, A, MapOA)
18. DeletedB = Diff(O, B, MapOB)
19. MapDeletedAChangedB = Match(DeletedA, ChangedB)
20. MapDeletedBChangedA = Match(DeletedB, ChangedA)
21. ShouldDeleteA = Diff(DeletedA, ChangedB, MapDeletedAChangedB)
22. ShouldDeleteB = Diff(DeletedB, ChangedA, MapDeletedBChangedA)
23. MapGABSDA = Match(GAB, ShouldDeleteA)
24. GABSDA = Diff(GAB, ShouldDeleteA, MapGABSDA)
25. MapGABSDASDB = Match(GABSDA, ShouldDeleteB)
26. Final result S3 = Diff(GABSDA, ShouldDeleteB, MapGABSDASDB)
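The 26 steps amount to a classic three-way merge. The sketch below is a direct three-way merge on OID-keyed elements, not the talk's step-by-step algorithm; it follows the recoverable rules (a change from either side carries over, and a deletion is applied only if the other side did not change the element). Representing a schema as a dictionary from OID to an element definition, and the prefer parameter for genuinely conflicting edits, are assumptions.

```python
Schema = dict[int, str]  # OID -> element definition (a simplifying assumption)


def three_way_merge(o: Schema, a: Schema, b: Schema,
                    prefer: str = "b") -> tuple[Schema, list[int]]:
    """Merge the changes that derived schemas a and b made to the source schema o.

    Returns the merged schema and the OIDs that a and b changed in different ways.
    """
    merged: Schema = {}
    conflicts: list[int] = []
    for oid in sorted(set(o) | set(a) | set(b)):
        base, va, vb = o.get(oid), a.get(oid), b.get(oid)  # None means "absent"
        changed_a, changed_b = va != base, vb != base
        if changed_a and changed_b and va != vb:
            if va is None:    # deleted in a, changed in b: the change wins
                result = vb
            elif vb is None:  # deleted in b, changed in a: the change wins
                result = va
            else:             # both edited differently: a real conflict
                conflicts.append(oid)
                result = vb if prefer == "b" else va
        elif changed_a:
            result = va
        else:                 # only b changed, or neither did (then vb == base)
            result = vb
        if result is not None:  # None here means the element was deleted
            merged[oid] = result
    return merged, conflicts


o = {1: "Emp", 2: "Name", 3: "Addr", 4: "Phone", 6: "Dept"}
a = {1: "Employee", 2: "Name", 4: "Phone", 5: "Email", 6: "Department"}  # renamed 1, deleted 3, added 5
b = {1: "Emp", 2: "FullName", 3: "Address", 6: "DeptNo"}                 # changed 2 and 3, deleted 4
merged, conflicts = three_way_merge(o, a, b)
print(merged)     # {1: 'Employee', 2: 'FullName', 3: 'Address', 5: 'Email', 6: 'DeptNo'}
print(conflicts)  # [6]
```

Element 3 illustrates steps 19 to 22: A deleted it, but B changed it, so it is not in ShouldDeleteA and B's version survives.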

21 Data Migration
Given
– a schema S and its database D
– an evolved schema S′
Produce
– a procedure for mapping D into an S′ database D′
1. mapSS′ = Match(S, S′)
2. Use Enumerate(S′) to generate a data migration script
3. Run the script on D to produce D′
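A sketch of script generation, assuming mapSS′ pairs columns written as Table.Column and the script is produced by enumerating the columns of S′ and looking up each column's source. Target columns without a source are filled with NULL, and each target table is assumed to draw from a single source table; real migrations would also need type conversions and joins. All table and column names are hypothetical.

```python
from collections import defaultdict

Pair = tuple[str, str]  # ("Table.Column" in S, "Table.Column" in S')


def migration_script(map_ss2: set[Pair], s2_columns: list[str]) -> list[str]:
    """Emit one INSERT ... SELECT per table of S', enumerating S' column by column.

    Target columns with no source in the mapping are filled with NULL. Each target
    table is assumed to draw from exactly one source table (no joins).
    """
    source_of = {t: s for s, t in map_ss2}
    by_table: dict[str, list[str]] = defaultdict(list)
    for col in s2_columns:  # Enumerate(S')
        by_table[col.split(".")[0]].append(col)
    script = []
    for table, cols in by_table.items():
        sources = [source_of.get(c) for c in cols]
        src_tables = {s.split(".")[0] for s in sources if s}
        if len(src_tables) != 1:
            raise ValueError(f"{table}: expected one source table, got {src_tables}")
        select_list = ", ".join(s.split(".")[1] if s else "NULL" for s in sources)
        target_cols = ", ".join(c.split(".")[1] for c in cols)
        script.append(f"INSERT INTO {table} ({target_cols}) "
                      f"SELECT {select_list} FROM {src_tables.pop()};")
    return script


map_ss2 = {("Emp.Id", "Employee.EmpNo"), ("Emp.Name", "Employee.Name")}  # from Match(S, S')
script = migration_script(map_ss2, ["Employee.EmpNo", "Employee.Name", "Employee.Email"])
print(script[0])  # INSERT INTO Employee (EmpNo, Name, Email) SELECT Id, Name, NULL FROM Emp;
```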

22 Data Translation
• Like data migration, except S and S′ are expressed in different data models.

23 Outline
• Overview of Model Management
• Solutions to classical meta data problems
• Recent technical results

24 Status Report
• Vision
  – [Bernstein, Halevy, & Pottinger, SIGMOD Record 12/00]
• Data Warehouse Examples
  – [Bernstein & Rahm, ER ’00]
• Match Operation
  – Survey: [Rahm & Bernstein, MSR Tech Report]
  – Prototype: [Madhavan, Bernstein, & Rahm, VLDB ’01]
• Merge Operation
  – coming soon …
• Theory
  – [Alagić & Bernstein, DBPL ’01]

25 Schema Matching Approaches
• About a dozen published algorithms
• Many good ideas, but none are robust
• Taxonomy of approaches:
  – Individual matchers
    – Schema-based, per-element: linguistic (names, descriptions); constraint-based (types, keys)
    – Schema-based, structural: constraint-based (graph matching)
    – Content-based, per-element: linguistic (IR: word frequencies, key terms); constraint-based (value pattern and ranges)
  – Combined matchers
    – Hybrid
    – Composite: manual composition, automatic composition
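A minimal sketch of a composite matcher in the sense of this taxonomy: two individual matchers (a per-element linguistic matcher on names and a per-element constraint-based matcher on data types) combined by a weighted average. The matchers, the 0.7/0.3 weights, and the example elements are illustrative assumptions.

```python
from difflib import SequenceMatcher
from typing import Callable

Element = tuple[str, str]  # (name, data type)
Matcher = Callable[[Element, Element], float]


def name_matcher(x: Element, y: Element) -> float:
    """Schema-based, per-element, linguistic: string similarity of names."""
    return SequenceMatcher(None, x[0].lower(), y[0].lower()).ratio()


def type_matcher(x: Element, y: Element) -> float:
    """Schema-based, per-element, constraint-based: data type equality."""
    return 1.0 if x[1] == y[1] else 0.0


def composite(matchers: list[Matcher], weights: list[float]) -> Matcher:
    """Combine independent matchers by a weighted average of their scores."""
    total = sum(weights)
    return lambda x, y: sum(w * m(x, y) for m, w in zip(matchers, weights)) / total


def best_matches(s1: list[Element], s2: list[Element], score: Matcher) -> dict[str, str]:
    """For each element of s1, the name of the highest-scoring element of s2."""
    return {x[0]: max(s2, key=lambda y: score(x, y))[0] for x in s1}


s1 = [("Phone", "varchar"), ("CustNo", "int")]
s2 = [("CustomerId", "int"), ("PhoneNumber", "varchar")]
combined = composite([name_matcher, type_matcher], [0.7, 0.3])
print(best_matches(s1, s2, combined))  # {'Phone': 'PhoneNumber', 'CustNo': 'CustomerId'}
```

A hybrid matcher would instead fold both criteria into a single algorithm; the composite form keeps each matcher independent, so matchers can be added or reweighted without touching the others.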

26 The CUPID Algorithm
[Figure: schema PO, with POShipTo and POBillTo each containing City and Street, matched against schema PurchaseOrder, with DeliverTo and InvoiceTo and an Address containing City and Street; the annotation "ssim++" marks where structural similarity is increased]
• Computes linguistic similarity of element pairs
• Computes structural similarity of element pairs
• Generates a mapping
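A heavily simplified sketch of the three phases, loosely following the Cupid paper's structure: linguistic similarity (here token overlap after a tiny hypothetical thesaurus), structural similarity of inner-node pairs as the fraction of their leaves with strong links, a weighted similarity combining the two, the "ssim++" step that boosts leaf pairs under strongly matched inner nodes, and a final leaf-level mapping. The thesaurus, thresholds, and tree encoding (paths separated by "/") are assumptions, not Cupid's actual values or data structures.

```python
import re
from itertools import product

Tree = dict[str, list[str]]  # node path -> child paths; a node with no entry is a leaf

# Illustrative thesaurus and parameters (assumptions, not the values used by Cupid).
SYNONYMS = {"po": {"purchase", "order"}, "ship": {"deliver"}, "bill": {"invoice"}}
W_STRUCT, TH_ACCEPT, TH_HIGH, C_INC = 0.5, 0.5, 0.7, 1.2


def tokens(path: str) -> set[str]:
    """Split a node's own name on CamelCase boundaries and normalize it with SYNONYMS."""
    name = path.rsplit("/", 1)[-1]
    out: set[str] = set()
    for tok in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", name):
        out |= SYNONYMS.get(tok.lower(), {tok.lower()})
    return out


def lsim(x: str, y: str) -> float:
    """Linguistic similarity: Jaccard overlap of normalized name tokens."""
    tx, ty = tokens(x), tokens(y)
    return len(tx & ty) / len(tx | ty)


def leaves(t: Tree, n: str) -> list[str]:
    kids = t.get(n, [])
    return [n] if not kids else [leaf for k in kids for leaf in leaves(t, k)]


def tree_match(t1: Tree, t2: Tree) -> set[tuple[str, str]]:
    all1 = set(t1) | {c for cs in t1.values() for c in cs}
    all2 = set(t2) | {c for cs in t2.values() for c in cs}
    leaves1 = sorted(n for n in all1 if n not in t1)
    leaves2 = sorted(n for n in all2 if n not in t2)
    ssim = {(x, y): 0.5 for x in leaves1 for y in leaves2}  # leaf types assumed compatible

    def wsim(x: str, y: str) -> float:
        return W_STRUCT * ssim[(x, y)] + (1 - W_STRUCT) * lsim(x, y)

    # Visit inner-node pairs from small subtrees to large ones (a stand-in for post-order).
    pairs = sorted(product(t1, t2), key=lambda p: len(leaves(t1, p[0])) + len(leaves(t2, p[1])))
    for s, t in pairs:
        ls, lt = leaves(t1, s), leaves(t2, t)
        linked = sum(any(wsim(x, y) >= TH_ACCEPT for y in lt) for x in ls) \
            + sum(any(wsim(x, y) >= TH_ACCEPT for x in ls) for y in lt)
        ssim[(s, t)] = linked / (len(ls) + len(lt))  # fraction of strongly linked leaves
        if wsim(s, t) > TH_HIGH:  # "ssim++": reinforce leaf pairs under a strong match
            for x, y in product(ls, lt):
                ssim[(x, y)] = min(1.0, ssim[(x, y)] * C_INC)

    # Mapping generation: each leaf of t1 takes its best leaf in t2, if good enough.
    mapping = set()
    for x in leaves1:
        best = max(leaves2, key=lambda y: wsim(x, y))
        if wsim(x, best) >= TH_ACCEPT:
            mapping.add((x, best))
    return mapping


po = {
    "PO": ["PO/POShipTo", "PO/POBillTo"],
    "PO/POShipTo": ["PO/POShipTo/City", "PO/POShipTo/Street"],
    "PO/POBillTo": ["PO/POBillTo/City", "PO/POBillTo/Street"],
}
P = "PurchaseOrder"
purchase_order = {
    P: [f"{P}/DeliverTo", f"{P}/InvoiceTo"],
    f"{P}/DeliverTo": [f"{P}/DeliverTo/Address"],
    f"{P}/InvoiceTo": [f"{P}/InvoiceTo/Address"],
    f"{P}/DeliverTo/Address": [f"{P}/DeliverTo/Address/City", f"{P}/DeliverTo/Address/Street"],
    f"{P}/InvoiceTo/Address": [f"{P}/InvoiceTo/Address/City", f"{P}/InvoiceTo/Address/Street"],
}
for pair in sorted(tree_match(po, purchase_order)):
    print(pair)
```

The point of the structural phase shows up in the result: every City looks the same linguistically, but the boost from matching POShipTo with DeliverTo (and POBillTo with InvoiceTo) sends each City and Street to the right context.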

27 M3 = Merge(M1, M2, map)
• [Buneman, Davidson, Kosky, EDBT 92]
  – Meta-model has aggregation & generalization only
  – Do a union and collapse objects having the same name
  – Fix-up step for inconsistencies created by merging
[Figure: classes X, Y, and Z related by edges labeled a in the input models; the fix-up introduces a new class W]
  – Successive fix-ups lead to different results
  – Batch them at the end, to produce a unique minimal result
• Now enrich the meta-model (containment, complex mappings) & merge semantics (conflicts, deletes)

28 A Formal Semantics for Model Management
• Use category theory for a data-model-independent characterization of models and mappings
  – Models and their DBs are categories
  – Model and data transformations are morphisms
  – Mappings between models & data are functors
• Utility
  – Define formal semantics for Match and Merge
  – Explain when Match & Merge preserve constraints
  – Check that the implementation satisfies the semantics

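29 [Figure: two categories linked by a functor Db. On the theory side, schemas Sch12, Sch1, Sch2, and Sch are related by morphisms f, g, and m, annotated Match and Merge; on the database side are Db(Sch1), Db(Sch12), and Db(Sch2), related by p and q]
• Goal: a mathematical semantics of the MM algebra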
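One standard categorical reading of these operators, offered here as an assumption rather than as the exact construction of Alagić & Bernstein, treats the result of Match as a span Sch1 ← Sch12 → Sch2 (morphisms f and g) and Merge as its pushout. The sketch computes that pushout in the category of finite sets: the disjoint union of Sch1 and Sch2 with f(z) and g(z) identified for every z in Sch12, using union-find.

```python
Tagged = tuple[str, str]  # ("1", x) for elements of Sch1, ("2", y) for elements of Sch2


def pushout(sch1: set[str], sch2: set[str],
            f: dict[str, str], g: dict[str, str]) -> list[frozenset[Tagged]]:
    """Pushout of the span Sch1 <-f- Sch12 -g-> Sch2 in the category of finite sets.

    f and g share their domain Sch12. The result is the disjoint union of Sch1 and Sch2
    with f(z) and g(z) identified for every z in Sch12; each equivalence class is one
    element of the merged schema.
    """
    parent: dict[Tagged, Tagged] = {("1", x): ("1", x) for x in sch1}
    parent.update({("2", y): ("2", y) for y in sch2})

    def find(u: Tagged) -> Tagged:
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for z in f:  # identify f(z) with g(z)
        ra, rb = find(("1", f[z])), find(("2", g[z]))
        if ra != rb:
            parent[ra] = rb
    classes: dict[Tagged, set[Tagged]] = {}
    for u in list(parent):
        classes.setdefault(find(u), set()).add(u)
    return sorted((frozenset(c) for c in classes.values()), key=sorted)


sch1 = {"Emp", "Name", "Addr"}
sch2 = {"Emp", "Name", "Phone"}
f = {"e": "Emp", "n": "Name"}  # Sch12 = {e, n}, as Match might produce
g = {"e": "Emp", "n": "Name"}
merged = pushout(sch1, sch2, f, g)
for cls in merged:
    print(sorted(cls))
```

Emp and Name collapse into single elements while Addr and Phone each stay on their own, which is the "union, collapsing matched elements" behavior of Merge, now characterized by a universal property instead of an algorithm.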

30 Implementation Vision
[Figure: components include a Model Manager (MM meta-model; operations Match, Merge, Compose, Copy, Apply, …; operation specializations; an inferencing engine), an object-oriented repository, an OR mapper, a SQL DBMS, a model-driven UI generator, and generic tools (browser, import/export, scripting, editors, catalogs); sample models show an order-processing workflow (Order Entry, Authorize Credit, Schedule Delivery, Inventory, Bill Customer, Marketing) and a schema fragment (emp, dept, dno, dna)]

31 Related Work
• There’s a lot of it. Apply it to model management!
• Platforms: OODBs, datalog, deductive OODBs (Telos/ConceptBase, F-Logic)
• Inferencing on mappings: AQUV (answering queries using views), description logic
• Transitive closure and recursive query processing
• Differencing: text, trees, graphs
• Data translation: algebras, schema evolution
• Data integration: schema match, view generation

32 Summary
• Raise the level of abstraction of meta-data programming by using:
  – models and mappings as objects
  – an algebra that manipulates models and mappings on a generic meta-model
• Classical meta data problems can be expressed using this algebra
• Implementations of classic problems offer guidance on implementing the algebra

33 References
http://www.research.microsoft.com/~philbe
• P. Bernstein, E. Rahm, “Data Warehouse Scenarios for Model Management,” ER 2000 Conference.
• P. Bernstein, A. Levy, R. Pottinger, “A Vision for Management of Complex Models,” SIGMOD Record, Dec. 2000.
• E. Rahm, P. Bernstein, “On Matching Schemas Automatically,” MSR Tech Report.
• J. Madhavan, P. Bernstein, E. Rahm, “Generic Schema Matching with Cupid,” VLDB 2001.
• S. Alagić, P. Bernstein, “A Model Theory for Generic Schema Management,” DBPL 2001.

