Model-Independent Schema and Data Translation Paolo Atzeni Università Roma Tre Based on Tutorial at ICDE 2006 Paper in EDBT 2006 (with P. Cappellari and.

Slides:



Advertisements
Similar presentations
Three-Step Database Design
Advertisements

Francesca Bugiotti Università Roma Tre 1 17/12/2009.
Data integration and transformation Paolo Atzeni Dipartimento di Informatica e Automazione Università Roma Tre 29/09/2010.
Information capacity in schema and data translation Paolo Atzeni Based on work done with L. Bellomarini, P. Bernstein, F. Bugiotti, P. Cappellari, G. Gianforme,
Database Systems: Design, Implementation, and Management Tenth Edition
Database Systems: Design, Implementation, and Management Ninth Edition
Introduction to Databases
Presented by: Thabet Kacem Spring Outline Contributions Introduction Proposed Approach Related Work Reconception of ADLs XTEAM Tool Chain Discussion.
Ch5: ER Diagrams - Part 1 Much of the material presented in these slides was developed by Dr. Ramon Lawrence at the University of Iowa.
SRDC Ltd. 1. Problem  Solutions  Various standardization efforts ◦ Document models addressing a broad range of requirements vs Industry Specific Document.
ICS (072)Database Systems: A Review1 Database Systems: A Review Dr. Muhammad Shafique.
System Analysis - Data Modeling
CS 425/625 Software Engineering System Models
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 7 Data Modeling Using the Entity- Relationship (ER) Model.
File Systems and Databases
Organizing Data & Information
Introduction to Databases Transparencies
Model-independent schema and data translation Paolo Atzeni Based on work done with L. Bellomarini, P. Bernstein, F. Bugiotti, P. Cappellari, G. Gianforme,
Database Systems: Design, Implementation, and Management Eighth Edition Chapter 2 Data Models Database Systems, 8th Edition 1.
Model-independent Schema and Data Translation Paolo Atzeni 14/10/2009.
ANHAI DOAN ALON HALEVY ZACHARY IVES Chapter 6: General Schema Manipulation Operators PRINCIPLES OF DATA INTEGRATION.
Copyright 2002 Prentice-Hall, Inc. Modern Systems Analysis and Design Third Edition Jeffrey A. Hoffer Joey F. George Joseph S. Valacich Chapter 10 Structuring.
CSE 590DB: Database Seminar Autumn 2002: Meta Data Management Phil Bernstein Microsoft Research.
Architectural Design.
Model-independent Schema and Data Translation Paolo Atzeni Dipartimento di Informatica e Automazione Università Roma Tre 29/09/2010.
Computer System Analysis Chapter 10 Structuring System Requirements: Conceptual Data Modeling Dr. Sana’a Wafa Al-Sayegh 1 st quadmaster University of Palestine.
CSE314 Database Systems Data Modeling Using the Entity- Relationship (ER) Model Doç. Dr. Mehmet Göktürk src: Elmasri & Navanthe 6E Pearson Ed Slide Set.
Database System Concepts and Architecture Lecture # 3 22 June 2012 National University of Computer and Emerging Sciences.
Databases From A to Boyce Codd. What is a database? It depends on your point of view. For Manovich, a database is a means of structuring information in.
Aurora: A Conceptual Model for Web-content Adaptation to Support the Universal Accessibility of Web-based Services Anita W. Huang, Neel Sundaresan Presented.
Chapter 3 The Relational Model Transparencies Last Updated: Pebruari 2011 By M. Arief
 Copyright 2005 Digital Enterprise Research Institute. All rights reserved. Towards Translating between XML and WSML based on mappings between.
2 1 Chapter 2 Data Models Database Systems: Design, Implementation, and Management, Seventh Edition, Rob and Coronel.
Database System Concepts and Architecture
Database Systems: Design, Implementation, and Management Ninth Edition
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 7 Data Modeling Using the Entity- Relationship (ER) Model.
Database Processing: Fundamentals, Design and Implementation, 9/e by David M. KroenkeChapter 2/1 Copyright © 2004 Please……. No Food Or Drink in the class.
A Z Approach in Validating ORA-SS Data Models Scott Uk-Jin Lee Jing Sun Gillian Dobbie Yuan Fang Li.
Concepts and Terminology Introduction to Database.
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition, Turban, Aronson, and Liang 5-1 Chapter 5 Business Intelligence: Data.
Model-independent schema and data translation: a run-time approach Paolo Atzeni Based on work done with L. Bellomarini, P. Bernstein, F. Bugiotti, P. Cappellari,
11 CORE Architecture Mauro Bruno, Monica Scannapieco, Carlo Vaccari, Giulia Vaste Antonino Virgillito, Diego Zardetto (Istat)
Lecture2: Database Environment Prepared by L. Nouf Almujally & Aisha AlArfaj 1 Ref. Chapter2 College of Computer and Information Sciences - Information.
1 Relational Databases and SQL. Learning Objectives Understand techniques to model complex accounting phenomena in an E-R diagram Develop E-R diagrams.
 Three-Schema Architecture Three-Schema Architecture  Internal Level Internal Level  Conceptual Level Conceptual Level  External Level External Level.
ICS (072)Database Systems: An Introduction & Review 1 ICS 424 Advanced Database Systems Dr. Muhammad Shafique.
Lecture # 3 & 4 Chapter # 2 Database System Concepts and Architecture Muhammad Emran Database Systems 1.
5 - 1 Copyright © 2006, The McGraw-Hill Companies, Inc. All rights reserved.
DataBase Management System What is DBMS Purpose of DBMS Data Abstraction Data Definition Language Data Manipulation Language Data Models Data Keys Relationships.
Modeling Component-based Software Systems with UML 2.0 George T. Edwards Jaiganesh Balasubramanian Arvind S. Krishna Vanderbilt University Nashville, TN.
MANAGING DATA RESOURCES ~ pertemuan 7 ~ Oleh: Ir. Abdul Hayat, MTI.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 7 Data Modeling Using the Entity- Relationship (ER) Model.
Issues in Ontology-based Information integration By Zhan Cui, Dean Jones and Paul O’Brien.
3/6: Data Management, pt. 2 Refresh your memory Relational Data Model
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 7 Data Modeling Using the Entity- Relationship (ER) Model.
Jemerson Pedernal IT 2.1 FUNDAMENTALS OF DATABASE APPLICATIONS by PEDERNAL, JEMERSON G. [BS-Computer Science] Palawan State University Computer Network.
From Use Cases to Implementation 1. Structural and Behavioral Aspects of Collaborations  Two aspects of Collaborations Structural – specifies the static.
Copyright 2002 Prentice-Hall, Inc. Modern Systems Analysis and Design Third Edition Jeffrey A. Hoffer Joey F. George Joseph S. Valacich Chapter 10 Structuring.
Semantic Interoperability in GIS N. L. Sarda Suman Somavarapu.
Management of schema translations in a model generic framework Paolo Atzeni Università Roma Tre Joint work with Paolo Cappellari, Giorgio Gianforme (Università.
LECTURE TWO Introduction to Databases: Data models Relational database concepts Introduction to DDL & DML.
Data Models. 2 The Importance of Data Models Data models –Relatively simple representations, usually graphical, of complex real-world data structures.
1 © 2013 Cengage Learning. All Rights Reserved. This edition is intended for use outside of the U.S. only, with content that may be different from the.
From Use Cases to Implementation 1. Mapping Requirements Directly to Design and Code  For many, if not most, of our requirements it is relatively easy.
XML and Distributed Applications By Quddus Chong Presentation for CS551 – Fall 2001.
Data Resource Management Data Concepts Database Management Types of Databases Chapter 5 McGraw-Hill/Irwin Copyright © 2007 by The McGraw-Hill Companies,
MANAGING DATA RESOURCES
File Systems and Databases
Chapter 4 Entity Relationship (ER) Modeling
Presentation transcript:

Model-Independent Schema and Data Translation Paolo Atzeni Università Roma Tre Based on Tutorial at ICDE 2006 Paper in EDBT 2006 (with P. Cappellari and P. Bernstein) JISBD Sitges, 5 October 2006

P. AtzeniJISBD - October 5, Outline Introduction Model management Schema and data translation: the problem A metamodel based approach

P. AtzeniJISBD - October 5, A ten-year goal for database research The “Asilomar report” (Bernstein et al. Sigmod Record –The information utility: make it easy for everyone to store, organize, access, and analyze the majority of human information online

P. AtzeniJISBD - October 5, A general framework: cooperation "The capacity of a system to interact (effectively) with other systems, possibly managed by different organizations" Forms of cooperation –Process-centered cooperation: the systems offer services one another, by exchanging messages, or by triggering activities, without making remote data explicitly visible –Data-centered cooperation: the systems offer data one another; data is distributed, heterogeneous and autonomous

P. AtzeniJISBD - October 5, Databases in the Internet era Databases before the Internet –An internal infrastructure, a precious resource, but usually hidden, with some controlled cooperation Internet changes the requirements –More users (not only humans), more diverse cooperating systems (distributed, heterogeneous, autonomous), more types of data "Future" Internet changes more –New devices (embedded everywhere), even more users (many “per person”), real mobility, need for personalization and adaptation

P. AtzeniJISBD - October 5, The most studied form of data-centered cooperation: integration We are interested in data-centered cooperation, often referred to as integration “The unification of related, heterogeneous data from disparate sources, for example, to enable collaboration” (Hammer & Stonebraker 2005) Some "paradigms" …

P. AtzeniJISBD - October 5, Multidatabase Global Manager Local mgr DB Mediator Local mgr DB Mediator Local mgr DB Mediator

P. AtzeniJISBD - October 5, Data Warehousing System Mediator Local manager Mediator Local manager Mediator Local manager “Integrator” DB Data Warehouse DW manager

P. AtzeniJISBD - October 5, Intermediate solutions in practice Mediator Local manager DB Local manager DB Integrator Mediator Local manager Mediator Local manager DB Local Manager Application

P. AtzeniJISBD - October 5, Peer-based architecture Peer mgr Local mgr DB Local mgr DB Mediator Peer mgr Local mgr DB Local mgr DB Mediator Peer mgr Local mgr DB Local mgr DB Mediator

P. AtzeniJISBD - October 5, Data is not just in databases Mail messages Web pages Spreadsheets Textual documents Palmtop devices, mobile phones Multimedia annotations (e.g., in digital photos) XML documents

P. AtzeniJISBD - October 5, The same data in the same form? Adaptivity: –Personalization: content adapted to the user upon system's decision upon user's request –Customization: structure adapted to the user according to the user's role upon user's request –Context dependence User, Device, Network, Place, Time, Rate

P. AtzeniJISBD - October 5, Summarizing: a general need We have data at various places, and data has to be –exchanged –replicated –migrated –integrated –adapted

P. AtzeniJISBD - October 5, A major difficulty Heterogeneity –System level –Model level –Structural (different structure for similar data) –Semantic (different meaning for the same structure) Many efforts, but current techniques are mostly manual and ad hoc

P. AtzeniJISBD - October 5, A direction for the solutions Be general! –ad hoc solution could work in-the-small, but they are repetitive and time consuming do not scale are not maintainable Historical notes: –W. C. McGee: Generalization: Key to Successful Electronic Data Processing. J. ACM 1959 Indeed, databases are the result of generalization applied to secondary storage management!

P. AtzeniJISBD - October 5, Generality requires … … high-level descriptions of problems within the family of interest: –Metadata: “data about data” (formal or informal) description of structures and meaning General solutions leverage on metadata (and then operate on data as a consequence)

P. AtzeniJISBD - October 5, Outline Introduction and motivation Model management Schema and data translation: the problem A metamodel based approach

P. AtzeniJISBD - October 5, A wider perspective (Generic) Model Management: –A proposal by Bernstein et al (2000 +) –Includes a set of operators on schemas and mappings between schemas

P. AtzeniJISBD - October 5, Terminology: a warning Model Mgmt peopleTraditional DB people Meta-metamodelMetamodel Model Schema

P. AtzeniJISBD - October 5, Schemas and mappings A simplified approach: –Schema: a set of elements, related in some way to one another –Mapping: a set of correspondences (pair of elements) or its reification, a third schema related to the other two via two sets of correspondences

P. AtzeniJISBD - October 5, Model mgmt operators, a first set map = Match (S1, S2) S3 = Merge (S1, S2, map) S2 = Diff (S1, map) and more –map3 = Compose (map1, map2) –S2 = Select (S1, pred) –Apply (S, f) –list = Enumerate (S) –S2 = Copy (S1) –…

P. AtzeniJISBD - October 5, Match map = Match (S1, S2) –given two schemas S1, S2 –returns a mapping between them the “classical” initial step in data integration: –find the common elements of two schemas and the correspondences between them

P. AtzeniJISBD - October 5, Merge S3 = Merge (S1, S2, map) –given two schemas and a mapping between them –returns a third schema (and two mappings) the “classical” second step in data integration: –given the correspondences, find a way to obtain one schema out of two

P. AtzeniJISBD - October 5, Diff S2 = Diff (S1, map) –given a schema and a mapping from it (to some other schema, not relevant) –returns a (sub-)schema, with the elements that do not participate in the mapping

P. AtzeniJISBD - October 5, DW2 Example (Bernstein and Rahm, ER 2000) A database (a “source”), a data warehouse and a mapping between the two we get a second source, with some similarity to the first one and we want to update the DW DB1 DW1 DB2

P. AtzeniJISBD - October 5, DW2 Example, the "solution" DB1 DW1 DB2 m1 m2 m3 DB2’ m2 = Match(DB1,DB2) m3= Compose(m2,m1) DB2’=Diff(DB2,m3) DW2’, m4 user defined m5 = Match(DW1,DW2’) DW2 = Merge(DW1,DW2’,m5) DW2’ m4 m5

P. AtzeniJISBD - October 5, Magic does not exist Operators might require human intervention: –Match is the main case Scripts involving operators might require human intervention as well (or at least benefit from it): –a full implementation of each operator might not always available –a mapping might require manual specification –incomparable alternatives might exist

P. AtzeniJISBD - October 5, The “data level” The major operators have also an extended version that operates on data, and not only on schemas Especially apparent for –Merge

P. AtzeniJISBD - October 5, We also have heterogeneity Round trip engineering (Bernstein, CIDR 2003) –A specification, an implementation –then a change to the implementation: want to revise the specification We need a translation from the implementation model to the specification one S1 I1I2 S2

P. AtzeniJISBD - October 5, Model management with heterogeneity The previous operators have to be “model generic” (capable of working on different models) We need a “translation” operator – = ModelGen (S1)

P. AtzeniJISBD - October 5, ModelGen, an additional operator = ModelGen (S1) –given a schema (in a model) –returns a schema (in a different data model) and a mapping between the two A “translation” from a model to another I should call it “SchemaGen” … We should better write – = ModelGen (S1,mod2)

P. AtzeniJISBD - October 5, S2 Round trip engineering S1 I1 m1 I2 m2 = Match (I1,I2) m3 = Compose (m1,m2) I2’= Diff(I2,m3) = Modelgen(I2’) … Match, Merge m2 m3 I2’ S2’ m4

P. AtzeniJISBD - October 5, Outline Introduction Model management Schema and data translation: the problem A metamodel based approach

P. AtzeniJISBD - October 5, Schema and data translation, a long standing issue Schema translation: –given schema S1 in model M1 and model M2 –find a schema S2 in M2 that “corresponds” to S1 Schema and data translation: –given also a database D1 for S1 –find also a database D2 for S2 that “contains the same data” to D1

P. AtzeniJISBD - October 5, Schema and data translation, a long standing issue Translations from a model to another have been studied since the 1970’s Whenever a new model is defined, techniques and tools to generate translations are studied However, proposals and solutions are usually model specific

P. AtzeniJISBD - October 5, Model specific solutions Given an ER schema, find the suitable relational schema that “implements” it –the original paper (Chen 1976) contains the basics –further discussions by many (e.g. Markowitz and Shoshani 1989) –illustrated in every textbook Similarly with –any other conceptual model and any other logical one –XML and relational (or object)

P. AtzeniJISBD - October 5, Another problem in the picture: data exchange Given a source S1 and a target schema S2 (in different models or even in the same one), find a translation, that is, a function that given a database D1 for S1 produces a database D2 for S2 that “correspond” to D1 Often emphasized with reference to materialized solutions

P. AtzeniJISBD - October 5, Integration Given two or more sources, build an integrated schema or database

P. AtzeniJISBD - October 5, Schema translation Given a schema find another one with respect to some specific goal (better quality, another model, …)

P. AtzeniJISBD - October 5, Data exchange Given a source and a target schema, find a transformation from the former to the latter

P. AtzeniJISBD - October 5, Schema translation and data exchange Can be seen a complementary: –Data translation = schema translation + data exchange Given a source schema and database Schema translation produces the target schema Data exchange generates the target database In model management terms we could write –Schema translation: = ModelGen (S1,mod2) –Data exchange: i2 = DataGen (S1,i1,S2,map12)

P. AtzeniJISBD - October 5, Outline Introduction Model management Schema and data translation: the problem A metamodel based approach

P. AtzeniJISBD - October 5, The problem ModelGen –given two data models M1 and M2, and a schema S1 of M1 (the source schema and model), generate a schema S2 of M2 (the target schema and model), corresponding to S1 and, for each database D1 over S1, generate an equivalent database D2 over S2

P. AtzeniJISBD - October 5, We have been doing this for a while Initial work more than ten years ago (Atzeni & Torlone, 1996) Major novelty recently –translation of both schemas and data –data-level translations generated, from schema-level ones Moreover: –a visible, multilevel and (in part) self-generating dictionary –high-level, visible and customizable translation rules in Datalog with OID-invention –mappings between elements generated as a by-product

P. AtzeniJISBD - October 5, Heterogeneity We need to handle artifacts and data in various models –Data are defined wrt to schemas –Schemas are defined wrt to models –How models can be defined? Models Schemas Data

P. AtzeniJISBD - October 5, A metamodel approach The constructs in the various models are rather similar: –can be classified into a few categories (Hull & King 1986): Lexical: set of printable values (domain) Abstract (entity, class, …) Aggregation: a construction based on (subsets of) cartesian products (relationship, table) Function (attribute, property) Hierarchies … We can fix a set of metaconstructs (each with variants): –lexical, abstract, aggregation, function,... –the set can be extended if needed, but this will not be frequent A model is defined in terms of the metaconstructs it uses

P. AtzeniJISBD - October 5, The metamodel approach, example The ER model: –Abstract (called Entity) –Function from Abstract to Lexical (Attribute) –Aggregation of abstracts (Relationship) –… The OR model: –Abstract (Table with ID) –Function from Abstract to Lexical (value-based Attribute) –Function from Abstract to Abstract (reference Attribute) –Aggregation of lexicals (value-based Table) –Component of Aggregation of Lexicals (Column) –…

P. AtzeniJISBD - October 5, The supermodel A model that includes all the meta-constructs (in their most general forms) –Each model is subsumed by the supermodel (modulo construct renaming) –Each schema for any model is also a schema for the supermodel (modulo construct renaming)

P. AtzeniJISBD - October 5, A Multi-Level Dictionary Handles models, schemas and data Has both a model specific and a model independent component

P. AtzeniJISBD - October 5, Multi-Level Dictionary model schema model generic model specific description model independence Supermodel description (mSM) Model descriptions (mM) Supermodel schemas (SM) Model specific schemas (M) Supermodel instances (i-SM) Model specific instances (i-M) data

P. AtzeniJISBD - October 5, The supermodel description MSM-Construct OIDNameIsLex 1AggregationOfLexicalsF 2ComponentOfAggrOfLexT 3AbstractF 4AttributeOfAbstractT 5BinaryAggregationOfAbstractsF MSM-Property OIDNameConstr.Type 11Name1String 12Name2String 13IsKey2Boolean 14IsNullable2Boolean 15Type2String 16Name3String 17Name4String 18IsIdentifier4Boolean 19IsNullable4Boolean 20Type4String 21IsFunct15Boolean 22IsOptional15Boolean 23Role15String 24IsFunct25Boolean 25IsOptional25Boolean 26Role25String MSM-Reference OIDNameConstructTarget 30Aggregation21 31Abstract43 32Abstract153 33Abstract253 mSMmM SMM i-SMi-M

P. AtzeniJISBD - October 5, Schemas in the supermodel SM-Abstract OIDSchemaName 3011Employees 3021Departments 2013Clerks 2023Offices SM-AttributeOfAbstract OIDSchemaNameisIdentisNullableTypeAbstrOID 4011EmpNoTFInt NameFFText NameTFChar AddressFFText CodeTFInt201 ………………… MSM-Construct OIDNameIsLex ……… 3AbstractF 4AttributeOfAbstractT ……… Supermodel schemas mSMmM SMM i-SMi-M Employees Departments EmpNo Name Address

P. AtzeniJISBD - October 5, i-SM-Abstract OIDAbsOIDInstOID ……… i-SM-AttributeOfAbstract OIDAttrOfAbsOIDInstOIDValuei- AbsOID John Doe3010 … Bob White3011 …………… Instances in the supermodel SM-Abstract OIDSchemaName 3011Employees 3021Departments 2013Clerks 2023Offices SM-AttributeOfAbstract OIDSchemaNameisIdentisNullableTypeAbstrOID 4011EmpNoTFInt NameFFText NameTFChar AddressFFText CodeTFInt201 ………………… Supermodel schemas Supermodel instances mSMmM SMM i-SMi-M

P. AtzeniJISBD - October 5, Multi-Level Repository model schema model generic model specific description model independence Supermodel description (mSM) Model descriptions (mM) Supermodel schemas (SM) Model specific schemas (M) Supermodel instances (i-SM) Model specific instances (i-M) data

P. AtzeniJISBD - October 5, Model descriptions MSM-Construct OIDNameIsLex 1AggregationOfLexicalsF 2ComponentOfAggrOfLexT 3AbstractF 4AttributeOfAbstractT 5BinaryAggregationOfAbstractsF MM-Construct OIDModelMSM-ConstrName 123ER_Entity 224ER_Attribute 325ER_Relationship 411Rel_Table 512Rel_Column 633OO_Class MM-Model OIDName 1Relational 2Entity-Relationship 3Object MSM-Property OIDNameConstructType ………… MSM-Reference OIDNameConstruct… ………… MM-Property ………… MM-Reference ………… mSMmM SMM i-SMi-M

P. AtzeniJISBD - October 5, Multi-Level Repository, generation and use model schema model generic model specific description model independence Supermodel description (mSM) Model descriptions (mM) Supermodel schemas (SM) Model specific schemas (M) Supermodel instances (i-SM) Model specific instances (i-M) data Structure fixed, content provided by tool designers Structure generated by the tool from the content of mSM Structure fixed, content provided by model designers out of mSM Structure generated by the tool from the content of mM

P. AtzeniJISBD - October 5, The metamodel approach, translations The constructs in the various models are rather similar: –can be classified into a few categories (“metaconstructs'') –translations can be defined on metaconstructs, and there are “standard”, accepted ways to deal with translations of metaconstructs they can be performed within the supermodel –each translation from the supermodel SM to a target model M is also a translation from any other model to M: given n models, we need n translations, not n 2

P. AtzeniJISBD - October 5, Generic translation environment 1. Copy 2. Translation 3. Copy Supermodel Source model Target model Translation: composition 1,2 & 3

P. AtzeniJISBD - October 5, Translations within the supermodel We still have too many models: –Just within simple ER model versions, we have 4 or 5 constructs, and each has several independent features which give rise to variants for example, relationships can be –binary or N-ary –with all possible cardinalities or without many-to- many –with or without the possibility of specifying optionality –with or without attributes –…–… –Combining all these, we get hundreds of models! –The management of a specific translation for each model would be hopeless

P. AtzeniJISBD - October 5, Translations, the approach Elementary translation steps to be combined Each translation step handles a supermodel construct (or a feature thereof) "to be eliminated" or "transformed" A translation is the concatenation of elementary translation steps

P. AtzeniJISBD - October 5, A complex translation, example (0,N) Eliminate N-ary relationships Eliminate attributes from relationships Eliminate many-to-many relationships Replace relationships with references Eliminate generalizations

P. AtzeniJISBD - October 5, Complex translations Binary ER w/o gen N-ary ER w/ gen Binary ER w/ gen Relational Bin ER w/ gen w/o attr on rel OO w/ gen N-ary ER w/o gen Bin ER w/o gen w/o attr on rel Bin ER w/o gen w/o M:N rel Elim. N-ary relationships Elim. Relationship attr.s Elim. M:N relationships Replace relationships with references Elim OO generalizations Elim ER generalizations Bin ER w/ gen w/o M:N rel OO w/o gen …

P. AtzeniJISBD - October 5, Translations Basic translations are written in a variant of Datalog, with OID invention –We specify them at the schema level –The tool "translates them down" to the data level –Some completion or tuning may be needed

P. AtzeniJISBD - October 5, A basic translation From (a simple) binary ER model to the relational model –a table for each entity –a column (in the table for E) for each attribute of an entity E –for each M:N relationship a table for the relationship columns … –for each 1:N and 1:1 relationship: a column for each attribute of the identifier …

P. AtzeniJISBD - October 5, A basic translation application Departments NameAddress Employees EmpNo Name Affiliation Departments 1,1 0,N Name Address Employees EmpNoNameAffiliation

P. AtzeniJISBD - October 5, A basic translation (in supermodel terms) From (a simple) binary ER model to the relational model –an aggregation of lexicals for each abstract –a component of the aggregation for each attribute of abstract –for each M:N aggregation of abstracts … … From (a simple) binary ER model to the relational model –a table for each entity –a column (in the table for E) for each attribute of an entity E –for each M:N relationship a table for the relationship columns … –for each 1:N and 1:1 relationship: a column for each attribute of the identifier …

P. AtzeniJISBD - October 5, "An aggregation of lexicals for each abstract" SM_AggregationOfLexicals( OID: #aggregationOID_1(OID), Name: name)  SM_Abstract ( OID: OID, Name: name ) ;

P. AtzeniJISBD - October 5, Datalog with OID invention Datalog (informally): –a logic programming language with no function symbols and predicates that correspond to relations in a database –we use a non-positional notation Datalog with OID invention: –an extension of Datalog that uses Skolem functions to generate new identifiers when needed Skolem functions: –injective functions that generate "new" values (value that do not appear anywhere else); so different Skolem functions have disjoint ranges

P. AtzeniJISBD - October 5, "An aggregation of lexicals for each abstract" SM_AggregationOfLexicals( OID: #aggregationOID_1(OID), Name: n)  SM_Abstract ( OID: OID, Name: n ) ; the value for the attribute Name is copied (by using variable n) the value for OID is "invented": a new value for the function #aggregationOID_1(OID) for each different value of OID, so a different value for each value of SM_Abstract.OID

P. AtzeniJISBD - October 5, "An aggregation of lexicals for each abstract" SM-Abstract OIDSchemaName 3011Employees 3021Departments ……… SM-AttributeOfAbstract OIDSchemaNameisIdentisNullableTypeAbstrOID 4011EmpNoTFInt NameFFText301 ………………… Employees EmpNo Name 11 Departments1002 Employees1001 SM_AggregationOfLexicals( OID: #aggregationOID_1(OID), Name: n)  SM_Abstract ( OID: OID, Name: n ) ; … …… 1001 SM-AggregationOfLexicals SchemaNameOID SM-aggregationOID_1_SK absOIDOID 301 Employees

P. AtzeniJISBD - October 5, "A component of the aggregation for each attribute of abstract" SM_ComponentOfAggregation… ( OID: #componentOID_1(attOID), Name: name, AggrOID: #aggregationOID_1(absOID), IsNullable: isNullable, IsKey: isIdent, Type : type ) ← SM_AttributeOfAbstract( OID: attOID, Name: name, AbstractOID: absOID, IsIdent: isIdent, IsNullable: isNullable, Type : type ) ; Skolem functions –are functions –are injective –have disjoint ranges the first function "generates" a new value the second "reuses" the value generated by the first rule

P. AtzeniJISBD - October 5, A component of the aggregation for each attribute of abstract" SM-Abstract OIDSchemaName 3011Employees 3021Departments ……… Employees EmpNo Name 11 Departments1002 Employees1001 … …… 1001 SM-AggregationOfLexicals SchemaNameOID SM-aggregationOID_1_SK absOIDOID 301 SM_ComponentOfAggregation… ( OID: #componentOID_1(attOID), Name: name, AggrOID: #aggregationOID_1(absOID), IsNullable: isNullable, IsKey: isIdent, Type : type ) ← SM_AttributeOfAbstract( OID: attOID, Name: name, AbstractOID: absOID, IsIdent: isIdent, IsNullable: isNullable, Type : type ) ; SM-componentOID_1_SK absOIDOID Employees EmpNoName SM-AttributeOfAbstract OIDSchemaNameisIdentisNullableTypeAbstrOID 4011EmpNoTFInt NameFFText301 ………………… SM-ComponentOfAggregationOfLexicals SchemaTypeAggrOIDisNullableisIdentNameOID Text1001FFName IntFTEmpNo

P. AtzeniJISBD - October 5, Generating data-level translations Schema translation Data translation Same environment Same language High level translation specification Supermodel description (mSM) Supermodel schemas (SM) Supermodel instances (i-SM)

P. AtzeniJISBD - October 5, Translation rules, data level i-SM_ ComponentOfAggregation… ( OID: #i-componentOID_1 (i-attOID), i-AggrOID: #i-aggregationOID_1(i-absOID), ComponentOfAggregationOfLexicalsOID: #componentOID_1(attOID), Value: Value ) ← i-SM_AttributeOfAbstract( OID: i-attOID, i-AbstractOID: i-absOID, AttributeOfAbstractOID: attOID, Value: Value ), SM_AttributeOfAbstract( OID: attOID, AbstractOID: absOID, Name: attName, IsNullable: isNull, IsID: isIdent, Type: type ) SM_ComponentOfAggregation… ( OID: #componentOID_1(attOID), Name: name, AggrOID: #aggregationOID_1(absOID), IsNullable: isNullable, IsKey: isIdent, Type : type ) ← SM_AttributeOfAbstract( OID: attOID, Name: name, AbstractOID: absOID, IsIdent: isIdent, IsNullable: isNullable, Type: type ) ;

P. AtzeniJISBD - October 5, Correctness Usually modelled in terms of information capacity equivalence/dominance (Hull 1986, Miller 1993, 1994) Mainly negative results in practical settings that are non-trivial Probably hopeless to have correctness in general We follow an "axiomatic" approach: –We have to verify the correctness of the basic translations, and then infer that of complex ones

P. AtzeniJISBD - October 5, Experiments A significant set of models –ER (in many variants and extensions) –Relational –OR –XSD –UML

P. AtzeniJISBD - October 5, Summary ModelGen was studied a few years ago New interest on it within the "Model management" framework New approach –Translation of schema and data –Visible (and in part self generated) dictionary –Visible and modifiable rules –Skolem functions describe mappings

P. AtzeniJISBD - October 5, Conclusion The ten-year goal of the Asilomar report: –The information utility: make it easy for everyone to store, organize, access, and analyze the majority of human information online A lot of interesting work has been done but … …integration, translation, exchange are still difficult… … 2009 is approaching … we are late!

P. AtzeniJISBD - October 5, GrazzieGraciesGracias Grazie  Thank you

P. AtzeniJISBD - October 5,

Leftovers

P. AtzeniJISBD - October 5, i-SM-Abstract OIDAbsOIDInstOID ……… i-SM-AttributeOfAbstract OIDAttrOfAbsOIDInstOIDValuei- AbsOID John Doe Bob White CS3012 …………… Instances in our dictionary SM-Abstract OIDSchemaName 3011Employees 3021Departments SM-AttributeOfAbstract OIDSchemaNameisIdentisNullableTypeAbstrOID 4011EmpNoTFInt NameFFText NameTFChar302 ………………… mSMmM SMM i-SMi-M John Doe Bob White … CS

P. AtzeniJISBD - October 5, Translation of instances John Doe Bob White … CS Employees EmpNoNameAffiliation 75432John DoeCS Bob White Departments Name CS Departments NameAddress Employees EmpNo Name Affiliation Departments 1,1 0,N Name Address Employees EmpNoNameAffiliation

P. AtzeniJISBD - October 5, i-SM-Abstract OIDAbsOIDInstOID ……… i-SM-AttributeOfAbstract OIDAttrOfAbsOIDInstOIDValuei- AbsOID John Doe CS3012 …………… Instances in our dictionary John Doe Bob White … CS Employees EmpNoNameAffiliation 75432John DoeCS Departments Name CS i-SM-AggregationOfLexicals AggOID InstOIDOID 5010CS John Doe i-SM-ComponentOfAggregationOfLexicals i- AggOIDValueInstOIDCompOfAggOIDOID

P. AtzeniJISBD - October 5, i-SM-Abstract OIDAbsOIDInstOID ……… i-SM-AttributeOfAbstract OIDAttrOfAbsOIDInstOIDValuei- AbsOID John Doe CS3012 …………… Instances in our dictionary i-SM-AggregationOfLexicals AggrOID InstOIDOID John Doe CS i-SM-ComponentOfAggregationOfLexicals i- AggrOIDValueInstOIDCompOfAggOIDOID i-SM_ComponentOfAggregation… (OID: #i-componentOID_1 (i-attOID), i-AggrOID: #i-aggregationOID_1(i-absOID), ComponentOfAggregationOfLexicalsOID: #componentOID_1(attOID),Value: Val ) ← i-SM_AttributeOfAbstract( OID: i-attOID, i-AbstractOID: i-absOID, AttributeOfAbstractOID: attOID, Value: Val ), SM_AttributeOfAbstract( OID: attOID, AbstractOID: absOID)

P. AtzeniJISBD - October 5, The problem ModelGen (a model management operator, [Bernstein 2003]) –given two data models M1 and M2, and a schema S1 of M1 (the source schema and model), generate a schema S2 of M2 (the target schema and model), corresponding to S1 and, for each database D1 over S1, generate an equivalent database D2 over S2 Generality: what are the models we consider? Correctness: what do corresponding and equivalent mean?

P. AtzeniJISBD - October 5, Many artifacts and models for them Many notations, each with variants and conventions: –Object diagrams –Interface definitions –Database schemas –Web site layouts –Control flow diagrams –XML schemas –Form definitions –…

P. AtzeniJISBD - October 5, Many models just in the database world With different features and goals –semantic models and logical models: E-R, functional, (conceptual) object relational, object-relational, object, network –general purpose models (for all seasons) and problem oriented models (for specific contexts: DW, statistical, spatial, temporal) Variations of models –models with different levels of abstraction –versions within a family (e.g. many versions of the ER model) More models recently with the Web and XML

P. AtzeniJISBD - October 5, What do we need to exchange and translate In design: –Artifacts (schemas and other descriptions) In operations: –Data (in databases, files, documents, …)

P. AtzeniJISBD - October 5, Elementary steps There can be a set of predefined basic translations, for example –eliminate n-ary aggregations; replace them with binary ones (and abstracts) –eliminate binary aggregations; replace them with functions –eliminate functions to abstracts; replace them with aggregations –eliminate complex attributes; replace them with simple attributes and abstracts Assumed to be correct and so complex translations built over them are correct by definition (an “axiomatic” approach)

P. AtzeniJISBD - October 5, A complex translation From an N-ary ER model with generalizations to a simple Object model with only single valued references and no generalizations –Eliminate N-ary relationships (replaced by binary ones and new entities) –Eliminate attributes from relationships –Eliminate many-to-many relationships –Transform relationships to references –Eliminate generalizations

P. AtzeniJISBD - October 5, "An aggregation of lexicals for each M:N aggregation of abstracts" SM_AggregationOfLexicals( OID: #aggregationOID_2(aggOID), Name: n )  SM_BinaryAggregationOfAbstracts( OID: aggrOID, Name: n, isFun1: "False", isFun2: "False") ; another Skolem function #aggregationOID_2(): –generates a "table" for a "relationship" the rule applies only to M:N "relationships", due to the conditions on isFun1 and isFun2

P. AtzeniJISBD - October 5, Translation rules, data level i-SM_ ComponentOfAggregation… ( OID: #i-componentOID_1 (i-attOID), i-AggrOID: #i-aggregationOID_1(i-absOID), ComponentOfAggregationOfLexicalsOID: #componentOID_1(attOID), Value: Value ) ← … SM_ComponentOfAggregation… ( OID: #componentOID_1(attOID), AggrOID: #aggregationOID_1(absOID), Name: name, IsNullable: isNullable, IsKey: isIdent, Type : type ) ← …

P. AtzeniJISBD - October 5, Translation rules, data level ← i-SM_AttributeOfAbstract( OID: i-attOID, i-AbstractOID: i-absOID, AttributeOfAbstractOID: attOID, Value: Value ), SM_AttributeOfAbstract( OID: attOID, AbstractOID: absOID, Name: name, IsNullable: isNullable, IsID: isIdent, Type: type ) ← SM_AttributeOfAbstract( OID: attOID, AbstractOID: absOID, Name: name, IsIdent: isIdent, IsNullable: isNullable, Type: type ) ;

P. AtzeniJISBD - October 5, Management of translations Basic properties: –Correctness, minimality, … Construction of complex translations by picking basic translations in the library

P. AtzeniJISBD - October 5, ModelGen: the architecture

P. AtzeniJISBD - October 5, A simple example An object relational database, to be translated in a relational one Source: the OR-model Target: the relational model System managed ids

P. AtzeniJISBD - October 5, Example, 2 Does the OR model allow for keys? Assume EmpNo and Name are keys

P. AtzeniJISBD - October 5, Example, 3 Does the OR model allow for keys? Assume no keys are specified