Model-independent Schema and Data Translation Paolo Atzeni Dipartimento di Informatica e Automazione Università Roma Tre 29/09/2010.

Slides:



Advertisements
Similar presentations
Three-Step Database Design
Advertisements

Francesca Bugiotti Università Roma Tre 1 17/12/2009.
Data integration and transformation Paolo Atzeni Dipartimento di Informatica e Automazione Università Roma Tre 29/09/2010.
Università degli studi Roma Tre 1 Management of heterogeneity in the Semantic Web Paolo Atzeni Pierluigi Del Nostro Semantic Web and Databases Atlanta,
Information capacity in schema and data translation Paolo Atzeni Based on work done with L. Bellomarini, P. Bernstein, F. Bugiotti, P. Cappellari, G. Gianforme,
IS698: Database Management Min Song IS NJIT. The Relational Data Model.
Relational Schemas and Predicate Logic: Notation.
The Entity-Relationship (ER) Model
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 The Entity-Relationship Model Chapter 2.
Background information Formal verification methods based on theorem proving techniques and model­checking –to prove the absence of errors (in the formal.
The Relational Database Model
1 Class Number – CS 304 Class Name - DBMS Instructor – Sanjay Madria Instructor – Sanjay Madria Lesson Title – EER Model –21th June.
The Relational Model System Development Life Cycle Normalisation
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 7 Data Modeling Using the Entity- Relationship (ER) Model.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 8 The Enhanced Entity- Relationship (EER) Model.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 7 Conceptual Data Modeling Using Entities and Relationships.
UML CASE Tool. ABSTRACT Domain analysis enables identifying families of applications and capturing their terminology in order to assist and guide system.
Augmenting Traditional Conceptual Models to Accommodate XML Structures Stephen W. Liddle Information Systems Department Reema Al-Kamha & David W. Embley.
Model-independent schema and data translation Paolo Atzeni Based on work done with L. Bellomarini, P. Bernstein, F. Bugiotti, P. Cappellari, G. Gianforme,
Sunday, June 28, Entity-Relationship and Enhanced Entity-Relationship Conceptual Data Models (Chapters 6 & Section 7.1)
Describing Syntax and Semantics
Model-independent Schema and Data Translation Paolo Atzeni 14/10/2009.
Foundations This chapter lays down the fundamental ideas and choices on which our approach is based. First, it identifies the needs of architects in the.
Mapping ERM to relational database
Chapter 3: The Enhanced E-R Model
CSE 590DB: Database Seminar Autumn 2002: Meta Data Management Phil Bernstein Microsoft Research.
Data Modeling Using the Entity-Relationship Model
3 The Relational Model MIS 304 Winter Class Objectives That the relational database model takes a logical view of data That the relational model’s.
CSE314 Database Systems Data Modeling Using the Entity- Relationship (ER) Model Doç. Dr. Mehmet Göktürk src: Elmasri & Navanthe 6E Pearson Ed Slide Set.
Lecture 2 The Relational Model. Objectives Terminology of relational model. How tables are used to represent data. Connection between mathematical relations.
Chapter 4 The Relational Model.
Chapter 3 The Relational Model Transparencies Last Updated: Pebruari 2011 By M. Arief
11 1 Object oriented DB (not in book) Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel Learning objectives: What.
RAJIKA TANDON DATABASES CSE 781 – Database Management Systems Instructor: Dr. A. Goel.
1 ER Modeling BUAD/American University Entity Relationship (ER) Modeling.
A Z Approach in Validating ORA-SS Data Models Scott Uk-Jin Lee Jing Sun Gillian Dobbie Yuan Fang Li.
DATA-DRIVEN UNDERSTANDING AND REFINEMENT OF SCHEMA MAPPINGS Data Integration and Service Computing ITCS 6010.
An Algebra for Composing Access Control Policies (2002) Author: PIERO BONATTI, SABRINA DE CAPITANI DI, PIERANGELA SAMARATI Presenter: Siqing Du Date:
Model-independent schema and data translation: a run-time approach Paolo Atzeni Based on work done with L. Bellomarini, P. Bernstein, F. Bugiotti, P. Cappellari,
11 CORE Architecture Mauro Bruno, Monica Scannapieco, Carlo Vaccari, Giulia Vaste Antonino Virgillito, Diego Zardetto (Istat)
Data integration and transformation 3. Data Exchange Paolo Atzeni Dipartimento di Informatica e Automazione Università Roma Tre 28/10/2009.
1 Relational Databases and SQL. Learning Objectives Understand techniques to model complex accounting phenomena in an E-R diagram Develop E-R diagrams.
Model-Independent Schema and Data Translation Paolo Atzeni Università Roma Tre Based on Tutorial at ICDE 2006 Paper in EDBT 2006 (with P. Cappellari and.
Entity-Relationship Model Using High-Level Conceptual Data Models for Database Design Entity Types, Sets, Attributes and Keys Relationship Types, Sets,
Logical Database Design (1 of 3) John Ortiz Lecture 6Logical Database Design (1)2 Introduction  The logical design is a process of refining DB schema.
1 Functional Dependencies and Normalization Chapter 15.
UNIT_2 1 DATABASE MANAGEMENT SYSTEM[DBMS] [Unit: 2] Prepared By Lavlesh Pandit SPCE MCA, Visnagar.
Object-Oriented Modeling: Static Models. Object-Oriented Modeling Model the system as interacting objects Model the system as interacting objects Match.
Programming Languages and Design Lecture 3 Semantic Specifications of Programming Languages Instructor: Li Ma Department of Computer Science Texas Southern.
Chapter 9 Logical Database Design : Mapping ER Model To Tables.
11 CORE Architecture Mauro Bruno, Monica Scannapieco, Carlo Vaccari, Giulia Vaste Antonino Virgillito, Diego Zardetto (Istat)
CS263 Lecture 5: Logical Database Design Can express the structure of a relation by a Tuple, a shorthand notation Name of the relation is followed (in.
Karolina Muszyńska Based on: S. Wrycza, B. Marcinkowski, K. Wyrzykowski „Język UML 2.0 w modelowaniu SI”
Database Design – Lecture 6 Moving to a Logical Model.
Entity-Relation Model. E-R Model The Entity-Relationship (ER) model was originally proposed by Peter in 1976 ER model is a conceptual data model that.
Formal Verification. Background Information Formal verification methods based on theorem proving techniques and model­checking –To prove the absence of.
CSE 412/598 DATABASE MANAGEMENT COURSE NOTES 3. ENTITY-RELATIONSHIP CONCEPTUAL MODELING Department of Computer Science & Engineering Arizona State University.
1 6 Concepts of Database Management, 5 th Edition, Pratt & Adamski Chapter 6 Database Design 2: Design Methodology Spring 2006.
©Silberschatz, Korth and Sudarshan2.1Database System Concepts Chapter 2: Entity-Relationship Model Entity Sets Relationship Sets Mapping Constraints Keys.
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Module 8: Entity-Relationship.
Management of schema translations in a model generic framework Paolo Atzeni Università Roma Tre Joint work with Paolo Cappellari, Giorgio Gianforme (Università.
LECTURE TWO Introduction to Databases: Data models Relational database concepts Introduction to DDL & DML.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 The Relational Model Chapter 3.
1 The Relational Data Model David J. Stucki. Relational Model Concepts 2 Fundamental concept: the relation  The Relational Model represents an entire.
1 CS122A: Introduction to Data Management Lecture #4 (E-R  Relational Translation) Instructor: Chen Li.
Lecture # 14 Chapter # 5 The Relational Data Model and Relational Database Constraints Database Systems.
IT 5433 LM3 Relational Data Model. Learning Objectives: List the 5 properties of relations List the properties of a candidate key, primary key and foreign.
COP Introduction to Database Structures
Database Management System
ISC321 Database Systems I Chapter 10: Object and Object-Relational Databases: Concepts, Models, Languages, and Standards Spring 2015 Dr. Abdullah Almutairi.
Presentation transcript:

Model-independent Schema and Data Translation Paolo Atzeni Dipartimento di Informatica e Automazione Università Roma Tre 29/09/2010

Content Part 1: Background: –Data integration and transformationData integration and transformation Part 2: Core: –Model-independent Schema and Data Translation GII School 29/09/20102

References P. Atzeni, G. Gianforme, and P. Cappellari. A Universal Metamodel and Its Dictionary. Transactions on Large-Scale Data-and Knowledge-Centered Systems I, 2009A Universal Metamodel and Its Dictionary P. Atzeni, P. Cappellari, R. Torlone, P. A. Bernstein, and G. Gianforme. Model-independent schema translation. VLDB Journal, 2008Model-independent schema translation P. Atzeni, P. Cappellari, and P. A. Bernstein. Model independent schema and data translation. EDBT 2006.Model independent schema and data translation P. Atzeni, L. Bellomarini, F. Bugiotti and G. Gianforme. A runtime approach to model-independent schema and data translation. EDBT 2009A runtime approach to model-independent schema and data translation GII School 29/09/20103

4 Schema and data translation Schema translation: –given schema S1 in model M1 and model M2 –find a schema S2 in M2 that “corresponds” to S1 Schema and data translation: –given also a database D1 for S1 –find also a database D2 for S2 that “contains the same data” as D1

GII School 29/09/20105 A wider perspective (Generic) Model Management: –A proposal by Bernstein et al (2000 +) –Includes a set of operators on schemas and mappings between schemas –The main operators Match Merge Diff ModelGen (=schema translation)

GII School 29/09/20106 A long standing issue Translations from a model to another have been studied since the 1970’s Whenever a new model is defined, techniques and tools to generate translations are studied However, proposals and solutions are usually model specific: –Given an ER schema, find the suitable relational schema that “implements” it the original paper (Chen 1976) contains the basics further discussions by many (e.g. Markowitz and Shoshani 1989) illustrated in every textbook –Similarly with any other conceptual model and any other logical one XML and relational (or object)

GII School 29/09/20107 A simple example An object relational database, to be translated in a relational one Source: the OR-model Target: the relational model System managed ids used as references

GII School 29/09/20108 Example, 2 How do we traslate? –Two possibilities. Does the OR model allow for keys? –If YES (EmpNo and Name are keys)

GII School 29/09/20109 Example, 3 How do we traslate? –Two possibilities. Does the OR model allow for keys? –If NOT

GII School 29/09/ Many different models (plus variants …) OR Relational OO … ER XSD

GII School 29/09/ Heterogeneity We need to handle artifacts and data in various models –Data are defined wrt to schemas –Schemas wrt to models –How can models be defined? We need metamodels Data Models Schemas = metaschemas = metadata

GII School 29/09/ A metamodel approach The constructs in the various models are rather similar: –can be classified into a few categories (Hull & King 1986): Abstract (entity, class, …) Lexical: set of printable values (domain) Aggregation: a construction based on (subsets of) cartesian products (relationship, table) Function (attribute, property) Hierarchies … We can fix a set of metaconstructs (each with variants): –abstract, lexical, aggregation, function,... –the set can be extended if needed, but this will not be frequent A model is defined in terms of the metaconstructs it uses

GII School 29/09/ The metamodel approach, example The ER model: –Abstract (called Entity) –Function from Abstract to Lexical (Attribute) –Aggregation of abstracts (Relationship) –…–… The OR model: –Abstract (Table with ID) –Function from Abstract to Lexical (value-based Attribute) –Function from Abstract to Abstract (reference Attribute) –Aggregation of lexicals (value-based Table) –Component of Aggregation of Lexicals (Column) –…–…

GII School 29/09/ The supermodel A model that includes all the meta-constructs (in their most general forms) –Each model is subsumed by the supermodel (modulo construct renaming) –Each schema for any model is also a schema for the supermodel (modulo construct renaming) In the example, a model that generalizes OR and relational Each translation from the supermodel SM to a target model M is also a translation from any other model to M: –given n models, we need n translations, not n 2

GII School 29/09/ Generic translations 1. Copy 2. Translation 3. Copy Supermodel Source model Target model Translation: composition 1,2 & 3

GII School 29/09/ Translations within the supermodel We still have too many models: –we have few constructs, but each has several independent features which give rise to variants for example, within simple OR model versions, –Key may be specifiable or not –Generalizations may be allowed or not –Foreign keys may be used or not –Nesting may be used or not –Combining all these, we get hundreds of models! –The management of a specific translation for each model would be hopeless

GII School 29/09/ The metamodel approach, translations As we saw, the constructs in the various models are similar: –can be classified according to the metaconstructs –translations can be defined on metaconstructs, there are standard, known ways to deal with translations of constructs (or variants theoreof) Elementary translation steps can be defined in this way –Each translation step handles a supermodel construct (or a feature thereof) "to be eliminated" or "transformed" Then, elementary translation steps to be combined A translation is the concatenation of elementary translation steps

GII School 29/09/ Many different models OR Relational OO … ER XSD

GII School 29/09/ Many different models (and variants …) OR w/ PK, gen, ref, FK Relational … OR w/ PK, gen, ref OR w/ PK, gen, FK OR w/ PK, ref, FK OR w/ gen, ref OR w/ PK, FK OR w/ PK, ref OR w/ ref

GII School 29/09/ A complex translation, example (0,N) Target: simple object model Eliminate N-ary relationships Eliminate attributes from relationships Eliminate many-to-many relationships Replace relationships with references Eliminate generalizations

GII School 29/09/ Complex translations Binary ER w/o gen N-ary ER w/ gen Binary ER w/ gen Relational Bin ER w/ gen w/o attr on rel OO w/ gen N-ary ER w/o gen Bin ER w/o gen w/o attr on rel Bin ER w/o gen w/o M:N rel Elim. N-ary relationships Elim. Relationship attr.s Elim. M:N relationships Replace relationships with references Elim OO generalizations Elim ER generalizations Bin ER w/ gen w/o M:N rel OO w/o gen …

GII School 29/09/ A more complex example An object relational database, to be translated in a relational one –Source: an OR-model –Target: the relational model

GII School 29/09/ A more complex example, 2 ENG EMP DEPT Last Name School Dept Name Address ID Dept_ID Emp_ID Target: relational model Eliminate generalizations Add keys Replace refs with FKs

A more complex example, 3 GII School 29/09/ EMP Last Name ID Dept_ID DEPT Name Address ID ENG School ID Emp_ID Target: relational model Eliminate generalizations Add keys Replace refs with FKs Replace objects with tables

GII School 29/09/ Many different models (and variants …) OR w/ PK, gen, ref, FK Relational OR w/ PK, gen, ref OR w/ PK, gen, FK OR w/ PK, ref, FK OR w/ gen, ref OR w/ PK, FK OR w/ PK, ref OR w/ ref Eliminate generalizations Add keys Replace refs with FKs Replace objects with tables Source Target

GII School 29/09/ Translations in MIDST (our tool) Basic translations are written in a variant of Datalog, with OID invention –We specify them at the schema level –The tool "translates them down" to the data level (both in off-line and run-time manners, see later) –Some completion or tuning may be needed

GII School 29/09/ A Multi-Level Dictionary Handles models, schemas and data Has both a model specific and a model independent component Relational implementation, so Datalog rules can be easily specified

GII School 29/09/ A common dictionary (for an ER design tool) Entity OIDSchemaName 3011Employees 3021Departments 2013Clerks 2023Offices AttributeOfEntity OIDSchemaNameisIdentisNullableTypeEntity 4011EmpNoTFInt NameFFText NameTFChar AddressFFText CodeTFInt201 ………………… Relationship OIDSchemaNameEntity1Entity2 4011Memb301302…… ………………… EmployeesDepartments Memb Name Address Name EmpNo

GII School 29/09/ Multi-Level Dictionary model schema model generic model specific description model independence Supermodel description (mSM) Model descriptions (mM) Supermodel schemas (SM) Model specific schemas (M) Supermodel instances (i-SM) Model specific instances (i-M) data

GII School 29/09/ The supermodel description MSM-Construct OIDNameIsLex 1AggregationOfLexicalsF 2ComponentOfAggrOfLexT 3AbstractF 4AttributeOfAbstractT 5BinaryAggregationOfAbstractsF MSM-Property OIDNameConstr.Type 11Name1String 12Name2String 13IsKey2Boolean 14IsNullable2Boolean 15Type2String 16Name3String 17Name4String 18IsIdentifier4Boolean 19IsNullable4Boolean 20Type4String 21IsFunct15Boolean 22IsOptional15Boolean 23Role15String 24IsFunct25Boolean 25IsOptional25Boolean 26Role25String MSM-Reference OIDNameConstructTarget 30Aggregation21 31Abstract43 32Abstract153 33Abstract253 mSMmM SMM i-SMi-M

GII School 29/09/ Model descriptions MSM-Construct OIDNameIsLex 1AggregationOfLexicalsF 2ComponentOfAggrOfLexT 3AbstractF 4AttributeOfAbstractT 5BinaryAggregationOfAbstractsF MM-Construct OIDModelMSM-ConstrName 123ER_Entity 224ER_Attribute 325ER_Relationship 411Rel_Table 512Rel_Column 633OO_Class MM-Model OIDName 1Relational 2Entity-Relationship 3Object MSM-Property OIDNameConstructType ………… MSM-Reference OIDNameConstruct… ………… MM-Property ………… MM-Reference ………… mSMmM SMM i-SMi-M

GII School 29/09/ Schemas in a model ER-Entity OIDSchemaName 3011Employees 3021Departments 2013Clerks 2023Offices ER-AttributeOfEntity OIDSchemaNameisIdentisNullableTypeAbstrOID 4011EmpNoTFInt NameFFText NameTFChar AddressFFText CodeTFInt201 ………………… SM-Construct OIDNameIsLex ……… 3ER-EntityF 4ER-AttributeOfEntityT ……… ER schemas mSMmM SMM i-SMi-M Employees Departments EmpNo Name Address

GII School 29/09/ Schemas in the supermodel SM-Abstract OIDSchemaName 3011Employees 3021Departments 2013Clerks 2023Offices SM-AttributeOfAbstract OIDSchemaNameisIdentisNullableTypeAbstrOID 4011EmpNoTFInt NameFFText NameTFChar AddressFFText CodeTFInt201 ………………… MSM-Construct OIDNameIsLex ……… 3AbstractF 4AttributeOfAbstractT ……… Supermodel schemas mSMmM SMM i-SMi-M Employees Departments EmpNo Name Address

GII School 29/09/ i-SM-Abstract OIDAbsOIDInstOID ……… i-SM-AttributeOfAbstract OIDAttrOfAbsOIDInstOIDValuei- AbsOID John Doe3010 … Bob White3011 …………… Instances in the supermodel SM-Abstract OIDSchemaName 3011Employees 3021Departments 2013Clerks 2023Offices SM-AttributeOfAbstract OIDSchemaNameisIdentisNullableTypeAbstrOID 4011EmpNoTFInt NameFFText NameTFChar AddressFFText CodeTFInt201 ………………… Supermodel schemas Supermodel instances mSMmM SMM i-SMi-M

GII School 29/09/ Multi-Level Repository, generation and use model schema model generic model specific description model independence Supermodel description (mSM) Model descriptions (mM) Supermodel schemas (SM) Model specific schemas (M) Supermodel instances (i-SM) Model specific instances (i-M) data Structure fixed, content provided by tool designers Structure generated by the tool from the content of mSM Structure fixed, content provided by model designers out of mSM Structure generated by the tool from the content of mM

GII School 29/09/ Translations Basic translations are written in a variant of Datalog, with OID invention –We specify them at the schema level –The tool "translates them down" to the data level –Some completion or tuning may be needed

GII School 29/09/ A basic translation From (a simple) binary ER model to the relational model –a table for each entity –a column (in the table for E) for each attribute of an entity E –for each M:N relationship a table for the relationship columns … –for each 1:N and 1:1 relationship: a column for each attribute of the identifier …

GII School 29/09/ A basic translation application Departments NameAddress Employees EmpNo Name Affiliation Departments 1,1 0,N Name Address Employees EmpNoNameAffiliation

GII School 29/09/ A basic translation (in supermodel terms) From (a simple) binary ER model to the relational model –an aggregation of lexicals for each abstract –a component of the aggregation for each attribute of abstract –for each M:N aggregation of abstracts … … From (a simple) binary ER model to the relational model –a table for each entity –a column (in the table for E) for each attribute of an entity E –for each M:N relationship a table for the relationship columns … –for each 1:N and 1:1 relationship: a column for each attribute of the identifier …

GII School 29/09/ "An aggregation of lexicals for each abstract" SM_AggregationOfLexicals( OID: #aggregationOID_1(OID), Name: name)  SM_Abstract ( OID: OID, Name: name ) ;

GII School 29/09/ Datalog with OID invention Datalog (informally): –a logic programming language with no function symbols and predicates that correspond to relations in a database –we use a non-positional notation Datalog with OID invention: –an extension of Datalog that uses Skolem functions to generate new identifiers when needed Skolem functions: –injective functions that generate "new" values (value that do not appear anywhere else; so different Skolem functions have disjoint ranges

GII School 29/09/ "An aggregation of lexicals for each abstract" SM_AggregationOfLexicals( OID: #aggregationOID_1(OID), Name: n)  SM_Abstract ( OID: OID, Name: n ) ; the value for the attribute Name is copied (by using variable n) the value for OID is "invented": a new value for the function #aggregationOID_1(OID) for each different value of OID, so a different value for each value of SM_Abstract.OID

GII School 29/09/ "An aggregation of lexicals for each abstract" SM-Abstract OIDSchemaName 3011Employees 3021Departments ……… SM-AttributeOfAbstract OIDSchemaNameisIdentisNullableTypeAbstrOID 4011EmpNoTFInt NameFFText301 ………………… Employees EmpNo Name 11 Departments1002 Employees1001 SM_AggregationOfLexicals( OID: #aggregationOID_1(OID), Name: n)  SM_Abstract ( OID: OID, Name: n ) ; … …… 1001 SM-AggregationOfLexicals SchemaNameOID SM-aggregationOID_1_SK absOIDOID 301 Employees

GII School 29/09/ "A component of the aggregation for each attribute of abstract" SM_ComponentOfAggregation… ( OID: #componentOID_1(attOID), Name: name, AggrOID: #aggregationOID_1(absOID), IsNullable: isNullable, IsKey: isIdent, Type : type ) ← SM_AttributeOfAbstract( OID: attOID, Name: name, AbstractOID: absOID, IsIdent: isIdent, IsNullable: isNullable, Type : type ) ; Skolem functions –are functions –are injective –have disjoint ranges the first function "generates" a new value the second "reuses" the value generated by the first rule

GII School 29/09/ A component of the aggregation for each attribute of abstract" SM-Abstract OIDSchemaName 3011Employees 3021Departments ……… Employees EmpNo Name 11 Departments1002 Employees1001 … …… 1001 SM-AggregationOfLexicals SchemaNameOID SM-aggregationOID_1_SK absOIDOID 301 SM_ComponentOfAggregation… ( OID: #componentOID_1(attOID), Name: name, AggrOID: #aggregationOID_1(absOID), IsNullable: isNullable, IsKey: isIdent, Type : type ) ← SM_AttributeOfAbstract( OID: attOID, Name: name, AbstractOID: absOID, IsIdent: isIdent, IsNullable: isNullable, Type : type ) ; SM-componentOID_1_SK absOIDOID Employees EmpNoName SM-AttributeOfAbstract OIDSchemaNameisIdentisNullableTypeAbstrOID 4011EmpNoTFInt NameFFText301 ………………… SM-ComponentOfAggregationOfLexicals SchemaTypeAggrOIDisNullableisIdentNameOID Text1001FFName IntFTEmpNo

Many rules, how to choose? Models can be described in a compact way –the involved constructs and their properties (boolean propositions) Datalog rules can be "summarized" by means of "signatures" that describe the involved constructs and how they are transformed (in terms of boolean properties) GII School 29/09/201046

GII School 29/09/ Model Signatures M Rel = {T(true), C(true)} relational M RelNoN = {T(true), C(¬N)} relational with no nulls M ER = {E(true), A(true), R(true), A-R(true)} ER M ERsimple = {E(true), A(¬N), R(true)} M ERnoM2N = {E(true), A(true), R(F 1  F 2 ), A-R(true)} T table C column; ¬N: no null values allowed E entity R relationship ; F 1  F 2 : no many-to-many A attribute A-R attribute of relationship

GII School 29/09/ Rule signature –Body - B List of signatures of body constructs Applicability of the rule –Head - H Signature of the atom in the head Sure conditions of the result of the application of the rule –MAP Mapping between properties of body constructs and properties of the head construct Transfer of values from the body to the head of the rule

GII School 29/09/ Rule signature Example Relationship (OID:#relationship_1(eOid,rOid), Name: eN+rN, isOpt1: false, isFunct1: true, isIdent: true, isOpt2: isOpt, isFunct2: false, Entity1: #entity_1(rOid), Entity2: #entity_0(eOid))  Relationship (OID:,rOid, Name: rN, isOpt1: isOpt, isFunct1: false, isFunct2: false, Entity1: eOid), Entity (OID: eOid, Name:eN)

GII School 29/09/ Rule signature –Body Relationship (OID:#relationship_1(eOid,rOid), Name: eN+rN, isOpt1: false, isFunct1: true, isIdent: true, isOpt2: isOpt, isFunct2: false, Entity1: #entity_1(rOid), Entity2: #entity_0(eOid))  Relationship (OID:,rOid, Name: rN, isOpt1: isOpt, isFunct1: false, isFunct2: false, Entity1: eOid), Entity (OID: eOid, Name:eN)

GII School 29/09/ Rule signature –Head Relationship (OID:#relationship_1(eOid,rOid), Name: eN+rN, isOpt1: false, isFunct1: true, isIdent: true, isOpt2: isOpt, isFunct2: false, Entity1: #entity_1(rOid), Entity2: #entity_0(eOid))  Relationship (OID:,rOid, Name: rN, isOpt1: isOpt, isFunct1: false, isFunct2: false, Entity1: eOid), Entity (OID: eOid, Name:eN)

GII School 29/09/ Rule signature –MAP Relationship (OID:#relationship_1(eOid,rOid), Name: eN+rN, isOpt1: false, isFunct1: true, isIdent: true, isOpt2: isOpt, isFunct2: false, Entity1: #entity_1(rOid), Entity2: #entity_0(eOid))  Relationship (OID:,rOid, Name: rN, isOpt1: isOpt, isFunct1: false, isFunct2: false, Entity1: eOid), Entity (OID: eOid, Name:eN)

Reasoning Formal System –Compact representation of models and rules Based on logical formulas –Reasoning on data models Union, intersection, difference of models and schemas Applicability and application of rules and programs –Sound and complete with respect to the Datalog programs let us see 53GII School 29/09/2010

54 Reasoning on translations M 1 = SIG( M 1 ) description of model M 1 r P = SIG( P ) signature of Datalog program P M 2 = r P (M 1 ) application of the sig of P to the desc of M 1 S1  M1S1  M1 P S 2 = P (S 1 ) M 2 = r P (M 1 ) M 1 = SIG( M 1 ) r P = SIG( P ) Theorem Program P applied to schemas of M 1 generates schemas (and somehow all of them) that belong to a model M 2 whose description is M 2 = r P (M 1 ) schema model program S 2  M 2 M 2 = SIG( M 2 )

Reasoning Main application –We automatically find a sequence of basic translations to perform the transformation of a schema from a model to another, under suitable assumptions Observations –Few “families” of models ER, OO, Relational… Each family has a progenitor –Two kinds of Datalog programs Reduction Transformation 55GII School 29/09/2010

Reasoning Automatic Translation –3-step transformation Reduction within the source family Transformation from the source family to the target family Reduction within the target family M∗1M∗1 M∗2M∗2 M1M1 M'1M'1 M'2M'2 M2M2 T F1F1 F2F2 56GII School 29/09/2010

57 Correctness Usually modelled in terms of information capacity equivalence/dominance (Hull 1986, Miller 1993, 1994) Mainly negative results in practical settings that are non-trivial Probably hopeless to have correctness in general We follow an "axiomatic" approach: –We have to verify the correctness of the basic translations, and then infer that of complex ones

The data level So far, we have considered only schemas How do we translate data? Should we move data or translate it on the fly? GII School 29/09/201058

GII School 29/09/ Translations: off-line approach The dictionary has a third level, for data Basic translations are "translates them down" to the data level Some completion or tuning may be needed

GII School 29/09/ Multi-Level Dictionary model schema model generic model specific description model independence Supermodel description (mSM) Model descriptions (mM) Supermodel schemas (SM) Model specific schemas (M) Supermodel instances (i-SM) Model specific instances (i-M) data

GII School 29/09/ i-SM-Abstract OIDAbsOIDInstOID ……… i-SM-AttributeOfAbstract OIDAttrOfAbsOIDInstOIDValuei- AbsOID John Doe3010 … Bob White3011 …………… Instances in the supermodel SM-Abstract OIDSchemaName 3011Employees 3021Departments 2013Clerks 2023Offices SM-AttributeOfAbstract OIDSchemaNameisIdentisNullableTypeAbstrOID 4011EmpNoTFInt NameFFText NameTFChar AddressFFText CodeTFInt201 ………………… Supermodel schemas Supermodel instances mSMmM SMM i-SMi-M

GII School 29/09/ Generating data-level translations Schema translation Data translation Same environment Same language High level translation specification Supermodel description (mSM) Supermodel schemas (SM) Supermodel instances (i-SM)

GII School 29/09/ Translation rules, data level i-SM_ ComponentOfAggregation… ( OID: #i-componentOID_1 (i-attOID), i-AggrOID: #i-aggregationOID_1(i-absOID), ComponentOfAggregationOfLexicalsOID: #componentOID_1(attOID), Value: Value ) ← i-SM_AttributeOfAbstract( OID: i-attOID, i-AbstractOID: i-absOID, AttributeOfAbstractOID: attOID, Value: Value ), SM_AttributeOfAbstract( OID: attOID, AbstractOID: absOID, Name: attName, IsNullable: isNull, IsID: isIdent, Type: type ) SM_ComponentOfAggregation… ( OID: #componentOID_1(attOID), Name: name, AggrOID: #aggregationOID_1(absOID), IsNullable: isNullable, IsKey: isIdent, Type : type ) ← SM_AttributeOfAbstract( OID: attOID, Name: name, AbstractOID: absOID, IsIdent: isIdent, IsNullable: isNullable, Type: type ) ;

GII School 29/09/ i-SM-Abstract OIDAbsOIDInstOID ……… i-SM-AttributeOfAbstract OIDAttrOfAbsOIDInstOIDValuei- AbsOID John Doe Bob White CS3012 …………… Instances in our dictionary SM-Abstract OIDSchemaName 3011Employees 3021Departments SM-AttributeOfAbstract OIDSchemaNameisIdentisNullableTypeAbstrOID 4011EmpNoTFInt NameFFText NameTFChar302 ………………… mSMmM SMM i-SMi-M John Doe Bob White … CS

GII School 29/09/ Translation of instances John Doe Bob White … CS Employees EmpNoNameAffiliation 75432John DoeCS Bob White Departments Name CS Departments NameAddress Employees EmpNo Name Affiliation Departments 1,1 0,N Name Address Employees EmpNoNameAffiliation

GII School 29/09/ i-SM-Abstract OIDAbsOIDInstOID ……… i-SM-AttributeOfAbstract OIDAttrOfAbsOIDInstOIDValuei- AbsOID John Doe CS3012 …………… Instances in our dictionary John Doe Bob White … CS Employees EmpNoNameAffiliation 75432John DoeCS Departments Name CS i-SM-AggregationOfLexicals AggOID InstOIDOID 5010CS John Doe i-SM-ComponentOfAggregationOfLexicals i- AggOIDValueInstOIDCompOfAggOIDOID

GII School 29/09/ i-SM-Abstract OIDAbsOIDInstOID ……… i-SM-AttributeOfAbstract OIDAttrOfAbsOIDInstOIDValuei- AbsOID John Doe CS3012 …………… Instances in our dictionary i-SM-AggregationOfLexicals AggrOID InstOIDOID John Doe CS i-SM-ComponentOfAggregationOfLexicals i- AggrOIDValueInstOIDCompOfAggOIDOID i-SM_ComponentOfAggregation… (OID: #i-componentOID_1 (i-attOID), i-AggrOID: #i-aggregationOID_1(i-absOID), ComponentOfAggregationOfLexicalsOID: #componentOID_1(attOID),Value: Val ) ← i-SM_AttributeOfAbstract( OID: i-attOID, i-AbstractOID: i-absOID, AttributeOfAbstractOID: attOID, Value: Val ), SM_AttributeOfAbstract( OID: attOID, AbstractOID: absOID)

GII School 29/09/ Translation rules, data level i-SM_ Lexical ( OID: #i-lexicalOID_1 (i-attOID), i-AggrOID: #i-aggregationOID_1(i-absOID), LexicalOID: #lexicalOID_1(attOID), Value: Value ) ← i-SM_AttributeOfAbstract( OID: i-attOID, i-AbstractOID: i-absOID, AttributeOfAbstractOID: attOID, Value: Value ), SM_AttributeOfAbstract( OID: attOID, AbstractOID: absOID, Name: attName, IsNullable: isNull, IsID: isIdent, Type: type ) SM_Lexical( OID: #lexicalOID_1(attOID), Name: name, AggrOID: #aggregationOID_1(absOID), IsNullable: isNullable, IsKey: isIdent, Type : type ) ← SM_AttributeOfAbstract( OID: attOID, Name: name, AbstractOID: absOID, IsIdent: isIdent, IsNullable: isNullable, Type: type ) ;

Off-line approach GII School 29/09/ Operational Systems MIDST DB Translator Importer Exporter Supermodel DB

GII School 29/09/ Experiments A significant set of models –ER (in many variants and extensions) –Relational –OR –XSD –UML

GII School 29/09/ XSD OR noTab ref, FK, nested Relational OO ref, nested OR Tab, ref, FK, nested OO ref, flat OR noTab, FK, nested OR noTab, FK, flat OR Tab, gen ref, FK, nested OR noTab, gen ref, FK, nested OR noTab, gen, FK, nested 37+ Remove generalizations 36 Unnest sets (with ref) 03 Unnest sets (with FKs) 24 Unnest structures (flatten) 06 Unnest structures (TT & FKs) 01 Unnest structures (TT & ref) 43 FKs for references 29 Tables for typed tables 30 Typed tables for tables 13 References for FK 31 Nest referenced classes

The off-line approach: drawbacks It is highly inefficient, because it requires databases to be moved back and forth It does not allow data to be used directly A "run-time" approach is needed 72GII School 29/09/2010

A run-time alternative: generating views Main feature: –generate, from the datalog traslation programs, executable statements defining views representing the target schema. How: –by means an analysis of the datalog schema rules under a new classification of constructs 73GII School 29/09/2010

Runtime translation procedure Views DB Supermodel Schema Importer View Generator Translator MIDSTOperational System Access via target schema S t Access via source schema S s 74GII School 29/09/2010

Run-time vs off-line GII School 29/09/ Operational Systems MIDST DB Translator Importer Exporter Supermodel DB Views DB Supermodel Schema Importer View Generator Translator MIDSTOperational System Access via target schema Access via source schema

GII School 29/09/ Issues Reasoning on translations ModelGen in the model management scenario: –Round-trip engineering (and more) Customization of translations Off-line translation vs run-time “Correctness” of data translation wrt schema translation: –compare with data exchange Schematic heterogeneity and semantic Web framework: –What if the distinction between schemas and instances is blurred? Materialized Skolem function & provenance