OLE: ORM Logic-based English A Language for Redesigning and Migrating Databases Herman Balsters University of Groningen The Netherlands September, 2012.


1 OLE: ORM Logic-based English A Language for Redesigning and Migrating Databases Herman Balsters University of Groningen The Netherlands September, 2012

2 Situation
Designers (technical and non-technical) of information systems want to write down constraints and derivation rules to describe their business context.
They want a language that is easy to read and write for both designers and domain experts, so that the constraints and derivation rules can be validated by non-technical domain experts.
The language has to be expressive enough to capture typical business rules and constraints.
The language should be precise enough to be completely unambiguous and eventually translate to a technical platform (e.g., SQL).

3 In our case: a fact-based language for re-engineering, re-structuring, and migrating databases
1. A user (of our method) is a person who wants to deliver a re-engineered and re-structured database, serving as a basis for, e.g., a database migration.
2. Axiom: there is no sense in migrating data whose semantics are not known.
3. The semantics of each table in a given database are reconstructed by offering a natural-language sentence that captures the intended meaning of the table heading.
4. The first version of this sentence may use informal semantics and syntax.
5. The sentence is then rewritten in a structured format called OLE: ORM Logic-based English. This format is mandatory and fixed for the user.
6. OLE is a sugared form of Sorted Logic and is easy to read and write for non-technical domain experts (hence enabling validation of the OLE statements).
7. Why take logic as a basis? ORM diagrams are directly related to first-order predicate logic (cf. Halpin's PhD thesis and other recent work by Halpin).

4 OLE continued
8. Objects are categorized as entities and values, and entities are given reference modes.
9. Elementary facts in OLE may be extended with the following constraints: uniqueness constraints, mandatory constraints, external uniqueness constraints, and subtyping.
10. Entities can also be identified by compound reference schemes. (So far, OLE looks like CSDP, but for textual modeling.)
11. OLE allows ad hoc specification of constraints (what you can express as a constraint in logic, you can express in OLE).
12. OLE allows ad hoc specification of derivation rules (what you can express as a derivation rule in logic, you can express in OLE).
13. OLE specifications eventually translate to SQL.

5 Comparable approaches
Common Logic Controlled English (CLCE, John Sowa): adopted by the OMG, ISO standard.
The Attempto project (ACE, Kuhn): texts in ACE are used to represent knowledge and rules, e.g. within the context of the Web.
Constellation Query Language (CQL, Clifford Heath): texts in CQL can be seen as an alternative to ORM diagrams for modeling the UoD.
Formal ORM Language 2 (FORML2, Terry Halpin and Jan Pieter Wijbenga): a highly expressive, natural-language-based specification language for derivation rules in ORM.
Some differences with OLE:
CLCE and ACE are not primarily directed at data modelling, but are general requirements specification languages, based on sugared variants of first-order predicate logic.
CLCE attempts to offer UML class diagrams a more formal and complete basis (cf. FUML).
CQL is not based on logic.
FORML2 is not based on logic, but more on how humans actually communicate in natural language.

6 OLE and its correspondence to Sorted Logic with variables (SORT)
SORT := UnaryPredicate(Term)
      | BinaryPredicate(Term, Term)
      | TernaryPredicate(Term, Term, Term)
      | Term = Term
      | ~ (SORT)
      | SORT { & | v | → | ↔ } SORT
      | { ∀ | ∃ | ∃0..* | ∃0..1 | ∃1 | ∃0 } Var : Type (SORT)
      | ∀ Var1, Var2 : Type (Var1 BinaryPredicate Var2 ↔ SORT)
      | ∀ Var1 : EntityType, Var2 : Type (Var1.Rolename = Var2 ↔ SORT)
      | ( SORT ).
Term := Var | Object | Entity.Rolename | Entity.RefMode | BaseRelation.Attribute
Rolename := Name
RefMode := Name
Object := Entity | Value
Type := ValueType | EntityType | BaseRelation
EntityType := Name
ValueType := Name

7 OLE: grammar spec (not complete, and in non-variable format)
OLE := Term UnaryPredicate
     | Term BinaryPredicate Term
     | Term TernaryPredicateFirstPart Term TernaryPredicateSecondPart Term
     | Term isa Term
     | Term = Term
     | not OLE
     | OLE { and | or } OLE
     | if OLE then OLE
     | for each Type { and Type }*, OLE
     | there is { some | possibly some | at most one | exactly one | no } Type { and Type }*, such that OLE
     | for each Type1 and Type2, Type1 BinaryPredicate Type2 iff OLE
     | for each EntityType and Type, the Rolename of EntityType is Type iff OLE
     | ( OLE ).
Term := { that } Type | the Rolename of EntityType | EntityType.RefMode | BaseRelation.Attribute | Constant
Rolename := Name
Type := ValueType | EntityType | BaseRelation
EntityType := Name
RefMode := Name
ValueType := Name
Constant is some concrete value (e.g. "1", "abc")
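The correspondence between the two grammars can be made concrete for the existential quantifiers. The sketch below pairs the OLE phrases with the Sorted Logic symbols in the order in which the two grammars list them; that pairing, the function name, and the variable-naming rule are assumptions for illustration, not part of the OLE specification.

```python
# Assumed pairing, read off from the order of the alternatives in the two grammars.
OLE_QUANTIFIERS = {
    "some": "∃",
    "possibly some": "∃0..*",
    "at most one": "∃0..1",
    "exactly one": "∃1",
    "no": "∃0",
}


def existential_to_sorted_logic(quantifier, type_name, body):
    """Render 'there is <quantifier> <Type>, such that <body>' in Sorted Logic style.

    The variable is the lower-cased first letter of the type name; this is a
    hypothetical convention chosen only to keep the example short.
    """
    var = type_name[0].lower()
    return f"{OLE_QUANTIFIERS[quantifier]} {var}:{type_name} ({body})"


rendered = existential_to_sorted_logic("exactly one", "Description", "has(p, d)")
print(rendered)
```

A full translator would also have to handle the universal quantifier, the connectives, and the two "iff" definition forms; this fragment only shows the shape of the mapping.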

8 OLE examples
1. for each Project, there is exactly one Description, such that Project has Description
2. for each Student and Project, there is at most one StudentProject, such that StudentProject involves Student and StudentProject involves Project
3. for each StudentProject, there is exactly one Mentor, such that StudentProject has Mentor
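To see what such statements demand of a population, here is a minimal Python sketch that checks "exactly one" and "at most one" over a binary fact type stored as a set of pairs. The helper names and the sample population are hypothetical.

```python
def holds_exactly_one(domain, fact_pairs):
    """True if every object in `domain` is paired with exactly one second component."""
    pairs = set(fact_pairs)
    return all(sum(1 for (a, _) in pairs if a == x) == 1 for x in domain)


def holds_at_most_one(fact_pairs):
    """True if no first component is paired with two different second components."""
    firsts = [a for (a, _) in set(fact_pairs)]
    return len(firsts) == len(set(firsts))


# Hypothetical population of "Project has Description".
projects = {"P1", "P2"}
has_description = {("P1", "Compiler design"), ("P2", "Query optimisation")}

ok = holds_exactly_one(projects, has_description)
# A second description for P1 violates example 1.
violated = holds_exactly_one(projects, has_description | {("P1", "Other")})
print(ok, violated)
```

"Exactly one" combines a mandatory constraint (every Project appears) with a uniqueness constraint (it appears once), which is why the first check needs the domain and the second does not.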

9 Migrating Databases
Consider the following table:
STUDENT(nr, mentor, project, projectDescription)
(informally) defined by: "Student is identified by number and has mentor for the project described by projectDescription".
Notice: the definition involves soft semantics, a composite fact, and non-normalized data.
How do we re-engineer and re-structure this table into an acceptable format?
Acceptable format = normalized (no redundancy), with correct and complete semantics (including all relevant constraints).
Our example is not acceptable because, as a fact type, it violates the (n-1)-rule (consider the non-key dependency project → projectDescription).
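The non-key dependency project → projectDescription, and the redundancy it causes, can be checked on data. This is a minimal sketch; the sample rows and the helper `fd_holds` are hypothetical and only illustrate the idea.

```python
# Hypothetical sample rows for STUDENT(nr, mentor, project, projectDescription).
STUDENT = [
    (1, "Smith", "P1", "Compiler design"),
    (2, "Jones", "P1", "Compiler design"),
    (3, "Smith", "P2", "Query optimisation"),
]


def fd_holds(rows, lhs, rhs):
    """Check whether the columns at positions `lhs` determine those at positions `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs)
        val = tuple(row[i] for i in rhs)
        # setdefault stores the first value seen for this key and returns it afterwards.
        if seen.setdefault(key, val) != val:
            return False
    return True


project_determines_description = fd_holds(STUDENT, [2], [3])
# Every description is stored once per row rather than once per project.
stored = len(STUDENT)
needed = len({(row[2], row[3]) for row in STUDENT})
print(project_determines_description, stored, needed)
```

In this sample the description of P1 is stored twice although the dependency says one copy per project would suffice.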

10 OLE ReDesigner method
Step 1: List the elementary facts for the target model
Student has Mentor for Project
Project is described by ProjectDescription
Step 2: For each elementary fact, list all entities and values, along with reference modes and basic types
Student is Entity (and is referred to by nr (of type integer))
Mentor is Entity (and is referred to by name (of type varchar))
Project is Entity (and is referred to by code (of type varchar))
Description is Entity (and is referred to by name (of type varchar))
Step 3: For each elementary fact, list all uniqueness constraints
for each Student and Project, there is exactly one Mentor, such that Student has Mentor for Project
for each Project, there is exactly one Description, such that Project has Description

11 Putting this information into the NORMA tool yields the following proper ORM model (re-engineering the schema of the original STUDENT base table): [Figure: Model M1]

12 The canonical ORM format (equivalent to the original proper ORM model) yields: [Figure: Model M2] Canonical ORM (Halpin, NORMA) can be completely derived from the original source table populations, and is a possible model on which to base an actual migration.

13 The importance of CoReference-ORM
It is the model format that is as close as possible to the table format we eventually want in the implementation of the target model.
The canonical model can be called the binary break-down of the target model.
However, it is still completely conceptual: semantics are given by fully validated elementary fact statements. (This is in contrast to the corresponding relational view: table headings often do not even offer a clue about the table's underlying fact types!)
The canonical model offers fully re-engineered, re-structured and validated semantics of the original source database.
The canonical model is therefore used as the model in which we define the derivation rules defining the target database.
These derivation rules (in OLE) offer what we call the binary build-up of the restructured database (i.e., the target database).
Hence, migration is performed by binary break-down followed by binary build-up.

14 We have the following rules to populate the fact types of M2 (assuming a population of the source model M1):
(1) for each Student and Project and Mentor, if Student has Mentor for Project then there is some StudentProject, such that StudentProject involves Student and StudentProject involves Project
(2) for each StudentProject and Student and Project, if StudentProject involves Student and StudentProject involves Project then there is some Mentor, such that Student has Mentor for Project
Rule (1) tells us that the entity type StudentProject is at least populated by taking instances from the fact type Student has Mentor for Project from Model M1.
Rule (2) tells us that the entity type StudentProject is at most populated by instances taken from the fact type Student has Mentor for Project from Model M1.
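Rules (1) and (2) together pin down the population of StudentProject from the ternary fact type of M1. The sketch below shows this on a hypothetical population; representing each StudentProject directly by its (Student, Project) pair is an assumption made for brevity.

```python
# Hypothetical population of the M1 fact type "Student has Mentor for Project",
# stored as (student, mentor, project) triples.
has_mentor_for = {(1, "Smith", "P1"), (2, "Jones", "P1"), (1, "Smith", "P2")}

# Smallest population satisfying rule (1): one StudentProject per (Student, Project).
student_project = {(s, p) for (s, _mentor, p) in has_mentor_for}

# Rule (1): every ternary fact has a matching StudentProject ("at least").
rule1 = all((s, p) in student_project for (s, _mentor, p) in has_mentor_for)

# Rule (2): every StudentProject is backed by some ternary fact ("at most").
rule2 = all(
    any(s2 == s and p2 == p for (s2, _mentor, p2) in has_mentor_for)
    for (s, p) in student_project
)
print(sorted(student_project), rule1, rule2)
```

Adding a pair such as (3, "P9") to `student_project` would keep rule (1) true but break rule (2), which is the "at most" half of the definition.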

15 We can now prove the following properties (the proof is based on the previous rules (1, 2) and the uniqueness constraints taken from model M1):
(i) for each Student and Project, there is at most one StudentProject, such that StudentProject involves Student and StudentProject involves Project
(ii) for each StudentProject, there is exactly one Student, such that StudentProject involves Student
(iii) for each StudentProject, there is exactly one Project, such that StudentProject involves Project
Properties (i), (ii), and (iii) now allow us to introduce the following (derived) compound reference scheme for StudentProject:
StudentProject is Entity (and is referred to by Student and Project, where StudentProject involves Student and StudentProject involves Project)
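Properties (i) to (iii) are what make the pair (Student, Project) usable as an identifier. As a rough illustration, the sketch below checks them on a hypothetical M2 population with surrogate StudentProject identifiers and then builds the compound reference; all names and data are invented for the example.

```python
# Hypothetical M2 population: surrogate ids and the two binary "involves" fact types.
student_projects = ["sp1", "sp2", "sp3"]
involves_student = {("sp1", 1), ("sp2", 2), ("sp3", 1)}
involves_project = {("sp1", "P1"), ("sp2", "P1"), ("sp3", "P2")}


def is_total_function(pairs, domain):
    """Each element of `domain` occurs as first component of exactly one pair."""
    firsts = [a for (a, _) in pairs]
    return sorted(firsts) == sorted(domain)


prop_ii = is_total_function(involves_student, student_projects)
prop_iii = is_total_function(involves_project, student_projects)

# With (ii) and (iii) established, each StudentProject maps to one (Student, Project) pair.
student_of = dict(involves_student)
project_of = dict(involves_project)
compound_ref = {sp: (student_of[sp], project_of[sp]) for sp in student_projects}

# Property (i): no two StudentProjects share the same pair, so the pair identifies them.
prop_i = len(set(compound_ref.values())) == len(compound_ref)
print(prop_i, prop_ii, prop_iii, compound_ref["sp3"])
```

Once all three checks pass, the surrogate identifiers can be dropped in favour of the pair without losing information.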

16 This results in the following intermediate target database as suggested by NORMA (Rmap): [Figure: relational schema generated by Rmap] This target database still has to be transformed into its final "view" format: all attributes occurring in the 2 tables are to be declared as derived.

17 How do we migrate our original base tables to a re-engineered/restructured database? Here, the original source table STUDENT(nr, mentor, project, projectDescription) serves as the input base relation from which we derive the population of the canonical ORM model and, subsequently, the target database. We proceed by using derivation rules in OLE …

18 (1) for each Nr, Name, and Code, there is some StudentProject, such that the studentNr of StudentProject is that Nr and the mentorName of StudentProject is that Name and the projectCode of StudentProject is that Code iff there is some STUDENT, such that STUDENT.nr = Nr and STUDENT.project = Code and STUDENT.mentor = Name
(2) for each Nr, Name, String and Code, there is some Project, such that the projectCode of Project is that Code and the Description of Project is that String iff there is some STUDENT, such that STUDENT.nr = Nr and STUDENT.project = Code and STUDENT.mentor = Name and STUDENT.projectDescription = String

19 We note that definition (1) respects the (derived) compound reference scheme for StudentProject, proved in the previous section. We note that in (1, 2) we have employed a general scheme for defining views. Say that:
- we wish to define a view $V$, with attributes $x_1, \ldots, x_n$ (of domain type $X_1, \ldots, X_n$)
- this view is to be defined in terms of a set of base relations $B_1, \ldots, B_m$
- $\Psi$ denotes the predicate describing the definition of $V$
then the general formula (in Sorted Logic) stating that $V$ is defined in terms of $\Psi$ is given by
\[ \forall x_1{:}X_1, \ldots, \forall x_n{:}X_n \; [\, \exists v{:}V \; v(x_1, \ldots, x_n) \Leftrightarrow \exists b_1{:}B_1, \ldots, \exists b_m{:}B_m \; \Psi(b_1, \ldots, b_m, x_1, \ldots, x_n) \,] \]
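Read operationally, the general scheme says that a tuple belongs to the view exactly when some combination of base tuples witnesses Ψ for it. The sketch below implements that reading under a simplifying assumption: Ψ is given as a function that returns the attribute tuple witnessed by a combination of base tuples, or None if there is none. The function names and sample rows are hypothetical.

```python
from itertools import product


def define_view(base_relations, witness):
    """Collect every attribute tuple witnessed by some combination b1..bm of base tuples.

    `witness(b1, ..., bm)` stands in for the predicate Psi: it returns the tuple
    (x1, ..., xn) made true by that combination, or None when Psi fails.
    """
    view = set()
    for combo in product(*base_relations):
        x = witness(*combo)
        if x is not None:
            view.add(x)
    return view


# Hypothetical rows of the base relation STUDENT.
STUDENT = [
    {"nr": 1, "mentor": "Smith", "project": "P1", "projectDescription": "Compiler design"},
    {"nr": 2, "mentor": "Jones", "project": "P1", "projectDescription": "Compiler design"},
]

# Rule (2) as a witness function over the single base relation STUDENT.
project_view = define_view([STUDENT], lambda s: (s["project"], s["projectDescription"]))
print(project_view)
```

Because the result is a set, the two STUDENT rows for project P1 yield a single tuple in the view, which matches the existential reading of the formula.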

20 The OLE rules (1, 2) result in the following target table definitions (generated by the ORM ReDesigner tool):
Define the target table StudentProject with column names studentNr, projectCode, mentorName as: rows consisting of STUDENT.nr, STUDENT.project, STUDENT.mentor for each STUDENT
Define the target table Project with column names projectCode, description as: rows consisting of STUDENT.project, STUDENT.projectDescription for each STUDENT

21 This results in the following SQL (generated by the ORM ReDesigner tool):
CREATE VIEW `Project` (`projectCode`, `descriptionName`) AS
  SELECT `STUDENT`.`Project`, `STUDENT`.`ProjectDescription`
  FROM `STUDENT`;
CREATE VIEW `StudentProject` (`studentNr`, `projectCode`, `mentorName`) AS
  SELECT `STUDENT`.`nr`, `STUDENT`.`Project`, `STUDENT`.`Mentor`
  FROM `STUDENT`;
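The generated views can be tried out on a small in-memory database. The sketch below uses Python's standard `sqlite3` module; the view definitions are adapted (column aliases instead of a view column list, no backticks) and the sample rows are hypothetical, so this is an approximation of the generated SQL rather than the tool's exact output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (nr INTEGER, mentor TEXT, project TEXT, projectDescription TEXT);
-- Hypothetical sample rows.
INSERT INTO STUDENT VALUES (1, 'Smith', 'P1', 'Compiler design');
INSERT INTO STUDENT VALUES (2, 'Jones', 'P1', 'Compiler design');
INSERT INTO STUDENT VALUES (1, 'Smith', 'P2', 'Query optimisation');

CREATE VIEW Project AS
  SELECT project AS projectCode, projectDescription AS descriptionName FROM STUDENT;
CREATE VIEW StudentProject AS
  SELECT nr AS studentNr, project AS projectCode, mentor AS mentorName FROM STUDENT;
""")

# The Project view returns one row per STUDENT row, so a shared project repeats.
project_rows = conn.execute(
    "SELECT projectCode, descriptionName FROM Project ORDER BY projectCode"
).fetchall()

# DISTINCT collapses the repeats to one row per project.
distinct_rows = conn.execute(
    "SELECT DISTINCT projectCode, descriptionName FROM Project ORDER BY projectCode"
).fetchall()

student_project_rows = conn.execute(
    "SELECT studentNr, projectCode, mentorName FROM StudentProject ORDER BY studentNr, projectCode"
).fetchall()
print(project_rows, distinct_rows, student_project_rows)
conn.close()
```

In SQL a view without DISTINCT keeps duplicate rows, so if the Project view is meant to hold one row per project, querying it with DISTINCT (or defining it with SELECT DISTINCT) may be worth considering.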

22 Some Notes on OLE and the ReDesigner
OLE has been given a full BNF language spec, which was implemented in the ORMReDesigner tool (tech. report RuG).
OLE contains at least the expressive power of Sorted Logic (tech. report RuG).
OLE can handle many of ORM's fact type constructions and constraints.
We are currently working on a suitable syntax for set comparison constraints (and aggregate functions).
In this paper, we have taken only a simple base relation as a case study; we have, however, applied OLE and the ORMReDesigner to larger case studies (e.g., migrating patient medical records from two different hospitals to a common EPD (Electronic Patient Dossier) database).
Both OLE and the ORMReDesigner proved in practice (with Master students in a course on Data Migration, and with professional EPD designers) to be reasonably simple to use, both by the non-technical domain expert (involved in the validation of the OLE specifications) and by the professional designer (using the tool to migrate databases).

23 Conclusions We have tried to show how fact-based modeling, in particular ORM and its representation in (sugared) Sorted Logic, can help in reengineering (relational) databases. We reconstruct the semantics of the source database by offering a set of natural-language sentences capturing the conceptual structure and constraints of the source. These sentences are rewritten in a structured natural language format, coined as OLE: ORM Logic-based English. OLE is used to define the mappings from the original source to a reengineered target database. We have discussed the ORM ReDesigner: a semi-automatic tool, based on OLE and NORMA, available as a research prototype, used for reengineering and migrating relational databases.

24 Future work
The ORMReDesigner tool is a research prototype and, hence, needs improvement to perform on an actual professional scale.
OLE needs to be extended with language features to also support more advanced data modeling (e.g. set comparison constraints) and advanced support for business rules (e.g. aggregate functions).
Non-green-field situations: we are currently working on a combination of BPMN and OLE, to offer a procedure for deriving target data models from target process models (method and tool).

25 Acknowledgement
Terry Halpin, for in-depth discussions on using Logic in ORM.
Excellent feedback and comments from one of the reviewers, which greatly improved an important part of the paper.

