University of Namur Faculté d'informatique LIBD - Laboratory of Database Application Engineering www.info.fundp.ac.be/libd LIBD Database Reverse Engineering.

1 University of Namur Faculté d'informatique LIBD - Laboratory of Database Application Engineering www.info.fundp.ac.be/libd LIBD Database Reverse Engineering Jean-Luc Hainaut Ingénierie des bases de données avancées (Advanced Database Engineering) - April 2010 Based on the keynote of WCRE'09: Legacy and Future of Database Reverse Engineering

2 LIBD 2 • What is Database Reverse Engineering? • Where does it come from? • Where is it going?

3 LIBD 3 Contents • Context, cultures and history • Databases: facts and challenges • Database reverse engineering motivation: the migration process • Database reverse engineering methodology • Database reverse engineering: A case study • Database reverse engineering: New challenges • Conclusions • Appendix 1: Conceptualization of foreign keys

4 LIBD 4 Context, cultures and history

5 LIBD 5 Context, cultures and history Database Reverse Engineering • "A collection of methods and tools to help an organization determine the structure, function, and meaning of its data" [Chikofsky96] • A collection of methods and tools to help an organization recover or rebuild the complete or partial specifications of its data. A goal generally not reachable for programs.

6 LIBD 6 Context, cultures and history The role of the database in data-centered business applications [Diagram: users, GUI manager, Application (business logic), DMS, DB schema, DB data, communications]

7 LIBD 7 Context, cultures and history Programming community vs DB community. Programming community: Programs implement services available to client agents. Oh yes, wouldn't it be wise to store data somewhere? DB community: The database is made up of all the data that describe the current state of the business. Oh yes, wouldn't it be nice to write some programs to process these data?

8 LIBD 8 Context, cultures and history Building a database [Diagram: User requirements → Information analysis → Conceptual schema → Logical design → Logical (RDB) schema → Physical design → Physical (DB2) schema → Coding → SQL-DDL code]

9 LIBD 9 Context, cultures and history Building a database • Analysis: elaborating the conceptual schema of the database (ERA, UML class diagrams) • Logical design: producing the logical schema of the database (e.g., hierarchical, relational, object-relational) • Physical design: producing the physical schema of the database (e.g., IMS/DL1, DB2, Oracle, SQL Server) • Coding: producing the DDL code of the database. Should produce a complete, up-to-date documentation of the DB.

10 LIBD 10 Context, cultures and history What next? • Database maintenance • Program maintenance • Database evolution • Database/application migration • Data conversion • Database integration/federation. All require a complete, up-to-date documentation of the DB.

11 LIBD 11 Context, cultures and history No documentation? Rebuild it through reverse engineering.

12 LIBD 12 Context, cultures and history Short history: three periods. 80's: Discovery of the concepts of DBRE. Migrating CODASYL DBs, IMS DBs, standard files → relational technology. Techniques: automated DDL code interpretation; some trivial heuristics to elicit missing constraints (e.g., foreign keys). Approach insufficient to recover the complete database schema (implicit data structures and constraints implemented in the procedural code and in user interfaces).

13 LIBD 13 Context, cultures and history Short history : three periods 90's : Deepening techniques and methodologies Additional elicitation techniques to recover implicit constructs Importance of program code analysis in DB schema recovery Development of comprehensive methodologies and tools. Need for reverse engineering relational databases

14 LIBD 14 Context, cultures and history Short history: three periods. 00's: Widening the scope of DBRE: XML as a data model, the web as an infinite DB, semi-structured/unstructured data, semantics expressed through conceptual schemas and ontologies, MD engineering, traceability (formal mapping), help in automating full migration (schema + data + program), explosion of poorly designed (web) databases, complexity and size of corporate databases, skill shortage, dynamic SQL, ORM-based program development.

15 LIBD 15 Databases: facts and challenges

16 LIBD 16 Databases: facts and challenges Some facts (1) • a company may use more than 10 DMSs to implement its information system; • a new version of a DMS every 4 years; • a database may be used by several thousands of programs; • the schema of a large database may include more than 1,000 entity types and 20,000 attributes; • some database schemas have got so large and complex that no single data administrator can master them any longer.

17 LIBD 17 Databases: facts and challenges Some facts (2) • description of one entity type and its attributes: from 1 to 100 pages; • the functional documentation of a large database may (should) comprise more than 5,000 pages; • the SQL-DDL code of a database (tables, constraints, indexes, triggers, checks, etc.) may comprise 200,000 LOC (5,000 pages); • however, many databases have no documentation.

18 LIBD 18 Databases: facts and challenges Some facts (3) • database schemas share some interesting properties with programs: bugs, awkward design, dead parts, obscure sections, (nearly) duplicated sections, developed on obsolete platforms, poorly documented (if ever); • corrective, preventive and adaptive maintenance (no added value) of an information system may require more than 50% of development effort; • maintenance and evolution are much more costly when no correct documentation is available.

19 LIBD 19 DBRE Motivation: the migration process

20 LIBD 20 Database engineering - The migration process Database/application migration • Porting a complete legacy application, or some of its components, to another, generally more modern, platform. • For a database: changing its DMS. A popular example: migrating the legacy database of a business application to an RDBMS. • Two main approaches: physical approach, semantic approach.

21 LIBD 21 Database engineering - The migration process Database/application migration (2) • The physical, or one-to-one, migration strategy is the cheapest but also the worst approach since it deeply degrades the final structure. Requires no documentation of the DB. Very popular. [Diagram: IDMS-DDL code → Physical extraction → Physical (IDMS) schema → Transform → Physical (DB2) schema → Coding → SQL-DDL code]

22 LIBD 22 Database engineering - The migration process Database/application migration (3) • Semantic approach: based on an in-depth understanding of the DB structures. Provides a high quality result. Strong basis for the future. Requires a complete, up-to-date documentation of the DB, hence reverse engineering. [Diagram: IDMS-DDL code → Physical extraction → Physical (IDMS) schema → Logical extraction → Logical (DBTG) schema → Conceptualization → Conceptual schema → Logical design → Logical (RDB) schema → Physical design → Physical (DB2) schema → Coding → SQL-DDL code]

23 LIBD 23 Database engineering - The migration process Migrating an undocumented data structure (1) • physical (one-to-one) migration SELECT CLIENT ASSIGN TO "CUST.DAT" ORGANIZATION IS INDEXED RECORD KEY IS CUST-ID. FD CUST-FILE. 01 CUSTOMER. 02 CUST-ID PIC X(12). 02 CUST-INFO PIC X(80). 02 CUST-HIST PIC X(1000). ⇒ Create table CUSTOMER( CUST_ID char(12) not null, CUST_INFO char(80) not null, CUST_HIST char(1000) not null, primary key (CUST_ID)); ⇒ no added value

24 LIBD 24 Database engineering - The migration process Migrating an undocumented data structure (2) • semantic migration (refinement) SELECT CLIENT ASSIGN TO "CUST.DAT" ORGANIZATION IS INDEXED RECORD KEY IS CUST-ID. FD CUST-FILE. 01 CUSTOMER. 02 CUST-ID PIC X(12). 02 CUST-INFO PIC X(80). 02 CUST-HIST PIC X(1000).

25 LIBD 25 Database engineering - The migration process Migrating an undocumented data structure (3) • semantic migration (SQL translation) Create table CUSTOMER( CUST_ID char(12) not null, CUST_NAME char(28) not null, CUST_ADDRESS char(60) not null, CUST_STATUS char(2) not null, primary key (CUST_ID)); Create table CUST_HIST_PURCH( CUST_ID char(12) not null, ITEM char(10) not null, CINDEX smallint not null check(CINDEX <= 100), TOTAL smallint not null, primary key (CUST_ID,ITEM), unique (CUST_ID,CINDEX), foreign key (CUST_ID) references CUSTOMER); ⇒ Normalized DB

26 LIBD 26 Database engineering - The migration process Migrating an undocumented data structure - Synthesis. Physical migration: Create table CUSTOMER( CUST_ID char(12) not null, CUST_INFO char(80) not null, CUST_HIST char(1000) not null, primary key (CUST_ID)); Semantic migration: Create table CUSTOMER( CUST_ID char(12) not null, CUST_NAME char(28) not null, CUST_ADDRESS char(60) not null, CUST_STATUS char(2) not null, primary key (CUST_ID)); Create table CUST_HIST_PURCH( CUST_ID char(12) not null, ITEM char(10) not null, CINDEX smallint not null check(CINDEX <= 100), TOTAL smallint not null, primary key (CUST_ID,ITEM), unique (CUST_ID,CINDEX), foreign key (CUST_ID) references CUSTOMER);

27 LIBD 27 Database engineering - The migration process Migrating an undocumented data structure (4) • new application: compute total sales per item? After physical migration: where is the required information? how to extract it from the CUSTOMER table? who will develop the (C, Java, VB) program? ... and when? After semantic migration: Select ITEM, sum(TOTAL) from CUST_HIST_PURCH group by ITEM; where: clearly visible (+ documentation if needed); how: just name the columns; who: any non-expert; when: immediately, 2 minutes.

28 LIBD 28 Database reverse engineering: a DMS-independent methodology

29 LIBD 29 Database reverse engineering - Introduction create table COUNTRY ( CNAME char(24) not null, VOLUME numeric(12) not null, primary key (CNAME)); create table EXPORT ( CNAME char(24) not null, SNAME char(18) not null, primary key (CNAME, SNAME), foreign key (CNAME) references COUNTRY, foreign key (SNAME) references SECTOR); create table SECTOR ( SNAME char(18) not null, INCOME numeric(14) not null, primary key (SNAME)); DBRE: the ideal view 0-N export SECTOR Sector Name Income id:Sector Name COUNTRY Country Name Volume id:Country Name

30 LIBD 30 Database reverse engineering - Introduction create table TBL_C ( IDC char(24) not null, COL_2 numeric(12) not null); create table TBL_X ( IDX char(42) not null); create table TBL_S ( IDS char(18) not null, COL_2 numeric(14) not null); create unique index ID_C on TBL_C(IDC); create unique index ID_X on TBL_X(IDX); DBRE: the actual view ? 0-N export SECTOR Sector Name Income id:Sector Name COUNTRY Country Name Volume id:Country Name additional infos

31 LIBD 31 Database reverse engineering - Introduction Where to find the additional information? • DDL analysis or catalog extraction (35%) • Schema analysis (10%) • Data analysis (10%) • HMI analysis (10%) • Program analysis (35%)

32 LIBD 32 Database reverse engineering - Introduction What are the challenges? • The DDL code (or the data dictionary) gives us the explicit data structures and constraints: less than 50% (→ physical extraction) • Many undeclared, therefore hidden, implicit data structures and constraints: more than 50% (→ logical extraction) • Many complex, non-standard constructs: difficult to interpret (→ conceptualization)

33 LIBD 33 DBRE - Methodology The DB-MAIN Methodology - The technical processes Conceptualization Conceptual schema Logical extraction Physical extraction Program code Data User interface Reports Documentation DDL code Physical schema Logical schema

34 LIBD 34 DBRE - Methodology DB-MAIN methodology: Physical extraction step 1 TBL_X IDX id':IDX acc TBL_S IDS COL_2 TBL_C IDC COL_2 id':IDC acc create table TBL_C ( IDC char(24) not null, COL_2 numeric(12) not null); create table TBL_X ( IDX char(42) not null); create table TBL_S ( IDS char(18) not null, COL_2 numeric(14) not null); create unique index ID_C on TBL_C(IDC); create unique index ID_X on TBL_X(IDX);
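Physical extraction by catalog inspection can be sketched as below. The example loads the DDL of this slide into an in-memory SQLite database and reads tables, columns and unique indexes back from the SQLite catalog. SQLite is only a stand-in here: a DB2 or Oracle catalog has a different layout, and `extract_physical_schema` is a hypothetical helper, not part of DB-MAIN.

```python
import sqlite3

DDL = """
create table TBL_C (IDC char(24) not null, COL_2 numeric(12) not null);
create table TBL_X (IDX char(42) not null);
create table TBL_S (IDS char(18) not null, COL_2 numeric(14) not null);
create unique index ID_C on TBL_C(IDC);
create unique index ID_X on TBL_X(IDX);
"""

def extract_physical_schema(conn):
    """Read tables, columns and unique indexes from the SQLite catalog."""
    schema = {}
    tables = conn.execute(
        "select name from sqlite_master where type = 'table' order by name"
    ).fetchall()
    for (table,) in tables:
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [row[1] for row in conn.execute(f"pragma table_info({table})")]
        unique_ids = []
        # index_list rows: (seq, name, unique, origin, partial)
        for idx in conn.execute(f"pragma index_list({table})").fetchall():
            if idx[2]:
                # index_info rows: (seqno, cid, name)
                cols = [r[2] for r in conn.execute(f"pragma index_info({idx[1]})")]
                unique_ids.append(cols)
        schema[table] = {"columns": columns, "unique": unique_ids}
    return schema

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
physical = extract_physical_schema(conn)
print(physical)
```

The result contains only what the DDL declares: `TBL_S` has no identifier and no table has a foreign key, which is exactly the gap the logical extraction step has to fill.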

35 LIBD 35 DBRE - Methodology DB-MAIN methodology: Logical extraction (step 2). Physical schema: TBL_X IDX id':IDX acc TBL_S IDS COL_2 TBL_C IDC COL_2 id':IDC acc. Logical schema: SECTOR SNAME INCOME id:SNAME COUNTRY CNAME VOLUME id:CNAME EXPORT CNAME SNAME id:CNAME SNAME ref:CNAME ref:SNAME. Problems addressed: cryptic names, missing unique keys, missing foreign keys, concatenated columns.
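One way to test the hypothesis that `TBL_X.IDX` (42 characters) concatenates a country name (24) and a sector name (18), each acting as an implicit foreign key, is to count orphan values in the data. The sketch below assumes that layout and uses made-up rows in SQLite; `orphan_count`, `pad_country` and `pad_sector` are hypothetical helpers.

```python
import sqlite3

def orphan_count(conn, src_table, src_expr, tgt_table, tgt_col):
    """Count rows of src_table whose src_expr value has no match in tgt_table.tgt_col."""
    sql = (f"select count(*) from {src_table} "
           f"where {src_expr} not in (select {tgt_col} from {tgt_table})")
    return conn.execute(sql).fetchone()[0]

def pad_country(name):
    # SQLite does not pad char(n) values, so fixed-width fields are padded here.
    return name.ljust(24)

def pad_sector(name):
    return name.ljust(18)

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table TBL_C (IDC char(24) not null, COL_2 numeric(12) not null);
create table TBL_X (IDX char(42) not null);
create table TBL_S (IDS char(18) not null, COL_2 numeric(14) not null);
""")
conn.executemany("insert into TBL_C values (?, ?)",
                 [(pad_country("France"), 100), (pad_country("Japan"), 200)])
conn.executemany("insert into TBL_S values (?, ?)",
                 [(pad_sector("Steel"), 10), (pad_sector("Wine"), 20)])
conn.executemany("insert into TBL_X values (?)",
                 [(pad_country("France") + pad_sector("Wine"),),
                  (pad_country("Japan") + pad_sector("Steel"),)])

# Assumed layout: characters 1-24 reference TBL_C.IDC, characters 25-42 reference TBL_S.IDS.
orphans_country = orphan_count(conn, "TBL_X", "substr(IDX, 1, 24)", "TBL_C", "IDC")
orphans_sector = orphan_count(conn, "TBL_X", "substr(IDX, 25, 18)", "TBL_S", "IDS")
print(orphans_country, orphans_sector)
```

Zero orphans on both sides supports the two foreign keys; as with any data analysis, it is evidence to be confirmed by program analysis or by domain experts, not a proof.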

36 LIBD 36 DBRE - Methodology DB-MAIN methodology : Conceptualization step 3 0-N export SECTOR Sector Name Income id:Sector Name COUNTRY Country Name Volume id:Country Name SECTOR SNAME INCOME id:SNAME COUNTRY CNAME VOLUME id:CNAME EXPORT CNAME SNAME id:CNAME SNAME ref:CNAME ref:SNAME  technology-dependent constructs  semantic interpretation unclear (sort of!)

37 LIBD 37 DBRE - Methodology Project management Full project Pilot Conceptualization Logical extraction Physical extraction Source management Project planning Conditioning Selection Evaluation Identification Others Prog. analysis Data analysis Sch. analysis Stop/go

38 LIBD 38 Analysis techniques for Logical extraction 1. Schema analysis 2. Data analysis 3. Programming pattern (clichés) analysis 4. Query analysis 5. Program dependency analysis 6. Program slicing others DBRE - Methodology

39 LIBD 39 Analysis techniques for Logical extraction 1. Schema analysis DBRE - Methodology  One of the main heuristics in the 80's

40 LIBD 40 Analysis techniques for Logical extraction 2. Data analysis DBRE - Methodology select REG_NUMBER, count(*) as N from PATIENT group by REG_NUMBER having count(*) > 1 REG_NUMBER | N -----------|------ | 

41 LIBD 41 Analysis techniques for Logical extraction 3. Programming pattern (clichés) analysis DBRE - Methodology Much can be learned on the DB structures by examining how programs use the data. As a consequence, several code analysis techniques have been designed to extract information on these data structures. Four popular examples:  pattern analysis  DML statement analysis (e.g., SQL queries analysis)  dependency analysis  program slicing

42 LIBD 42 Analysis techniques for Logical extraction 3. Programming pattern (clichés) analysis DBRE - Methodology  select * from CUSTOMER where CUST_ID = :CN; if SQLCODE = 0 then begin insert into ORDER values(:ON,:CN,:OD); end; +
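The cliché on this slide (check that a row exists, then insert a row carrying the same host variable) can be searched for mechanically. The following Python sketch uses a single regular expression tailored to the slide's pseudo-code; real programs need a parser, and the pattern, the function `fk_hints` and the tuple layout are assumptions for illustration only.

```python
import re

# Matches: select ... from T where K = :v;  if SQLCODE = 0 then begin insert into S values(...)
PATTERN = re.compile(
    r"select\s+.*?\s+from\s+(?P<target>\w+)\s+where\s+(?P<key>\w+)\s*=\s*:(?P<var>\w+)\s*;"
    r"\s*if\s+SQLCODE\s*=\s*0\s+then\s+begin\s+"
    r"insert\s+into\s+(?P<source>\w+)\s+values\s*\((?P<values>[^)]*)\)",
    re.IGNORECASE | re.DOTALL)

def fk_hints(code):
    """Return (source table, column position, target table, target key) hints."""
    hints = []
    for m in PATTERN.finditer(code):
        host_vars = [v.strip().lstrip(":") for v in m.group("values").split(",")]
        if m.group("var") in host_vars:
            # Position (1-based) of the inserted column fed by the checked variable.
            position = host_vars.index(m.group("var")) + 1
            hints.append((m.group("source"), position,
                          m.group("target"), m.group("key")))
    return hints

code = """
select * from CUSTOMER where CUST_ID = :CN;
if SQLCODE = 0 then begin
  insert into ORDER values(:ON,:CN,:OD);
end;
"""
hints = fk_hints(code)
print(hints)
```

Here the hint says that the second column of `ORDER` probably references `CUSTOMER.CUST_ID`, since an order is only inserted once the customer has been found.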

43 LIBD 43 Analysis techniques for Logical extraction 4. Query analysis DBRE - Methodology  select * from CUSTOMER, ORDER where CUST_ID = ORD_CUS +
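Join conditions like the one above are a cheap source of foreign key candidates. As a rough sketch, the Python function below extracts column-to-column equalities from the `where` clause of a query and ignores comparisons with host variables or literals. It is a regular-expression approximation, not an SQL parser, and `join_predicates` is a hypothetical name.

```python
import re

# column = column, where neither side is a host variable (:X), a literal or a number
JOIN_PREDICATE = re.compile(
    r"(?<![:'\w.])([A-Za-z_][\w.]*)\s*=\s*([A-Za-z_][\w.]*)(?![\w'])")

def join_predicates(query):
    """Return pairs of columns compared for equality in the where clause."""
    parts = re.split(r"\bwhere\b", query, maxsplit=1, flags=re.IGNORECASE)
    if len(parts) < 2:
        return []
    return JOIN_PREDICATE.findall(parts[1])

pairs = join_predicates("select * from CUSTOMER, ORDER where CUST_ID = ORD_CUS")
print(pairs)
```

Each pair is only a candidate: which side is the foreign key and which is the identifier still has to be settled, for instance by checking uniqueness in the data.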

44 LIBD 44 Analysis techniques for Logical extraction 5. Program dependency/flow analysis DBRE - Methodology select ORD_CUST into :OC from ORDER where ORD_NUM = :ON; C = "" + OC; insert into CUSTOMER values(:C,'?','?'); Dependency chain: ORDER.ORD_CUST → OC → C → CUSTOMER.CUST_ID ⇒ ?
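The data flow in this fragment goes from `ORDER.ORD_CUST` to `OC`, then to `C`, then to `CUSTOMER.CUST_ID`. A minimal sketch of the underlying idea is a reachability test on a dependency graph, shown below in Python. The edge list is written by hand from the three statements; a real tool would derive it from the program, and `flows_to` is a hypothetical helper.

```python
from collections import deque

def flows_to(edges, source, target):
    """Breadth-first search: can a value flow from `source` to `target`?"""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# One edge per statement of the slide: select ... into :OC; C = "" + OC; insert ... (:C, ...)
edges = [("ORDER.ORD_CUST", "OC"), ("OC", "C"), ("C", "CUSTOMER.CUST_ID")]
related = flows_to(edges, "ORDER.ORD_CUST", "CUSTOMER.CUST_ID")
print(related)
```

A path between two columns suggests that they share a domain, which makes them worth examining as a possible foreign key and identifier pair.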

45 LIBD 45 Analysis techniques for Logical extraction 6. Program slicing DBRE - Methodology • Considering program P, statement s in P, and variables V of P, the slice (s,V) in P is the subset of P that contributes to the state of V at point s [Weiser,84]. • Example: identify all the statements that fill a record before it is stored in the database. Idea: the data validation statements must be somewhere in this slice. • Has been extended to programs that comprise procedures [Horwitz,90]. • Has been extended to complex DB programs and schema objects [Cleve,06]. Useful to reduce the exploration space when searching programs for patterns, clichés, dependencies, etc.

46 LIBD 46 DBRE - Methodology The DB-MAIN Methodology (1) • Developed by the Laboratory of Database Engineering (LIBD), University of Namur, since 1993 (work started in 1989). Effort of about 40 man-years. • Supported by the DB-MAIN CASE environment • Supports DDL, schema, data, DML, program code and HMI analysis techniques • Produces the enriched logical schema, the conceptual schema and the inter-schema mappings

47 LIBD 47 DBRE - Methodology The DB-MAIN Methodology (2) • Extendible: can accommodate new sources, new technologies, new languages, new DBRE techniques. • Integrated with database design methodologies • Targets most data managers: RPG, COBOL files, IMS/DL1, IDMS/IDS2, RDB, ORDB, XML • Automatic generation of documentation and additional code (inter-schema mappings, wrappers, ETL)
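To make the definition concrete, here is a deliberately simplified backward slice in Python. It handles straight-line code only (no branches, loops or procedures, which the cited work does cover), and the statement encoding `(text, defined variable, used variables)` as well as the toy program are assumptions for illustration.

```python
def backward_slice(statements, criterion_index, variables):
    """Indexes of the statements that contribute to `variables` at `criterion_index`.

    Each statement is (text, defined_variable_or_None, [used_variables]).
    Straight-line code only: control dependencies are ignored.
    """
    relevant = set(variables)
    kept = []
    for i in range(criterion_index, -1, -1):
        _, target, uses = statements[i]
        if target in relevant:
            kept.append(i)
            relevant.discard(target)   # this definition satisfies the need...
            relevant.update(uses)      # ...and creates needs for what it reads
    return sorted(kept)

program = [
    ("NAME = read()",            "NAME",     []),
    ("LOG = 'start'",            "LOG",      []),
    ("CITY = read()",            "CITY",     []),
    ("REC_NAME = upper(NAME)",   "REC_NAME", ["NAME"]),
    ("COUNT = COUNT + 1",        "COUNT",    ["COUNT"]),
    ("REC_CITY = CITY",          "REC_CITY", ["CITY"]),
    ("store REC",                None,       ["REC_NAME", "REC_CITY"]),
]

# Slice with respect to the record fields at the point where the record is stored.
slice_indexes = backward_slice(program, 6, ["REC_NAME", "REC_CITY"])
print([program[i][0] for i in slice_indexes])
```

The logging and counting statements fall outside the slice, so the search for validation logic on the stored record can be limited to the four remaining statements.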

48 LIBD 48 Database reverse engineering: A case study

49 LIBD 49 DBRE - Methodology DB-MAIN - A case study Context: Belgian car distributor - Spare parts management application. Objective: migrating an IDMS database to an RDBMS. Size: medium-size database (324 record types). First trial: physical migration, outsourced. Result: failure, very poor performance, unreadable code, the migrated system cannot be maintained any longer... Second trial: semantic migration, in house (DB-MAIN methodology). Result: success; satisfying performance, maintainable result; gave the project leader the "IT manager of the year" award in 2002!

50 LIBD 50 DBRE - Methodology Methodology (physical migration) [Diagram: IDMS-DDL code → Physical extraction → Physical (IDMS) schema → Transform → Physical (DB2) schema → Coding → SQL-DDL code]

51 LIBD 51 DBRE - Methodology Methodology (semantic migration) [Diagram: IDMS-DDL code → Physical extraction → Physical (IDMS) schema → Logical extraction → Logical (DBTG) schema → Conceptualization → Conceptual schema → Logical design → Logical (RDB) schema → Physical design → Physical (DB2) schema → Coding → SQL-DDL code]

52 LIBD 52 DBRE - Physical schema (IDMS) - Excerpts

53 LIBD 53 DBRE - (Raw) Logical schema (IDMS) - Excerpts

54 LIBD 54 DBRE - Conceptual schema - Excerpts

55 LIBD 55 DBRE - Logical schema (Relational) - Excerpts

56 LIBD 56 New challenges

57 LIBD 57 New challenges New forms of data New engineering approaches Higher process automation Standard problems getting more critical Dynamic SQL ORM-based program development

58 LIBD 58 New challenges 1. New forms of data RDB and ORDB are no longer the ultimate data models. Emergence of semi-structured and unstructured data. XML: complex data are stored and transmitted in XML format. HTML: web pages contain valuable data; requires more interpretation (conceptualization) effort. Text: no structure but rich semantic contents. Unifying structured data semantics (DB) and unstructured data semantics: increasing role of ontologies.

59 LIBD 59 New challenges 2. New engineering approaches  Software engineering is going model-driven (MDE)  "Reverse Engineering is Reverse Forward Engineering" [Baxter,1997]  DBRE basically is model-driven, and, more important, transformational  many DBRE activities can be modeled as schema transformations  for higher reliability and completeness  for formal traceability  for better automation

60 LIBD 60 New challenges 3. Higher process automation Reverse engineering is a prerequisite to most evolution processes. If performed correctly, it provides inter-schema mappings as a by-product. These mappings can be used to automate evolution processes. Example: migration/evolution of an application from a legacy DBMS to a modern one: migrating the schema (conceptual approach) [Hick, 2001]; migrating the data [Hick, 2001]; migrating the application programs [Cleve, 2009].

61 LIBD 61 New challenges 4. Standard problems getting more critical Explosion of poorly designed (web) databases Skill shortage in legacy technologies Increasing complexity and size of corporate databases

62 LIBD 62 New challenges 5. Dynamic SQL In the 80's: major use of static SQL OD = "11-08-2009"; exec SQL select C.CUST_ID, O.ORD_NUM into :CI, :ON from CUSTOMER C, ORDER O where O.ORD_CUST = C.CUST_ID and O.ORD_DATE = :OD; In the 00's, increasing use of dynamic SQL (ODBC, JDBC, PHP) Connection conn; String query; Statement inst; ResultSet res; inst = conn.createStatement(); OD = "11-08-2009"; query = "select CUST_ID, ORD_NUM " + "from CUSTOMER C, ORDER O " + "where O.ORD_CUST = C.CUST_ID and O.ORD_DATE = " + "'" + OD + "'"; res = inst.executeQuery(query); res.next(); CI = res.getString(1); ON = res.getInt(2);

63 LIBD 63 New challenges 5. Dynamic SQL Much research has been devoted to static analysis of (static) SQL queries in application programs. However, static analysis of programs and queries is no longer sufficient, since the query text is only known at run time. Solution: resort to dynamic analysis, which is much more complex. Aspect technology is promising [Cleve,08].
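Dynamic analysis means observing the queries a program actually sends at run time instead of reading them in its source. As a rough illustration (not the aspect-based technique cited on the slide), Python's `sqlite3` module lets a callback receive every executed statement through `Connection.set_trace_callback`. The schema below is a made-up stand-in, and the query is built by string concatenation only to mirror the slide's JDBC example; production code should bind parameters instead.

```python
import sqlite3

captured = []  # every SQL statement executed on the connection ends up here

conn = sqlite3.connect(":memory:")
conn.set_trace_callback(captured.append)

# ORDER is a reserved word, so it is quoted in this SQLite stand-in.
conn.execute("create table CUSTOMER (CUST_ID char(12) primary key)")
conn.execute('create table "ORDER" (ORD_NUM integer primary key, '
             'ORD_CUST char(12), ORD_DATE char(10))')

# Query text assembled at run time, invisible to a purely static analysis.
order_date = "11-08-2009"
query = ("select C.CUST_ID, O.ORD_NUM "
         'from CUSTOMER C, "ORDER" O '
         "where O.ORD_CUST = C.CUST_ID and O.ORD_DATE = "
         "'" + order_date + "'")
conn.execute(query).fetchall()

captured_selects = [s for s in captured if s.lstrip().lower().startswith("select")]
print(captured_selects)
```

The captured text can then be fed to the same query analysis used for static SQL, for example to spot the join condition between `ORD_CUST` and `CUST_ID`. The cost is coverage: only the execution paths exercised during the trace are observed.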

64 LIBD 64 New challenges 6. ORM-based program development "Object-relational mapping (ORM, O/RM, and O/R mapping) in computer software is a programming technique for converting data between incompatible type systems in relational databases and object-oriented programming languages. This creates, in effect, a "virtual object database" that can be used from within the programming language." Wikipedia [Object-relational_mapping] Oct. 2009. Programs work on objects. Data are stored in relational tables. ORM API's map objects on table rows... and conversely. Complex, N:N, mappings allowed.

65 LIBD 65 New challenges 6. ORM-based program development RDB data Application program Relational DB schema standard architecture RDB data Application program Relational DB schema Class schema O/R Mapping ORM architecture

66 LIBD 66 New challenges 6. ORM-based program development ORM API's are getting very popular: C++: LiteSQL, Debea, dtemplatelib, hiberlite, romic, SOCI, etc. Delphi: Bold for Delphi, Macrobject DObject, InstantObjects, tiOPF, etc. Java: Carbonado, Cayenne, CocoBase, Ebean, EJB 3.0, Enterprise Objects Framework, Hibernate, iBATIS, JPM2Java, JPOX, Kodo, Object Relational Bridge, OpenJPA, SimpleORM, Spring, TopLink, Torque, GenORMous, etc. .NET: ADO.NET Entity Framework, Atlas, Base One Foundation Component Library, BCSEi ORM Code Generator, Business Logic Toolkit for .NET, Castle ActiveRecord, DataObjects.Net, CocoBase, Devart LINQ to SQL, DevForce, Developer Express, ECO, EntityORM, EntitySpaces, Euss, Habanero, iBATIS, Invist, LLBLGen Pro, LightSpeed, Altova Mapforce, Neo, .netTiers, NConstruct, NHibernate, Opf3, ObjectMapper.NET, Picasso, OpenAccess, TierDeveloper, Persistor.NET, Quick Objects, Sooda, Subsonic, Wilson ORMapper, etc. PHP: CakePHP, Coughphp, DABL, Data Shuffler, dbphp, Doctrine, dORM, EZPDO, Hormon, LightOrm, Outle, pdoMap, PersistentObject, PHPSimpleDB, Propel, Rocks, Qcodo, Redbean, Xyster, etc. Python: Django, SQLAlchemy, SQLObject, Storm, etc. Ruby: Active Record, Ruby on Rails, Datamapper, iBATIS, Sequel, etc. Perl: DBIx::Class, Rose::DB::Object, Fey::ORM, Jifty::DBI, DBIx::DataModel, Data::ObjectDriver, Class::DBI, etc.

67 LIBD 67 New challenges 6. ORM-based program development "Object/relational mapping is the Vietnam of Computer Science" Ted Neward

68 LIBD 68 New challenges 6. ORM-based program development "Object/relational mapping is the Vietnam of Computer Science" "It represents a quagmire which starts well, gets more complicated as time passes, and before long entraps its users in a commitment that has no clear demarcation point, no clear win conditions, and no clear exit strategy." Ted Neward's Technical Blog June 2006 http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx

69 LIBD 69 New challenges 6. ORM-based program development Three approaches to ORM definition: 1. the database pre-exists: the class schema is ([semi-]automatically) derived from the DB schema; problem: the class schema imperfectly fits the programmers' view; 2. the class schema pre-exists: the database is ([semi-]automatically) derived from the class schema; problems: the DB schema imperfectly fits database good practices and may not meet the needs of other applications (the one-DB-per-AP syndrome inherited from the 60's); 3. both class and database schemas pre-exist: the mapping is built manually; problem: the mapping is built manually!

70 LIBD 70 New challenges 6. ORM-based program development Some observations: Both schemas can evolve independently without the mapping being maintained. Class schema can be very large. Database schema generally is very large. Class schemas are poorly documented. Database schemas are poorly documented (even more than they used to be). Mappings are poorly documented.

71 LIBD 71 New challenges 6. ORM-based program development So, what's the point? Maintenance and evolution require "a complete, up to date, documentation of the DB"... and of the classes... and of the mappings. Most often, neither the class schema nor the (generally ill-designed) DB has decent documentation. Maintenance and evolution are more than "problematic". So, maintenance and evolution require preliminary reverse engineering.

72 LIBD 72 New challenges 6. ORM-based program development Example: a large WebLogic application accessing an Oracle database through EJB (Enterprise Java Beans) entities defined on SQL views. Problem: maintenance and evolution are getting painful. Objective: 1. redocumentation of the Oracle database (base schema, views, checks, triggers, stored procedures); 2. redocumentation of the EJB entities; 3. redocumentation of the O/R mappings.

73 LIBD 73 New challenges 6. ORM-based program development DB physical extraction: Tables: 835; 10,683 columns; Views: 1,024; 12,998 columns; Foreign keys: 1,447; Checks: 2,232; Triggers: 978. EJB physical extraction: Entities: 244; Relations: 305; Fields: 3,819; Methods: 642.

74 LIBD 74 New challenges 6. ORM-based program development Mapping analysis: several EJB entities are mapped on missing tables/views (obsolete entities); 26% of the foreign keys declared in EJB entities are not declared in the DB schema (inconsistencies, implicit foreign keys): EJB ahead of DB; when an EJB entity uses a table, it does not always declare its unique and foreign keys (inconsistencies, program execution anomalies): DB ahead of EJB. We now understand why maintenance and evolution were declared so painful!
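Once both sides have been extracted, this kind of mapping analysis largely reduces to set comparison. The Python sketch below compares foreign keys declared in the database with those declared in the object layer. The tuples are invented sample data (they are not the figures of the case study), and `compare_foreign_keys` is a hypothetical helper.

```python
def compare_foreign_keys(db_fks, orm_fks):
    """Classify (source table, column, target table) triples by where they are declared."""
    db, orm = set(db_fks), set(orm_fks)
    return {
        "only_in_orm": sorted(orm - db),   # object layer ahead of the DB
        "only_in_db": sorted(db - orm),    # DB ahead of the object layer
        "in_both": sorted(db & orm),
    }

# Hypothetical extraction results.
db_fks = [("ORDER", "ORD_CUST", "CUSTOMER"),
          ("DETAIL", "ORD_NUM", "ORDER"),
          ("DETAIL", "ITEM_CODE", "ITEM")]
orm_fks = [("ORDER", "ORD_CUST", "CUSTOMER"),
           ("DETAIL", "ORD_NUM", "ORDER"),
           ("INVOICE", "ORD_NUM", "ORDER"),
           ("INVOICE", "CUST_ID", "CUSTOMER")]

report = compare_foreign_keys(db_fks, orm_fks)
undeclared_share = len(report["only_in_orm"]) / len(set(orm_fks))
print(report, undeclared_share)
```

In this toy data half of the foreign keys known to the object layer are missing from the database schema; the same comparison run on real extraction results yields ratios like the one reported on the slide.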

75 LIBD 75 New challenges 6. ORM-based program development Conclusion: EJB/DB inconsistencies have been found through the reverse engineering of both schemas.

76 LIBD 76 Database reverse engineering: Conclusions

77 LIBD 77 Conclusions DBRE often is a critical process in information system management  the database is the central component of every IS.  any weakness in this component weakens the whole IS In the reverse engineering process, the database is not an isolated component  the DB cannot be understood independently of the programs  the programs cannot be understood independently of the DB

78 LIBD 78 Conclusions There is no fixed point in DBRE: new technologies make the documentation problem more critical and more complex  rapid website development may produce apocalyptic databases  OO programming and particularly ORM Risk of transforming DB knowledge into a niche culture.  ORM makes the DB transparent  for many application programmers, the DB is where « persistence services » (probably) are located. We have a DBA for that. Not an easy guy.  for many application programmers, Domain modeling consists in designing a nice Java class system.

79 LIBD 79 Contacts Laboratory: http://info.fundp.ac.be/libd ReveR: http://www.rever-sa.com

80 LIBD 80

81 LIBD 81 Appendix 2 Foreign key conceptualization

82 LIBD 82 DBRE - Foreign keys Foreign keys probably are the most important constructs both in the logical extraction and conceptualization processes. Field practice shows that they are used to express a surprisingly large variety of semantic patterns.

83 LIBD 83 DBRE - Foreign keys • Standard foreign keys and basic variants • Non standard foreign keys • Inclusion constraints • Complex foreign key patterns • Interval foreign keys • Pathological foreign keys Excerpts from Jean-Luc Hainaut, Conceptual interpretation of foreign keys, DB-MAIN Technical report, May 2010, http://www.info.fundp.ac.be/~dbm/Documents/Publications-LIBD/Technical-Reports/Conceptual-interpretation-of-FK-(stand-alone).pdf

84 LIBD 84 DBRE - Foreign keys Standard foreign keys and basic variants Standard foreign key 

85 LIBD 85 DBRE - Foreign keys Standard foreign keys and basic variants Optional foreign key 

86 LIBD 86 DBRE - Foreign keys Standard foreign keys and basic variants Optional, multi-components foreign key  ideal pattern

87 LIBD 87 DBRE - Foreign keys Standard foreign keys and basic variants Optional, multi-components foreign key  often found = ideal pattern

88 LIBD 88 DBRE - Foreign keys Standard foreign keys and basic variants Total foreign key (total FK) 

89 LIBD 89 DBRE - Foreign keys Standard foreign keys and basic variants Identifying foreign key 

90 LIBD 90 DBRE - Foreign keys Standard foreign keys and basic variants Identifying foreign key  id becomes implicit

91 LIBD 91 DBRE - Foreign keys Standard foreign keys and basic variants Cyclic foreign key 

92 LIBD 92 DBRE - Foreign keys Non standard foreign keys Secondary foreign key 

93 LIBD 93 DBRE - Foreign keys Non standard foreign keys Secondary foreign key (to optional id) 

94 LIBD 94 DBRE - Foreign keys Non standard foreign keys Multi-target foreign key = seems simple but includes a complex constraint

95 LIBD 95 DBRE - Foreign keys Non standard foreign keys Multi-target foreign key for e ∈ EXPENSE, e.by.SERVICE.ServiceID = e.on.BUDGET.BudgetID for b ∈ BUDGET, b.BudgetID = b.of.SERVICE.ServiceID

96 LIBD 96 DBRE - Foreign keys Non standard foreign keys Alternate foreign key 

97 LIBD 97 DBRE - Foreign keys Non standard foreign keys Hierarchical foreign key to an ET 

98 LIBD 98 DBRE - Foreign keys Non standard foreign keys Hierarchical foreign key to a multivalued attribute 

99 LIBD 99 DBRE - Foreign keys Non standard foreign keys Computed foreign key PURCHASE PurchID Agent Date Amount id:PurchID ref:f(Date) FISCAL-YEAR Year Budget id:Year for p ∈ PURCHASE, p.for.FISCAL-YEAR.Year = f(p.Date)

100 LIBD 100 DBRE - Foreign keys Non standard foreign keys Computed foreign key TAX-RATE Country Year Rate id:Country Year PURCHASE PurchID Customer Year Amount id:PurchID ref:Customer ref:f(Customer) Year CUSTOMER CustomerID Name City id:CustomerID ref:City CITY CityID CityName CountryName id:CityID for p ∈ PURCHASE: p.at_rate.TAX-RATE.Country = p.by.CUSTOMER.in.CITY.CountryName for p ∈ PURCHASE: p.at.TAX-RATE.for.COUNTRY = p.by.CUSTOMER.in.CITY.in.COUNTRY
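A computed foreign key such as ref:f(Date) cannot be declared in standard SQL, but it can be checked against the data. The sketch below assumes, purely for illustration, that f extracts the year from an ISO-formatted date; it registers that function in SQLite with `create_function` and lists the purchases whose computed year has no matching fiscal year. Table names use underscores and all rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table FISCAL_YEAR (Year integer primary key, Budget numeric(12));
create table PURCHASE (PurchID integer primary key, Agent char(10),
                       Date char(10), Amount numeric(10));
insert into FISCAL_YEAR values (2008, 1000), (2009, 1200);
insert into PURCHASE values (1, 'A1', '2008-03-15', 50),
                            (2, 'A2', '2009-11-02', 75),
                            (3, 'A1', '2010-01-20', 20);
""")

def fiscal_year(date_text):
    # Assumed definition of f: the fiscal year is the calendar year of the date.
    return int(date_text[:4])

# Make f callable from SQL under the name used on the slide.
conn.create_function("f", 1, fiscal_year)

violations = conn.execute(
    "select PurchID from PURCHASE "
    "where f(Date) not in (select Year from FISCAL_YEAR) "
    "order by PurchID"
).fetchall()
print(violations)
```

Purchase 3 falls in a year with no FISCAL_YEAR row, so either the constraint does not hold or the data are incomplete; deciding which is part of the conceptualization work.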

101 LIBD 101 DBRE - Foreign keys Non standard foreign keys Non-1NF foreign key   

102 LIBD 102 DBRE - Foreign keys Non standard foreign keys Non-1NF foreign key 

103 LIBD 103 DBRE - Foreign keys Inclusion constraints Inclusion constraint 

104 LIBD 104 DBRE - Foreign keys Inclusion constraints Domain sharing  SHOP Name: ChainName Town: char (32) Size: num (6) id:Name Town OFFER Item: char (12) Chain: ChainName Price: num (5) id:Item Chain 1-1 0-N of 1-1 0-N by SHOP Town: char (32) Size: num (6) id:of.CHAIN Town OFFER Item: char (1) Price: num (5) id:by.CHAIN Item CHAIN Name: ChainName id:Name

105 LIBD 105 DBRE - Foreign keys Complex foreign key patterns Conditional foreign key for s ∈ STUDENT: s.Country = "Belgium" ⇒ s.School ∈ SCHOOL.SchoolName for f ∈ FOREIGN: f.Country ≠ "Belgium" for s ∈ STUDENT: s.Country = "Belgium" ⇒ s.BelgiumSchool ∈ SCHOOL.SchoolName for f ∈ FOREIGN: f.Country ≠ "Belgium"

106 LIBD 106 DBRE - Foreign keys Complex foreign key patterns Overlapping identifier - foreign key 

107 LIBD 107 DBRE - Foreign keys Complex foreign key patterns Overlapping identifier - foreign key for d ∈ DETAIL: d.ItemCode = d.ItemCode_R for d ∈ DETAIL: d.ItemCode = d.ref.ITEM.ItemCode

108 LIBD 108 DBRE - Foreign keys Complex foreign key patterns Overlapping foreign keys for l ∈ LINE-of-INVOICE: l.from.INVOICE.OrderNumber = l.for.LINE-of-ORDER.OrderNumber

109 LIBD 109 DBRE - Foreign keys Complex foreign key patterns Overlapping foreign keys for l ∈ LINE-of-INVOICE: l.from.INVOICE.OrderNumber = l.for.LINE-of-ORDER.OrderNumber for l ∈ LINE-of-INVOICE: l.from.INVOICE.for.ORDER = l.for.LINE-of-ORDER.from.ORDER

110 LIBD 110 DBRE - Foreign keys Complex foreign key patterns Non-minimal FK for r ∈ REGISTRATION: r.Subject = r.for.LECTURE.Subject

111 LIBD 111 DBRE - Foreign keys Complex foreign key patterns Partially reciprocal foreign keys  

112 LIBD 112 DBRE - Foreign keys Complex foreign key patterns Inverse foreign keys 

113 LIBD 113 DBRE - Foreign keys Temporal (interval?) foreign keys 

114 LIBD 114 DBRE - Foreign keys Pathological foreign keys Loosely matching foreign keys The domains of the foreign key and of the target key are comparable in some way

115 LIBD 115 DBRE - Foreign keys Pathological foreign keys 99% correct foreign key for c ∈ CUSTOMER(wrong): c.Category ∉ CATEGORY.CatName

116 LIBD 116 DBRE - Foreign keys Pathological foreign keys Transitive foreign key (non-redundant) 

117 LIBD 117 DBRE - Foreign keys Pathological foreign keys Transitive foreign key (redundant) 

118 LIBD 118 DBRE - Foreign keys Pathological foreign keys Partly optional foreign key for s ∈ LAST-YEAR-STUDENT: s.Year = s.writes.DISSERTATION.Year

119 LIBD 119 DBRE - Foreign keys Pathological foreign keys Embedded foreign key 

120 LIBD 120 DBRE - Foreign keys Pathological foreign keys Embedded foreign key 

121 LIBD 121 DBRE - Foreign keys Pathological foreign keys Reflexive foreign key 

