A Database Reverse Engineering Case Study


1 A Database Reverse Engineering Case Study
DAMA MN, Nov. 16, 2016
Michael R. Blaha, DSc.

2 What is Database Reverse Engineering?
- Reverse engineering is the inverse of normal development.
  - Start with an application and work backwards to understand the software and infer its intent.
- Reverse engineering can apply to a variety of artifacts: hardware, programming code, databases, …
- Our focus here is on databases.

3 Why Would Anyone Want to Do DBRE?
- To elicit requirements
  - DBRE is not intended to perpetuate past flaws.
  - DBRE is merely a source of tentative requirements.
- To convert legacy data
- To integrate application stovepipes
- To assess software
- To assist maintenance
- To construct documentation

4 Inputs to DBRE
The available information varies widely…
- Database structure
- Documentation
- Application understanding
- Data
- Database queries
- Forms and reports

5 Key Themes for DBRE
- Don’t mistake hypotheses for conclusions.
- Expect multiple interpretations.
- Don’t be discouraged by approximate results.
- Expect odd constructs.
- Watch for consistent style.

6 Case Study 1: Reverse Engineer WordPress

7 Rationale
WordPress is an interesting DBRE case study because...
- WordPress is a well-known application.
- WordPress is a framework that generalizes the process of building a website.
- The case study has a populated database.
- The data is real (not synthetic).
- The data is not proprietary.
- It illustrates DBRE techniques for a small database.

8 Processing Details
- Export the MySQL database from the www.superdataguy.com website.
  - The exported localhost.sql is unreadable.
- Import the SQL code into a local MySQL database.
- Export only the schema from the local MySQL database.
  - The exported file is readable.
- Manually edit the SQL, deleting backtick quotes (`) and the keywords unsigned, COLLATE, KEY, UNIQUE KEY, and ENGINE.
- Reverse engineer the schema with ERwin.
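The manual keyword stripping can also be scripted. Below is a minimal Python sketch, assuming a typical schema-only MySQL dump with one column or index definition per line; the helper name `simplify_mysql_ddl` and the sample table are hypothetical, not taken from the actual export. It assumes that only secondary `KEY` / `UNIQUE KEY` index lines were meant to go, so `PRIMARY KEY` clauses are kept.

```python
import re

def simplify_mysql_ddl(ddl: str) -> str:
    """Strip MySQL-specific syntax that a modeling tool may reject (hypothetical helper)."""
    kept = []
    for line in ddl.splitlines():
        # Drop secondary index lines ("KEY ..." / "UNIQUE KEY ..."); PRIMARY KEY lines are kept.
        if re.match(r"\s*(UNIQUE\s+)?KEY\b", line):
            continue
        kept.append(line)
    text = "\n".join(kept)
    text = text.replace("`", "")                                    # backtick quoting
    text = re.sub(r"\s+unsigned\b", "", text, flags=re.IGNORECASE)  # e.g. "bigint(20) unsigned"
    text = re.sub(r"\s+COLLATE\s*=?\s*\w+", "", text)               # column and table collations
    text = re.sub(r"\)\s*ENGINE\s*=[^;]*;", ");", text)              # table options after ")"
    # Deleting an index line can leave a dangling comma before the closing ")".
    text = re.sub(r",(\s*\n\s*\))", r"\1", text)
    return text

# Illustrative input only; the real dump's columns may differ.
sample = """CREATE TABLE `wp_fqir_commentmeta` (
  `meta_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `comment_id` bigint(20) unsigned NOT NULL DEFAULT '0',
  `meta_key` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`meta_id`),
  KEY `comment_id` (`comment_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;"""

cleaned = simplify_mysql_ddl(sample)
print(cleaned)
```

Regular expressions are fragile against unusual layouts, so the cleaned file is still worth a quick visual check before loading it into ERwin.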

9 Initial ERwin Model

10 Record Counts
From querying the MySQL database:
wp_fqir_commentmeta 90
wp_fqir_comments 31
wp_fqir_links
wp_fqir_options 377
wp_fqir_postmeta 703
wp_fqir_posts 434
wp_fqir_term_relationships 64
wp_fqir_term_taxonomy 15
wp_fqir_termmeta
wp_fqir_terms
wp_fqir_usermeta 49
wp_fqir_users 1
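Counts like these come from one `COUNT(*)` per table, driven by the catalog. Here is a minimal sketch in Python, using the standard-library `sqlite3` module as a stand-in for the MySQL connection (an assumption so that the example runs anywhere); the toy tables and rows are hypothetical. On MySQL, `information_schema.TABLES.TABLE_ROWS` is only an estimate for InnoDB tables, so exact counts still need `COUNT(*)`.

```python
import sqlite3

def record_counts(conn):
    """Exact row count for every table listed in the catalog."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    # Names come from the catalog, not user input, so quoting them into the SQL is acceptable here.
    return {name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names}

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wp_fqir_users (ID INTEGER PRIMARY KEY, user_login TEXT);
    CREATE TABLE wp_fqir_usermeta (umeta_id INTEGER PRIMARY KEY, user_id INTEGER, meta_key TEXT);
    INSERT INTO wp_fqir_users VALUES (1, 'admin');
    INSERT INTO wp_fqir_usermeta (user_id, meta_key) VALUES (1, 'nickname'), (1, 'locale');
""")
counts = record_counts(conn)
# Sorting by count tends to separate small 'type' tables from large 'instance' tables.
for name, n in sorted(counts.items(), key=lambda item: item[1]):
    print(f"{name:<25}{n:>8}")
```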

11 Manually Add FKs
- Look for name similarity.
- Verify with data analysis:
  SELECT * FROM wp_fqir_commentmeta WHERE comment_ID NOT IN (SELECT comment_ID FROM wp_fqir_comments); -- 0 records
  SELECT * FROM wp_fqir_commentmeta WHERE comment_ID IS NULL; -- 0 records
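Both steps (propose by name, then verify with data) can be automated. The following minimal Python sketch uses `sqlite3` as a stand-in DBMS (an assumption) and one simple name rule: a column that matches, ignoring case, another table's single-column primary key. Both function names and the toy tables are hypothetical, and real schemas usually need looser matching (prefixes, suffixes, abbreviations).

```python
import sqlite3

def candidate_fks(conn):
    """Propose (source, column, target, pk) FKs: a column whose name matches,
    ignoring case, the single-column primary key of another table."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    columns = {t: [(row[1], row[5]) for row in conn.execute(f'PRAGMA table_info("{t}")')]
               for t in tables}
    pks = {}
    for t, cols in columns.items():
        pk_cols = [name for name, pk in cols if pk]
        if len(pk_cols) == 1:
            pks[t] = pk_cols[0]
    return [(src, name, tgt, pk)
            for src, cols in columns.items()
            for name, _ in cols
            for tgt, pk in pks.items()
            if tgt != src and name.lower() == pk.lower()]

def dangling_count(conn, src, col, tgt, pk):
    """Rows of src whose col is NULL or matches no pk value in tgt (the slide's two checks)."""
    unmatched = conn.execute(
        f'SELECT COUNT(*) FROM "{src}" WHERE "{col}" NOT IN (SELECT "{pk}" FROM "{tgt}")'
    ).fetchone()[0]
    nulls = conn.execute(f'SELECT COUNT(*) FROM "{src}" WHERE "{col}" IS NULL').fetchone()[0]
    return unmatched + nulls

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wp_fqir_comments (comment_ID INTEGER PRIMARY KEY, comment_content TEXT);
    CREATE TABLE wp_fqir_commentmeta (meta_id INTEGER PRIMARY KEY, comment_id INTEGER, meta_key TEXT);
    INSERT INTO wp_fqir_comments VALUES (1, 'first'), (2, 'second');
    INSERT INTO wp_fqir_commentmeta VALUES (10, 1, 'note'), (11, 2, 'note');
""")
for fk in candidate_fks(conn):
    print(fk, "dangling rows:", dangling_count(conn, *fk))
```

A zero count supports the hypothesized FK but does not prove it. As the key themes slide warns, it remains a hypothesis.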

12 DBRE ERwin Model

13 Commentary
- WordPress has a very small schema: only 12 tables. I had expected more tables.
- WordPress has no dangling references.
- WordPress lacks declared referential integrity (RI) but compensates with careful programming.
- ERwin only partially reverse engineers MySQL; it chokes on some keywords.

14 Case Study 2: Reverse Engineer Adventure Works 2012

15 Rationale
Adventure Works 2012 is an interesting DBRE case study because...
- Adventure Works is a free database provided with MS SQL Server.
- The case study has a populated database.
- The data is not proprietary.
- The database is of medium size (71 tables).
- The database defines referential integrity.
  - Only one FK is missing.

16 Mechanical Approach
Strip down the schema to get to a core model.
- This is like skimming a book.
- We are working towards an abridgement of a model.
- We can quickly get a sense of a schema.
- We will use ER/Studio.

17 Record Counts
Query the SQL Server database:
dbo.AWBuildVersion 1
dbo.DatabaseLog 1597
dbo.ErrorLog
HumanResources.Department 16
HumanResources.Employee 290
HumanResources.EmployeeDepartmentHistory 296
HumanResources.EmployeePayHistory 316
HumanResources.JobCandidate 13
HumanResources.Shift 3
Person.Address 19614
Person.AddressType 6
Person.BusinessEntity 20777
Person.BusinessEntityAddress 19614
Person.BusinessEntityContact 909
Person.ContactType 20
Person.CountryRegion 238
Person.EmailAddress 19972
Person.Password
Person.Person
Person.PersonPhone
Person.PhoneNumberType 3
Person.StateProvince 181

18 Record Counts
Record counts partially indicate table purpose.
Production.BillOfMaterials 2679
Production.Culture 8
Production.Document 13
Production.Illustration 5
Production.Location 14
Production.Product 504
Production.ProductCategory 4
Production.ProductCostHistory 395
Production.ProductDescription 762
Production.ProductDocument 32
Production.ProductInventory 1069
Production.ProductListPriceHistory
Production.ProductModel 128
Production.ProductModelIllustration 7
Production.ProductModelProductDescriptionCulture 762
Production.ProductPhoto 101
Production.ProductProductPhoto 504
Production.ProductReview 4
Production.ProductSubcategory 37
Production.ScrapReason 16
Production.TransactionHistory 113443
Production.TransactionHistoryArchive 89253
Production.UnitMeasure 38
Production.WorkOrder 72591

19 Record Counts
Note the different counts for ‘types’ and ‘instances’.
Production.WorkOrderRouting 67131
Purchasing.ProductVendor 460
Purchasing.PurchaseOrderDetail 8845
Purchasing.PurchaseOrderHeader 4012
Purchasing.ShipMethod 5
Purchasing.Vendor 104
Sales.CountryRegionCurrency 109
Sales.CreditCard 19118
Sales.Currency 105
Sales.CurrencyRate 13532
Sales.Customer 19820
Sales.PersonCreditCard
Sales.SalesOrderDetail 121317
Sales.SalesOrderHeader 31465
Sales.SalesOrderHeaderSalesReason 27647
Sales.SalesPerson 17
Sales.SalesPersonQuotaHistory 163
Sales.SalesReason 10
Sales.SalesTaxRate 29
Sales.SalesTerritory
Sales.SalesTerritoryHistory
Sales.ShoppingCartItem 3
Sales.SpecialOffer 16
Sales.SpecialOfferProduct 538
Sales.Store 701

20 Processing Details
- Import the schema into ER/Studio.
  - File / New / Reverse engineer AdventureWorks2012
  - All owners, user tables, no inferences
- Add FK Production.WorkOrderRouting.ProductID -> Production.Product.ProductID.
- Successively delete all entity types with 0 or 1 connections.
- Delete entity types with few (<= 3) connections.
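The two deletion rounds amount to repeatedly removing low-degree nodes from the FK graph. Removing every table with at most k-1 connections until none remain leaves what graph theory calls the k-core. Below is a minimal Python sketch, assuming one undirected edge per FK and ignoring self-references (ER/Studio may count connections differently). The slide does not say whether the "<= 3" round was repeated until stable; this sketch repeats both rounds, which is an assumption. All table names are hypothetical.

```python
from collections import Counter

def prune(edges, max_degree):
    """Repeatedly drop every table with at most max_degree FK connections.

    edges: (source_table, target_table) pairs, one per FK. Self-references are
    ignored, which is an assumption. Tables with 0 connections never appear in
    edges, so they drop out automatically. Returns the surviving edges.
    """
    edges = [(s, t) for s, t in edges if s != t]
    while True:
        degree = Counter()
        for s, t in edges:
            degree[s] += 1
            degree[t] += 1
        doomed = {table for table, d in degree.items() if d <= max_degree}
        if not doomed:
            return edges
        edges = [(s, t) for s, t in edges if s not in doomed and t not in doomed]

fks = [
    ("Order", "Customer"), ("Invoice", "Order"), ("Invoice", "Customer"),
    ("OrderLine", "Order"), ("OrderLine", "Product"), ("Product", "Category"),
    ("Employee", "Employee"),  # self-reference, ignored
]
core = prune(fks, max_degree=1)
print(sorted({table for edge in core for table in edge}))
# ['Customer', 'Invoice', 'Order']
```

The example shows the cascade: dropping Category leaves Product with one connection, which in turn exposes OrderLine.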

21 Step 1: ER/Studio Model

22 Step 2: ER/Studio Model

23 Step 3: ER/Studio Model

24 Step 3: Final Tables
Remaining tables in black
dbo.AWBuildVersion 1
dbo.DatabaseLog 1597
dbo.ErrorLog
HumanResources.Department 16
HumanResources.Employee 290
HumanResources.EmployeeDepartmentHistory 296
HumanResources.EmployeePayHistory 316
HumanResources.JobCandidate 13
HumanResources.Shift 3
Person.Address 19614
Person.AddressType 6
Person.BusinessEntity 20777
Person.BusinessEntityAddress 19614
Person.BusinessEntityContact 909
Person.ContactType 20
Person.CountryRegion 238
Person.EmailAddress 19972
Person.Password
Person.Person
Person.PersonPhone
Person.PhoneNumberType 3
Person.StateProvince 181

25 Step 3: Final Tables
Deleted tables in red
Production.BillOfMaterials 2679
Production.Culture 8
Production.Document 13
Production.Illustration 5
Production.Location 14
Production.Product 504
Production.ProductCategory 4
Production.ProductCostHistory 395
Production.ProductDescription 762
Production.ProductDocument 32
Production.ProductInventory 1069
Production.ProductListPriceHistory
Production.ProductModel 128
Production.ProductModelIllustration 7
Production.ProductModelProductDescriptionCulture 762
Production.ProductPhoto 101
Production.ProductProductPhoto 504
Production.ProductReview 4
Production.ProductSubcategory 37
Production.ScrapReason 16
Production.TransactionHistory 113443
Production.TransactionHistoryArchive 89253
Production.UnitMeasure 38
Production.WorkOrder 72591

26 Step 3: Final Tables
Production.WorkOrderRouting 67131
Purchasing.ProductVendor 460
Purchasing.PurchaseOrderDetail 8845
Purchasing.PurchaseOrderHeader 4012
Purchasing.ShipMethod 5
Purchasing.Vendor 104
Sales.CountryRegionCurrency 109
Sales.CreditCard 19118
Sales.Currency 105
Sales.CurrencyRate 13532
Sales.Customer 19820
Sales.PersonCreditCard
Sales.SalesOrderDetail 121317
Sales.SalesOrderHeader 31465
Sales.SalesOrderHeaderSalesReason 27647
Sales.SalesPerson 17
Sales.SalesPersonQuotaHistory 163
Sales.SalesReason 10
Sales.SalesTaxRate 29
Sales.SalesTerritory
Sales.SalesTerritoryHistory
Sales.ShoppingCartItem 3
Sales.SpecialOffer 16
Sales.SpecialOfferProduct 538
Sales.Store 701

27 Commentary
- DBRE depends on having FKs.
- Deleting entity types with 0 or 1 connections loses little information.
- Deleting entity types with “few” connections is speculative.
- Supertype/subtypes are troublesome.
  - From a separate manual DBRE:
    - BusinessEntity -> Employee, Vendor, Person, Store
    - Employee -> SalesPerson
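One common heuristic for spotting supertype/subtype candidates is a shared primary key: a foreign key whose columns are exactly the referencing table's own primary key, which makes the relationship one-to-one. Below is a minimal Python sketch under that assumption. The metadata is illustrative, written to mirror the BusinessEntity example rather than copied from AdventureWorks, and the function name is hypothetical. The heuristic also flags plain one-to-one extension tables, so its output still needs manual review.

```python
def subtype_candidates(primary_keys, foreign_keys):
    """Return (supertype, subtype) pairs where an FK reuses the source table's whole PK.

    primary_keys: {table: tuple of PK columns}
    foreign_keys: [(source_table, tuple of FK columns, target_table), ...]
    """
    return [(target, source)
            for source, fk_cols, target in foreign_keys
            if source != target and set(fk_cols) == set(primary_keys.get(source, ()))]

primary_keys = {
    "BusinessEntity": ("BusinessEntityID",),
    "Person": ("BusinessEntityID",),
    "Store": ("BusinessEntityID",),
    "PersonPhone": ("BusinessEntityID", "PhoneNumber", "PhoneNumberTypeID"),
    "PhoneNumberType": ("PhoneNumberTypeID",),
}
foreign_keys = [
    ("Person", ("BusinessEntityID",), "BusinessEntity"),
    ("Store", ("BusinessEntityID",), "BusinessEntity"),
    ("PersonPhone", ("BusinessEntityID",), "Person"),  # only part of the PK: not a subtype
    ("PersonPhone", ("PhoneNumberTypeID",), "PhoneNumberType"),
]
print(subtype_candidates(primary_keys, foreign_keys))
# [('BusinessEntity', 'Person'), ('BusinessEntity', 'Store')]
```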

28 Supertype / Subtype

29 Case Study 3: Core DBRE

30 Rationale
- A project building a very large data warehouse
  - 100 facts, 200 dimensions
  - The primary operational feeder application has 8500 tables.
- I was new to the project, and there was a lot to learn.
- I wanted to reverse engineer the feeder application so that I could understand it.

31 Available Inputs
We had the following inputs (paper printouts) for the feeder application:
- A thorough data dictionary
- Primary key definitions
- Foreign key definitions

32 The DBRE Problem
Reverse engineer a database with 8500 tables.
- With a smaller schema, we could type the database structure into a modeling tool and then analyze it.
- However, typing in 8500 tables would take too long.
- We decided to determine the tightly connected tables, hoping that would yield a much smaller model.
- We presume that the tightly connected tables are the most important ones.

33 DBRE Approach
Do a graph analysis.
- Create a meta-table with FK-to-PK references.
  - The FK in the source table points to the PK in the target table.
- Using SQL, successively delete tables with 0 or 1 FK connections.
- The final result is the multiply connected tables.

34 Example

35 Example
Repeatedly remove tables with one reference until there is no change.

36 Finding Core Tables
DELETE FROM TableReferences AS T3
WHERE EXISTS (
    SELECT T1.sourceTable
    FROM TableReferences AS T1
    WHERE NOT EXISTS (
        SELECT *
        FROM TableReferences AS T2
        WHERE T1.sourceTable = T2.targetTable
    )
    AND T3.sourceTable = T1.sourceTable
    GROUP BY T1.sourceTable
    HAVING COUNT(*) = 1
);
- The middle query finds source tables with exactly one FK reference.
- The innermost query restricts those to tables that are not the target of any reference.
- The outer query deletes the references of the tables found; it is repeated until no rows are deleted.
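The query must be rerun until a pass deletes nothing. A minimal Python driver for that loop follows, using the standard-library `sqlite3` module as a stand-in DBMS (an assumption; the slides do not name the project's DBMS). Because not every DBMS accepts an alias on the `DELETE` target (`DELETE FROM TableReferences AS T3`), the sketch uses an equivalent `IN (...)` form. Like the original, it prunes only from the referencing side: a table that is referenced once but references nothing stays in the result. The function name and sample references are hypothetical.

```python
import sqlite3

# Same logic as the slide's query, written with IN (...) instead of a DELETE alias.
PRUNE_LEAVES = """
    DELETE FROM TableReferences
    WHERE sourceTable IN (
        SELECT T1.sourceTable
        FROM TableReferences AS T1
        WHERE NOT EXISTS (
            SELECT * FROM TableReferences AS T2
            WHERE T1.sourceTable = T2.targetTable)
        GROUP BY T1.sourceTable
        HAVING COUNT(*) = 1)
"""

def find_core_tables(references):
    """references: (sourceTable, targetTable) pairs, one per FK-to-PK reference."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TableReferences (sourceTable TEXT, targetTable TEXT)")
    conn.executemany("INSERT INTO TableReferences VALUES (?, ?)", references)
    passes = 0
    # Repeat until a pass deletes nothing ("until there is no change").
    while conn.execute(PRUNE_LEAVES).rowcount > 0:
        passes += 1
    remaining = conn.execute("SELECT sourceTable, targetTable FROM TableReferences").fetchall()
    conn.close()
    return {table for pair in remaining for table in pair}, passes

refs = [
    ("Order", "Customer"), ("OrderLine", "Order"), ("OrderLine", "Product"),
    ("Invoice", "Order"), ("Invoice", "Customer"),
    ("Shipment", "Order"), ("ShipmentAudit", "Shipment"),
]
core, passes = find_core_tables(refs)
print(sorted(core), passes)
# ['Customer', 'Invoice', 'Order', 'OrderLine', 'Product'] 2
```

The first pass removes ShipmentAudit; only then does Shipment stop being a target, so the second pass removes it.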

37 Results
- Initial: 8500 tables
  - Several thousand FK definitions
  - 854 tables have FK columns
  - 254 tables are referenced by FKs
- Final result: 553 core tables

38 Case Study 4: Enterprise Data Model

39 Rationale
Construct an enterprise data model (EDM).
- My client, a financial software vendor, was a fusion of five formerly separate companies.
- The applications were greatly dissimilar because they were built by separate organizations.
- The purpose of the EDM was to provide a basis for integrating the applications and to help the new company strengthen its brand.

40 DBRE Approach
We wanted to seed the EDM with application content.
- We tried full DBRE, but it was not helpful because the resulting models were so different.
- We tried core DBRE, but the models were still confusing because they were so different.
- Finally, we decided to count the FK references to each table.
  - We included the tables with the highest counts.
  - This worked.
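Counting FK references per table needs only the same FK-to-PK list used for core DBRE. Below is a minimal Python sketch, assuming the references arrive as (source, target) pairs. The table names and the cutoff `top_n` are hypothetical; the slides do not say how many of the highest-count tables were included.

```python
from collections import Counter

def most_referenced(references, top_n):
    """Rank target tables by how many FKs point at them and keep the top_n."""
    counts = Counter(target for _source, target in references)
    return counts.most_common(top_n)

refs = [
    ("Application", "Applicant"), ("Address", "Applicant"), ("Guarantor", "Applicant"),
    ("Address", "Country"), ("Fee", "FeeType"), ("Policy", "Provider"), ("Quote", "Provider"),
]
print(most_referenced(refs, top_n=2))
# [('Applicant', 3), ('Provider', 2)]
```

Unlike the pruning approaches, this ranking is insensitive to how the rest of the graph is connected, which may be why it held up across such dissimilar applications.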

41 Example
AddressType 2
Applicant 29
ApplicantAddressHistory 7
ApplicantType 2
LenderApplicantDetails 2
Country 2
OverseasCorrespondence 2
ExistingInsuranceCover 6
Provider 3
PaymentFrequency 2
Fee
FeeDueType 2
FeeType 2
ProductFee 2

42 Results

43 Thank you for attending…
Any questions???

