1
A Database Reverse Engineering Case Study
DAMA MN, Nov. 16, 2016
Michael R. Blaha, DSc.
2
What is Database Reverse Engineering?
- Reverse engineering is the inverse to normal development
  - Start with an application and work backwards to understand the software and infer its intent
- Reverse engineering can apply to a variety of artifacts
  - Hardware, programming code, databases, …
- Our focus here is on databases
3
Why Would Anyone Want to Do DBRE?
- To elicit requirements
  - DBRE is not intended to perpetuate past flaws
  - DBRE is merely a source of tentative requirements
- To convert legacy data
- To integrate application stovepipes
- To assess software
- To assist maintenance
- To construct documentation
4
Inputs to DBRE
The available information varies widely…
- Database structure
- Documentation
- Application understanding
- Data
- Database queries
- Forms and reports
5
Key Themes for DBRE
- Don’t mistake hypotheses for conclusions
- Expect multiple interpretations
- Don’t be discouraged by approximate results
- Expect odd constructs
- Watch for consistent style
6
Case Study 1: Reverse Engineer WordPress
7
Rationale
WordPress is an interesting DBRE case study because...
- WordPress is a well-known application
- WordPress is a framework that generalizes the process of building a website
- The case study has a populated database
- The data is real (not synthetic)
- The data is not proprietary
- It illustrates DBRE techniques for a small database
8
Processing Details
- Export MySQL db from www.superdataguy.com website
  - The exported localhost.sql is unreadable
- Import SQL code into a local MySQL db
- Export schema only from local MySQL db
  - The exported file is readable
- Manually edit the SQL by deleting…
  - `, unsigned, COLLATE, KEY, UNIQUE KEY, ENGINE
- Reverse engineer schema with ERwin
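The manual edit described above could also be scripted. The following is a minimal sketch, not the procedure used in the talk: the function name `simplify_mysql_ddl` and the sample DDL are hypothetical, and keeping `PRIMARY KEY` lines while dropping `KEY` and `UNIQUE KEY` lines is an assumption about what the modeling tool needs.

```python
import re


def simplify_mysql_ddl(ddl: str) -> str:
    """Strip MySQL-specific syntax that a modeling tool may reject (hypothetical helper)."""
    kept = []
    for line in ddl.splitlines():
        stripped = line.strip()
        # Drop secondary index lines; PRIMARY KEY lines do not match and are kept (assumption).
        if re.match(r"(UNIQUE\s+)?KEY\b", stripped, flags=re.IGNORECASE):
            continue
        line = line.replace("`", "")
        line = re.sub(r"\s+unsigned\b", "", line, flags=re.IGNORECASE)
        line = re.sub(r"\s+COLLATE\s*=?\s*\w+", "", line, flags=re.IGNORECASE)
        line = re.sub(r"\s*ENGINE\s*=\s*\w+", "", line, flags=re.IGNORECASE)
        kept.append(line)
    text = "\n".join(kept)
    # Removing index lines can leave a dangling comma before the closing parenthesis.
    return re.sub(r",(\s*\n\s*\))", r"\1", text)


# Made-up DDL in the style of a MySQL schema export.
sample = """CREATE TABLE `wp_fqir_commentmeta` (
  `meta_id` bigint(20) unsigned NOT NULL,
  `comment_id` bigint(20) unsigned NOT NULL,
  `meta_key` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`meta_id`),
  KEY `comment_id` (`comment_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;"""

cleaned = simplify_mysql_ddl(sample)
print(cleaned)
```

A line-based regex pass like this is fragile compared with a real SQL parser, so the output still deserves a manual read before it is fed to a modeling tool.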
9
Initial ERwin Model
10
Record Counts
From querying the MySQL database
wp_fqir_commentmeta 90
wp_fqir_comments 31
wp_fqir_links
wp_fqir_options 377
wp_fqir_postmeta 703
wp_fqir_posts 434
wp_fqir_term_relationships 64
wp_fqir_term_taxonomy 15
wp_fqir_termmeta
wp_fqir_terms
wp_fqir_usermeta 49
wp_fqir_users 1
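Record counts like these can be collected in one pass over the catalog. This is a sketch under stated assumptions: it uses Python's built-in `sqlite3` with an in-memory database as a stand-in for MySQL, the two tables and their rows are made up, and `record_counts` is a hypothetical helper name.

```python
import sqlite3


def record_counts(conn):
    """Return {table_name: row_count} for every user table (hypothetical helper)."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    # Table names come from the catalog itself, so quoting them is sufficient here.
    return {t: conn.execute(f'SELECT COUNT(*) FROM "{t}"').fetchone()[0]
            for t in tables}


# Made-up stand-in data; the real case study queried a MySQL database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comments (comment_ID INTEGER PRIMARY KEY);
CREATE TABLE commentmeta (meta_id INTEGER PRIMARY KEY, comment_id INTEGER);
INSERT INTO comments VALUES (1), (2);
INSERT INTO commentmeta VALUES (10, 1), (11, 1), (12, 2);
""")
counts = record_counts(conn)
print(counts)
```

On MySQL the list of tables would come from `information_schema.tables` rather than `sqlite_master`; the counting loop stays the same.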
11
Manually Add FKs
- Look for name similarity
- Verify with data analysis

SELECT * FROM wp_fqir_commentmeta
WHERE comment_ID NOT IN (SELECT comment_ID FROM wp_fqir_comments);
-- 0 records

SELECT * FROM wp_fqir_commentmeta
WHERE comment_ID IS NULL;
-- 0 records
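The two checks above generalize to any candidate foreign key. The sketch below is an illustration, not the author's tooling: it runs against an in-memory SQLite database with made-up rows, and `count_fk_violations` is a hypothetical helper. It uses `NOT EXISTS` instead of `NOT IN`, because `NOT IN` returns no rows at all when the subquery yields a NULL.

```python
import sqlite3


def count_fk_violations(conn, child, fk_col, parent, pk_col):
    """Return (orphans, nulls) for a candidate FK child.fk_col -> parent.pk_col.

    Identifiers are interpolated directly, so pass trusted names only.
    """
    orphans = conn.execute(
        f"SELECT COUNT(*) FROM {child} c "
        f"WHERE c.{fk_col} IS NOT NULL AND NOT EXISTS "
        f"(SELECT 1 FROM {parent} p WHERE p.{pk_col} = c.{fk_col})"
    ).fetchone()[0]
    nulls = conn.execute(
        f"SELECT COUNT(*) FROM {child} WHERE {fk_col} IS NULL"
    ).fetchone()[0]
    return orphans, nulls


# Made-up stand-in data for the comments / commentmeta pair.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comments (comment_ID INTEGER PRIMARY KEY);
CREATE TABLE commentmeta (meta_id INTEGER PRIMARY KEY, comment_id INTEGER);
INSERT INTO comments VALUES (1), (2);
INSERT INTO commentmeta VALUES (10, 1), (11, 2);
""")
clean = count_fk_violations(conn, "commentmeta", "comment_id", "comments", "comment_ID")

# Add one dangling reference to show what a failed check looks like.
conn.execute("INSERT INTO commentmeta VALUES (12, 99)")
dirty = count_fk_violations(conn, "commentmeta", "comment_id", "comments", "comment_ID")
print(clean, dirty)
```

Zero orphans supports the FK hypothesis but does not prove it; a small table can satisfy the check by coincidence.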
12
DBRE ERwin Model
13
Commentary
- WordPress has a very small schema
  - Only 12 tables
  - I had expected more tables
- WordPress has no dangling references
  - WordPress lacks RI
  - WordPress compensates with careful programming
- ERwin only partially reverse engineers MySQL
  - It chokes on some keywords
14
Case Study 2: Reverse Engineer Adventure Works 2012
15
Rationale
Adventure Works 2012 is an interesting DBRE case study because...
- Adventure Works is a free database provided with MS SQL Server
- The case study has a populated database
- The data is not proprietary
- The database is of medium size (71 tables)
- The database defines referential integrity
  - Only one FK is missing
16
Mechanical Approach
- Strip down the schema to get to a core model
  - This is like skimming a book
  - We are working towards an abridgement of a model
- We can quickly get a sense of a schema
- We will use ER/Studio
17
Record Counts
Query the SQL Server database
dbo.AWBuildVersion 1
dbo.DatabaseLog 1597
dbo.ErrorLog
HumanResources.Department 16
HumanResources.Employee 290
HumanResources.EmployeeDepartmentHistory 296
HumanResources.EmployeePayHistory 316
HumanResources.JobCandidate 13
HumanResources.Shift 3
Person.Address 19614
Person.AddressType 6
Person.BusinessEntity 20777
Person.BusinessEntityAddress 19614
Person.BusinessEntityContact 909
Person.ContactType 20
Person.CountryRegion 238
Person.EmailAddress 19972
Person.Password
Person.Person
Person.PersonPhone
Person.PhoneNumberType 3
Person.StateProvince 181
18
Record Counts
Record counts partially indicate table purpose
Production.BillOfMaterials 2679
Production.Culture 8
Production.Document 13
Production.Illustration 5
Production.Location 14
Production.Product 504
Production.ProductCategory 4
Production.ProductCostHistory 395
Production.ProductDescription 762
Production.ProductDocument 32
Production.ProductInventory 1069
Production.ProductListPriceHistory
Production.ProductModel 128
Production.ProductModelIllustration 7
Production.ProductModelProductDescriptionCulture 762
Production.ProductPhoto 101
Production.ProductProductPhoto 504
Production.ProductReview 4
Production.ProductSubcategory 37
Production.ScrapReason 16
Production.TransactionHistory 113443
Production.TransactionHistoryArchive 89253
Production.UnitMeasure 38
Production.WorkOrder 72591
19
Record Counts
Note different counts for ‘types’ and ‘instances’
Production.WorkOrderRouting 67131
Purchasing.ProductVendor 460
Purchasing.PurchaseOrderDetail 8845
Purchasing.PurchaseOrderHeader 4012
Purchasing.ShipMethod 5
Purchasing.Vendor 104
Sales.CountryRegionCurrency 109
Sales.CreditCard 19118
Sales.Currency 105
Sales.CurrencyRate 13532
Sales.Customer 19820
Sales.PersonCreditCard
Sales.SalesOrderDetail 121317
Sales.SalesOrderHeader 31465
Sales.SalesOrderHeaderSalesReason 27647
Sales.SalesPerson 17
Sales.SalesPersonQuotaHistory 163
Sales.SalesReason 10
Sales.SalesTaxRate 29
Sales.SalesTerritory
Sales.SalesTerritoryHistory
Sales.ShoppingCartItem 3
Sales.SpecialOffer 16
Sales.SpecialOfferProduct 538
Sales.Store 701
20
Processing Details
- Import schema into ER/Studio
  - File / New / Reverse engineer AdventureWorks2012
  - All owners, user tables, no inferences
- Add FK
  - Production.WorkOrderRouting.ProductID -> Production.Product.ProductID
- Successively delete all entity types with 0,1 connections
- Delete entity types with few (<= 3) connections
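The deletion steps above were done by hand in ER/Studio; the same pruning can be sketched on a list of FK edges. Everything here is illustrative: `prune_weakly_connected` is a hypothetical helper, the table names A through E are made up, and repeating the pass until nothing changes is an assumption (the slide only says "successively" for the 0,1 step).

```python
from collections import defaultdict


def prune_weakly_connected(edges, max_connections=1):
    """Repeatedly drop tables that have at most `max_connections` FK links.

    `edges` is a list of (source_table, target_table) pairs, one per FK.
    Returns the FK edges that survive once no more tables can be dropped.
    """
    remaining = list(edges)
    while True:
        degree = defaultdict(int)
        for source, target in remaining:
            degree[source] += 1
            degree[target] += 1
        weak = {table for table, d in degree.items() if d <= max_connections}
        if not weak:
            return remaining
        # Dropping a table also drops every FK that touches it.
        remaining = [(s, t) for s, t in remaining
                     if s not in weak and t not in weak]


# Made-up schema: A, B, C form a cycle; D hangs off A; E hangs off D.
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("D", "A"), ("E", "D")]
core = prune_weakly_connected(edges, max_connections=1)
aggressive = prune_weakly_connected(edges, max_connections=3)
print(core, aggressive)
```

In this toy schema the 0,1 pass removes E and then D, leaving the A-B-C cycle. Raising the threshold to 3 removes everything, which mirrors the caution that the "few connections" deletions are speculative.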
21
Step 1: ER/Studio Model
22
Step 2: ER/Studio Model
23
Step 3: ER/Studio Model
24
Step 3: Final Tables
Remaining tables in black
[Table list repeats the dbo, HumanResources, and Person record counts from slide 17; the color coding is not preserved in this transcript.]
25
Step 3: Final Tables
Deleted tables in red
[Table list repeats the Production record counts from slide 18; the color coding is not preserved in this transcript.]
26
Step 3: Final Tables
[Table list repeats the Purchasing and Sales record counts from slide 19; the color coding is not preserved in this transcript.]
27
Commentary
- DBRE depends on having FKs
- The 0,1 connection deletions lose little info
- The “few” connections deletions are speculative
- Supertype/subtypes are troublesome
  - From a separate manual DBRE
    - BusinessEntity -> Employee, Vendor, Person, Store
    - Employee -> SalesPerson
28
Supertype / Subtype
29
Case Study 3: Core DBRE
30
Rationale
- A project building a very large data warehouse
  - 100 facts
  - 200 dimensions
- The primary operational feeder application has 8500 tables
- I was new to the project and there was a lot to learn
- I wanted to reverse engineer the feeder application so that I could understand it
31
Available Inputs
We had the following inputs (paper printouts) for the feeder application
- A thorough data dictionary
- Primary key definitions
- Foreign key definitions
32
The DBRE Problem
- Reverse engineer a database with 8500 tables
- With a smaller schema, we could type the database structure into a modeling tool and then analyze it
- However, 8500 tables would take too long
- We decided to determine the tightly connected tables and hope that would yield a much smaller model
- We presume that the tightly connected tables are the most important ones
33
DBRE Approach
- Do a graph analysis
- Create a meta-table with FK to PK references
  - The FK in the source table points to the PK in the target table
- Using SQL, successively delete tables with 0,1 FK connections
- The final result is the multiply connected tables
34
Example
35
Example
Repeatedly subtract tables with one reference until there is no change
36
Finding Core Tables

DELETE FROM TableReferences AS T3
WHERE EXISTS (
    SELECT T1.sourceTable
    FROM TableReferences AS T1
    WHERE NOT EXISTS (
        SELECT *
        FROM TableReferences AS T2
        WHERE T1.sourceTable = T2.targetTable
    )
    AND T3.sourceTable = T1.sourceTable
    GROUP BY T1.sourceTable
    HAVING COUNT(*)=1
);

- The middle query finds tables with one source reference
- The innermost query limits the one-source tables to those that are not the target of any other sources
- The outer query does the deletion
37
Results
- Initial: 8500 tables
  - Several thousand FK definitions
  - 854 tables have FK columns
  - 254 tables are referenced by FKs
- Final result: 553 core tables
38
Case Study 4: Enterprise Data Model
39
Rationale
- Construct an enterprise data model
- My client – a financial software vendor – was a fusion of five formerly separate companies
- The applications were greatly dissimilar because they were built by separate organizations
- The purpose of the EDM was to provide a basis for integrating the applications and help the new company strengthen their brand
40
DBRE Approach
- We wanted to seed the EDM with application content
- We tried full DBRE but it was not helpful because the resulting models were so different
- We tried core DBRE but the models were still confusing because they were so different
- Finally we decided to count the FK references to each table
  - We included the tables with the highest counts
  - This worked
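Counting FK references per table is a simple aggregation. The sketch below assumes that "references to each table" means incoming FKs, which the slides do not state explicitly; `most_referenced` is a hypothetical helper and the reference pairs are made up.

```python
from collections import Counter


def most_referenced(references, top_n):
    """Return the `top_n` tables with the most incoming FK references.

    `references` is a list of (source_table, target_table) pairs, one per FK.
    """
    counts = Counter(target for _source, target in references)
    return counts.most_common(top_n)


# Made-up FK list; only the counting technique is the point here.
references = [
    ("Loan", "Applicant"),
    ("Address", "Applicant"),
    ("Fee", "Applicant"),
    ("Loan", "Product"),
    ("Fee", "Product"),
    ("Applicant", "Country"),
]
top = most_referenced(references, top_n=2)
print(top)
```

Against a meta-table of references, the equivalent SQL would be a `GROUP BY` on the target table with `ORDER BY COUNT(*) DESC`.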
41
Example
AddressType 2
Applicant 29
ApplicantAddressHistory 7
ApplicantType 2
LenderApplicantDetails 2
Country 2
OverseasCorrespondence 2
ExistingInsuranceCover 6
Provider 3
PaymentFrequency 2
Fee
FeeDueType 2
FeeType 2
ProductFee 2
42
Results
43
Thank you for attending…
Any questions???