Ron Forino DAMA - Washington, DC September 1999 Project Driven Data Quality Improvement.


1 Ron Forino DAMA - Washington, DC September 1999 Project Driven Data Quality Improvement

2 Examples
Confidential DMR Consulting Group Inc.
According to DM Review, one European company discovered through an audit that it was not invoicing 4% of its orders. With $2 billion in revenues, that meant $80 million went unpaid.
Electronic data audits show that invalid data values in the typical customer database average around 15 - 20%. Physical audits suggest that this number may be closer to 25 - 30%.
In 1992, 96,000 IRS tax refund checks were returned “undeliverable” due to incorrect addresses.
This year, incorrect price data in retail databases will cost American consumers as much as $2.5 billion in overcharges.
According to organizations like the Data Warehouse Institute, the Gartner Group and META Group, Data Quality is one of the top 1-3 success factors in Data Warehousing.
The average mid-sized company may have 30,000 - 50,000 fields in files, tables, screens, reports, etc. [Platinum Technology]

3 Agenda
Definitions
What is Data Quality?
Tactics and the End Game
Building Blocks to Data Quality
  – Tactical Initiatives
  – Strategic Initiatives
Tactical Data Quality
  – Rule Disclosure
  – Data Quality Measurement, Analysis and Certification
  – Meta Data Creation
  – Validation
  – Quality Improvement

4 Definitions

5 Definitions
Data Transformation - Changing data values to a format consistent with the integrity and business rules agreed to by data stakeholders.
Data Cleansing - Consolidation of redundant customer records. The term describes the “merging and purging” of customer lists to reduce duplicate or inaccurate customer records.
Data Quality Improvement - The process of improving data quality to the level needed to support the enterprise's demand for information.
Data Quality - definition to follow.

6 Data Quality Improvement Decision Tree
Diagram labels: Data Quality Improvement; Data Cleansing; Transform; Data Reengineering; Match & Dedupe; Process Reengineering; Standardize; Validate; Match; Dedupe; Integrate; Enrich; Conform to Business Rule. Legend: Task, Process.
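The Standardize, Match and Dedupe tasks named in the tree can be illustrated with a minimal sketch. It is not the presenter's method: the field names (`name`, `zip`), the match rule and the "keep the first record" survivorship policy are all assumptions chosen for illustration.

```python
import re

def standardize(record):
    """Lower-case, strip punctuation and collapse whitespace in every field."""
    def clean(text):
        text = re.sub(r"[^\w\s]", "", text.lower())
        return re.sub(r"\s+", " ", text).strip()
    return {field: clean(value) for field, value in record.items()}

def match_key(record):
    # Hypothetical match rule: same standardized name and postal code
    # is treated as the same customer.
    std = standardize(record)
    return (std["name"], std["zip"])

def dedupe(records):
    """Keep the first record seen for each match key (assumed policy)."""
    survivors = {}
    for record in records:
        survivors.setdefault(match_key(record), record)
    return list(survivors.values())

customers = [
    {"name": "Acme Corp.", "zip": "20001"},
    {"name": "ACME  CORP", "zip": "20001"},
    {"name": "Beta LLC", "zip": "10001"},
]
unique = dedupe(customers)
print(len(customers), "records in,", len(unique), "records out")
```

Real merge/purge tools use fuzzy matching and explicit survivorship rules; exact matching on a standardized key is only the simplest possible case.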

7 Tactics and The End Game
Diagram labels: “We need better data quality...”; Enterprise Initiative; Select Project; Data Quality Assessment; Report & Recommendations; Source System Clean-up Initiative.

8 Tactics and The End Game
Diagram labels: “We need better data quality...”; Enterprise Initiative; Select Project; Data Quality Assessment; Report & Recommendations; Source System Clean-up Initiative; Data Warehouse; Data Quality Assessment Report; Staging Specifications; Source System Clean-up Initiative.

9 What is [Good] Data Quality?

10 How Can We Know Good Data Quality?
Column 1: 321453, 212392, 093255, 214421
Is this Good Data Quality? What can we conclude?

11 What is Data Quality?
Information Quality = f(Definition + Data + Presentation)
Definition
  – Defines Data
  – Domain Value Specification
  – Business Rules that Govern the Data
  – Information Architecture Quality
Data Content
  – Completeness
  – Validity/Reasonability
Data Presentation
  – Accessible
  – Timely
  – Non-ambiguous

12 Common Data Quality Problems
Data Content
  – Missing Data
  – Invalid Data
  – Data Outside Legal Domain
  – Illogical Combinations of Data
Structural
  – Record Key Integrity
  – Referential Integrity
  – Cardinality Integrity
Migration/Integration
  – Rationalization Anomalies
  – Duplicate or Lost Entities
Definitions and Standards
  – Ambiguous Business Rules
  – Multiple Formats for Same Data Elements
  – Different Meanings for the Same Code Value
  – Multiple Code Values with the Same Meaning
  – Field Used for Unintended Data
  – Data in Filler
  – Y2K Violation

13 Building Blocks to Data Quality

14 Building Blocks of a Data Quality Program
Diagram axis: Benefits Realization
Tactical: Rule Disclosure, Measure, Analyze & Certify, Meta Data Creation, Validation, Quality Improvement
Strategic: Data Stewardship, DQ Requirements, Enterprise Cultural Shift, QC/Process Auditing, Defect Prevention, Quality Reengineering

15 Tactical Data Quality

16 Steps to Tactical Data Quality
Rule Disclosure
Measure Quality
Analyze & Certify
Meta Data Creation
Validation
Quality Improvement

17 Rule Disclosure

18 Sources of Meta Data
Legacy Meta Data
  – Data Models, Process Models
  – Data Dictionary, Definitions, Aliases
  – Glossary of Terms
Transformation Meta Data
  – Data Mapping
  – Transformation Rules
  – Error Handling Rules
Access Meta Data
  – Data Directory
  – Data Definitions
The Subject Matter Expert
  – Database Directory
  – Domain Values, Range of Values
  – Run Books
  – Derived Data Calculations
  – Audit Statistics
  – Source & Transformation

19 Acquiring good Meta Data is Essential
Meta Data can be gathered before, during or after the Assessment.
Diagram labels, first flow: Collect Documentation; Report Findings; Validate the Meta Data; Assess the Data.
Diagram labels, second flow: Collect Documentation; Validate Findings; Assess the Data; Report Findings.
Diagram labels, third flow (Preferred): Collect Valid Meta Data; Report Findings; Assess the Data.
“You can pay me now, or you can pay me later…”

20 Measuring Data Quality
  – Techniques
  – Tools
  – Methods

21 How can Data Quality be Measured?
  – Customer Complaints
  – User Interviews & Feedback
  – Customer Satisfaction Survey
  – Data Quality Requirements Gathering
  – Data Quality Assessments
“One accurate measurement is worth a thousand expert opinions” [Grace Hopper, Admiral, US Navy]

22 Measuring Data Quality - Tools
Analysis Tools
Specifically designed assessment tools
  – Quality Manager, Migration Architect
  – N & A (name and address): Trillium, Group-1, ID Centric, Finalist, etc.
Improvisations
  – SAS, Focus, SQL, other query tools
Other Necessary Tools
  – File Transfer
  – Data Conversion

23 Assessment Measurements
Level 1: Completeness
  – Nulls or Blanks
  – Misuse (or overuse) of Default Values
Level 2: Validity
  – Data Integrity Anomalies
  – Invalid Data based on Business Rule
Level 3: Structural Integrity
  – Primary Key Uniqueness
  – Key Structure (Cardinality, Referential Integrity, Alternate Keys)
Level 4: Business Rule Violations
  – Relationship between two or more fields
  – Calculations
Grouping labels on the slide: Field Integrity; Intuitive Integrity Rules; Business Rule Integrity Requiring Meta Data.
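The four measurement levels can be illustrated with one check per level. The sketch below is only an illustration under stated assumptions: the order records, the field names, the default markers, the state domain and the `total = qty * unit_price` rule are all hypothetical and do not come from the presentation.

```python
from collections import Counter

DEFAULT_MARKERS = {"", "N/A", "UNKNOWN"}   # assumed blank/default values
VALID_STATES = {"DC", "MD", "VA"}          # assumed domain for "state"

def is_blank(value):
    return value is None or str(value).strip() in DEFAULT_MARKERS

def assess(orders, known_customer_ids):
    """Count violations at each of the four levels for a list of order dicts."""
    key_counts = Counter(o["order_id"] for o in orders)
    return {
        # Level 1: completeness
        "L1 blank or default state": sum(is_blank(o["state"]) for o in orders),
        # Level 2: validity against a documented domain
        "L2 state outside domain": sum(
            not is_blank(o["state"]) and o["state"] not in VALID_STATES
            for o in orders
        ),
        # Level 3: structural integrity
        "L3 duplicate primary keys": sum(c - 1 for c in key_counts.values() if c > 1),
        "L3 orphaned customer ids": sum(
            o["customer_id"] not in known_customer_ids for o in orders
        ),
        # Level 4: business rule across fields (amounts in cents)
        "L4 total != qty * unit_price": sum(
            o["total"] != o["qty"] * o["unit_price"] for o in orders
        ),
    }

orders = [
    {"order_id": 1, "customer_id": "C1", "state": "DC", "qty": 2, "unit_price": 500, "total": 1000},
    {"order_id": 2, "customer_id": "C2", "state": "", "qty": 1, "unit_price": 300, "total": 300},
    {"order_id": 2, "customer_id": "C9", "state": "ZZ", "qty": 3, "unit_price": 100, "total": 250},
]
findings = assess(orders, known_customer_ids={"C1", "C2"})
for check, count in findings.items():
    print(f"{check}: {count}")
```

Levels 1 and 3 need no documentation beyond the record layout, while levels 2 and 4 depend on knowing the domain and the business rule, which is why the domain and the calculation appear here as explicit inputs.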

24 Analyze and Certify
  – Identifying Problems
  – Sizing up Problems
  – “To Certify or Not to Certify…” Report Card

25 Template - field level
Value - the domain occurrence
Frequency - the number of occurrences within the data set
Percent - the % of the whole set
88 Info - the copybook definition for the value
Analysis - comments about our findings
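The Value, Frequency and Percent columns of this template amount to a frequency distribution of one field. A minimal sketch, assuming the field's values are already loaded into a Python list (the sample values are invented for illustration):

```python
from collections import Counter

def profile_field(values):
    """Return one row per distinct value: value, frequency, percent of the set."""
    total = len(values)
    if total == 0:
        return []
    return [
        {"value": value, "frequency": count, "percent": round(100 * count / total, 1)}
        for value, count in Counter(values).most_common()
    ]

sample = ["A", "A", "B", "", "A", "B", "A", "X"]   # hypothetical field contents
profile = profile_field(sample)
for row in profile:
    print(f"{row['value']!r:6} {row['frequency']:>3} {row['percent']:>6}%")
```

The 88 Info and Analysis columns are filled in by people: the first from the copybook, the second from reviewing the distribution.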

26 Identifying Problems
Analysis (and Discovery)
1. Is the field required? If so, blanks indicate an anomaly.
2. Are the values “ID206” and “STANG” allowed? (Is this a problem with the data or with the Meta Data?)
3. Some values occur in only 1.3% of the records. Does this indicate a problem?
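These three questions can be turned into automatic flags over a field's values, with the caveat that a flag is only a prompt for the analyst. In the sketch below, the allowed domain, the sample counts and the 2% rarity threshold are assumptions; only the values “ID206” and “STANG” come from the slide.

```python
from collections import Counter

def flag_anomalies(values, required, allowed, rare_pct=2.0):
    """Return (value, reason) pairs for blanks, undocumented and rare values."""
    counts = Counter(v.strip() for v in values)
    total = len(values)
    flags = []
    for value, count in counts.items():
        if value == "":
            if required:
                flags.append((value, "blank in required field"))
        elif value not in allowed:
            # Could be bad data or incomplete Meta Data; a person decides.
            flags.append((value, "not in documented domain"))
        elif 100 * count / total < rare_pct:
            flags.append((value, "rare value"))
    return flags

# Hypothetical field contents: 100 records.
values = (["FORD"] * 60 + ["CHEV"] * 35 + ["DODG"]
          + ["ID206"] * 2 + ["STANG"] + [""])
flags = flag_anomalies(values, required=True, allowed={"FORD", "CHEV", "DODG"})
for value, reason in flags:
    print(f"{value!r}: {reason}")
```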

27 Data Quality Scoring

28 Example: Poor Data Quality

29 Field Analysis
In a range of values, in the absence of domain rules, investigate the first and last .2%.
Chart label: Bell curve distribution
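Pulling out the extremes of a range for review can be sketched as follows. The tail size is a parameter because the figure on the slide is read here as 0.2%, which is an assumption; the sample values are invented.

```python
import math

def tail_values(values, fraction=0.002):
    """Return the lowest and highest `fraction` of values (at least one each)."""
    ordered = sorted(values)
    k = max(1, math.ceil(len(ordered) * fraction))
    return ordered[:k], ordered[-k:]

# Hypothetical field: 998 plausible values plus two suspicious extremes.
amounts = list(range(100, 1098)) + [-1, 999999]
lowest, highest = tail_values(amounts)
print("investigate low end: ", lowest)
print("investigate high end:", highest)
```

Sorting and slicing makes no assumption about the distribution's shape, so it also works when the values do not follow a bell curve.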

30 Management Reporting - Short Engagement

31 Management Reporting - Status

32 Management Reporting - Anomalies

33 Management Reporting - Productivity

34 Meta Data Creation

35 Example: Data Quality Repository
Newly Discovered Rules

36 Meta Data Supply Chain
Diagram labels: Work Groups; Field Name; Data Inventory; Meta Data Knowledge Management; Transformation & Edit Recommendations; Data Quality Statistical Reports; DQ Assessment; Data Quality & Definition Validation; Data Cleansing; Update; SME Validation; Definition & Domain; Meta Data Gathering; Data Requirements.

37 Results Validation

38 Report Validation
SME validation… an opportunity to improve Meta Data
1. Supply a clear name for the field.
2. Is there a good definition?
3. Make the business rules public.
4. Will the SME initiate a data cleansing initiative?
5. Does the SME recommend edit or data transformation rules?
6. Are the findings consistent with the SME's expectations?
Report Sections
  – Identification
  – Field Definition & Rules
  – Statistical Reports & Analysis
  – Score & Explanation

39 Quality Improvement

40 Next Steps
Diagram labels: Management Report & Recommendations; Steering Committee; Initiatives; Data Clean-up; Legacy System Enhancements & Re-engineering; Data Migration; Transformation & Cleansing Specifications; Continued Monitoring; Monthly Reports; Perform Baseline Assessment; Information Management Objectives; Metadata, Models, Reports, etc.; Legacy Data Extractions; (Discovered Business Rules).

41 Lessons Learned - Data Cleanup
Chart axes: Completeness and Accuracy, up to 100%
  – More complete, more error prone
  – More accurate, less data
  – $$: most complete, most accurate, most costly, most timely

42 Summary
We made the distinction between:
  – Data Migration
  – Data Quality
  – Data Cleansing
We defined what “good” data quality is.
We discussed that there could be 10 or more processes that could take place in building a comprehensive data quality program for the enterprise.
  – Tactical should precede the Strategic [or be the 1st step of ]
There are 6 steps to an effective tactical data quality initiative:
  – Rule Disclosure
  – Quality Measurement
  – Analyze and Certify
  – Meta Data Creation
  – Validation
  – Quality Improvement

43 Reference Material
The Deming Management Method (Total Quality Management), Mary Walton
Data Quality for the Information Age, Tom Redman
The Data Warehouse Challenge: Taming Data Chaos, Michael Brackett
Improving Data Warehouse and Business Information Quality, Larry English
DM Review Magazine, Information Quality series by Larry English

44 Ron Forino Director, Business Intelligence DMR Consulting Group (732)549-4100 X-8292 rforino@dmr.com ronforino@aol.com
