Data Quality By Suparna Kansakar.

Agenda
• What Is Data Quality?
• Cost of Poor-Quality Data
• Improving Data Quality
• Dimensions of Data Quality
• Enhancing Data Quality
• Data Profiling
• Data Cleaning

What Is Data Quality?
Data quality refers to the accuracy and completeness of the information collected and reported in a system.

Cost of Poor-Quality Data: Rework
• Re-collecting data
• Correcting errors
• Data verification

Cost of Poor-Quality Data: Business Process Costs
• Incorrect registrations
• Inaccurate tuition billings
• Payroll errors

Cost of Poor-Quality Data: Missed Opportunities
• Substandard customer service
• Poor decision making
• Loss of reputation

Improving Data Quality
• Business Practice Change
• Data Cleansing
• Improved Data Quality
• Business Process Review
• Data Quality Assessment

Business Process Review
• When, where, and how is data collected?
• Where is data stored?
• Who creates data?
• Who uses data?
• What outputs are required?
• What quality checks already exist?

Data Quality Assessment: Dimensions of Data Quality

Dimensions of Data Quality
• Completeness: Is all the requisite information available? Are data values missing, or in an unusable state? In some cases missing data is irrelevant, but when the missing information is critical to a specific business process, completeness becomes an issue.
• Conformity: Are data values expected to conform to specified formats? If so, do all the values conform to those formats? Conformance to specific formats matters for data representation, presentation, aggregate reporting, search, and establishing key relationships.
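The completeness and conformity questions can be turned into simple measurements. The sketch below is only an illustration: the records, the field names and the postal-code format rule are all assumptions made for the example, not part of the presentation.

```python
import re

# Hypothetical records; None or "" marks a missing value.
records = [
    {"student_id": "S001", "email": "ana@example.org", "postal_code": "V8W 1A1"},
    {"student_id": "S002", "email": "", "postal_code": "V8W1A1"},
    {"student_id": "S003", "email": None, "postal_code": "12345"},
]

def completeness(rows, field):
    """Fraction of rows that hold a usable (non-empty) value for the field."""
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows) if rows else 0.0

# Assumed format rule for this example: letter-digit-letter, space, digit-letter-digit.
POSTAL_RE = re.compile(r"^[A-Z]\d[A-Z] \d[A-Z]\d$")

def nonconforming(rows, field, pattern):
    """Values that are present but do not match the expected format."""
    return [r[field] for r in rows if r.get(field) and not pattern.match(r[field])]

email_completeness = completeness(records, "email")          # one of three rows is filled
bad_postal_codes = nonconforming(records, "postal_code", POSTAL_RE)
```

Whether a given completeness ratio is acceptable depends on how critical the field is to the business process that consumes it.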

Dimensions of Data Quality (continued)
• Consistency: Do distinct data instances provide conflicting information about the same underlying data object? Are values consistent across data sets? Do interdependent attributes always agree with each other as expected?
• Accuracy: Do data objects accurately represent the “real-world” values they are expected to model? Misspelled product or person names, wrong addresses, and out-of-date data can all affect operational and analytical applications.
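A consistency check compares what two data sets say about the same object. The following is a minimal sketch under assumed data: the two dictionaries stand in for extracts from a hypothetical registration system and a hypothetical billing system, keyed by the same student ID.

```python
# Hypothetical extracts describing the same students in two systems.
registration = {
    "S001": {"birth_date": "2001-04-12", "program": "BSc"},
    "S002": {"birth_date": "2000-11-03", "program": "BA"},
}
billing = {
    "S001": {"birth_date": "2001-04-12", "program": "BSc"},
    "S002": {"birth_date": "2000-03-11", "program": "BA"},
}

def find_conflicts(a, b, fields):
    """List (key, field, value_in_a, value_in_b) wherever the two sources disagree."""
    conflicts = []
    for key in sorted(a.keys() & b.keys()):   # only records present in both sources
        for field in fields:
            if a[key].get(field) != b[key].get(field):
                conflicts.append((key, field, a[key].get(field), b[key].get(field)))
    return conflicts

conflicts = find_conflicts(registration, billing, ["birth_date", "program"])
```

A conflict shows that at least one source is wrong, but not which one. Settling accuracy still requires checking against the real-world value or an authoritative record.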

Dimensions of Data Quality (continued)
• Integrity: Which data is missing important relationship linkages? The inability to link related records together may actually introduce duplication across your systems. Integrity means that data is valid across its relationships, so that all data in a database can be traced and connected to other data.
• Timeliness: Is information available when it is expected and needed? Data that arrives after it was needed cannot inform the decision it was collected for.
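Both dimensions lend themselves to simple checks: integrity by looking for records whose link points at nothing, timeliness by looking for records that have not been refreshed recently enough. The sketch below assumes hypothetical client and service tables and an arbitrary 90-day freshness threshold.

```python
from datetime import date, timedelta

# Hypothetical parent table (clients) and child table (services).
clients = {"C1": {"name": "Ana"}, "C2": {"name": "Raj"}}
services = [
    {"service_id": 1, "client_id": "C1", "updated": date(2024, 5, 1)},
    {"service_id": 2, "client_id": "C9", "updated": date(2024, 5, 20)},  # no such client
    {"service_id": 3, "client_id": "C2", "updated": date(2023, 1, 15)},  # long out of date
]

def orphaned(rows, parent_keys, foreign_key):
    """Integrity: child rows whose foreign key matches no parent record."""
    return [r for r in rows if r[foreign_key] not in parent_keys]

def stale(rows, as_of, max_age):
    """Timeliness: rows last updated more than max_age before the as_of date."""
    return [r for r in rows if as_of - r["updated"] > max_age]

orphans = orphaned(services, clients.keys(), "client_id")
stale_rows = stale(services, as_of=date(2024, 6, 1), max_age=timedelta(days=90))
```

In a relational database the integrity half of this is usually enforced up front with foreign-key constraints; a check like this is mainly useful for data that arrives from files or from systems without such constraints.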

Playing Your Part to Enhance Data Quality
Project participants:
• School staff (users): collection, entry
• IT staff: technology, processes
Everyone plays a role in enhancing data quality.

IT Level Roles
• Create and implement a data quality plan
• Check data quality and provide feedback
• Provide training, support and documentation
• Hold regular user groups

IT Level Roles (continued)
• Ensure effective security measures
• Develop an efficient editing and verification system
• Release only good data, and clarify limitations with every aggregate data release
• Develop an electronic audit trail

User Level: Data Collection
Intake or front-line staff are often the first point of data collection for clients in need of service. They need to understand, and be able to communicate to every client served, why information is being captured and how it will be used, including:
• Purpose of data collection
• Privacy policies and consent protocols

User Level: Data Entry
Staff doing data entry, including volunteers, must be trained to:
• Search the system for an existing client record by all methods, if applicable
• Enter all the information provided
• Enter accurate information
• Proofread for common errors: accidentally picking the wrong response category, typing data in the wrong field, and misspellings
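Searching for an existing record before creating a new one is easier when the system forgives differences in case and spacing. The sketch below shows one possible lookup; the client list, the field names and the choice of name plus birth date as the match key are assumptions for illustration.

```python
def normalize(name):
    """Lower-case the name, trim it, and collapse repeated internal spaces."""
    return " ".join(name.lower().split())

# Hypothetical records already in the system.
existing = [
    {"client_id": "C1", "name": "Maria  Lopez", "birth_date": "1990-02-14"},
    {"client_id": "C2", "name": "John Smith", "birth_date": "1985-07-30"},
]

def find_existing(rows, name, birth_date):
    """Return records whose normalized name and birth date both match."""
    target = normalize(name)
    return [
        r for r in rows
        if normalize(r["name"]) == target and r["birth_date"] == birth_date
    ]

# The typed name differs in case and spacing but still finds client C1.
matches = find_existing(existing, " maria lopez ", "1990-02-14")
```

An exact match on normalized values will not catch misspellings, which is one reason proofreading at entry still matters.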

Data Profiling
Data profiling is the process of statistically examining and analyzing the content of a data source in order to collect information about the data.
• It consists of techniques for analyzing existing data for accuracy and completeness.
• It supports a thorough assessment of data quality.
• It assists in discovering anomalies in the data.
• It helps us understand the content, structure and relationships of the data in the source being analyzed.

Data Profiling (continued)
• It shows whether the existing data can be applied to other areas or purposes.
• It reveals the issues and challenges a database project is likely to face well before the actual work begins, which allows early decisions and action.
• It is also used to assess and validate metadata.
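A basic profile can be as simple as a handful of per-column statistics. The sketch below is one hedged illustration; the rows and column names are invented, and real profiling tools report far more than this.

```python
from collections import Counter

# Hypothetical rows from a data source being profiled.
rows = [
    {"gender": "F", "score": 71},
    {"gender": "f", "score": 88},
    {"gender": "M", "score": None},
    {"gender": "M", "score": 940},
]

def profile(rows, column):
    """Summarize one column: row count, missing values, distinct values, range."""
    values = [r.get(column) for r in rows]
    present = [v for v in values if v not in (None, "")]
    counts = Counter(present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0] if counts else None,
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }

report = {column: profile(rows, column) for column in ("gender", "score")}
```

Even this small report surfaces anomalies: three distinct values in a column expected to hold two codes points to inconsistent coding ("F" and "f"), and a maximum score of 940 stands out against the other values.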

Data Cleaning
Data cleansing, data cleaning, or data scrubbing is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database. After cleansing, a data set will be consistent with other similar data sets in the system. The inconsistencies detected or removed may originally have been caused by user entry errors, by corruption in transmission or storage, or by different data dictionary definitions of similar entities in different stores.
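A small cleaning pass might combine correction (normalizing values), flagging (values that cannot be corrected safely) and removal (duplicates). The sketch below assumes hypothetical records, an assumed list of valid status codes, and name plus birth date as the duplicate key.

```python
VALID_STATUS = {"ACTIVE", "INACTIVE"}   # assumed code list for this example

# Hypothetical raw records with spacing, case and spelling problems.
raw = [
    {"name": "  Maria  Lopez", "status": "active ", "birth_date": "1990-02-14"},
    {"name": "maria lopez", "status": "Active", "birth_date": "1990-02-14"},
    {"name": "John Smith", "status": "actve", "birth_date": "1985-07-30"},
]

def clean(rows):
    """Normalize spacing and codes, flag invalid codes, and drop duplicates."""
    seen = set()
    cleaned = []
    for r in rows:
        name = " ".join(r["name"].split())          # trim and collapse spaces
        status = r["status"].strip().upper()
        if status not in VALID_STATUS:
            status = None                           # flag for review, do not guess
        key = (name.lower(), r["birth_date"])       # case-insensitive duplicate key
        if key in seen:
            continue                                # remove the duplicate record
        seen.add(key)
        cleaned.append({"name": name, "status": status, "birth_date": r["birth_date"]})
    return cleaned

cleaned = clean(raw)
```

Setting an unrecognized code to None rather than guessing the intended value is a deliberate choice here: a flagged value can be reviewed, while a silently "corrected" one cannot.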

Data Cleaning Principles
• Prevention is better than cure: It is far cheaper and more efficient to prevent an error than to detect and correct it later. When errors are detected, feedback mechanisms should ensure that the same error does not recur during data entry, or at least becomes much less likely.
• Organizing data improves efficiency: Organizing data before checking, validating and correcting it can improve efficiency and considerably reduce the time and cost of data cleaning.
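Prevention usually means validating a record at the moment of entry and telling the person entering it what is wrong. The sketch below is one possible shape for such a check; the field names and the rules (required last name, ISO birth date not in the future, grade from 1 to 12) are assumptions for illustration.

```python
from datetime import date

def validate_entry(record, today):
    """Return a list of human-readable problems; an empty list means accept."""
    errors = []
    if not (record.get("last_name") or "").strip():
        errors.append("last_name is required")
    try:
        born = date.fromisoformat(str(record.get("birth_date") or ""))
        if born > today:
            errors.append("birth_date is in the future")
    except ValueError:
        errors.append("birth_date must be a valid ISO date (YYYY-MM-DD)")
    if record.get("grade") not in range(1, 13):
        errors.append("grade must be a whole number from 1 to 12")
    return errors

today = date(2024, 6, 1)
ok = validate_entry({"last_name": "Lopez", "birth_date": "2010-03-09", "grade": 8}, today)
bad = validate_entry({"last_name": " ", "birth_date": "2010-13-40", "grade": 14}, today)
```

Returning every problem at once, rather than stopping at the first, gives the data-entry person the feedback needed to fix the record in one pass.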

Final Thoughts on Data Quality
• Take a critical look at your existing data.
• Implement changes to how you collect and manage data.
• Invest the time to educate and communicate with data users and creators.
• Make data quality improvement an ongoing process.
• Data should be accurate, complete, consistent and timely.
• All members of the organization play a role in promoting data quality.

“Quality is never an accident; it is always the result of high intention, sincere effort, intelligent direction and skillful execution; it represents the wise choice of many alternatives.” -- William A. Foster

Thank You.