Data Quality By Suparna Kansakar.


1 Data Quality By Suparna Kansakar

2 Agenda
• What Is Data Quality?
• Cost of Poor Quality of Data
• Improving Data Quality
• Dimensions of Data Quality
• Enhancing Data Quality
• Data Profiling
• Data Cleaning

3 What Is Data Quality? Data quality refers to the accuracy and completeness of the information collected and reported in a system.

4 Cost of Poor-Quality Data
Rework: Re-collect Data; Correct Errors; Data Verification

5 Cost of Poor-Quality Data
Business Process Costs: Incorrect Registrations; Inaccurate Tuition Billings; Payroll Errors

6 Cost of Poor-Quality Data
Missed Opportunities: Substandard Customer Service; Poor Decision Making; Loss of Reputation

7 Improving Data Quality
Business Practice Change; Data Cleansing; Improved Data Quality; Business Process Review; Data Quality Assessment

8 Business Process Review
When, where, how is data collected? Where is data stored? Who creates data? Who uses data? What outputs are required? What quality checks already exist?

9 Dimensions of data quality Data Quality Assessment

10 Dimensions of data quality
• Completeness: Is all the requisite information available? Are data values missing, or in an unusable state? In some cases, missing data is irrelevant, but when the information that is missing is critical to a specific business process, completeness becomes an issue.
• Conformity: Are there expectations that data values conform to specified formats? If so, do all the values conform to those formats? Maintaining conformance to specific formats is important in data representation, presentation, aggregate reporting, search, and establishing key relationships.
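The two dimensions above can be checked mechanically. The following is a minimal sketch, not a method from the presentation: the records, the field names, and the NNN-NNNN phone format are all assumptions made up for illustration.

```python
import re

# Hypothetical records; field names and formats are assumptions for illustration.
records = [
    {"student_id": "S001", "name": "Asha Rai", "phone": "555-0101"},
    {"student_id": "S002", "name": "", "phone": "5550102"},
    {"student_id": "S003", "name": "Bina Lama", "phone": None},
]

REQUIRED_FIELDS = ["student_id", "name", "phone"]
PHONE_FORMAT = re.compile(r"\d{3}-\d{4}")  # assumed format: NNN-NNNN


def is_missing(value):
    """Treat None and blank strings as missing."""
    return value is None or str(value).strip() == ""


def completeness(rows, fields):
    """Return the fraction of non-missing values for each field."""
    return {
        f: sum(not is_missing(r.get(f)) for r in rows) / len(rows)
        for f in fields
    }


def nonconforming(rows, field, pattern):
    """Return the present values of `field` that do not match `pattern`."""
    return [
        r[field]
        for r in rows
        if not is_missing(r.get(field)) and not pattern.fullmatch(r[field])
    ]


completeness_report = completeness(records, REQUIRED_FIELDS)
bad_phones = nonconforming(records, "phone", PHONE_FORMAT)
print(completeness_report)
print(bad_phones)
```

Missing values are counted under completeness only, so a blank phone number is not also reported as a format violation.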

11 Dimensions of data quality
(continued)
• Consistency: Do distinct data instances provide conflicting information about the same underlying data object? Are values consistent across data sets? Do interdependent attributes always appropriately reflect their expected consistency?
• Accuracy: Do data objects accurately represent the “real-world” values they are expected to model? Incorrect spellings of product or person names, addresses, and even untimely or not current data can impact operational and analytical applications.
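A consistency check across data sets can be sketched as a comparison of the same attribute for the same key. The two data sets below (`registration`, `billing`) and the `dob` field are hypothetical, chosen only to illustrate the idea.

```python
# Two hypothetical data sets describing the same students (illustrative only).
registration = {
    "S001": {"dob": "2005-03-14"},
    "S002": {"dob": "2004-11-02"},
}
billing = {
    "S001": {"dob": "2005-03-14"},
    "S002": {"dob": "2004-02-11"},  # day and month swapped during entry
}


def find_conflicts(a, b, field):
    """Return the keys present in both data sets whose `field` values differ."""
    return sorted(
        key for key in a.keys() & b.keys()
        if a[key].get(field) != b[key].get(field)
    )


conflicts = find_conflicts(registration, billing, "dob")
print(conflicts)
```

A check like this only shows that the two sources disagree. Deciding which value is accurate still requires comparing against the real-world source, such as the original form.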

12 Dimensions of data quality
(continued)
• Integrity: What data is missing important relationship linkages? The inability to link related records together may actually introduce duplication across your systems. Integrity means validity of data across the relationships and ensures that all data in a database can be traced and connected to other data.
• Timeliness: Is information available when it is expected and needed? Timeliness of data is very important.
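Both dimensions lend themselves to simple automated checks. The sketch below is illustrative only: the `students` and `enrollments` tables, the `updated` field, and the 90-day threshold are assumptions, and record age is used here as a rough proxy for timeliness.

```python
from datetime import date, timedelta

# Hypothetical parent and child tables (illustrative only).
students = [{"student_id": "S001"}, {"student_id": "S002"}]
enrollments = [
    {"enrollment_id": 1, "student_id": "S001", "updated": date(2024, 1, 10)},
    {"enrollment_id": 2, "student_id": "S999", "updated": date(2023, 6, 1)},
]


def orphans(children, parents, key):
    """Integrity: return child rows whose `key` has no matching parent row."""
    parent_keys = {p[key] for p in parents}
    return [c for c in children if c[key] not in parent_keys]


def stale(rows, as_of, max_age):
    """Timeliness proxy: return rows last updated more than `max_age` before `as_of`."""
    return [r for r in rows if as_of - r["updated"] > max_age]


orphaned = orphans(enrollments, students, "student_id")
outdated = stale(enrollments, as_of=date(2024, 1, 31), max_age=timedelta(days=90))
print([r["enrollment_id"] for r in orphaned])
print([r["enrollment_id"] for r in outdated])
```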

13 Playing Your Part to Enhance Data Quality
Everyone plays a role in enhancing data quality.
Project participants:
• School staff (users): collection, entry
• IT staff: technology, processes

14 IT Level Roles
• Create and implement data quality plan
• Check data quality and provide feedback
• Provide training, support and documentation
• Hold regular user groups

15 IT Level Roles
(continued)
• Ensure effective security measures
• Develop an efficient editing and verification system
• Release only good data and clarify limitations with every aggregate data release
• Develop an electronic audit trail

16 User Level: Data Collection
Intake or front line staff are often the first point of data collection for clients in need of service. Intake or front line staff need to understand and be able to communicate to every client served why information is being captured and how the information will be used including: Purpose of data collection; Privacy policies and consent protocols.

17 User Level: Data Entry
Staff doing data entry, including volunteers, must be trained to:
• Search the system for an existing client record by all methods, if applicable
• Enter all the information provided
• Enter accurate information
• Proofread for common errors: accidentally picking the wrong response category, typing data in the wrong field, misspellings
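Searching for an existing record "by all methods" before creating a new one can be sketched as below. This is an illustration under assumptions: the client fields (`client_id`, `name`, `dob`) and the two matching rules are hypothetical, not taken from any particular system.

```python
def normalize(text):
    """Lowercase and collapse whitespace so spacing and case do not hide a match."""
    return " ".join(text.lower().split())


def find_existing(clients, client_id=None, name=None, dob=None):
    """Return clients that match by id, or by both name and date of birth."""
    matches = []
    for c in clients:
        by_id = client_id is not None and c["client_id"] == client_id
        by_name_dob = (
            name is not None
            and dob is not None
            and normalize(c["name"]) == normalize(name)
            and c["dob"] == dob
        )
        if by_id or by_name_dob:
            matches.append(c)
    return matches


# Hypothetical client list (illustrative only).
clients = [
    {"client_id": "C100", "name": "Asha Rai", "dob": "1990-05-01"},
    {"client_id": "C101", "name": "Bina Lama", "dob": "1985-09-23"},
]

found = find_existing(clients, name="  asha  RAI ", dob="1990-05-01")
print(found)
```

Matching on a normalized name plus date of birth catches records that an id-only search would miss, which is one way duplicate client records are avoided at entry time.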

18 Data Profiling It is the process of statistically examining and analyzing the content in a data source, and hence collecting information about the data. It consists of techniques used to analyze the data we have for accuracy and completeness. Data profiling helps us make a thorough assessment of data quality. It assists the discovery of anomalies in data. It helps us understand content, structure, relationships, etc. about the data in the data source we are analyzing.

19 Data Profiling (continued) It helps us know whether the existing data can be applied to other areas or purposes. It helps us understand the various issues and challenges we may face in a database project well before the actual work begins. This enables us to make early decisions and act accordingly. It is also used to assess and validate metadata.
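A very small profiling pass might collect counts of missing, distinct, and non-numeric values per column. The sketch below uses only the Python standard library; the sample rows and the choice of statistics are assumptions for illustration, not a full profiling tool.

```python
from collections import Counter

# Hypothetical source rows (illustrative only).
rows = [
    {"grade": "10", "city": "Kathmandu"},
    {"grade": "11", "city": "kathmandu"},
    {"grade": "", "city": "Pokhara"},
    {"grade": "ten", "city": "Kathmandu"},
]


def profile_column(rows, column):
    """Collect simple statistics about one column of a list of dict rows."""
    values = [r.get(column) for r in rows]
    present = [v for v in values if v not in (None, "")]
    numeric = [v for v in present if str(v).isdigit()]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "non_numeric": len(present) - len(numeric),
        "top": Counter(present).most_common(1),
    }


grade_profile = profile_column(rows, "grade")
city_profile = profile_column(rows, "city")
print(grade_profile)
print(city_profile)
```

Here the profile surfaces two anomalies without any manual inspection: a non-numeric grade ("ten") and a city that appears under two spellings, which inflates the distinct count.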

20 Data Cleaning Data cleansing, data cleaning, or data scrubbing is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database. After cleansing, a data set will be consistent with other similar data sets in the system. The inconsistencies detected or removed may have been originally caused by user entry errors, by corruption in transmission or storage, or by different data dictionary definitions of similar entities in different stores.
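As a rough illustration of detecting and correcting (or removing) bad records, the sketch below normalizes text fields, removes rows that lack an identifier, and drops duplicates. The field names and the cleaning rules are assumptions chosen for the example; real rules depend on the data dictionary in use.

```python
def clean(rows):
    """Normalize text fields, drop rows without an id, and remove duplicate ids."""
    cleaned, seen = [], set()
    for row in rows:
        student_id = (row.get("student_id") or "").strip().upper()
        # Collapse repeated whitespace and standardize capitalization.
        name = " ".join((row.get("name") or "").split()).title()
        if not student_id:
            continue  # remove: record cannot be identified
        if student_id in seen:
            continue  # remove: duplicate of a record already kept
        seen.add(student_id)
        cleaned.append({"student_id": student_id, "name": name})
    return cleaned


# Hypothetical raw records with typical entry errors (illustrative only).
raw = [
    {"student_id": " s001 ", "name": "asha   rai"},
    {"student_id": "S001", "name": "Asha Rai"},
    {"student_id": "", "name": "Unknown"},
    {"student_id": "s002", "name": "BINA LAMA"},
]

cleaned = clean(raw)
print(cleaned)
```

This sketch keeps the first occurrence of each id. Whether to keep the first, the latest, or a merged record is a policy decision the source does not specify.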

21 Data Cleaning (continued)

22 Data Cleaning Principles
• Prevention is better than cure: It is far cheaper and more efficient to prevent an error from happening than to have to detect it and correct it later. It is also important that, when errors are detected, feedback mechanisms ensure that the error doesn’t occur again during data entry, or that there is a much lower likelihood of it re-occurring.
• Organizing data improves efficiency: Organizing data prior to data checking, validation and correction can improve efficiency and considerably reduce the time and costs of data cleaning.
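One way to apply the prevention principle is to validate each record at entry time and return feedback immediately, before anything is stored. The sketch below assumes a hypothetical id format (`S` followed by three digits) and hypothetical field names.

```python
import re

ID_FORMAT = re.compile(r"S\d{3}")  # assumed id format, e.g. S001


def validate_entry(entry, existing_ids):
    """Return a list of error messages; an empty list means the entry is acceptable."""
    errors = []
    student_id = (entry.get("student_id") or "").strip()
    if not ID_FORMAT.fullmatch(student_id):
        errors.append("student_id must look like S001")
    elif student_id in existing_ids:
        errors.append("student_id already exists")
    if not (entry.get("name") or "").strip():
        errors.append("name is required")
    return errors


existing = {"S001"}
ok_errors = validate_entry({"student_id": "S002", "name": "Bina Lama"}, existing)
bad_errors = validate_entry({"student_id": "S001", "name": " "}, existing)
print(ok_errors)
print(bad_errors)
```

Returning every error at once, rather than stopping at the first, gives the person entering data the feedback needed to correct the record in a single pass.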

23 Final thoughts on Data Quality…
Take a critical look at your existing data. Implement changes to how you collect and manage data. Invest the time to educate and communicate with data users and creators. Make data quality improvement an on-going process. Data should be accurate, complete, consistent and timely. All members of the organization play a role in promoting data quality.

25 “Quality is never an accident; it is always the result of high intention, sincere effort, intelligent direction and skillful execution; it represents the wise choice of many alternatives.” -- William A. Foster
Thank You.

