Fidelity and Data Quality L20. Topics Integrity and Fidelity The Cost of poor Data Quality The Causes of poor Data Quality The process improvement cycle.

Presentation transcript:

Fidelity and Data Quality L20

Topics
Integrity and Fidelity
The cost of poor data quality
The causes of poor data quality
The process improvement cycle

Integrity
Data in a database should agree with the rules in the schema
  – Checks on values
  – Referential integrity
  – Primary key
A weak schema allows erroneous data
  – E.g. invalid manager relationships in the Emp-Dept example
  – Need for extended business rules in the middle tier of the application
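The three kinds of schema rule can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table and column names (emp, dept, sal, mgr) are assumptions modelled loosely on the Emp-Dept example, and the circular manager relationship is one guess at what an "invalid manager relationship" might look like.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
CREATE TABLE dept (
    deptno INTEGER PRIMARY KEY,
    dname  TEXT NOT NULL
);
CREATE TABLE emp (
    empno  INTEGER PRIMARY KEY,                      -- primary key
    ename  TEXT NOT NULL,
    sal    REAL CHECK (sal >= 0),                    -- check on values
    mgr    INTEGER REFERENCES emp(empno),            -- referential integrity
    deptno INTEGER NOT NULL REFERENCES dept(deptno)  -- referential integrity
);
""")
conn.execute("INSERT INTO dept VALUES (10, 'ACCOUNTING')")
conn.execute("INSERT INTO emp VALUES (1, 'KING', 5000, NULL, 10)")


def try_insert(row):
    """Return None if the row is accepted, otherwise the integrity error text."""
    try:
        conn.execute("INSERT INTO emp VALUES (?, ?, ?, ?, ?)", row)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


# Each of these rows breaks one schema rule and is rejected.
rejected = {
    "duplicate primary key": try_insert((1, "CLARK", 2450, 1, 10)),
    "negative salary": try_insert((2, "CLARK", -1, 1, 10)),
    "unknown department": try_insert((2, "CLARK", 2450, 1, 99)),
}

# A weak schema: every rule above is satisfied, yet KING and MILLER
# end up managing each other. Nothing in the schema forbids the cycle.
conn.execute("INSERT INTO emp VALUES (3, 'MILLER', 1300, 1, 10)")
conn.execute("UPDATE emp SET mgr = 3 WHERE empno = 1")
cycle = conn.execute("SELECT empno, mgr FROM emp ORDER BY empno").fetchall()
```

A rule such as "no employee may appear in their own management chain" would have to be enforced elsewhere, for instance in the middle tier.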

Fidelity
Hi-fi: “exactitude in reproduction”
A database is an image of its Domain of Discourse (the Real World, RW)
Loss of fidelity occurs when:
  – There are two records in the database but only one person in the RW
  – Address data does not correspond to an existing address in the RW
  – An address in the database does not correspond to the current address of its owner
But fidelity only has to be ‘good enough’ for its purpose

Data Quality
Poor data quality results from loss of integrity and lack of fidelity.
“Current data quality problems cost US businesses more than $600 billion per year” (report by the Data Warehousing Institute, 2002)
Gartner Research estimates that through 2005 more than 50% of business intelligence and CRM deployments will suffer limited acceptance, if not outright failure, due to lack of attention to data quality issues.
Direct costs of poor-quality information are estimated at between 10% and 20% of revenue

Information systems / computer systems
Computer system quality depends only on ensuring the system doesn’t fall over when presented with bad data
Information system quality depends on ensuring the system delivers information of high quality
An information system includes procedures and guidance to users to meet this need

Who’s who in data quality?
Tom Redman: formerly of AT&T, now a Cutter Consortium consultant; author of many books and articles, including “Data Quality for the Information Age” (1996)
Larry English of Dataflux
Companies providing software for data cleansing

Data Quality improvement
Redman’s top three
Data cleansing
Problem analysis
Dataflow analysis
Process improvement

Redman’s top three
Focus on data accuracy
  – Companies still do not realise the cost of poor data quality
Clear definitions
  – Common terms, e.g. customer and product, have slightly different meanings in different contexts (nuances)
Relevance
  – Estimates suggest 50% of data is not used by anyone, ever
  – No value in wasting time improving its quality

Data cleansing
Identifying duplicates: a difficult matching task
Parsing complex strings into meaningful parts, e.g. a name and address into title, given names, family name, street number, street, town, postcode, country
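Both cleansing tasks can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production matcher: the title list, the function names, and the 0.85 similarity threshold are all invented here, and real names (multi-word family names, suffixes, initials) need far more careful rules.

```python
from difflib import SequenceMatcher

TITLES = {"mr", "mrs", "ms", "miss", "dr", "prof"}  # illustrative, not exhaustive


def parse_name(raw):
    """Split a free-text name into title, given names and family name."""
    words = raw.replace(".", " ").split()
    title = ""
    if words and words[0].lower() in TITLES:
        title = words.pop(0).capitalize()
    family = words.pop() if words else ""
    return {"title": title, "given": " ".join(words), "family": family}


def normalise(raw):
    """Drop the title and case so that only the name itself is compared."""
    parsed = parse_name(raw)
    return f"{parsed['given']} {parsed['family']}".lower().strip()


def similarity(a, b):
    """Similarity of two names in the range 0.0 to 1.0."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def likely_duplicates(names, threshold=0.85):
    """Return every pair of names at least as similar as the threshold."""
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if similarity(a, b) >= threshold:
                pairs.append((a, b))
    return pairs
```

For example, `likely_duplicates(["Mr John Smith", "John Smyth", "Alice Jones"])` pairs the first two entries. The threshold is the difficult part: set too low, it merges different people; set too high, it misses misspellings.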

Problem analysis
Analyse the chain of cause and effect behind poor quality
A fishbone (Ishikawa) diagram depicts this chain
Systems approach:
  – Information system: data flow model analysed for points where errors can be injected
  – Organisation: attitudes and ethos

Data Flow in the Information System
Information source
Information gathering
Information collation
Information storage
Information retrieval

Data source problems
Data has only a limited lifetime of fidelity, since the world is in constant flux
Length of lifetime depends on
  – Volatility of the data source: compare the address of a young out-of-work person with the address of a retired person
Data needs re-validating on a cycle that depends on its lifetime
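A re-validation cycle tied to volatility might be sketched as follows. The lifetimes in days are hypothetical figures chosen only to contrast a volatile source with a stable one, and the category and function names are assumptions.

```python
from datetime import date, timedelta

# Hypothetical fidelity lifetimes, in days, for two kinds of data source.
LIFETIME_DAYS = {
    "young_unemployed": 180,  # volatile: address likely to change soon
    "retired": 1825,          # stable: address likely to hold for years
}


def revalidation_due(last_validated, source_kind, today):
    """True once the record has outlived the lifetime for its source kind."""
    lifetime = timedelta(days=LIFETIME_DAYS[source_kind])
    return today - last_validated >= lifetime
```

With these figures, an address last validated a year ago is due for re-checking if it belongs to a young out-of-work person but not if it belongs to a retired person.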

Data capture
Data gathering procedures are a major source of error
Integrity and fidelity can be in conflict
  – If the telephone number is mandatory, an operator in a hurry will enter any old number to get the record accepted
Data quality depends on the training and guidance given to operators
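One possible way to ease this conflict, offered as an assumption rather than something the slides prescribe, is to let the operator record explicitly that no number was given instead of forcing a value. The field names and the deliberately loose phone pattern below are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Deliberately loose, illustrative pattern: optional "+", then 7 to 15 digits or spaces.
PHONE_PATTERN = re.compile(r"^\+?[0-9 ]{7,15}$")


@dataclass
class PhoneField:
    number: Optional[str] = None
    not_provided: bool = False  # explicit "caller gave no number" flag


def validate_phone(field):
    """Return a list of problems; an empty list means the field is accepted."""
    if field.not_provided:
        if field.number is not None:
            return ["number given but marked as not provided"]
        return []
    if field.number is None:
        return ["enter a number or mark it as not provided"]
    if not PHONE_PATTERN.match(field.number):
        return ["number does not look like a telephone number"]
    return []
```

A format check of this kind protects integrity only. A well-formed but invented number still passes, so fidelity continues to depend on the operator.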

Collation
Matching of new applicants against existing applicants is poor, so duplicates are generated
Postcodes are accepted even if they do not match the Post Office database
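The postcode check can be sketched as a lookup against a reference set. The set below is a stand-in for the Post Office database and holds made-up placeholder values; the sketch also assumes UK-style postcodes, where the last three characters form the inward code.

```python
# Stand-in for the Post Office database; the entries are placeholders.
KNOWN_POSTCODES = {"AB1 2CD", "XY12 3ZZ"}


def canonical_postcode(raw):
    """Upper-case the postcode and put a single space before the last three characters."""
    compact = "".join(raw.split()).upper()
    if len(compact) <= 3:
        return compact
    return f"{compact[:-3]} {compact[-3:]}"


def postcode_is_known(raw, reference=KNOWN_POSTCODES):
    """True if the postcode, once canonicalised, appears in the reference set."""
    return canonical_postcode(raw) in reference
```

Canonicalising before the lookup prevents spacing and case differences from being reported as unknown postcodes.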

Storage
Database integrity failures, loss of backup data, or a reload that creates duplicates (auto-number primary key)

Improvement Process
Based on a learning cycle
  – Shewhart cycle: Plan, Do, Check, Act
  – Deming cycle
  – Six Sigma: Define, Measure, Analyse, Improve, Control
  – Kolb learning cycle: act, reflect, theorise, plan

Improvement / Learning Cycle
Measure and observe the current process
Analyse and develop a theory of the causes of the problem
Plan changes based on the theory
Put the plan into effect
Measure and observe the resulting improvement
….