Fidelity and Data Quality L20. Topics Integrity and Fidelity The Cost of poor Data Quality The Causes of poor Data Quality The process improvement cycle.


1 Fidelity and Data Quality L20

2 Topics Integrity and Fidelity The Cost of poor Data Quality The Causes of poor Data Quality The process improvement cycle

3 Integrity Data in a database should agree with the rules in the schema –Checks on values –Referential integrity –Primary key A weak schema allows erroneous data –E.g. Invalid manager relationships in the Emp-Dept example –Need for extended Business rules in middle tier of application
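
The three kinds of schema rule on this slide can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the schema used in the lecture: the Emp and Dept table names come from the slide, but the column names and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE dept (
    deptno INTEGER PRIMARY KEY,
    dname  TEXT NOT NULL
);
CREATE TABLE emp (
    empno  INTEGER PRIMARY KEY,                       -- primary key
    ename  TEXT NOT NULL,
    sal    REAL CHECK (sal >= 0),                     -- check on values
    deptno INTEGER NOT NULL REFERENCES dept(deptno)   -- referential integrity
);
""")


def try_insert(sql, params):
    """Return None if the row is accepted, else the integrity error message."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


emp_sql = "INSERT INTO emp VALUES (?, ?, ?, ?)"
results = {
    "valid dept": try_insert("INSERT INTO dept VALUES (?, ?)", (10, "Accounts")),
    "valid emp": try_insert(emp_sql, (1, "Smith", 1000, 10)),
    "duplicate key": try_insert(emp_sql, (1, "Jones", 900, 10)),
    "negative salary": try_insert(emp_sql, (2, "Jones", -5, 10)),
    "unknown dept": try_insert(emp_sql, (3, "Brown", 800, 99)),
}
for case, error in results.items():
    print(case, "->", error or "accepted")
```

Rules that cannot be expressed this way, such as a valid manager relationship, are the ones the slide assigns to business rules in the middle tier.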

4 Fidelity HiFi “exactitude in reproduction” A database as an image of its Domain of Discourse (Real World) Loss of fidelity when: –Two records in database but only one person in the RW –Address data does not correspond to an existing address in the RW –Address in database does not correspond to the current address of its owner But fidelity only has to be ‘good enough’ for its purpose

5 Data Quality Poor data quality results from loss of integrity and lack of fidelity. “Current data quality problems cost US businesses more than $600 billion per year” (report by the Data Warehousing Institute, 2002). Gartner Research estimates that through 2005 more than 50% of business intelligence and CRM deployments will suffer limited acceptance, if not outright failure, due to lack of attention to data quality issues. Direct costs of poor-quality information are estimated at between 10% and 20% of revenue

6 Information systems / computer systems Computer system quality depends only on ensuring the system doesn’t fall over when presented with bad data Information Systems quality depends on ensuring the system delivers information of high quality Information System includes procedures and guidance to users to meet this need.

7 Who’s who in data quality? Tom Redman – ex-AT&T, now Cutter Consortium consultant – many books and articles including “Data Quality for the Information Age” (1996) Larry English of Dataflux Companies providing software for data cleansing

8 Data Quality improvement Redman’s top three Data cleansing Problem analysis Dataflow analysis Process improvement

9 Redman’s top three Focus on data accuracy –Companies still do not realise the cost of poor data quality Clear definitions –Common terms e.g. customer, product have slightly different meanings in different contexts (nuances) Relevance –Estimates of 50% of data not used by anyone, ever –No value in wasting time improving its quality

10 Data cleansing Identifying duplicates – a difficult matching task Parsing complex strings into meaningful parts – e.g. a name and address into title, given names, family name, street number, street, town, postcode, country
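
The parsing task can be sketched as follows. This is a simplified illustration that assumes a comma-separated input of the form "name, street line, town, postcode, country"; the function name, the list of titles and the sample record are all hypothetical, and real name and address data is far less regular than this.

```python
import re

# Assumed set of recognised titles; a real system would need a longer list.
TITLES = {"mr", "mrs", "ms", "miss", "dr", "prof"}


def parse_name_address(text):
    """Split 'name, street line, town, postcode, country' into named parts."""
    parts = [p.strip() for p in text.split(",")]
    if len(parts) != 5:
        raise ValueError("expected 'name, street line, town, postcode, country'")
    name, street_line, town, postcode, country = parts

    words = name.split()
    title = ""
    if words and words[0].rstrip(".").lower() in TITLES:
        title = words.pop(0)
    if not words:
        raise ValueError("no family name found")
    family_name = words.pop()        # last word is taken as the family name
    given_names = " ".join(words)    # whatever remains is treated as given names

    # A leading number (optionally with a letter suffix, e.g. 12a) is the street number.
    match = re.match(r"(\d+\w*)\s+(.+)", street_line)
    number, street = (match.group(1), match.group(2)) if match else ("", street_line)

    return {
        "title": title,
        "given_names": given_names,
        "family_name": family_name,
        "street_number": number,
        "street": street,
        "town": town,
        "postcode": postcode,
        "country": country,
    }


record = parse_name_address("Dr Jane Mary Smith, 12a High Street, Bristol, BS1 4DJ, UK")
print(record)
```

Taking the last word as the family name already fails for names such as "van der Berg", which is one reason the task is handed to specialist cleansing software.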

11 Problem analysis Analyse the chain of cause and effect of poor quality A fishbone (Ishikawa) diagram depicts this chain Systems approach: –Information system: Data flow model analysed for points where errors can be injected –Organisation: Attitudes and ethos

12 Data Flow in the Information System Information source Information gathering Information collation Information storage Information retrieval

13 Data source problems Data has only a limited lifetime of fidelity since the world is in constant flux Length of lifetime depends on –Volatility of the data source, e.g. compare the address of a young out-of-work person with that of a retired person Need to re-validate data on a cycle dependent on the lifetime
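
A re-validation cycle tied to the lifetime of the data might look like the sketch below. The volatility classes, the lifetimes of 180 days and three years, and the record layout are illustrative assumptions, not figures from the lecture.

```python
from datetime import date, timedelta

# Assumed lifetimes: how long a validated address is trusted before re-checking.
LIFETIME = {
    "volatile": timedelta(days=180),      # e.g. frequent movers
    "stable": timedelta(days=3 * 365),    # e.g. long-term residents
}


def due_for_revalidation(records, today):
    """Return the ids of records whose last validation is older than their lifetime."""
    return [
        r["id"]
        for r in records
        if today - r["validated"] > LIFETIME[r["volatility"]]
    ]


records = [
    {"id": 1, "volatility": "volatile", "validated": date(2023, 3, 1)},
    {"id": 2, "volatility": "stable", "validated": date(2023, 3, 1)},
    {"id": 3, "volatility": "volatile", "validated": date(2023, 10, 1)},
]
due = due_for_revalidation(records, today=date(2024, 1, 1))
print(due)
```

Only the volatile record validated ten months earlier is flagged; the stable record of the same age is still within its assumed lifetime.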

14 Data capture Data gathering procedures are a major source of error. Integrity and Fidelity can be in conflict –If a telephone number is mandatory, an operator in a hurry will enter any old number to get the record accepted Data quality depends on the training and guidance given to operators

15 Collation Matching of new applicants with existing applicants is poor so duplicates generated. Postcodes accepted even if not matching Post Office database
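
Both collation checks can be sketched with the standard library. This is a simple illustration using difflib's similarity ratio, not the matching method of any particular cleansing product; the 0.85 threshold, the function names and the sample data are assumptions, and the reference set stands in for the Post Office database.

```python
from difflib import SequenceMatcher


def normalise(text):
    """Lower-case, drop full stops and commas, and collapse whitespace."""
    cleaned = text.lower().replace(".", " ").replace(",", " ")
    return " ".join(cleaned.split())


def similarity(a, b):
    """Similarity of two strings after normalisation, between 0.0 and 1.0."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def find_probable_duplicates(new, existing, threshold=0.85):
    """Return existing entries that look like the same applicant as `new`."""
    return [e for e in existing if similarity(new, e) >= threshold]


def postcode_known(postcode, reference):
    """Check a postcode against a reference set, ignoring case and spacing."""
    return "".join(postcode.upper().split()) in reference


existing = ["J Smith 12 High St", "Alice Brown 4 Mill Lane"]
matches = find_probable_duplicates("J. Smith, 12 High St", existing)
print(matches)

reference_postcodes = {"BS14DJ"}  # hypothetical extract of a reference database
print(postcode_known("bs1 4dj", reference_postcodes))
print(postcode_known("ZZ9 9ZZ", reference_postcodes))
```

The threshold is the hard part: set too high it misses duplicates, set too low it merges different people.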

16 Storage Database integrity failures or loss of backup data, or reload with duplicates (auto number primary key)
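
The reload problem can be reproduced with sqlite3. In this sketch the table names, columns and backup rows are hypothetical, and treating name plus date of birth as a natural key is an assumption made only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Only the auto-number key is unique, so the same person can be stored twice.
CREATE TABLE applicant_weak (
    id   INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL,
    dob  TEXT NOT NULL
);
-- A uniqueness rule on the (assumed) natural key blocks repeated rows.
CREATE TABLE applicant_strict (
    id   INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL,
    dob  TEXT NOT NULL,
    UNIQUE (name, dob)
);
""")

backup = [("Jane Smith", "1980-02-01"), ("Raj Patel", "1975-11-30")]


def reload_backup(table):
    """Load the backup rows into `table` and return the resulting row count."""
    with conn:
        # INSERT OR IGNORE skips any row that would violate a constraint.
        conn.executemany(
            f"INSERT OR IGNORE INTO {table} (name, dob) VALUES (?, ?)", backup
        )
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


# Reload the same backup twice into each table.
weak_counts = [reload_backup("applicant_weak"), reload_backup("applicant_weak")]
strict_counts = [reload_backup("applicant_strict"), reload_backup("applicant_strict")]
print(weak_counts, strict_counts)
```

With only an auto-number key, every reloaded row receives a fresh id and is accepted, so the second reload doubles the table; the uniqueness rule on the natural key leaves the strict table unchanged.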

17 Improvement Process Based on learning cycle –Shewhart cycle: Plan, Do, Check, Act –Deming cycle –Six Sigma: Define, Measure, Analyse, Improve, Control –Kolb learning cycle: act, reflect, theorise, plan

18 Improvement / Learning Cycle Measure and observe the current process Analyse / develop theory of causes of problem Plan changes based on the theory Put plan into effect Measure / observe the resultant improvement ….

