“Solving Data Inconsistencies and Data Integration with a Data Quality Manager” Presented by Maria del Pilar Angeles, Lachlan M.MacKinnon School of Mathematical.



1 “Solving Data Inconsistencies and Data Integration with a Data Quality Manager”
Presented by Maria del Pilar Angeles, Lachlan M. MacKinnon
School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh, EH14 4AS
{pilar,lachlan}@macs.hw.ac.uk
Doctoral Consortium

2 Agenda
Introduction
Proposal
Data Quality Manager Components
– Reference Model
– Measure Model
– Assessment Model
– Quality Metadata
Information Integration Process
– Classification of Data Sources
– Selection of Best Data Sources
– Query Planning
– Data Fusion
– Ranking of Query results
Questions

3 Introduction
[Figure: taxonomy of conflicts (Sheth92); a "Structural" label also appears in the figure]
– Domain definition: naming, data representation, data scaling, data precision, default value, attribute integrity constraints
– Entity definition: database id, naming, union compatibility, schema isomorphism, missing data item
– Abstraction: generalization, aggregation
– Schematic discrepancy: data value attribute, attribute entity, data value entity
– Data value: known inconsistency, temporal inconsistency, acceptable inconsistency
Approached by: ontology, metadata, transformation rules, mapping

4 Introduction
DS 1
Emp_no   Name               salary
123987   Alastair Freich    14000
456339   Fernando Lujan     NULL
DS 2
SSN      fullname           sal
123987   A. Freich          20000
789222   Fiona Shaning      15000
DS 3
employe  SFE                salary
123987   Al. Freich         NULL
393765   Lauren MacMillan   14500
Integrated result
Employee_number   Full_name_employee   Salary
123987            Alastair F.          14000
123987            A. Freich            20000
123987            Al. Freich           NULL
456339            Fernando Lujan       NULL
393765            Lauren MacMillan     14500
789222            Fiona Shaning        15000
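The example on this slide can be reproduced with a minimal sketch that unions the three sources under one global schema and then flags keys whose salaries disagree. The function names (`integrate`, `find_conflicts`) and the dictionary field names are illustrative assumptions, not part of the presentation; `None` stands in for NULL.

```python
from collections import defaultdict

# Rows as shown for DS 1, DS 2 and DS 3; None stands for NULL.
DS1 = [(123987, "Alastair Freich", 14000), (456339, "Fernando Lujan", None)]
DS2 = [(123987, "A. Freich", 20000), (789222, "Fiona Shaning", 15000)]
DS3 = [(123987, "Al. Freich", None), (393765, "Lauren MacMillan", 14500)]


def integrate(sources):
    """Union all rows under one global schema, tagging each row with its source."""
    return [
        {"source": name, "employee_number": emp, "full_name": full, "salary": sal}
        for name, rows in sources.items()
        for emp, full, sal in rows
    ]


def find_conflicts(rows):
    """Return the keys whose rows disagree on salary (NULL counts as a value)."""
    by_key = defaultdict(set)
    for row in rows:
        by_key[row["employee_number"]].add(row["salary"])
    return {key: values for key, values in by_key.items() if len(values) > 1}


integrated = integrate({"DS1": DS1, "DS2": DS2, "DS3": DS3})
conflicts = find_conflicts(integrated)
print(conflicts)
```

Only employee 123987 is reported, with three competing salary values (14000, 20000 and NULL), which is the data-value inconsistency the rest of the talk addresses.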

5 Proposal
We propose developing a Data Quality Manager (DQM) that establishes communication between the information integration process, the user and the application, in order to deal with semantic heterogeneity.

6 Proposal
[Figure: architecture]
– Local level: Local User 1 … Local User N; Local Schema 1 … Local Schema N; Data Source 1 … Data Source N
– Wrappers: Export Schema 1 … Export Schema N
– Mediator: Global Schema, Data Quality Manager
– Global level: Applications; Global User 1 … Global User M
Information Integration Process
– Selection of data sources
– Query Planning
– Detection and Fusion of data inconsistencies
– Query Integration
– Ranking query results
Data Quality Manager
– DQ Criteria Model
– DQ Measure
– DQ Assessment
– DQ Metadata

7 DQM Components
– Reference Model: Definition of Quality Criteria

8 DQM Components
– Reference Model: Definition of Quality Criteria
– Measurement Model: Definition of Metrics

9 DQM Components
– Reference Model: Definition of Quality Criteria
– Measurement Model: Definition of Metrics
– Assessment Model: Definition of Assessment methods

10 DQM Components
– Reference Model: Definition of Quality Criteria
– Measurement Model: Definition of Metrics
– Assessment Model: Definition of Assessment methods
– Quality Metadata: Definition of Quality Metadata (QMD)

11 QMD Population
Criteria: Completeness, Accuracy, Currency
Assessment methods: Survey, Queries, benchmarks
Metrics
– Completeness: # incomplete / # total
– Accuracy: # errors / # total
– Currency: Age + delivery time - input time
Based on DQM components, classify the data sources.
DQM: Data Quality Manager
QMD: Quality Meta Data
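The three metrics on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `QualityMetadata` record, the `assess` function, its parameters and the time units are assumptions made here. The ratios are computed literally as written on the slide, so under this reading they count defects and a lower value is better.

```python
from dataclasses import dataclass


@dataclass
class QualityMetadata:
    """One hypothetical QMD entry per data source."""
    source: str
    completeness: float  # incomplete / total, as written on the slide
    accuracy: float      # errors / total, as written on the slide
    currency: float      # age + delivery time - input time


def ratio(count, total):
    """Return count / total, or 0.0 for an empty source."""
    return count / total if total else 0.0


def assess(source, values, n_errors, age, delivery_time, input_time):
    """Populate a QMD entry from raw values and externally assessed figures."""
    incomplete = sum(1 for v in values if v is None)
    return QualityMetadata(
        source=source,
        completeness=ratio(incomplete, len(values)),
        accuracy=ratio(n_errors, len(values)),
        currency=age + delivery_time - input_time,
    )


# Salary column of DS 1; the error count and the times are made-up inputs.
qmd = assess("DS1", [14000, None], n_errors=0,
             age=2.0, delivery_time=5.0, input_time=3.0)
print(qmd)
```

In practice the error count would come from one of the assessment methods named on the slide (survey, queries, benchmarks) rather than being passed in by hand.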

12 Information Integration Process
Data Quality Manager
– Selection of Best Data Sources

13 Information Integration Process
Data Quality Manager
– Selection of Best Data Sources
– Query Planning

14 Information Integration Process
Data Quality Manager
– Selection of Best Data Sources
– Query Planning
– Fusion of Data Inconsistencies

15 Information Integration Process
Data Quality Manager
– Selection of Best Data Sources
– Query Planning
– Fusion of Data Inconsistencies
– Query Integration

16 Information Integration Process
Data Quality Manager
– Selection of Best Data Sources
– Query Planning
– Fusion of Data Inconsistencies
– Query Integration
– Ranking of Query results

17 Selection of best Data Sources
[Diagram elements: User Query; Mapping Local/Global Schemas; Data sources Involved in the Query; QMD; Quality User Priorities; Selection of best Data Sources; Ranking of best Data Sources]
1. The quality user priorities are given by the user.
2. The ranking of the best data sources involved in the query is given before execution.
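One way to combine the QMD with the user's quality priorities is a weighted sum per source. The sketch below assumes this scheme; the presentation does not specify the ranking technique. The QMD values, the priority weights and the function name `rank_sources` are hypothetical, and scores are assumed to be normalised to [0, 1] with higher meaning better.

```python
def rank_sources(qmd, priorities):
    """Rank sources by the weighted sum of their quality scores, best first."""
    total_weight = sum(priorities.values())
    if total_weight <= 0:
        raise ValueError("at least one priority must have a positive weight")

    def score(source):
        return sum(w * qmd[source][c] for c, w in priorities.items()) / total_weight

    return sorted(qmd, key=score, reverse=True)


# Hypothetical normalised quality scores for the sources involved in a query.
qmd = {
    "DS1": {"completeness": 0.5, "accuracy": 0.9, "currency": 0.7},
    "DS2": {"completeness": 1.0, "accuracy": 0.6, "currency": 0.8},
    "DS3": {"completeness": 0.5, "accuracy": 0.8, "currency": 0.9},
}

# The same sources rank differently under different user priorities.
accuracy_first = rank_sources(qmd, {"accuracy": 3, "completeness": 1, "currency": 1})
completeness_first = rank_sources(qmd, {"accuracy": 1, "completeness": 3, "currency": 1})
print(accuracy_first, completeness_first)
```

Because the ranking depends only on the QMD and the priorities, it can be produced before the query is executed, as point 2 on the slide states.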

18 Query Planning
[Diagram elements: User Query; Query Partition into QueryA, QueryB, QueryC; Plan 1, Plan 2, Plan 3, ... Plan N; QMD; Quality User Priorities; Top ranking Query Plan]
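The planning step can be sketched as enumerating every assignment of sub-queries to candidate sources and keeping the best-scoring plan. Everything here is an assumption for illustration: the candidate lists, the per-source scores, the averaging rule and the function names are not taken from the presentation, which leaves the plan-ranking technique open.

```python
from itertools import product


def enumerate_plans(candidates):
    """Yield every plan: one source chosen for each sub-query."""
    subqueries = sorted(candidates)
    for combo in product(*(candidates[q] for q in subqueries)):
        yield dict(zip(subqueries, combo))


def plan_score(plan, source_score):
    """Score a plan as the mean quality score of the sources it uses."""
    return sum(source_score[s] for s in plan.values()) / len(plan)


def top_plan(candidates, source_score):
    """Return the highest-scoring plan."""
    return max(enumerate_plans(candidates),
               key=lambda plan: plan_score(plan, source_score))


# Hypothetical: which sources can answer each partition of the user query.
candidates = {
    "QueryA": ["DS1", "DS2"],
    "QueryB": ["DS2", "DS3"],
    "QueryC": ["DS1", "DS3"],
}
# Hypothetical per-source scores derived from QMD and user priorities.
source_score = {"DS1": 0.78, "DS2": 0.72, "DS3": 0.76}

best = top_plan(candidates, source_score)
print(best)
```

Exhaustive enumeration grows exponentially with the number of sub-queries, so a real planner would need pruning; the sketch only shows where quality scores enter the choice.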

19 Data Fusion
[Diagram elements: Execute Query Plan; ResultX, ResultY, ResultZ; Inconsistent Query Result; Data Inconsistencies Detection; Data fusion; QMD; Quality user priorities; Consistent Query Result]
Because the DQM stores where each data item comes from, decisions can be made at data fusion time.
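A minimal sketch of a provenance-based fusion decision: for each key, keep the non-NULL value that comes from the best-scoring source. The resolution rule, the source scores and the `fuse` function are assumptions for illustration; the presentation names fusion techniques as future work.

```python
def fuse(rows, source_score):
    """Per key, keep the non-NULL salary from the best-scoring source."""
    fused = {}
    for row in rows:
        key = row["employee_number"]
        if row["salary"] is None:
            # Remember the key, but never let NULL override a real value.
            fused.setdefault(key, {"salary": None, "source": None})
            continue
        best = fused.get(key)
        if (best is None or best["source"] is None
                or source_score[row["source"]] > source_score[best["source"]]):
            fused[key] = {"salary": row["salary"], "source": row["source"]}
    return fused


# Inconsistent query result: the employee rows from the introduction,
# each tagged with the data source it came from.
rows = [
    {"source": "DS1", "employee_number": 123987, "salary": 14000},
    {"source": "DS2", "employee_number": 123987, "salary": 20000},
    {"source": "DS3", "employee_number": 123987, "salary": None},
    {"source": "DS1", "employee_number": 456339, "salary": None},
    {"source": "DS3", "employee_number": 393765, "salary": 14500},
    {"source": "DS2", "employee_number": 789222, "salary": 15000},
]
# Hypothetical per-source scores derived from QMD and user priorities.
source_score = {"DS1": 0.78, "DS2": 0.72, "DS3": 0.76}

consistent = fuse(rows, source_score)
print(consistent)
```

With these scores the conflict for employee 123987 resolves to 14000 from DS1; a user who weighted the criteria differently could obtain 20000 from DS2 instead, which is why the priorities are an input to fusion.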

20 Query Integration and Query Result Ranking
[Diagram elements: Data Fusion; Consistent Query Result; ResultJ, ResultK, ResultL; Query Integration; Query Result; Query Result Ranking; QMD; Quality user priorities; Ranking Query Result]

21 Conclusion
Using the Data Quality Manager we can:
– Approach data-value-level inconsistencies during the Information Integration Process, using data quality properties.
– Let the user demand different quality priorities at query time.
– Manage user quality priorities and data quality properties to give the query result quality the user expects.
What we need to do now:
– Identify tools for measurement and assessment, and develop a QMD.
– Store the quality of the data sources involved in the heterogeneous system.
– Identify techniques for:
  ranking of data sources and plans involved in the query;
  inconsistency detection;
  fusion of data using data source and data level properties;
  ranking of query results.

22 Questions?

23 Thanks!

