“Solving Data Inconsistencies and Data Integration with a Data Quality Manager” Presented by Maria del Pilar Angeles, Lachlan M. MacKinnon School of Mathematical.

Presentation transcript:

“Solving Data Inconsistencies and Data Integration with a Data Quality Manager”
Presented by Maria del Pilar Angeles, Lachlan M. MacKinnon
School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh, EH14 4AS
Doctoral Consortium

Agenda
- Introduction
- Proposal
- Data Quality Manager Components
  - Reference Model
  - Measure Model
  - Assessment Model
  - Quality Metadata
- Information Integration Process
  - Classification of Data Sources
  - Selection of Best Data Sources
  - Query Planning
  - Data Fusion
  - Ranking of Query Results
- Questions

Introduction
Conflicts (Sheth92):
- Domain definition: naming, data representation, data scaling, data precision, default value, attribute integrity constraints
- Entity definition: database id, naming, union compatibility, structural/schema isomorphism, missing data item
- Data value: known inconsistency, temporal inconsistency, acceptable inconsistency
- Abstraction: generalization, aggregation
- Schematic discrepancy: data value attribute, attribute entity, data value entity
Approached by: ontology, metadata, transformation rules, mapping

Introduction
Three data sources describing employees:
- DS 1 (Emp_no, Name, salary): Alastair Freich; Fernando Lujan (salary NULL)
- DS 2 (SSN, fullname, sal): A. Freich; Fiona Shaning
- DS 3 (employe, SFE, salary): Al. Freich (salary NULL); Lauren MacMillan
Integrated result (Employee_number, Full_name_employee, Salary):
- Alastair F
- A. Freich
- Al. Freich (Salary NULL)
- Fernando Lujan (Salary NULL)
- Lauren MacMillan
- Fiona Shaning (Salary 15000)
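The integration step on this slide can be illustrated with a minimal Python sketch. The attribute names are taken from the slide; the positional mapping from local to global attributes is an assumption read off the three-column layout, and the key and salary values are made-up placeholders, not the slide's data. DS 3 is left out because its column names are not fully legible in the transcript.

```python
# Hypothetical mapping from local attribute names to the global schema.
# Attribute names come from the slide; the pairing is an assumption.
ATTRIBUTE_MAP = {
    "DS1": {"Emp_no": "Employee_number", "Name": "Full_name_employee", "salary": "Salary"},
    "DS2": {"SSN": "Employee_number", "fullname": "Full_name_employee", "sal": "Salary"},
}

def to_global(source, record):
    """Rename a local record's attributes to the global schema and tag its origin."""
    mapping = ATTRIBUTE_MAP[source]
    row = {mapping[name]: value for name, value in record.items()}
    row["source"] = source  # keep provenance for later quality decisions
    return row

# Placeholder values for illustration only.
ds1 = [
    {"Emp_no": 1, "Name": "Alastair Freich", "salary": 100},
    {"Emp_no": 2, "Name": "Fernando Lujan", "salary": None},
]
ds2 = [{"SSN": 1, "fullname": "A. Freich", "sal": 120}]

integrated = [to_global("DS1", r) for r in ds1] + [to_global("DS2", r) for r in ds2]
```

After renaming, the same employee can still appear several times with different name spellings and salaries, which is the data-value-level inconsistency the talk addresses.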

Proposal
We propose the development of a Data Quality Manager (DQM) to establish communication between the information integration process, the user, and the application, in order to deal with semantic heterogeneity.

Proposal (architecture diagram)
- Users: Local User 1 ... Local User N; Global User 1 ... Global User M
- Applications
- Mediator, Global Schema, Data Quality Manager
- Export Schema 1 ... Export Schema N
- Wrappers
- Local Schema 1 ... Local Schema N
- Data Source 1 ... Data Source N
Information Integration Process:
- Selection of data sources
- Query planning
- Detection and fusion of data inconsistencies
- Query integration
- Ranking of query results
Data Quality Manager:
- DQ Metadata
- DQ Criteria Model
- DQ Assessment
- DQ Measure

DQM Components
- Reference Model: definition of quality criteria
- Measurement Model: definition of metrics
- Assessment Model: definition of assessment methods
- Quality Metadata: definition of Quality Metadata (QMD)

QMD Population
Based on the DQM components, classify the data sources.
- Criteria: completeness, accuracy, currency
- Metrics:
  - Completeness: # incomplete / # total
  - Accuracy: # errors / # total
  - Currency: age + delivery time - input time
- Assessment methods: survey, queries, benchmarks
DQM: Data Quality Manager
QMD: Quality Metadata
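The three metrics on this slide can be written down directly. The sketch below is only an illustration: the function names, the record layout (dictionaries with `None` for missing values), and the caller-supplied validity check are assumptions, since the slides do not say how incomplete or erroneous records are detected. Note that the first two ratios count defects, so a "higher is better" score would typically be one minus the ratio.

```python
def ratio(count, total):
    """Return count / total, or None when there is nothing to measure."""
    return count / total if total else None

def incomplete_ratio(records, fields):
    """# incomplete / # total: share of records missing at least one field."""
    incomplete = sum(1 for r in records if any(r.get(f) is None for f in fields))
    return ratio(incomplete, len(records))

def error_ratio(records, is_valid):
    """# errors / # total, where is_valid is a caller-supplied check."""
    errors = sum(1 for r in records if not is_valid(r))
    return ratio(errors, len(records))

def currency(age, delivery_time, input_time):
    """Age + delivery time - input time, all in the same time unit."""
    return age + delivery_time - input_time

# Placeholder records for illustration only.
records = [{"name": "A", "salary": 10}, {"name": "B", "salary": None}]
incomplete = incomplete_ratio(records, ["name", "salary"])
errors = error_ratio(records, lambda r: r["salary"] is None or r["salary"] >= 0)
```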

Information Integration Process (supported by the Data Quality Manager)
1. Selection of Best Data Sources
2. Query Planning
3. Fusion of Data Inconsistencies
4. Query Integration
5. Ranking of Query Results

Selection of Best Data Sources
Flow: User Query -> Mapping Local/Global Schemas -> Data Sources Involved in the Query -> Ranking of Best Data Sources
Inputs to the ranking: QMD, Quality User Priorities
1. The user supplies the quality priorities.
2. The ranking of the best data sources involved in the query is produced before the query is executed.
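One simple way to combine QMD scores with user priorities is a normalized weighted sum. The slides do not state which ranking technique is used (identifying one is listed as future work), so the sketch below is an assumption for illustration; `rank_sources`, the score range, and the sample values are all hypothetical.

```python
def rank_sources(qmd, priorities):
    """Order sources by weighted quality score, best first.

    qmd: {source: {criterion: score in [0, 1], higher is better}}
    priorities: {criterion: non-negative user weight}
    """
    total = sum(priorities.values())
    if total <= 0:
        raise ValueError("at least one priority weight must be positive")

    def score(source):
        # Criteria missing from a source's metadata count as 0.
        return sum(w * qmd[source].get(c, 0.0) for c, w in priorities.items()) / total

    return sorted(qmd, key=score, reverse=True)

# Placeholder quality metadata for illustration only.
qmd = {
    "DS1": {"completeness": 0.9, "accuracy": 0.6},
    "DS2": {"completeness": 0.5, "accuracy": 0.95},
}
by_accuracy = rank_sources(qmd, {"accuracy": 3, "completeness": 1})
by_completeness = rank_sources(qmd, {"completeness": 1})
```

Because the weights come from the user at query time, the same metadata can yield a different ranking for each query, which matches the slide's point that priorities are supplied by the user.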

Query Planning
Flow: User Query -> Query Partition (QueryA, QueryB, QueryC) -> Plan 1, Plan 2, Plan 3, ..., Plan N -> Top-Ranking Query Plan
Inputs to the ranking: QMD, Quality User Priorities
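As a rough illustration of plan ranking, the sketch below enumerates every assignment of one source per sub-query and keeps the plan with the highest mean source score. This brute-force strategy, the function name `top_plan`, and the sample scores are assumptions; the slides leave the actual planning technique open.

```python
from itertools import product

def top_plan(candidates, source_score):
    """Pick the plan with the highest mean source score.

    candidates: {subquery: [sources that can answer it]}
    source_score: {source: quality score, higher is better}
    """
    if not candidates:
        raise ValueError("no sub-queries to plan")
    subqueries = sorted(candidates)
    best_plan, best_score = None, float("-inf")
    # Each element of the product assigns one source to every sub-query.
    for choice in product(*(candidates[q] for q in subqueries)):
        score = sum(source_score[src] for src in choice) / len(choice)
        if score > best_score:
            best_plan, best_score = dict(zip(subqueries, choice)), score
    return best_plan, best_score

# Placeholder candidates and scores for illustration only.
candidates = {"QueryA": ["DS1", "DS2"], "QueryB": ["DS2", "DS3"]}
scores = {"DS1": 0.5, "DS2": 0.75, "DS3": 0.25}
plan, plan_score = top_plan(candidates, scores)
```

The number of plans grows multiplicatively with the number of sub-queries, so exhaustive enumeration is only reasonable for small queries.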

Data Fusion
Flow: Execute Query Plan -> ResultX, ResultY, ResultZ -> Inconsistent Query Result -> Data Inconsistencies Detection -> Data Fusion -> Consistent Query Result
Inputs to detection and fusion: QMD, Quality User Priorities
Because the DQM stores where the data comes from, decisions can be made at data fusion time.
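Since the origin of each value is known at fusion time, one possible resolution rule is to prefer the value from the highest-quality source and fall back to other sources only for missing values. The sketch below assumes that rule, assumes records have already been matched on a shared key, and uses placeholder values and scores; none of this is specified in the slides.

```python
from collections import defaultdict

def fuse(rows, key, source_score):
    """Merge rows sharing `key`; for each attribute keep the first non-null
    value found when sources are visited from best to worst score."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    fused = []
    for key_value, group in groups.items():
        group.sort(key=lambda r: source_score[r["source"]], reverse=True)
        merged = {key: key_value}
        for row in group:
            for attr, value in row.items():
                if attr in (key, "source"):
                    continue
                if merged.get(attr) is None:
                    merged[attr] = value
        fused.append(merged)
    return fused

# Placeholder rows and scores for illustration only.
rows = [
    {"Employee_number": 1, "Full_name_employee": "Al. Freich", "Salary": None, "source": "DS3"},
    {"Employee_number": 1, "Full_name_employee": "Alastair Freich", "Salary": 100, "source": "DS1"},
    {"Employee_number": 2, "Full_name_employee": "Fernando Lujan", "Salary": None, "source": "DS1"},
]
result = fuse(rows, "Employee_number", {"DS1": 0.6, "DS3": 0.9})
```

In this example the name comes from the higher-scoring DS3 while the salary is filled from DS1, because DS3 holds NULL for it.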

Query Integration and Ranking of Query Results
Flow: Data Fusion -> ResultJ, ResultK, ResultL -> Query Integration -> Consistent Query Result -> Query Result Ranking -> Ranked Query Result
Inputs to the ranking: QMD, Quality User Priorities

Conclusion
Using the Data Quality Manager we can:
- Approach data-value-level inconsistencies during the information integration process, using data quality properties.
- Let the user demand different quality priorities at query time.
- Manage user quality priorities and data quality properties to give the user the expected quality of query result.
What we need to do now:
- Identify tools for measurement and assessment, and develop a QMD.
- Store the quality of the data sources involved in the heterogeneous system.
- Identify techniques for:
  - Ranking of data sources and plans involved in the query
  - Inconsistency detection
  - Fusion of data using data source and data level properties
  - Ranking of query results

Questions?

Thanks!