Data Exchange with Data-Metadata Translations: MAD Algorithm. Paolo Papotti, Mauricio A. Hernández, Wang-Chiew Tan.

Presentation transcript:


Data Exchange: “Scientia potentia est.” What is data exchange? The process of taking data structured under a source schema and transforming it into data structured under a target schema. Data exchange is the restructuring of data.

Data Exchange – why? 1. Today, when companies merge, they also merge their information sources.

Data Exchange – why? 2. When several institutions are working on a joint venture – a combined database is

Data Exchange – why? 3. Refreshing and updating a database schema.

A few problems with data exchange: 1. The labels in the source schema and in the target schema could be very different. 2. Data could be stored in a plethora of ways; for instance, a car price could be stored in shekels or in U.S. dollars. 3. Data could be lost in the exchange process if the source schema and the target schema don't correspond well.

Data Exchange: In the past, data exchange was done manually, consuming considerable resources such as time and money. Many researchers are working on ways to improve data exchange.

Example schemas.
Schema Clunkers-R-Us, Clunker table:
Location | List-price | Automobile | Seniority | Agent-name
Belfast, NR | 650000 | Morris 8 | 2 | Gerry Adams
Newry, NR | 500000 | Bentley Mark V | 1 | Martin McGuiness
Schema Buy-A-Wreck (Antique Car Dealership), AGENTS table:
Id | Name | Carmodel | Commission
48 | Nigel Dodds | Vauxhall | 
 | Ian Paisley | FordT | 0.04
Schema Buy-A-Wreck (Antique Car Dealership), cars table:
Car | Model | price | Agent-id
Vauxhall | 14 | 360,000 | 48
Ford | Model T | 430,000 | 66

Matching Examples. Schema Buy-A-Wreck, Name: Nigel Dodds, Ian Paisley. Schema Clunkers-R-Us, Agent-name: Nigel Dodds, Ian Paisley.

Matching Examples. Schema Buy-A-Wreck, Carmodel: Vauxhall14, FordT. Schema Clunkers-R-Us, Automobile: Vauxhall 14, Ford T.

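Schema Buy-A-Wreck, cars table:
Car | type | price | Agent-id
Vauxhall | 14 | 360,000 | 48
Ford | Model T | 430,000 | 66
Schema Buy-A-Wreck, AGENTS table (columns shown): Id, Commission.
Schema Clunkers-R-Us (columns shown): List-price, Car model. Car model values: Vauxhall, Ford Model T.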

Creating mappings: 1. Schema matching: find matches. 2. Create query expressions: for automated data translation or exchange. How do we match? Schema matching → create query expressions.
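
The schema-matching step can be illustrated with a minimal sketch in Python, assuming an instance-based matcher that pairs columns whose values overlap. The slides do not say which matching technique is used, so the Jaccard score, the `threshold` parameter, and the function names below are illustrative assumptions; the column values are taken from the "Matching Examples" slides.

```python
def normalize(value):
    """Lowercase and drop whitespace so 'Vauxhall 14' and 'Vauxhall14' compare equal."""
    return "".join(str(value).split()).lower()


def jaccard(left_values, right_values):
    """Jaccard similarity between two collections of normalized values."""
    left = {normalize(v) for v in left_values}
    right = {normalize(v) for v in right_values}
    if not left and not right:
        return 0.0
    return len(left & right) / len(left | right)


def match_columns(source_cols, target_cols, threshold=0.5):
    """For each source column, pick the target column with the highest value overlap.

    Hypothetical helper: returns (source_label, target_label, score) triples
    for pairs whose score reaches the threshold.
    """
    matches = []
    for s_name, s_vals in source_cols.items():
        best_name, best_score = None, 0.0
        for t_name, t_vals in target_cols.items():
            score = jaccard(s_vals, t_vals)
            if score > best_score:
                best_name, best_score = t_name, score
        if best_name is not None and best_score >= threshold:
            matches.append((s_name, best_name, best_score))
    return matches


# Column values as shown on the "Matching Examples" slides.
buy_a_wreck = {
    "Name": ["Nigel Dodds", "Ian Paisley"],
    "Carmodel": ["Vauxhall14", "FordT"],
}
clunkers_r_us = {
    "Agent-name": ["Nigel Dodds", "Ian Paisley"],
    "Automobile": ["Vauxhall 14", "Ford T"],
}

matches = match_columns(buy_a_wreck, clunkers_r_us)
for source_label, target_label, score in matches:
    print(f"{source_label} <-> {target_label} (score {score:.2f})")
```

A value-overlap matcher only finds pairs whose instances actually coincide; labels that differ but share no values (such as prices in different currencies) would need another signal.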

Data Exchange: 1. There may be no way to transform an instance given all of our constraints. 2. There may be numerous ways to transform the instance (possibly infinitely many). 3. We must identify and justify the best-suited choice of solution for our needs.

Data Exchange – Summary (source schema S, target schema T). To conclude: 1. Data exchange moves data from a source schema S to a target schema T. 2. It is a problem that receives a great deal of attention in the computing world. 3. Some data exchange scenarios deal with metadata.

What is metadata? Metadata: data about data. Metadata can come as: video, audio, image, text.

Why do we need metadata? Metadata helps us understand data. Can anyone tell what these numbers mean? Jan Feb

Why do we need metadata? Umbrella Sales: Month, USA, UK, Italy; Jan, Feb. After adding metadata…

Why do we need metadata? We all know this picture…

Why do we need metadata? What is this picture all about?

Why do we need metadata? Sir Edward Carson signing the Ulster Covenant.

Why do we need metadata?

Wall Street, New York City, New York.

Data-Metadata Translations: Data exchange scenarios may involve metadata transformations. Transforming the data in the Stock Ticker table into metadata in the Stock Quotes table is vital in the stock exchange world.

Data-Metadata Translations Mapping systems support Data-to-Data transformations with fixed schemas (Clio). Goal: Extend mapping systems to support Data-Metadata Translations.

Data Exchange, Clio: One software tool developed for simple graphical data exchange is “Clio”. Clio maps corresponding values between the source schema and the target schema. However, Clio did not handle data exchange scenarios that involve metadata. The solution involving metadata is based on Clio.

Clio interface

Metadata-to-Data: How can we transform the following source data into the corresponding target?
Source: Rcd
  Sales: SetOf Rcd
    month
    USA
    UK
    Italy
Target: Rcd
  Sales: SetOf Rcd
    month
    country
    units
Source.Sales:
month | USA | UK | Italy
Jan | 120 | 223 | 89
Feb | 83 | 168 | 56
Target.Sales:
month | country | units
Jan | USA | 120
Jan | UK | 223
Jan | Italy | 89
Feb | USA | 83
Feb | UK | 168
Feb | Italy | 56
One schema mapping is needed per country column, each with its own constant:
m1 (constant “USA”):
  for $s in Source.Sales
  exists $t in Target.Sales
  where $t.month = $s.month and $t.country = “USA” and $t.units = $s.USA
m2 (constant “UK”):
  for $s in Source.Sales
  exists $t in Target.Sales
  where $t.month = $s.month and $t.country = “UK” and $t.units = $s.UK
m3 (constant “Italy”):
  for $s in Source.Sales
  exists $t in Target.Sales
  where $t.month = $s.month and $t.country = “Italy” and $t.units = $s.Italy

Metadata-to-Data: Our solution.
Source: Rcd
  Sales: SetOf Rcd
    month
    USA
    UK
    Italy
    <<countries>> (placeholder: select the elements to group)
      label
      value
Target: Rcd
  Sales: SetOf Rcd
    month
    country (copy elements' labels)
    units (copy elements' values)
Target.Sales:
month | country | units
Jan | USA | 120
Jan | UK | 223
Jan | Italy | 89
Feb | USA | 83
Feb | UK | 168
Feb | Italy | 56
MetadatA-Data (MAD) mapping:
  for $s in Source.Sales, $c in {“USA”, “UK”, “Italy”}
  exists $t in Target.Sales
  where $t.month = $s.month and $t.country = $c and $t.units = $s.($c)
Here {“USA”, “UK”, “Italy”} is a set of labels (strings), $c is a label value, and $s.($c) is a dynamic selection of the source element.
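
As a rough illustration of what the MAD mapping expresses, the following Python sketch iterates over the set of labels and selects the source element dynamically. It is not the code that the mapping system generates; the function name is hypothetical, and the source values are an assumption inferred from the Target.Sales rows shown on the slide.

```python
def metadata_to_data(source_sales, country_labels):
    """Turn column labels into data values (one target record per label).

    Mirrors the MAD mapping on the slide:
      for $s in Source.Sales, $c in {"USA", "UK", "Italy"}
      exists $t in Target.Sales
      where $t.month = $s.month and $t.country = $c and $t.units = $s.($c)
    """
    target_sales = []
    for s in source_sales:              # for $s in Source.Sales
        for c in country_labels:        # $c ranges over the set of labels
            target_sales.append({
                "month": s["month"],    # $t.month = $s.month
                "country": c,           # $t.country = $c (label becomes data)
                "units": s[c],          # $t.units = $s.($c), dynamic selection
            })
    return target_sales


# Assumed source instance, reconstructed from the target rows on the slide.
source_sales = [
    {"month": "Jan", "USA": 120, "UK": 223, "Italy": 89},
    {"month": "Feb", "USA": 83, "UK": 168, "Italy": 56},
]

target_sales = metadata_to_data(source_sales, ["USA", "UK", "Italy"])
for row in target_sales:
    print(row["month"], row["country"], row["units"])
```

The single loop over labels replaces the three near-identical mappings that a fixed-schema system would need, one per country column.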

Data-to-Metadata: Now we want to support the opposite operation. The target schema depends on the source data. We define a target template: Nested Dynamic Output Schemas (ndos). At run time, the dynamic element defines the target instance and the target schema.
Source: Rcd
  StockTicker: SetOf Rcd
    time
    symbol
    price
Target: Rcd
  Stockquotes: SetOf Rcd
    time
    <<symbols>> (dynamic element)
      label
      value

Data-to-Metadata: Heterogeneous records. There are two possible interpretations for the target ndos. Consider this mapping and this source instance:
Source: Rcd
  StockTicker: SetOf Rcd
    time
    symbol
    price
Target: Rcd
  Stockquotes: SetOf Rcd
    time
    <<symbols>>
      label
      value
Source instance:
  StockTicker (time: 0900, symbol: MSFT, price: 27.20)
  StockTicker (time: 0900, symbol: IBM, price: )
  StockTicker (time: 0905, symbol: MSFT, price: 27.30)
First alternative: heterogeneous target records.
Computed target schema:
Target: Rcd
  Stockquotes: SetOf Rcd
    time
    symbols: Choice
      MSFT
      IBM
Computed target instance:
  Stockquotes (time: 0900, MSFT: 27.20)
  Stockquotes (time: 0900, IBM: )
  Stockquotes (time: 0905, MSFT: 27.30)

Data-to-Metadata: Homogeneous records. There are two possible interpretations for the target ndos. Consider this mapping and this source instance:
Source: Rcd
  StockTicker: SetOf Rcd
    time
    symbol
    price
Target: Rcd
  Stockquotes: SetOf Rcd
    time
    <<symbols>>
      label
      value
Source instance:
  StockTicker (time: 0900, symbol: MSFT, price: 27.20)
  StockTicker (time: 0900, symbol: IBM, price: )
  StockTicker (time: 0905, symbol: MSFT, price: 27.30)
Second alternative: homogeneous target records.
Computed target schema:
Target: Rcd
  Stockquotes: SetOf Rcd
    time
    MSFT
    IBM
Computed target instance:
  Stockquotes (time: 0900, MSFT: 27.20, IBM: null)
  Stockquotes (time: 0900, MSFT: null, IBM: )
  Stockquotes (time: 0905, MSFT: 27.30, IBM: null)
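
The two interpretations of the target ndos can be compared with a small Python sketch. This is only an illustration of the two output shapes, not the query that the mapping system generates; the function names are hypothetical, and because the IBM price is missing from the transcript, the value 120.00 below is a made-up stand-in.

```python
def heterogeneous_records(ticker):
    """First alternative: each target record carries only its own symbol as a label."""
    return [{"time": r["time"], r["symbol"]: r["price"]} for r in ticker]


def homogeneous_records(ticker):
    """Second alternative: every target record has all symbol labels, padded with None."""
    symbols = []
    for r in ticker:                      # collect labels in first-seen order
        if r["symbol"] not in symbols:
            symbols.append(r["symbol"])
    records = []
    for r in ticker:
        record = {"time": r["time"]}
        for symbol in symbols:
            # Only the record's own symbol gets a price; the rest are null.
            record[symbol] = r["price"] if symbol == r["symbol"] else None
        records.append(record)
    return records


# Source instance from the slide. The IBM price is NOT in the transcript:
# 120.00 is an invented placeholder used only to make the example runnable.
stock_ticker = [
    {"time": "0900", "symbol": "MSFT", "price": 27.20},
    {"time": "0900", "symbol": "IBM", "price": 120.00},
    {"time": "0905", "symbol": "MSFT", "price": 27.30},
]

hetero = heterogeneous_records(stock_ticker)
homo = homogeneous_records(stock_ticker)
print(hetero)
print(homo)
```

In both cases the set of target labels (MSFT, IBM) is only known once the source data has been read, which is why the target schema is computed at run time.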

Data-to-Metadata: Homogeneous records. The homogeneous approach is a MAD improvement.
Source: Rcd
  StockTicker: SetOf Rcd
    time
    symbol
    price
Target: Rcd
  Stockquotes: SetOf Rcd
    time
    <<symbols>>
      label
      value
Homogeneity constraint: “For every pair of tuples t1 and t2, if a is a label in t1, then a is a label in t2.”
Homogeneous target records:
  Stockquotes (time: 0900, MSFT: 27.20, IBM: null)
  Stockquotes (time: 0900, MSFT: null, IBM: )
  Stockquotes (time: 0905, MSFT: 27.30, IBM: null)
Heterogeneous target records, a natural solution for semi-structured data models (XSD, DTD, JSON):
  Stockquotes (time: 0900, MSFT: )
  Stockquotes (time: 0900, IBM: )
  Stockquotes (time: 0905, MSFT: )
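
The homogeneity constraint can be checked mechanically. The sketch below is a direct reading of the quoted constraint in Python, with records modeled as dictionaries; the function name is hypothetical, and the sample records keep only the MSFT values that appear in the transcript.

```python
def satisfies_homogeneity(records):
    """Return True if every label of any record is also a label of every other record.

    The constraint holds for all pairs (t1, t2) in both directions,
    so it is equivalent to all records sharing the same label set.
    """
    if not records:
        return True
    labels = set(records[0])
    return all(set(record) == labels for record in records)


# Heterogeneous records: labels differ from record to record.
heterogeneous = [
    {"time": "0900", "MSFT": 27.20},
    {"time": "0905", "MSFT": 27.30},
    {"time": "0900", "IBM": None},   # IBM price not available in the transcript
]

# Homogeneous records: every record carries both symbol labels.
homogeneous = [
    {"time": "0900", "MSFT": 27.20, "IBM": None},
    {"time": "0905", "MSFT": 27.30, "IBM": None},
]

print(satisfies_homogeneity(heterogeneous))
print(satisfies_homogeneity(homogeneous))
```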

MAD Mapping: MetadatA-Data (MAD) mapping generation has three steps. 1. Preliminary mapping: How do we map the source schema to the target schema? The preliminary mapping for <<countries>> includes the metadata label and the value label of <<countries>>.

Preliminary Mapping (label and value transfer).
Source: Rcd
  SalesByCountries: SetOf Rcd
    month
    USA
    UK
    Italy
    <<countries>>
      label
      value
Target: Rcd
  Sales: SetOf Rcd
    month
    country
    units
Preliminary mapping: { $x1 ∈ Source.SalesByCountries, $x2 ∈ <<countries>>; $x3 := $x1.($x2) }
Source.SalesByCountries:
month | USA | UK | Italy
Jan | 120 | 223 | 89
Feb | 83 | 168 | 56
Target.Sales:
month | country | units
Jan | USA | 120
Jan | UK | 223
Jan | Italy | 89
Feb | USA | 83
Feb | UK | 168
Feb | Italy | 56

MAD Mapping: 2. Skeletons: An n x m matrix of skeletons is constructed from the set of n source preliminary mappings and the set of m target preliminary mappings; each entry (i, j) is a potential mapping. 3. Creating the MAD mapping: At this stage, the value correspondences are matched against the preliminary mappings in order to factor them into the appropriate skeletons. For example, in Source.Sales.country → Target.CountrySales.country, the left side is matched against one or more source mappings and the right side against one or more target mappings.
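
The skeleton step can be pictured with a deliberately simplified Python sketch. It assumes that each preliminary mapping can be represented as the set of element paths it covers, which is a simplification made for this example rather than the data structure used by the actual algorithm. The function name and every path other than the Source.Sales.country → Target.CountrySales.country correspondence are hypothetical.

```python
from itertools import product


def factor_correspondences(source_prelims, target_prelims, correspondences):
    """Build the n x m skeleton matrix and place each value correspondence in it.

    source_prelims / target_prelims: lists of sets of element paths
    (simplified stand-ins for preliminary mappings).
    correspondences: list of (source_path, target_path) pairs.
    Returns a dict keyed by (i, j) holding the correspondences factored
    into skeleton (i, j); an empty list means the entry stays unused.
    """
    n, m = len(source_prelims), len(target_prelims)
    skeletons = {(i, j): [] for i, j in product(range(n), range(m))}
    for source_path, target_path in correspondences:
        for i, source_mapping in enumerate(source_prelims):
            if source_path not in source_mapping:
                continue
            for j, target_mapping in enumerate(target_prelims):
                if target_path in target_mapping:
                    skeletons[(i, j)].append((source_path, target_path))
    return skeletons


# Hypothetical preliminary mappings, each reduced to the paths it covers.
source_prelims = [
    {"Source.Sales.month", "Source.Sales.country", "Source.Sales.units"},
]
target_prelims = [
    {"Target.CountrySales.month", "Target.CountrySales.country"},
    {"Target.Regions.name"},  # invented second target mapping
]
correspondences = [
    ("Source.Sales.country", "Target.CountrySales.country"),  # from the slide
]

skeletons = factor_correspondences(source_prelims, target_prelims, correspondences)
for entry, matched in sorted(skeletons.items()):
    print(entry, matched)
```

Entries that receive at least one correspondence are the candidates for MAD mappings; entries that receive none can be discarded.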

MAD Mapping Generation Example.
Source: Rcd
  SalesByCountries: SetOf Rcd
    month
    USA
    UK
    Italy
    <<countries>>
      label
      value
Target: Rcd
  Sales: SetOf Rcd
    month
    country
    units
Value correspondences:
  Source.SalesByCountries.<<countries>> → Target.Sales.country
  Source.SalesByCountries.&<<countries>> → Target.Sales.units
Preliminary mappings:
  Source: { $x1 ∈ Source.SalesByCountries, $x2 ∈ <<countries>>; $x3 := $x1.($x2) }
  Target: { $y1 ∈ Target.Sales }

MAD vs Clio. Components: source schema S and target schema T; GUI; declarative (internal) representation; executable code (XSLT, XQuery, Java). Data exchange with data-metadata support, where data-to-data is a special case:
- New construct to iterate over elements' labels: placeholder.
- Target schema can be incomplete: nested dynamic output schema (ndos).
- New mapping and query generation algorithms.

Fin.