HKU CSIS DB Seminar: Finding Set-Mappings in Schema Matching
Supervisor: Dr. David Cheung
Speaker: Eric Lo

What is Schema Matching?
Finding semantic correspondences between elements of two schemas
Input: two schemas; Output: a set of mappings
Done by human experts, which is time-consuming

Application Domains
E-commerce and data translation:
–Each trading partner has its own messaging format (e.g. EDI or ebXML) to describe business transaction details
–To deal with trading partners' different message schemas, businesses often need to convert messages between the schemas
–For example, the ‘total quantity’ field in one partner's schema may match the ‘amount’ field in another partner's
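
To make the e-commerce case concrete, here is a minimal Python sketch of message conversion once a 1:1 field mapping is known. It assumes messages are flat dictionaries; `translate_message` and the second mapped field (`item` to `product`) are hypothetical illustrations, not part of any EDI or ebXML API.

```python
def translate_message(message: dict, field_map: dict) -> dict:
    """Rename the fields of a flat message according to a 1:1 field mapping.

    Fields without a mapping are dropped, since the target schema
    has no place for them.
    """
    return {field_map[name]: value
            for name, value in message.items()
            if name in field_map}


# Hypothetical mapping from partner A's schema to partner B's schema.
field_map = {"total quantity": "amount", "item": "product"}
order_a = {"total quantity": 12, "item": "bolt", "remark": "urgent"}
print(translate_message(order_a, field_map))
# {'amount': 12, 'product': 'bolt'}
```

Dropping unmapped fields is one design choice; a real converter would more likely report them so a user can decide.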

Outline
Introduction
Related Work
Problem
Possible Solutions
Discussion and Conclusion
References

State of the Art
Goal: high match accuracy for a large variety of schemas
Different match criteria (e.g. name, data type, dictionary, thesaurus…) are combined in a single algorithm:
–Linguistic: match at the name (semantic) level
–Structural: also match the structure

One Related Work: Cupid [VLDB01]
Supports a variety of schema formats
Cupid models the interconnected elements of a schema as a schema tree
The schema tree captures different data models in a unified way
Matches at both the linguistic and the structural level
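
A schema tree can be sketched as a plain recursive node type. This is an illustration under assumptions, not Cupid's actual data structure: `SchemaNode` is hypothetical, and the nesting of the PO schema follows the purchase-order example used in the Cupid paper. Copying a shared type such as an address under each parent (the `address` helper) is one simple way to turn a schema graph into a tree.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SchemaNode:
    """One element of a schema; leaves are atomic fields."""
    name: str
    children: List["SchemaNode"] = field(default_factory=list)

    def leaves(self) -> List["SchemaNode"]:
        """Return the leaf elements below this node, left to right."""
        if not self.children:
            return [self]
        result = []
        for child in self.children:
            result.extend(child.leaves())
        return result

    def paths(self, prefix: str = "") -> List[str]:
        """Return the dotted path of every leaf, e.g. 'PO.POShipTo.City'."""
        here = f"{prefix}.{self.name}" if prefix else self.name
        if not self.children:
            return [here]
        return [p for child in self.children for p in child.paths(here)]


def address(name: str) -> SchemaNode:
    """Copy the shared address structure under a new parent name."""
    return SchemaNode(name, [SchemaNode("City"), SchemaNode("Street")])


po = SchemaNode("PO", [
    address("POShipTo"),
    address("POBillTo"),
    SchemaNode("POLines", [
        SchemaNode("Count"),
        SchemaNode("Item", [SchemaNode("Line"), SchemaNode("Qty"), SchemaNode("UoM")]),
    ]),
])
print(po.paths())
```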

Schema Tree Example
[Figure: schema trees of two purchase-order schemas. PO: POShipTo (City, Street), POBillTo (City, Street), POLines (Count, Item (Line, Qty, UoM)). PurchaseOrder: DeliverTo and InvoiceTo (each an Address with City, Street), Items (ItemCount, Item (Line, Qty, UoM)).]

Match the Schemas
Calculate the similarity between elements
Report the mappings with high similarity values (above a threshold)
Similarity lies in [0, 1]
Two phases:
–Linguistic matching (lsim): e.g. compare element names as strings (edit distance); use a thesaurus to resolve synonyms (“Bill” = “Invoice”) and short forms (“Qty” = “Quantity”)
–Structural matching
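
The linguistic phase can be sketched as follows, assuming lsim is a normalized edit distance between lower-cased names, overridden by a thesaurus lookup. `THESAURUS`, its entries and this normalization are hypothetical; Cupid's own linguistic matcher is more elaborate.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a single rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


# Hypothetical thesaurus: synonyms and short forms mapped to one term.
THESAURUS = {"bill": "invoice", "qty": "quantity", "uom": "unitofmeasure"}


def lsim(name1: str, name2: str) -> float:
    """Linguistic similarity in [0, 1]; 1.0 means the names are equivalent."""
    n1 = THESAURUS.get(name1.lower(), name1.lower())
    n2 = THESAURUS.get(name2.lower(), name2.lower())
    if n1 == n2:
        return 1.0
    return 1.0 - edit_distance(n1, n2) / max(len(n1), len(n2))


print(lsim("Qty", "Quantity"))   # 1.0 (short form)
print(lsim("Bill", "Invoice"))   # 1.0 (synonym)
print(lsim("Street", "Streets"))
```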

Structural Matching
Match two elements based on their context and vicinity in the schema
Structural information can resolve many ambiguities that linguistic matching cannot
This similarity is denoted ssim
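
One way to compute ssim, loosely following Cupid's leaf-based idea, is to count how many leaves of two subtrees have a sufficiently similar partner in the other subtree. The sketch below assumes a caller-supplied leaf similarity and a hypothetical acceptance threshold `th_accept`; the toy `exact` function stands in for lsim.

```python
from typing import Callable, List


def ssim(leaves1: List[str], leaves2: List[str],
         leaf_sim: Callable[[str, str], float],
         th_accept: float = 0.5) -> float:
    """Structural similarity of two subtrees, given their leaf names.

    A leaf has a 'strong link' if it is similar enough (>= th_accept) to
    at least one leaf of the other subtree; the score is the fraction of
    all leaves in both subtrees that have a strong link.
    """
    if not leaves1 and not leaves2:
        return 0.0
    linked1 = sum(any(leaf_sim(a, b) >= th_accept for b in leaves2) for a in leaves1)
    linked2 = sum(any(leaf_sim(a, b) >= th_accept for a in leaves1) for b in leaves2)
    return (linked1 + linked2) / (len(leaves1) + len(leaves2))


def exact(a: str, b: str) -> float:
    """Toy leaf similarity: 1.0 for identical names, else 0.0."""
    return 1.0 if a == b else 0.0


# POShipTo and DeliverTo both end in City and Street leaves.
print(ssim(["City", "Street"], ["City", "Street"], exact))  # 1.0
print(ssim(["Count", "Line", "Qty", "UoM"],
           ["ItemCount", "ItemNo", "Qty", "UoM"], exact))   # 0.5
```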

Schema Tree Example Revisited
[Figure: schema trees of PO (POShipTo, POBillTo, POLines with Count and Item (Line, Qty, UoM)) and PurchaseOrder (DeliverTo and InvoiceTo as Address (City, Street), Items with ItemCount and Item (ItemNo, Qty, UoM)).]

Schema Tree Example Revisited (2)
[Figure: the same two schema trees as in the previous slide.]

Similarity in Cupid
Similarity = a × ssim + (1 − a) × lsim
where a ∈ [0, 1] is the weight (importance) of structural similarity
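
The weighted combination and the threshold test from the previous slides fit in a few lines. The scores below are made-up numbers for illustration, and `report_mappings` is a hypothetical helper.

```python
def weighted_sim(s_struct: float, s_ling: float, a: float) -> float:
    """Combine structural and linguistic similarity; a weights structure."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must lie in [0, 1]")
    return a * s_struct + (1 - a) * s_ling


def report_mappings(scores: dict, a: float, threshold: float) -> list:
    """Return the (source, target) pairs whose combined score exceeds threshold.

    scores maps (source, target) -> (ssim, lsim).
    """
    return sorted(pair for pair, (s_struct, s_ling) in scores.items()
                  if weighted_sim(s_struct, s_ling, a) > threshold)


# Made-up scores for three candidate pairs.
scores = {
    ("POBillTo", "InvoiceTo"): (0.9, 0.6),
    ("POBillTo", "DeliverTo"): (0.9, 0.1),
    ("POLines", "Items"): (0.5, 0.5),
}
print(report_mappings(scores, a=0.5, threshold=0.6))
# [('POBillTo', 'InvoiceTo')]
```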

How to Evaluate a Schema Matching System?
Precision and recall
Option 1:
–Compare with human experts
Option 2:
–Comparative study with other systems
The automatic matcher returns the set of matches P; I is the set of true matches (determined by domain experts); c = P ∩ I is the set of correctly predicted matches
Precision = |c| / |P|: reliability of the match predictions
Recall = |c| / |I|: percentage of the real matches found
[Figure: Venn diagram of P and I, overlapping in c]
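
Precision and recall follow directly from the set definitions on the slide. In this sketch the match sets are made-up examples, and returning 0.0 for an empty set is an assumed convention.

```python
def precision_recall(predicted: set, true: set) -> tuple:
    """Precision = |c|/|P| and recall = |c|/|I|, where c = P ∩ I."""
    correct = predicted & true
    # Assumed convention: an empty denominator yields 0.0.
    precision = len(correct) / len(predicted) if predicted else 0.0
    recall = len(correct) / len(true) if true else 0.0
    return precision, recall


P = {("POBillTo", "InvoiceTo"), ("POShipTo", "DeliverTo"), ("Count", "Qty")}
I = {("POBillTo", "InvoiceTo"), ("POShipTo", "DeliverTo"),
     ("Count", "ItemCount"), ("Line", "ItemNo")}
p, r = precision_recall(P, I)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50
```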

Limitations
The problem is not completely solved
Schema matching is just one step in data integration, data translation, e-commerce, etc. Why?
–Given a set of matched schema elements …
–we still need to generate the query!
–E.g. insert into B.BillTo… select A.InvoiceTo...
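
To illustrate the missing step, a hypothetical helper can turn 1:1 column mappings into an `INSERT ... SELECT` statement. The table and column names are invented, and the sketch does no identifier quoting or escaping, so it is not safe for untrusted input.

```python
def insert_select(source_table: str, target_table: str,
                  mappings: list) -> str:
    """Build an INSERT ... SELECT statement from 1:1 column mappings.

    mappings is a list of (source_column, target_column) pairs.
    """
    if not mappings:
        raise ValueError("need at least one mapping")
    targets = ", ".join(t for _, t in mappings)
    sources = ", ".join(s for s, _ in mappings)
    return (f"INSERT INTO {target_table} ({targets}) "
            f"SELECT {sources} FROM {source_table}")


# Hypothetical table and column names.
print(insert_select("a_invoice_to", "b_bill_to",
                    [("city", "city"), ("street", "street")]))
# INSERT INTO b_bill_to (city, street) SELECT city, street FROM a_invoice_to
```

It also shows the limitation: nothing in a list of 1:1 pairs says how two source columns should be combined into one target column.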

The Real Picture Should Be…
Input:
–A.Firstname concat A.Lastname → B.Name
–A.basesalary + A.workingHour × A.hourlyWages → B.salary
Output:
–XQuery, SQL
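
A minimal sketch of what such set-mappings carry, assuming each one is a list of source elements, a combining function and a target element. `SetMapping` and `apply_mappings` are hypothetical names, and joining the names with a space is an assumption the slide does not state.

```python
from typing import Callable, Dict, List, NamedTuple


class SetMapping(NamedTuple):
    """An n:1 mapping: several source elements combined into one target."""
    sources: List[str]
    combine: Callable[..., object]
    target: str


def apply_mappings(record: Dict[str, object],
                   mappings: List[SetMapping]) -> Dict[str, object]:
    """Translate one source record into a target record."""
    return {m.target: m.combine(*(record[s] for s in m.sources))
            for m in mappings}


mappings = [
    # Assumed separator: a single space between first and last name.
    SetMapping(["Firstname", "Lastname"], lambda first, last: first + " " + last, "Name"),
    SetMapping(["basesalary", "workingHour", "hourlyWages"],
               lambda base, hours, rate: base + hours * rate, "salary"),
]
employee = {"Firstname": "Ada", "Lastname": "Lovelace",
            "basesalary": 1000, "workingHour": 20, "hourlyWages": 15}
print(apply_mappings(employee, mappings))
# {'Name': 'Ada Lovelace', 'salary': 1300}
```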

Current Problems
All previous work (except one) performs only 1-to-1 matching, which is unrealistic
Given a set of 1-to-1 mappings, users still need to form the real “input” from:
–Mapping x: A.Firstname → B.Name (1:1) (not useful!)
–Mapping y: A.Lastname → B.Name (1:1) (not useful!)
–To: A.Firstname concat A.Lastname → B.Name (2:1) (useful!)
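
The gap can be made concrete with a hypothetical heuristic (not a method from the slides): grouping 1:1 mappings by their target reveals which source elements belong together, but not the operation (concat, arithmetic) that combines them.

```python
from collections import defaultdict


def group_by_target(one_to_one: list) -> dict:
    """Group 1:1 (source, target) mappings by target element.

    A target reached from more than one source is a candidate n:1
    set-mapping; the user still has to supply the combining operation.
    """
    groups = defaultdict(list)
    for source, target in one_to_one:
        groups[target].append(source)
    return {t: s for t, s in groups.items() if len(s) > 1}


mappings = [("A.Firstname", "B.Name"), ("A.Lastname", "B.Name"), ("A.Age", "B.Age")]
print(group_by_target(mappings))
# {'B.Name': ['A.Firstname', 'A.Lastname']}
```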

Really Novel?
A DASFAA 2003 paper [DASFAA03] “solved” it
They augment each schema to be matched with a huge amount of ontological information
Application-oriented
Assumptions:
–Such an ontology exists for each schema
–Such an ontology can be easily created

Set-Oriented Matching
Uses the ontology to enhance the similarity functions and generate a set of n-to-m mappings
E.g. if one of the input schemas comes from the real-estate sector, it is augmented with a real-estate ontology, so the system knows which elements form a set (e.g. firstname concat lastname is given a priori in the ontology)
Extremely high accuracy?

Our Directions
Previous work is not realistic
Dig out the set-mappings without the help of an ontology
Observation 1:
–The m elements and the n elements in an m-to-n mapping (the useful input) are inter-correlated
–They are inter-correlated in terms of both structure and linguistics

Intra-Similarity
Within a schema, structural similarity is much more important than linguistic similarity:
–A.Firstname concat A.Lastname → B.Name
  Same type? → intra-similar
  Identical meaning? → intra-similar? Not necessarily
  Same hierarchical level? → intra-similar
–A.basesalary + A.workingHour × A.hourlyWages → B.salary
  Same type? → intra-similar
  Identical meaning? → intra-similar? Not necessarily
  Same hierarchical level? → intra-similar
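
The slides do not give the intra-similarity function, so the following is only a hypothetical sketch of the checklist above: same data type and same hierarchical level count as structural evidence and carry most of the weight, while a caller-supplied name similarity gets a small weight. `Element`, `intra_sim`, the weight 0.8 and the data types and depths of the example elements are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """A schema element with the features the checklist uses."""
    path: str   # e.g. "A.Firstname"
    dtype: str  # assumed data type, e.g. "string"
    depth: int  # hierarchical level in the schema tree


def intra_sim(e1: Element, e2: Element,
              name_sim: float = 0.0, w_struct: float = 0.8) -> float:
    """Hypothetical intra-schema similarity in [0, 1].

    The structural part rewards the same data type and the same
    hierarchical level; the linguistic part (name_sim, supplied by the
    caller) gets a small weight because set members need not share a
    meaning.
    """
    structural = ((e1.dtype == e2.dtype) + (e1.depth == e2.depth)) / 2
    return w_struct * structural + (1 - w_struct) * name_sim


first = Element("A.Firstname", "string", 1)
last = Element("A.Lastname", "string", 1)
salary = Element("A.basesalary", "decimal", 1)
print(intra_sim(first, last))    # 0.8
print(intra_sim(first, salary))  # 0.4
```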

Intra-Schema Similarity
A new similarity function is defined
If the intra-similarity is above a threshold, output m-to-n mappings
The algorithm is similar to other structural matching approaches
Foreseeable evaluation result:
–More accurate than all previous work (except one), as we can find the mappings users expect
–May be poorer than the ontology approach, but no “magic” ontology is needed, so it is more realistic
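
The thresholding step can be sketched, under assumptions, as grouping elements of one schema whose pairwise intra-similarity exceeds the threshold; each group is then a candidate side of an m-to-n mapping. `candidate_sets`, the toy similarity scores and the use of union-find are hypothetical choices, and matching the groups across the two schemas is left open, as on the slides.

```python
from itertools import combinations


def candidate_sets(elements: list, sim, threshold: float) -> list:
    """Group elements whose pairwise intra-similarity exceeds threshold.

    Uses union-find, so grouping is transitive: if a~b and b~c, then
    a, b and c end up in one set. Singletons are not reported.
    """
    parent = {e: e for e in elements}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(elements, 2):
        if sim(a, b) > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for e in elements:
        groups.setdefault(find(e), []).append(e)
    return [g for g in groups.values() if len(g) > 1]


# Toy intra-similarity scores; unlisted pairs score 0.0.
toy = {frozenset({"Firstname", "Lastname"}): 0.9,
       frozenset({"workingHour", "hourlyWages"}): 0.8,
       frozenset({"basesalary", "hourlyWages"}): 0.7}


def toy_sim(a, b):
    return toy.get(frozenset({a, b}), 0.0)


elems = ["Firstname", "Lastname", "basesalary", "workingHour", "hourlyWages", "Email"]
print(candidate_sets(elems, toy_sim, threshold=0.6))
# [['Firstname', 'Lastname'], ['basesalary', 'workingHour', 'hourlyWages']]
```

Union-find makes grouping transitive, so basesalary joins workingHour through hourlyWages even though that pair never scores above the threshold; a stricter variant could require every pair in a group to be intra-similar.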

Observation 2
User effort is required in any approach
User effort is “thrown away” afterwards
The system is user-oriented: users make the final decision
Why not learn and store the users' patterns?
–To improve accuracy
–To suggest mappings to users in case they get lost
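
One simple way to keep user effort instead of throwing it away is to record every confirmed mapping and rank targets by how often users chose them. `MappingMemory` is a hypothetical sketch that keys on normalized source names only; learning richer patterns is beyond it.

```python
from collections import Counter, defaultdict


class MappingMemory:
    """Remembers mappings that users confirmed, keyed by normalized names."""

    def __init__(self):
        self._seen = defaultdict(Counter)

    @staticmethod
    def _norm(name: str) -> str:
        return name.strip().lower()

    def confirm(self, source: str, target: str) -> None:
        """Record one user-approved mapping."""
        self._seen[self._norm(source)][target] += 1

    def suggest(self, source: str, k: int = 3) -> list:
        """Return up to k targets most often chosen for this source name."""
        counts = self._seen.get(self._norm(source))
        if not counts:
            return []
        return [target for target, _ in counts.most_common(k)]


memory = MappingMemory()
memory.confirm("Bill", "InvoiceTo")
memory.confirm("bill", "InvoiceTo")
memory.confirm("Bill", "Payer")
print(memory.suggest("BILL"))  # ['InvoiceTo', 'Payer']
```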

Discussion and Conclusion
Development is still ongoing…
A fact: this is very difficult to argue
Open question: how to define the similarity between a set of source elements and a set of target elements, given the intra-similarity?
We propose a novel matcher for discovering useful set-mappings

References
[VLDB02] COMA: A System for Flexible Combination of Schema Matching Approaches. Hong-Hai Do, Erhard Rahm. University of Leipzig.
[ICDE02] Similarity Flooding: A Versatile Graph Matching Algorithm and its Application to Schema Matching. Sergey Melnik, Hector Garcia-Molina, Erhard Rahm. Stanford University and University of Leipzig.
[VLDB02] Translating Web Data. Lucian Popa, Yannis Velegrakis, Renée J. Miller, et al. IBM Almaden Research Center and University of Toronto.
[VLDB01] Generic Schema Matching with Cupid. Jayant Madhavan, Philip A. Bernstein, Erhard Rahm. University of Washington and Microsoft Research.
[DASFAA03] Discovering Direct and Indirect Matches for Schema Elements. Li Xu, David W. Embley. Brigham Young University.

Thank You