Multifaceted Exploitation of Metadata for Attribute Match Discovery in Information Integration David W. Embley David Jackman Li Xu.

Presentation transcript:

Multifaceted Exploitation of Metadata for Attribute Match Discovery in Information Integration David W. Embley David Jackman Li Xu

Background
Problem: Attribute Matching
Matching Possibilities (Facets):
- Attribute Names
- Data-Value Characteristics
- Expected Data Values
- Data-Dictionary Information
- Structural Properties

Approach
Target Schema T
Source Schema S
Framework:
- Individual Facet Matching
- Combining Facets
- Best-First Match Iteration

Example
[Figure: Source Schema S and Target Schema T, each centered on Car. Attributes shown in the diagram include Year, Make, Model, Cost, Style, Feature, Mileage, Phone, and Miles, connected by "has" relationships with participation constraints 0:1 and 0:*. Highlighted attribute pairs: Year/Year, Make/Make, Model/Model, Mileage/Miles.]

Individual Facet Matching
- Attribute Names
- Data-Value Characteristics
- Expected Data Values

Attribute Names
Target and Source Attributes: T : A, S : B
WordNet
C4.5 Decision Tree: feature selection
- f0: same word
- f1: synonym
- f2: sum of distances to a common hypernym root
- f3: number of different common hypernym roots
- f4: sum of the number of senses of A and B
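
The five name features listed above can be sketched in code. This is only an illustration under stated assumptions: the tiny lexicon `TOY_SENSES` is invented for the example and is not real WordNet data, the function name `name_features` is hypothetical, and the readings of f2 (smallest summed distance to a shared hypernym) and f3 (number of distinct shared roots) are one possible interpretation of the slide's wording, not necessarily the authors' definitions.

```python
from itertools import product

# Hypothetical toy lexicon (NOT real WordNet data): each word maps to a list
# of senses, and each sense is a hypernym chain from its synset up to a root.
TOY_SENSES = {
    "mileage": [["mileage.n.01", "distance.n.01", "measure.n.01"]],
    "miles": [["mile.n.01", "linear_unit.n.01", "measure.n.01"]],
    "cost": [
        ["cost.n.01", "expenditure.n.01", "possession.n.01"],
        ["price.n.01", "value.n.01", "measure.n.01"],
    ],
    "price": [["price.n.01", "value.n.01", "measure.n.01"]],
}


def name_features(a, b, senses=TOY_SENSES):
    """Compute f0..f4 for attribute names a and b (assumed interpretation)."""
    a, b = a.lower(), b.lower()
    chains_a = senses.get(a, [])
    chains_b = senses.get(b, [])

    f0 = a == b  # same word
    # Synonym: some sense of a and some sense of b share the same synset.
    f1 = any(ca[0] == cb[0] for ca, cb in product(chains_a, chains_b))

    best = None    # f2: smallest summed distance to a shared hypernym
    roots = set()  # f3: distinct roots shared by some pair of senses
    for ca, cb in product(chains_a, chains_b):
        if ca[-1] == cb[-1]:
            roots.add(ca[-1])
        for i, node in enumerate(ca):
            if node in cb:
                dist = i + cb.index(node)
                best = dist if best is None else min(best, dist)
                break  # first shared node is the closest one on a chain

    f4 = len(chains_a) + len(chains_b)  # total number of senses
    return {"f0": f0, "f1": f1, "f2": best, "f3": len(roots), "f4": f4}


mileage_vs_miles = name_features("Mileage", "Miles")
cost_vs_price = name_features("Cost", "Price")
```

A feature vector like this would then be fed to a decision-tree learner such as C4.5, which selects the features that best separate matching from non-matching name pairs.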

WordNet Rule
- The number of different common hypernym roots of A and B
- The sum of distances of A and B to a common hypernym
- The sum of the number of senses of A and B

Confidence Measures

Data-Value Characteristics
C4.5 Decision Tree
Features:
- Numeric data (mean, variation, standard deviation, …)
- Alphanumeric data (string length, numeric ratio, space ratio)
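
As a rough illustration of the features named above, the following sketch computes a few statistics for a column of values. The function names are hypothetical, "variation" is assumed here to mean population variance, and both ratios are assumed to be character counts divided by the total number of characters; the slide does not give exact definitions.

```python
import statistics


def numeric_features(values):
    """Summary statistics for a column of numeric data values."""
    return {
        "mean": statistics.mean(values),
        # Assumption: "variation" on the slide is read as population variance.
        "variance": statistics.pvariance(values),
        "std_dev": statistics.pstdev(values),
    }


def alphanumeric_features(values):
    """Character-level statistics for a column of string data values."""
    total_chars = sum(len(v) for v in values)
    if total_chars == 0:
        raise ValueError("need at least one non-empty value")
    digits = sum(ch.isdigit() for v in values for ch in v)
    spaces = sum(ch.isspace() for v in values for ch in v)
    return {
        "mean_length": total_chars / len(values),
        "numeric_ratio": digits / total_chars,  # share of digit characters
        "space_ratio": spaces / total_chars,    # share of whitespace characters
    }


num = numeric_features([2, 4, 4, 4, 5, 5, 7, 9])
alpha = alphanumeric_features(["801 555", "ab 12"])
```

Feature vectors of this kind, computed for a target attribute and a source attribute, could be compared by a learned decision tree to judge whether the two columns hold the same kind of data.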

Confidence Measures

Expected Data Values
Target Schema T and Source Schema S
- Regular expression recognizer for attribute A in T
- Data instances for attribute B in S
Hit Ratio = N'/N for an (A, B) match
- N': number of B data instances recognized by the regular expressions of A
- N: number of B data instances
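
The hit ratio N'/N can be computed directly. The sketch below is a minimal illustration: the function name `hit_ratio`, the year pattern, and the sample values are made up for the example, and requiring a full match of each trimmed value is an assumption, since the slide does not say how a data instance counts as recognized.

```python
import re


def hit_ratio(patterns, instances):
    """Return N'/N: the fraction of instances recognized by any pattern."""
    if not instances:
        return 0.0  # assumption: no evidence counts as no hits
    compiled = [re.compile(p) for p in patterns]
    # Assumption: a value is "recognized" if some pattern matches it entirely.
    hits = sum(
        1
        for value in instances
        if any(rx.fullmatch(value.strip()) for rx in compiled)
    )
    return hits / len(instances)


# Hypothetical recognizer for a target attribute "Year" (four-digit years).
year_patterns = [r"(19|20)\d{2}"]
# Hypothetical data instances for some source attribute B.
source_values = ["1999", "2003", "red", "98"]

ratio = hit_ratio(year_patterns, source_values)  # 2 of 4 values recognized
```

A high ratio suggests that source attribute B holds the kind of values expected for target attribute A; a low ratio is evidence against the match.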

Confidence Measures

Combined Measures Threshold:

Final Confidence Measures

Experimental Results
Matched Attributes: 100% (32 of 32)
Unmatched Attributes: 99.5% (374 of 376)
- "Feature" --- "Color"
- "Feature" --- "Body Type"
F % F2 84% F3 92%
F1 98.9% F2 97.9% F3 98.4%

Conclusions
- Direct Attribute Matching – feasible
- Individual-Facet Matching – good
- Multifaceted Matching – better

Future Work
- Additional Facets
- More Sophisticated Combinations
- Additional Application Domains
- Automating Feature Selection
- Indirect Attribute Matching