Weighted Partonomy-Taxonomy Trees with Local Similarity Measures for Semantic Buyer-Seller Matchmaking. By: Lu Yang, March 16, 2005.

Presentation transcript:

1 Weighted Partonomy-Taxonomy Trees with Local Similarity Measures for Semantic Buyer-Seller Matchmaking By: Lu Yang March 16, 2005

2 Outline
– Motivation
– Similarity Measures
– Partonomy Similarity Algorithm: tree representation, tree simplicity, partonomy similarity
– Experimental Results
– Node Label Similarity: inner-node similarity, leaf-node similarity
– Conclusion

3 Motivation
e-business, e-learning …
Buyer-Seller matching
Metadata for buyers and sellers
– Keywords/keyphrases
– Trees
Tree similarity

4 Similarity measures
Similarity measures apply to many research areas
– CBR (Case-Based Reasoning), information retrieval, pattern recognition, image analysis and processing, NLP (Natural Language Processing), bioinformatics, search engines, e-Commerce and so on
In e-Commerce
– Does product P satisfy demand D? Is it an “All or Nothing” question?
– Additional knowledge is needed to bridge the gap between demand and product descriptions
– Now, a “How similar?” question!

5 Similarity measures (Cont’d)
Numerical modeling of similarity
– A similarity measure on a set M is a real function $\mathrm{sim}: M^2 \to [0,1]$
– Similarity measures have the following properties
  Reflexivity: $\forall x \in M:\ \mathrm{sim}(x,x) = 1$
  Symmetry iff $\forall x, y \in M:\ \mathrm{sim}(x,y) = \mathrm{sim}(y,x)$

6 Similarity measures – distance measures
An opposite notion of similarity measures
A distance measure on a set M is a real-valued function $d: M^2 \to \mathbb{R}^+$
Distance measures have the following properties
– Reflexivity: $\forall x \in M:\ d(x,x) = 0$
– Symmetry iff $\forall x, y \in M:\ d(x,y) = d(y,x)$
– $\forall x, y \in M:\ d(x,y) = 0 \Rightarrow x = y$
– Triangle inequality iff $\forall x, y, z \in M:\ d(x,y) + d(y,z) \ge d(x,z)$

7 Similarity measures – distance measures
Transformation of similarity measures and distance measures
– If a bijective, order-inverting mapping $f: [0,1] \to [0,1]$ exists with $f(d(x,y)) = \mathrm{sim}(x,y)$, then sim and d are compatible
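The compatibility condition can be illustrated with the simplest bijective, order-inverting mapping on [0,1], namely f(d) = 1 − d. The sketch below is only an illustration: the function names, the normalisation by a `span`, and the example values are assumptions, not part of the slides.

```python
def to_similarity(d: float) -> float:
    """Map a normalised distance in [0, 1] to a similarity in [0, 1].

    f(d) = 1 - d is bijective on [0, 1] and order-inverting:
    a larger distance always yields a smaller similarity.
    """
    if not 0.0 <= d <= 1.0:
        raise ValueError("distance must lie in [0, 1]")
    return 1.0 - d


def normalised_abs_distance(x: float, y: float, span: float) -> float:
    """Hypothetical distance: absolute difference scaled by `span`, capped at 1."""
    return min(abs(x - y) / span, 1.0)


# Two model years compared on an assumed 10-year scale.
d = normalised_abs_distance(2002, 2000, span=10)
sim = to_similarity(d)
```

With this mapping, identical objects (distance 0) receive similarity 1, which matches the reflexivity requirement on both sides.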

8 Similarity measures – global and local
Global measures are defined on the whole object
– reflect the task and have a pragmatic character
Local measures are defined on details (e.g. the domains of some attribute)
– reflect technical and domain character
– task independent

9 Similarity measures – global and local
Local to global
– each object A is constructed from so-called “components” $A_i$ by some construction process $C(A_i \mid i \in n) = A$
– given two objects A and B, $\mathrm{sim}_i(A_i, B_i)$ denotes the similarity of their i-th components
– with an amalgamation function f, $\mathrm{sim}(A, B) = f(\mathrm{sim}_i(A_i, B_i) \mid i \in n)$ is the global similarity measure of A and B
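As a rough illustration of the local-to-global construction, the sketch below uses a weighted sum as the amalgamation function f. The weighted sum, the function names and the exact-match local measure are assumptions chosen for the example; the slides do not prescribe a particular f.

```python
from typing import Callable, Sequence


def global_similarity(
    a: Sequence,
    b: Sequence,
    local_sims: Sequence[Callable[[object, object], float]],
    weights: Sequence[float],
) -> float:
    """Amalgamate component-wise (local) similarities into one global value.

    Here f is a weighted sum; the weights are expected to sum to 1 so the
    result stays in [0, 1].
    """
    if not (len(a) == len(b) == len(local_sims) == len(weights)):
        raise ValueError("all arguments must have the same length")
    return sum(w * sim_i(x, y) for x, y, sim_i, w in zip(a, b, local_sims, weights))


def exact(x, y) -> float:
    """Hypothetical local measure: 1.0 for equal components, else 0.0."""
    return 1.0 if x == y else 0.0


# Components: (make, model, year), weighted 0.3 / 0.2 / 0.5.
car_a = ("ford", "explorer", 2002)
car_b = ("ford", "mustang", 2002)
result = global_similarity(car_a, car_b, [exact, exact, exact], [0.3, 0.2, 0.5])
```

Swapping `exact` for a type-specific local measure changes only one component of the computation; the amalgamation step stays the same.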

10 Tree representation
Characteristics of our trees
– Node-labeled, arc-labeled and arc-weighted
– Arcs are labeled in lexicographical order
– Weights sum to 1
[Figure: tree rooted at Car with arcs Make, Model and Year leading to the leaves Ford, Explorer and 2002]

11 Tree representation – serialization of trees
– XML attributes for arc weights and subelements for arc labels
– Weighted Object-Oriented RuleML
[Figure: tree serialization in WOO RuleML of the Car tree (Make: Ford, Model: Explorer, Year: 2002)]

12 Tree representation – Relfun version of tree
cterm[ -opc[ctor[car]],
       -r[n[make],w[0.3]][ind[ford]],
       -r[n[model],w[0.2]][ind[explorer]],
       -r[n[year],w[0.5]][ind[2002]] ]
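The same node-labeled, arc-labeled and arc-weighted tree can be held in a small in-memory structure. The following is a minimal sketch, not the author's implementation: the `Node` class, its `add` and `weights_ok` methods, and the tolerance are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node label plus outgoing arcs: arc label -> (weight, child node)."""
    label: str
    arcs: dict = field(default_factory=dict)

    def add(self, arc_label: str, weight: float, child: "Node") -> "Node":
        self.arcs[arc_label] = (weight, child)
        # Keep arcs in lexicographic order of their labels.
        self.arcs = dict(sorted(self.arcs.items()))
        return self

    def weights_ok(self, tol: float = 1e-9) -> bool:
        """Check that the arc weights below every inner node sum to 1."""
        if not self.arcs:
            return True
        total = sum(w for w, _ in self.arcs.values())
        return abs(total - 1.0) <= tol and all(
            child.weights_ok(tol) for _, child in self.arcs.values()
        )


# The car tree, built in arbitrary order.
car = (
    Node("car")
    .add("year", 0.5, Node("2002"))
    .add("make", 0.3, Node("ford"))
    .add("model", 0.2, Node("explorer"))
)
```

Sorting on insertion means two trees with the same arc labels can later be walked in step, regardless of the order in which they were built.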

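13 Tree simplicity
– Treeplicity(i,t)
– Depth degradation index “i” = 0.9
– Reciprocal of tree breadth
– Depth degradation factor = 0.5
[Figure: example tree with nodes A to G and arcs a to f, annotated with the values (0.9), (0.45) and (0.225) at successive depths, and its resulting tree simplicity]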
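One way to combine these ingredients is sketched below. The combination rule is an assumption made for illustration: a leaf contributes the current index, and an inner node contributes the weighted sum of its subtrees' values divided by its breadth, with the index multiplied by the degradation factor at each level. The tuple-based tree encoding and the function signature are hypothetical.

```python
def treeplicity(i: float, tree, factor: float = 0.5) -> float:
    """Simplicity of `tree`, encoded as (label, [(arc_label, weight, subtree), ...]).

    i      -- depth degradation index at the current level (0.9 at the root)
    factor -- depth degradation factor applied when descending one level
    """
    _, arcs = tree
    if not arcs:
        # Leaf: simplicity is the index reached at this depth.
        return i
    breadth = len(arcs)
    total = sum(w * treeplicity(i * factor, sub, factor) for _, w, sub in arcs)
    # Reciprocal of tree breadth: wider trees count as less simple.
    return total / breadth


leaf = ("ford", [])
car = ("car", [
    ("make", 0.3, ("ford", [])),
    ("model", 0.2, ("explorer", [])),
    ("year", 0.5, ("2002", [])),
])

simple = treeplicity(0.9, leaf)
wider = treeplicity(0.9, car)
```

Under these assumptions a single node is the simplest possible tree, and both extra depth and extra breadth lower the value.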

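14 Partonomy similarity – simple trees
[Figure: trees t and t´, both rooted at Car with arcs Make and Model; the leaves are Ford and Mustang in one tree and Ford and Escape in the other; (House) is shown as an alternative root label]
– Inner nodes: 0, 1
– Leaf nodes: 0, 1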

15 Partonomy similarity – complex trees
$\sum_i s_i \,(w_i + w'_i)/2$
$\sum_i A(s_i)\,(w_i + w'_i)/2$, with $A(s_i) \ge s_i$
[Figure: learning-object metadata trees t and t´ rooted at lom. Arc labels: educational, general, technical, language, title, format, platform. Node labels: edu-set, gen-set, tec-set, en, “Introduction to Oracle”, “Basic Oracle”, HTML, WinXP and *. Weights shown: 0.5, 0.5, 0.1.]
* : Don’t Care
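The two sums can be compared on a small example. In this sketch the adjustment function A is taken to be the square root, which satisfies A(s) ≥ s on [0,1]; that choice, the function name and the sample values are assumptions for illustration only.

```python
import math


def weighted_similarity(triples, adjust=lambda s: s) -> float:
    """Sum of adjust(s_i) * (w_i + w'_i) / 2 over (s_i, w_i, w'_i) triples."""
    return sum(adjust(s) * (w + w2) / 2 for s, w, w2 in triples)


# (subtree similarity, arc weight in t, arc weight in t')
arcs = [(1.0, 0.5, 0.3), (0.25, 0.5, 0.7)]

plain = weighted_similarity(arcs)                       # first formula
boosted = weighted_similarity(arcs, adjust=math.sqrt)   # second formula, A = sqrt
```

Averaging the two arc weights lets both trees influence how much each matched subtree counts, and the adjustment A compensates for similarity values that shrink as they propagate upward.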

16 Partonomy similarity – main recursive functions
Three main recursive functions (Relfun)
– Treesim(t,t'): Recursively compares any (unordered) pair of trees; parameters N and i
– Treemap(l,l'): Recursively maps two lists, l and l', of labeled and weighted arcs; descends into identically labeled subtrees
– Treeplicity(i,t): Decreases the similarity with decreasing simplicity
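The interplay of the first two functions can be sketched in a heavily simplified form. This is not the Relfun implementation: the parameter N and the Treeplicity contribution for unmatched subtrees are omitted, arcs present in only one tree simply add nothing, and the square-root adjustment, the dict-based tree encoding and the example weights are all assumptions.

```python
import math


def treesim(t, u, adjust=math.sqrt) -> float:
    """Compare two trees encoded as (label, {arc_label: (weight, subtree)})."""
    label_t, arcs_t = t
    label_u, arcs_u = u
    if label_t != label_u:
        return 0.0            # different node labels: no similarity
    if not arcs_t and not arcs_u:
        return 1.0            # two identical leaves
    return treemap(arcs_t, arcs_u, adjust)


def treemap(arcs_t, arcs_u, adjust) -> float:
    """Walk both arc sets in lexicographic order, descending into matching labels."""
    total = 0.0
    for name in sorted(set(arcs_t) | set(arcs_u)):
        if name in arcs_t and name in arcs_u:
            w, sub_t = arcs_t[name]
            w2, sub_u = arcs_u[name]
            total += adjust(treesim(sub_t, sub_u, adjust)) * (w + w2) / 2
        # Simplification: an arc missing from one tree contributes 0 here.
    return total


mustang = ("car", {"make": (0.5, ("ford", {})), "model": (0.5, ("mustang", {}))})
escape = ("car", {"make": (0.5, ("ford", {})), "model": (0.5, ("escape", {}))})
house = ("house", {})

same_make = treesim(mustang, escape)
```

The traversal goes top-down through matching arc labels, while the similarity values are combined bottom-up as the recursion returns.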

17 Experimental results – simple trees

18 Experimental results – simple trees (cont’d)
[Table with columns Experiments, Tree and Results: pairs of simple trees rooted at auto, t1 vs t2 and t3 vs t4, with arcs make, model and year and leaves ford, mustang, explorer and 2000. Surviving numeric entries: 0.45, 1.0, 0.45 and 0.9 for the first pair; 0.05, 1.0 and 0.05 for the second pair.]

19 Experimental results – identical tree structures
[Table with columns Experiments, Tree and Results: trees t1 to t4 with identical structure, rooted at auto with arcs make, model and year and leaves ford, explorer, 1999 and 2002. Surviving numeric entry: 0.5 next to model.]

20 Experimental results – complex trees
[Figure: complex trees t and t´ with node labels A to F, B1 to B4, C1 to C4 and D1, and arc labels b, c, d, b1 to b4, c1 to c4 and d1]

21 Experimental results – complex trees
[Figure: the complex trees t and t´ of the previous slide, with node labels A to F, B1 to B4, C1 to C4 and D1, and arc labels b, c, d, b1 to b4, c1 to c4 and d1; the node label E appears twice]

22 Experimental results – complex trees
[Figure: the complex trees t and t´ again, with node labels A to F, B1 to B4, C1 to C4 and D1, and arc labels b, c, d, b1 to b4, c1 to c4 and d1; one tree has * in place of the node label C]

23 Node label similarity
For both inner nodes and leaf nodes
– Exact string matching: binary result, 0.0 or 1.0
– Permutation of strings: “Java Programming” vs “Programming in Java”
  $\text{similarity} = \dfrac{\text{number of identical words}}{\text{maximum length of the two strings}}$
Example 1: For two node labels “a b c” and “a b d e”, their similarity is $2/4 = 0.5$
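The string-permutation measure is easy to reproduce. The sketch below assumes that string length is counted in whitespace-separated words (as Example 1 suggests), that matching is case-sensitive, and that two empty labels count as identical; the function name is hypothetical.

```python
from collections import Counter


def permutation_similarity(a: str, b: str) -> float:
    """Number of identical words divided by the word count of the longer label."""
    words_a, words_b = a.split(), b.split()
    if not words_a and not words_b:
        return 1.0  # assumption: two empty labels are treated as identical
    # Multiset intersection counts each shared word as often as it occurs in both.
    common = sum((Counter(words_a) & Counter(words_b)).values())
    return common / max(len(words_a), len(words_b))


example_1 = permutation_similarity("a b c", "a b d e")  # 2 / 4 = 0.5
reordered = permutation_similarity("Java Programming", "Programming in Java")
```

Because only word identity matters, word order has no effect, which is what lets “Java Programming” and “Programming in Java” score well above zero.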

24 Node label similarity (cont’d)
Example 2: Node labels “electric chair” and “committee chair”
– similarity is $1/2 = 0.5$: meaningful?
– Semantic similarity

25 Node label similarity – inner nodes vs. leaf nodes
Inner nodes: class-oriented
– Inner node labels can be classes
– classes are located in a taxonomy tree
– taxonomic class similarity measures
Leaf nodes: type-oriented
– address, currency, date, price and so on
– type similarity measures (local similarity measures)

26 Node label similarity
Non-Semantic Matching
– Exact String Matching (both inner and leaf nodes)
– String Permutation (both inner and leaf nodes)
Semantic Matching
– Taxonomic Class Similarity (inner nodes)
– Type Similarity (leaf nodes)

27 Inner node similarity – partonomy trees
[Figure: partonomy trees t1 and t2 with arcs Credit, Textbook, Tuition and Duration. t1: root Distributed Programming, Textbook “Introduction to Distributed Programming”, Tuition $800, Duration 2months. t2: root Object-Oriented Programming, Textbook “Objected-Oriented Programming Essentials”, Tuition $1000, Duration 3months.]

28 Inner node similarity – taxonomy tree
[Figure: taxonomy tree rooted at Programming Techniques, containing the classes Applicative Programming, General, Automatic Programming, Concurrent Programming, Sequential Programming, Object-Oriented Programming, Distributed Programming and Parallel Programming]
– arc weights at the same level of a subtree do not need to add up to 1
– arc weights are assigned by machine learning algorithms or human experts

29 Inner node similarity – taxonomic class similarity
[Figure: the taxonomy tree rooted at Programming Techniques, with red arrows leading upward from the two classes being compared]
– red arrows stop at their nearest common ancestor
– the similarity is the product of subsumption factors on the two paths (0.018)
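The path-product idea can be sketched as follows. The taxonomy fragment, its parent links and all subsumption factors below are hypothetical values chosen for the example (they are not the weights from the slide and do not reproduce its 0.018 result); the function names are likewise assumptions.

```python
# Hypothetical taxonomy fragment: class -> (parent class, subsumption factor on the arc).
PARENT = {
    "Concurrent Programming": ("Programming Techniques", 0.5),
    "Object-Oriented Programming": ("Programming Techniques", 0.2),
    "Distributed Programming": ("Concurrent Programming", 0.3),
    "Parallel Programming": ("Concurrent Programming", 0.7),
}


def path_to_root(cls: str) -> list:
    """Return the class followed by all of its ancestors, nearest first."""
    path = [cls]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]][0])
    return path


def taxonomic_similarity(a: str, b: str) -> float:
    """Product of the subsumption factors on both paths up to the nearest common ancestor."""
    ancestors_b = set(path_to_root(b))
    common = next((n for n in path_to_root(a) if n in ancestors_b), None)
    if common is None:
        return 0.0  # disjoint parts of the taxonomy
    sim = 1.0
    for node in (a, b):
        while node != common:
            node, factor = PARENT[node]
            sim *= factor
    return sim


across = taxonomic_similarity("Distributed Programming", "Object-Oriented Programming")
siblings = taxonomic_similarity("Distributed Programming", "Parallel Programming")
```

With these made-up factors, classes that meet lower in the taxonomy multiply fewer factors and so tend to score higher than classes that only meet at the root.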

30 Inner node similarity – separate to encoded taxonomy tree
Separate taxonomy tree
– extra taxonomic class similarity measures
How to compute semantic similarity without
– changing our partonomy similarity algorithm
– losing taxonomic semantic similarity
Encode the (subsections of the) taxonomy tree into partonomy trees
Disjoint subsections of the taxonomy lead to zero semantic similarity

31 Inner node similarity – encoding taxonomy tree into partonomy tree
[Figure: encoded taxonomy tree rooted at Programming Techniques, with arcs labeled Applicative Prgrm (weight 0.1 shown), General, Automatic Prgrm, Concurrent Prgrm, Sequential Prgrm, Object-Oriented Prgrm, Distributed Prgrm and Parallel Prgrm, each ending in a * (Don’t Care) node]

32 Inner node similarity – encoding taxonomy tree into partonomy tree (cont’d)
[Figure: encoded partonomy trees t1 and t2, both rooted at course with arcs Classification (weight 0.65), Credit, Duration, Title and Tuition. The Classification arc leads to a taxonomy subtree rooted at Programming Techniques (arc weights 1.0) whose branches, such as Concurrent Prgrm, Distributed Prgrm, Parallel Prgrm, Sequential Prgrm and Object-Oriented Prgrm, end in * nodes. Titles are Object-Oriented Prgrm and Distributed Prgrm; tuition and duration values are $800, 2months and $1000, 3months.]

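33 Leaf node similarity (local similarity)
Example: “date” type leaf nodes
[Figure: trees t1 and t2 rooted at Project, each with arcs start_date and end_date (weight 0.5 shown on end_date). t1: start_date May 3, 2004, end_date Nov 3, … t2: start_date Jan 20, 2004, end_date Feb 18, …]
$DS(d_1, d_2) = \begin{cases} 0.0 & \text{if } |d_1 - d_2| \ge \dots \\ \dots - |d_1 - d_2| \dots & \text{otherwise} \end{cases}$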
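A local similarity measure for “date” leaves with this two-case shape might look like the sketch below. The threshold and the exact decay are not recoverable from the slide, so both are assumptions here: `max_gap_days` is a hypothetical parameter with an arbitrary default, and the similarity is assumed to fall linearly from 1 to 0 as the gap approaches that threshold.

```python
from datetime import date


def date_similarity(d1: date, d2: date, max_gap_days: int = 365) -> float:
    """Similarity of two dates: 0.0 at or beyond the threshold, linear decay below it.

    max_gap_days is an assumed cut-off, not a value taken from the slides.
    """
    gap = abs((d1 - d2).days)
    if gap >= max_gap_days:
        return 0.0
    return 1.0 - gap / max_gap_days


# The two start_date leaves from the example trees.
start_sim = date_similarity(date(2004, 5, 3), date(2004, 1, 20))
```

Because the measure only looks at the two leaf values, it can replace exact string matching for date leaves without touching the rest of the tree comparison.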

34 Implementation
Relfun version
– exact string matching
– don’t care
Java version
– exact string matching
– don’t care
– string permutation
– encoded taxonomy tree in partonomy tree (Teclantic)
– “date” type similarity measure

35 Conclusion
Arc-labeled and arc-weighted trees
Partonomy similarity algorithm
– Traverses trees top-down
– Computes similarity bottom-up
Node label similarity
– Exact string matching (both inner and leaf nodes)
– String permutation (both inner and leaf nodes)
– Taxonomic class similarity (only inner nodes): taxonomy tree; encoding taxonomy tree into partonomy tree
– Type similarity (only leaf nodes): “date” type similarity measures

36 Questions?