Inexact Matching of Ontology Graphs Using Expectation-Maximization
Prashant Doshi, Ravikanth Kolli, Christopher Thomas
Web Semantics: Science, Services and Agents on the World Wide Web, 2009
Keywords: ontology, matching, expectation-maximization
Universidad Autónoma de Madrid, 15 January 2010

Agenda
Introduction
Expectation Maximization
Ontology Schema Model
Graph Matching with GEM
Random Sampling and Heuristics
Computational Complexity
Initial Results
Large Ontologies
Benchmarks
Conclusions

Introduction
The Semantic Web grows more useful as the number of ontologies increases
OWL and RDF are ontology representation languages based on labeled directed graphs
Formulation: 'Find the most likely map between the two ontologies'*

Expectation Maximization
Technique to find the maximum likelihood estimate of the underlying model from observed data in the presence of missing data.
E-Step: formulation of the estimate
M-Step: search for the maximum of the estimate
Relaxed search: generalized EM (GEM)
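In standard textbook notation, the two steps and the GEM relaxation can be written as below. The symbols are assumptions introduced here, not taken from the slides: X is the observed data, Z the missing data, and θ the model.

```latex
\begin{align*}
\text{E-step:}\quad & Q(\theta \mid \theta^{(n)}) = \mathbb{E}_{Z \mid X, \theta^{(n)}}\bigl[\log P(X, Z \mid \theta)\bigr] \\
\text{M-step:}\quad & \theta^{(n+1)} = \arg\max_{\theta} \; Q(\theta \mid \theta^{(n)}) \\
\text{GEM:}\quad & \text{choose any } \theta^{(n+1)} \text{ such that } Q(\theta^{(n+1)} \mid \theta^{(n)}) \ge Q(\theta^{(n)} \mid \theta^{(n)})
\end{align*}
```

GEM only asks for an improvement of Q at each iteration rather than its exact maximum, which is what makes a relaxed search possible.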

Ontology Schema Model
OWL and RDF (labeled directed graphs)
Labels are removed, constructing a bipartite graph.
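A minimal sketch of one common way to obtain such a bipartite graph, assuming the ontology is given as (subject, label, object) triples. Each labeled edge is replaced by an anonymous node, so every arc connects a concept node to an edge node. The function name `to_bipartite` and the example triples are hypothetical, and the construction used in the paper may differ in detail.

```python
def to_bipartite(triples):
    """Turn labeled directed edges into an unlabeled directed bipartite graph.

    Assumption: no concept is already named like the generated edge nodes (e0, e1, ...).
    """
    concept_nodes = set()
    edge_nodes = []
    arcs = []
    for i, (subject, _label, obj) in enumerate(triples):
        edge_node = f"e{i}"  # anonymous stand-in for the labeled edge
        concept_nodes.update((subject, obj))
        edge_nodes.append(edge_node)
        arcs.append((subject, edge_node))  # concept -> edge node
        arcs.append((edge_node, obj))      # edge node -> concept
    return concept_nodes, edge_nodes, arcs


# Hypothetical two-edge ontology fragment
triples = [
    ("Student", "subClassOf", "Person"),
    ("Student", "enrolledIn", "Course"),
]
concepts, edges, arcs = to_bipartite(triples)
```

Every arc in `arcs` has exactly one endpoint in `concepts` and one in `edges`, which is the bipartite property.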

Graph matching GEM
Maximum likelihood estimate problem
Hidden variables: mapping matrix
Local search guided by GEM
Search space

Graph matching GEM
M* gives the maximum conditional probability of the data graph O_d given O_m.
Only many-one matching
Focused on homeomorphisms
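A map between the two graphs can be encoded as a 0/1 matrix. The sketch below assumes rows index the nodes of one ontology and columns the nodes of the other (which graph plays which role is an assumption here), so a many-one map has at most one 1 per row while a column may hold several. The function names and class names are hypothetical.

```python
def mapping_matrix(mapping, source_nodes, target_nodes):
    """Encode a map {source: target} as a 0/1 matrix (rows: source, columns: target)."""
    col = {t: j for j, t in enumerate(target_nodes)}
    matrix = [[0] * len(target_nodes) for _ in source_nodes]
    for i, s in enumerate(source_nodes):
        if s in mapping:
            matrix[i][col[mapping[s]]] = 1
    return matrix


def is_many_one(matrix):
    """Many-one: every row has at most one 1; a column may have several."""
    return all(sum(row) <= 1 for row in matrix)


# Hypothetical example: two source classes map to the same target class
source = ["Student", "Pupil", "Course"]
target = ["Learner", "Class"]
m = mapping_matrix(
    {"Student": "Learner", "Pupil": "Learner", "Course": "Class"}, source, target
)
```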

Graph matching GEM
MLE problem with respect to the hidden mapping variables
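Written out, the maximum likelihood problem over the map reads as follows. The set symbol for the candidate many-one maps is notation assumed here; M, O_d and O_m are the map, the data graph and the other ontology graph named on the slides.

```latex
M^{*} = \arg\max_{M \in \mathcal{M}} \; P(O_d \mid O_m, M)
```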

Graph matching GEM
Need to maximize:

Graph matching GEM
Probability that x_a is in correspondence with y_a given the assignment model
Each of the hidden variables

Graph matching GEM
Graph constraints and Smith-Waterman
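For reference, a minimal sketch of Smith-Waterman local alignment applied to two names. The scoring values (match 2, mismatch -1, gap -1) and the normalization in `name_similarity` are assumptions chosen for illustration. The slides do not give the parameters or how the score is combined with the graph constraints.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between strings a and b."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming table
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def name_similarity(a, b, match=2):
    """Case-insensitive score in [0, 1], relative to a perfect match of the shorter name."""
    shortest = min(len(a), len(b))
    if shortest == 0:
        return 0.0
    return smith_waterman(a.lower(), b.lower(), match=match) / (match * shortest)
```

The table has (len(a) + 1) by (len(b) + 1) cells, so the cost is quadratic in the name length.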

Graph matching GEM
Exhaustive search is not possible
Problem: local maxima
Use K random models + heuristics
Heuristic: if two classes are mapped, map their parents
+ Random restart
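The sampling strategy can be sketched as a hill climb repeated from K random starting maps, keeping the best result. This is an illustrative stand-in, not the authors' implementation: `score` is a hypothetical placeholder for the likelihood-based objective, and all function names are assumptions.

```python
import random


def local_search(score, start, candidates, max_iters=100):
    """Greedy hill climbing: re-map one node at a time while the score improves."""
    current, current_score = dict(start), score(start)
    for _ in range(max_iters):
        improved = False
        for node in list(current):
            for target in candidates:
                trial = dict(current)
                trial[node] = target
                trial_score = score(trial)
                if trial_score > current_score:
                    current, current_score = trial, trial_score
                    improved = True
        if not improved:
            break  # local maximum reached
    return current, current_score


def random_restart_search(score, nodes, candidates, k=5, seed=0):
    """Run local_search from k random maps and keep the best result."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(k):
        start = {n: rng.choice(candidates) for n in nodes}
        mapping, s = local_search(score, start, candidates)
        if s > best_score:
            best, best_score = mapping, s
    return best, best_score


def propagate_to_parents(mapping, parent_src, parent_tgt):
    """Heuristic: if two classes are mapped, also map their parents (if not yet mapped)."""
    extended = dict(mapping)
    for node, target in mapping.items():
        p_src, p_tgt = parent_src.get(node), parent_tgt.get(target)
        if p_src is not None and p_tgt is not None:
            extended.setdefault(p_src, p_tgt)
    return extended


# Toy objective (assumption): count nodes mapped to a target with the same name
def toy_score(mapping):
    return sum(node == target.lower() for node, target in mapping.items())


best, best_score = random_restart_search(
    toy_score, ["student", "course"], ["Course", "Student", "Person"]
)
```

With a real objective the restarts matter because different starting maps can end in different local maxima. The toy score above has a single maximum and only demonstrates the mechanics.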

Computational complexity
SW technique is O(L^2)
EM mapping is O(K * (|V_m| * |V_d|)^2)
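To get a feeling for the growth, take hypothetical sizes (not from the slides) of K = 10 random models and 100 nodes in each graph:

```latex
K \cdot \bigl(|V_m| \cdot |V_d|\bigr)^{2} = 10 \cdot (100 \cdot 100)^{2} = 10 \cdot 10^{8} = 10^{9}
```

Doubling both node counts multiplies the product |V_m| |V_d| by 4 and therefore the bound by 16.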

Initial Experiments

Large Ontologies

Benchmarks

Conclusions
Structural and syntactic matching vs. external resources
Weak performance: dissimilar names and structure
Good performance: extensions and flattening
Not scalable: partitioning and extension
No longer GEM, but converges
Future work: Markov chain Monte Carlo methods
Extensible algorithm: can include other approaches

