Designing clustering methods for ontology building: The Mo’K workbench Authors: Gilles Bisson, Claire Nédellec and Dolores Cañamero Presenter: Ovidiu Fortu.

Presentation transcript:

Designing clustering methods for ontology building: The Mo’K workbench Authors: Gilles Bisson, Claire Nédellec and Dolores Cañamero Presenter: Ovidiu Fortu

INTRODUCTION
- Paper objectives:
  - Present a workbench for developing and evaluating methods that learn ontologies
  - Report experimental results that illustrate how well the model characterizes methods for learning semantic classes

INTRODUCTION
- General strategy for ontology building:
  - Define a distance metric that approximates semantic distance as closely as possible
  - Devise or reuse a classification algorithm that uses this distance to build the ontology

Harris’ hypothesis
- Formulation: the study of syntactic regularities leads to the identification of syntactic schemata made out of combinations of word classes that reflect specific domain knowledge
- Consequence: similarity can be measured using co-occurrence in syntactic patterns

Conceptual clustering
- Ontologies are organized as acyclic graphs:
  - Nodes represent concepts
  - Links represent inclusion (generality relation)
- The methods considered in this paper build the graph bottom-up

The Mo’K model
- Representation of examples: binary syntactic patterns of the form […], where […] is the object and the rest of the pattern is the attribute
- Example: This causes a decrease in […]

Clustering
- Bottom-up clustering joins classes that are near each other:
  - Join classes of objects (nouns or actions, as tuples) that are frequently determined by the same attributes
  - Join classes of attributes that frequently determine the same objects

Corpora
- Specialized corpora are used for domain-specific ontologies
- Corpora are pruned (rare examples are eliminated). The workbench allows the specification of:
  - The minimum number of occurrences for a pattern to be considered
  - The minimum number of occurrences for an attribute or object to be considered
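The pruning step can be sketched as follows. This is a minimal illustration, assuming the corpus is a list of (object, attribute) pairs with one entry per occurrence; the function name `prune_corpus`, the threshold defaults and the toy data are hypothetical and do not come from the workbench.

```python
from collections import Counter

def prune_corpus(examples, min_pattern=2, min_element=2):
    """Drop rare patterns and rare objects/attributes in a single pass.

    `examples` holds one (object, attribute) pair per occurrence.
    Both thresholds are hypothetical defaults, not values from the slides.
    """
    pattern_counts = Counter(examples)
    object_counts = Counter(obj for obj, _ in examples)
    attribute_counts = Counter(attr for _, attr in examples)
    return [
        (obj, attr)
        for obj, attr in examples
        if pattern_counts[(obj, attr)] >= min_pattern
        and object_counts[obj] >= min_element
        and attribute_counts[attr] >= min_element
    ]

# Toy data (invented for illustration).
examples = [
    ("decrease", "cause_dobj"), ("decrease", "cause_dobj"),
    ("increase", "cause_dobj"), ("increase", "cause_dobj"),
    ("salt", "add_dobj"),
]
pruned = prune_corpus(examples)
```

Counts are taken once, before anything is removed, so a single pass is enough here; a stricter variant could repeat the pass until no more examples are dropped.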

Distance modeling
- Consider only distances that:
  - Take syntactic analysis as input
  - Do not use other ontologies (like WordNet)
  - Are based on the distributions of the attributes of an object
- Identify the general steps in the computation of these distances to formulate a general model

Distance computation
- Step 1: weighting phase. Modify the frequencies of elements in the contingency matrix using a general algorithm:
    Initialize the weight of each example E: W(E)
    Initialize the weight of each attribute A: W(A)
    For each example E
      For each attribute A of the example
        Calculate W(A) in the context of E
        Update global W(E)
      For each attribute A of the example
        Normalize W(A) by W(E)
- Step 2: similarity computation phase
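The two phases can be sketched as below, under stated assumptions: the contingency matrix is a dictionary from example to attribute counts, the weight of an attribute in the context of an example is simply its raw count, and the similarity is the cosine of the weighted vectors. None of these choices is taken from the slides; in particular, cosine is a stand-in and not Asium's distance.

```python
import math

def weight_examples(contingency):
    """Step 1: replace raw counts with attribute weights normalized by W(E)."""
    weighted = {}
    for example, attributes in contingency.items():
        example_weight = 0.0              # W(E), initialized per example
        attribute_weights = {}
        for attribute, freq in attributes.items():
            w = float(freq)               # W(A) in the context of E (assumed: raw count)
            attribute_weights[attribute] = w
            example_weight += w           # update global W(E)
        weighted[example] = (
            {a: w / example_weight for a, w in attribute_weights.items()}
            if example_weight else {}
        )
    return weighted

def cosine_similarity(u, v):
    """Step 2 (assumed measure): cosine between two sparse weight vectors."""
    dot = sum(w * v[a] for a, w in u.items() if a in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Toy contingency matrix (invented for illustration).
contingency = {
    "decrease": {"cause_dobj": 3, "observe_dobj": 1},
    "increase": {"cause_dobj": 6, "observe_dobj": 2},
    "salt": {"add_dobj": 4},
}
weights = weight_examples(contingency)
same_profile = cosine_similarity(weights["decrease"], weights["increase"])
no_overlap = cosine_similarity(weights["decrease"], weights["salt"])
```

Other distances would plug into the same skeleton by changing how W(A) is computed in the inner loop and which function compares the two weighted vectors.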

Distance evaluation
- The workbench provides support for the evaluation of metrics
- The procedure is:
  - Divide the corpus into a training set and a test set
  - Perform clustering on the training set
  - Use the similarities computed on the training set to classify the examples in the test set, and compute precision and recall; negative examples are produced by randomly combining objects and attributes
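The scoring part of this procedure might look like the sketch below. It assumes test examples are (object, attribute) pairs and that the learned classes are exposed through a yes/no function; `make_negatives`, `precision_recall`, `accepts` and the `learned` table are hypothetical names for illustration, and the clustering step itself is replaced by a hand-written table.

```python
import random

def make_negatives(positives, n, seed=0):
    """Build up to n negatives by pairing known objects with known attributes."""
    rng = random.Random(seed)
    objects = sorted({o for o, _ in positives})
    attributes = sorted({a for _, a in positives})
    known = set(positives)
    candidates = [(o, a) for o in objects for a in attributes if (o, a) not in known]
    rng.shuffle(candidates)
    return candidates[:n]

def precision_recall(accepts, positives, negatives):
    """Precision and recall of a yes/no classifier over labeled examples."""
    tp = sum(1 for ex in positives if accepts(ex))
    fp = sum(1 for ex in negatives if accepts(ex))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / len(positives) if positives else 0.0
    return precision, recall

# Stand-in for the result of clustering on the training set: each object
# maps to the attributes seen with any member of its class.
learned = {
    "decrease": {"cause_dobj", "observe_dobj"},
    "increase": {"cause_dobj", "observe_dobj"},
    "salt": {"add_dobj"},
}

def accepts(example):
    obj, attr = example
    return attr in learned.get(obj, set())

test_positives = [
    ("increase", "observe_dobj"),
    ("salt", "add_dobj"),
    ("salt", "weigh_dobj"),
]
test_negatives = make_negatives(test_positives, n=2)
precision, recall = precision_recall(accepts, test_positives, test_negatives)
```

Random pairing can occasionally produce a combination that is actually valid but unseen, so precision measured this way is a pessimistic estimate.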

Experiments
- Purpose: evaluate Mo’K’s parameterization possibilities and the impact of the parameters on the results
- Corpora: two French corpora
  - One with cooking recipes from the Web (nearly […] examples)
  - One with agricultural data (Agrovoc) ([…] examples)

Results (Asium’s distance, 20% test)

Corpus   Learning object  % induced learned triplets  Recall (test set)  Precision
Agrovoc  Action           40%                         4.7%               45%
Agrovoc  Noun             38%                         5.3%               45%
Cooking  Action           34%                         12%                32%
Cooking  Noun             38%                         9.1%               52%

Recall rate
- X-axis: the number of disjoint classes on which recall is evaluated

Class efficiency
- Class efficiency: the ratio between the triplets learned and the triplets effectively used in the evaluation of recall

Conclusions  Comments?  Questions?

Ontology Learning and Its Application to Automated Terminology Translation Authors: Roberto Navigli, Paola Velardi and Aldo Gangemi Presenter: Ovidiu Fortu

Introduction
- Paper objectives:
  - Present OntoLearn, a system for the automated construction of ontologies by extracting relevant domain terms from text corpora
  - Present the use of OntoLearn in the task of translating multiword terms from English to Italian

The OntoLearn architecture  Complex system, uses external resources like WordNet and the Ariosto language processor

The OntoLearn
- Important new feature: semantic interpretation of terms (word sense disambiguation)
- Three main phases:
  - Terminology extraction
  - Semantic interpretation
  - Creation of a specialized view of WordNet

Terminology extraction
- Terms are selected with shallow stochastic methods
- Quality improves if syntactic features are used
- High frequency in a corpus is not necessarily sufficient:
  - “credit card” is a term
  - “last week” is not a term

Terminology extraction, continued
- Comparing frequencies in texts from different domains eliminates constructs such as “last week”; this is the domain relevance score
- Relevance of term t in domain D_k

Terminology extraction, continued
- Domain consensus of a term t in class D_k exploits the frequency of t across documents

Terminology extraction, continued
- A combination of the two scores is used to detect relevant terms
- Only the terms with DW larger than a threshold are retained
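The formulas for the two scores did not survive in the transcript, so the sketch below uses assumed forms: relevance as the term's probability in the domain divided by its highest probability in any domain, consensus as the entropy of the term's distribution over the documents of the domain, and DW as a linear mix controlled by a hypothetical parameter `alpha`. The function names and the toy corpora are also invented for illustration.

```python
import math
from collections import Counter

def domain_relevance(term, domain, corpora):
    """Assumed form: P(term | domain) / max over domains of P(term | domain)."""
    def prob(d):
        counts = Counter(t for doc in corpora[d] for t in doc)
        total = sum(counts.values())
        return counts[term] / total if total else 0.0
    best = max(prob(d) for d in corpora)
    return prob(domain) / best if best else 0.0

def domain_consensus(term, domain, corpora):
    """Assumed form: entropy of the term's frequency across the domain's documents."""
    per_doc = [doc.count(term) for doc in corpora[domain]]
    total = sum(per_doc)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in per_doc if c)

def term_weight(term, domain, corpora, alpha=0.5):
    """Assumed combination: linear mix of relevance and consensus."""
    relevance = domain_relevance(term, domain, corpora)
    consensus = domain_consensus(term, domain, corpora)
    return alpha * relevance + (1 - alpha) * consensus

# Toy corpora: each domain is a list of documents, each a list of candidate terms.
corpora = {
    "tourism": [["credit card", "last week", "room service"],
                ["credit card", "room service"]],
    "news": [["last week", "last week", "election"],
             ["last week", "election"]],
}
credit_card = term_weight("credit card", "tourism", corpora)
last_week = term_weight("last week", "tourism", corpora)
```

In this toy setting “last week” scores low in the tourism domain because it is more probable in the other domain and appears in only one tourism document, which matches the intuition stated on the slides.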

Semantic interpretation
- Step 1: create semantic nets for every word w_k ∈ t and any synset of w_k by following all WordNet links, limiting the path length to 3 (after disambiguation of words)
- Step 2: intersect the networks and compute a score based on the number and type of semantic patterns connecting the networks
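The two steps can be illustrated with a small sketch. It assumes a plain adjacency dictionary instead of WordNet, and it scores a pair of senses by counting shared nodes only, which is a simplification: the score described above also depends on the type of the connecting patterns. The graph, the sense labels and the function names are hypothetical.

```python
from collections import deque

def semantic_net(graph, start, max_depth=3):
    """Collect every node reachable from `start` in at most `max_depth` links."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue                      # do not expand beyond the path limit
        for neighbor in graph.get(node, ()):
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    return set(depth)

def overlap_score(graph, sense_a, sense_b, max_depth=3):
    """Simplified score: number of nodes shared by the two semantic nets."""
    net_a = semantic_net(graph, sense_a, max_depth)
    net_b = semantic_net(graph, sense_b, max_depth)
    return len(net_a & net_b)

# Toy sense graph (invented; not real WordNet data).
graph = {
    "room#1": ["area#1", "hotel#1"],
    "service#1": ["work#1", "hotel#1"],
    "service#2": ["religion#1"],
    "hotel#1": ["building#1"],
}
hotel_reading = overlap_score(graph, "room#1", "service#1")
other_reading = overlap_score(graph, "room#1", "service#2")

chain = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"]}
limited = semantic_net(chain, "a")
```

Choosing, for each word of the term, the sense whose net overlaps most with the nets of the other words is the disambiguation decision that this step supports.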

Semantic interpretation, continued
- Semantic patterns are instances of 13 predefined metapatterns
- Example: Topic, as in “archeological site”
- Compute the score for all possible pairs (S_i^k is sense k of word i in the term)

Semantic interpretation, continued
- Use the common paths in the semantic networks to detect semantic relations (taxonomic knowledge) between concepts:
  - Select a set of domain-specific semantic relations
  - Use inductive learning to learn semantic relations given ontological knowledge
  - Apply the model to detect semantic relations
- Errors from the disambiguation phase can be corrected here

Creation of a specialized view of WordNet
- In the last phase of the process:
  - Construct the ontology by eliminating from the semantic networks the WordNet nodes that are not domain terms
  - A domain core ontology can also be used as a backbone

Translating multiword terms
- Classic approach: use of parallel corpora
  - Advantage: easy to implement
  - Disadvantage: few such corpora exist, especially in specific domains
- OntoLearn-based solution: use EuroWordNet and build ontologies in both languages, associating them with synsets

Translation – the experiment
- Experiment on 405 complex terms in a tourism corpus
- Problem: poor encoding of Italian words in EuroWordNet (fewer terms than in the English version), which reduces the set to 113 examples
- Use the semantic relations given by OntoLearn to translate: room service → servizio in camera

Quality of translation    Good  Acceptable  Poor
Manually corrected input  74%   14%         12%
OntoLearn input           70%   14%         16%

Conclusions  Questions?  Comments?