Marek Maziarz, Maciej Piasecki, Ewa Rudnicka, Stanisław Szpakowicz G4.19 Research Group Wrocław University of Technology nlp.pwr.wroc.pl plwordnet.pwr.wroc.pl.

Presentation transcript:

Beyond the Transfer and Merge Wordnet Construction: plWordNet and a Comparison with WordNet
Marek Maziarz, Maciej Piasecki, Ewa Rudnicka, Stanisław Szpakowicz
G4.19 Research Group, Wrocław University of Technology
nlp.pwr.wroc.pl, plwordnet.pwr.wroc.pl

Wordnet
{samochód 1, pojazd samochodowy 1, auto 1, wóz 1 `car, automobile’}
– hypernymy/hyponymy: {pogotowie 3, karetka 1, sanitarka 1, karetka pogotowia 1 `ambulance’}
– meronymy: {bagażnik 1 `boot’}
– diminutiveness: {samochodzik 2 `small car’}

plWordNet 2.0

Independent vs. Translation-based Wordnet Construction
Transfer and merge. Examples:
– EuroWordNet: most component wordnets built by the transfer method (Vossen 2002)
– MultiWordNet: semi-automatic acquisition method from the Princeton WordNet (Bentivogli et al. 2000)
– IndoWordNet: expansion from Hindi Wordnet (Sinha et al. 2006, Bhattacharyya 2010)
– FinnWordNet: directly translated from the Princeton WordNet

Independent vs. Translation-based Wordnet Construction
From scratch. Examples:
– GermaNet: the core built independently
– plWordNet: a unique, corpus-based method; largely independent of the Princeton WordNet

Synonymy and synsets
“A wordnet is a collection of synsets linked by semantic relations.”
A synset is a set of synonyms which represent the same lexicalised concept.
Synonyms are members of the same synset.
Wordnet development deserves better: an operational theory with precise guidelines for wordnet editors.

Basic building block: synset vs lexical unit?
Synset relations link lexicalised concepts, but are named after linguistic lexico-semantic relations.
Substitution tests are defined for lexical units.
Synsets group lexical units.
Every wordnet includes relations between lexical units (lexical relations), e.g., antonymy.
Lexical units can be observed in text; concepts cannot.

Constitutive relations
Synset = a group of lexical units which share all constitutive relations
Constitutive relation = a lexico-semantic relation which
– is frequent enough
– is frequently shared by groups of lexical units
Also:
– is established in linguistics
– is accepted in the wordnet tradition
Examples: hypernymy, meronymy, cause
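
The definition above can be read as a grouping rule. The following is a minimal sketch of that reading, not the plWordNet editors' actual tooling: the function name, the data layout and the antonym `nienawiść 1` are hypothetical, and the set of constitutive relations is cut down to the three examples named on the slide. Lexical units whose constitutive relations coincide fall into one group; other lexical relations, such as antonymy, are ignored.

```python
from collections import defaultdict

# Assumption: only the three example relations from the slide are constitutive.
CONSTITUTIVE = {"hypernymy", "meronymy", "cause"}


def group_into_synsets(relations_of):
    """Group lexical units that share exactly the same constitutive relations.

    relations_of maps a lexical unit to a set of (relation, target) pairs,
    read as "the unit stands in this relation to the target".
    """
    groups = defaultdict(list)
    for unit, rels in relations_of.items():
        # Non-constitutive relations (e.g. antonymy) do not affect grouping.
        signature = frozenset((r, t) for r, t in rels if r in CONSTITUTIVE)
        groups[signature].append(unit)
    return sorted(sorted(group) for group in groups.values())


# Toy data; the antonym entry is invented for illustration only.
units = {
    "miłość 1": {("hypernymy", "uczucie 2"), ("antonymy", "nienawiść 1")},
    "umiłowanie 1": {("hypernymy", "uczucie 2")},
    "kochanie 1": {("hypernymy", "uczucie 2")},
    "bagażnik 1": {("meronymy", "samochód 1")},
}
synsets = group_into_synsets(units)
print(synsets)
```

In this toy form, units with no constitutive relations at all would end up in one group, so a real implementation would need extra evidence before merging them.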

Synset as an abbreviation
A synset is a notational convention for a group of lexical units sharing certain relations; it represents synonyms.
{afekt 1 `passion’, uczucie 2 `feeling’} is a hypernym of {miłość 1 `love’, umiłowanie 1 `affection’, kochanie 1 `loving’}
This is based on constitutive relations.
Additional distinctions: stylistic register and aspect
Minimal commitment principle: make as few assumptions as possible

Relations in plWordNet
Starting point: relations in Princeton WordNet, EuroWordNet and GermaNet, e.g., hyponymy, meronymy, antonymy, cause, instance (for proper names)
Additional constitutive relations, e.g.,
– verb meronymy, preceding, presupposition
– gradation for adjectives

Relations in plWordNet
Specific: derivationally based lexico-semantic relations, e.g.,
– inhabitant (góral `highlander’ – góry `highlands’)
– inchoativity (zapalić się, perfective, `light, start burning’ – palić się, imperfective, `burn, produce light’)
– process (chamieć, imperfective, `to become a boor’ – cham `boor’)

Construction process
1. Data collection: a corpus of 1.8 billion words
2. Data selection phase
– corpus browsing
– WSD-based word usage example extraction
– WordnetWeaver: semi-automatic expansion
3. Data analysis: questions
– is it a correct Polish lemma?
– how many lexical units does it have?
– how to describe them with relations?
Other knowledge sources: available Polish dictionaries, thesauri, encyclopaedias, lexicons, the Web, and intuition.

The result: size matters
Compared with Princeton WordNet:
– General statistics
– Lexical coverage
– Polysemy
– Synset size
– Relation density
– Hypernymy depth

General statistics
Number of synsets, lemmas and LUs in the largest wordnets

Lexical coverage
Proportion of lemmas from PWN/plWN found among vocabulary with a given corpus frequency

Polysemy
Proportion of polysemous lemmas with regard to POS
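
As an illustration of the statistic named on this slide, here is a minimal sketch that counts a lemma as polysemous when it has more than one lexical unit within a part of speech. That reading, the function name and the toy input are assumptions; the slide does not give the exact counting procedure.

```python
from collections import Counter


def polysemy_by_pos(lexical_units):
    """Return, per part of speech, the share of lemmas with more than one sense.

    lexical_units is an iterable of (lemma, pos, sense_number) triples.
    """
    senses = Counter((lemma, pos) for lemma, pos, _ in lexical_units)
    total = Counter()
    polysemous = Counter()
    for (_, pos), n_senses in senses.items():
        total[pos] += 1
        if n_senses > 1:
            polysemous[pos] += 1
    return {pos: polysemous[pos] / total[pos] for pos in total}


# Toy sample, not real plWordNet counts.
sample = [
    ("pogotowie", "noun", 1),
    ("pogotowie", "noun", 2),
    ("pogotowie", "noun", 3),
    ("karetka", "noun", 1),
    ("palić się", "verb", 1),
]
proportions = polysemy_by_pos(sample)
print(proportions)
```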

Relation density
Synset relation density in PWN 3.1 and in plWordNet 2.0

Hypernymy depth
Hypernymy path length for nouns in PWN 3.1 and plWordNet 2.0
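
A hypernymy path length can be computed by following hypernym links up to a root. The sketch below assumes each node has at most one hypernym, which real wordnets do not guarantee, and borrows the SUMO chain shown on a later slide as toy data. The function name and data layout are hypothetical.

```python
def hypernymy_depth(node, parent_of):
    """Count hypernymy edges from node up to the root of its chain."""
    depth = 0
    seen = {node}
    while node in parent_of:
        node = parent_of[node]
        if node in seen:
            # Guard against malformed data containing a loop.
            raise ValueError("cycle in hypernymy chain")
        seen.add(node)
        depth += 1
    return depth


# Toy hierarchy: each element is a hyponym of the next one.
chain = ["Computer", "ElectricDevice", "Device", "Artifact",
         "Object", "Physical", "Entity"]
parent_of = dict(zip(chain, chain[1:]))

print(hypernymy_depth("Computer", parent_of))
```

With multiple hypernyms per synset, one would have to choose between the shortest and the longest path; the slide does not say which was measured.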

Hypernymy depth
[Figure: Polish WordNet, Princeton WordNet]

Hypernymy depth
[Figure: Polish WordNet, Princeton WordNet, SUMO; SUMO chain: Computer, ElectricDevice, Device, Artifact, Object, Physical, Entity]

Mapping procedure: plWordNet onto Princeton WordNet
1. Recognise the sense of the source synset:
– the position in the network structure
– existing relations, commentaries
– other synsets containing the given lemma
2. Search for the target synset:
– candidates for the target synset: intuitions, automatic prompting and dictionaries
– verifying candidates: comparing hypernymy and hyponymy structures; existing inter-lingual relations; definitions, commentaries; dictionaries
3. Link the source synset with the target synset
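
The mapping itself is described as a manual procedure supported by automatic prompting. The sketch below illustrates only one verification step, comparing hypernymy structures through existing inter-lingual links, and is not the authors' prompting algorithm. The function, the parameter names and the toy data are all hypothetical.

```python
def rank_candidates(source, candidates, src_hypernyms, tgt_hypernyms, links):
    """Rank target candidates by hypernyms that agree across the two wordnets.

    A candidate scores one point for each of its hypernyms that is already
    linked to a hypernym of the source synset.
    """
    mapped = {links[h] for h in src_hypernyms.get(source, ()) if h in links}
    scores = {
        c: len(mapped & set(tgt_hypernyms.get(c, ())))
        for c in candidates
    }
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(candidates, key=lambda c: (-scores[c], c))


# Toy data with simplified synset labels.
src_hypernyms = {"karetka 1": ["samochód 1"]}
tgt_hypernyms = {"ambulance": ["car"], "stretcher": ["litter"]}
links = {"samochód 1": "car"}  # an inter-lingual link assumed to exist already

ranked = rank_candidates("karetka 1", ["stretcher", "ambulance"],
                         src_hypernyms, tgt_hypernyms, links)
print(ranked)
```

A ranking like this could only prompt the lexicographer; per the procedure above, definitions, commentaries and dictionaries still decide the final link.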

Hierarchy of inter-lingual relations
– Inter-lingual synonymy (only one per synset)
– Inter-lingual inter-register synonymy
– I-partial synonymy
– I-hyponymy
– I-hypernymy
– I-meronymy: for parts, elements or materials of bigger wholes
– I-holonymy: for a whole made of smaller parts, elements or materials

Results of inter-lingual mapping
Mapping direction: plWordNet to Princeton WordNet
Bottom-up: from the lowest levels in the hierarchy up
~ synsets mapped (~ lexical units/senses)
– Synonymy:
– Partial synonymy: 971
– Inter-register synonymy: 676
– Hyponymy:
– Hypernymy: 3526
– Meronymy: 1898
– Holonymy: 555
Mapped branches:
– people, artefacts, places, food, time units: all
– communication, states and processes, body parts, group names: partially

Different relations for coding the same conceptual dependencies

Applications
A free WordNet-type licence facilitates applications. Examples:
– Semantic annotation in a corpus of referential gestures (Lis, 2012)
– Lexicon of semantic valency frames (Hajnicz, 2011; Hajnicz, 2012)
– Features for text mining from Web pages (Maciołek and Dobrowolski, 2013)
– Mapping between a lexicon and an ontology (Wróblewska et al., 2013)
– Word-to-word similarity in ontologies (Lula and Paliwoda-Pękosz, 2009)
– Text similarity for Information Retrieval (Siemiński, 2012)
– Text classification (Maciołek, 2010)
– Terminology extraction and clustering (Mykowiecka and Marciniak, 2012)
– Automated extraction of Opinion Attribute Lexicons (Wawer and Gołuchowski, 2012)
– Named Entity Recognition
– Word Sense Disambiguation (Gołuchowski and Przepiórkowski, 2012)
– Anaphora resolution
More than 500 registered users, ~70 declared commercial applications

Conclusions
plWordNet 2.0: a national wordnet not adapted from Princeton WordNet
plWordNet 2.0 is comparable to WordNet 3.1 in size, as well as in lexical coverage, hypernymy depth and relation density
Synset membership depends only on constitutive relations between lexical units.
A unique mapping strategy and a unique opportunity to compare the two lexical systems
plWordNet 3.0 (2015):
– a comprehensive wordnet of Polish
– 200k lemmas and 260k LUs, mapped to PWN 3.?

Thank you!

Differences between plWN and PWN
Inter-lingual lexico-grammatical differences:
– marked forms (diminutives, augmentatives)
– lexicalised gender
– lexical gaps
Differences in the definition of synonymy and synset:
– 'Mixed' PWN synsets: marked and unmarked forms, feminine and masculine, countable and uncountable, hypernym and hyponym
– hypernymy: 'and' (plWN) vs. 'and/or' (PWN)

Differences between plWN and PWN
Other differences:
– synset definitions incompatible with relations (PWN)
– different relations used for coding the same conceptual dependencies
– more fine-grained meaning differentiation
– differences boiling down to the content and size of the resource

Differences in lexicalisation

Relation density
Synset relation density in PWN 3.1 and in plWordNet 2.0 in selected semantic domains (by semantic domain)