WordNet: Connecting words and concepts Peng.Huang.


What is WordNet?
– A large lexical database, or “electronic dictionary”, for the English language
– Started in 1985 by Miller
– Covers most English nouns, verbs, adjectives, and adverbs
– Electronic format makes it amenable to automatic manipulation

What’s so special about WordNet? Traditional paper dictionaries are organized alphabetically, so words that are grouped together (on the same page) are unrelated WordNet is organized by meaning, so words in close proximity are related

Basic Design of WordNet
– WordNet entries are word-concept mappings
– Natural languages map many-to-many
– One concept can be expressed by many words (synonymy): {car, auto, automobile}, {close, shut}

Basic Design of WordNet One word can express many concepts (polysemy): {club, stick} {club, nightclub} {club, playing card}

Basic Design of WordNet
– WordNet’s building blocks: sets of synonyms (synsets) -- {hit, beat}, {big, large}, {queue, line}
– Each synset expresses a distinct concept
– A gloss is a textual definition of the synset -- “band -- (a range of frequencies between two limits)”
– Currently, WordNet 3.0 contains approximately 117,000 synsets

Basic Design of WordNet
Groups the meanings of English words into five categories
– Nouns
– Verbs
– Adjectives
– Adverbs
– Function words (prepositions, pronouns, determiners)
Only the first four categories are included in the database; function words are left out.

Basic Design of WordNet WordNet stores, and allows one to retrieve, --all concepts that a given word can express --all words that express a given concept
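The two lookups above can be sketched with a toy table. This is a minimal illustration using only the example synsets from the slides; the synset ids (such as "club.n.1") and the function names are made up for this sketch and are not the real WordNet database or API.

```python
# Toy data copied from the slide examples; NOT the real WordNet database.
# The synset ids are hypothetical labels invented for this sketch.
SYNSETS = {
    "car.n.1": {"car", "auto", "automobile"},
    "close.v.1": {"close", "shut"},
    "club.n.1": {"club", "stick"},
    "club.n.2": {"club", "nightclub"},
    "club.n.3": {"club", "playing card"},
}


def concepts_for(word):
    """Return the ids of all synsets that contain the word (polysemy)."""
    return sorted(sid for sid, words in SYNSETS.items() if word in words)


def words_for(synset_id):
    """Return all words that express the given concept (synonymy)."""
    return sorted(SYNSETS[synset_id])


print(concepts_for("club"))
print(words_for("car.n.1"))
```

Storing each synset as a set of words keeps both directions of the many-to-many mapping available from one table.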

But there’s more!
Words and synsets are connected via meaning-based relations
– Synonymy (Pipe, Tube)
– Antonymy (Wet, Dry)
– Hyponymy (Tree, Plant)
– Meronymy (Ship, Fleet)
– Morphological relations
Result: a large semantic network (as opposed to a flat list in a paper dictionary)

Relations among WN noun synsets
Hyperonymy/hyponymy relates super/subordinate synsets (denoting more/less general concepts):

                  {vehicle}
                 /         \
    {car, automobile}   {bicycle, bike}
       /        \              \
{convertible}  {SUV}      {mountain bike}

Transitivity:
A car is a kind of vehicle
An SUV is a kind of car
=> An SUV is a kind of vehicle
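The transitivity argument can be sketched as a walk up a parent table. This is a toy model of the vehicle tree from the slide, assuming each concept has exactly one superordinate (real WordNet synsets can have more than one); the names HYPERNYM, ancestors, and is_a are invented for this sketch.

```python
# Child -> parent links taken from the vehicle tree on the slide.
# Assumption for this sketch: every concept has at most one superordinate.
HYPERNYM = {
    "car": "vehicle",
    "bicycle": "vehicle",
    "convertible": "car",
    "SUV": "car",
    "mountain bike": "bicycle",
}


def ancestors(concept):
    """Return all superordinates of a concept, nearest first."""
    chain = []
    while concept in HYPERNYM:
        concept = HYPERNYM[concept]
        chain.append(concept)
    return chain


def is_a(specific, general):
    """True if 'specific' is a kind of 'general', directly or transitively."""
    return general in ancestors(specific)


print(ancestors("SUV"))
print(is_a("SUV", "vehicle"))
```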

Relations among noun synsets
Meronymy/holonymy (part/whole):

     {car, automobile}
             |
         {engine}
         /      \
{spark plug}  {cylinder}

Inheritance:
A car has an engine
An engine has spark plugs
=> A car has spark plugs
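Part inheritance can be sketched the same way, this time walking down from the whole to its parts. The table holds only the car example from the slide; PARTS and all_parts are names made up for this illustration.

```python
# Whole -> direct parts, taken from the car example on the slide.
PARTS = {
    "car": ["engine"],
    "engine": ["spark plug", "cylinder"],
}


def all_parts(whole):
    """Return every direct and inherited part of 'whole', sorted by name."""
    found = []
    stack = list(PARTS.get(whole, []))
    while stack:
        part = stack.pop()
        if part not in found:
            found.append(part)
            # Parts of a part are inherited by the whole.
            stack.extend(PARTS.get(part, []))
    return sorted(found)


print(all_parts("car"))
```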

Relations among verb synsets
Verbs denote events
Related by a “manner” relation:

    {communicate}
          |
       {talk}
       /     \
{stammer}  {whisper}

Relations among verb synsets
– Semantics of events (verbs) are very different from semantics of entities (nouns)
– WordNet captures this fact with different relations
– Relations refer to temporal properties of events
  -- partial and complete overlap of two events
  -- prior or posterior events

WordNet
– Relations among synsets create an interconnected network
– Different senses of polysemous words are members of distinct synsets that are related to different synsets (i.e., occupy different locations in the network)
– e.g., {stock, broth} has superordinate synset {dish}; {stock, breed} has superordinate {variety}
– These different synsets are also linked to different part/whole synsets

WordNet
– A word’s meaning can be defined in terms of its position in the network
  -- club 1 is a kind of association/has members
  -- club 2 is a kind of stick
– Relatedness between words or synsets can be quantified in terms of path length (number of connections among synsets)

WordNet How closely related are {zebra} and {horse}? Very: Both share the direct superordinate equine What about {horse, sawhorse} and {horse, gymnastic horse}? Related, but less so: joint superordinate {artifact} is 4-5 levels up What about {zebra} and {horse, gymnastic horse}? Unrelated: the trees containing them never intersect!
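Path length through a shared superordinate can be sketched on a toy hierarchy. Only {equine} and {artifact} come from the slide; the intermediate nodes s1..s3 and g1..g3 are placeholders that stand in for the levels the slide does not name, and the sense labels and function names are invented for this illustration.

```python
# Child -> parent links. "equine" and "artifact" come from the slide;
# s1..s3 and g1..g3 are PLACEHOLDER levels, not real WordNet synsets.
HYPERNYM = {
    "zebra": "equine",
    "horse (animal)": "equine",
    "horse (sawhorse)": "s1", "s1": "s2", "s2": "s3", "s3": "artifact",
    "horse (gymnastic)": "g1", "g1": "g2", "g2": "g3", "g3": "artifact",
}


def depths_above(node):
    """Map the node and each of its superordinates to its distance from the node."""
    depths = {node: 0}
    steps = 0
    while node in HYPERNYM:
        node = HYPERNYM[node]
        steps += 1
        depths[node] = steps
    return depths


def path_length(a, b):
    """Number of links on the shortest path via a shared superordinate, or None."""
    up_a, up_b = depths_above(a), depths_above(b)
    shared = set(up_a) & set(up_b)
    if not shared:
        return None  # the two trees never intersect
    return min(up_a[n] + up_b[n] for n in shared)


print(path_length("zebra", "horse (animal)"))
print(path_length("horse (sawhorse)", "horse (gymnastic)"))
print(path_length("zebra", "horse (gymnastic)"))
```

A shorter path means more closely related, and None models the "unrelated" case where no common superordinate exists in this toy data.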

WordNet for Word Sense Disambiguation WSD is a major problem in Natural Language Processing Assumption: words in a context (phrase, sentence, discourse) are semantically related So, horse in the neighborhood of zebra is likely to mean “equine”; in the neighborhood of gym it likely means “gymnastic horse.”

WordNet for WSD If you want to disambiguate “horse” in the context of “zebra,” look for all WordNet paths from “zebra” to “horse.” The shortest one is likely to give you the correct sense of “horse.”
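The shortest-path heuristic can be sketched with a breadth-first search over a tiny undirected graph. The edges below are illustrative only: the equine links follow the slides, the "gymnastic apparatus" link is an assumption added so that the second sense has a neighbour, and the sense labels and function names are invented for this sketch.

```python
from collections import deque

# Illustrative links only; NOT extracted from the real WordNet database.
EDGES = [
    ("zebra", "equine"),
    ("horse/animal", "equine"),
    ("horse/gymnastic", "gymnastic apparatus"),  # assumed link for the sketch
]


def build_graph(edges):
    """Build an undirected adjacency map from a list of links."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


GRAPH = build_graph(EDGES)


def shortest_path_length(start, goal):
    """Breadth-first search; returns the number of links, or None if unreachable."""
    if start not in GRAPH or goal not in GRAPH:
        return None
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in GRAPH[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def disambiguate(senses, context_word):
    """Pick the sense with the shortest path to the context word, if any."""
    scored = [(shortest_path_length(s, context_word), s) for s in senses]
    reachable = [(d, s) for d, s in scored if d is not None]
    return min(reachable)[1] if reachable else None


print(disambiguate(["horse/animal", "horse/gymnastic"], "zebra"))
```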

WordNet for WSD Can take advantage of WordNet classes (trees of hierarchically related synsets) e.g., run 1 co-occurs with nouns that are all hyponyms (subordinate, more specific concepts) of office (mayor, congresswoman, President,...) run 2 co-occurs with nouns that are hyponyms of machine (computer, washer, printing press, engine,...)
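The class-based idea can be sketched as a lookup of the co-occurring noun in the hyponym sets of {office} and {machine}. The sets hold only the nouns listed on the slide, and the names CLASS_MEMBERS, SENSE_FOR_CLASS, and sense_of_run are made up for this illustration.

```python
# Hyponyms listed on the slide for each class; a real system would
# collect them from the WordNet hierarchy instead of a hand-made table.
CLASS_MEMBERS = {
    "office": {"mayor", "congresswoman", "president"},
    "machine": {"computer", "washer", "printing press", "engine"},
}

# Which sense of "run" co-occurs with which class, as stated on the slide.
SENSE_FOR_CLASS = {"office": "run 1", "machine": "run 2"}


def sense_of_run(noun):
    """Guess the sense of 'run' from the class of a co-occurring noun."""
    noun = noun.lower()
    for cls, members in CLASS_MEMBERS.items():
        if noun in members:
            return SENSE_FOR_CLASS[cls]
    return None  # noun belongs to neither class in this toy table


print(sense_of_run("President"))
print(sense_of_run("engine"))
```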

Topics/Domain in WordNet
– Hierarchical organization leaves many related concepts unconnected
– Solution: link synsets across “trees” in terms of their membership in a “domain” or topic
– E.g., synsets {contraindication}, {surgery}, {physician}, ... are all linked to {medicine}, the concept that defines a domain or topic

Topics/Domain in WordNet Customizable: user can define new topics Topics can be as coarse- or fine-grained as desired By using synsets as topic labels, the concepts subsumed under the new topic(s) will continue to be part of the network
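Domain links can be sketched as one extra table that cuts across the hierarchies. The medicine example is from the slides; the table layout and the names DOMAIN_LINKS, link, members, and share_domain are assumptions made for this illustration, not WordNet's actual storage format.

```python
from collections import defaultdict

# Domain synset -> synsets linked to it. A user-defined topic is simply
# another key, added through link().
DOMAIN_LINKS = defaultdict(set)


def link(synset, domain):
    """Attach a synset to a domain (topic) synset."""
    DOMAIN_LINKS[domain].add(synset)


# The medicine example from the slide.
for name in ("contraindication", "surgery", "physician"):
    link(name, "medicine")


def members(domain):
    """All synsets linked to the given domain, sorted by name."""
    return sorted(DOMAIN_LINKS.get(domain, ()))


def share_domain(a, b):
    """True if both synsets are linked to at least one common domain."""
    return any(a in m and b in m for m in DOMAIN_LINKS.values())


print(members("medicine"))
print(share_domain("surgery", "physician"))
```

Because the topic label is itself a synset name, anything filed under it stays reachable through the rest of the network.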

Current and Future Work
– Increase density of WordNet
– More links, new relations
– E.g. “role” relation among nouns: distinguish {poodle}-{dog} (a “type” relation) from {poodle}-{pet} (a “role” relation)
  -- a poodle is a type of dog, but not a type of pet
  -- a poodle can (but need not) play the “role” of pet

Work just completed... (sponsored by ARDA/AQUAINT) Manually link nouns, verbs, adjectives, adverbs in the definitions (“glosses”) to the appropriate synset: {bank (a financial institution that accepts deposits...)} {bank (sloping land..)}

Gloss Disambiguation
{bank (a financial institution that accepts deposits...)}
-- {financial, fiscal}
-- {institution, establishment} {institution, custom}
{bank (sloping land..)}
-- {slope, incline}
-- {land, ground, earth} {land, country}

Gloss Disambiguation: Results
– A closed system linking glosses and synsets (and a more densely connected network)
– Each gloss is more informative, as it adds synset information for the words in the gloss
– Glosses are examples of contexts for many word-sense pairs, telling us how words with specific senses are being used in context
– Glosses can be used as training data for machine learning systems that want to “learn” to disambiguate words automatically

Summary
A Google search for WordNet returns about 1,190,000 items. There is more than what you see… But less than what you imagine!!!

Where to find WordNet Freely downloadable: Database, browser, documentation

Global WordNet Currently, wordnets exist for some 40 languages, including Arabic, Basque, Bulgarian, Estonian, Hebrew, Icelandic, Italian, Kannada, Latvian, Persian, Romanian, Sanskrit, Tamil, Thai, Turkish,...

Thank you! Q&A