WordNet: Connecting words and concepts Christiane Fellbaum Cognitive Science Laboratory Princeton University.


1 WordNet: Connecting words and concepts Christiane Fellbaum Cognitive Science Laboratory Princeton University

2 What is WordNet? A large lexical database, or “electronic dictionary” Covers most English nouns, verbs, adjectives, adverbs Electronic format makes it amenable to automatic manipulation Used in many applications (document retrieval and sorting, machine translation,...)

3 What’s so special about WordNet? Traditional paper dictionaries are organized alphabetically, so words that are grouped together (on the same page) are unrelated WordNet is organized by meaning, so words in close proximity are related

4 What’s so special...? Users can browse WordNet and find words related to their queries (like in a thesaurus)

5 Basic Design of WordNet WordNet entries are word-concept mappings Natural languages map many-to-many: One concept can be expressed by many words (synonymy): {car, auto, automobile} {close, shut}

6 Basic Design of WordNet One word can express many concepts (polysemy): {club, stick} {club, nightclub} {club, playing card}

7 Basic Design of WordNet Added problem in Natural Language: The words we use most frequently are the most polysemous (have the most meanings)!

8 Basic Design of WordNet WordNet harnesses synonymy and polysemy Represents words and concepts unambiguously Meaningfully relates words and concepts

9 Basic Design of WordNet WordNet’s building blocks: sets of synonyms (synsets) {hit, beat} {big, large} {queue, line} Each synset expresses a distinct concept. Currently, WordNet contains approximately 117,000 synsets

10 Basic Design of WordNet WordNet stores, and allows one to retrieve, --all concepts that a given word can express --all words that express a given concept
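The two lookups on this slide can be sketched with a pair of dictionaries. This is a toy illustration built from the example synsets on the earlier slides; the synset IDs such as `club.1` are hypothetical labels, not real WordNet identifiers, and the function names are made up for this sketch.

```python
from collections import defaultdict

# Toy synsets from the slides. The IDs are hypothetical labels.
SYNSETS = {
    "car.1": {"car", "auto", "automobile"},
    "close.1": {"close", "shut"},
    "club.1": {"club", "stick"},
    "club.2": {"club", "nightclub"},
    "club.3": {"club", "playing card"},
}


def build_word_index(synsets):
    """Invert the synset table so each word points to its synset IDs."""
    index = defaultdict(set)
    for synset_id, words in synsets.items():
        for word in words:
            index[word].add(synset_id)
    return index


WORD_INDEX = build_word_index(SYNSETS)


def concepts_for(word):
    """All concepts (synset IDs) that a given word can express."""
    return sorted(WORD_INDEX.get(word, set()))


def words_for(synset_id):
    """All words that express a given concept."""
    return sorted(SYNSETS[synset_id])
```

Here `concepts_for("club")` shows polysemy (one word, three synsets) and `words_for("car.1")` shows synonymy (one synset, three words).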

11 But wait--there’s more! Words and synsets are connected via meaning-based relations Result: a large semantic network (as opposed to a flat list in a paper dictionary)

12 Relations among WN noun synsets Hyperonymy/hyponymy relates super/subordinate synsets (denoting more/less general concepts):

                 {vehicle}
                /         \
   {car, automobile}     {bicycle, bike}
      /         \               \
{convertible}  {SUV}       {mountain bike}

Transitivity:
A car is a kind of vehicle
An SUV is a kind of car
=> An SUV is a kind of vehicle
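A minimal sketch of the transitivity inference, using the tree from this slide. It assumes each synset has at most one superordinate, which keeps the example short; the names `HYPERNYM`, `hypernym_chain`, and `is_kind_of` are illustrative, not part of any WordNet API.

```python
# Child -> parent links copied from the slide's tree (one word stands for each synset).
HYPERNYM = {
    "car": "vehicle",
    "bicycle": "vehicle",
    "convertible": "car",
    "SUV": "car",
    "mountain bike": "bicycle",
}


def hypernym_chain(synset):
    """Follow superordinate links upward and return every ancestor in order."""
    chain = []
    while synset in HYPERNYM:
        synset = HYPERNYM[synset]
        chain.append(synset)
    return chain


def is_kind_of(specific, general):
    """True if `general` is reachable from `specific` via hypernym links."""
    return general in hypernym_chain(specific)
```

`is_kind_of("SUV", "vehicle")` holds even though no direct SUV-to-vehicle link is stored: it follows from the two stored links.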

13 Relations among noun synsets Meronymy/holonymy (part/whole)

     {car, automobile}
             |
         {engine}
         /      \
{spark plug}  {cylinder}

Inheritance:
A car has an engine
An engine has spark plugs
=> A car has spark plugs
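Part inheritance can be sketched the same way, this time walking downward from a whole to all of its direct and indirect parts. The table `HAS_PART` and the function `all_parts` are hypothetical names for this toy example.

```python
# Whole -> direct parts, copied from the slide's diagram.
HAS_PART = {
    "car": ["engine"],
    "engine": ["spark plug", "cylinder"],
}


def all_parts(whole):
    """Collect direct parts plus the parts of those parts."""
    parts = []
    stack = list(HAS_PART.get(whole, []))
    while stack:
        part = stack.pop()
        if part not in parts:
            parts.append(part)
            stack.extend(HAS_PART.get(part, []))
    return parts
```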

14 Relations among verb synsets Verbs denote events Related by a “manner” relation

   {communicate}
         |
      {talk}
      /     \
{stammer}  {whisper}

15 Relations among verb synsets Semantics of events (verbs) are very different from semantics of entities (nouns) WordNet captures this fact with different relations Relations refer to temporal properties of events --partial and complete overlap of two events --prior or posterior events

16 WordNet Relations among synsets create an interconnected network Different senses of polysemous words are members of distinct synsets that are related to different synsets (i.e., occupy different locations in the network) e.g., {stock, broth} has superordinate synset {dish} {stock, breed} has superordinate {variety} These different synsets are also linked to different part/whole synsets

17 WordNet A word’s meaning can be defined in terms of its position in the network club 1 is a kind of association/has members club 2 is a kind of stick Relatedness between words or synsets can be quantified in terms of path length (number of connections among synsets)

18 WordNet How closely related are {zebra} and {horse}? Very: Both share the direct superordinate equine What about {horse, sawhorse} and {horse, gymnastic horse}? Related, but less so: joint superordinate {artifact} is 4-5 levels up What about {zebra} and {horse, gymnastic horse}? Unrelated: the trees containing them never intersect!
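Path length can be sketched as a breadth-first search over the superordinate links, treated as undirected edges. The graph below is a deliberately shortened toy: the intermediate nodes `framework` and `gymnastic apparatus` are assumptions made for illustration, so the distances are smaller than the 4-5 levels mentioned on the slide.

```python
from collections import deque

# (subordinate, superordinate) pairs. Intermediate nodes are hypothetical.
EDGES = [
    ("zebra", "equine"),
    ("horse (equine)", "equine"),
    ("horse (sawhorse)", "framework"),
    ("framework", "artifact"),
    ("horse (gymnastic)", "gymnastic apparatus"),
    ("gymnastic apparatus", "artifact"),
]


def build_graph(edges):
    """Store each link in both directions so paths can go up and down."""
    graph = {}
    for child, parent in edges:
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    return graph


def path_length(graph, start, goal):
    """Number of links on the shortest path, or None if the trees never meet."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


GRAPH = build_graph(EDGES)
```

In this toy graph, zebra and the equine horse are two links apart, the two artifact senses of horse are four links apart, and zebra has no path at all to the gymnastic horse.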

19 WordNet for Word Sense Disambiguation WSD is a major problem in Natural Language Processing Assumption: words in a context (phrase, sentence, discourse) are semantically related So, horse in the neighborhood of zebra is likely to mean “equine”; in the neighborhood of gym it likely means “gymnastic horse.”

20 WordNet for WSD If you want to disambiguate “horse” in the context of “zebra,” look for all WordNet paths from “zebra” to “horse.” The shortest one is likely to give you the correct sense of “horse.”
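One way to sketch this procedure: list the candidate senses of the target word, measure each one's distance to the context word through their nearest shared superordinate, and keep the closest. The sense inventory and links below are toy assumptions, and `disambiguate` is a hypothetical helper, not a WordNet function.

```python
# Hypothetical sense inventory and child -> parent links.
SENSES = {
    "horse": ["horse (equine)", "horse (gymnastic)"],
    "zebra": ["zebra"],
}
HYPERNYM = {
    "zebra": "equine",
    "horse (equine)": "equine",
    "horse (gymnastic)": "gymnastic apparatus",
    "gymnastic apparatus": "artifact",
}


def ancestors_with_depth(synset):
    """Map the synset and each of its ancestors to the number of links upward."""
    depths = {synset: 0}
    depth = 0
    while synset in HYPERNYM:
        synset = HYPERNYM[synset]
        depth += 1
        depths[synset] = depth
    return depths


def distance(a, b):
    """Path length through the nearest shared ancestor, or None if there is none."""
    up_a = ancestors_with_depth(a)
    up_b = ancestors_with_depth(b)
    shared = set(up_a) & set(up_b)
    if not shared:
        return None
    return min(up_a[s] + up_b[s] for s in shared)


def disambiguate(word, context_word):
    """Return the sense of `word` closest to any sense of `context_word`."""
    best = None
    for sense in SENSES[word]:
        for context_sense in SENSES[context_word]:
            d = distance(sense, context_sense)
            if d is not None and (best is None or d < best[0]):
                best = (d, sense)
    return best[1] if best else None
```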

21 WordNet for WSD Can take advantage of WordNet classes (trees of hierarchically related synsets) e.g., run 1 co-occurs with nouns that are all hyponyms (subordinate, more specific concepts) of office (mayor, congresswoman, President,...) run 2 co-occurs with nouns that are hyponyms of machine (computer, washer, printing press, engine,...)
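The class-based idea can be sketched as a lookup: walk up from the co-occurring noun and see which class it falls under. The links below flatten each class to a single level for brevity, and the mapping from class to verb sense is an assumption that simply mirrors the slide's "run 1" and "run 2" labels.

```python
# Noun -> superordinate class, flattened to one level for this toy example.
HYPERNYM = {
    "mayor": "office",
    "congresswoman": "office",
    "President": "office",
    "computer": "machine",
    "washer": "machine",
    "printing press": "machine",
    "engine": "machine",
}

# Which sense of "run" each noun class selects (labels from the slide).
CLASS_TO_SENSE = {"office": "run 1", "machine": "run 2"}


def sense_of_run(noun):
    """Pick the sense of 'run' whose noun class contains the co-occurring noun."""
    node = noun
    while node is not None:
        if node in CLASS_TO_SENSE:
            return CLASS_TO_SENSE[node]
        node = HYPERNYM.get(node)
    return None
```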

22 Topics/Domain in WordNet Hierarchical organization leaves many related concepts unconnected Solution: link synsets across “trees” in terms of their membership in a “domain” or topic E.g., synsets {contraindication}, {surgery}, {physician}, ... are all linked to {medicine}, the concept that defines a domain or topic

23 Topics/Domain in WordNet Customizable: user can define new topics Topics can be as coarse- or fine-grained as desired By using synsets as topic labels, the concepts subsumed under the new topic(s) will continue to be part of the network
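Domain links can be sketched as an extra table that cuts across the hierarchy, with a helper that lets a user attach synsets to a new topic. The medicine entries come from the previous slide; the topic `hospital staff` and the synset `nurse` are hypothetical additions used only to show a user-defined topic.

```python
# Synset -> set of topic synsets it belongs to (medicine links from the slides).
DOMAIN_OF = {
    "contraindication": {"medicine"},
    "surgery": {"medicine"},
    "physician": {"medicine"},
}


def add_to_topic(synset, topic):
    """Link a synset to a topic; the topic label is itself a synset name."""
    DOMAIN_OF.setdefault(synset, set()).add(topic)


def topic_members(topic):
    """All synsets linked to the given topic, across hierarchy trees."""
    return sorted(s for s, topics in DOMAIN_OF.items() if topic in topics)


# A user-defined, finer-grained topic (hypothetical example data).
add_to_topic("physician", "hospital staff")
add_to_topic("nurse", "hospital staff")
```

A synset can belong to several topics at once, so adding `hospital staff` leaves the existing `medicine` links untouched.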

24 Current and Future Work Increase density of WordNet More links, new relations E.g. “role” relation among nouns: distinguish {poodle}-{dog} (a “type” relation) from {poodle}-{pet} (a “role” relation) poodle is a type of dog, but not a type of pet poodle can (but need not) play the “role” of pet

25 Work just completed... (sponsored by ARDA/AQUAINT) Manually link nouns, verbs, adjectives, adverbs in the definitions (“glosses”) to the appropriate synset: {bank (a financial institution that accepts deposits...)} {bank (sloping land..)}

26 Gloss Disambiguation {bank (a financial institution that accepts deposits...)} {financial, fiscal} {institution, establishment} {institution, custom} {bank (sloping land..)} {slope, incline} {land, ground, earth} {land, country}

27 Gloss Disambiguation: Results A closed system linking glosses and synsets (and a more densely connected network) Each gloss is more informative as it adds synset information for the words in the gloss Glosses are examples of contexts for many word-sense pairs, telling us how words with specific senses are being used in context Glosses can be used as training data for machine learning systems that want to “learn” to disambiguate words automatically
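A rough sketch of how disambiguated glosses could be stored and turned into training pairs. The candidate synsets come from the preceding slide; which candidate was chosen is an assumption here (the first one listed in each case), and the record layout and function names are hypothetical.

```python
# (headword sense, gloss word, candidate synsets, index of the chosen candidate).
# The chosen index is an assumption; the slides list candidates only.
GLOSS_LINKS = [
    ("bank (financial institution)", "financial",
     [("financial", "fiscal")], 0),
    ("bank (financial institution)", "institution",
     [("institution", "establishment"), ("institution", "custom")], 0),
    ("bank (sloping land)", "sloping",
     [("slope", "incline")], 0),
    ("bank (sloping land)", "land",
     [("land", "ground", "earth"), ("land", "country")], 0),
]


def training_pairs(links):
    """Word-sense pairs: each gloss word with the synset it was linked to."""
    return [(word, candidates[chosen]) for _, word, candidates, chosen in links]


def ambiguous_words(links):
    """Gloss words that had more than one candidate synset."""
    return [word for _, word, candidates, _ in links if len(candidates) > 1]
```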

28 Where to find WordNet Freely downloadable: http://wordnet.princeton.edu/ Database, browser, documentation

29 Global WordNet Currently, wordnets exist for some 40 languages, including Arabic, Basque, Bulgarian, Estonian, Hebrew, Icelandic, Italian, Kannada, Latvian, Persian, Romanian, Sanskrit, Tamil, Thai, Turkish,... http://www.globalwordnet.org

30 Thank you! For questions, comments, and papers: fellbaum@princeton.edu

