Lecture 16: Lexical Semantics, WordNet, etc.


Lecture 16: Lexical Semantics, WordNet, etc.
LING 138/238 / SYMBSYS 138: Intro to Computer Speech and Language Processing
November 23, 2004
Dan Jurafsky
Thanks to Jim Martin for many of these slides!
11/13/2018 LING 138/238 Autumn 2004

Meaning
Traditionally, meaning in language has been studied from three perspectives:
- The meanings of individual words
- How those meanings combine to make meanings for individual sentences or utterances
- How those meanings combine to make meanings for a text or discourse
We are going to focus today on word meaning, also called lexical semantics.

Outline: Lexical Semantics
Concepts about word meaning, including:
- Homonymy, Polysemy, Synonymy
- Thematic roles
Two computational areas:
- Enabling resource: WordNet
- Enabling technology: Word sense disambiguation

Preliminaries
What’s a word? Definitions we’ve used over the quarter: types, tokens, stems, roots, inflected forms, etc.
Lexeme: an entry in a lexicon consisting of a pairing of a form with a single meaning representation
Lexicon: a collection of lexemes

Relationships between word meanings
- Homonymy
- Polysemy
- Synonymy
- Antonymy
- Hypernymy
- Hyponymy
- Meronymy

Homonymy
Homonymy: lexemes that share a form (phonological, orthographic, or both) but have unrelated, distinct meanings.
Clear examples: bat (wooden stick-like thing) vs. bat (flying scary mammal thing), or bank (financial institution) vs. bank (riverside).
Homonyms can be homophones, homographs, or both:
- Homophones: write and right; piece and peace

Homonymy causes problems for NLP applications
- Text-to-speech: same orthographic form but different phonological form (bass vs. bass)
- Information retrieval: different meanings, same orthographic form (QUERY: bat care)
- Machine translation
- Speech recognition: why?

Polysemy
The bank is constructed from red brick.
I withdrew the money from the bank.
Are those the same sense?
Or consider the following WSJ example: While some banks furnish sperm only to married women, others are less restrictive.
Which sense of bank is this? Is it distinct from (homonymous with) the river bank sense? How about the savings bank sense?

Polysemy
A single lexeme with multiple related meanings (bank the building, bank the financial institution).
Most non-rare words have multiple meanings, and the number of meanings is related to the word’s frequency.
Verbs tend more toward polysemy.
Distinguishing polysemy from homonymy isn’t always easy (or necessary).

Metaphor and Metonymy
Specific types of polysemy.
Metaphor:
- Germany will pull Slovenia out of its economic slump.
- I spent 2 hours on that homework.
Metonymy:
- The White House announced yesterday.
- This chapter talks about part-of-speech tagging.
- Bank (building) and bank (financial institution)

How do we know when a word has more than one sense?
ATIS examples:
- Which flights serve breakfast?
- Does America West serve Philadelphia?
The “zeugma” test:
- ?Does United serve breakfast and San Jose?

Synonyms
Words that have the same meaning in some or all contexts:
- filbert / hazelnut
- youth / adolescent
- big / large
- automobile / car
Two lexemes are synonyms if they can be successfully substituted for each other in all situations.

Synonyms
But there are few (or no) examples of perfect synonymy. Why should that be?
Even if many aspects of meaning are identical, substitution still may not preserve acceptability, based on notions of politeness, slang, register, genre, etc.
Example: big and large?
- That’s my big sister
- That’s my large sister

Antonyms
Words that are opposites with respect to one feature of their meaning; otherwise, they are very similar!
- dark / light
- boy / girl
- hot / cold
- up / down
- in / out

Word Similarity Computation
For various computational applications it’s useful to find words which are similar to another word:
- Machine translation (to find near-synonyms)
- Information retrieval (to do “query expansion”)
Two ways to do this:
- Automatic computation based on distributional similarity (lsa.colorado.edu)
- Use a thesaurus which lists similar words: WordNet (we’ll return to this in a few slides)
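The distributional route can be sketched in a few lines: represent each word by a vector of co-occurrence counts and compare vectors with cosine similarity. The context words and counts below are invented for illustration, not taken from a real corpus.

```python
import math

def cosine(u, v):
    """Cosine similarity between two count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Rows: co-occurrence counts with context words [drive, wheel, eat, lunch]
vectors = {
    "car":        [10, 8, 0, 1],
    "automobile": [9, 7, 1, 0],
    "sandwich":   [0, 1, 9, 10],
}

print(cosine(vectors["car"], vectors["automobile"]))  # high: similar contexts
print(cosine(vectors["car"], vectors["sandwich"]))    # low: different contexts
```

Words that occur in similar contexts end up with similar vectors, which is exactly the intuition behind systems like LSA.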

Hyponymy
Hyponymy: the meaning of one lexeme is a subset of the meaning of another.
Since dogs are canids:
- dog is a hyponym of canid, and
- canid is a hypernym of dog
Similarly:
- car is a hyponym of vehicle
- vehicle is a hypernym of car
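The hyponym/hypernym relation forms chains, so checking it is just walking up a table of immediate hypernyms. A minimal sketch with a toy hierarchy (the words and links are illustrative, not WordNet’s actual graph):

```python
# Toy hypernym table: each word points to its immediate hypernym.
HYPERNYM = {
    "dog": "canid",
    "canid": "carnivore",
    "carnivore": "animal",
    "car": "vehicle",
}

def is_hyponym(word, ancestor):
    """True if `ancestor` lies somewhere on `word`'s hypernym chain."""
    while word in HYPERNYM:
        word = HYPERNYM[word]
        if word == ancestor:
            return True
    return False

print(is_hyponym("dog", "canid"))    # direct hypernym
print(is_hyponym("dog", "animal"))   # hyponymy is transitive
print(is_hyponym("car", "animal"))   # different branch of the hierarchy
```

Note that hyponymy is transitive: dog is a hyponym of animal via canid and carnivore.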

WordNet
A hierarchically organized lexical database: an on-line thesaurus + aspects of a dictionary. Versions for other languages are under development.

Category    Unique Forms    # of Senses
Noun        114,648         141,690
Verb        11,306          24,632
Adjective   21,436          31,015
Adverb      4,669           5,808

WordNet Demo: http://www.cogsci.princeton.edu/cgi-bin/webwn

Format of WordNet Entries

WordNet Noun Relations

WordNet Verb and Adj Relations

WordNet Hierarchies

How is “sense” defined in WordNet?
The critical thing to grasp about WordNet is the notion of a synset; it’s their version of a sense or a concept.
Example: table as a verb, meaning defer:
{postpone, hold over, table, shelve, set back, defer, remit, put off}
For WordNet, the meaning of this sense of table is this list.
Another example: give has 45 senses in WN; one of them is {supply, provide, render, furnish}.
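The “a sense is a synset” idea can be sketched directly: a sense is nothing more than a set of lemmas, and a word’s senses are the synsets that contain it. The synsets below are a tiny hand-built sample (following the slide’s examples), not the real WordNet database:

```python
# Each synset is a frozenset of lemmas; a sense IS its synset.
SYNSETS = [
    frozenset({"postpone", "hold_over", "table", "shelve",
               "set_back", "defer", "remit", "put_off"}),
    frozenset({"supply", "provide", "render", "furnish"}),
    frozenset({"table", "tabular_array"}),
]

def senses(lemma):
    """All synsets (i.e., senses) that contain the given lemma."""
    return [s for s in SYNSETS if lemma in s]

for s in senses("table"):
    print(sorted(s))  # "table" belongs to two of the toy synsets
```

Looking a word up means collecting every synset it belongs to; each synset it appears in is one of its senses.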

The Internal Structure of Words
The previous section talked about relations between words. Now we’ll look at the “internal structure” of word meaning. We’ll look at only a few aspects:
- Thematic roles in predicate-bearing lexemes
- Selection restrictions on thematic roles
- Decompositional semantics of predicates
- Feature structures for nouns

Thematic Roles
Thematic roles are semantic generalizations over the specific roles that occur with specific verbs. I.e., takers, givers, eaters, makers, doers, killers all have something in common (-er): they’re all the agents of the actions. We can generalize across other roles as well to come up with a small finite set of such roles.

Thematic Roles

Thematic Role Examples

Thematic Roles
- Take some of the work away from the verbs: it’s not the case that every verb is unique and has to completely specify how all of its arguments uniquely behave.
- Provide a locus for organizing semantic processing.
- Permit us to distinguish near-surface-level semantics from deeper semantics.

Linking
Thematic roles, syntactic categories, and their positions in larger syntactic structures are all intertwined in complicated ways. For example:
- AGENTs are often subjects
- In a VP -> V NP NP rule, the first NP is often a GOAL and the second a THEME

More on Linking
John [AGENT] opened the door [THEME]
The door [THEME] was opened by John [AGENT]
The door [THEME] opened
John [AGENT] opened the door [THEME] with the key [INSTRUMENT]

Inference
Given an event expressed by a verb that expresses a transfer, what can we conclude about the thing labeled THEME with respect to the thing labeled GOAL?

Deeper Semantics
From the WSJ: He melted her reserve with a husky-voiced paean to her eyes.
If we label the constituents He and her reserve as the Melter and Melted, then those labels lose any meaning they might have had. If we make them Agent and Theme, then we can do more inference.

Problems
- What exactly is a role?
- What’s the right set of roles?
- Are such roles universals?
- Are these roles atomic? I.e., agents: animate, volitional, direct causers, etc.
- Can we automatically label syntactic constituents with thematic roles?

Selection Restrictions
I want to eat someplace near campus.
Using thematic roles we can now say that eat is a predicate that has an AGENT and a THEME. What else?
The AGENT must be capable of eating, and the THEME must be something typically capable of being eaten.

As Logical Statements
For eat:
Eating(e) ^ Agent(e,x) ^ Theme(e,y) ^ Isa(y, Food)
(adding in all the right quantifiers and lambdas)
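In code, checking the restriction Isa(y, Food) amounts to walking a type hierarchy. A sketch with a toy hierarchy (the words and links here are made up for illustration):

```python
# Toy type hierarchy: each word points to its immediate hypernym.
HYPERNYM = {
    "hamburger": "food",
    "glass": "artifact",
    "food": "entity",
    "artifact": "entity",
}

def isa(word, category):
    """Isa(word, category): does `category` appear on `word`'s hypernym chain?"""
    while word in HYPERNYM:
        if word == category:
            return True
        word = HYPERNYM[word]
    return word == category

def eat_ok(theme):
    """Does the THEME of `eat` satisfy Isa(theme, food)?"""
    return isa(theme, "food")

print(eat_ok("hamburger"))  # True: hamburger is a kind of food
print(eat_ok("glass"))      # False: glass is an artifact, not a food
```

The logical conjunct Isa(y, Food) thus becomes a simple reachability test in the hierarchy, which is exactly how WordNet hyponyms are used to encode selection restrictions a few slides later.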

Back to WordNet
Use WordNet hyponyms (types) to encode the selection restrictions.

Selectional restrictions as a loose approximation of deeper semantics
Unfortunately, verbs are polysemous and language is creative. WSJ examples:
- … ate glass on an empty stomach accompanied only by water and tea
- you can’t eat gold for lunch if you’re hungry
- … get it to try to eat Afghanistan

Solutions
Eat glass: this is actually about an eating event, so we might change our model of “edible”.
Eat gold: also about eating, and the can’t creates a scope that permits the THEME to not be edible.
Eat Afghanistan: this is harder; it’s not really about eating at all.

Word Sense Disambiguation (WSD)
Given a word in context, decide which sense of the word this is.

Selection Restrictions for WSD
Semantic selection restrictions can be used to help in WSD. They can help disambiguate:
- Ambiguous arguments to unambiguous predicates
- Ambiguous predicates with unambiguous arguments
- Ambiguity all around

WSD and Selection Restrictions
Ambiguous arguments:
- Prepare a dish
- Wash a dish
Ambiguous predicates:
- Serve Denver
- Serve breakfast
Both:
- Serves vegetarian dishes

Problems
As we saw earlier, selection restrictions are violated all the time. This doesn’t mean that the sentences are ill-formed or preferred less than others. This approach needs some way of categorizing and dealing with the various ways that restrictions can be violated.

Supervised Machine Learning Approaches
A training corpus of words tagged in context with their sense is used to train a classifier that can tag words in new text, just as we saw for part-of-speech tagging, speech recognition, and statistical MT.

WSD Tags
What’s a tag? A dictionary sense? For example, for WordNet an instance of “bass” in a text has 8 possible tags or labels (bass1 through bass8).

WordNet Bass
The noun “bass” has 8 senses in WordNet:
1. bass - (the lowest part of the musical range)
2. bass, bass part - (the lowest part in polyphonic music)
3. bass, basso - (an adult male singer with the lowest voice)
4. sea bass, bass - (flesh of lean-fleshed saltwater fish of the family Serranidae)
5. freshwater bass, bass - (any of various North American lean-fleshed freshwater fishes especially of the genus Micropterus)
6. bass, bass voice, basso - (the lowest adult male singing voice)
7. bass - (the member with the lowest range of a family of musical instruments)
8. bass - (nontechnical name for any of numerous edible marine and freshwater spiny-finned fishes)

Representations
Most supervised ML approaches require a very simple representation for the input training data: vectors of feature/value pairs, i.e., files of comma-separated values.
So our first task is to extract training data from a corpus with respect to a particular instance of a target word. This typically consists of a characterization of the window of text surrounding the target.

Representations
This is where ML and NLP intersect:
- If you stick to trivial surface features that are easy to extract from a text, then most of the work is in the ML system.
- If you decide to use features that require more analysis (say, parse trees), then the ML part may be doing less work (relatively), if these features are truly informative.

Surface Representations
Collocational and co-occurrence information:
- Collocational: features about words at specific positions near the target word. Often limited to just word identity and POS.
- Co-occurrence: features about words that occur anywhere in the window (regardless of position). Typically limited to frequency counts.

Examples
Example text (WSJ): An electric guitar and bass player stand off to one side not really part of the scene, just as a sort of nod to gringo expectations perhaps
Assume a window of +/- 2 from the target.


Collocational
Position-specific information about the words in the window:
guitar and bass player stand
[guitar, NN, and, CJC, player, NN, stand, VVB]
word_n-2, POS_n-2, word_n-1, POS_n-1, word_n+1, POS_n+1, …
In other words, a vector consisting of [position n word, position n part-of-speech, …]
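Extracting those position-specific features from a POS-tagged sentence is a simple indexing exercise. A sketch (the `<pad>` symbol for out-of-range positions is a common convention, added here as an assumption; tags follow the slide’s example):

```python
def collocational_features(tagged, i, window=2):
    """Return [w_n-2, POS_n-2, w_n-1, POS_n-1, w_n+1, POS_n+1, w_n+2, POS_n+2]
    around the target at position i."""
    feats = []
    for offset in list(range(-window, 0)) + list(range(1, window + 1)):
        j = i + offset
        if 0 <= j < len(tagged):
            feats.extend(tagged[j])           # word and its POS tag
        else:
            feats.extend(["<pad>", "<pad>"])  # window falls off the sentence
    return feats

tagged = [("guitar", "NN"), ("and", "CJC"), ("bass", "NN"),
          ("player", "NN"), ("stand", "VVB")]
# Target "bass" is at index 2:
print(collocational_features(tagged, 2))
# ['guitar', 'NN', 'and', 'CJC', 'player', 'NN', 'stand', 'VVB']
```

The output matches the feature vector on the slide: two words of context on each side, each paired with its part-of-speech tag.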

Co-occurrence
Information about the words that occur within the window. First derive a set of terms to place in the vector. Then note how often each of those terms occurs in a given window.

Co-Occurrence Example
Assume we’ve settled on a possible vocabulary of 12 words that includes guitar and player but not and and stand:
guitar and bass player stand
[0,0,0,1,0,0,0,0,0,1,0,0]
which are the counts of the words in a predefined vocabulary, e.g., [fish, fishing, viol, guitar, double, cello, …
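Building that co-occurrence vector is just counting against a fixed vocabulary. The 12-word vocabulary below is invented so that guitar and player land at the slide’s positions (the slide only names the first few vocabulary words):

```python
# Hypothetical 12-word vocabulary; positions 3 and 9 are guitar and player.
VOCAB = ["fish", "fishing", "viol", "guitar", "double", "cello",
         "bow", "string", "band", "player", "pick", "amp"]

def cooccurrence_vector(window_words, vocab=VOCAB):
    """Count how often each vocabulary word occurs in the context window."""
    return [window_words.count(w) for w in vocab]

window = ["guitar", "and", "bass", "player", "stand"]
print(cooccurrence_vector(window))
# [0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0]
```

Words outside the vocabulary ("and", "stand", and the target "bass" itself) simply contribute nothing, reproducing the vector from the slide.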

Classifiers
Once we cast the WSD problem as a classification problem, all sorts of techniques are possible:
- Naïve Bayes (the right thing to try first)
- Decision lists
- Decision trees
- Neural nets
- Support vector machines
- Nearest-neighbor methods
- …

Classifiers
The choice of technique depends, in part, on the set of features that have been used:
- Some techniques work better/worse with features with numerical values.
- Some techniques work better/worse with features that have large numbers of possible values. For example, the feature “the word to the left” has a fairly large number of possible values.

Naïve Bayes
Choose the sense s that maximizes P(s|V), where V is the feature vector.
Rewriting with Bayes’ rule: s = argmax_s P(V|s) P(s) / P(V)
Removing the denominator (it is constant across senses): s = argmax_s P(V|s) P(s)
Assuming independence of the features: s = argmax_s P(s) * prod_j P(v_j|s)

Naïve Bayes
P(s): just the prior of that sense. Just as with part-of-speech tagging, not all senses will occur with equal frequency.
P(v_j|s): conditional probability of some particular feature/value combination given a particular sense.
You can get both of these from a tagged corpus with the features encoded.
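Both quantities are relative-frequency counts from the tagged corpus. A minimal Naïve Bayes WSD sketch over feature lists, with add-one smoothing so unseen feature/sense pairs don’t zero out a product (the tiny training set here is invented for illustration):

```python
import math
from collections import Counter, defaultdict

def train_nb(examples):
    """examples: list of (feature_list, sense). Returns the model components."""
    sense_counts = Counter(s for _, s in examples)      # for the prior P(s)
    feat_counts = defaultdict(Counter)                  # for P(v_j|s)
    vocab = set()
    for feats, s in examples:
        feat_counts[s].update(feats)
        vocab.update(feats)
    return sense_counts, feat_counts, vocab, len(examples)

def classify(feats, model):
    """Pick argmax_s log P(s) + sum_j log P(v_j|s), add-one smoothed."""
    sense_counts, feat_counts, vocab, n = model
    best, best_lp = None, float("-inf")
    for s, c in sense_counts.items():
        lp = math.log(c / n)                            # prior
        total = sum(feat_counts[s].values())
        for f in feats:                                 # smoothed likelihoods
            lp += math.log((feat_counts[s][f] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = s, lp
    return best

train = [(["play", "guitar"], "music"), (["play", "band"], "music"),
         (["catch", "river"], "fish"), (["lake", "catch"], "fish")]
model = train_nb(train)
print(classify(["play", "lake"], model))  # music: "play" outweighs "lake"
```

Working in log space avoids underflow from multiplying many small probabilities, which is standard practice for Naïve Bayes.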

Naïve Bayes Test
On a corpus of examples of uses of the word line, naïve Bayes achieved about 73% correct. Good?

Decision Lists
Another popular method…

Learning Decision Lists
- Restrict the lists to rules that test a single feature (1-decision-list rules).
- Evaluate each possible test and rank them based on how well they work.
- Glue the top N tests together and call that your decision list.

Yarowsky
On a binary (homonymy) distinction, used the following metric to rank the tests:
abs(log(P(Sense1|Feature) / P(Sense2|Feature)))
This gives about 95% on this test.
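A sketch of learning such a 1-decision list: score each single-feature test by the absolute log-likelihood ratio, sort, and classify with the first rule that fires. Counts are add-one smoothed so unseen features don’t produce log(0); the toy training data is invented:

```python
import math
from collections import Counter

def learn_decision_list(examples, s1, s2):
    """examples: list of (feature_set, sense). Rank single-feature tests by
    |log P(s1|f) / P(s2|f)|, approximated with smoothed counts."""
    c1, c2 = Counter(), Counter()
    for feats, sense in examples:
        (c1 if sense == s1 else c2).update(feats)
    rules = []
    for f in set(c1) | set(c2):
        llr = math.log((c1[f] + 1) / (c2[f] + 1))       # add-one smoothing
        rules.append((abs(llr), f, s1 if llr > 0 else s2))
    return sorted(rules, reverse=True)                   # strongest test first

def classify(feats, rules, default):
    """Apply the first (highest-ranked) rule whose feature is present."""
    for _, f, sense in rules:
        if f in feats:
            return sense
    return default

train = [({"play", "guitar"}, "music"), ({"play", "band"}, "music"),
         ({"catch", "river"}, "fish"), ({"river", "lake"}, "fish")]
rules = learn_decision_list(train, "music", "fish")
print(classify({"play", "drums"}, rules, "music"))  # music: "play" rule fires
```

Unlike Naïve Bayes, a decision list bases its answer on the single most reliable piece of evidence rather than combining all of them.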

Bootstrapping
What if you don’t have enough data to train a system?
- Pick a word that you, as an analyst, think will co-occur with your target word in a particular sense.
- Grep through your corpus for your target word and the hypothesized word.
- Assume that the target tag is the right one.

Bootstrapping
For bass: assume play occurs with the music sense and fish occurs with the fish sense.
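The seed-labeling step is essentially a grep: label any instance of the target whose context contains a seed collocate, and leave the rest unlabeled. The seeds follow the slide; the example contexts are invented:

```python
# Seed collocates and the sense each one indicates (from the slide).
SEEDS = {"play": "music", "fish": "fish"}

def seed_label(context_words):
    """Assign a sense if a seed collocate appears in the context, else None."""
    for word in context_words:
        if word in SEEDS:
            return SEEDS[word]
    return None

contexts = [
    ["he", "began", "to", "play", "the", "bass"],
    ["caught", "a", "striped", "bass", "fish"],
    ["the", "bass", "was", "loud"],       # no seed: stays unlabeled
]
print([seed_label(c) for c in contexts])
# ['music', 'fish', None]
```

The instances labeled this way become the initial training set; the unlabeled remainder is what the iterative bootstrapping on the next slides tries to absorb.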

Bass Results

Bootstrapping
Perhaps better:
- Use the little training data you have to train an inadequate system.
- Use that system to tag new data.
- Use that larger set of training data to train a new system.
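This loop is self-training. A sketch using a deliberately trivial vote-counting “classifier” so the whole thing is runnable; the classifier, the confidence threshold, and the toy data are all stand-ins for whatever real system you would plug in:

```python
def train(labeled):
    """Trivial stand-in classifier: count which context words vote for which sense."""
    votes = {}
    for feats, sense in labeled:
        for f in feats:
            votes.setdefault(f, {}).setdefault(sense, 0)
            votes[f][sense] += 1
    return votes

def predict(votes, feats):
    """Return (best_sense, confidence) by summing per-word votes."""
    scores = {}
    for f in feats:
        for sense, c in votes.get(f, {}).items():
            scores[sense] = scores.get(sense, 0) + c
    if not scores:
        return None, 0.0
    best = max(scores, key=scores.get)
    return best, scores[best] / sum(scores.values())

def self_train(labeled, unlabeled, rounds=3, threshold=0.9):
    """Repeatedly tag unlabeled data and absorb high-confidence predictions."""
    for _ in range(rounds):
        model = train(labeled)
        remaining = []
        for feats in unlabeled:
            sense, conf = predict(model, feats)
            if sense is not None and conf >= threshold:
                labeled = labeled + [(feats, sense)]  # inadequate system tags new data
            else:
                remaining.append(feats)
        unlabeled = remaining
    return train(labeled)

seed = [({"play", "guitar"}, "music"), ({"catch", "river"}, "fish")]
pool = [{"play", "loud"}, {"river", "boat"}, {"loud", "band"}]
model = self_train(seed, pool)
print(predict(model, {"play"})[0])
```

Note how {"loud", "band"} is unlabeable in round one (no overlap with the seeds) but gets absorbed in round two, once "loud" has picked up a vote from an earlier confident prediction.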

Problems
Given these general ML approaches, how many classifiers do I need to perform WSD robustly? One for each ambiguous word in the language.
How do you decide what set of tags/labels/senses to use for a given word? Depends on the application.

WordNet Bass
Tagging with this set of senses is an impossibly hard task that’s probably overkill for any realistic application:
1. bass - (the lowest part of the musical range)
2. bass, bass part - (the lowest part in polyphonic music)
3. bass, basso - (an adult male singer with the lowest voice)
4. sea bass, bass - (flesh of lean-fleshed saltwater fish of the family Serranidae)
5. freshwater bass, bass - (any of various North American lean-fleshed freshwater fishes especially of the genus Micropterus)
6. bass, bass voice, basso - (the lowest adult male singing voice)
7. bass - (the member with the lowest range of a family of musical instruments)
8. bass - (nontechnical name for any of numerous edible marine and freshwater spiny-finned fishes)

Summary
Concepts about word meaning, including:
- Homonymy, Polysemy, Synonymy
- Thematic roles
Two computational areas:
- Enabling resource: WordNet
- Enabling technology: Word sense disambiguation