CS 4705 Relationships among Words, Semantic Roles, and Word-Sense Disambiguation


1 CS 4705 Relationships among Words, Semantic Roles, and Word-Sense Disambiguation

2 Today
Lexical Relations
  – WordNet
Semantic Roles
  – Review: Semantic Roles
  – Selectional Restrictions
  – Selectional Association
Word-Sense Disambiguation
  – Supervised
  – Unsupervised
  – Evaluation

3 Lexical Relations
Semantic Networks: used to represent lexical relationships
  – e.g. WordNet (George Miller et al.)
  – Most widely used hierarchically organized lexical database for English
  – Synset: a set of synonyms, a dictionary-style definition (or gloss), and some examples of use --> a concept
  – Databases for nouns, verbs, and modifiers
Applications can traverse the network to find synonyms, antonyms, hypernyms and hyponyms…
  – Available for download or online use

4 Homonymy
Homonyms: words with the same form (orthography and pronunciation) but different, unrelated meanings, or senses
  – A bank1 holds investments in a custodial account in the client’s name.
  – As agriculture is burgeoning on the east bank2, the river will shrink even more.

5 bank1 "financial institution," 1474, from either O.It. banca or M.Fr. banque (itself from the O.It. term), both meaning "table" (the notion is of the moneylender's exchange table), from a Gmc. source (cf. O.H.G. bank "bench"); see bank (2). The verb meaning "to put confidence in" (U.S. colloquial) is attested from 1884. Bank holiday is from 1871, though the tradition is as old as the Bank of England. Bankroll (v.) "to finance" is 1920s. To cry all the way to the bank was coined 1956 by flamboyant pianist Liberace, after a Madison Square Garden concert that was packed with patrons but panned by …
bank2 "earthen incline, edge of a river," c.1200, probably in O.E., from O.N. banki, from P.Gmc. *bangkon "slope," cognate with P.Gmc. *bankiz "shelf."

6 Related Phenomena
Homophones (same pronunciation, different orthography): read/red
Homographs (same orthography, different pronunciation): bass/bass

7 Polysemy
Words with multiple but related meanings
  – They rarely serve red meat.
  – He served as U.S. ambassador.
  – He might have served his time in prison.
  – idea bank, sperm bank, blood bank, bank bank
  – Can the two candidate senses be conjoined? ?He served his time and as ambassador to Norway.
  – Same etymology
  – Often a domain-dependent specialization

8 Synonymy
Substitutability: different words, same meaning
  – Old/aged, pretty/attractive, food/sustenance, money
How big is that plane? How large is that plane?
How big are you? How large are you?
What makes words substitutable, and what does not?
  – Polysemy (big in its ‘large’ vs. its ‘old’ sense)
  – Register: He’s really cheap/?parsimonious.
  – Collocational constraints: roast beef vs. ?baked beef; economy fare vs. ?economy price

9 How could we find Synonyms and Collocations automatically?
Synonyms: identify words appearing frequently in similar contexts
  Blast victims were helped by civic-minded passersby.
  Public-spirited passersby came to the aid of this bombing victim.
Collocations: identify synonyms or closely related words that do and don’t appear in similar contexts
  Flu victims, flu sufferers vs. ?Cold victims, cold sufferers…
  Roast turkey vs. Baked turkey
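The "similar contexts" idea can be sketched as follows. This is only an illustration: the toy corpus, the two-word window and the use of cosine similarity are assumptions, not something the slide specifies.

```python
from collections import Counter, defaultdict
from math import sqrt

def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# Toy corpus (made up for illustration).
corpus = [
    "the big plane landed safely",
    "the large plane landed safely",
    "the old plane landed late",
    "she ate roast beef quietly",
]
vecs = context_vectors(corpus)
sim_big_large = cosine(vecs["big"], vecs["large"])  # identical contexts
sim_big_beef = cosine(vecs["big"], vecs["beef"])    # no shared context words
```

Words whose context vectors are close become synonym candidates; a real system would need far more text and some weighting of the raw counts.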

10 Hyponymy
General: hypernym (superordinate)
  – dog is a hypernym of poodle
  – Test: ‘That is a poodle’ implies ‘that is a dog’
Specific: hyponym (underneath)
  – poodle is a hyponym of dog
  – Test: ‘That is a poodle’ implies ‘that is a dog’
Ontology: set of domain objects
Taxonomy: specification of relations between those objects
Object hierarchy: structured hierarchy that supports feature inheritance (e.g. poodle inherits some properties of dog)
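A minimal sketch of a hypernym chain with feature inheritance. The mini-taxonomy and the property names are hypothetical and hand-built; they are not taken from WordNet.

```python
# Hypothetical mini-taxonomy: each word maps to its immediate hypernym.
HYPERNYM = {"poodle": "dog", "dog": "canine", "canine": "mammal", "mammal": "animal"}

# Hypothetical properties attached at different levels of the hierarchy.
PROPERTIES = {"poodle": {"curly_coat"}, "dog": {"barks"}, "mammal": {"has_fur"}}

def hypernyms(word):
    """Return the chain of hypernyms from most specific to most general."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain

def is_hyponym_of(specific, general):
    """'That is a <specific>' implies 'that is a <general>'."""
    return general in hypernyms(specific)

def inherited_properties(word):
    """Collect the word's own properties plus those of all its hypernyms."""
    props = set(PROPERTIES.get(word, set()))
    for ancestor in hypernyms(word):
        props |= PROPERTIES.get(ancestor, set())
    return props
```

Here `is_hyponym_of("poodle", "dog")` holds but the reverse does not, which mirrors the entailment test on the slide.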

11 Tropes, or Figures of Speech
Metaphor: one entity is given the attributes of another (tenor/vehicle/ground)
  – Life is a bowl of cherries. Don’t take it serious….
  – We are the eyelids of defeated caves. ??
  – GM killed the Fiero. (conventional metaphor: corp. as person)
Metonymy: one entity used to stand for another (replacive)
  – GM killed the Fiero.
  – The ham sandwich wants his check. (deferred reference)
Both extend an existing sense to a new meaning
  – Metaphor: completely different concept
  – Metonymy: related concepts

12 Sum
Many definable word relations are useful to NLP in different ways
  – Homonymy, polysemy, synonymy, hypernymy
  – Homography, homophony
  – Metaphor, metonymy
  – Collocations
Resources available to aid in processing
  – WordNet, FrameNet, online dictionaries, …
A Huge Problem for NLP?

13 Ambiguity and Word Sense Disambiguation
Recall: for semantic attachment approaches, what happens when a given lexeme has multiple ‘meanings’?
  Flies [V] vs. Flies [N]
  He robbed the bank. He sat on the bank.
How do we determine the correct sense of the word?
Machine Learning
  – Supervised methods
  – Lightly supervised and unsupervised methods
      Bootstrapping
      Dictionary-based techniques
      Selectional Association

14 Supervised WSD
Approaches:
  – Tag a corpus with correct senses of particular words (lexical sample) or all words (all-words task)
      E.g. SENSEVAL corpora
  – Lexical sample:
      Extract features which might predict word sense
        – POS? Word identity? Punctuation after? Previous word? Its POS?
      Use a machine learning algorithm to produce a classifier which can predict the senses of one word or many
  – All-words:
      Use a semantic concordance: each open-class word labeled with a sense from a dictionary or thesaurus

15 –E.g. SemCor (Brown Corpus), tagged with WordNet senses

16 What Features Are Useful?
“Words are known by the company they keep”
  – How much ‘company’ do we need to look at?
  – What do we need to know about the ‘friends’?
      POS, lemmas/stems/syntactic categories, …
Collocations: words that frequently appear with the target, identified from large corpora
  federal government, honor code, baked potato
  – Position is key
Bag-of-words: words that appear somewhere in a context window
  I want to play a musical instrument so I chose the bass.
  – Ordering/proximity not critical

17 Punctuation, capitalization, formatting
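The two main feature types can be sketched as below. The function names, the window sizes, the `<pad>` marker and the small vocabulary are all illustrative assumptions.

```python
def collocational_features(tokens, target_index, window=2):
    """Position-specific features: the word found at each offset around the target."""
    feats = {}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = target_index + offset
        # Positions falling outside the sentence get a padding marker.
        feats[f"w{offset:+d}"] = tokens[j] if 0 <= j < len(tokens) else "<pad>"
    return feats

def bag_of_words_features(tokens, target_index, window=10, vocabulary=None):
    """Unordered set of words near the target; position and order are ignored."""
    lo = max(0, target_index - window)
    hi = min(len(tokens), target_index + window + 1)
    context = set(tokens[lo:target_index] + tokens[target_index + 1:hi])
    if vocabulary is not None:
        context &= set(vocabulary)
    return context

tokens = "i want to play a musical instrument so i chose the bass".split()
target = tokens.index("bass")
colloc = collocational_features(tokens, target)
bow = bag_of_words_features(tokens, target, vocabulary={"musical", "fish", "play", "river"})
```

For this sentence the collocational features see only "chose the", while the bag-of-words features pick up "musical" and "play" from further away, which is why the two are usually combined.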

18 Rule Induction Learners and WSD
Given a feature vector of values for independent variables associated with observations of values for the training set
Top-down greedy search driven by information gain: how much will the entropy of the (remaining) data be reduced if we split on this feature?
Produce a set of rules that perform best on the training data, e.g.
  – bank2 if w-1==‘river’ & pos==NP & src==‘Fishing News’…
  – …
Easy-to-understand result, but many passes to achieve each decision, and susceptible to over-fitting
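The information-gain criterion can be sketched as follows. The four training examples and the feature names `prev` and `pos` are made up for illustration.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of sense labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(examples, feature):
    """Entropy reduction from splitting (feature_dict, sense) examples on `feature`."""
    labels = [sense for _, sense in examples]
    partitions = {}
    for feats, sense in examples:
        partitions.setdefault(feats[feature], []).append(sense)
    remainder = sum(len(p) / len(labels) * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder

# Toy training set (hypothetical): previous word and POS of the target 'bank'.
examples = [
    ({"prev": "river", "pos": "NN"}, "bank2"),
    ({"prev": "river", "pos": "NN"}, "bank2"),
    ({"prev": "the", "pos": "NN"}, "bank1"),
    ({"prev": "savings", "pos": "NN"}, "bank1"),
]
gain_prev = information_gain(examples, "prev")
gain_pos = information_gain(examples, "pos")
```

A greedy learner would split on `prev` first, since `pos` is constant here and removes no uncertainty. The perfect split on four examples also hints at the over-fitting risk noted on the slide.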

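19 Naïve Bayes
$$\hat{s} = \arg\max_{s \in S} p(s \mid V), \quad\text{or}\quad \hat{s} = \arg\max_{s \in S} \frac{p(V \mid s)\, p(s)}{p(V)}$$
where s is one of the senses S possible for a word w, and V is the input vector of feature values v_1 … v_n for w
Assume the features are independent given s, so the probability of V given s is the product of the probabilities of each feature given s; p(V) is the same for every candidate sense, so it can be dropped
Then
$$\hat{s} = \arg\max_{s \in S} p(s) \prod_{j=1}^{n} p(v_j \mid s)$$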

20 How do we estimate p(s) and p(v_j|s)?
  – p(s_i) is the maximum likelihood estimate from a sense-tagged corpus, count(s_i, w_j)/count(w_j): how likely is bank to mean ‘financial institution’ over all instances of bank?
  – p(v_j|s) is the maximum likelihood estimate of each feature given a candidate sense, count(v_j, s)/count(s): how likely is the previous word to be ‘river’ when the sense of bank is ‘financial institution’?
Calculate for each possible sense and take the highest-scoring sense as the most likely choice
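A minimal sketch of these estimates on a toy sense-tagged sample. Two things here are assumptions rather than part of the slides: add-one smoothing is applied so that unseen features do not zero out a sense, and scores are summed in log space to avoid underflow. The training examples are made up.

```python
from collections import Counter, defaultdict
from math import log

def train_naive_bayes(examples):
    """examples: list of (context_words, sense). Returns the counts needed for scoring."""
    sense_counts = Counter()
    feature_counts = defaultdict(Counter)
    vocabulary = set()
    for features, sense in examples:
        sense_counts[sense] += 1
        for f in features:
            feature_counts[sense][f] += 1
            vocabulary.add(f)
    return sense_counts, feature_counts, vocabulary

def classify(features, model):
    """Return the sense maximizing log p(s) + sum_j log p(v_j | s)."""
    sense_counts, feature_counts, vocabulary = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, float("-inf")
    for sense, n in sense_counts.items():
        score = log(n / total)  # prior p(s)
        denom = sum(feature_counts[sense].values()) + len(vocabulary)
        for f in features:
            # Add-one smoothed estimate of p(v_j | s) (an assumption, see lead-in).
            score += log((feature_counts[sense][f] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

# Toy data (hypothetical): bank1 = financial institution, bank2 = river bank.
training = [
    (["river", "water", "fish"], "bank2"),
    (["river", "mud"], "bank2"),
    (["money", "loan", "deposit"], "bank1"),
    (["money", "account"], "bank1"),
    (["loan", "interest"], "bank1"),
]
model = train_naive_bayes(training)
```

With this data, `classify(["river", "fish"], model)` picks bank2 even though bank1 has the higher prior, because the feature likelihoods outweigh it.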

21 Decision List Classifiers
Transparent
Like case statements applying tests to input in turn
  fish within window --> bass1
  striped bass --> bass1
  guitar within window --> bass2
  bass player --> bass2
  – Yarowsky ‘96’s approach orders tests by individual accuracy on the entire training set, based on log-likelihood ratio
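A sketch of a two-sense decision list ordered by the absolute log-likelihood ratio. The smoothing constant `alpha`, the alphabetical tie-breaking and the toy examples are assumptions made for illustration; here bass1 is the fish and bass2 the musical sense, as on this slide.

```python
from collections import Counter
from math import log

def build_decision_list(examples, senses=("bass1", "bass2"), alpha=0.1):
    """Rank single-feature tests by |log(count(f, s1) / count(f, s2))|, smoothed by alpha."""
    s1, s2 = senses
    counts = {s1: Counter(), s2: Counter()}
    for features, sense in examples:
        counts[sense].update(set(features))
    rules = []
    for f in set(counts[s1]) | set(counts[s2]):
        llr = log((counts[s1][f] + alpha) / (counts[s2][f] + alpha))
        rules.append((abs(llr), f, s1 if llr > 0 else s2))
    # Strongest evidence first; ties broken alphabetically for determinism.
    rules.sort(key=lambda r: (-r[0], r[1]))
    return rules

def classify(features, rules, default):
    """Apply the tests in order and return the sense of the first one that fires."""
    present = set(features)
    for _, feature, sense in rules:
        if feature in present:
            return sense
    return default

# Toy sense-tagged contexts (hypothetical).
examples = [
    (["fish", "striped"], "bass1"),
    (["fish", "river"], "bass1"),
    (["fish", "lake"], "bass1"),
    (["guitar", "player"], "bass2"),
    (["guitar", "band"], "bass2"),
    (["player", "fish"], "bass2"),
]
rules = build_decision_list(examples)
```

Because "fish" also occurs once with bass2, it ends up as the weakest rule, so a context containing both "fish" and "guitar" is decided by "guitar".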

22 Lightly Supervised Methods: Bootstrapping
Bootstrapping I
  – Start with a few labeled instances of the target item as seeds to train an initial classifier, C
  – Use high-confidence classifications of C on unlabeled data as training data
  – Iterate
Bootstrapping II
  – Start with sentences containing words strongly associated with each sense (e.g. sea and music for bass), chosen intuitively, from a corpus, or from dictionary entries, and label those automatically
  – One Sense per Discourse hypothesis
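A heavily simplified sketch of seed-based bootstrapping. The confidence rule used here is an assumption: a word becomes a new indicator only if it appears in at least `min_count` labeled sentences and with a single sense. The sentences are made up, and the One Sense per Discourse heuristic is not modeled.

```python
from collections import Counter, defaultdict

def bootstrap(sentences, seeds, min_count=2, max_rounds=5):
    """seeds: dict sense -> set of seed words. Returns {sentence_index: sense}."""
    tokenized = [set(s.lower().split()) for s in sentences]
    indicators = {sense: set(words) for sense, words in seeds.items()}
    labels = {}
    for _ in range(max_rounds):
        # Step 1: label sentences whose indicator words point to exactly one sense.
        newly_labeled = 0
        for i, tokens in enumerate(tokenized):
            if i in labels:
                continue
            matches = [s for s, words in indicators.items() if tokens & words]
            if len(matches) == 1:
                labels[i] = matches[0]
                newly_labeled += 1
        if newly_labeled == 0:
            break
        # Step 2: promote words seen often enough, and only with one sense, to indicators.
        counts = defaultdict(Counter)
        for i, sense in labels.items():
            for w in tokenized[i]:
                counts[w][sense] += 1
        for w, by_sense in counts.items():
            if len(by_sense) == 1:
                sense, n = next(iter(by_sense.items()))
                if n >= min_count:
                    indicators[sense].add(w)
    return labels

sentences = [
    "he caught a bass in the sea from his boat",
    "the bass swam out to sea past the boat",
    "she plays bass in a music band",
    "music fans heard the bass from the band",
    "a bass jumped beside the boat",
    "the band needs a new bass",
    "the bass was expensive",
]
labels = bootstrap(sentences, {"fish": {"sea"}, "music": {"music"}})
```

The seeds label the first four sentences, "boat" and "band" are then learned as indicators and label the next two, and the last sentence stays unlabeled because nothing in it is discriminating.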

23 Dictionary Approaches
Problem of scale for all ML approaches
  – Building a classifier for each word with multiple senses
Machine-readable dictionaries with senses identified and examples
  – Simplified Lesk: retrieve all content words occurring in the context of the target (e.g. Sailors love to fish for bass.)
  – Compute overlap with sense definitions of the target entry
      » bass1: a musical instrument…
      » bass2: a type of fish that lives in the sea…

24 bass1 /beɪs/ [beys] Music.
  –adjective
    1. low in pitch; of the lowest pitch or range: a bass voice; a bass instrument.
    2. of or pertaining to the lowest part in harmonic music.
  –noun
    3. the bass part.
    4. a bass voice, singer, or instrument.
    5. double bass.
  [Origin: 1400–50; late ME, var. of base2 with ss of basso]
bass2 /bæs/ [bas]
  –noun, plural (especially collectively) bass, (especially referring to two or more kinds or species) bass·es.
    1. any of numerous edible, spiny-finned, freshwater or marine fishes of the families Serranidae and Centrarchidae.
    2. (originally) the European perch, Perca fluviatilis.
  [Origin: 1375–1425; late ME bas, earlier bærs, OE bærs (with loss of r before s as in ass2, passel, etc.); c. D baars, G Barsch, OSw agh-borre]

25 – Choose the sense with the most content-word overlap
– Original Lesk: compare dictionary entries of all content words in context with entries for each sense
– But… dictionary entries are short
    Expand with entries of ‘related’ words that appear in the original entry
    If a tagged corpus is available, collect all the words appearing in context of each sense of the target word
      – e.g. all words appearing in sentences with bass1 added to signature for bass1
      – Weight each by frequency of occurrence of the word with that sense tagged in the corpus (e.g. all senses of bass) to capture how discriminating a word is for the target word’s senses
      – Corpus Lesk performs best of all Lesk approaches
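Simplified Lesk can be sketched as below, using the shortened bass1/bass2 glosses from the Dictionary Approaches slide (bass1 is the musical sense there). The stopword list and the tie-breaking rule (first sense listed wins) are assumptions.

```python
# Small hypothetical stopword list; real systems use a much larger one.
STOPWORDS = {"a", "an", "the", "of", "in", "to", "for", "that", "is"}

def content_words(text, stopwords=STOPWORDS):
    """Lowercase, strip surrounding punctuation and drop stopwords."""
    words = {w.strip(".,!?;:").lower() for w in text.split()}
    return {w for w in words if w and w not in stopwords}

def simplified_lesk(context_sentence, sense_glosses):
    """Return (sense, overlap) for the gloss sharing the most content words with the context."""
    context = content_words(context_sentence)
    best_sense, best_overlap = None, -1
    for sense, gloss in sense_glosses.items():
        overlap = len(context & content_words(gloss))
        if overlap > best_overlap:  # ties keep the earlier sense
            best_sense, best_overlap = sense, overlap
    return best_sense, best_overlap

glosses = {
    "bass1": "a musical instrument",
    "bass2": "a type of fish that lives in the sea",
}
result = simplified_lesk("Sailors love to fish for bass.", glosses)
```

The only overlapping content word is "fish", which is enough to select bass2. It also shows how thin the evidence is when glosses are short, which motivates the expansions listed above.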

26 Disambiguation via Selectional Restrictions
“Verbs are known by the company they keep”
  – Different verbs select for different thematic roles
      wash the dishes (takes washable-thing as patient)
      serve delicious dishes (takes food-type as patient)
Method: another semantic attachment in grammar
  – Semantic attachment rules are applied as sentences are syntactically parsed, e.g.
      VP --> V NP
      V --> serve {theme:food-type}
  – Selectional restriction violation: no parse
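A sketch of restriction checking against a type hierarchy. The hierarchy, the sense names and the restriction table are hypothetical stand-ins for what WordNet and a grammar's semantic attachments would supply.

```python
# Hypothetical type hierarchy: each type or sense maps to its parent type.
ISA = {
    "dish/food": "food-type",
    "dish/crockery": "washable-thing",
    "hat/clothing": "physical-thing",
    "food-type": "physical-thing",
    "washable-thing": "physical-thing",
}

# Hypothetical restriction each verb places on its object.
RESTRICTIONS = {"serve": "food-type", "wash": "washable-thing", "eat": "food-type"}

NOUN_SENSES = {"dishes": ["dish/food", "dish/crockery"], "hat": ["hat/clothing"]}

def is_a(sense, required):
    """Walk up the hierarchy from `sense` looking for `required`."""
    while sense is not None:
        if sense == required:
            return True
        sense = ISA.get(sense)
    return False

def admissible_senses(verb, noun):
    """Senses of `noun` that satisfy the verb's restriction; an empty list means 'no parse'."""
    required = RESTRICTIONS[verb]
    return [s for s in NOUN_SENSES[noun] if is_a(s, required)]
```

`admissible_senses("serve", "dishes")` keeps only the food sense, while `admissible_senses("eat", "hat")` returns an empty list, which is the hard failure that a figurative use such as "I'll eat my hat" would trigger.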

27 But this means we must:
  – Write selectional restrictions for each sense of each predicate, or use FrameNet
      Serve alone has 15 verb senses
  – Obtain hierarchical type information about each argument (using WordNet)
      How many hypernyms does dish have?
      How many words are hyponyms of dish?
But also:
  – Sometimes selectional restrictions don’t restrict enough (Which dishes do you like?)
  – Sometimes they restrict too much (Eat dirt, worm! I’ll eat my hat!)
Can we take a statistical approach?

28 Selectional Association (Resnik ‘97)
Selectional Preference Strength: how much does a predicate tell us about the word class of its argument?
  George is a monster. George cooked a steak.
  – S_R(v): how different is p(c), the probability that any direct object will be a member of some class c, from p(c|v), the probability that a direct object of a specific verb will fall into that class?
1. Estimate conditional probabilities of word senses from a parsed corpus, counting how often each predicate occurs with an object argument
     e.g. How likely is dish to be an object of served?
     Jane served/V the dish/Obj
2. Then estimate the strength of association between each predicate and the super-class (hypernym) of the argument in WordNet
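For reference, the two quantities are usually written as follows. This is the standard formulation of Resnik's measures rather than something shown on the slide, with c ranging over WordNet classes:

```latex
S_R(v) = \sum_{c} p(c \mid v)\,\log\frac{p(c \mid v)}{p(c)}
\qquad
A_R(v, c) = \frac{1}{S_R(v)}\; p(c \mid v)\,\log\frac{p(c \mid v)}{p(c)}
```

S_R(v) is the relative entropy (KL divergence) between p(c|v) and p(c), so a verb like cook, whose objects cluster in a few classes, scores higher than a verb like be. A_R(v, c) is the share of that strength contributed by a single class c.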

29 – E.g. for each object x of serve (e.g. ragout, Mary, dish)
    Look up all of x’s hypernym classes in WordNet (e.g. dish isa piece of crockery, dish isa food item, ragout isa food item, Mary isa person…)
    Distribute “credit” for each of x’s senses occurring with serve among all hypernym classes (≈ senses) to which x belongs (1/n for n classes)
– Pr(c|v) is estimated at count(c,v)/count(v)
– Why does this work?
    Ambiguous words have many superordinate classes
      John served food/the dish/tuna/curry
    The most common sense across all objects of the verb should eventually dominate the likelihood score
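The credit-distribution step can be sketched as below. The class assignments are hypothetical and flattened to one level; a real implementation would read hypernym classes from WordNet.

```python
from collections import defaultdict

# Hypothetical hypernym classes for each observed object noun.
CLASSES = {
    "dish": ["crockery", "food"],
    "ragout": ["food"],
    "tuna": ["food", "fish"],
    "curry": ["food"],
}

def class_probabilities(objects, classes=CLASSES):
    """Estimate Pr(c|v) = count(c, v) / count(v) from the direct objects seen with one verb."""
    credit = defaultdict(float)
    for obj in objects:
        candidates = classes[obj]
        for c in candidates:
            # An ambiguous noun splits its single count evenly: 1/n for n classes.
            credit[c] += 1.0 / len(candidates)
    return {c: credit[c] / len(objects) for c in credit}

def best_class(noun, probs, classes=CLASSES):
    """Pick the class (sense) of `noun` with the highest Pr(c|v)."""
    return max(classes[noun], key=lambda c: probs.get(c, 0.0))

# Objects observed with 'serve' in a toy parsed corpus.
probs = class_probabilities(["dish", "tuna", "curry", "ragout"])
chosen = best_class("dish", probs)
```

Although "dish" is ambiguous, the unambiguous objects "curry" and "ragout" push the food class to 0.75, so the food sense of "dish" is chosen after "serve".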

30 – How can we use this in WSD?
    Choose the class (sense) of the direct object with the highest probability, given the verb
      Mary served the dish proudly.
Results:
  – Baselines:
      random choice of word sense is 26.8%
      choose most frequent sense (NB: requires sense-labeled training corpus) is 58.2%
  – Resnik’s: 44% correct, from a corpus with only pred/arg relations labeled

31 Evaluating WSD
In vivo/end-to-end/task-based/extrinsic vs. in vitro/stand-alone/intrinsic: evaluation in some task (parsing? Q/A? IVR system?) vs. application-independent
  – In vitro metrics: classification accuracy on a held-out test set, or precision/recall/F-measure if not all instances must be labeled
Baseline:
  – Most frequent sense?
  – Lesk algorithms
Ceiling: human annotator agreement
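The in vitro metrics and the most-frequent-sense baseline can be sketched as follows. Using `None` for an instance the system declines to label is an assumed convention, and the gold and predicted labels are made up.

```python
from collections import Counter

def most_frequent_sense_baseline(train_senses, test_senses):
    """Accuracy of always predicting the sense that is most common in training."""
    mfs = Counter(train_senses).most_common(1)[0][0]
    return sum(1 for s in test_senses if s == mfs) / len(test_senses)

def precision_recall_f1(gold, predicted):
    """predicted[i] is a sense, or None when the system abstains on instance i."""
    attempted = [(g, p) for g, p in zip(gold, predicted) if p is not None]
    correct = sum(1 for g, p in attempted if g == p)
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

gold = ["bass1", "bass1", "bass2", "bass1", "bass2"]
predicted = ["bass1", None, "bass2", "bass2", None]
precision, recall, f1 = precision_recall_f1(gold, predicted)
baseline = most_frequent_sense_baseline(["bass1", "bass1", "bass1", "bass2"], gold)
```

When a system labels every instance, precision and recall coincide with plain accuracy; they diverge only when it abstains, as in this example.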

32 Summing Up
Word relations: how can we identify different types?
Disambiguating among word senses
Next time: Ch 17: 3-5
