# Natural Language Processing Lecture 20: Some Problems in Semantic Disambiguation.


## Semantics Road Map

1. Lexical semantics
2. Disambiguating words
   - Word sense disambiguation
   - Coreference resolution
3. Semantic role labeling
4. Meaning representation languages
5. Discourse and pragmatics
6. Compositional semantics, semantic parsing

## On Banks

- bank 1 = “sloping land”
- bank 2 = “financial institution”
- bank 3 = “biological repository”
- bank 4 = “building where a bank 2 does its business”

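## Zeugma Test

How can we tell whether two word senses are distinct?

- The farmers in the valley grew potatoes.
- The farmers in the valley grew bored.
- *The farmers in the valley grew potatoes and bored.

The conjoined sentence (marked *) sounds odd, which suggests that the two uses of “grew” are distinct senses.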

## Word Sense Disambiguation

- Input: a word in context
- Output: its sense (usually from a fixed, predefined set)

How? (The options we’ll discuss)

- Simplified Lesk Algorithm
- Decision List
- Supervised Learning

## Simplified Lesk

The bank can guarantee deposits will eventually cover future tuition costs because it invests in adjustable rate mortgage securities.

- bank.n.01: sloping land (especially the slope beside a body of water) "they pulled the canoe up on the bank"; "he sat on the bank of the river and watched the currents"
- bank.n.02: depository financial institution, bank, banking concern, banking company (a financial institution that accepts deposits and channels the money into lending activities) "he cashed a check at the bank"; "that bank holds the mortgage on my home"

Compute the overlap between the words in the target word’s context and the “signature” of each potential sense (i.e., the words in its definition and examples). The sense with the maximum overlap is the predicted sense.
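The overlap computation can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the stopword list, the `tokenize` helper, and the tie-breaking rule (first sense wins) are assumptions made here, and the signatures are simply the two glosses for "bank" quoted in the example.

```python
import re

# Illustrative stopword list (an assumption; real systems use a fuller one).
STOPWORDS = {"a", "an", "the", "of", "on", "in", "at", "to", "and", "he", "it",
             "that", "they", "will", "can", "up", "into", "my", "because"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens, and drop stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def simplified_lesk(target, context, signatures):
    """Return the sense whose signature shares the most words with the context."""
    context_words = tokenize(context) - {target}
    best_sense, best_overlap = None, -1
    for sense, signature in signatures.items():
        overlap = len(context_words & (tokenize(signature) - {target}))
        if overlap > best_overlap:  # ties go to the earlier sense
            best_sense, best_overlap = sense, overlap
    return best_sense

SIGNATURES = {
    "bank.n.01": "sloping land (especially the slope beside a body of water) "
                 "they pulled the canoe up on the bank; "
                 "he sat on the bank of the river and watched the currents",
    "bank.n.02": "depository financial institution, bank, banking concern, "
                 "banking company (a financial institution that accepts deposits "
                 "and channels the money into lending activities) "
                 "he cashed a check at the bank; "
                 "that bank holds the mortgage on my home",
}
CONTEXT = ("The bank can guarantee deposits will eventually cover future tuition "
           "costs because it invests in adjustable rate mortgage securities.")

predicted = simplified_lesk("bank", CONTEXT, SIGNATURES)
```

Here the context shares "deposits" and "mortgage" with the bank.n.02 signature and nothing with bank.n.01, so the financial sense is chosen.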


## Modifications to Simplified Lesk

- Expand the signatures for each word sense
  - Include hypernyms and/or hyponyms
  - Include their definitions/examples
- Corpus Lesk: add context words from a sense-labeled corpus to each sense’s signature
  - Use inverse document frequency to weight words
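One possible reading of Corpus Lesk with IDF weighting is sketched below. The slide does not say which collection the document frequencies come from; this sketch assumes each expanded signature counts as one document. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def idf_weights(documents):
    """idf(w) = log(N / df(w)), where each document is a set of words."""
    n = len(documents)
    df = Counter(w for doc in documents for w in set(doc))
    return {w: math.log(n / df[w]) for w in df}

def corpus_lesk(context_words, signatures, labeled_corpus):
    """Score each sense by the summed IDF of the words it shares with the context."""
    # Extend each signature with words from sentences labeled with that sense.
    expanded = {sense: set(words) for sense, words in signatures.items()}
    for sense, sentence_words in labeled_corpus:
        expanded[sense] |= set(sentence_words)
    # Assumption: every expanded signature is treated as one "document".
    idf = idf_weights(list(expanded.values()))
    scores = {sense: sum(idf[w] for w in set(context_words) & sig)
              for sense, sig in expanded.items()}
    return max(scores, key=scores.get), scores

# Toy data, invented for illustration only.
signatures = {
    "bank.n.01": {"sloping", "land", "river", "water"},
    "bank.n.02": {"financial", "institution", "deposits", "money"},
}
labeled_corpus = [
    ("bank.n.02", ["the", "bank", "raised", "interest", "rates"]),
    ("bank.n.01", ["the", "bank", "of", "the", "stream", "was", "muddy"]),
]
best, scores = corpus_lesk(["the", "bank", "cut", "interest", "rates"],
                           signatures, labeled_corpus)
```

Words that occur in every signature (here "the" and "bank") get weight zero, so only the sense-specific words "interest" and "rates" contribute to the score.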

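## Decision List (bass)

| Rule | Sense |
| --- | --- |
| fish within window | 1 |
| striped bass | 1 |
| guitar within window | 2 |
| bass player | 2 |
| piano within window | 2 |
| tenor within window | 2 |
| sea bass | 1 |
| play/V bass | 2 |
| river within window | 1 |
| violin within window | 2 |
| salmon within window | 1 |
| on bass | 2 |
| bass are | 1 |

Sense 1 is the fish; sense 2 is the musical sense. Source: Yarowsky (1997).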
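A decision list is applied by testing the rules in order and returning the sense of the first rule that matches. The sketch below encodes the rules from the table; the window size of 5 tokens is an assumption (the slide does not give one), and `play bass` stands in for `play/V bass` because no part-of-speech tagger is used here.

```python
# Ordered rules: (kind, pattern, sense). Sense 1 = fish, sense 2 = music.
RULES = [
    ("window", "fish", 1),
    ("phrase", "striped bass", 1),
    ("window", "guitar", 2),
    ("phrase", "bass player", 2),
    ("window", "piano", 2),
    ("window", "tenor", 2),
    ("phrase", "sea bass", 1),
    ("phrase", "play bass", 2),  # simplification of play/V bass (no POS tags)
    ("window", "river", 1),
    ("window", "violin", 2),
    ("window", "salmon", 1),
    ("phrase", "on bass", 2),
    ("phrase", "bass are", 1),
]

def classify_bass(tokens, window=5, default=None):
    """Return the sense of the first matching rule, or `default` if none match."""
    tokens = [t.lower() for t in tokens]
    if "bass" not in tokens:
        return default
    i = tokens.index("bass")  # only the first occurrence is considered
    nearby = set(tokens[max(0, i - window): i + window + 1])
    padded = " " + " ".join(tokens) + " "  # padding forces whole-token matches
    for kind, pattern, sense in RULES:
        if kind == "window" and pattern in nearby:
            return sense
        if kind == "phrase" and f" {pattern} " in padded:
            return sense
    return default

sense_a = classify_bass("he caught a striped bass in the river".split())
sense_b = classify_bass("she plays guitar and he is on bass tonight".split())
```

Because the list is ordered, an earlier rule wins when several match: the second sentence matches both "guitar within window" and "on bass", and the earlier rule decides.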

## Supervised Learning

We need labeled data. How can we get it?

## Bootstrapping for WSD

1. Produce seeds (dictionary definitions, single defining collocate, or label common collocates)
2. Train supervised classifier on labeled examples
3. Label all examples, and keep labels for high-confidence instances
   - Optional: exploit one sense per discourse
4. Go to 2
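The loop above can be sketched with a deliberately simple classifier. Everything here is an illustrative assumption: the word-count "classifier", the 0.8 confidence threshold, the stopping rule, and the toy contexts (given as content words with the target word removed). The optional one-sense-per-discourse step is omitted, and labels are never revised once assigned.

```python
from collections import Counter, defaultdict

def train(labeled):
    """Count how often each context word co-occurs with each sense."""
    counts = defaultdict(Counter)
    for words, sense in labeled:
        for w in set(words):
            counts[w][sense] += 1
    return counts

def predict(counts, words):
    """Return (sense, confidence); confidence is the winner's share of the votes."""
    votes = Counter()
    for w in set(words):
        votes.update(counts.get(w, {}))
    if not votes:
        return None, 0.0
    sense, top = votes.most_common(1)[0]
    return sense, top / sum(votes.values())

def bootstrap(examples, seeds, threshold=0.8, max_rounds=10):
    """examples: list of token lists; seeds: {defining collocate: sense}."""
    # Step 1: label every example that contains a seed collocate.
    labels = {}
    for i, words in enumerate(examples):
        for w in words:
            if w in seeds:
                labels[i] = seeds[w]
                break
    for _ in range(max_rounds):
        # Step 2: train on the currently labeled examples.
        counts = train([(examples[i], s) for i, s in labels.items()])
        # Step 3: label the remaining examples, keeping confident labels only.
        new_labels = dict(labels)
        for i, words in enumerate(examples):
            if i not in labels:
                sense, conf = predict(counts, words)
                if sense is not None and conf >= threshold:
                    new_labels[i] = sense
        if new_labels == labels:  # nothing new was labeled: stop
            break
        labels = new_labels       # Step 4: go to 2
    return labels

examples = [
    ["caught", "fish", "lake"],    # seed: fish
    ["played", "guitar", "band"],  # seed: guitar
    ["caught", "lake", "boat"],
    ["band", "played", "loud"],
    ["boat", "river"],             # only reachable in the second round
    ["lake", "band"],              # conflicting evidence, stays unlabeled
]
result = bootstrap(examples, {"fish": "FISH", "guitar": "MUSIC"})
```

The fifth example shows why the loop iterates: "boat" carries no evidence until the third example has been labeled in the first round.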

## Coreference Resolution

Mary picked up the ball. She threw it to me.

## Entity Linking

Mary picked up the ball. She threw it to me.

President Park Geun-hye of South Korea ordered the country’s military on Monday to deliver a strong and immediate response to any North Korean provocation, the latest turn in a war of words that has become a test of resolve for the relatively unproven leaders in both the North and South. “I consider the current North Korean threats very serious,” Ms. Park told the South’s generals. “If the North attempts any provocation against our people and country, you must respond strongly at the first contact with them without any political consideration. “As top commander of the military, I trust your judgment in the face of North Korea’s unexpected surprise provocation,” she added. Since Kim Jong-un took power after the death of his father, Kim Jong-il, in late 2011, the North has taken a series of provocative steps and amplified threats against Washington and Seoul to much louder and more menacing levels. The North has launched a three-stage rocket, tested a nuclear device and threatened to hit major American cities with nuclear-armed ballistic missiles. And Mr. Kim has declared that the Korean Peninsula has reverted to a “state of war.”

## Different Kinds of Noun Phrases Can Co-Refer

- Indefinite NPs: a smart programmer, some cheesecake
- Definite NPs: the store around the corner, the friend I was telling you about
- Pronouns: she
- Demonstratives: that one, this, those students
- Names: IBM, Carnegie Mellon

## High-Level Recipe for Coreference Resolution

1. Parse the text and identify NPs; then
2. For every pair of NPs, carry out binary classification: coreferential or not?
3. Collect the results into coreferential chains

What do we need?

- A choice of classifier
- Lots of labeled data
- Features
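Steps 2 and 3 of the recipe can be sketched as follows. The slide does not say how pairwise decisions are collected into chains; this sketch assumes the simplest option, transitive closure via union-find. The `toy_classifier` is a hand-written stand-in for a trained binary classifier, and the NPs are listed by hand rather than produced by a parser.

```python
from itertools import combinations

def build_chains(mentions, is_coreferent):
    """Classify every pair of mentions, then merge positive pairs into chains."""
    parent = list(range(len(mentions)))

    def find(i):
        # Follow parent links to the chain representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Step 2: binary decision for every pair of NPs.
    for i, j in combinations(range(len(mentions)), 2):
        if is_coreferent(mentions[i], mentions[j]):
            parent[find(j)] = find(i)  # merge the two chains

    # Step 3: collect mentions by chain representative.
    chains = {}
    for i, mention in enumerate(mentions):
        chains.setdefault(find(i), []).append(mention)
    return list(chains.values())

# Hypothetical stand-in for a trained classifier: a lookup of positive pairs.
POSITIVE_PAIRS = {("Mary", "She"), ("the ball", "it")}

def toy_classifier(m1, m2):
    return (m1, m2) in POSITIVE_PAIRS

mentions = ["Mary", "the ball", "She", "it", "me"]
chains = build_chains(mentions, toy_classifier)
```

Transitive closure is easy to implement, but a single wrong positive decision merges two whole chains, which is one reason the choice of classifier and features matters.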

## Features?

- Edit distance between the two NPs
- Are the two NPs the same NER type?
- Appositive syntax: “Alan Shepard, the first American astronaut…”
- Proper/definite/indefinite/pronoun
- Gender
- Number
- Distance in sentences
- Number of NPs between
- Grammatical role
- etc.
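A few of these features can be computed for a pair of mentions as sketched below. The `Mention` record and its field values are hypothetical; in practice they would come from a parser, an NER tagger, and gender/number lexicons. Only a subset of the listed features is shown.

```python
from dataclasses import dataclass

@dataclass
class Mention:
    text: str
    sent: int    # index of the sentence containing the NP
    index: int   # position of the NP among all NPs in the document
    ner: str     # NER type, assumed to come from a tagger
    gender: str
    number: str

def edit_distance(a, b):
    """Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]

def pair_features(m1, m2):
    """Feature dictionary for one candidate pair of NPs."""
    return {
        "edit_distance": edit_distance(m1.text.lower(), m2.text.lower()),
        "same_ner": m1.ner == m2.ner,
        "gender_match": m1.gender == m2.gender,
        "number_match": m1.number == m2.number,
        "sentence_distance": abs(m2.sent - m1.sent),
        "nps_between": max(0, abs(m2.index - m1.index) - 1),
    }

# "Mary picked up the ball. She threw it to me."
mary = Mention("Mary", sent=0, index=0, ner="PERSON", gender="f", number="sg")
she = Mention("She", sent=1, index=2, ner="PERSON", gender="f", number="sg")
features = pair_features(mary, she)
```

A dictionary like this would be fed to whichever binary classifier is chosen, after converting it to that classifier's input format.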

## Evaluation

Evaluation is more complicated than true-false. One approach: B-Cubed (Bagga and Baldwin, 1998)

- Input: hypothesis chains and reference chains
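The slide gives only the inputs, so the sketch below follows the commonly cited per-mention formulation of B-Cubed: for each mention, precision is the fraction of its hypothesis chain that also appears in its reference chain, recall is the fraction of its reference chain that also appears in its hypothesis chain, and both are averaged over mentions. It assumes both sets of chains cover the same mentions; a mention missing from the hypothesis is treated as a singleton.

```python
def b_cubed(hypothesis, reference):
    """Return (precision, recall) averaged over the reference mentions."""
    hyp_of = {m: set(chain) for chain in hypothesis for m in chain}
    ref_of = {m: set(chain) for chain in reference for m in chain}
    precision = recall = 0.0
    for m, ref_chain in ref_of.items():
        hyp_chain = hyp_of.get(m, {m})  # unseen mention: singleton chain
        correct = len(hyp_chain & ref_chain)
        precision += correct / len(hyp_chain)
        recall += correct / len(ref_chain)
    n = len(ref_of)
    return precision / n, recall / n

reference = [["a", "b", "c"], ["d"]]
hypothesis = [["a", "b"], ["c", "d"]]
p, r = b_cubed(hypothesis, reference)
```

In this example the per-mention precisions are 1, 1, 1/2, 1/2 and the recalls are 2/3, 2/3, 1/3, 1, giving a precision of 0.75 and a recall of 2/3.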

## Coreference Resolution Demo

http://cogcomp.cs.illinois.edu/page/demo_view/Coref

