CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 4– Wordnet and Word Sense Disambiguation, cntd) Pushpak Bhattacharyya CSE Dept.,

Slides:



Advertisements
Similar presentations
The people Look for some people. Write it down. By the water
Advertisements

THEORY-BOOK.
A.
CS460/IT632 Natural Language Processing/Language Technology for the Web Lecture 2 (06/01/06) Prof. Pushpak Bhattacharyya IIT Bombay Part of Speech (PoS)
Semantics (Representing Meaning)
Syntax-Semantics Mapping Rajat Kumar Mohanty CFILT.
C SC 620 Advanced Topics in Natural Language Processing Lecture 4 1/27/04.
SEMANTICS.
CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 3– Wordnet and Word Sense Disambiguation, cntd) Pushpak Bhattacharyya CSE Dept.,
1 Extended Gloss Overlaps as a Measure of Semantic Relatedness Satanjeev Banerjee Ted Pedersen Carnegie Mellon University University of Minnesota Duluth.
Cognitive Linguistics Croft & Cruse 6 A dynamic construal approach to sense relations I: hyponymy and meronymy.
Statistical NLP: Lecture 3
LING NLP 1 Introduction to Computational Linguistics Martha Palmer April 19, 2006.
1 Noun Homograph Disambiguation Using Local Context in Large Text Corpora Marti A. Hearst Presented by: Heng Ji Mar. 29, 2004.
1 Introduction to Computational Linguistics Eleni Miltsakaki AUTH Fall 2005-Lecture 2.
PSY 369: Psycholinguistics Some basic linguistic theory part3.
C SC 620 Advanced Topics in Natural Language Processing 3/9 Lecture 14.
October 2014 Miss Hughes Maths Subject Leader
Meaning and Language Part 1.
CS : Language Technology for the Web/Natural Language Processing Pushpak Bhattacharyya CSE Dept., IIT Bombay Topic: Hindi Wordnet, Formalization.
1 Indo WordNet A WordNet for Hindi Centre for Technology Development for Indian Languages Computer Science and Engineering Department, IIT Bombay.
Introduction to linguistics II
Session 8 Lexical Semantic
Indo WordNet A WordNet for Hindi
CS : Language Technology for the Web/Natural Language Processing Pushpak Bhattacharyya CSE Dept., IIT Bombay Topic: More on semantic relations.
WORDNET Approach on word sense techniques - AKILAN VELMURUGAN.
9/8/20151 Natural Language Processing Lecture Notes 1.
Transitivity / Intransitivity Lecture 7. (IN)TRANSITIVITY is a category of the VERB Verbs which require an OBJECT are called TRANSITIVE verbs. My son.
Introduction to English Syntax Level 1 Course Ron Kuzar Department of English Language and Literature University of Haifa Chapter 2 Sentences: From Lexicon.
Computational Lexical Semantics Lecture 8: Selectional Restrictions Linguistic Institute 2005 University of Chicago.
APPLIED LINGUISTICS AMBIGUITY. LOOK AT THIS: WHAT IS AMBIGUITY? A word, phrase, or sentence is ambiguous if it has more than one meaning, in other words.
Interpreting Dictionary Definitions Dan Tecuci May 2002.
Representations Floyd Nelson A.D 2009 December 28.
I am ready to test!________ I am ready to test!________
Sight Words.
WORD SENSE DISAMBIGUATION STUDY ON WORD NET ONTOLOGY Akilan Velmurugan Computer Networks – CS 790G.
NLP. Introduction to NLP Is language more than just a “bag of words”? Grammatical rules apply to categories and groups of words, not individual words.
CS 4705 Lecture 19 Word Sense Disambiguation. Overview Selectional restriction based approaches Robust techniques –Machine Learning Supervised Unsupervised.
WordNet: Connecting words and concepts Peng.Huang.
11 Chapter 19 Lexical Semantics. 2 Lexical Ambiguity Most words in natural languages have multiple possible meanings. –“pen” (noun) The dog is in the.
Mathematics Workshop for early years parents September 2015.
Introduction to Semantics and Pragmatics
What is Wordnet Coimbatore Workshop at Amrita University Pushpak Bhattacharyya CSE Dept., IIT Bombay.
Linguistic Essentials
LECTURE 2: SEMANTICS IN LINGUISTICS
Wordnet - A lexical database for the English Language.
Rules, Movement, Ambiguity
The meaning of Language Chapter 5 Semantics and Pragmatics Week10 Nov.19 th -23 rd.
CS460/IT632 Natural Language Processing/Language Technology for the Web Lecture 24 (14/04/06) Prof. Pushpak Bhattacharyya IIT Bombay Word Sense Disambiguation.
Word Meaning and Similarity
High Frequency Words August 31 - September 4 around be five help next
CS460/IT632 Natural Language Processing/Language Technology for the Web Lecture 1 (03/01/06) Prof. Pushpak Bhattacharyya IIT Bombay Introduction to Natural.
Sight Words.
1 Introduction to Computational Linguistics Eleni Miltsakaki AUTH Spring 2006-Lecture 2.
3 Phonology: Speech Sounds as a System No language has all the speech sounds possible in human languages; each language contains a selection of the possible.
High Frequency Words.
SEMANTICS Chapter 10 Ms. Abrar Mujaddidi. What is semantics?  Semantics is the study of the conventional meaning conveyed by the use of words, phrases.
Statistical Methods in NLP Course 10 Diana Trandab ă ț
Created By Sherri Desseau Click to begin TACOMA SCREENING INSTRUMENT FIRST GRADE.
commerce 2. charting 3. surveying appearance 5. venture
Statistical NLP: Lecture 3
SEMASIOLOGY LECTURE 1.
Макет заголовкаМакет заголовка Підзаголовок. The noun is the central lexical unit of language. It is the main nominative unit of speech. As any other.
Lesson 1 – what is descriptive writing
CSC 594 Topics in AI – Applied Natural Language Processing
BBI 3212 ENGLISH SYNTAX AND MORPHOLOGY
Fry Word Test First 300 words in 25 word groups
Linguistic Essentials
Presentation transcript:

CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 4– Wordnet and Word Sense Disambiguation, cntd) Pushpak Bhattacharyya CSE Dept., IIT Bombay 11 th Jan, 2011

Foundation of Wordnet: Lexical Matrix

Meaning-form relationship M1 M2 M3 Mk-1 Mk... W1 W2 W3 Wk-1 Wk... Many to many relationship Meanings container Word forms container

Meaning-form Homonymy (accidental identity or word borrowing) Polysemy (shades of meaning) Eg: Fall 1.The kingdom fell 2.The fruit fell Homography (same picture) Eg: river bank and financial bank Homophony (same sound) Eg: write and right

Distributional Similarity Words which are semantically similar tend to appear in syntactically similar contexts. The neighbors of the words tend to be the same. Technically known as Distributional Similarity.

Synset+Gloss+Example Crucially needed for concept explication, wordnet building using another wordnet and wordnet linking. English Synset: {earthquake, quake, temblor, seism} -- (shaking and vibration at the surface of the earth resulting from underground movement along a fault plane of from volcanic activity) Hindi Synset: { भूकंप, भूचाल, भूडोल, जलजला, भूकम्प, भू - कंप, भू - कम्प, ज़लज़ला, भूमिकंप, भूमिकम्प - प्राकृतिक कारणों से पृथ्वी के भीतरी भाग में कुछ उथल - पुथल होने से ऊपरी भाग के सहसा हिलने की क्रिया " २००१ में गुज़रात में आये भूकंप में काफ़ी लोग मारे गये थे " (shaking of the surface of earth; many were killed in the earthquake in Gujarat) Marathi Synset: धरणीकंप, भूकंप - पृथ्वीच्या पोटात द्रव्यक्षोभ होऊन पृष्ठभाग हालण्याची क्रिया " २००१ साली गुजरातमध्ये झालेल्या धरणीकंपात अनेक लोक मृत्युमुखी पडले "

Semantic Relations Hypernymy and Hyponymy Relation between word senses (synsets) X is a hyponym of Y if X is a kind of Y Hyponymy is transitive and asymmetrical Hypernymy is inverse of Hyponymy (lion->animal->animate entity->entity)

Semantic Relations (continued) Meronymy and Holonymy Part-whole relation, branch is a part of tree X is a meronymy of Y if X is a part of Y Holonymy is the inverse relation of Meronymy {kitchen} ………………………. {house}

Lexical Relation Antonymy Oppositeness in meaning Relation between word forms Often determined by phonetics, word length etc. ({rise, ascend} vs. {fall, descend})

Glossstudy Hyponymy Dwelling,abode bedroom kitchen house,home A place that serves as the living quarters of one or mor efamilies guestroom veranda bckyard hermitage cottage Meronymy Hyponymy MeronymyMeronymy Hypernymy WordNet Sub-Graph

Troponym and Entailment Entailment {snoring – sleeping} Troponym {limp, strut – walk} {whisper – talk}

Entailment Snoring entails sleeping. Buying entails paying. Proper Temporal Inclusion. Inclusion can be in any way. Sleeping temporally includes snoring. Buying temporally includes paying. Co-extensiveness. (Troponymy) Limping is a manner of walking.

Opposition among verbs. {Rise,ascend} {fall,descend} Tie-untie (do-undo) Walk-run (slow,fast) Teach-learn (same activity different perspective) Rise-fall (motion upward or downward) Opposition and Entailment. Hit or miss (entail aim). Backward presupposition. Succeed or fail (entail try.)

The causal relationship. Show- see. Give- have. Causation and Entailment. Giving entails having. Feeding entails eating.

Kinds of Antonymy Size Small - Big Quality Good – Bad State Warm – Cool Personality Dr. Jekyl- Mr. Hyde Direction East- West Action Buy – Sell Amount Little – A lot Place Far – Near Time Day - Night Gender Boy - Girl

Kinds of Meronymy Component-object Head - Body Staff-object Wood - Table Member-collection Tree - Forest Feature-Activity Speech - Conference Place-Area Palo Alto - California Phase-State Youth - Life Resource-process Pen - Writing Actor-Act Physician - Treatment

Gradation State Childhood, Youth, Old age Temperature Hot, Warm, Cold Action Sleep, Doze, Wake

Metonymy Associated with Metaphors which are epitomes of semantics Oxford Advanced Learners Dictionary definition: “The use of a word or phrase to mean something different from the literal meaning” Does it mean Careless Usage?!

Insight from Sanskritic Tradition Power of a word Abhidha, Lakshana, Vyanjana Meaning of Hall: The hall is packed (avidha) The hall burst into laughing (lakshana) The Hall is full (unsaid: and so we cannot enter) (vyanjana)

Metaphors in Indian Tradition upamana and upameya Former: object being compared Latter: object being compared with Puru was like a lion in the battle with Alexander (Puru: upameya; Lion: upamana)

Upamana, rupak, atishayokti upamana: Explicit comparison Puru was like a lion in the battle with Alexander rupak: Implicit comparison Puru was a lion in the battle with Alexander Atishayokti (exaggeration): upamana and upameya dropped Puru’s army fled. But the lion fought on.

Modern study (1956 onwards, Richards et. al.) Three constituents of metaphor Vehicle (items used metaphorically) Tenor (the metaphorical meaning of the former) Ground (the basis for metaphorical extension) “The foot of the mountain” Vehicle: :foot” Tenor: “lower portion” Ground: “spatial parallel between the relationship between the foot to the human body and the lower portion of the mountain with the rest of the mountain”

Interaction of semantic fields (Haas) Core vs. peripheral semantic fields Interaction of two words in metonymic relation brings in new semantic fields with selective inclusion of features Leg of a table Does not stretch or move Does stand and support

Lakoff’s (1987) contribution Source Domain Target Domain Mapping Relations

Mapping Relations: ontological correspondences Anger is heat of fluid in container Heat (i) Container (ii) Agitation of fluid (iii) Limit of resistence (iv) Explosion AngerBody Agitation of mind Limit of ability to suppress Loss of control

Image Schemas Categories: Container Contained Quantity More is up, less is down: Outputs rose dramatically; accidents rates were lower Linear scales and paths: Ram is by far the best performer Time Stationary event: we are coming to exam time Stationary observer: weeks rush by Causation: desperation drove her to extreme steps

Patterns of Metonymy Container for contained The kettle boiled (water) Possessor for possessed/attribute Where are you parked? (car) Represented entity for representative The government will announce new targets Whole for part I am going to fill up the car with petrol

Patterns of Metonymy (contd) Part for whole I noticed several new faces in the class Place for institution Lalbaug witnessed the largest Ganapati Question: Can you have part-part metonymy

Purpose of Metonymy More idiomatic/natural way of expression More natural to say the kettle is boiling as opposed to the water in the kettle is boiling Economy Room 23 is answering (but not *is asleep) Ease of access to referent He is in the phone book (but not *on the back of my hand) Highlighting of associated relation The car in the front decided to turn right (but not *to smoke a cigarette)

Feature sharing not necessary In a restaurant: Jalebii ko abhi dudh chaiye (no feature sharing) The elephant now wants some coffee (feature sharing)

Proverbs Describes a specific event or state of affairs which is applicable metaphorically to a range of events or states of affairs provided they have the same or sufficiently similar image- schematic structure

WSD APPROACHES

WSD – Problem Definition Obtain the sense of A set of target words, or of All words (all word WSD, more difficult) Against a Sense repository (like the wordnet), or A thesaurus (not same as wordnet, does not have semantic relations) Using the Context in which the word appears.

Example word: operation operation ((computer science) data processing in which the result is completely specified by a rule (especially the processing that results from a single instruction)) "it can perform millions of operations per second" operation, military operation (activity by a military or naval force (as a maneuver or campaign)) "it was a joint operation of the navy and air force" operation, surgery, surgical operation, surgical procedure, surgical process (a medical procedure involving an incision with instruments; performed to repair damage or arrest disease in a living body) "they will schedule the operation as soon as an operating room is available"; "he died while undergoing surgery" mathematical process, mathematical operation, operation ((mathematics) calculation by mathematical methods) "the problems at the end of the chapter demonstrated the mathematical processes involved in the derivation"; "they were learning the basic operations of arithmetic"

36 K NOWLEDEGE B ASED v/s M ACHINE L EARNING B ASED v/s H YBRID A PPROACHES Knowledge Based Approaches Rely on knowledge resources like WordNet, Thesaurus etc. May use grammar rules for disambiguation. May use hand coded rules for disambiguation. Machine Learning Based Approaches Rely on corpus evidence. Train a model using tagged or untagged corpus. Probabilistic/Statistical models. Hybrid Approaches Use corpus evidence as well as semantic relations form WordNet.

37 S ELECTIONAL P REFERENCES (I NDIAN T RADITION ) “Desire” of some words in the sentence (“aakaangksha”). I saw the boy with long hair. The verb “saw” and the noun “boy” desire an object here. “Appropriateness” of some other words in the sentence to fulfil that desire (“yogyataa”). I saw the boy with long hair. The PP “with long hair” can be appropriately connected only to “boy” and not “saw”. In case, the ambiguity is still present, “proximity” (“sannidhi”) can determine the meaning. E.g. I saw the boy with a telescope. The PP “with a telescope” can be attached to both “boy” and “saw”, so ambiguity still present. It is then attached to “boy” using the proximity check. 37

38 S ELECTIONAL P REFERENCES (R ECENT L INGUISTIC T HEORY ) There are words which demand arguments, like, verbs, prepositions, adjectives and sometimes nouns. These arguments are typically nouns. Arguments must have the property to fulfil the demand. They must satisfy selectional preferences. Example Give (verb) agent – animate obj – direct obj – indirect I gave him the book I gave him the book (yesterday in the school) -> adjunct How does this help in WSD? One type of contextual information is the information about the type of arguments that a word takes. 38

39 W SD U SING S ELECTIONAL P REFERENCES A ND A RGUMENTS 39 This airlines serves dinner in the evening flight. serve (Verb)‏ agent object – edible This airlines serves the sector between Agra & Delhi. serve (Verb)‏ agent object – sector Sense 1Sense 2 Requires exhaustive enumeration of:  Argument-structure of verbs.  Selectional preferences of arguments.  Description of properties of words such that meeting the selectional preference criteria can be decided. E.g. This flight serves the “region” between Mumbai and Delhi How do you decide if “region” is compatible with “sector”

40 O VERLAP B ASED A PPROACHES Require a Machine Readable Dictionary (MRD). Find the overlap between the features of different senses of an ambiguous word (sense bag) and the features of the words in its context (context bag). These features could be sense definitions, example sentences, hypernyms etc. The features could also be given weights. The sense which has the maximum overlap is selected as the contextually appropriate sense. 40

41 L ESK’S A LGORITHM 41 Sense 1 Trees of the olive family with pinnate leaves, thin furrowed bark and gray branches. Sense 2 The solid residue left when combustible material is thoroughly burned or oxidized. Sense 3 To convert into ash Sense 1 A piece of glowing carbon or burnt wood. Sense 2 charcoal. Sense 3 A black solid combustible substance formed by the partial decomposition of vegetable matter without free access to air and under the influence of moisture and often increased pressure and temperature that is widely used as a fuel for burning AshCoal Sense Bag: contains the words in the definition of a candidate sense of the ambiguous word. Context Bag: contains the words in the definition of each sense of each context word. E.g. “On burning coal we get ash.” In this case Sense 2 of ash would be the winner sense.

42 W ALKER’S A LGORITHM A Thesaurus Based approach. Step 1: For each sense of the target word find the thesaurus category to which that sense belongs. Step 2: Calculate the score for each sense by using the context words. A context words will add 1 to the score of the sense if the thesaurus category of the word matches that of the sense. E.g. The money in this bank fetches an interest of 8% per annum Target word: bank Clue words from the context: money, interest, annum, fetch Sense1: FinanceSense2: Location Money+10 Interest+10 Fetch00 Annum+10 Total30 Context words add 1 to the sense when the topic of the word matches that of the sense