Pushpak Bhattacharyya CSE Dept., IIT Bombay


1 Pushpak Bhattacharyya CSE Dept., IIT Bombay
CS626/449: Speech, NLP and the Web / Topics in AI Programming (Lecture 4: Word Sense Disambiguation; Wordnet) Pushpak Bhattacharyya, CSE Dept., IIT Bombay

2 Word Sense Disambiguation
WSD is a well-known difficult problem.
Questions:
Should the approach be knowledge based, statistical, or combined?
Resources: sense-marked (annotated) corpora; sense repository.
Training: unsupervised, supervised, or semi-supervised.

3 Synonym Distribution principle:
Words A and B are called 'synonyms' if their distribution is identical in a corpus. That means they can replace each other in any context. (Strong requirement; ideal.)
Pure synonyms: if A and B are synonyms in all contexts (can replace each other in all contexts), they are pure synonyms. It has been very difficult to find pure synonyms.
Question: how to ensure replaceability in syntax, semantics, pragmatics, and discourse?

4 Example of replaceability
Consider {mother, mummi, amma}.
Syntax: yes. mother, mummi, amma are all nouns, e.g. "Mother smiles."
[Figure: constituent parse tree and dependency parse of "Mother smiles"; in the dependency parse, mother is the agent of smiles.]
Semantics (semantic roles): replaceable.
Pragmatics: register (fails).
A formal situation, e.g. "Dear Sir, Grant me leave for one day as my mother has to undergo an operation."
A proverb, e.g. "Mother makes the nation."
Register is linguistic memory specific to a situation.

5 Relational and Componential Semantics
Relational Semantics (words can disambiguate each other) vs. Componential Semantics (words need features for disambiguation).
Example. Possible features: Animate, Human, Carnivorous, Small, Moving.
Componential Semantics, semantic feature vectors: cat (animal): <1,0,1,1,1>; cat (expert): <1,1,U,U,1>.
Relational Semantics: cat (animal): {cat, feline}; cat (expert): {cat, expert}.
[Figure: the word "cat" linked to two senses, an animal and an expert.]
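The two representations on this slide can be written down side by side. The following is only a minimal sketch: the helper names are hypothetical, and `None` is an assumed stand-in for the slide's "U" (unknown) entries.

```python
# Feature order follows the slide: Animate, Human, Carnivorous, Small, Moving.
FEATURES = ("Animate", "Human", "Carnivorous", "Small", "Moving")

# Componential view: one feature vector per sense.
# None is an assumed encoding of the slide's "U" (unknown).
COMPONENTIAL = {
    "cat (animal)": (1, 0, 1, 1, 1),
    "cat (expert)": (1, 1, None, None, 1),
}

# Relational view: each sense is identified by the words in its synset.
RELATIONAL = {
    "cat (animal)": {"cat", "feline"},
    "cat (expert)": {"cat", "expert"},
}


def distinguishing_features(a, b):
    """Names of features on which two vectors hold different known values."""
    return [
        name
        for name, x, y in zip(FEATURES, a, b)
        if x is not None and y is not None and x != y
    ]


def senses_containing(words):
    """Senses whose synset contains every one of the given words."""
    return [sense for sense, synset in RELATIONAL.items() if set(words) <= synset]


animal = COMPONENTIAL["cat (animal)"]
expert = COMPONENTIAL["cat (expert)"]
print(distinguishing_features(animal, expert))
print(senses_containing(["cat"]))
print(senses_containing(["cat", "feline"]))
```

In the componential view the two senses are separated by a feature value; in the relational view the word "cat" alone matches both senses and a second word picks one out.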

6 4/20/2017 What is Wordnet CFILT, IIT Bombay

7 Wordnet
A lexical knowledge base based on conceptual lookup. It organizes concepts in a semantic network. It organizes lexical information in terms of word meaning, rather than word form. Wordnet can also be used as a thesaurus.

8 Psycholinguistic Theory
Human lexical memory for nouns is organized as a hierarchy.
Can a canary sing? Pretty fast response.
Can a canary fly? Slower response.
Does a canary have skin? Slowest response.
[Figure: hierarchy Animal (can move, has skin) > Bird (can fly) > canary (can sing).]
Wordnet is a lexical reference system based on psycholinguistic theories of human lexical memory.
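The canary example can be read as property inheritance over a hierarchy: the further up the property is stored, the more levels must be climbed. The sketch below counts those levels. Using the level count as a proxy for response time is an illustrative assumption, and all names are hypothetical.

```python
# Child -> parent links taken from the slide's hierarchy.
PARENT = {"canary": "bird", "bird": "animal"}

# Properties stored at the most general node they apply to.
PROPERTIES = {
    "canary": {"can sing"},
    "bird": {"can fly"},
    "animal": {"can move", "has skin"},
}


def levels_to_property(concept, prop):
    """Number of hierarchy levels climbed before prop is found, or None."""
    levels = 0
    node = concept
    while node is not None:
        if prop in PROPERTIES.get(node, set()):
            return levels
        node = PARENT.get(node)  # None once the root has been passed
        levels += 1
    return None


for question in ("can sing", "can fly", "has skin"):
    print(question, levels_to_property("canary", question))
```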

9 Lexical Matrix

10 Wordnet - Lexical Matrix (with examples)
Word Meanings   Word Forms
                F1              F2            F3            ...                 Fn
M1              (depend) E1,1   (bank) E1,2   (rely) E1,3
M2                              E2,2                        (embankment) E2,…
M3                              E3,2          E3,3
...
Mm                                                                              Em,n
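One way to read the matrix: a row with several entries gives synonyms, a column with several entries gives a polysemous word form. The sketch below encodes only the labeled entries of the slide. Placing "bank" under M2 follows the E2,2 entry in the F2 column and is my reading of the slide; the function names are hypothetical.

```python
# Rows are meanings, entries are the word forms that can express them.
LEXICAL_MATRIX = {
    "M1": {"depend", "bank", "rely"},
    "M2": {"bank", "embankment"},
}


def synonyms(meaning):
    """Word forms in one row: these are synonyms for that meaning."""
    return sorted(LEXICAL_MATRIX[meaning])


def meanings_of(form):
    """Meanings in one column: all rows in which the word form appears."""
    return sorted(m for m, forms in LEXICAL_MATRIX.items() if form in forms)


def is_polysemous(form):
    """A word form is polysemous if its column has more than one entry."""
    return len(meanings_of(form)) > 1


print(synonyms("M1"))
print(meanings_of("bank"))
print(is_polysemous("bank"), is_polysemous("rely"))
```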

11 Wordnet: International Scenario
Wordnet is a network of words linked by lexical and semantic relations. The first wordnet in the world was for English, developed at Princeton over 15 years. The EuroWordNet, a linked structure of European language wordnets, was built in 1998 over 3 years with funding from the EC as a mission mode project. Wordnets for Hindi and Marathi being built at IIT Bombay are amongst the first Indian language (IL) wordnets. All these are proposed to be linked into the IndoWordnet, which eventually will be linked to the English and the Euro wordnets.

12 Linked Wordnets in India
[Figure: linked wordnets: Bengali Wordnet, Dravidian Language Wordnets, Sanskrit Wordnet, Punjabi Wordnet, Hindi Wordnet, North East Language Wordnet, Marathi Wordnet, Konkani Wordnet, English Wordnet.]

13 Great Linguistic Diversity
Major streams: Indo-European, Dravidian, Sino-Tibetan, Austro-Asiatic.
Some languages rank within the top 20 in the world in terms of the populations speaking them:
Hindi and Urdu: 5th (~500 million)
Bangla: 7th (~300 million)
Marathi: 14th (~70 million)

14 Major Language Processing Initiatives
Mostly from the Government: Ministry of IT, Ministry of Human Resource Development, Department of Science and Technology.
Recently, a great drive from the industry, with NLP efforts focused on Indian languages: Google, Microsoft, IBM Research Lab, Yahoo, TCS.

15 Fundamental Design Question
Syntagmatic vs. paradigmatic relations? Psycholinguistics is the basis of the design. When we hear a word, many words come to our mind by association. For English, about half of the associated words are syntagmatically related and half are paradigmatically related.
For cat: animal, mammal are paradigmatic; mew, purr, furry are syntagmatic.

16 Stated Fundamental Application of Wordnet: Sense Disambiguation
Determination of the correct sense of the word.
"The crane ate the fish" vs. "The crane was used to lift the load": bird vs. machine.
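To make the crane example concrete, here is a toy disambiguator that counts how many context words overlap with a hand-picked clue set per sense. This is a simple overlap heuristic chosen for illustration, not the method the lecture prescribes; the clue words and function names are assumptions.

```python
import string

# Hypothetical clue words for each sense of "crane" (hand-picked for this demo).
SENSE_CLUES = {
    "crane": {
        "bird": {"bird", "ate", "eat", "fish", "fly", "wade"},
        "machine": {"machine", "lift", "load", "hoist", "construction"},
    }
}


def tokenize(sentence):
    """Lowercase the words and strip surrounding punctuation."""
    return [w.strip(string.punctuation).lower() for w in sentence.split()]


def disambiguate(word, sentence):
    """Return the sense whose clue set overlaps most with the context, or None."""
    context = set(tokenize(sentence)) - {word}
    scores = {
        sense: len(context & clues) for sense, clues in SENSE_CLUES[word].items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(disambiguate("crane", "The crane ate the fish"))
print(disambiguate("crane", "The crane was used to lift the load."))
```

When no clue word appears in the context, the function returns `None` rather than guessing a sense.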

17 The problem of Sense tagging
Given a corpus, assign the correct sense to the words. This is sense tagging. It needs Word Sense Disambiguation (WSD). Highly important for Question Answering, Machine Translation, and Text Mining tasks.

18 Basic Principle
Words in natural languages are polysemous. However, when synonymous words are put together, a unique meaning often emerges. Use is made of Relational Semantics. Componential Semantics, where each word is a bundle of semantic features (as in the Schankian Conceptual Dependency system or Lexical Componential Semantics), is to be examined as a viable alternative.

19 Componential Semantics
Consider cat and tiger. Decide on componential attributes: Furry, Carnivorous, Heavy, Domesticable.
For cat: (Y, Y, N, Y)
For tiger: (Y, Y, Y, N)
Complete and correct attributes are difficult to design.

20 Semantic relations in wordnet
1. Synonymy
2. Hypernymy / Hyponymy
3. Antonymy
4. Meronymy / Holonymy
5. Gradation
6. Entailment
7. Troponymy
1, 3 and 5 are lexical (word to word); the rest are semantic (synset to synset).

21 Synset: the foundation (house)
1. house -- (a dwelling that serves as living quarters for one or more families; "he has a house on Cape Cod"; "she felt she had to get out of the house")
2. house -- (an official assembly having legislative powers; "the legislature has two houses")
3. house -- (a building in which something is sheltered or located; "they had a large carriage house")
4. family, household, house, home, menage -- (a social unit living together; "he moved his family to Virginia"; "It was a good Christian household"; "I waited until the whole house was asleep"; "the teacher asked how many people made up his home")
5. theater, theatre, house -- (a building where theatrical performances or motion-picture shows can be presented; "the house was full")
6. firm, house, business firm -- (members of a business organization that owns or operates one or more establishments; "he worked for a brokerage house")
7. house -- (aristocratic family line; "the House of York")
8. house -- (the members of a religious community living together)
9. house -- (the audience gathered together in a theatre or cinema; "the house applauded"; "he counted the house")
10. house -- (play in which children take the roles of father or mother or children and pretend to interact like adults; "the children were playing house")
11. sign of the zodiac, star sign, sign, mansion, house, planetary house -- ((astrology) one of 12 equal areas into which the zodiac is divided)
12. house -- (the management of a gambling house or casino; "the house gets a percentage of every bet")

22 Synset: DSF format (1/2)
Synset ID: a unique number identifying a synset
Category: POS category of the words
Concept: the part of the gloss that gives a brief summary of what the synset represents
Example: one or more examples of the words in the synset being used in sentences
Synset: the set of synonymous words comprised in the synset

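23 Synset - DSF format (2/2)
ID :: 121
CATEGORY :: NOUN
CONCEPT :: अपने से छोटों के प्रति हृदय में उठनेवाला प्रेम
EXAMPLE :: “चाचा नेहरू को बच्चों से बहुत ही स्नेह था”
SYNSET :: स्नेह,नेह,लगाव,ममता
(English gloss: the concept is "the love that arises in the heart toward those younger than oneself"; the example reads "Chacha Nehru had great affection for children"; the synset words all mean affection.)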
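A record in this format is easy to read into a dictionary. The sketch below assumes one `KEY :: value` field per line and a comma-separated SYNSET field; the function name `parse_dsf` is hypothetical and is not part of any wordnet tool.

```python
RECORD = """
ID :: 121
CATEGORY :: NOUN
CONCEPT :: अपने से छोटों के प्रति हृदय में उठनेवाला प्रेम
EXAMPLE :: “चाचा नेहरू को बच्चों से बहुत ही स्नेह था”
SYNSET :: स्नेह,नेह,लगाव,ममता
"""


def parse_dsf(record):
    """Parse one synset record, assuming one 'KEY :: value' field per line."""
    entry = {}
    for line in record.strip().splitlines():
        if "::" not in line:
            continue  # skip anything that is not a field line
        key, value = line.split("::", 1)
        entry[key.strip()] = value.strip()
    entry["ID"] = int(entry["ID"])
    # Assumed convention: synset members are separated by commas.
    entry["SYNSET"] = [w.strip() for w in entry["SYNSET"].split(",")]
    return entry


entry = parse_dsf(RECORD)
print(entry["ID"], entry["CATEGORY"], entry["SYNSET"])
```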

24 Creation of Synsets
Three principles: minimality, coverage, replaceability.

25 Synset creation (continued)
Home:
John’s home was decorated with lights on the occasion of Christmas.
Having worked for many years abroad, John returned home.
House:
John’s house was decorated with lights on the occasion of Christmas.
Mercury is situated in the eighth house of John’s horoscope.

26 Synsets (continued)
{house} is ambiguous. {house, home} has the sense of a social unit living together; is this the minimal unit? {family, house, home} will make the unit completely unambiguous. For coverage: {family, household, house, home}, ordered according to frequency. Replaceability of the most frequent words is a requirement.
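The minimality argument can be checked mechanically: a set of words is unambiguous when exactly one sense is shared by all of them. The sense inventory below is a small hypothetical one, loosely modeled on the house example; it is not taken from any actual wordnet.

```python
# Hypothetical sense inventory: word -> senses it can express.
WORD_SENSES = {
    "house": {"dwelling", "social unit", "theatre", "zodiac sign"},
    "home": {"dwelling", "social unit"},
    "family": {"social unit", "lineage"},
    "household": {"social unit"},
}


def shared_senses(words):
    """Senses common to every word in the candidate synset."""
    sense_sets = [WORD_SENSES[w] for w in words]
    return set.intersection(*sense_sets)


def is_unambiguous(words):
    """True when the words together pin down exactly one sense."""
    return len(shared_senses(words)) == 1


for candidate in (["house"], ["house", "home"], ["family", "house", "home"]):
    print(candidate, sorted(shared_senses(candidate)), is_unambiguous(candidate))
```

Under this inventory, each added word narrows the shared senses until only the social unit sense remains.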

27 Synset creation
From first principles: pick all the senses from good standard dictionaries. Obtain synonyms for each sense. This needs long hours of hard work.

28 Synset creation (continued)
From the wordnet of another language in the same family: pick the synset and obtain the sense from the gloss. Get the words of the target language. Often the same words can be used, especially for tatsama words. Operations: translation, insertion, and deletion.
Hindi synset: अनुभवी, जानकार, मँजा हुआ (experienced person)
Marathi synset: अनुभवी, तज्ञ, जाणता, ज्ञाता

29 Gloss and Example
Crucially needed for concept explication, wordnet building using another wordnet, and wordnet linking. {earthquake, quake, temblor, seism} -- (shaking and vibration at the surface of the earth resulting from underground movement along a fault plane or from volcanic activity)

30 Semantic Relations
Hypernymy and Hyponymy: a relation between word senses (synsets). X is a hyponym of Y if X is a kind of Y. Hyponymy is transitive and asymmetrical. Hypernymy is the inverse of hyponymy. (lion -> animal -> animate entity -> entity)
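Transitivity and asymmetry can be seen by walking the chain from the slide. This is a minimal sketch that assumes each synset has a single hypernym and that the links contain no cycle; real wordnets allow multiple hypernyms. The function names are hypothetical.

```python
# Direct hypernym links from the slide's example chain.
HYPERNYM = {
    "lion": "animal",
    "animal": "animate entity",
    "animate entity": "entity",
}


def hypernym_chain(x):
    """All hypernyms of x, nearest first (assumes single parents, no cycles)."""
    chain = []
    while x in HYPERNYM:
        x = HYPERNYM[x]
        chain.append(x)
    return chain


def is_hyponym_of(x, y):
    """X is a hyponym of Y if Y is reachable by following hypernym links."""
    return y in hypernym_chain(x)


print(hypernym_chain("lion"))
print(is_hyponym_of("lion", "entity"))  # transitive
print(is_hyponym_of("entity", "lion"))  # asymmetrical
```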

31 Semantic Relations (continued)
Meronymy and Holonymy: the part-whole relation; a branch is a part of a tree. X is a meronym of Y if X is a part of Y. Holonymy is the inverse relation of meronymy. Example: {kitchen} is a part of {house}.
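Because holonymy is the inverse of meronymy, only one direction needs to be stored; the other can be derived. A small sketch, with a hypothetical `invert` helper and only the two examples from the slide:

```python
from collections import defaultdict

# Whole -> parts (meronyms), using the slide's examples.
MERONYMS = {"house": {"kitchen"}, "tree": {"branch"}}


def invert(relation):
    """Swap the direction of a relation stored as key -> set of values."""
    inverse = defaultdict(set)
    for key, values in relation.items():
        for value in values:
            inverse[value].add(key)
    return dict(inverse)


# Part -> wholes (holonyms), derived rather than stored.
HOLONYMS = invert(MERONYMS)
print(HOLONYMS)
print(invert(HOLONYMS) == MERONYMS)  # inverting twice restores the original
```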

32 Lexical Relation
Antonymy: oppositeness in meaning. A relation between word forms, often determined by phonetics, word length, etc. ({rise, ascend} vs. {fall, descend})

33 Troponym and Entailment
Entailment: {snoring - sleeping}.
Troponymy: {limp, strut - walk}, {whisper - talk}.

34 Entailment. Proper Temporal Inclusion. Co-extensiveness. (Troponymy)
Entailment: snoring entails sleeping; buying entails paying.
Proper temporal inclusion: inclusion can be in any way. Sleeping temporally includes snoring; buying temporally includes paying.
Co-extensiveness (troponymy): limping is a manner of walking.

35 Opposition among verbs.
{rise, ascend} vs. {fall, descend}
Tie - untie (do - undo)
Walk - run (slow, fast)
Teach - learn (same activity, different perspective)
Rise - fall (motion upward or downward)
Opposition and entailment, with backward presupposition: hit or miss (entail aim); succeed or fail (entail try).

36 The causal relationship.
Show - see. Give - have.
Causation and entailment: giving entails having; feeding entails eating.

38 Kinds of Antonymy
Size: Small - Big
Quality: Good - Bad
State: Warm - Cool
Personality: Dr. Jekyll - Mr. Hyde
Direction: East - West
Action: Buy - Sell
Amount: Little - A lot
Place: Far - Near
Time: Day - Night
Gender: Boy - Girl

39 Kinds of Meronymy
Component-object: Head - Body
Stuff-object: Wood - Table
Member-collection: Tree - Forest
Feature-activity: Speech - Conference
Place-area: Palo Alto - California
Phase-state: Youth - Life
Resource-process: Pen - Writing
Actor-act: Physician - Treatment

40 Gradation
State: Childhood, Youth, Old age
Temperature: Hot, Warm, Cold
Action: Sleep, Doze, Wake

41 WordNet Sub-Graph (English)
[Figure: sub-graph centered on {house, home}.
Gloss: a place that serves as the living quarters of one or more families.
Hypernymy: {dwelling, abode}.
Hyponymy: hermitage, cottage.
Meronymy: study, bedroom, kitchen, guestroom, veranda, backyard.]

42 WordNet Sub-Graph: Hindi
[Figure: sub-graph centered on गाय, गऊ (gaaya, gauu): cow.
Gloss: सींगवाला एक शाकाहारी मादा चौपाया (siingwaalaa eka sakaahaarii maadaa choupaayaa): a horned, herbivorous, four-legged female animal.
Hypernym: चौपाया, पशु (chaupaayaa, pashu): four-legged animal.
Hyponym: कामधेनु (kaamadhenu), मैनी गाय (mainii gaaya): kinds of cow.
Meronym: थन (thana): udder; पूँछ (puunchh): tail.
Antonym: बैल (baila): ox.
Attribute: शाकाहारी (shaakaahaarii): herbivorous.
Ability verb: पगुराना (paguraanaa): ruminate.]

43 Wordnet Subgraph (Marathi)
[Figure: sub-graph centered on झाड, वृक्ष, तरू (tree).
Gloss: मुळे,खोड,फांद्या,पाने इत्यादींनी युक्त असा वनस्पतिविशेष:"झाडे पर्यावरण शुद्ध करण्याचे काम करतात" (a particular plant having roots, trunk, branches, leaves, etc.: "Trees do the work of purifying the environment").
Hypernymy: वनस्पती (plant).
Hyponymy: लिंबू (lemon), आंबा (mango).
Meronymy: मूळ (root), खोड (trunk).
Holonymy: रान (forest), बाग (garden).]

44 Pan-India Dictionary Standard
Columns: Senses, Hindi, Marathi, Bengali, Oriya, Tamil.
Generic row: (W1, W2, W3, W4, W5, W6), (W1, W2, W3), (W1, W2, W3), (W1, W2, W3, W4)
Sense (sun): Hindi (सूर्य, सूरज, भानु, भास्कर, प्रभाकर, दिनकर, अंशुमान, अंशुमाली); Marathi (सूर्य, भानु, दिवाकर, भास्कर, रवि, दिनेश, दिनमणी); ...
Sense (cub, lad, laddie, sonny, sonny boy): Hindi (लड़का, बालक, बच्चा, छोकड़ा, छोरा, छोकरा, लौंडा); Marathi (मुलगा, पोरगा, पोर, पोरगे)
Sense (son, boy): Hindi (पुत्र, बेटा, लड़का, लाल, सुत, बच्चा, नंदन, पूत, चिरंजीव, चिरंजी); Marathi (मुलगा, पुत्र, लेक, चिरंजीव, तनय)

45 Sanskrit Wordnet: a new effort- A column in the Concept based Multilingual dictionary
Columns: Concepts (Concept ID: concept description); L1 (English) (W1, W2, W3, ..); L2 (Hindi) (W4, W5, W6, ..); L3 (Sanskrit) (W7, W8, W9, ..).
4066: any of various long-tailed primates (excluding the prosimians). English: (monkey). Hindi: (बंदर, बन्दर, बानर, वानर, कीश, कपि, मर्कट, ..). Sanskrit: (वानरः, कपिः, प्लवङ्गः, प्लवगः, शाखामृगः, वलीमुखः, मर्कटः, ..).
2186: a typical star that is the source of light and heat for the planets in the solar system. English: (sun). Hindi: (सूर्य,सूरज, भानु, दिवाकर, भास्कर, प्रभाकर, दिनकर, रवि, ..). Sanskrit: (सूर्यः, सविता, आदित्यः, मित्रः, अरुणः, भानुः, पूषा, अर्कः, ..).
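A concept-based multilingual dictionary can be pictured as a table keyed by concept ID, with one word list per language. The sketch below holds only a few of the words shown on the slide (the slide's lists are themselves truncated with ".."); the structure and the `translate` helper are assumptions for illustration, not the actual storage format.

```python
# Concept ID -> gloss and per-language word lists (partial, from the slide).
CONCEPTS = {
    4066: {
        "gloss": "any of various long-tailed primates (excluding the prosimians)",
        "words": {
            "English": ["monkey"],
            "Hindi": ["बंदर", "वानर", "कपि"],
            "Sanskrit": ["वानरः", "कपिः", "मर्कटः"],
        },
    },
    2186: {
        "gloss": "a typical star that is the source of light and heat "
                 "for the planets in the solar system",
        "words": {
            "English": ["sun"],
            "Hindi": ["सूर्य", "सूरज", "भानु"],
            "Sanskrit": ["सूर्यः", "सविता", "आदित्यः"],
        },
    },
}


def translate(word, source, target):
    """For each concept that word expresses in source, list the target words."""
    return [
        (concept_id, entry["words"][target])
        for concept_id, entry in CONCEPTS.items()
        if word in entry["words"][source]
    ]


print(translate("sun", "English", "Sanskrit"))
print(translate("कपि", "Hindi", "English"))
```

Lookup goes through the concept, so a polysemous source word would return one entry per concept it belongs to.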

46 Summary
Synsets: basic units.
Principles of creation: minimality, coverage, replaceability.
Semantic relations (main ones): hypernymy (is-a), meronymy (part-of), antonymy, troponymy (manner-of).

