1 Nov 2001 IS202: Information Organization and Retrieval Lexical Relations and WordNet Ray Larson & Warren Sack University of California, Berkeley School.

1 IS202: Information Organization and Retrieval, 1 Nov 2001
Lexical Relations and WordNet
Ray Larson & Warren Sack
University of California, Berkeley, School of Information Management and Systems
SIMS 202: Information Organization and Retrieval
Lecture author: Warren Sack

2 Last Time
What is Cognitive Science?
What is Artificial Intelligence?
–Knowledge Representation Languages and Programming Paradigms
–Representing Common Sense
Common Sense Interfaces
Story Understanding, Story Generation, and Common Sense

3 Cognitive Science
10/30/01 – AI, knowledge representation and common sense
11/01/01 – Computational Linguistics, Cognitive Psychology and Lexical Knowledge
11/06/01 – AI and information extraction
11/08/01 – Linguistics, Philosophy, Psychology, categories, and cognition

4 Today
Lexical relations
–Linguistics
  Two approaches to semantics:
  –Compositional
  –Relational
–Psycholinguistics
WordNet
–Description
–Structure
–Applications

5 Levels of Linguistic Analysis
Sentences
–Phonological/Morphological analysis
–Syntactic analysis
–Semantic analysis
More than one sentence
–Pragmatic analysis

6 Phonology/Morphology
Phonology: The study of the systems of sounds which are manifested in natural languages; the significant contrasts between sounds that are relevant to meaning.
–E.g., consonants, vowels, stress, intonation, etc.
Morphology: The forms of words
–E.g., word=watched; morphs=watch+ed; morphemes=watch+past

7 Syntax
The syntax of a language is to be understood as a set of rules which accounts for the distribution of word forms throughout the sentences of a language. These rules codify permissible combinations of classes of word forms.

8 Semantics
Semantics is the study of linguistic meaning.
Two standard approaches to lexical semantics (cf. sentential semantics and logical semantics):
–(1) compositional
–(2) relational
Other approaches…

9 Pragmatics
Deixis
–E.g., “I’ll be back in an hour” depends upon the time of the utterance.
Conversational implicature
–A: “Can you tell me the time?”
–B: “Well, the milkman has come.” [I don’t know exactly, but perhaps you can deduce it from some extra information I give you.]
Presupposition
–“Are you still such a bad driver?”
Speech acts
–Constatives vs. performatives
–E.g., “I second the motion.”
Conversational structure
–E.g., turn-taking rules

10 Lexical Semantics: Compositional Approach
Compositional lexical semantics, introduced by Katz & Fodor (1963), analyzes the meaning of a word in much the same way a sentence is analyzed into semantic components. The semantic components of a word are not themselves considered to be words, but are abstract elements (semantic atoms) postulated in order to describe word meanings (semantic molecules) and to explain the semantic relations between words. For example, the representation of bachelor might be ANIMATE and HUMAN and MALE and ADULT and NEVER MARRIED. The representation of man might be ANIMATE and HUMAN and MALE and ADULT; because all the semantic components of man are included in the semantic components of bachelor, it can be inferred that bachelor → man. In addition, there are implicational rules between semantic components, e.g. HUMAN → ANIMATE, which also look very much like meaning postulates.
George Miller, “On Knowing a Word,” 1999
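
The inference in the bachelor and man example can be sketched as a subset test over feature sets. This is only an illustration: the dictionary name FEATURES and the function entails are hypothetical, and the two feature inventories are copied from the example rather than from any real lexicon.

```python
# Hypothetical feature inventories, copied from the bachelor/man example.
FEATURES = {
    "man": frozenset({"ANIMATE", "HUMAN", "MALE", "ADULT"}),
    "bachelor": frozenset({"ANIMATE", "HUMAN", "MALE", "ADULT", "NEVER MARRIED"}),
}


def entails(word_a: str, word_b: str) -> bool:
    """Return True if every semantic component of word_b is also in word_a."""
    return FEATURES[word_b] <= FEATURES[word_a]


print(entails("bachelor", "man"))  # True: man's components are a subset
print(entails("man", "bachelor"))  # False: man lacks NEVER MARRIED
```

Under this encoding, "bachelor → man" holds exactly because the components of man are included in those of bachelor, and the converse fails.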

11 Lexical Semantics: Relational Approach
Relational lexical semantics was first introduced by Carnap (1956) in the form of meaning postulates, where each postulate stated a semantic relation between words. A meaning postulate might look something like dog → animal (if x is a dog then x is an animal) or, adding logical constants, bachelor → man and never married [if x is a bachelor then x is a man and not(x has married)] or tall → not short [if x is tall then not(x is short)]. The meaning of a word was given, roughly, by the set of all meaning postulates in which it occurs.
George Miller, “On Knowing a Word,” 1999
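
One way to picture meaning postulates is as a table of implications that can be chained. The sketch below is an assumption-laden toy: the names POSTULATES and consequences are hypothetical, negation is encoded as a plain "not " prefix, and the postulate man → human is not in the quoted text but is added only to show chaining.

```python
from typing import Dict, List, Set

# Hypothetical encoding: each word maps to the predicates it implies.
# A leading "not " marks a negated predicate.
POSTULATES: Dict[str, List[str]] = {
    "dog": ["animal"],
    "bachelor": ["man", "not has_married"],
    "tall": ["not short"],
    "man": ["human"],  # assumption, added to demonstrate chaining
}


def consequences(word: str) -> Set[str]:
    """Collect everything that follows from `word` by chaining postulates."""
    found: Set[str] = set()
    pending = [word]
    while pending:
        current = pending.pop()
        for implied in POSTULATES.get(current, []):
            if implied not in found:
                found.add(implied)
                pending.append(implied)
    return found


print(sorted(consequences("bachelor")))
```

In this picture the "meaning" of a word is roughly the set of postulates that mention it, so consequences("bachelor") gathers what the table licenses about bachelors.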

12 Psycholinguistics
The introduction of Noam Chomsky’s theory of syntax to psychologists: Miller, G.A., Galanter, E., Pribram, K.H. (1960) Plans and the Structure of Behavior.
Some areas of psycholinguistics:
–Children’s acquisition of language
–First and second language learning
–Artificial intelligence? (see Lyons, 1981)

13 WordNet
Started in 1985 by George Miller, students, and colleagues at the Cognitive Science Laboratory, Princeton University
Can be downloaded for free: www.cogsci.princeton.edu/~wn/
In terms of coverage, WordNet’s goals differ little from those of a good standard college-level dictionary, and the semantics of WordNet is based on the notion of word sense that lexicographers have traditionally used in writing dictionaries. It is in the organization of that information that WordNet aspires to innovation. (Miller, 1998, chapter 1)

14 Presuppositions of WordNet project
Separability hypothesis: The lexical component of language can be separated and studied in its own right.
Patterning hypothesis: People have knowledge of the systematic patterns and relations between word meanings.
Comprehensiveness hypothesis: Computational linguistics programs need a store of lexical knowledge that is as extensive as that which people have.

15 WordNet structure
Synsets versus Words

16 WordNet: Size
POS          Unique Strings    Synsets
Noun                 107930      74488
Verb                  10806      12754
Adjective             21365      18523
Adverb                 4583       3612
Totals               144684     109377

17 Structure of WordNet

18 Structure of WordNet

19 Structure of WordNet

20 Unique Beginners
{ entity, something, (anything having existence (living or nonliving)) }
{ psychological_feature, (a feature of the mental life of a living organism) }
{ abstraction, (a general concept formed by extracting common features from specific examples) }
{ state, (the way something is with respect to its main attributes; "the current state of knowledge"; "his state of health"; "in a weak financial state") }
{ event, (something that happens at a given place and time) }
{ act, human_action, human_activity, (something that people do or cause to happen) }
{ group, grouping, (any number of entities (members) considered as a unit) }
{ possession, (anything owned or possessed) }
{ phenomenon, (any state or process known through the senses rather than by intuition or reasoning) }
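
The brace notation used for the unique beginners pairs a list of word forms with a parenthesized gloss, which is the synset-versus-word distinction in miniature. The following is a minimal sketch of reading that notation as it appears on the slide; the Synset type and parse_synset function are hypothetical helpers, not part of any WordNet distribution, and the parser assumes the gloss is the last element and starts at the first opening parenthesis.

```python
from typing import List, NamedTuple


class Synset(NamedTuple):
    words: List[str]  # the word forms that share this sense
    gloss: str        # the definition, with any example phrases


def parse_synset(text: str) -> Synset:
    """Split '{ w1, w2, (gloss) }' into its word forms and its gloss."""
    body = text.strip().lstrip("{").rstrip("}").strip()
    start = body.index("(")   # the gloss opens at the first parenthesis
    end = body.rindex(")")    # and closes at the last one (glosses may nest)
    words = [w.strip() for w in body[:start].split(",") if w.strip()]
    return Synset(words, body[start + 1:end])


entity = parse_synset(
    "{ entity, something, (anything having existence (living or nonliving)) }"
)
print(entity.words)  # ['entity', 'something']
print(entity.gloss)  # anything having existence (living or nonliving)
```

A word such as "something" is just one member of the list; the unit that carries the meaning is the whole synset.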

21 Roget’s “Unique Beginners”
The ontology of Roget’s is headed by six Classes. The first three Classes cover the external world: Abstract Relations deals with such ideas as number, order and time; Space is concerned with movement, shapes and sizes, while Matter covers the physical world and humankind’s perception of it by means of five senses. The remaining Classes deal with the internal world of human beings: the mind (Intellect), the will (Volition), the heart and soul (Emotion, Religion and Morality). There is a logical progression from abstract concepts, through the material universe, to mankind itself, culminating in what Roget saw as mankind’s highest achievements: morality and religion (Kirkpatrick, 1998).
Class Four, Intellect, is divided into Formation of ideas and Communication of ideas, and Class Five, Volition, into Individual volition and Social volition. In practice, therefore, the Thesaurus is headed by eight Classes.
A path in Roget’s ontology always begins with one of the Classes. It branches to one of the 39 Sections and then to one of the 990 Heads. Each Head is divided into paragraphs grouped by parts of speech: nouns, adjectives, verbs and adverbs.
From Mario Jarmasz, Stan Szpakowicz, “Roget’s Thesaurus as an Electronic Lexical Knowledge Base,” 2000.

22 WordNet Browsers
http://www.cogsci.princeton.edu/cgi-bin/webwn
http://bogart.sip.ucm.es/~jorge/browser.htm
http://www.visualthesaurus.com/

23 Other WordNets
http://www.hum.uva.nl/~ewn/gwa/wordnet_table.htm
Dutch, Spanish, Italian, German, French, Czech, Estonian

24 Forthcoming WordNets
http://www.hum.uva.nl/~ewn/gwa/wordnet_table.htm
Bengali, Bulgarian, Danish, Greek, Hebrew, Hindi, Kannada, Latvian, Moldavian, Romanian, Russian, Slovenian, Swedish, Tamil, Thai, Turkish, Yugoslavian, Norwegian, Icelandic

25 Psycholinguistic evidence for WordNet’s structure
Bever and Rosenbaum, 1970:
–A pistol is more dangerous than a rifle.
–* A pistol is more dangerous than a gun.
–* A gun is more dangerous than a pistol.
Resnik, 1993:
–The direct object of the verb drink can be any hyponym of the noun beverage.
Collins and Quillian, 1969:
–The time required to verify the statement “A robin is a bird” is shorter than the time required to verify the statement “A robin is an animal.”
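
The robin example can be read as a claim about distance in a hypernym hierarchy: the more levels separate two nodes, the longer the verification takes. The sketch below uses a toy taxonomy to make that distance concrete. The table HYPERNYM and both function names are hypothetical, and the links are simplified assumptions (WordNet itself places intermediate nodes between bird and animal).

```python
from typing import Dict, List, Optional

# Toy hypernym links (child -> parent); simplified, not WordNet's real chains.
HYPERNYM: Dict[str, str] = {
    "robin": "bird",
    "bird": "animal",
    "pistol": "gun",
    "rifle": "gun",
}


def hypernym_path(word: str) -> List[str]:
    """Return the chain from `word` up to the top of the toy taxonomy."""
    path = [word]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path


def levels_up(word: str, ancestor: str) -> Optional[int]:
    """Count hypernym steps from `word` to `ancestor`; None if unrelated."""
    path = hypernym_path(word)
    return path.index(ancestor) if ancestor in path else None


print(levels_up("robin", "bird"))     # 1
print(levels_up("robin", "animal"))   # 2
print(levels_up("pistol", "rifle"))   # None: co-hyponyms, neither is above the other
```

The same table also separates the Bever and Rosenbaum cases: pistol and rifle are siblings under gun, whereas pistol and gun stand in a hyponym relation.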

26 Psycholinguistic evidence against WordNet’s structure
Smith and Medin, 1981:
–The time required to verify that a chicken is a bird is significantly longer than the time required to verify that a robin is a bird, even though chicken and robin stand in the same taxonomic relation to bird.
Rosch, 1973:
–Ratings of “typicality” have little to do with frequency or familiarity.
Lakoff, 1987:
–Concepts are represented, not by a list of distinguishing features, but by the focal instances (or prototypes) that are the best examples of the concept.

27 WordNet Applications
Using WordNet as a data structure.
Many languages used by computational linguists and natural language processing researchers now have WordNet packages. E.g., for Perl:
–Lingua::Wordnet, and
–Lingua::Wordnet::Analysis
by Dan Brian, http://search.cpan.org/search?dist=Lingua-Wordnet

28 WordNet Applications
Information Retrieval: Voorhees, 1998
–Query expansion via synsets
–“Sense-based” rather than “stem-based” vectors
–Unfortunately, in both cases, the inability to automatically resolve word senses prevented any improvement from being made.
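
Query expansion via synsets can be sketched in a few lines, and the sketch also shows where the sense-resolution problem comes from. This is not Voorhees’s procedure, only a naive illustration: the SYNSETS list and expand_query function are hypothetical, and the three synsets are invented toy data rather than WordNet entries.

```python
from typing import List

# Invented toy synsets; "bank" deliberately appears in two senses.
SYNSETS: List[List[str]] = [
    ["car", "auto", "automobile"],
    ["bank", "depository"],
    ["bank", "riverbank"],
]


def expand_query(terms: List[str]) -> List[str]:
    """Add every synonym from every synset that contains a query term."""
    expanded: List[str] = []
    for term in terms:
        candidates = [term]
        for synset in SYNSETS:
            if term in synset:
                candidates.extend(synset)
        for word in candidates:
            if word not in expanded:  # keep first occurrence, preserve order
                expanded.append(word)
    return expanded


print(expand_query(["car", "bank"]))
# ['car', 'auto', 'automobile', 'bank', 'depository', 'riverbank']
```

Without a way to pick the intended sense of "bank", the expansion pulls in both "depository" and "riverbank", which is the kind of noise that can cancel any gain from the added synonyms.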

29 WordNet Applications
Textual Cohesion and the correction of Malapropisms: Hirst and St-Onge, 1998
Malapropism = the confounding of an intended word with another word of similar sound or similar spelling that has a quite different meaning; e.g., “Super bowl → Superb owl”

30 WordNet Applications
Temporal Indexing through lexical chaining: Al-Halimi and Kazman, 1998
Indexing transcripts of conference meetings by topic.

31 WordNet Applications
Conversation themes in Usenet: Sack, 2000

32 Next Time
Information Extraction, Artificial Intelligence, and “Story Understanding” Revisited

