1 Natural Language Processing (2a) Zhao Hai 赵海 Department of Computer Science and Engineering Shanghai Jiao Tong University 2010-2011

2 Outline
Lexicons and Lexical Analysis
- Lexicon: A Language Resource
- A Lexicon for English Words: WordNet

3 Lexicon: A Language Resource (1)
Features for Lexicons (1)
A lexicon here means a machine-readable dictionary with the following features:
- It provides, in full detail, all the information that a conventional dictionary contains;
- Building on semantic descriptions, it records the syntagmatic and paradigmatic relations of each word, e.g.:
  red + flower, green + leaf, big + eye (syntagmatic relations: words that combine in sequence);
  red, green, and big; flower, leaf, and eye (paradigmatic relations: words that can replace one another in the same slot).
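
As a rough illustration, assume a lexicon stores syntagmatic relations as attested (modifier, head) pairs. Paradigmatic classes can then be read off as the sets of words that fill the same slot. The pair list and the function name `paradigmatic_classes` are hypothetical; this is a sketch, not how any particular lexicon stores these relations.

```python
from collections import defaultdict

# Hypothetical syntagmatic relations: (modifier, head) pairs that combine in sequence.
SYNTAGMATIC = [("red", "flower"), ("green", "leaf"), ("big", "eye")]

def paradigmatic_classes(pairs):
    """Group words by the slot they occupy; words in one slot can substitute for each other."""
    slots = defaultdict(set)
    for modifier, head in pairs:
        slots["modifier"].add(modifier)
        slots["head"].add(head)
    return dict(slots)

classes = paradigmatic_classes(SYNTAGMATIC)
print(sorted(classes["modifier"]))  # ['big', 'green', 'red']
print(sorted(classes["head"]))      # ['eye', 'flower', 'leaf']
```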

4 Lexicon: A Language Resource (2)
Features for Lexicons (2)
- Word building: fixed collocations between words;
- Systematization: consistent morphological, syntactic, and semantic descriptions;
- Formalization: descriptions expressed in a metalanguage, e.g. [±noun].
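
A minimal sketch of the last two features follows. It assumes each entry is a bundle of binary features rendered in the [±feature] metalanguage, and it treats "systematization" as the requirement that every entry is described with the same feature set. The entries, the feature names beyond "noun", and both function names are hypothetical.

```python
# Hypothetical binary feature bundles written in a [±feature] metalanguage.
FEATURES = {
    "flower": {"noun": True, "verb": False},
    "give": {"noun": False, "verb": True},
}

def to_metalanguage(features):
    """Render a feature bundle as e.g. '[+noun, -verb]' (features sorted by name)."""
    return "[" + ", ".join(("+" if value else "-") + name
                           for name, value in sorted(features.items())) + "]"

def is_systematic(entries):
    """Systematization check: every entry is described with the same feature set."""
    feature_sets = {frozenset(f) for f in entries.values()}
    return len(feature_sets) <= 1

print(to_metalanguage(FEATURES["flower"]))  # [+noun, -verb]
print(is_systematic(FEATURES))              # True
```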

5 Lexicon: A Language Resource (3)
Construction of Lexicons
Constructing a lexicon involves the following critical points:
- What is built is a knowledge base rather than a database, and the work should be done by domain experts;
- It can be built manually or semi-automatically;
- It should be applicable across machine platforms and domains;
- It should have a general framework so that it can interoperate with other lexicons.

6 Lexicon: A Language Resource (4)
Types of Lexicons
Lexicons can be divided into four categories:
- general (or basic) lexicons;
- collocation lexicons;
- bilingual lexicons;
- domain lexicons.

7 Lexicon: A Language Resource (5)
Information within Lexicons
A basic lexicon may contain:
- lexical information (lexical entries, etc.);
- morphological information (POS, tense, etc.);
- syntactic information (sentence patterns of verbs, etc.);
- semantic information (semantic attributes, predicate frames, etc.);
- conceptual information (conceptual marks, explanations of word meaning, etc.).
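
One way to hold these five kinds of information together is a single record per headword. The sketch below is a hypothetical layout, not a standard lexicon schema. The `LexicalEntry` class, its field names, and the sample values for "give" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class LexicalEntry:
    """A hypothetical basic-lexicon record with the five kinds of information above."""
    headword: str                                   # lexical information
    morphology: dict = field(default_factory=dict)  # e.g. POS, tense
    syntax: dict = field(default_factory=dict)      # e.g. sentence patterns of verbs
    semantics: dict = field(default_factory=dict)   # e.g. semantic attributes, predicate frame
    concept: dict = field(default_factory=dict)     # e.g. conceptual mark, gloss

entry = LexicalEntry(
    headword="give",
    morphology={"pos": "verb"},
    syntax={"patterns": ["NP V NP NP"]},
    semantics={"frame": ["Agent", "Dative", "Patient"]},
    concept={"gloss": "transfer possession of something to someone"},
)
print(entry.morphology["pos"])  # verb
```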

8 Lexicon: A Language Resource (6)
Sample (Morphological, Syntactic, and Semantic Codes)
"给" (give): Morp = [hq2, hq7, vjg, vjl, …]; Syn = [bso, bss, ksd, …]; Sem = [kyd, …].
For example:
- hq2: may be followed by a numeral (verb as a quantifier);
- bso: cannot act as an object on its own;
- kyd: donate or bestow;
- …: taxonomic code.
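
Here is a sketch of decoding such a coded entry, assuming the codes are looked up in a separate legend. Only the three glosses given on the slide are included. Every other code is returned as `None` rather than guessed. The names `CODE_GLOSSES`, `GEI`, and `decode` are hypothetical.

```python
# Glosses for the three codes explained on the slide; all other codes stay undecoded.
CODE_GLOSSES = {
    "hq2": "may be followed by a numeral (verb as a quantifier)",
    "bso": "cannot act as an object on its own",
    "kyd": "donate or bestow",
}

# The entry for 给 (give) as listed on the slide; the trailing "…" items are omitted.
GEI = {
    "Morp": ["hq2", "hq7", "vjg", "vjl"],
    "Syn": ["bso", "bss", "ksd"],
    "Sem": ["kyd"],
}

def decode(entry, glosses):
    """Map each layer's codes to a gloss, or None when the code is not documented."""
    return {layer: {code: glosses.get(code) for code in codes}
            for layer, codes in entry.items()}

decoded = decode(GEI, CODE_GLOSSES)
print(decoded["Sem"])  # {'kyd': 'donate or bestow'}
```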

9 Lexicon: A Language Resource (7)
Sample (Frame)
"给" (give) → S = NP + VP + NP1 + NP2
Syntactic frame:
  NP = [AP] + [QP] + N
  VP = [ADP] + V
  NP1 = [QP] + N
  NP2 = [QP] + N
Semantic frame:
  NP = AGT (Agent)
  NP1 = DAT (Dative)
  NP2 = OBJ (Patient)
Semantic constraints:
  NP = human | country | society | saying
  NP1 = human | animal | collectivity | region
  NP2 = thing | a slap in the face | way out | elicitation
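
The semantic constraints can be applied as a filter on candidate role fillers. The sketch assumes each filler has already been assigned one semantic class, for example by a separate lookup. That class assignment and the function name `check_frame` are hypothetical. The constraint sets are copied from the slide.

```python
# Semantic constraints on the slots of the frame for 给 (give), as listed on the slide.
CONSTRAINTS = {
    "AGT": {"human", "country", "society", "saying"},                  # NP
    "DAT": {"human", "animal", "collectivity", "region"},              # NP1
    "OBJ": {"thing", "a slap in the face", "way out", "elicitation"},  # NP2
}

def check_frame(fillers, constraints=CONSTRAINTS):
    """Return the roles whose filler's semantic class violates the constraint."""
    return [role for role, sem_class in fillers.items()
            if sem_class not in constraints[role]]

# "The teacher gave the student a book", with hypothetical classes for each filler.
print(check_frame({"AGT": "human", "DAT": "human", "OBJ": "thing"}))  # []
print(check_frame({"AGT": "thing", "DAT": "human", "OBJ": "thing"}))  # ['AGT']
```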

10 Lexicon: A Language Resource (8)
Collocation Lexicon
Col(w) = <cat, mor, syn, msy, sen>, where:
- cat: multi-POS;
- mor: morphology;
- syn: syntax and semantics;
- msy: nested collocations;
- sen: sentence-modifying rule set.
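
The five-part description maps naturally onto a named tuple. The sketch below is hypothetical, and so are its field types (plain lists). Only the `cat` patterns for 大概 are taken from the sample on the next slide. The other fields are left empty because the slides do not give their contents.

```python
from typing import NamedTuple

class Col(NamedTuple):
    """Hypothetical record for Col(w) = <cat, mor, syn, msy, sen>."""
    cat: list  # multi-POS collocation patterns
    mor: list  # morphology
    syn: list  # syntax and semantics
    msy: list  # nested collocations
    sen: list  # sentence-modifying rule set

dagai = Col(cat=["^大概 + (的; n)", "^大概 + (m; p; v; a; b; z)", "q + ^大概"],
            mor=[], syn=[], msy=[], sen=[])
print(len(dagai.cat))  # 3
```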

11 Lexicon: A Language Resource (9)
Sample (Collocation Lexicon)
w: '大概' (probably)
cat: ^'大概' + ('的'; n)
cat: ^'大概' + (m; p; v; a; b; z)
cat: q + ^'大概'
…
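
The `cat` patterns can be matched against a POS-tagged sentence. This sketch rests on an assumed reading of the notation. "^" marks the headword, and "+" joins adjacent positions. "(x; y)" lists alternatives, where quoted items are literal words and unquoted items are POS tags (assumed here to be PKU-style: n noun, m numeral, p preposition, v verb, a adjective, b distinguishing word, z status word, q measure word). The tags on the example sentence are assigned by hand, and `find_collocations` is a hypothetical helper.

```python
# Each pattern is a list of positions; a position is a set of alternatives.
# Alternatives starting with "=" are literal words; the rest are POS tags.
HEAD = "大概"
PATTERNS = [
    [{"=" + HEAD}, {"=的", "n"}],                     # ^'大概' + ('的'; n)
    [{"=" + HEAD}, {"m", "p", "v", "a", "b", "z"}],  # ^'大概' + (m; p; v; a; b; z)
    [{"q"}, {"=" + HEAD}],                            # q + ^'大概'
]

def item_matches(alternatives, word, tag):
    """A token matches a position if its word or its tag is one of the alternatives."""
    return ("=" + word) in alternatives or tag in alternatives

def find_collocations(tagged, patterns=PATTERNS):
    """Return (start index, pattern index) for every pattern match in a (word, tag) list."""
    hits = []
    for start in range(len(tagged)):
        for p_index, pattern in enumerate(patterns):
            window = tagged[start:start + len(pattern)]
            if len(window) == len(pattern) and all(
                    item_matches(alts, w, t) for alts, (w, t) in zip(pattern, window)):
                hits.append((start, p_index))
    return hits

# 他 大概 知道 ("he probably knows"), with hand-assigned tags.
sentence = [("他", "r"), ("大概", "d"), ("知道", "v")]
print(find_collocations(sentence))  # [(1, 1)]
```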

12 A Lexicon for English Words: WordNet (1)
What Is WordNet?
- WordNet is an online lexical reference system whose design is inspired by current psycholinguistic theories of human lexical memory.
- English nouns, verbs, adjectives, and adverbs are organized into synonym sets (synsets), each representing one underlying lexical concept. Different relations link the synonym sets.
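
To make the synset idea concrete, here is a toy, self-contained slice of a WordNet-style database. It is not the real WordNet data or any library's API. The synset IDs imitate a common naming style but are hypothetical. Each ID names one concept, and a word form's senses are simply the synsets that contain it.

```python
# A toy slice of a WordNet-style database: each synset ID names one lexical concept.
SYNSETS = {
    "car.n.01": {"car", "auto", "automobile"},
    "board.n.01": {"board", "plank"},
    "board.n.02": {"board", "committee"},
}

def synsets_for(word):
    """All concepts a word form can express, i.e. its senses."""
    return sorted(sid for sid, members in SYNSETS.items() if word in members)

print(synsets_for("board"))  # ['board.n.01', 'board.n.02']
print(synsets_for("auto"))   # ['car.n.01']
```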

13 A Lexicon for English Words: WordNet (2)
Information within WordNet
WordNet divides the lexicon into five categories:
- nouns
- verbs
- adjectives
- adverbs
- function words
In practice WordNet contains only the first four; function words are omitted. WordNet organizes lexical information in terms of word meanings rather than word forms, so semantic relations are its organizing principle.

14 A Lexicon for English Words: WordNet (3)
Psycholinguistics
- The 20th century saw the emergence of psycholinguistics, an interdisciplinary field of research concerned with the cognitive bases of linguistic competence.
- Both linguists and psycholinguists have explored in considerable depth the factors that determine the contemporary (i.e., synchronic) structure of linguistic knowledge in general, and of lexical knowledge in particular.

15 A Lexicon for English Words: WordNet (4)
Psycholexicology
- Miller and Johnson-Laird (1976) proposed that research concerned with the lexical component of language be called psycholexicology.
- As linguistic theories evolved in recent decades, linguists became increasingly explicit about the information a lexicon must contain for the phonological, syntactic, and lexical components to work together in the everyday production and comprehension of linguistic messages, and those proposals have been incorporated into the work of psycholinguists.

16 A Lexicon for English Words: WordNet (5)
Lexicography
- Beginning with word association studies at the turn of the century and continuing down to the sophisticated experimental tasks of the past twenty years, psycholinguists have discovered many synchronic properties of the mental lexicon that can be exploited in lexicography.

17 A Lexicon for English Words: WordNet (6)
Origin of WordNet
- In 1985 a group of psychologists and linguists at Princeton University undertook to develop a lexical database along the lines suggested by these investigations (Miller, 1985).
- The initial idea was to provide an aid for searching dictionaries conceptually, rather than merely alphabetically.
- As the work proceeded, however, it demanded a more ambitious formulation of its own principles and goals.

18 A Lexicon for English Words: WordNet (7)
Size of WordNet
[Table: number of unique strings, synsets, and total word-sense pairs for each POS (noun, verb, adjective, adverb) and in total.]

19 A Lexicon for English Words: WordNet (8)
Some Problems
- What kinds of utterances enter into these lexical associations?
- What is the nature and organization of the lexicalized concepts that words can express?
- What syntactic roles do different words play?

20 A Lexicon for English Words: WordNet (9)
Lexical Matrix (1)
- To reduce ambiguity, "word form" is used here to refer to the physical utterance;
- "word meaning" refers to the lexicalized concept that a form can be used to express;
- The starting point for lexical semantics can then be said to be the mapping between forms and meanings.

21 A Lexicon for English Words: WordNet (10)
Lexical Matrix (2)

Word         Word Forms
Meanings     F1      F2      F3      ...     Fn
M1           E1,1    E1,2
M2                   E2,2
M3                           E3,3
...                                  ...
Mm                                           Em,n

If there are two entries in the same column, the word form is polysemous; if there are two entries in the same row, the two word forms are synonyms (relative to a context). Therefore, F1 and F2 are synonyms, and F2 is polysemous.
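
The lexical matrix can be stored sparsely as the set of its non-empty cells. Synonymy and polysemy can then be read off rows and columns exactly as the slide describes. This sketch encodes only the first three rows and columns of the slide's example. The names `MATRIX_ENTRIES`, `synonyms`, and `polysemous_forms` are hypothetical.

```python
# Non-empty cells E(i, j) of the lexical matrix: meaning M_i is expressed by form F_j.
# Only rows M1..M3 and columns F1..F3 of the slide's example are encoded.
MATRIX_ENTRIES = {(1, 1), (1, 2), (2, 2), (3, 3)}

def synonyms(entries):
    """Pairs of forms that share a row (the same meaning)."""
    return sorted({(f1, f2) for (m1, f1) in entries for (m2, f2) in entries
                   if m1 == m2 and f1 < f2})

def polysemous_forms(entries):
    """Forms with entries in two or more rows (several entries in one column)."""
    meanings_per_form = {}
    for meaning, form in entries:
        meanings_per_form.setdefault(form, set()).add(meaning)
    return sorted(f for f, ms in meanings_per_form.items() if len(ms) > 1)

print(synonyms(MATRIX_ENTRIES))          # [(1, 2)]  F1 and F2 are synonyms
print(polysemous_forms(MATRIX_ENTRIES))  # [2]       F2 is polysemous
```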

22 A Lexicon for English Words: WordNet (11)
Polysemy and Synonymy
- The mapping between forms and meanings is many-to-many: some forms have several different meanings, and some meanings can be expressed by several different forms.
- That is, a listener or reader who recognizes a form must cope with its polysemy, and a speaker or writer who wants to express a meaning must choose between synonyms.

23 A Lexicon for English Words: WordNet (12)
Some of the Relations
- Synonymy
- Antonymy
- Hyponymy / Hypernymy (subordination / superordination)
- Meronymy / Holonymy (part-whole)

24 A Lexicon for English Words: WordNet (13)
Synonymy (1)
There are several definitions of synonymy:
- Two expressions are synonymous if substituting one for the other never changes the truth value of a sentence in which the substitution is made.
- Two expressions are synonymous in a linguistic context C if substituting one for the other in C does not alter the truth value.
- …

25 A Lexicon for English Words: WordNet (14)
Synonymy (2)
- Note that defining synonymy in terms of substitutability makes it necessary to partition WordNet into nouns, verbs, adjectives, and adverbs.
- That is, if concepts are represented by synsets, and if synonyms must be interchangeable, then words in different syntactic categories cannot be synonyms (cannot form synsets), because they are not interchangeable.

26 A Lexicon for English Words: WordNet (15)
Antonymy (1)
- The antonym of a word x is sometimes not-x, but not always. For example, rich and poor are antonyms, but to say that someone is not rich does not imply that they must be poor; many people consider themselves neither rich nor poor.
- Antonymy is a lexical relation between word forms, not a semantic relation between word meanings.

27 A Lexicon for English Words: WordNet (16)
Antonymy (2)
For example, the meanings {rise, ascend} and {fall, descend} may be conceptual opposites, but they are not antonyms. [rise / fall] are antonyms and so are [ascend / descend], but most people hesitate and look thoughtful when asked whether rise and descend, or ascend and fall, are antonyms. Note that synonym sets are enclosed in curly brackets, '{' and '}', while other lexical relations are enclosed in square brackets, '[' and ']'.
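
The point that antonymy links word forms rather than meanings can be shown by keeping the two kinds of data separate. In this sketch the synsets record the conceptual opposition, and the antonym pairs are stored between individual forms. Forms from opposed synsets are therefore not automatically antonyms. The variable and function names are hypothetical.

```python
# Synsets (word meanings) and antonym links between word forms, as in the rise/fall example.
RISE = frozenset({"rise", "ascend"})
FALL = frozenset({"fall", "descend"})
ANTONYMS = {("rise", "fall"), ("ascend", "descend")}  # lexical relation on forms

def are_antonyms(a, b):
    """Antonymy is checked on the word forms themselves, in either order."""
    return (a, b) in ANTONYMS or (b, a) in ANTONYMS

def opposite_concepts(a, b):
    """True if the forms belong to the two conceptually opposed synsets."""
    return (a in RISE and b in FALL) or (a in FALL and b in RISE)

print(are_antonyms("rise", "fall"), opposite_concepts("rise", "fall"))        # True True
print(are_antonyms("rise", "descend"), opposite_concepts("rise", "descend"))  # False True
```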

28 A Lexicon for English Words: WordNet (17)
Hyponymy / Hypernymy
- This is a semantic relation between word meanings. It is also called subordination / superordination, subset / superset, or the ISA relation.
- Hyponymy is transitive and asymmetric. x is said to be a hyponym of y if native speakers of English accept sentences constructed from the frame "An x is a (kind of) y."
Ex.: tree is a hyponym of plant; plant is a hypernym of tree.
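
Transitivity means a hyponym inherits every ancestor reached by following ISA links upward. The sketch makes two simplifying assumptions: each word has at most one direct hypernym, and the links contain no cycles. The links beyond tree/plant and the function names are hypothetical.

```python
# Hypothetical direct hypernym links (x -> y means "An x is a (kind of) y").
HYPERNYM = {"oak": "tree", "tree": "plant", "plant": "organism"}

def hypernym_chain(word):
    """Follow ISA links upward; by transitivity every ancestor is a hypernym."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain

def is_hyponym(x, y):
    return y in hypernym_chain(x)

print(hypernym_chain("oak"))       # ['tree', 'plant', 'organism']
print(is_hyponym("oak", "plant"))  # True  (transitive)
print(is_hyponym("plant", "oak"))  # False (asymmetric)
```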

29 A Lexicon for English Words: WordNet (18)
Meronymy / Holonymy
- This is a semantic relation, also called the part-whole or HASA relation.
- x is said to be a meronym of y if native speakers of English accept sentences constructed from frames such as "An x is a part of a y" or "A y has an x (as a part)."
Ex.: a frame is a part of a car, or a car has a frame.
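
Meronymy and holonymy are inverses. If a lexicon stores only the part-to-whole direction, the HASA direction can be derived by inverting the mapping. The part-whole links beyond frame/car and the helper `holonym_index` are hypothetical.

```python
from collections import defaultdict

# Hypothetical part-whole links: each part maps to the wholes it is a part of.
MERONYM_OF = {"frame": ["car"], "wheel": ["car", "bicycle"], "engine": ["car"]}

def holonym_index(meronym_of):
    """Invert part -> wholes into whole -> parts (the HASA direction)."""
    parts = defaultdict(list)
    for part, wholes in meronym_of.items():
        for whole in wholes:
            parts[whole].append(part)
    return {whole: sorted(ps) for whole, ps in parts.items()}

HAS_PART = holonym_index(MERONYM_OF)
print(HAS_PART["car"])      # ['engine', 'frame', 'wheel']
print(HAS_PART["bicycle"])  # ['wheel']
```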

30 A Lexicon for English Words: WordNet (19)
User Interface

31 A Lexicon for English Words: WordNet (20)
References
- G. Miller, R. Beckwith, C. Fellbaum, D. Gross, and K. J. Miller. Introduction to WordNet: An On-line Lexical Database. International Journal of Lexicography, Vol. 3.
- G. Miller. Nouns in WordNet: A Lexical Inheritance System. International Journal of Lexicography, Vol. 3.
- C. Fellbaum. English Verbs as a Semantic Net. International Journal of Lexicography, Vol. 3.

32 Assignments (2)
1. The text described several different tests for distinguishing word classes. For example, nouns can occur in sentences of the form "I saw the X", whereas adjectives can occur in sentences of the form "It's so X". Give some additional tests to distinguish these classes and to distinguish between count nouns and mass nouns. State whether each of the following words can be used as an adjective, a count noun, or a mass noun. If a word is ambiguous, give all its possible uses: milk, house, liquid, green, group, concept, airborne.