Natural Language Processing (2a), Zhao Hai 赵海, Department of Computer Science and Engineering, Shanghai Jiao Tong University, 2010-2011

1 Natural Language Processing (2a)
Zhao Hai 赵海
Department of Computer Science and Engineering, Shanghai Jiao Tong University, 2010-2011
zhaohai@cs.sjtu.edu.cn
http://bcmi.sjtu.edu.cn/~zhaohai/lessons/nlp2011/index.html

2 Outline
• Lexicons and Lexical Analysis
  • Lexicon: A Language Resource
  • A Lexicon for English Words: WordNet

3 Lexicon: A Language Resource (1)
Features for Lexicons (1)
A lexicon here means a machine-readable dictionary, which has the following features:
• It provides in full all the information that a dictionary contains;
• Based on semantic descriptions, it describes the syntagmatic and paradigmatic relationships of each word, e.g.:
  red + flower, green + leaf, big + eye (syntagmatic relations);
  red, green, and big; flower, leaf, and eye (paradigmatic relations).
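
The two kinds of relationship can be illustrated with a minimal Python sketch. It assumes each syntagmatic combination is stored as a (modifier, head) pair; the names SYNTAGMATIC_PAIRS and paradigmatic_classes are illustrative and not part of the lecture.

```python
# Syntagmatic pairs from the slide: words that combine in sequence.
SYNTAGMATIC_PAIRS = [("red", "flower"), ("green", "leaf"), ("big", "eye")]


def paradigmatic_classes(pairs):
    """Group words by the slot they fill.

    Words that fill the same slot can substitute for one another,
    so each returned list is one paradigmatic class.
    """
    modifiers = [modifier for modifier, _ in pairs]
    heads = [head for _, head in pairs]
    return modifiers, heads


modifiers, heads = paradigmatic_classes(SYNTAGMATIC_PAIRS)
print(modifiers)  # words in the modifier slot
print(heads)      # words in the head slot
```

Reading a pair left to right gives a syntagmatic relation; reading a slot top to bottom gives a paradigmatic one.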

4 Lexicon: A Language Resource (2)
Features for Lexicons (2)
• word building: fixed collocations between words;
• systematization: consistent description, covering morphological, syntactic, and semantic description;
• formalization: expression in a meta-language, e.g. [±noun].

5 Lexicon: A Language Resource (3)
Construction of Lexicons
The construction of a lexicon involves the following critical points:
• a knowledge base rather than a database is built, and this work should be carried out by domain experts;
• it can be built manually or semi-automatically;
• it should be applicable to any machine platform and domain;
• it should have a general framework, so that it can interact with other lexicons.

6 Lexicon: A Language Resource (4)
Types of Lexicons
Lexicons can be divided into four categories:
• general lexicon (or basic lexicon);
• collocation lexicon;
• bilingual lexicon;
• domain lexicon.

7 Lexicon: A Language Resource (5)
Information within Lexicons
A basic lexicon may contain the following information:
• lexical information (lexical entry, etc.);
• morphological information (POS, tense, etc.);
• syntactic information (sentence patterns of verbs, etc.);
• semantic information (semantic attributes, predicate frames, etc.);
• conceptual information (conceptual marks, word meaning explanations, etc.).

8 Lexicon: A Language Resource (6)
Sample (Morphology, Syntax, and Semantics)
“给” (give):
  Morp = [hq2, hq7, vjg, vjl, …];
  Syn = [bso, bss, ksd, …];
  Sem = [kyd, 240202].
e.g.:
  hq2 – may be followed by a numeral (verb as a quantifier);
  bso – cannot act as an object on its own;
  kyd – donate or bestow;
  240202 – taxonomic code.
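
One possible way to hold such an entry in a program is sketched below. This is only an illustration: the class LexiconEntry, the table FEATURE_GLOSSES, and the function describe are hypothetical names. The feature lists contain only the codes shown on the slide (the slide elides the rest), and only the four codes the slide explains have a gloss.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LexiconEntry:
    """One lexicon entry with morphological, syntactic and semantic codes."""
    word: str
    gloss: str
    morp: List[str] = field(default_factory=list)
    syn: List[str] = field(default_factory=list)
    sem: List[str] = field(default_factory=list)


# Glosses for the codes explained on the slide; other codes are left unexplained.
FEATURE_GLOSSES: Dict[str, str] = {
    "hq2": "may be followed by a numeral (verb as a quantifier)",
    "bso": "cannot act as an object on its own",
    "kyd": "donate or bestow",
    "240202": "taxonomic code",
}

# Only the codes visible on the slide; the source lists are truncated.
GIVE = LexiconEntry(
    word="给",
    gloss="give",
    morp=["hq2", "hq7", "vjg", "vjl"],
    syn=["bso", "bss", "ksd"],
    sem=["kyd", "240202"],
)


def describe(entry: LexiconEntry) -> Dict[str, str]:
    """Return the glosses of those codes of the entry that have a known gloss."""
    codes = entry.morp + entry.syn + entry.sem
    return {code: FEATURE_GLOSSES[code] for code in codes if code in FEATURE_GLOSSES}


for code, meaning in describe(GIVE).items():
    print(code, "-", meaning)
```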

9 Lexicon: A Language Resource (7)
Sample (Frame)
“给” (give) → S = NP + VP + NP1 + NP2
Syntactic frame:
  NP = [AP] + [QP] + N
  VP = [ADP] + V
  NP1 = [QP] + N
  NP2 = [QP] + N
Semantic frame:
  NP = AGT (Agent)
  NP1 = DAT (Dative)
  NP2 = OBJ (Patient)
Semantic constraints:
  NP = human | country | society | saying
  NP1 = human | animal | collectivity | region
  NP2 = thing | a slap in the face | way out | elicitation
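
The semantic frame and its constraints could be checked mechanically along the following lines. This is a sketch under assumptions: it supposes that the semantic class of each noun phrase's head has already been determined by some other component, and the names GIVE_FRAME and violations are hypothetical.

```python
# Slots of the frame for 给 (give): semantic role and allowed semantic classes,
# copied from the semantic frame and semantic constraints on the slide.
GIVE_FRAME = {
    "NP": ("AGT", {"human", "country", "society", "saying"}),
    "NP1": ("DAT", {"human", "animal", "collectivity", "region"}),
    "NP2": ("OBJ", {"thing", "a slap in the face", "way out", "elicitation"}),
}


def violations(fillers):
    """Return the slots whose filler class is missing or not allowed.

    `fillers` maps a slot name to the semantic class of its head noun.
    Slots are reported in frame order (NP, NP1, NP2).
    """
    bad = []
    for slot, (_role, allowed) in GIVE_FRAME.items():
        if fillers.get(slot) not in allowed:
            bad.append(slot)
    return bad


# A human gives a thing to an animal: every constraint is satisfied.
print(violations({"NP": "human", "NP1": "animal", "NP2": "thing"}))
# A thing cannot be the agent of 给 under these constraints.
print(violations({"NP": "thing", "NP1": "human", "NP2": "thing"}))
```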

10 Lexicon: A Language Resource (8)
Collocation Lexicon
Col(w) = <cat, mor, syn, msy, sen>
where:
  cat – multi-POS;
  mor – morphology;
  syn – syntax and semantics;
  msy – nesting collocation;
  sen – sentence modifying rule set.

11 Lexicon: A Language Resource (9)
Sample (Collocation Lexicon)
w: ‘大概’ (probably)
  cat: ^‘大概’ + (‘的’; n) → @setmark(a);
  cat: ^‘大概’ + (m; p; v; a; b; z) → @setmark(d);
  cat: q + ^‘大概’ → @setmark(n);
  … …
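
A rough sketch of how the three cat rules shown here might be applied is given below. The reading of the notation is an assumption, not something the slide states: ^ is taken to mark the position of the target word, the bracketed items are taken as alternative neighbouring words or POS tags, @setmark(x) is taken to assign tag x to the target word, and the rules are tried in the order listed. The function name mark_dagai is hypothetical.

```python
TARGET = "大概"


def mark_dagai(tokens, i):
    """Return the tag assigned to 大概 at position i, or None if no rule fires.

    `tokens` is a list of (word, pos) pairs; the pos of the target itself
    is ignored because it is what the rules decide.
    """
    if tokens[i][0] != TARGET:
        raise ValueError("position i does not hold the target word")
    nxt = tokens[i + 1] if i + 1 < len(tokens) else None
    prev = tokens[i - 1] if i > 0 else None

    # Rule 1: target followed by the word 的 or by a token tagged n -> a
    if nxt is not None and (nxt[0] == "的" or nxt[1] == "n"):
        return "a"
    # Rule 2: target followed by a token tagged m, p, v, a, b or z -> d
    if nxt is not None and nxt[1] in {"m", "p", "v", "a", "b", "z"}:
        return "d"
    # Rule 3: target preceded by a token tagged q -> n
    if prev is not None and prev[1] == "q":
        return "n"
    return None


print(mark_dagai([("大概", "?"), ("的", "u"), ("情况", "n")], 0))
print(mark_dagai([("他", "r"), ("大概", "?"), ("知道", "v")], 1))
print(mark_dagai([("个", "q"), ("大概", "?")], 1))
```

The tags on the neighbouring tokens in the three calls (u, r, and so on) are made-up inputs for the demonstration.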

12 A Lexicon for English Words: WordNet (1)
What is WordNet?
• WordNet is an on-line lexical reference system whose design is inspired by current psycholinguistic theories of human lexical memory.
• English nouns, verbs, adjectives, and adverbs are organized into synonym sets, each representing one underlying lexical concept. Different relations link the synonym sets.

13 A Lexicon for English Words: WordNet (2)
Information within WordNet
WordNet divides the lexicon into five categories:
• Nouns
• Verbs
• Adjectives
• Adverbs
• Function words (particles)
WordNet organizes lexical information in terms of word meanings rather than word forms. It is therefore organized by semantic relations.

14 A Lexicon for English Words: WordNet (3)
Psycholinguistics
• The 20th century has seen the emergence of psycholinguistics, an interdisciplinary field of research concerned with the cognitive bases of linguistic competence.
• Both linguists and psycholinguists have explored in considerable depth the factors determining the contemporary (synchronic, i.e. belonging to the same time) structure of linguistic knowledge in general, and lexical knowledge in particular.

15 A Lexicon for English Words: WordNet (4)
Psycholexicology
• Miller and Johnson-Laird (1976) have proposed that research concerned with the lexical component of language should be called psycholexicology.
• As linguistic theories evolved in recent decades, linguists became increasingly explicit about the information a lexicon must contain in order for the phonological, syntactic, and lexical components to work together in the everyday production and comprehension of linguistic messages, and those proposals have been incorporated into the work of psycholinguists.

16 A Lexicon for English Words: WordNet (5)
Lexicography
• Beginning with word association studies at the turn of the century and continuing down to the sophisticated experimental tasks of the past twenty years, psycholinguists have discovered many synchronic properties of the mental lexicon that can be exploited in lexicography.

17 A Lexicon for English Words: WordNet (6)
Origins of WordNet
• In 1985 a group of psychologists and linguists at Princeton University undertook to develop a lexical database along lines suggested by these investigations (Miller, 1985).
• The initial idea was to provide an aid for searching dictionaries conceptually, rather than merely alphabetically.
• As the work proceeded, however, it demanded a more ambitious formulation of its own principles and goals.

18 A Lexicon for English Words: WordNet (7)
Size of WordNet
POS         Unique Strings   Synsets   Total Word-Sense Pairs
Noun                117798     82115                   146312
Verb                 11529     13767                    25047
Adjective            21479     18156                    30002
Adverb                4481      3621                     5580
Totals              155287    117659                   206941
http://wordnet.princeton.edu/
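
The figures in the size table can be cross-checked with a few lines of Python. The sketch below sums each column and also computes the ratio of word-sense pairs to unique strings per part of speech, a rough indicator of average polysemy that the slide itself does not report. The names SIZES, column_totals, and senses_per_string are illustrative.

```python
# (unique strings, synsets, word-sense pairs) per part of speech, from the slide.
SIZES = {
    "Noun": (117798, 82115, 146312),
    "Verb": (11529, 13767, 25047),
    "Adjective": (21479, 18156, 30002),
    "Adverb": (4481, 3621, 5580),
}
REPORTED_TOTALS = (155287, 117659, 206941)


def column_totals(sizes):
    """Sum each of the three columns over all parts of speech."""
    return tuple(sum(column) for column in zip(*sizes.values()))


def senses_per_string(sizes):
    """Word-sense pairs divided by unique strings, per part of speech."""
    return {pos: pairs / strings for pos, (strings, _synsets, pairs) in sizes.items()}


print(column_totals(SIZES) == REPORTED_TOTALS)
for pos, ratio in senses_per_string(SIZES).items():
    print(pos, round(ratio, 2))
```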

19 A Lexicon for English Words: WordNet (8)
Some Problems
• What kinds of utterances enter into these lexical associations?
• What is the nature and organization of the lexicalized concepts that words can express?
• What syntactic roles do different words play?

20 A Lexicon for English Words: WordNet (9)
Lexical Matrix (1)
• To reduce ambiguity, “word form” is used here to refer to the physical utterance;
• “word meaning” refers to the lexicalized concept that a form can be used to express;
• The starting point for lexical semantics can then be said to be the mapping between forms and meanings.

21 A Lexicon for English Words: WordNet (10)
Lexical Matrix (2)
Word         Word Forms
Meanings     F1      F2      F3      ...     Fn
M1           E1,1    E1,2
M2                   E2,2
M3                           E3,3
...                                  ...
Mm                                           Em,n
If there are two entries in the same column, the word form is polysemous; if there are two entries in the same row, the two word forms are synonyms (relative to a context). Therefore, F1 and F2 are synonyms, and F2 is polysemous.
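
The reading rules for the lexical matrix translate directly into code. The sketch below stores the matrix sparsely as a set of (meaning, form) entries and recovers polysemous forms (columns with more than one entry) and synonym sets (rows with more than one entry). The symbolic entry Em,n is left out because m and n are not concrete, and the names ENTRIES, polysemous_forms, and synonym_sets are illustrative.

```python
from collections import defaultdict

# Filled cells of the matrix on the slide: E1,1, E1,2, E2,2, E3,3.
ENTRIES = {("M1", "F1"), ("M1", "F2"), ("M2", "F2"), ("M3", "F3")}


def polysemous_forms(entries):
    """Forms whose column holds more than one entry."""
    meanings_by_form = defaultdict(set)
    for meaning, form in entries:
        meanings_by_form[form].add(meaning)
    return {form for form, meanings in meanings_by_form.items() if len(meanings) > 1}


def synonym_sets(entries):
    """Meanings whose row holds more than one entry, with the forms in that row."""
    forms_by_meaning = defaultdict(set)
    for meaning, form in entries:
        forms_by_meaning[meaning].add(form)
    return {meaning: forms for meaning, forms in forms_by_meaning.items() if len(forms) > 1}


print(polysemous_forms(ENTRIES))  # F2 is polysemous
print(synonym_sets(ENTRIES))      # F1 and F2 are synonyms under M1
```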

22 A Lexicon for English Words: WordNet (11)
Polysemy and Synonymy
• Mappings between forms and meanings are many-to-many: some forms have several different meanings, and some meanings can be expressed by several different forms.
• That is to say, a listener or reader who recognizes a form must cope with its polysemy; a speaker or writer who hopes to express a meaning must decide between synonyms.

23 A Lexicon for English Words: WordNet (12)
Some of the Relations
• Synonymy
• Antonymy
• Hyponymy / Hypernymy (Subordination / Superordination)
• Meronymy / Holonymy (Part-Whole)

24 A Lexicon for English Words: WordNet (13)
Synonymy (1)
There are several definitions of synonymy:
• Two expressions are synonymous if the substitution of one for the other never changes the truth value of a sentence in which the substitution is made.
• Two expressions are synonymous in a linguistic context C if the substitution of one for the other in C does not alter the truth value.
• …

25 A Lexicon for English Words: WordNet (14)
Synonymy (2)
• Note that the definition of synonymy in terms of substitutability makes it necessary to partition WordNet into nouns, verbs, adjectives, and adverbs.
• That is to say, if concepts are represented by synsets, and if synonyms must be interchangeable, then words in different syntactic categories cannot be synonyms (cannot form synsets) because they are not interchangeable.
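
The consequence of this partition can be sketched as a lookup that only ever searches the synsets of one syntactic category. The data below is a toy example: the verb synsets {rise, ascend} and {fall, descend} appear later in the lecture, while the noun synset {rise, ascent} is an assumption added for illustration. The names SYNSETS_BY_POS and synonyms are hypothetical and are not WordNet's own interface.

```python
# Synsets are stored separately per syntactic category.
SYNSETS_BY_POS = {
    "verb": [{"rise", "ascend"}, {"fall", "descend"}],
    "noun": [{"rise", "ascent"}],  # illustrative assumption, not from the slides
}


def synonyms(word, pos):
    """Return the synonyms of `word` within one syntactic category only."""
    found = set()
    for synset in SYNSETS_BY_POS.get(pos, []):
        if word in synset:
            found |= synset - {word}
    return found


print(synonyms("rise", "verb"))    # synonyms of the verb
print(synonyms("rise", "noun"))    # synonyms of the noun
print(synonyms("ascend", "noun"))  # empty: "ascend" has no noun synset here
```

Because the lookup never crosses categories, a verb and a noun can never end up in the same synset, which is the point made above.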

26 A Lexicon for English Words: WordNet (15)
Antonymy (1)
• The antonym of a word x is sometimes not-x, but not always. For example, rich and poor are antonyms, but to say that someone is not rich does not imply that they must be poor; many people consider themselves neither rich nor poor.
• Antonymy is a lexical relation between word forms, not a semantic relation between word meanings.

27 A Lexicon for English Words: WordNet (16)
Antonymy (2)
For example, the meanings {rise, ascend} and {fall, descend} may be conceptual opposites, but they are not antonyms; [rise / fall] are antonyms and so are [ascend / descend], but most people hesitate and look thoughtful when asked if rise and descend, or ascend and fall, are antonyms.
Note that synonymous words are enclosed in curly brackets, ‘{’ and ‘}’, and other lexical relations are enclosed in square brackets, ‘[’ and ‘]’.
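
The distinction between a relation on word forms and a relation on word meanings can be made concrete with a small sketch. Antonymy is stored here as pairs of word forms, and conceptual opposition is derived at the level of synsets. The function names are hypothetical, and the definition of conceptual opposition used here (some member of one synset is an antonym of some member of the other) is an assumption chosen for the illustration.

```python
# Synsets (word meanings) and antonym pairs (word forms) from the slide.
SYNSETS = [{"rise", "ascend"}, {"fall", "descend"}]
ANTONYM_PAIRS = {frozenset({"rise", "fall"}), frozenset({"ascend", "descend"})}


def are_antonyms(a, b):
    """Lexical relation: holds only between the specific word forms listed."""
    return frozenset({a, b}) in ANTONYM_PAIRS


def conceptually_opposed(a, b):
    """Semantic relation: some word in a's synset is an antonym of some word in b's."""
    synsets_a = [s for s in SYNSETS if a in s]
    synsets_b = [s for s in SYNSETS if b in s]
    return any(
        are_antonyms(x, y)
        for sa in synsets_a
        for sb in synsets_b
        for x in sa
        for y in sb
    )


print(are_antonyms("rise", "fall"))             # an antonym pair
print(are_antonyms("rise", "descend"))          # not an antonym pair
print(conceptually_opposed("rise", "descend"))  # the meanings are still opposed
```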

28 A Lexicon for English Words: WordNet (17)
Hyponymy / Hypernymy
• It is a semantic relation between word meanings. It is also called subordination / superordination, subset / superset, or the ISA relation.
• Hyponymy is transitive and asymmetrical. x is said to be a hyponym of y if native speakers of English accept the sentence constructed as “An x is a (kind of) y.”
Ex.: tree is a hyponym of plant; plant is a hypernym of tree.
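
Transitivity and asymmetry can be seen in a minimal sketch that follows hypernym links upwards. It assumes, for simplicity, that each word has at most one direct hypernym and that the links contain no cycles. Only the tree/plant link comes from the slide; "oak" and "organism" are added for illustration, and the names HYPERNYM_OF, hypernym_chain, and is_hyponym_of are hypothetical.

```python
# Direct hypernym of each word (single-parent simplification, no cycles).
HYPERNYM_OF = {
    "oak": "tree",        # illustrative assumption
    "tree": "plant",      # from the slide
    "plant": "organism",  # illustrative assumption
}


def hypernym_chain(word):
    """All hypernyms of `word`, nearest first, found by following the links upwards."""
    chain = []
    while word in HYPERNYM_OF:
        word = HYPERNYM_OF[word]
        chain.append(word)
    return chain


def is_hyponym_of(x, y):
    """True if 'an x is a (kind of) y' follows from the stored links."""
    return y in hypernym_chain(x)


print(hypernym_chain("oak"))
print(is_hyponym_of("oak", "plant"))   # transitive: oak -> tree -> plant
print(is_hyponym_of("plant", "tree"))  # asymmetric: the reverse does not hold
```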

29 A Lexicon for English Words: WordNet (18)
Meronymy / Holonymy
• It is a semantic relation which can also be called the part-whole or HASA relation.
• x is said to be a meronym of y if native speakers of English accept the sentence constructed as “An x is a part of y”.
Ex.: a frame is a part of a car, or a car has a frame.
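
The two readings of the example ("is a part of" and "has a") are the same stored fact looked up in opposite directions, as the sketch below shows. Only the frame/car pair comes from the slide; the other pairs and the names PART_OF, parts_of, and wholes_of are assumptions for illustration.

```python
# (part, whole) pairs; each pair reads "a <part> is a part of a <whole>".
PART_OF = [
    ("frame", "car"),    # from the slide
    ("wheel", "car"),    # illustrative assumption
    ("spoke", "wheel"),  # illustrative assumption
]


def parts_of(whole):
    """Meronyms of `whole`: the HASA direction ('a car has a frame')."""
    return sorted(part for part, w in PART_OF if w == whole)


def wholes_of(part):
    """Holonyms of `part`: the part-of direction ('a frame is a part of a car')."""
    return sorted(w for p, w in PART_OF if p == part)


print(parts_of("car"))
print(wholes_of("frame"))
```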

30 A Lexicon for English Words: WordNet (19)
User Interface

32 Assignments (2)
1. The text described several different example tests for distinguishing word classes. For example, nouns can occur in sentences of the form "I saw the X", whereas adjectives can occur in sentences of the form "It’s so X". Give some additional tests to distinguish these forms and to distinguish between count nouns and mass nouns. State whether each of the following words can be used as an adjective, count noun, or mass noun. If the word is ambiguous, give all its possible uses:
milk, house, liquid, green, group, concept, airborne

