LING 388 Language and Computers Lecture 22 11/25/03 Sandiway FONG.


1 LING 388 Language and Computers Lecture 22 11/25/03 Sandiway FONG

2 Administrivia
- No more homeworks until the final
  - The final will also cover the material after Homework 4
  - Take-home final
    - Handed out on Tuesday December 9th
    - Discussed in class that day
    - One week strict deadline
- No class on Thursday
  - Happy Turkey Day!

3 Relative Clauses
- From Lecture 14, we have examples like:
  - The cat that John saw (object): the cat_i that John saw e_i
  - The cat that saw John (subject): the cat_i that e_i saw John
- From Homework 4 (review), we saw that we can have multiply embedded relative clauses

4 Relative Clauses
- Classwork question (do it now): rank the following sentences in order of difficulty of comprehension:
  1. I hate the man that the cat that Mary saw hissed at
  2. I hate the man that saw the cat that hissed at John
  3. I hate the man that the cat that hissed at John saw
  4. I hate the man that hissed at the cat that John saw
- Note: 1 = most difficult. If two (or more) are about the same level, give them the same rank.

5 Today's Lecture
- In Lecture 21, we looked at stemming
  - … the (morphological) process of going from a fully inflected word form to a root
- In today's lecture, we'll discuss part-of-speech (POS) tagging
  - … the process of identifying the part of speech of a fully inflected word form

6 Part-of-Speech (POS) Tagging
- An example of a lightweight NLP task
  - Useful when complete syntactic analysis is not needed, or…
  - when used as a first stage towards a more complete analysis
- POS taggers are practical and do well
  - 95%+ accuracy claimed in the literature

7 Parts of Speech: Problem
- Example: walk can be a noun or a verb
  - "The walk I took …": noun
  - "I walk 2 miles every day": verb
- The correct tag is determined by syntax
- POS taggers try to assign the correct tag without actually parsing the sentence
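The determiner/pronoun contrast above can be sketched as a rule that uses one token of left context instead of a full parse. The tag names and the tiny lexicon below are invented for illustration; they are not from the lecture's software.

```python
# Hypothetical sketch: disambiguating "walk" from its left context.
# Tags and lexicon entries are illustrative assumptions.
LEXICON = {
    "the": {"Det"},
    "i": {"Pron"},
    "walk": {"Noun", "Verb"},
}

def tag_after(prev_tag, word):
    """Pick a tag for `word` using the tag of the preceding word."""
    candidates = LEXICON.get(word.lower(), {"Noun"})
    if len(candidates) == 1:
        return next(iter(candidates))
    # Heuristic: a determiner is followed by a noun, a pronoun by a verb.
    if prev_tag == "Det" and "Noun" in candidates:
        return "Noun"
    if prev_tag == "Pron" and "Verb" in candidates:
        return "Verb"
    return sorted(candidates)[0]

print(tag_after("Det", "walk"))   # "The walk ..." -> Noun
print(tag_after("Pron", "walk"))  # "I walk ..."   -> Verb
```

This is exactly the kind of local decision a real tagger makes: it resolves the ambiguity without building any syntactic structure.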

8 Components of a Tagger
- Dictionary of words
  - Exhaustive list of closed-class items, e.g.:
    - the, a, an: determiner
    - from, to, of, by: preposition
    - and, or: coordinating conjunction
  - Large set of open-class items (e.g. nouns, verbs, adjectives) with frequency information

9 Components of a Tagger
- Mechanism to assign tags
  - Context-free: by frequency
  - Context: bigram, trigram, hand-coded rules
  - Example: Det Noun/*Verb ("the walk…")
- Mechanism to handle unknown (extra-dictionary) words
  - Capitalization
  - Morphology: -ed, -tion
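The unknown-word heuristics named on the slide (capitalization and suffix morphology) can be sketched as a cascade of guesses. The tag labels and the noun default are assumptions made for illustration.

```python
# Sketch of unknown-word handling: cues are checked in order.
# Tag labels ("ProperNoun", "VerbPast", "Noun") are illustrative.
def guess_unknown(word, sentence_initial=False):
    # Mid-sentence capitalization suggests a proper noun.
    if word[0].isupper() and not sentence_initial:
        return "ProperNoun"
    # Morphological cues from the slide: -ed, -tion.
    if word.endswith("ed"):
        return "VerbPast"
    if word.endswith("tion"):
        return "Noun"
    # Default open-class guess (an assumption, not from the slide).
    return "Noun"

print(guess_unknown("Sandiway"))    # ProperNoun
print(guess_unknown("hissed"))      # VerbPast
print(guess_unknown("inflection"))  # Noun
```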

10 How Hard is Tagging?
- Brown Corpus (Francis & Kucera, 1982):
  - 1 million words
  - 39K distinct words
  - 35K words with only 1 tag, 4K with multiple tags (DeRose, 1988)
- An easy task to do well on:
  - 90% accuracy for a naïve algorithm (Charniak et al., 1993)
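The type counts above can be checked with back-of-the-envelope arithmetic, and the naïve algorithm (tag every word with its most frequent tag from a tagged corpus) can be sketched. The toy corpus below is invented for illustration; it is not Brown data.

```python
from collections import Counter

# 35K of the 39K distinct word types in Brown carry only one tag:
print(35_000 / 39_000)  # ~0.897 of word *types* are unambiguous

# Naïve most-frequent-tag baseline on a toy tagged corpus
# (invented data; Penn-style tags used for flavor).
corpus = [("the", "DT"), ("walk", "NN"), ("the", "DT"),
          ("walk", "NN"), ("i", "PRP"), ("walk", "VBP")]

counts = {}
for word, tag in corpus:
    counts.setdefault(word, Counter())[tag] += 1

most_frequent_tag = {w: c.most_common(1)[0][0] for w, c in counts.items()}
print(most_frequent_tag["walk"])  # NN: seen twice as noun, once as verb
```

Note that the ~90% figure on the slide is token accuracy on running text, which is not the same number as the ~90% of word *types* that are unambiguous; the two happen to be close.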

11 How Hard is Tagging?
- Multiple POS
  - Example: still: noun, adjective, adverb, verb
    - the still of the night, a glass still
    - still waters
    - stand still
    - still struggling
    - Still, I didn't give way
    - still your fear of the dark (transitive)
    - the bubbling waters stilled (intransitive)

12 Penn TreeBank Tagset
- A 48-tag simplification of the Brown Corpus tagset
- Examples:
  1. CC   Coordinating conjunction
  3. DT   Determiner
  7. JJ   Adjective
  11. MD  Modal
  12. NN  Noun (singular, mass)
  13. NNS Noun (plural)
  27. VB  Verb (base form)
  28. VBD Verb (past)
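The examples on the slide can be collected into a simple lookup table, the kind of mapping a tagger's output layer uses. This is only the fragment of the 48-tag set shown on the slide, not the full tagset.

```python
# Fragment of the Penn Treebank tagset, copied from the slide.
PENN_TAGS = {
    "CC": "Coordinating conjunction",
    "DT": "Determiner",
    "JJ": "Adjective",
    "MD": "Modal",
    "NN": "Noun (singular, mass)",
    "NNS": "Noun (plural)",
    "VB": "Verb (base form)",
    "VBD": "Verb (past)",
}

print(PENN_TAGS["VBD"])  # Verb (past)
```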

13 Penn TreeBank Tagset
- Tagset tables: www.ldc.upenn.edu/doc/treebank2/cl93.html


15 Penn TreeBank Tagset
- How many tags?
  - Tag criterion:
    - Distinctness with respect to grammatical behavior?
    - Make tagging easier?
- Punctuation tags
  - Penn Treebank numbers 37-48
  - A trivial computational task

16 Penn TreeBank Tagset
- Simplifications:
  - TO: infinitival marker, preposition
    - I want to win
    - I went to the store
  - IN (preposition): that, when, although
    - I know that I should have stopped, although…
    - I stopped when I saw Bill

17 Penn TreeBank Tagset
- Simplifications:
  - DT (determiner): any, some, these, those
    - any man
    - these *man/men
  - VBP (verb, present): am, are, walk
    - Am I here?
    - *Walked I here? / Did I walk here?

18 Hard-to-Tag Items
- Syntactic function
  - Example: I saw the man tired from running
- Examples from the Brown Corpus Manual
  - Hyphenation: long-range, high-energy, shirt-sleeved, signal-to-noise
  - Foreign words: mens sana in corpore sano ("a sound mind in a sound body")

