Presentation is loading. Please wait.

Presentation is loading. Please wait.

Lecture 9: Part of Speech

Similar presentations


Presentation on theme: "Lecture 9: Part of Speech"— Presentation transcript:

1 Lecture 9: Part of Speech
Kai-Wei Chang University of Virginia Couse webpage: CS6501 Natural Language Processing

2 CS6501 Natural Language Processing
This lecture Parts of speech (POS) POS Tagsets CS6501 Natural Language Processing

3 CS6501 Natural Language Processing
Parts of Speech Traditional parts of speech ~ 8 of them CS6501 Natural Language Processing

4 CS6501 Natural Language Processing
POS examples N noun chair, bandwidth, pacing V verb study, debate, munch ADJ adjective purple, tall, ridiculous ADV adverb unfortunately, slowly P preposition of, by, to PRO pronoun I, me, mine DET determiner the, a, that, those CS6501 Natural Language Processing

5 CS6501 Natural Language Processing
Parts of Speech A.k.a. parts-of-speech, lexical categories, word classes, morphological classes, lexical tags... Lots of debate within linguistics about the number, nature, and universality of these CS6501 Natural Language Processing

6 CS6501 Natural Language Processing
POS Tagging The process of assigning a part-of-speech to each word in a collection (sentence). WORD tag the DET koala N put V the DET keys N on P table N CS6501 Natural Language Processing

7 Why is POS Tagging Useful?
First step of a vast number of practical tasks Parsing Need to know if a word is an N or V before you can parse Information extraction Finding names, relations, etc. Speech synthesis/recognition OBject obJECT OVERflow overFLOW DIScount disCOUNT CONtent conTENT Machine Translation CS6501 Natural Language Processing

8 Open and Closed Classes
Closed class: a small fixed membership Prepositions: of, in, by, … Pronouns: I, you, she, mine, his, them, … Usually function words (short common words which play a role in grammar) Open class: new ones can be created English has 4: Nouns, Verbs, Adjectives, Adverbs Many languages have these 4, but not all! CS6501 Natural Language Processing

9 CS6501 Natural Language Processing
Open Class Words Nouns Proper nouns (Boulder, Granby, Eli Manning) Common nouns (the rest). Count nouns and mass nouns Count: have plurals, get counted: goat/goats, one goat, two goats Mass: don’t get counted (snow, salt, communism) (*two snows) Verbs In English, have morphological affixes (eat/eats/eaten) CS6501 Natural Language Processing

10 CS6501 Natural Language Processing
Closed Class Words Examples: prepositions: on, under, over, … particles: up, down, on, off, … determiners: a, an, the, … pronouns: she, who, I, .. conjunctions: and, but, or, … auxiliary verbs: can, may should, … numerals: one, two, three, third, … CS6501 Natural Language Processing

11 Prepositions from CELEX
CELEX: online dictionary Frequency counts are from COBUILD 16-billion-word corpus CS6501 Natural Language Processing

12 CS6501 Natural Language Processing
English Particles CS6501 Natural Language Processing

13 CS6501 Natural Language Processing
Conjunctions CS6501 Natural Language Processing

14 CS6501 Natural Language Processing
Choosing a Tagset Could pick very coarse tagsets N, V, Adj, Adv, Other More commonly used set is finer grained E.g., “Penn TreeBank tagset”, 45 tags: PRP$, WRB, WP$, VBG Brown cropus, 87 tags. Prague Dependency Treebank (Czech) 4452 tags AAFP3----3N----: (nejnezajímavějším) Adj Regular Feminine Plural….Superlative [Hajic 2006, VMC tutorial] CS6501 Natural Language Processing

15 Penn TreeBank POS Tagset
CS6501 Natural Language Processing

16 CS6501 Natural Language Processing
Using the Penn Tagset The/DT grand/JJ jury/NN commmented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./. CS6501 Natural Language Processing

17 CS6501 Natural Language Processing
Universal Tag set ~ 12 different tags NOUN, VERB, ADJ, ADV, PRON, DET, ADP, NUM, CONJ, PRT, “.”, X CS6501 Natural Language Processing

18 POS Tagging v.s. Word clustering
Words often have more than one POS: back The back door = JJ On my back = NN Win the voters back = RB Promised to back the bill = VB These examples from Dekang Lin CS6501 Natural Language Processing

19 CS6501 Natural Language Processing
How Hard is POS Tagging? CS6501 Natural Language Processing

20 CS6501 Natural Language Processing
POS tag sequences Some tag sequences more likely occur than others POS Ngram view ntent=_ADJ_+_NOUN_%2C_ADV_+_NOU N_%2C+_ADV_+_VERB_ Existing methods often model POS tagging as a sequence tagging problem CS6501 Natural Language Processing

21 CS6501 Natural Language Processing
Evaluation How many words in the unseen test data can be tagged correctly? Usually evaluated on Penn Treebank State of the art ~97% Trivial baseline (most likely tag) ~94% Human performance ~97% CS6501 Natural Language Processing


Download ppt "Lecture 9: Part of Speech"

Similar presentations


Ads by Google