Parts of Speech Sudeshna Sarkar 7 Aug 2008.


1 Parts of Speech Sudeshna Sarkar 7 Aug 2008

2 Why Do We Care about Parts of Speech?
Pronunciation: Hand me the lead pipe.
Predicting what words can be expected next: Personal pronoun (e.g., I, she) ____________
Stemming: -s means singular for verbs, plural for nouns
As the basis for syntactic parsing and then meaning extraction: I will lead the group into the lead smelter.
Machine translation:
  (E) content +N → (F) contenu +N
  (E) content +Adj → (F) content +Adj or satisfait +Adj

3 What is a Part of Speech?
Is this a semantic distinction? For example, maybe Noun is the class of words for people, places and things. Maybe Adjective is the class of words for properties of nouns.
Consider: green book
  book is a Noun
  green is an Adjective
Now consider:
  book worm
  This green is very soothing.

4 How Many Parts of Speech Are There?
A first cut at the easy distinctions:
Open classes: nouns, verbs, adjectives, adverbs
Closed classes (function words):
  conjunctions: and, or, but
  pronouns: I, she, him
  prepositions: with, on
  determiners: the, a, an

5 Part of speech tagging
8 (ish) traditional parts of speech: noun, verb, adjective, preposition, adverb, article, interjection, pronoun, conjunction, etc.
This idea has been around for over 2000 years (Dionysius Thrax of Alexandria, c. 100 B.C.)
Called: parts of speech, lexical categories, word classes, morphological classes, lexical tags, POS
We’ll use POS most frequently
I’ll assume that you all know what these are

6 POS examples
N    noun         chair, bandwidth, pacing
V    verb         study, debate, munch
ADJ  adjective    purple, tall, ridiculous
ADV  adverb       unfortunately, slowly
P    preposition  of, by, to
PRO  pronoun      I, me, mine
DET  determiner   the, a, that, those

7 Tagsets
Brown corpus tagset (87 tags)
Penn Treebank tagset (45 tags) (8.6)
C7 tagset (146 tags)

8 POS Tagging: Definition
The process of assigning a part-of-speech or lexical class marker to each word in a corpus:
WORDS: the koala put keys on table
TAGS: N V P DET

9 POS Tagging example
WORD   tag
the    DET
koala  N
put    V
the    DET
keys   N
on     P
table  N

10 POS tagging: Choosing a tagset
There are so many parts of speech and potential distinctions we can draw
To do POS tagging, we need to choose a standard set of tags to work with
We could pick a very coarse tagset: N, V, Adj, Adv.
The more commonly used set is finer grained: the “UPenn TreeBank tagset”, with 45 tags (e.g., PRP$, WRB, WP$, VBG)
Even more fine-grained tagsets exist

11 Penn TreeBank POS Tag set

12 Using the UPenn tagset
The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.
Prepositions and subordinating conjunctions are marked IN (“although/IN I/PRP..”)
Except the preposition/complementizer “to”, which is just marked TO.
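The word/TAG notation on this slide is easy to turn into data. The following is a minimal sketch, assuming tokens are separated by whitespace and the tag follows the last slash; the function name `parse_tagged` is made up for illustration.

```python
def parse_tagged(text):
    """Split a 'word/TAG word/TAG ...' string into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # rpartition splits on the LAST slash, so './.' becomes ('.', '.')
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


sentence = ("The/DT grand/JJ jury/NN commented/VBD on/IN a/DT "
            "number/NN of/IN other/JJ topics/NNS ./.")
pairs = parse_tagged(sentence)
for word, tag in pairs:
    print(f"{word}\t{tag}")
```

Splitting on the last slash rather than the first is a design choice that keeps words containing a slash intact.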

13 POS Tagging
Words often have more than one POS: back
  The back door = JJ
  On my back = NN
  Win the voters back = RB
  Promised to back the bill = VB
The POS tagging problem is to determine the POS tag for a particular instance of a word.

14 How hard is POS tagging? Measuring ambiguity

15 Algorithms for POS Tagging
Ambiguity: In the Brown corpus, 11.5% of the word types are ambiguous (using 87 tags). Worse, 40% of the tokens are ambiguous.
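The difference between type ambiguity and token ambiguity can be illustrated with a short sketch. It assumes a corpus given as (word, tag) pairs; the toy corpus below is invented from the "back" examples and is not Brown corpus data, so its percentages mean nothing beyond showing the computation.

```python
from collections import defaultdict


def ambiguity(tagged_tokens):
    """Return (share of ambiguous word types, share of ambiguous tokens)."""
    tags_by_word = defaultdict(set)
    for word, tag in tagged_tokens:
        tags_by_word[word.lower()].add(tag)
    # A word type is ambiguous if it was seen with more than one tag.
    ambiguous = {w for w, tags in tags_by_word.items() if len(tags) > 1}
    type_rate = len(ambiguous) / len(tags_by_word)
    token_hits = sum(1 for w, _ in tagged_tokens if w.lower() in ambiguous)
    token_rate = token_hits / len(tagged_tokens)
    return type_rate, token_rate


# Toy corpus (an assumption for illustration only).
toy = [("the", "DT"), ("back", "JJ"), ("door", "NN"),
       ("on", "IN"), ("my", "PRP$"), ("back", "NN"),
       ("promised", "VBD"), ("to", "TO"), ("back", "VB"),
       ("the", "DT"), ("bill", "NN")]
type_rate, token_rate = ambiguity(toy)
print(f"ambiguous types: {type_rate:.1%}, ambiguous tokens: {token_rate:.1%}")
```

Even here the token rate exceeds the type rate, because the one ambiguous word is also a frequent one. That is the same effect the Brown corpus figures show.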

16 Algorithms for POS Tagging
Why can’t we just look them up in a dictionary?
Words that aren’t in the dictionary
One idea: P(ti | wi) = the probability that a random hapax legomenon in the corpus has tag ti.
  Nouns are more likely than verbs, which are more likely than pronouns.
Another idea: use morphology.
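The hapax legomenon idea can be sketched as follows: collect the words that occur exactly once in a tagged corpus and use the distribution of their tags as the guess for any unknown word. The function name and the toy corpus are assumptions made for illustration.

```python
from collections import Counter


def hapax_tag_distribution(tagged_tokens):
    """Estimate P(tag | unknown word) from words seen exactly once."""
    word_counts = Counter(word for word, _ in tagged_tokens)
    hapax_tags = Counter(tag for word, tag in tagged_tokens
                         if word_counts[word] == 1)
    total = sum(hapax_tags.values())
    if total == 0:
        return {}  # no hapax legomena, so nothing to estimate from
    return {tag: n / total for tag, n in hapax_tags.items()}


# Toy corpus (invented); "the" occurs three times, so it is not a hapax.
corpus = [("the", "DT"), ("koala", "NN"), ("put", "VBD"), ("the", "DT"),
          ("keys", "NNS"), ("on", "IN"), ("the", "DT"), ("table", "NN")]
dist = hapax_tag_distribution(corpus)
print(dist)
```

Closed-class tags such as DT drop out of the estimate because function words are rarely seen only once, which matches the intuition that unknown words are mostly open-class.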

17 Algorithms for POS Tagging - Knowledge
Dictionary
Morphological rules, e.g.,
  _____-tion
  _____-ly
  capitalization
N-gram frequencies
  to _____
  DET _____ N
  But what about rare words, e.g., smelt (two verb forms, melt and past tense of smell, and one noun form, a small fish)
Combining these
  V _____-ing
  I was gracking vs. Gracking is fun.
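A rough sketch of how suffix, capitalization and context cues might be combined for an unknown word such as "gracking". The rules, their order and the Penn-style output tags are illustrative assumptions, not rules taken from any real tagger, and they will misfire on many words (for example "fly" ends in -ly).

```python
def guess_tag(word, prev_tag=None):
    """Guess a tag for an unknown word; prev_tag is None at sentence start."""
    if word.endswith("ing"):
        # "V _____-ing": after a verb, read -ing as a participle;
        # otherwise treat it as a noun-like gerund.
        if prev_tag is not None and prev_tag.startswith("V"):
            return "VBG"
        return "NN"
    if word.endswith("tion"):
        return "NN"
    if word.endswith("ly"):
        return "RB"
    # Capitalization is only informative away from the sentence start.
    if word[:1].isupper() and prev_tag is not None:
        return "NNP"
    return "NN"  # fall back to the most likely open class


print(guess_tag("gracking", prev_tag="VBD"))  # I was gracking
print(guess_tag("Gracking"))                  # Gracking is fun
```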

18 POS Tagging - Approaches
Rule-based tagging (ENGTWOL)
Stochastic (=Probabilistic) tagging
  HMM (Hidden Markov Model) tagging
Transformation-based tagging
  Brill tagger
Do we return one best answer or several answers and let later steps decide?
How does the requisite knowledge get entered?

19 3 methods for POS tagging
1. Rule-based tagging
Example: Karlsson (1995) EngCG tagger, based on the Constraint Grammar architecture and the ENGTWOL lexicon
Basic Idea:
  Assign all possible tags to words (a morphological analyzer is used)
  Remove wrong tags according to a set of constraint rules (typically more than 1000 hand-written constraint rules, but they may be machine-learned)
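The two steps above (assign every possible tag, then prune with constraints) can be sketched in a few lines. The lexicon and both rules below are hypothetical and heavily simplified; they are not EngCG constraints, and the "to" rule ignores the prepositional use of "to".

```python
# Hypothetical mini-lexicon: every tag each word can take.
LEXICON = {
    "the": {"DT"},
    "back": {"JJ", "NN", "RB", "VB"},
    "door": {"NN"},
    "to": {"TO"},
    "bill": {"NN", "VB"},
}


def no_verb_after_determiner(prev_tags, tags):
    """Illustrative constraint: drop the VB reading right after a determiner."""
    return tags - {"VB"} if prev_tags == {"DT"} else tags


def only_verb_after_to(prev_tags, tags):
    """Illustrative constraint: after 'to', keep only VB if it is available."""
    return {"VB"} if prev_tags == {"TO"} and "VB" in tags else tags


RULES = [no_verb_after_determiner, only_verb_after_to]


def constrain(words, rules=RULES):
    """Assign all lexicon tags to each word, then prune with each rule."""
    result, prev = [], set()
    for word in words:
        tags = set(LEXICON[word.lower()])
        for rule in rules:
            narrowed = rule(prev, tags)
            if narrowed:  # never remove the last remaining reading
                tags = narrowed
        result.append(tags)
        prev = tags
    return result


print(constrain(["to", "back", "the", "bill"]))
print(constrain(["the", "back", "door"]))
```

With only two rules, "back" in "the back door" stays ambiguous between JJ, NN and RB, which hints at why a practical constraint grammar needs so many rules.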

20 3 methods for POS tagging
2. Transformation-based tagging
Example: Brill (1995) tagger, a combination of rule-based and stochastic (probabilistic) tagging methodologies
Basic Idea:
  Start with a tagged corpus + dictionary (with most frequent tags)
  Set the most probable tag for each word as a start value
  Change tags according to rules of type “if word-1 is a determiner and word is a verb then change the tag to noun” in a specific order (like rule-based taggers)
  Machine learning is used: the rules are automatically induced from a previously tagged training corpus (like the stochastic approach)
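Applying one transformation of the kind quoted above might look like the sketch below. It covers only the tagging side: the most-frequent-tag dictionary and the single rule are hand-written assumptions, whereas a real Brill tagger learns an ordered rule list from a tagged training corpus.

```python
# Hypothetical most-frequent-tag dictionary (values invented for illustration).
MOST_FREQUENT_TAG = {"the": "DT", "race": "VB", "ended": "VBD"}

# Ordered rules as (from_tag, to_tag, required_previous_tag).
# This one encodes: "if word-1 is a determiner and word is a verb,
# change the tag to noun".
RULES = [("VB", "NN", "DT")]


def initial_tags(words, lexicon=MOST_FREQUENT_TAG, default="NN"):
    """Step 1: give every word its most frequent tag (NN if unknown)."""
    return [lexicon.get(word.lower(), default) for word in words]


def apply_rule(tags, from_tag, to_tag, prev_tag):
    """Step 2: rewrite from_tag as to_tag wherever the previous tag matches."""
    out = list(tags)
    for i in range(1, len(out)):
        # Changes take effect immediately, left to right.
        if out[i] == from_tag and out[i - 1] == prev_tag:
            out[i] = to_tag
    return out


def brill_tag(words, rules=RULES):
    tags = initial_tags(words)
    for from_tag, to_tag, prev_tag in rules:  # rule order matters
        tags = apply_rule(tags, from_tag, to_tag, prev_tag)
    return list(zip(words, tags))


tagged = brill_tag(["the", "race", "ended"])
print(tagged)
```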

21 3 methods for POS tagging
3. Stochastic (=Probabilistic) tagging
Example: HMM (Hidden Markov Model) tagging, in which a training corpus is used to compute the probability (frequency) of a given word having a given POS tag in a given context

22 Hidden Markov Model (HMM) Tagging
Using an HMM to do POS tagging
HMM is a special case of Bayesian inference
It is also related to the “noisy channel” model in ASR (Automatic Speech Recognition)

23 Hidden Markov Model (HMM) Taggers
Goal: maximize P(word|tag) x P(tag|previous n tags)
Lexical information: P(word|tag)
  word/lexical likelihood
  probability that given this tag, we have this word
  NOT probability that this word has this tag
  modeled through language model (word-tag matrix)
Syntagmatic information: P(tag|previous n tags)
  tag sequence likelihood
  probability that this tag follows these previous tags
  modeled through language model (tag-tag matrix)
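Both matrices can be estimated by relative frequency from a tagged corpus. The sketch below assumes n = 1 (each tag conditioned on the single previous tag), no smoothing, and a made-up start symbol `<s>`; the two training sentences are invented.

```python
from collections import Counter


def estimate(tagged_sentences, start="<s>"):
    """Return (P(word|tag), P(tag|previous tag)) as dictionaries."""
    tag_counts, emissions = Counter(), Counter()
    transitions, contexts = Counter(), Counter()
    for sentence in tagged_sentences:
        prev = start
        for word, tag in sentence:
            tag_counts[tag] += 1
            emissions[(tag, word)] += 1    # word-tag matrix counts
            transitions[(prev, tag)] += 1  # tag-tag matrix counts
            contexts[prev] += 1
            prev = tag
    p_word_given_tag = {(tag, word): count / tag_counts[tag]
                        for (tag, word), count in emissions.items()}
    p_tag_given_prev = {(prev, tag): count / contexts[prev]
                        for (prev, tag), count in transitions.items()}
    return p_word_given_tag, p_tag_given_prev


# Invented two-sentence training corpus.
training = [[("the", "DT"), ("koala", "NN")],
            [("the", "DT"), ("keys", "NN")]]
p_word_given_tag, p_tag_given_prev = estimate(training)
print(p_word_given_tag[("NN", "koala")])  # P(koala | NN)
print(p_tag_given_prev[("DT", "NN")])     # P(NN | DT)
```

The emission dictionary is keyed by (tag, word) to keep the direction of conditioning visible: it stores P(word|tag), not P(tag|word).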

24 POS tagging as a sequence classification task
We are given a sentence (an “observation” or “sequence of observations”)
  Secretariat is expected to race tomorrow
  a sequence of n words w1…wn.
What is the best sequence of tags which corresponds to this sequence of observations?
Probabilistic/Bayesian view:
  Consider all possible sequences of tags
  Out of this universe of sequences, choose the tag sequence which is most probable given the observation sequence of n words w1…wn.
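Taken literally, "consider all possible sequences of tags" is a brute-force search. The sketch below scores every candidate sequence as a product of P(tag|previous tag) x P(word|tag) terms and keeps the best one. All probabilities are invented toy values, and the bigram (n = 1) assumption and the `<s>` start symbol are choices made for this example.

```python
from itertools import product


def best_tag_sequence(words, tagset, p_word_given_tag, p_tag_given_prev,
                      start="<s>"):
    """Enumerate every tag sequence and return the most probable one."""
    best, best_score = None, 0.0
    for tags in product(tagset, repeat=len(words)):
        score, prev = 1.0, start
        for word, tag in zip(words, tags):
            # Unseen events get probability 0 here (no smoothing).
            score *= (p_tag_given_prev.get((prev, tag), 0.0)
                      * p_word_given_tag.get((tag, word), 0.0))
            prev = tag
        if score > best_score:
            best, best_score = tags, score
    return best, best_score


# Invented toy model, not estimated from any corpus.
emit = {("TO", "to"): 1.0, ("NN", "race"): 0.6, ("VB", "race"): 0.4}
trans = {("<s>", "TO"): 1.0, ("TO", "VB"): 0.8, ("TO", "NN"): 0.2}

best, score = best_tag_sequence(["to", "race"], ["TO", "NN", "VB"],
                                emit, trans)
print(best, score)
```

In this toy model "race" is more likely as a noun in isolation, yet the sequence TO VB wins because of the tag-tag term. Enumeration grows exponentially with sentence length, so it is only practical for very short inputs; dynamic programming (the Viterbi algorithm) is the usual way to find the same maximum efficiently.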

