Part of speech (POS) tagging


1 Part of speech (POS) tagging
Tagging of words in a corpus with the correct part of speech, drawn from some tagset. Early automatic POS taggers were rule-based. Stochastic POS taggers are reasonably accurate.

2 Applications of POS tagging
Parsing: recovering syntactic structure requires correct POS tags. Partial parsing refers to any syntactic analysis that does not produce a full syntactic parse, e.g. finding noun phrases ("parsing by chunks").

3 Applications of POS tagging
Information extraction: fill slots in predefined templates with information from text. A full parse is not needed for this task, but partial parsing results (phrases) can be very helpful. Information extraction uses grammatical categories (POS tags) to help identify semantic categories.

4 Applications of POS tagging
Question answering: the system responds to a user question with a noun phrase. Who shot JR? (Kristen Shepard) Where is Starbucks? (UB Commons) What is good to eat here? (pizza)

5 Background on POS tagging
How hard is tagging? Most words have just a single tag: easy. Some words have more than one possible tag: harder. Many common words are ambiguous. Brown corpus: 10.4% of word types are ambiguous, but 40%+ of word tokens are ambiguous.
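The type/token ambiguity rates above can be measured directly from any tagged corpus. A minimal sketch, using a toy corpus rather than Brown:

```python
from collections import defaultdict

def ambiguity_stats(tagged_tokens):
    """Fraction of word types, and of word tokens, with more than one observed tag."""
    tags_for = defaultdict(set)           # word type -> set of observed tags
    for word, tag in tagged_tokens:
        tags_for[word.lower()].add(tag)
    ambiguous = {w for w, tags in tags_for.items() if len(tags) > 1}
    type_rate = len(ambiguous) / len(tags_for)
    token_rate = sum(w.lower() in ambiguous for w, _ in tagged_tokens) / len(tagged_tokens)
    return type_rate, token_rate

# Toy corpus: the types "to" and "race" each occur with two different tags
corpus = [("to", "TO"), ("race", "VB"), ("the", "DT"), ("race", "NN"),
          ("to", "IN"), ("win", "VB")]
print(ambiguity_stats(corpus))  # 2 of 4 types, 4 of 6 tokens are ambiguous
```

Run on the Brown corpus, the same computation yields the figures cited above; the gap between type rate and token rate reflects the fact that the ambiguous words are disproportionately the common ones.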

6 Disambiguation approaches
Rule-based: rely on a large set of rules to disambiguate in context; the rules are mostly hand-written. Stochastic: rely on probabilities of words having certain tags in context; the probabilities are derived from a training corpus. Combined: a transformation-based tagger uses a stochastic approach to determine an initial tagging, then uses a rule-based approach to "clean up" the tags.

7 Determining the appropriate tag for an untagged word
Two types of information can be used. Syntagmatic information: consider the tags of other words in the surrounding context. A tagger using only such information correctly tagged approx. 77% of words. Problem: content words (the ones most likely to be ambiguous) typically have many parts of speech, via productive rules (e.g. N → V).

8 Determining the appropriate tag for an untagged word
Lexical information: use information about the word itself (e.g. usage probability). A baseline for tagger performance is given by a tagger that simply assigns the most common tag to ambiguous words; this correctly tags 90% of words. Modern taggers use a variety of information sources.
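The most-common-tag baseline can be sketched in a few lines. The toy training data is invented, and the NN fallback for unknown words is an assumption, not part of the baseline's definition:

```python
from collections import Counter, defaultdict

def train_baseline(tagged_tokens):
    """Most-frequent-tag baseline: remember each word's most common training tag."""
    counts = defaultdict(Counter)
    for word, tag in tagged_tokens:
        counts[word][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def tag(model, words, default="NN"):
    # Unknown words fall back to a default tag (NN here, an assumption)
    return [(w, model.get(w, default)) for w in words]

model = train_baseline([("race", "NN"), ("race", "NN"), ("race", "VB"),
                        ("to", "TO"), ("the", "DT")])
print(tag(model, ["the", "race"]))  # "race" gets NN, its most common training tag
```

This baseline ignores context entirely, which is exactly why its 90% figure is the floor that context-aware taggers must beat.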

9 Note about accuracy measures
Modern taggers claim accuracy rates of around 96% to 97%. This sounds impressive, but how good are they really? Accuracy is measured at the level of individual words, not whole sentences or corpora. With 96% accuracy, 1 word out of 25 is tagged incorrectly, which is roughly one tagging error per sentence.
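The per-sentence claim is just compounded per-word accuracy: for an n-word sentence, the chance of tagging every word correctly is 0.96^n, which falls below one half for typical sentence lengths.

```python
word_acc = 0.96
for n in (20, 25):
    p_all_correct = word_acc ** n   # probability the whole sentence is error-free
    print(f"{n} words: P(no errors) = {p_all_correct:.2f}")
```

At 20 to 25 words per sentence, more than half of all sentences contain at least one tagging error, which is why 96% word accuracy is less impressive than it sounds.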

10 Rule-based POS tagging
Two-stage design: the first stage looks up individual words in a dictionary and tags each word with its set of possible tags; the second stage uses rules to disambiguate, resulting in singleton tag sets.
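A minimal sketch of the two-stage design. The lexicon and the single hand-written rule are invented for illustration and are far cruder than any real rule-based tagger's rule set:

```python
# Stage 1 dictionary: word -> set of possible tags (toy lexicon, illustrative only)
LEXICON = {"the": {"DT"}, "can": {"MD", "NN", "VB"}, "fish": {"NN", "VB"}}

def stage1(words):
    # Dictionary lookup: each word gets its full set of possible tags
    return [(w, set(LEXICON.get(w, {"NN"}))) for w in words]

def stage2(tagged):
    # One hand-written disambiguation rule: after a determiner, discard verb readings
    out = []
    prev_tags = set()
    for word, tags in tagged:
        if "DT" in prev_tags and len(tags) > 1:
            tags = (tags - {"VB", "MD"}) or tags
        out.append((word, tags))
        prev_tags = tags
    return out

print(stage2(stage1(["the", "can"])))  # "can" is narrowed from 3 tags to {'NN'}
```

A real second stage applies hundreds of such constraints until every word's tag set is (ideally) a singleton.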

11 Stochastic POS tagging
Stochastic taggers choose tags that maximize the probability: P(word | tag) * P(tag | previous n tags). Stochastic taggers generally maximize this probability over entire tag sequences for sentences.

12 Bigram stochastic tagger
This kind of tagger "…chooses tag ti for word wi that is most probable given the previous tag ti-1 and the current word wi: ti = argmaxj P(tj | ti-1, wi) (8.2)" [page 303]. Bayes' law says: P(T|W) = P(T)P(W|T)/P(W), so P(tj | ti-1, wi) = P(tj) P(ti-1, wi | tj) / P(ti-1, wi). Since we take the argmax of this over the tj, and the denominator does not depend on tj, the result is the same as maximizing P(tj) P(ti-1, wi | tj). Rewriting: ti = argmaxj P(tj | ti-1) P(wi | tj)
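The final decision rule ti = argmaxj P(tj | ti-1) P(wi | tj) can be sketched directly. The probability tables here are illustrative stand-ins for estimates that would come from a tagged training corpus:

```python
def choose_tag(prev_tag, word, candidates, p_tag, p_word):
    """Pick t_i = argmax over t of P(t | prev_tag) * P(word | t).

    p_tag[(prev, t)] is the tag-transition probability and p_word[(word, t)]
    the lexical likelihood, both assumed estimated from a tagged corpus.
    """
    return max(candidates,
               key=lambda t: p_tag.get((prev_tag, t), 0.0)
                           * p_word.get((word, t), 0.0))

# Toy tables (illustrative numbers only, not corpus estimates)
p_tag = {("DT", "NN"): 0.5, ("DT", "VB"): 0.01}
p_word = {("race", "NN"): 0.002, ("race", "VB"): 0.001}
print(choose_tag("DT", "race", ["NN", "VB"], p_tag, p_word))  # NN: 0.001 vs 0.00001
```

A full bigram tagger applies this greedy rule word by word, or searches over whole tag sequences for the globally best assignment.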

13 Example (page 304) What tag do we assign to race?
to/TO race/?? the/DT race/?? If we are choosing between NN and VB as tags for race, the quantities to compare are: P(VB|TO)P(race|VB) and P(NN|TO)P(race|NN). The tagger will choose the tag that maximizes this probability.

14 Example For first part – look at tag sequence probability:
P(NN|TO) = 0.021
P(VB|TO) = 0.34
For the second part – look at the lexical likelihood:
P(race|NN) =
P(race|VB) =
Combining these:
P(VB|TO)P(race|VB) =
P(NN|TO)P(race|NN) =
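The tag-transition probabilities above come from the slide, but the lexical likelihoods did not survive; the values for P(race|NN) and P(race|VB) below are placeholders chosen only to show how the two parts combine:

```python
# Tag-transition probabilities from the slide
p_nn_given_to = 0.021    # P(NN | TO)
p_vb_given_to = 0.34     # P(VB | TO)

# Lexical likelihoods: PLACEHOLDER values (the slide's actual numbers are missing)
p_race_given_nn = 0.0002
p_race_given_vb = 0.0001

score_nn = p_nn_given_to * p_race_given_nn   # 4.2e-06
score_vb = p_vb_given_to * p_race_given_vb   # 3.4e-05
print("VB" if score_vb > score_nn else "NN")
```

With any plausible lexical likelihoods, the large transition gap (0.34 vs 0.021) dominates: after "to", the verb reading of "race" wins, matching the intuition for "to race".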
