
1 Part-of-Speech Tagging (Foundations of Statistical NLP, Chapter 10)

2 Contents  Markov Model Taggers  Hidden Markov Model Taggers  Transformation-Based Learning of Tags  Tagging Accuracy and Uses of Taggers

3 Markov Model Taggers  Markov properties
 Limited horizon: the choice of the next tag depends only on the current tag
 Time invariant: these dependencies do not change over time
cf. Wh-extraction (Chomsky) illustrates a long-distance dependency that a limited-horizon model cannot capture:
a. Should Peter buy a book?
b. Which book should Peter buy?
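
In the chapter's notation, with X_i the tag at position i, the two properties can be written as follows (a reconstruction consistent with the definitions above):

    P(X_{i+1} = t^k \mid X_1, \ldots, X_i) = P(X_{i+1} = t^k \mid X_i)    (limited horizon)
    P(X_{i+1} = t^k \mid X_i) = P(X_2 = t^k \mid X_1)                     (time invariance)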

4 Markov Model Taggers  The probabilistic model  Finding the best tagging t_{1,n} for a sentence w_{1,n} ex: P(AT NN BEZ IN AT VB | The bear is on the move)

5 Assumptions  words are independent of each other  a word's identity depends only on its tag
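
Combining the Markov assumption on tags with these two independence assumptions gives the standard bigram tagging objective (t_0 taken as a sentence-start tag):

    \hat{t}_{1,n} = \arg\max_{t_{1,n}} P(t_{1,n} \mid w_{1,n})
                  = \arg\max_{t_{1,n}} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})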

6 Markov Model Taggers  Training (maximum likelihood estimates from a tagged corpus)
for all tags t^j do
  for all tags t^k do
    P(t^k | t^j) := C(t^j, t^k) / C(t^j)
  end
end
for all tags t^j do
  for all words w^l do
    P(w^l | t^j) := C(w^l : t^j) / C(t^j)
  end
end
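
A minimal Python sketch of this training step, assuming the tagged corpus is given as lists of (word, tag) pairs per sentence (the corpus and variable names below are illustrative, not from the slides):

    from collections import defaultdict

    def train_bigram_tagger(tagged_sentences):
        """Estimate P(tag | previous tag) and P(word | tag) by maximum likelihood."""
        trans = defaultdict(lambda: defaultdict(int))   # C(t^j, t^k)
        emit = defaultdict(lambda: defaultdict(int))    # C(w^l : t^j)
        tag_count = defaultdict(int)                    # C(t^j)
        for sent in tagged_sentences:
            prev = "<s>"                                # sentence-start pseudo-tag
            for word, tag in sent:
                trans[prev][tag] += 1
                emit[tag][word] += 1
                tag_count[tag] += 1
                prev = tag
        # turn counts into conditional probabilities
        p_trans = {t1: {t2: c / sum(row.values()) for t2, c in row.items()}
                   for t1, row in trans.items()}
        p_emit = {t: {w: c / tag_count[t] for w, c in row.items()}
                  for t, row in emit.items()}
        return p_trans, p_emit

    corpus = [[("the", "AT"), ("bear", "NN"), ("is", "BEZ"),
               ("on", "IN"), ("the", "AT"), ("move", "NN"), (".", "PERIOD")]]
    p_trans, p_emit = train_bigram_tagger(corpus)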

7 Tag transition counts (rows = first tag, columns = second tag):

First tag   AT      BEZ     IN      NN      VB      PERIOD
AT          0       0       0       48636   0       19
BEZ         1973    0       426     187     0       38
IN          43322   0       1325    17314   0       185
NN          1067    3720    42470   11773   614     21392
VB          6072    42      4758    1476    129     1522
PERIOD      8016    75      4656    1329    954     0

Word emission counts (rows = word, columns = tag):

word        AT      BEZ     IN      NN      VB      PERIOD
bear        0       0       0       10      43      0
is          0       10065   0       0       0       0
move        0       0       0       36      133     0
on          0       0       5484    0       0       0
president   0       0       0       382     0       0
progress    0       0       0       108     4       0
the         69016   0       0       0       0       0
.           0       0       0       0       0       48809
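
For example, restricting attention to the six tags shown, P(NN | AT) = C(AT, NN) / C(AT) = 48636 / (48636 + 19) ≈ 0.9996, i.e. within this fragment of the tag set an article is almost always followed by a noun.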

8 Markov Model Taggers  Tagging (the Viterbi algorithm)
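
A compact Python sketch of bigram Viterbi decoding over the probabilities estimated above (the variable names and the 1e-12 floor for unseen events are illustrative choices, not from the slides):

    import math

    def viterbi(words, tags, p_trans, p_emit, start="<s>"):
        """Most probable tag sequence for `words` under a bigram HMM (log space)."""
        floor = 1e-12                               # floor for unseen events
        def lp(d, a, b):
            return math.log(d.get(a, {}).get(b, floor))
        # delta[i][t]: best log-probability of a tag sequence ending in tag t at position i
        delta = [{t: lp(p_trans, start, t) + lp(p_emit, t, words[0]) for t in tags}]
        back = [{}]
        for i in range(1, len(words)):
            delta.append({})
            back.append({})
            for t in tags:
                prev, best = max(((pt, delta[i - 1][pt] + lp(p_trans, pt, t))
                                  for pt in tags), key=lambda x: x[1])
                delta[i][t] = best + lp(p_emit, t, words[i])
                back[i][t] = prev
        # follow back-pointers from the best final tag
        tag = max(delta[-1], key=delta[-1].get)
        path = [tag]
        for i in range(len(words) - 1, 0, -1):
            path.append(back[i][path[-1]])
        return list(reversed(path))

    tags = ["AT", "NN", "BEZ", "IN", "VB", "PERIOD"]
    # p_trans / p_emit as estimated in the training sketch above
    print(viterbi(["the", "bear", "is", "on", "the", "move", "."], tags, p_trans, p_emit))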

9 Variations  The models for unknown words 1. assuming that they can be any part of speech 2. using morphological features to make inferences about the possible parts of speech

10 Z: normalization constant in the unknown-word model
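
A hedged reconstruction of the feature-based unknown-word estimate the chapter describes (after Weischedel et al. 1993), where Z normalizes the product into a probability:

    P(w^l \mid t^j) = \frac{1}{Z}\, P(\text{unknown word} \mid t^j)\,
                      P(\text{capitalized}(w^l) \mid t^j)\,
                      P(\text{endings/hyphenation of } w^l \mid t^j)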

11 Variations  Trigram taggers  Interpolation  Variable Memory Markov Model (VMMM)
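
Interpolation here refers to linearly mixing unigram, bigram, and trigram tag models, with the weights lambda_i summing to one:

    P(t_i \mid t_{i-1}, t_{i-2}) = \lambda_1 P_1(t_i) + \lambda_2 P_2(t_i \mid t_{i-1}) + \lambda_3 P_3(t_i \mid t_{i-1}, t_{i-2})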

12 Variations  Smoothing  Reversibility K_l: the number of possible parts of speech of w_l
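
As an illustration only, an add-lambda scheme over the K_l admissible tags of w_l (an assumption for concreteness, not necessarily the chapter's exact formula) would be:

    P(t^j \mid w^l) = \frac{C(t^j, w^l) + \lambda}{C(w^l) + \lambda K_l}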

13 Variations  Sequence vs. tag by tag: Time flies like an arrow. a. NN VBZ RB AT NN. P(.) = 0.01 b. NN NNS VB AT NN. P(.) = 0.01  in practice there is no large difference in accuracy between maximizing the probability of the whole tag sequence and maximizing each tag individually

14 Hidden Markov Model Taggers  When we have no tagged training data, initialize all parameters with dictionary information:  Jelinek's method  Kupiec's method

15 Hidden Markov Model Taggers  Jelinek's method  initializing the HMM output probabilities P(w^k | t^i) from the dictionary and raw word frequencies  assuming that a word occurs equally likely with each of its possible tags. T(w^j): the number of tags allowed for w^j
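
A small Python sketch of this initialization, assuming a dictionary mapping each word to its allowed tags and raw word counts from untagged text (the names and data below are illustrative):

    def jelinek_init(dictionary, word_counts, tags):
        """Initial P(w | t): spread each word's corpus frequency evenly over its
        admissible tags, then normalize per tag."""
        weighted = {t: {} for t in tags}            # C(w^l) / T(w^l) for allowed tags
        for w, allowed in dictionary.items():
            for t in allowed:
                weighted[t][w] = word_counts.get(w, 0) / len(allowed)
        p_emit = {}
        for t, row in weighted.items():
            total = sum(row.values()) or 1.0        # avoid division by zero
            p_emit[t] = {w: c / total for w, c in row.items()}
        return p_emit

    dictionary = {"the": ["AT"], "bear": ["NN", "VB"], "is": ["BEZ"]}
    word_counts = {"the": 69016, "bear": 53, "is": 10065}
    p0_emit = jelinek_init(dictionary, word_counts, ["AT", "NN", "VB", "BEZ"])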

16 Hidden Markov Model Taggers  Kupiec's method  grouping all words with the same set of possible parts of speech into 'metawords' u_L  so that parameters are not fine-tuned for each individual word (see the sketch below)
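
A minimal sketch of the grouping step, reusing the same illustrative dictionary format as above:

    def kupiec_metawords(dictionary):
        """Group words by their set of admissible tags; each group shares one metaword."""
        groups = {}
        for w, allowed in dictionary.items():
            key = frozenset(allowed)
            groups.setdefault(key, []).append(w)
        # e.g. all words taggable only as {NN, VB} share one metaword, so emission
        # parameters are estimated once per group rather than once per word
        return groups

    dictionary = {"the": ["AT"], "bear": ["NN", "VB"], "move": ["NN", "VB"], "is": ["BEZ"]}
    print(kupiec_metawords(dictionary))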

17 Hidden Markov Model Taggers  Training  after initialization, the HMM is trained with the Forward-Backward algorithm  Tagging  identical to the Visible Markov Model (Viterbi): the difference between VMM tagging and HMM tagging is in how we train the model, not in how we tag.

18 Hidden Markov Model Taggers  The effect of initialization on the HMM  overtraining problem
D0: maximum likelihood estimates from a tagged training corpus
D1: correct ordering only of lexical probabilities
D2: lexical probabilities proportional to overall tag probabilities
D3: equal lexical probabilities for all tags admissible for a word
T0: maximum likelihood estimates from a tagged training corpus
T1: equal probabilities for all transitions

19
 Use the Visible Markov Model: a sufficiently large training text, similar to the intended text of application
 Run Forward-Backward for a few iterations: no training text, or training and test text are very different, but at least some lexical information
 Run Forward-Backward for a larger number of iterations: no lexical information

