Ling 570 Day 6: HMM POS Taggers. Overview: Open Questions; HMM POS Tagging; Review Viterbi algorithm; Training and Smoothing; HMM Implementation Details.

1 Ling 570 Day 6: HMM POS Taggers 1

2 Overview
– Open Questions
– HMM POS Tagging
– Review Viterbi algorithm
– Training and Smoothing
– HMM Implementation Details

3 HMM POS TAGGING 3

4 HMM Tagger 4

8 The good HMM Tagger
From the Brown/Switchboard corpus:
– P(VB|TO) = .34
– P(NN|TO) = .021
– P(race|VB) = [value missing]
– P(race|NN) = [value missing]
a. P(VB|TO) x P(race|VB) = .34 x [value missing] = [value missing]
b. P(NN|TO) x P(race|NN) = .021 x [value missing] = [value missing]
⇒ a. TO followed by VB in the context of race is more probable (‘race’ really has no effect here).

9 HMM Philosophy
Imagine: the author, when creating this sentence, also had in mind the parts of speech of each of these words. After the fact, we’re now trying to recover those parts of speech. They’re the hidden part of the Markov model.

10 What happens when we do it the wrong way?
Invert word and tag, using P(t|w) instead of P(w|t):
1. P(VB|race) = .02
2. P(NN|race) = .98
Probability 2 would drown out virtually any other probability: we’d always tag race with NN!

11 What happens when we do it the wrong way? 11

14 N-gram POS tagging
colorless/JJ green/JJ ideas/NNS sleep/VB furiously/RB
– Predict current tag conditioned on prior n-1 tags
– Predict word conditioned on current tag

17 HMM bigram tagger
colorless/JJ green/JJ ideas/NNS sleep/VB furiously/RB

18 HMM trigram tagger
colorless/JJ green/JJ ideas/NNS sleep/VB furiously/RB
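
The bigram and trigram taggers differ only in how much tag history the transition term conditions on. A sketch in standard notation, where the symbols w_i (word), t_i (tag), and n (sentence length) are assumed here, and t_0 and t_{-1} stand for sentence-start padding:

```latex
% Bigram HMM tagger: each tag depends on the previous tag
\hat{t}_{1:n} = \arg\max_{t_{1:n}} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})

% Trigram HMM tagger: each tag depends on the previous two tags
\hat{t}_{1:n} = \arg\max_{t_{1:n}} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-2}, t_{i-1})
```

In both cases the emission term P(w_i | t_i) is the same; only the transition term changes.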

19 Training
An HMM needs to be trained on the following:
1. The initial state probabilities
2. The state transition probabilities
– The tag-tag matrix
3. The emission probabilities
– The tag-word matrix
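
The three parameter sets listed above can be estimated from a tagged corpus by relative frequency. The following is a minimal, unsmoothed sketch; the function name `train_hmm` and the two-sentence toy corpus are illustrative assumptions, not part of the course materials.

```python
from collections import Counter, defaultdict


def train_hmm(tagged_sentences):
    """Estimate unsmoothed HMM parameters by relative frequency.

    tagged_sentences: list of sentences, each a list of (word, tag) pairs.
    Returns (pi, a, b): initial, transition, and emission probabilities.
    """
    init_counts = Counter()
    trans_counts = defaultdict(Counter)   # trans_counts[prev_tag][tag]
    emit_counts = defaultdict(Counter)    # emit_counts[tag][word]

    for sentence in tagged_sentences:
        if not sentence:
            continue
        init_counts[sentence[0][1]] += 1  # tag of the first word
        prev_tag = None
        for word, tag in sentence:
            emit_counts[tag][word] += 1
            if prev_tag is not None:
                trans_counts[prev_tag][tag] += 1
            prev_tag = tag

    def normalize(counter):
        total = sum(counter.values())
        return {key: count / total for key, count in counter.items()}

    pi = normalize(init_counts)
    a = {tag: normalize(c) for tag, c in trans_counts.items()}
    b = {tag: normalize(c) for tag, c in emit_counts.items()}
    return pi, a, b


# Toy corpus (made up for illustration).
corpus = [
    [("Mariners", "N"), ("hit", "V"), ("a", "DT"), ("home", "N"), ("run", "N")],
    [("Mariners", "N"), ("hit", "N"), ("made", "V"), ("the", "DT"), ("news", "N")],
]
pi, a, b = train_hmm(corpus)
```

Any tag bigram or (word, tag) pair that never occurs in the corpus is simply absent from these tables, which is the zero-count problem that smoothing addresses.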

20 Implementation 20

21 Implementation Transition distribution 21

22 Implementation Emission distribution 22

23 Implementation 23

24 Implementation 24

25 REVIEW VITERBI ALGORITHM 25

26 Consider two examples
Mariners hit a home run
Mariners hit made the news

27 Consider two examples
Mariners/N hit/V a/DT home/N run/N
Mariners/N hit/N made/V the/DT news/N

28 Parameters
As probabilities, they get very small.
[Tables: tag-to-tag transition probabilities over N, V, DT; emission probabilities of a, hit, home, made, Mariners, news, run, the given each tag. Cell values not recoverable.]

29 Parameters
As log probabilities, they won’t underflow, and we can just add them.
[Tables: the same transition and emission parameters as log probabilities. Cell values not recoverable.]
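
The underflow point can be checked directly. This is a small sketch with made-up numbers (a per-step probability of 1e-5 over 100 steps): the raw product collapses to zero in double precision, while the sum of logs stays representable.

```python
import math

p = 1e-5   # illustrative per-step probability (an assumption)
n = 100    # illustrative number of steps

# Multiplying raw probabilities: 1e-500 is far below the smallest
# positive double, so the product underflows to exactly 0.0.
product = 1.0
for _ in range(n):
    product *= p

# Adding log probabilities: the same quantity, represented safely.
log_sum = sum(math.log(p) for _ in range(n))
```

Here `product` ends up as 0.0, while `log_sum` is roughly -1151.3, so two candidate paths can still be compared.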

30 [Figure: trellis of states N, V, DT over the words “Mariners hit a home run”, shown with the log-probability tables]

31 [Figure: trellis of states N, V, DT over the words “Mariners hit made the news”, shown with the log-probability tables]

32 Viterbi 32

33 Pseudocode 33

34 Pseudocode 34
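
As a concrete companion to the pseudocode slides, here is a minimal sketch of bigram Viterbi decoding in log space. The function signature and all probability values are illustrative assumptions; the numbers were chosen for this example and are not the ones from the parameter tables.

```python
import math

NEG_INF = float("-inf")


def viterbi(words, states, log_pi, log_a, log_b):
    """Return (best tag sequence, its log probability) for a bigram HMM."""
    # delta[t][s]: best log score of any path that ends in state s at position t
    delta = [{s: log_pi.get(s, NEG_INF) + log_b[s].get(words[0], NEG_INF)
              for s in states}]
    backptr = [{}]
    for t in range(1, len(words)):
        delta.append({})
        backptr.append({})
        for s in states:
            # Best predecessor r for state s: maximize delta + transition score
            best = max(states,
                       key=lambda r: delta[t - 1][r] + log_a[r].get(s, NEG_INF))
            delta[t][s] = (delta[t - 1][best] + log_a[best].get(s, NEG_INF)
                           + log_b[s].get(words[t], NEG_INF))
            backptr[t][s] = best
    # Follow back-pointers from the best final state
    last = max(states, key=lambda s: delta[-1][s])
    path = [last]
    for t in range(len(words) - 1, 0, -1):
        path.append(backptr[t][path[-1]])
    path.reverse()
    return path, delta[-1][last]


def to_log(table):
    return {k: math.log(v) for k, v in table.items()}


# Made-up parameters for the two example sentences (assumptions).
states = ["N", "V", "DT"]
log_pi = to_log({"N": 0.6, "V": 0.1, "DT": 0.3})
log_a = {
    "N": to_log({"N": 0.3, "V": 0.6, "DT": 0.1}),
    "V": to_log({"N": 0.3, "V": 0.01, "DT": 0.69}),
    "DT": to_log({"N": 0.9, "V": 0.05, "DT": 0.05}),
}
log_b = {
    "N": to_log({"Mariners": 0.3, "hit": 0.1, "home": 0.2, "run": 0.2, "news": 0.2}),
    "V": to_log({"hit": 0.5, "made": 0.45, "run": 0.05}),
    "DT": to_log({"a": 0.5, "the": 0.5}),
}

path1, score1 = viterbi("Mariners hit a home run".split(), states, log_pi, log_a, log_b)
path2, score2 = viterbi("Mariners hit made the news".split(), states, log_pi, log_a, log_b)
```

With these made-up parameters, `path1` is N V DT N N and `path2` is N N V DT N, so the ambiguous word “hit” is tagged differently depending on its context. Missing table entries are treated as log 0, which is exactly where smoothing becomes necessary.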

35 SMOOTHING 35

36 Training 36

40 Why Smoothing?
– Zero counts
– Handle missing tag sequences: smooth transition probabilities
– Handle unseen words: smooth observation probabilities
– Handle unseen (word, tag) pairs where both are known

41 Smoothing Tag Sequences 41

42 Smoothing Tag Sequences 42

43 Smoothing Tag Sequences 43

44 Smoothing Tag Sequences 44
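
One simple way to give unseen tag sequences non-zero probability is add-k (Laplace) smoothing. It is used here only as an illustrative assumption and is not necessarily the method these slides presented; the function name `smoothed_transitions` and the toy tag sequences are likewise hypothetical.

```python
from collections import Counter, defaultdict


def smoothed_transitions(tag_sequences, tagset, k=1.0):
    """Add-k smoothed bigram transition probabilities P(tag | prev_tag)."""
    counts = defaultdict(Counter)
    for tags in tag_sequences:
        for prev_tag, tag in zip(tags, tags[1:]):
            counts[prev_tag][tag] += 1

    probs = {}
    for prev_tag in tagset:
        # Every possible next tag receives k extra pseudo-counts
        total = sum(counts[prev_tag].values()) + k * len(tagset)
        probs[prev_tag] = {tag: (counts[prev_tag][tag] + k) / total
                           for tag in tagset}
    return probs


tagset = ["N", "V", "DT"]
a = smoothed_transitions(
    [["N", "V", "DT", "N", "N"], ["N", "N", "V", "DT", "N"]],
    tagset,
)
```

The transition N to DT never occurs in the toy data, yet `a["N"]["DT"]` comes out as 1/7 instead of 0, and each row still sums to 1.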

45 Smoothing Emission Probabilities 45

46 Smoothing Emission Probabilities 46

47 Smoothing Emission Probabilities
Preprocessing the training corpus:
– Count occurrences of all words
– Replace singleton words with the magic token
– Gather counts on the modified data and estimate parameters
Preprocessing the test set:
– For each test set word:
– If seen at least twice in the training set, leave it alone
– Otherwise replace it with the magic token
– Run Viterbi on this modified input
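
The preprocessing steps above can be sketched as follows. The token spelling `<UNK>` and the helper names `build_vocab` and `replace_rare` are assumptions for illustration; the original name of the magic token is not given.

```python
from collections import Counter

UNK = "<UNK>"  # hypothetical spelling of the magic token


def build_vocab(train_sentences):
    """Keep only words seen at least twice in the training corpus."""
    counts = Counter(word for sentence in train_sentences for word in sentence)
    return {word for word, count in counts.items() if count >= 2}


def replace_rare(sentences, vocab):
    """Replace every out-of-vocabulary word with the magic token."""
    return [[word if word in vocab else UNK for word in sentence]
            for sentence in sentences]


train = [
    ["the", "dog", "barks"],
    ["the", "cat", "sleeps"],
    ["a", "dog", "sleeps"],
]
vocab = build_vocab(train)
train_unk = replace_rare(train, vocab)                       # used to gather counts
test_unk = replace_rare([["the", "fox", "sleeps"]], vocab)   # input to Viterbi
```

Because the training singletons are mapped to the same token as unseen test words, the emission table ends up with a trained P(`<UNK>` | tag) entry for every tag that had rare words.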

48 Unknown Words
Is there other information we could use for P(w|t)?
– Information in words themselves?
– Morphology: -able → JJ, -tion → NN, -ly → RB
– Case: John → NP, etc.
– Augment models
– Add to ‘context’ of tags
– Include as features in classifier models
– We’ll come back to this idea!
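
The morphological and case cues above can be illustrated with a deliberately crude heuristic. The function name `guess_tag` and the rule order are assumptions; in practice these cues would more likely feed a model as features than act as hard rules.

```python
def guess_tag(word):
    """Guess a tag for an unknown word from its shape, or return None.

    Uses only the cues listed on the slide: capitalization and three suffixes.
    """
    if word[:1].isupper():
        return "NP"   # capitalized, e.g. John (unreliable sentence-initially)
    if word.endswith("able"):
        return "JJ"
    if word.endswith("tion"):
        return "NN"
    if word.endswith("ly"):
        return "RB"
    return None       # no cue applies


guesses = {w: guess_tag(w) for w in ["readable", "nation", "quickly", "John", "dog"]}
```

Words such as “table” or “family” show why these cues are better treated as soft evidence than as deterministic rules.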

49 HMM IMPLEMENTATION 49

53 HMM Implementation: Storing an HMM
Approach #1:
– Hash table (direct):
π_i: pi{state_str}
a_ij: a{from_state_str}{to_state_str}
b_i(o_t): b{state_str}{symbol}
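
In Python, the direct hash-table approach maps naturally onto nested dictionaries keyed by strings. A minimal sketch, with made-up probabilities and a hypothetical lookup helper `prob`:

```python
# Approach #1: everything keyed directly by state and symbol strings.
pi = {"N": 0.6, "V": 0.1, "DT": 0.3}          # pi{state_str}
a = {                                          # a{from_state_str}{to_state_str}
    "N": {"N": 0.3, "V": 0.6, "DT": 0.1},
    "DT": {"N": 0.9, "V": 0.05, "DT": 0.05},
}
b = {                                          # b{state_str}{symbol}
    "DT": {"a": 0.5, "the": 0.5},
    "V": {"hit": 0.5, "made": 0.45, "run": 0.05},
}


def prob(table, outer_key, inner_key):
    """Look up a nested entry, treating missing entries as probability 0."""
    return table.get(outer_key, {}).get(inner_key, 0.0)


p_trans = prob(a, "DT", "N")       # stored entry
p_emit = prob(b, "DT", "hit")      # missing entry
```

This layout is easy to write and debug, at the cost of hashing strings on every lookup inside the decoder's inner loop.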

61 HMM Implementation: Storing an HMM
Approach #2:
– Hash tables + arrays:
state2idx{state_str} = state_idx
symbol2idx{symbol} = symbol_idx
idx2symbol[symbol_idx] = symbol
idx2state[state_idx] = state_str
π_i: pi[state_idx]
a_ij: a[from_state_idx][to_state_idx]
b_i(o_t): b[state_idx][symbol_idx]
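
The second approach converts strings to integer indices once, then stores the parameters in dense arrays. A minimal sketch using plain Python lists; the state and symbol inventories and the probability values are made up for illustration.

```python
# Index maps: strings are hashed once, then everything uses integers.
idx2state = ["N", "V", "DT"]
idx2symbol = ["a", "hit", "home", "made", "Mariners", "news", "run", "the"]
state2idx = {state: i for i, state in enumerate(idx2state)}
symbol2idx = {symbol: i for i, symbol in enumerate(idx2symbol)}

num_states = len(idx2state)
num_symbols = len(idx2symbol)

# Dense parameter arrays, initialized to zero.
pi = [0.0] * num_states                                  # pi[state_idx]
a = [[0.0] * num_states for _ in range(num_states)]      # a[from_idx][to_idx]
b = [[0.0] * num_symbols for _ in range(num_states)]     # b[state_idx][symbol_idx]

# Fill in a few illustrative entries.
pi[state2idx["N"]] = 0.6
a[state2idx["DT"]][state2idx["N"]] = 0.9
b[state2idx["DT"]][symbol2idx["the"]] = 0.5

# Decoding works on indices; map back to strings only for output.
best_state_idx = max(range(num_states), key=lambda i: pi[i])
best_state = idx2state[best_state_idx]
```

The inner loops of decoding then index lists by integer, and the index-to-string tables are needed only when printing results.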

66 HMM Matrix Representations
Issue:
– Many matrix entries are 0
– Especially b[i][o]
Approach 3: Sparse matrix representation
– a[i] = “j1 p1 j2 p2 …”
– a[j] = “i1 p1 i2 p2 …”
– b[i] = “o1 p1 o2 p2 …”
– b[o] = “i1 p1 i2 p2 …”
Could be:
– Array of hashes
– Array of lists of non-empty values
– The latter is often quite fast, because lists are short and fit into cache lines
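
The “array of lists” variant can be sketched by converting a dense matrix into per-row and per-column lists of (index, probability) pairs. The helper names `to_sparse_rows` and `to_sparse_cols` and the matrix values are assumptions for illustration.

```python
def to_sparse_rows(dense):
    """rows[i] = [(j, p), ...] for the non-zero entries of row i."""
    return [[(j, p) for j, p in enumerate(row) if p != 0.0] for row in dense]


def to_sparse_cols(dense):
    """cols[j] = [(i, p), ...] for the non-zero entries of column j."""
    num_cols = len(dense[0]) if dense else 0
    cols = [[] for _ in range(num_cols)]
    for i, row in enumerate(dense):
        for j, p in enumerate(row):
            if p != 0.0:
                cols[j].append((i, p))
    return cols


# Made-up transition matrix over three states, with some zero entries.
dense_a = [
    [0.3, 0.6, 0.1],
    [0.3, 0.0, 0.7],
    [0.9, 0.0, 0.1],
]
a_by_source = to_sparse_rows(dense_a)   # a[i] = "j1 p1 j2 p2 ..."
a_by_target = to_sparse_cols(dense_a)   # a[j] = "i1 p1 i2 p2 ..."
```

Indexing by target state is convenient in Viterbi, where each cell loops over the predecessors that can reach it; the same two views apply to b, indexed by state or by observed symbol.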

