Published by Lily Willes. Modified about 1 year ago.

1
Ling 570 Day 6: HMM POS Taggers 1

2
Overview
– Open Questions
– HMM POS Tagging
– Review Viterbi algorithm
– Training and Smoothing
– HMM Implementation Details

3
HMM POS TAGGING 3

4
HMM Tagger 4

8
The good HMM Tagger
From the Brown/Switchboard corpus:
– P(VB|TO) = .34
– P(NN|TO) = .021
– P(race|VB) =
– P(race|NN) =
a. P(VB|TO) x P(race|VB) = .34 x =
b. P(NN|TO) x P(race|NN) = .021 x =
Result: (a). TO followed by VB in the context of race is more probable (‘race’ really has no effect here).
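The comparison on this slide can be sketched in Python. The two transition probabilities are the ones given above. The two emission values are illustrative placeholders (an assumption, not figures from the slide), so only the shape of the computation should be read from this sketch; `score` and `best_tag` are hypothetical helper names.

```python
# Transition probabilities P(tag | previous tag), as given on the slide.
p_tag_given_prev = {("TO", "VB"): 0.34, ("TO", "NN"): 0.021}

# ASSUMPTION: placeholder emission values P(word | tag), for illustration only.
p_word_given_tag = {("VB", "race"): 1e-4, ("NN", "race"): 1e-4}


def score(prev_tag, tag, word):
    """P(tag | prev_tag) * P(word | tag) for one step of a bigram HMM."""
    return p_tag_given_prev[(prev_tag, tag)] * p_word_given_tag[(tag, word)]


def best_tag(prev_tag, word, candidates):
    """Pick the candidate tag with the highest one-step score."""
    return max(candidates, key=lambda t: score(prev_tag, t, word))
```

With equal placeholder emissions the decision is driven by the transition term alone, which is the situation the slide describes.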

9
HMM Philosophy
Imagine: the author, when creating this sentence, also had in mind the part of speech of each word. After the fact, we are trying to recover those parts of speech. They are the hidden part of the Markov model.

10
What happens when we do it the wrong way?
Invert word and tag, P(t|w) instead of P(w|t):
1. P(VB|race) = .02
2. P(NN|race) = .98
(2) would drown out virtually any other probability! We’d always tag race with NN!

11
What happens when we do it the wrong way? 11

12
N-gram POS tagging
Tags: JJ JJ NNS VB RB
Words: colorless green ideas sleep furiously
– Predict current tag conditioned on prior n-1 tags
– Predict word conditioned on current tag

17
HMM bigram tagger
Tags: JJ JJ NNS VB RB
Words: colorless green ideas sleep furiously
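A bigram HMM tagger scores a tagged sentence as the product, over positions, of P(tag | previous tag) and P(word | tag). A minimal sketch of that score in log space follows. The dictionary layout, the function name, and the `<s>` start symbol are assumptions made for illustration.

```python
import math


def joint_log_prob(words, tags, trans, emit, start="<s>"):
    """log P(words, tags) under a bigram HMM.

    trans[(prev_tag, tag)] = P(tag | prev_tag)
    emit[(tag, word)]      = P(word | tag)
    start is an assumed pseudo-tag that precedes the first word.
    """
    total = 0.0
    prev = start
    for word, tag in zip(words, tags):
        total += math.log(trans[(prev, tag)]) + math.log(emit[(tag, word)])
        prev = tag
    return total
```

A trigram tagger would condition the transition term on the two previous tags instead of one.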

18
HMM trigram tagger
Tags: JJ JJ NNS VB RB
Words: colorless green ideas sleep furiously

19
Training
An HMM needs to be trained on the following:
1. The initial state probabilities
2. The state transition probabilities
– The tag-tag matrix
3. The emission probabilities
– The tag-word matrix
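The three parameter sets above can be estimated from a tagged corpus by relative frequency. The sketch below assumes the corpus is a list of sentences, each a list of (word, tag) pairs; `train_hmm` is a hypothetical name and no smoothing is applied.

```python
from collections import Counter, defaultdict


def train_hmm(tagged_sentences):
    """Relative-frequency estimates of initial, transition, and emission probabilities."""
    init_counts = Counter()
    trans_counts = defaultdict(Counter)   # tag-tag counts
    emit_counts = defaultdict(Counter)    # tag-word counts
    for sentence in tagged_sentences:
        prev = None
        for word, tag in sentence:
            if prev is None:
                init_counts[tag] += 1
            else:
                trans_counts[prev][tag] += 1
            emit_counts[tag][word] += 1
            prev = tag

    def normalize(counter):
        total = sum(counter.values())
        return {key: count / total for key, count in counter.items()}

    pi = normalize(init_counts)
    a = {tag: normalize(row) for tag, row in trans_counts.items()}
    b = {tag: normalize(row) for tag, row in emit_counts.items()}
    return pi, a, b
```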

20
Implementation 20

21
Implementation Transition distribution 21

22
Implementation Emission distribution 22

23
Implementation 23

24
Implementation 24

25
REVIEW VITERBI ALGORITHM 25

26
Consider two examples
Mariners hit a home run
Mariners hit made the news

27
Consider two examples
Mariners hit a home run
Mariners hit made the news
[Figure: candidate N, V, and DT tags shown for the words of each sentence]

29
Parameters
As probabilities, they get very small.
[Tables: transition parameters over tags N, V, DT; emission parameters for tags N, V, DT over the words a, hit, home, made, Mariners, news, run, the]
As log probabilities, they won’t underflow, and we can just add them.
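A small demonstration of the underflow point, using made-up factors of 1e-5 (an assumption, not values from the tables): multiplying many small probabilities collapses to zero in double precision, while summing their logs stays finite.

```python
import math

# ASSUMPTION: 100 identical small factors, chosen only to force underflow.
probs = [1e-5] * 100

product = 1.0
for p in probs:
    product *= p  # the true value, 1e-500, is below the smallest double

log_sum = sum(math.log(p) for p in probs)  # sum of logs instead of product
```

Here `product` ends up as exactly 0.0, whereas `log_sum` remains an ordinary negative number that can still be compared with other path scores.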

30
[Figure: the log-probability tables from the previous slide, with candidate states N, V, DT under each word of: Mariners hit a home run]

31
[Figure: the log-probability tables from the previous slide, with candidate states N, V, DT under each word of: Mariners hit made the news]

32
Viterbi 32

33
Pseudocode 33

34
Pseudocode 34
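A minimal Python sketch of Viterbi decoding in log space, offered as one possible rendering rather than the pseudocode from the slides. The nested-dictionary layout, the names, and the treatment of missing entries as log(0) are assumptions; the input is assumed to be non-empty.

```python
NEG_INF = float("-inf")


def viterbi(words, states, log_pi, log_a, log_b):
    """Return (best state sequence, its log score).

    log_pi[s], log_a[r][s], log_b[s][w] hold log probabilities;
    missing entries are treated as log(0) = -inf.
    """
    # delta[s]: best log score of any path ending in state s at the current word
    delta = {s: log_pi.get(s, NEG_INF) + log_b.get(s, {}).get(words[0], NEG_INF)
             for s in states}
    backpointers = []
    for word in words[1:]:
        new_delta, pointers = {}, {}
        for s in states:
            best_prev = max(
                states,
                key=lambda r: delta[r] + log_a.get(r, {}).get(s, NEG_INF))
            new_delta[s] = (delta[best_prev]
                            + log_a.get(best_prev, {}).get(s, NEG_INF)
                            + log_b.get(s, {}).get(word, NEG_INF))
            pointers[s] = best_prev
        backpointers.append(pointers)
        delta = new_delta
    # Follow the backpointers from the best final state.
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, delta[last]
```

Because all scores are log probabilities, each step only adds numbers, which avoids the underflow problem noted earlier.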

35
SMOOTHING 35

36
Training 36

40
Why Smoothing?
Zero counts
Handle missing tag sequences:
– Smooth transition probabilities
Handle unseen words:
– Smooth observation probabilities
Handle unseen (word, tag) pairs where both are known

41
Smoothing Tag Sequences 41

42
Smoothing Tag Sequences 42

43
Smoothing Tag Sequences 43

44
Smoothing Tag Sequences 44
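One common way to smooth transition probabilities is add-δ (Laplace) smoothing. It is shown here only as an illustration and may not be the method these slides used; the function name and count layout are assumptions.

```python
def smoothed_transition(trans_counts, tagset, delta=1.0):
    """Add-delta estimate of P(tag | prev) for every pair of tags in tagset.

    trans_counts[prev][tag] holds raw bigram counts; missing entries count as 0.
    """
    probs = {}
    for prev in tagset:
        row = trans_counts.get(prev, {})
        total = sum(row.get(t, 0) for t in tagset) + delta * len(tagset)
        probs[prev] = {t: (row.get(t, 0) + delta) / total for t in tagset}
    return probs
```

Every tag pair now gets a non-zero probability, including pairs never seen in training.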

45
Smoothing Emission Probabilities 45

46
Smoothing Emission Probabilities 46

47
Smoothing Emission Probabilities
Preprocessing the training corpus:
– Count occurrences of all words
– Replace singleton words with a magic token
– Gather counts on modified data, estimate parameters
Preprocessing the test set:
– For each test set word:
– If seen at least twice in training set, leave it alone
– Otherwise replace it with the magic token
– Run Viterbi on this modified input
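The preprocessing steps above can be sketched as follows. The token spelling `<UNK>` and the function names are assumptions, and sentences are taken to be plain lists of words; a tagged corpus would carry its tags along unchanged.

```python
from collections import Counter

UNK = "<UNK>"  # ASSUMPTION: spelling of the magic token


def replace_singletons(train_sentences):
    """Replace words seen only once in training with UNK; return data and vocabulary."""
    counts = Counter(word for sent in train_sentences for word in sent)
    vocab = {word for word, count in counts.items() if count >= 2}
    replaced = [[w if w in vocab else UNK for w in sent] for sent in train_sentences]
    return replaced, vocab


def map_test_sentence(words, vocab):
    """Keep words seen at least twice in training; map everything else to UNK."""
    return [w if w in vocab else UNK for w in words]
```

Parameters are then estimated on the replaced training data, and Viterbi runs on the mapped test sentence.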

48
Unknown Words
Is there other information we could use for P(w|t)?
– Information in words themselves?
Morphology:
– -able: JJ
– -tion: NN
– -ly: RB
– Case: John: NP, etc.
– Augment models
Add to ‘context’ of tags
Include as features in classifier models
– We’ll come back to this idea!
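As a rough illustration of the word-internal cues listed above, the sketch below maps suffixes and capitalization to a guessed tag. The slide proposes these as information for the model, not as hard rules, so the rule order, the `sentence_initial` flag, and the function name are all assumptions.

```python
# Suffix hints taken from the slide; the order of checks is an assumption.
SUFFIX_TAGS = [("able", "JJ"), ("tion", "NN"), ("ly", "RB")]


def guess_tag(word, sentence_initial=False):
    """Return a likely tag for an unknown word, or None if no cue applies."""
    lowered = word.lower()
    for suffix, tag in SUFFIX_TAGS:
        if lowered.endswith(suffix):
            return tag
    # Capitalization suggests a proper noun, except at the start of a sentence.
    if word[:1].isupper() and not sentence_initial:
        return "NP"
    return None
```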

49
HMM IMPLEMENTATION 49

53
HMM Implementation: Storing an HMM
Approach #1:
– Hash table (direct):
π_i: pi{state_str}
a_ij: a{from_state_str}{to_state_str}
b_i(o_t): b{state_str}{symbol}
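In Python, the direct hash-table layout corresponds to dictionaries keyed by state and symbol strings. The probability values below are illustrative placeholders, and `transition` is a hypothetical helper.

```python
# ASSUMPTION: all numbers are made up for illustration.
pi = {"N": 0.6, "DT": 0.4}                                   # pi[state_str]
a = {"DT": {"N": 1.0}, "N": {"V": 0.7, "N": 0.3}}            # a[from_state_str][to_state_str]
b = {"DT": {"the": 1.0}, "N": {"news": 0.5, "run": 0.5}}     # b[state_str][symbol]


def transition(from_state, to_state):
    """Look up a_ij, treating entries that were never stored as probability 0."""
    return a.get(from_state, {}).get(to_state, 0.0)
```

Only observed entries are stored, so this layout is compact, at the cost of a string hash on every lookup.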

61
HMM Implementation: Storing an HMM
Approach #2:
– Hash tables + arrays
state2idx{state_str} = state_idx
symbol2idx{symbol} = symbol_idx
idx2symbol[symbol_idx] = symbol
idx2state[state_idx] = state_str
π_i: pi[state_idx]
a_ij: a[from_state_idx][to_state_idx]
b_i(o_t): b[state_idx][symbol_idx]
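A sketch of the hash-tables-plus-arrays layout, using the tag and word inventory from the Mariners example. The single probability value set at the end is an illustrative placeholder.

```python
states = ["N", "V", "DT"]
symbols = ["a", "hit", "home", "made", "Mariners", "news", "run", "the"]

# Index-to-string arrays and string-to-index hash tables.
idx2state = list(states)
idx2symbol = list(symbols)
state2idx = {s: i for i, s in enumerate(idx2state)}
symbol2idx = {w: i for i, w in enumerate(idx2symbol)}

n_states, n_symbols = len(states), len(symbols)
pi = [0.0] * n_states                              # pi[state_idx]
a = [[0.0] * n_states for _ in range(n_states)]    # a[from_state_idx][to_state_idx]
b = [[0.0] * n_symbols for _ in range(n_states)]   # b[state_idx][symbol_idx]

# ASSUMPTION: illustrative value only.
b[state2idx["DT"]][symbol2idx["the"]] = 0.6
```

Strings are hashed once when the input is read; the inner loops then work on integer indices only.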

66
HMM Matrix Representations
Issue:
– Many matrix entries are 0, especially b[i][o]
Approach 3: Sparse matrix representation
– a[i] = “j1 p1 j2 p2 …”
– a[j] = “i1 p1 i2 p2 …”
– b[i] = “o1 p1 o2 p2 …”
– b[o] = “i1 p1 i2 p2 …”
Could be:
– Array of hashes
– Array of lists of non-empty values
– The latter is often quite fast, because lists are short and fit into cache lines
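The array-of-lists variant can be sketched as below: each row keeps only its non-zero (index, probability) pairs. The function names are hypothetical, and the linear scan in `lookup` assumes rows are short, as the slide suggests.

```python
def to_sparse_rows(dense):
    """Array of lists: row i holds (column_index, value) pairs for non-zero entries."""
    return [[(j, p) for j, p in enumerate(row) if p != 0.0] for row in dense]


def lookup(sparse_rows, i, j):
    """Return entry (i, j), or 0.0 if it was not stored."""
    for col, p in sparse_rows[i]:
        if col == j:
            return p
    return 0.0
```

Indexing rows by the target state or by the symbol instead (the a[j] and b[o] forms above) uses the same structure built from the transposed matrix.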
