
Published by Nelson Eppes. Modified about 1 year ago.

1
Large Vocabulary Unconstrained Handwriting Recognition
J. Subrahmonia
Pen Technologies, IBM T. J. Watson Research Center

2
Pen Technologies
Pen-based interfaces in mobile computing

3
Mathematical Formulation
H: handwriting evidence on the basis of which a recognizer will make its decision
– H = {h1, h2, h3, h4, …, hm}
W: word string from a large vocabulary
– W = {w1, w2, w3, w4, …, wn}
Recognizer: $\hat{W} = \arg\max_{W} p(W \mid H)$

4
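Mathematical Formulation
By Bayes' rule, $\hat{W} = \arg\max_{W} p(W \mid H) = \arg\max_{W} p(H \mid W)\, p(W)$, since p(H) does not depend on W. Here p(W) is the SOURCE model and p(H | W) is the CHANNEL model.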

5
Source Channel Model
(diagram: Writer -> Digitizer -> Feature Extractor -> H -> Decoder; the stages that turn the writer's words into H form the CHANNEL)

6
Source Channel Model
– Handwriting modeling: HMMs
– Language modeling
– Search strategy

7
Hidden Markov Models
(diagram) Memoryless Model -> add memory -> Markov Model -> hide something -> Hidden Markov Model
Memoryless Model -> hide something -> Mixture Model -> add memory -> Hidden Markov Model
Alan B. Poritz, "Hidden Markov Models: A Guided Tour," ICASSP 1988

8
Memoryless Model
COIN: Heads (1) with probability p; Tails (0) with probability 1-p
Flip the coin 10 times (IID random sequence)
Sequence: 1 0 1 0 0 0 1 1 1 1
Probability = p*(1-p)*p*(1-p)*(1-p)*(1-p)*p*p*p*p = p^6 (1-p)^4
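A quick numerical check of the memoryless model, as a sketch: because the flips are IID, the sequence probability is just the product of per-flip probabilities, so it depends only on how many heads and tails occur. The value p = 0.7 and the function name `iid_sequence_prob` are assumptions for illustration only.

```python
def iid_sequence_prob(seq: str, p: float) -> float:
    """Probability of a heads(1)/tails(0) string under an IID coin with P(heads) = p."""
    prob = 1.0
    for c in seq:
        # Each flip is independent, so the per-flip probabilities simply multiply.
        prob *= p if c == "1" else 1.0 - p
    return prob


seq = "1010001111"  # the sequence from the slide
p = 0.7             # hypothetical value of p, for illustration only
heads, tails = seq.count("1"), seq.count("0")
print(heads, tails)  # 6 4
# The product collapses to p**heads * (1 - p)**tails.
print(iid_sequence_prob(seq, p), p**heads * (1 - p) ** tails)
```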

9
Add Memory – Markov Model
2 Coins: COIN 1 => p(1) = 0.9, p(0) = 0.1; COIN 2 => p(1) = 0.1, p(0) = 0.9
Experiment: Flip COIN 1 and note the outcome. Then repeatedly: if (outcome = Head) flip COIN 1, else flip COIN 2
Sequence 1100: Probability = 0.9*0.9*0.1*0.9
Sequence 1010: Probability = 0.9*0.1*0.1*0.1
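The two-coin experiment can be checked with a few lines of code. This is a minimal sketch; the names `P_HEAD` and `markov_sequence_prob` are illustrative. The only memory is the previous outcome, which selects the coin for the next flip.

```python
P_HEAD = {1: 0.9, 2: 0.1}  # P(outcome = 1) for COIN 1 and COIN 2


def markov_sequence_prob(seq: str) -> float:
    """Probability of a 1/0 string when the previous outcome picks the next coin."""
    coin = 1  # the experiment starts by flipping COIN 1
    prob = 1.0
    for c in seq:
        p_head = P_HEAD[coin]
        prob *= p_head if c == "1" else 1.0 - p_head
        coin = 1 if c == "1" else 2  # Head -> COIN 1 next, Tail -> COIN 2 next
    return prob


print(markov_sequence_prob("1100"))  # 0.9 * 0.9 * 0.1 * 0.9, about 0.0729
print(markov_sequence_prob("1010"))  # 0.9 * 0.1 * 0.1 * 0.1, about 0.0009
```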

10
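State Sequence Representation
(diagram: state 1 emits 1 with probability 0.9 and stays in state 1, or emits 0 with probability 0.1 and moves to state 2; state 2 emits 1 with probability 0.1 and moves to state 1, or emits 0 with probability 0.9 and stays in state 2)
Observed output sequence => unique state sequence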
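Because the model is not hidden, the state sequence can be read directly off the outputs. A small sketch, assuming the state numbering of the two-coin model (state 1 = COIN 1, state 2 = COIN 2); `state_sequence` is a hypothetical helper name.

```python
def state_sequence(seq: str) -> list[int]:
    """States visited by the two-coin Markov model while emitting seq."""
    states = [1]  # start in state 1 (COIN 1)
    for c in seq:
        # The output alone fixes the next state: 1 -> state 1, 0 -> state 2.
        states.append(1 if c == "1" else 2)
    return states


print(state_sequence("1100"))  # [1, 1, 1, 2, 2]
print(state_sequence("1010"))  # [1, 1, 2, 1, 2]
```

Hiding the states removes exactly this property: several state sequences can then produce the same output, and p(H) has to sum over all of them.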

11
Hide the states => Hidden Markov Model
(figure: two states s1 and s2, with arcs labeled by probabilities 0.9 and 0.1)

12
Why Use Hidden Markov Models Instead of Non-hidden?
– Hidden Markov Models can be smaller: fewer parameters to estimate
– States may be truly hidden
  – Position of the hand
  – Positions of articulators

13
Summary of HMM Basics
– We are interested in assigning probabilities p(H) to feature sequences
– Memoryless model: this model has no memory of the past
– Markov noticed that in some sequences the future depends on the past. He introduced the concept of a STATE: an equivalence class of the past that influences the future
– Hide the states: HMM

14
Hidden Markov Models
Given an observed sequence H:
– Compute p(H) for decoding
– Find the most likely state sequence for a given Markov model (Viterbi algorithm)
– Estimate the parameters of the Markov source (training)

15
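Compute p(H)
Example model: states s1, s2, s3 (start in s1, end in s3). Each output transition has a transition probability and output probabilities p(a), p(b); null transitions produce no output.
– s1 -> s1: 0.5, p(a) = 0.8, p(b) = 0.2
– s1 -> s2: 0.3, p(a) = 0.7, p(b) = 0.3
– s1 -> s2 (null): 0.2
– s2 -> s2: 0.4, p(a) = 0.5, p(b) = 0.5
– s2 -> s3: 0.5, p(a) = 0.3, p(b) = 0.7
– s2 -> s3 (null): 0.1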

16
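Compute p(H) – contd.
Compute p(H) where H = a a b b
Enumerate all ways of producing h1 = a, starting in s1:
– s1 -> s1: 0.5x0.8 = 0.40
– s1 -> s2: 0.3x0.7 = 0.21
– s1 -> s2 (null, 0.2), then s2 -> s2: 0.2x0.4x0.5 = 0.04
– s1 -> s2 (null, 0.2), then s2 -> s3: 0.2x0.5x0.3 = 0.03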

17
Compute p(H) – contd.
Enumerate all ways of producing h1 = a, h2 = a: from each state reached after h1 = a, the paths branch again
(figure: from s1, 0.5x0.8 to s1, 0.3x0.7 to s2, and the null 0.2 to s2 followed by 0.4x0.5 or 0.5x0.3; from s2, 0.4x0.5 to s2 and 0.5x0.3 to s3)
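The enumeration can be carried out exhaustively for the whole of H = aabb. This sketch assumes the example model exactly as listed on the Compute p(H) slide; the arc-tuple layout and the name `enumerate_paths` are illustrative choices. The number of paths grows exponentially with the length of H, which is what the trellis avoids.

```python
# Example model: (source, destination, transition probability, output
# distribution); None marks a null transition that produces no output.
ARCS = [
    ("s1", "s1", 0.5, {"a": 0.8, "b": 0.2}),
    ("s1", "s2", 0.3, {"a": 0.7, "b": 0.3}),
    ("s1", "s2", 0.2, None),
    ("s2", "s2", 0.4, {"a": 0.5, "b": 0.5}),
    ("s2", "s3", 0.5, {"a": 0.3, "b": 0.7}),
    ("s2", "s3", 0.1, None),
]
START, FINAL = "s1", "s3"


def enumerate_paths(state, remaining, prob, path, out):
    """Depth-first search over every transition sequence that starts in `state`,
    produces exactly `remaining`, and ends in FINAL; appends (path, prob) to out."""
    if not remaining and state == FINAL:
        out.append((path, prob))
    for src, dst, p, emit in ARCS:
        if src != state:
            continue
        if emit is None:  # null transition: move without producing output
            enumerate_paths(dst, remaining, prob * p, path + [(src, "0", dst)], out)
        elif remaining:   # output transition: must produce the next symbol
            h = remaining[0]
            enumerate_paths(dst, remaining[1:], prob * p * emit[h], path + [(src, h, dst)], out)
    return out


paths = enumerate_paths(START, "aabb", 1.0, [], [])
total = sum(prob for _, prob in paths)
print(len(paths), "paths, p(aabb) =", total)
```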

18
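Compute p(H)
Computation can be saved by combining paths that reach the same state after producing the same outputs
(figure: the path tree with the nodes for s1, s2, s3 at each step merged)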

19
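Compute p(H): Trellis Diagram
(figure: rows s1, s2, s3 against columns 0, a, aa, aab, aabb; output arcs labeled .5x.8 and .5x.2 (s1 -> s1), .3x.7 and .3x.3 (s1 -> s2), .4x.5 (s2 -> s2), .5x.3 and .5x.7 (s2 -> s3); null arcs labeled .2 (s1 -> s2) and .1 (s2 -> s3))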

20
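Basic Recursion
Prob(Node) = sum over predecessors of [Prob(predecessor) x Prob(predecessor -> node)]
Boundary condition: Prob(s1, 0) = 1
Trellis values for columns 0 | a | aa | aab | aabb. Contributions are written as (predecessor, output); 0 marks a null transition.
s1: 1.0 | 0.4 (s1, a) | .16 (s1, a) | .016 (s1, b) | .0016 (s1, b)
s2: 0.2 (s1, 0) | 0.33 = .21 (s1, a) + .04 (s2, a) + .08 (s1, 0) | .182 = .084 (s1, a) + .066 (s2, a) + .032 (s1, 0) | .054 = .0144 (s1, b) + .0364 (s2, b) + .0032 (s1, 0) | .01256 = .00144 (s1, b) + .0108 (s2, b) + .00032 (s1, 0)
s3: 0.02 (s2, 0) | 0.063 = .03 (s2, a) + .033 (s2, 0) | .0677 = .0495 (s2, a) + .0182 (s2, 0) | .0691 = .0637 (s2, b) + .0054 (s2, 0) | .020156 = .0189 (s2, b) + .001256 (s2, 0)
p(aabb) = Prob(s3, aabb) = .020156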
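The recursion can be sketched in Python for the example model; the arc list repeats the model from the Compute p(H) slide, and the function name `forward` is illustrative. Visiting the states in the order s1, s2, s3 works here because null transitions only point forward, so each null contribution uses a value already finished in the same column.

```python
ARCS = [  # (source, destination, transition prob, output dist or None for null)
    ("s1", "s1", 0.5, {"a": 0.8, "b": 0.2}),
    ("s1", "s2", 0.3, {"a": 0.7, "b": 0.3}),
    ("s1", "s2", 0.2, None),
    ("s2", "s2", 0.4, {"a": 0.5, "b": 0.5}),
    ("s2", "s3", 0.5, {"a": 0.3, "b": 0.7}),
    ("s2", "s3", 0.1, None),
]
STATES = ["s1", "s2", "s3"]  # null transitions only go forward in this order
START, FINAL = "s1", "s3"


def forward(h: str) -> list[dict[str, float]]:
    """Trellis alpha[j][s] = P(being in state s after producing h[:j])."""
    alpha = [dict.fromkeys(STATES, 0.0) for _ in range(len(h) + 1)]
    alpha[0][START] = 1.0  # boundary condition: Prob(s1, 0) = 1
    for j in range(len(h) + 1):
        for s in STATES:
            for src, dst, p, emit in ARCS:
                if dst != s:
                    continue
                if emit is None:  # null transition within column j
                    alpha[j][s] += alpha[j][src] * p
                elif j > 0:       # output transition from column j-1 producing h_j
                    alpha[j][s] += alpha[j - 1][src] * p * emit[h[j - 1]]
    return alpha


trellis = forward("aabb")
for s in STATES:
    print(s, [round(col[s], 6) for col in trellis])
print("p(aabb) =", trellis[-1][FINAL])
```

The printed rows should reproduce the trellis values in the table, with p(aabb) at the s3 node of the last column.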

21
More Formally – Forward Algorithm
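One way to write the forward recursion formally, as a sketch; the notation is assumed here rather than taken from the slide. Each transition $t$ goes from state $L(t)$ to state $R(t)$ with probability $p(t)$; an output transition produces $h_j$ with probability $q(h_j \mid t)$, and a null transition produces nothing. $\alpha_j(s)$ is the probability of being in state $s$ after producing $h_1, \dots, h_j$.

```latex
\begin{aligned}
\alpha_0(s_1) &= 1, \\
\alpha_j(s) &= \sum_{t:\,R(t)=s,\ t\ \text{output}} \alpha_{j-1}(L(t))\, p(t)\, q(h_j \mid t)
  \;+\; \sum_{t:\,R(t)=s,\ t\ \text{null}} \alpha_j(L(t))\, p(t)
  \quad \text{(all other nodes)}, \\
p(H) &= \alpha_m(s_3).
\end{aligned}
```

The first sum is empty in column $j = 0$, and states are visited in the order $s_1, s_2, s_3$ so each null sum uses values already computed in the same column. For the example model, $\alpha_1(s_2) = 1.0(0.3)(0.7) + 0.2(0.4)(0.5) + 0.4(0.2) = 0.21 + 0.04 + 0.08 = 0.33$, matching the trellis.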

22
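Find Most Likely Path for aabb: Dynamic Programming (Viterbi)
Max Prob(Node) = MAX over predecessors of [Max Prob(predecessor) x Prob(predecessor -> node)]
Trellis values for columns 0 | a | aa | aab | aabb. Candidates are written as (predecessor, output); 0 marks a null transition.
s1: 1.0 | .4 (s1, a) | .16 (s1, a) | .016 (s1, b) | .0016 (s1, b)
s2: .2 (s1, 0) | .21 = max(.21 (s1, a), .04 (s2, a), .08 (s1, 0)) | .084 = max(.084 (s1, a), .042 (s2, a), .032 (s1, 0)) | .0168 = max(.0144 (s1, b), .0168 (s2, b), .0032 (s1, 0)) | .00336 = max(.00144 (s1, b), .00336 (s2, b), .00032 (s1, 0))
s3: .02 (s2, 0) | .03 = max(.03 (s2, a), .021 (s2, 0)) | .0315 = max(.0315 (s2, a), .0084 (s2, 0)) | .0294 = max(.0294 (s2, b), .00168 (s2, 0)) | .00588 = max(.00588 (s2, b), .000336 (s2, 0))
Tracing back the maxima from s3 after aabb gives the most likely path s1 -a-> s1 -a-> s2 -b-> s2 -b-> s3, with probability .4 x .21 x .2 x .35 = .00588.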
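A sketch of the Viterbi pass with backpointers, using the same example model; `viterbi` and the backpointer layout are illustrative choices. It is the forward recursion with the sum replaced by a max, plus a record of which predecessor won at each node.

```python
ARCS = [  # (source, destination, transition prob, output dist or None for null)
    ("s1", "s1", 0.5, {"a": 0.8, "b": 0.2}),
    ("s1", "s2", 0.3, {"a": 0.7, "b": 0.3}),
    ("s1", "s2", 0.2, None),
    ("s2", "s2", 0.4, {"a": 0.5, "b": 0.5}),
    ("s2", "s3", 0.5, {"a": 0.3, "b": 0.7}),
    ("s2", "s3", 0.1, None),
]
STATES = ["s1", "s2", "s3"]
START, FINAL = "s1", "s3"


def viterbi(h: str):
    """Max-product version of the trellis; returns (best probability, best path)."""
    m = len(h)
    best = [dict.fromkeys(STATES, 0.0) for _ in range(m + 1)]
    back = [dict.fromkeys(STATES) for _ in range(m + 1)]  # backpointers, None at start
    best[0][START] = 1.0
    for j in range(m + 1):
        for s in STATES:
            for src, dst, p, emit in ARCS:
                if dst != s:
                    continue
                if emit is None:  # null transition within column j
                    cand, prev = best[j][src] * p, (j, src, "0")
                elif j > 0:       # output transition producing h_j
                    cand, prev = best[j - 1][src] * p * emit[h[j - 1]], (j - 1, src, h[j - 1])
                else:
                    continue
                if cand > best[j][s]:  # keep only the best predecessor
                    best[j][s], back[j][s] = cand, prev
    # Follow the backpointers from the final state after all of h is produced.
    path, j, s = [], m, FINAL
    while back[j][s] is not None:
        pj, ps, label = back[j][s]
        path.append((ps, label, s))
        j, s = pj, ps
    return best[m][FINAL], path[::-1]


prob, path = viterbi("aabb")
print(prob, path)
```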

23
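Training HMM parameters
Initial parameters (figure): transition probabilities 1/3 and 1/2, with output probabilities p(a), p(b)
H = abaa
Probabilities of the seven paths that produce H: .000385, .000578, .000868, .001302, .001157, .002604, .001736
p(H) = sum of the path probabilities = .008632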

24
Training HMM parameters
A posteriori probability of path i = p(path i, H) / p(H):
.045, .067, .134, .100, .201, .150, .301
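The path posteriors follow directly from the listed path probabilities. A minimal sketch, using the seven values from the slide; since those values are rounded, the results should agree with the slide's posteriors only to within rounding, and the slide does not list the posteriors in the same order as the path probabilities.

```python
# Joint probabilities p(path i, H) of the seven paths for H = abaa, copied
# from the slide (the slide's values are rounded, so totals are approximate).
joint = [0.000385, 0.000578, 0.000868, 0.001302, 0.001157, 0.002604, 0.001736]

p_H = sum(joint)                        # p(H) sums over every path that produces H
posterior = [pj / p_H for pj in joint]  # P(path i | H) = p(path i, H) / p(H)

print(round(p_H, 6))
print([round(q, 3) for q in posterior])
```

These posteriors are the weights with which each path's transitions and outputs are counted when the parameters are re-estimated.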

25
Training HMM parameters

26
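Re-estimated parameters: .71, .29, .68, .32, .64, .36, .60, .40, .34, .46, .20, .60, .40
Path probabilities under the re-estimated parameters: .00108, .00129, .00404, .00212, .00537, .00253, .00791, so p(H) rises from .008632 to about .0243
Keep on repeating: after 600 iterations, p(H) = .037037037
Another initial parameter set: p(H) = 0.0625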

27
Training HMM parameters
– Converges to a local maximum
– There are (at least) 7 local maxima
– The final solution depends on the starting point
– The speed of convergence depends on the starting point

28
Training HMM parameters: Forward Backward Algorithm
– Improves on the enumeration algorithm by using the trellis
– Reduces the computation from exponential to linear in the length of H

29
Forward Backward Algorithm
(figure: trellis with the transition at time j highlighted)

30
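Forward Backward Algorithm
Let t be a transition from state L(t) to state R(t), with transition probability p(t) and output probability q(h_j | t). The probability that h_j is produced by t and the complete output is H is
$P(t \text{ produces } h_j,\ H) = \alpha_{j-1}(L(t))\, p(t)\, q(h_j \mid t)\, \beta_j(R(t))$
where
$\alpha_{j-1}(L(t))$ = probability of being in state L(t) and producing the output h_1, ..., h_{j-1}
$\beta_j(R(t))$ = probability, starting from state R(t), of producing the output h_{j+1}, ..., h_m

31
Forward Backward Algorithm
Transition count (expected number of uses of t, given H): $c(t) = \frac{1}{p(H)} \sum_{j=1}^{m} \alpha_{j-1}(L(t))\, p(t)\, q(h_j \mid t)\, \beta_j(R(t))$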

32
Training HMM parameters
– Guess initial values for all parameters
– Compute forward and backward pass probabilities
– Compute counts
– Re-estimate probabilities
– Repeat from the forward and backward passes
Also known as BAUM-WELCH, BAUM-EAGON, FORWARD-BACKWARD, or E-M
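The training loop can be sketched as a minimal Baum-Welch (forward-backward, E-M) implementation. Assumptions: the example three-state model from the Compute p(H) slides serves as the initial guess and H = aabb as the only training sequence, both chosen for illustration; the function names are hypothetical. Each step computes expected counts, then sets each transition probability to its count divided by the total count of transitions leaving the same state, and each output probability to the share of the transition's count spent producing that symbol.

```python
STATES, START, FINAL = ["s1", "s2", "s3"], "s1", "s3"
# Arcs as mutable lists: [source, destination, transition prob, output dist or None].
arcs = [
    ["s1", "s1", 0.5, {"a": 0.8, "b": 0.2}],
    ["s1", "s2", 0.3, {"a": 0.7, "b": 0.3}],
    ["s1", "s2", 0.2, None],
    ["s2", "s2", 0.4, {"a": 0.5, "b": 0.5}],
    ["s2", "s3", 0.5, {"a": 0.3, "b": 0.7}],
    ["s2", "s3", 0.1, None],
]


def passes(h):
    """Forward (alpha) and backward (beta) trellises for the current arcs."""
    m = len(h)
    alpha = [dict.fromkeys(STATES, 0.0) for _ in range(m + 1)]
    beta = [dict.fromkeys(STATES, 0.0) for _ in range(m + 1)]
    alpha[0][START], beta[m][FINAL] = 1.0, 1.0
    for j in range(m + 1):
        for s in STATES:
            for src, dst, p, q in arcs:
                if dst == s and q is None:
                    alpha[j][s] += alpha[j][src] * p
                elif dst == s and j > 0:
                    alpha[j][s] += alpha[j - 1][src] * p * q[h[j - 1]]
    for j in range(m, -1, -1):
        for s in reversed(STATES):
            for src, dst, p, q in arcs:
                if src == s and q is None:
                    beta[j][s] += p * beta[j][dst]
                elif src == s and j < m:
                    beta[j][s] += p * q[h[j]] * beta[j + 1][dst]
    return alpha, beta


def baum_welch_step(h):
    """E-step (expected counts) then M-step (re-estimation), in place.
    Returns p(H) under the parameters used for the E-step."""
    alpha, beta = passes(h)
    m = len(h)
    p_H = alpha[m][FINAL]
    counts, sym_counts = [], []
    for src, dst, p, q in arcs:
        if q is None:
            counts.append(sum(alpha[j][src] * p * beta[j][dst] for j in range(m + 1)) / p_H)
            sym_counts.append(None)
        else:
            per_sym = dict.fromkeys(q, 0.0)
            for j in range(1, m + 1):
                per_sym[h[j - 1]] += alpha[j - 1][src] * p * q[h[j - 1]] * beta[j][dst] / p_H
            counts.append(sum(per_sym.values()))
            sym_counts.append(per_sym)
    for i, arc in enumerate(arcs):
        # New transition probability: count(t) / total count of arcs leaving L(t).
        leaving = sum(c for other, c in zip(arcs, counts) if other[0] == arc[0])
        arc[2] = counts[i] / leaving
        # New output probabilities: share of count(t) spent producing each symbol.
        if arc[3] is not None and counts[i] > 0:
            arc[3] = {sym: c / counts[i] for sym, c in sym_counts[i].items()}
    return p_H


history = [baum_welch_step("aabb") for _ in range(20)]
print([round(x, 6) for x in history[:3]], round(history[-1], 6))
```

E-M guarantees that p(H) never decreases from one iteration to the next; as the slides note, where it ends up still depends on the starting point.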
