
1 Large Vocabulary Unconstrained Handwriting Recognition
J. Subrahmonia, Pen Technologies, IBM T. J. Watson Research Center

2 Pen Technologies
- Pen-based interfaces in mobile computing

3 Mathematical Formulation
- H : handwriting evidence on the basis of which the recognizer makes its decision, H = {h1, h2, h3, h4, ..., hm}
- W : word string from a large vocabulary, W = {w1, w2, w3, w4, ..., wn}
- Recognizer : choose the word string Ŵ that maximizes p(W | H)

4 Mathematical Formulation
[Equation figure: the decision rule factored into a SOURCE term and a CHANNEL term]
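The equation itself appears on the slide only as an image; a hedged reconstruction from the definitions on the previous slide (this is the standard source-channel decomposition by Bayes' rule, though the slide may have typeset it differently):

```latex
\hat{W} = \arg\max_{W} p(W \mid H)
        = \arg\max_{W} \frac{p(H \mid W)\,p(W)}{p(H)}
        = \arg\max_{W} \underbrace{p(W)}_{\text{source}}\;\underbrace{p(H \mid W)}_{\text{channel}}
```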

5 Source Channel Model
[Block diagram: WRITER → DIGITIZER → FEATURE EXTRACTOR → DECODER; the digitizer and feature extractor make up the CHANNEL, whose output is the evidence H]

6 Source Channel Model
- Handwriting modeling : HMMs
- Language modeling
- Search strategy

7 Hidden Markov Models
[Diagram: Memoryless model → (add memory) → Markov model; Memoryless model → (hide something) → Mixture model; Markov model → (hide something) → Hidden Markov model; Mixture model → (add memory) → Hidden Markov model]
Alan B. Poritz, "Hidden Markov Models: A Guided Tour," ICASSP 1988

8 Memoryless Model
COIN : heads (1) with probability p, tails (0) with probability 1 - p
Flip the coin 10 times (an IID random sequence)
Sequence : 1 0 1 0 0 0 1 1 1 1
Probability = p·(1-p)·p·(1-p)·(1-p)·(1-p)·p·p·p·p = p^6 (1-p)^4
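A minimal sketch of this computation (Python, illustrative only; the function name is mine, not from the slides): the probability of an IID coin-flip sequence is the product of per-flip probabilities, which for this sequence reduces to p^6 (1-p)^4.

```python
def iid_sequence_prob(seq, p_heads):
    """Probability of an IID coin-flip sequence; 1 = heads, 0 = tails."""
    prob = 1.0
    for outcome in seq:
        prob *= p_heads if outcome == 1 else (1.0 - p_heads)
    return prob

# The slide's sequence has six heads and four tails -> p**6 * (1-p)**4
print(iid_sequence_prob([1, 0, 1, 0, 0, 0, 1, 1, 1, 1], p_heads=0.5))  # 0.0009765625
```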

9 Add Memory – Markov Model
2 coins : COIN 1 => p(1) = 0.9, p(0) = 0.1 ; COIN 2 => p(1) = 0.1, p(0) = 0.9
Experiment : flip COIN 1 and note the outcome; if the outcome is heads, flip COIN 1 next, otherwise flip COIN 2; repeat
Sequence 1 1 0 0 : probability = 0.9 × 0.9 × 0.1 × 0.9
Sequence 1 0 1 0 : probability = 0.9 × 0.1 × 0.1 × 0.1
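A short sketch of the same computation for the two-coin Markov model above (function name and code structure are mine): the next coin is chosen by the previous outcome, so the sequence probability is a product of outcome probabilities conditioned on the preceding symbol.

```python
def markov_sequence_prob(seq, p1_heads=0.9, p2_heads=0.1):
    """Probability of a 0/1 sequence under the two-coin Markov model:
    start with coin 1; after a head flip coin 1 again, after a tail flip coin 2."""
    prob = 1.0
    coin_heads = p1_heads                 # the first flip always uses coin 1
    for outcome in seq:
        prob *= coin_heads if outcome == 1 else (1.0 - coin_heads)
        coin_heads = p1_heads if outcome == 1 else p2_heads
    return prob

print(markov_sequence_prob([1, 1, 0, 0]))  # 0.9 * 0.9 * 0.1 * 0.9 = 0.0729
print(markov_sequence_prob([1, 0, 1, 0]))  # 0.9 * 0.1 * 0.1 * 0.1 = 0.0009
```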

10 State Sequence Representation
[State diagram of the two-coin model: from state 1, output 1 with probability 0.9 and 0 with probability 0.1; from state 2, output 1 with probability 0.1 and 0 with probability 0.9]
Observed output sequence ⇒ unique state sequence

11 Hide the States => Hidden Markov Model
[Diagram: the same model with states s1 and s2 hidden; only the outputs, with probabilities 0.9 and 0.1 on the arcs, are observed]

12 Why use Hidden Markov Models Instead of Non-hidden?
- Hidden Markov models can be smaller – fewer parameters to estimate
- States may be truly hidden – the position of the hand, the positions of the articulators

13 Summary of HMM Basics
- We are interested in assigning probabilities p(H) to feature sequences
- Memoryless model – this model has no memory of the past
- Markov noticed that in some sequences the future depends on the past. He introduced the concept of a STATE – an equivalence class of the past that influences the future
- Hide the states : HMM

14 Hidden Markov Models
Given an observed sequence H :
- Compute p(H) (for decoding)
- Find the most likely state sequence for a given Markov model (Viterbi algorithm)
- Estimate the parameters of the Markov source (training)

15 Compute p(H)
[Figure: a three-state HMM (s1, s2, s3) whose arcs are labeled with transition probabilities and output probabilities p(a), p(b); these values are used in the trellis examples that follow]

16 Compute p(H) – contd.
- Compute p(H) where H = a a b b
- Enumerate all ways of producing h1 = a
[Figure: the arcs out of s1, s2, s3 that can produce h1 = a, each labeled with its transition × output probability (e.g. 0.5 × 0.8 = 0.40, 0.3 × 0.7 = 0.21)]

17 Compute p(H) – contd.
- Enumerate all ways of producing h1 = a, h2 = a
[Figure: the enumeration tree extended by one more symbol, with every arc that can produce the second a]

18 Compute p(H)
- Can save computation by combining paths
[Figure: paths that arrive at the same state are merged rather than enumerated separately]

19 Compute p(H)
- Trellis diagram
[Figure: trellis over states s1, s2, s3 for the prefixes 0, a, aa, aab, aabb, with arcs labeled by transition × output probabilities]

20 Basic Recursion
- Prob(node) = Σ over predecessors [ Prob(predecessor) × Prob(predecessor → node) ]
- Boundary condition : Prob(s, 0) = 1 for the start state
[Trellis for H = a a b b with the node and arc probabilities filled in by this recursion]
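The slide's trellis values depend on the exact arc probabilities of the example HMM, which are only partly recoverable from the transcript. The sketch below (Python with numpy) shows the same recursion on a generic discrete-output, state-emission HMM; the parameters are illustrative, not the slide's, and the slides' example additionally uses arc emissions and null transitions.

```python
import numpy as np

def forward_prob(obs, A, B, pi):
    """p(H) by the forward recursion.
    A[i, j] : transition probability i -> j
    B[i, k] : probability of emitting symbol k from state i
    pi[i]   : initial state probability
    obs     : observation sequence as symbol indices"""
    alpha = pi * B[:, obs[0]]               # initialize with the first symbol
    for symbol in obs[1:]:
        alpha = (alpha @ A) * B[:, symbol]  # sum over predecessors, then emit
    return alpha.sum()

# Illustrative two-state HMM over the alphabet {a: 0, b: 1}
A  = np.array([[0.6, 0.4], [0.3, 0.7]])
B  = np.array([[0.8, 0.2], [0.3, 0.7]])
pi = np.array([1.0, 0.0])
print(forward_prob([0, 0, 1, 1], A, B, pi))  # p(a a b b)
```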

21 More Formally – Forward Algorithm
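The formal statement on this slide did not survive the transcript; a hedged reconstruction of the forward recursion the title refers to, written for a state-emission HMM (the worked example on the slides uses arc emissions, but the recursion has the same shape):

```latex
\alpha_1(s) = \pi(s)\,p(h_1 \mid s), \qquad
\alpha_j(s) = \Big[\sum_{s'} \alpha_{j-1}(s')\,p(s' \to s)\Big]\,p(h_j \mid s), \qquad
p(H) = \sum_{s} \alpha_m(s)
```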

22 Find Most Likely Path for a a b b - Dynamic Programming (Viterbi)
- MaxProb(node) = max over predecessors [ MaxProb(predecessor) × Prob(predecessor → node) ]
[Trellis for H = a a b b with the best-path probabilities and back-pointers filled in by this recursion]
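A minimal sketch of the Viterbi recursion stated above, again on a generic state-emission HMM with illustrative parameters (not the slide's): replace the sum of the forward recursion with a max and keep back-pointers to recover the most likely state sequence.

```python
import numpy as np

def viterbi(obs, A, B, pi):
    """Most likely state sequence for obs, and its probability (max over paths)."""
    delta = pi * B[:, obs[0]]                 # best path probability ending in each state
    backptr = []
    for symbol in obs[1:]:
        scores = delta[:, None] * A           # scores[i, j]: best path through i ending in j
        backptr.append(scores.argmax(axis=0))
        delta = scores.max(axis=0) * B[:, symbol]
    # Trace the best final state back through the stored pointers
    state = int(delta.argmax())
    path = [state]
    for ptr in reversed(backptr):
        state = int(ptr[state])
        path.append(state)
    return list(reversed(path)), float(delta.max())

A  = np.array([[0.6, 0.4], [0.3, 0.7]])
B  = np.array([[0.8, 0.2], [0.3, 0.7]])
pi = np.array([1.0, 0.0])
print(viterbi([0, 0, 1, 1], A, B, pi))        # best state path for a a b b
```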

23 Training HMM parameters
[Figure: a small HMM with output probabilities p(a), p(b) and initial parameter guesses such as 1/3 and 1/2; for H = a b a a, the probabilities of the individual paths through the model are .000385, .000578, .000868, .001302, .001157, .002604, .001736, so p(H) = .008632]

24 Training HMM parameters
- A posteriori probability of path i = (probability of path i) / p(H)
[Path posteriors for H = a b a a : .045, .067, .134, .100, .201, .150, .301]

25 Training HMM parameters

26 Training HMM parameters
[Table: the re-estimated transition and output probabilities after each of the first few iterations (e.g. .71/.29, .68/.32, .64/.36, .60/.40), with p(H) increasing from about .00108 to .00791]
Keep on repeating : after 600 iterations, p(H) = .037037037
Another initial parameter set gives p(H) = 0.0625

27 Training HMM parameters
- Converges to a local maximum
- There are (at least) 7 local maxima
- The final solution depends on the starting point
- The speed of convergence depends on the starting point

28 Training HMM parameters : Forward-Backward algorithm
- Improves on the path-enumeration algorithm by using the trellis
- Reduces the computation from exponential to linear in the length of the observation sequence

29 Forward-Backward Algorithm
[Figure: a trellis split at output position j into a forward part and a backward part]

30 Forward-Backward Algorithm
- γj(s → s') = probability that hj is produced by the transition s → s' and the complete output is H
- αj(s) = probability of being in state s and having produced the output h1, ..., hj-1
- βj+1(s') = probability of being in state s' and producing the output hj+1, ..., hm
- so that γj(s → s') = αj(s) · p(s → s') · p(hj | s → s') · βj+1(s')

31 Forward-Backward Algorithm
- Transition count
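The count equation is an image in the original; a hedged reconstruction using the α and β quantities defined on the previous slide (arc-emission form, and the exact indexing convention on the slide may differ): the expected number of times the transition s → s' is taken while producing H is

```latex
c(s \to s') \;=\; \frac{1}{p(H)} \sum_{j=1}^{m} \alpha_j(s)\, p(s \to s')\, p(h_j \mid s \to s')\, \beta_{j+1}(s')
```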

32 Training HMM parameters
- Guess initial values for all parameters
- Compute the forward- and backward-pass probabilities
- Compute the counts
- Re-estimate the probabilities
This procedure is known variously as Baum-Welch, Baum-Eagon, forward-backward, or E-M.
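A compact sketch of the re-estimation loop these steps describe, for a state-emission discrete HMM (illustrative parameters; the slides' worked example uses arc emissions, but the structure of the loop is the same): guess parameters, run forward and backward passes, collect expected counts, and renormalize.

```python
import numpy as np

def baum_welch_step(obs, A, B, pi):
    """One forward-backward (Baum-Welch / E-M) re-estimation step."""
    T, N = len(obs), len(pi)
    # Forward and backward passes
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    pH = alpha[-1].sum()
    # Expected counts
    gamma = alpha * beta / pH                      # state-occupancy counts
    xi = np.zeros((N, N))                          # transition counts
    for t in range(T - 1):
        xi += np.outer(alpha[t], B[:, obs[t + 1]] * beta[t + 1]) * A / pH
    # Re-estimate by normalizing the counts
    A_new = xi / xi.sum(axis=1, keepdims=True)
    B_new = np.zeros_like(B)
    for k in range(B.shape[1]):
        B_new[:, k] = gamma[np.array(obs) == k].sum(axis=0)
    B_new /= B_new.sum(axis=1, keepdims=True)
    return A_new, B_new, gamma[0], pH

A  = np.array([[0.6, 0.4], [0.3, 0.7]])
B  = np.array([[0.8, 0.2], [0.3, 0.7]])
pi = np.array([0.5, 0.5])
obs = [0, 1, 0, 0]                                 # e.g. the slides' "a b a a"
for _ in range(10):
    A, B, pi, pH = baum_welch_step(obs, A, B, pi)
print(pH)                                          # p(H) climbs toward a local maximum
```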

