
1 Large Vocabulary Unconstrained Handwriting Recognition. J. Subrahmonia, Pen Technologies, IBM T. J. Watson Research Center

2 Pen Technologies : pen-based interfaces in mobile computing

3 Mathematical Formulation
– H : handwriting evidence on the basis of which the recognizer makes its decision, H = {h1, h2, h3, h4, …, hm}
– W : word string from a large vocabulary, W = {w1, w2, w3, w4, …, wn}
– Recognizer : Ŵ = arg max_W p(W | H) = arg max_W p(H | W) p(W)
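
A minimal Python sketch of this decision rule (the callables `likelihood` and `prior` are hypothetical stand-ins for the handwriting model p(H | W) and the language model p(W); they are not part of the original slides):

```python
def decode(H, vocabulary, likelihood, prior):
    """Pick the word string W maximizing p(W | H), i.e. p(H | W) * p(W).

    `likelihood(H, W)` and `prior(W)` are hypothetical callables standing in
    for the handwriting model p(H | W) and the language model p(W).
    """
    return max(vocabulary, key=lambda W: likelihood(H, W) * prior(W))
```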

4 Mathematical Formulation – source-channel view : the word string W is generated by the SOURCE (modeled by p(W)) and is observed as handwriting evidence H through the CHANNEL (modeled by p(H | W)).

5 Source Channel Model (block diagram) : WRITER -> DIGITIZER -> FEATURE EXTRACTOR -> DECODER. The digitizer and feature extractor form the CHANNEL, which turns the writer's intended word string into the observed evidence H.

6 Source Channel Model – components of the recognizer :
– Handwriting modeling : HMMs
– Language modeling
– Search strategy

7 Hidden Markov Models
– Memoryless model + add memory -> Markov model
– Memoryless model + hide something -> mixture model
– Markov model + hide something (equivalently, mixture model + add memory) -> Hidden Markov Model
Alan B. Poritz, "Hidden Markov Models: A Guided Tour", ICASSP 1988

8 Memoryless Model
COIN : heads (1) with probability p, tails (0) with probability 1 - p
Flip the coin 10 times (an IID random sequence)
Sequence 1 0 1 0 0 0 1 1 1 1 : probability = p*(1-p)*p*(1-p)*(1-p)*(1-p)*p*p*p*p = p^6 (1-p)^4
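
A small Python sketch of this memoryless model (the value p = 0.5 in the example call is a hypothetical choice; the slide leaves p symbolic):

```python
def iid_sequence_prob(sequence, p):
    """Probability of a 0/1 sequence under the memoryless coin model:
    every 1 (heads) has probability p, every 0 (tails) has probability 1 - p."""
    prob = 1.0
    for outcome in sequence:
        prob *= p if outcome == 1 else (1.0 - p)
    return prob

# The slide's sequence has probability p**6 * (1 - p)**4; for p = 0.5 this is 1/1024.
print(iid_sequence_prob([1, 0, 1, 0, 0, 0, 1, 1, 1, 1], 0.5))
```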

9 Add Memory – Markov Model
2 coins : COIN 1 => p(1) = 0.9, p(0) = 0.1 ; COIN 2 => p(1) = 0.1, p(0) = 0.9
Experiment : flip COIN 1 and note the outcome ; if the outcome is heads, flip COIN 1 next, otherwise flip COIN 2 ; repeat
Sequence 1100 : probability = 0.9*0.9*0.1*0.9
Sequence 1010 : probability = 0.9*0.1*0.1*0.1
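
The same kind of computation for the two-coin Markov model, as a Python sketch that reproduces the two probabilities on the slide:

```python
def markov_coin_prob(sequence):
    """Probability of a 0/1 sequence under the two-coin Markov model:
    the first flip uses COIN 1; after a head (1) the next flip uses COIN 1,
    after a tail (0) it uses COIN 2."""
    p_head = {1: 0.9, 2: 0.1}        # p(1) for COIN 1 and COIN 2
    prob, coin = 1.0, 1              # start with COIN 1
    for outcome in sequence:
        prob *= p_head[coin] if outcome == 1 else (1.0 - p_head[coin])
        coin = 1 if outcome == 1 else 2   # the outcome selects the next coin
    return prob

print(markov_coin_prob([1, 1, 0, 0]))   # 0.9 * 0.9 * 0.1 * 0.9
print(markov_coin_prob([1, 0, 1, 0]))   # 0.9 * 0.1 * 0.1 * 0.1
```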

10 State Sequence Representation (figure: state diagram of the two-coin Markov model with transition probabilities 0.9 and 0.1). In this non-hidden model, the observed output sequence corresponds to a unique state sequence.

11 Hide the states => Hidden Markov Model (figure: the same state diagram with states s1, s2 hidden, so the state sequence can no longer be read off from the outputs).

12 Why use Hidden Markov Models instead of non-hidden ones?
– Hidden Markov models can be smaller : fewer parameters to estimate
– States may be truly hidden, e.g. the position of the hand or the positions of the articulators

13 Summary of HMM Basics
– We are interested in assigning probabilities p(H) to feature sequences
– Memoryless model : this model has no memory of the past
– Markov noticed that in some sequences the future depends on the past. He introduced the concept of a STATE : an equivalence class of the past that influences the future
– Hide the states : HMM

14 Hidden Markov Models – given an observed sequence H :
– Compute p(H) (for decoding)
– Find the most likely state sequence for a given Markov model (Viterbi algorithm)
– Estimate the parameters of the Markov source (training)

15 Compute p(H) (figure: the example 3-state HMM with states s1, s2, s3 and output probabilities p(a), p(b)).

16 Compute p(H) – contd.
– Compute p(H) where H = a a b b
– Enumerate all ways of producing h1 = a (figure: the partial paths through s1, s2, s3 with their probabilities)

17 Compute p(H) – contd.
– Enumerate all ways of producing h1 = a, h2 = a (figure: the partial paths through s1, s2, s3 with their probabilities)

18 Compute p(H)
– Can save computation by combining partial paths that end in the same state (figure: the merged paths through s1, s2, s3); a sketch of the brute-force enumeration it improves on is given below
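
The following Python sketch shows the brute-force enumeration of every state path for H = a a b b. The slide's own 3-state model cannot be fully recovered from this transcript, so the sketch uses a hypothetical 2-state HMM with outputs attached to the destination state; the point is the exponential cost of enumeration, not the particular numbers.

```python
from itertools import product

# Hypothetical toy HMM (not the slide's example): transition and output
# probabilities, with 's1' as the start state.
states, start = ["s1", "s2"], "s1"
trans = {("s1", "s1"): 0.6, ("s1", "s2"): 0.4,
         ("s2", "s1"): 0.3, ("s2", "s2"): 0.7}
emit = {("s1", "a"): 0.8, ("s1", "b"): 0.2,
        ("s2", "a"): 0.1, ("s2", "b"): 0.9}

def path_prob(path, H):
    """Joint probability of one state path and the output sequence H."""
    prob, prev = 1.0, start
    for state, symbol in zip(path, H):
        prob *= trans[(prev, state)] * emit[(state, symbol)]
        prev = state
    return prob

def brute_force_prob(H):
    """p(H) by summing over every state path: |states| ** len(H) paths."""
    return sum(path_prob(path, H) for path in product(states, repeat=len(H)))

print(brute_force_prob(list("aabb")))
```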

19 Compute p(H) – trellis diagram (figure: the trellis over states s1, s2, s3 with columns for the output prefixes 0, a, aa, aab, aabb and branch probabilities .5x.8, .5x.2, .4x.5, .3x.7, .3x.3, .5x.3, .5x.7, .2, .1).

20 Basic Recursion
– Prob(node) = sum over predecessors of ( Prob(predecessor) x Prob(predecessor -> node) )
– Boundary condition : Prob(s1, 0) = 1
(figure: the trellis for a a b b with the node and branch probabilities filled in)

21 More Formally – the Forward Algorithm
alpha_j(s) = probability of being in state s having produced h1, …, h_{j-1}
alpha_{j+1}(s) = sum over s' of alpha_j(s') p(s' -> s) p(h_j | s' -> s)
p(H) = sum over s of alpha_{m+1}(s)
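
A Python sketch of the same recursion on the toy model from the enumeration sketch above (hypothetical numbers, not the slide's example); it gives the same p(H) as the brute-force enumeration but in time linear in the length of H:

```python
def forward_prob(H):
    """p(H) by the forward (trellis) recursion, reusing the toy
    `states`, `start`, `trans`, `emit` from the enumeration sketch."""
    # alpha[s] = probability of having produced the prefix so far and being in s
    alpha = {s: (1.0 if s == start else 0.0) for s in states}
    for symbol in H:
        alpha = {s: sum(alpha[p] * trans[(p, s)] * emit[(s, symbol)]
                        for p in states)
                 for s in states}
    return sum(alpha.values())

print(forward_prob(list("aabb")))   # equals brute_force_prob(list("aabb"))
```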

22 Find Most Likely Path for a a b b – Dynamic Programming / Viterbi
– MaxProb(node) = MAX over predecessors of ( MaxProb(predecessor) x Prob(predecessor -> node) )
(figure: the trellis for a a b b with the best-path probability at each node)
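
A Viterbi sketch on the same hypothetical toy model: the same recursion as the forward pass, but with max in place of sum, keeping the best path so the state sequence can be read out.

```python
def viterbi(H):
    """Most likely state path for H (dynamic programming / Viterbi),
    reusing the toy `states`, `start`, `trans`, `emit` defined above."""
    best = {s: ((1.0 if s == start else 0.0), []) for s in states}
    for symbol in H:
        new = {}
        for s in states:
            # best predecessor: maximize the path probability into state s
            prob, path = max(
                ((best[p][0] * trans[(p, s)] * emit[(s, symbol)],
                  best[p][1] + [s]) for p in states),
                key=lambda cand: cand[0])
            new[s] = (prob, path)
        best = new
    return max(best.values(), key=lambda cand: cand[0])

print(viterbi(list("aabb")))   # (probability of best path, best state sequence)
```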

23 Training HMM parameters (figure: a small example HMM with initial parameter guesses, e.g. 1/3 and 1/2, for its transition and output probabilities p(a), p(b)). Training output H = a b a a ; p(H) = …

24 Training HMM parameters
A posteriori probability of path i given H : p(path i | H) = p(path i, H) / p(H) = p(path i, H) / sum over paths j of p(path j, H)
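
A sketch of this posterior, again on the hypothetical toy model rather than the slide's example: enumerate every path, compute its joint probability with H, and normalize by p(H).

```python
def path_posteriors(H):
    """p(path | H) = p(path, H) / p(H) for every state path, by enumeration.
    Reuses `states`, `path_prob` and `product` from the sketches above."""
    joint = {path: path_prob(path, H)
             for path in product(states, repeat=len(H))}
    pH = sum(joint.values())
    return {path: p / pH for path, p in joint.items()}

for path, posterior in path_posteriors(list("abaa")).items():
    print(path, posterior)
```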

25 Training HMM parameters

26 Keep on repeating the re-estimation : after 600 iterations, p(H) = … ; starting from another initial parameter set, p(H) = …

27 Training HMM parameters
– Converges to a local maximum
– There are (at least) 7 local maxima
– The final solution depends on the starting point
– The speed of convergence depends on the starting point

28 Training HMM parameters : Forward-Backward algorithm
– Improves on the path-enumeration algorithm by using the trellis
– Reduces the computation from exponential to linear in the length of the output sequence

29 Forward-Backward Algorithm (figure: the trellis with the quantities associated with output position j marked).

30 Forward-Backward Algorithm
gamma_j(s -> s') = probability that hj is produced by the transition s -> s' and the complete output is H
                 = alpha_j(s) p(s -> s') p(hj | s -> s') beta_j(s')
alpha_j(s) = probability of being in state s and having produced the output h1, …, h_{j-1}
beta_j(s') = probability of being in state s' and producing the output h_{j+1}, …, hm
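
A Python sketch of these quantities on the toy model from the earlier sketches. The index convention in the code is shifted slightly (`alpha[j]` covers the prefix H[:j] rather than h1 … h_{j-1}), which only changes where the split between forward and backward parts is written.

```python
def forward_backward(H):
    """Forward/backward probabilities and the per-symbol transition posteriors
    gamma[j][(p, s)]: the probability that H[j] is produced on the transition
    p -> s, given the complete output H.
    Reuses the toy `states`, `start`, `trans`, `emit` defined above."""
    m = len(H)
    # alpha[j][s]: probability of producing the prefix H[:j] and ending in s
    alpha = [{s: (1.0 if s == start else 0.0) for s in states}]
    for symbol in H:
        alpha.append({s: sum(alpha[-1][p] * trans[(p, s)] * emit[(s, symbol)]
                             for p in states) for s in states})
    # beta[j][s]: probability of producing the suffix H[j:] starting from s
    beta = [None] * m + [{s: 1.0 for s in states}]
    for j in range(m - 1, -1, -1):
        beta[j] = {p: sum(trans[(p, s)] * emit[(s, H[j])] * beta[j + 1][s]
                          for s in states) for p in states}
    pH = sum(alpha[m].values())
    gamma = [{(p, s): alpha[j][p] * trans[(p, s)] * emit[(s, H[j])]
                      * beta[j + 1][s] / pH
              for p in states for s in states} for j in range(m)]
    return alpha, beta, gamma, pH
```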

31 Forward-Backward Algorithm
Transition count : c(s -> s') = sum over j of gamma_j(s -> s') / p(H)

32 Training HMM parameters
– Guess initial values for all parameters
– Compute forward and backward pass probabilities
– Compute counts
– Re-estimate the probabilities
This procedure goes by the names BAUM-WELCH, BAUM-EAGON, FORWARD-BACKWARD, and E-M; a minimal sketch of one re-estimation step follows.
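
A minimal sketch of one re-estimation step on the hypothetical toy model, using the gammas from the forward-backward sketch above. For brevity only the transition probabilities are re-estimated here; a full trainer would also re-estimate the output probabilities and accumulate counts over many training sequences.

```python
def reestimate_transitions(H):
    """One forward-backward (Baum-Welch / E-M) update of the toy model's
    transition probabilities, in place.  Returns p(H) under the old model."""
    _, _, gamma, pH = forward_backward(H)
    # expected transition counts: sum the posteriors over all symbol positions
    counts = {key: sum(g[key] for g in gamma) for key in trans}
    for p in states:
        norm = sum(counts[(p, s)] for s in states)
        for s in states:
            trans[(p, s)] = counts[(p, s)] / norm
    return pH

# Repeating the step drives p(H) up towards a local maximum.
for _ in range(10):
    print(reestimate_transitions(list("aabb")))
```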

