# Large Vocabulary Unconstrained Handwriting Recognition (J. Subrahmonia, Pen Technologies, IBM T. J. Watson Research Center)



Pen Technologies  Pen-based interfaces in mobile computing

Mathematical Formulation  H : Handwriting evidence on the basis of which a recognizer will make its decision – H = {h1, h2, h3, h4,…,hm}  W : Word string from a large vocabulary – W = {w1, w2, w3, w4,…., wn}  Recognizer : –

Mathematical Formulation  SOURCE → CHANNEL : the source generates the word string W, and the channel converts it into the handwriting evidence H that the recognizer observes.
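
In the standard source-channel formulation, the decision rule factors by Bayes' rule; the denominator p(H) does not depend on W, so it drops out of the argmax:

$$\hat{W} = \arg\max_W p(W \mid H) = \arg\max_W \frac{p(H \mid W)\, p(W)}{p(H)} = \arg\max_W p(H \mid W)\, p(W)$$

Here p(W) is the source (the language model) and p(H | W) is the channel (the handwriting model).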

Source Channel Model  WRITER → DIGITIZER → FEATURE EXTRACTOR → DECODER : the writer is the source of W; the digitizer and feature extractor form the CHANNEL that turns the writing into the evidence H; the decoder recovers the word string from H.

Source Channel Model  Handwriting modeling : HMMs  Language modeling  Search strategy

Hidden Markov Models  Memoryless model → (add memory) → Markov model  Memoryless model → (hide something) → Mixture model  Markov model → (hide something) → Hidden Markov model  Mixture model → (add memory) → Hidden Markov model  Reference: Alan B. Poritz, "Hidden Markov Models: A Guided Tour", ICASSP 1988

Memoryless Model  COIN : Heads (1) with probability p, Tails (0) with probability 1-p  Flip the coin 10 times (an i.i.d. random sequence)  Sequence : 1 0 1 0 0 0 1 1 1 1  Probability = p·(1-p)·p·(1-p)·(1-p)·(1-p)·p·p·p·p = p^6 (1-p)^4
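
As a quick check, here is a tiny Python sketch (not from the slides; the value p = 0.6 is an arbitrary example) that evaluates this product for any p:

```python
def iid_sequence_prob(seq, p):
    """Probability of a 0/1 sequence under independent coin flips with P(1) = p."""
    prob = 1.0
    for bit in seq:
        prob *= p if bit == 1 else (1 - p)
    return prob

# The slide's sequence has six heads and four tails, so the result is p**6 * (1-p)**4.
print(iid_sequence_prob([1, 0, 1, 0, 0, 0, 1, 1, 1, 1], p=0.6))
```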

Add Memory – Markov Model  2 Coins : COIN 1 => p(1) = 0.9, p(0) = 0.1 ; COIN 2 => p(1) = 0.1, p(0) = 0.9  Experiment : flip COIN 1 and note the outcome; if the outcome was Heads, flip COIN 1 next, else flip COIN 2; repeat  Sequence 1 1 0 0 : Probability = 0.9 × 0.9 × 0.1 × 0.9  Sequence 1 0 1 0 : Probability = 0.9 × 0.1 × 0.1 × 0.1
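
A matching sketch (again illustrative, not from the slides) that scores a 0/1 sequence under this two-coin Markov model, where the coin flipped at each step is chosen by the previous outcome:

```python
# P(next outcome | previous outcome): coin 1 is used after a head, coin 2 after a tail.
P = {
    1: {1: 0.9, 0: 0.1},  # previous flip was heads -> flip coin 1
    0: {1: 0.1, 0: 0.9},  # previous flip was tails -> flip coin 2
}

def markov_sequence_prob(seq):
    """First flip uses coin 1; every later flip depends on the previous outcome."""
    prob = P[1][seq[0]]                      # coin 1 for the first flip
    for prev, cur in zip(seq, seq[1:]):
        prob *= P[prev][cur]
    return prob

print(markov_sequence_prob([1, 1, 0, 0]))    # 0.9 * 0.9 * 0.1 * 0.9
print(markov_sequence_prob([1, 0, 1, 0]))    # 0.9 * 0.1 * 0.1 * 0.1
```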

State Sequence Representation  [Diagram: two states. From state 1, output 1 with probability 0.9 (stay in state 1) or output 0 with probability 0.1 (go to state 2); from state 2, output 1 with probability 0.1 (go to state 1) or output 0 with probability 0.9 (stay in state 2).]  Observed Output Sequence ⇔ Unique State Sequence

Hide the states => Hidden Markov Model  [Diagram: states s1 and s2 with transition probabilities 0.9 / 0.1 and output probabilities 0.9 / 0.1; since both states can produce either output, the observed sequence no longer determines the state sequence.]

Why use Hidden Markov Models instead of non-hidden?  Hidden Markov Models can be smaller – fewer parameters to estimate  States may be truly hidden – position of the hand – positions of the articulators

Summary of HMM Basics  We are interested in assigning probabilities p(H) to feature sequences  Memoryless model – this model has no memory of the past  Markov noticed that in some sequences the future depends on the past. He introduced the concept of a STATE – an equivalence class of the past that influences the future  Hide the states : HMM

Hidden Markov Models  Given a observed sequence H – Compute p(H) for decoding – Find the most likely state sequence for a given Markov model (Viterbi algorithm) – Estimate the parameters of the Markov source (training)

Compute p(H)  The example HMM used on the following slides has states s1, s2, a final state s3, outputs a and b, and six arcs:
– s1 → s1 : probability 0.5, outputs p(a) = 0.8, p(b) = 0.2
– s1 → s2 : probability 0.3, outputs p(a) = 0.7, p(b) = 0.3
– s1 → s2 : null arc, probability 0.2 (produces no output)
– s2 → s2 : probability 0.4, outputs p(a) = 0.5, p(b) = 0.5
– s2 → s3 : probability 0.5, outputs p(a) = 0.3, p(b) = 0.7
– s2 → s3 : null arc, probability 0.1 (produces no output)

Compute p(H) – contd.  Compute p(H) where H = a a b b  Enumerate all ways of producing h1=a s1 s2 s3 0.5x0.8 0.3x0.7 0.2 0.4x0.5 0.5x0.3 0.2 0.40 0.21 0.04 0.03

Compute p(H) – contd.  Enumerate all ways of producing h1=a h2=a s1 s2 s3 0.5x0.8 0.3x0.7 0.2 0.4x0.5 0.5x0.3 0.2 s1 s2 s3 0.5x0.8 0.3x0.7 0.2 0.4x0.5 0.5x0.3 0.2 s2 s3 0.4x0.5 0.5x0.3

Compute p(H)  Can save computation by combining paths s1 s2 s3 s1 s2 s3 s2 s3

Compute p(H)  Trellis Diagram s1 s2 s3 0aaaaabaabb.5x.8.5x.2.4x.5.3x.7.3x.3.5x.3.5x.7.2.1

Basic Recursion  Prob(node) = Σ over predecessors of [ Prob(predecessor) × Prob(predecessor → node) ]  Boundary condition : Prob(s1, 0) = 1  The resulting forward probabilities for H = a a b b :

| state | 0 | a | aa | aab | aabb |
|-------|------|-------|--------|--------|----------|
| s1 | 1.0 | 0.4 | 0.16 | 0.016 | 0.0016 |
| s2 | 0.2 | 0.33 | 0.182 | 0.054 | 0.01256 |
| s3 | 0.02 | 0.063 | 0.0677 | 0.0691 | 0.020156 |

Each cell is the sum of its incoming contributions, e.g. s2 after "a" = (s1, null) .08 + (s1, a) .21 + (s2, a) .04 = 0.33; p(a a b b) is the entry for the final state s3 in the last column, 0.020156.
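
For concreteness, here is a minimal Python sketch of this forward computation. The states, transition probabilities, output probabilities and the two null (output-free) arcs are taken from the example HMM above; the code structure itself is illustrative, not the author's implementation:

```python
# Emitting arcs of the example HMM: (from, to, transition prob, {output: output prob}).
EMIT = [
    ("s1", "s1", 0.5, {"a": 0.8, "b": 0.2}),
    ("s1", "s2", 0.3, {"a": 0.7, "b": 0.3}),
    ("s2", "s2", 0.4, {"a": 0.5, "b": 0.5}),
    ("s2", "s3", 0.5, {"a": 0.3, "b": 0.7}),
]
# Null arcs (no output), processed in topological order s1 -> s2 -> s3.
NULL = [("s1", "s2", 0.2), ("s2", "s3", 0.1)]
STATES = ["s1", "s2", "s3"]

def forward(observations, start="s1", final="s3"):
    """Sum over all paths: returns p(H) for the observation sequence H."""
    alpha = {s: 0.0 for s in STATES}
    alpha[start] = 1.0
    for src, dst, p in NULL:                 # null arcs before the first output
        alpha[dst] += alpha[src] * p
    for symbol in observations:
        new = {s: 0.0 for s in STATES}
        for src, dst, p, out in EMIT:        # emitting arcs consume one symbol
            new[dst] += alpha[src] * p * out[symbol]
        for src, dst, p in NULL:             # then the output-free arcs
            new[dst] += new[src] * p
        alpha = new
    return alpha[final]

print(forward("aabb"))   # ~0.020156, the value in the s3 / aabb cell of the trellis
```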

More Formally – Forward Algorithm
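
In the usual notation, with $a_{ij}$ the transition probability and $b_{ij}(h)$ the probability of producing output $h$ on the arc from state $i$ to state $j$, the forward recursion for an output sequence $H = h_1 \ldots h_m$ can be written as:

$$\alpha_1(s_1) = 1, \qquad \alpha_{t+1}(j) = \sum_i \alpha_t(i)\, a_{ij}\, b_{ij}(h_t), \qquad p(H) = \alpha_{m+1}(s_{\text{final}}),$$

with null arcs additionally propagating $\alpha_t(i)\, a_{ij}$ into $\alpha_t(j)$ at each time step.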

Find Most Likely Path for a a b b – Dynamic Programming (Viterbi)  MaxProb(node) = max over predecessors of [ MaxProb(predecessor) × Prob(predecessor → node) ]  The resulting best-path scores for H = a a b b :

| state | 0 | a | aa | aab | aabb |
|-------|------|------|--------|--------|---------|
| s1 | 1.0 | 0.4 | 0.16 | 0.016 | 0.0016 |
| s2 | 0.2 | 0.21 | 0.084 | 0.0168 | 0.00336 |
| s3 | 0.02 | 0.03 | 0.0315 | 0.0294 | 0.00588 |

Tracing back from the final state, the most likely path is s1 → s1 → s2 → s2 → s3 emitting a, a, b, b, with probability 0.4 × 0.21 × 0.2 × 0.35 = 0.00588.
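
A matching Python sketch for the max recursion (illustrative; it reuses the same example model as the forward sketch above and keeps backpointers so the best path can be read off):

```python
# Same example HMM as in the forward sketch: emitting arcs and null arcs.
EMIT = [
    ("s1", "s1", 0.5, {"a": 0.8, "b": 0.2}),
    ("s1", "s2", 0.3, {"a": 0.7, "b": 0.3}),
    ("s2", "s2", 0.4, {"a": 0.5, "b": 0.5}),
    ("s2", "s3", 0.5, {"a": 0.3, "b": 0.7}),
]
NULL = [("s1", "s2", 0.2), ("s2", "s3", 0.1)]
STATES = ["s1", "s2", "s3"]

def viterbi(observations, start="s1", final="s3"):
    """Max over all paths: returns (best path probability, best state sequence)."""
    best = {s: (0.0, None) for s in STATES}      # state -> (score, path so far)
    best[start] = (1.0, [start])
    for src, dst, p in NULL:
        if best[src][0] * p > best[dst][0]:
            best[dst] = (best[src][0] * p, best[src][1] + [dst])
    for symbol in observations:
        new = {s: (0.0, None) for s in STATES}
        for src, dst, p, out in EMIT:            # emitting arcs consume one symbol
            score = best[src][0] * p * out[symbol]
            if score > new[dst][0]:
                new[dst] = (score, best[src][1] + [dst])
        for src, dst, p in NULL:                 # then the output-free arcs
            if new[src][0] * p > new[dst][0]:
                new[dst] = (new[src][0] * p, new[src][1] + [dst])
        best = new
    return best[final]

print(viterbi("aabb"))   # (~0.00588, ['s1', 's1', 's2', 's2', 's3'])
```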

Training HMM parameters  Initial guess : transition probabilities of 1/3 and 1/2, output probabilities p(a) = p(b) = 1/2  H = a b a a  The seven possible paths have probabilities .000385, .000578, .000868, .001302, .001157, .002604, .001736, so p(H) = .008632

Training HMM parameters  A posteriori probability of path i = (probability of path i) / p(H) : .045, .067, .134, .100, .201, .150, .301

Training HMM parameters

Training HMM parameters – contd.  Re-estimated parameters : .71, .29, .68, .32, .64, .36, .60, .40, .34, .46, .20, .60, .40  Updated path probabilities : .00108, .00129, .00404, .00212, .00537, .00253, .00791  Keep on repeating : after 600 iterations, p(H) = .037037037  Another initial parameter set : p(H) = 0.0625

Training HMM parameters  Converges to local maximum  There are 7 (atleast) local maxima  Final solution depends on starting point  Speed of convergence depends on starting point

Training HMM parameters : Forward-Backward algorithm  Improves on the path-enumeration algorithm by using the trellis  Reduces the computation from exponential to linear in the length of the output sequence

Forward Backward Algorithm  [Trellis diagram with output position j highlighted: the forward probabilities cover the outputs before position j and the backward probabilities cover the outputs after position j.]

Forward Backward Algorithm  = Probability that hj is produced by and the complete output is H = = Probability of being in state and producing the output h1,.. hj-1 = Probability of being in state and producing the output hj+1,..hm

Forward Backward Algorithm  Transition count : the expected number of times the arc from state i to state k is used in producing H
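
With the quantities defined on the previous slide, the standard relations (a reconstruction in the usual notation, not a verbatim copy of the slide) are:

$$\gamma_j(i \to k) = \alpha_j(i)\, a_{ik}\, b_{ik}(h_j)\, \beta_{j+1}(k), \qquad c(i \to k) = \frac{1}{p(H)} \sum_{j=1}^{m} \gamma_j(i \to k),$$

and the re-estimated transition probability is $\hat{a}_{ik} = c(i \to k) \big/ \sum_{k'} c(i \to k')$.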

Training HMM parameters  Guess initial values for all parameters  Compute forward and backward pass probabilities  Compute counts  Re-estimate probabilities BAUM-WELCH, BAUM-EAGON, FORWARD-BACKWARD, E-M
