# Christian VIARD-GAUDIN


Christian VIARD-GAUDIN — cviard@ireste.fr
IRCCyN seminar, 6 January — Christian VIARD-GAUDIN, Institut de Recherche en Communications et Cybernétique de Nantes

Research context (slide diagram): Image and Video Communications team — psychovisual modelling, multimedia and networks, video and multimedia, handwriting and documents. [Model diagram with transition weights 0.8, 0.7, 0.3, 0.2, 0.6, 0.4, 0.15, 0.1, 0.25, 0.35.] Copyright © IRCCyN/CVG

CONTENTS — A tutorial on hidden Markov models; application to handwriting recognition.

Part One: Hidden Markov Models (HMMs)
Origins and scope — Late 1960s (Baum, 1967): basic theory. Late 1980s: widespread understanding and application to speech recognition. HMMs model the behaviour of speech, characters, temperature, stock markets, … by means of a statistical approach: a real-world signal is described by a parametric random process.

Hidden Markov Model (1) — An HMM is a double stochastic process. 1) An underlying stochastic process generates a sequence of states q1, q2, …, qt, …, qT, where t is a discrete, regularly spaced time index, T is the length of the sequence, qt ∈ Q = {q1, q2, …, qN}, and N is the number of possible states.

Markov Chain Hypotheses
1) First order: the probabilistic description is truncated to the current state and its predecessor: P[qt=qj | qt-1=qi, qt-2=qk, …] = P[qt=qj | qt-1=qi]. 2) Stationarity: the probabilities are time invariant: P[qt=qj | qt-1=qi] = aij, 1 ≤ i, j ≤ N. This defines a square N×N state transition probability matrix A = {aij}, where aij ≥ 0 and Σj aij = 1. 3) An initial state distribution π = {πi} must also be defined: πi = P[q1 = qi], 1 ≤ i ≤ N, where πi ≥ 0 and Σi πi = 1. The value of each state is unobservable, but …
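The hypotheses above can be sketched in a few lines of Python: a first-order, stationary chain is fully described by a matrix A and a vector π, and sampling uses only the previous state. The 3×3 values below are hypothetical, chosen only to satisfy the constraints aij ≥ 0 and Σj aij = 1.

```python
import random

# Hypothetical transition matrix and initial distribution (illustration only).
A = [
    [0.4, 0.3, 0.3],
    [0.2, 0.6, 0.2],
    [0.1, 0.1, 0.8],
]
pi = [1 / 3, 1 / 3, 1 / 3]

# The stochasticity constraints from the slide.
assert all(abs(sum(row) - 1.0) < 1e-9 for row in A)
assert abs(sum(pi) - 1.0) < 1e-9

def sample_states(A, pi, T, rng):
    """Generate q1..qT: q1 ~ pi, then qt depends only on q_{t-1} (first order)."""
    states = [rng.choices(range(len(pi)), weights=pi)[0]]
    for _ in range(T - 1):
        states.append(rng.choices(range(len(A)), weights=A[states[-1]])[0])
    return states

rng = random.Random(0)
seq = sample_states(A, pi, 10, rng)
```

Because the chain is stationary, the same matrix A is reused at every time step; only the conditioning state changes.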

Hidden Markov Model (2) — An HMM is a double stochastic process. 1) An underlying stochastic process generates a sequence of states q1, q2, …, qt, …, qT. But… 2) each state qt emits an observation according to a second stochastic process: xt ∈ X = {x1, x2, …, xM}, where each xi is a discrete symbol and M is the number of symbols.

Observation Hypothesis — 1) The observation xt depends only on the present state qt: P[xt = xj | qt = qi] = bij. This defines an N×M observation probability matrix B = {bij}, where bij ≥ 0 and Σj bij = 1. A complete specification of an HMM (λ) requires: two model parameters, N and M; a specification of the symbols to observe; and three probability measures π, A, B. In short: λ = (π, A, B).

Example 1 : (Not Hidden, Just a Discrete Markov Model)
Model of the weather in Nantes. N = M = 3; state = observation: Q = {q1 = rain, q2 = cloudy, q3 = sunny}; t is sampled every day, at noon for instance. A_Nantes = {aij} is the 3×3 transition matrix shown on the slide. Let us play with this model now!

NANTES Weather Model — Given that the weather today (t = 1) is sunny (q1 = q3), answer these questions: What will the weather be tomorrow (t = 2)? What is the probability of rain for the day after tomorrow (t = 3)? And for the day after that (t = 4)? What is the probability that the coming week will be “sun-sun-rain-rain-sun-cloudy-sun”?

NANTES Weather Model — Two more questions: What is the probability of rain for d consecutive days (e.g. d = 3)? What is the average number of consecutive sunny days? Cloudy days? Rainy days?

NANTES Weather Model — Answers. What will the weather be tomorrow? What is the probability of rain for the day after tomorrow? Use a trellis over the states R (rain), C (cloudy), S (sunny), propagating the state distribution with matrix A.

NANTES Weather Model — And for the day after that? Just extend the trellis from the previous values (q1: rain, q2: cloudy, q3: sunny) to obtain P(q4 = q1). Likewise, the probability that the coming week will be “sun-sun-rain-rain-sun-cloudy-sun” is the product of the corresponding transition probabilities: P(q3, q3, q1, q1, q3, q2, q3).
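A minimal sketch of these trellis computations follows. The transition matrix is the classic three-state weather example (Rabiner's), which is consistent with the edge values legible on the slide (0.4, 0.3, 0.6, 0.2, 0.1, 0.8); treat the exact matrix as an assumption.

```python
# Assumed Nantes weather transitions (classic Rabiner weather example).
RAIN, CLOUDY, SUNNY = 0, 1, 2
A = [
    [0.4, 0.3, 0.3],   # from rain
    [0.2, 0.6, 0.2],   # from cloudy
    [0.1, 0.1, 0.8],   # from sunny
]

def state_distribution(A, start, steps):
    """Propagate the state distribution `steps` transitions from `start`."""
    p = [0.0] * len(A)
    p[start] = 1.0
    for _ in range(steps):
        p = [sum(p[i] * A[i][j] for i in range(len(A))) for j in range(len(A))]
    return p

# Today (t=1) is sunny; probability of rain at t=3 and t=4:
p_rain_t3 = state_distribution(A, SUNNY, 2)[RAIN]   # 0.14
p_rain_t4 = state_distribution(A, SUNNY, 3)[RAIN]   # 0.159

# The coming week "sun-sun-rain-rain-sun-cloudy-sun", given day 1 is sunny:
week = [SUNNY, SUNNY, RAIN, RAIN, SUNNY, CLOUDY, SUNNY]
p_week, prev = 1.0, SUNNY
for q in week:
    p_week *= A[prev][q]   # product of transition probabilities along the path
    prev = q
# p_week = 0.8*0.8*0.1*0.4*0.3*0.1*0.2 = 1.536e-4
```

Note the difference between the two computations: the rain probabilities sum over all intermediate paths (matrix-vector products in the trellis), while the week probability follows one fixed path.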

NANTES Weather Model — What is the probability that rain lasts for d consecutive days (e.g. d = 3)? More generally, the probability of staying exactly d steps in state qi before leaving is P(qi, qi, …, qi, qj≠i) = (aii)^(d−1) × (1 − aii); hence, for rain (state q1) and d = 3, P(q1, q1, q1, qj≠1) = (a11)² × (1 − a11).

NANTES Weather Model — What is the average number of consecutive sunny days? Cloudy days? Rainy days? The expected duration in state qi is d̄i = Σd d (aii)^(d−1) (1 − aii) = 1 / (1 − aii).
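Both duration formulas can be checked directly; the matrix below is again the assumed classic weather matrix, not guaranteed to match the slide exactly.

```python
# Assumed transition matrix (classic three-state weather example).
A = [
    [0.4, 0.3, 0.3],
    [0.2, 0.6, 0.2],
    [0.1, 0.1, 0.8],
]

def p_duration(a_ii, d):
    """P(exactly d consecutive days in state i) = a_ii^(d-1) * (1 - a_ii)."""
    return a_ii ** (d - 1) * (1.0 - a_ii)

def expected_duration(a_ii):
    """Mean run length: sum_d d * a_ii^(d-1) * (1 - a_ii) = 1 / (1 - a_ii)."""
    return 1.0 / (1.0 - a_ii)

p_rain_3_days = p_duration(A[0][0], 3)                 # 0.4^2 * 0.6 = 0.096
mean_days = [expected_duration(A[i][i]) for i in range(3)]
# rain: 1/0.6 ~ 1.67 days, cloudy: 1/0.4 = 2.5 days, sunny: 1/0.2 = 5 days
```

The geometric run-length distribution is a built-in (and sometimes criticized) property of first-order Markov chains: staying d days in a state always has probability proportional to (aii)^(d−1).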

NANTES Weather Model — Topological graph representation: states q1 = rain, q2 = cloudy, q3 = sunny, with the transition values 0.4, 0.3, 0.6, 0.2, 0.1, 0.8 labelling the edges of the graph. This is an ergodic model: from any state, all other states are reachable.

Example 2 : Extension to HMM
Coin-tossing experiment: we do not see how the tossing is performed (one coin? multiple coins? biased coins?); we only have access to the result, a sequence of tosses. In that case, M = 2 and X = {x1 = Head, x2 = Tail}. Observation sequence: (x1, x2, …, xT) = (H, H, T, T, T, H, …, H). The problems are: How do we build a model to explain the observed sequence? What are the states? How many states?

Coin-tossing experiment, model 1: assume only one biased coin
2 states (N = 2): Q = {q1 = H, q2 = T}. Model topology: two states, Head and Tail, with transition probabilities P(H) and 1 − P(H). Only one parameter is needed, P(H); it defines matrix A. Model 2: assume two biased coins. 2 states (N = 2): Q = {q1 = Coin1, q2 = Coin2}; 2 different observations (M = 2): X = {x1 = H, x2 = T}.

Model topology: two states, Coin1 and Coin2, with self-loops a11 and a22 and cross transitions 1 − a11 and 1 − a22. State transition probabilities: A = (a11, 1−a11; 1−a22, a22). Observation symbol probabilities: B = (P1(H), 1−P1(H); P2(H), 1−P2(H)). Four parameters are required to define this model (A: 2, B: 2).

Coin-tossing experiment, model 3: assume three biased coins
3 states (N = 3): Q = {q1 = Coin1, q2 = Coin2, q3 = Coin3}; 2 different observations (M = 2): X = {x1 = H, x2 = T}. Model topology: a fully connected graph over the three coins with transitions aij, 1 ≤ i, j ≤ 3. Observation symbol probabilities: B = (Pi(H), 1−Pi(H)) for each coin i. Nine independent parameters are required to define this model (A: 6, B: 3).

Coin-tossing experiment, model 3: assume three biased coins
3 states (N = 3): Q = {q1 = Coin1, q2 = Coin2, q3 = Coin3}. Consider the following example: all state transition probabilities (matrix A) equal to 1/3; all initial state probabilities (vector π) equal to 1/3; observation probabilities given by matrix B.

Coin-tossing experiment, model 3: assume three biased coins
3 states (N = 3): Q = {q1 = Coin1, q2 = Coin2, q3 = Coin3}. 1. You observe X = (H, H, H, H, T, H, T, T, T, T). Which state sequence G most likely generates X? What is the joint probability P(X, G | λ) of the observation sequence and that state sequence? 2. What is the probability that the observation sequence came entirely from state q1?

Answers: 1. For X = (H, H, H, H, T, H, T, T, T, T), since every transition and every initial state has the same probability 1/3, the most likely state sequence simply picks, for each symbol, the coin most likely to emit it. 2. The probability that the sequence came entirely from state q1 is π1 × (a11)^9 × Πt b1(xt).
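Both questions can be sketched numerically. Transitions and initial probabilities are uniform (1/3) as on the slide; the per-coin head probabilities below (0.5, 0.75, 0.25) are assumed values in the style of Rabiner's classic example, since the slide's B matrix did not survive extraction.

```python
P_HEAD = [0.5, 0.75, 0.25]          # P(H | coin i) -- hypothetical values
X = "HHHHTHTTTT"                    # the observed sequence from the slide

def b(state, symbol):
    return P_HEAD[state] if symbol == "H" else 1.0 - P_HEAD[state]

def joint(states, obs):
    """P(X, G | lambda) with uniform pi and uniform A (all entries 1/3)."""
    p = 1.0 / 3.0                   # initial state probability
    for t, q in enumerate(states):
        if t > 0:
            p *= 1.0 / 3.0          # every transition has probability 1/3
        p *= b(q, obs[t])
    return p

# Because every transition is equally likely, the most likely state sequence
# picks, per symbol, the coin that emits it best:
best_path = [max(range(3), key=lambda i: b(i, x)) for x in X]
p_best = joint(best_path, X)

# Probability the whole sequence came from state q1 (index 0):
p_all_q1 = joint([0] * len(X), X)   # (1/3)^10 * 0.5^10
```

With these assumed values the best path uses coin 2 for every H and coin 3 for every T, and its joint probability exceeds the all-q1 hypothesis.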

Example of a non-ergodic model (left-right model)
Three emitting states q1, q2, q3, plus one starting state qs and one final state qf; qs and qf are non-emitting states. Assume there are 2 symbols to observe: X = {x1 = a, x2 = b}. The slide gives the initial state probabilities, the state transition probabilities (the values 0.2, 0.8, 0.6, 0.9, 0.7, 0.4, 0.1 label the graph edges) and the observation symbol probabilities P(a|qi), P(b|qi) for each state.

The most probable state sequence with this model is q2, q3, resulting in the symbol sequence “bb”. But this sequence can also be generated by other state sequences, such as q1, q2. Computation of the likelihood of an observation sequence: given X = “aaa”, compute the likelihood for this model, P(aaa | λ). The likelihood P(X | λ) is given by the sum over all possible ways to generate X.
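The sum over all generating paths can be computed by brute force. The two-state left-right model below is entirely hypothetical (the slide's actual transition and observation values did not survive extraction); it only demonstrates the computation P(X | λ) = ΣG P(X | G, λ) P(G | λ) for X = "aaa".

```python
from itertools import product

# Hypothetical two-state left-right model (illustration only).
pi = [0.7, 0.3]                     # initial state probabilities
A = [[0.6, 0.4],
     [0.0, 1.0]]                    # left-right: no transition back to q1
B = {"a": [0.8, 0.3],
     "b": [0.2, 0.7]}               # B[x][i] = P(x | q_i)

def likelihood_bruteforce(X):
    """Sum P(X, G | lambda) over all N^T state paths G."""
    total = 0.0
    for path in product(range(2), repeat=len(X)):
        p = pi[path[0]] * B[X[0]][path[0]]
        for t in range(1, len(X)):
            p *= A[path[t - 1]][path[t]] * B[X[t]][path[t]]
        total += p
    return total

p_aaa = likelihood_bruteforce("aaa")   # only 4 of the 8 paths are non-zero
```

The left-right structure zeroes out every path that revisits q1, so only the monotone paths contribute to the sum.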

Using HMMs for pattern recognition consists of finding, among a set of K models, the model λi that maximizes the likelihood that the observation was generated by that model: λmax = arg maxλi P(X | λi), for i = 1, …, K. Character recognition: with a small lexicon, there are as many HMM models as words; otherwise, letters are individually modelled by HMMs, which are concatenated to form word models, e.g. λP λA λR λI λS as the word model for « PARIS ».

The three basic problems for HMMs
Problem 1, Recognition: given X = (x1, x2, …, xT) and the various models λi, how do we efficiently compute P(X | λ)? Solved by the Forward-Backward algorithm. Problem 2, Analysis: given X = (x1, x2, …, xT) and a model λ, find the optimal state sequence G; how can we uncover the sequence of states corresponding to a given observation? Solved by the Viterbi algorithm. Problem 3, Learning: given X = (x1, x2, …, xT), estimate the model parameters λ = (π, A, B) that maximize P(X | λ); how do we adjust the model parameters? Solved by the Baum-Welch algorithm.

How to efficiently compute P(X|λ)?
Problem 1, Recognition: how to efficiently compute P(X | λ)? X = x1 x2 … xt … xT is the observation sequence. Several paths G allow us to obtain X, so P(X|λ) = ΣG P(X, G | λ) = ΣG P(X | G, λ) × P(G | λ), a sum of joint probabilities in which the first factor depends only on the observation probabilities (matrix B) and the second only on the state transition probabilities (matrix A). For P(X | G, λ): the path G is defined by a sequence of states q1 q2 … qt … qT, so P(X | G, λ) = P(x1 x2 … xT | q1 q2 … qT, λ) = P(x1 | q1…qT, λ) × P(x2 | x1, q1…qT, λ) × … × P(xT | xT-1…x1, q1…qT, λ) = P(x1 | q1, λ) × P(x2 | q2, λ) × … × P(xT | qT, λ), as xt depends only on qt.

Problem 1: Recognition (2) — How to efficiently compute P(X|λ)?
P(X|λ) = ΣG P(X, G | λ) = ΣG P(X | G, λ) × P(G | λ). For the joint probability factor P(G | λ): the path G is defined by a sequence of states q1 q2 … qt … qT, so P(G | λ) = P(q1 q2 … qT | λ) = P(q1|λ) × P(q2 | q1, λ) × … × P(qT | qT-1…q1, λ) = P(q1|λ) × P(q2 | q1, λ) × … × P(qT | qT-1, λ), as we assume a first-order HMM. Finally, by substituting both factors: P(X|λ) = Σq1…qT πq1 bq1(x1) aq1q2 bq2(x2) … aqT-1qT bqT(xT).

What about the computational complexity? Number of multiplications for one path G: (T−1) + 1 + 1 + (T−2) = 2T − 1. Number of paths G: N^T. Total number of multiplications: (2T−1) N^T. Total number of additions: N^T − 1. How long does it take? Assume N = 23, T = 15 (word-check application): the number of operations is about 2T × N^T ≈ 10^22. At 1 Gops, that is 10^22/10^9 = 10^13 seconds, i.e. about 10^13/(3600 × 24) ≈ 10^8 days, i.e. on the order of 10^5 years. FIND SOMETHING ELSE !!!
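The slide's order-of-magnitude arithmetic can be checked in a few lines, and contrasted with the roughly N²T cost of the forward algorithm introduced next:

```python
# Operation counts for direct evaluation of P(X | lambda) vs the trellis.
N, T = 23, 15                        # word-check application from the slide

mults_per_path = 2 * T - 1           # (T-1) + 1 + 1 + (T-2)
n_paths = N ** T
direct_ops = mults_per_path * n_paths + (n_paths - 1)   # ~ 2T * N^T ~ 1e22
forward_ops = N * N * T              # trellis: order N^2 * T ~ 8e3

seconds_at_1_gops = direct_ops / 1e9
years = seconds_at_1_gops / (3600 * 24 * 365)   # ~ 1e5 years
```

The ratio between the two counts, about 10^18, is the entire argument for the trellis-based forward algorithm.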

Forward-Backward algorithm
Use a trellis structure to carry out the computations: at each node of the trellis, store the forward variable αt(i) = P(x1 x2 … xt, qt = qi | λ), which is the probability of the partial observation sequence up to time t and of being in state qi at that same time. Algorithm in 3 steps: 1. Initialization: α1(i) = P(x1, q1 = qi | λ) = P(x1 | q1 = qi, λ) × P(q1 = qi | λ) = πi bi(x1). 2. Recursion: αt+1(j) = [Σi αt(i) aij] × bj(xt+1), with 1 ≤ j ≤ N and 1 ≤ t ≤ T−1. 3. Termination: P(X | λ) = Σi αT(i).

Forward-Backward algorithm (trellis view)
At each time step, every node qj of the trellis at time t+1 collects the contributions αt(i) aij from all states qi at time t, then multiplies by bj(xt+1); this holds for 1 ≤ j ≤ N and 1 ≤ t ≤ T−1. Total number of multiplications: about N(N+1)(T−1) + N; total number of additions: about N(N−1)(T−1). Assume N = 23, T = 15: the number of operations is of the order of N²T ≈ 8000, instead of ≈ 10^22.
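The three steps above fit in a short function. The small two-state model is hypothetical (assumed values, chosen for illustration); the point is that the trellis gives the same likelihood as exhaustive path enumeration at a cost of order N²T.

```python
# Hypothetical two-state left-right model (illustration only).
pi = [0.7, 0.3]
A = [[0.6, 0.4],
     [0.0, 1.0]]
B = {"a": [0.8, 0.3], "b": [0.2, 0.7]}   # B[x][i] = P(x | q_i)

def forward(X):
    N = len(pi)
    # 1. Initialization: alpha_1(i) = pi_i * b_i(x1)
    alpha = [pi[i] * B[X[0]][i] for i in range(N)]
    # 2. Recursion: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(x_{t+1})
    for x in X[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[x][j]
                 for j in range(N)]
    # 3. Termination: P(X | lambda) = sum_i alpha_T(i)
    return sum(alpha)

p_aaa = forward("aaa")
```

Each recursion step reuses the N values of αt instead of re-expanding every path, which is exactly where the exponential-to-polynomial saving comes from.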

Problem 2: Analysis — How can we uncover the sequence of states corresponding to a given observation X? Choose the most likely path: find the path (q1, q2, …, qT) that maximizes the probability P(q1, q2, …, qT | X, λ). Solution by dynamic programming: an inductive algorithm that keeps the best possible state sequence ending in state qi at time t. This is the Viterbi algorithm.

VITERBI Algorithm — Define δt(i) = max over q1, q2, …, qt−1 of P(q1, q2, …, qt = qi, x1, x2, …, xt | λ): δt(i) is the probability of the highest-probability path ending in state qi at time t. By induction: δt+1(k) = maxi [δt(i) aik] × bk(xt+1), with 1 ≤ k ≤ N. Also memorize the argument of the maximum, ψt+1(k) = arg maxi (δt(i) aik), so that the optimal state sequence can be traced back through the trellis from maxi [δT(i)].

VITERBI Algorithm. 1. Initialization: for 1 ≤ i ≤ N, δ1(i) = πi × bi(x1) (or, in the log domain, −ln(πi) − ln(bi(x1))); ψ1(i) = 0. 2. Recursive computation: for 2 ≤ t ≤ T and 1 ≤ j ≤ N, δt(j) = max1≤i≤N [δt−1(i) aij] × bj(xt) (or δt(j) = mini [δt−1(i) − ln(aij)] − ln(bj(xt))); ψt(j) = arg maxi (δt−1(i) aij) (or arg mini [δt−1(i) − ln(aij)]). 3. Termination: P* = maxi [δT(i)] (or mini); q*T = arg maxi [δT(i)] (or arg mini). 4. Backtracking: for t = T−1 down to 1, q*t = ψt+1(q*t+1). Hence P* (or exp(−P*)) gives the required state-optimized probability, and G* = (q1*, q2*, …, qT*) is the optimal state sequence.
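A minimal sketch of the log-domain variant described above ("−ln" costs, minimization instead of maximization), again on a small hypothetical two-state model with assumed values:

```python
import math

# Hypothetical two-state left-right model (illustration only).
pi = [0.7, 0.3]
A = [[0.6, 0.4],
     [0.0, 1.0]]
B = {"a": [0.8, 0.3], "b": [0.2, 0.7]}   # B[x][i] = P(x | q_i)

def nl(p):
    return -math.log(p) if p > 0 else math.inf   # -ln(p), inf for p = 0

def viterbi(X):
    N = len(pi)
    # 1. Initialization: delta_1(i) = -ln(pi_i) - ln(b_i(x1)), psi_1(i) = 0
    delta = [nl(pi[i]) + nl(B[X[0]][i]) for i in range(N)]
    psi = [[0] * N]
    # 2. Recursion: minimize delta_{t-1}(i) - ln(a_ij), then add -ln(b_j(x_t))
    for x in X[1:]:
        new, back = [], []
        for j in range(N):
            i_best = min(range(N), key=lambda i: delta[i] + nl(A[i][j]))
            new.append(delta[i_best] + nl(A[i_best][j]) + nl(B[x][j]))
            back.append(i_best)
        delta, psi = new, psi + [back]
    # 3. Termination and 4. backtracking via the psi pointers
    q = min(range(N), key=lambda i: delta[i])
    path = [q]
    for t in range(len(X) - 1, 0, -1):
        q = psi[t][q]
        path.append(q)
    path.reverse()
    return path, math.exp(-min(delta))   # optimal path, exp(-P*)

path, p_star = viterbi("aab")
```

Working with −ln probabilities turns the products into sums and avoids the numerical underflow that plagues long sequences in the linear domain.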

Problem 3: Learning — How do we adjust the model parameters λ = (π, A, B)? Baum-Welch algorithm: 1. Let the initial model be λ0. 2. Compute a new model λ based on λ0 and on the observation X. 3. If log P(X|λ) − log P(X|λ0) < Δ, stop. 4. Else set λ0 ← λ and go to step 2.
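A compact sketch of one Baum-Welch re-estimation step for a discrete HMM (no scaling, so only suitable for short sequences; all model values below are hypothetical). The driver loop at the bottom mirrors steps 1-4 of the slide, and the EM guarantee is that P(X | λ) never decreases across iterations.

```python
def forward(pi, A, B, X):
    """alpha_t(i) = P(x1..xt, qt = qi | lambda), for every t."""
    N = len(pi)
    alphas = [[pi[i] * B[i][X[0]] for i in range(N)]]
    for x in X[1:]:
        prev = alphas[-1]
        alphas.append([sum(prev[i] * A[i][j] for i in range(N)) * B[j][x]
                       for j in range(N)])
    return alphas

def backward(A, B, X, N):
    """beta_t(i) = P(x_{t+1}..xT | qt = qi, lambda), for every t."""
    betas = [[1.0] * N]
    for x in reversed(X[1:]):
        nxt = betas[0]
        betas.insert(0, [sum(A[i][j] * B[j][x] * nxt[j] for j in range(N))
                         for i in range(N)])
    return betas

def baum_welch_step(pi, A, B, X):
    """One re-estimation: returns (new_pi, new_A, new_B, P(X | old lambda))."""
    N, T = len(pi), len(X)
    al, be = forward(pi, A, B, X), backward(A, B, X, N)
    pX = sum(al[-1])
    # gamma_t(i): probability of being in state i at time t, given X
    gamma = [[al[t][i] * be[t][i] / pX for i in range(N)] for t in range(T)]
    new_pi = gamma[0][:]
    # new a_ij = expected transitions i->j / expected visits to i (t < T)
    new_A = [[sum(al[t][i] * A[i][j] * B[j][X[t + 1]] * be[t + 1][j]
                  for t in range(T - 1)) / pX
              / sum(gamma[t][i] for t in range(T - 1))
              for j in range(N)] for i in range(N)]
    # new b_i(k) = expected emissions of k from i / expected visits to i
    M = len(B[0])
    new_B = [[sum(gamma[t][i] for t in range(T) if X[t] == k)
              / sum(gamma[t][i] for t in range(T))
              for k in range(M)] for i in range(N)]
    return new_pi, new_A, new_B, pX

# Hypothetical starting model and a short observation sequence (symbols 0/1):
pi = [0.5, 0.5]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
X = [0, 0, 1, 1, 0]

likelihoods = []
for _ in range(5):
    pi, A, B, pX = baum_welch_step(pi, A, B, X)
    likelihoods.append(pX)
```

In practice the forward and backward variables are scaled (or kept in the log domain) so the products do not underflow, and the stopping test compares log-likelihoods against the threshold Δ as in step 3.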

Joint probability: P(a, b) = P(a | b) × P(b) = P(b | a) × P(a).

PART TWO: HANDWRITING RECOGNITION

OFF-LINE VERSUS ON-LINE

2D RECOGNITION — From the 2D image of a word to a 1D sequence: graph modelling and off-line ordering (OrdRec) convert the image into graphemes, pixel columns or letters, i.e. a 2D → 1D conversion whose output feeds an HMM.

A SUITABLE MODEL FOR GLOBAL OPTIMISATION
1. Definition of a node. 2. Original graph. The ordering task is cast as a travelling-salesman problem: the search for a Hamiltonian cycle over the nodes, with inter-links and intra-links in the model (graph diagrams on the slide).

A SUITABLE MODEL FOR GLOBAL OPTIMISATION
1. Definition of a node. 2. Original graph. 3. Complete graph. 4. Final graph. 5. Valuation of the graph. 6. Hamiltonian path. Link types: inter-links, intra-links, completeness links, start and end links (graph diagrams on the slide).

Overview of the RECOGNITION system
Feature extraction produces feature vectors; symbol mapping turns them into observations, a 1D sequence of symbols; the HMM (state transition probabilities, observation symbol probabilities, initial state probabilities) then scores each word, e.g. rugby: −2.15, …, giving the result.

RECOGNITION SYSTEM
Word image or on-line file → segmentation → set of segments → graph → on-line or off-line ordering → sequence of oriented segments and pen-lifts/pen-downs → normalisation (reference lines) → feature extraction → vector quantisation → symbol sequence → HMMs (e.g. “un”, “deux”, …) → likelihoods for each word of the dictionary.

EXAMPLES OF CLUSTERS — Cluster prototypes shown relative to the core line; number of clusters: 300.

IRONOFF RESULTS — A dual on-line and off-line database: isolated characters and cursive words, about 700 different writers (64% men and 34% women), average writer age 26½ years. Dictionary: 197 words. Training: … words. Test: … words.

IRONOFF construction — Step 1: on-line acquisition (on-line data). Step 2: off-line acquisition (grey-level image). Step 3: matching process.

COMPARISON OF OffLineSeg AND OnLineSeg

COMPARISON OF OnLineSeg AND OnLinePt

EXAMPLES — Errors due to preprocessing: word normalisation. Annotation errors: word unknown in a given letter case. Errors due to the limitations of the model.

CONCATENATING LETTER MODELS TO DEFINE WORD MODELS
Initial state, non-emitting states, final state: the “LEGO” approach.

WORD MODEL: THE “LEGO” APPROACH
Example for the word “dégât”: the word model chains the letter models d, e, g, a, t between states qD and qF, with diacritic models (“dia”) and the transitions div_0, div_+, div_F, divDia_0, divDia_F handling the accents of é and â (diagram on the slide).

THE “ORDREC” APPROACH
Segment graph and stroke graph: pen-downs and stroke order; pen-downs and pen-ups; pen-ups (diagrams on the slide).