
1 The estimation of stochastic context-free grammars using the Inside-Outside algorithm
October 16, 1998
Oh-Woog Kwon, KLE Lab., CSE, POSTECH

2 Contents
- Introduction
- The Inside-Outside algorithm
- Regular versus context-free grammar
- Pre-training
- The use of grammar minimization
- Implementation
- Conclusions

3 Introduction - 1
- From HMMs to SCFGs in speech recognition tasks
- The advantages of SCFGs:
  - Ability to capture embedded structure within speech data, useful at lower levels such as the phonological rule system
  - Learning: a simple extension of the Baum-Welch re-estimation procedure (the Inside-Outside algorithm)
- Little previous work on SCFGs in speech
- Two factors behind the limited interest in speech:
  - The increased power of CFGs is not needed for natural language: if the set of sentences is finite, a CFG is equivalent to a regular grammar (RG)
  - The time complexity of the Inside-Outside algorithm: O(n^3) in both the input string length and the number of grammar symbols

4 Introduction - 2
- Usefulness of CFGs in natural language:
  - The ability to model derivation probabilities matters more than the ability to determine language membership
  - So this paper introduces the Inside-Outside algorithm and compares a CFG with an RG using the entropy of the language generated by each grammar
- Reducing the time complexity of the Inside-Outside algorithm, this paper:
  - describes a novel pre-training algorithm (fewer re-estimation iterations)
  - minimizes the number of non-terminals with grammar minimization (GM): fewer symbols
  - implements the Inside-Outside algorithm on a parallel transputer array: smaller input per processor

5 The Inside-Outside algorithm - 1
- Chomsky Normal Form (CNF) in an SCFG: every rule has the form i -> j k (probability a[i,j,k]) or i -> m (probability b[i,m])
  - Generated observation sequence: O = O_1, O_2, ..., O_T
  - The matrices of parameters: A = {a[i,j,k]}, B = {b[i,m]}
- Applications of SCFGs:
  - recognition: compute P(O | G)
  - training: re-estimate A and B so as to maximize P(O | G)
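As a concrete illustration of this parameterization, the following sketch (my own, not from the paper) builds random A and B matrices for a small CNF SCFG; the sizes N and M and the use of NumPy are assumptions. The normalization enforces the standard CNF constraint that each non-terminal's rule probabilities sum to one.

```python
import numpy as np

N, M = 3, 2                      # number of non-terminals and terminal symbols (assumed)
rng = np.random.default_rng(0)

# a[i,j,k] = P(i -> j k), b[i,m] = P(i -> m). For every non-terminal i the rule
# probabilities must sum to one: sum_{j,k} a[i,j,k] + sum_m b[i,m] = 1.
A = rng.random((N, N, N))
B = rng.random((N, M))
norm = A.reshape(N, -1).sum(axis=1) + B.sum(axis=1)
A /= norm[:, None, None]
B /= norm[:, None]
```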

6 The Inside-Outside algorithm - 2
- Definition of the inner (e) and outer (f) probabilities
  - Inner probability e(s,t,i): the probability that non-terminal i derives the subsequence O_s ... O_t
  - Outer probability f(s,t,i): the probability that the start symbol S derives O_1 ... O_{s-1} i O_{t+1} ... O_T
  [Figure: a parse tree for O_1 ... O_T with non-terminal i dominating the span O_s ... O_t; the inner probability covers the subtree below i, the outer probability covers the rest of the tree]

7 The Inside-Outside algorithm - 3
- Inner probability: computed bottom-up
  - Case 1 (s = t): rules of the form i -> m, so e(s,s,i) = b[i, O_s]
  - Case 2 (s < t): rules of the form i -> j k, so e(s,t,i) = Σ_{j,k} Σ_{r=s}^{t-1} a[i,j,k] e(s,r,j) e(r+1,t,k)
  [Figure: i rewrites to j k, with j deriving O_s ... O_r and k deriving O_{r+1} ... O_t]
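The two cases above translate directly into a CYK-style dynamic program. The sketch below is illustrative code, not the paper's implementation; it computes the full inner-probability table for one observation string, assuming the A and B arrays from the earlier sketch and 0-based, inclusive span indices.

```python
import numpy as np

def inner_probabilities(O, A, B):
    """e[s, t, i] = P(non-terminal i derives O_s ... O_t), 0-based inclusive spans."""
    T, N = len(O), B.shape[0]
    e = np.zeros((T, T, N))
    for s in range(T):                       # Case 1 (s == t): i -> O_s
        e[s, s, :] = B[:, O[s]]
    for span in range(1, T):                 # Case 2 (s < t): i -> j k
        for s in range(T - span):
            t = s + span
            for r in range(s, t):            # split point: j covers s..r, k covers r+1..t
                e[s, t, :] += np.einsum('ijk,j,k->i', A, e[s, r, :], e[r + 1, t, :])
    return e
```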

8 The Inside-Outside algorithm - 4
- Outer probability: computed top-down
  - Initialization: f(1,T,i) = 1 if i is the start symbol S, 0 otherwise
  - Recursion: f(s,t,i) = Σ_{j,k} [ Σ_{r=t+1}^{T} a[j,i,k] f(s,r,j) e(t+1,r,k) + Σ_{r=1}^{s-1} a[j,k,i] f(r,t,j) e(r,s-1,k) ]
  [Figure: the two cases where i is the left child of its parent j (rule j -> i k, sibling k derives O_{t+1} ... O_r) or the right child (rule j -> k i, sibling k derives O_r ... O_{s-1})]
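A companion sketch of the top-down pass, under the same assumptions; it consumes the inner table e from the previous sketch and takes the start symbol to be index 0.

```python
import numpy as np

def outer_probabilities(O, A, e, start=0):
    """f[s, t, i] = P(S derives O_1..O_{s-1}, i, O_{t+1}..O_T); start is the start symbol."""
    T, N = len(O), A.shape[0]
    f = np.zeros((T, T, N))
    f[0, T - 1, start] = 1.0                          # S must span the whole string
    for length in range(T - 1, 0, -1):                # from longest proper spans downwards
        for s in range(T - length + 1):
            t = s + length - 1
            for r in range(t + 1, T):                 # parent rule j -> i k, k covers t+1..r
                f[s, t, :] += np.einsum('jik,j,k->i', A, f[s, r, :], e[t + 1, r, :])
            for r in range(s):                        # parent rule j -> k i, k covers r..s-1
                f[s, t, :] += np.einsum('jki,j,k->i', A, f[r, t, :], e[r, s - 1, :])
    return f
```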

9 The Inside-Outside algorithm - 5
- Recognition process
  - By setting s = 1, t = T: P(O | G) = e(1,T,S), the inner probability of the start symbol over the whole observation sequence
  - By setting s = t: P(O | G) = Σ_i f(t,t,i) b[i, O_t], for any position t
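Tying the two earlier sketches together, both identities can be checked numerically; the observation string and the chosen position t below are arbitrary examples.

```python
O = [0, 1, 1, 0]                             # example observation string over M = 2 terminals
e = inner_probabilities(O, A, B)
f = outer_probabilities(O, A, e, start=0)
P1 = e[0, len(O) - 1, 0]                     # P(O|G) from e(1, T, S)
P2 = float(f[2, 2, :] @ B[:, O[2]])          # P(O|G) from sum_i f(t,t,i) b[i,O_t], here t = 3
print(P1, P2)                                # the two values should agree
```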

10 The Inside-Outside algorithm - 6
- Training process
  - From the inner and outer probabilities, compute for every span (s,t) the probability that non-terminal i is used to derive O_s ... O_t, namely e(s,t,i) f(s,t,i) / P(O | G)
  - Summing these quantities over spans gives the expected rule counts used in the re-estimation formulas on the next slide

11 The Inside-Outside algorithm - 7
- Re-estimation formulas for a[i,j,k] and b[i,m]:
  - a[i,j,k] = [ Σ_{s<t} Σ_{r=s}^{t-1} a[i,j,k] e(s,r,j) e(r+1,t,k) f(s,t,i) / P ] / [ Σ_{s<=t} e(s,t,i) f(s,t,i) / P ]
  - b[i,m] = [ Σ_{t: O_t = m} e(t,t,i) f(t,t,i) / P ] / [ Σ_{s<=t} e(s,t,i) f(s,t,i) / P ]
  where P = P(O | G); the shared denominator is the expected number of times non-terminal i is used in deriving O
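The numerators and the shared denominator can be accumulated from e, f and the current parameters. The sketch below is my reading of the formulas, not the paper's code; it returns the unnormalized counts for a single string, and dividing the numerators by the denominator, row by row, yields the re-estimated a[i,j,k] and b[i,m].

```python
import numpy as np

def expected_counts(O, A, B, e, f):
    """Unnormalized numerators and shared denominator of the re-estimation formulas."""
    T, N = len(O), B.shape[0]
    P = e[0, T - 1, 0]                                   # P(O|G), start symbol index 0
    num_A, num_B = np.zeros_like(A), np.zeros_like(B)
    denom = np.zeros(N)
    for s in range(T):
        for t in range(s, T):
            denom += e[s, t, :] * f[s, t, :] / P         # expected uses of each non-terminal
            if s == t:                                   # terminal rule i -> O_s
                num_B[:, O[s]] += e[s, s, :] * f[s, s, :] / P
            else:                                        # binary rules i -> j k over span s..t
                for r in range(s, t):
                    num_A += A * np.einsum('j,k,i->ijk',
                                           e[s, r, :], e[r + 1, t, :], f[s, t, :]) / P
    return num_A, num_B, denom
```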

12 The Inside-Outside algorithm - 8
- The Inside-Outside algorithm:
  1. Choose suitable initial values for the A and B matrices
  2. REPEAT
       A = re-estimated a[i,j,k] {Equation 20}
       B = re-estimated b[i,m] {Equation 21}
       P = P(O | G) {Equation 11}
     UNTIL the change in P is less than a set threshold
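Put together with the earlier sketches, the loop looks roughly as follows; the convergence threshold and iteration cap are arbitrary choices, and the equation labels in the comments refer to the slide above.

```python
import numpy as np

def inside_outside(O, A, B, threshold=1e-6, max_iters=100):
    prev_P = None
    for _ in range(max_iters):
        e = inner_probabilities(O, A, B)
        f = outer_probabilities(O, A, e, start=0)
        P = e[0, len(O) - 1, 0]                          # {Equation 11}: P(O|G)
        if prev_P is not None and abs(P - prev_P) < threshold:
            break                                        # change in P below the threshold
        nA, nB, d = expected_counts(O, A, B, e, f)
        d = np.maximum(d, 1e-300)                        # guard against unused non-terminals
        A = nA / d[:, None, None]                        # {Equation 20}: new a[i,j,k]
        B = nB / d[:, None]                              # {Equation 21}: new b[i,m]
        prev_P = P
    return A, B, P
```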

13 Regular versus context-free grammar
- Measurements for the comparison:
  - The entropy of an ε-representation of L
  - Empirical entropy
- Language for the comparison: palindromes
- The number of parameters for each grammar:
  - SCFG: N non-terminals, M terminals => N^3 + NM parameters
  - HMM (RG): K states, M terminals => K^2 + (M+2)K parameters
- Condition for a fair comparison: N^3 + NM ≈ K^2 + (M+2)K
- Result (the ability to model derivation probabilities): SCFG > RG
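For reference, a small helper (mine, not the paper's) that applies the parameter-matching condition: given N and M it searches for the HMM state count K whose parameter count is closest to the SCFG's.

```python
def matched_states(N, M):
    """K for which K^2 + (M+2)K is closest to the SCFG's N^3 + N*M parameters."""
    scfg_params = N ** 3 + N * M
    return min(range(1, 10 * N + 10), key=lambda K: abs(K * K + (M + 2) * K - scfg_params))

print(matched_states(5, 3))    # an SCFG with N = 5, M = 3 has 5^3 + 5*3 = 140 parameters
```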

14 Pre-training - 1
- Goal: start off with good initial estimates
  - reduces the number of re-estimation cycles required (by about 40%)
  - facilitates the generation of a good final model
- Pre-training:
  1. Use the Baum-Welch algorithm (O(n^2)) to obtain a set of RG rules
  2. Convert the RG rules (final matrices) into SCFG rules (initial matrices)
  3. Start the Inside-Outside algorithm (O(n^3)) from those initial matrices
- Time complexity: a*n^2 + b*n^3 << c*n^3 if b << c (few of the expensive cycles are needed)

15 Pre-training - 2
- Modification (RG => SCFG):
  (a) For each b_jk, define Y_j -> k with probability b_jk.
  (b) For each a_ij, define X_i -> Y_a X_j with probability a_ij.
  (c) For each S_i, define S -> X_i with probability S_i; if X_i -> Y_a X_l with probability a_il, then S -> Y_a X_l with probability S_i * a_il.
  (d) For each F_j, define X_j -> Y_a with probability F_j; if Y_a -> k with probability b_ak, then X_j -> k with probability b_ak * F_j.
- The remaining zero parameters would keep the grammar regular, so:
  - add a floor value to all parameters (floor value = 1 / number of non-zero parameters)
  - re-normalize so that each non-terminal's rule probabilities sum to one
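The flooring and renormalization step can be sketched as follows (my interpretation of the slide; the conversion steps (a)-(d) themselves are omitted since their exact indexing is not fully specified here).

```python
import numpy as np

def floor_and_renormalize(A, B):
    """Add the floor value to every parameter, then renormalize each non-terminal's rules."""
    N = B.shape[0]
    floor = 1.0 / (np.count_nonzero(A) + np.count_nonzero(B))   # 1 / number of non-zero parameters
    A, B = A + floor, B + floor
    norm = A.reshape(N, -1).sum(axis=1) + B.sum(axis=1)          # per non-terminal total
    return A / norm[:, None, None], B / norm[:, None]
```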

16 The use of grammar minimization - 1
- Goal: detect and eliminate redundant and/or useless symbols
  - Good grammar: self-embedding (a CFG is self-embedding if there is a non-terminal A such that A =>* wAx and neither w nor x is the empty string); this requires more non-terminal symbols
  - Smaller n: speeds up the Inside-Outside algorithm
- Constraining the Inside-Outside algorithm:
  - Greedy symbols: take up too many non-terminals
  - Constraints: allocate a non-terminal to each terminal symbol and force the remaining non-terminals to model the hidden branching process
  - Infeasible for practical applications (e.g. speech) because of inherent ambiguity

17 The use of grammar minimization - 2
- Two ways to incorporate GM into the Inside-Outside algorithm:
  - First approach (computationally intractable):
    - Inside-Outside algorithm: start with a fixed maximum number of symbols
    - GM: periodically detect and eliminate redundant and useless symbols
  - Second approach (more practical):
    - Inside-Outside algorithm: start with the desired number of non-terminals
    - GM: periodically, or when log P(S) falls below a threshold, detect and reallocate redundant symbols

18 The use of grammar minimization - 3
- GM algorithm (ad hoc; steps 3 and 4 are sketched in code below):
  1. Detect greedy symbols in a bottom-up fashion:
     1.1 redundant non-terminals are replaced by a single non-terminal
     1.2 the redundant non-terminals are freed (free non-terminals)
     1.3 identical rules are collapsed into a single rule by adding their probabilities
  2. Fix the parameters of the remaining non-terminals involved in the generation of greedy symbols (these are excluded from steps 3 and 4)
  3. For each free non-terminal i:
     3.1 b[i,m] = 0 if m is a greedy symbol; randomize b[i,m] otherwise
     3.2 a[i,j,k] = 0 if j and k are both non-terminals of step 2; randomize a[i,j,k] otherwise
  4. Randomize a[i,j,k] where i is a non-terminal of step 2 and j, k are free non-terminals
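A rough sketch of steps 3 and 4 (my reading of the slide, not the paper's code): the greedy-symbol detection of step 1 and the bookkeeping of step 2 are assumed to have produced the index lists passed in, and any renormalization of the affected rows would follow as in the pre-training floor step.

```python
import numpy as np

def reallocate(A, B, free_nt, fixed_nt, greedy_terms, rng=np.random.default_rng()):
    """Steps 3 and 4: zero/randomize parameters of the freed non-terminals, plus new rules
    from the fixed non-terminals into them; index lists come from steps 1 and 2."""
    A, B = A.copy(), B.copy()
    for i in free_nt:                                            # step 3
        B[i, :] = rng.random(B.shape[1])                         # 3.1: randomize b[i,m] ...
        B[i, list(greedy_terms)] = 0.0                           # ... but zero greedy symbols
        A[i] = rng.random(A.shape[1:])                           # 3.2: randomize a[i,j,k] ...
        A[i][np.ix_(fixed_nt, fixed_nt)] = 0.0                   # ... but zero fixed-fixed pairs
    for i in fixed_nt:                                           # step 4: rules into free pairs
        A[i][np.ix_(free_nt, free_nt)] = rng.random((len(free_nt), len(free_nt)))
    return A, B
```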

19 Implementation using a transputer array
- Goal: speed up the Inside-Outside algorithm (about 100 times faster)
  - Split the training data into several subsets
  - The Inside-Outside algorithm works independently on each subset
- Implementation:
  [Diagram: a SUN host connected to a control board heading a chain of 64 transputers. The control board computes the updated parameter set and transmits it down the chain to all the others; each transputer works independently on its own data set.]
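The same data-parallel scheme can be illustrated with ordinary Python multiprocessing in place of transputers (purely an analogue, not the paper's implementation): each worker accumulates expected counts over its own subset using the single-string sketches from the earlier slides, and the combining step plays the role of the control board.

```python
from multiprocessing import Pool
import numpy as np

def subset_counts(args):
    strings, A, B = args                                # one worker = one subset of strings
    num_A, num_B = np.zeros_like(A), np.zeros_like(B)
    denom = np.zeros(B.shape[0])
    for O in strings:                                   # reuse the single-string sketches
        e = inner_probabilities(O, A, B)
        f = outer_probabilities(O, A, e)
        nA, nB, d = expected_counts(O, A, B, e, f)
        num_A += nA; num_B += nB; denom += d
    return num_A, num_B, denom

def parallel_reestimate(subsets, A, B, workers=4):
    with Pool(workers) as pool:                         # stands in for the transputer chain
        parts = pool.map(subset_counts, [(s, A, B) for s in subsets])
    num_A = sum(p[0] for p in parts)                    # the "control board": combine counts
    num_B = sum(p[1] for p in parts)
    denom = np.maximum(sum(p[2] for p in parts), 1e-300)
    return num_A / denom[:, None, None], num_B / denom[:, None]
```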

20 Conclusions
- Usefulness of CFGs in natural language: this paper
  - introduced the Inside-Outside algorithm for speech recognition
  - compared a CFG with an RG, using the entropy of the language generated by each grammar, on a "toy" problem
- Reducing the time complexity of the Inside-Outside algorithm: this paper
  - described a novel pre-training algorithm (fewer iterations)
  - proposed an ad hoc grammar minimization (GM): fewer symbols
  - implemented the Inside-Outside algorithm on a parallel transputer array: smaller input per processor
- Further research: build SCFG models trained from real speech data

