The estimation of stochastic context-free grammars using the Inside-Outside algorithm
1998. 10. 16.
Oh-Woog Kwon, KLE Lab., CSE, POSTECH

Contents
- Introduction
- The Inside-Outside algorithm
- Regular versus context-free grammar
- Pre-training
- The use of grammar minimization
- Implementation
- Conclusions

Introduction - 1
- HMM => SCFG in speech recognition tasks
- The advantages of SCFGs
  - Ability to capture embedded structure within speech data, useful at lower levels such as the phonological rule system
  - Learning: a simple extension of the Baum-Welch re-estimation procedure (the Inside-Outside algorithm)
- Little previous work on SCFGs in speech
- Two factors behind the limited interest in speech
  - The increased power of CFGs is of little use for natural language: if the set of sentences is finite, a CFG is equivalent to an RG
  - The time complexity of the Inside-Outside algorithm is O(n^3), where n reflects both the input string length and the number of grammar symbols

Introduction - 2
- Usefulness of CFGs in NL
  - The ability to model derivation probabilities matters more than the ability to determine language membership
  - So this paper introduces the Inside-Outside algorithm and compares a CFG with an RG using the entropy of the language generated by each grammar
- Reduction of the time complexity of the Inside-Outside algorithm; this paper
  - describes a novel pre-training algorithm (fewer iterations)
  - minimizes the number of non-terminals with grammar minimization (GM): fewer symbols
  - implements the Inside-Outside algorithm on a parallel transputer array: less input data per processor

The Inside-Outside algorithm - 1
- Chomsky Normal Form (CNF) in SCFG: every rule has the form i -> j k or i -> m
  - Generated observation sequence: O = O_1, O_2, ..., O_T
  - The matrices of parameters: A with entries a[i,j,k] = P(i -> j k), and B with entries b[i,m] = P(i -> m)
- Applications of SCFGs
  - recognition: compute P(O | G)
  - training: adjust A and B so as to maximize P(O | G)
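To make the parameter matrices concrete, here is a minimal Python/NumPy sketch (the 0-based indexing, the choice of index 0 as the start symbol, and the toy sizes are my own conventions, not from the slides):

```python
import numpy as np

N, M = 3, 2                      # toy sizes: 3 non-terminals (0 = start symbol S), 2 terminals

# A[i, j, k] = a[i,j,k] = P(i -> j k);  B[i, m] = b[i,m] = P(i -> m).
# In CNF these are the only two rule shapes, so A and B hold the whole grammar.
A = np.random.rand(N, N, N)
B = np.random.rand(N, M)

# Each non-terminal's outgoing rule probabilities must sum to 1:
# sum_{j,k} a[i,j,k] + sum_m b[i,m] = 1 for every i.
total = A.reshape(N, -1).sum(axis=1) + B.sum(axis=1)
A /= total[:, None, None]
B /= total[:, None]
```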

The Inside-Outside algorithm - 2
- Definition of inner (e) and outer (f) probabilities (the original figure shows non-terminal i spanning positions s..t underneath the start symbol S)
  - Inner probability: e(i,s,t) = P(i derives O_s ... O_t)
  - Outer probability: f(i,s,t) = P(S derives O_1 ... O_{s-1}  i  O_{t+1} ... O_T)

The Inside-Outside algorithm - 3
- Inner probability: computed bottom-up
  - Case 1 (s = t): rules of the form i -> m
    e(i,s,s) = b[i, O_s]
  - Case 2 (s < t): rules of the form i -> j k, with j covering O_s..O_r and k covering O_{r+1}..O_t
    e(i,s,t) = sum over j, k and r = s..t-1 of a[i,j,k] * e(j,s,r) * e(k,r+1,t)
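The two cases translate directly into code. A sketch, reusing the A and B layout from the snippet above (0-based positions, spans inclusive):

```python
def inner_probabilities(A, B, O):
    """e[i, s, t] = P(non-terminal i derives O[s..t]); filled bottom-up by span length."""
    N = B.shape[0]
    T = len(O)
    e = np.zeros((N, T, T))
    for s in range(T):                       # Case 1 (s == t): rules i -> m
        e[:, s, s] = B[:, O[s]]
    for span in range(1, T):                 # Case 2 (s < t): rules i -> j k
        for s in range(T - span):
            t = s + span
            for r in range(s, t):            # split point: j covers O[s..r], k covers O[r+1..t]
                e[:, s, t] += np.einsum('ijk,j,k->i', A, e[:, s, r], e[:, r + 1, t])
    return e
```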

The Inside-Outside algorithm - 4
- Outer probability: computed top-down, starting from f(i,1,T) = 1 if i = S and 0 otherwise
  - Either i is the right child of a rule j -> k i whose left sibling k covers O_r..O_{s-1}, or i is the left child of a rule j -> i k whose right sibling k covers O_{t+1}..O_r:
    f(i,s,t) = sum over j, k of [ sum over r = 1..s-1 of a[j,k,i] * f(j,r,t) * e(k,r,s-1) + sum over r = t+1..T of a[j,i,k] * f(j,s,r) * e(k,t+1,r) ]
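The same recursion top-down, as a sketch under the same conventions (treating the start-symbol index as an argument is my assumption):

```python
def outer_probabilities(A, O, e, start=0):
    """f[i, s, t] = P(S derives O[0..s-1], then i, then O[t+1..]); filled with shrinking spans."""
    N = A.shape[0]
    T = len(O)
    f = np.zeros((N, T, T))
    f[start, 0, T - 1] = 1.0                 # base case: only S may span the whole string
    for span in range(T - 2, -1, -1):
        for s in range(T - span):
            t = s + span
            for r in range(s):               # i is the right child of j -> k i; k covers O[r..s-1]
                f[:, s, t] += np.einsum('jki,j,k->i', A, f[:, r, t], e[:, r, s - 1])
            for r in range(t + 1, T):        # i is the left child of j -> i k; k covers O[t+1..r]
                f[:, s, t] += np.einsum('jik,j,k->i', A, f[:, s, r], e[:, t + 1, r])
    return f
```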

The Inside-Outside algorithm - 5
- Recognition process
  - By setting s=1, t=T: P(O | G) = e(S, 1, T)
  - By setting s=t: P(O | G) = sum over i of f(i,t,t) * e(i,t,t) = sum over i of f(i,t,t) * b[i, O_t], for any t
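Both identities can be checked numerically with the sketches above (the toy observation sequence is made up for illustration):

```python
O = [0, 1, 1, 0]                             # a toy observation: indices into the M terminals
e = inner_probabilities(A, B, O)
f = outer_probabilities(A, O, e, start=0)

P1 = e[0, 0, len(O) - 1]                     # s=1, t=T: the start symbol derives all of O
P2 = (f[:, 0, 0] * B[:, O[0]]).sum()         # s=t: sum_i f(i,t,t) * b[i, O_t], here at t = 1
assert np.isclose(P1, P2)
```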

The Inside-Outside algorithm - 6
- Training process: use the inner and outer probabilities to compute the expected number of times each rule is used in deriving O, then re-estimate the rule probabilities from those expectations (an EM procedure, analogous to Baum-Welch for HMMs)

The Inside-Outside algorithm - 7
- Re-estimation formulas for a[i,j,k] and b[i,m]: ratios of expected rule uses to expected uses of non-terminal i (for a single training sentence the common 1/P(O|G) factor cancels)
  new a[i,j,k] = [ sum over s < t and r = s..t-1 of f(i,s,t) * a[i,j,k] * e(j,s,r) * e(k,r+1,t) ] / [ sum over s <= t of f(i,s,t) * e(i,s,t) ]
  new b[i,m] = [ sum over t with O_t = m of f(i,t,t) * e(i,t,t) ] / [ sum over s <= t of f(i,s,t) * e(i,s,t) ]
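A sketch of these formulas for one training sentence, reusing e and f from the snippets above (with several sentences, the per-sentence counts would each be divided by that sentence's P(O|G) before being pooled):

```python
def expected_counts(A, B, O, e, f):
    """Numerators and denominator of the re-estimation formulas for one sentence."""
    N, M = B.shape
    T = len(O)
    num_A = np.zeros_like(A)                 # expected uses of each rule i -> j k
    num_B = np.zeros_like(B)                 # expected uses of each rule i -> m
    denom = np.zeros(N)                      # expected uses of each non-terminal i
    for s in range(T):
        for t in range(s, T):
            denom += f[:, s, t] * e[:, s, t]
            if s == t:
                num_B[:, O[s]] += f[:, s, s] * e[:, s, s]
            else:
                for r in range(s, t):
                    num_A += A * np.einsum('i,j,k->ijk',
                                           f[:, s, t], e[:, s, r], e[:, r + 1, t])
    return num_A, num_B, denom

def reestimate(A, B, O, e, f):
    """One re-estimation step: new a[i,j,k] and b[i,m] as ratios of expected counts."""
    num_A, num_B, denom = expected_counts(A, B, O, e, f)
    return num_A / denom[:, None, None], num_B / denom[:, None]
```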

The Inside-Outside algorithm - 8
- The Inside-Outside algorithm (a sketch of the loop follows below)
  1. Choose suitable initial values for the A and B matrices
  2. REPEAT
       A = ... {Equation 20}
       B = ... {Equation 21}
       P = ... {Equation 11}
     UNTIL the change in P is less than a set threshold
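Tying the pieces together as a single-sentence sketch ({Equation 11/20/21} refer to the underlying paper and correspond to the recognition and re-estimation steps shown earlier; the threshold and iteration cap are my own placeholders):

```python
def inside_outside(A, B, O, threshold=1e-6, max_iters=100):
    """Repeat re-estimation until the sentence probability P stops improving."""
    prev_P = 0.0
    for _ in range(max_iters):
        e = inner_probabilities(A, B, O)
        f = outer_probabilities(A, O, e)
        P = e[0, 0, len(O) - 1]              # P(O|G), as on the recognition slide
        if abs(P - prev_P) < threshold:
            break
        A, B = reestimate(A, B, O, e, f)     # updated A and B matrices
        prev_P = P
    return A, B
```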

Regular versus context-free grammar
- Measurements for the comparison
  - The entropy, via an epsilon-representation of L
  - Empirical entropy
- Language for the comparison: palindromes
- The number of parameters for each grammar
  - SCFG: N (# of non-terminals), M (# of terminals) => N^3 + NM
  - HMM (RG): K (# of states), M (# of terminals) => K^2 + (M+2)K
- Condition for the comparison: N^3 + NM and K^2 + (M+2)K are kept comparable
- Result (the ability to model derivation probabilities): SCFG > RG
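A quick check of the parameter-count formulas with made-up sizes (the example numbers are mine, not the paper's):

```python
def scfg_params(N, M):                       # N non-terminals, M terminals
    return N**3 + N * M

def hmm_params(K, M):                        # K states, M terminals
    return K**2 + (M + 2) * K

# With M = 2 terminals, an SCFG with N = 5 non-terminals has 135 parameters,
# roughly matching an HMM with K = 10 states (140 parameters).
print(scfg_params(5, 2), hmm_params(10, 2))  # 135 140
```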

Pre-training - 1
- Goal: start off with good initial estimates
  - reduces the number of re-estimation cycles required (by 40%)
  - facilitates the generation of a good final model
- Pre-training
  1. Use the Baum-Welch algorithm (O(n^2)) to obtain a set of RG rules
  2. Convert the RG rules (final matrices) into SCFG rules (initial matrices)
  3. Start the Inside-Outside algorithm (O(n^3)) from those initial matrices
- Time complexity: a*n^2 + b*n^3 << c*n^3 if b << c, i.e. pre-training pays off because far fewer O(n^3) re-estimation cycles are needed

Pre-training - 2
- Modification (RG => SCFG)
  (a) For each b_jk, define Y_j -> k with probability b_jk.
  (b) For each a_ij, define X_i -> Y_a X_j with probability a_ij.
  (c) For each S_i, define S -> X_i with probability S_i. If X_i -> Y_a X_l with probability a_il, then S -> Y_a X_l with probability S_i * a_il.
  (d) For each F_j, define X_j -> Y_a with probability F_j. If Y_a -> k with probability b_ak, then X_j -> k with probability b_ak * F_j.
- The remaining zero parameters (which would keep the grammar an RG):
  - all parameters += floor value (floor value = 1 / # of non-zero parameters)
  - re-normalization for each non-terminal
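A sketch of just the flooring and re-normalization step (applying the floor to every parameter and normalizing A and B jointly per non-terminal is my reading of the slide, not something it spells out):

```python
def apply_floor(A, B):
    """Add a small floor to all parameters, then renormalise each non-terminal's rules."""
    floor = 1.0 / (np.count_nonzero(A) + np.count_nonzero(B))   # 1 / # of non-zero parameters
    A = A + floor
    B = B + floor
    total = A.reshape(A.shape[0], -1).sum(axis=1) + B.sum(axis=1)
    return A / total[:, None, None], B / total[:, None]
```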

The use of grammar minimization - 1
- Goal: detect and eliminate redundant and/or useless symbols
  - A good grammar is self-embedding (a CFG is self-embedding if there exists a non-terminal A such that A =>* wAx and neither w nor x is empty); modelling such structure requires more non-terminal symbols
  - A smaller n (fewer symbols) speeds up the Inside-Outside algorithm
- Constraining the Inside-Outside algorithm
  - Greedy symbols take up too many non-terminals
  - Constraints: allocate a non-terminal to each terminal symbol and force the remaining non-terminals to model the hidden branching process
  - Infeasible for practical applications (e.g. speech) because of the inherent ambiguity

The use of grammar minimization - 2
- Two ways to incorporate GM into the Inside-Outside algorithm
  - First approach (computationally intractable)
    - Inside-Outside algorithm: start with a fixed maximum number of symbols
    - GM: periodically detect and eliminate redundant and useless symbols
  - Second approach (more practical)
    - Inside-Outside algorithm: start with the desired number of non-terminals
    - GM: periodically (or when log P(S) < threshold) detect and reallocate redundant symbols

The use of grammar minimization - 3
- GM algorithm (ad hoc)
  1. Detect greedy symbols in a bottom-up fashion
     1.1 redundant non-terminals are replaced by a single non-terminal
     1.2 free the redundant non-terminals (free non-terminals)
     1.3 identical rules are collapsed into a single rule by adding their probabilities
  2. Fix the parameters of the remaining non-terminals involved in the generation of greedy symbols (excluded from steps 3 and 4)
  3. For each free non-terminal i,
     3.1 b[i,m] = 0 if m is a greedy symbol; randomize b[i,m] otherwise
     3.2 a[i,j,k] = 0 if j and k are non-terminals from step 2; randomize a[i,j,k] otherwise
  4. Randomize a[i,j,k] for i a non-terminal from step 2 and j, k free non-terminals
  (a rough sketch of steps 3 and 4 follows below)
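A rough sketch of steps 3 and 4 only, to make the bookkeeping concrete. The sets of greedy terminals, fixed non-terminals (step 2) and free non-terminals are assumed to come out of steps 1-2, and reading step 3.2's "j and k are the non-terminals of step 2" as "both j and k are fixed" is my interpretation, not something the slide states; re-normalization of the touched rows would follow, as in apply_floor above.

```python
def reallocate_free_symbols(A, B, free_nts, fixed_nts, greedy_terms, seed=0):
    """Steps 3-4: steer the freed non-terminals away from the greedy terminal symbols."""
    rng = np.random.default_rng(seed)
    N, M = B.shape
    for i in free_nts:
        for m in range(M):                                   # step 3.1
            B[i, m] = 0.0 if m in greedy_terms else rng.random()
        for j in range(N):                                   # step 3.2
            for k in range(N):
                A[i, j, k] = 0.0 if (j in fixed_nts and k in fixed_nts) else rng.random()
    for i in fixed_nts:                                      # step 4
        for j in free_nts:
            for k in free_nts:
                A[i, j, k] = rng.random()
    return A, B
```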

Implementation using a transputer array
- Goal: speed up the Inside-Outside algorithm (about 100 times faster)
  - Split the training data into several subsets
  - The Inside-Outside algorithm works independently on each subset
- Implementation (diagram: a SUN host, a control board, and a chain of 64 transputers)
  - The updated parameter set is computed and transmitted down the chain to all the others
  - Each transputer works independently on its own data set
  (a sketch of this split-and-combine pattern follows below)
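The transputer chain is essentially data parallelism over training sentences. A rough modern analogue of the same split-and-combine pattern, reusing the single-sentence sketches above (the multiprocessing setup and function names are mine; each sentence's expected counts are divided by its own P(O|G) before being pooled, which is what makes counts from different sentences additive):

```python
from multiprocessing import Pool

def per_sentence_counts(args):
    """E-step for one sentence, scaled by 1/P(O|G) so counts from different sentences add up."""
    A, B, O = args
    e = inner_probabilities(A, B, O)
    f = outer_probabilities(A, O, e)
    P = e[0, 0, len(O) - 1]
    num_A, num_B, denom = expected_counts(A, B, O, e, f)
    return num_A / P, num_B / P, denom / P

def parallel_reestimate(A, B, sentences, workers=4):
    """Workers process their own sentences independently; the master pools the counts
    and broadcasts the updated A and B, mirroring the transputer chain."""
    with Pool(workers) as pool:
        results = pool.map(per_sentence_counts, [(A, B, O) for O in sentences])
    num_A = sum(r[0] for r in results)
    num_B = sum(r[1] for r in results)
    denom = sum(r[2] for r in results)
    return num_A / denom[:, None, None], num_B / denom[:, None]
```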

Conclusions
- Usefulness of CFGs in NL
  - This paper introduced the Inside-Outside algorithm for speech recognition and compared a CFG with an RG, using the entropy of the language generated by each grammar, on a "toy" problem
- Reduction of the time complexity of the Inside-Outside algorithm; this paper
  - described a novel pre-training algorithm (fewer iterations)
  - proposed an ad hoc grammar minimization (GM): fewer symbols
  - implemented the Inside-Outside algorithm on a parallel transputer array: less input data per processor
- Further research
  - build SCFG models trained from real speech data