# Incrementally Learning Parameters of Stochastic CFGs Using Summary Statistics. Written by: Brent Heeringa and Tim Oates.



Goals and approach

Goal: to learn the syntax of utterances.

Approach: a Stochastic Context-Free Grammar (SCFG) M = (V, Σ, R, S), where:

- V is a finite set of non-terminals
- Σ is a finite set of terminals
- R is a finite set of rules, each rule r carrying a probability p(r); the probabilities of all rules sharing the same left-hand side sum to 1
- S is the start symbol
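The SCFG definition above can be sketched in code. This is a minimal illustration, not the authors' implementation; the grammar, rule names, and representation (a dict mapping each left-hand side to (right-hand side, probability) pairs) are all assumptions made for the example.

```python
# Hypothetical minimal SCFG: each left-hand side (non-terminal) maps to a
# list of (right-hand side, probability) pairs. All names are illustrative.
rules = {
    "S":  [(("NP", "VP"), 1.0)],
    "NP": [(("D", "N"), 0.7), (("N",), 0.3)],
    "VP": [(("V", "NP"), 0.6), (("V",), 0.4)],
}

def check_normalized(rules, tol=1e-9):
    """Check the SCFG constraint: probabilities of rules sharing the
    same left-hand side must sum to 1."""
    return all(abs(sum(p for _, p in rhss) - 1.0) < tol
               for rhss in rules.values())

print(check_normalized(rules))  # True
```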

Problems with most SCFG learning algorithms:

1. Expensive storage: a corpus of complete sentences must be stored.
2. Time-consuming: the algorithms must make repeated passes over all of the data.

Learning an SCFG involves two tasks: inducing the context-free structure from a corpus of sentences, and learning the production (rule) probabilities.

Learning SCFGs (cont.)

The general method is the Inside/Outside algorithm, an instance of Expectation-Maximization (EM):

- E step: compute the expected usage counts of the rules.
- M step: re-estimate rule probabilities to maximize the likelihood given those expectations and the corpus.

Disadvantage of the Inside/Outside algorithm: the entire sentence corpus must be stored in some representation (e.g., a chart parse), which makes storage expensive (unrealistic for a human agent!).
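The M step described above can be sketched as a simple normalization. This is the standard EM re-estimation, p(A → α) = E[count(A → α)] / E[count(A)], not the paper's exact code; the expected counts and rule names are illustrative.

```python
def maximization_step(expected_counts):
    """M step sketch: given expected rule counts from the inside/outside
    E step, re-estimate p(A -> alpha) by normalizing over all rules that
    share the same left-hand side A."""
    totals = {}
    for (lhs, rhs), c in expected_counts.items():
        totals[lhs] = totals.get(lhs, 0.0) + c
    return {(lhs, rhs): c / totals[lhs]
            for (lhs, rhs), c in expected_counts.items()}

# Illustrative expected counts for two NP rules.
counts = {("NP", ("D", "N")): 3.0, ("NP", ("N",)): 1.0}
print(maximization_step(counts))
# {('NP', ('D', 'N')): 0.75, ('NP', ('N',)): 0.25}
```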

Proposed Algorithm

Use Unique Normal Form (UNF): replace each terminal rule A -> z with two new rules,

- A -> D with p[A -> D] = p[A -> z]
- D -> z with p[D -> z] = 1

so that no two productions have the same right-hand side.
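The UNF transform above can be sketched as follows. This is a simplified illustration, assuming terminals are lowercase strings and non-terminals are uppercase, and inventing a naming scheme (`D_z`) for the new intermediate non-terminals.

```python
def to_unique_normal_form(rules):
    """Sketch of the UNF transform: each terminal rule A -> z is split into
    A -> D_z (keeping the original probability) and D_z -> z with probability
    1. Assumes terminals are lowercase strings; names are illustrative."""
    new_rules = {}
    for lhs, rhss in rules.items():
        out = []
        for rhs, p in rhss:
            if len(rhs) == 1 and rhs[0].islower():  # terminal rule A -> z
                d = "D_" + rhs[0]
                out.append(((d,), p))                        # A -> D_z, same p
                new_rules.setdefault(d, []).append((rhs, 1.0))  # D_z -> z, p = 1
            else:
                out.append((rhs, p))
        new_rules.setdefault(lhs, []).extend(out)
    return new_rules

g = {"A": [(("z",), 0.4), (("A", "B"), 0.6)]}
print(to_unique_normal_form(g))
```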

Learning SCFGs with the proposed algorithm (cont.)

Use histograms: each rule r has two histograms, H_O^r and H_L^r.

Proposed Algorithm (cont.)

- H_O^r is constructed when parsing the sentences in O.
- H_L^r continues to be updated throughout the learning process, and is rescaled to a fixed size h.

Why rescale? So that recently used rules have more impact on the histogram.
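The rescaling step can be sketched like this. It is an assumption-laden illustration, not the paper's code: here "size" is taken to mean the histogram's total mass, and shrinking it back to h after updates geometrically decays older evidence, which is what gives recent rule uses more impact.

```python
def rescale(hist, h):
    """Sketch: rescale a histogram (dict of bucket -> count) so its total
    mass is at most h. Multiplying all counts by h / total decays old
    evidence each time new counts push the total past h."""
    total = sum(hist.values())
    if total <= h:
        return dict(hist)
    scale = h / total
    return {bucket: c * scale for bucket, c in hist.items()}

print(rescale({"a": 6.0, "b": 2.0}, 4.0))  # {'a': 3.0, 'b': 1.0}
```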

Comparing H_L^r and H_O^r

Compare the two histograms using their relative entropy T:

- If T decreases, increase the probability of the rules used (if the decrease s is large, increase the probability of the rules used when parsing the last sentence).
- If T increases, decrease the probability of the rules used (e.g., p_{t+1}(r) = 0.01 * p_t(r)).
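The comparison above can be sketched with a standard relative-entropy (KL divergence) computation between two normalized histograms. The smoothing constant `eps` and the histogram representation are assumptions for the example, not details from the paper.

```python
import math

def relative_entropy(p_hist, q_hist, eps=1e-9):
    """Sketch: KL divergence D(P || Q) between two count histograms,
    normalized to distributions first; eps smoothing avoids log(0)."""
    buckets = set(p_hist) | set(q_hist)
    p_tot = sum(p_hist.values()) or 1.0
    q_tot = sum(q_hist.values()) or 1.0
    kl = 0.0
    for b in buckets:
        p = p_hist.get(b, 0.0) / p_tot + eps
        q = q_hist.get(b, 0.0) / q_tot + eps
        kl += p * math.log(p / q)
    return kl

# Identical histograms give zero divergence; diverging ones a positive value.
print(relative_entropy({"a": 1, "b": 3}, {"a": 1, "b": 3}))  # 0.0
```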

Comparing the Inside/Outside algorithm with the proposed algorithm

| | Inside/Outside | Proposed algorithm |
| --- | --- | --- |
| Time per iteration | O(n^3) (bad) | O(n^3) (bad) |
| Iterations needed | 3-5 (good) | 500-1000 (bad) |
| Memory | must store the complete sentence corpus (bad) | constant! (good) |

