
1 Incrementally Learning Parameters of Stochastic CFGs Using Summary Statistics. Written by: Brent Heeringa and Tim Oates

2 Goals: to learn the syntax of utterances. Approach: an SCFG (Stochastic Context-Free Grammar) M, where V is a finite set of non-terminals, Σ is a finite set of terminals, R is a finite set of rules, each rule r carrying a probability p(r) such that the p(r) of all rules sharing the same left-hand side sum to 1, and S is the start symbol.
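Below is a minimal sketch, in Python, of what such a proper SCFG looks like as a rule table; the toy grammar fragment and the helper name check_proper are illustrative assumptions, not from the paper.

# A minimal sketch of an SCFG as a rule table; the grammar is a
# hypothetical toy fragment, not the authors' data.
from collections import defaultdict

# Each rule maps (left-hand side, right-hand side tuple) to p(r).
rules = {
    ("S",  ("NP", "VP")): 1.0,
    ("NP", ("D", "N")):   0.7,
    ("NP", ("N",)):       0.3,
    ("VP", ("V", "NP")):  1.0,
}

def check_proper(rules):
    """Verify that rule probabilities sharing a left-hand side sum to 1."""
    totals = defaultdict(float)
    for (lhs, _rhs), p in rules.items():
        totals[lhs] += p
    return all(abs(total - 1.0) < 1e-9 for total in totals.values())

assert check_proper(rules)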

3 Problems with most SCFG learning algorithms: 1) Expensive storage: a corpus of complete sentences must be stored. 2) Time-consuming: the algorithms need repeated passes over all the data.

4 Learning an SCFG involves two tasks: inducing context-free structure from a corpus of sentences, and learning the production (rule) probabilities.

5 Learning SCFGs (cont.). General method: the Inside/Outside algorithm, an instance of Expectation-Maximization (EM): compute the expected usage counts of the rules, then maximize the likelihood given both the expectations and the corpus. Disadvantage of the Inside/Outside algorithm: the entire sentence corpus must be stored in some representation (e.g., a chart parse), which makes storage expensive (unrealistic for a human agent!).
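For context, here is a hedged sketch of the "inside" pass that Inside/Outside builds on, assuming a grammar in Chomsky Normal Form; the toy grammar and function names are illustrative, not the authors' implementation.

# Inside probabilities for a CNF grammar: binary rules A -> B C and
# lexical rules A -> w, each with a probability. Toy grammar assumed.
from collections import defaultdict

binary = {("S", "NP", "VP"): 1.0, ("NP", "D", "N"): 1.0, ("VP", "V", "NP"): 1.0}
lexical = {("D", "the"): 1.0, ("N", "dog"): 0.5, ("N", "cat"): 0.5, ("V", "saw"): 1.0}

def inside(words):
    """beta[(i, j, A)] = probability that A derives words[i:j+1]."""
    n = len(words)
    beta = defaultdict(float)
    for i, w in enumerate(words):
        for (A, word), p in lexical.items():
            if word == w:
                beta[(i, i, A)] += p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):
                for (A, B, C), p in binary.items():
                    beta[(i, j, A)] += p * beta[(i, k, B)] * beta[(k + 1, j, C)]
    return beta

words = ["the", "dog", "saw", "the", "cat"]
print(inside(words)[(0, len(words) - 1, "S")])  # sentence probability under S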

6 Proposed Algorithm. Use Unique Normal Form (UNF): replace each terminal rule A -> z with two new rules, A -> D with p[A -> D] = p[A -> z], and D -> z with p[D -> z] = 1, so that no two productions have the same right-hand side.
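A minimal sketch of the UNF transformation just described, assuming rules are stored as (left-hand side, right-hand side) pairs with probabilities; the naming scheme for the fresh non-terminal D is an assumption of this sketch.

def to_unf(rules, terminals):
    """Split each terminal rule A -> z into A -> D and D -> z."""
    out = {}
    for (lhs, rhs), p in rules.items():
        if len(rhs) == 1 and rhs[0] in terminals:
            d = f"D_{rhs[0]}"           # fresh non-terminal for terminal z (naming assumed)
            out[(lhs, (d,))] = p        # A -> D keeps p[A -> z]
            out[(d, rhs)] = 1.0         # D -> z gets probability 1
        else:
            out[(lhs, rhs)] = p
    return out

rules = {("A", ("z",)): 0.4, ("A", ("B", "C")): 0.6}
print(to_unf(rules, terminals={"z"}))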

7 Learning SCFGs, Proposed Algorithm (cont.). Use histograms: each rule r has two histograms, H_O^r and H_L^r.

8 Proposed Algorithm (cont.). H_O^r is constructed when parsing the sentences in O. H_L^r continues to be updated throughout the learning process and is rescaled to a fixed size h. Why? So that recently used rules have more impact on the histogram.
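A small sketch of that rescaling idea: rule-use counts are added after each parse, then the histogram is shrunk back to a fixed size h so recent usage dominates. The uniform-shrink rescaling rule here is an assumption, not necessarily the paper's exact scheme.

from collections import Counter

H_L = Counter()   # learner's histogram of rule-use counts
h = 100.0         # fixed histogram size (value assumed)

def update(rule_counts):
    """Add rule counts from the latest parsed sentence, then rescale to h."""
    H_L.update(rule_counts)
    total = sum(H_L.values())
    if total > h:
        scale = h / total
        for r in H_L:
            H_L[r] *= scale   # old counts shrink; recent ones carry more weight

update({"S->NP VP": 1, "NP->D N": 2})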

9 Comparing H_L^r and H_O^r: compute the relative entropy T between them. If T decreases, increase the probabilities of the rules that were used (if s is large, increase the probabilities of the rules used when parsing the last sentence). If T increases, decrease the probabilities of the rules that were used (e.g., p_{t+1}(r) = 0.01 * p_t(r)).
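A hedged sketch of this comparison step: relative entropy (Kullback-Leibler divergence) between the normalized histograms, followed by a reward/penalty factor on the probabilities of the rules that were used. The specific constants and the renormalization strategy are assumptions of this sketch.

import math

def relative_entropy(H_L, H_O):
    """D(P_L || P_O) over rules appearing in both histograms."""
    total_L, total_O = sum(H_L.values()), sum(H_O.values())
    d = 0.0
    for r, c in H_L.items():
        if c > 0 and H_O.get(r, 0) > 0:
            p, q = c / total_L, H_O[r] / total_O
            d += p * math.log(p / q)
    return d

def adjust(p, used_rules, T_prev, T_now, penalty=0.01, reward=1.1):
    """Raise p(r) for used rules if T fell; shrink p(r) if T rose."""
    factor = reward if T_now < T_prev else penalty
    for r in used_rules:
        p[r] *= factor
    # probabilities sharing a left-hand side would be renormalized here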

10 Comparing the Inside/Outside algorithm with the proposed algorithm. Inside/Outside: O(n^3); good: needs only 3-5 iterations; bad: must store the complete sentence corpus. Proposed algorithm: O(n^3); bad: needs 500-1000 iterations; good: memory requirements are constant!

