
1 CS 182 Sections 101 - 102
Slides created by Eva Mok (emok@icsi.berkeley.edu), modified by JGM, March 15, 2006

2 Announcements
a5 is due Friday night at 11:59pm.
a6 is out tomorrow (2nd coding assignment), due the Monday after spring break.
Midterm solution will be posted (soon).

3 Quick Recap
This week:
–you just had the midterm
–a bit more motor control
–some belief nets and feature structures
Coming up:
–Bailey's model of learning hand-action words

4 Your Task: As far as the brain / thought / language is concerned, what is the single biggest mystery to you at this point?

5 Remember Recruitment Learning?
One-shot learning: the idea is that for things like words or grammar, kids learn at least something from a single input.
Granted, they might not get it completely right on the first shot.
But over time, their knowledge slowly converges to the right answer (i.e. they build a model to fit the data).

6 Model Merging
Goal:
–learn a model given data
The model should:
–explain the data well
–be "simple"
–be able to make generalizations

7 Naïve way to make a model
Create a special case for each piece of data.
This of course gets the training data completely right, but it cannot generalize at all when test data comes.
How to fix this? Model Merging: "compact" the special cases into more descriptive rules without losing too much performance.

8 Basic idea of Model Merging
Start with the naïve model: one special case for each piece of data.
While performance increases:
–create a more general rule that explains some of the data
–discard the corresponding special cases
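To make the loop concrete, here is a minimal sketch in Python. It is not the a6 starter code; `candidate_merges` and `score` are assumed, caller-supplied helpers.

```python
# A minimal sketch of greedy model merging (not the a6 starter code).
# candidate_merges(model) is assumed to yield candidate merged models;
# score(model) is assumed to return a number where higher means
# better performance.

def model_merge(model, candidate_merges, score):
    """Greedily apply merges while performance keeps increasing."""
    best = score(model)          # score of the naive model
    improved = True
    while improved:
        improved = False
        for merged in candidate_merges(model):
            s = score(merged)
            if s > best:         # a more general rule helped
                model, best = merged, s
                improved = True
                break            # restart from the improved model
    return model
```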

9 Two examples of Model Merging
Bailey's VerbLearn system:
–model that maps actions to verb labels
–performance: complexity of model + ability to explain data → MAP
Assignment 6 - Grammar Induction:
–model that maps sentences to grammar rules
–performance: size of grammar + derivation length of sentences → cost

10 Grammar
Grammar: rules that govern which sentences are legal in a language, e.g. Regular Grammar, Context-Free Grammar.
Production rules in a grammar have the form α → β.
Terminal symbols: a, b, c, etc.
Non-terminal symbols: S, A, B, X, etc.
Different classes of grammar restrict where these symbols can go.
We'll see an example on the next slide.

11 Right-Regular Grammar
Right-Regular Grammar is a further restricted class of Regular Grammar: non-terminal symbols are always at the right end of a rule.
e.g.:
S → a b c X
X → d e
X → f
Valid sentences would be "abcde" and "abcf".
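As an illustration, the grammar above can be written down and exhaustively expanded in a few lines of Python. The dict-of-lists encoding is just one plausible choice, not the representation a6 prescribes.

```python
# Hypothetical encoding: each non-terminal maps to a list of
# right-hand sides, each a list of symbols.
GRAMMAR = {
    "S": [["a", "b", "c", "X"]],
    "X": [["d", "e"], ["f"]],
}

def expand(symbols):
    """Yield every terminal string derivable from a symbol sequence."""
    for i, sym in enumerate(symbols):
        if sym in GRAMMAR:                     # first non-terminal found
            for rhs in GRAMMAR[sym]:
                yield from expand(symbols[:i] + rhs + symbols[i + 1:])
            return
    yield "".join(symbols)                     # all symbols are terminals

print(sorted(expand(["S"])))  # ['abcde', 'abcf']
```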

12 Grammar Induction
As input data (e.g. "abcde", "abcf") comes in, we'd like to build up a grammar that explains the data.
We can certainly have one rule for each sentence we see in the data → naïve approach, no generalization.
We would rather "compact" the grammar.
In a6, you have two ways of doing this "compaction":
–prefix merge
–suffix merge

13 How do we find the model?
Prefix merge:
S → a b c d e
S → a b c f
becomes
S → a b c X
X → d e
X → f
Suffix merge:
S → a b c d e
S → f c d e
becomes
S → a b X
S → f X
X → c d e
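Both operations are mechanical rewrites on pairs of rules. A sketch, assuming rules are (lhs, rhs) pairs with rhs a tuple of symbols and a fresh non-terminal name passed in (a6's actual data structures may differ):

```python
def prefix_merge(r1, r2, n, new_nt):
    """Rules sharing their first n symbols: keep the shared prefix,
    and move the differing remainders under a new non-terminal."""
    (lhs, rhs1), (_, rhs2) = r1, r2
    assert rhs1[:n] == rhs2[:n]
    return [(lhs, rhs1[:n] + (new_nt,)),
            (new_nt, rhs1[n:]),
            (new_nt, rhs2[n:])]

def suffix_merge(r1, r2, n, new_nt):
    """Rules sharing their last n symbols: factor the shared suffix
    out into a new non-terminal that both rules now end with."""
    (lhs1, rhs1), (lhs2, rhs2) = r1, r2
    assert rhs1[-n:] == rhs2[-n:]
    return [(lhs1, rhs1[:-n] + (new_nt,)),
            (lhs2, rhs2[:-n] + (new_nt,)),
            (new_nt, rhs1[-n:])]

# Reproducing the prefix example above:
r1 = ("S", ("a", "b", "c", "d", "e"))
r2 = ("S", ("a", "b", "c", "f"))
print(prefix_merge(r1, r2, 3, "X"))
# [('S', ('a', 'b', 'c', 'X')), ('X', ('d', 'e')), ('X', ('f',))]
```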

14 Contrived Example
Suppose you have these 3 grammar rules:
r1: S → eat them here or there
r2: S → eat them anywhere
r3: S → like them anywhere or here or there
5 merging options:
–prefix merge (r1, r2, 1)
–prefix merge (r1, r2, 2)
–suffix merge (r1, r3, 1)
–suffix merge (r1, r3, 2)
–suffix merge (r1, r3, 3)
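One way to see where the count of five comes from: enumerate every shared prefix or suffix length over pairs of rules with the same left-hand side. A sketch, assuming a merge must leave a non-empty remainder of the shorter rule (which matches the options listed above):

```python
def merge_options(rules):
    """List (kind, i, j, n): rule pair i, j shares a length-n prefix
    or suffix (same lhs; the shorter rule keeps a non-empty rest)."""
    options = []
    for i in range(len(rules)):
        for j in range(i + 1, len(rules)):
            (l1, a), (l2, b) = rules[i], rules[j]
            if l1 != l2:
                continue
            for n in range(1, min(len(a), len(b))):
                if a[:n] == b[:n]:
                    options.append(("prefix", i, j, n))
                if a[-n:] == b[-n:]:
                    options.append(("suffix", i, j, n))
    return options

rules = [("S", tuple("eat them here or there".split())),
         ("S", tuple("eat them anywhere".split())),
         ("S", tuple("like them anywhere or here or there".split()))]
print(len(merge_options(rules)))  # 5, as enumerated above
```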

15 Computationally
Kids aren't presented all the data at once.
Instead they'll hear these sentences one by one:
1. eat them here or there
2. eat them anywhere
3. like them anywhere or here or there
As each sentence (i.e. data point) comes in, you create one rule for it, e.g. S → eat them here or there.
Then you look for ways to merge as more sentences come in.

16 Example 1: just prefix merge
After the first two sentences are presented, we can already do a prefix merge of length 2:
r1: S → eat them here or there
r2: S → eat them anywhere
become
r3: S → eat them X1
r4: X1 → here or there
r5: X1 → anywhere

17 Example 2: just suffix merge
After the first three sentences are presented, we can do a suffix merge of length 3:
r1: S → eat them here or there
r2: S → eat them anywhere
r3: S → like them anywhere or here or there
r1 and r3 become
r4: S → eat them X2
r5: S → like them anywhere or X2
r6: X2 → here or there

18 Your Task in a6
Pull in sentences one by one.
Monitor your sentences.
Do either a prefix merge or a suffix merge as soon as it's "good" to do so.
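Putting the pieces together, the control flow might look like the sketch below. `merge_options`, `apply_merge`, and `cost` are assumed callables (apply_merge would swap the two old rules for their merged versions; cost is a one-argument function with α and the data folded in); the real a6 interface may differ.

```python
def induce(sentences, merge_options, apply_merge, cost):
    """Pull in sentences one by one; after each, keep applying the
    single cheapest merge for as long as it lowers the cost c(G)."""
    rules = []
    for sentence in sentences:
        rules.append(("S", tuple(sentence.split())))  # naive rule
        while True:
            candidates = [apply_merge(rules, opt)
                          for opt in merge_options(rules)]
            candidates = [g for g in candidates if cost(g) < cost(rules)]
            if not candidates:
                break                  # no merge is "good" right now
            rules = min(candidates, key=cost)
    return rules
```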

19 How do we know if a model is good?
We want a small grammar, but we want it to explain the data well.
Minimize the cost along the way:
c(G) = α · s(G) + d(G, D)
where s(G) is the size of the grammar, d(G, D) is the derivation length of the sentences, and α is a learning factor to play with.
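In code, using the interpretation the worked example on the next two slides confirms (s(G) counts right-hand-side symbols; d(G, D) sums, over sentences, the number of rule applications in each derivation), the cost might be computed as:

```python
def grammar_size(rules):
    """s(G): total number of right-hand-side symbols."""
    return sum(len(rhs) for _, rhs in rules)

def cost(rules, derivation_lengths, alpha):
    """c(G) = alpha * s(G) + d(G, D), where derivation_lengths holds
    the number of rules used to derive each sentence in the data."""
    return alpha * grammar_size(rules) + sum(derivation_lengths)
```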

20 Back to Example 2
Your original grammar:
r1: S → eat them here or there
r2: S → eat them anywhere
r3: S → like them anywhere or here or there
Remember your data is:
1. eat them here or there
2. eat them anywhere
3. like them anywhere or here or there
size of grammar = 15
derivation length of sentences = 1 + 1 + 1 = 3
c(G) = α · s(G) + d(G, D) = α · 15 + 3

21 Back to Example 2
Your new grammar:
r2: S → eat them anywhere
r4: S → eat them X2
r5: S → like them anywhere or X2
r6: X2 → here or there
Remember your data is:
1. eat them here or there
2. eat them anywhere
3. like them anywhere or here or there
size of grammar = 14
derivation length of sentences = 2 + 1 + 2 = 5
c(G) = α · s(G) + d(G, D) = α · 14 + 5
So in fact you SHOULDN'T merge if α ≤ 2.
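You can check the threshold directly with a couple of lines (numbers taken from the two slides above):

```python
old_cost = lambda alpha: alpha * 15 + 3   # original grammar
new_cost = lambda alpha: alpha * 14 + 5   # merged grammar
for alpha in (1, 2, 3):
    verdict = "merge" if new_cost(alpha) < old_cost(alpha) else "keep"
    print(alpha, old_cost(alpha), new_cost(alpha), verdict)
# 1 18 19 keep
# 2 33 33 keep   (tie: merging doesn't pay off for alpha <= 2)
# 3 48 47 merge
```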

