Presentation transcript: Well Posed Learning Problems

1 Well Posed Learning Problems
Must identify the following 3 features:
–Learning task: the thing you want to learn.
–Performance measure: must know when you did badly and when you did well. Often called the critic.
–Training experience: basically, you've got to know how you're going to get the data you're going to train the whole furshlugginer thing with.

2 Checkers
T: play checkers
P: percentage of games won
What experience? What exactly should be learned? How shall it be represented? What specific algorithm should be used to learn it?

3 Type of Training Experience
Direct or indirect?
–Direct: board state → correct move
–Indirect: outcome of a complete game
–Credit assignment problem
Teacher or not?
–Teacher selects board states
–Learner can select board states
–Randomly selected board states?
Is the training experience representative of the performance goal?
–Training: playing against itself
–Performance: evaluated by playing against the world champion

4 Choose Target Function
ChooseMove : B → M (board state → move)
–Maps a legal board state to a legal move
Evaluate : B → V (board state → board value)
–Assigns a numerical score to any given board state, such that better board states obtain a higher score
–Select the best move by evaluating all successor states of legal moves and picking the one with the maximal score
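The selection rule in the last bullet is a one-liner. A minimal sketch, assuming hypothetical helpers `legal_moves(b)` and `apply_move(b, m)` for whatever board representation is used:

```python
def choose_move(b, V, legal_moves, apply_move):
    """Return the legal move whose successor board scores highest under V."""
    return max(legal_moves(b), key=lambda m: V(apply_move(b, m)))
```

This works for any evaluation function V, including the learned linear one introduced later in the deck.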

5 Possible Definition of Target Function
If b is a final board state that is won, then V(b) = 100
If b is a final board state that is lost, then V(b) = -100
If b is a final board state that is drawn, then V(b) = 0
If b is not a final board state, then V(b) = V(b'), where b' is the best final board state that can be achieved starting from b and playing optimally until the end of the game.
This gives correct values but is not operational.

6 State Space Search
m1 : b → b1, m2 : b → b2, m3 : b → b3
V(b) = ?  V(b) = max_i V(bi) (our move: take the maximum over successor values)

7 State Space Search
m4 : b1 → b4, m5 : b1 → b5, m6 : b1 → b6
V(b1) = ?  V(b1) = min_i V(bi) (the opponent's move: take the minimum over successor values)
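Slides 6 and 7 together are just minimax: maximize over our moves, minimize over the opponent's, down to final board states. A sketch, with the game tree supplied by hypothetical `successors` and `final_value` callables:

```python
def minimax_value(b, our_turn, successors, final_value):
    """V(b): max over our moves, min over the opponent's, recursively."""
    succ = successors(b)
    if not succ:                      # final board state: use its true value
        return final_value(b)
    vals = [minimax_value(s, not our_turn, successors, final_value)
            for s in succ]
    return max(vals) if our_turn else min(vals)
```

On a tiny tree this reproduces the slides' pattern: our layer takes the max, the opponent's layer takes the min.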

8 Final Board States
Black wins: V(b) = -100
Red wins: V(b) = 100
Draw: V(b) = 0

9 Number of Board States
Tic-Tac-Toe: #board states < 9!/(5!·4!) + 9!/(1!·4!·4!) + … + 9!/(2!·4!·3!) + … = 6045
4x4 checkers (2 pieces per side): #board states = ?
#board states < 8·7·6·5·2^2/(2!·2!) = 1680
Regular checkers (8x8 board, 8 pieces each): #board states < 32!·2^16/(8!·8!·16!) ≈ 5.07×10^17
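The two closed-form checkers bounds can be checked directly with integer arithmetic (the 2^2 and 2^16 king-status factors follow the slide's formulas):

```python
from math import factorial

# 4x4 checkers bound from the slide: 8*7*6*5 * 2**2 / (2! * 2!)
bound_4x4 = 8 * 7 * 6 * 5 * 2**2 // (factorial(2) * factorial(2))   # 1680

# Regular checkers bound: multinomial(32; 8, 8, 16) placements of
# 8 black, 8 red, 16 empty squares, times 2**16 king statuses
bound_8x8 = (factorial(32) * 2**16
             // (factorial(8) * factorial(8) * factorial(16)))      # ~5.07e17
```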

10 Choose Representation of Target Function
Table look-up
Collection of rules
Decision trees
Neural networks
Polynomial function of board features
Trade-offs in choosing an expressive representation:
–Approximation accuracy
–Number of training examples needed to learn the target function

11 Representation of Target Function
V(b) = w0 + w1·bp(b) + w2·rp(b) + w3·bk(b) + w4·rk(b) + w5·bt(b) + w6·rt(b)
bp(b): number of black pieces
rp(b): number of red pieces
bk(b): number of black kings
rk(b): number of red kings
bt(b): number of red pieces threatened by black
rt(b): number of black pieces threatened by red
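A minimal sketch of this linear evaluation; extracting the six feature counts from an actual board is left as a hypothetical helper:

```python
def evaluate(w, feats):
    """V(b) = w0 + w1*bp + w2*rp + w3*bk + w4*rk + w5*bt + w6*rt,
    where feats = (bp, rp, bk, rk, bt, rt) are counts read off board b."""
    return w[0] + sum(wi * fi for wi, fi in zip(w[1:], feats))
```

Any weight vector of length 7 works; learning picks the weights, as the next slides show.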

12 Obtaining Training Examples
V(b): true target function
V'(b): learned target function
V_train(b): training value (estimate)
Rule for estimating training values: V_train(b) ← V'(Successor(b))

13 Choose Weight Training Rule
LMS weight update rule: select a training example b at random, then
1. Compute error(b) = V_train(b) - V'(b)
2. For each board feature fi, update weight wi ← wi + η·fi·error(b)
η: learning rate; try η = 0.1
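A sketch of one LMS step, with the bias folded in as a constant feature f0 = 1:

```python
ETA = 0.1  # learning rate suggested on the slide

def lms_update(w, feats, v_train):
    """One LMS step: w_i <- w_i + ETA * f_i * error(b)."""
    error = v_train - sum(wi * fi for wi, fi in zip(w, feats))
    return [wi + ETA * fi * error for wi, fi in zip(w, feats)]
```

On slide 18's numbers, weights (-10, 75, -60), features (1, 2, 2), and training value -55, this returns (-17.5, 60.0, -75.0), matching the slide.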

14 Example: 4x4 checkers
V(b) = w0 + w1·rp(b) + w2·bp(b)
Initial weights: w0 = -10, w1 = 75, w2 = -60
V(b0) = w0 + w1·2 + w2·2 = 20
m1 : b0 → b1, V(b1) = 20
m2 : b0 → b2, V(b2) = 20
m3 : b0 → b3, V(b3) = 20

15 Example: 4x4 checkers
V(b0) = 20, V(b1) = 20
1. Compute error(b0) = V_train(b) - V(b0) = V(b1) - V(b0) = 0
2. For each board feature fi, update weight wi ← wi + η·fi·error(b):
w0 ← w0 + 0.1·1·0
w1 ← w1 + 0.1·2·0
w2 ← w2 + 0.1·2·0
The error is zero, so the weights are unchanged.

16 Example: 4x4 checkers
V(b0) = 20, V(b1) = 20, V(b2) = 20, V(b3) = 20

17 Example: 4x4 checkers
V(b3) = 20; successors: V(b4a) = 20, V(b4b) = -55

18 Example: 4x4 checkers
V(b3) = 20, V(b4) = -55
1. Compute error(b3) = V_train(b) - V(b3) = V(b4) - V(b3) = -75
2. For each board feature fi, update weight wi ← wi + η·fi·error(b), starting from w0 = -10, w1 = 75, w2 = -60:
w0 ← w0 - 0.1·1·75 = -17.5
w1 ← w1 - 0.1·2·75 = 60
w2 ← w2 - 0.1·2·75 = -75

19 Example: 4x4 checkers
w0 = -17.5, w1 = 60, w2 = -75
V(b4) = -107.5, V(b5) = -107.5

20 Example: 4x4 checkers
V(b5) = -107.5, V(b6) = -167.5
error(b5) = V_train(b) - V(b5) = V(b6) - V(b5) = -60
With w0 = -17.5, w1 = 60, w2 = -75 and wi ← wi + η·fi·error(b):
w0 ← w0 - 0.1·1·60 = -23.5
w1 ← w1 - 0.1·1·60 = 54
w2 ← w2 - 0.1·2·60 = -87

21 Example: 4x4 checkers
Final board state: black won, so V_f(b) = -100
V(b6) = -197.5
error(b6) = V_train(b) - V(b6) = V_f(b6) - V(b6) = 97.5
With w0 = -23.5, w1 = 54, w2 = -87 and wi ← wi + η·fi·error(b):
w0 ← w0 + 0.1·1·97.5 = -13.75
w1 ← w1 + 0.1·0·97.5 = 54
w2 ← w2 + 0.1·2·97.5 = -67.5
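The whole training episode of slides 14-21 can be replayed in a few lines. The (bias, rp, bp) feature vectors below are my reading of the slides' board values, and each training value is the learned estimate of the successor state, per the rule V_train(b) ← V'(Successor(b)):

```python
ETA = 0.1
w = [-10.0, 75.0, -60.0]              # initial (w0, w1, w2) from slide 14

def V(w, f):
    """Linear evaluation over feature vector f = (1, rp, bp)."""
    return sum(wi * fi for wi, fi in zip(w, f))

def step(w, f_state, v_train):
    """One LMS update at f_state toward the training value v_train."""
    err = v_train - V(w, f_state)
    return [wi + ETA * fi * err for wi, fi in zip(w, f_state)]

# b0 -> b1: same features (1, 2, 2), error 0, weights unchanged (slide 15)
w = step(w, [1, 2, 2], V(w, [1, 2, 2]))
# b3 -> b4: red loses a piece, successor features (1, 1, 2) (slide 18)
w = step(w, [1, 2, 2], V(w, [1, 1, 2]))   # w becomes (-17.5, 60, -75)
# b5 -> b6: red loses its last piece, successor features (1, 0, 2) (slide 20)
w = step(w, [1, 1, 2], V(w, [1, 0, 2]))   # w becomes (-23.5, 54, -87)
# b6 is final and black won: train toward V_f = -100 (slide 21)
w = step(w, [1, 0, 2], -100.0)            # w becomes (-13.75, 54, -67.5)
```

The final weights match slide 21 exactly, which is a handy sanity check on the slide arithmetic.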

22 Checkers
Task?
Performance measure?
Training experience?
–Type of knowledge to be learned
–Target function, representation
–Getting training data
–Learning algorithm?

23 Questions once you define the learning problem
What can the system learn?
–Can it learn the target function? Is this an important question? Why or why not?
–A related question is "can it learn any computable function?" What factors influence the answer to this question? Target representation. Training procedure. Training set.
What is the system likely to learn?
–Is this the same as the first main question?
–Probabilistically.
–If it is unlikely to learn the correct answer, how bad can it get? How bad is it likely to get? What does the answer to this depend on?
How "hard" is it for the system to learn?
–How much time does it take?
–How much memory?
–How can I tell when it's done? If it's stuck? If it's better than another approach?
We have constrained the evaluation function to a linear sum of features. Can the true evaluation function be represented by these features in this way? Even if so, will the learning technique be able to learn the correct weights?
–Practically?
–If it can't, how bad can it get? How bad is it likely to get?
–What do we think might be a problem with how we've set this one up?

24 Choose Representation of Target Function
Table look-up
Collection of rules
Decision trees
Neural networks
Polynomial function of board features
Trade-offs in choosing an expressive representation:
–Approximation accuracy
–Number of training examples needed to learn the target function

25 Optical Character Recognition
Task?
Performance measure?
Training experience?
–Type of knowledge to be learned
–Target function, representation
–Getting training data
–Learning algorithm?

