Well Posed Learning Problems
Must identify the following 3 features:
– Learning task: the thing you want to learn.
– Performance measure: you must know when you did well and when you did badly; this is often called the critic.
– Training experience: you have to know how you are going to get the data you will train the whole thing with.

Checkers
T: play checkers
P: percentage of games won
What experience?
What exactly should be learned?
How shall it be represented?
What specific algorithm should be used to learn it?

Type of Training Experience
Direct or indirect?
– Direct: board state → correct move
– Indirect: outcome of a complete game
– Credit assignment problem
Teacher or not?
– Teacher selects board states
– Learner can select board states
– Randomly selected board states?
Is the training experience representative of the performance goal?
– Training: playing against itself
– Performance: evaluated playing against the world champion

Choose Target Function
ChooseMove : B → M (board state → move)
– Maps a legal board state to a legal move
Evaluate : B → V (board state → board value)
– Assigns a numerical score to any given board state, such that better board states obtain a higher score
– Select the best move by evaluating all successor states of legal moves and picking the one with the maximal score
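As a rough sketch (not part of the original slides), the reduction of ChooseMove to Evaluate can be written in a few lines of Python; legal_moves, apply_move, and evaluate are hypothetical helpers standing in for the B → M and B → V functions above.

```python
def choose_move(board, legal_moves, apply_move, evaluate):
    """Pick the legal move whose successor board scores highest under Evaluate."""
    return max(legal_moves(board),
               key=lambda move: evaluate(apply_move(board, move)))
```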

Possible Definition of Target Function
If b is a final board state that is won, then V(b) = 100
If b is a final board state that is lost, then V(b) = -100
If b is a final board state that is drawn, then V(b) = 0
If b is not a final board state, then V(b) = V(b'), where b' is the best final board state that can be achieved starting from b and playing optimally until the end of the game.
Gives correct values but is not operational

State Space Search
V(b) = ?
The learner's moves m1 : b → b1, m2 : b → b2, m3 : b → b3 lead to successor states b1, b2, b3.
V(b) = max_i V(bi)

State Space Search
V(b1) = ?
The opponent's replies m4 : b1 → b4, m5 : b1 → b5, m6 : b1 → b6 lead to successor states b4, b5, b6.
V(b1) = min_i V(bi)
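A minimal sketch of the alternating max/min computation implied by these two slides, assuming hypothetical helpers successors, is_final, and final_value (returning 100, -100, or 0 as on the next slide):

```python
def minimax_value(board, successors, is_final, final_value, our_turn=True):
    """Backed-up value of a board: we take the max over our moves,
    and the opponent is assumed to take the min over theirs."""
    if is_final(board):
        return final_value(board)            # 100, -100, or 0
    values = [minimax_value(child, successors, is_final, final_value,
                            not our_turn)
              for child in successors(board)]
    return max(values) if our_turn else min(values)
```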

Final Board States
Black wins: V(b) = -100
Red wins: V(b) = 100
Draw: V(b) = 0

Number of Board States
Tic-Tac-Toe:
#board states < 9!/(5! 4!) + 9!/(1! 4! 4!) + … + 9!/(2! 4! 3!) + … + 9
4 x 4 checkers: #board states = ?
#board states < 8 * 7 * 6 * 5 * 2^2 / (2! * 2!) = 1680
Regular checkers (8x8 board, 8 pieces each):
#board states < 32! * 2^16 / (8! * 8! * 16!) = 5.07 * 10^17
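These bounds can be checked with exact integer arithmetic; a small sketch (the printed values match the slide):

```python
from math import factorial as fact

# 4x4 checkers bound from the slide (the 2^2 factor is reproduced as given).
four_by_four = 8 * 7 * 6 * 5 * 2**2 // (fact(2) * fact(2))
print(four_by_four)                          # 1680

# Regular checkers as counted on the slide: 8 black, 8 red, 16 empty
# among the 32 playable squares, times 2^16.
regular = fact(32) * 2**16 // (fact(8) * fact(8) * fact(16))
print(f"{regular:.2e}")                      # 5.07e+17
```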

Choose Representation of Target Function
Table look-up
Collection of rules
Decision trees
Neural networks
Polynomial function of board features
Trade-offs in choosing an expressive representation?
– Approximation accuracy
– Number of training examples needed to learn the target function

Representation of Target Function
V(b) = w0 + w1 * bp(b) + w2 * rp(b) + w3 * bk(b) + w4 * rk(b) + w5 * bt(b) + w6 * rt(b)
bp(b): #black pieces
rp(b): #red pieces
bk(b): #black kings
rk(b): #red kings
bt(b): #red pieces threatened by black
rt(b): #black pieces threatened by red
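A one-function sketch of this linear evaluation, assuming the six feature counts have already been extracted from the board (how to extract them is not specified here):

```python
def evaluate(features, w):
    """Linear evaluation V(b) = w0 + sum_i wi * fi(b).

    features: [bp, rp, bk, rk, bt, rt] counts for board b
    w:        [w0, w1, ..., w6]
    """
    return w[0] + sum(wi * fi for wi, fi in zip(w[1:], features))

# For example, with 2 pieces per side, no kings and no threats, and the
# initial weights used in the later 4x4 example, the value is 20:
print(evaluate([2, 2, 0, 0, 0, 0], [-10, 75, -60, 0, 0, 0, 0]))  # 20
```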

Obtaining Training Examples
V(b): the true target function
V'(b): the learned target function
Vtrain(b): the training value (estimate)
Rule for estimating training values:
Vtrain(b) ← V'(Successor(b))
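A sketch of this rule in code, with the final-board case (used later in the example) handled explicitly; v_prime, is_final, and final_value are hypothetical helpers:

```python
def training_value(board, successor, v_prime, is_final, final_value):
    """Vtrain(b) <- V'(Successor(b)); at a final board, use the true outcome."""
    if is_final(board):
        return final_value(board)        # 100, -100, or 0
    return v_prime(successor)            # bootstrap from the current estimate
```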

Choose Weight Training Rule
LMS weight update rule:
Select a training example b at random.
1. Compute error(b) = Vtrain(b) - V'(b)
2. For each board feature fi, update weight wi ← wi + η * fi * error(b)
η: learning rate; try η = 0.1
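A sketch of one LMS step for the linear V', with the suggested learning rate of 0.1 (the constant name and helper names are mine, not from the slides):

```python
ETA = 0.1  # learning rate suggested on the slide

def lms_update(w, features, v_train):
    """One LMS step: wi <- wi + eta * fi * error(b), with f0 = 1 for the bias w0.

    w        : current weights [w0, w1, ..., wn]
    features : feature values [f1, ..., fn] of the training board b
    v_train  : training value Vtrain(b)
    """
    v_hat = w[0] + sum(wi * fi for wi, fi in zip(w[1:], features))
    error = v_train - v_hat
    return ([w[0] + ETA * 1 * error] +
            [wi + ETA * fi * error for wi, fi in zip(w[1:], features)])
```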

Example: 4x4 checkers
V(b) = w0 + w1 * rp(b) + w2 * bp(b)
Initial weights: w0 = -10, w1 = 75, w2 = -60
V(b0) = w0 + w1 * 2 + w2 * 2 = 20
m1 : b0 → b1, V(b1) = 20
m2 : b0 → b2, V(b2) = 20
m3 : b0 → b3, V(b3) = 20

Example: 4x4 checkers
V(b0) = 20, V(b1) = 20
1. Compute error(b0) = Vtrain(b) - V(b0) = V(b1) - V(b0) = 0
2. For each board feature fi, update weight wi ← wi + η * fi * error(b):
w0 ← w0 + 0.1 * 1 * 0
w1 ← w1 + 0.1 * 2 * 0
w2 ← w2 + 0.1 * 2 * 0
The error is 0, so the weights are unchanged.

Example: 4x4 checkers
Board diagrams: V(b0) = 20, V(b1) = 20, V(b2) = 20, V(b3) = 20

Example: 4x4 checkers
Board diagrams: V(b3) = 20; successor states V(b4a) = 20, V(b4b) = -55

Example: 4x4 checkers
V(b3) = 20, V(b4) = -55
1. Compute error(b3) = Vtrain(b) - V(b3) = V(b4) - V(b3) = -75
2. For each board feature fi, update weight wi ← wi + η * fi * error(b), starting from w0 = -10, w1 = 75, w2 = -60:
w0 ← w0 + 0.1 * 1 * (-75), giving w0 = -17.5
w1 ← w1 + 0.1 * 2 * (-75), giving w1 = 60
w2 ← w2 + 0.1 * 2 * (-75), giving w2 = -75

Example: 4x4 checkers
Updated weights: w0 = -17.5, w1 = 60, w2 = -75
Board diagrams for b4 and b5; under the updated weights both evaluate to V(b4) = V(b5) = -107.5.

Example: 4x4 checkers
V(b5) = -107.5, V(b6) = -167.5
error(b5) = Vtrain(b) - V(b5) = V(b6) - V(b5) = -60
With w0 = -17.5, w1 = 60, w2 = -75, update wi ← wi + η * fi * error(b):
w0 ← w0 + 0.1 * 1 * (-60), giving w0 = -23.5
w1 ← w1 + 0.1 * 1 * (-60), giving w1 = 54
w2 ← w2 + 0.1 * 2 * (-60), giving w2 = -87

Example: 4x4 checkers
Final board state: black won, so Vf(b6) = -100
V(b6) = -197.5
error(b6) = Vtrain(b) - V(b6) = Vf(b6) - V(b6) = 97.5
With w0 = -23.5, w1 = 54, w2 = -87, update wi ← wi + η * fi * error(b):
w0 ← w0 + 0.1 * 1 * 97.5, giving w0 = -13.75
w1 ← w1 + 0.1 * 0 * 97.5, giving w1 = 54
w2 ← w2 + 0.1 * 2 * 97.5, giving w2 = -67.5
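The arithmetic of this worked example can be replayed end to end; the following sketch (using V(b) = w0 + w1*rp + w2*bp, as in the example) reproduces the weight values above:

```python
ETA = 0.1

def value(w, rp, bp):
    """V(b) = w0 + w1*rp(b) + w2*bp(b), as in the 4x4 example."""
    return w[0] + w[1] * rp + w[2] * bp

def update(w, rp, bp, v_train):
    """One LMS step on a board with features rp, bp and training value v_train."""
    error = v_train - value(w, rp, bp)
    return [w[0] + ETA * 1 * error,
            w[1] + ETA * rp * error,
            w[2] + ETA * bp * error]

w = [-10, 75, -60]
w = update(w, 2, 2, value(w, 2, 2))   # b0, Vtrain = V(b1): error 0, unchanged
w = update(w, 2, 2, value(w, 1, 2))   # b3, Vtrain = V(b4): error -75
print(w)                              # [-17.5, 60.0, -75.0]
w = update(w, 1, 2, value(w, 0, 2))   # b5, Vtrain = V(b6): error -60
print(w)                              # [-23.5, 54.0, -87.0]
w = update(w, 0, 2, -100)             # b6 is final, black won: Vtrain = -100
print(w)                              # [-13.75, 54.0, -67.5]
```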

Checkers
Task?
Performance measure?
Training experience?
– Type of knowledge to be learned
– Target function, representation
– Getting training data
– Learning algorithm?

Questions once you define the learning problem
What can the system learn?
– Can it learn the target function? Is this an important question? Why or why not?
– A related question is "can it learn any computable function?" What factors influence the answer to this question?
– Target representation. Training procedure. Training set.
What is the system likely to learn?
– Is this the same as the first main question?
– Probabilistically.
– If it is unlikely to learn the correct answer, how bad can it get? How bad is it likely to get? What does the answer to this depend on?
How "hard" is it for the system to learn?
– How much time does it take? How much memory?
– How can I tell when it's done? If it's stuck? If it's better than another approach?
We have constrained the evaluation function to a linear sum of features. Can the true evaluation function be represented by these features in this way? Even if so, will the learning technique be able to learn the correct weights?
– Practically?
– If it can't, how bad can it get? How bad is it likely to get?
– What do we think might be a problem with how we've set this one up?

Optical Character Recognition
Task?
Performance measure?
Training experience?
– Type of knowledge to be learned
– Target function, representation
– Getting training data
– Learning algorithm?