Well Posed Learning Problems
Must identify the following 3 features:
–Learning task: the thing you want to learn.
–Performance measure: you must know when you did well and when you did badly. Often called the critic.
–Training experience: you must know how you are going to get the data the system will be trained on.

Checkers
T: play checkers
P: percentage of games won
What experience?
What exactly should be learned?
How shall it be represented?
What specific algorithm to learn it?

Type of Training Experience
Direct or indirect?
–Direct: board state -> correct move
–Indirect: outcome of a complete game
–Credit assignment problem
Teacher or not?
–Teacher selects board states
–Learner can select board states
–Randomly selected board states?
Is the training experience representative of the performance goal?
–Training: playing against itself
–Performance: evaluated playing against the world champion

Choose Target Function
ChooseMove : B → M : board state → move
–Maps a legal board state to a legal move
Evaluate : B → V : board state → board value
–Assigns a numerical score to any given board state, such that better board states obtain a higher score
–Select the best move by evaluating all successor states of legal moves and picking the one with the maximal score
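A minimal Python sketch of this reduction, assuming hypothetical helpers legal_moves(b) and apply_move(b, m) and a learned evaluate(b): ChooseMove simply scores every successor board and returns the move with the highest value.

    def choose_move(board, legal_moves, apply_move, evaluate):
        # Evaluate the board reached by each legal move and pick the best one.
        return max(legal_moves(board),
                   key=lambda m: evaluate(apply_move(board, m)))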

Possible Definition of Target Function
If b is a final board state that is won, then V(b) = 100
If b is a final board state that is lost, then V(b) = -100
If b is a final board state that is drawn, then V(b) = 0
If b is not a final board state, then V(b) = V(b'), where b' is the best final board state that can be achieved starting from b and playing optimally until the end of the game.
Gives correct values but is not operational
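To see why this definition is correct but not operational, here is a hedged sketch; the predicates is_final, won, drawn and the generator successors are assumed helpers. Computing V(b) this way requires searching every line of play to the end of the game.

    def v_true(b, my_turn=True):
        # Direct transcription of the definition above; impractical for real games.
        if is_final(b):
            if drawn(b):
                return 0
            return 100 if won(b) else -100
        values = [v_true(s, not my_turn) for s in successors(b)]
        return max(values) if my_turn else min(values)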

State Space Search
V(b) = ?
V(b) = max_i V(b_i)
m_1 : b → b_1
m_2 : b → b_2
m_3 : b → b_3

State Space Search
V(b_1) = ?
V(b_1) = min_i V(b_i)
m_4 : b → b_4
m_5 : b → b_5
m_6 : b → b_6

Final Board States
Black wins: V(b) = -100
Red wins: V(b) = 100
Draw: V(b) = 0

Number of Board States
Tic-Tac-Toe:
#board states < 9!/(5! 4!) + 9!/(1! 4! 4!) + … + 9!/(2! 4! 3!) + … + 9
4x4 checkers: #board states = ?
#board states < 8*7*6*5*2^2/(2!*2!) = 1680
Regular checkers (8x8 board, 8 pieces each):
#board states < 32!*2^16/(8!*8!*16!) = 5.07*10^17
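These bounds are simple counting expressions; a quick check in Python (using math.factorial) reproduces the two checkers numbers quoted above.

    from math import factorial as fact

    # Upper bounds quoted on the slide
    four_by_four = 8 * 7 * 6 * 5 * 2**2 // (fact(2) * fact(2))    # 1680
    regular = fact(32) * 2**16 // (fact(8) * fact(8) * fact(16))  # ~5.07e17
    print(four_by_four, f"{regular:.2e}")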

Choose Representation of Target Function
Table look-up
Collection of rules
Neural networks
Polynomial function of board features
Trade-offs in choosing an expressive representation:
–Approximation accuracy
–Number of training examples to learn the target function

Representation of Target Function
V(b) = w0 + w1 bp(b) + w2 rp(b) + w3 bk(b) + w4 rk(b) + w5 bt(b) + w6 rt(b)
bp(b): #black pieces
rp(b): #red pieces
bk(b): #black kings
rk(b): #red kings
bt(b): #red pieces threatened by black
rt(b): #black pieces threatened by red
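A small sketch of this linear form in Python; the six feature extractors are assumed to be given as functions that count pieces on a board object.

    def evaluate(board, w, features):
        # w = [w0, w1, ..., w6]; features = [bp, rp, bk, rk, bt, rt]
        return w[0] + sum(wi * f(board) for wi, f in zip(w[1:], features))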

Obtaining Training Examples
V(b): true target function
V'(b): learned target function
V_train(b): training value (estimate)
Rule for estimating training values:
V_train(b) ← V'(Successor(b))

Choose Weight Training Rule
LMS weight update rule:
Select a training example b at random
1. Compute error(b):
   error(b) = V_train(b) – V'(b)
2. For each board feature f_i, update weight:
   w_i ← w_i + η f_i error(b)
η: try learning rate = 0.1
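Putting the training-value rule and the LMS step together, a hedged Python sketch; successor(b) is assumed to return the next board state in the game trace, and evaluate is the linear function sketched earlier.

    def lms_update(board, w, features, successor, evaluate, eta=0.1):
        # V_train(b) <- V'(Successor(b)), then one LMS step on the weights.
        v_train = evaluate(successor(board), w, features)
        error = v_train - evaluate(board, w, features)   # error(b) = V_train(b) - V'(b)
        w[0] += eta * 1 * error                          # constant feature f0 = 1
        for i, f in enumerate(features, start=1):
            w[i] += eta * f(board) * error
        return w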

Example: 4x4 Checkers
V(b) = w0 + w1 rp(b) + w2 bp(b)
Initial weights: w0 = -10, w1 = 75, w2 = -60
V(b0) = w0 + w1*2 + w2*2 = 20
m1 : b → b1, V(b1) = 20
m2 : b → b2, V(b2) = 20
m3 : b → b3, V(b3) = 20

Example: 4x4 Checkers
V(b0) = 20, V(b1) = 20
1. Compute error(b0) = V_train(b) – V(b0) = V(b1) – V(b0) = 0
2. For each board feature f_i, update weight w_i ← w_i + η f_i error(b):
   w0 ← w0 + 0.1 * 1 * 0
   w1 ← w1 + 0.1 * 2 * 0
   w2 ← w2 + 0.1 * 2 * 0
   (weights unchanged)

Example: 4x4 Checkers
V(b0) = 20, V(b1) = 20, V(b2) = 20, V(b3) = 20

Example: 4x4 Checkers
V(b3) = 20
V(b4a) = 20
V(b4b) = -55

Example: 4x4 Checkers
V(b3) = 20, V(b4) = -55
1. Compute error(b3) = V_train(b) – V(b3) = V(b4) – V(b3) = -75
2. For each board feature f_i, update weight w_i ← w_i + η f_i error(b), starting from w0 = -10, w1 = 75, w2 = -60:
   w0 ← w0 + 0.1 * 1 * (-75), w0 = -17.5
   w1 ← w1 + 0.1 * 2 * (-75), w1 = 60
   w2 ← w2 + 0.1 * 2 * (-75), w2 = -75

Example: 4x4 Checkers
Updated weights: w0 = -17.5, w1 = 60, w2 = -75
V(b4) = -107.5, V(b5) = -107.5

Example: 4x4 Checkers
V(b5) = -107.5, V(b6) = -167.5
error(b5) = V_train(b) – V(b5) = V(b6) – V(b5) = -60
Current weights: w0 = -17.5, w1 = 60, w2 = -75
w_i ← w_i + η f_i error(b):
   w0 ← w0 + 0.1 * 1 * (-60), w0 = -23.5
   w1 ← w1 + 0.1 * 1 * (-60), w1 = 54
   w2 ← w2 + 0.1 * 2 * (-60), w2 = -87

Example: 4x4 Checkers
Final board state: black won, so V_f(b) = -100
V(b6) = -197.5 (with w0 = -23.5, w1 = 54, w2 = -87)
error(b6) = V_train(b) – V(b6) = V_f(b6) – V(b6) = 97.5
w_i ← w_i + η f_i error(b):
   w0 ← w0 + 0.1 * 1 * 97.5, w0 = -13.75
   w1 ← w1 + 0.1 * 0 * 97.5, w1 = 54
   w2 ← w2 + 0.1 * 2 * 97.5, w2 = -67.5
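To check the arithmetic of the whole trace, here is a short Python replay under the assumption (inferred from the update coefficients above) that each state's only features are the red and black piece counts (rp, bp); it ends with the final weights shown on this slide.

    def v(w, rp, bp):                          # V'(b) = w0 + w1*rp + w2*bp
        return w[0] + w[1] * rp + w[2] * bp

    w = [-10.0, 75.0, -60.0]                   # initial weights
    eta = 0.1
    # (rp, bp) of the board being updated and of its successor
    steps = [((2, 2), (2, 2)),                 # b0 -> b1: error 0
             ((2, 2), (1, 2)),                 # b3 -> b4: error -75
             ((1, 2), (0, 2))]                 # b5 -> b6: error -60
    for (rp, bp), (rp2, bp2) in steps:
        err = v(w, rp2, bp2) - v(w, rp, bp)
        w = [w[0] + eta * err, w[1] + eta * rp * err, w[2] + eta * bp * err]
    err = -100 - v(w, 0, 2)                    # final board b6: black won, V_f = -100
    w = [w[0] + eta * err, w[1] + eta * 0 * err, w[2] + eta * 2 * err]
    print(w)                                   # approximately [-13.75, 54.0, -67.5]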

Questions
We have constrained the evaluation function to a linear sum of features. Can the true evaluation function be represented by these features in this way?
Even if so, will the learning technique be able to learn the correct weights?
–Practically?
–If it can't, how bad can it get? How bad is it likely to get?

Optical Character Recognition
Task?
Performance measure?
Training experience?
–Type of knowledge to be learned
–Target function, representation
–Getting training data
–Learning algorithm?