CS 4700: Foundations of Artificial Intelligence Bart Selman Reinforcement Learning R&N – Chapter 21 Note: in the next two parts of RL, some of the figure/section numbers refer to an earlier edition of R&N with a more basic description of the techniques. The slides provide a self-contained description.

Reinforcement Learning In our discussion of search methods (developed for problem solving), we assumed a given state space and operators that lead from one state to one or more successor states, with a possible operator cost. The state space can be exponentially large but is in principle known. The difficulty was finding the right path (sequence of moves). This problem is solved by searching through the various alternative sequences of moves. In tough spaces, this leads to exponential searches. Can we do something totally different?? Avoid search…

Why don’t we “just learn” how to make the right move in each possible state? In principle, we need to know very little about the environment at the start. Simply observe another agent / human / program make steps (go from state to state) and mimic! Reinforcement learning: some of the earliest AI research (1960s). It works! The principles and ideas are still applicable today.

The environment we consider is a basic game (the simplest non-trivial game): Tic-Tac-Toe. The question: can you write a program that learns to play Tic-Tac-Toe? Let’s try to re-discover what Donald Michie did in the early 1960s. He did not even use a computer! He hand-simulated one. The first non-trivial machine learning program!

[Figure: game tree for 3x3 Tic-Tac-Toe (also known as Noughts and Crosses, or Xs and Os) under optimal play, alternating “X’s turn” and “O’s turn,” starting from a position with 3 moves per player; one branch ends in a loss, which we don’t want.]

What else can we think of? Basic ingredients needed:
1) A way to represent board states.
2) A way to decide what moves to make in different states.
It may help to think a bit probabilistically… pick moves with some probability and adjust those probabilities through a learning procedure…

Learn from a human opponent. We could try to learn directly from a human what moves to make… But there are some issues:
1) The human may be a weak player. We want to learn how to beat him/her!
2) The human may play “nought” (second player) while the computer wants to learn how to play “cross” (first player).
Answer: let’s try to “just play” human against machine and learn something from wins and losses.

To start: some basics of the “machine.” For each board state where cross is to move, have a “matchbox” labeled with that state. This requires a few hundred matchboxes.

Each matchbox holds a number of colored “beads”; each color represents a valid move for cross on that board. E.g., start with ten beads of each color for each valid move.
1) To make a move, pick up the box labeled with the current state, shake it, and draw a random bead. Its color determines the move.
2) In the new state, wait for the human’s counter-move; then repeat the above from the resulting state.
(A small code sketch of these mechanics follows below.)
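A minimal sketch of these mechanics in Python. The encoding (a state as a 9-character string, a bead as the index of an empty square) and the names boxes / get_box / choose_move are illustrative choices for this sketch, not Michie's exact setup:

    import random

    # One "matchbox" per board state where cross (X) is to move.
    # A state is a 9-character string, e.g. "X.O......"; a bead is the
    # index of an empty square, so each bead "color" is one legal move.
    boxes = {}

    def get_box(state):
        # First visit to a state: fill its box with ten beads per legal move.
        if state not in boxes:
            legal = [i for i, c in enumerate(state) if c == "."]
            boxes[state] = [m for m in legal for _ in range(10)]
        return boxes[state]

    def choose_move(state):
        # Shake the box and draw a random bead; its color is the move.
        return random.choice(get_box(state))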

The game ends when one of the players has a win / loss or there are no more open squares. This is how the machine plays. How well will it play? What is it doing initially? (Purely random moves.) The machine needs to learn! How? Can you think of a strategy? The first successful machine learning program in history (not involving search)… Let’s try to come up with a strategy… What do we need to do?

Reinforcement Learning
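The transcript does not reproduce this slide’s detail, but the learning rule is the heart of the scheme: after each game, reward the beads drawn during a win and confiscate those drawn during a loss. A hedged sketch extending the boxes / get_box code above; the exact bead deltas (3 for a win, 1 for a draw) follow common accounts of Michie’s MENACE machine and are an assumption here, not taken from the slides:

    def update(history, outcome):
        # history: list of (state, move) pairs for the machine's own moves
        # outcome: +1 win, 0 draw, -1 loss (from the machine's perspective)
        for state, move in history:
            box = get_box(state)
            if outcome > 0:
                box.extend([move] * 3)   # win: add beads of each color played
            elif outcome == 0:
                box.append(move)         # draw: a small reward
            elif box.count(move) > 1:
                box.remove(move)         # loss: confiscate a bead (keep >= 1)

Over many games, moves that tend to lead to wins accumulate beads and so are drawn more often; losing moves die out.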

It works!!! It doesn’t need that many games. Quite surprising! It’s not even clear you can “deep learn” this.

Comments. Learning in this case took “advantage of”:
1) The state space is manageable. It is further reduced by using one state to represent all isomorphic states (through board rotations and reflections; see the sketch below). We quietly encoded some knowledge about tic-tac-toe!
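One way to implement that reduction, continuing the string-board sketch above: map every state to a canonical representative under the 8 symmetries of the square, so all isomorphic boards share a single matchbox. An illustrative sketch, not from the slides:

    # The 8 symmetries of the 3x3 board (the dihedral group of the square).
    ROT = [6, 3, 0, 7, 4, 1, 8, 5, 2]   # 90-degree clockwise rotation
    REF = [2, 1, 0, 5, 4, 3, 8, 7, 6]   # left-right mirror

    def apply(perm, s):
        return "".join(s[perm[i]] for i in range(9))

    def canonical(state):
        # Collect all 8 symmetric variants and pick one fixed representative.
        variants, s = [], state
        for _ in range(4):
            variants.append(s)
            variants.append(apply(REF, s))
            s = apply(ROT, s)
        return min(variants)

A bead drawn for the canonical board must then be mapped back through the symmetry to a move on the actual board, a detail omitted here.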

2) What if the state space is MUCH larger? As for any interesting game… Options:
a) Represent the board by “features,” e.g., the number of the various pieces on a chess board but not their positions. It’s like having each matchbox represent a large collection of states. The notion of “valid moves” becomes a bit trickier.
b) Don’t store “matchboxes” / states explicitly; instead learn a function (e.g., a neural net) that computes the right move directly when given some representation of the state as input.
c) A combination of a) and b).
d) Combine a), b), and c) with some form of “look-ahead” search.
(A toy sketch of a) and b) follows this list.)
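A toy illustration of options a) and b) on tic-tac-toe itself: score each legal move with a linear function over hand-chosen features instead of a per-state matchbox. The two features and the weights are invented for this sketch; in a real learner the weights would be adjusted from wins and losses, or the linear function replaced by a neural net:

    LINES = [(0,1,2), (3,4,5), (6,7,8), (0,3,6),
             (1,4,7), (2,5,8), (0,4,8), (2,4,6)]

    def features(state, move):
        # Features of the board after X plays `move`: how many lines now
        # hold two X's plus an empty square, and whether we took the center.
        s = state[:move] + "X" + state[move + 1:]
        two_in_a_row = sum(1 for a, b, c in LINES
                           if [s[a], s[b], s[c]].count("X") == 2
                           and [s[a], s[b], s[c]].count(".") == 1)
        return [two_in_a_row, 1 if move == 4 else 0]

    weights = [1.0, 0.5]   # illustrative; these would be learned from play

    def best_move(state):
        legal = [i for i, c in enumerate(state) if c == "."]
        return max(legal, key=lambda m: sum(w * f
                   for w, f in zip(weights, features(state, m))))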