Class 2
Please read Chapter 2 for Tuesday's class (response due by 3pm on Monday).
How was Piazza? Any questions?


Reinforcement Learning: a problem, not an approach
What if you learn perfectly for level 1 and then go to level 2? Trial-and-error again? (Gabe)

Sequential Decision Making: RL
[Diagram: agent-environment loop. The agent sends an Action to the Environment; the Environment returns a State and a Reward.]
Markov Decision Process (MDP)
S: set of states in the world
A: set of actions an agent can perform
T: S × A → S (transition function)
R: S → ℝ (environmental reward)
π: S → A (policy)
Q: S × A → ℝ (action-value function)
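A minimal sketch of how these pieces could be laid out in code, using a toy deterministic MDP. The state names, actions, and reward values here are illustrative assumptions, not from the slides:

```python
import random

# Toy deterministic MDP matching the slide's definitions:
#   S: states, A: actions, T: S x A -> S, R: S -> real-valued reward.
states = ["s0", "s1", "goal"]
actions = ["left", "right"]

T = {  # transition function T(s, a) -> next state
    ("s0", "right"): "s1",   ("s0", "left"): "s0",
    ("s1", "right"): "goal", ("s1", "left"): "s0",
    ("goal", "right"): "goal", ("goal", "left"): "goal",
}

R = {"s0": 0.0, "s1": 0.0, "goal": 1.0}               # reward R(s)
Q = {(s, a): 0.0 for s in states for a in actions}    # action-value table Q(s, a)

def policy(s):
    """Greedy policy pi: S -> A derived from the Q-table (ties broken randomly)."""
    best = max(Q[(s, a)] for a in actions)
    return random.choice([a for a in actions if Q[(s, a)] == best])
```

In many real domains T would be stochastic (a distribution over next states), but the shapes of the objects mirror the definitions above.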

Reinforcement Learning (RL)
Basis in psychology (dopamine)
Very limited feedback; delayed reward
Very flexible
Possibly very slow / high sample complexity
Agent-centric
Examples: TD-Gammon (Tesauro, 1995); Aibo learned gait (Kohl & Stone, 2004)
Aside: self-play with the same algorithm or different ones? (Josh / Anthony)

Tic-Tac-Toe example
Minimax
Dynamic programming
Evolutionary approach – scores an entire policy; what happens during the game is ignored
(Approximate) value functions
Exploration/exploitation – greed is often good
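A rough sketch of the value-function idea behind the tic-tac-toe example, in the spirit of the temporal-difference update from Sutton and Barto's Chapter 1. The step size, exploration rate, board encoding, and the result_of helper are assumptions for illustration:

```python
import random
from collections import defaultdict

alpha = 0.1                     # step size (assumed)
epsilon = 0.1                   # exploration rate (assumed)
V = defaultdict(lambda: 0.5)    # value estimate per board state; 0.5 = "unknown"

def choose_move(state, legal_moves, result_of):
    """Mostly greedy on V, occasionally exploratory.
       result_of(state, move) is a hypothetical helper returning the next board state."""
    if random.random() < epsilon:
        return random.choice(legal_moves)                              # explore
    return max(legal_moves, key=lambda m: V[result_of(state, m)])      # exploit

def td_update(state, next_state):
    """Back up the later position's value toward the earlier one:
       V(s) <- V(s) + alpha * (V(s') - V(s))."""
    V[state] += alpha * (V[next_state] - V[state])
```

Unlike the evolutionary approach above, every position visited during a game contributes to learning through these backups.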

Tic-Tac-Toe example
Aside: non-TD methods for learning value functions? (Bei)

Symmetries in tic-tac-toe (David)
Opponent modeling vs. optimal play (Anthony)

Chris: UCT and UCB1
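For reference, a small sketch of the UCB1 selection rule that UCT applies at each node of its search tree. The function name and the exploration constant c are illustrative choices:

```python
import math

def ucb1_choice(counts, values, c=math.sqrt(2)):
    """Pick the action maximizing  value + c * sqrt(ln(total) / n_a).
       counts[a] = times action a was tried, values[a] = its average reward so far."""
    total = sum(counts.values())
    def score(a):
        if counts[a] == 0:
            return float("inf")   # try every action at least once
        return values[a] + c * math.sqrt(math.log(total) / counts[a])
    return max(counts, key=score)
```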

Consider the following learning problem. You are faced repeatedly with a choice among n different options, or actions. After each choice you receive a numerical reward chosen from a stationary probability distribution that depends on the action you selected. Your objective is to maximize the expected total reward over some time period, for example, over 1000 action selections. Each action selection is called a play.
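A sketch of this n-armed bandit setup with a simple epsilon-greedy learner using incremental sample-average estimates. The Gaussian reward distributions, 10 arms, and epsilon = 0.1 are assumptions; the 1000 plays come from the problem statement above:

```python
import random

n = 10                                                 # number of actions (arms), assumed
true_means = [random.gauss(0, 1) for _ in range(n)]    # stationary reward distributions
Q = [0.0] * n                                          # estimated value of each action
counts = [0] * n
epsilon = 0.1

total_reward = 0.0
for play in range(1000):                               # 1000 action selections
    if random.random() < epsilon:
        a = random.randrange(n)                        # explore
    else:
        a = max(range(n), key=lambda i: Q[i])          # exploit current estimates
    reward = random.gauss(true_means[a], 1)            # reward drawn from arm a's distribution
    counts[a] += 1
    Q[a] += (reward - Q[a]) / counts[a]                # incremental sample-average update
    total_reward += reward
```

Setting epsilon to 0 gives a purely greedy player, which makes the explore-vs-exploit question on the next slide concrete.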

First, what is the optimal policy if payoffs are deterministic? Evolutionary approach? Value-based approach? Explore vs. exploit in this context?