AI in Game Programming, IT University of Copenhagen: Reinforcement Learning [Intro], Marco Loog

Presentation transcript:


Introduction
- How can an agent learn if there is no teacher around to tell it, for every action, what is right and what is wrong?
- E.g., an agent can learn to play chess by supervised learning, provided that examples of states and their correct actions are available
- But what if these examples are not available?
- Through random moves, i.e., exploratory behavior, the agent may be able to infer knowledge about the environment it is in
- But what is good and what is bad? This is the knowledge the agent needs to decide what to do in order to reach its goal
- ‘Rewarding’ the agent when it did something good and ‘punishing’ it when it did something bad is called reinforcement
- The task of reinforcement learning is to use observed rewards to learn a [best] policy for the environment
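The idea of inferring knowledge about the environment through random moves can be sketched as follows. This is only an illustration under stated assumptions: the corridor environment, the `step` function and the `explore` helper are hypothetical and do not come from the slides. The agent moves at random and tallies which next state it observed for each state-action pair.

```python
import random
from collections import Counter, defaultdict

ACTIONS = ["left", "right"]

def step(state: int, action: str) -> int:
    """Hypothetical environment: a corridor of cells 0..4, moves clamped at the walls."""
    delta = 1 if action == "right" else -1
    return min(4, max(0, state + delta))

def explore(n_steps: int, seed: int = 0):
    """Take random moves and count the observed (state, action) -> next state outcomes."""
    rng = random.Random(seed)          # seeded so the run is repeatable
    counts = defaultdict(Counter)
    state = 2                          # assumed start cell in the middle of the corridor
    for _ in range(n_steps):
        action = rng.choice(ACTIONS)   # purely exploratory: no notion of good or bad yet
        nxt = step(state, action)
        counts[(state, action)][nxt] += 1
        state = nxt
    return counts

observed = explore(200)
for (state, action), outcomes in sorted(observed.items()):
    print(state, action, dict(outcomes))
```

The counts tell the agent how the environment responds to its actions, but nothing about which states are desirable. That missing piece is what the rewards supply.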

E.g. [D. Terzopoulos et al.]

E.g. [T. Streeter]

E.g. [K. Sims]

Reinforcement Learning
- Use observed rewards to learn an [almost?] optimal policy for an environment
- Reward R(s) assigns a number to every state s
- Utility of an environment history is [as an example] the sum of the rewards received
- Policy describes the agent’s action from any state s in order to reach the goal
- Optimal policy is the policy with the highest expected utility
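A minimal sketch of these definitions, assuming a tiny deterministic environment. The state names, the reward values and the `TRANSITIONS` table are hypothetical and chosen only for illustration. The reward table plays the role of R(s), `utility` sums the rewards along an environment history, and a policy is a plain mapping from states to actions.

```python
from typing import Dict, List, Tuple

# Hypothetical reward function R(s): every state is assigned a number.
REWARD: Dict[str, float] = {"start": -0.04, "middle": -0.04, "goal": 1.0, "pit": -1.0}

# Hypothetical deterministic transitions: (state, action) -> next state.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("start", "right"): "middle",
    ("start", "left"): "pit",
    ("middle", "right"): "goal",
    ("middle", "left"): "start",
}

def utility(history: List[str]) -> float:
    """Utility of an environment history, taken here as the sum of the rewards received."""
    return sum(REWARD[s] for s in history)

def rollout(policy: Dict[str, str], state: str = "start", max_steps: int = 10) -> List[str]:
    """Follow the policy until a state without a prescribed action (a terminal state) is reached."""
    history = [state]
    for _ in range(max_steps):
        if state not in policy:
            break
        state = TRANSITIONS[(state, policy[state])]
        history.append(state)
    return history

good_policy = {"start": "right", "middle": "right"}
bad_policy = {"start": "left", "middle": "right"}

print(rollout(good_policy), utility(rollout(good_policy)))
print(rollout(bad_policy), utility(rollout(bad_policy)))
```

Because this toy environment is deterministic, the expected utility of a policy is simply the utility of its single rollout. In a stochastic environment one would average over many histories instead.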

Rewards, Utilities, &c.
[Figure: states with rewards +1 and -1]

Reinforcement Learning
- How to learn a policy like the previous one?
- Complicating factors
  - Normally, both the environment and the reward function are unknown
- In many complex domains reinforcement learning is the only feasible route to success

Reinforcement Learning
- Might be considered to encompass all of AI: an agent is dropped off somewhere and has to figure everything out by itself
- We will concentrate on simple settings and agent designs to keep things manageable
- E.g., a fully observable environment

3 Agent Designs
- Utility-based agent: learns a utility function, on the basis of which it chooses actions
- Q-learning agent: learns an action-value function giving the expected utility of taking a given action in a given state
- Reflex agent: learns a policy that maps directly from states to actions
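The difference between the three designs lies in what is learned and how it is then used to pick an action. The sketch below shows only the action-selection side, not the learning itself. All names, the two-action setup and the numbers are hypothetical. One assumption worth flagging: for the utility-based agent to choose actions from U(s), it also needs a model of where each action leads, here given as a table of (probability, next state) pairs.

```python
from typing import Dict, List, Tuple

State = str
Action = str
ACTIONS: List[Action] = ["left", "right"]

def utility_based_action(state: State,
                         U: Dict[State, float],
                         model: Dict[Tuple[State, Action], List[Tuple[float, State]]]) -> Action:
    """Utility-based agent: pick the action whose outcomes have the highest expected utility."""
    def expected_utility(a: Action) -> float:
        return sum(p * U[s2] for p, s2 in model[(state, a)])
    return max(ACTIONS, key=expected_utility)

def q_learning_action(state: State, Q: Dict[Tuple[State, Action], float]) -> Action:
    """Q-learning agent: pick the action with the highest action value; no model needed."""
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def reflex_action(state: State, policy: Dict[State, Action]) -> Action:
    """Reflex agent: look the action up directly in the learned policy."""
    return policy[state]

# Hypothetical learned quantities for a single decision state "s".
U = {"pit": -1.0, "goal": 1.0}
model = {
    ("s", "left"): [(0.8, "pit"), (0.2, "goal")],
    ("s", "right"): [(0.8, "goal"), (0.2, "pit")],
}
Q = {("s", "left"): -0.6, ("s", "right"): 0.6}
policy = {"s": "right"}

print(utility_based_action("s", U, model), q_learning_action("s", Q), reflex_action("s", policy))
```

In this toy setup the three agents agree because the Q-values were set equal to the expected utilities computed from U and the model, and the policy stores the resulting best action.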

More
- Next week...