AI in Game Programming, IT University of Copenhagen
Reinforcement Learning [Intro]
Marco Loog


Introduction
- How can an agent learn if there is no teacher around to tell it, with every action, what is right and what is wrong?
- E.g., an agent can learn how to play chess by supervised learning, given that examples of states and their correct actions are provided
- But what if these examples are not available?

Introduction
- Through random moves, i.e., exploratory behavior, the agent may be able to infer knowledge about the environment it is in
- But what is good and what is bad? This is the knowledge the agent needs in order to decide what to do to reach its goal

Introduction
- ‘Rewarding’ the agent when it does something good and ‘punishing’ it when it does something bad is called reinforcement
- The task of reinforcement learning is to use observed rewards to learn a [best] policy for the environment
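The idea of exploring by random moves and observing reinforcement can be sketched as a small interaction loop. This is only an illustration under stated assumptions: the five-state corridor, the names `step`, `explore`, and `TERMINAL_REWARDS`, and the reward values are all hypothetical and do not come from the slides.

```python
import random

# Hypothetical corridor: states 0..4; state 0 is bad (-1), state 4 is good (+1).
TERMINAL_REWARDS = {0: -1.0, 4: +1.0}

def step(state, action):
    """Move left (-1) or right (+1); return (next_state, reward, done)."""
    next_state = state + action
    reward = TERMINAL_REWARDS.get(next_state, 0.0)
    return next_state, reward, next_state in TERMINAL_REWARDS

def explore(episodes, rng):
    """Act purely at random and remember the reward observed in each state."""
    observed = {}
    for _ in range(episodes):
        state, done = 2, False          # every episode starts in the middle
        while not done:
            action = rng.choice([-1, +1])
            state, reward, done = step(state, action)
            observed[state] = reward    # no teacher: only the reward signal
    return observed

observed_rewards = explore(episodes=200, rng=random.Random(0))
```

Random exploration tells the agent which states are rewarded or punished, but not yet how to act; turning these observations into a policy is the learning problem itself.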

E.g. [D. Terzopoulos et al.]

E.g. [T. Streeter]

E.g. [K. Sims]

Reinforcement Learning
- Use observed rewards to learn an [almost?] optimal policy for an environment
- Reward R(s) assigns a number to every state s
- Utility of an environment history is [as an example] the sum of the rewards received
- Policy describes the agent’s action in any state s in order to reach the goal
- Optimal policy is the policy with the highest expected utility
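These definitions can be made concrete with a small sketch. Everything specific here is an assumption for illustration: the corridor of states 0..4, the reward values in `REWARD`, the `slip` probability, and the helper names are not from the slides. The expected utility of a policy is estimated by averaging the utility of many sampled histories.

```python
import random

# Assumed toy rewards R(s): small cost per step, terminals at states 0 and 4.
REWARD = {0: -1.0, 1: -0.04, 2: -0.04, 3: -0.04, 4: +1.0}

def utility(history):
    """Utility of an environment history: the sum of the rewards received."""
    return sum(REWARD[s] for s in history)

def run(policy, rng, start=2, slip=0.2):
    """Follow the policy; with probability `slip` the move goes the other way."""
    state, history = start, [start]
    while state not in (0, 4):
        move = policy[state]            # policy maps state -> action
        if rng.random() < slip:
            move = -move
        state += move
        history.append(state)
    return history

def expected_utility(policy, rng, samples=5000):
    """Monte Carlo estimate of a policy's expected utility."""
    return sum(utility(run(policy, rng)) for _ in range(samples)) / samples

go_right = {1: +1, 2: +1, 3: +1}
go_left = {1: -1, 2: -1, 3: -1}
u_right = expected_utility(go_right, random.Random(0))
u_left = expected_utility(go_left, random.Random(0))
```

In this assumed environment `go_right` has the higher estimated expected utility, so of the two candidates it is the better policy. Note that the estimate requires being able to simulate the environment, which a learning agent normally cannot do in advance.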

Rewards, Utilities, &c.
[Figure: example environment with rewards +1 and -1]

Reinforcement Learning
- How to learn a policy like the previous one?
- Complicating factors
  - Normally, both the environment and the reward function are unknown
- In many complex domains, reinforcement learning is the only feasible way to success

Reinforcement Learning
- Might be considered to encompass all of AI: an agent is dropped off somewhere and has to figure everything out by itself
- We will concentrate on simple settings and agent designs to keep things manageable
- E.g., a fully observable environment

3 Agent Designs
- Utility-based agent: learns a utility function, based on which it chooses actions
- Q-learning agent: learns an action-value function giving the expected utility of taking a given action in a given state
- Reflex agent: learns a policy that maps directly from states to actions
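As a rough preview of the second design, the following is a minimal tabular Q-learning sketch. The slides do not give an update rule; the one used here is the standard textbook Q-learning update, and the corridor environment, the parameters `alpha`, `gamma`, and `epsilon`, and all function names are assumptions made for illustration. The last line derives a reflex-style policy (a direct state-to-action map) from the learned action values.

```python
import random

ACTIONS = (-1, +1)                       # left, right
TERMINAL_REWARDS = {0: -1.0, 4: +1.0}    # hypothetical corridor, states 0..4

def step(state, action):
    """Return (next_state, reward, done) for a deterministic move."""
    next_state = state + action
    reward = TERMINAL_REWARDS.get(next_state, 0.0)
    return next_state, reward, next_state in TERMINAL_REWARDS

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    # Action-value table Q(s, a) for the non-terminal states.
    q = {(s, a): 0.0 for s in (1, 2, 3) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 2, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)                        # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Standard Q-learning update toward reward + discounted best next value.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q

q = q_learning()
# Reflex-style policy read off the table: the best-valued action per state.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in (1, 2, 3)}
```

The agent never consults `TERMINAL_REWARDS` directly when choosing actions; it only sees the rewards returned by `step`, which matches the setting where the environment and reward function are unknown.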

More
- Next week...
