AI in Game Programming, IT University of Copenhagen: Reinforcement Learning [Outro], Marco Loog.

Presentation transcript:

1 AI in Game Programming, IT University of Copenhagen
Reinforcement Learning [Outro]
Marco Loog

2 Rationale
- How can an agent learn if there is no teacher around to tell it, for every action, what is right and what is wrong?
- E.g., an agent can learn how to play chess by supervised learning, given that examples of states and their correct actions are provided
- But what if these examples are not available?

3 Rationale
- But what if these examples are not available?
- Through random moves, i.e., exploratory behavior, the agent may be able to infer knowledge about the environment it is in
- But what is good and what is bad? This is exactly the knowledge needed to decide what to do in order to reach the goal

4 Rationale
- But what is good and what is bad? This is exactly the knowledge needed to decide what to do in order to reach the goal
- 'Rewarding' the agent when it did something good and 'punishing' it when it did something bad is called reinforcement
- The task of reinforcement learning is to use observed rewards to learn a [best] policy for the environment

5 Reinforcement Learning
- Use observed rewards to learn an [almost?] optimal policy for an environment
- The reward R(s) assigns a number to every state s
- The utility of an environment history is [as an example] the sum of the rewards received
- A policy describes the agent's action from any state s in order to reach the goal
- The optimal policy is the policy with the highest expected utility

6 Reinforcement Learning
- Might be considered to encompass all of AI: an agent is dropped off somewhere and has to figure everything out by itself
- We will concentrate on simple settings and agent designs to keep things manageable
- E.g., a fully observable environment

7 Typically in Games
- Offline / during development: episodic reinforcement learning
  - Multiple training instances / several runs from start to end
- Online / during actual game playing: incremental reinforcement learning
  - One continuous sequence of states / possibly without a clear 'end'

8 3 Agent Designs
- Utility-based agent: learns a utility function, based on which it chooses actions
- Q-learning agent: learns an action-value function giving the expected utility of taking a given action in a given state
- Reflex agent: learns a policy that maps directly from states to actions

9 Passive Reinforcement
- The policy is fixed: state s always leads to the same action
- The goal is simply to learn how good this policy is
- [Of course, this can be extended 'easily' to policy learning...]

10 Direct Utility Estimation
- Idea: the utility of a state is the expected total reward from that state onward
- Each trial provides a sample of this value for each state visited
- After a trial, the utility of each observed state is simply updated using a running average
- In the limit, the sample average converges to the true expectation
- Direct utility estimation = standard supervised inductive learning
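The running-average idea on this slide can be sketched as follows; a minimal Python sketch, assuming each trial is a list of (state, reward) pairs and that the return is the plain (undiscounted, gamma = 1) sum of remaining rewards. The state names and reward values below are invented for illustration.

```python
from collections import defaultdict

def direct_utility_estimation(trials, gamma=1.0):
    """Average the sampled returns observed for each state.

    trials: list of trials, each a list of (state, reward) pairs
    from the start of an episode to its end.
    """
    totals = defaultdict(float)  # sum of sampled returns per state
    counts = defaultdict(int)    # number of samples per state
    for trial in trials:
        for i, (state, _) in enumerate(trial):
            # Sampled return = discounted sum of rewards from step i onward.
            ret = sum(gamma ** k * r for k, (_, r) in enumerate(trial[i:]))
            totals[state] += ret
            counts[state] += 1
    return {s: totals[s] / counts[s] for s in totals}

# Two invented trials in a toy world with a -0.04 step cost.
trials = [
    [("A", -0.04), ("B", -0.04), ("GOAL", 1.0)],
    [("A", -0.04), ("GOAL", 1.0)],
]
U = direct_utility_estimation(trials)
```

State A is visited in both trials with sampled returns 0.92 and 0.96, so its estimate is their average, 0.94; more trials simply refine these averages.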

11 More Direct Utility Estimation
- The 'reduction' of the problem to 'standard learning' is nice [of course]
- However, an important source of information is not used: the utilities of states are not independent
- The utility of each state is its own reward + the expected utility of its successor states
  - Bellman equations
- Using this prior knowledge can improve [e.g., speed up] learning considerably
  - As is generally the case

12 Adaptive Dynamic Programming
- Take the constraints between states into account
- A passive learning agent learns based on the observed rewards and a transition model
- The latter models the probability of reaching state s' from state s when performing action a(s)
- Two possibilities:
  - Solve the system of linear equations [for small systems]
  - Update iteratively
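The 'update iteratively' option amounts to repeatedly applying the Bellman equation U(s) = R(s) + gamma * sum over s' of P(s'|s, a(s)) * U(s') under the fixed policy; a minimal sketch, where the two-state chain, gamma, and iteration count are all invented for illustration.

```python
def policy_evaluation(R, P, gamma=0.9, iters=100):
    """Iteratively evaluate a fixed policy.

    R: dict mapping state -> reward.
    P: dict mapping state -> list of (prob, next_state) pairs
       for the transition taken under the fixed policy.
    """
    U = {s: 0.0 for s in R}
    for _ in range(iters):
        # One synchronous sweep of the Bellman update over all states.
        U = {s: R[s] + gamma * sum(p * U[s2] for p, s2 in P[s]) for s in R}
    return U

# Invented toy chain: A moves deterministically to B; B absorbs.
R = {"A": 0.0, "B": 1.0}
P = {"A": [(1.0, "B")], "B": [(1.0, "B")]}
U = policy_evaluation(R, P)
```

For this chain the updates converge toward U(B) = 1 / (1 - 0.9) = 10 and U(A) = 0.9 * U(B) = 9, illustrating how the constraints between states propagate value backward.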

13 Temporal Difference
- Take the constraints between states into account
- Idea: use observed transitions to adjust the utility values of observed states so that they agree [better] with the constraints
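This adjustment is usually written as the TD update U(s) <- U(s) + alpha * (R(s) + gamma * U(s') - U(s)), which nudges U(s) toward the one-step target; a minimal sketch, with invented values for alpha, gamma, and the utility table.

```python
def td_update(U, s, r, s2, alpha=0.1, gamma=0.9):
    """One temporal-difference step after observing transition s -> s'
    with reward r: move U(s) a fraction alpha toward r + gamma * U(s')."""
    U[s] = U[s] + alpha * (r + gamma * U[s2] - U[s])

# Invented utilities; observe A -> B with reward 0.
U = {"A": 0.0, "B": 0.5}
td_update(U, "A", 0.0, "B")
```

Here the target is 0 + 0.9 * 0.5 = 0.45, so U(A) moves a tenth of the way toward it, to 0.045; no transition probabilities appear in the update, only the transition that was actually observed.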

14 Active Reinforcement
- A passive learning agent has a fixed policy...
- An active agent must decide [learn] what action to take, i.e., it should find the optimal policy
- The agent should make a trade-off between exploitation and exploration

15 Exploitation & Exploration
- Exploitation: use the best action [at that time] in order to obtain the highest reward
- Exploration: attempt to reach all possible states by trying all possible actions [resulting in experience from which can be learned]

16 Exploitation & Exploration
- An agent relying completely on exploitation is called greedy and is often very suboptimal
- The trade-off between the greed and the curiosity of the agent is controlled by an exploration function
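One of the simplest exploration functions is epsilon-greedy: explore with a random action with probability epsilon, exploit the current best-known action otherwise. A minimal sketch; the lecture's actual exploration function may differ, and the Q-table entries below are invented.

```python
import random

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    """With probability epsilon take a random action (exploration);
    otherwise take the action with the highest current Q-value
    (exploitation). Unseen (action, state) pairs default to 0."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((a, state), 0.0))

# Invented Q-values; with epsilon=0 the agent is purely greedy.
Q = {("left", "s0"): 0.2, ("right", "s0"): 0.7}
a = epsilon_greedy(Q, "s0", ["left", "right"], epsilon=0.0)
```

Setting epsilon to 0 gives the fully greedy agent the slide warns about; decaying epsilon over time is a common way to shift from curiosity to greed as the estimates improve.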

17 Learning Action-Value
- Temporal difference learning can also be used for active reinforcement learning
- An action-value function gives the expected utility of taking a given action in a given state
- Q-learning is an alternative to temporal difference learning that learns an action-value function Q(a,s) instead of utilities
- The important difference is that the former is 'model-free': no transition model has to be learned, nor the actual utilities
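The Q-learning update has the same shape as the TD update but works on Q(a,s) directly: Q(a,s) <- Q(a,s) + alpha * (r + gamma * max over a' of Q(a',s') - Q(a,s)). A minimal sketch of one such step; the state and action names, alpha, and gamma are invented for illustration.

```python
def q_update(Q, s, a, r, s2, actions, alpha=0.1, gamma=0.9):
    """One model-free Q-learning step: only the observed experience
    (s, a, r, s') is used, never a transition model."""
    best_next = max(Q.get((a2, s2), 0.0) for a2 in actions)
    old = Q.get((a, s), 0.0)
    Q[(a, s)] = old + alpha * (r + gamma * best_next - old)

# Start from an empty table and observe one rewarding transition.
Q = {}
q_update(Q, "s0", "go", 1.0, "s1", ["go", "stay"])
```

Because the target uses the max over next actions rather than the policy's own action, the agent can keep exploring while still learning about the greedy policy, which is exactly what makes Q-learning model-free and off-policy.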

18 Of Course: Generalization
- For large state spaces, exact inference of the utility and/or Q-function as a table becomes unrealistic
- Function approximation is needed, i.e., a representation that is not in tabular form
- This makes it possible to represent utility functions for very large state spaces
- More importantly, it allows for generalization
- All this relates, of course, to decision trees, MAP, regression, density estimation, ML, hypothesis spaces, etc.
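The simplest such approximator is a linear one, U(s) ~ theta . phi(s), where the TD error updates the weight vector theta instead of a table entry; a minimal sketch, with the feature vectors, alpha, and gamma all invented for illustration.

```python
def u_hat(theta, features):
    """Linear utility approximation: dot product of weights and features."""
    return sum(t * f for t, f in zip(theta, features))

def td_weight_update(theta, phi_s, r, phi_s2, alpha=0.1, gamma=0.9):
    """Gradient-style TD step: the same TD error as in the tabular case,
    but distributed over the weights in proportion to the features of s."""
    error = r + gamma * u_hat(theta, phi_s2) - u_hat(theta, phi_s)
    return [t + alpha * error * f for t, f in zip(theta, phi_s)]

# Two invented features per state; one observed transition with reward 1.
theta = [0.0, 0.0]
theta = td_weight_update(theta, [1.0, 2.0], 1.0, [0.0, 0.0])
```

Because states with similar features now share weights, updating on one state changes the estimate for all similar states, which is the generalization the slide refers to.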

19 E.g., Inverted Pendulum
- MPI Magdeburg, Germany

20 ...and Triple Inverted
- MPI Magdeburg, Germany

21 E.g., Lee05a.pdf

22 Finally... a Summary
- Reinforcement learning enables agents to become skilled in an unknown environment based only on percepts and occasional rewards
- 3 approaches:
  - Direct utility estimation: treats observations as independent
  - Adaptive dynamic programming: learns a model + reward function and uses these to determine utilities or an optimal policy
  - Temporal difference: adjusts utility values so that they agree with the constraints

23 More Summary...
- The trade-off between exploitation and exploration is important
- Large state spaces call for approximate methods, giving rise to function learning, regression, etc.
- Reinforcement learning is one of the most active areas of machine learning research, because of its potential for eliminating the hand-coding of control strategies...

24 Next Week
- Guest lecturer Peter Andreasen on... I don't know yet
- Place: Auditorium 1
- Start: ±0900
- [Next next week: final lecture, probably including an hour's lecture on NERO, some words on the course evaluation, and the possibility to ask questions...]


