Hidden Markov Models (cont.) Markov Decision Processes

Presentation transcript:

CHAPTER 9: Hidden Markov Models (cont.), Markov Decision Processes

Markov Models

Conditional Independence

Weather Example

Mini-Forward Algorithm

Example

Stationary Distributions
If we simulate the chain long enough, what happens?
- Uncertainty accumulates
- Eventually, we have no idea what the state is!
Stationary distributions:
- For most chains, the distribution we end up in is independent of the initial distribution
- This is called the stationary distribution of the chain
- Usually, we can only predict a short time out
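A minimal sketch of this idea (not from the slides): a hypothetical two-state weather chain with made-up transition probabilities, pushed forward by repeated matrix multiplication so that different starting distributions converge to the same stationary distribution.

```python
import numpy as np

# Hypothetical transition model: P[i, j] = P(next state j | current state i)
P = np.array([[0.7, 0.3],    # sun -> sun, sun -> rain
              [0.3, 0.7]])   # rain -> sun, rain -> rain

def push_forward(P, start, steps=100):
    """Apply the chain repeatedly; for most chains the result no longer
    depends on the initial distribution once steps is large enough."""
    dist = np.asarray(start, dtype=float)
    for _ in range(steps):
        dist = dist @ P
    return dist

print(push_forward(P, [1.0, 0.0]))  # starts "all sun", ends near [0.5, 0.5]
print(push_forward(P, [0.0, 1.0]))  # starts "all rain", same limit
```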

Example: Web Link Analysis

Mini-Viterbi Algorithm

Hidden Markov Models

HMM Applications

Filtering: Forward Algorithm

Filtering Example

MLE: Viterbi Algorithm

Viterbi Properties

Markov Decision Processes

MDP Solutions

Example Optimal Policies

Stationarity

How (Not) to Solve an MDP
The inefficient way:
- Enumerate all policies
- Calculate each policy's expected utility (discounted rewards) starting from the start state, e.g. by simulating a bunch of runs
- Choose the best policy
We'll return to a (better) idea like this later.
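A minimal sketch of this brute-force recipe on an assumed toy MDP (the states, actions, transition function, and rewards below are invented for illustration): enumerate every deterministic policy, estimate its discounted return by Monte Carlo simulation, and keep the best.

```python
import itertools
import random

states  = ["s0", "s1"]
actions = ["a", "b"]
gamma   = 0.9

def step(s, a):
    """Hypothetical transition/reward model: returns (next_state, reward)."""
    if a == "a":
        return ("s1", 1.0) if random.random() < 0.8 else (s, 0.0)
    return ("s0", 0.5)

def simulate(policy, start="s0", horizon=50):
    """One run: follow the policy and accumulate discounted rewards."""
    s, total, discount = start, 0.0, 1.0
    for _ in range(horizon):
        s, r = step(s, policy[s])
        total += discount * r
        discount *= gamma
    return total

# Enumerate all |A|^|S| deterministic policies and keep the best estimate.
best = max(
    (dict(zip(states, choice)) for choice in itertools.product(actions, repeat=len(states))),
    key=lambda pi: sum(simulate(pi) for _ in range(200)) / 200,
)
print(best)
```

Even on this two-state example the search is over |A|^|S| policies, which is exactly why the slide calls it the inefficient way.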

Utilities of States

Infinite Utilities?

The Bellman Equation

Example: Bellman Equations

Value Iteration

Policy Iteration
Alternate approach:
- Policy evaluation: calculate utilities for a fixed policy
- Policy improvement: update the policy based on the resulting utilities
- Repeat until convergence
This is policy iteration. It can converge faster under some conditions.
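A minimal sketch of this loop on an assumed random tabular MDP (T, R, and gamma below are placeholders, not from the slides): evaluation solves a linear system for the fixed policy's utilities, improvement acts greedily on them, and the loop stops when the policy is stable.

```python
import numpy as np

n_states, n_actions, gamma = 3, 2, 0.9
T = np.random.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # T[s, a, s']
R = np.random.rand(n_states, n_actions)                                 # R[s, a]

def policy_iteration(T, R, gamma):
    n_states, n_actions = R.shape
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * T_pi) U = R_pi exactly
        T_pi = T[np.arange(n_states), policy]   # transitions under the fixed policy
        R_pi = R[np.arange(n_states), policy]   # rewards under the fixed policy
        U = np.linalg.solve(np.eye(n_states) - gamma * T_pi, R_pi)
        # Policy improvement: act greedily with respect to the evaluated utilities
        Q = R + gamma * T @ U                   # Q[s, a]
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):  # policy is stable: converged
            return policy, U
        policy = new_policy

print(policy_iteration(T, R, gamma))
```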

Comparison
In value iteration:
- Every pass (or "backup") updates both the policy (based on current utilities) and the utilities (based on the current policy)
In policy iteration:
- Several passes update the utilities
- Occasional passes update the policy
Hybrid approaches (asynchronous policy iteration):
- Any sequence of partial updates to either policy entries or utilities will converge if every state is visited infinitely often
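For contrast with the policy-iteration sketch above, here is a minimal value-iteration loop on the same kind of assumed tabular MDP (T, R, gamma are placeholders): every pass performs one Bellman backup over all state-action pairs, and the greedy policy is read off the converged utilities.

```python
import numpy as np

n_states, n_actions, gamma = 3, 2, 0.9
T = np.random.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # T[s, a, s']
R = np.random.rand(n_states, n_actions)                                 # R[s, a]

def value_iteration(T, R, gamma, tol=1e-8):
    U = np.zeros(R.shape[0])
    while True:
        Q = R + gamma * T @ U                    # one Bellman backup for every (s, a)
        U_new = Q.max(axis=1)
        if np.max(np.abs(U_new - U)) < tol:
            return Q.argmax(axis=1), U_new       # greedy policy and converged utilities
        U = U_new

print(value_iteration(T, R, gamma))
```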