
1 CMPUT 551 Analyzing abstraction and approximation within MDP/POMDP environment Magdalena Jankowska (M.Sc. - Algorithms) Ilya Levner (M.Sc. - AI/ML)

2 Outline
– MDP/POMDP environment
– Research Direction
– Maze Domain
– Motivation – iNRIIS
– Previous Research

3 Introduction Consider a model (system) of an environment and an agent where:
– The agent receives observations about the current system state (inputs)
– The agent can take actions that affect the system state (outputs)
– The agent receives rewards/penalties for taking various actions in various system states.
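A minimal sketch of this interaction loop, purely for illustration: the `env` and `agent` interfaces (`reset`, `step`, `act`) are hypothetical names, not anything defined in the slides.

```python
def run_episode(env, agent, horizon=100):
    """Run one interaction episode between a (hypothetical) agent and environment."""
    total_reward = 0.0
    observation = env.reset()                   # initial percept (input to the agent)
    for _ in range(horizon):
        action = agent.act(observation)         # the agent's output
        observation, reward = env.step(action)  # the action affects the system state
        total_reward += reward                  # reward/penalty for the action taken
    return total_reward
```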

4 Formal Definition
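The body of this slide did not survive the transcript. A standard formulation, consistent with the notation used later in the deck (δ for transitions, V* for the optimal value function), would be:

```latex
\[
\text{MDP:} \quad \langle S, A, \delta, R \rangle, \qquad
\delta(s, a, s') = P(s_{t+1} = s' \mid s_t = s,\; a_t = a), \qquad
R : S \times A \to \mathbb{R}
\]
\[
\text{POMDP:} \quad \langle S, A, \delta, R, \Omega, O \rangle, \qquad
O(o \mid s', a) = P(\text{percept } o \mid \text{state } s' \text{ reached via } a)
\]
```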

5 Markov Decision Process (MDP) The agent needs only the percept from its current state to calculate the optimal action – i.e. the action delivering maximum expected reward.
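In symbols (standard textbook form, not recovered from the slide): the current state is a sufficient statistic for the future, so the optimal policy depends only on it:

```latex
\[
P(s_{t+1} \mid s_t, a_t, s_{t-1}, a_{t-1}, \ldots, s_0)
  \;=\; P(s_{t+1} \mid s_t, a_t)
\]
\[
\pi^*(s) \;=\; \arg\max_{a \in A}
  \Big[ R(s, a) + \gamma \sum_{s'} \delta(s, a, s')\, V^*(s') \Big]
\]
```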

6 Partially Observable Markov Decision Process (POMDP) The percept does not carry enough information to enable the agent to compute the optimal action. However, the whole history of (partial) percepts may allow the agent to calculate the optimal action.
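One standard way to make this precise (textbook material, not from the slide itself) is to summarize the history of percepts and actions by a belief state, which can support an optimal policy even when the latest percept alone cannot:

```latex
\[
b_t(s) \;=\; P\big(s_t = s \,\big|\, o_t, a_{t-1}, o_{t-1}, \ldots, a_0, o_0\big)
\]
```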

7 Research Goals

8 Maze Domain

9 Produces 3-D forest models from aerial images. Applies vision operators to data tokens until a valid 3-D scene description is produced.

10 iNRIIS

11 Challenges 152 million geographic states, with each state in one of approximately 1000 conditions (seasonal, lighting, meteorological) – on the order of 10^11 state configurations in total. There is no way to acquire a perfect V*.

12 Challenges (cont) Each image is approximately 1000 by 1000 = 1,000,000 pixels. To make lookahead feasible we need to extract relevant features. The feature extraction process abstracts (buckets) several states together.

13 Challenges (cont) The real (stochastic) vision operators take a long time to run. We need a quick approximation {δ̃(s,a,s′)} of the operators to make lookahead feasible (see the sketch below).
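A sketch of one simple way such an approximation could be built, assuming a limited budget of offline calls to the expensive operator; the `slow_operator` callable and the sampling scheme are illustrative assumptions, not the project's actual method:

```python
from collections import defaultdict

def estimate_transition_model(slow_operator, states, actions, samples_per_pair=20):
    """Estimate an approximate transition model delta~(s, a, s') by empirical
    frequencies from a limited number of offline calls to the slow, stochastic
    operator (a callable slow_operator(s, a) -> s')."""
    counts = defaultdict(lambda: defaultdict(int))
    for s in states:
        for a in actions:
            for _ in range(samples_per_pair):
                s_next = slow_operator(s, a)   # expensive call, done offline
                counts[(s, a)][s_next] += 1
    # Normalize the counts into probabilities delta~(s, a, s').
    model = {}
    for (s, a), next_counts in counts.items():
        total = sum(next_counts.values())
        model[(s, a)] = {s2: n / total for s2, n in next_counts.items()}
    return model
```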

14 Solutions for Markov Decision Processes
If we know the transition model:
– value/policy iteration (see the sketch below)
– Temporal Difference learning, e.g. TD-Gammon by Tesauro
If we do not know the transition model:
– learning the value of each action in a given state (Q-learning)
– learning the model and the value of each state, i.e. learning V* and δ
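A minimal value-iteration sketch for the known-model case, assuming a tabular model in the same δ(s,a,s′) dictionary form as the estimation sketch above (illustrative, not the project's implementation):

```python
def value_iteration(states, actions, delta, reward, gamma=0.9, tol=1e-6):
    """Tabular value iteration.  delta[(s, a)] maps next states to their
    probabilities; reward(s, a) is the immediate reward."""
    V = {s: 0.0 for s in states}
    while True:
        max_change = 0.0
        for s in states:
            q_values = [
                reward(s, a)
                + gamma * sum(p * V[s2] for s2, p in delta[(s, a)].items())
                for a in actions
            ]
            new_v = max(q_values)                      # Bellman optimality backup
            max_change = max(max_change, abs(new_v - V[s]))
            V[s] = new_v
        if max_change < tol:                           # stop when values converge
            return V
```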

15 State aggregation in MDPs Grouping "similar" states together (Boutilier and Dearden). Different kinds, e.g.:
– states grouped according to their values (exact abstraction) – see the sketch after this list
– some irrelevant features abstracted away (approximate abstraction)
Used in robot navigation.
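A toy sketch of value-based aggregation, assuming state values have already been computed; rounding to a fixed precision merely stands in for "grouping states with (approximately) equal value" and is not how Boutilier and Dearden construct abstractions:

```python
from collections import defaultdict

def aggregate_by_value(V, precision=2):
    """Group states whose values agree up to `precision` decimal places
    into a single abstract state."""
    clusters = defaultdict(list)
    for s, v in V.items():
        clusters[round(v, precision)].append(s)
    # Each abstract state is the set of concrete states sharing a value.
    return list(clusters.values())
```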

16 POMDP Much more difficult. Ways to restore the Markov property:
– decisions based on the history of observations and/or actions (used by Mitchell in a pole-balancing task)
– conversion into an MDP over belief states (see the update below)
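The belief-state MDP conversion rests on the standard Bayes update of the belief after taking action a and observing percept o (textbook form, stated here in the same notation as the formal definition above):

```latex
\[
b'(s') \;=\; \frac{O(o \mid s', a)\, \sum_{s} \delta(s, a, s')\, b(s)}
                  {\sum_{s''} O(o \mid s'', a)\, \sum_{s} \delta(s, a, s'')\, b(s)}
\]
```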

17 POMDP (cont.) Solution of the underlying MDP as a heuristic, e.g. GIB – the world's best Bridge program, by Ginsberg. Limited lookahead (can be combined with learning: updating the heuristic values of states/belief states) – see the sketch below.
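A sketch of depth-limited lookahead over belief states that evaluates the frontier with a heuristic (for instance, the value from the underlying MDP); the `simulate` and `heuristic` callables are assumed names for illustration only:

```python
def lookahead_value(belief, depth, actions, simulate, heuristic, gamma=0.95):
    """Depth-limited expectimax over belief states.

    simulate(belief, a) is assumed to return a list of
    (probability, reward, next_belief) outcomes; heuristic(belief)
    estimates the value at the frontier, e.g. from the underlying MDP."""
    if depth == 0:
        return heuristic(belief)          # heuristic cutoff at the lookahead horizon
    best = float("-inf")
    for a in actions:
        value = 0.0
        for prob, reward, next_belief in simulate(belief, a):
            value += prob * (reward + gamma * lookahead_value(
                next_belief, depth - 1, actions, simulate, heuristic, gamma))
        best = max(best, value)           # choose the best action at this level
    return best
```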

18 Research Goals

19 References
C. Boutilier, T. Dean, S. Hanks, Decision-Theoretic Planning: Structural Assumptions and Computational Leverage, JAIR 11:1-94, 1999.
R. Dearden, C. Boutilier, Abstraction and Approximate Decision-Theoretic Planning, Artificial Intelligence 89(1):219-283, 1997.
L. P. Kaelbling, M. L. Littman, A. W. Moore, Reinforcement Learning: A Survey, JAIR 4:237-285, 1996.
L. P. Kaelbling, A. R. Cassandra, M. L. Littman, Learning Policies for Partially Observable Environments: Scaling Up, Proceedings of the 12th International Conference on Machine Learning, 1995.

