Presentation transcript: "Classical Situation: hell / heaven (world deterministic, state observable)"

1 [Title slide]

2 Classical Situation: world deterministic, state observable. [Figure: two goal locations, labeled "hell" and "heaven".]

3 MDP-Style Planning: world stochastic, state observable. [Koditschek 87, Barto et al. 89] Policy (also called a universal plan, or a navigation function). [Figure: goal locations "hell" and "heaven".]

4 Stochastic, Partially Observable: [Figure: a sign; the goal locations are uncertain: hell? heaven?] [Sondik 72] [Littman/Cassandra/Kaelbling 97]

5 Stochastic, Partially Observable: [Figure: two possible worlds with the same sign; in one the rooms are hell / heaven, in the other heaven / hell.]

6 Stochastic, Partially Observable: [Figure: from the start location, the labeling (heaven / hell vs. hell / heaven) is unknown (??); each case holds with 50% probability.]

7 Robot Planning Frameworks

                           | Classical AI/robot planning
  State/actions            | discrete & continuous
  State                    | observable
  Environment              | deterministic
  Plans                    | sequences of actions
  Completeness             | yes
  Optimality               | rarely
  State space size         | huge, often continuous, 6 dimensions
  Computational complexity | varies

8 MDP-Style Planning (recap): world stochastic, state observable. [Koditschek 87, Barto et al. 89] Policy (universal plan, navigation function). [Figure: goal locations "hell" and "heaven".]

9 Markov Decision Process (discrete): [Figure: five-state MDP (s1 ... s5) with stochastic transitions (edge probabilities 0.7, 0.3, 0.9, 0.1, 0.3, 0.4, 0.99, 0.1, 0.2, 0.8) and per-state rewards r = -10, r = 0, r = 0, r = 1, r = 0.] [Bellman 57] [Howard 60] [Sutton/Barto 98]
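Where the slide shows this MDP as a graph, the same object can be written down directly in code. A minimal sketch in Python; the particular probabilities and rewards below are illustrative, since the figure's exact edge assignments did not survive extraction:

```python
# Hypothetical encoding of a small discrete MDP as plain dicts.
# P[s][a] is a list of (probability, next_state) pairs; R[s][a] is the
# reward for taking action a in state s. Numbers are illustrative only.

P = {
    "s1": {"go": [(0.7, "s2"), (0.3, "s3")], "stay": [(1.0, "s1")]},
    "s2": {"go": [(0.9, "s4"), (0.1, "s1")], "stay": [(1.0, "s2")]},
    "s3": {"go": [(0.8, "s5"), (0.2, "s1")], "stay": [(1.0, "s3")]},
    "s4": {"go": [(1.0, "s4")], "stay": [(1.0, "s4")]},  # absorbing, r = +1
    "s5": {"go": [(1.0, "s5")], "stay": [(1.0, "s5")]},  # absorbing, r = -10
}
R = {s: {a: 0.0 for a in P[s]} for s in P}
R["s4"] = {a: 1.0 for a in P["s4"]}
R["s5"] = {a: -10.0 for a in P["s5"]}
```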

10 Value Iteration [Bellman 57] [Howard 60] [Sutton/Barto 98]

Value function of policy \(\pi\):
\[ V^{\pi}(s) = E\Big[\sum_{t} \gamma^{t} r_{t} \,\Big|\, s_0 = s, \pi\Big] \]

Bellman equation for the optimal value function:
\[ V^{*}(s) = \max_{a}\Big[ r(s,a) + \gamma \sum_{s'} p(s' \mid s,a)\, V^{*}(s') \Big] \]

Value iteration (recursively estimating the value function):
\[ V(s) \leftarrow \max_{a}\Big[ r(s,a) + \gamma \sum_{s'} p(s' \mid s,a)\, V(s') \Big] \]

Greedy policy:
\[ \pi(s) = \operatorname*{argmax}_{a}\Big[ r(s,a) + \gamma \sum_{s'} p(s' \mid s,a)\, V(s') \Big] \]
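A compact implementation of these updates, written against the dict encoding sketched above; a rough sketch, not tuned for large state spaces:

```python
def value_iteration(P, R, gamma=0.95, eps=1e-6):
    """Recursively estimate the value function via Bellman backups:
    V(s) <- max_a [ R(s,a) + gamma * sum_s' p(s'|s,a) * V(s') ]."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            v = max(R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                    for a in P[s])
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < eps:                      # converged
            return V

def greedy_policy(P, R, V, gamma=0.95):
    """Extract the greedy policy from a value function."""
    return {s: max(P[s], key=lambda a: R[s][a] +
                   gamma * sum(p * V[s2] for p, s2 in P[s][a]))
            for s in P}
```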

11 Value Iteration for Motion Planning (assumes knowledge of robot’s location)
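For motion planning on a known map (with the robot's location known, as the slide assumes), the deterministic case of value iteration reduces to computing a navigation function over the grid, whose greedy descent leads to the goal. A minimal sketch; the map itself is made up:

```python
# Value iteration as navigation-function computation on an occupancy
# grid: '#' is an obstacle, 'G' the goal. Bellman sweeps repeat until
# the value field stops changing; V[r][c] is then the cost-to-go.

GRID = ["....#...",
        "..#.#.#.",
        "..#...#.",
        "....#..G"]

def navigation_function(grid, step_cost=1.0):
    rows, cols = len(grid), len(grid[0])
    INF = float("inf")
    V = [[INF] * cols for _ in range(rows)]
    goal = next((r, c) for r in range(rows) for c in range(cols)
                if grid[r][c] == "G")
    V[goal[0]][goal[1]] = 0.0
    changed = True
    while changed:
        changed = False
        for r in range(rows):
            for c in range(cols):
                if grid[r][c] == "#" or (r, c) == goal:
                    continue
                best = min((V[r + dr][c + dc]
                            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                            if 0 <= r + dr < rows and 0 <= c + dc < cols
                            and grid[r + dr][c + dc] != "#"), default=INF)
                if best + step_cost < V[r][c]:
                    V[r][c] = best + step_cost
                    changed = True
    return V

V = navigation_function(GRID)   # descend V greedily to reach 'G'
```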

12 Continuous Environments. From: A. W. Moore & C. G. Atkeson, "The Parti-game Algorithm for Variable Resolution Reinforcement Learning in Multidimensional State-spaces," Machine Learning, 1995.

13 Approximate Cell Decomposition [Latombe 91]. From: A. W. Moore & C. G. Atkeson, "The Parti-game Algorithm for Variable Resolution Reinforcement Learning in Multidimensional State-spaces," Machine Learning, 1995.

14 Parti-Game [Moore 96]. From: A. W. Moore & C. G. Atkeson, "The Parti-game Algorithm for Variable Resolution Reinforcement Learning in Multidimensional State-spaces," Machine Learning, 1995.

15 Robot Planning Frameworks

                           | Classical AI/robot planning          | Value Iteration in MDPs | Parti-Game
  State/actions            | discrete & continuous                | discrete                | continuous
  State                    | observable                           | observable              | observable
  Environment              | deterministic                        | stochastic              | stochastic
  Plans                    | sequences of actions                 | policy                  | policy
  Completeness             | yes                                  | yes                     | yes
  Optimality               | rarely                               | yes                     | no
  State space size         | huge, often continuous, 6 dimensions | millions                | n/a
  Computational complexity | varies                               | quadratic               | n/a

16 Stochastic, Partially Observable: [Figure: start location, a sign, and two possible labelings (heaven / hell vs. hell / heaven), each with 50% probability; the true labeling is unknown (??).]

17 A Quiz: size of the belief space?

  sensors         | actions       | # states         | size of belief space?
  perfect         | deterministic | 3                | 3: s1, s2, s3
  perfect         | stochastic    | 3                | 3: s1, s2, s3
  abstract states | deterministic | 3                | 2^3 - 1: s1, s2, s3, s12, s13, s23, s123
  stochastic      | deterministic | 3                | 2-dim continuous*: p(S = s1), p(S = s2)
  none            | stochastic    | 3                | 2-dim continuous*: p(S = s1), p(S = s2)
  deterministic   | stochastic    | 1-dim continuous | infinite-dim continuous*
  stochastic      | stochastic    | 1-dim continuous | infinite-dim continuous
  stochastic      | stochastic    | infinite-dim continuous | aargh!

  *) countable, but for all practical purposes continuous
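The "2-dim continuous" answers follow from normalization alone; a one-line derivation:

\[
b = \big(p(S{=}s_1),\, p(S{=}s_2),\, p(S{=}s_3)\big), \qquad
p(S{=}s_3) = 1 - p(S{=}s_1) - p(S{=}s_2),
\]

so a belief over three states has only two free coordinates: it lives on the 2-simplex, a two-dimensional continuous set.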

18 Introduction to POMDPs (1 of 3): [Figure: a two-state POMDP (s1, s2) with actions a and b; rewards 80 and -100 for action a, 0 and -40 for action b. Right: the expected reward of each action plotted as a linear function of the belief p(s1), on a scale from -100 to 100.] [Sondik 72; Littman, Kaelbling, Cassandra '97]

19 Introduction to POMDPs (2 of 3): [Figure: a third action c is added that switches the state stochastically (80% / 20%). Under c the belief is mapped from p(s1) to p(s1'), shown as a belief-transfer function.] [Sondik 72; Littman, Kaelbling, Cassandra '97]

20 Introduction to POMDPs (3 of 3): [Figure: sensing is added; observations A and B arrive with state-dependent probabilities (50%, 30% / 70%). Bayes' rule maps the prior belief p(s1) to the posteriors p(s1' | A) and p(s1' | B), plotted as transfer functions.] [Sondik 72; Littman, Kaelbling, Cassandra '97]
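The piecewise-linear structure sketched across these three slides can be reproduced in a few lines. A hedged sketch, assuming the figure's rewards pair up as a: (-100 in s1, +80 in s2) and b: (-40 in s1, 0 in s2); that pairing is my reading of the garbled figure, not a confirmed detail:

```python
# One-step POMDP value over the belief p(s1): each action's expected
# reward is linear in the belief; the value function is their upper
# envelope (max), hence piecewise-linear and convex.

R = {"a": {"s1": -100.0, "s2": 80.0},   # assumed pairing (see lead-in)
     "b": {"s1": -40.0,  "s2": 0.0}}

def q(action, p_s1):
    """Expected immediate reward of an action at belief p(s1)."""
    return p_s1 * R[action]["s1"] + (1.0 - p_s1) * R[action]["s2"]

def value(p_s1):
    """Upper envelope of the linear Q-functions."""
    return max(q(a, p_s1) for a in R)

for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(p, {a: round(q(a, p), 1) for a in R}, round(value(p), 1))
```

Action a dominates when the belief is confident enough in s2; the safer action b takes over past the crossover belief, which is exactly the structure the slide's plot shows.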

21 Value Iteration in POMDPs

Substitute the belief b for the state s in the MDP equations.

Value function of policy \(\pi\):
\[ V^{\pi}(b) = E\Big[\sum_{t} \gamma^{t} r_{t} \,\Big|\, b_0 = b, \pi\Big] \]

Bellman equation for the optimal value function:
\[ V^{*}(b) = \max_{a}\Big[ r(b,a) + \gamma \int p(b' \mid b,a)\, V^{*}(b')\, db' \Big] \]

Value iteration (recursively estimating the value function):
\[ V(b) \leftarrow \max_{a}\Big[ r(b,a) + \gamma \int p(b' \mid b,a)\, V(b')\, db' \Big] \]

Greedy policy:
\[ \pi(b) = \operatorname*{argmax}_{a}\Big[ r(b,a) + \gamma \int p(b' \mid b,a)\, V(b')\, db' \Big] \]

22 Missing Terms: Belief Space

Expected reward:
\[ r(b,a) = \sum_{s} b(s)\, r(s,a) \]

Next-state (next-belief) density, via Bayes filters:
\[ p(b' \mid b,a) = \sum_{o} p(b' \mid b,a,o)\, p(o \mid b,a) \]
where \(p(b' \mid b,a,o)\) is a Dirac distribution centered on the Bayes filter update
\[ b'(s') = \eta\, p(o \mid s') \sum_{s} p(s' \mid s,a)\, b(s). \]
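Putting slides 21 and 22 together: discretizing the belief simplex and running ordinary value iteration over belief points makes the "substitute b for s" idea concrete. A rough sketch for a two-state problem; the transition, observation, and reward numbers are hypothetical:

```python
import numpy as np

# Hypothetical 2-state POMDP; all numbers are illustrative.
T = {"a": np.array([[0.9, 0.1], [0.2, 0.8]]),   # T[a][s, s'] = p(s'|s,a)
     "b": np.array([[0.5, 0.5], [0.5, 0.5]])}
Z = np.array([[0.7, 0.3], [0.3, 0.7]])          # Z[s', o] = p(o|s')
R = {"a": np.array([-100.0, 80.0]),             # R[a][s] = r(s,a)
     "b": np.array([-40.0, 0.0])}
GAMMA = 0.95

def bayes_update(b, a, o):
    """Bayes filter: b'(s') = eta * p(o|s') * sum_s p(s'|s,a) * b(s).
    Returns the posterior belief and p(o | b, a)."""
    pred = b @ T[a]                   # prediction step
    post = pred * Z[:, o]             # measurement update (unnormalized)
    return post / post.sum(), post.sum()

grid = np.linspace(0.0, 1.0, 51)      # belief grid over p(s1)
V = np.zeros_like(grid)

for _ in range(300):                  # value iteration over belief points
    V_new = np.empty_like(V)
    for i, p1 in enumerate(grid):
        b = np.array([p1, 1.0 - p1])
        q = []
        for a in T:
            total = b @ R[a]          # expected reward r(b, a)
            for o in (0, 1):          # sum over observations
                b2, p_o = bayes_update(b, a, o)
                total += GAMMA * p_o * np.interp(b2[0], grid, V)
            q.append(total)
        V_new[i] = max(q)
    V = V_new
```

In the limit of a fine grid, V approaches the convex, piecewise-linear POMDP value function; exact methods instead represent it with finitely many linear "alpha vectors", whose growth is the source of the exponential complexity listed in the final table.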

23 Value Iteration in Belief Space: [Diagram of one backup: from belief state b, an action is taken in state s, yielding reward r' and next state s'; an observation o then produces the next belief state b'. Q(b, a) is computed by backing up max_a Q(b', a) through the possible observations, giving the value function over beliefs.]

24 Why Is This So Complex? State space planning (no state uncertainty) vs. belief space planning (full state uncertainty).

25 Augmented MDPs [Roy et al. 98/99]: augment the conventional state space with a measure of uncertainty, the entropy of the belief.
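A tiny sketch of the state abstraction the slide describes: collapse the full belief to a conventional state estimate plus its entropy. The representation details here (dict beliefs, most-likely state as the estimate) are assumptions for illustration, not the paper's implementation:

```python
import math

def entropy(b):
    """Shannon entropy H(b) = -sum_s b(s) * log b(s) of a discrete belief."""
    return -sum(p * math.log(p) for p in b.values() if p > 0.0)

def augmented_state(b):
    """Map a full belief to the two-part Augmented-MDP state."""
    ml_state = max(b, key=b.get)      # conventional state estimate
    return ml_state, entropy(b)       # plus a 1-D uncertainty measure

belief = {"room_a": 0.7, "room_b": 0.2, "corridor": 0.1}
print(augmented_state(belief))        # ('room_a', 0.80...)
```

Planning then runs over this low-dimensional augmented space instead of the full belief simplex, which is what brings the complexity down from exponential to the O(N^4) cited in the final table.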

26 Path Planning with Augmented MDPs [Roy et al. 98/99]: [Figure: a conventional planner compared with the probabilistic planner; the probabilistic planner detours to gather information (information gain).]

27 Robot Planning Frameworks

                           | Classical AI/robot planning          | Value Iteration in MDPs | Parti-Game | POMDP                | Augmented MDP
  State/actions            | discrete & continuous                | discrete                | continuous | discrete             | discrete
  State                    | observable                           | observable              | observable | partially observable | partially observable
  Environment              | deterministic                        | stochastic              | stochastic | stochastic           | stochastic
  Plans                    | sequences of actions                 | policy                  | policy     | policy               | policy
  Completeness             | yes                                  | yes                     | yes        | yes                  | no
  Optimality               | rarely                               | yes                     | no         | yes                  | no
  State space size         | huge, often continuous, 6 dimensions | millions                | n/a        | dozens               | thousands
  Computational complexity | varies                               | quadratic               | n/a        | exponential          | O(N^4)

