1 Heuristic Search for problems with uncertainty CSE 574 April 22, 2003 Mausam

2 Mathematical modelling
- Search space: finite/infinite, state/belief space. A belief state captures some idea of where we are, represented as a set of states or a probability distribution over the states.
- Initial state/belief; goal set of states/beliefs.
- Actions and their applicability: an action is applicable in a belief iff it is applicable in all constituent states.
- Action transitions (state to state / belief to belief).
- Action costs.
- Uncertainty: disjunctive/probabilistic.
- Feedback: zero/partial/total.
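
A minimal sketch of these ingredients in Python, for the disjunctive case; all type and field names here are illustrative, not from the talk.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Set

State = str
Belief = FrozenSet[State]  # disjunctive belief: the set of states we might be in

@dataclass
class Problem:
    states: Set[State]
    initial_belief: Belief
    goal_states: Set[State]
    actions: Set[str]
    applicable: Callable[[str, State], bool]        # is action a applicable in state s?
    successors: Callable[[str, State], Set[State]]  # disjunctive transition function
    cost: Callable[[str], float]

def applicable_in_belief(p: Problem, a: str, b: Belief) -> bool:
    # An action is applicable in a belief iff it is applicable in all its states.
    return all(p.applicable(a, s) for s in b)
```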

3 Disjunctive - Zero
Known as conformant planning. Belief = set of states (finite). Output: a plan (sequence of actions) that takes all the possible initial states to some goal state, or failure if no such sequence exists. Modelled as deterministic search in the belief space. What is the number of belief states if the number of state variables is N? (With N boolean variables there are 2^N states, hence 2^(2^N) - 1 nonempty belief states.)
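
A minimal conformant-planning sketch, assuming the Problem type above: plain breadth-first search over belief states, returning a plan or None for failure.

```python
from collections import deque
from typing import List, Optional

def conformant_plan(p: Problem) -> Optional[List[str]]:
    frontier = deque([(p.initial_belief, [])])
    seen = {p.initial_belief}
    while frontier:
        b, plan = frontier.popleft()
        if all(s in p.goal_states for s in b):  # every possible state is a goal
            return plan
        for a in p.actions:
            if not applicable_in_belief(p, a, b):
                continue
            # Progress the belief: union of successors of every member state.
            b2 = frozenset(s2 for s in b for s2 in p.successors(a, s))
            if b2 not in seen:
                seen.add(b2)
                frontier.append((b2, plan + [a]))
    return None  # failure: no single sequence works for all initial states
```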

4 Disjunctive - Partial
Known as contingent planning. Belief = set of states (finite). Output: policy (belief -> action).
Bellman equation: $V^*(b) = \min_{a \in A(b)} \big[ c(a) + \max_{o \in O(b)} V^*(b_a^o) \big]$,
where $V^*$ is the worst-case cost to reach the goal and $b_a^o$ is the belief that results from taking action $a$ in belief $b$ and receiving observation $o$.
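
As a sketch, one Bellman backup for this equation; `obs_filter(b, a, o)` is a hypothetical helper computing the successor belief $b_a^o$, and V is a dictionary from beliefs to values.

```python
def backup_disjunctive_partial(V, b, actions, observations, cost, obs_filter):
    # Worst case over observations, best case over applicable actions.
    return min(
        cost(a) + max(V[obs_filter(b, a, o)] for o in observations(b, a))
        for a in actions(b)
    )
```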

5 Disjunctive - Total
Output: policy (state -> action).
Bellman equation: $V^*(s) = \min_{a \in A(s)} \big[ c(a) + \max_{s' \in F(a,s)} V^*(s') \big]$,
where $F(a,s)$ is the set of possible successor states.
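
The corresponding backup over states, sketched with the same conventions as above; F(a, s) returns the set of possible successors.

```python
def backup_disjunctive_total(V, s, actions, cost, F):
    # Worst case over nondeterministic successors, best case over actions.
    return min(cost(a) + max(V[s2] for s2 in F(a, s)) for a in actions(s))
```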

6 Probabilistic - Zero
Modelled as deterministic search in the belief space, where the belief is now a probability distribution over states.

7 Probabilistic - Total
Modelled as MDPs (also called fully observable MDPs). Output: policy (state -> action).
Bellman equation: $V^*(s) = \min_{a \in A(s)} \big[ c(a) + \sum_{s' \in S} P(s' \mid s, a) \, V^*(s') \big]$
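
A textbook value-iteration sketch for this fully observable case, assuming P[s][a] is a dict mapping successor states to probabilities (names illustrative):

```python
def value_iteration(states, actions, cost, P, goals, eps=1e-6):
    V = {s: 0.0 for s in states}           # initial value function = 0
    while True:
        residual = 0.0
        for s in states:
            if s in goals:
                continue                   # goal states are absorbing, cost 0
            q = min(cost(a) + sum(pr * V[s2] for s2, pr in P[s][a].items())
                    for a in actions(s))
            residual = max(residual, abs(V[s] - q))
            V[s] = q                       # in-place (Gauss-Seidel) backup
        if residual < eps:                 # epsilon-convergence, see slide 13
            return V
```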

8 Probabilistic - Partial
Modelled as POMDPs (partially observable MDPs), also called probabilistic contingent planning. Belief = probability distribution over states. What is the size of the belief space? Output: policy (discretized belief -> action).
Bellman equation: $V^*(b) = \min_{a \in A(b)} \big[ c(a) + \sum_{o \in O} P(o \mid b, a) \, V^*(b_a^o) \big]$
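
A sketch of the belief update $b_a^o$ behind this equation: standard Bayes filtering, assuming transition and observation models given as probability functions T(s, a, s2) and O_model(o, s2, a) (names illustrative). It also returns the normalizer, which equals P(o | b, a).

```python
def belief_update(b, a, o, states, T, O_model):
    # b: dict mapping state -> probability.
    new_b = {s2: O_model(o, s2, a) * sum(T(s, a, s2) * b[s] for s in states)
             for s2 in states}
    z = sum(new_b.values())  # normalizer = P(o | b, a)
    return {s2: p / z for s2, p in new_b.items()}, z
```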

9 Heuristic Search
- Heuristic computation
- Search using the heuristic
- Stopping condition
- Speedups

10 Heuristic computation
Solving a relaxed problem gives an admissible heuristic. Example: assume full observability and solve the relaxed (disjunctive-total) problem for $V^*_{rel}(s)$; then $h(b) = \max_{s \in b} V^*_{rel}(s)$. Similarly for probabilistic problems. The disjunctive-total problem can be solved in time polynomial in |S| by dynamic programming with the initial value function set to 0.
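
A one-line sketch of that heuristic, given a precomputed table V_rel of relaxed state values:

```python
def belief_heuristic(b, V_rel):
    # Admissible: a plan must work for every state in b, so its cost is at
    # least the fully observable worst-case cost of the worst member state.
    return max(V_rel[s] for s in b)
```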

11 Algorithms for search
- A*: works for sequential solutions.
- AO*: works for acyclic solutions (Nilsson 80).
- LAO*: works for cyclic solutions (Hansen and Zilberstein 01).
- RTDP: works for cyclic solutions (Barto et al. 95).

12 RTDP iteration
Start with the initial belief, initialising the value of each belief to its heuristic value. For the current belief:
1. Save the action that minimises the current value in the current policy.
2. Update the value of the belief through a Bellman backup.
3. Apply the chosen action, then randomly pick an observation.
4. Go to the next belief assuming that observation.
Repeat until the goal is achieved. (A sketch of one trial follows.)
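
A minimal sketch of one RTDP trial over beliefs, assuming hypothetical helpers q_value (one-step lookahead cost of an action) and sample_observation:

```python
def rtdp_trial(b0, V, policy, actions, q_value, sample_observation,
               is_goal, belief_successor):
    b = b0
    while not is_goal(b):
        # Greedy action: minimise the one-step lookahead value.
        a = min(actions(b), key=lambda act: q_value(V, b, act))
        policy[b] = a                    # save greedy action in current policy
        V[b] = q_value(V, b, a)          # Bellman backup at the current belief
        o = sample_observation(b, a)     # apply a, draw an observation at random
        b = belief_successor(b, a, o)    # continue from the resulting belief
```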

13 Stopping Condition
ε-convergence: a value function is ε-optimal if the error (residual) at every state/belief is less than ε. Example (MDPs):
$R(s) = \left| V(s) - \min_{a \in A(s)} \big[ c(a) + \sum_{s' \in S} P(s' \mid s, a) \, V(s') \big] \right|$
Stop when $\max_{s \in S} R(s) < \varepsilon$.
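
The same check in code, reusing the MDP conventions from the value-iteration sketch above:

```python
def residual(V, s, actions, cost, P):
    q = min(cost(a) + sum(pr * V[s2] for s2, pr in P[s][a].items())
            for a in actions(s))
    return abs(V[s] - q)

def converged(V, states, actions, cost, P, eps):
    return max(residual(V, s, actions, cost, P) for s in states) < eps
```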

14 Experiments
Tested on larger but tame POMDPs. Comparison with other planners? Table vs. graph representation. Time to compute the heuristics. Contingent planning problems were converted to probabilistic contingent planning by assigning uniform probabilities!

15 Weaknesses Experiments not informative. Scaling to bigger problems.

16 Easy extensions (Geffner 02)
More general cost functions. More general goal models, e.g. Pr(reaching the goal) > threshold. Extension to epistemic goals/preconditions: an epistemic goal is a goal about the agent's complete knowledge of a boolean.

17 Speedups
- Reachability analysis (LAO*)
- Cheaper heuristics
- More informed heuristics (Bertoli and Cimatti 02)

18 Fast RTDP convergence What are the advantages of RTDP? What are the disadvantages of RTDP? How to speed up RTDP? (Bonet and Geffner 03)

19 Future Work
Discretizations:
- fine for more probable regions
- dynamic
- several runs focussing on different areas
- don't discretize at all
Which heuristic to use? Heuristics for information-gathering problems. Heuristic search in factored state spaces?

