Solving Factored POMDPs with Linear Value Functions. Carlos Guestrin, Daphne Koller (Stanford University); Ronald Parr (Duke University).


1 Solving Factored POMDPs with Linear Value Functions. Carlos Guestrin, Daphne Koller (Stanford University); Ronald Parr (Duke University)

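2 Policy Iteration for POMDPs [Hansen ’98]. Steps: Value Determination; DP Step; Policy Improvement. [Figure: policy graph with nodes 1:a1 and 2:a2 and edges labeled by observations O1 and O2; plots of value V over belief b showing vectors 1 and 2, then vectors 1, 2, and 3.]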

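3 Policy Iteration for POMDPs [Hansen ’98]. Steps: Value Determination; DP Step; Policy Improvement. [Figure: plots of value V over belief b showing vectors 1 and 2, then vectors 1, 2, and 3; policy graph with nodes 1:a1, 3:a1, and 2:a2 and edges labeled by observations O1 and O2.]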

4 POMDP Complexity. POMDPs have multiple sources of complexity. Number of vectors can grow exponentially: avoid generating unneeded facets (Witness, IP, etc.); approximate by discarding similar vectors, etc. Each vector has a large representation: one dimension for each state; 2^n dimensions for n state variables; can try structured representations of the vectors [Boutilier & Poole ’96] [Hansen & Feng ’00].

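5 Factored POMDPs. Total reward is the sum of sub-rewards: R = R1 + R2; a subset of variables is observed; actions only change small parts of the model. [Figure: DBN over state variables X, Y, Z at time t and X’, Y’, Z’ at time t+1, with reward nodes R1 and R2, observation nodes O_X’ and O_Z’, and action nodes A_X and A_Z.]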

6 Exploiting Structure. Structure in model might imply structure in vectors. Structured vectors approach [Boutilier & Poole ’96], [Hansen & Feng ’00]: within a vector, many dimensions may be equivalent; collapse using a tree; works well if DBN structure leads to clean decomposition; doesn’t always hold up, even in MDPs. [Figure: value V over belief b = P(XYZ) with vectors 1 and 2; tree over variables X and Z.]

7 Our Approach. Not all structured POMDPs have structured vectors; embed structure into value function space a priori: project α-vectors into structured vector space (linear combination of structured features); efficiently find closest approximation to “true” α-vectors.

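8 Exploiting Structure in PI and Incremental Pruning. Steps: Value Determination; DP Step; Policy Improvement. Operations: Best; Pointwise Dominates. Highlighted: Best. [Figure: policy graph with nodes 1:a1 and 2:a2 and edges labeled by observations O1 and O2; plots of value V over belief b showing vectors 1 and 2, then vectors 1, 2, and 3.]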

9 Factored Best. Want to find vector with highest value for given belief state; factorization decomposes dot-product. [Equations not recovered. Figure: value V over belief b with vectors 1, 2, and 3 and a belief point b.]
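A sketch of the decomposition this slide refers to, written in assumed notation (the slide's own symbols are not available): each α-vector is taken to be a linear combination of basis functions h_i with weights w_i, where h_i depends only on a small subset C_i of the state variables, and b(c_i) denotes the marginal of the belief state over C_i.

```latex
\mathrm{Best}(b) = \arg\max_{\alpha} \; b \cdot \alpha,
\qquad
\alpha(x) = \sum_i w_i \, h_i(x[C_i])

b \cdot \alpha
  = \sum_{x} b(x) \sum_i w_i \, h_i(x[C_i])
  = \sum_i w_i \sum_{c_i} b(c_i) \, h_i(c_i)
```

The outer sum over all states x is replaced by sums over assignments c_i to each basis function's domain, which is why only marginals of b are needed.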

10 Factored Best Example. Assume 4 state variables, 3 basis functions. Decomposition of dot product: summands depend only on marginal probabilities. [Equations not recovered.]
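A minimal sketch of this example in Python, under stated assumptions: the 4 state variables are binary, and the scopes, tables, and weights of the 3 basis functions are invented for illustration (the slide's actual choices are not available). It checks that the dot product computed from marginals matches the one computed over all 16 states.

```python
from itertools import product

# Hypothetical setup: 4 binary state variables (indices 0..3) and
# 3 basis functions with small scopes. All numbers are illustrative.
N_VARS = 4
SCOPES = [(0, 1), (1, 2), (3,)]
WEIGHTS = [0.5, -1.0, 2.0]


def make_table(scope, offset):
    """Arbitrary table: one value per assignment to the variables in scope."""
    return {a: float(offset + i)
            for i, a in enumerate(product((0, 1), repeat=len(scope)))}


TABLES = [make_table(scope, k) for k, scope in enumerate(SCOPES, start=1)]

# An arbitrary full joint belief over the 16 states (normalized).
STATES = list(product((0, 1), repeat=N_VARS))
TOTAL = sum(range(1, len(STATES) + 1))
BELIEF = {s: (i + 1) / TOTAL for i, s in enumerate(STATES)}


def alpha(state):
    """alpha(s) = sum_i w_i * h_i(s restricted to scope i)."""
    return sum(w * t[tuple(state[v] for v in scope)]
               for w, scope, t in zip(WEIGHTS, SCOPES, TABLES))


def marginal(belief, scope):
    """Marginal of the joint belief over the variables in scope."""
    out = {}
    for state, p in belief.items():
        key = tuple(state[v] for v in scope)
        out[key] = out.get(key, 0.0) + p
    return out


def dot_brute_force(belief):
    """b . alpha summed over every joint state (exponential in N_VARS)."""
    return sum(p * alpha(state) for state, p in belief.items())


def dot_factored(marginals):
    """b . alpha using only the marginal over each basis function's scope."""
    return sum(w * sum(m.get(a, 0.0) * t[a] for a in t)
               for w, m, t in zip(WEIGHTS, marginals, TABLES))


MARGINALS = [marginal(BELIEF, scope) for scope in SCOPES]
```

Here `dot_factored` touches 4 + 4 + 2 table entries instead of 16 states; the gap widens as the number of state variables grows while the scopes stay small.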

11 Factored Best Properties. Avoids exponential blowup in belief state representation; exponential in size of basis function domains. Suggests a belief state decomposition: Factored Best only requires marginals; useful at execution time. Monitoring belief state: can represent belief state as product of marginals [Boyen & Koller ’98]; analyze policy loss from belief state approximation [McAllester & Singh ’99] [Poupart & Boutilier ’01].
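The product-of-marginals representation mentioned above can be sketched as follows. This is an illustration only: it assumes a fully factored approximation with one marginal per variable, whereas the cited work may use marginals over larger clusters, and the function names are hypothetical.

```python
from itertools import product


def single_variable_marginals(belief, n_vars):
    """One marginal distribution per state variable, from a joint belief."""
    margs = [dict() for _ in range(n_vars)]
    for state, p in belief.items():
        for v in range(n_vars):
            margs[v][state[v]] = margs[v].get(state[v], 0.0) + p
    return margs


def product_of_marginals(margs, domain=(0, 1)):
    """Approximate joint belief: b(x) is replaced by prod_v b_v(x_v)."""
    approx = {}
    for state in product(domain, repeat=len(margs)):
        p = 1.0
        for v, val in enumerate(state):
            p *= margs[v].get(val, 0.0)
        approx[state] = p
    return approx
```

The approximation keeps every single-variable marginal unchanged but discards correlations between variables, so marginals over a multi-variable basis function domain become products of single-variable marginals.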

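12 Exploiting Structure in PI and Incremental Pruning. Steps: Value Determination; DP Step; Policy Improvement. Operations: Best; Pointwise Dominates. Highlighted: Pointwise Dominates. [Figure: policy graph with nodes 1:a1 and 2:a2 and edges labeled by observations O1 and O2; plots of value V over belief b showing vectors 1 and 2, then vectors 1, 2, and 3.]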

13 Pointwise Domination. Does α2 dominate α4 pointwise? Check whether the minimum over states of (α2 − α4) is ≥ 0. Factored value functions: minimization over exponential state space! Minimization over a factored function is efficient with cost networks [Bertele and Brioschi ’72], [Dechter ’99]. [Figure: value V over belief b with vectors 1, 2, 3, and 4.]
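A minimal sketch of the domination test, assuming binary state variables and invented local tables. The difference α2 − α4 of two factored vectors is itself a sum of local functions, so its minimum can be found by eliminating one variable at a time, in the spirit of a cost network, rather than enumerating all states. The function names and the elimination order are assumptions for illustration.

```python
from itertools import product


def eliminate_min(factors, order, domain=(0, 1)):
    """Minimum over all states of a sum of local functions.

    factors: list of (scope, table); scope is a tuple of variable ids and
    table maps each assignment (in scope order) to a number.
    order: elimination order covering every variable.
    """
    factors = list(factors)
    for var in order:
        touching = [f for f in factors if var in f[0]]
        rest = [f for f in factors if var not in f[0]]
        new_scope = tuple(sorted({v for scope, _ in touching
                                  for v in scope if v != var}))
        new_table = {}
        for assign in product(domain, repeat=len(new_scope)):
            ctx = dict(zip(new_scope, assign))
            best = float("inf")
            for val in domain:
                ctx[var] = val
                total = sum(t[tuple(ctx[v] for v in scope)]
                            for scope, t in touching)
                best = min(best, total)
            new_table[assign] = best
        factors = rest + [(new_scope, new_table)]
    # Every remaining factor now has an empty scope.
    return sum(t[()] for _, t in factors)


def brute_force_min(factors, n_vars, domain=(0, 1)):
    """Reference answer by enumerating every joint state."""
    return min(sum(t[tuple(s[v] for v in scope)] for scope, t in factors)
               for s in product(domain, repeat=n_vars))


def negate(factor):
    scope, table = factor
    return scope, {a: -x for a, x in table.items()}


# Illustrative local terms: alpha2 = f1(X0,X1) + f2(X2,X3), alpha4 = g1(X1,X2).
f1 = ((0, 1), {(0, 0): 3, (0, 1): 2, (1, 0): 4, (1, 1): 3})
f2 = ((2, 3), {(0, 0): 1, (0, 1): 2, (1, 0): 2, (1, 1): 3})
g1 = ((1, 2), {(0, 0): 1, (0, 1): 2, (1, 0): 2, (1, 1): 3})

diff = [f1, f2, negate(g1)]          # local terms of alpha2 - alpha4
min_diff = eliminate_min(diff, order=(0, 1, 2, 3))
dominates = min_diff >= 0            # alpha2 dominates alpha4 pointwise
```

Each elimination step builds a table over only the neighbors of the eliminated variable, so the cost depends on the sizes of those intermediate scopes rather than on the full state space.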

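14 Exploiting Structure in PI and Incremental Pruning. Steps: Value Determination; DP Step; Policy Improvement. Operations: Best; Pointwise Dominates. Highlighted: Value Determination. [Figure: policy graph with nodes 1:a1 and 2:a2 and edges labeled by observations O1 and O2; plots of value V over belief b showing vectors 1 and 2, then vectors 1, 2, and 3.]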

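15 Value Determination. [Equation not recovered; its labels: value of policy, starting from node 1; reward; expected future value; observed O1; observed O2. Figure: value V over belief b with vectors 1, 2, and 3; policy graph with nodes 1:a1, 3:a1, and 2:a2 and edges labeled by observations O1 and O2.]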
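For reference, the standard value determination equation for a policy graph can be written as below. The notation is assumed rather than taken from the slide: a_n is the action at node n, η(n, o) is the successor node after observing o, γ is the discount factor, and observations are taken to depend on the next state and the action.

```latex
\alpha_n(s) = R(s, a_n)
  + \gamma \sum_{s'} P(s' \mid s, a_n)
    \sum_{o} P(o \mid s', a_n) \, \alpha_{\eta(n, o)}(s')
```

This gives one linear equation per node and state, so the exact system has as many equations as (number of nodes) × (number of states).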

16 Approximate Value Determination. Exact value determination requires an exponential number of equations; factored approximation is efficient: find best approximation in max-norm; algorithm exploits factored model; analogous to factored MDP case (see Max-norm Projection IJCAI talk on Thursday).
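One plausible way to write the max-norm objective, by analogy with the factored MDP case and in assumed notation (not taken from the slide): each node's vector is approximated as a weighted sum of basis functions h_i with weights w_{n,i}, a_n is the action at node n, and η(n, o) is the successor node after observing o.

```latex
\alpha_n(s) \approx \sum_i w_{n,i} \, h_i(s)

w^{*} = \arg\min_{w} \; \max_{n} \max_{s}
  \Bigl|
    \sum_i w_{n,i} \, h_i(s)
    - R(s, a_n)
    - \gamma \sum_{s'} P(s' \mid s, a_n)
      \sum_{o} P(o \mid s', a_n)
      \sum_i w_{\eta(n,o),i} \, h_i(s')
  \Bigr|
```

The unknowns are the weights, one per node and basis function, rather than one value per node and state.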

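17 Exploiting Structure: Summary. Steps: Value Determination; DP Step; Policy Improvement. [Figure: policy graph with nodes 1:a1 and 2:a2 and edges labeled by observations O1 and O2; plots of value V over belief b showing vectors 1 and 2, then vectors 1, 2, and 3.]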

18 Conclusions. Factored POMDPs can represent complex systems; factorization in model doesn’t always imply factorization in solution: linear approximation reduces dimensionality of problem; can efficiently find closest linear approximation; can modify standard POMDP algorithms to use factored linear value functions efficiently; complexity is a function of DBN and basis structure.

19 Our Approach. [Figure: value function V over axes b(s1) and b(s2), one dimension for each state; after projection, V over axes h1(s) and h2(s), one dimension for each feature (<< #states).]

