Solving Factored POMDPs with Linear Value Functions. Carlos Guestrin and Daphne Koller (Stanford University); Ronald Parr (Duke University).


1 Solving Factored POMDPs with Linear Value Functions. Carlos Guestrin and Daphne Koller (Stanford University); Ronald Parr (Duke University)

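2 Policy Iteration for POMDPs [Hansen ’98]. Steps: Value Determination; DP Step; Policy Improvement. [Figure: policy graph with nodes 1:a1 and 2:a2 and edges labeled with observations O1, O2; plots of value V against belief b, with vectors 1, 2 and then 1, 2, 3.]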

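3 Policy Iteration for POMDPs [Hansen ’98]. Steps: Value Determination; DP Step; Policy Improvement. [Figure: policy graph with nodes 1:a1, 3:a1 and 2:a2 and edges labeled with observations O1, O2; plots of value V against belief b, with vectors 1, 2 and then 1, 2, 3.]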

4 POMDP Complexity. POMDPs have multiple sources of complexity. Number of vectors can grow exponentially: avoid generating unneeded facets (Witness, IP, etc.); approximate by discarding similar vectors, etc. Each vector has a large representation: one dimension for each state; 2^n dimensions for n binary state variables; can try structured representations of the vectors [Boutilier & Poole ’96] [Hansen & Feng ’00].

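5 Factored POMDPs. Total reward is a sum of sub-rewards: R = R1 + R2; a subset of the variables is observed; actions change only small parts of the model. [Figure: two-slice DBN over state variables X, Y, Z at time t and X’, Y’, Z’ at time t+1, with action nodes A_X and A_Z, observation nodes O_X’ and O_Z’, and reward nodes R1 and R2.]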

6 Exploiting Structure. Structure in the model might imply structure in the vectors. Structured vectors approach [Boutilier & Poole ’96], [Hansen & Feng ’00]: within a vector, many dimensions may be equivalent; collapse them using a tree; works well if the DBN structure leads to a clean decomposition; this doesn’t always hold up, even in MDPs. [Figure: vectors 1 and 2 plotted as V against b = P(XYZ), with a tree over X and Z.]

7 Our Approach. Not all structured POMDPs have structured vectors; embed structure into the value function space a priori: project α-vectors into a structured vector space (linear combinations of structured features); efficiently find the closest approximation to the “true” α-vectors.

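8 Exploiting Structure in PI and Incremental Pruning. Steps: Value Determination; DP Step; Policy Improvement. Operations: Best; Pointwise Dominates. Focus of the next slides: Best. [Figure: policy graph with nodes 1:a1 and 2:a2 and edges labeled O1, O2; plots of V against b with vectors 1, 2 and then 1, 2, 3.]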

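9 Factored Best. Want to find the vector with the highest value for a given belief state; factorization decomposes the dot product. [Equations not preserved in the transcript. Figure: plot of V against b with vectors 1, 2, 3 and a belief point b.]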
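The two statements on this slide can be written out as follows. This is a sketch under assumed notation, not the slide's own equations: each vector α is taken to be a weighted sum of basis functions h_i with weights w_i, and each h_i is assumed to depend only on a small subset C_i of the state variables, with c_i ranging over assignments to C_i.

```latex
\begin{align*}
\mathrm{Best}(b) &= \arg\max_{\alpha} \; b \cdot \alpha, \\
b \cdot \alpha
  &= \sum_{s} b(s) \sum_{i} w_i \, h_i(s)
   = \sum_{i} w_i \sum_{c_i} b(c_i) \, h_i(c_i),
\qquad
b(c_i) = \sum_{s \,\text{consistent with}\, c_i} b(s).
\end{align*}
```

The inner sum on the right ranges over assignments to C_i only, so its size is exponential in |C_i| rather than in the total number of state variables.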

10 Factored Best Example. Assume 4 state variables and 3 basis functions. Decomposition of the dot product: [equation not preserved in the transcript]. Summands depend only on marginal probabilities.
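A small numerical check of this example can be sketched in Python. Everything concrete here is an assumption made for illustration: the variables are taken to be binary, and the basis function scopes, table values, weights, and belief are invented, since the slide's own numbers are not preserved.

```python
from itertools import product

# Assumption: four binary state variables X0..X3.
STATES = list(product((0, 1), repeat=4))

# Hypothetical basis functions, each a (scope, table) pair over a small
# subset of the variables. Scopes and values are made up for illustration.
BASIS = [
    ((0, 1), {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 2.0, (1, 1): 0.5}),
    ((1, 2), {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}),
    ((2, 3), {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 0.0, (1, 1): 1.0}),
]
WEIGHTS = [0.5, -1.0, 2.0]  # hypothetical weights w_i of one vector

# An arbitrary belief over the 16 joint states (not a product distribution).
raw = [1.0 + i for i in range(len(STATES))]
belief = {s: r / sum(raw) for s, r in zip(STATES, raw)}


def marginal(belief, scope):
    """Marginal of the belief over the variables listed in scope."""
    out = {}
    for state, p in belief.items():
        key = tuple(state[v] for v in scope)
        out[key] = out.get(key, 0.0) + p
    return out


def flat_dot(belief, weights, basis):
    """Dot product computed state by state (exponential in #variables)."""
    total = 0.0
    for state, p in belief.items():
        value = sum(
            w * table[tuple(state[v] for v in scope)]
            for w, (scope, table) in zip(weights, basis)
        )
        total += p * value
    return total


def factored_dot(marginals, weights, basis):
    """Same dot product, using only one marginal per basis function."""
    total = 0.0
    for w, (scope, table), m in zip(weights, basis, marginals):
        total += w * sum(m.get(c, 0.0) * table[c] for c in table)
    return total


marginals = [marginal(belief, scope) for scope, _ in BASIS]
flat = flat_dot(belief, WEIGHTS, BASIS)
factored = factored_dot(marginals, WEIGHTS, BASIS)
print(abs(flat - factored) < 1e-9)
```

In this toy the marginals are derived from the full joint belief only so that the two computations can be compared; `factored_dot` itself reads three tables of four entries each and never enumerates the 16 joint states.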

11 Factored Best Properties. Avoids exponential blowup in the belief state representation; cost is exponential in the size of the basis function domains. Suggests a belief state decomposition: Factored Best only requires marginals; useful at execution time. Monitoring belief state: can represent the belief state as a product of marginals [Boyen & Koller ’98]; analyze policy loss from belief state approximation [McAllester & Singh ’99] [Poupart & Boutilier ’01].

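12 Exploiting Structure in PI and Incremental Pruning. Steps: Value Determination; DP Step; Policy Improvement. Operations: Best; Pointwise Dominates. Focus of the next slide: Pointwise Dominates. [Figure: policy graph with nodes 1:a1 and 2:a2 and edges labeled O1, O2; plots of V against b with vectors 1, 2 and then 1, 2, 3.]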

13 Pointwise Domination. Does α2 dominate α4 pointwise? It does if the minimum over states s of α2(s) − α4(s) is ≥ 0. For factored value functions this is a minimization over an exponential state space, but minimizing a factored function is efficient with cost networks [Bertele and Brioschi ’72], [Dechter ’99]. [Figure: plot of V against b with vectors 1, 2, 3, 4.]
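The minimization can be illustrated with a minimal variable-elimination sketch in Python. It is not the authors' implementation: the binary variables, basis tables, weights, and elimination order are all hypothetical. The difference of two vectors over the same basis is itself a weighted sum of the basis functions, so the check reduces to minimizing a sum of local tables.

```python
from itertools import product


def eliminate_min(factors, order):
    """Minimum over all joint assignments of a sum of local tables.

    factors: list of (scope, table) pairs over binary variables.
    order:   elimination order covering every variable in any scope.
    """
    factors = list(factors)
    for var in order:
        touching = [f for f in factors if var in f[0]]
        rest = [f for f in factors if var not in f[0]]
        # The new factor ranges over the neighbours of the eliminated variable.
        new_scope = tuple(
            sorted({v for scope, _ in touching for v in scope if v != var})
        )
        new_table = {}
        for assignment in product((0, 1), repeat=len(new_scope)):
            ctx = dict(zip(new_scope, assignment))
            best = float("inf")
            for val in (0, 1):
                ctx[var] = val
                total = sum(
                    table[tuple(ctx[v] for v in scope)]
                    for scope, table in touching
                )
                best = min(best, total)
            new_table[assignment] = best
        factors = rest + [(new_scope, new_table)]
    # Every remaining factor now has an empty scope.
    return sum(table[()] for _, table in factors)


def dominates(w_a, w_b, basis, order, tol=1e-12):
    """True if sum_i w_a[i]*h_i(s) >= sum_i w_b[i]*h_i(s) for every state s."""
    diff = [
        (scope, {k: (wa - wb) * v for k, v in table.items()})
        for wa, wb, (scope, table) in zip(w_a, w_b, basis)
    ]
    return eliminate_min(diff, order) >= -tol


# Hypothetical basis: a constant function plus three pairwise functions.
BASIS = [
    ((), {(): 1.0}),
    ((0, 1), {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 2.0, (1, 1): 0.5}),
    ((1, 2), {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}),
    ((2, 3), {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 0.0, (1, 1): 1.0}),
]
W_A = [3.0, 1.0, 1.0, 1.0]  # hypothetical weights of one vector
W_B = [0.0, 1.5, 0.5, 1.0]  # hypothetical weights of another
ORDER = [0, 1, 2, 3]

a_dominates_b = dominates(W_A, W_B, BASIS, ORDER)
b_dominates_a = dominates(W_B, W_A, BASIS, ORDER)
```

Each elimination step builds a table over only the neighbours of the eliminated variable, so the cost depends on the scopes and the elimination order rather than on the total number of joint states.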

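14 Exploiting Structure in PI and Incremental Pruning. Steps: Value Determination; DP Step; Policy Improvement. Operations: Best; Pointwise Dominates. Focus of the next slides: Value Determination. [Figure: policy graph with nodes 1:a1 and 2:a2 and edges labeled O1, O2; plots of V against b with vectors 1, 2 and then 1, 2, 3.]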

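15 Value Determination. [Equation not preserved in the transcript; its parts are labeled: value of policy, starting from node 1; reward; expected future value; observed O1; observed O2.] [Figure: policy graph with nodes 1:a1, 3:a1 and 2:a2 and edges labeled O1, O2; plot of V against b with vectors 1, 2, 3.]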
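For reference, the standard value determination equations for a policy graph, in the style of [Hansen ’98], can be written as below. The notation is assumed rather than taken from the slide: a_n is the action at node n, succ(n, o) is the node reached from n after observing o, and γ is a discount factor.

```latex
\[
\alpha_n(s) \;=\; R(s, a_n)
  \;+\; \gamma \sum_{s'} P(s' \mid s, a_n)
        \sum_{o} P(o \mid s', a_n)\, \alpha_{\mathrm{succ}(n, o)}(s')
\qquad \text{for every node } n \text{ and state } s .
\]
```

This is one linear equation per (node, state) pair, so for a factored model the system has a number of equations exponential in the number of state variables.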

16 Approximate Value Determination. Exact value determination requires an exponential number of equations; the factored approximation is efficient: find the best approximation in max-norm; the algorithm exploits the factored model; analogous to the factored MDP case (see the Max-norm Projection IJCAI talk on Thursday).
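One plausible way to formalize "best approximation in max-norm" is sketched below. It is an assumption by analogy with max-norm projection for factored MDPs, not a formula taken from the slides. Each node's vector is restricted to a linear combination of basis functions h_i with node-specific weights w_i^(n); a_n is the action at node n, succ(n, o) the successor node after observation o, and γ a discount factor.

```latex
\[
\min_{w} \; \max_{n,\, s} \;
\Bigl|\,
  \sum_{i} w^{(n)}_i h_i(s)
  \;-\;
  \Bigl[ R(s, a_n)
    + \gamma \sum_{s'} P(s' \mid s, a_n) \sum_{o} P(o \mid s', a_n)
      \sum_{i} w^{(\mathrm{succ}(n, o))}_i h_i(s')
  \Bigr]
\Bigr|
\]
```

The unknowns are the weights only, one per basis function per node, instead of one value per state per node.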

17 Exploiting Structure: Summary. Steps: Value Determination; DP Step; Policy Improvement. [Figure: policy graph with nodes 1:a1 and 2:a2 and edges labeled O1, O2; plots of V against b with vectors 1, 2 and then 1, 2, 3.]

18 Conclusions. Factored POMDPs can represent complex systems; factorization in the model doesn’t always imply factorization in the solution: linear approximation reduces the dimensionality of the problem; can efficiently find the closest linear approximation; can modify standard POMDP algorithms to use factored linear value functions efficiently; complexity is a function of the DBN and basis structure.

19 Our Approach. [Figure: value V plotted over belief coordinates b(s1), b(s2), with one dimension for each state, projected onto a space spanned by features h1(s), h2(s), with one dimension for each feature (<< #states).]

