Solving Factored POMDPs with Linear Value Functions
Carlos Guestrin, Daphne Koller (Stanford University); Ronald Parr (Duke University)

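Policy Iteration for POMDPs [Hansen ’98]
Steps: Value Determination, DP Step, Policy Improvement.
[Figure: policy graph with nodes 1:a1 and 2:a2 linked by observation edges O1 and O2; value function V over belief b, first with vectors 1 and 2, then with vector 3 added.]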

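Policy Iteration for POMDPs [Hansen ’98] (continued)
Steps: Value Determination, DP Step, Policy Improvement.
[Figure: value function V over belief b with vectors 1, 2 and 3; improved policy graph with nodes 1:a1, 3:a1 and 2:a2 linked by observation edges O1 and O2.]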

POMDP Complexity
POMDPs have multiple sources of complexity.
The number of vectors can grow exponentially: avoid generating unneeded facets (Witness, IP, etc.); approximate by discarding similar vectors, etc.
Each vector has a large representation: one dimension for each state, so 2^n dimensions for n state variables; can try structured representations of the vectors [Boutilier & Poole ’96], [Hansen & Feng ’00].

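Factored POMDPs
The total reward is the sum of sub-rewards: R = R1 + R2.
A subset of the variables is observed.
Actions only change small parts of the model.
[Figure: DBN over state variables X, Y, Z at time t and X’, Y’, Z’ at time t+1, with reward nodes R1 and R2, observation nodes OX’ and OZ’, and action nodes AX and AZ.]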

Exploiting Structure
Structure in the model might imply structure in the vectors.
Structured vectors approach [Boutilier & Poole ’96], [Hansen & Feng ’00]: within a vector, many dimensions may be equivalent; collapse them using a tree; works well if the DBN structure leads to a clean decomposition; doesn’t always hold up, even in MDPs.
[Figure: value function V over belief b = P(XYZ) with vectors 1 and 2; tree over X and Z.]

Our Approach
Not all structured POMDPs have structured vectors.
Embed structure into the value function space a priori: project α-vectors into a structured vector space (linear combinations of structured features); efficiently find the closest approximation to the “true” α-vectors.
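The projection on this slide can be written out as a worked formula. This is a sketch under assumed notation: the feature names h_i follow the h1(s), h2(s) labels on the backup slide, while the weights w_i, the count k, and the use of the max-norm as the distance (named on the Approximate Value Determination slide) are filled in for illustration.

```latex
\alpha(s) \;\approx\; \sum_{i=1}^{k} w_i\, h_i(s),
\qquad
w^{*} \;=\; \arg\min_{w}\; \max_{s}\;
  \Bigl|\, \alpha(s) - \sum_{i=1}^{k} w_i\, h_i(s) \Bigr|
```

Each approximate vector is then stored as k weights rather than one entry per state.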

Exploiting Structure in PI and Incremental Pruning
[Figure: policy iteration diagram (Value Determination, DP Step, Policy Improvement) annotated with the subroutines Best and Pointwise Dominates. Highlighted: Best.]

Factored Best
Want to find the vector with the highest value for a given belief state.
Factorization decomposes the dot product.
[Figure: value function V over belief b with vectors 1, 2 and 3, evaluated at a belief point b.]
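One way to write the decomposition mentioned on this slide, under assumed notation: each vector α_j is a weighted sum of basis functions h_i, and each h_i depends only on a small subset C_i of the state variables. The symbols w_{j,i}, C_i and c_i are introduced here for illustration and are not taken from the slides.

```latex
\begin{align*}
\mathrm{Best}(b) &= \arg\max_{j}\; b \cdot \alpha_j, \\
b \cdot \alpha_j
  &= \sum_{s} b(s) \sum_{i} w_{j,i}\, h_i(s[C_i])
   = \sum_{i} w_{j,i} \sum_{c_i} b(C_i = c_i)\, h_i(c_i).
\end{align*}
```

The inner sum on the right ranges over assignments to C_i only, so each term needs just the marginal of b over C_i.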

Factored Best Example
Assume 4 state variables and 3 basis functions. In the decomposition of the dot product, the summands depend only on marginal probabilities.
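The example above can be sketched in Python. The slide's actual basis functions are not recoverable from the transcript, so the variable names, scopes, tables and weights below are all hypothetical; the sketch only illustrates that the dot product computed from marginals agrees with the one computed over all 2^4 states.

```python
from itertools import product

VARS = ["X1", "X2", "X3", "X4"]  # four binary state variables (hypothetical names)

# Hypothetical basis functions: (scope, table over assignments to that scope).
BASIS = [
    (("X1", "X2"), {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}),
    (("X2", "X3"), {(0, 0): 2.0, (0, 1): 0.0, (1, 0): 0.5, (1, 1): 1.0}),
    (("X4",), {(0,): 0.0, (1,): 4.0}),
]


def marginal(belief, scope):
    """Marginalize a joint belief (dict: full state tuple -> prob) onto scope."""
    idx = [VARS.index(v) for v in scope]
    out = {}
    for state, p in belief.items():
        key = tuple(state[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def factored_dot(weights, marginals):
    """b . alpha using only the marginals over each basis function's scope."""
    total = 0.0
    for w, (_, table), m in zip(weights, BASIS, marginals):
        total += w * sum(m.get(c, 0.0) * h for c, h in table.items())
    return total


def brute_force_dot(weights, belief):
    """b . alpha by explicit enumeration of every state (for comparison only)."""
    total = 0.0
    for state, p in belief.items():
        s = dict(zip(VARS, state))
        alpha_s = sum(w * table[tuple(s[v] for v in scope)]
                      for w, (scope, table) in zip(weights, BASIS))
        total += p * alpha_s
    return total


def best(vectors, marginals):
    """Index of the weight vector with the highest value at the belief."""
    return max(range(len(vectors)),
               key=lambda k: factored_dot(vectors[k], marginals))


# Arbitrary joint belief over the 16 states (weights 1..16, normalized by 136).
belief = {s: (i + 1) / 136.0
          for i, s in enumerate(product((0, 1), repeat=4))}
marginals = [marginal(belief, scope) for scope, _ in BASIS]

# Three hypothetical alpha-vectors, each stored as 3 basis weights.
vectors = [[1.0, 0.5, 0.2], [0.3, 1.0, 1.0], [2.0, -1.0, 0.1]]
best_index = best(vectors, marginals)
```

In this sketch the marginals are computed from an explicit joint belief only to allow the comparison; at execution time the marginals themselves would be the belief representation.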

Factored Best Properties
Avoids exponential blowup in the belief state representation; cost is exponential in the size of the basis function domains.
Suggests a belief state decomposition: Factored Best only requires marginals, which is useful at execution time.
Monitoring the belief state: can represent the belief state as a product of marginals [Boyen & Koller ’98]; analyze the policy loss from belief state approximation [McAllester & Singh ’99], [Poupart & Boutilier ’01].

Exploiting Structure in PI and Incremental Pruning
[Figure: policy iteration diagram (Value Determination, DP Step, Policy Improvement) annotated with the subroutines Best and Pointwise Dominates. Highlighted: Pointwise Dominates.]

Pointwise Domination
Does α2 dominate α4 pointwise? It does if the minimum over states of α2 − α4 is ≥ 0.
With factored value functions this is a minimization over an exponential state space, but minimization over a factored function is efficient with cost networks [Bertele and Brioschi ’72], [Dechter ’99].
[Figure: value function V over belief b with vectors 1, 2, 3 and 4.]
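A minimal sketch of the minimization step, assuming the difference α2 − α4 has already been written as a sum of local functions over small sets of binary variables. It uses plain variable elimination with min in place of sum, which is one standard way to realize a cost network; the factor tables, variable names and elimination order are hypothetical.

```python
from itertools import product


def eliminate_min(factors, order, domain=(0, 1)):
    """Minimum over all assignments of the sum of the given local factors.

    factors: list of (scope, table); scope is a tuple of variable names and
             table maps each assignment tuple over the scope to a float.
    order:   elimination order covering every variable that appears.
    """
    factors = list(factors)
    for var in order:
        touching = [f for f in factors if var in f[0]]
        factors = [f for f in factors if var not in f[0]]
        if not touching:
            continue
        # The new factor ranges over the neighbours of var.
        new_scope = tuple(sorted(
            {v for scope, _ in touching for v in scope} - {var}))
        new_table = {}
        for assign in product(domain, repeat=len(new_scope)):
            ctx = dict(zip(new_scope, assign))
            lowest = float("inf")
            for val in domain:
                ctx[var] = val
                total = sum(t[tuple(ctx[v] for v in scope)]
                            for scope, t in touching)
                lowest = min(lowest, total)
            new_table[assign] = lowest
        factors.append((new_scope, new_table))
    # Every remaining factor now has an empty scope.
    return sum(t[()] for _, t in factors)


def brute_force_min(factors, variables, domain=(0, 1)):
    """Same minimum by enumerating every joint assignment (for checking)."""
    lowest = float("inf")
    for assign in product(domain, repeat=len(variables)):
        ctx = dict(zip(variables, assign))
        total = sum(t[tuple(ctx[v] for v in scope)] for scope, t in factors)
        lowest = min(lowest, total)
    return lowest


# Hypothetical local terms of the difference alpha_2 - alpha_4.
diff_factors = [
    (("A", "B"), {(0, 0): 1.0, (0, 1): -2.0, (1, 0): 0.5, (1, 1): 3.0}),
    (("B", "C"), {(0, 0): 0.0, (0, 1): 2.0, (1, 0): 1.5, (1, 1): -1.0}),
    (("C", "D"), {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 2.0, (1, 1): 0.5}),
]

minimum = eliminate_min(diff_factors, order=["A", "D", "B", "C"])
dominates = minimum >= 0  # pointwise domination holds only if this is True
```

The cost of each elimination step is exponential in the number of variables in the intermediate factor, not in the total number of state variables, so the elimination order matters in practice.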

Exploiting Structure in PI and Incremental Pruning
[Figure: policy iteration diagram (Value Determination, DP Step, Policy Improvement) annotated with the subroutines Best and Pointwise Dominates. Highlighted: Value Determination.]

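Value Determination
The value of the policy, starting from node 1, combines the reward with the expected future value after observing O1 or O2.
[Figure: value function V over belief b with vectors 1, 2 and 3; policy graph with nodes 1:a1, 3:a1 and 2:a2 linked by observation edges O1 and O2.]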
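For a flat (unfactored) model, value determination for a policy graph can be sketched as below. This is an illustration under assumptions: the discount factor gamma, the dictionary layout of the model, and the tiny two-state model are all hypothetical, and the sketch iterates the evaluation equations to a fixed point instead of solving the linear system directly.

```python
def evaluate_policy_graph(nodes, T, O, R, gamma=0.95, iters=2000):
    """Iteratively evaluate a policy graph (finite-state controller).

    nodes: {node: (action, {observation: successor_node})}
    T[a][s][s2]: probability of moving from state s to s2 under action a
    O[a][s2][o]: probability of observing o after reaching s2 under action a
    R[a][s]:     immediate reward for action a in state s
    Returns {node: [value of starting at node in state s, for each s]}.
    """
    n_states = len(next(iter(R.values())))
    V = {n: [0.0] * n_states for n in nodes}
    for _ in range(iters):
        new_V = {}
        for n, (a, succ) in nodes.items():
            values = []
            for s in range(n_states):
                future = 0.0
                for s2 in range(n_states):
                    for o, n2 in succ.items():
                        future += T[a][s][s2] * O[a][s2][o] * V[n2][s2]
                values.append(R[a][s] + gamma * future)
            new_V[n] = values
        V = new_V
    return V


# Hypothetical two-state, two-action, two-observation model.
T = {"a1": [[0.9, 0.1], [0.2, 0.8]],
     "a2": [[0.5, 0.5], [0.5, 0.5]]}
O = {"a1": [{"O1": 0.8, "O2": 0.2}, {"O1": 0.3, "O2": 0.7}],
     "a2": [{"O1": 0.5, "O2": 0.5}, {"O1": 0.5, "O2": 0.5}]}
R = {"a1": [1.0, 0.0], "a2": [0.0, 2.0]}

# Two-node policy graph: node 1 plays a1, node 2 plays a2.
nodes = {1: ("a1", {"O1": 1, "O2": 2}),
         2: ("a2", {"O1": 1, "O2": 2})}

alpha = evaluate_policy_graph(nodes, T, O, R)
```

Each entry alpha[n] is one vector with a dimension per state, which is the representation that becomes too large when the state space is exponential in the number of variables.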

Approximate Value Determination
Exact value determination requires solving an exponential number of equations.
The factored approximation is efficient: find the best approximation in max-norm; the algorithm exploits the factored model; analogous to the factored MDP case (see the Max-norm Projection IJCAI talk on Thursday).

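Exploiting Structure: Summary
[Figure: policy iteration diagram (Value Determination, DP Step, Policy Improvement) with policy graph nodes 1:a1 and 2:a2, observation edges O1 and O2, and value function V over belief b with vectors 1, 2 and 3.]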

Conclusions
Factored POMDPs can represent complex systems.
Factorization in the model doesn’t always imply factorization in the solution: linear approximation reduces the dimensionality of the problem; can efficiently find the closest linear approximation.
Can modify standard POMDP algorithms to use factored linear value functions efficiently.
Complexity is a function of the DBN and basis structure.

Our Approach
[Figure: projection from the original value function space, with axes b(s1), b(s2) and one dimension for each state, to the feature space, with axes h1(s), h2(s) and one dimension for each feature (<< #states).]
