
1 Simultaneously Learning and Filtering
Juan F. Mancilla-Caceres, CS498EA - Fall 2011
Some slides from Connecting Learning and Logic, Eyal Amir 2006

2 Objective: learn the effects and preconditions of actions in partially observable domains.
Example: two rooms in the world
–One with a switch, the other with a light bulb
–The state of the light bulb can only be observed when the agent is in the West room
What does turning on the switch do?

3 Motivation: Exploration Agents
Exploring partially observable domains:
–Interfaces to new software
–Game-playing/companion agents
–Robots exploring buildings, cities, planets
–Agents acting on the WWW
Difficulties:
–No knowledge of actions' effects a priori
–Many features
–Partially observable domain

4 Outline
–Problem Intuition
–Motivation
–Formal Definition
–Learning by Logical Inference
–Algorithm
–Experiments
–Comparison to other methods

5 SLAF: Simultaneous Learning and Filtering
Exact learning of action models (i.e., of the way actions affect the world): determine the set of possible transition relations, given an execution sequence of actions and partial observations.
Online update of a transition belief formula:
–Similar to Bayesian learning of HMMs and to Logical Filtering.
–The basic algorithm takes time linear in the size of the input formula.
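
A minimal sketch of the online update loop, assuming hypothetical progress and filter_obs callbacks for the two SLAF steps (concrete set-based versions of both appear with the door example below):

    def slaf(belief, history, progress, filter_obs):
        # Online SLAF: alternate progression on actions and filtering on
        # observations, in the same spirit as the forward pass of an HMM.
        # `belief` is a transition belief state/formula; `progress` and
        # `filter_obs` implement SLAF[a] and SLAF[o] respectively.
        for action, observation in history:
            belief = progress(belief, action)             # progress on action a
            if observation is not None:
                belief = filter_obs(belief, observation)  # filter on observation
        return belief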

6 SLAF: Simultaneous Learning and Filtering
Assumptions:
–Action models do not change with time.
–The system's complete dynamics are not initially available.
Solution:
–Keep all combinations of action models that could possibly have given rise to the observations in the input, together with all the corresponding states the system may be in.
–Computing this solution can be done recursively.

7 Definition: Transition System
A world state s ∈ S is a subset of P (the set of domain propositions) containing exactly the propositions true in that state.
R(s, a, s') means that state s' is a possible result of executing action a in state s.
A transition belief state is a set of tuples ⟨s, R⟩, pairing a world state with a candidate transition relation.
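
These objects are easy to write down directly; a minimal sketch in Python (the string encoding of propositions and actions is an assumption made for illustration):

    from typing import FrozenSet, Set, Tuple

    State = FrozenSet[str]                 # a world state: subset of P
    Transition = Tuple[State, str, State]  # one entry R(s, a, s')
    Relation = FrozenSet[Transition]       # a candidate transition relation R
    Belief = Set[Tuple[State, Relation]]   # transition belief state: {<s, R>}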

8 Example: Unlocking a Door
An agent is in a room with a locked door. It has three different keys, and it cannot tell by observation alone which key opens the door. The goal of the agent is to unlock the door.
What is the transition system?

9 Example: Unlocking a Door
P = {locked}
S = {s1, s2}, where s1 = {locked} and s2 = {}
A = {unlock1, unlock2, unlock3}
R1 = {⟨s1, unlock1, s2⟩, ⟨s1, unlock2, s1⟩, ⟨s1, unlock3, s1⟩}
What does R1 mean?

10 Example: Unlocking a Door
R1 says that key 1 is the one that opens the door; R2 and R3 are defined in a similar fashion for keys 2 and 3.
The transition belief state ρ = {⟨s1, R1⟩, ⟨s1, R2⟩, ⟨s1, R3⟩} represents a fully known state of the world but an only partially known action model.
The agent needs to learn which key opens the door.
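
A concrete, hedged encoding of the example using the types sketched above:

    s1 = frozenset({"locked"})
    s2 = frozenset()
    keys = ["unlock1", "unlock2", "unlock3"]

    def relation(opening_key):
        # R_i: only key i moves s1 to s2; every other key leaves s1 unchanged.
        return frozenset((s1, a, s2 if a == opening_key else s1) for a in keys)

    R1, R2, R3 = (relation(k) for k in keys)
    rho = {(s1, R1), (s1, R2), (s1, R3)}  # known state, unknown action model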

11 SLAF Semantics
The progression of unlock1 is given by SLAF[unlock1](ρ) = {⟨s2, R1⟩, ⟨s1, R2⟩, ⟨s1, R3⟩}.
Filtering that result on the observation ¬locked gives SLAF[¬locked](SLAF[unlock1](ρ)) = {⟨s2, R1⟩}: only R1 remains, so key 1 opens the door.
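
A set-theoretic sketch of the two steps, operating on the rho built above; these can be plugged into the slaf loop sketched earlier (this mirrors the semantics, not the efficient formula-based algorithm presented later):

    def progress(rho, action):
        # SLAF[a]: replace each <s, R> by every <s', R> such that R(s, a, s').
        return {(t, R) for (s, R) in rho for (u, a, t) in R
                if u == s and a == action}

    def filter_obs(rho, obs):
        # SLAF[o]: keep only tuples whose state agrees with the observed literal.
        prop, value = obs                  # e.g. ("locked", False) for ~locked
        return {(s, R) for (s, R) in rho if (prop in s) == value}

    after = filter_obs(progress(rho, "unlock1"), ("locked", False))
    # after == {(s2, R1)}: only R1 survives, so key 1 must open the door.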

12 Example: Back to the Switch and the Light
The semantics of SLAF generalize belief states and Logical Filtering:
"If the transition relation is R, then the belief state is σ_R."

13 Outline
–Problem Intuition
–Motivation
–Formal Definition
–Learning by Logical Inference
–Algorithm
–Experiments
–Comparison to other methods

14 Learning Transition Models Directly Is Intractable
It requires space Ω(2^(2^|P|)) in many cases.
It is possible to represent transition belief states more compactly using propositional logic, although no encoding is compact for all sets.
Idea: re-define SLAF as an operation on propositional formulas, with a propositional formula as output.
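
A quick back-of-the-envelope illustration of the blow-up: with |P| propositions there are 2^|P| world states, hence 2^(2^|P|) distinct sets of states, before even enumerating the candidate action models.

    for n_props in (1, 2, 3, 4, 5):
        n_states = 2 ** n_props          # |S| = 2^|P|
        n_state_sets = 2 ** n_states     # number of distinct sets of states
        print(n_props, n_states, n_state_sets)
    # Already at |P| = 5 there are 2^32 (about 4.3e9) sets of states.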

15 Terminology
Terminology follows general propositional logical languages:
–L denotes a vocabulary, i.e., a set of propositional variables.
–L(L) denotes the language built from the propositions in L using the standard connectives, i.e., the set of propositional sentences.
–φ, ψ denote propositional formulas.

16 Definitions
L_A is the vocabulary used to describe transition relations. It consists of propositions of the form a^F_G, where a is an action and F and G are formulas: F is the effect of a, and G is the precondition, which we will assume to be a single state in S.
Reading: if G holds in the current state, then F holds in the state that results from executing a.
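
As an illustration, a hedged sketch enumerating this vocabulary for the door domain; triples stand in for the propositions a^F_G, and restricting F and G to single literals is an assumption made for the example:

    actions = ["unlock1", "unlock2", "unlock3"]
    literals = ["locked", "~locked"]       # formulas over P = {locked}
    vocabulary = [(a, F, G) for a in actions
                            for F in literals    # effect of a
                            for G in literals]   # precondition (a single state)
    # ("unlock1", "~locked", "locked") reads: "if `locked` holds now,
    # then `~locked` holds after executing unlock1".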

17 Semantics
Every interpretation M of L_A corresponds to a transition relation R_M, and every transition relation has at least one (possibly more) interpretation corresponding to it.
On the left of the union we treat what the action affects; on the right, what keeps its value (inertia).
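
The defining equation on this slide did not survive the transcript; the following is only a plausible reconstruction from the surrounding text (effects on the left of the union, inertia on the right), taking the precondition G to be the single state s:

    \[
      R_M = \bigl\{\, \langle s, a, s' \rangle \;:\;
        s' = \{\, p : M \models a^{p}_{s} \,\} \cup
             \{\, p \in s : M \not\models a^{\neg p}_{s} \,\} \,\bigr\}
    \]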

18 Every Transition Relation Defines a Formula
Th0 addresses fluent changes, Th1 addresses inertia, and Th2 addresses the conditions under which actions are not executable.
For every transition belief state ρ we can define a formula, and for every formula we can define a transition belief state.

19 Transition Formula Filtering
Let Cn_L(φ) denote the set of logical consequences of φ restricted to the vocabulary L. Consequence finding is any process that computes Cn_L(φ) for an input φ (e.g., resolution).
The first part says that if a executes at time t, a causes l when G holds, and G holds at time t, then l holds at time t+1. The second part says that if l holds after a's execution, then some a^l_G whose G holds in the current state must be true.
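
The formula itself is missing from the transcript; a hedged reconstruction matching the two sentences above (subscript t indexes time steps):

    \[
      \bigl(a_t \wedge a^{l}_{G} \wedge G_t\bigr) \Rightarrow l_{t+1}
      \qquad\text{and}\qquad
      \bigl(a_t \wedge l_{t+1}\bigr) \Rightarrow \bigvee_{G}\bigl(a^{l}_{G} \wedge G_t\bigr)
    \]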

20 Outline
–Problem Intuition
–Motivation
–Formal Definition
–Learning by Logical Inference
–Algorithm
–Experiments
–Comparison to other methods

21 Algorithm: SLAF0
As stated before, consequence finding can be implemented with algorithms such as resolution.
Example:
–φ0 = locked, φ1 = SLAF0[unlock2, locked](φ0)
–Asking whether φ1 entails the proposition unlock2^locked_locked ("unlock2 causes locked if locked") is equivalent to asking whether all models consistent with φ1 assign it TRUE.
–Take the output of SLAF0 and check whether φ1 ∧ ¬unlock2^locked_locked is satisfiable; if it is not, the proposition is entailed.
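
A minimal sketch of that entailment test by brute-force satisfiability; the formula phi1 and the proposition names here are toy stand-ins invented for illustration, not the actual SLAF0 output, and a real implementation would use resolution or a SAT solver:

    from itertools import product

    def satisfiable(formula, props):
        # Brute-force SAT: try every truth assignment over `props`.
        return any(formula(dict(zip(props, bits)))
                   for bits in product([False, True], repeat=len(props)))

    def entails(phi, psi, props):
        # phi entails psi  iff  phi AND NOT(psi) is unsatisfiable.
        return not satisfiable(lambda m: phi(m) and not psi(m), props)

    props = ["locked1", "u2_locked_locked"]
    phi1 = lambda m: m["locked1"] and m["u2_locked_locked"]
    print(entails(phi1, lambda m: m["u2_locked_locked"], props))  # True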

22 SLAF distributes over logical connectives. When the processing can be broken into independent pieces, computation scales linearly in the number of pieces.
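
A hedged statement of the property: disjunction distributes exactly, while conjunction distributes in one direction in general, with equivalence under conditions identified in the paper.

    \[
      SLAF[a](\varphi \vee \psi) \equiv SLAF[a](\varphi) \vee SLAF[a](\psi)
    \]
    \[
      SLAF[a](\varphi \wedge \psi) \models SLAF[a](\varphi) \wedge SLAF[a](\psi)
    \]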

23 Algorithm: Factored SLAF

24 Outline
–Problem Intuition
–Motivation
–Formal Definition
–Learning by Logical Inference
–Algorithm
–Experiments
–Comparison to other methods

25 Experiments
How much time and space do SLAF computations take in practice?
The experiments ran over domains taken from the 3rd International Planning Competition. The algorithm receives 10 random fluents at every time step; it does not receive the size of the domain, the starting state, or the set of fluents in advance.
For every domain, the algorithm was run over different numbers of fluents. The theoretical bound is O(T·n^k).

26 Time per step remains relatively constant.

27 Space grows with the domain size, but scales easily for moderate domain sizes.

28 Outline
–Problem Intuition
–Motivation
–Formal Definition
–Learning by Logical Inference
–Algorithm
–Experiments
–Comparison to other methods

29 Other Approaches
Reinforcement learning and HMMs:
–Maintain a probability distribution over the current state.
–Exact solutions are intractable for high-dimensional domains.
–Approximate solutions have unbounded errors or make strong mixing assumptions.
Learning AI-planning operators:
–Assumes a fully observable domain.
–Action preconditions are usually engineered to avoid unwanted cases.

30–33 Example of Other Methods: DBN

34 Conclusions
First scalable learning algorithm for partially observable dynamic domains.
Insight: a (sometimes) compact encoding using propositional logic.
Exact for actions that always have the same effect.
Takes polynomial update time.
Can solve problems with n > 1000 domain features (> 2^1000 states).

