Simultaneously Learning and Filtering Juan F. Mancilla-Caceres CS498EA - Fall 2011 Some slides from Connecting Learning and Logic, Eyal Amir 2006.


2 Objective: To learn the effects and preconditions of actions in partially observable domains. Two rooms in the world: one with a switch, the other with a light bulb. The state of the light bulb can only be observed when the agent is in the West room. What does turning on the switch do?

3 Motivation: Exploration Agents Exploring partially observable domains –Interfaces to new software –Game-playing/companion agents –Robots exploring buildings, cities, planets –Agents acting on the Web Difficulties: –No knowledge of actions' effects a priori –Many features –Partially observable domain

4 Outline Problem Intuition Motivation Formal Definition Learning by Logical Inference Algorithm Experiments Comparison to other methods

5 SLAF: Simultaneous Learning and Filtering Exact learning of action models (i.e., the way actions affect the world): it determines the set of possible transition relations, given an execution sequence of actions and partial observations. Online update of the transition belief formula. –Similar to Bayesian learning of HMMs and to Logical Filtering. –The basic algorithm takes time linear in the size of the input formula.

6 SLAF: Simultaneous Learning and Filtering Assumptions: –Action models do not change with time. –The system's complete dynamics are not initially available. Solution: –The set of all action models that could possibly have given rise to the observations in the input, together with all the corresponding states in which the system may be. –This solution can be computed recursively.

7 Definition: Transition System A world state, s ∈ S, is the subset of P containing exactly the propositions true in that state. R(s,a,s′) means that state s′ is a possible result of action a in state s. A transition belief state is a set of tuples ⟨s, R⟩: a state the system may be in, paired with a transition relation that may govern it.

8 Example: Unlocking a Door An agent is in a room with a locked door. It has three different keys, and the agent cannot tell by observation alone which key opens the door. The goal of the agent is to unlock the door. What is the transition system?

9 Example: Unlocking a Door P = {locked} S = {s1, s2} –where s1 = {locked} and s2 = {} A = {unlock1, unlock2, unlock3} R1 = {⟨s1, unlock1, s2⟩, ⟨s1, unlock2, s1⟩, ⟨s1, unlock3, s1⟩} What does R1 mean?

10 Example: Unlocking a Door R2 and R3 are defined in a similar fashion (Ri is the relation in which key i opens the door). A transition belief state ρ = {⟨s1, R1⟩, ⟨s1, R2⟩, ⟨s1, R3⟩} represents a fully known state of the world but an only partially known action model. The agent needs to learn which key opens the door.

11 SLAF Semantics The progression of unlock1 is given by SLAF[unlock1](ρ) = {⟨s2, R1⟩, ⟨s1, R2⟩, ⟨s1, R3⟩}. Filtering that result on the observation ¬locked gives SLAF[¬locked]({⟨s2, R1⟩, ⟨s1, R2⟩, ⟨s1, R3⟩}) = {⟨s2, R1⟩}.
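The two SLAF steps on ρ can be sketched directly over the enumerated tuples. A minimal brute-force illustration in Python (function names are mine, and the self-loop transitions from the unlocked state are an added assumption, since the slide only lists R1's transitions from s1):

```python
# States are frozensets of true propositions; P = {locked}.
s1 = frozenset({"locked"})   # door locked
s2 = frozenset()             # door unlocked
actions = ("unlock1", "unlock2", "unlock3")

def make_relation(working_key):
    """Relation R_i for the world in which key i opens the door."""
    R = set()
    for a in actions:
        # The working key unlocks the door; the others leave it locked.
        R.add((s1, a, s2 if a == working_key else s1))
        # Assumption: unlocking an already-open door leaves it open.
        R.add((s2, a, s2))
    return frozenset(R)

relations = {"R1": make_relation("unlock1"),
             "R2": make_relation("unlock2"),
             "R3": make_relation("unlock3")}

def progress(rho, a):
    """SLAF[a]: advance every <state, relation> pair through action a."""
    return {(s_next, name)
            for (s, name) in rho
            for (s0, act, s_next) in relations[name]
            if s0 == s and act == a}

def filter_obs(rho, obs_locked):
    """SLAF[o]: keep only pairs whose state agrees with the observation."""
    return {(s, name) for (s, name) in rho
            if ("locked" in s) == obs_locked}

rho = {(s1, "R1"), (s1, "R2"), (s1, "R3")}
after_action = progress(rho, "unlock1")      # {(s2,R1), (s1,R2), (s1,R3)}
after_obs = filter_obs(after_action, False)  # {(s2,R1)}: key 1 opens the door
```

Only the tuple with R1 survives the observation, which is exactly how the agent learns which key works.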

12 Example: Back to the switch and the light. The semantics of SLAF generalize belief states and Logical Filtering: "If the transition relation is R, then the belief state is σ_R."

13 Outline Problem Intuition Motivation Formal Definition Learning by Logical Inference Algorithm Experiments Comparison to other methods

14 Learning transition models directly is intractable: it requires space Ω(2^(2^|P|)) in many cases. It is possible to represent transition belief states more compactly using propositional logic, although no encoding is compact for all sets. We therefore re-define SLAF as an operation on propositional formulas, with a propositional formula as output.
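A quick count makes the bound plausible (my reconstruction of the argument, not the slide's exact derivation):

```latex
|S| = 2^{|P|}, \qquad
\#\{\text{transition relations}\} \;\le\; 2^{|S|\cdot|A|\cdot|S|},
\qquad
\#\{\text{transition belief states}\} \;=\; 2^{\,|S \times \mathcal{R}|}
```

Since $|S|$ is already exponential in $|P|$, the number of distinct transition belief states is doubly exponential, so an explicit enumeration needs $\Omega(2^{2^{|P|}})$ space in the worst case.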

15 Terminology follows general propositional logical languages. L denotes a vocabulary, i.e., a set of propositional variables. L(L) denotes the language built from the propositions in L using the standard connectives, i.e., the set of propositional sentences. φ, ψ denote propositional formulas.

16 Definitions L_A is the vocabulary of transition-relation propositions of the form a^F_G, where a is an action and F, G stand for formulas: F is the effect of a, and G is the precondition, which we will assume to be a single state in S. If G holds in the current state, then F holds in the state that results from executing a.

17 Semantics Every interpretation M of L_A corresponds to a transition relation R_M, and every transition relation has at least one (possibly more than one) interpretation corresponding to it. On the left of the union we treat what the action affects; on the right, what keeps its value (inertia).

18 Every Transition Relation Defines a Formula Th0 addresses fluent changes, Th1 addresses inertia, and Th2 addresses conditions under which actions are not executable. For every transition belief state ρ we can define a formula, and for every formula we can define a transition belief state.

19 Transition Formula Filtering Let Cn_L(φ) denote the set of logical consequences of φ restricted to vocabulary L. Consequence finding is any process that computes Cn_L(φ) for an input φ (e.g., resolution). The first part says that if a executes at time t, a causes l when G holds, and G holds at time t, then l holds at time t+1. The second part says that if l holds after a's execution, then a^l_G must hold for some G that was true in the current state.
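Rendered as formulas, the two statements read roughly as follows (my transcription of the prose above; subscripts $t$ index time, and $l$ ranges over literals):

```latex
a_t \wedge a^{l}_{G} \wedge G_t \;\Rightarrow\; l_{t+1}
\qquad\qquad
l_{t+1} \wedge a_t \;\Rightarrow\; \bigvee_{G}\bigl(a^{l}_{G} \wedge G_t\bigr)
```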

20 Outline Problem Intuition Motivation Formal Definition Learning by Logical Inference Algorithm Experiments Comparison to other methods

21 Algorithm: SLAF0 As stated before, consequence finding can be implemented with algorithms such as resolution. Example: –φ0 = locked, φ1 = SLAF0[unlock2, locked](φ0) –This is equivalent to asking whether all models consistent with φ1 assign TRUE to unlock2^locked_locked. –Take the output of SLAF0 and check whether φ1 ∧ ¬unlock2^locked_locked is satisfiable; if it is not, the proposition is entailed.
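The satisfiability test at the end can be sketched by brute force over truth assignments (a toy entailment checker; the formula below is an illustrative stand-in, not the actual output of SLAF0):

```python
from itertools import product

def entails(phi, var, variables):
    """phi |= var  iff  phi AND NOT var is unsatisfiable."""
    for values in product([False, True], repeat=len(variables)):
        m = dict(zip(variables, values))
        if phi(m) and not m[var]:
            return False  # found a model of phi in which var is false
    return True

# Stand-in for phi_1: every model must make unlock2_locked_locked true,
# i.e. "unlock2 keeps the door locked when it is locked".
variables = ["unlock2_locked_locked", "locked1"]
phi1 = lambda m: m["unlock2_locked_locked"] and m["locked1"]

result = entails(phi1, "unlock2_locked_locked", variables)  # True
```

A real implementation would hand this check to a SAT solver rather than enumerate all 2^n assignments.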

22 SLAF distributes over logical connectives, so when processing can be broken into independent pieces, computation scales linearly in the number of pieces.
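The key algebraic facts can be stated as follows (my paraphrase of the distribution properties in the underlying paper, so treat the exact conditions as approximate: the disjunction case is an equivalence, while the conjunction case holds as an entailment in general and as an equivalence only under additional conditions):

```latex
SLAF[a](\varphi \vee \psi) \;\equiv\; SLAF[a](\varphi) \vee SLAF[a](\psi)
\qquad
SLAF[a](\varphi \wedge \psi) \;\models\; SLAF[a](\varphi) \wedge SLAF[a](\psi)
```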

23 Algorithm: Factored SLAF

24 Outline Problem Intuition Motivation Formal Definition Learning by Logical Inference Algorithm Experiments Comparison to other methods

25 How much time and space do SLAF computations take in practice? Experiments were run on domains taken from the 3rd International Planning Competition. The algorithm receives 10 random fluent observations at every time step; it does not receive the size of the domain, the starting state, or the full set of fluents. For every domain, the algorithm was run with different numbers of fluents. The theoretical bound is O(T·n^k).

26 Time per step remains relatively constant.

27 Space grows with the domain size, but scales easily for moderate domain sizes.

28 Outline Problem Intuition Motivation Formal Definition Learning by Logical Inference Algorithm Experiments Comparison to other methods

29 Other Approaches Reinforcement Learning and HMMs: –Maintain a probability distribution over the current state. –Exact solutions are intractable for high-dimensional domains. –Approximate solutions have unbounded errors or make strong mixing assumptions. Learning AI-planning operators: –Assumes a fully observable domain. –Action preconditions are usually engineered to avoid unwanted cases.

30 Example of Other Methods: DBN


34 Conclusions First scalable learning algorithm for partially observable dynamic domains. Insight: compact encoding (sometimes) using propositional logic. Exact for actions that always have the same effect. Takes polynomial update time. Can solve problems with n > 1000 domain features (> 2^1000 states).