
Bayesian Abductive Logic Programs
Sindhu Raghavan and Raymond J. Mooney
The University of Texas at Austin

Abduction
– Process of finding the best explanation for a set of observations (Peirce 1958)
– Inference of cause from effect
– Applications: plan recognition, medical diagnosis, natural language understanding

Logical Abduction
Given:
– Background knowledge, B, in the form of a set of (Horn) clauses in first-order logic
– Observations, O, in the form of atomic facts in first-order logic
Find:
– A hypothesis, H, a set of assumptions (atomic facts) that, together with the background knowledge, logically entails the observations: B ∪ H ⊨ O
Typically, the best explanation is the one with the fewest assumptions, i.e., the one that minimizes |H|.

Example: Plan Recognition
Background knowledge B:
go(person,loc) :- shopping(person,loc,item).
go(person,loc) :- robbing(person,loc,instr).
get(person,instr) :- robbing(person,loc,instr).
get(person,instr) :- hunting(person,loc,instr).
store(loc) :- shopping(person,loc,item).
store(loc) :- robbing(person,loc,instr).
gun(instr) :- robbing(person,loc,instr).
gun(instr) :- hunting(person,loc,instr).

Sample Observations O
get(john,o1)
gun(o1)
go(john,p1)
store(p1)

Abductive Proof 1
[Proof tree: get(john,o1) and gun(o1) are explained by assuming hunting(john,s2,o1); go(john,p1) and store(p1) are explained by assuming shopping(john,p1,s1).]

Abductive Proof 2
[Proof tree: all four observations, get(john,o1), gun(o1), go(john,p1), and store(p1), are explained by the single assumption robbing(john,p1,o1).]

Best Explanation
Explanation 1: hunting(john,s2,o1), shopping(john,p1,s1)
Explanation 2: robbing(john,p1,o1)
The best explanation makes the fewest assumptions, so Explanation 2 is preferred (see the sketch below).
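A minimal sketch, in Python, of this "fewest assumptions" criterion on a propositionalized version of the running example. The constants s1 and s2 are hypothetical Skolem-style terms standing in for the unbound item and location arguments, and each ground clause is reduced to a (head, assumable body literal) pair; this is an illustration, not the ACCEL or BALP implementation.

from itertools import product

# Ground (propositionalized) plan-recognition rules from the example KB.
rules = [
    ("go(john,p1)",  "shopping(john,p1,s1)"),
    ("go(john,p1)",  "robbing(john,p1,o1)"),
    ("get(john,o1)", "robbing(john,p1,o1)"),
    ("get(john,o1)", "hunting(john,s2,o1)"),
    ("store(p1)",    "shopping(john,p1,s1)"),
    ("store(p1)",    "robbing(john,p1,o1)"),
    ("gun(o1)",      "robbing(john,p1,o1)"),
    ("gun(o1)",      "hunting(john,s2,o1)"),
]
observations = ["get(john,o1)", "gun(o1)", "go(john,p1)", "store(p1)"]

# Every observation must be derived by some rule; an explanation is the
# set of assumed body literals chosen across all observations.
choices = [[body for head, body in rules if head == obs] for obs in observations]
explanations = {frozenset(combo) for combo in product(*choices)}

# Logical abduction prefers the explanation with the fewest assumptions.
best = min(explanations, key=len)
print(sorted(best))   # ['robbing(john,p1,o1)']

Among the candidate explanations this enumerates are the two shown above; the minimal one is Explanation 2.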

Existing Work on Logical Abduction
– History of research from the 1970s through the 1990s (Pople 1973; Levesque 1989; Ng and Mooney 1992).
– Abductive Logic Programming (ALP) (Kakas, Kowalski, and Toni, 1993): a formalization based on logic programming.

Problem with Logical Abduction
– Does not handle uncertainty of assumptions and inferences.
– Unable to choose between explanations with the same number of assumptions based on probability.

Other Approaches to Abduction
– Probabilistic abduction using Bayesian networks (Pearl 1988): unable to capture relational structure, since Bayes nets are propositional in nature.
– Probabilistic relational abduction using Markov Logic Networks (MLNs) (Kate and Mooney 2009): does not use logical abduction; instead uses complex reverse implications to approximate it.

Bayesian Logic Programs (BLPs) (Kersting and De Raedt, 2001)
– Combine first-order logic and Bayesian networks.
– Deductive logic programming is used to construct the structure of a Bayes net.
– Not suitable for problems requiring abductive logical inference to form a proof structure by making assumptions.

Bayesian Abductive Logic Programs (BALPs)
[Diagram: BLP + ALP → BALP]
Suitable for tasks involving abductive reasoning: plan recognition, diagnosis, etc.

BLPs vs. BALPs
– Like BLPs, BALPs use logic programs as templates for constructing Bayesian networks.
– Unlike BLPs, BALPs use logical abduction instead of deduction to construct the network.

Abduction in BALPs
Given: a set of observation literals O = {O1, O2, …, On}
1. Compute all distinct abductive proofs of O.
2. Construct a Bayesian network using the resulting set of proofs, as in BLPs.
3. Perform probabilistic inference on the Bayesian network to compute the best explanation.

Abductive Proof 1
[As before: get(john,o1) and gun(o1) are explained by hunting(john,s2,o1); go(john,p1) and store(p1) are explained by shopping(john,p1,s1).]

Abductive Proof 2
[As before: all four observations are explained by the single assumption robbing(john,p1,o1).]

Resulting Bayes Net
[Network: hunting(john,s2,o1) is a parent of get(john,o1) and gun(o1); shopping(john,p1,s1) is a parent of go(john,p1) and store(p1); robbing(john,p1,o1) is a parent of all four observed literals.]
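A rough sketch of how this structure follows from the two proofs, assuming the ground clauses listed below are exactly the ones used in those proofs: as in BLPs, every body literal of a ground clause becomes a parent of that clause's head.

# Hypothetical ground clauses used in the two abductive proofs above,
# written as (head, body_literals); names follow the running example.
ground_clauses = [
    ("get(john,o1)",  ["hunting(john,s2,o1)"]),
    ("gun(o1)",       ["hunting(john,s2,o1)"]),
    ("go(john,p1)",   ["shopping(john,p1,s1)"]),
    ("store(p1)",     ["shopping(john,p1,s1)"]),
    ("get(john,o1)",  ["robbing(john,p1,o1)"]),
    ("gun(o1)",       ["robbing(john,p1,o1)"]),
    ("go(john,p1)",   ["robbing(john,p1,o1)"]),
    ("store(p1)",     ["robbing(john,p1,o1)"]),
]

# Each body literal becomes a parent of the clause head; clauses sharing
# a head are later combined with a noisy-or node.
edges = {(b, head) for head, body in ground_clauses for b in body}
for parent, child in sorted(edges):
    print(parent, "->", child)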

Probabilistic Parameters
– As with BLPs, CPTs for the Bayes net are specified in the first-order clauses.
– The noisy-and combining rule specifies the CPT for combining the conjuncts in the body of a clause; this reduces the number of parameters needed, and the parameters can be learned from data.
– The noisy-or combining rule specifies the CPT for combining the disjunctive contributions from different ground clauses with the same head; this models "explaining away", and the parameters can be learned from data.
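A small illustrative sketch of how these combining rules yield CPT entries (not the authors' implementation; no leak terms are assumed, and the weights are placeholder values):

def noisy_and(body_true, weights):
    """P(the clause fires) under noisy-and: every body conjunct must hold,
    each succeeding independently with its own probability; any false
    conjunct blocks the clause (no-leak assumption)."""
    p = 1.0
    for true, w in zip(body_true, weights):
        p *= w if true else 0.0
    return p

def noisy_or(cause_probs):
    """P(head = true) under noisy-or: each contributing ground clause
    independently causes the head with the given probability."""
    p_false = 1.0
    for p in cause_probs:
        p_false *= (1.0 - p)
    return 1.0 - p_false

# Example: get(john,o1) has two candidate causes, hunting and robbing,
# each contributing 0.9 when its (single-literal) body is true.
print(noisy_or([noisy_and([True], [0.9]), noisy_and([True], [0.9])]))  # ≈ 0.99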

Resulting Bayes Net (with noisy-or)
[The same network, with a noisy-or node combining the contributions of the different ground clauses that share each observed literal as head.]

Probabilistic Inference
– Specify the truth values of the observed facts as evidence.
– Compute the Most Probable Explanation (MPE): the most likely combination of truth values for all unknown literals given this evidence.
– The standard Bayes-net package Elvira is used for inference.

Resulting Bayes Net
[The noisy-or network from above, shown again as the inference example.]

Resulting Bayes Net
[The same network, with the observed facts get(john,o1), gun(o1), go(john,p1), and store(p1) marked as evidence.]

Resulting Bayes Net
[The same network, with hunting(john,s2,o1), shopping(john,p1,s1), and robbing(john,p1,o1) marked as query variables.]

Resulting Bayes Net
[MPE result: robbing(john,p1,o1) is TRUE; hunting(john,s2,o1) and shopping(john,p1,s1) are FALSE.]
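A brute-force sketch of this MPE computation over the example network, not the Elvira-based implementation used in the work. The per-clause parameter 0.9 is taken from the methodology slides below; the prior of 0.1 on each plan literal is an assumed value, since the hand-tuned priors are not reported.

from itertools import product

P_CLAUSE = 0.9   # per-clause noisy parameter (from the experiments below)
PRIOR = 0.1      # assumed prior for each plan literal (hand-tuned in the talk)

# Parents of each observed literal, per the two abductive proofs.
parents = {
    "get(john,o1)":  ["hunting(john,s2,o1)", "robbing(john,p1,o1)"],
    "gun(o1)":       ["hunting(john,s2,o1)", "robbing(john,p1,o1)"],
    "go(john,p1)":   ["shopping(john,p1,s1)", "robbing(john,p1,o1)"],
    "store(p1)":     ["shopping(john,p1,s1)", "robbing(john,p1,o1)"],
}
plans = ["hunting(john,s2,o1)", "shopping(john,p1,s1)", "robbing(john,p1,o1)"]

def noisy_or(n_active):
    # Probability the child is true given n_active active causes, no leak term.
    return 1.0 - (1.0 - P_CLAUSE) ** n_active

best, best_p = None, -1.0
for values in product([False, True], repeat=len(plans)):
    world = dict(zip(plans, values))
    p = 1.0
    for a in plans:                        # prior over the assumed literals
        p *= PRIOR if world[a] else 1.0 - PRIOR
    for obs, pars in parents.items():      # all observations are evidence = true
        p *= noisy_or(sum(world[q] for q in pars))
    if p > best_p:
        best, best_p = world, p

print(best)   # robbing is True; hunting and shopping are False

With these numbers, the single-assumption explanation robbing(john,p1,o1) beats the hunting-plus-shopping alternative, matching the result shown on the slide.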

Experimental Evaluation

Story Understanding Data
– Recognizing plans from narrative text (Charniak and Goldman 1991; Ng and Mooney 1992).
– Infer characters' higher-level plans that explain their observed actions, represented in logic.
  – "Fred went to the supermarket. He pointed a gun at the owner. He packed his bag." => robbing
  – "Jack went to the supermarket. He found some milk on the shelf. He paid for it." => shopping
– 25 development examples and 25 test examples; 12.6 observations per example on average.
– Background knowledge base originally constructed for the ACCEL system (Ng and Mooney 1992).

Story Understanding Methodology
– Noisy-and and noisy-or parameters set to 0.9; priors hand-tuned on development data.
– Multiple high-level plans per example are possible.
– MPE inference used to compute the best explanation.
– Computed precision, recall, and F-measure (see the sketch below).
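A small sketch of the set-based scoring this implies when several high-level plans can be correct per example; the plan literals in the usage line are hypothetical.

def precision_recall_f1(predicted, gold):
    """Set-based precision, recall, and F-measure over the high-level
    plan literals inferred for one example."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical predicted vs. gold plans for one story:
print(precision_recall_f1(["robbing(fred,store1,gun1)"],
                          ["robbing(fred,store1,gun1)", "shopping(jack,store2,item1)"]))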

Story Understanding Systems Evaluated
– BALPs
– ACCEL-Simplicity (Ng and Mooney 1992): logical abduction preferring the fewest assumptions.
– ACCEL-Coherence (Ng and Mooney 1992): logical abduction that maximally connects the observations; specific to story understanding.
– Abductive MLNs (Kate and Mooney 2009)

Story Understanding Results
[Chart comparing precision, recall, and F-measure of the systems above.]

Monroe Data
– Recognizing high-level plans in an emergency response domain.
– Developed by Blaylock and Allen (2005) to test a statistical n-gram approach.
– 10 high-level plans, including setting up a shelter, providing medical attention, and clearing a road wreck.
– Examples artificially generated using the SHOP-2 HTN planner, each consisting of a set of observation literals.
– A single correct plan in each example.
– Knowledge base constructed from the domain knowledge encoded in the HTN.

Monroe Methodology
– Parameters set as for story understanding.
– Computed marginal probabilities for all high-level plans and selected the single plan with the highest probability.
– Computed the convergence score for comparison with Blaylock and Allen's results.
– The convergence score is the fraction of examples for which the top-level plan schema (predicate only) is predicted correctly after seeing all observations (see the sketch below).
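A small sketch of the predicate-only matching that the convergence score uses; the plan literals in the example call are hypothetical.

def plan_predicate(literal):
    """Return the predicate (schema) of a ground plan literal,
    e.g. 'clear_road_wreck(r1)' -> 'clear_road_wreck'."""
    return literal.split("(", 1)[0]

def convergence_score(predicted, gold):
    """Fraction of examples whose predicted top-level plan matches the gold
    plan on the predicate only, ignoring arguments."""
    correct = sum(plan_predicate(p) == plan_predicate(g)
                  for p, g in zip(predicted, gold))
    return correct / len(gold)

# Hypothetical predictions for two examples:
print(convergence_score(["set_up_shelter(loc1)", "clear_road_wreck(r2)"],
                        ["set_up_shelter(loc9)", "provide_medical_attention(p3)"]))  # 0.5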

Monroe Results
[Table: convergence score of BALP vs. Blaylock and Allen.]

Modified Monroe Data
– Applying the Kate and Mooney (2009) abductive MLN approach to Monroe resulted in explosively large ground networks.
– Simplified the Monroe domain just enough to prevent these problems.
– Developed typed clauses effective for abductive MLNs.
– Weight learning was still not tractable, so weights were set manually.

Modified Monroe Methodology
Repeatedly measured the percentage of correct plans inferred (including correct arguments) after observing an increasing fraction of the actions in the plan.

Modified Monroe Results
[Chart: accuracy (y-axis) vs. percentage of observations seen by the systems (x-axis).]

Ongoing & Future Work
– Automatic learning of BALP parameters from data.
– An alternative MLN formulation that more directly models the BALP approach. Preliminary results for story understanding and modified Monroe are only slightly worse than the BALP results, but much worse for the original Monroe data.
– Compare to other SRL approaches that have incorporated logical abduction (SLPs, PRISM) when applied to plan recognition.
– Evaluation on other datasets, tasks, and domains.

Conclusions
– BALP is a new SRL framework that combines Bayesian Logic Programs and Abductive Logic Programming.
– Well suited for relational abductive reasoning tasks like plan recognition.
– Empirical results demonstrate advantages over existing methods.

Questions?