
1 Learning abductive reasoning using random examples Brendan Juba Washington University in St. Louis

2 Outline 1. Models of abductive reasoning 2. An abductive reasoning algorithm 3. “Not-easiness” of abducing conjunctions

3 Abductive reasoning: making plausible guesses. Abductive reasoning: given a conclusion c, find a “plausible” h that implies/leads to/… c. Two varieties of “plausibility” are in common use – Logical plausibility: a small h from which c follows – Bayesian plausibility: an h which has large posterior probability given c. In symbols: Pr[h | c true] > … This requires a prior distribution over representations…

4 Why might we want a new model? Existing models are only tractable in simple cases – e.g., Horn rules (a ⋀ b ⋀ c ⇒ d …no negations), “nice” (conjugate) priors. The choice of prior distribution really matters – and it’s difficult to specify by hand.

5 New model: abductive reasoning from random examples. Fix a set of attributes (propositional variables x_1, x_2, …, x_n). An environment is modeled by an arbitrary, unknown distribution D over examples, i.e., settings of the n propositional variables. Task: for a conclusion c, find an h such that 1. Plausibility: Pr_{x∈D}[h(x)=1] ≥ μ (for some given μ) 2. h almost entails c: Pr_{x∈D}[c(x)=1 | h(x)=1] ≥ 1-ε. All probabilities are over examples drawn from D.
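To make the two conditions concrete, here is a minimal Python sketch (not from the talk; the function and argument names are illustrative) that estimates both quantities from a sample drawn from D:

```python
def estimate_conditions(examples, h, c):
    """Empirically estimate the two conditions of the model on a sample.

    examples: a list of attribute vectors drawn i.i.d. from the unknown D
    h, c: arbitrary boolean predicates over a single example
    Returns (Pr[h(x)=1], Pr[c(x)=1 | h(x)=1]) as sample frequencies.
    """
    satisfying = [x for x in examples if h(x)]
    plausibility = len(satisfying) / len(examples)
    if not satisfying:
        return plausibility, 0.0
    almost_entails = sum(1 for x in satisfying if c(x)) / len(satisfying)
    return plausibility, almost_entails
```

An h qualifies under the model when the first estimate is at least μ and the second is at least 1-ε.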

6 Example: identifying a subgoal. Consider blocks world, for t = 1, 2, …, T – Propositional state variables (“fluents”): ON_t(A,B), ON_t(A,TABLE), ON_t(C,A), etc. – Actions are also encoded by propositional variables: PUT_t(B,A), PUT_t(C,TABLE), etc. Given many examples of interaction… Our goal c: ON_T(A,TABLE) ⋀ ON_T(B,A) ⋀ ON_T(C,B). A perhaps plausibly good “subgoal” h: [ON_{T-1}(B,A) ⋀ PUT_T(C,B)] ⋁ [PUT_{T-1}(B,A) ⋀ PUT_T(C,B)]. (Figure: a stack of blocks A, B.) Or, the goal may even be given by examples, not explicitly formulated…
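Purely as an illustration, the goal c and the candidate subgoal h above might be encoded as predicates over a dict of propositional fluents and actions (the key names below are hypothetical, not from the talk):

```python
# An example x is a dict mapping fluent/action names to booleans at each time step.

def goal_c(x, T):
    """The conclusion c: A on the table, B on A, C on B at time T."""
    return x[f"ON_{T}(A,TABLE)"] and x[f"ON_{T}(B,A)"] and x[f"ON_{T}(C,B)"]

def subgoal_h(x, T):
    """The candidate subgoal h:
    [ON_{T-1}(B,A) AND PUT_T(C,B)] OR [PUT_{T-1}(B,A) AND PUT_T(C,B)]."""
    return (x[f"ON_{T-1}(B,A)"] or x[f"PUT_{T-1}(B,A)"]) and x[f"PUT_{T}(C,B)"]
```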

7 Formally: abductive reasoning from random examples for a class H. Fix a class of Boolean representations H. Given a Boolean formula c; ε, δ, μ ∈ (0,1); and independent examples x^(1), …, x^(m) drawn from D, suppose that there exists an h* ∈ H such that 1. Plausibility: Pr_{x∈D}[h*(x)=1] ≥ μ 2. h* entails c: Pr_{x∈D}[c(x)=1 | h*(x)=1] = 1. Find an h (ideally in H) such that, with probability 1-δ, 1. Plausibility: Pr_{x∈D}[h(x)=1] ≥ 1/poly(1/μ, 1/(1-ε), n) 2. h almost entails c: Pr_{x∈D}[c(x)=1 | h(x)=1] ≥ 1-ε.

8 In pictures: a diagram over the example space x ∈ {0,1}^n, with the region h(x)=1 (almost entirely) contained in the region c(x)=1, and points with c(x)=0 lying outside it. c: goal/observation…; h: explanation/solution/…

9 Outline 1. Models of abductive reasoning 2. An abductive reasoning algorithm 3. “Not-easiness” of abducing conjunctions

10 Theorem 1. If there is a k-DNF h* such that 1. Plausibility: Pr_{x∈D}[h*(x)=1] ≥ μ 2. h* entails c: Pr_{x∈D}[c(x)=1 | h*(x)=1] = 1, then using m = O((1/(με))(n^k + log(1/δ))) examples, in time O(m·n^k) we can find a k-DNF h such that, with probability 1-δ, 1. Plausibility: Pr_{x∈D}[h(x)=1] ≥ μ 2. h almost entails c: Pr_{x∈D}[c(x)=1 | h(x)=1] ≥ 1-ε. k-DNF: an OR of “terms of size k” – ANDs of at most k “literals” – attributes or their negations.
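A rough reading of the sample bound as code (a sketch only; the constant hidden by the O(·) is not specified by the theorem):

```python
from math import ceil, log

def sample_size(n, k, mu, eps, delta, C=1.0):
    """Rough sample-size estimate from Theorem 1:
    m = O((1/(mu*eps)) * (n^k + log(1/delta))).
    C stands in for the unspecified constant; treat the output as a scaling guide.
    """
    return ceil((C / (mu * eps)) * (n ** k + log(1.0 / delta)))

# For example, sample_size(n=50, k=2, mu=0.1, eps=0.05, delta=0.01)
# is roughly 5 * 10**5 when C = 1.
```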

11 Algorithm for k-DNF abduction. Start with h as an OR over all terms of size k. For each example x^(1), …, x^(m) – if c(x^(i)) = 0, delete all terms T from h such that T(x^(i)) = 1. Return h. A simple algorithm, first proposed by J.S. Mill, 1843. The running time is clearly O(m·n^k).
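A minimal Python sketch of this elimination algorithm, assuming examples are boolean tuples and c is a boolean predicate (the function and variable names are mine, not the paper's):

```python
from itertools import combinations, product

def kdnf_abduction(examples, c, n, k):
    """Elimination algorithm for k-DNF abduction (a sketch).

    examples: list of length-n boolean tuples x^(1), ..., x^(m) drawn from D
    c: boolean predicate on an example (the goal/conclusion)
    Returns the surviving k-DNF as a list of terms; each term is a frozenset
    of literals (index, sign), meaning the term requires x[index] == sign.
    """
    # Start with h as the OR over all terms of size at most k.
    h = []
    for size in range(1, k + 1):
        for indices in combinations(range(n), size):
            for signs in product((True, False), repeat=size):
                h.append(frozenset(zip(indices, signs)))

    def term_satisfied(term, x):
        return all(x[i] == sign for i, sign in term)

    # For each counterexample (c(x)=0), delete every term it satisfies.
    for x in examples:
        if not c(x):
            h = [t for t in h if not term_satisfied(t, x)]
    return h
```

Since it scans the sample once and tests each of the O(n^k) terms per counterexample, the running time matches the O(m·n^k) claimed above.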

12 Analysis part 1: Pr_{x∈D}[h(x)=1] ≥ μ. We are given that some k-DNF h* has 1. Plausibility: Pr_{x∈D}[h*(x)=1] ≥ μ 2. h* entails c: Pr_{x∈D}[c(x)=1 | h*(x)=1] = 1. Initially, every term of h* is in h. By 2, terms of h* are never true when c(x)=0 ☞ every term of h* remains in h ☞ h* implies h, so Pr_x[h(x)=1] ≥ Pr_x[h*(x)=1] ≥ μ.

13 Analysis part 2: Pr_x[c(x)=1 | h(x)=1] ≥ 1-ε. Rewriting the conditional probability, it suffices to show Pr_{x∈D}[c(x)=0 ⋀ h(x)=1] ≤ ε·Pr_{x∈D}[h(x)=1]. We’ll show: Pr_{x∈D}[c(x)=0 ⋀ h(x)=1] ≤ εμ (≤ ε·Pr_{x∈D}[h(x)=1] by part 1). Consider any h’ s.t. Pr_x[c(x)=0 ⋀ h’(x)=1] > εμ – since each x^(i) is drawn independently from D, Pr[no i has c(x^(i))=0 ⋀ h’(x^(i))=1] < (1-εμ)^m – a term of h’ is deleted whenever c(x^(i))=0 and h’(x^(i))=1 – so h’ can only be output w.p. < (1-εμ)^m.

14 Analysis part 2, cont’d: Pr_x[c(x)=1 | h(x)=1] ≥ 1-ε. We’ll show: Pr_{x∈D}[c(x)=0 ⋀ h(x)=1] ≤ εμ. Consider any h’ s.t. Pr_x[c(x)=0 ⋀ h’(x)=1] > εμ – h’ can only be output w.p. < (1-εμ)^m. There are only 2^{O(n^k)} possible k-DNFs h’. Since (1-1/x)^x ≤ 1/e, m = O((1/(με))(n^k + log(1/δ))) examples suffice to guarantee that each such h’ can only be output w.p. < δ/2^{O(n^k)} ☞ w.p. > 1-δ, our h has Pr_x[c(x)=0 ⋀ h(x)=1] ≤ εμ.

15 Theorem 1. If there is a k-DNF h* such that 1. Plausibility: Pr_{x∈D}[h*(x)=1] ≥ μ 2. h* entails c: Pr_{x∈D}[c(x)=1 | h*(x)=1] = 1, then using m = O((1/(με))(n^k + log(1/δ))) examples, in time O(m·n^k) we can find a k-DNF h such that, with probability 1-δ, 1. Plausibility: Pr_{x∈D}[h(x)=1] ≥ μ 2. h almost entails c: Pr_{x∈D}[c(x)=1 | h(x)=1] ≥ 1-ε. k-DNF: an OR of “terms of size k” – ANDs of at most k “literals” – attributes or their negations. A version that tolerates exceptions is also possible (see the paper)…

16 Outline 1. Models of abductive reasoning 2. An abductive reasoning algorithm 3. “Not-easiness” of abducing conjunctions

17 Theorem 2. Suppose that a polynomial-time algorithm exists for learning abduction from random examples for conjunctions. Then there is a polynomial-time algorithm for PAC-learning DNF.

18 PAC learning (diagram): labeled examples (x^(1), c(x^(1))), (x^(2), c(x^(2))), …, (x^(m), c(x^(m))) are drawn from D for some target c ∈ C; the learner outputs an f such that, w.p. 1-δ over the examples, a fresh x’ = (x’_1, x’_2, …, x’_n) drawn from D has f(x’) = c(x’) w.p. 1-ε.

19 Theorem 2. Suppose that a polynomial-time algorithm exists for learning abduction from random examples for conjunctions. Then there is a polynomial-time algorithm for PAC-learning DNF. Theorem (Daniely & Shalev-Shwartz ’14). If there is a polynomial-time algorithm for PAC-learning DNF, then for every f(k) → ∞ there is a polynomial-time algorithm for refuting random k-SAT formulas with n^{f(k)} clauses. ☞ This is a new hardness assumption. Use at your discretion.

20 Key learning technique: Boosting (Schapire, 1990). Suppose that there is a polynomial-time algorithm that, given examples of c ∈ C, w.p. 1-δ produces a circuit f s.t. Pr_x[f(x)=c(x)] > ½ + 1/poly(n). Then there is a polynomial-time PAC-learning algorithm for the class C – i.e., using the ability to produce such f’s, we produce a g for which Pr_x[g(x)=c(x)] is “boosted” to any 1-ε we require.

21 Sketch of learning DNF using conjunctive abduction. If c is a DNF and Pr[c(x)=1] > ¼, then some term T of c has Pr[T(x)=1] > 1/(4|c|) – …and otherwise f ≡ 0 satisfies Pr[f(x)=c(x)] > ½+¼. Note: Pr[c(x)=1 | T(x)=1] = 1 and T is a conjunction ☞ abductive reasoning finds some h such that Pr[h(x)=1] > 1/poly(n) and Pr[c(x)=1 | h(x)=1] > ¾. Return f: f(x)=1 whenever h(x)=1; if h(x)=0, f(x) = maj_{x: h(x)=0}{c(x)}. Then Pr[f(x)=c(x)] > ½ + 1/(4·poly(n)).
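A minimal Python sketch of this weak learner, assuming a hypothetical abduce routine with the guarantee stated above (none of these names come from the paper):

```python
def weak_learner_from_abduction(examples, labels, abduce):
    """Sketch of the reduction: a weak hypothesis for a DNF target c,
    built from a hypothetical conjunctive-abduction routine `abduce`.

    examples: boolean attribute vectors drawn from D
    labels: c(x) for each example
    abduce(examples, labels): assumed to return a predicate h with
      Pr[h(x)=1] >= 1/poly(n) and Pr[c(x)=1 | h(x)=1] > 3/4.
    """
    if sum(labels) <= len(labels) / 4:
        # If Pr[c(x)=1] <= 1/4, the constant-0 hypothesis already agrees
        # with c on more than a 1/2 + 1/4 fraction of examples.
        return lambda x: False

    h = abduce(examples, labels)

    # On the region h(x)=0, predict the majority label of c there.
    outside = [y for x, y in zip(examples, labels) if not h(x)]
    default = bool(outside) and (2 * sum(outside) >= len(outside))

    # f predicts 1 wherever h fires, and the majority label elsewhere,
    # giving agreement > 1/2 + 1/(4*poly(n)).
    return lambda x: True if h(x) else default
```

Boosting (previous slide) then amplifies this ½ + 1/(4·poly(n)) agreement to any desired 1-ε.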

22 Recap: learning abduction from random examples. For a goal condition c, if there is an h* such that Pr_{x∈D}[h*(x)=1] ≥ μ and Pr_{x∈D}[c(x)=1 | h*(x)=1] = 1, then using examples x from D, find an h such that Pr_{x∈D}[h(x)=1] ≥ μ’ and Pr_{x∈D}[c(x)=1 | h(x)=1] ≥ 1-ε. Thm 1. An efficient algorithm exists for k-DNF h. Thm 2. Unless DNF is PAC-learnable, there is no efficient algorithm for conjunctions.

