
1 Implicit learning of common sense for reasoning Brendan Juba Harvard University

2 A convenient example
“Thomson visited Cooper’s grave in 1765. At that date, he had been traveling [resp.: dead] for five years.”
“Who had been traveling [resp.: dead]?”
(The Winograd Schema Challenge [Levesque, Davis, and Morgenstern, 2012])
Our approach: learn sufficient knowledge to answer such queries from examples.

3 The task

In_grave(x)  Alive(x)  Traveling(x)
     1          0           0
     *          0           0
     0          1           *
     0          1           1
     1          1           0
     0          1           1
     0          1           *
     1          0           0
     0          0           0
     1          0           0
     0          1           0
     0          1           1
     *          1           *
     *          0           0

The examples may be incomplete (a * in the table).
Given In_grave(Cooper), we wish to infer ¬Traveling(Cooper).
This follows from In_grave(x) ⇒ ¬Alive(x) and Traveling(x) ⇒ Alive(x), and these two rules can be learned from this data.
Challenge: how can we tell which rules to learn?

4 This work
Given: examples, KB, and a query…
Proposes a criterion for learnability of rules in reasoning: “witnessed evaluation”
Presents a simple algorithm for efficiently considering all such rules for reasoning in any “natural” (tractable) fragment
– “Natural” defined previously by Beame, Kautz, Sabharwal (JAIR 2004)
– Tolerant to counterexamples, as appropriate for application to “common sense” reasoning

5 This work
Only concerns learned “common sense”
– Cf. Spelke’s “core knowledge”: naïve theories, etc.
– But: the use of logical representations provides a potential “hook” into traditional KR
Focuses on confirming or refuting query formulas on a domain (distribution)
– As opposed to: predicting missing attributes in a given example (cf. past work on PAC-Semantics)

6 Why not use… Bayes nets/Markov Logic/etc.?
– Learning is the Achilles heel of these approaches: even if the distributions are described by a simple network, how do we find the dependencies?

7 Outline
1. PAC-Semantics: model for learned knowledge
   – Suitable for capturing learned common sense
2. Witnessed evaluation: a learnability criterion under partial information
3. “Natural” fragments of proof systems
4. The algorithm and its guarantee

8 PAC Semantics (for propositional logic) (Valiant, AIJ 2000)
Recall: propositional logic consists of formulas built from variables x_1,…,x_n and connectives, e.g., ∧ (AND), ∨ (OR), ¬ (NOT)
Defined with respect to a background probability distribution D over {0,1}^n (Boolean assignments to x_1,…,x_n)
☞ Definition. A formula φ(x_1,…,x_n) is (1-ε)-valid under D if Pr_D[φ(x_1,…,x_n) = 1] ≥ 1-ε.
A RULE OF THUMB…
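
Checking (1-ε)-validity on data is just estimating a Bernoulli mean. A minimal Python sketch; the tuple encoding of examples, the names, and the sample set are my own illustrative choices, not from the talk:

```python
# Estimate the validity of a formula from complete examples.
# Hypothetical encoding: an example is a tuple of 0/1 attribute values
# (in_grave, alive, traveling); a formula is any 0/1-valued predicate.

def empirical_validity(phi, samples):
    """Fraction of examples on which phi evaluates to 1."""
    return sum(phi(x) for x in samples) / len(samples)

# In_grave(x) => not Alive(x), written as a predicate on tuples
rule = lambda x: int(not x[0] or not x[1])

samples = [(1, 0, 0), (0, 1, 1), (1, 1, 0), (1, 0, 0)]
print(empirical_validity(rule, samples))  # 0.75: the rule is 0.75-valid here
```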

9 Examples

In_grave(x)  Alive(x)  Traveling(x)   In_grave(x) ⇒ ¬Alive(x)
     1          0           0                    1
     *          0           0                    1
     0          1           *                    1
     0          1           1                    1
     1          1           0                    0   ← “Buried Alive!!”
     0          1           1                    1
     0          1           *                    1
     1          0           0                    1
     0          0           0                    1
     1          0           0                    1
     0          1           0                    1
     0          1           1                    1
     *          1           *                    *   ← “Grave-digger”?
     *          0           0                    1

APPEARS TO BE ≈86%-VALID… (12 of 14 rows witness the rule; one counterexample, one unknown)

10 Examples

In_grave(x)  Alive(x)  Traveling(x)   Traveling(x) ⇒ Alive(x)
     1          0           0                    1
     *          0           0                    1
     0          1           *                    1
     0          1           1                    1
     1          1           0                    1
     0          1           1                    1
     0          1           *                    1
     1          0           0                    1
     0          0           0                    1
     1          0           0                    1
     0          1           0                    1
     0          1           1                    1
     *          1           *                    1
     *          0           0                    1

Note: agreeing with all observed examples does not imply 1-validity; rare counterexamples may exist. We only get (1-ε)-validity with probability 1-δ.

11 The theorem, informally
Theorem. For every natural tractable proof system, there is an algorithm that efficiently simulates access, during proof search, to all rules that can be verified (1-ε)-valid on examples.
We can’t afford to explicitly consider all rules! We won’t even be able to identify the rules that were simulated.
Thus: rules are “learned implicitly.”

12 Outline
1. PAC-Semantics: model for learned knowledge
2. Witnessed evaluation: a learnability criterion under partial information
3. “Natural” fragments of proof systems
4. The algorithm and its guarantee

13 Masking processes (Michael, AIJ 2010)
A masking function m : {0,1}^n → {0,1,*}^n takes an example (x_1,…,x_n) to a partial example by replacing some values with *
A masking process M is a masking-function-valued random variable
– NOTE: the choice of attributes to hide may depend on the example!
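
A masking process is just a randomized map from complete to partial examples. A minimal sketch, with '*' standing for a hidden value; the specific hiding rule here (independent coin flips per attribute) is only the simplest special case, since a real masking process may inspect the whole example when choosing what to hide:

```python
import random

def mask(example, hide_prob=0.3, rng=random):
    """Replace each attribute with '*' independently with prob. hide_prob.

    A general masking process may decide what to hide based on the whole
    example; this independent-coin version is a simplest-case illustration.
    """
    return tuple('*' if rng.random() < hide_prob else v for v in example)

print(mask((1, 0, 0)))  # e.g. (1, '*', 0)
```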

14 Restricting formulas
Given a formula φ and masked example ρ, the restriction of φ under ρ, φ|ρ, is obtained by “plugging in” the value ρ_i for x_i whenever ρ_i ≠ * and recursively simplifying (using game-tree evaluation). I.e., φ|ρ is a formula in the unknown values.
[Diagram: a formula tree simplified under ρ: x = 0, y = 0; the clause ¬x ∨ y evaluates to 1, and ¬z ∨ z evaluates to 1.]
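
For CNFs, restriction is especially simple: drop any clause containing a literal that ρ makes true, and delete literals that ρ makes false. A minimal sketch, using a conventional signed-integer encoding of literals (my choice, not the talk's):

```python
# Clauses are frozensets of nonzero ints: +v means x_v, -v means NOT x_v.
# A partial example rho is a dict mapping a variable to 0 or 1; masked
# ('*', unknown) variables are simply absent from the dict.

def restrict_clause(clause, rho):
    """None if rho satisfies the clause; otherwise the residual clause."""
    residual = set()
    for lit in clause:
        var, positive = abs(lit), lit > 0
        if var in rho:
            if bool(rho[var]) == positive:
                return None          # a true literal satisfies the clause
            # a false literal is simply dropped from the clause
        else:
            residual.add(lit)        # unknown value: the literal survives
    return frozenset(residual)       # frozenset() is the falsified (empty) clause

def restrict_cnf(cnf, rho):
    restricted = (restrict_clause(c, rho) for c in cnf)
    return [c for c in restricted if c is not None]

# (x1 v ~x2) ^ (x2 v x3) under rho: x1=0, x2=0  ->  [frozenset({3})]
print(restrict_cnf([frozenset({1, -2}), frozenset({2, 3})], {1: 0, 2: 0}))
```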

15 Witnessed formulas
We will learn rules that can be observed to hold under the given partial information:
Definition. ψ is (1-ε)-witnessed under a distribution over partial examples M(D) if Pr_{ρ∼M(D)}[ψ|ρ = 1] ≥ 1-ε
We will aim to succeed whenever there exists a (1-ε)-witnessed formula that completes a simple proof of the query formula…
Remark: this is equal to “ψ is a tautology given ρ” in standard cases where the latter is tractable (e.g., CNFs, intersections of halfspaces), and remains tractable in cases where it is not (e.g., 3-DNFs)

16 Outline
1. PAC-Semantics: model for learned knowledge
2. Witnessed evaluation: a learnability criterion under partial information
3. “Natural” fragments of proof systems
4. The algorithm and its guarantee

17 Example: Resolution (“RES”)
A proof system for refuting CNFs (ANDs of ORs)
– Equivalently, for proving DNFs (ORs of ANDs)
Operates on clauses: given a set of clauses {C_1,…,C_k}, may derive
– (“weakening”) C_i ∨ l from any C_i (where l is any literal: a variable or its negation)
– (“cut”) C′_i ∨ C′_j from C_i = C′_i ∨ x and C_j = C′_j ∨ ¬x
Refute a CNF by deriving the empty clause from it
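
In the same clause encoding as the restriction sketch above, the two derivation rules are one-liners (an illustration, not the paper's code):

```python
def weaken(clause, lit):
    """Weakening: from C derive C v l."""
    return clause | {lit}

def cut(c_i, c_j, var):
    """Cut: from C'_i v x_var and C'_j v NOT x_var derive C'_i v C'_j."""
    assert var in c_i and -var in c_j
    return (c_i - {var}) | (c_j - {-var})

# Cutting {x1} against {~x1} yields the empty clause, i.e., a refutation.
print(cut(frozenset({1}), frozenset({-1}), 1))  # frozenset()
```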

18 Tractable fragments of RES
– Bounded width
– Treelike, bounded clause space
[Diagram: a treelike refutation deriving ∅ from x_i and ¬x_i, via clauses such as ¬x_i ∨ x_j, ¬x_i ∨ ¬x_j, …]
SPACE-2 ≡ “UNIT PROPAGATION,” SIMULATES CHAINING
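
Unit propagation, repeatedly cutting unit clauses against the rest until nothing new is derivable, is the workhorse fragment here. A minimal (inefficient but faithful) sketch in the same clause encoding; it reports a refutation iff the empty clause is derived:

```python
def unit_propagation_refutes(cnf):
    """Return True iff unit propagation derives the empty clause."""
    clauses = {frozenset(c) for c in cnf}
    changed = True
    while changed:
        changed = False
        for unit in [c for c in clauses if len(c) == 1]:
            (lit,) = unit
            for c in list(clauses):
                if -lit in c:
                    derived = c - {-lit}       # cut the unit against c
                    if derived not in clauses:
                        clauses.add(derived)
                        changed = True
    return frozenset() in clauses

# Vars: 1 = In_grave, 2 = Alive, 3 = Traveling. The running example's clauses
# In_grave, Traveling, ~In_grave v ~Alive, ~Traveling v Alive are refutable:
print(unit_propagation_refutes([{1}, {3}, {-1, -2}, {-3, 2}]))  # True
```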

19 Tractable fragments of RES
– Bounded width
– Treelike, bounded clause space
☞ Applying a restriction to every step of proofs of these forms yields proofs of the same form (from a refutation of φ, we obtain a refutation of φ|ρ of the same syntactic form)
Def’n (BKS’04): such fragments are “natural”

20 Other “natural” fragments…
– Bounded-width k-DNF resolution
– L_1-bounded, sparse cutting planes
– Degree-bounded polynomial calculus
– (more?)
REQUIRES THAT RESTRICTIONS PRESERVE THE SPECIAL SYNTACTIC FORM

21 Outline
1. PAC-Semantics: model for learned knowledge
2. Witnessed evaluation: a learnability criterion under partial information
3. “Natural” fragments of proof systems
4. The algorithm and its guarantee

22 The basic algorithm
Given query DNF φ and masked examples {ρ_1,…,ρ_k}:
– For each ρ_i, search for a refutation of ¬φ|ρ_i
– If the fraction of successful refutations is greater than (1-ε), accept φ; otherwise reject.
CAN INCORPORATE A KB CNF Φ: REFUTE [Φ ∧ ¬φ]|ρ_i
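
Putting the pieces together, a minimal sketch of the decision procedure, built on the restrict_cnf and unit_propagation_refutes sketches above; the interface is my own, and any natural tractable fragment could stand in for unit propagation:

```python
def accept_query(neg_query_cnf, masked_examples, eps, kb_cnf=()):
    """Accept the DNF query phi iff [Phi ^ ~phi]|rho_i is refuted on more
    than a (1 - eps) fraction of the masked examples."""
    combined = list(kb_cnf) + list(neg_query_cnf)
    refuted = sum(
        unit_propagation_refutes(restrict_cnf(combined, rho))
        for rho in masked_examples
    )
    return refuted / len(masked_examples) > 1 - eps

# Running example: phi = ~Traveling v ~In_grave, so ~phi = Traveling ^ In_grave.
# Masked examples as dicts, e.g. {1: 0, 2: 1} is rho_1 (In_grave=0, Alive=1).
print(accept_query([frozenset({3}), frozenset({1})],
                   [{1: 0, 2: 1}, {3: 0, 2: 0}], eps=0.25))  # True
```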

23 Example: a space-2 treelike RES refutation
Refute: Traveling ∧ In_grave (“Given” clauses: Traveling, In_grave)
Supporting “common sense” premises: ¬Traveling ∨ Alive, ¬In_grave ∨ ¬Alive
Derivation: from In_grave and ¬In_grave ∨ ¬Alive, cut to get ¬Alive; from ¬Alive and ¬Traveling ∨ Alive, cut to get ¬Traveling; from ¬Traveling and Traveling, cut to get ∅.

24 Example: refuting [Traveling ∧ In_grave]|ρ_1
Example ρ_1: In_grave = 0, Alive = 1
Under ρ_1 the given clause In_grave restricts to ∅, a trivial refutation; the supporting premises ¬In_grave ∨ ¬Alive and ¬Traveling ∨ Alive restrict to true (=T) and are not needed.

25 Example: refuting [Traveling ∧ In_grave]|ρ_2
Example ρ_2: Traveling = 0, Alive = 0
Under ρ_2 the given clause Traveling restricts to ∅, again a trivial refutation; ¬Traveling ∨ Alive and ¬In_grave ∨ ¬Alive restrict to true (=T).

26 The theorem, formally
Theorem. The algorithm uses (1/γ²) log(1/δ) partial examples to distinguish the following cases w.p. 1-δ:
– The query φ is not (1-ε-γ)-valid
– There exists a (1-ε+γ)-witnessed formula ψ for which there exists a proof of the query φ from ψ
LEARN ANY ψ THAT HELPS VALIDATE THE QUERY φ. N.B.: ψ MAY NOT BE 1-VALID
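
(Gloss, not spelled out on the slide: the sample bound is presumably the standard Hoeffding tail argument. With m i.i.d. partial examples, the empirical refutation rate p̂ satisfies Pr[|p̂ - p| ≥ γ] ≤ 2 exp(-2mγ²), so m = O((1/γ²) log(1/δ)) examples suffice to tell a rate below 1-ε-γ apart from one above 1-ε+γ with probability 1-δ.)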

27 Analysis
Note that resolution is sound…
– So, whenever a proof of φ|ρ_i exists, φ was satisfied by the example drawn from D
⇒ If φ is not (1-ε-γ)-valid, tail bounds imply that it is unlikely that a (1-ε) fraction satisfied φ
On the other hand, consider the proof of φ from the (1-ε+γ)-witnessed CNF ψ…
– With probability (1-ε+γ), all of the clauses of ψ simplify to 1
⇒ The restricted proof does not require the clauses of ψ: they are “implicitly learned”

28 Recap: this work…
Proposed a criterion for learnability of common sense rules in reasoning: “witnessed evaluation”
Presented a simple algorithm for efficiently considering all such rules as premises for reasoning in any “natural” (tractable) fragment
– “Natural,” defined by Beame, Kautz, Sabharwal (JAIR 2004), means: “closed under plugging in partial info”
– Tolerant to counterexamples, as appropriate for application to “common sense” reasoning

29 Prior work: Learning to Reason
Khardon & Roth (JACM 1997) showed that O(log n)-CNF queries could be efficiently answered using complete examples
– No mention of theorem-proving whatsoever!
– Could only handle low-width queries under incomplete information (Mach. Learn. 1999)
Noise-tolerant learning captures (some kinds of) common sense (Roth, IJCAI’95)

30 Work in progress
Further integration of learning and reasoning
– Deciding general RES for limited learning problems in quasipolynomial time: arXiv:1304.4633
– Limits of this approach: ECCC TR13-094
Integration with “fancier” semantics (e.g., negation as failure)
– The point: we want to consider proofs using such “implicitly learned” facts & rules

31 Future work
Empirical validation
– A good domain?
Explicit learning of premises
– Not hard for our fragments under “bounded concealment” (Michael, AIJ 2010)
– But: this won’t tolerate counterexamples!

32 The task
Given examples consisting of a list of Boolean values for attributes. The examples may not be complete (corresponding to a * in the table).

In_grave(x)  Alive(x)  Traveling(x)
     1          0           0
     *          0           0
     0          1           *
     0          1           1
     1          1           0
     0          1           1
     0          1           *
     1          0           0
     0          0           0
     1          0           0
     0          1           0
     0          1           1
     *          1           *
     *          0           0

