
From Machine Learning to Inductive Logic Programming: ILP made easy


1 From Machine Learning to Inductive Logic Programming: ILP made easy
Hendrik Blockeel, Katholieke Universiteit Leuven, Belgium. ILP made easy -- ESSLLI 2000, Birmingham

2 Contents of this course
Introduction: what is inductive logic programming? Relationship with other fields
Foundations of ILP
Algorithms
Applications
Contents and slides in co-operation with Luc De Raedt of the University of Freiburg, Germany

3 What is inductive logic programming?
1. Introduction

4 Introduction: What is ILP?
A paradigm for inductive reasoning (reasoning from the specific to the general). Related to machine learning, data mining, and logic programming.

5 Inductive reasoning
Reasoning from the specific to the general: from (specific) observations to a (general) hypothesis. Studied in the philosophy of science, in statistics, ...

6 Weak vs. strong induction
From the observations "this tomato is red", "this tomato is also red", ..., induce "all tomatoes are red". Distinguish:
weak induction: all observed tomatoes are red
strong induction: all tomatoes are red

7
Weak induction: the conclusion is entailed by (follows deductively from) the observations; it cannot be wrong.
Strong induction: the conclusion does not follow deductively from the observations; it could be wrong! Logic does not provide a justification, but probability theory may.

8 A predicate logic approach
Different kinds of reasoning exist in first-order predicate logic. Standard example: Socrates.
Deduction: from Human(Socrates) and Mortal(x) ← Human(x), conclude Mortal(Socrates).

9
Induction (generalise from observed facts): from Mortal(Socrates) and Human(Socrates), induce Mortal(x) ← Human(x).
Abduction (suggest a cause): from Mortal(Socrates) and Mortal(x) ← Human(x), abduce Human(Socrates).

10
Logic programming focuses on deduction. Other types of LP: abductive logic programming (ALP) and inductive logic programming (ILP).
Two questions to be solved: how to perform induction, and how to integrate it in logic programming?

11 Some examples
Learning a definition of "member" from examples.
Examples:
member(a,[a,b,c]). member(b,[a,b,c]). member(3,[5,4,3,2,1]).
:- member(b,[1,2,3]). :- member(3,[a,b,c]).
Hypothesis:
member(X,[X|Y]).
member(X,[Y|Z]) :- member(X,Z).

12 Some examples
Use of background knowledge; e.g., learning quicksort.
Examples:
qsort([],[]). qsort([b,c,a],[a,b,c]). qsort([5,3],[3,5]).
:- qsort([5,3],[5,3]). :- qsort([1,3],[3]).
Background knowledge:
split(L,A,B) :- ...
append(A,B,C) :- ...
Hypothesis:
qsort([],[]).
qsort([X],[X]).
qsort(X,Y) :- split(X,A,B), qsort(A,AS), qsort(B,BS), append(AS,BS,Y).

13 Some examples
Not only predicate definitions can be learned; e.g., learning constraints.
Examples:
parent(jack,mary). parent(mary,bob). father(jack,mary). mother(mary,bob).
male(jack). male(bob). female(mary).
Learned constraints:
:- male(X), female(X).
male(X) :- father(X,Y).
father(X,Y); mother(X,Y) :- parent(X,Y).

14 Practical applications
Program synthesis: very hard; subtasks include debugging, validation, ...
Machine learning: e.g., learning to play games.
Data mining: mining in large amounts of structured data.

15 Example Application: Mutagenicity Prediction
Given a set of molecules, some of which cause mutations in DNA (these are mutagenic) while others don't, try to distinguish them on the basis of their molecular structure. Srinivasan et al., 1994: found a "structural alert".

16
(Figure from the mutagenicity application; not included in this transcript.)

17 Example Application: Pharmacophore Discovery
Application by Muggleton et al., 1996. Find a "pharmacophore" in molecules, i.e., identify the substructure that causes a molecule to "dock" onto certain other molecules. Molecules are described by listing, for each atom: its element, 3-D coordinates, ... Background knowledge defines Euclidean distance, ...

18
Some example molecules (Muggleton et al., 1996; figure not included in this transcript).

19
Description of molecules:
atm(m1,a1,o,2,...). atm(m1,a2,c,2,...). atm(m1,a3,o,2,...). ...
bond(m1,a2,a3,2). bond(m1,a2,a4,1). bond(m1,a5,a6,1). bond(m1,a6,a7,du). ...
Background knowledge:
hacc(M,A) :- atm(M,A,o,2,_,_,_).
hacc(M,A) :- atm(M,A,o,3,_,_,_).
hacc(M,A) :- atm(M,A,s,2,_,_,_).
hacc(M,A) :- atm(M,A,n,ar,_,_,_).
zincsite(M,A) :- atm(M,A,du,_,_,_,_).
hdonor(M,A) :- atm(M,A,h,_,_,_,_), not(carbon_bond(M,A)), !.
Hypothesis:
active(A) :- zincsite(A,B), hacc(A,C), hacc(A,D), hacc(A,E),
  dist(A,C,B,4.891,0.750), dist(A,C,D,3.753,0.750), dist(A,C,E,3.114,0.750),
  dist(A,D,B,8.475,0.750), dist(A,D,E,2.133,0.750), dist(A,E,B,7.899,0.750).

20 Learning to play strategic games

21 Advantages of ILP
Advantages of using first-order predicate logic for induction:
a powerful representation formalism for data and hypotheses (high expressiveness)
the ability to express background domain knowledge
the ability to use powerful reasoning mechanisms: many kinds of reasoning have been studied in a first-order logic framework

22 Foundations of Inductive Logic Programming

23 Overview
Concept learning: the versionspaces approach from machine learning; how to search for a concept definition consistent with the examples, based on a notion of generality.

24 Overview (continued)
Notions of generality in ILP: the θ-subsumption ordering, other generality orderings, basic techniques and algorithms.
Representation of data: two paradigms, learning from entailment and learning from interpretations.

25 Concept learning
Given: an instance space and some unknown concept (= a subset of the instance space).
Task: learn a concept definition from examples (= labelled instances).
The concept could be defined extensionally or intensionally; we are usually interested in an intensional definition, since otherwise no generalisation is possible.

26
A hypothesis h (= concept definition) can be represented intensionally (h) or extensionally (as a set of examples, ext(h)).
Hypothesis h covers example e iff e ∈ ext(h).
Given a set of (positive and negative) examples E = <E+, E->, h is consistent with E iff E+ ⊆ ext(h) and ext(h) ∩ E- = ∅.
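The two definitions above can be stated directly in code. A minimal sketch (mine, not from the course), taking the extensional view of a hypothesis as a plain set of instances:

```python
# A hypothesis is represented here extensionally, as the set ext(h).

def covers(ext_h, e):
    """h covers example e iff e is in ext(h)."""
    return e in ext_h

def consistent(ext_h, pos, neg):
    """h is consistent with E = <E+, E-> iff E+ is a subset of ext(h)
    and ext(h) contains no negative example."""
    return set(pos) <= ext_h and not (ext_h & set(neg))

ext_h = {"tomato1", "tomato2", "apple1"}
print(consistent(ext_h, ["tomato1", "tomato2"], ["pear1"]))  # True
print(consistent(ext_h, ["tomato1"], ["apple1"]))            # False
```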

27 Versionspaces
Given a set of examples E and a hypothesis space H, the versionspace VS(H,E) is the set of all h ∈ H consistent with E: it contains all hypotheses in H that might be the correct target concept. Some inductive algorithms exist that, given H and E, compute the versionspace VS(H,E).

28 Properties
If the target concept c ∈ H and E contains no noise, then c ∈ VS(H,E).
If VS(H,E) is a singleton, there is one solution; usually there are multiple solutions.
If H = 2^I, with I the instance space (i.e., H contains all possible concepts), then no generalisation is possible. The restriction imposed by the choice of H is called the inductive bias.

29
Usually illustrated with conjunctive concept definitions. Example from T. Mitchell, 1997, Machine Learning:

Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
sunny  warm     normal    strong  warm   same      yes
...    ...      ...       ...     ...    ...       ...

30 Lattice for Conjunctive Concepts
<?,?,?,?,?,?>  (most general)
<Sunny,?,?,?,?,?>  <?,Warm,?,?,?,?>  ...  <?,?,?,?,?,Same>
...
<Sunny,Warm,Normal,Strong,Warm,Same>  ...
<∅,∅,∅,∅,∅,∅>  (most specific: covers nothing)

31
A concept can be represented as an if-then rule: <Sunny,Warm,?,?,?,?> corresponds to
IF Sky=sunny AND AirTemp=warm THEN EnjoySport=yes

32 Generality
Central to versionspace algorithms is the notion of generality: h is more general than h' (h ≥ h') iff ext(h') ⊆ ext(h).
Property of VS(H,E) w.r.t. generality: if s ∈ VS(H,E), g ∈ VS(H,E) and g ≥ h ≥ s, then h ∈ VS(H,E). Hence VS can be represented by its borders.

33 Candidate Elimination Algorithm
Start with the general border G = {most general hypothesis} and the specific border S = {most specific hypothesis}.
When encountering a positive example e: generalise the hypotheses in S that do not cover e; throw away the hypotheses in G that do not cover e.
When encountering a negative example e: specialise the hypotheses in G that cover e; throw away the hypotheses in S that cover e.
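A runnable sketch of candidate elimination for conjunctive attribute-value hypotheses (my own simplified version, after Mitchell; it omits the pruning of non-maximal border members). `'?'` is the wildcard and `None` stands for the bottom hypothesis that covers nothing:

```python
def covers(h, e):
    """h covers e; None is the bottom hypothesis covering nothing."""
    return h is not None and all(a == '?' or a == v for a, v in zip(h, e))

def more_general(g, h):
    """g >= h in the generality ordering."""
    return h is None or (g is not None and
                         all(a == '?' or a == b for a, b in zip(g, h)))

def candidate_elimination(examples, domains):
    n = len(domains)
    S = [None]                         # specific border
    G = [('?',) * n]                   # general border
    for e, positive in examples:
        if positive:
            G = [g for g in G if covers(g, e)]
            new_S = []
            for s in S:
                if covers(s, e):
                    new_S.append(s)
                    continue
                gen = e if s is None else tuple(       # minimal generalisation
                    a if a == b else '?' for a, b in zip(s, e))
                if any(more_general(g, gen) for g in G):
                    new_S.append(gen)
            S = new_S
        else:
            S = [s for s in S if not covers(s, e)]
            new_G = []
            for g in G:
                if not covers(g, e):
                    new_G.append(g)
                    continue
                for i in range(n):     # minimal specialisations: fill one '?'
                    if g[i] != '?':
                        continue
                    for v in domains[i]:
                        if v != e[i]:
                            spec = g[:i] + (v,) + g[i + 1:]
                            if any(more_general(spec, s) for s in S):
                                new_G.append(spec)
            G = new_G
    return S, G

# The three-attribute example used on the next slides:
domains = [('s', 'c', 'r'), ('w', 'c'), ('n', 'd')]
S, G = candidate_elimination([(('c', 'w', 'n'), True),
                              (('c', 'c', 'd'), False)], domains)
print(S)        # [('c', 'w', 'n')]
print(G)        # [('?', 'w', '?'), ('?', '?', 'n')]
```

After the positive <c,w,n> and the negative <c,c,d>, the borders match the trace on the following slides.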

34
(Figure: the lattice of conjunctive hypotheses over three attributes with values {s,c,r}, {w,c} and {n,d}, from the most general <?,?,?> down to <∅,∅,∅>; S starts at the bottom, G at the top.)

35
(Same lattice after the positive example <c,w,n>: S is generalised from <∅,∅,∅> to <c,w,n>.)

36
(Same lattice after the negative example <c,c,d>: G is specialised from <?,?,?> to {<?,w,?>, <?,?,n>}; S remains {<c,w,n>}.)

37
Keeping G and S may not be feasible: they can grow to exponential size. In practice, most inductive concept learners do not identify the whole versionspace but just try to find one hypothesis in it.

38 Importance of generality for induction
Even when not the versionspace itself but only one element of it is computed, generality can be used for the search: its properties allow pruning of the search space.
If h covers negatives, then any g ≥ h also covers those negatives.
If h does not cover some positives, then any s ≤ h does not cover those positives either.

39
For concept learning in ILP we will need a generality ordering between hypotheses. ILP is useful not only for learning concepts, but in general for learning theories (e.g., constraints); then we need a generality ordering for theories.

40 Concept Learning in First Order Logic
We need a notion of generality (cf. versionspaces): θ-subsumption, entailment, ...
How do we specialise / generalise concept definitions? Operators for specialisation / generalisation: inverse resolution, least general generalisation under θ-subsumption, ...

41 Generality of theories
A theory G is more general than a theory S if and only if G |= S.
G |= S: in every interpretation (set of facts) in which G is true, S is also true; "G logically implies S".
E.g., "all fruit tastes good" |= "all apples taste good" (assuming apples are fruit).

42
Note: we are now talking about theories, not just concepts (cf. versionspaces); generality of concepts is a special case of this. This will allow us to also learn, e.g., constraints, instead of only predicate definitions (= concept definitions).

43 Deduction, induction and generality
Deduction = reasoning from the general to the specific; it is "always correct", i.e., truth-preserving.
Induction = reasoning from the specific to the general = the inverse of deduction; it is not truth-preserving (though it is "falsity-preserving"), but there may be statistical evidence for it.

44
Deductive operators |- exist that implement (or approximate) |=, e.g., resolution (from logic programming). Inverting these operators yields inductive operators; this is the basic technique in many inductive logic programming systems.

45 Various frameworks for generality
Depending on the form of G and S: a single clause, a set of clauses, or any first-order theory.
Depending on the choice of |- to invert: θ-subsumption, resolution, implication.
Some frameworks are much easier than others.

46 1) θ-subsumption (Plotkin)
The generality ordering most often used in ILP; S and G are single clauses.
c1 θ-subsumes c2 (denoted c1 ≥ c2) if and only if there exists a variable substitution θ such that c1θ ⊆ c2.
To check this, first write the clauses as disjunctions: a,b,c ← d,e,f becomes a ∨ b ∨ c ∨ ¬d ∨ ¬e ∨ ¬f; then try to replace variables by constants or other variables.
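The definition can be checked by brute force: try every mapping of c1's variables to terms occurring in c2 (complete for function-free clauses, since every substituted argument must literally appear in c2). A sketch with my own representation, where clauses are sets of literal tuples, upper-case strings are variables, and body literals carry a `~` prefix to keep their sign:

```python
from itertools import product

def is_var(t):
    return t[:1].isupper()

def substitute(lit, theta):
    return (lit[0],) + tuple(theta.get(a, a) for a in lit[1:])

def theta_subsumes(c1, c2):
    """True iff some substitution theta maps c1 into a subset of c2."""
    vars1 = sorted({a for lit in c1 for a in lit[1:] if is_var(a)})
    terms2 = sorted({a for lit in c2 for a in lit[1:]})
    for image in product(terms2, repeat=len(vars1)):   # exhaustive: NP-complete
        theta = dict(zip(vars1, image))
        if all(substitute(lit, theta) in c2 for lit in c1):
            return True
    return False

# The clauses from the next slide:
c1 = {('father', 'X', 'Y'), ('~parent', 'X', 'Y')}
c2 = {('father', 'X', 'Y'), ('~parent', 'X', 'Y'), ('~male', 'X')}
c3 = {('father', 'luc', 'Y'), ('~parent', 'luc', 'Y')}
print(theta_subsumes(c1, c2), theta_subsumes(c1, c3))   # True True
print(theta_subsumes(c2, c3), theta_subsumes(c3, c2))   # False False
```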

47
Example:
c1 = father(X,Y) :- parent(X,Y)
c2 = father(X,Y) :- parent(X,Y), male(X)
For θ = {}: c1θ ⊆ c2, so c1 θ-subsumes c2.
c3 = father(luc,Y) :- parent(luc,Y)
For θ = {X/luc}: c1θ = c3, so c1 θ-subsumes c3.
c2 and c3 do not θ-subsume one another.

48
Given facts for parent, male, female, ... (so-called background knowledge B), a clause produces a set of father facts: the answer substitutions for X,Y when the body is considered as a query; equivalently, the father facts occurring in the minimal model of B ∪ {clause}. This set is the extensional definition of the concept "father".

49
Property: if c1 and c2 are definite Horn clauses and c1 θ-subsumes c2, then the facts produced by c2 are a subset of the facts produced by c1. (Easy to see from the definition of θ-subsumption.) Hence the θ-subsuming clause is the more general one.

50
Similarity with propositional refinement:
IF Sky=sunny THEN EnjoySport=yes
To specialise, add one condition:
IF Sky=sunny AND Humidity=low THEN EnjoySport=yes
...

51
In first-order logic: c1 = father(X,Y) :- parent(X,Y). To specialise, find clauses θ-subsumed by c1:
father(X,Y) :- parent(X,Y), male(X)
father(luc,X) :- parent(luc,X)
I.e., add literals or instantiate variables.

52
Another (slightly more complicated) example:
c1 = p(X,Y) :- q(X,Y)
c2 = p(X,Y) :- q(X,Y), q(Y,X)
c3 = p(Z,Z) :- q(Z,Z)
c4 = p(a,a) :- q(a,a)
Which clauses are θ-subsumed by which?

53
Properties of θ-subsumption:
Sound: if c1 θ-subsumes c2, then c1 |= c2.
Incomplete: possibly c1 |= c2 without c1 θ-subsuming c2 (but only for recursive clauses), e.g.,
c1: p(f(X)) :- p(X)
c2: p(f(f(X))) :- p(X)
Hence θ-subsumption approximates entailment but is not the same.

54
Checking whether c1 θ-subsumes c2 is decidable but NP-complete.
θ-subsumption is transitive and reflexive but not anti-symmetric: a "semi-order" relation. E.g.,
f(X,Y) :- g(X,Y), g(X,Z)
f(X,Y) :- g(X,Y)
θ-subsume one another.

55
A semi-order generates equivalence classes plus a partial order on those equivalence classes.
Equivalence class: c1 ~ c2 iff c1 ≥ c2 and c2 ≥ c1; c1 and c2 are then called syntactic variants.
c1 is the reduced clause of c2 iff c1 contains a minimal subset of the literals of c2 that is still equivalent to c2. Each equivalence class is represented by its reduced clause.

56
If c1 and c2 are in different equivalence classes, either c1 > c2, or c2 > c1, or neither: we obtain anti-symmetry, hence a partial order. Thus the reduced clauses are partially ordered; they form a lattice. What are the properties of this lattice?

57
(Figure: part of the θ-subsumption lattice. The lgg of p(X,Y) :- m(X,Y), r(X) and p(X,Y) :- m(X,Y), s(X) is p(X,Y) :- m(X,Y); their glb is p(X,Y) :- m(X,Y), r(X), s(X). Each node is an equivalence class of syntactic variants, e.g., p(X,Y) :- m(X,Y) and p(X,Y) :- m(X,Y), m(X,Z) and p(X,Y) :- m(X,Y), m(X,Z), m(X,U), represented by its reduced clause.)

58
The least upper bound and greatest lower bound of two clauses always exist and are unique.
Infinite chains c1 > c2 > c3 > ... > c exist:
h(X) :- p(X,Y)
h(X) :- p(X,X2), p(X2,Y)
h(X) :- p(X,X2), p(X2,X3), p(X3,Y)
...
h(X) :- p(X,X)

59
Looking for a good hypothesis = traversing this lattice. This can be done top-down, using a specialisation operator, or bottom-up, using a generalisation operator.

60 Heuristics-based searches (greedy, beam, exhaustive, ...)
(Figure: the lattice from top (most general) to bottom (most specific), with the versionspace VS in between; heuristics-based searches traverse it.)

61 Specialisation operators
Shapiro: general-to-specific traversal using a refinement operator ρ; ρ(c) yields a set of refinements of c.
In theory: ρ(c) = {c' | c' is a maximally general specialisation of c}.
In practice: ρ(c) = {c ∪ {l} | l is a literal} ∪ {cθ | θ is a substitution}.
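The "practice" version of ρ can be sketched directly (my own minimal version: it adds one literal from a given vocabulary or unifies two variables, and leaves out substitutions to constants). Clauses are frozensets of literal tuples, upper-case strings are variables, body literals carry a `~` prefix:

```python
def is_var(a):
    return a[:1].isupper()

def refinements(clause, vocabulary):
    """rho(c): add one body literal, or apply a unifying substitution {Y/X}."""
    out = set()
    for lit in vocabulary:                      # c ∪ {l}
        if lit not in clause:
            out.add(clause | {lit})
    variables = sorted({a for l in clause for a in l[1:] if is_var(a)})
    for i, x in enumerate(variables):           # c·theta, theta = {Y/X}
        for y in variables[i + 1:]:
            out.add(frozenset((l[0],) + tuple(x if a == y else a for a in l[1:])
                              for l in clause))
    return out

daughter = frozenset({('daughter', 'X', 'Y')})
vocab = [('~female', 'X'), ('~parent', 'Y', 'X')]
for c in sorted(refinements(daughter, vocab), key=str):
    print(sorted(c))
```

On `daughter(X,Y)` with this two-literal vocabulary, the operator yields `daughter(X,X)`, `daughter(X,Y) :- female(X)` and `daughter(X,Y) :- parent(Y,X)`, as in the refinement tree on the next slide.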

62
(Refinement tree rooted at daughter(X,Y); among its refinements are daughter(X,X), daughter(X,Y) :- parent(X,Z), daughter(X,Y) :- female(X), daughter(X,Y) :- parent(Y,X), ...; refining further, e.g., daughter(X,Y) :- female(X), female(Y) and daughter(X,Y) :- female(X), parent(Y,X).)

63
How can we traverse the hypothesis space so that no hypothesis is generated more than once, and no hypothesis is skipped? Many properties of refinement operators have been studied in detail.

64
Some properties:
globally complete: each point in the lattice is reachable from the top
locally complete: each point directly below c is in ρ(c) (useful for greedy systems)
optimal: no point in the lattice is reached twice (useful for exhaustive systems)
minimal, proper, ...

65 A generalisation operator
For bottom-up search. We discuss one generalisation operator: Plotkin's lgg. It starts from two clauses and computes their least general generalisation (lgg): given two clauses, return the most specific single clause that is more general than both of them.

66
Definition of the lgg of terms (si, tj denote terms, V a variable):
lgg(f(s1,...,sn), f(t1,...,tn)) = f(lgg(s1,t1),...,lgg(sn,tn))
lgg(f(s1,...,sn), g(t1,...,tm)) = V
E.g., lgg(a,b) = X; lgg(f(X),g(Y)) = Z; lgg(f(a,b,a),f(c,c,c)) = f(X,Y,X); ...
(the same pair of subterms is always mapped to the same variable).

67
lgg of literals:
lgg(p(s1,...,sn), p(t1,...,tn)) = p(lgg(s1,t1),...,lgg(sn,tn))
lgg(¬p(...), ¬p(...)) = ¬lgg(p(...), p(...))
lgg(p(s1,...,sn), q(t1,...,tm)) is undefined
lgg(p(...), ¬p(...)) and lgg(¬p(...), p(...)) are undefined.

68
lgg of clauses: lgg(c1,c2) = {lgg(l1,l2) | l1 ∈ c1, l2 ∈ c2 and lgg(l1,l2) is defined}.
Example:
c1 = f(t,a) :- p(t,a), m(t), f(a)
c2 = f(j,p) :- p(j,p), m(j), m(p)
lgg(c1,c2) = f(X,Y) :- p(X,Y), m(X), m(Z)
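The three definitions translate into a short program. A sketch of mine: terms are constants (lower-case strings) or (functor, arg, ...) tuples; literal predicate names carry their sign (`~` for body literals), so the undefined cases fall out of the name comparison; a shared `fresh` table maps each pair of differing subterms to the same variable. I expand the slide's abbreviations to father/parent/male and read the slide's `f(a)` as female(a):

```python
def lgg_term(s, t, fresh):
    if s == t:
        return s
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):      # same functor/arity
        return (s[0],) + tuple(lgg_term(a, b, fresh)
                               for a, b in zip(s[1:], t[1:]))
    return fresh.setdefault((s, t), 'V%d' % len(fresh))  # same pair -> same var

def lgg_lit(l1, l2, fresh):
    if l1[0] != l2[0] or len(l1) != len(l2):   # different predicate or sign
        return None                            # lgg undefined
    return (l1[0],) + tuple(lgg_term(a, b, fresh)
                            for a, b in zip(l1[1:], l2[1:]))

def lgg_clause(c1, c2):
    fresh, out = {}, set()
    for l1 in c1:
        for l2 in c2:
            l = lgg_lit(l1, l2, fresh)
            if l is not None:
                out.add(l)
    return out

# The slide's example:
c1 = [('father', 't', 'a'), ('~parent', 't', 'a'),
      ('~male', 't'), ('~female', 'a')]
c2 = [('father', 'j', 'p'), ('~parent', 'j', 'p'),
      ('~male', 'j'), ('~male', 'p')]
print(lgg_clause(c1, c2))
# the set {father(V0,V1), ~parent(V0,V1), ~male(V0), ~male(V2)},
# i.e. f(X,Y) :- p(X,Y), m(X), m(Z)
```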

69
Relative lgg (rlgg) (Plotkin, 1971): lgg relative to a "background theory" B (assume B is a set of facts):
rlgg(e1,e2) = lgg(e1 :- B, e2 :- B)
Method to compute it: turn the example facts into clauses with body B, compute the lgg of these clauses, then remove B and reduce.

70 Example: “Bongard problems”
Bongard: a Russian scientist who studied pattern recognition. Given some pictures, find patterns in them. Simplified versions of Bongard problems are used as benchmarks in ILP.

71
(Figure: a Bongard problem; one group of pictures is labelled "pos", another "neg".)

72
Example: two simple Bongard pictures; find the least general clause that would predict both to be positive.
Picture 1: pos(1). contains(1,o1). contains(1,o2). triangle(o1). points(o1,down). circle(o2).
Picture 2: pos(2). contains(2,o3). triangle(o3). points(o3,down).

73
Method 1: represent each example by a clause and compute the lgg of the examples.
pos(1) :- contains(1,o1), contains(1,o2), triangle(o1), points(o1,down), circle(o2).
pos(2) :- contains(2,o3), triangle(o3), points(o3,down).
lgg of these two clauses (after reduction) = pos(X) :- contains(X,Y), triangle(Y), points(Y,down)

74
Method 2: represent the class of each example by a fact, the other properties in the background; compute the rlgg.
Examples: pos(1). pos(2).
Background: contains(1,o1). contains(1,o2). contains(2,o3). triangle(o1). triangle(o3). points(o1,down). points(o3,down). circle(o2).
rlgg(pos(1), pos(2)) = ? (exercise)

75
The θ-subsumption ordering is used by many ILP systems:
top-down, using refinement operators (many systems)
bottom-up, using rlgg (e.g., the Golem system, Muggleton & Feng)

76
Note: inverting implication. Given the incompleteness of θ-subsumption, could we invert implication instead? Some problems:
The lgg under implication is not unique; e.g., the lgg of p(f(f(f(X)))) :- p(X) and p(f(f(X))) :- p(X) can be p(f(X)) :- p(X) or p(f(f(X))) :- p(Y).
It is computationally expensive.

77 2) Inverting resolution
The resolution rule for deduction.
Propositional: from p ∨ ¬q and q ∨ ¬r, infer p ∨ ¬r; from p ∨ ¬q and q ∨ s, infer p ∨ s.
First-order: from p(X) ∨ ¬q(X) and q(X) ∨ ¬r(X,Y), infer p(X) ∨ ¬r(X,Y).
With unification: from p(a) ∨ ¬q(b) and q(X) ∨ ¬r(X,Y), using {X/b}, infer p(a) ∨ ¬r(b,Y).
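The propositional case is a few lines of code. A sketch of mine, with clauses as frozensets of signed literals, where `('-', 'q')` stands for ¬q:

```python
def resolve(c1, c2):
    """All propositional resolvents of two clauses."""
    resolvents = []
    for sign, atom in c1:
        complement = ('-' if sign == '+' else '+', atom)
        if complement in c2:          # a pair of opposite literals
            resolvents.append((c1 - {(sign, atom)}) | (c2 - {complement}))
    return resolvents

c1 = frozenset({('+', 'p'), ('-', 'q')})   # p ∨ ¬q, i.e. p :- q
c2 = frozenset({('+', 'q'), ('-', 'r')})   # q ∨ ¬r, i.e. q :- r
print(resolve(c1, c2))                     # the single resolvent p ∨ ¬r
```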

78 Inverting resolution
General resolution rule: given two clauses containing a pair of opposite literals (up to a substitution), li·θ1 = ¬kj·θ2:
from l1 ∨ ... ∨ li ∨ ... ∨ ln and k1 ∨ ... ∨ kj ∨ ... ∨ km,
infer (l1 ∨ ... ∨ li-1 ∨ li+1 ∨ ... ∨ ln ∨ k1 ∨ ... ∨ kj-1 ∨ kj+1 ∨ ... ∨ km)θ1θ2.
E.g., p(X) :- q(X) and q(X) :- r(X,Y) yield p(X) :- r(X,Y); p(X) :- q(X) and q(a) yield p(a).

79
Resolution implements |- for sets of clauses (cf. θ-subsumption, which is for single clauses). Inverting it allows us to generalise a clausal theory. Inverse resolution is much more difficult than resolution itself: different operators have been defined, and the results are not unique.

80 Inverse resolution operators
Some operators related to inverse resolution (A and B are conjunctions of literals):
absorption: from q :- A and p :- A,B, infer p :- q,B
identification: from p :- q,B and p :- A,B, infer q :- A

81
Intra-construction: from p :- A,B and p :- A,C, infer q :- B, p :- A,q and q :- C.
Inter-construction: from p :- A,B and q :- A,C, infer p :- r,B, r :- A and q :- r,C.
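The absorption operator from the previous slide is the simplest to write down. A propositional sketch of mine, with a rule as a (head, frozenset-of-body-literals) pair:

```python
def absorption(rule1, rule2):
    """From q :- A and p :- A,B infer p :- q,B (propositional sketch)."""
    q, A = rule1
    p, body = rule2
    if not A <= body:            # A must occur in the second rule's body
        return None
    return (p, (body - A) | {q})

r1 = ('q', frozenset({'a1', 'a2'}))        # q :- a1, a2
r2 = ('p', frozenset({'a1', 'a2', 'b'}))   # p :- a1, a2, b
print(absorption(r1, r2))                  # p :- q, b
```

Identification, intra-construction and inter-construction can be written in the same style; the last two would generate a fresh predicate name, which is how predicate invention arises on the next slide.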

82
With intra- and inter-construction, new predicates are "invented". E.g., apply intra-construction to
grandparent(X,Y) :- father(X,Z), father(Z,Y)
grandparent(X,Y) :- father(X,Z), mother(Z,Y)
What predicate is invented?

83 Example inverse resolution
Read bottom-up: from the fact f(j,m) and p(j,m), inverse resolution yields f(j,Y) :- p(j,Y); from that clause and m(j), it yields f(X,Y) :- p(X,Y), m(X).

84
A resolution derivation (inverse resolution reads it upwards):
grandparent(X,Y) :- father(X,Z), parent(Z,Y) and father(X,Y) :- male(X), parent(X,Y)
yield grandparent(X,Y) :- male(X), parent(X,Z), parent(Z,Y);
with male(jef): grandparent(jef,Y) :- parent(jef,Z), parent(Z,Y);
with parent(jef,an): grandparent(jef,Y) :- parent(an,Y);
with parent(an,paul): grandparent(jef,paul).

85
Properties of inverse resolution:
+ in principle very powerful
- gives rise to a huge search space
- the result of inverse resolution is not unique; e.g., father(j,p) :- male(j) and parent(j,p) yield father(j,p) :- male(j), parent(j,p), or father(X,Y) :- male(X), parent(X,Y), or ...
This is the approach of CIGOL (Muggleton & Buntine).

86
We now have some basic operators.
θ-subsumption-based, at the single-clause level: the specialisation operator ρ and a generalisation operator, the lgg of two clauses.
Inverse resolution: generalises a set of clauses.
These can be used to build ILP systems: top-down, using specialisation operators, or bottom-up, using generalisation operators.

87 Representations
Two main paradigms for learning in ILP: learning from entailment and learning from interpretations. They are related to the representation of the examples; cf. the Bongard examples we saw before.

88 Learning from entailment
One example = a fact e (or a clause e :- B).
Goal: given examples <E+,E->, find a theory H such that
for all e+ ∈ E+: B ∪ H |- e+
for all e- ∈ E-: B ∪ H does not derive e-

89
Examples: pos(1). pos(2). :- pos(3).
Background:
contains(1,o1). contains(1,o2). contains(2,o3). contains(3,o4).
triangle(o1). triangle(o3). points(o1,down). points(o3,down). circle(o2). circle(o4).
Hypothesis: pos(X) :- contains(X,Y), triangle(Y), points(Y,down).

90 Learning from interpretations
Example = an interpretation (a set of facts); e contains a full description of the example: all information that intuitively belongs to the example is represented in the example itself, not in the background knowledge.
Background = domain knowledge: general information concerning the domain, not concerning specific examples.

91
Examples:
pos(1) :- contains(1,o1), contains(1,o2), triangle(o1), points(o1,down), circle(o2).
pos(2) :- contains(2,o3), triangle(o3), points(o3,down).
:- pos(3), contains(3,o4), circle(o4).
Background:
polygon(X) :- triangle(X).
polygon(X) :- square(X).
Hypothesis: pos(X) :- contains(X,Y), triangle(Y), points(Y,down).

92
The Closed World Assumption is made inside interpretations.
Examples:
pos: {contains(o1), contains(o2), triangle(o1), points(o1,down), circle(o2)}
pos: {contains(o3), triangle(o3), points(o3,down)}
neg: {contains(o4), circle(o4)}
Background:
polygon(X) :- triangle(X).
polygon(X) :- square(X).
Constraint on pos: ∃Y: contains(Y), triangle(Y), points(Y,down).

93
Note: when learning from interpretations we can dispose of the "example identifier" (but can also use the standard format). The CWA is made for the example description, i.e., the description is assumed to be complete. The class of an example is related to the information inside the example plus the background knowledge, NOT to information in other examples.

94
Because of the third property, learning from interpretations is more limited than learning from entailment: we cannot learn relations between different examples, nor recursive clauses. But because of the second and third properties it is also more efficient: there are positive PAC-learnability results (De Raedt and Džeroski, 1994, AIJ), versus negative results for learning from entailment.

95 Algorithms

96 Rule induction
Most inductive logic programming systems induce a concept definition in the form of a set of definite Horn clauses (a Prolog program). Many of the algorithms are similar to propositional algorithms for learning rule sets: FOIL corresponds to CN2, Progol to AQ.

97 FOIL (Quinlan)
Learns a single concept, e.g., p(X,Y) :- ...
To learn one clause (hill-climbing search):
start with the most general clause, p(X,Y) :- true;
repeat: add the "best" literal to the clause (the literal that most improves the quality of the clause); the new literal can also be a unification, X=c or X=Y (= applying a refinement operator under θ-subsumption);
until no further improvement.

98 Example
Examples:
father(homer,bart). father(bill,chelsea).
:- father(marge,bart). :- father(hillary,chelsea). :- father(bart,chelsea).
Background:
parent(homer,bart). parent(marge,bart). parent(bill,chelsea). parent(hillary,chelsea).
male(homer). male(bart). male(bill). female(chelsea). female(marge).

99
Candidate clauses in the first refinement step; e.g., father(X,Y) :- parent(X,Y) covers 2 positives and 2 negatives (2+,2-):
father(X,Y) :- parent(X,Y).
father(X,Y) :- parent(Y,X).
father(X,Y) :- male(X).
father(X,Y) :- male(Y).
father(X,Y) :- female(X).
father(X,Y) :- female(Y).

100
The best candidate is father(X,Y) :- male(X), which covers 2 positives and 1 negative (2+,1-).

101 ILP made easy -- ESSLLI 2000, Birmingham
father(homer,bart). father(bill,chelsea). :- father(marge,bart). :- father(hillary,chelsea). :- father(bart,chelsea). parent(homer,bart). parent(marge,bart). parent(bill,chelsea). parent(hillary, chelsea). male(homer). male(bart). male(bill). female(chelsea). female(marge). [father(X,Y) :- male(X).] father(X,Y) :- male(X), parent(X,Y). father(X,Y) :- male(X), parent(Y,X). father(X,Y) :- male(X), male(Y). father(X,Y) :- male(X), female(X). father(X,Y) :- male(X), female(Y). 2+,0- 11/21/2018 ILP made easy -- ESSLLI 2000, Birmingham

102 Learning multiple clauses: the “Covering” approach
To learn multiple clauses:
repeat
  learn a single clause c (see previous algorithm)
  add c to h
  mark the positive examples covered by c as "covered"
until all positive examples are marked "covered" or no more good clauses are found
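The two loops above can be sketched in runnable form. This is a hypothetical propositional-style stand-in (examples are plain values and candidate literals are boolean functions), not FOIL's actual relational refinement search; all names are illustrative:

```python
def learn_clause(pos, neg, literals):
    """Greedy hill-climbing search for one clause body, as in the
    previous algorithm: keep adding the literal that best separates
    the remaining positives from the negatives."""
    body, pos, neg = [], set(pos), set(neg)
    while neg:
        best = max(literals, key=lambda lit: sum(map(lit, pos)) - sum(map(lit, neg)))
        new_neg = {e for e in neg if best(e)}
        if len(new_neg) == len(neg):  # no literal improves the clause: stop
            break
        body.append(best)
        pos = {e for e in pos if best(e)}
        neg = new_neg
    return body, pos  # clause body and the positives it covers

def covering(pos, neg, literals):
    """Learn clauses one by one, marking covered positives, until all
    positives are covered or no more good clauses are found."""
    theory, uncovered = [], set(pos)
    while uncovered:
        body, covered = learn_clause(uncovered, neg, literals)
        if not covered:
            break  # no more good clauses found
        theory.append(body)
        uncovered -= covered
    return theory
```

On a toy task (positives are even numbers, negatives odd), a single one-literal clause suffices, so the covering loop stops after one iteration.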

103
likes(garfield, lasagne). likes(garfield, birds). likes(garfield, meat). likes(garfield, jon). likes(garfield, odie).
First clause found: likes(garfield, X) :- edible(X).    3+,0-

104
likes(garfield, lasagne). likes(garfield, birds). likes(garfield, meat). likes(garfield, jon). likes(garfield, odie). (italics: previously covered)
Second clause found: likes(garfield, X) :- subject_to_cruelty(X).    2+,0-

105 Some pitfalls
Avoiding infinite recursion: when recursive clauses are allowed, e.g., ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y), avoid learning parent(X,Y) :- parent(X,Y): it won't be useful, even though it is 100% correct.
Bonus for the introduction of new variables: a literal may not yield any direct gain, but may introduce variables that become useful later. E.g., if p(X) :- q(X) covers p positives and n negatives, refining it to p(X) :- q(X), age(X,Y) still covers p positives and n negatives -> no gain, although the new variable Y may enable useful tests later.

106 Golem (Muggleton & Feng)
Based on the rlgg operator. To build one clause: take 2 positive examples and find their rlgg, generalize it using yet another example, ... until there is no improvement in the quality of the clause (= bottom-up search). The result depends strongly on the choice of examples: e.g., what if the true theory is {p(X) :- q(X), p(X) :- r(X)}?

107
Try this for different pairs of examples and pick the best clause found; this reduces the dependency on the choice of pair (if one of the two is noisy, no good clause is found). Remove the covered positive examples and restart the process. Repeat until no more good clauses are found.

108
One limitation of Golem: extensional coverage tests only. Coverage by a clause H :- B is checked by running the query B against the extensional background knowledge and the examples themselves. This may go wrong when learning recursive clauses:
examples: p(0). p(1). p(2). :- p(4).
background: s(0,1). s(1,2). s(2,3). s(3,4).
Golem induces p(Y) :- s(X,Y), p(X); the extensional test does not detect that (together with the examples) this clause entails the negative example p(4), because the intermediate fact p(3) is not in the example set.

109 Progol (Muggleton)
Top-down approach, but with a "seed". To find one clause:
  start with 1 positive example e
  generate a hypothesis space He that contains only hypotheses covering at least this one example:
    first generate the most specific clause ⊥ that covers e;
    He contains every clause more general than ⊥
  perform an exhaustive top-down search in He, looking for the clause that maximizes compaction

110
Compaction = size(covered examples) - size(clause). The process of finding one clause is repeated until no more good (= compaction-causing) clauses are found. The compaction heuristic in principle allows no coverage of negatives; this can be relaxed (accommodating noise).

111 Generation of the bottom clause
Language bias = the set of all acceptable clauses (chosen by the user) = a specification of H (at the level of single clauses). The bottom clause ⊥ for an example e = the most specific clause in the language bias covering e. It is constructed using "inverse entailment".

112
Construction of ⊥:
  if B ∧ H |= e, then B ∧ ¬e |= ¬H
  if H is a clause, ¬H is a conjunction of ground (skolemized) literals
  compute ¬⊥ : the conjunction of all ground literals entailed by B ∧ ¬e
  ¬H must be a subconjunction of these, so B ∧ ¬e |= ¬⊥ |= ¬H
  hence H |= ⊥

113 Some examples (cf. Muggleton, NGC 1995)
B: anim(X) :- pet(X).  pet(X) :- dog(X).
e: nice(X) :- dog(X).
⊥: nice(X) :- dog(X), pet(X), anim(X).

B: hasbeak(X) :- bird(X).  bird(X) :- vulture(X).
e: hasbeak(tweety).
⊥: hasbeak(tweety); bird(tweety); vulture(tweety).

114
Example of (part of) a Progol run: learn to classify animals as mammals, reptiles, ...
|- generalise(class/2)?
[Generalising class(dog,mammal).]
[Most specific clause is]
class(A,mammal) :- has_milk(A), has_covering(A,hair), has_legs(A,4), homeothermic(A), habitat(A,land).
[C:-28,4,10,0 class(A,mammal).]
[C:8,4,0,0 class(A,mammal) :- has_milk(A).]
[C:5,3,0,0 class(A,mammal) :- has_covering(A,hair).]
[C:-4,4,3,0 class(A,mammal) :- homeothermic(A).]
[4 explored search nodes]
f=8,p=4,n=0,h=0
[Result of search is]
class(A,mammal) :- has_milk(A).

115
Exhaustive search: it is important to constrain the size of the hypothesis space. Strong language bias: specify which predicates may be used in the head or body of a clause; specify the types and modes of predicates. E.g., allow age(X,Y), Y<18 but not habitat(X,Y), Y<18.

116
E.g., for the "animals" example:
:- modeh(1,class(+animal,#class))?             % put this in the head; +animal: variable of type "animal"
:- modeb(1,has_milk(+animal))?                 % put this in the body; only one literal of this kind needed
:- modeb(1,has_gills(+animal))?
:- modeb(1,has_covering(+animal,#covering))?   % #covering: constant of type "covering"
:- modeb(1,has_legs(+animal,#nat))?
:- modeb(1,homeothermic(+animal))?
:- modeb(1,has_eggs(+animal))?
:- modeb(*,habitat(+animal,#habitat))?         % * : there can be any number of habitats

117 Other approaches
The algorithms we have seen so far are rule-based: they induce a theory in the form of a set of rules (definite Horn clauses), inducing the rules one by one. Quite natural, given that logic programs are essentially sets of rules...

118
Still, induction of rule sets is only one type of machine learning, and the difference between ILP and propositional approaches lies mainly in the representation. It is possible to define other learning techniques and tasks in ILP: induction of constraints, induction of decision trees, Bayesian learning, ...

119 Claudien (De Raedt & Bruynooghe)
"Clausal Discovery Engine" Discovers patterns that hold in set of data any patterns represented as clauses (not necessarily Horn clauses) I.e., finds patterns of a more general kind than predictive rules also called descriptive induction 11/21/2018 ILP made easy -- ESSLLI 2000, Birmingham

120
Given a hypothesis space, Claudien performs an exhaustive top-down search through the space and returns all clauses that hold in the data set and are not implied by other clauses found. Strong language bias: a precise syntactical description of the acceptable clauses.

121
Example language bias:
{parent(X,Y), father(X,Y), mother(X,Y)} :- {parent(X,Y), father(X,Y), mother(X,Y), male(X), male(Y), female(X), female(Y)}
May result in the following clauses being discovered:
parent(X,Y) :- father(X,Y).
parent(X,Y) :- mother(X,Y).
:- father(X,Y), mother(X,Y).
:- male(X), female(X).
mother(X,Y) :- parent(X,Y), female(X).
...

122 Claudien algorithm
S := ∅
Q := {false :- true}   (the most general clause)
while Q is not empty:
  pick the first clause c from Q
  for all (h :- b) in ρ(c):
    if the query (b ∧ ¬h) fails (i.e., the clause is true in the data)
    then if (h :- b) is not entailed by the clauses in S
         then add (h :- b) to S
    else add (h :- b) to Q

123 ICL (De Raedt and Van Laer)
"Inductive Constraint Logic". The first system to learn from interpretations. It searches for constraints on interpretations that distinguish examples of different classes. Roughly: run Claudien on the set of positive examples E+; each constraint found will be true for all e+, but probably false for some e-; all constraints together hopefully rule out all e-.

124
Search for one constraint:
c := false :- true
repeat until c is true for all positives:
  find d in ρ(c) such that d holds for as many positives and as few negatives as possible
  c := d
add c to h
(a beam search can also be used)

125
Search for a set of constraints on a class:
h := {}
while there are negatives left to be eliminated:
  find a constraint c
  add c to h
Uses the same language bias ("DLAB") as recent versions of Claudien; DLAB is an advanced form of the original Claudien bias.

126
Example of a DLAB bias specification: min-max:[...] means that at least min and at most max literals from the list are to be put here; specifications can be nested, which allows some nice tricks, e.g.:
1-1:[male(X),female(X)]
0-2:[parent(X,Y), father(X,Y), mother(X,Y)]
<--
0-len:[parent(X,Y), father(X,Y), mother(X,Y), male(X), male(Y), female(X), female(Y)]

127 Warmr (Dehaspe)
Induces "first order association rules"; the algorithm is similar to APRIORI. It finds frequent patterns (cf. "frequent item sets" in APRIORI); a pattern = a conjunction of literals. Uses the θ-subsumption lattice over the hypothesis space. From the patterns it constructs association rules: IF this pattern occurs, THEN that pattern occurs too.

128 The APRIORI algorithm
APRIORI (Agrawal et al.): efficient discovery of frequent itemsets and association rules. Typical example: market basket analysis (which things are often bought together?). Association rule: IF a1, ..., an THEN an+1, ..., an+m

129
Association rules should have at least some minimal
support: #{t | t contains a1,...,an+m} / #{t}  (how many people buy all these things together?)
confidence: #{t | t contains a1,...,an+m} / #{t | t contains a1,...,an}  (how many of the people buying the IF-things also buy the THEN-things?)
The minimal support and confidence thresholds may be low.
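As a quick check of the two definitions, support and confidence can be computed directly over a toy set of transactions (the basket contents below are illustrative only):

```python
def support(transactions, items):
    # fraction of all transactions containing every item of the conjunction
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, if_part, then_part):
    # among transactions containing the IF-items,
    # the fraction that also contains the THEN-items
    both = sum((if_part | then_part) <= t for t in transactions)
    return both / sum(if_part <= t for t in transactions)

baskets = [{"bread", "butter"}, {"bread", "butter", "jam"}, {"wine"}, {"bread"}]
print(support(baskets, {"bread", "butter"}))       # 0.5
print(confidence(baskets, {"bread"}, {"butter"}))  # 2/3: two of the three bread buyers
```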

130
APRIORI is tailored towards large data sets: efficiency is very important, and data access is minimized. It works in 2 steps: find the frequent itemsets, then compute association rules from them.

131
Observation: if a1 ∧ ... ∧ an is infrequent (below minimal support), then a1 ∧ ... ∧ an ∧ an+1 is also infrequent: adding a condition can only strengthen the conjunction. Hence {a1,...,an} can only be frequent if each subset of it is frequent.

132
This leads to a levelwise algorithm: first compute frequent singletons, then frequent pairs, triples, ... A lot of pruning is possible due to the previous observation: an itemset of cardinality n is a candidate only if each of its subsets of cardinality n-1 was frequent at the previous level, and only candidates need to be counted.

133 Example
Level 1: bread, butter, wine, cheese, ham, jam
Level 2: bread & butter, bread & cheese, bread & jam, butter & cheese, butter & jam, cheese & jam
Level 3: bread & butter & cheese, bread & butter & jam (<- not a candidate)

134 Apriori algorithm
min_freq := min_support * freq(∅);  /* freq(∅) = #transactions */
d := 0; Q0 := {∅};  /* candidates for level 0 */
F := ∅;  /* frequent sets */
while Qd ≠ ∅ do
  for all S in Qd do find freq(S);
  Fd := {S in Qd | freq(S) ≥ min_freq};
  F := F ∪ Fd;
  compute Qd+1; d := d+1
return F
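A runnable sketch of the levelwise loop, with the candidate generation of the next slide folded in. Counts are naively recomputed per set, so this deliberately ignores the data-access optimisations that Apriori is really about:

```python
def apriori(transactions, min_count):
    """Return all itemsets occurring in at least min_count transactions,
    computed level by level as in the pseudocode above."""
    items = {i for t in transactions for i in t}
    freq = lambda s: sum(s <= t for t in transactions)
    # level 1: frequent singletons
    level = {frozenset([i]) for i in items if freq(frozenset([i])) >= min_count}
    frequent = set(level)
    while level:
        # candidate iff every subset one item smaller was frequent at the previous level
        candidates = set()
        for s in level:
            for x in items - s:
                c = s | {x}
                if all(c - {y} in level for y in c):
                    candidates.add(c)
        level = {c for c in candidates if freq(c) >= min_count}
        frequent |= level
    return frequent
```

With four toy baskets and min_count = 2, the pruning step already rules out the only 3-itemset, because one of its pairs was infrequent.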

135 Computing candidates
Compute Qd+1 from Fd:
  Qd+1 := ∅;
  for each S in Fd do
    for each item x not in S do
      S' := S ∪ {x};
      if ∀i in S': S'\{i} ∈ Fd then add S' to Qd+1

136
Step 2: deriving association rules from the frequent sets:
if S ∪ {a} ∈ F and freq(S ∪ {a}) / freq(S) > min_confidence
then S -> S ∪ {a} is a valid association rule (= it has sufficient support and confidence)
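Step 2 in code, under the simplifying assumptions that `frequent` maps each frequent set to its count and that, as on the slide, only rules with a single-item conclusion are generated:

```python
def derive_rules(frequent, min_confidence):
    """Emit (S, {a}, conf) whenever S and S ∪ {a} are both frequent and
    the count ratio freq(S ∪ {a}) / freq(S) clears the threshold."""
    rules = []
    for s, n_s in frequent.items():
        for s_a, n_sa in frequent.items():
            if s < s_a and len(s_a) == len(s) + 1 and n_sa / n_s >= min_confidence:
                rules.append((set(s), set(s_a - s), n_sa / n_s))
    return rules

counts = {frozenset({"bread"}): 3, frozenset({"butter"}): 2,
          frozenset({"bread", "butter"}): 2}
# at threshold 0.9 only butter -> bread survives (confidence 2/2 = 1.0);
# bread -> butter has confidence 2/3 and is dropped
print(derive_rules(counts, 0.9))
```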

137 Warmr
Warmr is a first-order version of Apriori. Patterns ("itemsets") are now conjunctive queries. "Frequent" patterns: what to count? Examples, of course... This was easy in the propositional case: 1 example = 1 tuple -> count tuples.

138
In the first-order case this is also easy when learning from interpretations, but not so clear when learning from entailment: which implications are the examples? This is indicated by specifying a key: a unique identification of an example. Each pattern contains a set of variables that forms the key.

139
Example: assume 100 people in the database, with person(X): X is the key (count answer substitutions for X, not for Y or Z!).
[person(X),] mother(X,Y): 40 examples
mother(X,Y), has_pet(Y,Z): 30 examples
"mother(X,Y) ---> has_pet(Y,Z)": support 0.3, confidence 0.75
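The key-based counting can be mimicked on toy relations (all facts below are hypothetical): what is counted is the set of key bindings for which *some* binding of the remaining variables satisfies the pattern, not the number of matching tuples:

```python
people = {"ann", "eve", "joe"}                        # person(X): X is the key
mother = {("ann", "bob"), ("ann", "carl"), ("eve", "dan")}
has_pet = {("bob", "rex"), ("dan", "tweety")}

# keys supporting the pattern mother(X,Y): ann has two children but counts once
m = {x for (x, y) in mother}
# keys supporting the pattern mother(X,Y), has_pet(Y,Z)
mp = {x for (x, y) in mother for (y2, z) in has_pet if y2 == y}

support = len(mp) / len(people)   # 2/3 of the persons
confidence = len(mp) / len(m)     # 1.0: each mother here has a child with a pet
print(support, confidence)
```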

140
Remark: an association rule is NOT a clause.
mother(X,Y) ---> has_pet(Y,Z)
= ∀X: (∃Y: mother(X,Y)) -> (∃Y ∃Z: mother(X,Y) ∧ has_pet(Y,Z))
≠ mother(X,Y) -> has_pet(Y,Z)
The main difference is the occurrence of existentially quantified variables in the conclusion.

141
Illustrated on Bongard drawings: 1 example = 1 drawing; in contains(D,Obj), D is the key.
Pattern: e.g., contains(D,X), circle(X), in(X,Y), circle(Y)
Association rule: e.g., contains(D,X), circle(X), in(X,Y), circle(Y) --> contains(D,Z), square(Z)
"drawings that contain a circle inside another circle usually also contain a square"

142
Warmr is also useful for feature construction, a generally applicable method for improving the representation of examples: given the description of an example, derive new (propositional) features that describe it, add those features to a propositional description of the example, and run a propositional learner.

143
For the Bongard example: construct features such as "contains a circle", "contains a circle inside a triangle", ... Given the correct features, a propositional representation of the examples becomes possible. Feature construction with ILP = a general method for applying propositional machine learning techniques to structural examples.

144 Decision tree induction in ILP
S-CART (Kramer 1996) upgrades CART; Tilde (Blockeel & De Raedt '98) upgrades C4.5. Both induce "first order" or "structural" decision trees (FOLDTs): the test in a node is a first-order literal, which may succeed or fail -> binary trees; different nodes may share variables; the "real" test in a node = the conjunction of all literals on the path from the root to that node.

145 Top-down Induction of Decision Trees: Algorithm
function TDIDT(E: set of examples):
  T := set of possible tests;
  t := BEST_SPLIT(T, E);
  P := partition induced on E by t;
  if STOP_CRIT(E, P)
  then return leaf(INFO(E))
  else for all Ei in P: ti := TDIDT(Ei);
       return inode(t, {(i, ti)})

146
The set of possible tests is generated using a refinement operator ρ: with c = the conjunction on the path from the root to the node, ρ(c) - c gives the literal(s) to be put in the node. The other auxiliary functions are as in propositional TDIDT: BEST_SPLIT uses e.g. information gain, STOP_CRIT e.g. a significance test, INFO e.g. the most frequent class.
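Filling in the auxiliary functions propositionally gives a compact runnable version of the TDIDT skeleton (tests are boolean functions on examples; BEST_SPLIT minimises misclassifications, STOP_CRIT fires on pure nodes or exhausted tests, INFO returns the majority class). This is an illustrative sketch, not Tilde's or S-CART's actual relational refinement:

```python
from collections import Counter

def tdidt(examples, tests):
    """examples: list of (x, label) pairs; tests: boolean functions on x."""
    labels = [y for _, y in examples]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not tests:          # STOP_CRIT
        return ("leaf", majority)                   # INFO
    def errors(t):                                  # BEST_SPLIT criterion
        score = 0
        for part in ([y for x, y in examples if t(x)],
                     [y for x, y in examples if not t(x)]):
            if part:
                score += len(part) - Counter(part).most_common(1)[0][1]
        return score
    t = min(tests, key=errors)
    left = [(x, y) for x, y in examples if t(x)]
    right = [(x, y) for x, y in examples if not t(x)]
    if not left or not right:                       # useless split: make a leaf
        return ("leaf", majority)
    rest = [u for u in tests if u is not t]
    return ("node", t, tdidt(left, rest), tdidt(right, rest))
```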

147
Known from propositional learning: induction of decision trees is fast and usually yields good results. These properties are inherited by Tilde / S-CART. There are also new results (not inherited from propositional learning) on expressiveness.

148 Example FOLDT
worn(X)?
  no  -> ok
  yes -> irreplaceable(X)?
           yes -> sendback
           no  -> fix
(∀x: ¬worn(x)) => ok
(∃x: worn(x) ∧ irreplaceable(x)) => sendback
(∃x ∀y: worn(x) ∧ ¬(worn(y) ∧ irreplaceable(y))) => fix

149 Expressiveness
FOL formula equivalent with the tree:
(∀x: ¬worn(x)) => ok
(∃x: worn(x) ∧ irreplaceable(x)) => sendback
(∃x ∀y: worn(x) ∧ ¬(worn(y) ∧ irreplaceable(y))) => fix
Logic program equivalent with the tree:
a ← worn(X)
b ← worn(X), irreplaceable(X)
ok ← ¬a
sendback ← b
fix ← a ∧ ¬b

150
Prolog program equivalent with the tree, using cuts (a "first order decision list"):
sendback :- worn(X), irreplaceable(X), !.
fix :- worn(X), !.
ok.

151
A FOLDT can be converted to a layered logic program containing invented predicates, or to a "flat" Prolog program (using cuts). It cannot be converted to a flat logic program.

152 Expressiveness
F ⊂ T = L, with F = flat logic programs, T = decision trees, L = decision lists.
This difference is specific to the first-order case. Possible remedies for ILP systems: invent auxiliary predicates; use both ∀ and ∃; induce "decision lists".

153 Representation with keys
Examples:
class(e1,fix). worn(e1,gear). worn(e1,chain).
class(e2,sendback). worn(e2,engine). worn(e2,chain).
class(e3,sendback). worn(e3,control_unit).
class(e4,fix). worn(e4,chain).
class(e5,keep).
Background: replaceable(gear). replaceable(chain). not_replaceable(engine). not_replaceable(control_unit).
Tree: worn(E,X)?
  no  -> class(E,keep)
  yes -> not_replaceable(X)?
           yes -> class(E,sendback)
           no  -> class(E,fix)
Conversion to Prolog:
class(E,sendback) :- worn(E,X), not_replaceable(X), !.
class(E,fix) :- worn(E,X), !.
class(E,keep).

154
speed(x,s), s > 120, not job(x,politician), not (∃y: knows(x,y), job(y,politician)) => fine(x)
As a tree:
speed(X,S), S>120?
  no  -> N
  yes -> job(X,politician)?
           yes -> N
           no  -> knows(X,Y)?
                    no  -> Y
                    yes -> job(Y,politician)?
                             yes -> N
                             no  -> Y

155 Other advantages of FOLDTs
Both classification and regression are possible: classification predicts a class (= concept learning); regression predicts numbers (important, and not given much attention in ILP). Also clustering to some extent (clustering: group similar examples together).

156 Many other approaches and applications of ILP possible...
Combination of ILP and Q-learning: RRL ("relational reinforcement learning"), reinforcement learning in structural domains. First-order equivalents of Bayesian networks. First-order clustering (needs first-order distance measures). ...

157 Conclusions
Many different approaches exist in machine learning. ILP is in a sense diverging from concept learning... to other approaches and tasks. Still many new approaches to be tried!

158
Applications of ILP

159 Applications: Overview
User modelling, games, ecology, drug design, natural language, inductive database design

160 User Modelling
Behavioural cloning: build a model of the user's behaviour and simulate the user's behaviour by means of the model. E.g.: learning to fly / drive / ...; learning to play music; learning to play games (adventure, strategic, ...).

161
Automatic adaptation of a system to its user: detect patterns in the user's actions, use the patterns to try to predict the user's next action, and based on these predictions make life easier for the user. E.g., mail systems (auto-priority, ...), adaptive web pages, intelligent search engines.

162 Example Applications
Some applications the Leuven group has looked at: behavioural cloning (learning to play music, learning to play games); automatic adaptation of a system to its user (adaptive webpages, a learning command shell, an intelligent interface).

163 Learning to Play Music
Van Baelen & De Raedt, ILP-96. Playing music is difficult: not just playing the notes, but playing with "feeling", adapting volume, speed, ... MIDI files are provided to the learning system; the system detects patterns w.r.t. pitch, volume, speed, ... and tries to play music itself.

164
Why an ILP approach? Mainly because of the time sequences. Results? Compare the computer-generated MIDI file with a human-generated MIDI file: "the computer makes similar mistakes as a beginning player". See the ILP-96 proceedings for details (LNAI 1314).

165 Adaptive Webpages
"Adaplix" project (Jacobs et al., 1997-). A webpage observes the actions of the user (e.g., which links are followed frequently, the time spent on one page, ...) and adapts itself within the limitations given by the page author: change the layout of the page, move links to different places, add or remove links.

166
Example site: identify yourself (name, gender, occupation (personnel/student)); based on this info, a customized web page is provided. A student project (in Dutch).

167 Intelligent Mailer
"Visual Elm" (Jacobs, 1996). An intelligent mail interface: it tries to detect which kinds of mails are immediately deleted, immediately read, not deleted but read later, or forwarded, and based on this it assigns priorities to new mails.

168
Predictions: the priority assigned to new mails; the expected actions (delete, forward, ...). An explanation facility. Several options are offered to the user, e.g.: set a priority threshold and only show mails above the threshold; sort mails according to priority.


170 Learning Shell
Jacobs, Dehaspe et al. (1999). Context: a Unix command shell, e.g., csh. Each user has a "profile" file that defines the configuration for that user and makes the shell easier to use; usually it is a default profile, unless the user changes it manually.

171
Possible to learn the profile file? Observe the user: which commands are often used? which parameters are used with the commands? Then automatically construct a better profile from the observations.

172
Example of input to the ILP system:
/* background */
command(Id, Command) :- isa(OrigCommand, Command), command(Id, OrigCommand).
isa(emacs, editor). isa(vi, editor).
/* observations */
command(1, 'cd'). attribute(1, 1, 'tex').
command(2, 'emacs'). switch(2, 1, '-nw'). switch(2, 2, '-q'). attribute(2, 1, 'aaai.tex').

173
Detect relationships ("association rules") with the ILP system Warmr. Examples of rules output by Warmr:
IF command(Id, 'ls') THEN switch(Id, '-l').
IF recentcommand(Id, 'cd') AND command(Id, 'ls') THEN nextcommand(Id, 'editor').

174
Some (preliminary) experimental results. Evaluation criterion: predict the next action of the user. Actions were logged for 10 users, each log about 500 commands. 2 experiments: learning from all log files together, and learning from individual log files.

175
Learning from the mixed data: predictive accuracy 35% (= fmax, the relative frequency of the most popular command). Learning from individual data: predictive accuracy 50% (> fmax). Conclusion: the proposed approach to user modelling shows promise in this context.

176 ILP made easy -- ESSLLI 2000, Birmingham
Learning to Play Games
Strategic games, adventure games, ...: learning a strategy to play; examples: Rogue, ..., Slay. Chess, Go, ...: detecting patterns.

177 Strategic & Adventure Games
E.g., adventure games: Rogue, Wumpus, ... (a 2-D world, background knowledge). Strategic game: "Slay", a "Risk"-like game: conquer the territory of the enemy; larger territories are stronger.


179
Very complex to model: 1 game situation = a description of territories, game pieces, ...; a description of the user's actions = a set of moves (recruiting new soldiers / building new watch towers, moving pieces around within or outside a territory). Even during 1 ply the situation changes all the time, and the order of moves is important (some moves only become possible after other moves).

180
Advantages of ILP in this context: a full description of all territories; background knowledge is easily incorporated (e.g., the rules of the game, the definition of neighbouring areas, ...); the logic representation allows for interesting reasoning mechanisms (e.g., event calculus). Unfinished work...

181 Board Games
Chess, Go, ...: learning to recognise important patterns. E.g., Go: Nakade forms, life/death problems (figure: an alive group vs. a dead Nakade shape).

182
Some recent work in Go: high accuracy predicting the "vital point" in Nakade forms; relatively high accuracy predicting good moves to attack/defend groups of stones; reduction of the branching factor in search.

183 Ecology
Environmental applications: a relatively new field, gaining importance; much to be learned, much interest in data mining. Some applications: biodegradability, water quality.

184 Biodegradability
Given some compound, predict whether it will degrade quickly in water (= building predictive models). Regression approach: predict the half-life time; classification approach: predict resistant/degradable. ILP makes predictions based on the molecular structure.

185 Water quality
The quality of river water is monitored: samples are taken regularly at different sites, and the organisms, chemicals, ... in the water are studied to see how polluted the water is. There is time-related information: yesterday's chemicals influence today's organisms. ILP techniques are used for predictive modelling.

186
Some approaches (with Tilde): predict water quality from chemical measurements taken during some interval; predict chemical properties from biological measurements, either one at a time (a different model for each chemical) or all at once (regression with 16 target variables).

187 Drug design & related applications
See the examples in the introduction: mutagenesis, pharmacophore discovery. Other examples: carcinogenesis (PTE challenge, IJCAI-97).

188
Carcinogenesis application: many chemicals have to be tested to determine whether they are carcinogenic; the tests are expensive and take long. The aim of the PTE challenge: predict which compounds are (very likely to be) carcinogenic / safe, which may speed up the testing process.

189 Natural Language Applications
Statistical approaches typically use limited context; ILP has the potential to use unbounded context. Applications: part-of-speech tagging, NP chunking, grammar / morphology learning, ... See Muggleton & Cussens, ESSLLI-2000.

190 Inductive Database Design
Given a deductive database: find patterns in the database (dependencies, constraints, ...) and use these to restructure the database, avoiding redundancy and increasing robustness, hopefully arriving at a good design (possibly better than a human-designed one?).

191 Finding intensional definitions
For each predicate p: learn a set of clauses that forms a sound and complete definition of p. The algorithm is based on Claudien: it learns one clause at a time; recursive clauses are possible; it uses an intensional coverage test (slow); it aims at maximising compactness (i.e., replacing large extensional definitions with small intensional ones).

192
For instance: define path/2 in terms of arc/2 (and, recursively, path/2 itself).
First run: path(X,Y) :- arc(X,Z), path(Z,Y) is found; valid, but does not yield compaction. path(X,Y) :- arc(X,Y) is also valid and gives the greatest compaction -> added to Hp.
Second run: path(X,Y) :- arc(X,Z), path(Z,Y) is found; valid, and completes the definition Hp.

193 Combining definitions
If intensional definitions are found for some predicates, replace the extensional definition by the intensional one. Pitfall: definitions may be incompatible.
DB: p(1). q(1). p(2). q(2). p(3). q(3).
FID output: definition for p: p(X) :- q(X).  definition for q: q(X) :- p(X).
Both together: a circular definition!

194 ILP made easy -- ESSLLI 2000, Birmingham
Case 1: no intensional definition found for p
easy: p has to be extensional; call the set of these predicates E
Case 2: intensional definition found for p...
depending only on p itself or predicates in E: call the set of such p I1
depending only on p itself or predicates in E ∪ I1: call the set of such p I2
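The case analysis above amounts to an iterative layering. A sketch in Python (the dependency data is assumed for illustration; deps maps each intensionally defined predicate to the predicates its definition refers to):

```python
def layer(deps, extensional):
    """Assign predicates to layers E, I1, I2, ... where each layer may refer
    only to itself and to predicates placed in earlier layers."""
    layers = [set(extensional)]           # layer 0 = E
    placed = set(extensional)
    remaining = set(deps)
    while remaining:
        nxt = {p for p in remaining if deps[p] <= placed | {p}}
        if not nxt:                       # the rest sit in dependency loops
            break
        layers.append(nxt)
        placed |= nxt
        remaining -= nxt
    return layers, remaining              # remaining -> candidates for SCC check

deps = {"grandparent": {"parent"},
        "sibling": {"parent"},
        "aou": {"sibling", "parent"}}     # hypothetical dependencies
print(layer(deps, {"parent"}))
```

Predicates left in `remaining` are exactly the "Case 3" predicates that need the SCC analysis of the next slide.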

195 ILP made easy -- ESSLLI 2000, Birmingham
Case 3: other predicates
these definitions may cause trouble (circular definitions) -> study them more closely
Searching for such predicates: using a graph algorithm, find strongly connected components (SCCs) of at least 2 elements in the dependency graph
SCC = "loop" in the graph (a path exists from each element of the SCC to each other element of the SCC)

196 Example dependency graph
[dependency graph over predicates t, q, s, u, v; edges not reproduced here]
Which predicates are in E, I1, I2, ...? Which form an SCC?
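The slide's edges did not survive the transcript, so the graph below is assumed for illustration. A compact SCC detector (Tarjan's algorithm) that would answer the slide's question:

```python
# SCC detection over a hypothetical dependency graph (the slide's
# edges are not recoverable, so these edges are invented).

def sccs(graph):
    """Tarjan's algorithm; returns the strongly connected components."""
    index, low, stack, on_stack = {}, {}, [], set()
    result, counter = [], [0]

    def visit(v):
        index[v] = low[v] = counter[0]; counter[0] += 1
        stack.append(v); on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:            # v is the root of an SCC
            comp = set()
            while True:
                w = stack.pop(); on_stack.discard(w); comp.add(w)
                if w == v:
                    break
            result.append(comp)

    for v in graph:
        if v not in index:
            visit(v)
    return result

graph = {"t": {"q"}, "q": {"s"}, "s": {"q", "u"}, "u": {"v"}, "v": set()}
comps = [c for c in sccs(graph) if len(c) >= 2]
print(comps)                              # q and s form the only loop
```

With these assumed edges, q and s form the only SCC of size at least 2, so their mutual definitions would need the closer study described above.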

197 Removing incompatibilities
Definitions in an SCC: are always sound but may be incomplete
"breaking" the SCC (by defining at least one predicate extensionally) may make the definitions complete
choose the predicate with a small extensional definition and a large intensional one
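The heuristic in the last bullet can be sketched as picking the SCC member whose loss of compaction is smallest; the sizes below are hypothetical (e.g. stored facts vs. literals in the intensional definition):

```python
def pick_to_break(scc, ext_size, int_size):
    """Keep extensional the predicate that sacrifices the least compaction:
    small extensional definition, large intensional one."""
    return min(scc, key=lambda p: ext_size[p] - int_size[p])

scc = {"p", "q"}
ext_size = {"p": 3, "q": 200}             # facts stored per predicate
int_size = {"p": 1, "q": 2}               # size of the intensional definition
print(pick_to_break(scc, ext_size, int_size))   # prints: p
```

Here p is the cheapest predicate to keep extensional: storing its 3 facts forfeits far less compaction than storing q's 200.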

198 ILP made easy -- ESSLLI 2000, Birmingham
IsIdd
The above techniques (and some others) are implemented in the IsIdd system (Interactive System for Inductive Database Design)
Illustrative example: "family database"
facts on family relationships: parent, grandparent, aunt/uncle, nephew/niece, ...

199 ILP made easy -- ESSLLI 2000, Birmingham
grandparent(X,Y) :- parent(X,Z), parent(Z,Y).
sibling(X,Y) :- parent(Z,X), parent(Z,Y), noteq(X,Y).
pil(X,Y) :- parent(X,Z), married(Y,Z).
gil(X,Y) :- grandparent(X,Z), married(Y,Z).
sil(X,Y) :- sibling(X,Z), married(Y,Z).
sil(X,Y) :- sil(Y,X).
sil(X,Y) :- pil(Z,X), pil(Z,Y), noteq(X,Y).
aou(X,Y) :- sibling(X,Z), parent(Z,Y).
noc(X,Y) :- aou(Z,X), parent(Z,Y).

200 ILP made easy -- ESSLLI 2000, Birmingham
Constraints found
Constraints found using Claudien
Constraints increase the robustness of the database
false :- parent(A,B), parent(B,A).
false :- married_asymm(A,B), married_asymm(B,C).
false :- parent(A,B), parent(A,C), parent(B,C).
false :- parent(A,B), parent(A,C), married_asymm(B,C).
false :- parent(A,B), parent(B,C), parent(C,A).
parent(A,B) :- parent(C,B), married_asymm(C,A).
parent(A,B) :- parent(C,B), married_asymm(A,C).

201 ILP made easy -- ESSLLI 2000, Birmingham
Beyond ILP

202 ILP made easy -- ESSLLI 2000, Birmingham
ILP for data mining Data mining = major application domain for ILP Current ILP systems require knowledge of and experience with Prolog, logic, … are not easy to use -> can only be used by highly trained people 11/21/2018 ILP made easy -- ESSLLI 2000, Birmingham

203 ILP made easy -- ESSLLI 2000, Birmingham
Current data mining systems require much less background in informatics are easier to use How to make ILP easier to use? Option 1: embed it into system (cf. IsIdd) Option 2: friendlier interface, e.g. more RDB-oriented interface 11/21/2018 ILP made easy -- ESSLLI 2000, Birmingham

204 Relational data mining
ILP can be set in a relational database context:
replace predicates with relations
replace the hypothesis language (no logic)
simplify the input for ILP systems
Difference with other systems in this context: find patterns that extend over multiple tuples / tables

205 Example: Registration Database

206 ILP made easy -- ESSLLI 2000, Birmingham
Relations can easily be transformed into predicates
What about:
representation of the hypothesis
specification of types, modes, ...
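The first point is mechanical: every tuple of a relation becomes a ground fact. A sketch of that translation (the attends relation and its rows are invented; the slide's registration database is not reproduced here):

```python
def tuples_to_facts(relation, rows):
    """Turn each tuple of a relation into a Prolog fact."""
    return ["%s(%s)." % (relation, ", ".join(str(v) for v in row))
            for row in rows]

rows = [("adams", "scml"), ("blake", "scml")]   # hypothetical tuples
for fact in tuples_to_facts("attends", rows):
    print(fact)
# attends(adams, scml).
# attends(blake, scml).
```

The remaining points (hypothesis representation, types, modes) are exactly what this mechanical step does not settle.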

207 ILP made easy -- ESSLLI 2000, Birmingham
Hypothesis languages
Users may not be familiar with logic
Most are familiar with relational databases:
SQL: rather unreadable
relational calculus: comparable with Prolog
Ideal case: natural language
translation from Prolog to English is feasible, possibly expert-assisted
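A template-based sketch of how such a Prolog-to-English translation might work for simple clauses (the templates and the clause encoding are invented for illustration; a deployed system would be expert-assisted, as the slide suggests):

```python
# Hypothetical per-predicate English templates.
templates = {"grandparent": "{0} is a grandparent of {1}",
             "parent": "{0} is a parent of {1}"}

def clause_to_english(head, body):
    """head and body literals are (predicate, argument-tuple) pairs."""
    render = lambda lit: templates[lit[0]].format(*lit[1])
    return "%s if %s" % (render(head), " and ".join(render(l) for l in body))

print(clause_to_english(("grandparent", ("X", "Y")),
                        [("parent", ("X", "Z")), ("parent", ("Z", "Y"))]))
# X is a grandparent of Y if X is a parent of Z and Z is a parent of Y
```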

208 Simplifying the inputs
Many settings are optional:
good defaults available
non-experienced users can just ignore them
Not so for language specifications!
a complicated part of the input
cannot be avoided (currently)
Need for a simple formalism for language specification

209 ILP made easy -- ESSLLI 2000, Birmingham
Use of UML
UML can be used to describe a database: foreign key relationships, types, ...
Use UML as the bias specification language
Advantages:
well known in a very broad community
graphical input specification possible
the database may already have a description in UML -> no extra work needed

210 ILP made easy -- ESSLLI 2000, Birmingham
References
Special issues of journals:
New Generation Computing 95
Machine Learning 97
Journal of Logic Programming 99
Data Mining and Knowledge Discovery 99

211 ILP made easy -- ESSLLI 2000, Birmingham
Books:
Muggleton, ed., ILP, Academic Press, 92
Lavrac and Dzeroski 94
Nienhuys-Cheng and De Wolf 96
De Raedt, ed., Advances in ILP, IOS Press 96
Proc. of ILP workshops/conferences: from 1996 onwards available as Lecture Notes in Artificial Intelligence (Springer)

