# ILP : Inductive Logic Programming


Induction

Given:
- a background theory Th (clauses),
- positive examples Pos (ground facts),
- negative examples Neg (ground facts),

find a hypothesis Hyp such that
- for every p ∈ Pos: Th ∪ Hyp ⊨ p (Hyp covers p given Th),
- for every n ∈ Neg: Th ∪ Hyp ⊭ n (Hyp does not cover n given Th).

ILP generates Hyp in the form of a logic program.

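Consistent hypothesis (covers no negative example)

A consistent hypothesis may be complete (it covers every positive example) or incomplete (it misses some positive example).

Inconsistent hypothesis (covers at least one negative example)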
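The two properties can be checked mechanically once we know which examples a hypothesis covers. The following is a minimal sketch, assuming the coverage test (Th together with Hyp entails the example) has already been run and its result is given as a plain set of example names; the function name `classify` and the example identifiers are illustrative and not part of the slides.

```python
def classify(covered, pos, neg):
    """Return (complete, consistent) for a hypothesis.

    covered: set of examples the hypothesis covers (assumed precomputed)
    pos, neg: sets of positive and negative examples
    """
    complete = pos <= covered            # every positive example is covered
    consistent = not (neg & covered)     # no negative example is covered
    return complete, consistent


pos = {"e1", "e2", "e3"}
neg = {"e4", "e5"}

print(classify({"e1", "e2", "e3"}, pos, neg))        # complete and consistent
print(classify({"e1", "e2"}, pos, neg))              # incomplete but consistent
print(classify({"e1", "e2", "e3", "e4"}, pos, neg))  # complete but inconsistent
```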

Example

Predicates (some shown by a sample atom): group(X), in_group(e1,c1), circle(Z), square(Z), triangle(t3,up).

Description of the first set:

    group(e1).
    circle(c1).
    triangle(t1,up). triangle(t2,up). triangle(t3,up).
    square(s1).
    in_group(e1,c1). in_group(e1,t1). in_group(e1,t2).
    inside(t3,c1). inside(s1,t2).

What can a candidate hypothesis look like?

    positive(X) :- group(X), in_group(X,Y1), triangle(Y1,up), in_group(X,Y2), triangle(Y2,up).
    negative(X) :- group(X), in_group(X,Y1), triangle(Y1,down).

What operations are used in the process of induction?

Generalization and specialization.

Induction: example (each step gives the example, the action taken and the resulting hypothesis)

1. +p(b,[b]): add clause, giving p(X,Y).
2. -p(x,[]): specialise, giving p(X,[V|W]).
3. -p(x,[a,b]): specialise, giving p(X,[X|W]).
4. +p(b,[a,b]): add clause, giving p(X,[X|W]). together with p(X,[V|W]) :- p(X,W).

Algorithms ILP

A generic ILP algorithm needs a description of the operations used to construct new hypotheses:
- Top-down approach: specialization (used e.g. in FOIL)
- Bottom-up approach: generalization (used e.g. in GOLEM)

Generality of clauses

Figure (clauses shown in the generality ordering for m/2): m(X,Y), m(X,X), m(X,[Y|Z]), m([X|Y],Z), m(X,Y) :- m(Y,X), m(X,[X|Z]), m(X,[Y|Z]) :- m(X,Z).

The set of (equivalence classes of) clauses is a lattice:
- C1 is more general than C2 iff for some substitution θ: C1θ ⊆ C2.
- Greatest lower bound: θ-MGS; least upper bound: θ-LGG.
- Specialisation: applying a substitution and/or adding a literal.
- Generalisation: applying an inverse substitution and/or removing a literal.

Comment: there can be infinite chains!

Specialization operators

Hypothesis F is a specialization of G iff F is a logical consequence of G: G ⊨ F (any model of G is a model of F).

A specialization operator spec assigns to a given clause the set of its specializations. There are 2 basic specialization operations:
- Processing of the variables used:
  - unification of 2 variables: spec(p(X,Y)) = p(X,X)
  - substitution by a constant: spec(num(X)) = num(0)
  - substitution by a compound term: spec(num(X)) = num(s(Y))
- Adding a literal into the body: spec(p(X,Y)) = (p(X,Y) :- edge(U,V))

Part of the specialisation graph for element/2

Figure (clauses shown in the graph): element(X,Y), element(X,[Y|Z]), element([X|Y],Z), element(X,X), element(X,Y) :- element(Y,X), element(X,[X|Z]), element(X,[Y|Z]) :- element(X,Z).

ILP generalization methods (searching the hypothesis space bottom-up)

The set of clauses is partially ordered by the relation of θ-subsumption, which characterizes generalization and specialization (refinement).

Def.: Let c, c1 be clauses. We say that c θ-subsumes c1 if there is a substitution θ such that cθ ⊆ c1.

Example:

    c  = daughter(X,Y) :- parent(Y,X).
    c1 = daughter(X,Y) :- female(X), parent(Y,X).
    c2 = daughter(mary,ann) :- female(mary), parent(ann,mary), parent(ann,tom).

Here c θ-subsumes c1 (with the empty substitution) and c2 (with θ = {X/mary, Y/ann}).

Clause c is at least as general as clause c1 iff c θ-subsumes c1. Clause c is more general than clause c1 (c1 is a specialization of c) iff c θ-subsumes c1 and c1 does not θ-subsume c.
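The subsumption test from the definition can be sketched as a small backtracking search for the substitution θ. This is an illustrative sketch only, under these assumptions (none of them come from the slides): a clause is a `(head, body)` pair, an atom is a tuple `(predicate, arg1, ...)`, variables are strings starting with an upper-case letter, and the names `is_var`, `match` and `subsumes` are hypothetical helpers. The variables of the second clause are treated as fixed symbols, so only the first clause is instantiated.

```python
def is_var(t):
    """Variables are strings starting with an upper-case letter (Prolog convention)."""
    return isinstance(t, str) and t[:1].isupper()


def match(pattern, target, theta):
    """One-way matching: extend theta so that pattern instantiated by theta
    equals target. Returns the extended substitution or None."""
    if is_var(pattern):
        if pattern in theta:
            return theta if theta[pattern] == target else None
        return {**theta, pattern: target}
    if isinstance(pattern, tuple) and isinstance(target, tuple):
        if pattern[0] != target[0] or len(pattern) != len(target):
            return None
        for p, t in zip(pattern[1:], target[1:]):
            theta = match(p, t, theta)
            if theta is None:
                return None
        return theta
    return theta if pattern == target else None


def subsumes(c, d):
    """True if clause c theta-subsumes clause d: some theta maps the head of c
    onto the head of d and every body literal of c onto a body literal of d."""
    (c_head, c_body), (d_head, d_body) = c, d
    theta = match(c_head, d_head, {})
    if theta is None:
        return False

    def search(i, theta):
        if i == len(c_body):
            return True
        for literal in d_body:
            extended = match(c_body[i], literal, theta)
            if extended is not None and search(i + 1, extended):
                return True
        return False

    return search(0, theta)


c = (("daughter", "X", "Y"), [("parent", "Y", "X")])
c1 = (("daughter", "X", "Y"), [("female", "X"), ("parent", "Y", "X")])
c2 = (("daughter", "mary", "ann"),
      [("female", "mary"), ("parent", "ann", "mary"), ("parent", "ann", "tom")])

print(subsumes(c, c1), subsumes(c, c2))   # c is at least as general as c1 and c2
print(subsumes(c1, c))                    # c1 does not subsume c, so c is more general
```

The search is exponential in the worst case, which is expected: deciding θ-subsumption is NP-complete in general.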

Usage of the operation θ-subsumes

Lemma 1: If c θ-subsumes c1, then c1 is a logical consequence of c, i.e. c ⊨ c1.

Does the reverse claim hold? No. See the example

    c  = list([V|W]) :- list(W).
    c1 = list([X,Y|Z]) :- list(Z).

Here c1 follows from c (apply c twice), yet there is no substitution θ with cθ ⊆ c1.

Lemma 2: In the partial order defined by θ-subsumption, any 2 clauses c, d have a least upper bound and a greatest lower bound (unique up to renaming of variables and θ-equivalence).

Usage? Pruning the space of hypotheses.

Notation: d < c if d θ-subsumes c, i.e. d is a generalization of c.

Application: Let e be a positive example covered by the clause c, i.e. c ⊨ e. According to Lemma 1, hypotheses should be sought among the generalizations of the examples.

θ-subsumption and the search in the space of hypotheses

- If we generalize c to d (d < c), all examples covered by c are covered by d as well. So if c covers some negative example, there is no point in generalizing c.
- If we specialize c to f (c < f), an example not covered by c is not covered by f either. So if c does not cover some positive example, c is not worth specializing further.

The search for the least general generalization (operator lgg) is a purely syntactic task.

Example:

    lgg([a,b,c], [a,c,d]) = [a,X,Y]
    lgg(f(a,a), f(b,b)) = f(lgg(a,b), lgg(a,b)) = f(V,V)

Note that repeated occurrences of lgg(a,b) are replaced by the same variable V. This does not apply to lgg(a,b) and lgg(b,a), which get different variables.

Definition of the lgg operator

lgg for terms t1, t2:
- lgg(t,t) = t
- lgg(f(s1,..,sn), f(t1,..,tn)) = f(lgg(s1,t1),.., lgg(sn,tn))
- lgg(f(s1,..,sn), g(t1,..,tm)) = V, where V is a variable and f, g are different function symbols
- lgg(s,t) = V, where V is a variable, provided that at least one of the terms s, t is a variable

lgg for atomic formulas A1, A2:
- lgg(p(s1,..,sn), p(t1,..,tn)) = p(lgg(s1,t1),.., lgg(sn,tn)) for 2 atoms with the same predicate p
- lgg(p(s1,..,sn), q(t1,..,tn)) is not defined if p and q are different symbols

lgg for literals L1, L2:
- If both L1 and L2 are positive, the task reduces to the lgg of atomic formulas.
- If both L1 and L2 are negative, i.e. L1 = not A1, L2 = not A2, then lgg(L1,L2) = not lgg(A1,A2).
- If L1 is positive and L2 negative, lgg(L1,L2) is not defined.

Example:

    lgg(parent(ann,mary), parent(ann,tom)) = parent(ann,X)
    lgg(parent(ann,mary), daughter(ann,tom)) is not defined
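The rules for terms and atoms translate almost line by line into code. The sketch below is illustrative and rests on assumptions not made in the slides: compound terms and atoms are tuples `(symbol, arg1, ...)`, constants and variables are strings, fresh variables are named `V0`, `V1`, ... (assumed not to occur in the input), and `plist` is a hypothetical helper that encodes Prolog lists as nested `('.', head, tail)` terms. The `table` argument remembers which variable was introduced for each pair of differing subterms, so a repeated pair reuses its variable.

```python
def lgg_term(s, t, table):
    """Least general generalization of two terms."""
    if s == t:
        return s
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        # same function symbol and arity: generalize argument by argument
        return (s[0],) + tuple(lgg_term(a, b, table) for a, b in zip(s[1:], t[1:]))
    # different symbols, or a variable is involved: one variable per pair (s, t)
    return table.setdefault((s, t), f"V{len(table)}")


def lgg_atom(a1, a2):
    """lgg of two atoms, or None when the predicate symbols differ."""
    if a1[0] != a2[0] or len(a1) != len(a2):
        return None
    table = {}
    return (a1[0],) + tuple(lgg_term(x, y, table) for x, y in zip(a1[1:], a2[1:]))


def plist(*items):
    """Encode a Prolog list as nested ('.', head, tail) terms ending in 'nil'."""
    term = "nil"
    for item in reversed(items):
        term = (".", item, term)
    return term


print(lgg_atom(("f", "a", "a"), ("f", "b", "b")))   # same variable used twice
print(lgg_atom(("f", "a", "b"), ("f", "b", "a")))   # two different variables
print(lgg_atom(("parent", "ann", "mary"), ("parent", "ann", "tom")))
print(lgg_atom(("parent", "ann", "mary"), ("daughter", "ann", "tom")))  # None
print(lgg_term(plist("a", "b", "c"), plist("a", "c", "d"), {}))
```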

lgg for clauses c1, c2

Suppose c1 = {L1,..,Ln} and c2 = {K1,..,Km}. Then

lgg(c1,c2) = { Fij = lgg(Li,Kj) : Li ∈ c1, Kj ∈ c2 and lgg(Li,Kj) is defined }

Example:

    c1 = daughter(mary,ann) :- female(mary), parent(ann,mary).
    c2 = daughter(eve,tom) :- female(eve), parent(tom,eve).
    lgg(c1,c2) = daughter(X,Z) :- female(X), parent(Z,X).

Generalization with respect to background knowledge, represented by a conjunction K of ground facts, is called relative generalization and uses the operator rlgg:

rlgg(A1,A2) = lgg(A1 :- K, A2 :- K)

Application in ILP: K is the set of all available facts from the task domain; the atoms A1, A2 correspond to the training examples.
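The clause-level lgg and rlgg can be sketched on top of the term-level rules. As before this is an illustrative sketch under assumptions of my own: definite clauses are `(head, body)` pairs, atoms are tuples `(predicate, arg1, ...)`, fresh variables are named `V0`, `V1`, ... and are assumed not to occur in the input. Heads are only paired with heads and body literals with body literals, because the lgg of a positive and a negative literal is not defined. One variable table is shared across the whole clause so that the same pair of terms always maps to the same variable.

```python
def lgg_term(s, t, table):
    """lgg of two terms; table maps each differing pair (s, t) to its variable."""
    if s == t:
        return s
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        return (s[0],) + tuple(lgg_term(a, b, table) for a, b in zip(s[1:], t[1:]))
    return table.setdefault((s, t), f"V{len(table)}")


def lgg_atom(a1, a2, table):
    """lgg of two atoms, or None when the predicate symbols differ."""
    if a1[0] != a2[0] or len(a1) != len(a2):
        return None
    return (a1[0],) + tuple(lgg_term(x, y, table) for x, y in zip(a1[1:], a2[1:]))


def lgg_clause(c1, c2):
    """lgg of two definite clauses given as (head, body) pairs."""
    (h1, b1), (h2, b2) = c1, c2
    table = {}                      # shared by head and body
    head = lgg_atom(h1, h2, table)
    body = []
    for l1 in b1:                   # all pairs of body literals
        for l2 in b2:
            g = lgg_atom(l1, l2, table)
            if g is not None and g not in body:
                body.append(g)
    return head, body


def rlgg(a1, a2, background):
    """Relative lgg: lgg(a1 :- K, a2 :- K) for a list K of ground facts."""
    return lgg_clause((a1, background), (a2, background))


c1 = (("daughter", "mary", "ann"), [("female", "mary"), ("parent", "ann", "mary")])
c2 = (("daughter", "eve", "tom"), [("female", "eve"), ("parent", "tom", "eve")])
print(lgg_clause(c1, c2))   # daughter(V0,V1) :- female(V0), parent(V1,V0)

K = [("parent", "ann", "mary"), ("parent", "ann", "tom"),
     ("parent", "tom", "eve"), ("parent", "tom", "ian"),
     ("female", "ann"), ("female", "mary"), ("female", "eve")]
head, body = rlgg(("daughter", "mary", "ann"), ("daughter", "eve", "tom"), K)
print(head, len(body))      # the body grows with the square of the size of K
```

With 4 parent facts and 3 female facts the rlgg body already has 16 + 9 = 25 literals, which illustrates why rlgg results tend to be long.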

Example: application of rlgg

| Training examples | Class | Background knowledge | |
|---|---|---|---|
| daughter(mary,ann). | + | parent(ann,mary). | female(ann). |
| daughter(eve,tom). | + | parent(ann,tom). | female(mary). |
| daughter(tom,ann). | - | parent(tom,eve). | female(eve). |
| daughter(eve,ann). | - | parent(tom,ian). | |

e1 = daughter(mary,ann), e2 = daughter(eve,tom)

K = parent(ann,mary) & … & parent(tom,ian) & female(ann) & … & female(eve)

With d = daughter, p = parent, f = female and names abbreviated to their first letter:

    c1 = e1 :- K = d(m,a) :- p(a,m), p(a,t), p(t,e), p(t,i), f(a), f(m), f(e).
    c2 = e2 :- K = d(e,t) :- p(a,m), p(a,t), p(t,e), p(t,i), f(a), f(m), f(e).

    rlgg(e1,e2) = lgg(c1,c2) =
        d(V_{m,e},V_{a,t}) :- p(a,m), p(a,t), p(t,e), p(t,i), f(a), f(m), f(e),
            p(a,V_{m,t}), p(V_{a,t},V_{m,e}), p(V_{a,t},V_{m,i}),
            p(a,V_{t,m}), p(V_{t,a},V_{i,m}), …,
            f(V_{a,m}), f(V_{a,e}), f(V_{m,e}), … .

where V_{m,e} is lgg(m,e).

Caution! The result of rlgg tends to be very long!

Irrelevant literals

    d(V_{m,e},V_{a,t}) :- p(a,m), p(a,t), p(t,e), p(t,i), f(a), f(m), f(e),
        p(a,V_{m,t}), p(V_{a,t},V_{m,e}), p(V_{a,t},V_{m,i}),
        p(a,V_{t,m}), p(V_{t,a},V_{i,m}), …,
        f(V_{a,m}), f(V_{a,e}), f(V_{m,e}), … .

Are there literals which make no difference for distinguishing between positive and negative examples? If so, can they be omitted? If omitting a literal does not result in covering a negative example, we consider this literal irrelevant.

After omitting the irrelevant literals:

    d(V_{m,e},V_{a,t}) :- p(V_{a,t},V_{m,e}), f(V_{m,e}).

that is,

    daughter(X,Y) :- parent(Y,X), female(X).
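The omission test can be sketched as a greedy loop: try to drop each body literal and keep the shorter clause whenever no negative example becomes covered. The sketch below is illustrative, not the procedure of any particular ILP system. It assumes function-free atoms written as tuples `(predicate, arg1, ...)`, ground background facts, variables as strings starting with an upper-case letter, and hypothetical helper names (`match`, `covers`, `drop_irrelevant`). The starting clause is a small hand-picked fragment of the long rlgg result, with `Vme`, `Vat`, `Vam` standing for the variables of the pairs (m,e), (a,t), (a,m).

```python
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()


def match(pattern, fact, theta):
    """Match a function-free atom against a ground fact, extending theta."""
    if pattern[0] != fact[0] or len(pattern) != len(fact):
        return None
    theta = dict(theta)
    for p, f in zip(pattern[1:], fact[1:]):
        if is_var(p):
            if theta.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return theta


def covers(clause, example, facts):
    """True if some substitution maps the head onto the example and
    every body literal onto a background fact."""
    head, body = clause
    theta = match(head, example, {})
    if theta is None:
        return False

    def prove(i, theta):
        if i == len(body):
            return True
        for fact in facts:
            extended = match(body[i], fact, theta)
            if extended is not None and prove(i + 1, extended):
                return True
        return False

    return prove(0, theta)


def drop_irrelevant(clause, negatives, facts):
    """Greedily remove body literals whose removal covers no negative example."""
    head, body = clause[0], list(clause[1])
    i = 0
    while i < len(body):
        reduced = body[:i] + body[i + 1:]
        if not any(covers((head, reduced), n, facts) for n in negatives):
            body = reduced          # literal i is irrelevant, drop it
        else:
            i += 1                  # literal i is needed, keep it
    return head, body


K = [("parent", "ann", "mary"), ("parent", "ann", "tom"),
     ("parent", "tom", "eve"), ("parent", "tom", "ian"),
     ("female", "ann"), ("female", "mary"), ("female", "eve")]
negatives = [("daughter", "tom", "ann"), ("daughter", "eve", "ann")]
clause = (("daughter", "Vme", "Vat"),
          [("parent", "ann", "mary"), ("female", "ann"),
           ("parent", "Vat", "Vme"), ("female", "Vme"), ("female", "Vam")])

print(drop_irrelevant(clause, negatives, K))
```

Removing literals only generalizes the clause, so positive examples that were covered stay covered; only the negative examples need to be rechecked. The result of the greedy pass can depend on the order in which literals are tried.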

Generic ILP algorithm using a set R of rules for modification of hypotheses

Input: B (background knowledge), E+ and E- (the sets of positive and negative examples)

    QH := initialize(B; E+, E-) ;      /* suggestion of the starting hypothesis */
    while not (end_criterion(QH)) do
        choose a hypothesis H from QH ;
        choose modification rules r1,…,rk from R ;
        by applying r1,…,rk to H create the new hypothesis H1 fitting E+ and E- best ;
        QH := (QH - H) + H1 ;
        cancel some members of QH ;    /* pruning */
        filter the sets of examples E+ and E-
    choose a hypothesis P from QH

When is ILP useful?

ILP is a good choice whenever
- relations among the considered objects have to be taken into account,
- the training data have no uniform structure (some objects are described extensively, others are mentioned in only a few facts),
- there is extensive background knowledge which should be used for the construction of hypotheses.

Some domains with successful industrial or research ILP applications:
- Bioinformatics, medicine, ecology
- Technical applications (finite element mesh design, ..)
- Natural language processing

Structure Activity Relationships (SAR)

Task: given the chemical structure of a compound and empirical data about its toxicity / mutagenicity / therapeutic effect, what is the cause of the observed behaviour?

Figure (positive and negative example compounds). Result: a structural indicator.

Bioinformatics - structural description of organic compounds

- Primary structure = sequence of amino acids. Is it possible to predict the secondary structure (folds in space) from information about the primary structure?
- Support for the interpretation of NMR (nuclear magnetic resonance) spectra: classification into 23 structural types is required. Classical ML methods reach 80% accuracy, ILP 90%, which corresponds to the results of a domain expert.

Bioinformatics - carcinogenicity

230 aromatic and heteroaromatic nitro compounds: 188 compounds are well classifiable by attribute methods; the remaining 42 compounds are highly regression-unfriendly (denoted as the RU group).

The advantages of the relational representation have been demonstrated on the RU group: the hypothesis suggested by the ILP system PROGOL achieved 88% accuracy, while the classical attribute ML methods reached about 20% less.

East-West Trains

Other systems: http://www-ai.ijs.si/~ilpnet2/systems/
- Aleph (descendant of P-Progol), Oxford University
- Tilde + WARMR = ACE (Blockeel, De Raedt 1998)
- FOIL (Quinlan 1993)
- GOLEM: designs a hypothesis by a method which combines several rlgg steps with the omission of irrelevant literals
- MIS (Shapiro 1981), Markus (Grobelnik 1992), WiM (1994)
- RSD (Železný 2002): search for interesting subgroups
