ILP : Inductive Logic Programming. Given a background theory Th (clauses) positive examples Pos (ground facts) negative examples Neg (ground facts) Find.


1 ILP : Inductive Logic Programming

2 Induction
Given
- a background theory Th (clauses),
- positive examples Pos (ground facts),
- negative examples Neg (ground facts),
find a hypothesis Hyp in the form of a logic program such that
- for every p ∈ Pos: Th ∪ Hyp ⊨ p (Hyp covers p given Th),
- for every n ∈ Neg: Th ∪ Hyp ⊭ n (Hyp does not cover n given Th).

3 Consistent hypothesis
[figure: a complete and an incomplete consistent hypothesis]

4 Inconsistent hypothesis

5 Example
Predicates:
- group(X), in_group(e1,c1).
- circle(Z), square(Z),
- triangle(t3,up).
Description of the first set:
- group(e1).
- circle(c1). triangle(t1,up). triangle(t2,up).
- triangle(t3,up). square(s1).
- in_group(e1,c1). in_group(e1,t1). in_group(e1,t2).
- inside(t3,c1). inside(s1,t2).
What can a candidate hypothesis look like?
- positive(X) :- group(X), in_group(X,Y1), triangle(Y1,up), in_group(X,Y2), triangle(Y2,up).
- negative(X) :- group(X), in_group(X,Y1), triangle(Y1,down).
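The two candidate clauses can be checked against the facts of the first set. The following is a minimal sketch in Python, not part of the original slides: the tuple encoding of facts and the names FACTS, CONSTANTS, positive and negative are assumptions made for illustration. As in Prolog, Y1 and Y2 are allowed to bind to the same object.

```python
from itertools import product

# Ground facts from the slide, stored as (predicate, arg1, ...) tuples.
FACTS = {
    ("group", "e1"),
    ("circle", "c1"), ("square", "s1"),
    ("triangle", "t1", "up"), ("triangle", "t2", "up"), ("triangle", "t3", "up"),
    ("in_group", "e1", "c1"), ("in_group", "e1", "t1"), ("in_group", "e1", "t2"),
    ("inside", "t3", "c1"), ("inside", "s1", "t2"),
}
# Every constant that occurs in some fact (candidate bindings for Y1, Y2).
CONSTANTS = sorted({arg for fact in FACTS for arg in fact[1:]})

def positive(x):
    """positive(X) :- group(X), in_group(X,Y1), triangle(Y1,up),
                      in_group(X,Y2), triangle(Y2,up)."""
    return ("group", x) in FACTS and any(
        ("in_group", x, y1) in FACTS and ("triangle", y1, "up") in FACTS
        and ("in_group", x, y2) in FACTS and ("triangle", y2, "up") in FACTS
        for y1, y2 in product(CONSTANTS, repeat=2)
    )

def negative(x):
    """negative(X) :- group(X), in_group(X,Y1), triangle(Y1,down)."""
    return ("group", x) in FACTS and any(
        ("in_group", x, y1) in FACTS and ("triangle", y1, "down") in FACTS
        for y1 in CONSTANTS
    )

print(positive("e1"), negative("e1"))
```

With these facts the group e1 satisfies the body of positive/1 (through t1 and t2) and fails the body of negative/1, because no triangle in the group points down.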

6 Induction: example
example        action        hypothesis
+p(b,[b])      add clause    p(X,Y).
-p(x,[])       specialise    p(X,[V|W]).
-p(x,[a,b])    specialise    p(X,[X|W]).
+p(b,[a,b])    add clause    p(X,[X|W]). p(X,[V|W]) :- p(X,W).
What operations are used in the process of induction? Generalization and specialization.
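The four steps of the table can be replayed by testing each intermediate hypothesis against the examples seen so far. This is only an illustrative sketch: each hypothesis is hand-translated into a Python function (h1 to h4 and errors are hypothetical names), so no general proof procedure is involved.

```python
# Each example is (x, lst, label); label True marks a positive example.
EXAMPLES = [("b", ["b"], True), ("x", [], False),
            ("x", ["a", "b"], False), ("b", ["a", "b"], True)]

def h1(x, lst):
    """p(X,Y)."""
    return True

def h2(x, lst):
    """p(X,[V|W])."""
    return len(lst) >= 1

def h3(x, lst):
    """p(X,[X|W])."""
    return len(lst) >= 1 and lst[0] == x

def h4(x, lst):
    """p(X,[X|W]).  p(X,[V|W]) :- p(X,W)."""
    return len(lst) >= 1 and (lst[0] == x or h4(x, lst[1:]))

def errors(hyp, seen):
    """Return the examples in `seen` that `hyp` classifies wrongly."""
    return [(x, lst, label) for x, lst, label in seen if hyp(x, lst) != label]

print(errors(h1, EXAMPLES[:2]))  # a covered negative: specialise
print(errors(h2, EXAMPLES[:3]))  # a covered negative: specialise
print(errors(h3, EXAMPLES))      # an uncovered positive: add a clause
print(errors(h4, EXAMPLES))      # no errors left
```

A covered negative example calls for specialisation, an uncovered positive example for generalisation (here, adding a clause).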

7 ILP algorithms
A generic ILP algorithm needs a description of the operations used to construct new hypotheses.
- Top-down approach: specialization (used e.g. in FOIL)
- Bottom-up approach: generalization (used e.g. in GOLEM)

8 Generality of clauses
[figure: part of the lattice of clauses for m/2 with the nodes m(X,Y), m(X,[Y|Z]), m([X|Y],Z), m(X,Y) :- m(Y,X), m(X,[X|Z]), m(X,X), m(X,[Y|Z]) :- m(X,Z)]
The set of (equivalence classes of) clauses is a lattice:
- C1 is more general than C2 iff for some substitution θ: C1θ ⊆ C2
- greatest lower bound: θ-MGS, least upper bound: θ-LGG
- Specialisation: applying a substitution and/or adding a literal
- Generalisation: applying an inverse substitution and/or removing a literal
Comment: There can be infinite chains!

9 Specialization operators
Hypothesis F is a specialization of G iff F is a logical consequence of G: G |= F (any model of G is a model of F).
A specialization operator spec assigns to a given clause the set of its specializations.
Two basic specialization operations:
- Processing of the variables used:
  - unification of two variables: spec(p(X,Y)) = p(X,X)
  - substitution by a constant: spec(num(X)) = num(0)
  - substitution by a compound term: spec(num(X)) = num(s(Y))
- Adding a literal to the body: spec(p(X,Y)) = (p(X,Y) :- edge(U,V))

10 Part of the specialisation graph for element/2
[figure: graph rooted at element(X,Y); the other nodes shown are element(X,[Y|Z]), element([X|Y],Z), element(X,X), element(X,Y) :- element(Y,X), element(X,[X|Z]), element(X,[Y|Z]) :- element(X,Z)]

11 ILP generalization methods (searching the hypothesis space bottom-up)
The set of clauses is partially ordered by the relation of subsumption, which characterizes generalization and specialization (refinement).
Def.: Let c, c1 be clauses. We say that c θ-subsumes c1 if there is a substitution θ such that cθ ⊆ c1.
Example:
c = daughter(X,Y) :- parent(Y,X).
c1 = daughter(X,Y) :- female(X), parent(Y,X).
c2 = daughter(mary,ann) :- female(mary), parent(ann,mary), parent(ann,tom).
Here c θ-subsumes c1 (with the empty substitution) and c2 (with θ = {X/mary, Y/ann}).
Clause c is at least as general as clause c1 iff c θ-subsumes c1.
Clause c is more general than clause c1 (c1 is a specialization of c) iff c θ-subsumes c1 and c1 does not θ-subsume c.
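The definition can be tested mechanically by a backtracking search for θ. The sketch below is an illustration under simplifying assumptions that are not in the slides: clauses are lists of (name, args) literals, body literals carry a "~" prefix, terms are function-free strings, and the variables of the second clause are treated as constants. The names is_var, match_literal and theta_subsumes are hypothetical.

```python
def is_var(term):
    """Prolog convention: variables start with an upper-case letter."""
    return term[0].isupper()

def match_literal(lit, target, theta):
    """Extend theta so that lit under theta equals target, or return None."""
    if lit[0] != target[0] or len(lit[1]) != len(target[1]):
        return None
    theta = dict(theta)  # work on a copy so that backtracking stays safe
    for s, t in zip(lit[1], target[1]):
        if is_var(s):
            if theta.setdefault(s, t) != t:
                return None  # s is already bound to a different term
        elif s != t:
            return None      # two different constants
    return theta

def theta_subsumes(c, d, theta=None):
    """True iff some substitution theta maps every literal of c into d."""
    theta = {} if theta is None else theta
    if not c:
        return True
    first, rest = c[0], c[1:]
    for target in d:
        extended = match_literal(first, target, theta)
        if extended is not None and theta_subsumes(rest, d, extended):
            return True
    return False

# The three clauses from the slide; "~" marks a body (negative) literal.
c = [("daughter", ("X", "Y")), ("~parent", ("Y", "X"))]
c1 = [("daughter", ("X", "Y")), ("~female", ("X",)), ("~parent", ("Y", "X"))]
c2 = [("daughter", ("mary", "ann")), ("~female", ("mary",)),
      ("~parent", ("ann", "mary")), ("~parent", ("ann", "tom"))]

print(theta_subsumes(c, c1), theta_subsumes(c1, c2), theta_subsumes(c1, c))
```

On the slide's clauses the check confirms that c is more general than c1, and c1 more general than c2, while neither relation holds in the opposite direction.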

12 Usage of θ-subsumption
Lemma 1: If c θ-subsumes c1, then c1 is a consequence of c, i.e. c |- c1.
Does the converse hold? No! See the example
c = list([V|W]) :- list(W).
c1 = list([X,Y|Z]) :- list(Z).
Here c1 follows from c (resolve c with itself), but c does not θ-subsume c1.
Lemma 2: Under the partial order defined by θ-subsumption, any two clauses c, d have a least upper bound and a greatest lower bound (each unique up to renaming of variables and θ-equivalence).
Usage? Pruning the hypothesis space.
Notation: d < c if d θ-subsumes c, i.e. d is a generalization of c.
Application: Let e be a positive example covered by the clause c, i.e. c |- e. According to Lemma 1, our hypotheses should be generalizations of the examples.

13 θ-subsumption and search in the hypothesis space
- If we generalize c to d (d < c), all examples covered by c will be covered by d as well. So if c covers some negative example, there is no point in generalizing c.
- If we specialize c to f (c < f), an example not covered by c will not be covered by f either. So if c does not cover some positive example, c is not worth specializing further.
The search for the least general generalization (the operator lgg) is a purely syntactic task.
Example:
lgg([a,b,c], [a,c,d]) = [a,X,Y].
lgg(f(a,a), f(b,b)) = f(lgg(a,b), lgg(a,b)) = f(V,V).
Note that repeated occurrences of lgg(a,b) yield the same variable V. This does not apply to lgg(a,b) and lgg(b,a), which yield two different variables.

14 Definition of the lgg operator
lgg for terms t1, t2:
- lgg(t,t) = t
- lgg(f(s1,..,sn), f(t1,..,tn)) = f(lgg(s1,t1),.., lgg(sn,tn))
- lgg(f(s1,..,sn), g(t1,..,tm)) = V, where V is a variable and f, g are different function symbols
- lgg(s,t) = V, where V is a variable, provided that at least one of the terms s, t is a variable
lgg for atomic formulas, lgg(A1,A2):
- lgg(p(s1,..,sn), p(t1,..,tn)) = p(lgg(s1,t1),.., lgg(sn,tn)) for two atoms with the same predicate p
- lgg(p(s1,..,sn), q(t1,..,tn)) is not defined if p and q are different symbols
lgg for literals, lgg(L1,L2):
- If both L1 and L2 are positive, the task reduces to the lgg of atomic formulas.
- If both L1 and L2 are negative, i.e. L1 = not A1, L2 = not A2, then lgg(L1,L2) = not lgg(A1,A2).
- If L1 is positive and L2 negative, lgg(L1,L2) is not defined.
Example:
lgg(parent(ann,mary), parent(ann,tom)) = parent(ann,X).
lgg(parent(ann,mary), daughter(ann,tom)) is not defined.
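The rules for terms and atoms translate almost directly into a recursive function. The following Python sketch is an illustration, not the slides' implementation; it assumes that constants and variables are strings, that a compound term or atom f(a,b) is the tuple ("f", "a", "b"), and that the inputs do not already use variable names of the form V1, V2, …. The names lgg_term, lgg_atom and plist are hypothetical.

```python
def lgg_term(s, t, table):
    """Least general generalization of two terms.

    `table` maps each pair of differing subterms to one shared variable,
    so a repeated pair such as (a, b) reuses the same variable.
    """
    if s == t:
        return s
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        # Same function symbol and arity: generalize argument by argument.
        return (s[0],) + tuple(lgg_term(a, b, table)
                               for a, b in zip(s[1:], t[1:]))
    # Different symbols, or at least one variable: introduce a variable.
    if (s, t) not in table:
        table[(s, t)] = f"V{len(table) + 1}"
    return table[(s, t)]

def lgg_atom(a, b, table=None):
    """lgg of two atoms; None when the predicates (or arities) differ."""
    table = {} if table is None else table
    if a[0] != b[0] or len(a) != len(b):
        return None
    return (a[0],) + tuple(lgg_term(x, y, table) for x, y in zip(a[1:], b[1:]))

def plist(*items):
    """Build a Prolog-style list from nested '.'/2 cells ending in '[]'."""
    result = "[]"
    for item in reversed(items):
        result = (".", item, result)
    return result

print(lgg_atom(("parent", "ann", "mary"), ("parent", "ann", "tom")))
print(lgg_term(("f", "a", "a"), ("f", "b", "b"), {}))
print(lgg_term(("f", "a", "b"), ("f", "b", "a"), {}))
```

The shared table is what makes lgg(f(a,a), f(b,b)) come out as f(V,V) rather than f(V,W): the pair (a,b) is looked up instead of generalized afresh, whereas (a,b) and (b,a) are distinct keys.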

15 lgg for clauses c1, c2
Suppose c1 = {L1,..,Ln} and c2 = {K1,..,Km}. Then
lgg(c1,c2) = { Fij = lgg(Li,Kj) : Li ∈ c1, Kj ∈ c2 and lgg(Li,Kj) is defined }
Example:
c1 = daughter(mary,ann) :- female(mary), parent(ann,mary).
c2 = daughter(eve,tom) :- female(eve), parent(tom,eve).
lgg(c1,c2) = daughter(X,Z) :- female(X), parent(Z,X).
Generalization with respect to background knowledge, represented by a conjunction K of ground facts: relative generalization by the operator rlgg,
rlgg(A1,A2) = lgg(A1 :- K, A2 :- K)
Application in ILP: K is the set of all available facts from the task domain; the atoms A1, A2 correspond to training examples.
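The clause-level lgg pairs every literal of c1 with every compatible literal of c2. Below is a minimal Python sketch under assumed conventions that are not in the slides: a clause is a list of (sign, atom) pairs with "+" for the head and "-" for body literals, an atom p(a,b) is the tuple ("p", "a", "b"), and generated variables are named V1, V2, …. The names lgg_term and lgg_clause are hypothetical.

```python
def lgg_term(s, t, table):
    """lgg of two terms; `table` maps pairs of differing terms to variables."""
    if s == t:
        return s
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        return (s[0],) + tuple(lgg_term(a, b, table)
                               for a, b in zip(s[1:], t[1:]))
    if (s, t) not in table:
        table[(s, t)] = f"V{len(table) + 1}"
    return table[(s, t)]

def lgg_clause(c1, c2):
    """lgg of two clauses given as lists of (sign, atom) literals.

    One table is shared by all literal pairs, so the same pair of terms
    is replaced by the same variable everywhere in the result.
    """
    table, result = {}, []
    for sign1, a in c1:
        for sign2, b in c2:
            # lgg of two literals is defined only for equal sign and predicate.
            if sign1 == sign2 and a[0] == b[0] and len(a) == len(b):
                atom = (a[0],) + tuple(
                    lgg_term(x, y, table) for x, y in zip(a[1:], b[1:]))
                if (sign1, atom) not in result:
                    result.append((sign1, atom))
    return result

c1 = [("+", ("daughter", "mary", "ann")),
      ("-", ("female", "mary")), ("-", ("parent", "ann", "mary"))]
c2 = [("+", ("daughter", "eve", "tom")),
      ("-", ("female", "eve")), ("-", ("parent", "tom", "eve"))]
print(lgg_clause(c1, c2))
```

For the slide's example the result is daughter(V1,V2) :- female(V1), parent(V2,V1), which is the clause daughter(X,Z) :- female(X), parent(Z,X) up to renaming of variables. Applying the same function to A1 :- K and A2 :- K gives rlgg.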

16 Example: application of rlgg
Training examples      Class   Background knowledge
daughter(mary,ann).    +       parent(ann,mary).   female(ann).
daughter(eve,tom).     +       parent(ann,tom).    female(mary).
daughter(tom,ann).     -       parent(tom,eve).    female(eve).
daughter(eve,ann).     -       parent(tom,ian).
e1 = daughter(mary,ann), e2 = daughter(eve,tom)
K = parent(ann,mary) & … & parent(tom,ian) & female(ann) & … & female(eve).
c1 = e1 :- K = d(m,a) :- p(a,m),p(a,t),p(t,e),p(t,i),f(a),f(m),f(e).
c2 = e2 :- K = d(e,t) :- p(a,m),p(a,t),p(t,e),p(t,i),f(a),f(m),f(e).
rlgg(e1,e2) = lgg(c1,c2) = d(V_{m,e},V_{a,t}) :- p(a,m),p(a,t),p(t,e),p(t,i),f(a),f(m),f(e), p(a,V_{m,t}), p(V_{a,t},V_{m,e}), p(V_{a,t},V_{m,i}), p(V_{a,t},V_{m,i}), p(a,V_{t,m}), p(V_{t,a},V_{i,m}), …, f(V_{a,m}), f(V_{a,e}), f(V_{m,e}), …,
where V_{m,e} is lgg(m,e).
Caution! The result of rlgg tends to be very long!

17 Irrelevant literals
d(V_{m,e},V_{a,t}) :- p(a,m),p(a,t),p(t,e),p(t,i),f(a),f(m),f(e), p(a,V_{m,t}), p(V_{a,t},V_{m,e}), p(V_{a,t},V_{m,i}), p(V_{a,t},V_{m,i}), p(a,V_{t,m}), p(V_{t,a},V_{i,m}), …, f(V_{a,m}), f(V_{a,e}), f(V_{m,e}), ….
Are there literals that make no difference for distinguishing between positive and negative examples? If so, can they be omitted?
If omitting a literal does not result in covering a negative example, we consider the literal irrelevant.
d(V_{m,e},V_{a,t}) :- p(V_{a,t},V_{m,e}), f(V_{m,e}).
daughter(X,Y) :- parent(Y,X), female(X).
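The omission test can be sketched as a greedy loop over the body literals. The code below is an illustration under stated assumptions, not GOLEM's actual procedure: coverage is decided by brute-force grounding over the constants of the background knowledge, the starting body is a shortened, hypothetical stand-in for the full rlgg (two ground literals plus the two literals that survive), and the names covers and drop_irrelevant are invented here.

```python
from itertools import product

# Background knowledge and negative examples from the daughter example.
FACTS = {("parent", "ann", "mary"), ("parent", "ann", "tom"),
         ("parent", "tom", "eve"), ("parent", "tom", "ian"),
         ("female", "ann"), ("female", "mary"), ("female", "eve")}
NEGATIVES = [("daughter", "tom", "ann"), ("daughter", "eve", "ann")]
CONSTANTS = sorted({arg for fact in FACTS for arg in fact[1:]})

def is_var(term):
    """Prolog convention: variables start with an upper-case letter."""
    return term[0].isupper()

def substitute(atom, theta):
    """Apply the substitution theta to the arguments of an atom."""
    return (atom[0],) + tuple(theta.get(a, a) for a in atom[1:])

def covers(head, body, example):
    """True iff some grounding makes the head equal to the example
    and every body literal a known fact."""
    variables = sorted({a for atom in [head] + body
                        for a in atom[1:] if is_var(a)})
    for values in product(CONSTANTS, repeat=len(variables)):
        theta = dict(zip(variables, values))
        if substitute(head, theta) == example and all(
                substitute(atom, theta) in FACTS for atom in body):
            return True
    return False

def drop_irrelevant(head, body):
    """Greedily drop body literals whose removal covers no negative example."""
    kept = list(body)
    for literal in list(body):
        trial = [other for other in kept if other != literal]
        if not any(covers(head, trial, neg) for neg in NEGATIVES):
            kept = trial
    return kept

head = ("daughter", "X", "Y")
# Hypothetical shortened body; the real rlgg contains many more literals.
body = [("parent", "ann", "mary"), ("female", "eve"),
        ("parent", "Y", "X"), ("female", "X")]
print(drop_irrelevant(head, body))
```

The two ground literals are dropped because the clause still excludes both negative examples without them. Removing parent(Y,X) would cover daughter(eve,ann) and removing female(X) would cover daughter(tom,ann), so both are kept. The result of a greedy pass can depend on the order in which literals are tried.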

18 Generic ILP algorithm using a set R of rules for modifying hypotheses
Input: B background knowledge, E+ (E-) the set of positive (negative) examples
QH := initialize(B; E+, E-) ; /* suggestion of the starting hypotheses */
while not (end_criterion(QH)) do
  choose a hypothesis H from QH ;
  choose_modification_rules r1,…,rk from R ;
  applying r1,…,rk to H, create the new hypothesis H1 fitting E+ and E- best ;
  QH := (QH - H) + H1 ;
  cancel_some_members of QH ; /* pruning */
  filter the set of examples E+ and E-
Choose_hypothesis P from QH

19 When is ILP useful?
ILP is a good choice whenever
- relations among the considered objects have to be taken into account
- the training data have no uniform structure (some objects are described extensively, others are mentioned in only a few facts)
- there is extensive background knowledge which should be used for the construction of hypotheses
Some domains with successful industrial or research ILP applications:
- Bioinformatics, medicine, ecology
- Technical applications (finite element mesh design, ..)
- Natural language processing

20 Bioinformatics: SAR tasks
Structure Activity Relationships (SAR) task: given
- the chemical structure of a compound
- empirical data about its toxicity / mutagenicity / therapeutic effect,
what is the cause of the observed behaviour?
Result: a structural indicator
[figure: positive and negative examples]

21 Bioinformatics - structural description of organic compounds
Primary structure = sequence of amino acids. Is it possible to predict the secondary structure (folds in space) from information about the primary structure?
Support for the interpretation of NMR (nuclear magnetic resonance) spectra: classification into 23 structural types is required. Classical ML methods reach 80% accuracy, ILP 90%, which corresponds to the results of a domain expert.

22 Bioinformatics - carcinogenicity
230 aromatic and heteroaromatic nitro compounds: 188 compounds are well classifiable by attribute methods, plus the remaining 42 compounds, which are highly regression-unfriendly (denoted as the RU group).
The advantages of relational representation have been demonstrated on the RU group: the hypothesis suggested by the ILP system PROGOL achieved 88% accuracy, while classical attribute ML methods reached about 20% less.

23 East-West Trains

24 Systems
- Aleph (descendant of P-Progol), Oxford University
- Tilde + WARMR = ACE (Blockeel, De Raedt 1998)
- FOIL (Quinlan 1993)
- GOLEM: designs a hypothesis by a method which combines several rlgg steps with the omission of irrelevant literals
- MIS (Shapiro 1981), Markus (Grobelnik 1992), WiM (1994)
- RSD (Železný 2002): search for interesting subgroups
Other systems:
