Inductive Logic Programming Representation scheme used – Logic Programs Need to – Recap logic programs – Specify the learning problem – Specify the operators – Worry about search considerations Also – Go through a session with Progol – Look at applications
Remember Logic Programs? Subset of first order logic All sentences are Horn clauses – Implications where a conjunction of literals (body) Imply a single goal literal (head) – Single facts can also be Horn clauses With no body A logic program consists of: – A set of Horn clauses ILP theory and practice is highly formal – Best way to progress and to show progress
Horn Clauses and Entailment Writing Horn Clauses: – h(X,Y) b 1 (X,Y) b 2 (X)... b n (X,Y,Z) Also replace conjunctions with a capital letter – h(X,Y) b 1, B – Assume lower case letters are single literals Entailment: – When one logic program, L 1 can be proved using another logic program L 2 We write: L 2 L 1 – Note that if L 2 L 1 This does not mean that L 2 entails that L 1 is false
Logic Programs in ILP Start with background information, – As a logic program labelled B Also start with a set of positive examples of the concept required to learn – Represented as a logic program labelled E + And a set of negative examples of the concept required to learn – Represented as a logic program labelled E - ILP system will learn a hypothesis – Which is also a logic program, labelled H
Explaining Examples A Hypothesis H explains example e – If logic program e is entailed by H – So, we prove e is true Example – H: class(A, fish) :- has_gills(A) – B: has_gills(trout) – Positive example: class(trout, fish) Entailed by H B taken together Note that negative examples can also be entailed – By the hypothesis and background taken together
Prior Conditions on the Problem Problem must be satisfiable: – Prior satisfiability: e E - (B e) – So, the background does not entail any negative example (if it did, no hypothesis could rectify this) – This does not mean that B entails that e is false Problem must not already be solved: – Prior necessity: e E + (B e) – If all the positive examples were entailed by the background, then we could take H = B.
Posterior Conditions on Hypothesis Taken with B, H should entail all positives – Posterior sufficiency: e E + (B H e) Taken with B, H should entail no negatives – Posterior satisfiability: e E - (B H e) If the hypothesis meets these two conditions – It will have perfectly solved the problem Summary: – All positives can be derived from B H – But no negatives can be derived from B H
Problem Specification Given logic programs E +, E -, B – Which meet the prior satisfiability and necessity conditions Learn a logic program H – Such that B H meet the posterior satisfiabilty and sufficiency conditions
Moving in Logic Program Space Can use rules of inference to find new LPs Deductive rules of inference – Modus ponens, resolution, etc. – Map from the general to the specific i.e., from L 1 to L 2 such that L 1 L 2 Look today at inductive rules of inference – Will invert the resolution rule Four ways to do this – Map from the specific to the general i.e., from L 1 to L 2 such that L 2 L 1 – Inductive inference rules are not sound
Inverting Deductive Rules Man alternates 2 hats every day – Whenever he wears hat X, he gets a pain, hat Y is OK Knows that a hat having a pin in causes pain – Infers that his hat has a pin in it Looks and finds the hat X does have a pin in it Uses Modus Ponens to prove that – His pain is caused by a pin in hat X Original inference (pin in hat X) was unsound – Could be many reasons for the pain in his head – Was induced so that Modus Ponens could be used
Inverting Resolution 1. Absorption rule of inference Rule written same as for deductive rules – Input above the line, and the inference below line Remember that q is a single literal – And that A, B are conjunctions of literals Can prove that the original clauses – Follow from the hypothesised clause by resolution
Proving Given clauses Exercise: translate into CNF – And convince yourselves Use the v diagram, – because we dont want to write as a rule of deduction Say that Absorption is a V-operator
Example of Absorption
Inverting Resolution 2. Identification Rule of inference: Resolution Proof:
Inverting Resolution 3. Intra Construction Rule of inference: Resolution Proof:
Predicate Invention Say that Intra-construction is a W-operator This has introduced the new symbol q q is a predicate which is resolved away – In the resolution proof ILP systems using intra-construction – Perform predicate invention Toy example: – When learning the insertion sort algorithm – ILP system (Progol) invents concept of list insertion
Inverting Resolution 4. Inter Construction Rule of inference: Resolution Proof: Predicate Invention Again
Generic Search Strategy Assume this kind of search: – A set of current hypothesis, QH, is maintained – At each search step, a hypothesis H is chosen from QH – H is expanded using inference rules Which adds more current hypotheses to QH – Search stops when a termination condition is met by a hypothesis Some (of many) questions: – Initialisation, choice of H, termination, how to expand…
Search (Extra Logical) Considerations Generality and Speciality There is a great deal of variation in – Search strategies between ILP programs Definition of generality/speciality – A hypothesis G is more general than hypothesis S iff G S. S is said to be more specific than G – A deductive rule of inference maps a conjunction of clauses G onto a conjunction of clauses S, such that G S. These are specialisation rules (Modus Ponens, resolution…) – An inductive rule of inference maps a conjunction of clauses S onto a conjunction of clauses G, such that G S. These are generalisation rules (absorption, identification…)
Search Direction ILP systems differ in their overall search strategy From Specific to General – Start with most specific hypothesis Which explain a small number (possibly 1) of positives – Keep generalising to explain more positive examples Using generalisation rules (inductive) such as inverse resolution – Are careful not to allow any negatives to be explained From General to Specific – Start with empty clause as hypothesis Which explains everything – Keep specialising to exclude more and more negative examples Using specialisation rules (deductive) such as resolution – Are careful to make sure all positives are still explained
Pruning Remember that: – A set of current hypothesis, QH, is maintained – And each hypothesis explains a set of pos/neg exs. If G is more general than S – Then G will explain more (>=) examples than S When searching from specific to general – Can prune any hypothesis which explains a negative Because further generalisation will not rectify this situation When searching from general to specific – Can prune any hypothesis which doesnt explain all positives Because further specialisation will not rectify this situation
Ordering There will be many current hypothesis in QH to choose from. – Which is chosen first? ILP systems use a probability distribution – Which assigns a value P(H | B E) to each H A Bayesian measure is defined, based on – The number of positive/negative examples explained – When this is equal, ILP systems use A sophisticated Occams Razor Defined by Algorithmic Complexity theory or something similar
Language Restrictions Another way to reduce the search – Specify what format clauses in hypotheses are allowed to have One possibility – Restrict the number of existential variables allowed Another possibility – Be explicit about the nature of arguments in literals – Which arguments in body literals are Instantiated (ground) terms Variables given in the head literal New variables – See Progols mode declarations
Example Session with Progol Animals dataset – Learning task: learn rules which classify animals into fish, mammal, reptile, bird – Rules based on attributes of the animals Physical attributes: number of legs, covering (fur, feathers, etc.) Other attributes: produce milk, lay eggs, etc. 16 animals are supplied 7 attributes are supplied
Input file: mode declarations Mode declarations given at the top of the file – These are language restrictions Declaration about the head of hypothesis clauses :- modeh(1,class(+animal,#class)) – Means hypothesis will be given an animal variable and will return a ground instantiation of class Declaration about the body clauses :- modeb(1,has_legs(+animal,#nat)) – Means that it is OK to use has_legs predicate in body And that it will take the variable animal supplied in the head and return an instantiated natural number
Input file: type information Next comes information about types of object – Each ground variable (word) must be typed animal(dog), animal(dolphin), … etc. class(mammal), class(fish), …etc. covering(hair), covering(none), … etc. habitat(land), habitat(air), … etc.
Input file: background concepts Next comes the logic program B, containing these predicates: – has_covering/2, has_legs/2, has_milk/1, – homeothermic/1, habitat/2, has_eggs/1, has_gills/1 E.g., – has_covering(dog, hair), has_milk(platypus), – has_legs(penguin, 2), homeothermic(dog), – habitat(eagle, air), habitat(eagle, land), – has_eggs(eagle), has_gills(trout), etc.
Input file: Examples Finally, E + and E - are supplied Positives: class(lizard, reptile) class(trout, fish) class(bat, mammal), etc. Negatives: :- class(trout, mammal) :- class(herring, mammal) :- class(platypus, reptile), etc.
Output file: generalisations We see Progol starting with the most specific hypothesis for the case when animal is a reptile – Starts with the lizard reptile and finds most specific: class(A, reptile) :- has_covering(A,scales), has_legs(A,4), has_eggs(A),habitat(A, land) Then finds 12 generalisations of this – Examples class(A, reptile) :- has_covering(A, scales). class(A, reptile) :- has_eggs(A), has_legs(A, 4). Then chooses the best one: – class(A, reptile) :- has_covering(A, scales), has_legs(A, 4). This process is repeated for fish, mammal and bird
Output file: Final Hypothesis class(A, reptile) :- has_covering(A,scales), has_legs(A,4). class(A, mammal) :- homeothermic(A), has_milk(A). class(A, fish) :- has_legs(A,0), has_eggs(A). class(A, reptile) :- has_covering(A,scales), habitat(A, land). class(A, bird) :- has_covering(A,feathers) Gets 100% predictive accuracy on training set
Some Applications of ILP (See notes for details) Finite Element Mesh Design Predictive Toxicology Protein Structure Prediction Generating Program Invariants