1/23 Learning from positive examples Main ideas and the particular case of CProgol4.2 Daniel Fredouille, CIG talk,11/2005
2/23 What is it all about? Symbolic machine learning. Learning from positive examples instead of positive and negative examples. The talk contains two parts: 1. General ideas and tactics to learn from positives. 2. How the particular ILP system CProgol4.2 of S. Muggleton (1997) deals with positive-only learning.
3/23 Disclaimer This talk has not been extracted from a survey or any article in particular: this is more like a patchwork of my experiences in the domain and how I interpret them. Feel free to criticize: I would like feedback on these ideas since I never shared them before. I would really appreciate comments on the slides with the ? sign.
4/23 Definitions [Figure: a concept space and an instance space, showing the target concept C, the inferred concept, and positive/negative examples of C. The ordering drawn between concepts reads "is more general / less specific than".] The concept space is usually partially ordered with this relation.
5/23 Positive and Negative Learning Possibility 1: Discrimination of classes Characterise the difference in the pos/neg examples No model of the positive concept ! ?
6/23 Positive and Negative Learning Possibility 2: Characterisation of a class Use neg. examples to prevent over-generalisation Needs neg. examples close to the concept border ?
7/23 Positive Only Learning Aim: Characterisation of a class Choice ?
8/23 Positive Only Learning Two strategies: 1.Bias in the search space: choosing a space with a (very) strong structure. 2.Bias in the evaluation function: choose a concept with a compromise between: –Generality/specificity of the concept –Coverage of the positives by the concept –Complexity of the hypothesis representing the concept ?
9/23 Search space bias approach Main idea: consider strongly organised concept spaces Possible inference algorithm: –Select the least general concept covering all examples. –The constraints on the search space ensure there is only one such concept. Trivial example (generally not useful), tree organisation:
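The tree-organised case can be sketched in a few lines of Python. This is only an illustration, not something taken from the talk: the concept tree `PARENT` and its node names are hypothetical, and each example is assumed to be a node of the tree. In a tree, the least general concept covering all examples is unique, which is exactly the property the search space bias relies on.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical toy concept tree: child -> parent (None marks the most general concept).
PARENT: Dict[str, Optional[str]] = {
    "anything": None,
    "animal": "anything",
    "bird": "animal",
    "mammal": "animal",
    "sparrow": "bird",
    "eagle": "bird",
    "cat": "mammal",
}


def path_to_root(concept: str) -> List[str]:
    """Return the concept followed by all of its generalisations."""
    path: List[str] = []
    node: Optional[str] = concept
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def least_general_concept(examples: Iterable[str]) -> str:
    """Return the most specific concept that covers every positive example."""
    paths = [path_to_root(e) for e in examples]
    if not paths:
        raise ValueError("at least one positive example is required")
    # Concepts that appear on every example's path to the root cover all examples.
    common = set(paths[0]).intersection(*map(set, paths[1:]))
    # Walking up from the first example, the first common concept is the least general.
    return next(node for node in paths[0] if node in common)
```

With this toy tree, the examples `sparrow` and `eagle` generalise to `bird`, while `sparrow` and `cat` generalise to `animal`.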
10/23 Search space bias approach Advantages: –Strong theoretical convergence results possible. –Can lead to (very) fast inference algorithms. Drawbacks: –Not available for all concept spaces! –Theorem: super-finite classes of concepts are not inferable in the limit this way (Gold 1967). Super-finite = contains all concepts covering a finite number of examples and at least one concept covering infinitely many.
11/23 Heuristic Approach A score that makes a compromise between: 1.Specificity of the concept 2.Coverage of the positives by the concept 3.Complexity of the concept Implementations: –Ad-hoc measures of points 1, 2 and 3, combined in a formula, e.g.: Score = Coverage + Specificity – Complexity –Minimum Message Length ideas (~MDL) ?
12/23 Heuristic Approach: Ad-hoc implementation Elements of the score –Coverage: counting covered instances –Specificity: measure of the proportion of instances of the space covered –Complexity: the size of the concept representation (e.g., number of rules) Advantages: –Usually easy to implement –Usually provides parameters to tune the compromise Disadvantages: –No theory –Bias not always clear –How to combine coverage/specificity/complexity? ?
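A minimal sketch of such an ad-hoc score, assuming normalised measures and a weighted sum. The function name, the normalisation and the weights `w_cov`, `w_spec`, `w_cplx` are all hypothetical choices made for illustration; they are the kind of tunable parameters the slide mentions, not values from any particular system.

```python
def ad_hoc_score(
    covered_positives: int,
    total_positives: int,
    covered_instances: int,
    total_instances: int,
    n_rules: int,
    w_cov: float = 1.0,
    w_spec: float = 1.0,
    w_cplx: float = 0.1,
) -> float:
    """Score = Coverage + Specificity - Complexity, with hypothetical weights."""
    # Coverage: fraction of the positive examples covered by the concept.
    coverage = covered_positives / total_positives
    # Specificity: high when the concept covers a small part of the instance space.
    specificity = 1.0 - covered_instances / total_instances
    # Complexity: size of the representation, here the number of rules.
    complexity = n_rules
    return w_cov * coverage + w_spec * specificity - w_cplx * complexity
```

The sketch makes the slide's last question concrete: a concept covering every instance gets full coverage but zero specificity, and how it ranks against a tighter concept depends entirely on the weights.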
13/23 Heuristic Approach: MML implementation [Figure: two channel diagrams, each transmitting a bit string. MML for discrimination, with labels "Examples", "Hyp.", "classes ¦ Hyp" and "classes". MML for characterisation, with labels "Examples and classes", "Hyp." and "Examples and classes ¦ Hyp".] Gain = number of bits needed to send the message without compression – number of bits needed to send the message with compression. ?
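The gain can be illustrated for the characterisation case under one simple, assumed encoding: instances are sent with a uniform code, so an example costs log2(number of instances) bits without a hypothesis and log2(number of instances covered by the hypothesis) bits once the hypothesis has been sent. This encoding and the function below are assumptions made for the sketch; as a later slide points out, the choice of encoding is precisely where the inference bias comes from.

```python
import math


def mml_gain(
    n_examples: int,
    n_instances: int,
    hypothesis_bits: float,
    n_covered_instances: int,
) -> float:
    """Bits saved by sending the hypothesis, then the examples encoded relative to it."""
    # Without compression: each example is one of n_instances equally likely instances.
    raw_bits = n_examples * math.log2(n_instances)
    # With compression: pay for the hypothesis, then pick each example
    # among the instances that the hypothesis covers.
    compressed_bits = hypothesis_bits + n_examples * math.log2(n_covered_instances)
    return raw_bits - compressed_bits
```

For instance, 10 examples in a space of 1024 instances cost 100 bits uncompressed. A 20-bit hypothesis covering 16 instances brings this down to 20 + 10 × 4 = 60 bits, a gain of 40 bits.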
14/23 Heuristic Approach: MML implementation Advantages: –Some theoretical justification in the works of Kolmogorov, Solomonoff, Occam, Bayes and Chaitin. –Absolute and meaningful score. Disadvantages: –Limit of the theory: the optimal code can NOT be computed! –Difficult implementation: the choice of encoding creates the inference biases, which is not very intuitive.
15/23 Positive only learning in ILP with CProgol4.2
16/23 Positive only learning in ILP The following is not a survey! It covers what I have already encountered; I have not looked for further references. MML implementations –Muggleton  –Srinivasan, Muggleton, Bain  –Stahl  Other implementations: –Muggleton CProgol4.2  –Heuristic ad-hoc method –Somehow based on MML, but the implementation details make it quite different.
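17/23 CProgol4.2 uses Bayes [Figure: distribution $D_H$ over hypotheses $h \in H$, distribution $D_I$ over instances $i \in I$, and the conditional distribution $D_{I|h}$.] Score: $P(h \mid E) = P(h) \cdot P(E \mid h) / P(E)$. Remaining work: fixing the distributions and computing $P(h)$, $P(E \mid h)$ and $P(E)$.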
18/23 Assumptions for the distributions $P(h) = e^{-\mathrm{size}(h)}$ –Large theories are less probable than small ones –$\mathrm{size}(h)$ = sum over the rules $c_i$ of $h$ of the number of literals in the body of $c_i$ $P(E \mid h) = \prod_{e \in E} D_{I|h}(e) = \prod_{e \in E} D_I(e) / D_I(h)$ –Assumption that $D_I$ and $D_H$ give $D_{I|h}$ –Independence assumption between examples
19/23 Replacing in Bayes $P(h \mid E) = e^{-\mathrm{size}(h)} \cdot \left[ \prod_{e \in E} D_I(e) / D_I(h) \right] / P(E)$ As we want to compare hypotheses: $P(h \mid E) = \left[ e^{-\mathrm{size}(h)} / D_I(h)^{|E|} \right] \cdot C_1$ Take the log: $\ln P(h \mid E) = -\mathrm{size}(h) + |E| \ln(1 / D_I(h)) + C_2$ We still have to compute $D_I(h)$...
20/23 $D_I(h)$: weight of $h$ in the instance set Computing $D_I$: –Using a stochastic logic program $S$ trained with the background knowledge (BK) to model $D_I$ (not included in the talk) Computing $D_I(h)$: –Generate $R$ instances from $D_I$ –$h$ covers $r$ of them –$D_I(h) = (r+1) / (R+2)$
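The sampling estimate can be sketched as follows. The stochastic logic program is not covered in the talk, so the sketch replaces it with a hypothetical `sample_instance` callable that draws one instance from the instance distribution; `covers` is likewise a hypothetical coverage test for the hypothesis. Only the final (r+1)/(R+2) estimate comes from the slide.

```python
import random
from typing import Callable, Optional, TypeVar

Instance = TypeVar("Instance")


def estimate_generality(
    covers: Callable[[Instance], bool],
    sample_instance: Callable[[random.Random], Instance],
    n_samples: int = 1000,
    rng: Optional[random.Random] = None,
) -> float:
    """Estimate the weight of a hypothesis in the instance set as (r + 1) / (R + 2)."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    # r = number of the R sampled instances that the hypothesis covers.
    r = sum(1 for _ in range(n_samples) if covers(sample_instance(rng)))
    # The +1 / +2 correction keeps the estimate strictly between 0 and 1,
    # so its logarithm is always defined.
    return (r + 1) / (n_samples + 2)
```

For example, with instances drawn uniformly from 0..99 and a hypothesis covering the values below 10, `estimate_generality(lambda x: x < 10, lambda rng: rng.randrange(100))` should come out close to 0.1.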
21/23 Formula for a whole theory covering E: $\ln P(h \mid E) = -\mathrm{size}(h) - |E| \ln\frac{r+1}{R+2} + C_2$ (terms labelled on the slide: Complexity, Specificity, Coverage). Estimation of the final theory score from a partially inferred theory: $\ln P(h \mid E) = \frac{|E|}{p}\,\mathrm{size}(h) - |E| \ln\left(\frac{|E|}{p} \cdot \frac{r+1}{R+2}\right) + C_3$
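The whole-theory formula is easy to evaluate once r and R are known. The sketch below drops the additive constant, which does not matter when comparing hypotheses on the same examples; the function and parameter names are hypothetical.

```python
import math


def log_posterior_score(size: int, n_examples: int, r: int, n_samples: int) -> float:
    """ln P(h | E) up to an additive constant, for a theory covering all of E.

    size       -- total number of body literals in the theory (complexity)
    n_examples -- |E|, the number of positive examples (coverage)
    r          -- number of sampled instances covered by the theory
    n_samples  -- R, the number of instances sampled
    """
    generality = (r + 1) / (n_samples + 2)  # estimated weight of h in the instance set
    return -size - n_examples * math.log(generality)
```

Because the specificity term is multiplied by |E|, a larger but more specific theory overtakes a smaller, more general one once enough positive examples are available. With R = 98, a theory of size 5 covering r = 9 samples scores below a theory of size 2 covering r = 49 samples when |E| = 1, but above it when |E| = 10.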
22/23 Final evaluation Suppression of |E| and C2: –$f(h) = \mathrm{size}(h)/p + \ln(p) - \ln\left(|E| \cdot (r+1)/(R+2)\right)$ Possible boost of positives with $k$: –$\mathrm{size}(h)/(k p) + \ln(k p) - \ln\left(|E| \cdot (r+1)/(R+2)\right)$ (terms labelled on the slide: Complexity, Specificity, Coverage). The formula is not written anywhere (the one above is my best guess!). The papers are hard to understand, but it seems to work...
23/23 Conclusion Learning from positives only is a real challenge, and methods designed for positive and negative examples can hardly be adapted. Some nice theoretical frameworks exist. When it comes to implementing heuristic frameworks: –The theory is often lost in approximations and implementation choices. –Useful systems can be created, but tuning and understanding the biases must be treated as very important stages of inference.