Inductive Logic Programming (for Dummies) Anoop & Hector

Knowledge Discovery in Databases (KDD)
"Automatic extraction of novel, useful, and valid knowledge from large sets of data."
Different kinds of knowledge:
– Rules
– Decision trees
– Cluster hierarchies
– Association rules
– Statistically unusual subgroups
Different kinds of data.

Relational Analysis
Wouldn't it be nice to have analysis methods and data mining systems capable of working directly with the multiple relations that are available in relational database systems?

Single Table vs. Relational DM
The same problems remain. But do the single-table solutions carry over? And there are more problems:
– Extending the key notions to the relational setting
– Efficiency concerns

Relational Data Mining
Data mining : machine learning :: relational data mining : ILP
– Initially: binary classification
– Now: classification, regression, clustering, association analysis

ILP?
– India Literacy Project?
– International Language Program?
– Individualized Learning Program?
– Instruction Level Parallelism?
– International Lithosphere Program?

ILP
Inductive Logic Programming:
– Is a subfield of machine learning, which in turn is part of artificial intelligence
– Uses contributions from logic programming and statistics
– Tries to automate the process of induction

Deductive vs. Inductive Reasoning
Deduce: T ∪ B ⊨ E (from a theory T and background knowledge B, derive the examples E)
Induce: given examples E and background knowledge B, find a theory T such that T ∪ B ⊨ E

T (theory):
parent(X,Y) :- mother(X,Y).
parent(X,Y) :- father(X,Y).

B (background knowledge):
mother(mary,vinni).
mother(mary,andre).
father(carrey,vinni).
father(carrey,andre).

E (examples):
parent(mary,vinni).
parent(mary,andre).
parent(carrey,vinni).
parent(carrey,andre).

ILP: Objective
Given a dataset:
– Positive examples (E+) and optionally negative examples (E-)
– Additional knowledge about the problem/application domain (background knowledge B)
– A set of constraints to make the learning process more efficient (C)
The goal of an ILP system is to find a set of hypotheses that:
– Explains (covers) the positive examples (completeness)
– Is consistent with the negative examples (consistency)
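To make the two conditions concrete, here is a minimal Python sketch. The covers callable is a stand-in for a Prolog proof (does hypothesis plus background entail the example?); the function names are ours for illustration, not from any particular ILP system.

```python
def is_complete(covers, hypothesis, positives, background):
    # Completeness: every positive example is covered by the hypothesis.
    return all(covers(hypothesis, e, background) for e in positives)

def is_consistent(covers, hypothesis, negatives, background):
    # Consistency: no negative example is covered by the hypothesis.
    return not any(covers(hypothesis, e, background) for e in negatives)
```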

DB vs. Logic Programming
DB terminology                  | LP terminology
Relation name p                 | Predicate symbol p
Attribute of relation p        | Argument of predicate p
Tuple (a1, …, an)               | Ground fact p(a1, …, an)
Relation p: a set of tuples     | Predicate p defined extensionally by a set of ground facts
Relation q defined as a view    | Predicate q defined intensionally by a set of rules (clauses)

Relational Pattern
IF Customer(C1,Age1,Income1,TotSpent1,BigSpender1)
AND MarriedTo(C1,C2)
AND Customer(C2,Age2,Income2,TotSpent2,BigSpender2)
AND Income2 ≥ …
THEN BigSpender1 = Yes

As a clause:
big_spender(C1,Age1,Income1,TotSpent1) :-
    married_to(C1,C2),
    customer(C2,Age2,Income2,TotSpent2,BigSpender2),
    Income2 ≥ ….

A Generic ILP Algorithm

procedure ILP(Examples)
  INITIALIZE(Theories, Examples)
  repeat
    T = SELECT(Theories, Examples)
    {T_i}, i = 1..n = REFINE(T, Examples)
    Theories = REDUCE(Theories ∪ {T_i}, Examples)
  until STOPPINGCRITERION(Theories, Examples)
  return Theories

Procedures for a Generic ILP Algorithm
– INITIALIZE: initialize a set of theories (e.g. Theories = {true} or Theories = Examples)
– SELECT: select the most promising candidate theory
– REFINE: apply refinement operators that guarantee new theories (specialization, generalization, …)
– REDUCE: discard unpromising theories
– STOPPINGCRITERION: determine whether the current set of theories is already good enough (e.g. when it contains a complete and consistent theory)
SELECT and REDUCE together implement the search strategy (e.g. hill-climbing: REDUCE = keep only the best theory).
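The skeleton translates almost directly into code. The sketch below is hypothetical: initialize, select, refine, reduce_, and good_enough are parameters standing in for the slide's INITIALIZE/SELECT/REFINE/REDUCE/STOPPINGCRITERION, not the API of any actual ILP system.

```python
def generic_ilp(examples, initialize, select, refine, reduce_, good_enough):
    # Theories is the current pool of candidate theories.
    theories = initialize(examples)
    while not good_enough(theories, examples):
        t = select(theories, examples)           # most promising candidate
        candidates = refine(t, examples)         # specializations / generalizations
        theories = reduce_(theories + candidates, examples)  # prune the pool
    return theories
```

Hill-climbing falls out of this scheme by making reduce_ return only the single best-scoring theory.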

Search Algorithms
Search methods:
– Systematic search: depth-first search, breadth-first search, best-first search
– Heuristic search: hill-climbing, beam search
Search direction:
– Top-down search: general to specific
– Bottom-up search: specific to general
– Bi-directional search

Example (figure)

Search Space (figure)


Search Space as Lattice
The search space is a lattice under θ-subsumption:
– There exists a lub and a glb for every pair of clauses
– The lub is the 'least general generalization' (lgg)
– Bottom-up approaches find the lgg of the positive examples
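As a concrete illustration of the lgg, here is a minimal Python sketch of anti-unification for atoms, the core of the lgg computation (a clause-level lgg additionally pairs up body literals). The nested-tuple representation ('p', 'a', 'b') and the function names are our own assumptions, not taken from any ILP system.

```python
def lgg_term(t1, t2, vars_, counter):
    """Anti-unify two terms; differing subterm pairs become a shared variable."""
    if t1 == t2:
        return t1
    # Compound terms with the same functor and arity: recurse argument-wise.
    if (isinstance(t1, tuple) and isinstance(t2, tuple)
            and t1[0] == t2[0] and len(t1) == len(t2)):
        return (t1[0],) + tuple(lgg_term(a, b, vars_, counter)
                                for a, b in zip(t1[1:], t2[1:]))
    # Otherwise introduce (or reuse) a variable for this pair of subterms.
    if (t1, t2) not in vars_:
        counter[0] += 1
        vars_[(t1, t2)] = f"X{counter[0]}"
    return vars_[(t1, t2)]

def lgg_atoms(a1, a2):
    vars_, counter = {}, [0]
    return lgg_term(a1, a2, vars_, counter)

# The lgg of p(a,b) and p(c,b) is p(X1,b):
print(lgg_atoms(('p', 'a', 'b'), ('p', 'c', 'b')))  # ('p', 'X1', 'b')
```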

Basics of ILP cont'd
The bottom-up approach of finding clauses via lgg leads to long clauses. Thus the top-down approach is often preferred, since shorter and more general clauses are learned.
Two ways of doing top-down search:
– FOIL: greedy search, using information gain to score
– PROGOL: branch-and-bound, using P − N − l to score; uses saturation to restrict the search space
Usually the refinement operator is to:
– Apply a substitution
– Add a literal to the body of a clause

FOIL
Greedy search, scoring each clause with weighted information gain:
– Let c1 be a specialization of c2
– Then WIG(c1) = p⊕ × [ log2(p1 / (p1 + n1)) − log2(p2 / (p2 + n2)) ]
– where p⊕ is the number of possible bindings that make the clause cover positive examples, and pi and ni are the numbers of positive and negative examples covered by clause ci
– Background knowledge (B) is limited to ground facts
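A small Python sketch of that scoring function, under the reconstruction above (the exact symbols on the original slide were lost in transcription). The binding and coverage counts are assumed to come from coverage testing against the examples.

```python
import math

def foil_weighted_info_gain(p_pos_bindings, p1, n1, p2, n2):
    """FOIL-style weighted information gain of specialization c1 over c2.

    p_pos_bindings: positive bindings of c2 that c1 still covers.
    p1, n1: positive/negative examples covered by c1 (the specialization).
    p2, n2: positive/negative examples covered by c2 (the parent clause).
    """
    if p1 == 0 or p2 == 0:
        return 0.0  # no positives covered: no gain, by convention
    info1 = math.log2(p1 / (p1 + n1))
    info2 = math.log2(p2 / (p2 + n2))
    return p_pos_bindings * (info1 - info2)

# Example: parent covers 10+/10-, child covers 8+/2-, 8 positive bindings kept:
print(foil_weighted_info_gain(8, 8, 2, 10, 10))  # ≈ 5.42
```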

PROGOL
Branch-and-bound top-down search. Uses P − N − l as the scoring function:
– P is the number of positive examples covered
– N is the number of negative examples covered
– l is the number of literals in the clause
Preprocessing step: build a bottom clause from a positive example and B to restrict the search space.
– Uses mode declarations to restrict the language
– B is not limited to ground facts
While doing the branch-and-bound top-down search:
– Only literals appearing in the bottom clause are used to refine clauses
– Each learned clause is a generalization of this bottom clause
Can set a depth bound on variable chaining and theorem proving.
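The scoring function itself is one line; a minimal sketch, with the coverage counts again assumed to come from a prover:

```python
def progol_score(pos_covered, neg_covered, num_body_literals):
    # Compression-style score P - N - l: reward positive coverage,
    # penalize negative coverage and clause length.
    return pos_covered - neg_covered - num_body_literals
```

Branch-and-bound pruning works because specializing a clause can only shrink P and grow l, so P − l upper-bounds the score of every refinement of a clause.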

Example of Bottom Clause
– Select a seed from the positive examples, p(a), randomly or by order (e.g. the first uncovered positive example)
– Gather all relevant literals (by forward chaining, add anything from B that is allowable)
– Introduce variables as required by the modes
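A toy sketch of the "gather relevant literals" step: starting from the seed's constants, repeatedly pull in background facts that mention an already-reached constant. This ignores modes and types entirely, so it is only the skeleton of real saturation; all names and the fact representation are illustrative assumptions.

```python
def relevant_literals(seed_constants, background_facts, max_rounds=2):
    """background_facts: iterable of tuples like ('order', 'c1', 'o7', 's2', 'credit_card')."""
    reached = set(seed_constants)
    gathered = []
    for _ in range(max_rounds):  # bound the variable-chaining depth
        new = set()
        for fact in background_facts:
            args = fact[1:]
            if fact not in gathered and any(a in reached for a in args):
                gathered.append(fact)   # fact touches a reached constant
                new.update(args)        # its arguments become reachable
        if new <= reached:
            break                       # nothing new: chaining has converged
        reached |= new
    return gathered
```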

Iterate to Learn Multiple Rules
– Select a seed from the positive examples to build a bottom clause
– Learn some rule "IF A ∧ B THEN P"
– Now throw away all positive examples that were covered by this rule
– Repeat until there are no more positive examples
(First seed selected, first rule learned, second seed selected, …)
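This outer loop is the classic sequential-covering scheme; a compact sketch, with learn_one_rule and covers left abstract (they would wrap the search described on the previous slides):

```python
def sequential_covering(positives, negatives, background, learn_one_rule, covers):
    rules = []
    remaining = list(positives)
    while remaining:
        seed = remaining[0]  # first uncovered positive example as the seed
        rule = learn_one_rule(seed, remaining, negatives, background)
        if rule is None:
            break            # no acceptable rule could be found
        rules.append(rule)
        # Drop the positives this rule already explains.
        new_remaining = [e for e in remaining if not covers(rule, e, background)]
        if len(new_remaining) == len(remaining):
            break            # rule covers nothing new; avoid looping forever
        remaining = new_remaining
    return rules
```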

From Last Time
Why ILP is not just decision trees:
– The language is first-order logic
– A natural representation for multi-relational settings, and thus for full databases
– Not restricted to the classification task
So then, what is ILP?

What is ILP? (An obscene generalization)
A way to search the space of first-order clauses:
– With restrictions, of course
– θ-subsumption and search-space ordering
– Refinement operators: applying substitutions, adding literals, chaining variables

More From Last Time
Evaluating a hypothesis requires finding substitutions for each example:
– This requires a call to Prolog for each example
– For PROGOL only one substitution is required
– For FOIL all substitutions are required (recall the p⊕ in the scoring function)

An Example: MIDOS
A system for finding statistically interesting subgroups:
– Given a population and its distribution over some class
– Find subsets of the population for which the distribution over this class is significantly different
– Uses ILP-like search for definitions of subgroups

MIDOS
Example: a shop-club database
– customer(ID, Zip, Sex, Age, Club)
– order(C_ID, O_ID, S_ID, Pay_mode)
– store(S_ID, Location)
Say we want to find good_customer(ID). FOIL or PROGOL might find:
good_customer(C) :- customer(C,_,female,_,_), order(C,_,_,credit_card).

MIDOS
Instead, let's find a subset of customers that contains an interesting proportion of members.
– Target: [member, non_member]
– Reference distribution: 1371 total
– Subgroup customer(C,_,female,_,_), order(C,_,_,credit_card). Distribution: 478 total, 1.54 %

MIDOS
How-to:
– Build a clause in an ILP fashion
– Refinement operators: apply a substitution (to a constant, for example); add a literal (uses given foreign-link relationships)
– Use a score for 'statistical interest'
– Do a beam search

MIDOS
'Statistically interesting' scoring combines:
– g: relative size of the subgroup
– p0_i: relative frequency of value i in the entire population
– p_i: relative frequency of value i in the subgroup
The score rewards subgroups that are large and whose p_i deviate strongly from p0_i.
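The exact formula on the slide was lost in transcription. The sketch below uses a common distributional-unusualness measure of this general shape, purely as an illustrative stand-in: a size factor times the squared deviation of the subgroup's class distribution from the reference distribution.

```python
def midos_like_score(g, p0, p):
    """Illustrative subgroup-interestingness score (assumed form, not
    necessarily MIDOS's exact definition).

    g:  relative size of the subgroup (0 < g < 1)
    p0: relative class frequencies in the whole population
    p:  relative class frequencies in the subgroup
    """
    size_factor = g / (1.0 - g)
    deviation = sum((pi - p0i) ** 2 for pi, p0i in zip(p, p0))
    return size_factor * deviation
```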

MIDOS
Example:
– 1378 total population, of whom 478 are female credit-card buyers
– 906 members in the total population, 334 members among the female credit-card buyers
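Plugging the slide's numbers into the illustrative score above (the slide's own computed value was not preserved, and the formula here is an assumed stand-in):

```python
g  = 478 / 1378                    # subgroup size: female credit-card buyers
p0 = [906 / 1378, 472 / 1378]      # [member, non_member] in the population
p  = [334 / 478,  144 / 478]       # [member, non_member] in the subgroup
print(midos_like_score(g, p0, p))  # ≈ 0.0018 with this assumed formula
```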

Scaling in Data Mining
– Scaling to large datasets: increasing number of training examples
– Scaling to the size of the examples: increasing number of ground facts in the background knowledge
[Tang, Mooney, Melville, UT Austin, MRDM (SIGKDD) 2003]

BETHE
Addresses scaling to the number of ground facts in the background knowledge.
Problem: the PROGOL bottom clause is too big, which makes the search space too big.
Solution: don't build a bottom clause
– Use a FOIL-like search, BUT
– Guide the search using some seed example

Efficiency Issues
– Representational aspects
– Search
– Evaluation
– Sharing computations
– Memory-wise scalability

Representational Aspects
Example:
– Student(string sname, string major, string minor)
– Course(string cname, string prof, string cred)
– Enrolled(string sname, string cname)
In a natural join of these tables there is a one-to-one correspondence between the join result and the Enrolled table. Data mining tasks on the Enrolled table are really propositional; MRDM is overkill.

Representational Aspects
Three settings for data mining:
– Find patterns within individuals represented as tuples (single table, propositional), e.g. which minor is chosen with what major
– Find patterns within individuals represented as sets of tuples (each individual 'induces' a sub-database): multiple tables, restricted to some individual, e.g. student X taking course A usually takes course B
– Find patterns within the whole database: multiple tables, e.g. courses taken by student A are also taken by student B

Search
Space restriction:
– Bottom clauses, as seen above
Syntactic biases and typed logic:
– Modes, as seen above
– Can add types to variables to further restrict the language
Search biases and pruning rules:
– PROGOL's bound (relies on anti-monotonicity of coverage)
Stochastic search

Evaluation
Evaluating a clause: get some measure of coverage.
– Match each example to the clause: run multiple logical queries
– Query-optimization methods from the DB community (e.g. relational-algebra operator reordering) can help, BUT: DB queries are set-oriented (bottom-up), while Prolog queries find a single solution (top-down)

Evaluation
More options:
– k-locality: given some bindings, literals in a clause become independent. E.g. in ?- p(X,Y), q(Y,Z), r(Y,U)., once Y is bound, the proofs of q and r are independent. So it suffices to find one solution for q; if no solution is found for r, there is no need to backtrack into q.
– Relax θ-subsumption using stochastic estimates: sample the space of substitutions and decide subsumption based on this sample
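The k-locality idea can be phrased as splitting a clause body, after fixing the already-bound variables, into groups that share no free variables and can therefore be proved independently. A rough, hypothetical sketch of that decomposition (the (pred, vars) representation is our own):

```python
from itertools import combinations

def independent_components(literals, bound_vars):
    """literals: list of (pred, vars) pairs, e.g. ('q', ('Y', 'Z')).
    Two literals belong together if they share a variable not yet bound."""
    parent = list(range(len(literals)))  # union-find over literal indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(literals)), 2):
        vi = set(literals[i][1]) - bound_vars
        vj = set(literals[j][1]) - bound_vars
        if vi & vj:                        # shared free variable: same group
            parent[find(i)] = find(j)

    groups = {}
    for i, lit in enumerate(literals):
        groups.setdefault(find(i), []).append(lit)
    return list(groups.values())

# With Y bound, q(Y,Z) and r(Y,U) fall into separate, independent components:
body = [('q', ('Y', 'Z')), ('r', ('Y', 'U'))]
print(independent_components(body, bound_vars={'Y'}))
```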

Sharing Computations
– Materialization of features
– Propositionalization
– Pre-compute some statistics: joint distribution over attributes of a table; query selectivity
– Store proofs, and reuse them when evaluating new clauses

Memory-wise Scalability
All full ILP systems work on in-memory databases.
– Exception: TILDE, which learns multi-relational decision trees. The trick: make the loop over examples the outer loop.
Current solution: encode the data compactly.

References
– Džeroski and Lavrač, Relational Data Mining, Springer, 2001.
– David Page, "ILP: Just Do It", ILP 2000.
– Tang, Mooney, Melville, "Scaling Up ILP to Large Examples: Results on Link Discovery for Counter Terrorism", MRDM (SIGKDD) 2003.
– Blockeel and Sebag, "Scalability and efficiency in multi-relational data mining", SIGKDD Explorations 5(1), July 2003.

Overview
– Theory 1
– Theory 2
– Theory 3
– Performance Results
– Conclusion
– Future Work