Chapter 10 Learning Sets Of Rules


Content: Introduction; Sequential Covering Algorithm; Learning First-Order Rules (FOIL Algorithm); Induction As Inverted Deduction; Inverting Resolution

Introduction GOAL: learning a target function as a set of IF-THEN rules. BEFORE: learning with decision trees: first learn the decision tree, then translate the tree into a set of IF-THEN rules (one rule per leaf). ANOTHER POSSIBILITY: learning with genetic algorithms: each set of rules is coded as a bit vector, and several genetic operators are applied to the hypothesis space. TODAY AND HERE: first, learning rules in propositional form; second, learning rules in first-order form (Horn clauses, which may include variables). Rules are searched for sequentially, one after the other.

Introduction IF (Outlook = Sunny) ∧ (Humidity = High) THEN PlayTennis = No IF (Outlook = Sunny) ∧ (Humidity = Normal) THEN PlayTennis = Yes
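As a concrete illustration (not from the original slides), such propositional rules can be represented and applied with a few lines of Python; the rule list and the classify helper below are invented names:

# Each rule: (preconditions, predicted class). Rules are tried in order.
rules = [
    ({"Outlook": "Sunny", "Humidity": "High"},   "No"),
    ({"Outlook": "Sunny", "Humidity": "Normal"}, "Yes"),
]

def classify(instance, rules, default="Yes"):
    """Return the prediction of the first rule whose preconditions all hold."""
    for preconditions, prediction in rules:
        if all(instance.get(a) == v for a, v in preconditions.items()):
            return prediction
    return default  # no rule fires: fall back on a default class

print(classify({"Outlook": "Sunny", "Humidity": "High", "Wind": "Weak"}, rules))  # -> No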

Introduction An example of a first-order rule set, for the target concept Ancestor: IF Parent(x,y) THEN Ancestor(x,y) IF Parent(x,y) ∧ Ancestor(y,z) THEN Ancestor(x,z) The content of this chapter: learning algorithms capable of learning such rules, given sets of training examples.
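To make the meaning of the recursive Ancestor rules concrete, here is a small illustrative sketch (the fact set and function names are mine, not from the slides) that applies the two rules to a set of Parent facts until no new Ancestor facts can be derived:

parent_facts = {("Tom", "Bob"), ("Bob", "Ann")}  # Parent(x, y): x is a parent of y

def ancestors(parents):
    """Apply the two rules to a fixed point:
       Ancestor(x,y) <- Parent(x,y)
       Ancestor(x,z) <- Parent(x,y) ^ Ancestor(y,z)"""
    anc = set(parents)                           # first rule
    changed = True
    while changed:
        changed = False
        for (x, y) in parents:
            for (y2, z) in set(anc):
                if y == y2 and (x, z) not in anc:  # second (recursive) rule
                    anc.add((x, z))
                    changed = True
    return anc

print(ancestors(parent_facts))  # includes ("Tom", "Ann")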

Content: Introduction; Sequential Covering Algorithm; Learning First-Order Rules (FOIL Algorithm); Induction As Inverted Deduction; Inverting Resolution

Sequential Covering Algorithm

Sequential Covering Algorithm Goal of such an algorithm: learning a disjunctive set of rules that together classify the training data well. Principle: learn rule sets by the strategy of learning one rule, removing the examples it covers, then iterating this process. Requirements for the Learn-One-Rule method: as input it accepts a set of positive and negative training examples; as output it delivers a single rule that covers many of the positive examples and few or none of the negative examples. Required: the output rule has high accuracy, but not necessarily high coverage.

Sequential Covering Algorithm Procedure: learning the set of rules invokes the Learn-One-Rule method on all of the available training examples, then removes every positive example covered by the learned rule and iterates; finally, the resulting set of rules may be sorted so that more accurate rules are considered first. Greedy search: it is not guaranteed to find the smallest or best set of rules that covers the training examples.

Sequential Covering Algorithm
SequentialCovering( target_attribute, attributes, examples, threshold )
  learned_rules ← { }
  rule ← LearnOneRule( target_attribute, attributes, examples )
  while Performance( rule, examples ) > threshold do
    learned_rules ← learned_rules + rule
    examples ← examples - { examples correctly classified by rule }
    rule ← LearnOneRule( target_attribute, attributes, examples )
  learned_rules ← sort learned_rules according to Performance over examples
  return learned_rules
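A compact Python sketch of the same loop; the helpers learn_one_rule, performance and correctly_classifies stand for the components described above and are assumptions, not a fixed API:

def sequential_covering(target, attributes, examples, threshold,
                        learn_one_rule, performance, correctly_classifies):
    """Greedy covering: learn one rule, drop the examples it classifies
    correctly, and repeat while the learned rule is still good enough."""
    learned_rules = []
    rule = learn_one_rule(target, attributes, examples)
    while performance(rule, examples) > threshold:
        learned_rules.append(rule)
        # keep only the examples the new rule does NOT correctly classify
        examples = [e for e in examples if not correctly_classifies(rule, e)]
        rule = learn_one_rule(target, attributes, examples)
    # consider the most accurate rules first when the set is used for prediction
    learned_rules.sort(key=lambda r: performance(r, examples), reverse=True)
    return learned_rules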

General to Specific Beam Search The CN2 Algorithm
LearnOneRule( target_attribute, attributes, examples, k )
  Initialise best_hypothesis to the most general hypothesis Ø
  Initialise candidate_hypotheses to the set { best_hypothesis }
  while candidate_hypotheses is not empty do
    1. Generate the next more-specific candidate_hypotheses
    2. Update best_hypothesis
    3. Update candidate_hypotheses
  return a rule of the form "IF best_hypothesis THEN prediction",
    where prediction is the most frequent value of target_attribute among those examples that match best_hypothesis
Performance( h, examples, target_attribute )
  h_examples ← the subset of examples that match h
  return -Entropy( h_examples ), where Entropy is computed with respect to target_attribute

General to Specific Beam Search
Generate the next more-specific candidate_hypotheses:
  all_constraints ← the set of all constraints (a = v), where a ∈ attributes and v is a value of a occurring in the current set of examples
  new_candidate_hypotheses ← for each h in candidate_hypotheses and for each c in all_constraints, create a specialisation of h by adding the constraint c
  Remove from new_candidate_hypotheses any hypotheses that are duplicates, inconsistent, or not maximally specific
Update best_hypothesis:
  for all h in new_candidate_hypotheses do
    if ( h is statistically significant when tested on examples ) and ( Performance( h, examples, target_attribute ) > Performance( best_hypothesis, examples, target_attribute ) )
    then best_hypothesis ← h

General to Specific Beam Search
Update candidate_hypotheses:
  candidate_hypotheses ← the k best members of new_candidate_hypotheses, according to the Performance function
The Performance function guides the search in Learn-One-Rule: Performance(h) = -Entropy(s), where Entropy(s) = -Σ_{i=1}^{c} p_i log2 p_i, s is the current set of training examples matched by h, c is the number of possible values of the target attribute, and p_i is the proportion of examples in s classified with the i-th value.
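Below is a runnable Python sketch of this general-to-specific beam search for propositional rules, with the entropy-based Performance function; examples are plain attribute-value dictionaries, the helper names are mine, and the statistical-significance test is omitted:

import math
from collections import Counter

def entropy(examples, target):
    """Entropy of the target attribute over a set of examples (dicts)."""
    counts = Counter(e[target] for e in examples)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def matches(hypothesis, example):
    """A hypothesis is a dict of attribute = value constraints."""
    return all(example.get(a) == v for a, v in hypothesis.items())

def performance(hypothesis, examples, target):
    covered = [e for e in examples if matches(hypothesis, e)]
    return -entropy(covered, target) if covered else float("-inf")

def learn_one_rule(target, attributes, examples, k=5):
    """General-to-specific beam search of width k over rule preconditions."""
    best = {}                     # the maximally general hypothesis (no constraints)
    candidates = [best]
    all_constraints = {(a, e[a]) for e in examples for a in attributes}
    while candidates:
        # 1. generate all one-step specialisations of the current candidates
        new_candidates = []
        for h in candidates:
            for (a, v) in all_constraints:
                if a not in h:                    # avoid inconsistent/duplicate constraints
                    s = dict(h); s[a] = v
                    if s not in new_candidates:
                        new_candidates.append(s)
        # 2. update the best hypothesis seen so far
        for h in new_candidates:
            if performance(h, examples, target) > performance(best, examples, target):
                best = h
        # 3. keep only the k best specialisations for the next round
        new_candidates.sort(key=lambda h: performance(h, examples, target), reverse=True)
        candidates = new_candidates[:k]
    covered = [e for e in examples if matches(best, e)]
    prediction = Counter(e[target] for e in covered).most_common(1)[0][0]
    return best, prediction

data = [
    {"Outlook": "Sunny", "Humidity": "High",   "PlayTennis": "No"},
    {"Outlook": "Sunny", "Humidity": "Normal", "PlayTennis": "Yes"},
    {"Outlook": "Rain",  "Humidity": "High",   "PlayTennis": "Yes"},
]
print(learn_one_rule("PlayTennis", ["Outlook", "Humidity"], data, k=2))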

Learn-One-Rule

Learning Rule Sets: Summary Key dimensions in the design of a rule learning algorithm. Here, sequential covering: learn one rule, remove the positive examples it covers, iterate on the remaining examples. ID3: simultaneous covering. Which one should be preferred? Key difference: the choice made at the most primitive step in the search. ID3 chooses among attributes by comparing the partitions of the data they generate; CN2 chooses among attribute-value pairs by comparing the subsets of data they cover. Number of choices: to learn n rules, each containing k attribute-value tests in its precondition, CN2 performs n*k primitive search steps, whereas ID3 makes fewer independent search steps. If data is plentiful, it may support the larger number of independent decisions; if data is scarce, sharing decisions regarding the preconditions of different rules may be more effective.

Learning Rule Sets: Summary CN2 searches general-to-specific (cf. Find-S, which searches specific-to-general): this is the direction of the search in Learn-One-Rule. Advantage: there is a single maximally general hypothesis from which to begin the search, whereas there are many maximally specific ones. GOLEM: chooses several positive examples at random to initialise and guide the search; the best hypothesis obtained through multiple random choices is selected. CN2 is generate-then-test; Find-S and Candidate-Elimination are example-driven. Advantage of the generate-and-test approach: each choice in the search is based on the hypothesis' performance over many examples, so the impact of noisy data is minimized.

Content: Introduction; Sequential Covering Algorithm; Learning First-Order Rules (FOIL Algorithm); Induction As Inverted Deduction; Inverting Resolution

Learning First-Order Rules Why do that? Because we can then learn sets of relational rules such as: IF Parent(x,y) THEN Ancestor(x,y) IF Parent(x,y) ∧ Ancestor(y,z) THEN Ancestor(x,z)

Learning First-Order Rules Terminology Term: Mary, x, age(Mary), age(x) Literal: Female(Mary), ¬Female(x), Greater_than(age(Mary), 20) Clause: M1 ∨ … ∨ Mn Horn clause: H ← (L1 ∧ … ∧ Ln) Substitution: {x/3, y/z} Unifying substitution θ for literals L1 and L2: L1θ = L2θ FOIL Algorithm (learning sets of first-order rules)
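A small illustrative sketch of substitutions and unifying substitutions (a simplification of my own: terms are only constants, written capitalised, or variables, written lower-case; function terms such as age(Mary) are not handled):

# Represent a literal as (predicate, args); lower-case argument strings are variables.
def is_var(t):
    return isinstance(t, str) and t[0].islower()

def apply_subst(literal, subst):
    pred, args = literal
    return (pred, tuple(subst.get(a, a) for a in args))

def unify_literals(l1, l2):
    """Return a substitution s with apply_subst(l1, s) == apply_subst(l2, s),
    or None if the literals cannot be unified."""
    (p1, a1), (p2, a2) = l1, l2
    if p1 != p2 or len(a1) != len(a2):
        return None
    subst = {}
    for x, y in zip(a1, a2):
        x, y = subst.get(x, x), subst.get(y, y)
        if x == y:
            continue
        if is_var(x):
            subst[x] = y
        elif is_var(y):
            subst[y] = x
        else:
            return None           # two different constants cannot be unified
    return subst

print(unify_literals(("Female", ("x",)), ("Female", ("Mary",))))  # {'x': 'Mary'}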

FOIL: the outer loop adds rules until all positive examples are covered; the inner loop adds literals to the current rule until it avoids all negative examples.

Learning First-Order Rules Analysis of FOIL: the outer loop is specific-to-general (each added rule generalises the disjunctive rule set); the inner loop is general-to-specific (each added literal specialises the current rule). Differences between FOIL and Sequential-Covering with Learn-One-Rule: how candidate specializations of the rule are generated, and the use of FOIL_Gain to choose among them.

Learning First-Order Rules Candidate Specializations in FOIL: given the current rule P(x1,…,xk) ← L1 ∧ … ∧ Ln, the candidate literals Ln+1 take the form Q(v1,…,vr), where Q is a predicate and at least one of the vi already occurs in the rule; Equal(xj,xk), where xj and xk are variables already present in the rule; or the negation of either of these forms.

Learning First-Order Rules Example Target predicate: GrandDaughter(x,y) Other predicates: Father, Female Initial rule: GrandDaughter(x,y) ← Candidate literals: Equal(x,y), Female(x), Female(y), Father(x,y), Father(y,x), Father(x,z), Father(z,x), Father(y,z), Father(z,y), plus the negations of these literals. After one and then two more specialisation steps: GrandDaughter(x,y) ← Father(y,z) GrandDaughter(x,y) ← Father(y,z) ∧ Father(z,x) ∧ Female(y)
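The following sketch generates candidate literals in this spirit (Equal literals and negated literals are omitted for brevity; the predicate list, arities and variable names are supplied by the caller and are illustrative):

from itertools import product

def candidate_literals(predicates, rule_vars, n_new=1):
    """FOIL-style candidates: every predicate applied to combinations of the
    existing rule variables plus up to n_new fresh variables, keeping only
    literals that mention at least one existing variable."""
    fresh = [f"z{i}" for i in range(n_new)]
    pool = list(rule_vars) + fresh
    out = []
    for pred, arity in predicates:
        for args in product(pool, repeat=arity):
            if any(a in rule_vars for a in args):
                out.append((pred, args))
    return out

# Rule so far: GrandDaughter(x, y) <-   (empty body), existing variables x and y
lits = candidate_literals([("Father", 2), ("Female", 1)], ["x", "y"])
print(lits[:4])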

Learning First-Order Rules Information Gain in FOIL Training data assertions: GrandDaughter(Victor,Sharon), Father(Sharon,Bob), Father(Tom,Bob), Female(Sharon), Father(Bob,Victor). Rule: GrandDaughter(x,y) ← With the four constants there are 16 possible variable bindings, e.g. {x/Bob, y/Sharon}. One binding is positive, {x/Victor, y/Sharon}; the remaining 15 bindings are negative, e.g. {x/Bob, y/Tom}, …

Learning First-Order Rules Information Gain in FOIL L: the candidate literal to add to rule R; p0: number of positive bindings of R; n0: number of negative bindings of R; p1: number of positive bindings of R+L; n1: number of negative bindings of R+L; t: the number of positive bindings of R also covered by R+L. FOIL_Gain(L,R) = t * ( log2( p1 / (p1+n1) ) - log2( p0 / (p0+n0) ) )
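A direct translation of this formula into Python; the binding counts used in the example call were worked out by hand for the GrandDaughter data above, so treat them as my own working rather than part of the slides:

import math

def foil_gain(p0, n0, p1, n1, t):
    """FOIL_Gain(L, R) = t * (log2(p1/(p1+n1)) - log2(p0/(p0+n0))).
    p0, n0: positive/negative bindings of rule R; p1, n1: of R extended with
    literal L; t: positive bindings of R still covered after adding L."""
    if p0 == 0 or p1 == 0:
        return float("-inf")      # adding L covers no positive bindings
    return t * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

# GrandDaughter(x,y) <- has 1 positive and 15 negative bindings; by my count,
# adding Father(y,z) leaves 1 positive and 11 negative bindings (t = 1).
print(foil_gain(p0=1, n0=15, p1=1, n1=11, t=1))  # about 0.415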

Learning First-Order Rules Learning Recursive Rule Sets The target predicate itself may appear among the candidate literals, so recursive rules can be learned: IF Parent(x,y) THEN Ancestor(x,y) IF Parent(x,y) ∧ Ancestor(y,z) THEN Ancestor(x,z)

Content: Introduction; Sequential Covering Algorithm; Learning First-Order Rules (FOIL Algorithm); Induction As Inverted Deduction; Inverting Resolution

Induction As Inverted Deduction Machine learning as building theories that explain the observed data: given training data D = { <xi, f(xi)> } and background knowledge B, induction is finding a hypothesis h such that ( ∀ <xi, f(xi)> ∈ D ) ( B ∧ h ∧ xi ) ⊢ f(xi), i.e. each training classification f(xi) is entailed by B, h and the description xi of the instance.

Induction As Inverted Deduction Example, target concept Child(u,v): training instance xi: Male(Bob), Female(Sharon), Father(Sharon,Bob); classification f(xi): Child(Bob,Sharon); background knowledge B: Parent(u,v) ← Father(u,v). Hypotheses satisfying the constraint B ∧ h ∧ xi ⊢ f(xi) include h1: Child(u,v) ← Father(v,u), which needs no background knowledge, and h2: Child(u,v) ← Parent(v,u), which relies on B.

Induction As Inverted Deduction What we are interested in is designing inverse entailment operators: operators O(B, D) that take the training data D and background knowledge B and produce a hypothesis h satisfying ( ∀ <xi, f(xi)> ∈ D ) ( B ∧ h ∧ xi ) ⊢ f(xi).

Content: Introduction; Sequential Covering Algorithm; Learning First-Order Rules (FOIL Algorithm); Induction As Inverted Deduction; Inverting Resolution

Inverting Resolution (propositional case) The resolution operator: given clauses C1 and C2, where C1 contains a literal L and C2 contains its negation ¬L, resolution constructs the resolvent C = ( C1 - {L} ) ∪ ( C2 - {¬L} ), so that C1 ∧ C2 ⊢ C.

The inverse resolution operator O( C, C1 ) = C2 finds a clause C2 such that C is a resolvent of C1 and C2; one such (non-unique) choice is C2 = ( C - (C1 - {L}) ) ∪ {¬L}, for a literal L ∈ C1 that does not occur in C.
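A minimal propositional sketch of both operators, representing a clause as a Python set of literal strings ('~' marks negation); the example clauses are illustrative:

def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2):
    """Propositional resolution: if c1 contains L and c2 contains ~L,
    the resolvent is (c1 - {L}) | (c2 - {~L})."""
    for lit in c1:
        if negate(lit) in c2:
            return (c1 - {lit}) | (c2 - {negate(lit)})
    return None

def inverse_resolve(c, c1):
    """One inverse resolution step O(C, C1): pick a literal L in C1 that does
    not appear in C and return a (non-unique) C2 = (C - (C1 - {L})) | {~L}."""
    for lit in c1:
        if lit not in c:
            return (c - (c1 - {lit})) | {negate(lit)}
    return None

c1 = {"PassExam", "~KnowMaterial"}   # PassExam v ~KnowMaterial
c2 = {"KnowMaterial", "~Study"}      # KnowMaterial v ~Study
c = resolve(c1, c2)
print(c)                             # {'PassExam', '~Study'}
print(inverse_resolve(c, c1))        # recovers {'KnowMaterial', '~Study'}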

Inverting Resolution First-order resolution The resolution operator (first-order): find a literal L1 from clause C1, a literal L2 from clause C2, and a substitution θ such that L1θ = ¬L2θ; the resolvent is then C = ( C1 - {L1} )θ ∪ ( C2 - {L2} )θ.

Inverting Resolution Inverting first-order resolution Inverse resolution (first-order): factor the unifying substitution as θ = θ1θ2, where θ1 contains the substitutions involving variables of C1 and θ2 those involving variables of C2; the inverse resolution step is then C2 = ( C - (C1 - {L1})θ1 )θ2⁻¹ ∪ { ¬L1θ1θ2⁻¹ }.

Inverting Resolution Example, target predicate GrandChild(y,x): D = { GrandChild(Bob,Shannon) }, B = { Father(Shannon,Tom), Father(Tom,Bob) }. Take C = GrandChild(Bob,Shannon) and C1 = Father(Shannon,Tom) from B, with L1 = Father(Shannon,Tom); an inverse resolution step can then produce, among other possibilities, C2 = GrandChild(Bob,x) ∨ ¬Father(x,Tom), i.e. the rule IF Father(x,Tom) THEN GrandChild(Bob,x).

Inverting Resolution (figure: resolution tree relating clauses C1 and C2 to the resolvent C)

Summary Sequential covering algorithms learn one rule at a time, in propositional form and in first-order form (the FOIL algorithm). A second approach to learning first-order rules: induction as inverted deduction, realised by inverting resolution.