Chap. 10 Learning Sets of Rules. 박성배 (Department of Computer Engineering, Seoul National University)

Learning Disjunctive Sets of Rules
- Method 1: learn a decision tree, then translate the tree into rules
- Method 2: genetic algorithm
- Method 3: learn rule sets directly (sequential covering algorithm)

Sequential Covering Algorithm (1)

SEQUENTIAL-COVERING(Target_attribute, Attributes, Examples, Threshold)
  Learned_rules ← {}
  Rule ← LEARN-ONE-RULE(Target_attribute, Attributes, Examples)
  while PERFORMANCE(Rule, Examples) > Threshold, do
    Learned_rules ← Learned_rules + Rule
    Examples ← Examples - {examples correctly classified by Rule}
    Rule ← LEARN-ONE-RULE(Target_attribute, Attributes, Examples)
  Learned_rules ← sort Learned_rules according to PERFORMANCE over Examples
  return Learned_rules
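
A minimal Python sketch of this loop (not from the chapter): learn_one_rule and performance are supplied by the caller, and rules are assumed to expose a correctly_classifies(example) method; all of these names are placeholders.

def sequential_covering(target_attribute, attributes, examples,
                        learn_one_rule, performance, threshold):
    """Greedy covering loop: learn one high-accuracy rule, remove the
    examples it classifies correctly, and repeat until no new rule
    clears the performance threshold."""
    all_examples = list(examples)      # kept for the final sort
    remaining = list(examples)
    learned_rules = []
    rule = learn_one_rule(target_attribute, attributes, remaining)
    while performance(rule, remaining) > threshold:
        learned_rules.append(rule)
        # drop the examples the new rule already classifies correctly
        remaining = [e for e in remaining if not rule.correctly_classifies(e)]
        rule = learn_one_rule(target_attribute, attributes, remaining)
    # order the learned rules by performance over the full training set
    learned_rules.sort(key=lambda r: performance(r, all_examples), reverse=True)
    return learned_rules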

Sequential Covering Algorithm (2)
1. Learn one rule with high accuracy, any coverage
2. Remove the positive examples covered by this rule
3. Repeat
- Greedy search: no guarantee of finding the best set of rules

Learn-One-Rule (1)
- General-to-specific search
  - Greedy depth-first search, no backtracking
  - Begin with the most general rule
  - Greedily add the attribute test that most improves rule performance
  - Result: high accuracy, but possibly incomplete coverage

Learn-One-Rule (2)
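
A rough Python sketch of the greedy general-to-specific search described above, assuming propositional examples stored as dicts with a "target" key and accuracy as the performance measure (the representation and names are my own, not the chapter's):

def learn_one_rule(target_value, attributes, values, examples):
    """Greedy general-to-specific search for one rule (no backtracking).
    A rule is a dict of attribute -> value preconditions; it predicts
    target_value for every example whose attributes match."""
    def covers(rule, ex):
        return all(ex[a] == v for a, v in rule.items())

    def accuracy(rule):
        covered = [ex for ex in examples if covers(rule, ex)]
        if not covered:
            return 0.0
        return sum(ex["target"] == target_value for ex in covered) / len(covered)

    rule = {}                          # most general rule: empty precondition
    while True:
        # candidate specializations: single attribute=value tests not yet used
        candidates = [(a, v) for a in attributes if a not in rule
                      for v in values[a]]
        if not candidates:
            break
        best_a, best_v = max(candidates,
                             key=lambda av: accuracy({**rule, av[0]: av[1]}))
        if accuracy({**rule, best_a: best_v}) <= accuracy(rule):
            break                      # no remaining test improves performance
        rule[best_a] = best_v
    return rule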

General to Specific Beam Search
- Reduces the risk of a suboptimal choice by the greedy search
- Maintain a list of the k best candidates at each step
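
A beam-search variant of the same sketch, keeping the k best partial rules at each step instead of a single one (again a hypothetical representation, not the book's pseudocode):

def learn_one_rule_beam(target_value, attributes, values, examples, k=5):
    """Beam search: keep the k best candidate specializations at each
    step to reduce the risk of committing early to a suboptimal test."""
    def covers(rule, ex):
        return all(ex[a] == v for a, v in rule.items())

    def accuracy(rule):
        covered = [ex for ex in examples if covers(rule, ex)]
        if not covered:
            return 0.0
        return sum(ex["target"] == target_value for ex in covered) / len(covered)

    frontier = [{}]                    # start from the most general rule
    best_rule = {}
    while frontier:
        # specialize every frontier rule by one more attribute test
        children = [{**r, a: v} for r in frontier
                    for a in attributes if a not in r
                    for v in values[a]]
        if not children:
            break
        children.sort(key=accuracy, reverse=True)
        frontier = children[:k]        # beam: keep only the k best candidates
        if accuracy(frontier[0]) > accuracy(best_rule):
            best_rule = dict(frontier[0])
        else:
            break                      # no candidate improved; stop
    return best_rule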

Learning Rule Sets (1)
- Sequential covering algorithm
  - Learns one rule at a time
  - Each choice partitions the data by an attribute-value pair
- ID3
  - Learns the entire set of disjunctive rules simultaneously
  - Each choice partitions the data by an attribute
- If data is plentiful, sequential covering may do better: the data can support the larger number of independent choices it makes.

Learning Rule Sets (2)
- Sequential covering (Learn-One-Rule)
  - General-to-specific search, starting from the single maximally general hypothesis
  - Generate-then-test search: each choice is based on many examples
  - Robust: the impact of noisy data is minimized
- Find-S algorithm
  - Specific-to-general search
  - Example-driven: a single noisy example can mislead it

Learning Rule Sets (3)
- Rule post-pruning, as for decision trees
- Choices for rule PERFORMANCE:
  - relative frequency
  - m-estimate of accuracy
  - entropy (as used for information gain)
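
Hedged sketches of the three performance measures in Python (the exact smoothing and sign conventions vary; these follow common usage rather than a specific notation from the chapter):

import math

def relative_frequency(pos_covered, total_covered):
    """Fraction of the examples covered by the rule that it classifies correctly."""
    return pos_covered / total_covered if total_covered else 0.0

def m_estimate(pos_covered, total_covered, prior, m):
    """m-estimate of accuracy: shrinks the relative frequency toward the
    prior probability of the predicted class; m sets the shrinkage weight."""
    return (pos_covered + m * prior) / (total_covered + m)

def neg_entropy(class_counts):
    """Negative entropy of the class distribution among covered examples;
    larger values mean purer coverage, so it can be maximized like accuracy."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    probs = [c / total for c in class_counts if c > 0]
    return sum(p * math.log2(p) for p in probs)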

Learning First-Order Rules
- Motivation for first-order rules
  - More expressive than propositional rules
  - Inductive Logic Programming (ILP): inductive learning of first-order rules, viewed as automatically inferring PROLOG programs
- First-order Horn clauses
  - A Horn clause is a clause containing at most one positive literal:
      H ∨ ¬L1 ∨ … ∨ ¬Ln
  - which is equivalent to the rule form H ← (L1 ∧ … ∧ Ln)
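
As a quick sanity check of that equivalence (my own illustration, not from the chapter), a truth-table enumeration for a two-literal body confirms that the clause form and the rule form agree on every assignment:

from itertools import product

# H ∨ ¬L1 ∨ ¬L2 should be equivalent to (L1 ∧ L2) → H
for H, L1, L2 in product([False, True], repeat=3):
    clause = H or (not L1) or (not L2)
    rule_form = (not (L1 and L2)) or H          # material implication
    assert clause == rule_form
print("clause form and rule form agree on all 8 assignments")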

Learning Sets of First-Order Rules: FOIL
- FOIL
  - A natural extension of SEQUENTIAL-COVERING and LEARN-ONE-RULE to first-order rules
  - Rules are more restricted than general Horn clauses: literals may not contain function symbols
  - More expressive in one way: literals in the rule body may be negated

FOIL (1)

FOIL (2)
- Seeks rules that predict when the target is True
- Hill-climbing search
- Outer loop
  - Generalizes the current disjunctive hypothesis by adding a new rule
  - Specific-to-general search
- Inner loop
  - Hypothesis space consists of conjunctions of literals
  - General-to-specific, hill-climbing search
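
A skeleton of these two nested loops at a propositional level of detail (variable bindings are glossed over; new_rule_literals, gain, and covers are assumed helper functions, and the (head, body) rule representation is hypothetical):

def foil(target, predicates, pos_examples, neg_examples,
         new_rule_literals, gain, covers):
    """FOIL skeleton: the outer loop adds rules (disjuncts) until every
    positive example is covered; the inner loop adds literals (conjuncts)
    until the rule covers no negative examples."""
    learned_rules = []
    pos = list(pos_examples)
    while pos:                                   # outer, specific-to-general loop
        body = []                                # most general rule for target
        neg = list(neg_examples)
        while neg:                               # inner, general-to-specific loop
            candidates = new_rule_literals(target, body, predicates)
            if not candidates:
                break
            best = max(candidates, key=lambda lit: gain(lit, body, pos, neg))
            if gain(best, body, pos, neg) <= 0:
                break                            # no literal helps any more
            body.append(best)
            neg = [e for e in neg if covers((target, body), e)]
        learned_rules.append((target, list(body)))
        still_pos = [e for e in pos if not covers((target, body), e)]
        if len(still_pos) == len(pos):
            break                                # new rule covers no positives; stop
        pos = still_pos
    return learned_rules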

Generating Candidate Specializations in FOIL (1)
- Suppose the current rule is P(x1, x2, …, xk) ← L1 ∧ … ∧ Ln
- FOIL considers adding a new literal that fits one of the following forms:
  - Q(v1, …, vr), where Q is a predicate name occurring in Predicates, and each vi is either a new variable or a variable already present in the rule; at least one vi must already exist in the current rule
  - Equal(xj, xk), where xj and xk are variables already present in the rule
  - the negation of either of the above forms

Generating Candidate Specializations in FOIL (2)
- Example
  - Begin with the most general rule: GrandDaughter(x, y) ←
  - Generate the following candidate literals: Equal(x, y), Female(x), Female(y), Father(x, y), Father(x, z), Father(z, x), Father(y, z), Father(z, y), and the negations of these literals
  - Suppose Father(y, z) is the most promising: GrandDaughter(x, y) ← Father(y, z)
  - Iterate
  - Final rule: GrandDaughter(x, y) ← Father(y, z) ∧ Father(z, x) ∧ Female(y)
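
A rough Python enumeration of these candidate forms (my own representation: variables are strings, fresh variables are named 'new1', 'new2', ...; it over-generates slightly compared with the example above, e.g. it also emits Father(x, x)):

from itertools import product

def candidate_literals(rule_vars, predicates):
    """Candidate literals for specializing a rule whose variables are
    rule_vars; predicates maps each predicate name to its arity."""
    candidates = []
    for name, arity in predicates.items():
        new_vars = [f"new{i}" for i in range(1, arity + 1)]
        for args in product(rule_vars + new_vars, repeat=arity):
            # at least one argument must already occur in the rule
            if any(a in rule_vars for a in args):
                candidates.append((name, args))
    # equality tests between variables already in the rule
    for x, y in product(rule_vars, repeat=2):
        if x < y:
            candidates.append(("Equal", (x, y)))
    # negations of all of the above forms
    candidates += [("not", lit) for lit in list(candidates)]
    return candidates

# e.g. for the rule GrandDaughter(x, y) <- :
# candidate_literals(["x", "y"], {"Female": 1, "Father": 2})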

Guiding Search in FOIL
- To select the most promising literal:
  - Consider the performance of the rule over the training data
  - Consider all possible bindings of each variable in the current rule

Guiding Search in FOIL
- Information gain in FOIL:
    Foil_Gain(L, R) = t * ( log2( p1 / (p1 + n1) ) - log2( p0 / (p0 + n0) ) )
  where
    L is the candidate literal to add to rule R
    p0 = number of positive bindings of R
    n0 = number of negative bindings of R
    p1 = number of positive bindings of R + L
    n1 = number of negative bindings of R + L
    t = number of positive bindings of R still covered after adding L to R
- Interpretation: the reduction in the number of bits needed to encode the classification of the positive bindings, due to L
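
The same quantity as a small Python function (assuming all counts are nonzero so the logarithms are defined):

import math

def foil_gain(p0, n0, p1, n1, t):
    """Foil_Gain(L, R), using the binding counts defined above."""
    return t * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

# A literal that keeps 8 of 10 positive bindings and cuts the negative
# bindings from 10 to 2 yields a gain of about 5.4 bits:
# foil_gain(p0=10, n0=10, p1=8, n1=2, t=8)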

Induction As Inverted Deduction (1)
- Induction is finding an h such that
    (for every <xi, f(xi)> in D)  (B ∧ h ∧ xi) ⊢ f(xi)
  where
    xi is the i-th training instance
    f(xi) is the target function value for xi
    B is other background knowledge

Induction As Inverted Deduction (2)
- Designing inverse entailment operators:
    O(B, D) = h  such that  (for every <xi, f(xi)> in D)  (B ∧ h ∧ xi) ⊢ f(xi)
- Minimum Description Length principle: used to choose among the many hypotheses that satisfy the constraint
- Practical difficulties
  - Does not allow for noisy training data
  - The number of hypotheses satisfying the constraint is very large
  - The complexity of the hypothesis space increases as B increases

Deduction: Resolution Rule
- Propositional resolution:
    P ∨ L,  ¬L ∨ R   ⊢   P ∨ R
1. Given initial clauses C1 and C2, find a literal L from clause C1 such that ¬L occurs in clause C2.
2. Form the resolvent C by including all literals from C1 and C2, except for L and ¬L:
    C = (C1 - {L}) ∪ (C2 - {¬L})
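
A small propositional implementation (clauses as frozensets of string literals, with '~' marking negation; the encoding and example names are my own):

def resolve(c1, c2):
    """Propositional resolution: return the resolvent of c1 and c2 for
    the first complementary pair of literals found, or None if there is
    none. Literals are strings such as 'P' or '~P'."""
    def negate(lit):
        return lit[1:] if lit.startswith("~") else "~" + lit

    for lit in c1:
        if negate(lit) in c2:
            return (c1 - {lit}) | (c2 - {negate(lit)})
    return None

# (PassExam ∨ ¬KnowMaterial) resolved with (KnowMaterial ∨ ¬Study)
# gives (PassExam ∨ ¬Study):
print(resolve(frozenset({"PassExam", "~KnowMaterial"}),
              frozenset({"KnowMaterial", "~Study"})))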

Inverse Resolution Operator
- Not deterministic: there are multiple clauses C2 such that C1 and C2 resolve to C; prefer the shorter one
1. Given the initial clause C1 and the resolvent C, find a literal L that occurs in C1 but not in C.
2. Form the second clause C2 by including the following literals:
    C2 = (C - (C1 - {L})) ∪ {¬L}
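
The corresponding inverse operator in the same encoding; because the operator is not deterministic, the function returns one candidate C2 per eligible literal (this variant never copies literals of C1 back into C2, which yields the shortest candidates):

def inverse_resolve(c, c1):
    """Propositional inverse resolution: given the resolvent c and one
    parent clause c1, build candidate second parents via
    C2 = (C - (C1 - {L})) ∪ {~L} for each literal L of c1 not in c."""
    def negate(lit):
        return lit[1:] if lit.startswith("~") else "~" + lit

    return [(c - (c1 - {lit})) | {negate(lit)}
            for lit in c1 if lit not in c]

# Recover (KnowMaterial ∨ ¬Study) from the resolvent and the first parent:
print(inverse_resolve(frozenset({"PassExam", "~Study"}),
                      frozenset({"PassExam", "~KnowMaterial"})))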

Rule-Learning Algorithm Based on Inverse Entailment Operators
- Use the sequential covering algorithm:
1. Select a training example not yet covered by the learned rules
2. Apply inverse resolution to generate candidate hypotheses hi that satisfy (B ∧ hi ∧ xi) ⊢ f(xi)
3. Iterate

First-Order Resolution
- θ is a unifying substitution for two literals L1 and L2 if L1θ = L2θ.
1. Find a literal L1 from C1, a literal L2 from C2, and a substitution θ such that L1θ = ¬L2θ.
2. Form the resolvent C by including all literals from C1θ and C2θ, except for L1θ and ¬L2θ:
    C = (C1 - {L1})θ ∪ (C2 - {¬L2})θ

Inverting First-Order Resolution (1)
- Factor the unifying substitution as θ = θ1θ2, where θ1 contains the substitutions for the variables of C1 and θ2 those for the variables of C2; then
    C = (C1 - {L1})θ1 ∪ (C2 - {¬L2})θ2
- By definition, L2 = ¬L1θ1θ2⁻¹, so
    C2 = (C - (C1 - {L1})θ1)θ2⁻¹ ∪ {¬L1θ1θ2⁻¹}

Inverting First-Order Resolution (2)
- Example
  - Training data D = {GrandChild(Bob, Shannon)}, background knowledge B = {Father(Shannon, Tom), Father(Tom, Bob)}
  - Applying the inverse resolution operator to C = GrandChild(Bob, Shannon) and C1 = Father(Shannon, Tom), one possible result is C2 = GrandChild(Bob, x) ∨ ¬Father(x, Tom), i.e. the rule GrandChild(Bob, x) ← Father(x, Tom)
  - A further inverse resolution step against Father(Tom, Bob) can generalize this to GrandChild(y, x) ← Father(x, z) ∧ Father(z, y)