
1 Chap. 10 Learning Sets of Rules, Seong-Bae Park (박성배), Dept. of Computer Engineering, Seoul National University

2 Learning Disjunctive Sets of Rules
• Method 1
  – Learn a decision tree
  – Translate the tree into rules
• Method 2
  – Use a genetic algorithm
• Method 3
  – Learn rule sets directly
  – Sequential covering algorithm

3 Sequential Covering Algorithm (1)
SEQUENTIAL-COVERING(Target_attribute, Attributes, Examples, Threshold)
  Learned_rules ← {}
  Rule ← LEARN-ONE-RULE(Target_attribute, Attributes, Examples)
  while PERFORMANCE(Rule, Examples) > Threshold do
    Learned_rules ← Learned_rules + Rule
    Examples ← Examples − {examples correctly classified by Rule}
    Rule ← LEARN-ONE-RULE(Target_attribute, Attributes, Examples)
  Learned_rules ← sort Learned_rules according to PERFORMANCE over Examples
  return Learned_rules
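The loop above translates almost directly into code. Below is a minimal Python sketch of it; `learn_one_rule`, `performance`, and the rule's `correctly_classifies` method are hypothetical helpers standing in for LEARN-ONE-RULE, PERFORMANCE, and the covered-example test, not part of the slides.

```python
def sequential_covering(examples, learn_one_rule, performance, threshold):
    """Greedy covering: learn one rule, remove the examples it classifies
    correctly, and repeat until rule quality drops below the threshold.

    learn_one_rule(examples) and performance(rule, examples) are assumed
    helpers; rule.correctly_classifies(example) is an assumed method.
    """
    learned_rules = []
    rule = learn_one_rule(examples)
    while performance(rule, examples) > threshold:
        learned_rules.append(rule)
        # Remove the examples this rule already classifies correctly.
        examples = [ex for ex in examples if not rule.correctly_classifies(ex)]
        rule = learn_one_rule(examples)
    # Sort so the best-performing rules are consulted first.
    learned_rules.sort(key=lambda r: performance(r, examples), reverse=True)
    return learned_rules
```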

4 Sequential Covering Algorithm (2)
1. Learn one rule with high accuracy and any coverage
2. Remove the positive examples covered by this rule
3. Repeat
• Greedy search
  – No guarantee of finding the best set of rules

5 Learn-One-Rule (1)
• General-to-specific search
  – Greedy depth-first search with no backtracking
  – Begin with the most general rule (no preconditions)
  – Greedily add the attribute test that most improves rule performance over the training examples
  – Yields high accuracy, but possibly incomplete coverage

6 Learn-One-Rule (2)

7 General-to-Specific Beam Search
• Reduces the risk of the suboptimal choices made by greedy search
• Maintain a list of the k best candidate rules at each step
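A minimal Python sketch of this beam search, assuming a rule is simply a set of attribute tests; `candidate_tests` and `performance` are hypothetical helpers, and the termination test is simplified.

```python
def learn_one_rule(examples, candidate_tests, performance, beam_width=5):
    """General-to-specific beam search for a single rule (a sketch).

    A rule is represented as a frozenset of attribute tests; the empty set
    is the most general rule (no preconditions). candidate_tests (an
    iterable of tests) and performance(rule, examples) are assumed helpers.
    """
    best = frozenset()                  # most general rule
    beam = [best]
    improved = True
    while improved:
        improved = False
        # Specialize every rule in the beam by conjoining one more test.
        candidates = {rule | {t} for rule in beam for t in candidate_tests
                      if t not in rule}
        if not candidates:
            break
        ranked = sorted(candidates, key=lambda r: performance(r, examples),
                        reverse=True)
        if performance(ranked[0], examples) > performance(best, examples):
            best, improved = ranked[0], True
        beam = ranked[:beam_width]      # keep only the k best candidates
    return best
```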

8 Learning Rule Sets (1)
• Sequential covering algorithm
  – Learns one rule at a time
  – Partitions the data by attribute-value pairs
• ID3
  – Learns the entire set of disjunctive rules simultaneously
  – Partitions the data by attributes
• If data is plentiful, sequential covering may be the better choice, since the data can support its larger number of independent decisions.

9 Learning Rule Sets (2)
• Sequential covering algorithm (LEARN-ONE-RULE)
  – General-to-specific search
  – Maintains a single maximally general hypothesis
  – Generate-then-test search
  – Robust: each choice is based on many examples, so the impact of noisy data is minimized
• Find-S algorithm
  – Specific-to-general search
  – Example-driven: each step is based on a single example

10 Learning Rule Sets (3)
• Rule post-pruning can be applied, as with decision trees
• Measures for rule PERFORMANCE
  – Relative frequency
  – m-estimate of accuracy
  – Entropy (as used in information gain)
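For concreteness, here is a small Python sketch of the three PERFORMANCE measures using their standard definitions; the function names and the numbers in the usage example are my own, not from the slides.

```python
import math

def relative_frequency(n_correct, n_covered):
    # Fraction of the covered examples the rule classifies correctly: n_c / n.
    return n_correct / n_covered

def m_estimate(n_correct, n_covered, prior, m):
    # Accuracy smoothed toward the prior p by m "virtual" examples:
    # (n_c + m * p) / (n + m).
    return (n_correct + m * prior) / (n_covered + m)

def entropy(class_counts):
    # Entropy of the class distribution over the examples the rule covers;
    # lower entropy indicates a purer, better rule.
    total = sum(class_counts)
    return -sum((c / total) * math.log2(c / total) for c in class_counts if c > 0)

# A rule covering 8 examples, 6 of them classified correctly (two classes).
print(relative_frequency(6, 8))            # 0.75
print(m_estimate(6, 8, prior=0.5, m=2))    # 0.7
print(entropy([6, 2]))                     # ~0.811
```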

11 Learning First-Order Rules
• Motivation for first-order rules
  – More expressive than propositional rules
  – Inductive Logic Programming (ILP)
    · Inductive learning of first-order rules
    · Automatically inferring PROLOG programs
• First-order Horn clauses
  – A Horn clause is a clause containing at most one positive literal:
    H ∨ ¬L1 ∨ … ∨ ¬Ln
    which is equivalent to
    H ← (L1 ∧ … ∧ Ln)

12 Learning Sets of First-Order Rules: FOIL
• FOIL
  – A natural extension of SEQUENTIAL-COVERING and LEARN-ONE-RULE to first-order rules
  – Literals may not contain function symbols
  – Literals in the body of a rule may be negated

13 FOIL (1)

14 FOIL (2)
• Seeks rules that predict when the target literal is True
• Hill-climbing search
• Outer loop
  – Generalizes the current disjunctive hypothesis by adding a new rule
  – Specific-to-general search
• Inner loop
  – The hypothesis space consists of conjunctions of literals
  – General-to-specific, hill-climbing search that adds literals to the new rule
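The two nested loops can be sketched as follows; this is only a skeleton, and `best_literal` (the gain-guided literal choice) and `covers` (the binding-based coverage test) are assumed helpers, not part of the slides.

```python
def foil(target, pos_examples, neg_examples, best_literal, covers):
    """Skeleton of FOIL's two nested loops (a sketch, not the full algorithm).

    best_literal(body, pos, neg) is assumed to return the literal with the
    highest FOIL gain; covers(body, example) is assumed to test whether some
    binding of the rule body matches the example.
    """
    rules = []
    pos = list(pos_examples)
    while pos:                              # outer loop: specific to general
        body = []                           # start from the most general rule
        neg = list(neg_examples)
        while neg:                          # inner loop: general to specific
            literal = best_literal(body, pos, neg)
            body.append(literal)
            neg = [e for e in neg if covers(body, e)]
        rules.append((target, tuple(body)))
        # Remove the positive examples the new rule covers, then continue.
        pos = [e for e in pos if not covers(body, e)]
    return rules
```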

15 Generating Candidate Specializations in FOIL (1)
• Suppose the current rule is P(x1, x2, …, xk) ← L1 ∧ … ∧ Ln
• The new literal Ln+1 must fit one of the following forms:
  – Q(v1, …, vr), where Q is a predicate name occurring in Predicates and each vi is either a new variable or a variable already present in the rule; at least one vi must already exist in the current rule
  – Equal(xj, xk), where xj and xk are variables already present in the rule
  – The negation of either of the above forms

16 Generating Candidate Specializations in FOIL (2)
• Example
  – Begin with the most general rule: GrandDaughter(x, y) ←
  – Generate the following candidate literals: Equal(x, y), Female(x), Female(y), Father(x, y), Father(x, z), Father(z, x), Father(y, z), Father(z, y), and the negations of these literals
  – Suppose Father(y, z) is the most promising: GrandDaughter(x, y) ← Father(y, z)
  – Iterate
  – Final rule: GrandDaughter(x, y) ← Father(y, z) ∧ Father(z, x) ∧ Female(y)
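A rough Python sketch of this candidate generation; it over-generates slightly (for example it also emits Father(x, x)), and the representation of literals as (name, args) tuples is my own.

```python
from itertools import product

def candidate_literals(rule_vars, predicates, new_var="z"):
    """Enumerate FOIL-style candidate literals for specializing a rule (a sketch).

    rule_vars: variables already in the rule, e.g. ['x', 'y']
    predicates: mapping from predicate name to arity, e.g. {'Father': 2, 'Female': 1}
    Each argument is an existing variable or the single new variable, and at
    least one argument must already occur in the rule.
    """
    pool = list(rule_vars) + [new_var]
    literals = []
    for name, arity in predicates.items():
        for args in product(pool, repeat=arity):
            if any(a in rule_vars for a in args):   # at least one old variable
                literals.append((name, args))
    # Equality tests between variables already in the rule.
    for i, a in enumerate(rule_vars):
        for b in rule_vars[i + 1:]:
            literals.append(("Equal", (a, b)))
    # FOIL also considers the negation of every candidate literal.
    literals += [("not", lit) for lit in literals]
    return literals

cands = candidate_literals(["x", "y"], {"Father": 2, "Female": 1})
# Includes Female(x), Female(y), Father(y, z), Equal(x, y), their negations, ...
print(len(cands))
```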

17 Guiding Search in FOIL (1)
• To select the most promising literal
  – Consider the performance of the rule over the training data
  – Consider all possible bindings of each variable in the candidate literal

18 Guiding Search in FOIL (2)
• Information gain in FOIL:
  FoilGain(L, R) ≡ t ( log2( p1 / (p1 + n1) ) − log2( p0 / (p0 + n0) ) )
  where
  L is the candidate literal to add to rule R
  p0 = number of positive bindings of R
  n0 = number of negative bindings of R
  p1 = number of positive bindings of R + L
  n1 = number of negative bindings of R + L
  t  = number of positive bindings of R also covered by R + L
• Interpretation: the reduction, due to L, in the number of bits needed to encode the classification of the positive bindings of R
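This measure is straightforward to compute; a short sketch follows, with the binding counts in the usage line chosen purely for illustration.

```python
import math

def foil_gain(p0, n0, p1, n1, t):
    """FOIL's information-gain measure for adding literal L to rule R.

    p0, n0: positive / negative bindings of R
    p1, n1: positive / negative bindings of R + L
    t: positive bindings of R still covered after adding L
    """
    before = math.log2(p0 / (p0 + n0))   # information content before adding L
    after = math.log2(p1 / (p1 + n1))    # information content after adding L
    # t copies of the per-binding reduction in bits.
    return t * (after - before)

# Illustrative numbers only: 4 of 16 bindings of R are positive,
# 2 of 4 bindings of R + L are positive, and 2 positive bindings survive.
print(foil_gain(p0=4, n0=12, p1=2, n1=2, t=2))   # 2.0
```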

19 Induction As Inverted Deduction (1)
• Induction is finding h such that
  (∀ ⟨xi, f(xi)⟩ ∈ D)  (B ∧ h ∧ xi) ⊢ f(xi)
  where
  xi is the i-th training instance
  f(xi) is the target function value for xi
  B is other background knowledge

20 Induction As Inverted Deduction (2)
• Designing inverse entailment operators
  O(B, D) = h such that (∀ ⟨xi, f(xi)⟩ ∈ D)  (B ∧ h ∧ xi) ⊢ f(xi)
• Minimum Description Length principle
  – Used to choose among the many hypotheses that satisfy the constraint
• Practical difficulties
  – Does not allow for noisy training data
  – The number of hypotheses satisfying the constraint is very large
  – The complexity of the hypothesis space increases as B is increased

21 Deduction: Resolution Rule
• Resolution rule: from P ∨ L and ¬L ∨ R, conclude P ∨ R
1. Given initial clauses C1 and C2, find a literal L from clause C1 such that ¬L occurs in clause C2.
2. Form the resolvent C by including all literals from C1 and C2, except for L and ¬L:
   C = (C1 − {L}) ∪ (C2 − {¬L})
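A minimal Python sketch of this rule, representing a clause as a frozenset of string literals with '~' marking negation (a representation of my own choosing).

```python
def resolve(c1, c2):
    """Propositional resolution (a sketch). A clause is a frozenset of string
    literals; '~P' is the negation of 'P'."""
    def negate(lit):
        return lit[1:] if lit.startswith("~") else "~" + lit

    resolvents = []
    for lit in c1:
        if negate(lit) in c2:
            # C = (C1 - {L}) ∪ (C2 - {~L})
            resolvents.append((c1 - {lit}) | (c2 - {negate(lit)}))
    return resolvents

# From (P ∨ L) and (~L ∨ R), derive (P ∨ R).
print(resolve(frozenset({"P", "L"}), frozenset({"~L", "R"})))  # [frozenset({'P', 'R'})]
```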

22 Inverse Resolution Operator
• Not deterministic
  – Multiple clauses C2 exist such that resolving C1 and C2 produces C
  – Prefer the shorter one
1. Given the resolvent C and the initial clause C1, find a literal L that occurs in C1 but not in C.
2. Form the second clause C2 by including the following literals:
   C2 = (C − (C1 − {L})) ∪ {¬L}
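The same clause representation gives a sketch of the inverse operator; the clauses in the usage example are illustrative, not taken from the slides.

```python
def inverse_resolve(c, c1):
    """Propositional inverse resolution (a sketch): given the resolvent C and
    one parent C1, build candidate second parents C2 = (C - (C1 - {L})) ∪ {~L}."""
    def negate(lit):
        return lit[1:] if lit.startswith("~") else "~" + lit

    candidates = []
    for lit in c1:
        if lit in c:
            continue                      # L must occur in C1 but not in C
        candidates.append((c - (c1 - {lit})) | {negate(lit)})
    # The operator is not deterministic in general; when several C2 result,
    # the learner prefers the shortest one.
    return sorted(candidates, key=len)

# Illustrative clauses: C = PassExam ∨ ~Study, C1 = PassExam ∨ ~KnowMaterial
# yields C2 = KnowMaterial ∨ ~Study.
print(inverse_resolve(frozenset({"PassExam", "~Study"}),
                      frozenset({"PassExam", "~KnowMaterial"})))
```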

23 Rule-Learning Algorithm Based on Inverse Entailment Operators
• Use the sequential covering algorithm:
1. Select a training example ⟨xi, f(xi)⟩ not yet covered by the learned clauses
2. Apply inverse resolution to generate candidate hypotheses hi that satisfy (B ∧ hi ∧ xi) ⊢ f(xi)
3. Iterate

24 First-Order Resolution
• θ is a unifying substitution for two literals L1 and L2 if L1θ = L2θ.
1. Find a literal L1 from C1, a literal L2 from C2, and a substitution θ such that L1θ = ¬L2θ.
2. Form the resolvent C by including all literals from C1θ and C2θ, except L1θ and ¬L2θ:
   C = (C1 − {L1})θ ∪ (C2 − {¬L2})θ
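A small Python sketch of unification for function-free literals (the restriction FOIL already imposes); the tuple representation and the convention that lowercase argument names are variables are assumptions of this sketch.

```python
def unify(lit1, lit2):
    """Unify two function-free literals (a sketch). A literal is a
    (predicate, args) tuple; lowercase argument names are variables,
    capitalized names are constants."""
    def is_var(term):
        return term[0].islower()

    (pred1, args1), (pred2, args2) = lit1, lit2
    if pred1 != pred2 or len(args1) != len(args2):
        return None

    theta = {}
    def chase(term):                      # follow existing bindings
        while is_var(term) and term in theta:
            term = theta[term]
        return term

    for a, b in zip(args1, args2):
        a, b = chase(a), chase(b)
        if a == b:
            continue
        if is_var(a):
            theta[a] = b
        elif is_var(b):
            theta[b] = a
        else:
            return None                   # two distinct constants: no unifier
    return theta

# θ = {'x': 'Bob', 'z': 'Shannon'} unifies Father(x, z) with Father(Bob, Shannon).
print(unify(("Father", ("x", "z")), ("Father", ("Bob", "Shannon"))))
```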

25 Inverting First-Order Resolution (1)
• C = (C1 − {L1})θ1 ∪ (C2 − {¬L2})θ2, where θ = θ1 θ2
• By definition, L2 = ¬L1 θ1 θ2⁻¹, so
  C2 = (C − (C1 − {L1})θ1) θ2⁻¹ ∪ {¬L1 θ1 θ2⁻¹}

26 Inverting First-Order Resolution (2)
• Training data D = {GrandChild(Bob, Shannon)}
• Background knowledge B = {Father(Shannon, Tom), Father(Tom, Bob)}


