
1 First-Order Rule Learning

2 Sequential Covering (I)
Learning consists of iteratively learning rules that cover as-yet-uncovered training instances.
Assume the existence of a Learn_one_Rule function:
  Input: a set of training instances
  Output: a single high-accuracy (not necessarily high-coverage) rule

3 Sequential Covering (II)
Algorithm Sequential_Covering(Instances)
  Learned_rules ← ∅
  Rule ← Learn_one_Rule(Instances)
  While Quality(Rule, Instances) > Threshold Do
    Learned_rules ← Learned_rules + Rule
    Instances ← Instances − {instances correctly classified by Rule}
    Rule ← Learn_one_Rule(Instances)
  Sort Learned_rules by Quality over Instances   # Quality is a user-defined rule quality evaluation function
  Return Learned_rules
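
Below is a minimal Python sketch of this loop; the learn_one_rule and quality callables and the rule.correctly_classifies method are placeholders supplied by the caller, not part of the slides.

def sequential_covering(instances, learn_one_rule, quality, threshold):
    """Greedy sequential covering: learn one rule at a time, then remove what it covers."""
    learned_rules = []
    rule = learn_one_rule(instances)
    while instances and quality(rule, instances) > threshold:
        learned_rules.append(rule)
        # Drop the instances the new rule already classifies correctly.
        instances = [i for i in instances if not rule.correctly_classifies(i)]
        if not instances:
            break
        rule = learn_one_rule(instances)
    # Try the highest-quality rules first at prediction time.
    learned_rules.sort(key=lambda r: quality(r, instances), reverse=True)
    return learned_rules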

4 CN2 (I)
Algorithm Learn_one_Rule_CN2(Instances, k)
  Best_hypo ← ∅
  Candidate_hypo ← {Best_hypo}
  While Candidate_hypo ≠ ∅ Do
    All_constraints ← {(a = v): a is an attribute and v is a value of a found in Instances}
    New_candidate_hypo ← for each h ∈ Candidate_hypo, for each c ∈ All_constraints, specialize h by adding c
    Remove from New_candidate_hypo any hypotheses that are duplicates, inconsistent, or not maximally specific
    For all h ∈ New_candidate_hypo
      If Quality_CN2(h, Instances) > Quality_CN2(Best_hypo, Instances)
        Best_hypo ← h
    Candidate_hypo ← the k best members of New_candidate_hypo, as per Quality_CN2
  Return a rule of the form "IF Best_hypo THEN Pred"
    # Pred = the most frequent value of the target attribute among the instances that match Best_hypo
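
A compact Python sketch of this general-to-specific beam search, assuming each instance is a dict of attribute values and that a quality_cn2 scoring function (sketched after the next slide) is passed in; all helper names are illustrative.

from collections import Counter

def learn_one_rule_cn2(instances, target_attr, k, quality_cn2):
    """Beam search of width k over conjunctions of (attribute = value) constraints."""
    best_hypo = frozenset()                      # the empty conjunction matches everything
    candidates = [best_hypo]
    constraints = {(a, v) for inst in instances
                   for a, v in inst.items() if a != target_attr}
    while candidates:
        # Specialize every candidate with every constraint on an attribute it does not use yet
        # (this also rules out duplicate and inconsistent hypotheses).
        new_candidates = {h | {c} for h in candidates for c in constraints
                          if c[0] not in {a for a, _ in h}}
        for h in new_candidates:
            if quality_cn2(h, instances, target_attr) > quality_cn2(best_hypo, instances, target_attr):
                best_hypo = h
        candidates = sorted(new_candidates,
                            key=lambda h: quality_cn2(h, instances, target_attr),
                            reverse=True)[:k]
    # Predict the most frequent target value among the instances matching Best_hypo.
    matching = [i for i in instances if all(i.get(a) == v for a, v in best_hypo)]
    pred = Counter(i[target_attr] for i in (matching or instances)).most_common(1)[0][0]
    return best_hypo, pred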

5 CN2 (II)
Algorithm Quality_CN2(h, Instances)
  h_instances ← {i ∈ Instances: i matches h}
  Return −Entropy(h_instances), where Entropy is computed with respect to the target attribute
Note that CN2 performs a general-to-specific beam search, keeping not the single best candidate at each step, but a list of the k best candidates.
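
A matching sketch of the quality function under the same dict-based instance representation (negative entropy, so higher is better):

import math
from collections import Counter

def quality_cn2(hypo, instances, target_attr):
    """Return -entropy of the target attribute over the instances matched by hypo."""
    matching = [i for i in instances if all(i.get(a) == v for a, v in hypo)]
    if not matching:
        return float("-inf")                     # a hypothesis that matches nothing is useless
    counts = Counter(i[target_attr] for i in matching)
    total = sum(counts.values())
    return sum((c / total) * math.log2(c / total) for c in counts.values())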

6 Illustrative Training Set

7 CN2 Example (I)
First pass: full instance set (class counts below are HIGH-MODERATE-LOW)
2-best1 (the two best candidates after one specialization step): « Income Level = Low » (4-0-0), « Income Level = High » (0-1-5)
Can't do better than (4-0-0)
Best_hypo: « Income Level = Low »
First rule: IF Income Level = Low THEN HIGH

8 CN2 Example (II)
Second pass: instances 2-3, 5-6, 8-10, 12-14
2-best1: « Income Level = High » (0-1-5), « Credit History = Good » (0-1-3)
Best_hypo: « Income Level = High »
2-best2: « Income Level = High AND Credit History = Good » (0-0-3), « Income Level = High AND Collateral = None » (0-0-3)
Best_hypo: « Income Level = High AND Credit History = Good »
Can't do better than (0-0-3)
Second rule: IF Income Level = High AND Credit History = Good THEN LOW

9 CN2 Example (III)
Third pass: instances 2-3, 5-6, 8, 12, 14
2-best1: « Credit History = Good » (0-1-0), « Debt Level = High » (2-1-0)
Best_hypo: « Credit History = Good »
Can't do better than (0-1-0)
Third rule: IF Credit History = Good THEN MODERATE

10 CN2 Example (IV)
Fourth pass: instances 2-3, 5-6, 8, 14
2-best1: « Debt Level = High » (2-0-0), « Income Level = Medium » (2-1-0)
Best_hypo: « Debt Level = High »
Can't do better than (2-0-0)
Fourth rule: IF Debt Level = High THEN HIGH

11 CN2 Example (V)
Fifth pass: instances 3, 5-6, 8
2-best1: « Credit History = Bad » (0-1-0), « Income Level = Medium » (0-1-0)
Best_hypo: « Credit History = Bad »
Can't do better than (0-1-0)
Fifth rule: IF Credit History = Bad THEN MODERATE

12 CN2 Example (VI)
Sixth pass: instances 3, 5-6
2-best1: « Income Level = High » (0-0-2), « Collateral = Adequate » (0-0-1)
Best_hypo: « Income Level = High »
Can't do better than (0-0-2)
Sixth rule: IF Income Level = High THEN LOW

13 CN2 Example (VII)
Seventh pass: instance 3
2-best1: « Credit History = Unknown » (0-1-0), « Debt Level = Low » (0-1-0)
Best_hypo: « Credit History = Unknown »
Can't do better than (0-1-0)
Seventh rule: IF Credit History = Unknown THEN MODERATE

14 CN2 Example (VIII)
Rules are ranked by Quality_CN2 = −Entropy, where Entropy = −Σ p_i log(p_i) over the class distribution each rule covers (lower entropy means higher rank):
Rule 1: (4-0-0) - Rank 1
Rule 2: (0-0-3) - Rank 2
Rule 3: (1-1-3) - Rank 5
Rule 4: (4-1-2) - Rank 6
Rule 5: (3-1-0) - Rank 4
Rule 6: (0-1-5) - Rank 3
Rule 7: (2-1-2) - Rank 7
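
A short check of this ranking, applying the −entropy score directly to the (HIGH-MODERATE-LOW) coverage counts listed above:

import math

def neg_entropy(counts):
    """-entropy of a class-count vector (higher is better)."""
    total = sum(counts)
    return sum((c / total) * math.log2(c / total) for c in counts if c > 0)

coverage = {1: (4, 0, 0), 2: (0, 0, 3), 3: (1, 1, 3), 4: (4, 1, 2),
            5: (3, 1, 0), 6: (0, 1, 5), 7: (2, 1, 2)}
ranking = sorted(coverage, key=lambda r: neg_entropy(coverage[r]), reverse=True)
print(ranking)   # [1, 2, 6, 5, 3, 4, 7] -- the ranks shown above (rules 1 and 2 tie at 0)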

15 CN2 Example (IX)
The learned rules, sorted by Quality_CN2:
IF Income Level = Low THEN HIGH
IF Income Level = High AND Credit History = Good THEN LOW
IF Income Level = High THEN LOW
IF Credit History = Bad THEN MODERATE
IF Credit History = Good THEN MODERATE
IF Debt Level = High THEN HIGH
IF Credit History = Unknown THEN MODERATE

16 Limitations of AVL (attribute-value languages) (I)
Consider the MONK1 problem:
  6 attributes: A1: 1, 2, 3; A2: 1, 2, 3; A3: 1, 2; A4: 1, 2, 3; A5: 1, 2, 3, 4; A6: 1, 2
  2 classes: 0, 1
  Target concept: If (A1 = A2 or A5 = 1) then Class 1
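
The target concept itself is a one-line test over an instance, e.g. in Python (parameter names simply mirror the attribute names):

def monk1_class(a1, a2, a3, a4, a5, a6):
    """MONK1 target concept: Class 1 iff A1 = A2 or A5 = 1."""
    return 1 if (a1 == a2 or a5 == 1) else 0

# The A1 = A2 condition relates two attributes to each other; a propositional
# (attribute = value) language can only express it by enumerating every equal pair,
# which is exactly what the rule set on the following slides has to do.
print(monk1_class(2, 2, 1, 3, 4, 2))   # 1, because A1 = A2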

17 Limitations of AVL (II) Can you build a decision tree for this concept?

18 Limitations of AVL (III)
Can you build a rule set for this concept?
If A1=1 and A2=1 then Class=1
If A1=2 and A2=2 then Class=1
If A1=3 and A2=3 then Class=1
If A5=1 then Class=1
Otherwise Class=0

19 First-Order Language
Supports first-order concepts, so relations between attributes are accounted for in a natural way
For simplicity, restrict attention to Horn clauses
A clause is any disjunction of literals whose variables are universally quantified
A Horn clause has a single non-negated literal:
  H ∨ ¬L1 ∨ … ∨ ¬Ln, equivalently written H ← L1 ∧ … ∧ Ln

20 FOIL (I)
Algorithm FOIL(Target_predicate, Predicates, Examples)
  Pos ← those Examples for which Target_predicate is true
  Neg ← those Examples for which Target_predicate is false
  Learned_rules ← ∅
  While Pos ≠ ∅ Do
    New_rule ← the rule that predicts Target_predicate with no preconditions
    New_rule_neg ← Neg
    While New_rule_neg ≠ ∅ Do
      Candidate_literals ← GenCandidateLit(New_rule, Predicates)
      Best_literal ← argmax over L ∈ Candidate_literals of FoilGain(L, New_rule)
      Add Best_literal to New_rule's preconditions
      New_rule_neg ← subset of New_rule_neg that satisfies New_rule's preconditions
    Learned_rules ← Learned_rules + New_rule
    Pos ← Pos − {members of Pos covered by New_rule}
  Return Learned_rules
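
A structural sketch of FOIL's two nested loops in Python; gen_candidate_literals, foil_gain and covers are placeholders for the routines on the next two slides, with signatures assumed here for illustration.

def foil(pos, neg, gen_candidate_literals, foil_gain, covers):
    """Outer loop: add rules until every positive example is covered.
    Inner loop: add literals until the current rule covers no negative example."""
    learned_rules = []
    while pos:
        rule = []                                 # preconditions of the new rule, initially empty
        rule_neg = list(neg)
        while rule_neg:
            candidates = gen_candidate_literals(rule)
            best = max(candidates, key=lambda lit: foil_gain(lit, rule, pos, rule_neg))
            rule.append(best)
            rule_neg = [e for e in rule_neg if covers(rule, e)]
        learned_rules.append(rule)
        pos = [e for e in pos if not covers(rule, e)]
    return learned_rules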

21 FOIL (II)
Algorithm GenCandidateLit(Rule, Predicates)
  Let Rule be P(x1, …, xk) ← L1, …, Ln
  Return all literals of the form:
    Q(v1, …, vr), where Q is any predicate in Predicates and the vi's are either new variables or variables already present in Rule, with the constraint that at least one of the vi's must already exist as a variable in Rule
    Equal(xj, xk), where xj and xk are variables already present in Rule
    The negation of all of the above forms of literals
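
A sketch of candidate generation, representing a literal as a (predicate, argument-tuple) pair and a negated literal as ("not", literal); this representation and the arities dictionary are assumptions for illustration.

from itertools import product

def gen_candidate_literals(rule_vars, predicates, arities):
    """Generate Q(v1, ..., vr) literals that reuse at least one variable already in the rule."""
    new_var = "v{}".format(len(rule_vars))        # one fresh variable name
    pool = list(rule_vars) + [new_var]
    candidates = []
    for q in predicates:
        for args in product(pool, repeat=arities[q]):
            if any(a in rule_vars for a in args):  # at least one existing rule variable
                candidates.append((q, args))
                candidates.append(("not", (q, args)))
    for a in rule_vars:                            # Equal(xj, xk) over existing variables
        for b in rule_vars:
            if a < b:
                candidates.append(("Equal", (a, b)))
                candidates.append(("not", ("Equal", (a, b))))
    return candidates

# e.g. gen_candidate_literals(["x", "y"], ["Father", "Female"], {"Father": 2, "Female": 1})
# yields Father/Female literals over x, y and one new variable, plus Equal(x, y) and all negations.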

22 FOIL (III)
Algorithm FoilGain(L, Rule)
  Return t × ( log2( p1 / (p1 + n1) ) − log2( p0 / (p0 + n0) ) )
  where
    p0 is the number of positive bindings of Rule
    n0 is the number of negative bindings of Rule
    p1 is the number of positive bindings of Rule + L
    n1 is the number of negative bindings of Rule + L
    t is the number of positive bindings of Rule that are still covered after adding L to Rule
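
The same computation as a small Python function, a direct transcription of the formula above (binding counts are passed in; returning 0 when p1 = 0 is a convention assumed here):

import math

def foil_gain(p0, n0, p1, n1, t):
    """FOIL gain of adding a literal, given binding counts before (p0, n0) and after (p1, n1)."""
    if p1 == 0 or p0 == 0:
        return 0.0                                # no positive bindings: nothing to gain
    return t * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

# Adding Father(y, z) in the GrandDaughter illustration below:
print(round(foil_gain(p0=1, n0=15, p1=1, n1=11, t=1), 3))   # 0.415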

23 Illustration (I)
Consider the data:
  GrandDaughter(Victor, Sharon)
  Father(Sharon, Bob)
  Father(Tom, Bob)
  Female(Sharon)
  Father(Bob, Victor)
Target concept: GrandDaughter(x, y)
Closed-world assumption: any ground fact not listed is taken to be false

24 Illustration (II)
Training set:
Positive examples: GrandDaughter(Victor, Sharon)
Negative examples:
  GrandDaughter(Victor, Victor), GrandDaughter(Victor, Bob), GrandDaughter(Victor, Tom),
  GrandDaughter(Sharon, Victor), GrandDaughter(Sharon, Sharon), GrandDaughter(Sharon, Bob), GrandDaughter(Sharon, Tom),
  GrandDaughter(Bob, Victor), GrandDaughter(Bob, Sharon), GrandDaughter(Bob, Bob), GrandDaughter(Bob, Tom),
  GrandDaughter(Tom, Victor), GrandDaughter(Tom, Sharon), GrandDaughter(Tom, Bob), GrandDaughter(Tom, Tom)
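
A tiny sketch of how these 15 negatives follow from the closed-world assumption (the constants and the single positive fact are taken from the previous slide):

from itertools import product

constants = ["Victor", "Sharon", "Bob", "Tom"]
positives = {("Victor", "Sharon")}                # the only known GrandDaughter fact

# Under the closed-world assumption, every other (x, y) pair is a negative example.
negatives = [pair for pair in product(constants, repeat=2) if pair not in positives]
print(len(negatives))   # 15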

25 Illustration (III)
Most general rule: GrandDaughter(x, y) ←
Specializations:
  Father(x, y), Father(x, z), Father(y, x), Father(y, z), Father(z, x), Father(z, y)
  Female(x), Female(y)
  Equal(x, y)
  Negations of each of the above

26 Illustration (IV)
Consider the 1st specialization: GrandDaughter(x, y) ← Father(x, y)
16 possible bindings: x/Victor, y/Victor; x/Victor, y/Sharon; …; x/Tom, y/Tom
FoilGain:
  p0 = 1 (x/Victor, y/Sharon)
  n0 = 15
  p1 = 0 (no Father fact is also the GrandDaughter pair)
  n1 = 3 (the three Father bindings, all negative)
  t = 0
So FoilGain(1st specialization) = 0

27 Illustration (V)
Consider the 4th specialization: GrandDaughter(x, y) ← Father(y, z)
64 possible bindings: x/Victor, y/Victor, z/Victor; x/Victor, y/Victor, z/Sharon; …; x/Tom, y/Tom, z/Tom
FoilGain:
  p0 = 1 (x/Victor, y/Sharon)
  n0 = 15
  p1 = 1 (x/Victor, y/Sharon, z/Bob)
  n1 = 11: (x/Victor, y/Bob, z/Victor), (x/Victor, y/Tom, z/Bob), (x/Sharon, y/Bob, z/Victor), (x/Sharon, y/Tom, z/Bob), (x/Sharon, y/Sharon, z/Bob), (x/Bob, y/Tom, z/Bob), (x/Bob, y/Sharon, z/Bob), (x/Bob, y/Bob, z/Victor), (x/Tom, y/Sharon, z/Bob), (x/Tom, y/Bob, z/Victor), (x/Tom, y/Tom, z/Bob)
  t = 1
So FoilGain(4th specialization) ≈ 0.415
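
A brute-force check of this step, enumerating the bindings of the empty-bodied rule and of the specialization (the facts are those of the illustration; the gain expression is the count-based formula above):

from itertools import product
import math

constants = ["Victor", "Sharon", "Bob", "Tom"]
father = {("Sharon", "Bob"), ("Tom", "Bob"), ("Bob", "Victor")}
granddaughter = {("Victor", "Sharon")}

# Bindings of GrandDaughter(x, y) <- (empty body): all 16 (x, y) pairs.
p0 = sum((x, y) in granddaughter for x, y in product(constants, repeat=2))
n0 = 16 - p0

# Bindings of GrandDaughter(x, y) <- Father(y, z): the (x, y, z) triples with Father(y, z) true.
bindings = [(x, y, z) for x, y, z in product(constants, repeat=3) if (y, z) in father]
p1 = sum((x, y) in granddaughter for x, y, _ in bindings)
n1 = len(bindings) - p1
t = 1                                             # the one positive binding of the old rule survives

gain = t * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))
print(p0, n0, p1, n1, round(gain, 3))             # 1 15 1 11 0.415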

28 Illustration (VI)
Assume the 4th specialization is indeed selected
Partial rule: GrandDaughter(x, y) ← Father(y, z)
Still covers 11 negative examples
New set of candidate literals:
  All of the previous ones
  Female(z), Equal(x, z), Equal(y, z), Father(z, w), Father(w, z)
  Negations of each of the above

29 Illustration (VII)
Consider the specialization GrandDaughter(x, y) ← Father(y, z), Equal(x, z)
64 possible bindings: x/Victor, y/Victor, z/Victor; x/Victor, y/Victor, z/Sharon; …; x/Tom, y/Tom, z/Tom
FoilGain:
  p0 = 1 (x/Victor, y/Sharon, z/Bob)
  n0 = 11
  p1 = 0
  n1 = 3: (x/Victor, y/Bob, z/Victor), (x/Bob, y/Tom, z/Bob), (x/Bob, y/Sharon, z/Bob)
  t = 0
So FoilGain(this specialization) = 0

30 Illustration (VIII)
Consider the specialization GrandDaughter(x, y) ← Father(y, z), Father(z, x)
64 possible bindings: x/Victor, y/Victor, z/Victor; x/Victor, y/Victor, z/Sharon; …; x/Tom, y/Tom, z/Tom
FoilGain:
  p0 = 1 (x/Victor, y/Sharon, z/Bob)
  n0 = 11
  p1 = 1 (x/Victor, y/Sharon, z/Bob)
  n1 = 1 (x/Victor, y/Tom, z/Bob)
  t = 1
So FoilGain(this specialization) = log2(1/2) − log2(1/12) = log2(6) ≈ 2.585

31 Illustration (IX)
Assume that specialization is indeed selected
Partial rule: GrandDaughter(x, y) ← Father(y, z), Father(z, x)
Still covers 1 negative example
No new set of candidate literals (no new variable was introduced), so use all of the previous ones

32 Illustration (X)
Consider the specialization GrandDaughter(x, y) ← Father(y, z), Father(z, x), Female(y)
64 possible bindings: x/Victor, y/Victor, z/Victor; x/Victor, y/Victor, z/Sharon; …; x/Tom, y/Tom, z/Tom
FoilGain:
  p0 = 1 (x/Victor, y/Sharon, z/Bob)
  n0 = 1
  p1 = 1 (x/Victor, y/Sharon, z/Bob)
  n1 = 0
  t = 1
So FoilGain(this specialization) = log2(1/1) − log2(1/2) = 1

33 Illustration (XI)
No negative examples are covered and all positive examples are covered
So we get the final, correct rule:
  GrandDaughter(x, y) ← Father(y, z), Father(z, x), Female(y)

