
1 Learning Sets of Rules (2004.12). Edited by Wang Yinglin, Shanghai Jiaotong University

2 Note: library resources at www.lib.sjtu.edu.cn. Collection resources and electronic databases: Kluwer, Elsevier, IEEE, ..., China Journal Net, Wanfang Data, ...

3 Introduction
Goal: look at a new method for learning sets of "if-then" rules, a hypothesis representation that is easy to interpret.
Two kinds of rules are considered: propositional rules (rules without variables) and first-order predicate rules (rules with variables).

4 Introduction
So far: Method 1, learn a decision tree and convert it to rules; Method 2, a genetic algorithm that encodes the rule set as a bit string.
From now on: a new method, learning first-order rules using sequential covering.
First-order rules are difficult to represent with a decision tree or other propositional representation, e.g.:
If Parent(x,y) then Ancestor(x,y)
If Parent(x,z) and Ancestor(z,y) then Ancestor(x,y)

5 Contents
Introduction
Sequential Covering Algorithms
First-Order Rules
First-Order Inductive Learning (FOIL)
Induction as Inverted Deduction
Summary

6 Sequential Covering Algorithms
Algorithm:
1. Learn one rule that covers a certain number of positive examples.
2. Remove the examples covered by that rule.
3. Repeat until no positive examples are left.

7 Sequential Covering Algorithms
Each rule is required to have high accuracy but not necessarily high coverage.
High accuracy: when the rule's precondition is satisfied, its prediction should be correct.
Accepting low coverage: the rule need not make a prediction for every training example.

8 Sequential Covering Algorithms
Sequential-Covering (Target_attribute, Attributes, Examples, Threshold)
  Learned_rules ← {}
  Rule ← Learn-One-Rule (Target_attribute, Attributes, Examples)
  WHILE Performance (Rule, Examples) > Threshold DO
    Learned_rules ← Learned_rules + Rule   // add the new rule to the set
    Examples ← Examples - {examples correctly classified by Rule}
    Rule ← Learn-One-Rule (Target_attribute, Attributes, Examples)
  Sort-By-Performance (Learned_rules, Target_attribute, Examples)
  RETURN Learned_rules
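To make the control flow above concrete, here is a minimal Python sketch of the sequential covering loop. It is an illustration, not the algorithm as defined in any particular library: the helpers learn_one_rule, performance, and correctly_classifies are assumed to be supplied (for example, the beam-search Learn-One-Rule of slide 14 and an entropy- or accuracy-based score).

```python
def sequential_covering(target_attr, attributes, examples, threshold,
                        learn_one_rule, performance, correctly_classifies):
    """Greedy sequential covering: learn one rule at a time and remove the
    examples it already classifies correctly (assumed helper functions)."""
    learned_rules = []
    rule = learn_one_rule(target_attr, attributes, examples)
    while performance(rule, examples) > threshold:
        learned_rules.append(rule)
        # remove the examples the new rule classifies correctly
        examples = [ex for ex in examples if not correctly_classifies(rule, ex)]
        rule = learn_one_rule(target_attr, attributes, examples)
    # sort the learned rules by performance before returning them
    learned_rules.sort(key=lambda r: performance(r, examples), reverse=True)
    return learned_rules
```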

9 Sequential Covering Algorithms
One of the most widespread approaches to learning disjunctive sets of rules.
It reduces learning a disjunctive set of rules to a sequence of single-rule problems (each rule is a conjunction of attribute tests).
It performs a greedy search with no backtracking, so it may not find an optimal rule set.
It sequentially covers the positive examples until the performance of the best remaining rule falls below a threshold.

10 Sequential Covering Algorithms: General to Specific Beam Search
How do we learn each individual rule?
Requirements for Learn-One-Rule: high accuracy, but not necessarily high coverage.
One approach: proceed as in decision tree learning (ID3), but follow only the single branch with the best performance at each search step.

11 Sequential Covering Algorithms: General to Specific Beam Search
Search tree, from most general to more specific:
IF {} THEN Play-Tennis = Yes
  one-test specializations, e.g.: IF {Humidity = Normal} THEN Play-Tennis = Yes; IF {Wind = Strong} THEN Play-Tennis = No; IF {Wind = Light} THEN Play-Tennis = Yes; IF {Humidity = High} THEN Play-Tennis = No; ...
  two-test specializations of the best candidate, e.g.: IF {Humidity = Normal, Outlook = Sunny} THEN Play-Tennis = Yes; IF {Humidity = Normal, Wind = Strong} THEN Play-Tennis = Yes; IF {Humidity = Normal, Wind = Light} THEN Play-Tennis = Yes; IF {Humidity = Normal, Outlook = Rain} THEN Play-Tennis = Yes

12 Sequential Covering Algorithms: General to Specific Beam Search
Idea: organize the hypothesis space search in a general-to-specific fashion.
Start with the most general rule precondition, then greedily add the attribute test that most improves performance measured over the training examples.

13 Sequential Covering Algorithms: General to Specific Beam Search
Greedy search without backtracking brings the danger of a suboptimal choice at any step.
The algorithm can be extended to a beam search: keep a list of the k best candidates at each step; on each search step, generate descendants for each of these k candidates, then reduce the resulting set again to the k best.

14 Sequential Covering Algorithms: General to Specific Beam Search
Learn-One-Rule (Target_attribute, Attributes, Examples, k)
  Best_hypothesis ← the most general hypothesis (empty precondition)
  Candidate_hypotheses ← {Best_hypothesis}
  WHILE Candidate_hypotheses is not empty DO
    1. Generate the next more specific candidate hypotheses (New_candidate_hypotheses)
    2. Update Best_hypothesis:
       for all h in New_candidate_hypotheses,
         if Performance(h) > Performance(Best_hypothesis) then Best_hypothesis ← h
    3. Update Candidate_hypotheses:
       Candidate_hypotheses ← the best k members of New_candidate_hypotheses
  RETURN the rule "IF Best_hypothesis THEN prediction"
  (prediction: the most frequent value of Target_attribute among the examples covered by Best_hypothesis)
Performance (h, Examples, Target_attribute)
  h_examples ← the subset of Examples covered by h
  RETURN -Entropy(h_examples)
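The sketch below fills in this beam search in Python for categorical attributes, using the negative entropy of the covered examples as the performance measure. Each example is assumed to be a dict mapping attribute names to values; the representation and helper names are illustrative assumptions, not a fixed API.

```python
import math
from collections import Counter

def entropy(examples, target_attr):
    """Entropy of the target attribute over a list of example dicts."""
    counts = Counter(ex[target_attr] for ex in examples)
    total = len(examples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def covers(hypothesis, example):
    """A hypothesis is a frozenset of (attribute, value) constraints."""
    return all(example[a] == v for a, v in hypothesis)

def performance(hypothesis, examples, target_attr):
    covered = [ex for ex in examples if covers(hypothesis, ex)]
    # higher is better, so use negative entropy; no coverage is the worst score
    return -entropy(covered, target_attr) if covered else float("-inf")

def learn_one_rule(target_attr, attributes, examples, k=5):
    best = frozenset()                      # empty precondition: most general rule
    candidates = [best]
    all_constraints = {(a, ex[a]) for ex in examples for a in attributes}
    while candidates:
        # 1. generate the next, more specific candidate hypotheses
        new_candidates = {h | {c} for h in candidates for c in all_constraints
                          if c[0] not in {a for a, _ in h}}
        # 2. update the best hypothesis seen so far
        for h in new_candidates:
            if performance(h, examples, target_attr) > performance(best, examples, target_attr):
                best = h
        # 3. keep only the k best candidates (the beam)
        candidates = sorted(new_candidates,
                            key=lambda h: performance(h, examples, target_attr),
                            reverse=True)[:k]
    covered = [ex for ex in examples if covers(best, ex)]
    prediction = Counter(ex[target_attr] for ex in covered).most_common(1)[0][0]
    return dict(best), prediction           # rule: IF best THEN target_attr = prediction
```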

15 Sequential Covering Algorithms: General to Specific Beam Search
Generate the next more specific candidate hypotheses:
All_constraints ← the set of all constraints of the form (a = v), where a is a member of Attributes and v is a value of a that occurs in the current set of Examples
New_candidate_hypotheses ←
  for each h in Candidate_hypotheses, and for each c in All_constraints,
    create a specialization of h by adding the constraint c
Remove from New_candidate_hypotheses any hypotheses that are duplicates, inconsistent, or not maximally specific.

16-20 Learn-One-Rule Example
Training data (PlayTennis):

Day  Outlook   Temp  Humid  Wind    PlayTennis
D1   Sunny     Hot   High   Weak    No
D2   Sunny     Hot   High   Strong  No
D3   Overcast  Hot   High   Weak    Yes
D4   Rain      Mild  High   Weak    Yes
D5   Rain      Cool  Low    Weak    Yes
D6   Rain      Cool  Low    Strong  No
D7   Overcast  Cool  Low    Strong  Yes
D8   Sunny     Mild  High   Weak    No
D9   Sunny     Cool  Low    Weak    Yes
D10  Rain      Mild  Low    Weak    Yes
D11  Sunny     Mild  Low    Strong  Yes
D12  Overcast  Mild  High   Strong  Yes
D13  Overcast  Hot   Low    Weak    Yes
D14  Rain      Mild  High   Strong  No

1. best-hypothesis = IF T THEN PlayTennis(x) = Yes
2. candidate-hypotheses = {best-hypothesis}
3. all-constraints = {Outlook(x)=Sunny, Outlook(x)=Overcast, Temp(x)=Hot, ...}
4. new-candidate-hypotheses = {IF Outlook=Sunny THEN PlayTennis=Yes, IF Outlook=Overcast THEN PlayTennis=Yes, ...}
5. best-hypothesis = IF Outlook=Overcast THEN PlayTennis=Yes (this specialization covers only positive examples, so it has the best performance)

21 Sequential Covering Algorithms: Variations
Learn only rules that cover positive examples: when the fraction of positive examples is small, the algorithm can be modified to learn only from those rare examples; instead of entropy, use a measure that evaluates the fraction of positive examples covered by the hypothesis.
AQ algorithm: a different covering algorithm that searches rule sets for one particular target value; a different single-rule algorithm, guided by uncovered positive examples; only attribute values that appear in those examples are used.

22 Summary: Points for Consideration
Key design issues for learning sets of rules:
Sequential or simultaneous? Sequential: isolate one component of the hypothesis at a time; simultaneous: the whole hypothesis at once.
General-to-specific or specific-to-general? G to S: Learn-One-Rule; S to G: Find-S.
Generate-and-test or example-driven? Generate-and-test: search through syntactically legal hypotheses; example-driven: Find-S, Candidate-Elimination.
Post-pruning of rules? A popular method for recovering from overfitting.

23 Summary: Points for Consideration
Which statistical evaluation method?
Relative frequency: n_c / n (n: examples matched by the rule; n_c: of those, the number the rule classifies correctly).
m-estimate of accuracy: (n_c + m*p) / (n + m), where p is the prior probability that a randomly drawn example has the classification assigned by the rule, and m is the weight (the equivalent number of examples for weighting this prior).
Entropy (of the examples matched by the rule).
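For concreteness, the three measures can be computed as follows (a small illustrative Python snippet, with made-up numbers in the usage example):

```python
import math

def relative_frequency(n_c, n):
    """n: examples matched by the rule; n_c: of those, classified correctly."""
    return n_c / n

def m_estimate(n_c, n, p, m):
    """m-estimate of accuracy with prior p and equivalent sample size m."""
    return (n_c + m * p) / (n + m)

def entropy(class_counts):
    """Entropy of the class distribution among the examples matched by the rule."""
    total = sum(class_counts)
    return -sum((c / total) * math.log2(c / total) for c in class_counts if c > 0)

# A rule matching 8 examples, 6 of them classified correctly, prior p = 0.5, m = 2:
print(relative_frequency(6, 8))   # 0.75
print(m_estimate(6, 8, 0.5, 2))   # 0.7
print(entropy([6, 2]))            # about 0.811
```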

24 Learning First-Order Rules
From now on we consider learning rules that contain variables (first-order rules).
Inductive learning of first-order rules is called inductive logic programming (ILP); it can be viewed as automatically inferring Prolog programs.
Two methods are considered: FOIL, and induction as inverted deduction.

25 Learning First-Order Rules
A first-order rule is a rule that contains variables. Example:
Ancestor(x, y) ← Parent(x, y)
Ancestor(x, y) ← Parent(x, z) ∧ Ancestor(z, y)   (recursive)
First-order rules are more expressive than propositional rules. Compare:
IF (Father1 = Bob) ∧ (Name2 = Bob) ∧ (Female1 = True) THEN Daughter1,2 = True
IF Father(y, x) ∧ Female(y) THEN Daughter(x, y)

26 Learning First-Order Rules: Terminology
Constants (常量): e.g., John, Kansas, 42
Variables (变量): e.g., Name, State, x
Predicates (谓词): e.g., Father-Of, Greater-Than
Functions (函数): e.g., age, cosine
Term (项): a constant, a variable, or a function applied to terms
Literals / atoms (文字): a predicate applied to terms, or its negation, e.g., Greater-Than(age(John), 42), Female(Mary), ¬Female(x)
Clause (子句): a disjunction of literals with implicit universal quantification, e.g., ∀x: Female(x) ∨ Male(x)
Horn clause (Horn 子句): a clause with at most one positive literal: H ∨ ¬L1 ∨ ¬L2 ∨ … ∨ ¬Ln

27 Learning First-Order Rules: Terminology
First-order Horn clauses: rules that have one or more preconditions and a single consequent; the predicates may contain variables.
The following forms of a Horn clause are equivalent:
H ∨ ¬L1 ∨ … ∨ ¬Ln
H ← (L1 ∧ … ∧ Ln)
IF L1 ∧ … ∧ Ln THEN H

28 Learning First-Order Rules: First-Order Inductive Learning (FOIL)
FOIL is a natural extension of sequential covering plus Learn-One-Rule.
FOIL rules are similar to Horn clauses, with two exceptions:
a syntactic restriction: no function symbols are allowed;
they are more expressive than Horn clauses in one respect: negated literals are allowed in rule bodies.

29 Learning First-Order Rules: First-Order Inductive Learning (FOIL)
FOIL (Target_predicate, Predicates, Examples)
  Pos ← those Examples for which Target_predicate is True
  Neg ← those Examples for which Target_predicate is False
  Learned_rules ← {}
  WHILE Pos is not empty DO
    NewRule ← the rule that predicts Target_predicate with no preconditions
    NewRuleNeg ← Neg
    WHILE NewRuleNeg is not empty DO
      Candidate_literals ← candidate new literals for NewRule
      Best_literal ← argmax_L Foil_Gain (L, NewRule)
      add Best_literal to the preconditions of NewRule
      NewRuleNeg ← the subset of NewRuleNeg that satisfies the preconditions of NewRule
    Learned_rules ← Learned_rules + NewRule
    Pos ← Pos - {members of Pos covered by NewRule}
  RETURN Learned_rules
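A compact Python skeleton of this control flow is given below. It only mirrors the two loops; candidate_literals, foil_gain, and covers are assumed helper functions (a full implementation would also have to enumerate variable bindings over the known facts, as discussed on the following slides).

```python
def foil(target_predicate, predicates, pos, neg,
         candidate_literals, foil_gain, covers):
    """FOIL skeleton: the outer loop adds rules, the inner loop adds literals
    to one rule until it covers no negative examples (assumed helpers)."""
    learned_rules = []
    while pos:
        new_rule = {"head": target_predicate, "body": []}   # no preconditions yet
        rule_neg = list(neg)
        while rule_neg:
            literals = candidate_literals(new_rule, predicates)
            best = max(literals, key=lambda lit: foil_gain(lit, new_rule))
            new_rule["body"].append(best)
            # keep only the negatives still satisfying the rule's preconditions
            rule_neg = [ex for ex in rule_neg if covers(new_rule, ex)]
        learned_rules.append(new_rule)
        pos = [ex for ex in pos if not covers(new_rule, ex)]
    return learned_rules
```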

30 Learning First-Order Rules: First-Order Inductive Learning (FOIL)
FOIL learns rules only for the case where the target literal is True (cf. the propositional sequential covering algorithm, which learns rules for both the True and False values).
Outer loop: adds a new rule to the disjunctive hypothesis; a specific-to-general search at the level of rule sets.
Inner loop: finds the conjunction of preconditions for one rule; a general-to-specific search that starts with an empty precondition and adds literals one at a time by hill climbing (cf. Learn-One-Rule, which performs a beam search).

31 FOIL: Generating Candidate Specializations in FOIL
Generate new literals, each of which may be added to the rule's preconditions.
Current rule: P(x1, x2, …, xk) ← L1 ∧ … ∧ Ln
Add a new literal Ln+1 to obtain a more specific Horn clause.
The new literal may take one of these forms:
Q(v1, v2, …, vr), where Q is a predicate from Predicates (and at least one vi must already be present in the rule);
Equal(xj, xk), where xj and xk are variables already present in the rule;
the negation of either of the above forms.

32 FOIL: Guiding the Search in FOIL
Consider all possible bindings (substitutions) of the rule's variables; prefer rules that possess more positive bindings.
Foil_Gain(L, R), where L is the candidate literal to add to rule R:
p0 = number of positive bindings of R
n0 = number of negative bindings of R
p1 = number of positive bindings of R + L
n1 = number of negative bindings of R + L
t = number of positive bindings of R that are still covered by R + L
Foil_Gain(L, R) = t * ( log2( p1 / (p1 + n1) ) - log2( p0 / (p0 + n0) ) )
It is based on the numbers of positive and negative bindings covered before and after adding the new literal.
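The gain itself is a one-line computation once the binding counts are known; here is a small Python helper (the guard for zero counts is an implementation choice, not part of the slide):

```python
import math

def foil_gain(p0, n0, p1, n1, t):
    """t * (log2(p1/(p1+n1)) - log2(p0/(p0+n0))), the FOIL gain of adding L to R."""
    if p0 == 0 or p1 == 0:
        return float("-inf")   # the rule (or its specialization) covers no positive bindings
    return t * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

# For instance, a rule with 1 positive and 15 negative bindings, whose specialization
# keeps that positive binding and leaves 1 positive and 11 negative bindings:
print(foil_gain(p0=1, n0=15, p1=1, n1=11, t=1))   # log2(16/12), about 0.415
```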

33 FOIL: Example
Target literal: GrandDaughter(x, y)
Training assertions: GrandDaughter(Victor, Sharon), Father(Sharon, Bob), Father(Tom, Bob), Female(Sharon), Father(Bob, Victor)
Initial step: GrandDaughter(x, y) ←
  positive binding: {x/Victor, y/Sharon}; all other bindings are negative

34 FOIL: Example
Candidate additions to the rule preconditions: Equal(x, y), Female(x), Female(y), Father(x, y), Father(y, x), Father(x, z), Father(z, x), Father(y, z), Father(z, y), and their negations.
For each candidate, calculate Foil_Gain. If Father(y, z) has the maximum Foil_Gain, it is selected and added to the preconditions of the rule: GrandDaughter(x, y) ← Father(y, z).
Iterating, we keep adding the best candidate literal until we generate a rule such as: GrandDaughter(x, y) ← Father(y, z) ∧ Father(z, x) ∧ Female(y).
At this point we remove all positive examples covered by the rule and begin the search for a new rule.

35 FOIL: Learning Recursive Rule Sets
The target predicate may occur in the rule body as well as in the head. Example:
Ancestor(x, y) ← Parent(x, z) ∧ Ancestor(z, y), i.e., the rule IF Parent(x, z) ∧ Ancestor(z, y) THEN Ancestor(x, y).
Learning recursive rules from relations: given an appropriate set of training examples, such rules can be learned with the same FOIL-based search, with the requirement that the target predicate (Ancestor) is itself included in Predicates.
Recursive rules still have to outscore the competing candidate literals on Foil_Gain.
How do we ensure termination (i.e., no infinite recursion)? [Quinlan, 1990; Cameron-Jones and Quinlan, 1993]

36 Induction as Inverted Deduction
Induction: inference from the specific to the general. Deduction: inference from the general to the specific.
Induction can be cast as a deduction problem:
(∀ ⟨xi, f(xi)⟩ ∈ D) (B ∧ h ∧ xi) ⊢ f(xi)
where D is a set of training data, B is background knowledge, xi is the i-th training instance, f(xi) is its target value, and X ⊢ Y means "Y follows deductively from X" (X entails Y).
For every training instance xi, the target value f(xi) must follow deductively from B, h, and xi.

37 Induction as Inverted Deduction
Target to learn: Child(u, v), meaning the child of u is v.
Positive example: Child(Bob, Sharon).
Given instance: Male(Bob), Female(Sharon), Father(Sharon, Bob).
Background knowledge: Parent(u, v) ← Father(u, v).
Hypotheses satisfying (B ∧ h ∧ xi) ⊢ f(xi):
h1: Child(u, v) ← Father(v, u)   (does not need B)
h2: Child(u, v) ← Parent(v, u)   (needs B)
The role of background knowledge: it expands the set of hypotheses; new predicates (such as Parent) can be introduced into hypotheses, as in h2.

38 Induction as Inverted Deduction
Viewing induction as the inverse of deduction, an inverse entailment operator is required:
O(B, D) = h such that (∀ ⟨xi, f(xi)⟩ ∈ D) (B ∧ h ∧ xi) ⊢ f(xi)
Input: training data D = {⟨xi, f(xi)⟩} and background knowledge B.
Output: a hypothesis h.

39 Induction as Inverted Deduction
Attractive features of this formulation of the learning task:
1. It subsumes the common definition of learning as finding a general concept that fits the training examples.
2. By incorporating the notion of background knowledge B, it allows a richer definition of when a hypothesis fits the data.
3. By incorporating B, it invites learning methods that use this background knowledge to guide the search for h.

40 Induction as Inverted Deduction
Practical difficulties of this formulation:
1. Its requirements do not naturally accommodate noisy training data.
2. The language of first-order logic is so expressive that the number of hypotheses satisfying the formulation is very large, which makes the search hard to constrain.
3. In most ILP systems, the complexity of the hypothesis space search increases as background knowledge B is increased.

41 Inverting Resolution
Resolution rule (L: a literal; P, R: clauses):
  P ∨ L
  ¬L ∨ R
  ---------
  P ∨ R
Resolution operator (propositional form): given initial clauses C1 and C2, find a literal L from clause C1 such that ¬L occurs in clause C2. Form the resolvent C by including all literals from C1 and C2, except for L and ¬L. More precisely, the set of literals occurring in the conclusion C is
C = (C1 - {L}) ∪ (C2 - {¬L})
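The propositional resolution operator translates almost directly into code. In this sketch the clause representation is an assumption: a clause is a frozenset of string literals, and negation is written with a leading '~'.

```python
def negate(literal):
    return literal[1:] if literal.startswith("~") else "~" + literal

def resolve(c1, c2):
    """All propositional resolvents of c1 and c2: for each literal L in c1 whose
    negation occurs in c2, form C = (c1 - {L}) | (c2 - {~L})."""
    return {frozenset((c1 - {lit}) | (c2 - {negate(lit)}))
            for lit in c1 if negate(lit) in c2}

c1 = frozenset({"PassExam", "~KnowMaterial"})
c2 = frozenset({"KnowMaterial", "~Study"})
print(resolve(c1, c2))   # {frozenset({'PassExam', '~Study'})}
```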

42 Inverting Resolution
Example 1:
C1: PassExam ∨ ¬KnowMaterial
C2: KnowMaterial ∨ ¬Study
C: PassExam ∨ ¬Study
Example 2:
C1: A ∨ B ∨ C ∨ ¬D
C2: ¬B ∨ E ∨ F
C: A ∨ C ∨ ¬D ∨ E ∨ F

43 Inverting Resolution
O(C, C1) performs inductive inference.
Inverse resolution operator (propositional form): given initial clauses C1 and C, find a literal L that occurs in clause C1 but not in clause C. Form the second clause C2 by including the following literals:
C2 = (C - (C1 - {L})) ∪ {¬L}
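With the same clause representation (and the negate helper) as in the sketch above, the inverse operator is equally short; note that it returns one candidate C2 per choice of L, reflecting the nondeterminism discussed on the next slide.

```python
def inverse_resolve(c, c1):
    """Candidate second parents C2 of resolvent c given one parent c1:
    C2 = (c - (c1 - {L})) | {~L} for each literal L of c1 that is not in c."""
    return {frozenset((c - (c1 - {lit})) | {negate(lit)})
            for lit in c1 if lit not in c}

c  = frozenset({"PassExam", "~Study"})
c1 = frozenset({"PassExam", "~KnowMaterial"})
print(inverse_resolve(c, c1))   # {frozenset({'KnowMaterial', '~Study'})}
```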

44 Inverting Resolution
Example 1:
C1: PassExam ∨ ¬KnowMaterial
C: PassExam ∨ ¬Study
C2: KnowMaterial ∨ ¬Study
Example 2:
C1: B ∨ D, C: A ∨ B, so C2: A ∨ ¬D (but C2: A ∨ ¬D ∨ B would also work)
Inverse resolution is nondeterministic.

45 Inverting Resolution: First-Order Resolution
First-order resolution relies on substitutions.
Substitution (置换): a mapping of variables to terms, e.g., θ = {x/Bob, z/y}.
Unifying substitution (合一置换): for two literals L1 and L2, a substitution θ such that L1θ = L2θ.
Example: L1 = Father(x, y), L2 = Father(Bill, z), θ = {x/Bill, z/y}; then L1θ = L2θ = Father(Bill, y).
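As a small illustration (the literal representation below is an assumption: a predicate name plus a tuple of arguments, with lower-case strings acting as variables), applying a substitution and checking that it unifies two literals looks like this in Python:

```python
def apply_subst(literal, theta):
    """Apply substitution theta (a dict from variables to terms) to a literal."""
    predicate, args = literal
    return (predicate, tuple(theta.get(arg, arg) for arg in args))

L1 = ("Father", ("x", "y"))
L2 = ("Father", ("Bill", "z"))
theta = {"x": "Bill", "z": "y"}

print(apply_subst(L1, theta))   # ('Father', ('Bill', 'y'))
print(apply_subst(L2, theta))   # ('Father', ('Bill', 'y'))
print(apply_subst(L1, theta) == apply_subst(L2, theta))   # True: theta unifies L1 and L2
```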

46 Inverting Resolution: First-Order Resolution
Resolution operator (first-order form): find a literal L1 from clause C1, a literal L2 from clause C2, and a substitution θ such that L1θ = ¬L2θ. Form the resolvent C by including all literals from C1θ and C2θ, except for L1θ and ¬L2θ. More precisely, the set of literals occurring in the conclusion C is
C = (C1 - {L1})θ ∪ (C2 - {L2})θ

47 Inverting Resolution: First-Order Resolution
Example: C1 = White(x) ← Swan(x), i.e., C1 = White(x) ∨ ¬Swan(x); C2 = Swan(Fred)
L1 = ¬Swan(x), L2 = Swan(Fred)
Unifying substitution θ = {x/Fred}, so L1θ = ¬L2θ = ¬Swan(Fred)
(C1 - {L1})θ = White(Fred)
(C2 - {L2})θ = Ø
Therefore C = White(Fred)

48 Inverting Resolution: First-Order Case
Inverse resolution, first-order case:
C = (C1 - {L1})θ1 ∪ (C2 - {L2})θ2, where θ = θ1θ2 (factorization)
C - (C1 - {L1})θ1 = (C2 - {L2})θ2, where L2 = ¬L1θ1θ2^-1
Therefore C2 = (C - (C1 - {L1})θ1)θ2^-1 ∪ {¬L1θ1θ2^-1}

49 Inverting Resolution: First-Order Case
Multistep inverse resolution (working upward from the conclusion):
GrandChild(Bob, Shannon)
  with C1 = Father(Shannon, Tom) and substitution {Shannon/x} yields
GrandChild(Bob, x) ∨ ¬Father(x, Tom)
  with C1 = Father(Tom, Bob) and substitution {Bob/y, Tom/z} yields
GrandChild(y, x) ∨ ¬Father(x, z) ∨ ¬Father(z, y)

50 Inverting Resolution
C = GrandChild(Bob, Shannon), C1 = Father(Shannon, Tom), L1 = Father(Shannon, Tom)
Suppose we choose the inverse substitutions θ1^-1 = {} and θ2^-1 = {Shannon/x}.
(C - (C1 - {L1})θ1)θ2^-1 = (Cθ1)θ2^-1 = GrandChild(Bob, x)
{¬L1θ1θ2^-1} = {¬Father(x, Tom)}
Therefore C2 = GrandChild(Bob, x) ∨ ¬Father(x, Tom), or equivalently GrandChild(Bob, x) ← Father(x, Tom)

51 Summary of Inverse Resolution
Inverse resolution provides a general way to automatically generate hypotheses h that satisfy (∀ ⟨xi, f(xi)⟩ ∈ D) (B ∧ h ∧ xi) ⊢ f(xi).
Differences from FOIL:
Inverse resolution is example-driven, whereas FOIL is generate-and-test.
Inverse resolution considers only a small fraction of the available data at each step, whereas FOIL considers all of the available data.
A search based on inverse resolution may appear more focused and more efficient, but in practice this is not necessarily the case.

52 Summary
Learning rules from data.
Sequential covering algorithms learn a disjunctive set of rules by first learning a single accurate rule, then removing the positive examples it covers, and repeating the process over the remaining examples. They provide an efficient greedy algorithm for learning rule sets and an alternative to top-down decision tree learning; decision tree algorithms can be viewed as simultaneous (parallel) covering, in contrast to sequential covering.
Learning single rules by search: beam search; alternative covering methods differ in the strategy they use to explore the space of rule preconditions.
Learning rule sets.
First-order rules: learning single first-order rules; representation: first-order Horn clauses; extending Sequential-Covering and Learn-One-Rule to allow variables in rule preconditions. One way to learn sets of first-order rules is to extend the sequential covering algorithm of CN2 from a propositional to a first-order representation.

53 Summary
FOIL learns sets of first-order rules, including simple recursive rule sets. Idea: induce logical rules from observed relations; guide the search in FOIL with Foil_Gain; learn recursive rule sets.
Induction as inverted deduction is another approach to learning sets of first-order rules: it searches for hypotheses by applying inverse operators of well-known deductive inference rules. Idea: induce logical rules as inverted deduction, O(B, D) = h such that (∀ ⟨xi, f(xi)⟩ ∈ D) (B ∧ h ∧ xi) ⊢ f(xi).
Inverse resolution is the inverse of the resolution operator, a rule of inference widely used in automated theorem proving.

54 End-of-chapter exercises

