
Rule Induction with Extension Matrices Yuen F. Helbig Dr. Xindong Wu.


2 Outline
• Extension matrix approach for rule induction
• The MFL and MCV optimization problems
• The AE1 solution
• The HCV solution
• Noise handling and discretization in HCV
• Comparison of HCV with ID3-like algorithms, including C4.5 and C4.5rules

3 Extension Matrix Terminology
• $a$: number of attributes
• $X_a$: the $a$th attribute
• $e^+$: vector of positive examples
• $e^-$: vector of negative examples
• $v^+_{ka}$: value of the $a$th attribute in the $k$th positive example
• $n$: number of negative examples
• $p$: number of positive examples
• $(r_{ij})_{a \times b}$: the $ij$th element of an $a \times b$ matrix
• $A(i,j)$: the $ij$th element of matrix $A$

4 Extension Matrix Definitions
• A positive example is an example that belongs to a known class, say 'Play'.
• All other examples are called negative examples.
For example, (overcast, mild, high, windy) => Play is a positive example, and (rainy, hot, high, windy) => Don't Play is a negative example.

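5 Negative Example Matrix
The negative example matrix (NEM) is the $n \times a$ matrix whose $i$th row is the $i$th negative example $e^-_i$, so $\mathrm{NEM}_{ij}$ is the value of attribute $X_j$ in $e^-_i$.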

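6 Extension Matrix
The extension matrix (EM) of a positive example $e^+_k$ against NEM is defined as $\mathrm{EM}_k = (r_{ij})_{n \times a}$, where
$$r_{ij} = \begin{cases} * & \text{when } v^+_{kj} = \mathrm{NEM}_{ij} \\ \mathrm{NEM}_{ij} & \text{when } v^+_{kj} \neq \mathrm{NEM}_{ij} \end{cases}$$
and $*$ denotes a dead element.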
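As a minimal sketch of this definition, the Python below builds EM_k row by row from NEM. It assumes examples are tuples of attribute values and uses `None` to stand for the dead element `*` (so no real attribute value may be `None`); the function name `extension_matrix` is hypothetical.

```python
from typing import Any, List, Optional, Sequence

DEAD = None  # stands for the dead element '*'


def extension_matrix(pos: Sequence[Any],
                     nem: Sequence[Sequence[Any]]) -> List[List[Optional[Any]]]:
    """Build EM_k of the positive example `pos` against NEM.

    r_ij is dead when pos agrees with negative example i on attribute j,
    and NEM(i, j) otherwise.
    """
    return [[DEAD if pos[j] == row[j] else row[j] for j in range(len(pos))]
            for row in nem]


# Weather examples from the Definitions slide; the attribute order
# (outlook, temperature, humidity, wind) is an assumption.
nem = [("rainy", "hot", "high", "windy")]
pos = ("overcast", "mild", "high", "windy")
print(extension_matrix(pos, nem))  # [['rainy', 'hot', None, None]]
```

Only outlook and temperature survive as non-dead elements, because those are the only attributes on which the 'Play' example differs from the 'Don't Play' example.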

7 Example Extension Matrix: Negative Example Matrix (NEM) and Positive Example

8 Example Extension Matrix Extension Matrix (EM) Positive Example

9 Paths in Extension Matrices
A set of $n$ non-dead elements that come from $n$ different rows (one element per row) is called a path in an extension matrix. For example, $\{X_1 \neq 1, X_2 \neq 0, X_1 \neq 1\}$ and $\{X_1 \neq 1, X_3 \neq 1, X_2 \neq 0\}$ are paths in the extension matrix shown on the slide.
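The path definition can be illustrated by enumerating paths directly. In this hypothetical sketch (the names `paths` and `has_path` are illustrative, and `None` marks `*`), `itertools.product` picks one non-dead element per row. The 3 x 3 matrix is invented so that both example paths from the slide appear, since the slide's own matrix is not in the transcript.

```python
from itertools import product
from typing import Any, List, Optional, Sequence, Tuple

Element = Tuple[int, Any]  # (j, r_ij): a non-dead element in column j


def paths(em: Sequence[Sequence[Optional[Any]]]) -> List[Tuple[Element, ...]]:
    """Enumerate every path: one non-dead element (None marks '*') per row."""
    per_row = [[(j, v) for j, v in enumerate(row) if v is not None] for row in em]
    return list(product(*per_row))


def has_path(em: Sequence[Sequence[Optional[Any]]]) -> bool:
    """A path exists iff no row consists only of dead elements."""
    return all(any(v is not None for v in row) for row in em)


# Hypothetical 3 x 3 extension matrix.
em = [[1, None, None],
      [None, 0, 1],
      [1, 0, None]]
for path in paths(em):
    print(["X%d != %s" % (j + 1, v) for j, v in path])
```

The number of paths is the product of the per-row counts of non-dead elements, so full enumeration is only practical for tiny matrices; `has_path` answers the existence question in O(na).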

10 Conjunctive Formulas
A path in $\mathrm{EM}_k$, the extension matrix of the positive example $e^+_k$ against NEM, corresponds to a conjunctive formula, or cover, that covers $e^+_k$ against NEM.
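To see the correspondence concretely, the sketch below turns a path into the conjunction of its selectors [X_j != r_ij] and checks that the result is satisfied by the positive example and by none of the negative examples. The data and the helper names `formula_from_path` and `satisfies` are hypothetical.

```python
from typing import Any, FrozenSet, Iterable, Sequence, Tuple

Selector = Tuple[int, Any]  # (j, v) stands for the selector [X_{j+1} != v]


def formula_from_path(path: Iterable[Selector]) -> FrozenSet[Selector]:
    """The conjunctive formula of a path: the AND of its distinct selectors."""
    return frozenset(path)


def satisfies(example: Sequence[Any], formula: FrozenSet[Selector]) -> bool:
    """True if the example meets every selector [X_j != v] in the formula."""
    return all(example[j] != v for j, v in formula)


# Hypothetical data: positive example e+ and three negative examples.
pos = (0, 1, 0)
nem = [(1, 1, 0), (0, 0, 1), (1, 0, 0)]
# Its extension matrix is [[1, *, *], [*, 0, 1], [1, 0, *]]; take the path
# {X1 != 1, X2 != 0, X1 != 1}, which picks one element from each row.
f = formula_from_path([(0, 1), (1, 0), (0, 1)])
print(satisfies(pos, f))                   # True
print([satisfies(neg, f) for neg in nem])  # [False, False, False]
```

Each non-dead element r_ij differs from the positive example's value, so the positive example always satisfies [X_j != r_ij], while negative example i fails it; picking one element per row therefore excludes every negative example.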

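11 Extension Matrix Disjunction
The extension matrix disjunction (EMD) of the positive examples $\{e^+_{k_1}, \ldots, e^+_{k_m}\}$ against NEM is the matrix $(r_{ij})_{n \times a}$ with
$$r_{ij} = \begin{cases} \mathrm{NEM}_{ij} & \text{when all of } \mathrm{EM}_{k_1}(i,j), \ldots, \mathrm{EM}_{k_m}(i,j) \neq * \\ * & \text{otherwise} \end{cases}$$
A path in the EMD of $\{e^+_{k_1}, \ldots, e^+_{k_m}\}$ against NEM corresponds to a conjunctive formula, or cover, which covers all of $\{e^+_{k_1}, \ldots, e^+_{k_m}\}$ against NEM, and vice versa.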
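A minimal sketch of the EMD definition, assuming the same list-of-lists representation with `None` for `*`: an element survives only if it is non-dead in every constituent extension matrix. The negative and positive examples, and the helper names, are invented for illustration.

```python
from typing import Any, List, Optional, Sequence

Matrix = List[List[Optional[Any]]]  # None marks a dead element '*'


def extension_matrix(pos: Sequence[Any], nem: Sequence[Sequence[Any]]) -> Matrix:
    return [[None if p == v else v for p, v in zip(pos, row)] for row in nem]


def emd(*ems: Matrix) -> Matrix:
    """r_ij = NEM(i, j) if element (i, j) is non-dead in every EM, else dead."""
    first = ems[0]
    return [[first[i][j] if all(em[i][j] is not None for em in ems) else None
             for j in range(len(first[i]))]
            for i in range(len(first))]


nem = [(1, 1, 0), (0, 0, 1), (1, 0, 0)]  # hypothetical negative examples
em1 = extension_matrix((0, 1, 0), nem)    # [[1, *, *], [*, 0, 1], [1, 0, *]]
em2 = extension_matrix((0, 1, 1), nem)    # [[1, *, 0], [*, 0, *], [1, 0, 0]]
print(emd(em1, em2))  # [[1, None, None], [None, 0, None], [1, 0, None]]
```

Because the result still has a path (for instance X1 != 1, X2 != 0, X1 != 1), the two positive examples can share one conjunctive formula.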

12 EMD Example: Negative Example Matrix (NEM)

13 EMD Example Extension Matrix Disjunction (EMD) Positive Example

14 EMD Example Positive Example Extension Matrix Disjunction (EMD)

15 EMD Example Positive Example Extension Matrix Disjunction (EMD)

16 MFL and MCV (1)
• The minimum formula problem (MFL): generating a conjunctive formula that covers a positive example, or an intersecting group of positive examples, against NEM and has the minimum number of different conjunctive selectors
• The minimum cover problem (MCV): seeking a cover that covers all positive examples in PE against NEM and has the minimum number of conjunctive formulae, with each conjunctive formula being as short as possible

17 MFL and MCV (2)
• Both problems are NP-hard.
• Two complete algorithms are designed to solve them when each attribute domain $D_i$ ($i = 1, \ldots, a$) satisfies $|D_i| = 2$:
  • $O(n a 2^a)$ for MFL
  • $O(n 2^a 4^a + p a^2 4^a)$ for MCV
• When $|D_i| > 2$, the domain can be decomposed into several domains, each having base 2.
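For very small matrices, MFL can be solved by brute force, which also makes the exponential cost visible. This sketch assumes that each distinct (attribute, value) pair counts as one selector, treats MFL as a set-cover search over the rows, and uses the hypothetical name `min_formula`; it is not the complete algorithm cited on the slide.

```python
from itertools import combinations
from typing import Any, List, Optional, Set, Tuple

Matrix = List[List[Optional[Any]]]  # None marks a dead element '*'
Selector = Tuple[int, Any]          # (j, v) stands for [X_{j+1} != v]


def min_formula(em: Matrix) -> Optional[Set[Selector]]:
    """Exhaustive MFL search: the fewest distinct selectors covering every row.

    Selector (j, v) covers row i when em[i][j] == v, i.e. negative example i
    has value v on X_j while the positive example does not.
    """
    candidates = sorted({(j, v) for row in em for j, v in enumerate(row)
                         if v is not None}, key=repr)
    for size in range(1, len(candidates) + 1):  # exponential in general
        for combo in combinations(candidates, size):
            if all(any(row[j] == v for j, v in combo) for row in em):
                return set(combo)
    return None  # some row is all dead, so no path and no formula exists


em = [[1, None, None],
      [None, 0, 1],
      [1, 0, None]]
print(sorted(min_formula(em)))  # [(0, 1), (1, 0)], i.e. [X1 != 1] and [X2 != 0]
```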

18 AE1 Heuristic
• Starting the search from the columns with the most non-dead elements
• Simplifying redundancy by deductive inference rules in mathematical logic

19 Problems with AE1
• AE1 can easily lose the optimum solution. In the extension matrix shown on the slide, AE1 selects $[X_2 \neq 0]$, $[X_1 \neq 1]$, and $[X_3 \neq 1]$ instead of $[X_1 \neq 1]$ and $[X_3 \neq 1]$.
• Simplifying redundancy for MFL and MCV is itself NP-hard.

20 What is HCV?
HCV is an extension-matrix-based rule induction algorithm which:
• Is heuristic
• Is attribute-based
• Is noise-tolerant
• Divides the positive examples into intersecting groups
• Uses the HFL heuristics to find a conjunctive formula that covers each intersecting group
• Has low-order polynomial time complexity at induction time

21 HCV Issues
• The HCV algorithm
• The HFL heuristics
• Speed and efficiency
• Noise handling capabilities
• Dealing with numeric and nominal data
• Accuracy and description compactness

22 HCV Algorithm (1)
Procedure HCV(EM_1, ..., EM_p; Hcv)
    integer n, a, p
    matrix EM_1(n,a), ..., EM_p(n,a), D(p)
    set Hcv
S1: D ← 0      /* D(j) = 1 (j = 1, ..., p) indicates that EM_j has been put into an intersecting group */
    Hcv ← ∅    /* initialization */
S2: for i = 1 to p, do
      if D(i) = 0 then {
        EM ← EM_i

23 HCV Algorithm (2)
        for j = i+1 to p, do
          if D(j) = 0 then {
            EM2 ← EM ∨ EM_j    /* extension matrix disjunction */
            if there exists at least one path in EM2 then { EM ← EM2; D(j) ← 1 }
          }
        next j
        call HFL(EM; Hfl)
        Hcv ← Hcv ∨ Hfl
      }
    next i
    Return(Hcv)
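The grouping loop of HCV can be sketched in Python as follows. This is an illustration under assumptions, not the authors' code: `None` marks `*`, the list `grouped` plays the role of D, HFL is passed in as a function, and `first_path_formula` is a deliberately naive, hypothetical stand-in for HFL that takes the first non-dead element of each row.

```python
from typing import Any, Callable, FrozenSet, List, Optional, Sequence

Matrix = List[List[Optional[Any]]]  # None marks a dead element '*'


def emd(em1: Matrix, em2: Matrix) -> Matrix:
    """Extension matrix disjunction of two EMs: dead if dead in either."""
    return [[x if x is not None and y is not None else None
             for x, y in zip(r1, r2)] for r1, r2 in zip(em1, em2)]


def has_path(em: Matrix) -> bool:
    return all(any(v is not None for v in row) for row in em)


def hcv(ems: Sequence[Matrix], hfl: Callable[[Matrix], Any]) -> List[Any]:
    """Group EM_1..EM_p into intersecting groups; call hfl once per group."""
    p = len(ems)
    grouped = [False] * p  # D(j): EM_j is already in some group
    cover: List[Any] = []  # Hcv: disjunction of one formula per group
    for i in range(p):
        if grouped[i]:
            continue
        em = ems[i]
        for j in range(i + 1, p):
            if not grouped[j]:
                em2 = emd(em, ems[j])
                if has_path(em2):  # e+_j can join the current group
                    em, grouped[j] = em2, True
        cover.append(hfl(em))
    return cover


def first_path_formula(em: Matrix) -> FrozenSet:
    """Hypothetical stand-in for HFL: first non-dead element of each row.
    Assumes every row has one (consistent training data)."""
    return frozenset(next((j, v) for j, v in enumerate(row) if v is not None)
                     for row in em)


# EMs of hypothetical positives (0,1,0), (0,1,1), (1,0,1) against the
# negatives (1,1,0), (0,0,1), (1,0,0).
em1 = [[1, None, None], [None, 0, 1], [1, 0, None]]
em2 = [[1, None, 0], [None, 0, None], [1, 0, 0]]
em3 = [[None, 1, 0], [0, None, None], [None, None, 0]]
print(hcv([em1, em2, em3], first_path_formula))
```

Here the first two positives form one intersecting group, while adding the third would leave a row of dead elements, so it starts a group of its own; the result therefore holds two conjunctive formulas, read as their disjunction.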

24 HFL - Fast Strategy
The selector $[X_5 \neq \{\text{normal}, \text{dry-peep}\}]$ is a possible selector, since it covers all 5 rows.
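One reading of the fast strategy, sketched below, assumes it looks for a column with no dead element in any still-uncovered row: the selector [X_j != {values in that column}] then covers every such row at once. The matrix and the function name `fast_strategy` are hypothetical; only the fifth column echoes the slide's X5 values.

```python
from typing import Any, FrozenSet, Iterable, List, Optional, Tuple

Matrix = List[List[Optional[Any]]]  # None marks a dead element '*'


def fast_strategy(em: Matrix, uncovered: Iterable[int]
                  ) -> Optional[Tuple[int, FrozenSet[Any]]]:
    """Return (j, values) for the first column j with no dead element in any
    uncovered row; the selector [X_{j+1} != values] then covers them all."""
    rows = list(uncovered)
    for j in range(len(em[0])):
        if all(em[i][j] is not None for i in rows):
            return j, frozenset(em[i][j] for i in rows)
    return None


# Hypothetical 5-row matrix whose fifth column mirrors the slide's X5.
em = [[None, "high", None, None, "normal"],
      ["fast", None, None, None, "dry-peep"],
      [None, None, None, "yes", "normal"],
      [None, "high", None, None, "dry-peep"],
      ["fast", None, None, None, "normal"]]
j, values = fast_strategy(em, range(5))
print(j + 1, sorted(values))  # 5 ['dry-peep', 'normal']
```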

25 HFL - Precedence
The selectors $[X_1 \neq 1]$ and $[X_3 \neq 1]$ are two inevitable selectors in the extension matrix shown on the slide.
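A row with exactly one non-dead element can only be covered by that element, which is what makes a selector inevitable. The hypothetical sketch below collects such selectors and reports which rows they cover; the matrix is invented to reproduce the slide's two inevitable selectors, and the function names are illustrative.

```python
from typing import Any, Iterable, List, Optional, Set, Tuple

Matrix = List[List[Optional[Any]]]  # None marks a dead element '*'
Selector = Tuple[int, Any]          # (j, v) stands for [X_{j+1} != v]


def inevitable_selectors(em: Matrix, uncovered: Iterable[int]) -> Set[Selector]:
    """A row with exactly one non-dead element can only be covered by that
    element, so its selector must appear in every formula."""
    found = set()
    for i in uncovered:
        live = [(j, v) for j, v in enumerate(em[i]) if v is not None]
        if len(live) == 1:
            found.add(live[0])
    return found


def covered_rows(em: Matrix, selectors: Iterable[Selector]) -> Set[int]:
    """Rows i with em[i][j] == v for some selector (j, v)."""
    sel = list(selectors)
    return {i for i, row in enumerate(em) if any(row[j] == v for j, v in sel)}


# Hypothetical matrix in which [X1 != 1] and [X3 != 1] are inevitable.
em = [[1, None, None],
      [None, None, 1],
      [1, 0, 1]]
found = inevitable_selectors(em, range(3))
print(sorted(found), sorted(covered_rows(em, found)))  # [(0, 1), (2, 1)] [0, 1, 2]
```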

26 HFL - Elimination
Attribute $X_2$ can be eliminated by $X_3$.
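The slide does not spell out the elimination test, so the sketch below assumes one plausible criterion: X_j can be eliminated by X_k when every uncovered row that has a non-dead element in column j also has one in column k, so dropping column j cannot remove the last non-dead element of any row. The matrix and the function names are hypothetical.

```python
from typing import Any, Iterable, List, Optional, Tuple

Matrix = List[List[Optional[Any]]]  # None marks a dead element '*'


def find_eliminable(em: Matrix, uncovered: Iterable[int],
                    active: Iterable[int]) -> Optional[Tuple[int, int]]:
    """Return (j, k) if every uncovered row with a non-dead element in column j
    also has one in column k, so X_j can be eliminated by X_k."""
    rows_u = list(uncovered)
    cols = sorted(active)
    live = {j: {i for i in rows_u if em[i][j] is not None} for j in cols}
    for j in cols:
        for k in cols:
            if j != k and live[j] <= live[k]:
                return j, k
    return None


def eliminate(em: Matrix, j: int) -> Matrix:
    """Reset every element of column j to '*' (returns a new matrix)."""
    return [[None if c == j else v for c, v in enumerate(row)] for row in em]


em = [[1, None, 1],
      [None, 0, 1],
      [1, None, None]]
j, k = find_eliminable(em, range(3), range(3))
print("X%d can be eliminated by X%d" % (j + 1, k + 1))  # X2 can be eliminated by X3
print(eliminate(em, j))  # [[1, None, 1], [None, None, 1], [1, None, None]]
```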

27 HFL - Least Frequency
Attribute $X_1$ can be eliminated, and a path still exists.
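A sketch of the least-frequency step, assuming it ranks the remaining attributes by their number of non-dead elements in uncovered rows and resets the first column whose removal still leaves every uncovered row with a non-dead element (that is, a path still exists). The name `least_frequency` and the data are hypothetical.

```python
from typing import Any, Iterable, List, Optional, Tuple

Matrix = List[List[Optional[Any]]]  # None marks a dead element '*'


def least_frequency(em: Matrix, uncovered: Iterable[int], active: Iterable[int]
                    ) -> Optional[Tuple[int, Matrix]]:
    """Try active columns from fewest to most non-dead elements (in uncovered
    rows); eliminate the first one whose removal still leaves a path."""
    rows = list(uncovered)

    def freq(j: int) -> int:
        return sum(em[i][j] is not None for i in rows)

    for j in sorted(active, key=freq):
        reduced = [[None if c == j else v for c, v in enumerate(row)] for row in em]
        if all(any(v is not None for v in reduced[i]) for i in rows):
            return j, reduced
    return None


em = [[1, 0, 1],
      [None, 0, 1],
      [None, None, 1]]
j, reduced = least_frequency(em, range(3), range(3))
print("eliminate X%d" % (j + 1), reduced)
# eliminate X1 [[None, 0, 1], [None, 0, 1], [None, None, 1]]
```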

28 HFL Algorithm (1)
Procedure HFL(EM; Hfl)
S0: Hfl ← {}
S1: /* the fast strategy */
    Try the fast strategy on all rows which haven't been covered;
    If successful, add the corresponding selector to Hfl and return(Hfl)
S2: /* the precedence strategy */
    Apply the precedence strategy to the uncovered rows;
    If some inevitable selectors are found, add them to Hfl, label all the rows they cover, and go to S1

29 HFL Algorithm (2)
S3: /* the elimination strategy */
    Apply the elimination strategy to those attributes that have neither been selected nor eliminated;
    If an eliminable selector is found, reset all the elements in the corresponding column with *, and go to S2.
S4: /* the least-frequency strategy */
    Apply the least-frequency strategy to those attributes which have neither been selected nor eliminated, and find a least-frequency selector;
    Reset all the elements in the corresponding column with *, and go to S2.
Return(Hfl)
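Putting S0 to S4 together, the Python below is a hedged sketch of the HFL control flow, not the authors' implementation. It assumes `None` marks `*`, returns a dictionary from attribute index j to the excluded values (read as the conjunction of [X_j != values]), interprets each strategy in one plausible way since the slides give only their names, and adds one fallback that is not on the slides: if no unselected, uneliminated attribute has a non-dead element left at S4, it resets a selected attribute's column instead, so the loop always terminates.

```python
from typing import Any, Dict, List, Optional, Set

Matrix = List[List[Optional[Any]]]  # None marks a dead element '*'


def hfl(em: Matrix) -> Dict[int, Set[Any]]:
    """Sketch of HFL. Returns {j: values}, read as the conjunction of the
    selectors [X_{j+1} != values]. Assumes em has at least one path."""
    em = [list(row) for row in em]  # private copy; columns get reset to '*'
    a = len(em[0])
    uncovered = set(range(len(em)))
    active = set(range(a))          # attributes neither selected nor eliminated
    formula: Dict[int, Set[Any]] = {}

    def live(i: int) -> list:       # non-dead elements of row i
        return [(j, v) for j, v in enumerate(em[i]) if v is not None]

    def count(j: int) -> int:       # non-dead elements of column j, uncovered rows
        return sum(em[i][j] is not None for i in uncovered)

    def select(j: int, values: Set[Any]) -> None:
        formula.setdefault(j, set()).update(values)
        active.discard(j)
        uncovered.difference_update([i for i in uncovered if em[i][j] in values])

    def eliminate(j: int) -> None:
        active.discard(j)
        for row in em:
            row[j] = None

    step = 1
    while uncovered:
        if step == 1:               # S1: fast strategy
            for j in range(a):
                if count(j) == len(uncovered):
                    select(j, {em[i][j] for i in uncovered})
                    return formula
            step = 2
        elif step == 2:             # S2: precedence strategy
            inevitable = [live(i)[0] for i in uncovered if len(live(i)) == 1]
            for j, v in inevitable:
                select(j, {v})
            step = 1 if inevitable else 3
        elif step == 3:             # S3: elimination strategy
            rows = {j: {i for i in uncovered if em[i][j] is not None} for j in active}
            pair = next(((j, k) for j in sorted(active) for k in sorted(active)
                         if j != k and rows[j] <= rows[k]), None)
            if pair:
                eliminate(pair[0])
            step = 2 if pair else 4
        else:                       # S4: least-frequency strategy
            # Reaching S4 means every uncovered row has >= 2 non-dead elements,
            # so resetting one column cannot destroy the last path.
            pool = ([j for j in sorted(active) if count(j) > 0]
                    or [j for j in range(a) if count(j) > 0])  # assumed fallback
            eliminate(min(pool, key=count))
            step = 2
    return formula


em = [[1, 0, None], [None, 0, 1], [1, None, 1]]
print(sorted(hfl(em).items()))  # [(1, {0}), (2, {1})]
```

In this example no strategy applies until S4 resets column X1; precedence then finds [X2 != 0] and [X3 != 1] as inevitable, and together they cover all three rows.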

30 Complexity of HFL
• S1: $O(na)$
• S2: $O(na)$
• S3: $O(na^2)$
• S4: $O(na)$
• Overall: $O(a(na + na + na^2 + na)) = O(na^3)$

31 Complexity of HCV
• Worst-case time complexity
• Space requirement: $2na$

32 HCV Example

33 NEM

34 HCV Example EM 1 Positive Example 1

35 HCV Example EM 2 Positive Example 2

36 HCV Example EM 3 Positive Example 3

37 HCV Example EM 4 Positive Example 4

38 HCV Example EM 5 Positive Example 5

39 HCV Example EM 1 EM 2

40 HCV Example EM 1 EM 2 EM 3

41 HCV Example EM 1 EM 2 EM 3 EM 4

42 HCV Example EM 1 EM 2 EM 3 EM 4 EM 5

43 HCV Example HFL Step 1: Fast Strategy HFL Rules = {}

44 HCV Example HFL Step 2: Precedence HFL Rules = {}

45 HCV Example HFL Step 3: Elimination HFL Rules = {}

46 HCV Example HFL Rules = {} HFL Step 4: Least-Frequency

47 HCV Example HFL Step 4: Least-Frequency HFL Rules = {}

48 HCV Example HFL Step 2: Precedence HFL Rules = {ESR fast }

49 HCV Example HFL Step 2: Precedence HFL Rules = {ESR fast }

50 HCV Example HFL Step 1: Fast Strategy HFL Rules = {ESR fast, Auscultation normal }

51 HCV Example HFL Step 1: Fast Strategy HFL Rules = {ESR fast, Auscultation normal }

52 HCV Example HCV generated rule C4.5rules generated rule

53 Example (8)

54 HCV versus AE1
• The use of the extension matrix disjunction
• A reasonable solution to MFL and MCV
• Noise handling
• Discretization of attributes

55 HCV Noise Handling
• Don't-care values are treated as dead elements
• Approximate partitioning
• Stopping criteria

56 Discretization of Attributes
• Information gain heuristic
• Stop-splitting criteria:
  • Stop if the information gain on all cut points is the same.
  • Stop if the number of examples to split is less than a certain number.
  • Limit the total number of intervals.
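The slide names the criteria but not the formulas, so the sketch below uses the standard entropy-based information gain to pick the best cut point on a numeric attribute and applies the three stopping criteria listed. It is an illustrative assumption rather than HCV's actual discretizer, and the function names and the defaults `min_examples=4` and `max_intervals=4` are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def entropy(labels: Sequence[Hashable]) -> float:
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def cut_gains(part: List[Tuple[float, Hashable]]) -> List[Tuple[float, float]]:
    """(cut point, information gain) for every midpoint between distinct values."""
    part = sorted(part, key=lambda p: p[0])
    labels = [label for _, label in part]
    n, base = len(part), entropy(labels)
    gains = []
    for k in range(1, n):
        if part[k - 1][0] == part[k][0]:
            continue  # no cut between equal values
        gain = base - (k / n) * entropy(labels[:k]) - ((n - k) / n) * entropy(labels[k:])
        gains.append(((part[k - 1][0] + part[k][0]) / 2, gain))
    return gains


def discretize(values: Sequence[float], labels: Sequence[Hashable],
               min_examples: int = 4, max_intervals: int = 4) -> List[float]:
    """Split recursively at the best cut; both defaults are hypothetical."""
    cuts: List[float] = []
    pending = [list(zip(values, labels))]
    while pending and len(cuts) + 1 < max_intervals:  # limit on intervals
        part = pending.pop(0)
        if len(part) < min_examples:  # too few examples to split
            continue
        gains = cut_gains(part)
        if not gains or max(g for _, g in gains) - min(g for _, g in gains) < 1e-12:
            continue  # all cut points have the same gain
        cut = max(gains, key=lambda cg: cg[1])[0]
        cuts.append(cut)
        pending.append([p for p in part if p[0] <= cut])
        pending.append([p for p in part if p[0] > cut])
    return sorted(cuts)


print(discretize([1, 2, 3, 4, 5, 6, 7, 8], list("aaaabbbb")))  # [4.5]
```

After the first cut at 4.5, both halves are pure, so every candidate cut in them has the same (zero) gain and splitting stops.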

57 Comparison (1)
Table 1: Number of rules and conditions, using the Monk 1, 2 and 3 datasets as training sets 1, 2 and 3, respectively

58 Comparison (2) Table 2: Accuracy

59 Comparison (3)

60 Conclusions
• Rules generated by HCV take the form of variable-valued logic rules rather than decision trees.
• HCV generates very compact rules in low-order polynomial time.
• HCV provides noise handling and discretization.
• Its predictive accuracy is comparable to that of the ID3 family of algorithms, namely C4.5 and C4.5rules.

