Generating Non-Redundant Association Rules Mohammed J. Zaki.


1 Generating Non-Redundant Association Rules Mohammed J. Zaki

2 Yaeer Master©2 Outline
- Introduction
- Association Rules (reminder)
- Closed Frequent Itemsets
- Generating Rules
- Complexity Analysis
- Experimental Evaluation

3 Introduction
- Association rule discovery: the set of association rules can grow unwieldy, especially as the minimum support (frequency) threshold is lowered.
- Many rules are redundant.
- The number of redundant rules can be exponential in the length of the longest frequent itemset.
- For dense datasets it is not feasible to mine all frequent itemsets.

4 Introduction
Solution: use closed frequent itemsets.
- The set of closed frequent itemsets is orders of magnitude smaller than the set of all frequent itemsets.
- No information is lost.
- The rules are presented through a "generating set".
- Algorithm for mining closed itemsets: CHARM.

5 Association Rules

6 Mining Association Rules

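7 Mining Association Rules
- Find all frequent itemsets:
  - With $m$ items the search space contains $2^m$ candidate itemsets; the problem is NP-complete.
  - Assuming a bound on transaction length, the complexity is $O(r \cdot n \cdot 2^L)$, where $r$ is the number of maximal frequent itemsets, $n$ is the number of transactions, and $L$ is the length of the longest frequent itemset.
- Generate confident rules:
  - Each frequent itemset of size $k$ yields $2^k$ potential rules.
  - Complexity: $O(f \cdot 2^L)$, where $f$ is the number of frequent itemsets and $L$ is the length of the longest frequent itemset.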
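The exhaustive search described above can be sketched in a few lines. This is only an illustration: the six-transaction database `DB`, the threshold `min_count=3` (50% of six transactions), and all function names are assumptions introduced here, not content from the slides.

```python
from itertools import combinations

# Assumed example database (tid -> items); it is not shown in this transcript.
DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}
ITEMS = sorted(set("".join(DB.values())))  # the m = 5 items A, C, D, T, W


def support(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for row in DB.values() if set(itemset) <= set(row))


def frequent_itemsets(min_count):
    """Brute force: test every non-empty candidate among the 2^m itemsets."""
    found = {}
    for k in range(1, len(ITEMS) + 1):
        for combo in combinations(ITEMS, k):
            count = support(combo)
            if count >= min_count:
                found["".join(combo)] = count
    return found


freq = frequent_itemsets(min_count=3)  # assumed minsup: 50% of 6 transactions
# An itemset of size k splits into 2^k - 2 rules with non-empty sides.
candidate_rules = sum(2 ** len(x) - 2 for x in freq)
print(len(freq), candidate_rules)
```

Even on five items the brute-force pass visits every subset, which is why the bound grows with $2^L$.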

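8 Closed Frequent Itemsets – Defining a Galois Connection
- Let $\delta \subseteq I \times T$ be the binary relation (the database) between the set of items $I$ and the set of transaction ids $T$.
- The mappings:
  - $t : P(I) \to P(T)$, $t(X) = \{ y \in T \mid \forall x \in X,\ x \,\delta\, y \}$ (the tidset of the itemset $X$)
  - $i : P(T) \to P(I)$, $i(Y) = \{ x \in I \mid \forall y \in Y,\ x \,\delta\, y \}$ (the items common to the tidset $Y$)
- These mappings define a Galois connection between the partially ordered sets $P(I)$ and $P(T)$.
- Galois connection (antitone form): for all $a \in A$ and $b \in B$, $b \le F(a) \iff a \le G(b)$. Here: $Y \subseteq t(X) \iff X \subseteq i(Y)$.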

9 Galois Connection (cont.)
Properties:
1. $X_1 \subseteq X_2 \Rightarrow t(X_1) \supseteq t(X_2)$
2. $Y_1 \subseteq Y_2 \Rightarrow i(Y_1) \supseteq i(Y_2)$
3. $X \subseteq i(t(X))$ and $Y \subseteq t(i(Y))$

10 Galois Connection

11 Example
- $t(ACW) = t(A) \cap t(C) \cap t(W) = 1345 \cap 123456 \cap 12345 = 1345$
- $i(245) = CDW$
- $ACW \subseteq ACTW \Rightarrow t(ACW) = 1345 \supseteq 135 = t(ACTW)$
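The two mappings can be tried out directly. The sketch below assumes the six-transaction database `DB`, which is not reproduced in this transcript but is consistent with the tidsets quoted in the example. The helper names `powerset` and `label` are hypothetical.

```python
from itertools import combinations

# Assumed example database (tid -> items); it is not shown in this transcript.
DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}
ALL_ITEMS = frozenset("".join(DB.values()))


def t(itemset):
    """Tidset: ids of the transactions that contain every item of `itemset`."""
    return frozenset(tid for tid, row in DB.items() if set(itemset) <= set(row))


def i(tidset):
    """Itemset: the items common to every transaction in `tidset`."""
    common = set(ALL_ITEMS)
    for tid in tidset:
        common &= set(DB[tid])
    return frozenset(common)


def powerset(universe):
    """All subsets of `universe`, as frozensets."""
    elems = sorted(universe)
    return [frozenset(c) for k in range(len(elems) + 1)
            for c in combinations(elems, k)]


def label(values):
    """Compact label such as '1345' or 'CDW'."""
    return "".join(str(v) for v in sorted(values))


# Larger itemsets can only have smaller (or equal) tidsets.
antitone = all(t(x1) >= t(x2)
               for x1 in powerset(ALL_ITEMS)
               for x2 in powerset(ALL_ITEMS) if x1 <= x2)

print(label(t("ACW")), label(i({2, 4, 5})), antitone)
```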

12 Closure Operator
- $c : P(S) \to P(S)$ is a closure operator if it satisfies the following:
  1. Extension: $X \subseteq c(X)$
  2. Monotonicity: $X \subseteq Y \Rightarrow c(X) \subseteq c(Y)$
  3. Idempotency: $c(c(X)) = c(X)$
- Closure by composition:
  - $c_{it}(X) = i \circ t(X) = i(t(X))$
  - $c_{ti}(Y) = t \circ i(Y) = t(i(Y))$

13 Closure Operator – Round Trip

14 Closed Itemset - Definition
A closed itemset $X$ is an itemset that equals its own closure, i.e. $X = c_{it}(X)$.
Example: $c_{it}(AC) = i(t(AC)) = i(1345) = ACW$.
Conclusion: AC is not closed; ACW is closed.
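As a rough check of the closure round trip, the sketch below computes $c_{it}$ on an assumed six-transaction database (not shown in this transcript). It also tests the extension, monotonicity, and idempotency properties on every itemset. The function names are illustrative only.

```python
from itertools import combinations

# Assumed example database (tid -> items); it is not shown in this transcript.
DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}
ALL_ITEMS = frozenset("".join(DB.values()))


def t(itemset):
    return frozenset(tid for tid, row in DB.items() if set(itemset) <= set(row))


def i(tidset):
    common = set(ALL_ITEMS)
    for tid in tidset:
        common &= set(DB[tid])
    return frozenset(common)


def closure(itemset):
    """c_it(X) = i(t(X)): the round trip itemset -> tidset -> itemset."""
    return i(t(itemset))


def is_closed(itemset):
    return closure(itemset) == frozenset(itemset)


subsets = [frozenset(c) for k in range(len(ALL_ITEMS) + 1)
           for c in combinations(sorted(ALL_ITEMS), k)]

extension = all(x <= closure(x) for x in subsets)
monotone = all(closure(x) <= closure(y)
               for x in subsets for y in subsets if x <= y)
idempotent = all(closure(closure(x)) == closure(x) for x in subsets)

print("".join(sorted(closure("AC"))), is_closed("AC"), is_closed("ACW"))
```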

15 Closed vs. Frequent Itemsets

16 Concept - Definition
- For any closed itemset $X$ there exists a closed tidset $Y$ such that $Y = t(X)$ and $X = i(Y)$.
- The pair $X \times Y$ is called a concept.

17 Galois Lattice
- A concept $X_1 \times Y_1$ is a sub-concept of $X_2 \times Y_2$, written $X_1 \times Y_1 \le X_2 \times Y_2$, if and only if $X_1 \subseteq X_2$ (equivalently, $Y_2 \subseteq Y_1$).
- Let $B(\delta)$ be the set of all concepts.
- The ordered set $(B(\delta), \le)$ is a complete lattice, called the Galois lattice.

18 Galois Lattice of Concepts

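19 Frequent Closed Itemsets vs. Frequent Itemsets
- Lattice operations:
  - Join: $(X_1 \times Y_1) \vee (X_2 \times Y_2) = c_{it}(X_1 \cup X_2) \times (Y_1 \cap Y_2)$
  - Meet: $(X_1 \times Y_1) \wedge (X_2 \times Y_2) = (X_1 \cap X_2) \times c_{ti}(Y_1 \cup Y_2)$
- Frequent concept: a concept whose support is at least minsup, where the support of a concept is the cardinality of its closed tidset.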

20 Join and Meet Example
- Join: $(ACDW \times 45) \vee (CDT \times 56) = c_{it}(ACDW \cup CDT) \times (45 \cap 56) = ACDTW \times 5$
- Meet: $(ACDW \times 45) \wedge (CDT \times 56) = (ACDW \cap CDT) \times c_{ti}(45 \cup 56) = CD \times 2456$
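The join and meet of the two concepts can be reproduced with a short sketch. The database `DB` is assumed (it is not part of this transcript), and `concept`, `join`, `meet`, and `show` are hypothetical helper names.

```python
# Assumed example database (tid -> items); it is not shown in this transcript.
DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}
ALL_ITEMS = frozenset("".join(DB.values()))


def t(itemset):
    return frozenset(tid for tid, row in DB.items() if set(itemset) <= set(row))


def i(tidset):
    common = set(ALL_ITEMS)
    for tid in tidset:
        common &= set(DB[tid])
    return frozenset(common)


def concept(itemset):
    """Pair (closed itemset, its tidset) generated by `itemset`."""
    tids = t(itemset)
    return (i(tids), tids)


def join(c1, c2):
    """(X1 x Y1) v (X2 x Y2) = c_it(X1 u X2) x (Y1 n Y2)."""
    (x1, y1), (x2, y2) = c1, c2
    return (i(t(x1 | x2)), y1 & y2)


def meet(c1, c2):
    """(X1 x Y1) ^ (X2 x Y2) = (X1 n X2) x c_ti(Y1 u Y2)."""
    (x1, y1), (x2, y2) = c1, c2
    return (x1 & x2, t(i(y1 | y2)))


def show(c):
    items, tids = c
    return "".join(sorted(items)) + " x " + "".join(str(v) for v in sorted(tids))


a, b = concept("ACDW"), concept("CDT")
print(show(join(a, b)))
print(show(meet(a, b)))
```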

21 Frequent Concepts

22 Frequent Concepts
- Lemma 1: The support of an itemset $X$ equals the support of its closure, i.e. $\sigma(X) = \sigma(c_{it}(X))$.
- Therefore all frequent itemsets are uniquely determined by the closed itemsets, and they can be recovered by the join operation on the frequent concepts.
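Lemma 1 is easy to test empirically. The sketch below checks it on an assumed six-transaction database (not shown in this transcript) and compares the number of frequent itemsets with the number of frequent closed itemsets. The minimum support count of 3 is an assumption.

```python
from itertools import combinations

# Assumed example database (tid -> items); it is not shown in this transcript.
DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}
ALL_ITEMS = frozenset("".join(DB.values()))
MIN_COUNT = 3  # assumed minsup: 50% of 6 transactions


def t(itemset):
    return frozenset(tid for tid, row in DB.items() if set(itemset) <= set(row))


def i(tidset):
    common = set(ALL_ITEMS)
    for tid in tidset:
        common &= set(DB[tid])
    return frozenset(common)


def closure(itemset):
    return i(t(itemset))


def support(itemset):
    return len(t(itemset))


itemsets = [frozenset(c) for k in range(1, len(ALL_ITEMS) + 1)
            for c in combinations(sorted(ALL_ITEMS), k)]

# Lemma 1: an itemset and its closure occur in exactly the same transactions.
lemma1_holds = all(support(x) == support(closure(x)) for x in itemsets)

frequent = [x for x in itemsets if support(x) >= MIN_COUNT]
frequent_closed = {closure(x) for x in frequent}

print(lemma1_holds, len(frequent), len(frequent_closed))
print(sorted("".join(sorted(x)) for x in frequent_closed))
```

On this small example the closed representation already shrinks the collection considerably while still determining every frequent itemset's support.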

23 Redundant Rules
- Notation: $R_i$ denotes the rule $X_1^i \xrightarrow{p_i} X_2^i$, where $p_i$ is its confidence.
- Definition: A rule $R_1$ is more general than a rule $R_2$, denoted $R_1 \preceq R_2$, provided that $R_2$ can be generated by adding items to the antecedent or the consequent of $R_1$.
- Among rules with equal confidence, the non-redundant rules are those that are most general.

24 Rule Generation
- Lemma 2 (transitivity): Let $X_1, X_2, X_3$ be frequent closed itemsets with $X_1 \subseteq X_2 \subseteq X_3$. If $X_1 \xrightarrow{p} X_2$ and $X_2 \xrightarrow{q} X_3$, then $X_1 \xrightarrow{pq} X_3$.
- Observation: it is sufficient to consider rules among adjacent concepts.

25 Rule Generation – 100% conf.
- Lemma 3: An association rule $X_1 \xrightarrow{p} X_2$ has confidence $p = 1.0$ if and only if $t(X_1) \subseteq t(X_2)$.
- The 100% confidence rules are those directed from a super-concept to a sub-concept, i.e. down-arcs.
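The tidset condition of Lemma 3 can be checked exhaustively on a small example. The following sketch assumes the six-transaction database `DB` (not part of this transcript) and the usual definition of confidence as $\sigma(X_1 \cup X_2) / \sigma(X_1)$.

```python
from fractions import Fraction
from itertools import combinations

# Assumed example database (tid -> items); it is not shown in this transcript.
DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}
ITEMS = sorted(set("".join(DB.values())))


def t(itemset):
    return frozenset(tid for tid, row in DB.items() if set(itemset) <= set(row))


def confidence(x1, x2):
    """conf(X1 -> X2) = support(X1 u X2) / support(X1), kept exact."""
    return Fraction(len(t(set(x1) | set(x2))), len(t(x1)))


itemsets = [frozenset(c) for k in range(1, len(ITEMS) + 1)
            for c in combinations(ITEMS, k)]

# Lemma 3: confidence is exactly 1 precisely when t(X1) is a subset of t(X2).
lemma3_holds = all((confidence(a, b) == 1) == (t(a) <= t(b))
                   for a in itemsets for b in itemsets)

print(lemma3_holds, confidence("TW", "A"), confidence("W", "A"))
```

Every itemset here has non-zero support because transaction 5 contains all five items, so the division is always defined in this example.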

26 Rule Generation – 100% conf.

27 Rule Generation – 100% conf.
- Theorem 1: Let $R = \{R_1, \dots, R_n\}$ be a set of rules with 100% confidence ($p_i = 1.0$ for all $i$), such that $I_1 = c_{it}(X_1^i \cup X_2^i)$ and $I_2 = c_{it}(X_2^i)$ for all rules $R_i$. Let $R_I$ denote the 100% confidence rule $I_1 \xrightarrow{1.0} I_2$. Then all rules $R_i \ne R_I$ are more specific than $R_I$, and thus are redundant.

28 Rule Generation – 100% conf.
Example: TW → A, TW → AC, CTW → A
- $c_{it}(TW \cup A) = c_{it}(ATW) = ACTW$
- $c_{it}(TW \cup AC) = ACTW$
- $c_{it}(CTW \cup A) = ACTW$
The most general of the three rules is TW → A.
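The grouping in this example can be sketched as follows. The database `DB` is an assumption (it is not reproduced in this transcript), and `more_general` is a hypothetical helper that encodes the "add items to antecedent or consequent" relation.

```python
# Assumed example database (tid -> items); it is not shown in this transcript.
DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}
ALL_ITEMS = frozenset("".join(DB.values()))


def t(itemset):
    return frozenset(tid for tid, row in DB.items() if set(itemset) <= set(row))


def i(tidset):
    common = set(ALL_ITEMS)
    for tid in tidset:
        common &= set(DB[tid])
    return frozenset(common)


def closure(itemset):
    return "".join(sorted(i(t(itemset))))


def more_general(r1, r2):
    """True if r2 arises from r1 by adding items to either side (or r1 == r2)."""
    return set(r1[0]) <= set(r2[0]) and set(r1[1]) <= set(r2[1])


rules = [("TW", "A"), ("TW", "AC"), ("CTW", "A")]

# Closure of antecedent-plus-consequent, and of the consequent, for each rule.
keys = {(closure(x1 + x2), closure(x2)) for x1, x2 in rules}

# A rule is most general if every rule in the group can be built from it.
most_general = [r for r in rules if all(more_general(r, o) for o in rules)]

print(keys, most_general)
```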

29 Rule Generation – Confidence < 100%
- These rules go from sub-concepts to super-concepts, i.e. they correspond to up-arcs.
- Rules between non-adjacent concepts can be derived by transitivity. For example, C → W ($p = 0.83$) and W → A ($q = 0.8$) give C → A ($pq = 0.67$).
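The arithmetic in the transitivity example can be verified with exact fractions. The sketch assumes the six-transaction database `DB`, which is not shown in this transcript but is consistent with the confidences quoted above.

```python
from fractions import Fraction

# Assumed example database (tid -> items); it is not shown in this transcript.
DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}


def support(itemset):
    return sum(1 for row in DB.values() if set(itemset) <= set(row))


def confidence(x1, x2):
    """conf(X1 -> X2) = support(X1 u X2) / support(X1), kept exact."""
    return Fraction(support(set(x1) | set(x2)), support(x1))


p = confidence("C", "W")   # 5/6, about 0.83
q = confidence("W", "A")   # 4/5 = 0.8
pq = confidence("C", "A")  # 4/6, about 0.67

print(p, q, pq, round(float(pq), 2))
```

The product rule works here because the closures are nested (C, CW, ACW), which is the setting of the transitivity lemma.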

30 Rule Generation – Confidence < 100%

31 Rule Generation – Confidence < 100%
- Theorem 2: Let $R = \{R_1, \dots, R_n\}$ be a set of rules with confidence $p < 1.0$ ($p_i = p$ for all $i$), such that $I_1 = c_{it}(X_1^i)$ and $I_2 = c_{it}(X_1^i \cup X_2^i)$ for all rules $R_i$. Let $R_I$ denote the rule $I_1 \xrightarrow{p} I_2$. Then all rules $R_i \ne R_I$ are more specific than $R_I$, and thus are redundant.

32 Generating Set
- Combining the two sets gives a generating set for the rules with minsup = 50% and minconf = 80%: {TW → A, A → W, W → C, T → C, D → C, W → A (0.8), C → W (0.83)}.
- All association rules can be derived from this set.
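The two rules with confidence below 100% in this set can be recovered by scanning up-arcs between adjacent frequent closed itemsets. This is a minimal sketch, not the authors' algorithm. It assumes the six-transaction database `DB` (not shown in this transcript), a minimum support of 50% (3 transactions), and a minimum confidence of 80%. These are the thresholds under which the listed rules are reproduced.

```python
from fractions import Fraction
from itertools import combinations

# Assumed example database (tid -> items); it is not shown in this transcript.
DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}
ALL_ITEMS = frozenset("".join(DB.values()))
MIN_COUNT = 3               # assumed minsup: 50% of 6 transactions
MINCONF = Fraction(4, 5)    # assumed minconf: 80%


def t(itemset):
    return frozenset(tid for tid, row in DB.items() if set(itemset) <= set(row))


def i(tidset):
    common = set(ALL_ITEMS)
    for tid in tidset:
        common &= set(DB[tid])
    return frozenset(common)


def name(itemset):
    return "".join(sorted(itemset))


# Frequent closed itemsets: closures of all frequent itemsets.
closed = {i(t(c)) for k in range(1, len(ALL_ITEMS) + 1)
          for c in combinations(sorted(ALL_ITEMS), k)
          if len(t(c)) >= MIN_COUNT}


def adjacent(small, big):
    """`big` covers `small`: no closed itemset lies strictly between them."""
    return small < big and not any(small < mid < big for mid in closed)


# Up-arcs: from a smaller closed itemset to an adjacent larger one.
up_arcs = {}
for small in closed:
    for big in closed:
        if adjacent(small, big):
            conf = Fraction(len(t(big)), len(t(small)))
            if conf >= MINCONF:
                up_arcs[(name(small), name(big))] = conf

print(up_arcs)
```

The arc C to CW corresponds to the rule C → W (0.83), and the arc CW to ACW corresponds to W → A (0.8).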

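33 Complexity of Rule Generation
Here $l$ is the length of the longest frequent itemset.
- Traditional framework: $\sum_{i=0}^{l} \binom{l}{i} 2^i \le \sum_{i=0}^{l} \binom{l}{i} 2^l = 2^l \sum_{i=0}^{l} \binom{l}{i} = 2^l \cdot 2^l = O(2^{2l})$
- New framework:
  - Best case: one closed itemset, no rules.
  - Worst case: all frequent itemsets are closed.
    - Number of rules: $\sum_{i=0}^{l} \binom{l}{i} (l - i) \le l \sum_{i=0}^{l} \binom{l}{i} = O(l \cdot 2^l)$
    - Reduction factor: $O(2^{2l}) / O(l \cdot 2^l) = O(2^l / l)$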
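The two counts can be compared numerically. The sketch below assumes the traditional count is the sum of C(l, i) * 2^i and the worst-case closed-itemset count is the sum of C(l, i) * (l - i). This is one reading of the slide's formulas, and the function names are illustrative.

```python
from math import comb


def traditional_rules(l):
    """Sum over i of C(l, i) * 2^i: each size-i itemset contributes 2^i rules."""
    return sum(comb(l, i) * 2 ** i for i in range(l + 1))


def closed_worst_case_rules(l):
    """Sum over i of C(l, i) * (l - i): one up-arc per immediate superset."""
    return sum(comb(l, i) * (l - i) for i in range(l + 1))


for l in (5, 10, 20):
    trad = traditional_rules(l)
    new = closed_worst_case_rules(l)
    # trad equals 3^l, which is within the O(2^(2l)) bound;
    # new equals l * 2^(l-1), which is within the O(l * 2^l) bound.
    print(l, trad, new, trad // new)
```

The exact traditional count is $3^l$, so $O(2^{2l})$ is an upper bound rather than a tight estimate. The gap to $l \cdot 2^{l-1}$ still grows exponentially in $l$.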

34 Experimental Evaluation

35 Experimental Evaluation

36 Experimental Evaluation

37 Number of Rules: Traditional vs. Closed Itemset

38 Number of Rules: Traditional vs. Closed Itemset

39 Conclusion
- The new framework based on closed itemsets can drastically reduce the rule set, which can then be presented to the user in a succinct manner.
- Future work:
  - Interactive visualization and exploration of mined associations, generating rules on demand based on the user's interest.
  - Finding a minimal generating set.

