Mining the Most Interesting Rules Roberto J. Bayardo Jr., Rakesh Agrawal Presented by: Mohamed G. Elfeky.

1 Mining the Most Interesting Rules Roberto J. Bayardo Jr., Rakesh Agrawal Presented by: Mohamed G. Elfeky

2 Introduction Algorithms for mining rules: constraint-based; heuristic (predictive rules); interestingness-metric. Several interestingness metrics: confidence, support, laplace, gain, conviction.

3 Generic Problem Statement The rule: $A \to C$. The input is a tuple $(U, D, \le, C, N)$: $U$ is a set of conditions for the rule antecedent. $D$ is a data-set. $\le$ is a total order on rules. $C$ is a condition for the rule consequent. $N$ is a set of constraints on rules.

4 Optimized Rule Mining Find a set $A_1 \subseteq U$ such that: $A_1$ satisfies $N$, and $\nexists A_2 \subseteq U$: $A_2$ satisfies $N \wedge A_1 < A_2$. Any rule $A \to C$ whose antecedent $A$ is such a set $A_1$ is optimal. In general, this is an NP-hard problem.
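The definition on this slide can be illustrated with an exhaustive search on a toy data-set. This is only a minimal sketch, not the algorithm from the paper: the records, the choice of confidence as the total order, and the constraint "antecedent must be non-empty" are all assumptions made for illustration, and enumerating every subset of U is exponential in |U|.

```python
from itertools import combinations

# Hypothetical toy data-set D: each record is the set of conditions it satisfies.
D = [
    {"a", "b", "c"},
    {"a", "c"},
    {"a", "b"},
    {"b"},
    {"a", "b", "c"},
    {"c"},
]
U = ["a", "b"]  # candidate conditions for the antecedent
C = "c"         # condition for the consequent


def confidence(antecedent, data):
    """Fraction of records matching the antecedent that also satisfy C."""
    covered = [rec for rec in data if antecedent <= rec]
    if not covered:
        return 0.0
    return sum(1 for rec in covered if C in rec) / len(covered)


def non_empty(antecedent, data):
    """Assumed constraint set N: the antecedent must contain a condition."""
    return len(antecedent) > 0


def optimal_antecedents(conditions, data, key, satisfies_n):
    """Enumerate every subset of U that satisfies N and keep the maximal ones."""
    candidates = [
        frozenset(combo)
        for size in range(len(conditions) + 1)
        for combo in combinations(conditions, size)
        if satisfies_n(frozenset(combo), data)
    ]
    best = max(key(a, data) for a in candidates)
    # A1 is optimal when no other candidate A2 ranks strictly above it.
    return [a for a in candidates if key(a, data) == best]


best = optimal_antecedents(U, D, confidence, non_empty)
print([sorted(a) for a in best])
```

On this toy input the single optimal antecedent is {a}, with confidence 3/4.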

5 Partial-Order Optimized Rule Mining Partial order vs. total order: some rules may be incomparable, so there may be several equivalence classes of optimal rules. Find a set $O \subseteq \mathcal{P}(U)$ such that: $\forall A \in O$: $A$ is optimal, and for each equivalence class that contains an optimal rule, exactly one member of that class is in $O$.

6 Monotonicity $f(x)$ is said to be monotone in $x$ if: $x_1 < x_2 \Rightarrow f(x_1) \le f(x_2)$. $f(x)$ is said to be anti-monotone in $x$ if: $x_1 < x_2 \Rightarrow f(x_1) \ge f(x_2)$.

7 Optimality. SC-Optimality; PC-Optimality. Definition; Theoretical Implications; Practical Implications.

8 SC-Optimality: Definition The partial order $\le_{sc}$. For rules $r_1$ and $r_2$: $r_1 <_{sc} r_2$ if and only if: $\mathrm{sup}(r_1) \le \mathrm{sup}(r_2) \wedge \mathrm{conf}(r_1) < \mathrm{conf}(r_2)$, or $\mathrm{sup}(r_1) < \mathrm{sup}(r_2) \wedge \mathrm{conf}(r_1) \le \mathrm{conf}(r_2)$. Also, $r_1 =_{sc} r_2$ if and only if: $\mathrm{sup}(r_1) = \mathrm{sup}(r_2) \wedge \mathrm{conf}(r_1) = \mathrm{conf}(r_2)$.

9 SC-Optimality: Definition (cont.) The partial order $\le_{s\neg c}$. For rules $r_1$ and $r_2$: $r_1 <_{s\neg c} r_2$ if and only if: $\mathrm{sup}(r_1) \le \mathrm{sup}(r_2) \wedge \mathrm{conf}(r_1) > \mathrm{conf}(r_2)$, or $\mathrm{sup}(r_1) < \mathrm{sup}(r_2) \wedge \mathrm{conf}(r_1) \ge \mathrm{conf}(r_2)$. Also, $r_1 =_{s\neg c} r_2$ if and only if: $\mathrm{sup}(r_1) = \mathrm{sup}(r_2) \wedge \mathrm{conf}(r_1) = \mathrm{conf}(r_2)$.
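The two partial orders defined on the SC-Optimality slides can be written as comparison functions, and the optimal rules are then the ones that no other rule dominates. The sketch below assumes each rule is summarized by a (support, confidence) pair; the `Rule` type and the sample values are hypothetical. For simplicity it keeps every member of an equivalence class rather than exactly one.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    name: str
    sup: float
    conf: float


def less_sc(r1, r2):
    """r1 <_sc r2: r2 is at least as good on both axes and better on one."""
    return (r1.sup <= r2.sup and r1.conf < r2.conf) or (
        r1.sup < r2.sup and r1.conf <= r2.conf
    )


def less_s_not_c(r1, r2):
    """r1 <_{s not c} r2: same as above, but lower confidence counts as better."""
    return (r1.sup <= r2.sup and r1.conf > r2.conf) or (
        r1.sup < r2.sup and r1.conf >= r2.conf
    )


def optimal_set(rules, less):
    """Rules that are not strictly below any other rule under `less`."""
    return [r for r in rules if not any(less(r, other) for other in rules)]


# Hypothetical rules for illustration.
rules = [
    Rule("r1", 0.1, 0.9),
    Rule("r2", 0.3, 0.7),
    Rule("r3", 0.2, 0.6),
    Rule("r4", 0.5, 0.4),
    Rule("r5", 0.5, 0.2),
    Rule("r6", 0.3, 0.1),
]

sc_front = [r.name for r in optimal_set(rules, less_sc)]
s_not_c_front = [r.name for r in optimal_set(rules, less_s_not_c)]
print(sc_front, s_not_c_front)
```

With these sample values, r1, r2 and r4 are sc-optimal, r5 and r6 are s¬c-optimal, and r3 is optimal under neither order.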

10 SC-Optimality: Definition (cont.) [Figure: rules plotted on support and confidence axes; legend: sc-optimal rule, $s\neg c$-optimal rule, non-optimal rule.] No optimal rules fall outside the borders.

11 SC-Optimality: Theoretical Implications A total order $\le_t$ is implied by $\le_{sc}$ if: $r_1 \le_{sc} r_2 \Rightarrow r_1 \le_t r_2$ and $r_1 =_{sc} r_2 \Rightarrow r_1 =_t r_2$. $r$ is optimal for $\le_{sc}$ $\Leftarrow$ $r$ is optimal for $\le_t$. $\le_t$ defined by $f(r)$ is implied by $\le_{sc}$ if: $f(r)$ is monotone in support, and $f(r)$ is monotone in confidence.

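12 SC-Optimality: Theoretical Implications (cont.) Interestingness metrics: $\mathrm{laplace}(r) = \dfrac{\mathrm{sup}(r) + 1}{\mathrm{sup}(r)/\mathrm{conf}(r) + k}$, $\mathrm{gain}(r) = \mathrm{sup}(r)\,(1 - \theta/\mathrm{conf}(r))$, $\mathrm{conviction}(r) = \dfrac{\gamma}{1 - \mathrm{conf}(r)}$, where $k$, $\theta$ and $\gamma$ are constants.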
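As a rough check of the claim that these metrics are covered by the sc-optimal set, the sketch below evaluates each metric on a handful of hypothetical rules and confirms that the top-ranked rule under each one is sc-optimal. Several things here are assumptions: support is treated as a record count, `k=2` and `theta=0.5` are arbitrary choices, and `gamma` stands for the constant numerator of conviction for a fixed consequent (its value does not affect the ranking).

```python
from typing import NamedTuple


class Rule(NamedTuple):
    name: str
    sup: float   # assumed to be a record count
    conf: float


def laplace(r, k=2):
    # sup(r)/conf(r) is the number of records matched by the antecedent.
    return (r.sup + 1) / (r.sup / r.conf + k)


def gain(r, theta=0.5):
    return r.sup * (1 - theta / r.conf)


def conviction(r, gamma=0.5):
    # gamma: assumed constant for a fixed consequent; infinite when conf is 1.
    return float("inf") if r.conf == 1 else gamma / (1 - r.conf)


def less_sc(r1, r2):
    return (r1.sup <= r2.sup and r1.conf < r2.conf) or (
        r1.sup < r2.sup and r1.conf <= r2.conf
    )


def sc_optimal(rules):
    return [r for r in rules if not any(less_sc(r, o) for o in rules)]


# Hypothetical rules for illustration.
rules = [
    Rule("A", 10, 0.9),
    Rule("B", 30, 0.7),
    Rule("C", 20, 0.6),
    Rule("D", 50, 0.4),
    Rule("E", 50, 0.2),
]

front = {r.name for r in sc_optimal(rules)}
winners = {
    metric.__name__: max(rules, key=metric).name
    for metric in (laplace, gain, conviction)
}
print(front, winners)
```

Each metric is monotone in both support and confidence, which is why its best rule cannot fall below the sc border in this example.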

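13 PC-Optimality: Definition The partial order $\le_{pc}$. For rules $r_1$ and $r_2$: $r_1 <_{pc} r_2$ if and only if: $\mathrm{pop}(r_1) \subseteq \mathrm{pop}(r_2) \wedge \mathrm{conf}(r_1) < \mathrm{conf}(r_2)$, or $\mathrm{pop}(r_1) \subset \mathrm{pop}(r_2) \wedge \mathrm{conf}(r_1) \le \mathrm{conf}(r_2)$. Also, $r_1 =_{pc} r_2$ if and only if: $\mathrm{pop}(r_1) = \mathrm{pop}(r_2) \wedge \mathrm{conf}(r_1) = \mathrm{conf}(r_2)$.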

14 PC-Optimality: Definition (cont.) $\mathrm{pop}(A \to C)$ is the set of records from $D$ that satisfy both $A$ and $C$. $|\mathrm{pop}(r)| = \mathrm{sup}(r) \cdot |D|$. The partial order $\le_{p\neg c}$ is defined analogously.

15 PC-Optimality: Theoretical Implications $\le_{sc}$ is implied by $\le_{pc}$, and $\le_{s\neg c}$ by $\le_{p\neg c}$. $\le_{pc}$ results in more incomparable rule pairs. The pc-optimal rule set will contain more rules than the sc-optimal rule set.
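The effect described on this slide can be seen on a small example: replacing the support comparison with set inclusion on populations makes more pairs of rules incomparable, so more rules survive as optimal. In the sketch below the rule populations are hypothetical sets of record ids, and support is taken to be the population size.

```python
from typing import FrozenSet, NamedTuple


class Rule(NamedTuple):
    name: str
    pop: FrozenSet[int]  # ids of the records satisfying both A and C
    conf: float


def less_pc(r1, r2):
    """r1 <_pc r2, using subset (<=) and proper subset (<) on populations."""
    return (r1.pop <= r2.pop and r1.conf < r2.conf) or (
        r1.pop < r2.pop and r1.conf <= r2.conf
    )


def less_sc(r1, r2):
    """r1 <_sc r2, with support taken as the population size."""
    s1, s2 = len(r1.pop), len(r2.pop)
    return (s1 <= s2 and r1.conf < r2.conf) or (s1 < s2 and r1.conf <= r2.conf)


def optimal_set(rules, less):
    return [r.name for r in rules if not any(less(r, o) for o in rules)]


# Hypothetical rules for illustration.
rules = [
    Rule("r1", frozenset({1, 2}), 0.5),
    Rule("r2", frozenset({3, 4, 5}), 0.8),
    Rule("r3", frozenset({1}), 0.4),
]

sc_front = optimal_set(rules, less_sc)
pc_front = optimal_set(rules, less_pc)
print(sc_front, pc_front)
```

Here r1 is below r2 under the sc order, but the two populations are disjoint, so the pair is incomparable under the pc order and both rules are pc-optimal.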

16 Optimality: Practical Implications Two algorithms are proposed, one for each type of optimality. Each algorithm produces a set of optimal rules without specifying the interestingness metrics. The produced set is guaranteed to identify the most interesting rules according to several metrics.

17 Optimality: Practical Implications (cont.) These algorithms facilitate interactivity: Examine the optimal rules according to some metric without additional querying or mining. Find the most interesting rule that characterizes any given subset of the population.
