© Vipin Kumar, CSci 8980 Fall 2002

Slide 1: CSci 8980: Data Mining (Fall 2002)
Vipin Kumar
Army High Performance Computing Research Center
Department of Computer Science, University of Minnesota
http://www.cs.umn.edu/~kumar
Slide 2: Interestingness Measures
- Association rule algorithms tend to produce too many rules
  - Many of them are uninteresting or redundant
  - Redundant if {A,B,C} -> {D} and {A,B} -> {D} have the same support and confidence
- Interestingness measures can be used to prune/rank the derived patterns
- In the original formulation of association rules, support and confidence are the only measures used
Slide 3: Application of Interestingness Measures
[Figure omitted: interestingness measures applied to post-process mined patterns]
Slide 4: Computing Interestingness Measure
- Given a rule X -> Y, the information needed to compute rule interestingness can be obtained from a contingency table

  Contingency table for X -> Y:

              Y      ¬Y
      X      f11    f10    f1+
      ¬X     f01    f00    f0+
             f+1    f+0    |T|

  - f11: support of X and Y
  - f10: support of X and ¬Y
  - f01: support of ¬X and Y
  - f00: support of ¬X and ¬Y

- Various measures can then be applied: support, confidence, lift, Gini, J-measure, etc.
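The cell counts above determine the standard rule measures directly. A minimal sketch (not from the slides) of computing support, confidence, and lift for a rule X -> Y:

```python
# Sketch: support, confidence, and lift for a rule X -> Y, computed
# from the four contingency-table cell counts f11, f10, f01, f00.
def rule_measures(f11, f10, f01, f00):
    n = f11 + f10 + f01 + f00       # |T|, total number of transactions
    f1p = f11 + f10                 # f1+ : support count of X
    fp1 = f11 + f01                 # f+1 : support count of Y
    support = f11 / n               # P(X, Y)
    confidence = f11 / f1p          # P(Y | X)
    lift = confidence / (fp1 / n)   # P(Y | X) / P(Y)
    return support, confidence, lift
```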
Slide 5: Drawback of Confidence

              Coffee   ¬Coffee
      Tea       15        5       20
      ¬Tea      75        5       80
                90       10      100

  Association rule: Tea -> Coffee
  Confidence = P(Coffee|Tea) = 15/20 = 0.75, but P(Coffee) = 0.9
  Although confidence is high, the rule is misleading: P(Coffee|¬Tea) = 75/80 = 0.9375
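The slide's point can be checked numerically (an illustrative sketch): lift compares the rule's confidence against the prior P(Coffee), and here it falls below 1, meaning tea drinkers are actually less likely than average to drink coffee.

```python
# Tea -> Coffee from the contingency table: f11=15, f10=5, f01=75, f00=5.
f11, f10, f01, f00 = 15, 5, 75, 5
n = f11 + f10 + f01 + f00

confidence = f11 / (f11 + f10)   # P(Coffee | Tea) = 0.75
prior = (f11 + f01) / n          # P(Coffee) = 0.9
lift = confidence / prior        # ~0.83 < 1: negative association

print(confidence, prior, lift)
```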
Slide 6: Other Measures
[Table of additional interestingness measures omitted]
Slide 8: Properties of a Good Measure
- Piatetsky-Shapiro proposed three properties a good measure M should satisfy:
  - M(A,B) = 0 if A and B are statistically independent
  - M(A,B) increases monotonically with P(A,B) when P(A) and P(B) remain unchanged
  - M(A,B) decreases monotonically with P(A) [or P(B)] when P(A,B) and P(B) [or P(A)] remain unchanged
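One measure that satisfies all three properties is Piatetsky-Shapiro's own leverage measure, PS(A,B) = P(A,B) - P(A)P(B). A small sketch (the probabilities are illustrative, not from the slides):

```python
# Piatetsky-Shapiro's leverage measure: PS(A,B) = P(A,B) - P(A)P(B).
# It is 0 under independence, increases with P(A,B) for fixed marginals,
# and decreases as P(A) (or P(B)) grows for fixed P(A,B).
def ps(p_ab, p_a, p_b):
    return p_ab - p_a * p_b

print(ps(0.06, 0.2, 0.3))   # ~0: A and B statistically independent
print(ps(0.10, 0.2, 0.3))   # positive: A, B co-occur more than expected
```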
Slide 9: Lift & Interest

         Y    ¬Y                      Y    ¬Y
   X    10     0    10          X    90     0    90
   ¬X    0    90    90          ¬X    0    10    10
        10    90   100               90    10   100

   Lift = 10                    Lift = 1.11

- Statistical independence: if P(X,Y) = P(X)P(Y), then Lift = 1
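The two tables above can be checked directly, and they expose a weakness of lift: both describe perfectly correlated items, yet lift gives very different scores depending on support. A quick sketch (not from the slides):

```python
# Lift from contingency-table counts: P(X,Y) / (P(X) * P(Y)).
def lift(f11, f10, f01, f00):
    n = f11 + f10 + f01 + f00
    return f11 * n / ((f11 + f10) * (f11 + f01))

print(lift(10, 0, 0, 90))   # 10.0  : rare, perfectly correlated items
print(lift(90, 0, 0, 10))   # ~1.11 : frequent, perfectly correlated items
print(lift(9, 1, 81, 9))    # 1.0   : statistically independent
```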
Slide 10: Comparing Different Measures
- 10 examples of contingency tables [table omitted]
- Rankings of the contingency tables using various measures [table omitted]
Slide 11: Property under Variable Permutation
- Does M(A,B) = M(B,A)?
- Symmetric measures: support, lift, collective strength, cosine, Jaccard, etc.
- Asymmetric measures: confidence, conviction, Laplace, J-measure, etc.
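A quick numerical check of this property (sketch; the table values are illustrative). Swapping A and B transposes the contingency table, so f10 and f01 trade places:

```python
import math

def cosine(f11, f10, f01, f00):
    return f11 / math.sqrt((f11 + f10) * (f11 + f01))

def confidence(f11, f10, f01, f00):
    return f11 / (f11 + f10)

t       = (15, 5, 75, 5)    # M(A, B)
swapped = (15, 75, 5, 5)    # M(B, A): f10 <-> f01
print(cosine(*t) == cosine(*swapped))             # True  : symmetric
print(confidence(*t) == confidence(*swapped))     # False : asymmetric
```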
Slide 12: Property under Row/Column Scaling
Grade-gender example (Mosteller, 1968):

           Male  Female                     Male  Female
   High      2      3      5       High       4     30     34
   Low       1      4      5       Low        2     40     42
             3      7     10                  6     70     76
                                            (x2)  (x10)

- Mosteller: the underlying association should be independent of the relative number of male and female students in the samples
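The odds ratio is one measure that is invariant under row/column scaling. A check on the two grade-gender tables (sketch, not from the slides):

```python
# Odds ratio: (f11 * f00) / (f10 * f01). Scaling a row or column multiplies
# numerator and denominator by the same factor, so the ratio is unchanged.
def odds_ratio(f11, f10, f01, f00):
    return (f11 * f00) / (f10 * f01)

print(odds_ratio(2, 3, 1, 4))     # 8/3 ~= 2.67 (original table)
print(odds_ratio(4, 30, 2, 40))   # 8/3 ~= 2.67 (Male x2, Female x10)
```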
Slide 13: Property under Inversion Operation
[Figure omitted: transaction bit vectors for Transaction 1 through Transaction N, with all 0s and 1s flipped under inversion]
Slide 14: Example: φ-Coefficient
- The φ-coefficient is analogous to the correlation coefficient for continuous variables

         Y    ¬Y                      Y    ¬Y
   X    60    10    70          X    20    10    30
   ¬X   10    20    30          ¬X   10    60    70
        70    30   100               30    70   100

- The φ-coefficient is the same for both tables
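A check of the slide's claim (sketch): the second table is the inversion of the first (every 0 and 1 flipped, so f11 swaps with f00), and φ comes out identical, illustrating that φ is invariant under the inversion operation.

```python
import math

# phi-coefficient: (f11*f00 - f10*f01) / sqrt(f1+ * f+1 * f0+ * f+0)
def phi(f11, f10, f01, f00):
    num = f11 * f00 - f10 * f01
    den = math.sqrt((f11 + f10) * (f11 + f01) * (f01 + f00) * (f10 + f00))
    return num / den

print(phi(60, 10, 10, 20))   # ~0.5238
print(phi(20, 10, 10, 60))   # ~0.5238 : same value for the inverted table
```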
Slide 15: Property under Null Addition
- Null addition: adding transactions that contain neither item (only f00 increases)
- Invariant measures: support, cosine, Jaccard, etc.
- Non-invariant measures: correlation, Gini, mutual information, odds ratio, etc.
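A quick check with illustrative counts: inflating f00 leaves cosine untouched (it never uses f00), while the φ correlation changes.

```python
import math

def cosine(f11, f10, f01, f00):
    return f11 / math.sqrt((f11 + f10) * (f11 + f01))   # f00 never used

def phi(f11, f10, f01, f00):
    num = f11 * f00 - f10 * f01
    den = math.sqrt((f11 + f10) * (f11 + f01) * (f01 + f00) * (f10 + f00))
    return num / den

base     = (10, 5, 5, 10)
inflated = (10, 5, 5, 1000)   # 990 extra "null" transactions added
print(cosine(*base) == cosine(*inflated))   # True  : invariant
print(phi(*base) == phi(*inflated))         # False : not invariant
```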
Slide 16: Different Measures Have Different Properties
[Summary table of measures and their properties omitted]
Slide 17: Support-based Pruning
- Most association rule mining algorithms use the support measure to prune rules and itemsets
- Study the effect of support pruning on the correlation of itemsets:
  - Generate 10,000 random contingency tables
  - Compute support and pairwise correlation for each table
  - Apply support-based pruning and examine the tables that are removed
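The steps above can be sketched as follows (the random-table generator and the support threshold are assumptions; the slides do not specify them):

```python
import math
import random

def phi(f11, f10, f01, f00):
    num = f11 * f00 - f10 * f01
    den = math.sqrt((f11 + f10) * (f11 + f01) * (f01 + f00) * (f10 + f00))
    return num / den

random.seed(0)
# Generate 10000 random 2x2 contingency tables (cell counts 1..100).
tables = [[random.randint(1, 100) for _ in range(4)] for _ in range(10000)]

support = lambda t: t[0] / sum(t)                     # f11 / |T|
kept    = [t for t in tables if support(t) >= 0.3]    # survives pruning
removed = [t for t in tables if support(t) < 0.3]     # eliminated

avg_phi = lambda ts: sum(phi(*t) for t in ts) / len(ts)
# The removed (low-support) tables skew toward negative correlation.
print(avg_phi(removed), avg_phi(kept))
```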
Slide 18: Effect of Support-based Pruning
[Figure omitted]
Slide 19: Effect of Support-based Pruning
- Support-based pruning eliminates mostly negatively correlated itemsets
Slide 20: Effect of Support-based Pruning
- Investigate how support-based pruning affects other measures
- Steps:
  - Generate 10,000 contingency tables
  - Rank each table according to the different measures
  - Compute the pairwise correlation between the measures
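The procedure can be sketched for one pair of measures (lift and φ), using a simple rank transform and Pearson correlation of the ranks; the table generator and the chosen pair are illustrative assumptions, not the slides' exact setup:

```python
import math
import random

def lift(f11, f10, f01, f00):
    n = f11 + f10 + f01 + f00
    return f11 * n / ((f11 + f10) * (f11 + f01))

def phi(f11, f10, f01, f00):
    num = f11 * f00 - f10 * f01
    den = math.sqrt((f11 + f10) * (f11 + f01) * (f01 + f00) * (f10 + f00))
    return num / den

def ranks(xs):
    # Rank transform: position of each value in sorted order (ties broken arbitrarily).
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order):
        r[i] = rank
    return r

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

random.seed(1)
tables = [[random.randint(1, 100) for _ in range(4)] for _ in range(1000)]
r = pearson(ranks([lift(*t) for t in tables]),
            ranks([phi(*t) for t in tables]))
print(r)   # correlation between the two measures' rankings
```

Repeating this for every pair of measures, before and after support-based pruning, yields the correlation matrices summarized on the next slides.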
Slide 21: Effect of Support-based Pruning
- Without support pruning (all pairs):
  - Red cells indicate correlation between the pair of measures > 0.85
  - 40.14% of pairs have correlation > 0.85
[Scatter plot between correlation and the Jaccard measure omitted]
Slide 22: Effect of Support-based Pruning
- With 0.5% ≤ support ≤ 50%:
  - 61.45% of pairs have correlation > 0.85
[Scatter plot between correlation and the Jaccard measure omitted]
Slide 23: Effect of Support-based Pruning
- With 0.5% ≤ support ≤ 30%:
  - 76.42% of pairs have correlation > 0.85
[Scatter plot between correlation and the Jaccard measure omitted]