Published by Constance Ward

1
Intelligent Databases and Information Systems research group
Department of Computer Science and Artificial Intelligence
E.T.S Ingeniería Informática – Universidad de Granada (Spain)
CEDI'2005, Taller de Minería de Datos (Data Mining Workshop)
Association Rules: Algorithms, variations, extensions, and applications
Fernando Berzal, fberzal@decsai.ugr.es

2
Motivation
Association mining searches for interesting relationships among items in a given data set.
EXAMPLES
- Diapers and six-packs are bought together, especially on Thursday evenings (a myth?)
- A sequence such as buying first a digital camera and then a memory card is a frequent (sequential) pattern
- …
Motivation Definition Discovery Variations Visualization Extensions Applications ART ATBAR

3
Motivation: MARKET BASKET ANALYSIS
The earliest form of association rule mining.
Applications: catalog design, store layout, cross-marketing…

4
Definition
Item
- In transactional databases: any of the items included in a transaction.
- In relational databases: an (attribute, value) pair.
k-itemset
A set of k items.
Itemset support
support(I) = P(I)

5
Definition
Association rule X ⇒ Y
- Support: support(X ⇒ Y) = support(X ∪ Y) = P(X ∪ Y)
- Confidence: confidence(X ⇒ Y) = support(X ∪ Y) / support(X) = P(Y|X)
NOTE: Both support and confidence are relative (fractions of the data set, not absolute counts).
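To make the two definitions concrete, here is a minimal Python sketch that computes relative support and confidence over a list of transactions. The function names and the four toy baskets are illustrative assumptions, not data from the slides.

```python
def support(itemset, transactions):
    """Relative support: fraction of transactions that contain every item of itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(lhs, rhs, transactions):
    """confidence(X => Y) = support(X U Y) / support(X), i.e. P(Y | X)."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)

# Hypothetical toy baskets (not taken from the slides)
baskets = [frozenset(t) for t in (
    {"diapers", "six-pack", "milk"},
    {"diapers", "six-pack"},
    {"diapers", "bread"},
    {"milk", "bread"},
)]
print(support({"diapers", "six-pack"}, baskets))       # 0.5
print(confidence({"diapers"}, {"six-pack"}, baskets))  # 2/3, about 0.667
```

Note that the rule's support is the support of the union X ∪ Y, while its confidence divides by the support of the antecedent alone.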

6
Discovery
Association rule mining:
1. Find all frequent itemsets.
2. Generate strong association rules from the frequent itemsets.
Strong association rules are those that satisfy both a minimum support threshold and a minimum confidence threshold.
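Step 2 can be sketched as follows, assuming the frequent itemsets from step 1 are already available as a dictionary that maps each itemset to its relative support. Every frequent itemset already meets the minimum support, so only the confidence test remains. The function name `strong_rules` and its input format are hypothetical.

```python
from itertools import combinations

def strong_rules(freq, min_conf):
    """Generate rules X => Y (X, Y non-empty, disjoint) from frequent itemsets.

    freq: {frozenset itemset: relative support}. It is assumed to be downward
    closed (every subset of a frequent itemset is present), as Apriori guarantees.
    Returns a list of (lhs, rhs, support, confidence) tuples.
    """
    rules = []
    for itemset, supp in freq.items():
        if len(itemset) < 2:
            continue
        # Try every non-empty proper subset as the antecedent
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), r)):
                conf = supp / freq[lhs]  # support(X U Y) / support(X)
                if conf >= min_conf:
                    rules.append((set(lhs), set(itemset - lhs), supp, conf))
    return rules

freq = {frozenset({"a"}): 0.75, frozenset({"b"}): 0.5, frozenset({"a", "b"}): 0.5}
print(strong_rules(freq, 0.8))  # [({'b'}, {'a'}, 0.5, 1.0)]
```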

7
Discovery: Apriori
Observation: all non-empty subsets of a frequent itemset must also be frequent.
Algorithm: frequent k-itemsets are used to explore potentially frequent (k+1)-itemsets (i.e. candidates).
Agrawal & Srikant: "Fast Algorithms for Mining Association Rules", VLDB'94
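A compact, unoptimized sketch of the level-wise idea in Python: join frequent k-itemsets into (k+1)-candidates, prune candidates that have an infrequent k-subset, then count the survivors with one pass over the data per level. It is written for clarity and is not a faithful reproduction of Agrawal & Srikant's implementation; `apriori` and its signature are illustrative.

```python
from itertools import combinations

def apriori(transactions, min_supp):
    """Level-wise frequent itemset mining (minimal Apriori-style sketch).

    transactions: iterable of sets; min_supp: relative support threshold.
    Returns {frozenset itemset: relative support}.
    """
    tsets = [frozenset(t) for t in transactions]
    n = len(tsets)
    # L1: frequent single items
    level = {}
    for item in {i for t in tsets for i in t}:
        s = sum(1 for t in tsets if item in t) / n
        if s >= min_supp:
            level[frozenset([item])] = s
    frequent = dict(level)
    k = 1
    while level:
        prev = list(level)
        # Join: unite pairs of frequent k-itemsets into (k+1)-item candidates
        candidates = {a | b for a in prev for b in prev if len(a | b) == k + 1}
        # Prune: every k-subset of a candidate must be frequent (Apriori property)
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k))}
        # Count: one scan over the data for this level
        level = {}
        for c in candidates:
            s = sum(1 for t in tsets if c <= t) / n
            if s >= min_supp:
                level[c] = s
        frequent.update(level)
        k += 1
    return frequent

T = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
f = apriori(T, 0.6)  # all singletons and pairs are frequent; {a, b, c} is not
```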

8
Discovery: Apriori improvements (I)
- Reducing the number of candidates: Park, Chen & Yu: "An Effective Hash-Based Algorithm for Mining Association Rules", SIGMOD'95
- Sampling: Toivonen: "Sampling Large Databases for Association Rules", VLDB'96; Park, Yu & Chen: "Mining Association Rules with Adjustable Accuracy", CIKM'97
- Partitioning: Savasere, Omiecinski & Navathe: "An Efficient Algorithm for Mining Association Rules in Large Databases", VLDB'95

9
Discovery: Apriori improvements (II)
- Transaction reduction: Agrawal & Srikant: "Fast Algorithms for Mining Association Rules", VLDB'94 (AprioriTID)
- Dynamic itemset counting: Brin, Motwani, Ullman & Tsur: "Dynamic Itemset Counting and Implication Rules for Market Basket Data", SIGMOD'97 (DIC); Hidber: "Online Association Rule Mining", SIGMOD'99 (CARMA)

10
Discovery: TBAR
Apriori-like algorithm: TBAR (Tree-based association rule mining)
Berzal, Cubero, Sánchez & Serrano: "TBAR: An efficient method for association rule mining in relational databases", Data & Knowledge Engineering, 2001

11
Discovery: TBAR
[Figure: itemset tree with levels L1, L2 and L3. L1 counts: A #7, B #9, C #7, D #8. Annotated nodes: 7 instances with A, 6 instances with AB, 5 instances with AD, 5 instances with ABD, 6 instances with BC.]

12
Discovery: FP-tree
An alternative to Apriori: compress the database representing frequent items into a frequent-pattern tree (FP-tree)…
Han, Pei & Yin: "Mining Frequent Patterns without Candidate Generation", SIGMOD'2000
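The construction step of an FP-tree can be sketched in Python as below: one scan counts the items, and a second scan inserts each transaction's frequent items in descending frequency order so that transactions with a common prefix share nodes. This is a simplified sketch under stated assumptions (ties broken alphabetically, header table kept as a plain list of nodes per item); the recursive FP-growth mining phase is not shown.

```python
from collections import Counter

class FPNode:
    """FP-tree node: an item, its count, and links to its parent and children."""
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(transactions, min_count):
    """Build an FP-tree (construction only; FP-growth mining is not shown)."""
    # First scan: count items and keep only the frequent ones
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_count}
    root = FPNode(None, None)
    header = {}  # item -> list of tree nodes holding that item
    # Second scan: insert frequent items, most frequent first, sharing prefixes
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header.setdefault(item, []).append(child)
            child.count += 1
            node = child
    return root, header

root, header = build_fp_tree([["a", "b"], ["b", "c", "a"], ["a", "d"], ["b"]], 2)
```

Here c and d are dropped as infrequent, the first two transactions share the path a, b, and the lone b becomes a second branch under the root.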

13
Discovery: a challenge
When an itemset is frequent, all its subsets are also frequent, so a single frequent k-itemset implies 2^k - 1 frequent non-empty subsets.
- Closed itemset C: there exists no proper super-itemset S such that support(S) = support(C).
- Maximal (frequent) itemset M: M is frequent and there exists no super-itemset Y such that M ⊂ Y and Y is frequent.
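Given all frequent itemsets with their supports, both definitions can be checked directly. The brute-force Python sketch below assumes the input dictionary contains every frequent itemset. That is enough, because a superset with the same support as a frequent itemset is itself frequent. The function name and the example supports are illustrative.

```python
def closed_and_maximal(freq):
    """Return (closed, maximal) subsets of the frequent itemsets in freq.

    freq: {frozenset: support}, assumed to contain *all* frequent itemsets.
    """
    closed, maximal = set(), set()
    for s, supp in freq.items():
        supersets = [t for t in freq if s < t]  # proper frequent supersets
        if not supersets:
            maximal.add(s)  # no frequent superset at all
        if all(freq[t] != supp for t in supersets):
            closed.add(s)   # no superset with the same support
    return closed, maximal

A, B, AB = frozenset("a"), frozenset("b"), frozenset("ab")
closed, maximal = closed_and_maximal({A: 0.6, B: 0.8, AB: 0.6})
# {a} is not closed (AB has the same support); AB is both closed and maximal
```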

14
Variations
Based on the kinds of patterns to be mined:
- Frequent itemset mining (transactional and relational data)
- Sequential pattern mining (sequence data sets, e.g. bioinformatics)
- Structured pattern mining (structured data, e.g. graphs)

15
Variations
Based on the types of values handled:
- Boolean association rules
- Quantitative association rules
- Fuzzy association rules
Delgado, Marín, Sánchez & Vila: "Fuzzy association rules: General model and applications", IEEE Transactions on Fuzzy Systems, 2003

16
Variations
More options:
- Generalized association rules (a.k.a. multilevel association rules)
- Constraint-based association rule mining
- Incremental algorithms
- Top-k algorithms
- …
ICDM FIMI Workshop on Frequent Itemset Mining Implementations

17
Visualization
Integrated into data mining tools to help users understand data mining results:
- Table-based approach, e.g. SAS Enterprise Miner, DBMiner…
- 2D matrix-based approach, e.g. SGI MineSet, DBMiner…
- Graph-based techniques, e.g. DBMiner ball graphs

18
Visualization: Tables

19
Visualization: Visual aids

20
Visualization: 2D Matrix

21
Visualization: Graphs

22
Visualization: VisAR
Based on parallel coordinates (Techapichetvanich & Datta, ADMA'2005)

23
Extensions
Confidence is not the best possible interestingness measure for rules: e.g. a very frequent item will appear in the consequent of many high-confidence rules, regardless of its true relationship with the rule antecedent.
Example (from the US Census): X went to war ⇒ X did not serve in Vietnam. The rule is confident mainly because its consequent is very common.

24
Extensions
Desirable properties for interestingness measures (Piatetsky-Shapiro, 1991):
- P1: ACC(A ⇒ C) = 0 when supp(A ⇒ C) = supp(A)·supp(C)
- P2: ACC(A ⇒ C) monotonically increases with supp(A ⇒ C) when the other parameters remain fixed
- P3: ACC(A ⇒ C) monotonically decreases with supp(A) (or supp(C)) when the other parameters remain fixed
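Piatetsky-Shapiro's own rule-interest measure, often called leverage, supp(A ∪ C) - supp(A)·supp(C), is a simple measure that satisfies P1 to P3. The Python sketch below checks each property on a few hand-picked support values; the numbers are arbitrary and only illustrate the properties.

```python
def leverage(supp_ac, supp_a, supp_c):
    """Piatetsky-Shapiro's rule interest (leverage): supp(A U C) - supp(A) * supp(C)."""
    return supp_ac - supp_a * supp_c

# P1: zero under statistical independence, supp(A U C) = supp(A) * supp(C)
assert leverage(0.25, 0.5, 0.5) == 0
# P2: increases with supp(A U C) while supp(A) and supp(C) stay fixed
assert leverage(0.30, 0.5, 0.5) > leverage(0.25, 0.5, 0.5)
# P3: decreases with supp(A) while supp(A U C) and supp(C) stay fixed
assert leverage(0.30, 0.6, 0.5) < leverage(0.30, 0.5, 0.5)
```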

25
Extensions
Certainty factors…
- … satisfy Piatetsky-Shapiro's properties
- … are widely used in expert systems
- … are not symmetric (unlike interest/lift, which is symmetric)
- … can substitute conviction when CF > 0
Berzal, Blanco, Sánchez & Vila: "Measuring the accuracy and interest of association rules: A new framework", Intelligent Data Analysis, 2002
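The sketch below computes confidence, lift, conviction and the certainty factor from the supports of A, C and A ∪ C. It uses the piecewise definition CF = (conf - supp(C)) / (1 - supp(C)) when conf > supp(C), and CF = (conf - supp(C)) / supp(C) when conf < supp(C). It illustrates two points from the slide. First, lift is symmetric while CF is not. Second, for CF > 0, CF = 1 - 1/conviction, so CF ranks positive rules exactly as conviction does while staying within [0, 1]. The support values and the function name are made up for illustration.

```python
def rule_measures(supp_ac, supp_a, supp_c):
    """Return (confidence, lift, conviction, certainty factor) for A => C."""
    conf = supp_ac / supp_a
    lift = conf / supp_c  # = supp(A U C) / (supp(A) * supp(C)), symmetric in A and C
    conviction = float("inf") if conf == 1 else (1 - supp_c) / (1 - conf)
    if conf > supp_c:      # positive dependence
        cf = (conf - supp_c) / (1 - supp_c)
    elif conf < supp_c:    # negative dependence
        cf = (conf - supp_c) / supp_c
    else:                  # independence
        cf = 0.0
    return conf, lift, conviction, cf

a_to_c = rule_measures(0.3, 0.4, 0.5)  # A => C
c_to_a = rule_measures(0.3, 0.5, 0.4)  # C => A: same lift, different CF
```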

26
Extensions
References:
- Hilderman & Hamilton: "Evaluation of interestingness measures for ranking discovered knowledge", PAKDD, 2001
- Tan, Kumar & Srivastava: "Selecting the right objective measure for association analysis", Information Systems, vol. 29, pp. 293-313, 2004
- Berzal, Cubero, Marín, Sánchez, Serrano & Vila: "Association rule evaluation for classification purposes", TAMIDA'2005

27
Applications
Two sample applications where association rules have been successful:
- Classification (ART)
- Anomaly detection (ATBAR)
Berzal, Cubero, Sánchez & Serrano: "ART: A hybrid classification model", Machine Learning Journal, 2004
Balderas, Berzal, Cubero, Eisman & Marín: "Discovering Hidden Association Rules", KDD'2005, Chicago, Illinois, USA

28
Classification
Classification models based on association rules:
- Partial classification models, e.g. Bayardo
- "Associative" classification models, e.g. CBA (Liu et al.)
- Bayesian classifiers, e.g. LB (Meretakis et al.)
- Emerging patterns, e.g. CAEP (Dong et al.)
- Rule trees, e.g. Wang et al.
- Rules with exceptions, e.g. Liu et al.

29
Classification
GOAL: Simple, intelligible, and robust classification models obtained in an efficient and scalable way.
MEANS: Decision tree induction + association rule mining = ART [Association Rule Trees]

30
ART classification model
IDEA: Make use of efficient association rule mining algorithms to build a decision-tree-shaped classification model. ART = Association Rule Tree.
KEY: Association rules + "else" branches, a hybrid between decision trees and decision lists.

31
ART classification model: SPLICE

32
Construction: ART classification model
1. K = 1.
2. Rule mining: find rules with K items in their LHS.
3. Suitable rules? If yes, branch the tree using the selected rules and recursively process the "else" branch.
4. If not, K = K + 1. If K <= MaxSize, go back to step 2; otherwise, create a leaf node labelled with the most frequent class.

33
Construction: ART classification model
Rule mining: candidate hypotheses.
- MinSupp: minimum support threshold
- MinConf: minimum confidence threshold (fixed threshold or automatic selection)
[Flowchart: K = 1, rule mining, selection, tree level, K++, go on?, tree leaf]

34
Construction: ART classification model
Rule selection:
- Rules grouped by sets of attributes.
- Preference criterion.
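The construction loop can be sketched in Python as follows. This is a minimal ART-like sketch, not the authors' implementation, and it rests on several assumptions. Rules are mined per attribute set and grouped by that set. The preference criterion is hypothetical (the rule set covering the most rows wins, ties going to the first attribute set enumerated). A fixed MinConf of 0.9 stands in for automatic threshold selection. Support is measured relative to the rows reaching each node. The dataset is hypothetical, chosen so that the rules match the example slides (C = Y when X = 0, C = Z when X = 1), since the actual example dataset is not reproduced here.

```python
from collections import Counter, defaultdict
from itertools import combinations

def mine_rules(rows, attrs, target, k, min_supp, min_conf):
    """Mine rules 'k attribute values => class', grouped by LHS attribute set."""
    n = len(rows)
    groups = defaultdict(list)
    for attr_set in combinations(attrs, k):
        lhs = Counter(tuple(r[a] for a in attr_set) for r in rows)
        joint = Counter((tuple(r[a] for a in attr_set), r[target]) for r in rows)
        for (values, cls), cnt in joint.items():
            if cnt / n >= min_supp and cnt / lhs[values] >= min_conf:
                groups[attr_set].append((values, cls, cnt))
    return groups

def build_art(rows, attrs, target, min_supp=0.2, min_conf=0.9, max_size=2):
    """Build an ART-like tree: rule branches for one attribute set plus 'else'."""
    if not rows:
        return None
    for k in range(1, max_size + 1):
        groups = mine_rules(rows, attrs, target, k, min_supp, min_conf)
        if groups:
            # Hypothetical preference criterion: rule set covering the most rows
            best = max(groups, key=lambda s: sum(c for _, _, c in groups[s]))
            covered = {values for values, _, _ in groups[best]}
            rest = [r for r in rows if tuple(r[a] for a in best) not in covered]
            return {"attrs": best,
                    "rules": [(dict(zip(best, v)), cls, cnt)
                              for v, cls, cnt in groups[best]],
                    "else": build_art(rest, attrs, target,
                                      min_supp, min_conf, max_size)}
    # No suitable rules for any K <= max_size: leaf with the most frequent class
    return {"leaf": Counter(r[target] for r in rows).most_common(1)[0][0]}

def classify(node, row):
    """Try the rules of each node in turn, falling through to its 'else' branch."""
    while node is not None:
        if "leaf" in node:
            return node["leaf"]
        for cond, cls, _ in node["rules"]:
            if all(row[a] == v for a, v in cond.items()):
                return cls
        node = node["else"]
    return None  # no rule applies and no default leaf was built

# Hypothetical dataset consistent with the rules shown in the example slides
ROWS = [dict(zip("XYZC", v)) for v in [
    (0, 0, 0, 0), (0, 0, 1, 0), (0, 1, 0, 1), (0, 1, 1, 1),
    (1, 0, 0, 0), (1, 1, 1, 1), (1, 0, 1, 1), (1, 1, 0, 0)]]

tree = build_art(ROWS, ["X", "Y", "Z"], "C")
print(tree["attrs"], tree["else"]["attrs"])  # ('X', 'Y') ('Z',)
```

On this data, K = 1 yields only 75% rules, so the first level uses the K = 2 rule set on {X, Y}. The "else" branch then finds the 100% rules on Z, which mirrors the result shown in the example.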

35
Example dataset: ART classification model

36
Example: level 1, K = 1 (ART classification model)
LEVEL 1: association rule mining (minimum support threshold = 20%, automatic confidence threshold selection)
S1: if (Y=0) then C=0 with confidence 75%
    if (Y=1) then C=1 with confidence 75%
S2: if (Z=0) then C=0 with confidence 75%
    if (Z=1) then C=1 with confidence 75%

37
Example: level 1, K = 2 (ART classification model)
LEVEL 1: association rule mining (minimum support threshold = 20%, automatic confidence threshold selection)
S1: if (X=0 and Y=0) then C=0 (100%)
    if (X=0 and Y=1) then C=1 (100%)
S2: if (X=1 and Z=0) then C=0 (100%)
    if (X=1 and Z=1) then C=1 (100%)
S3: if (Y=0 and Z=0) then C=0 (100%)
    if (Y=1 and Z=1) then C=1 (100%)

38
Example: level 1 (ART classification model)
LEVEL 1: best rule set selection, e.g. S1:
S1: if (X=0 and Y=0) then C=0 (100%)
    if (X=0 and Y=1) then C=1 (100%)
Resulting tree level:
X=0 and Y=0: C=0 (2)
X=0 and Y=1: C=1 (2)
else...

39
Example: level 1, level 2 (ART classification model)

40
Example: level 2 (ART classification model)
LEVEL 2: rule mining
S1: if (Z=0) then C=0 with confidence 100%
    if (Z=1) then C=1 with confidence 100%
RESULT
X=0 and Y=0: C=0 (2)
X=0 and Y=1: C=1 (2)
else
  Z=0: C=0 (2)
  Z=1: C=1 (2)

41
Example: ART vs. TDIDT (ART classification model)
[Figure panels: ART, TDIDT]

42
Classifier accuracy
ART classification model > Experimental results

43
Classifier complexity
ART classification model > Experimental results

44
Training time
ART classification model > Experimental results

45
I/O operations: scans
ART classification model > Experimental results

46
I/O operations: records
ART classification model > Experimental results

47
I/O operations: pages
ART classification model > Experimental results

48
Final comments: ART classification model
Classification models:
- Acceptable accuracy
- Reduced complexity
- Attribute interactions
- Robustness (noise & primary keys)
Classifier building method:
- Efficient algorithm
- Good scalability properties
- Automatic parameter selection

49
Anomaly detection
It is often more interesting to find surprising non-frequent events than frequent ones.
EXAMPLES
- Abnormal network activity patterns in intrusion detection systems.
- Exceptions to "common" rules in medicine (useful for diagnosis, drug evaluation, detection of conflicting therapies…)
- …

50
Anomaly detection
Anomalous association rule: a confident rule representing homogeneous deviations from common behavior.

51
Anomaly detection
Anomalous association rule X ⇒ A when ¬Y:
- X ⇒ Y is frequent and confident: X usually implies Y (dominant rule).
- X¬Y ⇒ A is confident: when X does not imply Y, then it usually implies A (the anomaly).
- XY ⇒ ¬A is confident.
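The three conditions can be checked directly on a transaction list. The Python sketch below assumes for brevity that X, Y and A are single items and that "¬Y" means item Y is absent from the transaction. `is_anomalous`, `min_supp`, `min_conf` and the sample transactions are hypothetical; this is a brute-force check of the definition, not the ATBAR algorithm.

```python
def is_anomalous(transactions, x, y, a, min_supp, min_conf):
    """Check whether 'X => A when not Y' is an anomalous association rule.

    x, y, a are single items (an assumption made for brevity);
    'not Y' means that item y does not appear in the transaction.
    """
    n = len(transactions)
    with_x = [t for t in transactions if x in t]
    x_y = [t for t in with_x if y in t]
    x_not_y = [t for t in with_x if y not in t]
    if not x_y or not x_not_y:
        return False
    # 1. Dominant rule X => Y is frequent and confident
    dominant = len(x_y) / n >= min_supp and len(x_y) / len(with_x) >= min_conf
    # 2. X and not Y => A is confident (A is the anomaly)
    anomaly = sum(a in t for t in x_not_y) / len(x_not_y) >= min_conf
    # 3. X and Y => not A is confident (A rarely occurs in the usual case)
    exclusive = sum(a not in t for t in x_y) / len(x_y) >= min_conf
    return dominant and anomaly and exclusive

# Hypothetical transactions: X usually comes with Y; without Y, A shows up
T = [{"X", "Y", "A1"}, {"X", "Y", "A1"}, {"X", "Y", "A2"}, {"X", "Y", "A2"},
     {"X", "Y", "A3"}, {"X", "Y", "A3"}, {"X", "Y3", "A"}, {"X", "Y4", "A"}]
print(is_anomalous(T, "X", "Y", "A", min_supp=0.5, min_conf=0.7))  # True
```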

52
Anomaly detection
[Figure: sample records over items X, Y, A and Z. Most records with X also contain Y, together with varying values A1, A2, A3 and Z1, Z2, Z3; the records with X but Y3 or Y4 instead of Y contain A.]
X ⇒ Y is the dominant rule.
X ⇒ A when ¬Y is the anomalous rule.

53
Anomaly detection
Suzuki et al.'s "Exception Rules":
- X ⇒ Y is an association rule
- XI ⇒ ¬Y is the exception rule
- I ⇒ ¬Y is the reference rule
- I is the "interacting" itemset
Too many exceptions.
The "cause" needs to be present.

54
Anomaly detection: ATBAR (anomalous association rules)
First scan: A #7, B #9, C #7, D #8
Second scan (extensions of A #7): AB #6, AC #4, AD #5, AE #3, AF #3
Frequent extensions B #6 and D #5 are kept under A; the non-frequent ones are grouped into the extended node A*.

55
Anomaly detection: ATBAR (anomalous association rules)
[Figure: itemset tree after the first and second scans, with extended nodes A*, B*, C* and D* attached to A #7, B #9, C #7 and D #8; other node counts shown: B #6, D #5, C #6, D #7, D #5.]

56
Anomaly detection: ATBAR (anomalous association rules)
Rule generation is immediate from the frequent and extended itemsets obtained by ATBAR.

57
Anomaly detection: Results
Experiments on health-related datasets from the UCI Machine Learning Repository:
- Relatively small set of anomalous rules (typically, a reduction of more than 90% with respect to standard association rules)
- Reasonable overhead needed to obtain anomalous association rules (about 20% in ATBAR w.r.t. TBAR)

58
Anomaly detection: Results
An example from the Census dataset:
if WORKCLASS: Local-gov
then CAPGAIN: [99999.0, 99999.0] (7 out of 7)
when not CAPGAIN: [0.0, 20051.0]
Here CAPGAIN: [0.0, 20051.0] is the usual consequent and CAPGAIN: [99999.0, 99999.0] is the "anomaly".

59
Anomaly detection: Results
- Anomalous association rules (a novel characterization of potentially interesting knowledge)
- An efficient algorithm for discovering anomalous association rules: ATBAR
- Some heuristics for filtering the discovered anomalous association rules

60
Anomaly detection: Future work
- Additional heuristics for focusing on interesting anomalies (maybe domain- or even application-specific).
- Alternative measures for the evaluation and ranking of anomalous association rules: certainty factors, conviction…

61
Questions, comments, and suggestions…
