Download presentation

Presentation is loading. Please wait.

Published byBranden Stobbs Modified about 1 year ago

1
10 -1 Lecture 10 Association Rules Mining Topics –Basics –Mining Frequent Patterns –Mining Frequent Sequential Patterns –Applications

2
10 -2 Basics Data Mining vs. KDD (Knowledge Discovery in Database) KDD Process

3
10 -3 Basics Data Mining: Discovery of patterns and relationships amongst various variables in a database. Association rules: To represent the patterns or correlations discovered by Data Mining. Basic form: where x 1 : item; X or X {y} : transaction Examples: Bread, Milk → Eggs Temp_High, Cloud_None → Burn_High

4
10 -4 Basics Evaluating the strength of associations support, sup( X) : number of transactions that contain X confidence, conf( X y ): probability that a transaction having X also contains y, or sup(X {y})/ sup(X)

5
10 -5 Basics Example association rule: R1: Burger, Fries → Cola Burger, Fries and Cola appear together 100 times in the dataset: sup({Burger, Fries, Cola}) = 100 Burger & Fries appear together with or without Cola is 125. Then the rule confidence is calculated as: conf(R1) = (sup({Burger, Fries, Cola})/ sup({Burger, Fries}) x 100 = (100 / 125) x 100 = 80%

6
10 -6 Basics Association rule mining can be translated to frequent pattern mining –Calculate sup(X {y}) –Calculate sup(X) Mining frequent patterns –A transaction (example) is treated as a set of items (item set) Mining frequent sequential patterns –A transaction is treated as a sequence of items (item sequence)

7
10 -7 Mining Frequent Patterns Apriori algorithm (Agrawal & Srikant, 1994) Step 1: Database Scan to produce C 1 i = 1; Produce C i, set of candidate patterns with length i Associate a support with each element in C i Step 2: Frequent patterns generation Produce L i from C i by keeping elements of sup≥ “minsup”) Step 3: Candidate patterns generation Produce C i+1 from self-joining of L i Prune patterns in C i+1 using “minimum support follows rule: All subsets of a given frequent pattern must be frequent patterns”

8
10 -8 Mining Frequent Patterns Step 4: Database scan to “compute support” Associate a support with each element in C i+1 Step 5: Iteration i ++; Repeat Steps 2, 3 and 4, until no more L i can be generated Step 6: Frequent patterns collection Step 7: Rule generation and pruning Generate rule Prune the rule if conf < “minconf”

9
10 -9 Mining Frequent Patterns Example of frequent patterns mining (minsup=2) L1 Self-joining C1C1 L1L1 L2L2 C2C2 C2C2 Scan D C3C3 L3L3 Scan D and pruning Scan D L2 Self-joining Pruning D: Database

10
10 -10 Mining Frequent Patterns Mined frequent patterns: L = L1 ∪ L2 ∪ L3 = {{1}, {2}, {3}, {5}, {1 3}, {2 3}, {2 5}, {3 5}, {2 3 5}} Rule generation: {2 5} → 3 [conf = 2/3= 67%] {3 5} → 2 [conf = 2/2= 100%] {2, 3} → 5 {1} → 3; {2} → 3; {5} → 3 {2} → 5; {3} → 5 {3} → 2; {5} → 2

11
10 -11 Mining Frequent Patterns Rule generalization can be beyond binary value rule: X y Generalization Original can be beyond binary value can be a set of values can be a set of items can be a set of values

12
10 -12 Mining Frequent Patterns Advantages Offer different approach to other data mining methods, such as classification and clustering Does not ‘generalize’ into a class, but predicts actual items Uses accurate statistical measures to evaluate rules, no inherent ‘error rate’ Take very simple form and are intuitive

13
10 -13 Mining Frequent Patterns Disadvantages The amount of computation depends critically on minimum support specified pre-mine – which often ends up as a ‘guess’ Computational issue due to amount of sweeps required over dataset – especially if dataset is too large to read into memory; ‘thrashing’ must be avoided

14
10 -14 Mining Frequent Sequential Patterns Apriori algorithm on frequent sequential patterns mining –Each frequent pattern is a sequence of items Step 3: Candidate patterns generation by joining has to consider sequential order –Given two sequential patterns: R= and S= –if s j-1 = r j, j = 2, …, k-1, then generate sequential pattern –If r j-1 = s j, j = 2, …, k-1, then generate sequential pattern

15
10 -15 Mining Frequent Sequential Patterns Step 7: Rule generation has to consider sequential order –Given two sequential patterns: S 1 = and S 2 = –we produce a sequential rule: t 1, t 2, …, t k-1 t k with conf(t1,t2,…,tk-1 tk) = sup(S1) / sup(S2)

16
10 -16 Mining Frequent Sequential Patterns Frequent sequential patterns mining based on database partitioning and perfect hashing

17
10 -17 Mining Frequent Sequential Patterns Rule generation Sequential Rule Confidence B,C E1 A C1 B C0.67 B E1 C E0.67

18
10 -18 Applications Wal-Mart has used the technique for years to mine POS data and arrange their store to maximize sales from such analysis Medical databases to discover commonly occurring diseases amongst groups of people Lottery results databases, to discover those lucky combinations of numbers

19
10 -19 Discussions Clustering algorithm in Open Sesame! –Attribute-based representation of events –Attribute-based similarity measure for clusters –Hierarchical clustering of event sequences –Generalization, e.g., “A ∧ B ∧ C” generalized to “A ∧ B” “A ∨ B” generalized to “A ∨ B ∨ C ” ontology –Specialization

Similar presentations

© 2017 SlidePlayer.com Inc.

All rights reserved.

Ads by Google