
1 Introduction to Machine Learning, Lecture 13: Introduction to Association Rules. Albert Orriols i Puig (aorriols@salle.url.edu). Artificial Intelligence – Machine Learning, Enginyeria i Arquitectura La Salle, Universitat Ramon Llull

2 Recap of Lectures 5-12: LET'S START WITH DATA CLASSIFICATION

3 Recap of Lectures 5-12
□ Data set → classification model: how?
□ We have seen four different types of approaches to classification:
  Decision trees (C4.5)
  Instance-based algorithms (kNN & CBR)
  Bayesian classifiers (Naïve Bayes)
  Neural networks (Perceptron, Adaline, Madaline, SVM)

4 Today's Agenda
□ Introduction to Association Rules
□ A Taxonomy of Association Rules
□ Measures of Interest
□ Apriori

5 Introduction to AR
□ Ideas come from market basket analysis (MBA). Let's go shopping!
  Customer 1: milk, eggs, sugar, bread
  Customer 2: milk, eggs, cereal, bread
  Customer 3: eggs, sugar
□ What do my customers buy? Which products are bought together?
□ Aim: find associations and correlations between the different items that customers place in their shopping basket

6 Introduction to AR
□ Formalizing the problem a little bit:
  Transaction database T: a set of transactions T = {t_1, t_2, …, t_n}
  Each transaction contains a set of items I (an itemset)
  An itemset is a collection of items I = {i_1, i_2, …, i_m}
□ General aim: find frequent/interesting patterns, associations, correlations, or causal structures among sets of items or elements in databases or other information repositories, and put these relationships in terms of association rules: X → Y
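To make the later slides concrete, here is a minimal sketch of one possible representation in Python (an assumed layout, not from the slides): each transaction is a frozenset, so "itemset X is contained in transaction t" becomes an ordinary subset test.

# Transaction database T (the five baskets of the next slide), one frozenset per transaction
T = [
    frozenset({"bread", "jelly", "peanut-butter"}),  # t_1
    frozenset({"bread", "peanut-butter"}),           # t_2
    frozenset({"bread", "milk", "peanut-butter"}),   # t_3
    frozenset({"beer", "bread"}),                    # t_4
    frozenset({"beer", "milk"}),                     # t_5
]

X = frozenset({"bread", "peanut-butter"})
print(X <= T[0])  # True: transaction t_1 contains the itemset X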

7 Example of AR
  TID  Items
  T1   bread, jelly, peanut-butter
  T2   bread, peanut-butter
  T3   bread, milk, peanut-butter
  T4   beer, bread
  T5   beer, milk
□ Example rules: bread → peanut-butter, beer → bread
□ Frequent itemsets: items that frequently appear together, e.g., I = {bread, peanut-butter} and I = {beer, bread}

8 What's an Interesting Rule?
□ Support count (σ): frequency of occurrence of an itemset
  σ({bread, peanut-butter}) = 3
  σ({beer, bread}) = 1
□ Support (s): fraction of transactions that contain an itemset
  s({bread, peanut-butter}) = 3/5
  s({beer, bread}) = 1/5
□ Frequent itemset: an itemset whose support is greater than or equal to a minimum support threshold (minsup); see the sketch below
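A minimal sketch of these two definitions over the five-transaction database above; the function names are illustrative, not from the slides.

def support_count(itemset, transactions):
    """sigma(X): number of transactions that contain every item of X."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """s(X): fraction of transactions that contain X."""
    return support_count(itemset, transactions) / len(transactions)

T = [
    {"bread", "jelly", "peanut-butter"},
    {"bread", "peanut-butter"},
    {"bread", "milk", "peanut-butter"},
    {"beer", "bread"},
    {"beer", "milk"},
]
print(support_count({"bread", "peanut-butter"}, T))  # 3
print(support({"beer", "bread"}, T))                 # 0.2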

9 What's an Interesting Rule?
□ An association rule is an implication between two itemsets: X → Y
□ Many measures of interest exist; the two most used are:
  Support (s): the occurring frequency of the rule, i.e., the fraction of transactions that contain both X and Y:
    s(X → Y) = σ(X ∪ Y) / (# of trans.)
  Confidence (c): the strength of the association, i.e., a measure of how often items in Y appear in transactions that contain X:
    c(X → Y) = σ(X ∪ Y) / σ(X)
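A sketch of the two rule measures over the same database (again with illustrative helper names):

def sigma(itemset, transactions):
    return sum(1 for t in transactions if itemset <= t)

def rule_support(X, Y, transactions):
    """s(X -> Y) = sigma(X u Y) / number of transactions."""
    return sigma(X | Y, transactions) / len(transactions)

def rule_confidence(X, Y, transactions):
    """c(X -> Y) = sigma(X u Y) / sigma(X)."""
    return sigma(X | Y, transactions) / sigma(X, transactions)

T = [
    {"bread", "jelly", "peanut-butter"},
    {"bread", "peanut-butter"},
    {"bread", "milk", "peanut-butter"},
    {"beer", "bread"},
    {"beer", "milk"},
]
X, Y = {"bread"}, {"peanut-butter"}
print(rule_support(X, Y, T))     # 0.6
print(rule_confidence(X, Y, T))  # 0.75 (3 of the 4 bread transactions also have peanut-butter)

These values match the row for bread → peanut-butter in the table on the next slide.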

10 Interestingness of Rules
□ Many other interestingness measures exist; the methods presented herein are based on these two.
  Rule                     s     c
  bread → peanut-butter    0.60  0.75
  peanut-butter → bread    0.60  1.00
  beer → bread             0.20  0.50
  peanut-butter → jelly    0.20  0.33
  jelly → peanut-butter    0.20  1.00
  jelly → milk             0.00  0.00

11 Types of AR
□ Binary association rules: bread → peanut-butter
□ Quantitative association rules: weight in [70kg – 90kg] → height in [170cm – 190cm]
□ Fuzzy association rules: weight in TALL → height in TALL
□ Let's start from the beginning: binary association rules and Apriori

12 Apriori
□ This is the most influential AR miner
□ It consists of two steps:
  1. Generate all frequent itemsets whose support ≥ minsup
  2. Use the frequent itemsets to generate association rules
□ So, let's pay attention to the first step

13 Apriori
[Figure: the lattice of all itemsets over the items A–E, from the empty set (null) at the top down to ABCDE]
□ Given d items, we have 2^d possible itemsets. Do I have to generate them all?

14 Apriori
□ Let's avoid expanding the whole graph
□ Key idea, the downward closure property: any subset of a frequent itemset is also a frequent itemset
□ Therefore, the algorithm iteratively does the following (see the sketch below):
  Create itemsets
  Only continue exploring those whose support ≥ minsup
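A minimal sketch of the check this property licenses (illustrative code, not from the slides): a k-itemset is worth counting only if every one of its (k-1)-subsets is already known to be frequent.

from itertools import combinations

def all_subsets_frequent(candidate, frequent_km1):
    """True iff every (k-1)-subset of the k-itemset candidate is frequent."""
    k = len(candidate)
    return all(frozenset(s) in frequent_km1
               for s in combinations(candidate, k - 1))

frequent_2 = {frozenset("AB"), frozenset("AC"), frozenset("BC")}
print(all_subsets_frequent(frozenset("ABC"), frequent_2))  # True: ABC may be frequent
print(all_subsets_frequent(frozenset("ABD"), frequent_2))  # False: AD and BD are infrequent, so ABD cannot be frequent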

15 Example of Itemset Generation
[Figure: the same itemset lattice, now with an infrequent itemset marked and the entire branch of its supersets pruned]
□ Given d items, we have 2^d possible itemsets. Do I have to generate them all?

16 Recovering the Example (minimum support count = 3)
  1-itemsets:  Item      count
               bread     4
               peanut-b  3
               jelly     1
               milk      2
               beer      2
  Only bread and peanut-b are frequent, so the only candidate 2-itemset is:
  2-itemsets:  Item             count
               bread, peanut-b  3

17 Apriori Algorithm
□ k := 1
□ Generate frequent itemsets of length 1
□ Repeat until no frequent itemsets are found:
  k := k + 1
  Generate candidate itemsets of size k from the frequent itemsets of size k-1
  Compute the support of each candidate by scanning the DB

18 Apriori Algorithm

Algorithm Apriori(T)                        // n: no. of transactions in T
  C_1 ← init-pass(T);
  F_1 ← {f | f ∈ C_1, f.count/n ≥ minsup};
  for (k = 2; F_{k-1} ≠ ∅; k++) do
    C_k ← candidate-gen(F_{k-1});
    for each transaction t ∈ T do
      for each candidate c ∈ C_k do
        if c is contained in t then c.count++;
    end
    F_k ← {c ∈ C_k | c.count/n ≥ minsup}
  end
  return F ← ∪_k F_k;

19 Apriori Algorithm

Function candidate-gen(F_{k-1})
  C_k ← ∅;
  forall f_1, f_2 ∈ F_{k-1}
      with f_1 = {i_1, …, i_{k-2}, i_{k-1}}, f_2 = {i_1, …, i_{k-2}, i'_{k-1}} and i_{k-1} < i'_{k-1} do
    c ← {i_1, …, i_{k-1}, i'_{k-1}};      // join f_1 and f_2
    C_k ← C_k ∪ {c};
    for each (k-1)-subset s of c do       // prune
      if (s ∉ F_{k-1}) then delete c from C_k;
    end
  end
  return C_k;
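A Python sketch of the two routines above (not the authors' code), under the assumption that itemsets are stored as sorted tuples so the join over a shared (k-2)-prefix is a plain comparison:

from itertools import combinations

def candidate_gen(F_km1):
    """Join frequent (k-1)-itemsets that share a (k-2)-prefix, then prune."""
    F_km1 = set(F_km1)
    C_k = set()
    for f1 in F_km1:
        for f2 in F_km1:
            # join step: equal prefixes and f1's last item smaller than f2's
            if f1[:-1] == f2[:-1] and f1[-1] < f2[-1]:
                c = f1 + (f2[-1],)
                # prune step: every (k-1)-subset of c must itself be frequent
                if all(s in F_km1 for s in combinations(c, len(c) - 1)):
                    C_k.add(c)
    return C_k

def apriori(T, minsup):
    """Return every itemset (as a sorted tuple) with support >= minsup."""
    n = len(T)
    items = sorted({i for t in T for i in t})
    F = [(i,) for i in items if sum(1 for t in T if i in t) / n >= minsup]
    result = list(F)
    while F:
        C_k = candidate_gen(F)
        # one scan of T per level to count the candidates
        counts = {c: sum(1 for t in T if set(c) <= t) for c in C_k}
        F = sorted(c for c, cnt in counts.items() if cnt / n >= minsup)
        result.extend(F)
    return result

Running it on the database of the next slide reproduces the run shown there:

T = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
print(apriori(T, minsup=0.5))
# [('A',), ('B',), ('C',), ('E',), ('A', 'C'), ('B', 'C'), ('B', 'E'), ('C', 'E'), ('B', 'C', 'E')]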

20 Example of Apriori Run (support count ≥ 2)

Database TDB:
  Tid  Items
  10   A, C, D
  20   B, C, E
  30   A, B, C, E
  40   B, E

1st scan, C_1:  {A}: 2, {B}: 3, {C}: 3, {D}: 1, {E}: 3
L_1:            {A}: 2, {B}: 3, {C}: 3, {E}: 3

C_2: {A, B}, {A, C}, {A, E}, {B, C}, {B, E}, {C, E}
2nd scan, C_2:  {A, B}: 1, {A, C}: 2, {A, E}: 1, {B, C}: 2, {B, E}: 3, {C, E}: 2
L_2:            {A, C}: 2, {B, C}: 2, {B, E}: 3, {C, E}: 2

C_3: {B, C, E}
3rd scan, C_3:  {B, C, E}: 2
L_3:            {B, C, E}: 2

21 Apriori
□ Remember that Apriori consists of two steps:
  1. Generate all frequent itemsets whose support ≥ minsup
  2. Use the frequent itemsets to generate association rules
□ We accomplished step 1, so we have all frequent itemsets
□ So, let's pay attention to the second step

22 Rule Generation in Apriori
□ Given a frequent itemset L:
  Find all non-empty proper subsets F of L such that the association rule F → {L - F} satisfies the minimum confidence
  Create the rule F → {L - F}
□ If L = {A, B, C}, the candidate rules are: AB → C, AC → B, BC → A, A → BC, B → AC, C → AB
□ In general, there are 2^k - 2 candidate rules, where k is the length of the itemset L (see the sketch below)
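A minimal sketch of this enumeration, assuming the support counts σ from step 1 are kept in a dict (the counts below are those of the slide-20 run, with L = {B, C, E}):

from itertools import combinations

def gen_rules(L, sigma, minconf):
    """Emit every rule F -> L-F, F a non-empty proper subset of L, with c >= minconf."""
    rules = []
    for r in range(1, len(L)):                      # antecedent sizes 1 .. |L|-1
        for F in map(frozenset, combinations(L, r)):
            conf = sigma[L] / sigma[F]              # c(F -> L-F) = sigma(L) / sigma(F)
            if conf >= minconf:
                rules.append((F, L - F, conf))
    return rules

sigma = {frozenset("BCE"): 2, frozenset("BC"): 2, frozenset("BE"): 3,
         frozenset("CE"): 2, frozenset("B"): 3, frozenset("C"): 3,
         frozenset("E"): 3}
for F, Y, c in gen_rules(frozenset("BCE"), sigma, minconf=0.8):
    print(sorted(F), "->", sorted(Y), f"(c = {c:.2f})")
# two rules survive: BC -> E and CE -> B, both with c = 1.00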

23 Can You Be More Efficient?
□ Can we apply the same trick we used with support? Confidence does not have the anti-monotone property in general. That is, is c(AB → D) > c(A → D)? We don't know!
□ But the confidence of rules generated from the same itemset does have the anti-monotone property. For L = {A, B, C, D}: c(ABC → D) ≥ c(AB → CD) ≥ c(A → BCD)
□ We can apply this property to prune the rule generation (see the sketch below)
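A sketch of how this pruning might be implemented (an assumed level-wise scheme over consequents, not from the slides): moving items from the antecedent into the consequent can only lower confidence, so once a rule fails minconf its consequent is never enlarged.

def gen_rules_pruned(L, sigma, minconf):
    """Level-wise rule generation from one frequent itemset L, pruning on confidence."""
    rules = []
    consequents = [frozenset({i}) for i in L]       # level 1: single-item consequents
    while consequents:
        survivors = []
        for Y in consequents:
            F = L - Y
            if F and sigma[L] / sigma[F] >= minconf:
                rules.append((F, Y))
                survivors.append(Y)
        # only surviving consequents are merged into the next level:
        # if F -> Y already failed, enlarging Y cannot raise the confidence
        consequents = {Y1 | Y2 for Y1 in survivors for Y2 in survivors
                       if len(Y1 | Y2) == len(Y1) + 1}
    return rules

sigma = {frozenset("BCE"): 2, frozenset("BC"): 2, frozenset("BE"): 3,
         frozenset("CE"): 2, frozenset("B"): 3, frozenset("C"): 3,
         frozenset("E"): 3}
print(gen_rules_pruned(frozenset("BCE"), sigma, minconf=0.8))
# same two rules as before, but E -> BC and B -> CE were never even scored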

24 Example of Efficient Rule Generation
[Figure: the lattice of all rules generated from the frequent itemset ABCD, from ABC → D, ABD → C, ACD → B, BCD → A at the top down to A → BCD, B → ACD, C → ABD, D → ABC; one rule is marked as having low confidence, and every rule below it in the lattice is pruned away]

25 Challenges in AR Mining
□ Challenges:
  Apriori scans the database multiple times
  Most often, there is a high number of candidates
  Support counting for the candidates can be expensive
□ Several methods try to improve on these points by:
  Reducing the number of scans of the database
  Shrinking the number of candidates
  Counting the support of the candidates more efficiently

26 Next Class
□ Advanced topics in association rule mining

27 Introduction to Machine Learning, Lecture 13: Introduction to Association Rules. Albert Orriols i Puig (aorriols@salle.url.edu). Artificial Intelligence – Machine Learning, Enginyeria i Arquitectura La Salle, Universitat Ramon Llull

