
Identifying Interesting Association Rules with Genetic Algorithms
Elnaz Delpisheh
York University, Department of Computer Science and Engineering
April 13, 2015



2 Data mining
[Figure: Data ("Too much data") → Data Mining → Association rules]
$I = \{i_1, i_2, \dots, i_n\}$ is a set of items, and $D = \{t_1, t_2, \dots, t_m\}$ is a transactional database in which each transaction $t_i$ is a nonempty subset of $I$. An association rule has the form $A \rightarrow B$, where $A$ and $B$ are itemsets with $A \subset I$, $B \subset I$, and $A \cap B = \emptyset$. For example: {milk, eggs} → {bread}.
The Apriori algorithm is the most widely used algorithm for association rule mining.
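The definition above can be checked mechanically. This is a minimal sketch, assuming rules are stored as pairs of frozensets; the `Rule` class, `is_valid_rule` function, and the item set containing "butter" are hypothetical, and requiring both sides to be nonempty is an added assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedent: frozenset  # A
    consequent: frozenset  # B


def is_valid_rule(rule: Rule, items: frozenset) -> bool:
    """Check A ⊆ I, B ⊆ I and A ∩ B = ∅ (both sides assumed nonempty)."""
    a, b = rule.antecedent, rule.consequent
    return bool(a) and bool(b) and a <= items and b <= items and a.isdisjoint(b)


I = frozenset({"milk", "eggs", "bread", "butter"})  # hypothetical item set
good = Rule(frozenset({"milk", "eggs"}), frozenset({"bread"}))
bad = Rule(frozenset({"milk"}), frozenset({"milk", "bread"}))  # overlapping sides
print(is_valid_rule(good, I), is_valid_rule(bad, I))  # True False
```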

3 Apriori Algorithm
TID    List of item IDs
T100   I1, I2, I3
T200   I2, I4
T300   I2, I3
T400   I1, I2, I4
T500   I1, I3
T600   I2, I3
T700   I1, I3
T800   I1, I2, I3, I5
T900   I1, I2, I3

4 Apriori Algorithm (Cont.)
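A compact sketch of Apriori on the transactions from the slide 3 table (copied as given). The slides state no threshold, so a minimum support count of 2 is assumed; `apriori` and `support_count` are hypothetical names. Each level joins frequent (k-1)-itemsets into k-candidates, prunes candidates with an infrequent (k-1)-subset, and keeps those meeting the threshold.

```python
from itertools import combinations

# Transactions from the slide 3 table.
TRANSACTIONS = {
    "T100": {"I1", "I2", "I3"}, "T200": {"I2", "I4"},
    "T300": {"I2", "I3"}, "T400": {"I1", "I2", "I4"},
    "T500": {"I1", "I3"}, "T600": {"I2", "I3"},
    "T700": {"I1", "I3"}, "T800": {"I1", "I2", "I3", "I5"},
    "T900": {"I1", "I2", "I3"},
}


def support_count(itemset, transactions):
    return sum(1 for t in transactions if itemset <= t)


def apriori(transactions, min_count):
    """Return {frozenset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    current = {}  # frequent 1-itemsets
    for i in items:
        n = support_count(frozenset([i]), transactions)
        if n >= min_count:
            current[frozenset([i])] = n
    frequent = dict(current)
    k = 2
    while current:
        prev = list(current)
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            n = support_count(c, transactions)
            if n >= min_count:
                current[c] = n
        frequent.update(current)
        k += 1
    return frequent


frequent = apriori(TRANSACTIONS.values(), min_count=2)
```

With this threshold, I5 (one occurrence) is dropped at the first level, and {I1, I2, I4} is pruned without a database scan because {I1, I4} is infrequent.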

5 Association rule mining
[Figure: Data ("Too much data") → Data Mining → Association rules ("Too many association rules")]

6 Interestingness criteria
Comprehensibility, conciseness, diversity, generality, novelty, utility, etc.

7 Interestingness measures
Subjective measures consider both the data and the user's prior knowledge. Examples: comprehensibility, novelty, surprisingness, utility.
Objective measures consider the structure of an association rule. Examples: conciseness, diversity, generality, peculiarity.
Example: support represents the generality of a rule A → B. It counts the number of transactions that contain both A and B.
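The support count described above is a one-line filter over the database. This sketch also returns relative support (the count divided by |D|), the usual normalised form. The function name and the toy database `D` are hypothetical.

```python
def support(antecedent, consequent, transactions, relative=True):
    """Support of A -> B: transactions containing every item of A and of B."""
    both = set(antecedent) | set(consequent)
    count = sum(1 for t in transactions if both <= set(t))
    return count / len(transactions) if relative else count


# Hypothetical toy database.
D = [
    {"milk", "eggs", "bread"},
    {"milk", "bread"},
    {"eggs", "bread"},
    {"milk", "eggs"},
]
print(support({"milk", "eggs"}, {"bread"}, D, relative=False))  # 1
print(support({"milk", "eggs"}, {"bread"}, D))  # 0.25
```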

8 Drawbacks of objective measures
Drawbacks: database dependence; lack of knowledge about the database; threshold dependence.
Solution: reanalyse the database multiple times.
Problem: this requires a large number of disk I/O operations.
Database independence

9 Genetic algorithm-based learning (ARMGA)
1. Initialize the population.
2. Evaluate the individuals in the population.
3. Repeat until a stopping criterion is met:
A. Select individuals from the current population.
B. Recombine them to obtain new individuals.
C. Evaluate the new individuals.
D. Replace some or all of the individuals in the current population with the offspring.
4. Return the best individual seen so far.
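The loop above can be sketched generically as follows. Several choices are assumptions, not ARMGA's: the stopping criterion is a fixed number of generations, replacement keeps the fittest individuals from parents plus offspring, and the toy problem (maximise the number of 1-bits) with tournament selection, one-point crossover, and bit-flip mutation is only a placeholder. All names are hypothetical.

```python
import random


def genetic_algorithm(init, fitness, select, recombine, generations=100, seed=None):
    """Generic GA loop following steps 1-4 on the slide."""
    rng = random.Random(seed)
    population = init(rng)                                 # 1. initialize
    scored = [(fitness(c), c) for c in population]         # 2. evaluate
    best = max(scored, key=lambda p: p[0])
    for _ in range(generations):                           # 3. fixed budget (assumption)
        parents = select(scored, rng)                      # 3A. select
        offspring = recombine(parents, rng)                # 3B. recombine
        new_scored = [(fitness(c), c) for c in offspring]  # 3C. evaluate
        # 3D. replace: keep the fittest len(population) of parents + offspring.
        pool = sorted(scored + new_scored, key=lambda p: p[0], reverse=True)
        scored = pool[:len(population)]
        best = max(best, scored[0], key=lambda p: p[0])
    return best                                            # 4. best seen so far


# Placeholder toy problem (not association rules): maximise the number of 1-bits.
N_BITS, POP_SIZE = 20, 30


def init(rng):
    return [tuple(rng.randint(0, 1) for _ in range(N_BITS)) for _ in range(POP_SIZE)]


def tournament(scored, rng, k=3):
    # Each parent is the fittest of k randomly sampled individuals.
    return [max(rng.sample(scored, k), key=lambda p: p[0])[1] for _ in range(len(scored))]


def crossover_and_mutate(parents, rng, p_mut=0.05):
    children = []
    for a, b in zip(parents[::2], parents[1::2]):
        cut = rng.randrange(1, N_BITS)  # one-point crossover
        for child in (a[:cut] + b[cut:], b[:cut] + a[cut:]):
            # Bit-flip mutation: XOR each bit with a Bernoulli(p_mut) draw.
            children.append(tuple(bit ^ (rng.random() < p_mut) for bit in child))
    return children


best_fitness, best_bits = genetic_algorithm(init, sum, tournament, crossover_and_mutate, seed=1)
```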

10 ARMGA Modeling
Given an association rule $X \rightarrow Y$:
Requirement: $\mathrm{Conf}(X \rightarrow Y) > \mathrm{Supp}(Y)$
Aim is to maximise
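The requirement can be checked directly from support counts. Because Conf(X → Y) = Supp(X ∪ Y) / Supp(X), the condition Conf(X → Y) > Supp(Y) is equivalent to Supp(X ∪ Y) > Supp(X) · Supp(Y): X and Y occur together more often than they would if they were independent. This sketch only tests the requirement; it does not implement the quantity ARMGA maximises. Function names and the toy database are hypothetical.

```python
def supp(itemset, transactions):
    """Relative support: fraction of transactions containing every item in itemset."""
    s = set(itemset)
    return sum(1 for t in transactions if s <= set(t)) / len(transactions)


def conf(x, y, transactions):
    """Confidence of X -> Y: supp(X ∪ Y) / supp(X); assumes supp(X) > 0."""
    return supp(set(x) | set(y), transactions) / supp(x, transactions)


def meets_requirement(x, y, transactions):
    """Requirement from the slide: Conf(X -> Y) > Supp(Y)."""
    return conf(x, y, transactions) > supp(y, transactions)


# Hypothetical toy database.
D = [
    {"milk", "eggs", "bread"},
    {"milk", "bread"},
    {"eggs", "bread"},
    {"milk", "eggs"},
    {"butter"},
]
print(meets_requirement({"milk"}, {"bread"}, D))    # True: 2/3 > 3/5
print(meets_requirement({"butter"}, {"bread"}, D))  # False: 0 is not > 3/5
```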

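11 ARMGA Encoding
Michigan strategy: each chromosome encodes a single rule.
Given an association k-rule $X \rightarrow Y$, where $X, Y \subset I$, $I = \{i_1, i_2, \dots, i_n\}$ is a set of items, and $X \cap Y = \emptyset$. For example, $\{A_1, \dots, A_j\} \rightarrow \{A_{j+1}, \dots, A_k\}$.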
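One way to realise this k-rule form is to store the k items in order and split them at position j. This representation and the `decode_k_rule` name are hypothetical illustrations; note that the chromosome length equals k, so it changes with the size of the rule.

```python
def decode_k_rule(genes, j):
    """Split a k-gene chromosome into X = genes[:j] and Y = genes[j:]."""
    if not 0 < j < len(genes):
        raise ValueError("j must leave both X and Y nonempty")
    if len(set(genes)) != len(genes):
        raise ValueError("an item may appear only once, so that X ∩ Y = ∅")
    return frozenset(genes[:j]), frozenset(genes[j:])


# The 4-rule {A, C, F} -> {B}: k = 4 genes, split after j = 3.
x, y = decode_k_rule(["A", "C", "F", "B"], 3)
```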

12 ARMGA Encoding (Cont.)
The encoding above depends heavily on the length of the chromosome, so we use another type of encoding. Given the set of items {A, B, C, D, E, F}, the association rule ACF → B is encoded as:
00A11B00C01D01E00F
00: item is in the antecedent
11: item is in the consequent
01 or 10: item is absent
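A minimal sketch of this fixed-length encoding, assuming the chromosome is stored as a list of 2-bit codes in a fixed item order (the item letters are only interleaved for display). The chromosome always has 2 · |I| bits, whatever the number of items in the rule. Function names are hypothetical; absent items are encoded as 01 here, although 10 would mean the same.

```python
ITEMS = ["A", "B", "C", "D", "E", "F"]


def encode(antecedent, consequent, items=ITEMS):
    """One 2-bit code per item, in a fixed item order."""
    codes = []
    for item in items:
        if item in antecedent:
            codes.append("00")
        elif item in consequent:
            codes.append("11")
        else:
            codes.append("01")  # absent ("10" would mean the same)
    return codes


def decode(codes, items=ITEMS):
    """Recover (X, Y); genes coded 01 or 10 are ignored as absent."""
    x = {i for i, c in zip(items, codes) if c == "00"}
    y = {i for i, c in zip(items, codes) if c == "11"}
    return x, y


def show(codes, items=ITEMS):
    return "".join(c + i for c, i in zip(codes, items))


codes = encode({"A", "C", "F"}, {"B"})
print(show(codes))  # 00A11B00C01D01E00F
```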

13 ARMGA Operators
Select, crossover, mutation

14 ARMGA Operators-Select
Select(c, ps) acts as a filter on chromosomes, where c is a chromosome and ps is a pre-specified probability.
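The slide does not say how ps is applied. The sketch below assumes the simplest reading, in which each chromosome passes the filter with probability ps; ARMGA's actual filter may differ, and `select` and `filter_population` are hypothetical names.

```python
import random


def select(c, ps, rng=random):
    """Hypothetical reading of Select(c, ps): keep chromosome c with probability ps."""
    if not 0.0 <= ps <= 1.0:
        raise ValueError("ps must be a probability")
    return c if rng.random() < ps else None


def filter_population(population, ps, seed=None):
    """Apply the filter to every chromosome and keep the survivors."""
    rng = random.Random(seed)
    return [c for c in population if select(c, ps, rng) is not None]


survivors = filter_population(list(range(1000)), 0.5, seed=0)
```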

15 ARMGA Operators-Crossover
This operator uses a two-point crossover strategy.
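A minimal sketch of two-point crossover on chromosomes stored as lists of 2-bit genes, one gene per item as in the encoding on slide 12. Assumption: both cut points fall on gene boundaries, so every child gene remains a valid code. The function name is hypothetical.

```python
import random


def two_point_crossover(p1, p2, rng=random):
    """Swap the segment between two cut points i < j (cuts fall between genes)."""
    if len(p1) != len(p2) or len(p1) < 3:
        raise ValueError("parents need equal length of at least 3 genes")
    i, j = sorted(rng.sample(range(1, len(p1)), 2))
    child1 = p1[:i] + p2[i:j] + p1[j:]
    child2 = p2[:i] + p1[i:j] + p2[j:]
    return child1, child2


p1 = ["00", "11", "00", "01", "01", "00"]  # ACF -> B
p2 = ["11", "01", "10", "00", "11", "01"]  # D -> AE
c1, c2 = two_point_crossover(p1, p2, random.Random(3))
```

Because both cuts lie strictly inside the chromosome, the first and last genes of each child always come from the same parent.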

16 ARMGA Operators-Mutate
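The slide gives no details of the mutation operator. As an assumption, this sketch uses a common per-gene mutation on the 2-bit encoding: each gene is replaced, with probability pm, by a different code chosen uniformly. Switching between 01 and 10 leaves the rule unchanged, since both mean "absent". The function and constant names are hypothetical.

```python
import random

CODES = ("00", "11", "01", "10")  # antecedent, consequent, absent, absent


def mutate(chromosome, pm, rng=random):
    """Replace each gene, with probability pm, by a different random code."""
    out = []
    for gene in chromosome:
        if rng.random() < pm:
            gene = rng.choice([c for c in CODES if c != gene])
        out.append(gene)
    return out


parent = ["00", "11", "00", "01", "01", "00"]  # ACF -> B
child = mutate(parent, pm=0.2, rng=random.Random(7))
```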

17 ARMGA Initialization

18 ARMGA Algorithm

19 Empirical studies and evaluation
Implement the entire procedure in Visual C++.
Use WEKA to produce interesting association rules.
Compare the results.
