

1
**Identifying Interesting Association Rules with Genetic Algorithms**

Elnaz Delpisheh, Department of Computer Science and Engineering, York University, April-10-17

2
**Data Mining**

Too much data → data mining → association rules. I = {i1, i2, ..., in} is a set of items. D = {t1, t2, ..., tn} is a transactional database, where each ti is a nonempty subset of I. An association rule has the form A→B, where A and B are itemsets, A ⊂ I, B ⊂ I, and A∩B = ∅; for example, {milk, eggs}→{bread}. The Apriori algorithm is the most widely used for association rule mining, although other algorithms exist, such as FP-growth.
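The definitions above translate directly into Python sets; a minimal sketch (the item names are illustrative, not from a real dataset):

```python
# I: the set of items (illustrative names)
I = {"milk", "eggs", "bread", "butter"}

# D: a transactional database; each t_i is a nonempty subset of I
D = [{"milk", "eggs", "bread"},
     {"milk", "bread"},
     {"eggs", "butter"}]

# The example rule {milk, eggs} -> {bread}
A, B = {"milk", "eggs"}, {"bread"}
assert A < I and B < I and not (A & B)   # A ⊂ I, B ⊂ I, A ∩ B = ∅
assert all(t and t <= I for t in D)      # every transaction is a nonempty subset of I
```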

3
**Apriori Algorithm**

| TID  | List of item IDs |
|------|------------------|
| T100 | I1, I2, I3       |
| T200 | I2, I4           |
| T300 | …                |
| …    | …                |
| …    | I1, I2, I3, I5   |
| T900 | I1, I2, I3       |
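The level-wise idea behind Apriori can be sketched compactly (candidate pruning is omitted for brevity; the demo database uses only the transactions recoverable from the table above):

```python
from itertools import combinations

def apriori(D, min_sup):
    """Minimal sketch of Apriori's level-wise frequent-itemset mining."""
    items = {i for t in D for i in t}
    candidates = [frozenset([i]) for i in items]
    frequent = {}
    while candidates:
        # one database pass: count the support of every candidate
        counts = {c: sum(c <= t for t in D) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_sup}
        frequent.update(level)
        # join step: merge frequent k-itemsets into (k+1)-candidates
        candidates = {a | b for a, b in combinations(level, 2)
                      if len(a | b) == len(a) + 1}
    return frequent

# Transactions recoverable from the table (others are lost in the source)
D = [frozenset(t) for t in ({"I1", "I2", "I3"}, {"I2", "I4"},
                            {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"})]
freq = apriori(D, min_sup=2)
```

On this fragment, {I4} and {I5} fall below the support threshold at level 1, so no candidate containing them is ever generated.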

4
**Apriori Algorithm (Cont.)**

5
**Association rule mining**

Too much data → data mining → too many association rules.

6
**Interestingness criteria**

Comprehensibility. Conciseness. Diversity. Generality. Novelty. Utility. ...

7
**Interestingness measures**

Subjective measures: the data and the user’s prior knowledge are considered (comprehensibility, novelty, surprisingness, utility). Objective measures: only the structure of an association rule is considered (conciseness, diversity, generality, peculiarity). Example: support represents the generality of a rule; for a rule A→B, it counts the transactions containing both A and B.
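Support and its companion measure confidence are simple to compute; a sketch over the earlier illustrative database:

```python
def supp(X, D):
    """Support of itemset X: fraction of transactions containing X."""
    return sum(X <= t for t in D) / len(D)

def conf(A, B, D):
    """Confidence of the rule A -> B: supp(A ∪ B) / supp(A)."""
    return supp(A | B, D) / supp(A, D)

D = [{"milk", "eggs", "bread"}, {"milk", "bread"},
     {"eggs", "bread"}, {"milk", "eggs"}]
s = supp({"milk", "eggs"}, D)             # transactions 1 and 4 -> 0.5
c = conf({"milk", "eggs"}, {"bread"}, D)  # 0.25 / 0.5 -> 0.5
```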

8
**Drawbacks of objective measures**

Database dependence: objective measures lack knowledge about the database and depend on user-specified thresholds. One workaround is multiple database reanalysis, but it requires a large number of disk I/Os, and some databases are simply too large; association rule mining must confront exponential search spaces. Database independence: this approach does not require users to specify thresholds. Instead of generating an unknown number of interesting rules, as the traditional models do, only the most interesting rules are extracted, according to the interestingness measure defined by the fitness function.

9
**Genetic algorithm-based learning (ARMGA )**

Initialize the population and evaluate its individuals. Repeat until a stopping criterion is met:
- Select individuals from the current population.
- Recombine them to obtain new individuals.
- Evaluate the new individuals.
- Replace some or all of the current population with the offspring.
Return the best individual seen so far. Genetic algorithms for rule mining are usually divided into two groups according to how rules are encoded in the population of chromosomes: the Michigan approach (each rule is one individual; widely used, but impractical when the number of rules is very large) and the Pittsburgh approach.
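The loop above can be sketched generically; the operator signatures and the toy instantiation (maximising the number of 1-bits) are assumptions to show the plumbing, not ARMGA's actual operators:

```python
import random

def genetic_algorithm(init, fitness, select, crossover, mutate, generations=50):
    """The loop above, with hypothetical operator signatures."""
    pop = init()
    best = max(pop, key=fitness)                       # best individual seen so far
    for _ in range(generations):
        parents = select(pop)                          # select from current population
        pop = [mutate(c) for c in crossover(parents)]  # recombine, then mutate
        best = max(pop + [best], key=fitness)          # elitist bookkeeping
    return best

# Toy instantiation: evolve 12-bit strings toward all ones.
random.seed(0)
n = 12
init = lambda: [[random.randint(0, 1) for _ in range(n)] for _ in range(20)]
fitness = sum

def select(pop):                      # truncation selection: keep the fitter half
    return sorted(pop, key=fitness, reverse=True)[:len(pop) // 2]

def crossover(parents):               # one-point crossover of adjacent parents
    kids = []
    for a, b in zip(parents[::2], parents[1::2]):
        p = random.randrange(1, n)
        kids += [a[:p] + b[p:], b[:p] + a[p:]]
    return kids + parents

def mutate(c, pm=0.05):               # flip each gene with probability pm
    return [1 - g if random.random() < pm else g for g in c]

best = genetic_algorithm(init, fitness, select, crossover, mutate)
```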

10
**ARMGA Modeling**

Given an association rule X→Y, the requirement is conf(X→Y) > supp(Y). The aim is to maximise the margin by which conf(X→Y) exceeds supp(Y), since we are only interested in positive rules.
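One plausible way to turn this requirement into a fitness function is to normalise the margin conf(X→Y) − supp(Y); note this normalisation is an assumption, not necessarily the paper's exact formula:

```python
def supp(X, D):
    """Support of itemset X as a fraction of the transactions."""
    return sum(X <= t for t in D) / len(D)

def fitness(X, Y, D):
    """Positive exactly when conf(X -> Y) > supp(Y), and at most 1.
    (Assumed normalisation; requires supp(X) > 0 and supp(Y) < 1.)"""
    conf = supp(X | Y, D) / supp(X, D)
    return (conf - supp(Y, D)) / (1 - supp(Y, D))

# Illustrative database: A -> B always holds, so the rule scores 1.0
D = [{"A", "B"}, {"A", "B"}, {"A", "B"}, {"C"}]
score = fitness({"A"}, {"B"}, D)
```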

11
**ARMGA Encoding: Michigan Strategy**

Given an association k-rule X→Y, where X, Y ⊂ I, I is the set of items {i1, i2, ..., in}, and X∩Y = ∅; for example, {A1, ..., Aj}→{Aj+1, ..., Ak}. Michigan approach: each rule is encoded into an individual. Pittsburgh approach: a set of rules is encoded into a chromosome.

12
ARMGA Encoding (Cont.) The encoding above makes the chromosome length depend on the rule length, so we use a fixed-length encoding instead. Given the set of items {A, B, C, D, E, F}, the association rule ACF→BE is encoded as 00A 11B 00C 01D 11E 00F, where 00 marks an antecedent item, 11 a consequent item, and 01/10 an absent item.
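Decoding this fixed-length scheme is a single pass over the item/bit pairs; a sketch (the pair-per-item representation is an illustrative choice):

```python
def decode(chromosome):
    """Decode the fixed-length 2-bits-per-item encoding.
    chromosome: list of (bits, item) pairs."""
    antecedent = {item for bits, item in chromosome if bits == "00"}
    consequent = {item for bits, item in chromosome if bits == "11"}
    return antecedent, consequent          # "01"/"10" mark absent items

# The encoded chromosome 00A 11B 00C 01D 11E 00F from the slide
chromosome = [("00", "A"), ("11", "B"), ("00", "C"),
              ("01", "D"), ("11", "E"), ("00", "F")]
X, Y = decode(chromosome)
```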

13
ARMGA Operators: Select, Crossover, Mutation

14
**ARMGA Operators-Select**

select(c, ps) acts as a filter on the chromosome, where c is the chromosome and ps is a pre-specified probability.
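The slide gives only the signature, so the behaviour below is an assumption: a plausible reading is that the chromosome passes the filter with probability ps.

```python
import random

def select(c, ps):
    """Assumed behaviour of select(c, ps): chromosome c passes the
    filter with the pre-specified probability ps."""
    return random.random() < ps
```

With ps = 1.0 every chromosome passes; with ps = 0.0 none do.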

15
**ARMGA Operators-Crossover**

This operation uses a two-point crossover strategy.
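A two-point crossover picks two cut points and swaps the segment between them; a minimal sketch on bit-string chromosomes:

```python
import random

def two_point_crossover(a, b):
    """Two-point strategy: swap the segment between two random cut points."""
    i, j = sorted(random.sample(range(1, len(a)), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]

c1, c2 = two_point_crossover("000000", "111111")
```

Whatever the cut points, the two children jointly preserve every gene of the two parents.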

16
**ARMGA Operators-Mutate**
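The slide gives no details, so the following is a generic sketch: bit-flip mutation, where each gene flips independently with a small probability pm.

```python
import random

def mutate(chromosome, pm=0.05):
    """Generic bit-flip mutation (assumed; the slide gives no details):
    each gene flips independently with probability pm."""
    flip = {"0": "1", "1": "0"}
    return "".join(flip[g] if random.random() < pm else g for g in chromosome)
```

With pm = 0.0 the chromosome is returned unchanged; with pm = 1.0 every bit flips.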

17
ARMGA Initialization

18
ARMGA Algorithm

19
**Empirical studies and Evaluation**

- Implement the entire procedure in Visual C++.
- Use WEKA to produce interesting association rules.
- Compare the results.
