Data Mining  Association Rule  Classification  Clustering.

Presentation on theme: "Data Mining: Association Rule, Classification, Clustering" (presentation transcript):

1 Data Mining: Association Rule, Classification, Clustering

2 Data Mining: Association Rule

3 What Is Association Mining? Association Rule Mining – Finding frequent patterns, associations, correlations, or causal structures among item sets in transaction databases, relational databases, and other information repositories. Applications – Market basket analysis (marketing strategy: items to put on sale at reduced prices), cross-marketing, catalog design, shelf space layout design, etc. Examples – Rule form: Body ⇒ Head [Support, Confidence]. – buys(x, “Computer”) ⇒ buys(x, “Software”) [2%, 60%] – major(x, “CS”) ∧ takes(x, “DB”) ⇒ grade(x, “A”) [1%, 75%]

4 Market Basket Analysis Typically, association rules are considered interesting if they satisfy both a minimum support threshold and a minimum confidence threshold.

5 Rule Measures: Support and Confidence With a minimum support of 50% and a minimum confidence of 50%, we have – A ⇒ C [50%, 66.6%] – C ⇒ A [50%, 100%]
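The two measures on this slide can be checked with a few lines of Python. This is only a sketch: the transcript does not include the slide's transaction table, so the four-transaction database below is a hypothetical one chosen to reproduce the quoted figures, and the function names `support` and `confidence` are illustrative.

```python
from fractions import Fraction

# Hypothetical database (not in the transcript), chosen so that the
# slide's figures for A => C and C => A come out as quoted.
transactions = [
    {"A", "B", "C"},
    {"A", "C"},
    {"A", "D"},
    {"B", "E", "F"},
]

def support(itemset, db):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return Fraction(sum(itemset <= t for t in db), len(db))

def confidence(body, head, db):
    """support(body and head together) divided by support(body)."""
    return support(set(body) | set(head), db) / support(body, db)

print(support({"A", "C"}, transactions))       # 1/2
print(confidence({"A"}, {"C"}, transactions))  # 2/3
print(confidence({"C"}, {"A"}, transactions))  # 1
```

Both rules share the same support because support depends only on the combined itemset {A, C}; confidence differs because it is normalised by the support of the rule body.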

6 Support & Confidence

7 Association Rule: Basic Concepts Given – (1) a database of transactions, – (2) each transaction is a list of items (purchased by a customer in a visit) Find all rules that correlate the presence of one set of items with that of another set of items. Find all the rules A ⇒ B with minimum confidence and support – support, s: P(A ∪ B), the probability that a transaction contains all items of both A and B – confidence, c: P(B|A), the probability that a transaction containing A also contains B

8 Terminologies Item –I1, I2, I3, … –A, B, C, … Itemset –{I1}, {I1, I7}, {I2, I3, I5}, … –{A}, {A, G}, {B, C, E}, … 1-Itemset –{I1}, {I2}, {A}, … 2-Itemset –{I1, I7}, {I3, I5}, {A, G}, …

9 Terminologies K-Itemset –If the length of the itemset is K Frequent (Large) K-Itemset –If the length of the itemset is K and the itemset satisfies a minimum support threshold. Association Rule –If a rule satisfies both a minimum support threshold and a minimum confidence threshold

10 Analysis The number of itemsets of a given cardinality tends to grow exponentially
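To make the growth concrete: with n distinct items there are C(n, k) possible k-itemsets and 2^n − 1 non-empty itemsets overall. The short sketch below just evaluates these standard counting formulas; the item counts 5, 10 and 20 and the helper name `count_all_itemsets` are arbitrary illustrations, not values from the slides.

```python
from math import comb

def count_all_itemsets(n_items):
    """Number of non-empty itemsets over n_items distinct items."""
    return 2 ** n_items - 1

for n in (5, 10, 20):
    per_size = [comb(n, k) for k in (1, 2, 3)]  # 1-, 2- and 3-itemsets
    print(n, per_size, count_all_itemsets(n))
```

This is why a brute-force scan over every possible itemset is impractical and why pruning the candidate space matters.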

11 Fast Algorithms for Mining Association Rules

12 Mining Association Rules: Apriori Principle For rule A ⇒ C: – support = support({A ∪ C}) = 50% – confidence = support({A ∪ C})/support({A}) = 66.6% The Apriori principle: – Any subset of a frequent itemset must be frequent (Min. support 50%, min. confidence 50%)

13 Mining Frequent Itemsets: the Key Step Find the frequent itemsets: the sets of items that have minimum support – A subset of a frequent itemset must also be a frequent itemset, i.e., if {A, B} is a frequent itemset, both {A} and {B} must be frequent itemsets – Iteratively find frequent itemsets with cardinality from 1 to k (k-itemsets) Use the frequent itemsets to generate association rules
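The second step, turning frequent itemsets into rules, can be sketched as follows. The support counts are the ones shown in the slide-14 example database; the 0.7 confidence threshold and the name `generate_rules` are assumptions made for illustration only.

```python
from itertools import combinations

# Support counts of the frequent itemsets from the slide-14 example.
support_count = {
    frozenset({1}): 2, frozenset({2}): 3, frozenset({3}): 3, frozenset({5}): 3,
    frozenset({1, 3}): 2, frozenset({2, 3}): 2,
    frozenset({2, 5}): 3, frozenset({3, 5}): 2,
    frozenset({2, 3, 5}): 2,
}

def generate_rules(support_count, min_conf):
    """Return (body, head, confidence) for every rule meeting min_conf."""
    rules = []
    for itemset, count in support_count.items():
        # Every non-empty proper subset of the itemset can serve as a body.
        for size in range(1, len(itemset)):
            for body in combinations(sorted(itemset), size):
                body = frozenset(body)
                head = itemset - body
                # The body is frequent too (Apriori principle), so its
                # count is guaranteed to be in the table.
                conf = count / support_count[body]
                if conf >= min_conf:
                    rules.append((body, head, conf))
    return rules

rules = generate_rules(support_count, min_conf=0.7)  # 0.7 is an assumed threshold
for body, head, conf in rules:
    print(sorted(body), "=>", sorted(head), round(conf, 2))
```

No further database scan is needed here: each confidence is a ratio of two counts already collected while mining the frequent itemsets.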

14 Another Example 1
Database D (transactions): {1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}
Scan D, count C1: {1}: 2, {2}: 3, {3}: 3, {4}: 1, {5}: 3
Generate L1: {1}, {2}, {3}, {5}
Generate C2: {1, 2}, {1, 3}, {1, 5}, {2, 3}, {2, 5}, {3, 5}
Scan D, count C2: {1, 2}: 1, {1, 3}: 2, {1, 5}: 1, {2, 3}: 2, {2, 5}: 3, {3, 5}: 2
Generate L2: {1, 3}, {2, 3}, {2, 5}, {3, 5}
Generate C3: {2, 3, 5}
Scan D, count C3: {2, 3, 5}: 2
Generate L3: {2, 3, 5}
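The level-wise run on this slide can be reproduced with a compact Apriori sketch. The slide does not state the threshold; the counts it keeps and drops imply a minimum support count of 2, which is assumed below. The function name `apriori` and the union-based join are illustrative choices: after pruning, this join yields the same candidates as the usual prefix-based self-join.

```python
from itertools import combinations

def apriori(db, min_count):
    """Return a list of {itemset: count} dicts, one per level L1, L2, ..."""
    db = [frozenset(t) for t in db]

    def count(candidates):
        # One scan of the database per level.
        return {c: sum(c <= t for t in db) for c in candidates}

    items = {frozenset([i]) for t in db for i in t}
    frequent = {c: n for c, n in count(items).items() if n >= min_count}
    levels, k = [], 1
    while frequent:
        levels.append(frequent)
        k += 1
        prev = set(frequent)
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        frequent = {c: n for c, n in count(candidates).items() if n >= min_count}
    return levels

D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
levels = apriori(D, min_count=2)  # threshold inferred from the slide's counts
for i, level in enumerate(levels, start=1):
    print(f"L{i}:", sorted((sorted(s), n) for s, n in level.items()))
```

Note that {1, 2, 3} and {1, 3, 5} never get counted: the prune step discards them because {1, 2} and {1, 5} are not in L2, which is why C3 contains only {2, 3, 5}.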

15 Example of Generating Candidates L3 = {abc, abd, acd, ace, bcd} Self-joining: L3 * L3 – abcd from abc and abd – acde from acd and ace Pruning: – acde is removed because ade is not in L3 C4 = {abcd}
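The join and prune steps of this example can be sketched in Python as below. Itemsets are written as strings of single-letter items, as on the slide; the function name `generate_candidates` is hypothetical, and the join shown is the common prefix-based self-join (two sorted k-itemsets are merged when they agree on all but their last item).

```python
from itertools import combinations

def generate_candidates(prev_level):
    """Return (joined, pruned) candidate lists built from frequent k-itemsets."""
    prev = sorted(tuple(sorted(s)) for s in prev_level)
    if not prev:
        return [], []
    prev_set = set(prev)
    k = len(prev[0]) + 1
    # Self-join: merge two itemsets that share their first k-2 items.
    joined = [a + (b[-1],)
              for i, a in enumerate(prev)
              for b in prev[i + 1:]
              if a[:-1] == b[:-1]]
    # Prune: drop candidates that have an infrequent (k-1)-subset.
    pruned = [c for c in joined
              if all(s in prev_set for s in combinations(c, k - 1))]
    return joined, pruned

L3 = ["abc", "abd", "acd", "ace", "bcd"]
joined, C4 = generate_candidates(L3)
print(["".join(c) for c in joined])  # ['abcd', 'acde']
print(["".join(c) for c in C4])      # ['abcd']
```

The prune step is a direct use of the Apriori principle: acde cannot be frequent because its subset ade is not in L3, so it is removed without scanning the database.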

16 Example

17 Apriori Algorithm

18

19

20 Another Example 2

21 Demo: IBM Intelligent Miner

22 Demo Database

23

24

25

26 Multi-Dimensional Association Single-Dimensional (Intra-Dimension) Rules: Single Dimension (Predicate) with Multiple Occurrences. buys(X, “milk”) ⇒ buys(X, “bread”) Multi-Dimensional Rules: ≥ 2 Dimensions – Inter-dimension association rules (no repeated predicates) age(X, “19-25”) ∧ occupation(X, “student”) ⇒ buys(X, “coke”) – Hybrid-dimension association rules (repeated predicates) age(X, “19-25”) ∧ buys(X, “popcorn”) ⇒ buys(X, “coke”) Categorical (Nominal) Attributes – finite number of possible values, no ordering among values Quantitative Attributes – numeric, implicit ordering among values
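The three rule types on this slide differ only in how many distinct predicates appear and whether any predicate repeats. A minimal sketch of that distinction follows; it assumes a rule is represented as two lists of (predicate, value) pairs, and both that representation and the name `classify_rule` are illustrative rather than taken from the slides.

```python
def classify_rule(body, head):
    """Classify a rule by the predicates (dimensions) it uses."""
    predicates = [name for name, _ in body + head]
    distinct = set(predicates)
    if len(distinct) == 1:
        return "single-dimensional"   # one predicate, used repeatedly
    if len(distinct) == len(predicates):
        return "inter-dimension"      # several predicates, none repeated
    return "hybrid-dimension"         # several predicates, some repeated

print(classify_rule([("buys", "milk")], [("buys", "bread")]))
print(classify_rule([("age", "19-25"), ("occupation", "student")],
                    [("buys", "coke")]))
print(classify_rule([("age", "19-25"), ("buys", "popcorn")],
                    [("buys", "coke")]))
```

The three calls correspond to the slide's three example rules, in order.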

27 An Example

