Presentation is loading. Please wait.

Presentation is loading. Please wait.

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Association Rule Mining III COMP 790-90 Seminar GNET 713 BCB Module Spring 2007.

Similar presentations


Presentation on theme: "The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Association Rule Mining III COMP 790-90 Seminar GNET 713 BCB Module Spring 2007."— Presentation transcript:

1 The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Association Rule Mining III COMP 790-90 Seminar GNET 713 BCB Module Spring 2007

2 COMP 790-090 Data Mining: Concepts, Algorithms, and Applications 2 Borders and Max-patterns Max-patterns: borders of frequent patterns A subset of max-pattern is frequent A superset of max-pattern is infrequent  abcd abacadbcbdcd abcabdacdbcd abcd

3 COMP 790-090 Data Mining: Concepts, Algorithms, and Applications 3 MaxMiner: Mining Max- patterns 1st scan: find frequent items A, B, C, D, E 2nd scan: find support for AB, AC, AD, AE, ABCDE BC, BD, BE, BCDE CD, CE, CDE, DE, Since BCDE is a max-pattern, no need to check BCD, BDE, CDE in later scan Baya’98 TidItems 10A,B,C,D,E 20B,C,D,E, 30A,C,D,F Potential max-patterns Min_sup=2

4 COMP 790-090 Data Mining: Concepts, Algorithms, and Applications 4 Frequent Closed Patterns For frequent itemset X, if there exists no item y s.t. every transaction containing X also contains y, then X is a frequent closed pattern “acdf” is a frequent closed pattern Concise rep. of freq pats Reduce # of patterns and rules N. Pasquier et al. In ICDT’99 TIDItems 10a, c, d, e, f 20a, b, e 30c, e, f 40a, c, d, f 50c, e, f Min_sup=2

5 COMP 790-090 Data Mining: Concepts, Algorithms, and Applications 5 CLOSET: Mining Frequent Closed Patterns Flist: list of all freq items in support asc. order Flist: d-a-f-e-c Divide search space Patterns having d Patterns having d but no a, etc. Find frequent closed pattern recursively Every transaction having d also has cfa  cfad is a frequent closed pattern PHM’00 TIDItems 10a, c, d, e, f 20a, b, e 30c, e, f 40a, c, d, f 50c, e, f Min_sup=2

6 COMP 790-090 Data Mining: Concepts, Algorithms, and Applications 6 Closed and Max-patterns Closed pattern mining algorithms can be adapted to mine max-patterns A max-pattern must be closed Depth-first search methods have advantages over breadth-first search ones

7 COMP 790-090 Data Mining: Concepts, Algorithms, and Applications 7 Mining Various Kinds of Rules or Regularities Multi-level, quantitative association rules, correlation and causality, ratio rules, sequential patterns, emerging patterns, temporal associations, partial periodicity Classification, clustering, iceberg cubes, etc.

8 COMP 790-090 Data Mining: Concepts, Algorithms, and Applications 8 Multiple-level Association Rules Items often form hierarchy Flexible support settings: Items at the lower level are expected to have lower support. Transaction database can be encoded based on dimensions and levels explore shared multi-level mining uniform support Milk [support = 10%] 2% Milk [support = 6%] Skim Milk [support = 4%] Level 1 min_sup = 5% Level 2 min_sup = 5% Level 1 min_sup = 5% Level 2 min_sup = 3% reduced support

9 COMP 790-090 Data Mining: Concepts, Algorithms, and Applications 9 Multi-dimensional Association Rules Single-dimensional rules: buys(X, “milk”)  buys(X, “bread”) MD rules:  2 dimensions or predicates Inter-dimension assoc. rules (no repeated predicates) age(X,”19-25”)  occupation(X,“student”)  buys(X,“coke”) hybrid-dimension assoc. rules (repeated predicates) age(X,”19-25”)  buys(X, “popcorn”)  buys(X, “coke”) Categorical Attributes: finite number of possible values, no order among values Quantitative Attributes: numeric, implicit order

10 COMP 790-090 Data Mining: Concepts, Algorithms, and Applications 10 Quantitative/Weighted Association Rules age(X,”33-34”)  income(X,”30K - 50K”)  buys(X,”high resolution TV”) Numeric attributes are dynamically discretized maximize the confidence or compactness of the rules 2-D quantitative association rules: A quan1  A quan2  A cat Cluster “adjacent” association rules to form general rules using a 2-D grid. 70-80k 60-70k 50-60k 40-50k 30-40k 20-30k <20k 32333435363738 Income Age

11 COMP 790-090 Data Mining: Concepts, Algorithms, and Applications 11 Mining Distance-based Association Rules Binning methods do not capture semantics of interval data Distance-based partitioning Density/number of points in an interval “Closeness” of points in an interval PriceEqui-widthEqui-depthDistance-based 7[0,10][7,20][7,7] 20[11,20][22,50][20,22] 22[21,30][51,53] 50[31,40] 51[41,50][50,53] 53[51,60

12 COMP 790-090 Data Mining: Concepts, Algorithms, and Applications 12 Constraint-based Data Mining Find all the patterns in a database autonomously? The patterns could be too many but not focused! Data mining should be interactive User directs what to be mined Constraint-based mining User flexibility: provides constraints on what to be mined System optimization: push constraints for efficient mining

13 COMP 790-090 Data Mining: Concepts, Algorithms, and Applications 13 Constraints in Data Mining Knowledge type constraint classification, association, etc. Data constraint — using SQL-like queries find product pairs sold together in stores in New York Dimension/level constraint in relevance to region, price, brand, customer category Rule (or pattern) constraint small sales (price $200) Interestingness constraint strong rules: support and confidence


Download ppt "The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Association Rule Mining III COMP 790-90 Seminar GNET 713 BCB Module Spring 2007."

Similar presentations


Ads by Google