Frequent-Pattern Tree


1 Frequent-Pattern Tree

2 Bottleneck of Frequent-pattern Mining
Multiple database scans are costly.
Mining long patterns needs many scanning passes and generates a huge number of candidates.
To find the frequent itemset i1 i2 … i100:
- Number of scans: 100
- Number of candidates: $\binom{100}{1} + \binom{100}{2} + \dots + \binom{100}{100} = 2^{100} - 1 \approx 1.27 \times 10^{30}$
Bottleneck: candidate generation and test.
Can we avoid candidate generation?
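The candidate count above can be checked with a few lines of arithmetic. This is only a sanity check of the sum of binomial coefficients; the variable names are illustrative.

```python
from math import comb

n = 100  # length of the longest frequent itemset i1 i2 ... i100

# Every non-empty subset of a frequent itemset is itself frequent,
# so a candidate-generation method has to consider all of them.
total = sum(comb(n, k) for k in range(1, n + 1))

assert total == 2**n - 1
print(f"{float(total):.2e}")  # 1.27e+30
```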

3 Mining Freq Patterns w/o Candidate Generation
Grow long patterns from short ones using local frequent items:
- “abc” is a frequent pattern.
- Get all transactions containing “abc”: DB|abc (the projected database on abc).
- “d” is a local frequent item in DB|abc, so abcd is a frequent pattern.
- Get all transactions containing “abcd” (the projected database on “abcd”) and find longer itemsets.
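A minimal sketch of the projection idea, assuming the five-transaction database used in the construction slides of this deck and a minimum count of 3. The helper names `project` and `local_frequent_items` are hypothetical, chosen for illustration.

```python
from collections import Counter

# The five transactions of the running example (single-letter items).
db = [
    set("facdgimp"),
    set("abcflmo"),
    set("bfhjo"),
    set("bcksp"),
    set("afcelpmn"),
]
MIN_COUNT = 3  # assumption: min_sup = 50% of 5 transactions, rounded up


def project(db, pattern):
    """Return DB|pattern: transactions containing pattern, minus its items."""
    pattern = set(pattern)
    return [t - pattern for t in db if pattern <= t]


def local_frequent_items(db, pattern, min_count):
    """Items that are frequent inside the projected database."""
    counts = Counter(item for t in project(db, pattern) for item in t)
    return {item: n for item, n in counts.items() if n >= min_count}


# "fca" is frequent; its projected database tells us which items extend it.
grown = local_frequent_items(db, "fca", MIN_COUNT)
print(grown)
```

Here m is the only local frequent item in DB|fca, so fcam is a frequent pattern; projecting again on fcam yields no further local frequent items, and the growth stops.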

4 Mining Freq Patterns w/o Candidate Generation
Compress a large database into a compact Frequent-Pattern tree (FP-tree) structure:
- Highly condensed, but complete for frequent pattern mining
- Avoids costly database scans
Develop an efficient, FP-tree-based frequent pattern mining method:
- A divide-and-conquer methodology: decompose mining tasks into smaller ones
- Avoids candidate generation: examines the sub-database (conditional pattern base) only

5 Construct FP-tree from a Transaction DB
TID   Items bought                (Ordered) frequent items
100   {f, a, c, d, g, i, m, p}    {f, c, a, m, p}
200   {a, b, c, f, l, m, o}       {f, c, a, b, m}
300   {b, f, h, j, o}             {f, b}
400   {b, c, k, s, p}             {c, b, p}
500   {a, f, c, e, l, p, m, n}    {f, c, a, m, p}

min_sup = 50% (a count of at least 3 of the 5 transactions)

Steps:
1. Scan the DB once and find the frequent 1-itemsets (single-item patterns).
2. Order the frequent items in descending order of frequency: f, c, a, b, m, p (the L-order).
3. Process the DB based on the L-order.

Item counts from the first scan:
a 3   b 3   c 4   d 1   e 1   f 4   g 1   h 1
i 1   j 1   k 1   l 2   m 3   n 1   o 2   p 3   s 1
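The first scan and the reordering can be sketched as follows. The tie-break between equally frequent items (f before c; a, b, m, p) is taken from the slide rather than computed, so `L_ORDER` is hard-coded here as an assumption.

```python
from collections import Counter
from math import ceil

transactions = {
    100: "facdgimp",
    200: "abcflmo",
    300: "bfhjo",
    400: "bcksp",
    500: "afcelpmn",
}
min_count = ceil(0.5 * len(transactions))  # min_sup = 50% -> count of 3

# Scan 1: count every item and keep the frequent ones.
counts = Counter(item for items in transactions.values() for item in items)
frequent = {item: n for item, n in counts.items() if n >= min_count}

# L-order as given on the slide; ties are broken the way the slide breaks them.
L_ORDER = "fcabmp"
assert set(L_ORDER) == set(frequent)

# Drop infrequent items and sort what remains by L-order.
ordered = {
    tid: [item for item in L_ORDER if item in items]
    for tid, items in transactions.items()
}

for tid, items in ordered.items():
    print(tid, items)
```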

6 Construct FP-tree from a Transaction DB
(Transaction table as on slide 5.)

Initial FP-tree: the root {} only.

Header table (item: frequency, head): f: 0, nil; c: 0, nil; a: 0, nil; b: 0, nil; m: 0, nil; p: 0, nil

7 Construct FP-tree from a Transaction DB
(Transaction table as on slide 5.)

Insert {f, c, a, m, p}:

{}
  f:1
    c:1
      a:1
        m:1
          p:1

Header table (item: frequency): f 1, c 1, a 1, b 0 (head nil), m 1, p 1

8 Construct FP-tree from a Transaction DB
(Transaction table as on slide 5.)

Insert {f, c, a, b, m}:

{}
  f:2
    c:2
      a:2
        m:1
          p:1
        b:1
          m:1

Header table (item: frequency): f 2, c 2, a 2, b 1, m 2, p 1

9 Construct FP-tree from a Transaction DB
(Transaction table as on slide 5.)

Insert {f, b}:

{}
  f:3
    c:2
      a:2
        m:1
          p:1
        b:1
          m:1
    b:1

Header table (item: frequency): f 3, c 2, a 2, b 2, m 2, p 1

10 Construct FP-tree from a Transaction DB
(Transaction table as on slide 5.)

Insert {c, b, p}:

{}
  f:3
    c:2
      a:2
        m:1
          p:1
        b:1
          m:1
    b:1
  c:1
    b:1
      p:1

Header table (item: frequency): f 3, c 3, a 2, b 3, m 2, p 2

11 Construct FP-tree from a Transaction DB
(Transaction table as on slide 5.)

Insert {f, c, a, m, p}:

{}
  f:4
    c:3
      a:3
        m:2
          p:2
        b:1
          m:1
    b:1
  c:1
    b:1
      p:1

Header table (item: frequency): f 4, c 4, a 3, b 3, m 3, p 3
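The insertion steps on slides 6 to 11 can be sketched in code. This is an illustrative implementation, not the deck's own: the `Node` class, `build_fp_tree`, `header_count`, and `render` are hypothetical names, and the input is assumed to be the already ordered frequent items of each transaction.

```python
class Node:
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item -> child Node
        self.link = None    # next node in the tree carrying the same item


def build_fp_tree(ordered_transactions):
    root = Node(None, None)
    header = {}  # item -> first node in that item's node-link chain
    tails = {}   # item -> last node in the chain, for O(1) appends
    for items in ordered_transactions:
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                # No shared prefix here: create a node and thread it
                # onto the node-link chain for this item.
                child = Node(item, node)
                node.children[item] = child
                if item in tails:
                    tails[item].link = child
                else:
                    header[item] = child
                tails[item] = child
            child.count += 1
            node = child
    return root, header


def header_count(header, item):
    """Total frequency of item, summed along its node-link chain."""
    total, node = 0, header.get(item)
    while node is not None:
        total += node.count
        node = node.link
    return total


def render(node, depth=0):
    """Indented text rendering of the subtree below node."""
    lines = []
    for child in node.children.values():
        lines.append("  " * depth + f"{child.item}:{child.count}")
        lines.extend(render(child, depth + 1))
    return lines


ordered = ["fcamp", "fcabm", "fb", "cbp", "fcamp"]
root, header = build_fp_tree(ordered)
print("\n".join(render(root)))
```

The second scan is the only pass over the ordered transactions; shared prefixes such as f, c, a only increment counts instead of creating nodes, which is where the compression comes from.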

12 Benefits of FP-tree Structure
Completeness:
- Preserves complete DB information for frequent pattern mining (given a prior minimum support)
- Each transaction is mapped to one FP-tree path; counts are stored at each node
Compactness:
- One FP-tree path may correspond to multiple transactions; the tree is never larger than the original database (not counting node-links and counts)
- Reduces irrelevant information: infrequent items are gone
- Frequency-descending ordering: more frequent items are closer to the top of the tree and more likely to be shared

13 How Effective Is FP-tree?
Dataset: Connect-4 (a dense dataset)

14 Mining Frequent Patterns Using FP-tree
General idea (divide and conquer): recursively grow frequent pattern paths using the FP-tree.
Frequent patterns can be partitioned into subsets according to the L-order (f-c-a-b-m-p):
- Patterns containing p
- Patterns having m but no p
- Patterns having b but no m or p
- Patterns having a but no b, m, or p
- Patterns having c but no a, b, m, or p
- Pattern f

15 Mining Frequent Patterns Using FP-tree
Step 1: Construct the conditional pattern base for each item in the header table.
Step 2: Construct the conditional FP-tree from each conditional pattern base.
Step 3: Recursively mine the conditional FP-trees and grow the frequent patterns obtained so far.
- If a conditional FP-tree contains a single path, simply enumerate all patterns.

16 Step 1: Construct Conditional Pattern Base
- Start at the header table of the FP-tree.
- Traverse the FP-tree by following the node-links of each frequent item.
- Accumulate all transformed prefix paths of that item to form its conditional pattern base.

Header table (item: frequency): f 4, c 4, a 3, b 3, m 3, p 3

Conditional pattern bases:
Item   Cond. pattern base
c      f:3
a      fc:3
b      fca:1, f:1, c:1
m      fca:2, fcab:1
p      fcam:2, cb:1

FP-tree:
{}
  f:4
    c:3
      a:3
        m:2
          p:2
        b:1
          m:1
    b:1
  c:1
    b:1
      p:1
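A sketch of Step 1, under one simplifying assumption: instead of a linked chain per item, the node-links are kept as a plain list of nodes per item (`node_links`). All names are hypothetical. Each node's prefix path is read by walking parent pointers up to the root, and it inherits that node's count.

```python
from collections import defaultdict


class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}


def build(ordered_transactions):
    root = Node(None, None)
    node_links = defaultdict(list)  # item -> all nodes carrying that item
    for items in ordered_transactions:
        node = root
        for item in items:
            if item not in node.children:
                node.children[item] = Node(item, node)
                node_links[item].append(node.children[item])
            node = node.children[item]
            node.count += 1
    return root, node_links


def conditional_pattern_base(node_links, item):
    """Map each prefix path of item to the count of the item node below it."""
    base = {}
    for node in node_links[item]:
        path = []
        parent = node.parent
        while parent.item is not None:  # stop at the root
            path.append(parent.item)
            parent = parent.parent
        if path:
            prefix = "".join(reversed(path))
            base[prefix] = base.get(prefix, 0) + node.count
    return base


root, node_links = build(["fcamp", "fcabm", "fb", "cbp", "fcamp"])
bases = {item: conditional_pattern_base(node_links, item) for item in "fcabmp"}
for item, base in bases.items():
    print(item, base)
```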

17 Step 2: Construct Conditional FP-tree
For each pattern base:
- Accumulate the count of each item in the base.
- Construct an FP-tree from the frequent items of the pattern base.

min_sup = 50%, number of transactions = 5

Example: p's conditional FP-tree
- p's conditional pattern base: fcam:2, cb:1
- Item counts in the base: f 2, c 3, a 2, m 2, b 1
- Only c is frequent, so p's conditional FP-tree is the single node c:3 under the root {}, with header table c 3.
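Step 2 amounts to counting items inside a pattern base and discarding the ones below the threshold. The sketch below stops short of building tree nodes: it returns the accumulated counts and the pruned prefix paths that would be inserted into the conditional FP-tree. `prune_pattern_base` is a hypothetical helper name.

```python
from collections import Counter

MIN_COUNT = 3  # min_sup = 50% of 5 transactions


def prune_pattern_base(base, min_count):
    """Accumulate item counts in a conditional pattern base and keep
    only the locally frequent items on each prefix path."""
    counts = Counter()
    for prefix, n in base.items():
        for item in prefix:
            counts[item] += n
    frequent = {item for item, n in counts.items() if n >= min_count}
    pruned = Counter()
    for prefix, n in base.items():
        kept = "".join(item for item in prefix if item in frequent)
        if kept:
            pruned[kept] += n
    return dict(counts), dict(pruned)


p_counts, p_paths = prune_pattern_base({"fcam": 2, "cb": 1}, MIN_COUNT)
m_counts, m_paths = prune_pattern_base({"fca": 2, "fcab": 1}, MIN_COUNT)
b_counts, b_paths = prune_pattern_base({"fca": 1, "f": 1, "c": 1}, MIN_COUNT)
print(p_paths, m_paths, b_paths)
```

For p only the path c:3 survives, for m the single path fca:3 survives, and for b nothing survives, so b's conditional FP-tree is empty.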

18 Mining Frequent Patterns by Creating Conditional Pattern-Bases
Item   Conditional pattern base       Conditional FP-tree
p      {(fcam:2), (cb:1)}             {(c:3)}|p
m      {(fca:2), (fcab:1)}            {(f:3, c:3, a:3)}|m
b      {(fca:1), (f:1), (c:1)}        Empty
a      {(fc:3)}                       {(f:3, c:3)}|a
c      {(f:3)}                        {(f:3)}|c
f      Empty                          Empty

19 Step 3: Recursively mine conditional FP-tree
Collect all patterns that end at p (FP = frequent pattern, CPB = conditional pattern base):
- Suffix p(3): FP p(3); CPB fcam:2, cb:1; conditional FP-tree: c(3)
  - Suffix cp(3): FP cp(3); CPB nil

20 Collect all patterns that end at m
Step 3: Recursively mine the conditional FP-tree (FP = frequent pattern, CPB = conditional pattern base).
- Suffix m(3): FP m(3); CPB fca:2, fcab:1; conditional FP-tree: f(3) - c(3) - a(3)
  - Suffix fm(3): FP fm(3); CPB nil
  - Suffix cm(3): FP cm(3); CPB f:3; conditional FP-tree: f(3)
    - Suffix fcm(3): FP fcm(3); CPB nil
  - Suffix am(3): continued on the next slide

21 Collect all patterns that end at m (cont’d)
- Suffix am(3): FP (frequent pattern) am(3); CPB (conditional pattern base) fc:3; conditional FP-tree: f(3) - c(3)
  - Suffix fam(3): FP fam(3); CPB nil
  - Suffix cam(3): FP cam(3); CPB f:3; conditional FP-tree: f(3)
    - Suffix fcam(3): FP fcam(3); CPB nil
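The recursion traced on the last three slides can be condensed into a short sketch. It is a simplification, not the full algorithm from the deck: it works directly on conditional pattern bases (maps from prefix path to count) instead of materialised conditional FP-trees, omits the single-path shortcut, and assumes single-character items already sorted in L-order. `fp_growth` is a hypothetical name.

```python
from collections import Counter


def fp_growth(base, min_count, suffix=""):
    """Return {pattern: support} for all frequent patterns ending in suffix."""
    # Accumulate the count of each item in this (conditional) pattern base.
    counts = Counter()
    for path, n in base.items():
        for item in path:
            counts[item] += n

    patterns = {}
    for item, support in counts.items():
        if support < min_count:
            continue
        pattern = item + suffix  # grow the suffix by one frequent item
        patterns[pattern] = support
        # Conditional pattern base of the grown pattern: the prefix
        # that precedes item on every path containing it.
        cond = Counter()
        for path, n in base.items():
            pos = path.find(item)
            if pos > 0:  # skip paths without item or with an empty prefix
                cond[path[:pos]] += n
        patterns.update(fp_growth(cond, min_count, pattern))
    return patterns


db = Counter(["fcamp", "fcabm", "fb", "cbp", "fcamp"])
patterns = fp_growth(db, min_count=3)
for pattern in sorted(patterns, key=lambda s: (len(s), s)):
    print(pattern, patterns[pattern])
```

On the running example this yields the two p patterns (p, cp) and the eight m patterns (m, fm, cm, am, fcm, fam, cam, fcam) listed on the slides, alongside the patterns for the other suffixes.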

22 FP-growth vs. Apriori: Scalability With the Support Threshold
Data set T25I20D10K

23 Why Is Frequent Pattern Growth Fast?
A performance study shows that FP-growth is an order of magnitude faster than Apriori.
Reasoning:
- No candidate generation, no candidate test
- Uses a compact data structure
- Eliminates repeated database scans
- Basic operations are counting and FP-tree building

24 Weaknesses of FP-growth
- Support dependent; cannot accommodate a dynamic support threshold
- Cannot accommodate incremental DB updates
- Mining requires recursive operations

25 Maximal patterns and Border
Maximal patterns: a frequent itemset X is maximal if none of its supersets is frequent.
- 20 patterns, but only 3 maximal patterns
- The 3 maximal patterns can be used to represent all 20 patterns
- Maximal patterns = {AD, ACE, BCDE}

Itemset lattice (the border separates the frequent itemsets from the infrequent ones):
null
A B C D E
AB AC AD AE BC BD BE CD CE DE
ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE
ABCD ABCE ABDE ACDE BCDE
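The claim that three maximal patterns stand in for all 20 frequent patterns can be checked by expanding the maximal itemsets into their non-empty subsets and then recovering the maximal ones again. The function names below are hypothetical.

```python
from itertools import combinations

maximal = [frozenset("AD"), frozenset("ACE"), frozenset("BCDE")]


def downward_closure(itemsets):
    """All non-empty subsets of the given itemsets (all are frequent)."""
    result = set()
    for s in itemsets:
        for k in range(1, len(s) + 1):
            result.update(frozenset(c) for c in combinations(sorted(s), k))
    return result


def maximal_of(frequent):
    """Frequent itemsets with no frequent proper superset."""
    return {s for s in frequent if not any(s < t for t in frequent)}


frequent = downward_closure(maximal)
print(len(frequent))  # 20
print(sorted("".join(sorted(s)) for s in maximal_of(frequent)))
```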

26 Closed Itemsets
An itemset X is closed if there exists no item y (y ∉ X) such that every transaction containing X also contains y.
Example:
- AC is not closed, since every transaction containing AC also contains W.
- CDW is closed, since transaction Tid=2 contains no other item.

Tid   Itemset
1     ACTW
2     CDW
3
4     ACDW
5     ACDTW
6     CDT
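One way to test closedness is to intersect all transactions that contain X: X is closed exactly when that intersection equals X. The sketch below uses only the five rows whose itemsets survive in this transcript (the itemset for Tid 3 is missing and is left out, which is an assumption of this example); `closure` and `is_closed` are hypothetical names.

```python
# Rows as listed in the table; Tid 3 is omitted because its itemset is missing.
transactions = {1: "ACTW", 2: "CDW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}


def closure(itemset, transactions):
    """Items shared by every transaction that contains itemset."""
    itemset = set(itemset)
    covering = [set(t) for t in transactions.values() if itemset <= set(t)]
    if not covering:
        return None  # itemset occurs in no transaction
    return set.intersection(*covering)


def is_closed(itemset, transactions):
    return closure(itemset, transactions) == set(itemset)


print(sorted(closure("AC", transactions)))   # ['A', 'C', 'W']
print(is_closed("AC", transactions))         # False
print(is_closed("CDW", transactions))        # True
```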

27 Frequent Closed Patterns
For a frequent itemset X, if there exists no item y (y ∉ X) such that every transaction containing X also contains y, then X is a frequent closed pattern.
- “acdf” is a frequent closed pattern
- A concise representation of frequent patterns
- Reduces the number of patterns and rules
- N. Pasquier et al., in ICDT’99

Min_sup = 2

TID   Items
10    a, c, d, e, f
20    a, b, e
30    c, e, f
40    a, c, d, f
50

