Presentation is loading. Please wait.

Presentation is loading. Please wait.

COMP5331 FP-Tree Prepared by Raymond Wong Presented by Raymond Wong

Similar presentations


Presentation on theme: "COMP5331 FP-Tree Prepared by Raymond Wong Presented by Raymond Wong"— Presentation transcript:

1 COMP5331 FP-Tree Prepared by Raymond Wong Presented by Raymond Wong
COMP5331

2 Large Itemset Mining Frequent Itemset Mining TID Items Bought 100
Problem: to find all “large” (or frequent) itemsets with support at least a threshold (i.e., itemsets with support >= 3) TID Items Bought 100 a, b, c, d, e, f, g, h 200 a, f, g 300 b, d, e, f, j 400 a, b, d, i, k 500 a, b, e, g COMP5331

3 Apriori Join Step Prune Step L1 C2 L2 Candidate Generation “Large” Itemset Generation Large 2-itemset Generation C3 L3 Large 3-itemset Generation Disadvantage 1: It is costly to handle a large number of candidate sets Disadvantage 2: It is tedious to repeatedly scan the database and check the candidate patterns Counting Step COMP5331

4 FP-tree Scan the database once to store all essential information in a data structure called FP-tree (Frequent Pattern Tree) The FP-tree is concise and is used in directly generating large itemsets COMP5331

5 FP-tree Step 1: Deduce the ordered frequent items. For items with the same frequency, the order is given by the alphabetical order. Step 2: Construct the FP-tree from the above data Step 3: From the FP-tree above, construct the FP-conditional tree for each item (or itemset). Step 4: Determine the frequent patterns. COMP5331

6 FP-tree Frequent Itemset Mining TID Items Bought 100
Problem: to find all “large” (or frequent) itemsets with support at least a threshold (i.e., itemsets with support >= 3) TID Items Bought 100 a, b, c, d, e, f, g, h 200 a, f, g 300 b, d, e, f, j 400 a, b, d, i, k 500 a, b, e, g COMP5331

7 FP-tree TID Items Bought 100 a, b, c, d, e, f, g, h 200 a, f, g 300
b, d, e, f, j 400 a, b, d, i, k 500 a, b, e, g COMP5331

8 FP-tree TID Items Bought 100 a, b, c, d, e, f, g, h 200 a, f, g 300
b, d, e, f, j 400 a, b, d, i, k 500 a, b, e, g FP-tree COMP5331

9 Threshold = 3 TID Items Bought (Ordered) Frequent Items 100
a, b, c, d, e, f, g, h 200 a, f, g 300 b, d, e, f, j 400 a, b, d, i, k 500 a, b, e, g Threshold = 3 Item Frequency a b c d e f g h i j k 4 COMP5331

10 Threshold = 3 TID Items Bought (Ordered) Frequent Items 100
a, b, c, d, e, f, g, h 200 a, f, g 300 b, d, e, f, j 400 a, b, d, i, k 500 a, b, e, g Threshold = 3 Item Frequency a b c d e f g h i j k 4 4 1 3 COMP5331

11 Threshold = 3 TID Items Bought (Ordered) Frequent Items 100
a, b, c, d, e, f, g, h 200 a, f, g 300 b, d, e, f, j 400 a, b, d, i, k 500 a, b, e, g Threshold = 3 Item Frequency a b c d e f g h i j k Item Frequency a 4 b d 3 e f g 4 4 1 3 3 3 3 1 1 1 COMP5331 1

12 Threshold = 3 TID Items Bought (Ordered) Frequent Items 100
a, b, c, d, e, f, g, h 200 a, f, g 300 b, d, e, f, j 400 a, b, d, i, k 500 a, b, e, g Threshold = 3 a, b, d, e, f, g a, f, g b, d, e, f a, b, d a, b, e, g Item Frequency a b c d e f g h i j k Item Frequency a 4 b d 3 e f g 4 4 1 3 3 3 3 1 1 1 COMP5331 1

13 FP-tree Step 1: Deduce the ordered frequent items. For items with the same frequency, the order is given by the alphabetical order. Step 2: Construct the FP-tree from the above data Step 3: From the FP-tree above, construct the FP-conditional tree for each item (or itemset). Step 4: Determine the frequent patterns. COMP5331

14 Threshold = 3 TID Items Bought (Ordered) Frequent Items 100
a, b, c, d, e, f, g, h 200 a, f, g 300 b, d, e, f, j 400 a, b, d, i, k 500 a, b, e, g Threshold = 3 a, b, d, e, f, g a, f, g b, d, e, f a, b, d a, b, e, g root Item Head of node-link a b d e f g a:1 b:1 d:1 e:1 f:1 g:1 COMP5331

15 Threshold = 3 TID Items Bought (Ordered) Frequent Items 100
a, b, c, d, e, f, g, h 200 a, f, g 300 b, d, e, f, j 400 a, b, d, i, k 500 a, b, e, g Threshold = 3 a, b, d, e, f, g a, f, g b, d, e, f a, b, d a, b, e, g root Item Head of node-link a b d e f g a:2 a:1 f:1 b:1 g:1 d:1 e:1 f:1 g:1 COMP5331

16 Threshold = 3 TID Items Bought (Ordered) Frequent Items 100
a, b, c, d, e, f, g, h 200 a, f, g 300 b, d, e, f, j 400 a, b, d, i, k 500 a, b, e, g Threshold = 3 a, b, d, e, f, g a, f, g b, d, e, f a, b, d a, b, e, g root Item Head of node-link a b d e f g b:1 a:2 b:1 d:1 f:1 f:1 d:1 e:1 g:1 e:1 f:1 g:1 COMP5331

17 Threshold = 3 TID Items Bought (Ordered) Frequent Items 100
a, b, c, d, e, f, g, h 200 a, f, g 300 b, d, e, f, j 400 a, b, d, i, k 500 a, b, e, g Threshold = 3 a, b, d, e, f, g a, f, g b, d, e, f a, b, d a, b, e, g root Item Head of node-link a b d e f g b:1 a:3 a:2 b:2 b:1 d:1 f:1 f:1 d:1 d:2 g:1 e:1 e:1 f:1 g:1 COMP5331

18 Threshold = 3 TID Items Bought (Ordered) Frequent Items 100
a, b, c, d, e, f, g, h 200 a, f, g 300 b, d, e, f, j 400 a, b, d, i, k 500 a, b, e, g Threshold = 3 a, b, d, e, f, g a, f, g b, d, e, f a, b, d a, b, e, g root Item Head of node-link a b d e f g a:3 a:4 b:1 b:2 b:3 f:1 d:1 e:1 d:2 g:1 e:1 g:1 e:1 f:1 f:1 g:1 COMP5331

19 Threshold = 3 TID Items Bought (Ordered) Frequent Items 100
a, b, c, d, e, f, g, h 200 a, f, g 300 b, d, e, f, j 400 a, b, d, i, k 500 a, b, e, g Threshold = 3 a, b, d, e, f, g a, f, g b, d, e, f a, b, d a, b, e, g root Item Head of node-link a b d e f g a:4 b:1 b:3 f:1 d:1 d:2 e:1 g:1 e:1 e:1 g:1 f:1 f:1 g:1 COMP5331

20 FP-tree Step 1: Deduce the ordered frequent items. For items with the same frequency, the order is given by the alphabetical order. Step 2: Construct the FP-tree from the above data Step 3: From the FP-tree above, construct the FP-conditional tree for each item (or itemset). Step 4: Determine the frequent patterns. COMP5331

21 Threshold = 3 TID Items Bought (Ordered) Frequent Items 100
a, b, c, d, e, f, g, h 200 a, f, g 300 b, d, e, f, j 400 a, b, d, i, k 500 a, b, e, g Threshold = 3 a, b, d, e, f, g a, f, g b, d, e, f a, b, d a, b, e, g root Item Head of node-link a b d e f g a:4 b:1 b:3 f:1 d:1 d:2 e:1 g:1 e:1 e:1 g:1 f:1 f:1 g:1 COMP5331

22 root a b d e f g a:4 b:1 b:3 f:1 d:1 d:2 e:1 g:1 e:1 e:1 g:1 f:1 f:1
Item Head of node-link a b d e f g a:4 b:1 b:3 f:1 d:1 d:2 e:1 g:1 e:1 e:1 g:1 f:1 f:1 g:1 COMP5331

23 Threshold = 3 Cond. FP-tree on “g” root a b d e f g a:4 b:1 b:3 f:1
Item Head of node-link a b d e f g a:4 b:1 b:3 f:1 d:1 d:2 e:1 g:1 e:1 e:1 g:1 f:1 f:1 Cond. FP-tree on “g” { } (a:1, b:1, d:1, e:1, f:1, g:1), g:1 COMP5331

24 Threshold = 3 Cond. FP-tree on “g” root a b d e f g a:4 b:1 b:3 f:1
Item Head of node-link a b d e f g a:4 b:1 b:3 f:1 d:1 d:2 e:1 g:1 e:1 e:1 g:1 f:1 f:1 Cond. FP-tree on “g” { } (a:1, b:1, d:1, e:1, f:1, g:1), g:1 (a:1, b:1, e:1, g:1), COMP5331

25 Threshold = 3 Cond. FP-tree on “g” root a b d e f g a:4 b:1 b:3 f:1
Item Head of node-link a b d e f g a:4 b:1 b:3 f:1 d:1 d:2 e:1 g:1 e:1 e:1 g:1 f:1 f:1 Cond. FP-tree on “g” { } (a:1, b:1, d:1, e:1, f:1, g:1), g:1 (a:1, b:1, e:1, g:1), (a:1, f:1, g:1) Item Frequency a b d e f g 3 2 1 3 COMP5331

26 Threshold = 3 Cond. FP-tree on “g” 3 root a b d e f g a:4 b:1 b:3 f:1
Item Head of node-link a b d e f g a:4 b:1 b:3 f:1 d:1 d:2 e:1 g:1 e:1 e:1 g:1 f:1 f:1 Cond. FP-tree on “g” 3 { } (a:1, b:1, d:1, e:1, f:1, g:1), { } (a:1, g:1), g:1 conditional pattern base of “g” (a:1, b:1, e:1, g:1), (a:1, g:1), (a:1, f:1, g:1) (a:1, g:1) Item Frequency a b d e f g Item Frequency a 3 g root Item Head of node-link a 3 a:3 2 1 2 COMP5331 2 3

27 Threshold = 3 Cond. FP-tree on “f” root a b d e f g a:4 b:1 b:3 f:1
Item Head of node-link a b d e f g a:4 b:1 b:3 f:1 d:1 d:2 e:1 g:1 e:1 e:1 g:1 f:1 f:1 Cond. FP-tree on “f” { } (a:1, b:1, d:1, e:1, f:1), g:1 COMP5331

28 Threshold = 3 Cond. FP-tree on “f” root a b d e f g a:4 b:1 b:3 f:1
Item Head of node-link a b d e f g a:4 b:1 b:3 f:1 d:1 d:2 e:1 g:1 e:1 e:1 g:1 f:1 f:1 Cond. FP-tree on “f” { } (a:1, b:1, d:1, e:1, f:1), g:1 (a:1, f:1), COMP5331

29 Threshold = 3 Cond. FP-tree on “f” root a b d e f g a:4 b:1 b:3 f:1
Item Head of node-link a b d e f g a:4 b:1 b:3 f:1 d:1 d:2 e:1 g:1 e:1 e:1 g:1 f:1 f:1 Cond. FP-tree on “f” { } (a:1, b:1, d:1, e:1, f:1), g:1 (a:1, f:1), (b:1, d:1, e:1, f:1) Item Frequency a b d e f g 2 2 3 COMP5331

30 Threshold = 3 Cond. FP-tree on “f” 3 root a b d e f g a:4 b:1 b:3 f:1
Item Head of node-link a b d e f g a:4 b:1 b:3 f:1 d:1 d:2 e:1 g:1 e:1 e:1 g:1 f:1 f:1 Cond. FP-tree on “f” 3 { } (a:1, b:1, d:1, e:1, f:1), { } (f:1), g:1 (a:1, f:1), (f:1), (b:1, d:1, e:1, f:1) (f:1) Item Frequency a b d e f g Item Frequency f 3 root 2 2 2 2 COMP5331 3

31 Threshold = 3 Cond. FP-tree on “e” root a b d e f g a:4 b:1 b:3 f:1
Item Head of node-link a b d e f g a:4 b:1 b:3 f:1 d:1 d:2 e:1 g:1 e:1 e:1 g:1 f:1 f:1 Cond. FP-tree on “e” { } (a:1, b:1, d:1, e:1), (a:1, b:1, e:1), (b:1, d:1, e:1) g:1 Item Frequency a b d e f g 2 3 COMP5331

32 Threshold = 3 Cond. FP-tree on “e” 3 root a b d e f g a:4 b:1 b:3 f:1
Item Head of node-link a b d e f g a:4 b:1 b:3 f:1 d:1 d:2 e:1 g:1 e:1 e:1 g:1 f:1 f:1 Cond. FP-tree on “e” 3 { } (a:1, b:1, d:1, e:1), { } (b:1, e:1), (b:1, e:1) g:1 g:1 (a:1, b:1, e:1), (b:1, d:1, e:1) Item Frequency a b d e f g Item Frequency b 3 e root Item Head of node-link b 2 b:3 3 2 3 COMP5331

33 Threshold = 3 Cond. FP-tree on “d” root a b d e f g a:4 b:1 b:3 f:1
Item Head of node-link a b d e f g a:4 b:1 b:3 f:1 d:1 d:2 e:1 g:1 e:1 e:1 g:1 f:1 f:1 Cond. FP-tree on “d” { } (a:2, b:2, d:2), (b:1, d:1) g:1 Item Frequency a b d e f g 2 3 COMP5331

34 Threshold = 3 Cond. FP-tree on “d” 3 root a b d e f g a:4 b:1 b:3 f:1
Item Head of node-link a b d e f g a:4 b:1 b:3 f:1 d:1 d:2 e:1 g:1 e:1 e:1 g:1 f:1 f:1 Cond. FP-tree on “d” 3 { } (a:2, b:2, d:2), { } (b:2, d:2), (b:1, d:1) g:1 g:1 g:1 (b:1, d:1) Item Frequency a b d e f g Item Frequency b 3 d root Item Head of node-link b 2 b:3 3 3 COMP5331

35 Threshold = 3 Cond. FP-tree on “b” root a b d e f g a:4 b:1 b:3 f:1
Item Head of node-link a b d e f g a:4 b:1 b:3 f:1 d:1 d:2 e:1 g:1 e:1 e:1 g:1 f:1 f:1 Cond. FP-tree on “b” { } (a:3, b:3), (b:1) g:1 Item Frequency a b d e f g 3 4 COMP5331

36 Threshold = 3 Cond. FP-tree on “b” 4 root a b d e f g a:4 b:1 b:3 f:1
Item Head of node-link a b d e f g a:4 b:1 b:3 f:1 d:1 d:2 e:1 g:1 e:1 e:1 g:1 f:1 f:1 Cond. FP-tree on “b” 4 { } (a:3, b:3), { } (b:3, a:3), (b:1) g:1 g:1 g:1 g:1 (b:1) Item Frequency a b d e f g Item Frequency b 4 a 3 root Item Head of node-link a 3 a:3 4 COMP5331

37 Threshold = 3 Cond. FP-tree on “a” root a b d e f g a:4 b:1 b:3 f:1
Item Head of node-link a b d e f g a:4 b:1 b:3 f:1 d:1 d:2 e:1 g:1 e:1 e:1 g:1 f:1 f:1 Cond. FP-tree on “a” { } (a:4) g:1 Item Frequency a b d e f g 4 COMP5331

38 Threshold = 3 Cond. FP-tree on “a” 4 { } root a b d e f g a:4 b:1 b:3
Item Head of node-link a b d e f g a:4 b:1 b:3 f:1 d:1 d:2 e:1 g:1 e:1 e:1 g:1 f:1 f:1 Cond. FP-tree on “a” 4 { } (a:4) { } (a:4) g:1 g:1 g:1 g:1 Item Frequency a b d e f g Item Frequency a 4 root 4 COMP5331

39 FP-tree Step 1: Deduce the ordered frequent items. For items with the same frequency, the order is given by the alphabetical order. Step 2: Construct the FP-tree from the above data Step 3: From the FP-tree above, construct the FP-conditional tree for each item (or itemset). Step 4: Determine the frequent patterns. COMP5331

40 Cond. FP-tree on “g” 3 COMP5331

41 Cond. FP-tree on “g” 3 Cond. FP-tree on “f” 3 Cond. FP-tree on “e” 3
root Item Head of node-link a a:3 Cond. FP-tree on “f” 3 root Cond. FP-tree on “e” 3 COMP5331

42 Cond. FP-tree on “g” 3 Cond. FP-tree on “d” 3 Cond. FP-tree on “f” 3
root Item Head of node-link a a:3 Cond. FP-tree on “f” 3 root Cond. FP-tree on “e” 3 root Item Head of node-link b b:3 COMP5331

43 Cond. FP-tree on “g” 3 Cond. FP-tree on “d” 3 Cond. FP-tree on “f” 3
root root Item Head of node-link b Item Head of node-link a b:3 a:3 Cond. FP-tree on “f” 3 Cond. FP-tree on “b” 4 root Cond. FP-tree on “e” 3 root Item Head of node-link b b:3 COMP5331

44 Cond. FP-tree on “g” 3 Cond. FP-tree on “d” 3 Cond. FP-tree on “f” 3
root root Item Head of node-link b Item Head of node-link a b:3 a:3 Cond. FP-tree on “f” 3 Cond. FP-tree on “b” 4 root root Item Head of node-link a a:3 Cond. FP-tree on “a” 4 Cond. FP-tree on “e” 3 root root Item Head of node-link b b:3 COMP5331

45 Cond. FP-tree on “g” 3 Cond. FP-tree on “d” 3 Cond. FP-tree on “f” 3
root root 1. Before generating this cond. tree, we generate {d} (support = 3) 1. Before generating this cond. tree, we generate {g} (support = 3) Item Head of node-link b Item Head of node-link a b:3 a:3 2. After generating this cond. tree, we generate {b, d} (support = 3) 2. After generating this cond. tree, we generate {a, g} (support = 3) Cond. FP-tree on “f” 3 Cond. FP-tree on “b” 4 root 1. Before generating this cond. tree, we generate {f} (support = 3) 1. Before generating this cond. tree, we generate {b} (support = 4) root Item Head of node-link a a:3 2. After generating this cond. tree, we do not generate any itemset. 2. After generating this cond. tree, we generate {a, b} (support = 3) Cond. FP-tree on “a” 4 Cond. FP-tree on “e” 3 1. Before generating this cond. tree, we generate {a} (support = 4) root 1. Before generating this cond. tree, we generate {e} (support = 3) root Item Head of node-link b b:3 2. After generating this cond. tree, we do not generate any itemset. 2. After generating this cond. tree, we generate {b, e} (support = 3) COMP5331

46 Complexity Complexity in building FP-tree
Two scans of the transactions DB Collect frequent items Construct the FP-tree Cost to insert one transaction Number of frequent items in this transaction COMP5331

47 Size of the FP-tree The size of the FP-tree is bounded by the overall occurrences of the frequent items in the database COMP5331

48 Height of the Tree The height of the tree is bounded by the maximum number of frequent items in any transaction in the database COMP5331

49 Compression With respect to the total number of items stored,
is FP-tree more compressed compared with the original databases? COMP5331

50 Details of the Algorithm
Procedure FP-growth (Tree, ) if Tree contains a single path P for each combination (denoted by ) of the nodes in the path P do generate pattern  U  with support = minimum support of nodes in  else for each ai in the header table of Tree do generate pattern  = ai U  with support = ai.support construct ’s conditional pattern base and then ’s conditional FP-tree Tree if Tree   Call FP-growth(Tree, ) COMP5331


Download ppt "COMP5331 FP-Tree Prepared by Raymond Wong Presented by Raymond Wong"

Similar presentations


Ads by Google