AR mining Implementation and comparison of three AR mining algorithms Xuehai Wang, Xiaobo Chen, Shen chen CSCI6405 class project.


1 AR mining Implementation and comparison of three AR mining algorithms Xuehai Wang, Xiaobo Chen, Shen chen CSCI6405 class project

2 AR mining Outline
Motivation
Dataset
Apriori based hash tree algorithm
FP-tree algorithm
Conclusion
Reference

3 AR mining Motivation
Make the time needed to generate rules as short as possible!
To understand the three algorithms
–Apriori algorithm
–Apriori with hash tree algorithm
–FP-tree algorithm
Learn how to improve an algorithm

4 AR mining Dataset
IBM dataset generator
–Can set item number
–Can set minimal support
–Can set dataset size

Tid   Items
1     2 5 8 9
2     3 4 6 7 12

5 AR mining Apriori principle
–A candidate generation-and-test approach [4]
–Every subset of a frequent itemset must also be frequent
–If a set is infrequent, its supersets are not generated or tested
But some parts can still be improved
–Counting the support
–The number of I/O scans
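
The generate-and-prune step described on this slide can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the function name `apriori_gen` and the input are assumptions, with the input chosen to be the frequent 2-itemsets of the six-transaction example used on the later slides (min support = 2).

```python
from itertools import combinations

def apriori_gen(frequent_prev, k):
    """Build candidate k-itemsets from the frequent (k-1)-itemsets.

    frequent_prev: set of frozensets, each of size k-1 (hypothetical input format).
    """
    candidates = set()
    for a in frequent_prev:
        for b in frequent_prev:
            union = a | b
            if len(union) != k:
                continue
            # Prune step (Apriori principle): a candidate survives only if
            # every one of its (k-1)-subsets is itself frequent.
            if all(frozenset(s) in frequent_prev
                   for s in combinations(union, k - 1)):
                candidates.add(union)
    return candidates

# Frequent 2-itemsets of the example slides (min support = 2).
frequent_2 = {frozenset(p) for p in [(1, 2), (1, 3), (2, 3), (3, 6)]}
candidates_3 = apriori_gen(frequent_2, 3)
print(candidates_3)
```

Only {1, 2, 3} survives: {1, 3, 6} and {2, 3, 6} are pruned without being counted because {1, 6} and {2, 6} are infrequent.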

6 AR mining Apriori Hash Tree Alg
The number of candidate k-itemsets is l
There are n transactions
The average transaction size is m
Cost of calculating the support count:
–Original Apriori Alg:
–With hash tree: $O\left(n \cdot \log(l) \cdot \binom{m}{k}\right)$
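
The $\binom{m}{k}$ factor comes from enumerating the k-subsets of each transaction and looking each one up among the candidates. The sketch below illustrates that counting scheme under a simplifying assumption: a plain Python dict (a flat hash table) stands in for the hash tree, so each lookup is an average constant-time probe rather than a tree descent. The name `count_support` and the candidate list are illustrative; the transactions are the six used on the example slides.

```python
from itertools import combinations

def count_support(transactions, candidates, k):
    """Count candidate k-itemsets by hashing the k-subsets of each transaction."""
    counts = {frozenset(c): 0 for c in candidates}
    for items in transactions:
        # A transaction with m items has C(m, k) subsets of size k.
        for subset in combinations(sorted(items), k):
            key = frozenset(subset)
            if key in counts:  # hash lookup instead of scanning all candidates
                counts[key] += 1
    return counts

transactions = [{1, 2}, {1, 3, 6}, {1, 2, 3}, {2, 4}, {2, 3, 6}, {5, 6}]
candidates_2 = [(1, 2), (1, 3), (1, 6), (2, 3), (2, 6), (3, 6)]
support_2 = count_support(transactions, candidates_2, 2)
for itemset, count in support_2.items():
    print(sorted(itemset), count)
```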

7 AR mining Apriori Hash Tree Alg
Candidates are stored in a hash tree structure

Tid   Items
1     1 2
2     1 3 6
3     1 2 3
4     2 4
5     2 3 6
6     5 6

1-itemset candidate hash tree (node counts from the figure): 1(1) 2(1) 1(2) 3(1) 1(2) 3(1) 2(1)

8 AR mining Apriori Hash Tree Alg
1-itemsets, min support = 2

Tid   Items
1     1 2
2     1 3 6
3     1 2 3
4     2 4
5     2 3 6
6     5 6

Item (support count): 2(4) 5(1) 6(3) 1(3) 3(3) 4(1)

9 AR mining Apriori Hash Tree Alg

Tid   Items
1     1 2
2     1 3 6
3     1 2 3
4     2 4
5     2 3 6
6     5 6

2-itemsets, min support = 2: 2 3(2) 2 6(1) 1 3(2) 1 2(2) 3 6(2) 1 6(1)
3-itemsets, min support = 2: 1 2 3(1)
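
The level-by-level run over this example can be reproduced with a short sketch. It assumes the simplest counting strategy (every candidate is tested against every transaction, with no hash tree) and is only meant to confirm the counts on the slides; the function name `apriori` is illustrative.

```python
from itertools import combinations

transactions = [{1, 2}, {1, 3, 6}, {1, 2, 3}, {2, 4}, {2, 3, 6}, {5, 6}]
MIN_SUPPORT = 2

def apriori(transactions, min_support):
    """Return {itemset: support count} for all frequent itemsets."""
    items = {i for t in transactions for i in t}
    candidates = {frozenset([i]) for i in items}
    frequent = {}
    k = 1
    while candidates:
        # Naive counting: one subset test per (candidate, transaction) pair.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join the frequent (k-1)-itemsets, then prune any candidate that
        # has an infrequent (k-1)-subset.
        candidates = set()
        for a in level:
            for b in level:
                u = a | b
                if len(u) == k and all(frozenset(s) in level
                                       for s in combinations(u, k - 1)):
                    candidates.add(u)
    return frequent

result = apriori(transactions, MIN_SUPPORT)
for itemset, count in sorted(result.items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The only 3-itemset candidate, {1, 2, 3}, has support 1, so the run stops with four frequent 1-itemsets and four frequent 2-itemsets.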

10 AR mining FP-tree
Because the dataset being mined is usually very large, it is impossible to read all transactions into memory at once, yet scanning the dataset through I/O is very time-consuming.
The FP-tree algorithm tries to fit all the information it needs from the dataset into memory, so it only has to scan the dataset twice.

11 AR mining FP-tree FP-tree algorithm and implementation –By Xiaobo Chen

12 AR mining FP-tree (Frequent Pattern Tree)
Mining frequent patterns without candidate generation
Divide and conquer methodology: decompose mining tasks into smaller ones

13 AR mining FP-tree (Merits of FP-tree algorithm)
Makes the most of shared common prefixes
Complete and compact
All information of a transaction is stored in a path
The size is bounded by the dataset; consequently, the longest path corresponds to the longest pattern
The compression ratio can be over 100

14 AR mining FP-tree (Construction of FP-tree)
min_support = 3

TID   Frequent items bought (ordered)
100   {f, c, a, m, p}
200   {f, c, a, b, m}
300   {f, b}
400   {c, b, p}
500   {f, c, a, m, p}

Item  Frequency
f     4
c     4
a     3
b     3
m     3
p     3

Tree after inserting TID 100:
root
  f:1
    c:1
      a:1
        m:1
          p:1

15 AR mining FP-tree (construction (Cont’d))

TID   Frequent items bought (ordered)
100   {f, c, a, m, p}
200   {f, c, a, b, m}
300   {f, b}
400   {c, b, p}
500   {f, c, a, m, p}

Tree after inserting TID 100 and TID 200:
root
  f:2
    c:2
      a:2
        m:1
          p:1
        b:1
          m:1

16 AR mining FP-tree construction (Cont’d)
min_support = 3

TID   Frequent items bought (ordered)
100   {f, c, a, m, p}
200   {f, c, a, b, m}
300   {f, b}
400   {c, b, p}
500   {f, c, a, m, p}

Header Table (the head column links to the tree nodes of each item)
Item  Frequency
f     4
c     4
a     3
b     3
m     3
p     3

Final tree:
root
  f:4
    c:3
      a:3
        m:2
          p:2
        b:1
          m:1
    b:1
  c:1
    b:1
      p:1
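
The construction shown on these three slides can be sketched in a few lines. This is an illustration under stated assumptions, not the project's code: `Node`, `build_fp_tree`, and `dump` are hypothetical names, and the tie-breaking order among equally frequent items (f, c, a, b, m, p) is passed in explicitly, copied from the header table.

```python
from collections import Counter

class Node:
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item -> Node

def build_fp_tree(transactions, min_support, order):
    # Scan 1: count item frequencies and keep only the frequent items.
    freq = Counter(item for t in transactions for item in t)
    rank = {item: r for r, item in enumerate(order)}
    header = {item: [] for item in order if freq[item] >= min_support}
    root = Node(None, None)
    # Scan 2: insert each transaction's frequent items in the fixed order,
    # sharing existing prefixes and incrementing their counts.
    for t in transactions:
        node = root
        for item in sorted((i for i in t if i in header), key=rank.get):
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)  # node-link for this item
            child.count += 1
            node = child
    return root, header

def dump(node, depth=0, out=None):
    """Return the tree as indented 'item:count' lines (pre-order)."""
    out = [] if out is None else out
    for child in node.children.values():
        out.append("  " * depth + f"{child.item}:{child.count}")
        dump(child, depth + 1, out)
    return out

transactions = [list("fcamp"), list("fcabm"), list("fb"),
                list("cbp"), list("fcamp")]
root, header = build_fp_tree(transactions, 3, order="fcabmp")
print("\n".join(dump(root)))
```

Summing the counts along an item's node-links gives back its support; for example, the two m nodes carry 2 + 1 = 3.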

17 AR mining FP-tree (Mining Frequent Patterns Using the FP-tree)
General idea (divide-and-conquer)
–Recursively grow frequent pattern paths using the FP-tree
Method
–For each item, construct its conditional pattern-base, and then its conditional FP-tree
–Repeat the process on each newly created conditional FP-tree
–Until the resulting FP-tree is empty, or it contains only one path (a single path will generate all the combinations of its sub-paths, each of which is a frequent pattern)
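
The recursion in the method above can be sketched compactly. One simplification is assumed and should be kept in mind: each conditional pattern base is held as a plain list of (path, count) pairs instead of being compressed into a conditional FP-tree, so this shows the divide-and-conquer structure of pattern growth rather than the memory savings of the tree. The name `pattern_growth` is illustrative; the data is the five-transaction example with min_support = 3.

```python
from collections import Counter

def pattern_growth(base, min_support, suffix=()):
    """Yield (pattern, support) pairs.

    base: list of (path, count), where path is a tuple of items in the
    global frequency order; suffix: the items already fixed.
    """
    support = Counter()
    for path, count in base:
        for item in path:
            support[item] += count
    for item, sup in support.items():
        if sup < min_support:
            continue
        pattern = (item,) + suffix
        yield pattern, sup
        # Conditional pattern base of `item`: the prefix that precedes it
        # in every path containing it, with that path's count.
        cond = [(path[:path.index(item)], count)
                for path, count in base if item in path]
        yield from pattern_growth(cond, min_support, pattern)

base = [(tuple("fcamp"), 1), (tuple("fcabm"), 1), (tuple("fb"), 1),
        (tuple("cbp"), 1), (tuple("fcamp"), 1)]
patterns = dict(pattern_growth(base, 3))
for pattern, sup in sorted(patterns.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print("".join(pattern), sup)
```

Because each recursive call only looks at prefixes, every frequent pattern is produced exactly once, grown from its last item backwards.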

18 AR mining FP-tree (Mining Frequent Patterns Using the FP-tree)
Start with the last item in the order (i.e., p).
Follow the node pointers and traverse only the paths containing p.
Accumulate all the transformed prefix paths of that item to form its conditional pattern base.
Paths containing p: root f:4 c:3 a:3 m:2 p:2 and root c:1 b:1 p:1
Conditional pattern base for p: fcam:2, cb:1
Constructing a new FP-tree based on this pattern base leads to only one branch, c:3.
Thus we derive only one frequent pattern containing p besides p itself: cp.

19 AR mining FP-tree (Mining Frequent Patterns Using the FP-tree)
Move to the next least frequent item in the order, i.e., m.
Follow the node pointers and traverse only the paths containing m.
Accumulate all the transformed prefix paths of that item to form its conditional pattern base.
Paths containing m: root f:4 c:3 a:3 m:2 and root f:4 c:3 a:3 b:1 m:1
Conditional pattern base for m: fca:2, fcab:1
Constructing a new FP-tree based on this pattern base leads to the path fca:3.
From this we derive the frequent patterns fcam, fcm, fam, cam, fm, cm, am.
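
When a conditional FP-tree is a single path, as with fca:3 for m, the frequent patterns are simply every non-empty combination of the path's nodes joined with the suffix. A minimal sketch of that enumeration follows; the function name `single_path_patterns` and its input format are assumptions made for illustration.

```python
from itertools import combinations

def single_path_patterns(path, suffix):
    """Enumerate the patterns generated by a single-path conditional FP-tree.

    path: list of (item, count) pairs along the branch;
    suffix: tuple of items the tree is conditioned on.
    """
    patterns = {}
    for r in range(1, len(path) + 1):
        for combo in combinations(path, r):
            items = tuple(item for item, _ in combo) + suffix
            # The support of a combination is the smallest count on it.
            patterns[items] = min(count for _, count in combo)
    return patterns

m_patterns = single_path_patterns([("f", 3), ("c", 3), ("a", 3)], ("m",))
for items, support in m_patterns.items():
    print("".join(items), support)
```

A path of three nodes yields 2^3 - 1 = 7 patterns containing m, each with support 3.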

20 AR mining FP-tree (Conditional Pattern-Bases for the example)

Item  Conditional pattern-base       Conditional FP-tree
p     {(fcam:2), (cb:1)}             {(c:3)}|p
m     {(fca:2), (fcab:1)}            {(f:3, c:3, a:3)}|m
b     {(fca:1), (f:1), (c:1)}        Empty
a     {(fc:3)}                       {(f:3, c:3)}|a
c     {(f:3)}                        {(f:3)}|c
f     Empty                          Empty
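
The pattern-base column of this table can be checked with a short sketch. It assumes a shortcut: the prefixes are collected directly from the ordered transactions rather than by following node-links in the FP-tree, which gives the same pattern bases for this example. The function name `conditional_pattern_base` is illustrative.

```python
from collections import Counter

# Ordered frequent items of the five example transactions.
ordered = [list("fcamp"), list("fcabm"), list("fb"),
           list("cbp"), list("fcamp")]

def conditional_pattern_base(transactions, item):
    """Map each non-empty prefix that precedes `item` to its count."""
    base = Counter()
    for t in transactions:
        if item in t:
            prefix = "".join(t[:t.index(item)])
            if prefix:  # an empty prefix contributes nothing
                base[prefix] += 1
    return dict(base)

for item in "pmbacf":
    print(item, conditional_pattern_base(ordered, item))
```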

21 AR mining FP-tree (Why is Frequent Pattern Growth fast?)
Performance studies show that FP-growth is an order of magnitude faster than Apriori, and is also faster than tree-projection
Reasoning:
–No candidate generation, no candidate test
–Uses a compact data structure
–Eliminates repeated database scans
–Basic operations are counting and FP-tree building

22 AR mining FP-tree: Expected result: FP-growth vs. Apriori: Scalability With the Support Threshold

23 AR mining Conclusion
FP-tree is faster than the other two algorithms.
Apriori and the hash tree algorithm are easier to implement.
–We can easily combine them with other methods or tools (e.g., distributed parallel computing).
The parameters of the dataset are very important too.
–Density, size, min support …

24 AR mining References
[1] Jiawei Han and Micheline Kamber: "Data Mining: Concepts and Techniques", Morgan Kaufmann, 2001
[2] Jiawei Han, Jian Pei, Yiwen Yin: "Mining Frequent Patterns without Candidate Generation", ACM SIGMOD, 2000
[3] N. Mamoulis: Advanced Database Technologies (Slides)
[4] Jiawei Han and Micheline Kamber: "Data Mining: Concepts and Techniques", Morgan Kaufmann Publishers, 2001

