Frequent Itemset Mining on Graphics Processors Wenbin Fang, Mian Lu, Xiangye Xiao, Bingsheng He 1, Qiong Luo Hong Kong Univ. of Sci.

wenbin@cse.ust.hk Frequent Itemset Mining on Graphics Processors Wenbin Fang, Mian Lu, Xiangye Xiao, Bingsheng He 1, Qiong Luo Hong Kong Univ. of Sci. and Tech. Microsoft Research Asia 1 Presenter: Wenbin Fang

2/33 Outline Contribution Introduction Design Evaluation Conclusion

3/33 Contribution Accelerate the Apriori algorithm for Frequent Itemset Mining using Graphics Processors (GPUs). Two GPU implementations: 1.Pure Bitmap-based implementation (PBI): processing entirely on the GPU. 2.Trie-based implementation (TBI): GPU/CPU co-processing.

4/33 Frequent Itemset Mining (FIM) Finding groups of items, or itemsets that co-occur frequently in a transaction database. Transaction IDItems 1A, B, C, D 2A, B, D 3A, C, D 4C, D Minimum support: 2 1-itemsets (frequent items): A : 3 B : 2 C : 3 D : 4

5/33 Frequent Itemset Mining (FIM) Aims at finding groups of items, or itemsets that co-occur frequently in a transaction database. Transaction IDItems 1A, B, C, D 2A, B, D 3A, C, D 4C, D Minimum support: 2 1-itemsets (frequent items): A, B, C, D 2-itemsets: AB: 2 AC: 2 AD: 3 BD: 2 CD: 3

6/33 Frequent Itemset Mining (FIM) Aims at finding groups of items, or itemsets that co-occur frequently in a transaction database. Transaction IDItems 1A, B, C, D 2A, B, D 3A, C, D 4C, D Minimum support: 2 1-itemsets (frequent items): A, B, C, D 2-itemsets: AB, AC, AD, BD, CD 3-itemsets: ABD, ACD

7/33 Graphics Processors (GPUs) Exist in commodity machines, mainly for graphics rendering. Specialized for compute-intensive, highly data parallel apps. Compared with CPUs, GPUs provide 10x faster computational horsepower, and 10x higher memory bandwidth. CPUGPU --From NVIDA CUDA Programming Guide

8/33 Programming on GPUs OpenGL/DirectX AMD CTM NVIDIA CUDA The many-core architecture model of the GPU SIMD parallelism (Single Instruction, Multiple Data)

9/33 Hierarchical multi-threaded in NVIDIA CUDA … A warp = 32 GPU threads => SIMD schedule unit. …………… Thread Block # of threads in a thread block # of thread blocks Thread Block Warp

10/33 General Purpose GPU Computing (GPGPU) Applications utilizing GPUs Scientific computing Molecular Dynamics Simulation Weather forecasting Linear algebra Computational finance Folding@home, Seti@home Database applications Basic DB Operators [SIGMOD’04] Sorting [SIGMOD’06] Join [SIGMOD’08]

11/33 Our work As a first step, we consider the GPU-based Apriori, with intention to extend to another efficient FIM algorithm -- FP-growth. Why Apriori? 1.a classic algorithm for mining frequent itemsets. 2.also applied in other data mining tasks, e.g., clustering, and functional dependency.

12/33 The Apriori Algorithm Input: 1) Transaction Database 2) Minimum support Output: All frequent itemsets L 1 = {All frequent 1-itemsets} k = 2 While (L k-1 != empty) { //Generate candidate k-itemsets. C k <- Self join on L k-1 C k <- (K-1)-Subset test on C k //Generate frequent k-itemsets L k <- Support Counting on C k k += 1 } Frequent 1-itemsets Candidate 2-itemsets Frequent 2-itemsets Candidate 3-itemsets Frequent 3-itemsets … Candidate (K-1)-itemsets Frequent (K-1)-itemsets Candidate K-itemsets Frequent K-itemsets

14/33 L 1 = {All frequent 1-itemsets} k = 2 While (L k-1 != empty) { //Generate candidate k-itemsets. C k <- Self join on L k-1 C k <- (K-1)-Subset test on C k //Generate frequent k-itemsets L k <- Support Counting on C k k += 1 } GPU-based Apriori Input: 1) Transaction Database 2) Minimum support Output: All frequent itemsets L 1 = {All frequent 1-itemsets} k = 2 While (L k-1 != empty) { //Generate candidate k-itemsets. C k <- Self join on L k-1 C k <- (K-1)-Subset test on C k //Generate frequent k-itemsets L k <- Support Counting on C k k += 1 } Pure Bitmap-based Impl. (PBI) Itemsets: bitmap Candidate generation on the GPU Transactions: bitmap Support counting on the GPU Trie-based Impl. (TBI) Transactions: bitmap Support counting on the GPU Itemsets: Trie Candidate generation on the CPU

15/33 Horizontal and Vertical data layout Horizontal data layout TIDItems 1A, B, C, D 2A, B, D 3A, C, D 4C, D Vertical data layout ItemsetTID List AB1, 2 AC1, 3 AD1, 2, 3 BD1, 2, 3 CD1, 3, 4 ItemsetTID List ABD1, 2 ACD1, 3 Scan all transactions Support counting is done on specific itemsets. 1.Intersect two transaction lists. 2.Count the number of transactions in the intersection result.

16/33 Bitmap representation for transactions T1T2T3T4 AB1100 AC1010 AD1110 BD1110 CD1011 Intersection = bitwise AND operation # of transactions # of itemsets Counting = # of 1’s in a string of bits

17/33 Lookup table IndexCount 00000 00011 … 10001 10102 10113 11002 Lookup table T1T2T3T4 ABD1100 ACD1010 Bitmap representation for transactions Constant memory 1.Cacheable 2.64 KB 3.Shared by all GPU threads IndexCount 0000 0000 0000 0000 (0)0 0000 0000 0000 0001 (1)1 … 1111 1111 1111 1110 (65534)15 1111 1111 1111 1111 (65535)16 1 byte 2 16 = 65536 # of 1’s = TABLE[12]; // decimal: 12 // binary: 1100 // (a string of bits)

18/33 Support Counting on the GPU (Cont.) Thread block 1 Thread block 2 T1T2T3T4 AB1100 AC1010 AD1110 BD1110 CD1011 T1T2T3T4 ABD1100 ACD1010 1.Intersect two transaction lists. 2.Count the number of transaction in the intersection result. LOOKUP TABLE 2

19/33 Support Counting on the GPU (Cont.) int AND int LOOKUP TABLE Counts of 1’s for every 16-bit integer Parallel Reduce Support for this itemset Thread Block Thread 1 Thread 2 Access vector type int4 In one instruction 1100 1110 1100 Example: Counts: 2 Support:2 AD AB ABD

20/33 Input: 1) Transaction Database 2) Minimum support Output: All frequent itemsets L 1 = {All frequent 1-itemsets} k = 2 While (L k-1 != empty) { //Generate candidate k-itemsets. Join Subset test //Generate frequent k-itemsets Support Counting k += 1 } Support Counting on the GPU Candidate Generation 1.Join e.g., Join two 2-itemsets to obtain a candidate 3-itemset: AAA AC JOIN AD => ACD 2.Subset test e.g., Test all 2-subsets of ACD: {AC, AD, CD} GPU-based Apriori

21/33 L 1 = {All frequent 1-itemsets} k = 2 While (L k-1 != empty) { //Generate candidate k-itemsets. C k <- Self join on L k-1 C k <- (K-1)-Subset test on C k //Generate frequent k-itemsets L k <- Support Counting on C k k += 1 } GPU-based Apriori Input: 1) Transaction Database 2) Minimum support Output: All frequent itemsets L 1 = {All frequent 1-itemsets} k = 2 While (L k-1 != empty) { //Generate candidate k-itemsets. C k <- Self join on L k-1 C k <- (K-1)-Subset test on C k //Generate frequent k-itemsets L k <- Support Counting on C k k += 1 } Pure Bitmap-based Impl. (PBI) Itemsets: bitmap Candidate generation on the GPU Transactions: bitmap Support counting on the GPU Trie-based Impl. (TBI) Transactions: bitmap Support counting on the GPU Itemsets: Trie Candidate generation on the CPU Itemsets: bitmap Candidate generation on the GPU

Pure Bitmap-based Impl. (PBI) One GPU thread generates one candidate itemset. 22/33 ABCD ABD1101 ACD1011 ABCD AB1100 AC1010 AD1001 BD0101 CD0011 Bitwise OR In Join (e.g., AB JOIN AD = ABD) Binary search In Subset test (e.g., 2-subsets {AB, AD, BD}) # of items # of itemsets

23/33 L 1 = {All frequent 1-itemsets} k = 2 While (L k-1 != empty) { //Generate candidate k-itemsets. C k <- Self join on L k-1 C k <- (K-1)-Subset test on C k //Generate frequent k-itemsets L k <- Support Counting on C k k += 1 } GPU-based Apriori Input: 1) Transaction Database 2) Minimum support Output: All frequent itemsets L 1 = {All frequent 1-itemsets} k = 2 While (L k-1 != empty) { //Generate candidate k-itemsets. C k <- Self join on L k-1 C k <- (K-1)-Subset test on C k //Generate frequent k-itemsets L k <- Support Counting on C k k += 1 } Pure Bitmap-based Impl. (PBI) Itemsets: bitmap Candidate generation on the GPU Transactions: bitmap Support counting on the GPU Trie-based Impl. (TBI) Transactions: bitmap Support counting on the GPU Itemsets: Trie Candidate generation on the CPU Itemsets: Trie Candidate generation on the CPU

D 24/33 Trie-based Impl. (TBI) Root ABCD BCDDD 1-itemsets: {A, B, C, D} 2-itemsets: {AB, AC, AD, BD, CD} A AB B JOINAC C = ABC C {AB, AC, BC} B ABJOINAD= ABD {AB, AD, BD} D DD D ACJOINAD= ACD {AC, AD, CD} On CPU 1, Irregular memory access 2, Branch divergence C D Candidate 3-itemsets: { ABD, ACD} Depth 0 Depth 1 Depth 2

26/33 Experimental setup Intel Core2 quad-core CPU NV GTX 280 GPU Processors 2.66 GHz * 4 1.35 GHz * 30 * 8 Memory Bandwidth (GB/sec) 5.6141.7 Development Env. Windows XP + Visual Studio 2005 + CUDA Dataset#ItemAvg. Length #TranDensity T40I10D100K (synthetic) 1,00040100,0004% Retail16,46910.388,1620.6% Chess7537319649% Platform: Experimental datasets: Density = Avg. Length / # items

27/33 Apriori Implementations Impl.Candidate Generation Support Countnig ItemsetsTransactions BORGELTSingle-threaded on the CPU Trie GOETHALSSingle-threaded on the CPU Multi-threaded on the CPU TrieHorizontal layout TBI-CPUSingle-threaded on the CPU Multi-threaded on the CPU TrieBitmap TBI-GPUSingle-threaded on the CPU Multi-threaded on the GPU TrieBitmap PBI-GPUMulti-threaded on the GPU Bitmap Best Apriori implementation in FIMI repository. (Frequent Itemset Mining Implementations Repository)

28/33 TBI-CPU vs GOETHALS Impl.Itemset/Candidat e Generation Transactions / Suport Counting TBI-CPU Trie / CPUBitmap / CPU GOETHALS Trie / CPUHorizontal layout / CPU Dense Dataset - Chess Sparse Dataset- Retail The impact of using bitmap representation for transactions in support counting. 1.2x ~ 25.7x

Sparse Dataset- Retail 29/33 TBI-GPU vs TBI-CPU Impl.Itemset/Candidat e Generation Transactions / Suport Counting TBI-GPU Trie / CPUBitmap / GPU TBI-CPU Trie / CPUBitmap / CPU Dense Dataset - Chess The impact of GPU acceleration in support counting. 1.1x ~ 7.8x

30/33 PBI-GPU vs TBI-GPU Impl.Itemset/Candidat e Generation Transactions / Suport Counting PBI-GPU Bitmap / GPU TBI-GPU Trie / CPUBitmap / GPU Sparse Dataset- Retail Dense Dataset - Chess The impact of bitmap-based itemset and trie-based itemset in candidate generation. PBI-GPU is faster in dense dataset. TBI-GPU is better in sparse dataset.

31/33 PBI-GPU/TBI-CPU vs BORGELT Impl.Itemset/Candidat e Generation Transactions / Suport Counting PBI-GPU Bitmap / GPU TBI-GPU Trie / CPUBitmap / GPU BORGELT Trie /CPU Sparse Dataset- Retail Dense Dataset - Chess Comparison to the best Apriori implementation in FIMI. 1.2x ~ 24.2x

32/33 Comparison to FP-growth With minsup 1%, 60%, and 0.01% PARSEC benchmark

33/33 Conclusion GPU-based Apriori Pure Bitmap-based impl. Bitmap Representation for itemsets. Bitmap Representation for transactions. GPU processing. Trie-based impl. Trie Representation for itemsets. Bitmap Representation for transactions. GPU + CPU co-processing. Better than CPU-based Apriori. Still worse than CPU-based FP-growth

Backup Slide Time breakdown on dense dataset Chess Time breakdown on dense dataset Retail Time Breakdown

Frequent Itemset Mining on Graphics Processors Wenbin Fang, Mian Lu, Xiangye Xiao, Bingsheng He 1, Qiong Luo Hong Kong Univ. of Sci.

Similar presentations

Presentation on theme: "Frequent Itemset Mining on Graphics Processors Wenbin Fang, Mian Lu, Xiangye Xiao, Bingsheng He 1, Qiong Luo Hong Kong Univ. of Sci."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Frequent Itemset Mining on Graphics Processors Wenbin Fang, Mian Lu, Xiangye Xiao, Bingsheng He 1, Qiong Luo Hong Kong Univ. of Sci.

Similar presentations

Presentation on theme: "Frequent Itemset Mining on Graphics Processors Wenbin Fang, Mian Lu, Xiangye Xiao, Bingsheng He 1, Qiong Luo Hong Kong Univ. of Sci."— Presentation transcript:

Similar presentations

About project

Feedback