Published by Calvin Thornton. Modified over 9 years ago.
1
Extending Task Parallelism for Frequent Pattern Mining.
Prabhanjan Kambadur, Amol Ghoting, Anshul Gupta and Andrew Lumsdaine.
International Conference on Parallel Computing (ParCO), 2009.
2
Overview
Introduce Frequent Pattern Mining (FPM).
  Formal definition.
  Apriori algorithm for FPM.
Task-parallel implementation of Apriori.
  Requirements for efficient parallelization.
  Cilk-style task scheduling.
  Shortcomings w.r.t. Apriori.
  Clustered task scheduling policy.
Results.
3
FPM: A Formal Definition
Let $I = \{i_1, i_2, \ldots, i_n\}$ be a set of $n$ items.
Let $D = \{T_1, T_2, \ldots, T_m\}$ be a set of $m$ transactions such that $T_j \subseteq I$ for every $j$.
A set $i \subseteq I$ of size $k$ is called a $k$-itemset.
The support of a $k$-itemset $i$ is $\sum_{j=1}^{m} \mathbf{1}[i \subseteq T_j]$, the number of transactions in $D$ having $i$ as a subset.
The Frequent Pattern Mining problem aims to find all itemsets $i \subseteq I$ whose support is ≥ a user-supplied value.
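The support definition above can be sketched directly in C++. This is a minimal illustration, not code from the talk: the names `Itemset`, `support` and `example_database` are assumptions, and the database is the eight-transaction example used on the Apriori slides.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

using Item = char;
using Itemset = std::set<Item>;  // kept sorted, so std::includes applies

// Support of `candidate`: the number of transactions containing it as a subset.
std::size_t support(const std::vector<Itemset>& database, const Itemset& candidate) {
    assert(!candidate.empty());
    return static_cast<std::size_t>(std::count_if(
        database.begin(), database.end(), [&](const Itemset& t) {
            return std::includes(t.begin(), t.end(),
                                 candidate.begin(), candidate.end());
        }));
}

// The eight-transaction example database from the Apriori slides.
std::vector<Itemset> example_database() {
    return {{'A', 'B', 'C', 'E'}, {'B', 'C', 'A', 'F'}, {'G', 'H', 'A', 'C'},
            {'A', 'D', 'B', 'H'}, {'E', 'D', 'A', 'B'}, {'A', 'B', 'C', 'D'},
            {'B', 'D', 'A', 'G'}, {'A', 'C', 'D', 'B'}};
}
```

Scanning every transaction for every candidate is the naive reading of the formula; the TID-list representation on the later slides avoids the repeated scans.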
4
Apriori Algorithm for FPM
Transaction Database
TID  Items
1    A B C E
2    B C A F
3    G H A C
4    A D B H
5    E D A B
6    A B C D
7    B D A G
8    A C D B
5
Apriori Algorithm
The transaction database of the previous slide is converted into one TID list per item (the TIDs of the transactions that contain the item).
TID List
Item  TIDs
A     1 2 3 4 5 6 7 8
B     1 2 4 5 6 7 8
C     1 2 3 6 8
D     4 5 6 7 8
E     1 5
F     2
G     3 7
H     3 4
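The conversion from the transaction database to TID lists can be sketched as follows. The function name `build_tid_lists` and the use of strings for transactions are assumptions made for this illustration; the sketch also assumes that an item occurs at most once per transaction.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using Tid = int;
using TidList = std::vector<Tid>;

// Invert a horizontal database (TID -> items) into TID lists (item -> TIDs).
// Transactions are numbered from 1 in input order, so every list is sorted.
std::map<char, TidList> build_tid_lists(const std::vector<std::string>& transactions) {
    std::map<char, TidList> lists;
    Tid tid = 1;
    for (const std::string& t : transactions) {
        assert(!t.empty());  // assumption: no empty transactions
        for (char item : t) lists[item].push_back(tid);
        ++tid;
    }
    return lists;
}
```

Calling it on `{"ABCE", "BCAF", "GHAC", "ADBH", "EDAB", "ABCD", "BDAG", "ACDB"}` yields one sorted list per item A to H.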
6
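Apriori Algorithm for FPM
Join: the TID list of a 2-itemset is the intersection of the TID lists of its items.
Itemset  TIDs
A        1 2 3 4 5 6 7 8
B        1 2 4 5 6 7 8
C        1 2 3 6 8
D        4 5 6 7 8
AB       1 2 4 5 6 7 8
CD       6 8
Support (AB) = 87.5% (7/8)
Support (CD) = 25% (2/8)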
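The join step amounts to intersecting two sorted TID lists. A minimal sketch, assuming sorted `std::vector<int>` lists; the names `join` and `support_fraction` are chosen for this illustration only:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

using TidList = std::vector<int>;

// Join two itemsets by intersecting their (sorted) TID lists.
TidList join(const TidList& a, const TidList& b) {
    assert(std::is_sorted(a.begin(), a.end()) && std::is_sorted(b.begin(), b.end()));
    TidList out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out));
    return out;
}

// Support as a fraction of the number of transactions in the database.
double support_fraction(const TidList& tids, std::size_t num_transactions) {
    return static_cast<double>(tids.size()) / static_cast<double>(num_transactions);
}
```

With the lists on this slide, joining A and B keeps 7 of the 8 TIDs (87.5%), and joining C and D keeps only TIDs 6 and 8 (25%).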
7
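Apriori Algorithm for FPM
Minimum support = 37.5% (3/8).
[Figure: task tree rooted at the transaction database. Level 1: the items A, B, C, D, E, F, G, H. Level 2: AB, AC, AD, BC, BD, CD. Level 3: ABC, ABD. Tasks for the next level are created with Spawn and joined with Wait All.]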
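The Spawn / Wait All structure of the task tree can be sketched with standard C++ futures. This is an assumption-laden stand-in, not the authors' PFunc code: `std::async` plays the role of Spawn, `future::get` the role of Wait All, and `Node` and `mine` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

using TidList = std::vector<int>;
struct Node { std::string itemset; TidList tids; };

// Counts the frequent itemsets in `siblings` (all frequent, sharing a common
// prefix) plus all of their frequent extensions.
std::size_t mine(const std::vector<Node>& siblings, std::size_t min_count) {
    std::vector<std::future<std::size_t>> children;
    for (std::size_t i = 0; i < siblings.size(); ++i) {
        // Extend siblings[i] with each later sibling; survivors share it as prefix.
        std::vector<Node> next;
        for (std::size_t j = i + 1; j < siblings.size(); ++j) {
            TidList tids;
            std::set_intersection(siblings[i].tids.begin(), siblings[i].tids.end(),
                                  siblings[j].tids.begin(), siblings[j].tids.end(),
                                  std::back_inserter(tids));
            if (tids.size() >= min_count)
                next.push_back({siblings[i].itemset + siblings[j].itemset.back(),
                                std::move(tids)});
        }
        if (!next.empty())  // "Spawn": one task per group of extensions
            children.push_back(std::async(
                std::launch::async,
                [next = std::move(next), min_count] { return mine(next, min_count); }));
    }
    std::size_t found = siblings.size();
    for (auto& child : children) found += child.get();  // "Wait All"
    return found;
}
```

Starting from the frequent items A, B, C, D of the example with a minimum count of 3, the recursion visits AB, AC, AD, BC, BD and then ABC, ABD, which matches the tree in the figure. Creating one operating-system thread per task is far heavier than a real task scheduler, so this only illustrates the control flow.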
8
Cilk-style parallelization (1 thread)
Depth-first discovery, post-order finish.
[Figure: task tree rooted at n with descendant tasks n-1 through n-6; each node is annotated with its order of discovery (1 to 11) and its order of completion.]
9
Cilk-style parallelization (2 threads, thread-local deques)
1. Breadth-first theft.
2. Steal one task at a time.
3. Stealing is expensive.
[Figure: successive states of the thread-local deques of Thd 1 and Thd 2 while the task tree rooted at n executes; the steps shown include Steal (n-1) and Steal (n-3).]
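The deque discipline behind these three points can be sketched as follows. This is a simplified, mutex-protected illustration with hypothetical names; Cilk's actual deque uses a more refined synchronization protocol that is not reproduced here.

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <optional>
#include <utility>

// One deque per thread. The owner works at the back (LIFO, which gives the
// depth-first order); a thief takes a single task from the front, i.e. the
// oldest task, which sits closest to the root of the task tree.
template <typename Task>
class WorkStealingDeque {
public:
    void push(Task t) {
        std::lock_guard<std::mutex> guard(mutex_);
        tasks_.push_back(std::move(t));
    }
    std::optional<Task> pop() {  // called by the owning thread
        std::lock_guard<std::mutex> guard(mutex_);
        if (tasks_.empty()) return std::nullopt;
        Task t = std::move(tasks_.back());
        tasks_.pop_back();
        return t;
    }
    std::optional<Task> steal() {  // called by another thread, one task per steal
        std::lock_guard<std::mutex> guard(mutex_);
        if (tasks_.empty()) return std::nullopt;
        Task t = std::move(tasks_.front());
        tasks_.pop_front();
        return t;
    }
private:
    std::mutex mutex_;
    std::deque<Task> tasks_;
};
```

Because `steal` hands over exactly one task and knows nothing about which data the task touches, two tasks that share a TID list can easily end up on different threads.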
10
Efficient Parallelization of FPM
Shortcomings of Cilk-style scheduling w.r.t. FPM:
1. Exploits data locality only between parent and child tasks.
2. Stealing does not consider data locality.
3. Tasks are stolen one at a time.
Tasks with overlapping memory accesses should be:
1. Executed by the same thread.
2. Stolen together by the same thread.
[Figure: the itemsets AB, AC, AD (extensions of A) and ABC, ABD (extensions of AB).]
11
Clustered Scheduling Policy
Cluster k-itemsets based on their common (k-1)-prefix.
1. Hash table: std::hash_map.
2. Hash: std::hash.
[Figure: a thread-local deque next to a thread-local hash table. In the hash table, AB, AC and AD fall in the bucket keyed Hash(A); ABC and ABD fall in the bucket keyed Hash(A) xor Hash(B).]
12
Clustered Scheduling Policy
[Figure: the hash tables of Thd 1 and Thd 2. The bucket keyed Hash(A) holds AB, AC, AD; the bucket keyed Hash(A) xor Hash(B) holds ABC, ABD.]
13
Clustered Scheduling Policy
Steal an entire bucket of tasks.
[Figure: the bucket keyed Hash(A), holding AB, AC, AD, is in the hash table of Thd 1; the bucket keyed Hash(A) xor Hash(B), holding ABC, ABD, is in the hash table of Thd 2.]
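The clustering and bucket-stealing idea can be sketched as below. Several things here are assumptions rather than the authors' implementation: itemsets are strings of single-character items, `std::unordered_map` stands in for the `std::hash_map` named on the slides, the names `cluster_key` and `ClusteredQueue` are hypothetical, and the synchronization a real thread-local table needs for stealing is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Key of a k-itemset: XOR of the hashes of its first k-1 items, so that
// AB, AC, AD map to Hash(A) and ABC, ABD map to Hash(A) xor Hash(B).
std::size_t cluster_key(const std::string& itemset) {
    assert(!itemset.empty());
    std::size_t key = 0;
    for (std::size_t i = 0; i + 1 < itemset.size(); ++i)
        key ^= std::hash<char>{}(itemset[i]);
    return key;
}

// Task container of one thread: tasks are grouped into buckets by cluster key.
class ClusteredQueue {
public:
    void put(std::string itemset) {
        const std::size_t key = cluster_key(itemset);
        buckets_[key].push_back(std::move(itemset));
    }
    // A thief takes one whole bucket, i.e. all queued tasks with the same key.
    std::vector<std::string> steal_bucket() {
        if (buckets_.empty()) return {};
        auto it = buckets_.begin();
        std::vector<std::string> stolen = std::move(it->second);
        buckets_.erase(it);
        return stolen;
    }
private:
    std::unordered_map<std::size_t, std::vector<std::string>> buckets_;
};
```

XOR keys are cheap but can collide, so unrelated prefixes may occasionally share a bucket; that costs some locality but does not affect correctness.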
14
Where does PFunc fit in?
Customizable task scheduling and priorities.
Cilk-style, LIFO, FIFO and priority-based scheduling are built in.
Custom scheduling policies are simple to implement, e.g., the clustered scheduling policy.
The policy is chosen at compile time, much like with the STL (e.g., std::vector).
    namespace pfunc {
      struct hashS : public schedS {};
      template <…> struct scheduler { … };
    } // namespace pfunc
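Compile-time selection of a scheduling policy can be illustrated with tag types and template specialization. The sketch below is not PFunc's API: the namespace `sketch`, the tags `lifoS` and `fifoS`, and the `put`/`get` members are all hypothetical and only show the general technique.

```cpp
#include <cassert>
#include <deque>
#include <utility>

namespace sketch {

struct lifoS {};  // policy tags; a custom policy would add its own tag
struct fifoS {};

// Primary template is only declared; each policy provides a specialization.
template <typename Policy, typename Task> struct scheduler;

template <typename Task>
struct scheduler<lifoS, Task> {
    void put(Task t) { queue.push_back(std::move(t)); }
    Task get() {  // newest task first
        assert(!queue.empty());
        Task t = std::move(queue.back());
        queue.pop_back();
        return t;
    }
    std::deque<Task> queue;
};

template <typename Task>
struct scheduler<fifoS, Task> {
    void put(Task t) { queue.push_back(std::move(t)); }
    Task get() {  // oldest task first
        assert(!queue.empty());
        Task t = std::move(queue.front());
        queue.pop_front();
        return t;
    }
    std::deque<Task> queue;
};

}  // namespace sketch
```

A program picks the policy in a type, for example `sketch::scheduler<sketch::fifoS, int>`, so the choice is resolved by the compiler and costs nothing at run time.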
15
So, how does it work?
Select the scheduling policy and priority: hash table-based scheduling, with a reference to the itemset as the priority.
Program:
    Task T;
    SetPriority (T, ref (ABD));
    Spawn (T);
Scheduler:
    GetPriority (T) (ABC in the figure).
    Generate hash key: Hash(A) xor Hash(B).
    Place task.
Task queue: ABC and ABD are placed together; BCD and BCE are placed together.
16
Performance Analysis
8 threads. Dual AMD 8356, Linux 2.6.24, GCC 4.3.2.
17
Performance Analysis: IPC
8 threads. Dual AMD 8356, Linux 2.6.24, GCC 4.3.2. Higher is better.
Dataset      Support  IPC (Cilk)  IPC (Clustered)
accidents    0.25     0.595       0.604
chess        0.6      0.560       0.669
connect      0.8      0.543       0.809
kosark       0.0013   0.692       0.717
pumsb        0.75     0.494       0.719
pumsb_star   0.3      0.527       0.698
mushroom     0.10     0.570       0.705
T40I10D100K  0.005    0.627       0.727
T10I4D100K   0.00006  0.556       0.716
18
Performance Analysis: L1 DTLB Misses
8 threads. Dual AMD 8356, Linux 2.6.24, GCC 4.3.2. Lower is better.
Dataset      Support  Cilk DTLB L1M/L2H  Clustered DTLB L1M/L2H
accidents    0.25     0.000048           0.000046
chess        0.6      0.000797           0.000242
connect      0.8      0.000249           0.000112
kosark       0.0013   0.000400           0.000185
pumsb        0.75     0.000230           0.000114
pumsb_star   0.3      0.000315           0.000145
mushroom     0.10     0.000477           0.000267
T40I10D100K  0.005    0.000368           0.000305
T10I4D100K   0.00006  0.000218           0.000144
19
Performance Analysis: L2 DTLB Misses
8 threads. Dual AMD 8356, Linux 2.6.24, GCC 4.3.2. Lower is better.
Dataset      Support  Cilk DTLB L1M/L2M  Clustered DTLB L1M/L2M
accidents    0.25     0.000161           0.000110
chess        0.6      0.001006           0.000032
connect      0.8      0.001204           0.000141
kosark       0.0013   0.000659           0.000123
pumsb        0.75     0.001276           0.000126
pumsb_star   0.3      0.001082           0.000114
mushroom     0.10     0.000950           0.000022
T40I10D100K  0.005    0.000900           0.000021
T10I4D100K   0.00006  0.000876           0.000044
20
Conclusions
For task-parallel FPM, clustered scheduling outperforms Cilk-style scheduling.
  Exploits data locality.
  Better work-stealing policy.
PFunc provides support for facile customizations.
  Task scheduling policy, task priorities, etc.
  Being released under COIN-OR (Eclipse Public License version 1.0).
Future work.
  Task queues based on multi-dimensional index structures, e.g., k-d trees.
21
Fibonacci 37
Threads  Cilk (secs)  PFunc/Cilk  TBB/Cilk  PFunc/TBB
1        2.17         2.2718      4.431     0.5004
2        1.15         2.1135      4.1924    0.5041
4        0.55         2.2131      4.4183    0.5009
8        0.28         2.2114      4.9839    0.4437
16       0.15         2.4944      5.9370    0.4201
PFunc is about 2x faster than TBB and about 2x slower than Cilk, but provides more flexibility.
Fibonacci is the worst-case behavior!