Prabhanjan Kambadur, Amol Ghoting, Anshul Gupta and Andrew Lumsdaine. International Conference on Parallel Computing (ParCO),2009 Extending Task Parallelism.


1 Prabhanjan Kambadur, Amol Ghoting, Anshul Gupta and Andrew Lumsdaine. International Conference on Parallel Computing (ParCO), 2009. Extending Task Parallelism for Frequent Pattern Mining.

2 Overview
- Introduce Frequent Pattern Mining (FPM).
- Formal definition.
- Apriori algorithm for FPM.
- Task-parallel implementation of Apriori.
- Requirements for efficient parallelization.
- Cilk-style task scheduling.
- Shortcomings w.r.t. Apriori.
- Clustered task scheduling policy.
- Results.

3 FPM: A Formal Definition
- Let I = {i₁, i₂, …, iₙ} be a set of n items.
- Let D = {T₁, T₂, …, Tₘ} be a set of m transactions such that each Tⱼ ⊆ I.
- A set i ⊆ I of size k is called a k-itemset.
- The support of a k-itemset i is ∑_{j=1}^{m} (1 : i ⊆ Tⱼ), i.e. the number of transactions in D having i as a subset.
- The Frequent Pattern Mining problem aims to find all itemsets i ⊆ I whose support is ≥ a user-supplied value.
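The support definition above can be sketched directly as a counting loop. This is a minimal illustration, not code from the talk: the names `Item`, `Itemset`, `support` and `slide_database` are assumptions made for the example, and the data is the eight-transaction database shown on slide 4.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical types for illustration: an item is a single letter.
using Item = char;
using Itemset = std::set<Item>;

// Support of `candidate`: the number of transactions that contain it as a subset.
std::size_t support(const std::vector<Itemset>& db, const Itemset& candidate) {
    std::size_t count = 0;
    for (const Itemset& t : db) {
        // std::includes needs sorted ranges; std::set iterates in sorted order.
        if (std::includes(t.begin(), t.end(), candidate.begin(), candidate.end())) {
            ++count;
        }
    }
    return count;
}

// The eight transactions of the example database on slide 4.
std::vector<Itemset> slide_database() {
    return {{'A', 'B', 'C', 'E'}, {'B', 'C', 'A', 'F'}, {'G', 'H', 'A', 'C'},
            {'A', 'D', 'B', 'H'}, {'E', 'D', 'A', 'B'}, {'A', 'B', 'C', 'D'},
            {'B', 'D', 'A', 'G'}, {'A', 'C', 'D', 'B'}};
}
```

With this database, `support(slide_database(), {'A', 'B'})` counts 7 of 8 transactions, which agrees with the 87.5% quoted for AB on slide 6.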

4 Apriori Algorithm for FPM
Transaction Database
TID  Items
1    A B C E
2    B C A F
3    G H A C
4    A D B H
5    E D A B
6    A B C D
7    B D A G
8    A C D B

5 Apriori Algorithm
Transaction Database (as on slide 4) and the corresponding TID List
Item  TID list
A     1 2 3 4 5 6 7 8
B     1 2 4 5 6 7 8
C     1 2 3 6 8
D     4 5 6 7 8
E, F, G, H: 12 37 3 2
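Converting the transaction database into per-item TID lists can be sketched as a single pass over the transactions. The function and type names below are hypothetical, chosen for this example only; TIDs are taken to be 1-based, as on the slide.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using TidList = std::vector<int>;

// Build the vertical layout: for every item, the sorted list of transaction
// IDs (1-based) in which it occurs. Each transaction is a string of items.
std::map<char, TidList> build_tid_lists(const std::vector<std::string>& transactions) {
    std::map<char, TidList> lists;
    for (std::size_t t = 0; t < transactions.size(); ++t) {
        for (char item : transactions[t]) {
            // Transactions are visited in increasing TID order, so each list stays sorted.
            lists[item].push_back(static_cast<int>(t + 1));
        }
    }
    return lists;
}

// The example database from slide 4, one string per transaction.
std::vector<std::string> slide_transactions() {
    return {"ABCE", "BCAF", "GHAC", "ADBH", "EDAB", "ABCD", "BDAG", "ACDB"};
}
```

Keeping every list sorted matters for the next step, because joining two itemsets then reduces to a linear-time intersection of sorted ranges.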

6 Apriori Algorithm for FPM
TID lists: A = 1 2 3 4 5 6 7 8; B = 1 2 4 5 6 7 8; C = 1 2 3 6 8; D = 4 5 6 7 8
Join: AB = A ∩ B = 1 2 4 5 6 7 8; CD = C ∩ D = 6 8
Support (AB) = 87.5% (7/8)
Support (CD) = 25% (2/8)
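The join on this slide is an intersection of two sorted TID lists. The following sketch uses `std::set_intersection` from the standard library; the helper names `join` and `support_percent` are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

using TidList = std::vector<int>;

// Join two itemsets by intersecting their (sorted) TID lists.
TidList join(const TidList& a, const TidList& b) {
    TidList out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out));
    return out;
}

// Support expressed as a percentage of all transactions.
double support_percent(const TidList& tids, std::size_t num_transactions) {
    return 100.0 * static_cast<double>(tids.size()) /
           static_cast<double>(num_transactions);
}
```

Applied to the lists above, joining A with B keeps seven TIDs (87.5% of eight transactions) and joining C with D keeps TIDs 6 and 8 (25%).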

7 Apriori Algorithm for FPM
Support = 37.5% (3/8)
Transaction Database → items: A B C D E F G H
2-itemsets: AB AC AD BC BD CD
3-itemsets: ABC ABD
Task operations: Spawn, Wait All
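The spawn / wait-all structure can be sketched with `std::async` standing in for a task library. This is only an illustration under stated assumptions: it is not the authors' implementation and does not use PFunc, it extends itemsets by joining siblings that share a prefix, and the names `Node`, `Results`, `mine` and `slide_frequent_items` are hypothetical. The minimum support of 3 corresponds to the 37.5% (3/8) threshold on the slide.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <iterator>
#include <map>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

using TidList = std::vector<int>;
using Node = std::pair<std::string, TidList>;  // itemset name and its TID list

// Thread-safe collection of frequent itemsets and their support counts.
struct Results {
    std::mutex m;
    std::map<std::string, std::size_t> support;
    void add(const std::string& name, std::size_t s) {
        std::lock_guard<std::mutex> lock(m);
        support[name] = s;
    }
};

// `siblings` are frequent itemsets sharing a common prefix. Each one is joined
// with the siblings to its right; frequent results form the next sibling group.
void mine(const std::vector<Node>& siblings, std::size_t min_support, Results& out) {
    std::vector<std::future<void>> children;
    for (std::size_t i = 0; i < siblings.size(); ++i) {
        out.add(siblings[i].first, siblings[i].second.size());
        std::vector<Node> next;
        for (std::size_t j = i + 1; j < siblings.size(); ++j) {
            TidList tids;
            std::set_intersection(siblings[i].second.begin(), siblings[i].second.end(),
                                  siblings[j].second.begin(), siblings[j].second.end(),
                                  std::back_inserter(tids));
            if (tids.size() >= min_support) {
                next.emplace_back(siblings[i].first + siblings[j].first.back(),
                                  std::move(tids));
            }
        }
        if (!next.empty()) {
            // "Spawn": mine the new sibling group as a separate task.
            children.push_back(std::async(
                std::launch::async,
                [group = std::move(next), min_support, &out] { mine(group, min_support, out); }));
        }
    }
    for (auto& child : children) child.get();  // "Wait All"
}

// Frequent 1-itemsets of the slide's database at a minimum support of 3.
std::vector<Node> slide_frequent_items() {
    return {{"A", {1, 2, 3, 4, 5, 6, 7, 8}}, {"B", {1, 2, 4, 5, 6, 7, 8}},
            {"C", {1, 2, 3, 6, 8}}, {"D", {4, 5, 6, 7, 8}}};
}
```

On the slide's data this finds AB, AC, AD, BC, BD, ABC and ABD in addition to the four single items, while CD (support 2) is pruned. Launching one operating-system thread per task, as `std::launch::async` typically does, is far heavier than a real task scheduler; it is used here only to show the control flow.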

8 Cilk-style parallelization (1 thread)
Depth-first discovery, post-order finish.
[Figure: task tree rooted at n with nodes n-1, n-2, n-3, n-4, n-5, n-6, each labelled with its order of discovery and its order of completion (1 to 11).]

9 Cilk-style parallelization
Thread-local deques:
1. Breadth-first theft.
2. Steal one task at a time.
3. Stealing is expensive.
[Figure: successive states of the thread-local deques of Thd 1 and Thd 2 while executing the task tree rooted at n, with the steps Steal (n-1) and Steal (n-3).]
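The deque discipline described on this slide can be sketched as follows: the owning thread pushes and pops at one end (newest first, giving depth-first execution), while a thief removes a single task from the other end (oldest first, giving breadth-first theft). The class below is a hypothetical, mutex-protected simplification for illustration; production work-stealing runtimes generally use more elaborate low-contention deques.

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <utility>

// Hypothetical thread-local task deque with Cilk-style access rules.
template <typename Task>
class WorkStealingDeque {
public:
    // Owner: add a newly spawned task.
    void push(Task t) {
        std::lock_guard<std::mutex> lock(m_);
        tasks_.push_back(std::move(t));
    }

    // Owner: take the most recently pushed task (LIFO, depth-first).
    bool pop(Task& out) {
        std::lock_guard<std::mutex> lock(m_);
        if (tasks_.empty()) return false;
        out = std::move(tasks_.back());
        tasks_.pop_back();
        return true;
    }

    // Thief: take exactly one task, the oldest one (breadth-first theft).
    bool steal(Task& out) {
        std::lock_guard<std::mutex> lock(m_);
        if (tasks_.empty()) return false;
        out = std::move(tasks_.front());
        tasks_.pop_front();
        return true;
    }

private:
    std::mutex m_;
    std::deque<Task> tasks_;
};
```

Because `steal` hands over one task per call, a thief that needs more work must repeat the whole synchronised operation, which is the cost the slide's third point refers to.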

10 Efficient Parallelization of FPM
Shortcomings of Cilk-style w.r.t. FPM:
1. Exploits data locality only between parent-child tasks.
2. Stealing does not consider data locality.
3. Tasks are stolen one at a time.
For efficient parallelization:
1. Exploit data locality in sibling tasks.
2. Steal clusters of tasks.
3. Maintain data locality amongst stolen tasks.
[Figure: task tree with A spawning AB, AC, AD, and AB spawning ABC, ABD.]

11 Clustered Scheduling Policy - Goals
Tasks with overlapping memory accesses are:
1. Executed by the same thread.
2. Stolen together by the same thread.
[Figure: task tree with A spawning AB, AC, AD, and AB spawning ABC, ABD.]

12 Clustered Scheduling Policy
Cluster k-itemsets based on their common (k-1)-prefix:
- AB, AC, AD → Hash(A)
- ABC, ABD → Hash(A) xor Hash(B)
Thread-local deque → thread-local hash table:
1. Hash Table - std::hash_map.
2. Hash - std::hash.
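The prefix-based clustering can be sketched by XOR-ing the hashes of the first k-1 items and using the result as a bucket key. The names `cluster_key`, `TaskTable` and `place` are hypothetical, itemsets are represented as strings of single-letter items for brevity, and `std::unordered_map` is used as a standard-library stand-in for the `std::hash_map` named on the slide.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Key of a k-itemset: XOR of the hashes of its first k-1 items (its prefix).
std::size_t cluster_key(const std::string& itemset) {
    std::size_t key = 0;
    for (std::size_t i = 0; i + 1 < itemset.size(); ++i) {
        key ^= std::hash<char>{}(itemset[i]);
    }
    return key;
}

// Task table: one bucket per key; each bucket holds the tasks of one cluster.
using TaskTable = std::unordered_map<std::size_t, std::vector<std::string>>;

// Place a task (identified here by its itemset) into its cluster's bucket.
void place(TaskTable& table, const std::string& itemset) {
    table[cluster_key(itemset)].push_back(itemset);
}
```

AB, AC and AD all map to Hash(A), and ABC and ABD both map to Hash(A) xor Hash(B), so sibling tasks that read the same prefix data end up in the same bucket. XOR is order-independent, so distinct prefixes can share a key; that affects only how tasks are grouped, not correctness.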

13 Clustered Scheduling Policy
[Figure: hash tables of Thd 1 and Thd 2; tasks AB, AC, AD are in the bucket keyed Hash(A), and tasks ABC, ABD are in the bucket keyed Hash(A) xor Hash(B).]

14 Clustered Scheduling Policy
Steal an entire bucket of tasks.
[Figure: the Thd 1 hash table holds the bucket Hash(A) with AB, AC, AD; the Thd 2 hash table holds the bucket Hash(A) xor Hash(B) with ABC, ABD.]
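Stealing a whole bucket, rather than a single task, can be sketched as below. `ClusteredQueue` and its member names are assumptions made for this example; which bucket a thief picks is left arbitrary here, and a real policy might choose differently.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical per-thread task table supporting bucket-granularity theft.
class ClusteredQueue {
public:
    // Owner: file a task under its cluster key.
    void put(std::size_t key, std::string task) {
        std::lock_guard<std::mutex> lock(m_);
        buckets_[key].push_back(std::move(task));
    }

    // Thief: remove one entire bucket and return all of its tasks at once.
    std::vector<std::string> steal_bucket() {
        std::lock_guard<std::mutex> lock(m_);
        if (buckets_.empty()) return {};
        auto it = buckets_.begin();  // arbitrary bucket
        std::vector<std::string> tasks = std::move(it->second);
        buckets_.erase(it);
        return tasks;
    }

    std::size_t bucket_count() {
        std::lock_guard<std::mutex> lock(m_);
        return buckets_.size();
    }

private:
    std::mutex m_;
    std::unordered_map<std::size_t, std::vector<std::string>> buckets_;
};
```

One synchronised operation now transfers a whole cluster of related tasks, so the stolen tasks keep their shared-prefix locality on the thief's side.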

15 Where does PFunc fit in?
- Customizable task scheduling and priorities.
- Cilk-style, LIFO, FIFO and priority-based scheduling are built in.
- Custom scheduling policies are simple to implement, e.g. the clustered scheduling policy.
- The policy is chosen at compile time, much like STL containers (e.g. std::vector).

namespace pfunc {
  struct hashS : public schedS {};
  template <…> struct scheduler { … };
} // namespace pfunc
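Compile-time selection of a scheduling policy is commonly done with tag types and template specialisation. The sketch below shows that general technique only; it is not PFunc's actual interface, and the namespace `sketch`, the tags `lifoS` and `fifoS`, and the `put`/`get` members are hypothetical.

```cpp
#include <cassert>
#include <deque>

namespace sketch {

// Empty tag types naming the available policies (hypothetical).
struct lifoS {};
struct fifoS {};

// Primary template: declared only; each policy supplies a specialisation.
template <typename Policy, typename Task>
struct scheduler;

// LIFO policy: the most recently added task runs first.
template <typename Task>
struct scheduler<lifoS, Task> {
    std::deque<Task> queue;
    void put(const Task& t) { queue.push_back(t); }
    Task get() {
        Task t = queue.back();
        queue.pop_back();
        return t;
    }
};

// FIFO policy: the oldest task runs first.
template <typename Task>
struct scheduler<fifoS, Task> {
    std::deque<Task> queue;
    void put(const Task& t) { queue.push_back(t); }
    Task get() {
        Task t = queue.front();
        queue.pop_front();
        return t;
    }
};

}  // namespace sketch
```

A new policy is added by introducing another tag and another specialisation; user code picks one by naming the tag as a template argument, so no run-time dispatch is involved.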

16 So, how does it work?
1. Select the scheduling policy and priority: hash table-based scheduling, with a reference to the itemset as the priority.
2. Program: Task T; SetPriority (T, ref (ABD)); Spawn (T);
3. Scheduler: GetPriority (T) - ABC; generate the hash key Hash(A) xor Hash(B); place the task in the task queue.
[Figure: task queue with one bucket holding ABC, ABD and another holding BCD, BCE.]

17 Performance Analysis 8 Threads

18 Performance Analysis - IPC (8 threads; higher is better)
Dataset       Support   IPC (Cilk)   IPC (Clustered)
accidents     0.25      0.595        0.604
chess         0.6       0.560        0.669
connect       0.8       0.543        0.809
kosarak       0.0013    0.692        0.717
pumsb         0.75      0.494        0.719
pumsb_star    0.3       0.527        0.698
mushroom      0.10      0.570        0.705
T40I10D100K   0.005     0.627        0.727
T10I4D100K    0.00006   0.556        0.716

19 Performance Analysis – L1 DTLB Misses (8 threads; lower is better)
Dataset       Support   Cilk DTLB L1M/L2H   Clustered DTLB L1M/L2H
accidents     0.25      0.000048            0.000046
chess         0.6       0.000797            0.000242
connect       0.8       0.000249            0.000112
kosarak       0.0013    0.000400            0.000185
pumsb         0.75      0.000230            0.000114
pumsb_star    0.3       0.000315            0.000145
mushroom      0.10      0.000477            0.000267
T40I10D100K   0.005     0.000368            0.000305
T10I4D100K    0.00006   0.000218            0.000144

20 Performance Analysis – L2 DTLB Misses (8 threads; lower is better)
Dataset       Support   Cilk DTLB L1M/L2M   Clustered DTLB L1M/L2M
accidents     0.25      0.000161            0.000110
chess         0.6       0.001006            0.000032
connect       0.8       0.001204            0.000141
kosarak       0.0013    0.000659            0.000123
pumsb         0.75      0.001276            0.000126
pumsb_star    0.3       0.001082            0.000114
mushroom      0.10      0.000950            0.000022
T40I10D100K   0.005     0.000900            0.000021
T10I4D100K    0.00006   0.000876            0.000044

21 Conclusions
- For task-parallel FPM, clustered scheduling outperforms Cilk-style scheduling.
  - Exploits data locality.
  - Better work-stealing policy.
- PFunc provides support for facile customizations.
  - Task scheduling policy, task priorities, etc.
- Future work.
  - Task queues based on multi-dimensional index structures.
  - K-d trees.

