Presentation is loading. Please wait.

Presentation is loading. Please wait.

Parallel Computing CSCI 201L Jeffrey Miller, Ph.D. HTTP :// WWW - SCF. USC. EDU /~ CSCI 201 USC CSCI 201L.

Similar presentations


Presentation on theme: "Parallel Computing CSCI 201L Jeffrey Miller, Ph.D. HTTP :// WWW - SCF. USC. EDU /~ CSCI 201 USC CSCI 201L."— Presentation transcript:

1 Parallel Computing CSCI 201L Jeffrey Miller, Ph.D. HTTP :// WWW - SCF. USC. EDU /~ CSCI 201 USC CSCI 201L

2 Outline USC CSCI 201L2/18 ▪P▪Parallel Computing ▪P▪Programming

3 Parallel Computing ▪Parallel computing studies software systems where components located on networked components communicate through message passing ›Individual processes have only a partial knowledge of the problem ›Message passing can be accomplished through remote procedure calls (RPC) or message queues (shared memory) ›Parallel computing is a term also used for programs with multiple pieces running within multiple processors or cores in the same computer USC CSCI 201L3/18 Parallel Computing

4 Fork/Join Framework ▪The fork/join framework is an implementation of the ExecutorService interface that allows taking advantage of multiple processors ▪It is designed for work that can be broken into smaller pieces recursively, allowing all available processing power ▪We can create threads and add them to a thread pool ▪Worker threads that run out of things to do can steal tasks for other threads that are still busy (work-stealing algorithm) ▪The ForkJoinPool class implements the core work- stealing algorithm and can execute ForkJoinTask processes USC CSCI 201L4/18 Parallel Computing

5 Fork/Join Framework Structure ▪Pseudocode for the fork/join framework is: if (condition) do the work directly else split work into multiple pieces invoke the pieces and wait for results ▪The above code should be inside a child of ForkJoinTask ›RecursiveTask can return a result ›RecursiveAction does not return a result ▪Create the object that represents the work to be done and pass it to the invoke() or execute() method of a ForkJoinPool USC CSCI 201L5/18 Parallel Computing

6 ForkJoinTask Class ▪The ForkJoinTask class has a method compute() that should be overridden ▪The compute() method will contain the main computation performed by the task ▪The fork() method allows asynchronous execution (starts the thread and calls compute()) ▪The join() method will not proceed until the task’s compute() method has completed as a result of the call to fork() ▪The invoke() method on a ForkJoinPool will execute the compute() method asynchronously and return the value (through a fork() then a join() call) ▪The execute() method on a ForkJoinPool will execute the compute() method asynchronously but will not return any value (through a fork() call with no join()) USC CSCI 201L6/18 Parallel Computing

7 Running Time ▪Any time you fork a task, there is overhead in moving that to a new CPU, executing it, and then getting the response ▪Just because you are parallelizing code does not mean that you will have an improvement in execution speed ▪If you fork more threads than you have CPUs, the threads will execute in a concurrent manner (time- slicing) in each CPU USC CSCI 201L7/18 Parallel Computing

8 Not Parallel Fibonacci Example #1 1 public class FibonacciNoParallel { 2 public static void main(String [] args) { 3 int index = 30; 4 long before = System.currentTimeMillis(); 5 Fibonacci fib = new Fibonacci(index); 6 int num = fib.compute(); 7 long after = System.currentTimeMillis(); 8 System.out.println("time without parallelism = " + (after - before)); 9 System.out.println(index + "th Fibonacci number = " + num); 10 } 11 12 static class Fibonacci { 13 private int n; 14 Fibonacci(int n) { 15 this.n = n; 16 } 17 protected Integer compute() { 18 if (n == 0 || n == 1) { 19 return n; 20 } 21 Fibonacci f1 = new Fibonacci(n - 1); 22 Fibonacci f2 = new Fibonacci(n - 2); 23 this.n = f1.compute() + f2.compute(); 24 return this.n; 25 } 26 } 27 } USC CSCI 201L8/18 Parallel Computing time without parallelism = 109 30th Fibonacci number = 832040 time without parallelism = 125 30th Fibonacci number = 832040 time without parallelism = 94 30th Fibonacci number = 832040

9 Parallel Fibonacci Example #1 1 import java.util.concurrent.ForkJoinPool; 2 import java.util.concurrent.RecursiveTask; 3 4 public class FibonacciParallel1 { 5 public static void main(String [] args) { 6 int index = 30; 7 long before = System.currentTimeMillis(); 8 ForkJoinPool pool = new ForkJoinPool(); 9 Fibonacci fib = new Fibonacci(index); 10 pool.execute(fib); 11 pool.shutdown(); 12 long after = System.currentTimeMillis(); 13 System.out.println("time with parallelism = " + (after - before)); 14 System.out.println(index + "th Fibonacci number = " + fib.getNum()); 15 } 16 17 static class Fibonacci extends RecursiveTask { 18 private static final long serialVersionUID = 1; 19 private int n; 20 Fibonacci(int n) { 21 this.n = n; 22 } 23 public int getNum() { 24 return this.n; 25 } 26 protected Integer compute() { 27 if (n == 0 || n == 1) { 28 return n; 29 } 30 Fibonacci f1 = new Fibonacci(n - 1); 31 Fibonacci f2 = new Fibonacci(n - 2); 32 f1.fork(); 33 f2.fork(); 34 this.n = f2.join() + f1.join(); 35 return this.n; 36 } 37 } 38 } USC CSCI 201L9/18 Parallel Computing time with parallelism = 16 30th Fibonacci number = 30 time with parallelism = 15 30th Fibonacci number = 30 time with parallelism = 16 30th Fibonacci number = 30

10 Parallel Fibonacci Example #2 1 import java.util.concurrent.ForkJoinPool; 2 import java.util.concurrent.RecursiveTask; 3 4 public class FibonacciParallel2 { 5 public static void main(String [] args) { 6 int index = 30; 7 long before = System.currentTimeMillis(); 8 ForkJoinPool pool = new ForkJoinPool(); 9 Fibonacci fib = new Fibonacci(index); 10 pool.execute(fib); 11 pool.shutdown(); 12 while (pool.getActiveThreadCount() > 0) { 13 } 14 long after = System.currentTimeMillis(); 15 System.out.println("time with parallelism = " + (after - before)); 16 System.out.println(index + "th Fibonacci number = " + fib.getNum()); 17 } 18 19 static class Fibonacci extends RecursiveTask { 20 private static final long serialVersionUID = 1; 21 private int n; 22 Fibonacci(int n) { 23 this.n = n; 24 } 25 public int getNum() { 26 return this.n; 27 } 28 protected Integer compute() { 29 if (n == 0 || n == 1) { 30 return n; 31 } 32 Fibonacci f1 = new Fibonacci(n - 1); 33 Fibonacci f2 = new Fibonacci(n - 2); 34 f1.fork(); 35 f2.fork(); 36 this.n = f2.join() + f1.join(); 37 return this.n; 38 } 39 } 40 } USC CSCI 201L10/18 Parallel Computing time with parallelism = 535 30th Fibonacci number = 832040 time with parallelism = 969 30th Fibonacci number = 832040 time with parallelism = 766 30th Fibonacci number = 832040

11 Parallel Fibonacci Example #3 1 import java.util.concurrent.ForkJoinPool; 2 import java.util.concurrent.RecursiveTask; 3 4 public class FibonacciParallel3 { 5 public static void main(String [] args) { 6 int index = 30; 7 long before = System.currentTimeMillis(); 8 ForkJoinPool pool = new ForkJoinPool(); 9 Fibonacci fib = new Fibonacci(index); 10 int num = pool.invoke(fib); 11 pool.shutdown(); 12 long after = System.currentTimeMillis(); 13 System.out.println("time with parallelism = " + (after - before)); 14 System.out.println(index + "th Fibonacci number = " + num); 15 } 16 17 static class Fibonacci extends RecursiveTask { 18 private static final long serialVersionUID = 1; 19 private int n; 20 Fibonacci(int n) { 21 this.n = n; 22 } 23 public int getNum() { 24 return this.n; 25 } 26 protected Integer compute() { 27 if (n == 0 || n == 1) { 28 return n; 29 } 30 Fibonacci f1 = new Fibonacci(n - 1); 31 Fibonacci f2 = new Fibonacci(n - 2); 32 f1.fork(); 33 f2.fork(); 34 this.n = f2.join() + f1.join(); 35 return this.n; 36 } 37 } 38 } USC CSCI 201L11/18 Parallel Computing time with parallelism = 763 30th Fibonacci number = 832040 time with parallelism = 861 30th Fibonacci number = 832040 time with parallelism = 609 30th Fibonacci number = 832040

12 Parallel Fibonacci Example #4 1 import java.util.concurrent.ForkJoinPool; 2 import java.util.concurrent.RecursiveTask; 3 4 public class FibonacciParallel4 { 5 public static void main(String [] args) { 6 int index = 30; 7 int poolSize = Runtime.getRuntime().availableProcessors(); 8 for (int i=1; i <= poolSize; i++) { 9 long before = System.currentTimeMillis(); 10 ForkJoinPool pool = new ForkJoinPool(i); 11 Fibonacci fib = new Fibonacci(index); 12 int num = pool.invoke(fib); 13 pool.shutdown(); 14 long after = System.currentTimeMillis(); 15 System.out.println("time with parallelism " + i + " = " + (after - before)); 16 System.out.println(index + "th Fibonacci number = " + num); 17 } 18 } 19 20 static class Fibonacci extends RecursiveTask { 21 private static final long serialVersionUID = 1; 22 private int n; 23 Fibonacci(int n) { 24 this.n = n; 25 } 26 public int getNum() { 27 return this.n; 28 } 29 protected Integer compute() { 30 if (n == 0 || n == 1) { 31 return n; 32 } 33 Fibonacci f1 = new Fibonacci(n - 1); 34 Fibonacci f2 = new Fibonacci(n - 2); 35 f1.fork(); 36 f2.fork(); 37 this.n = f2.join() + f1.join(); 38 return this.n; 39 } 40 } 41 } USC CSCI 201L12/18 Parallel Computing time with parallelism 1 = 820 30th Fibonacci number = 832040 time with parallelism 2 = 225 30th Fibonacci number = 832040 time with parallelism 3 = 475 30th Fibonacci number = 832040 time with parallelism 4 = 267 30th Fibonacci number = 832040 time with parallelism 1 = 844 30th Fibonacci number = 832040 time with parallelism 2 = 251 30th Fibonacci number = 832040 time with parallelism 3 = 423 30th Fibonacci number = 832040 time with parallelism 4 = 166 30th Fibonacci number = 832040 time with parallelism 1 = 875 30th Fibonacci number = 832040 time with parallelism 2 = 200 30th Fibonacci number = 832040 time with parallelism 3 = 556 30th Fibonacci number = 832040 time with parallelism 4 = 394 30th Fibonacci number = 832040

13 Parallel Fibonacci Example #5 1 import java.util.concurrent.ForkJoinPool; 2 import java.util.concurrent.RecursiveTask; 3 4 public class FibonacciParallel5 { 5 public static void main(String [] args) { 6 int index = 30; 7 int poolSize = Runtime.getRuntime().availableProcessors(); 8 for (int i=poolSize; i >= 1; i--) { 9 long before = System.currentTimeMillis(); 10 ForkJoinPool pool = new ForkJoinPool(i); 11 Fibonacci fib = new Fibonacci(index); 12 int num = pool.invoke(fib); 13 pool.shutdown(); 14 long after = System.currentTimeMillis(); 15 System.out.println("time with parallelism " + i + " = " + (after - before)); 16 System.out.println(index + "th Fibonacci number = " + num); 17 } 18 } 19 20 static class Fibonacci extends RecursiveTask { 21 private static final long serialVersionUID = 1; 22 private int n; 23 Fibonacci(int n) { 24 this.n = n; 25 } 26 public int getNum() { 27 return this.n; 28 } 29 protected Integer compute() { 30 if (n == 0 || n == 1) { 31 return n; 32 } 33 Fibonacci f1 = new Fibonacci(n - 1); 34 Fibonacci f2 = new Fibonacci(n - 2); 35 f1.fork(); 36 f2.fork(); 37 this.n = f2.join() + f1.join(); 38 return this.n; 39 } 40 } 41 } USC CSCI 201L13/18 Parallel Computing time with parallelism 4 = 986 30th Fibonacci number = 832040 time with parallelism 3 = 210 30th Fibonacci number = 832040 time with parallelism 2 = 171 30th Fibonacci number = 832040 time with parallelism 1 = 344 30th Fibonacci number = 832040 time with parallelism 4 = 770 30th Fibonacci number = 832040 time with parallelism 3 = 245 30th Fibonacci number = 832040 time with parallelism 2 = 166 30th Fibonacci number = 832040 time with parallelism 1 = 288 30th Fibonacci number = 832040 time with parallelism 4 = 996 30th Fibonacci number = 832040 time with parallelism 3 = 164 30th Fibonacci number = 832040 time with parallelism 2 = 153 30th Fibonacci number = 832040 time with parallelism 1 = 313 30th Fibonacci number = 832040

14 Parallel Fibonacci Example #6 1 import java.util.concurrent.ForkJoinPool; 2 import java.util.concurrent.RecursiveTask; 3 4 public class FibonacciParallel6 { 5 public static void main(String [] args) { 6 int index = 30; 7 int poolSize = Runtime.getRuntime().availableProcessors(); 8 for (int i=5*poolSize; i >= 1; i--) { 9 long totalTime = 0; 10 int num = 0; 11 for (int j=0; j < 20; j++) { 12 long before = System.nanoTime(); 13 ForkJoinPool pool = new ForkJoinPool(i); 14 Fibonacci fib = new Fibonacci(index); 15 num = pool.invoke(fib); 16 pool.shutdown(); 17 long after = System.nanoTime(); 18 totalTime += (after - before); 19 } 20 long averageTotalTime = totalTime / 20; 21 System.out.println("time with parallelism " + i + " = " + averageTotalTime); 22 } 23 } 24 25 static class Fibonacci extends RecursiveTask { 26 private static final long serialVersionUID = 1; 27 private int n; 28 Fibonacci(int n) { 29 this.n = n; 30 } 31 public int getNum() { 32 return this.n; 33 } 34 protected Integer compute() { 35 if (n == 0 || n == 1) { 36 return n; 37 } 38 Fibonacci f1 = new Fibonacci(n - 1); 39 Fibonacci f2 = new Fibonacci(n - 2); 40 f1.fork(); 41 f2.fork(); 42 this.n = f2.join() + f1.join(); 43 return this.n; 44 } 45 } 46 } USC CSCI 201L14/18 Parallel Computing time with parallelism 1 = 820 30th Fibonacci number = 832040 time with parallelism 2 = 225 30th Fibonacci number = 832040 time with parallelism 3 = 475 30th Fibonacci number = 832040 time with parallelism 4 = 267 30th Fibonacci number = 832040 time with parallelism 20 = 218 time with parallelism 19 = 74 time with parallelism 18 = 81 time with parallelism 17 = 66 time with parallelism 16 = 85 time with parallelism 15 = 53 time with parallelism 14 = 69 time with parallelism 13 = 62 time with parallelism 12 = 53 time with parallelism 11 = 54 time with parallelism 10 = 51 time with parallelism 9 = 49 time with parallelism 8 = 42 time with parallelism 7 = 50 time with parallelism 6 = 46 time with parallelism 5 = 37 time with parallelism 4 = 37 time with parallelism 3 = 37 time with parallelism 2 = 71 time with parallelism 1 = 311

15 Parallel Fibonacci Example #7 1 import java.util.concurrent.ForkJoinPool; 2 import java.util.concurrent.RecursiveTask; 3 4 public class FibonacciParallel6 { 5 public static void main(String [] args) { 6 int index = 30; 7 int poolSize = Runtime.getRuntime().availableProcessors(); 8 for (int i=5*poolSize; i >= 1; i--) { 9 long totalTime = 0; 10 int num = 0; 11 for (int j=0; j < 20; j++) { 12 long before = System.nanoTime(); 13 ForkJoinPool pool = new ForkJoinPool(i); 14 Fibonacci fib = new Fibonacci(index); 15 num = pool.invoke(fib); 16 pool.shutdown(); 17 long after = System.nanoTime(); 18 totalTime += (after - before); 19 } 20 long averageTotalTime = totalTime / 20; 21 System.out.println("time with parallelism " + i + " = " + averageTotalTime); 22 } 23 } 24 25 static class Fibonacci extends RecursiveTask { 26 private static final long serialVersionUID = 1; 27 private int n; 28 Fibonacci(int n) { 29 this.n = n; 30 } 31 public int getNum() { 32 return this.n; 33 } 34 protected Integer compute() { 35 if (n == 0 || n == 1) { 36 return n; 37 } 38 Fibonacci f1 = new Fibonacci(n - 1); 39 Fibonacci f2 = new Fibonacci(n - 2); 40 this.n = f2.compute() + f1.compute(); 41 return this.n; 42 } 43 } 44 } USC CSCI 201L15/18 Parallel Computing time with parallelism 20 = 74 time with parallelism 19 = 62 time with parallelism 18 = 53 time with parallelism 17 = 59 time with parallelism 16 = 60 time with parallelism 15 = 60 time with parallelism 14 = 60 time with parallelism 13 = 62 time with parallelism 12 = 59 time with parallelism 11 = 63 time with parallelism 10 = 59 time with parallelism 9 = 58 time with parallelism 8 = 65 time with parallelism 7 = 57 time with parallelism 6 = 55 time with parallelism 5 = 53 time with parallelism 4 = 58 time with parallelism 3 = 62 time with parallelism 2 = 60 time with parallelism 1 = 60

16 Parallel Merge Sort Example 1 import java.util.concurrent.ForkJoinPool; 2 import java.util.concurrent.RecursiveAction; 3 public class ParallelMergeSort { 4 public static void main(String[] args) { 5 int SIZE = 2_000_000; 6 int[] list = new int[SIZE]; 7 8 for (int i = 0; i < SIZE; i++) { 9 list[i] = (int)(Math.random() * Integer.MAX_VALUE); 10 } 11 int numProcessors = Runtime.getRuntime().availableProcessors(); 12 System.out.println("num processors: " + numProcessors); 13 14 // timing[i] : time to sort on i processors 15 long[] timing = new long[numProcessors*2+1]; 16 17 for (int i=1; i <= numProcessors * 2; i++) { 18 timing[i] = parallelMergeSort((int[])list.clone(), i); 19 System.out.println(i + " processors=" + timing[i] + " ms"); 20 } 21 } 22 23 public static long parallelMergeSort(int[] list, int proc) { 24 long before = System.currentTimeMillis(); 25 ForkJoinPool pool = new ForkJoinPool(proc); 26 pool.invoke(new SortTask(list)); 27 pool.shutdown(); 28 while (!pool.isTerminated()) { 29 Thread.yield(); 30 } 31 long after = System.currentTimeMillis(); 32 long time = after - before; 33 return time; 34 } USC CSCI 201L16/18 Parallel Computing 35 private static class SortTask extends RecursiveAction { 36 public static final long serialVersionUID = 1; 37 private int[] list; 38 SortTask(int[] list) { 39 this.list = list; 40 } 41 42 protected void compute() { 43 if (list.length < 2) return; // base case 44 // split into halves 45 int[] firstHalf = new int[list.length / 2]; 46 System.arraycopy(list, 0, firstHalf, 0, list.length / 2); 47 int secondLength = list.length - list.length / 2; 48 int[] secondHalf = new int[secondLength]; 49 System.arraycopy(list, list.length / 2, secondHalf, 0, secondLength); 50 51 // recursively sort the two halves 52 new SortTask(firstHalf).invoke(); 53 new SortTask(secondHalf).invoke(); 54 // merge halves together 55 merge(firstHalf, secondHalf, list); 56 } 57 } 58 59 public static void merge(int[] list1, int[] list2, int[] merged) { 60 int i1 = 0, i2 = 0, i3 = 0; // index in list1, list2, out 61 while (i1 < list1.length && i2 < list2.length) { 62 merged[i3++] = (list1[i1] < list2[i2]) ? list1[i1++] : list2[i2++]; 63 } 64 // any trailing ends 65 while (i1 < list1.length) { 66 merged[i3++] = list1[i1++]; 67 } 68 while (i2 < list2.length) { 69 merged[i3++] = list2[i2++]; 70 } 71 } 72 } num processors: 4 1 processors=2213 ms 2 processors=2345 ms 3 processors=2204 ms 4 processors=2234 ms 5 processors=2001 ms 6 processors=1812 ms 7 processors=1750 ms 8 processors=2531 ms num processors: 4 1 processors=2360 ms 2 processors=2126 ms 3 processors=1688 ms 4 processors=2422 ms 5 processors=1844 ms 6 processors=1875 ms 7 processors=1641 ms 8 processors=2329 ms

17 Outline USC CSCI 201L17/18 ▪P▪Parallel Computing ▪P▪Program

18 Program ▪Modify the ParallelMergeSort code to sort with merge sort without threads. Does it execute faster than the ParallelMergeSort? ▪Does the input size matter for relative run times? USC CSCI 201L18/18 Parallel Computing


Download ppt "Parallel Computing CSCI 201L Jeffrey Miller, Ph.D. HTTP :// WWW - SCF. USC. EDU /~ CSCI 201 USC CSCI 201L."

Similar presentations


Ads by Google