Parallelizing C Programs Using Cilk Mahdi Javadi.


1 Parallelizing C Programs Using Cilk Mahdi Javadi

2 Cilk Language
Cilk is a language for multithreaded parallel programming based on C. The programmer does not need to worry about scheduling the computation to run efficiently; the Cilk runtime system takes care of that. There are three additional keywords: cilk, spawn and sync.

3 Example: Fibonacci
Sequential C version:

    int fib (int n) {
        int x, y;
        if (n < 2) return n;
        x = fib (n-1);
        y = fib (n-2);
        return x+y;
    }

Cilk version:

    cilk int fib (int n) {
        int x, y;
        if (n < 2) return n;
        x = spawn fib (n-1);
        y = spawn fib (n-2);
        sync;
        return x+y;
    }

4 Performance Measures
$T_P$ = execution time on $P$ processors.
$T_1$ is called the work.
$T_\infty$ is called the span.
Obvious lower bounds: $T_P \ge T_1/P$ and $T_P \ge T_\infty$.
$p = T_1/T_\infty$ is called the parallelism. Using more than $p$ processors makes little sense, because the span bound caps the speedup $T_1/T_P$ at $p$.
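As a rough illustration of these measures, the C sketch below counts work and span for the call tree of the recursive Fibonacci function. It assumes a unit-cost model that is not in the slides: every call to fib costs one time unit and spawn/sync are free. The names fib_work and fib_span are hypothetical.

```c
#include <assert.h>

/* Work T_1: total number of fib calls, assuming one time unit per call. */
long fib_work(int n) {
    assert(n >= 0);
    if (n < 2) return 1;
    return fib_work(n - 1) + fib_work(n - 2) + 1;
}

/* Span T_inf: longest chain of dependent calls. The two spawned children
   may run in parallel, so only the slower of the two contributes. */
long fib_span(int n) {
    assert(n >= 0);
    if (n < 2) return 1;
    long a = fib_span(n - 1);
    long b = fib_span(n - 2);
    return (a > b ? a : b) + 1;
}
```

Under this cost model the ratio fib_work(30) / fib_span(30) comes out at roughly 9 × 10^4, so the work grows exponentially in n while the span grows only linearly.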

5 Cilk Compiler
The file extension should be “.cilk”. Example:
    > cilkc -O3 fib.cilk -o fib
To find the 30th Fibonacci number using 4 CPUs:
    > fib --nproc 4 30
To collect timings of each processor and compute the span (not efficient):
    > cilkc -cilk-profile -cilk-span -O3 fib.cilk -o fib

6 Example: Matrix Multiplication
Suppose we want to multiply two n by n matrices. We can formulate the problem recursively:

$$
\begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}
= \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}
\cdot \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}
= \begin{pmatrix} A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} \\ A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22} \end{pmatrix}
$$

i.e. one n by n matrix multiplication reduces to 8 multiplications and 4 additions of (n/2) by (n/2) submatrices.

7 Multiplication Procedure

    Mult(C, A, B, n)
      if (n = 1)
        C[1,1] = A[1,1]·B[1,1]
      else {
        spawn Mult(C11, A11, B11, n/2);
        …
        spawn Mult(C22, A21, B12, n/2);
        spawn Mult(T11, A12, B21, n/2);
        …
        spawn Mult(T22, A22, B22, n/2);
        sync;
        Add(C, T, n);
      }

8 Addition Procedure

    Add(C, T, n)
      if (n = 1)
        C[1,1] = C[1,1] + T[1,1];
      else {
        spawn Add(C11, T11, n/2);
        …
        spawn Add(C22, T22, n/2);
        sync;
      }

$T_1$ (work) for addition = $O(n^2)$.
$T_\infty$ (span) for addition = $O(\log n)$.
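The Mult and Add procedures can be sketched in plain C as follows. This is only a serial version (the Cilk keywords are shown as comments), and it rests on assumptions that are not in the slides: matrices are stored row-major with an explicit row stride, n is a power of two, and the temporary T is allocated with malloc at each level. The names mult and add and their stride parameters are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* C += T on n-by-n blocks (row-major, row strides ldc and ldt). */
void add(double *C, const double *T, int n, int ldc, int ldt) {
    if (n == 1) {
        C[0] += T[0];
        return;
    }
    int h = n / 2;
    /* The four quadrant updates are independent: "spawn" each in Cilk. */
    add(C,               T,               h, ldc, ldt);
    add(C + h,           T + h,           h, ldc, ldt);
    add(C + h * ldc,     T + h * ldt,     h, ldc, ldt);
    add(C + h * ldc + h, T + h * ldt + h, h, ldc, ldt);
    /* "sync" would go here. */
}

/* C = A * B on n-by-n blocks; n must be a power of two. */
void mult(double *C, const double *A, const double *B,
          int n, int ldc, int lda, int ldb) {
    assert(n >= 1 && (n & (n - 1)) == 0);
    if (n == 1) {
        C[0] = A[0] * B[0];
        return;
    }
    int h = n / 2;
    /* Temporary n-by-n matrix with row stride n for the second products. */
    double *T = malloc((size_t)n * (size_t)n * sizeof *T);
    assert(T != NULL);

    /* Pointers to the top-left element of each quadrant. */
    const double *A11 = A, *A12 = A + h, *A21 = A + h * lda, *A22 = A21 + h;
    const double *B11 = B, *B12 = B + h, *B21 = B + h * ldb, *B22 = B21 + h;
    double *C11 = C, *C12 = C + h, *C21 = C + h * ldc, *C22 = C21 + h;
    double *T11 = T, *T12 = T + h, *T21 = T + h * n,   *T22 = T21 + h;

    /* Eight independent half-size products: "spawn" each in Cilk. */
    mult(C11, A11, B11, h, ldc, lda, ldb);
    mult(C12, A11, B12, h, ldc, lda, ldb);
    mult(C21, A21, B11, h, ldc, lda, ldb);
    mult(C22, A21, B12, h, ldc, lda, ldb);
    mult(T11, A12, B21, h, n, lda, ldb);
    mult(T12, A12, B22, h, n, lda, ldb);
    mult(T21, A22, B21, h, n, lda, ldb);
    mult(T22, A22, B22, h, n, lda, ldb);
    /* "sync" here, then combine the two partial results. */
    add(C, T, n, ldc, n);
    free(T);
}
```

For two 4 by 4 matrices stored contiguously, a call would look like mult(C, A, B, 4, 4, 4, 4). Recursing all the way down to n = 1 keeps the sketch close to the pseudocode; a practical version would likely switch to an ordinary triple loop below some block size.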

9 Complexity of Multiplication
We know that matrix multiplication is $O(n^3)$, hence $T_1$ (work) for multiplication = $O(n^3)$.
$T_\infty$ (span): $M_\infty(n) = M_\infty(n/2) + O(\log n) = O(\log^2 n)$.
$p = T_1/T_\infty = O(n^3/\log^2 n)$.
To multiply two 1000 by 1000 matrices: $p \approx 10^7$ (a lot of CPUs!).
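To make the estimate concrete, the span recurrence can be evaluated numerically. The C sketch below assumes that every hidden constant equals 1, that M(1) = 1, and that n is a power of two; none of these choices come from the slides, and the function names are hypothetical.

```c
#include <assert.h>

/* floor(log2(n)) for n >= 1, computed without floating point. */
int ilog2(long long n) {
    assert(n >= 1);
    int k = 0;
    while (n > 1) {
        n /= 2;
        k++;
    }
    return k;
}

/* Span of Mult under the recurrence M(n) = M(n/2) + lg(n), M(1) = 1.
   The lg(n) term stands for the span of the final Add. */
long long mult_span(long long n) {
    assert(n >= 1 && (n & (n - 1)) == 0);
    if (n == 1) return 1;
    return mult_span(n / 2) + ilog2(n);
}

/* Work of Mult counted as n^3 scalar multiplications. */
long long mult_work(long long n) {
    return n * n * n;
}
```

With these unit constants, n = 1024 gives a span of 56 and a work of 2^30, so the parallelism is about 1.9 × 10^7, the same order of magnitude as the figure quoted for 1000 by 1000 matrices.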

10 Discrete Fourier Transform
Sequential version:

    DFT(n,w,p,…)
      ...
      t = w^2 mod p
      DFT(n/2,t,p,…);
      …
      w1 = 1;
      for (i = 0; i < n/2; i++) {
        …
        a[i] = …
        w1 = w1·w mod p;
      }

Cilk version:

    cilk DFT(n,w,p,…)
      ...
      t = w^2 mod p
      spawn DFT(n/2,t,p,…);
      sync;
      …
      spawn ParCom(n,a,p,1,…);

    cilk ParCom(n,a,p,m,…)
      if (n <= 512) …
      spawn ParCom(n/2,a,p,1,…);
      m' = m·w^(n/2) mod p;
      spawn ParCom(n/2,a+n/2,p,m',…);
      sync;
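The key idea in ParCom is that the running power of w, which the sequential loop carries from one iteration to the next, can be restarted in the middle of the range by multiplying by w^(n/2). The C sketch below illustrates only that idea on a simpler task, filling an array with m·w^i mod p. It is not the author's combine step, whose details are elided in the slides. The names fill_powers, mulmod, powmod and PAR_CUTOFF are hypothetical, and the code assumes p < 2^32 so that products fit in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

#define PAR_CUTOFF 512 /* below this size, fall back to the sequential loop */

/* (a * b) mod p, assuming p < 2^32 so the product fits in 64 bits. */
uint64_t mulmod(uint64_t a, uint64_t b, uint64_t p) {
    return (a % p) * (b % p) % p;
}

/* w^e mod p by repeated squaring. */
uint64_t powmod(uint64_t w, uint64_t e, uint64_t p) {
    uint64_t r = 1 % p;
    while (e > 0) {
        if (e & 1) r = mulmod(r, w, p);
        w = mulmod(w, w, p);
        e >>= 1;
    }
    return r;
}

/* Fill out[i] = m * w^i mod p for 0 <= i < n. */
void fill_powers(uint64_t *out, uint64_t n, uint64_t w, uint64_t p, uint64_t m) {
    assert(p > 1 && p < (1ULL << 32));
    m %= p;
    if (n <= PAR_CUTOFF) {
        /* Sequential base case: each power depends on the previous one. */
        for (uint64_t i = 0; i < n; i++) {
            out[i] = m;
            m = mulmod(m, w, p);
        }
        return;
    }
    uint64_t h = n / 2;
    /* Starting multiplier for the second half: m' = m * w^h mod p. */
    uint64_t m2 = mulmod(m, powmod(w, h, p), p);
    fill_powers(out, h, w, p, m);          /* "spawn" in Cilk */
    fill_powers(out + h, n - h, w, p, m2); /* "spawn" in Cilk */
    /* "sync" would go here. */
}
```

Because the two halves write disjoint parts of the array and each has its own starting multiplier, they no longer depend on each other, which is what makes spawning them safe.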

11 Complexity of ParCom
The sequential combining step does n/2 multiplications.
$T_\infty$ (span) for ParCom:
– $T_\infty(n) = T_\infty(n/2) + O(\log n)$, so $T_\infty(n) = O(\log^2 n)$.
– $p = O(n/\log^2 n)$.
We ran the FFT on “stan”, which has 4 CPUs. Exposing parallelism beyond 4 therefore brings no benefit, so we cut off the parallelism at some level of recursion to speed up the program.

12 Timings

    # processors   Par time (ms)   Speed up
    4              32837           3.77
    3              44315           2.79
    2              66262           1.87
    1              124006          0.998

Sequential FFT: 123789 (ms)
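The speed-up figures are the sequential time divided by the parallel time. A small C sketch of that arithmetic follows; the helper names speedup and efficiency are hypothetical, and efficiency (speed-up per processor) is an extra derived quantity that the slides do not report.

```c
#include <assert.h>

/* Speed-up: sequential time divided by parallel time. */
double speedup(double seq_ms, double par_ms) {
    assert(par_ms > 0.0);
    return seq_ms / par_ms;
}

/* Efficiency: speed-up divided by the number of processors used. */
double efficiency(double seq_ms, double par_ms, int nproc) {
    assert(nproc > 0);
    return speedup(seq_ms, par_ms) / nproc;
}
```

For the 4-processor run, speedup(123789, 32837) is about 3.77, which corresponds to an efficiency of about 0.94. The 1-processor speed-up is slightly below 1 because the parallel program run on one CPU (124006 ms) takes marginally longer than the sequential FFT.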

