Paraguin Compiler Examples.


1 Paraguin Compiler Examples

2 Examples
Matrix Addition (the complete program)
Traveling Salesman Problem (TSP)
Sobel Edge Detection

3 Matrix Addition The complete program

4 Matrix Addition (complete)
#define N 512

#ifdef PARAGUIN
typedef void* __builtin_va_list;
extern int MPI_COMM_WORLD;
extern int MPI_Barrier();
#endif

#include <stdio.h>
#include <math.h>
#include <sys/time.h>

print_results(char *prompt, float a[N][N]);

int main(int argc, char *argv[])
{
    int i, j;
    float a[N][N], b[N][N], c[N][N];
    char *usage = "Usage: %s file\n";
    FILE *fd;

5 Matrix Addition (complete)
    double elapsed_time;
    struct timeval tv1, tv2;

    if (argc < 2) {
        fprintf (stderr, usage, argv[0]);
        return -1;
    }

    if ((fd = fopen (argv[1], "r")) == NULL) {
        fprintf (stderr, "%s: Cannot open file %s for reading.\n",
                 argv[0], argv[1]);
        return -1;
    }

6 Matrix Addition (complete)
    // Read input from file for matrices a and b.
    // The I/O is not timed because this I/O needs
    // to be done regardless of whether this program
    // is run sequentially on one processor or in
    // parallel on many processors. Therefore, it is
    // irrelevant when considering speedup.
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            fscanf (fd, "%f", &a[i][j]);

    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            fscanf (fd, "%f", &b[i][j]);

7 Matrix Addition (complete)
#ifdef PARAGUIN
    ;
    #pragma paraguin begin_parallel

    // This barrier is here so that we can take a time stamp
    // once we know all processes are ready to go.
    MPI_Barrier(MPI_COMM_WORLD);

    #pragma paraguin end_parallel
#endif

    // Take a time stamp
    gettimeofday(&tv1, NULL);

    ;
    #pragma paraguin begin_parallel

    // Broadcast the input to all processors. This could be
    // faster if we used scatter, but Bcast is easy and scatter
    // is not implemented in Paraguin
    #pragma paraguin bcast a b

8 Matrix Addition (complete)
    // Parallelize the following loop nest assigning iterations
    // of the outermost loop (i) to different partitions.
    #pragma paraguin forall C p i j \
        0x0 -1  1 0x0 \
        0x0  1 -1 0x0

    // We need to gather all values c[i][j]. So we can just
    // use i,j => 0.
    #pragma paraguin gather 0x0 C i j \
        0x0 0x0 0x0

    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            c[i][j] = a[i][j] + b[i][j];
        }
    }

9 Matrix Addition (complete)
    ;
    #pragma paraguin end_parallel

    // Take a time stamp. This won't happen until after the master
    // process has gathered all the input from the other processes.
    gettimeofday(&tv2, NULL);

    // tv_usec is in microseconds, so divide by 1000000.0 to get seconds.
    elapsed_time = (tv2.tv_sec - tv1.tv_sec) +
                   ((tv2.tv_usec - tv1.tv_usec) / 1000000.0);

    printf ("elapsed_time=\t%lf (seconds)\n", elapsed_time);

    // print result
    print_results("C = ", c);
}

10 Matrix Addition (complete)
print_results(char *prompt, float a[N][N])
{
    int i, j;

    printf ("\n\n%s\n", prompt);
    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            printf(" %.2f", a[i][j]);
        }
        printf ("\n");
    }
    printf ("\n\n");
}

11 Matrix Addition
After compiling with the command:
    runparaguin matrixadd.c
This produces:
    matrixadd.out.c (source with MPI)
    matrixadd.out (compiled with mpicc)
(Demonstration)

12 Partitioning Reviewed
#pragma paraguin forall C p i j \
    0x0 -1  1 0x0 \
    0x0  1 -1 0x0
The expression above assigns each iteration of the i loop to its own partition (p = i).
We could also partition along the j loop:
    0x0 -1 0x0  1 \
    0x0  1 0x0 -1
Or we could use many other partitions.

13 Partitioning Reviewed
The partitioning is a system of inequalities written in matrix/vector form:
$$A\vec{x} \ge \vec{0}$$
where $A$ is a matrix, and $\vec{x}$ and $\vec{0}$ are vectors.

14 Partitioning Reviewed
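So the partition expressed in the pragma:
#pragma paraguin forall C p i j \
    0x0 -1  1 0x0 \
    0x0  1 -1 0x0
Represents the following:
$$\begin{bmatrix} 0 & -1 & 1 & 0 \\ 0 & 1 & -1 & 0 \end{bmatrix} \begin{bmatrix} C \\ p \\ i \\ j \end{bmatrix} \ge \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$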

15 Partitioning Reviewed
If we multiply this out, we get:
$$-p + i \ge 0, \qquad p - i \ge 0$$

16 Partitioning Reviewed
Now simplify:
$$p \le i, \qquad p \ge i \quad\Rightarrow\quad p = i$$

17 Partitioning Reviewed
#pragma paraguin forall C p i j \
    0x0 -1  1 0x0 \
    0x0  1 -1 0x0
[Figure: the i-j iteration space divided into partitions p=0 through p=11, one per value of i (p = i).]

18 Partitioning Reviewed
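So the partition expressed in the pragma:
#pragma paraguin forall C p i j \
    0x0 -1 0x0  1 \
    0x0  1 0x0 -1
Represents the following:
$$\begin{bmatrix} 0 & -1 & 0 & 1 \\ 0 & 1 & 0 & -1 \end{bmatrix} \begin{bmatrix} C \\ p \\ i \\ j \end{bmatrix} \ge \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$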

19 Partitioning Reviewed
If we multiply this out, we get:
$$-p + j \ge 0, \qquad p - j \ge 0$$

20 Partitioning Reviewed
Now simplify:
$$p \le j, \qquad p \ge j \quad\Rightarrow\quad p = j$$

21 Partitioning Reviewed
#pragma paraguin forall C p i j \
    0x0 -1 0x0  1 \
    0x0  1 0x0 -1
[Figure: the i-j iteration space divided into partitions p=0 through p=11, one per value of j (p = j).]

22 Partitioning Reviewed
Let’s say we want to partition using p = i + j.
We actually have to go the other direction: start from the equation and work back to the system of inequalities.

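23 Partitioning Reviewed
Split the equation p = i + j into two inequalities:
$$p \le i + j, \qquad p \ge i + j$$
Then move every term to the left-hand side:
$$-p + i + j \ge 0, \qquad p - i - j \ge 0$$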

24 Partitioning Reviewed
To write this as a pragma:
#pragma paraguin forall C p i j \
    0x0 -1  1  1 \
    0x0  1 -1 -1

25 Partitioning Reviewed
#pragma paraguin forall C p i j \
    0x0 -1  1  1 \
    0x0  1 -1 -1
[Figure: the i-j iteration space divided into diagonal partitions labeled p=1 through p=23, where p = i + j.]

26 Traveling Salesman Problem (TSP)

27 The Traveling Salesman Problem is simply to find the shortest circuit (Hamiltonian circuit) that visits every city in a set of cities exactly once

28 This problem falls into the class of “NP-hard” problems
What that means is that there is no known polynomial-time algorithm (one whose running time is “big-oh” of a polynomial) that can solve it.
The straightforward exact algorithm is to compare the distances of all possible Hamiltonian circuits.
But there are N! possible circuits of N cities.

29 Yes, heuristics can be applied to find a “good” solution fast, but there’s no guarantee it is the best.
The “brute force” algorithm is to consider all possible permutations of the N cities.
First we’ll fix the first city, since there are N equivalent circuits that differ only by rotating the cities.
We will count a circuit and its reverse as different circuits; they are equivalent, but that is hard to account for.

30 If we number the cities from 0 to N-1, and 0 is the origination city, then the possible permutations of 4 cities are:
0->1->2->3->0
0->1->3->2->0
0->2->3->1->0
0->2->1->3->0
0->3->1->2->0
0->3->2->1->0
Notice that some permutations are the reverse of others. These are equivalent permutations.
Since we are fixing the origination city, there are (N-1)! permutations.

31 We can compute the distances between all pairs of locations (O(N^2))
This is the input.
[Figure: City 0, City 1, City 2, and City 3.]

32 Problem: Iterating through the possible permutations is recursive, but we need a straightforward for loop to parallelize.
Solution: Use a for loop to assign the first two cities after city 0.
Since city 0 is fixed, there are n-1 choices for city 1 and n-2 choices for city 2.
That means there are (n-1)(n-2) = n^2 - 3n + 2 combinations of the first two cities.

33 Assignment of cities 0-2
N = n*n - 3*n + 2;   // (n-1)(n-2)
perm[0] = 0;
for (i = 0; i < N; i++) {
    perm[1] = i / (n-2) + 1;
    perm[2] = i % (n-2) + 1;
    ...

34
;
#pragma paraguin begin_parallel

perm[0] = 0;
minDist = -1.0;
if (n == 2) {
    perm[1] = 1;    // If n == 2, then N == 0,
                    // and we are done.
    minPerm[0] = perm[0];
    minPerm[1] = perm[1];
    minDist = computeDist(D, n, perm);
}

#pragma paraguin bcast n
#pragma paraguin bcast N
#pragma paraguin bcast D

35
#pragma paraguin forall C p N i \
    0x0 -1 0x0  1 \
    0x0  1 0x0 -1

for (i = 0; i < N; i++) {
    perm[1] = i / (n-2) + 1;
    perm[2] = i % (n-2) + 1;
    ...

36 Sobel Edge Detection

37 Sobel Edge Detection
Given an image, the problem is to detect where the “edges” are in the picture.

38 Sobel Edge Detection

39 Sobel Edge Detection Algorithm
/* 3x3 Sobel masks. */
GX[0][0] = -1; GX[0][1] = 0; GX[0][2] = 1;
GX[1][0] = -2; GX[1][1] = 0; GX[1][2] = 2;
GX[2][0] = -1; GX[2][1] = 0; GX[2][2] = 1;

GY[0][0] =  1; GY[0][1] =  2; GY[0][2] =  1;
GY[1][0] =  0; GY[1][1] =  0; GY[1][2] =  0;
GY[2][0] = -1; GY[2][1] = -2; GY[2][2] = -1;

// Pragmas go here (in front of the x loop)
for(x=0; x < N; ++x){
    for(y=0; y < N; ++y){
        sumx = 0;
        sumy = 0;

        // handle image boundaries
        if(x==0 || x==(h-1) || y==0 || y==(w-1))
            sum = 0;
        else{

40 Sobel Edge Detection Algorithm
            //x gradient approx
            for(i=-1; i<=1; i++)
                for(j=-1; j<=1; j++)
                    sumx += (grayImage[x+i][y+j] * GX[i+1][j+1]);

            //y gradient approx
            for(i=-1; i<=1; i++)
                for(j=-1; j<=1; j++)
                    sumy += (grayImage[x+i][y+j] * GY[i+1][j+1]);

            //gradient magnitude approx
            sum = (abs(sumx) + abs(sumy));
        }
        edgeImage[x][y] = clamp(sum);
    }
}

41 Sobel Edge Detection Algorithm
Inputs (that need to be broadcast or scattered):
GX and GY arrays
grayImage array
w and h (width and height)
There are 4 nested loops (x, y, i, and j).
The final answer is the array edgeImage.

42 Sobel Edge Detection Algorithm
We put these in front of that loop to parallelize it.
;
#pragma paraguin begin_parallel

These are the inputs:
#pragma paraguin bcast grayImage
#pragma paraguin bcast w
#pragma paraguin bcast h

Partition the x loop (outermost loop):
#pragma paraguin forall C p x y i j \
    0x0 -1  1 0x0 0x0 0x0 \
    0x0  1 -1 0x0 0x0 0x0

Gather all elements of the edgeImage array:
#pragma paraguin gather 4 C x y \
    0x0 0x0 0x0

