
1 CS4402 – Parallel Computing, Lecture 7: Parallel Graphics – More Fractals, Scheduling

2 FRACTALS

3 Fractals

A fractal is a set of points such that:
- it has infinite detail at every point (its fractal dimension strictly exceeds its topological dimension);
- it satisfies self-similarity: any part of the fractal is similar to the whole fractal.

Generating a fractal is an iterative process:
- start from P_0;
- iteratively generate P_1 = F(P_0), P_2 = F(P_1), ..., P_n = F(P_{n-1}), ...

P_0 is a set of initial points. F is a transformation:
- geometric transformations: translations, rotations, scaling, ...
- non-linear coordinate transformations.

4 Points vs Pixels

We work with two rectangular areas.

The user space:
- real coordinates (x, y);
- bounded by [xMin, xMax] x [yMin, yMax].

The screen space:
- integer coordinates (i, j);
- bounded by [0, w-1] x [0, h-1];
- is upside down, with the Oy axis pointing downward.

How do we squeeze the user space into the screen space, i.e. how do we translate (x, y) into (i, j)? A sketch follows.
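A minimal sketch of the translation, using the bound and size names from above (the function name is illustrative; the fractal loops later use the inverse direction, x = XMIN + i*STEP):

    // Map a user-space point (x, y) to a screen pixel (i, j).
    // The vertical axis is flipped because screen coordinates grow downward.
    void user_to_screen(double x, double y,
                        double xMin, double xMax, double yMin, double yMax,
                        int w, int h, int *i, int *j)
    {
        *i = (int)((x - xMin) / (xMax - xMin) * (w - 1));
        *j = (int)((yMax - y) / (yMax - yMin) * (h - 1));
    }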

5 Julia Sets – Self-Squaring Fractals

Consider the generating function F(z) = z^2 + c, with z, c ∈ C.

Sequence of complex numbers: z_0 ∈ C and z_{n+1} = z_n^2 + c.

The behaviour is chaotic, but |z_n| has two attractors: 0 and +∞.

For a given c ∈ C, the Julia set J_c consists of all the points whose orbit remains bounded.

6 Julia Sets – Algorithm

Inputs:
- c ∈ C, the complex number;
- [x_min, x_max] x [y_min, y_max], a region in the plane;
- N_iter, the number of iterations for orbits;
- R, a threshold for the attractor ∞.

Output: J_c, the Julia set of c.

Algorithm:
for each pixel (i, j) on the screen
    translate (i, j) into (x, y)
    construct the complex number z_0 = x + i*y
    find the orbit of z_0 [the first N_iter elements]
    if all the orbit points stay under the threshold, draw (x, y)
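The codes on the following slides call func and CompAbs on a Complex value without defining them. One possible definition (a sketch, not necessarily the exact types used in the course):

    #include <math.h>

    // A complex number and the two helpers assumed by the fractal loops:
    // func applies F(z) = z^2 + c, CompAbs returns |z|.
    typedef struct { double re, im; } Complex;

    Complex func(Complex z, Complex c)
    {
        Complex r;
        r.re = z.re * z.re - z.im * z.im + c.re;  // real part of z^2 + c
        r.im = 2.0 * z.re * z.im + c.im;          // imaginary part of z^2 + c
        return r;
    }

    double CompAbs(Complex z)
    {
        return sqrt(z.re * z.re + z.im * z.im);
    }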

7

for (i = 0; i < width; i++)
for (j = 0; j < width; j++) {
    int k = 0;

    // construct z from the pixel (i, j)
    z.re = XMIN + i*STEP;
    z.im = YMIN + j*STEP;

    // construct the orbit of z
    for (k = 0; k < NUMITER; k++) {
        z = func(z, c);
        if (CompAbs(z) > R) break;
    }

    // test if the orbit stayed under the threshold (no early break)
    if (k > NUMITER - 1) {
        MPE_Draw_point(graph, i, j, MPE_YELLOW);
        MPE_Update(graph);
    } else {
        MPE_Draw_point(graph, i, j, MPE_RED);
        MPE_Update(graph);
    }
}

8 Julia Sets – || Algorithm

Remark 1.
- The double for loop on (i, j) can be split across processors, e.g.:
  - uniform block or cyclic on i;
  - uniform block or cyclic on j.
- There is no communication at all between processors, therefore this is an embarrassingly parallel computation.

Remark 2.
- Each processor draws a block of the fractal, or several rows, on the XGraph.
- Processor rank knows the area it has to draw.

9

// uniform block on i:
for (i = rank*width/size; i < (rank+1)*width/size; i++)
for (j = 0; j < width; j++) {

// cyclic on i:
// for (i = rank; i < width; i += size) for (j = 0; j < width; j++) {

// uniform block on j:
// for (i = 0; i < width; i++) for (j = rank*width/size; j < (rank+1)*width/size; j++) {

// cyclic on j:
// for (i = 0; i < width; i++) for (j = rank; j < width; j += size) {

    int k = 0;

    // construct z from the pixel (i, j)
    z.re = XMIN + i*STEP;
    z.im = YMIN + j*STEP;

    // construct the orbit of z
    for (k = 0; k < NUMITER; k++) {
        z = func(z, c);
        if (CompAbs(z) > R) break;
    }

    // test if the orbit stayed under the threshold
    if (k > NUMITER - 1) {
        MPE_Draw_point(graph, i, j, MPE_YELLOW);
        MPE_Update(graph);
    } else {
        MPE_Draw_point(graph, i, j, MPE_RED);
        MPE_Update(graph);
    }
}


12 The Mandelbrot Set

THE MANDELBROT FRACTAL IS AN INDEX FOR JULIA FRACTALS.

The Mandelbrot set contains all the points c ∈ C such that the orbit z_0 = 0, z_{n+1} = z_n^2 + c remains bounded.

Inputs:
- [x_min, x_max] x [y_min, y_max], a region in the plane;
- N_iter, the number of iterations for orbits;
- R, a threshold for the attractor ∞.

Output: M, the Mandelbrot set.

Algorithm:
for each (x, y) in [x_min, x_max] x [y_min, y_max]
    c = x + i*y
    find the orbit of z_0 = 0 while under the threshold
    if all the orbit points stay under the threshold, draw c = (x, y)

13

for (i = 0; i < width; i++)
for (j = 0; j < width; j++) {
    int k = 0;

    // construct the point c from the pixel (i, j)
    c.re = XMIN + i*STEP;
    c.im = YMIN + j*STEP;

    // construct the orbit of 0
    z.re = z.im = 0;
    for (k = 0; k < NUMITER; k++) {
        z = func(z, c);
        if (CompAbs(z) > R) break;
    }

    // test if the orbit stayed under the threshold
    if (k > NUMITER - 1) {
        MPE_Draw_point(graph, i, j, MPE_YELLOW);
        MPE_Update(graph);
    } else {
        MPE_Draw_point(graph, i, j, MPE_RED);
        MPE_Update(graph);
    }
}

14 The Mandelbrot Set – || Algorithm

Remark 1.
- The double for loop on (i, j) can be split across processors, e.g.:
  - uniform block or cyclic on i;
  - uniform block or cyclic on j.
- There is no communication at all between processors, therefore this is an embarrassingly parallel computation.

Remark 2.
- When the orbit goes to infinity in k steps, we can draw the pixel (i, j) with the k-th colour from a palette, as sketched below.
- Bands coloured similarly contain points with the same escape behaviour.
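A sketch of the escape-time colouring from Remark 2, replacing the two-colour test in the loops above; the palette colors[] of NUM_COLORS MPE colours (e.g. prepared with MPE_Make_color_array) is an assumption:

    // Colour pixel (i, j) by the number of steps k its orbit took to escape.
    // Points that never escape (k reached NUMITER) get a fixed colour.
    void draw_escape(MPE_XGraph graph, int i, int j, int k)
    {
        if (k >= NUMITER)
            MPE_Draw_point(graph, i, j, MPE_BLACK);              // inside the set
        else
            MPE_Draw_point(graph, i, j, colors[k % NUM_COLORS]); // escape band k
    }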


16 Fractals and Prime Numbers

Prime numbers can generate fractals.

Remarks:
- If p > 5 is prime then p % 5 is 1, 2, 3 or 4.
- 1, 2, 3, 4 represent directions to move, e.g. left, right, up, down.
- The fractal image has sizes w and h.

Step 1. Initialise a matrix of colours with 0.
Step 2. For each number p > 5:
    if p is prime then
        if (p%5 == 1) x = (x-1) % w;
        if (p%5 == 2) x = (x+1) % w;
        if (p%5 == 3) y = (y-1) % h;
        if (p%5 == 4) y = (y+1) % h;
        increase the colour of (x, y)
Step 3. Draw the pixels with the colour matrix.

17 Simple Remarks

The set of prime numbers is infinite; furthermore, it has no pattern.

prime: 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, ...
move:  3, 0, 2, 1,  3,  2,  4,  3,  4,  1,  2, ...

The set of moves satisfies:
- it does not have any pattern, so the moves are quite random;
- the numbers of 1-, 2-, 3- and 4-moves are quite similar, hence the central pixels are reached more often.

The primality test inside the for loop is the most expensive operation (one possible test is sketched below).
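The code on the next slide calls isPrime without defining it. A simple trial-division version (a sketch; the lecture may use a different test):

    // Trial division up to sqrt(p). Slow but simple, which is exactly
    // why primality testing dominates the cost of the loop.
    int isPrime(long p)
    {
        long d;
        if (p < 2) return 0;
        if (p % 2 == 0) return p == 2;
        for (d = 3; d * d <= p; d += 2)
            if (p % d == 0) return 0;
        return 1;
    }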

18

// initialise the matrix with 0
for (i = 0; i < width; i++) for (j = 0; j < width; j++) map[i][j] = 0;

// start from the image centre
posX = posY = width/2;

// traverse the odd numbers and keep the primes
for (i = 0; i < n; i++) {
    if (isPrime(2*i+1)) {
        // move to a new position on the map and increment it;
        // the +width keeps the C % operator from returning a negative index
        move = (2*i+1) % 5;
        if (move == 1) posX = (posX-1+width) % width;
        if (move == 2) posX = (posX+1) % width;
        if (move == 3) posY = (posY-1+width) % width;
        if (move == 4) posY = (posY+1) % width;
        map[posY][posX]++;
    }
}

19 Parallel Computation: Simple Remarks

Processor rank gets some primes to test, using some partitioning.

Processor rank therefore traverses the pixels according to its own moves.

Processor rank has to work with its own matrix map.

The maps must be reduced on processor 0 to find the total number of hits.

20 Parallel Computation: Simple Remarks

The parallel computation of processor rank follows these steps:
1. Initialise the matrix map.
2. For each prime number assigned to rank:
   a. find the move and go to a new location;
   b. increment the map.
3. Reduce the matrix map.
4. If processor 0, then draw the map.

21 Splitting Loops

How do we split the sequential loop if we have size processors?

Maths: n iterations and size processors → n/size iterations per processor.

for (i = 0; i < n; i++) {
    // body of the loop
    loop_body(data, i);
}

22 Splitting Loops in Similar Blocks

Processor rank gets the iterations rank*n/size, rank*n/size + 1, ..., (rank+1)*n/size - 1.

for (i = rank*n/size; i < (rank+1)*n/size; i++) {
    // acquire the data for this iteration
    loop_body(data, i);
}
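When n is not divisible by size, the integer formulas above still cover every iteration exactly once. A worked example with n = 10 and size = 4:

    // n = 10, size = 4: each rank computes its own block with the same formula.
    int lo = rank * n / size;        // rank 0,1,2,3 -> lo = 0, 2, 5, 7
    int hi = (rank + 1) * n / size;  // rank 0,1,2,3 -> hi = 2, 5, 7, 10
    for (int i = lo; i < hi; i++)
        loop_body(data, i);          // blocks of 2, 3, 2, 3 iterations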

23 Splitting Loops in Cycles

Processor rank gets the iterations rank, rank+size, rank+2*size, ...

for (i = rank; i < n; i += size) {
    // acquire the data for this iteration
    loop_body(data, i);
}

24 Splitting Loops in Variable Blocks

Processor rank gets the iterations l[rank], l[rank]+1, ..., u[rank].

for (i = l[rank]; i <= u[rank]; i++) {
    // acquire the data for this iteration
    loop_body(data, i);
}

25

// initialise the local matrix with 0
// (map and globalMap are assumed to be long, to match MPI_LONG below)
for (i = 0; i < width; i++) for (j = 0; j < width; j++) map[i][j] = 0;

// start from the image centre
posX = posY = width/2;

// traverse this processor's block of the odd numbers
for (i = rank*n/size; i < (rank+1)*n/size; i++) {
    if (isPrime(p = 2*i+1)) {
        // move to a new position on the map and increment it
        move = p % 5;
        if (move == 1) posX = (posX-1+width) % width;
        if (move == 2) posX = (posX+1) % width;
        if (move == 3) posY = (posY-1+width) % width;
        if (move == 4) posY = (posY+1) % width;
        map[posY][posX]++;
    }
}

// sum all the local maps onto processor 0
MPI_Reduce(&map[0][0], &globalMap[0][0], width*width, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

if (rank == 0) {
    for (i = 0; i < width; i++) for (j = 0; j < width; j++)
        MPE_Draw_point(graph, i, j, colors[globalMap[i][j]]);
}


27 Scheduling

28 Parallel Loops

Parallel loops represent the main source of parallelism.

Consider a system with p processors P_1, P_2, ..., P_p and

for i = 1, n do
    call loop_body(i)
end for

Scheduling problem: map the iterations {1, 2, ..., n} onto the processors so that:
- the execution time is minimal;
- the execution times per processor are balanced;
- the processors' idle time is minimal.

29 Parallel Loops

Suppose that the workload of loop_body is known and given by w_1, w_2, ..., w_n.

If processor P_j executes the set of iterations S_j = {i_1, i_2, ..., i_k}, then:
- the execution time of processor P_j is T(P_j) = ∑ { w_i : i ∈ S_j };
- the execution time of the parallel loop is T = max { T(P_j) : j = 1, 2, ..., p }.

Static scheduling: the partition is found at compile time.
Dynamic scheduling: the partition is found at run time.

30 Data Dependency

A dependency exists between program statements when the order of statement execution affects the results of the program.

A data dependency results from multiple uses of the same location(s) in storage by different tasks: one statement's output is input for another statement.

Dependencies are important to parallel programming because they are one of the primary inhibitors of parallelism. Loops with data dependencies cannot be freely scheduled across processors.

Example: the following for loop contains a data dependency (but see the rewrite sketched below).

for i = 1, n do
    a[i] = a[i-1] + 1
end for
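For this particular loop the dependency can be removed by hand: unrolling the recurrence gives a[i] = a[0] + i, which any iteration can compute independently. A sketch of the resulting parallel loop, using the block splitting from slide 22:

    // Original (ordered):  for (i = 1; i <= n; i++) a[i] = a[i-1] + 1;
    // Closed form a[i] = a[0] + i depends only on a[0], so the
    // iterations 1..n can be split into blocks with no communication.
    for (i = 1 + rank*n/size; i <= (rank+1)*n/size; i++)
        a[i] = a[0] + i;

Not every dependency has such a closed form; in general a loop like this must stay sequential.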

31 Load Balancing

Load balancing refers to the practice of distributing work among processors so that all processors are kept busy all of the time.

If all the processor execution times are the same, then a perfect load balance is achieved.

Load imbalance is the most important overhead of parallel computation; it reflects the case when there is a difference between two processors' execution times.


34 Useful Rules

- If the workloads are similar, then use static uniform block scheduling.
- If the workloads increase or decrease, then use static cyclic scheduling.
- If the workloads are known and simple, then use them to guide the load balance (see the balanced blocks on the next slides).
- If the workloads are not known, then use dynamic methods (a sketch follows).
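A hedged sketch of one dynamic method, master/worker self-scheduling over MPI; the tags, the chunk of one iteration at a time, and the names n, data, loop_body are assumptions carried over from the earlier slides:

    // Processor 0 hands out one iteration index at a time; a worker asks
    // for work by sending its rank and receives either an index or a stop.
    #define TAG_WORK 1
    #define TAG_STOP 2

    if (rank == 0) {                       // master: only distributes work
        int next = 0, active = size - 1, idx;
        MPI_Status st;
        while (active > 0) {
            MPI_Recv(&idx, 1, MPI_INT, MPI_ANY_SOURCE, TAG_WORK, MPI_COMM_WORLD, &st);
            if (next < n) {
                MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK, MPI_COMM_WORLD);
                next++;
            } else {
                MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_STOP, MPI_COMM_WORLD);
                active--;                  // this worker is done
            }
        }
    } else {                               // worker: request, compute, repeat
        int idx;
        MPI_Status st;
        while (1) {
            MPI_Send(&rank, 1, MPI_INT, 0, TAG_WORK, MPI_COMM_WORLD);
            MPI_Recv(&idx, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_STOP) break;
            loop_body(data, idx);
        }
    }

Fast workers automatically receive more iterations, which is exactly what balances unknown workloads; the price is one request/reply message pair per iteration.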

35 Balanced Workload Block Scheduling

Let w_1, w_2, ..., w_n be the workloads of the iterations:
- the total workload is W = w_1 + w_2 + ... + w_n;
- the average workload per processor is W/p.

Each processor gets consecutive iterations:
- l_rank and u_rank are the lower and upper indices of the block;
- the block's workload is ∑_{i = l_rank}^{u_rank} w_i, which should be approximately W/p.

36 Balanced Workload Block Scheduling

It is simpler to work with integrals: treat the workload as a density w(x) on [a, b].

- The average workload per processor is (1/p) ∫_a^b w(x) dx.
- Each processor's block [l_rank, u_rank] is chosen so that ∫_{l_rank}^{u_rank} w(x) dx equals that average.
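A sketch of the discrete analogue: computing the block boundaries l[rank], u[rank] from known workloads via prefix sums (the function name is an assumption; the names l, u, w, n, p follow the slides):

    // Assign iteration i to the block whose share of the total workload
    // contains i's position in the prefix sums; blocks come out contiguous
    // because the running sum is monotone.
    void balanced_blocks(const double *w, int n, int p, int *l, int *u)
    {
        double total = 0.0, run = 0.0;
        int i, j;

        for (i = 0; i < n; i++) total += w[i];

        // start with empty blocks (l > u), then grow them
        for (j = 0; j < p; j++) { l[j] = n; u[j] = -1; }

        for (i = 0; i < n; i++) {
            double mid = run + w[i] / 2.0;   // midpoint of iteration i's weight
            run += w[i];
            j = (int)(mid * p / total);      // owning block
            if (j > p - 1) j = p - 1;
            if (i < l[j]) l[j] = i;
            if (i > u[j]) u[j] = i;
        }
    }

The resulting l[] and u[] feed directly into the variable-block loop of slide 24.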


42 Granularity

Granularity is the ratio of computation to communication. Periods of computation are typically separated from periods of communication by synchronisation events.

Fine-grain parallelism:
- relatively small amounts of computational work are done between communication events;
- facilitates load balancing;
- implies high communication overhead and less opportunity for performance enhancement.

Coarse-grain parallelism:
- relatively large amounts of computational work are done between communication/synchronisation events;
- harder to load balance efficiently.
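One concrete knob in the fractal codes above is how often MPE_Update flushes drawing to the display: updating after every pixel is fine-grain, batching it per row is coarser. A sketch (the per-row choice is illustrative):

    // Coarser-grain drawing: flush the X display once per row
    // instead of once per pixel, as the earlier loops did.
    for (i = 0; i < width; i++) {
        for (j = 0; j < width; j++) {
            // ... compute the orbit and choose a colour ...
            MPE_Draw_point(graph, i, j, MPE_YELLOW);
        }
        MPE_Update(graph);   // one update per row, not per point
    }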

