Operating Systems and Architectures CS-M98: Coursework Solution. Dr. Benjamin Mora, Swansea University.


1 Operating Systems and Architectures CS-M98: Coursework Solution
Dr. Benjamin Mora, Swansea University

2 Marking range
>97: Full understanding of problem and solution. Ready for employment in the HPC sector. None of you (some very close though)!
70 to 97: Almost there with multithreading. Just need to see and understand the solution. Most students are in this category.
50 to 70: Real issues with multithreading concepts, merging temporary results, and a few basic C errors. Some hard work is really needed to understand the full solution.
<50: Issues with basic (C) programming and algorithmic concepts, including pointers and creating data structures. Catching up is crucial!

3 Q1
Alignment of data. Similar to the lab exercise. See CPU part marks.

4 Q1
    void AoS_to_SoA (float *image, int x, int y)
    {
        imageRed=new float[x*y+PADDING];
        imageGreen=new float[x*y+PADDING];
        imageBlue=new float[x*y+PADDING];
        //Offset (in floats) of each buffer inside its 32-byte block.
        //The pointer itself is cast, not the value it points to.
        unsigned long long alignR=(((unsigned long long) imageRed)&31)/4;
        unsigned long long alignG=(((unsigned long long) imageGreen)&31)/4;
        unsigned long long alignB=(((unsigned long long) imageBlue)&31)/4;
        alignedRed=imageRed+8-alignR;
        alignedGreen=imageGreen+8-alignG;
        alignedBlue=imageBlue+8-alignB;
        float *R=alignedRed;
        float *G=alignedGreen;
        float *B=alignedBlue;
        for (int i=0;i
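The alignment arithmetic on this slide can be isolated in a small helper. The sketch below is only an illustration: `alignUp` and `isAligned` are hypothetical names that do not appear in the coursework, and the rounding differs slightly from the slide (it advances by 0 to 7 floats, whereas the slide's `+8-align` always advances by 1 to 8 floats). In both cases the buffer is assumed to be over-allocated, as with `PADDING` above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the slides): returns the first address at or
// after `p` that is a multiple of `alignment` bytes. `alignment` must be a
// power of two, and the caller must have over-allocated the buffer by at
// least `alignment - 1` bytes.
inline float* alignUp(float* p, std::size_t alignment = 32) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    const std::uintptr_t address = reinterpret_cast<std::uintptr_t>(p);
    const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
    return reinterpret_cast<float*>((address + mask) & ~mask);
}

// Hypothetical check: true when `p` sits on an `alignment`-byte boundary.
inline bool isAligned(const void* p, std::size_t alignment = 32) {
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}
```

With a 32-byte alignment the returned pointer can be reinterpreted as a pointer to 8 packed floats, which is what the SIMD slides rely on.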

5 Q2 Loop for k iterations
    for (int k=0;k

6 Q2 Then
    …
    //2. Determine and compute average of closer seeds
    for (int pixel=0;pixel

7 Q2 Recompute new seeds
    //Last step for the iteration: compute average and update the current seed list
    for (int seed=0;seed<N;seed++)
        if (seedCounters[seed]>0.01)
        {
            seeds[0][seed]=seedSums[0][seed]/seedCounters[seed];
            seeds[1][seed]=seedSums[1][seed]/seedCounters[seed];
            seeds[2][seed]=seedSums[2][seed]/seedCounters[seed];
        }
    …//End of iteration
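Because the loops on slides 5 to 7 are truncated in this transcript, a scalar reference for one full iteration may help when reading the SIMD version. This is a sketch under assumptions, not the coursework solution: `SeedList` and `updateSeedsOnce` are hypothetical names, and the distance is assumed to be the squared Euclidean distance in RGB. Only the final averaging step, including the `>0.01` guard, is taken from the slide.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Seed colours stored channel by channel, like seeds[0..2][seed] on the slides.
struct SeedList {
    std::vector<float> r, g, b;
};

// Hypothetical scalar reference for one iteration. R, G and B hold the image
// in structure-of-arrays layout (one value per pixel and channel).
inline void updateSeedsOnce(const std::vector<float>& R,
                            const std::vector<float>& G,
                            const std::vector<float>& B,
                            SeedList& seeds) {
    const std::size_t nbSeeds = seeds.r.size();
    assert(nbSeeds > 0 && seeds.g.size() == nbSeeds && seeds.b.size() == nbSeeds);
    assert(G.size() == R.size() && B.size() == R.size());

    std::vector<float> sumR(nbSeeds, 0.f), sumG(nbSeeds, 0.f), sumB(nbSeeds, 0.f);
    std::vector<float> seedCounters(nbSeeds, 0.f);

    // Find the closest seed of every pixel and add the pixel to that seed's sums.
    for (std::size_t pixel = 0; pixel < R.size(); ++pixel) {
        std::size_t found = 0;
        float best = 0.f;
        for (std::size_t seed = 0; seed < nbSeeds; ++seed) {
            const float dr = R[pixel] - seeds.r[seed];
            const float dg = G[pixel] - seeds.g[seed];
            const float db = B[pixel] - seeds.b[seed];
            const float distance = dr * dr + dg * dg + db * db; // assumed metric
            if (seed == 0 || distance < best) {
                best = distance;
                found = seed;
            }
        }
        seedCounters[found] += 1.f;
        sumR[found] += R[pixel];
        sumG[found] += G[pixel];
        sumB[found] += B[pixel];
    }

    // Last step: replace each seed by the average of its pixels. Seeds that
    // attracted no pixel are left unchanged, which also avoids dividing by zero.
    for (std::size_t seed = 0; seed < nbSeeds; ++seed) {
        if (seedCounters[seed] > 0.01f) {
            seeds.r[seed] = sumR[seed] / seedCounters[seed];
            seeds.g[seed] = sumG[seed] / seedCounters[seed];
            seeds.b[seed] = sumB[seed] / seedCounters[seed];
        }
    }
}
```

Calling `updateSeedsOnce` k times would correspond to the outer loop over k iterations.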

8 Q2
Optimizing the inner loop
Process 8 pixels at a time.
Compare 8 pixels against one seed! Some were confused and tried 8 pixels vs 8 seeds.
Use cmplt and blend to replace the condition.
2 blend instructions needed! Some replicated the mask computations!
The part after the inner loop cannot be parallelized though.
Still a good speed-up using SIMD, especially when the number of seeds is > 32.
Many ways to do it.
Extra cast computations done by all of you!
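The compare-then-blend idea can be modelled lane by lane in portable C++. The sketch below is a scalar model, not the AVX code from the coursework: `compareLessThan`, `blendBits` and `updateClosestSeed` are hypothetical names standing in for the cmplt and blend instructions. It shows the two points made on the slide: 8 pixels are tested against one seed, and a single mask feeds both blends (one for the best distance, one for the seed id).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

constexpr int LANES = 8; // one 256-bit register holds 8 floats

// Lane-wise "less than": all bits set where a < b, all bits clear elsewhere.
inline void compareLessThan(const float* a, const float* b, std::uint32_t* mask) {
    for (int i = 0; i < LANES; ++i)
        mask[i] = 0u - static_cast<std::uint32_t>(a[i] < b[i]);
}

// Blend on raw bits: takes `candidate` where the mask is set and keeps
// `current` where it is clear. No branch is involved.
inline std::uint32_t blendBits(std::uint32_t current, std::uint32_t candidate,
                               std::uint32_t mask) {
    return (candidate & mask) | (current & ~mask);
}

// Tests 8 pixel distances against ONE seed and updates, per lane, the best
// distance so far and the id of the seed that produced it.
inline void updateClosestSeed(const float* dist, std::uint32_t seedId,
                              float* bestDist, std::uint32_t* bestId) {
    assert(dist && bestDist && bestId);
    std::uint32_t mask[LANES];
    compareLessThan(dist, bestDist, mask); // computed once, reused by both blends
    for (int i = 0; i < LANES; ++i) {
        std::uint32_t currentBits, candidateBits;
        std::memcpy(&currentBits, &bestDist[i], sizeof currentBits);
        std::memcpy(&candidateBits, &dist[i], sizeof candidateBits);
        const std::uint32_t chosen = blendBits(currentBits, candidateBits, mask[i]);
        std::memcpy(&bestDist[i], &chosen, sizeof chosen);  // blend 1: distance
        bestId[i] = blendBits(bestId[i], seedId, mask[i]);  // blend 2: seed id
    }
}
```

In a real SIMD version each helper would presumably collapse to one instruction operating on all 8 lanes at once, so the loops over `LANES` disappear.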

9 Q2
Optimization comes from:
  Processing 8 pixels at a time.
  Removing the branch (no if/then).
Still tricky to get a good speed-up.
Going further:
  Loop unrolling.
  Minimize the number of computations inside the inner loop.
  Put all constant operations like set1 outside the loop.
  Avoid shared cache lines when multithreading!
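One possible way to avoid shared cache lines is to give every thread its own accumulators, aligned to the cache line size, so that no two threads ever write into the same line. The sketch below is an assumption-laden illustration: the 64-byte line size is typical for x86 but not stated in the slides, and `ThreadAccumulators`, `N_SEEDS`, `startsOnCacheLine` and `resetAccumulators` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t CACHE_LINE = 64; // assumption: typical x86 cache line size
constexpr int N_SEEDS = 16;            // hypothetical number of seeds

// Per-thread accumulators. alignas makes every instance start on its own
// cache line and pads sizeof to a multiple of the line size, so neighbouring
// elements of an array of these never share a line.
struct alignas(CACHE_LINE) ThreadAccumulators {
    float seedSums[3][N_SEEDS];
    float seedCounters[N_SEEDS];
};

static_assert(alignof(ThreadAccumulators) == CACHE_LINE, "one instance per line");
static_assert(sizeof(ThreadAccumulators) % CACHE_LINE == 0, "no line is shared");

// Hypothetical check: true when `p` is the first byte of a cache line.
inline bool startsOnCacheLine(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % CACHE_LINE == 0;
}

// Clears the sums and counters of one thread before an iteration.
inline void resetAccumulators(ThreadAccumulators& acc) {
    for (int seed = 0; seed < N_SEEDS; ++seed) {
        acc.seedCounters[seed] = 0.f;
        for (int channel = 0; channel < 3; ++channel)
            acc.seedSums[channel][seed] = 0.f;
    }
}
```

Keeping the accumulators in local (stack) variables of each thread, as the later slides do, is another way to get a similar effect.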

10 Q2 Loop for k iterations
    float seedSums[3][N];
    float seedCounters[N];
    //Seed initialization;
    for(int j=0;j<3;j++)
        for(int i=0;i

11 Q2 Loop for k iterations
    float seedSums[3][N];
    float seedCounters[N];
    float8 seedId[N];
    for (int seed=0;seed

12 Q2 Then
    …
    //2. Determine and compute average of closer seeds
    for (int pixel=0;pixel

13 Q2 Then
    float8 *R=(float8 *) alignedRed;
    float8 *G=(float8 *) alignedGreen;
    float8 *B=(float8 *) alignedBlue;
    for (int pixel=0;pixel

14 Q2 Then
        //Sum the pixel values to the appropriate seed
        for (int i=0;i<8;i++)
        {
            int found=(int&) found8.m256_f32[i];
            seedCounters[found]+=1.;
            seedSums[0][found]+=((float *) R)[i];
            seedSums[1][found]+=((float *) G)[i];
            seedSums[2][found]+=((float *) B)[i];
        }
        R++; G++; B++;
    }
    …

15 Q2 Recompute new seeds
Still the same!!!
    //Last step for the iteration: compute average and update the current seed list
    for (int seed=0;seed<N;seed++)
        if (seedCounters[seed]>0.01)
        {
            seeds[0][seed]=seedSums[0][seed]/seedCounters[seed];
            seeds[1][seed]=seedSums[1][seed]/seedCounters[seed];
            seeds[2][seed]=seedSums[2][seed]/seedCounters[seed];
        }
    …//End of iteration

16 Q3
Most of you got the principles more or less right.
Practical implementation was wrong!
Barriers were sometimes at the wrong location.
Most of you added extra, unneeded barriers.
Mutexes were accepted. Putting a lock on every seed change is too much/not good!
Errors: only using results from one thread at each iteration.

17 Q3 Idea
Break down the image into 4 pieces.
For each thread iteration:
  Copy the seeds into local variables (performance).
  Loop over the current chunk of pixels.
  Compute seedSums and seedCounters the same way.
  Copy the results into globally visible but separate variables.
  Barrier.
  One thread adds the results from the other threads to its own results, then computes the RGB average and updates the seeds.
  Barrier.

18 Q3 Creating threads
    void knnCompressionSIMDPosix(float *image, int x, int y)
    {
        AoS_to_SoA(image,x,y);
        threadJobSize=x*y/nbThreads;
        pthread_t threads[nbThreads];
        pthread_barrier_init(&barrier, NULL, nbThreads);
        for (int i=0;i

19 Q3 Thread's job
    void * posixThread(void *arg)
    {
        long long threadNumber=(long long) arg;
        int firstPixel=threadNumber*threadJobSize;
        int lastPixel=firstPixel+threadJobSize;
        float seedSums[3][N];
        float seedCounters[N];
        //Seed initialization;
        float8 seedId[N];
        for (int seed=0;seed
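Since threadJobSize is computed with an integer division (x*y/nbThreads), the chunks only cover the whole image when the pixel count is a multiple of the thread count. A minimal sketch of one way to handle the remainder is shown below; `PixelRange` and `chunkFor` are hypothetical names, and giving the leftover pixels to the last thread is a design choice of this example, not something the slides prescribe.

```cpp
#include <cassert>

// Half-open pixel range [first, last) processed by one thread.
struct PixelRange {
    long long first;
    long long last;
};

// Hypothetical helper: same arithmetic as firstPixel/lastPixel on the slide,
// except that the last thread also takes the pixels left over by the
// integer division.
inline PixelRange chunkFor(long long threadNumber, long long nbThreads,
                           long long pixelCount) {
    assert(nbThreads > 0 && threadNumber >= 0 && threadNumber < nbThreads);
    const long long threadJobSize = pixelCount / nbThreads;
    const long long first = threadNumber * threadJobSize;
    const long long last =
        (threadNumber == nbThreads - 1) ? pixelCount : first + threadJobSize;
    return PixelRange{first, last};
}
```

For example, `chunkFor(3, 4, 10)` yields the range [6, 10), so the two leftover pixels are not lost. In the SIMD version the chunk boundaries would presumably also have to fall on multiples of 8 pixels.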

20 Q3 Thread's job
    for (int k=0;k

21 Q3 Merging results
    for (int seed=0;seed

22 Q3 Merging results
    if (threadNumber==0)
    {
        for (int thread=1;thread

23 Q3 Merging results
        for (int seed=0;seed<N;seed++)
            if (temporaryCounters[0][seed]>0.01)
            {
                seeds[0][seed]=temporaryResults[0][0][seed]/temporaryCounters[0][seed];
                seeds[1][seed]=temporaryResults[0][1][seed]/temporaryCounters[0][seed];
                seeds[2][seed]=temporaryResults[0][2][seed]/temporaryCounters[0][seed];
            }
    } //end condition threadNumber==0
    pthread_barrier_wait(&barrier);
    //end of iteration, seeds have been updated!
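The merge performed by thread 0 between the two barriers can be written as a plain function, which makes it easy to check on its own without any threads. This is a sketch: the array names and index order follow the slides (temporaryResults[thread][channel][seed], temporaryCounters[thread][seed]), but the function `mergeAndUpdateSeeds`, its signature, and the value of `N` are assumptions. It also shows the error mentioned on slide 16: the results of all threads must be summed before averaging, not just those of one thread.

```cpp
#include <cassert>

constexpr int N = 4; // number of seeds (hypothetical value for this sketch)

// Hypothetical merge step, meant to be run by a single thread once every
// thread has published its partial sums and counters.
inline void mergeAndUpdateSeeds(float temporaryResults[][3][N],
                                float temporaryCounters[][N],
                                int nbThreads,
                                float seeds[3][N]) {
    assert(nbThreads >= 1);

    // Add the partial results of threads 1..nbThreads-1 into thread 0's slot.
    for (int thread = 1; thread < nbThreads; ++thread) {
        for (int seed = 0; seed < N; ++seed) {
            temporaryCounters[0][seed] += temporaryCounters[thread][seed];
            for (int channel = 0; channel < 3; ++channel)
                temporaryResults[0][channel][seed] +=
                    temporaryResults[thread][channel][seed];
        }
    }

    // Compute the RGB average of every seed that received at least one pixel.
    for (int seed = 0; seed < N; ++seed) {
        if (temporaryCounters[0][seed] > 0.01f) {
            for (int channel = 0; channel < 3; ++channel)
                seeds[channel][seed] =
                    temporaryResults[0][channel][seed] / temporaryCounters[0][seed];
        }
    }
}
```

The second barrier is what guarantees that no other thread reads `seeds` for the next iteration until this function has finished.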

