
1 Dr. Muhammed Al-Mulhem, ICS535-101. ICS 535 Design and Implementation of Programming Languages, Part 1: OpenMP Example

2 Example: Consider a simple C program that calculates the values of a mathematical function. The source of these slides is: http://www.viva64.com/content/articles/parallel-programming/?f=OpenMP_debug_and_optimization.html&lang=en&content=parallel-programming

3 Listing 1: Calling this function with N = 15000 returns 287305025.528.
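As a point of reference, a sequential function of this shape might look like the sketch below. The name `sum_sequential` and the formulas inside the loops are assumptions chosen for illustration; only the variables x, y, s and the statement `s += j*y` come from the slides. It is not the original Listing 1 and does not reproduce the value 287305025.528.

```c
/* Hypothetical stand-in for the sequential function (not the original Listing 1). */
double sum_sequential(int n)
{
    double x, y, s = 0.0;
    for (int i = 1; i <= n; i++) {
        x = (double)i / n;          /* assumed formula for x */
        for (int j = 1; j <= n; j++) {
            y = x * j;              /* assumed formula for y */
            s += j * y;             /* accumulate into the running sum */
        }
    }
    return s;
}
```

With this stand-in formula, `sum_sequential(4)` evaluates to 75.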

4 Listing 2: This function can easily be parallelized with OpenMP. To do this, we place a #pragma directive before the first for statement.
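A sketch of that step, using a hypothetical loop body (the function name `sum_parallel_naive` and the formulas for x and y are assumptions, not the original Listing 2). The only change from a sequential version is the directive in front of the outer loop. Because x, y and s are declared outside the loop, this version contains data races and is shown only to illustrate the mistake.

```c
/* Hypothetical naive parallelization: deliberately INCORRECT. */
double sum_parallel_naive(int n)
{
    double x, y, s = 0.0;           /* declared outside the loop: all shared */
    #pragma omp parallel for        /* splits the iterations of i among threads */
    for (int i = 1; i <= n; i++) {
        x = (double)i / n;          /* race: several threads write the same x */
        for (int j = 1; j <= n; j++) {
            y = x * j;              /* race: several threads write the same y */
            s += j * y;             /* race: unsynchronized update of s */
        }
    }
    return s;
}
```

When compiled without OpenMP support, the pragma is ignored and the function runs sequentially; with OpenMP enabled (for example `-fopenmp` in GCC) the result can vary from run to run.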

5 Listing 2: Unfortunately, the code we have created is incorrect, and the result of the function is in general undefined. For example, it can be 298441282.231. Why doesn't it work?

6 Listing 2: The main cause of errors in parallel programs is incorrect handling of shared resources, i.e. resources common to all running threads, and in particular shared variables. Variables in OpenMP programs are divided into shared variables, which exist as a single copy available to all threads, and private variables, which are local to a specific thread. By default, all variables in OpenMP parallel regions are shared, except for the indices of parallel loops and variables declared inside the parallel region.

7 Listing 2: In the example above, x, y and s are all treated as shared variables, which is incorrect. Only s should be shared. Each thread calculates its own values of x and y and writes them into the corresponding variable. With x and y shared, one thread can overwrite a value that another thread is about to use, so the result depends on the order in which the parallel threads execute.
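The effect of the data-sharing rules can be seen in isolation with a small sketch. The function `fill_scaled` and its output array are hypothetical and not part of the slides. Here x is declared outside the loop, so it would be shared by default; the `private(x)` clause gives each thread its own copy, and every iteration writes to a different element of `out`, so no two threads touch the same memory.

```c
/* Hypothetical example: making a temporary private to each thread. */
void fill_scaled(int n, double *out)
{
    double x;                               /* outside the loop: shared by default */
    #pragma omp parallel for private(x)     /* private(x): one copy per thread */
    for (int i = 0; i < n; i++) {           /* loop index i is private automatically */
        x = (double)i / n;                  /* writes the thread's own copy of x */
        out[i] = 2.0 * x;                   /* distinct element per iteration */
    }
}
```

Declaring the temporary inside the loop body (`double x = ...;`) would have the same effect, since variables declared inside the parallel region are private.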

8 Listing 2: To find such errors we need a tool such as Intel Thread Checker (dynamic code analyzer), http://www.intel.com/software/products/tcwin, or VivaMP (static code analyzer), http://www.viva64.com/vivamp-tool/

9 Listing 2: Let's consider the s += j*y instruction. The intent is that each thread adds its calculated result to the current value of s, and then the other threads do the same, one after another. But in some cases two threads begin to execute s += j*y simultaneously: each of them first reads the current value of s, then adds its own j*y to this value, and writes the result back into s. The second write overwrites the first, so one of the two contributions is lost.
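The interleaving described here can be traced by hand. The sketch below is a single-threaded simulation with made-up values (10, 3 and 4 are assumptions for illustration); it replays the steps of two threads in the problematic order rather than running real threads.

```c
/* Hand-written replay of a lost update; no real threads are involved. */
double lost_update_demo(void)
{
    double s = 10.0;        /* shared accumulator before either update */

    double a = s;           /* thread A reads s (10) */
    double b = s;           /* thread B reads s (10) before A has written */

    a += 3.0;               /* A adds its j*y contribution: 13 */
    b += 4.0;               /* B adds its j*y contribution: 14 */

    s = a;                  /* A writes 13 */
    s = b;                  /* B writes 14, overwriting A's update */

    return s;               /* 14, although 10 + 3 + 4 = 17 was intended */
}
```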

10 Listing 2: You can avoid this situation by making sure that at any moment only one thread is allowed to execute the s += j*y operation. Such operations are called indivisible, or atomic. To make an instruction atomic we use #pragma omp atomic. The code in which the described errors are corrected is shown in Listing 3.
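A sketch of how the two fixes might be combined, using a hypothetical loop body (the name `sum_parallel_atomic` and the formulas for x and y are assumptions, not the original Listing 3): x and y are made private, and the update of the shared s is marked atomic.

```c
/* Hypothetical corrected version: private temporaries plus an atomic update. */
double sum_parallel_atomic(int n)
{
    double x, y, s = 0.0;
    #pragma omp parallel for private(x, y)  /* each thread gets its own x and y */
    for (int i = 1; i <= n; i++) {
        x = (double)i / n;                  /* assumed formula for x */
        for (int j = 1; j <= n; j++) {
            y = x * j;                      /* assumed formula for y */
            #pragma omp atomic              /* applies to the next statement only */
            s += j * y;                     /* read-modify-write of s is indivisible */
        }
    }
    return s;
}
```

Only the update of s is protected; the expression `j * y` is still evaluated by each thread independently before the atomic update.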

11 Listing 3: Does this work? Is it efficient?

12 Listing 3: Running the code shows that it now returns the correct result. But is the parallelism effective? Let's measure the execution time of three functions: (1) sequential, (2) parallel incorrect, (3) parallel correct. The results of this measurement for N=1500 are given in Table 1.
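One way such a measurement might be set up is sketched below. The helper names `wall_seconds` and `time_call`, and the stand-in function `sum_sequential`, are assumptions and not part of the slides. The sketch uses wall-clock time from the C11 function `timespec_get`, because CPU time from `clock()` adds up the time of all threads and would hide any speedup.

```c
#include <time.h>

/* Hypothetical function to be timed (stand-in, not the original listing). */
double sum_sequential(int n)
{
    double s = 0.0;
    for (int i = 1; i <= n; i++) {
        double x = (double)i / n;
        for (int j = 1; j <= n; j++)
            s += j * (x * j);
    }
    return s;
}

/* Current wall-clock time in seconds. */
static double wall_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Runs f(n), stores its result in *result, returns the elapsed seconds. */
double time_call(double (*f)(int), int n, double *result)
{
    double start = wall_seconds();
    *result = f(n);
    return wall_seconds() - start;
}
```

Calling `time_call` once for each of the three variants with the same N gives the kind of comparison the table reports. In a program built with OpenMP, `omp_get_wtime()` from `omp.h` serves the same purpose as `wall_seconds`.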

13 Listing 3: The correct variant runs more than 60 times slower than the sequential one (why?). Do we need such parallelism? Of course not.

14 Listing 3: The reason is that we chose a very inefficient way of accumulating the result in s: the atomic directive. With this approach the threads wait for each other very often. To avoid this constant waiting on the atomic summing operation, we can use the reduction clause. The reduction clause specifies that the variable receives the combined value on exit from the parallel block; until then, each thread accumulates into its own private copy. The following operators are permitted: +, *, -, &, |, ^, &&, ||. The modified version of the function is shown in Listing 4.
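A sketch of the reduction-based version, using a hypothetical loop body (the name `sum_parallel_reduction` and the formulas for x and y are assumptions, not the original Listing 4). The `reduction(+:s)` clause replaces the atomic update, so the inner loop runs without any synchronization.

```c
/* Hypothetical version using a reduction instead of an atomic update. */
double sum_parallel_reduction(int n)
{
    double x, y, s = 0.0;
    /* reduction(+:s): each thread sums into a private copy of s,
       and the copies are added together when the loop ends. */
    #pragma omp parallel for private(x, y) reduction(+:s)
    for (int i = 1; i <= n; i++) {
        x = (double)i / n;          /* assumed formula for x */
        for (int j = 1; j <= n; j++) {
            y = x * j;              /* assumed formula for y */
            s += j * y;             /* updates the thread's private copy of s */
        }
    }
    return s;
}
```

Because the partial sums are combined in a different order than in a sequential loop, floating-point results can differ in the last digits for general inputs.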

15 Listing 4: Table 2 shows the result of running this code.

16 Listing 4: The code is correct and performs better. The speed of the calculation has almost doubled.

17 Conclusion: Although parallel programming provides many ways to make code more efficient, it demands attention and a good knowledge of the technologies used from the programmer. Fortunately, there are tools such as Intel Thread Checker and VivaMP that greatly simplify the creation and testing of multi-threaded applications.

