
1  Shared-memory Parallel Programming
Taura Lab, M1, Yuuki Horita

2  Agenda
- Introduction
- Sample Sequential Program
- Multi-thread Programming
- OpenMP
- Summary

3  Agenda
- Introduction
- Sample Sequential Program
- Multi-thread Programming
- OpenMP
- Summary

4  Parallel Programming Models
- Message Passing Model: covered by Imatake-kun in the previous talk
- Shared Memory Model: memory is shared by all processing elements
  - Multiprocessor (SMP, SunFire, ...)
  - DSM (Distributed Shared Memory)
  - Processing elements can communicate with each other through the shared memory

5  Shared Memory Model
(Diagram: several PEs all connected to one shared memory.)

6  Shared Memory Model
- Simplicity: no need to think about where the computation data is located
- Fast communication (on a multiprocessor): no need to go through the network for communication between processing elements
- Dynamic load sharing: for the same reason as simplicity

7  Shared-Memory Parallel Programming
- Multi-thread programming: Pthreads
- OpenMP: a parallel programming model for shared-memory multiprocessors

8  Agenda
- Introduction
- Sample Sequential Program
- Multi-thread Programming
- OpenMP
- Summary

9  Sample Sequential Program: FDM (Finite Difference Method)
  …
  loop {
    for (i = 0; i < N; i++){
      for (j = 0; j < N; j++){
        a[i][j] = 0.2 * (a[i][j-1] + a[i][j+1] + a[i-1][j] + a[i+1][j] + a[i][j]);
      }
    }
  }
  …
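
As a concrete reference, a self-contained sequential version of this FDM kernel might look like the sketch below; the grid size, the iteration count, the initialization, and the choice to update only interior points (so the i-1 / j+1 accesses stay in bounds) are assumptions, not part of the slide.

  #include <stdio.h>

  #define N     8      /* assumed grid size */
  #define ITERS 100    /* assumed iteration count */

  static double a[N][N];   /* grid; boundary cells are kept fixed */

  int main(void){
      /* assumed initialization: boundary 1.0, interior 0.0 */
      for (int i = 0; i < N; i++)
          for (int j = 0; j < N; j++)
              a[i][j] = (i == 0 || j == 0 || i == N-1 || j == N-1) ? 1.0 : 0.0;

      /* the kernel from the slide, restricted to interior points */
      for (int it = 0; it < ITERS; it++)
          for (int i = 1; i < N-1; i++)
              for (int j = 1; j < N-1; j++)
                  a[i][j] = 0.2 * (a[i][j-1] + a[i][j+1] +
                                   a[i-1][j] + a[i+1][j] + a[i][j]);

      printf("a[N/2][N/2] = %f\n", a[N/2][N/2]);
      return 0;
  }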

10  Parallelization Procedure
Sequential Computation -> (Decomposition) -> Tasks -> (Assignment) -> Processing Elements -> (Orchestration) -> (Mapping) -> Processors

11  Parallelize the Sequential Program
- Decomposition: split the computation into tasks (here, iterations of the loop nest)
  …
  loop {
    for (i = 0; i < N; i++){
      for (j = 0; j < N; j++){
        a[i][j] = 0.2 * (a[i][j-1] + a[i][j+1] + a[i-1][j] + a[i+1][j] + a[i][j]);
      }
    }
  }
  …

12  Parallelize the Sequential Program
- Assignment: divide the tasks equally among the processing elements

13  Parallelize the Sequential Program
- Orchestration: the processing elements need to communicate and synchronize

14  Parallelize the Sequential Program
- Mapping: the processing elements are mapped onto the processors of the multiprocessor

15  Agenda
- Introduction
- Sample Sequential Program
- Multi-thread Programming
- OpenMP
- Summary

16  Multi-thread Programming
- A processing element is a thread (cf. a process)
- Memory is shared among all threads created by the same process
- Threads can communicate with each other through the shared memory

17  Fork-Join Model
- The program starts with the main thread only (serialized section)
- Fork: the main thread creates new threads (parallelized section)
- Join: the other threads join the main thread, and the main thread continues processing alone (serialized section)

18  Libraries for Thread Programming
- Pthreads (C/C++): pthread_create(), pthread_join()
- Java threads: Thread class / Runnable interface

19  Pthreads API (fork/join)
- pthread_t                        // thread variable
- pthread_create(
    pthread_t *thread,             // thread variable
    pthread_attr_t *attr,          // thread attributes
    void *(*func)(void *),         // start function
    void *arg                      // argument passed to the start function
  )
- pthread_join(
    pthread_t thread,              // thread variable
    void **thread_return           // the return value of the thread
  )

20  Pthreads Parallel Programming
  #include …

  void do_sequentially(void){
    /* sequential execution */
  }

  main(){
    …
    do_sequentially();   // want to parallelize this call
    …
  }

21  Pthreads Parallel Programming
  #include …
  #include <pthread.h>

  void do_in_parallel(void){
    /* parallel execution */
  }

  main(){
    pthread_t tid;
    …
    pthread_create(&tid, NULL, (void *)do_in_parallel, NULL);
    do_in_parallel();
    pthread_join(tid, NULL);
    …
  }
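
A complete, compilable version of this fork/join pattern might look like the sketch below (compile with cc -pthread); the worker body and the argument passed to it are illustrative assumptions, not part of the slide.

  #include <stdio.h>
  #include <pthread.h>

  /* hypothetical worker: in the slide this is do_in_parallel() */
  static void *do_in_parallel(void *arg){
      printf("working in thread %ld\n", (long)arg);
      return NULL;
  }

  int main(void){
      pthread_t tid;
      /* fork: create a second thread, then do the same work in the main thread */
      pthread_create(&tid, NULL, do_in_parallel, (void *)1L);
      do_in_parallel((void *)0L);
      /* join: wait for the second thread to finish */
      pthread_join(tid, NULL);
      return 0;
  }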

22  Exclusive Access Control
  int sum = 0;
  thread_A(){ sum++; }
  thread_B(){ sum++; }

A possible interleaving (sum++ is really read, add, write):
  Thread A: a ← read sum      (a = 0)
  Thread B: a ← read sum      (a = 0)
  Thread A: a = a + 1          (a = 1)
  Thread A: write a → sum     (sum = 1)
  Thread B: a = a + 1          (a = 1)
  Thread B: write a → sum     (sum = 1)
Starting from sum = 0, both threads increment sum, yet the final value is 1.

23  Pthreads API (Exclusive Access Control)
- Variable: pthread_mutex_t
- Initialization function:
    pthread_mutex_init(pthread_mutex_t *mutex, pthread_mutexattr_t *mutexattr)
- Lock functions:
    pthread_mutex_lock(pthread_mutex_t *mutex)
    pthread_mutex_unlock(pthread_mutex_t *mutex)

24  Exclusive Access Control
  int sum = 0;
  pthread_mutex_t mutex;
  pthread_mutex_init(&mutex, NULL);

  thread_A(){
    pthread_mutex_lock(&mutex);
    sum++;
    pthread_mutex_unlock(&mutex);
  }

  thread_B(){
    pthread_mutex_lock(&mutex);
    sum++;
    pthread_mutex_unlock(&mutex);
  }

With the mutex, each thread acquires the lock, increments sum, and releases the lock, so the two increments can no longer interleave and both are reflected in sum.
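
Put together as a runnable program (a sketch under assumptions: two worker threads, 100000 increments each, static mutex initialization instead of pthread_mutex_init), the pattern looks like this:

  #include <stdio.h>
  #include <pthread.h>

  static int sum = 0;
  static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

  /* each thread increments the shared counter under the lock */
  static void *worker(void *arg){
      (void)arg;
      for (int i = 0; i < 100000; i++){
          pthread_mutex_lock(&mutex);
          sum++;
          pthread_mutex_unlock(&mutex);
      }
      return NULL;
  }

  int main(void){
      pthread_t a, b;
      pthread_create(&a, NULL, worker, NULL);
      pthread_create(&b, NULL, worker, NULL);
      pthread_join(a, NULL);
      pthread_join(b, NULL);
      printf("sum = %d\n", sum);   /* always 200000 thanks to the mutex */
      return 0;
  }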

25  Pthreads API (Condition Variables)
- Variable: pthread_cond_t
- Initialization function:
    pthread_cond_init(pthread_cond_t *cond, pthread_condattr_t *condattr)
- Condition functions:
    pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex)
    pthread_cond_broadcast(pthread_cond_t *cond)
    pthread_cond_signal(pthread_cond_t *cond)

26  Condition Wait
Thread A waits until the condition is satisfied:
  pthread_mutex_lock(&mutex);
  while ( /* condition is not satisfied */ ){
    pthread_cond_wait(&cond, &mutex);   // releases the lock and sleeps; re-acquires the lock when woken
  }
  pthread_mutex_unlock(&mutex);

Thread B makes the condition true and wakes the waiting threads (pthread_cond_broadcast or pthread_cond_signal):
  pthread_mutex_lock(&mutex);
  update_condition();
  pthread_cond_broadcast(&cond);
  pthread_mutex_unlock(&mutex);

27  Synchronization
Barrier synchronization in the sample program (with Pthreads):
  n = 0;
  …
  pthread_mutex_lock(&mutex);
  n++;
  while ( n < nthreads ){
    pthread_cond_wait(&cond, &mutex);
  }
  pthread_cond_broadcast(&cond);
  pthread_mutex_unlock(&mutex);
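
Packaged as a reusable function, the idea above might look like the following sketch; the struct layout, the round counter (which lets the barrier be reused across iterations), and the initialization code are assumptions that go beyond what the slide shows.

  #include <pthread.h>

  typedef struct {
      pthread_mutex_t mutex;
      pthread_cond_t  cond;
      int nthreads;   /* number of participating threads */
      int n;          /* threads that have arrived in the current round */
      int gen;        /* round counter, so the barrier can be reused */
  } barrier_t;

  void barrier_init(barrier_t *b, int nthreads){
      pthread_mutex_init(&b->mutex, NULL);
      pthread_cond_init(&b->cond, NULL);
      b->nthreads = nthreads;
      b->n = 0;
      b->gen = 0;
  }

  void barrier_wait(barrier_t *b){
      pthread_mutex_lock(&b->mutex);
      int my_gen = b->gen;
      if (++b->n == b->nthreads){
          /* last thread to arrive: reset the count and wake the others */
          b->n = 0;
          b->gen++;
          pthread_cond_broadcast(&b->cond);
      } else {
          while (my_gen == b->gen)
              pthread_cond_wait(&b->cond, &b->mutex);
      }
      pthread_mutex_unlock(&b->mutex);
  }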

28  Characteristics of Pthreads
- Troublesome to describe exclusive access control and synchronization
- Prone to deadlock
- Still hard to parallelize a given sequential program

29  Agenda
- Introduction
- Sample Sequential Program
- Multi-thread Programming
- OpenMP
- Summary

30  What is OpenMP?
- A specification of compiler directives, library routines, and environment variables that can be used to specify shared-memory parallelism in Fortran and C/C++ programs
- Fortran ver. 1.0 API: Oct. 1997
- C/C++ ver. 1.0 API: Oct. 1998

31  Background of OpenMP
- Spread of shared-memory multiprocessors
- Need for common directives for shared-memory multiprocessors: each vendor had been providing its own set of directives
- Need for a simpler, more flexible interface for developing parallel applications: Pthreads makes it hard for developers to describe parallel applications

32  OpenMP API
- Directives
- Library routines
- Environment variables

33  Directives
- C/C++:   #pragma omp directive_name …
- Fortran: !$OMP directive_name …
If the user's compiler does not support OpenMP, the directives are ignored, so the program can still be built and executed as a sequential program.

34  Parallel Region
- The part of the program executed in parallel by a team of threads:
    #pragma omp parallel
    {
      /* parallel region */
    }
- Threads are created at the beginning of the parallel region and join at its end

35  Parallel Region (threads)
- Number of threads
    omp_get_num_threads(): get the current number of threads
    omp_set_num_threads(int nthreads): set the number of threads to nthreads
    $OMP_NUM_THREADS environment variable
- Thread ID (0 to number of threads - 1)
    omp_get_thread_num(): get the thread ID
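
A minimal parallel region using these routines might look like the following sketch (compile with e.g. cc -fopenmp); the thread count of 4 and the printed message are illustrative choices, not from the slide.

  #include <stdio.h>
  #include <omp.h>

  int main(void){
      omp_set_num_threads(4);              /* request 4 threads */
      #pragma omp parallel
      {
          int id = omp_get_thread_num();   /* 0 .. number of threads - 1 */
          int n  = omp_get_num_threads();
          printf("hello from thread %d of %d\n", id, n);
      }   /* implicit join at the end of the parallel region */
      return 0;
  }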

36  Work Sharing Constructs
- Specify how the work inside a parallel region is assigned to the threads
    for:      share loop iterations among threads
    sections: share sections among threads
    single:   execute a block with only one thread

37  Example of Work Sharing
Sequential:
  for (i=0; i<N; i++){
    for (j=0; j<N; j++){
      a[i][j] = 0.2 * (a[i][j-1] + a[i][j+1] + a[i-1][j] + a[i+1][j] + a[i][j]);
    }
  }

With a parallel region and a for construct:
  omp_set_num_threads(4);
  #pragma omp parallel
  #pragma omp for
  for (i=0; i<N; i++){
    for (j=0; j<N; j++){
      a[i][j] = 0.2 * (a[i][j-1] + a[i][j+1] + a[i-1][j] + a[i+1][j] + a[i][j]);
    }
  }

With the combined parallel for construct:
  omp_set_num_threads(4);
  #pragma omp parallel for
  for (i=0; i<N; i++){
    for (j=0; j<N; j++){
      a[i][j] = 0.2 * (a[i][j-1] + a[i][j+1] + a[i-1][j] + a[i+1][j] + a[i][j]);
    }
  }

Note: memory access conflicts on the loop indices i and j slow the computation down (addressed with private(i, j) on slide 39).

38  Data Scoping Attributes
- Specify the data scoping on a parallel construct or a work-sharing construct
    shared(var_list):  the listed variables are shared among threads
    private(var_list): the listed variables are private to each thread
    reduction(operator : var_list): the listed variables are private within the construct, and the partial values are combined after the construct
      e.g. #pragma omp for reduction(+: sum)
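
As an illustration of reduction, the following sketch sums an array in parallel; the array size and contents are made up for the example.

  #include <stdio.h>
  #include <omp.h>

  #define N 1000   /* assumed array size */

  int main(void){
      int a[N], sum = 0;
      for (int i = 0; i < N; i++) a[i] = 1;

      /* each thread accumulates into its own private copy of sum;
         the copies are added together when the loop finishes */
      #pragma omp parallel for reduction(+: sum)
      for (int i = 0; i < N; i++)
          sum += a[i];

      printf("sum = %d\n", sum);   /* prints 1000 */
      return 0;
  }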

39  Example of Data Scoping Attributes
  omp_set_num_threads(4);
  #pragma omp parallel for private(i, j)
  for (i=0; i<N; i++){
    for (j=0; j<N; j++){
      a[i][j] = 0.2 * (a[i][j-1] + a[i][j+1] + a[i-1][j] + a[i+1][j] + a[i][j]);
    }
  }

40  Synchronization
- barrier: wait until all threads reach this line
    #pragma omp barrier
- critical: execute a block exclusively
    #pragma omp critical [(name)]
    { … }
- atomic: update a scalar variable atomically
    #pragma omp atomic
    …
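
A small sketch contrasting critical and atomic; the counters and the loop bound are illustrative.

  #include <stdio.h>
  #include <omp.h>

  int main(void){
      int hits_critical = 0, hits_atomic = 0;

      #pragma omp parallel for
      for (int i = 0; i < 10000; i++){
          /* the whole block is executed by one thread at a time */
          #pragma omp critical
          { hits_critical++; }

          /* a single scalar update performed atomically (usually cheaper) */
          #pragma omp atomic
          hits_atomic++;
      }

      printf("%d %d\n", hits_critical, hits_atomic);   /* both print 10000 */
      return 0;
  }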

41  Synchronization (Pthreads / OpenMP)
Synchronization in the sample program:

Pthreads:
  pthread_mutex_lock(&mutex);
  n++;
  while ( n < nthreads ){
    pthread_cond_wait(&cond, &mutex);
  }
  pthread_cond_broadcast(&cond);
  pthread_mutex_unlock(&mutex);

OpenMP:
  #pragma omp barrier

42  Summary of OpenMP
- Incremental parallelization of sequential programs
- Portability
- Easier to implement parallel applications than with Pthreads or MPI

43  Agenda
- Introduction
- Sample Sequential Program
- Multi-thread Programming
- OpenMP
- Summary

44  Message Passing Model / Shared Memory Model

                  Message Passing    Shared Memory
  Architecture    any                SMP or DSM
  Programming     difficult          easier
  Performance     good               better (SMP) / worse (DSM)
  Cost            less expensive     very expensive (e.g. SunFire 15K: $4,140,830)

45  Thank you!

