
1 12a.1 Introduction to Parallel Computing UNC-Wilmington, C. Ferner, 2008 Nov 4, 2008

2 12a.2 Basic Principles

3 12a.3 Parallel Programming
Parallel computation is defined as splitting the tasks of a program so that they can be executed by multiple processors.
The basic idea is that n processors can do the work of 1 processor in 1/nth the amount of time (this is called speedup).
Data dependency reduces the possibility for speedup.

4 12a.4 Data Dependency
Data dependency occurs when one processor (P1) computes a value required by another processor (P2).
Suppose the following two statements are executed by two separate processors:
x = f(y, z);
area = x*x*PI;

5 12a.5 Data Dependency (continued)
The second processor (P2) needs the value computed by the first processor (P1) in order to correctly compute the area.
This requires that P2:
–wait for P1 to compute the value of x (synchronization), and
–have access to that value (communication).
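A minimal sketch of this wait-then-read pattern on a shared-memory machine, assuming POSIX threads; f(), its arguments, and the thread setup are illustrative stand-ins, not from the slides:

/* P1 produces x; P2 waits for it (synchronization), then reads it
   through shared memory (communication). */
#include <pthread.h>
#include <stdio.h>

#define PI 3.14159265358979

static double x;               /* value produced by P1, consumed by P2 */
static int x_ready = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

static double f(double y, double z) { return y + z; }  /* placeholder */

static void *p1(void *arg) {
    pthread_mutex_lock(&lock);
    x = f(1.0, 2.0);              /* P1 computes x */
    x_ready = 1;
    pthread_cond_signal(&cond);   /* tell P2 that x is ready */
    pthread_mutex_unlock(&lock);
    return NULL;
}

static void *p2(void *arg) {
    pthread_mutex_lock(&lock);
    while (!x_ready)                     /* synchronization: wait for P1 */
        pthread_cond_wait(&cond, &lock);
    double area = x * x * PI;            /* communication: read shared x */
    pthread_mutex_unlock(&lock);
    printf("area = %f\n", area);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, p1, NULL);
    pthread_create(&t2, NULL, p2, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}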

6 12a.6 Data Dependency (continued)
The four types of dependencies:
–True: P1 modifies a variable needed by P2
–Anti: P1 modifies a variable after P2 needs the old value
–Output: Both P1 and P2 modify a variable
–(Input: Both P1 and P2 read a variable)
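Each type can be shown with a pair of statements; the variables below are invented for this sketch, not taken from the slides:

#include <stdio.h>

int main(void) {
    int a = 1, b = 2, c, d;

    /* True (flow) dependence: the second statement reads what the first wrote */
    c = a + b;   /* writes c */
    d = c * 2;   /* reads c  */

    /* Anti dependence: the second statement overwrites a value the first read */
    d = a + 1;   /* reads a  */
    a = 5;       /* writes a */

    /* Output dependence: both statements write the same variable */
    c = a;       /* writes c */
    c = b;       /* writes c */

    /* Input "dependence": both statements only read the same variable,
       so they can safely run in parallel */
    d = b + 1;   /* reads b */
    c = b + 2;   /* reads b */

    printf("%d %d %d %d\n", a, b, c, d);
    return 0;
}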

7 12a.7 Types of Parallel Computers
Shared-memory:
–All processors have access to all memory
[Diagram: processors connected to shared memory modules through an interconnection network]

8 12a.8 Types of Parallel Computers (continued)
Distributed-memory:
–All processors have their own memory
–Communication is done through message passing (which adds latency)
[Diagram: processor/memory pairs connected through an interconnection network]

9 12a.9 Types of Parallel Computers (continued)
Distributed-shared memory:
–All processors have their own memory, but the memory is virtually shared
[Diagram: processor/memory pairs connected through an interconnection network, presented to the programmer as one virtual shared memory]

10 12a.10 Types of Parallel Computers (continued)
When P1 computes a value needed by P2:
–On a shared-memory system, P2 already has access to the computed value, but it still needs to wait for P1 to finish the computation (synchronization)
–On a distributed-memory system, P1 needs to send a message to P2 containing the required value (synchronization and communication)
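A minimal sketch of the distributed-memory case, assuming MPI; rank 0 plays P1, rank 1 plays P2, and f() and its arguments are illustrative stand-ins:

#include <mpi.h>
#include <stdio.h>

#define PI 3.14159265358979

static double f(double y, double z) { return y + z; }  /* placeholder */

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {            /* P1 computes x and sends it */
        double x = f(1.0, 2.0);
        MPI_Send(&x, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {     /* P2 blocks until x arrives: the Recv is
                                   both the synchronization and the
                                   communication */
        double x;
        MPI_Recv(&x, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("area = %f\n", x * x * PI);
    }

    MPI_Finalize();
    return 0;
}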

11 12a.11 Types of Parallel Computers (continued)
When P1 computes a value needed by P2:
–On a distributed-shared memory system, the programmer only needs to be concerned with synchronization
–However, when P2 accesses P1's memory, a message containing the contents of P1's memory must be sent implicitly

12 12a.12 Parallel Execution Metrics
Granularity: a measure of how big or small the individual units of computation are.
With message passing, large granularity is important to reduce the time spent sending messages:

    computation time / communication time = t_comp / t_comm
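As a worked example (the numbers are invented for illustration): if each unit of work computes for t_comp = 10 ms and sending its result costs t_comm = 2 ms, the ratio is 10/2 = 5; batching four units of work into one message raises it to 40/2 = 20, so a larger share of the total time goes to useful computation.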

13 12a.13 Parallel Execution Metrics (continued)
Speedup: a measure of relative performance between a parallel program and a sequential program:

    S(n) = (fastest known execution time on 1 processor) / (execution time on n processors) = t_s* / t_p

or, comparing against the same program run sequentially:

    S(n) = (execution time on 1 processor) / (execution time on n processors) = t_s / t_p

Why should speedup be no better than linear (i.e., S(n) <= n)?

14 12a.14 Parallel Execution Metrics (continued)
Suppose S(n) > n. Then t_s* / t_p > n, which implies t_s* > n*t_p.
If we implement the parallel solution on 1 processor (where the 1 processor does the work of all the others), it would take n*t_p to complete the work.
However, this would be a sequential program that is faster than the fastest sequential program, a contradiction; hence S(n) <= n.

15 12a.15 Parallel Execution Metrics (continued)
Efficiency: a measure of how well the processors are being used:

    E(n) = (execution time on 1 processor) / (execution time on n processors * n) = t_s / (t_p * n) = S(n) / n

An efficiency of 100% means that the processors are all busy; 50% means that they are used half of the time on average.

16 12a.16 Parallel Execution Metrics (continued)
Cost: a measure of how much total CPU time is used (or wasted) to perform the computation:

    Cost = t_p * n = t_s / E(n)
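A small sketch that computes all three metrics; t_s = 25 comes from the example on the next slide, while t_p and n are invented here because the slides' table did not survive the transcript:

#include <stdio.h>

int main(void) {
    double ts = 25.0;   /* sequential time, from slide 12a.17 */
    double tp = 5.0;    /* hypothetical parallel time */
    int    n  = 10;     /* hypothetical processor count */

    double speedup    = ts / tp;       /* S(n) = t_s / t_p */
    double efficiency = speedup / n;   /* E(n) = S(n) / n  */
    double cost       = tp * n;        /* Cost = t_p * n   */

    printf("S(%d) = %.2f\n", n, speedup);                 /* 5.00        */
    printf("E(%d) = %.2f (%.0f%%)\n", n, efficiency,
           efficiency * 100.0);                           /* 0.50 (50%)  */
    printf("Cost  = %.2f vs t_s = %.2f\n", cost, ts);     /* 50.00 vs 25 */
    return 0;
}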

17 12a.17 Example of Metrics
Suppose we have a problem where the sequential execution time is 25 units and the parallel times are shown in the table below:
[Table of parallel execution times]

18 12a.18 Example of Metrics (continued)

19 12a.19 Example of Metrics (continued)

