
Parallel System Performance CS 524 – High-Performance Computing.


1 Parallel System Performance CS 524 – High-Performance Computing

2 Parallel System Performance (CS 524, Wi 2003/04 - Asim Karim @ LUMS)
Parallel system = algorithm + hardware
Measures of the problem:
- Problem size: e.g., the dimension N in vector and matrix computations
- Floating-point operations
- Execution time
Measures of the hardware:
- Number of processors, p
- Interconnection network performance (channel bandwidth, cost, diameter, etc.)
- Memory system characteristics (sizes, bandwidth, etc.)

3 Performance Metrics
Execution time:
- Serial run time (T_S): the time elapsed between the beginning and the end of execution on a sequential computer
- Parallel run time (T_P): the time that elapses from the moment parallel execution starts to the moment the last processor finishes execution
Speedup (S): the ratio of the serial execution time of the best sequential algorithm to the parallel execution time
Efficiency (E): the effective fractional utilization of the parallel hardware
Cost (C): the sum of the times each processor spends on the problem

4 Speedup
Speedup, S = T_S/T_P
- Measures the benefit of parallelizing a program
- Usually less than the number of processors, p (sublinear speedup)
- Can S be greater than p (superlinear speedup)?
[Figure: speedup S vs. number of processors p, showing sublinear (typical), linear, and superlinear curves]

5 Efficiency and Cost
Efficiency, E = S/p
- Measures utilization of the processors for problem computation only
- Usually ranges from 0 to 1
- Can efficiency be greater than 1?
Cost, C = pT_P (also known as work or processor-time product)
- Measures the sum of the times spent by each processor
- Cost-optimal: the cost of solving a problem on a parallel computer is proportional to the execution time of the fastest known sequential algorithm on a single processor
- Note that E = T_S/C
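The three metrics can be sketched as a small Python helper; the function name and the example run times below are illustrative, not from the slides:

```python
def parallel_metrics(t_serial, t_parallel, p):
    """Compute speedup, efficiency, and cost from measured run times."""
    speedup = t_serial / t_parallel      # S = T_S / T_P
    efficiency = speedup / p             # E = S / p
    cost = p * t_parallel                # C = p * T_P
    return speedup, efficiency, cost

# Example: 100 s serially, 15 s on 8 processors (sublinear speedup, E < 1).
S, E, C = parallel_metrics(100.0, 15.0, 8)
print(f"S = {S:.2f}, E = {E:.2f}, C = {C:.1f}")
```

Note that E = S/p = (T_S/T_P)/p = T_S/(pT_P) = T_S/C, matching the identity on the slide.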

6 Amdahl's Law
Let W = the total work needed to solve a problem and W_S = the work that is serial (i.e., not parallelizable). The maximum possible speedup on p processors (assuming no superlinear speedup) is:
S = W/[W_S + (W - W_S)/p]
- If a problem has 10% serial computation, the maximum speedup is 10
- If a problem has 1% serial computation, the maximum speedup is 100
As the number of processors p increases, speedup is bounded above by W/W_S.
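Amdahl's formula is easy to check numerically. A minimal sketch, normalizing W = 1 so that W_S is the serial fraction (the function name is illustrative):

```python
def amdahl_speedup(serial_fraction, p):
    """Maximum speedup on p processors when serial_fraction of the work
    cannot be parallelized.  With W = 1 and W_S = serial_fraction:
    S = W / (W_S + (W - W_S)/p)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)

for f in (0.10, 0.01):
    # As p grows, S approaches the limit W/W_S = 1/f.
    print(f"{f:.0%} serial: S(p=100) = {amdahl_speedup(f, 100):.2f}, "
          f"limit = {1/f:.0f}")
```

Even with p = 100, a 10% serial fraction yields a speedup of only about 9.2, well under the processor count; the limit 1/f is approached but never reached.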

7 Execution Time
In a distributed memory model, the execution time is T_P = t_comp + t_comm
- t_comp: computation time
- t_comm: communication time for explicit sends and receives of messages
In a shared memory model, T_P consists of computation time plus communication time for memory accesses. Communication is not specified explicitly, so execution time is CPU time, determined much as for sequential algorithms.

8 Message Passing Communication Overhead
Parameters for determining the communication time, t_comm:
- Startup time (t_s): the time required to handle a message at the sending processor, including the time to prepare the message, execute the routing algorithm, and establish an interface between the local processor and the router
- Per-hop time (t_h): the time for the message header to travel between two directly connected processors; also known as node latency
- Per-word transfer time (t_w): the time for a word to traverse a link; if the channel bandwidth is r words per second, then t_w = 1/r
For a single word sent over a single link: t_comm = t_s + t_h + t_w

9 Store-and-Forward Routing (1)
Store-and-forward routing: a message traverses a path with multiple links; each intermediate processor on the path forwards the message to the next processor only after it has received and stored the entire message.

10 Store-and-Forward Routing (2)
Communication overhead/cost:
- Message size = m words
- Path length = l links
- Communication overhead: t_comm = t_s + (mt_w + t_h)l
- Usually t_h is small compared to mt_w, so the cost simplifies to t_comm = t_s + mt_w l
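The exact and simplified store-and-forward costs can be sketched directly from the formulas above; the function name and the sample parameter values are illustrative, not from the slides:

```python
def store_and_forward_cost(t_s, t_h, t_w, m, l):
    """Store-and-forward cost for an m-word message over l links.

    Returns the exact cost t_s + (m*t_w + t_h)*l and the simplified
    cost t_s + m*t_w*l obtained by dropping t_h (valid when t_h << m*t_w).
    """
    exact = t_s + (m * t_w + t_h) * l
    simplified = t_s + m * t_w * l
    return exact, simplified

# Illustrative (made-up) parameters, e.g. in microseconds:
exact, simplified = store_and_forward_cost(t_s=50, t_h=1, t_w=0.5, m=1024, l=4)
```

For a large message (m*t_w = 512 here vs. t_h = 1), the two results differ by only l*t_h, which justifies the simplification.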

11 Cut-Through Routing (1)
Cut-through routing: a message is forwarded at intermediate nodes without waiting for the entire message to arrive.

12 Cut-Through Routing (2)
Wormhole routing is cut-through routing with pipelining through the network:
- The message is partitioned into small pieces called flits (flow control digits)
- There is no buffering in memory; a busy link causes the worm to stall, and deadlock may ensue
Communication cost/overhead:
- Message size = m words
- Path length = l links
- Communication cost: t_comm = t_s + mt_w + lt_h
- Again, since t_h is small compared to mt_w, the cost simplifies to t_comm = t_s + mt_w
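A quick comparison with store-and-forward shows why cut-through wins for multi-hop paths: only the header pays the per-hop cost, while the data pipelines behind it instead of being retransmitted on every link. A minimal sketch with illustrative, made-up parameters:

```python
def cut_through_cost(t_s, t_h, t_w, m, l):
    """Cut-through cost: t_s + l*t_h + m*t_w.  The header pays per-hop
    time on each of l links; the m data words follow in a pipeline."""
    return t_s + l * t_h + m * t_w

# Illustrative parameters (e.g. microseconds): 1024-word message, 4 links.
t_s, t_h, t_w, m, l = 50, 1, 0.5, 1024, 4
sf = t_s + (m * t_w + t_h) * l       # store-and-forward, from slide 10
ct = cut_through_cost(t_s, t_h, t_w, m, l)
print(f"store-and-forward: {sf}, cut-through: {ct}")
```

Cut-through replaces the l-fold factor on m*t_w with a single m*t_w term, so its advantage grows with both message size and path length.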

