COMP60611 Fundamentals of Parallel and Distributed Systems

Lecture 6: Introduction to Performance Modelling
John Gurd, Graham Riley
Centre for Novel Computing, School of Computer Science, University of Manchester

Overview (Lectures 6 & 7)
Aims of performance modelling:
  Allows the comparison of algorithms.
  Gives an indication of the scalability of an algorithm on a machine (a parallel system) as both the problem size and the number of processors change: "complexity analysis of parallel algorithms".
  Enables reasoned choices at the design stage.
Overview of an approach to performance modelling:
  Based on the approach of Foster and Grama et al.
  Targets a generic multicomputer (a model of message passing).
  Limitations.
A worked example:
  Vector sum reduction (compute the sum of the elements of a vector).
Summary
30/05/2019

Aims of Performance Modelling
In this and the next lecture we will look at modelling the performance of algorithms that compute a result.
  Issues of correctness are relatively straightforward.
We are interested in questions such as:
  How long will an algorithm take to execute?
  How much memory is required (though we will not consider this in detail here)?
  Does the algorithm scale as we vary the number of processors and/or the problem size? What does scaling mean?
  How do the performances of different algorithms compare?
Typically, we focus on one phase of a computation at a time, e.g. assume that start-up and initialisation have been done, or that these phases have been modelled separately.

An Approach to Performance Modelling
Based on a generic multicomputer (see next slide).
Defined in terms of tasks that undertake computation and communicate with other tasks as necessary.
  A task may be an agglomeration of smaller (sub)tasks.
Assumes a simple, but realistic, approach to communication between tasks:
  Based on channels that connect pairs of tasks.
Seeks an analytical expression for execution time (T) as a function of (at least) the problem size (N) and the number of processors (P) (and, often, the number of tasks (U)):
  $T = f(N, P, U, \ldots)$

A Generic Multicomputer
[Figure: a generic multicomputer. Several nodes (…), each a CPU with its own local memory, are connected by an interconnect.]

Task-channel Model
Tasks execute concurrently.
  The number of tasks can vary during execution.
A task encapsulates a sequential program and local memory.
Tasks are connected by channels to other tasks.
  From the point of view of a task, a channel is either an input or an output channel.
In addition to reading from, and writing to, its local memory, a task can:
  Send messages on output channels.
  Receive messages on input channels.
  Create new tasks.
  Terminate.

Task-channel Model
A channel connecting two tasks acts as a message queue.
A send operation is asynchronous: it completes immediately.
  Sends are considered to be ‘free’ (take zero time)(?!).
A receive operation is synchronous: execution of a task is blocked until a message is available.
  Receives may cause waiting (idle time) and take a finite time to complete (as data is transmitted from one task to another).
Channels can be created dynamically (also taking zero time!).
Tasks can be mapped to physical processors in various ways.
  The mapping does not affect the semantics of the program, but it may affect performance.
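The send/receive semantics described above can be imitated in a few lines of Python. This is only a sketch under stated assumptions: threads stand in for tasks, an unbounded `queue.Queue` stands in for a channel, and the names `producer`, `consumer`, `run_pipeline` and the `None` end-of-stream marker are hypothetical, not part of the lecture.

```python
import queue
import threading


def producer(out_channel, values):
    # Asynchronous send: put() on an unbounded queue returns immediately.
    for v in values:
        out_channel.put(v)
    out_channel.put(None)  # hypothetical end-of-stream marker


def consumer(in_channel, results):
    # Synchronous receive: get() blocks until a message is available.
    total = 0
    while True:
        msg = in_channel.get()
        if msg is None:
            break
        total += msg
    results.append(total)


def run_pipeline(values):
    channel = queue.Queue()  # unbounded FIFO acting as the message queue
    results = []
    tasks = [
        threading.Thread(target=producer, args=(channel, values)),
        threading.Thread(target=consumer, args=(channel, results)),
    ]
    for t in tasks:
        t.start()
    for t in tasks:
        t.join()
    return results[0]


print(run_pipeline([1, 2, 3, 4]))  # 10
```

The consumer may sit idle inside `get()` while it waits for the producer, which is the idle time the model attributes to receives.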

Specifics of Performance Modelling
Assume a processor is either computing, communicating or idling.
Thus, the total execution time can be found as either the sum of the time spent in each activity for any particular processor (j):
  $T = T_{comp}^{j} + T_{comm}^{j} + T_{idle}^{j}$
or as the sum of each activity over all processors divided by the number of processors (P):
  $T = \frac{1}{P} \left( \sum_{i=1}^{P} T_{comp}^{i} + \sum_{i=1}^{P} T_{comm}^{i} + \sum_{i=1}^{P} T_{idle}^{i} \right)$
Such aggregate totals are often easier to calculate.
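A small numerical check may help here. The sketch below assumes the usual form of the two expressions (per-processor time = computing + communicating + idling; aggregate time = sum of all three activities over all processors, divided by P). The function names and the timing figures are made up for illustration.

```python
def time_on_processor(t_comp, t_comm, t_idle):
    """Execution time seen by one processor j."""
    return t_comp + t_comm + t_idle


def time_from_aggregates(t_comp_all, t_comm_all, t_idle_all):
    """Execution time from totals over all processors, divided by P."""
    p = len(t_comp_all)
    return (sum(t_comp_all) + sum(t_comm_all) + sum(t_idle_all)) / p


# Hypothetical timings (seconds) for P = 2 processors.
T_COMP = [6.0, 4.0]
T_COMM = [1.0, 2.0]
T_IDLE = [1.0, 2.0]

per_processor = [
    time_on_processor(c, m, i) for c, m, i in zip(T_COMP, T_COMM, T_IDLE)
]
aggregate = time_from_aggregates(T_COMP, T_COMM, T_IDLE)
print(per_processor, aggregate)  # [8.0, 8.0] 8.0
```

Both views give the same T because idle time absorbs whatever a processor does not spend computing or communicating.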

Definitions

Cost of Messages
A simple model of the cost (in time) of a message is:
  $T_{msg} = t_s + t_w L$
where:
  $T_{msg}$ is the time to receive a message,
  $t_s$ is the start-up cost (in time) of receiving a message,
  $t_w$ is the cost (in time) per word (s/word),
  $1/t_w$ is the bandwidth (words/s),
  $L$ is the number of words in the message.
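As a rough illustration, the message cost model (start-up cost plus per-word cost times message length) can be written as a one-line function. The function name `message_time` and the parameter values are illustrative assumptions, not measurements of any real machine.

```python
def message_time(t_s, t_w, length_words):
    """Time to receive one message: start-up cost plus per-word cost."""
    return t_s + t_w * length_words


# Hypothetical parameters, in microseconds.
T_S = 100.0  # start-up cost
T_W = 0.5    # cost per word, i.e. a bandwidth of 1 / T_W = 2 words per microsecond

print(message_time(T_S, T_W, 1000))  # 600.0
print(message_time(T_S, T_W, 1))     # 100.5
```

With these numbers a one-word message costs almost as much as the start-up alone, which is why the model favours a few long messages over many short ones.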

Cost of Communication
Thus, $T_{comm}$ is the sum of all message times:
  $T_{comm} = \sum_{\text{all messages}} T_{msg}$
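A minimal sketch of this sum, assuming every message obeys the same linear cost model (start-up cost `t_s` plus `t_w` per word). The function names and the message lengths are hypothetical.

```python
def message_time(t_s, t_w, length_words):
    """Time to receive one message: start-up cost plus per-word cost."""
    return t_s + t_w * length_words


def communication_time(t_s, t_w, message_lengths):
    """Total communication time: the sum of the individual message times."""
    return sum(message_time(t_s, t_w, n) for n in message_lengths)


# Hypothetical parameters (microseconds) and message lengths (words).
T_S, T_W = 100.0, 0.5
LENGTHS = [1000, 1000, 200]

total = communication_time(T_S, T_W, LENGTHS)
# Equivalent closed form: (number of messages) * t_s + t_w * (total words).
closed_form = len(LENGTHS) * T_S + T_W * sum(LENGTHS)
print(total, closed_form)  # 1400.0 1400.0
```

The closed form makes the two contributions visible: one start-up cost per message, plus a term proportional to the total volume of data.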

Limitations of the Model
The (basic) model presented in this lecture ignores the hierarchical nature of the memory of real computer systems:
  Cache behaviour,
  The impact of network architecture,
  Issues of competition for bandwidth (contention).
The basic model can be extended to cope with any or all of these complicating factors.
Experience with real performance analysis on real systems helps the designer to choose when and what extra modelling might be helpful.

Relative Performance Metrics: Speed-up and Efficiency
Define relative speed-up as the ratio of the execution time of the parallel algorithm on one processor to the corresponding time on P processors:
  $S_{rel} = T_1 / T_P$
Define relative efficiency as:
  $E_{rel} = T_1 / (P \, T_P)$
The latter is a fractional measure of the time that processors spend doing useful work (i.e., the time it takes to do all the necessary useful work divided by the total time on all P processors).
It characterises the efficiency of an algorithm on a system, for any given problem size and any number of processors.
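The two definitions can be checked with a few lines of Python. This is a sketch that assumes the standard forms (speed-up = time on one processor divided by time on P processors; efficiency = speed-up divided by P). The function names and timings are invented for illustration.

```python
def relative_speedup(t_1, t_p):
    """Time of the parallel algorithm on 1 processor over its time on P."""
    return t_1 / t_p


def relative_efficiency(t_1, t_p, p):
    """Useful work time divided by total time across all P processors."""
    return t_1 / (p * t_p)


# Hypothetical timings (seconds) for a run on P = 8 processors.
T_1, T_P, P = 100.0, 20.0, 8

print(relative_speedup(T_1, T_P))        # 5.0
print(relative_efficiency(T_1, T_P, P))  # 0.625
```

A speed-up of 5 on 8 processors means the processors spend 62.5% of their combined time on useful work.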

Absolute Performance Metrics
Relative speed-up can be misleading! (Why?)
Define absolute speed-up (or absolute efficiency) with reference to the execution time, $T_{ref}$, of an implementation of the best known sequential algorithm for the problem-at-hand:
  $S_{abs} = T_{ref} / T_P$
  $E_{abs} = T_{ref} / (P \, T_P)$
Note: the best known sequential algorithm may solve the problem in a fashion that is significantly different to that of the parallel algorithm.
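One common reason relative speed-up can flatter an algorithm, offered here as an assumption rather than as the lecturers' stated answer, is that the parallel algorithm run on one processor may be slower than the best known sequential algorithm. The sketch below contrasts the two metrics, assuming absolute speed-up is the reference time divided by the time on P processors. All names and timings are hypothetical.

```python
def speedup(t_baseline, t_p):
    """Speed-up of a P-processor run relative to a chosen baseline time."""
    return t_baseline / t_p


def efficiency(t_baseline, t_p, p):
    """Efficiency of a P-processor run relative to a chosen baseline time."""
    return t_baseline / (p * t_p)


# Hypothetical timings (seconds).
T_1 = 100.0   # parallel algorithm on one processor
T_REF = 60.0  # best known sequential algorithm
T_P = 20.0    # parallel algorithm on P processors
P = 8

print(speedup(T_1, T_P), speedup(T_REF, T_P))              # 5.0 3.0
print(efficiency(T_1, T_P, P), efficiency(T_REF, T_P, P))  # 0.625 0.375
```

The same run looks like a 5x gain in relative terms but only a 3x gain over the best sequential code.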

Overhead
Another way of viewing this is to look at the difference between an ideal parallel execution time and that actually observed (usually longer).
The ideal is simply the time for the best known sequential algorithm divided by the number of processors. (Why?)
The difference between the actually observed and the ideal is termed the execution time overhead, $O_P$, which is the average overhead time per processor:
  $O_P = T_P - T_{ref} / P$
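The overhead can be computed directly from the description above: the observed time on P processors minus the ideal time (reference sequential time divided by P). The following is a sketch with invented timings; the function names are hypothetical.

```python
def ideal_time(t_ref, p):
    """Ideal parallel time: best sequential time shared perfectly over P."""
    return t_ref / p


def overhead(t_ref, t_p, p):
    """Average overhead per processor: observed time minus ideal time."""
    return t_p - ideal_time(t_ref, p)


# Hypothetical timings (seconds) for a run on P = 8 processors.
T_REF, T_P, P = 60.0, 20.0, 8

print(ideal_time(T_REF, P))     # 7.5
print(overhead(T_REF, T_P, P))  # 12.5
```

Dividing the ideal time by the observed time (7.5 / 20.0 = 0.375) recovers the absolute efficiency, so overhead and efficiency are two views of the same gap.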

Summary
We have introduced two views of (approaches to) performance modelling:
  the task-channel model, and
  performance metrics (relative and absolute).
Usually the task-channel model, which reflects the underlying hardware activity, is used to develop formulae for the gross execution times found in the performance metrics.
Relative performance metrics can be misleading, so we prefer to use absolute performance metrics.
