CSE 160 – Lecture 9 Speed-up, Amdahl’s Law, Gustafson’s Law, efficiency, basic performance metrics.

Exam 1
- 9 May 2000, 1 week from today
- Covers material through HW #2 + through Amdahl's Law/efficiency in this lecture
- Closed book, closed notes. You may use a calculator.
- At the end of this lecture I will ask you for a list of things you want reviewed during the next lecture.

Concurrency/Granularity
- One key to efficient parallel programming is concurrency.
- For parallel tasks we talk about granularity: the size of the computation between synchronization points.
  - Coarse: heavyweight processes + IPC (interprocess communication: PVM, MPI, ...)
  - Fine: instruction level (e.g., SIMD)
  - Medium: threads + [message passing + shared-memory synchronization]

One measurement of granularity
- Computation-to-communication ratio (CCR) = (computation time) / (communication time)
- Increasing this ratio is often a key to good efficiency.
- How does this measure granularity?
  - High CCR = ? grain
  - Low CCR = ? grain
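
A minimal sketch of how one might estimate the CCR empirically, assuming MPI (one of the libraries mentioned above); the work loop, problem size, and the choice of MPI_Allreduce are illustrative and not taken from the lecture. Each step times its compute phase and its communication phase separately and reports the ratio.

/* Hypothetical CCR estimate: time computation and communication separately. */
#include <mpi.h>
#include <stdio.h>

#define N      (1 << 20)   /* local problem size (illustrative) */
#define STEPS  100

int main(int argc, char **argv)
{
    static double x[N];
    double t_comp = 0.0, t_comm = 0.0, local, global = 0.0;
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    for (int i = 0; i < N; i++) x[i] = (double)i;

    for (int step = 0; step < STEPS; step++) {
        double t0 = MPI_Wtime();
        local = 0.0;
        for (int i = 0; i < N; i++)              /* computation phase */
            local += x[i] * 1.0000001;
        double t1 = MPI_Wtime();
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE,
                      MPI_SUM, MPI_COMM_WORLD);  /* communication phase */
        t_comp += t1 - t0;
        t_comm += MPI_Wtime() - t1;
    }

    if (rank == 0)
        printf("CCR = %.1f  (comp %.3f s / comm %.3f s), sum = %g\n",
               t_comp / t_comm, t_comp, t_comm, global);
    MPI_Finalize();
    return 0;
}

A large ratio suggests coarse grain; shrinking the per-step work or adding processors typically drives the ratio, and with it the efficiency, down.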

Communication Overhead
- Another important metric is communication overhead: the time (measured in instructions) that a zero-byte message consumes in a process.
  - It measures time spent on communication that cannot be spent on computation.
- Overlapped messages: the portion of a message's lifetime that can occur concurrently with computation
  - time the bits are on the wire
  - time the bits are in the switch or NIC

Many little things add up …
- Lots of small overheads add up in a parallel program.
- Efficient implementations demand:
  - overlapping (a.k.a. hiding) the overheads as much as possible (see the sketch below)
  - keeping the non-overlapping overheads as small as possible
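
A sketch of the "overlapping" idea, assuming MPI nonblocking point-to-point calls; the ring exchange and the dummy work loop are made-up examples, not from the slides. Communication is posted first, independent computation runs while it may progress, and only the non-overlapped remainder is paid in the final wait.

/* Hypothetical overlap sketch: post nonblocking sends/receives, compute, then wait. */
#include <mpi.h>
#include <stdio.h>

#define N 100000

int main(int argc, char **argv)
{
    static double sendbuf[N], recvbuf[N];
    double sum = 0.0;
    int rank, size;
    MPI_Request reqs[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int right = (rank + 1) % size;
    int left  = (rank - 1 + size) % size;
    for (int i = 0; i < N; i++) sendbuf[i] = rank + i;

    /* Start the communication ... */
    MPI_Irecv(recvbuf, N, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(sendbuf, N, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

    /* ... and overlap it with computation that does not touch recvbuf. */
    for (int i = 0; i < N; i++)
        sum += sendbuf[i] * 0.5;

    /* Pay only the non-overlapped part of the communication here. */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    printf("rank %d: sum = %.1f, first received value = %.1f\n",
           rank, sum, recvbuf[0]);
    MPI_Finalize();
    return 0;
}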

Speed-Up
- S(n) = (execution time on a single CPU) / (execution time on n parallel processors) = ts / tp
- The serial time ts is for the best serial algorithm, which may be a different algorithm than the parallel version.
  - Divide-and-conquer: quicksort, O(N log N), vs. mergesort

Linear and Superlinear Speedup
- Linear speedup: S(N) = N for N processors
  - The parallel program is perfectly scalable.
  - Rarely achieved in practice.
- Superlinear speedup: S(N) > N for N processors
  - Theoretically not possible.
  - How is this achievable on real machines? Think about the physical resources of N processors.

Space-Time Diagrams
- Show communication patterns/dependencies.
- XPVM has a nice view of this.
- [Diagram: processes 1-3 on a time axis, with intervals marked Computing, Message, Overhead, and Waiting.]

What is the Maximum Speedup?
- f = fraction of the computation (algorithm) that is serial and cannot be parallelized
  - data setup
  - reading/writing to a single disk file
- ts = f*ts + (1-f)*ts = serial portion + parallelizable portion
- tp = f*ts + ((1-f)*ts)/n
- S(n) = ts / (f*ts + ((1-f)*ts)/n) = n / (1 + (n-1)*f)   <- Amdahl's Law
- Limit as n -> ∞: S(n) -> 1/f
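
A small numeric check of the formula above; the serial fraction and processor counts are illustrative values only (f = 0.04 matches the example on the next slide).

/* Amdahl's Law: S(n) = n / (1 + (n-1)*f), with limit 1/f as n grows. */
#include <stdio.h>

static double amdahl(double f, int n)
{
    return n / (1.0 + (n - 1) * f);
}

int main(void)
{
    double f = 0.04;                         /* 4% serial portion */
    int counts[] = { 4, 16, 64, 256, 1024 };

    for (int i = 0; i < 5; i++)
        printf("n = %4d   S(n) = %6.2f\n", counts[i], amdahl(f, counts[i]));
    printf("limit as n -> infinity: 1/f = %.2f\n", 1.0 / f);
    return 0;
}

The speedup approaches the 1/f = 25 ceiling only slowly: about 10 at n = 16 and still only about 24 at n = 1024.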

Example of Amdahl's Law
- Suppose a calculation has a 4% serial portion. What is the maximum speedup on 16 processors?
  - 16 / (1 + (16-1)*0.04) = 16/1.6 = 10
- What is the maximum speedup over all n?
  - 1/f = 1/0.04 = 25

More to think about …
- Amdahl's Law works on a fixed problem size.
- This is reasonable if your only goal is to solve a given problem faster.
- What if you also want to solve a larger problem?
  - Gustafson's Law (scaled speedup)

Gustafson's Law
- Fix the execution time on a single processor:
  - s + p = serial part + parallelizable part = 1
  - S(n) = (s + p) / (s + p/n) = 1 / (s + (1-s)/n)   <- Amdahl's Law
- Now instead fix the execution time on a parallel computer:
  - 1 = s + π, with π = the parallel part
  - Ss(n) = (s + π*n) / (s + π) = n + (1-n)*s
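
A quick side-by-side of the two speedup formulas for the same serial fraction; the value s = 0.04 is arbitrary. It illustrates why the next slide calls Amdahl's Law too conservative when the problem is scaled with the machine.

/* Compare Amdahl (fixed problem size) with Gustafson (fixed parallel time). */
#include <stdio.h>

int main(void)
{
    double s = 0.04;                               /* serial fraction */
    for (int n = 1; n <= 1024; n *= 4) {
        double amdahl    = n / (1.0 + (n - 1) * s);
        double gustafson = n + (1 - n) * s;
        printf("n = %4d   Amdahl %7.2f   Gustafson %7.2f\n",
               n, amdahl, gustafson);
    }
    return 0;
}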

More on Gustafson's Law
- Derived by fixing the parallel execution time (Amdahl fixed the problem size, and therefore the serial execution time).
- For many practical situations, Gustafson's Law makes more sense: have a bigger computer, solve a bigger problem.
- Amdahl's Law turns out to be too conservative for high-performance computing.

Efficiency
- E(n) = S(n)/n * 100%
- A program with linear speedup is 100% efficient.
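
A worked example with made-up timings (not real measurements) showing how measured run times turn into speedup and efficiency.

/* Efficiency from hypothetical measured times: S(n) = ts/tp, E(n) = S(n)/n. */
#include <stdio.h>

int main(void)
{
    double ts   = 100.0;                               /* best serial time, s */
    int    n[]  = { 1, 2, 4, 8, 16 };
    double tp[] = { 100.0, 52.0, 27.0, 15.0, 9.0 };    /* parallel times, s */

    for (int i = 0; i < 5; i++) {
        double S = ts / tp[i];
        double E = S / n[i] * 100.0;
        printf("n = %2d   S = %5.2f   E = %5.1f%%\n", n[i], S, E);
    }
    return 0;
}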

Example questions
- Given a (scaled) speedup of 20 on 32 processors, what is the serial fraction from Amdahl's Law? From Gustafson's Law?
- A program attains 89% efficiency with a serial fraction of 2%. Approximately how many processors are being used, according to Amdahl's Law?

Evaluation of Parallel Programs
- Basic metrics
  - bandwidth
  - latency
- Parallel metrics
  - barrier speed
  - broadcast/multicast
  - reductions (e.g., global sum, average, ...)
  - scatter speed
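
A hedged micro-benchmark sketch for two of the parallel metrics above (barrier speed and a global-sum reduction), assuming MPI collectives; the trial count is arbitrary, and a real harness would discard warm-up iterations and report minima or medians.

/* Hypothetical timing of MPI_Barrier and a global-sum MPI_Allreduce. */
#include <mpi.h>
#include <stdio.h>

#define TRIALS 1000

int main(int argc, char **argv)
{
    int rank;
    double local = 1.0, global;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);                 /* synchronize before timing */
    double t0 = MPI_Wtime();
    for (int i = 0; i < TRIALS; i++)
        MPI_Barrier(MPI_COMM_WORLD);
    double t_barrier = (MPI_Wtime() - t0) / TRIALS;

    t0 = MPI_Wtime();
    for (int i = 0; i < TRIALS; i++)
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    double t_sum = (MPI_Wtime() - t0) / TRIALS;

    if (rank == 0)
        printf("barrier: %.2f us   global sum: %.2f us\n",
               t_barrier * 1e6, t_sum * 1e6);
    MPI_Finalize();
    return 0;
}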

Bandwidth
- Various methods of measuring bandwidth:
- Ping-pong
  - Measure multiple round trips of a message of length L.
  - BW = 2*L*(#trials)/t
- Send + ACK
  - Send #trials messages of length L, then wait for a single ACKnowledgement.
  - BW = L*(#trials)/t
- Is there a difference in what you are measuring?
- Simple model: tcomm = tstartup + n*tdata (n = message length in data items)

Ping-Pong
- All the overheads (including startup) are included in every message.
- When the message is very long, you get an accurate indication of bandwidth.
- [Diagram: processes 1 and 2 on a time axis, exchanging messages back and forth.]
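
A sketch of the ping-pong measurement between ranks 0 and 1, assuming MPI point-to-point calls (run with at least two ranks); the message length L and the trial count are arbitrary. With L very small, the same loop gives a latency estimate of t / (2 * trials).

/* Hypothetical ping-pong: rank 0 and rank 1 bounce an L-byte message. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define L       (1 << 20)   /* message length in bytes */
#define TRIALS  100

int main(int argc, char **argv)
{
    int rank;
    char *buf = malloc(L);

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < TRIALS; i++) {
        if (rank == 0) {
            MPI_Send(buf, L, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, L, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, L, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, L, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t = MPI_Wtime() - t0;

    if (rank == 0)
        printf("BW = %.1f MB/s   (2*L*trials / t)\n",
               2.0 * L * TRIALS / t / 1e6);

    free(buf);
    MPI_Finalize();
    return 0;
}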

Send + Ack
- The overheads of the messages are masked.
- Has the effect of decoupling startup latency from bandwidth (concurrency).
- More accurate with a large number of trials.
- [Diagram: processes 1 and 2 on a time axis; a stream of messages from 1 to 2, then a single ACK back.]

In the limit …
- As messages get larger, both methods converge to the same number.
- How does one measure latency?
  - Ping-pong with short messages over multiple trials: latency = t / (2*(#trials))
- What things aren't being measured (or are being smeared) by these two methods?
- Will talk about cost models and the start of LogP analysis next time.

How you measure performance
- It is important to understand exactly what you are measuring and how you are measuring it.

Exam …
- What topics/questions do you want reviewed?
- Are there specific questions on the homework/programming assignment that need talking about?

Questions
- Go over programming assignment #1.