Computer Science 320: Measuring Speedup

What Is Running Time?
- T(N, K) says that the running time T is a function of the problem size N and the number of processors K.
- To determine speedup, we'll always measure both T_seq(N, 1) (the sequential version on one processor) and T_par(N, K) (the parallel version on K > 1 processors).
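As an illustration (not from the slides), here is one way to measure a single running time in Java; solve is a hypothetical stand-in for the program under test:

```java
// Hypothetical sketch: measure one value of T(N, 1) in seconds.
public class Timing {
    // Placeholder workload standing in for the real program.
    static void solve(int n) {
        double sum = 0.0;
        for (int i = 0; i < n; ++i) sum += Math.sqrt(i);
        if (sum < 0) System.out.println(sum);  // defeat dead-code elimination
    }

    public static void main(String[] args) {
        int n = Integer.parseInt(args[0]);   // problem size N
        long t0 = System.nanoTime();         // wall-clock start
        solve(n);
        long t1 = System.nanoTime();         // wall-clock end
        System.out.printf("T(%d, 1) = %.3f sec%n", n, (t1 - t0) / 1e9);
    }
}
```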

What Is Speed?
- Speed is the rate at which program runs can be done: S(N, K) = 1 / T(N, K).
- It is measured in program runs per second, rather than seconds per run.

What Is Speedup?
- Speedup is the speed of the parallel version running on K processors relative to the sequential version running on one processor: Speedup(N, K) = S_par(N, K) / S_seq(N, 1).
- Why is the sequential speed in the denominator?

Speedup in Terms of Running Time
- Since S = 1 / T, Speedup(N, K) = T_seq(N, 1) / T_par(N, K).
- Ideally, the speedup should equal K; this is called linear speedup.
- The real speedup of most algorithms is sublinear.

What Is Efficiency?
- Eff(N, K) = Speedup(N, K) / K.
- Efficiency is usually a fraction less than 1.
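Putting the three definitions above together, here is a minimal Java sketch; the running times and processor count are invented values, not measurements from the course:

```java
// Sketch: compute speed, speedup, and efficiency from two
// (assumed) measured running times.
public class Metrics {
    public static void main(String[] args) {
        double tSeq = 120.0;  // T_seq(N, 1) in seconds (made-up value)
        double tPar = 35.0;   // T_par(N, K) in seconds (made-up value)
        int k = 4;            // number of processors K

        double sSeq = 1.0 / tSeq;      // S_seq(N, 1): runs per second
        double sPar = 1.0 / tPar;      // S_par(N, K): runs per second
        double speedup = sPar / sSeq;  // equivalently tSeq / tPar
        double eff = speedup / k;      // Eff(N, K), usually < 1

        System.out.printf("Speedup = %.2f, Efficiency = %.2f%n", speedup, eff);
        // Prints: Speedup = 3.43, Efficiency = 0.86
    }
}
```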

Amdahl's Law
- The sequential portion of a parallel program puts an upper bound on the speedup, and hence the efficiency, it can achieve.
- The sequential parts usually run at startup and cleanup.

What Is the Sequential Fraction?
- The sequential fraction F is the fraction of the code that must be run sequentially.
- F * T(N, 1) is the running time of the sequential part on a single processor.
- (1 - F) * T(N, 1) is the running time of the parallel part on a single processor.

Amdahl's Law
- T(N, K) = F * T(N, 1) + (1 - F) * T(N, 1) / K.
- This is the running time when the parallel part is divided equally among K processors.

Speedup and Efficiency in Terms of the Sequential Fraction
- Dividing T(N, 1) by the Amdahl running time T(N, K) gives Speedup(N, K) = 1 / (F + (1 - F) / K).
- Dividing the speedup by K gives Eff(N, K) = 1 / (K * F + 1 - F).

Consequences of Amdahl's Law
- As K increases, the speedup approaches 1 / F.
- As K goes to infinity, the efficiency approaches 0.
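The next two slides plot predictions of this kind. As a sketch, the following tabulates predicted speedup and efficiency for an assumed sequential fraction F = 0.1 (a value chosen for illustration, not taken from the course):

```java
// Sketch: tabulate Amdahl's-Law predictions for an assumed F.
public class Amdahl {
    public static void main(String[] args) {
        double f = 0.1;  // assumed sequential fraction
        for (int k = 1; k <= 8; ++k) {
            double speedup = 1.0 / (f + (1.0 - f) / k);
            double eff = speedup / k;  // = 1 / (K*F + 1 - F)
            System.out.printf("K=%d  Speedup=%.2f  Eff=%.2f%n", k, speedup, eff);
        }
        // The speedup can never exceed 1 / F, no matter how large K gets.
        System.out.printf("Speedup limit (1/F) = %.1f%n", 1.0 / f);
    }
}
```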

Predicted Speedups

Predicted Efficiencies

A Normal Plot for up to 8 Processors

Experimentally Determining F
- To calculate the sequential fraction F from running time measurements, rearrange Amdahl's Law to get F = (K * T(N, K) - T(N, 1)) / (K * T(N, 1) - T(N, 1)).
- This quantity is called the experimentally determined sequential fraction (EDSF).
- If F is not constant, that is, if a plot of EDSF versus K is not a horizontal line, something is wrong with the program.
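As a sketch, the following computes the EDSF for each K > 1 from a set of made-up running times; for a program that obeys Amdahl's Law, the printed F values come out roughly equal:

```java
// Sketch: experimentally determined sequential fraction (EDSF)
// from (invented) running-time measurements.
public class Edsf {
    public static void main(String[] args) {
        double t1 = 100.0;                 // T(N, 1), made-up measurement
        int[] ks = {2, 3, 4};              // processor counts K
        double[] tk = {55.0, 40.0, 32.5};  // T(N, K), made-up measurements

        for (int i = 0; i < ks.length; ++i) {
            int k = ks[i];
            double f = (k * tk[i] - t1) / (k * t1 - t1);
            System.out.printf("K=%d  F=%.3f%n", k, f);  // all print F=0.100
        }
        // If the F values are not roughly constant across K,
        // something is wrong with the program (or the measurements).
    }
}
```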

Measuring Running Times
- Running times are not the same on each run with the same data set.
- Take the minimum time of several runs, not the average time. Why?

Experimental Setup
- Close all unnecessary applications and prevent remote logins.
- Choose N so that T_seq(N, 1) is at least 60 seconds.
- Run the sequential program 7 times on each data size N, and take the minimum time.

Experimental Setup
- For each data size N and for each K from 1 up to the number of available processors, run the parallel program 7 times and take the minimum time as T_par(N, K).
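A minimal sketch of this protocol; runOnce is a hypothetical placeholder for launching the actual sequential or parallel program:

```java
// Sketch: measure T(N, K) as the minimum of 7 timed runs.
public class MeasureMin {
    // Placeholder: time one run of the program on K processors.
    static double runOnce(int n, int k) {
        long t0 = System.nanoTime();
        // ... launch the sequential (k == 1) or parallel (k > 1) run here ...
        long t1 = System.nanoTime();
        return (t1 - t0) / 1e9;  // seconds
    }

    static double minOf7(int n, int k) {
        double best = Double.POSITIVE_INFINITY;
        for (int run = 0; run < 7; ++run) {
            best = Math.min(best, runOnce(n, k));  // keep the fastest run
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.printf("T(1000000, 4) = %.3f sec%n", minOf7(1_000_000, 4));
    }
}
```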

Running Time of Key Search

Speedup of Key Search

Efficiency of Key Search

EDSF of Key Search