1 CS/COE0447 Computer Organization & Assembly Language CHAPTER 4 Assessing and Understanding Performance.

Slides:



Advertisements
Similar presentations
CS1104: Computer Organisation School of Computing National University of Singapore.
Advertisements

CS2100 Computer Organisation Performance (AY2014/2015) Semester 2.
Computer Abstractions and Technology
TU/e Processor Design 5Z032 1 Processor Design 5Z032 The role of Performance Henk Corporaal Eindhoven University of Technology 2009.
Evaluating Performance
Chapter 1 CSF 2009 Computer Performance. Defining Performance Which airplane has the best performance? Chapter 1 — Computer Abstractions and Technology.
CSCE 212 Chapter 4: Assessing and Understanding Performance Instructor: Jason D. Bakos.
Chapter 4 Assessing and Understanding Performance Bo Cheng.
CIS629 Fall Lecture Performance Overview Execution time is the best measure of performance: simple, intuitive, straightforward. Two important.
Copyright © 1998 Wanda Kunkle Computer Organization 1 Chapter 2.5 Comparing and Summarizing Performance.
Computer ArchitectureFall 2007 © September 17, 2007 Karem Sakallah CS-447– Computer Architecture.
CS/ECE 3330 Computer Architecture Chapter 1 Performance / Power.
Assessing and Understanding Performance B. Ramamurthy Chapter 4.
Copyright © 1998 Wanda Kunkle Computer Organization 1 Chapter 2.1 Introduction.
Chapter 4 Assessing and Understanding Performance
Fall 2001CS 4471 Chapter 2: Performance CS 447 Jason Bakos.
Lecture 3: Computer Performance
1 Lecture 10: FP, Performance Metrics Today’s topics:  IEEE 754 representations  FP arithmetic  Evaluating a system Reminder: assignment 4 due in a.
CS430 – Computer Architecture Lecture - Introduction to Performance
CIS429/529 Winter 07 - Performance - 1 Performance Overview Execution time is the best measure of performance: simple, intuitive, straightforward. Two.
1 Chapter 4. 2 Measure, Report, and Summarize Make intelligent choices See through the marketing hype Key to understanding underlying organizational motivation.
Lecture 2: Technology Trends and Performance Evaluation Performance definition, benchmark, summarizing performance, Amdahl’s law, and CPI.
Computer Organization and Design Performance Montek Singh Mon, April 4, 2011 Lecture 13.
1 Computer Performance: Metrics, Measurement, & Evaluation.
Lecture 2: Computer Performance
1 CHAPTER 2 THE ROLE OF PERFORMANCE. 2 Performance Measure, Report, and Summarize Make intelligent choices Why is some hardware better than others for.
C OMPUTER O RGANIZATION AND D ESIGN The Hardware/Software Interface 5 th Edition Chapter 1 Computer Abstractions and Technology Sections 1.5 – 1.11.
CDA 3101 Fall 2013 Introduction to Computer Organization Computer Performance 28 August 2013.
10/19/2015Erkay Savas1 Performance Computer Architecture – CS401 Erkay Savas Sabanci University.
1 CS/EE 362 Hardware Fundamentals Lecture 9 (Chapter 2: Hennessy and Patterson) Winter Quarter 1998 Chris Myers.
Performance.
Computer Performance Computer Engineering Department.
Performance Lecture notes from MKP, H. H. Lee and S. Yalamanchili.
CEN 316 Computer Organization and Design Assessing and Understanding Performance Mansour AL Zuair.
Morgan Kaufmann Publishers
1  1998 Morgan Kaufmann Publishers How to measure, report, and summarize performance (suorituskyky, tehokkuus)? What factors determine the performance.
Performance Performance
1 Lecture 2: Performance, MIPS ISA Today’s topics:  Performance equations  MIPS instructions Reminder: canvas and class webpage:
September 10 Performance Read 3.1 through 3.4 for Wednesday Only 3 classes before 1 st Exam!
4. Performance 4.1 Introduction 4.2 CPU Performance and Its Factors
Lecture 5: 9/10/2002CS170 Fall CS170 Computer Organization and Architecture I Ayman Abdel-Hamid Department of Computer Science Old Dominion University.
Lec2.1 Computer Architecture Chapter 2 The Role of Performance.
L12 – Performance 1 Comp 411 Computer Performance He said, to speed things up we need to squeeze the clock Study
EGRE 426 Computer Organization and Design Chapter 4.
CMSC 611: Advanced Computer Architecture Performance & Benchmarks Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some.
Performance Computer Organization II 1 Computer Science Dept Va Tech January 2009 © McQuain & Ribbens Defining Performance Which airplane has.
Jan. 5, 2000Systems Architecture II1 Machine Organization (CS 570) Lecture 2: Performance Evaluation and Benchmarking * Jeremy R. Johnson Wed. Oct. 4,
COMPUTER ARCHITECTURE & OPERATIONS I Instructor: Yaohang Li.
Chapter 1 Performance & Technology Trends. Outline What is computer architecture? Performance What is performance: latency (response time), throughput.
Lecture 3. Performance Prof. Taeweon Suh Computer Science & Engineering Korea University COSE222, COMP212, CYDF210 Computer Architecture.
June 20, 2001Systems Architecture II1 Systems Architecture II (CS ) Lecture 1: Performance Evaluation and Benchmarking * Jeremy R. Johnson Wed.
BITS Pilani, Pilani Campus Today’s Agenda Role of Performance.
Measuring Performance II and Logic Design
Lecture 2: Performance Evaluation
Computer Architecture & Operations I
CS161 – Design and Architecture of Computer Systems
Performance Lecture notes from MKP, H. H. Lee and S. Yalamanchili.
September 2 Performance Read 3.1 through 3.4 for Tuesday
Defining Performance Which airplane has the best performance?
Computer Architecture & Operations I
Morgan Kaufmann Publishers
CSCE 212 Chapter 4: Assessing and Understanding Performance
CS2100 Computer Organisation
Chapter 1 Computer Abstractions & Technology Performance Evaluation
Performance Lecture notes from MKP, H. H. Lee and S. Yalamanchili.
Performance.
Chapter 2: Performance CS 447 Jason Bakos Fall 2001 CS 447.
Computer Organization and Design Chapter 4
CS/COE0447 Computer Organization & Assembly Language
CS2100 Computer Organisation
Presentation transcript:

1 CS/COE0447 Computer Organization & Assembly Language CHAPTER 4 Assessing and Understanding Performance

2 Program Performance Program performance is measured in terms of time! Program execution time deals with –Number of instructions executed to complete a job –How many clock cycles are needed to execute a single instruction –The length of the clock cycle (clock cycle time)

3 Clock, Clock Cycle Time Circuits in computers are “clocked” At each clock rising (or falling) edge, some specified actions are done, usually within the next rising (or falling) edge Instructions typically require more than one cycle to execute Function block (made of circuits) clock clock cycle time

4 Program Performance time = (# of clock cycles)  (clock cycle time) # of clock cycles = (# of instructions executed)  (average cycles per instruction) time = (# of instructions executed)  (average clock cycles per instruction)  (clock cycle time) time = cycle x s cycle cycle = instruction x cycle (ave) SO: instruction time (s) = instruction x cycle (ave) x s instruction cycle

5 Example 1 You have a machine with a CPU running at 1GHz. The same company releases its 2GHz CPU with 100% compatibility with the existing 1GHz CPU, and you are considering upgrading. What is the expected performance improvement from doing so? Assume that programs have 40% memory-access instructions, and each memory access takes 10ns on average. All other instructions take exactly one cycle for execution. Answer: in class

6 From WikiPedia Amdahl's law, named after computer architect Gene Amdahl, is used to find the maximum expected improvement to an overall system when only part of the system is improved. It is often used in parallel computing to predict the theoretical maximum speedup using multiple processors.computer architectGene Amdahl parallel computingspeedup

7 Amdahl’s Law (cont) The law is concerned with the speedup achievable from an improvement to a computation that affects a proportion P of that computation where the improvement has a speedup of S. Amdahl's law states that the overall speedup of applying the improvement will be: 1 ((1-P) + P/S) Our example: P =.6 and S = 2 1/((1-.6) + (.6/2)) = 1.43 This is the maximum speedup possible

8 Example 2 If a computer issues 30 network requests per second and each request is on average 64 KB, will a 100 Mbit Ethernet link be sufficient? (printer, accessing files, …) KB = 10^3 bytes Byte = 8 bits Mbit = 10^6 A 100 Mbit Ethernet: 10^8 bit/s “bitrate”

9 Answer Ethernet: 10^8 bit/s KB = Kilobyte; Kilo = 10^3; byte = 8 bits 30 request/s * 64 KB/request * 10^3 x 8 bit/KB (the units cancel to leave bit/s) 30 * 64 * 8 * 10^3 = 3 * 6.4 * 8 * 10^5 < 10^8 (or use a calculator to compute it exactly) So, yes, it is sufficient

10 Why Performance Evaluation? DESIGN EVALUATION

11 Defining Performance What do you mean when you say a computer has better performance than another? We need a “metric” for comparison –One metric may not fully characterize a system a number of metrics may be relevant –Important metrics for computer systems Response time (a.k.a. execution time) Throughput

12 Response Time vs. Throughput Which has higher performance? –Time to deliver 1 passenger –Time to deliver 400 passengers Time for 1 job is called –Response time or execution time Jobs per day is called –Throughput or bandwidth PlaneDC to ParisTop SpeedPassengers Throughput (pmph) Boeing hours610 mph470286,700 BAD/Sud Concorde 3 hours1350 mph132178,200

13 Some Definitions Throughput is in units of things per second –Bigger is better If we are primarily concerned with response time –Performance = 1 / execution time –Bigger is better  shorter execution time “Machine A is N times faster than B” –= performance (A) / performance (B) = execution time (B) / execution time (A)

14 Response Time vs. Throughput Time of Concorde vs. Boeing 747? –Concord is (6.5 hours/3 hours) faster –2.2 times faster Throughput of Boeing 747 vs. Concorde –286,700 pmph / 178,200 pmph –1.6 times higher Boeing 747 is 1.6 times (or 60%) higher in terms of throughput Concorde is 2.2 times (or 120%) faster in terms of flying time (response time) We will focus primarily on execution time for a single job for the remaining discussions

15 Regarding Time Straightforward definition of time –Total time to complete a task, including disk accesses, memory accesses, other I/O activities, operating system overheads, … –Terms for this: “Real time”, “response time”, “elapsed time” Alternative: time spent by CPU only on your program (since multiple processes may run at the same time) –“CPU execution time” or “CPU time” –Often divided into system CPU time (OS) and user CPU time (user program)

16 Clock

17 Measuring Time In terms of seconds CPU time: computers are constructed using digital circuitry running at a “clock” –Constant rate –Determines when events take place Clock cycle time = length of a clock or clock period = 1 / clock rate –1ns if 1GHz clock –0.5ns if 2GHz clock –0.25ns if 4GHz clock

18 Measuring Time w/ Clocks CPU execution time for program –Clock cycles for a program  clock cycle time –Clock cycles for a program / clock rate

19 Measuring Time w/ Clocks, cont’d Total clock cycles for a program –Instructions for a program (=instruction count)  average clock cycles per instruction CPI Time=(# of instr.)  CPI  (clock cycle time) Looking at the units: –s = inst * cycle/inst * s/cycle

20 Workload A set of programs run on a computer is a workload –Actual collection of applications –Synthetic programs (for experimentation) To evaluate two computer systems, a user would simply compare the execution time of the workload on the two computers

21 Benchmarks A set of applications relevant for performance evaluation SPEC (Standard Performance Evaluation Corporation) –CPU benchmarks –Server benchmarks –Graphics benchmarks –… EEMBC (Embedded Microprocessor Benchmark Consortium) –Automotive –Consumer –Network –Telecom –Office –…

22 Summarizing Performance A is 10 times faster than B for program 1 B is 10 times faster than A for program 2 Although the above statements are correct individually, they present a confusing picture! Computer AComputer B Program 1 (sec)110 Program 2 (sec) Total time (sec)

23 Summarizing Performance, cont’d Arithmetic mean (AM) = (  Time i ) / N Weighted AM = (  Time i  W i ),  W i = 1 AM is a special case of weighted AM where W i = 1/N

24 SPEC Benchmark SPEC CPU2000 benchmark –12 integer benchmarks –14 floating-point benchmarks To get a SPECmark –Run each program on the target machine –Get the performance ratio by dividing the pre- provided execution time (based on an old SUN workstation) with the execution time obtained

25 Amdahl’s Law (in terms of time) An optimization is usually applicable to only a limited portion of program execution –E.g., A larger cache; improved CPU frequency; improved FSB frequency; … Time improved = Time unaffected + Time affected /(Improvement Factor) “Make the common case fast!”

26 Amdahl’s Law - example A program runs in 100 seconds on a computer, with multiply operations responsible for 80 seconds of this time How much do I have to improve the speed of multiplication, if I want my program to run 5 times faster? Time improved = Time unaffected + Time affected /(Improvement Factor) 20 s = 20 s + 80 s / n 0 = 80 s / n There is no amount by which we can improve multiply to achieve a fivefold increase in performance!

27 Fallacies and Pitfalls Pitfall: Expecting the improvement of one aspect of a computer to increase performance by an amount proportional to the size of the improvement Pitfall: Using a subset of the performance equation as a performance metric

28 To Summarize… Performance evaluation is an important stage of an engineering process We are interested in measuring computer performance –Software improvement –Hardware improvement –… Defining performance –Need relevant metric! Latency vs. throughput

29 To Summarize…, cont’d Response time = time to finish a given single job Throughput = # of jobs done in a second Time = # of clock cycles  clock cycle time # of clock cycles = # of instructions  CPI

30 To Summarize…, cont’d Best workload is one that comes from real applications Benchmarks are a set of applications to aid performance evaluation Summarizing results –Arithmetic mean (AM) –Weighted mean Amdahl’s law –Specifies overall performance improvement due to a limited-scope optimization