Defining Performance Which airplane has the best performance?

Slides:



Advertisements
Similar presentations
CS1104: Computer Organisation School of Computing National University of Singapore.
Advertisements

CS2100 Computer Organisation Performance (AY2014/2015) Semester 2.
TU/e Processor Design 5Z032 1 Processor Design 5Z032 The role of Performance Henk Corporaal Eindhoven University of Technology 2009.
1  1998 Morgan Kaufmann Publishers Chapter 2 Performance Text in blue is by N. Guydosh Updated 1/25/04*
100 Performance ENGR 3410 – Computer Architecture Mark L. Chang Fall 2006.
Computer Organization and Architecture 18 th March, 2008.
Chapter 1 CSF 2009 Computer Performance. Defining Performance Which airplane has the best performance? Chapter 1 — Computer Abstractions and Technology.
CSCE 212 Chapter 4: Assessing and Understanding Performance Instructor: Jason D. Bakos.
1 Introduction Rapidly changing field: –vacuum tube -> transistor -> IC -> VLSI (see section 1.4) –doubling every 1.5 years: memory capacity processor.
1 CSE SUNY New Paltz Chapter 2 Performance and Its Measurement.
Performance D. A. Patterson and J. L. Hennessey, Computer Organization & Design: The Hardware Software Interface, Morgan Kauffman, second edition 1998.
Computer ArchitectureFall 2007 © September 17, 2007 Karem Sakallah CS-447– Computer Architecture.
CS/ECE 3330 Computer Architecture Chapter 1 Performance / Power.
1  1998 Morgan Kaufmann Publishers and UCB Performance CEG3420 Computer Design Lecture 3.
EET 4250: Chapter 1 Performance Measurement, Instruction Count & CPI Acknowledgements: Some slides and lecture notes for this course adapted from Prof.
Chapter 4 Assessing and Understanding Performance
Fall 2001CS 4471 Chapter 2: Performance CS 447 Jason Bakos.
Lecture 3: Computer Performance
1 Chapter 4. 2 Measure, Report, and Summarize Make intelligent choices See through the marketing hype Key to understanding underlying organizational motivation.
1 ECE3055 Computer Architecture and Operating Systems Lecture 2 Performance Prof. Hsien-Hsin Sean Lee School of Electrical and Computer Engineering Georgia.
Introduction to Computer Architecture SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING SUMMER 2015 RAMYAR SAEEDI.
Computer Organization and Design Performance Montek Singh Mon, April 4, 2011 Lecture 13.
1 Computer Performance: Metrics, Measurement, & Evaluation.
1 Embedded Systems Computer Architecture. Embedded Systems2 Memory Hierarchy Registers Cache RAM Disk L2 Cache Speed (faster) Cost (cheaper per-byte)
순천향대학교 정보기술공학부 이 상 정 1 4. Accessing and Understanding Performance.
C OMPUTER O RGANIZATION AND D ESIGN The Hardware/Software Interface 5 th Edition Chapter 1 Computer Abstractions and Technology Sections 1.5 – 1.11.
1 CPS4150 Chapter 4 Assessing and Understanding Performance.
10/19/2015Erkay Savas1 Performance Computer Architecture – CS401 Erkay Savas Sabanci University.
1 CS/EE 362 Hardware Fundamentals Lecture 9 (Chapter 2: Hennessy and Patterson) Winter Quarter 1998 Chris Myers.
Performance.
Performance – Last Lecture Bottom line performance measure is time Performance A = 1/Execution Time A Comparing Performance N = Performance A / Performance.
Performance Lecture notes from MKP, H. H. Lee and S. Yalamanchili.
Morgan Kaufmann Publishers
1  1998 Morgan Kaufmann Publishers How to measure, report, and summarize performance (suorituskyky, tehokkuus)? What factors determine the performance.
TEST 1 – Tuesday March 3 Lectures 1 - 8, Ch 1,2 HW Due Feb 24 –1.4.1 p.60 –1.4.4 p.60 –1.4.6 p.60 –1.5.2 p –1.5.4 p.61 –1.5.5 p.61.
1  1998 Morgan Kaufmann Publishers Where we are headed Performance issues (Chapter 2) vocabulary and motivation A specific instruction set architecture.
September 10 Performance Read 3.1 through 3.4 for Wednesday Only 3 classes before 1 st Exam!
Performance – Last Lecture Bottom line performance measure is time Performance A = 1/Execution Time A Comparing Performance N = Performance A / Performance.
1  1998 Morgan Kaufmann Publishers Lectures for 2nd Edition Note: these lectures are often supplemented with other materials and also problems from the.
Chapter 4. Measure, Report, and Summarize Make intelligent choices See through the marketing hype Understanding underlying organizational aspects Why.
Lec2.1 Computer Architecture Chapter 2 The Role of Performance.
L12 – Performance 1 Comp 411 Computer Performance He said, to speed things up we need to squeeze the clock Study
EGRE 426 Computer Organization and Design Chapter 4.
Performance Computer Organization II 1 Computer Science Dept Va Tech January 2009 © McQuain & Ribbens Defining Performance Which airplane has.
COMPUTER ARCHITECTURE & OPERATIONS I Instructor: Yaohang Li.
COD Ch. 1 Introduction + The Role of Performance.
BITS Pilani, Pilani Campus Today’s Agenda Role of Performance.
CPEN Digital System Design Assessing and Understanding CPU Performance © Logic and Computer Design Fundamentals, 4 rd Ed., Mano Prentice Hall © Computer.
Performance. Moore's Law Moore's Law Related Curves.
Computer Organization
Computer Architecture & Operations I
CS161 – Design and Architecture of Computer Systems
Performance Lecture notes from MKP, H. H. Lee and S. Yalamanchili.
September 2 Performance Read 3.1 through 3.4 for Tuesday
Computer Architecture & Operations I
Prof. Hsien-Hsin Sean Lee
Morgan Kaufmann Publishers
CSCE 212 Chapter 4: Assessing and Understanding Performance
CS2100 Computer Organisation
Computer Performance He said, to speed things up we need to squeeze the clock.
Morgan Kaufmann Publishers Computer Performance
Performance Lecture notes from MKP, H. H. Lee and S. Yalamanchili.
January 25 Did you get mail from Chun-Fa about assignment grades?
Computer Performance Read Chapter 4
Performance.
Chapter 2: Performance CS 447 Jason Bakos Fall 2001 CS 447.
CS161 – Design and Architecture of Computer Systems
Computer Organization and Design Chapter 4
CS2100 Computer Organisation
Presentation transcript:

Defining Performance Which airplane has the best performance? Computer Science Dept Va Tech January 2009 ©2006-2009 McQuain & Ribbens

Computer Performance: TIME, TIME, TIME Response Time (latency): how long it takes to perform a task Throughput: total work done per unit time For now, we will focus on response time… Have them raise their hands when answering questions Computer Science Dept Va Tech January 2009 ©2006-2009 McQuain & Ribbens

Execution Time Elapsed Time CPU time Our focus: user CPU time - counts everything (disk and memory accesses, I/O , etc.) - a useful number, but often not good for comparison purposes CPU time - doesn't count I/O or time spent running other programs - can be broken up into system time, and user time Our focus: user CPU time - time spent executing the lines of code that are "in" our program Computer Science Dept Va Tech January 2009 ©2006-2009 McQuain & Ribbens

Relative Performance Example: time taken to run a program 10s on A, 15s on B Execution TimeB / Execution TimeA = 15s / 10s = 1.5 So A is 1.5 times faster than B Computer Science Dept Va Tech January 2009 ©2006-2009 McQuain & Ribbens

CPU Clocking Operation of digital hardware are governed by a constant-rate clock. Clock (cycles) Data transfer and computation Update state Clock period Clock period: duration of a clock cycle e.g., 250ps = 0.25ns = 250×10–12s Clock frequency (rate): cycles per second e.g., 4.0GHz = 4000MHz = 4.0×109Hz Computer Science Dept Va Tech January 2009 ©2006-2009 McQuain & Ribbens

CPU Time Performance improved by Reducing number of clock cycles Increasing clock rate Hardware designer must often trade off clock rate against cycle count Computer Science Dept Va Tech January 2009 ©2006-2009 McQuain & Ribbens

CPU Time Example Computer A: 2GHz clock, 10s CPU time Designing Computer B Aim for 6s CPU time Can do faster clock, but causes 1.2 × clock cycles How fast must Computer B clock be? Computer Science Dept Va Tech January 2009 ©2006-2009 McQuain & Ribbens

How many cycles are required for a program? Could assume that number of cycles equals number of instructions: 1st instruction 2nd instruction 3rd instruction 4th 5th 6th ... time This assumption is incorrect, different instructions take different amounts of time on different machines. Why? hint: remember that these are machine instructions, not lines of C code Computer Science Dept Va Tech January 2009 ©2006-2009 McQuain & Ribbens

Now that we understand cycles… A given program will require - some number of instructions (machine instructions) - some number of cycles - some number of seconds We have a vocabulary that relates these quantities: - cycle time (seconds per cycle) - clock rate (cycles per second) - CPI (cycles per instruction) a floating point intensive application might have a higher CPI - MIPS (millions of instructions per second) this would be higher for a program using simple instructions Computer Science Dept Va Tech January 2009 ©2006-2009 McQuain & Ribbens

Performance Performance is determined by execution time Do any of the other variables equal performance? # of cycles to execute program? # of instructions in program? # of cycles per second? average # of cycles per instruction? average # of instructions per second? Common pitfall: thinking one of the variables is indicative of performance when it really isn’t. Computer Science Dept Va Tech January 2009 ©2006-2009 McQuain & Ribbens

CPI Example Computer A: Cycle Time = 250ps, CPI = 2.0 Computer B: Cycle Time = 500ps, CPI = 1.2 Same ISA Which is faster, and by how much? A is faster… …by this much Computer Science Dept Va Tech January 2009 ©2006-2009 McQuain & Ribbens

CPI in More Detail If different instruction classes take different numbers of cycles Weighted average CPI Relative frequency Computer Science Dept Va Tech January 2009 ©2006-2009 McQuain & Ribbens

CPI Example Alternative compiled code sequences using instructions in classes A, B, C Class A B C CPI for class 1 2 3 IC in sequence 1 IC in sequence 2 4 Sequence 1: IC = 5 Clock Cycles = 2×1 + 1×2 + 2×3 = 10 Avg. CPI = 10/5 = 2.0 Sequence 2: IC = 6 Clock Cycles = 4×1 + 1×2 + 1×3 = 9 Avg. CPI = 9/6 = 1.5 Computer Science Dept Va Tech January 2009 ©2006-2009 McQuain & Ribbens

# of Instructions Example A compiler designer is trying to decide between two code sequences for a particular machine. Based on the hardware implementation, there are three different classes of instructions: Class A, Class B, and Class C, and they require one, two, and three cycles (respectively). The first code sequence has 5 instructions: 2 of A, 1 of B, and 2 of C The second sequence has 6 instructions: 4 of A, 1 of B, and 1 of C. Which sequence will be faster? How much? What is the CPI for each sequence? Computer Science Dept Va Tech January 2009 ©2006-2009 McQuain & Ribbens

MIPS example Two different compilers are being tested for a 4 GHz. machine with three different classes of instructions: Class A, Class B, and Class C, which require one, two, and three cycles (respectively). Both compilers are used to produce code for a large piece of software. The first compiler's code uses 5 million Class A instructions, 1 million Class B instructions, and 2 million Class C instructions. The second compiler's code uses 10 million Class A instructions, 1 million Class B instructions, and 1 million Class C instructions. Which sequence will be faster according to MIPS? Which sequence will be faster according to execution time? The analysis is as follows: ----------------------------------------------------------- cycles | 13 x 10^6 15 x 10^6 instructions | 8 x 10^6 12 x 10^6 time | 13 15 | -- x 10^-3 -- x 10^-3 | 4 4 MIPS | 16 48 | -- x 10^3 -- x 10^3 | 13 15 Computer Science Dept Va Tech January 2009 ©2006-2009 McQuain & Ribbens

Amdahl's Law Execution Time After Improvement = Execution Time Unaffected +( Execution Time Affected / Amount of Improvement ) Example: Suppose a program runs in 100 seconds on a machine, with multiply responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 4 times faster? How about making it 5 times faster? Principle: Make the common case fast Computer Science Dept Va Tech January 2009 ©2006-2009 McQuain & Ribbens

Example Suppose we enhance a machine making all floating-point instructions run five times faster. If the execution time of some benchmark before the floating-point enhancement is 10 seconds, what will the speedup be if half of the 10 seconds is spent executing floating-point instructions? We are looking for a benchmark to show off the new floating-point unit described above, and want the overall benchmark to show a speedup of 3. One benchmark we are considering runs for 100 seconds with the old floating-point hardware. How much of the execution time would floating-point instructions have to account for in this program in order to yield our desired speedup on this benchmark? Computer Science Dept Va Tech January 2009 ©2006-2009 McQuain & Ribbens

Remember Performance is specific to a particular program/s - Total execution time is a consistent summary of performance For a given architecture performance increases come from: - increases in clock rate (without adverse CPI affects) - improvements in processor organization that lower CPI - compiler enhancements that lower CPI and/or instruction count - algorithm/language choices that affect instruction count Pitfall: expecting improvement in one aspect of a machine’s performance to affect the total performance Computer Science Dept Va Tech January 2009 ©2006-2009 McQuain & Ribbens

Performance Summary Performance depends on Algorithm: affects IC, possibly CPI Programming language: affects IC, CPI Compiler: affects IC, CPI Instruction set architecture: affects IC, CPI, Tc Computer Science Dept Va Tech January 2009 ©2006-2009 McQuain & Ribbens