CEN 316 Computer Organization and Design Assessing and Understanding Performance Mansour AL Zuair.

CEN316 - Chapter 1-2
Chapter 2: Performance
Performance:
–What it is: measures of performance
The CPU performance equation:
–Execution time as the measure
–What affects execution time
–Examples
Popular alternative metrics
–Why they don't work
Benchmarks
Amdahl's law

Performance is Time
Time to do the task (execution time)
–Execution time, response time, latency
Tasks per unit time (sec, minute, ...)
–Throughput, bandwidth
Plane             Speed     DC to Paris  Passengers  Throughput (PMPH)
Boeing 747        610 MPH   6.5 hours    470         286,700
BAC/Sud Concorde  1350 MPH  3 hours      132         178,200
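
The throughput column is simply passengers multiplied by speed (passenger-miles per hour). A minimal Python sketch that reproduces it; the dictionary layout and the name `throughput_pmph` are illustrative assumptions, not part of the slides:

```python
# Passenger throughput in passenger-miles per hour (PMPH) = passengers * speed.
# Speeds and passenger counts come from the slide; all names are illustrative.
planes = {
    "Boeing 747": {"speed_mph": 610, "passengers": 470},
    "Concorde": {"speed_mph": 1350, "passengers": 132},
}


def throughput_pmph(plane):
    """Return passenger-miles per hour for one plane record."""
    return plane["passengers"] * plane["speed_mph"]


for name, plane in planes.items():
    print(f"{name}: {throughput_pmph(plane):,} PMPH")
```

The Concorde wins on response time (3 hours versus 6.5), while the 747 wins on throughput, which is why the two measures have to be kept apart.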

Throughput and Response Time
Example
–What is the effect of the following changes on throughput and response time?
»Increasing processor speed?
»Increasing the number of processors on the same system (multitask)?
»Is there any relation between response time and throughput?
»What about queuing?

Performance as Response Time
Performance is most often measured as response time or execution time for some task.
"X is n times faster than Y" means
$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$
Example
–Execution time of program P
»on X is 5 sec
»on Y is 10 sec
–X is 10 / 5 = 2 times faster than Y.

What Time to Measure?
Elapsed time (wall-clock time):
–Actual time from start to completion.
–Depends on CPU, system, I/O, etc.
–Often used in real benchmarks.
–The only suitable choice when I/O is included.
CPU time:
–Measures/analyzes CPU performance only.
–May be suitable when the machine is timeshared.
–Possibly includes both user and system components.
–User CPU time is our focus for the first part of the course.
Elapsed time = CPU time + idle time.
–Usually true, assuming time is accurately accounted for.
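
The difference between elapsed time and CPU time can be observed directly. The sketch below uses Python's standard `time.perf_counter` (wall clock) and `time.process_time` (user plus system CPU time of the current process); the workload `sample_task` and the helper `measure` are hypothetical names chosen for illustration:

```python
import time


def measure(task):
    """Run task() once and return (elapsed_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # wall-clock (elapsed) time
    cpu_start = time.process_time()    # user + system CPU time of this process
    task()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


def sample_task():
    # Hypothetical workload: some computation, then a wait that uses no CPU.
    sum(i * i for i in range(200_000))
    time.sleep(0.05)


elapsed, cpu = measure(sample_task)
print(f"elapsed = {elapsed:.3f} s, CPU = {cpu:.3f} s, idle/other = {elapsed - cpu:.3f} s")
```

The sleep shows up in the elapsed time but not in the CPU time, which is the "idle time" term in the relation above.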

Metrics of Performance
Different performance metrics are appropriate at different levels.
[Figure: system levels (Application, Programming Language, Compiler, Instruction Set Architecture (ISA), Datapath, Control, Function Units, Transistors) shown alongside the metrics used at each level.]
Metrics shown in the figure:
–Answers per month
–Operations per second
–(Millions of) instructions per second: MIPS
–(Millions of) floating-point operations per second: MFLOP/s
–Cycles per instruction
–Cycles per second (clock rate)

Relating Processor Metrics
CPU execution time per program = CPU clock cycles/program × Clock cycle time
= CPU clock cycles/program ÷ Clock rate (frequency)
CPU clock cycles/program = Instructions/program × Clock cycles per instruction
Clock cycles per instruction (CPI) is an average measurement; it depends on:
–the ISA, the implementation, and the program measured
–CPI = CPU clock cycles/program ÷ Instructions/program
–Also, instructions per clock cycle: IPC = 1 / CPI
CPU execution time = Instructions × CPI × Clock cycle time
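
The CPU performance equation translates directly into code. A minimal sketch, assuming the clock rate is given in Hz; the function name `cpu_time` and the sample numbers are illustrative only:

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU execution time in seconds = instructions * CPI / clock rate."""
    clock_cycles = instruction_count * cpi   # CPU clock cycles for the program
    return clock_cycles / clock_rate_hz      # same as cycles * clock cycle time


# Hypothetical program: 1 billion instructions, CPI of 2.0, 500 MHz clock.
seconds = cpu_time(1_000_000_000, 2.0, 500_000_000)
print(f"{seconds} s")
```

Halving the CPI or doubling the clock rate each halve the result, which is why the three factors must always be considered together.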

Aspects of CPU Performance
Instead of reporting execution time in seconds, we often use cycles.
Clock "ticks" indicate when to start activities (one abstraction):
–cycle time = time between ticks = seconds per cycle
–clock rate (frequency) = cycles per second (1 Hz = 1 cycle/sec)
A 200 MHz clock has a cycle time of 1 / (200 × 10^6) s = 5 ns.

Example
Two machines A and B have the same ISA. A has a clock cycle time of 1 ns and a CPI of 2 for a certain program. B has a clock cycle time of 2 ns and a CPI of 1.2 for the same program. Which machine is faster, and by how much?
CPU time (A) = I × 2 × 1 ns = 2 × I ns
CPU time (B) = I × 1.2 × 2 ns = 2.4 × I ns
A is faster, by a factor of 2.4 / 2 = 1.2.
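
The comparison can be checked numerically. In this sketch the instruction count is an arbitrary placeholder because it cancels in the ratio; `cpu_time_ns` is an illustrative helper name:

```python
def cpu_time_ns(instructions, cpi, cycle_time_ns):
    """CPU time in nanoseconds = instructions * CPI * clock cycle time."""
    return instructions * cpi * cycle_time_ns


I = 1_000_000                        # arbitrary; cancels out in the ratio
time_a = cpu_time_ns(I, 2.0, 1.0)    # machine A: CPI 2, 1 ns cycle
time_b = cpu_time_ns(I, 1.2, 2.0)    # machine B: CPI 1.2, 2 ns cycle
ratio = time_b / time_a              # how many times longer B takes than A
print(f"B takes {ratio:.2f} times as long as A")
```

B has the lower CPI, yet its slower clock more than cancels that advantage.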

Example of Improving Performance
Given the following information:
–A program runs in 10 seconds on computer A, which has a 4 GHz clock
–Build a computer that runs the same program in 6 seconds
–Assumptions:
»Clock rate can be increased substantially
»Increasing the clock rate increases the number of clock cycles for this program by 20%
What is the target clock rate?
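
One way to work the question out, assuming the clock is 4 GHz and "20% more cycles" means the cycle count is multiplied by 1.2; the function name and parameters are illustrative:

```python
def target_clock_rate_hz(old_time_s, old_rate_hz, new_time_s, cycle_growth):
    """Clock rate needed to finish in new_time_s when the cycle count grows."""
    old_cycles = old_time_s * old_rate_hz     # cycles = time * clock rate
    new_cycles = old_cycles * cycle_growth    # e.g. 1.2 for 20% more cycles
    return new_cycles / new_time_s            # rate = cycles / time


rate = target_clock_rate_hz(10, 4e9, 6, 1.2)
print(f"target clock rate = {rate / 1e9:.1f} GHz")
```

Under these assumptions the new machine needs twice the original clock rate, even though the execution time only drops from 10 to 6 seconds.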

Organizational Trade-offs
Instruction set (and hence instruction count), CPI, and clock cycle time interact in complex ways.
[Figure: system levels (Application, Programming Language, Compiler, ISA, Datapath, Function Units, Transistors) annotated with the factors they influence: Instruction Mix, CPI, Cycle Time.]

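Cycles Per Instruction (CPI)
CPI = average number of clock cycles per instruction
–CPI = Clock Cycles / Instruction Count
Affected by both:
–the cost (in cycles) of each instruction type
–the frequency of the different instructions
»called the instruction mix
Useful way to compute CPI (for n instruction types):
$$\mathrm{CPI} = \sum_{i=1}^{n} \mathrm{CPI}_i \times F_i, \qquad F_i = \frac{I_i}{\text{Instruction Count}}$$
where $I_i$ is the number of executed instructions of type $i$ and $\mathrm{CPI}_i$ is the cycle count for that type.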

Example Comparing Code Segments
Given the following information from the hardware designer:
Instruction class  A  B  C
CPI                1  2  3
and the instruction counts per class for two code sequences:
Code sequence  A  B  C
1              2  1  2
2              4  1  1
Number of instructions? 5 for sequence 1 and 6 for sequence 2.
Which one is faster?
–CPU clock cycles1 = (2 × 1) + (1 × 2) + (2 × 3) = 10 cycles
–CPU clock cycles2 = (4 × 1) + (1 × 2) + (1 × 3) = 9 cycles
–Sequence 2 is faster, even though it executes more instructions.
What is the CPI for each sequence?
–CPI1 = 10 / 5 = 2
–CPI2 = 9 / 6 = 1.5
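
The same calculation as a short sketch. The class CPIs (A = 1, B = 2, C = 3) and the per-sequence counts are read from the slide's tables; the names `CPI_BY_CLASS`, `sequences`, and `cycles_and_cpi` are illustrative:

```python
# Cycles per instruction for each instruction class (from the slide).
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}

# Instruction counts per class for the two code sequences (from the slide).
sequences = {
    1: {"A": 2, "B": 1, "C": 2},
    2: {"A": 4, "B": 1, "C": 1},
}


def cycles_and_cpi(counts, cpi_by_class):
    """Return (instruction count, clock cycles, average CPI) for a sequence."""
    instructions = sum(counts.values())
    cycles = sum(n * cpi_by_class[cls] for cls, n in counts.items())
    return instructions, cycles, cycles / instructions


for seq_id, counts in sequences.items():
    n, cycles, cpi = cycles_and_cpi(counts, CPI_BY_CLASS)
    print(f"sequence {seq_id}: {n} instructions, {cycles} cycles, CPI = {cpi}")
```

Instruction count alone would favor sequence 1; the cycle count shows that sequence 2 finishes first.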

Example CPI Computation
RISC processor, register-register ISA, typical mix:
Instruction type  Frequency F_i  Cycles CPI_i  CPI contribution F_i × CPI_i  % of time
ALU               50%            1             0.5                           33%
Load              20%            2             0.4                           27%
Store             10%            2             0.2                           13%
Branch            20%            2             0.4                           27%
CPI = 0.5 + 0.4 + 0.2 + 0.4 = 1.5
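
A frequency-weighted CPI, together with each class's share of execution time, can be sketched as follows. The mix values are the ones on the slide; `weighted_cpi` and `time_shares` are illustrative names:

```python
# Instruction mix from the slide: class -> (frequency F_i, cycles CPI_i).
mix = {
    "ALU": (0.50, 1),
    "Load": (0.20, 2),
    "Store": (0.10, 2),
    "Branch": (0.20, 2),
}


def weighted_cpi(mix):
    """Average CPI = sum over classes of F_i * CPI_i."""
    return sum(freq * cycles for freq, cycles in mix.values())


def time_shares(mix):
    """Fraction of execution time per class = (F_i * CPI_i) / CPI."""
    cpi = weighted_cpi(mix)
    return {name: freq * cycles / cpi for name, (freq, cycles) in mix.items()}


print(f"CPI = {weighted_cpi(mix):.2f}")
for name, share in time_shares(mix).items():
    print(f"{name}: {share:.0%} of execution time")
```

ALU instructions are half of the mix but only about a third of the time, because every other class costs two cycles.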

Using the CPU Performance Equation
Example: Consider adding ALU instructions that can have one memory operand to the MIPS ISA to produce MIPSE:
–MIPSE = MIPS + ALU instructions with a memory operand.
–Initial mix and cycle counts on MIPS:
Instr class  Freq  Cycles  MIPS CPI
Load         30%   2       0.6
Store        15%   2       0.3
ALU op       40%   1       0.4
Branch       15%   2       0.3
Overall CPI                1.6
–Assume:
»The CPI of the MIPSE ALU-with-memory instruction is 2
»The MIPSE clock cycle is 1.25 times the MIPS clock cycle
»One half of the load instructions and a corresponding number of ALU instructions are replaced by ALU-with-memory instructions
–Which machine is faster?

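Solution
Normalize the mix to 100 instructions
–can be easier to calculate and enhances intuition
MIPS execution time = 160 cycles × CC_MIPS
On MIPSE, 15 loads and 15 ALU ops are replaced by 15 ALU-with-memory instructions:
–(15 loads × 2) + (15 stores × 2) + (25 ALU ops × 1) + (15 branches × 2) + (15 ALU-with-memory × 2) = 145 cycles
MIPSE execution time = 145 cycles × (1.25 × CC_MIPS)
MIPSE takes (145 × 1.25) / 160 ≈ 1.13 times as long as MIPS
So MIPS is about 1.13 times faster; MIPSE delivers only about 0.88 × the performance of MIPS.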
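
The cycle totals can be rechecked with a short sketch, normalized to 100 MIPS instructions as in the solution. The dictionary names and the `alu_mem` label for the new ALU-with-memory instruction are illustrative:

```python
def total_cycles(counts, cycles_per_instr):
    """Sum of (instruction count * cycles per instruction) over all classes."""
    return sum(n * cycles_per_instr[cls] for cls, n in counts.items())


cycles_per_instr = {"load": 2, "store": 2, "alu": 1, "branch": 2, "alu_mem": 2}

# Mix normalized to 100 instructions on MIPS (from the slide).
mips_counts = {"load": 30, "store": 15, "alu": 40, "branch": 15}

# On MIPSE, half of the loads and as many ALU ops become ALU-with-memory ops.
replaced = mips_counts["load"] // 2
mipse_counts = {
    "load": mips_counts["load"] - replaced,
    "store": mips_counts["store"],
    "alu": mips_counts["alu"] - replaced,
    "branch": mips_counts["branch"],
    "alu_mem": replaced,
}

mips_cycles = total_cycles(mips_counts, cycles_per_instr)
mipse_cycles = total_cycles(mipse_counts, cycles_per_instr)

# MIPSE's clock cycle is 1.25 times longer, so scale its cycle count.
relative_time = (mipse_cycles * 1.25) / mips_cycles
print(f"MIPS: {mips_cycles} cycles, MIPSE: {mipse_cycles} cycles")
print(f"MIPSE takes {relative_time:.2f} times as long as MIPS")
```

MIPSE executes fewer instructions and fewer cycles, but the longer clock cycle more than offsets the saving.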

Alternative Performance Metrics: MIPS
Use something other than time
–Often a good intention: find a simple metric
»bigger is better, a general measure, summarizes performance
Most common metric: MIPS (Millions of Instructions Per Second):
$$\mathrm{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6} = \frac{\text{Clock rate}}{\mathrm{CPI} \times 10^6}$$
Flaws in using MIPS
–Cannot compare machines with different instruction sets
–Varies between programs with different instruction mixes
»depends on the dynamic frequency of instructions
–Can vary inversely with performance!

Example of Problems
Consider an optimized and an unoptimized version of the same program. Instruction counts:
                     Memory  ALU   Branch  FP   Total instructions
Unoptimized program  100M    100M  30M     40M  270M
Optimized program    50M     50M   30M     40M  170M
Here are the cycle counts for the instructions:
                       Memory  ALU   Branch  FP    CPI
Cycles per instruction 2       1     3       5
Unoptimized program    200M    100M  90M     200M  2.2
Optimized program      100M    50M   90M     200M  2.6

Example continued
Assuming a 200 MHz clock:
–MIPS unoptimized = 200 / 2.2 = 91
–MIPS optimized = 200 / 2.6 = 77
–By MIPS: Performance unoptimized > Performance optimized
But look at execution time! (IC = instruction count, CR = clock rate)
–Execution time unoptimized = CPI × IC / CR = 2.2 × 270M / 200 MHz = 3 s
–Execution time optimized = CPI × IC / CR = 2.6 × 170M / 200 MHz = 2.2 s
–Performance optimized > Performance unoptimized
The MIPS measurement is the inverse of reality!
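
The contradiction can be reproduced from the raw counts. This sketch uses the unrounded CPI, so its figures differ slightly from the rounded values on the slide (about 91.5 MIPS and 2.95 s for the unoptimized program). The ALU instruction counts are inferred from the cycle table (ALU cycles at 1 cycle each); all names are illustrative:

```python
CLOCK_RATE_HZ = 200_000_000  # 200 MHz, as assumed on the slide

# Cycles per instruction for each class (from the slide).
CYCLES = {"memory": 2, "alu": 1, "branch": 3, "fp": 5}

# Instruction counts; ALU counts inferred from the ALU cycle totals.
programs = {
    "unoptimized": {"memory": 100_000_000, "alu": 100_000_000,
                    "branch": 30_000_000, "fp": 40_000_000},
    "optimized": {"memory": 50_000_000, "alu": 50_000_000,
                  "branch": 30_000_000, "fp": 40_000_000},
}


def mips_and_time(counts):
    """Return (MIPS rating, execution time in seconds) for one program."""
    instructions = sum(counts.values())
    cycles = sum(n * CYCLES[cls] for cls, n in counts.items())
    exec_time = cycles / CLOCK_RATE_HZ
    mips = instructions / (exec_time * 1e6)
    return mips, exec_time


for name, counts in programs.items():
    mips, seconds = mips_and_time(counts)
    print(f"{name}: {mips:.1f} MIPS, {seconds:.2f} s")
```

The optimized program removes only the cheap instructions, so its MIPS rating falls even though it finishes sooner.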

Another Alternative: MFLOPS
MFLOPS (Millions of Floating-point Operations Per Second):
–common metric in scientific/engineering and supercomputer arenas
$$\mathrm{MFLOPS} = \frac{\text{Floating-point operations}}{\text{Time} \times 10^6}$$
–Machine dependent: what is a floating-point op?
–often not where time is spent (i.e., not in FP operations)
–at best, no better than execution time
–at worst, much less informative and more deceptive

What are benchmarks?
Benchmarks: a set of programs that form a "workload" specifically chosen to measure performance.
SPEC (System Performance Evaluation Cooperative) creates standard sets of benchmarks, starting with SPEC89. The latest is SPEC CPU2006, which consists of 12 integer benchmarks (CINT2006) and 17 floating-point benchmarks (CFP2006). www.spec.org
There are also benchmark collections for power workloads (SPECpower_ssj2008), for mail workloads (SPECmail2008), for multimedia workloads (mediabench), ...

Comparing and Summarizing Performance
How do we summarize the performance of a benchmark set with a single number?
–First the execution times are normalized, giving the "SPEC ratio" (bigger is faster, i.e., the SPEC ratio is proportional to the inverse of execution time)
–The SPEC ratios are then "averaged" using the geometric mean (GM):
$$\mathrm{GM} = \sqrt[n]{\prod_{i=1}^{n} \text{SPEC ratio}_i}$$
The guiding principle in reporting performance measurements is reproducibility: list everything another experimenter would need to duplicate the experiment (version of the operating system, compiler settings, input set used, specific computer configuration (clock rate, cache sizes and speed, memory size and speed, etc.)).
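
A small sketch of the summary step, assuming a SPEC ratio is the reference machine's time divided by the measured time, and that the geometric mean is the n-th root of the product of the ratios. The timing values below are made up purely for illustration:

```python
import math


def spec_ratio(reference_time, measured_time):
    """Normalized score: reference time / measured time (bigger is faster)."""
    return reference_time / measured_time


def geometric_mean(ratios):
    """n-th root of the product of the n ratios."""
    return math.prod(ratios) ** (1.0 / len(ratios))


# Hypothetical execution times in seconds for three benchmarks.
reference_times = [100.0, 200.0, 400.0]
measured_times = [50.0, 50.0, 50.0]

ratios = [spec_ratio(r, m) for r, m in zip(reference_times, measured_times)]
print("ratios:", ratios)
print(f"geometric mean = {geometric_mean(ratios):.2f}")
```

A useful property of the geometric mean is that the comparison between two machines does not depend on which reference machine was used for normalization.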

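Amdahl's Law
Handy for evaluating the impact of a change that is not tied to the CPU performance equation.
Insight: improving a feature cannot enhance performance by more than the fraction of time the feature is used.
Suppose that enhancement E accelerates fraction F of a program by a factor S (the remainder of the task is unaffected):
$$\text{Execution time}_{\text{new}} = \text{Execution time}_{\text{old}} \times \left[(1 - F) + \frac{F}{S}\right]$$
$$\text{Speedup} = \frac{\text{Execution time}_{\text{old}}}{\text{Execution time}_{\text{new}}} = \frac{1}{(1 - F) + \frac{F}{S}}$$
[Figure: the fraction F shrinks to F/S after the enhancement, while the fraction 1 − F is unchanged.]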
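
A minimal sketch, assuming the standard form of Amdahl's law, speedup = 1 / ((1 − F) + F / S). The function name and the sample values of F and S are illustrative:

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the time is sped up by `factor`."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# Hypothetical case: 80% of the run time is accelerated.
for s in (2, 4, 10, 1000):
    print(f"S = {s:>4}: overall speedup = {amdahl_speedup(0.8, s):.2f}")

# The unimproved 20% caps the speedup at 1 / (1 - F), however large S gets.
print(f"upper bound = {1.0 / (1.0 - 0.8):.2f}")
```

Even a thousandfold improvement of the accelerated part leaves the overall speedup just under the bound set by the unimproved part.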

Example on Amdahl's Law
Assume a program runs in 100 seconds on a machine where multiply operations consume 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 5 times faster?
Solution:
–Execution time after improvement = (Execution time affected by improvement / Amount of improvement) + Execution time unaffected
–20 seconds = (80 seconds / n) + 20 seconds
–What is the value of n? The equation requires 80 / n = 0, so no finite n works: the 20 seconds that are not multiplication already use up the entire target time.
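
The equation can be solved for n in general. A sketch under the slide's model (new time = affected time / n + unaffected time); the function name and the choice to return `None` for an unreachable target are assumptions made for illustration:

```python
def required_improvement(total_time, affected_time, target_time):
    """Solve target = affected / n + unaffected for n.

    Returns None when the target cannot be reached by improving only the
    affected part.
    """
    unaffected = total_time - affected_time
    remaining = target_time - unaffected   # time budget left for affected part
    if remaining <= 0:
        return None                        # unaffected part alone is too slow
    return affected_time / remaining


# Slide example: 100 s total, 80 s in multiply, target 5x faster (20 s).
print(required_improvement(100, 80, 20))

# A 4x overall speedup (25 s) is reachable, with a much faster multiplier.
print(required_improvement(100, 80, 25))
```

The first call reports that the 5x target is out of reach, while the 4x target needs multiplication to become 16 times faster.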

Summary: Performance
Time is the measure of computer performance!
–The performance equation has three parts; all three together determine performance
Good products are created when we have:
–Good benchmarks
–Good ways to summarize performance
Different performance metrics, and a different set of applications, are needed to benchmark embedded and desktop computers, which are more focused on response time, versus servers, which are more focused on throughput.
Remember Amdahl's Law: speedup is limited by the unimproved part of the program.
