# Princess Sumaya Univ. Computer Engineering Dept. Chapter 4:



Princess Sumaya University 22343 – Computer Organization & Design, Computer Engineering Dept.

## Defining Performance

- Response Time (also called Execution Time): the time it takes to do a task.
- Throughput: the total amount of work done in a given time.
- What is the difference?

[Figure: timeline, t = 0 to 10, of Task A and Task B, each a sequence of Calculate, Save File, Calculate, Read File.]


## Measuring Performance

- Performance = 1 / Execution Time
- Response Time = Wall-Clock Time = Elapsed Time, which is the sum of:
  - Processor Time
  - Memory Access Time
  - Disk and I/O Access Time
  - Operating System Time, etc.

[Figure: task timeline of alternating Calculate, Save File, Calculate, Read File segments, annotated "= CPU Time".]
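The gap between elapsed time and CPU time can be observed directly. The following is a minimal sketch, assuming Python's standard `time` module; `measure` and `sample_task` are hypothetical names introduced here, and `time.sleep` only stands in for the slide's file I/O.

```python
import time


def measure(task):
    """Run task() and return (elapsed_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # wall-clock (elapsed) timer
    cpu_start = time.process_time()    # user + system CPU time of this process
    task()
    cpu = time.process_time() - cpu_start
    elapsed = time.perf_counter() - wall_start
    return elapsed, cpu


def sample_task():
    # Hypothetical workload: a short calculation, then a simulated I/O wait.
    sum(i * i for i in range(200_000))  # "Calculate"
    time.sleep(0.2)                     # stand-in for "Save File" / "Read File"


elapsed, cpu = measure(sample_task)
print(f"elapsed = {elapsed:.3f} s, CPU = {cpu:.3f} s")
```

The waiting period counts toward elapsed time but not toward CPU time, so the first number should come out noticeably larger than the second.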

## Measuring Performance

- CPU Time
  - User CPU Time = execute user code
  - System CPU Time = call OS functions, e.g. malloc
- System Performance: elapsed time on an unloaded system
- CPU Performance: CPU time
- Clock Cycles:
  - Clock Period, T
  - Clock Rate
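The split between user and system CPU time can be inspected from a program. This is a minimal sketch, assuming Python's standard `os.times()`; `cpu_time_split` and `sample_task` are hypothetical names, and the workload is only illustrative.

```python
import os


def cpu_time_split(task):
    """Run task() and return the (user_seconds, system_seconds) it consumed."""
    before = os.times()
    task()
    after = os.times()
    return after.user - before.user, after.system - before.system


def sample_task():
    # Hypothetical workload mixing user-level computation with OS calls.
    total = sum(i * i for i in range(200_000))  # user code
    for _ in range(1_000):
        os.getcwd()                             # asks the OS for the working directory
    return total


user, system = cpu_time_split(sample_task)
print(f"user CPU = {user:.3f} s, system CPU = {system:.3f} s")
```

On a coarse timer either value may print as zero; the point is only that the two components are reported separately.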

## CPU Performance

$$\text{Program CPU Execution Time} = \text{Number of CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{Number of CPU Clock Cycles}}{\text{Clock Rate}}$$

Typical units: clocks × nanoseconds for the first form, clocks / GHz for the second.

Exercise: A program takes 10 seconds to run on a 4 GHz CPU. The same program on another CPU would take 20% extra clock cycles, yet it finishes in 6 seconds. What is the other CPU's clock rate?

- CPU-1: number of CPU clock cycles = 10 sec × 4 × 10⁹ cycles/sec = 40 × 10⁹ cycles
- CPU-2: number of CPU clock cycles = 1.2 × 40 × 10⁹ = 48 × 10⁹ cycles
- Clock rate = 48 × 10⁹ cycles / 6 seconds = 8 GHz
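The clock-rate exercise above can be checked numerically. This is a small sketch using only the relation CPU time = clock cycles / clock rate; the function names are hypothetical.

```python
def clock_cycles(exec_time_s, clock_rate_hz):
    """Cycles consumed = execution time x clock rate."""
    return exec_time_s * clock_rate_hz


def required_clock_rate(cycles, exec_time_s):
    """Clock rate needed to finish `cycles` in `exec_time_s` seconds."""
    return cycles / exec_time_s


cycles_1 = clock_cycles(10, 4e9)            # CPU-1: 10 s at 4 GHz
cycles_2 = 1.2 * cycles_1                   # CPU-2 needs 20% more cycles
rate_2 = required_clock_rate(cycles_2, 6)   # CPU-2 finishes in 6 s

print(f"CPU-2 clock rate = {rate_2 / 1e9:.1f} GHz")
```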

## CPU Performance

- Clocks Per Instruction, CPI: the average number of clock cycles each instruction takes to execute.

Exercise: Which computer is faster?

| Computer | Clock Cycle Time | CPI |
|----------|------------------|-----|
| A | 250 picoseconds | 2 |
| B | 500 picoseconds | 1.2 |

Let the number of instructions in the program be I.

- CPU-A: CPU Execution Time = I × 2 × 250 ps = 500 × I ps
- CPU-B: CPU Execution Time = I × 1.2 × 500 ps = 600 × I ps
- Computer A is (600 / 500) = 1.2 times faster than B.
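Because both computers run the same instruction count, the comparison reduces to time per instruction, CPI × clock cycle time. A short sketch of that calculation, with a hypothetical helper name:

```python
def time_per_instruction_ps(cpi, cycle_time_ps):
    """Average time one instruction takes, in picoseconds."""
    return cpi * cycle_time_ps


time_a = time_per_instruction_ps(2, 250)     # Computer A
time_b = time_per_instruction_ps(1.2, 500)   # Computer B
speedup = time_b / time_a                    # how many times faster A is than B

print(f"A: {time_a:.0f} ps/instr, B: {time_b:.0f} ps/instr, "
      f"A is {speedup:.2f}x faster")
```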

## CPU Performance

Exercise: Given 3 classes of instructions, A, B and C, an instruction in each class takes a different number of clock cycles to execute. Given the instruction mix shown, which code sequence is faster to execute?

| Instruction Class | A | B | C |
|-------------------|---|---|---|
| CPI | 1 | 2 | 3 |

| Code | Instruction Count A | Instruction Count B | Instruction Count C |
|------|---|---|---|
| Seq1 | 2 | 1 | 2 |
| Seq2 | 4 | 1 | 1 |

- Seq1: CPU clock cycles = 2 × 1 + 1 × 2 + 2 × 3 = 10 cycles
- Seq2: CPU clock cycles = 4 × 1 + 1 × 2 + 1 × 3 = 9 cycles
- Seq2 is 10 / 9 ≈ 1.11 times faster than Seq1.
- Seq1 average CPI = 10 / 5 = 2 cycles / instruction
- Seq2 average CPI = 9 / 6 = 1.5 cycles / instruction
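The instruction-mix exercise above is a weighted sum over instruction classes. The following sketch, with hypothetical function names, computes total cycles and average CPI for both sequences.

```python
CPI = {"A": 1, "B": 2, "C": 3}   # cycles per instruction for each class


def total_cycles(counts):
    """Sum of (class CPI x instruction count) over all classes."""
    return sum(CPI[cls] * n for cls, n in counts.items())


def average_cpi(counts):
    """Total cycles divided by total instruction count."""
    return total_cycles(counts) / sum(counts.values())


seq1 = {"A": 2, "B": 1, "C": 2}
seq2 = {"A": 4, "B": 1, "C": 1}

for name, seq in (("Seq1", seq1), ("Seq2", seq2)):
    print(f"{name}: {total_cycles(seq)} cycles, average CPI {average_cpi(seq):.2f}")
```

Seq2 executes more instructions yet needs fewer cycles, because its mix leans on the cheapest class.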

## Evaluating Performance

- Workload: the set of user programs to be executed.
- Benchmark: a program specifically chosen to measure performance.
- Target: benchmarks form a workload that the user hopes will predict the performance of the actual workload.
- Today: benchmarks are real applications, from various environments.

## Evaluating Performance

- Total Execution Time: which computer is faster?

| | Computer A | Computer B |
|------|-------------|-------------|
| Pgm1 | 1 second | 5 seconds |
| Pgm2 | 500 seconds | 100 seconds |

$$\frac{\text{Performance}_B}{\text{Performance}_A} = \frac{501 \text{ seconds}}{105 \text{ seconds}} \approx 4.77$$

Computer B is about 4.77 times faster than Computer A.

- Arithmetic Mean
- Weighted Arithmetic Mean
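The three summary measures named on this slide can be compared on the same table. This is a sketch only: the function names are hypothetical, and the weights passed to `weighted_mean` (0.9 for Pgm1, 0.1 for Pgm2) are an assumption made for illustration, not values from the slides.

```python
times = {
    "A": {"Pgm1": 1, "Pgm2": 500},   # seconds on Computer A
    "B": {"Pgm1": 5, "Pgm2": 100},   # seconds on Computer B
}


def total_time(computer):
    return sum(times[computer].values())


def arithmetic_mean(computer):
    return total_time(computer) / len(times[computer])


def weighted_mean(computer, weights):
    """Weighted arithmetic mean; weights are expected to sum to 1."""
    return sum(weights[p] * t for p, t in times[computer].items())


# Hypothetical weights: Pgm1 is assumed to run far more often than Pgm2.
weights = {"Pgm1": 0.9, "Pgm2": 0.1}

ratio = total_time("A") / total_time("B")
print(f"B is {ratio:.2f}x faster by total execution time")
for c in ("A", "B"):
    print(f"{c}: mean {arithmetic_mean(c):.1f} s, "
          f"weighted mean {weighted_mean(c, weights):.1f} s")
```

Changing the weights changes the size of the gap between the two computers, which is why the choice of workload mix matters when summarizing performance.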

## SPEC Benchmarks

SPEC: Standard Performance Evaluation Corporation. Benchmark areas:

- CPU Performance
- Graphics/Workstations Performance
- High Performance Computing
- Java Client/Server
- Mail Servers
- Network File System
- Power
- SIP
- Virtualization
- Web Servers

## SPEC CPU Benchmarks

- SPEC CPU2006 Suite
  - CINT2006: 12 integer benchmarks
  - CFP2006: 17 floating-point benchmarks
  - The benchmarks exercise:
    - CPU
    - Memory systems
    - Compilers (Fortran, C, C++)

Note: one benchmark has ½ million lines of C++.

## SPEC CPU Benchmarks

CINT2006:

| Language | Application Area |
|----------|------------------|
| C | Programming Language |
| C | Compression |
| C | C Compiler |
| C | Combinatorial Optimization |
| C | Artificial Intelligence: Go |
| C | Search Gene Sequence |
| C | Artificial Intelligence: chess |
| C | Physics / Quantum Computing |
| C | Video Compression |
| C++ | Discrete Event Simulation |
| C++ | Path-finding Algorithms |
| C++ | XML Processing |

CFP2006:

| Language | Application Area |
|----------|------------------|
| Fortran | Fluid Dynamics |
| Fortran | Quantum Chemistry |
| C | Physics / Quantum Chromodynamics |
| Fortran | Physics / CFD |
| C, Fortran | Biochemistry / Molecular Dynamics |
| C, Fortran | Physics / General Relativity |
| Fortran | Fluid Dynamics |
| C++ | Biology / Molecular Dynamics |
| C++ | Finite Element Analysis |
| C++ | Linear Programming, Optimization |
| C++ | Image Ray-tracing |
| C++ | Structural Mechanics |
| Fortran | Computational Electromagnetics |
| Fortran | Quantum Chemistry |
| C | Fluid Dynamics |
| C, Fortran | Weather |
| C | Speech recognition |

## CPU Efficiency

- Is the increase in performance due to higher clocks?

## CPU Efficiency

- Implementation Efficiency
  - Clock-Normalized Scores: score divided by clock rate in MHz

Example:

- Pentium 3 @ 800 MHz → 152; normalized score: 152 / 800 = 0.19
- Pentium 4 @ 3.4 GHz → 539; normalized score: 539 / 3400 ≈ 0.16

Example:

| | Pentium III | Pentium IV |
|--------------|------|------|
| CINT2000/MHz | 0.47 | 0.36 |
| CFP2000/MHz | 0.34 | 0.39 |

Slide notes:

- CPI was sacrificed to enhance clock rate; each instruction takes more clocks.
- New instructions: Streaming SIMD Extensions 2 (SSE2).
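Clock normalization is a single division, score per MHz. The sketch below applies it to the two scores quoted in the first example on this slide; the function name is hypothetical.

```python
def clock_normalized(score, clock_mhz):
    """Benchmark score per MHz of clock rate."""
    return score / clock_mhz


pentium3 = clock_normalized(152, 800)    # Pentium 3 at 800 MHz
pentium4 = clock_normalized(539, 3400)   # Pentium 4 at 3.4 GHz = 3400 MHz

print(f"Pentium 3: {pentium3:.3f} per MHz")
print(f"Pentium 4: {pentium4:.3f} per MHz")
```

A higher raw score with a lower per-MHz score indicates that the gain came from clock rate rather than from doing more work per cycle.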

## Amdahl’s Law

- When introducing an improvement, execution time is divided into 2 parts:
  - Affected by the improvement
  - Not affected

$$\text{Execution Time After Improvement} = \frac{\text{Execution Time Affected}}{\text{Amount of Improvement}} + \text{Execution Time Unaffected}$$

Example: How much improvement is required for the multiply hardware to make the program run 5 times faster?

| | Add | Multiply |
|------------------------|------------|------------|
| Program Execution Time | 20 seconds | 80 seconds |

With n as the required improvement in multiply:

$$\frac{100}{5} = \frac{80}{n} + 20$$

This requires 80 / n = 0, so it is Not Possible.
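Amdahl's Law can be turned around to solve for the improvement factor a target speedup would need. This is a minimal sketch with hypothetical function names; the 2× target in the second call is an extra case added for contrast and is not from the slides.

```python
def time_after_improvement(affected, unaffected, factor):
    """Amdahl's Law: only the affected part shrinks by `factor`."""
    return affected / factor + unaffected


def required_factor(affected, unaffected, target_speedup):
    """Improvement factor needed on the affected part, or None if unreachable."""
    target_time = (affected + unaffected) / target_speedup
    remaining = target_time - unaffected   # time budget left for the affected part
    if remaining <= 0:
        return None                        # unaffected part alone uses the whole budget
    return affected / remaining


print(required_factor(80, 20, 5))   # slide example: 5x overall speedup
print(required_factor(80, 20, 2))   # added case: 2x overall speedup
```

For the slide's example the function returns `None`: the 20 seconds of add time already equals the 20-second target, leaving no time at all for multiplication.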

## MIPS: Million Instructions Per Second

- No regard to instruction type
  - Instructions have different capabilities
  - Different computers have different architectures
- Different MIPS for different programs on the same CPU
- MIPS can vary inversely with performance

Example: Which code is faster? (4 GHz clock)

| Instruction Class | A | B | C |
|-------------------|---|---|---|
| CPI | 1 | 2 | 3 |

| Instruction Counts in Billions | A | B | C |
|-----------|----|---|---|
| Compiler1 | 5 | 1 | 1 |
| Compiler2 | 10 | 1 | 1 |

## MIPS: Million Instructions Per Second

Using the 4 GHz clock, the class CPIs (A = 1, B = 2, C = 3) and the instruction counts from the example (Compiler1: 5, 1, 1 billion; Compiler2: 10, 1, 1 billion):

- CPU Clock Cycles 1 = (5 × 1 + 1 × 2 + 1 × 3) × 10⁹ = 10 × 10⁹
- CPU Clock Cycles 2 = (10 × 1 + 1 × 2 + 1 × 3) × 10⁹ = 15 × 10⁹
- Execution Time 1 = 10 × 10⁹ / 4 × 10⁹ = 2.5 seconds
- Execution Time 2 = 15 × 10⁹ / 4 × 10⁹ = 3.75 seconds
- MIPS 1 = (5 + 1 + 1) × 1000 million instr / 2.5 seconds = 2800
- MIPS 2 = (10 + 1 + 1) × 1000 million instr / 3.75 seconds = 3200

Compiler2's code has the higher MIPS rating, yet Compiler1's code finishes sooner.
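The MIPS comparison can be reproduced in a few lines. This sketch uses the clock rate, class CPIs and instruction counts from the example; the `analyze` helper is a hypothetical name.

```python
CLOCK_RATE_HZ = 4e9                  # 4 GHz clock
CPI = {"A": 1, "B": 2, "C": 3}       # cycles per instruction for each class


def analyze(counts_billions):
    """Return (clock cycles, execution time in s, MIPS) for one compiler's code."""
    instructions = sum(counts_billions.values()) * 1e9
    cycles = sum(CPI[cls] * n for cls, n in counts_billions.items()) * 1e9
    exec_time = cycles / CLOCK_RATE_HZ
    mips = instructions / (exec_time * 1e6)
    return cycles, exec_time, mips


compiler1 = {"A": 5, "B": 1, "C": 1}
compiler2 = {"A": 10, "B": 1, "C": 1}

for name, counts in (("Compiler1", compiler1), ("Compiler2", compiler2)):
    cycles, exec_time, mips = analyze(counts)
    print(f"{name}: {cycles:.2e} cycles, {exec_time:.2f} s, {mips:.0f} MIPS")
```

The code with the higher MIPS figure is also the one with the longer execution time, which illustrates how MIPS can vary inversely with performance.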
