
ENEE350. Ankur Srivastava, University of Maryland, College Park. Based on slides from Mary Jane Irwin (www.cse.psu.edu/~mji), www.cse.psu.edu/~cg431.


1 ENEE350 Computer Systems Organization: Lecture 4
Ankur Srivastava, University of Maryland, College Park
Based on slides from Mary Jane Irwin (www.cse.psu.edu/~mji), www.cse.psu.edu/~cg431
[Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, UCB]

2 ENEE350 Performance Metrics
- Purchasing perspective: given a collection of machines, which has the
  - best performance?
  - least cost?
  - best cost/performance?
- Design perspective: faced with design options, which has the
  - best performance improvement?
  - least cost?
  - best cost/performance?
- Both require
  - a basis for comparison
  - a metric for evaluation
- Our goal is to understand what factors in the architecture contribute to overall system performance, and the relative importance (and cost) of these factors

3 ENEE350 Defining (Speed) Performance
- Normally interested in reducing response time (aka execution time): the time between the start and the completion of a task
  - Important to individual users
  - Thus, to maximize performance, need to minimize execution time
- Also interested in increasing throughput: the total amount of work done in a given time
  - Important to data center managers
  - Decreasing response time almost always improves throughput
- Performance is the inverse of execution time:
$$\text{performance}_X = \frac{1}{\text{execution\_time}_X}$$
- If X is n times faster than Y, then
$$\frac{\text{performance}_X}{\text{performance}_Y} = \frac{\text{execution\_time}_Y}{\text{execution\_time}_X} = n$$
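The relative-performance relation above can be sketched in a few lines of Python. This is only an illustration: the function names and the two execution times are hypothetical and do not come from the slides.

```python
def performance(execution_time_s: float) -> float:
    """Performance is defined as the inverse of execution time."""
    if execution_time_s <= 0:
        raise ValueError("execution time must be positive")
    return 1.0 / execution_time_s


def speedup(execution_time_x_s: float, execution_time_y_s: float) -> float:
    """Return n such that machine X is n times faster than machine Y."""
    return performance(execution_time_x_s) / performance(execution_time_y_s)


# Hypothetical measurements (assumed for illustration only).
time_x_s = 10.0
time_y_s = 15.0

n = speedup(time_x_s, time_y_s)
print(f"X is {n:.2f} times faster than Y")
```

Note that the ratio of performances equals the inverse ratio of execution times, so the faster machine is the one with the smaller execution time.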

4 ENEE350 Performance Factors
$$\text{CPU execution time for a program} = \text{\# CPU clock cycles for a program} \times \text{clock cycle time}$$
or
$$\text{CPU execution time for a program} = \frac{\text{\# CPU clock cycles for a program}}{\text{clock rate}}$$
- Can improve performance by reducing either the length of the clock cycle or the number of clock cycles required for a program
1. Recall from ECE244: sequential systems need synchronizing clocks
2. A computer is a sequential system and has a clock
3. Each instruction takes a few clock cycles to execute

5 ENEE350 Review: Machine Clock Rate
- Clock rate (MHz, GHz) is the inverse of clock cycle time (clock period):
$$CC = \frac{1}{CR}$$
[Figure: clock waveform with one clock period marked]
  10 nsec clock cycle  => 100 MHz clock rate
  5 nsec clock cycle   => 200 MHz clock rate
  2 nsec clock cycle   => 500 MHz clock rate
  1 nsec clock cycle   => 1 GHz clock rate
  500 psec clock cycle => 2 GHz clock rate
  250 psec clock cycle => 4 GHz clock rate
  200 psec clock cycle => 5 GHz clock rate
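As a quick check of the period-to-rate conversions listed above, the following Python sketch applies CC = 1 / CR in both directions. The helper names are hypothetical; only the clock periods are taken from the slide.

```python
def rate_from_period(period_s: float) -> float:
    """Clock rate in Hz for a clock period given in seconds (CR = 1 / CC)."""
    return 1.0 / period_s


def period_from_rate(rate_hz: float) -> float:
    """Clock period in seconds for a clock rate given in Hz (CC = 1 / CR)."""
    return 1.0 / rate_hz


# Clock periods from the slide, expressed in seconds.
PERIODS_S = [10e-9, 5e-9, 2e-9, 1e-9, 500e-12, 250e-12, 200e-12]

for period_s in PERIODS_S:
    rate_hz = rate_from_period(period_s)
    print(f"{period_s * 1e9:g} nsec clock cycle => {rate_hz / 1e6:g} MHz clock rate")
```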

6 ENEE350 Clock Cycles per Instruction
- Not all instructions take the same amount of time to execute (different instructions need different numbers of clock cycles). For example, MUL takes more cycles than ADD
- Clock cycles per instruction (CPI): the average number of clock cycles each instruction takes to execute
  - A way to compare two different implementations of the same ISA
$$\text{\# CPU clock cycles for a program} = \text{\# instructions for a program} \times \text{average clock cycles per instruction}$$
  Instruction class                  A   B   C
  CPI for this instruction class     1   2   3

7 ENEE350 THE Performance Equation
- Our basic performance equation is then
$$\text{CPU time} = \text{Instruction\_count} \times \text{CPI} \times \text{clock\_cycle}$$
or
$$\text{CPU time} = \frac{\text{Instruction\_count} \times \text{CPI}}{\text{clock\_rate}}$$
- These equations separate the three key factors that affect performance
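A minimal Python sketch of the two equivalent forms of the performance equation follows. The function names and the sample workload (instruction count, CPI, clock rate) are assumptions chosen for illustration, not values from the slides.

```python
def cpu_time_from_cycle(instruction_count: float, cpi: float, clock_cycle_s: float) -> float:
    """CPU time = Instruction_count x CPI x clock_cycle."""
    return instruction_count * cpi * clock_cycle_s


def cpu_time_from_rate(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = Instruction_count x CPI / clock_rate."""
    return instruction_count * cpi / clock_rate_hz


# Hypothetical workload: 2 billion instructions, CPI of 1.5, 1 GHz clock.
instruction_count = 2e9
cpi = 1.5
clock_rate_hz = 1e9

t_rate = cpu_time_from_rate(instruction_count, cpi, clock_rate_hz)
t_cycle = cpu_time_from_cycle(instruction_count, cpi, 1.0 / clock_rate_hz)
print(f"CPU time: {t_rate:.2f} s (rate form), {t_cycle:.2f} s (cycle form)")
```

Halving any one of the three factors while holding the other two fixed halves the CPU time, which is why the equation is useful for comparing design options.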

8 ENEE350 THE Performance Equation
- Our basic performance equation is then
$$\text{CPU time} = \text{Instruction\_count} \times \text{CPI} \times \text{clock\_cycle}$$
- Instruction count: depends on the kinds of instructions supported by the architecture. For example, a multiply operation in C could be represented as a sequence of adds in assembly code, but the number of instructions would be large. Having a dedicated MUL instruction reduces the total number of instructions in the program.

9 ENEE350 THE Performance Equation
- Our basic performance equation is then
$$\text{CPU time} = \text{Instruction\_count} \times \text{CPI} \times \text{clock\_cycle}$$
- CPI: depends on how complicated the instructions are. More complex instructions need more clock cycles to execute. For example, the MUL instruction in MIPS takes more clock cycles than the ADD instruction.
- Hence, if the compiler and assembler choose more complex instructions, they will increase the CPI but may reduce the total number of instructions.

10 ENEE350 Computing the Effective CPI
- Our basic performance equation is then
$$\text{CPU time} = \text{Instruction\_count} \times \text{CPI} \times \text{clock\_cycle}$$
- Given a specific computer architecture (MIPS, for instance), each instruction $i$ can be associated with the number of clock cycles it needs, $C_i$.
- Given a C/Java program, the compiler and assembler decide which instructions to choose from the available instruction set; this affects both the number of instructions and the CPI.
- Suppose they end up choosing $IC_i$ instances of instruction $i$. Then the effective CPI becomes (here $n = \sum_i IC_i$ is the total number of instructions)
$$\text{Overall effective CPI} = \frac{\sum_i (C_i \times IC_i)}{n}$$
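The effective-CPI formula above is a weighted average, which the following Python sketch computes directly. The instruction names, cycle counts, and instruction counts are hypothetical values picked for illustration; they are not actual MIPS timings.

```python
def effective_cpi(cycles_per_instruction: dict, instruction_counts: dict) -> float:
    """Overall effective CPI = sum(C_i * IC_i) / n, where n = sum(IC_i)."""
    n = sum(instruction_counts.values())
    if n == 0:
        raise ValueError("program must contain at least one instruction")
    total_cycles = sum(
        cycles_per_instruction[name] * count
        for name, count in instruction_counts.items()
    )
    return total_cycles / n


# Hypothetical per-instruction cycle counts C_i (assumed, not real MIPS numbers).
cycles = {"add": 1, "mul": 4, "lw": 3}
# Hypothetical instruction counts IC_i chosen by the compiler and assembler.
counts = {"add": 600, "mul": 100, "lw": 300}

print(f"Effective CPI: {effective_cpi(cycles, counts):.2f}")
```

With these assumed numbers the program uses 1900 cycles for 1000 instructions, so the effective CPI sits between the cheapest and the most expensive instruction, weighted toward the more frequent ones.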

11 ENEE350 Computing the Effective CPI
- Hence the effective CPI depends on
  - The kinds of instructions (instruction set) supported by the architecture
  - The choice of instructions from this instruction set by the compiler and assembler

12 ENEE350 Determinants of CPU Performance
$$\text{CPU time} = \text{Instruction\_count} \times \text{CPI} \times \text{clock\_cycle}$$
                          Instruction_count   CPI   clock_cycle
  Algorithm
  Programming language
  Compiler
  ISA
  Processor organization
  Technology

13 ENEE350 Determinants of CPU Performance
$$\text{CPU time} = \text{Instruction\_count} \times \text{CPI} \times \text{clock\_cycle}$$
                          Instruction_count   CPI   clock_cycle
  Algorithm               X                   X
  Programming language    X                   X
  Compiler                X                   X
  ISA (instruction set)   X                   X     X
  Processor organization                      X     X
  Technology                                        X

14 ENEE350 A Simple Example
  Op       Freq   CPI_i   Freq x CPI_i
  ALU      50%    1
  Load     20%    5
  Store    10%    3
  Branch   20%    2
                          Σ =
- How much faster would the machine be if a better data cache reduced the average load time to 2 cycles?
- How does this compare with using branch prediction to shave a cycle off the branch time?
- What if two ALU instructions could be executed at once?

15 ENEE350 A Simple Example
  Op       Freq   CPI_i   Freq x CPI_i   Load = 2 cycles   Branch = 1 cycle   Two ALU at once
  ALU      50%    1       .5             .5                .5                 .25
  Load     20%    5       1.0            .4                1.0                1.0
  Store    10%    3       .3             .3                .3                 .3
  Branch   20%    2       .4             .4                .2                 .4
                          Σ = 2.2        Σ = 1.6           Σ = 2.0            Σ = 1.95
- How much faster would the machine be if a better architecture reduced the average load time to 2 cycles?
  - CPU time new = 1.6 x IC x CC, so 2.2/1.6 means 37.5% faster
- How does this compare with using branch prediction to shave a cycle off the branch time?
  - CPU time new = 2.0 x IC x CC, so 2.2/2.0 means 10% faster
- What if two ALU instructions could be executed at once?
  - CPU time new = 1.95 x IC x CC, so 2.2/1.95 means 12.8% faster
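The three what-if questions above can be reproduced with a short Python sketch. The frequencies and CPI values are the ones from the slide's instruction mix; the helper names are hypothetical, and executing two ALU instructions at once is modeled, as on the slide, by halving the ALU CPI.

```python
# Instruction mix from the slide: op -> (frequency, CPI).
BASE_MIX = {
    "ALU": (0.50, 1.0),
    "Load": (0.20, 5.0),
    "Store": (0.10, 3.0),
    "Branch": (0.20, 2.0),
}


def mix_cpi(mix: dict) -> float:
    """Effective CPI of a mix: sum of frequency x CPI over all op classes."""
    return sum(freq * cpi for freq, cpi in mix.values())


def with_cpi(mix: dict, op: str, new_cpi: float) -> dict:
    """Return a copy of the mix with the CPI of one op class replaced."""
    changed = dict(mix)
    freq, _old_cpi = changed[op]
    changed[op] = (freq, new_cpi)
    return changed


base = mix_cpi(BASE_MIX)
scenarios = {
    "Load takes 2 cycles": with_cpi(BASE_MIX, "Load", 2.0),
    "Branch takes 1 cycle": with_cpi(BASE_MIX, "Branch", 1.0),
    "Two ALU ops at once": with_cpi(BASE_MIX, "ALU", 0.5),
}

print(f"Baseline CPI: {base:.2f}")
for label, mix in scenarios.items():
    new = mix_cpi(mix)
    # IC and CC are unchanged, so the speedup is just the ratio of CPIs.
    print(f"{label}: CPI {new:.2f}, {base / new - 1:.1%} faster")
```

Because the instruction count and clock cycle are the same in every scenario, they cancel out of the comparison, and only the effective CPI needs to be recomputed.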

16 ENEE350 Another Example
- Let us suppose that the ISA has 3 kinds of instructions
  - A: CPI = 1
  - B: CPI = 2
  - C: CPI = 3
- Let the compiler/assembler generate 2 code sequences
  - Code 1: has 2 from A, 1 from B and 2 from C
  - Code 2: has 4 from A, 1 from B and 1 from C
- Which is better?
  - Code 1: total number of clock cycles = 2x1 + 1x2 + 2x3 = 10
  - Code 2: total number of clock cycles = 4x1 + 1x2 + 1x3 = 9

17 ENEE350 Another Example
- Let us suppose that the ISA has 3 kinds of instructions
  - A: CPI = 1
  - B: CPI = 2
  - C: CPI = 3
- Let the compiler/assembler generate 2 code sequences
  - Code 1: has 2 from A, 1 from B and 2 from C
  - Code 2: has 4 from A, 1 from B and 1 from C
- Which is better?
  - Code 1: total number of clock cycles = 2x1 + 1x2 + 2x3 = 10
  - Code 2: total number of clock cycles = 4x1 + 1x2 + 1x3 = 9
- Code 1 execution time = 10 x clock cycle
- Code 2 execution time = 9 x clock cycle
- Therefore Code 2 is faster even though it has more instructions (6 versus 5)
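The comparison of the two code sequences can be checked with the Python sketch below. The CPI values and instruction counts are those given on the slide; the variable and function names are hypothetical.

```python
# CPI of each instruction class, as given on the slide.
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}

# Number of instructions of each class in the two code sequences.
CODE_1 = {"A": 2, "B": 1, "C": 2}
CODE_2 = {"A": 4, "B": 1, "C": 1}


def total_cycles(counts: dict) -> int:
    """Total clock cycles: sum over classes of (instruction count x CPI)."""
    return sum(CPI_BY_CLASS[cls] * count for cls, count in counts.items())


def instruction_count(counts: dict) -> int:
    """Total number of instructions in the code sequence."""
    return sum(counts.values())


for name, code in (("Code 1", CODE_1), ("Code 2", CODE_2)):
    cycles = total_cycles(code)
    n = instruction_count(code)
    print(f"{name}: {n} instructions, {cycles} cycles, average CPI {cycles / n:.2f}")
```

Code 2 executes more instructions but needs fewer cycles because its average CPI is lower, which illustrates why instruction count alone is not a reliable performance measure.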

18 ENEE350 SPEC Benchmarks (www.spec.org)
  Integer benchmarks                        FP benchmarks
  gzip      Compression                     wupwise   Quantum chromodynamics
  vpr       FPGA place & route              swim      Shallow water model
  gcc       GNU C compiler                  mgrid     Multigrid solver in 3D fields
  mcf       Combinatorial optimization      applu     Parabolic/elliptic PDE
  crafty    Chess program                   mesa      3D graphics library
  parser    Word processing program         galgel    Computational fluid dynamics
  eon       Computer visualization          art       Image recognition (NN)
  perlbmk   Perl application                equake    Seismic wave propagation simulation
  gap       Group theory interpreter        facerec   Facial image recognition
  vortex    Object oriented database        ammp      Computational chemistry
  bzip2     Compression                     lucas     Primality testing
  twolf     Circuit place & route           fma3d     Crash simulation (FEM)
                                            sixtrack  Nuclear physics accelerator
                                            apsi      Pollutant distribution

19 ENEE350 Example SPEC Ratings

