15-447 Computer ArchitectureFall 2007 © September 19, 2007 Karem Sakallah CS-447– Computer Architecture.


1 15-447 Computer ArchitectureFall 2007 © September 19, 2007 Karem Sakallah ksakalla@qatar.cmu.edu www.qatar.cmu.edu/~msakr/15447-f07/ CS-447– Computer Architecture M,W 10-11:20am Lecture 7 Performance (Cont’d)

2 15-447 Computer ArchitectureFall 2007 © Today °Lecture & Discussion °Next Lecture: Review °Read the chapters & slides. °Practice the performance examples in the Patterson book. Done by now

3 15-447 Computer ArchitectureFall 2007 © Assessing & Understanding Performance This chapter discusses how to measure, report, and summarize performance of a computer.

4 15-447 Computer ArchitectureFall 2007 © Motivation It is often helpful to have some yardstick by which to compare systems During development to evaluate different algorithms or optimizations During purchasing to compare between product offerings …

5 15-447 Computer ArchitectureFall 2007 © °Measure, Report, and Summarize °Make intelligent choices °See through the marketing hype °Key to understanding underlying organizational motivation Performance

6 15-447 Computer ArchitectureFall 2007 © Performance Why is some hardware better than others for different programs? What factors of system performance are hardware related? (e.g., Do we need a new machine, or a new operating system?) How does the machine's instruction set affect performance?

7 15-447 Computer ArchitectureFall 2007 © Which of these airplanes has the best performance?
Airplane            Passengers   Range (mi)   Speed (mph)
Boeing 737-100         101           630          598
Boeing 747             470          4150          610
BAC/Sud Concorde       132          4000         1350
Douglas DC-8-50        146          8720          544
° How much faster is the Concorde compared to the 747? ° How much bigger is the 747 than the Douglas DC-8?

8 15-447 Computer ArchitectureFall 2007 © ° Response Time (latency) — How long does it take for my job to run? — How long does it take to execute a job? — How long must I wait for the database query? ° Throughput — How many jobs can the machine run at once? — What is the average execution rate? — How much work is getting done? Computer Performance

9 15-447 Computer ArchitectureFall 2007 © Execution Time °Elapsed Time counts everything (disk and memory accesses, I/O, etc.); a useful number, but often not good for comparison purposes

10 15-447 Computer ArchitectureFall 2007 © Execution Time °CPU time doesn't count I/O or time spent running other programs; it can be broken up into system time and user time. Our focus: user CPU time, the time spent executing the lines of code that are "in" our program

11 15-447 Computer ArchitectureFall 2007 © Definition of Performance °For some program running on machine X, Performance_X = 1 / Execution time_X °"X is n times faster than Y" means Performance_X / Performance_Y = n

12 15-447 Computer ArchitectureFall 2007 © Definition of Performance Problem: machine A runs a program in 20 seconds machine B runs the same program in 25 seconds
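
As a worked check of the definition of performance, the two run times in this problem (20 seconds on A, 25 seconds on B) can be plugged into the performance ratio. This is a minimal sketch; the function name `speedup` is illustrative and does not come from the slides.

```python
def speedup(time_x, time_y):
    """Return n such that machine X is n times faster than machine Y.

    Performance = 1 / execution time, so
    Performance_X / Performance_Y = time_y / time_x.
    """
    return time_y / time_x


# Machine A runs the program in 20 s, machine B in 25 s (values from the slide).
n = speedup(20.0, 25.0)
print(f"A is {n:.2f} times faster than B")
```

Under this definition, A comes out 1.25 times faster than B.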

13 15-447 Computer ArchitectureFall 2007 © Comparing and Summarizing Performance How to compare the performance? Total Execution Time: A Consistent Summary Measure
                    Computer A   Computer B
Program 1 (sec)          1           10
Program 2 (sec)       1000          100
Total time (sec)      1001          110
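
The total-execution-time comparison can be sketched in a few lines, using the times on this slide (Program 1: 1 s on A and 10 s on B; Program 2: 1000 s on A and 100 s on B). The dictionary layout is only an illustration.

```python
# Execution times in seconds, taken from the table on the slide.
times = {
    "Computer A": {"Program 1": 1, "Program 2": 1000},
    "Computer B": {"Program 1": 10, "Program 2": 100},
}

# Total execution time per machine is the summary measure the slide proposes.
totals = {machine: sum(runs.values()) for machine, runs in times.items()}

# Ratio of total times: how many times faster B is on the combined workload.
ratio = totals["Computer A"] / totals["Computer B"]
print(totals)
print(f"B is {ratio:.1f} times faster than A on the total workload")
```

A wins on Program 1 and B wins on Program 2, but on total time B is about 9.1 times faster.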

14 15-447 Computer ArchitectureFall 2007 © Clock Cycles °Instead of reporting execution time in seconds, we often use cycles °Clock “ticks” indicate when to start activities (one abstraction) [Figure: clock ticks along a time axis]

15 15-447 Computer ArchitectureFall 2007 © Clock cycles °cycle time = time between ticks = seconds per cycle °clock rate (frequency) = cycles per second (1 Hz = 1 cycle/sec) A 4 GHz clock has a 250 ps cycle time

16 15-447 Computer ArchitectureFall 2007 © CPU Execution Time
Seconds / Program = (Cycles / Program) x (Seconds / Cycle)
CPU execution time for a program = (CPU clock cycles for a program) x (clock cycle time)
                                 = (CPU clock cycles for a program) / (clock rate)

17 15-447 Computer ArchitectureFall 2007 © So, to improve performance (everything else being equal) you can either increase or decrease? ________ the # of required cycles for a program, or ________ the clock cycle time or, said another way, ________ the clock rate. How to Improve Performance

18 15-447 Computer ArchitectureFall 2007 © So, to improve performance (everything else being equal) you can either increase or decrease? _decrease_ the # of required cycles for a program, or _decrease_ the clock cycle time or, said another way, _increase_ the clock rate. How to Improve Performance

19 15-447 Computer ArchitectureFall 2007 © How many cycles are required for a program? Could assume that # of cycles equals # of instructions. [Figure: timeline with one cycle per instruction: 1st instruction, 2nd instruction, 3rd instruction, 4th, 5th, 6th, ...] This assumption is incorrect: different instructions take different amounts of time on different machines.

20 15-447 Computer ArchitectureFall 2007 © Different numbers of cycles for different instructions °Multiplication takes more time than addition °Floating point operations take longer than integer ones °Accessing memory takes more time than accessing registers °Important point: changing the cycle time often changes the number of cycles required for various instructions [Figure: timeline]

21 15-447 Computer ArchitectureFall 2007 © Now that we understand cycles
Components of Performance               Units of Measure
CPU execution time for a program        Seconds for the program
Instruction count                       Instructions executed for the program
Clock Cycles per Instruction (CPI)      Average number of clock cycles per instruction
Clock cycle time                        Seconds per clock cycle

22 15-447 Computer ArchitectureFall 2007 © CPI CPU clock cycles = Instructions for a program x Average clock cycles per Instruction (CPI) CPU time = Instruction count x CPI x clock cycle time

23 15-447 Computer ArchitectureFall 2007 © Performance ° Performance is determined by execution time ° Do any of the other variables equal performance? # of cycles to execute program? # of instructions in program? # of cycles per second? average # of cycles per instruction? average # of instructions per second? ° Common pitfall: thinking one of the variables is indicative of performance when it really isn’t.

24 15-447 Computer ArchitectureFall 2007 © CPU Clock Cycles
CPU clock cycles = (CPI_1 x C_1) + (CPI_2 x C_2) + … + (CPI_n x C_n)
CPI_i : the average number of cycles per instruction for instruction class i
C_i : the count of the number of instructions of class i executed
n : the number of instruction classes

25 15-447 Computer ArchitectureFall 2007 © Example °Instruction Classes: Add Multiply °Average Clock Cycles per Instruction: Add 1cc Mul 3cc °Program A executed: 10 Add instructions 5 Multiply instructions
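
The slide gives only the inputs for this example. A minimal sketch of the calculation, assuming the intent is to find Program A's total cycles and average CPI by summing cycles-per-instruction times instruction count over the classes:

```python
# Cycles per instruction for each class, and how often Program A executed it
# (values from the slide).
cpi_by_class = {"add": 1, "mul": 3}
count_by_class = {"add": 10, "mul": 5}

# CPU clock cycles = sum over classes of (CPI of the class x count of the class).
total_cycles = sum(cpi_by_class[c] * count_by_class[c] for c in count_by_class)
total_instructions = sum(count_by_class.values())

# Average CPI for the whole program.
average_cpi = total_cycles / total_instructions
print(total_cycles, total_instructions, round(average_cpi, 2))
```

With these inputs the program takes 25 cycles for 15 instructions, an average CPI of about 1.67.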

26 15-447 Computer ArchitectureFall 2007 © Quiz An application using a desktop client and a remote server is limited by network performance. What happens to response time and throughput when: °An extra network channel is added °Networking software is upgraded to reduce communications delay °More memory is added to the desktop computer

27 15-447 Computer ArchitectureFall 2007 © Formula Summary
°T: Execution Time (seconds)
°C: Total Number of Cycles
°f: Clock Frequency (cycles/second)
°I: (Dynamic) Instruction Count
I_j : Count for Instructions of type j
C_j : Cycles per Instruction of type j
T = C / f
C = I_1 x C_1 + … + I_k x C_k
I = I_1 + I_2 + … + I_k
CPI = C / I
T = (I x CPI) / f

28 15-447 Computer ArchitectureFall 2007 © Performance Calculation Example: fact(4)
fact:
  a. pushl %ebp            # Setup
  b. movl %esp,%ebp        # Setup
  c. movl $1,%eax          # eax = 1
  d. movl 8(%ebp),%edx     # edx = x
L11:
  e. imull %edx,%eax       # result *= x
  f. decl %edx             # x--
  g. cmpl $1,%edx          # Compare x:1
  h. jg L11                # if > repeat
  i. movl %ebp,%esp        # Finish
  j. popl %ebp             # Finish
  k. ret                   # Finish
f = 1GHz
Inst Type   Inst    Cycles
1           imull   5
2           decl    2
3           Other   1
Calculate: T, C, I, & CPI when fact is executed with input x = 4

29 15-447 Computer ArchitectureFall 2007 © Performance Calculation Example: fact(4)
fact:
  a. pushl %ebp
  b. movl %esp,%ebp
  c. movl $1,%eax
  d. movl 8(%ebp),%edx
L11:
  e. imull %edx,%eax
  f. decl %edx
  g. cmpl $1,%edx
  h. jg L11
  i. movl %ebp,%esp
  j. popl %ebp
  k. ret
Inst Type   Inst    Cycles
1           imull   5
2           decl    2
3           Other   1
Table to fill in (columns: Inst, Count, Cycles), one row for each of a, b, c, d, e, f, g, h, i, j, k, and a Total row.
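
The slide leaves the count table blank. One way to work the exercise is sketched below: it replays the control flow of the listing to build a dynamic instruction trace, then applies the per-type cycle costs (imull 5, decl 2, everything else 1) and f = 1 GHz. The helper names `trace_fact` and `summarize` are illustrative only, and the result is a derived answer under these assumptions, not one stated in the slides.

```python
# Cycle cost per instruction type, from the slide's table.
CYCLES = {"imull": 5, "decl": 2}
OTHER_CYCLES = 1
CLOCK_HZ = 1e9  # f = 1 GHz, as stated on the slide

SETUP = ["pushl", "movl", "movl", "movl"]   # instructions a-d
LOOP = ["imull", "decl", "cmpl", "jg"]      # instructions e-h
FINISH = ["movl", "popl", "ret"]            # instructions i-k


def trace_fact(x):
    """Return the dynamic instruction trace of fact(x).

    The loop at L11 is bottom-tested: the body always runs once, and
    jg repeats it while the decremented x is still greater than 1.
    """
    trace = list(SETUP)
    while True:
        trace.extend(LOOP)
        x -= 1              # decl %edx
        if not x > 1:       # jg L11 falls through
            break
    trace.extend(FINISH)
    return trace


def summarize(x):
    """Return (I, C, CPI, T) for fact(x) under the assumed cycle costs."""
    trace = trace_fact(x)
    instructions = len(trace)
    cycles = sum(CYCLES.get(op, OTHER_CYCLES) for op in trace)
    return instructions, cycles, cycles / instructions, cycles / CLOCK_HZ


I, C, CPI, T = summarize(4)
print(I, C, round(CPI, 3), T)
```

For x = 4 the loop body runs three times, which gives I = 19 instructions, C = 34 cycles, CPI of roughly 1.79, and T = 34 ns.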

30 15-447 Computer ArchitectureFall 2007 © °Performance best determined by running a real application Use programs typical of expected workload Or, typical of expected class of applications ex: compilers/editors, scientific applications, graphics °Small benchmarks nice for architects and designers easy to standardize can be abused Benchmarks

31 15-447 Computer ArchitectureFall 2007 © Benchmarks (2) °SPEC (Standard Performance Evaluation Corporation) companies have agreed on a set of real programs and inputs valuable indicator of performance (and compiler technology) can still be abused

32 15-447 Computer ArchitectureFall 2007 © Standard Performance Evaluation Corporation SPEC is supported by a number of computer vendors to create standard sets of benchmarks for modern computer systems. The SPEC benchmark sets include CPU performance, graphics, High-performance computing, Object-oriented computing, Java applications, Client-server models, Mail systems, File systems, and Web servers.

33 15-447 Computer ArchitectureFall 2007 © SPEC ‘89 °Compiler “enhancements” and performance

34 15-447 Computer ArchitectureFall 2007 © SPEC CPU Benchmarks CINT2000 : the SPEC ratio for the integer benchmark sets CFP2000 : the SPEC ratio for the floating-point benchmark sets.

35 15-447 Computer ArchitectureFall 2007 © SPEC 2000 Does doubling the clock rate double the performance? Can a machine with a slower clock rate have better performance?

36 15-447 Computer ArchitectureFall 2007 © SPEC 2000 Does doubling the clock rate double the performance? Can a machine with a slower clock rate have better performance?

37 15-447 Computer ArchitectureFall 2007 © Execution Time After Improvement = Execution Time Unaffected + ( Execution Time Affected / Amount of Improvement ) Amdahl's Law

38 15-447 Computer ArchitectureFall 2007 © Example °Application execution time = 20sec 12 seconds are spent performing add operations °If we improve the add operation to run twice as fast, how much faster will the application run?
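
This example can be worked directly from Amdahl's Law: execution time after improvement = unaffected time + (affected time / amount of improvement). The sketch below uses the slide's numbers; the function name is illustrative.

```python
def time_after_improvement(unaffected, affected, improvement):
    """Amdahl's Law: only the affected part of the time shrinks."""
    return unaffected + affected / improvement


total = 20.0      # seconds for the whole application (from the slide)
add_time = 12.0   # seconds spent in add operations (from the slide)

# Adds run twice as fast; the other 8 seconds are unaffected.
new_total = time_after_improvement(total - add_time, add_time, 2.0)
overall_speedup = total / new_total
print(new_total, round(overall_speedup, 2))
```

The application drops from 20 s to 14 s, so doubling the speed of adds makes the whole application only about 1.43 times faster.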

39 15-447 Computer ArchitectureFall 2007 © Amdahl’s Law °Example: "Suppose a program runs in 100 seconds on a machine, with multiply responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 4 times faster?"
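
Solving Amdahl's Law for the amount of improvement answers this question. The sketch below rearranges target time = unaffected + affected / n to get n; the function name and the error handling for unreachable targets are assumptions added for illustration.

```python
def required_improvement(total, affected, target_speedup):
    """Return the factor by which the affected part must speed up.

    Rearranged from: total / target_speedup = unaffected + affected / n.
    """
    unaffected = total - affected
    target_time = total / target_speedup
    if target_time <= unaffected:
        # The unaffected part alone already takes at least the target time.
        raise ValueError("target speedup is not reachable")
    return affected / (target_time - unaffected)


# 100 s program, 80 s of it in multiply, goal: run 4 times faster.
n = required_improvement(100.0, 80.0, 4.0)
print(n)
```

The target is 25 s, of which 20 s is unaffected, so multiplication must fit in 5 s: it has to become 16 times faster.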

40 15-447 Computer ArchitectureFall 2007 © Execution time after improvement Amdahl's Law

41 15-447 Computer ArchitectureFall 2007 © MIPS (million instructions per second) Example
Instruction counts (in billions) for each instruction class:
Code from     A (1 CPI)   B (2 CPI)   C (3 CPI)
Compiler 1        5           1           1
Compiler 2       10           1           1
Clock rate = 4GHz. A, B, C: instruction classes. Which code sequence will execute faster according to MIPS? According to execution time?

42 15-447 Computer ArchitectureFall 2007 © Execution time & MIPS
CPU clock cycles1 = (5 x 1 + 1 x 2 + 1 x 3) x 10^9 = 10 x 10^9
CPU clock cycles2 = (10 x 1 + 1 x 2 + 1 x 3) x 10^9 = 15 x 10^9
Execution time1 = (10 x 10^9) / (4 x 10^9) = 2.5 seconds
Execution time2 = (15 x 10^9) / (4 x 10^9) = 3.75 seconds

43 15-447 Computer ArchitectureFall 2007 © Execution time & MIPS (2)
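
The content of this slide did not survive in the transcript. The MIPS side of the comparison can be sketched from the example's inputs (instruction counts in billions per class, CPI of 1, 2, and 3 for classes A, B, and C, and a 4 GHz clock), taking MIPS = instruction count / (execution time x 10^6). The code layout is illustrative.

```python
CLOCK_HZ = 4e9                      # clock rate from the example
CPI = {"A": 1, "B": 2, "C": 3}      # cycles per instruction for each class

# Dynamic instruction counts for the code produced by each compiler.
COUNTS = {
    "Compiler 1": {"A": 5e9, "B": 1e9, "C": 1e9},
    "Compiler 2": {"A": 10e9, "B": 1e9, "C": 1e9},
}


def evaluate(counts):
    """Return (execution time in seconds, MIPS rating) for one code sequence."""
    instructions = sum(counts.values())
    cycles = sum(counts[cls] * CPI[cls] for cls in counts)
    seconds = cycles / CLOCK_HZ
    mips = instructions / (seconds * 1e6)
    return seconds, mips


results = {name: evaluate(counts) for name, counts in COUNTS.items()}
for name, (seconds, mips) in results.items():
    print(f"{name}: {seconds:.2f} s, {mips:.0f} MIPS")
```

Compiler 2's code rates higher in MIPS (3200 against 2800) yet takes longer to run (3.75 s against 2.5 s), so the MIPS rating and execution time disagree about which is faster.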

44 15-447 Computer ArchitectureFall 2007 © Performance Evaluation °Performance depends on Hardware architecture Software environment °Meaning of performance depends on viewpoint User: time System Manager: throughput

45 15-447 Computer ArchitectureFall 2007 © Performance Evaluation °Kinds of Performance Graphics Network Transactional Multi-user system I/O Scientific/Engineering codes

46 15-447 Computer ArchitectureFall 2007 © Example on the MIPS R10K
Prof run at: Tue Apr 28 15:50:26 1998
Command line: prof suboptim.ideal.m28293
109148754: Total number of cycles
0.55974s: Total execution time
77660914: Total number of instructions executed
1.405: Ratio of cycles / instruction
195: Clock rate in MHz
R10000: Target processor modelled
cycles(%)           cum %    secs   instrns    calls   procedure
61901843(56.71)     56.71    0.32   45113360   1       pdot
47212563(43.26)     99.97    0.24   32523280   1       init
31767( 0.03)       100.00    0.00   21523      1       vsum
1069( 0.00)        100.00    0.00   887        3       fflush
:  :  :  :  :  :
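
The summary figures in this profile can be cross-checked against the formulas T = C / f and CPI = C / I. The sketch below uses only the totals reported by prof; the variable names are illustrative.

```python
# Totals reported in the prof output.
total_cycles = 109148754
total_instructions = 77660914
clock_hz = 195e6          # 195 MHz

# T = C / f and CPI = C / I
execution_time = total_cycles / clock_hz
cpi = total_cycles / total_instructions

# Share of all cycles spent in pdot, the most expensive procedure.
pdot_share = 61901843 / total_cycles * 100

print(f"{execution_time:.5f} s, CPI {cpi:.3f}, pdot {pdot_share:.2f}%")
```

The computed values agree with the reported 0.55974 s, 1.405 cycles per instruction, and 56.71% for pdot.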

