Performance 9 ways to fool the public Old Chapter 4 New Chapter 1.4.


1 Performance 9 ways to fool the public Old Chapter 4 New Chapter 1.4

2 #1 – Reporting Results Reporting “peak” performance (Capable of 4 Instructions Per Cycle)

3 #1 – Reporting Results Reporting “peak” performance (Capable of 4 Instructions Per Cycle) Solution: Report actual performance on actual programs

4 #2 - Choosing Applications Choose programs that run quickly but do not represent the real world –Programs that you know run faster on your machine than on your competitor’s

5 #2 - Choosing Applications Choose programs that run quickly but do not represent the real world Solution: Form a committee that creates benchmarks Benchmarks should: –Be representative of the real world –Expose performance aspects of the machine

6 #3 - Choosing Applications Stuff the benchmark body with company members and do not invite your competitor

7 #3 - Choosing Applications Stuff the benchmark body with company members and do not invite your competitor –Intel did this to AMD Solution: Make sure lots of companies are represented on the committee

8 Benchmark suites SPEC-fp – scientific codes SPEC-int – integer codes (irregular, small codes) TPC – database benchmarks EEMBC – embedded processor benchmarks Many, many others (Java, networks, media, …)

9 #4 - Running Applications Include flags in your compiler that hand-optimize specific benchmarks Solution:

10 #4 - Running Applications Include flags in your compiler that hand-optimize specific benchmarks Solution: Specify how much optimization is allowed and/or have independent bodies perform tests –http://www.tomshardware.com –http://www.anandtech.com

11 #5 – Reporting Results Selectively reporting results Solution:

12 #5 – Reporting Results Selectively reporting results –Apple vs Intel Solution: Report suite performance, not just individual benchmarks Speedup of X over Y =

13 #5 – Reporting Results Selectively reporting results –Apple vs Intel Solution: Report suite performance, not just individual benchmarks Speedup of X over Y = ExecutionTimeY / ExecutionTimeX

14 Speedup
          Comp A  Comp B  Comp C
P1             1      10      20
P2          1000     100      20
Total       1001     110      40
On P1, A is ____ times as fast as B
On P1, B is ____ times as fast as C

15 Speedup
          Comp A  Comp B  Comp C
P1             1      10      20
P2          1000     100      20
On P1, A is 10 times as fast as B
On P1, B is 2 times as fast as C
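The speedup rule from slide 13 (Speedup of X over Y = ExecutionTimeY / ExecutionTimeX) can be checked against this table with a few lines of Python. This is a minimal sketch: the `times` dictionary and the `speedup` helper are illustrative names, and the slides do not state the time units.

```python
# Execution times from the slide's table (units are not given on the slide).
times = {
    "A": {"P1": 1, "P2": 1000},
    "B": {"P1": 10, "P2": 100},
    "C": {"P1": 20, "P2": 20},
}

def speedup(x, y, program):
    """Speedup of machine x over machine y on one program:
    ExecutionTime(y) / ExecutionTime(x)."""
    return times[y][program] / times[x][program]

print(speedup("A", "B", "P1"))  # 10.0
print(speedup("B", "C", "P1"))  # 2.0
print(speedup("A", "B", "P2"))  # 0.1, so A is slower than B on P2
```

The P2 row reverses the ranking, which is why reporting a single program can mislead.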

16 #6 – Floating 0 on graph

17 #6 – Normalized speedup, 0 to 1… All execution times are divided by North’s execution time to obtain speedup

18 #7 – Reporting Results Manipulating average results Solution:

19 Speedup
          Comp A  Comp B  Comp C
P1             1      10      20
P2          1000     100      20
Arithmetic Mean, Geometric Mean, Weighted Mean (60/40)

20 #7 – Reporting Results Manipulating average results Solution: Use geometric mean or weighted means rather than arithmetic means (average)
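The three means named on slide 19 can be compared on the table's numbers. This is a sketch under assumptions: the slides do not say whether the means are taken over raw execution times or over speedups, nor which program gets the 60% weight; here the means are taken over raw times and P1 is assumed to carry weight 0.6. All function names are illustrative.

```python
import math

# Execution times from the slide's table, listed as [P1, P2].
times = {"A": [1, 1000], "B": [10, 100], "C": [20, 20]}

def arithmetic_mean(values):
    return sum(values) / len(values)

def geometric_mean(values):
    # n-th root of the product of the n values.
    return math.prod(values) ** (1.0 / len(values))

def weighted_mean(values, weights):
    # Assumes the weights sum to 1.
    return sum(w * v for w, v in zip(weights, values))

# Assumption: P1 weighted 60%, P2 weighted 40%.
weights = [0.6, 0.4]

for name, t in times.items():
    print(name,
          round(arithmetic_mean(t), 2),
          round(geometric_mean(t), 2),
          round(weighted_mean(t, weights), 2))
```

Under these assumptions the arithmetic mean puts Comp B (55) far ahead of Comp A (500.5), while the geometric mean rates A and B the same (about 31.6 each), so the choice of mean changes the conclusion.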

21 #8 – Reporting Results Reporting speedup of a small portion of code Solution:

22 #8 – Reporting Results Reporting speedup of a small portion of code Solution: Use Amdahl’s Law

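23 Amdahl’s Law Speedup = 1 / ((1 - f) + f / s), where f is the fraction of execution time affected by the improvement and s is the speedup of that fraction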

24 Amdahl’s Law Helps target design effort Optimize the common case Allow rare cases to suffer

25 Amdahl’s Law Example Suppose multiplication operations constitute 80% of the execution of a program. How much improvement do we need in the multiply to get a 3x speedup in the program? What is the most possible speedup by improving only the multiply?

26 #9 – Misusing alternate metrics IPC = Instructions per Cycle CPI = Cycles per Instruction MIPS = Millions of Instructions per Second

27 CPU time = #Insts * CPI * Clock cycle time = #Insts * CPI / Clock rate (go to board)

28 Using Execution Time Our old 1GHz machine runs our program in 20 seconds. We are designing a new machine, and the target execution time is 10 seconds. Through various architectural innovations, we can increase clock rate. Unfortunately, this increase comes at a price – we can pump the clock rate up to 2GHz, but the new execution takes 1.2 times as many cycles. Is this enough? What is the clock rate necessary for our performance target?
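One way to work this problem is to use execution time = cycles / clock rate. The variable names below are illustrative; the numbers are the ones stated on the slide.

```python
old_clock_hz = 1e9         # old machine: 1 GHz
old_time_s = 20.0          # old execution time
target_time_s = 10.0       # design target
cycle_growth = 1.2         # new design takes 1.2 times as many cycles
proposed_clock_hz = 2e9    # proposed clock rate: 2 GHz

# cycles = time * clock rate
old_cycles = old_time_s * old_clock_hz
new_cycles = cycle_growth * old_cycles

# time = cycles / clock rate
new_time_s = new_cycles / proposed_clock_hz

# Clock rate needed to run new_cycles in the target time.
required_clock_hz = new_cycles / target_time_s

print(new_time_s)               # about 12 seconds, which misses the 10 second target
print(required_clock_hz / 1e9)  # about 2.4 GHz
```

Doubling the clock is not enough because the cycle count also grew; about 2.4 GHz is needed to hit 10 seconds.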

29 #9 – Misusing alternate metrics IPC = Instructions per Cycle CPI = Cycles per Instruction MIPS = Millions of Instructions per Second Clock Rate = Cycles / Second Solution: Only use alternate metrics when the other elements of the execution time equation are equal.

30 Using CPI We have two code sequences involving recalculation of the same values. The compiler writer can choose between having a code sequence of 7 instructions to recalculate three values, or temporarily storing them in the stack. What are the total cycles of the code sequences? Which should the compiler do?

31 Program characteristics:
Code Sequence   Loads   Stores   ALU Ops
1                   2        1        15
2                   5        4         8
Machine performance characteristics:
Instruction Class   Average CPI for Instruction Class
Loads               2
Stores              1.5
ALU Ops             1
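The total cycles for each sequence are the instruction counts multiplied by the per-class CPI and summed. The sketch below reads the flattened table as sequence 1 = 2 loads, 1 store, 15 ALU ops and sequence 2 = 5 loads, 4 stores, 8 ALU ops; that reading is an interpretation, chosen because sequence 1 then has the 7 extra ALU instructions described on slide 30. The names are illustrative.

```python
# Average CPI per instruction class, from the slide.
cpi = {"load": 2.0, "store": 1.5, "alu": 1.0}

# Instruction counts per code sequence (assumed reading of the slide's table).
sequences = {
    1: {"load": 2, "store": 1, "alu": 15},  # recalculate the values
    2: {"load": 5, "store": 4, "alu": 8},   # store the values on the stack
}

def total_cycles(mix):
    """Sum of count * CPI over all instruction classes."""
    return sum(count * cpi[kind] for kind, count in mix.items())

def instruction_count(mix):
    return sum(mix.values())

for number, mix in sequences.items():
    print(number, instruction_count(mix), total_cycles(mix))
# 1 18 20.5
# 2 17 24.0
```

Under this reading, sequence 1 executes one more instruction yet takes fewer cycles (20.5 versus 24), so recalculating is the better choice; instruction count alone would have pointed the other way.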

