CPS3340 COMPUTER ARCHITECTURE Fall Semester, 2013 09/03/2013 Lecture 3: Computer Performance Instructor: Ashraf Yaseen DEPARTMENT OF MATH & COMPUTER SCIENCE.


1 CPS3340 COMPUTER ARCHITECTURE
Fall Semester, 2013
09/03/2013 Lecture 3: Computer Performance
Instructor: Ashraf Yaseen
DEPARTMENT OF MATH & COMPUTER SCIENCE
CENTRAL STATE UNIVERSITY, WILBERFORCE, OH

2 Review
- Last Class
  - Definition of Computer Performance
  - Measure of Computer Performance
- This Class
  - Computer Performance
  - Power Wall
  - Assignment 1
- Next Class
  - Computer Logic
  - Boolean

3 Performance Summary
The BIG Picture
- Performance depends on
  - Algorithm: affects IC, possibly CPI
  - Programming language: affects IC, CPI
  - Compiler: affects IC, CPI
  - Instruction set architecture: affects IC, CPI, Tc
- IC = instruction count, CPI = cycles per instruction, Tc = clock cycle time

4 Power Trends
§1.5 The Power Wall
- In CMOS IC technology: Power = Capacitive load × Voltage² × Frequency
  - Power: ×30
  - Voltage: 5V → 1V
  - Frequency: ×1000

5 Reducing Power
- Suppose a new CPU has
  - 85% of capacitive load of old CPU
  - 15% voltage and 15% frequency reduction
- The power wall
  - We can’t reduce voltage further
  - We can’t remove more heat
- How else can we improve performance?
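The new-CPU example above can be worked through numerically. This is a minimal sketch that assumes the usual CMOS dynamic power relation, Power = Capacitive load × Voltage² × Frequency; the function name `relative_power` is made up for illustration.

```python
def relative_power(cap_scale, volt_scale, freq_scale):
    """Return P_new / P_old, assuming P = C * V^2 * f (dynamic power only)."""
    return cap_scale * volt_scale ** 2 * freq_scale

# 85% of the capacitive load; a 15% reduction leaves 85% of voltage and frequency.
ratio = relative_power(0.85, 0.85, 0.85)
print(f"new CPU power is about {ratio:.2f} of the old CPU power")
```

Under these assumptions the ratio is 0.85 to the fourth power, so the new CPU draws roughly half the power of the old one.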

6 Uniprocessor Performance
§1.6 The Sea Change: The Switch to Multiprocessors
- Constrained by power, instruction-level parallelism, memory latency

7 Multiprocessors
- Multicore microprocessors
  - More than one processor per chip
- Requires explicitly parallel programming
  - Compare with instruction level parallelism
    - Hardware executes multiple instructions at once
    - Hidden from the programmer
  - Hard to do
    - Programming for performance
    - Load balancing
    - Optimizing communication and synchronization

8 Manufacturing ICs
§1.7 Real Stuff: The AMD Opteron X4
- Yield: proportion of working dies per wafer
- http://www.youtube.com/watch?v=-GQmtITMdas

9 AMD Opteron X2 Wafer
- X2: 300mm wafer, 117 chips, 90nm technology
- X4: 45nm technology

10 Integrated Circuit Cost
- Nonlinear relation to area and defect rate
  - Wafer cost and area are fixed
  - Defect rate determined by manufacturing process
  - Die area determined by architecture and circuit design
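The nonlinear relation between cost and die area can be illustrated with a short sketch. It assumes the commonly used textbook model: dies per wafer ≈ wafer area / die area, yield = 1 / (1 + defects per area × die area / 2)², and cost per die = wafer cost / (dies per wafer × yield). The wafer cost and defect density below are made-up numbers, not figures from the lecture.

```python
import math

def dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    """Approximate die count, ignoring edge losses."""
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    return wafer_area / die_area_mm2

def die_yield(defects_per_mm2, die_area_mm2):
    """Fraction of working dies under the assumed yield model."""
    return 1.0 / (1.0 + defects_per_mm2 * die_area_mm2 / 2.0) ** 2

def cost_per_die(wafer_cost, wafer_diameter_mm, die_area_mm2, defects_per_mm2):
    good_dies = dies_per_wafer(wafer_diameter_mm, die_area_mm2) * die_yield(
        defects_per_mm2, die_area_mm2
    )
    return wafer_cost / good_dies

# Hypothetical inputs: $5000 wafer, 300 mm diameter, 0.002 defects per mm^2.
small = cost_per_die(5000, 300, 100, 0.002)
large = cost_per_die(5000, 300, 200, 0.002)
print(f"100 mm^2 die: ${small:.2f}, 200 mm^2 die: ${large:.2f}")
```

With these assumed inputs, doubling the die area more than doubles the cost per die, because fewer dies fit on the wafer and a smaller fraction of them work.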

11 SPEC CPU Benchmark
- Programs used to measure performance
  - Supposedly typical of actual workload
- Standard Performance Evaluation Corporation (SPEC)
  - Develops benchmarks for CPU, I/O, Web, …
- SPEC CPU2006
  - Elapsed time to execute a selection of programs
    - Negligible I/O, so focuses on CPU performance
  - Normalize relative to reference machine
  - Summarize as geometric mean of performance ratios
    - CINT2006 (integer) and CFP2006 (floating-point)

12 CINT2006 for Opteron X4 2356

Name        Description                    IC×10^9  CPI    Tc (ns)  Exec time (s)  Ref time (s)  SPECratio
perl        Interpreted string processing  2,118    0.75   0.40     637            9,777         15.3
bzip2       Block-sorting compression      2,389    0.85   0.40     817            9,650         11.8
gcc         GNU C Compiler                 1,050    1.72   0.40     724            8,050         11.1
mcf         Combinatorial optimization     336      10.00  0.40     1,345          9,120         6.8
go          Go game (AI)                   1,658    1.09   0.40     721            10,490        14.6
hmmer       Search gene sequence           2,783    0.80   0.40     890            9,330         10.5
sjeng       Chess game (AI)                2,176    0.96   0.40     837            12,100        14.5
libquantum  Quantum computer simulation    1,623    1.61   0.40     1,047          20,720        19.8
h264avc     Video compression              3,102    0.80   0.40     993            22,130        22.3
omnetpp     Discrete event simulation      587      2.94   0.40     690            6,250         9.1
astar       Games/path finding             1,082    1.79   0.40     773            7,020         9.1
xalancbmk   XML parsing                    1,058    2.70   0.40     1,143          6,900         6.0
Geometric mean                                                                                   11.7
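The table's columns are related by two simple formulas, sketched below: execution time = IC × CPI × Tc, and SPECratio = reference time / execution time, summarized with a geometric mean. The numbers are copied from the table; the helper names `cpu_time` and `geometric_mean` are illustrative choices, not part of any SPEC tooling.

```python
import math

# (name, exec time in s, ref time in s), copied from the CINT2006 table
ROWS = [
    ("perl", 637, 9777), ("bzip2", 817, 9650), ("gcc", 724, 8050),
    ("mcf", 1345, 9120), ("go", 721, 10490), ("hmmer", 890, 9330),
    ("sjeng", 837, 12100), ("libquantum", 1047, 20720),
    ("h264avc", 993, 22130), ("omnetpp", 690, 6250),
    ("astar", 773, 7020), ("xalancbmk", 1143, 6900),
]

def cpu_time(instruction_count, cpi, cycle_time_s):
    """Execution time in seconds: IC x CPI x Tc."""
    return instruction_count * cpi * cycle_time_s

def geometric_mean(values):
    """n-th root of the product, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

ratios = [ref / exec_s for (_, exec_s, ref) in ROWS]
spec_score = geometric_mean(ratios)

# perl: IC = 2,118 x 10^9, CPI = 0.75, Tc = 0.40 ns
perl_time = cpu_time(2118e9, 0.75, 0.40e-9)

print(f"geometric mean of SPECratios: {spec_score:.1f}")
print(f"perl time from IC x CPI x Tc: {perl_time:.1f} s (table lists 637 s)")
```

The small gap between the computed perl time and the listed 637 s presumably comes from rounding in the published CPI and Tc columns.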

13 SPEC Power Benchmark
- Power consumption of server at different workload levels
  - Performance: ssj_ops/sec
  - Power: Watts (Joules/sec)

14 SPECpower_ssj2008 for X4

Target Load %       Performance (ssj_ops/sec)  Average Power (Watts)
100%                231,867                    295
90%                 211,282                    286
80%                 185,803                    275
70%                 163,427                    265
60%                 140,160                    256
50%                 118,324                    246
40%                 92,035                     233
30%                 70,500                     222
20%                 47,126                     206
10%                 23,066                     180
0%                  0                          141
Overall sum         1,283,590                  2,605
∑ssj_ops / ∑power                              493
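The overall figure in the last row is the sum of the performance column divided by the sum of the power column. A minimal sketch of that calculation follows; the 40% row is taken as 92,035 ssj_ops/sec at 233 W, which is the only reading consistent with the stated totals.

```python
# (target load %, ssj_ops/sec, average power in watts) from the table above
MEASUREMENTS = [
    (100, 231867, 295), (90, 211282, 286), (80, 185803, 275),
    (70, 163427, 265), (60, 140160, 256), (50, 118324, 246),
    (40, 92035, 233), (30, 70500, 222), (20, 47126, 206),
    (10, 23066, 180), (0, 0, 141),
]

total_ops = sum(ops for (_, ops, _) in MEASUREMENTS)
total_power = sum(watts for (_, _, watts) in MEASUREMENTS)

# Overall metric: total ssj_ops divided by total watts across all load levels
overall_ops_per_watt = total_ops / total_power

print(f"sum ssj_ops = {total_ops}, sum power = {total_power} W")
print(f"overall ssj_ops per watt = {overall_ops_per_watt:.0f}")
```

Note that the 0% row still contributes 141 W to the denominator, so idle power lowers the overall score.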

15 Pitfall: Amdahl’s Law
§1.8 Fallacies and Pitfalls
- Improving an aspect of a computer and expecting a proportional improvement in overall performance
  - T_improved = T_affected / improvement factor + T_unaffected
- Example: multiply accounts for 80s/100s
  - How much improvement in multiply performance to get 5× overall?
  - 5× overall means 20s total: 20 = 80/n + 20
  - Can’t be done!
- Corollary: make the common case fast
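The multiply example can be checked with a short sketch of Amdahl's Law, written as improved time = affected time / improvement factor + unaffected time. The function names are illustrative; `required_factor` returns `None` when no finite improvement can reach the target.

```python
def improved_time(t_affected, t_unaffected, factor):
    """Total time after speeding up only the affected part by `factor`."""
    return t_affected / factor + t_unaffected

def required_factor(t_affected, t_unaffected, target_speedup):
    """Improvement factor needed for a given overall speedup, or None if impossible."""
    target_time = (t_affected + t_unaffected) / target_speedup
    remaining = target_time - t_unaffected  # time budget left for the affected part
    if remaining <= 0:
        return None
    return t_affected / remaining

# Multiply accounts for 80 s of a 100 s run.
five_x = required_factor(80, 20, 5)   # needs 20 s total, but 20 s is untouched
four_x = required_factor(80, 20, 4)   # needs 25 s total, leaving 5 s for multiply

print("5x overall:", five_x)
print("4x overall needs multiply to be", four_x, "times faster")
```

Even an infinitely fast multiplier leaves the 20 s of other work, so the overall speedup approaches but never reaches 5×.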

16 Fallacy: Low Power at Idle
- Look back at X4 power benchmark
  - At 100% load: 295W
  - At 50% load: 246W (83%)
  - At 10% load: 180W (61%)
- Google data center
  - Mostly operates at 10% – 50% load
  - At 100% load less than 1% of the time
- Consider designing processors to make power proportional to load

17 Pitfall: MIPS as a Performance Metric
- MIPS: Millions of Instructions Per Second
  - Doesn’t account for
    - Differences in ISAs between computers
    - Differences in complexity between instructions
- CPI varies between programs on a given CPU
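To see why CPI variation makes MIPS misleading, the sketch below uses the standard definition MIPS = clock rate / (CPI × 10^6) with two CPI values from the CINT2006 table (perl at 0.75 and mcf at 10.00) and the 0.40 ns cycle time listed there. The function name `mips` is an illustrative choice.

```python
def mips(clock_rate_hz, cpi):
    """Millions of instructions per second: clock rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)

clock_rate_hz = 1 / 0.40e-9  # a 0.40 ns cycle time corresponds to 2.5 GHz

mips_perl = mips(clock_rate_hz, 0.75)
mips_mcf = mips(clock_rate_hz, 10.00)

print(f"perl: {mips_perl:.0f} MIPS, mcf: {mips_mcf:.0f} MIPS on the same CPU")
```

The same processor shows more than a tenfold difference in MIPS between the two programs, so a single MIPS number says little about the machine.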

18 Concluding Remarks
§1.9 Concluding Remarks
- Cost/performance is improving
  - Due to underlying technology development
- Hierarchical layers of abstraction
  - In both hardware and software
- Instruction set architecture
  - The hardware/software interface
- Execution time: the best performance measure
- Power is a limiting factor
  - Use parallelism to improve performance

19 Summary
- Performance Definition
- Power Trend
- Amdahl’s Law

20 What I want you to do
- Review Chapter 1
- Work on your assignment 1

