Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania 18042 ECE 313 - Computer Organization Performance Feb 2005.




1. ECE 313 - Computer Organization: Performance
Prof. John Nestor, ECE Department, Lafayette College, Easton, Pennsylvania 18042 (nestorj@lafayette.edu). Feb 2005.
Reading: 4.1-4.6, 4.7*
Portions of these slides are derived from: textbook figures © 1998 Morgan Kaufmann Publishers, all rights reserved; Tod Amon's COD2e slides © 1998 Morgan Kaufmann Publishers, all rights reserved; Dave Patterson's CS 152 slides, Fall 1997, © UCB; Rob Rutenbar's 18-347 slides, Fall 1999, CMU; other sources as noted.

2. Roadmap for the term: major topics
- Computer Systems Overview
- Technology Trends
- Instruction Sets (and Software)
- Logic & Arithmetic
- Performance  <- this lecture
- Processor Implementation
- Memory Systems
- Input/Output

3. Performance Outline
- Motivation  <- current section
- Defining Performance
- Common Performance Metrics
- Benchmarks
- Amdahl's Law

4. Motivation
- Goal: learn to "measure, report and summarize" the performance of a computer system
- Why study performance?
  - To make intelligent decisions when choosing a system
  - To make intelligent decisions when designing a system
  - To understand the impact of implementation decisions
- Challenges:
  - How do we measure performance accurately?
  - How do we compare performance fairly?

5. What's a good measure of performance?
- Execution time (a.k.a. response time, latency)
  - How long it takes to complete a single task
  - Example: "How long does it take to rip an MP3 file?"
- Throughput
  - How many tasks are completed per unit time
  - Example: "How many MP3 files can I rip per hour?"
- The measure we use depends on the application

6. Execution Time vs. Throughput
- Analogy: passenger airplanes (book Figure 4.1)
  - Concorde: fastest "response time" for an individual user
  - Boeing 747: highest passenger throughput

7. Performance Outline
- Motivation
- Defining Performance  <- current section
- Common Performance Metrics
- Benchmarks
- Amdahl's Law

8. Measuring Execution Time
- Wall-clock time (elapsed time)
  - Includes I/O waiting
  - Includes time while the OS runs other jobs
- CPU time, measured by the OS:
  - User time: time spent in the user program during execution
  - System time: time spent in the OS during execution
- Measuring CPU time with the Unix/Linux time command:
  % time myprog
  ...
  90.7u 12.9s 2:39 65%
  (90.7u = user time, 12.9s = system time, 2:39 = wall-clock time, 65% = CPU utilization)

9. Defining Performance using Execution Time
- For a given program on machine X:
  Performance_X = 1 / Execution Time_X
- Comparing the performance of machines:
  Performance_X > Performance_Y if Execution Time_X < Execution Time_Y

10. Defining Performance (cont'd)
- We say "X is n times faster than Y" if:
  n = Performance_X / Performance_Y = Execution Time_Y / Execution Time_X
- Use this definition to compare performance in homework & exam problems!
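As a quick sketch of the definition above (not from the slides; the execution times are made-up numbers):

```python
def times_faster(exec_time_x, exec_time_y):
    """Return n such that machine X is n times faster than machine Y.

    n = Performance_X / Performance_Y = Execution Time_Y / Execution Time_X
    """
    return exec_time_y / exec_time_x

# Hypothetical times: a program takes 10 s on X and 15 s on Y.
n = times_faster(10.0, 15.0)
print(n)  # 1.5 -> X is 1.5 times faster than Y
```

Note the inversion: the *faster* machine has the *smaller* execution time, which is why Y's time goes in the numerator.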

11. Example - Performance
- Which machine is faster, A or B?
- By how much?
  (The slide's table of execution times for machines A and B is not reproduced in this transcript.)

12. Performance Outline
- Motivation
- Defining Performance
- Common Performance Metrics  <- current section
- Benchmarks
- Amdahl's Law

13. Clocks and Performance
- Every processor has a clock
  - Clock frequency: MHz (or GHz)
  - Clock cycle time (period): ns (or ps)
- How do we relate the clock to program performance?
- Example: a cycle time of t_clock = 2 ns corresponds to f_clk = 1 / t_clock = 500 MHz
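The frequency/period relationship can be checked with the slide's own numbers; this is a minimal sketch:

```python
def freq_from_period(t_clock_seconds):
    """Clock frequency (Hz) is the reciprocal of the cycle time (s)."""
    return 1.0 / t_clock_seconds

# The slide's numbers: a 2 ns cycle time gives a 500 MHz clock.
f_clk = freq_from_period(2e-9)
print(f_clk / 1e6)  # 500.0 (MHz)
```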

14. Example - Clock Cycles
- How many clock cycles does A execute?
- How many clock cycles does B execute?

15. Review - Clocks in Sequential Circuits
- The clock controls sequential circuit operation:
  - Register outputs change at the beginning of a cycle
  - Combinational logic determines the "next state"
  - Storage elements store the new state
[Figure: register output -> combinational logic (adder, mux) -> register input, all driven by a common clock]

16. What Limits Clock Frequency?
- Propagation delay t_prop:
  - Logic (including register outputs)
  - Interconnect
- Register setup time t_setup
- Timing constraint: t_clock > t_prop + t_setup
- Equivalently: t_clock = t_prop + t_setup + t_slack
[Figure: register-to-register path showing t_prop and t_setup within one clock period]
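A small sketch of the timing constraint above; the delay values are hypothetical, chosen to match the 2 ns / 500 MHz example from the earlier slide:

```python
def min_period(t_prop, t_setup, t_slack=0.0):
    """Smallest legal clock period for a register-to-register path."""
    return t_prop + t_setup + t_slack

def max_freq_hz(t_prop, t_setup):
    """Highest legal clock frequency given t_clock > t_prop + t_setup."""
    return 1.0 / (t_prop + t_setup)

# Hypothetical delays: 1.6 ns of propagation delay plus a 0.4 ns setup
# time limit the clock to a 2 ns period, i.e. about 500 MHz.
f_max = max_freq_hz(1.6e-9, 0.4e-9)  # -> approximately 500 MHz
```

Any slack beyond the minimum period simply lengthens the cycle without changing correctness, which is why designers push t_clock toward t_prop + t_setup.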

17. Clock Cycles per Instruction (CPI)
- Consider the 68HC11:
  - ADDA: 3 cycles (IMM) to 5 cycles (IND, Y)
  - MUL: 10 cycles
  - IDIV: 41 cycles
- More complex processors have other issues:
  - Pipelining: parallel execution, but sometimes stalls
  - Memory system issues: cache misses, page faults, etc.
- How can we combine these into an overall metric?
[Figure: timeline of instructions add, lda, mul, and, sta making up the total execution time]

18. Definition: Clock Cycles per Instruction (CPI)
- Average number of clock cycles per instruction, measured over an entire program:
  CPI = CPU clock cycles / Instruction count
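The definition above translates directly into code; the cycle and instruction counts here are made-up:

```python
def cpi(cpu_clock_cycles, instruction_count):
    """Average clock cycles per instruction over a whole program."""
    return cpu_clock_cycles / instruction_count

# Hypothetical counts: 5 billion cycles to run 2 billion instructions.
print(cpi(5_000_000_000, 2_000_000_000))  # 2.5
```

Because CPI is a program-wide average, it already folds in the per-instruction variation (ADDA vs. MUL vs. IDIV) and dynamic effects like stalls and cache misses.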

19. Example - CPI
- What is the CPI of A?
- What is the CPI of B?

20. Definition - MIPS
- MIPS: millions of instructions per second
  MIPS = Instruction count / (Execution time × 10^6) = Clock rate / (CPI × 10^6)
- Once used as a general metric for performance
- But not useful for comparing different architectures (the same task needs different instruction counts on different instruction sets)
- Often ridiculed as a "meaningless indicator of performance"
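Both forms of the MIPS definition can be sketched as follows (hypothetical machine parameters, not from the slides):

```python
def mips_from_time(instruction_count, exec_time_s):
    """MIPS = instruction count / (execution time x 10^6)."""
    return instruction_count / (exec_time_s * 1e6)

def mips_from_clock(clock_rate_hz, cpi):
    """Equivalent form: MIPS = clock rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)

# Hypothetical machine: 400 MHz clock with CPI = 2 -> 200 MIPS.
print(mips_from_clock(400e6, 2.0))  # 200.0
# The same rating via instruction count: 2e9 instructions in 10 s.
print(mips_from_time(2e9, 10.0))    # 200.0
```

The pitfall is visible in the formula: a machine can raise its MIPS rating by using more, simpler instructions for the same task, without finishing the task any sooner.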

21. Example - MIPS
- What is the MIPS rating of A?
- What is the MIPS rating of B?

22. Relating the Metrics - The Performance Equation
- The "Iron Law" of performance:
  CPU time = Instruction count × CPI × Clock cycle time = (Instruction count × CPI) / Clock rate
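A minimal sketch of the performance equation, using the same hypothetical numbers as the MIPS example:

```python
def cpu_time_s(instruction_count, cpi, clock_rate_hz):
    """Iron Law: CPU time = instruction count x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz

# Hypothetical program: 2e9 instructions, CPI = 2.0, 400 MHz clock.
print(cpu_time_s(2e9, 2.0, 400e6))  # 10.0 (seconds)
```

The equation is useful precisely because its three factors are controlled by different parts of the system: the compiler and ISA set the instruction count, the microarchitecture sets the CPI, and the circuit technology sets the clock rate.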

23. Clock Cycles and Performance - Example
- A program runs on Computer A:
  - CPU time: 10 seconds
  - Clock: 400 MHz
- Computer B can run its clock faster, but requires 1.2× as many clock cycles to perform the same task
- Desired CPU time on B: 6 seconds
- What should B's clock frequency be to reach this target?
- Key to the approach: the performance equation

24. Clock Cycles and Performance - Example (cont'd)
- First step: find the clock cycles executed by Computer A:
  Cycles_A = CPU time_A × Clock rate_A = 10 s × 400 MHz = 4 × 10^9 cycles
- Second step: find the clock cycles executed by Computer B:
  Cycles_B = 1.2 × Cycles_A = 4.8 × 10^9 cycles

25. Clock Cycles and Performance - Example (cont'd)
- Third step: given the clock cycles and the target CPU time, solve for Computer B's clock rate:
  Clock rate_B = Cycles_B / CPU time_B = 4.8 × 10^9 cycles / 6 s = 800 MHz
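The three steps of this worked example can be sketched directly from the given numbers:

```python
# Computer A: 10 s of CPU time at a 400 MHz clock.
cycles_a = 10 * 400e6        # -> 4.0e9 cycles executed by A

# Computer B needs 1.2x as many cycles for the same task.
cycles_b = 1.2 * cycles_a    # -> 4.8e9 cycles

# Target CPU time on B is 6 s; solve the performance equation for
# B's clock rate: clock rate = cycles / CPU time.
clock_b_hz = cycles_b / 6    # -> 800 MHz
```

So B must run its clock twice as fast as A (800 MHz vs. 400 MHz) to be only 10/6 ≈ 1.67 times faster, because the extra 20% cycle count eats part of the gain.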

26. Performance Tradeoffs
- Program instruction count is impacted by:
  - Instructions available to perform basic tasks (architecture)
  - Quality of the compiler's code generator
- CPI is impacted by:
  - Chip implementation
  - Quality of the compiler's code generator
  - Memory system performance
- Clock rate is impacted by:
  - Delay characteristics of the IC technology
  - Logical structure of the implementation

27. Performance Outline
- Motivation
- Defining Performance
- Common Performance Metrics
- Benchmarks  <- current section
- Amdahl's Law

28. Benchmarks - Programs to Evaluate Performance
- The book defines performance in terms of a specific program
- But which program should you use?
  - Ideally, "real" programs; ideally, programs you will actually use
  - But what if you don't know, or don't have time to find out?
- Alternative: benchmark suites

29. Benchmark Suites
- Use a collection of small programs
- How do we summarize performance across them?
  - Total execution time
  - Arithmetic mean
  - Weighted arithmetic mean
  - Geometric mean

30. Total Execution Time
- Suppose we have two benchmarks (Fig. 2.5)
- How do we compare computers A and B?
  - A is 10 times faster than B for Program 1
  - B is 10 times faster than A for Program 2
- Summarizing performance: total execution time
- A reasonable comparison for an even workload

31. Summarizing Performance
- Arithmetic mean: the sum of the execution times divided by the number of programs
- Weighted arithmetic mean: use when some programs run more often than others; weight each program's execution time by its fraction of the workload
- Example:
  - Program 1 is 80% of the workload
  - Program 2 is 20% of the workload
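The two means can be sketched as follows; the execution times are hypothetical, with only the 80%/20% workload mix taken from the slide:

```python
def arithmetic_mean(times):
    """Plain average of per-program execution times."""
    return sum(times) / len(times)

def weighted_mean(times, weights):
    """Average weighted by workload fractions (weights sum to 1)."""
    return sum(t * w for t, w in zip(times, weights))

# Hypothetical execution times for Programs 1 and 2 (seconds),
# with the slide's 80% / 20% workload mix.
times = [2.0, 12.0]
print(arithmetic_mean(times))            # 7.0
avg = weighted_mean(times, [0.8, 0.2])   # -> 4.0: the common, fast
                                         # program dominates the summary
```

The weighted mean rewards making the frequently run program fast, which is exactly the behavior a summary of a real workload should have.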

32. The SPEC Benchmark Suite
- SPEC: System Performance Evaluation Corporation
- Founded by workstation vendors with the goal of a realistic, standardized performance test
- Philosophy: fair comparison between real systems
- Multiple versions; the most recent (in 2005) is SPEC2000
- Basic approach:
  - Measure the execution time of several small programs
  - Normalize each to its performance on a reference machine
  - Combine the normalized performances using the geometric mean
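The SPEC-style summary step can be sketched as below; the normalized ratios are made-up numbers, not real SPEC results:

```python
import math

def geometric_mean(ratios):
    """n-th root of the product of the normalized performance ratios."""
    return math.prod(ratios) ** (1.0 / len(ratios))

# Hypothetical SPEC-style ratios (reference machine's time divided by
# the tested machine's time, one ratio per benchmark).
print(geometric_mean([2.0, 8.0]))  # 4.0
```

A useful property of the geometric mean is that it gives the same machine-vs-machine ranking regardless of which reference machine was used for normalization, which is why SPEC adopted it.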

33. More about SPEC
- Separate integer & floating-point suites:
  - SPECint: integer performance (CINT2000)
  - SPECfp: floating-point performance (CFP2000)
- Several versions:
  - SPEC89: initial version
  - SPEC95: see Figure 2.6 in the book
  - SPEC CPU 2000: current

34. Some Not-so-Recent SPEC Data
- Some SPEC95 numbers (source: Berkeley CPU Information Center)
- SPEC2000 results: see http://www.spec.org

35. SPEC CINT2000 Benchmarks
- CINT2000 (Integer Component of SPEC CPU2000):

  Benchmark    Lang.  Category
  164.gzip     C      Compression
  175.vpr      C      FPGA Circuit Placement and Routing
  176.gcc      C      C Programming Language Compiler
  181.mcf      C      Combinatorial Optimization
  186.crafty   C      Game Playing: Chess
  197.parser   C      Word Processing
  252.eon      C++    Computer Visualization
  253.perlbmk  C      PERL Programming Language
  254.gap      C      Group Theory, Interpreter
  255.vortex   C      Object-oriented Database
  256.bzip2    C      Compression
  300.twolf    C      Place and Route Simulator

36. SPEC CFP2000 Benchmarks
- CFP2000 (Floating Point Component of SPEC CPU2000):

  Benchmark     Language    Category
  168.wupwise   Fortran 77  Physics / Quantum Chromodynamics
  171.swim      Fortran 77  Shallow Water Modeling
  172.mgrid     Fortran 77  Multi-grid Solver: 3D Potential Field
  173.applu     Fortran 77  Parabolic / Elliptic Partial Diff. Equations
  177.mesa      C           3-D Graphics Library
  178.galgel    Fortran 90  Computational Fluid Dynamics
  179.art       C           Image Recognition / Neural Networks
  183.equake    C           Seismic Wave Propagation Simulation
  187.facerec   Fortran 90  Image Processing: Face Recognition
  188.ammp      C           Computational Chemistry
  189.lucas     Fortran 90  Number Theory / Primality Testing
  191.fma3d     Fortran 90  Finite-element Crash Simulation
  200.sixtrack  Fortran 77  High Energy Nuclear Physics Accelerator Design
  301.apsi      Fortran 77  Meteorology: Pollutant Distribution

37. Some Other SPEC2000 Benchmarks
- SPECapc: application-specific (e.g. Pro/ENGINEER)
- SPECviewperf: 3D rendering under OpenGL
- SPEC HPC: high-performance computing
- SPEC OMP: multiprocessor systems
- SPEC JVM: Java virtual machine
- SPEC Web: web services

38. Benchmark Pitfalls
- Vendors sometimes focus on making specific benchmarks fast
- Example: tuning the compiler for a particular benchmark (old Figure 2.3)

39. Performance Outline
- Motivation
- Defining Performance
- Common Performance Metrics
- Benchmarks
- Amdahl's Law  <- current section

40. Amdahl's Law
- Improving one part of a system's performance by a factor of n does not increase overall performance by n
- Book example: suppose a program executes in 100 seconds, where:
  - 80 seconds are spent performing multiply operations
  - 20 seconds are spent performing other operations
- What happens if we speed up multiply by n times?
[Figure: before: Multiply 80 s + Other 20 s; after a 2x multiply speedup: Multiply 40 s + Other 20 s]

41. Amdahl's Law Example (cont'd)
- Execution time after speeding up multiply by n:
  Execution time = 20 s + 80 s / n
- Bottom line: no matter what we do to multiply, the execution time will always be > 20 seconds
- So an overall speedup factor of 5 (= 100 s / 20 s) is not possible!
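Amdahl's Law, with the slide's 100 s / 80 s example plugged in, can be sketched as:

```python
def amdahl_time(total_s, affected_s, n):
    """Execution time after speeding up the affected portion by n."""
    return (total_s - affected_s) + affected_s / n

def overall_speedup(total_s, affected_s, n):
    """Whole-program speedup from an n-fold local improvement."""
    return total_s / amdahl_time(total_s, affected_s, n)

# The slide's example: 100 s total, 80 s of which is multiply.
print(amdahl_time(100, 80, 2))    # 60.0 s
print(amdahl_time(100, 80, 4))    # 40.0 s
# Even with an "infinitely" fast multiply, the 20 s remainder stays,
# so the overall speedup can never reach 5.
print(overall_speedup(100, 80, 1e9) < 5)  # True
```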

42. Amdahl's Law Corollary
- Make the common case fast
- In our example, the biggest gains come from speeding up multiply
- Speeding up the "other" instructions is not as valuable:
  - Baseline: Multiply 80 s + Other 20 s = 100 s
  - Other sped up: Multiply 80 s + Other 15 s = 95 s
  - Multiply sped up: Multiply 60 s + Other 20 s = 80 s

43. Roadmap for the term: major topics
- Computer Systems Overview
- Technology Trends
- Instruction Sets (and Software)
- Logic & Arithmetic
- Performance  <- just completed
- Processor Implementation
- Memory Systems
- Input/Output


