Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania 18042 ECE 313 - Computer Organization Performance Feb 2005.

Slides:



Advertisements
Similar presentations
Performance What differences do we see in performance? Almost all computers operate correctly (within reason) Most computers implement useful operations.
Advertisements

CS2100 Computer Organisation Performance (AY2014/2015) Semester 2.
Computer Abstractions and Technology
ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )
TU/e Processor Design 5Z032 1 Processor Design 5Z032 The role of Performance Henk Corporaal Eindhoven University of Technology 2009.
Performance COE 308 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals.
100 Performance ENGR 3410 – Computer Architecture Mark L. Chang Fall 2006.
Computer Organization and Architecture (AT70.01) Comp. Sc. and Inf. Mgmt. Asian Institute of Technology Instructor: Dr. Sumanta Guha Slide Sources: Patterson.
Chapter 1 CSF 2009 Computer Performance. Defining Performance Which airplane has the best performance? Chapter 1 — Computer Abstractions and Technology.
CSCE 212 Chapter 4: Assessing and Understanding Performance Instructor: Jason D. Bakos.
Chapter 4 Assessing and Understanding Performance Bo Cheng.
Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania ECE Computer Organization Lecture 23 - Course.
CIS629 Fall Lecture Performance Overview Execution time is the best measure of performance: simple, intuitive, straightforward. Two important.
Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania Computer Organization Pipelined Processor Design 1.
Computer ArchitectureFall 2007 © September 17, 2007 Karem Sakallah CS-447– Computer Architecture.
CS/ECE 3330 Computer Architecture Chapter 1 Performance / Power.
Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania Computer Organization Pipelined Processor Design 3.
Chapter 4 Assessing and Understanding Performance
Fall 2001CS 4471 Chapter 2: Performance CS 447 Jason Bakos.
Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania ECE Computer Organization Memory Hierarchy 2.
1 Lecture 10: FP, Performance Metrics Today’s topics:  IEEE 754 representations  FP arithmetic  Evaluating a system Reminder: assignment 4 due in a.
Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania ECE Computer Organization Lecture 10 - Performance.
Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania ECE Computer Organization Lecture 17 - Pipelined.
CIS429/529 Winter 07 - Performance - 1 Performance Overview Execution time is the best measure of performance: simple, intuitive, straightforward. Two.
1 Chapter 4. 2 Measure, Report, and Summarize Make intelligent choices See through the marketing hype Key to understanding underlying organizational motivation.
Lecture 2: Technology Trends and Performance Evaluation Performance definition, benchmark, summarizing performance, Amdahl’s law, and CPI.
1 Computer Performance: Metrics, Measurement, & Evaluation.
Where Has This Performance Improvement Come From? Technology –More transistors per chip –Faster logic Machine Organization/Implementation –Deeper pipelines.
CSE 340 Computer Architecture Summer 2014 Understanding Performance
Lecture 2: Computer Performance
C OMPUTER O RGANIZATION AND D ESIGN The Hardware/Software Interface 5 th Edition Chapter 1 Computer Abstractions and Technology Sections 1.5 – 1.11.
B0111 Performance Anxiety ENGR xD52 Eric VanWyk Fall 2012.
10/19/2015Erkay Savas1 Performance Computer Architecture – CS401 Erkay Savas Sabanci University.
1 CS/EE 362 Hardware Fundamentals Lecture 9 (Chapter 2: Hennessy and Patterson) Winter Quarter 1998 Chris Myers.
1 Acknowledgements Class notes based upon Patterson & Hennessy: Book & Lecture Notes Patterson’s 1997 course notes (U.C. Berkeley CS 152, 1997) Tom Fountain.
Performance.
1 CS/COE0447 Computer Organization & Assembly Language CHAPTER 4 Assessing and Understanding Performance.
Performance Lecture notes from MKP, H. H. Lee and S. Yalamanchili.
CEN 316 Computer Organization and Design Assessing and Understanding Performance Mansour AL Zuair.
Lecture2: Performance Metrics Computer Architecture By Dr.Hadi Hassan 1/3/2016Dr. Hadi Hassan Computer Architecture 1.
1  1998 Morgan Kaufmann Publishers How to measure, report, and summarize performance (suorituskyky, tehokkuus)? What factors determine the performance.
1 Lecture 2: Performance, MIPS ISA Today’s topics:  Performance equations  MIPS instructions Reminder: canvas and class webpage:
September 10 Performance Read 3.1 through 3.4 for Wednesday Only 3 classes before 1 st Exam!
4. Performance 4.1 Introduction 4.2 CPU Performance and Its Factors
Lecture 5: 9/10/2002CS170 Fall CS170 Computer Organization and Architecture I Ayman Abdel-Hamid Department of Computer Science Old Dominion University.
Chapter 4. Measure, Report, and Summarize Make intelligent choices See through the marketing hype Understanding underlying organizational aspects Why.
Lec2.1 Computer Architecture Chapter 2 The Role of Performance.
L12 – Performance 1 Comp 411 Computer Performance He said, to speed things up we need to squeeze the clock Study
Performance COE 301 Computer Organization Dr. Muhamed Mudawar College of Computer Sciences and Engineering King Fahd University of Petroleum and Minerals.
EGRE 426 Computer Organization and Design Chapter 4.
CMSC 611: Advanced Computer Architecture Performance & Benchmarks Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some.
Performance Computer Organization II 1 Computer Science Dept Va Tech January 2009 © McQuain & Ribbens Defining Performance Which airplane has.
Jan. 5, 2000Systems Architecture II1 Machine Organization (CS 570) Lecture 2: Performance Evaluation and Benchmarking * Jeremy R. Johnson Wed. Oct. 4,
BITS Pilani, Pilani Campus Today’s Agenda Role of Performance.
Chapter 1 Performance & Technology Trends. Outline What is computer architecture? Performance What is performance: latency (response time), throughput.
June 20, 2001Systems Architecture II1 Systems Architecture II (CS ) Lecture 1: Performance Evaluation and Benchmarking * Jeremy R. Johnson Wed.
Measuring Performance II and Logic Design
Lecture 2: Performance Evaluation
Computer Architecture & Operations I
Computer Architecture & Operations I
CS161 – Design and Architecture of Computer Systems
Performance Lecture notes from MKP, H. H. Lee and S. Yalamanchili.
September 2 Performance Read 3.1 through 3.4 for Tuesday
Computer Architecture & Operations I
Morgan Kaufmann Publishers
CSCE 212 Chapter 4: Assessing and Understanding Performance
Performance ICS 233 Computer Architecture and Assembly Language
COMS 361 Computer Organization
CS161 – Design and Architecture of Computer Systems
Computer Organization and Design Chapter 4
Presentation transcript:

Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania ECE Computer Organization Performance Feb 2005 Reading: , 4.7* Portions of these slides are derived from: Textbook figures © 1998 Morgan Kaufmann Publishers all rights reserved Tod Amon's COD2e Slides © 1998 Morgan Kaufmann Publishers all rights reserved Dave Patterson’s CS 152 Slides - Fall 1997 © UCB Rob Rutenbar’s Slides - Fall 1999 CMU other sources as noted

Feb 2005Performance2 Roadmap for the term: major topics  Computer Systems Overview  Technology Trends  Instruction sets (and Software)  Logic & Arithmetic  Performance   Processor Implementation  Memory Systems  Input/Output

Feb 2005Performance3 Performance Outline  Motivation   Defining Performance  Common Performance Metrics  Benchmarks  Amdahl’s Law

Feb 2005Performance4  Goal: Learn to “measure, report and summarize” performance of a computer system  Why study performance?  To make intelligent decisions when choosing a system  To make intelligent decisions when designing a system  Understand impact of implementation decisions  Challenges  How do we measure performance accurately?  How do we compare performance fairly?

Feb 2005Performance5 What’s a good measure of performance?  Execution Time (a.k.a. response time, latency)  How long it takes to complete a single task  Example: “how long does it take to rip an MP3 file?”  Throughput  How many tasks are completed per unit time  Example: “how many MP3 files can I rip per hour?  The measure we use depends on the application

Feb 2005Performance6 Execution Time vs. Throughput  Analogy: passenger airplanes (book Figure 4.1)  Concorde - fastest “response time” for an individual user  Boeing highest passenger throughput

Feb 2005Performance7 Performance Outline  Motivation  Defining Performance   Common Performance Metrics  Benchmarks  Amdahl’s Law

Feb 2005Performance8 Measuring Execution Time  Wall-clock time or elapsed time  Includes I/O waiting  Includes time while OS runs other jobs  CPU time - measured by OS  User time - time spent in user program during execution  System time - time spent in OS during execution  Measuring CPU time: Unix/Linux time command % time myprog … 90.7u 12.9s 2:39 65% User Time System TimeWall-clock Time CPU Utilization

Feb 2005Performance9 Defining Performance using Execution Time  For a given program on machine X:  Comparing performance of machines: Performance X > Performance Y if Execution Time X < Execution Time Y

Feb 2005Performance10 Defining Performance (cont’d)  We say “X is n times faster than Y” if: Use this definition to compare performance in homework & exam problems!

Feb 2005Performance11 Example - Performance  Which machine is faster?  By how much? B A A

Feb 2005Performance12 Performance Outline  Motivation  Defining Performance  Common Performance Metrics   Benchmarks  Amdahl’s Law

Feb 2005Performance13 Clocks and Performance  Every processor has a clock  Clock frequency - MHz (or GHz)  Clock cycle time / period - ns (or ps)  How do we relate clock to program performance? Cycle time t clock = 2 ns f clk = 500 MHz

Feb 2005Performance14 Example - Clock Cycles  How many clock cycles does A execute?  How many clock cycles does B execute?

Feb 2005Performance15 Review - Clocks in Sequential Circuits  Controls sequential circuit operation  Register outputs change at beginning of cycle  Combinational logic determines “next state”  Storage elements store new state Adder Mux Combinational LogicRegister Output Register Input Clock

Feb 2005Performance16 What Limits Clock Frequency?  Propagation delay - t prop Logic (including register outputs) Interconnect  Register setup time - t setup Adder Mux Combinational Logic Register Output Register Input Clock t prop t setup t clock > t prop + t setup t clock = t prop + t setup + t slack

Feb 2005Performance17 Clock Cycles per Instruction (CPI)  Consider the 68HC11 …  ADDA - 3 cycles (IMM) -> 5 cycles (IND, Y)  MUL - 10 cycles  IDIV - 41 cycles  More complex processors have other issues…  Pipelining - parallel execution, but sometimes stalls  Memory system issues: cache misses, page faults, etc.  How can we combine these into an overall metric? addldamulandsta Total Execution Time

Feb 2005Performance18 Definition: Clock Cycles per Instruction (CPI)  Average number of clock cycles per instruction  Measured for an entire program

Feb 2005Performance19 Example - CPI  What is the CPI of A?  What is the CPI of B?

Feb 2005Performance20 Definition - MIPS  MIPS - millions of instructions per second  Once used as a general metric for performance  But, not useful for comparing different architectures  Often ridiculed as “meaningless indicator of performance”

Feb 2005Performance21 Example - MIPS  What is the MIPS of A?  What is the MIPS of B?

Feb 2005Performance22 Relating the Metrics - The Performance Equation  The “Iron Law” of Performance

Feb 2005Performance23 Clock Cycles and Performance - Example  Program runs on Computer A:  CPU Time: 10 seconds  Clock: 400MHz  Computer B can run clock faster  But, requires 1.2X clock cycles to perform same task  Desired CPU Time: 6 Seconds  What should the clock frequency be to reach this target?  Key to approach: Performance equation

Feb 2005Performance24 Clock Cycles and Performance - Example (cont’d) First step: find clock cycles executed by Computer A Second step: find clock cycles executed by Computer B

Feb 2005Performance25 Clock Cycles and Performance - Example (cont’d) Third step: given clock cycles and CPU time, solve for clock rate of Computer B

Feb 2005Performance26 Performance Tradeoffs  Program Instruction count - impacted by  Instructions available to perform basic tasks (architecture)  Quality of compiler code generator  CPI - impacted by  Chip implementation  Quality of compiler code generator  Memory system performance  Clock Rate  Delay characteristics of IC technology  Logical structure of implementation

Feb 2005Performance27 Performance Outline  Motivation  Defining Performance  Common Performance Metrics  Benchmarks   Amdahl’s Law

Feb 2005Performance28 Benchmarks - Programs to Evaluate Performance  The book defines performance in terms of a specific program  But, which program should you use?  Ideally, “real” programs  Ideally, programs you will use  But what if you don’t know or don’t have time to find out?  Alternative - Benchmark Suites

Feb 2005Performance29 Benchmark Suites  Use a collection of small programs  Summarize performance … how?  Total Execution Time  Arithmetic Mean  Weighted Arithmetic Mean  Geometric Mean

Feb 2005Performance30 Total Execution Time  Suppose we have two benchmarks (Fig. 2.5)  How do we compare computers A and B?  A is 10 times faster than B for Program 1  B is 10 times faster than A for Program 2  Summarizing performance: total execution time  Reasonable comparison for even workload

Feb 2005Performance31 Summarizing Performance  Arithmetic Mean  Example:  Weighted Arithmetic Mean - use when some programs run more often than others  Example:  Program 1 is 80% of workload  Program 2 is 20% of workload

Feb 2005Performance32 The SPEC Benchmark Suite  System Performance Evaluation Corporation  Founded by workstation vendors with goal of realistic, standardized performance test  Philosophy: fair comparison between real systems  Multiple versions - most recent is SPEC2000  Basic approach  Measure execution time of several small programs  Normalize each to performance on a reference machine  Combine performances using geometric mean

Feb 2005Performance33 More about SPEC  Separate integer & floating point suites  SPECint - integer performance  SPECfp - floating point performance  Several Versions  SPEC89 - initial version  SPEC95 - see Figure 2.6 in book  SPEC CPU current CINT 2000 CFP 2000

Feb 2005Performance34 Some Not-so-Recent SPEC Data Source: Berkeley CPU Information Center  Some SPEC95 numbers:  SPEC2000 Results - see

Feb 2005Performance35 SPEC CINT2000 Benchmarks  CINT2000 (Integer Component of SPEC CPU2000): BenchmarkLang.Category 164.gzipCCompression 175.vprCFPGA Circuit Placement and Routing 176.gccCC Programming Language Compiler 181.mcfCCombinatorial Optimization 186.craftyCGame Playing: Chess 197.parserCWord Processing 252.eonC++Computer Visualization 253.perlbmkCPERL Programming Language 254.gapCGroup Theory, Interpreter 255.vortexCObject-oriented Database 256.bzip2CCompression 300.twolfCPlace and Route Simulator

Feb 2005Performance36 SPEC CFP2000 Benchmarks  CFP2000 (Floating Point Component of SPEC CPU2000): BenchmarkLanguageCategory 168.wupwiseFortran 77Physics / Quantum Chromodynamics 171.swimFortran 77Shallow Water Modeling 172.mgridFortran 77Multi-grid Solver: 3D Potential Field 173.appluFortran 77Parabolic / Elliptic Partial Diff. Equations 177.mesaC3-D Graphics Library 178.galgelFortran 90Computational Fluid Dynamics 179.artCImage Recognition / Neural Networks 183.equakeCSeismic Wave Propagation Simulation 187.facerecFortran 90Image Processing: Face Recognition 188.ammpCComputational Chemistry 189.lucasFortran 90Number Theory / Primality Testing 191.fma3dFortran 90Finite-element Crash Simulation 200.sixtrackFortran 77High Energy Nuclear Physics Accelerator Design301.apsiFortran 77Meteorology: Pollutant Distribution

Feb 2005Performance37 Some Other SPEC2000 Benchmarks  SPECapc - application specific (e.g. PROEngineer)  SPECviewperf - 3D rendering under OpenGL  SPEC HPC - high performance computing  SPEC OMP - multiprocessor systems  SPEC JVM - Java virtual machine  SPEC Web - Web services

Feb 2005Performance38 Benchmark Pitfalls  Vendors sometimes focus on making specific benchmarks fast  Example: tuning compiler (Old Figure 2.3)

Feb 2005Performance39 Performance Outline  Motivation  Defining Performance  Common Performance Metrics  Benchmarks  Amdahl’s Law 

Feb 2005Performance40 Amdahl’s Law  Improving one part of performance by a factor of N doesn’t increase overall performance by N  Book example: Suppose a program executes in 100 seconds where:  80 seconds are spent performing multiply operations  20 seconds are spent performing other operations  What happens if we speed up multipy n times? Multiply - 80 secOther - 20 sec Multiply - 40 secOther - 20 sec

Feb 2005Performance41 Amdahl’s Law Example (cont’d)  Execution time after speeding up multiply:  Bottom line: no matter what we do to multiply, execution time will always be >20 seconds!  Speedup factor of 5 is not possible!

Feb 2005Performance42 Amdahl’s Law Corollary  Make the common case fast  In our example, biggest gains when we speed up multiply  Speeding up “other instructions” is not as valuable Multiply - 80 secOther - 20 secMultiply - 80 secOther - 15secMultiply - 60 secOther - 20 sec

Feb 2005Performance43 Roadmap for the term: major topics  Computer Systems Overview  Technology Trends  Instruction sets (and Software)  Logic & Arithmetic  Performance  Processor Implementation  Memory Systems  Input/Output