810:142 Lecture 2: Performance Fall 2006 Chapter 4: Performance Adapted from Mary Jane Irwin at Penn State University for Computer Organization and Design,

Slides:



Advertisements
Similar presentations
CMSC 611: Advanced Computer Architecture Performance Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.
Advertisements

CS1104: Computer Organisation School of Computing National University of Singapore.
ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )
Assessing and Understanding Performance Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from slides kindly made available.
LOGO Computer Architecture Dr. Esam Al_Qaralleh Princess Sumaya University for Technology.
ECE-C355 Computer Structures Winter 2008 Chapter 04: Understanding Performance Slides are adapted from Professor Mary Jane Irwin (
CSE431 L04 Understanding Performance.1Irwin, PSU, 2005 CSE 431 Computer Architecture Fall 2005 Lecture 04: Understanding Performance Mary Jane Irwin (
Read Section 1.4, Section 1.7 (pp )
TU/e Processor Design 5Z032 1 Processor Design 5Z032 The role of Performance Henk Corporaal Eindhoven University of Technology 2009.
Performance COE 308 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals.
100 Performance ENGR 3410 – Computer Architecture Mark L. Chang Fall 2006.
Chapter 1 CSF 2009 Computer Performance. Defining Performance Which airplane has the best performance? Chapter 1 — Computer Abstractions and Technology.
CSCE 212 Chapter 4: Assessing and Understanding Performance Instructor: Jason D. Bakos.
Performance ICS 233 Computer Architecture and Assembly Language Dr. Aiman El-Maleh College of Computer Sciences and Engineering King Fahd University of.
Chapter 4 Assessing and Understanding Performance Bo Cheng.
CIS629 Fall Lecture Performance Overview Execution time is the best measure of performance: simple, intuitive, straightforward. Two important.
CS/ECE 3330 Computer Architecture Chapter 1 Performance / Power.
1 Chapter 4. 2 Measure, Report, and Summarize Make intelligent choices See through the marketing hype Key to understanding underlying organizational motivation.
Copyright © 1998 Wanda Kunkle Computer Organization 1 Chapter 2.1 Introduction.
EET 4250: Chapter 1 Performance Measurement, Instruction Count & CPI Acknowledgements: Some slides and lecture notes for this course adapted from Prof.
Chapter 4 Assessing and Understanding Performance
Fall 2001CS 4471 Chapter 2: Performance CS 447 Jason Bakos.
Lecture 3: Computer Performance
1 Chapter 4. 2 Measure, Report, and Summarize Make intelligent choices See through the marketing hype Key to understanding underlying organizational motivation.
Performance ICS 233 Computer Architecture and Assembly Language Dr. Aiman El-Maleh College of Computer Sciences and Engineering King Fahd University of.
CMSC 611: Advanced Computer Architecture Performance Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.
Chapter 1 Section 1.4 Dr. Iyad F. Jafar Evaluating Performance.
CSE 340 Computer Architecture Summer 2014 Understanding Performance
Computer Performance Computer Engineering Department.
CS3350B Computer Architecture Winter 2015 Performance Metrics I Marc Moreno Maza
CSE431 L04 Understanding Performance.1Irwin, PSU, 2005 CSE 431 Computer Architecture Understanding Performance Mary Jane Irwin ( )
C OMPUTER O RGANIZATION AND D ESIGN The Hardware/Software Interface 5 th Edition Chapter 1 Computer Abstractions and Technology Sections 1.5 – 1.11.
B0111 Performance Anxiety ENGR xD52 Eric VanWyk Fall 2012.
Computer Architecture Chapter 4 Assessing and Understanding Performance Yu-Lun Kuo 郭育倫 Department of Computer Science and Information Engineering Tunghai.
10/19/2015Erkay Savas1 Performance Computer Architecture – CS401 Erkay Savas Sabanci University.
1 CS/EE 362 Hardware Fundamentals Lecture 9 (Chapter 2: Hennessy and Patterson) Winter Quarter 1998 Chris Myers.
1 Acknowledgements Class notes based upon Patterson & Hennessy: Book & Lecture Notes Patterson’s 1997 course notes (U.C. Berkeley CS 152, 1997) Tom Fountain.
Performance.
Performance Lecture notes from MKP, H. H. Lee and S. Yalamanchili.
CEN 316 Computer Organization and Design Assessing and Understanding Performance Mansour AL Zuair.
Morgan Kaufmann Publishers
Lecture2: Performance Metrics Computer Architecture By Dr.Hadi Hassan 1/3/2016Dr. Hadi Hassan Computer Architecture 1.
1  1998 Morgan Kaufmann Publishers How to measure, report, and summarize performance (suorituskyky, tehokkuus)? What factors determine the performance.
Performance Performance
CS35101 – Computer Architecture Week 9: Understanding Performance Paul Durand ( ) [Adapted from M Irwin (
TEST 1 – Tuesday March 3 Lectures 1 - 8, Ch 1,2 HW Due Feb 24 –1.4.1 p.60 –1.4.4 p.60 –1.4.6 p.60 –1.5.2 p –1.5.4 p.61 –1.5.5 p.61.
Computer Organization Instruction Set Architecture (ISA) Instruction Set Architecture (ISA), or simply Architecture, of a computer is the.
4. Performance 4.1 Introduction 4.2 CPU Performance and Its Factors
Chapter 4. Measure, Report, and Summarize Make intelligent choices See through the marketing hype Understanding underlying organizational aspects Why.
Lec2.1 Computer Architecture Chapter 2 The Role of Performance.
Performance COE 301 Computer Organization Dr. Muhamed Mudawar College of Computer Sciences and Engineering King Fahd University of Petroleum and Minerals.
EGRE 426 Computer Organization and Design Chapter 4.
Performance Computer Organization II 1 Computer Science Dept Va Tech January 2009 © McQuain & Ribbens Defining Performance Which airplane has.
Computer Architecture CSE 3322 Web Site crystal.uta.edu/~jpatters/cse3322 Send to Pramod Kumar, with the names and s.
CSE 340 Computer Architecture Summer 2016 Understanding Performance.
Lecture 3. Performance Prof. Taeweon Suh Computer Science & Engineering Korea University COSE222, COMP212, CYDF210 Computer Architecture.
Measuring Performance II and Logic Design
Computer Organization
Lecture 2: Performance Evaluation
Performance Lecture notes from MKP, H. H. Lee and S. Yalamanchili.
Assessing and Understanding Performance
Morgan Kaufmann Publishers
CSCE 212 Chapter 4: Assessing and Understanding Performance
Computer Architecture
CMSC 611: Advanced Computer Architecture
CMSC 611: Advanced Computer Architecture
Performance.
Chapter 2: Performance CS 447 Jason Bakos Fall 2001 CS 447.
Computer Organization and Design Chapter 4
Presentation transcript:

810:142 Lecture 2: Performance Fall 2006 Chapter 4: Performance Adapted from Mary Jane Irwin at Penn State University for Computer Organization and Design, Patterson & Hennessy, © 2005

810:142 Lecture 2: Performance Fall 2006 When would you want to compare performance between different computers?

810:142 Lecture 2: Performance Fall 2006 Performance Metrics  Purchasing perspective l given a collection of machines, which has the -best performance ? -least cost ? -best cost/performance?  Design perspective l faced with design options, which has the -best performance improvement ? -least cost ? -best cost/performance?  Both require l basis for comparison l metric for evaluation  Our goal is to understand what factors in the architecture contribute to overall system performance and the relative importance (and cost) of these factors

810:142 Lecture 2: Performance Fall 2006 What ways can be used to determine perform on a desktop PC? What ways can be used to determine perform on a server?

810:142 Lecture 2: Performance Fall 2006 Defining (Speed) Performance  Normally interested in reducing l Response time (aka execution time) – the time between the start and the completion of a task -Important to individual users l Thus, to maximize performance, need to minimize execution time l Throughput – the total amount of work done in a given time -Important to data center managers l Decreasing response time almost always improves throughput performance X = 1 / execution_time X If X is n times faster than Y, then performance X execution_time Y = = n performance Y execution_time X

810:142 Lecture 2: Performance Fall 2006 Performance Factors  Want to distinguish elapsed time and the time spent on our task  CPU execution time (CPU time) – time the CPU spends working on a task l Does not include time waiting for I/O or running other programs CPU execution time # CPU clock cycles for a program for a program = x clock cycle time CPU execution time # CPU clock cycles for a program for a program clock rate =  Can improve performance by reducing either the length of the clock cycle or the number of clock cycles required for a program or

810:142 Lecture 2: Performance Fall 2006 CPU time l Does not include time waiting for I/O or running other programs

810:142 Lecture 2: Performance Fall 2006 Review: Machine Clock Rate  Clock rate (MHz, GHz) is inverse of clock cycle time (clock period) CC = 1 / CR one clock period 10 nsec clock cycle => 100 MHz clock rate 5 nsec clock cycle => 200 MHz clock rate 2 nsec clock cycle => 500 MHz clock rate 1 nsec clock cycle => 1 GHz clock rate 500 psec clock cycle => 2 GHz clock rate 250 psec clock cycle => 4 GHz clock rate 200 psec clock cycle => 5 GHz clock rate

810:142 Lecture 2: Performance Fall 2006 Clock Cycles per Instruction  Not all instructions take the same amount of time to execute l One way to think about execution time is that it equals the number of instructions executed multiplied by the average time per instruction  Clock cycles per instruction (CPI) – the average number of clock cycles each instruction takes to execute l A way to compare two different implementations of the same ISA # CPU clock cycles # Instructions Average clock cycles for a program for a program per instruction = x CPI for this instruction class ABC CPI123

810:142 Lecture 2: Performance Fall 2006 Effective CPI  Computing the overall effective CPI is done by looking at the different types of instructions and their individual cycle counts and averaging Overall effective CPI =  (CPI i x IC i ) i = 1 n l Where IC i is the count (percentage) of the number of instructions of class i executed l CPI i is the (average) number of clock cycles per instruction for that instruction class l n is the number of instruction classes  The overall effective CPI varies by instruction mix – a measure of the dynamic frequency of instructions across one or many programs

810:142 Lecture 2: Performance Fall 2006 THE Performance Equation  Our basic performance equation is then CPU time = Instruction_count x CPI x clock_cycle Instruction_count x CPI clock_rate CPU time = or  These equations separate the three key factors that affect performance l Can measure the CPU execution time by running the program l The clock rate is usually given l Can measure overall instruction count by using profilers/ simulators without knowing all of the implementation details l CPI varies by instruction type and ISA implementation for which we must know the implementation details

810:142 Lecture 2: Performance Fall 2006 Determinates of CPU Performance CPU time = Instruction_count x CPI x clock_cycle Instruction_ count CPIclock_cycle Algorithm Programming language Compiler ISA Processor organization Technology

810:142 Lecture 2: Performance Fall 2006 Determinates of CPU Performance CPU time = Instruction_count x CPI x clock_cycle Instruction_ count CPIclock_cycle Algorithm Programming language Compiler ISA Processor organization Technology X XX XX XX X X X X X

810:142 Lecture 2: Performance Fall 2006 A Simple Example – calculate average CPI OpFreqCPI i Freq x CPI i ALU50%1 Load20%5 Store10%3 Branch20%2 Overall effective CPI  =

810:142 Lecture 2: Performance Fall 2006 A Simple Example OpFreqCPI i Freq x CPI i ALU50%1 Load20%5 Store10%3 Branch20%2 Overall effective CPI  =

810:142 Lecture 2: Performance Fall 2006 A Simple Example  How much faster would the machine be if a better data cache reduced the average load time to 2 cycles?  How does this compare with using branch prediction to shave a cycle off the branch time?  What if two ALU instructions could be executed at once? OpFreqCPI i Freq x CPI i ALU50%1 Load20%5 Store10%3 Branch20%2 Overall effective CPI  =

810:142 Lecture 2: Performance Fall 2006 A Simple Example  How much faster would the machine be if a better data cache reduced the average load time to 2 cycles? OpFreqCPI i Freq x CPI i ALU50%1 Load20%5 Store10%3 Branch20%2 Overall effective CPI  = Q

810:142 Lecture 2: Performance Fall 2006 A Simple Example  How much faster would the machine be if a better data cache reduced the average load time to 2 cycles? OpFreqCPI i Freq x CPI i ALU50%1 Load20%5 Store10%3 Branch20%2 Overall effective CPI  = CPU time new = 1.6 x IC x CC so 2.2/1.6 means 37.5% faster (Notice that 2.2 x IC x CC / 1.6 x IC x CC = 2.2/1.6, since the IC and CC remain unchanged by adding a bigger cache) 1.6 Q

810:142 Lecture 2: Performance Fall 2006 A Simple Example  How does this compare with using branch prediction to shave a cycle off the branch time? OpFreqCPI i Freq x CPI i ALU50%1 Load20%5 Store10%3 Branch20%2 Overall effective CPI  = Q

810:142 Lecture 2: Performance Fall 2006 A Simple Example  How does this compare with using branch prediction to shave a cycle off the branch time? OpFreqCPI i Freq x CPI i ALU50%1 Load20%5 Store10%3 Branch20%2 Overall effective CPI  = Q CPU time new = 2.0 x IC x CC so 2.2/2.0 means 10% faster

810:142 Lecture 2: Performance Fall 2006 A Simple Example  What if two ALU instructions could be executed at once? OpFreqCPI i Freq x CPI i ALU50%1 Load20%5 Store10%3 Branch20%2 Overall effective CPI  = Q

810:142 Lecture 2: Performance Fall 2006 A Simple Example  What if two ALU instructions could be executed at once? OpFreqCPI i Freq x CPI i ALU50%1 Load20%5 Store10%3 Branch20%2 Overall effective CPI  = Q CPU time new = 1.95 x IC x CC so 2.2/1.95 means 12.8% faster

810:142 Lecture 2: Performance Fall 2006 A Simple Example  How much faster would the machine be if a better data cache reduced the average load time to 2 cycles?  How does this compare with using branch prediction to shave a cycle off the branch time?  What if two ALU instructions could be executed at once? OpFreqCPI i Freq x CPI i ALU50%1 Load20%5 Store10%3 Branch20%2 Overall effective CPI  = CPU time new = 1.6 x IC x CC so 2.2/1.6 means 37.5% faster 1.6 Q Q CPU time new = 2.0 x IC x CC so 2.2/2.0 means 10% faster Q CPU time new = 1.95 x IC x CC so 2.2/1.95 means 12.8% faster

810:142 Lecture 2: Performance Fall 2006 Comparing and Summarizing Performance  Guiding principle in reporting performance measurements is reproducibility – list everything another experimenter would need to duplicate the experiment (version of the operating system, compiler settings, input set used, specific computer configuration (clock rate, cache sizes and speed, memory size and speed, etc.))  How do we summarize the performance for benchmark set with a single number? l The average of execution times that is directly proportional to total execution time is the arithmetic mean (AM) AM = 1/n  Time i i = 1 n l Where Time i is the execution time for the i th program of a total of n programs in the workload l A smaller mean indicates a smaller average execution time and thus improved performance

810:142 Lecture 2: Performance Fall 2006 SPEC CPU Benchmarks Integer benchmarksFP benchmarks gzipcompressionwupwiseQuantum chromodynamics vprFPGA place & routeswimShallow water model gccGNU C compilermgridMultigrid solver in 3D fields mcfCombinatorial optimizationappluParabolic/elliptic pde craftyChess programmesa3D graphics library parserWord processing programgalgelComputational fluid dynamics eonComputer visualizationartImage recognition (NN) perlbmkperl applicationequakeSeismic wave propagation simulation gapGroup theory interpreterfacerecFacial image recognition vortexObject oriented databaseammpComputational chemistry bzip2compressionlucasPrimality testing twolfCircuit place & routefma3dCrash simulation fem sixtrackNuclear physics accel apsiPollutant distribution

810:142 Lecture 2: Performance Fall 2006 Example SPEC Ratings

810:142 Lecture 2: Performance Fall 2006 Example SPEC CPU Ratings higher Spec ratio number indicates faster performance than a Sun Ultra 5_10 with a 300 MHz processor in this graph performance scales almost linearly with clock rate increase but this is often not the case! Why?

810:142 Lecture 2: Performance Fall 2006 Other Performance Metrics  Power consumption – especially in the embedded market where battery life is important (and passive cooling) l For power-limited applications, the most important metric is energy efficiency

810:142 Lecture 2: Performance Fall 2006 Other Performance Metrics  SPECweb99: Throughput Benchmark for Web Servers  Embedded Computer SPEC benchmarks

810:142 Lecture 2: Performance Fall 2006 Summary: Evaluating ISAs  Design-time metrics: l Can it be implemented, in how long, at what cost? l Can it be programmed? Ease of compilation?  Static Metrics: l How many bytes does the program occupy in memory?  Dynamic Metrics: l How many instructions are executed? How many bytes does the processor fetch to execute the program? l How many clocks are required per instruction? l How "lean" a clock is practical? Best Metric: Time to execute the program! CPI Inst. CountCycle Time depends on the instructions set, the processor organization, and compilation techniques.

810:142 Lecture 2: Performance Fall 2006 Next Lecture and Reminders  Next lecture l MIPS non-pipelined datapath/control path review -Reading assignment – PH, Chapter 5 -We’ll focus on sections