1. 2 Performance  Today we’ll discuss issues related to performance: —Latency/Response Time/Execution Time vs. Throughput —How do you make a reasonable.

Slides:



Advertisements
Similar presentations
CS1104: Computer Organisation School of Computing National University of Singapore.
Advertisements

TU/e Processor Design 5Z032 1 Processor Design 5Z032 The role of Performance Henk Corporaal Eindhoven University of Technology 2009.
100 Performance ENGR 3410 – Computer Architecture Mark L. Chang Fall 2006.
Computer Performance CS350 Term Project-Spring 2001 Elizabeth Cramer Bryan Driskell Yassaman Shayesteh.
2-1 ECE 361 ECE C61 Computer Architecture Lecture 2 – performance Prof. Alok N. Choudhary
Chapter 1 CSF 2009 Computer Performance. Defining Performance Which airplane has the best performance? Chapter 1 — Computer Abstractions and Technology.
CSCE 212 Chapter 4: Assessing and Understanding Performance Instructor: Jason D. Bakos.
CIS629 Fall Lecture Performance Overview Execution time is the best measure of performance: simple, intuitive, straightforward. Two important.
1 CSE SUNY New Paltz Chapter 2 Performance and Its Measurement.
Performance D. A. Patterson and J. L. Hennessey, Computer Organization & Design: The Hardware Software Interface, Morgan Kauffman, second edition 1998.
Computer Performance Evaluation: Cycles Per Instruction (CPI)
9/16/2004Comp 120 Fall September 16 Assignment 4 due date pushed back to 23 rd, better start anywayAssignment 4 due date pushed back to 23 rd, better.
Computer ArchitectureFall 2007 © September 17, 2007 Karem Sakallah CS-447– Computer Architecture.
1  1998 Morgan Kaufmann Publishers and UCB Performance CEG3420 Computer Design Lecture 3.
CIS429.S00: Lec2- 1 Performance Overview Execution time is the best measure of performance: simple, intuitive, straightforward. Two important quantitative.
Chapter 4 Assessing and Understanding Performance
Fall 2001CS 4471 Chapter 2: Performance CS 447 Jason Bakos.
1 Lecture 10: FP, Performance Metrics Today’s topics:  IEEE 754 representations  FP arithmetic  Evaluating a system Reminder: assignment 4 due in a.
CS430 – Computer Architecture Lecture - Introduction to Performance
CIS429/529 Winter 07 - Performance - 1 Performance Overview Execution time is the best measure of performance: simple, intuitive, straightforward. Two.
1 Chapter 4. 2 Measure, Report, and Summarize Make intelligent choices See through the marketing hype Key to understanding underlying organizational motivation.
1 ECE3055 Computer Architecture and Operating Systems Lecture 2 Performance Prof. Hsien-Hsin Sean Lee School of Electrical and Computer Engineering Georgia.
CPU Performance Assessment As-Bahiya Abu-Samra *Moore’s Law *Clock Speed *Instruction Execution Rate - MIPS - MFLOPS *SPEC Speed Metric *Amdahl’s.
CMSC 611: Advanced Computer Architecture Benchmarking Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.
Computer Organization and Design Performance Montek Singh Mon, April 4, 2011 Lecture 13.
1 Computer Performance: Metrics, Measurement, & Evaluation.
Lecture 2: Computer Performance
1 CHAPTER 2 THE ROLE OF PERFORMANCE. 2 Performance Measure, Report, and Summarize Make intelligent choices Why is some hardware better than others for.
Performance Chapter 4 P&H. Introduction How does one measure report and summarise performance? Complexity of modern systems make it very more difficult.
C OMPUTER O RGANIZATION AND D ESIGN The Hardware/Software Interface 5 th Edition Chapter 1 Computer Abstractions and Technology Sections 1.5 – 1.11.
B0111 Performance Anxiety ENGR xD52 Eric VanWyk Fall 2012.
10/19/2015Erkay Savas1 Performance Computer Architecture – CS401 Erkay Savas Sabanci University.
1 CS/EE 362 Hardware Fundamentals Lecture 9 (Chapter 2: Hennessy and Patterson) Winter Quarter 1998 Chris Myers.
1 Acknowledgements Class notes based upon Patterson & Hennessy: Book & Lecture Notes Patterson’s 1997 course notes (U.C. Berkeley CS 152, 1997) Tom Fountain.
Performance.
Computer Performance Computer Engineering Department.
1 CS465 Performance Revisited (Chapter 1) Be able to compare performance of simple system configurations and understand the performance implications of.
1 CS/COE0447 Computer Organization & Assembly Language CHAPTER 4 Assessing and Understanding Performance.
Performance Lecture notes from MKP, H. H. Lee and S. Yalamanchili.
CEN 316 Computer Organization and Design Assessing and Understanding Performance Mansour AL Zuair.
EEL5708/Bölöni Lec 1.1 August 21, 2006 Lotzi Bölöni Fall 2006 EEL 5708 High Performance Computer Architecture Lecture 1 Introduction.
Morgan Kaufmann Publishers
1  1998 Morgan Kaufmann Publishers How to measure, report, and summarize performance (suorituskyky, tehokkuus)? What factors determine the performance.
Performance Performance
TEST 1 – Tuesday March 3 Lectures 1 - 8, Ch 1,2 HW Due Feb 24 –1.4.1 p.60 –1.4.4 p.60 –1.4.6 p.60 –1.5.2 p –1.5.4 p.61 –1.5.5 p.61.
1 Lecture 2: Performance, MIPS ISA Today’s topics:  Performance equations  MIPS instructions Reminder: canvas and class webpage:
September 10 Performance Read 3.1 through 3.4 for Wednesday Only 3 classes before 1 st Exam!
4. Performance 4.1 Introduction 4.2 CPU Performance and Its Factors
Lec2.1 Computer Architecture Chapter 2 The Role of Performance.
L12 – Performance 1 Comp 411 Computer Performance He said, to speed things up we need to squeeze the clock Study
EGRE 426 Computer Organization and Design Chapter 4.
CMSC 611: Advanced Computer Architecture Performance & Benchmarks Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some.
Performance Computer Organization II 1 Computer Science Dept Va Tech January 2009 © McQuain & Ribbens Defining Performance Which airplane has.
EEL-4713 Ann Gordon-Ross.1 EEL-4713 Computer Architecture Performance.
CSE 340 Computer Architecture Summer 2016 Understanding Performance.
Two notions of performance
Lecture 2: Performance Evaluation
Performance Lecture notes from MKP, H. H. Lee and S. Yalamanchili.
September 2 Performance Read 3.1 through 3.4 for Tuesday
How do we evaluate computer architectures?
Defining Performance Which airplane has the best performance?
Prof. Hsien-Hsin Sean Lee
CSCE 212 Chapter 4: Assessing and Understanding Performance
Computer Performance He said, to speed things up we need to squeeze the clock.
Parameters that affect it How to improve it and by how much
Computer Performance Read Chapter 4
Performance.
Chapter 2: Performance CS 447 Jason Bakos Fall 2001 CS 447.
Computer Organization and Design Chapter 4
CS2100 Computer Organisation
Presentation transcript:

1

2 Performance  Today we’ll discuss issues related to performance: —Latency/Response Time/Execution Time vs. Throughput —How do you make a reasonable performance comparison? —The 3 components of CPU performance —The 2 laws of performance

3 Why know about performance  Purchasing Perspective: —Given a collection of machines, which has the Best Performance? Lowest Price? Best Performance/Price?  Design Perspective: —Faced with design options, which has the Best Performance Improvement? Lowest Cost? Best Performance/Cost ?  Both require —Basis for comparison —Metric for evaluation

4 Many possible definitions of performance  Every computer vendor will select one that makes them look good. How do you make sense of conflicting claims? Q: Why do end users need a new performance metric? A: End users who rely only on megahertz as an indicator for performance do not have a complete picture of PC processor performance and may pay the price of missed expectations.

5 Two notions of performance  Which has higher performance? —Depends on the metric Time to do the task (Execution Time, Latency, Response Time) Tasks per unit time (Throughput, Bandwidth) —Response time and throughput are often in opposition PlaneDC to ParisSpeedPassengersThroughput (pmph) hours610 mph470286,700 Concorde3 hours1350 mph132178,200

6 Some Definitions  Performance is in units of things/unit time —E.g., Hamburgers/hour —Bigger is better  If we are primarily concerned with response time —Performance(x) = 1 execution_time(x)  Relative performance: “X is N times faster than Y” N =Performance(X) = execution_time(Y) Performance(Y) execution_time(X)

7 Some Examples  Time of Concorde vs. 747?  Throughput of Concorde vs. 747? PlaneDC to ParisSpeedPassengersThroughput (pmph) hours610 mph470286,700 Concorde3 hours1350 mph132178,200

8  When comparing systems, need to fix the workload —Which workload? Basis of Comparison WorkloadProsCons Actual Target Workload RepresentativeVery specific Non-portable Difficult to run/measure Full Application Benchmarks Portable Widely used Realistic Less representative Small “Kernel” or “Synthetic” Benchmarks Easy to run Useful early in design Easy to “fool” MicrobenchmarksIdentify peak capability and potential bottlenecks Real application performance may be much below peak

9 Benchmarking  Some common benchmarks include: —Adobe Photoshop for image processingAdobe Photoshop —BAPCo SYSmark for office applicationsBAPCo SYSmark —Unreal Tournament 2003 for 3D gamesUnreal Tournament 2003 —SPEC2000 for CPU performance  The best way to see how a system performs for a variety of programs is to just show the execution times of all of the programs.  Here are execution times for several different Photoshop 5.5 tasks, from

10 Summarizing performance  Summarizing performance with a single number can be misleading—just like summarizing four years of school with a single GPA!  If you must have a single number, you could sum the execution times. This example graph displays the total execution time of the individual tests from the previous page.  A similar option is to find the average of all the execution times. For example, the 800MHz Pentium III (in yellow) needed seconds to run 21 programs, so its average execution time is 227.3/21 = seconds.  A weighted sum or average is also possible, and lets you emphasize some benchmarks more than others.

11 The components of execution time  Execution time can be divided into two parts. —User time is spent running the application program itself. —System time is when the application calls operating system code.  The distinction between user and system time is not always clear, especially under different operating systems.  The Unix time command shows both. salary.125 > time distill 05-examples.ps Distilling 05-examples.ps (449,119 bytes) 10.8 seconds (0:11) 449,119 bytes PS => 94,999 bytes PDF (21%) 10.61u 0.98s 0: % User time System time CPU usage = (User + System) / Total “Wall clock” time (including other processes)

12 Three Components of CPU Performance Cycles Per Instruction CPU time X,P = Instructions executed P * CPI X,P * Clock cycle time X

13  Instructions executed: —We are not interested in the static instruction count, or how many lines of code are in a program. —Instead we care about the dynamic instruction count, or how many instructions are actually executed when the program runs.  There are three lines of code below, but the number of instructions executed would be XXXX?. li$a0, 1000 Ostrich:sub$a0, $a0, 1 bne$a0, $0, Ostrich Instructions Executed

14  The average number of clock cycles per instruction, or CPI, is a function of the machine and program. —The CPI depends on the actual instructions appearing in the program— a floating-point intensive application might have a higher CPI than an integer-based program. —It also depends on the CPU implementation. For example, a Pentium can execute the same instructions as an older 80486, but faster.  It is common to each instruction took one cycle, making CPI = 1. —The CPI can be >1 due to memory stalls and slow instructions. —The CPI can be <1 on machines that execute more than 1 instruction per cycle (superscalar). CPI

15  One “cycle” is the minimum time it takes the CPU to do any work. —The clock cycle time or clock period is just the length of a cycle. —The clock rate, or frequency, is the reciprocal of the cycle time.  Generally, a higher frequency is better.  Some examples illustrate some typical frequencies. —A 500MHz processor has a cycle time of 2ns. —A 2GHz (2000MHz) CPU has a cycle time of just 0.5ns (500ps). Clock cycle time

16 CPU time X,P = Instructions executed P * CPI X,P * Clock cycle time X  The easiest way to remember this is match up the units:  Make things faster by making any component smaller!!  Often easy to reduce one component by increasing another Execution time, again Seconds = Instructions * Clock cycles * Seconds Program InstructionsClock cycle ProgramCompilerISAOrganizationTechnology Instruction Executed CPI Clock Cycle TIme

17  Let’s compare the performances two 8086-based processors. —An 800MHz AMD Duron, with a CPI of 1.2 for an MP3 compressor. —A 1GHz Pentium III with a CPI of 1.5 for the same program.  Compatible processors implement identical instruction sets and will use the same executable files, with the same number of instructions.  But they implement the ISA differently, which leads to different CPIs. CPU time AMD,P = Instructions P * CPI AMD,P * Cycle time AMD = CPU time P3,P = Instructions P * CPI P3,P * Cycle time P3 = Example 1: ISA-compatible processors

18 Example 2: Comparing across ISAs  Intel’s Itanium (IA-64) ISA is designed facilitate executing multiple instructions per cycle. If an Itanium processor achieves an average CPI of.3 (3 instructions per cycle), how much faster is it than a Pentium4 (which uses the x86 ISA) with an average CPI of 1? a)Itanium is three times faster b)Itanium is one third as fast c)Not enough information

19 Improving CPI  Many processor design techniques we’ll see improve CPI —Often they only improve CPI for certain types of instructions  F i = Fraction of instructions of type i CPI =  CPI  F where F = I i = 1 n i i i i Instruction Count  First Law of Performance: Make the common case fast

20 Example: CPI improvements  Base Machine:  How much faster would the machine be if: —we added a cache to reduce average load time to 3 cycles? —we added a branch predictor to reduce branch time by 1 cycle? —we could do two ALU operations in parallel? Op TypeFreq (f i )CyclesCPI i ALU50%3 Load20%5 Store10%3 Branch20%2

21  Amdahl’s Law states that optimizations are limited in their effectiveness.  For example, doubling the speed of floating-point operations sounds like a great idea. But if only 10% of the program execution time T involves floating-point code, then the overall performance improves by just 5%.  What is the maximum speedup from improving floating point? Amdahl’s Law Execution time after improvement = Time affected by improvement + Time unaffected by improvement Amount of improvement Execution time after improvement = 0.10 T T=0.95 T 2  Second Law of Performance: Make the fast case common

22  Performance is one of the most important criteria in judging systems.  There are two main measurements of performance. —Execution time is what we’ll focus on. —Throughput is important for servers and operating systems.  Our main performance equation explains how performance depends on several factors related to both hardware and software. CPU time X,P = Instructions executed P * CPI X,P * Clock cycle time X  It can be hard to measure these factors in real life, but this is a useful guide for comparing systems and designs.  Amdahl’s Law tell us how much improvement we can expect from specific enhancements.  The best benchmarks are real programs, which are more likely to reflect common instruction mixes. Summary