Performance. Computer Architecture – CS401, Erkay Savas, Sabanci University, 10/19/2015.

Presentation transcript:

Slide 1: Performance. Computer Architecture – CS401, Erkay Savas, Sabanci University (10/19/2015)

Slide 2: Performance
What is performance?
How do we measure performance?
Performance metrics
Performance evaluation
Why does some hardware perform better than other hardware for different programs?
Which hardware factors determine overall system performance?
How does the machine's instruction set affect performance?

Slide 3: Airplane Analogy
Which of these airplanes has the best performance?
Table (cell values not preserved). Columns: Airplane, Passenger capacity, Range (miles), Speed (m.p.h.), Passenger throughput (passengers × m.p.h.). Rows: Boeing 777, Boeing, Concorde, Douglas DC, Airbus A 3xx.

Slide 4: Computer Performance
Response time (latency)
–How long does it take for my job to run?
–How long does it take to execute a program?
–How long must I wait for a database query?
Throughput
–How many jobs can the machine run at once?
–What is the average execution rate?
–How much work is getting done?
If we upgrade a machine with a new processor, what do we increase?
If we add a new machine, what do we increase?

Slide 5: Which Time to Measure?
Elapsed time (wall-clock time, response time)
–Counts everything (disk and memory access, I/O, operating system overhead, work on other processes)
–Useful, but not always good for comparison purposes
CPU (execution) time
–The time the CPU spends computing for the user task
–Does not include time spent waiting for I/O or running other programs
–User CPU time: CPU time spent within the program
–System CPU time: CPU time spent in the operating system performing tasks on behalf of the program

Slide 6: CPU Time
The Unix time command reflects this breakdown. For example, it may report:
90.7u 12.9s 2:39 65%
Interpretation:
–User CPU time is 90.7 s
–System CPU time is 12.9 s
–Elapsed time is 159 s (2 min 39 s)
–CPU time is 65% of the total elapsed time
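The 65% figure can be checked from the other three numbers. A minimal sketch, assuming the percentage is simply (user + system) / elapsed; the function name is made up for illustration:

```python
def cpu_share(user_s, system_s, elapsed_s):
    """Fraction of elapsed (wall-clock) time spent as CPU time."""
    return (user_s + system_s) / elapsed_s

# Values from the sample output: 90.7u 12.9s 2:39
elapsed = 2 * 60 + 39          # 2 min 39 s expressed in seconds
share = cpu_share(90.7, 12.9, elapsed)

print(elapsed)                 # 159
print(f"{share:.0%}")          # 65%
```

The remaining 35% of the elapsed time is spent waiting for I/O or running other programs.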

Slide 7: A Definition of Performance
For some program running on machine X:
Performance_X = 1 / Execution_time_X
Machine X is said to be "n times faster" than machine Y if
Performance_X / Performance_Y = Execution_time_Y / Execution_time_X = n
Example: Machine A runs a program in 10 seconds and machine B runs the same program in 15 seconds. How much faster is A than B?
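The example can be worked directly from the definition. A small sketch (the function name is an illustrative choice, not part of the slides):

```python
def times_faster(execution_time_x, execution_time_y):
    """Return n such that machine X is 'n times faster' than machine Y.

    n = Performance_X / Performance_Y = Execution_time_Y / Execution_time_X
    """
    return execution_time_y / execution_time_x

# Machine A: 10 s, machine B: 15 s
n = times_faster(10, 15)
print(n)  # 1.5
```

So A is 1.5 times faster than B for this program.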

Slide 8: Metrics of Performance
"Time to execute a program" is the ultimate metric of performance.
However, it is convenient to inspect other metrics as well when we examine the details of a machine.
Computers use a clock that runs at a constant rate and determines when an event takes place in hardware.
These discrete time intervals are called clock cycles (or ticks, clock ticks, clock periods).
Clock rate (frequency) is the inverse of clock period.

Slide 9: Clock Cycles
Clock "ticks" indicate when to start activities.
Instead of reporting execution time in seconds, we often use cycles.
Events often start on the rising edge of the clock.

Slide 10: Clock Cycle
Cycle time (CT) = time between ticks = seconds per cycle
Cycle count (CC) = the number of clock cycles needed to execute a program
Clock rate (frequency) = cycles per second (1 Hz = 1 cycle/s)
A 200 MHz clock has a 1/(200 × 10^6) s = ? nanosecond cycle time
A 4 GHz clock has a 1/(4 × 10^9) s = ? nanosecond cycle time
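The two open questions follow from cycle time = 1 / clock rate. A short sketch that converts the result to nanoseconds (the helper name is an illustrative choice):

```python
def cycle_time_ns(clock_rate_hz):
    """Cycle time in nanoseconds for a clock rate given in Hz."""
    return 1e9 / clock_rate_hz   # (1 / rate) seconds, times 1e9 ns per second

print(cycle_time_ns(200e6))  # 5.0   (200 MHz -> 5 ns)
print(cycle_time_ns(4e9))    # 0.25  (4 GHz   -> 0.25 ns)
```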

Slide 11: CPI
CPI: clock cycles per instruction
–The average number of cycles spent on an instruction
–CC = IC × CPI, where IC is the instruction count
–Hard to compute
–Useful when comparing the performance of two machines with the same ISA (why?)
Example: two machines with the same ISA. For a certain program we have
–Machine A: CPI = 2.0
–Machine B: CPI = 1.2
–Which machine is faster?
–What if machine A uses a 250 ps cycle time and machine B a 500 ps cycle time?
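CPI alone does not settle which machine is faster; the cycle time matters too. Because both machines share the ISA, the instruction count is the same (assuming the same compiled program), so it is enough to compare the average time per instruction, CPI × cycle time. A sketch with illustrative names:

```python
def time_per_instruction_ps(cpi, cycle_time_ps):
    """Average time per instruction in picoseconds."""
    return cpi * cycle_time_ps

t_a = time_per_instruction_ps(2.0, 250)   # machine A
t_b = time_per_instruction_ps(1.2, 500)   # machine B

print(f"A: {t_a:.0f} ps, B: {t_b:.0f} ps")   # A: 500 ps, B: 600 ps
print(f"A is {t_b / t_a:.1f} times faster")  # A is 1.2 times faster
```

With equal cycle times B would win on CPI; with these cycle times A wins despite its higher CPI.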

Slide 12: Improving Performance
To improve performance:
1. Increase the clock frequency (i.e., decrease the clock period)
2. Reduce the number of clock cycles per program (IC × CPI)

Slide 13: Instruction = Cycle? No!
The number of cycles per instruction depends on how the instructions are implemented in hardware.
The number differs from processor to processor (even with the same ISA).

Slide 14: The Reason
Different operations take different numbers of cycles
–Multiplication takes longer than addition
–Floating-point operations take longer than integer operations
–Access to a register is much faster than access to main memory

Slide 15: Simple Formulae for CPU Time
CPU execution time = CPU clock cycles for a program × clock cycle time (CC × CT)
CPU execution time = CPU clock cycles for a program / clock rate
We can write: CPU clock cycles for a program = IC × CPI
Then: CPU execution time = (IC × CPI) / clock rate
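The two forms of the formula give the same result, which a quick numerical check can confirm. The instruction count, CPI and clock rate below are made-up values chosen only for illustration:

```python
def cpu_time_from_cycles(cycle_count, cycle_time_s):
    """CPU time = CC x CT."""
    return cycle_count * cycle_time_s

def cpu_time(ic, cpi, clock_rate_hz):
    """CPU time = (IC x CPI) / clock rate."""
    return ic * cpi / clock_rate_hz

# Hypothetical program: 10^9 instructions, CPI = 2.0, 500 MHz clock
ic, cpi, rate = 1_000_000_000, 2.0, 500_000_000

t1 = cpu_time(ic, cpi, rate)
t2 = cpu_time_from_cycles(ic * cpi, 1 / rate)
print(f"{t1:.1f} s, {t2:.1f} s")  # 4.0 s, 4.0 s
```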

Slide 16: Example
Computer A has an 800 MHz clock
–It runs our favorite program in 15 s
Our goal
–Design computer B with the same ISA
–It will run the same program in 8 s
We will use a new technology
–It can increase the clock rate;
–however, it will also increase CPI by [...]
What clock rate should we aim to use?
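The CPI increase is missing from the transcript, so the sketch below keeps it as a parameter. The value 1.2 is purely an assumed placeholder, not the slide's number. The reasoning: with the same ISA and program, IC is unchanged, so B needs (cycles on A) × (CPI factor) cycles and must finish them in 8 s.

```python
def required_clock_rate(rate_a_hz, time_a_s, time_b_s, cpi_factor):
    """Clock rate machine B needs to hit its target execution time.

    cycles_A = rate_A * time_A          (= IC * CPI_A)
    cycles_B = cycles_A * cpi_factor    (same IC, CPI scaled by cpi_factor)
    rate_B   = cycles_B / time_B
    """
    cycles_a = rate_a_hz * time_a_s
    cycles_b = cycles_a * cpi_factor
    return cycles_b / time_b_s

ASSUMED_CPI_FACTOR = 1.2  # hypothetical; the actual factor is not in the transcript

rate_b = required_clock_rate(800e6, 15, 8, ASSUMED_CPI_FACTOR)
print(f"{rate_b / 1e9:.2f} GHz")  # 1.80 GHz under the assumed factor
```

With no CPI penalty (factor 1.0) the same function gives 1.5 GHz, which is the lower bound for the target.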

Slide 17: Performance
Performance is determined by execution time (CPU time)
We also have other indicators
–# of cycles to execute the program
–# of instructions in the program (IC)
–# of cycles per second
–Average # of cycles per instruction (CPI)
–Average # of instructions per second
Common pitfall: thinking one of these variables is indicative of performance when it really isn't.

Slide 18: Number of Instructions Example
A compiler designer has the following two alternatives for generating a certain piece of code with instructions A (1 cycle), B (2 cycles), and C (3 cycles):
1. 2 × 10^6 of A, 10^6 of B, and 2 × 10^6 of C (IC = 5 × 10^6)
2. 4 × 10^6 of A, 10^6 of B, and 10^6 of C (IC = 6 × 10^6)
–Which code sequence is faster?
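On the same machine the clock rate is fixed, so the faster sequence is the one with fewer total cycles. A sketch that counts cycles for both alternatives (dictionary and function names are illustrative):

```python
CYCLES_PER_CLASS = {"A": 1, "B": 2, "C": 3}

def total_cycles(mix):
    """Total clock cycles for an instruction mix {class: count}."""
    return sum(count * CYCLES_PER_CLASS[cls] for cls, count in mix.items())

seq1 = {"A": 2_000_000, "B": 1_000_000, "C": 2_000_000}  # IC = 5 x 10^6
seq2 = {"A": 4_000_000, "B": 1_000_000, "C": 1_000_000}  # IC = 6 x 10^6

for name, mix in (("sequence 1", seq1), ("sequence 2", seq2)):
    cycles = total_cycles(mix)
    ic = sum(mix.values())
    print(name, cycles, cycles / ic)
# sequence 1 10000000 2.0
# sequence 2 9000000 1.5
```

Sequence 2 executes more instructions but needs fewer cycles (CPI 1.5 versus 2.0), so it is the faster one.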

Slide 19: MIPS
MIPS = millions of instructions per second
MIPS = IC / (Execution_time × 10^6)
MIPS = IC / (# of clocks × cycle time × 10^6)
MIPS = (IC × clock rate) / (IC × CPI × 10^6)
MIPS = clock rate / (CPI × 10^6)
A faster machine has a higher MIPS
Execution_time = IC / (MIPS × 10^6)

Slide 20: A MIPS Example
A computer with a 500 MHz clock
–Three different classes of instructions:
–A (1 cycle), B (2 cycles), C (3 cycles)
Two compilers are used to produce code for a large piece of software.
–Compiler 1: 5 billion A, 1 billion B, and 1 billion C instructions.
–Compiler 2: 10 billion A, 1 billion B, and 1 billion C instructions.
Which sequence will be faster according to execution time?
Which sequence will be faster according to MIPS?
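Both questions can be answered with the formulas for CPU time and MIPS. A sketch under the stated 500 MHz clock and per-class cycle counts (names are illustrative):

```python
CLOCK_RATE_HZ = 500_000_000
CYCLES_PER_CLASS = {"A": 1, "B": 2, "C": 3}
BILLION = 1_000_000_000

def execution_time_s(mix):
    """CPU time = total cycles / clock rate."""
    cycles = sum(n * CYCLES_PER_CLASS[cls] for cls, n in mix.items())
    return cycles / CLOCK_RATE_HZ

def mips(mix):
    """MIPS = IC / (execution time x 10^6)."""
    return sum(mix.values()) / (execution_time_s(mix) * 1e6)

compiler1 = {"A": 5 * BILLION, "B": 1 * BILLION, "C": 1 * BILLION}
compiler2 = {"A": 10 * BILLION, "B": 1 * BILLION, "C": 1 * BILLION}

print(execution_time_s(compiler1), mips(compiler1))  # 20.0 350.0
print(execution_time_s(compiler2), mips(compiler2))  # 30.0 400.0
```

Compiler 1's code finishes sooner (20 s versus 30 s), yet compiler 2's code has the higher MIPS rating (400 versus 350): the two metrics disagree.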

Slide 21: Problems of MIPS
MIPS specifies the instruction execution rate.
MIPS does not take into account the capabilities of the instructions
–Thus, computers with different ISAs cannot be compared using MIPS.
MIPS is not constant even on a single machine; it depends on the application.
As we saw in the previous example, MIPS can vary inversely with performance.

Slide 22: CPI Example
CPI
–Machine A: CPI = 10/7 = 1.43
–Machine B: CPI = 15/12 = 1.25
CPU time
–CPU time = (IC × CPI) / clock rate
–Let us assume both machines use a 200 MHz clock

Slide 23: Overview
A given program will require
1. Some number of instructions
2. Some number of clock cycles
3. Some number of seconds
Vocabulary
–Cycle time: (micro- or nano-) seconds per cycle
–Clock rate (frequency): cycles per second
–CPI: clock cycles per instruction
–MIPS: millions of instructions per second
–MFLOPS: millions of floating-point operations per second

Slide 24: Performance
Performance is ultimately determined by execution time.
Is any of the following metrics, by itself, a good measure of performance? Why?
–# of cycles to execute a program
–# of instructions in a program
–# of cycles per second
–Average # of cycles per instruction
–Average # of instructions per second

Slide 25: Question
Assuming two machines have the same ISA, which of the following quantities are identical?
–Clock rate
–CPI
–Execution time
–# of instructions
–MIPS

Slide 26: Program Performance
HW or SW component     Affects what?          How?
Algorithm              IC, possibly CPI
Programming language   IC, CPI
Compiler               IC, CPI
ISA                    IC, clock rate, CPI

Slide 27: Benchmarks
Programs specifically chosen to measure performance
–They must reflect the typical workload of the user
Benchmark types
–Real applications
–Small benchmarks
–Benchmark suites
–Synthetic benchmarks

Slide 28: Real Applications
Workload: the set of programs a typical user runs day in and day out.
Using these real applications as benchmarks is a direct way of comparing the execution time of the workload on two machines.
Using real applications as benchmarks has certain drawbacks:
–They are usually big
–They take time to port to different machines
–They take considerable time to execute
–It is hard to observe the effect of a particular improvement technique

Slide 29: Comparing & Summarizing Performance
             Computer A   Computer B
Program 1    1 s          100 s
Program 2    1000 s       100 s
Total time   1001 s       200 s
A is 100 times faster than B for program 1
B is 10 times faster than A for program 2
For total performance, the arithmetic mean is used.
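The totals and arithmetic means can be recomputed as a check. The sketch assumes program 2 takes 1000 s on computer A, which is the value implied by the 1001 s total and by "B is 10 times faster than A"; variable names are illustrative:

```python
# Execution times in seconds: [program 1, program 2]
times_a = [1, 1000]   # computer A (1000 s inferred from the 1001 s total)
times_b = [100, 100]  # computer B

def arithmetic_mean(times):
    """Unweighted arithmetic mean of execution times."""
    return sum(times) / len(times)

print(sum(times_a), sum(times_b))                            # 1001 200
print(arithmetic_mean(times_a), arithmetic_mean(times_b))    # 500.5 100.0
```

When each program runs once, B completes the workload about five times faster than A (1001 s / 200 s ≈ 5).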

Slide 30: Arithmetic Mean
If the programs in the workload do not run equally often, we have to use the weighted arithmetic mean.
                      weight   Computer A   Computer B
Program 1 (seconds)
Program 2 (seconds)
Weighted AM           -        ?            ?
Suppose that program 1 runs 10 times as often as program 2. Which machine is faster?
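The table entries are not preserved, so the sketch below assumes the same execution times as the two-program comparison (A: 1 s and 1000 s, B: 100 s and 100 s). If program 1 runs 10 times as often as program 2, the weights are 10/11 and 1/11:

```python
def weighted_mean(times, weights):
    """Weighted arithmetic mean; weights are normalized to sum to 1."""
    total = sum(weights)
    return sum(t * w for t, w in zip(times, weights)) / total

# Assumed times in seconds: [program 1, program 2]
times_a = [1, 1000]
times_b = [100, 100]
weights = [10, 1]     # program 1 runs 10 times as often as program 2

wam_a = weighted_mean(times_a, weights)
wam_b = weighted_mean(times_b, weights)
print(f"A: {wam_a:.2f} s, B: {wam_b:.2f} s")  # A: 91.82 s, B: 100.00 s
```

Under these assumptions A comes out faster, the opposite of the equal-weight result, which shows how strongly the conclusion depends on the workload mix.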