Performance

Presentation transcript:

Performance
  Time to do the task: execution time, response time, latency, etc.
  Tasks finished per unit of time (per day, hour, week, sec, ms, ns, etc.): throughput, bandwidth

Performance
  "X is n times faster than Y" means that
$$\frac{\text{ExTime}(Y)}{\text{ExTime}(X)} = \frac{\text{Performance}(X)}{\text{Performance}(Y)} = n$$

Performance Metrics
  MIPS: millions of instructions per second
  MFLOPS: millions of floating-point (FP) operations per second
  Others: megabytes per second, cycles per second (clock rate)

CPU Performance
  CPU time = seconds/program = (instructions/program) x (cycles/instruction) x (seconds/cycle)

  Factor          Instr. count   CPI   Clock rate
  Program              x
  Compiler             x          x
  Inst. set            x          x
  Organization                    x        x
  Technology                               x

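CPI
  CPI: average clock cycles per instruction
  CPI = (CPU time x clock rate) / instruction count
      = CPU time / (clock cycle time x instruction count)
      = clock cycles / instruction count
  With $\text{CPI}_i$ the cycles taken by the ith instruction and $I_i$ the number of times it executes:
$$\text{CPU time} = \text{clock cycle time} \times \sum_{i=1}^{n} \text{CPI}_i \times I_i$$
  Instruction frequency of the ith instruction: $F_i = I_i / \text{total instruction count}$, so $\text{CPI} = \sum_{i=1}^{n} \text{CPI}_i \times F_i$
  Invest resources where most time is spent.
  Make frequently used components run fast.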

RISC Processor
  A simple RISC machine: instruction frequency mix

  Op       Freq   Cycles   CPI contribution   % time
  ALU      50%    1        0.5                33%
  Load     20%    2        0.4                27%
  Store    10%    2        0.2                13%
  Branch   20%    2        0.4                27%

  33% = 0.5 / (0.5 + 0.4 + 0.2 + 0.4)
  27% = 0.4 / (0.5 + 0.4 + 0.2 + 0.4)
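The table can be reproduced with a short script. This is a minimal sketch that takes the frequencies and cycle counts from the slide, sums frequency x cycles to get the overall CPI, and divides each contribution by that total to get its share of execution time. The names `mix` and `cpi_breakdown` are illustrative and do not come from the slides.

```python
# Instruction mix from the slide: op -> (frequency, cycles per instruction).
mix = {
    "ALU": (0.50, 1),
    "Load": (0.20, 2),
    "Store": (0.10, 2),
    "Branch": (0.20, 2),
}


def cpi_breakdown(mix):
    """Return (overall CPI, {op: (CPI contribution, share of time)})."""
    # Each op contributes frequency * cycles to the average CPI.
    contributions = {op: freq * cycles for op, (freq, cycles) in mix.items()}
    total_cpi = sum(contributions.values())
    # The share of execution time is the contribution over the total CPI.
    breakdown = {op: (c, c / total_cpi) for op, c in contributions.items()}
    return total_cpi, breakdown


total_cpi, breakdown = cpi_breakdown(mix)
print(f"overall CPI = {total_cpi:.2f}")
for op, (contribution, share) in breakdown.items():
    print(f"{op:<6} contribution = {contribution:.1f}  time share = {share:.0%}")
```

The overall CPI of this mix is 1.5, which is the denominator used in the two percentage calculations on the slide.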

Amdahl’s Law
  Speedup due to enhancement E:
$$\text{Speedup}(E) = \frac{\text{ExTime w/o } E}{\text{ExTime w/ } E} = \frac{\text{Performance w/ } E}{\text{Performance w/o } E}$$
  Suppose a fraction F of the task is accelerated by a factor S:
$$\text{Speedup}(E) = \frac{1}{(1 - F) + F/S}$$
  If a program has two processes to run, and we double the speed of one process, what is the speedup?
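The formula is easy to evaluate directly. The sketch below is one way to write it. The function name `amdahl_speedup` is illustrative, and the answer to the two-process question assumes that both processes originally take the same time (F = 0.5), which the slide does not state.

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the time is sped up by `factor`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    # The unimproved part (1 - F) is unchanged; the improved part shrinks to F / S.
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# Assumption: the two processes take equal time, so F = 0.5 and S = 2.
two_process_speedup = amdahl_speedup(0.5, 2)
print(f"speedup = {two_process_speedup:.3f}")

# Even an enormous factor cannot beat 1 / (1 - F).
print(f"upper bound for F = 0.5: {amdahl_speedup(0.5, 1e12):.3f}")
```

Under that assumption the speedup is about 1.33, not 2, because the unimproved half still takes its full time.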

Processor Metrics
  CPU execution time
  CPU clock cycles per program
  CPI (this is an average value)
  IPC = 1/CPI, instructions per clock cycle
  CPI is closely related to:
    the instruction set architecture
    the implementation method
    the program measured

Amdahl’s Law
  Make the common case fast
  Examples used in modern computers:
    Registers: keep frequently used values in register files (a register file is small, so its access time is short)
    Memory hierarchy: cache memory (temporal locality, spatial locality; keep recently used data in the cache)
    Simple addressing modes

MIPS
  MIPS (millions of instructions per second) calculation:
$$\text{MIPS} = \frac{\text{instruction count}}{\text{execution time} \times 10^6}$$
  MIPS would be higher for a program using simple instructions.
  MIPS = MHz / CPI
  Example: a processor running at 100 MHz with CPI = 2 has MIPS = 100 / 2 = 50.
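The shortcut MIPS = MHz / CPI follows from the CPU time equation. A short derivation, assuming execution time here means CPU time (instruction count x CPI / clock rate):

$$\text{MIPS} = \frac{\text{instruction count}}{\text{execution time} \times 10^6} = \frac{\text{instruction count}}{\dfrac{\text{instruction count} \times \text{CPI}}{\text{clock rate}} \times 10^6} = \frac{\text{clock rate}}{\text{CPI} \times 10^6} = \frac{\text{MHz}}{\text{CPI}}$$

The instruction count cancels, so this form of MIPS says nothing about how many instructions a program needs to finish its work.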

MFLOPS
$$\text{MFLOPS} = \frac{\text{FP operations}}{\text{time} \times 10^6}$$
  What’s wrong with MFLOPS?
    Machine dependent
    Program dependent
    The execution times of FP multiply, FP divide, FP add, and FP subtract are different.

Evaluate Performance
  Benchmarks
    Benchmarks should represent a large class of important programs.
    Improving benchmark performance should help many programs.
    For better or worse, benchmarks shape a field:
      bad benchmarks hurt progress
      good benchmarks accelerate progress

Evaluate Processor Performance
  Toy benchmarks: roughly 100 lines of code, such as puzzle, quicksort, bubble sort
  Synthetic benchmarks: attempt to match the average frequencies of real workloads, such as Whetstone, Dhrystone
  Microbenchmarks: collections of small tests that cover machine primitives
  Kernels: time-critical excerpts of real programs, such as Linpack, Livermore loops
  Real programs: such as gcc, spice, database, stock trading

SPEC Benchmark
  SPEC: System Performance Evaluation Cooperative
  EE Times and 5 companies (Sun, MIPS, HP, Apollo, DEC) joined together to form SPEC.
  SPEC creates a standard list of programs, inputs, and reports (some real programs, including OS calls and some I/O).
  6 integer programs, 14 floating-point programs

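Comparing and Summarizing Performance
  Arithmetic Mean (AM)
$$\text{AM} = \frac{1}{n} \sum_{i=1}^{n} \text{execution time}_i$$
  execution time_i is the execution time of the ith program of a total of n programs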

Comparing and Summarizing Performance
  Harmonic Mean (HM)
$$\text{HM} = \frac{n}{\sum_{i=1}^{n} \dfrac{1}{\text{rate}_i}}$$
  rate_i = 1 / execution time_i
  n is the number of programs in the workload

Comparing and Summarizing Performance
  Geometric Mean (GM)
$$\text{GM} = \sqrt[n]{\prod_{i=1}^{n} \text{execution time ratio}_i}$$
  execution time ratio_i is the execution time normalized to the reference machine for the ith program of a total of n programs

Comparing and Summarizing Performance
  Arithmetic mean (AM): tracks relative execution time
  Harmonic mean (HM): tracks total execution time
  Geometric mean (GM): rewards all improvements equally
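A small script makes the three summaries concrete. This is a sketch only: the execution times and the reference-machine times are made-up values for illustration, and the function names are not from the slides.

```python
import math


def arithmetic_mean(times):
    """Mean of execution times."""
    return sum(times) / len(times)


def harmonic_mean(rates):
    """Mean of rates, where each rate is 1 / execution time."""
    return len(rates) / sum(1.0 / r for r in rates)


def geometric_mean(ratios):
    """Mean of execution times normalized to a reference machine."""
    # Averaging logarithms avoids overflow for long products.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical workload of n = 2 programs (times in seconds).
times = [2.0, 8.0]
reference_times = [4.0, 4.0]  # hypothetical reference machine

rates = [1.0 / t for t in times]
ratios = [t / ref for t, ref in zip(times, reference_times)]

am = arithmetic_mean(times)
hm = harmonic_mean(rates)
gm = geometric_mean(ratios)
print(f"AM of times  = {am}")
print(f"HM of rates  = {hm}")
print(f"GM of ratios = {gm}")
```

With rate defined as 1 / execution time, the harmonic mean of the rates is the reciprocal of the arithmetic mean of the times. The geometric mean comes out as 1 here because halving one program's time and doubling the other's cancel out.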

Performance Summary
  CPU time: time on the workload is the final measure of computer performance
  Benchmarks: good benchmarks, good ways to summarize performance
  Amdahl’s Law: invest your effort where most of the time is spent; the remaining unimproved parts also count

CPI Example
  Suppose we have two implementations of the same instruction set architecture (ISA). For some program:
    Machine A has a clock cycle time of 10 ns and a CPI of 2.0
    Machine B has a clock cycle time of 20 ns and a CPI of 1.2
  Which machine is faster for this program, and by how much?
  If two machines have the same ISA, which of our quantities (clock rate, CPI, execution time, # of instructions, MIPS) will always be identical?
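One way to work the first question, assuming both machines execute the same instruction count $I$ for this program (they share an ISA and run the same code):

$$\text{CPU time}_A = I \times 2.0 \times 10\ \text{ns} = 20\,I\ \text{ns}, \qquad \text{CPU time}_B = I \times 1.2 \times 20\ \text{ns} = 24\,I\ \text{ns}$$

$$\frac{\text{CPU time}_B}{\text{CPU time}_A} = \frac{24\,I}{20\,I} = 1.2$$

Under that assumption Machine A is 1.2 times faster. The same assumption suggests the answer to the second question: of the listed quantities, only the number of instructions is fixed by the ISA for a given compiled program, while clock rate, CPI, execution time, and MIPS depend on the implementation.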

No. of Instructions Example
  A compiler designer is trying to decide between two code sequences for a particular machine. Based on the hardware implementation, there are three different classes of instructions: Class A, Class B, and Class C, which require 1, 2, and 3 cycles respectively.
    The first code sequence has 5 instructions: 2 of A, 1 of B, and 2 of C.
    The second code sequence has 6 instructions: 4 of A, 1 of B, and 1 of C.
  Which sequence will be faster? By how much? What is the CPI for each sequence?
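The comparison can be checked by counting cycles. This sketch uses only the numbers from the example and assumes both sequences run on the same machine at the same clock rate, so fewer cycles means less time. The names `CYCLES_PER_CLASS` and `analyze` are illustrative.

```python
# Cycles required by each instruction class, as given in the example.
CYCLES_PER_CLASS = {"A": 1, "B": 2, "C": 3}


def analyze(counts):
    """Return (total cycles, CPI) for a sequence given as {class: count}."""
    instructions = sum(counts.values())
    cycles = sum(CYCLES_PER_CLASS[cls] * n for cls, n in counts.items())
    return cycles, cycles / instructions


sequence1 = {"A": 2, "B": 1, "C": 2}  # 5 instructions
sequence2 = {"A": 4, "B": 1, "C": 1}  # 6 instructions

cycles1, cpi1 = analyze(sequence1)
cycles2, cpi2 = analyze(sequence2)
print(f"sequence 1: {cycles1} cycles, CPI = {cpi1}")
print(f"sequence 2: {cycles2} cycles, CPI = {cpi2}")
print(f"sequence 2 is {cycles1 / cycles2:.2f} times faster")
```

The second sequence has more instructions but needs 9 cycles instead of 10, so it is about 1.11 times faster, with a CPI of 1.5 against 2.0.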

MIPS Example
  Two different compilers are being tested for a 100 MHz machine with three different classes of instructions: Class A, Class B, and Class C, which require one, two, and three cycles respectively. Both compilers are used to produce code for a large piece of software.
    The first compiler’s code uses 5 million Class A instructions, 1 million Class B instructions, and 1 million Class C instructions.
    The second compiler’s code uses 10 million Class A instructions, 1 million Class B instructions, and 1 million Class C instructions.
  Which sequence will be faster according to MIPS?
  Which sequence will be faster according to execution time?
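Both metrics can be computed from the instruction counts. The sketch below uses only the figures in the example. The names `evaluate`, `compiler1`, and `compiler2` are illustrative.

```python
CLOCK_HZ = 100_000_000  # 100 MHz machine from the example
CYCLES_PER_CLASS = {"A": 1, "B": 2, "C": 3}


def evaluate(counts):
    """Return (execution time in seconds, MIPS) for {class: instruction count}."""
    instructions = sum(counts.values())
    cycles = sum(CYCLES_PER_CLASS[cls] * n for cls, n in counts.items())
    seconds = cycles / CLOCK_HZ
    mips = instructions / (seconds * 1e6)
    return seconds, mips


compiler1 = {"A": 5_000_000, "B": 1_000_000, "C": 1_000_000}
compiler2 = {"A": 10_000_000, "B": 1_000_000, "C": 1_000_000}

time1, mips1 = evaluate(compiler1)
time2, mips2 = evaluate(compiler2)
print(f"compiler 1: {time1:.2f} s, {mips1:.0f} MIPS")
print(f"compiler 2: {time2:.2f} s, {mips2:.0f} MIPS")
```

The second compiler's code scores higher on MIPS (80 against 70) yet takes longer to run (0.15 s against 0.10 s), so the two metrics disagree about which is faster.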

Amdahl’s Law Example
  Suppose we enhance a machine so that all floating-point instructions run five times faster. If the execution time of some benchmark before the floating-point enhancement is 10 seconds, what will the speedup be if half of the 10 seconds is spent executing floating-point instructions?
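A worked solution using Amdahl’s Law with F = 0.5 (half of the time is floating point) and S = 5:

$$\text{ExTime w/ } E = 10\ \text{s} \times \left( (1 - 0.5) + \frac{0.5}{5} \right) = 5\ \text{s} + 1\ \text{s} = 6\ \text{s}$$

$$\text{Speedup}(E) = \frac{10\ \text{s}}{6\ \text{s}} = \frac{1}{(1 - 0.5) + 0.5/5} \approx 1.67$$

The floating-point half shrinks from 5 seconds to 1 second, but the other 5 seconds are untouched, which keeps the overall speedup well below 5.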

Some Concepts to Remember
  For a given architecture, performance increases come from:
    Increases in clock rate
    Lower CPI (through improvements in processor organization)
    Lower CPI or instruction count (through compiler enhancements)