
Datorteknik PerformanceAnalyse, slide 1: Performance
– What is it: measures of performance
– The CPU performance equation:
  – execution time as the measure
  – what affects execution time
  – examples
– Choosing good benchmarks? Choosing bad benchmarks?
– Amdahl's Law

Slide 2: Performance is time
– Time to do the task (execution time): also called response time or latency
– Tasks per unit time (per second, minute, ...): throughput, bandwidth

Slide 3: Performance as response time
Performance is most often measured as the response time or execution time for some task. "X is n times faster than Y" means

$$\frac{\text{Performance}(X)}{\text{Performance}(Y)} = \frac{\text{Execution time}(Y)}{\text{Execution time}(X)} = n$$

Example: the execution time of program P is 5 s on X and 10 s on Y, so X is 10/5 = 2 times faster than Y.
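A minimal sketch of the definition above, assuming performance is simply the reciprocal of execution time in seconds. The function names are hypothetical; the numbers are the slide's own example.

```python
def performance(exec_time_s: float) -> float:
    """Performance is the reciprocal of execution time."""
    return 1.0 / exec_time_s

def times_faster(time_x_s: float, time_y_s: float) -> float:
    """Return n in 'X is n times faster than Y'.

    n = Performance(X) / Performance(Y) = ExecutionTime(Y) / ExecutionTime(X)
    """
    return performance(time_x_s) / performance(time_y_s)

# The slide's example: program P takes 5 s on X and 10 s on Y.
n = times_faster(5.0, 10.0)
print(f"X is {n:g} times faster than Y")  # X is 2 times faster than Y
```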

Slide 4: What time to measure?
– Elapsed time (wall-clock time):
  – actual time from start to completion
  – depends on the CPU, the rest of the system, I/O, etc.
  – often used in real benchmarks
  – the only suitable choice when I/O is included
– CPU time:
  – measures/analyzes CPU performance only
  – may be suitable when the machine is time-shared
  – possibly has both a user and a system component
  – user CPU time is our focus for the first part of the course
– Elapsed time = CPU time + idle time (usually, and assuming time is accurately accounted for)
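The difference between elapsed time and CPU time can be seen with Python's standard `time.perf_counter()` (a wall-clock timer) and `time.process_time()` (user plus system CPU time of the current process). This is only a sketch: the workload is hypothetical, and `time.sleep()` stands in for waiting on I/O, which adds elapsed time but almost no CPU time.

```python
import time

def measure(task):
    """Run task() once; return (elapsed, cpu, idle) in seconds."""
    wall_start = time.perf_counter()   # wall-clock (elapsed) timer
    cpu_start = time.process_time()    # user + system CPU time of this process
    task()
    cpu = time.process_time() - cpu_start
    elapsed = time.perf_counter() - wall_start
    # Idle time: time the process was not running on the CPU (waiting, sleeping).
    return elapsed, cpu, elapsed - cpu

def mixed_task():
    """Hypothetical workload: some computation plus a simulated I/O wait."""
    total = 0
    for i in range(1_000_000):   # busy loop: consumes CPU time
        total += i
    time.sleep(0.2)              # stands in for I/O: elapsed time, almost no CPU time

elapsed, cpu, idle = measure(mixed_task)
print(f"elapsed {elapsed:.3f} s, CPU {cpu:.3f} s, idle {idle:.3f} s")
```

`time.process_time()` lumps user and system time together; `os.times()` reports them separately (its `user` and `system` fields) if only user CPU time is wanted.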

Slide 5: Metrics of performance
Different performance metrics are appropriate at different levels of the system: application, programming language, compiler, ISA, datapath and control, function units, and transistors. The metrics shown are:
– answers per month
– operations per second
– millions of instructions per second (MIPS)
– millions of floating-point operations per second (MFLOP/s)
– cycles per second (clock rate)
– cycles per instruction (CPI)

Slide 6: Relating processor metrics
$$\text{CPU execution time per program} = \frac{\text{CPU clock cycles}}{\text{program}} \times \text{clock cycle time} = \frac{\text{CPU clock cycles}/\text{program}}{\text{clock rate (frequency)}}$$
$$\frac{\text{CPU clock cycles}}{\text{program}} = \frac{\text{instructions}}{\text{program}} \times \text{clock cycles per instruction}$$
Clock cycles per instruction (CPI) is an average measurement that depends on the ISA, the implementation, and the program measured:
$$\text{CPI} = \frac{\text{CPU clock cycles}/\text{program}}{\text{instructions}/\text{program}}, \qquad \text{IPC} = \frac{1}{\text{CPI}}$$
where IPC is the number of instructions per clock cycle. Putting it together:
$$\text{CPU execution time} = \text{instructions} \times \text{CPI} \times \text{clock cycle time}$$
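The same relations as a small calculator. The instruction count, CPI, and clock rate below are hypothetical values chosen only to make the arithmetic easy to follow; dividing by the clock rate is the same as multiplying by the clock cycle time.

```python
def cpu_time_s(instructions: int, cpi: float, clock_rate_hz: float) -> float:
    """CPU execution time = instructions x CPI x clock cycle time."""
    cycles = instructions * cpi        # CPU clock cycles for the program
    return cycles / clock_rate_hz      # cycles / rate == cycles x cycle time

def cpi_from_counts(cycles: float, instructions: int) -> float:
    """CPI = CPU clock cycles per program / instructions per program."""
    return cycles / instructions

def ipc(cpi: float) -> float:
    """Instructions per clock cycle = 1 / CPI."""
    return 1.0 / cpi

# Hypothetical program: 2 billion instructions, average CPI 1.5, 3 GHz clock.
t = cpu_time_s(2_000_000_000, 1.5, 3e9)
print(f"{t:.3f} s")                                  # 1.000 s
print(cpi_from_counts(3_000_000_000, 2_000_000_000)) # 1.5
print(ipc(2.0))                                      # 0.5
```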

Slide 7: Let’s look at the single-cycle model analytically

Slide 8: Static timing analysis
Component delays:
– memories: 10 ns
– registers: 5 ns
– adders: 10 ns
– ALU: 10 ns
Use a topological sort!
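Static timing analysis with a topological sort can be sketched as a longest-path computation over a directed acyclic graph: visit each unit after all of its inputs, and its output is ready at the latest input arrival time plus its own delay. The sketch uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+). The node names and wiring are a hypothetical, simplified netlist for `lw`, chosen so that it reproduces the 35 ns of the next slide; sign extension and register write-back are not modeled because no delay is given for them.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Component delays from the slide, in ns.
DELAY_NS = {"memory": 10, "register": 5, "adder": 10, "alu": 10}

def critical_path_ns(nodes, preds):
    """Longest (critical) path delay through a combinational DAG.

    nodes: node name -> component type (a key of DELAY_NS)
    preds: node name -> names of the nodes whose outputs it waits for
    """
    graph = {n: preds.get(n, []) for n in nodes}
    arrival = {}  # time at which each node's output becomes valid
    # static_order() yields each node only after all of its predecessors.
    for node in TopologicalSorter(graph).static_order():
        start = max((arrival[p] for p in graph[node]), default=0)
        arrival[node] = start + DELAY_NS[nodes[node]]
    return max(arrival.values())

# Hypothetical, simplified netlist for lw (PC adder runs in parallel).
LW_NODES = {"imem": "memory", "pc_add": "adder", "regs": "register",
            "alu": "alu", "dmem": "memory"}
LW_PREDS = {"regs": ["imem"], "alu": ["regs"], "dmem": ["alu"]}

print(critical_path_ns(LW_NODES, LW_PREDS))  # 35
```

Dropping the data memory node from the same graph gives 25 ns, matching the add instruction on a later slide.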

Slide 9: [Datapath figure with branch logic, sign/zero extension, and ALU; critical path for lw $2, const($3)] Delay: 35 ns

Slide 10: But that path goes through the data memory! What if this is not a load/store? How about an instruction that does nothing, a NOP?

Slide 11: [Datapath figure; critical path for NOP] Delay: 10 ns

Slide 12: [Datapath figure; critical path for add $ra, $rb, $rc] Delay: 25 ns

Slide 13: [Datapath figure; critical path for B label (branch)] Delay: 20 ns

Slide 14: 35 ns for a load/store, but only 10 ns for a NOP!?

Slide 15: Amdahl’s rule: “Make the common case fast”

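Slide 16: Amdahl's Law
Handy for evaluating the impact of a change that is not tied to the CPU performance equation.
Insight: the performance gain from improving a feature is limited by how much the feature is used.
Suppose that enhancement E accelerates a fraction F of a program by a factor S, while the remainder of the task is unaffected:
$$\text{ExecTime}_E = \left((1 - F) + \frac{F}{S}\right) \times \text{ExecTime}_{\text{without } E}$$
[Figure: execution time split into an unaffected part 1 − F and an enhanced part F that shrinks to F/S]
The overall speedup is therefore $1 / \left((1 - F) + F/S\right)$, which can never exceed $1/(1 - F)$, however large S becomes.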
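Amdahl's Law written out as two small functions. The fraction and factor in the usage lines are hypothetical; the last line shows the cap: even an infinitely fast enhancement of half the program only halves the execution time.

```python
def amdahl_exec_time(old_time: float, f: float, s: float) -> float:
    """ExecTime_E = ((1 - F) + F / S) * ExecTime without E."""
    return ((1.0 - f) + f / s) * old_time

def amdahl_speedup(f: float, s: float) -> float:
    """Overall speedup = ExecTime without E / ExecTime_E = 1 / ((1 - F) + F / S)."""
    return 1.0 / ((1.0 - f) + f / s)

# Hypothetical: half of a 10 s run is made 4x faster.
print(amdahl_exec_time(10.0, 0.5, 4))     # 6.25
print(amdahl_speedup(0.5, 4))             # 1.6
# Even an infinitely fast enhancement is capped at 1 / (1 - F).
print(amdahl_speedup(0.5, float("inf")))  # 2.0
```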

Slide 17: What if we don’t need the ALU? A branch instruction?

Slide 18: BUT! The single-cycle model has to accommodate the slowest instruction, even if it rarely occurs!

Slide 19: How much work can our structure perform?
For a program Q:
Time = number of executed instructions × number of cycles per instruction × time per cycle
$$T = N_q \times \text{CPI} \times T_c$$

Slide 20: For the single-cycle model:
– CPI = 1 for all instructions
– $T_c$ is determined by the slowest instruction
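The single-cycle rule in code: CPI is fixed at 1, and the cycle time is the maximum of the per-instruction critical paths from the datapath slides (35, 25, 20, and 10 ns). The instruction count in the example is hypothetical; the point is that even a program made only of NOPs pays the 35 ns load/store cycle.

```python
# Critical-path delays per instruction type (ns), from the datapath slides.
PATH_DELAY_NS = {"load/store": 35, "add": 25, "branch": 20, "nop": 10}

CPI_SINGLE = 1                       # every instruction takes exactly one cycle
TC_NS = max(PATH_DELAY_NS.values())  # the clock must fit the slowest path: 35 ns

def single_cycle_time_ns(n_instructions: int) -> int:
    """T = Nq * CPI * Tc for the single-cycle design."""
    return n_instructions * CPI_SINGLE * TC_NS

# Even a program of 1000 NOPs pays the load/store cycle time.
print(single_cycle_time_ns(1000))  # 35000
```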

Slide 21: How to reduce T?
$T = N_q \times \text{CPI} \times T_c$
Reduce $N_q$: more powerful instructions! But that means more hardware and longer paths, so the cycle time goes up (a slower machine).

Slide 22: “No free lunch”: this is why designers are so well paid, to optimize designs.

Slide 23: How to reduce T?
$T = N_q \times \text{CPI} \times T_c$
Faster hardware:
– technological limits
– the cost increase is not linearly related to the speed gain
– sales volume drops

Slide 24: How to reduce T?
$T = N_q \times \text{CPI} \times T_c$
Make CPI a function of the instruction, for example NOP = 1 cycle, LW = 4 cycles (chapter 5.4, the classical method).

Slide 25: How to reduce T?
$T = N_q \times \text{CPI} \times T_c$
Make CPI a function of the instruction:
– CPI goes up, but we can use an average, not the worst case
– $T_c$ goes down: it becomes the time to do the longest step, not the entire instruction

Slide 26: Example
– Branch:
  – step 1: fetch
  – step 2: new PC
– Add:
  – step 1: fetch
  – step 2: decode/register fetch
  – step 3: compute and write back

Slide 27: Example
– LW = 4 steps
– cycle time = 1/4 of the old cycle time
– T(LW) = 4 × 1/4 old time = old time: just as slow as before for the lw instruction, our worst case!

Slide 28: But that’s not important if LW is not common!
$T = N_q \times \text{CPI} \times \tfrac{1}{4}\,\text{old time}$, where CPI is averaged over this many instructions: 1.3? 1.7? Never 4.0!
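A sketch of why the average CPI is what matters. It takes the step counts from the slides (NOP 1, branch 2, add 3, LW 4), the 35 ns single-cycle clock, and the 1/4 cycle time from slide 27; the instruction mix itself is hypothetical and does not reproduce the 1.3 to 1.7 figures on the slide, which depend on the programs measured.

```python
# Cycles (steps) per instruction in the multicycle design:
# NOP and LW from slide 24, branch and add from slide 26.
CYCLES = {"nop": 1, "branch": 2, "add": 3, "lw": 4}

OLD_TC_NS = 35.0            # single-cycle clock, set by the slowest (lw) path
NEW_TC_NS = OLD_TC_NS / 4   # slide 27: the new cycle time is 1/4 of the old one

def average_cpi(mix):
    """Weighted-average CPI for a dynamic instruction mix {type: count}."""
    total = sum(mix.values())
    return sum(CYCLES[kind] * count for kind, count in mix.items()) / total

def relative_time(mix):
    """Multicycle time / single-cycle time for the same program (same Nq)."""
    return (average_cpi(mix) * NEW_TC_NS) / (1 * OLD_TC_NS)

mix = {"add": 5, "branch": 2, "lw": 2, "nop": 1}  # hypothetical mix per 10 instructions
print(average_cpi(mix))          # 2.8
print(relative_time(mix))        # about 0.7: the multicycle design is faster
print(relative_time({"lw": 1}))  # 1.0: an all-LW program gains nothing
```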

Slide 29: We win because of the quantitative, statistical properties of our programs!

Slide 30: What value of CPI do we use? 1.3? 1.5? 1.7? Easy: use the average program!?

Slide 31: There is no such thing!

Slide 32: Artificial “average programs” called “benchmarks”
Are they something to trust?
What about “peak performance values”: MIPS? MFLOPS?
We reach the peak at CPI = 1: a program of only NOPs!
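Why peak MIPS says little: native MIPS is the instruction rate in millions per second, that is, clock rate / (CPI × 10^6), so the lowest possible CPI (a NOP-only program) gives the highest number while doing no useful work. The 100 MHz clock and the CPI of 2.5 below are hypothetical.

```python
def native_mips(clock_rate_hz: float, cpi: float) -> float:
    """Native MIPS = instructions per second / 10^6 = clock rate / (CPI * 10^6)."""
    return clock_rate_hz / (cpi * 1e6)

CLOCK_HZ = 100e6  # hypothetical 100 MHz clock

peak = native_mips(CLOCK_HZ, 1.0)     # NOP-only program: CPI = 1
typical = native_mips(CLOCK_HZ, 2.5)  # hypothetical average CPI of a real mix
print(peak, typical)  # 100.0 40.0
```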

Slide 33: Why do benchmarks?
How we evaluate performance differences:
– across and within a single system (design and variations)
What should benchmarks do?
– represent a large class of important programs
– behave like typical programs: improved benchmark performance => improved performance broadly
For better or worse, benchmarks shape a field:
– good ones accelerate progress
– bad benchmarks hurt progress
  – help real programs vs. sell machines/papers?
  – enhancements that help benchmarks may not help most programs, and vice versa

Slide 34: Classes of benchmarks
– Toy benchmarks:
  – short programs, e.g., sieve, puzzle, quicksort
  – good first programming assignments
– Synthetic benchmarks:
  – attempt to match the average frequencies of real workloads
  – e.g., Whetstone, Dhrystone
  – mostly good for nothing: too artificial
– Kernels:
  – time-critical excerpts of real programs
  – e.g., Livermore loops, Linpack
  – good for micro-performance studies
– Real programs:
  – e.g., gcc, spice, Verilog, database, stock trading
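To make "toy benchmark" concrete, here is a sketch of the sieve (of Eratosthenes) mentioned above, wrapped in a hypothetical best-of-N timing harness. Taking the minimum of several runs is one common way to reduce timing noise, not a rule from the slides; and, as the slide warns, a fast result here says little about real programs.

```python
import time

def sieve(n: int) -> list[int]:
    """Sieve of Eratosthenes: all primes below n."""
    if n < 2:
        return []
    is_prime = [True] * n
    is_prime[0] = is_prime[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if is_prime[i]:
            for j in range(i * i, n, i):   # cross out multiples of i
                is_prime[j] = False
    return [i for i, p in enumerate(is_prime) if p]

def run_benchmark(fn, *args, repeats: int = 5) -> float:
    """Best-of-N elapsed (wall-clock) time in seconds for fn(*args)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best

print(f"sieve(100000): {run_benchmark(sieve, 100_000):.4f} s")
```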

Slide 35: Successful benchmark: the SPEC collection
In 1987 the RISC industry (workstations) was mired in “bench marketing”:
– (“That is an 8 MIPS machine, but they claim 10 MIPS!”)
EE Times and 5 companies banded together to form the Systems Performance Evaluation Committee (SPEC) in 1988:
– Sun, MIPS, HP, Apollo, DEC
Create a standard list of programs, inputs, and reporting rules:
– several real programs, including OS calls
– some I/O
– rules for running and reporting

Slide 36: Multiple-clock-cycle designs:
– state machines
– microprogramming (chapter 5.4)
– “VLSI” design

Slide 37: How to reduce T?
$T = N_q \times \text{CPI} \times T_c$
Reduce the quotient cycles/instruction:
– reduce “cycles”: multiple-clock-cycle design
– increase “instructions”: execute more than one instruction per cycle!

Slide 38: More than one instruction per cycle?
– Parallelism: div/mult + floating point + integer
– Superscalarity: multiple issue, etc.
– Pipelining: of general importance