EEL-4713 Ann Gordon-Ross.1 EEL-4713 Computer Architecture Performance.

Slides:



Advertisements
Similar presentations
CS1104: Computer Organisation School of Computing National University of Singapore.
Advertisements

Performance Evaluation of Architectures Vittorio Zaccaria.
TU/e Processor Design 5Z032 1 Processor Design 5Z032 The role of Performance Henk Corporaal Eindhoven University of Technology 2009.
100 Performance ENGR 3410 – Computer Architecture Mark L. Chang Fall 2006.
1. 2 Performance  Today we’ll discuss issues related to performance: —Latency/Response Time/Execution Time vs. Throughput —How do you make a reasonable.
2-1 ECE 361 ECE C61 Computer Architecture Lecture 2 – performance Prof. Alok N. Choudhary
ENGS 116 Lecture 21 Performance and Quantitative Principles Vincent H. Berk September 26 th, 2008 Reading for today: Chapter , Amdahl article.
CIS629 Fall Lecture Performance Overview Execution time is the best measure of performance: simple, intuitive, straightforward. Two important.
1 CSE SUNY New Paltz Chapter 2 Performance and Its Measurement.
Computer Performance Evaluation: Cycles Per Instruction (CPI)
1  1998 Morgan Kaufmann Publishers and UCB Performance CEG3420 Computer Design Lecture 3.
CIS429.S00: Lec2- 1 Performance Overview Execution time is the best measure of performance: simple, intuitive, straightforward. Two important quantitative.
Chapter 4 Assessing and Understanding Performance
1 Lecture 10: FP, Performance Metrics Today’s topics:  IEEE 754 representations  FP arithmetic  Evaluating a system Reminder: assignment 4 due in a.
CS430 – Computer Architecture Lecture - Introduction to Performance
CIS429/529 Winter 07 - Performance - 1 Performance Overview Execution time is the best measure of performance: simple, intuitive, straightforward. Two.
1 Chapter 4. 2 Measure, Report, and Summarize Make intelligent choices See through the marketing hype Key to understanding underlying organizational motivation.
Datorteknik PerformanceAnalyse bild 1 Performance –what is it: measures of performance The CPU Performance Equation: –Execution time as the measure –what.
Lecture 2: Technology Trends and Performance Evaluation Performance definition, benchmark, summarizing performance, Amdahl’s law, and CPI.
ECE 4436ECE 5367 Introduction to Computer Architecture and Design Ji Chen Section : T TH 1:00PM – 2:30PM Prerequisites: ECE 4436.
Computer Organization and Design Performance Montek Singh Mon, April 4, 2011 Lecture 13.
1 Computer Performance: Metrics, Measurement, & Evaluation.
Where Has This Performance Improvement Come From? Technology –More transistors per chip –Faster logic Machine Organization/Implementation –Deeper pipelines.
Lecture 2: Computer Performance
CENG 450 Computer Systems & Architecture Lecture 3 Amirali Baniasadi
Computer Performance Computer Engineering Department.
BİL 221 Bilgisayar Yapısı Lab. – 1: Benchmarking.
Memory/Storage Architecture Lab Computer Architecture Performance.
Recap Technology trends Cost/performance Measuring and Reporting Performance What does it mean to say “computer X is faster than computer Y”? E.g. Machine.
Performance Chapter 4 P&H. Introduction How does one measure report and summarise performance? Complexity of modern systems make it very more difficult.
C OMPUTER O RGANIZATION AND D ESIGN The Hardware/Software Interface 5 th Edition Chapter 1 Computer Abstractions and Technology Sections 1.5 – 1.11.
B0111 Performance Anxiety ENGR xD52 Eric VanWyk Fall 2012.
PerformanceCS510 Computer ArchitecturesLecture Lecture 3 Benchmarks and Performance Metrics Lecture 3 Benchmarks and Performance Metrics.
1 CS/EE 362 Hardware Fundamentals Lecture 9 (Chapter 2: Hennessy and Patterson) Winter Quarter 1998 Chris Myers.
1 Acknowledgements Class notes based upon Patterson & Hennessy: Book & Lecture Notes Patterson’s 1997 course notes (U.C. Berkeley CS 152, 1997) Tom Fountain.
Computer Performance Computer Engineering Department.
1 CS465 Performance Revisited (Chapter 1) Be able to compare performance of simple system configurations and understand the performance implications of.
1 CS/COE0447 Computer Organization & Assembly Language CHAPTER 4 Assessing and Understanding Performance.
Computer Architecture
1 Seoul National University Performance. 2 Performance Example Seoul National University Sonata Boeing 727 Speed 100 km/h 1000km/h Seoul to Pusan 10 hours.
Performance Lecture notes from MKP, H. H. Lee and S. Yalamanchili.
CEN 316 Computer Organization and Design Assessing and Understanding Performance Mansour AL Zuair.
CS252/Patterson Lec 1.1 1/17/01 CMPUT429/CMPE382 Winter 2001 Topic2: Technology Trend and Cost/Performance (Adapted from David A. Patterson’s CS252 lecture.
EEL5708/Bölöni Lec 1.1 August 21, 2006 Lotzi Bölöni Fall 2006 EEL 5708 High Performance Computer Architecture Lecture 1 Introduction.
Cost and Performance.
Morgan Kaufmann Publishers
1  1998 Morgan Kaufmann Publishers How to measure, report, and summarize performance (suorituskyky, tehokkuus)? What factors determine the performance.
Performance Performance
TEST 1 – Tuesday March 3 Lectures 1 - 8, Ch 1,2 HW Due Feb 24 –1.4.1 p.60 –1.4.4 p.60 –1.4.6 p.60 –1.5.2 p –1.5.4 p.61 –1.5.5 p.61.
September 10 Performance Read 3.1 through 3.4 for Wednesday Only 3 classes before 1 st Exam!
Performance – Last Lecture Bottom line performance measure is time Performance A = 1/Execution Time A Comparing Performance N = Performance A / Performance.
4. Performance 4.1 Introduction 4.2 CPU Performance and Its Factors
Lec2.1 Computer Architecture Chapter 2 The Role of Performance.
L12 – Performance 1 Comp 411 Computer Performance He said, to speed things up we need to squeeze the clock Study
Performance Analysis Topics Measuring performance of systems Reasoning about performance Amdahl’s law Systems I.
Jan. 5, 2000Systems Architecture II1 Machine Organization (CS 570) Lecture 2: Performance Evaluation and Benchmarking * Jeremy R. Johnson Wed. Oct. 4,
Computer Architecture CSE 3322 Web Site crystal.uta.edu/~jpatters/cse3322 Send to Pramod Kumar, with the names and s.
CSE 340 Computer Architecture Summer 2016 Understanding Performance.
June 20, 2001Systems Architecture II1 Systems Architecture II (CS ) Lecture 1: Performance Evaluation and Benchmarking * Jeremy R. Johnson Wed.
CpE 442 Introduction to Computer Architecture The Role of Performance
Computer Organization
Lecture 2: Performance Evaluation
September 2 Performance Read 3.1 through 3.4 for Tuesday
ECE 4100/6100 Advanced Computer Architecture Lecture 1 Performance
Performance Performance The CPU Performance Equation:
How do we evaluate computer architectures?
Computer Performance He said, to speed things up we need to squeeze the clock.
Performance of computer systems
Performance of computer systems
Computer Performance Read Chapter 4
Presentation transcript:

EEL-4713 Ann Gordon-Ross.1 EEL-4713 Computer Architecture Performance

EEL Overview of Today’s Lecture: Performance °Definition and Measures of Performance °Summarizing Performance and Performance Pitfalls °Reading: Chapter 1

EEL Technology and Cost Summary °Integrated circuits driving computer industry °Technology improvements: CMOS transistors getting smaller, faster for each new generation Smaller -> more transistors per area -> more functionality (e.g. 64- bit datapath, MMX extensions, superscalar execution, caches, multiple cores) Faster -> higher raw speed (clock cycle) °Die costs goes up with the cube of die area

EEL Review: Summary from Chapter 1 °All computers consist of five components Processor: (1) datapath and (2) control (3) Memory (4) Input devices and (5) Output devices °Not all “memory” are created equally Cache: fast (expensive) memory are placed closer to the processor, limited amount due to large area Main memory: less expensive memory--we can have more °Input and output (I/O) devices are very diverse Wide range of speed: graphics vs. keyboard Wide range of requirements: speed, standard, cost... etc.

EEL Performance °Purchasing perspective given a collection of machines, which has the -best performance ? -least cost ? -best performance / cost ? °Design perspective faced with design options, which has the -best performance improvement ? -least cost ? -best performance / cost ? °Both require basis for comparison metric for evaluation °Our goal is to understand cost & performance implications of architectural choices

EEL Two notions of “performance” ° Time to do the task (Execution Time) – execution time, response time, latency ° Tasks per day, hour, week, sec, ns... (Performance) – throughput, bandwidth Response time and throughput often are in opposition Plane Boeing 747 BAD/Sud Concorde Speed 610 mph 1350 mph DC to Paris 6.5 hours 3 hours Passengers Throughput (pmph) 286, ,200 Which has higher performance?

EEL Example Time of Concorde vs. Boeing 747? Concord is 1350 mph / 610 mph = 2.2 times faster = 6.5 hours / 3 hours Throughput of Concorde vs. Boeing 747 ? Concord is 178,200 pmph / 286,700 pmph = 0.62 “times faster” Boeing is 286,700 pmph / 178,200 pmph = 1.6 “times faster” Boeing is 1.6 times (“60%”) faster in terms of throughput Concord is 2.2 times (“120%”) faster in terms of flying time We will focus primarily on execution time for a single job

EEL Definitions °Performance is in units of things-per-second bigger is better °If we are primarily concerned with response time performance(x) = 1 execution_time(x) " X is n times faster than Y" means Performance(X) ExecutionTime(Y) n = = Performance(Y) ExecutionTime(X)

EEL Basis of Evaluation Actual Target Workload Full Application Benchmarks Small “Kernel” Benchmarks Microbenchmarks Pros Cons representative very specific non-portable difficult to run, or measure hard to identify cause portable widely used improvements useful in reality easy to run, early in design cycle identify peak capability and potential bottlenecks less representative than target workload easy to “fool” “peak” may be a long way from application performance

EEL Metrics of performance Compiler Programming Language Application Datapath Control TransistorsWiresPins ISA Function Units (millions) of Instructions per second – MIPS (millions) of (F.P.) operations per second – MFLOP/s Cycles per second (clock rate) Megabytes per second Answers per month Operations per second

EEL Relating Processor Metrics °CPU execution time = CPU clock cycles/program * clock cycle time Or, CPU execution time = CPU clock cycles/program ÷ clock rate °CPU clock cycles/program = Instructions/program * avg. clock cycles per instruction Or, more commonly: CPI (clock cycles per instruction) = (CPU clock cycles/program) ÷ (Instructions/program) °Examples: Single-cycle MIPS datapath: CPI=1 (all instructions take 1 cycle) Multi-cycle MIPS datapath: CPI within 2-5 range

EEL Organizational Trade-offs Compiler Programming Language Application Datapath Control TransistorsWiresPins ISA Function Units Instruction Mix Cycle Time CPI

EEL CPI CPI = ∑ CPI * F where F = I i = 1 n iii i Instruction Count CPI varies depending on individual instruction e.g. 5-cycle load, 3-cycle ALU in multi-cycle MIPS Average CPI for a program depends on the “mix” of instructions it executes e.g. close to 5 if load-intensive, close to 4 if ALU intensive The instruction mix in % n classes/types of instructions

EEL Example (RISC processor) Typical Mix Base Machine OpFreqCyclesF*CPI(i)% Time ALU50%1.523% Load20% % Store10%3.314% Branch20%2.418% 2.2 How much faster would the machine be if a better data cache reduced the average load time to 2 cycles? How does this compare with using branch prediction to shave a cycle off the branch time? What if two ALU instructions could be executed at once? Invest resources where time is spent!

EEL Aspects of CPU Performance CPU time= Seconds= Instructions x Cycles x Seconds Program Program Instruction Cycle CPU time= Seconds= Instructions x Cycles x Seconds Program Program Instruction Cycle instr. countCPIclock rate Program Compiler Instr. Set. Arch. Organization Technology IC CPICLK

EEL Aspects of CPU Performance CPU time= Seconds= Instructions x Cycles x Seconds Program Program Instruction Cycle CPU time= Seconds= Instructions x Cycles x Seconds Program Program Instruction Cycle instr. countCPIclock rate Program X (x) avg Compiler X (x) avg Instr. Set. Arch. X X X Organization X X Technology X IC CPICLK

EEL Marketing Metrics MIPS= Instruction Count / Time * 10^6 = Clock Rate / CPI * 10^6 machines with different instruction sets ? programs with different instruction mixes ? dynamic frequency of instructions uncorrelated with performance MFLOP/S = FP Operations / Time * 10^6 machine dependent Not necessarily where time is spent

EEL Why benchmarks? °How we evaluate differences Different systems Changes to a single system °Provide a target Benchmarks should represent large class of important programs Improving benchmark performance should help many programs °For better or worse, benchmarks shape a field °Good ones accelerate progress good target for development °Bad benchmarks hurt progress New ideas that help real programs v. sell machines/papers?

EEL Programs to Evaluate Processor Performance °(Toy) Benchmarks line e.g.,: sieve, puzzle, quicksort, “cast” °Synthetic Benchmarks attempt to match average frequencies of real workloads e.g., Whetstone, dhrystone °Kernels Time critical excerpts

EEL Successful Benchmark: SPEC °1987: RISC industry mired in “bench marketing” Inconsistent Not reported fairly or correctly Everyone had a “new and better” benchmark targeted to make their architecture look better °EE Times + 5 companies band together to perform Systems Performance Evaluation Committee (SPEC) in 1988: Sun, MIPS, HP, Apollo, DEC °Create standard list of programs, inputs, reporting: some real programs, includes OS calls, some I/O

EEL SPEC first round °First round 1989; 10 programs, single number to summarize performance °One program: 99% of time in single line of code °New front-end compiler could improve dramatically Comparing different platform performance

EEL SPEC Evolution °Second round; SpecInt92 (6 integer programs) and SpecFP92 (14 floating point programs) °Third round; 1995; new set of programs °Currently: SPEC 2006 °Additions: SPECweb (Web server throughput) JVM (Java virtual machine) SPEChpc (high-performance computing) SFS (file system) SPECviewperf, SPECapc (graphics)

EEL Quantitative design principles °Primary goal: cost-performance Increase performance with small cost implications °Key principle to keep in mind: “Make the common case fast” Quantified by “Amdahl’s Law”

EEL Amdahl’s Law °Idea: Given a system “X” and the opportunity of enhancing it to become a new system “Y” How faster will “Y” be relative to “X”? °Key parameters: The gain from the enhancement The frequency at which it can be applied

EEL Example - Triathlon °Three parts: run, swim, cycle °Bob has been training for a competition Based on his experience, he knows that during competition he can push his limit to: -Run 30% faster than in training, or -Swim 50% faster, or -Cycle 20% faster °Where should Bob spend his energy during competition? °By the way, when training Bob spends: 60 minutes running 40 minutes swimming 2 hours cycling

EEL Amdahl’s Law °Used to compute speedups: Performance_with_enhancement Speedup = Performance_without_enhancement °Performance: Inversely proportional to execution time Speedup = EXECold/EXECnew

EEL Amdahl’s Law (cont) EXECnew = EXECold * [ (1 – FRACenh) + FRACenh/SPenh] EXECnew,old: execution times (seconds) FRACenh: Fraction of time enhancement is applied (%) SPenh: Speedup due to enhancement (absolute number)

EEL Example °Bob’s speedup due to swimming: FRACenh = 40 min/220 min = SPenh = 1.50 (50% speedup) Speedup = EXECold/EXECnew = = 1/[( )+0.182/1.50] = (6.5% improvement) °Running: (6.7% improvement) °Cycling: (10% improvement)

EEL Other important design principles °Locality Programs tend to reuse code/data recently accessed -Memory hierarchies leverage this locality for increased performance -Combats the memory wall °Parallelism Multiple operations in a single clock cycle -Pipelining, super-scalar execution, multi-core designs, vector processors

EEL Fallacies & Pitfalls °Relative perf. can be judged by clock rates Fail to capture IC, CPI components Cannot use clock rate to judge, even if same program, same ISA -IC same, but CPI may not be e.g: Pentium 4 1.7GHz relative to P-III 1GHz

EEL Summary °Time is the measure of computer performance! °Remember Amdahl’s Law: Speedup is limited by unimproved part of program CPU time= Seconds= Instructions x Cycles x Seconds Program Program Instruction Cycle CPU time= Seconds= Instructions x Cycles x Seconds Program Program Instruction Cycle