2-1 ECE 361 ECE C61 Computer Architecture Lecture 2 – performance Prof. Alok N. Choudhary

Slides:



Advertisements
Similar presentations
Performance Evaluation of Architectures Vittorio Zaccaria.
Advertisements

810:142 Lecture 2: Performance Fall 2006 Chapter 4: Performance Adapted from Mary Jane Irwin at Penn State University for Computer Organization and Design,
ECE-C355 Computer Structures Winter 2008 Chapter 04: Understanding Performance Slides are adapted from Professor Mary Jane Irwin (
TU/e Processor Design 5Z032 1 Processor Design 5Z032 The role of Performance Henk Corporaal Eindhoven University of Technology 2009.
100 Performance ENGR 3410 – Computer Architecture Mark L. Chang Fall 2006.
ECE 4100/6100 Advanced Computer Architecture Lecture 3 Performance Prof. Hsien-Hsin Sean Lee School of Electrical and Computer Engineering Georgia Institute.
1. 2 Performance  Today we’ll discuss issues related to performance: —Latency/Response Time/Execution Time vs. Throughput —How do you make a reasonable.
ENGS 116 Lecture 21 Performance and Quantitative Principles Vincent H. Berk September 26 th, 2008 Reading for today: Chapter , Amdahl article.
CIS629 Fall Lecture Performance Overview Execution time is the best measure of performance: simple, intuitive, straightforward. Two important.
1 CSE SUNY New Paltz Chapter 2 Performance and Its Measurement.
Computer Performance Evaluation: Cycles Per Instruction (CPI)
Computer ArchitectureFall 2007 © September 17, 2007 Karem Sakallah CS-447– Computer Architecture.
1  1998 Morgan Kaufmann Publishers and UCB Performance CEG3420 Computer Design Lecture 3.
CIS429.S00: Lec2- 1 Performance Overview Execution time is the best measure of performance: simple, intuitive, straightforward. Two important quantitative.
ECE 232 L4 perform.1 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers ECE 232 Hardware Organization and Design Lecture 4 Performance,
331 W08.1Spring :332:331 Computer Architecture and Assembly Language Fall 2003 Week 8 [Adapted from Dave Patterson’s UCB CS152 slides and Mary Jane.
Chapter 4 Assessing and Understanding Performance
CS430 – Computer Architecture Lecture - Introduction to Performance
CIS429/529 Winter 07 - Performance - 1 Performance Overview Execution time is the best measure of performance: simple, intuitive, straightforward. Two.
1 Chapter 4. 2 Measure, Report, and Summarize Make intelligent choices See through the marketing hype Key to understanding underlying organizational motivation.
Datorteknik PerformanceAnalyse bild 1 Performance –what is it: measures of performance The CPU Performance Equation: –Execution time as the measure –what.
1 Measuring Performance Chris Clack B261 Systems Architecture.
Lecture 2: Technology Trends and Performance Evaluation Performance definition, benchmark, summarizing performance, Amdahl’s law, and CPI.
ECE 4436ECE 5367 Introduction to Computer Architecture and Design Ji Chen Section : T TH 1:00PM – 2:30PM Prerequisites: ECE 4436.
1 Computer Performance: Metrics, Measurement, & Evaluation.
Where Has This Performance Improvement Come From? Technology –More transistors per chip –Faster logic Machine Organization/Implementation –Deeper pipelines.
CSE 340 Computer Architecture Summer 2014 Understanding Performance
Lecture 2: Computer Performance
CENG 450 Computer Systems & Architecture Lecture 3 Amirali Baniasadi
BİL 221 Bilgisayar Yapısı Lab. – 1: Benchmarking.
Memory/Storage Architecture Lab Computer Architecture Performance.
Performance Chapter 4 P&H. Introduction How does one measure report and summarise performance? Complexity of modern systems make it very more difficult.
B0111 Performance Anxiety ENGR xD52 Eric VanWyk Fall 2012.
PerformanceCS510 Computer ArchitecturesLecture Lecture 3 Benchmarks and Performance Metrics Lecture 3 Benchmarks and Performance Metrics.
10/19/2015Erkay Savas1 Performance Computer Architecture – CS401 Erkay Savas Sabanci University.
1 CS/EE 362 Hardware Fundamentals Lecture 9 (Chapter 2: Hennessy and Patterson) Winter Quarter 1998 Chris Myers.
Digital System Architecture 1 28 ต.ค ต.ค ต.ค ต.ค ต.ค. 58 Lecture 2a Computer Performance and Cost Pradondet Nilagupta.
Computer Performance Computer Engineering Department.
1 CS465 Performance Revisited (Chapter 1) Be able to compare performance of simple system configurations and understand the performance implications of.
1 CS/COE0447 Computer Organization & Assembly Language CHAPTER 4 Assessing and Understanding Performance.
Computer Architecture
1 Seoul National University Performance. 2 Performance Example Seoul National University Sonata Boeing 727 Speed 100 km/h 1000km/h Seoul to Pusan 10 hours.
Performance Lecture notes from MKP, H. H. Lee and S. Yalamanchili.
CEN 316 Computer Organization and Design Assessing and Understanding Performance Mansour AL Zuair.
CS252/Patterson Lec 1.1 1/17/01 CMPUT429/CMPE382 Winter 2001 Topic2: Technology Trend and Cost/Performance (Adapted from David A. Patterson’s CS252 lecture.
Cost and Performance.
1  1998 Morgan Kaufmann Publishers How to measure, report, and summarize performance (suorituskyky, tehokkuus)? What factors determine the performance.
Performance Performance
TEST 1 – Tuesday March 3 Lectures 1 - 8, Ch 1,2 HW Due Feb 24 –1.4.1 p.60 –1.4.4 p.60 –1.4.6 p.60 –1.5.2 p –1.5.4 p.61 –1.5.5 p.61.
Software School, Fudan University 2015 The Role of Performance To tell which system is faster.
September 10 Performance Read 3.1 through 3.4 for Wednesday Only 3 classes before 1 st Exam!
Performance – Last Lecture Bottom line performance measure is time Performance A = 1/Execution Time A Comparing Performance N = Performance A / Performance.
Chapter 4. Measure, Report, and Summarize Make intelligent choices See through the marketing hype Understanding underlying organizational aspects Why.
Lec2.1 Computer Architecture Chapter 2 The Role of Performance.
L12 – Performance 1 Comp 411 Computer Performance He said, to speed things up we need to squeeze the clock Study
Performance Analysis Topics Measuring performance of systems Reasoning about performance Amdahl’s law Systems I.
Jan. 5, 2000Systems Architecture II1 Machine Organization (CS 570) Lecture 2: Performance Evaluation and Benchmarking * Jeremy R. Johnson Wed. Oct. 4,
EEL-4713 Ann Gordon-Ross.1 EEL-4713 Computer Architecture Performance.
CSE 340 Computer Architecture Summer 2016 Understanding Performance.
June 20, 2001Systems Architecture II1 Systems Architecture II (CS ) Lecture 1: Performance Evaluation and Benchmarking * Jeremy R. Johnson Wed.
CpE 442 Introduction to Computer Architecture The Role of Performance
Two notions of performance
Computer Organization
Lecture 2: Performance Evaluation
September 2 Performance Read 3.1 through 3.4 for Tuesday
ECE 4100/6100 Advanced Computer Architecture Lecture 1 Performance
Performance Performance The CPU Performance Equation:
How do we evaluate computer architectures?
August 30, 2000 Prof. John Kubiatowicz
Performance of computer systems
Presentation transcript:

2-1 ECE 361 ECE C61 Computer Architecture Lecture 2 – performance Prof. Alok N. Choudhary

2-2 ECE 361 Today’s Lecture Performance Concepts Response Time Throughput Performance Evaluation Benchmarks Announcements Processor Design Metrics Cycle Time Cycles per Instruction Amdahl’s Law Speedup what is important Critical Path

2-3 ECE 361 Performance Concepts

2-4 ECE 361 Performance Perspectives Purchasing perspective Given a collection of machines, which has the -Best performance ? -Least cost ? -Best performance / cost ? Design perspective Faced with design options, which has the -Best performance improvement ? -Least cost ? -Best performance / cost ? Both require basis for comparison metric for evaluation Our goal: understand cost & performance implications of architectural choices

2-5 ECE 361 Two Notions of “Performance” Which has higher performance? Execution time (response time, latency, …) Time to do a task Throughput (bandwidth, …) Tasks per unit of time Response time and throughput often are in opposition Plane Boeing 747 Concorde Speed 610 mph 1350 mph DC to Paris 6.5 hours 3 hours Passengers Throughput (pmph) 286, ,200

2-6 ECE 361 Definitions Performance is typically in units-per-second bigger is better If we are primarily concerned with response time performance = 1 execution_time " X is n times faster than Y" means

2-7 ECE 361 Example Time of Concorde vs. Boeing 747? Concord is 1350 mph / 610 mph= 2.2 times faster = 6.5 hours / 3 hours Throughput of Concorde vs. Boeing 747 ? Concord is 178,200 pmph / 286,700 pmph = 0.62 “times faster” Boeing is 286,700 pmph / 178,200 pmph= 1.60 “times faster” Boeing is 1.6 times (“60%”) faster in terms of throughput Concord is 2.2 times (“120%”) faster in terms of flying time We will focus primarily on execution time for a single job Lots of instructions in a program => Instruction thruput important!

2-8 ECE 361 Benchmarks

2-9 ECE 361 Evaluation Tools Benchmarks, traces and mixes Macrobenchmarks and suites Microbenchmarks Traces Workloads Simulation at many levels ISA, microarchitecture, RTL, gate circuit Trade fidelity for simulation rate (Levels of abstraction) Other metrics Area, clock frequency, power, cost, … Analysis Queuing theory, back-of-the-envelope Rules of thumb, basic laws and principles

2-10 ECE 361 Benchmarks Microbenchmarks Measure one performance dimension -Cache bandwidth -Memory bandwidth -Procedure call overhead -FP performance Insight into the underlying performance factors Not a good predictor of application performance Macrobenchmarks Application execution time -Measures overall performance, but on just one application -Need application suite

2-11 ECE 361 Why Do Benchmarks? How we evaluate differences Different systems Changes to a single system Provide a target Benchmarks should represent large class of important programs Improving benchmark performance should help many programs For better or worse, benchmarks shape a field Good ones accelerate progress good target for development Bad benchmarks hurt progress help real programs v. sell machines/papers? Inventions that help real programs don’t help benchmark

2-12 ECE 361 Popular Benchmark Suites Desktop SPEC CPU CPU intensive, integer & floating-point applications SPECviewperf, SPECapc - Graphics benchmarks SysMark, Winstone, Winbench Embedded EEMBC - Collection of kernels from 6 application areas Dhrystone - Old synthetic benchmark Servers SPECweb, SPECfs TPC-C - Transaction processing system TPC-H, TPC-R - Decision support system TPC-W - Transactional web benchmark Parallel Computers SPLASH - Scientific applications & kernels Most markets have specific benchmarks for design and marketing.

2-13 ECE 361 SPEC CINT2000

2-14 ECE 361 tpC

2-15 ECE 361 Basis of Evaluation Actual Target Workload Full Application Benchmarks Small “Kernel” Benchmarks Microbenchmarks ProsCons representative very specific non-portable difficult to run, or measure hard to identify cause portable widely used improvements useful in reality easy to run, early in design cycle identify peak capability and potential bottlenecks less representative easy to “fool” “peak” may be a long way from application performance

2-16 ECE 361 Programs to Evaluate Processor Performance (Toy) Benchmarks line e.g.,: sieve, puzzle, quicksort Synthetic Benchmarks attempt to match average frequencies of real workloads e.g., Whetstone, dhrystone Kernels Time critical excerpts

2-17 ECE 361 Announcements Website Next lecture Instruction Set Architecture

2-18 ECE 361 Processor Design Metrics

2-19 ECE 361 Metrics of Performance Compiler Programming Language Application Datapath Control TransistorsWiresPins ISA Function Units (millions) of Instructions per second – MIPS (millions) of (F.P.) operations per second – MFLOP/s Cycles per second (clock rate) Megabytes per second Seconds per program Useful Operations per second

2-20 ECE 361 Organizational Trade-offs Compiler Programming Language Application Datapath Control TransistorsWiresPins ISA Function Units Instruction Mix Cycle Time CPI CPI is a useful design measure relating the Instruction Set Architecture with the Implementation of that architecture, and the program measured

2-21 ECE 361 Processor Cycles Clock Combination Logic Cycle Most contemporary computers have fixed, repeating clock cycles

2-22 ECE 361 CPU Performance

2-23 ECE 361 Cycles Per Instruction (Throughput) “Instruction Frequency” CPI = (CPU Time * Clock Rate) / Instruction Count = Cycles / Instruction Count “Cycles per Instruction”

2-24 ECE 361 Principal Design Metrics: CPI and Cycle Time

2-25 ECE 361 Example How much faster would the machine be if a better data cache reduced the average load time to 2 cycles? Load  20% x 2 cycles =.4 Total CPI 2.2  1.6 Relative performance is 2.2 / 1.6 = 1.38 How does this compare with reducing the branch instruction to 1 cycle? Branch  20% x 1 cycle =.2 Total CPI 2.2  2.0 Relative performance is 2.2 / 2.0 = 1.1 Typical Mix OpFreqCyclesCPI ALU50%1.5 Load20%5 1.0 Store10%3.3 Branch20%

2-26 ECE 361 Summary: Evaluating Instruction Sets and Implementation Design-time metrics: Can it be implemented, in how long, at what cost? Can it be programmed? Ease of compilation? Static Metrics: How many bytes does the program occupy in memory? Dynamic Metrics: How many instructions are executed? How many bytes does the processor fetch to execute the program? How many clocks are required per instruction? How "lean" a clock is practical? Best Metric: Time to execute the program! NOTE: Depends on instructions set, processor organization, and compilation techniques. CPI Inst. CountCycle Time

2-27 ECE 361 Amdahl's “Law”: Make the Common Case Fast Speedup due to enhancement E: ExTime w/o E Performance w/ E Speedup(E) = = ExTime w/ E Performance w/o E Suppose that enhancement E accelerates a fraction F of the task by a factor S and the remainder of the task is unaffected then, ExTime(with E) = ((1-F) + F/S) X ExTime(without E) Speedup(with E) = ExTime(without E) ÷ ((1-F) + F/S) X ExTime(without E) Performance improvement is limited by how much the improved feature is used  Invest resources where time is spent.

2-28 ECE 361 Marketing Metrics MIPS= Instruction Count / Time * 10^6 = Clock Rate / CPI * 10^6 machines with different instruction sets ? programs with different instruction mixes ? dynamic frequency of instructions uncorrelated with performance MFLOP/s= FP Operations / Time * 10^6 machine dependent often not where time is spent

2-29 ECE 361 Summary Time is the measure of computer performance! Good products created when have: Good benchmarks Good ways to summarize performance If not good benchmarks and summary, then choice between improving product for real programs vs. improving product to get more sales  sales almost always wins Remember Amdahl’s Law: Speedup is limited by unimproved part of program CPU time= Seconds= Instructions x Cycles x Seconds Program Program Instruction Cycle CPU time= Seconds= Instructions x Cycles x Seconds Program Program Instruction Cycle

2-30 ECE 361 Critical Path

2-31 ECE 361 Range of Design Styles Gates Routing Channel Gates Routing Channel Gates Standard ALU Standard Registers Gates Custom Control Logic Custom Register File Custom DesignStandard Cell Gate Array/FPGA/CPLD Custom ALU Performance Design Complexity (Design Time) Longer wiresCompact

2-32 ECE 361 “Mealey Machine” “Moore Machine” Implementation as Combinational Logic + Latch Latch Combinational Logic Clock

2-33 ECE 361 Clocking Methodology All storage elements are clocked by the same clock edge (but there may be clock skews) The combination logic block’s: Inputs are updated at each clock tick All outputs MUST be stable before the next clock tick Clock Combination Logic

2-34 ECE 361 Critical Path & Cycle Time Critical path: the slowest path between any two storage devices Cycle time is a function of the critical path Clock

2-35 ECE 361 Tricks to Reduce Cycle Time Reduce the number of gate levels  Pay attention to loading One gate driving many gates is a bad idea Avoid using a small gate to drive a long wire  Use multiple stages to drive large load  Revise design A B C D A B C D INV4x Clarge

2-36 ECE 361 Summary Performance Concepts Response Time Throughput Performance Evaluation Benchmarks Processor Design Metrics Cycle Time Cycles per Instruction Amdahl’s Law Speedup what is important Critical Path