CENG 450 Computer Systems & Architecture Lecture 3 Amirali Baniasadi


Performance
- Purchasing perspective: given a collection of machines, which has the
    - best performance?
    - least cost?
    - best performance / cost?
- Design perspective: faced with design options, which has the
    - best performance improvement?
    - least cost?
    - best performance / cost?
- Both require
    - a basis for comparison
    - a metric for evaluation
- Our goal is to understand the cost and performance implications of architectural choices

Two notions of “performance”

Plane              DC to Paris   Speed      Passengers   Throughput (pmph)
Boeing 747         6.5 hours     610 mph    470          286,700
BAC/Sud Concorde   3 hours       1350 mph   132          178,200

(pmph = passenger-miles per hour, i.e. passengers x speed)

Which has higher performance?
- Time to do the task (execution time): execution time, response time, latency
- Tasks per day, hour, week, sec, ns, ...: throughput, bandwidth
Response time and throughput are often in opposition.

Example
- Time of Concorde vs. Boeing 747?
    - Concorde is 1350 mph / 610 mph = 2.2 times faster (= 6.5 hours / 3 hours)
- Throughput of Concorde vs. Boeing 747?
    - Concorde is 178,200 pmph / 286,700 pmph = 0.62 “times faster”
    - Boeing is 286,700 pmph / 178,200 pmph = 1.6 “times faster”
- Boeing is 1.6 times (“60%”) faster in terms of throughput
- Concorde is 2.2 times (“120%”) faster in terms of flying time
We will focus primarily on execution time for a single job.
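The two comparisons above can be reproduced with a few lines of Python. This is only a sketch: the flight times and throughput figures come from the slide, while the function and variable names are made up for illustration.

```python
# Figures taken from the Concorde vs. Boeing 747 slide.
planes = {
    "Boeing 747": {"hours": 6.5, "pmph": 286_700},
    "Concorde": {"hours": 3.0, "pmph": 178_200},
}


def times_faster(time_slow, time_fast):
    """Ratio of execution times: how many times faster the quicker option is."""
    return time_slow / time_fast


def throughput_ratio(rate_a, rate_b):
    """Ratio of throughputs: how much more work per hour A delivers than B."""
    return rate_a / rate_b


concorde_time_ratio = times_faster(
    planes["Boeing 747"]["hours"], planes["Concorde"]["hours"]
)
boeing_throughput_ratio = throughput_ratio(
    planes["Boeing 747"]["pmph"], planes["Concorde"]["pmph"]
)

print(f"Concorde flying-time advantage: {concorde_time_ratio:.2f}x")
print(f"Boeing throughput advantage:    {boeing_throughput_ratio:.2f}x")
```

Which plane "wins" depends entirely on which of the two ratios matters for the task at hand.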

Definitions
- Performance is in units of things-per-second
    - bigger is better
- If we are primarily concerned with response time
    - performance(X) = 1 / execution_time(X)
- "X is n times faster than Y" means
    - n = performance(X) / performance(Y) = execution_time(Y) / execution_time(X)

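Performance measurement
- How about a collection of programs?
- Example: three machines (A, B and C) and two programs (P1 and P2)
[Table: execution times of P1 and P2 on machines A, B and C, with three weightings W(1), W(2), W(3)]
- Arithmetic mean under a weighting: sum over i of Weight_i * Time_i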
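A minimal sketch of the weighted arithmetic mean follows. The execution times and the equal weighting are hypothetical placeholders chosen for illustration; they are not the values from the slide's table.

```python
def weighted_arithmetic_mean(times, weights):
    """Return sum_i(weight_i * time_i); the weights are expected to sum to 1."""
    if len(times) != len(weights):
        raise ValueError("times and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * t for w, t in zip(weights, times))


# Hypothetical execution times in seconds for programs [P1, P2] on each machine.
times_by_machine = {
    "A": [1.0, 1000.0],
    "B": [10.0, 100.0],
    "C": [20.0, 20.0],
}

# Hypothetical weighting: both programs are equally important.
equal_weights = [0.5, 0.5]

summary = {
    machine: weighted_arithmetic_mean(times, equal_weights)
    for machine, times in times_by_machine.items()
}
best_machine = min(summary, key=summary.get)  # lowest weighted time wins

for machine, mean_time in summary.items():
    print(f"{machine}: {mean_time:.1f} s")
print("Best under this weighting:", best_machine)
```

Changing the weights can change the ranking, which is why the weighting has to be reported alongside the summary number.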

Performance measurement
- Other option: geometric means (self study; see the textbook)
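For the self-study topic, here is a small sketch of the geometric mean applied to normalized execution times. The times are hypothetical, and the helper is written out by hand rather than taken from any course material.

```python
import math


def geometric_mean(values):
    """n-th root of the product of n positive values, computed via logarithms."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical execution times in seconds for programs [P1, P2].
times_a = [1.0, 1000.0]
times_b = [10.0, 100.0]

# Normalize machine B's times to machine A's, program by program.
ratios_b_over_a = [b / a for a, b in zip(times_a, times_b)]
gm_of_ratios = geometric_mean(ratios_b_over_a)

# The geometric mean of the ratios equals the ratio of the geometric means,
# so the comparison does not depend on which machine is the reference.
ratio_of_gms = geometric_mean(times_b) / geometric_mean(times_a)

print(f"Geometric mean of B/A ratios: {gm_of_ratios:.3f}")
print(f"Ratio of geometric means:     {ratio_of_gms:.3f}")
```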

Metrics of performance
[Figure: metrics attached to the levels of the system stack]
- Levels, top to bottom: Application; Programming Language; Compiler; ISA; Datapath, Control; Function Units; Transistors, Wires, Pins
- Metrics shown:
    - answers per month
    - operations per second
    - (millions of) instructions per second: MIPS
    - (millions of) floating-point (F.P.) operations per second: MFLOP/s
    - megabytes per second
    - cycles per second (clock rate)

Relating Processor Metrics
- CPU execution time = CPU clock cycles x clock cycle time
- or CPU execution time = CPU clock cycles / clock rate
- CPU clock cycles = Instructions x avg. clock cycles per instruction
- or CPI = CPU clock cycles / Instructions
- CPI tells us something about the instruction set architecture, the implementation of that architecture, and the program measured
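The relations above translate directly into code. The sketch below uses hypothetical numbers (2 billion instructions, a CPI of 1.5, a 1 GHz clock), and the function names are illustrative only.

```python
def cpu_clock_cycles(instruction_count, cpi):
    """CPU clock cycles = instructions x average clock cycles per instruction."""
    return instruction_count * cpi


def cpu_time_seconds(instruction_count, cpi, clock_rate_hz):
    """CPU execution time = CPU clock cycles / clock rate."""
    return cpu_clock_cycles(instruction_count, cpi) / clock_rate_hz


def average_cpi(clock_cycles, instruction_count):
    """CPI = CPU clock cycles / instructions."""
    return clock_cycles / instruction_count


# Hypothetical program and machine.
instructions = 2_000_000_000
cpi = 1.5
clock_rate_hz = 1_000_000_000  # 1 GHz, i.e. a 1 ns clock cycle time

cycles = cpu_clock_cycles(instructions, cpi)
seconds = cpu_time_seconds(instructions, cpi, clock_rate_hz)

print(f"Clock cycles: {cycles:.0f}")
print(f"CPU time:     {seconds:.1f} s")
print(f"CPI check:    {average_cpi(cycles, instructions):.1f}")
```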

Aspects of CPU Performance

CPU time = Seconds / Program = (Instructions / Program) x (Cycles / Instruction) x (Seconds / Cycle)

                     instr. count   CPI   clock rate
Program
Compiler
Instr. Set Arch.
Organization
Technology

Aspects of CPU Performance

CPU time = Seconds / Program = (Instructions / Program) x (Cycles / Instruction) x (Seconds / Cycle)

                     instr. count   CPI   clock rate
Program                   X
Compiler                  X         (X)
Instr. Set Arch.          X          X
Organization                         X        X
Technology                                    X

Organizational Trade-offs
[Figure: the system stack (Application; Programming Language; Compiler; ISA; Datapath, Control; Function Units; Transistors, Wires, Pins) annotated with Instruction Mix, CPI and Cycle Time]

CPI = (CPU time * clock rate) / instruction count = clock cycles / instruction count   (“average cycles per instruction”)

CPU time = clock cycle time * SUM(i = 1..n) of CPI_i * I_i

CPI = SUM(i = 1..n) of CPI_i * F_i, where F_i = I_i / instruction count   ("instruction frequency")

Invest resources where time is spent!

Example (RISC processor)

Base Machine (Reg / Reg), Typical Mix
Op       Freq   Cycles   CPI(i)   % Time
ALU      50%    1        0.5      23%
Load     20%    5        1.0      45%
Store    10%    3        0.3      14%
Branch   20%    2        0.4      18%
Total                    2.2

- How much faster would the machine be if a better data cache reduced the average load time to 2 cycles?
- How does this compare with using branch prediction to shave a cycle off the branch time?
- What if two ALU instructions could be executed at once?
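One way to work through the three questions is to recompute the average CPI for each change. The sketch below uses the instruction mix from the slide (ALU 50% at 1 cycle, Load 20% at 5 cycles, Store 10% at 3 cycles, Branch 20% at 2 cycles, total CPI 2.2). It assumes that instruction count and clock cycle time stay the same, and it models "two ALU instructions at once" as halving the effective ALU cycles, which is an idealization.

```python
# Instruction mix: op -> (frequency, cycles per instruction).
BASE_MIX = {
    "ALU": (0.50, 1.0),
    "Load": (0.20, 5.0),
    "Store": (0.10, 3.0),
    "Branch": (0.20, 2.0),
}


def average_cpi(mix):
    """CPI = sum over instruction classes of frequency_i * cycles_i."""
    return sum(freq * cycles for freq, cycles in mix.values())


def speedup_with(mix, op, new_cycles):
    """Speedup when one class changes its cycle count (same count and clock)."""
    changed = dict(mix)
    freq, _ = changed[op]
    changed[op] = (freq, new_cycles)
    return average_cpi(mix) / average_cpi(changed)


base_cpi = average_cpi(BASE_MIX)
better_cache = speedup_with(BASE_MIX, "Load", 2.0)        # loads: 5 -> 2 cycles
branch_prediction = speedup_with(BASE_MIX, "Branch", 1.0)  # branches: 2 -> 1 cycle
dual_alu = speedup_with(BASE_MIX, "ALU", 0.5)              # idealized ALU pairing

print(f"Base CPI:          {base_cpi:.2f}")
print(f"Better data cache: {better_cache:.3f}x")
print(f"Branch prediction: {branch_prediction:.3f}x")
print(f"Dual ALU issue:    {dual_alu:.3f}x")
```

Under these assumptions the data cache helps most, because loads account for the largest share of execution time.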

Example (RISC processor)

Base Machine (Reg / Reg), Typical Mix
Op       Freq   Cycles   CPI(i)   % Time
ALU      50%    1        0.5      23%
Load     20%    5        1.0      45%
Store    10%    3        0.3      14%
Branch   20%    2        0.4      18%
Total                    2.2

How much faster would the machine be if:
A) Loads took “0” cycles?
B) Stores took “0” cycles?
C) ALU ops took “0” cycles?
D) Branches took “0” cycles?

MAKE THE COMMON CASE FAST
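The "0 cycles" questions give the upper bound on what any improvement to a single instruction class can buy. A short sketch, using the per-class CPI contributions from the slide (0.5, 1.0, 0.3, 0.4) and assuming an unchanged instruction count and clock:

```python
# Per-class contribution to the average CPI of 2.2 (frequency x cycles).
CPI_CONTRIBUTION = {"ALU": 0.5, "Load": 1.0, "Store": 0.3, "Branch": 0.4}


def speedup_if_free(contributions, op):
    """Best-case speedup if instructions of class `op` took zero cycles."""
    total = sum(contributions.values())
    return total / (total - contributions[op])


speedups = {op: speedup_if_free(CPI_CONTRIBUTION, op) for op in CPI_CONTRIBUTION}
ranked = sorted(speedups, key=speedups.get, reverse=True)

for op in ranked:
    print(f"{op:<6} free -> at most {speedups[op]:.2f}x faster")
```

The ranking follows the "% Time" column: the class that consumes the most time offers the largest possible gain.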

Amdahl's Law

Speedup due to enhancement E:

Speedup(E) = ExTime(without E) / ExTime(with E) = Performance(with E) / Performance(without E)

Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected. Then:

ExTime(with E) = ((1 - F) + F/S) x ExTime(without E)
Speedup(with E) = ExTime(without E) / (((1 - F) + F/S) x ExTime(without E))
Speedup(with E) = 1 / ((1 - F) + F/S)

Amdahl's Law: example
A new CPU makes the computation in Web serving 10 times faster. The old CPU spent 40% of the time on computation and 60% on waiting for I/O. What is the overall enhancement?

Fraction enhanced = 0.4
Speedup enhanced = 10
Speedup overall = 1 / (0.6 + 0.4/10) = 1 / 0.64 = 1.56
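Amdahl's Law, Speedup = 1 / ((1 - F) + F/S), is short enough to check numerically. The sketch below evaluates the Web-serving example (F = 0.4, S = 10) and the limit as S grows without bound; the function name is illustrative.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup = 1 / ((1 - F) + F / S)."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Web-serving example: 40% computation made 10x faster, 60% I/O wait unchanged.
web_serving = amdahl_speedup(0.4, 10.0)

# Even an infinitely fast CPU cannot beat 1 / (1 - F).
upper_bound = 1.0 / (1.0 - 0.4)

print(f"Overall speedup:        {web_serving:.4f}")
print(f"Limit for S -> infinity: {upper_bound:.4f}")
```

The gap between 1.56 and the bound of about 1.67 shows how little is left to gain once the unimproved I/O wait dominates.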

Example from Quiz
a) A program consists of 80% initialization code and 20% main iteration loop code; the loop is run 1000 times. The total run time of the program is 100 seconds. Calculate the fraction of the total run time needed for the initialization and for the iteration. Which part would you optimize?
b) The program should have a total run time of 60 seconds. How can this be achieved? (15 points)
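One possible way to approach the quiz numerically is sketched below. It rests on an assumption the quiz does not state explicitly: run time is proportional to the amount of code executed, so the initialization counts once and the loop body counts 1000 times. It also assumes part b) is met by speeding up only the loop.

```python
INIT_CODE_SHARE = 0.8    # share of the program text that is initialization
LOOP_CODE_SHARE = 0.2    # share of the program text that is the loop body
LOOP_ITERATIONS = 1000
TOTAL_SECONDS = 100.0
TARGET_SECONDS = 60.0

# Assumption: time is proportional to code executed (code share x executions).
init_work = INIT_CODE_SHARE * 1
loop_work = LOOP_CODE_SHARE * LOOP_ITERATIONS
total_work = init_work + loop_work

init_fraction = init_work / total_work
loop_fraction = loop_work / total_work

init_seconds = TOTAL_SECONDS * init_fraction
loop_seconds = TOTAL_SECONDS * loop_fraction

# Part b), assuming only the loop is optimized and initialization is untouched.
required_loop_speedup = loop_seconds / (TARGET_SECONDS - init_seconds)

print(f"Initialization: {init_fraction:.2%} of run time ({init_seconds:.2f} s)")
print(f"Iteration loop: {loop_fraction:.2%} of run time ({loop_seconds:.2f} s)")
print(f"Loop speedup needed for 60 s total: {required_loop_speedup:.3f}x")
```

Under this reading the loop dominates the run time even though it is the smaller part of the code, so it is the part worth optimizing.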

Marketing Metrics

MIPS = Instruction Count / (Time x 10^6) = Clock Rate / (CPI x 10^6)
- machines with different instruction sets?
- programs with different instruction mixes?
- dynamic frequency of instructions
- uncorrelated with performance

GFLOPS = FP Operations / (Time x 10^9)
- PlayStation: 6.4 GFLOPS
- machine dependent
- often not where time is spent
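To see why MIPS can mislead, consider two hypothetical compiled versions of the same program on the same 500 MHz machine. All numbers below are invented for illustration; only the formulas come from the slides.

```python
def execution_time_seconds(instruction_count, cpi, clock_rate_hz):
    """CPU time = instruction count x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


def mips(clock_rate_hz, cpi):
    """MIPS = clock rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


CLOCK_RATE_HZ = 500e6  # hypothetical 500 MHz machine

# Hypothetical variants: B executes more, but simpler, instructions than A.
variant_a = {"instructions": 10e6, "cpi": 2.0}
variant_b = {"instructions": 15e6, "cpi": 1.5}

mips_a = mips(CLOCK_RATE_HZ, variant_a["cpi"])
mips_b = mips(CLOCK_RATE_HZ, variant_b["cpi"])
time_a = execution_time_seconds(variant_a["instructions"], variant_a["cpi"], CLOCK_RATE_HZ)
time_b = execution_time_seconds(variant_b["instructions"], variant_b["cpi"], CLOCK_RATE_HZ)

print(f"Variant A: {mips_a:.0f} MIPS, {time_a * 1000:.0f} ms")
print(f"Variant B: {mips_b:.0f} MIPS, {time_b * 1000:.0f} ms")
```

Variant B reports the higher MIPS rating yet takes longer to finish, which is the sense in which MIPS can be uncorrelated with performance.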

Why Do Benchmarks?
- How we evaluate differences
    - Different systems
    - Changes to a single system
- Provide a target
    - Benchmarks should represent a large class of important programs
    - Improving benchmark performance should help many programs
- For better or worse, benchmarks shape a field
- Good ones accelerate progress
    - good target for development
- Bad benchmarks hurt progress
    - help real programs vs. sell machines/papers?
    - inventions that help real programs don’t help the benchmark

Basis of Evaluation

Actual Target Workload
    Pros: representative
    Cons: very specific; non-portable; difficult to run or measure; hard to identify cause
Full Application Benchmarks
    Pros: portable; widely used; improvements useful in reality
    Cons: less representative
Small “Kernel” Benchmarks
    Pros: easy to run, early in design cycle
    Cons: easy to “fool”
Microbenchmarks
    Pros: identify peak capability and potential bottlenecks
    Cons: “peak” may be a long way from application performance

Successful Benchmark: SPEC
- 1987: RISC industry mired in “bench marketing” (“That is an 8 MIPS machine, but they claim 10 MIPS!”)
- EE Times + 5 companies band together to form the Systems Performance Evaluation Committee (SPEC) in 1988: Sun, MIPS, HP, Apollo, DEC
- Create standard list of programs, inputs, reporting: some real programs, includes OS calls, some I/O

SPEC first round
- First round 1989: 10 programs, single number to summarize performance
- One program: 99% of time in a single line of code
- New front-end compiler could improve its result dramatically

SPEC Evolution
- Second round: SpecInt92 (6 integer programs) and SpecFP92 (14 floating-point programs)
    - Add SPECbase: one flag setting for integer programs and one for FP
- Third round: 1995, new set of programs
    - “benchmarks useful for 3 years”
- Now: SPEC 2000

SPEC95
- Eighteen application benchmarks (with inputs) reflecting a technical computing workload
- Eight integer
    - go, m88ksim, gcc, compress, li, ijpeg, perl, vortex
- Ten floating-point intensive
    - tomcatv, swim, su2cor, hydro2d, mgrid, applu, turb3d, apsi, fpppp, wave5
- Must run with standard compiler flags
    - eliminate special undocumented incantations that may not even generate working code for real programs

Summary
- Time is the measure of computer performance!
- Good products are created when we have:
    - good benchmarks
    - good ways to summarize performance
- Without good benchmarks and a good summary, the choice is between improving the product for real programs and improving it to get more sales => sales almost always wins
- Remember Amdahl’s Law: speedup is limited by the unimproved part of the program

CPU time = Seconds / Program = (Instructions / Program) x (Cycles / Instruction) x (Seconds / Cycle)

Readings & More…
Reminder: read
- Textbook: Chapter 1, pages 1 to 47
- Moore paper (posted on the course web site)