How do we evaluate computer architectures?

Slides:



Advertisements
Similar presentations
CS1104: Computer Organisation School of Computing National University of Singapore.
Advertisements

TU/e Processor Design 5Z032 1 Processor Design 5Z032 The role of Performance Henk Corporaal Eindhoven University of Technology 2009.
100 Performance ENGR 3410 – Computer Architecture Mark L. Chang Fall 2006.
1. 2 Performance  Today we’ll discuss issues related to performance: —Latency/Response Time/Execution Time vs. Throughput —How do you make a reasonable.
Computer Organization and Architecture 18 th March, 2008.
CIS629 Fall Lecture Performance Overview Execution time is the best measure of performance: simple, intuitive, straightforward. Two important.
1 CSE SUNY New Paltz Chapter 2 Performance and Its Measurement.
Performance D. A. Patterson and J. L. Hennessey, Computer Organization & Design: The Hardware Software Interface, Morgan Kauffman, second edition 1998.
Chapter 4 Assessing and Understanding Performance
Fall 2001CS 4471 Chapter 2: Performance CS 447 Jason Bakos.
CS430 – Computer Architecture Lecture - Introduction to Performance
CIS429/529 Winter 07 - Performance - 1 Performance Overview Execution time is the best measure of performance: simple, intuitive, straightforward. Two.
1 Chapter 4. 2 Measure, Report, and Summarize Make intelligent choices See through the marketing hype Key to understanding underlying organizational motivation.
1 ECE3055 Computer Architecture and Operating Systems Lecture 2 Performance Prof. Hsien-Hsin Sean Lee School of Electrical and Computer Engineering Georgia.
CMSC 611: Advanced Computer Architecture Performance Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.
1 Instant replay  The semester was split into roughly four parts. —The 1st quarter covered instruction set architectures—the connection between software.
CMSC 611: Advanced Computer Architecture Benchmarking Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.
Computer Organization and Design Performance Montek Singh Mon, April 4, 2011 Lecture 13.
1 Computer Performance: Metrics, Measurement, & Evaluation.
Lecture 2: Computer Performance
B0111 Performance Anxiety ENGR xD52 Eric VanWyk Fall 2012.
10/19/2015Erkay Savas1 Performance Computer Architecture – CS401 Erkay Savas Sabanci University.
1 CS/EE 362 Hardware Fundamentals Lecture 9 (Chapter 2: Hennessy and Patterson) Winter Quarter 1998 Chris Myers.
1 Acknowledgements Class notes based upon Patterson & Hennessy: Book & Lecture Notes Patterson’s 1997 course notes (U.C. Berkeley CS 152, 1997) Tom Fountain.
Performance.
1 CS/COE0447 Computer Organization & Assembly Language CHAPTER 4 Assessing and Understanding Performance.
Performance Lecture notes from MKP, H. H. Lee and S. Yalamanchili.
CEN 316 Computer Organization and Design Assessing and Understanding Performance Mansour AL Zuair.
EEL5708/Bölöni Lec 1.1 August 21, 2006 Lotzi Bölöni Fall 2006 EEL 5708 High Performance Computer Architecture Lecture 1 Introduction.
1  1998 Morgan Kaufmann Publishers How to measure, report, and summarize performance (suorituskyky, tehokkuus)? What factors determine the performance.
Performance Performance
TEST 1 – Tuesday March 3 Lectures 1 - 8, Ch 1,2 HW Due Feb 24 –1.4.1 p.60 –1.4.4 p.60 –1.4.6 p.60 –1.5.2 p –1.5.4 p.61 –1.5.5 p.61.
September 10 Performance Read 3.1 through 3.4 for Wednesday Only 3 classes before 1 st Exam!
Performance – Last Lecture Bottom line performance measure is time Performance A = 1/Execution Time A Comparing Performance N = Performance A / Performance.
Chapter 4. Measure, Report, and Summarize Make intelligent choices See through the marketing hype Understanding underlying organizational aspects Why.
Lec2.1 Computer Architecture Chapter 2 The Role of Performance.
L12 – Performance 1 Comp 411 Computer Performance He said, to speed things up we need to squeeze the clock Study
EGRE 426 Computer Organization and Design Chapter 4.
CMSC 611: Advanced Computer Architecture Performance & Benchmarks Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some.
Performance Computer Organization II 1 Computer Science Dept Va Tech January 2009 © McQuain & Ribbens Defining Performance Which airplane has.
Jan. 5, 2000Systems Architecture II1 Machine Organization (CS 570) Lecture 2: Performance Evaluation and Benchmarking * Jeremy R. Johnson Wed. Oct. 4,
COMPUTER ARCHITECTURE & OPERATIONS I Instructor: Yaohang Li.
Computer Architecture CSE 3322 Web Site crystal.uta.edu/~jpatters/cse3322 Send to Pramod Kumar, with the names and s.
June 20, 2001Systems Architecture II1 Systems Architecture II (CS ) Lecture 1: Performance Evaluation and Benchmarking * Jeremy R. Johnson Wed.
BITS Pilani, Pilani Campus Today’s Agenda Role of Performance.
Performance. Moore's Law Moore's Law Related Curves.
Two notions of performance
Computer Organization
Computer Architecture & Operations I
Performance Lecture notes from MKP, H. H. Lee and S. Yalamanchili.
September 2 Performance Read 3.1 through 3.4 for Tuesday
ECE 4100/6100 Advanced Computer Architecture Lecture 1 Performance
Defining Performance Which airplane has the best performance?
Computer Architecture & Operations I
Prof. Hsien-Hsin Sean Lee
CSCE 212 Chapter 4: Assessing and Understanding Performance
Computer Architecture CSCE 350
CS2100 Computer Organisation
Chapter 1 Computer Abstractions & Technology Performance Evaluation
Computer Performance He said, to speed things up we need to squeeze the clock.
CMSC 611: Advanced Computer Architecture
CMSC 611: Advanced Computer Architecture
Performance Lecture notes from MKP, H. H. Lee and S. Yalamanchili.
Parameters that affect it How to improve it and by how much
Computer Performance Read Chapter 4
Performance.
Chapter 2: Performance CS 447 Jason Bakos Fall 2001 CS 447.
Computer Organization and Design Chapter 4
CS2100 Computer Organisation
Presentation transcript:

How do we evaluate computer architectures? Think of 5 characteristics that differentiate computers? price ? size ? memory or disk space ? power consumption (temporature) ? performans virtualize easly I/O specification ISA reliability 1

Single-Cycle Performance Today, we’ll explore factors that contribute to a processor’s execution time, and specifically at the performance of the single-cycle machine. We’ll see a MIPS single-cycle datapath and control unit next week. Next time, we’ll explore how to improve on the single cycle machine’s performance using pipelining. June 2, 2018 ©2006 Craig Zilles (derived from slides by Howard Huang) 2

Two notions of performance Which has higher performance… from a passenger’s viewpoint? from an airline’s viewpoint? Aircraft DC to Paris Passengers 747 6 hours 500 Concorde 3 hours 125

Two notions of performance Latency vs. throughput Passenger’s viewpoint: hours per flight time to do the task (latency, execution time, response time) From an airline’s viewpoint: passengers per hour tasks per unit time (throughput, bandwidth) Latency and throughput are often in opposition Aircraft DC to Paris Passengers 747 6 hours 500 Concorde 3 hours 125

Three Components of CPU Performance CPU timeX,P = Instructions executedP * CPIX,P * Clock cycle timeX Cycles Per Instruction June 2, 2018 Performance 5

Some Definitions Latency is time per task (e.g. hours per flight) If we are primarily concerned with latency, Performance(x) = 1 execution_time(x) Bigger is better Throughput is number of tasks per unit time (e.g. passengers per hour) Performance(x) = throughput(x) Again, bigger is better Relative performance: “x is N times faster than y” N = Performance(x) Performance(y)

CPU timeX,P = Instructions executedP * CPIX,P * Clock cycle timeX CPU performance The obvious metric: how long does it take to run a test program? Aircraft analogy: how long does it take to transport 1000 passengers? Our vocabulary Aircraft analogy N instructions N passengers c cycles per instruction (1/c) passengers per flight t seconds per cycle t hours per flight Time = N  c  t seconds Time = N  c  t hours CPU timeX,P = Instructions executedP * CPIX,P * Clock cycle timeX Cycles Per Instruction

Instructions Executed We are not interested in the static instruction count, or how many lines of code are in a program. Instead we care about the dynamic instruction count, or how many instructions are actually executed when the program runs. There are three lines of code below, but the number of instructions executed would be 2001. li $a0, 1000 Ostrich: sub $a0, $a0, 1 bne $a0, $0, Ostrich

CPI The average number of clock cycles per instruction, or CPI, is a function of the machine and program. The CPI depends on the actual instructions appearing in the program—a floating-point intensive application might have a higher CPI than an integer-based program. It also depends on the CPU implementation. For example, a Pentium can execute the same instructions as an older 80486, but faster. In early courses, we assumed each instruction took one cycle, so we had CPI = 1. The CPI can be >1 due to memory stalls and slow instructions. The CPI can be <1 on machines that execute more than 1 instruction per cycle (superscalar).

Clock cycle time One “cycle” is the minimum time it takes the CPU to do any work. The clock cycle time or clock period is just the length of a cycle. The clock rate, or frequency, is the reciprocal of the cycle time. Generally, a higher frequency is better. Some examples illustrate some typical frequencies. A 500MHz processor has a cycle time of 2ns. A 2GHz (2000MHz) CPU has a cycle time of just 0.5ns (500ps). June 2, 2018 Performance 10

CPU timeX,P = Instructions executedP * CPIX,P * Clock cycle timeX Execution time, again CPU timeX,P = Instructions executedP * CPIX,P * Clock cycle timeX The easiest way to remember this is match up the units: Make things faster by making any component smaller!! Often easy to reduce one component by increasing another Seconds = Instructions * Clock cycles Program Clock cycle Program Compiler ISA Hardware Technology Instruction Executed CPI Clock Cycle TIme June 2, 2018 Performance 11

Example 1: ISA-compatible processors Let’s compare the performances two x86-based processors. An 800MHz AMD Duron, with a CPI of 1.2 for an MP3 compressor. A 1GHz Pentium III with a CPI of 1.5 for the same program. Compatible processors implement identical instruction sets and will use the same executable files, with the same number of instructions. But they implement the ISA differently, which leads to different CPIs. CPU timeAMD,P = InstructionsP * CPIAMD,P * Cycle timeAMD = N * 1.2 * 1/800M = N * 1.5 * 10 -9 CPU timeP3,P = InstructionsP * CPIP3,P * Cycle timeP3 = N * 1.5 * 1/1000M

Example 2: Comparing across ISAs Intel’s Itanium (IA-64) ISA is designed facilitate executing multiple instructions per cycle. If an Itanium processor achieves an average CPI of .3 (3 instructions per cycle), how much faster is it than a Pentium4 (which uses the x86 ISA) with an average CPI of 1? Itanium is three times faster Itanium is one third as fast Not enough information

Make the common case fast Improving CPI Some processor design techniques improve CPI Often they only improve CPI for certain types of instructions where Fi = fraction of instructions of type i First Law of Performance: Make the common case fast

Example: CPI improvements Base Machine: CPI = 1.5+1.2+0.6+0.2 = 3.5 How much faster would the machine be if: we added a cache to reduce average load time to 3 cycles? we added a branch predictor to reduce branch time by 1 cycle? we could do two ALU operations in parallel? Op Type Freq (Fi) CPIi contribution to CPI ALU 50% 3 0.5*3 = 1.5 Load 20% 6 1.2 Store 0.6 Branch 10% 2 0.2

Make the fast case common Amdahl’s Law Amdahl’s Law states that optimizations are limited in their effectiveness. For example, doubling the speed of floating-point operations sounds like a great idea. But if only 10% of the program execution time T involves floating-point code, then the overall performance improves by just 5%. Execution time after improvement = Time affected by improvement + Time unaffected by improvement Amount of improvement Execution time after improvement = 0.10 T + 0.90 T 0.95 T 2 Second Law of Performance: Make the fast case common

Summary Performance is one of the most important criteria in judging systems. Our main performance equation explains how performance depends on several factors related to both hardware and software. CPU timeX,P = Instructions executedP * CPIX,P * Clock cycle timeX It can be hard to measure these factors in real life, but this is a useful guide for comparing systems and designs. Amdahl’s Law also tells us how much improvement we can expect from specific enhancements.