Computer Organization

Slides:



Advertisements
Similar presentations
ISA Issues; Performance Considerations. Testing / System Verilog: ECE385.
Advertisements

CS1104: Computer Organisation School of Computing National University of Singapore.
ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )
810:142 Lecture 2: Performance Fall 2006 Chapter 4: Performance Adapted from Mary Jane Irwin at Penn State University for Computer Organization and Design,
ECE-C355 Computer Structures Winter 2008 Chapter 04: Understanding Performance Slides are adapted from Professor Mary Jane Irwin (
TU/e Processor Design 5Z032 1 Processor Design 5Z032 The role of Performance Henk Corporaal Eindhoven University of Technology 2009.
1  1998 Morgan Kaufmann Publishers Chapter 2 Performance Text in blue is by N. Guydosh Updated 1/25/04*
100 Performance ENGR 3410 – Computer Architecture Mark L. Chang Fall 2006.
1 Introduction Rapidly changing field: –vacuum tube -> transistor -> IC -> VLSI (see section 1.4) –doubling every 1.5 years: memory capacity processor.
Chapter 4 Assessing and Understanding Performance Bo Cheng.
1 CSE SUNY New Paltz Chapter 2 Performance and Its Measurement.
Computer ArchitectureFall 2007 © September 17, 2007 Karem Sakallah CS-447– Computer Architecture.
1 Chapter 4. 2 Measure, Report, and Summarize Make intelligent choices See through the marketing hype Key to understanding underlying organizational motivation.
1  1998 Morgan Kaufmann Publishers and UCB Performance CEG3420 Computer Design Lecture 3.
Computer ArchitectureFall 2007 © September 19, 2007 Karem Sakallah CS-447– Computer Architecture.
Chapter 4 Assessing and Understanding Performance
1 Chapter 4. 2 Measure, Report, and Summarize Make intelligent choices See through the marketing hype Key to understanding underlying organizational motivation.
1 ECE3055 Computer Architecture and Operating Systems Lecture 2 Performance Prof. Hsien-Hsin Sean Lee School of Electrical and Computer Engineering Georgia.
Computer Organization and Design Performance Montek Singh Mon, April 4, 2011 Lecture 13.
1 Computer Performance: Metrics, Measurement, & Evaluation.
CSE 340 Computer Architecture Summer 2014 Understanding Performance
Lecture 2: Computer Performance
1 CHAPTER 2 THE ROLE OF PERFORMANCE. 2 Performance Measure, Report, and Summarize Make intelligent choices Why is some hardware better than others for.
순천향대학교 정보기술공학부 이 상 정 1 4. Accessing and Understanding Performance.
B0111 Performance Anxiety ENGR xD52 Eric VanWyk Fall 2012.
1 CPS4150 Chapter 4 Assessing and Understanding Performance.
10/19/2015Erkay Savas1 Performance Computer Architecture – CS401 Erkay Savas Sabanci University.
1 CS/EE 362 Hardware Fundamentals Lecture 9 (Chapter 2: Hennessy and Patterson) Winter Quarter 1998 Chris Myers.
1 Acknowledgements Class notes based upon Patterson & Hennessy: Book & Lecture Notes Patterson’s 1997 course notes (U.C. Berkeley CS 152, 1997) Tom Fountain.
Performance.
1 CS465 Performance Revisited (Chapter 1) Be able to compare performance of simple system configurations and understand the performance implications of.
CEN 316 Computer Organization and Design Assessing and Understanding Performance Mansour AL Zuair.
Morgan Kaufmann Publishers
1 COMS 361 Computer Organization Title: Performance Date: 10/02/2004 Lecture Number: 3.
1  1998 Morgan Kaufmann Publishers How to measure, report, and summarize performance (suorituskyky, tehokkuus)? What factors determine the performance.
Performance Performance
CS35101 – Computer Architecture Week 9: Understanding Performance Paul Durand ( ) [Adapted from M Irwin (
TEST 1 – Tuesday March 3 Lectures 1 - 8, Ch 1,2 HW Due Feb 24 –1.4.1 p.60 –1.4.4 p.60 –1.4.6 p.60 –1.5.2 p –1.5.4 p.61 –1.5.5 p.61.
September 10 Performance Read 3.1 through 3.4 for Wednesday Only 3 classes before 1 st Exam!
Performance – Last Lecture Bottom line performance measure is time Performance A = 1/Execution Time A Comparing Performance N = Performance A / Performance.
4. Performance 4.1 Introduction 4.2 CPU Performance and Its Factors
1  1998 Morgan Kaufmann Publishers Lectures for 2nd Edition Note: these lectures are often supplemented with other materials and also problems from the.
Chapter 4. Measure, Report, and Summarize Make intelligent choices See through the marketing hype Understanding underlying organizational aspects Why.
Lec2.1 Computer Architecture Chapter 2 The Role of Performance.
L12 – Performance 1 Comp 411 Computer Performance He said, to speed things up we need to squeeze the clock Study
EGRE 426 Computer Organization and Design Chapter 4.
Performance Computer Organization II 1 Computer Science Dept Va Tech January 2009 © McQuain & Ribbens Defining Performance Which airplane has.
COMPUTER ARCHITECTURE & OPERATIONS I Instructor: Yaohang Li.
Computer Architecture CSE 3322 Web Site crystal.uta.edu/~jpatters/cse3322 Send to Pramod Kumar, with the names and s.
CSE 340 Computer Architecture Summer 2016 Understanding Performance.
COD Ch. 1 Introduction + The Role of Performance.
BITS Pilani, Pilani Campus Today’s Agenda Role of Performance.
CPEN Digital System Design Assessing and Understanding CPU Performance © Logic and Computer Design Fundamentals, 4 rd Ed., Mano Prentice Hall © Computer.
Computer Architecture & Operations I
Performance Lecture notes from MKP, H. H. Lee and S. Yalamanchili.
September 2 Performance Read 3.1 through 3.4 for Tuesday
How do we evaluate computer architectures?
Defining Performance Which airplane has the best performance?
Computer Architecture & Operations I
Prof. Hsien-Hsin Sean Lee
CSCE 212 Chapter 4: Assessing and Understanding Performance
CS2100 Computer Organisation
Computer Performance He said, to speed things up we need to squeeze the clock.
Performances of Computer Systems
Computer Performance Read Chapter 4
Performance.
Chapter 2: Performance CS 447 Jason Bakos Fall 2001 CS 447.
Computer Organization and Design Chapter 4
CS2100 Computer Organisation
Presentation transcript:

Computer Organization Chapter 1b With thanks to M.J. Irwin, D. Patterson, and J. Hennessy for some lecture slide contents

Performance Measure, Report, and Summarize Make intelligent choices See through the marketing hype Key to understanding underlying organizational motivation Why is some hardware better than others for different programs? What factors of system performance are hardware related? (e.g., Do we need a new machine, or a new operating system?) How does the machine's instruction set affect performance?

Which of these airplanes has the best performance? Airplane Passengers Range (mi) Speed (mph) Boeing 777 375 4630 610 Boeing 747 470 4150 610 BAC/Sud Concorde 132 4000 1350 Douglas DC-8-50 146 8720 544 How much faster is the Concorde compared to the 747? How much bigger is the 747 than the Douglas DC-8? Which airplane moves the most passengers in the least time?

Performance Metrics Purchasing perspective Design perspective given a collection of machines, which has the best performance ? least cost ? best cost/performance? Design perspective faced with design options, which has the best performance improvement ? Both require basis for comparison metric for evaluation Our goal is to understand what factors in the architecture contribute to overall system performance and the relative importance (and cost) of these factors Or smallest/lightest Longest battery life Most reliable/durable (in space)

Computer Performance Measures Response Time (latency) — How long does it take for my job to run? — How long does it take to execute a job? — How long must I wait for the database query? Throughput — How many jobs can the machine run at once? — What is the average execution rate? — How much work is getting done? If we upgrade a machine with a new processor what do we increase? If we add a new machine to the lab what do we increase?

Execution Time Elapsed Time CPU time Our focus: user CPU time counts everything (CPU, disk and memory accesses, I/O , etc.) a useful number, but often not good for comparison purposes CPU time doesn't count I/O or time spent running other programs can be broken up into system time, and user time Our focus: user CPU time time spent executing the lines of code that are "in" our program

Book's Definition of Performance For some program running on machine X, PerformanceX = 1 / Execution timeX "X is n times faster than Y" PerformanceX / PerformanceY = n Problem: machine A runs a program in 20 seconds machine B runs the same program in 25 seconds How much faster is A than B? (25/20)

CPU execution time (CPU time) – time the CPU spends working on a task Performance Factors Want to distinguish between total elapsed time and the time spent on our task CPU execution time (CPU time) – time the CPU spends working on a task Does not include time waiting for I/O or running other programs CPU execution time # CPU clock cycles for a program for a program = x clock cycle time or Many techniques that decrease the number of clock cycles also increase the clock cycle time CPU execution time # CPU clock cycles for a program for a program clock rate = ------------------------------------------- Can improve performance by reducing either the length of the clock cycle or the number of clock cycles required for a program

Review: Machine Clock Rate Clock rate (MHz, GHz), CR, is inverse of clock cycle (CC) time (clock period) CC = 1 / CR one clock period 10 nsec clock cycle => 100 MHz clock rate 5 nsec clock cycle => 200 MHz clock rate 2 nsec clock cycle => 500 MHz clock rate 1 nsec clock cycle => 1 GHz clock rate 500 psec clock cycle => 2 GHz clock rate 250 psec clock cycle => 4 GHz clock rate 200 psec clock cycle => 5 GHz clock rate A clock cycle is the basic unit of time to execute one operation/pipeline stage/etc.

Clock Cycles Instead of reporting execution time in seconds, we often use cycles Clock “ticks” indicate when to start activities (one abstraction): clock cycle time = time between ticks = seconds per cycle clock rate (frequency) = cycles per second (1 Hz. = 1 cycle/sec) A 4 Ghz. clock has a cycle time time

How to Improve Performance So, to improve performance (everything else being equal) you can either (increase or decrease?) ________ the # of required cycles for a program, or ________ the clock cycle time or, said another way, ________ the clock rate.

Big View of Performance CPU Computer Control Datapath Memory Devices Input Output elapsed (response) time

System Performance CPU Throughput Computer Control Datapath Memory Devices Input Output Throughput

Metrics of Performance Compiler Programming Language Application Datapath Control Transistors Wires Pins ISA Function Units (millions) of Instructions per second – MIPS (millions) of (F.P.) operations per second – MFLOP/s Cycles per second (clock rate) Megabytes per second Transaction per day Operations per second

Processor Metrics CPU clock cycles/pgm = instructions/program * avg. clock cycles per instr. (CPI) Execution time = instruction/program * CPI * clock cycle time CPI tells us something about the Instruction Set Architecture, the Implementation of that architecture, and the program measured.

How many cycles are required for a program? Could assume that number of cycles equals number of instructions 1st instruction 2nd instruction 3rd instruction 4th 5th 6th ... time This assumption is incorrect, different instructions take different amounts of time on different machines. Why? [hint: remember that these are machine instructions, not lines of C code]

Different numbers of cycles for different instructions time Multiplication takes more time than addition Floating point operations take longer than integer ones Accessing memory takes more time than accessing registers Important point: changing the cycle time often changes the number of cycles required for various instructions (more later)

Now that we understand cycles A given program will require some number of instructions (machine instructions) some number of cycles some number of seconds We have a vocabulary that relates these quantities: cycle time (seconds per cycle) clock rate (cycles per second) CPI (cycles per instruction) a floating point intensive application might have a higher CPI MIPS (millions of instructions per second) this would be higher for a program using simple instructions

Performance Performance is determined by execution time! Time = Seconds = Instructions x Cycles x Seconds Program Program Instruction Cycle Do any of the other variables equal performance? # of cycles to execute program? # of instructions in program? # of cycles per second? average # of cycles per instruction? average # of instructions per second? Common pitfall: thinking one of the variables is indicative of performance when it really isn’t.

CPI Example Suppose we have two implementations of the same instruction set architecture (ISA). For some program, Machine A has a clock cycle time of 250 ps and a CPI of 2.0 Machine B has a clock cycle time of 500 ps and a CPI of 1.2 What machine is faster for this program, and by how much? If two machines have the same ISA which of our quantities (e.g., clock rate, CPI, execution time, # of instructions, MIPS) will always be identical?

CPI Example (cont.) Each computer executes the same number of instructions. Let I denote the instruction count. CPU clock cycles for machine A = I * 2 CPU clock cycles for machine B = I * 1.2 CPU time for A = CPU clock cycles * CC time = 500 * I ps CPU time for B = CPU clock cycles * CC time = 600 * I ps Computer A is faster CPU performance of A / CPU performance of B = 1.2

The Performance Equation Our basic performance equation is then CPU time = IC x CPI x CC - OR - IC x CPI CR CPU time = -------------------- These equations separate the three key factors that affect performance Can measure the CPU execution time by running the program The clock rate is usually given Can measure overall instruction count by using profilers/ simulators without knowing all of the implementation details CPI varies by instruction type and ISA implementation for which we must know the implementation details Note that instruction count is dynamic – i.e., its not the number of lines in the code, but THE NUMBER OF INSTRUCTIONS EXECUTED

Clock Rate Example Our favorite program runs in 10 seconds on computer A, which has a 4 GHz. clock. We are trying to help a computer designer build a new machine B, that will run this program in 6 seconds. The designer can use new (or perhaps more expensive) technology to substantially increase the clock rate, but has informed us that this increase will affect the rest of the CPU design, causing machine B to require 1.2 times as many clock cycles as machine A for the same program. What clock rate should we tell the designer to target?" CPIB = 1.2 * CPIA [ 10sec = (CPU clock cycles of A for program) / 4GHz ] [ 6sec= (CPU clock cycles of B for program) / CRB] CRB = 8GHz

Clock Rate != Performance Mobile Intel Pentium 4 vs. Intel Pentium M 2.4 GHz 1.6 GHz Performance on Mobilemark with same memory and disk – Word, excel, photoshop, powerpoint, etc. – Mobile Pentium 4 is only 1.15 times faster (15%)

CPI Varies Different instruction types require different numbers of cycles CPI is often reported for types of instructions where CPIi is the CPI for the i type of instruction and ICi is the count of that type of instruction

Computing CPI To compute the overall average CPI, use The CPIi is obtained from the processor’s data sheet, and the instruction frequencies obtained by code profiling Code profiling: running a software measurement tool in background, which counts instructions by type, as a program executes

Computing CPI Example Given this machine, the CPI is the sum of (CPIi × Frequencyi) Average CPI is 0.5 + 0.4 + 0.4 + 0.2 = 1.5 What fraction of the time is spent for data transfers? When considering where to try and improve performance, invest resources where the time is spent, i.e. make the common case fast!

Another CPI example: RISC processor Op Freq Cycles CPI(i) %Time ALU 50% 1 0.5 23% Load 20% 5 1.0 45% Store 10% 3 0.3 14% Branch 20% 2 0.4 18% ▲ 2.2 Typical Mix How much faster would the machine be if a better data cache reduced the average load time to 2 cycles? How does this compare with using branch prediction to shave a cycle off the branch time? What if two ALU instructions could be executed at once?

# of Instructions Example A compiler designer is trying to decide between two code sequences for a particular machine. Based on the hardware implementation, there are three different classes of instructions: Class A, Class B, and Class C, and they require one, two, and three cycles (respectively). The first code sequence has 5 instructions: 2 of A, 1 of B, and 2 of C The second sequence has 6 instructions: 4 of A, 1 of B, and 1 of C. Which sequence will be faster? How much? What is the CPI for each sequence?

# of Instructions Example A given application written in Java runs 15 seconds on a desktop processor. A new Java compiler is released that requires only 0.6 as many instructions as the old compiler. Unfortunately, it increases the CPI by 1.1. How fast can we expect the application to run using this new compiler?

Design Trade-off Fast Clock, Small CPI and Fewer Instructions RISC: Reduced Instruction Set Computer each instruction does one simple operation easy to implement, less number of cycles per inst. more units in one chip and faster clock take more instructions per program CISC: Complex Instruction Set Computer powerful instructions which do lots of work and take many cycles long cycle time, less number of instructions

MIPS example Two different compilers are being tested for a 4 GHz. machine with three different classes of instructions: Class A, Class B, and Class C, which require one, two, and three cycles (respectively). Both compilers are used to produce code for a large piece of software. The first compiler's code uses 5 billion Class A instructions, 1 billion Class B instructions, and 1 billion Class C instructions. The second compiler's code uses 10 billion Class A instructions, 1 billion Class B instructions, and 1 billion Class C instructions. Which sequence will be faster according to MIPS? Which sequence will be faster according to execution time?

Pitfall MIPS as Performance Metric Instruction rate versus capabilities of the instructions Different computers have different instruction sets with differing capabilities MIPS varies between differing programs on the same machine.

Aspects of CPU Performance CPU time = Seconds = Instructions x Cycles x Seconds Program Program Instruction Cycle instr. count CPI clock rate Program Compiler Instr. Set Arch. Organization Technology X X X X X X X X

Benchmarks Performance best determined by running a real application Use programs typical of expected workload Or, typical of expected class of applications e.g., compilers/editors, scientific applications, graphics, etc. Small benchmarks nice for architects and designers easy to standardize can be abused SPEC (System Performance Evaluation Cooperative) companies have agreed on a set of real program and inputs valuable indicator of performance (and compiler technology) can still be abused