Performance D. A. Patterson and J. L. Hennessy, Computer Organization & Design: The Hardware/Software Interface, Morgan Kaufmann, second edition, 1998.


These slides are based on chapter 2 of the following book: D. A. Patterson and J. L. Hennessy, Computer Organization & Design: The Hardware/Software Interface, Morgan Kaufmann, second edition, 1998. If you need more explanation, you can find it in the book itself. The relevant slide numbers (from the chapter 2 slides) are: 11 – 14, 18 – 22, 28 – 30. The slides contain some examples (without solutions). We will solve some of them in class.

We will focus on user CPU time: the time spent executing the lines of code that are “in” our program (i.e. excluding I/O time, etc.).
Definition of performance: for some program running on machine X, Performance_X = 1 / Execution time_X.
Note that “machine X is n times faster than machine Y” means P_X / P_Y = n, or equivalently Execution time_Y / Execution time_X = n.
Clock cycle: the time between two consecutive (machine) clock ticks. Instead of reporting execution time in seconds, we often use cycles.
Clock rate (frequency) = cycles per second (1 Hz = 1 cycle/sec).
Example: a machine with a 200 MHz clock runs at 200 * 10^6 Hz, so it produces 2 * 10^8 clock cycles per second, and its cycle time is 1 / (2 * 10^8) seconds = 5 nanoseconds (1 nanosecond = 10^-9 seconds).
Note: different (machine) instructions take different numbers of clock cycles, e.g. integer vs. floating-point operations, memory access vs. register access, etc.
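The definitions above can be checked with a few lines of Python. This is only a sketch: the function names `cycle_time_ns` and `relative_performance`, and the 10 and 15 second execution times in the second call, are made up for illustration and do not come from the slides.

```python
def cycle_time_ns(clock_rate_hz: float) -> float:
    """Clock cycle time in nanoseconds for a clock rate given in Hz."""
    return 1e9 / clock_rate_hz


def relative_performance(exec_time_x: float, exec_time_y: float) -> float:
    """How many times faster X is than Y: P_X / P_Y = time_Y / time_X."""
    return exec_time_y / exec_time_x


# 200 MHz clock from the example above
print(cycle_time_ns(200e6))              # 5.0 (nanoseconds)

# Hypothetical execution times: X takes 10 s, Y takes 15 s
print(relative_performance(10.0, 15.0))  # 1.5
```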

Problem: A program runs in 10 seconds on computer A, which has a 400 MHz clock. We build a new machine B, which runs at 600 MHz, but on this machine the program requires 1.2 times as many clock cycles as on machine A. How long would it take machine B to execute the same program?
Solution: clock rate = cycles per second.
400 MHz = 4 * 10^8 Hz => machine A provides 4 * 10^8 cycles per second.
The program runs for 10 seconds on machine A => program execution takes 10 * 4 * 10^8 = 4 * 10^9 cycles => on machine B it would take 1.2 * 4 * 10^9 = 4.8 * 10^9 cycles.
How long would it run on machine B? (4.8 * 10^9 cycles) / (6 * 10^8 Hz) = 8 seconds.
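The same calculation, written as a short Python sketch. The variable names are chosen here for illustration; the numbers are the ones given in the problem.

```python
def execution_time(cycles: float, clock_rate_hz: float) -> float:
    """Execution time in seconds = clock cycles / clock rate."""
    return cycles / clock_rate_hz


time_a = 10.0      # seconds on machine A
rate_a = 400e6     # 400 MHz
rate_b = 600e6     # 600 MHz
cycle_ratio = 1.2  # machine B needs 1.2x the cycles of machine A

cycles_a = time_a * rate_a         # cycles = time * clock rate
cycles_b = cycle_ratio * cycles_a
time_b = execution_time(cycles_b, rate_b)

print(f"{time_b:.2f} seconds")     # 8.00 seconds
```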

Problem: There are two different classes of instructions: A and B.
- machine A has a clock cycle time of 10 ns (nanoseconds), a CPI (cycles per instruction) of 2.0 for class A instructions, and a CPI of 3.0 for class B instructions.
- machine B has a clock cycle time of 20 ns and a CPI of 1.25 for both instruction classes.
A given program is 50% class A instructions and 50% class B instructions. Which machine runs this program faster?
Solution: let C be the number of instructions in the program.
machine A: ns per class A instruction = 2.0 * 10 = 20.
machine A: ns per class B instruction = 3.0 * 10 = 30.
machine B: ns per instruction = 1.25 * 20 = 25.
execution time on machine A: C * (0.5 * 20 + 0.5 * 30) = C * 25 ns.
execution time on machine B: C * 1 * 25 = C * 25 ns.
=> the machines have the same performance for the given program.
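A small Python sketch of the weighted-average calculation, using the CPI values that the solution works with (2.0 and 3.0 for machine A, 1.25 for machine B). The helper name `time_per_instruction_ns` is an assumption made for this example.

```python
def time_per_instruction_ns(cycle_time_ns, cpi_by_class, mix):
    """Average time per instruction: sum over classes of
    (fraction of instructions) * (CPI of class) * (cycle time)."""
    return sum(mix[c] * cpi_by_class[c] * cycle_time_ns for c in mix)


mix = {"A": 0.5, "B": 0.5}  # 50% class A, 50% class B

machine_a = time_per_instruction_ns(10, {"A": 2.0, "B": 3.0}, mix)
machine_b = time_per_instruction_ns(20, {"A": 1.25, "B": 1.25}, mix)

print(machine_a, machine_b)  # 25.0 25.0
```

Multiplying either value by the instruction count C gives the total execution time, so equal averages mean equal performance on this program.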

Problem: There are three different classes of instructions: class A, B and C. They require one, three and five cycles respectively. There are two code sequences:
- the first code sequence contains 1 instruction of class A, 2 of B, and 1 of C.
- the second code sequence contains 6 instructions of class A, 1 of B, and 1 of C.
A) Which sequence will be faster? B) By how much? C) What is the CPI for each sequence?
Solution:
first code: 1*1 + 2*3 + 1*5 = 12 cycles => CPI = 12 / (1+2+1) = 3
second code: 6*1 + 1*3 + 1*5 = 14 cycles => CPI = 14 / (6+1+1) = 1.75
A) The first code is faster, even though its CPI is higher, because it needs fewer cycles in total.
B) By a factor of 14/12 (about 1.17).
C) 3 for the first code, 1.75 for the second code.
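The cycle counts and CPIs can be reproduced with the following Python sketch. The dictionary layout and the function names are illustrative choices, not part of the slides.

```python
CYCLES = {"A": 1, "B": 3, "C": 5}  # cycles per instruction class


def total_cycles(counts):
    """Total cycles = sum of (instruction count * cycles for that class)."""
    return sum(counts[c] * CYCLES[c] for c in counts)


def cpi(counts):
    """Average cycles per instruction for a code sequence."""
    return total_cycles(counts) / sum(counts.values())


first = {"A": 1, "B": 2, "C": 1}
second = {"A": 6, "B": 1, "C": 1}

print(total_cycles(first), cpi(first))    # 12 3.0
print(total_cycles(second), cpi(second))  # 14 1.75
print(f"{total_cycles(second) / total_cycles(first):.2f}")  # 1.17
```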

Amdahl’s Law: e.t. after improvement = e.t. unaffected + (e.t. affected / amount of improvement), where e.t. = execution time.
Problem: A program runs in 100 seconds, with multiply instructions responsible for 80 seconds of this time (i.e. the program spends 80 seconds executing multiply instructions). How much do we have to improve the speed of multiplication if we want the program to run 4 times faster? How about making it 5 times faster?
Solution: e.t. after improvement = 20 seconds + 80 seconds / x
To run 4 times faster: 100 / 4 = 25 = 20 + 80 / x => 80 / x = 5 => x = 16. This means that multiplication must execute 16 times faster!
To run 5 times faster: 100 / 5 = 20 = 20 + 80 / x => 80 / x = 0 => no finite x satisfies this. It would mean that multiplication takes 0 time, which is impossible.
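Solving Amdahl’s Law for the required improvement factor can be sketched in Python as follows. The function name `required_improvement` and the convention of returning `None` for an unreachable target are assumptions made for this example.

```python
def required_improvement(total_time, affected_time, target_speedup):
    """Solve  total/target = unaffected + affected/x  for x.

    Returns None when the target cannot be reached, i.e. when the
    unaffected part alone already takes the whole target time.
    """
    unaffected_time = total_time - affected_time
    target_time = total_time / target_speedup
    remaining = target_time - unaffected_time  # time left for the affected part
    if remaining <= 0:
        return None
    return affected_time / remaining


print(required_improvement(100, 80, 4))  # 16.0
print(required_improvement(100, 80, 5))  # None
```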

Problem: Suppose we want to improve a machine's result on a well-known benchmark. We know that floating-point instructions account for 70% of the benchmark's execution time, and that the benchmark runs for 20 seconds. We enhance the machine so that all floating-point instructions run 7 times faster, but for some reason this causes the rest of the instructions to take twice as long. What will the speedup be?
Floating-point instructions run for 14 seconds, the rest for 6 seconds.
Solution: e.t. after improvement = 6 * 2 seconds + 14 / 7 seconds = 12 + 2 = 14 seconds => speedup = 20 / 14 (about 1.43).
Summary:
- performance is specific to a particular program (or set of programs). Total execution time is a consistent summary of performance.
- for a given architecture, performance increases come from:
  - increases in clock rate (without adverse CPI effects)
  - improvements in processor organization that lower CPI
  - compiler enhancements that lower CPI and/or instruction count
Pitfall: expecting an improvement in one aspect of a machine’s performance to improve total performance by a proportional amount.
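The floating-point benchmark problem and the pitfall can both be illustrated with one Python sketch. It extends the Amdahl’s Law formula with a slowdown factor for the unaffected part; the parameter name `unaffected_slowdown` is an assumption introduced here, not notation from the slides.

```python
def time_after_change(unaffected_time, affected_time, improvement,
                      unaffected_slowdown=1.0):
    """Execution time after speeding up the affected part by `improvement`
    and slowing down the rest by `unaffected_slowdown`."""
    return unaffected_time * unaffected_slowdown + affected_time / improvement


total = 20.0     # seconds for the whole benchmark
fp_time = 14.0   # floating-point part (70%)
other_time = 6.0

# Benchmark problem: FP runs 7x faster, everything else takes twice as long
new_time = time_after_change(other_time, fp_time, 7, unaffected_slowdown=2.0)
print(new_time, f"{total / new_time:.2f}")  # 14.0 1.43

# Pitfall: even with no slowdown, a 7x faster FP unit does not give 7x overall
best_time = time_after_change(other_time, fp_time, 7)
print(best_time, total / best_time)         # 8.0 2.5
```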