1 Lecture 2: Performance, MIPS ISA Today’s topics:  Performance equations  MIPS instructions Reminder: canvas and class webpage:

Slides:



Advertisements
Similar presentations
Performance What differences do we see in performance? Almost all computers operate correctly (within reason) Most computers implement useful operations.
Advertisements

Computer Abstractions and Technology
TU/e Processor Design 5Z032 1 Processor Design 5Z032 The role of Performance Henk Corporaal Eindhoven University of Technology 2009.
Evaluating Performance
100 Performance ENGR 3410 – Computer Architecture Mark L. Chang Fall 2006.
Computer Organization and Architecture 18 th March, 2008.
Chapter 1 CSF 2009 Computer Performance. Defining Performance Which airplane has the best performance? Chapter 1 — Computer Abstractions and Technology.
CSCE 212 Chapter 4: Assessing and Understanding Performance Instructor: Jason D. Bakos.
Performance D. A. Patterson and J. L. Hennessey, Computer Organization & Design: The Hardware Software Interface, Morgan Kauffman, second edition 1998.
1 Lecture 2: System Metrics and Pipelining Today’s topics: (Sections 1.6, 1.7, 1.9, A.1)  Quantitative principles of computer design  Measuring cost.
1 Lecture 2: System Metrics and Pipelining Today’s topics: (Sections 1.6, 1.7, 1.9, A.1)  Performance summaries  Quantitative principles of computer.
Computer ArchitectureFall 2007 © September 17, 2007 Karem Sakallah CS-447– Computer Architecture.
CS/ECE 3330 Computer Architecture Chapter 1 Performance / Power.
1 Lecture 2: MIPS Instruction Set Today’s topic:  MIPS instructions Reminder: sign up for the mailing list cs3810 Reminder: set up your CADE accounts.
1 Lecture 11: Digital Design Today’s topics:  Evaluating a system  Intro to boolean functions.
1 Introduction Background: CS 3810 or equivalent, based on Hennessy and Patterson’s Computer Organization and Design Text for CS/EE 6810: Hennessy and.
EET 4250: Chapter 1 Performance Measurement, Instruction Count & CPI Acknowledgements: Some slides and lecture notes for this course adapted from Prof.
Chapter 4 Assessing and Understanding Performance
Fall 2001CS 4471 Chapter 2: Performance CS 447 Jason Bakos.
Lecture 3: Computer Performance
1 Introduction Background: CS 3810 or equivalent, based on Hennessy and Patterson’s Computer Organization and Design Text for CS/EE 6810: Hennessy and.
1 Lecture 10: FP, Performance Metrics Today’s topics:  IEEE 754 representations  FP arithmetic  Evaluating a system Reminder: assignment 4 due in a.
1 Chapter 4. 2 Measure, Report, and Summarize Make intelligent choices See through the marketing hype Key to understanding underlying organizational motivation.
1 Computer Performance: Metrics, Measurement, & Evaluation.
C OMPUTER O RGANIZATION AND D ESIGN The Hardware/Software Interface 5 th Edition Chapter 1 Computer Abstractions and Technology Sections 1.5 – 1.11.
CDA 3101 Fall 2013 Introduction to Computer Organization Computer Performance 28 August 2013.
10/19/2015Erkay Savas1 Performance Computer Architecture – CS401 Erkay Savas Sabanci University.
1 CS/EE 362 Hardware Fundamentals Lecture 9 (Chapter 2: Hennessy and Patterson) Winter Quarter 1998 Chris Myers.
Performance.
Lecture 4: MIPS Instruction Set Reminders: –Homework #1 posted: due next Wed. –Midterm #1 scheduled Friday September 26 th, 2014 Location: TODD 430 –Midterm.
1 CS/COE0447 Computer Organization & Assembly Language CHAPTER 4 Assessing and Understanding Performance.
Computer Architecture
Performance Lecture notes from MKP, H. H. Lee and S. Yalamanchili.
CEN 316 Computer Organization and Design Assessing and Understanding Performance Mansour AL Zuair.
Morgan Kaufmann Publishers
1  1998 Morgan Kaufmann Publishers How to measure, report, and summarize performance (suorituskyky, tehokkuus)? What factors determine the performance.
Performance Performance
TEST 1 – Tuesday March 3 Lectures 1 - 8, Ch 1,2 HW Due Feb 24 –1.4.1 p.60 –1.4.4 p.60 –1.4.6 p.60 –1.5.2 p –1.5.4 p.61 –1.5.5 p.61.
September 10 Performance Read 3.1 through 3.4 for Wednesday Only 3 classes before 1 st Exam!
4. Performance 4.1 Introduction 4.2 CPU Performance and Its Factors
1 Lecture: Metrics to Evaluate Performance Topics: Benchmark suites, Performance equation, Summarizing performance with AM, GM, HM  Video 1: Using AM.
Lec2.1 Computer Architecture Chapter 2 The Role of Performance.
EGRE 426 Computer Organization and Design Chapter 4.
Chapter 1 — Computer Abstractions and Technology — 1 Uniprocessor Performance Constrained by power, instruction-level parallelism, memory latency.
Performance Computer Organization II 1 Computer Science Dept Va Tech January 2009 © McQuain & Ribbens Defining Performance Which airplane has.
1 Lecture 3: Pipelining Basics Today: chapter 1 wrap-up, basic pipelining implementation (Sections C.1 - C.4) Reminders:  Sign up for the class mailing.
Chapter 1 Performance & Technology Trends. Outline What is computer architecture? Performance What is performance: latency (response time), throughput.
CSE 340 Computer Architecture Summer 2016 Understanding Performance.
Measuring Performance II and Logic Design
Lecture 2: Performance Today’s topics:
Lecture 2: Performance Evaluation
CS161 – Design and Architecture of Computer Systems
Lecture 3: MIPS Instruction Set
Performance Lecture notes from MKP, H. H. Lee and S. Yalamanchili.
September 2 Performance Read 3.1 through 3.4 for Tuesday
Defining Performance Which airplane has the best performance?
Morgan Kaufmann Publishers
COSC 3406: Computer Organization
CSCE 212 Chapter 4: Assessing and Understanding Performance
Lecture 2: Performance Today’s topics: Technology wrap-up
Lecture 4: MIPS Instruction Set
Performance of computer systems
Lecture 3: MIPS Instruction Set
Performance of computer systems
Performance Lecture notes from MKP, H. H. Lee and S. Yalamanchili.
Parameters that affect it How to improve it and by how much
Performance.
Chapter 2: Performance CS 447 Jason Bakos Fall 2001 CS 447.
CS161 – Design and Architecture of Computer Systems
Computer Organization and Design Chapter 4
Presentation transcript:

1 Lecture 2: Performance, MIPS ISA Today’s topics:  Performance equations  MIPS instructions Reminder: canvas and class webpage: Reminder: sign up for the mailing list csece3810 See info on TA office hours on class webpage From your classmate, Jeremy:

2 Performance Metrics Possible measures:  response time – time elapsed between start and end of a program  throughput – amount of work done in a fixed time The two measures are usually linked  A faster processor will improve both  More processors will likely only improve throughput  Some policies will improve throughput and worsen response time What influences performance?

3 Execution Time Consider a system X executing a fixed workload W Performance X = 1 / Execution time X Execution time = response time = wall clock time - Note that this includes time to execute the workload as well as time spent by the operating system co-ordinating various events The UNIX “time” command breaks up the wall clock time as user and system time

4 Speedup and Improvement System X executes a program in 10 seconds, system Y executes the same program in 15 seconds System X is 1.5 times faster than system Y The speedup of system X over system Y is 1.5 (the ratio) The performance improvement of X over Y is = 0.5 = 50% The execution time reduction for the program, compared to Y is (15-10) / 15 = 33% The execution time increase, compared to X is (15-10) / 10 = 50%

5 A Primer on Clocks and Cycles

6 Performance Equation - I CPU execution time = CPU clock cycles x Clock cycle time Clock cycle time = 1 / Clock speed If a processor has a frequency of 3 GHz, the clock ticks 3 billion times in a second – as we’ll soon see, with each clock tick, one or more/less instructions may complete If a program runs for 10 seconds on a 3 GHz processor, how many clock cycles did it run for? If a program runs for 2 billion clock cycles on a 1.5 GHz processor, what is the execution time in seconds?

7 Performance Equation - II CPU clock cycles = number of instrs x avg clock cycles per instruction (CPI) Substituting in previous equation, Execution time = clock cycle time x number of instrs x avg CPI If a 2 GHz processor graduates an instruction every third cycle, how many instructions are there in a program that runs for 10 seconds?

8 Factors Influencing Performance Execution time = clock cycle time x number of instrs x avg CPI Clock cycle time: manufacturing process (how fast is each transistor), how much work gets done in each pipeline stage (more on this later) Number of instrs: the quality of the compiler and the instruction set architecture CPI: the nature of each instruction and the quality of the architecture implementation

9 Example Execution time = clock cycle time x number of instrs x avg CPI Which of the following two systems is better? A program is converted into 4 billion MIPS instructions by a compiler ; the MIPS processor is implemented such that each instruction completes in an average of 1.5 cycles and the clock speed is 1 GHz The same program is converted into 2 billion x86 instructions; the x86 processor is implemented such that each instruction completes in an average of 6 cycles and the clock speed is 1.5 GHz

10 Benchmark Suites Each vendor announces a SPEC rating for their system  a measure of execution time for a fixed collection of programs  is a function of a specific CPU, memory system, IO system, operating system, compiler  enables easy comparison of different systems The key is coming up with a collection of relevant programs

11 SPEC CPU SPEC: System Performance Evaluation Corporation, an industry consortium that creates a collection of relevant programs The 2006 version includes 12 integer and 17 floating-point applications The SPEC rating specifies how much faster a system is, compared to a baseline machine – a system with SPEC rating 600 is 1.5 times faster than a system with SPEC rating 400 Note that this rating incorporates the behavior of all 29 programs – this may not necessarily predict performance for your favorite program!

12 Deriving a Single Performance Number How is the performance of 29 different apps compressed into a single performance number? SPEC uses geometric mean (GM) – the execution time of each program is multiplied and the N th root is derived Another popular metric is arithmetic mean (AM) – the average of each program’s execution time Weighted arithmetic mean – the execution times of some programs are weighted to balance priorities

13 Amdahl’s Law Architecture design is very bottleneck-driven – make the common case fast, do not waste resources on a component that has little impact on overall performance/power Amdahl’s Law: performance improvements through an enhancement is limited by the fraction of time the enhancement comes into play Example: a web server spends 40% of time in the CPU and 60% of time doing I/O – a new processor that is ten times faster results in a 36% reduction in execution time (speedup of 1.56) – Amdahl’s Law states that maximum execution time reduction is 40% (max speedup of 1.66)

14 Common Principles Amdahl’s Law Energy: systems leak energy even when idle Energy: performance improvements typically also result in energy improvements rule: 10% of the program accounts for 90% of execution time Principle of locality: the same data/code will be used again (temporal locality), nearby data/code will be touched next (spatial locality)

15 Recap Knowledge of hardware improves software quality: compilers, OS, threaded programs, memory management Important trends: growing transistors, move to multi-core, slowing rate of performance improvement, power/thermal constraints, long memory/disk latencies Reasoning about performance: clock speeds, CPI, benchmark suites, performance equations Next: assembly instructions

16 Instruction Set Understanding the language of the hardware is key to understanding the hardware/software interface A program (in say, C) is compiled into an executable that is composed of machine instructions – this executable must also run on future machines – for example, each Intel processor reads in the same x86 instructions, but each processor handles instructions differently Java programs are converted into portable bytecode that is converted into machine instructions during execution (just-in-time compilation) What are important design principles when defining the instruction set architecture (ISA)?

17 Instruction Set Important design principles when defining the instruction set architecture (ISA):  keep the hardware simple – the chip must only implement basic primitives and run fast  keep the instructions regular – simplifies the decoding/scheduling of instructions

18 A Basic MIPS Instruction C code: a = b + c ; Assembly code: (human-friendly machine instructions) add a, b, c # a is the sum of b and c Machine code: (hardware-friendly machine instructions) Translate the following C code into assembly code: a = b + c + d + e;

19 Example C code a = b + c + d + e; translates into the following assembly code: add a, b, c add a, b, c add a, a, d or add f, d, e add a, a, e add a, a, f Instructions are simple: fixed number of operands (unlike C) A single line of C code is converted into multiple lines of assembly code Some sequences are better than others… the second sequence needs one more (temporary) variable f

20 Subtract Example C code f = (g + h) – (i + j); Assembly code translation with only add and sub instructions:

21 Subtract Example C code f = (g + h) – (i + j); translates into the following assembly code: add t0, g, h add f, g, h add t1, i, j or sub f, f, i sub f, t0, t1 sub f, f, j Each version may produce a different result because floating-point operations are not necessarily associative and commutative… more on this later

22 Title Bullet