ISA Issues; Performance Considerations

Testing / System Verilog: ECE385

Key ISA decisions
– instruction length
– how many registers?
– where do operands reside? e.g., can you add the contents of memory to a register?
– instruction format: which bits designate what?
– operands: how many? how big? how are memory addresses computed?

Instruction Length
Variable:
– x86: instructions vary from 1 to 17 bytes long
– VAX: from 1 to 54 bytes
Fixed:
– MIPS, PowerPC, and most other RISCs: all instructions are 4 bytes long

Instruction Length
Variable-length instructions (x86, VAX):
– require multi-step fetch and decode
+ allow a more flexible and compact instruction set
Fixed-length instructions (RISCs):
+ allow easy fetch and decode
+ simplify pipelining and parallelism
– instruction bits are scarce

How many registers?
All computers have a small set of registers: fast memory to hold values that will be used soon. A typical instruction uses 2 or 3 register values.
Advantages of a small number of registers:
– fewer bits needed to specify which one
– less hardware
– faster access (shorter wires, fewer gates)
– faster context switch (when all registers need saving)
Advantages of a larger number:
– fewer loads and stores needed
– easier to do several operations at once
(In 411, "load" means moving data from memory to a register; "store" is the reverse.)

Where do operands reside (when the ALU needs them)?
Stack machine:
– "Push" loads memory into the 1st register ("top of stack") and moves the other registers down; "Pop" does the reverse.
– "Add" combines the contents of the first two registers and moves the rest up.
Accumulator machine:
– only 1 register (called the "accumulator")
– instructions include "store" and "acc ← acc + mem"
Register-Memory machine:
– arithmetic instructions can use data in registers and/or memory
Load-Store machine (aka Register-Register machine):
– arithmetic instructions can only use data in registers

Comparing the ISA classes
Code sequence for C = A + B:

Stack      Accumulator   Register-Memory   Load-Store
Push A     Load A        Add C, A, B       Load R1,A
Push B     Add B                           Load R2,B
Add        Store C                         Add R3,R1,R2
Pop C                                      Store C,R3

Comparing the ISA classes (exercise)
Write the code for A = X*Y + X*Z in each class: Stack, Accumulator, Reg-Mem, Load-Store.
[Slide figure: memory locations A, X, Y, B, C, temp, shown next to the stack, the accumulator, and registers R1-R3.]

Load-store architectures
can do:
  add r1 = r2 + r3
  load r3, M(address)
  store r1, M(address)
can't do:
  add r1 = r2 + M(address)
This forces heavy dependence on registers, which is exactly what you want in today's CPUs.
– more instructions
+ fast implementation (e.g., easy pipelining)
+ easier to keep instruction lengths fixed

Key ISA decisions instruction length  are all instructions the same length? how many registers? where do operands reside?  e.g., can you add contents of memory to a register? instruction format  which bits designate what?? operands  how many? how big?  how are memory addresses computed? operations  what operations are provided??

Instruction formats – what does each bit mean?
The machine needs to determine quickly:
– "this is a 6-byte instruction"
– "bits 7-11 specify a register"
– ...
– serial decoding is bad
Having many different instruction formats...
– complicates decoding
– uses instruction bits (to specify the format)
What would be a good thing about having many different instruction formats?

MIPS Instruction Formats
r format: OP (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | sa (5 bits) | funct (6 bits)
i format: OP (6 bits) | rs (5 bits) | rt (5 bits) | immediate (16 bits)
j format: OP (6 bits) | target (26 bits)
For instance, "add r1, r2, r3" has
– OP=0, rs=2, rt=3, rd=1, sa=0, funct=32
– the opcode (OP) tells the machine which format to use
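As a quick check of the r-format fields above, here is a minimal C sketch (the helper name encode_r and the surrounding program are mine, for illustration only) that packs "add r1, r2, r3" into its 32-bit instruction word:

#include <stdio.h>
#include <inttypes.h>

/* Pack the six r-format fields into a single 32-bit MIPS instruction word. */
static uint32_t encode_r(uint32_t op, uint32_t rs, uint32_t rt,
                         uint32_t rd, uint32_t sa, uint32_t funct)
{
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (sa << 6) | funct;
}

int main(void)
{
    /* add r1, r2, r3: OP=0, rs=2, rt=3, rd=1, sa=0, funct=32 */
    printf("0x%08" PRIx32 "\n", encode_r(0, 2, 3, 1, 0, 32));  /* prints 0x00430820 */
    return 0;
}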

MIPS ISA Tradeoffs
What if we had...
– 64 registers?
– 20-bit immediates?
– 4-operand instructions (e.g., Y = AX + B)?
Look at the r/i/j formats above (6-bit OP, 5-bit register fields, 6-bit funct) and think about how sparsely the bits are used.
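A rough bit count, worked out here rather than on the slide: with 64 registers each register specifier needs 6 bits, so a 4-operand r-type instruction would need 6 (OP) + 4×6 (registers) + 6 (funct) = 36 bits even without the sa field, and an i-type with a 20-bit immediate would need 6 + 6 + 6 + 20 = 38 bits. Neither fits in a fixed 32-bit word.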

Conditional branch
How do you specify the destination of a branch/jump?
– studies show that almost all conditional branches go short distances from the current program counter (loops, if-then-else)
– so we can specify a relative address in many fewer bits than an absolute address
– e.g., beq $1, $2, 100 => if ($1 == $2) PC = PC + 4 + 100*4
How do we specify the condition of the branch?
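To make the relative-address computation concrete, here is a minimal C sketch (the function name and the example PC value are assumptions, not from the slide) of how a MIPS branch target is formed from the 16-bit offset field:

#include <stdio.h>
#include <inttypes.h>

/* Branch target = address of the next instruction (PC + 4)
   plus the sign-extended offset scaled to bytes (offset * 4). */
static uint32_t branch_target(uint32_t pc, int16_t offset)
{
    return pc + 4 + (uint32_t)((int32_t)offset * 4);
}

int main(void)
{
    /* beq $1, $2, 100 sitting at (assumed) address 0x00001000 */
    printf("0x%08" PRIx32 "\n", branch_target(0x00001000, 100));  /* prints 0x00001194 */
    return 0;
}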

MIPS conditional branches: beq, bne
beq r1, r2, addr => if (r1 == r2) goto addr
slt $1, $2, $3 => if ($2 < $3) $1 = 1; else $1 = 0
These, combined with $0, can implement all fundamental branch conditions:
always, never, !=, ==, <, <=, >, >=, and the unsigned comparisons, ...

Example: if (i<j) w = w+1; else w = 5;

      slt  $temp, $i, $j     # $temp = 1 if i < j
      beq  $temp, $0, L1     # if not (i < j), go to the else part
      addi $w, $w, 1         # w = w + 1
      beq  $0, $0, L2        # unconditional branch past the else part
L1:   addi $w, $0, 5         # w = 5
L2:

Jumps
– need to be able to jump to an absolute address sometimes
– need to be able to do procedure calls and returns
jump: j target => PC = target
jump and link: jal target => $31 = PC + 4; PC = target
– used for procedure calls
jump register: jr $31 => PC = $31
– used for returns, but can be useful for lots of other things
(j format: OP | target)

Computer Performance

Computer Performance: TIME, TIME, TIME
Response Time (latency)
— How long does it take for my job to run?
— How long does it take to execute a job?
— How long must I wait for the database query?
Throughput
— How many jobs can the machine run at once?
— What is the average execution rate?
— How much work is getting done?
If we upgrade a machine with a new processor, what do we increase? If we add a new machine to the lab, what do we increase?

Aspects of CPU Performance

CPU time = Seconds / Program
         = (Instructions / Program) x (Cycles / Instruction) x (Seconds / Cycle)

               Inst Count   CPI   Clock Rate
Program            X
Compiler           X         (X)
Inst. Set          X          X
Organization                  X        X
Technology                             X
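As a small sanity check of the formula, here is a minimal C sketch (the numbers are made up for illustration, not from the slide) that simply multiplies the three factors together:

#include <stdio.h>

int main(void)
{
    /* Assumed example values. */
    double instructions = 1e9;     /* instructions per program  */
    double cpi          = 2.0;     /* cycles per instruction    */
    double cycle_time   = 10e-9;   /* seconds per cycle (10 ns) */

    double cpu_time = instructions * cpi * cycle_time;
    printf("CPU time = %.2f seconds\n", cpu_time);   /* prints 20.00 seconds */
    return 0;
}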

Book's Definition of Performance
For some program running on machine X,
Performance_X = 1 / Execution time_X
"X is n times faster than Y" means Performance_X / Performance_Y = n
Problem:
– machine A runs a program in 20 seconds
– machine B runs the same program in 25 seconds
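One way to work the problem using the definition above: Performance_A / Performance_B = Execution time_B / Execution time_A = 25 / 20 = 1.25, so machine A is 1.25 times faster than machine B.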

How to Improve Performance
So, to improve performance (everything else being equal) you can either
– ________ the # of required cycles for a program, or
– ________ the clock cycle time, or, said another way,
– ________ the clock rate.

Now that we understand cycles
A given program will require
– some number of instructions (machine instructions)
– some number of cycles
– some number of seconds
We have a vocabulary that relates these quantities:
– cycle time (seconds per cycle)
– clock rate (cycles per second)
– CPI (cycles per instruction): a floating-point-intensive application might have a higher CPI
– MIPS (millions of instructions per second): this would be higher for a program using simple instructions
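A quick worked illustration of how these quantities relate (numbers assumed, not from the slide): a 2 GHz clock means a cycle time of 0.5 ns; a program that executes 10^9 instructions with a CPI of 2 takes 2×10^9 cycles, or 1 second, which corresponds to 1000 MIPS.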

CPI Example
Suppose we have two implementations of the same instruction set architecture (ISA). For some program,
– Machine A has a clock cycle time of 10 ns and a CPI of 2.0
– Machine B has a clock cycle time of 20 ns and a CPI of 1.2
Which machine is faster for this program, and by how much?
If two machines have the same ISA, which of our quantities (e.g., clock rate, CPI, execution time, # of instructions, MIPS) will always be identical?
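One way to work the first question: for a program with I instructions, CPU time on A is I × 2.0 × 10 ns = 20·I ns, and on B it is I × 1.2 × 20 ns = 24·I ns, so machine A is 24/20 = 1.2 times faster for this program. For the second question: since the two machines implement the same ISA and run the same compiled program, the instruction count is the quantity that will always be identical.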

# of Instructions Example
A compiler designer is trying to decide between two code sequences for a particular machine. Based on the hardware implementation, there are three different classes of instructions: Class A, Class B, and Class C, which require one, two, and three cycles respectively.
– The first code sequence has 5 instructions: 2 of A, 1 of B, and 2 of C.
– The second sequence has 6 instructions: 4 of A, 1 of B, and 1 of C.
Which sequence will be faster? By how much? What is the CPI for each sequence?
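One way to work it: sequence 1 takes 2×1 + 1×2 + 2×3 = 10 cycles and sequence 2 takes 4×1 + 1×2 + 1×3 = 9 cycles, so the second sequence is faster by 1 cycle even though it has more instructions. The CPIs are 10/5 = 2.0 and 9/6 = 1.5 respectively.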

MIPS Example
Two different compilers are being tested for a 100 MHz machine with three different classes of instructions: Class A, Class B, and Class C, which require one, two, and three cycles respectively. Both compilers are used to produce code for a large piece of software.
– The first compiler's code uses 5 million Class A instructions, 1 million Class B instructions, and 1 million Class C instructions.
– The second compiler's code uses 10 million Class A instructions, 1 million Class B instructions, and 1 million Class C instructions.
Which code will be faster according to MIPS? Which will be faster according to execution time?
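One way to work it: compiler 1 produces 7 million instructions and 5×1 + 1×2 + 1×3 = 10 million cycles, which is 0.10 s at 100 MHz, giving 7M / 0.10 s = 70 MIPS; compiler 2 produces 12 million instructions and 10×1 + 1×2 + 1×3 = 15 million cycles, which is 0.15 s, giving 80 MIPS. Compiler 2 looks faster according to MIPS, but compiler 1 is actually faster according to execution time.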

Which is faster?

for (i = 0; i < N; i = i+1)
  for (j = 0; j < N; j = j+1) {
    r = 0;
    for (k = 0; k < N; k = k+1)
      r = r + y[i][k] * z[k][j];
    x[i][j] = r;
  }

versus the blocked version:

for (jj = 0; jj < N; jj = jj+B)
  for (kk = 0; kk < N; kk = kk+B)
    for (i = 0; i < N; i = i+1) {
      for (j = jj; j < min(jj+B-1, N); j = j+1) {
        r = 0;
        for (k = kk; k < min(kk+B-1, N); k = k+1)
          r = r + y[i][k] * z[k][j];
        x[i][j] = x[i][j] + r;
      }
    }

The second (blocked) version is significantly faster, because it has much better cache behavior.

Which is faster?

Sequence 1:
  load R1, addr1
  store R1, addr2
  add R0, R2 -> R3
  subtract R4, R3 -> R5
  add R0, R6 -> R7
  store R7, addr3

Sequence 2 (same operations, reordered):
  load R1, addr1
  add R0, R2 -> R3
  add R0, R6 -> R7
  store R1, addr2
  subtract R4, R3 -> R5
  store R7, addr3

Twice as fast on some machines, and the same on others.

Which is faster?

Version 1:
loop1: add ...
       load ...
       add ...
       bne R1, loop1
loop2: add ...
       load ...
       bne R2, loop2

Version 2 (with a nop between the loops):
loop1: add ...
       load ...
       add ...
       bne R1, loop1
       nop
loop2: add ...
       load ...
       bne R2, loop2

Identical performance on several machines.

Various microarchitectural techniques can improve the performance of a program. You already know how to build a processor that works; over the course of the semester we will learn ways to improve its performance…