204521 Digital System Design 1 Lecture 3 Instruction Set Architecture Pradondet Nilagupta Fall 2000 (original notes from Prof. Mike Schulte)
204521 Digital System Design 2 Overview ISA (I) Concentrate on the ISA Introduce a wide variety of design alternatives for instruction set architecture –Focus on four topics Classify instruction set alternatives –Give a qualitative assessment of the advantages and disadvantages of the various approaches Present and analyze instruction set measurements that are largely independent of a specific instruction set
204521 Digital System Design 3 Overview ISA (II) Address the issue of languages and compilers and their bearing on the ISA Show how these ideas are reflected in the DLX instruction set, which is typical of recent instruction set architectures Examine a wide variety of architectural measurements –Measurements depend on the programs measured and on the compiler used in making these measurements
204521 Digital System Design 4 Hot Topics in Computer Architecture 1950s and 1960s: –Computer Arithmetic 1970s and 1980s: –Instruction Set Design –ISA Appropriate for Compilers 1990s: –Design of CPU –Design of memory system –Design of I/O system –Multiprocessors –Instruction Set Extensions
204521 Digital System Design 5 Instruction Set Architecture Instruction set architecture is the structure of a computer that a machine language programmer must understand to write a correct (timing independent) program for that machine. The instruction set architecture is also the machine description that a hardware designer must understand to design a correct implementation of the computer.
204521 Digital System Design 6 Instruction Set Architecture The instruction set architecture serves as the interface between software and hardware [Figure: the instruction set drawn as the layer between software (above) and hardware (below)]
204521 Digital System Design 7 Interface Design A good interface: –Lasts through many implementations (portability, compatibility) –Is used in many different ways (generality) –Provides convenient functionality to higher levels –Permits an efficient implementation at lower levels
204521 Digital System Design 8 What Are the Components of an ISA? Sometimes known as the Programmer's Model of the machine Storage cells –General and special purpose registers in the CPU –Many general purpose cells of same size in memory –Storage associated with I/O devices The machine instruction set –The instruction set is the entire repertoire of machine operations –Makes use of storage cells, formats, and results of the fetch/execute cycle –i.e., register transfers
204521 Digital System Design 9 What Are the Components of an ISA? The instruction format –Size and meaning of fields within the instruction The nature of the fetch-execute cycle –Things that are done before the operation code is known
204521 Digital System Design 10 Programmer's Models of Various Machines
M6800 (introduced 1975): fewer than 100 instructions; 6 special purpose registers (8-bit A and B; 16-bit IX, SP, and PC; Status); 2^16 bytes of main memory capacity (addresses 0 to 2^16 – 1)
I8086 (introduced 1979): more than 120 instructions; 16-bit data registers AX, BX, CX, DX; address and count registers SP, BP, SI, DI; memory segment registers CS, DS, SS, ES; IP and Status; 2^20 bytes of main memory capacity (addresses 0 to 2^20 – 1)
VAX11 (introduced 1981): more than 300 instructions; 12 general purpose registers R0 to R11, plus AP, FP, SP, and PC (32 bits each); PSW; 2^32 bytes of main memory capacity (addresses 0 to 2^32 – 1)
PPC601 (introduced 1993): more than 250 instructions; 32 64-bit floating point registers; 32 32-bit general purpose registers; more than 50 32-bit special purpose registers; 2^52 bytes of main memory capacity (addresses 0 to 2^52 – 1)
204521 Digital System Design 11 What Must an Instruction Specify? (I) Data Flow
Which operation to perform (example: add r0, r1, r3) –Ans: Op code: add, load, branch, etc.
Where to find the operand or operands (example: add r0, r1, r3) –In CPU registers, memory cells, I/O locations, or part of instruction
Place to store result (example: add r0, r1, r3) –Again CPU register or memory cell
204521 Digital System Design 12 What Must an Instruction Specify? (II)
Location of next instruction (example: add r0, r1, r3 followed by br endloop) –Almost always the memory cell pointed to by the program counter (PC)
Sometimes there is no operand, or no result, or no next instruction. Can you think of examples?
204521 Digital System Design 13 Instructions Can Be Divided into 3 Classes (I) Data movement instructions –Move data from a memory location or register to another memory location or register without changing its form –Load — source is memory and destination is register –Store — source is register and destination is memory Arithmetic and logic (ALU) instructions –Change the form of one or more operands to produce a result stored in another location –Add, Sub, Shift, etc.
204521 Digital System Design 14 Instructions Can Be Divided into 3 Classes (II) Branch instructions (control flow instructions) –Alter the normal flow of control from executing the next instruction in sequence –Br Loc, Brz Loc2: unconditional or conditional branches
204521 Digital System Design 15 Examples of Data Movement Instructions
Lots of variation, even with one instruction type
Instruction | Meaning | Machine
MOV A, B | Move 16 bits from memory location A to location B | VAX11
LDA A, Addr | Load accumulator A with the byte at memory location Addr | M6800
lwz R3, A | Move 32-bit data from memory location A to register R3 | PPC601
li $3, 455 | Load the 32-bit integer 455 into register $3 | MIPS R3000
mov R4, dout | Move 16-bit data from R4 to output port dout | DEC PDP11
IN AL, KBD | Load a byte from in port KBD to accumulator | Intel Pentium
LEA.L (A0), A2 | Load the address pointed to by A0 into A2 | M68000
204521 Digital System Design 16 Examples of ALU Instructions
Instruction | Meaning | Machine
MULF A, B, C | Multiply the 32-bit floating point values at memory locations A and B, store at C | VAX11
nabs r3, r1 | Store the negative absolute value of r1 in r3 | PPC601
ori $2, $1, 255 | Store logical OR of reg $1 with 255 into reg $2 | MIPS R3000
DEC R2 | Decrement the 16-bit value stored in reg R2 | DEC PDP11
SHL AX, 4 | Shift the 16-bit value in reg AX left by 4 bit positions | Intel 8086
Notice again the complete dissimilarity of both syntax and semantics.
204521 Digital System Design 17 Examples of Branch Instructions
Instruction | Meaning | Machine
BLBS A, Tgt | Branch to address Tgt if the least significant bit of memory location A is set (i.e. = 1) | VAX11
bun r2 | Branch to location in R2 if result of previous floating point computation was Not a Number (NaN) | PPC601
beq $2, $1, 32 | Branch to location (PC + 4 + 32) if contents of $1 and $2 are equal | MIPS R3000
SOB R4, Loop | Decrement R4 and branch to Loop if R4 ≠ 0 | DEC PDP11
JCXZ Addr | Jump to Addr if contents of register CX = 0 | Intel 8086
204521 Digital System Design 18 ISA Metrics Orthogonality –No special registers, few special cases, all operand modes available with any data type or instruction type Completeness –Support for a wide range of operations and target applications Regularity –No overloading for the meanings of instruction fields Streamlined –Resource needs easily determined Ease of compilation (programming?), Ease of implementation, Scalability
204521 Digital System Design 19 Instruction Set Design Issues Instruction set design issues include: –Where are operands stored? registers, memory, stack, accumulator –How many explicit operands are there? 0, 1, 2, or 3 –How is the operand location specified? register, immediate, indirect,... –What type & size of operands are supported? byte, int, float, double, string, vector... –What operations are supported? add, sub, mul, move, compare...
204521 Digital System Design 20 Evolution of Instruction Sets
Single Accumulator (EDSAC 1950)
Accumulator + Index Registers (Manchester Mark I, IBM 700 series 1953)
Separation of Programming Model from Implementation
–High-level Language Based (B5000 1963)
–Concept of a Family (IBM 360 1964)
General Purpose Register Machines
–Complex Instruction Sets (Vax, Intel 8086 1977-80)
–Load/Store Architecture (CDC 6600, Cray 1 1963-76)
RISC (Mips, Sparc, 88000, IBM RS6000, ... 1987+)
204521 Digital System Design 21 Evolution of Instruction Sets Major advances in computer architecture are typically associated with landmark instruction set designs –Ex: Stack VS. GPR (System 360) Design decisions must take into account: –technology –machine organization –programming languages –compiler technology –operating systems The design decisions in turn influence these.
204521 Digital System Design 22 Classifying ISAs
Accumulator (before 1960):
  1 address: add A             acc ← acc + mem[A]
Stack (1960s to 1970s):
  0 address: add               tos ← tos + next
Memory-Memory (1970s to 1980s):
  2 address: add A, B          mem[A] ← mem[A] + mem[B]
  3 address: add A, B, C       mem[A] ← mem[B] + mem[C]
Register-Memory (1970s to present):
  2 address: add R1, A         R1 ← R1 + mem[A]
             load R1, A        R1 ← mem[A]
Register-Register (Load/Store) (1960s to present):
  3 address: add R1, R2, R3    R1 ← R2 + R3
             load R1, R2       R1 ← mem[R2]
             store R1, R2      mem[R1] ← R2
204521 Digital System Design 23 Ex. Expression Evaluation for 3-, 2-, 1-, and 0-Address Machines Number of instructions & number of addresses both vary Discuss as examples: size of code in each case
204521 Digital System Design 24 Stack Architectures
Instruction set: add, sub, mult, div, ... push A, pop A
Example: A*B - (A+C*B)
Instruction | Stack after execution (top at right)
push A | A
push B | A, B
mul | A*B
push A | A*B, A
push C | A*B, A, C
push B | A*B, A, C, B
mul | A*B, A, B*C
add | A*B, A+B*C
sub | result
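The push/mul/add/sub sequence on this slide can be traced with a small interpreter. The following is a minimal Python sketch, not part of the original slides. The function name `run_stack`, the sample values for A, B, and C, and the convention that `sub` computes next minus top are assumptions, chosen so that the program yields A*B - (A+C*B).

```python
def run_stack(program, memory):
    """Execute a 0-address (stack) program and return the final top of stack."""
    stack = []
    ops = {
        "add": lambda nxt, top: nxt + top,
        "sub": lambda nxt, top: nxt - top,   # assumed operand order: next - top
        "mul": lambda nxt, top: nxt * top,
    }
    for instr in program:
        parts = instr.split()
        if parts[0] == "push":               # push A: copy mem[A] onto the stack
            stack.append(memory[parts[1]])
        elif parts[0] == "pop":              # pop A: store top of stack into mem[A]
            memory[parts[1]] = stack.pop()
        else:                                # ALU ops name no operands at all
            top = stack.pop()
            nxt = stack.pop()
            stack.append(ops[parts[0]](nxt, top))
    return stack[-1]


# Hypothetical sample values for the three variables.
memory = {"A": 2, "B": 3, "C": 4}
program = ["push A", "push B", "mul",
           "push A", "push C", "push B", "mul", "add", "sub"]
result = run_stack(program, memory)
print(result)  # -8, i.e. 2*3 - (2 + 4*3)
```

Only push and pop carry an address; every ALU instruction is just an opcode, which is where the good code density of stack machines comes from.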
204521 Digital System Design 25 The 0-Address, or Stack, Machine and Instruction Format
204521 Digital System Design 26 Stacks: Pros and Cons Pros –Good code density (implicit top of stack) –Low hardware requirements –Easy to write a simple compiler for stack architectures Cons –Stack becomes the bottleneck –Little ability for parallelism or pipelining –Data is not always at the top of stack when needed, so additional instructions like TOP and SWAP are needed –Difficult to write an optimizing compiler for stack architectures
204521 Digital System Design 27 Accumulator Architectures
Instruction set: add A, sub A, mult A, div A, ... load A, store A
Example: A*B - (A+C*B)
Instruction | Accumulator after execution
load B | B
mul C | B*C
add A | A+B*C
store D | A+B*C (also copied to D)
load A | A
mul B | A*B
sub D | result
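The accumulator program on this slide can be checked the same way. This is a minimal Python sketch under stated assumptions: `run_accumulator` and the sample values are hypothetical, and the access counter simply counts one data memory reference per instruction, since every 1-address instruction names a memory operand.

```python
def run_accumulator(program, memory):
    """Run a 1-address program; return (accumulator, data memory accesses)."""
    acc = 0
    accesses = 0
    for instr in program:
        op, addr = instr.split()
        accesses += 1                  # every instruction touches one memory cell
        if op == "load":
            acc = memory[addr]
        elif op == "store":
            memory[addr] = acc
        elif op == "add":
            acc += memory[addr]
        elif op == "sub":
            acc -= memory[addr]
        elif op == "mul":
            acc *= memory[addr]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return acc, accesses


# Hypothetical sample values; D is the temporary used by the slide's program.
memory = {"A": 2, "B": 3, "C": 4, "D": 0}
program = ["load B", "mul C", "add A", "store D",
           "load A", "mul B", "sub D"]
acc, accesses = run_accumulator(program, memory)
print(acc, accesses)  # -8 7
```

Seven instructions produce seven data memory references, which illustrates why high memory traffic is a typical drawback of accumulator machines.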
204521 Digital System Design 28 1-Address Machine and Instruction Format
Special CPU register, the accumulator, supplies 1 operand and stores result
One memory address used for other operand
Need instructions to load and store operands: LDA OpAddr, STA OpAddr
Example: add Op1 (Acc ← Acc + Op1)
Instruction format: 8-bit operation field (add: which operation) followed by a 24-bit address field (Op1Addr: where to find operand 1)
The accumulator says where to find operand 2 and where to put the result; the program counter says where to find the next instruction
[Figure: memory holding Op1 at Op1Addr and the next instruction at NextiAddr; CPU holding the accumulator and the program counter]
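The 8-bit operation field and 24-bit address field can be packed into one 32-bit word. The sketch below is illustrative only: the opcode value `ADD = 0x01`, the helper names, and the choice to place the opcode in the most significant bits are assumptions, since the slide gives only the field widths.

```python
OPCODE_BITS = 8
ADDR_BITS = 24


def encode(opcode, addr):
    """Pack an 8-bit opcode and a 24-bit operand address into a 32-bit word."""
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError("opcode does not fit in 8 bits")
    if not 0 <= addr < (1 << ADDR_BITS):
        raise ValueError("address does not fit in 24 bits")
    return (opcode << ADDR_BITS) | addr


def decode(word):
    """Split a 32-bit instruction word back into (opcode, address)."""
    return word >> ADDR_BITS, word & ((1 << ADDR_BITS) - 1)


ADD = 0x01                       # hypothetical opcode number for add
word = encode(ADD, 0x000100)     # add Op1, with Op1 stored at address 0x100
print(hex(word))                 # 0x1000100
print(decode(word))              # (1, 256)
```

With 24 address bits, a single instruction can reach any of 2^24 memory cells directly.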
204521 Digital System Design 29 Accumulators: Pros and Cons Pros –Very low hardware requirements –Easy to design and understand Cons –Accumulator becomes the bottleneck –Little ability for parallelism or pipelining –High memory traffic
204521 Digital System Design 30 Memory-Memory Architectures
Instruction set:
(3 operands) add A, B, C   sub A, B, C   mul A, B, C
(2 operands) add A, B   sub A, B   mul A, B
Example: A*B - (A+C*B)
3 operands:
mul D, A, B    /* D = A*B */
mul E, C, B    /* E = C*B */
add E, A, E    /* E = A + C*B */
sub E, D, E    /* E = A*B - (A + C*B) */
2 operands:
mov D, A
mul D, B       /* D = A*B */
mov E, C
mul E, B       /* E = C*B */
add E, A       /* E = A + C*B */
sub D, E       /* D = A*B - (A + C*B) */
204521 Digital System Design 31 The 2-Address Machine and Instruction Format
Result overwrites Operand 2
Needs only 2 addresses in instruction but less choice in placing data
Example: add Op2, Op1 (Op2 ← Op2 + Op1)
[Figure: memory holding Op1 at Op1Addr, Op2 (overwritten by the result) at Op2Addr, and the next instruction at NextiAddr; the program counter in the CPU says where to find the next instruction]
204521 Digital System Design 32 Memory-Memory: Pros and Cons Pros –Requires fewer instructions (especially if 3 operands) –Easy to write compilers for (especially if 3 operands) Cons –Very high memory traffic (especially if 3 operands) –Variable number of clocks per instruction –With two operands, more data movements are required
204521 Digital System Design 33 Register-Memory Architectures
Instruction set: add R1, A   sub R1, A   mul R1, B   load R1, A   store R1, A
Example: A*B - (A+C*B)
load R1, C
mul R1, B      /* C*B */
add R1, A      /* A + C*B */
store R1, D
load R2, A
mul R2, B      /* A*B */
sub R2, D      /* A*B - (A + C*B) */
204521 Digital System Design 34 Register-Memory: Pros and Cons Pros –Some data can be accessed without loading first –Instruction format easy to encode –Good code density Cons –Operands are not equivalent (poor orthogonality) –Variable number of clocks per instruction –May limit number of registers
204521 Digital System Design 35 Load-Store Architectures
Instruction set: add R1, R2, R3   sub R1, R2, R3   mul R1, R2, R3   load R1, R4   store R1, R4
Example: A*B - (A+C*B)
load R1, &A
load R2, &B
load R3, &C
load R4, R1
load R5, R2
load R6, R3
mul R7, R6, R5     /* C*B */
add R8, R7, R4     /* A + C*B */
mul R9, R4, R5     /* A*B */
sub R10, R9, R8    /* A*B - (A + C*B) */
204521 Digital System Design 36 The 3-Address Machine and Instruction Format
Address of next instruction kept in processor state register, the PC (except for explicit branches/jumps)
Rest of addresses in instruction –Discuss: savings in instruction word size
Example: add Res, Op1, Op2 (Res ← Op2 + Op1)
[Figure: memory and CPU block diagram]
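The load-store sequence on this slide can be simulated to confirm the result and to count data memory references. This is a minimal Python sketch under stated assumptions: the opcode name `la` stands in for the slide's `load R1, &A` (load an address), and the symbol addresses and sample values are hypothetical.

```python
ALU = {
    "add": lambda x, y: x + y,
    "sub": lambda x, y: x - y,
    "mul": lambda x, y: x * y,
}


def run_load_store(program, memory, symbols):
    """Run a register-register program; return (registers, data memory accesses)."""
    regs = {}
    accesses = 0
    for op, dst, *srcs in program:
        if op == "la":                       # `load R1, &A`: R1 <- address of A
            regs[dst] = symbols[srcs[0]]
        elif op == "load":                   # load R4, R1: R4 <- mem[R1]
            regs[dst] = memory[regs[srcs[0]]]
            accesses += 1
        elif op == "store":                  # store R1, R2: mem[R1] <- R2
            memory[regs[dst]] = regs[srcs[0]]
            accesses += 1
        else:                                # ALU ops use registers only
            regs[dst] = ALU[op](regs[srcs[0]], regs[srcs[1]])
    return regs, accesses


symbols = {"A": 0, "B": 4, "C": 8}           # hypothetical addresses
memory = {0: 2, 4: 3, 8: 4}                  # A = 2, B = 3, C = 4
program = [
    ("la", "R1", "A"), ("la", "R2", "B"), ("la", "R3", "C"),
    ("load", "R4", "R1"), ("load", "R5", "R2"), ("load", "R6", "R3"),
    ("mul", "R7", "R6", "R5"),               # C*B
    ("add", "R8", "R7", "R4"),               # A + C*B
    ("mul", "R9", "R4", "R5"),               # A*B
    ("sub", "R10", "R9", "R8"),              # A*B - (A + C*B)
]
regs, accesses = run_load_store(program, memory, symbols)
print(regs["R10"], len(program), accesses)   # -8 10 3
```

The program needs ten instructions but only three data memory references, because each variable is loaded once and then reused from a register.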
204521 Digital System Design 37 Load-Store: Pros and Cons Pros –Simple, fixed length instruction encoding –Instructions take similar number of cycles –Relatively easy to pipeline Cons –Higher instruction count –Not all instructions need three operands –Dependent on good compiler
204521 Digital System Design 38 Registers: Advantages and Disadvantages Advantages –Faster than cache (no addressing mode or tags) –Deterministic (no misses) –Can replicate (multiple read ports) –Short identifier (typically 3 to 8 bits) –Reduce memory traffic Disadvantages –Need to save and restore on procedure calls and context switch –Can't take the address of a register (for pointers) –Fixed size (can't store strings or structures efficiently) –Compiler must manage
204521 Digital System Design 39 General Register Machine and Instruction Formats
204521 Digital System Design 40 General Register Machine and Instruction Formats It is the most common choice in today's general-purpose computers Which register is specified by small "address" (3 to 6 bits for 8 to 64 registers) Load and store have one long & one short address: 1½ addresses Arithmetic instruction has 3 "half" addresses
204521 Digital System Design 41 Real Machines Are Not So Simple Most real machines have a mixture of 3, 2, 1, 0, and 1½-address instructions A distinction can be made on whether arithmetic instructions use data from memory If ALU instructions only use registers for operands and result, machine type is load-store –Only load and store instructions reference memory Other machines have a mix of register-memory and memory-memory instructions
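The "3 to 6 bits for 8 to 64 registers" figure follows from needing enough bits to number every register. A minimal Python sketch of that arithmetic (the function name `register_field_bits` is hypothetical):

```python
def register_field_bits(num_registers):
    """Bits needed for a field that can name any one of num_registers registers."""
    if num_registers < 2:
        raise ValueError("need at least two registers")
    return (num_registers - 1).bit_length()


for n in (8, 16, 32, 64):
    bits = register_field_bits(n)
    # A three-register arithmetic instruction spends 3 such fields on operands.
    print(f"{n} registers: {bits}-bit field, {3 * bits} operand bits for 3 registers")
```

Three register fields of 5 bits each take 15 bits, far less than three full memory addresses, which is why each register specifier counts as only a "half" address.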
204521 Digital System Design 42 Big Endian Addressing With Big Endian addressing, the byte binary address x... x00 is in the most significant position (big end) of a 32 bit word (IBM, Motorola, Sun, HP).
204521 Digital System Design 43 Little Endian Addressing With Little Endian addressing, the byte binary address x... x00 is in the least significant position (little end) of a 32 bit word (DEC, Intel).
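The two byte orders can be compared by laying out the same 32-bit word both ways. This is a small Python sketch using the standard `int.to_bytes` and `int.from_bytes` methods; the sample value 0x0A0B0C0D is an arbitrary choice. Index 0 of each byte string corresponds to the byte at address x...x00.

```python
word = 0x0A0B0C0D

big = word.to_bytes(4, byteorder="big")        # most significant byte at lowest address
little = word.to_bytes(4, byteorder="little")  # least significant byte at lowest address

print(big.hex())     # 0a0b0c0d
print(little.hex())  # 0d0c0b0a

# Reading big-endian bytes with the wrong byte order yields a different value.
misread = int.from_bytes(big, byteorder="little")
print(hex(misread))  # 0xd0c0b0a
```

The bytes are the same in both cases; only the mapping from byte address to significance differs, which matters whenever data moves between machines with different conventions.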
204521 Digital System Design 44 Operand Alignment An access to an operand of size s bytes at byte address A is said to be aligned if A mod s = 0
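The alignment rule A mod s = 0 translates directly into code. In the Python sketch below, `is_aligned` implements the definition from the slide; `words_touched` is an added illustration that assumes a memory delivering 4-byte words, and both function names are hypothetical.

```python
def is_aligned(address, size):
    """An access of `size` bytes at byte address `address` is aligned if A mod s = 0."""
    return address % size == 0


def words_touched(address, size, word_bytes=4):
    """Number of memory words an access spans (assumes word_bytes-wide memory)."""
    first = address // word_bytes
    last = (address + size - 1) // word_bytes
    return last - first + 1


print(is_aligned(8, 4))      # True
print(is_aligned(6, 4))      # False
print(is_aligned(6, 2))      # True: a halfword only needs an even address
print(words_touched(4, 4))   # 1
print(words_touched(3, 4))   # 2: the access straddles two words
```

An aligned access never straddles a word boundary, so it can always be served by a single memory access.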
204521 Digital System Design 45 Unrestricted Alignment If the architecture does not restrict memory accesses to be aligned then –Software is simple –Hardware must detect misalignment and make 2 memory accesses –Expensive detection logic is required –All references can be made slower Sometimes unrestricted alignment is required for backwards compatibility
204521 Digital System Design 46 Restricted Alignment If the architecture restricts memory accesses to be aligned then –Software must guarantee alignment –Hardware detects misaligned accesses and traps –No extra time is spent when data is aligned Since we want to make the common case fast, having restricted alignment is often a better choice, unless compatibility is an issue.
204521 Digital System Design 47 Types of Addressing Modes (VAX)
1. Register direct: Ri
2. Immediate (literal): #n
3. Displacement: M[Ri + #n]
4. Register indirect: M[Ri]
5. Indexed: M[Ri + Rj]
6. Direct (absolute): M[#n]
7. Memory indirect: M[M[Ri]]
8. Autoincrement: M[Ri++]
9. Autodecrement: M[Ri--]
10. Scaled: M[Ri + Rj*d + #n]
Studies by [Clark and Emer] indicate that modes 1-4 account for 93% of all operands on the VAX.
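Each addressing mode in the list is a different rule for locating the operand. The Python sketch below evaluates all ten rules against a toy register file and memory. The function name, mode names, and sample contents are hypothetical, and treating autodecrement as decrement-before-use (with autoincrement as increment-after-use) is an assumption based on the usual VAX convention.

```python
def fetch_operand(mode, regs, mem, ri=None, rj=None, n=0, d=1):
    """Return the operand value selected by one of the ten addressing modes."""
    if mode == "register":
        return regs[ri]
    if mode == "immediate":
        return n
    if mode == "displacement":
        return mem[regs[ri] + n]
    if mode == "register_indirect":
        return mem[regs[ri]]
    if mode == "indexed":
        return mem[regs[ri] + regs[rj]]
    if mode == "direct":
        return mem[n]
    if mode == "memory_indirect":
        return mem[mem[regs[ri]]]
    if mode == "autoincrement":          # use the address, then step the register
        value = mem[regs[ri]]
        regs[ri] += d
        return value
    if mode == "autodecrement":          # step the register, then use the address
        regs[ri] -= d
        return mem[regs[ri]]
    if mode == "scaled":
        return mem[regs[ri] + regs[rj] * d + n]
    raise ValueError(f"unknown addressing mode: {mode}")


# Hypothetical register and memory contents.
regs = {"R1": 100, "R2": 3}
mem = {7: 42, 100: 7, 103: 5, 104: 9, 116: 11, 200: 1}

print(fetch_operand("displacement", regs, mem, ri="R1", n=4))           # 9
print(fetch_operand("memory_indirect", regs, mem, ri="R1"))             # 42
print(fetch_operand("scaled", regs, mem, ri="R1", rj="R2", d=4, n=4))   # 11
```

Modes 1 to 4 need at most one register read and one addition, which fits with their dominating measured usage.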
204521 Digital System Design 48 Frequency of Immediate Addressing on DLX Not all instructions can take advantage of immediate addressing.
204521 Digital System Design 49 Types of Operations
Arithmetic and Logic: AND, ADD
Data Transfer: MOVE, LOAD, STORE
Control: BRANCH, JUMP, CALL
System: OS CALL, VM
Floating Point: ADDF, MULF, DIVF
Decimal: ADDD, CONVERT
String: MOVE, COMPARE
Graphics: (DE)COMPRESS
204521 Digital System Design 50 80x86 Instruction Frequency
204521 Digital System Design 51 Relative Frequency of Control Instructions Design hardware to handle branches quickly, since these occur most frequently
204521 Digital System Design 52 Frequency of Operand Sizes on 32-bit Load-Store Machine For floating-point want good performance for 64 bit operands. For integer operations want good performance for 32 bit operands.
204521 Digital System Design 53 Encoding an Instruction Set The encoding must balance competing forces: –a desire to have as many registers and addressing modes as possible –the impact of the size of the register and addressing mode fields on the average instruction size and hence on the average program size –a desire to have instructions encoded into lengths that will be easy to handle in the implementation
204521 Digital System Design 54 Three Choices for Encoding the Instruction Set Variable –Instruction length varies based on opcode and address specifiers –For example, VAX instructions vary between 1 and 53 bytes –Good code density, but difficult to decode Fixed –Only a single size for all instructions –For example, DLX, MIPS, PowerPC, Sparc all have 32 bit instructions –Not as good code density, but easier to decode Hybrid –Have multiple format lengths specified by the opcode –For example, IBM 360/370 and Intel 80x86 –Compromise between code density and ease of decode
204521 Digital System Design 55 Compilers and ISA Compiler Goals –All correct programs compile correctly –Most compiled programs execute quickly –Most programs compile quickly –Achieve small code size –Provide debugging support Multiple Source Compilers –Same compiler can compile different languages Multiple Target Compilers –Same compiler can generate code for different machines
204521 Digital System Design 56 Compiler Phases
Compilers use phases to manage complexity
–Front end: convert language to intermediate form
–High-level optimizer: procedure inlining and loop transformations
–Global optimizer: global and local optimization, plus register allocation
–Code generator (and assembler): dependency elimination, instruction selection, pipeline scheduling
204521 Digital System Design 57 Allocation of Variables Stack –used to allocate local variables –grown and shrunk on procedure calls and returns –register allocation works best for stack-allocated objects Global data area –used to allocate global variables and constants –many of these objects are arrays or large data structures –impossible to allocate to registers if they are aliased Heap –used to allocate dynamic objects –heap objects are accessed with pointers –never allocated to registers
204521 Digital System Design 58 Designing ISA to Improve Compilation Provide enough general purpose registers to ease register allocation (more than 16). Provide regular instruction sets by keeping the operations, data types, and addressing modes orthogonal. Provide primitive constructs rather than trying to map to a high-level language. Simplify trade-offs among alternatives. Allow compilers to help make the common case fast.