S.J.Lee 1 Chapter 2: Instruction Set Principles and Examples
Lee Sang-Jeong (S.J. Lee), Division of Computer Science, Soonchunhyang University
Adapted from "http://wwwinst.EECS.Berkeley.EDU:80/~cs152/fa97/index_lectures.html" and "http://www.cs.berkeley.edu/~pattrsn/252S98/index.html"
Copyright 1998 UCB

S.J.Lee 2 Review, #1
Designing to Last through Trends

            Capacity         Speed
Logic       2x in 3 years    2x in 3 years
DRAM        4x in 3 years    2x in 10 years
Disk        4x in 3 years    2x in 10 years
Processor   (n.a.)           2x in 1.5 years

Time to run the task
  – Execution time, response time, latency
Tasks per day, hour, week, sec, ns, ...
  – Throughput, bandwidth
"X is n times faster than Y" means

\[ n = \frac{\mathrm{ExTime}(Y)}{\mathrm{ExTime}(X)} = \frac{\mathrm{Performance}(X)}{\mathrm{Performance}(Y)} \]

S.J.Lee 3 Review, #2
Amdahl's Law:

\[ \mathrm{Speedup}_{overall} = \frac{\mathrm{ExTime}_{old}}{\mathrm{ExTime}_{new}} = \frac{1}{(1 - \mathrm{Fraction}_{enhanced}) + \dfrac{\mathrm{Fraction}_{enhanced}}{\mathrm{Speedup}_{enhanced}}} \]

CPI Law:

\[ \mathrm{CPU\ time} = \frac{\mathrm{Seconds}}{\mathrm{Program}} = \frac{\mathrm{Instructions}}{\mathrm{Program}} \times \frac{\mathrm{Cycles}}{\mathrm{Instruction}} \times \frac{\mathrm{Seconds}}{\mathrm{Cycle}} \]

Execution time is the REAL measure of computer performance!
Good products are created when there are:
  – Good benchmarks
  – Good ways to summarize performance
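The two laws on this slide can be checked numerically. The following is a minimal sketch, not part of the original deck; the function names `amdahl_speedup` and `cpu_time` and the example numbers are assumptions chosen for illustration.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when only a fraction of execution time is enhanced."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    # ExTime_new / ExTime_old = (1 - F) + F / S
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = instructions/program x cycles/instruction x seconds/cycle."""
    if clock_rate_hz <= 0.0:
        raise ValueError("clock_rate_hz must be positive")
    return instruction_count * cpi / clock_rate_hz


if __name__ == "__main__":
    # Hypothetical case: 40% of the time is enhanced by a factor of 10.
    print(amdahl_speedup(0.4, 10.0))
    # Hypothetical case: 1e9 instructions, CPI of 2, 500 MHz clock.
    print(cpu_time(1e9, 2.0, 500e6))
```

Even a tenfold improvement of 40% of the run yields well under a twofold overall gain, which is the point Amdahl's Law makes about the unenhanced fraction.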

S.J.Lee 4 Introduction
Instruction cycle: Instruction Fetch -> Instruction Decode -> Operand Fetch -> Execute -> Result Store -> Next Instruction
What Must be Specified?
Instruction Format or Encoding
  – how is it decoded?
Location of operands and result
  – where other than memory?
  – how many explicit operands?
  – how are memory operands located?
  – which can or cannot be in memory?
Data type and Size
Operations
  – what are supported
Successor instruction
  – jumps, conditions, branches

S.J.Lee 5 Evolution of Instruction Sets
Single Accumulator (EDSAC 1950)
Accumulator + Index Registers (Manchester Mark I, IBM 700 series 1953)
Separation of Programming Model from Implementation
  – High-level Language Based (B5000 1963)
  – Concept of a Family (IBM 360 1964)
General Purpose Register Machines
  – Complex Instruction Sets (Vax, Intel 432 1977-80)
  – Load/Store Architecture (CDC 6600, Cray 1 1963-76)
RISC (Mips, Sparc, HP-PA, IBM RS6000, PowerPC ... 1987)
LIW/"EPIC"? (IA-64 ... 1999)

S.J.Lee 6 Classifying Instruction Set Architectures
Accumulator (1 register):
  – 1 address      add A           acc <- acc + mem[A]
  – 1+x address    addx A          acc <- acc + mem[A + x]
Stack:
  – 0 address      add             tos <- tos + next
General Purpose Register:
  – 2 address      add A B         EA(A) <- EA(A) + EA(B)
  – 3 address      add A B C       EA(A) <- EA(B) + EA(C)
Load/Store:
  – 3 address      add Ra Rb Rc    Ra <- Rb + Rc
                   load Ra Rb      Ra <- mem[Rb]
                   store Ra Rb     mem[Rb] <- Ra
Comparison:
  – Bytes per instruction?
  – Number of instructions?
  – Cycles per instruction?

S.J.Lee 7 Comparing Number of Instructions
Code sequence for C = A + B for four classes of instruction sets:

Stack     Accumulator   Register             Register
                        (register-memory)    (load-store)
Push A    Load A        Load R1,A            Load R1,A
Push B    Add B         Add R1,B             Load R2,B
Add       Store C       Store C,R1           Add R3,R1,R2
Pop C                                        Store C,R3
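The four code sequences for C = A + B can be executed on toy interpreters to confirm that they compute the same result with different instruction counts. This is a minimal sketch under stated assumptions: the tuple-based instruction encoding, the function names, and the dictionary used as memory are all made up for illustration and do not model any real machine.

```python
def run_stack(program, mem):
    """0-address machine: operands live on an implicit evaluation stack."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(mem[args[0]])
        elif op == "add":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "pop":
            mem[args[0]] = stack.pop()
        else:
            raise ValueError(f"unknown stack op: {op}")
    return mem


def run_accumulator(program, mem):
    """1-address machine: one implicit accumulator register."""
    acc = 0
    for op, addr in program:
        if op == "load":
            acc = mem[addr]
        elif op == "add":
            acc += mem[addr]
        elif op == "store":
            mem[addr] = acc
        else:
            raise ValueError(f"unknown accumulator op: {op}")
    return mem


def run_register_memory(program, mem):
    """Register-memory machine: one ALU operand may come from memory."""
    regs = {}
    for op, first, second in program:
        if op == "load":        # Load R1,A
            regs[first] = mem[second]
        elif op == "add":       # Add R1,B means R1 <- R1 + mem[B]
            regs[first] += mem[second]
        elif op == "store":     # Store C,R1
            mem[first] = regs[second]
        else:
            raise ValueError(f"unknown register-memory op: {op}")
    return mem


def run_load_store(program, mem):
    """Load-store machine: ALU operations use registers only."""
    regs = {}
    for op, *args in program:
        if op == "load":
            dest, addr = args
            regs[dest] = mem[addr]
        elif op == "add":
            dest, src1, src2 = args
            regs[dest] = regs[src1] + regs[src2]
        elif op == "store":
            addr, src = args
            mem[addr] = regs[src]
        else:
            raise ValueError(f"unknown load-store op: {op}")
    return mem


# The four sequences from the slide, transcribed into the toy encoding.
STACK = [("push", "A"), ("push", "B"), ("add",), ("pop", "C")]
ACCUMULATOR = [("load", "A"), ("add", "B"), ("store", "C")]
REGISTER_MEMORY = [("load", "R1", "A"), ("add", "R1", "B"), ("store", "C", "R1")]
LOAD_STORE = [("load", "R1", "A"), ("load", "R2", "B"),
              ("add", "R3", "R1", "R2"), ("store", "C", "R3")]


def run_all(a, b):
    """Return {class name: (value stored in C, instruction count)}."""
    machines = (
        ("stack", run_stack, STACK),
        ("accumulator", run_accumulator, ACCUMULATOR),
        ("register-memory", run_register_memory, REGISTER_MEMORY),
        ("load-store", run_load_store, LOAD_STORE),
    )
    results = {}
    for name, runner, program in machines:
        mem = runner(program, {"A": a, "B": b, "C": 0})
        results[name] = (mem["C"], len(program))
    return results


if __name__ == "__main__":
    for name, (value, count) in run_all(2, 3).items():
        print(f"{name}: C = {value} in {count} instructions")
```

Instruction count alone does not decide the comparison; bytes per instruction and cycles per instruction differ across the classes as well.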

S.J.Lee 8 General Purpose Registers Dominate
1975-1995: all machines use general purpose registers
Advantages of registers
  – registers are faster than memory
  – registers are easier for a compiler to use
    » e.g., (A*B) - (C*D) - (E*F) can do multiplies in any order vs. stack
  – registers can hold variables
    » memory traffic is reduced, so program is sped up (since registers are faster than memory)
    » code density improves (since a register is named with fewer bits than a memory location)

S.J.Lee 9 Memory Addressing
Since 1980 almost every machine uses addresses to the level of 8 bits (byte)
2 questions for design of ISA:
  – Since one could read a 32-bit word either as four byte loads from sequential byte addresses or as one word load from a single byte address, how do byte addresses map onto words?
  – Can a word be placed on any byte boundary?

S.J.Lee 10 Addressing Objects: Endianness and Alignment
Big Endian: the word address is the address of the most significant byte
  – IBM 360/370, Motorola 68k, MIPS, Sparc, HP PA
Little Endian: the word address is the address of the least significant byte
  – Intel 80x86, DEC Vax, DEC Alpha (Windows NT)

        msb          lsb
         3   2   1   0      little endian: byte 0 is the lsb
         0   1   2   3      big endian: byte 0 is the msb

Alignment: require that objects fall on an address that is a multiple of their size.
Figure: aligned and not-aligned objects at byte offsets 0 to 3.
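Byte order and alignment can be illustrated with the Python standard library. This is a small sketch, not part of the original slides; the constant `VALUE` and the helper `is_aligned` are assumptions made for the example.

```python
import struct

VALUE = 0x0A0B0C0D  # arbitrary 32-bit word chosen for illustration

# ">I" packs an unsigned 32-bit integer big-endian, "<I" little-endian.
BIG = struct.pack(">I", VALUE)
LITTLE = struct.pack("<I", VALUE)


def is_aligned(address: int, size: int) -> bool:
    """An object is aligned when its address is a multiple of its size."""
    if size <= 0:
        raise ValueError("size must be positive")
    return address % size == 0


if __name__ == "__main__":
    # Byte 0 holds the most significant byte on a big-endian machine
    # and the least significant byte on a little-endian machine.
    print("big endian bytes:   ", BIG.hex(" "))
    print("little endian bytes:", LITTLE.hex(" "))
    print("word at address 8 aligned?", is_aligned(8, 4))
    print("word at address 6 aligned?", is_aligned(6, 4))
```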

S.J.Lee 11 Addressing Modes

Addressing mode      Example                 Meaning
Register             Add R4,R3               R4 <- R4+R3
Immediate            Add R4,#3               R4 <- R4+3
Displacement         Add R4,100(R1)          R4 <- R4+Mem[100+R1]
Register indirect    Add R4,(R1)             R4 <- R4+Mem[R1]
Indexed / Base       Add R3,(R1+R2)          R3 <- R3+Mem[R1+R2]
Direct or absolute   Add R1,(1001)           R1 <- R1+Mem[1001]
Memory indirect      Add R1,@(R3)            R1 <- R1+Mem[Mem[R3]]
Auto-increment       Add R1,(R2)+            R1 <- R1+Mem[R2]; R2 <- R2+d
Auto-decrement       Add R1,-(R2)            R2 <- R2-d; R1 <- R1+Mem[R2]
Scaled               Add R1,100(R2)[R3]      R1 <- R1+Mem[100+R2+R3*d]

Why Auto-increment/decrement? Scaled?
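The "Meaning" column of the addressing-mode table can be read as a recipe for fetching the source operand. The sketch below follows that column line by line; the function `fetch_operand`, its keyword arguments, and the dictionary-based registers and memory are hypothetical names introduced for illustration, and `d` is assumed to be the operand size in bytes.

```python
def fetch_operand(mode, regs, mem, *, reg=None, index=None, disp=0, imm=0, d=4):
    """Return the source operand for one addressing mode.

    regs maps register names to values, mem maps addresses to values.
    Auto-increment and auto-decrement update regs as a side effect.
    """
    if mode == "register":
        return regs[reg]
    if mode == "immediate":
        return imm
    if mode == "displacement":          # Mem[disp + R]
        return mem[disp + regs[reg]]
    if mode == "register_indirect":     # Mem[R]
        return mem[regs[reg]]
    if mode == "indexed":               # Mem[Rbase + Rindex]
        return mem[regs[reg] + regs[index]]
    if mode == "direct":                # Mem[absolute address]
        return mem[disp]
    if mode == "memory_indirect":       # Mem[Mem[R]]
        return mem[mem[regs[reg]]]
    if mode == "auto_increment":        # use Mem[R], then R <- R + d
        value = mem[regs[reg]]
        regs[reg] += d
        return value
    if mode == "auto_decrement":        # R <- R - d, then use Mem[R]
        regs[reg] -= d
        return mem[regs[reg]]
    if mode == "scaled":                # Mem[disp + Rbase + Rindex * d]
        return mem[disp + regs[reg] + regs[index] * d]
    raise ValueError(f"unknown addressing mode: {mode}")


if __name__ == "__main__":
    regs = {"R1": 100, "R2": 8, "R3": 2}
    mem = {8: 21, 100: 200, 108: 9, 200: 7, 208: 13, 1001: 5}
    # Step through an array of 4-byte elements with auto-increment.
    print(fetch_operand("auto_increment", regs, mem, reg="R2"), regs["R2"])
    # Index the element R3 of an array based at 100 + R1.
    print(fetch_operand("scaled", regs, mem, reg="R1", index="R3", disp=100))
```

The usage lines hint at the slide's closing question: auto-increment suits stepping through arrays, and scaled mode suits indexing elements of size d.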

S.J.Lee 12 Addressing Mode Usage? (ignore register mode)
3 SPEC89 programs
  – Displacement: 42% avg, 32% to 55%
  – Immediate: 33% avg, 17% to 43%
  – Register deferred (indirect): 13% avg, 3% to 24%
  – Scaled: 7% avg, 0% to 16%
  – Memory indirect: 3% avg, 1% to 6%
  – Misc: 2% avg, 0% to 3%
75% displacement & immediate
88% displacement, immediate & register indirect

S.J.Lee 13 Displacement Address Size?
Avg. of 5 SPECint92 programs v. avg. of 5 SPECfp92 programs
Figure: distribution of displacement sizes. The x-axis (address bits) is in powers of 2: 4 => addresses > 2^3 (8) and <= 2^4 (16)
1% of addresses > 16 bits
12 to 16 bits of displacement needed

S.J.Lee 14 Immediate Size?
50% to 60% fit within 8 bits
75% to 80% fit within 16 bits
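A measurement like the one summarized here amounts to counting how many immediates fit in a field of a given width. The sketch below shows that calculation only; it assumes signed (two's complement) immediate fields, and `SAMPLE_IMMEDIATES` is a made-up list for illustration, not benchmark data.

```python
def fits_signed(value: int, bits: int) -> bool:
    """True if value is representable as a bits-wide two's complement field."""
    if bits <= 0:
        raise ValueError("bits must be positive")
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def fraction_fitting(values, bits: int) -> float:
    """Fraction of the given immediates that fit in a signed field of bits."""
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    return sum(fits_signed(v, bits) for v in values) / len(values)


# Hypothetical immediates, invented for this example only.
SAMPLE_IMMEDIATES = [0, 1, -1, 4, 100, -128, 255, 1000, 40000, 70000]

if __name__ == "__main__":
    for width in (8, 16):
        share = fraction_fitting(SAMPLE_IMMEDIATES, width)
        print(f"{share:.0%} of the sample fits within {width} bits")
```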

S.J.Lee 15 Operations in the Instruction Set

Data Movement           Load (from memory)
                        Store (to memory)
                        memory-to-memory move
                        register-to-register move
                        input (from I/O device)
                        output (to I/O device)
                        push, pop (to/from stack)
Arithmetic              integer (binary + decimal) or FP
                        Add, Subtract, Multiply, Divide
Logical                 not, and, or, set, clear
Shift                   shift left/right, rotate left/right
Control (Jump/Branch)   unconditional, conditional
Subroutine Linkage      call, return
Interrupt               trap, return
Synchronization         test & set
String                  search, translate
Graphics (MMX)          parallel subword ops (4 16-bit add)

S.J.Lee 16 Top 10 80x86 Instructions

S.J.Lee 17 Methods of Testing Condition
Condition Codes
  – Processor status bits are set as a side effect of arithmetic instructions (possibly on moves) or explicitly by compare or test instructions.
    ex: add r1, r2, r3
        bz label
Condition Register
  – Ex: cmp r1, r2, r3
        bgt r1, label
Compare and Branch
  – Ex: bgt r1, r2, label

S.J.Lee 18 Conditional Branch Distance

S.J.Lee 19 Conditional Branch Addressing
PC-relative, since most branch targets are close to the current PC address
At least 8 bits suggested (±128 instructions)
Compare Equal/Not Equal most important for integer programs (86%)

S.J.Lee 20 Data Types
Bit: 0, 1
Bit String: sequence of bits of a particular length
  – 4 bits is a nibble
  – 8 bits is a byte
  – 16 bits is a half-word
  – 32 bits is a word
  – 64 bits is a double-word
Character: ASCII 7-bit code
Decimal: digits 0-9 encoded as 0000b thru 1001b
  – two decimal digits packed per 8-bit byte
Integers: 2's complement
Floating Point: single precision, double precision, extended precision
  – value = M x R^E (M: mantissa, R: base, E: exponent)
  – How many +/- #'s?
  – Where is decimal pt?
  – How are +/- exponents represented?
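Two of the encodings listed on this slide, packed decimal and 2's complement integers, are easy to show concretely. This is an illustrative sketch only; the helper names `to_packed_bcd` and `to_twos_complement` are assumptions, and packed decimal sign handling is deliberately left out.

```python
def to_packed_bcd(number: int) -> bytes:
    """Pack a non-negative integer two decimal digits per byte (0000b-1001b each)."""
    if number < 0:
        raise ValueError("this sketch handles non-negative numbers only")
    digits = str(number)
    if len(digits) % 2:
        digits = "0" + digits  # pad to a whole number of bytes
    return bytes(
        (int(digits[i]) << 4) | int(digits[i + 1])
        for i in range(0, len(digits), 2)
    )


def to_twos_complement(value: int, bits: int) -> int:
    """Return the bits-wide two's complement bit pattern of value."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError(f"{value} does not fit in {bits} bits")
    return value & ((1 << bits) - 1)


if __name__ == "__main__":
    print(to_packed_bcd(1998).hex())            # each hex digit is one decimal digit
    print(format(to_twos_complement(-1, 8), "08b"))
    print(format(to_twos_complement(-128, 8), "08b"))
```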

S.J.Lee 21 Operand Size Usage
Support these data sizes and types:
  – 8-bit, 16-bit, and 32-bit integers
  – 32-bit and 64-bit IEEE 754 floating-point numbers

S.J.Lee 22 Generic Examples of Instruction Format Widths
Figure: instruction format layouts for three encodings: Variable, Fixed, Hybrid.

S.J.Lee 23 Summary of Instruction Formats
If code size is most important, use variable length instructions
If performance is most important, use fixed length instructions
Recent embedded machines (ARM, MIPS) added an optional mode to execute a subset of 16-bit wide instructions (Thumb, MIPS16); per procedure, decide performance or density

S.J.Lee 24 Compilers and Instruction Set Architectures
Ease of compilation
  – orthogonality: no special registers, few special cases, all operand modes available with any data type or instruction type
  – completeness: support for a wide range of operations and target applications
  – regularity: no overloading for the meanings of instruction fields
  – streamlined: resource needs easily determined
Register Assignment is critical too
  – Easier if lots of registers

S.J.Lee 25 Summary of Compiler Considerations
Provide at least 16 general purpose registers plus separate floating-point registers
Be sure all addressing modes apply to all data transfer instructions
Aim for a minimalist instruction set

S.J.Lee 26 A "Typical" RISC
32-bit fixed format instruction (3 formats)
32 32-bit GPRs (R0 contains zero; a double-precision value takes a register pair)
3-address, reg-reg arithmetic instructions
Single address mode for load/store: base + displacement
  – no indirection
Simple branch conditions
Delayed branch
see: SPARC, MIPS, HP PA-Risc, DEC Alpha, IBM PowerPC, CDC 6600, CDC 7600, Cray-1, Cray-2, Cray-3

S.J.Lee 27 DLX Architecture
Simple load-store architecture
Based on observations about instruction set architecture
Emphasizes:
  – Simple load-store instruction set
  – Design for pipeline efficiency
  – Design for compiler target
DLX registers
  – 32 32-bit GPRs named R0, R1, ..., R31
  – 32 32-bit FPRs named F0, F1, ..., F31
    » Accessed independently for 32-bit data
    » Accessed in even-odd pairs (F0, F2, ..., F30) for 64-bit (double-precision) data
  – R0 is always 0
  – Other status registers, e.g., floating-point status register
Memory is byte addressable, big-endian, with 32-bit addresses

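S.J.Lee 28 DLX Addressing Modes

Mode                Instruction fields        Operand
Register (direct)   op | rs | rt | rd         register
Immediate           op | rs | rt | immed      the immed field itself
Base+index          op | rs | rt | immed      Memory[register + immed]
PC-relative         op | rs | rt | immed      Memory[PC + immed]

All instructions 32 bits wide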

S.J.Lee 29 DLX Instruction Format

I-type instruction
  bits 31-26   bits 25-21   bits 20-16   bits 15-0
  Op           rs1          rd           immediate

R-type instruction
  bits 31-26   bits 25-21   bits 20-16   bits 15-11   bits 10-0
  Op           rs1          rs2          rd           func

J-type instruction
  bits 31-26   bits 25-0
  Op           Offset added to PC
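The three layouts can be turned into bit-packing routines directly from the field boundaries above. This is a sketch under assumptions: the function names are invented, the opcode and function-code numbers used in the example are placeholders rather than real DLX encodings, and the immediate and offset fields are assumed to hold signed values.

```python
def _unsigned(value: int, bits: int, name: str) -> int:
    """Check that value fits an unsigned field of the given width."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name}={value} does not fit in {bits} bits")
    return value


def _signed(value: int, bits: int, name: str) -> int:
    """Check a signed value and return its two's complement field pattern."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError(f"{name}={value} does not fit in {bits} signed bits")
    return value & ((1 << bits) - 1)


def encode_i_type(op: int, rs1: int, rd: int, immediate: int) -> int:
    """Op[31:26] | rs1[25:21] | rd[20:16] | immediate[15:0]"""
    return ((_unsigned(op, 6, "op") << 26)
            | (_unsigned(rs1, 5, "rs1") << 21)
            | (_unsigned(rd, 5, "rd") << 16)
            | _signed(immediate, 16, "immediate"))


def encode_r_type(op: int, rs1: int, rs2: int, rd: int, func: int) -> int:
    """Op[31:26] | rs1[25:21] | rs2[20:16] | rd[15:11] | func[10:0]"""
    return ((_unsigned(op, 6, "op") << 26)
            | (_unsigned(rs1, 5, "rs1") << 21)
            | (_unsigned(rs2, 5, "rs2") << 16)
            | (_unsigned(rd, 5, "rd") << 11)
            | _unsigned(func, 11, "func"))


def encode_j_type(op: int, offset: int) -> int:
    """Op[31:26] | offset added to PC[25:0]"""
    return (_unsigned(op, 6, "op") << 26) | _signed(offset, 26, "offset")


def decode_i_type(word: int) -> dict:
    """Split a 32-bit word back into raw I-type fields."""
    return {
        "op": (word >> 26) & 0x3F,
        "rs1": (word >> 21) & 0x1F,
        "rd": (word >> 16) & 0x1F,
        "immediate": word & 0xFFFF,
    }


if __name__ == "__main__":
    PLACEHOLDER_OP = 0x23  # hypothetical opcode, not taken from a DLX table
    word = encode_i_type(PLACEHOLDER_OP, rs1=2, rd=1, immediate=30)
    print(f"{word:#010x}", decode_i_type(word))
```

Fixed field positions are what make decoding simple: every field is recovered with one shift and one mask, regardless of the instruction.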

S.J.Lee 30 DLX Operation Overview
Data transfers
  – LB, LBU, SB, LH, LHU, SH, LW, SW
  – LF, LD, SF, SD
  – MOVI2S, MOVS2I, MOVF, MOVD, MOVFP2I, MOVI2FP
Arithmetic/logical
  – ADD, ADDI, ADDU, ADDUI, SUB, SUBI, SUBU, SUBUI, MULT, MULTU, DIV, DIVU
  – AND, ANDI, OR, ORI, XOR, XORI
  – LHI
  – SLL, SRL, SRA, SLLI, SRLI, SRAI
  – S__, S__I => "__" may be LT, GT, LE, GE, EQ, NE
Control
  – BEQZ, BNEZ, BFPT, BFPF
  – J, JR, TRAP, RFE
Floating point
  – ADDD, ADDF, SUBD, SUBF, MULTD, MULTF, DIVD, DIVF
  – CVTF2D, CVTF2I, CVTD2F, CVTD2I, CVTI2F, CVTI2D
  – __D, __F => "__" may be LT, GT, LE, GE, EQ, NE

S.J.Lee 31 Examples of DLX load and store instructions
Notation: "=_n" transfers n bits, "##" concatenates, a subscript (_i) selects bit i of a field (bit 0 is the most significant), and a superscript (^k) replicates a field k times.

Instruction          Example            Meaning
load word            LW R1, 30(R2)      R1 =_32 Mem[30+R2]
load word            LW R1, 1000(R0)    R1 =_32 Mem[1000+0]
load byte            LB R1, 40(R3)      R1 =_32 (Mem[40+R3]_0)^24 ## Mem[40+R3]
load byte unsigned   LBU R1, 40(R3)     R1 =_32 0^24 ## Mem[40+R3]
load half word       LH R1, 40(R3)      R1 =_32 (Mem[40+R3]_0)^16 ## Mem[40+R3] ## Mem[41+R3]
load float           LF F0, 50(R3)      F0 =_32 Mem[50+R3]
load double          LD F0, 50(R3)      F0 ## F1 =_64 Mem[50+R3]
store word           SW 500(R4), R3     Mem[500+R4] =_32 R3
store float          SF 40(R3), F0      Mem[40+R3] =_32 F0
store double         SD 40(R3), F0      Mem[40+R3] =_32 F0; Mem[44+R3] =_32 F1
store half           SH 502(R2), R3     Mem[502+R2] =_16 R3_16..31
store byte           SB 41(R3), R2      Mem[41+R3] =_8 R2_24..31
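The difference between LB, LBU, and LH comes down to sign extension versus zero extension on a big-endian memory. The sketch below models memory as a Python `bytes` object; the function names are hypothetical, and results are returned as 32-bit register bit patterns.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low bits of value as a signed two's complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def load_byte(mem: bytes, addr: int) -> int:
    """LB: replicate bit 0 (the sign bit) of the byte into the upper 24 bits."""
    return sign_extend(mem[addr], 8) & MASK32


def load_byte_unsigned(mem: bytes, addr: int) -> int:
    """LBU: fill the upper 24 bits with zeros."""
    return mem[addr]


def load_half(mem: bytes, addr: int) -> int:
    """LH: concatenate two bytes big-endian, then sign-extend to 32 bits."""
    half = int.from_bytes(mem[addr:addr + 2], "big")
    return sign_extend(half, 16) & MASK32


def load_word(mem: bytes, addr: int) -> int:
    """LW: concatenate four bytes big-endian."""
    return int.from_bytes(mem[addr:addr + 4], "big")


if __name__ == "__main__":
    memory = bytes([0x80, 0x01, 0x7F, 0xFF])  # arbitrary example contents
    print(f"LB  -> {load_byte(memory, 0):#010x}")
    print(f"LBU -> {load_byte_unsigned(memory, 0):#010x}")
    print(f"LH  -> {load_half(memory, 0):#010x}")
    print(f"LW  -> {load_word(memory, 0):#010x}")
```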

S.J.Lee 32 Examples of DLX arithmetic/logical instructions

Instruction                    Example           Meaning
add                            ADD R1,R2,R3      R1 = R2 + R3
add immediate                  ADDI R1,R2,#3     R1 = R2 + 3
load high immediate            LHI R1,#42        R1 = 42 ## 0^16
shift left logical immediate   SLLI R1,R2,#5     R1 = R2 << 5
set less than                  SLT R1,R2,R3      if (R2 < R3) R1 = 1 else R1 = 0
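The "Meaning" column for LHI, SLLI, and SLT translates into one-line functions on 32-bit values. This is an illustrative sketch, with hypothetical function names, that treats registers as 32-bit patterns and assumes SLT compares signed values.

```python
MASK32 = 0xFFFFFFFF


def lhi(immediate: int) -> int:
    """LHI: place the 16-bit immediate in the upper half, zeros below (imm ## 0^16)."""
    return (immediate & 0xFFFF) << 16


def slli(value: int, shift_amount: int) -> int:
    """SLLI: shift left logical, discarding bits that leave the 32-bit register."""
    return (value << shift_amount) & MASK32


def slt(left: int, right: int) -> int:
    """SLT: produce 1 if left < right, otherwise 0 (signed comparison assumed)."""
    return 1 if left < right else 0


if __name__ == "__main__":
    print(f"LHI #42      -> {lhi(42):#010x}")
    print(f"SLLI 1, #5   -> {slli(1, 5)}")
    print(f"SLT 2, 3     -> {slt(2, 3)}")
```

Because SLT writes its result to a general register, a following BEQZ or BNEZ can branch on it, which is how DLX avoids condition codes.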

S.J.Lee 33 Examples of control-flow instructions

Instruction              Example          Meaning
jump                     J name           PC = name; ((PC+4) - 2^25) <= name < ((PC+4) + 2^25)
jump and link            JAL name         R31 = PC+4; PC = name; ((PC+4) - 2^25) <= name < ((PC+4) + 2^25)
jump and link register   JALR R2          R31 = PC+4; PC = R2
jump register            JR R3            PC = R3
branch equal zero        BEQZ R4, name    if (R4 == 0) PC = name; ((PC+4) - 2^15) <= name < ((PC+4) + 2^15)
branch not equal zero    BNEZ R4, name    if (R4 != 0) PC = name; ((PC+4) - 2^15) <= name < ((PC+4) + 2^15)
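The range conditions in the table follow from the width of the offset field: 26 signed bits for jumps and 16 signed bits for branches, both relative to PC+4. A minimal sketch of that check, with hypothetical helper names:

```python
def in_jump_range(pc: int, target: int) -> bool:
    """J/JAL: (PC+4) - 2^25 <= target < (PC+4) + 2^25."""
    base = pc + 4
    return base - 2**25 <= target < base + 2**25


def in_branch_range(pc: int, target: int) -> bool:
    """BEQZ/BNEZ: (PC+4) - 2^15 <= target < (PC+4) + 2^15."""
    base = pc + 4
    return base - 2**15 <= target < base + 2**15


def branch_offset(pc: int, target: int) -> int:
    """Offset to encode in the 16-bit field, measured from PC+4."""
    if not in_branch_range(pc, target):
        raise ValueError("target is out of range for a 16-bit branch offset")
    return target - (pc + 4)


if __name__ == "__main__":
    pc = 0x1000
    print(branch_offset(pc, 0x1000))          # branch back to itself
    print(in_branch_range(pc, pc + 4 + 2**15))  # one byte past the upper bound
    print(in_jump_range(pc, pc + 4 + 2**15))    # still reachable with a jump
```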

S.J.Lee 34 Summary, #1
Classifying Instruction Set Architectures
  – Accumulator (1 register): 1 address
  – Stack: 0 address
  – General Purpose Register: 2 address, 3 address
  – Load/Store: 3 address
General Purpose Registers Dominate
Data addressing modes that are important: Displacement, Immediate, Register Indirect
Displacement size should be 12 to 16 bits
Immediate size should be 8 to 16 bits

S.J.Lee 35 Summary, #2
Operations in the Instruction Set:
  – Data Movement, Arithmetic, Shift, Logical, Control
  – subroutine linkage, interrupt
  – synchronization, string, graphics (MMX)
Methods of Testing Condition
  – condition codes
  – condition register
  – compare and branch

S.J.Lee 36 Summary, #3
DLX Architecture
  – Simple load-store architecture
  – DLX registers
    » 32 32-bit GPRs named R0, R1, ..., R31
    » 32 32-bit FPRs named F0, F1, ..., F31 (even-odd pairs F0, F2, ..., F30 for double precision)
  – Byte addressable, big-endian, with 32-bit addresses
