CS 136 1 CS136, Advanced Architecture Instruction Set Architecture.


1 CS 136 1 CS136, Advanced Architecture Instruction Set Architecture

2 CS 136 2 Types of ISAs Stack –Implicit operands (top of stack) –Heavy memory traffic –Limited ability to access operands at will –Obsolete Accumulator –Implicit register operand (“accumulator”) –One memory operand –Insufficient temporaries –Obsolete General-purpose register –Multiple registers –Several variations

3 CS 136 3 GPR Architectures Memory-memory –CISC idea –Usually allows any operand to be in register as well Register-memory –Example: x86 –Can do one operand in register, one in memory, or 2 in regs Register-register –Only design used in modern machines –Lots of registers ⇒ fast flexible operand access –Simplicity of hardware –Compiler has full flexibility in register usage

4 CS 136 4 Five Ways to Do C = A + B
STACK: PUSH A; PUSH B; ADD; POP C
ACCUM: LOAD A; ADD B; STORE C
MEM-MEM: ADD C,A,B
REG-MEM: LOAD R1,A; ADD R1,B; STORE R1,C
REG-REG: LOAD R1,A; LOAD R2,B; ADD R3,R1,R2; STORE R3,C
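The stack and register-register sequences on this slide can be traced with a small interpreter. This is a minimal sketch, not part of the original deck: the tuple-based instruction encoding and the names `run_stack`, `run_reg_reg`, `stack_mem`, and `reg_mem` are assumptions made for illustration.

```python
# Minimal sketch: interpret two of the five sequences for C = A + B.
# The tuple instruction format and all names here are illustrative assumptions.

def run_stack(program, memory):
    """Run a stack-machine program; return the number of data memory accesses."""
    stack, accesses = [], 0
    for op, *args in program:
        if op == "PUSH":                 # memory -> top of stack
            stack.append(memory[args[0]])
            accesses += 1
        elif op == "POP":                # top of stack -> memory
            memory[args[0]] = stack.pop()
            accesses += 1
        elif op == "ADD":                # operands are implicit: top two entries
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
    return accesses

def run_reg_reg(program, memory):
    """Run a load/store program; return the number of data memory accesses."""
    regs, accesses = {}, 0
    for op, *args in program:
        if op == "LOAD":                 # memory -> named register
            regs[args[0]] = memory[args[1]]
            accesses += 1
        elif op == "STORE":              # named register -> memory
            memory[args[1]] = regs[args[0]]
            accesses += 1
        elif op == "ADD":                # all three operands are explicit registers
            dst, src1, src2 = args
            regs[dst] = regs[src1] + regs[src2]
    return accesses

STACK_PROGRAM = [("PUSH", "A"), ("PUSH", "B"), ("ADD",), ("POP", "C")]
REG_REG_PROGRAM = [("LOAD", "R1", "A"), ("LOAD", "R2", "B"),
                   ("ADD", "R3", "R1", "R2"), ("STORE", "R3", "C")]

stack_mem = {"A": 2, "B": 3, "C": 0}
reg_mem = {"A": 2, "B": 3, "C": 0}
stack_accesses = run_stack(STACK_PROGRAM, stack_mem)
reg_accesses = run_reg_reg(REG_REG_PROGRAM, reg_mem)
```

For this single statement both styles touch memory the same number of times. The register machine can pull ahead when a value is reused, because it can stay in a named register instead of being pushed again.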

5 CS 136 5 Memory Addressing Originally just word addressing 8-bit bytes and byte addressing introduced on IBM 360 series Brief experiments with bit addressing (bad idea) Unaligned accesses not worth supporting Some machines byte-address but only load/store a word at a time –Turned out to be bad design decision –Too many programs do string processing 1 character at a time –May need to revisit in future (32-bit characters?) Modern RISC designs allow short load/store, but not short arithmetic

6 CS 136 6 Endian-ness The word is “Endian”, not “Indian” Reference to Gulliver’s Travels Little-Endian invented by Digital Equipment on the PDP-11 –Mathematically more elegant –Horrible for humans –“It seemed like a good idea at the time” –Should be banished from the face of the Earth Some machines can switch endianness with a control bit –This idea is even stupider than the original
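The two byte orders can be compared directly with Python's standard `struct` module. This is an illustrative sketch; the sample value and the helper name `weighted_sum` are assumptions, not something from the slides.

```python
import struct

# Sketch: the same 32-bit value laid out in both byte orders.
value = 0x0A0B0C0D
big = struct.pack(">I", value)      # most significant byte at the lowest address
little = struct.pack("<I", value)   # least significant byte at the lowest address

def weighted_sum(little_endian_bytes):
    """In little-endian order, the byte at offset i has weight 256**i."""
    return sum(byte << (8 * i) for i, byte in enumerate(little_endian_bytes))

# Reading a narrower integer at the same address yields the low-order part.
low_half = struct.unpack("<H", little[:2])[0]
```

The `weighted_sum` identity is one way to read the "mathematically more elegant" bullet, while `big.hex()` matches how a person would write the number, which is the "horrible for humans" side of the tradeoff.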

7 CS 136 7 Addressing Modes How can an instruction reference memory? Early days: absolute address in instruction –Led to instruction modification –Improvement: “Indirection” picked up absolute location, used it as final address Minimum necessary today: follow pointer in register –Clumsy if only option Fanciest conceivable: *(R1+S*R2+constant), with either or both of R1 and R2 autoincremented or autodecremented as side effect, either before or after instruction –No machine went quite this far –But VAX came close

8 CS 136 8 Addressing Modes (cont’d) What’s actually useful? Need to follow pointers: can restrict to registers –ADD R1,(R2) –Better: LOAD R1,(R2) (like MIPS) Frequent stack access ⇒ register + constant useful Immediates needed for built-in constants Access to globals ⇒ absolute memory addresses –(We’ll see that that’s painful) PC-relative modes –Used to be needed for data; not in modern systems –Still needed for calls and branches Absolute addresses no longer needed for branches –Can always emulate with PC-relative, since PC known –Still available on some architectures
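Most of the modes discussed on these two slides are special cases of one effective-address formula, base + scale × index + constant. The sketch below assumes a dictionary register file and a dictionary memory; `effective_address` and `load` are hypothetical helper names, and autoincrement side effects are left out.

```python
# Sketch: addressing modes as special cases of base + scale*index + constant.
# The register file, the memory model, and both helper names are assumptions.

def effective_address(regs, base=None, index=None, scale=1, constant=0):
    """Compute the address an instruction would reference."""
    address = constant
    if base is not None:
        address += regs[base]
    if index is not None:
        address += scale * regs[index]
    return address

def load(memory, address):
    """Follow the computed address into memory."""
    return memory[address]

regs = {"R1": 0x1000, "R2": 3, "SP": 0x8000}
memory = {0x2000: 11, 0x1000: 7, 0x8008: 99, 0x1028: 42}

absolute = effective_address(regs, constant=0x2000)               # global variable
indirect = effective_address(regs, base="R1")                     # follow a pointer
stack_slot = effective_address(regs, base="SP", constant=8)       # register + constant
scaled = effective_address(regs, base="R1", index="R2", scale=8, constant=16)
```

Calling `load(memory, scaled)` corresponds to the `*(R1+S*R2+constant)` form with S = 8.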

9 CS 136 9 Operand Types and Sizes Type usually implies size Integers can safely be widened to word size –Shrink again when stored –Takes advantage of two’s-complement representation Single-precision FP gives different results than double-precision ⇒ Necessary to support both widths –Some FPUs can do two SP operations in parallel Older machines allowed “packed” decimal (2 digits per byte) –x86 supports with DAA (Decimal Add Adjust) instruction –Still useful in business world, though dying 32 bits standard these days, 64 bits coming –128 some day?
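The claim that integers can be widened for arithmetic and shrunk again on store can be checked with a few lines. This is a sketch assuming 8-bit data; `sign_extend` and `truncate` are hypothetical helper names.

```python
# Sketch: widen a narrow two's-complement value, compute, then shrink on store.
# Helper names and the 8-bit example width are assumptions.

def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a signed two's-complement number."""
    sign_bit = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return (value ^ sign_bit) - sign_bit

def truncate(value, bits):
    """Keep only the low `bits` bits, as a narrow store would."""
    return value & ((1 << bits) - 1)

# 8-bit addition 0x7F + 0x01 done at full width, then stored back as a byte.
wide_sum = sign_extend(0x7F, 8) + sign_extend(0x01, 8)   # 128 at full width
stored = truncate(wide_sum, 8)                           # bit pattern 0x80
reloaded = sign_extend(stored, 8)                        # -128, same as 8-bit wraparound
```

The stored byte matches what a native 8-bit adder would produce, which is why a machine can get by with short loads and stores but only word-width arithmetic.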

10 CS 136 10 Operations Provided Only one instruction truly needed: SJ –Subtract A from B, giving C; if result is < 0, jump to D –It’s Turing-complete! Practical machines need a bit more at minimum: –Arithmetic and logical (add, multiply, divide?, and, or, …) –Data movement (load/store, move between registers) –Control (conditional/unconditional branch, procedure call and return, trap to OS) –System control (return from interrupt, manage VM, set unprivileged mode, access I/O devices) Other builtins can be useful: –Basic floating point »Bad x86 design idea: sin, sqrt, etc.! –Decimal –String –Vector, graphics
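The single-instruction machine is small enough to interpret in a few lines. The sketch below follows the slide's definition (subtract A from B, giving C; if the result is negative, jump to D). Keeping the program separate from the data memory, halting when the program counter leaves the program, and the name `run_sj` are all assumptions.

```python
# Sketch of an SJ interpreter. Each instruction is (A, B, C, D):
#   data[C] = data[B] - data[A]; if the result is < 0, jump to D, else fall through.
# Separate program/data storage and the halting rule are assumptions.

def run_sj(program, data, max_steps=1000):
    pc, steps = 0, 0
    while 0 <= pc < len(program):        # leaving the program halts the machine
        if steps == max_steps:
            raise RuntimeError("step limit exceeded")
        a, b, c, d = program[pc]
        data[c] = data[b] - data[a]
        pc = d if data[c] < 0 else pc + 1
        steps += 1
    return data

# Addition built from subtraction: T = Z - X = -X, then R = Y - T = Y + X.
ADD_PROGRAM = [
    ("X", "Z", "T", 1),   # T = 0 - X; either outcome continues at instruction 1
    ("T", "Y", "R", 2),   # R = Y - (-X); either outcome halts at index 2
]
result = run_sj(ADD_PROGRAM, {"X": 2, "Y": 3, "Z": 0, "T": 0, "R": 0})
```

Even a plain addition takes two instructions and a scratch cell, which hints at why practical machines provide more than the minimum.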

11 CS 136 11 Control Flow Addressing modes are important –PC-relative means code can run at any virtual address –Useful for dynamically linked (shared) libraries Pointer-following jump needed for returns –Also useful for switch statements, function pointers, virtual functions, and shared libraries How to specify condition for conditional branches? –Condition code as side effect of every instruction »Boils down to extra register »Spurious dependencies in pipeline –Condition register explicitly set by comparison –Compare as part of branch »Adds delay slots in pipeline

12 CS 136 12 Encodings Variable-length instructions –Highly efficient (few wasted bits) –Allows complex specifications (e.g., x86 addressing modes) –Usually means misaligned instruction fetch –Greatly complicates fetch/decode units Fixed-length instructions –May limit number of registers –Usually very few instruction formats –Wastes space but gains speed (e.g., only aligned fetches) –Limits width of immediate operands

13 CS 136 13 The Fight for Bits How wide should instruction be? –Wider ⇒ can encode more registers, more options –Wider ⇒ bigger programs, more memory bandwidth –Bigger programs ⇒ fewer cache hits Things you need to encode: –Operation code (16 to 1000 instructions) –Operands (at least one, normally two or three) –Immediate operands –Memory offsets –Branch targets –Branch conditions –Conditional operations (e.g., conditional load, add)

14 CS 136 14 Two or Three Operands? In favor of three: –Smaller code size –No clobbered operands ⇒ fewer copies or reloads –Setting R0 to zero allows fewer operations supported in ALU In favor of two: –Can address more registers
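The "can address more registers" argument is simple bit arithmetic. As a sketch, assume a fixed budget of 15 instruction bits for register fields (the figure is an assumption; it matches three 5-bit fields) and split it evenly among the operands; `addressable_registers` is a hypothetical helper.

```python
# Sketch: how many registers fit when a fixed bit budget is split among operands.
# The 15-bit budget and the helper name are assumptions for illustration.

def addressable_registers(register_field_bits, operands):
    """Registers nameable when the budget is divided evenly among operand fields."""
    bits_per_operand = register_field_bits // operands
    return 2 ** bits_per_operand

three_operand = addressable_registers(15, 3)   # 5 bits per field
two_operand = addressable_registers(15, 2)     # 7 bits per field, 1 bit left over
```

Under these assumptions the two-operand format names four times as many registers, at the cost of overwriting one source on every operation.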

15 CS 136 15 How to Decide All These Questions? Slide rules at 50 paces? Analysis wars –Look at existing designs, existing programs –“Recompile” programs for hypothetical architecture »Analyze size of resulting program »Run through simulator to see how it performs –Impractical approach »Writing compiler back ends is expensive »Simulators are slow –Instead, make projections based on existing object code

16 CS 136 16 Example of Bad Analysis: @-(R2) DEC VAX had three “auto” addressing modes: autopostincrement, autopredecrement, and indirect autopostincrement What happened to indirect autopredecrement? –Analyzed output of BLISS compiler on many programs –Language didn’t provide way to express autopredecrement –Concluded it wasn’t necessary –Very different result if had analyzed C! *--p1 = a[--i];

17 CS 136 17 Example of Difficult Analysis: imm16 How big should an immediate be? Easy analysis: examine existing code –Calculate frequency of various widths –Analyze tradeoff of using those bits for other purposes Problem: analyzed architecture affects frequency of different widths –E.g., Alpha has only 16 bits, so you’ll never see over 16! –Alternative: look for multi-instruction sequences that effectively use more than 16 bits »Hard to find (compiler pipeline scheduling) »Compiler will stand on head, use sneaky tricks to avoid generating extra instructions –Need for wider constants depends on architecture »E.g., MIPS needs them when jumping to shared libraries
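The "easy analysis" amounts to a histogram of the minimum width each constant needs. The sketch below assumes signed two's-complement immediates; the sample constants are made up for illustration and are not measurements from any real program.

```python
from collections import Counter

# Sketch: width histogram for immediates. SAMPLE_CONSTANTS is invented data.

def min_signed_bits(value):
    """Smallest two's-complement width that can hold value."""
    if value >= 0:
        return value.bit_length() + 1        # one extra bit for the sign
    return (~value).bit_length() + 1

SAMPLE_CONSTANTS = [0, 1, -1, 4, 255, 1024, 40000, -32768, 65536]

width_histogram = Counter(min_signed_bits(c) for c in SAMPLE_CONSTANTS)
fits_in_16 = sum(1 for c in SAMPLE_CONSTANTS if min_signed_bits(c) <= 16)
```

The slide's caveat applies to any such count: if the constants are harvested from object code for a machine with 16-bit immediates, no single instruction will ever show a wider one, so the histogram is capped by the architecture being measured.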

19 CS 136 19 Interaction with Compilers Nearly all modern code generated by compilers Architect must make compiler’s job easier –Lots of registers –Orthogonal instruction set –Few side effects –Instructions and addressing modes matched to language constructs »But NOT attempt to implement them in detail! »Primitives are better than “solutions” even when solutions are correct –Good support for stack, globals, and pointers –Support for both compile-time and run-time binding –Don’t ask compiler to predict dynamic information (e.g., branch targets) –Don’t provide features language can’t express »Example pro and con: vector architectures

20 CS 136 20 The MIPS64 Architecture Extension of MIPS32 Data path widened to 64 bits –Still 32-bit instructions –Still only 32 registers Most instructions have “D” as prefix to indicate 64-bit version

21 CS 136 21 MIPS Instruction Formats (field widths in bits)
R-Type Instruction: Opcode (6) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6)
I-Type Instruction: Opcode (6) | rs (5) | rt (5) | Immediate (16)
J-Type Instruction: Opcode (6) | Offset inserted into PC (26)
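The three layouts on this slide can be expressed as shift-and-mask helpers. This is a sketch with hypothetical function names. The example values use the usual MIPS32 encodings for `add` (opcode 0, funct 0x20) and `addi` (opcode 8) with conventional register numbers, which are not stated in the slides.

```python
# Sketch of the three 32-bit MIPS formats; all helper names are assumptions.

def _field(value, width):
    """Check that value fits in an unsigned field of the given width."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"{value} does not fit in {width} bits")
    return value

def encode_r(opcode, rs, rt, rd, shamt, funct):
    """R-type: opcode(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return (_field(opcode, 6) << 26 | _field(rs, 5) << 21 | _field(rt, 5) << 16
            | _field(rd, 5) << 11 | _field(shamt, 5) << 6 | _field(funct, 6))

def encode_i(opcode, rs, rt, immediate):
    """I-type: opcode(6) rs(5) rt(5) immediate(16), stored as a 16-bit pattern."""
    if not -(1 << 15) <= immediate < (1 << 16):
        raise ValueError(f"{immediate} does not fit in 16 bits")
    return (_field(opcode, 6) << 26 | _field(rs, 5) << 21 | _field(rt, 5) << 16
            | (immediate & 0xFFFF))

def encode_j(opcode, offset):
    """J-type: opcode(6) offset(26)."""
    return _field(opcode, 6) << 26 | _field(offset, 26)

def decode_r(word):
    """Split a 32-bit word back into R-type fields."""
    return {"opcode": word >> 26, "rs": (word >> 21) & 0x1F,
            "rt": (word >> 16) & 0x1F, "rd": (word >> 11) & 0x1F,
            "shamt": (word >> 6) & 0x1F, "funct": word & 0x3F}

add_word = encode_r(0, 17, 18, 8, 0, 0x20)   # add $t0, $s1, $s2
addi_word = encode_i(8, 17, 8, -1)           # addi $t0, $s1, -1
jump_word = encode_j(2, 0x100)               # j with a 26-bit offset field of 0x100
```

Because the register fields sit at the same bit positions in the R and I formats, a decoder can read `rs` and `rt` before it knows which format it is looking at.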

22 CS 136 22 I-Type Instructions
Format: Opcode (6) | rs (5) | rt (5) | Immediate (16)
Encodes loads, stores (all widths), immediate ALU ops
Also conditional branches (rt unused)

23 CS 136 23 R-Type Instructions
Format: Opcode (6) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6)
Register-register ALU operations
–“funct” encodes the ALU operation: add, sub, etc.
–Opcode chooses operands, special registers, sizes, etc.
–Conditional moves
Handles special registers, floating point, …

24 CS 136 24 J-Type Instructions
Format: Opcode (6) | Offset inserted into PC (26)
Jump, jump and link
Trap, return from exception

25 CS 136 25 MIPS Control Flow Unconditional jump substitutes low bits of PC –NOT addition! –Exceptionally bad on 64-bit architecture, where 36 bits unchanged No built-in stack –Subroutine call stores return in register –Callee must save on stack if necessary –Reduces overall cycle time –Ultra-efficient for leaf functions Conditional branches only test against zero –Complex tests (e.g., <) store Z/NZ result in a register –We’ve seen how this improves the pipeline Conditional moves can eliminate many branches –Feature of many modern architectures
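The difference between substituting low PC bits and adding an offset shows up clearly in code. This sketch assumes the usual MIPS conventions that are not spelled out on the slide: the 26-bit field is shifted left by 2, and the upper bits come from the address of the instruction after the jump. The function names are hypothetical.

```python
# Sketch: jump (bit substitution) versus branch (addition) target computation.
# Function names and the PC+4 convention are assumptions stated in the lead-in.

LOW_BITS = 28                       # 26-bit offset field shifted left by 2
LOW_MASK = (1 << LOW_BITS) - 1

def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a signed number."""
    sign_bit = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return (value ^ sign_bit) - sign_bit

def jump_target(pc, offset26):
    """J-type: replace the low 28 bits; the upper bits of PC+4 are untouched."""
    return ((pc + 4) & ~LOW_MASK) | ((offset26 << 2) & LOW_MASK)

def branch_target(pc, offset16):
    """Conditional branch: PC-relative, so the offset is added, not substituted."""
    return pc + 4 + (sign_extend(offset16, 16) << 2)

def jump_can_reach(pc, target):
    """A single jump only reaches targets that share the upper bits of PC+4."""
    return (pc + 4) >> LOW_BITS == target >> LOW_BITS

unchanged_bits_64 = 64 - LOW_BITS   # upper bits a jump cannot alter on MIPS64
target = jump_target(0x1_2000_0000, 0x100)
```

With 64-bit addresses, `unchanged_bits_64` comes out to the 36 bits the slide complains about: a jump can only move within the current 256 MB region, and anything farther needs an address in a register.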

26 CS 136 26 MIPS Floating Point Floating point was originally coprocessor ⇒ Separate FP registers –Special instructions to move to/from integer registers MIPS64 (but not 32) has paired single operations –Two SP numbers pass through DP ALU simultaneously MIPS64 also has multiply-add in one instruction –Useful in signal processing (multimedia)

27 CS 136 27 Fallacies and Pitfalls PITFALL: Instruction designed to support feature in some language –Examples: PDP-11/45 MARK, VAX CALLS, IBM 360 ED/EDMK –Why is this bad? »Easy to get wrong (PDP-11 MARK instruction) »Easy to make inefficient (VAX CALLS) »Languages evolve, hardware doesn’t

28 CS 136 28 Fallacies and Pitfalls (2) FALLACY: Typical programs exist –We wish! PITFALL: Ignoring the compiler –Design better code size, based on bad compiler –Good compiler can blow your idea out of the water FALLACY: Flawed architectures can’t succeed –Ummm, x86? –Every architecture has drawbacks FALLACY: You (YOU!) can design a flawless architecture –Always tradeoffs –Always something new to learn

29 CS 136 29 Summary Instruction encoding is important Don’t forget to provide what the compiler needs –This is NOT what you think the compiler needs! Addresses will only get wider Data will only get wider –Including characters Cleverness to improve bandwidth (e.g., MADD) RISC is here to stay

