Lecture 4: Instruction Set Design/Pipelining

- Instruction set design (Sections ): control instructions, instruction encoding
- Basic pipelining implementation (Section A.1)

Control Transfer Instructions
Share of control transfer instructions, by type:
- Conditional branches (75% - Int) (82% - FP)
- Jumps (6% - Int) (10% - FP)
- Procedure calls/returns (19% - Int) (8% - FP)

Design issues:
- How do you specify the target address?
- How do you specify the condition?
- What happens on a procedure call/return?

Specifying the Target Address
- PC-relative: needs fewer bits to encode and is independent of how/where the compiled code is linked; used for branches and jumps. Typically, the displacement needs 4-8 bits.
- Register-indirect jumps: the address is not known at compile time and has to be computed at run time (note: can use any other addressing mode too). Used for:
  - procedure returns
  - case statements
  - virtual functions
  - function pointers
  - dynamically shared libraries
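A minimal sketch of the PC-relative idea, under stated assumptions: 4-byte instructions, and a displacement counted in instructions relative to the instruction after the branch. Both conventions, and the function names, are illustrative rather than taken from the lecture. It shows how a displacement is computed, how to check whether it fits in a given field width, and why the result does not change when the code is relocated.

```python
def fits_in_signed(value: int, bits: int) -> bool:
    """True if value is representable in a two's-complement field of `bits` bits."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


def branch_displacement(pc: int, target: int, instr_bytes: int = 4) -> int:
    """Displacement in instructions, relative to the instruction after the branch
    (an assumed convention; real ISAs differ in the details)."""
    delta = target - (pc + instr_bytes)
    if delta % instr_bytes != 0:
        raise ValueError("target is not instruction-aligned")
    return delta // instr_bytes


def branch_target(pc: int, displacement: int, instr_bytes: int = 4) -> int:
    """Inverse of branch_displacement: the address the branch transfers to."""
    return pc + instr_bytes + displacement * instr_bytes


# A short backward branch (e.g. a loop): 9 instructions back.
disp = branch_displacement(pc=0x1000, target=0x0FE0)
print(disp, fits_in_signed(disp, 8), fits_in_signed(disp, 4))

# Loading the same code at a different base address leaves the displacement unchanged.
base = 0x400000
print(branch_displacement(0x1000 + base, 0x0FE0 + base) == disp)
```

The second print illustrates the "independent of how/where the code is linked" point: only the distance between branch and target is encoded, not either absolute address.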

Specifying the Condition
| Name | Examples | How condition is tested | Advantages | Disadvantages |
|---|---|---|---|---|
| Condition code (CC) | 80x86, ARM, PowerPC, SPARC | Tests special bits set by ALU ops | Sometimes condition is set for free | CC is extra state. Instructions cannot be re-ordered |
| Condition register | Alpha, MIPS | Comparison sets register and this is tested | Simple | Register pressure |
| Compare and branch | PA-RISC, VAX | Comparison is part of the branch | One instruction instead of two | Complex pipelines |

Procedure Call/Returns
- Need to maintain a stack of return addresses (in memory or in hardware)
- Can copy and save all registers together, or this can be done selectively
- Who is responsible for saving registers?
  - Caller saving: correctness issues (a global register has to be made available to other procedures); the caller only saves values that it cares about
  - Callee saving: the callee saves only as many registers as it needs (provided it doesn’t call other procedures)
  - A combination of both is typically employed
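The split of responsibility can be sketched as two set intersections. The register names and the caller-saved/callee-saved partition below are hypothetical, loosely modeled on a typical software convention; they are not specified in the lecture.

```python
# Hypothetical partition of the register file (an assumption for illustration).
CALLER_SAVED = frozenset({"t0", "t1", "t2", "t3"})
CALLEE_SAVED = frozenset({"s0", "s1", "s2", "s3"})


def caller_must_save(live_across_call):
    """Caller saves only the caller-saved registers whose values it still needs
    after the call returns."""
    return sorted(set(live_across_call) & CALLER_SAVED)


def callee_must_save(written_by_callee):
    """Callee saves only the callee-saved registers it is actually going to overwrite."""
    return sorted(set(written_by_callee) & CALLEE_SAVED)


live = {"t0", "t3", "s1"}      # values the caller needs after the call
written = {"t0", "t1", "s0"}   # registers the callee's body writes

print(caller_must_save(live))     # ['t0', 't3']
print(callee_must_save(written))  # ['s0']
```

Under this combined convention, `s1` is never saved by anyone in the example: the caller relies on the callee to preserve it, and the callee does not touch it.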

Instruction Set Encoding
- Operations are easy to encode efficiently; the key issues are the number of operands and their addressing modes
- Few addressing modes → low complexity in decoding and pipelining, but greater code size
- Fixed instruction lengths → low complexity in decoding, but greater code size

Instruction Lengths

Dealing with Code Size in RISC
- Some hybrid versions allow for 16- and 32-bit instructions (40% reduction in code size); useful for embedded apps
- IBM PowerPC stores 32-bit instructions in compressed form in memory; this adds hardware complexity on an I-cache miss (the miss address must be translated from the uncompressed to the compressed address space, in addition to virtual-to-physical translation)
- Reducing the register file size can also reduce the instruction length
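A back-of-the-envelope sketch of the hybrid 16/32-bit case. It assumes the instruction count stays the same when some instructions are re-encoded in 16 bits, which is optimistic, since short encodings can require extra instructions. The function name and the 80% figure are illustrative, not from the lecture.

```python
def size_reduction(fraction_16bit: float) -> float:
    """Fractional code-size reduction relative to an all-32-bit encoding,
    assuming the number of instructions is unchanged (a simplifying assumption)."""
    baseline_bytes = 4.0
    mixed_bytes = 2.0 * fraction_16bit + 4.0 * (1.0 - fraction_16bit)
    return 1.0 - mixed_bytes / baseline_bytes


# Under this model the reduction is simply half the fraction of 16-bit instructions,
# so a 40% reduction would need about 80% of instructions in the short form.
print(round(size_reduction(0.8), 3))
```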

Compiler Optimizations
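The phase-ordering problem: early phases have to assume that register allocation will find a register for each value they create. If it cannot, optimizations such as common subexpression elimination may increase memory traffic, because the saved value has to be spilled to memory and reloaded.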

Register Allocation Issues
- Graph coloring: determine when variables are live and avoid allocating the same register to variables that are simultaneously live
- Stack variables (typically local to a procedure): easy to allocate registers for
- Global data: can be accessed from multiple places (aliasing), difficult to allocate to registers
- Heap data: dynamically created objects, accessed with pointers, difficult to allocate to registers because of aliasing
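A minimal sketch of graph-coloring allocation. It assumes each variable's liveness is a single inclusive interval of instruction indices and uses a simple greedy, highest-degree-first coloring; production allocators use more elaborate heuristics and spill-cost estimates. The variable names and intervals are made up for illustration.

```python
def build_interference(live_ranges):
    """live_ranges maps a variable name to an inclusive (start, end) interval.
    Two variables interfere if their intervals overlap, i.e. both are live at once."""
    names = list(live_ranges)
    graph = {name: set() for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            a_start, a_end = live_ranges[a]
            b_start, b_end = live_ranges[b]
            if a_start <= b_end and b_start <= a_end:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def color(graph, num_registers):
    """Greedy coloring: returns a register index per variable, or None for a spill."""
    assignment = {}
    # Visit the most constrained variables first (a heuristic, not an optimal order).
    for node in sorted(graph, key=lambda n: (-len(graph[n]), n)):
        taken = {assignment[m] for m in graph[node]
                 if assignment.get(m) is not None}
        free = [r for r in range(num_registers) if r not in taken]
        assignment[node] = free[0] if free else None
    return assignment


ranges = {"a": (0, 4), "b": (1, 2), "c": (3, 6), "d": (5, 7)}
graph = build_interference(ranges)
print(color(graph, 2))  # two registers suffice: no entry is None
print(color(graph, 1))  # with one register, some variables are spilled (None)
```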

Case Study: The MIPS ISA
- Load-store architecture
- Focus on pipelining, decoding, and compiler efficiency
- In other words, RISC

Registers
- 32 GPRs (general-purpose/integer registers) and 32 FPRs
- 64-bit registers; two single-precision FP values can fit in one register
- Register R0 is hardwired to zero; with displacement addressing mode, we can also accomplish absolute addressing. Other uses for R0?
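A toy model of a register file with R0 hardwired to zero, to make the "other uses" question concrete. The class and helper names are hypothetical, and the operations are simplified stand-ins for add, add-immediate, and displacement addressing. It shows absolute addressing as displacement(R0), a register move as an add with R0, loading a constant as an add-immediate with R0, and a write to R0 acting as a no-op.

```python
MASK64 = (1 << 64) - 1


class RegisterFile:
    """32 64-bit registers; register 0 always reads as zero and ignores writes."""

    def __init__(self, num_regs: int = 32):
        self._regs = [0] * num_regs

    def read(self, idx: int) -> int:
        return 0 if idx == 0 else self._regs[idx]

    def write(self, idx: int, value: int) -> None:
        if idx != 0:  # writes to R0 are silently discarded
            self._regs[idx] = value & MASK64


def add(regs, rd, rs, rt):
    regs.write(rd, regs.read(rs) + regs.read(rt))


def addi(regs, rd, rs, imm):
    regs.write(rd, regs.read(rs) + imm)


def effective_address(regs, base, displacement):
    """Displacement addressing: base register plus constant offset."""
    return (regs.read(base) + displacement) & MASK64


regs = RegisterFile()
addi(regs, 5, 0, 42)                              # load immediate: R5 = 0 + 42
add(regs, 6, 5, 0)                                # register move:  R6 = R5 + 0
addi(regs, 0, 0, 7)                               # write to R0: effectively a no-op
print(regs.read(5), regs.read(6), regs.read(0))   # 42 42 0
print(hex(effective_address(regs, 0, 0x2000)))    # absolute address 0x2000
```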

Instruction Format

Control Instructions
- Comparisons with zero can happen as part of the branch
- Compares between registers are placed in other registers that are tested by branches
- Jump-and-link places the return address in register R31
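The three points above can be sketched as a small simulation. Assumptions, none of them from the slide: 4-byte instructions, no branch delay slot, targets passed as absolute addresses for simplicity, and hypothetical function names standing in for set-less-than, branch-if-not-zero, jump-and-link, and jump-register.

```python
def set_less_than(regs, rd, rs, rt):
    """Compare two registers and place the outcome (1 or 0) in a third register."""
    regs[rd] = 1 if regs[rs] < regs[rt] else 0


def branch_not_zero(regs, rs, pc, target):
    """The comparison with zero happens as part of the branch; returns the next PC."""
    return target if regs[rs] != 0 else pc + 4


def jump_and_link(regs, pc, target):
    """Save the address of the following instruction in R31, then jump."""
    regs[31] = pc + 4
    return target


def jump_register(regs, rs):
    """Register-indirect jump, e.g. a procedure return through R31."""
    return regs[rs]


regs = [0] * 32
regs[1], regs[2] = 3, 5

set_less_than(regs, 3, 1, 2)                              # R3 = (R1 < R2)
pc = branch_not_zero(regs, 3, pc=0x100, target=0x200)     # taken, since R3 != 0
pc = jump_and_link(regs, pc, target=0x400)                # call; R31 holds the return address
pc = jump_register(regs, 31)                              # return
print(hex(pc))  # 0x204
```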

Instruction Frequencies

Summary
- In the 1960s, stack architectures were considered a good match for high-level languages
- In the 1970s, software costs were a concern; ISAs were enriched to make the compiler’s job easier: CISC
- In the 1980s, there was a push for simpler architectures (high clock speed and high parallelism): RISC
- ISAs designed in 1980 are still around!

The Assembly Line
- Unpipelined: start and finish a job before moving to the next
- Pipelined: break the job into smaller stages

[Figure: jobs A, B, C plotted against time, unpipelined vs. pipelined]

Performance Improvements?
- Does it take longer to finish each individual job?
- Does it take shorter to finish a series of jobs?
- What assumptions were made while answering these questions?
- Is a 10-stage pipeline better than a 5-stage pipeline?
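One way to explore these questions is a simple timing model. It assumes perfectly balanced stages, independent jobs with no stalls, and a fixed per-stage latch overhead; all of these are simplifications, and the numbers below are made up for illustration.

```python
def unpipelined_time(num_jobs, job_time):
    """Each job starts only after the previous one has finished."""
    return num_jobs * job_time


def cycle_time(job_time, stages, latch_overhead=0.0):
    """Time per stage: an equal slice of the work plus a fixed latch overhead."""
    return job_time / stages + latch_overhead


def job_latency(job_time, stages, latch_overhead=0.0):
    """Time for a single job to pass through every stage."""
    return stages * cycle_time(job_time, stages, latch_overhead)


def pipelined_time(num_jobs, job_time, stages, latch_overhead=0.0):
    """The first job takes `stages` cycles; each later job completes one cycle after
    the previous one (assuming no stalls)."""
    return (stages + num_jobs - 1) * cycle_time(job_time, stages, latch_overhead)


JOB_TIME, OVERHEAD = 10.0, 0.5
for stages in (5, 10):
    print(stages,
          job_latency(JOB_TIME, stages, OVERHEAD),           # per-job latency grows
          pipelined_time(1000, JOB_TIME, stages, OVERHEAD),  # long series: deeper is faster
          pipelined_time(1, JOB_TIME, stages, OVERHEAD))     # single job: deeper is slower
print(unpipelined_time(1000, JOB_TIME))
```

Under this model each individual job takes longer once latch overhead is counted, a long series of jobs finishes much sooner, and the 10-stage pipeline only wins when there are enough independent jobs to keep it full.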
