Feb 2005, Pipelining 3

Project 4 - Pipelined MIPS

Project 4 - What to Do
- Download & simulate basic model
- Extend processor to do either
  - Data forwarding + load/use stall (see Fig. 6-36/6.33), OR
  - Branch implementation in ID including IF.Flush (see Fig. 6-38)
- Simulate extended processor to show it works
- You may work in groups of two

Review - Pipelined Processor with Hazard Detection (Fig. 6.36)
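
The load/use stall and forwarding decisions made by this hardware can be sketched in Python. This is a minimal sketch, not the project's model: pipeline registers are assumed to be plain dictionaries, and the field names (`mem_read`, `reg_write`, `rs`, `rt`, `rd`) are hypothetical stand-ins for signals such as ID/EX.MemRead and EX/MEM.RegisterRd.

```python
def load_use_stall(id_ex, if_id):
    """True when the instruction in ID reads the register a load in EX will write."""
    return bool(id_ex["mem_read"] and id_ex["rt"] in (if_id["rs"], if_id["rt"]))


def forward_select(ex_mem, mem_wb, src_reg):
    """Choose where one ALU operand comes from: 'EX/MEM', 'MEM/WB', or 'REG'."""
    # The newest result (EX/MEM) has priority; register $0 is never forwarded.
    if ex_mem["reg_write"] and ex_mem["rd"] != 0 and ex_mem["rd"] == src_reg:
        return "EX/MEM"
    if mem_wb["reg_write"] and mem_wb["rd"] != 0 and mem_wb["rd"] == src_reg:
        return "MEM/WB"
    return "REG"


# lw $t0, 20($t2) in EX, addu $t1, $t0, $t2 in ID ($t0 = 8, $t2 = 10)
stall = load_use_stall({"mem_read": True, "rt": 8}, {"rs": 8, "rt": 10})
```

Here `stall` is true, so the pipeline would hold the `addu` in ID for one cycle; after that, the loaded value can be forwarded from MEM/WB.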

Pipelined Processor - Branch Hardware in ID (Old Fig. 6.51)

Instruction Level Parallelism (ILP)
- Parallel execution of instructions is known as Instruction Level Parallelism (ILP)
- Pipelining exploits ILP by overlapping execution
- ILP limited by
  - Data hazards
  - Control hazards

Static Multiple Issue
- Key idea: issue (decode & execute) multiple instructions in each clock cycle
- Example: Issue load/store and ALU/branch in MIPS (Fig. 6.44)

Instruction type   Pipe stages
ALU or branch      IF  ID  EX  MEM WB
Load/store         IF  ID  EX  MEM WB
ALU or branch          IF  ID  EX  MEM WB
Load/store             IF  ID  EX  MEM WB
ALU or branch              IF  ID  EX  MEM WB
Load/store                 IF  ID  EX  MEM WB
ALU or branch                  IF  ID  EX  MEM WB
Load/store                     IF  ID  EX  MEM WB
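
As an illustration of the pairing shown in the table, the following Python sketch packs a straight-line instruction sequence into two-slot issue packets. It is a simplified, hypothetical scheduler: instructions are assumed to be `(opcode, dest, sources)` tuples, only `lw`/`sw` count as load/store, branches and multi-cycle latencies are not modeled, and instructions are never reordered.

```python
NOP = ("nop", None, ())


def is_mem(instr):
    """True for the instructions that must use the load/store slot."""
    return instr[0] in ("lw", "sw")


def pack_dual_issue(program):
    """Greedy, in-order packing into (ALU/branch slot, load/store slot) packets."""
    packets = []
    i = 0
    while i < len(program):
        first = program[i]
        second = program[i + 1] if i + 1 < len(program) else None
        can_pair = (
            second is not None
            and is_mem(first) != is_mem(second)               # one of each type
            and first[1] not in second[2]                     # no read-after-write
            and (first[1] is None or first[1] != second[1])   # no write-after-write
        )
        if can_pair:
            alu, mem = (first, second) if is_mem(second) else (second, first)
            packets.append((alu, mem))
            i += 2
        else:
            # Fill the unused slot with a nop.
            packets.append((NOP, first) if is_mem(first) else (first, NOP))
            i += 1
    return packets


program = [
    ("lw", "$t0", ("$s1",)),
    ("addu", "$t0", ("$t0", "$s2")),
    ("sw", None, ("$t0", "$s1")),
    ("addi", "$s1", ("$s1",)),
]
packets = pack_dual_issue(program)
```

For this four-instruction sequence the sketch produces three packets: the dependent `lw` and `addu` each issue alone, while `sw` and `addi` share a packet because both read their registers in the same cycle.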

Example - A Static Multiple Issue MIPS (Fig. 6.45)
[Figure: datapath with one path that executes ALU/branch instructions and one that executes load/store instructions]

Static Multiple Issue Tradeoffs
- Advantage: increased performance
  - Real processors issue up to 6 instructions / cycle
- Several challenges:
  - Building a register file with lots of ports
  - Dealing with data dependencies
  - Stalls due to control dependencies (branch prediction helps!)
  - Building a memory system that can “keep up” (caches help!)
  - Finding opportunities to fully utilize the functional units

VLIW / EPIC Processors
- VLIW - Very Long Instruction Word
  - Functional units exposed in instruction word
  - Static scheduling by compiler
  - Pipeline is exposed; compiler must schedule delays to get right result
  - Examples: Philips Trimedia, Transmeta Crusoe
- Explicitly Parallel Instruction Computing (EPIC)
  - Three 41-bit instructions in each instruction packet
  - Compiler determines parallelism
  - Hardware checks dependencies and forwards/stalls
  - Examples: Intel Itanium, Itanium 2
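
To make the packet format concrete, here is a small Python sketch that packs three 41-bit instruction slots into one 128-bit word. The slide only states the three 41-bit slots; the 5-bit template field in the low bits (5 + 3 x 41 = 128) follows the Itanium bundle layout and is included here as an assumption beyond the slide. The function names are hypothetical.

```python
SLOT_BITS = 41
TEMPLATE_BITS = 5
SLOT_MASK = (1 << SLOT_BITS) - 1


def pack_bundle(template, slots):
    """Pack a template and three instruction slots into one 128-bit integer."""
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template must fit in 5 bits")
    if len(slots) != 3 or any(not 0 <= s <= SLOT_MASK for s in slots):
        raise ValueError("need exactly three 41-bit slots")
    bundle = template
    for index, slot in enumerate(slots):
        # Slot 0 sits just above the template, slot 2 in the top bits.
        bundle |= slot << (TEMPLATE_BITS + index * SLOT_BITS)
    return bundle


def unpack_bundle(bundle):
    """Inverse of pack_bundle: return (template, [slot0, slot1, slot2])."""
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slots = [(bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & SLOT_MASK for i in range(3)]
    return template, slots


bundle = pack_bundle(0x10, [1, 2, 3])
```

The template is what tells the hardware which functional unit each slot targets and where the compiler has marked the end of a group of independent instructions.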

Dynamic Multiple Issue
Key ideas:
- "Look past" stalls for instructions that can execute
    lw   $t0, 20($t2)
    addu $t1, $t0, $t2    # addu stalls until $t0 available
    sub  $s4, $s4, $s3    # sub is ready to execute but blocked by stall!
    slti $t5, $s4, 20
- Execute instructions out of order
- Use multiple functional units for parallel execution
- Forward results between functional units when necessary
- Update registers (in original order of execution)
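
The effect of looking past the stalled `addu` can be sketched with a toy single-issue model in Python. The assumptions are loud ones: instructions are `(name, dest, sources, latency)` tuples, the load result is taken to be usable two cycles after issue, only true (read-after-write) dependences are tracked as if registers were renamed, and the function names are hypothetical.

```python
def producers(program):
    """For each instruction, the indices of earlier instructions it reads from."""
    last_writer = {}
    deps = []
    for index, (_, dest, srcs, _) in enumerate(program):
        deps.append({last_writer[r] for r in srcs if r in last_writer})
        if dest is not None:
            last_writer[dest] = index
    return deps


def issue_cycles(program, out_of_order):
    """Cycle in which each instruction issues, at most one issue per cycle."""
    deps = producers(program)
    issued = {}  # instruction index -> issue cycle
    cycle = 0
    while len(issued) < len(program):
        waiting = [i for i in range(len(program)) if i not in issued]
        # In-order issue may only consider the oldest waiting instruction.
        candidates = waiting if out_of_order else waiting[:1]
        for i in candidates:
            if all(p in issued and issued[p] + program[p][3] <= cycle for p in deps[i]):
                issued[i] = cycle
                break
        cycle += 1
    return [issued[i] for i in range(len(program))]


program = [
    ("lw", "$t0", ("$t2",), 2),
    ("addu", "$t1", ("$t0", "$t2"), 1),
    ("sub", "$s4", ("$s4", "$s3"), 1),
    ("slti", "$t5", ("$s4",), 1),
]
in_order = issue_cycles(program, out_of_order=False)      # [0, 2, 3, 4]
out_of_order = issue_cycles(program, out_of_order=True)   # [0, 2, 1, 3]
```

With in-order issue the `sub` waits behind the stalled `addu`; with out-of-order issue it fills the stall cycle, and the sequence finishes issuing one cycle earlier.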

Speculation
- Guess about the outcome of an instruction (e.g., branch or load)
- Based on guess, start executing instructions
- Cancel started instructions if guess is incorrect
- Complicating factors
  - Must buffer instruction results until outcome known
  - Exceptions in speculated instructions - how can you have an exception in an instruction that didn’t execute?
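
The buffering requirement can be illustrated with a short Python sketch. This is an assumption-laden toy, not a description of any real processor: the class name and methods are hypothetical, only register writes are buffered, and a single outstanding guess is modeled.

```python
class SpeculativeBuffer:
    """Holds register writes made after a guess until the guess is resolved."""

    def __init__(self, registers):
        self.registers = registers  # committed architectural state
        self.pending = {}           # speculative writes, not yet visible

    def write(self, reg, value):
        self.pending[reg] = value

    def read(self, reg):
        # Speculative instructions see earlier speculative results first.
        return self.pending.get(reg, self.registers[reg])

    def resolve(self, guess_was_correct):
        if guess_was_correct:
            self.registers.update(self.pending)  # make results architectural
        self.pending.clear()                     # either way, nothing is pending


regs = {"$t0": 1, "$t1": 0}
buf = SpeculativeBuffer(regs)
buf.write("$t1", buf.read("$t0") + 5)  # executed under a branch guess
buf.resolve(guess_was_correct=False)   # wrong guess: $t1 is left untouched
```

The same idea extends to exceptions: an exception raised by a speculated instruction would be recorded alongside its buffered result and acted on only if the guess turns out to be correct.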

Dynamic Pipelining (Fig. 6.49)
- Instruction fetch and decode unit (in-order issue)
- Reservation stations (four shown), one in front of each functional unit
- Functional units: Integer, Floating point, Load/Store (out-of-order execute)
- Commit unit (in-order commit)
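
The commit unit's job, retiring results in program order even though they complete out of order, can be sketched as follows. This is a minimal sketch with hypothetical names: instructions are identified by unique strings, and the queue plays the role of a reorder buffer.

```python
from collections import deque


def run_commit_unit(issue_order, completion_order):
    """Return, for each completion, the instructions that can then commit."""
    rob = deque(issue_order)  # oldest instruction at the left
    done = set()
    trace = []
    for finished in completion_order:
        done.add(finished)
        retired = []
        # Commit only from the head, so registers update in program order.
        while rob and rob[0] in done:
            retired.append(rob.popleft())
        trace.append((finished, retired))
    return trace


trace = run_commit_unit(
    issue_order=["lw", "addu", "sub", "slti"],
    completion_order=["sub", "lw", "slti", "addu"],
)
```

In this run `sub` finishes first but cannot commit until `lw` and `addu` ahead of it have committed, so its result waits in the buffer.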

Dynamic Pipelining in the Pentium 4 (Fig. 6.50)

Dynamic Pipelining in the Pentium 4
Source: “The Microarchitecture of the Pentium® 4 Processor”, Intel Technology Journal, First Quarter 2001, http://developer.intel.com/technology/itj/q12001/articles/art_2.htm

Pentium 3 & 4 Pipeline Stages
- Drive stages - waiting for signal propagation
Source: “The Microarchitecture of the Pentium® 4 Processor”, Intel Technology Journal, First Quarter 2001, http://developer.intel.com/technology/itj/q12001/articles/art_2.htm

Simultaneous Multithreading (SMT)
- Key idea: extend processor to multiple threads of execution that execute concurrently
  - Each thread has its own PC and register state
  - Scheduling hardware shares functional units
  - Appears to software as two “separate” processors
- Advantage: when one thread stalls, another may be ready
- Proposed for servers, where multiple threads are common
[Figure: issue slots over time, with separate state for Thread A and Thread B sharing the functional units]
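
How shared issue slots hide one thread's stalls can be sketched in Python. The model is deliberately crude and its names are hypothetical: each thread supplies a per-cycle count of instructions it could issue (0 means stalled), unissued work is not carried over to later cycles, and priority simply rotates between threads.

```python
def fill_issue_slots(ready, slots_per_cycle):
    """Divide each cycle's issue slots among threads with ready instructions."""
    names = list(ready)
    cycles = len(ready[names[0]])
    schedule = []
    for cycle in range(cycles):
        free = slots_per_cycle
        issued = {}
        # Rotate priority each cycle so no thread is permanently favored.
        start = cycle % len(names)
        for name in names[start:] + names[:start]:
            take = min(free, ready[name][cycle])
            issued[name] = take
            free -= take
        schedule.append(issued)
    return schedule


ready = {"A": [2, 0, 1], "B": [1, 2, 2]}
schedule = fill_issue_slots(ready, slots_per_cycle=2)
used = sum(sum(cycle.values()) for cycle in schedule)
```

Thread A alone would fill only 3 of the 6 issue slots in these three cycles; with thread B sharing the functional units, all 6 are used, including the cycle in which A is stalled.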

Roadmap for the term: major topics
- Overview / Abstractions and Technology
- Performance
- Instruction sets
- Logic & arithmetic
- Processor Implementation
- Memory systems
- Input/Output