Recap
Multicycle Operations
  – MIPS Floating Point
Putting It All Together: the MIPS R4000 Pipeline

A.7. Another View: the MIPS R4300
64-bit embedded processor
  – Used in the Nintendo 64 game console, colour laser printers and network routers
Still uses the “classic” five-stage pipeline
FP has out-of-order completion
  – Hybrid scheme for providing precise exceptions: instructions are issued only when it is certain that preceding instructions will not cause an exception, which may stall the pipeline

A.8. Cross-Cutting Issues: Instruction Sets and Pipelining
Simple instruction sets make pipelining easier
They also allow scheduling of code
  – Instructions are reordered for maximum efficiency
      Statically, by the compiler
      Dynamically, by the hardware
  – E.g. a memory-to-memory addition on the VAX is one instruction; on a RISC it is four instructions (load, load, add, store)

Dynamic Scheduling
Simple pipelines:
  – Fetch an instruction and issue it (unless it must stall to prevent a data hazard)
Dynamic scheduling: the processor rearranges instructions to minimise stalls!
  – Out-of-order execution
  – Out-of-order completion!

Dynamic Scheduling
Normal pipeline:
  – A stalled add also stalls the sub behind it
Dynamic scheduling:
  – Hardware lets the sub execute while the add stalls
    fdivd %f2, %f4, %f0
    faddd %f0, %f8, %f10     ! Depends on fdivd through %f0
    fsubd %f8, %f14, %f12    ! Independent
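
The effect on the three instructions above can be sketched with a toy timing model. This is only an illustration under stated assumptions: the latencies (10 cycles for the divide, 3 for the add and subtract) are invented, one instruction is fetched per cycle, and structural, WAR and WAW hazards are ignored. The names `Instr` and `start_cycles` are hypothetical.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    name: str
    srcs: Tuple[str, ...]
    dst: str
    latency: int  # assumed execution latency in cycles (illustrative only)


# SPARC operand order: the destination register comes last.
PROGRAM = [
    Instr("fdivd", ("%f2", "%f4"), "%f0", 10),
    Instr("faddd", ("%f0", "%f8"), "%f10", 3),
    Instr("fsubd", ("%f8", "%f14"), "%f12", 3),
]


def start_cycles(program, in_order):
    """Return the cycle in which each instruction begins execution."""
    ready = {}   # register -> cycle at which its new value becomes available
    starts = []
    for i, ins in enumerate(program):
        fetched = i  # one instruction is fetched per cycle
        operands = max((ready.get(r, 0) for r in ins.srcs), default=0)
        start = max(fetched, operands)
        if in_order and starts:
            # A simple pipeline cannot start an instruction before its predecessor.
            start = max(start, starts[-1] + 1)
        starts.append(start)
        ready[ins.dst] = start + ins.latency
    return starts


print(start_cycles(PROGRAM, in_order=True))   # [0, 10, 11]
print(start_cycles(PROGRAM, in_order=False))  # [0, 10, 2]
```

In the in-order case the independent fsubd waits behind the stalled faddd; with dynamic scheduling it starts as soon as it has been fetched.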

Dynamic Scheduling
Issuing instructions is split into two steps:
  – Issue: decode, check for structural hazards
  – Read operands: check for data hazards
As soon as its operands are available, an instruction starts execution
  – Out-of-order execution
  – Out-of-order completion
  – Complicates exception handling!
  – Introduces WAR and WAW hazards

Dynamic Scheduling
Two approaches
  – Scoreboarding
      Centralised control
  – Tomasulo’s Algorithm
      Distributed control

Scoreboarding
Developed for the CDC 6600 (1964)
When the next instruction is stalled, later instructions can still be issued and executed if they do not depend on any active or stalled instruction
Requires multiple (or pipelined) functional units
  – E.g. CDC 6600: 4 FP units, 7 integer units and 5 memory units
  – Assume MIPS with 1 integer unit, 2 FP multipliers, 1 FP adder and 1 FP divider

Scoreboard
Determines data dependences, then determines when an instruction can read its operands and begin execution
Tracks when an instruction can write its result
Centralises all hazard detection and resolution

Scoreboard
Instructions may stall at:
  – Issue (first stage of old ID)
      Resolves WAW and structural hazards
  – Read Operands (second stage of old ID)
      Resolves RAW hazards
  – Write Results
      Resolves WAR hazards

Scoreboard
Keeps track of:
  – Instruction status
      Where is the instruction? (Issue, Read Operands, Execute, Write Results)
  – Functional unit status
      Busy? Destination for result. Sources for operands.
  – Register result status
      For each register that is the destination of an active instruction, which functional unit will write it
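
The three stall points and the bookkeeping above can be sketched in a few lines. This is a simplified illustration, not the CDC 6600 design: the class and method names (`Unit`, `Scoreboard`, `can_issue` and so on) are hypothetical, execution latency is not modelled, and instruction status is reduced to a single `operands_read` flag per functional unit.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Unit:
    """Functional unit status (simplified)."""
    busy: bool = False
    dst: Optional[str] = None
    # source register -> unit still due to produce it (None means the value is ready)
    pending: Dict[str, Optional[str]] = field(default_factory=dict)
    operands_read: bool = False


class Scoreboard:
    def __init__(self, unit_names):
        self.units = {name: Unit() for name in unit_names}
        self.result = {}  # register result status: register -> unit that will write it

    def can_issue(self, unit, dst):
        # Structural hazard: the unit is busy. WAW hazard: dst already has a pending writer.
        return not self.units[unit].busy and dst not in self.result

    def issue(self, unit, dst, srcs):
        assert self.can_issue(unit, dst)
        pending = {s: self.result.get(s) for s in srcs}
        self.units[unit] = Unit(True, dst, pending, False)
        self.result[dst] = unit

    def can_read_operands(self, unit):
        # RAW hazard: wait until every source has been produced.
        return all(p is None for p in self.units[unit].pending.values())

    def read_operands(self, unit):
        assert self.can_read_operands(unit)
        self.units[unit].operands_read = True

    def can_write_result(self, unit):
        # WAR hazard: wait while an earlier instruction still has to read the old dst value.
        dst = self.units[unit].dst
        for name, other in self.units.items():
            if name == unit or not other.busy or other.operands_read:
                continue
            if dst in other.pending and other.pending[dst] != unit:
                return False
        return True

    def write_result(self, unit):
        assert self.can_write_result(unit)
        for other in self.units.values():
            for src, producer in other.pending.items():
                if producer == unit:
                    other.pending[src] = None  # wake up instructions waiting on this result
        del self.result[self.units[unit].dst]
        self.units[unit] = Unit()


# Hypothetical unit names matching the assumed MIPS configuration.
sb = Scoreboard(["int", "mult1", "mult2", "add", "div"])
sb.issue("div", "f0", ["f2", "f4"])
sb.issue("add", "f10", ["f0", "f8"])
print(sb.can_read_operands("add"))   # False: RAW on f0
print(sb.can_issue("mult1", "f10"))  # False: WAW on f10
print(sb.can_issue("add", "f12"))    # False: structural hazard on the adder
```

Recording the producer of each source at issue time is what lets the write check tell a WAR hazard (an earlier reader of the old value) from a later instruction that is simply waiting for this result.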

Scoreboard
Relatively simple to implement
  – Main “cost” is the extra buses connecting multiple functional units
Limited by:
  – Amount of parallelism (independent instructions)
  – Number of scoreboard entries (the window used for instruction look-ahead)
  – Number and type of functional units
  – Antidependences and output dependences (causing WAR and WAW stalls)

A.9. Fallacies and Pitfalls
Pitfall: Unexpected execution sequences may cause unexpected hazards
  – E.g. WAW caused by the compiler filling delay slots
Pitfall: Extensive pipelining can lead to poor price/performance
  – E.g. VAX microprogram pipeline

Fallacies and Pitfalls
Pitfall: Evaluating a code scheduler with unoptimised code

Concluding Remarks
Before 1980, pipelining was used only in expensive supercomputers and high-end mainframes
Mid-1980s: adopted by high-end microprocessors
  – Displaced minicomputers and mainframes
1990s: desktop processors using sophisticated pipelines
  – Dynamic scheduling, multiple issue, etc.

Chapter 3
Instruction-Level Parallelism and Its Dynamic Exploitation

Introduction
Chapter 3: dynamic techniques using hardware
  – Pentium, Athlon, MIPS, SPARC, etc.
Chapter 4: static techniques using software
  – Itanium

3.1. Instruction-Level Parallelism
Pipelining overlaps independent instructions
  – Instruction-level parallelism (ILP)
Extend the basic concept of pipelining:
  – Reducing hazards
  – Increasing performance by exploiting further parallelism

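Pipeline Performance
Pipeline CPI = ideal pipeline CPI + structural stalls + data hazard stalls + control stalls
Ideal CPI is the maximum performance attainable
Seek to reduce each term as far as possible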

Finding ILP
Basic block
  – A single straight-line code sequence with no branches in except at the entry and none out except at the exit
  – Very little ILP
  – A branch frequency of 15% (integer code) gives six or seven instructions per block (1/0.15 ≈ 6.7), and these are largely dependent on one another
Need more ILP!
  – Need to look across basic blocks

Finding ILP
Loop-level parallelism
  – Often loop iterations are independent
    for (int k = 0; k < 1000; k++)
        x[k] = x[k] + y[k];
  – 1000 independent “blocks”
  – Use “loop unrolling”
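
As a rough illustration of loop unrolling, the loop above is transliterated here into Python and unrolled by hand. The unroll factor of four and the function names are assumptions made for this sketch; in practice a compiler performs the transformation on the machine-level loop.

```python
def add_rolled(x, y):
    """The original loop: one element update per loop-control test."""
    for k in range(len(x)):
        x[k] = x[k] + y[k]


def add_unrolled_by_4(x, y):
    """The same computation with the loop body replicated four times."""
    n = len(x)
    k = 0
    # Main body: four independent updates share one loop-control test and branch.
    while k + 4 <= n:
        x[k] = x[k] + y[k]
        x[k + 1] = x[k + 1] + y[k + 1]
        x[k + 2] = x[k + 2] + y[k + 2]
        x[k + 3] = x[k + 3] + y[k + 3]
        k += 4
    # Clean-up loop for lengths that are not a multiple of four.
    while k < n:
        x[k] = x[k] + y[k]
        k += 1
```

The four statements in the unrolled body touch different elements, so a scheduler is free to overlap them, and the loop overhead is paid once per four elements instead of once per element.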

Dependences
Which instructions are dependent on each other?
Three types of dependence:
  – Data dependence
  – Name dependence
      Antidependence
      Output dependence
  – Control dependence

Data Dependence
One instruction requires a result from another
  – directly or indirectly
  – through registers or memory
Order must be maintained
Data dependences are determined by the program
Whether a data dependence becomes a RAW hazard, and whether that hazard causes a stall, is determined by the pipeline

Name Dependence
Two instructions use the same register or memory location (i.e. name), but there is no data flow between them
Antidependence
  – Instr i reads, instr j (after i) writes (WAR)
Output dependence
  – Instr i writes, instr j (after i) writes (WAW)
Use “register renaming” to remove name dependences
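
The distinction between data and name dependences, and the effect of register renaming, can be sketched as follows. The instruction encoding `(dst, srcs)`, the example program, the fresh names `t0, t1, ...` and both function names are assumptions for illustration. The classifier only compares two instructions directly and ignores any redefinition between them.

```python
from itertools import count


def dependences(first, second):
    """Classify how `second` depends on the earlier instruction `first`."""
    d1, s1 = first
    d2, s2 = second
    found = set()
    if d1 in s2:
        found.add("data (RAW)")    # second reads what first writes
    if d2 in s1:
        found.add("anti (WAR)")    # second overwrites what first reads
    if d1 == d2:
        found.add("output (WAW)")  # both write the same name
    return found


def rename(program):
    """Give every result a fresh name, keeping only true data flow."""
    fresh = (f"t{n}" for n in count())
    current = {}  # architectural register -> latest fresh name holding its value
    renamed = []
    for dst, srcs in program:
        new_srcs = tuple(current.get(s, s) for s in srcs)
        current[dst] = next(fresh)
        renamed.append((current[dst], new_srcs))
    return renamed


PROGRAM = [
    ("f0", ("f2", "f4")),
    ("f6", ("f0", "f8")),    # reads f0: data dependence on instruction 0
    ("f8", ("f10", "f14")),  # writes f8: antidependence with instruction 1
    ("f6", ("f10", "f8")),   # writes f6: output dependence with instruction 1
]

print(dependences(PROGRAM[1], PROGRAM[2]))  # {'anti (WAR)'}
print(dependences(PROGRAM[1], PROGRAM[3]))  # {'output (WAW)'}
renamed = rename(PROGRAM)
print(dependences(renamed[1], renamed[2]))  # set()
print(dependences(renamed[0], renamed[1]))  # {'data (RAW)'}
```

After renaming, only the data dependences remain, which is why renaming lets WAR and WAW pairs execute in either order.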

Avoiding Data Hazards
Need techniques for exploiting parallelism that preserve program order only where it affects the results

Control Dependences
Determine the order of execution of instructions with respect to branches
S1 is control dependent on c1:
    if (c1) {
        S1;
    }

Control Dependences
If instruction i is control dependent on a branch, it cannot be moved before the branch
If instruction i is not control dependent on a branch, it cannot be moved after the branch (where the branch would then control its execution)

Control Dependences
Control dependences can be violated, provided we maintain:
  – Exception behaviour
  – Data flow
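
A small software analogy may help show why exception behaviour matters when an instruction is moved above its controlling branch. The functions below are hypothetical and stand in for a guarded load: a dictionary lookup that can raise plays the role of a load that can fault.

```python
def guarded(table, key):
    if key in table:        # c1
        return table[key]   # S1: control dependent on c1
    return None


def hoisted(table, key):
    value = table[key]      # S1 moved above the branch: may now raise KeyError
    if key in table:
        return value
    return None


def speculated(table, key):
    value = table.get(key)  # non-faulting lookup, safe to execute early
    if key in table:
        return value        # the early result is used only on the original path
    return None


print(guarded({}, "a"))            # None
print(speculated({}, "a"))         # None
print(speculated({"a": 1}, "a"))   # 1
```

`hoisted` violates the control dependence and changes the exception behaviour, since it raises for a missing key where the original did not. `speculated` also executes the lookup early, but it raises nothing new and delivers the value only along the original path, so both exception behaviour and data flow are preserved.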
