5 Simple Instruction Cycle Model
Where is the time spent here?
6 Faster Processing
Can be achieved through:
- Faster cycle time
- Dividing the cycle into more states
- Implementing parallelism
7 Prefetch
Consider the instruction sequence as:
- Fetch instruction
- Execute instruction (often does not access main memory)
Can the computer fetch the next instruction during execution of the current instruction?
- This is called instruction prefetch
- What are the implications of prefetch?
8 A Two Stage Instruction Pipeline
What additional hardware is required for prefetch?
9 Improved Performance with Prefetch
Improved speed, but not doubled. Why?
- Fetch is usually shorter than execution
- Any jump or branch means that the prefetched instructions are not the required instructions
Could we prefetch more than one instruction?
Could we add “more stages” to further improve performance?
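A minimal sketch of why prefetch improves speed without doubling it. The function names, the time units, and the cost model are assumptions made for illustration: each step of the two-stage pipeline is taken to last as long as the slower of fetch and execute, and a taken branch is assumed to waste the prefetched instruction, so the next fetch cannot overlap with execution.

```python
def sequential_time(n, fetch, execute):
    """Time for n instructions when fetch and execute never overlap."""
    return n * (fetch + execute)


def prefetch_time(n, fetch, execute, branch_fraction=0.0):
    """Time for n instructions (n >= 1) with a two-stage fetch/execute pipeline.

    branch_fraction is the assumed fraction of instructions that are
    taken branches and therefore discard the prefetched instruction.
    """
    overlapped = max(fetch, execute)   # next fetch hidden behind execute
    wasted = execute + fetch           # prefetch discarded, fetch again
    step = (1 - branch_fraction) * overlapped + branch_fraction * wasted
    # First fetch is not overlapped, last execute has no fetch to hide.
    return fetch + (n - 1) * step + execute


if __name__ == "__main__":
    n = 100
    for fetch, execute, branches in [(1, 1, 0.0), (1, 2, 0.0), (1, 2, 0.2)]:
        t_seq = sequential_time(n, fetch, execute)
        t_pre = prefetch_time(n, fetch, execute, branches)
        print(fetch, execute, branches, round(t_seq / t_pre, 2))
```

Under this model the speedup approaches 2 only when fetch and execute take equal time and no branches are taken; a longer execute stage or a nonzero branch fraction pulls it lower.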
10 Instruction Cycle with Indirect Addressing
What is the benefit of this organization?
11 Five State Instruction Cycle
- Fetch instruction
- Decode instruction
- Fetch operands (calculate address and get data)
- Execute (process data)
- Write results (calculate address and store data)
13 Pipelining
Consider the instruction sequence as:
- Fetch Instruction (FI)
- Decode Instruction (DI)
- Calculate Operand Addresses (CO)
- Fetch Operands (FO)
- Execute Instruction (EI)
- Write Operand (WO)
- Check for Interrupt (CI)
Consider it as an “assembly line” of operations. Then we can begin the next instruction's assembly line sequence before the last has finished. In fact, we can fetch the next instruction while the present one is being decoded.
This is pipelining.
14 Pipeline “stations”
Let’s define a possible set of pipeline stations:
- Fetch Instruction (FI)
- Decode Instruction (DI)
- Calculate Operand Addresses (CO)
- Fetch Operands (FO)
- Execute Instruction (EI)
- Write Operand (WO)
15 Possible Timing Diagram for Instruction Pipeline Operation
Limitations:
- The cycle time is set by the maximum time needed for any stage
- Some instructions pass through stages they do not need
- Overhead of transferring data between stages
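The timing diagram can be regenerated with a short sketch. It assumes an idealized six-stage pipeline (FI, DI, CO, FO, EI, WO) in which every stage takes one time unit and nothing stalls; the function names are illustrative only. With k stages and n instructions, the last instruction finishes at time k + n - 1.

```python
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]


def timing_diagram(n_instructions, stages=STAGES):
    """Return one row per instruction; row[t] is the stage active at time t+1."""
    k = len(stages)
    total_cycles = k + n_instructions - 1
    rows = []
    for i in range(n_instructions):
        row = ["--"] * total_cycles
        for s, name in enumerate(stages):
            row[i + s] = name      # instruction i enters stage s at time i + s
        rows.append(row)
    return rows


def speedup(n_instructions, k):
    """Ideal speedup over running each instruction through all k stages alone."""
    return (n_instructions * k) / (k + n_instructions - 1)


if __name__ == "__main__":
    for i, row in enumerate(timing_diagram(9), start=1):
        print(f"I{i}: " + " ".join(row))
    print("speedup:", round(speedup(9, 6), 2))
```

For nine instructions this idealized pipeline needs 14 time units instead of 54. The limitations listed on the slide are exactly what this model ignores.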
16 The Impact of a Conditional Branch on Instruction Pipeline Operation
Instruction 3 is a conditional branch to instruction 15.
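A rough sketch of the branch penalty in the six-stage pipeline. It assumes that instruction i enters FI at time i, that each stage takes one time unit, and that the branch outcome is known only at the end of the stage named by `resolve_stage`. The function name and that parameter are hypothetical.

```python
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]


def flushed_by_branch(branch_instr, resolve_stage="EI", stages=STAGES):
    """Return (instructions flushed, time at which the target is fetched)."""
    # Time unit in which the branch occupies the resolving stage.
    resolve_time = branch_instr + stages.index(resolve_stage)
    # Every instruction fetched after the branch, up to and including
    # that time unit, was fetched down the wrong path.
    flushed = list(range(branch_instr + 1, resolve_time + 1))
    return flushed, resolve_time + 1


if __name__ == "__main__":
    flushed, target_time = flushed_by_branch(3)
    print("flushed instructions:", flushed)
    print("instruction 15 is fetched at time", target_time)
```

With the branch resolved in EI, instructions 4 through 7 are discarded and instruction 15 is fetched at time 8. Passing an earlier `resolve_stage` shows how much of the penalty disappears when the branch outcome is known sooner.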
17 Alternative Pipeline View
Instruction 3 is a conditional branch to instruction 15.
21 Structural Hazards
Structural hazards occur when instructions in the pipeline need the same resource:
- Memory
- CPU
- Etc.
22 Example: Resource Hazard
The fetch of I3 has to stall while I1 accesses memory for its operand.
23 Data Hazard
Data hazards occur when there is a conflict in the access of:
- a memory location, or
- a register
24 Types of Data Hazards
- Read after Write (RAW): true dependency
  - A hazard occurs if the read occurs before the write is complete
- Write after Read (WAR): anti-dependency
  - A hazard occurs if the write occurs before the read happens
- Write after Write (WAW): output dependency
  - A hazard occurs if the two writes occur in the reverse of the intended order
25 Example: RAW Data Hazard
The second instruction needs to stall until the first instruction has written EAX before it can fetch that operand.
Is there a way of stalling one cycle instead of two?
26 The Other Data Hazards
- Write after Read (WAR): anti-dependency
  - A hazard occurs if the write occurs before the read happens
  - Example?
- Write after Write (WAW): output dependency
  - A hazard occurs if the two writes occur in the reverse of the intended order
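One way to make the three hazard types concrete is to compare the read and write sets of two instructions. The sketch below only detects the dependencies; whether they become actual hazards depends on how the pipeline overlaps or reorders the two instructions. The dictionary layout and the register names R1 to R8 are hypothetical.

```python
def potential_hazards(first, second):
    """Classify dependencies between two instructions in program order.

    Each instruction is a dict with 'reads' and 'writes' sets holding
    register or memory-location names.
    """
    hazards = set()
    if first["writes"] & second["reads"]:
        hazards.add("RAW")   # second reads what first writes
    if first["reads"] & second["writes"]:
        hazards.add("WAR")   # second overwrites what first still reads
    if first["writes"] & second["writes"]:
        hazards.add("WAW")   # both write the same location
    return hazards


i1 = {"reads": {"R2", "R3"}, "writes": {"R1"}}   # R1 = R2 + R3
i2 = {"reads": {"R1", "R5"}, "writes": {"R4"}}   # R4 = R1 - R5
i3 = {"reads": {"R6", "R7"}, "writes": {"R2"}}   # R2 = R6 + R7
i4 = {"reads": {"R8"}, "writes": {"R1"}}         # R1 = R8

if __name__ == "__main__":
    print(potential_hazards(i1, i2))
    print(potential_hazards(i1, i3))
    print(potential_hazards(i1, i4))
```

Here i1 followed by i2 is a RAW case, i1 followed by i3 is a WAR case, and i1 followed by i4 is a WAW case.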
27 Control Hazard
Control hazards occur when a wrong fetch decision results in a new instruction fetch and the pipeline being flushed.
Solutions include:
- Multiple pipeline streams
- Prefetching the branch target
- Using a loop buffer
- Branch prediction
- Delayed branch
- Reordering of instructions
- Multiple copies of registers
- Get branch target early
28 Multiple Streams
- Have two pipelines
- Prefetch each branch into a separate pipeline
- Use the appropriate pipeline
Challenges:
- Leads to bus and register contention
- Multiple branches lead to further pipelines being needed
29 Prefetch Branch Target
- The target of the branch is prefetched in addition to the instructions following the branch
- Keep the target until the branch is executed
30 Using a Loop Buffer
- Have a small, fast memory to hold the past n instructions, perhaps already decoded
- This likely contains loops that are executed repeatedly
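A small sketch of the loop buffer idea, assuming the buffer simply keeps the n most recently fetched instructions keyed by address. The class name, its methods, and the address trace are hypothetical; real loop buffers are hardware structures with their own addressing rules.

```python
from collections import OrderedDict


class LoopBuffer:
    """Holds the most recently fetched instructions, up to `capacity`."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # address -> instruction, oldest first

    def lookup(self, address):
        """Return the buffered instruction, or None on a miss."""
        return self.entries.get(address)

    def record(self, address, instruction):
        """Remember a fetched instruction, evicting the oldest if full."""
        if address in self.entries:
            self.entries.move_to_end(address)
        self.entries[address] = instruction
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)


def count_hits(trace, capacity):
    """Count fetches in an address trace that are served by the buffer."""
    buffer = LoopBuffer(capacity)
    hits = 0
    for address in trace:
        if buffer.lookup(address) is not None:
            hits += 1
        buffer.record(address, f"instr@{address}")
    return hits


if __name__ == "__main__":
    # Address 0 runs once, then a three-instruction loop (1, 2, 3) runs three times.
    trace = [0, 1, 2, 3, 1, 2, 3, 1, 2, 3]
    print(count_hits(trace, capacity=4))
    print(count_hits(trace, capacity=2))
```

When the buffer is large enough to hold the whole loop body, every fetch after the first iteration is a hit; when the loop does not fit, the buffer gives no benefit in this trace.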
32 Branch Prediction
- Predict branch never taken
- Predict branch always taken
- Predict by opcode
- Use a predict branch taken/not taken switch
- Maintain a branch history table
- Get help from the compiler
33 Predict Branch Taken / Not Taken
- Predict never taken
  - Assume that the jump will not happen
  - Always fetch the next instruction
- Predict always taken
  - Assume that the jump will happen
  - Always fetch the target instruction
Which is better? Consider possible page faults.
34 Branch Prediction by Opcode / Switch
- Predict by opcode
  - Some instructions are more likely to result in a jump than others
  - Can get up to 75% success with this strategy
- Taken/not taken switch
  - Based on previous history
  - Good for loops
  - Perhaps good to match programmer style
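The taken/not taken switch can be sketched as a small state machine. The slide does not say how many history bits the switch keeps; the version below assumes a two-bit saturating counter, a common textbook choice, and the class and function names are illustrative.

```python
class TwoBitPredictor:
    """States 0 and 1 predict not taken; states 2 and 3 predict taken."""

    def __init__(self, state=0):
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def accuracy(outcomes, predictor):
    """Fraction of branch outcomes predicted correctly, updating as we go."""
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct / len(outcomes)


if __name__ == "__main__":
    # A loop-closing branch: taken nine times, then falls through; run three times.
    loop_branch = ([True] * 9 + [False]) * 3
    print(accuracy(loop_branch, TwoBitPredictor()))
```

After the first pass warms the counter up, this predictor misses only the loop exit in each pass, which is why history-based switches suit loops.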
39 Delayed Branch
In a delayed branch, the branch is moved ahead of “independent instructions” that precede it. The instructions that now follow the branch can then be executed while the branch target is being determined.
What would it take to actually do this?
40 Instruction Reordering
Instruction reordering requires a judicious reordering of instructions so that data hazards can be eliminated.
How can this be implemented?
41 Multiple Copies of Registers
Having multiple copies of registers, perhaps as many as one set for each stage, can eliminate many data hazards.
How would you implement this?
42 Get Branch Target Early
The branch target is often available before the end of the pipeline; e.g., a JMP has it available as soon as the source operand stage is completed. There is no need to wait until the completion of the write back stage to begin fetching the next instruction.
What would it take to implement this?
43 Example: Intel 80486 Pipelining
- Fetch (Fetch)
  - From cache or external memory
  - Put in one of two 16-byte prefetch buffers
  - Fill buffer with new data as soon as old data is consumed
  - Average of 5 instructions fetched per load
  - Independent of other stages to keep buffers full
- Decode stage 1 (D1)
  - Opcode and address-mode info
  - At most the first 3 bytes of the instruction
  - Can direct the D2 stage to get the rest of the instruction
- Decode stage 2 (D2)
  - Expand opcode into control signals
  - Computation of complex address modes
- Execute (EX)
  - ALU operations, cache access, register update
- Writeback (WB)
  - Update registers and flags
  - Results sent to cache and bus interface write buffers