1 Pipelined Implementation. 2 Outline Handle Control Hazard Handle Exception Performance Analysis Suggested Reading 4.5.

Slides:

Advertisements

Similar presentations

Randal E. Bryant Carnegie Mellon University CS:APP CS:APP Chapter 4 Computer Architecture PipelinedImplementation Part II CS:APP Chapter 4 Computer Architecture.

Advertisements

University of Amsterdam Computer Systems – the processor architecture Arnoud Visser 1 Computer Systems The processor architecture.

Pipelining (Week 8).

Mehmet Can Vuran, Instructor University of Nebraska-Lincoln Acknowledgement: Overheads adapted from those provided by the authors of the textbook.

1 Seoul National University Wrap-Up. 2 Overview Seoul National University Wrap-Up of PIPE Design  Exception conditions  Performance analysis Modern.

Chapter 8. Pipelining. Instruction Hazards Overview Whenever the stream of instructions supplied by the instruction fetch unit is interrupted, the pipeline.

Real-World Pipelines: Car Washes Idea  Divide process into independent stages  Move objects through stages in sequence  At any instant, multiple objects.

PipelinedImplementation Part I PipelinedImplementation.

Instructor: Erol Sahin

– 1 – Chapter 4 Processor Architecture Pipelined Implementation Chapter 4 Processor Architecture Pipelined Implementation Instructor: Dr. Hyunyoung Lee.

Randal E. Bryant Carnegie Mellon University CS:APP2e CS:APP Chapter 4 Computer Architecture SequentialImplementation CS:APP Chapter 4 Computer Architecture.

Computer Architecture Carnegie Mellon University

PipelinedImplementation Part I CSC 333. – 2 – Overview General Principles of Pipelining Goal Difficulties Creating a Pipelined Y86 Processor Rearranging.

Wrap-Up CSC 333. – 2 – Overview Wrap-Up of PIPE Design Performance analysis Fetch stage design Exceptional conditions Modern High-Performance Processors.

Randal E. Bryant CS:APP Chapter 4 Computer Architecture SequentialImplementation CS:APP Chapter 4 Computer Architecture SequentialImplementation Slides.

Datapath Design II Topics Control flow instructions Hardware for sequential machine (SEQ) Systems I.

Pipelining III Topics Hazard mitigation through pipeline forwarding Hardware support for forwarding Forwarding to mitigate control (branch) hazards Systems.

David O’Hallaron Carnegie Mellon University Processor Architecture PIPE: Pipelined Implementation Part I Processor Architecture PIPE: Pipelined Implementation.

Data Hazard Solution 2: Data Forwarding Our naïve pipeline would experience many data stalls  Register isn’t written until completion of write-back stage.

Pipeline Hazard CT101 – Computing Systems. Content Introduction to pipeline hazard Structural Hazard Data Hazard Control Hazard.

1 Seoul National University Pipelined Implementation : Part I.

1 Naïve Pipelined Implementation. 2 Outline General Principles of Pipelining –Goal –Difficulties Naïve PIPE Implementation Suggested Reading 4.4, 4.5.

Randal E. Bryant Carnegie Mellon University CS:APP CS:APP Chapter 4 Computer Architecture SequentialImplementation CS:APP Chapter 4 Computer Architecture.

Randal E. Bryant adapted by Jason Fritts CS:APP2e CS:APP Chapter 4 Computer Architecture SequentialImplementation CS:APP Chapter 4 Computer Architecture.

Datapath Design I Topics Sequential instruction execution cycle Instruction mapping to hardware Instruction decoding Systems I.

Computer Architecture: Wrap-up CENG331 - Computer Organization Instructors: Murat Manguoglu(Section 1) Erol Sahin (Section 2 & 3) Adapted from slides of.

1 Sequential CPU Implementation. 2 Outline Logic design Organizing Processing into Stages SEQ timing Suggested Reading 4.2,4.3.1 ~

Pipeline Architecture I Slides from: Bryant & O’ Hallaron

PipelinedImplementation Part II PipelinedImplementation.

Computer Architecture: Pipelined Implementation - I

Computer Architecture adapted by Jason Fritts

1 Seoul National University Pipelining Basics. 2 Sequential Processing Seoul National University.

Pipelining IV Topics Implementing pipeline control Pipelining and performance analysis Systems I.

1 SEQ CPU Implementation. 2 Outline SEQ Implementation Suggested Reading 4.3.1,

Randal E. Bryant Carnegie Mellon University CS:APP2e CS:APP Chapter 4 Computer Architecture PipelinedImplementation Part II CS:APP Chapter 4 Computer Architecture.

Sequential Hardware “God created the integers, all else is the work of man” Leopold Kronecker (He believed in the reduction of all mathematics to arguments.

Rabi Mahapatra CS:APP3e Slides are from Authors: Bryant and O Hallaron.

Real-World Pipelines Idea –Divide process into independent stages –Move objects through stages in sequence –At any given times, multiple objects being.

1 Seoul National University Sequential Implementation.

Real-World Pipelines Idea Divide process into independent stages

Lecture 13 Y86-64: SEQ – sequential implementation

William Stallings Computer Organization and Architecture 8th Edition

Systems I Pipelining IV

Lecture 14 Y86-64: PIPE – pipelined implementation

Sequential Implementation

Administrivia Midterm to be posted on Tuesday after class

Computer Architecture adapted by Jason Fritts then by David Ferry

Morgan Kaufmann Publishers The Processor

Pipelined Implementation : Part I

Seoul National University

Seoul National University

Instruction Decoding Optional icode ifun valC Instruction Format

Pipelined Implementation : Part II

Systems I Pipelining III

Computer Architecture adapted by Jason Fritts

Systems I Pipelining II

Pipelined Implementation : Part I

Seoul National University

Control unit extension for data hazards

Pipeline Architecture I Slides from: Bryant & O’ Hallaron

Pipelined Implementation : Part I

Pipelined Implementation

Pipelined Implementation

Systems I Pipelining II

Chapter 4 Processor Architecture

Systems I Pipelining II

Pipelined Implementation

Real-World Pipelines: Car Washes

Sequential CPU Implementation

Sequential Design תרגול 10.

Presentation transcript:

1 Pipelined Implementation

2 Outline Handle Control Hazard Handle Exception Performance Analysis Suggested Reading 4.5

3 Control Hazard

4 0x000: irmovl Stack,%esp # Initialize stack pointer 0x006: call p # Procedure call 0x00b: irmovl $5,%esi # Return point 0x011: halt 0x020:.pos 0x20 0x020: p: irmovl $-1,%edi # procedure 0x026: ret 0x027: irmovl $1,%eax # Should not be executed 0x02d: irmovl $2,%ecx # Should not be executed 0x033: irmovl $3,%edx # Should not be executed 0x039: irmovl $4,%ebx # Should not be executed 0x100:.pos 0x100 0x100: Stack: # Stack: Stack pointer Return Example –Previously executed three additional instructions demo-retb.ys

5 Correct Return Example #demo_retb 0x026:ret bubble 0x00b:irmovl $5, %esi #return As ret passes through pipeline, stall at fetch stage –While in decode, execute, and memory stage –fetch the same instruction after ret 3 times. Inject bubble into decode stage Release stall when reach write-back stage FDEM WFDEM W FDEMW FDEMW FDEMWFDEMW F valC  5 rB  %esi F valC  5 rB  %esi W valM= 0x0b W valM= 0x0b

6 M D Register file Register file CC ALU rB dstEdstM ALU A B srcAsrcB ALU fun. Decode Execute AB M E Cnd icodevalEvalAdstEdstM E icodeifunvalCvalAvalBdstEdstMsrcAsrcB valCvalPicodeifunrA d_srcBd_srcA e_Cnd Sel+Fwd A Fwd B Detecting Return

7 0x026: ret FDEM W bubble FDEM W FDEMW FDEMW 0x00b:irmovl $5,% esi# Return FDEMW # demo-retb FDEMW Control for Return ConditionFDEMW Processing retstallbubblenormal ConditionTrigger Processing retIRET in { D_icode, E_icode, M_icode }

8 E M W F D rB icodevalEvalMdstEdstM CndicodevalEvalAdstEdstM icode ifunvalCvalAvalBdstEdstMsrcAsrcB valCvalP icode ifunrA predPC D_icode E_icode M_icode Pipe control logic D_bubble F_stall

9 Special Control Cases Detection ConditionTrigger Processing retIRET in { D_icode, E_icode, M_icode } Load/Use HazardE_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB } Mispredicted BranchE_icode = IJXX & !e_Cnd ConditionFDEMW Processing retstallbubblenormal Load/Use Hazardstall bubblenormal Mispredicted Branchnormalbubble normal Action

10 E M W F D CC rB srcA srcB icodevalEvalMdstEdstM CndicodevalEvalAdstEdstM icode ifunvalCvalAvalBdstEdstMsrcAsrcB valCvalP icode ifunrA predPC d_srcB d_srcA e_Cnd D_icode E_icode M_icode E_dstM Pipe control logic D_bubble D_stall E_bubble F_stall

11 Implementing Pipeline Control Combinational logic generates pipeline control signals Action occurs at start of following cycle

12 Initial Version of Pipeline Control bool F_stall = # Conditions for a load/use hazard E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB } || # Stalling at fetch while ret passes through pipeline IRET in { D_icode, E_icode, M_icode }; bool D_stall = # Conditions for a load/use hazard E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB };

13 Initial Version of Pipeline Control bool D_bubble = # Mispredicted branch (E_icode == IJXX && !e_Cnd) || # Stalling at fetch while ret passes through pipeline IRET in { D_icode, E_icode, M_icode }; bool E_bubble = # Mispredicted branch (E_icode == IJXX && !e_Cnd) || # Load/use hazard E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB};

14 Control Combinations –Special cases that can arise on same clock cycle Combination A –Not-taken branch – ret instruction at branch target Combination B –Instruction that reads from memory to %esp –Followed by ret instruction Load E Use D M Load/use JXX E D M Mispredict JXX E D M Mispredict E ret D M 1 E bubble D M ret 2 bubble E D ret M 3 E D M 1 E D M 1 E bubble D M ret 2 E bubble D M ret 2 bubble E D ret M 3 bubble E D ret M 3 Combination B Combination A

15 JXX E D M Mispredict JXX E D M Mispredict E ret D M 1 E D M 1 E D M 1 Combination A ConditionFDEMW Processing retstallbubblenormal Mispredicted Branchnormalbubble normal Combinationstallbubble normal F memory Instruction memory PC increment Select PC Fetch M_valA W_valM f_PC PC predPC Control Combination A

16 Control Combination A Should handle as mispredicted branch Stalls F pipeline register But PC selection logic will be using M_valA anyhow ConditionFDEMW Processing retstallbubblenormal Mispredicted Branchnormalbubble normal Combinationstallbubble normal

17 Control Combination B Would attempt to bubble and stall pipeline register D Signaled by processor as pipeline error Load E Use D M Load/use E ret D M E D M E D M Combination B ConditionFDEMW Processing retstallbubblenormal Load/Use Hazardstall bubblenormal Combinationstallbubble + stallbubblenormal

18 Handling Control Combination B Load/use hazard should get priority ret instruction should be held in decode stage for additional cycle ConditionFDEMW Processing retstallbubblenormal Load/Use Hazardstall bubblenormal Combinationstall bubblenormal

19 Corrected Pipeline Control Logic ConditionFDEMW Processing retstallbubblenormal Load/Use Hazardstall bubblenormal Combinationstall bubblenormal bool D_bubble = # Mispredicted branch (E_icode == IJXX && !e_Cnd) || # Stalling at fetch while ret passes through pipeline IRET in { D_icode, E_icode, M_icode } # but not condition for a load/use hazard && !(E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB });

20 Pipeline Summary Data Hazards –Most handled by forwarding No performance penalty –Load/use hazard requires one cycle stall Control Hazards –Cancel instructions when detect mispredicted branch Two clock cycles wasted –Stall fetch stage while ret passes through pipeline Three clock cycles wasted

21 Pipeline Summary Control Combinations –Must analyze carefully –First version had subtle bug Only arises with unusual instruction combination

Exception Handling 22

23 Exception Condition: Instruction encounter an error condition Deal flow: –Break the program flow –Invoke the exception handler provided by OS –(Maybe) continue the program flow E.g. page fault exception

24 Exceptions –Conditions under which pipeline cannot continue normal operation Causes –Halt instruction(Current) –Bad address for instruction or data(Previous) –Invalid instruction(Previous) –Pipeline control error(Previous) Desired Action –Complete some instructions Either current or previous (depends on exception type) –Discard others –Call exception handler Like an unexpected procedure call

25 Exception Examples Detect in Fetch Stage irmovl $100,%eax rmmovl %eax,0x10000(%eax) # invalid address jmp $-1 # Invalid jump target.byte 0xFF # Invalid instruction code halt # Halt instruction Detect in Memory Stage

26 Exceptions in Pipeline Processor #1 Desired Behavior – rmmovl should cause exception # demo-exc1.ys irmovl $100,%eax rmmovl %eax,0x10000(%eax) # Invalid address nop.byte 0xFF # Invalid instruction code 0x000: irmovl $100,%eax 0x006: rmmovl %eax,0x10000(%eax) 0x00c: nop 0x00d:.byte 0xFF 1234 FDEM FDE FD F W 5 M E D Exception detected

27 Exceptions in Pipeline Processor #2 Desired Behavior – No exception should occur # demo-exc2.ys 0x000: xorl %eax,%eax # Set condition codes 0x002: jne t # Not taken 0x007: irmovl $1,%eax 0x00d: irmovl $2,%edx 0x013: halt 0x014: t:.byte 0xFF # Target Exception detected 0x000: xorl %eax,%eax 0x002: jne t 0x014: t:.byte 0xFF 0x???: (I’m lost!) 0x007: irmovl $1,%eax 123 FDE FD F 4 M E F D W 5 M D F E E D M 6 M E W 7 W M 8 W 9

28 Maintaining Exception Ordering Add exception status field to pipeline registers Fetch stage sets to either “AOK”, “ADR” (when bad fetch address), or “INS” (illegal instruction) Decode & execute pass values through Memory either passes through or sets to “ADR” Exception triggered only when instruction hits write back FpredPC W icodevalEvalMdstEdstM stat MCndvalEvalAdstEdstM stat EicodeifunvalCvalAvalBdstEdstMsrcAsrcB stat DrBvalCvalPicodeifunrA stat icode

Exception Handling Logic Fetch Stage Memory Stage Writeback Stage dmem_error # Determine status code for fetched instruction int f_stat = [ imem_error: SADR; !instr_valid : SINS; f_icode == IHALT : SHLT; 1 : SAOK; ]; # Update the status int m_stat = [ dmem_error : SADR; 1 : M_stat; ]; int Stat = [ # SBUB in earlier stages indicates bubble W_stat == SBUB : SAOK; 1 : W_stat; ]; 29

30 Side Effects in Pipeline Processor Desired Behavior – rmmovl should cause exception –No following instruction should have any effect # demo-exc3.ys irmovl $100,%eax rmmovl %eax,0x10000(%eax) # invalid address addl %eax,%eax # Sets condition codes 0x000: irmovl $100,%eax 0x006: rmmovl %eax,0x10000(%eax) 0x00c: addl %eax,%eax 1234 FDEM FDE FD W 5 M E Exception detected Condition code set

31 Avoiding Side Effects Presence of Exception Should Disable State Update –When detect exception in memory stage Disable condition code setting in execute Must happen in same clock cycle –When exception passes to write-back stage Disable memory write in memory stage Disable condition code setting in execute stage Implementation –Hardwired into the design of the PIPE simulator –You have no control over this

Control Logic for State Changes Setting Condition Codes Stage Control –Also controls updating of memory # Should the condition codes be updated? bool set_cc = E_icode == IOPL && # State changes only during normal operation !m_stat in { SADR, SINS, SHLT } && !W_stat in { SADR, SINS, SHLT }; # Start injecting bubbles as soon as exception passes through memory stage bool M_bubble = m_stat in { SADR, SINS, SHLT } || W_stat in { SADR, SINS, SHLT }; # Stall pipeline register W when exception encountered bool W_stall = W_stat in { SADR, SINS, SHLT }; 32

33 Rest of Exception Handling Calling Exception Handler –Push PC onto stack Either PC of faulting instruction or of next instruction Usually pass through pipeline along with exception status –Jump to handler address Usually fixed address Defined as part of ISA Implementation –Haven’t tried it yet!

34 Processor Summary Design Technique –Create uniform framework for all instructions Want to share hardware among instructions –Connect standard logic blocks with bits of control logic Operation –State held in memories and clocked registers –Computation done by combinational logic –Clocking of registers/memories sufficient to control overall behavior Enhancing Performance –Pipelining increases throughput and improves resource utilization –Must make sure maintains ISA behavior

35 Performance Analysis

36 Performance Metrics Clock rate –Measured in Megahertz or Gigahertz –Function of stage partitioning and circuit design Keep amount of work per stage small Rate at which instructions executed –CPI: cycles per instruction –On average, how many clock cycles does each instruction require? –Function of pipeline design and benchmark programs E.g., how frequently are branches mispredicted?

37 CPI for PIPE CPI  1.0 –Fetch instruction each clock cycle –Effectively process new instruction almost every cycle Although each individual instruction has latency of 5 cycles CPI > 1.0 –Sometimes must stall or cancel branches

38 CPI for PIPE Computing CPI –C clock cycles –I instructions executed to completion –B bubbles injected (C = I + B) CPI = C/I = (I+B)/I = B/I –Factor B/I represents average penalty due to bubbles

39 CPI for PIPE (Cont.) LP: Penalty due to load/use hazard stalling –Fraction of instructions that are loads0.25 –Fraction of load instructions requiring stall0.20 –Number of bubbles injected each time1  LP = 0.25 * 0.20 * 1 = 0.05 MP: Penalty due to mispredicted branches –Fraction of instructions that are cond. jumps 0.20 –Fraction of cond. jumps mispredicted0.40 –Number of bubbles injected each time 2  MP = 0.20 * 0.40 * 2 = 0.16 Typical Values

40 CPI for PIPE (Cont.) RP: Penalty due to ret instructions –Fraction of instructions that are returns0.02 –Number of bubbles injected each time 3  RP = 0.02 * 3 = 0.06 Net effect of penalties = 0.27  CPI = 1.27 (Not bad!) B/I = LP + MP + RP Typical Values

41 Engineering Consideration

Fetch Logic Revisited During Fetch Cycle 1.Select PC 2.Read bytes from instruction memory 3.Examine icode to determine instruction length 4.Increment PC Timing –Steps 2 & 4 require significant amount of time 42

Standard Fetch Timing –Must Perform Everything in Sequence –Can’t compute incremented PC until know how much to increment it by Select PC Mem. ReadIncrement need_regids, need_valC 1 clock cycle 43

A Fast PC Increment Circuit 3-bit adder need_ValC need_regids 0 29-bit incrementer MUX High-order 29 bits Low-order 3 bits High-order 29 bitsLow-order 3 bits 01 PC incrPC Slow Fast carry 44

Modified Fetch Timing 29-Bit Incrementer –Acts as soon as PC selected –Output not needed until final MUX –Works in parallel with memory read Select PC Mem. Read Incrementer need_regids, need_valC 3-bit add MUX 1 clock cycle Standard cycle 45

More Realistic Fetch Logic Fetch Box –Integrated into instruction cache –Fetches entire cache block (16 or 32 bytes) –Selects current instruction from current block –Works ahead to fetch next block As reaches end of current block At branch target 46

Next Machine-Independent Optimization –Code motion –Memory optimization Suggested reading –5.1 ~