CS 152 Computer Architecture and Engineering, Lecture 5 - Pipelining II. Krste Asanovic. February 2, 2010 (CS152, Spring 2010).

Presentation transcript:

CS 152 Computer Architecture and Engineering
Lecture 5 - Pipelining II
Krste Asanovic
Electrical Engineering and Computer Sciences
University of California at Berkeley
February 2, 2010 (CS152, Spring 2010)

Last time in Lecture 4

Pipelining increases clock frequency, while growing CPI more slowly, hence giving greater performance.

  Time/Program = (Instructions/Program) * (Cycles/Instruction) * (Time/Cycle)

– Cycles/Instruction increases because of pipeline bubbles
– Time/Cycle reduces because there are fewer logic gates on critical paths between flip-flops

Pipelining of instructions is complicated by HAZARDS:
– Structural hazards (two instructions want same hardware resource)
– Data hazards (earlier instruction produces value needed by later instruction)
– Control hazards (instruction changes control flow, e.g., branches or exceptions)

Techniques to handle hazards:
– Interlock (hold newer instruction until older instructions drain out of pipeline and write back results)
– Bypass (transfer value from older instruction to newer instruction as soon as available somewhere in machine)
– Speculate (guess effect of earlier instruction)
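The performance equation can be checked with a small worked example. This is only a sketch: the instruction count, CPI values, and cycle times are made-up numbers chosen for illustration, not figures from the lecture.

```python
def execution_time_ns(instructions, cpi, cycle_time_ns):
    """Time/Program = Instructions/Program * Cycles/Instruction * Time/Cycle."""
    return instructions * cpi * cycle_time_ns

# Hypothetical machine parameters (assumptions for illustration only).
INSTRUCTIONS = 1_000_000

# Unpipelined: one long cycle per instruction.
unpipelined = execution_time_ns(INSTRUCTIONS, cpi=1.0, cycle_time_ns=10.0)

# Pipelined: CPI grows because of bubbles, but the cycle time shrinks more.
pipelined = execution_time_ns(INSTRUCTIONS, cpi=1.3, cycle_time_ns=2.5)

speedup = unpipelined / pipelined
print(f"speedup = {speedup:.2f}")
```

With these assumed numbers the CPI rises by 30% while the cycle time falls by 4x, so the pipelined machine still finishes sooner.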

Control Hazards

What do we need to calculate next PC?
– For Jumps
  » Opcode, offset and PC
– For Jump Register
  » Opcode and Register value
– For Conditional Branches
  » Opcode, PC, Register (for condition), and offset
– For all other instructions
  » Opcode and PC (have to know it's not one of above)
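The inputs listed above can be made concrete with a short sketch of a next-PC function. It is a simplification under stated assumptions: the opcode names are taken from later slides, jump and branch targets are modeled as PC + 4 + offset (real MIPS jumps encode the target differently), and the function name and signature are hypothetical.

```python
def next_pc(opcode, pc, offset=0, reg_value=0):
    """Return the next PC for one instruction (simplified, assumed encoding)."""
    if opcode in ("J", "JAL"):
        # Jumps: need opcode, offset and PC.
        return pc + 4 + offset
    if opcode in ("JR", "JALR"):
        # Jump register: need opcode and the register value.
        return reg_value
    if opcode == "BEQZ":
        # Conditional branch: need opcode, PC, condition register and offset.
        return pc + 4 + offset if reg_value == 0 else pc + 4
    if opcode == "BNEZ":
        return pc + 4 + offset if reg_value != 0 else pc + 4
    # Everything else: still need the opcode to know it is not one of the above.
    return pc + 4

# The BEQZ at address 100 with offset +200 used in later slides:
print(next_pc("BEQZ", 100, offset=200, reg_value=0))
```

Even the default PC + 4 case depends on the opcode, which is why the fetch stage cannot know the next PC for certain until the instruction has been decoded.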

PC Calculation Bubbles (assuming no branch delay slots for now)

time                   t0   t1   t2   t3   t4   t5   t6   t7   ....
(I1) r1 <- (r0) + 10   IF1  ID1  EX1  MA1  WB1
(I2) r3 <- (r2) + 17        IF2  IF2  ID2  EX2  MA2  WB2
(I3)                                  IF3  IF3  ID3  EX3  MA3  WB3
(I4)                                            IF4  IF4  ID4  EX4  MA4  WB4

Resource Usage

time   t0   t1   t2   t3   t4   t5   t6   t7   ....
IF     I1   nop  I2   nop  I3   nop  I4
ID          I1   nop  I2   nop  I3   nop  I4
EX               I1   nop  I2   nop  I3   nop  I4
MA                    I1   nop  I2   nop  I3   nop  I4
WB                         I1   nop  I2   nop  I3   nop  I4

nop => pipeline bubble

Speculate next address is PC+4

  I1  096  ADD
  I2  100  J 304
  I3  104  ADD    (kill)
  I4  304  ADD

A jump instruction kills (not stalls) the following instruction. How?

[Figure: fetch and decode datapath with PC, +4 adder, instruction memory, IR, stall signal, a "Jump?" detector in decode, and the PCSrc mux selecting among pc+4 / jabs / rind / br. Snapshot: I2 in decode, I1 in execute, PC = 104.]

Pipelining Jumps

  I1  096  ADD
  I2  100  J 304
  I3  104  ADD    (kill)
  I4  304  ADD

To kill a fetched instruction, insert a mux before IR:

  IRSrc_D = Case opcode_D
              J, JAL  => nop
              ...     => IM

Any interaction between stall and jump?

[Figure: the same datapath with a mux in front of the decode-stage IR, controlled by IRSrc_D, choosing between the instruction memory output (IM) and nop. Snapshots: PC = 104 with I2 and I1 in the pipeline, then PC = 304 with a nop behind I2 and I1.]

Jump Pipeline Diagrams

time              t0   t1   t2   t3   t4   t5   t6   t7   ....
(I1) 096: ADD     IF1  ID1  EX1  MA1  WB1
(I2) 100: J 304        IF2  ID2  EX2  MA2  WB2
(I3) 104: ADD               IF3  nop  nop  nop  nop
(I4) 304: ADD                    IF4  ID4  EX4  MA4  WB4

Resource Usage

time   t0   t1   t2   t3   t4   t5   t6   t7   ....
IF     I1   I2   I3   I4   I5
ID          I1   I2   nop  I4   I5
EX               I1   I2   nop  I4   I5
MA                    I1   I2   nop  I4   I5
WB                         I1   I2   nop  I4   I5

nop => pipeline bubble

Pipelining Conditional Branches

  I1  096  ADD
  I2  100  BEQZ +200
  I3  104  ADD
  I4  304  ADD

Branch condition is not known until the execute stage. What action should be taken in the decode stage?

[Figure: datapath with PC, +4 adder, instruction memory, the IRSrc_D mux and decode-stage IR, a "BEQZ?" detector, the ALU with its zero? output in the execute stage, stall signal, and the PCSrc mux (pc+4 / jabs / rind / br). Snapshot: I2 in decode, I1 in execute, PC = 104.]

Pipelining Conditional Branches

  I1  096  ADD
  I2  100  BEQZ +200
  I3  104  ADD
  I4  304  ADD

If the branch is taken:
– kill the two following instructions
– the instruction at the decode stage is not valid => stall signal is not valid

[Figure: the same datapath one cycle later. Snapshot: I3 in decode, I2 (the branch, checked by "BEQZ?" against the ALU zero? output) in execute, I1 in memory, PC = 108.]

Pipelining Conditional Branches

  I1  096  ADD
  I2  100  BEQZ +200
  I3  104  ADD
  I4  304  ADD

If the branch is taken:
– kill the two following instructions
– the instruction at the decode stage is not valid => stall signal is not valid

[Figure: the datapath extended with a second mux, IRSrc_E, in front of the execute-stage IR, alongside IRSrc_D, the "Jump?" and "BEQZ?" detectors, the ALU zero? output, and the PCSrc mux (pc+4 / jabs / rind / br). Snapshot: I3 in decode, I2 in execute, I1 in memory, PC = 108.]

New Stall Signal

  stall = ( ((rs_D = ws_E).we_E + (rs_D = ws_M).we_M + (rs_D = ws_W).we_W).re1_D
          + ((rt_D = ws_E).we_E + (rt_D = ws_M).we_M + (rt_D = ws_W).we_W).re2_D )
          . !((opcode_E = BEQZ).z + (opcode_E = BNEZ).!z)

Don't stall if the branch is taken. Why? The instruction at the decode stage is invalid.
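The stall equation translates almost term for term into code. The sketch below is illustrative only: the `Stage` record and the function signature are hypothetical, while the field names follow the slide's notation (ws = destination register, we = write enable, re1/re2 = read enables for rs/rt, z = ALU zero output).

```python
from dataclasses import dataclass

@dataclass
class Stage:
    """Per-stage instruction state (hypothetical record for this sketch)."""
    opcode: str = "NOP"
    ws: int = 0        # destination register number
    we: bool = False   # write enable

def stall(rs_d, rt_d, re1_d, re2_d, e, m, w, z):
    """Stall decode on a data hazard, unless a branch in execute is taken."""
    def hazard(src):
        # (src = ws_E).we_E + (src = ws_M).we_M + (src = ws_W).we_W
        return any(s.we and src == s.ws for s in (e, m, w))

    data_hazard = (re1_d and hazard(rs_d)) or (re2_d and hazard(rt_d))
    taken = (e.opcode == "BEQZ" and z) or (e.opcode == "BNEZ" and not z)
    # A taken branch invalidates the instruction in decode, so never stall for it.
    return bool(data_hazard and not taken)

# Decode reads r3 while an ADD in execute is about to write r3: stall.
print(stall(3, 0, True, False, Stage("ADD", 3, True), Stage(), Stage(), z=False))
```

The final `not taken` term is the only change from a plain interlock: without it, a killed instruction in decode could hold the pipeline while waiting on a hazard that no longer matters.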

Control Equations for PC and IR Muxes

  PCSrc = Case opcode_E
            BEQZ.z, BNEZ.!z  => br
            ...              => Case opcode_D
                                  J, JAL    => jabs
                                  JR, JALR  => rind
                                  ...       => pc+4

  IRSrc_D = Case opcode_E
              BEQZ.z, BNEZ.!z  => nop
              ...              => Case opcode_D
                                    J, JAL, JR, JALR  => nop
                                    ...               => IM

  IRSrc_E = Case opcode_E
              BEQZ.z, BNEZ.!z  => nop
              ...              => stall.nop + !stall.IR_D

Give priority to the older instruction, i.e., execute stage instruction over decode stage instruction.
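The three case tables can be written as ordinary functions, which makes the priority of the execute-stage instruction explicit. This is a sketch: the function names and the string labels for the mux inputs are hypothetical stand-ins for the select signals on the slides.

```python
JUMPS_ABS = ("J", "JAL")
JUMPS_IND = ("JR", "JALR")

def branch_taken(opcode_e, z):
    """BEQZ.z or BNEZ.!z, evaluated for the instruction in execute."""
    return (opcode_e == "BEQZ" and z) or (opcode_e == "BNEZ" and not z)

def pc_src(opcode_e, opcode_d, z):
    if branch_taken(opcode_e, z):      # older instruction wins
        return "br"
    if opcode_d in JUMPS_ABS:
        return "jabs"
    if opcode_d in JUMPS_IND:
        return "rind"
    return "pc+4"

def ir_src_d(opcode_e, opcode_d, z):
    if branch_taken(opcode_e, z):
        return "nop"                   # kill the instruction being fetched
    if opcode_d in JUMPS_ABS + JUMPS_IND:
        return "nop"                   # kill the instruction after a jump
    return "IM"

def ir_src_e(opcode_e, z, stall):
    if branch_taken(opcode_e, z):
        return "nop"                   # kill the instruction in decode
    return "nop" if stall else "IR_D"

# A taken BEQZ in execute overrides a jump sitting in decode.
print(pc_src("BEQZ", "J", z=True), ir_src_d("BEQZ", "J", z=True))
```

Checking the branch first in every function is what implements the priority rule: the jump in decode is on the wrong path if the older branch is taken, so its request must be ignored.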

Branch Pipeline Diagrams (resolved in execute stage)

time                  t0   t1   t2   t3   t4   t5   t6   t7   ....
(I1) 096: ADD         IF1  ID1  EX1  MA1  WB1
(I2) 100: BEQZ +200        IF2  ID2  EX2  MA2  WB2
(I3) 104: ADD                   IF3  ID3  nop  nop  nop
(I4) 108:                            IF4  nop  nop  nop  nop
(I5) 304: ADD                             IF5  ID5  EX5  MA5  WB5

Resource Usage

time   t0   t1   t2   t3   t4   t5   t6   t7   ....
IF     I1   I2   I3   I4   I5
ID          I1   I2   I3   nop  I5
EX               I1   I2   nop  nop  I5
MA                    I1   I2   nop  nop  I5
WB                         I1   I2   nop  nop  I5

nop => pipeline bubble

Reducing Branch Penalty (resolve in decode stage)

One pipeline bubble can be removed if an extra comparator is used in the Decode stage.

Pipeline diagram now same as for jumps.

[Figure: datapath with PC, +4 adder, instruction memory, decode-stage IR, the register file (GPRs, with ports rs1, rs2, rd1, rd2, ws, wd, we), a zero detect on the register file output, and the PCSrc mux (pc+4 / jabs / rind / br).]

Branch Delay Slots (expose control hazard to software)

Change the ISA semantics so that the instruction that follows a jump or branch is always executed
– gives compiler the flexibility to put in a useful instruction where normally a pipeline bubble would have resulted.

  I1  096  ADD
  I2  100  BEQZ +200
  I3  104  ADD    (delay slot instruction, executed regardless of branch outcome)
  I4  304  ADD

Other techniques include more advanced branch prediction, which can dramatically reduce the branch penalty... to come later
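The change in ISA semantics can be illustrated by tracing which addresses execute with and without a delay slot. This is a toy model under stated assumptions: the program encoding (address mapped to an opcode and an optional taken-branch target) and the `trace` function are hypothetical, every branch is treated as taken, and a branch placed inside a delay slot is ignored.

```python
def trace(program, start, delay_slot, max_steps=16):
    """Return the list of instruction addresses executed, in order."""
    executed = []
    pc = start
    pending_target = None  # branch target waiting for the delay slot to finish
    while pc in program and len(executed) < max_steps:
        _opcode, target = program[pc]
        executed.append(pc)
        next_pc = pc + 4
        if pending_target is not None:
            # The delay slot instruction has just run: now redirect.
            next_pc, pending_target = pending_target, None
        elif target is not None:
            if delay_slot:
                pending_target = target   # redirect after the next instruction
            else:
                next_pc = target          # redirect immediately
        pc = next_pc
    return executed

# The four-instruction example from the slide, with the branch taken.
PROGRAM = {
    96: ("ADD", None),
    100: ("BEQZ", 304),
    104: ("ADD", None),
    304: ("ADD", None),
}
print(trace(PROGRAM, 96, delay_slot=True))
print(trace(PROGRAM, 96, delay_slot=False))
```

With the delay slot, the instruction at 104 executes even though the branch is taken, so the compiler must place something there that is safe on both paths (or a NOP).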

Branch Pipeline Diagrams (branch delay slot)

time                  t0   t1   t2   t3   t4   t5   t6   t7   ....
(I1) 096: ADD         IF1  ID1  EX1  MA1  WB1
(I2) 100: BEQZ +200        IF2  ID2  EX2  MA2  WB2
(I3) 104: ADD                   IF3  ID3  EX3  MA3  WB3
(I4) 304: ADD                        IF4  ID4  EX4  MA4  WB4

Resource Usage

time   t0   t1   t2   t3   t4   t5   t6   t7   ....
IF     I1   I2   I3   I4
ID          I1   I2   I3   I4
EX               I1   I2   I3   I4
MA                    I1   I2   I3   I4
WB                         I1   I2   I3   I4

Why an Instruction may not be dispatched every cycle (CPI>1)

Full bypassing may be too expensive to implement
– typically all frequently used paths are provided
– some infrequently used bypass paths may increase cycle time and counteract the benefit of reducing CPI

Loads have two-cycle latency
– Instruction after load cannot use load result
– MIPS-I ISA defined load delay slots, a software-visible pipeline hazard (compiler schedules independent instruction or inserts NOP to avoid hazard). Removed in MIPS-II (pipeline interlocks added in hardware)
  » MIPS: "Microprocessor without Interlocked Pipeline Stages"

Conditional branches may cause bubbles
– kill following instruction(s) if no delay slots

Machines with software-visible delay slots may execute significant number of NOP instructions inserted by the compiler. NOPs not counted in useful CPI (alternatively, increase instructions/program)
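The two ways of accounting for compiler-inserted NOPs can be compared numerically. This is a sketch under simplifying assumptions: each issued instruction (useful or NOP) takes one cycle, hardware bubbles add whole cycles on top, and the counts are made-up values for illustration.

```python
def cpi_views(useful_instructions, compiler_nops, bubble_cycles):
    """Return (total cycles, CPI over useful work, CPI counting NOPs as instructions)."""
    total_cycles = useful_instructions + compiler_nops + bubble_cycles
    # View 1: NOPs are overhead, so they inflate CPI.
    cpi_useful = total_cycles / useful_instructions
    # View 2: NOPs are instructions, so they inflate instructions/program.
    cpi_with_nops = total_cycles / (useful_instructions + compiler_nops)
    return total_cycles, cpi_useful, cpi_with_nops

# Hypothetical program: 1000 useful instructions, 100 NOPs, 50 bubble cycles.
cycles, cpi_useful, cpi_with_nops = cpi_views(1000, 100, 50)
print(cycles, cpi_useful, round(cpi_with_nops, 3))
```

Either view gives the same total cycle count once multiplied by its own instruction count; the choice only changes which factor of the performance equation absorbs the NOPs.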

CS152 Administrivia

– PS1/Lab1 due start of class Thursday Feb 11
– Quiz 1, Tuesday Feb 16

Interrupts: altering the normal flow of control

[Figure: the program sequence I(i-1), I(i), I(i+1) is diverted to an interrupt handler HI(1), HI(2), ..., HI(n).]

An external or internal event that needs to be processed by another (system) program. The event is usually unexpected or rare from program's point of view.

Causes of Interrupts

Interrupt: an event that requests the attention of the processor

Asynchronous: an external event
– input/output device service-request
– timer expiration
– power disruptions, hardware failure

Synchronous: an internal event (a.k.a. exceptions)
– undefined opcode, privileged instruction
– arithmetic overflow, FPU exception
– misaligned memory access
– virtual memory exceptions: page faults, TLB misses, protection violations
– traps: system calls, e.g., jumps into kernel

History of Exception Handling

First system with exceptions was Univac-I, 1951
– Arithmetic overflow would either
  » 1. trigger the execution of a two-instruction fix-up routine at address 0, or
  » 2. at the programmer's option, cause the computer to stop
– Later Univac 1103, 1955, modified to add external interrupts
  » Used to gather real-time wind tunnel data

First system with I/O interrupts was DYSEAC, 1954
– Had two program counters, and I/O signal caused switch between two PCs
– Also, first system with DMA (direct memory access by I/O device)

[Courtesy Mark Smotherman]

DYSEAC, first mobile computer!

– Carried in two tractor trailers, 12 tons + 8 tons
– Built for US Army Signal Corps

[Courtesy Mark Smotherman]

Asynchronous Interrupts: invoking the interrupt handler

An I/O device requests attention by asserting one of the prioritized interrupt request lines

When the processor decides to process the interrupt
– It stops the current program at instruction I(i), completing all the instructions up to I(i-1) (precise interrupt)
– It saves the PC of instruction I(i) in a special register (EPC)
– It disables interrupts and transfers control to a designated interrupt handler running in the kernel mode

Interrupt Handler

Saves EPC before enabling interrupts to allow nested interrupts =>
– need an instruction to move EPC into GPRs
– need a way to mask further interrupts at least until EPC can be saved

Needs to read a status register that indicates the cause of the interrupt

Uses a special indirect jump instruction RFE (return-from-exception) which
– enables interrupts
– restores the processor to the user mode
– restores hardware status and control state
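The entry and exit sequence described on the last two slides can be sketched as a small state model. Everything here is illustrative: the `CPU` record, the `HANDLER_PC` value, and the function names are hypothetical, and `rfe` is modeled the way the slide describes it (a return jump that also re-enables interrupts and restores user mode), not as any specific ISA's encoding.

```python
from dataclasses import dataclass

HANDLER_PC = 0x80  # hypothetical address of the designated interrupt handler

@dataclass
class CPU:
    pc: int = 0
    epc: int = 0
    cause: int = 0
    interrupts_enabled: bool = True
    kernel_mode: bool = False

def take_interrupt(cpu, cause):
    """Enter the handler at instruction I(i); return False if interrupts are masked."""
    if not cpu.interrupts_enabled:
        return False
    cpu.epc = cpu.pc                 # save the PC of the interrupted instruction
    cpu.cause = cause                # status the handler reads to find the cause
    cpu.interrupts_enabled = False   # mask until the handler has saved EPC
    cpu.kernel_mode = True
    cpu.pc = HANDLER_PC
    return True

def rfe(cpu, saved_epc):
    """Return from exception: re-enable interrupts, drop to user mode, resume."""
    cpu.interrupts_enabled = True
    cpu.kernel_mode = False
    cpu.pc = saved_epc

cpu = CPU(pc=104)
take_interrupt(cpu, cause=3)
saved = cpu.epc          # handler copies EPC into a general-purpose register
rfe(cpu, saved)
print(cpu.pc, cpu.kernel_mode, cpu.interrupts_enabled)
```

Interrupts stay masked between entry and the copy of EPC; if a second interrupt were accepted in that window it would overwrite EPC and the original return address would be lost.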

Synchronous Interrupts

A synchronous interrupt (exception) is caused by a particular instruction

In general, the instruction cannot be completed and needs to be restarted after the exception has been handled
– requires undoing the effect of one or more partially executed instructions

In the case of a system call trap, the instruction is considered to have been completed
– a special jump instruction involving a change to privileged kernel mode

Exception Handling 5-Stage Pipeline

[Figure: five-stage pipeline (PC, Inst. Mem, D, Decode, E, +, M, Data Mem, W) annotated with where each event arises: PC address exception at fetch, illegal opcode at decode, overflow at execute, data address exceptions at memory, plus asynchronous interrupts.]

How to handle multiple simultaneous exceptions in different pipeline stages?

How and where to handle external asynchronous interrupts?

Exception Handling 5-Stage Pipeline

[Figure: the same five-stage pipeline with exception flags and PCs carried along in pipeline registers (Exc D / PC D, Exc E / PC E, Exc M / PC M) to the commit point, where the Cause and EPC registers are written. Outputs from the commit point: Kill F Stage, Kill D Stage, Kill E Stage, Kill Writeback, and Select Handler PC. Asynchronous interrupts enter at the commit point.]

Exception Handling 5-Stage Pipeline

– Hold exception flags in pipeline until commit point (M stage)
– Exceptions in earlier pipe stages override later exceptions for a given instruction
– Inject external interrupts at commit point (override others)
– If exception at commit: update Cause and EPC registers, kill all stages, inject handler PC into fetch stage
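The commit-point rules above can be sketched as a single decision function. The names here (`resolve_at_commit`, `HANDLER_PC`, the stage letters, and the cause strings) are hypothetical choices for this example; only the priority rules come from the slide.

```python
STAGE_ORDER = ("F", "D", "E", "M")  # earlier pipeline stages first
HANDLER_PC = 0x80                    # hypothetical handler address

def resolve_at_commit(inst_pc, stage_exceptions, external_interrupt=None):
    """Decide what happens when one instruction reaches the commit point.

    stage_exceptions maps a stage letter to the exception that stage flagged
    for this instruction. Returns None if the instruction commits normally.
    """
    if external_interrupt is not None:
        cause = external_interrupt  # external interrupts override others
    else:
        # The exception from the earliest stage wins for a given instruction.
        cause = next(
            (stage_exceptions[s] for s in STAGE_ORDER
             if stage_exceptions.get(s) is not None),
            None,
        )
    if cause is None:
        return None
    return {
        "cause": cause,               # written to the Cause register
        "epc": inst_pc,               # written to the EPC register
        "kill_all_stages": True,
        "next_fetch_pc": HANDLER_PC,  # injected into the fetch stage
    }

# An instruction that flagged both an illegal opcode (D) and a bad data address (M).
print(resolve_at_commit(96, {"D": "illegal_opcode", "M": "data_address"})["cause"])
```

Deferring the decision to one stage is what keeps exceptions precise: flags raised out of program order in different stages are only acted on in program order, as each instruction reaches commit.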

Speculating on Exceptions

Prediction mechanism
– Exceptions are rare, so simply predicting no exceptions is very accurate!

Check prediction mechanism
– Exceptions detected at end of instruction execution pipeline, special hardware for various exception types

Recovery mechanism
– Only write architectural state at commit point, so can throw away partially executed instructions after exception
– Launch exception handler after flushing pipeline

Bypassing allows use of uncommitted instruction results by following instructions

Exception Pipeline Diagram

time                      t0   t1   t2   t3   t4   t5   t6   t7   ....
(I1) 096: ADD             IF1  ID1  EX1  MA1  nop                       overflow!
(I2) 100: XOR                  IF2  ID2  EX2  nop  nop
(I3) 104: SUB                       IF3  ID3  nop  nop  nop
(I4) 108: ADD                            IF4  nop  nop  nop  nop
(I5) Exc. Handler code                        IF5  ID5  EX5  MA5  WB5

Resource Usage

time   t0   t1   t2   t3   t4   t5   t6   t7   ....
IF     I1   I2   I3   I4   I5
ID          I1   I2   I3   nop  I5
EX               I1   I2   nop  nop  I5
MA                    I1   nop  nop  nop  I5
WB                         nop  nop  nop  nop  I5

Acknowledgements

These slides contain material developed and copyright by:
– Arvind (MIT)
– Krste Asanovic (MIT/UCB)
– Joel Emer (Intel/MIT)
– James Hoe (CMU)
– John Kubiatowicz (UCB)
– David Patterson (UCB)

MIT material derived from course
UCB material derived from course CS252