EE524/CptS561 Jose G. Delgado-Frias 1 Processor. Basic steps to process an instruction: IF, ID/OF, EX, MEM, WB.

Presentation transcript:

EE524/CptS561 Jose G. Delgado-Frias 1 Processor
Basic steps to process an instruction:
IF: Instruction Fetch
ID/OF: Instruction Decode / Operand Fetch
EX: Execute
MEM: Memory Access
WB: Write Back

EE524/CptS561 Jose G. Delgado-Frias 2 Datapath
[Figure: datapath divided into Instruction Fetch, Inst. Dec. / Op. Fetch, Execute, Memory Access, and Write Back, with PC, +4, NPC, Inst. Mem., IR, Reg, imm, A, B, ALU, zero, Data Mem., and multiplexers (mux)]
IR ← Mem[PC]
NPC ← PC + 4
A ← Reg[IR6..10]
B ← Reg[IR11..15]
Imm ← ((IR16)^16 ## IR16..31)
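The decode and operand-fetch transfers on the slide above can be sketched in Python. This is only an illustration: it assumes a DLX-style 32-bit format in which bit 0 is the most significant bit, the two source register fields sit in bits 6..10 and 11..15, and the 16-bit immediate occupies bits 16..31. The names `field`, `sign_extend16`, and `decode` are hypothetical helpers, not part of the slides.

```python
def field(ir, first_bit, last_bit):
    """Extract IR[first_bit..last_bit], counting bits from 0 at the MSB of a 32-bit word."""
    width = last_bit - first_bit + 1
    shift = 31 - last_bit
    return (ir >> shift) & ((1 << width) - 1)


def sign_extend16(value):
    """Replicate the top bit of a 16-bit value, i.e. (IR16)^16 ## IR16..31."""
    return value - 0x10000 if value & 0x8000 else value


def decode(ir):
    """Return the register indices for A and B and the sign-extended immediate."""
    return {
        "A_index": field(ir, 6, 10),
        "B_index": field(ir, 11, 15),
        "imm": sign_extend16(field(ir, 16, 31)),
    }


# Hypothetical instruction word: opcode 0, first source 2, second source 3, immediate 0xFFFC
word = (2 << 21) | (3 << 16) | 0xFFFC
print(decode(word))
```

The immediate 0xFFFC has its top bit set, so sign extension turns it into a negative offset rather than a large positive one.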

EE524/CptS561 Jose G. Delgado-Frias 3 Datapath (Arith/Logic Inst.)
[Figure: datapath with PC, +4, Inst. Mem., IR, Reg, imm, A, B, ALU, zero, Data Mem.]
IR ← Mem[PC]
NPC ← PC + 4
A ← Reg[IR6..10]
B ← Reg[IR11..15]
Imm ← ((IR16)^16 ## IR16..31)
ALUoutput ← A op B
ALUoutput ← A op Imm
Reg[IR16..20] ← ALUoutput

EE524/CptS561 Jose G. Delgado-Frias 4 Datapath (Load Inst.)
[Figure: datapath with PC, +4, Inst. Mem., IR, Reg, imm, A, B, ALU, zero, Data Mem.]
IR ← Mem[PC]
NPC ← PC + 4
A ← Reg[IR6..10]
B ← Reg[IR11..15]
Imm ← ((IR16)^16 ## IR16..31)
ALUoutput ← A op Imm
Reg[IR11..15] ← LMD

EE524/CptS561 Jose G. Delgado-Frias 5 Datapath (Store Inst.)
[Figure: datapath with PC, +4, Inst. Mem., IR, Reg, imm, A, B, ALU, zero, Data Mem.]
IR ← Mem[PC]
NPC ← PC + 4
A ← Reg[IR6..10]
B ← Reg[IR11..15]
Imm ← ((IR16)^16 ## IR16..31)
ALUoutput ← A op Imm
Mem[ALUoutput] ← B

EE524/CptS561 Jose G. Delgado-Frias 6 Datapath (Branch Inst.)
[Figure: datapath with PC, +4, Inst. Mem., IR, Reg, imm, A, B, ALU, zero, Data Mem.]
IR ← Mem[PC]
NPC ← PC + 4
A ← Reg[IR6..10]
B ← Reg[IR11..15]
Imm ← ((IR16)^16 ## IR16..31)
ALUoutput ← (PC+4) op Imm
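The register transfers on the datapath slides can be tied together in a small interpreter. The sketch below is a rough model under stated assumptions, not the course's implementation: instructions are Python dictionaries rather than encoded words, the address "op" for loads and stores is taken to be addition, and a branch is assumed to be taken when A is zero (the slides only show a `zero` signal). The function `execute` and the dictionary keys are hypothetical.

```python
import operator


def execute(inst, regs, mem, pc):
    """Run one instruction through the slide's steps and return the next PC."""
    npc = pc + 4                              # NPC <- PC + 4
    a = regs[inst["rs1"]]                     # A <- Reg[first source field]
    b = regs[inst.get("rs2", 0)]              # B <- Reg[second source field]
    imm = inst.get("imm", 0)                  # Imm <- sign-extended immediate
    kind = inst["kind"]
    if kind == "alu":                         # ALUoutput <- A op B; Reg[rd] <- ALUoutput
        regs[inst["rd"]] = inst["op"](a, b)
    elif kind == "load":                      # ALUoutput <- A + Imm; Reg[rd] <- LMD
        regs[inst["rd"]] = mem[a + imm]
    elif kind == "store":                     # ALUoutput <- A + Imm; Mem[ALUoutput] <- B
        mem[a + imm] = b
    elif kind == "branch" and a == 0:         # assumed condition: branch when A is zero
        return npc + imm                      # ALUoutput <- (PC+4) + Imm
    return npc


regs = [0] * 32
regs[2], regs[3] = 10, 32
mem = {42: 7}
pc = 0
program = [
    {"kind": "alu", "rs1": 2, "rs2": 3, "rd": 1, "op": operator.add},  # R1 <- R2 + R3
    {"kind": "load", "rs1": 1, "rd": 4, "imm": 0},                     # R4 <- Mem[R1 + 0]
    {"kind": "store", "rs1": 1, "rs2": 4, "imm": 4},                   # Mem[R1 + 4] <- R4
]
for inst in program:
    pc = execute(inst, regs, mem, pc)
print(regs[1], regs[4], mem[46], pc)
```

Each instruction is handled start to finish before the next one begins, which is the unpipelined behavior that the later slides improve on.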

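EE524/CptS561 Jose G. Delgado-Frias 7 Instructions of a program
[Figure: instructions 1, 2, 3 of a program laid out along the time axis (clock cycles); instruction 1: IF ID EX MEM WB, instruction 2: IF ID EX WB, instruction 3: IF ID]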

EE524/CptS561 Jose G. Delgado-Frias 8 Instructions of a program
[Figure: instructions of a program against clock cycle; each instruction goes through IF, ID, EX, MEM, WB, overlapped with the instructions before and after it]

EE524/CptS561 Jose G. Delgado-Frias 9 Pipelining Lessons
Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload
Pipeline rate is limited by the slowest pipeline stage
Multiple tasks operate simultaneously
Potential speedup = number of pipe stages
Unbalanced lengths of pipe stages reduce speedup
Time to "fill" the pipeline and time to "drain" it reduce speedup
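A short calculation makes these lessons concrete. The sketch below assumes an idealized pipeline with no hazards and no pipeline-register overhead; the stage latencies are hypothetical values in arbitrary time units, and the function names are illustrative.

```python
def unpipelined_time(stage_latencies, n_instructions):
    """Each instruction runs through every stage before the next one starts."""
    return n_instructions * sum(stage_latencies)


def pipelined_time(stage_latencies, n_instructions):
    """The clock period is set by the slowest stage; fill/drain adds (stages - 1) cycles."""
    cycle = max(stage_latencies)
    return (len(stage_latencies) + n_instructions - 1) * cycle


def speedup(stage_latencies, n_instructions):
    return unpipelined_time(stage_latencies, n_instructions) / pipelined_time(
        stage_latencies, n_instructions
    )


balanced = [10, 10, 10, 10, 10]      # hypothetical latencies
unbalanced = [10, 10, 20, 10, 10]    # one slow stage

print(speedup(balanced, 1))          # a single task gains nothing
print(speedup(balanced, 1000))       # approaches the number of stages
print(speedup(unbalanced, 1000))     # the slow stage limits the rate
```

With a long instruction stream, the balanced case approaches a speedup of 5 (the number of stages), while the unbalanced case stays below 3 because every stage is clocked at the pace of the 20-unit stage.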

EE524/CptS561 Jose G. Delgado-Frias 10 Datapath w/ pipeline
[Figure: datapath with PC, +4, Inst. Mem., Reg, ALU, zero, Data Mem., with the pipeline registers and clock labeled]

EE524/CptS561 Jose G. Delgado-Frias 11 Datapath w/ pipeline
[Figure: pipelined datapath with PC, +4, Inst. Mem., Reg, ALU, zero, Data Mem.]

EE524/CptS561 Jose G. Delgado-Frias 12 Pipeline
[Figure: instructions against clock cycle; each instruction goes through IF, ID/OF, EX, MEM, WB and starts one clock cycle after the previous one]
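The staggered diagram on this slide can be regenerated programmatically. The sketch below assumes an ideal pipeline (one new instruction per clock cycle, no stalls); `pipeline_table` and `render` are hypothetical helper names.

```python
STAGES = ["IF", "ID/OF", "EX", "MEM", "WB"]


def pipeline_table(n_instructions, stages=STAGES):
    """Return one row per instruction; column c holds the stage active in cycle c + 1."""
    n_cycles = n_instructions + len(stages) - 1
    rows = []
    for i in range(n_instructions):
        row = [""] * n_cycles
        for s, name in enumerate(stages):
            row[i + s] = name          # instruction i enters stage s in cycle i + s + 1
        rows.append(row)
    return rows


def render(rows):
    """Format the table with a clock-cycle header line."""
    header = "inst  " + " ".join(f"{c:>5}" for c in range(1, len(rows[0]) + 1))
    lines = [header]
    for i, row in enumerate(rows, start=1):
        lines.append(f"{i:<5} " + " ".join(f"{cell:>5}" for cell in row))
    return "\n".join(lines)


print(render(pipeline_table(4)))
```

Reading down any column shows which instructions occupy the pipeline in that clock cycle; once the pipeline is full, every stage is busy.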

EE524/CptS561 Jose G. Delgado-Frias 13 Pipeline Hazards
Structural Hazards
–two or more instructions use the same hardware at the same time.
Data Hazards
–Data dependencies
–Result from inst. j is needed by inst. k
Control Hazards
–A branch changes the flow; what happens to the following instruction(s)?

EE524/CptS561 Jose G. Delgado-Frias 14 Resources
[Figure: resources used by overlapped instructions in each clock cycle: Mem (IM), Reg, ALU, Mem (DM), Reg]

EE524/CptS561 Jose G. Delgado-Frias 15 Data Hazards
[Figure: overlapped instructions using Mem (IM), Reg, ALU, Mem (DM), Reg]
R1 ← R2+R3
R5 ← R1+R3
R8 ← R1-R6

EE524/CptS561 Jose G. Delgado-Frias 16 Data Forwarding
[Figure: overlapped instructions using Mem (IM), Reg, ALU, Mem (DM), Reg, with results forwarded to later instructions]
R1 ← R2+R3
R5 ← R1+R3
R8 ← R1-R6

EE524/CptS561 Jose G. Delgado-Frias 17 Datapath w/ pipeline
[Figure: pipelined datapath with PC, +4, Inst. Mem., Reg, ALU, zero, Data Mem., and a forwarding unit]

EE524/CptS561 Jose G. Delgado-Frias 18 Example
[Figure: pipelined datapath with forwarding unit; the instructions below are shown advancing through the stages]
ADD R1,R2,R3
SUB R4,R3,R1
XOR R7,R8,R1

EE524/CptS561 Jose G. Delgado-Frias 19 Example
[Figure: pipelined datapath with forwarding unit; stages occupied by ADD R1.., SUB R4,R3,R1, and XOR R7,R8,R1]

EE524/CptS561 Jose G. Delgado-Frias 20 Example
[Figure: pipelined datapath with forwarding unit; stages occupied by ADD R1.., SUB R8,R3,R1, and XOR R7,R8,R1]
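The decision made by a forwarding unit can be sketched as a priority check. This is a simplified model, not the circuit on the slides: it assumes the result of the instruction one stage ahead is held in a pipeline register called EX/MEM here and the result of the instruction two stages ahead in one called MEM/WB (both names are assumptions), and that the newest producer of a register wins.

```python
def forward_source(src_reg, ex_mem_dest, mem_wb_dest):
    """Choose where an ALU operand comes from; None means that stage writes no register."""
    if ex_mem_dest is not None and src_reg == ex_mem_dest:
        return "EX/MEM"      # produced by the instruction immediately ahead
    if mem_wb_dest is not None and src_reg == mem_wb_dest:
        return "MEM/WB"      # produced by the instruction two ahead
    return "REG"             # no hazard: use the value read from the register file


# SUB R4,R3,R1 is in EX while ADD R1,R2,R3 is one stage ahead.
sub_operands = [forward_source(r, "R1", None) for r in ("R3", "R1")]

# XOR R7,R8,R1 is in EX while SUB R4,... is one stage ahead and ADD R1,... is two ahead.
xor_operands = [forward_source(r, "R4", "R1") for r in ("R8", "R1")]

print(sub_operands, xor_operands)
```

For the ADD, SUB, XOR sequence, both dependent instructions get R1 from a pipeline register, so no stall is needed.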

EE524/CptS561 Jose G. Delgado-Frias 21 Data Hazard Classification
RAW (Read After Write)   j: R1 ← ...   k: RY ← R1
–with forwarding, only a load presents a problem
WAW (Write After Write)  j: R1 ← ...   k: R1 ← ...
WAR (Write After Read)   j: ... ← R1   k: R1 ← ...
RAR (Read After Read)    j: ... ← R1   k: ... ← R1
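The four cases can be checked mechanically for any pair of instructions j (earlier) and k (later). In the sketch below each instruction is reduced to a hypothetical `(dest, sources)` tuple; the function name `classify` is illustrative.

```python
def classify(j, k):
    """Return (kind, register) pairs relating earlier instruction j to later instruction k."""
    j_dest, j_src = j
    k_dest, k_src = k
    found = []
    if j_dest is not None and j_dest in k_src:
        found.append(("RAW", j_dest))          # k reads what j writes
    if j_dest is not None and j_dest == k_dest:
        found.append(("WAW", j_dest))          # both write the same register
    if k_dest is not None and k_dest in j_src:
        found.append(("WAR", k_dest))          # k writes what j reads
    for reg in sorted(set(j_src) & set(k_src)):
        found.append(("RAR", reg))             # both only read: not a hazard
    return found


# R1 <- R2+R3 followed by R5 <- R1+R3
pair_a = classify(("R1", ("R2", "R3")), ("R5", ("R1", "R3")))
# R1 <- R2+R3 followed by a hypothetical R2 <- R4+R5
pair_b = classify(("R1", ("R2", "R3")), ("R2", ("R4", "R5")))
print(pair_a, pair_b)
```

RAR is listed for completeness only; two reads of the same register never need to be ordered.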

EE524/CptS561 Jose G. Delgado-Frias 22 Data Forwarding (load)
[Figure: overlapped instructions using Mem (IM), Reg, ALU, Mem (DM), Reg, with forwarding from a load]
R1 ← LD[Mem]
R5 ← R1+R3
R8 ← R1-R6

EE524/CptS561 Jose G. Delgado-Frias 23 Data hazard (load)
[Figure: pipeline timing (IF, ID, EX, MEM, WB) for the instructions below, with a stall caused by the load of "R1"]
LW R1,0(R1)
SUB R4,R1,R5
AND R6,R1,R7
OR R8,R1,R9
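A load-use check of this kind can be sketched as follows. The model assumes a five-stage pipeline with full forwarding, in which only an instruction that immediately follows a load and reads its destination needs one bubble; the tuple layout `(opcode, dest, sources)` and the function names are hypothetical.

```python
def load_use_stalls(program):
    """Return the indices of instructions that must wait one cycle for a preceding load."""
    stalled = []
    for i in range(len(program) - 1):
        opcode, dest, _ = program[i]
        _, _, next_sources = program[i + 1]
        if opcode == "LW" and dest in next_sources:
            stalled.append(i + 1)
    return stalled


def total_cycles(program, n_stages=5):
    """Ideal pipelined cycle count plus one cycle per load-use stall."""
    return n_stages + len(program) - 1 + len(load_use_stalls(program))


program = [
    ("LW", "R1", ("R1",)),          # LW  R1,0(R1)
    ("SUB", "R4", ("R1", "R5")),    # SUB R4,R1,R5
    ("AND", "R6", ("R1", "R7")),    # AND R6,R1,R7
    ("OR", "R8", ("R1", "R9")),     # OR  R8,R1,R9
]
print(load_use_stalls(program), total_cycles(program))
```

Only SUB is stalled: by the time AND and OR need R1, the loaded value can be forwarded or read from the register file.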

EE524/CptS561 Jose G. Delgado-Frias 24 Branch
         BR R1, LABEL_A
         ADD R2,R3,R7
         AND R5,R7,R11
         ::::
LABEL_A: LD R4,R2,005

EE524/CptS561 Jose G. Delgado-Frias 25 Branch
[Figure: overlapped instructions using Mem (IM), Reg, ALU, Mem (DM), Reg for the sequence below]
BR R1, LABEL_A
ADD R2,R3,R7
AND R5,R7,R11
LD R4,R2,005

EE524/CptS561 Jose G. Delgado-Frias 26 Datapath w/ pipeline
[Figure: pipelined datapath with PC, +4, Inst. Mem., Reg, ALU, zero, Data Mem., and a forwarding unit]

EE524/CptS561 Jose G. Delgado-Frias 27 What to do w/ branch
Reduce the number of cycles needed to decide on a branch.
Delayed branch (Software Solutions)
–NO-OP
–move instructions into the delay slot: from before the branch, from the target, or from the fall-through path

EE524/CptS561 Jose G. Delgado-Frias 28 Branch
[Figure: overlapped instructions using Mem (IM), Reg, ALU, Mem (DM), Reg for the sequence below]
BR R1, LABEL_A
ADD R2,R3,R7
LD R4,R2,005

EE524/CptS561 Jose G. Delgado-Frias 29 NO-OP Branch NO-OP

EE524/CptS561 Jose G. Delgado-Frias 30 From Before Branch

EE524/CptS561 Jose G. Delgado-Frias 31 From Target Branch

EE524/CptS561 Jose G. Delgado-Frias 32 From Fall Through Branch
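The simplest of these scheduling choices, filling the delay slot "from before" and falling back to a NO-OP, can be sketched as below. This is a deliberately small model with one delay slot: it only considers the instruction immediately before the branch and only checks that the branch does not read that instruction's result. The tuple layout `(text, dest, sources)` and the example blocks are hypothetical.

```python
NO_OP = ("NO-OP", None, ())


def fill_delay_slot(block):
    """block ends with a branch; return the block reordered with its delay slot filled."""
    *body, branch = block
    if body:
        candidate = body[-1]
        _, dest, _ = candidate
        _, _, branch_sources = branch
        if dest not in branch_sources:
            # "From before": the branch does not need the result, so swap the two.
            return body[:-1] + [branch, candidate]
    # Nothing safe to move: pad the slot with a NO-OP.
    return body + [branch, NO_OP]


branch = ("BR R1, LABEL_A", None, ("R1",))
independent = [("ADD R2,R3,R7", "R2", ("R3", "R7")), branch]
dependent = [("SUB R1,R4,R5", "R1", ("R4", "R5")), branch]

print([text for text, _, _ in fill_delay_slot(independent)])
print([text for text, _, _ in fill_delay_slot(dependent)])
```

Filling from the target or from the fall-through path needs extra care, because the moved instruction then executes even when the branch goes the other way; that case is not modeled here.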

33 Multicycle Operations
[Figure: pipeline with IF, ID, MEM, WB and parallel execution units between ID and MEM: EX inst. unit, FP multiply, FP adder, FP divider]

34 FP operations
FP Add: 4 cycles
FP Multiply: 7 cycles
FP Divide: 25 cycles

35 Out of order completion
Execution starts in order
Example
Cycle   1   2   3   4   5   6   7   8   9   10  11
MULTD   IF  ID  m1  m2  m3  m4  m5  m6  m7  M   W
ADDD        IF  ID  a1  a2  a3  a4  M   W
LD              IF  ID  X   M   W
SD                  IF  ID  X   M   W
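The completion order in this example follows directly from the execution latencies. The sketch below assumes one instruction is fetched per cycle with no stalls, uses 7 and 4 execute cycles for MULTD and ADDD as in the FP operations slide, and assumes a single execute cycle (X) for LD and SD; the names are illustrative.

```python
EX_CYCLES = {"MULTD": 7, "ADDD": 4, "LD": 1, "SD": 1}


def write_back_cycle(position, opcode):
    """Cycle in which W happens for the instruction fetched in cycle position + 1."""
    if_cycle = position + 1
    id_cycle = if_cycle + 1
    last_ex_cycle = id_cycle + EX_CYCLES[opcode]
    return last_ex_cycle + 2          # M, then W


program = ["MULTD", "ADDD", "LD", "SD"]
finish = {op: write_back_cycle(i, op) for i, op in enumerate(program)}
completion_order = sorted(program, key=finish.get)
print(finish)
print(completion_order)
```

The instructions start in program order but write back in the order LD, SD, ADDD, MULTD, which is why WAW hazards and imprecise exceptions become a concern once multicycle units are added.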

36 MIPS R4000 (Superpipelining)
[Figure: eight-stage pipeline IF, IS, RF, EX, DF, DS, TC, WB over instruction memory, Reg, ALU, data memory, Reg]
IF: Instruction fetch, first half
IS: Instruction fetch, second half
RF: Inst. Decode & Register Fetch
EX: Execution
DF: Data fetch, first half
DS: Data fetch, second half
TC: Tag Check
WB: Write Back

37 Load
[Figure: overlapped instructions using instruction memory, Reg, ALU, data memory, Reg over clock cycles CC1 to CC7 for the sequence below]
LW R1
Instruction 1
Instruction 2
ADD R2,R1

38 Branch
[Figure: overlapped instructions using instruction memory, Reg, ALU, data memory, Reg, starting with BEQZ]

39 Branch (taken)
Branch inst     IF IS RF EX DF DS TC WB
Delay slot      IF IS RF EX DF DS TC WB
stall           S  S  S  S  S  S  S  S
Branch target   IF IS RF EX DF DS TC WB

40 Branch (not taken)
Branch inst     IF IS RF EX DF DS TC WB
Delay slot      IF IS RF EX DF DS TC WB
Branch inst+2   IF IS RF EX DF DS TC WB
Branch inst+3   IF IS RF EX DF DS TC WB
Branch inst+4   IF IS RF EX DF DS TC WB