Computer Organization

Slides:

Advertisements

Similar presentations

Adding the Jump Instruction

Advertisements

The Processor: Datapath & Control

Mary Jane Irwin ( ) [Adapted from Computer Organization and Design,

ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )

1 Stalling  The easiest solution is to stall the pipeline  We could delay the AND instruction by introducing a one-cycle delay into the pipeline, sometimes.

Computer ArchitectureFall 2007 © October 24nd, 2007 Majd F. Sakr CS-447– Computer Architecture.

331 Lec18.1Fall :332:331 Computer Architecture and Assembly Language Fall 2003 Lecture 18 Introduction to Pipelined Datapath [Adapted from Dave.

ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )

Computer ArchitectureFall 2007 © October 22nd, 2007 Majd F. Sakr CS-447– Computer Architecture.

Computer ArchitectureFall 2008 © October 6th, 2008 Majd F. Sakr CS-447– Computer Architecture.

ENGS 116 Lecture 51 Pipelining and Hazards Vincent H. Berk September 30, 2005 Reading for today: Chapter A.1 – A.3, article: Patterson&Ditzel Reading for.

Spring W :332:331 Computer Architecture and Assembly Language Spring 2005 Week 11 Introduction to Pipelined Datapath [Adapted from Dave Patterson’s.

COSC 3430 L08 Basic MIPS Architecture.1 COSC 3430 Computer Architecture Lecture 08 Processors Single cycle Datapath PH 3: Sections

Analogy: Gotta Do Laundry

CSE431 Chapter 4A.1Irwin, PSU, 2008 CSE 431 Computer Architecture Fall 2008 Chapter 4A: The Processor, Part A Mary Jane Irwin ( )

CSE 340 Computer Architecture Summer 2014 Basic MIPS Pipelining Review.

Basic Pipelining & MIPS Pipelining Chapter 6 [Computer Organization and Design, © 2007 Patterson (UCB) & Hennessy (Stanford), & Slides Adapted from: Mary.

CS.305 Computer Architecture Enhancing Performance with Pipelining Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from.

Computer Organization CS224 Chapter 4 Part b The Processor Spring 2010 With thanks to M.J. Irwin, T. Fountain, D. Patterson, and J. Hennessy for some lecture.

CMPE 421 Parallel Computer Architecture

1 Designing a Pipelined Processor In this Chapter, we will study 1. Pipelined datapath 2. Pipelined control 3. Data Hazards 4. Forwarding 5. Branch Hazards.

CECS 440 Pipelining.1(c) 2014 – R. W. Allison [slides adapted from D. Patterson slides with additional credits to M.J. Irwin]

CSIE30300 Computer Architecture Unit 04: Basic MIPS Pipelining Hsin-Chou Chi [Adapted from material by and

ECE-C355 Computer Structures Winter 2008 The MIPS Datapath Slides have been adapted from Prof. Mary Jane Irwin ( )

Datapath and Control AddressInstruction Memory Write Data Reg Addr Register File ALU Data Memory Address Write Data Read Data PC Read Data Read Data.

CSE431 L06 Basic MIPS Pipelining.1Irwin, PSU, 2005 MIPS Pipeline Datapath Modifications  What do we need to add/modify in our MIPS datapath? l State registers.

Lecture 9. MIPS Processor Design – Pipelined Processor Design #1 Prof. Taeweon Suh Computer Science Education Korea University 2010 R&E Computer System.

COM181 Computer Hardware Lecture 6: The MIPs CPU.

CPE432 Chapter 4B.1Dr. W. Abu-Sufah, UJ Chapter 4B: The Processor, Part B-1 Read Sections 4.7 Adapted from Slides by Prof. Mary Jane Irwin, Penn State.

Computer Architecture Lecture 6.  Our implementation of the MIPS is simplified memory-reference instructions: lw, sw arithmetic-logical instructions:

Stalling delays the entire pipeline

Single-Cycle Datapath and Control

Processor Design & Implementation

Morgan Kaufmann Publishers

Performance of Single-cycle Design

Morgan Kaufmann Publishers The Processor

Single Clock Datapath With Control

ECE232: Hardware Organization and Design

Pipelining Lessons 6 PM T a s k O r d e B C D A 30

ECE232: Hardware Organization and Design

Design of the Control Unit for Single-Cycle Instruction Execution

Chapter 4 The Processor Part 3

Review: MIPS Pipeline Data and Control Paths

Morgan Kaufmann Publishers The Processor

Chapter 4 The Processor Part 2

Single-cycle datapath, slightly rearranged

ELEC / Computer Architecture and Design Spring Pipelining (Chapter 6)

Design of the Control Unit for One-cycle Instruction Execution

A pipeline diagram Clock cycle lw $t0, 4($sp) IF ID

Pipelining in more detail

CS-447– Computer Architecture Lecture 14 Pipelining (2)

Pipelining Lessons 6 PM T a s k O r d e B C D A 30

Rocky K. C. Chang 6 November 2017

Pipelining Basic concept of assembly line

The Processor Lecture 3.6: Control Hazards

The Processor Lecture 3.4: Pipelining Datapath and Control

The Processor Lecture 3.2: Building a Datapath with Control

The Processor Lecture 3.5: Data Hazards

Instruction Execution Cycle

COSC 2021: Computer Organization Instructor: Dr. Amir Asif

Pipelining Basic concept of assembly line

Pipelining Basic concept of assembly line

Pipelining Appendix A and Chapter 3.

Morgan Kaufmann Publishers The Processor

Introduction to Computer Organization and Architecture

A relevant question Assuming you’ve got: One washer (takes 30 minutes)

The Processor: Datapath & Control.

MIPS Pipelined Datapath

Processor: Datapath and Control

Pipelined datapath and control

Presentation transcript:

Computer Organization 2015 - 2016

The Processor 2

Executing Branch Operations Branch operations involves: compare the operands read from the Register File during decode for equality (zero ALU output) compute the branch target address by adding the updated PC to the 16- bit signed-extended constant field in the instruction beq $s1, $s2, 100

Executing Branch Operations (cont.)

Executing Jump Operations Jump operation involves: replace the lower 28 bits of the PC with the lower 26 bits of the fetched instruction shifted left by 2 bits Add 4 4 Jump address Instruction Memory Shift left 2 28 PC Read Address Instruction 26

The Main Control Unit The control unit takes inputs and generates a write signal for state elements (memory or registers), the selector control for multiplexors, and the ALU control The control unit uses the 6-bit opcode field to generate these control signals MIPS regularity and simplicity means that a simple decoding process can be used to determine how to set the control lines The datapath operates in a single clock cycle and the signals within the datapath can vary during the clock cycle

The Operation of the Datapath Execution steps for R-type instructions: The instruction is fetched from the instruction memory, and the PC is incremented Two register values are read from the register file using bits 25:21 and 20:16 of the instruction to select the source registers also, the main control unit computes the setting of the control lines during this step The ALU operates on the data values read from the register file, using the function code (bits 5:0, which is the funct field, of the instruction) to generate the ALU function The result from the ALU is written into the register file using bits 15:11 of the instruction to select the destination register

The Operation of the Datapath (cont.) Execution steps for a lw instruction: The instruction is fetched from the instruction memory, and the PC is incremented A register value is read from the register file using bits 25:21 of the instruction to select the source register also, the main control unit computes the setting of the control lines during this step The ALU computes the sum of the value read from the register file and the sign- extended, lower 16 bits of the instruction The sum from the ALU is used as the address for the data memory The data from the memory unit is written into the register file using bits 20:16 of the instruction to select the destination register

The Operation of the Datapath (cont.) Execution steps for a beq instruction: The instruction is fetched from the instruction memory, and the PC is incremented Two register values are read from the register file using bits 25:21 and 20:16 of the instruction to select the source registers also, the main control unit computes the setting of the control lines during this step The ALU performs a subtract on the data values read from the register file The value of PC+4 is added to the sign-extended 16 bits constant of the instruction shifted left by two, the result is the branch target address The Zero result from the ALU is used to decide which adder result to store into the PC

Single Cycle Disadvantages & Advantages Uses the clock cycle inefficiently – the clock cycle must be timed to accommodate the slowest instruction May be wasteful of area since some functional units (e.g., adders) must be duplicated since they can not be shared during a clock cycle but Is simple and easy to understand Clk lw sw Waste Cycle 1 Cycle 2 In the Single Cycle implementation, the cycle time is set to accommodate the longest instruction, the Load instruction. Since the cycle time has to be long enough for the load instruction, it is too long for the store instruction so the last part of the cycle here is wasted.

How Can We Make It Faster? Start fetching and executing the next instruction before the current one has completed Pipelining – all modern processors are pipelined for performance Remember the performance equation: CPU time = CPI * CCT * IC Under ideal conditions and with a large number of instructions, the speedup from pipelining is approximately equal to the number of pipe stages A five stage pipeline is nearly five times faster because the CC is nearly five times faster In reality the time per instruction in a pipeline processor is longer than the minimum possible because 1) the pipeline stages may not be perfectly balanced and 2) pipelining involves some overhead (like pipeline stage isolation registers).

The Five Stages of Load Instruction Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 lw IFetch Dec Exec Mem WB IFetch: Instruction Fetch and Update PC Dec: Registers Fetch and Instruction Decode Exec: Execute R-type; calculate memory address Mem: Read/write the data from/to the Data Memory WB: Write the result data into the register file As shown here, each of these five steps will take one clock cycle to complete.

A Pipelined MIPS Processor Start the next instruction before the current one has completed improves throughput - total amount of work done in a given time instruction latency (execution time - time from the start of an instruction to its completion) is not reduced clock cycle (pipeline stage time) is limited by the slowest stage for some stages don’t need the whole clock cycle (e.g., WB) for some instructions, some stages are wasted cycles (i.e., nothing is done during that cycle for that instruction)

A Pipelined MIPS Processor (cont.) Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 lw IFetch Dec Exec Mem WB sw IFetch Dec Exec Mem WB R-type IFetch Dec Exec Mem WB

Single Cycle versus Pipeline Clk Single Cycle Implementation (CC = 800 ps): lw sw Waste Cycle 1 Cycle 2 1,000,000 adds for single cycle = 800,000,000 ps 1,000,000 adds for pipelined = 200,000,000 ps + 800 ps (fill or flush time) lw IFetch Dec Exec Mem WB Pipeline Implementation (CC = 200 ps): sw R-type To complete an entire instruction in the pipelined case takes 1000 ps

Pipelining the MIPS What makes it easy all instructions are the same length (32 bits) so they can fetch in the 1st stage and decode in the 2nd stage Some instructions can begin reading register file in 2nd stage memory operations occur only in loads and stores and can use the execute stage to calculate memory addresses each instruction writes at most one result (i.e., changes the machine state) and does it in the last two pipeline stages (MEM or WB) operands must be aligned in memory so a single data transfer takes only one data memory access

MIPS Pipeline Datapath IF:IFetch ID:Dec EX:Execute MEM: MemAccess WB: WriteBack Read Address Instruction Memory Add PC 4 Write Data Read Addr 1 Read Addr 2 Write Addr Register File Data 1 Data 2 16 32 ALU Shift left 2 Data IF/ID Sign Extend ID/EX EX/MEM MEM/WB Any info that is needed in a later pipe stage must be passed to that stage via a pipeline register (e.g., need to preserve the destination register address in the pipeline state registers) Note two exceptions to right-to-left flow WB that writes the result back into the register file in the middle of the datapath Selection of the next value of the PC, one input comes from the calculated branch address from the MEM stage Only later instructions in the pipeline can be influenced by these two REVERSE data movements. The first one (WB to ID) leads to data hazards. The second one (MEM to IF) leads to control hazards. All instructions must update some state in the processor – the register file, the memory, or the PC – so separate pipeline registers are redundant to the state that is updated (not needed). PC can be thought of as a pipeline register: the one that feeds the IF stage of the pipeline. Unlike all of the other pipeline registers, the PC is part of the visible architecture state – its content must be saved when an exception occurs (the contents of the other pipe registers are discarded). System Clock

Graphically Representing MIPS Pipeline ALU (EX) IM (IF) Reg (ID) DM (MEM) (WB)

Why Pipeline? For Performance! Time (clock cycles) Once the pipeline is full, one instruction is completed every cycle, so CPI = 1 ALU IM Reg DM Inst 0 I n s t r. O r d e ALU IM Reg DM Inst 1 ALU IM Reg DM Inst 2 ALU IM Reg DM Inst 3 ALU IM Reg DM Inst 4 Time to fill the pipeline

Can Pipelining Get Us Into Trouble? Yes: Pipeline Hazards structural hazards: attempt to use the same resource by two different instructions at the same time data hazards: attempt to use data before it is ready An instruction’s source operand(s) are produced by a prior instruction still in the pipeline control hazards: attempt to make a decision about program control flow before the condition has been evaluated and the new PC target address calculated branch and jump instructions Note that data hazards can come from R-type instructions or lw instructions Exceptions can’t be resolved by waiting!

A Single Memory Would Be a Structural Hazard Time (clock cycles) Reading data from memory ALU Mem Reg lw I n s t r. O r d e ALU Mem Reg Inst 1 ALU Mem Reg Inst 2 ALU Mem Reg Inst 3 Reading instruction from memory ALU Mem Reg Inst 4 Fix with separate instruction and data memories (I$ and D$)

How About Register File Access? Time (clock cycles) Fix register file access hazard by doing reads in the second half of the cycle and writes in the first half ALU IM Reg DM add $s1, I n s t r. O r d e ALU IM Reg DM Inst 1 ALU IM Reg DM Inst 2 Define register reads to occur in the second half of the cycle and register writes in the first half ALU IM Reg DM add $s2,$s1,

Register Usage Can Cause Data Hazards Dependencies backward in time cause hazards ALU IM Reg DM add $s1, ALU IM Reg DM sub $s4,$s1,$s5 ALU IM Reg DM and $s6,$s1,$s7 ALU IM Reg DM For lecture or $s8,$s1,$s9 ALU IM Reg DM xor $s4,$s1,$s5 Read before write (data hazard)

Avoiding the Data Hazard The compiler inserts two nops (no operation instruction) before the and instruction add $s1, $s2, $s3 nop sub $s4, $s1, $s5 and $s6, $s1, $s7 or $s8, $s1, $s9 xor $s4, $s1, $s5

One Way to “Fix” a Data Hazard Can fix data hazard by waiting ALU IM Reg DM add $s1, I n s t r. O r d e nops nops sub $s4,$s1,$s5 and $s6,$s1,$s7 ALU IM Reg DM

Another Way to “Fix” a Data Hazard Fix data hazards by forwarding results as soon as they are available to where they are needed ALU IM Reg DM add $s1, I n s t r. O r d e ALU IM Reg DM sub $s4,$s1,$s5 ALU IM Reg DM and $s6,$s1,$s7 For lecture Forwarding paths are valid only if the destination stage is later in time than the source stage. Forwarding is harder if there are multiple results to forward per instruction or if they need to write a result early in the pipeline. Notice that for now we are showing the forwarded data coming out of the ALU. After looking at the problem more closely, we will see that it is really supplied by the pipeline register EX/MEM or MEM/WB and will depict is as such. ALU IM Reg DM or $s8,$s1,$s9 ALU IM Reg DM xor $s4,$s1,$s5

Loads Can Cause Data Hazards Dependencies backward in time cause hazards ALU IM Reg DM lw $s1,4($s2) I n s t r. O r d e ALU IM Reg DM sub $s4,$s1,$s5 ALU IM Reg DM and $s6,$s1,$s7 ALU IM Reg DM Note that lw is just another example of register usage (beyond ALU ops) or $s8,$s1,$s9 ALU IM Reg DM xor $s4,$s1,$s5 Load-use data hazard

Branch Instructions Cause Control Hazards Dependencies backward in time cause hazards beq ALU IM Reg DM I n s t r. O r d e ALU IM Reg DM lw ALU IM Reg DM Inst 3 ALU IM Reg DM Inst 4