1 1999 ©UCB CS 162 Computer Architecture Lecture 2: Introduction & Pipelining Instructor: L.N. Bhuyan www.cs.ucr.edu/~bhuyan/cs162.



Review of Last Class
° MIPS Datapath
° Introduction to Pipelining
° Introduction to Instruction Level Parallelism (ILP)
° Introduction to VLIW

What is Multiprocessing
° Parallelism at the instruction level is limited by data dependencies, so speedup is limited.
° Program-level parallelism is abundant, for example loop-level parallelism in a loop such as Do I = 1000. How about employing multiple processors to execute the loops? => Parallel processing, or multiprocessing
° With a billion transistors on a chip, we can put a few CPUs on one chip => Chip multiprocessor

Memory Latency Problem
Even if we increase CPU power, memory is the real bottleneck. Techniques to alleviate the memory latency problem:
1. Memory hierarchy: program locality, cache memory, multilevel, pages, and context switching.
2. Prefetching: get the instruction or data before the CPU needs it. This works well for instructions because of sequential locality, so all modern processors use prefetch buffers for instructions. What should we do for data?
3. Multithreading: can the CPU jump to another program while accessing memory? It is like multiprogramming.

Hardware Multithreading
° Switching between threads in software is very time-consuming (why?). That is acceptable for I/O access, as in multitasking, but not for main memory access, so we need a hardware multithreading technique.
° Provide multiple PCs and register sets on the CPU so that a thread switch can occur without storing the register contents in main memory (on the stack, as is done for context switching).
° Several threads reside in the CPU simultaneously, and execution switches between the threads on a main memory access.
° How about both multiprocessors and multithreading on a chip? => Network processor
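The switch-on-memory-access idea can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not a model of any real processor: each thread is a hypothetical list of operations (`"c"` for a one-cycle compute operation, `"m"` for a memory access), a memory access blocks its thread for `mem_latency` cycles, and switching between hardware contexts is assumed to cost zero cycles. The function name `simulate` and the trace format are made up for this example.

```python
def simulate(threads, mem_latency):
    """Toy switch-on-memory-access multithreading model (illustrative only).

    threads: list of operation lists, "c" = compute, "m" = memory access.
    Returns (total_cycles, busy_cycles). Assumes zero-cost thread switches
    and does not wait for a memory access that is a thread's last operation.
    """
    next_op = [0] * len(threads)    # per-thread position, like a per-thread PC
    ready_at = [0] * len(threads)   # first cycle in which each thread may issue
    cycle = busy = current = 0

    while any(next_op[i] < len(t) for i, t in enumerate(threads)):
        ready = [i for i, t in enumerate(threads)
                 if next_op[i] < len(t) and ready_at[i] <= cycle]
        if not ready:
            cycle += 1              # idle slot: every unfinished thread is waiting
            continue
        if current not in ready:
            current = ready[0]      # switch to another resident thread
        op = threads[current][next_op[current]]
        next_op[current] += 1
        busy += 1
        if op == "m":
            # The thread is blocked for mem_latency cycles after this one.
            ready_at[current] = cycle + 1 + mem_latency
        cycle += 1
    return cycle, busy


if __name__ == "__main__":
    trace = ["c", "m", "c"]
    print(simulate([trace], mem_latency=3))         # one hardware context
    print(simulate([trace, trace], mem_latency=3))  # two hardware contexts
```

With these made-up traces, one context finishes 3 operations in 6 cycles, while two contexts finish 6 operations in 8 cycles, because the second thread fills some of the slots that would otherwise be idle during the first thread's memory access.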

Architectural Comparisons (cont.)
[Figure: issue slots over time (processor cycles) for Superscalar, Fine-Grained, Coarse-Grained, Multiprocessing, and Simultaneous Multithreading. Legend: Thread 1, Thread 2, Thread 3, Thread 4, Thread 5, idle slot.]

Intel IXP1200 Network Processor
Initial component of the Intel Exchange Architecture (IXA)
Each microengine is a 5-stage pipeline: no ILP, 4-way multithreaded
7-core multiprocessing: 6 microengines and a StrongARM core
166 MHz fundamental clock rate
  - Intel claims 2.5 Mpps IP routing for 64-byte packets
Already the most widely used NPU
  - Or, more accurately, the most widely admitted use

IXP1200 Chip Layout
StrongARM processing core
Microengines introduce a new ISA
I/O
  - PCI
  - SDRAM
  - SRAM
  - IX: PCI-like packet bus
On-chip FIFOs
  - 16 entries, 64 B each

IXP1200 Microengine
4 hardware contexts
  - Single-issue processor
  - Explicit, optional context switch on SRAM access
Registers
  - All are single-ported
  - Separate GPR
  - 1536 registers total
32-bit ALU
  - Can access GPR or XFER registers
Standard 5-stage pipe
4 KB SRAM instruction store (not a cache!)

Intel IXP2400 Microengine (New)
XScale core replaces StrongARM
1.4 GHz target in 0.13-micron
Nearest-neighbor routes added between microengines
Hardware to accelerate CRC operations and random number generation
16-entry CAM

MIPS Pipeline (Chapter 6, CS 161 Text)

Review: Single-cycle Datapath for MIPS
[Figure: single-cycle datapath with PC, Instruction Memory (Imem), Registers, ALU, and Data Memory (Dmem), divided into five stages: Stage 1 IFtch, Stage 2 Dcd, Stage 3 Exec, Stage 4 Mem, Stage 5 WB.]
° Use datapath figure to represent pipeline: IM, Reg, ALU, DM, Reg

Stages of Execution in Pipelined MIPS
5-stage instruction pipeline:
1) I-fetch: Fetch Instruction, Increment PC
2) Decode: Instruction, Read Registers
3) Execute:
   Mem-reference: Calculate Address
   R-format: Perform ALU Operation
4) Memory:
   Load: Read Data from Data Memory
   Store: Write Data to Data Memory
5) Write Back: Write Data to Register

Pipelined Execution Representation
° To simplify the pipeline, every instruction takes the same number of steps, called stages
° One clock cycle per stage
[Figure: five instructions, each passing through IFtch, Dcd, Exec, Mem, WB, with each instruction starting one clock cycle after the previous one. Program flow runs downward; time runs to the right.]
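The staggered diagram on this slide can be regenerated with a few lines of Python. This is a minimal sketch: the stage names come from the slide, while the function names `pipeline_diagram` and `format_diagram` are made up for this example, and it assumes an ideal pipeline with no stalls or hazards.

```python
STAGES = ["IFtch", "Dcd", "Exec", "Mem", "WB"]  # stage names as on the slide


def pipeline_diagram(num_instructions):
    """Return one row per instruction; row i starts i clock cycles late.

    Each row has one cell per clock cycle. A cell holds the stage the
    instruction occupies in that cycle, or "" if it is not in the pipeline.
    Assumes an ideal pipeline (one cycle per stage, no stalls).
    """
    total_cycles = num_instructions + len(STAGES) - 1
    rows = []
    for i in range(num_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(STAGES):
            row[i + s] = name
        rows.append(row)
    return rows


def format_diagram(rows):
    """Render the rows as fixed-width text, time running to the right."""
    return "\n".join(
        " ".join(f"{cell:<5}" for cell in row).rstrip() for row in rows
    )


if __name__ == "__main__":
    print(format_diagram(pipeline_diagram(5)))
```

Reading one column of the output shows which stage each instruction occupies in that clock cycle; with five instructions, the fifth cycle is the first in which all five stages are busy.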

Datapath Timing: Single-cycle vs. Pipelined
° Assume the following delays for major functional units: 2 ns for a memory access or ALU operation, 1 ns for a register file read or write
° Total datapath delay for single-cycle:

Insn     Insn    Reg    ALU    Data     Reg     Total
Type     Fetch   Read   Oper   Access   Write   Time
beq      2ns     1ns    2ns    -        -       5ns
R-form   2ns     1ns    2ns    -        1ns     6ns
sw       2ns     1ns    2ns    2ns      -       7ns
lw       2ns     1ns    2ns    2ns      1ns     8ns

° In pipeline machine, each stage = length of longest delay = 2ns; 5 stages = 10ns
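The arithmetic in this table can be checked with a short script. The delays and the per-instruction unit usage are taken from the slide; the names (`USES`, `total_delay`, and so on) are illustrative only.

```python
# Functional-unit delays in ns, as assumed on the slide.
MEM_NS = 2  # instruction fetch or data memory access
ALU_NS = 2  # ALU operation
REG_NS = 1  # register file read or write

# Delay contributed by each step:
# (insn fetch, reg read, ALU oper, data access, reg write); 0 = step unused.
USES = {
    "beq":    (MEM_NS, REG_NS, ALU_NS, 0,      0),
    "R-form": (MEM_NS, REG_NS, ALU_NS, 0,      REG_NS),
    "sw":     (MEM_NS, REG_NS, ALU_NS, MEM_NS, 0),
    "lw":     (MEM_NS, REG_NS, ALU_NS, MEM_NS, REG_NS),
}
NUM_STAGES = 5


def total_delay(insn_type):
    """Single-cycle datapath delay for one instruction type, in ns."""
    return sum(USES[insn_type])


def single_cycle_clock():
    """The single-cycle clock must fit the slowest instruction (lw)."""
    return max(total_delay(t) for t in USES)


def pipelined_clock():
    """Each pipeline stage is as long as the slowest functional unit."""
    return max(max(steps) for steps in USES.values())


if __name__ == "__main__":
    for insn_type in USES:
        print(f"{insn_type:<7}{total_delay(insn_type)} ns")
    print("single-cycle clock:", single_cycle_clock(), "ns")
    print("pipelined clock:   ", pipelined_clock(), "ns")
    print("pipelined latency: ", NUM_STAGES * pipelined_clock(), "ns")
```

Note that the pipelined latency of one instruction (10 ns) is longer than the slowest single-cycle instruction (8 ns); the benefit of pipelining is that an instruction can complete every 2 ns once the pipeline is full.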

Pipelining Lessons
° Pipelining doesn’t help latency (execution time) of a single task; it helps throughput of the entire workload
° Multiple tasks operate simultaneously using different resources
° Potential speedup = number of pipe stages
° Time to “fill” the pipeline and time to “drain” it reduce speedup
° Pipeline rate is limited by the slowest pipeline stage
° Unbalanced lengths of pipe stages also reduce speedup
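These lessons can be made concrete with a small speedup calculation. The sketch below rests on simplifying assumptions that are not from the slides: no hazards or stalls, an unpipelined machine that spends the sum of all stage delays on each instruction, and a pipelined machine whose clock equals the slowest stage. The function name `speedup` is made up for this example.

```python
def speedup(num_instructions, stage_delays):
    """Speedup of an ideal pipeline over an unpipelined machine.

    Assumptions (illustrative, not from the slides): no hazards or stalls;
    the unpipelined machine takes sum(stage_delays) per instruction; the
    pipelined clock period is max(stage_delays).
    """
    num_stages = len(stage_delays)
    unpipelined_time = num_instructions * sum(stage_delays)
    # The first instruction needs num_stages cycles to fill the pipeline;
    # after that, one instruction completes per cycle.
    pipelined_cycles = num_stages + num_instructions - 1
    pipelined_time = pipelined_cycles * max(stage_delays)
    return unpipelined_time / pipelined_time


if __name__ == "__main__":
    balanced = [2, 2, 2, 2, 2]    # ns per stage, all equal
    unbalanced = [2, 1, 2, 2, 1]  # ns per stage, using the 2 ns / 1 ns unit delays
    for n in (1, 10, 1_000_000):
        print(n, round(speedup(n, balanced), 3), round(speedup(n, unbalanced), 3))
```

For a single instruction the speedup is at most 1 (pipelining does not help latency). For long instruction streams, the balanced case approaches the number of stages (5), while the unbalanced case approaches only 4, because every stage is stretched to the 2 ns of the slowest one.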

Single Cycle Datapath (From Ch 5)
[Figure: single-cycle datapath. Components: PC, Imem, Regs, sign extend, ALU, ALU control, Dmem, two adders (PC + 4 and branch target), shift left 2, and multiplexers. Control signals: RegDst, RegWrite, ALUSrc, ALUOp, MemRead, MemWrite, MemToReg, PCSrc. Instruction fields: 31:0, 25:21, 20:16, 15:11, 15:0.]

Required Changes to Datapath
° Introduce registers to separate the 5 stages by putting IF/ID, ID/EX, EX/MEM, and MEM/WB registers in the datapath.
° The next PC value is computed in the 3rd step, but we need to bring in the next instruction in the next cycle, so move the PCSrc mux to the 1st stage. The PC is incremented unless there is a new branch address.
° The branch address is computed in the 3rd stage. With pipelining, the PC value has changed by then, so we must carry the PC value along with the instruction. Width of IF/ID register = (IR) + (PC) = 64 bits.

Changes to Datapath Contd.
° For the lw instruction, we need the write register address at stage 5. But the IR is now occupied by another instruction! So we must carry the IR destination field along as we move through the stages. See the connection in the figure.
° Length of ID/EX register = (Reg1: 32) + (Reg2: 32) + (offset: 32) + (PC: 32) + (destination register: 5) = 133 bits
° Assignment: What are the lengths of the EX/MEM and MEM/WB registers?

Pipelined Datapath (with Pipeline Regs) (6.2)
[Figure: pipelined datapath with the stages Fetch, Decode, Execute, Memory, and Write Back separated by the pipeline registers IF/ID (64 bits), ID/EX (133 bits), EX/MEM (102 bits), and MEM/WB (69 bits). Components: PC, Imem, Regs, sign extend (16-bit input), shift left 2, adders, ALU with Zero output, Dmem, and multiplexers. The 5-bit write register number is carried through the pipeline registers.]
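The pipeline register widths in the figure can be tallied field by field. The IF/ID and ID/EX fields below are the ones listed on the slides. The EX/MEM and MEM/WB breakdowns are an assumption: one plausible set of fields that adds up to the 102 and 69 bits shown in the figure. Control signals are not counted, and the dictionary and function names are made up for this example.

```python
# Field widths in bits for each pipeline register (control signals excluded).
PIPELINE_REGISTERS = {
    # From the slides: instruction register plus PC.
    "IF/ID": {"instruction": 32, "PC": 32},
    # From the slides: Reg1, Reg2, offset, PC, destination register.
    "ID/EX": {
        "read data 1": 32,
        "read data 2": 32,
        "sign-extended offset": 32,
        "PC": 32,
        "destination register": 5,
    },
    # Assumed breakdown, chosen to match the 102 bits in the figure.
    "EX/MEM": {
        "branch target": 32,
        "ALU Zero flag": 1,
        "ALU result": 32,
        "read data 2 (store data)": 32,
        "destination register": 5,
    },
    # Assumed breakdown, chosen to match the 69 bits in the figure.
    "MEM/WB": {
        "memory read data": 32,
        "ALU result": 32,
        "destination register": 5,
    },
}


def register_widths():
    """Return the total width in bits of each pipeline register."""
    return {name: sum(fields.values())
            for name, fields in PIPELINE_REGISTERS.items()}


if __name__ == "__main__":
    for name, width in register_widths().items():
        print(f"{name:<7}{width} bits")
```

The destination register number appears in every register after decode, which is the "carry the IR destination field along" requirement from the previous slide.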