The Single Cycle Datapath

Presentation transcript:

The Single Cycle Datapath
UCSD CSE 141, Larry Carter, Winter 2002

Note: Some of the material in this lecture is COPYRIGHT 1998 MORGAN KAUFMANN PUBLISHERS, INC. ALL RIGHTS RESERVED. Figures may be reproduced only for classroom or personal education use in conjunction with our text and only when the above line is included.

2/6/02, CSE 141 - Single Cycle Datapath

The Performance Big Picture

Execution Time = Insts * CPI * Cycle Time

Processor design (datapath and control) will determine:
  Clock cycle time
  Clock cycles per instruction

Starting today: the single cycle processor, which executes an entire instruction in one cycle.
  Advantage: CPI = 1
  Disadvantage: long cycle time

Performance is determined by three factors: (a) instruction count, (b) clock cycle time, and (c) clock cycles per instruction. Instruction count is controlled by the ISA and the compiler design; the computer engineer has very little control over it. The implementer of an ISA can affect the clock cycle time and the clock cycles per instruction. We’ll design a processor that takes one clock cycle to execute every instruction. The disadvantage of this single cycle design is its long cycle time: the clock period must be long enough for the slowest instruction to finish.
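To make the formula concrete, here is a minimal sketch in Python. The instruction count, CPI values, and cycle times are made-up numbers chosen only to illustrate the trade-off; they do not come from the lecture or from any measured machine.

```python
def execution_time(instruction_count, cpi, cycle_time_s):
    """Execution Time = Insts * CPI * Cycle Time (result in seconds)."""
    return instruction_count * cpi * cycle_time_s


# Hypothetical single cycle design: CPI is exactly 1, but the cycle is long.
single_cycle = execution_time(1_000_000, cpi=1.0, cycle_time_s=8e-9)

# Hypothetical alternative: several short cycles per instruction.
multi_cycle = execution_time(1_000_000, cpi=3.5, cycle_time_s=2e-9)

print(f"single cycle: {single_cycle * 1e3:.3f} ms")
print(f"multi cycle:  {multi_cycle * 1e3:.3f} ms")
```

With these assumed numbers the low CPI does not win on its own; whichever design has the smaller CPI * Cycle Time product is faster for the same instruction count.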

Processor Design

We're ready to implement the MIPS “core”:
  load-store instructions: lw, sw
  reg-reg instructions: add, sub, and, or, slt
  control flow instructions: beq

First, we need to fetch an instruction into the processor:
  the program counter (PC) supplies the instruction address
  get the instruction from memory

[Figure: the PC, clocked by Clk, drives the 32-bit Address input of a memory with Write Enable, Data In, and 32-bit DataOut ports; the instruction appears on DataOut.]

That was too easy

A problem: how will we do a load or store? Remember that memory has only 1 port, and we want to do everything in 1 cycle.

[Figure: the same single-port memory, with the PC on Address and the instruction appearing on DataOut.]

Instruction & Data in same cycle?

Solution: separate data and instruction memory.
  There will be only one DRAM memory.
  We want a stored program architecture. (How else can you compile and then run a program?)
  But we can have separate SRAM caches. (We’ll study caches later.)

[Figure: the PC supplies the address to the instruction cache, where the instruction appears; a separate data cache has Address, Write Enable, Data In, and DataOut ports. Buses are 32 bits wide and the state elements are clocked by Clk.]

Instruction Fetch Unit

Updating the PC for the next instruction:
  Sequential code: PC <- PC + 4
  Branch and Jump: PC <- “something else” (we’ll worry about these later)

Now let’s take a look at the first major component of the datapath: the instruction fetch unit. The common RTL operations for all instructions are: (a) Fetch the instruction using the Program Counter (PC) at the beginning of an instruction’s execution (PC -> Instruction Memory -> Instruction Word). (b) Then, at the end of the instruction’s execution, update the Program Counter (PC -> Next Address Logic -> PC). More specifically, you increment the PC by 4 if you are executing sequential code. For Branch and Jump instructions, you update the program counter to “something else” other than plus 4. I will show you what is inside this Next Address Logic block when we talk about the Branch and Jump instructions. For now, let’s focus our attention on the Add and Subtract instructions.
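The two RTL steps above can be sketched in software. This is only an illustrative model, not hardware: the FetchUnit class, its method names, and the dictionary used as instruction memory are all assumptions made for the example.

```python
MASK32 = 0xFFFFFFFF


class FetchUnit:
    """Toy model of the instruction fetch unit (hypothetical names)."""

    def __init__(self, instruction_memory, start_pc=0):
        # instruction_memory: dict mapping byte address -> 32-bit word
        self.instruction_memory = instruction_memory
        self.pc = start_pc

    def fetch(self):
        # (a) PC -> Instruction Memory -> Instruction Word
        return self.instruction_memory[self.pc]

    def update_pc(self, something_else=None):
        # (b) PC -> Next Address Logic -> PC
        if something_else is None:
            self.pc = (self.pc + 4) & MASK32   # sequential code
        else:
            self.pc = something_else & MASK32  # branch or jump target


unit = FetchUnit({0: 0x012A4020, 4: 0x8D280004})
first_word = unit.fetch()
unit.update_pc()
second_word = unit.fetch()
```

The branch and jump case is left as a plain target address here because the next address logic is only described later in the lecture.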

The MIPS core subset

R-type: add rd, rs, rt (also sub, and, or, slt)
  bits    31-26   25-21   20-16   15-11   10-6    5-0
  field   op      rs      rt      rd      shamt   funct
  width   6       5       5       5       5       6
  Read registers rs and rt
  Feed them to ALU
  Update register file

LOAD and STORE: lw rt, rs, imm and sw rt, rs, imm
  bits    31-26   25-21   20-16   15-0
  field   op      rs      rt      immediate
  width   6       5       5       16
  Read register rs (and rt for store)
  Feed rs and immed to ALU
  Move data between mem and reg

BRANCH: beq rs, rt, imm
  bits    31-26   25-21   20-16   15-0
  field   op      rs      rt      displacement
  width   6       5       5       16
  Read registers rs and rt
  Feed to ALU to compare
  Add PC to disp; update PC

The Add and Subtract instructions use the R format. The Op and Funct fields together specify all the different kinds of add and subtract instructions. Rs and Rt specify the source registers, and the Rd field specifies the destination register. The Or immediate instruction uses the I format. It only uses one source register, Rs. The other operand comes from the immediate field. The Rt field is used to specify the destination register. Both the load and store instructions use the I format, and both add Rs and the immediate field together to form the memory address. The difference is that the load instruction will load the data from memory into Rt while the store instruction will store the data in Rt into the memory. The branch on equal instruction also uses the I format. Here Rs and Rt are used to specify the registers we need to compare. If these two registers are equal, we will branch to a location specified by the immediate field. Finally, the jump instruction uses the J format and always causes the program to jump to a memory location specified in the address field. I know I went over this rather quickly and you may have missed something. But don’t worry, this is just an overview. You will keep seeing these (point to the format) all day today.
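As a small illustration of these formats, the sketch below pulls every field out of a 32-bit instruction word with shifts and masks. The decode function and its dictionary keys are hypothetical names chosen for this example; the bit positions are the ones shown in the format diagrams.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into its R-format and I-format fields."""
    return {
        "op":    (word >> 26) & 0x3F,   # bits 31-26
        "rs":    (word >> 21) & 0x1F,   # bits 25-21
        "rt":    (word >> 16) & 0x1F,   # bits 20-16
        "rd":    (word >> 11) & 0x1F,   # bits 15-11 (R format only)
        "shamt": (word >> 6) & 0x1F,    # bits 10-6  (R format only)
        "funct": word & 0x3F,           # bits 5-0   (R format only)
        "imm16": word & 0xFFFF,         # bits 15-0  (I format only)
    }


# 0x012A4020 encodes add $8, $9, $10 (rd = 8, rs = 9, rt = 10).
add_fields = decode(0x012A4020)

# 0x8D280004 encodes a load of register 8 from address R[9] + 4.
lw_fields = decode(0x8D280004)
```

All fields are extracted unconditionally, much as the hardware does: the instruction bus is simply wired to every consumer, and the control decides which fields are actually used.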

Processor Design

Generic implementation:
  all instructions read some registers
  all instructions use the ALU after reading registers
  memory is accessed and registers are updated after the ALU

This suggests the basic design: register file first, then the ALU, then data memory and register write-back.

Datapath for Reg-Reg Operations

R[rd] <- R[rs] op R[rt]
Example: add rd, rs, rt
Ra, Rb, and Rw come from the rs, rt, and rd fields
ALUoperation signal depends on op and funct

  bits    31-26   25-21   20-16   15-11   10-6    5-0
  field   op      rs      rt      rd      shamt   funct
  width   6       5       5       5       5       6

And here is the datapath that can do the trick. First of all, we connect the register file’s Ra, Rb, and Rw inputs to the Rs, Rt, and Rd fields of the instruction bus (points to the format diagram). Then we need to connect busA and busB of the register file to the ALU. Finally, we need to connect the output of the ALU to the input bus of the register file. Conceptually, this is how it works. The instruction bus coming out of the Instruction memory will set Ra and Rb to the register specifiers Rs and Rt. This causes the register file to put the value of register Rs onto busA and the value of register Rt onto busB, respectively. By setting the ALUctr appropriately, the ALU will perform either the Add or the Subtract for us. The result is then fed back to the register file, where the register specifier Rw should already be set to the instruction bus’s Rd field. Since the control, which we will design in our next lecture, should have already set the RegWr signal to 1, the result will be written back to the register file at the next clock tick (points to the Clk input).
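The register transfer R[rd] <- R[rs] op R[rt] can be modeled in a few lines. This is a sketch under stated assumptions: the register file is a plain 32-entry Python list, the operation is passed as a string instead of an encoded ALUctr value, and register 0 is kept at zero as MIPS requires. The names execute_rtype and ALU_OPERATIONS are invented for the example.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value):
    """Interpret a 32-bit pattern as a two's complement integer."""
    return value - (1 << 32) if value & 0x80000000 else value


# The five reg-reg operations of the core subset, on 32-bit patterns.
ALU_OPERATIONS = {
    "add": lambda a, b: (a + b) & MASK32,
    "sub": lambda a, b: (a - b) & MASK32,
    "and": lambda a, b: a & b,
    "or":  lambda a, b: a | b,
    "slt": lambda a, b: int(to_signed32(a) < to_signed32(b)),
}


def execute_rtype(regs, alu_operation, rs, rt, rd, reg_write=True):
    """R[rd] <- R[rs] op R[rt], written only when reg_write is asserted."""
    bus_a = regs[rs]                 # Ra = rs, value appears on busA
    bus_b = regs[rt]                 # Rb = rt, value appears on busB
    result = ALU_OPERATIONS[alu_operation](bus_a, bus_b)
    if reg_write and rd != 0:        # Rw = rd; register 0 stays zero
        regs[rd] = result
    return result


regs = [0] * 32
regs[9], regs[10] = 5, 7
execute_rtype(regs, "add", rs=9, rt=10, rd=8)
```

In the real datapath the write happens at the clock edge ending the cycle; the sketch collapses that into an ordinary assignment.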

Datapath for Load Operations

R[rt] <- Mem[R[rs] + SignExt[imm16]]
Example: lw rt, rs, imm16

  bits    31-26   25-21   20-16   15-0
  field   op      rs      rt      immediate
  width   6       5       5       16

Once again we cannot use the instruction’s Rd field for the Register File’s Rw input, because load is an I-type instruction and there is no such thing as the Rd field in the I format. So instead of Rd, the Rt field is used to specify the destination register through this two-to-one multiplexor. The first operand of the ALU comes from busA of the register file, which contains the value of Register Rs (points to the Ra input of the register file). The second operand, on the other hand, comes from the immediate field of the instruction. Instead of using the Zero Extender I used in the datapath for the or immediate instruction, I have to use a more general purpose Extender that can do both Sign Extend and Zero Extend. The ALU then adds these two operands together to form the memory address. Consequently, the output of the ALU has to go to two places: (a) first, the address input of the data memory, and (b) second, the input of this two-to-one multiplexor. The other input of this multiplexor comes from the output of the data memory, so we can place the output of the data memory onto the register file’s input bus for the load instruction. For Add, Subtract, and the Or immediate instructions, the output of the ALU will be selected to be placed on the input bus of the register file. In either case, the control signal RegWr should be asserted so the register file will be written at the end of the cycle.
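Here is a rough Python model of the load transfer, mainly to show what the sign extender does to a negative offset. The helper names and the dictionary standing in for data memory (byte address -> 32-bit word) are assumptions for illustration only.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16):
    """SignExt[imm16]: copy bit 15 into the upper bits (returns a Python int)."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def execute_lw(regs, data_memory, rs, rt, imm16):
    """R[rt] <- Mem[R[rs] + SignExt[imm16]]"""
    address = (regs[rs] + sign_extend16(imm16)) & MASK32  # ALU performs an add
    loaded = data_memory[address]                         # data memory DataOut
    if rt != 0:                                           # Rw = rt for a load
        regs[rt] = loaded
    return address


regs = [0] * 32
regs[9] = 0x1000
data_memory = {0x0FFC: 0xDEADBEEF}

# imm16 = 0xFFFC is -4 after sign extension, so the address is 0x0FFC.
address = execute_lw(regs, data_memory, rs=9, rt=8, imm16=0xFFFC)
```

A zero extender would have turned 0xFFFC into 65532 instead of -4, which is why the load needs the more general Extender described above.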

Datapath for Store Operations

Mem[R[rs] + SignExt[imm16]] <- R[rt]
Example: sw rt, rs, imm16

  bits    31-26   25-21   20-16   15-0
  field   op      rs      rt      immediate
  width   6       5       5       16

The store forms its memory address exactly as the load does: the ALU adds the value of register Rs on busA to the sign-extended immediate. The difference is the direction of the data. The value of register Rt goes to the data input of the data memory and is written there, so MemWr is asserted and the register file is not written.
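For comparison with the load, a matching sketch of the store transfer follows. As before, the function name, the mem_write parameter, and the dictionary used as data memory are hypothetical stand-ins, not signals or structures defined in the lecture.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16):
    """SignExt[imm16] as a Python int."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def execute_sw(regs, data_memory, rs, rt, imm16, mem_write=True):
    """Mem[R[rs] + SignExt[imm16]] <- R[rt]; no register is written."""
    address = (regs[rs] + sign_extend16(imm16)) & MASK32  # same add as the load
    if mem_write:                                         # write enable of data memory
        data_memory[address] = regs[rt]                   # busB drives Data In
    return address


regs = [0] * 32
regs[9], regs[8] = 0x2000, 77
data_memory = {}
address = execute_sw(regs, data_memory, rs=9, rt=8, imm16=8)
```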

Combining datapaths

How do we allow different datapaths for different instructions? Use a multiplexor!

[Figure: the R-type and store datapaths, combined by a multiplexor whose select input is ALUSrc.]
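A two-to-one multiplexor is easy to picture in code. In the sketch below the select input is written as a boolean named use_immediate rather than as 0 or 1, because which numeric value selects which input depends on how the figure is drawn; that naming, and both function names, are assumptions of this example.

```python
def mux2(select, input_when_false, input_when_true):
    """Two-to-one multiplexor: pass one of two inputs through, chosen by select."""
    return input_when_true if select else input_when_false


def alu_second_operand(use_immediate, bus_b, extended_immediate):
    """Second ALU operand: busB for R-type, the extended immediate for lw and sw."""
    return mux2(use_immediate, bus_b, extended_immediate)


# R-type: the ALU sees the register value on busB.
r_type_operand = alu_second_operand(False, bus_b=7, extended_immediate=100)

# Store: the ALU sees the extended immediate, and busB goes to memory instead.
store_operand = alu_second_operand(True, bus_b=7, extended_immediate=100)
```

Both inputs are always present at the multiplexor; the control signal only decides which one reaches the ALU, so the same hardware serves both instruction classes.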

Datapath for Branch Operations

beq rs, rt, imm16
We need to compare Rs and Rt

  bits    31-26   25-21   20-16   15-0
  field   op      rs      rt      immediate
  width   6       5       5       16

The datapath for calculating the branch condition is rather simple. All we have to do is feed the Rs and Rt fields of the instruction into the Ra and Rb inputs of the register file. Bus A will then contain the value from the register selected by Rs, and bus B will contain the value from the register selected by Rt. The next thing to do is to ask the ALU to perform a subtract operation and feed the output Zero to the next address logic. What does the next address logic block look like? Well, before I show you that, let’s take a look at the binary arithmetic behind the program counter (PC).
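The comparison by subtraction can be sketched as follows. The function names and the boolean branch parameter are illustrative assumptions; the only thing taken from the lecture is that the ALU subtracts and its Zero output feeds the next address logic.

```python
MASK32 = 0xFFFFFFFF


def alu_subtract(a, b):
    """Return the 32-bit difference and the Zero output of the ALU."""
    result = (a - b) & MASK32
    zero = result == 0
    return result, zero


def branch_condition(regs, rs, rt, branch=True):
    """True when the instruction is a beq (branch asserted) and R[rs] == R[rt]."""
    _, zero = alu_subtract(regs[rs], regs[rt])   # busA - busB
    return branch and zero


regs = [0] * 32
regs[4], regs[5], regs[6] = 42, 42, 43
equal_case = branch_condition(regs, rs=4, rt=5)
unequal_case = branch_condition(regs, rs=4, rt=6)
```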

Computing the Next Address

PC is a 32-bit byte address into the instruction memory:
  Sequential operation: PC<31:0> = PC<31:0> + 4
  Branch: PC<31:0> = PC<31:0> + 4 + SignExt[Imm16] * 4

We don’t need the 2 least-significant bits because:
  The 32-bit PC is a byte address
  And all our instructions are 4 bytes (32 bits) long
  The 2 LSBs of the 32-bit PC are always zeros

In theory, the Program Counter (PC) is a 32-bit byte address into the Instruction memory. The Program Counter is incremented by four after each sequential instruction. When a branch is taken, we need to sign extend the 16-bit immediate field, multiply this sign extended value by four, and add it to the sequential instruction address (PC + 4). Why does this magic number “4” always come up? Well, the reason is that the 32-bit PC is a byte address and all MIPS instructions are four bytes, or 32 bits, long. In other words, if we keep a 32-bit Program Counter, then the two least significant bits of the Program Counter will always be zeros. And if these two bits are always zeros, there is no reason to have hardware to keep them. So in practice, we will simplify the hardware by using a 30-bit program counter. That is, we will build a Program Counter that only keeps track of the upper 30 bits (<31:2>) of the instruction address, because we know the 2 least significant bits will always be 0s. Then, instead of always increasing the Program Counter by four for sequential operation, we only have to increase it by 1. And for a branch operation, we don’t need to multiply the sign extended immediate field by four before adding it to the sequential PC (PC + 1). And when we apply the program counter to the address of the instruction memory, we need to attach two zeros to its least significant bits.
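The claim that the 30-bit program counter computes the same addresses can be checked with a short sketch. Everything here (function names, the take_branch flag, the example address) is an assumption made for illustration; the arithmetic follows the two formulas on the slide.

```python
MASK32 = 0xFFFFFFFF
MASK30 = 0x3FFFFFFF


def sign_extend16(imm16):
    """SignExt[Imm16] as a Python int."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def next_pc32(pc, imm16, take_branch):
    """Byte-address version: PC + 4, plus SignExt[Imm16] * 4 if the branch is taken."""
    sequential = (pc + 4) & MASK32
    if take_branch:
        return (sequential + sign_extend16(imm16) * 4) & MASK32
    return sequential


def next_pc30(pc30, imm16, take_branch):
    """Word-address version: PC<31:2> + 1, plus SignExt[Imm16] if taken."""
    sequential = (pc30 + 1) & MASK30
    if take_branch:
        return (sequential + sign_extend16(imm16)) & MASK30
    return sequential


def to_byte_address(pc30):
    """Attach two zero bits before addressing the instruction memory."""
    return pc30 << 2


pc = 0x00400010                      # arbitrary word-aligned example address
backward = next_pc32(pc, 0xFFFD, take_branch=True)            # offset of -3 words
backward_30 = to_byte_address(next_pc30(pc >> 2, 0xFFFD, take_branch=True))
```

Both versions land on the same instruction; the 30-bit one simply never stores or adds the two bits that are known to be zero.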

All together: the single cycle datapath

So here is the single cycle datapath we just built. If you push into the Instruction Fetch Unit, you will see the last slide showing the PC, the next address logic, and the Instruction Memory. Here I have shown how we can get the Rt, Rs, Rd, and Imm16 fields out of the 32-bit instruction word. The Rt, Rs, and Rd fields will go to the register file as register specifiers, while the Imm16 field will go to the Extender, where it is either Zero or Sign extended to 32 bits. The signals ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg, RegDst, RegWr, Branch, and Jump are control signals. And I will show you how to generate them on Friday.
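To see the pieces working together, here is a compact behavioral sketch that executes one instruction per call, the software analogue of one clock cycle. It is not the lecture's control design: the step function, the dictionaries used as memories, and the if/elif chain standing in for the control signals are all assumptions. The opcode and funct values are the standard MIPS encodings for the core subset.

```python
MASK32 = 0xFFFFFFFF
OP_RTYPE, OP_LW, OP_SW, OP_BEQ = 0x00, 0x23, 0x2B, 0x04
R_FUNCT = {0x20: "add", 0x22: "sub", 0x24: "and", 0x25: "or", 0x2A: "slt"}


def sign_extend16(imm16):
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def to_signed32(value):
    return value - (1 << 32) if value & 0x80000000 else value


def alu(operation, a, b):
    if operation == "add":
        return (a + b) & MASK32
    if operation == "sub":
        return (a - b) & MASK32
    if operation == "and":
        return a & b
    if operation == "or":
        return a | b
    if operation == "slt":
        return int(to_signed32(a) < to_signed32(b))
    raise ValueError(f"unknown ALU operation: {operation}")


def step(pc, regs, instruction_memory, data_memory):
    """Execute the instruction at pc and return the next pc."""
    word = instruction_memory[pc]                       # instruction fetch
    op = (word >> 26) & 0x3F
    rs, rt, rd = (word >> 21) & 0x1F, (word >> 16) & 0x1F, (word >> 11) & 0x1F
    funct = word & 0x3F
    extended = sign_extend16(word & 0xFFFF) & MASK32    # Extender output
    bus_a, bus_b = regs[rs], regs[rt]                   # register file read
    next_pc = (pc + 4) & MASK32

    if op == OP_RTYPE:
        result = alu(R_FUNCT[funct], bus_a, bus_b)
        if rd != 0:
            regs[rd] = result                           # destination is rd
    elif op == OP_LW:
        address = alu("add", bus_a, extended)
        if rt != 0:
            regs[rt] = data_memory[address]             # destination is rt
    elif op == OP_SW:
        data_memory[alu("add", bus_a, extended)] = bus_b
    elif op == OP_BEQ:
        if alu("sub", bus_a, bus_b) == 0:               # Zero output
            next_pc = (next_pc + sign_extend16(word & 0xFFFF) * 4) & MASK32
    else:
        raise ValueError(f"opcode {op:#x} is outside the core subset")
    return next_pc


# Tiny hypothetical program: load a word, double it, store it, branch back.
program = {0: 0x8D280000, 4: 0x01085020, 8: 0xAD2A0004, 12: 0x1108FFFC}
regs = [0] * 32
regs[9] = 0x100
data_memory = {0x100: 21}
pc = 0
for _ in range(4):
    pc = step(pc, regs, program, data_memory)
```

Each branch of the if/elif chain corresponds to one setting of the control signals; generating those settings from the opcode is the controller's job, which this sketch deliberately does not model.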

The R-Format (e.g. add) Datapath

Need ALUsrc=1, ALUop=“add”, MemWrite=0, MemToReg=0, RegDst=0, RegWrite=1 and PCsrc=1.

The Load Datapath

What control signals do we need for load?

The Store Datapath

The beq Datapath

Key Points

CPU is just a collection of state and combinational logic.
We just designed a very rich processor, at least in terms of functionality.
Execution time = Insts * CPI * Cycle Time: where does the single-cycle machine fit in?

Computer of the Day

The IBM 1620 (1959)
A 2nd generation computer: transistors & core storage. (First generation ones used tubes and delay-based memory.)
Example of creative architecture.
About 2000 built. Relatively inexpensive (< $1620/month rental).
A decimal computer: 6 bits per digit or character (4 bits, flag (for +/- and end-of-word), ECC).
Variable-length data: fields terminated by flag.
Arithmetic by table lookup!
Codenamed CADET: “Can’t Add, Doesn’t Even Try”