CS61C L24 Introduction to CPU Design, Garcia, Spring 2007 © UCB

CS61C L24 Introduction to CPU Design (1) Garcia, Spring 2007 © UCB
UC Berkeley CS61C : Machine Structures
Lecture 24 – Introduction to CPU Design
Lecturer SOE Dan Garcia
inst.eecs.berkeley.edu/~cs61c
Cell pic to web site → A new MS app lets people search the web based on a digital cell phone photo (of poster, ad, dvd, magazine, painting, product). Cool!
lincoln.msresearch.us

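CS61C L24 Introduction to CPU Design (2) Garcia, Spring 2007 © UCB
What about overflow?
Consider a 2-bit signed # & overflow:
  → 10 = -2: overflows when added to -2 or -1
  → 11 = -1: overflows when added to -2 only
  → 00 = 0: NOTHING!
  → 01 = 1: overflows when added to 1 only
Overflows when… (C in and C out are the carry into and out of the highest adder)
  → C in, but no C out: A,B both > 0, overflow!
  → C out, but no C in: A,B both < 0, overflow!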
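The overflow rule on this slide can be checked exhaustively for the 2-bit case. The following is a minimal sketch, not the lecture's own code: the function names are hypothetical, and it assumes overflow is flagged exactly when the carry into the highest adder differs from the carry out of it.

```python
def add_2bit_signed(a, b):
    """Add two 2-bit two's-complement values; return (result, overflow)."""
    a_bits, b_bits = a & 0b11, b & 0b11
    # The carry out of bit 0 is the carry INTO the highest (sign-bit) adder.
    c_in = (a_bits & 1) & (b_bits & 1)
    a1, b1 = a_bits >> 1, b_bits >> 1
    s0 = (a_bits ^ b_bits) & 1
    s1 = a1 ^ b1 ^ c_in
    # Carry OUT of the highest adder (majority of its three inputs).
    c_out = (a1 & b1) | (a1 & c_in) | (b1 & c_in)
    raw = (s1 << 1) | s0
    result = raw - 4 if raw & 0b10 else raw  # reinterpret as signed
    return result, bool(c_in ^ c_out)


def overflowing_pairs():
    """List every (a, b) pair of 2-bit signed values whose sum overflows."""
    values = [-2, -1, 0, 1]
    return [(a, b) for a in values for b in values if add_2bit_signed(a, b)[1]]


print(overflowing_pairs())
```

Only four operand pairs overflow: 1 + 1 (C in but no C out) and the three sums of two negative numbers that fall below -2 (C out but no C in).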

CS61C L24 Introduction to CPU Design (3) Garcia, Spring 2007 © UCB
Extremely Clever Signed Adder/Subtractor
A - B = A + (-B); how do we make “-B”?
Conditional Inverter: XOR
  x y | XOR(x,y)
  0 0 | 0
  0 1 | 1
  1 0 | 1
  1 1 | 0
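One way to answer the "how do we make -B" question is the standard two's-complement identity -B = ~B + 1. The sketch below is an illustration under that assumption, with hypothetical names (`full_adder`, `add_sub`, `to_signed`): a single `sub` control bit drives an XOR on every bit of B (the conditional inverter) and also feeds the carry-in of bit 0 to supply the "+1".

```python
def full_adder(a, b, c_in):
    """1-bit full adder: return (sum, carry_out)."""
    s = a ^ b ^ c_in
    c_out = (a & b) | (a & c_in) | (b & c_in)
    return s, c_out


def add_sub(a, b, sub, n_bits=4):
    """sub=0 computes a+b, sub=1 computes a-b; returns the n-bit result pattern."""
    carry = sub  # the SUB signal doubles as carry-in, adding the "+1" of ~B + 1
    result = 0
    for i in range(n_bits):
        a_i = (a >> i) & 1
        b_i = ((b >> i) & 1) ^ sub  # XOR as conditional inverter: flips b_i only when sub=1
        s, carry = full_adder(a_i, b_i, carry)
        result |= s << i
    return result


def to_signed(x, n_bits=4):
    """Reinterpret an n-bit pattern as a two's-complement value."""
    return x - (1 << n_bits) if x & (1 << (n_bits - 1)) else x


print(add_sub(5, 3, sub=1))             # 5 - 3
print(to_signed(add_sub(3, 5, sub=1)))  # 3 - 5
```

The same N full adders serve both operations, so subtraction costs only N XOR gates and one control wire.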

CS61C L24 Introduction to CPU Design (4) Garcia, Spring 2007 © UCB
Five Components of a Computer
Computer
  Processor
    Control
    Datapath
  Memory (passive) (where programs, data live when running)
  Devices
    Input: Keyboard, Mouse
    Output: Display, Printer
    Disk (where programs, data live when not running)

CS61C L24 Introduction to CPU Design (5) Garcia, Spring 2007 © UCB
The CPU
Processor (CPU): the active part of the computer, which does all the work (data manipulation and decision-making)
Datapath: portion of the processor which contains hardware necessary to perform operations required by the processor (the brawn)
Control: portion of the processor (also in hardware) which tells the datapath what needs to be done (the brain)

CS61C L24 Introduction to CPU Design (6) Garcia, Spring 2007 © UCB
Stages of the Datapath : Overview
Problem: a single, atomic block which “executes an instruction” (performs all necessary operations beginning with fetching the instruction) would be too bulky and inefficient
Solution: break up the process of “executing an instruction” into stages, and then connect the stages to create the whole datapath
  → smaller stages are easier to design
  → easy to optimize (change) one stage without touching the others

CS61C L24 Introduction to CPU Design (7) Garcia, Spring 2007 © UCB
Stages of the Datapath (1/5)
There is a wide variety of MIPS instructions: so what general steps do they have in common?
Stage 1: Instruction Fetch
  → no matter what the instruction, the 32-bit instruction word must first be fetched from memory (the cache-memory hierarchy)
  → also, this is where we Increment PC (that is, PC = PC + 4, to point to the next instruction: byte addressing so + 4)
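A minimal sketch of Stage 1, assuming instruction memory is modeled as a Python `bytes` object and words are stored big-endian (an assumption for illustration; the slide does not specify byte order). The function name `fetch` and the two encoded words (`add $3,$1,$2` and `addi $3,$1,17`) are illustrative choices.

```python
def fetch(memory: bytes, pc: int):
    """Stage 1: read the 32-bit instruction word at pc and compute PC + 4."""
    word = int.from_bytes(memory[pc:pc + 4], "big")  # big-endian is an assumption
    next_pc = pc + 4  # byte addressing: each instruction occupies 4 bytes
    return word, next_pc


# Two hand-encoded instruction words placed back to back in memory.
program = bytes.fromhex("00221820" "20230011")

word, pc = fetch(program, 0)
print(hex(word), pc)
word, pc = fetch(program, pc)
print(hex(word), pc)
```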

CS61C L24 Introduction to CPU Design (8) Garcia, Spring 2007 © UCB
Stages of the Datapath (2/5)
Stage 2: Instruction Decode
  → upon fetching the instruction, we next gather data from the fields (decode all necessary instruction data)
  → first, read the Opcode to determine instruction type and field lengths
  → second, read in data from all necessary registers
      for add, read two registers
      for addi, read one register
      for jal, no reads necessary
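The field extraction in Stage 2 can be sketched with shifts and masks, using the standard MIPS32 field positions (opcode in bits 31..26, rs 25..21, rt 20..16, rd 15..11, shamt 10..6, funct 5..0, immediate 15..0). The helper names and the small `REGISTER_READS` table are hypothetical and cover only the three cases named on the slide.

```python
def sign_extend_16(value):
    """Interpret a 16-bit field as a signed two's-complement number."""
    return value - 0x10000 if value & 0x8000 else value


def decode(word):
    """Split a 32-bit MIPS instruction word into its fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
        "imm": sign_extend_16(word & 0xFFFF),
        "target": word & 0x03FFFFFF,
    }


# Which register fields must be read, keyed by opcode (illustrative subset):
# 0x00 = R-type such as add, 0x08 = addi, 0x03 = jal.
REGISTER_READS = {0x00: ("rs", "rt"), 0x08: ("rs",), 0x03: ()}


def registers_to_read(fields):
    """Return the register numbers Stage 2 reads for this instruction."""
    return [fields[name] for name in REGISTER_READS[fields["opcode"]]]


print(registers_to_read(decode(0x00221820)))  # add $3,$1,$2
print(registers_to_read(decode(0x20230011)))  # addi $3,$1,17
print(registers_to_read(decode(0x0C000000)))  # jal 0
```

Hardware extracts all fields in parallel, since they are just wires; the opcode then decides which of them matter.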

CS61C L24 Introduction to CPU Design (9) Garcia, Spring 2007 © UCB
Stages of the Datapath (3/5)
Stage 3: ALU (Arithmetic-Logic Unit)
  → the real work of most instructions is done here: arithmetic (+, -, *, /), shifting, logic (&, |), comparisons (slt)
  → what about loads and stores?
      lw $t0, 40($t1)
      the address we are accessing in memory = the value in $t1 PLUS the value 40
      so we do this addition in this stage
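A rough sketch of what Stage 3 computes, assuming 32-bit wraparound arithmetic and modeling only a subset of the operations listed on the slide (multiply and divide are left out). The `alu` function and its string operation names are hypothetical. The last line shows the load/store case: the address for `lw $t0, 40($t1)` is an ordinary ALU add.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Reinterpret a 32-bit pattern as a signed two's-complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu(op, a, b):
    """Compute one ALU operation on 32-bit operands (illustrative subset)."""
    if op == "add":
        return (a + b) & MASK32
    if op == "sub":
        return (a - b) & MASK32
    if op == "and":
        return a & b & MASK32
    if op == "or":
        return (a | b) & MASK32
    if op == "sll":
        return (a << (b & 31)) & MASK32
    if op == "slt":
        return int(to_signed32(a) < to_signed32(b))  # signed comparison
    raise ValueError(f"unsupported ALU operation: {op}")


t1 = 0x1000                       # assumed contents of $t1
print(hex(alu("add", t1, 40)))    # effective address for lw $t0, 40($t1)
```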

CS61C L24 Introduction to CPU Design (10) Garcia, Spring 2007 © UCB
Stages of the Datapath (4/5)
Stage 4: Memory Access
  → actually only the load and store instructions do anything during this stage; the others remain idle during this stage or skip it altogether
  → since these instructions have a unique step, we need this extra stage to account for them
  → as a result of the cache system, this stage is expected to be fast

CS61C L24 Introduction to CPU Design (11) Garcia, Spring 2007 © UCB
Stages of the Datapath (5/5)
Stage 5: Register Write
  → most instructions write the result of some computation into a register
  → examples: arithmetic, logical, shifts, loads, slt
  → what about stores, branches, jumps?
      don’t write anything into a register at the end
      these remain idle during this fifth stage or skip it altogether

CS61C L24 Introduction to CPU Design (12) Garcia, Spring 2007 © UCB
Generic Steps of Datapath
[Figure: datapath with PC, +4 adder, instruction memory, registers (rs, rt, rd), imm, ALU, and data memory]
1. Instruction Fetch
2. Decode/Register Read
3. Execute
4. Memory
5. Reg. Write

CS61C L24 Introduction to CPU Design (13) Garcia, Spring 2007 © UCB
Administrivia
Dan’s office hours for the next two weeks moved to 3pm
Homework 5 due tonight
Midterm
  → Grading standards up soon
  → If you wish to have a problem regraded
      Staple your reasons to the front of the exam
      Return your exam to your TA

CS61C L24 Introduction to CPU Design (14) Garcia, Spring 2007 © UCB
Datapath Walkthroughs (1/3)
add $r3,$r1,$r2 # r3 = r1+r2
  → Stage 1: fetch this instruction, inc. PC
  → Stage 2: decode to find it’s an add, then read registers $r1 and $r2
  → Stage 3: add the two values retrieved in Stage 2
  → Stage 4: idle (nothing to write to memory)
  → Stage 5: write result of Stage 3 into register $r3

CS61C L24 Introduction to CPU Design (15) Garcia, Spring 2007 © UCB
Example: add Instruction
[Figure: datapath for add r3, r1, r2: the registers supply reg[1] and reg[2], and the ALU computes reg[1]+reg[2]]

CS61C L24 Introduction to CPU Design (16) Garcia, Spring 2007 © UCB
Datapath Walkthroughs (2/3)
slti $r3,$r1,17
  → Stage 1: fetch this instruction, inc. PC
  → Stage 2: decode to find it’s an slti, then read register $r1
  → Stage 3: compare value retrieved in Stage 2 with the integer 17
  → Stage 4: idle
  → Stage 5: write the result of Stage 3 in register $r3

CS61C L24 Introduction to CPU Design (17) Garcia, Spring 2007 © UCB
Example: slti Instruction
[Figure: datapath for slti r3, r1, 17 (register fields 3, 1, x): the registers supply reg[1], imm supplies 17, and the ALU computes reg[1]<17?]

CS61C L24 Introduction to CPU Design (18) Garcia, Spring 2007 © UCB
Datapath Walkthroughs (3/3)
sw $r3, 17($r1)
  → Stage 1: fetch this instruction, inc. PC
  → Stage 2: decode to find it’s a sw, then read registers $r1 and $r3
  → Stage 3: add 17 to value in register $r1 (retrieved in Stage 2)
  → Stage 4: write value in register $r3 (retrieved in Stage 2) into memory address computed in Stage 3
  → Stage 5: idle (nothing to write into a register)

CS61C L24 Introduction to CPU Design (19) Garcia, Spring 2007 © UCB
Example: sw Instruction
[Figure: datapath for SW r3, 17(r1) (register fields 3, 1, x): the registers supply reg[1] and reg[3], imm supplies 17, the ALU computes reg[1]+17, and data memory performs MEM[r1+17]<=r3]

CS61C L24 Introduction to CPU Design (20) Garcia, Spring 2007 © UCB
Why Five Stages? (1/2)
Could we have a different number of stages?
  → Yes, and other architectures do
So why does MIPS have five if instructions tend to idle for at least one stage?
  → The five stages are the union of all the operations needed by all the instructions.
  → There is one instruction that uses all five stages: the load

CS61C L24 Introduction to CPU Design (21) Garcia, Spring 2007 © UCB
Why Five Stages? (2/2)
lw $r3, 17($r1)
  → Stage 1: fetch this instruction, inc. PC
  → Stage 2: decode to find it’s a lw, then read register $r1
  → Stage 3: add 17 to value in register $r1 (retrieved in Stage 2)
  → Stage 4: read value from memory address computed in Stage 3
  → Stage 5: write value found in Stage 4 into register $r3

CS61C L24 Introduction to CPU Design (22) Garcia, Spring 2007 © UCB
Example: lw Instruction
[Figure: datapath for LW r3, 17(r1) (register fields 3, 1, x): the registers supply reg[1], imm supplies 17, the ALU computes reg[1]+17, and data memory returns MEM[r1+17]]
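The four walkthroughs (add, slti, sw, lw) can be replayed with a toy model. This is a sketch under simplifying assumptions, not a real MIPS simulator: instructions arrive pre-decoded as tuples, memory is a dictionary keyed by address, alignment is not checked, and `run_instruction` is a hypothetical name. It returns the list of stages that did real work, which makes the "idle" stages visible.

```python
def run_instruction(instr, regs, mem, pc):
    """Walk one pre-decoded instruction through the five stages.

    Tuple layouts follow assembly operand order:
      ("add", rd, rs, rt)   ("slti", rt, rs, imm)
      ("sw", rt, imm, rs)   ("lw", rt, imm, rs)
    Returns (next_pc, stages_used).
    """
    op, x, y, z = instr
    # Stage 1: Instruction Fetch (instr is already in hand) and PC increment.
    next_pc = pc + 4
    used = ["fetch", "decode", "alu"]
    # Stage 2: Decode / Register Read.
    if op == "add":
        dest, a, b = x, regs[y], regs[z]
    elif op == "slti":
        dest, a, b = x, regs[y], z
    elif op in ("lw", "sw"):
        dest, a, b = x, regs[z], y      # base register plus immediate offset
    else:
        raise ValueError(f"unsupported instruction: {op}")
    # Stage 3: ALU (compare for slti, add for everything else here).
    alu_out = int(a < b) if op == "slti" else a + b
    # Stage 4: Memory Access (only loads and stores do anything).
    write_back = alu_out
    if op == "lw":
        write_back = mem[alu_out]
        used.append("memory")
    elif op == "sw":
        mem[alu_out] = regs[x]
        used.append("memory")
    # Stage 5: Register Write (stores skip it).
    if op != "sw":
        regs[dest] = write_back
        used.append("reg_write")
    return next_pc, used


regs = [0] * 32
regs[1], regs[2] = 3, 5
mem = {}
for instr in [("add", 3, 1, 2), ("sw", 3, 17, 1), ("lw", 4, 17, 1)]:
    _, stages = run_instruction(instr, regs, mem, 0)
    print(instr[0], stages)
```

Only `lw` reports all five stages, which matches the slide's argument for why the datapath needs five.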

CS61C L24 Introduction to CPU Design (23) Garcia, Spring 2007 © UCB
Datapath Summary
The datapath is based on the data transfers required to perform instructions
A controller causes the right transfers to happen
[Figure: datapath (PC, +4 adder, instruction memory, registers with rs, rt, rd, imm, ALU, data memory) with a Controller driven by opcode, funct]

CS61C L24 Introduction to CPU Design (24) Garcia, Spring 2007 © UCB
What Hardware Is Needed? (1/2)
PC: a register which keeps track of memory addr of the next instruction
General Purpose Registers
  → used in Stages 2 (Read) and 5 (Write)
  → MIPS has 32 of these
Memory
  → used in Stages 1 (Fetch) and 4 (R/W)
  → cache system makes these two stages as fast as the others, on average

CS61C L24 Introduction to CPU Design (25) Garcia, Spring 2007 © UCB
What Hardware Is Needed? (2/2)
ALU
  → used in Stage 3
  → something that performs all necessary functions: arithmetic, logicals, etc.
  → we’ll design details later
Miscellaneous Registers
  → In implementations with only one stage per clock cycle, registers are inserted between stages to hold intermediate data and control signals as they travel from stage to stage.
  → Note: Register is a general purpose term meaning something that stores bits. Not all registers are in the “register file”.

CS61C L24 Introduction to CPU Design (26) Garcia, Spring 2007 © UCB
CPU clocking (1/2)
Single Cycle CPU: All stages of an instruction are completed within one long clock cycle.
  → The clock cycle is made sufficiently long to allow each instruction to complete all stages without interruption and within one cycle.
For each instruction, how do we control the flow of information through the datapath?
[Figure: 1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Reg. Write]

CS61C L24 Introduction to CPU Design (27) Garcia, Spring 2007 © UCB
CPU clocking (2/2)
Multiple-cycle CPU: Only one stage of an instruction per clock cycle.
  → The clock cycle is made as long as the slowest stage.
  → Several significant advantages over single cycle execution: Unused stages in a particular instruction can be skipped OR instructions can be pipelined (overlapped).
For each instruction, how do we control the flow of information through the datapath?
[Figure: 1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Reg. Write]
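The two clocking schemes can be compared with a little arithmetic. The stage delays below are made-up numbers chosen only for illustration, and the function names are hypothetical. The sketch assumes the single-cycle period is the sum of all stage delays, while the multiple-cycle period equals the slowest stage.

```python
# Hypothetical stage delays in picoseconds (illustrative values only).
STAGE_DELAY_PS = {
    "fetch": 200, "decode": 100, "alu": 200, "memory": 200, "reg_write": 100,
}


def single_cycle_period(delays):
    """One long cycle must cover every stage back to back."""
    return sum(delays.values())


def multi_cycle_period(delays):
    """One stage per cycle, so the cycle must fit the slowest stage."""
    return max(delays.values())


def multi_cycle_latency(stages_used, delays):
    """Time for one instruction when unused stages are skipped."""
    return len(stages_used) * multi_cycle_period(delays)


add_stages = ["fetch", "decode", "alu", "reg_write"]
lw_stages = ["fetch", "decode", "alu", "memory", "reg_write"]

print(single_cycle_period(STAGE_DELAY_PS))
print(multi_cycle_period(STAGE_DELAY_PS))
print(multi_cycle_latency(add_stages, STAGE_DELAY_PS))
print(multi_cycle_latency(lw_stages, STAGE_DELAY_PS))
```

With these numbers a lone `lw` actually takes longer in the multiple-cycle design, because every stage is padded to the slowest one. The payoff comes from skipping stages and, above all, from overlapping instructions in a pipeline.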

CS61C L24 Introduction to CPU Design (28) Garcia, Spring 2007 © UCB
Peer Instruction
A. If the destination reg is the same as the source reg, we could compute the incorrect value!
B. We’re going to be able to read 2 registers and write a 3rd in 1 cycle
C. Datapath is hard, Control is easy
   ABC
0: FFF
1: FFT
2: FTF
3: FTT
4: TFF
5: TFT
6: TTF
7: TTT

CS61C L24 Introduction to CPU Design (29) Garcia, Spring 2007 © UCB
Peer Instruction
A. Truth table for mux with 4-bits of signals has 2^4 rows
B. We could cascade N 1-bit shifters to make 1 N-bit shifter for sll, srl
C. If 1-bit adder delay is T, the N-bit adder delay would also be T
   ABC
1: FFF
2: FFT
3: FTF
4: FTT
5: TFF
6: TFT
7: TTF
8: TTT

CS61C L24 Introduction to CPU Design (30) Garcia, Spring 2007 © UCB
Peer Instruction Answer
A. Truth table for mux with 4-bits of signals is 2^4 rows long
B. We could cascade N 1-bit shifters to make 1 N-bit shifter for sll, srl
C. If 1-bit adder delay is T, the N-bit adder delay would also be T
   ABC
1: FFF
2: FFT
3: FTF
4: FTT
5: TFF
6: TFT
7: TTF
8: TTT
Answers:
A. Truth table for mux with 4-bits of signals controls 16 inputs, for a total of 20 inputs, so truth table is 2^20 rows… FALSE
B. We could cascade N 1-bit shifters to make 1 N-bit shifter for sll, srl… TRUE
C. What about the cascading carry? FALSE
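Answers A and C come down to counting. The sketch below uses hypothetical helper names and two simplifying assumptions: each mux data input is 1 bit wide, and the N-bit adder is a ripple-carry design whose worst-case delay is roughly N times the 1-bit adder delay, since the carry must pass through every stage.

```python
def mux_truth_table_rows(select_bits, data_width=1):
    """Rows in the full truth table of a mux with the given number of select bits."""
    data_inputs = 2 ** select_bits                    # 4 select bits choose among 16 inputs
    total_input_bits = data_inputs * data_width + select_bits
    return 2 ** total_input_bits                      # one row per input combination


def ripple_carry_delay(n_bits, t):
    """Approximate worst-case delay: the carry ripples through all n 1-bit adders."""
    return n_bits * t


print(mux_truth_table_rows(4))     # 2**20 rows, not 2**4
print(ripple_carry_delay(32, 1))   # about 32 T for a 32-bit adder, not T
```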

CS61C L24 Introduction to CPU Design (31) Garcia, Spring 2007 © UCB
“And In conclusion…”
N-bit adder-subtractor done using N 1-bit adders with XOR gates on input
  → XOR serves as conditional inverter
CPU design involves Datapath, Control
Datapath in MIPS involves 5 CPU stages
  1) Instruction Fetch
  2) Instruction Decode & Register Read
  3) ALU (Execute)
  4) Memory
  5) Register Write