1 EE/CPRE 465 VLSI Design Process. 2 Outline Design Partitioning Design process: MIPS Processor as an example –Architecture Design –Microarchitecture.

Slides:



Advertisements
Similar presentations
Introduction to CMOS VLSI Design Lecture 2: MIPS Processor Example Credits: David Harris Harvey Mudd College (Material taken/adapted from Harris lecture.
Advertisements

CS1104: Computer Organisation School of Computing National University of Singapore.
Multicycle Datapath & Control Andreas Klappenecker CPSC321 Computer Architecture.
1  1998 Morgan Kaufmann Publishers We will be reusing functional units –ALU used to compute address and to increment PC –Memory used for instruction and.
Chapter 2.
1 Chapter Five The Processor: Datapath and Control.
CS-447– Computer Architecture Lecture 12 Multiple Cycle Datapath
L14 – Control & Execution 1 Comp 411 – Fall /04/09 Control & Execution Finite State Machines for Control MIPS Execution.
CSE378 Multicycle impl,.1 Drawbacks of single cycle implementation All instructions take the same time although –some instructions are longer than others;
10/26/2004Comp 120 Fall October Only 11 to go! Questions? Today Exam Review and Instruction Execution.
1 COMP541 Sequencing – III (Sequencing a Computer) Montek Singh April 9, 2007.
CSCE 212 Chapter 5 The Processor: Datapath and Control Instructor: Jason D. Bakos.
Fall 2007 MIPS Datapath (Single Cycle and Multi-Cycle)
1 Chapter Five. 2 We're ready to look at an implementation of the MIPS Simplified to contain only: –memory-reference instructions: lw, sw –arithmetic-logical.
10/18/2005Comp 120 Fall October Questions? Instruction Execution.
L15 – Control & Execution 1 Comp 411 – Spring /25/08 Control & Execution Finite State Machines for Control MIPS Execution.
331 W10.1Spring :332:331 Computer Architecture and Assembly Language Spring 2005 Week 10 Building a Multi-Cycle Datapath [Adapted from Dave Patterson’s.
CPU Architecture Why not single cycle? Why not single cycle? Hardware complexity Hardware complexity Why not pipelined? Why not pipelined? Time constraints.
Class 9.1 Computer Architecture - HUJI Computer Architecture Class 9 Microprogramming.
1 We're ready to look at an implementation of the MIPS Simplified to contain only: –memory-reference instructions: lw, sw –arithmetic-logical instructions:
The Multicycle Processor CPSC 321 Andreas Klappenecker.
Introduction to CMOS VLSI Design
Computing Systems The Processor: Datapath and Control.
Dept. of Communications and Tokyo Institute of Technology
CAD for Physical Design of VLSI Circuits
Introduction to VLSI Design – Lec01. Chapter 1 Introduction to VLSI Systems Lecture # 10 MIPS Processor Example Material taken/adapted from.
CSE 494: Electronic Design Automation Lecture 2 VLSI Design, Physical Design Automation, Design Styles.
Chapter 5 Processor Design. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides We're ready to look at an implementation of the MIPS Simplified.
Computer Organization CS224 Fall 2012 Lesson 22. The Big Picture  The Five Classic Components of a Computer  Chapter 4 Topic: Processor Design Control.
EECS 322: Computer Architecture
CPE232 Basic MIPS Architecture1 Computer Organization Multi-cycle Approach Dr. Iyad Jafar Adapted from Dr. Gheith Abandah slides
1 CS/COE0447 Computer Organization & Assembly Language Multi-Cycle Execution.
EE 466/586 VLSI Design Partha Pande School of EECS Washington State University
C HAPTER 5 T HE PROCESSOR : D ATAPATH AND C ONTROL M ULTICYCLE D ESIGN.
Lecture 2: MIPS Processor Example
1  2004 Morgan Kaufmann Publishers Chapter Five.
CMOS VLSI Design MIPS Processor Example
1 CS/COE0447 Computer Organization & Assembly Language Chapter 5 Part 3.
Chapter 4 From: Dr. Iyad F. Jafar Basic MIPS Architecture: Multi-Cycle Datapath and Control.
New-School Machine Structures Parallel Requests Assigned to computer e.g., Search “Katz” Parallel Threads Assigned to core e.g., Lookup, Ads Parallel Instructions.
1. 2 MIPS Hardware Implementation Full die photograph of the MIPS R2000 RISC Microprocessor. The 1986 MIPS R2000 with five pipeline stages and 450,000.
1 CS/COE0447 Computer Organization & Assembly Language Chapter 5 Part 3.
CS161 – Design and Architecture of Computer Systems
Control & Execution Finite State Machines for Control MIPS Execution.
Systems Architecture I
Five Execution Steps Instruction Fetch
MIPS Instructions.
Processor (I).
CS/COE0447 Computer Organization & Assembly Language
Chapter Five.
Control & Execution Finite State Machines for Control MIPS Execution.
CSCE 212 Chapter 5 The Processor: Datapath and Control
Chapter Five The Processor: Datapath and Control
Chapter Five The Processor: Datapath and Control
Drawbacks of single cycle implementation
Systems Architecture I
Guest Lecturer TA: Shreyas Chand
Multicycle Approach We will be reusing functional units
COSC 2021: Computer Organization Instructor: Dr. Amir Asif
Processor (II).
Processor: Multi-Cycle Datapath & Control
Multicycle Design.
COMP541 Datapaths I Montek Singh Mar 18, 2010.
Introduction to CMOS VLSI Design Lecture 2: MIPS Processor Example
Multi-Cycle Datapath Lecture notes from MKP, H. H. Lee and S. Yalamanchili.
Systems Architecture I
CS161 – Design and Architecture of Computer Systems
Chapter 4 The Von Neumann Model
Presentation transcript:

1 EE/CPRE 465 VLSI Design Process

2 Outline Design Partitioning Design process: MIPS Processor as an example –Architecture Design –Microarchitecture Design –Logic Design –Circuit Design –Physical Design Fabrication, Packaging, Testing

3 Coping with Complexity How to design System-on-Chip? –Many millions (even billions!) of transistors –Tens to hundreds of engineers Structured Design Partitioning of Design Process

4 Structured Design Hierarchy: Divide and Conquer –Recursively partition a system into modules Regularity –Reuse modules wherever possible –Example: Uniformly sized transistors at circuit level Standard cell library at gate level Modularity: well-formed interfaces –Allows modules to be treated as black boxes Locality –Physical and temporal

5 Partitioning of Design Process Architecture Design: User’s perspective, what does it do? –Instruction set, register set, and memory model –MIPS, x86, PIC, ARM, Power, SPARC, Alpha,… Microarchitecture Design: how the architecture is partitioned into registers and functional units –Single cycle, multcycle, pipelined, superscalar? –For x86: 386, 486, Pentium, PII, PIII, P4, Core, Core 2, Atom, Celeron, Cyrix MII, AMD K5, Athlon, Phenom Logic Design: how are functional blocks constructed –Ripple carry, carry lookahead, carry select adders Circuit Design: how are transistors used to implement the logic –Complementary CMOS, pass transistors, domino Physical Design: chip layout

Two Types of Engineers “Short and fat” engineers –Understand a large amount about a narrow field “Tall and skinny” engineers –Understand something about a broad range of topics Digital VLSI design favors the tall and skinny engineer –can evaluate how choices in one part of the system impact other parts of the system 6

7 MIPS Architecture Example: subset of MIPS processor architecture –Drawn from Patterson & Hennessy MIPS is a 32-bit architecture with 32 registers –Consider 8-bit subset using 8-bit datapath –Only implement 8 registers ($0 - $7) –$0 hardwired to –8-bit program counter Original MIPS Architecture Simplified MIPS Architecture here Data width32 bits8 bits Address width32 bits8 bits # of registers328 Instruction length32 bits

8 Instruction Set imm x

9 Instruction Encoding 32-bit instruction encoding –Requires four cycles to fetch on 8-bit datapath Note that the destination register is specified by: –Bits 15:11 for R-type instructions –Bits 20:16 for addi instruction

10 Fibonacci (C) f 0 = 1; f -1 = -1 f n = f n-1 + f n-2 f 1 =0, f 2 =1, f 3 =1, f 4 =2, f 5 =3,...

11 Fibonacci (Assembly)

12 Fibonacci (Binary) Machine language program

13 Multicycle MIPS Microarchitecture Shift left 2

14 Multicycle MIPS µ-arch (32-bit Design)

15 Multicycle Controller

16 Chapter 5 of Patterson and Hennessy (32-bit Design) Summary of Steps for Each Instruction Class Step name Action for R-type Instruction Action for load Instruction Action for store Instruction Action for branch Instruction Action for jump Instruction Instruction fetch IR <= Memory[PC] PC <= PC + 4 Instruction decode / register fetch A <= Reg[IR[25:21]] B <= Reg[IR[20:16]] ALUOut <= PC + (sign-extend(IR[15:0]) << 2) Execution / address computation / branch/jump completion ALUOut <= A op B ALUOut <= A + sign-extend(IR[15:0]) If (A==B) PC <= ALUOut PC <= {PC[31:28], IR[25:0], 2’b00} Memory access / R-type completion Reg[IR[15:11]] <= ALUOut MDR <= Memory[ALUOut] <= B Memory read completion Reg[IR[20:16]] <= MDR Become 4 steps in our 8-bit design

17 Chapter 5 of Patterson and Hennessy (32-bit Design) Instructions from ISA Perspective Consider each instruction from the perspective of ISA. Example: Add instruction –Instruction specified by the PC. –Operand registers are specified by bits 25:21 and 20:16 of the instruction –New value is the sum of two registers. –Register written is specified by bits 15:11 of instruction. Reg[Memory[PC][15:11]] <= Reg[Memory[PC][25:21]] + Reg[Memory[PC][20:16]] PC <= PC + 4 In order to accomplish this we must break up the instruction. –kind of like introducing variables when programming ISA: Instruction Set Architecture

18 Chapter 5 of Patterson and Hennessy (32-bit Design) Breaking Down an Instruction ISA definition of arithmetic: Reg[Memory[PC][15:11]] <= Reg[Memory[PC][25:21]] + Reg[Memory[PC][20:16]] Could break down to: –IR <= Memory[PC] –A <= Reg[IR[25:21]] –B <= Reg[IR[20:16]] –ALUOut <= A + B –Reg[IR[15:11]] <= ALUOut Don’t forgot an important part of the definition of arithmetic! –PC <= PC + 4

19 Chapter 5 of Patterson and Hennessy (32-bit Design) Idea Behind Multicycle Approach We define each instruction from the ISA perspective Break it down into steps: –Balance the amount of work to be done in different steps –Restrict each cycle to use only one major functional unit Introduce new registers as needed –A, B, ALUOut, MDR, IR, etc. Finally try and pack as much work into each step (avoid unnecessary cycles) while also trying to share steps where possible (minimizes control, helps to simplify solution) Result: Our book’s multicycle implementation!

20 Chapter 5 of Patterson and Hennessy (32-bit Design) 1.Instruction Fetch 2.Instruction Decode and Register Fetch 3.Execution, Memory Address Computation, or Branch / Jump Completion 4.Memory Access or R-type Instruction Completion 5.Memory Read Completion INSTRUCTIONS TAKE FROM CYCLES! Five Execution Steps 6-8 cycles in our 8-bit design Become 4 steps since we have an 8-bit design

21 Chapter 5 of Patterson and Hennessy (32-bit Design) Use PC to get instruction and put it in the Instruction Register. Increment the PC by 4 and put the result back in the PC. Can be described succinctly using "Register-Transfer Language“ (RTL): IR <= Memory[PC]; PC <= PC + 4; Can we figure out the values of the control signals? What is the advantage of updating the PC now? Step 1: Instruction Fetch Become 4 steps in our 8-bit design

22 Chapter 5 of Patterson and Hennessy (32-bit Design) Read registers rs and rt in case we need them Compute the branch address in case the instruction is a branch RTL: A <= Reg[IR[25:21]]; B <= Reg[IR[20:16]]; ALUOut <= PC + (sign-extend(IR[15:0]) << 2); We are not setting any control lines based on the instruction type (we are busy "decoding" it in our control logic) Step 2: Instruction Decode and Register Fetch

23 Chapter 5 of Patterson and Hennessy (32-bit Design) ALU is performing one of three functions, based on instruction type R-type: ALUOut <= A op B; Memory Reference: ALUOut <= A + sign-extend(IR[15:0]); Branch: if (A==B) PC <= ALUOut; Jump : PC <= {PC[31:28], IR[25:0], 2’b00} Step 3: Execution, Memory Address Computation, or Branch / Jump Completion (Instruction Dependent)

24 Chapter 5 of Patterson and Hennessy (32-bit Design) Loads and stores access memory MDR <= Memory[ALUOut]; or Memory[ALUOut] <= B; R-type instructions completion Reg[IR[15:11]] <= ALUOut; Step 4: Memory Access or R-type Instruction Completion

25 Chapter 5 of Patterson and Hennessy (32-bit Design) Reg[IR[20:16]] <= MDR; Step 5: Memory Read Completion

26 Logic Design Start at top level –Hierarchically decompose MIPS into units Top-level interface

27 Block Diagram

28 Hierarchical Design

29 HDLs Hardware Description Languages –Widely used in logic design –Verilog and VHDL Describe hardware using code –Document logic functions –Simulate logic before building –Synthesize code into gates and layout Requires a library of standard cells

30 Verilog Example module adder(input logic [7:0] a, b, input logic c, output logic [7:0] s, output logic cout); wire [6:0] carry; fulladder fa0(a[0], b[0], c, s[0], carry[0]); fulladder fa0(a[1], b[1], carry[0], s[1], carry[1]); fulladder fa0(a[2], b[2], carry[1], s[2], carry[2]);.... fulladder fa0(a[7], b[7], carry[6], s[7], cout); endmodule module fulladder(input logic a, b, c, output logic s, cout); sums1(a, b, c, s); carryc1(a, b, c, cout); endmodule module carry(input logic a, b, c, output logic cout) assign cout = (a&b) | (a&c) | (b&c); endmodule

31 Circuit Design How should logic be implemented? –NANDs and NORs vs. ANDs and ORs? –Fan-in and fan-out? –How wide should transistors be? These choices affect speed, area, power Logic synthesis makes these choices for you –Good enough for many applications –Hand-crafted circuits are still better

32 Example: Carry Logic assign cout = (a&b) | (a&c) | (b&c); Gate-level design: 26 transistors, 4 stages of gate delays

33 Example: Carry Logic assign cout = (a&b) | (a&c) | (b&c); Transistor-level design: 12 transistors, 2 stages of gate delays

34 Gate-level Netlist module carry(input a, b, c, output cout) wire x, y, z; and g1(x, a, b); and g2(y, a, c); and g3(z, b, c); or g4(cout, x, y, z); endmodule

35 Transistor-Level Netlist module carry(input a, b, c, output cout) wire i1, i2, i3, i4, cn; tranif1 n1(i1, 0, a); tranif1 n2(i1, 0, b); tranif1 n3(cn, i1, c); tranif1 n4(i2, 0, b); tranif1 n5(cn, i2, a); tranif0 p1(i3, 1, a); tranif0 p2(i3, 1, b); tranif0 p3(cn, i3, c); tranif0 p4(i4, 1, b); tranif0 p5(cn, i4, a); tranif1 n6(cout, 0, cn); tranif0 p6(cout, 1, cn); endmodule

36 SPICE Netlist.SUBCKT CARRY A B C COUT VDD GND MN1 I1 A GND GND NMOS W=1U L=0.18U AD=0.3P AS=0.5P MN2 I1 B GND GND NMOS W=1U L=0.18U AD=0.3P AS=0.5P MN3 CN C I1 GND NMOS W=1U L=0.18U AD=0.5P AS=0.5P MN4 I2 B GND GND NMOS W=1U L=0.18U AD=0.15P AS=0.5P MN5 CN A I2 GND NMOS W=1U L=0.18U AD=0.5P AS=0.15P MP1 I3 A VDD VDD PMOS W=2U L=0.18U AD=0.6P AS=1 P MP2 I3 B VDD VDD PMOS W=2U L=0.18U AD=0.6P AS=1P MP3 CN C I3 VDD PMOS W=2U L=0.18U AD=1P AS=1P MP4 I4 B VDD VDD PMOS W=2U L=0.18U AD=0.3P AS=1P MP5 CN A I4 VDD PMOS W=2U L=0.18U AD=1P AS=0.3P MN6 COUT CN GND GND NMOS W=2U L=0.18U AD=1P AS=1P MP6 COUT CN VDD VDD PMOS W=4U L=0.18U AD=2P AS=2P CI1 I1 GND 2FF CI3 I3 GND 3FF CA A GND 4FF CB B GND 4FF CC C GND 2FF CCN CN GND 4FF CCOUT COUT GND 2FF.ENDS

37 Physical Design Floorplan –Area estimation Place & route –Standard cells Datapaths –Slice planning

38 Synthesized MIPS Layout

39 MIPS Floorplan

40 Area Estimation Need area estimates to make floorplan –Compare to another block you already designed –Or estimate from transistor counts –Budget room for large wiring tracks –Your mileage may vary!

41 MIPS Layout

42 Standard Cells Uniform cell height Uniform well height M1 V DD and GND rails M2 Access to I/Os Well / substrate taps Exploits regularity

43 Synthesized Controller Synthesize HDL into gate-level netlist Place & Route using standard cell library

44 Snap-Together Cells Synthesized controller area is mostly wires –Design is smaller if wires run through/over cells –Smaller = faster, lower power as well! Design snap-together cells for datapaths and arrays –Plan wires into cells –Pitch Matching required –Connect by abutment Exploits locality Takes lots of effort

45 MIPS Datapath 8-bit datapath built from 8 bitslices (regularity) Zipper at top drives control signals to datapath

46 MIPS ALU Arithmetic / Logic Unit is part of bitslice

47 Slice Plans Slice plan for bitslice –Cell ordering, dimensions, wiring tracks –Arrange cells for wiring locality

48 Design Verification Fabrication is slow & expensive –MOSIS 0.6  m: $1000, 3 months –65 nm: $3M, 1 month Debugging chips is very hard –Limited visibility into operation Prove design is right before building! –Logic simulation –Ckt. simulation / Formal verification –Layout vs. schematic (LVS) comparison –Design & electrical rule checks (DRC & ERC) Verification is > 50% of effort on most chips!

49 Fabrication & Packaging Tapeout final layout Fabrication –6, 8, 12” wafers –Optimized for throughput, not latency (10 weeks!) –Cut into individual dice Packaging –Bond gold wires from die I/O pads to package

50 Testing Test that chip operates –Design errors –Manufacturing errors A single dust particle or wafer defect kills a die –Yields from 90% to < 10% –Depends on die size, maturity of process –Test each part before shipping to customer