EE/CPRE 465 VLSI Design Process


1 EE/CPRE 465 VLSI Design Process

2 Outline
Design Partitioning
Design process: MIPS Processor as an example
  – Architecture Design
  – Microarchitecture Design
  – Logic Design
  – Circuit Design
  – Physical Design
Fabrication, Packaging, Testing

3 Coping with Complexity
How to design System-on-Chip?
  – Many millions (even billions!) of transistors
  – Tens to hundreds of engineers
Structured Design
Partitioning of Design Process

4 Structured Design
Hierarchy: Divide and Conquer
  – Recursively partition a system into modules
Regularity
  – Reuse modules wherever possible
  – Example: uniformly sized transistors at circuit level; standard cell library at gate level
Modularity: well-formed interfaces
  – Allows modules to be treated as black boxes
Locality
  – Physical and temporal

5 Partitioning of Design Process
Architecture Design: user's perspective, what does it do?
  – Instruction set, register set, and memory model
  – MIPS, x86, PIC, ARM, Power, SPARC, Alpha, ...
Microarchitecture Design: how the architecture is partitioned into registers and functional units
  – Single cycle, multicycle, pipelined, superscalar?
  – For x86: 386, 486, Pentium, PII, PIII, P4, Core, Core 2, Atom, Celeron, Cyrix MII, AMD K5, Athlon, Phenom
Logic Design: how functional blocks are constructed
  – Ripple carry, carry lookahead, carry select adders
Circuit Design: how transistors are used to implement the logic
  – Complementary CMOS, pass transistors, domino
Physical Design: chip layout

6 Two Types of Engineers
"Short and fat" engineers
  – Understand a large amount about a narrow field
"Tall and skinny" engineers
  – Understand something about a broad range of topics
Digital VLSI design favors the tall and skinny engineer
  – Can evaluate how choices in one part of the system impact other parts of the system

7 MIPS Architecture
Example: subset of MIPS processor architecture
  – Drawn from Patterson & Hennessy
MIPS is a 32-bit architecture with 32 registers
  – Consider 8-bit subset using 8-bit datapath
  – Only implement 8 registers ($0 - $7)
  – $0 hardwired to 00000000
  – 8-bit program counter

                    | Original MIPS Architecture | Simplified MIPS Architecture here
Data width          | 32 bits                    | 8 bits
Address width       | 32 bits                    | 8 bits
# of registers      | 32                         | 8
Instruction length  | 32 bits                    | 32 bits

8 Instruction Set
[Table: instruction set and encodings; surviving fragments: "imm x4", "101000"]

9 Instruction Encoding
32-bit instruction encoding
  – Requires four cycles to fetch on 8-bit datapath
Note that the destination register is specified by:
  – Bits 15:11 for R-type instructions
  – Bits 20:16 for addi instruction
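The destination-register rule can be sketched as a small Python model. The helper names `bits` and `dest_register` are illustrative and not part of the slides; the opcode position (bits 31:26) and the opcode values for R-type and addi are the standard MIPS ones, and only these two instruction classes are handled.

```python
R_TYPE_OPCODE = 0b000000  # standard MIPS opcode shared by R-type instructions
ADDI_OPCODE = 0b001000    # standard MIPS opcode for addi


def bits(word, hi, lo):
    """Return word[hi:lo] as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def dest_register(instr):
    """Return the destination register number of a 32-bit instruction."""
    opcode = bits(instr, 31, 26)
    if opcode == R_TYPE_OPCODE:
        return bits(instr, 15, 11)   # rd field
    if opcode == ADDI_OPCODE:
        return bits(instr, 20, 16)   # rt field
    raise ValueError("instruction class not covered by this sketch")


# add $3, $1, $2: opcode 0, rs=1, rt=2, rd=3, funct=0x20
add_instr = (1 << 21) | (2 << 16) | (3 << 11) | 0x20
# addi $4, $1, 5: opcode 8, rs=1, rt=4, immediate=5
addi_instr = (ADDI_OPCODE << 26) | (1 << 21) | (4 << 16) | 5
```

Because the field that names the destination moves between instruction classes, the datapath needs a multiplexer in front of the register file's write-address port.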

10 Fibonacci (C)
$f_0 = 1$; $f_{-1} = -1$
$f_n = f_{n-1} + f_{n-2}$
$f_1 = 0$, $f_2 = 1$, $f_3 = 1$, $f_4 = 2$, $f_5 = 3$, ...
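The C listing from this slide did not survive in the transcript. As a stand-in, here is a minimal Python sketch of the same recurrence, seeded with the two starting values given above. The function name `fib` is an assumption, and this is not the slide's original code.

```python
def fib(n):
    """Return f_n for n >= 0, seeded with f_{-1} = -1 and f_0 = 1."""
    f_prev, f_curr = -1, 1                       # f_{-1}, f_0
    for _ in range(n):
        f_prev, f_curr = f_curr, f_prev + f_curr  # slide the window forward
    return f_curr
```

Only two running values are needed, which is why the program fits comfortably in the eight registers of the simplified architecture.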

11 Fibonacci (Assembly)

12 Fibonacci (Binary)
Machine language program

13 Multicycle MIPS Microarchitecture
[Figure: datapath diagram; surviving label: "Shift left 2"]

14 Multicycle MIPS µ-arch (32-bit Design)

15 Multicycle Controller

16 Summary of Steps for Each Instruction Class
Chapter 5 of Patterson and Hennessy (32-bit Design)
Instruction fetch (becomes 4 steps in our 8-bit design)
  – All instructions: IR <= Memory[PC]; PC <= PC + 4
Instruction decode / register fetch
  – All instructions: A <= Reg[IR[25:21]]; B <= Reg[IR[20:16]]; ALUOut <= PC + (sign-extend(IR[15:0]) << 2)
Execution / address computation / branch/jump completion
  – R-type: ALUOut <= A op B
  – Load and store: ALUOut <= A + sign-extend(IR[15:0])
  – Branch: if (A==B) PC <= ALUOut
  – Jump: PC <= {PC[31:28], IR[25:0], 2'b00}
Memory access / R-type completion
  – R-type: Reg[IR[15:11]] <= ALUOut
  – Load: MDR <= Memory[ALUOut]
  – Store: Memory[ALUOut] <= B
Memory read completion
  – Load: Reg[IR[20:16]] <= MDR

17 Instructions from ISA Perspective
Chapter 5 of Patterson and Hennessy (32-bit Design)
Consider each instruction from the perspective of the ISA (Instruction Set Architecture).
Example: Add instruction
  – Instruction specified by the PC.
  – Operand registers are specified by bits 25:21 and 20:16 of the instruction.
  – New value is the sum of two registers.
  – Register written is specified by bits 15:11 of instruction.
      Reg[Memory[PC][15:11]] <= Reg[Memory[PC][25:21]] + Reg[Memory[PC][20:16]]
      PC <= PC + 4
In order to accomplish this we must break up the instruction.
  – Kind of like introducing variables when programming

18 Breaking Down an Instruction
Chapter 5 of Patterson and Hennessy (32-bit Design)
ISA definition of arithmetic:
    Reg[Memory[PC][15:11]] <= Reg[Memory[PC][25:21]] + Reg[Memory[PC][20:16]]
Could break down to:
  – IR <= Memory[PC]
  – A <= Reg[IR[25:21]]
  – B <= Reg[IR[20:16]]
  – ALUOut <= A + B
  – Reg[IR[15:11]] <= ALUOut
Don't forget an important part of the definition of arithmetic!
  – PC <= PC + 4
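To see that the broken-down form computes the same thing as the ISA definition, both can be modeled in Python. This is an illustrative sketch, not the book's implementation: the register file is a plain list, memory is a dict keyed by byte address that holds whole 32-bit words, and the function names are assumptions.

```python
MASK32 = 0xFFFFFFFF


def bits(word, hi, lo):
    """Return word[hi:lo] as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def add_isa(reg, mem, pc):
    """ISA view: one register-transfer statement plus the PC update."""
    instr = mem[pc]
    total = reg[bits(instr, 25, 21)] + reg[bits(instr, 20, 16)]
    reg[bits(instr, 15, 11)] = total & MASK32
    return pc + 4


def add_multicycle(reg, mem, pc):
    """Multicycle view: the same work split across named temporaries."""
    ir = mem[pc]                      # IR <= Memory[PC]
    pc = pc + 4                       # PC <= PC + 4
    a = reg[bits(ir, 25, 21)]         # A <= Reg[IR[25:21]]
    b = reg[bits(ir, 20, 16)]         # B <= Reg[IR[20:16]]
    alu_out = (a + b) & MASK32        # ALUOut <= A + B
    reg[bits(ir, 15, 11)] = alu_out   # Reg[IR[15:11]] <= ALUOut
    return pc
```

The temporaries `ir`, `a`, `b` and `alu_out` play the role of the extra registers that the multicycle datapath introduces.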

19 Idea Behind Multicycle Approach
Chapter 5 of Patterson and Hennessy (32-bit Design)
We define each instruction from the ISA perspective
Break it down into steps:
  – Balance the amount of work to be done in different steps
  – Restrict each cycle to use only one major functional unit
Introduce new registers as needed
  – A, B, ALUOut, MDR, IR, etc.
Finally, try to pack as much work as possible into each step (avoiding unnecessary cycles) while also sharing steps where possible (this minimizes control and helps simplify the solution)
Result: Our book's multicycle implementation!

20 Five Execution Steps
Chapter 5 of Patterson and Hennessy (32-bit Design)
1. Instruction Fetch (becomes 4 steps since we have an 8-bit design)
2. Instruction Decode and Register Fetch
3. Execution, Memory Address Computation, or Branch / Jump Completion
4. Memory Access or R-type Instruction Completion
5. Memory Read Completion
INSTRUCTIONS TAKE FROM 3 - 5 CYCLES! (6-8 cycles in our 8-bit design)
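The cycle counts can be checked with a little arithmetic. The per-class step counts below are read off the summary-of-steps slide (branches and jumps finish in step 3, R-type and stores in step 4, loads in step 5). The dictionary and function names are illustrative, and the sketch assumes the only change in the 8-bit design is that the single fetch step becomes four.

```python
# Number of steps each instruction class uses in the 32-bit design.
STEPS_32BIT = {
    "R-type": 4,
    "load": 5,
    "store": 4,
    "branch": 3,
    "jump": 3,
}

FETCH_CYCLES_8BIT = 4  # one byte of the 32-bit instruction per cycle


def cycles_8bit(instr_class):
    """Replace the single fetch step with four byte-wide fetch cycles."""
    return STEPS_32BIT[instr_class] - 1 + FETCH_CYCLES_8BIT


cycle_table = {name: cycles_8bit(name) for name in STEPS_32BIT}
```

Under that assumption every instruction gains three cycles, which turns the 3 to 5 cycle range into the 6 to 8 cycle range quoted for the 8-bit design.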

21 Step 1: Instruction Fetch
Chapter 5 of Patterson and Hennessy (32-bit Design)
Use PC to get instruction and put it in the Instruction Register.
Increment the PC by 4 and put the result back in the PC.
Can be described succinctly using "Register-Transfer Language" (RTL):
    IR <= Memory[PC];
    PC <= PC + 4;
Can we figure out the values of the control signals?
What is the advantage of updating the PC now?
Becomes 4 steps in our 8-bit design
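A rough Python model of the four-step fetch in the 8-bit design follows. The byte order is an assumption (least significant byte at the lowest address); the slides do not state it. The function name and the 256-entry byte memory are likewise illustrative.

```python
def fetch_instruction(mem, pc):
    """Assemble a 32-bit instruction from four byte reads, one per cycle.

    mem is a sequence of 256 bytes; pc is the 8-bit program counter.
    Returns the instruction word and the updated program counter.
    """
    ir = 0
    for cycle in range(4):
        byte = mem[(pc + cycle) & 0xFF]   # 8-bit address wraps at 256
        ir |= byte << (8 * cycle)         # assumed little-endian placement
    return ir, (pc + 4) & 0xFF
```

The PC still advances by 4 per instruction, matching `PC <= PC + 4`, but it does so over four byte-wide memory reads rather than one word-wide read.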

22 Step 2: Instruction Decode and Register Fetch
Chapter 5 of Patterson and Hennessy (32-bit Design)
Read registers rs and rt in case we need them
Compute the branch address in case the instruction is a branch
RTL:
    A <= Reg[IR[25:21]];
    B <= Reg[IR[20:16]];
    ALUOut <= PC + (sign-extend(IR[15:0]) << 2);
We are not setting any control lines based on the instruction type (we are busy "decoding" it in our control logic)
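The speculative branch-address computation can be sketched in Python as follows. The function names are illustrative. Note that the PC was already incremented in step 1, so the value added here is the address of the next instruction.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc, ir):
    """ALUOut <= PC + (sign-extend(IR[15:0]) << 2), truncated to 32 bits."""
    offset = sign_extend16(ir) << 2   # word offset converted to a byte offset
    return (pc + offset) & 0xFFFFFFFF
```

For example, an immediate of 0xFFFF is the word offset -1, so the target is 4 bytes before the already-incremented PC.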

23 Step 3: Execution, Memory Address Computation, or Branch / Jump Completion (Instruction Dependent)
Chapter 5 of Patterson and Hennessy (32-bit Design)
ALU is performing one of three functions, based on instruction type
  – R-type: ALUOut <= A op B;
  – Memory Reference: ALUOut <= A + sign-extend(IR[15:0]);
  – Branch: if (A==B) PC <= ALUOut;
Jump: PC <= {PC[31:28], IR[25:0], 2'b00}
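The four cases of step 3 can be written as one dispatch function. This is a sketch under stated assumptions: the machine state is a dict with the keys `PC`, `IR`, `A`, `B` and `ALUOut`, the instruction class is passed in as a string, and the R-type operation is supplied as a Python callable. None of these names come from the slides.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def step3(kind, state, alu_op=None):
    """Return a copy of state after step 3 for the given instruction class."""
    s = dict(state)
    if kind == "rtype":
        s["ALUOut"] = alu_op(s["A"], s["B"]) & MASK32
    elif kind == "memref":
        s["ALUOut"] = (s["A"] + sign_extend16(s["IR"])) & MASK32
    elif kind == "branch":
        if s["A"] == s["B"]:
            s["PC"] = s["ALUOut"]      # target computed during step 2
    elif kind == "jump":
        # {PC[31:28], IR[25:0], 2'b00}
        s["PC"] = (s["PC"] & 0xF0000000) | ((s["IR"] & 0x03FFFFFF) << 2)
    else:
        raise ValueError(kind)
    return s
```

Branches and jumps only update the PC here, which is why they complete in three steps.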

24 Step 4: Memory Access or R-type Instruction Completion
Chapter 5 of Patterson and Hennessy (32-bit Design)
Loads and stores access memory
    MDR <= Memory[ALUOut];
    or
    Memory[ALUOut] <= B;
R-type instructions completion
    Reg[IR[15:11]] <= ALUOut;

25 Step 5: Memory Read Completion
Chapter 5 of Patterson and Hennessy (32-bit Design)
    Reg[IR[20:16]] <= MDR;

26 Logic Design
Start at top level
  – Hierarchically decompose MIPS into units
Top-level interface

27 Block Diagram

28 Hierarchical Design

29 HDLs
Hardware Description Languages
  – Widely used in logic design
  – Verilog and VHDL
Describe hardware using code
  – Document logic functions
  – Simulate logic before building
  – Synthesize code into gates and layout
      Requires a library of standard cells

30 Verilog Example
module adder(input  logic [7:0] a, b,
             input  logic       c,
             output logic [7:0] s,
             output logic       cout);
  wire [6:0] carry;   // carries between adjacent bit positions
  // one full adder per bit; each instance needs a unique name
  fulladder fa0(a[0], b[0], c,        s[0], carry[0]);
  fulladder fa1(a[1], b[1], carry[0], s[1], carry[1]);
  fulladder fa2(a[2], b[2], carry[1], s[2], carry[2]);
  ...
  fulladder fa7(a[7], b[7], carry[6], s[7], cout);
endmodule

module fulladder(input  logic a, b, c,
                 output logic s, cout);
  sum   s1(a, b, c, s);
  carry c1(a, b, c, cout);
endmodule

module carry(input  logic a, b, c,
             output logic cout);
  assign cout = (a&b) | (a&c) | (b&c);
endmodule
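A quick way to sanity-check the ripple-carry structure is a bit-level Python model compared against ordinary integer addition. This is a sketch, not a translation of the slide's code: the slide does not show the `sum` module, so the sum bit is assumed to be the usual three-input XOR, and the function names are illustrative.

```python
def full_adder(a, b, c):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ c                          # assumed definition of the sum bit
    cout = (a & b) | (a & c) | (b & c)     # carry expression from the slide
    return s, cout


def ripple_adder(a, b, cin, width=8):
    """Chain `width` full adders, passing each carry to the next bit."""
    carry = cin
    total = 0
    for i in range(width):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry
```

The loop mirrors the chain of `fulladder` instances: the carry out of bit i is the carry in of bit i+1, and the last carry becomes `cout`.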

31 Circuit Design
How should logic be implemented?
  – NANDs and NORs vs. ANDs and ORs?
  – Fan-in and fan-out?
  – How wide should transistors be?
These choices affect speed, area, power
Logic synthesis makes these choices for you
  – Good enough for many applications
  – Hand-crafted circuits are still better

32 Example: Carry Logic
assign cout = (a&b) | (a&c) | (b&c);
Gate-level design: 26 transistors, 4 stages of gate delays
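The gate-level figures can be reproduced with a short calculation. It assumes static CMOS gates in which each AND is a NAND followed by an inverter and the OR is a NOR followed by an inverter, with a k-input NAND or NOR using 2k transistors. The slide does not spell out this construction, so treat it as one plausible accounting.

```python
INVERTER = 2  # one nMOS plus one pMOS transistor


def nand_or_nor(inputs):
    """Transistors in a k-input static CMOS NAND or NOR: k nMOS + k pMOS."""
    return 2 * inputs


and2 = nand_or_nor(2) + INVERTER   # NAND2 followed by an inverter
or3 = nand_or_nor(3) + INVERTER    # NOR3 followed by an inverter

# Three 2-input ANDs feed one 3-input OR.
gate_level_transistors = 3 * and2 + or3
# Signal path: NAND2, inverter, NOR3, inverter.
gate_level_stages = 2 + 2
```

Under these assumptions the totals come to 26 transistors and 4 stages, matching the slide.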

33 Example: Carry Logic
assign cout = (a&b) | (a&c) | (b&c);
Transistor-level design: 12 transistors, 2 stages of gate delays

34 Gate-level Netlist
module carry(input a, b, c, output cout);
  wire x, y, z;
  and g1(x, a, b);
  and g2(y, a, c);
  and g3(z, b, c);
  or  g4(cout, x, y, z);
endmodule

35 Transistor-Level Netlist
module carry(input a, b, c, output cout);
  wire i1, i2, i3, i4, cn;
  // nMOS pull-down network for cn
  tranif1 n1(i1, 0, a);
  tranif1 n2(i1, 0, b);
  tranif1 n3(cn, i1, c);
  tranif1 n4(i2, 0, b);
  tranif1 n5(cn, i2, a);
  // pMOS pull-up network for cn
  tranif0 p1(i3, 1, a);
  tranif0 p2(i3, 1, b);
  tranif0 p3(cn, i3, c);
  tranif0 p4(i4, 1, b);
  tranif0 p5(cn, i4, a);
  // output inverter: cout = ~cn
  tranif1 n6(cout, 0, cn);
  tranif0 p6(cout, 1, cn);
endmodule
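The transistor netlist can be checked against the carry expression with a simple switch-level model in Python. The conduction conditions below are read off the netlist: n1 to n5 connect `cn` to ground when c AND (a OR b), or when a AND b; p1 to p5 connect `cn` to the supply under the complementary conditions. The function names are illustrative.

```python
from itertools import product


def pull_down(a, b, c):
    """True when the nMOS network (n1..n5) connects cn to ground."""
    return bool((c and (a or b)) or (a and b))


def pull_up(a, b, c):
    """True when the pMOS network (p1..p5) connects cn to the supply."""
    return bool((not c and (not a or not b)) or (not a and not b))


def carry_transistor_level(a, b, c):
    """Evaluate cout through the two networks and the output inverter."""
    down, up = pull_down(a, b, c), pull_up(a, b, c)
    assert down != up             # exactly one network conducts
    cn = 1 if up else 0
    return 1 - cn                 # n6/p6 invert cn


truth_table = {abc: carry_transistor_level(*abc)
               for abc in product((0, 1), repeat=3)}
```

The first stage computes the inverted carry `cn` in a single complex gate and the inverter restores the polarity, which accounts for the 2 stages and 10 + 2 = 12 transistors.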

36 SPICE Netlist
.SUBCKT CARRY A B C COUT VDD GND
MN1 I1 A GND GND NMOS W=1U L=0.18U AD=0.3P AS=0.5P
MN2 I1 B GND GND NMOS W=1U L=0.18U AD=0.3P AS=0.5P
MN3 CN C I1 GND NMOS W=1U L=0.18U AD=0.5P AS=0.5P
MN4 I2 B GND GND NMOS W=1U L=0.18U AD=0.15P AS=0.5P
MN5 CN A I2 GND NMOS W=1U L=0.18U AD=0.5P AS=0.15P
MP1 I3 A VDD VDD PMOS W=2U L=0.18U AD=0.6P AS=1P
MP2 I3 B VDD VDD PMOS W=2U L=0.18U AD=0.6P AS=1P
MP3 CN C I3 VDD PMOS W=2U L=0.18U AD=1P AS=1P
MP4 I4 B VDD VDD PMOS W=2U L=0.18U AD=0.3P AS=1P
MP5 CN A I4 VDD PMOS W=2U L=0.18U AD=1P AS=0.3P
MN6 COUT CN GND GND NMOS W=2U L=0.18U AD=1P AS=1P
MP6 COUT CN VDD VDD PMOS W=4U L=0.18U AD=2P AS=2P
CI1 I1 GND 2FF
CI3 I3 GND 3FF
CA A GND 4FF
CB B GND 4FF
CC C GND 2FF
CCN CN GND 4FF
CCOUT COUT GND 2FF
.ENDS

37 Physical Design
Floorplan
  – Area estimation
Place & route
  – Standard cells
Datapaths
  – Slice planning

38 Synthesized MIPS Layout

39 MIPS Floorplan

40 Area Estimation
Need area estimates to make floorplan
  – Compare to another block you already designed
  – Or estimate from transistor counts
  – Budget room for large wiring tracks
  – Your mileage may vary!

41 MIPS Layout

42 Standard Cells
Uniform cell height
Uniform well height
M1 VDD and GND rails
M2 access to I/Os
Well / substrate taps
Exploits regularity

43 Synthesized Controller
Synthesize HDL into gate-level netlist
Place & Route using standard cell library

44 Snap-Together Cells
Synthesized controller area is mostly wires
  – Design is smaller if wires run through/over cells
  – Smaller = faster, lower power as well!
Design snap-together cells for datapaths and arrays
  – Plan wires into cells
  – Pitch matching required
  – Connect by abutment
      Exploits locality
      Takes lots of effort

45 MIPS Datapath
8-bit datapath built from 8 bitslices (regularity)
Zipper at top drives control signals to datapath

46 MIPS ALU
Arithmetic / Logic Unit is part of bitslice

47 Slice Plans
Slice plan for bitslice
  – Cell ordering, dimensions, wiring tracks
  – Arrange cells for wiring locality

48 Design Verification
Fabrication is slow & expensive
  – MOSIS 0.6 µm: $1000, 3 months
  – 65 nm: $3M, 1 month
Debugging chips is very hard
  – Limited visibility into operation
Prove design is right before building!
  – Logic simulation
  – Circuit simulation / formal verification
  – Layout vs. schematic (LVS) comparison
  – Design & electrical rule checks (DRC & ERC)
Verification is > 50% of effort on most chips!

49 Fabrication & Packaging
Tapeout final layout
Fabrication
  – 6, 8, 12" wafers
  – Optimized for throughput, not latency (10 weeks!)
  – Cut into individual dice
Packaging
  – Bond gold wires from die I/O pads to package

50 Testing
Test that chip operates
  – Design errors
  – Manufacturing errors
A single dust particle or wafer defect kills a die
  – Yields from 90% to < 10%
  – Depends on die size, maturity of process
Test each part before shipping to customer

