CSE241A VLSI Digital Circuits, Winter 2003. Lecture 02: Datapath and Memory. Kahng & Cichy, UCSD ©2003

1 CSE241A VLSI Digital Circuits, Winter 2003. Lecture 02: Datapath and Memory. Kahng & Cichy, UCSD ©2003

2 Introduction: Basic Building Blocks
- Datapath
  - Execution units: adder, multiplier, divider, shifter, etc.
  - Register file and pipeline registers
  - Multiplexers, decoders
- Control
  - Finite state machines (PLA, ROM, random logic)
- Interconnect
  - Switches, arbiters, buses
- Memory
  - Caches (SRAMs), TLBs, DRAMs, buffers

3 The 1-bit Binary Adder
1-bit Full Adder (FA) with inputs A, B, C_in and outputs S, C_out:
  S = A ⊕ B ⊕ C_in
  C_out = A&B | A&C_in | B&C_in (majority function)
- How can we use it to build a 64-bit adder?
- How can we modify it easily to build an adder/subtractor?
- How can we make it better (faster, lower power, smaller)?
Truth table:
  A B C_in | C_out S | carry status
  0 0 0    | 0     0 | kill
  0 0 1    | 0     1 | kill
  0 1 0    | 0     1 | propagate
  0 1 1    | 1     0 | propagate
  1 0 0    | 0     1 | propagate
  1 0 1    | 1     0 | propagate
  1 1 0    | 1     0 | generate
  1 1 1    | 1     1 | generate
In terms of generate, propagate, and kill signals:
  G = A&B
  P = A ⊕ B
  K = !A & !B
  S = P ⊕ C_in
  C_out = G | P&C_in
Slide courtesy of Mary Jane Irwin, Penn State
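The full adder equations above can be checked exhaustively in a few lines. The following is a minimal behavioral sketch in Python, not a circuit description; the function names `full_adder` and `carry_status` are illustrative and do not come from the slides.

```python
def full_adder(a, b, c_in):
    """Return (s, c_out) for one-bit inputs, using the G/P form of the equations."""
    g = a & b                 # generate: both inputs are 1
    p = a ^ b                 # propagate: exactly one input is 1
    s = p ^ c_in              # S = P xor C_in
    c_out = g | (p & c_in)    # C_out = G | P & C_in
    return s, c_out


def carry_status(a, b):
    """Classify an input pair as 'generate', 'propagate', or 'kill'."""
    if a & b:
        return "generate"
    if a ^ b:
        return "propagate"
    return "kill"


if __name__ == "__main__":
    # Print the truth table in the same column order as the slide.
    print("A B Cin | Cout S | status")
    for a in (0, 1):
        for b in (0, 1):
            for c_in in (0, 1):
                s, c_out = full_adder(a, b, c_in)
                print(f"{a} {b} {c_in}   | {c_out}    {s} | {carry_status(a, b)}")
```

The G/P form and the majority form of C_out agree on all eight input combinations, which is why either can be used in the carry chain.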

4 FA Gate Level Implementations
[Figure: two gate-level full adder schematics with inputs A, B, C_in, outputs S, C_out, and internal nodes t0, t1, t2]
Slide courtesy of Mary Jane Irwin, Penn State

5 Review: XOR FA
[Figure: XOR-based full adder with inputs A, B, C_in and outputs S, C_out; 16 transistors]
Slide courtesy of Mary Jane Irwin, Penn State

6 Ripple Carry Adder (RCA)
[Figure: four full adders in a chain; stage i takes A_i and B_i and produces S_i, with C_0 = C_in entering stage 0 and C_out = C_4 leaving stage 3]
- T = O(N) worst-case delay
- T_adder ≈ T_FA(A,B→C_out) + (N-2) T_FA(C_in→C_out) + T_FA(C_in→S)
- Max delay: t_delay = t_sum + (N-1) t_carry
- Real goal: make the fastest possible carry path
Slide courtesy of Mary Jane Irwin, Penn State
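As a rough behavioral illustration of the ripple carry structure, the sketch below chains one-bit full adders in Python and evaluates the slide's max-delay expression t_sum + (N-1) t_carry. The function names and the unit delays in the usage note are assumptions made for illustration only. It also shows one common answer to the earlier adder/subtractor question: invert B and set C_in = 1 to compute A - B in two's complement.

```python
def full_adder(a, b, c_in):
    """One-bit full adder: returns (s, c_out)."""
    p = a ^ b
    return p ^ c_in, (a & b) | (p & c_in)


def ripple_carry_add(a, b, n, c_in=0):
    """Add two n-bit unsigned values one bit at a time.

    Returns (sum modulo 2**n, carry out of the top stage).
    """
    result = 0
    carry = c_in
    for i in range(n):
        # The carry from stage i feeds stage i+1, as in the chained FA figure.
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry


def ripple_carry_sub(a, b, n):
    """Compute a - b by adding the bitwise inverse of b with C_in = 1."""
    mask = (1 << n) - 1
    return ripple_carry_add(a, b ^ mask, n, c_in=1)


def rca_max_delay(n, t_sum, t_carry):
    """Worst-case delay model from the slide: t_sum + (n - 1) * t_carry."""
    return t_sum + (n - 1) * t_carry
```

For example, `ripple_carry_add(5, 3, 4)` returns `(8, 0)`, and with hypothetical unit delays `rca_max_delay(64, 1.0, 1.0)` returns `64.0`, which makes the O(N) growth concrete.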

7 Inversion Property
  !S(A, B, C_in) = S(!A, !B, !C_in)
  !C_out(A, B, C_in) = C_out(!A, !B, !C_in)
- Inverting all inputs to a FA results in inverted values for all outputs
[Figure: two full adder blocks illustrating the equivalence]
Slide courtesy of Mary Jane Irwin, Penn State
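The inversion property is easy to confirm by brute force. This is a small Python sketch under the usual full adder equations; the helper name `inversion_property_holds` is hypothetical.

```python
from itertools import product


def full_adder(a, b, c_in):
    """One-bit full adder: returns (s, c_out)."""
    p = a ^ b
    return p ^ c_in, (a & b) | (p & c_in)


def inversion_property_holds():
    """Check that inverting all three inputs inverts both outputs."""
    for a, b, c_in in product((0, 1), repeat=3):
        s, c_out = full_adder(a, b, c_in)
        s_inv, c_out_inv = full_adder(1 - a, 1 - b, 1 - c_in)
        if s_inv != 1 - s or c_out_inv != 1 - c_out:
            return False
    return True
```

`inversion_property_holds()` returns `True`: both the sum and the carry are self-dual functions of their inputs.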

8 Exploiting the Inversion Property
[Figure: 4-bit ripple carry chain (A_0..A_3, B_0..B_3, S_0..S_3, C_0 = C_in, C_out = C_4) built from regular FA cells and inverted FA' cells]
- Now need two "flavors" of FAs: a regular cell and an inverted cell
- Minimizes the critical path (the carry chain) by eliminating inverters between the FAs (will need to increase the transistor sizing on the carry chain portion of the mirror adder)
Slide courtesy of Mary Jane Irwin, Penn State

10 Fast Carry Chain Design
- The key to fast addition is a low-latency carry network
- What matters is whether, in a given position, a carry is
  - generated: G_i = A_i & B_i = A_i B_i
  - propagated: P_i = A_i ⊕ B_i (sometimes use A_i | B_i)
  - annihilated (killed): K_i = !A_i & !B_i
- Giving a carry recurrence of C_{i+1} = G_i | P_i C_i
  C_1 = G_0 | P_0 C_0
  C_2 = G_1 | P_1 G_0 | P_1 P_0 C_0
  C_3 = G_2 | P_2 G_1 | P_2 P_1 G_0 | P_2 P_1 P_0 C_0
  C_4 = G_3 | P_3 G_2 | P_3 P_2 G_1 | P_3 P_2 P_1 G_0 | P_3 P_2 P_1 P_0 C_0
Slide courtesy of Mary Jane Irwin, Penn State
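The expanded carry expressions can be evaluated directly from the G_i and P_i terms, without waiting for a lower carry to ripple up. The sketch below is one way to write that in Python; `lookahead_carries` is an illustrative name, and the loop simply generates the same sum-of-products terms that the slide lists for C_1 through C_4.

```python
def lookahead_carries(a, b, c0, n=4):
    """Return [C_1, ..., C_n] for n-bit operands a and b with carry-in c0.

    Each carry is built from the flattened expression
    C_{i+1} = G_i | P_i G_{i-1} | ... | P_i ... P_1 G_0 | P_i ... P_0 C_0.
    """
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]   # generate terms
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]   # propagate terms
    carries = []
    for i in range(n):
        c = 0
        prod = 1  # running product P_i P_{i-1} ... P_{j+1}
        for j in range(i, -1, -1):
            c |= prod & g[j]
            prod &= p[j]
        c |= prod & c0  # final term: all propagates times C_0
        carries.append(c)
    return carries
```

For instance, `lookahead_carries(0b0111, 0b0001, 0)` returns `[1, 1, 1, 0]`: the carry generated at bit 0 propagates through bits 1 and 2 and is killed at bit 3.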

11 Binary Adder Landscape
[Figure: taxonomy tree of adders]
synchronous word parallel adders
- ripple carry adders (RCA)
- carry prop min adders
  - signed-digit adders
  - residue adders
- fast carry prop adders
  - Manchester carry chain
  - carry select
  - conditional sum
  - parallel prefix
  - carry skip
Delay (T) and area (A) annotations shown on the figure: T = O(N), A = O(N); T = O(1), A = O(N); T = O(log N), A = O(N log N); T = O(√N), A = O(N); T = O(N), A = O(N)

12 Parallel Prefix Adders (PPAs)
- Define carry operator € on (G,P) signal pairs:
  (G,P) = (G'',P'') € (G',P'), where G = G'' | P''G' and P = P''P'
- € is associative, i.e.,
  [(g''',p''') € (g'',p'')] € (g',p') = (g''',p''') € [(g'',p'') € (g',p')]
[Figure: € cell taking (G'',P'') and (G',P') and producing (G,P), with its gate-level implementation]
Slide courtesy of Mary Jane Irwin, Penn State

13 PPA General Structure
- Given P and G terms for each bit position, computing all the carries is equivalent to finding all the prefixes in parallel:
  (G_0,P_0) € (G_1,P_1) € (G_2,P_2) € … € (G_{N-2},P_{N-2}) € (G_{N-1},P_{N-1})
- Since € is associative, we can group them in any order
  - but note that it is not commutative
- Measures to consider
  - number of € cells
  - tree cell depth (time)
  - tree cell area
  - cell fan-in and fan-out
  - max wiring length
  - wiring congestion
  - delay path variation (glitching)
[Figure: three stages: P_i, G_i logic (1 unit delay); C_i parallel prefix logic tree (1 unit delay per level); S_i logic (1 unit delay)]
Slide courtesy of Mary Jane Irwin, Penn State
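To make the prefix idea concrete, here is a hedged Python sketch of the carry operator and a logarithmic-depth scan. Several things are assumptions rather than content from the slides: the argument convention `carry_op(hi, lo)` (more significant pair first), the trick of folding C_0 in as an extra pair `(c0, 0)` below bit 0, and the specific doubling-distance wiring, which resembles a Kogge-Stone tree but is only one of many valid groupings.

```python
def carry_op(hi, lo):
    """Carry operator on (G, P) pairs; hi is the more significant group.

    G = G_hi | P_hi & G_lo,  P = P_hi & P_lo
    """
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def prefix_carries(a, b, n, c0=0):
    """Return ([C_1, ..., C_n], levels) using a doubling-distance prefix scan."""
    pairs = [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1)
             for i in range(n)]
    # Assumption: treat the carry-in as a group that generates c0 and never propagates.
    pairs = [(c0, 0)] + pairs
    dist = 1
    levels = 0
    while dist < len(pairs):
        # Every position combines with the group `dist` places below it;
        # the comprehension reads only the previous level's values.
        pairs = [carry_op(pairs[i], pairs[i - dist]) if i >= dist else pairs[i]
                 for i in range(len(pairs))]
        dist *= 2
        levels += 1
    # After the scan, the G of entry k is the carry into bit k, i.e. C_k.
    return [g for g, _ in pairs[1:]], levels
```

Associativity is what allows the regrouping at each level. Non-commutativity shows up immediately: `carry_op((1, 0), (0, 0))` is `(1, 0)`, while `carry_op((0, 0), (1, 0))` is `(0, 0)`.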

14 Adder Types
- RCA = Ripple Carry
- MCC = Manchester Carry Chain
- CCSka = Carry-Chain Save
- VCSka =
- CCSia = Carry Save with Invert
- BK = Brent Kung
- Others:
  - Ling-Ling
  - ELM
  - Kogge-Stone

15 Adder Speed Comparisons
[Figure: plot comparing adder delay, in ns]
Slide courtesy of Mary Jane Irwin, Penn State

16 Adder Average Power Comparisons
[Figure: plot comparing adder average power, in watts]
Slide courtesy of Mary Jane Irwin, Penn State

17 Power-Delay Product of Adder Comparisons
[Figure: plot comparing adder power-delay product; from Nagendra, 1996]
Slide courtesy of Mary Jane Irwin, Penn State

18 Review: Basic Building Blocks
- Datapath
  - Execution units: adder, multiplier, divider, shifter, etc.
  - Register file and pipeline registers
  - Multiplexers, decoders
- Control
  - Finite state machines (PLA, ROM, random logic)
- Interconnect
  - Switches, arbiters, buses
- Memory
  - Caches (SRAMs), TLBs, DRAMs, buffers

19 Parallel Programmable Shifters
[Figure: shifter block with Data In, Data Out, and Control]
- Control = shift amount, shift direction, shift type (logical, arithmetic, circular)
- Shifters are used in multipliers and floating point units
- Consume lots of area if done in random logic gates
Slide courtesy of Mary Jane Irwin, Penn State

20 Shifters - Applications
- Linear shifting
  - Concatenate 2 words (N bits each) and pull out a contiguous N-bit word.
  - Take a portion of a word and shift it to the left or right
    - Multiply by 2^M
    - Pad the emptied positions with 0's or 1's
    - Arithmetic shifts
      - Left shift: pad with 0's
      - Right shift: pad with the sign bit (1's for a negative value)
- Barrel shifting
  - Emptied position filled with the bit dropped off.
  - Rotational shifting, e.g., for circular convolution.
[Figure: wordA and wordB concatenated, with wordC pulled out as a contiguous window]
Slide courtesy of Ken Yang, UCLA
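The shift variants listed above differ only in what fills the emptied positions. The following Python sketch models them on n-bit words held in integers; all function names, and the `offset` parameter of the concatenate-and-extract helper, are illustrative assumptions rather than anything named on the slide.

```python
def logical_left(x, m, n):
    """Shift left by m within n bits, padding emptied low bits with 0 (multiply by 2**m)."""
    return (x << m) & ((1 << n) - 1)


def logical_right(x, m, n):
    """Shift right by m within n bits, padding emptied high bits with 0."""
    return (x & ((1 << n) - 1)) >> m


def arithmetic_right(x, m, n):
    """Shift right by m within n bits, padding emptied high bits with the sign bit."""
    mask = (1 << n) - 1
    x &= mask
    shifted = x >> m
    if (x >> (n - 1)) & 1:
        # Set the top m bits (all n bits if m >= n) to replicate a sign bit of 1.
        shifted |= mask & ~(mask >> m)
    return shifted


def rotate_left(x, m, n):
    """Circular shift: bits dropped off the top re-enter at the bottom."""
    mask = (1 << n) - 1
    x &= mask
    m %= n
    return ((x << m) | (x >> (n - m))) & mask


def funnel_extract(word_a, word_b, offset, n):
    """Concatenate two n-bit words (word_a on top) and pull out n contiguous bits.

    offset = 0 returns word_a, offset = n returns word_b.
    """
    mask = (1 << n) - 1
    concat = ((word_a & mask) << n) | (word_b & mask)
    return (concat >> (n - offset)) & mask
```

With 4-bit words, `arithmetic_right(0b1000, 1, 4)` gives `0b1100` while `logical_right(0b1000, 1, 4)` gives `0b0100`, and `rotate_left(0b1001, 1, 4)` gives `0b0011`.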

21 A Programmable Binary Shifter
[Figure: shifter cell with control lines rgt, nop, left connecting inputs A_i, A_{i-1} to outputs B_i, B_{i-1}]
  A_1 A_0 | rgt nop left | B_1 B_0
  A_1 A_0 |  0   1   0   | A_1 A_0
  A_1 A_0 |  1   0   0   | 0   A_1
  A_1 A_0 |  0   0   1   | A_0 0
Slide courtesy of Mary Jane Irwin, Penn State

23 4-bit Barrel Shifter
[Figure: transistor array with inputs A_0..A_3, outputs B_0..B_3, and shift control lines Sh0..Sh3]
Example:
  Sh0 = 1: B_3 B_2 B_1 B_0 = A_3 A_2 A_1 A_0
  Sh1 = 1: B_3 B_2 B_1 B_0 = A_3 A_3 A_2 A_1
  Sh2 = 1: B_3 B_2 B_1 B_0 = A_3 A_3 A_3 A_2
  Sh3 = 1: B_3 B_2 B_1 B_0 = A_3 A_3 A_3 A_3
Area dominated by wiring
Slide courtesy of Mary Jane Irwin, Penn State
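The example rows describe a right shift that replicates A_3 into the emptied positions, selected by a one-hot control. A behavioral Python sketch of that mapping follows; the list-based interface and the name `barrel_shift` are assumptions made for illustration, and the model says nothing about the transistor array itself.

```python
def barrel_shift(a_bits, sh):
    """Model the slide's example: one-hot sh selects a sign-extending right shift.

    a_bits is [A0, A1, A2, A3] (least significant first) and sh is
    [Sh0, Sh1, Sh2, Sh3] with exactly one entry equal to 1.
    Returns [B0, B1, B2, B3].
    """
    if len(sh) != len(a_bits) or sum(sh) != 1:
        raise ValueError("sh must be one-hot and as long as a_bits")
    k = sh.index(1)          # shift distance
    top = len(a_bits) - 1    # index of the most significant bit
    # B_i takes A_{i+k}; positions past the top reuse A_3.
    return [a_bits[min(i + k, top)] for i in range(len(a_bits))]
```

Passing symbolic labels makes the mapping visible: `barrel_shift(["A0", "A1", "A2", "A3"], [0, 1, 0, 0])` returns `["A1", "A2", "A3", "A3"]`, which read from B_3 down to B_0 is A_3 A_3 A_2 A_1.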

25 4-bit Barrel Shifter Layout
[Figure: barrel shifter layout, annotated with Width_barrel]
- Width_barrel ≈ 2 p_m N, where N = max shift distance and p_m = metal pitch
- Delay ≈ 1 fet + N diff caps
- Only one Sh# active at a time
Slide courtesy of Mary Jane Irwin, Penn State

26 Review: Basic Building Blocks
- Datapath
  - Execution units: adder, multiplier, divider, shifter, etc.
  - Register file and pipeline registers
- Memories
  - SRAM cell: 6T
  - DRAM: 1T
  - Other types: 1T SRAM

27 Semiconductor Memories
- RWM (Read Write Memory)
  - Random Access: SRAM (cache, register file), DRAM
  - Non-Random Access: FIFO/LIFO, Shift Register, CAM
- NVRWM (Non-Volatile Read Write Memory): EPROM, E²PROM, FLASH
- ROM (Read Only Memory): Mask-programmed, Electrically-programmed (PROM)
Slide courtesy of Mary Jane Irwin, Penn State

28 A Typical Memory Hierarchy
[Figure: memory hierarchy showing on-chip components (Control, Datapath, RegFile, ITLB, DTLB, Instr Cache, Data Cache), eDRAM, Second Level Cache (SRAM), Main Memory (DRAM), and Secondary Memory (Disk)]
Moving away from the datapath:
  Speed (ns):   .1's   1's   10's   100's   1,000's
  Size (bytes): 100's  K's   10K's  M's     T's
  Cost:         highest (nearest the datapath) to lowest (farthest)
- By taking advantage of the principle of locality:
  - Present the user with as much memory as is available in the cheapest technology.
  - Provide access at the speed offered by the fastest technology.
Slide courtesy of Mary Jane Irwin, Penn State

29 Access Time Comparison (Generalized)
  Type        Time (ns)
  RDRAM       35
  DRAM        30
  SRAM        2
  FLASH       90
  FRAM        100
  ROM (read)  50
- Latency: time to read
- Bandwidth: throughput of system

30 Read-Write Memories (RAMs)
- Static (SRAM)
  - data is stored as long as supply is applied
  - large cells (6 fets/cell), so fewer bits/chip
  - fast, so used where speed is important (e.g., caches)
  - differential outputs (output BL and !BL)
  - use sense amps for performance
  - compatible with CMOS technology
- Dynamic (DRAM)
  - periodic refresh required
  - small cells (1 to 3 fets/cell), so more bits/chip
  - slower, so used for main memories
  - single-ended output (output BL only)
  - need sense amps for correct operation
  - not typically compatible with CMOS technology
Slide courtesy of Mary Jane Irwin, Penn State

31 6-transistor SRAM Cell
[Figure: 6T cell with word line WL, bit lines BL and !BL, storage nodes Q and !Q, and transistors M1 to M6]
Slide courtesy of Mary Jane Irwin, Penn State

32 SRAM Cell Analysis (Read)
[Figure: 6T cell during a read with WL = 1, BL = 1, !BL = 1, Q = 1, !Q = 0; transistors M1, M4, M5, M6 and bit line capacitance C_bit shown]
- Read-disturb (read-upset): must carefully limit the allowed voltage rise on !Q to a value that prevents the read-upset condition from occurring, while simultaneously maintaining acceptable circuit speed and area constraints
Slide courtesy of Mary Jane Irwin, Penn State

33 SRAM Cell Analysis (Read)
[Figure: 6T cell during a read with WL = 1, BL = 1, !BL = 1, Q = 1, !Q = 0; transistors M1, M4, M5, M6 and bit line capacitance C_bit shown]
Cell Ratio: $CR = \frac{W_{M1}/L_{M1}}{W_{M5}/L_{M5}}$
$V_{!Q} = \frac{(V_{dd} - V_{Tn})\left(1 + CR \pm \sqrt{CR(1 + CR)}\right)}{1 + CR}$
Slide courtesy of Mary Jane Irwin, Penn State

34 Read Voltages Ratios
[Figure: plot of the read voltage on !Q versus cell ratio CR]
Parameters: V_dd = 2.5V, V_Tn = 0.5V
Slide courtesy of Mary Jane Irwin, Penn State
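The read-voltage expression V_!Q = (V_dd - V_Tn)(1 + CR ± sqrt(CR(1 + CR))) / (1 + CR) can be tabulated for the parameters given here (V_dd = 2.5 V, V_Tn = 0.5 V). The Python sketch below does that; taking the minus root is an assumption on my part, chosen because the plus root would put !Q above V_dd - V_Tn, and the function name `read_voltage` is illustrative.

```python
import math


def read_voltage(cr, vdd=2.5, vtn=0.5):
    """Voltage rise on !Q during a read, as a function of cell ratio CR.

    Uses the minus root of the slide's expression (assumed to be the
    physically meaningful one, since it keeps !Q below vdd - vtn).
    """
    x = vdd - vtn
    return x * (1 + cr - math.sqrt(cr * (1 + cr))) / (1 + cr)


if __name__ == "__main__":
    for cr in (0.5, 1.0, 1.5, 2.0, 2.5, 3.0):
        print(f"CR = {cr:.1f}  V_!Q = {read_voltage(cr):.3f} V")
```

The values fall as CR grows, which is the reason a stronger pull-down (larger CR) protects the cell against read-upset: the rise on !Q has to stay below the threshold of the opposite pull-down transistor.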

35 SRAM Cell Analysis (Write)
[Figure: 6T cell during a write with WL = 1, BL = 0, !BL = 1, Q = 1, !Q = 0; transistors M1, M4, M5, M6 shown]
Pullup Ratio: $PR = \frac{W_{M4}/L_{M4}}{W_{M6}/L_{M6}}$
$V_Q = (V_{dd} - V_{Tn}) \pm \sqrt{(V_{dd} - V_{Tn})^2 - \frac{\mu_p}{\mu_n}\,PR\,(V_{dd} - V_{Tn} - V_{Tp})^2}$
Slide courtesy of Mary Jane Irwin, Penn State

36 Write Voltages Ratios
[Figure: plot of the write voltage on Q versus pullup ratio PR]
Parameters: V_dd = 2.5V, |V_Tp| = 0.5V, μ_p/μ_n = 0.5
Slide courtesy of Mary Jane Irwin, Penn State

37 Cell Sizing
- Keeping cell size minimized is critical for large caches
- Minimum sized pull-down fets (M1 and M3)
  - Requires minimum width and longer than minimum channel length pass transistors (M5 and M6) to ensure proper CR
  - But sizing of the pass transistors increases capacitive load on the word lines and limits the current discharged on the bit lines, both of which can adversely affect the speed of the read cycle
- Minimum width and length pass transistors
  - Boost the width of the pull-downs (M1 and M3)
  - Reduces the loading on the word lines and increases the storage capacitance in the cell (both are good!), but cell size may be slightly larger
Slide courtesy of Mary Jane Irwin, Penn State

38 6T-SRAM Layout
[Figure: 6T SRAM cell layout showing V_DD, GND, WL, BL, storage nodes Q and !Q, and transistors M1 to M6]
Slide courtesy of Mary Jane Irwin, Penn State

39 1-Transistor DRAM Cell
[Figure: 1T cell in which access transistor M1, gated by WL, connects BL (capacitance C_BL) to storage node X with capacitor C_s; waveforms show WL, BL, and X while writing a "1" (BL at V_dd, X reaching V_dd - V_t) and reading a "1" (sensing around V_dd/2)]
- Write: C_s is charged (or discharged) by asserting WL and BL
- Read: charge redistribution occurs between C_BL and C_s
- Read is destructive, so must refresh after read
Slide courtesy of Mary Jane Irwin, Penn State
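The read-by-charge-redistribution step can be estimated with simple charge conservation: when the cell capacitor C_s at voltage V_cell is connected to a bit line C_BL precharged to V_pre, both settle to a common voltage, so the bit line moves by (V_cell - V_pre) C_s / (C_s + C_BL). The Python sketch below evaluates this; the capacitance and voltage values in the usage note are hypothetical numbers picked for illustration, not figures from the slides.

```python
def bitline_swing(v_cell, v_pre, c_s, c_bl):
    """Change in bit line voltage after sharing charge with the cell capacitor.

    Follows from charge conservation:
    c_s * v_cell + c_bl * v_pre = (c_s + c_bl) * v_final.
    """
    return (v_cell - v_pre) * c_s / (c_s + c_bl)


def final_voltage(v_cell, v_pre, c_s, c_bl):
    """Common voltage of the cell and bit line after redistribution."""
    return v_pre + bitline_swing(v_cell, v_pre, c_s, c_bl)
```

Assuming V_dd = 2.5 V, V_t = 0.5 V, C_s = 30 fF, and C_BL = 300 fF, a stored "1" (V_dd - V_t = 2.0 V) read against a V_dd/2 precharge gives `bitline_swing(2.0, 1.25, 30e-15, 300e-15)`, roughly +68 mV, while a stored "0" gives about -114 mV. Swings this small are why a sense amplifier is needed, and because the cell ends up near the precharge level, the read destroys the stored value.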

40 1-T DRAM Cell
[Figure: 1-T DRAM cell]
Slide courtesy of Mary Jane Irwin, Penn State

41 DRAM Cell Observations
- DRAM memory cells are single-ended (complicates the design of the sense amp)
- 1T cell requires a sense amp for each bit line due to charge redistribution read
- 1T cell read is destructive; refresh must follow to restore data
- 1T cell requires an extra capacitor that must be explicitly included in the design
- A threshold voltage is lost when writing a 1
  - can be circumvented by bootstrapping the word lines to a higher value than V_dd
  - Not usually available on chip, unless analog elements are present
