
Soc 2.1 Chapter 2 Chip Basics: Time, Area, Power, Reliability, Configurability Computer System Design System-on-Chip by M. Flynn & W. Luk Pub. Wiley 2011.


1 soc 2.1 Chapter 2 Chip Basics: Time, Area, Power, Reliability, Configurability Computer System Design System-on-Chip by M. Flynn & W. Luk Pub. Wiley 2011 (copyright 2011)

2 soc 2.2 Basic design issue: Time clocking pipelining –optimal pipelining –pipeline partitioning –wave pipelining and low overhead clocking

3 soc 2.3 SIA roadmap

4 soc 2.4 Tradeoffs in IP selection and design: performance, area, power

5 soc 2.5 Clock parameters
–$P_{max}$: maximum delay through logic
–$P_{min}$: minimum delay through logic
–$\Delta t$: cycle time (in seconds per cycle)
–$t_w$: clock pulse width
–$t_g$: data setup time
–$t_d$: register output delay
–$C$: total clocking overhead
$\Delta t = P_{max} + C$ (timing diagram: $t_g$, $t_w$, $P_{max}$ and $t_d$ within one cycle $\Delta t$)

6 soc 2.6 Skew
skew: uncertainty in the clock arrival time
two types of skew
–depends on $\Delta t$: skew $= k P_{max}$, where $k$ is a fraction and $P_{max}$ is the segment delay that determines $\Delta t$; large segments may have longer delay and skew; part of the skew varies with $L_{eff}$, like segment delay
–independent of $\Delta t$: skew $= \delta$; can relate to clock routing, jitter from environmental conditions, and other effects unrelated to segment delay
effect of skew $= k P_{max} + \delta$
–skew range adds directly to the clock overhead
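
The two skew terms can be folded into the cycle-time relation from the previous slide. This is a minimal sketch, assuming that the non-skew part of the clock overhead is a single number (`c_base`, a hypothetical name) and that all delays are in nanoseconds; the numeric values are illustrative only.

```python
def cycle_time(p_max, c_base, k=0.0, delta=0.0):
    """Return the cycle time: P_max plus clock overhead, including skew."""
    skew = k * p_max + delta      # delay-dependent part plus fixed part
    return p_max + c_base + skew  # skew adds directly to the clock overhead


# Illustrative values in ns (assumed, not taken from the slides)
no_skew = cycle_time(p_max=2.0, c_base=0.3)
with_skew = cycle_time(p_max=2.0, c_base=0.3, k=0.05, delta=0.1)
print(no_skew, with_skew)
```

Because the skew term grows with `p_max`, longer segments pay a larger overhead in absolute terms.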

7 soc 2.7 Optimal pipelining
let the total instruction execution time, without pipelining and its associated clock overhead, be $T$
in a pipelined processor, let $S$ be the number of segments
$S - 1$ is the number of cycles lost due to a pipeline break
let $b$ = probability of a break, $C$ = clock overhead including fixed skew

8 soc 2.8 Optimum pipelining
(figure: total time $T$ made up of the delays P1, P2, P3, P4)
suppose $T = \sum_i P_{max,i}$ without clock overhead
$S$ = number of pipeline segments
$C$ = clock overhead
$T/S$, $\max_i(P_{max,i})$ [quantization]
$P_{max,i}$ = delay of the $i$th functional unit

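9 soc 2.9 Cycle time = average time per segment + clock overhead: $\Delta t = T/S + C$
performance $= 1 / (1 + (S - 1) b)$ [IPC]
throughput $= G =$ performance $/ \Delta t$ [IPS]
$G = \dfrac{1}{1 + (S - 1) b} \cdot \dfrac{1}{T/S + C}$
Find $S$ for optimum performance by solving $dG/dS = 0$ for $S$; we get $S_{opt} = \sqrt{\dfrac{(1 - b)\,T}{b\,C}}$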
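
A small numerical check of the throughput model can make the tradeoff concrete. The sketch below assumes the definitions on this slide ($\Delta t = T/S + C$, IPC $= 1/(1 + (S-1)b)$); setting $dG/dS = 0$ for those definitions gives $S_{opt} = \sqrt{(1-b)T/(bC)}$. The values of `T`, `C` and `b` are illustrative assumptions, not design data.

```python
import math


def throughput(S, T, C, b):
    """Instructions per unit time for S segments."""
    dt = T / S + C                   # cycle time
    ipc = 1.0 / (1.0 + (S - 1) * b)  # each break costs S - 1 cycles
    return ipc / dt


def s_opt(T, C, b):
    """Continuous optimum from dG/dS = 0."""
    return math.sqrt((1.0 - b) * T / (b * C))


# Illustrative values (assumed): T and C in ns, b as a probability
T, C, b = 15.0, 0.5, 0.2
best_integer_S = max(range(1, 31), key=lambda s: throughput(s, T, C, b))
print(s_opt(T, C, b), best_integer_S)
```

The brute-force search over integer `S` is there only to cross-check the closed form; quantization and area costs are not modeled.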

10 soc 2.10 Find $S_{opt}$
estimate $b$
–use instruction traces
find $T$ and $C$ from design details
–feasibility studies
example:

11 soc 2.11 Quantization + other considerations
quantization effects
–$T$ cannot be arbitrarily divided into segments
–segments are defined by functional unit delays
–some segments cannot be divided; others can be divided only at particular boundaries
some functional operations are atomic
–a cycle usually does not cross a functional unit boundary
$S_{opt}$
–ignores cost/area of extra pipeline stages
–ignores quantization loss
–is the largest $S$ to be used

12 soc 2.12 Microprocessor design practice
tradeoff around design target
optimal in-order integer RISC: 5–10 stages
–performance: relatively flat across this range
–deeper for out-of-order or complex ISA (e.g. Intel architectures)
use longer pipeline (higher frequency) if
–FP/multimedia vector performance is important
–clock overhead is low
else use shorter pipeline
–especially if area/power/effort are critical

13 soc 2.13 Advanced circuit techniques
asynchronous or self-timed clocking
–avoids clock distribution problems but has its own overhead
multi-phase domino clocking
–skew tolerant and low clock overhead; lots of power required and extra area
wave pipelining
–ultimate limit on $\Delta t$: $\Delta t = P_{max} - P_{min} + C$
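
To see what the wave-pipelining limit buys, the conventional relation $\Delta t = P_{max} + C$ can be compared with $\Delta t = P_{max} - P_{min} + C$. This is a rough sketch with assumed delay values in nanoseconds; the function names are hypothetical.

```python
def conventional_cycle(p_max, c):
    """Cycle time when each segment must finish before the next clock."""
    return p_max + c


def wave_cycle(p_max, p_min, c):
    """Limiting cycle time with wave pipelining."""
    return p_max - p_min + c


# Illustrative delays in ns (assumed)
conventional = conventional_cycle(p_max=2.0, c=0.3)
wave = wave_cycle(p_max=2.0, p_min=1.5, c=0.3)
print(conventional, wave)
```

The gain depends entirely on how close $P_{min}$ is to $P_{max}$; with $P_{min} = 0$ the two relations coincide.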

14 soc 2.14 Basic Design Issues: Silicon Area, Power, Reliability, Reconfiguration die floorplanning methodology area-cost model power analysis and model reliability reconfigurable design soft processors

15 soc 2.15 AMD Barcelona multicore http://www.techwarelabs.com/reviews/processors/barcelona/

16 soc 2.16 Die floorplanning methodology
pick target cost based on market requirements
determine total area available within cost budget
–defect and yield model
compute net available area for processors, caches and memory
–account for I/O, buses, test hooks, I/O pads etc.
select core processors and assess area and performance
re-allocate area to optimize performance
–cache, signal processors, multimedia processors, etc.

17 soc 2.17 Wafers and chips: suppose the wafer has diameter $d$ and each die is square with area $A$

18 soc 2.18 Wafers and chips: example
If $N$ is the number of dice on the wafer, $N = \pi d^2 / (4A)$ [Gross Yield]
Let $N_G$ be the number of good dice and $N_D$ be the number of defects on a wafer.
Given $N$ dice of which $N_G$ are good, suppose we randomly add 1 new defect to the wafer. What is the probability that it strikes a good die and changes $N_G$?

19 soc 2.19 Probability of the defect hitting a good die $= N_G / N$
The change in $N_G$ is $dN_G / dN_D = -N_G / N$
Rewriting this we get $dN_G / N_G = -(1/N)\,dN_D$
Integrating and solving: $\ln(N_G) = -N_D / N + C$
Since $N_G = N$ when $N_D = 0$, $C$ must be $\ln(N)$
$N_G / N = \text{Yield} = e^{-N_D / N}$
let defect density (defects/cm²) $= \rho_D$; then $N_D = \rho_D \times \text{wafer area} = \rho_D \times A \times N$
$\text{Yield} = N_G / N = e^{-\rho_D A}$
typically $\rho_D$ = 0.3–1.0 defects/cm²

20 soc 2.20 Using yield to size a die
to find the cost per die:
1. find $N$, the number of dice on a wafer
2. find Yield
3. find $N_G$ = Yield x $N$
4. cost/die = wafer cost / $N_G$
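
The four steps above can be sketched as a short calculation. It assumes the gross estimate $N = \pi d^2 / (4A)$ (wafer area divided by die area, ignoring edge losses) and the yield model $e^{-\rho_D A}$ from the preceding slides; the wafer cost, wafer diameter and defect density used in the example are hypothetical numbers chosen only for illustration.

```python
import math


def dice_per_wafer(d_cm, a_cm2):
    """Step 1: gross number of dice (wafer area / die area, no edge loss)."""
    return math.pi * d_cm ** 2 / (4.0 * a_cm2)


def die_yield(rho_d, a_cm2):
    """Step 2: fraction of good dice for defect density rho_d (per cm^2)."""
    return math.exp(-rho_d * a_cm2)


def cost_per_die(wafer_cost, d_cm, a_cm2, rho_d):
    """Steps 3 and 4: good dice per wafer, then cost per good die."""
    n_good = die_yield(rho_d, a_cm2) * dice_per_wafer(d_cm, a_cm2)
    return wafer_cost / n_good


# Hypothetical inputs: 30 cm wafer, 5000 cost units per wafer, 0.5 defects/cm^2
small = cost_per_die(5000.0, 30.0, 1.0, 0.5)
large = cost_per_die(5000.0, 30.0, 2.0, 0.5)
print(small, large)
```

Doubling the die area more than doubles the cost per die, because fewer dice fit on the wafer and a smaller fraction of them are good.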

21 soc 2.21 Effect of defect density

22 soc 2.22 What can be put on the die?
depends on the lithography and die area
lithography is determined by $f$, the minimum feature size
feature size is related to $\lambda$, the mask registration variation
–$f = 2\lambda$

23 soc 2.23 Smallest device: $5\lambda \times 5\lambda$ (figure dimension labels: $2\lambda$, $4\lambda$, $4\lambda$, $5\lambda$, $5\lambda$)

24 soc 2.24 Area units: rbe and A
rbe: small area unit for sizing functional units of the processor
suppose we define another, larger unit, A, as $1\,A = f^2 \times 10^6$; then $1\,A = 10^6 / 675 = 1481$ rbe
since 1481 is close to 1444, we can also refer to the simple register file as occupying 1 A

25 soc 2.25 Area of other cells
1 register bit = 1 rbe
1 CAM bit = 2 rbe
1 cache bit (6 tx cell) = 0.6 rbe
1 SRAM bit = 0.6 rbe
1 DRAM bit = 0.1 rbe = $67.5 f^2$
These are the parameters for basic cells in most design tradeoffs
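
The cell parameters above turn a storage size into an area estimate. The following is a minimal sketch that counts bit cells only; tags, decoders, sense amplifiers and wiring are not modeled, which is an assumption of this example rather than something the slides state. The conversion 1 A = 1481 rbe is taken from the area-units slide.

```python
# rbe per bit, from the slide
RBE_PER_BIT = {"register": 1.0, "cam": 2.0, "cache": 0.6, "sram": 0.6, "dram": 0.1}
RBE_PER_A = 1481.0  # 1 A = 10^6 / 675 rbe


def storage_area_rbe(n_bits, kind):
    """Bit-cell area in rbe for n_bits of the given cell type."""
    return n_bits * RBE_PER_BIT[kind]


def rbe_to_a(rbe):
    """Convert rbe to the larger A unit."""
    return rbe / RBE_PER_A


# Example: data bits of an 8 kB cache (8 * 1024 bytes * 8 bits)
cache_bits = 8 * 1024 * 8
cache_rbe = storage_area_rbe(cache_bits, "cache")
print(cache_rbe, rbe_to_a(cache_rbe))
```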

26 soc 2.26 Floorplan and area allocation Core processors Signal processor Cache Bus Memory Clock Test

27 soc 2.27 The baseline: I
suppose $\rho_D$ is 0.2 defects/cm² and we target 80% yield; then A = 110 mm² gross or (allowing a 20% guard) 88 mm² net
if $f$ = 0.13 µm we have 5200 A area units for our design
we want to realize
–a 32b core processor (with 8kB I and 16kB D cache)
–two 32b vector processors with 16 x 1k x 32 vector memory + I and D cache
–128kB ROM
–anything else is SRAM
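
The baseline numbers can be reproduced by inverting the yield equation, Yield $= e^{-\rho_D A}$, and then dividing the net area by the size of one A unit. This is a sketch under the slide's own figures; the function names are hypothetical. With $f$ in µm, one A unit ($f^2 \times 10^6$) is numerically $f^2$ when expressed in mm², because 1 mm² = 10⁶ µm².

```python
import math


def die_area_mm2(target_yield, rho_d_per_cm2):
    """Gross die area (mm^2) that meets the target yield."""
    area_cm2 = -math.log(target_yield) / rho_d_per_cm2
    return area_cm2 * 100.0  # 1 cm^2 = 100 mm^2


def area_units(net_mm2, f_um):
    """Number of A units in net_mm2, where 1 A = f^2 * 10^6."""
    one_a_mm2 = f_um ** 2  # f^2 * 10^6 um^2 expressed in mm^2
    return net_mm2 / one_a_mm2


gross = die_area_mm2(0.80, 0.2)    # about 112 mm^2; the slide rounds to 110
units = area_units(88.0, 0.13)     # the slide's 88 mm^2 net area
print(gross, units)
```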

28 soc 2.28 The baseline: II This leaves 5200 - 2462 = 2738 A available for data SRAM. This implies about 512kB of SRAM.

29 soc 2.29 Example SOC floorplan

30 soc 2.30 Die area summary cost: an exponential function of area successful business model –targets initial production at relatively low yield (~0.3) –ride learning curve and leverage technology to reduce cost and improve performance technical innovation and analysis –intersect with business decisions to make a product –use design feasibility studies and empirical targets –methodology for cost and performance evaluation –marketing targets: determine weighting of performance metrics

31 soc 2.31 Power consumption power consumption: becoming key design issue increased power: largely due to higher frequency operation

32 soc 2.32 Bipolar and CMOS clock frequency Bipolar power limit

33 soc 2.33 Bipolar cooling technology (ca ’91) Hitachi M880: 500 MHz; one processor/module, 40 die sealed in helium then cooled by a water jacket. Power consumed: about 800 watts per module. F. Kobayashi, et al. “Hardware technology for Hitachi M-880.” Proceedings Electronic Components and Tech Conf., 1991.

34 soc 2.34 Power: real price of performance
As feature size and C (capacitance) decrease, the electric fields force a reduction in V. To maintain performance we also reduce $V_{th}$, but as $V_{th}$ decreases, $I_{leakage}$ and static power increase. Static power is now a big problem in high-performance designs. Static power can be controlled by maintaining $V_{th}$ and using lower frequencies; lowering V also reduces dynamic power. (figure labels: Dynamic power, Static power)

35 soc 2.35 Power and frequency
$I = C\,dV/dt$, so smaller C enables higher $dV/dt$ (frequency)
but $I = (V - V_{th})^{1.25} / V$, and I also directly determines the maximum frequency
for $V_{th}$ = 0.6 V, halving V also roughly halves the frequency (e.g. if V goes from 3 V to 1.5 V then the frequency is about ½)
so halving the voltage ($V_{DD}$ or the signal V) halves the frequency but reduces the power to 1/8 of its original value, since dynamic power is $C V^2 f / 2$
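
The halving argument can be checked numerically. The sketch below evaluates the slide's expression $(V - V_{th})^{1.25} / V$ as a relative frequency measure and the dynamic power $C V^2 f / 2$; treating the expression as proportional to maximum frequency is the slide's claim, and the capacitance value of 1 is an arbitrary assumption since only ratios matter.

```python
def relative_frequency(v, v_th=0.6, alpha=1.25):
    """Slide's expression (V - V_th)^1.25 / V, used as a relative measure."""
    return (v - v_th) ** alpha / v


def dynamic_power(c, v, f):
    """Dynamic power C * V^2 * f / 2."""
    return c * v * v * f / 2.0


# Frequency ratio when V goes from 3 V to 1.5 V (roughly one half)
freq_ratio = relative_frequency(1.5) / relative_frequency(3.0)

# Power ratio when both V and f are halved exactly
power_ratio = dynamic_power(1.0, 1.5, 0.5) / dynamic_power(1.0, 3.0, 1.0)
print(freq_ratio, power_ratio)
```

The frequency ratio comes out somewhat above one half, so the 1/8 figure is best read as an approximation.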

36 soc 2.36 Power: a new frontier
cooled high power: >70 W/die
high power: 10–50 W/die ... plug-in supply
low power: 0.1–2 W/die ... rechargeable battery
very low power: 1–100 mW/die ... AA size batteries
extremely low power: 1–100 µW/die and below (nanowatts) ... button batteries
no power: extract from local EM field ... O(1 µW/die)

37 soc 2.37 Battery energy and usage
type | energy capacity | time | power
rechargeable | 10,000 mAh | 50 hours (10–20% duty) | 400 mW–4 W
2 x AA | 4000 mAh | ½ year (10–20% duty) | 1–10 mW
button | 40 mAh | 5 years (always on) | 1 µW

38 soc 2.38 Power is important!
by scaling alone, a 1000x slower implementation may need only $10^{-9}$ as much power
gating power to functional units and other techniques should enable 100 MHz processors to operate at $O(10^{-3})$ watts
goal: $O(10^{-6})$ watts, which implies about 10 MHz
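
Both figures on this slide are consistent with power falling as the cube of frequency. The sketch below assumes that cubic relation, which follows from dynamic power $C V^2 f / 2$ if the voltage is scaled in proportion to frequency; that proportionality is an assumption of the example, and `scaled_power` is a hypothetical helper.

```python
def scaled_power(p_ref, f_ref, f_new):
    """Power at f_new, assuming power scales with frequency cubed."""
    return p_ref * (f_new / f_ref) ** 3


# 1000x slower: power drops by a factor of about 10^9
slow_fraction = scaled_power(1.0, 1000.0, 1.0)

# 100 MHz at 10^-3 W scaled down to 10 MHz: about 10^-6 W
p_10mhz = scaled_power(1e-3, 100e6, 10e6)
print(slow_fraction, p_10mhz)
```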

39 soc 2.39 Reliability + computational integrity
design for reliability using
–redundancy
–error detect and correct
–process recoverability
–fail-safe computation
failure: a deviation from a design specification
error: a failure that results in an incorrect signal value
fault: an error that manifests as an incorrect logical result
faults
–do not necessarily produce incorrect program execution
–can be masked by detection/correction logic, e.g. ECC codes
types of faults
–physical fault
–design fault

40 soc 2.40 Redundancy: carefully applied
$P(t) = e^{-t/\tau}$
–derived in the same way as the yield equation
TMR (triple modular redundancy) system
–additional reliability over a time much less than $\tau$, the expected failure time for a single module
additional hardware
–makes the occurrence of multiple module failures more probable
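
The claim that TMR helps only for times well below $\tau$ can be illustrated with the standard majority-vote expression $3R^2 - 2R^3$, where $R = e^{-t/\tau}$ is the single-module reliability. That expression is not on the slide; it assumes a perfect voter and independent module failures, so this is a sketch under those assumptions.

```python
import math


def module_reliability(t, tau):
    """Probability that a single module is still working at time t."""
    return math.exp(-t / tau)


def tmr_reliability(t, tau):
    """Probability that at least 2 of 3 modules work (perfect voter assumed)."""
    r = module_reliability(t, tau)
    return 3.0 * r ** 2 - 2.0 * r ** 3


tau = 1.0  # time measured in units of the expected failure time
for t in (0.1, 0.5, 1.0, 2.0):
    print(t, module_reliability(t, tau), tmr_reliability(t, tau))
```

Under these assumptions TMR is better than a single module while $R > 0.5$, that is for $t < \tau \ln 2$, and worse afterwards, because the extra hardware makes multiple failures more likely.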

41 soc 2.41 Highly reliable designs typical usage –error detection: parity, residue, block codes; sanity & bounds checks –action (instruction) retry –error correction: code or alternate path compute –reconfiguration

42 soc 2.42 Why reconfigurable design?
manage design complexity based on high-performance IP blocks
–avoid the risk and delay of fabrication
time – support highly pipelined designs
area – regularity of FPGA, which readily advances to better process technology
reliability – FPGA enables redundant cells and interconnections, avoiding run-time faults

43 soc 2.43 Area estimate of FPGAs
use rbe model as the basic measure
–one slice ≈ 7000 transistors = 700 rbe
–one logic element (LE) ≈ 12000 transistors = 1200 rbe
–Xilinx Virtex XC2V6000 = 33,792 slices: 23.65 million rbe or 16400 A
8 x 8 multiplier: around 35 slices
–equivalent to 24500 rbe or 17 A
–each 1-bit multiplier cell in VLSI contains a full adder and an AND gate; the whole multiplier ≈ 3840 transistors = 384 rbe, around 60 times smaller than the reconfigurable version
block multipliers in FPGAs: more efficient
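
The area figures on this slide follow from simple multiplication, sketched below. The constant of 10 transistors per rbe is implied by "7000 transistors = 700 rbe"; the use of 1444 rbe per A (rather than 1481) is an assumption made here because it reproduces the slide's rounded values of 16400 A and 17 A.

```python
RBE_PER_SLICE = 700        # from the slide: one slice is about 7000 transistors
TRANSISTORS_PER_RBE = 10   # implied by 7000 transistors = 700 rbe
RBE_PER_A = 1444           # assumption: matches the slide's rounded A figures


def slices_to_rbe(n_slices):
    """Area of n_slices FPGA slices in rbe."""
    return n_slices * RBE_PER_SLICE


fpga_rbe = slices_to_rbe(33792)                   # XC2V6000
mult_fpga_rbe = slices_to_rbe(35)                 # 8 x 8 multiplier in slices
mult_vlsi_rbe = 3840 / TRANSISTORS_PER_RBE        # same multiplier in VLSI
overhead = mult_fpga_rbe / mult_vlsi_rbe          # reconfigurable vs. VLSI

print(fpga_rbe, fpga_rbe / RBE_PER_A)
print(mult_fpga_rbe, mult_fpga_rbe / RBE_PER_A, overhead)
```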

44 soc 2.44 Soft processors: using FPGAs
soft processors: how soft are they?
–an instruction processor design in bit-stream format, used to program an FPGA device
–cost reduction, design reuse, ...
major soft processors include:
–Altera: Nios
–Xilinx: MicroBlaze
–open-source: OpenRISC, Leon
–all 32-bit RISC architectures with 5-stage pipelines, connecting to different bus standards

45 soc 2.45 Features: soft processors

46 soc 2.46 Summary best optimise: time, area, power cycle time: optimized pipelining area: die floorplanning, rbe model power: cooling + battery implications reliability: computational integrity, redundancy reconfiguration: reduce risks and delays –area overhead alleviated by coarse-grained blocks –soft processors: instruction processors in FPGA

