Presentation is loading. Please wait.

Presentation is loading. Please wait.

® 1 Exponential Challenges, Exponential Rewards The Future of Moores Law Shekhar Borkar Intel Fellow Circuit Research, Intel Labs Fall, 2004.

Similar presentations


Presentation on theme: "® 1 Exponential Challenges, Exponential Rewards The Future of Moores Law Shekhar Borkar Intel Fellow Circuit Research, Intel Labs Fall, 2004."— Presentation transcript:

1 ® 1 Exponential Challenges, Exponential Rewards The Future of Moores Law Shekhar Borkar Intel Fellow Circuit Research, Intel Labs Fall, 2004

2 2 ISSCC 2003 Gordon Moore said… No exponential is forever… But We can delay Forever

3 3 Outline The exponential challenges The exponential challenges Circuit and Arch solutions Circuit and Arch solutions Major paradigm shifts in design Major paradigm shifts in design Integration & SOC Integration & SOC The exponential reward The exponential reward Summary Summary

4 4 Goal: 1TIPS by 2010 Pentium® Pro Architecture Pentium® 4 Architecture Pentium® Architecture 486 386 286 8086 How do you get there?

5 5 Technology Scaling GATE SOURCE BODY DRAIN Xj Tox D GATE SOURCE DRAIN Leff BODY Dimensions scale down by 30% Doubles transistor density Oxide thickness scales down Faster transistor, higher performance Vdd & Vt scaling Lower active power Technology has scaled well, will it in the future?

6 6 Technology Outlook High Volume Manufacturing 20042006200820102012201420162018 Technology Node (nm) 906545322216118 Integration Capacity (BT) 248163264128256 Delay = CV/I scaling 0.7~0.7>0.7 Delay scaling will slow down Energy/Logic Op scaling >0.35>0.5>0.5 Energy scaling will slow down Bulk Planar CMOS High Probability Low Probability Alternate, 3G etc Low Probability High Probability Variability Medium High Very High ILD (K) ~3<3 Reduce slowly towards 2-2.5 Reduce slowly towards 2-2.5 RC Delay 11111111 Metal Layers 6-77-88-9 0.5 to 1 layer per generation

7 7 Is Transistor a Good Switch? On I = I = 0 Off I = 0 I 0 I = 1ma/u I 0 Sub-threshold Leakage

8 8 Exponential Challenge #1

9 9 Sub-threshold Leakage Sub-threshold leakage increases exponentially Assume: 0.25 m, I off = 1na/ 5X increase each generation at 30ºC

10 10 SD Leakage Power SD leakage power becomes prohibitive

11 11 Leakage Power Leakage power limits Vt scaling A. Grove, IEDM 2002

12 12 Exponential Challenge #2

13 13 Gate Oxide is Near Limit High-K dielectric is crucial 90nm MOS Transistor50nm Silicon substrate 1.2 nm SiO 2 Gate If Tox scaling slows down, then Vdd scaling will have to slow down

14 14 Exponential Challenge #3

15 15 Energy per Logic Operation Energy per logic operation scaling will slow down

16 16 The Power Crisis Business as usual is not an option

17 17 Exponential Challenge #4

18 18 Sources of Variations Heat Flux (W/cm 2 ) Results in Vcc variation Temperature Variation (°C) Hot spots Random Dopant Fluctuations 193nm 248nm 365nm LithographyWavelength 65nm 90nm 130nm Generation Gap 45nm 32nm 13nm EUV 180nm Source: Mark Bohr, Intel Sub-wavelength Lithography

19 19 Frequency & SD Leakage 0.18 micron ~1000 samples 20X 30% Low Freq Low Isb High Freq Medium Isb High Freq High Isb

20 20 ~30mV Vt Distribution 0.18 micron ~1000 samples Low Freq Low Isb High Freq Medium Isb High Freq High Isb

21 21 Exponential Challenge #5

22 22 Platform Requirements 0 500 1000 1500 2000 2500 3000 PC towerMini tower tower Slim lineSmall pc System Volume ( cubic inch) Shrinking volume Quieter Yet, High Performance 0 0.5 1.0 1.5 050100150200 Power (W) Thermal Budget ( o C/W) 0 25 50 75 Heat-Sink Volume (in 3 ) Projected Heat Dissipation Volume Projected Air Flow Rate Pentium ® III 100 250 Thermal Budget Air Flow Rate (CFM) Pentium ® 4 Thermal budget decreasing Higher heat sink volume Higher air flow rate

23 23 Exponential Challenge #6

24 24 Exponential Costs G. Moore ISSCC 03 Litho Cost www.icknowledge.com FAB Cost $ per Transistor $ per MIPS

25 25 Product Cost Pressure Shrinking ASP, and shrinking $ budget for power

26 26 Must Fit in Power Envelope Technology, Circuits, and Architecture to constrain the power 0 200 400 600 800 1000 1200 1400 90nm65nm45nm32nm22nm16nm Power (W), Power Density (W/cm 2 ) SiO2 Lkg SD Lkg Active 10 mm Die 1 MB 2 MB 4 MB 8 MB

27 27 Some Implications Tox scaling will slow downmay stop? Tox scaling will slow downmay stop? Vdd scaling will slow downmay stop? Vdd scaling will slow downmay stop? Vt scaling will slow downmay stop? Vt scaling will slow downmay stop? Approaching constant Vdd scaling Approaching constant Vdd scaling Energy/logic op will not scale Energy/logic op will not scale

28 28 The Gigascale Dilemma 1B T integration capacity will be available 1B T integration capacity will be available But could be unusable due to power But could be unusable due to power Logic T growth will slow down Logic T growth will slow down Transistor performance will be limited Transistor performance will be limitedSolutions Low power design techniques Low power design techniques Improve design efficiencyMulti everywhere Improve design efficiencyMulti everywhere Valued performance by even higher integration (of potentially slower transistors) Valued performance by even higher integration (of potentially slower transistors)

29 29 Poweractive and leakage VariationsMicroarchitecture

30 30 Active Power Reduction SlowFastSlow Low Supply Voltage High Supply Voltage Multiple Supply Voltages Logic Block Freq = 1 Vdd = 1 Throughput = 1 Power = 1 Area = 1 Pwr Den = 1 Vdd Logic Block Freq = 0.5 Vdd = 0.5 Throughput = 1 Power = 0.25 Area = 2 Pwr Den = 0.125 Vdd/2 Logic Block Replicated Designs

31 31 Leakage Control Body Bias Vdd Vbp Vbn -Ve +Ve 2-10XReduction Sleep Transistor Logic Block 2-1000XReduction Stack Effect Equal Loading 5-10XReduction

32 32 Circuit Design Tradeoffs 0 0.5 1 1.5 2 Low-Vt usage low high Higher probability of target frequency with: 1.Larger transistor sizes 2.Higher Low-Vt usage But with power penalty 0 0.5 1 1.5 2 Transistor size small large power target frequency probability

33 33 Impact of Critical Paths 1.1 1.2 1.3 1.4 191725 # of critical paths Mean clock frequency Clock frequency Number of dies 0% 20% 40% 60% 0.91.11.31.5 # critical paths With increasing # of critical paths With increasing # of critical paths –Both and become smaller –Lower mean frequency

34 34 Impact of Logic Depth 0% 20% 40% -16%-8%0%8%16% Delay 20% 40% NMOS PMOS Device I ON Number of samples (%) Variation (%) NMOS Ion PMOS Ion / Delay / 5.6%3.0%4.2% Logic depth: 16 0.0 0.5 1.0 Logic depth Ratio of delay- to Ion- 16 49

35 35 0 0.5 1 1.5 Logic depth small large frequency target frequency probability # uArch critical paths 0 0.5 1 1.5 lessmore Architecture Tradeoffs Architecture Tradeoffs Higher target frequency with: 1.Shallow logic depth 2.Larger number of critical paths But with lower probability

36 36 Variation-tolerant Design 0 0.5 1 1.5 # uArch critical paths lessmore Balance power & frequency with variation tolerance 0 0.5 1 1.5 Logic depth small large frequency target frequency probability 0 0.5 1 1.5 2 Transistor size small large power target frequency probability 0 0.5 1 1.5 2 Low-Vt usage low high

37 37 Leakage Power FrequencyDeterministicProbabilistic 10X variation ~50% total power Probabilistic Design Delay Path Delay Probability Deterministic design techniques inadequate in the future Due to variations in: V dd, V t, and Temp Delay Target # of Paths Deterministic Delay Target # of Paths Probabilistic

38 38 Shift in Design Paradigm Multi-variable design optimization for: Multi-variable design optimization for: – Yield and bin splits – Parameter variations – Active and leakage power – Performance Tomorrow: Global Optimization Multi-variateToday: Local Optimization Single Variable

39 39 Resistor Network 4.5 mm 5.3 mm Multiple subsites PD & Counter Resistor Network CUT Bias Amplifier Delay Die frequency: Min(F 1..F 21 ) Die power: Sum(P 1..P 21 ) Technology 150nm CMOS Number of subsites per die 21 Body bias range 0.5V FBB to 0.5V RBB Bias resolution 32 mV 1.6 X 0.24 mm, 21 sites per die 150nm CMOS Adaptive Body Bias--Experiment

40 40 Adaptive Body Bias--Results 0% 20% 60% 100% Accepted die noBB 100% yield ABB Higher Frequency 97% highest bin within die ABB For given Freq and Power density 100% yield with ABB 100% yield with ABB 97% highest freq bin with ABB for within die variability 97% highest freq bin with ABB for within die variability

41 41 Design & Arch Efficiency Employ efficient design & Architectures

42 42 Memory Latency MemoryCPUCache Small ~few Clocks Large 50-100ns Assume: 50ns Memory latency Cache miss hurts performance Worse at higher frequency Cache miss hurts performance Worse at higher frequency

43 43 Increase on-die Memory Large on die memory provides: 1.Increased Data Bandwidth & Reduced Latency 2.Hence, higher performance for much lower power

44 44 Multi-threading ST Wait for Mem MT1 Wait for Mem MT2 Wait MT3 Single Thread Multi-Threading Full HW Utilization Multi-threading improves performance without impacting thermals & power delivery Thermals & Power Delivery designed for full HW utilization

45 45 Chip Multi-Processing C1C2 C3C4 Cache Multi-core, each core Multi-threaded Shared cache and front side bus Each core has different Vdd & Freq Core hopping to spread hot spots Lower junction temperature

46 46 Special Purpose Hardware 2.23 mm X 3.54 mm, 260K transistors Opportunities: Network processing engines MPEG Encode/Decode engines Speech engines Special purpose HWBest Mips/Watt TCP Offload Engine

47 47 Valued Performance: SOC (System on a Chip) Special-purpose hardware more MIPS/mm² Special-purpose hardware more MIPS/mm² SIMD integer and FP instructions in several ISAs SIMD integer and FP instructions in several ISAs Die Area PowerPerformance General Purpose 2X2X~1.4X Multimedia Kernels <10%<10% 1.5 - 4X Si Monolithic Polylithic CPU Special HW Memory CMOS RF Wireline RF Opto- Electronics Dense Memory Heterogeneous Si, SiGe, GaAs

48 48 The Exponential Reward Multi-everywhere: MT, CMP Speculative, OOO Era of Instruction LevelParallelism Super Scalar 486 386 286 8086 Era of Pipelined Architecture Multi Threaded Era of Thread & ProcessorLevelParallelism Special Purpose HW Multi-Threaded, Multi-Core

49 49 SummaryDelaying Forever Gigascale transistor integration capacity will be availablePower and Energy are the barriers Gigascale transistor integration capacity will be availablePower and Energy are the barriers Variations will be even more prominentshift from Deterministic to Probabilistic design Variations will be even more prominentshift from Deterministic to Probabilistic design Improve design efficiency Improve design efficiency Multieverywhere, & SOC valued performance Multieverywhere, & SOC valued performance Exploit integration capacity to deliver performance in power/cost envelope Exploit integration capacity to deliver performance in power/cost envelope


Download ppt "® 1 Exponential Challenges, Exponential Rewards The Future of Moores Law Shekhar Borkar Intel Fellow Circuit Research, Intel Labs Fall, 2004."

Similar presentations


Ads by Google