VLSI Design Challenges for Gigascale Integration. Shekhar Borkar, Intel Corp. October 25, 2005.


1 VLSI Design Challenges for Gigascale Integration. Shekhar Borkar, Intel Corp. October 25, 2005

2 Outline
Technology scaling challenges
Circuit and design solutions
Microarchitecture advances
Multi-everywhere
Summary

3 Goal: 10 TIPS by 2015
[Figure: processor generations 8086, 286, 386, 486, Pentium® Architecture, Pentium® Pro Architecture, Pentium® 4 Architecture]
How do you get there?

4 Technology Scaling
[Figure: transistor cross-sections labeled GATE, SOURCE, DRAIN, BODY, Xj, Tox, D, Leff]
Dimensions scale down by 30%: doubles transistor density
Oxide thickness scales down: faster transistor, higher performance
Vdd & Vt scaling: lower active power
Scaling will continue, but with challenges!
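
The link between a 30% dimension shrink and doubled density can be checked with a few lines of arithmetic. This is only a sketch, assuming ideal scaling in which both linear dimensions of a transistor shrink by the same factor; the function name `density_gain` is illustrative and not from the slides.

```python
# Illustrative arithmetic only: assumes ideal scaling where both linear
# dimensions of a transistor shrink by the same factor each generation.

def density_gain(linear_shrink: float) -> float:
    """Return the transistor density multiplier for a given linear shrink.

    linear_shrink = 0.30 means each dimension becomes 70% of its old size.
    """
    scale = 1.0 - linear_shrink
    area_per_transistor = scale * scale  # area shrinks with the square
    return 1.0 / area_per_transistor


gain = density_gain(0.30)
print(f"Density multiplier for a 30% shrink: {gain:.2f}x")
```

A 0.7x linear scale gives roughly 0.49x area per transistor, which is where the "doubles transistor density" figure comes from.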

5 Technology Outlook
High Volume Manufacturing    2004  2006  2008  2010  2012  2014  2016  2018
Technology Node (nm)           90    65    45    32    22    16    11     8
Integration Capacity (BT)       2     4     8    16    32    64   128   256
RC Delay                        1     1     1     1     1     1     1     1
Trends from earlier to later generations:
Delay = CV/I scaling: 0.7, ~0.7, >0.7 (delay scaling will slow down)
Energy/Logic Op scaling: >0.35, >0.5, >0.5 (energy scaling will slow down)
Bulk Planar CMOS: High Probability, then Low Probability
Alternate, 3G etc: Low Probability, then High Probability
Variability: Medium, High, Very High
ILD (K): ~3, <3, reduce slowly towards 2-2.5
Metal Layers: 6-7, 7-8, 8-9 (0.5 to 1 layer per generation)

6 The Leakage(s)…
[Figure labels: 90nm MOS Transistor; 50nm; Si; 1.2 nm SiO2; Gate]

7 Must Fit in Power Envelope
[Figure: Power (W) and Power Density (W/cm2), axis 0 to 1400, for a 10 mm die at 90nm, 65nm, 45nm, 32nm, 22nm, and 16nm; components: SiO2 Lkg, SD Lkg, Active]
Technology, Circuits, and Architecture to constrain the power

8 Solutions
Move away from Frequency alone to deliver performance
More on-die memory
Multi-everywhere
–Multi-threading
–Chip level multi-processing
Throughput oriented designs
Valued performance by higher level of integration
–Monolithic & Polylithic

9 Leakage Solutions
[Figure labels: Planar Transistor; Tri-gate Transistor; Silicon substrate; 1.2 nm SiO2; Gate; Gate electrode; 3.0nm High-k]
For a few generations, then what?

10 Active Power Reduction
Multiple Supply Voltages
[Figure labels: Slow, Fast, Slow; Low Supply Voltage; High Supply Voltage]
Throughput Oriented Designs
One logic block at Vdd: Freq = 1, Vdd = 1, Throughput = 1, Power = 1, Area = 1, Pwr Den = 1
Two logic blocks at Vdd/2: Freq = 0.5, Vdd = 0.5, Throughput = 1, Power = 0.25, Area = 2, Pwr Den = 0.125
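
The throughput-oriented numbers on this slide can be reproduced with a short calculation. The sketch below assumes the usual first-order model that switching power is proportional to C·V²·f, with capacitance per block held constant and leakage ignored; that model and the function name `throughput_design` are assumptions added for illustration, not something the slide states.

```python
# First-order model (an assumption, not stated on the slide):
# switching power per block ~ C * Vdd^2 * f, with C normalized to 1.
# Leakage power is ignored.

def throughput_design(n_blocks: int, vdd: float, freq: float) -> dict:
    """Normalized metrics for n identical logic blocks run in parallel."""
    power = n_blocks * (vdd ** 2) * freq  # total switching power
    area = float(n_blocks)                # each block has area 1
    return {
        "throughput": n_blocks * freq,    # work completed per unit time
        "power": power,
        "area": area,
        "power_density": power / area,
    }


baseline = throughput_design(n_blocks=1, vdd=1.0, freq=1.0)
parallel = throughput_design(n_blocks=2, vdd=0.5, freq=0.5)
print("baseline:", baseline)
print("parallel:", parallel)
```

Under these assumptions, two blocks at half voltage and half frequency keep throughput at 1 while power drops to 0.25 and power density to 0.125, matching the slide.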

11 Leakage Control
Body Bias (figure labels: Vdd, Vbp, Vbn, -Ve, +Ve): 2-10X reduction
Sleep Transistor (figure label: Logic Block): 2-1000X reduction
Stack Effect (figure label: Equal Loading): 5-10X reduction

12 Optimum Frequency
Maximum performance with
Optimum pipeline depth
Optimum frequency
[Figure panels: Process Technology (x-axis: Relative Frequency; sub-threshold leakage increases exponentially); Pipeline Depth (x-axis: Relative Pipeline Depth; Power Efficiency, with an optimum); Pipeline & Performance (x-axis: Relative Frequency (Pipelining); Performance, with diminishing return)]

13 Memory Latency
[Figure: CPU, Cache (small, ~few clocks), Memory (large, 50-100ns)]
Assume: 50ns memory latency
Cache miss hurts performance
Worse at higher frequency
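
Why a miss is "worse at higher frequency" follows from converting a fixed latency into clock cycles. The sketch below uses the slide's 50ns assumption; the clock frequencies in the loop are hypothetical values chosen only to show the trend.

```python
# A fixed memory latency costs more clock cycles as the clock gets faster.
# The 50 ns figure is the slide's assumption; the frequencies below are
# hypothetical examples.

def miss_penalty_cycles(latency_ns: float, freq_ghz: float) -> float:
    """Clock cycles spent waiting on one cache miss.

    1 GHz is one cycle per nanosecond, so cycles = ns * GHz.
    """
    return latency_ns * freq_ghz


for freq_ghz in (1.0, 2.0, 4.0):
    cycles = miss_penalty_cycles(50.0, freq_ghz)
    print(f"{freq_ghz:.0f} GHz: {cycles:.0f} cycles lost per miss")
```

Doubling the clock doubles the number of cycles the core sits idle on each miss, even though the memory itself is no slower.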

14 Increase on-die Memory
Large on-die memory provides:
1. Increased Data Bandwidth & Reduced Latency
2. Hence, higher performance for much lower power

15 Multi-threading
[Figure: Single Thread (ST, Wait for Mem) vs. Multi-Threading (MT1, Wait for Mem; MT2, Wait; MT3); Full HW Utilization]
Multi-threading improves performance without impacting thermals & power delivery
Thermals & Power Delivery designed for full HW utilization

16 Single Core Power/Performance
Moore’s Law: more transistors for advanced architectures
Delivers higher peak performance
But… Lower power efficiency

17 Chip Multi-Processing
[Figure: cores C1, C2, C3, C4 and Cache]
Multi-core, each core Multi-threaded
Shared cache and front side bus
Each core has different Vdd & Freq
Core hopping to spread hot spots
Lower junction temperature

18 Dual Core
Rule of thumb:
Voltage  Frequency  Power  Performance
1%       1%         3%     0.66%
In the same process technology…
Single core with cache: Voltage = 1, Freq = 1, Area = 1, Power = 1, Perf = 1
Two cores with cache: Voltage = -15%, Freq = -15%, Area = 2, Power = 1, Perf = ~1.8
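
The dual-core figures can be approximated by applying the rule of thumb linearly. This is a rough sketch: it assumes the per-1% ratios hold all the way out to a 15% voltage change and that the two cores' performance adds perfectly, neither of which the slide states; `dual_core_estimate` is an illustrative name.

```python
# Rule of thumb from the slide: a 1% voltage change gives about a 1%
# frequency change, 3% power change, and 0.66% performance change.
# Assumptions (not from the slide): the ratios extrapolate linearly to
# larger changes, and two cores deliver twice the per-core performance.

POWER_PCT_PER_VOLT_PCT = 3.0
PERF_PCT_PER_VOLT_PCT = 0.66


def dual_core_estimate(voltage_change_pct: float) -> tuple:
    """Return (total power, total performance) for two cores,
    normalized to one core at nominal voltage."""
    per_core_power = 1.0 + POWER_PCT_PER_VOLT_PCT * voltage_change_pct / 100.0
    per_core_perf = 1.0 + PERF_PCT_PER_VOLT_PCT * voltage_change_pct / 100.0
    return 2.0 * per_core_power, 2.0 * per_core_perf


power, perf = dual_core_estimate(-15.0)
print(f"Two cores at -15% voltage: power ~{power:.2f}, perf ~{perf:.2f}")
```

The linear estimate lands near 1.1 for power and 1.8 for performance, in line with the slide's rounded Power = 1 and Perf = ~1.8.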

19 Multi-Core
[Figure: one large core with cache vs. small cores C1, C2, C3, C4 with cache; bar charts of Power and Performance on a 1 to 4 scale]
Small core: Power = 1/4, Performance = 1/2
Multi-Core: Power efficient
Better power and thermal management
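
The small-core trade-off can be made concrete with a quick tally. The sketch assumes the slide's 1/4 power and 1/2 performance figures apply to each small core relative to the large core, and that the workload is perfectly parallel so per-core performance simply adds; both are simplifying assumptions.

```python
# Per-core figures read from the slide, relative to one large core.
SMALL_CORE_POWER = 0.25
SMALL_CORE_PERF = 0.5


def small_core_cluster(n_cores: int) -> tuple:
    """Return (total power, total throughput) for n small cores.

    Assumes a perfectly parallel workload, so throughput adds linearly.
    """
    return n_cores * SMALL_CORE_POWER, n_cores * SMALL_CORE_PERF


power, throughput = small_core_cluster(4)
print(f"4 small cores: power = {power}, throughput = {throughput}")
print(f"throughput per unit power: {throughput / power}")
```

Under these assumptions, four small cores fit in the large core's power budget while delivering about twice its throughput on parallel work.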

20 Special Purpose Hardware
2.23 mm X 3.54 mm, 260K transistors
Opportunities:
Network processing engines
MPEG Encode/Decode engines, Speech engines
TCP/IP Offload Engine
Special purpose HW provides best Mips/Watt

21 Performance Scaling
Amdahl’s Law: Parallel Speedup = 1/(Serial% + (1-Serial%)/N)
Serial% = 6.7%: N = 16, N 1/2 = 8 (16 Cores, Perf = 8)
Serial% = 20%: N = 6, N 1/2 = 3 (6 Cores, Perf = 3)
Parallel software key to Multi-core success
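
The two Amdahl's Law data points on this slide can be checked directly from the formula. This is a minimal sketch; the function name `amdahl_speedup` is illustrative.

```python
def amdahl_speedup(serial_fraction: float, n_cores: int) -> float:
    """Parallel speedup = 1 / (serial + (1 - serial) / N)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores)


# The two cases quoted on the slide.
for serial, n in ((0.067, 16), (0.20, 6)):
    speedup = amdahl_speedup(serial, n)
    print(f"Serial = {serial:.1%}, N = {n}: speedup = {speedup:.2f}")
```

Both cases come out at about half the core count (roughly 8 on 16 cores and 3 on 6 cores), which shows how quickly even a small serial fraction erodes the benefit of adding cores.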

22 From Multi to Many…
13mm, 100W, 48MB Cache, 4B Transistors, in 22nm
12 Cores, 24 Cores, 144 Cores

23 Future Multi-core Platform
[Figure: GP (General Purpose Cores), SP (Special Purpose HW), and C blocks connected by an Interconnect fabric]
Heterogeneous Multi-Core Platform

24 The New Era of Computing
Era of Pipelined Architecture: 8086, 286, 386, 486
Era of Instruction Level Parallelism: Super Scalar; Speculative, OOO
Era of Thread & Processor Level Parallelism: Multi Threaded; Special Purpose HW; Multi-Threaded, Multi-Core
Multi-everywhere: MT, CMP

25 Summary
Business as usual is not an option
–Performance at any cost is history
Must make a Right Hand Turn (RHT)
–Move away from frequency alone
Future: Architectures and designs
–More memory (larger caches)
–Multi-threading
–Multi-processing
–Special purpose hardware
–Valued performance with higher integration

