Slide 1: VLSI Design Challenges for Gigascale Integration. Shekhar Borkar, Intel Corp., October 25, 2005.
Slide 2: Outline
- Technology scaling challenges
- Circuit and design solutions
- Microarchitecture advances
- Multi-everywhere
- Summary
Slide 3: Goal: 10 TIPS by 2015 (chart: performance growth across Intel architectures: 8086, 286, 386, 486, Pentium®, Pentium® Pro, Pentium® 4). How do you get there?
Slide 4: Technology Scaling (diagram: MOS transistor cross-sections before and after scaling, with gate, source, drain, body, junction depth Xj, oxide thickness Tox, and effective channel length Leff labeled)
- Dimensions scale down by 30%, doubling transistor density
- Oxide thickness scales down: faster transistor, higher performance
- Vdd & Vt scaling: lower active power
Scaling will continue, but with challenges! (A rough scaling sketch follows.)
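The slide's 30% shrink and 2x density figures follow directly from the scaling factor. A minimal sketch of that arithmetic, assuming ideal constant-field scaling from a hypothetical 90 nm / 1.2 V starting point (my illustration, not slide content):

```python
# Rough, hypothetical scaling calculator: classic constant-field scaling, where each
# generation shrinks linear dimensions by ~30% (scale factor s = 0.7).
def scale_one_generation(length_nm, vdd, density, s=0.7):
    length_nm *= s        # gate length and other dimensions shrink by 30%
    density /= s * s      # area per transistor falls to ~0.49x -> ~2x density
    vdd *= s              # ideal scaling also lowers Vdd (and Vt with it)
    return length_nm, vdd, density

l, v, d = 90.0, 1.2, 1.0  # assumed starting point: 90 nm node, 1.2 V, relative density 1
for gen in range(4):
    print(f"gen {gen}: L = {l:.0f} nm, Vdd = {v:.2f} V, density = {d:.1f}x")
    l, v, d = scale_one_generation(l, v, d)
```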
Slide 5: Technology Outlook
High Volume Manufacturing:  2004, 2006, 2008, 2010, 2012, 2014, 2016, 2018
Technology Node (nm):       90, 65, 45, 32, 22, 16, 11, 8
Integration Capacity (BT):  2, 4, 8, 16, 32, 64, 128, 256
Delay = CV/I scaling:       0.7, then ~0.7, then >0.7 (delay scaling will slow down)
Energy/Logic Op scaling:    >0.35, then >0.5 (energy scaling will slow down)
Bulk Planar CMOS:           high probability early, low probability later
Alternate, 3G etc.:         low probability early, high probability later
Variability:                medium, then high, then very high
ILD (K):                    ~3, then <3, reducing slowly towards 2-2.5
RC Delay:                   1 across all generations
Metal Layers:               6-7, then 7-8, then 8-9 (0.5 to 1 layer per generation)
Slide 6: The Leakage(s)… (image: a 90nm-node MOS transistor with ~50 nm silicon channel and a 1.2 nm SiO2 gate oxide)
Slide 7: Must Fit in Power Envelope (chart: power (W) and power density (W/cm2) of a 10 mm die across the 90nm to 16nm nodes, broken into active power, sub-threshold (SD) leakage, and SiO2 gate leakage, rising toward 1400 W). Technology, circuits, and architecture must constrain the power.
Slide 8: Solutions
- Move away from frequency alone to deliver performance
- More on-die memory
- Multi-everywhere: multi-threading, chip-level multi-processing
- Throughput-oriented designs
- Valued performance by higher level of integration: monolithic & polylithic
Slide 9: Leakage Solutions (diagrams: a planar transistor with a 1.2 nm SiO2 gate oxide vs. a tri-gate transistor with a 3.0 nm high-k gate dielectric, both on a silicon substrate). For a few generations, then what?
Slide 10: Active Power Reduction
Multiple supply voltages: run the fast blocks on a high supply voltage and the slow blocks on a low supply voltage.
Throughput-oriented designs, comparing one block at full Vdd with two parallel blocks at Vdd/2 (a sketch of the arithmetic follows):
- One logic block at Vdd: Freq = 1, Throughput = 1, Power = 1, Area = 1, Power Density = 1
- Two logic blocks at Vdd/2: Freq = 0.5 each, Throughput = 1, Power = 0.25, Area = 2, Power Density = 0.125
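A minimal sketch of the arithmetic behind these numbers, assuming dynamic power scales as C·Vdd²·f in relative units (the one-block vs. two-half-speed-blocks comparison is the slide's; the code itself is my illustration):

```python
# Dynamic (switching) power ~ C * Vdd^2 * f, in relative units.
def dynamic_power(vdd, freq, cap=1.0):
    return cap * vdd**2 * freq

single = dynamic_power(vdd=1.0, freq=1.0)        # one block: power 1, throughput 1
parallel = 2 * dynamic_power(vdd=0.5, freq=0.5)  # two blocks: throughput 2 * 0.5 = 1
print(single, parallel)      # -> 1.0 0.25
print(parallel / 2)          # same power spread over 2x the area -> power density 0.125
```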
Slide 11: Leakage Control (circuit sketches)
- Body bias: bias the wells via Vbp/Vbn (+Ve/-Ve): 2-10X leakage reduction
- Sleep transistor gating the logic block: 2-1000X reduction
- Stack effect with equal loading: 5-10X reduction
Slide 12: Optimum Frequency
Maximum performance comes at an optimum pipeline depth and an optimum frequency.
(charts: sub-threshold leakage rises exponentially with relative frequency; power efficiency peaks at an optimum relative pipeline depth; performance gains from deeper pipelining show diminishing returns)
Slide 13: Memory Latency
The CPU sits behind a small cache (a few clocks away) and a large memory (50-100 ns away).
Assume a 50 ns memory latency: a cache miss hurts performance, and it hurts more at higher frequency, because the same 50 ns costs more core clocks (see the small calculation below).
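A small illustrative calculation: the 50 ns latency is the slide's assumption, while the core frequencies below are hypothetical examples of why "worse at higher frequency".

```python
# The same absolute memory latency costs more core cycles as frequency rises.
MEM_LATENCY_NS = 50.0

for freq_ghz in (1.0, 2.0, 4.0):
    cycle_ns = 1.0 / freq_ghz
    stall_cycles = MEM_LATENCY_NS / cycle_ns
    print(f"{freq_ghz:.0f} GHz core: one cache miss stalls ~{stall_cycles:.0f} cycles")
# -> 50, 100, 200 cycles: the same 50 ns miss gets relatively more expensive
```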
Slide 14: Increase on-die Memory
Large on-die memory provides:
1. Increased data bandwidth and reduced latency
2. Hence, higher performance for much lower power
Slide 15: Multi-threading (diagram: a single thread (ST) spends much of its time waiting for memory; interleaved threads MT1, MT2, MT3 fill those wait slots for full HW utilization)
Thermals and power delivery are already designed for full hardware utilization, so multi-threading improves performance without impacting thermals or power delivery. (A toy utilization model follows.)
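A toy utilization model, entirely my own assumption rather than anything on the slide, just to illustrate how interleaving threads can recover cycles lost to memory waits:

```python
# Toy model: each thread computes for `compute` cycles, then waits `stall` cycles
# on memory. Interleaving N threads hides the waits until the hardware saturates.
def hw_utilization(compute, stall, n_threads):
    return min(1.0, n_threads * compute / (compute + stall))

for n in (1, 2, 3, 4):
    u = hw_utilization(compute=25, stall=75, n_threads=n)
    print(f"{n} thread(s): utilization = {u:.2f}")
# -> 0.25, 0.50, 0.75, 1.00: extra threads recover cycles lost waiting for memory
```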
Slide 16: Single Core Power/Performance
Moore's Law provides more transistors for advanced architectures, which deliver higher peak performance, but at lower power efficiency.
Slide 17: Chip Multi-Processing (diagram: cores C1-C4 around a shared cache)
- Multi-core, each core multi-threaded
- Shared cache and front-side bus
- Each core has its own Vdd & frequency
- Core hopping to spread hot spots
- Lower junction temperature
Slide 18: Dual Core
Rule of thumb: a 1% voltage reduction allows ~1% lower frequency, cuts power by ~3%, and costs ~0.66% performance.
In the same process technology:
- One core: Voltage = 1, Freq = 1, Area = 1, Power = 1, Perf = 1
- Two cores: Voltage = -15%, Freq = -15%, Area = 2, Power = 1, Perf = ~1.8
(A worked check of these numbers follows.)
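A hedged back-of-the-envelope check of the dual-core numbers using the slide's rule of thumb; the linear approximation below is mine, not slide text:

```python
# Linearizing the rule of thumb: 1% voltage -> ~1% frequency, ~3% power, ~0.66% perf.
voltage_cut = 0.15                          # each core runs at 15% lower voltage and frequency

power_per_core = 1.0 - 3.0 * voltage_cut    # ~0.55 of the original core's power
dual_core_power = 2 * power_per_core        # ~1.1: roughly back in the original power envelope

perf_per_core = 1.0 - voltage_cut           # ~0.85 of single-core performance
dual_core_perf = 2 * perf_per_core          # ~1.7 with perfect scaling; the slide quotes ~1.8

print(f"power ~ {dual_core_power:.2f}, perf ~ {dual_core_perf:.2f}")
```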
Slide 19: Multi-Core (diagram: one large core vs. an array of small cores C1-C4 sharing a cache)
A small core burns roughly 1/4 the power of a large core while delivering roughly 1/2 its performance, so many small cores are the more power-efficient way to spend a fixed budget.
Multi-Core: power efficient; better power and thermal management. (A sketch of the trade-off follows.)
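A minimal sketch of the trade-off: the 1/4-power, 1/2-performance small-core ratio is from the slide, while the fixed relative power budget of 1.0 is my assumption.

```python
LARGE_CORE = {"power": 1.0, "perf": 1.0}
SMALL_CORE = {"power": 0.25, "perf": 0.5}

def peak_throughput(core, power_budget=1.0):
    n_cores = int(power_budget / core["power"])   # how many cores fit in the budget
    return n_cores, n_cores * core["perf"]

for name, core in (("large", LARGE_CORE), ("small", SMALL_CORE)):
    n, perf = peak_throughput(core)
    print(f"{name}: {n} core(s) fit, peak throughput ~ {perf}")
# -> 1 large core gives 1.0; 4 small cores give 2.0 in the same power envelope,
#    provided the workload parallelizes (see Amdahl's Law on slide 21)
```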
Slide 20: Special Purpose Hardware (die photo: 2.23 mm x 3.54 mm, 260K transistors)
Opportunities: network processing engines; MPEG encode/decode engines; speech engines; TCP/IP offload engines.
Special-purpose HW provides the best MIPS/Watt.
Slide 21: Performance Scaling
Amdahl's Law: Parallel Speedup = 1 / (Serial% + (1 - Serial%) / N)
- Serial% = 6.7%, N = 16: 16 cores deliver a speedup of only ~8 (N/2)
- Serial% = 20%, N = 6: 6 cores deliver a speedup of only ~3 (N/2)
Parallel software is key to multi-core success. (The short check below reproduces these numbers.)
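A short check that reproduces the slide's two data points with its formula:

```python
# Amdahl's Law: speedup = 1 / (serial + (1 - serial) / N)
def amdahl_speedup(serial_fraction, n_cores):
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores)

print(amdahl_speedup(0.067, 16))   # ~8: 16 cores deliver only about half their peak
print(amdahl_speedup(0.20, 6))     # ~3: with 20% serial code, 6 cores also yield N/2
```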
Slide 22: From Multi to Many…
A 13 mm, 100 W die with 48 MB of cache and 4B transistors in 22nm could carry 12 cores, 24 cores, or 144 cores, depending on core size.
Slide 23: Future Multi-core Platform (diagram: heterogeneous multi-core platform combining general-purpose (GP) cores, special-purpose (SP) HW, and cache blocks (CC) over an interconnect fabric)
Slide 24: The New Era of Computing (timeline chart)
- Era of pipelined architecture: 8086, 286, 386, 486
- Era of instruction-level parallelism: superscalar, speculative, out-of-order (OOO)
- Era of thread & processor level parallelism: multi-threaded, multi-core, special-purpose HW; multi-everywhere (MT, CMP)
Slide 25: Summary
- Business as usual is not an option: performance at any cost is history
- Must make a Right Hand Turn (RHT): move away from frequency alone
- Future architectures and designs: more memory (larger caches), multi-threading, multi-processing, special-purpose hardware, valued performance with higher integration