1 Exponential Challenges, Exponential Rewards: The Future of Moore's Law. Based on a lecture by Shekhar Borkar, Intel Fellow, Circuit Research, Intel Labs
2 ISSCC 2003: Gordon Moore said, "No exponential is forever… but we can delay 'forever'."
3 Goal: 1 TIPS by 2010. [Chart labels: Pentium® Architecture, Pentium® Pro Architecture, Pentium® 4 Architecture] How do you get there?
4 Transistor Scaling. Will high-K happen? Would you count on it?
5 Technology Scaling. [Figure: transistor cross-sections before and after scaling, labeled GATE, SOURCE, DRAIN, BODY, Xj, Tox, D, Leff] Dimensions scale down by 30%: doubles transistor density. Oxide thickness scales down: faster transistor, higher performance. Vdd & Vt scaling: lower active power. Technology has scaled well; will it in the future?
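The density claim on this slide follows from simple geometry. Below is a minimal Python sketch, assuming an idealized transistor whose area shrinks with the square of its linear dimensions; the function name `density_gain` is illustrative and not from the lecture.

```python
def density_gain(linear_shrink: float) -> float:
    """Return the transistor density multiplier for one scaling step.

    linear_shrink is the fractional reduction of each linear dimension
    (0.30 means dimensions scale down by 30%). Assumption: transistor
    area scales with the square of the linear dimension.
    """
    scale = 1.0 - linear_shrink   # each dimension becomes 0.7x
    area = scale * scale          # area per transistor becomes 0.49x
    return 1.0 / area             # transistors per unit area


if __name__ == "__main__":
    print(f"Density gain per generation: {density_gain(0.30):.2f}x")
```

A 30% shrink leaves each transistor with about half its former area, which is where "doubles transistor density" comes from.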
6 Gate Oxide is Near Limit. [Figure: 130 nm transistor cross-section with a 70 nm gate; labels Si3N4, CoSi2, GATE, SOURCE, DRAIN, BODY, Tox] Will high-K happen? Would you count on it?
7 Transistor Integration Capacity. On track for 1 billion transistor integration capacity
8 Transistor Integration Capacity
10 Transistor Integration Capacity
11 Transistor Integration Capacity
12 Exponential Challenge #1
13 Is the Transistor a Good Switch? Ideal switch: on, I = ∞; off, I = 0. Real transistor: on, I = 1 mA/µm; off, I ≠ 0 (sub-threshold leakage).
14 Sub-threshold Leakage. Sub-threshold leakage increases exponentially. Assume: 0.25 µm, Ioff = 1 nA/µm, 5X increase each generation at 30 ºC
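The slide's assumption compounds quickly. Here is a minimal sketch of that arithmetic, starting from 1 nA/µm and multiplying by 5 each generation. The list of node names after 0.25 µm is an illustrative assumption, not taken from the slide.

```python
def leakage_by_generation(i_off_start=1.0, growth=5.0, generations=5):
    """Return I_off in nA/um for each generation, starting at generation 0.

    Defaults follow the slide: 1 nA/um at the starting node and a 5X
    increase per generation.
    """
    return [i_off_start * growth ** g for g in range(generations)]


# Assumption: only 0.25 um comes from the slide; the later node names
# are hypothetical labels for successive generations.
NODES_UM = [0.25, 0.18, 0.13, 0.09, 0.065]

if __name__ == "__main__":
    currents = leakage_by_generation(generations=len(NODES_UM))
    for node, i_off in zip(NODES_UM, currents):
        print(f"{node:>6} um: I_off = {i_off:g} nA/um")
```

Under these assumptions, four generations of scaling raise the off current per micron by a factor of 625.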
15 Leakage Power. Leakage power limits Vt scaling. (A. Grove, IEDM 2002)
16 The Power Crisis
17 Exponential Challenge #4
18 Impact on Path Delays. [Plot: probability versus path delay] Path delay varies because of technological variations in Vdd, Vt, and temperature. This impacts individual circuit performance and power. Optimize each circuit for performance and power.
19 Impact on Path Delays. [Plot: probability versus path delay] Path delay varies because of technological variations in Vdd, Vt, and temperature. This impacts individual circuit performance and power. Optimize each circuit for performance and power. How many silicon atoms (111 pm) fit across a transistor channel (20 nm)? Is a 3D transistor a solution?
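The atom-count question on this slide is a one-line division. A minimal sketch follows, taking the slide's 111 pm as the size of one atom along the channel; that reading is an assumption. If 111 pm is instead read as a radius (it matches the commonly quoted covalent radius of silicon), each atom spans two radii and the count halves.

```python
def atoms_across(channel_length_m: float, atom_size_m: float) -> float:
    """Return how many atoms of the given size fit along the channel."""
    return channel_length_m / atom_size_m


CHANNEL_M = 20e-9    # 20 nm channel, from the slide
ATOM_M = 111e-12     # 111 pm per atom, from the slide

if __name__ == "__main__":
    as_size = atoms_across(CHANNEL_M, ATOM_M)
    as_radius = atoms_across(CHANNEL_M, 2 * ATOM_M)
    print(f"111 pm as atom size: about {as_size:.0f} atoms")
    print(f"111 pm as atom radius: about {as_radius:.0f} atoms")
```

Either way the channel is only on the order of a hundred atoms long, so a variation of a few atoms is no longer negligible.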
20 Shift in Design Paradigm. From deterministic design to probabilistic and statistical design: a path delay estimate is probabilistic (not deterministic). Multi-variable design optimization for: parameter variations; active and leakage power; performance.
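To make "a path delay estimate is probabilistic" concrete, here is a minimal Monte Carlo sketch. It assumes a path of identical gates whose delays vary independently with a Gaussian spread. The gate count, the 5% sigma, and the function names are all illustrative assumptions, not figures from the lecture.

```python
import random
import statistics


def sample_path_delay(rng, n_gates=20, nominal_gate_delay=1.0, sigma=0.05):
    """Return one sampled path delay: the sum of perturbed gate delays.

    Assumption: each gate delay is Gaussian around its nominal value
    with relative standard deviation sigma, independent of the others.
    """
    return sum(
        rng.gauss(nominal_gate_delay, sigma * nominal_gate_delay)
        for _ in range(n_gates)
    )


def timing_yield(target, trials=10_000, seed=0, **path_kwargs):
    """Estimate the fraction of sampled paths meeting the target delay.

    Returns (yield, mean delay, standard deviation of delay).
    """
    rng = random.Random(seed)
    delays = [sample_path_delay(rng, **path_kwargs) for _ in range(trials)]
    met = sum(d <= target for d in delays)
    return met / trials, statistics.mean(delays), statistics.stdev(delays)


if __name__ == "__main__":
    for target in (20.0, 20.2, 20.5, 21.0):
        y, mean, sd = timing_yield(target)
        print(f"target {target:5.1f}: yield {y:.3f} (mean {mean:.2f}, sd {sd:.3f})")
```

A deterministic analysis would report a single delay of 20 units. The statistical view instead reports how likely the path is to meet each target, which is the quantity a designer can trade against power.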
21 Exponential Challenge #6
22 Exponential Costs (G. Moore, ISSCC 03). [Charts: litho cost, fab cost, $ per transistor, $ per MIPS]
23 Some Implications. Tox scaling will slow down, may stop? Vdd scaling will slow down, may stop? Vt scaling will slow down, may stop? Approaching constant Vdd scaling. Energy/logic op will not scale.
24 The Terascale Dilemma. Many-billion-transistor integration capacity will be available, but could be unusable due to power. Logic transistor growth will slow down. Transistor performance will be limited. Solutions: low-power design techniques; improve design efficiency.
25 Exponential Challenge #5
26 Platform Requirements. Shrinking volume, quieter, yet high performance. [Charts: system volume (cubic inches) for PC tower, mini tower, tower, slim line, small PC; thermal budget (ºC/W), heat-sink volume (in³), and air flow rate (CFM) versus power (W), with projected heat dissipation volume and projected air flow rate, Pentium® III and Pentium® 4] Thermal budget decreasing; higher heat-sink volume; higher air flow rate.
27 Active Power Reduction. Multiple Vdd: slow paths on a low supply voltage, fast paths on a high supply voltage. Throughput-oriented design: one logic block at Vdd (Freq = 1, Vdd = 1, Throughput = 1, Power = 1, Area = 1, Pwr Den = 1) versus two logic blocks at Vdd/2 (Freq = 0.5, Vdd = 0.5, Throughput = 1, Power = 0.25, Area = 2, Pwr Den = 0.125).
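The throughput-oriented numbers on this slide are consistent with the usual dynamic power relation, power proportional to C × Vdd² × f. Below is a minimal sketch in normalized units, assuming each logic block has the same switched capacitance and area. The function and key names are illustrative.

```python
def throughput_design(n_blocks, vdd, freq, cap_per_block=1.0):
    """Return normalized metrics for n identical parallel logic blocks.

    Assumptions: dynamic power only (C * Vdd^2 * f per block), throughput
    proportional to total operations per unit time, area proportional to
    the number of blocks.
    """
    power = n_blocks * cap_per_block * vdd ** 2 * freq
    area = float(n_blocks)
    return {
        "throughput": n_blocks * freq,
        "power": power,
        "area": area,
        "power_density": power / area,
    }


if __name__ == "__main__":
    print("one block at full Vdd: ", throughput_design(1, vdd=1.0, freq=1.0))
    print("two blocks at half Vdd:", throughput_design(2, vdd=0.5, freq=0.5))
```

Two half-speed blocks at half the supply voltage match the throughput of one full-speed block at a quarter of the power, paid for with twice the area.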
Improve Arch Efficiency. [Timeline: single thread: ST, wait for mem; multi-threading: MT1, wait for mem; MT2, wait; MT3] Thermals & power delivery are designed for full HW utilization. Multi-threading improves performance without impacting thermals & power delivery.
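The benefit of multi-threading can be sketched with a simple utilization model. It assumes every thread alternates a fixed compute burst with a fixed memory wait, and that switching threads costs nothing. The cycle counts below are illustrative assumptions, not figures from the lecture.

```python
def core_utilization(n_threads, compute_cycles, mem_wait_cycles):
    """Return the fraction of cycles the core spends doing useful work.

    Model: each thread computes for compute_cycles, then stalls for
    mem_wait_cycles. While one thread stalls, another may run. The core
    cannot be more than fully utilized, hence the cap at 1.0.
    """
    per_thread = compute_cycles / (compute_cycles + mem_wait_cycles)
    return min(1.0, n_threads * per_thread)


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        u = core_utilization(n, compute_cycles=10, mem_wait_cycles=30)
        print(f"{n} thread(s): utilization {u:.2f}")
```

In this model a single thread leaves the core idle three quarters of the time. Extra threads fill those idle cycles on hardware whose thermal and power budget was already sized for full utilization.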
30 Increase on-die Memory. Large on-die memory provides: 1. Increased data bandwidth & reduced latency. 2. Hence, higher performance for much lower power.
31 Chip Multi-Processing. [Diagram: cores C1, C2, C3, C4 and a cache] Multi-core, each core multi-threaded. Shared cache and front-side bus. Each core has a different Vdd & Freq. Spreading hot spots. Lower junction temperature.
32 Example (Itanium Tukwila)
33 Example (Itanium Tukwila): 30 MBytes cache, 130 Watts
34 Example (Itanium Tukwila)
35 What Will the Cores Look Like?
36 What Will the Cores Look Like?
37 What Will the Cores Look Like?
38 What Will the Cores Look Like? Clocks run at the same frequency but with unknown phases.
39 What Will the Cores Look Like?
40 What Will the Cores Look Like? Intelligent workload redistribution. Improved energy efficiency. Multiple functionalities.
41 What Will the Cores Look Like? Several interconnection possibilities: mesh, ring.
42 Tera-Scale RMS - Recognition, Mining and Synthesis
46 The Exponential Reward. Era of Pipelined Architecture: super scalar. Era of Instruction-Level Parallelism: speculative, OOO. Era of Thread- & Processor-Level Parallelism: multi-threaded; multi-threaded, multi-core; special-purpose HW.
47 Summary: Delaying Forever. Terascale transistor integration capacity will be available; power and energy are the barriers. Variations will be even more prominent: shift from deterministic to probabilistic design. Improve design efficiency. Exploit integration capacity to deliver performance within the power/cost envelope.
48 Exercises. 1. Discuss one problem associated with device integration. 2. Comment on the statement: "Shrinking transistor size changes the paradigm for evaluating energy consumption and execution time from deterministic to probabilistic." 3. Why is static energy consumption so problematic for future technologies? 4. Why is reducing the supply voltage one of the main factors to address in order to reduce energy consumption? 5. How can a system with multiple supply voltages contribute to reducing energy consumption? What is the effect on execution time?
49 Exercises. 6. Draw an illustration showing how a multi-threaded program can make better use of a system's resources, reducing the memory communication bottleneck. 7. Why does on-chip memory exceed 50% of an integrated circuit in current processors? 8. Given the limits of scaling, what can be done to keep increasing machine performance? 9. What are the trends in computation (cores), communication infrastructure, and storage for upcoming processors?