1 Runnemede: Disruptive Technologies for UHPC John Gustafson Intel Labs HPC User Forum – Houston 2011.


1 Runnemede: Disruptive Technologies for UHPC
John Gustafson, Intel Labs
HPC User Forum, Houston 2011

2 The battle lines are drawn…
“Caches are for morons.” (Shekhar Borkar, Intel)
“We’re going to try to make the entire exascale machine cache-coherent.” (Bill Dally, Nvidia)

3 Intel’s UHPC Approach
Design test chips with the idea of maximizing learning.
Very different from producing product roadmap processor designs.
Going from Peta to Exa is nothing like the last few 1000x increases…

4 Building with Today’s Technology
TFLOP machine today: 5 KW in total
Compute: 200 W (200 pJ per FLOP)
Memory: 150 W (0.1 B/FLOP @ 1.5 nJ per Byte)
Com: 100 W (100 pJ com per FLOP)
Disk: 10 TB disk @ 1 TB/disk @ 10 W (10 disks, so 100 W)
Decode and control, translations, etc.; power supply losses; cooling, etc.: 4450 W
KW Tera, MW Peta, GW Exa?
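The component figures on this slide follow from multiplying energy per operation by the operation rate. A minimal sketch of that arithmetic, assuming a sustained rate of 10^12 FLOP/s and reading the 5 KW figure as the machine total; the variable names are illustrative and not from the talk:

```python
# Power (W) = energy per operation (J) x operations per second.
FLOPS = 1e12  # assumed sustained rate of a TFLOP machine

compute_w = 200e-12 * FLOPS        # 200 pJ per FLOP
memory_w = 0.1 * 1.5e-9 * FLOPS    # 0.1 B/FLOP at 1.5 nJ per Byte
com_w = 100e-12 * FLOPS            # 100 pJ of communication per FLOP
disk_w = (10 / 1) * 10             # 10 TB at 1 TB/disk, 10 W per disk

components_w = compute_w + memory_w + com_w + disk_w
# Whatever is left of the 5 KW total goes to decode/control, power
# supply losses, cooling, and so on.
overhead_w = 5000 - components_w

for name, watts in [("compute", compute_w), ("memory", memory_w),
                    ("com", com_w), ("disk", disk_w),
                    ("overhead", overhead_w)]:
    print(f"{name:>8}: {watts:.0f} W")
```

Under these assumptions the four components account for only about a tenth of the total; the rest is overhead.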

5 The Power & Energy Challenge
Components: Compute, Memory, Com, Disk
TFLOP machine today: 200 W, 150 W, 100 W, 4550 W (5 KW in total)
TFLOP machine then, with Exa technology: 5 W, 2 W, ~5 W, ~3 W, 5 W (~20 W in total)
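To see what the two totals imply at exascale, one can convert each to energy per FLOP and scale the rate up by a factor of a million. This is only a sketch: it assumes power scales linearly with the FLOP rate, and the helper names are hypothetical.

```python
TERA = 1e12  # FLOP/s
EXA = 1e18   # FLOP/s

def energy_per_flop(power_w, flops):
    """Joules spent per floating-point operation at a given power and rate."""
    return power_w / flops

def power_at_rate(joules_per_flop, flops):
    """Power in watts needed to sustain `flops` at a fixed energy per FLOP."""
    return joules_per_flop * flops

today_j = energy_per_flop(5000, TERA)  # 5 KW TFLOP machine today
then_j = energy_per_flop(20, TERA)     # ~20 W TFLOP machine with Exa technology

exa_today_w = power_at_rate(today_j, EXA)
exa_then_w = power_at_rate(then_j, EXA)

print(f"today: {today_j * 1e12:.0f} pJ/FLOP -> {exa_today_w / 1e9:.0f} GW at Exa")
print(f"then:  {then_j * 1e12:.0f} pJ/FLOP -> {exa_then_w / 1e6:.0f} MW at Exa")
```

With these inputs, today's technology lands in the gigawatt range at exascale, while the ~20 W target corresponds to tens of megawatts.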

6 Scaling Assumptions

Technology (high volume)   Transistor density   Frequency scaling   Vdd scaling
45 nm (2008)               1.75                 15%                 -10%
32 nm (2010)               1.75                 10%                 -7.5%
22 nm (2012)               1.75                 8%                  -5%
16 nm (2014)               1.75                 5%                  -2.5%
11 nm (2016)               1.75                 4%                  -1.5%
8 nm (2018)                1.75                 3%                  -1%
5 nm (2020)                1.75                 2%                  -0.5%

SD Leakage scaling/micron: 1X (optimistic) to 1.43X (pessimistic)

65 nm Core + Local Memory
Memory 0.35 MB: 5 mm² (50%)
DP FP Add, Multiply, Integer Core, RF, Router: 5 mm² (50%)
Total: 10 mm², 3 GHz, 6 GF, 1.8 W

8 nm Core + Local Memory
Memory 0.35 MB: 0.17 mm² (50%)
DP FP Add, Multiply, Integer Core, RF, Router: 0.17 mm² (50%)
Total: 0.34 mm², 4.6 GHz, 9.2 GF, 0.24 to 0.46 W
~0.6 mm
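The 8 nm core figures can be cross-checked against the per-generation factors. The sketch below assumes that each factor applies at the transition into the listed node, starting from the 65 nm core, and that the core performs 2 FLOP per cycle (inferred from 6 GF at 3 GHz). The function name and parameters are illustrative only.

```python
# Per-generation factors as listed on the slide, 45 nm (2008) to 5 nm (2020).
NODES = ["45 nm", "32 nm", "22 nm", "16 nm", "11 nm", "8 nm", "5 nm"]
FREQ_GAIN = [0.15, 0.10, 0.08, 0.05, 0.04, 0.03, 0.02]
DENSITY_GAIN = [1.75] * 7

def project(stop_node, freq_ghz=3.0, area_mm2=10.0, flops_per_cycle=2):
    """Scale the 65 nm core (10 mm^2, 3 GHz) forward through stop_node."""
    for node, gain, density in zip(NODES, FREQ_GAIN, DENSITY_GAIN):
        freq_ghz *= 1 + gain   # frequency grows by the listed percentage
        area_mm2 /= density    # same design shrinks by the density factor
        if node == stop_node:
            break
    return freq_ghz, area_mm2, freq_ghz * flops_per_cycle

freq, area, gflops = project("8 nm")
print(f"{freq:.2f} GHz, {area:.3f} mm^2, {gflops:.2f} GF")
```

Under these assumptions the projection comes out at about 4.6 GHz, 0.35 mm², and 9.2 GF, which is in line with the slide's 4.6 GHz, 0.34 mm², and 9.2 GF.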

7 Near Threshold Logic
[Figure: energy efficiency (GOPS/Watt) and active leakage power (mW) versus supply voltage (V), 65nm CMOS, 50°C. Annotations: 320mV, subthreshold region, 9.6X. Source: H. Kaul et al, 16.6: ISSCC08]

8 Revise DRAM Architecture
Energy cost today: ~150 pJ/bit

Traditional DRAM (Page, RAS, CAS)
Activates many pages
Lots of reads and writes (refresh)
Small amount of read data is used
Requires small number of pins

New DRAM architecture (Page, Addr)
Activates few pages
Read and write (refresh) what is needed
All read data is used
Requires large number of I/Os (3D)

9 Data Locality
Core-to-core communication on the chip: ~10 pJ per Byte
Chip-to-chip communication: ~100 pJ per Byte
Chip-to-memory communication: ~1.5 nJ per Byte, ~150 pJ per Byte
Data movement is expensive; keep it local: (1) Core to core, (2) Chip-to-chip, (3) Memory
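A small sketch makes the locality argument concrete by comparing the energy of moving the same data over each path. It uses the per-byte costs from the slide and takes the 1.5 nJ figure for the chip-to-memory path; the dictionary and function names are hypothetical.

```python
PJ = 1e-12  # one picojoule in joules

# Energy per byte moved, in joules, taken from the slide.
COST_PER_BYTE = {
    "core-to-core": 10 * PJ,
    "chip-to-chip": 100 * PJ,
    "chip-to-memory": 1500 * PJ,  # the ~1.5 nJ per Byte figure
}

def transfer_energy(num_bytes, path):
    """Joules needed to move num_bytes over the given path."""
    return num_bytes * COST_PER_BYTE[path]

def relative_cost(path, baseline="core-to-core"):
    """How many times more expensive a path is than the baseline."""
    return COST_PER_BYTE[path] / COST_PER_BYTE[baseline]

for path in COST_PER_BYTE:
    joules = transfer_energy(1e9, path)  # moving 1 GB
    print(f"{path:>14}: {joules:.2f} J per GB, {relative_cost(path):.0f}x")
```

With these numbers, a byte fetched from memory costs on the order of a hundred times more than a byte exchanged between cores on the same chip.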

10 Disruptive Approach to Faults
We tend to assume that execution faults (soft errors, hard errors) are rare. And it’s a valid speculation. Currently.
Soon, we will need much more paranoia in hardware designs.

11 Road to Unreliability?

From Peta to Exa: 1,000X parallelism
Reliability issues:
More hardware for something to go wrong
>1,000X intermittent faults due to soft errors

From Peta to Exa: Aggressive Vcc scaling to reduce power/energy
Reliability issues:
Gradual faults due to increased variations
More susceptible to Vcc droops (noise)
More susceptible to dynamic temp variations
Exacerbates intermittent faults (soft errors)

From Peta to Exa: Deeply scaled technologies
Reliability issues:
Aging related faults
Lack of burn-in?
Variability increases dramatically

Resiliency will be the cornerstone

12 Resiliency

Faults and examples
Permanent faults: Stuck-at 0 & 1
Gradual faults: Variability, Temperature
Intermittent faults: Soft errors, Voltage droops
Aging faults: Degradation

Faults cause errors (data & control)
Datapath errors: Detected by parity/ECC
Silent data corruption: Need HW hooks
Control errors: Control lost (Blue screen)

Minimal overhead for resiliency
Layers: Circuit & Design; Microarchitecture; Microcode, Platform; Programming system; Applications; System Software
Functions: Error detection; Fault isolation; Fault confinement; Reconfiguration; Recovery & Adapt

13 Execution Model and Codelets
Diagram labels: Programming Models/Systems (Rich); Hardware Abstraction; Cores; Peripherals/Devices; Advanced Hardware Monitoring; Net; Run Time System; Sea of Codelets
Codelet: code that can be executed non-preemptively with an “event-driven” model
Shared memory model based on LC (Location Consistency, a generalized single-assignment model [GaoSarkar1980])
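The codelet idea can be illustrated with a toy scheduler: each codelet waits for a fixed number of input events, becomes ready when the last one arrives, and then runs to completion without being preempted. This is a single-threaded sketch under assumptions of my own, not the Runnemede runtime. The `Codelet` class and `run` function are hypothetical, and the Location Consistency memory model is not modeled.

```python
from collections import deque

class Codelet:
    """Hypothetical codelet: fires once all of its input events have arrived."""
    def __init__(self, name, fn, num_inputs):
        self.name = name
        self.fn = fn
        self.pending = num_inputs  # input events still awaited
        self.inputs = []           # values received, in arrival order
        self.consumers = []        # codelets signaled with our result

def run(codelets, log):
    """Event-driven loop: pop a ready codelet and run it to completion."""
    ready = deque(c for c in codelets if c.pending == 0)
    while ready:
        codelet = ready.popleft()
        result = codelet.fn(*codelet.inputs)  # never preempted mid-run
        log.append((codelet.name, result))
        for consumer in codelet.consumers:
            consumer.inputs.append(result)    # completion acts as the event
            consumer.pending -= 1
            if consumer.pending == 0:
                ready.append(consumer)

# Usage: two source codelets feed one that adds their results.
a = Codelet("a", lambda: 2, 0)
b = Codelet("b", lambda: 3, 0)
total = Codelet("sum", lambda x, y: x + y, 2)
a.consumers.append(total)
b.consumers.append(total)

log = []
run([total, a, b], log)
print(log)
```

The `sum` codelet is listed first but only runs after both of its inputs have been produced, which is the dependency-driven firing that the slide's “event-driven” wording points at.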

14 Summary
Voltage scaling to reduce power and energy
Explodes parallelism
Cost of communication vs computation: critical balance
Resiliency to combat side-effects and unreliability
Programming system for extreme parallelism
Application driven, HW/SW co-design approach
Self-awareness & execution model to harmonize

