
1 Steve Pawlowski Intel Senior Fellow GM, Architecture and Planning CTO, Digital Enterprise Group Intel Corporation HPC: Energy Efficient Computing April 20, 2009

2 Legal Disclaimer

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL® PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. INTEL PRODUCTS ARE NOT INTENDED FOR USE IN MEDICAL, LIFE SAVING, OR LIFE SUSTAINING APPLICATIONS.

Intel may make changes to specifications and product descriptions at any time, without notice. All products, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice. Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.

This document may contain information on products in the design phase of development. The information here is subject to change without notice. Do not finalize a design with this information. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them.

Intel Corporation may have patents or pending patent applications, trademarks, copyrights, or other intellectual property rights that relate to the presented subject matter. The furnishing of documents and other materials and information does not provide any license, express or implied, by estoppel or otherwise, to any such patents, trademarks, copyrights, or other intellectual property rights.

Wireless connectivity and some features may require you to purchase additional software, services or external hardware.

Nehalem, Penryn, Westmere, Sandy Bridge and other code names featured are used internally within Intel to identify products that are in development and not yet publicly announced for release. Customers, licensees and other third parties are not authorized by Intel to use code names in advertising, promotion or marketing of any product or services, and any such use of Intel's internal code names is at the sole risk of the user.

Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance.

Intel, Intel Inside, Pentium, Xeon, Core and the Intel logo are trademarks of Intel Corporation in the United States and other countries. *Other names and brands may be claimed as the property of others. Copyright © 2009 Intel Corporation.

3 Reach Exascale by 2018: From GigaFlops to ExaFlops

Sustained GigaFlop: ~1987. Sustained TeraFlop: ~1997. Sustained PetaFlop: 2008. Sustained ExaFlop: ~2018.

"The pursuit of each milestone has led to important breakthroughs in science and engineering." Source: IDC, "In Pursuit of Petascale Computing: Initiatives Around the World," 2007.

Note: Numbers are based on the Linpack benchmark. Dates are approximate.

4 What are the Challenges? Power is Gating Every Part of Computing

[Chart: power consumption (kW, log scale) at each computing milestone - MFLOP (~1964), GFLOP (~1985), TFLOP (~1997), PFLOP (2008), and a projected EFLOP (2015-18): power? Voltage is not scaling as in the past. Source: Intel, for illustration and assumptions, not product representative.]

The Challenge of Exascale: an ExaFLOPS machine without power management
- Compute: ~400W / socket, ~200MW
- Memory: 0.1 B/FLOP @ 1.5 nJ per Byte, ~150MW
- Comm: 100 pJ per FLOP, ~100MW
- Disk: 10EB @ 10TB/disk @ 10W, ~10MW
- Other misc. power consumption: power supply losses, cooling, etc.
Total: 500+ MW?
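
For orientation, here is a minimal back-of-the-envelope sketch (Python) that reproduces the unmanaged budget above from the per-operation and per-byte energies on the slide; the ~500K-socket count implied by ~200 MW at ~400 W per socket is derived, not stated on the slide.

```python
# Back-of-envelope power budget for an unmanaged ExaFLOPS machine,
# using the per-op / per-byte energies quoted on the slide.
EXAFLOPS = 1e18                              # operations per second

compute_mw = 200.0                           # slide figure (~400 W/socket -> ~500K sockets, derived)
memory_mw  = 0.1 * EXAFLOPS * 1.5e-9 / 1e6   # 0.1 B/FLOP at 1.5 nJ/Byte
comm_mw    = EXAFLOPS * 100e-12 / 1e6        # 100 pJ of communication per FLOP
disk_mw    = (10e18 / 10e12) * 10 / 1e6      # 10 EB at 10 TB/disk, 10 W/disk

total_mw = compute_mw + memory_mw + comm_mw + disk_mw
print(f"memory {memory_mw:.0f} MW, comm {comm_mw:.0f} MW, disk {disk_mw:.0f} MW")
print(f"subtotal {total_mw:.0f} MW (+ power supply losses, cooling -> 500+ MW)")
```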

5 HPC Platform Power

Data from P3 Jet Power Calculator, V2.0: DP 80W Nehalem, 48GB of memory (12 x 4GB DIMMs), single power supply unit @ 230Vac.

[Chart: breakdown of platform power across the CPU, planar & VRs, and memory.]

Need a platform view of power consumption: CPU, memory and VRs, etc.

6 Device Efficiency is Slowing

Unmanaged growth in power will reach the GigaWatt level at Exascale.

[Chart: relative performance and power, with GFlops as the base. Source: Intel Labs.]

Power at a glance (assuming CPUs draw 31% of system power):
- Today's Peta: 0.7-2 nJ/op
- Today's COTS: 2 nJ/op (assuming 100W / 50 GFlops)
- Unmanaged Exa: if the system draws 1 GW, 0.31 nJ/op
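
A quick sketch of how those energy-per-operation figures fall out of the quoted power and throughput numbers (the 31% CPU share is the assumption stated on the slide):

```python
# Energy per operation implied by the slide's power / throughput figures.
cots_j_per_op = 100 / 50e9                 # 100 W at 50 GFlops -> 2e-9 J = 2 nJ/op
cpu_share = 0.31                           # slide's assumption: CPUs are 31% of system power
exa_j_per_op = (1e9 * cpu_share) / 1e18    # 1 GW system at ExaFLOPS -> 0.31 nJ/op for compute
print(f"COTS: {cots_j_per_op*1e9:.2f} nJ/op, unmanaged Exa: {exa_j_per_op*1e9:.2f} nJ/op")
```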

7 To Reach ExaFlops

[Chart: Flops per socket over time (log scale, 1E+06 to 1E+15), 1985-2020, from the 386 and 486 through the Pentium®, Pentium® II, Pentium® III and Pentium® 4 architectures to the Intel® Core™ uArch, crossing Giga and Tera on the way to Peta. Future projection; source: Intel.]

What it takes to get to Exa: 40+ TFlops per socket, with a power goal of 200W / socket. To reach Linpack ExaFlops:
- 5 pJ/op/socket at 40 TFlops per socket -> ~25K sockets at peak, or ~33K sustained, or
- 10 pJ/op/socket at 20 TFlops per socket -> ~50K sockets at peak (conservative)

Intel estimates of future trends. Intel estimates are based in part on historical capability of Intel products and projections for capability improvement. Actual capability of Intel products will vary based on actual product configurations.
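
As a sanity check on the socket counts above, a minimal sketch of the arithmetic; the 75% sustained-to-peak efficiency used to get from 25K to ~33K sockets is an inferred assumption, not a figure stated on the slide.

```python
# Socket-count arithmetic behind the "to reach Linpack ExaFlops" bullets.
EXAFLOPS = 1e18

def sockets_needed(pj_per_op, tflops_per_socket, sustained_eff=1.0):
    watts_per_socket = pj_per_op * 1e-12 * tflops_per_socket * 1e12
    sockets = EXAFLOPS / (tflops_per_socket * 1e12 * sustained_eff)
    return watts_per_socket, sockets

print(sockets_needed(5, 40))            # 200 W/socket, 25,000 sockets at peak
print(sockets_needed(5, 40, 0.75))      # ~33,333 sockets sustained (assumed 75% efficiency)
print(sockets_needed(10, 20))           # 200 W/socket, 50,000 sockets at peak
```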

8 Parallelism for Energy Efficient Performance

[Chart: relative performance (log scale, 0.01 to 10,000,000), 1970-2020, future projection. The era of pipelined architecture (8086, 286, 386, 486) gives way to the era of instruction-level parallelism (superscalar; speculative, OOO), then to the era of thread- and processor-level parallelism (multi-threaded, multi-core, many-core). Source: Intel Labs.]

Intel estimates of future trends. Intel estimates are based in part on historical capability of Intel products and projections for capability improvement. Actual capability of Intel products will vary based on actual product configurations.

9 Parallelism's Challenges

- Current models are based on communication between sequential processes (e.g. MPI, SHMEM, etc.).
- They depend on check-pointing for resilience.

[Chart: from Terascale to Petascale to Exascale, the mean time between component failures shrinks while the mean time for a global checkpoint grows; communication-based systems break down beyond the crossover point.]

We need new, fault-resilient programming models, so computations make progress even as components fail. Source: Intel
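
To make the crossover concrete, here is a hedged sketch; the component MTBF, checkpoint time, and component counts below are illustrative assumptions, not figures from the slide. It shows how system-level MTBF shrinks with component count until it drops below the time needed to take a global checkpoint.

```python
# Illustrative crossover between system MTBF and global checkpoint time.
# All numbers below are assumptions for illustration only.
component_mtbf_hours = 100_000        # assumed MTBF of one component (socket, DIMM, ...)
checkpoint_minutes = 30               # assumed time to write a global checkpoint

for n_components in (10_000, 100_000, 1_000_000):
    system_mtbf_minutes = component_mtbf_hours * 60 / n_components
    ok = system_mtbf_minutes > checkpoint_minutes
    print(f"{n_components:>9} components: system MTBF ~{system_mtbf_minutes:.0f} min "
          f"-> checkpointing {'still works' if ok else 'breaks down'}")
```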

10 Software Scaling Performance Forward

Existing software: message passing programming model, resiliency issues. Exa concurrency requires a new hierarchical structure:
- A concurrency primitives framework: specifying, assigning, executing, migrating, and debugging a hierarchy of units of computation; providing a unified foundation.
- A high-level declarative coordination language: orchestrate billions of tasks written in existing serial languages; manage resiliency; fully utilize hardware capabilities.

[Diagram: a software stack relating the application, the high-level declarative coordination language, the concurrency primitives framework, today's parallel framework, and a virtual, abstract machine.]
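
Purely as an illustration of the "hierarchy of units of computation" idea, and not of any framework named in the presentation, here is a toy sketch in which a coordination layer fans work out over serial task functions and re-runs the units that fail (all names below are made up for the example):

```python
# Toy sketch of a coordination layer over serial tasks (illustrative only).
import random

def serial_kernel(chunk):                  # stand-in for existing serial-language code
    if random.random() < 0.1:              # simulate a failed component
        raise RuntimeError("component failure")
    return sum(chunk)

def coordinate(chunks, max_retries=3):
    """Run every chunk, re-executing failed units instead of restarting the whole job."""
    results = {}
    for i, chunk in enumerate(chunks):
        for attempt in range(max_retries):
            try:
                results[i] = serial_kernel(chunk)
                break
            except RuntimeError:
                continue                   # migrate/retry just this unit of computation
    return results

print(coordinate([range(0, 10), range(10, 20), range(20, 30)]))
```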

11 Reduce Memory and Communication Power

Data movement is expensive:
- Core-to-core: ~10 pJ per Byte
- Chip to chip: ~16 pJ per Byte
- Chip to memory: ~150 pJ per Byte
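
To see why this matters at scale, a small sketch converting those per-byte energies into watts per unit of bandwidth (the 100 GB/s traffic level is an assumption chosen only for illustration):

```python
# Power cost of data movement at the quoted energies per byte.
ENERGY_PJ_PER_BYTE = {"core-to-core": 10, "chip-to-chip": 16, "chip-to-memory": 150}

example_gb_per_s = 100          # assumed traffic level, for illustration only
for link, pj in ENERGY_PJ_PER_BYTE.items():
    watts = pj * 1e-12 * example_gb_per_s * 1e9
    print(f"{link:>15}: {pj:>3} pJ/B  ->  {watts:5.1f} W at {example_gb_per_s} GB/s")
```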

12 Technologies to Increase Bandwidth

[Chart: bandwidth projections, 0-200 GB/s per socket, 2008-2017. Traditional CPU bandwidth demand vs. the HE-WS/HPC bandwidth trend, assuming DDR3, then DDR4 with increasing channels, then eDRAM at a 2X-per-3-years CAGR. Source: Intel forecast.]

eDRAM: replace the on-package memory controller with very fast flex links to an on-board memory controller. [Diagram: memory package, CPU, memory controller + buffer.]

Intel estimates of future trends in bandwidth capability. Intel estimates are based in part on historical bandwidth capability of Intel products and projections for bandwidth capability improvement. Actual bandwidth capability of Intel products will vary based on actual product configurations.
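
A small sketch of what a 2X-per-3-years CAGR implies for per-socket bandwidth; the ~25 GB/s starting point for 2008 is an assumed base for illustration, not a value read off the chart.

```python
# Projection of per-socket bandwidth under a 2X-per-3-years CAGR.
base_year, base_gb_s = 2008, 25        # assumed 2008 starting point, for illustration
for year in range(2008, 2018):
    gb_s = base_gb_s * 2 ** ((year - base_year) / 3)
    print(year, f"{gb_s:6.1f} GB/s per socket")
# Doubling every 3 years is ~26% per year: 2**(1/3) is about 1.26.
```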

13 Power Efficient High I/O Interconnect Bandwidth

[Chart: HPC interconnect requirement progression (log scale, 10-10000) vs. the COTS interconnect trend, annotated at 3.5X, 8X, 40X and 75X.]

Requirement progression and power targets:
- 2012: 50 GB/s; MPI: 30M msgs/s, SHMEM: 300M msgs/s; power target <20 mW/Gb/s
- 2014: 200 GB/s; MPI: 75M msgs/s, SHMEM: 1G msgs/s; power target 10 mW/Gb/s
- 2016: 1 TB/s; MPI: 325M msgs/s, SHMEM: 5G msgs/s; power target 3 mW/Gb/s
- 2018-2019 (Exa): 4 TB/s; MPI: 1.25G msgs/s, SHMEM: 20G msgs/s; power target 1 mW/Gb/s
Copper and/or silicon photonics. MPI: Message Passing Interface; SHMEM: Shared Memory. Source: Intel.

Intel estimates of future trends in bandwidth capability. Intel estimates are based in part on historical bandwidth capability of Intel products and projections for bandwidth capability improvement. Actual bandwidth capability of Intel products will vary based on actual product configurations.
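
A quick sketch of what those bandwidths and power targets imply for interconnect power per node, assuming the full bandwidth is driven at the target efficiency (the pairing of each bandwidth with its power target follows the list above):

```python
# Interconnect power per node at each bandwidth / power-target pair above.
targets = [                      # (label, GB/s, mW per Gb/s)
    ("2012",            50, 20),
    ("2014",           200, 10),
    ("2016",          1000,  3),
    ("2018-19 (Exa)", 4000,  1),
]
for label, gb_s, mw_per_gbps in targets:
    gbps = gb_s * 8                       # bytes/s -> bits/s
    watts = gbps * mw_per_gbps / 1000
    print(f"{label:>14}: {gb_s:>5} GB/s at {mw_per_gbps:>2} mW/Gb/s -> {watts:6.1f} W")
```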

14 Signaling Data Rate and Energy Efficiency

[Chart: signaling energy efficiency (pJ/bit, log scale, 0.1-10+) vs. data rate (Gb/s, 0-20+), with points for DDR3 and GDDR5 memory interfaces, Intel ISSCC '05 and Intel VLSI '06 transceiver work, a proposed copper near-term target of ~1.0 pJ/bit, and silicon photonics as the longer-term direction. Source: Intel Labs.]
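
One useful equivalence when comparing this chart with the previous slide's power targets: 1 pJ/bit is exactly 1 mW per Gb/s, so per-lane link power is just energy-per-bit times data rate. A minimal sketch; the example data rates are assumptions for illustration.

```python
# 1 pJ/bit == 1 mW per Gb/s, so lane power in mW = (pJ/bit) * (Gb/s).
def lane_power_mw(pj_per_bit, gbps):
    return pj_per_bit * 1e-12 * gbps * 1e9 * 1e3

print(lane_power_mw(10, 5))    # e.g. ~10 pJ/bit at 5 Gb/s  -> 50 mW per lane
print(lane_power_mw(1.0, 20))  # ~1 pJ/bit target at 20 Gb/s -> 20 mW per lane
```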

15 Solid State Drive Future Performance and Energy Efficiency

Assumption: SSD capacity grows at a CAGR of about 1.5; historical HDD at about 1.6.

[Chart: SSD gigabytes per drive, future projection 2008-2018, growing from roughly 50 GB toward ~5000 GB.]

Vision for 10 ExaBytes at 2018: 2 million SSDs vs. 1/2 million HDDs.
- If @ 1W each, the SSDs total 2MW.
- If HDD (300 IOPS) and SSD (10K IOPS) performance stay constant, the SSD configuration has ~140X the IOPS.
- Innovations to improve IO: 2X less power with a 140X performance gain.

Source: Intel, calculations based on today's vision.
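
A sketch of the drive-count, IOPS and power arithmetic behind that vision. The 5 TB-per-SSD figure is taken from the revised power slide later in the deck, and the HDD pool power is backed out of the slide's "2X less power" claim rather than from a stated per-HDD wattage.

```python
# 10 EB of storage in 2018: SSD vs. HDD back-of-envelope (per the slide's vision).
TOTAL_BYTES = 10e18
ssd_count = TOTAL_BYTES / 5e12        # ~2 million SSDs at 5 TB each
hdd_count = 500_000                   # slide: vs. 1/2 million HDDs
ssd_power_mw = ssd_count * 1.0 / 1e6  # 1 W per SSD -> ~2 MW
hdd_power_mw = 2 * ssd_power_mw       # slide claims "2X less power"; HDD wattage not stated

iops_ratio = (ssd_count * 10_000) / (hdd_count * 300)
print(f"SSD pool: {ssd_power_mw:.0f} MW, HDD pool (implied): {hdd_power_mw:.0f} MW")
print(f"aggregate IOPS advantage: ~{iops_ratio:.0f}X")   # ~133X, quoted as ~140X on the slide
```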

16 Increase Data Center Compute Density

Target: 50% yearly improvements in performance/watt.

[Chart: compute density vs. year, with stacked contributions from silicon process, data center innovation, power management, small form factor, and new technology.]

Source: Intel, based on Intel year-over-year improvement with the SPECpower benchmark.
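
Compounding is what makes the 50%-per-year target powerful; a one-line sketch of the cumulative gain over a decade:

```python
# Cumulative performance/watt gain from a 50% yearly improvement target.
for years in (1, 3, 5, 10):
    print(years, "years ->", f"{1.5 ** years:.1f}x performance per watt")
# 1.5x, 3.4x, 7.6x, ~57.7x over 10 years
```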

17 Revised Exascale System Power

ExaFLOPS machine without power management (source: Intel, for illustration and assumptions, not product representative):
- Compute: ~400W / socket, ~200MW
- Memory: 0.1 B/FLOP @ 1.5 nJ per Byte, ~150MW
- Comm: 100 pJ per FLOP, ~100MW
- Disk: 10EB @ 10TB/disk @ 10W, ~10MW
- Other misc. power consumption: power supply losses, cooling, etc.
Total: 500+ MW?

ExaFLOPS machine, future vision:
- Compute: 50K sockets @ ~200W each, ~10MW
- Memory: 0.1 B/FLOP @ 150 pJ per Byte, ~9MW
- Comm: 9 pJ per FLOP, ~9MW
- SSD: 10EB @ 5TB/SSD, ~2MW
- Other misc. power consumption (power supply losses, cooling, etc.): ~10MW
Total: ~40MW
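
Summing the two budgets as listed above (a minimal sketch; the unmanaged "misc" line is left open on the slide, so only the listed components are totaled there):

```python
# Totals for the two exascale power budgets listed above (values in MW).
unmanaged = {"compute": 200, "memory": 150, "comm": 100, "disk": 10}       # + misc -> 500+ MW
vision    = {"compute": 10, "memory": 9, "comm": 9, "ssd": 2, "misc": 10}  # -> ~40 MW

print("unmanaged subtotal:", sum(unmanaged.values()), "MW (before PSU losses, cooling, etc.)")
print("future-vision total:", sum(vision.values()), "MW")
print(f"reduction: ~{500 / sum(vision.values()):.1f}x")   # using the quoted 500+ MW figure
```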
