© Supermicro 2013 The New Era of Coprocessor in Supercomputing (The New Era of Coprocessor Applications in Parallel Computing) - BAH! Oil & Gas - Rio de Janeiro, Brazil. Marc XAB, M.A., J. F. Oberlin University Graduate School.

1 © Supermicro 2013 The New Era of Coprocessor in Supercomputing (The New Era of Coprocessor Applications in Parallel Computing). 5/07/2013 @ BAH! Oil & Gas, Rio de Janeiro, Brazil. Marc XAB, M.A. (J. F. Oberlin University Graduate School), Country Manager, Super Micro Computer Inc., Rua Funchal, 418, São Paulo, SP

2 Networking in Rio

3 Company Overview
- Revenues: FY10 $721M → FY11 $942M → FY12 $1B
- Global footprint: >70 countries, 700 customers, 6,800 SKUs
- Production: facilities in the US, EU and Asia
- Engineering: 70% of workforce in engineering; SSI member
- Market share: #1 in the server channel
- Corporate focus: leader in energy-efficient, HPC and application-optimized systems
- Recognition: Fortune 2012 100 Fastest-Growing Companies
- Sites pictured: San Jose (headquarters), Fremont facility

4 COPROCESSOR
A coprocessor is a computer processor used to supplement the functions of the primary processor (the CPU). Operations performed by the coprocessor may include floating-point arithmetic, graphics, signal processing, string processing, encryption, or I/O interfacing with peripheral devices.
- Math coprocessor: a computer chip that handles the floating-point operations and mathematical computations in a computer.
- Graphics Processing Unit (GPU): a separate card that handles graphics rendering and can improve performance in graphics-intensive applications, such as games.
- Secure cryptoprocessor: a dedicated computer on a chip or microprocessor for carrying out cryptographic operations, embedded in packaging with multiple physical security measures that give it a degree of tamper resistance.
- Network coprocessor
- ...

5 The Trend Indicated on Green500
Rank | MFLOPS/W | Site | Computer | Total Power (kW)
1 | 2,499.44 | National Institute for Computational Sciences/University of Tennessee | Beacon - Appro GreenBlade GB824M, Xeon E5-2670 8C 2.600GHz, Infiniband FDR, Intel Xeon Phi 5110P | 44.89
2 | 2,351.10 | King Abdulaziz City for Science and Technology | SANAM - Adtech ESC4000/FDR G2, Xeon E5-2650 8C 2.000GHz, Infiniband FDR, AMD FirePro S10000 | 179.15
3 | 2,142.77 | DOE/SC/Oak Ridge National Laboratory | Titan - Cray XK7, Opteron 6274 16C 2.200GHz, Cray Gemini interconnect, NVIDIA K20x | 8,209.00
4 | 2,121.71 | Swiss Scientific Computing Center (CSCS) | Todi - Cray XK7, Opteron 6272 16C 2.100GHz, Cray Gemini interconnect, NVIDIA Tesla K20 Kepler | 129.00
5 | 2,102.12 | Forschungszentrum Juelich (FZJ) | JUQUEEN - BlueGene/Q, Power BQC 16C 1.600GHz, Custom Interconnect | 1,970.00
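The Green500 metric is sustained performance divided by power, so multiplying a listed MFLOPS/W figure by the listed total power recovers the implied sustained performance. A minimal Python sketch using two rows from the slide; the function name `implied_tflops` is illustrative and not from the presentation:

```python
def implied_tflops(mflops_per_watt: float, power_kw: float) -> float:
    """Implied sustained performance in TFLOPS from the two Green500 columns."""
    watts = power_kw * 1_000.0
    mflops = mflops_per_watt * watts
    return mflops / 1_000_000.0  # 1 TFLOPS = 1e6 MFLOPS


# (efficiency in MFLOPS/W, total power in kW), copied from the slide
systems = {
    "Beacon": (2499.44, 44.89),
    "Titan": (2142.77, 8209.00),
}

for name, (efficiency, power_kw) in systems.items():
    print(f"{name}: {implied_tflops(efficiency, power_kw):,.1f} TFLOPS")
```

The two figures come out at roughly 112 TFLOPS for Beacon and roughly 17.6 PFLOPS for Titan, which shows why an efficiency ranking can differ sharply from a raw-speed ranking.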

6 Case Study – Submerged Liquid Cooling
"Submerged Supermicro Servers Accelerated by GPUs"
- Supermicro 1U (single CPU) with two coprocessors
- No requirement for room-level cooling
- Operates at PUE ~1.12
- 25 kW per rack is the breakpoint between regular air cooling and submerged cooling
Server modifications: removed fans and heat sinks; use SSD and updated BIOS; reverse the handlers
[Chart: cost efficiency of air cooling vs. submerged liquid cooling by kW per rack, crossover at ~25 kW]
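PUE (power usage effectiveness) is total facility power divided by IT equipment power, so the two numbers on this slide together imply the cooling and distribution overhead for a rack at the breakpoint load. A small Python sketch of that arithmetic; the function name is hypothetical:

```python
def facility_power_kw(it_load_kw: float, pue: float) -> float:
    """Total facility power for a given IT load, using PUE = facility / IT."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return it_load_kw * pue


it_load = 25.0  # kW per rack, from the slide
pue = 1.12      # from the slide

total = facility_power_kw(it_load, pue)
overhead = total - it_load
print(f"total: {total:.1f} kW, overhead: {overhead:.1f} kW")
```

At these values the overhead works out to about 3 kW on top of the 25 kW IT load.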

7 Tesla: 2-3x Faster Every 2 Years
[Chart: DP GFLOPS per watt (axis 2 to 16) against year (2008, 2010, 2012, 2014) for the T10, Fermi, Kepler and Maxwell generations; callouts: "512 cores", "Thousands of cores"]
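As a rough illustration of what "2-3x every 2 years" compounds to over the 2008 to 2014 span shown on the chart, the following Python sketch applies the low and high ends of the claim. It works only from the slide's headline claim, not from measured values, and the function name is hypothetical:

```python
def compounded_gain(factor_per_step: float, start_year: int, end_year: int,
                    step_years: int = 2) -> float:
    """Total improvement if efficiency grows by factor_per_step every step_years."""
    steps = (end_year - start_year) // step_years
    return factor_per_step ** steps


low = compounded_gain(2.0, 2008, 2014)   # three two-year steps at 2x
high = compounded_gain(3.0, 2008, 2014)  # three two-year steps at 3x
print(f"2008 to 2014: between {low:.0f}x and {high:.0f}x")
```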

8 GPU Supercomputer Momentum
[Chart: number of GPU-accelerated systems on the Top500, 2008 to 2013; annotations: "First Double Precision GPU", "Tesla Fermi Launched", "June 2012 Top500", "52", "4x"]

9 Case Study – PNNL
- Expects the supercomputer to rank among the world's top 20 fastest machines.
- Research areas: climate and environmental science, chemical processes, biology-based fuels that can replace fossil fuels, new materials for energy applications, etc.
System: Supermicro FatTwin™ with 2x MIC 5110P per node

10 Case Study – PNNL
- Theoretical peak processing speed of 3.4 petaflops
- 42 racks / 195,840 cores
- 1,440 compute nodes with conventional processors and Intel Xeon Phi "MIC" accelerators
- 128 GB memory per node
- FDR Infiniband network
- 2.7 petabyte shared parallel file system (60 gigabytes per second read/write)
System: Supermicro FatTwin™ with 2x MIC 5110P per node
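The system-level figures above can be broken down per node with simple division. A short Python sketch using only the numbers on the slide; the variable names are illustrative:

```python
peak_pflops = 3.4       # theoretical peak, from the slide
nodes = 1440            # compute nodes, from the slide
cores = 195_840         # total cores, from the slide
mem_per_node_gb = 128   # from the slide

cores_per_node = cores // nodes
peak_per_node_tflops = peak_pflops * 1000 / nodes
total_memory_tb = nodes * mem_per_node_gb / 1000  # decimal terabytes

print(f"cores per node: {cores_per_node}")
print(f"peak per node:  {peak_per_node_tflops:.2f} TFLOPS")
print(f"total memory:   {total_memory_tb:.2f} TB")
```

The result of 136 cores per node would be consistent with, for example, two 60-core Xeon Phi 5110P cards plus 16 host CPU cores, although the slide does not give that breakdown.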

11 Programming Paradigm
- Xeon Phi: the programming model and its optimizations are shared with Intel Xeon processors.
- CUDA (Compute Unified Device Architecture): a parallel computing platform and programming model. CUDA gives developers access to the virtual instruction set and memory of the parallel computational elements in CUDA GPUs.
Made easier: don't complicate it.
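To give a feel for the CUDA model mentioned above, the sketch below emulates in plain Python how a kernel body is run once per thread and how each thread derives the element it owns from its block and thread indices. This is a serial emulation for illustration only, not CUDA code; the `launch` helper and the SAXPY example are assumptions introduced here:

```python
def saxpy_kernel(block_idx, block_dim, thread_idx, n, a, x, y):
    """Body that one thread would run: y[i] = a * x[i] + y[i]."""
    i = block_idx * block_dim + thread_idx  # global index, as in CUDA kernels
    if i < n:                               # guard threads past the end
        y[i] = a * x[i] + y[i]


def launch(kernel, grid_dim, block_dim, *args):
    """Serial stand-in for a kernel launch: visit every (block, thread) pair."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


n = 10
x = [float(i) for i in range(n)]
y = [1.0] * n
block_dim = 4
grid_dim = (n + block_dim - 1) // block_dim  # enough blocks to cover n elements

launch(saxpy_kernel, grid_dim, block_dim, n, 2.0, x, y)
print(y)
```

On a GPU the two loops in `launch` disappear: the hardware runs the kernel body for all index pairs in parallel, which is why the bounds guard inside the kernel matters.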

12 Keynotes
- This is a new era of hybrid computing: heterogeneous architecture with PCI-E based coprocessors.
- Specialized (or application-optimized) design is required for GPU/MIC applications and future HPC scalability.
- There is more to come in the industry roadmap, with new technologies, power management and system architecture.
- Configurable cooling and power for energy efficiency and performance are increasingly critical.
- The trend towards heterogeneous architecture poses many challenges for system builders and software developers in making efficient use of the resources.
- The programming paradigm, and the investment it requires, is an important part of the selection criteria.

13 HPC Coprocessor Applications
Massively parallel architecture accelerates scientific & engineering applications.
Applications listed: options pricing; risk analysis; algorithmic trading; medical imaging; visualization & docking; filmmaking & animation; computational fluid dynamics; materials science; molecular dynamics; quantum chemistry; mechanical design & simulation; structural mechanics; electronic design automation; data parallel mathematics; extend Excel with OLAP for planning & analysis; database and data analysis acceleration; weather; atmospheric; ocean modeling; space sciences; seismic imaging; seismic interpretation; reservoir modeling; seismic inversion.
Category labels on the slide: Computational Finance; Imaging and Computer Vision; Weather and Climate; Simulation & Creation; Design; Scientific; Oil and Gas/Seismic; Data Mining.

14 Hybrid Computing Pioneer
[Product timeline, 2008 to 2013]
- Phase labels: GPGPU ("Where it started…"), Mainstream, Density, Efficiency, Ultra High Efficiency
- Systems shown: Tesla S1070 (PCI-E x16) with 1U Twin™; "The most powerful PSC"; "The fastest 1U server in the world"; 1U 4-GPU standalone box; 2U GPU with QDR IB onboard; 2U Twin; 2U 4-GPU; 1U 3-GPU; 7U GPU blades (20 CPUs + 20 GPUs); workstation / 4U with 4 GPUs or MICs
- X9 systems: X9 (DP) 1U 4-GPU/MIC; X9 2U 6-GPU/MIC; X9 (UP) 1U 2-GPU/MIC
- FatTwin™: 2-node, 8 GPUs or MICs per node; 4-node, 3 GPUs or MICs per node
- NVIDIA Kepler and Intel Xeon Phi support hybrid computing

15 Communication Between Coprocessors
- Existing model: in current CPU-GPU heterogeneous architectures, GPU-to-GPU data travels via the CPU and an Infiniband (IB) Host Channel Adapter (HCA) and switch, or another proprietary interconnect.
- Implementation example: in a TCA cluster, the PEACH2 chip enables data transfer between cooperating GPUs in separate nodes.
[Figure: schematic of the PEARL network within a CPU/GPU cluster. Source: Tsukuba University]
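The difference between the two communication models can be pictured as the list of stages a transfer passes through between two GPUs in different nodes. The Python sketch below writes both paths out; the stage names and the assumption that the two nodes are directly linked are illustrative and not taken from the slide:

```python
# Conventional path: data is staged through the host and the IB fabric.
VIA_HOST_AND_IB = [
    "GPU (node A)",
    "CPU / host memory (node A)",
    "IB HCA (node A)",
    "IB switch",
    "IB HCA (node B)",
    "CPU / host memory (node B)",
    "GPU (node B)",
]

# TCA-style path: the PEACH2 chips link the nodes (assumed adjacent here).
VIA_PEACH2 = [
    "GPU (node A)",
    "PEACH2 (node A)",
    "PEACH2 (node B)",
    "GPU (node B)",
]


def intermediate_stages(path):
    """Number of stages between the source GPU and the destination GPU."""
    return len(path) - 2


for label, path in (("host + IB", VIA_HOST_AND_IB), ("PEACH2", VIA_PEACH2)):
    print(f"{label}: {intermediate_stages(path)} intermediate stages")
```

Fewer intermediate stages, and in particular no staging through host memory, is the motivation for the direct approach; the sketch says nothing about actual latency or bandwidth.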

16 Designing GPU/MIC Optimized Systems
- Performance: PCI-e lane arrangement, PCB placement, interconnect
- Mechanical design: mounting, location, space utilization
- Thermal: air flow, fan speed control, location, noise control
- Power support: PSU efficiency, wattage options, power management, number (and location) of power connectors
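One way to make the "PCI-e lane arrangement" consideration concrete is a lane budget: compare the lanes the CPUs provide with the lanes the coprocessor cards would use at full width. The Python sketch below assumes 40 lanes per CPU socket and an x16 link per card; both values are assumptions for illustration and do not come from the slides:

```python
def lanes_remaining(cpus: int, cards: int,
                    lanes_per_cpu: int = 40,    # assumed lanes per socket
                    lanes_per_card: int = 16):  # assumed full-width x16 link
    """Lanes left over after giving every card a full-width link.

    A negative result means the cards cannot all run at full width without
    a PCIe switch or narrower links.
    """
    return cpus * lanes_per_cpu - cards * lanes_per_card


# Hypothetical configurations: (CPU sockets, coprocessor cards)
for cpus, cards in ((1, 2), (2, 4), (2, 6)):
    spare = lanes_remaining(cpus, cards)
    print(f"{cpus} CPU(s), {cards} card(s): {spare:+d} lanes")
```

Whatever lanes remain also have to serve the interconnect adapter and storage, which is part of why the lane arrangement is listed as a performance concern.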

17 Summary
- Coprocessor and Applications
- Performance and Efficiency
- Top500 & Green500
- Hybrid Computing & HPC
- GPU/MIC Optimized Systems
- Design Considerations: Performance, Mechanical Design, Thermal & Cooling, Power Support

18 Thank You! Marc XAB

19 Conference Puzzle: How do you put an ELEPHANT in a refrigerator?

20 Conference Puzzle
