CHEP 2012: Computing in High Energy and Nuclear Physics
Forrest Norrod, Vice President and General Manager, Servers

Advancing human progress: basic knowledge advances technological innovations

"One might ask … whether basic circuits in computers might have been found by people who wanted to build computers. As it happens, they were discovered in the thirties by physicists dealing with the counting of nuclear particles because they were interested in nuclear physics. Or whether, in an urge to provide better communication, one might have found electromagnetic waves. They weren't found that way. They were found by Hertz who emphasized the beauty of physics and who based his work on the theoretical considerations of Maxwell."
(C.H. Llewellyn Smith, former Director-General of CERN)

Physicists' needs are outpacing engineers' delivery
- The 2007 goal was to achieve exascale by 2015
- So far … off by 100x
- Major challenges:
  - Power
  - Reliability/resiliency
  - Concurrency & locality (drives efficiency)
  - COST

Performance Trajectories
(chart; source: ExaScale Computing Study: Technology Challenges in Achieving Exascale Systems)

What Levers Do We Have?
(chart: hardware performance vs. what you can really get)
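The knobs on the following slides multiply together. A rough back-of-the-envelope model of "what you can really get" (a sketch with illustrative numbers, not figures from the talk) looks like this:

```python
# Back-of-the-envelope model of sustained system performance.
# All inputs below are illustrative assumptions, not figures from the deck.

def sustained_flops(sockets, cores_per_socket, clock_hz,
                    flops_per_core_cycle, efficiency):
    """Peak FLOP/s = sockets * cores * clock * FLOPs/cycle;
    sustained = peak * efficiency (the deck cites 0.3-0.9)."""
    peak = sockets * cores_per_socket * clock_hz * flops_per_core_cycle
    return peak * efficiency

# Hypothetical 2012-era cluster: 10,000 dual-socket nodes,
# 8 cores/socket, 2.5 GHz, 8 FLOPs/cycle (AVX-class), 50% efficiency.
print(f"{sustained_flops(20_000, 8, 2.5e9, 8, 0.5):.3e} FLOP/s")
# -> 1.600e+15 FLOP/s, i.e. ~1.6 PFLOP/s sustained, still ~600x short
# of an exaflop: hence the knob-by-knob discussion that follows.
```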

Architecture Knob #1: Frequency
- Frequency stagnant for a decade; unlikely to change anytime soon
- Thermal density challenges
- Macro power & cooling concerns
- No longer a main focus of the IC industry
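The thermal-density point follows from the standard CMOS dynamic-power relation (textbook physics, not taken from the deck):

```latex
% Dynamic switching power of CMOS logic (standard relation):
P_{\mathrm{dyn}} \approx \alpha \, C \, V^{2} f
% With supply-voltage scaling largely exhausted, V must rise roughly
% with f, so power grows approximately as f^3: small frequency gains
% cost disproportionate power and heat density.
```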

Architecture Knob #2: Cores per Socket
- Moore's law is fine … 130nm -> 22nm gives LOTS of transistors
- But cores/socket growth in standard CPUs is tapering off
- Physical "uncore" problem: very costly to feed the powerful beast
  - Mismatch with DRAM speeds
  - Huge growth in CPU socket size, pins, memory (500 to 2000 pins in the last 10 years; pin beasts on the horizon)

Architecture Knob #3: Number of Sockets
- Heaviest recent growth, despite accelerators
- "Easiest" knob to turn
- But massive implications:
  - Network performance
  - Resiliency & stability
  - Density
  - Power/cooling costs

Architecture Knob #5: Efficiency
- Large compute networks generally mean lower system efficiency, particularly as more CPUs & infrastructure are added
- Efficiency driven by locality, storage, network, CPU, … and code quality
- Typical values range from 0.3 to 0.9
(chart: efficiency falling as sockets increase)
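One standard way to see why efficiency falls as sockets are added is Amdahl's law (an illustrative model layered on the slide's claim; the 5% serial fraction is an assumption):

```python
# Amdahl's law: speedup S(n) = 1 / (s + (1 - s) / n) for serial fraction s.
# Parallel efficiency is S(n) / n. The 5% serial fraction is illustrative.
def efficiency(n_sockets, serial_fraction=0.05):
    speedup = 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_sockets)
    return speedup / n_sockets

for n in (10, 100, 1000):
    print(f"{n:>5} sockets -> parallel efficiency {efficiency(n):.2f}")
# -> 0.69 at 10 sockets, 0.17 at 100, 0.02 at 1000: even a small serial
# fraction erodes efficiency before network and storage overheads are
# counted; codes with tiny serial fractions and good locality are what
# stay inside the slide's 0.3-0.9 range.
```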

Architecture Knob #4: Instructions per Core per Clock
- IPC & FLOPS growth continues
  - Microarchitecture: replication of pipeline subunits, # of FPUs, FMA, BLAS libraries, others
  - ISA: AVX, SSEx, FPU vector width
  - Direct FPGA implementation of algorithms
- Challenge: continual modification of software to take advantage of new instructions, new features, etc.
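One low-effort way for application code to ride this knob is to reach the vector units through tuned libraries rather than hand-written loops (a minimal sketch; absolute timings will vary by machine):

```python
# Minimal sketch: the same dot product in pure Python vs. NumPy, which
# dispatches to a BLAS that exploits SIMD (SSE/AVX) and FMA units.
import time
import numpy as np

n = 10_000_000
a = np.random.rand(n)
b = np.random.rand(n)

t0 = time.perf_counter()
s_loop = sum(x * y for x, y in zip(a, b))   # scalar, interpreter-bound
t1 = time.perf_counter()
s_blas = np.dot(a, b)                        # vectorized BLAS kernel
t2 = time.perf_counter()

print(f"loop: {t1 - t0:.2f}s  blas: {t2 - t1:.4f}s")
# The library picks up new ISA extensions (SSEx, AVX, FMA) with a
# rebuild, sparing application code the continual-rewrite problem
# the slide points out.
```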

So what can we do to make this possible and affordable?

Driving IPC
Today
- Hybrid CPU:GPU (2008: NVIDIA GPU; 2012/13: AMD APU & Intel MIC)
Tomorrow
- Accelerator hybrids: FPGA, SoC …
- FPGAs: performance and perf/watt advantages; >4k DSP cores in 28nm; system-on-chip implementations
And open programming
- OpenCL: eases the programming model
- Enablement on top of CPUs, GPUs, FPGAs, hiding the complexity of silicon design
- Coupled with FPGAs, could change the computational landscape
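As a taste of the OpenCL model the slide points to, here is a minimal vector add using the pyopencl bindings (a sketch; it assumes pyopencl and an OpenCL runtime are installed, and the same kernel source can in principle target CPUs, GPUs, or FPGA toolchains):

```python
# Minimal OpenCL vector add via pyopencl: one kernel source, portable
# across CPU, GPU, and (with vendor toolchains) FPGA targets.
import numpy as np
import pyopencl as cl

a = np.random.rand(1024).astype(np.float32)
b = np.random.rand(1024).astype(np.float32)

ctx = cl.create_some_context()          # pick any available device
queue = cl.CommandQueue(ctx)
mf = cl.mem_flags
a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

prg = cl.Program(ctx, """
__kernel void add(__global const float *a,
                  __global const float *b,
                  __global float *out) {
    int i = get_global_id(0);
    out[i] = a[i] + b[i];
}
""").build()

prg.add(queue, a.shape, None, a_buf, b_buf, out_buf)
out = np.empty_like(a)
cl.enqueue_copy(queue, out, out_buf)
assert np.allclose(out, a + b)
```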

Scaling Sockets
- Disruptive integrated designs; innovative packaging
- Shared infrastructure evolving: hybrid nodes, concurrent serviceability, highest efficiency for power and cooling
- Extending design to the facility: modularized compute/storage/facility optimization
  - 2000 nodes, 30 PB storage, 600 kW in 22 m²
  - Ultimate in power efficiency: PUE as low as 1.02
Tomorrow
- New CPU core contender: ARM
  - Potential to disrupt the perf/$$ and perf/watt model
  - Standard ISA and open ecosystem
  - ARM + native-acceleration SoCs possible
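PUE (power usage effectiveness) is total facility power divided by IT equipment power, so a quick check of what PUE 1.02 means for the 600 kW example (a sketch assuming the slide's 600 kW is the total facility draw; the split below is derived arithmetic, not a slide figure):

```python
# PUE = total facility power / IT equipment power.
pue = 1.02
total_kw = 600.0          # assumed: slide's 600 kW as total facility draw
it_kw = total_kw / pue    # power that actually reaches the IT gear
overhead_kw = total_kw - it_kw
print(f"IT load: {it_kw:.0f} kW, cooling/power overhead: {overhead_kw:.0f} kW")
# -> ~588 kW of compute and only ~12 kW of overhead; a then-typical
# PUE near 1.5 would spend ~290 kW on overhead for the same IT load.
```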

Efficient Coordination
Reinventing the memory hierarchy
- Memory access is a principal driver of "stalls"
- CPU/DRAM stacking
- Flash as a new tier (halfway between DRAM and disk)
- Memory virtualization coming soon …
(diagram: CPU regs -> DRAM -> Flash -> Disk)
Redefining networking
- Flatten & unlock network topologies
- Software defined (OpenFlow); flat L2 networks; East-West optimization
- 10G -> 40G -> 100G
Accelerating collaborative usage and models
- Transparent infrastructure stacks
- Leverage "cloud computing" investments; cloud usage for low-duty-cycle workloads
- OpenFlow, OpenStack, Hadoop, … open collaboration
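The pull of flash as a middle tier shows up in a simple average-access-time model (the latencies are rough order-of-magnitude assumptions for the sketch, not numbers from the talk):

```python
# Average access time across tiers: each level's hit fraction times its
# latency. Latencies are order-of-magnitude assumptions in nanoseconds.
def avg_access_ns(tiers):
    """tiers: list of (hit_fraction, latency_ns); fractions sum to 1."""
    return sum(frac * lat for frac, lat in tiers)

dram_disk = [(0.99, 100), (0.01, 5_000_000)]            # DRAM, then disk
dram_flash_disk = [(0.99, 100), (0.009, 50_000),        # flash absorbs
                   (0.001, 5_000_000)]                   # most DRAM misses

print(f"DRAM+disk:       {avg_access_ns(dram_disk):>9.0f} ns")
print(f"DRAM+flash+disk: {avg_access_ns(dram_flash_disk):>9.0f} ns")
# -> ~50,099 ns vs ~5,549 ns: inserting a flash tier between DRAM and
# disk cuts the average miss penalty by nearly an order of magnitude.
```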

We'll get there … but not in 2015.