HPC User Forum Back End Compiler Panel
SiCortex Perspective
Kevin Harris, Compiler Manager
April 2009

Will compiler code generation techniques keep pace with the hardware transition from multi-core to many-core and hybrid systems, and at what speed?

Not quickly. The typical sequence of compiler, hardware & user response to a hardware innovation is:
– Hand-coded experimentation with the new primitives
– Directly specify the new primitives in source to simplify the development chain
– Exploit "normal" language constructs to target a subset of the new primitives' capability (these two stages are sketched below)
– Early attempts at optimizing feature usage
– Language extensions that let the user specify source attributes & choices to assist the compiler
– A new programming paradigm for radical programmer-productivity improvement
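A minimal sketch of two of these stages, using SIMD vectorization (an earlier hardware innovation that followed the same adoption path) as the example; the function names are illustrative, not from the SiCortex toolchain.

#include <xmmintrin.h>

/* Stage "directly specify new primitives in source": SSE intrinsics. */
void add4(float *c, const float *a, const float *b)
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(c, _mm_add_ps(va, vb));
}

/* Stage "exploit normal language constructs": the same computation as a
 * plain loop, relying on the compiler's auto-vectorizer to reach the same
 * SIMD primitives. */
void add_n(float *c, const float *a, const float *b, int n)
{
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}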

What information do you need from a compiler intermediate format to efficiently utilize multi-core, many-core and hybrid systems that is not available from traditional languages like C, C++, or F90? Are you looking at directive-based or library-based approaches, or is there another approach that you like?

The IR is not the obvious problem; the hardware's communication design is:
– Each core/ALU obeys a single-threaded von Neumann design, so traditional optimizers work well.
– Source languages continue to be single-threaded, or explicitly parallel for ease of understanding & control.
– The performance limitations around the use of multiple cores and hybrids are all bandwidth & latency related:
  • The primary optimization challenge since the 80s: how to minimize data movement?
  • Optimization paradigms from the single-threaded world (e.g. cache reuse) apply to the new opportunities – the question is how to apply them (a classic instance, loop tiling, is sketched below).
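A minimal sketch of cache reuse via loop tiling, the kind of data-movement optimization referred to above; the tile size is an illustrative assumption, not a SiCortex tuning value.

#define TILE 64   /* elements per tile edge; machine-dependent in practice */

static inline int min_int(int a, int b) { return a < b ? a : b; }

/* Naive n x n matrix multiply: for large n, elements of B are re-fetched
 * from main memory many times because the working set exceeds the cache. */
void matmul_naive(double *C, const double *A, const double *B, int n)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            for (int k = 0; k < n; k++)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
}

/* Tiled (blocked) version: works on TILE x TILE sub-blocks so each block
 * stays cache-resident while it is reused, cutting memory traffic. */
void matmul_tiled(double *C, const double *A, const double *B, int n)
{
    for (int ii = 0; ii < n; ii += TILE)
        for (int kk = 0; kk < n; kk += TILE)
            for (int jj = 0; jj < n; jj += TILE)
                for (int i = ii; i < min_int(ii + TILE, n); i++)
                    for (int k = kk; k < min_int(kk + TILE, n); k++)
                        for (int j = jj; j < min_int(jj + TILE, n); j++)
                            C[i * n + j] += A[i * n + k] * B[k * n + j];
}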

Is embedded global memory addressing (like Co-Array Fortran) going to be widely available and supported, even on distributed memory systems?

Yes! Partitioned Global Address Space (PGAS) languages bridge the "shared" vs. "distributed" design gap.
– UPC, Co-Array Fortran, Titanium, the HPCS languages...
– Shared memory does not scale: cache coherency is too expensive. RDMA is needed – and is arriving soon (OpenFabrics Alliance).
– The hard part about distributed memory:
  • Knowing where to put the data
  • Knowing when & how to move it (bandwidth & latency)
  • This is a problem even in the shared memory case (NUMA).
– Two-level model: the compiler provides the mechanisms; the programmer specifies the data placement and movement (see the UPC sketch below).
– The compiler & runtime can help: prefetching, caching, etc., analogous to traditional local memory optimizations.
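A minimal sketch of the two-level PGAS model in UPC (Unified Parallel C, a C extension named above); the array names and sizes are made up, and this is not taken from any particular vendor toolchain.

#include <upc_relaxed.h>

#define N 1024

/* One logical array, physically partitioned across all UPC threads
 * (default round-robin layout): the programmer controls placement. */
shared double x[N], y[N];

void axpy(double a)
{
    int i;
    /* "Owner computes": each thread executes only the iterations whose
     * element x[i] has affinity to it; since y has the same layout,
     * this loop generates no remote traffic. */
    upc_forall (i = 0; i < N; i++; &x[i])
        y[i] += a * x[i];
}

The language supplies the global address space and one-sided access (the mechanism); the affinity clause and array layout are where the programmer specifies placement and movement.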

What kind of hybrid systems or processor extensions are going to be supported by your compiler's code generation suite?

SiCortex systems use multi-core chips with an integrated communications fabric that has built-in RDMA:
– Existing parallel programming models work well.
– An ideal platform for PGAS languages – coming soon!
Committed to an open source model for compilers & tools:
– Investment in the gcc toolchain – bringing MIPS to HPC
– Large investment in the Open64 compiler codebase
– Using open source components for PGAS support
The integrated platform allows tighter tool integration than the commodity cluster approach.

What new run-time libraries will be available to utilize multi-core, many-core, and hybrid systems, and will they work seamlessly through dynamic linking?

– Autotuned libraries for dense linear algebra and signal processing are clear successes. This trend will continue for other common HPC programming paradigms and should work well for new hardware paradigms (an autotuned-library call is sketched below).
– Data layout and movement optimization is unsolved for irregular problems. Both static analysis and run-time information appear essential to get the best results. Pushing AMR down into the tools seems promising.
– Modern dynamic languages (Java, C#, Matlab, ...) were not designed to exploit the power of static analysis and remain far from Fortran or C performance for HPC applications.
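A minimal sketch of what using an autotuned library looks like, taking FFTW as the example (not a library SiCortex is stated to ship): with FFTW_MEASURE the planner times candidate algorithms on the current machine and keeps the fastest, and the tuned code is reached through an ordinary dynamically linked call.

#include <fftw3.h>

void forward_fft(int n)
{
    fftw_complex *buf = fftw_malloc(sizeof(fftw_complex) * n);

    /* Planning is the autotuning step; it can be slow the first time. */
    fftw_plan plan = fftw_plan_dft_1d(n, buf, buf, FFTW_FORWARD, FFTW_MEASURE);

    /* ... fill buf with samples ... */
    fftw_execute(plan);          /* runs the machine-specific tuned plan */

    fftw_destroy_plan(plan);
    fftw_free(buf);
}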

Multi/Many-core vs. Hybrid

– From a programming perspective, multi/many-core isn't fundamentally different from well-studied SMP exploitation going back decades.
– Cache coherency is expensive in both old SMPs and many-core contexts, and is difficult for compilers & libraries to optimize around – false sharing, etc. (a sketch of false sharing follows below).
– HPC use of many-core will be severely limited by memory bandwidth. It is not obvious what compilers or libraries can do to overcome this.
– Use of hybrid systems (GPGPUs, FPGAs, Cell, ...) is relatively recent in comparison. Performance asymmetries are still being explored; much usage experience is needed for tool evolution.
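A minimal sketch of the false-sharing effect mentioned above, assuming a 64-byte cache line and an OpenMP compiler; the names and iteration counts are illustrative, not measured values.

#include <omp.h>

#define NTHREADS 8
#define ITERS    10000000L

/* Adjacent counters share cache lines: each increment by one thread forces
 * the line to bounce between cores even though no datum is logically shared. */
long counters_packed[NTHREADS];

/* Padding each counter out to its own 64-byte line removes the contention. */
struct padded { long value; char pad[64 - sizeof(long)]; };
struct padded counters_padded[NTHREADS];

void bump(void)
{
    #pragma omp parallel num_threads(NTHREADS)
    {
        int t = omp_get_thread_num();
        for (long i = 0; i < ITERS; i++) {
            counters_packed[t]++;        /* slow: cache line ping-pongs   */
            counters_padded[t].value++;  /* fast: each core owns its line */
        }
    }
}

The coherence traffic is invisible in the source, which is why it is hard for compilers and libraries to detect and optimize away automatically.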