Compiler Front End Panel

Presentation transcript:

Compiler Front End Panel
Robert Geva, parallel programming architect, Intel Compilers and Languages
(Speaking for myself.)
2018/12/8

1. Language Support for Parallelism

Data parallelism:
- Integration of Firetown (Ct) and RapidMind
- Array notation for C/C++
- Semantically mandates data parallelism

Task parallelism:
- Integration of Cilk++ and TBB
- A Cilk++ program is serializable
- Well-defined points where execution becomes asynchronous
- Minimal addition to the baseline language
- Defense against data races

Safe, friendly parallel programming is achievable.

2. Performance Provided by System SW

- Firetown will use JIT compilation: adapt to the hardware's vector ISA
- Profile-guided optimization: take the data into account
- TBB and Cilk rely on the programmer to write small units of work, the units of scheduling (over-decomposition)
- A user-mode work-stealing scheduler provides dynamic load balancing
- Research project: CPU performance counters made available to a JIT compiler, allowing dynamic re-optimization

Working to move performance responsibility from the programmer to system SW.

4. Tools

Discovery tools — Intel Parallel Advisor:
- Reads the source code
- Guides through elimination of potential data races
- Recommends opportunities to introduce parallelism based on the program's loop structure, targeting outer loops

Guided Auto Parallelism:
- The programmer writes serial code
- The compiler attempts to auto-parallelize and vectorize
- Provides guidance on how to restructure the serial code to remove semantic obstacles

Biggest challenge: tools to help write a parallel program.

5. 3rd-Party, Precompiled Libraries

- The IA software ecosystem has a strong force of gravity
- Any solution, especially in native programming, must be compatible with existing, non-recompiled software
- As well as with 3rd-party tools
- Backward compatibility is unquestioned

3. Heterogeneous Targets

Target: CPU + LRB
- A multi-core of big IA cores plus a many-core of small IA cores
- Separate physical memories
- Different ISA extensions
- Different OSes

A solution: CELO
- Native C/C++: a single source code base, fed into a single compiler, generates a program that utilizes both core types
- TBB/Cilk++ model for parallelism
- MYO: a system SW layer for data synchronization across the PCIe bus
- Minimal (<5 keywords) language extension to express which code executes on which core type

A heterogeneous target is programmable with existing languages and existing methodologies.