Contemporary Languages in Parallel Computing Raymond Hummel

Current Languages

Standard Languages
Distributed Memory Architectures
  - MPI
Shared Memory Architectures
  - OpenMP
  - pthreads
Graphics Processing Units
  - CUDA
  - OpenCL

Use in Academia
Journal articles referencing parallel languages and libraries:
  - MPI – 863
  - CUDA – 539
  - OpenMP – 391
  - OpenCL – 195
  - POSIX – 124

MPI
Stands for: Message Passing Interface
Pros
  - Extremely Scalable
    - Remains the dominant model for high-performance computing today
    - Can be used to tie together implementations in other languages
  - Portable
    - Can be run on almost all OS/hardware combinations
    - Bindings exist for multiple languages, from Fortran to Python
  - Can harness a multitude of hardware setups
    - MPI programs can run on both distributed-memory and shared-memory systems

MPI
Cons
  - Complicated Software
    - Requires the programmer to wrap their head around all aspects of parallel execution
    - A single program must handle the behavior of every process
  - Complicated Hardware
    - Building and maintaining a cluster isn’t easy
  - Complicated Setup
    - Jobs have to be launched with mpirun or mpiexec
    - The mpicc wrapper is needed to compile and link against the MPI libraries

MPI
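A minimal MPI sketch (not part of the original slides) illustrating the points above: every process runs the same program, discovers its own rank, and the job is launched with mpirun. Function names are from the standard MPI C API; the file name and process count are illustrative.

/* hello_mpi.c -- compile with: mpicc hello_mpi.c -o hello_mpi
   run with:                    mpirun -np 4 ./hello_mpi        */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);                  /* start the MPI runtime        */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* which process am I?          */
    MPI_Comm_size(MPI_COMM_WORLD, &size);    /* how many processes in total? */

    /* The single program must handle the behavior of every process. */
    if (rank == 0)
        printf("Coordinator: %d processes in the job\n", size);
    else
        printf("Worker %d of %d reporting\n", rank, size);

    MPI_Finalize();                          /* shut the runtime down        */
    return 0;
}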

OpenMP
Stands for: Open Multi-Processing
Pros
  - Incremental Parallelization
    - Parallelize just that pesky triple for-loop
  - Portable
    - Does require compiler support, but all major compilers already support it
  - Simple Software
    - Include the library, add a preprocessor directive, compile with a special flag

OpenMP
Cons
  - Limited Use-Case
    - Constrained to shared-memory architectures
    - 63% of survey participants from http://goparallel.sourceforge.net were focused on development for individual desktops and servers
  - Scalability limited by memory architecture
    - Memory bandwidth is not scaling at the same rate as computation speeds

OpenMP
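A sketch (not from the original slides) of the incremental approach described above: one pragma parallelizes a single loop while the rest of the program is untouched, and the code is built with an extra compiler flag (-fopenmp for GCC). File name and array sizes are illustrative.

/* saxpy_omp.c -- compile with: gcc -fopenmp saxpy_omp.c -o saxpy_omp */
#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(void)
{
    static float x[N], y[N];
    for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    /* One directive parallelizes just this loop; everything else
       in the program continues to run serially.                   */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        y[i] = 2.0f * x[i] + y[i];

    printf("y[0] = %f, threads available = %d\n", y[0], omp_get_max_threads());
    return 0;
}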

POSIX Threads
Stands for: Portable Operating System Interface Threads
Pros
  - Fairly Portable
    - Native support in UNIX operating systems
    - Versions exist for Windows as well
  - Fine-Grained Control
    - Can control the mapping of threads to processors

POSIX Threads
Cons
  - All-or-Nothing
    - Can’t use software written with pthreads on systems that don’t have support for it
    - A major rewrite of the main function is required
  - Complicated Software
    - Thread management
  - Limited Use-Case

POSIX Threads
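A sketch (not from the original slides) of the manual thread management pthreads implies: the program creates worker threads, passes each an argument, and joins them before exiting; built with -pthread on UNIX systems. File name and thread count are illustrative.

/* workers.c -- compile with: gcc -pthread workers.c -o workers */
#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4

/* Every worker has the same fixed signature: void *fn(void *arg). */
static void *worker(void *arg)
{
    long id = (long)arg;
    printf("worker %ld running\n", id);
    return NULL;
}

int main(void)
{
    pthread_t threads[NUM_THREADS];

    /* Creation, argument passing, and joining are all done by hand. */
    for (long i = 0; i < NUM_THREADS; i++)
        pthread_create(&threads[i], NULL, worker, (void *)i);

    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);

    return 0;
}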

CUDA
Stands for: Compute Unified Device Architecture
Pros
  - Manufacturer Support
    - NVIDIA is actively encouraging CUDA development
    - Provides lots of shiny tools for developers
  - Low-Level Hardware Access
    - Because cross-platform portability isn’t a priority, NVIDIA can expose low-level details

CUDA
Cons
  - Limited Use-Case
    - GPU computing requires massive data parallelism
  - Only Compatible with NVIDIA Hardware

CUDA
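A sketch (not from the original slides) of the massively data-parallel style CUDA targets: a kernel runs once per element across thousands of GPU threads, with explicit copies between host and device memory, and is built with NVIDIA's nvcc compiler. File name and sizes are illustrative.

/* vadd.cu -- compile with: nvcc vadd.cu -o vadd */
#include <cstdio>
#include <cuda_runtime.h>

/* Each GPU thread handles one element of the vectors. */
__global__ void vadd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    float *a = new float[n], *b = new float[n], *c = new float[n];
    for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

    /* Device memory and host<->device copies are managed explicitly. */
    float *da, *db, *dc;
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
    cudaMemcpy(da, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, b, bytes, cudaMemcpyHostToDevice);

    /* Launch enough 256-thread blocks to cover all n elements. */
    vadd<<<(n + 255) / 256, 256>>>(da, db, dc, n);

    cudaMemcpy(c, dc, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", c[0]);

    cudaFree(da); cudaFree(db); cudaFree(dc);
    delete[] a; delete[] b; delete[] c;
    return 0;
}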

OpenCL
Stands for: Open Computing Language
Pros
  - Portability
    - Works on all major operating systems
  - Heterogeneous Platform
    - Works on CPUs, GPUs, APUs, FPGAs, coprocessors, etc.
  - Works with All Major Manufacturers
    - AMD, Intel, NVIDIA, Qualcomm, ARM, and more

OpenCL
Cons
  - Complicated Software
    - Manual everything
  - Special Tuning Required
    - Because it cannot assume anything about the hardware it will run on, the programmer has to tell it the best way to do things
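A host-side sketch (not from the original slides) of the "manual everything" point: before a single kernel can run, the programmer explicitly picks a platform and device, creates a context and command queue, and compiles the kernel source at run time. Calls are from the OpenCL 1.x C API; error checking, buffers, and the kernel launch are omitted for brevity, and the file name is illustrative.

/* setup.c -- OpenCL host boilerplate; link with -lOpenCL */
#include <CL/cl.h>
#include <stdio.h>

const char *src =
    "__kernel void scale(__global float *x) {"
    "    x[get_global_id(0)] *= 2.0f;"
    "}";

int main(void)
{
    cl_platform_id platform;
    cl_device_id   device;
    cl_int         err;

    /* 1. Pick a platform and a device -- nothing is assumed about the hardware. */
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_DEFAULT, 1, &device, NULL);

    /* 2. Create a context and a command queue for that device. */
    cl_context       ctx   = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
    cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, &err);

    /* 3. Build the kernel from source at run time, then create the kernel object. */
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, &err);
    clBuildProgram(prog, 1, &device, NULL, NULL, NULL);
    cl_kernel kernel = clCreateKernel(prog, "scale", &err);

    /* 4. Buffers, kernel arguments, launches, and copies back are just as manual. */
    printf("OpenCL setup complete\n");

    clReleaseKernel(kernel); clReleaseProgram(prog);
    clReleaseCommandQueue(queue); clReleaseContext(ctx);
    return 0;
}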

Non-Standard Languages
  - CILK
  - OpenACC
  - C++ AMP

CILK
  - Language first developed by MIT
  - Based on C; commercial improvements extend it to C++
  - Championed by Intel
  - Operates on the theory that the programmer should identify the parallelism, then let the run-time divide the work between processing elements
  - Has only 5 keywords: cilk, spawn, sync, inlet, abort
  - The CILK Plus implementation was merged into version 4.9 of the GNU C and C++ compilers
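A sketch (not from the original slides) using the Cilk Plus spelling of those keywords (cilk_spawn / cilk_sync, as merged into GCC 4.9): the programmer only marks where parallelism is allowed, and the runtime decides how to divide the work. File name and the fib example are illustrative.

/* fib_cilk.c -- compile with: gcc -fcilkplus fib_cilk.c -o fib_cilk (GCC 4.9+) */
#include <cilk/cilk.h>
#include <stdio.h>

long fib(int n)
{
    if (n < 2) return n;

    /* cilk_spawn marks work the runtime MAY run in parallel;
       cilk_sync waits for all spawned children of this function. */
    long a = cilk_spawn fib(n - 1);
    long b = fib(n - 2);
    cilk_sync;

    return a + b;
}

int main(void)
{
    printf("fib(30) = %ld\n", fib(30));
    return 0;
}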

OpenACC
Stands for: Open ACCelerators
  - Not currently supported by the major compilers
  - Aims to function like OpenMP, but for heterogeneous CPU/GPU systems
  - NVIDIA’s answer to OpenCL
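A sketch (not from the original slides) of the OpenMP-like directive style OpenACC aims for: a single pragma asks the compiler to offload the loop to an accelerator. It assumes an OpenACC-capable compiler such as PGI's pgcc; file name, data clauses, and sizes are illustrative.

/* saxpy_acc.c -- with an OpenACC compiler, e.g.: pgcc -acc saxpy_acc.c -o saxpy_acc */
#include <stdio.h>

#define N 1000000

int main(void)
{
    static float x[N], y[N];
    for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    /* Like OpenMP, one directive expresses the parallelism;
       unlike OpenMP, the loop can be offloaded to a GPU.     */
    #pragma acc parallel loop copyin(x) copy(y)
    for (int i = 0; i < N; i++)
        y[i] = 2.0f * x[i] + y[i];

    printf("y[0] = %f\n", y[0]);
    return 0;
}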

C++ AMP
Stands for: C++ Accelerated Massive Parallelism
  - Library implemented on DirectX 11 and an open specification by Microsoft
  - Visual Studio 2012 and up provide debugging and profiling support
  - Works on any hardware that has DirectX 11 drivers
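A sketch (not from the original slides) of the C++ AMP style: an array_view wraps host data and parallel_for_each runs a restrict(amp) lambda on whatever DirectX 11 device is available. It assumes Visual C++ 2012 or later; the file name and data are illustrative.

// doubler.cpp -- C++ AMP; builds with Visual C++ 2012+ (cl /EHsc doubler.cpp)
#include <amp.h>
#include <vector>
#include <iostream>
using namespace concurrency;

int main()
{
    std::vector<float> data(1024, 1.0f);

    // array_view makes the host data visible to the accelerator.
    array_view<float, 1> av(static_cast<int>(data.size()), data);

    // The restrict(amp) lambda runs once per index on the DirectX 11 device.
    parallel_for_each(av.extent, [=](index<1> i) restrict(amp) {
        av[i] *= 2.0f;
    });

    av.synchronize();   // copy results back to the host vector
    std::cout << "data[0] = " << data[0] << std::endl;
    return 0;
}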

Future Languages

Developing Languages
  - D
  - Rust
  - Harlan

D
  - Performance of Compiled Languages
  - Memory Safety
  - Expressiveness of Dynamic Languages
  - Includes a Concurrency-Aware Type System
  - Nearing Maturity

Rust
  - Designed for creation of large client-server programs on the Internet
  - Safety
  - Memory Layout
  - Concurrency
  - Still undergoing major changes

Harlan
  - Experimental language
  - Based on Scheme
  - Designed to take care of boilerplate for GPU programming
  - Could be expanded to include automatic scheduling for both CPU and GPU, depending on available resources

Questions?