Dilemma of Parallel Programming. Xinhua Lin (林新华), HPC Lab of SJTU, 17th Oct 2011.

Disclaimers
I am not funded by Cray
Slides marked with the Chapel logo are taken from Brad Chamberlain's talk 'The Mother of All Chapel Talks', with his permission
Funny pictures are from the Internet

About Me and the HPC Lab in SJTU
Directing the HPC Lab
Co-translator of PPP
Co-founder of the HMPP CoC for AP & Japan
MS HPC invitation support for the HPC Center of SJTU
Hold the SJTU HPC Seminar monthly

Three Challenges for ParaProg in the Multi/Many-core Era
Revolution vs. Evolution
Low Level vs. High Level
– Performance vs. Programmability
Performance vs. Performance Portability
For more detail:
– Paper version: special issue for HPC and Cloud, Sep 2011
– Online version:

Outline
Right Level to Expose Parallelism
Review of ParaProg Languages
Multiresolution and Chapel

Right Level to Expose Parallelism

Can we stop the water/parallelism?
Hardware → ISA → OS → Library → Language

Performance vs. Programmability
Low level: MPI, OpenMP, pthreads on the target machine – expose implementing mechanisms – "Why is everything so tedious?"
High level: ZPL, HPF on the target machine – higher-level abstractions – "Why don't I have more control?"

ParaProg Education
Tired of teaching yet another specific language
– MPI for clusters
– OpenMP for SMP, then multi-core CPUs
– CUDA for GPUs, and now OpenCL
– More on the way…
Have to explain the same concepts with different tools
– A single language to explain them all?
Similar situation in OS education
– Production OSes: Linux, Unix and Windows
– An OS built only for education: Minix

Review of ParaProg Languages

Hybrid Programming Model
MPI alone is insufficient in the multi/many-core era
– OpenMP for multi-core
– CUDA/OpenCL for many-core*
So-called hybrid programming was invented as a temporary solution: workable but ugly
– MPI+OpenMP for multi-core clusters
– MPI+CUDA/OpenCL for GPU clusters like Tianhe-1A
A similar layered idea appears in CUDA (thread and thread block) and OpenCL (work-item and work-group)
* We will wait and see how OpenMP works on Intel MIC

ParaProg from Different Directions
Low level (expose implementation mechanisms)
– MPI, CUDA and OpenCL
– OpenMP
High level
– PGAS: CAF, UPC and Titanium
– Global view: NESL, ZPL
– APGAS: Chapel, X10
Directive based
– HMPP, PGI, Cray directives

Multiresolution and Chapel

What is Multiresolution?
Structure the language in a layered manner, permitting it to be used at multiple levels as required/desired
– support high-level features and automation for convenience
– provide the ability to drop down to lower, more manual levels
– use appropriate separation of concerns to keep these layers clean
Language concepts (layered): Distributions, Data Parallelism, Task Parallelism, Locality Control, Base Language, Target Machine
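To make the layering concrete, here is a minimal sketch of my own (not from the original slides) showing the same stencil written at two of these levels in Chapel: a high-level data-parallel forall, and a lower-level version that creates tasks explicitly with coforall and hands each task a cyclic slice of the iterations. The task count via here.maxTaskPar and the cyclic split are illustrative choices, not what the compiler necessarily does.

  config const n = 1000;
  var a, b: [1..n] real;

  // High level: data-parallel forall; the compiler and runtime decide
  // how many tasks to create and how to divide the iteration space.
  forall i in 2..n-1 do
    b(i) = (a(i-1) + a(i+1)) / 2;

  // Lower level: explicit tasks via coforall; we choose the task count
  // and give each task a cyclic slice of the same iterations.
  const numTasks = here.maxTaskPar;
  coforall tid in 0..#numTasks do
    for i in (2 + tid)..(n-1) by numTasks do
      b(i) = (a(i-1) + a(i+1)) / 2;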

Where Chapel Was Born: HPCS
HPCS: High Productivity Computing Systems (DARPA et al.)
– Goal: raise the productivity of high-end computing users by 10x
– Productivity = Performance + Programmability + Portability + Robustness
Phase II: Cray, IBM, Sun (July 2003 – June 2006)
– Evaluated the entire system architecture's impact on productivity: processors, memory, network, I/O, OS, runtime, compilers, tools, …
– …and new languages: Cray: Chapel, IBM: X10, Sun: Fortress
Phase III: Cray, IBM (July 2006 – 2010)
– Implement the systems and technologies resulting from Phase II
– (Sun also continues work on Fortress, without HPCS funding)

Global View vs. Fragmented
Problem: "Apply a 3-point stencil to a vector", i.e. b(i) = (a(i-1) + a(i+1))/2
[Figure: the global-view version operates on the whole vector at once; the fragmented version splits the vector across processes, each averaging its local chunk and exchanging boundary elements with its neighbors]

Global View vs. SPMD Code

Global-view:
  def main() {
    var n: int = 1000;
    var a, b: [1..n] real;

    forall i in 2..n-1 {
      b(i) = (a(i-1) + a(i+1))/2;
    }
  }

SPMD:
  def main() {
    var n: int = 1000;
    var locN: int = n/numProcs;
    var a, b: [0..locN+1] real;

    if (iHaveRightNeighbor) {
      send(right, a(locN));
      recv(right, a(locN+1));
    }
    if (iHaveLeftNeighbor) {
      send(left, a(1));
      recv(left, a(0));
    }
    forall i in 1..locN {
      b(i) = (a(i-1) + a(i+1))/2;
    }
  }

Chapel Overview
A design principle for HPC
– "Support the general case, optimize for the common case"
Data parallel (ZPL) + task parallel (Cray MTA) + scripting languages
The latest version is available as OSS:
Language concepts (layered): Distributions, Data Parallelism, Task Parallelism, Locality Control, Base Language, Target Machine
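As a small taste of the task-parallelism and locality-control layers listed above, here is a sketch of my own (not from the slides) using Chapel's cobegin, begin/sync and on constructs; the last loop assumes a multi-locale build, and on a single-locale run it simply prints one line.

  var a, b: [1..1000] real;
  a = 1.0;
  b = 2.0;

  // Task parallelism: cobegin runs its component statements as
  // concurrent tasks and waits for all of them to complete.
  cobegin {
    writeln("sum of a = ", + reduce a);
    writeln("sum of b = ", + reduce b);
  }

  // begin fires off an asynchronous task; the enclosing sync block
  // waits for every task begun inside it.
  sync {
    begin writeln("hello from an asynchronous task");
    writeln("hello from the original task");
  }

  // Locality control: coforall creates one task per locale (node) and
  // the on-clause moves each task's execution to that locale.
  coforall loc in Locales do
    on loc do
      writeln("Hello from locale ", here.id, " of ", numLocales);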

Chapel Example: Heat Transfer
A: an n x n array of real values, with boundary values fixed at 1.0
Each interior point is replaced by the average of its four neighbors (sum of the 4 neighbors / 4)
Repeat until the maximum change is less than epsilon

Chapel Code For Heat Transfer
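The code on this slide was a screenshot and did not survive the transcript, so below is a reconstruction of the Jacobi-style heat-transfer kernel based on the problem statement above and on the publicly available Chapel example; it uses current Chapel syntax and is a sketch, not necessarily the exact code shown in the talk.

  config const n = 6,            // interior is n x n
               epsilon = 1.0e-5; // convergence tolerance

  const BigD = {0..n+1, 0..n+1}, // domain including the boundary
        D    = {1..n, 1..n};     // interior domain

  var A, Temp: [BigD] real;      // temperatures, initialized to 0.0

  A[n+1, 1..n] = 1.0;            // fix one boundary row at 1.0

  do {
    // each interior point becomes the average of its four neighbors
    forall (i, j) in D do
      Temp[i, j] = (A[i-1, j] + A[i+1, j] + A[i, j-1] + A[i, j+1]) / 4;

    // largest change in this sweep, then commit the sweep
    const delta = max reduce abs(A[D] - Temp[D]);
    A[D] = Temp[D];
  } while (delta > epsilon);

  writeln(A);

Note how the whole computation is expressed with global-view arrays and a forall; there is no per-process boundary exchange as in the SPMD fragment shown earlier.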

Chapel as Minix in ParaProg
If I were to offer a ParaProg class, I'd want to teach about:
– data parallelism
– task parallelism
– concurrency
– synchronization
– locality/affinity
– deadlock, livelock, and other pitfalls
– performance tuning
– …

Conclusion: Major Points
Programmability and performance are the perennial dilemma of ParaProg
Multiresolution sounds perfect in theory, but it is not yet mature enough for production
However, Chapel could be used as the Minix of ParaProg

Q&A