Parallel Programming Languages Andrew Rau-Chaplin

Sources
D. Skillicorn and D. Talia, "Models and Languages for Parallel Computation", ACM Computing Surveys.
Warning: this is very much ONE practitioner's viewpoint! Little attempt has been made to capture the conventional wisdom.

Outline
Introduction to parallel programming
Example languages:
- Message passing in MPI
- Data parallel programming in *Lisp
- Shared address space programming in OpenMP
- Cilk

Historically
Supercomputers: highly structured numerical programs; parallelization of loops.
Multicomputers: each machine had its own languages/compilers/libraries optimized for its architecture.
Parallel computing was for REAL computer scientists: "Parallel programming is tough, but worth it."
Mostly numerical/scientific applications were written, using Fortran and parallel numerical libraries. Little other parallel software was written!

Needed
Parallel programming abstractions that were:
Easy (help in managing programming complexity), but general!
Portable (across machines), but efficient!

Solution: Yet Another Layer of Abstraction!
[Diagram: application software and system software sit on top of a parallel model/language layer, which in turn maps onto the underlying machine classes (SIMD, message passing, shared memory, dataflow, systolic arrays), presenting a generic parallel architecture.]

Layered Perspective
[Diagram: parallel applications (CAD, database, scientific modeling) run on programming models (multiprogramming, shared address, message passing, data parallel). Beneath them, a communication abstraction marks the user/system boundary and is realized by compilation or a library over operating systems support, communication hardware, and the physical communication medium at the hardware/software boundary. Language = Library = Model.]

Programming Model
The conceptualization of the machine that the programmer uses in coding applications: how parts cooperate and coordinate their activities, and which communication and synchronization operations are available.
Multiprogramming: no communication or synchronization at the program level.
Shared address space: like a bulletin board.
Message passing: like letters or phone calls; explicit point-to-point.
Data parallel: more regimented; global actions on data; implemented with shared address space or message passing.
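
To make the message-passing model concrete, here is a minimal sketch in C using MPI (one of the example languages from the outline). The ranks used and the value sent are illustrative only.

/* Minimal message-passing sketch: rank 0 sends one integer to rank 1.
   Compile with mpicc, run with mpirun -np 2 (assumes an MPI installation). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        value = 42;
        /* Explicit point-to-point communication: the "letter" */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }
    MPI_Finalize();
    return 0;
}

Note how everything is explicit: the programmer names the destination, the message size, and the type; nothing is shared.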

What does parallelism add?
Decomposition: how is the work divided into distinct parallel threads?
Mapping: which thread should be executed on which processor?
Communication: how is non-local data acquired?
Synchronization: when must threads know that they have reached a common state?
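
As a hedged illustration of how a language can answer these four questions, here is a small OpenMP sketch in C; the function and its arguments are invented for the example.

#include <omp.h>

double sum_squares(const double *x, int n) {
    double sum = 0.0;
    /* Decomposition: loop iterations are divided among threads.
       Mapping: placement of threads is left to the OpenMP runtime.
       Communication: x and sum live in the shared address space.
       Synchronization: the reduction combines the partial sums, and an
       implicit barrier ends the parallel loop. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += x[i] * x[i];
    return sum;
}

Here three of the four questions are answered implicitly by the model; only the decomposition hint (the pragma) is written by the programmer.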

Skillicorn's Wish List
What properties should a good model of parallel computation have? Note: the desired properties may conflict.
Themes: What does the programming model handle for the programmer? How abstract can the model be and still realize efficient programs?
Six desirable features follow.

1) Easy to Program
Should conceal as much detail as possible. Example: with 100 processors, each running 5 threads, any of the 500 threads may potentially communicate with any other, so the number of possible communication states explodes.
Hide decomposition, mapping, communication, and synchronization.
As much as possible, rely on the translation process to produce the exact structure of the parallel program.

2) Software Development Methodology
A firm semantic foundation is needed to permit reliable transformation from the parallel model/language down to the parallel architecture.
Issues: correctness, efficiency, freedom from deadlock.

3) Architecture-Independent
Should be able to migrate code easily to the next generation of an architecture (cycle times are short).
Should be able to migrate code easily from one architecture to another (we need to share code).
Even in this space, people are more expensive and harder to maintain than hardware.

4) Easy to Understand
For parallel computing to become mainstream, a model must make it easy to go from sequential to parallel and be easy to teach.
Favor easy-to-understand tools with clear, if limited, goals over complex ones that may be powerful but are hard to use and master!

5) Guaranteed Performance
Guaranteed performance on a useful variety of real machines.
If T(n,p) = c·f(n,p) + lower-order terms:
- Preserve the order of the complexity.
- Keep the constants small.
A model that is good (not necessarily great) on a range of architectures is attractive!
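
A hedged numeric illustration (the cost function is invented for the example): suppose summing n numbers on p processors costs

    T(n,p) = c1*(n/p) + c2*log p

The model guarantees performance if every target machine realizes this same order, n/p + log p, with small machine-specific constants c1 and c2. A port that preserves the order but inflates c2 a hundredfold satisfies the letter of the requirement and not its spirit.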

6) Provide Cost Measures
Cost measures are needed to drive algorithmic design choices: estimated execution time, processor utilization, development costs.
In sequential computing, execution times on different machines are proportional (machine A is 5 times faster than machine B), which supports a two-step model: optimize algorithmically, then code and tune.

6) Provide Cost Measures, cont.
In parallel computing it is not so simple; there is no two-step model. The costs associated with decomposition, mapping, communication, and synchronization may vary independently!
The model must therefore make the estimated cost of operations available at design time: we need an accounting scheme, a cost model.
Example: how should an algorithm trade off communication against local computation?
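
A hedged worked example, with invented machine constants: suppose fetching one remote value costs t_comm = 1 microsecond and one local floating-point operation costs t_flop = 1 nanosecond. If a value can be recomputed locally in k operations, communication wins only when

    t_comm < k * t_flop, i.e. k > 1000

so any value reproducible in fewer than about a thousand local operations should be recomputed rather than fetched. A cost model exposes exactly these constants at design time, letting the trade-off be decided before any code is written.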

Summary: Desired Features
The desired features are often contradictory, and some are more realistic on some architectures than on others. There is room for more than one language/model!

A Six-Level Classification of Parallel Models
1) Nothing explicit, parallelism implicit
2) Parallelism explicit, decomposition implicit
3) Decomposition explicit, mapping implicit
4) Mapping explicit, communication implicit
5) Communication explicit, synchronization implicit
6) Everything explicit
From level 1 to level 6: less abstract, more efficient (?); from level 6 to level 1: more abstract, less efficient (?).

Within Each Classification
Dynamic structure: allows dynamic thread creation; unable to restrict communication; may overrun communication capacity.
Static structure: no dynamic thread creation; may still overrun communication capacity, but the static structure supports cost models for the prediction of communication.
Static and communication-limited structure: no dynamic thread creation; can guarantee performance by limiting the frequency and size of communications.
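
A minimal C sketch of what dynamic structure means in practice, using POSIX threads (the thread count and worker body are arbitrary): threads come into existence at run time, so the model cannot bound the communication pattern in advance.

#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg) {
    /* Each dynamically created thread could, in principle, open a
       channel to any other: communication cannot be restricted. */
    printf("worker %ld running\n", (long)arg);
    return NULL;
}

int main(void) {
    pthread_t tid[4];
    for (long i = 0; i < 4; i++)   /* thread creation is a run-time decision */
        pthread_create(&tid[i], NULL, worker, (void *)i);
    for (int i = 0; i < 4; i++)
        pthread_join(tid[i], NULL);
    return 0;
}

A static-structure model would instead fix the number of threads and their communication pattern at design time, which is precisely what makes cost prediction possible.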

Models, Languages, Libraries
[Diagram: a spectrum running from models through languages to libraries, with Cilk and *Lisp placed along it. Where should OpenMP go???]

Recent Languages/Systems
Cilk++
MapReduce

Recent Languages
GPUs: OpenCL & CUDA
Grid programming

Recent Languages
Cloud computing
Cycle scavenging