1 Parallel Programming By J. H. Wang May 2, 2017

2 Outline
Introduction to Parallel Programming
Parallel Algorithm Design

3 Motivation “Fast” isn’t fast enough
Faster computers let you tackle larger computations

4 What’s Parallel Programming
The use of a parallel computer to reduce the time needed to solve a single computational problem
A parallel computer is a multiple-processor system: multicomputers or centralized multiprocessors (SMPs)
Programming in a language that lets you explicitly indicate how different portions of the computation may be executed concurrently by different processors
MPI: the Message Passing Interface
OpenMP: compiler directives for shared-memory multiprocessors (SMPs)
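As a minimal sketch of the message-passing style (not from the slides, and assuming an MPI installation with the usual mpicc compiler wrapper), each process in the program below learns its rank and the total number of processes and prints a line:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);                 /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */

    printf("Hello from process %d of %d\n", rank, size);

    MPI_Finalize();                         /* shut down MPI */
    return 0;
}

A typical run would be something like: mpicc hello.c -o hello && mpirun -np 4 ./hello, which prints one line per process.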

5 Concurrency
The goal is to identify operations that may be performed in parallel (concurrently)
Data dependence graph: vertex u represents a task; edge u -> v means task v depends on task u
Data parallelism: independent tasks applying the same operation to different data elements
Functional parallelism: independent tasks applying different operations to different data elements
Pipelined computation: the computation is divided into stages
Size considerations
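As an illustration (not from the slides), the sketch below contrasts data parallelism and functional parallelism using OpenMP directives; it assumes a compiler with OpenMP support, and without it the pragmas are simply ignored and the code runs sequentially:

#include <stdio.h>

#define N 8

int main(void)
{
    double a[N], b[N], sum = 0.0, maxb;

    for (int i = 0; i < N; i++) { a[i] = i; b[i] = N - i; }
    maxb = b[0];

    /* Data parallelism: the same operation (doubling) is applied to
       different data elements by independent tasks. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = 2.0 * a[i];

    /* Functional parallelism: independent tasks apply different
       operations (a sum and a maximum) to different data. */
    #pragma omp parallel sections
    {
        #pragma omp section
        for (int i = 0; i < N; i++) sum += a[i];

        #pragma omp section
        for (int i = 1; i < N; i++) if (b[i] > maxb) maxb = b[i];
    }

    printf("sum of a = %g, max of b = %g\n", sum, maxb);
    return 0;
}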

6 An Example of Data Dependence Graph

7 Programming parallel computers
Parallelizing compilers that detect parallelism in sequential programs
Sequential programs annotated with compiler directives
Extending a sequential programming language with functions for the creation, synchronization, and communication of processes (e.g., MPI)
Adding a parallel programming layer that handles the creation and synchronization of processes and the partitioning of data
A parallel language: either a new language or an existing one extended with parallel constructs
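As a hedged example of the "sequential program with compiler directives" approach (assumed, not given in the slides), the program below is ordinary sequential C; the single OpenMP directive is the only thing telling a directive-aware compiler to divide the loop among threads and combine the partial sums:

#include <stdio.h>

#define N 1000

int main(void)
{
    double x[N], sum = 0.0;

    for (int i = 0; i < N; i++)
        x[i] = 1.0 / (i + 1);

    /* The directive is the only parallel construct: a compiler with OpenMP
       support parallelizes the loop; any other compiler ignores the pragma
       and runs the loop sequentially. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += x[i];

    printf("sum = %f\n", sum);
    return 0;
}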

8 Parallel Algorithm Design
The task/channel model represents a parallel computation as a set of tasks that interact by sending messages through channels
Task: a program, its local memory, and a collection of I/O ports
Channel: a message queue that connects one task's output port with another task's input port
Sending is asynchronous; receiving is synchronous
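A small sketch of one task/channel interaction expressed with MPI (an assumption; the slides give no code): task 0 sends asynchronously through its output port, while task 1 blocks on a synchronous receive at its input port. Run with at least two processes:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, value;
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        /* Asynchronous send: the sender does not wait for the receiver. */
        MPI_Isend(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);   /* reclaim the send buffer */
    } else if (rank == 1) {
        /* Synchronous receive: the receiver blocks until the value arrives. */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("task 1 received %d on its input port\n", value);
    }

    MPI_Finalize();
    return 0;
}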

9 PCAM: a design methodology for parallel programs (Partitioning, Communication, Agglomeration, Mapping)

10 Partitioning Dividing the computation and data into pieces
Domain decomposition: first divide the data into pieces, then determine how to associate computations with the data
Functional decomposition: first divide the computation into pieces, then determine how to associate data items with the computations (e.g., pipelining)
The goal is to identify as many primitive tasks as possible
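A purely sequential, illustrative sketch of domain decomposition (not from the slides): the data are divided first, one element per primitive task, and the same computation is then attached to each piece, giving as many primitive tasks as there are data elements:

#include <stdio.h>

#define N 12

int main(void)
{
    double data[N], result[N];

    for (int i = 0; i < N; i++)
        data[i] = i;

    /* Domain decomposition: divide the data into N pieces (one element per
       primitive task), then associate the computation (squaring) with each
       piece.  Each loop iteration stands for one primitive task. */
    for (int i = 0; i < N; i++)
        result[i] = data[i] * data[i];

    for (int i = 0; i < N; i++)
        printf("task %d: %g -> %g\n", i, data[i], result[i]);
    return 0;
}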

11 Checklist for partitioning
There are at least an order of magnitude more primitive tasks than processors
Redundant computations and redundant data storage are minimized
Primitive tasks are roughly the same size
The number of tasks is an increasing function of the problem size

12 Communication
Local communication: when a task needs values from a small number of other tasks, we create channels from the tasks supplying the data to the task consuming them
Global communication: when a significant number of primitive tasks must contribute data in order to perform a computation
Communication is part of the overhead of a parallel algorithm
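A sketch of global communication, assuming MPI: every task contributes a locally computed value and MPI_Reduce combines the contributions into a single sum on rank 0. Local communication, by contrast, would use point-to-point sends and receives between a few neighboring tasks:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank;
    double local, total;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    local = (double)rank;   /* stand-in for a locally computed partial result */

    /* Global communication: a significant number of tasks contribute data
       so that one computation (a sum on rank 0) can be performed. */
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum of all contributions = %g\n", total);

    MPI_Finalize();
    return 0;
}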

13 Checklist for communication
Communication operations are balanced among tasks
Each task communicates with only a small number of neighbors
Tasks can perform their communications concurrently
Tasks can perform their computations concurrently

14 Agglomeration
Grouping tasks into larger tasks in order to improve performance or simplify programming
Goals of agglomeration:
To lower communication overhead, by increasing the locality of the parallel algorithm; another way to lower communication overhead is to combine groups of sending and receiving tasks, reducing the number of messages being sent
To maintain the scalability of the design
To reduce software engineering costs
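A sketch of the message-combining form of agglomeration, assuming MPI and two processes: sending a whole block in one message, instead of one element per message, reduces the number of messages and therefore the per-message startup overhead:

#include <stdio.h>
#include <mpi.h>

#define N 1024

int main(int argc, char *argv[])
{
    int rank;
    double a[N];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        for (int i = 0; i < N; i++) a[i] = i;
        /* Agglomerated communication: one message carries the whole block.
           Sending the N values one per message would pay the per-message
           startup cost N times. */
        MPI_Send(a, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(a, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("received %d values in a single message\n", N);
    }

    MPI_Finalize();
    return 0;
}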

15 Checklist of Agglomeration
The agglomeration has increased the locality of the parallel algorithm
Replicated computations take less time than the communications they replace
The amount of replicated data is small enough to allow the algorithm to scale
Agglomerated tasks have similar computational and communication costs
The number of tasks is an increasing function of the problem size
The number of tasks is as small as possible, yet at least as great as the number of processors
The tradeoff between agglomeration and the cost of modifications to existing sequential code is reasonable

16 Mapping Assigning tasks to processors
Goal: to maximize processor utilization and minimize interprocess communication
These are usually conflicting goals
Finding an optimal mapping is NP-hard
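Two common static mappings, shown as small illustrative helper functions (the names block_owner and cyclic_owner are hypothetical): a block mapping assigns contiguous groups of tasks to each processor, while a cyclic mapping deals tasks out round-robin:

#include <stdio.h>

/* Processor that owns task i under a block mapping of n tasks to p
   processors (one common convention: processor j gets tasks
   floor(j*n/p) .. floor((j+1)*n/p)-1). */
int block_owner(int i, int n, int p)
{
    return (p * (i + 1) - 1) / n;
}

/* Processor that owns task i under a cyclic (round-robin) mapping. */
int cyclic_owner(int i, int p)
{
    return i % p;
}

int main(void)
{
    int n = 10, p = 3;
    for (int t = 0; t < n; t++)
        printf("task %d -> block proc %d, cyclic proc %d\n",
               t, block_owner(t, n, p), cyclic_owner(t, p));
    return 0;
}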

17 Checklist for mapping
Designs based on one task per processor and on multiple tasks per processor have been considered
Both static and dynamic allocation of tasks to processors have been evaluated
For dynamic allocation, the task allocator is not a bottleneck
For static allocation, the ratio of tasks to processors is at least 10:1

18 References Ian Foster, Designing and Building Parallel Programs, available online at:

