FUNDAMENTAL CONCEPTS OF PARALLEL PROGRAMMING

1 FUNDAMENTAL CONCEPTS OF PARALLEL PROGRAMMING
Module II

2 Introduction
Parallel programming is based on:
- the concept of threads
- designing, developing, and deploying each thread within an application
- communication between the threads and the application
- dividing each task into chunks suitable for threading

3 Designing for threads
Traditional (serial) programming, e.g. an OOP program driven from main():
- waits for user interaction in a loop
- simple to program and execute
- one step generally flows into the next
- given predetermined parameters, a predictable conclusion can be reached

4 Parallel programming model
Rethink the idea of process flow: identify activities that can run in parallel.
Programs become a set of tasks with dependencies between them.
The process of breaking a program into tasks and identifying the dependencies between them is called decomposition.

5 Dependencies
Data dependencies: flow (true) dependency, anti-dependency, output dependency
Control dependencies

6 I. Designing for threads
Task decomposition Data decomposition Data flow decomposition Implications of different decompositions

7 Decomposition
A program can be decomposed based on tasks, data, or data flow.

8 1. Task Decomposition
One of the simplest ways to decompose a problem.
Tasks are catalogued, and if two can run concurrently, the developer schedules them to run concurrently.
Slight changes to the program become necessary to indicate the concurrency and avoid conflicts.
Examples:
- mowing and weeding a lawn: one worker mows the lawn while another weeds
- text entry and pagination in MS Word

9 2. Data Decomposition
Also called data-level parallelism.
Based on the data rather than on the work: many threads perform the same task on different data, so more work is done in the same time.
Examples:
- digital image processing
- mowing and weeding a garden: both workers mow half the property first, and then both weed it.

10 Data Flow Decomposition
Based on how data flows between tasks.
Example: the producer-consumer problem. The output of the producer is the input to the consumer, with a different thread for each. The consumer cannot start until the producer finishes some portion of its execution.

11 Example: gardening
One gardener prepares the tools (puts gas in the mower, cleans the shears, etc.) to be used by both gardeners. No gardening occurs until this step is finished.
Example: parsing
An input file must be parsed and analysed semantically before code generation can begin.

12 Dimensions noticed in a Producer-Consumer Problem
- The dependence between producer and consumer can cause delays if not implemented properly. A good design avoids situations in which consumer threads sit idle waiting for producer threads.
- The handoff between producer and consumer has to be clean: the output should be context independent, and the consumer need not know any details about the producer.
- If the consumer is still finishing up after the producer has already finished, one thread is idle while others are working. The loads are then unbalanced and not all threads can be kept busy.

13 Implications of Different Decompositions
Different decompositions provide different benefits. Applications are threaded for higher performance, so the choice of decomposition is important; certain tasks are suited to certain decompositions.
Example: digital image processing.
- task decomposition: one thread does color balancing, another does decoding, etc.
- data decomposition: each thread does all the work on one frame and then moves to the next frame.

14 Decisions are also made based on resource constraints
Example (gardening): suppose there is only one mower, so both gardeners cannot mow at the same time.

15 Challenges while threading
The use of threads improves performance by allowing two or more activities to occur simultaneously.
It makes coding more complex, because more than one activity is occurring in the program at the same time, and so requires thoughtful programming.

16 CHALLENGES IN USING THREADS
- Synchronization: two or more threads must coordinate their activities.
- Communication: the bandwidth and latency issues associated with exchanging data between threads.
- Load balancing: distributing work among threads so that they all perform roughly the same amount of work.
- Scalability: making efficient use of a larger number of threads when the software is run on more capable systems.

17 Parallel Programming Patterns
Object-oriented programmers have long used design patterns to structure their applications, e.g. divide and conquer, greedy algorithms. Parallel programming problems likewise tend to fall into one of several recurring patterns.

18 Patterns

19 Task-level parallelism pattern
The focus is on the tasks: the problem is decomposed into a set of activities that operate in parallel. It is often necessary to remove dependencies between tasks through the use of replication.

20 Divide and conquer pattern
The problem is divided into sub-problems; each sub-problem is solved in parallel and independently, and the results are aggregated into the final solution.
Example: merge sort. Advantage: good load balancing.

21 Geometric decomposition pattern
Based on parallelizing the data structures used in the problem being solved: each thread is responsible for operating on a chunk of the data.
Example: problems such as wave propagation.

22 Pipeline pattern
The computation is broken into stages, and each thread works on a different stage simultaneously.

23 Wavefront pattern
Data elements are processed along a diagonal of a 2D grid; elements on the same diagonal are independent and can be processed in parallel.

24 A Motivating Problem: Error Diffusion
Case study: error diffusion, a type of halftoning in which the quantization residual is distributed to neighboring pixels that have not yet been processed. Its main use is to convert a multi-level image into a binary image, though it has other applications. It is used to display continuous-tone digital images on devices that have a limited tone range.

26 Floyd and Steinberg algorithm
This algorithm pushes (adds) the residual quantization error of a pixel onto its neighbouring pixels, to be dealt with later.

28 Error Diffusion algorithm
Step 1: determine the output value for each input value, using thresholding (quantization). For an 8-bit image displayed on a binary device, the conversion is: [0,127] → 0; [128,255] → 1.

29 Compute the error between what is displayed and what should have been displayed
First normalize: an output of 0 corresponds to an input of 0, and an output of 1 corresponds to an input of 255.
Suppose the input is 168. The output will be 1. The error is the difference between the value that should have been displayed (168) and the value actually displayed (255): 168 − 255 = −87.

30 Distribute the error value on a fractional basis to the neighboring pixels
In the Floyd-Steinberg scheme the weights are 7/16 to the pixel on the right, 3/16 below-left, 5/16 below, and 1/16 below-right.

31 C Implementation (pg 64)

32 Analysis of the error diffusion algorithm
The errors of previous pixels must be known before the value of the next pixel can be computed. This interdependency between pixels means that only one pixel can be processed at a time.

33 Alternate approach: Parallel Error Diffusion
How to solve the same problem in parallel, using multiple threads.
Initial case:

34 Look from receiver’s perspective

35 A pixel is not processed until its spatial predecessors have been processed
Multiple producers producing data (error) and a single consumer (current pixel) Data flow decomposition

36 Determining the pattern
One method: keep one thread for the even pixels of a row and another for the odd pixels. Disadvantage: one thread will be blocked waiting for the other.
Interdependencies should be reduced. For a pixel to be processed, it needs three values from the previous row and one from the pixel to its immediate left.
A better choice: one thread for each row.

37 Wavefront pattern

38 Other alternatives
If we have multiple images (pages), each is an independent data set.
A hybrid approach can also be used.

