Finding Concurrency CET306 Harry R. Erwin University of Sunderland.

Roadmap – Design models for concurrent algorithms – Patterns for finding design models – Task decomposition – Data decomposition – What’s not parallel – Conclusions – Feedback opportunity

Texts – Breshears, C. (2009) The Art of Concurrency: A Thread Monkey's Guide to Writing Parallel Applications, O'Reilly Media, 304 pp. – Mattson, T. G., Sanders, B. A., and Massingill, B. L. (2005) Patterns for Parallel Programming, Addison-Wesley. – Ben-Ari, M. (2006) Principles of Concurrent and Distributed Programming, Addison-Wesley. – Kreutzer, W. (1986) System Simulation: Programming Styles and Languages, Addison-Wesley.

Resources – Gamma, Helm, Johnson, and Vlissides (1995) Design Patterns, Addison-Wesley. – The Portland Pattern Repository: http://c2.com/ppr/ – Resources on Parallel Patterns: http://www.cs.uiuc.edu/homes/snir/PPP/ – Visual Studio 2010 and the Parallel Patterns Library: http://msdn.microsoft.com/en-us/magazine/dd434652.aspx http://www.microsoft.com/download/en/details.aspx?id=19222 http://msdn.microsoft.com/en-us/library/dd492418.aspx – Alexander (1977) A Pattern Language: Towns/Buildings/Construction, Oxford University Press. (For historical interest.)

Flynn’s Taxonomy (from Wikipedia) – SISD (single instruction, single data): a sequential computer with no parallelism in instructions or data. – MISD (multiple instruction, single data): multiple instruction streams operate on a single data stream. Unusual; used for reliability. The best-known example is the space shuttle flight control computer, where the results of each instruction stream must agree. – SIMD (single instruction, multiple data): a parallel computer where a single instruction stream operates on multiple data streams, such as an array processor or GPU. – MIMD (multiple instruction, multiple data): multiple autonomous processors executing different instructions on different data, such as a multi-core machine or a distributed system with multiple CPUs.

Finding Concurrency Chapter 2 of Breshears begins by mentioning Mattson, et al. (2005). That book defines a pattern language for parallel programming, and explores four design spaces where patterns provide solutions. These include: – Finding Concurrency – Algorithm Structure – Supporting Structures – Implementation Mechanisms

Finding Concurrency Design Space This contains six patterns: – Task Decomposition – Data Decomposition – Group Tasks – Order Tasks – Data Sharing – Design Evaluation Breshears (2009) explores the first two patterns in detail. I will summarise all six here and add some slides after the first two to cover Breshears’ points.

Task Decomposition What tasks can execute concurrently to solve the problem? The programmer starts by investigating the computationally intensive parts of the problem, the key data structures, and how the data are used. The tasks may be clear. Your concerns are flexibility, efficiency, and simplicity. Identify lots of tasks—they can be merged later or threads can perform multiple tasks. Look at function calls and loops. Finally look at the data.

Possible Ways to Organise Your Tasks Have the main method create and start the threads. It then waits for all tasks to complete and finally generates the results. You can also create and start threads as needed. This is preferred if the need for threads is not clear until the program has been running. A recursive or binary search is an example. It’s cheaper not to start a thread if it’s likely you will have to stop it.
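
The first organisation above can be sketched as follows. This is a minimal illustration in Python (my choice of language, not one the slides prescribe); `count_words` and `run_tasks` are hypothetical names. Note that in standard CPython the global interpreter lock limits the speed-up for pure-Python CPU work, so treat this as a picture of the structure, not of the performance.

```python
import threading


def count_words(text, results, index):
    # Each task writes only to its own slot, so no lock is needed here.
    results[index] = len(text.split())


def run_tasks(documents):
    results = [0] * len(documents)
    threads = [
        threading.Thread(target=count_words, args=(doc, results, i))
        for i, doc in enumerate(documents)
    ]
    for t in threads:   # the main method creates and starts the threads
        t.start()
    for t in threads:   # it then waits for all tasks to complete
        t.join()
    return sum(results)  # and finally generates the result


if __name__ == "__main__":
    print(run_tasks(["a b c", "d e", "f"]))
```

Starting threads only as needed would replace the up-front list of threads with a `Thread(...).start()` call at the point where the extra work is discovered.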

Content of your Task Decomposition What are the tasks? What are the dependencies between tasks? How are tasks assigned to threads?

Consider Explore concurrent execution of your threads. – Do a desk (manual) simulation, or – Program a simulation. Look for correctness: you want to avoid race conditions and ensure data are shared when required. Look for efficiency: all threads with parallel tasks should be sharing the computer. If threads are blocked, the design is inefficient. Balance your threads. Focus on the resource-intensive parts of the program. Often the limiting resources are a surprise. Aim for at least one task per thread or core, and make sure each task does enough useful work to justify its existence.
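
To make the correctness point concrete, here is a small Python sketch of a shared counter protected by a lock. The names `add_many` and `locked_total` are hypothetical. Without the lock, the read-modify-write hidden inside `+=` can interleave between threads and lose updates, which is a race condition.

```python
import threading


def add_many(counter, lock, repeats):
    for _ in range(repeats):
        with lock:                  # only one thread updates at a time
            counter["value"] += 1   # read-modify-write on shared data


def locked_total(num_threads=4, repeats=10_000):
    counter = {"value": 0}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=add_many, args=(counter, lock, repeats))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]
```

The design is correct but not efficient: every increment takes the lock, so threads spend much of their time blocked. Accumulating a private total in each thread and adding it to the shared counter once would keep the correctness and remove most of the blocking.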

Data Decomposition Look for parallelism in the problem’s data. If the most computationally intensive part of the problem involves a large data structure, and the data in the structure can be manipulated in parallel, consider organising your tasks around that manipulation. Consider flexibility, efficiency, and simplicity in your design. Chunk the data so it can be operated on in parallel. Look for array-based processing and recursion. Plan for scalability and efficiency. Finally, look at the tasks.

Possible Ways to Organise your Data Consider the structure of your data. Consider restructuring your data to support parallel operations. Arrays are good for data parallelism. Divide them along one or more of their dimensions. Fixed format tables are also good for data parallelism. Statistical data frames lend themselves to parallel algorithms. Lists are good, but only if you have random access to sublists. Load balancing is important.

Consider How do you divide your data into chunks? How do you ensure that the task responsible for a chunk has access to the data it needs to do its job? How are data chunks assigned to threads?

Content of a Data Decomposition Chunking the data: – Individual elements – Rows – Columns – Blocks What do the boundaries between chunks look like? They should have small ‘area’ to minimise interference.
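
A sketch of row and block chunking for a matrix stored as a list of rows, in Python. The helper names (`chunk_bounds`, `row_chunks`, `block_chunks`) are hypothetical, and the code assumes a rectangular matrix.

```python
def chunk_bounds(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) pairs
    whose sizes differ by at most one."""
    base, extra = divmod(n, parts)
    bounds = []
    start = 0
    for i in range(parts):
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def row_chunks(matrix, parts):
    # Chunk by rows: each chunk is a horizontal strip.
    return [matrix[a:b] for a, b in chunk_bounds(len(matrix), parts)]


def block_chunks(matrix, row_parts, col_parts):
    # Chunk by blocks: cut along both dimensions.
    n_cols = len(matrix[0]) if matrix else 0
    blocks = []
    for r0, r1 in chunk_bounds(len(matrix), row_parts):
        for c0, c1 in chunk_bounds(n_cols, col_parts):
            blocks.append([row[c0:c1] for row in matrix[r0:r1]])
    return blocks
```

On boundary 'area': cutting an N×N grid into 16 row strips creates 15 internal boundaries of length N, while a 4×4 grid of blocks creates 3 horizontal and 3 vertical cuts, a total length of 6N. Blocks therefore tend to interfere less when neighbouring chunks exchange data.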

Data Synchronisation Consider efficiency. There are two approaches: – Copy the data over before it is needed. (Storage is required, and the data need to be frozen after copying.) – Share the data when it is needed. (Time is required, both to move the data and to wait for the transfer to complete. Locking may be required while the data are used.) Consider how often copying will be needed.

Data Scheduling You can assign data to specific threads statically or dynamically. Static assignment is easier to implement. Dynamic assignment allows load balancing and supports scalability. Your task may have to wait for another thread to run, so you need to consider dynamic scheduling of tasks, which is messy…
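
The two kinds of assignment can be contrasted in a short Python sketch. `static_assignment` and `dynamic_run` are hypothetical names, and `work` stands for whatever function processes one data item.

```python
import queue
import threading


def static_assignment(items, num_threads):
    # Static: thread i is given items i, i+n, i+2n, ... before any work starts.
    return [items[i::num_threads] for i in range(num_threads)]


def dynamic_run(items, num_threads, work):
    # Dynamic: threads pull the next item from a shared queue when they are free,
    # so a thread that draws cheap items simply takes more of them.
    todo = queue.Queue()
    for item in items:
        todo.put(item)
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            try:
                item = todo.get_nowait()
            except queue.Empty:
                return              # nothing left to do
            value = work(item)
            with results_lock:      # results list is shared read-write data
                results.append(value)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results                  # completion order, not input order
```

The dynamic version needs a shared queue and a lock, which is the extra mess mentioned above; the static version needs neither but cannot rebalance if some items turn out to be expensive.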

Group Tasks How can tasks be grouped to simplify managing dependencies? This is done after the task decomposition. If tasks share constraints or are closely related, consider grouping them so that one feeds another or they form a larger task. You want an organised team of tasks, not a large number of individual tasks. Consider the following possibilities: order dependency, simultaneous execution, free concurrency. Look at various possible groupings and organisations.

Order Tasks Given a collection of tasks, in what order must they run? You will need to find and enforce the order dependencies of the system. The order needs to be restrictive enough that the order dependencies are enforced but, for maximum efficiency, no more restrictive than that. Consider data ordering and limitations imposed by external services.
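
One way to find an ordering that is no more restrictive than the dependencies require is a topological sort. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later); the function name `run_in_waves` and the task names in the usage note are hypothetical.

```python
from graphlib import TopologicalSorter


def run_in_waves(dependencies):
    """dependencies maps each task to the tasks that must finish before it.
    Returns a list of waves; tasks within one wave have no order
    dependency on each other and could run concurrently."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()                 # raises CycleError if the order is impossible
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose predecessors are done
        waves.append(ready)
        sorter.done(*ready)          # mark the wave complete, unlocking successors
    return waves
```

For example, with `{"load": [], "parse": ["load"], "index": ["parse"], "stats": ["parse"], "report": ["index", "stats"]}` the waves are `load`, then `parse`, then `index` and `stats` together, then `report`. Only the stated dependencies are enforced; `index` and `stats` are left free to run at the same time.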

Data Sharing How should data be shared among the tasks you have defined? Classify data into task-local data and shared data and then define a protocol for data sharing. Consider race conditions and synchronisation overhead. Avoid joins if the threads involved have very different resource requirements or timing. Data can be read-only, effectively-local, or read-write. Look at replication for read-only data. Some read-write data summarise information collected by individual tasks, or data may be modified by a single task. Look at using local copies of these data.
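
The three classes of data can be seen together in a small Python word-count sketch. All names here (`STOP_WORDS`, `count_chunk`, `word_counts`) are hypothetical. The stop-word set is read-only and shared freely, each task keeps a task-local counter, and the summary is built from those local copies after the tasks finish.

```python
import threading
from collections import Counter

# Read-only data: safe to share between tasks without locking.
STOP_WORDS = frozenset({"the", "a", "of"})


def count_chunk(lines, partials, index):
    local = Counter()                # task-local copy of the summary data
    for line in lines:
        for word in line.lower().split():
            if word not in STOP_WORDS:
                local[word] += 1
    partials[index] = local          # one write, to this task's own slot


def word_counts(chunks):
    partials = [None] * len(chunks)
    threads = [
        threading.Thread(target=count_chunk, args=(chunk, partials, i))
        for i, chunk in enumerate(chunks)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    total = Counter()
    for partial in partials:         # merge the local copies once, at the end
        total.update(partial)
    return total
```

Because no task touches another task's counter, there are no race conditions and no synchronisation overhead while the tasks run; the only shared step is the final merge.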

Design Evaluation Time to ask yourself, am I done? Iterate over possible designs to choose the best one. Perhaps prototype the design to gain an understanding of where the time and resources are going. Check each possible design for correctness and efficiency. Consider the hardware environment.

Four Key Factors – Efficiency (**) – Simplicity (*) – Portability (*) – Scalability (***)

What’s Not Parallel – Having a baby. – Algorithms, functions, or procedures with persistent state. – Recurrence relations using data from loop t in loop t+1. If it’s loop t+k, you can ‘unwind’ the loop for some parallelism. – Induction variables incremented non-linearly with each loop pass. – Reductions transforming a vector to a value. – Loop-carried dependence, where data generated in a previous loop iteration is used in the current iteration.
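
The remark about loop t+k can be illustrated with a recurrence of the form x[t] = x[t-k] + b[t]. This is a sketch under my own assumptions (the recurrence and the function names are invented for illustration): with distance k, the iterations split into k chains that never read each other's values, so each chain can be given to its own thread.

```python
import threading


def recurrence_sequential(x, b, k):
    # Plain loop: iteration t depends on iteration t-k (loop-carried dependence).
    out = list(x)
    for t in range(k, len(out)):
        out[t] = out[t - k] + b[t]
    return out


def recurrence_by_chains(x, b, k):
    # Unwound version: chain s handles t = s+k, s+2k, ... and touches
    # only indices congruent to s modulo k, so chains are independent.
    out = list(x)

    def run_chain(start):
        for t in range(start + k, len(out), k):
            out[t] = out[t - k] + b[t]

    threads = [threading.Thread(target=run_chain, args=(s,)) for s in range(k)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return out
```

With k = 1 there is a single chain and nothing to gain, which is the t to t+1 case in the list above. Within each chain the work is still strictly sequential.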

Modelling Massive Parallelism Eventually, you’ll be asked to model a massively parallel system, consisting of about 10,000 workstations communicating with a flight database. You may be tempted to define 10,000 threads, each modelling a workstation. Don’t go there. Why? Because threads take up storage and have overhead. Also, operating systems limit the number of threads a process can create, and scheduling that many threads is expensive even where the limit allows it. There’s a better way, called ‘event-stepped simulation’.

Approach Treat each workstation thread as a task. For each, keep track of what is next to be done and when. Define a simulation thread that works with a priority queue. It also keeps track of a clock. The priority queue maintains task actions in time order. The simulation thread asks the priority queue for the next action, updates the clock to the time of that action, performs any associated commands, and files the task’s next action(s) in the priority queue, scheduled for their action times. We will explore this next week in the tutorial.
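
The loop described above can be sketched in a few lines of Python using `heapq` as the priority queue. This is an illustration under simplifying assumptions, not the tutorial code: every workstation performs one hypothetical action (a database query) every `period` ticks, times are integers, and the name `simulate` is my own.

```python
import heapq
import itertools


def simulate(num_workstations, period, end_time):
    tie = itertools.count()          # tie-breaker for actions at the same time
    pending = []                     # priority queue of (time, tie, workstation)
    for ws in range(num_workstations):
        # Stagger the first action of each workstation across one period.
        heapq.heappush(pending, (ws % period, next(tie), ws))

    clock = 0
    queries = [0] * num_workstations
    while pending and pending[0][0] <= end_time:
        when, _, ws = heapq.heappop(pending)   # ask the queue for the next action
        clock = when                           # update the clock to its time
        queries[ws] += 1                       # perform the associated command
        # File this workstation's next action at its next action time.
        heapq.heappush(pending, (when + period, next(tie), ws))
    return clock, queries


if __name__ == "__main__":
    final_clock, per_ws = simulate(10_000, 60, 600)
    print(final_clock, sum(per_ws))
```

One thread and one queue entry per workstation are enough to model all 10,000 workstations; the cost per action is a heap push and pop rather than a thread context switch.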

Conclusion Take out a piece of paper. Write down: – What’s working. – What isn’t. – What you would do differently. Hand it in. I’ll go over the comments next lecture. Note next lecture looks at some code, and there are no slides.
