Finding Concurrency CET306 Harry R. Erwin University of Sunderland

Roadmap – Design models for concurrent algorithms – Patterns for finding design models – Task decomposition – Data decomposition – What’s not parallel – Conclusions – Feedback opportunity

Texts Clay Breshears (2009) The Art of Concurrency: A Thread Monkey's Guide to Writing Parallel Applications, O'Reilly Media, 304 pages. Timothy G. Mattson, Beverly A. Sanders, and Berna L. Massingill (2005) Patterns for Parallel Programming, Addison-Wesley. Mordechai Ben-Ari (2006) Principles of Concurrent and Distributed Programming, Addison-Wesley. Wolfgang Kreutzer (1986) System Simulation: Programming Styles and Languages, Addison-Wesley.

Resources – Gamma, Helm, Johnson, and Vlissides, 1995, Design Patterns, Addison-Wesley. – The Portland Pattern Repository. – Resources on Parallel Patterns. – Visual Studio 2010 and the Parallel Patterns Library. – Alexander, 1977, A Pattern Language: Towns/Buildings/Construction, Oxford University Press. (For historical interest.)

Flynn’s Taxonomy (from Wikipedia) – SISD (single instruction, single data): a sequential computer with no parallelism in instructions or data. – SIMD (single instruction, multiple data): a parallel computer where a single instruction stream operates on multiple data streams; an array processor or GPU. – MISD (multiple instruction, single data): multiple instruction streams operate on a single data stream. Unusual; used for reliability. The best-known example is the Space Shuttle flight control computer, where the results of each instruction stream must agree. – MIMD (multiple instruction, multiple data): multiple processors operate on multiple data streams; a multicore machine or a distributed system with multiple CPUs.

Finding Concurrency Chapter 2 of Breshears begins by mentioning Mattson et al. (2005). That book defines a pattern language for parallel programming and explores four design spaces where patterns provide solutions. These include: – Finding Concurrency – Algorithm Structure – Supporting Structures – Implementation Mechanisms

Finding Concurrency Design Space This contains six patterns: – Task Decomposition – Data Decomposition – Group Tasks – Order Tasks – Data Sharing – Design Evaluation Breshears (2009) explores the first two patterns in detail. I will summarise all six here and add some slides after the first two to cover Breshears’ points.

Task Decomposition What tasks can execute concurrently to solve the problem? The programmer starts by investigating the computationally intensive parts of the problem, the key data structures, and how the data are used. The tasks may be clear. Your concerns are flexibility, efficiency, and simplicity. Identify lots of tasks—they can be merged later or threads can perform multiple tasks. Look at function calls and loops. Finally look at the data.
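As a concrete illustration of turning independent function calls into tasks, here is a minimal C++ sketch (my own example, not from Breshears); the analysis functions and the data are hypothetical placeholders, and std::async is used so the two calls may run concurrently.

#include <algorithm>
#include <functional>
#include <future>
#include <iostream>
#include <numeric>
#include <vector>

// Two stand-ins for computationally intensive functions with no mutual dependency.
static double analyse_sum(const std::vector<double>& d) {
    return std::accumulate(d.begin(), d.end(), 0.0);
}
static double analyse_max(const std::vector<double>& d) {
    double m = d.empty() ? 0.0 : d.front();
    for (double x : d) m = std::max(m, x);
    return m;
}

int main() {
    std::vector<double> data(1000000, 1.0);
    // The two calls are independent, so each becomes its own task.
    auto sum = std::async(std::launch::async, analyse_sum, std::cref(data));
    auto max = std::async(std::launch::async, analyse_max, std::cref(data));
    std::cout << "sum=" << sum.get() << " max=" << max.get() << "\n";
}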

Possible Ways to Organise Your Tasks Have the main method create and start the threads; it then waits for all tasks to complete and finally generates the results. You can also create and start threads as needed. This is preferred when the need for a thread only becomes clear while the program is running; a recursive or binary search is an example. It’s cheaper not to start a thread if it’s likely you will have to stop it.
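For the second approach (create tasks as they are needed), a hedged sketch of a recursive sum follows; the cutoff constant is an illustrative assumption, not a tuned value.

#include <functional>
#include <future>
#include <iostream>
#include <numeric>
#include <vector>

// Spawn a new task only when the subrange is large enough to justify the overhead;
// otherwise do the work in the calling thread.
long long recursive_sum(const std::vector<int>& v, std::size_t lo, std::size_t hi) {
    const std::size_t cutoff = 100000;              // illustrative threshold
    if (hi - lo <= cutoff)
        return std::accumulate(v.begin() + lo, v.begin() + hi, 0LL);
    std::size_t mid = lo + (hi - lo) / 2;
    // The left half becomes a new task; the right half stays in this thread.
    auto left = std::async(std::launch::async, recursive_sum, std::cref(v), lo, mid);
    long long right = recursive_sum(v, mid, hi);
    return left.get() + right;
}

int main() {
    std::vector<int> v(1000000, 1);
    std::cout << recursive_sum(v, 0, v.size()) << "\n";   // prints 1000000
}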

Content of your Task Decomposition What are the tasks? What are the dependencies between tasks? How are tasks assigned to threads?

Consider Explore concurrent execution of your threads. – Do a desk (manual) simulation, or – Program a simulation. Look for correctness—you want to avoid race conditions and ensure data are shared when required. Look for efficiency—all threads with parallel tasks should be sharing the computer. If threads are blocked, the design is inefficient. Balance your threads. Focus on the resource-intensive parts of the program; often the limiting resources are a surprise. Aim for at least one task per thread or core, and each task should do enough useful work to justify its existence.
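One way to honour the “at least one task per thread or core” guideline is to size the worker pool from the hardware. The sketch below is a minimal illustration, assuming the work each task does is a placeholder.

#include <cstdio>
#include <thread>
#include <vector>

int main() {
    unsigned n = std::thread::hardware_concurrency();
    if (n == 0) n = 2;                        // the call may report 0; fall back to a guess
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < n; ++i)
        workers.emplace_back([i] {
            // placeholder for a task doing a worthwhile amount of work
            std::printf("worker %u running\n", i);
        });
    for (auto& t : workers) t.join();         // main waits for all tasks to complete
}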

Data Decomposition Look for parallelism in the problem’s data. If the most computationally intensive part of the problem involves a large data structure, and the data in the structure can be manipulated in parallel, consider organising your tasks around that manipulation. Consider flexibility, efficiency, and simplicity in your design. Chunk the data so it can be operated on in parallel. Look for array-based processing and recursion. Plan for scalability and efficiency. Finally, look at the tasks.
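A minimal data-decomposition sketch (my own illustration, with an assumed fixed thread count): a large array is chunked into contiguous slices and each thread updates only its own slice, so no locking is needed.

#include <cstddef>
#include <thread>
#include <vector>

int main() {
    std::vector<double> data(1000000, 1.0);
    const unsigned nthreads = 4;                          // illustrative fixed count
    const std::size_t chunk = data.size() / nthreads;

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nthreads; ++t) {
        std::size_t begin = t * chunk;
        std::size_t end = (t == nthreads - 1) ? data.size() : begin + chunk;
        workers.emplace_back([&data, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                data[i] *= 2.0;                           // independent element-wise work
        });
    }
    for (auto& w : workers) w.join();
}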

Possible Ways to Organise your Data Consider the structure of your data, and consider restructuring it to support parallel operations. Arrays are good for data parallelism; divide them along one or more of their dimensions. Fixed-format tables are also good for data parallelism, and statistical data frames lend themselves to parallel algorithms. Lists are good, but only if you have random access to sublists. Load balancing is important.

Consider How do you divide your data into chunks? How do you ensure that the task responsible for a chunk has access to the data it needs to do its job? How are data chunks assigned to threads?

Content of a Data Decomposition Chunking the data: – Individual elements – Rows – Columns – Blocks What do the boundaries between chunks look like? They should have small ‘area’ to minimise interference.
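To make the ‘small area’ point concrete, the arithmetic below compares the boundary sizes of row strips and square blocks for an N×N grid split among p tasks; the grid size and task count are illustrative assumptions.

#include <cmath>
#include <cstdio>

int main() {
    const double N = 1024.0, p = 16.0;                    // assumed grid size and task count
    // Row strips: an interior strip shares two full rows with its neighbours.
    double strip_boundary = 2.0 * N;                      // 2048 elements
    // Square blocks: an interior block of side N/sqrt(p) shares its four edges.
    double block_boundary = 4.0 * (N / std::sqrt(p));     // 1024 elements
    std::printf("strip boundary: %.0f, block boundary: %.0f\n",
                strip_boundary, block_boundary);
}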

Data Synchronisation Consider efficiency. There are two approaches: – Copy the data over before it is needed. (Storage is required, and the data need to be frozen after copying.) – Share the data when it is needed. (Time is required, both to move the data and to wait for the transfer to complete. Locking may be required while the data are used.) Consider how often copying will be needed.
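A hedged sketch of the two approaches, using a mutex-protected vector; the function names and workload are illustrative only.

#include <mutex>
#include <numeric>
#include <vector>

std::vector<double> shared_data(1000, 1.0);
std::mutex data_mutex;

// Approach 1: copy the data over before it is needed.
// Extra storage, but the lock is held only while the snapshot is taken.
double process_from_copy() {
    std::vector<double> local;
    {
        std::lock_guard<std::mutex> lock(data_mutex);
        local = shared_data;                     // snapshot must remain valid for this task
    }
    return std::accumulate(local.begin(), local.end(), 0.0);
}

// Approach 2: share the data when it is needed.
// No extra storage, but the lock is held for the whole computation.
double process_in_place() {
    std::lock_guard<std::mutex> lock(data_mutex);
    return std::accumulate(shared_data.begin(), shared_data.end(), 0.0);
}

int main() {
    return (process_from_copy() == process_in_place()) ? 0 : 1;
}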

Data Scheduling You can assign data to specific threads statically or dynamically. Static assignment is easier to implement; dynamic assignment allows load balancing and supports scalability. Your task may have to wait for another thread to run, so you need to consider dynamic scheduling of tasks, which is messy…
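A sketch of dynamic scheduling under stated assumptions: workers claim chunk indices from an atomic counter, so faster threads simply take more chunks; a static schedule would fix the chunk-to-thread assignment up front instead. The chunk count and per-chunk work are placeholders.

#include <atomic>
#include <thread>
#include <vector>

int main() {
    const int nchunks = 64;
    std::vector<double> results(nchunks, 0.0);
    std::atomic<int> next_chunk{0};

    auto worker = [&] {
        for (;;) {
            int c = next_chunk.fetch_add(1);     // claim the next unprocessed chunk
            if (c >= nchunks) break;
            results[c] = c * 2.0;                // placeholder for real per-chunk work
        }
    };

    std::vector<std::thread> pool;
    for (unsigned i = 0; i < 4; ++i) pool.emplace_back(worker);
    for (auto& t : pool) t.join();
}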

Group Tasks How can tasks be grouped to simplify the management of dependencies? This is done after the task decomposition. If tasks share constraints or are closely related, consider grouping them so that one feeds another or they form a larger task. You want an organised team of tasks, not a large number of individual tasks. Consider the following possibilities: order dependency, simultaneous execution, free concurrency. Look at various possible groupings and organisations.

Order Tasks Given a collection of tasks, in what order must they run? You will need to find and enforce the order dependencies of the system. The order needs to be restrictive enough that the order dependencies are enforced, but no more restrictive than that for maximum efficiency. Consider data ordering and limitations imposed by external services.
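One way to enforce a single order dependency without serialising everything is a promise/future pair, as in the sketch below (the task names are hypothetical): B waits for A's result, while C remains unordered.

#include <cstdio>
#include <future>
#include <thread>

int main() {
    std::promise<int> a_done;
    std::future<int> a_result = a_done.get_future();

    std::thread task_a([&a_done] { a_done.set_value(42); });       // A produces a value
    std::thread task_b([&a_result] {                               // B must follow A
        std::printf("B saw %d\n", a_result.get());
    });
    std::thread task_c([] { std::printf("C runs whenever\n"); });  // C has no ordering constraint

    task_a.join(); task_b.join(); task_c.join();
}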

Data Sharing How should data be shared among the tasks you have defined? Classify data into task-local data and shared data and then define a protocol for data sharing. Consider race conditions and synchronisation overhead. Avoid joins if the threads involved have very different resource requirements or timing. Data can be read-only, effectively-local, or read-write. Look at replication for read-only data. Some read-write data summarise information collected by individual tasks, or data may be modified by a single task. Look at using local copies of these data.
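The ‘local copies of read-write data’ advice can look like the following sketch (illustrative only): each thread accumulates a private partial sum and merges it into the shared total just once, under a brief lock.

#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

int main() {
    std::vector<int> data(1000000, 1);
    long long total = 0;                          // shared, read-write
    std::mutex total_mutex;
    const unsigned nthreads = 4;
    const std::size_t chunk = data.size() / nthreads;

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nthreads; ++t) {
        std::size_t begin = t * chunk;
        std::size_t end = (t == nthreads - 1) ? data.size() : begin + chunk;
        workers.emplace_back([&, begin, end] {
            long long local = 0;                  // task-local copy: no locking in the hot loop
            for (std::size_t i = begin; i < end; ++i) local += data[i];
            std::lock_guard<std::mutex> lock(total_mutex);
            total += local;                       // one brief critical section per task
        });
    }
    for (auto& w : workers) w.join();
    return total == 1000000 ? 0 : 1;
}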

Design Evaluation Time to ask yourself, am I done? Iterate over possible designs to choose the best one. Perhaps prototype the design to gain an understanding of where the time and resources are going. Check each possible design for correctness and efficiency. Consider the hardware environment.

Four Key Factors – Efficiency ** – Simplicity * – Portability * – Scalability ***

What’s Not Parallel – Having a baby. – Algorithms, functions, or procedures with persistent state. – Recurrence relations using data from loop iteration t in iteration t+1 (if it’s iteration t+k, you can ‘unwind’ the loop for some parallelism). – Induction variables incremented non-linearly with each loop pass. – Reductions transforming a vector to a value. – Loop-carried dependence—where data generated in a previous loop iteration is used in the current iteration.
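A short sketch of the last point: the first loop has a loop-carried dependence (each iteration needs the previous result), whereas the second has independent iterations and is a candidate for decomposition.

#include <cstddef>
#include <vector>

int main() {
    const std::size_t n = 1000;
    std::vector<double> a(n, 1.0), b(n, 2.0);

    // Loop-carried dependence: a[i] needs a[i-1] from the previous iteration,
    // so the iterations cannot simply be handed to different threads.
    for (std::size_t i = 1; i < n; ++i)
        a[i] = a[i - 1] + b[i];

    // No dependence between iterations: each update is independent,
    // so this loop can be chunked across threads.
    for (std::size_t i = 0; i < n; ++i)
        b[i] = b[i] * 2.0;
}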

Modelling Massive Parallelism Eventually, you’ll be asked to model a massively parallel system consisting of about 10,000 workstations communicating with a flight database. You may be tempted to define 10,000 threads, each modelling a workstation. Don’t go there. Why? Because threads take up storage and have scheduling overhead, and operating systems cannot deal gracefully with that many threads at once; some older UNIX thread libraries, for example, limited a process to as few as 32 threads. There’s a better way, called ‘event-stepped simulation’.

Approach Treat each workstation thread as a task. For each, keep track of what is next to be done and when. Define a simulation thread that works with a priority queue. It also keeps track of a clock. The priority queue maintains task actions in time order. The simulation thread asks the priority queue for the next action, updates the clock to the time of that action, performs any associated commands, and files the next task action(s) in the priority queue, scheduled for its next action time. We will explore this next week in Tutorial.
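A minimal event-stepped simulation sketch along the lines described above (the workstation count, the action interval, and the ‘queries the database’ action are illustrative assumptions): one simulation thread pops the earliest action, advances the clock, performs the action, and files the task’s next action.

#include <cstdio>
#include <queue>
#include <vector>

struct Event {
    double time;       // simulated time of the action
    int workstation;   // which modelled workstation acts
};

// Order the priority queue so the earliest event is served first.
struct Later {
    bool operator()(const Event& a, const Event& b) const { return a.time > b.time; }
};

int main() {
    std::priority_queue<Event, std::vector<Event>, Later> agenda;
    for (int w = 0; w < 5; ++w)                    // 5 workstations stand in for 10,000
        agenda.push({w * 0.1, w});                 // each schedules its first action

    double clock = 0.0;
    const double end_time = 2.0;
    while (!agenda.empty() && clock < end_time) {
        Event e = agenda.top();
        agenda.pop();
        clock = e.time;                            // advance the clock to this action
        std::printf("t=%.2f: workstation %d queries the database\n", clock, e.workstation);
        agenda.push({clock + 0.5, e.workstation}); // file the workstation's next action
    }
}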

Conclusion Take out a piece of paper. Write down: – What’s working. – What isn’t. – What you would do differently. Hand it in. I’ll go over the comments next lecture. Note that the next lecture looks at some code; there are no slides.