Rules for Designing Multithreaded Applications CET306 Harry R. Erwin University of Sunderland
Texts Clay Breshears (2009) The Art of Concurrency: A Thread Monkey's Guide to Writing Parallel Applications, O'Reilly Media. Mordechai Ben-Ari (2006) Principles of Concurrent and Distributed Programming, Second edition, Addison-Wesley.
Signup for Individual Feedback Choose a 15 minute slot with your regular tutor. There are several questions: – What is your approach (overview)? (4 marks) – How are you testing it? (4 marks) – How are you solving the first half—threading? (4 marks) – How are you solving the second half—cleaning up the overlapping reservations? (4 marks) – Looking back, what surprises have you encountered? (4 marks) Mark scale: 0—non-engagement; 1—serious problems; 2—average; 3—good; 4—professional.
Eight Rules of Concurrent Design 1. Identify Truly Independent Computations 2. Implement Concurrency at the Highest Level Possible 3. Plan Early for Scalability to Take Advantage of Increasing Numbers of Cores 4. Make Use of Thread-Safe Libraries Wherever Possible 5. Use the Right Threading Model 6. Never Assume a Particular Order of Execution 7. Use Thread-Local Storage Whenever Possible or Associate Locks to Specific Data 8. Dare to Change the Algorithm for a Better Chance of Concurrency. Concurrent programming remains more art than science!
Identify Truly Independent Computations You cannot execute something concurrently unless the operations in each thread are independent of each other! Review the list in “What’s Not Parallel.” (Next Slide)
What’s Not Parallel – Having a baby. – Algorithms, functions, or procedures with persistent state. – Recurrence relations that use data from iteration t in iteration t+1. If the data are instead used in iteration t+k for k>1, you can ‘unwind’ the loop for some parallelism. – Induction variables incremented non-linearly with each loop pass. – Reductions computing a single value from a vector. – Loop-carried dependence: data generated in some previous loop iteration is used in the current iteration.
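One informal check for independence is to ask whether the loop still gives the same answer when its iterations are visited in a different order. The sketch below is illustrative only; the function names `scale` and `running_total` and the `order` parameter are assumptions made for this example, not part of the course material.

```python
def scale(values, factor, order):
    """Independent iterations: out[i] depends only on values[i]."""
    out = [0] * len(values)
    for i in order:                      # any visiting order gives the same result
        out[i] = values[i] * factor
    return out


def running_total(values, order):
    """Loop-carried dependence: out[i] needs out[i - 1] to be finished first."""
    out = list(values)
    for i in order:
        if i > 0:
            out[i] = out[i - 1] + values[i]
    return out


data = [1, 2, 3, 4]
forward = [0, 1, 2, 3]
backward = [3, 2, 1, 0]

print(scale(data, 10, forward) == scale(data, 10, backward))  # True
print(running_total(data, forward))    # [1, 3, 6, 10]
print(running_total(data, backward))   # [1, 3, 5, 7], no longer a running total
```

The first loop is a candidate for threading; the second is not, at least not in this form.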
Implement Concurrency at the Highest Level Possible Suppose you have serial code and wish to thread it. You can work top-down or bottom-up. In your initial analysis, you’re looking for the hot-spots that, when run in parallel, give you the best performance. In bottom-up, you start with the hot-spots and move up. In top-down, you consider the whole application and break it down. Placing concurrency at the highest possible level breaks the program up into naturally independent threads of work that are unlikely to share data. This provides structure for your more detailed threading.
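As a rough sketch of what threading at the outer level can look like, the example below hands each worker a whole row and leaves the inner loop serial. The names `smooth_row` and `smooth_image` and the moving-average task are assumptions chosen for illustration. Note also that in standard CPython, threads do not speed up CPU-bound loops like this one, so the sketch shows the structure rather than a measured gain.

```python
from concurrent.futures import ThreadPoolExecutor


def smooth_row(row):
    """Serial inner loop: 3-point moving average over one row (ends copied)."""
    out = list(row)
    for j in range(1, len(row) - 1):
        out[j] = (row[j - 1] + row[j] + row[j + 1]) / 3
    return out


def smooth_image(rows, max_workers=4):
    """Concurrency at the outer level: one task per row, nothing shared between tasks."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(smooth_row, rows))


print(smooth_image([[0, 3, 6], [1, 1, 1]]))
```

Threading the inner loop instead would create many tiny tasks that all read neighbouring elements of the same row, which is more overhead for less independent work.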
Plan Early for Scalability (Taking Advantage of the Added Cores) The number of cores will only increase. Plan for it. This is not Moore’s Law: the speed-up does not happen in the background; you have to make it happen. Scalability is the ability of your application to take advantage of increases in system resources (cores, memory, bus performance). Data decomposition methods give more scalable solutions. (Hint!) Note the project exploits data decomposition.
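A minimal sketch of data decomposition that scales with the machine: the number of chunks is derived from the worker count instead of being written into the code. The helper names `chunk_bounds` and `scale_all` are hypothetical, and `os.cpu_count()` is used here only as a reasonable default.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(n_items, n_workers):
    """Split range(n_items) into n_workers contiguous, near-equal (start, stop) pairs."""
    base, extra = divmod(n_items, n_workers)
    bounds, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)   # spread the remainder over the first chunks
        bounds.append((start, start + size))
        start += size
    return bounds


def scale_all(data, factor, n_workers=None):
    """Each worker transforms its own slice; slices never overlap."""
    n_workers = n_workers or os.cpu_count() or 1

    def work(bounds):
        lo, hi = bounds
        return [x * factor for x in data[lo:hi]]

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(work, chunk_bounds(len(data), n_workers)))
    return [x for part in parts for x in part]


print(chunk_bounds(10, 3))   # [(0, 4), (4, 7), (7, 10)]
```

The same code runs unchanged with 1, 4, or 64 workers; only the chunk boundaries move.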
Make Use of Thread-Safe Libraries Wherever Possible Don’t reinvent the wheel, especially when it’s complicated. Many libraries already take advantage of multicore processors – Intel Math Kernel Library (MKL) – Intel Integrated Performance Primitives (IPP) Even more important—all library calls used should be thread-safe. Check the library documentation. In your own libraries—routines should be reentrant.
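To illustrate leaning on a thread-safe library rather than hand-rolled synchronisation, the sketch below uses Python's standard `queue.Queue`, which is documented as safe for multiple producers and consumers. The function name `collect_squares` and the squaring task are assumptions for the example.

```python
import queue
import threading


def collect_squares(values, n_threads=4):
    tasks = queue.Queue()      # thread-safe: no explicit lock needed around put/get
    results = queue.Queue()
    for v in values:
        tasks.put(v)

    def worker():
        while True:
            try:
                v = tasks.get_nowait()
            except queue.Empty:
                return             # no work left for this thread
            results.put(v * v)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    out = []
    while not results.empty():     # safe here: all workers have finished
        out.append(results.get())
    return sorted(out)             # arrival order is not guaranteed, so sort


print(collect_squares([3, 1, 2]))  # [1, 4, 9]
```

Replacing the two queues with plain lists plus ad hoc index bookkeeping would reintroduce exactly the kind of shared-state bug the library already solves.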
Use the Right Threading Model If threaded libraries are not good enough, so that you need to use your own threads, don’t use explicit threads if an implicit threading model is good enough. – OpenMP (data decomposition, threading loops running over large data sets) – Intel Threading Building Blocks Keep it as simple as possible! If you can’t use third party libraries in the deliverable code, prototype with them first and then convert.
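The slide names OpenMP and Threading Building Blocks; as a loose analogue in Python, the sketch below contrasts explicit thread management with an implicit model in which a pool decides how work is mapped to threads. The functions `cube`, `cubes_explicit`, and `cubes_implicit` are illustrative assumptions.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def cube(x):
    return x ** 3


def cubes_explicit(values):
    """Explicit threads: we create, start, and join every thread ourselves."""
    out = [None] * len(values)

    def worker(i):
        out[i] = cube(values[i])       # each thread writes only its own slot

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(len(values))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out


def cubes_implicit(values):
    """Implicit model: state what to compute; the pool handles the threads."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(cube, values))


print(cubes_explicit([1, 2, 3]) == cubes_implicit([1, 2, 3]))  # True
```

Both produce the same result, but the implicit version has fewer places to make a mistake.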
Never Assume a Particular Order of Execution The execution order of threads is non-deterministic. There is no reliable way of predicting the ordering. If you assume an ordering, you will have data races, particularly when the hardware changes. Let the threads run as fast as possible and design them to be unencumbered. Synchronise only when necessary.
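One way to avoid depending on execution order is to record where each result belongs and place it there whenever it happens to arrive. In this sketch the random sleep in the hypothetical `slow_double` function stands in for unpredictable scheduling; the names are assumptions for illustration.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_double(x):
    time.sleep(random.uniform(0, 0.01))   # simulate unpredictable timing
    return 2 * x


def doubled(values):
    results = [None] * len(values)
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Remember which input position each future belongs to.
        index_of = {pool.submit(slow_double, v): i for i, v in enumerate(values)}
        for future in as_completed(index_of):          # completion order varies run to run
            results[index_of[future]] = future.result()
    return results


print(doubled([1, 2, 3, 4, 5]))   # [2, 4, 6, 8, 10]
```

Appending each result to a list as it completes would give a different ordering from run to run; indexing by position makes the output independent of which thread finishes first.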
Use Thread-Local Storage or Associate Locks to Specific Data Synchronisation costs—don’t do it unless it’s needed for correctness. Use thread-local storage or memory associated with specific threads. Watch out for assumptions on the number of threads—don’t hard-code your design. Avoid frequent shared updates. If you must synchronise, use carefully designed locks, usually one-to-one with data structures or critical clumps of data. Only one lock to a data object. (Document!)
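A small sketch of both halves of this rule: each thread accumulates into a private variable, and the one piece of shared data carries its own lock. The class `Tally` and function `count_evens` are hypothetical names. A plain local variable plays the role of thread-local storage here; Python's `threading.local` would serve for per-thread state that must persist across function calls.

```python
import threading


class Tally:
    """The shared data and the single lock that guards it live together."""

    def __init__(self):
        self._lock = threading.Lock()   # guards self._total and nothing else
        self._total = 0

    def add(self, amount):
        with self._lock:
            self._total += amount

    def value(self):
        with self._lock:
            return self._total


def count_evens(chunks):
    tally = Tally()

    def worker(chunk):
        local = 0                       # private to this thread: no locking needed
        for x in chunk:
            if x % 2 == 0:
                local += 1
        tally.add(local)                # one synchronised update per thread

    # One thread per chunk: the thread count follows the data, not a constant.
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return tally.value()


print(count_evens([[1, 2, 3], [4, 6], [7]]))   # 3
```

Calling `tally.add(1)` inside the inner loop would also be correct, but it would take the lock once per element instead of once per thread.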
Dare to Change the Algorithm for a Better Chance of Concurrency The bottom line is execution time. Analysis usually uses asymptotic performance (big-O notation, to be covered later). However, the best serial algorithm may not be parallelisable. Then consider a suboptimal serial algorithm that you can parallelise. Know where to find a good book on algorithms – Knuth (the ‘Bible’ of algorithm theory) – Sedgewick (3rd or 4th edition, 3rd edition is more advanced)
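A commonly cited illustration of this trade-off, not taken from the slides, is the running total (prefix sum). The one-pass serial version does the least work but every step depends on the previous one. A blocked version does roughly twice the additions, yet most of them fall in blocks that are independent of each other. The names `serial_scan` and `blocked_scan` are assumptions, and in standard CPython the threads show the structure only, not a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import accumulate


def serial_scan(values):
    """Best serial algorithm: one pass, but each step needs the previous total."""
    return list(accumulate(values))


def blocked_scan(values, n_blocks=4):
    """More total work, but passes 1 and 3 are independent per block."""
    n = len(values)
    size = max(1, -(-n // n_blocks))                 # ceiling division
    blocks = [values[i:i + size] for i in range(0, n, size)]
    with ThreadPoolExecutor(max_workers=n_blocks) as pool:
        # Pass 1: running totals inside each block, blocks processed independently.
        local = list(pool.map(serial_scan, blocks))
        # Pass 2: short serial step, offset of a block = sum of all earlier blocks.
        offsets = [0]
        for part in local[:-1]:
            offsets.append(offsets[-1] + part[-1])
        # Pass 3: shift every block by its offset, again independently.
        shifted = list(pool.map(lambda p, off: [x + off for x in p], local, offsets))
    return [x for part in shifted for x in part]


data = list(range(1, 11))
print(blocked_scan(data, 3) == serial_scan(data))   # True
```

Only pass 2 remains serial, and its length is the number of blocks rather than the number of elements.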