© 2009 Rock Solid Knowledge, http://www.rocksolidknowledge.com

Slide 1: Architecting for Multiple Cores
Andrew Clymer, firstname.lastname@example.org
Slide 2: Objectives
- Why should I care about parallelism?
- Types of parallelism
- Introduction to parallel patterns
Slide 3: The Free Lunch Is Over
Herb Sutter, Dr. Dobb's Journal, 30th March 2005
Slide 4: Multi-Core Architectures
- Symmetric multiprocessors (SMP)
  - Shared memory; easy to program with
  - Scalability issues from memory-bus contention
- Distributed memory
  - Hard to program against; message-passing model
  - Scales well given a very low-latency interconnect or a low rate of message exchange
- Grid
  - Heterogeneous; message passing; no central control
- Hybrid
  - Many SMPs networked together
Slide 5: The Continual Need for Speed
For a given time frame X, data volumes increase:
- More detail in games
- IntelliSense
- Better answers
- Speech recognition
- Route planning
- Word gets better at grammar checking
Slide 6: Expectations
Amdahl's Law: if only n% of the compute can be parallelised, at best you can get an n% reduction in the time taken.
Example: 10 hours of compute, of which 1 hour cannot be parallelised. The minimum run time is 1 hour plus the time taken for the parallel part, so you can only ever achieve a maximum 10x speed-up.
Not even the most perfect parallel algorithms scale linearly:
- Scheduling overhead
- Overhead of combining results
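The limit on this slide can be checked with a few lines of C#. This is a sketch: the `Speedup` helper below is ours (not part of any framework), and it is just Amdahl's law with serial fraction s and n cores.

```csharp
using System;

class AmdahlDemo
{
    // Amdahl's law: with serial fraction s and n cores,
    // speedup = 1 / (s + (1 - s) / n).
    public static double Speedup(double serialFraction, int cores)
    {
        return 1.0 / (serialFraction + (1.0 - serialFraction) / cores);
    }

    static void Main()
    {
        // The slide's example: 10 hours of work, 1 hour serial (s = 0.1).
        Console.WriteLine(Speedup(0.1, 2));    // ~1.82x on 2 cores
        Console.WriteLine(Speedup(0.1, 256));  // ~9.66x on 256 cores
        // Even with infinitely many cores the speedup is capped at 1/s = 10x.
    }
}
```

Note how quickly the serial 10% dominates: 256 cores only get you 9.66x of the theoretical 10x.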
Slide 7: Types of Parallelism (Coarse Grain)
- Large tasks; little scheduling or communication overhead
- Each service request executed in parallel; parallelism realised through many independent requests
- DoX, DoY, DoZ and combine results, where X, Y, Z are large independent operations
Applications:
- Web page rendering
- Servicing multiple web requests
- Calculating the payroll
Slide 8: Types of Parallelism (Fine Grain)
- A single activity broken down into a series of small co-operating parallel tasks
- f(n = 0...X) can possibly be performed by many tasks in parallel for given values of n
- Input => Step 1 => Step 2 => Step 3 => Output
Applications:
- Sensor processing
- Route planning
- Graphical layout
- Fractal generation (trade show demo)
Slide 9: Moving to a Parallel Mindset
- Identify areas of code that could benefit from concurrency, keeping Amdahl's law in mind
- Adapt algorithms and data structures to assist concurrency (e.g. lock-free data structures)
- Utilise a parallel framework:
  - Microsoft Parallel Extensions (.NET 4)
  - OpenMP
  - Intel Threading Building Blocks
Slide 10: Balancing Safety and Concurrency
- Reduce shared data; maximise "free running" to produce scalability
- Optimise shared data access:
  - Select the cheapest form of synchronisation
  - Consider lock-free data structures
  - If a larger proportion of time is spent reading than writing, consider optimising for reads
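The "optimise for reads" point can be sketched with .NET's `ReaderWriterLockSlim` (a real type, available since .NET 3.5). The wrapper class below is illustrative, not from the talk: many readers proceed concurrently, and only writers take the lock exclusively.

```csharp
using System;
using System.Threading;

class ReadMostlyValue
{
    // When reads greatly outnumber writes, ReaderWriterLockSlim lets many
    // readers run concurrently; a plain lock would serialise them all.
    private readonly ReaderWriterLockSlim rwLock = new ReaderWriterLockSlim();
    private int value;

    public int Read()
    {
        rwLock.EnterReadLock();            // shared: many readers at once
        try { return value; }
        finally { rwLock.ExitReadLock(); }
    }

    public void Write(int newValue)
    {
        rwLock.EnterWriteLock();           // exclusive: blocks all readers
        try { value = newValue; }
        finally { rwLock.ExitWriteLock(); }
    }
}
```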
Slide 11: Parallel Patterns
- Fork/join
- Pipeline
- Geometric decomposition
- Parallel loops
- Monte Carlo
Slide 12: Fork/Join
Examples: Book Hotel, Book Car, Book Flight; Sum X + Sum Y + Sum Z

    Task<int> sumX = Task.Factory.StartNew(SumX);
    Task<int> sumY = Task.Factory.StartNew(SumY);
    Task<int> sumZ = Task.Factory.StartNew(SumZ);
    int total = sumX.Result + sumY.Result + sumZ.Result;

    Parallel.Invoke(BookHotel, BookCar, BookFlight);
    // Won't get here until all three tasks are complete
Slide 13: Pipeline
When an algorithm has:
- Strict ordering (first in, first out)
- Large sequential steps that are hard to parallelise
Each stage of the pipeline represents a sequential step and runs in parallel with the other stages; queues connect the tasks to form the pipeline.
Example: signal processing
Slide 14: Pipeline
Advantages:
- Easy to implement
Disadvantages:
- To scale you need as many stages as cores
Hybrid approach:
- Further parallelise a given stage to utilise more cores
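A minimal two-stage pipeline in the style these slides describe, using .NET 4's `BlockingCollection<T>` as the queue between stages. The stages (square each input, then sum the squares) are our own toy example, not from the talk; the point is that both stages run as tasks and overlap in time.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class PipelineDemo
{
    // Stage 1 squares each input; stage 2 sums the squares as they arrive.
    // The BlockingCollection is the queue connecting the two stages.
    public static long Run()
    {
        var queue = new BlockingCollection<int>(boundedCapacity: 16);

        var stage1 = Task.Factory.StartNew(() =>
        {
            for (int i = 1; i <= 100; i++)
                queue.Add(i * i);
            queue.CompleteAdding();          // tell the consumer: no more items
        });

        var stage2 = Task.Factory.StartNew(() =>
        {
            long sum = 0;
            foreach (int square in queue.GetConsumingEnumerable())
                sum += square;               // blocks until items arrive
            return sum;
        });

        stage1.Wait();
        return stage2.Result;                // 1^2 + 2^2 + ... + 100^2 = 338350
    }

    static void Main() => Console.WriteLine(Run());
}
```

The bounded capacity illustrates a practical pipeline detail: it stops a fast producer stage from racing arbitrarily far ahead of a slow consumer.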
Slide 15: Geometric Decomposition
Task parallelism is one approach; data parallelism is another. Decompose the data into smaller subsets, have identical tasks work on the smaller data sets, then combine the results.
Slide 16: Geometric Decomposition
Edge complications:
- Each task runs on its own sub-region, but tasks often need to share edge data
- Updating/reading edge data needs to be synchronised with neighbouring tasks
(Figure: a grid divided into sub-regions for Tasks 1-6, with shared edges between neighbours)
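A sketch of the decompose/work/combine shape on these two slides, for the simplest possible case: summing an array where the sub-regions do not overlap, so the edge-synchronisation problem above does not arise. The helper and class names are ours.

```csharp
using System;
using System.Threading.Tasks;

class GeometricDecomposition
{
    // Decompose an array into contiguous sub-regions, let one task sum each
    // region, then combine the partial results sequentially. Each task touches
    // only its own region, so no locking is needed during the parallel phase.
    public static long ParallelSum(int[] data, int regions)
    {
        long[] partial = new long[regions];
        int chunk = data.Length / regions;

        var tasks = new Task[regions];
        for (int r = 0; r < regions; r++)
        {
            int region = r;                  // capture a per-iteration copy
            int start = region * chunk;
            int end = (region == regions - 1) ? data.Length : start + chunk;
            tasks[region] = Task.Factory.StartNew(() =>
            {
                long sum = 0;
                for (int i = start; i < end; i++)
                    sum += data[i];
                partial[region] = sum;       // each task writes its own slot
            });
        }
        Task.WaitAll(tasks);

        long total = 0;
        foreach (long p in partial) total += p;   // the combine step
        return total;
    }

    static void Main()
    {
        int[] data = new int[1000000];
        for (int i = 0; i < data.Length; i++) data[i] = 1;
        Console.WriteLine(ParallelSum(data, Environment.ProcessorCount)); // 1000000
    }
}
```

In a stencil-style computation (the grid in the figure) each task would additionally need to exchange its edge rows with its neighbours between steps, which is exactly the synchronisation cost the slide warns about.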
Slide 17: Parallel Loops
Loops can be successfully parallelised if:
- The order of execution is not important
- Each iteration is independent of the others
- Each iteration has sufficient work to compensate for the scheduling overhead
If there is insufficient work per loop iteration, consider creating an inner loop.
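The "create an inner loop" advice can be sketched with .NET 4's `Parallel.ForEach` and `Partitioner.Create`, which hand each task a range of indices rather than a single index. The workload (`Math.Sqrt`) is a stand-in of ours for real per-iteration work.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class ParallelLoopDemo
{
    // Each iteration here is very cheap, so scheduling one task per index
    // would be dominated by overhead. The range partitioner gives each task
    // a chunk, and the inner loop amortises the scheduling cost.
    public static double[] ComputeRoots(int count)
    {
        double[] results = new double[count];
        Parallel.ForEach(Partitioner.Create(0, count), range =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
                results[i] = Math.Sqrt(i);   // independent: writes only slot i
        });
        return results;
    }

    static void Main() => Console.WriteLine(ComputeRoots(100000)[81]);  // 9
}
```

Note that the loop meets all three conditions above: order does not matter, iterations are independent (each writes only its own slot), and the chunking supplies the "sufficient work" per scheduled task.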
Slide 18: Cost of Loop Parallelisation
Assume a fixed cost per iteration: task scheduling cost = 50, iteration cost = 1, total iterations = 100,000. Sequential time would be 100,000.

Time = Number of tasks x (Task scheduling cost + Iteration cost x Iterations per task) / Number of cores

Cores | Iterations per task | Number of tasks |   Time
    2 |                 100 |           1,000 | 75,000
    2 |              50,000 |               2 | 50,050
    4 |                 100 |           1,000 | 37,500
    4 |              25,000 |               4 | 25,050
  256 |                 100 |           1,000 |    586
  256 |                 391 |             256 |    440
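The slide's cost model is easy to reproduce; the helper below is ours and simply evaluates the formula above, which matches the table rows.

```csharp
using System;

class LoopCostModel
{
    // Time = tasks * (schedCost + iterCost * itersPerTask) / cores:
    // the total work, including per-task scheduling cost, spread over the cores.
    public static double Time(int tasks, int cores, double schedCost,
                              double iterCost, double itersPerTask)
        => tasks * (schedCost + iterCost * itersPerTask) / cores;

    static void Main()
    {
        // Reproducing rows of the slide's table
        // (scheduling cost 50, iteration cost 1, 100,000 iterations in total).
        Console.WriteLine(Time(1000, 2, 50, 1, 100));   // 75000
        Console.WriteLine(Time(2, 2, 50, 1, 50000));    // 50050
        Console.WriteLine(Time(1000, 256, 50, 1, 100)); // ~586
    }
}
```

The model makes the slide's trade-off concrete: many small tasks pay the scheduling cost many times over (75,000 on 2 cores), while a few large tasks pay it rarely but limit load balancing.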
Slide 19: Travelling Salesman Problem
- Calculate the shortest path between a series of towns, starting and finishing at the same location
- Possible combinations: n factorial; 15 cities produce 1,307,674,368,000 combinations
- 8 cores took approx. 18 days to calculate the exact shortest route
- Estimate: 256 cores would take approx. 0.5 days
Slide 20: Monte Carlo
- Some algorithms, even after heavy parallelisation, are too slow
- Some answer in time T is often better than a perfect answer at some time after T:
  - Real-time speech processing
  - Route planning
- Machines can make educated guesses too
Slide 21: Monte Carlo, Single Core (chart)
Slide 22: Monte Carlo, 8 Cores (chart)
Slide 23: Monte Carlo, Average % Error over 1-8 Cores (chart)
Slide 24: Monte Carlo
- Each core runs independently, making guesses
- Scales: the more cores, the greater the chance of a perfect guess
- As we approach 256 cores this looks very attractive
Strategies:
- Pure random
- Hybrid: random + logic
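The "each core guesses independently" strategy can be sketched with the classic Monte Carlo estimate of pi, our example rather than the talk's demo. Each task throws random darts at the unit square and counts those landing inside the quarter circle; the fraction inside estimates pi/4, and more cores means more samples (so a better estimate) in the same wall-clock time.

```csharp
using System;
using System.Threading.Tasks;

class MonteCarloPi
{
    // Each task samples independently with its own seeded Random; the only
    // coordination is combining the counts at the end, so this scales well.
    public static double Estimate(int taskCount, int samplesPerTask)
    {
        var tasks = new Task<long>[taskCount];
        for (int t = 0; t < taskCount; t++)
        {
            int seed = t;                    // a distinct seed per task
            tasks[t] = Task.Factory.StartNew(() =>
            {
                var rng = new Random(seed);
                long inside = 0;
                for (int i = 0; i < samplesPerTask; i++)
                {
                    double x = rng.NextDouble(), y = rng.NextDouble();
                    if (x * x + y * y <= 1.0) inside++;
                }
                return inside;
            });
        }

        long totalInside = 0;
        foreach (var task in tasks) totalInside += task.Result;
        return 4.0 * totalInside / ((long)taskCount * samplesPerTask);
    }

    static void Main() =>
        Console.WriteLine(Estimate(Environment.ProcessorCount, 1000000)); // ~3.14
}
```

This is the "pure random" strategy from the slide; a hybrid version would bias the guesses with domain logic rather than sampling uniformly.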
Slide 25: Summary
- The free lunch is over: think concurrent
- Test on a variety of hardware platforms
- Utilise a framework and patterns to produce scalable solutions, now and for the future

Andrew Clymer, email@example.com
http://www.rocksolidknowledge.com /Blogs /Screencasts