© 2009 RoCk SOLid KnOwledge 1 Architecting for multiple cores Andrew Clymer

2 Objectives. Why should I care about parallelism? Types of parallelism. Introduction to parallel patterns.

3 The free lunch is over. Herb Sutter, Dr Dobb’s Journal, 30th March 2005.

4 Multi core architectures. Symmetric multiprocessors (SMP): shared memory; easy to program, but scalability suffers from memory bus contention. Distributed memory: hard to program against (message passing model); scales well when latency is very low or message exchange is low. Grid: heterogeneous, message passing, no central control. Hybrid: many SMPs networked together.

5 The continual need for speed. For a given time frame X: data volume increases (more detail in games, Intellisense); better answers are expected (speech recognition, route planning, Word gets better at grammar checking).

6 Expectations. Amdahl’s Law: if only n% of the compute can be parallelised, at best you can get an n% reduction in time taken. Example: 10 hours of compute, of which 1 hour cannot be parallelised. The minimum time to run is 1 hour plus the time taken for the parallel part, so you can only ever achieve a maximum 10x speed-up. Not even the most perfect parallel algorithms scale linearly, because of scheduling overhead and the overhead of combining results.
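The 10x ceiling in the example follows from the usual form of Amdahl's Law, speed-up = 1 / (serial fraction + parallel fraction / cores). A minimal Python sketch of that arithmetic (the function name and the core counts are illustrative, not from the slides):

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speed-up when only parallel_fraction of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# The slide's example: 10 hours of compute, 1 hour of it strictly serial,
# so 9/10 of the work can be parallelised.
PARALLEL_FRACTION = 9 / 10

for cores in (2, 8, 256):
    # The printed speed-up approaches, but never reaches, 10x.
    print(cores, round(amdahl_speedup(PARALLEL_FRACTION, cores), 2))
```

The model ignores scheduling and result-combining overhead, so real speed-ups will sit below these figures.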

7 Types of parallelism: coarse grain. Large tasks with little scheduling or communication overhead. Each service request is executed in parallel; parallelism is realised by many independent requests. DoX, DoY, DoZ and combine results, where X, Y and Z are large independent operations. Applications: web page rendering, servicing multiple web requests, calculating the payroll.

8 Types of parallelism: fine grain. A single activity broken down into a series of small co-operating parallel tasks. f(n=0...X) can possibly be performed by many tasks in parallel for given values of n. Input => Step 1 => Step 2 => Step 3 => Output. Applications: sensor processing, route planning, graphical layout, fractal generation (Trade Show Demo).

9 Moving to a parallel mind set (thinking parallel). Identify areas of code that could benefit from concurrency, keeping Amdahl’s law in mind. Adapt algorithms and data structures to assist concurrency, for example lock free data structures. Utilise a parallel framework: Microsoft Parallel Extensions (.NET 4), OpenMP, Intel Threading Building Blocks.

10 Balancing safety and concurrency. Reduce shared data: maximising “free running” produces scalability. Optimise shared data access: select the cheapest form of synchronization; consider lock free data structures; if a larger proportion of time is spent reading than writing, consider optimising for reads.

11 Parallel patterns: Fork/Join, Pipeline, Geometric Decomposition, Parallel loops, Monte Carlo.

12 Fork/Join. Examples: Book Hotel, Book Car, Book Flight; Sum X + Sum Y + Sum Z.

    // Fork: start three independent tasks (SumX, SumY and SumZ are assumed to return int).
    Task<int> sumX = Task.Factory.StartNew<int>(SumX);
    Task<int> sumY = Task.Factory.StartNew<int>(SumY);
    Task<int> sumZ = Task.Factory.StartNew<int>(SumZ);

    // Join: reading Result blocks until the corresponding task has finished.
    int total = sumX.Result + sumY.Result + sumZ.Result;

    Parallel.Invoke(BookHotel, BookCar, BookFlight);
    // Won’t get here until all three tasks are complete

13 Pipeline. Use when an algorithm has strict ordering (first in, first out) and large sequential steps that are hard to parallelise. Each stage of the pipeline represents a sequential step, and the stages run in parallel with each other. Queues connect tasks to form the pipeline. Example: signal processing.
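The "queues connect tasks" idea can be sketched in Python as one thread per stage with a FIFO queue between neighbouring stages. This is a minimal sketch, not the .NET implementation behind the slides; `stage`, `run_pipeline` and the sentinel object are hypothetical names introduced for illustration.

```python
import queue
import threading

_DONE = object()  # sentinel that tells each stage the input has ended


def stage(func, inbox, outbox):
    """Run one sequential step: take items in FIFO order, pass results on."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # propagate shutdown to the next stage
            return
        outbox.put(func(item))


def run_pipeline(samples, steps):
    """Connect the steps with queues and run every stage on its own thread."""
    queues = [queue.Queue() for _ in range(len(steps) + 1)]
    workers = [
        threading.Thread(target=stage, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(steps)
    ]
    for worker in workers:
        worker.start()

    for sample in samples:
        queues[0].put(sample)
    queues[0].put(_DONE)

    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)

    for worker in workers:
        worker.join()
    return results
```

Because each stage is a single thread reading from a FIFO queue, output order matches input order. In CPython, threads show the structure rather than a CPU-bound speed-up.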

14 Pipeline. Advantages: easy to implement. Disadvantages: in order to scale you need as many stages as cores. Hybrid approach: further parallelise a given stage to utilise more cores.

15 Geometric Decomposition. Task parallelism is one approach, data parallelism is another. Decompose the data into smaller subsets, have identical tasks work on the smaller data sets, then combine the results.

16 Geometric Decomposition: edge complications. Each task runs on its own sub-region, but tasks often need to share edge data. Updating and reading that edge data needs to be synchronised with neighbouring tasks. [Diagram: sub-regions labelled Task 1 to Task 6]
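A minimal Python sketch of geometric decomposition on a one-dimensional array, assuming a simple three-cell neighbour sum as the per-cell work (the stencil and all function names are illustrative, not from the slides). Each region reads one edge cell from each neighbouring region. Here every task reads from an unmodified input and writes to its own output list, so no locking is needed; an in-place update would need the synchronisation the slide describes.

```python
from concurrent.futures import ThreadPoolExecutor


def smooth_region(data, start, stop):
    """Compute cells [start, stop); each cell reads its left and right neighbour.

    At the ends of a region those neighbours belong to the adjacent region:
    this is the shared edge data.
    """
    n = len(data)
    return [
        data[max(i - 1, 0)] + data[i] + data[min(i + 1, n - 1)]
        for i in range(start, stop)
    ]


def smooth_sequential(data):
    """Reference result: the whole array treated as a single region."""
    return smooth_region(data, 0, len(data))


def smooth_decomposed(data, regions):
    """Split the index range into contiguous regions, one task per region."""
    n = len(data)
    bounds = [(r * n // regions, (r + 1) * n // regions) for r in range(regions)]
    with ThreadPoolExecutor(max_workers=regions) as pool:
        parts = pool.map(lambda b: smooth_region(data, b[0], b[1]), bounds)
        # Combine: concatenate the per-region results in region order.
        return [value for part in parts for value in part]
```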

17 Parallel loops. Loops can be successfully parallelised if: the order of execution is not important; each iteration is independent of the others; each iteration has sufficient work to compensate for scheduling overhead. If there is insufficient work per loop iteration, consider creating an inner loop.
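The "inner loop" advice amounts to scheduling chunks of iterations rather than single iterations. A minimal Python sketch under that assumption (`parallel_for`, `work` and the chunk size are hypothetical names, not an existing library API):

```python
from concurrent.futures import ThreadPoolExecutor


def work(i: int) -> int:
    """Stand-in for a cheap, independent loop body."""
    return i * i


def parallel_for(count, body, chunk_size):
    """Run body(i) for i in range(count), scheduling one task per chunk."""

    def run_chunk(start):
        stop = min(start + chunk_size, count)
        # Inner sequential loop: many iterations share one scheduling cost.
        return [body(i) for i in range(start, stop)]

    with ThreadPoolExecutor() as pool:
        chunks = pool.map(run_chunk, range(0, count, chunk_size))
        return [result for chunk in chunks for result in chunk]
```

For example, `parallel_for(100_000, work, 25_000)` schedules 4 tasks instead of 100,000. The results come back in index order, although the chunks themselves may execute in any order.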

18 Cost of Loop Parallelisation. Task scheduling cost = 50, iteration cost = 1, total iterations = 100,000. Sequential time would be 100,000, assuming a fixed cost per iteration.

Time = Number of Tasks * (Task Scheduling Cost + (Iteration Cost * Iterations Per Task)) / Number of Cores

    Number of Cores   Iterations Per Task   Number of Tasks   Time
    2                 100                   1000              75000
    2                 50000                 2                 50050
    4                 100                   1000              37500
    4                 25000                 4                 25050
    256               100                   1000              586
    256               391                   256               440
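The table can be reproduced from the cost formula, with the per-task cost multiplied by the number of tasks and divided across the cores. A small Python sketch (the function name is illustrative):

```python
def parallel_loop_time(cores, tasks, iterations_per_task,
                       scheduling_cost=50, iteration_cost=1):
    """Estimated run time: total task cost spread evenly across the cores."""
    per_task = scheduling_cost + iteration_cost * iterations_per_task
    return tasks * per_task / cores


# (cores, iterations per task, number of tasks) for each row of the table.
ROWS = [
    (2, 100, 1000),
    (2, 50_000, 2),
    (4, 100, 1000),
    (4, 25_000, 4),
    (256, 100, 1000),
    (256, 100_000 / 256, 256),  # about 391 iterations per task
]

for cores, per_task, tasks in ROWS:
    print(cores, round(per_task), tasks, parallel_loop_time(cores, tasks, per_task))
```

The last row matches the slide's figure of 440 when the unrounded 100,000 / 256 iterations per task is used and the result is truncated; with exactly 391 iterations per task the formula gives 441.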

19 Travelling Salesman problem. Calculate the shortest path between a series of towns, starting and finishing at the same location. Possible combinations: n factorial, so 15 cities produce 1307674368000 combinations. 8 cores took approx. 18 days to calculate the exact shortest route. Estimate: 256 cores would take approx. 0.5 days.
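A minimal Python sketch of the exact (brute force) approach, to show where the factorial growth comes from. It is not the implementation timed on the slide; the function names and the four-city distance matrix are made up for illustration. The sketch fixes city 0 as the start, so it enumerates (n-1)! tours rather than n!.

```python
import math
from itertools import permutations


def tour_length(tour, dist):
    """Length of the closed tour, including the leg back to the start."""
    return sum(dist[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))


def shortest_tour(dist):
    """Try every ordering of the remaining cities and keep the shortest."""
    n = len(dist)
    candidates = ((0,) + rest for rest in permutations(range(1, n)))
    best = min(candidates, key=lambda tour: tour_length(tour, dist))
    return best, tour_length(best, dist)


# Four cities on the corners of a unit square (diagonal legs cost 2).
SQUARE = [
    [0, 1, 2, 1],
    [1, 0, 1, 2],
    [2, 1, 0, 1],
    [1, 2, 1, 0],
]

print(shortest_tour(SQUARE))
print(math.factorial(15))  # the slide's count for 15 cities
```

The 256-core estimate is consistent with assuming linear scaling: 18 days * 8 / 256 is roughly 0.56 days.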

20 Monte Carlo (thinking out of the box). Some algorithms, even after heavy parallelisation, are too slow. Some answer in time T is often better than a perfect answer in T+. Examples: real time speech processing, route planning. Machines can make educated guesses too.

21 Monte Carlo: single core

22 Monte Carlo: 8 cores

23 Monte Carlo: average % error over 1-8 cores

24 Monte Carlo. Each core runs independently, making guesses. It scales: the more cores, the greater the chance of a perfect guess. As we approach 256 cores it looks very attractive. Strategies: pure random; hybrid (random + logic).
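A minimal Python sketch of the "pure random" strategy applied to a travelling-salesman style problem: each worker guesses tours independently with its own random seed, and the best guess across all workers wins. The function names, seeds and distance matrix are assumptions made for illustration, and threads stand in for cores.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def tour_length(tour, dist):
    """Length of the closed tour, including the leg back to the start."""
    return sum(dist[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))


def random_search(dist, guesses, seed):
    """One worker: try random tours and remember the shortest one seen."""
    rng = random.Random(seed)  # independent generator per worker
    cities = list(range(1, len(dist)))
    best_len, best_tour = float("inf"), None
    for _ in range(guesses):
        rng.shuffle(cities)
        tour = [0] + cities  # new list each time, always starting at city 0
        length = tour_length(tour, dist)
        if length < best_len:
            best_len, best_tour = length, tour
    return best_len, best_tour


def monte_carlo_tsp(dist, workers, guesses_per_worker):
    """Run the workers independently and keep the best guess overall."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(
            lambda seed: random_search(dist, guesses_per_worker, seed),
            range(workers),
        )
        return min(results, key=lambda result: result[0])
```

The workers share nothing until the final comparison, which is why this pattern scales with core count. The answer is a good guess within a fixed budget, not a guaranteed optimum.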

25 Summary. The free lunch is over. Think concurrent. Test on a variety of hardware platforms. Utilise a framework and patterns to produce scalable solutions, now and for the future. Andrew Clymer /Blogs /Screencasts
