DATA STRUCTURES OPTIMISATION FOR MANY-CORE SYSTEMS Matthew Freeman | Supervisor: Maciej Golebiewski CSIRO Vacation Scholar Program 2013-14.


1 DATA STRUCTURES OPTIMISATION FOR MANY-CORE SYSTEMS Matthew Freeman | Supervisor: Maciej Golebiewski CSIRO Vacation Scholar Program 2013-14

2 The Multi-core Age. Mobile phone: 2-4 cores. PC: 4-16 cores. Intel Xeon Phi: 61 cores. CSIRO ‘Bragg’ compute cluster: 2048 cores.

3 Programming for multi-cores. Divide the problem: the problem is split across CPU cores 1 to 4, and each core executes its own share of the machine instructions.
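As a rough sketch of "divide the problem", the loop below is shared among cores using an OpenMP pragma. The function name `parallel_sum` and the choice of summing a vector are illustrative assumptions, not taken from the slides.

```cpp
#include <cstddef>
#include <vector>

// Sums a vector by dividing the loop iterations among the available cores.
// With OpenMP enabled (for example -fopenmp on GCC or Clang), each thread
// sums a share of the elements and the reduction clause combines the
// partial sums. Without OpenMP the pragma is ignored and one core does
// all the work, giving the same result.
long long parallel_sum(const std::vector<int>& values) {
    long long total = 0;
    const long long n = static_cast<long long>(values.size());
    #pragma omp parallel for reduction(+:total)
    for (long long i = 0; i < n; ++i) {
        total += values[static_cast<std::size_t>(i)];
    }
    return total;
}
```

Each thread only touches its own slice of the input here, so no locking is needed; the difficulty discussed in the later slides appears once threads must update the same structure.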

4 Amdahl's Law. The maximum speedup depends on the percentage of the problem that can run in parallel. Single-core processor: 1x speed. Maximum speedup by parallel fraction: 50%: 2x. 75%: 4x. 90%: 10x. 95%: 20x.
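The maximum speedups on this slide follow from the usual statement of Amdahl's Law. In the worked sketch below, the symbols p (the fraction of the work that can run in parallel) and N (the number of cores) are chosen here for illustration and do not appear on the slides.

```latex
S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}},
\qquad
S_{\max} = \lim_{N \to \infty} S(N) = \frac{1}{1 - p}

p = 0.50:\; S_{\max} = \frac{1}{0.50} = 2
\qquad
p = 0.75:\; S_{\max} = \frac{1}{0.25} = 4
\qquad
p = 0.90:\; S_{\max} = \frac{1}{0.10} = 10
\qquad
p = 0.95:\; S_{\max} = \frac{1}{0.05} = 20
```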

5 Data structures. Memory (data) is still a shared resource. Single-core computer: one CPU core accesses memory (data). 4-core computer: four CPU cores share the same memory (data).

6 Linked-list (Stack) Data Structure. A “node” holds data and a link to the next data point. TOP points to the first node (Data A); the last link is EMPTY.
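A minimal sketch of this structure in C++ could look as follows. The names `Node`, `Stack`, `data`, `next` and `top` are assumptions made for illustration; the slides do not show the project's actual types.

```cpp
#include <string>

// One node of the linked-list stack: a piece of data plus a link to the
// next node.
struct Node {
    std::string data;
    Node* next;   // nullptr plays the role of "EMPTY" on the slide
};

// The whole stack is reachable from a single pointer to the top node.
struct Stack {
    Node* top = nullptr;   // an empty stack has no top node
};
```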

7 Add new item (Push). We want to add a chunk of data (Data B) to the structure. Current structure: TOP -> Data A -> EMPTY.

8 Add new item (Push). Steps for new Data B: 1) Find the start of the structure (TOP).

9 Add new item. Steps for new Data B: 2) Link into the structure, so that Data B links to Data A.

10 Add new item. Steps for new Data B: 3) Update TOP. Result: TOP (new) -> Data B -> Data A -> NULL.

11 Resulting structure. Like stacking dinner plates. Only need to keep track of where TOP is to access the rest. TOP -> Data -> Data -> NULL.
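The three push steps from the preceding slides can be sketched for a single thread as below. The type and member names are illustrative assumptions, and this version is deliberately not safe for use by several threads at once.

```cpp
#include <string>

struct Node {
    std::string data;
    Node* next;
};

struct Stack {
    Node* top = nullptr;

    // Push for a single thread, following the three steps on the slides.
    void push(const std::string& data) {
        Node* old_top = top;                    // 1) find the start (TOP)
        Node* node = new Node{data, old_top};   // 2) link the new node in
        top = node;                             // 3) update TOP
    }

    // Frees every node; walking from TOP reaches the whole structure.
    ~Stack() {
        while (top != nullptr) {
            Node* next = top->next;
            delete top;
            top = next;
        }
    }
};
```

Pushing "Data A" and then "Data B" leaves TOP at Data B, which links to Data A, which links to nothing.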

12 What happens in multi-core systems? Two threads try to operate on the stack structure: Thread 1 attempts at time T. Thread 2 attempts at time T + 1 nanosecond. Because each of the steps takes time to complete, errors occur.

13 What happens in multi-core systems? This causes the steps to interleave: Thread 1 reads TOP (1). Thread 2 reads TOP (1). Thread 1 sets the next pointer (2). Thread 2 sets the next pointer (2). Thread 1 updates TOP (3). Thread 2 updates TOP (3).

14 Thread 1 links Data B to Data A and Thread 2 links Data C to Data A, but TOP ends up pointing at Data C only. Data B is lost forever because it is not linked to TOP anymore (stack failure).
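Because a real race depends on timing, the sketch below replays the interleaving from the slides on a single thread so that the outcome is always the same. The function name `replay_lost_update` and the local variable names are assumptions made for illustration.

```cpp
#include <string>

struct Node {
    std::string data;
    Node* next;
};

// Replays the interleaved steps of two pushes (Data B by "Thread 1",
// Data C by "Thread 2") in a fixed order and returns the final TOP.
Node* replay_lost_update(Node* data_a, Node* data_b, Node* data_c) {
    Node* top = data_a;          // initial stack: TOP -> Data A

    Node* seen_by_t1 = top;      // Thread 1 reads TOP (1)
    Node* seen_by_t2 = top;      // Thread 2 reads TOP (1), same value
    data_b->next = seen_by_t1;   // Thread 1 sets the next pointer (2)
    data_c->next = seen_by_t2;   // Thread 2 sets the next pointer (2)
    top = data_b;                // Thread 1 updates TOP (3)
    top = data_c;                // Thread 2 updates TOP (3), overwriting it

    return top;                  // TOP -> Data C -> Data A
}
```

After the replay, Data B still links to Data A, but nothing reachable from TOP links to Data B, so the push made by Thread 1 has been lost.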

15 How do we fix this? Use “data locks”. Protect the 3 steps. One thread at a time is granted access to the stack. It completes an operation and releases the lock. This is the standard approach for multithreaded structures.

16 Locks. Easy to use. 2 lines of code added to fix: get lock; steps 1, 2, 3; release lock. × Slow. Only one thread at a time can hold the lock. This becomes sequential code. This is the code that cannot run in parallel. Analogy: merging highway traffic into a single lane.
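A lock-protected stack along these lines could be sketched with `std::mutex`. The class name `LockedStack` and its members are illustrative assumptions; the slides do not say which lock the project used. Here `std::lock_guard` acquires the lock on construction and releases it when the function returns, so "get lock" and "release lock" collapse into one line.

```cpp
#include <mutex>
#include <string>

struct Node {
    std::string data;
    Node* next;
};

class LockedStack {
public:
    void push(const std::string& data) {
        std::lock_guard<std::mutex> guard(lock_);   // get lock
        Node* old_top = top_;                       // step 1
        Node* node = new Node{data, old_top};       // step 2
        top_ = node;                                // step 3
    }                                               // lock released here

    // Moves the top item into `out`; returns false if the stack is empty.
    bool pop(std::string& out) {
        std::lock_guard<std::mutex> guard(lock_);
        if (top_ == nullptr) return false;
        Node* node = top_;
        top_ = node->next;
        out = node->data;
        delete node;
        return true;
    }

    ~LockedStack() {
        std::string ignored;
        while (pop(ignored)) {}
    }

private:
    std::mutex lock_;
    Node* top_ = nullptr;
};
```

Every push and pop waits for the same mutex, which is why this part of the program behaves like sequential code however many cores are available.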

17 Lock-free. New method: lock-free data structure. A special low-level instruction lets the update of TOP take effect as one atomic step: TOP changes only if no other thread has modified it since it was read. This removes the need for locks. The instruction is called a Compare-Exchange.
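In standard C++ the compare-exchange operation is exposed through `std::atomic`. The small sketch below shows its behaviour on an integer; the helper name `try_update` is an assumption made for illustration.

```cpp
#include <atomic>

// Tries to change `value` from `expected` to `desired` in one atomic step.
// Returns true if the swap happened, false if `value` held something else.
bool try_update(std::atomic<int>& value, int expected, int desired) {
    // On failure compare_exchange_strong writes the current value into
    // `expected`; that is a local copy here, so the caller is unaffected.
    return value.compare_exchange_strong(expected, desired);
}
```

If the atomic holds 1, `try_update(v, 1, 2)` succeeds and stores 2; a second call that still expects 1 fails and leaves the value at 2.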

18 Lock-free. Downside: writing lock-free code is difficult (hence the project). The Compare-Exchange operation forms the base for writing lock-free code. The project takes specifications from research papers to implement.
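As a hedged sketch of how compare-exchange can replace the lock, the push below retries until TOP is updated without interference. This is a common textbook pattern, not necessarily the design the project implemented, and the class name `LockFreeStack` is an assumption. Pop is left out on purpose: removing nodes safely needs extra care around memory reclamation, which is part of what makes lock-free code difficult.

```cpp
#include <atomic>
#include <string>

struct Node {
    std::string data;
    Node* next;
};

class LockFreeStack {
public:
    void push(const std::string& data) {
        // Steps 1 and 2: read TOP and link the new node to it.
        Node* node = new Node{data, top_.load()};
        // Step 3: move TOP to the new node only if TOP still equals
        // node->next. On failure compare_exchange_weak stores the current
        // TOP into node->next, so the loop retries with a fresh link.
        while (!top_.compare_exchange_weak(node->next, node)) {
        }
    }

    // Counts the nodes. Only meaningful once all pushes have finished.
    int size() const {
        int count = 0;
        for (Node* n = top_.load(); n != nullptr; n = n->next) ++count;
        return count;
    }

    // Frees every node. Must not run while other threads still push.
    ~LockFreeStack() {
        Node* n = top_.load();
        while (n != nullptr) {
            Node* next = n->next;
            delete n;
            n = next;
        }
    }

private:
    std::atomic<Node*> top_{nullptr};
};
```

In the interleaving shown earlier, the second thread's compare-exchange would fail because TOP no longer matches what it read, so it would relink and retry instead of overwriting the first thread's push.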

19 Lock-free. Implemented a range of lock-free optimizations for the stack. Used open coding standards (C++, OpenMP). Benchmarked on an Intel Xeon Phi 61-core processor. The lock-free structure performed about 2x better for pure stack operations.
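The project's benchmark code is not shown in the slides. The sketch below is only one assumed way to time pure push operations with C++ and OpenMP; `time_parallel_pushes` and the integer payload are hypothetical, and no timings are claimed for it.

```cpp
#include <atomic>
#include <chrono>

struct Node {
    int value;
    Node* next;
};

class LockFreeStack {
public:
    void push(int value) {
        Node* node = new Node{value, top_.load()};
        // Retry until TOP is moved to the new node without interference.
        while (!top_.compare_exchange_weak(node->next, node)) {
        }
    }

    // Counts the nodes. Only meaningful once all pushes have finished.
    int size() const {
        int count = 0;
        for (Node* n = top_.load(); n != nullptr; n = n->next) ++count;
        return count;
    }

    ~LockFreeStack() {
        Node* n = top_.load();
        while (n != nullptr) {
            Node* next = n->next;
            delete n;
            n = next;
        }
    }

private:
    std::atomic<Node*> top_{nullptr};
};

// Pushes `count` items and returns the elapsed wall-clock time in seconds.
// With OpenMP enabled the iterations are shared among threads; without it
// the pragma is ignored and the loop runs on a single thread.
double time_parallel_pushes(LockFreeStack& stack, int count) {
    const auto start = std::chrono::steady_clock::now();
    #pragma omp parallel for
    for (int i = 0; i < count; ++i) {
        stack.push(i);
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Checking that the stack holds exactly `count` nodes afterwards is a cheap sanity test that no push was lost; running the same loop against a lock-based stack would give the comparison point.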

20 Summary. Amdahl’s Law shows that it’s important to optimize sequential sections of code. The shared data structures are often sequential bottlenecks. Implementing lock-free data structures reduced this bottleneck.

