DTHREADS: Efficient Deterministic Multithreading


1 DTHREADS: Efficient Deterministic Multithreading
Tongping Liu, Charlie Curtsinger, and Emery D. Berger. Dept. of Computer Science, University of Massachusetts, Amherst. Presented by: Lokesh Gidra

2 Concurrent Programming is hard!
Prone to deadlocks and race conditions. Thread interleavings are non-deterministic → hard to debug! A deterministic multithreading (DMT) system eliminates this non-determinism. Same program with same input → same result. Simplifies debugging. Simplifies record and replay (eliminates the need to track memory operations). Enables multiple replicated executions for fault tolerance.

3 Contributions
DTHREADS guarantees deterministic execution. Straightforward deployment: replaces libpthread. No recompilation required. Eliminates cache-line false sharing (as a side effect). Makes printf debugging practical!

4 Basic Idea
Isolate memory accesses between threads. Replace threads with processes: pthread_create() is replaced with the clone system call. Memory-mapped files are used to share memory (globals and the heap). [Figure: a heap shared by Thread 1 and Thread 2]
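The isolation idea on this slide can be illustrated with two mappings of the same file: one shared, one private (copy-on-write). This is only a minimal sketch in Python, run in a single process rather than with clone-created processes; the function name `private_write_then_commit` is hypothetical, and the explicit copy at the end merely stands in for whatever commit step the real system performs.

```python
import mmap
import tempfile

PAGE = mmap.PAGESIZE


def private_write_then_commit():
    """Write through a private mapping, then publish the bytes to the shared one."""
    with tempfile.TemporaryFile() as f:
        # Backing file that plays the role of the memory-mapped heap.
        f.write(b"\x00" * PAGE)
        f.flush()

        # Shared view: writes here reach the file and every other shared view.
        shared = mmap.mmap(f.fileno(), PAGE, access=mmap.ACCESS_WRITE)
        # Private view: copy-on-write, so writes stay local to this mapping.
        private = mmap.mmap(f.fileno(), PAGE, access=mmap.ACCESS_COPY)

        private[0:5] = b"hello"          # isolated update
        before = bytes(shared[0:5])      # shared state has not changed yet

        shared[0:5] = private[0:5]       # assumed commit step: publish the update
        after = bytes(shared[0:5])

        private.close()
        shared.close()
        return before, after
```

Calling `private_write_then_commit()` returns the shared bytes before and after the explicit publish, which shows that a write to the private view is invisible to other views until it is copied over.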

5 Fence and Global Token

6 Commit Protocol
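Only the title of this slide survives in the transcript. The Take Away slide says Dthreads uses twin pages and diff-ing to commit changes, so the following is a minimal sketch of that general technique, not the authors' implementation. The function names `begin_transaction` and `commit` are hypothetical, pages are modelled as byte arrays, and commits are assumed to be applied one at a time in a fixed order.

```python
def begin_transaction(shared_page):
    """Snapshot a page: the twin stays unmodified, the working copy is written to."""
    twin = bytes(shared_page)          # immutable copy of the page as first seen
    working = bytearray(shared_page)   # private copy the thread modifies
    return twin, working


def commit(shared_page, twin, working):
    """Copy only the bytes this thread changed (working != twin) into the shared page."""
    changed = 0
    for i in range(len(twin)):
        if working[i] != twin[i]:
            shared_page[i] = working[i]
            changed += 1
    return changed
```

Because each commit writes back only the bytes that differ from the twin, two threads that modified different bytes of the same page can both commit without overwriting each other's updates.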

7 Deterministic Synchronization (Global token is the key!)
Locks: if the lock is held by another thread, pass the token. Release the token only when the lock count is 0.
Condition variables: pthread_cond_wait removes the calling thread from the token queue and adds it to the variable's queue; pthread_cond_signal removes the first thread from the variable's queue and adds it to the token queue.
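The queue movements described on this slide can be sketched as a small single-threaded simulation. The class `TokenScheduler` and its method names are hypothetical; the thread at the front of the token queue is assumed to hold the token, and a signalled thread is assumed to rejoin at the back of the token queue (the slide does not say where it is inserted).

```python
from collections import deque


class TokenScheduler:
    """Toy model: the thread at the front of token_queue holds the global token."""

    def __init__(self, thread_ids):
        self.token_queue = deque(thread_ids)
        self.cond_queues = {}   # condition variable name -> waiting thread ids
        self.lock_owner = {}    # lock name -> owning thread id
        self.lock_count = {}    # thread id -> number of locks currently held

    def holder(self):
        return self.token_queue[0] if self.token_queue else None

    def pass_token(self):
        # Move the current holder to the back; the next thread gets the token.
        self.token_queue.rotate(-1)

    def lock(self, name):
        tid = self.holder()
        owner = self.lock_owner.get(name)
        if owner is not None and owner != tid:
            self.pass_token()            # held by someone else: pass the token
            return False
        self.lock_owner[name] = tid
        self.lock_count[tid] = self.lock_count.get(tid, 0) + 1
        return True

    def unlock(self, name):
        tid = self.holder()
        del self.lock_owner[name]
        self.lock_count[tid] -= 1
        if self.lock_count[tid] == 0:    # release the token only at lock count 0
            self.pass_token()

    def cond_wait(self, cond):
        # Token holder leaves the token queue and joins the variable's queue.
        tid = self.token_queue.popleft()
        self.cond_queues.setdefault(cond, deque()).append(tid)

    def cond_signal(self, cond):
        # First waiter leaves the variable's queue and rejoins the token queue.
        waiters = self.cond_queues.get(cond)
        if waiters:
            self.token_queue.append(waiters.popleft())
```

Since every transition depends only on queue contents, replaying the same sequence of calls always yields the same token order, which is the property the slide attributes to the global token.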

8 Contd…
Barriers (similar to condition variables): if not the last thread to enter, move self from the token queue to the barrier queue; otherwise, move all threads from the barrier queue to the token queue.
Thread creation: the child is placed on the token queue and waits for the parallel phase.
Thread exit/cancellation: remove the thread from the queue, then call pthread_exit()/kill().
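The barrier rule above can be sketched with two queues. The function `barrier_wait` and its `parties` parameter are hypothetical names; the caller is assumed to be the thread at the front of the token queue, and released threads are assumed to rejoin behind the last arriver in their arrival order.

```python
from collections import deque


def barrier_wait(token_queue, barrier_queue, parties):
    """Apply the barrier rule for the thread at the front of token_queue.

    Returns True if this call released the barrier, False if the caller blocked.
    """
    tid = token_queue[0]
    if len(barrier_queue) + 1 < parties:
        # Not the last to enter: move self from the token queue to the barrier queue.
        token_queue.popleft()
        barrier_queue.append(tid)
        return False
    # Last to enter: move every waiter from the barrier queue to the token queue.
    token_queue.extend(barrier_queue)
    barrier_queue.clear()
    return True
```

With three threads queued as 0, 1, 2 and `parties=3`, the first two calls park threads 0 and 1, and the third call leaves the token queue as 2, 0, 1.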

9 Memory Allocation and OS Support
Assign a sub-heap to each thread using a deterministic thread index. Superblocks are allocated under locks → deterministic. Intercepts system calls that affect program execution (like sigwait). Intercepts read/write system calls: touches the pages first to trigger copy-on-write and avoid a segfault.
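To see why per-thread sub-heaps make allocation addresses deterministic, consider a minimal sketch of a bump allocator indexed by thread. The class `SubHeapAllocator`, its parameters, and the addresses are all hypothetical, and superblocks are left out entirely.

```python
class SubHeapAllocator:
    """Toy bump allocator with one fixed-size sub-heap per thread index."""

    def __init__(self, base, subheap_size, num_threads):
        self.subheap_size = subheap_size
        # Sub-heap i starts at a fixed address derived only from the thread index.
        self.starts = [base + i * subheap_size for i in range(num_threads)]
        self.next_free = list(self.starts)

    def malloc(self, thread_index, size):
        addr = self.next_free[thread_index]
        if addr + size > self.starts[thread_index] + self.subheap_size:
            raise MemoryError("sub-heap exhausted")
        self.next_free[thread_index] = addr + size
        return addr
```

Each thread's addresses depend only on its own index and its own sequence of requests, so the result is the same however the threads' calls are interleaved.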

10 Performance
Measured on an 8-core machine with 16GB RAM and a 4MB L2 cache. Benchmarks from the PARSEC and Phoenix suites. For 9 of 14 benchmarks, Dthreads runs nearly as fast as or faster than pthreads, while providing determinism.

11 Scalability
Scales nearly as well as or better than pthreads. Almost always scales as well as or better than CoreDet.

12

13 Limitations
Incurs substantial overhead for apps with a large number of short-lived transactions or of modified pages per transaction. No control over external non-determinism. Apps using ad-hoc synchronization are not supported. Sharing of stack variables is not supported. Increases the program's memory footprint. Will perform poorly if #threads > #cores.

14 Personal Observations (side-effects on NUMA systems)
Substantially reduces TLB miss cost. For 64-bit apps, one TLB miss costs ~1500 cycles with pthreads versus ~500 cycles with Dthreads. Diff-ing will be too expensive: 4K as compared to just a few cache lines.

15 Take Away
Deterministic multithreaded systems are good. Dthreads: an easy-to-deploy DMT system. Supports all pthread APIs. Replaces threads with processes for memory isolation. Uses twin pages and diff-ing to commit changes. Avoids cache-line false sharing. Good for apps with fewer transactions. Or can we say, for scalable apps? Doesn't support ad-hoc synchronization.

16 Optimizations
Lazy commit. Lazy twin creation and diff elimination. Single-threaded execution. Lock ownership. Parallelization.

