Michael Bond (Ohio State)
Milind Kulkarni (Purdue)
Concurrent software is nondeterministic
  Record & replay: more important & harder
Offline replay
  Reproduce production bugs
Online replay
  Replication-based fault tolerance
  Offloading of security events
Nondeterministic thread interleavings:
  Synchronization
  Data races
Detects races; high overhead: [LeBlanc & Mellor-Crummey ’87]
Custom hardware support: [FDR] [Rerun] [Strata] [DeLorean] [MRR]
Doesn’t support offline or online replay: [Respec] [ODR] [PRES] [Weeratunge et al. ’10]
DoublePlay: extra cores; doesn’t scale well
Don’t track conflicting dependences
Every access might conflict: need instrumentation at every access
  T1:  if o.lastAccess != T1 …
       o.lastAccess = T1
       write o.f
  T2:  if o.lastAccess != T2 …
       o.lastAccess = T2
       read o.f
  T1's write followed by T2's read is a conflicting access
  The check, update, and access need synchronization: need synchronization at every access
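The per-access check and update above can be sketched in Java as follows. This is a minimal illustration, not the authors' implementation: the names LastAccessTracker, barrier, and LastAccessDemo are hypothetical, and an atomic getAndSet stands in for whatever synchronization a real system would use. The sketch does not make the barrier atomic with the field access itself; it only shows that this scheme pays for an atomic operation on every access, even when the same thread touches the object repeatedly.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical object with one field f and last-accessor metadata. */
final class LastAccessTracker {
    // Thread that performed the most recent access; null before the first one.
    private final AtomicReference<Thread> lastAccess = new AtomicReference<>();
    // Number of accesses whose previous accessor was a different thread.
    private final AtomicInteger dependences = new AtomicInteger();
    private volatile int f;

    /** Runs before every access: atomically swaps in the current thread. */
    private void barrier() {
        Thread me = Thread.currentThread();
        Thread prev = lastAccess.getAndSet(me); // atomic operation on EVERY access
        if (prev != null && prev != me) {
            // A recorder would log the cross-thread dependence prev -> me here.
            dependences.incrementAndGet();
        }
    }

    void write(int value) { barrier(); f = value; }
    int read() { barrier(); return f; }
    int dependences() { return dependences.get(); }
}

final class LastAccessDemo {
    /** Runs main-thread and helper-thread accesses in a fixed order. */
    static int run() {
        LastAccessTracker o = new LastAccessTracker();
        o.write(1);                      // first access: no previous accessor
        o.write(2);                      // same thread: no dependence
        Thread t = new Thread(o::read);  // other thread: dependence main -> t
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        o.read();                        // back on main: dependence t -> main
        return o.dependences();
    }

    public static void main(String[] args) {
        System.out.println("cross-thread dependences: " + run());
    }
}
```

In this run only two of the four accesses involve another thread, yet all four execute the atomic swap.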
Every access might conflict: need synchronization at conflicting accesses only, not at every access
  T1:  … safe point: …
  T2:  if o.lastAccess != T2 …
       o.lastAccess = T2
       read o.f
Every access might conflict: need synchronization at conflicting accesses only, not at every access
  T1:  if o.state != WrEx_T1 …
       write o.f
  T2:  if o.state not in { WrEx_T2, RdEx_T2, RdSh } …
       read o.f
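The state checks above can be sketched in Java as a fast path that only reads the state and a slow path taken when the state must change. This is a simplified illustration under stated assumptions: the names OwnedObject, slowPath, and OwnershipDemo are hypothetical, the specific state transitions are an assumption, and the slow path is a plain synchronized update. A real system would first coordinate with the previous owner (the slides mention safe points), so this sketch is only safe when threads take turns, as the demo does.

```java
/** Hypothetical object with one field f and an ownership state. */
final class OwnedObject {
    enum Kind { WR_EX, RD_EX, RD_SH }

    /** Immutable state: kind plus owning thread (null for RD_SH). */
    static final class State {
        final Kind kind;
        final Thread owner;
        State(Kind kind, Thread owner) { this.kind = kind; this.owner = owner; }
    }

    // Assumption: the allocating thread starts as the exclusive writer.
    private volatile State state = new State(Kind.WR_EX, Thread.currentThread());
    private int f;
    private int slowPaths; // guarded by "this"

    void write(int value) {
        Thread me = Thread.currentThread();
        State s = state;
        if (!(s.kind == Kind.WR_EX && s.owner == me)) { // o.state != WrEx_me
            slowPath(new State(Kind.WR_EX, me));
        }
        f = value;
    }

    int read() {
        Thread me = Thread.currentThread();
        State s = state;
        // Fast path if the state is WrEx_me, RdEx_me, or RdSh.
        if (!(s.kind == Kind.RD_SH || s.owner == me)) {
            // Assumed transitions: RdEx of another thread becomes RdSh;
            // WrEx of another thread becomes RdEx_me.
            slowPath(s.kind == Kind.RD_EX
                    ? new State(Kind.RD_SH, null)
                    : new State(Kind.RD_EX, me));
        }
        return f;
    }

    /** Simplified: a real system would coordinate with the old owner first. */
    private synchronized void slowPath(State next) {
        state = next;
        slowPaths++;
    }

    synchronized int slowPaths() { return slowPaths; }
}

final class OwnershipDemo {
    private static void runInThread(Runnable body) {
        Thread t = new Thread(body);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Seven accesses, one thread at a time; returns the slow-path count. */
    static int run() {
        OwnedObject o = new OwnedObject();          // WrEx_main
        o.write(1);                                 // fast
        o.write(2);                                 // fast
        runInThread(() -> { o.read(); o.read(); }); // slow (to RdEx), then fast
        runInThread(o::read);                       // slow (RdEx to RdSh)
        o.read();                                   // fast (RdSh)
        o.write(3);                                 // slow (RdSh to WrEx_main)
        return o.slowPaths();
    }
}
```

Only the three accesses that change the state take the slow path; the other four perform a plain read of the state and no atomic operation.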
Every access might conflict: need synchronization at conflicting accesses only, not at every access
  Related to locality & ownership tracking: [Shasta] [Biased locking] [von Praun & Gross ’01] [CoreDet?] [IBM’s STM?]
T1:  … safe point
T2:  if o.state = … …
     read o.f
Happens-before: T1's safe point, then T2's read
  Record dynamic program location
  Increment counter (T1); wait for counter (T2)
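One way to picture the "increment counter / wait for counter" pairing is the Java sketch below, in which the source thread of a happens-before edge increments a counter and the sink thread waits until the counter reaches a logged value. The names ReplayCounterDemo, sourceCounter, and recordedValue are hypothetical, the logged value is hard-coded rather than read from a real log, and the yield-based spin loop is a simplification.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class ReplayCounterDemo {
    /** Enforces one edge (source's write before sink's read); returns the value read. */
    static int run() {
        final AtomicInteger sourceCounter = new AtomicInteger(); // hypothetical per-thread counter
        final int recordedValue = 1;     // stand-in for a value taken from the log
        final int[] shared = new int[1]; // plays the role of o.f
        final int[] seen = new int[1];

        Thread sink = new Thread(() -> {
            // Wait for counter: block until the source has passed its logged point.
            while (sourceCounter.get() < recordedValue) {
                Thread.yield();
            }
            seen[0] = shared[0];         // read o.f
        });
        Thread source = new Thread(() -> {
            shared[0] = 42;              // write o.f
            sourceCounter.incrementAndGet(); // increment counter: releases the sink
        });

        sink.start();                    // started first on purpose
        source.start();
        try {
            sink.join();
            source.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("sink read: " + run());
    }
}
```

Even though the sink starts first, it cannot read until the source has incremented the counter, and the atomic increment followed by the atomic read orders the plain write before the plain read.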
T1:  sync (o) { write o.f }
T2:  sync (o) { read o.f }
Happens-before: T1's sync block, then T2's sync block
Non-conflicting accesses: very fast
  Static analysis
Conflicting accesses: not too slow
  Pessimistic concurrency?
Controlling other sources of nondeterminism:
  I/O
  Low-level VM concurrency
  Timer-based sampling
Record vs. replay
  Different heap layouts lead to different hash codes
  Deterministic hash codes?
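As one possible reading of "deterministic hash codes", the Java sketch below derives each object's hash from a logical thread id and a per-thread allocation sequence number instead of anything tied to heap layout. This is an assumption for illustration, not necessarily what the authors do: DeterministicHash, Node, and the logical thread id parameter are hypothetical, and the mixing constant is arbitrary.

```java
import java.util.Arrays;

/** Hypothetical source of layout-independent hash codes. */
final class DeterministicHash {
    // Per-thread count of objects hashed so far (starts at 0 in every thread).
    private static final ThreadLocal<int[]> NEXT_SEQ =
            ThreadLocal.withInitial(() -> new int[1]);

    /** Hash depends only on the logical thread id and allocation order. */
    static int next(int logicalThreadId) {
        int seq = NEXT_SEQ.get()[0]++;
        return logicalThreadId * 0x9E3779B9 + seq; // arbitrary mixing constant
    }
}

/** Example class whose hashCode ignores the object's address. */
class Node {
    private final int hash;
    Node(int logicalThreadId) { hash = DeterministicHash.next(logicalThreadId); }
    @Override public int hashCode() { return hash; }
}

final class DeterministicHashDemo {
    /** Allocates three nodes in a fresh thread and returns their hash codes. */
    static int[] allocateInFreshThread(int logicalThreadId) {
        int[] hashes = new int[3];
        Thread t = new Thread(() -> {
            for (int i = 0; i < hashes.length; i++) {
                hashes[i] = new Node(logicalThreadId).hashCode();
            }
        });
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return hashes;
    }

    /** Two "runs" with the same logical thread id yield the same hashes. */
    static boolean run() {
        return Arrays.equals(allocateInFreshThread(1), allocateInFreshThread(1));
    }
}
```

The two fresh threads stand in for a recorded run and a replayed run: the objects land at different addresses, but the hash sequences match because they depend only on allocation order within the logical thread.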
Software record & replay by tracking conflicting dependences
  Optimistic concurrency control
  Performance & replayability challenges
  Apply concurrency control mechanism to other problems?