Execution Replay for Multiprocessor Virtual Machines
George W. Dunlap, Dominic Lucchetti, Michael A. Fetterman, Peter M. Chen


1 Execution Replay for Multiprocessor Virtual Machines
  George W. Dunlap, Dominic Lucchetti, Michael A. Fetterman, Peter M. Chen

2 Big ideas
  Detection and replay of memory races is possible on commodity hardware
  Overhead high for some workloads
  …but surprisingly low for other workloads

3 Execution Replay
  (Diagram labels: CPU, Memory, Disk, Network, Keyboard and mouse, Interrupts)

4 Uses of Execution Replay
  Reconstructing state
    – Fault tolerance
  Reconstructing execution
    – Debugging
    – Realistic trace generation
  Both
    – Intrusion analysis

5 Single-processor Replay
  Basic principles well understood
    – Log all non-deterministic inputs
    – Timing of asynchronous events
  Minimal overhead (Dunlap02)
    – 13% worst case
    – Log for months or years
  Available commercially
    – VMware: Record/Replay

6 Replay for Multiprocessors
  Memory races in multiprocessor VMs
  The Ordering Requirement
  The CREW Protocol
    – Implementing with page protections
    – Relation to the Ordering Requirement
    – Generating constraints from CREW events
  DMA-capable devices and CREW
  Performance

7 The Multiprocessor Challenge
  Interleaved reads and writes
    – Fine-grained non-determinism
    – Much more difficult
  Existing solutions
    – Hardware modification
    – Software instrumentation
  SMP-ReVirt
    – Hardware MMU to detect sharing

8 Multiprocessor Replay
  (Diagram: processors P1 and P2 sharing memory; labels n=3, n=5, if (n<4))

9 Ordering Memory Accesses
  Preserving order will reproduce execution
    – a→b: “a happens-before b”
    – Ordering is transitive: a→b, b→c means a→c
  Two instructions must be ordered if:
    – they both access the same memory, and
    – one of them is a write
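The ordering rule on this slide can be sketched as a small predicate. The `Access` record and its field names are illustrative assumptions, not names from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    addr: int        # memory location touched (hypothetical field)
    is_write: bool   # True for a write, False for a read


def must_order(x: Access, y: Access) -> bool:
    """Two accesses need an ordering if they touch the same memory
    and at least one of them is a write."""
    return x.addr == y.addr and (x.is_write or y.is_write)
```

Two reads of the same location never need an ordering, which is what lets the concurrent-read state work without logging anything.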

10 Constraints: Enforcing order
  To guarantee a→d:
    – a→d
    – b→d
    – a→c
    – b→c
  Suppose we need b→c
    – b→c is necessary
    – a→d is redundant
  (Diagram labels: P1, P2, a, b, c, d, “overconstrained”)
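Why a→d becomes redundant once b→c is logged can be checked with a small reachability test. This is a sketch that assumes P1 executes a then b and P2 executes c then d; the slide text does not state that layout explicitly, and the function name is made up for illustration.

```python
def happens_before(edges, src, dst):
    """Return True if dst is reachable from src by following
    happens-before edges, i.e. src→dst holds by transitivity."""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        for a, b in edges:
            if a == node and b not in seen:
                if b == dst:
                    return True
                seen.add(b)
                stack.append(b)
    return False


# Assumed program order: P1 runs a then b, P2 runs c then d.
program_order = [("a", "b"), ("c", "d")]

# Logging only b→c already implies a→d: a→b→c→d.
implied = happens_before(program_order + [("b", "c")], "a", "d")
```

Under the same assumption, any one of the four listed constraints is enough to guarantee a→d, but only b→c also guarantees b→c itself, so it is the one worth logging.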

11 CREW Protocol
  Each shared object in one of two states:
    – Concurrent-Read: all processors can read, none can write
    – Exclusive-Write: one processor (the owner) can read and write; others have no access

12 CREW Protocol, cont'd
  Enforced with hardware MMU
    – Read/write
    – Read-only
    – None
  Change CREW states on demand
    – Fault, fixup, re-execute
  CREW event
    – Increasing or reducing permission due to CREW state changes
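The two CREW states and the on-demand transitions can be modeled for a single page as below. This is a minimal sketch, not the SMP-ReVirt implementation: the class and constant names are hypothetical, and it assumes a read fault always returns the page to concurrent-read for every processor.

```python
NONE, READ, READ_WRITE = "none", "read-only", "read/write"


class CrewPage:
    """Per-page CREW permissions for a fixed number of processors."""

    def __init__(self, ncpus):
        # Start in concurrent-read: everyone may read, no one may write.
        self.perm = [READ] * ncpus

    def access(self, cpu, is_write):
        """Apply one access and return the CREW events it causes,
        as (cpu, old_permission, new_permission) tuples."""
        needed = READ_WRITE if is_write else READ
        if self.perm[cpu] == READ_WRITE or self.perm[cpu] == needed:
            return []  # permission already sufficient: no fault
        if is_write:
            # Exclusive-write: the faulting cpu owns the page, others get none.
            new = [READ_WRITE if p == cpu else NONE
                   for p in range(len(self.perm))]
        else:
            # Concurrent-read: every processor gets read-only.
            new = [READ] * len(self.perm)
        events = [(p, old, cur)
                  for p, (old, cur) in enumerate(zip(self.perm, new))
                  if old != cur]
        self.perm = new  # the "fixup"; the access is then re-executed
        return events


page = CrewPage(2)
write_events = page.access(1, True)   # P2 writes a concurrent-read page
```

Each returned tuple corresponds to one CREW event in the slide's sense: a permission increase or reduction on some processor.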

13 CREW Property
  If two instructions on different processors:
    – access the same page,
    – and one of them is a write,
    – there will be a CREW event on each processor between them.

14 Generating Constraints
  State: Concurrent Read
    – All processors read-only
  d*: CREW fault
  New state: P2 Exclusive
  r: privilege reduction
    – Read to None
  i: privilege increase
    – Read to Read/write
  Log timing of r and i
  Constraint:
    – r → i
  (Diagram labels: P1, P2, a, d, r, i, d*)
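The step from CREW events to a logged constraint might look like the following for the case on this slide, a write fault on a page in concurrent-read. It is a sketch under stated assumptions: "timing" is represented as a per-processor position counter, and the function name and tuple layout are invented for illustration.

```python
def write_fault_constraints(positions, faulting_cpu):
    """Return the r → i constraints for a write fault on a
    concurrent-read page.

    positions[cpu] is that processor's current point in its own
    instruction stream (an assumed stand-in for the logged timing).
    Every other processor takes a privilege reduction r at its current
    position; the faulting processor takes the increase i at its own.
    """
    i_event = (faulting_cpu, positions[faulting_cpu])
    return [((cpu, pos), i_event)
            for cpu, pos in enumerate(positions)
            if cpu != faulting_cpu]


# P1 (cpu 0) is at position 120 and P2 (cpu 1) faults at position 45.
constraints = write_fault_constraints([120, 45], faulting_cpu=1)
```

During replay, each pair would be read as "the second processor may not pass its recorded position until the first has reached its own".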

15 Direct Memory Access
  Device accesses memory directly
  Logically another processor
    – Reads and writes need to be ordered
    – IOMMU: can’t fault/fixup/re-execute
  Observation: Transaction model
  Device: non-preemptible actor

16 Prototype: SMP-ReVirt
  Modified Xen hypervisor
  Implement logging, CREW protocol
  Details in paper

17 Evaluation questions
  What is the overhead?
  What affects performance?
    – In paper
  When might I want to use MP?
    – Log with 1, 2, or N cpus?

18 Evaluation Workloads
  SPLASH2 parallel application suite
    – FMM, LU, ocean, radix, water-spatial, radiosity
  Kernel-build
  Dbench

19 Predicting results
  Key changes in sharing attributes
    – 4096-byte sharing granularity
    – “Miss” is very expensive
  SPLASH2
    – Good: high spatial locality / low false sharing
    – Bad: random access patterns / high false sharing
  The Linux kernel
    – Tuned to 16-byte cacheline
    – Involving the kernel may be expensive
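The effect of moving from cache-line to page granularity can be illustrated with a short check. The helper name and the example addresses are assumptions; the two sizes are the ones quoted on the slide.

```python
PAGE_SIZE = 4096   # CREW sharing granularity, from the slide
LINE_SIZE = 16     # cache-line size as quoted on the slide


def same_unit(addr_a, addr_b, unit):
    """True if both addresses fall in the same unit-sized, unit-aligned block."""
    return addr_a // unit == addr_b // unit


# Two variables 2 KiB apart: far apart for the cache, neighbours for CREW.
a, b = 0x1000, 0x1800
shares_line = same_unit(a, b, LINE_SIZE)
shares_page = same_unit(a, b, PAGE_SIZE)
```

Data laid out to avoid false sharing between cache lines can therefore still collide at page granularity, and each such collision costs a page fault rather than a cache miss.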

20 Single-processor Xen guests

21 Log Growth Rate
  Workload        Log growth (GB/day)   Days to fill 300GB
  FMM             0.234                 1280
  LU              0.237                 1261
  Ocean           0.232                 1295
  Radix           0.292                 1025
  Water-spatial   0.232                 1296
  Kernel-build    0.564                 531
  Radiosity       0.231                 1295
  Dbench          0.557                 538
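The "Days to fill 300GB" column is the disk capacity divided by the daily growth rate. A quick sketch of that arithmetic (the function name is illustrative):

```python
def days_to_fill(capacity_gb, growth_gb_per_day):
    """Days until a log growing at a constant rate fills the disk."""
    return capacity_gb / growth_gb_per_day


dbench_days = days_to_fill(300, 0.557)   # Dbench row: a little over 538 days
```

The tabulated values agree with this formula up to rounding of the displayed growth rates.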

22 2-processor Xen guests

23 2-processor Xen guests, cont'd

24 Log Growth Rate
  Workload        Log growth (GB/day)   Days to fill 300GB
  FMM             34.5                  8.7
  LU              3.2                   92.7
  Ocean           4.3                   69.1
  Radix           39.8                  7.5
  Water-spatial   36.3                  8.25
  Kernel-build    43.3                  6.9
  Radiosity       88.4                  3.4
  Dbench          77.0                  3.9

25 4-processor Xen guests

26 Recap
  Memory races in multiprocessor VMs
  The Ordering Requirement
  The CREW Protocol
    – Implementing with page protections
    – Relation to the Ordering Requirement
    – Generating constraints from CREW events
  DMA-capable devices and CREW
  Performance

27 Big ideas
  Detection and replay of memory races is possible on commodity hardware
  Overhead high for some workloads
  …but surprisingly low for other workloads

28 Questions

