
UW-Madison Computer Sciences Multifacet Group© 2011 Karma: Scalable Deterministic Record-Replay Arkaprava Basu Jayaram Bobba Mark D. Hill Work done at.




1 Karma: Scalable Deterministic Record-Replay
Arkaprava Basu, Jayaram Bobba, Mark D. Hill
Work done at University of Wisconsin-Madison
UW-Madison Computer Sciences Multifacet Group © 2011

2 Executive summary
Applications of deterministic record-replay
– Debugging
– Fault tolerance
– Security
Existing hardware record-replayers
– Fast record, but
– Slow replay, or
– Require major hardware changes
Karma: faster replay with nearly-conventional h/w
– Extends Rerun
– Records more parallelism

3 Outline
– Background & Motivation
– Rerun Overview
– Karma Insights
– Karma Implementation
– Evaluation
– Conclusion

4 Deterministic Record-Replay
Multithreaded execution is non-deterministic
Deterministic record-replay reincarnates a past execution
– Record: log selected events
– Replay: use the log to reincarnate the past execution
Key challenge: memory races

5 Record-Replay Motivation
Debugging
– Ensures bugs faithfully reappear (no heisenbugs)
Fault tolerance
– Enables a hot backup to shadow the primary server and take over on failure
Security
– Real-time intrusion detection & attack analysis
Replay speed matters

6 Previous work
Record dependence
– Wisconsin Flight Data Recorder [ISCA’03, etc.]: too much state
– UCSD Strata [ASPLOS’06]: log size grows rapidly with the number of cores
Record independence
– UIUC DeLorean [ISCA’08]: non-conventional BulkSC h/w
– Wisconsin Rerun [ISCA’08]: sequential replay
– Intel MRR [MICRO’09]: only for snoop-based systems
– TimeTraveler [ISCA’10]: extends Rerun to lower log size
Our goal
– Retain Rerun’s near-conventional hardware
– Enable faster replay

7 Outline
– Background & Motivation
– Rerun Overview
– Karma Insights
– Karma Implementation
– Evaluation
– Conclusion

8 Rerun’s Recording
Most code executes without races
– Use race-free regions for ordering
Episodes: independent execution regions
– Defined per thread
[Figure: per-thread load (LD) and store (ST) sequences on T0, T1, T2, divided into episodes]
(Partially adapted from the ISCA’08 talk)

9 Rerun’s Recording (Contd.)
Capturing causality:
– Timestamp via Lamport scalar clock [Lamport ’78]
Replay in timestamp order
– Episodes with the same timestamp can be replayed in parallel
[Figure: episodes on threads T0, T1, T2, each labeled with its timestamp (22, 23, 43, 44, 45, 60, 61, 62)]
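A minimal Python sketch of the Lamport-clock rule applied to episodes. It assumes an episode's timestamp is one more than the largest timestamp among its predecessors: the previous episode on the same thread plus any episode it conflicts with. The function `assign_timestamps` and the episode names are hypothetical; Rerun's hardware keeps these clocks per core rather than over an explicit episode list.

```python
from collections import defaultdict


def assign_timestamps(episodes, conflicts):
    """Assign Lamport scalar timestamps to episodes (hypothetical helper).

    episodes:  dict mapping thread name -> list of episode names in program order
    conflicts: list of (src, dst) pairs meaning dst must replay after src
    The ordering must be acyclic; a cycle would raise RecursionError.
    """
    preds = defaultdict(set)
    for src, dst in conflicts:
        preds[dst].add(src)
    # Program order: each episode also follows the previous one on its thread.
    for seq in episodes.values():
        for prev, cur in zip(seq, seq[1:]):
            preds[cur].add(prev)

    ts = {}

    def stamp(ep):
        # Lamport rule: one more than the largest predecessor timestamp.
        if ep not in ts:
            ts[ep] = 1 + max((stamp(p) for p in preds[ep]), default=0)
        return ts[ep]

    for seq in episodes.values():
        for ep in seq:
            stamp(ep)
    return ts
```

For example, with threads T0: a0, a1; T1: b0, b1; T2: c0 and conflicts a0 → b1 and c0 → a1, the sketch assigns timestamp 1 to a0, b0, c0 and 2 to a1, b1, so a0, b0 and c0 could replay in parallel.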

10 Rerun’s Replay
[Figure: Rerun replays the episodes of T0, T1, T2 one timestamp at a time: TS=22, 43, 44, 45, 60, 61]
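Assuming timestamp-ordered replay runs one timestamp group at a time and waits for the whole group to finish before starting the next, replay time is the sum of the longest episode in each group. The sketch below makes that cost explicit; the function names and the per-episode `duration` values are hypothetical.

```python
from collections import defaultdict


def timestamp_rounds(ts):
    """Group episodes by timestamp, in replay order (hypothetical helper)."""
    groups = defaultdict(list)
    for ep, t in ts.items():
        groups[t].append(ep)
    return [(t, sorted(groups[t])) for t in sorted(groups)]


def barrier_replay_time(ts, duration):
    """Replay time if each timestamp group must finish before the next starts.

    Episodes within a group run in parallel, so a group costs as much as
    its longest episode; groups run back to back.
    """
    return sum(max(duration[ep] for ep in eps) for _, eps in timestamp_rounds(ts))
```

With e1, e2 at timestamp 1, e3, e4 at timestamp 2, and durations 10, 1, 10, 1, replay takes 10 + 10 = 20 units: each group waits for its slowest episode.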

11 Outline
– Background & Motivation
– Rerun Overview
– Karma Insights
– Karma Implementation
– Evaluation
– Conclusion

12 Karma’s Insight 1: Capture order with a DAG (not a scalar clock)
Recording: DAG captured with episode predecessor & successor sets
[Figure: the same episodes on threads T0, T1, T2, now ordered by DAG arcs instead of timestamps]
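A sketch of the DAG alternative, assuming episodes replay dataflow-style: an episode starts as soon as every episode in its predecessor set has finished, instead of waiting for a whole timestamp group. `build_dag` and `dag_replay_time` are hypothetical names, and the episode durations are made-up inputs.

```python
from collections import defaultdict, deque


def build_dag(arcs):
    """Record each episode's predecessor and successor sets from (src, dst) arcs."""
    pred, succ = defaultdict(set), defaultdict(set)
    for src, dst in arcs:
        succ[src].add(dst)
        pred[dst].add(src)
    return pred, succ


def dag_replay_time(episodes, arcs, duration):
    """Finish time when each episode starts as soon as all its predecessors finish.

    Arcs must connect episodes listed in `episodes`.
    """
    pred, succ = build_dag(arcs)
    waiting = {ep: len(pred[ep]) for ep in episodes}  # unfinished predecessors
    ready = deque(ep for ep in episodes if waiting[ep] == 0)
    finish = {}
    while ready:
        ep = ready.popleft()
        start = max((finish[p] for p in pred[ep]), default=0)
        finish[ep] = start + duration[ep]
        for s in succ[ep]:
            waiting[s] -= 1
            if waiting[s] == 0:
                ready.append(s)
    if len(finish) != len(episodes):
        raise ValueError("arcs contain a cycle; episodes cannot be ordered")
    return max(finish.values(), default=0)
```

For episodes e1 → e4 on one thread, e2 → e3 on another, a cross-thread arc e2 → e4, and durations e1 = 10, e2 = 1, e3 = 10, e4 = 1, dataflow replay finishes at time 11. Lamport timestamps for the same episodes are e1 = e2 = 1 and e3 = e4 = 2, so a replay that waits for each timestamp group would take 10 + 10 = 20.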

13 Karma’s Insight 1 (Contd.)
[Figure: Rerun’s Replay vs. Karma’s Replay of the episodes on threads T0, T1, T2]

14 Karma’s Insight 1 (Contd.)
Naïve approach: DAG arcs point to episodes
– Episodes represented by integers
– Too much log-size overhead!
Our approach: DAG arcs point to cores
– Recording: only one “active” episode per core
– Replay: send wakeup message(s) to the core(s) of the successor episode(s)
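The following sketch simulates replay when arcs name cores rather than episodes. It assumes each core's log replays in order, wakeups between a given pair of cores are consumed in the order they were sent, and an episode may start once it holds one wakeup from each core in its predecessor set. `replay_with_wakeups` and its log format (a predecessor-core set and a successor-core set per entry) are hypothetical, and the lock-step model simplifies asynchronous hardware.

```python
from collections import Counter


def replay_with_wakeups(logs):
    """Simulate replay where DAG arcs name cores, not episodes (hypothetical).

    logs: list indexed by core; each is a list of (pred_cores, succ_cores)
          entries in program order.
    Returns a list of steps; each step lists the (core, entry_index) pairs
    replayed in parallel.
    """
    next_entry = [0] * len(logs)
    wakeups = Counter()  # (from_core, to_core) -> wakeups not yet consumed
    steps = []
    while any(next_entry[c] < len(logs[c]) for c in range(len(logs))):
        runnable = []
        for core, log in enumerate(logs):
            if next_entry[core] < len(log):
                preds, _ = log[next_entry[core]]
                if all(wakeups[(p, core)] > 0 for p in preds):
                    runnable.append(core)
        if not runnable:
            raise RuntimeError("replay deadlock: log is inconsistent")
        step = []
        for core in runnable:
            preds, _ = logs[core][next_entry[core]]
            for p in preds:
                wakeups[(p, core)] -= 1  # consume one wakeup per predecessor core
            step.append((core, next_entry[core]))
            next_entry[core] += 1
        # Send wakeups only after every episode in this step has replayed.
        for core, idx in step:
            _, succs = logs[core][idx]
            for s in succs:
                wakeups[(core, s)] += 1
        steps.append(step)
    return steps
```

Because only one episode per core is active, a core id is enough to identify which episode a wakeup unblocks: the next not-yet-satisfied one on that core.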

15 Karma’s Insight 1 (Contd.)
[Figure: Anatomy of a log entry, for episodes on threads T0, T1, T2; example entry values 84, 0|0|1, 0|0|1]

16 Karma’s Insight 1 (Contd.)
Each log entry:
REFS count | Predecessor | Successor
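A possible bit-level encoding of one log entry, assuming the predecessor and successor fields are per-core bit vectors (the 0|0|1 notation on the anatomy slide suggests this for three cores) and that REFS is the number of memory references in the episode. The field order, widths, and printing core 0 first are hypothetical choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    refs: int  # REFS count (assumed: memory references in the episode)
    pred: int  # bit i set: an arc arrives from core i
    succ: int  # bit i set: an arc leaves for core i


def pack(entry, num_cores):
    """Pack an entry as [refs | pred | succ] into one integer (hypothetical layout)."""
    mask = (1 << num_cores) - 1
    if entry.pred & ~mask or entry.succ & ~mask:
        raise ValueError("core bit vector wider than num_cores")
    return (entry.refs << (2 * num_cores)) | (entry.pred << num_cores) | entry.succ


def unpack(word, num_cores):
    """Inverse of pack."""
    mask = (1 << num_cores) - 1
    return LogEntry(refs=word >> (2 * num_cores),
                    pred=(word >> num_cores) & mask,
                    succ=word & mask)


def show_cores(bits, num_cores):
    """Render a core bit vector as 'b0|b1|...', core 0 first (assumed order)."""
    return "|".join(str((bits >> i) & 1) for i in range(num_cores))
```

Read this way, the example values on the anatomy slide (84, 0|0|1, 0|0|1) would describe an 84-reference episode whose predecessor and successor are both core 2; the slide itself does not confirm this reading.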

17 Karma’s Insight 2
Not necessary to end the episode on every conflict:
– As long as the episodes can be ordered during replay
[Figure: the LD/ST sequences of threads T0, T1, T2 from the Rerun example, with episodes extended past conflicts]
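The slides do not spell out how Karma decides that an episode must end, so the following is only a simplified software model of Insight 2: each core keeps one active episode, a conflict adds an arc from the source core's active episode to the destination core's, and the destination episode is ended (split) only when that arc would close a cycle, i.e., when the episodes could no longer be ordered for replay. The class, its methods, and the reachability search are hypothetical stand-ins for the hardware.

```python
from collections import defaultdict


class EpisodeRecorder:
    """Hypothetical model: one active episode per core; an episode is
    extended past conflicts unless the new arc would create a cycle."""

    def __init__(self, num_cores):
        self.active = list(range(num_cores))  # active episode id per core
        self.next_id = num_cores              # next unused episode id
        self.succ = defaultdict(set)          # episode id -> successor ids

    def _reaches(self, start, target):
        """True if target is reachable from start along recorded arcs."""
        stack, seen = [start], set()
        while stack:
            ep = stack.pop()
            if ep == target:
                return True
            if ep not in seen:
                seen.add(ep)
                stack.extend(self.succ[ep])
        return False

    def _end_episode(self, core):
        """Close the core's active episode and start a new one after it."""
        old, new = self.active[core], self.next_id
        self.next_id += 1
        self.succ[old].add(new)  # program order on the same core
        self.active[core] = new

    def conflict(self, src_core, dst_core):
        """Record that dst_core's current access must replay after src_core's episode."""
        if src_core == dst_core:
            return  # same core: program order already covers it
        src, dst = self.active[src_core], self.active[dst_core]
        if self._reaches(dst, src):
            # Arc src -> dst would close a cycle, so the episodes could not
            # be ordered for replay: end dst's episode and order the new one.
            self._end_episode(dst_core)
            dst = self.active[dst_core]
        self.succ[src].add(dst)

    def episode_count(self):
        return self.next_id
```

Splitting the destination is always enough in this model: the new episode has no successors yet, so the added arc cannot close a cycle.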

18 Outline
– Background & Motivation
– Rerun Overview
– Karma Insights
– Karma Implementation
– Evaluation
– Conclusion

19 Karma’s Per-Core State
[Figure: base system of 16 cores (Core 0 … Core 15), each with a pipeline, L1 I and L1 D caches, and a coherence controller; L2 banks L2 0 … L2 15 (data, tags, directory); interconnect; DRAM. Karma hardware is added per core, alongside Rerun’s L2/memory state]
Karma hardware per core:
– Address Filter (FLT)
– Reference (REFS)
– Predecessor (PRED)
– Successor (SUCC)
– Timestamp (TS)
Total state: 148 bytes/core
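A software model of the per-core state listed on the slide. It assumes the address filter (FLT) is a pair of Bloom filters for the episode's read and write sets, that REFS counts memory references, and that PRED/SUCC are core bit vectors; the filter size, hash functions, and method names are hypothetical and are not sized to match the 148 bytes/core figure. TS is carried as a field, but its use is not modeled.

```python
from dataclasses import dataclass, field

FILTER_BITS = 256  # hypothetical size, not the paper's


def _hashes(addr):
    # Two cheap hash functions over the address (hypothetical choice).
    return (addr % FILTER_BITS, (addr * 0x9E3779B1) % FILTER_BITS)


@dataclass
class BloomFilter:
    bits: int = 0

    def insert(self, addr):
        for h in _hashes(addr):
            self.bits |= 1 << h

    def may_contain(self, addr):
        # May give false positives, never false negatives.
        return all((self.bits >> h) & 1 for h in _hashes(addr))


@dataclass
class CoreState:
    read_filter: BloomFilter = field(default_factory=BloomFilter)   # FLT, read set
    write_filter: BloomFilter = field(default_factory=BloomFilter)  # FLT, write set
    refs: int = 0  # REFS: memory references in the active episode
    pred: int = 0  # PRED: bit vector of predecessor cores
    succ: int = 0  # SUCC: bit vector of successor cores
    ts: int = 0    # TS: listed on the slide; not modeled here

    def record_access(self, addr, is_write):
        self.refs += 1
        (self.write_filter if is_write else self.read_filter).insert(addr)

    def conflicts_with_remote(self, addr, remote_is_write):
        # A remote write conflicts with any local access; a remote read
        # conflicts only with a local write.
        if remote_is_write:
            return self.read_filter.may_contain(addr) or self.write_filter.may_contain(addr)
        return self.write_filter.may_contain(addr)

    def end_episode(self):
        """Emit (REFS, PRED, SUCC) for the log and reset per-episode state."""
        entry = (self.refs, self.pred, self.succ)
        self.read_filter, self.write_filter = BloomFilter(), BloomFilter()
        self.refs = self.pred = self.succ = 0
        return entry
```

A Bloom filter can report false conflicts but never misses a real one, so its errors only make the recorded ordering more conservative.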

20 Outline
– Background & Motivation
– Rerun Overview
– Karma Insights
– Karma Implementation
– Evaluation
– Conclusion

21 Evaluation: Were we able to speed up the replay?

22 Evaluation: Were we able to speed up the replay?
On average, ~4x improvement in replay speed over Rerun

23 Evaluation: Did we blow up log size?
On average, Karma does not increase log size; because it allows larger episodes, it instead reduces log size by as much as 40%

24 Outline
– Background & Motivation
– Rerun Overview
– Karma Insights
– Karma Implementation
– Evaluation
– Conclusion

25 Conclusion
Applications of deterministic replay
– Debugging
– Fault tolerance
– Security
Existing hardware record-replayers
– Slow replay, or
– Require major hardware changes
Karma: faster replay with nearly-conventional h/w
– Extends Rerun
– Uses a DAG instead of a scalar clock
– Extends episodes past conflicts
Wider application + lower cost → more attractive

26 Questions?

