RTR: 1 Byte/Kilo-Instruction Race Recording Min Xu Rastislav BodikMark D. Hill.

Slides:

Advertisements

Similar presentations

TWO STEP EQUATIONS 1. SOLVE FOR X 2. DO THE ADDITION STEP FIRST

Advertisements

Trend for Precision Soil Testing % Zone or Grid Samples Tested compared to Total Samples.

You have been given a mission and a code. Use the code to complete the mission and you will save the world from obliteration…

AGVISE Laboratories %Zone or Grid Samples – Northwood laboratory

Fill in missing numbers or operations

& dding ubtracting ractions.

1 Copyright © 2013 Elsevier Inc. All rights reserved. Appendix 01.

1 Copyright © 2013 Elsevier Inc. All rights reserved. Chapter 38.

By D. Fisher Geometric Transformations. Reflection, Rotation, or Translation 1.

Multiplication X 1 1 x 1 = 1 2 x 1 = 2 3 x 1 = 3 4 x 1 = 4 5 x 1 = 5 6 x 1 = 6 7 x 1 = 7 8 x 1 = 8 9 x 1 = 9 10 x 1 = x 1 = x 1 = 12 X 2 1.

Division ÷ 1 1 ÷ 1 = 1 2 ÷ 1 = 2 3 ÷ 1 = 3 4 ÷ 1 = 4 5 ÷ 1 = 5 6 ÷ 1 = 6 7 ÷ 1 = 7 8 ÷ 1 = 8 9 ÷ 1 = 9 10 ÷ 1 = ÷ 1 = ÷ 1 = 12 ÷ 2 2 ÷ 2 =

Chapter 1 Image Slides Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Business Transaction Management Software for Application Coordination 1 Business Processes and Coordination.

We need a common denominator to add these fractions.

Jeopardy Q 1 Q 6 Q 11 Q 16 Q 21 Q 2 Q 7 Q 12 Q 17 Q 22 Q 3 Q 8 Q 13

Jeopardy Q 1 Q 6 Q 11 Q 16 Q 21 Q 2 Q 7 Q 12 Q 17 Q 22 Q 3 Q 8 Q 13

2 pt 3 pt 4 pt 5 pt 1 pt 2 pt 3 pt 4 pt 5 pt 1 pt 2 pt 3 pt 4 pt 5 pt 1 pt 2 pt 3 pt 4 pt 5 pt 1 pt 2 pt 3 pt 4 pt 5 pt 1 pt ShapesPatterns Counting Number.

DIVIDING INTEGERS 1. IF THE SIGNS ARE THE SAME THE ANSWER IS POSITIVE 2. IF THE SIGNS ARE DIFFERENT THE ANSWER IS NEGATIVE.

ADDING INTEGERS 1. POS. + POS. = POS. 2. NEG. + NEG. = NEG. 3. POS. + NEG. OR NEG. + POS. SUBTRACT TAKE SIGN OF BIGGER ABSOLUTE VALUE.

SUBTRACTING INTEGERS 1. CHANGE THE SUBTRACTION SIGN TO ADDITION

MULT. INTEGERS 1. IF THE SIGNS ARE THE SAME THE ANSWER IS POSITIVE 2. IF THE SIGNS ARE DIFFERENT THE ANSWER IS NEGATIVE.

Year 6 mental test 10 second questions Numbers and number system Numbers and the number system, fractions, decimals, proportion & probability.

1 Processes and Threads Creation and Termination States Usage Implementations.

Who Wants To Be A Millionaire? Decimal Edition Question 1.

Learning to show the remainder

Re-examining Instruction Reuse in Pre-execution Approaches By Sonya R. Wolff Prof. Ronald D. Barnes June 5, 2011.

Break Time Remaining 10:00.

The basics for simulations

15. Oktober Oktober Oktober 2012.

Area of triangles.

We are learning how to read the 24 hour clock

Adding Up In Chunks.

MaK_Full ahead loaded 1 Alarm Page Directory (F11)

1 Processes and Threads Chapter Processes 2.2 Threads 2.3 Interprocess communication 2.4 Classical IPC problems 2.5 Scheduling.

1 Termination and shape-shifting heaps Byron Cook Microsoft Research, Cambridge Joint work with Josh Berdine, Dino Distefano, and.

Effective and Inexpensive (Memory) Race Recording Min Xu Thesis Defense 05/04/2006 Electrical and Computer Engineering Department, UW-Madison Advisors:

Before Between After.

Benjamin Banneker Charter Academy of Technology Making AYP Benjamin Banneker Charter Academy of Technology Making AYP.

Addition 1’s to 20.

25 seconds left…...

Test B, 100 Subtraction Facts

Improving OLTP scalability using speculative lock inheritance Ryan Johnson, Ippokratis Pandis, Anastasia Ailamaki.

Number bonds to 10,

We will resume in: 25 Minutes.

Static Equilibrium; Elasticity and Fracture

SE-292 High Performance Computing Memory Hierarchy R. Govindarajan

Clock will move after 1 minute

A SMALL TRUTH TO MAKE LIFE 100%

& dding ubtracting ractions.

Select a time to count down from the clock above

Cooperative Cache Scrubbing Jennifer B. Sartor, Wim Heirman, Steve Blackburn*, Lieven Eeckhout, Kathryn S. McKinley^ PACT 2014 * ^

One step equations Add Subtract Multiply Divide Addition X + 5 = -9 X = X = X = X = X = 2.

Coherence Ordering for Ring-based Chip Multiprocessors Mike Marty and Mark D. Hill University of Wisconsin-Madison.

Schutzvermerk nach DIN 34 beachten 05/04/15 Seite 1 Training EPAM and CANopen Basic Solution: Password * * Level 1 Level 2 * Level 3 Password2 IP-Adr.

Gwendolyn Voskuilen, Faraz Ahmad, and T. N. Vijaykumar Electrical & Computer Engineering ISCA 2010.

Recording Inter-Thread Data Dependencies for Deterministic Replay Tarun GoyalKevin WaughArvind Gopalakrishnan.

Deterministic Logging/Replaying of Applications. Motivation Run-time framework goals –Collect a complete trace of a program’s user-mode execution –Keep.

A “Flight Data Recorder” for Enabling Full-system Multiprocessor Deterministic Replay Min Xu, Rastislav Bodik, Mark D. Hill

A “Flight Data Recorder” for Enabling Full-system Multiprocessor Deterministic Replay Min Xu, Rastislav Bodik, Mark D. Hill

Rerun: Exploiting Episodes for Lightweight Memory Race Recording Derek R. Hower and Mark D. Hill Computer systems complex – more so with multicore What.

A Regulated Transitive Reduction (RTR) for Longer Memory Race Recording (ASLPOS’06) Min Xu Rastislav BodikMark D. Hill Shimin Chen LBA Reading Group Presentation.

Rerun: Exploiting Episodes for Lightweight Memory Race Recording

Hardware Memory Race Recording for Deterministic Replay

Presentation transcript:

RTR: 1 Byte/Kilo-Instruction Race Recording Min Xu Rastislav BodikMark D. Hill

1 % gcc sim.c % a.out Segmentation fault % % gdb a.out gdb> run Program received SIGSEGV. In get() at hash.c:45 45 a = bucket->d; % gdb a.out gdb> run Program exited normally. gdb> % gcc para-sim.c % a.out Segmentation fault % Why Do You Need a Recorder? % gdb a.out log gdb> run Program received SIGSEGV. In get() at para-hash.c:67 67 a = bucket->d; % gcc para-sim.c % a.out Segmentation fault Race recorded in “log” %

2 Ideally … % gdb a.out log gdb> run Program received SIGSEGV. In get() at para-hash.c:67 67 a = bucket->d; % gcc para-sim.c % a.out Segmentation fault Race recorded in “log” % Long recording: small log Low runtime overhead Low cost Applicability: Programs – data race Systems – non-SC

3 Better and Better Recorders Low Cost Low Overhead Small Log Size Applicability Data RaceNon-SC InstRply ’87  YY B. & G. ’91  Y Netzer’93  Y Déjà Vu ’98  Y RecPlay ’00 JaRec ’04  FDR ’03 BugNet ’05  Y ReEnact ‘03  Y CORD ‘06  Y Strata ’06  Y RTR ‘06  YY

1 Byte/Kilo- Instruction [ASPLOS’06] A New Recorder Hardware Acceleration [ISCA’03] Less Hardware [ASPLOS’06] SC & TSO [ASPLOS’06] Result: One more step toward practical This talk covers only RTR Regulated Transitive Reduction algorithm

5 Outline Race Recording RTR Algorithm Results with Commercial Workloads Compress log during recording  replay more “regularly” Conclusion

Technically, what’s race recording?

7 Race Recording X=6 X = 1 X++ print(X) X = 1 X++ print(X) - X = X*5 - X = X*5 - Thread I Thread J OriginalReplay X=10 Recording X= 6 - X = X*5 - Log Thread I Thread J

8 Goal: Reproduce same conflicts with minimum log data Terminologies and Assumptions ld A Thread I Thread J Recording st B st C sub ld B add st C ld B st A st C Thread I Thread J Replay Log ld D st D ld A st B st C sub ld B add st C ld B st A st C ld D st D Conflicts (red) Dependence (black)

Regulated Transitive Reduction (RTR)

10 Log All Conflicts ld A Thread I Thread J Replay st B st C sub ld B add st C ld B st A st C ld D st D Log J : 2  3 1  4 3  5 4  6 Log I : 2  3 Log Size: 5*16=80 bytes (10 integers) Dependence Log 16 bytes But too many conflicts

11 Netzer’s Transitive Reduction (TR) ld A Thread I Thread J Replay st B st C sub ld B add st C ld B st A st C ld D st D TR reduced Log J : 2  3 3  5 4  6 Log I : 2  3 Log Size: 64 bytes (8 integers) TR Reduced Log How to further reduce log size?

12 The Intuition of the RTR Algorithm After Reduction From I to J From J to I Vectors “Regulate” Replay

13 Stricter Dependences to Aid Vectorization ld A Thread I Thread J Replay st B st C add st C ld B st A ld D 55 subst C 66 ld B st D Log J : 2  3 4  5 Log I : 2  3 Log Size: 48 bytes (6 integers) New Reduced Log stricter Reduced Fewer dependencies to log

14 Compress Vectorized Dependencies ld A Thread I Thread J Replay st B st C sub ld B add st C ld B st A st C ld D st D Log J : x=3,5, ∆=1 Log I : x=3, ∆=1 Log Size: 40 bytes (5 integers) Vectorized Log Vector Deps. TR  RTR: fewer deps + fewer byte/dep

15 Deadlock Avoidance of RTR ld A Thread I Thread J Recording st B st C sub ld B add st C ld B st A st C ld D st D Limit the strict dependencies (see paper) i:4  j:1  j:2  i:3  i:4 Replay Cycle

Results with Commercial Workloads

17 Full-system Simulation Method Commercial server hardware GEMS: Full-system (OS + application) executions 4-core CMP (Sequential Consistent) 1-way in-order issue, 2 GHz, 64KB I/D L1, 4MB L2, 64byte lines, MOSI directory Commercial server software Apache – static web serving SpecJBB – middleware OLTP – TPC-C like Zeus – static web serving L2 C0C0 C1C1 C3C3 C2C2

18 Log Size: 1 byte/KI byte/core/KI ApacheJBBOLTPZeusAVG KB/core/s ApacheJBBOLTPZeusAVG Less buffer, longer recording, smaller logs

19 RTR vs. Netzer’s TR TR RTR ApacheJBBOLTPZeusAVG Log Size 28% smaller log TR was “optimal”

20 Why Does RTR Work Well? RTR Instructions execute at similar speed Dependencies are often “vectorizable”

A New Recorder Hardware Acceleration [ISCA’03] 1 Byte/Kilo- Instruction [ASPLOS’06] Less Hardware [ASPLOS’06] SC & TSO [ASPLOS’06] Result: One more step toward practical “Less hardware” & “TSO” not covered Equally important More details in the paper

22 Conclusion Race recording  Counter nondeterminism RTR  1 byte/kilo-instruction Based on Netzer’s transitive reduction Create stricter dependencies Vectorize dependencies to compress log Avoid overly-strict hence no deadlock Future work Support snooping, SMT, replayer