AADEBUG 2000 - MUNCHEN
Non-intrusive on-the-fly data race detection using execution replay
Michiel Ronsse - Koen De Bosschere
Ghent University - Belgium

Presentation transcript:


Contents
- Introduction
- Non-determinism & data races
- RecPlay
  - Method
  - Implementation
- Example
- Experimental Evaluation
- Conclusions

Introduction
- Developing parallel programs for multiprocessors with shared memory is considered difficult:
  - a number of threads run simultaneously
  - co-operation and synchronisation go through shared memory:
    - too much synchronisation: deadlock
    - too little synchronisation: race condition
- Cyclic debugging is impossible due to the non-deterministic nature of most parallel programs: program execution is not repeatable

Causes of non-determinism
- Sequential programs: input (keyboard, disk, network), signals, interrupts, certain system calls (gettimeofday(), ...)
- Parallel programs: race conditions, i.e. two threads accessing the same shared variable (memory location) in an unsynchronised way, with at least one thread modifying the variable

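Example code

    #include <pthread.h>
    #include <stdio.h>

    unsigned global = 5;

    /* Each thread performs an unsynchronised read-modify-write of global. */
    void *thread1(void *arg) { (void)arg; global = global + 6; return NULL; }
    void *thread2(void *arg) { (void)arg; global = global + 7; return NULL; }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, thread1, NULL);
        pthread_create(&t2, NULL, thread2, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("global=%u\n", global);
        return 0;
    }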

Possible executions
[Figure: three possible interleavings of the loads L(value) and stores S(value) performed by the two threads on the variable global (initially 5), ending with global=12, global=18 or global=11.]
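The three outcomes can be reproduced without real threads by replaying a fixed schedule of loads and stores. The following is a minimal sketch, not code from the talk: `struct adder`, `run_schedule` and the schedule encoding are hypothetical names chosen for illustration; only the initial value 5 and the increments 6 and 7 come from the example code.

```c
#include <assert.h>
#include <stddef.h>

/* One thread's view of "global = global + inc": a load, then a store. */
struct adder {
    unsigned inc;     /* amount this thread adds */
    unsigned loaded;  /* value observed by the load */
    int step;         /* 0 = load pending, 1 = store pending, 2 = done */
};

/* Replay a schedule: schedule[k] is the index (0 or 1) of the thread that
 * performs its next memory operation at step k. Thread 0 adds 6 and
 * thread 1 adds 7, as in the example code. Returns the final value. */
unsigned run_schedule(unsigned initial, const int *schedule, size_t len)
{
    unsigned global = initial;
    struct adder t[2] = { { 6, 0, 0 }, { 7, 0, 0 } };

    for (size_t k = 0; k < len; k++) {
        assert(schedule[k] == 0 || schedule[k] == 1);
        struct adder *a = &t[schedule[k]];
        if (a->step == 0) {            /* L(value) */
            a->loaded = global;
            a->step = 1;
        } else if (a->step == 1) {     /* S(value) */
            global = a->loaded + a->inc;
            a->step = 2;
        }
    }
    return global;
}
```

With an initial value of 5, the schedule {0, 0, 1, 1} runs the threads one after the other and yields 18, while {0, 1, 0, 1} and {0, 1, 1, 0} let both threads load 5 first and yield 12 and 11 respectively: the last store wins.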

Race conditions
- Two types:
  - synchronisation races: they do not allow us to use cyclic debugging, but they are not a bug; they are desired non-determinism
  - data races: they do not allow us to use cyclic debugging and they are a bug; they are undesired non-determinism
  - the distinction is a matter of abstraction
- Automatic detection of data races is possible:
  - collect all memory references
  - check parallel references

Detecting data races
- Static methods: checking the source code for all possible executions with all possible input is NP-complete, hence not feasible
- Dynamic methods: checking during an actual execution only detects the data races that occur during this execution
- Removal requires cyclic debugging

Dynamic data race detection
- The piece of code between two consecutive synchronisation operations is called a segment
- For every segment i of every thread we collect two sets, L(i) and S(i), containing the addresses of all load and store operations
- For all parallel segments i and j, ((L(i) ∪ S(i)) ∩ S(j)) ∪ ((L(j) ∪ S(j)) ∩ S(i)) gives the list of conflicting addresses
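As an illustration of the conflict check, the sketch below computes ((L(i) ∪ S(i)) ∩ S(j)) ∪ ((L(j) ∪ S(j)) ∩ S(i)) for two segments, which follows from the definition of a race given earlier (same location, at least one store). The flat 256-address bitset and all names (`struct segment`, `record_load`, `record_store`, `conflicts`) are assumptions made for brevity, not RecPlay's data structures.

```c
#include <assert.h>
#include <stdint.h>

#define NWORDS 4  /* hypothetical: 4 * 64 = 256 tracked addresses */

struct segment {
    uint64_t loads[NWORDS];   /* L(i): addresses loaded in the segment  */
    uint64_t stores[NWORDS];  /* S(i): addresses stored in the segment */
};

void record_load(struct segment *s, unsigned addr)
{
    assert(addr < NWORDS * 64);
    s->loads[addr / 64] |= UINT64_C(1) << (addr % 64);
}

void record_store(struct segment *s, unsigned addr)
{
    assert(addr < NWORDS * 64);
    s->stores[addr / 64] |= UINT64_C(1) << (addr % 64);
}

/* Writes the conflicting addresses of two parallel segments into out
 * and returns 1 if there is at least one, 0 otherwise. An address
 * conflicts if both segments access it and at least one access is a store. */
int conflicts(const struct segment *a, const struct segment *b,
              uint64_t out[NWORDS])
{
    int any = 0;
    for (int w = 0; w < NWORDS; w++) {
        out[w] = ((a->loads[w] | a->stores[w]) & b->stores[w])
               | ((b->loads[w] | b->stores[w]) & a->stores[w]);
        if (out[w] != 0)
            any = 1;
    }
    return any;
}
```

Two segments that only load the same address produce an empty result; as soon as one of them stores to it, the address shows up in `out`. The check is only meaningful for segments that are parallel, which has to be established separately.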

Existing race detection methods
- Huge overhead, causing the probe effect and Heisenbugs
- They only detect the existence of a data race (and the variable), not the instructions involved
- A data race is a bug, so we need cyclic debugging!

RecPlay
- Synchronisation races: execution replay
- Data races: detect
- Also enables cyclic debugging
- Allows you to detect/remove the first data race
- Three phases:
  1. record the order of the synchronisation operations
  2. replay the synchronisation operations and check for data races
  3. normal replay, without checking for data races

Overview
[Figure: workflow diagram with the boxes Choose input, Record, Replay + detect, Replay + ident., Replay + debug (twice), Choose new input and The end. The legend distinguishes automatic steps from steps that require user intervention.]

Instrumentation
- JiTI (Just in Time Instrumentation) was developed especially for RecPlay, but it is a generic instrumentation tool
- Instruments memory and synchronisation operations
- Deals correctly with data in code, code in data and self-modifying code
- Clones processes: the original process is used for the data and the instrumented clone is used for the code
- No need for recompilation, relinking or instrumentation of files

Execution replay
- ROLT (Reconstruction of Lamport Timestamps) is used for tracing/replaying the synchronisation operations
- Attaches a scalar Lamport timestamp to each synchronisation operation
- Delaying synchronisation operations until the operations with a smaller timestamp have executed suffices for a correct replay
- We only need to log a small subset of all operations
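The timestamping and the replay rule can be sketched as follows. This uses the textbook Lamport clock update (take the maximum of the two clocks and add one); that ROLT updates its clocks in exactly this way, and that a clock is kept per synchronisation object, are assumptions. The names `struct lclock`, `stamp_sync_op` and `may_proceed` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar Lamport clock, kept per thread and (assumed) per sync object. */
struct lclock { unsigned long time; };

/* Record phase: a thread performs a synchronisation operation on an
 * object. Both clocks advance to max(thread, object) + 1, and that
 * value is the timestamp of the operation. */
unsigned long stamp_sync_op(struct lclock *thread, struct lclock *object)
{
    assert(thread != NULL && object != NULL);
    unsigned long t = thread->time > object->time ? thread->time
                                                  : object->time;
    t += 1;
    thread->time = t;
    object->time = t;
    return t;
}

/* Replay phase: next_ts[k] is the timestamp of the next synchronisation
 * operation thread k still has to perform (use ULONG_MAX from <limits.h>
 * for a finished thread). Thread self may proceed only if no other
 * thread still has an operation with a smaller timestamp pending. */
int may_proceed(const unsigned long *next_ts, size_t nthreads, size_t self)
{
    assert(self < nthreads);
    for (size_t k = 0; k < nthreads; k++)
        if (k != self && next_ts[k] < next_ts[self])
            return 0;
    return 1;
}
```

Operations with equal timestamps are allowed to proceed together: under this update rule two operations on the same object never receive the same timestamp, so equal timestamps belong to unrelated operations.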

Collecting memory operations
- We need two lists of addresses per segment i: L(i) and S(i)
- A multilevel bitmap is used:
  - low memory consumption
  - comparing two bitmaps is easy
- We lose information: two accesses to the same variable are counted once. This is, however, no problem for data race detection

Memory bitmap
[Figure: layout of the multilevel memory bitmap; the labels show index fields of 9 bits and 14 bits.]
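A multilevel bitmap allocates leaf bitmaps only for the regions of the address space that are actually touched, which is where the low memory consumption comes from. The sketch below assumes a 32-bit address split into 9 + 9 + 14 bits; the slide only shows the labels "9 bit" and "14 bit", so the exact split, like the names `struct bitmap`, `bitmap_set`, `bitmap_test` and `bitmap_intersects`, is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define L1_BITS   9                 /* assumed: top-level index width  */
#define L2_BITS   9                 /* assumed: mid-level index width  */
#define LEAF_BITS 14                /* assumed: bit index inside leaf  */
#define L1_SIZE   (1u << L1_BITS)
#define L2_SIZE   (1u << L2_BITS)
#define LEAF_WORDS ((1u << LEAF_BITS) / 32)

struct leaf   { uint32_t bits[LEAF_WORDS]; };
struct mid    { struct leaf *leaf[L2_SIZE]; };
struct bitmap { struct mid *mid[L1_SIZE]; };

/* Mark addr as accessed. Returns 0 on success, -1 if allocation fails.
 * Freeing the tables is omitted in this sketch. */
int bitmap_set(struct bitmap *bm, uint32_t addr)
{
    uint32_t i1 = addr >> (L2_BITS + LEAF_BITS);
    uint32_t i2 = (addr >> LEAF_BITS) & (L2_SIZE - 1);
    uint32_t i3 = addr & ((1u << LEAF_BITS) - 1);

    assert(bm != NULL);
    if (!bm->mid[i1] && !(bm->mid[i1] = calloc(1, sizeof(struct mid))))
        return -1;
    struct mid *m = bm->mid[i1];
    if (!m->leaf[i2] && !(m->leaf[i2] = calloc(1, sizeof(struct leaf))))
        return -1;
    m->leaf[i2]->bits[i3 / 32] |= 1u << (i3 % 32);
    return 0;
}

/* Return 1 if addr has been marked, 0 otherwise. */
int bitmap_test(const struct bitmap *bm, uint32_t addr)
{
    uint32_t i1 = addr >> (L2_BITS + LEAF_BITS);
    uint32_t i2 = (addr >> LEAF_BITS) & (L2_SIZE - 1);
    uint32_t i3 = addr & ((1u << LEAF_BITS) - 1);

    if (!bm->mid[i1] || !bm->mid[i1]->leaf[i2])
        return 0;
    return (bm->mid[i1]->leaf[i2]->bits[i3 / 32] >> (i3 % 32)) & 1u;
}

/* Return 1 if some address is marked in both bitmaps, 0 otherwise.
 * Subtrees missing in either bitmap are skipped without inspection. */
int bitmap_intersects(const struct bitmap *a, const struct bitmap *b)
{
    for (size_t i = 0; i < L1_SIZE; i++) {
        if (!a->mid[i] || !b->mid[i])
            continue;
        for (size_t j = 0; j < L2_SIZE; j++) {
            const struct leaf *la = a->mid[i]->leaf[j];
            const struct leaf *lb = b->mid[i]->leaf[j];
            if (!la || !lb)
                continue;
            for (size_t w = 0; w < LEAF_WORDS; w++)
                if (la->bits[w] & lb->bits[w])
                    return 1;
        }
    }
    return 0;
}
```

Marking the same address twice leaves the bitmap unchanged, which is the information loss mentioned on the previous slide, and the comparison can skip whole regions that either segment never touched.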

Detecting parallel segments
- A vector clock is attached to each segment
- All segment information (two bitmaps + vector timestamp) is kept on a list L
- Each new segment is compared against the segments on list L
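Two segments are parallel when neither vector timestamp is ordered before the other. The sketch below shows the standard vector clock comparison; the fixed thread count `NTHREADS` and the function names are hypothetical, and how RecPlay stores its timestamps is not specified here.

```c
#include <assert.h>
#include <stddef.h>

#define NTHREADS 3  /* hypothetical number of threads */

/* Return 1 if a[k] <= b[k] for every component k, 0 otherwise. */
int vc_leq(const unsigned *a, const unsigned *b)
{
    assert(a != NULL && b != NULL);
    for (size_t k = 0; k < NTHREADS; k++)
        if (a[k] > b[k])
            return 0;
    return 1;
}

/* Two segments are parallel if their vector timestamps are
 * incomparable: neither is component-wise <= the other. */
int vc_parallel(const unsigned *a, const unsigned *b)
{
    return !vc_leq(a, b) && !vc_leq(b, a);
}
```

For example, the timestamps {2, 0, 0} and {0, 1, 0} are parallel, whereas {2, 0, 0} and {2, 1, 0} are ordered. Only for parallel pairs do the load and store bitmaps need to be compared.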

Detecting obsolete segments
- Obsolete segments should be removed from list L
- We use a snooped matrix clock to detect these segments

Detecting obsolete segments
[Figure: timeline of the threads with the legend: segment on list L, obsolete segment, segment in execution, point of execution, the future.]
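One way to read the obsolescence test: a segment can be dropped once every other thread has already synchronised past it, because no segment that starts in the future can then be parallel with it. The sketch below assumes that row k of the matrix clock holds the current vector clock of thread k, and that a segment is stamped with its owner's clock value at the synchronisation operation that ends it. Both conventions, and the name `segment_is_obsolete`, are assumptions rather than details taken from the slides.

```c
#include <assert.h>
#include <stddef.h>

#define NTHREADS 3  /* hypothetical number of threads */

/* matrix[k] is the current vector clock of thread k.
 * seg_clock is the vector timestamp of a finished segment of thread owner.
 * The segment is obsolete if every other thread already knows about the
 * end of the segment: all their future segments are then ordered after it. */
int segment_is_obsolete(unsigned matrix[NTHREADS][NTHREADS],
                        size_t owner,
                        const unsigned seg_clock[NTHREADS])
{
    assert(owner < NTHREADS);
    for (size_t k = 0; k < NTHREADS; k++) {
        if (k == owner)
            continue;  /* later segments of the owner follow in program order */
        if (matrix[k][owner] < seg_clock[owner])
            return 0;  /* thread k may still run in parallel with the segment */
    }
    return 1;
}
```

A segment that fails this test stays on list L, since a segment that has yet to start could still turn out to be parallel with it.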

Identification phase
- If a data race is detected, we know:
  - the address involved
  - the type of operations involved (load or store)
  - the threads involved
  - the segments containing the racing instructions
- We need another replayed execution to find the racing instructions themselves (+ call stack, ...)
- This replay executes at full speed until the racing segments start executing

An Example
[Figure, built up step by step over slides 21 to 40: an execution containing the assignments A←1, B←2, C←4, A←3 and C←A+B, together with the semaphore operations P(S1), V(S1), P(S2), V(S2), P(S3) and V(S3).]

Experimental Evaluation
- RecPlay has been implemented for Solaris running on SPARC multiprocessors
- Tested on a SUN SparcServer 1000 with 4 processors
- SPLASH-2 was used as a benchmark: a number of multithreaded numeric applications, such as a fast Fourier transform, a raytracer, ...
- Several data races were found, including in SPLASH-2

Basic performance of RecPlay

Segments with memory accesses

Efficiency of the ROLT mechanism

Conclusions
- RecPlay is a practical and efficient tool for detecting and removing data races
- RecPlay also makes cyclic debugging possible
- Three types of clocks (scalar, vector and matrix) are used to enable a fast and memory-efficient implementation
- Data races have been found