
Flashback: A Lightweight Extension for Rollback and Deterministic Replay for Software Debugging
Sudarshan M. Srinivasan, Srikanth Kandula, Christopher R. Andrews, and Yuanyuan Zhou
Review by M. Kozuch

Motivation
- Better debugging
- Reproducing errors is hard because:
  1. Bug setup may require significant time
  2. Exact inputs may be difficult to reproduce
  3. Some are Heisenbugs
- Solution: Provide a mechanism for deterministic replay after a software error is detected.

Related Work
- Compile-time static checking
- Run-time dynamic checking
- Hardware support

Overview
- State = checkpoint()
- discard(State)
- replay(State)

Main Idea Part I
- Use fork()-like mechanism to create shadow copy of a process
  - Copy-on-write memory image
  - Register values
  - Some process state
- Not exactly fork
  - Shadow not runnable
  - Not all reference counts are incremented
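The shadow-process idea can be approximated in user space with a plain fork(). The sketch below is only an illustration under that assumption, not Flashback's kernel mechanism: the helper name `run_from_checkpoint` and the `state` dictionary are hypothetical, and it requires a POSIX system. The parent plays the role of the shadow (it does not run while the child executes), and the child's writes land in its own copy-on-write image.

```python
import os

def run_from_checkpoint(step):
    """Run step() in a forked child; the parent acts as the idle shadow.

    Returns True if step() finished, False if it raised and was rolled back.
    """
    pid = os.fork()                    # child receives a copy-on-write image
    if pid == 0:                       # child: the process that keeps executing
        try:
            step()
            os._exit(0)                # finished without error
        except Exception:
            os._exit(1)                # error detected: hand control to the shadow
    _, status = os.waitpid(pid, 0)     # parent: blocked until the child ends
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0

state = {"balance": 100}

def buggy_step():
    state["balance"] -= 500            # modifies only the child's copy
    raise RuntimeError("negative balance")

completed = run_from_checkpoint(buggy_step)
print(completed, state["balance"])     # the parent still sees the old state
```

Unlike Flashback, this sketch always resumes in the parent, so a successful child's changes are also lost. It shows only the rollback half of the design.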

Memory Image
- Copy-on-write semantics
  - Reduces cost of checkpoint()
  - Reduces memory footprint
  - Reduces impact of multiple checkpoints
  - Reduces cost of replay()
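A toy model may help show why copy-on-write keeps these costs low. Everything in it is an assumption made for illustration (the `CowMemory` class, the 4-byte pages, the `rollback` method standing in for the memory part of replay()). It is not the kernel's page-table code. Taking a checkpoint copies nothing, and only pages written afterwards are preserved.

```python
PAGE_SIZE = 4  # tiny pages keep the example readable

class CowMemory:
    """Toy paged memory with copy-on-write checkpoints."""

    def __init__(self, num_pages):
        self.pages = [bytearray(PAGE_SIZE) for _ in range(num_pages)]
        self.saved = []          # one dict per checkpoint: page index -> original page
        self.pages_copied = 0

    def checkpoint(self):
        self.saved.append({})    # constant time: no page is copied yet
        return len(self.saved) - 1

    def write(self, page, offset, value):
        # First write to a page since the latest checkpoint: preserve the original.
        if self.saved and page not in self.saved[-1]:
            self.saved[-1][page] = bytes(self.pages[page])
            self.pages_copied += 1
        self.pages[page][offset] = value

    def rollback(self, handle):
        # Undo newer checkpoints first, then the requested one.
        while len(self.saved) > handle:
            for page, original in self.saved.pop().items():
                self.pages[page] = bytearray(original)
        self.saved.append({})    # the checkpoint stays usable after rollback

mem = CowMemory(num_pages=1000)
cp = mem.checkpoint()
mem.write(3, 0, 7)
mem.write(3, 1, 8)               # same page: no second copy
mem.write(10, 0, 9)
print(mem.pages_copied)          # 2 of 1000 pages were copied
mem.rollback(cp)
print(mem.pages[3][0], mem.pages[10][0])
```

In this model, rollback touches only the pages modified since the checkpoint, which is the same reason the slide gives for cheap replay().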

Multithreaded Processes
- Option 1: Roll back the process
  - "Trivially" ensures consistency
- Option 2: Roll back the thread
  - Requires an ordering log for memory and files
  - Must roll back the set of threads with inter-dependencies
  - Problems:
    - Logic adds overhead and is error-prone
    - Data races may require watching all threads
    - Overhead is paid even when there are no errors
- Multithreaded state capture is still hard if some of the threads can be in a different context (i.e. kernel).

Main Idea Part II
- Can re-execute code
- Must replay inputs (e.g. from read())
- After checkpoint(), log syscall return values*
- After replay(), replay the log
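The log-then-replay step can be sketched as a thin wrapper around each input-producing call. This is a minimal illustration with hypothetical names (`SyscallLog`, `call`, `start_replay`), and `random.randint` stands in for a nondeterministic read(). Flashback does this interception inside the kernel. In recording mode the wrapper runs the real call and logs its result, and in replay mode it returns the logged result instead.

```python
import random

class SyscallLog:
    """Records results of nondeterministic calls, then feeds them back."""

    def __init__(self):
        self.entries = []        # (call name, result) in execution order
        self.replaying = False
        self.cursor = 0

    def call(self, name, func, *args):
        if self.replaying:
            logged_name, result = self.entries[self.cursor]
            assert logged_name == name, "replay diverged from the recorded run"
            self.cursor += 1
            return result        # the real call is not executed again
        result = func(*args)     # recording: execute and log the result
        self.entries.append((name, result))
        return result

    def start_replay(self):
        self.replaying = True
        self.cursor = 0

def program(log):
    # Two "inputs" whose values differ from run to run.
    a = log.call("read", random.randint, 0, 10**9)
    b = log.call("read", random.randint, 0, 10**9)
    return a + b

log = SyscallLog()
first = program(log)             # after checkpoint(): results are logged
log.start_replay()
second = program(log)            # after replay(): results come from the log
print(first == second)           # True
```

Re-executing the same code against the same logged inputs is what makes the second run deterministic.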

Log Weaknesses
- Shared memory
  - Regions must be identified, and all accesses set to generate #PF (page fault)
  - Not currently handled
- Signals
  - Replay of asynchronous events is challenging
  - Not currently handled

uBenchmark Performance I
- Checkpoint() is us
- Discard()/Replay() is /7500us

uBenchmark Performance II
- read()/write() syscalls
- Used cold caches "for consistency"

Application Performance Experiments
- One application
- Network bound
- Logging disabled
- Conclusion: the overhead of checkpointing is low.