Slide 1: Deterministic Replay for Real-time Software Systems
Alice Lee, Safety, Reliability & Quality Assurance Office, JSC, NASA (alice.t.lee1@jsc.nasa.gov)
Yann-Hang Lee, Computer Science & Engineering, Arizona State University, Tempe, AZ (yhlee@asu.edu)

Slide 2 (lee_IV&V-1): Background
- Major difficulties in building real-time embedded applications:
  - handling concurrent events (real-world events occur in parallel)
  - timing control and temporal dependence in program behavior
  - asynchronous operations
- Non-deterministic operation, time-dependent behavior, and race conditions are difficult to model, analyze, test, and reproduce.
- Example: NASA's Mars Pathfinder spacecraft
  - Mars Pathfinder suffered total system resets.
  - An overrun of the data collection task → a priority inversion on a mutex semaphore → failure of the communication task → a system reset.
  - It took 18 hours to reproduce the failure on a lab replica; once it was reproduced, the problem became obvious and a fix was installed.

Slide 3 (lee_IV&V-2): Background (Cont’d)
- Other examples:
  - select(2)/accept(2) race condition in NetBSD TCP servers: the bug depends on a specific event and can be difficult to reproduce, particularly if the server is very fast and the network is relatively slow.
  - Delphi Bug Report 459: the bug is difficult to reproduce because the timing of the two threads (one being destroyed and one being created) has to be "right" for it to occur.
- Once the failing sequences are reproduced (or observed), the faults are easy to identify and fix.
- These failures are rooted in the interaction of multiple concurrent operations or threads and depend on timing.

Slide 4 (lee_IV&V-3): Deterministic Replay
- Can we reproduce the exact execution behavior in a controlled environment, even with additional delays?
  - the delays may be caused by instrumentation and breakpoints
- Useful for multiple purposes:
  - test analysis
  - debugging
  - recovery
[Figure: execution and deterministic-replay flows for the three purposes, labeled "Execution/Instrumentation", "D. replay/Instrumentation", "Execution/Observation/Assertion", "D. replay/Observation/Assertion", "Execution/Checkpointing/Msg logging", and "Rollback/D. replay".]

Slide 5 (lee_IV&V-4): Deterministic Replay (Cont’d)
- During replay, the program must read the same input values (timer, DAQ, status, etc.).
- Interrupts must occur at the same instances in program execution.
- External events must therefore be logged during real-time execution and resubmitted during replay, which gives separate recording and replaying stages.
[Figure: timelines of the real-time execution and the deterministic replay; in both, interrupt_1 occurs at PC=1000 and interrupt_2 at PC=2000, and the replay timeline also contains intrusions (added delays).]
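The recording and replaying stages for input values can be illustrated with a minimal sketch, assuming every external input (timer, DAQ, status register) is read through one wrapper. The `InputLog` class, its `read()` method, and the simulated timer are hypothetical names chosen for illustration, not part of the authors' tool, and interrupts are left out: only input values are logged and resubmitted.

```python
import itertools
from collections import deque


class InputLog:
    """Hypothetical record/replay wrapper for external input reads (timer, DAQ, status).

    record mode: call the real device read and append (source, value) to the log.
    replay mode: ignore the device and return the logged values in order, so the
    program sees identical inputs even though the replay run is slower.
    """

    def __init__(self, mode, log=None):
        if mode not in ("record", "replay"):
            raise ValueError("mode must be 'record' or 'replay'")
        self.mode = mode
        self.log = list(log) if log is not None else []
        self._pending = deque(self.log)  # only consumed in replay mode

    def read(self, source, device_read):
        if self.mode == "record":
            value = device_read()
            self.log.append((source, value))
            return value
        if not self._pending:
            raise RuntimeError(f"replay diverged: no logged value left for {source}")
        logged_source, value = self._pending.popleft()
        if logged_source != source:
            raise RuntimeError(
                f"replay diverged: expected a read of {logged_source}, got {source}"
            )
        return value


def demo():
    # Stand-in for a free-running hardware timer: 100, 107, 114, ...
    hardware_timer = itertools.count(100, 7)
    recorder = InputLog("record")
    recorded = [recorder.read("timer", lambda: next(hardware_timer)) for _ in range(3)]

    # During replay the "device" returns a dummy value; the logged values are used instead.
    replayer = InputLog("replay", recorder.log)
    replayed = [replayer.read("timer", lambda: -1) for _ in range(3)]
    return recorded, replayed


print(demo())  # ([100, 107, 114], [100, 107, 114])
```

Checking the source name on every replayed read is one cheap way to notice that a replay has drifted away from the recorded run.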

Slide 6 (lee_IV&V-5): Test Analysis and Timing Intrusion
- Software quality analysis and test coverage:
  - rely on instrumentation of source programs
  - program behavior may change because of timing intrusion
- Testing a robotic controller in the target system involves hardware and human-in-the-loop operations
  - some existing solutions:
    - hardware-based trace collection (Applied Microsystems)
    - special data logging, monitoring, and test facilities (SVF for NASA ISS)
- Alternatively, apply instrumentation during deterministic replay
  - feasible if the overhead of logging external events can be minimized

Slide 7 (lee_IV&V-6): Our Approach: Two-Stage Instrumentation
- RTOS-based instrumentation for context switches, interrupts, events, and task communication
- Annotation of device drivers
- Synchronizing program execution with external events:
  - the program counter alone is not sufficient: for an interrupt that arrives during a loop, both the loop count and the program counter are needed
  - simulated time must be adjusted to match the real execution time
  - to determine when an event occurs: if there is no data dependence, the event can occur at any instant during the execution of a block; otherwise, the corresponding statement must be known
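The loop case can be made concrete with a toy simulation. It assumes a hypothetical loop-body instruction at address `LOOP_PC`, an interrupt handler that adds 100 to a running total, and a recorded interrupt that arrived before the third iteration; none of these details come from the slides. It compares a replay keyed on the PC alone with one keyed on the PC plus the loop count.

```python
LOOP_PC = 0x1000  # hypothetical address of the single loop-body instruction


def run(n_iter, deliver_at):
    """Run a loop that adds 1 per iteration; a pending interrupt adds 100.

    deliver_at(pc, count) decides whether the pending interrupt is delivered just
    before the instruction at pc in iteration `count`. Returns the running totals.
    """
    total, pending, trace = 0, True, []
    for count in range(n_iter):
        if pending and deliver_at(LOOP_PC, count):
            total += 100  # effect of the (hypothetical) interrupt handler
            pending = False
        total += 1  # the loop-body instruction at LOOP_PC
        trace.append(total)
    return trace


# Model of the recorded run: the interrupt arrived before iteration 2 (0-based).
stamp = (LOOP_PC, 2)
recorded = run(4, lambda pc, count: (pc, count) == stamp)

# Replay keyed on the PC alone fires at the first visit of LOOP_PC (iteration 0).
pc_only = run(4, lambda pc, count: pc == stamp[0])

# Replay keyed on PC plus loop count reproduces the recorded behavior.
pc_and_count = run(4, lambda pc, count: (pc, count) == stamp)

print(recorded)      # [1, 2, 103, 104]
print(pc_only)       # [101, 102, 103, 104]
print(pc_and_count)  # [1, 2, 103, 104]
```

The final totals happen to agree, but the intermediate states of the PC-only replay do not, which is exactly the kind of divergence a debugger or test analyzer would observe.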

Slide 8 (lee_IV&V-7): Software Instruction Counter
- An exact instance in program execution is specified by the program counter (PC) plus a software instruction counter (SIC).
  - The SIC is incremented on every backward jump and procedure call.
  - It can be implemented in software or hardware.
  - It has been applied to recovery and debugging.
[Figure: two passes of a loop that reads I/O and checks the value, with an "I/O status changed" event between them.]
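To show how a (PC, SIC) pair pins down one dynamic instance, here is a toy interpreter sketch. The opcodes, the program, the interrupt handler (which doubles an accumulator so that delivery time matters), and the use of an executed-instruction count as a stand-in for wall-clock arrival time are all assumptions for illustration; the slides do not describe the authors' SIC implementation at this level.

```python
def execute(program, counter, interrupt_after=None, replay_stamp=None):
    """Interpret `program` while maintaining a software instruction counter (SIC).

    interrupt_after=n (record mode): the interrupt "arrives" once n instructions
        have executed, a stand-in for its wall-clock arrival time; its (PC, SIC)
        stamp is captured at delivery.
    replay_stamp=(pc, sic) (replay mode): the interrupt is re-delivered just
        before the instruction at pc, at the moment the SIC equals sic.
    Returns (acc, stamp), where stamp is None if no interrupt was delivered.
    """
    acc, pc, sic, steps, stack = 0, 0, 0, 0, []
    stamp = None
    while True:
        due = stamp is None and (
            (interrupt_after is not None and steps == interrupt_after)
            or (replay_stamp is not None and (pc, sic) == replay_stamp)
        )
        if due:
            stamp = (pc, sic)  # the exact dynamic instance of delivery
            acc *= 2  # hypothetical ISR whose effect depends on when it runs
        op, *args = program[pc]
        steps += 1
        if op == "add":
            acc += args[0]
            pc += 1
        elif op == "dec":
            counter -= 1
            pc += 1
        elif op == "jnz":
            if counter != 0:
                if args[0] <= pc:  # backward jump: increment the SIC
                    sic += 1
                pc = args[0]
            else:
                pc += 1
        elif op == "call":
            sic += 1  # procedure call: increment the SIC
            stack.append(pc + 1)
            pc = args[0]
        elif op == "ret":
            pc = stack.pop()
        elif op == "halt":
            return acc, stamp
        else:
            raise ValueError(f"unknown opcode {op!r}")


PROGRAM = [
    ("call", 4),  # 0: call the subroutine
    ("dec",),     # 1: counter -= 1
    ("jnz", 0),   # 2: jump back to 0 while counter != 0
    ("halt",),    # 3
    ("add", 1),   # 4: subroutine body, acc += 1 (visited with SIC = 1, 3, 5)
    ("ret",),     # 5
]

acc_record, stamp = execute(PROGRAM, counter=3, interrupt_after=6)
acc_replay, _ = execute(PROGRAM, counter=3, replay_stamp=stamp)
print(acc_record, stamp, acc_replay)  # 4 (4, 3) 4
```

Address 4 alone is reached three times; only the SIC value 3 identifies the second visit, so the replay delivers the interrupt at the same instance and ends with the same accumulator value regardless of how slowly it runs.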

Slide 9 (lee_IV&V-8): Current Status
[Figure: tool flow. Record side: source program → code analyzer → ESIC, system, and event instrumentation → instrumented program_1 → target record environment → event trace_1. Replay side: code instrumentation (ESIC and replay instrumentation) → instrumented program_2; event trace_1 → PC stamp converter → event trace_2; instrumented program_2 and event trace_2 → target replay environment → execution trace.]
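Because instrumented program_1 and instrumented program_2 carry different instrumentation, the same source statement generally sits at different addresses in the two builds, so the PC stamps in the event trace have to be translated. The sketch below is one possible converter, not the authors' design. It assumes that each build emits a table of (start PC, statement id) pairs, that re-delivering an event at the start of the corresponding statement is acceptable (true only when the event has no data dependence inside that statement), and that the SIC can be copied unchanged because both builds take the same backward jumps and calls. The table format and function names are hypothetical.

```python
import bisect


def build_index(stmt_table):
    """Sort one build's (start_pc, stmt_id) table for binary search."""
    rows = sorted(stmt_table)
    return [pc for pc, _ in rows], [sid for _, sid in rows]


def convert_trace(trace_1, table_1, table_2):
    """Translate (pc, sic, event) stamps from program_1 into program_2 addresses.

    Each PC is mapped to the source statement that contains it in program_1,
    then to the first PC of that statement in program_2. The SIC is copied
    unchanged (assumption: both builds take the same backward jumps and calls).
    """
    starts_1, ids_1 = build_index(table_1)
    start_in_2 = {sid: pc for pc, sid in table_2}
    trace_2 = []
    for pc, sic, event in trace_1:
        i = bisect.bisect_right(starts_1, pc) - 1  # last statement starting at or before pc
        if i < 0:
            raise ValueError(f"PC {pc:#x} precedes every known statement")
        trace_2.append((start_in_2[ids_1[i]], sic, event))
    return trace_2


# Hypothetical statement tables for the two builds of the same source program.
TABLE_1 = [(0x1000, "s1"), (0x1010, "s2"), (0x1030, "s3")]
TABLE_2 = [(0x2000, "s1"), (0x2018, "s2"), (0x2040, "s3")]
TRACE_1 = [(0x1014, 3, "interrupt_1"), (0x1030, 5, "interrupt_2")]

for pc, sic, event in convert_trace(TRACE_1, TABLE_1, TABLE_2):
    print(f"{event}: PC={pc:#x} SIC={sic}")
# interrupt_1: PC=0x2018 SIC=3
# interrupt_2: PC=0x2040 SIC=5
```

Mapping through statement identifiers keeps the converter independent of how much code each instrumentation pass inserts; an event that does depend on data inside a statement would need a finer-grained map.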

Slide 10 (lee_IV&V-9): Current Status (Cont’d)
- Works for a single execution thread in the whole system (VxWorks + MPC860).
- In practice there are also kernel and non-instrumented threads, for example when:
  - analyzing tests of one program in a multitasking environment
  - debugging a program that calls library routines
  - the program makes system calls to the RTOS
- Can we still achieve deterministic replay if the execution of the instrumented thread is interleaved with other threads?
- If interrupts (input) → thread_1 → thread_2, then both threads must be instrumented.
[Figure: the instrumented program, the RTOS, the other thread, and an ISR, interacting through semTake(), an interrupt, and semGive().]

Slide 11 (lee_IV&V-10): Current Status (Cont’d)
- If interrupts (input) → thread_2 and thread_1 → thread_2:
  - thread_1 does not need to be instrumented
  - however, interrupts can occur while thread_1 is running (i.e., execution is not in the instrumentation region because of a blocked system call or library call)
- Solution:
  - check the thread ID when an interrupt occurs
  - if the interrupted instruction is in the instrumentation region, use PC+SIC for replay
  - otherwise, replay the interrupt just before the call (RTOS or library)
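The solution above can be sketched as a small decision function. This is an illustration under stated assumptions, not the authors' code: the `InstrumentedThread` record, the `before_external_call` hook, and the address range are hypothetical, and the sketch assumes the instrumented thread only leaves its instrumented code by calling into the RTOS or a library, so its most recent call site is always the right fallback replay point.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Stamp = Tuple[int, int]  # (PC, SIC)


@dataclass
class InstrumentedThread:
    """State kept for the single instrumented thread (hypothetical structure)."""
    thread_id: int
    sic: int = 0
    last_call: Optional[Stamp] = None  # (PC, SIC) just before the latest RTOS/library call

    def before_external_call(self, pc: int) -> None:
        # Hypothetical hook the instrumenter places before every RTOS or library call.
        self.last_call = (pc, self.sic)


def replay_point(target: InstrumentedThread, interrupted_tid: int,
                 interrupted_pc: int, in_region: Callable[[int], bool]):
    """Decide where a recorded interrupt should be re-delivered during replay."""
    # Check the thread ID first: only the instrumented thread maintains a SIC.
    if interrupted_tid == target.thread_id and in_region(interrupted_pc):
        return ("at", (interrupted_pc, target.sic))
    # Otherwise (assumption) the instrumented thread is inside an RTOS or library
    # call, possibly blocked: replay the interrupt just before that call.
    if target.last_call is None:
        raise RuntimeError("no call site recorded; cannot place the interrupt")
    return ("before_call", target.last_call)


REGION = range(0x1000, 0x2000)  # hypothetical address range of the instrumented code


def in_region(pc: int) -> bool:
    return pc in REGION


target = InstrumentedThread(thread_id=7, sic=12)
target.before_external_call(pc=0x1480)  # e.g. just before a semTake() call

p1 = replay_point(target, 7, 0x1234, in_region)  # interrupted inside instrumented code
p2 = replay_point(target, 3, 0x8000, in_region)  # another thread (thread_1) was running
p3 = replay_point(target, 7, 0x9000, in_region)  # instrumented thread inside library code
print(p1)  # ('at', (4660, 12))
print(p2)  # ('before_call', (5248, 12))
print(p3)  # ('before_call', (5248, 12))
```

Recording the call-site stamp before each external call costs one store per call, which keeps the logging overhead small during real-time execution.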

Slide 12 (lee_IV&V-11): Current Tasks
- Tool integration and GUI
- Experiments:
  - joystick program with input and timer
  - DC motor controller with a LabVIEW-based simulator
- Applications at JSC:
  - X-38
  - AERCam
- Porting:
  - VxWorks and Suds on the MBX860 embedded controller
  - porting to RTLinux and other platforms
- Documentation and dissemination
