Building a Better Backtrace: Techniques for Postmortem Program Analysis Ben Liblit & Alex Aiken.

A Few Grim Realities
Programs fail post-deployment
  – Ship with known bugs
  – Users discover new bugs
Users are lousy testers
  – Never do the same thing twice
  – Wild variation in execution environment
  – Poor bug reporting, if any
Users’ bugs are the ones that really matter

Program Analysis for Pessimists
Assume & prepare for postmortem analysis
  – Compile-time analysis, stashed away for later
  – Lightweight (deployable) instrumentation
Analyze failed program instances
  – Mix of automated / interactive tools
  – Not quite static analysis, not quite dynamic
Help humans find and fix bugs that matter

This Talk: Reconstructing Execution Chronologies
Control flow decision history captures important properties
Fundamental questions
  – “How in the world did I get here?”
  – “What happened just before this point?”
  – “How can I make this happen again?”
Broader interest than just crashes

This Talk: Reconstructing Execution Chronologies
Heavyweight (academic) approaches
  – Replay debugging
  – Program tracing
Lightweight (industrial) approaches
  – Examine stack trace in debugger
  – printf() debugging
Middleweight (our) approach
  – “How might we have gotten here, given …?”

The Big Idea: “Gotten Here” is Control Flow Reachability
Interested in paths
  – “How”, not just “yes/no”
Transitive paths within one function
Multiple functions?
  – Matched call/return paths
  – This is a form of context-free language reachability

Global Control Flow Graph
Split each function invocation site into…
  – Call node
  – Return node
Edge: call node → function entry node
Edge: function exit node → return node
No edge: call node → return node
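
The node-splitting rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name build_global_cfg, the input format (each function as a straight-line list of steps), and the node naming scheme are all hypothetical, and statement labels are assumed to be unique across the program.

```python
from collections import defaultdict

def build_global_cfg(functions):
    """Build a global control flow graph from straight-line function bodies.

    functions: {name: [("stmt", label) | ("call", callee_name), ...]}
    Returns (flow, calls, rets):
      flow  - {node: [successor, ...]} for ordinary flow edges
      calls - {call_node: (site_id, callee_entry)}
      rets  - {callee_exit: [(site_id, return_node), ...]}
    """
    flow = defaultdict(list)
    calls = {}
    rets = defaultdict(list)
    site = 0
    for name, body in functions.items():
        prev = f"{name}.entry"
        for kind, target in body:
            if kind == "call":
                site += 1
                call_node, ret_node = f"call{site}", f"ret{site}"
                flow[prev].append(call_node)
                # Edge: call node -> function entry node
                calls[call_node] = (site, f"{target}.entry")
                # Edge: function exit node -> return node
                rets[f"{target}.exit"].append((site, ret_node))
                # Deliberately no edge from call_node to ret_node.
                prev = ret_node
            else:
                flow[prev].append(target)
                prev = target
        flow[prev].append(f"{name}.exit")
    return dict(flow), calls, dict(rets)

# Hypothetical program: main calls f twice.
program = {
    "main": [("stmt", "m1"), ("call", "f"), ("call", "f")],
    "f": [("stmt", "f1")],
}
flow, calls, rets = build_global_cfg(program)
```

Because the call node has no direct edge to its return node, the only way to get from one to the other is through the callee, which is what makes the call/return matching on the following slides meaningful.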

Global Control Flow Graph [figure: call and return nodes of two invocation sites, connected to one function's entry and exit nodes]

Enforcing Matched Call/Return
Label call and return edges
  – “(” for first call edge; “)” for return
  – “[” and “]” for next site’s edges
  – “{” and “}” for next
  – In practice, (_i and )_i for invocation site i
Execution paths obey a context-free language of matched parentheses

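Global Control Flow Graph [figure: the same call, return, entry, and exit nodes, with one site's call/return edges labeled “(” and “)” and the other's labeled “[” and “]”]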

Variations in Matching Grammar
Example label string: {()(){()}[{}(())]}
Complete execution
  – All calls & returns must be matched
Aborted execution
  – Some calls without returns
  – We use a variant of this
Arbitrary subinterval of execution
  – Prefix containing unmatched returns
  – Suffix containing unmatched calls
  – Used by context-sensitive points-to analyses
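
To make the three grammars concrete, the sketch below classifies a string of edge labels by the most restrictive variant it satisfies. It is an illustrative assumption, not code from the talk: the function name classify and the use of the three literal bracket kinds (rather than indexed (_i and )_i labels) are choices made here for brevity.

```python
OPEN_FOR = {")": "(", "]": "[", "}": "{"}

def classify(labels):
    """Return the most restrictive grammar variant that `labels` satisfies.

    "complete"    - every call is matched by its return
    "aborted"     - no unmatched returns, but some calls never return
    "subinterval" - unmatched returns first, unmatched calls last
    "invalid"     - a return is paired with the wrong call
    """
    stack = []              # calls still waiting for their return
    unmatched_returns = 0   # returns whose call lies before the interval
    for ch in labels:
        if ch in OPEN_FOR:
            if stack:
                if stack[-1] != OPEN_FOR[ch]:
                    return "invalid"
                stack.pop()
            else:
                # No pending call: this return matches a call we never saw.
                unmatched_returns += 1
        elif ch in "([{":
            stack.append(ch)
        else:
            raise ValueError(f"not an edge label: {ch!r}")
    if unmatched_returns:
        return "subinterval"
    return "aborted" if stack else "complete"
```

The variants nest: every complete string is also a valid aborted string, and every aborted string is also a valid subinterval, which is why the function reports the tightest fit.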

Implementation Notes
Similar to transitive graph search
  – Use a work list to incrementally extend frontier
  – Forward from α or backward from ω
  – Transitively adding flow edges is one case
Several additional cases for calls/returns
Complexity
  – O(N^3) for arbitrary grammar and graph
  – O(E) for our analyses (and many others)
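
A work-list search of this kind might look like the sketch below. It is a simplified stand-in written for this transcript, not the authors' algorithm: it searches forward from α under the "aborted execution" grammar (unmatched calls allowed, unmatched returns not), and the name valid_reach, the graph encoding, and the example graph are all assumptions. No complexity claim is made for it.

```python
from collections import defaultdict, deque

def valid_reach(flow, calls, rets, alpha):
    """Nodes reachable from alpha along paths whose returns match their calls.

    flow:  {node: [successor, ...]}
    calls: {call_node: (site_id, callee_entry)}
    rets:  {exit_node: [(site_id, return_node), ...]}
    A fact (start, n) means: n is reachable from `start` by a path with
    balanced calls/returns, and `start` is alpha or a seeded function entry.
    """
    facts, work = set(), deque()
    callers = defaultdict(set)     # entry -> {(caller's start, site_id)}
    exits_seen = defaultdict(set)  # entry -> exit nodes reached from it

    def add(start, node):
        if (start, node) not in facts:
            facts.add((start, node))
            work.append((start, node))

    add(alpha, alpha)
    while work:
        start, n = work.popleft()
        for m in flow.get(n, ()):          # transitive flow
            add(start, m)
        if n in calls:
            site, entry = calls[n]
            callers[entry].add((start, site))
            add(entry, entry)              # seed the callee (unmatched call)
            for x in exits_seen[entry]:    # cross an already known bridge
                for j, r in rets.get(x, ()):
                    if j == site:
                        add(start, r)
        if n in rets:                      # n is an exit: discover bridges
            exits_seen[start].add(n)
            for j, r in rets[n]:
                for caller_start, site in callers[start]:
                    if site == j:
                        add(caller_start, r)
    return {n for _, n in facts}

# Hypothetical graph: main calls f at site 1; g (never called) calls f at site 2.
flow = {"a": ["c1"], "f_in": ["f_out"], "r1": ["w"],
        "g_in": ["c2"], "r2": ["g_out"]}
calls = {"c1": (1, "f_in"), "c2": (2, "f_in")}
rets = {"f_out": [(1, "r1"), (2, "r2")]}
reached = valid_reach(flow, calls, rets, "a")
```

In this example a plain graph search that ignored edge labels would follow the site-2 return edge out of f and wrongly report r2 and g_out as reachable; the matched search does not, because no site-2 call was ever made.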

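Case 1: Transitive Flow

Case 2: Seeding a New Function [figure: return edge labeled )_i]

Case 3a: Bridge Discovery [figure: edges labeled (_i and )_i]

Case 3b: Crossing Known Bridge [figure: edges labeled (_i and )_i]

Case 4: Unmatched Call [figure: call edge labeled (_i]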

Reconstruction With Crash Site Only
Work backward from crash site
Remember why each edge is added
  – Record justifications in route map
  – route(x, z) = {r_1, …, r_n}
  – r_i = cross from x to y, then see route(y, z)
      – x and y must be “adjacent”: one of four cases
route(α, ω) defines possible chronologies
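
The route map idea can be illustrated on a graph with ordinary flow edges only, that is, covering just the transitive-flow case and none of the call/return cases. The sketch below is an assumption-laden simplification: the names build_route_map and chronologies are hypothetical, the target z is fixed to the crash site ω so the map is keyed by x alone, and cycles are skipped during enumeration to keep the output finite.

```python
from collections import defaultdict, deque

def build_route_map(edges, omega):
    """Search backward from omega, recording why each node can reach it.

    route[x] holds every y such that "cross from x to y, then see route[y]".
    """
    preds = defaultdict(list)
    for x, y in edges:
        preds[y].append(x)
    route = defaultdict(set)
    seen = {omega}
    work = deque([omega])
    while work:
        y = work.popleft()
        for x in preds[y]:
            route[x].add(y)        # justification for x reaching omega
            if x not in seen:
                seen.add(x)
                work.append(x)
    return route

def chronologies(route, alpha, omega, path=()):
    """Yield the acyclic paths from alpha to omega described by the route map."""
    path = path + (alpha,)
    if alpha == omega:
        yield path
        return
    for y in sorted(route.get(alpha, ())):
        if y not in path:          # skip cycles so enumeration terminates
            yield from chronologies(route, y, omega, path)

# Hypothetical flow graph: a diamond leading to the crash site "w",
# plus an unrelated edge x -> y that cannot reach the crash.
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "w"), ("x", "y")]
route = build_route_map(edges, "w")
paths = list(chronologies(route, "a", "w"))
```

Working backward means only nodes that can actually reach the crash site ever enter the map; the unrelated x → y edge is never examined.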

Reconstruction With Crash Site Only
Case 4 (unmatched call) defines stack trace
  – Unmatched parens: {()(){()}[{}(())]}
  – Stack trace: {[(
But we probably have a specific stack trace in mind…
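
The relationship between unmatched calls and the stack trace can be shown directly: the calls still open at the crash, read outermost first, are the stack. In the sketch below the function name stack_trace is hypothetical, and the example input is an assumed prefix of the slide's label string (the transcript does not preserve which part of the string the slide marked as unmatched).

```python
CALL_FOR = {")": "(", "]": "[", "}": "{"}

def stack_trace(labels):
    """Return the calls left unmatched by an aborted execution, outermost first."""
    stack = []
    for ch in labels:
        if ch in CALL_FOR:
            if not stack or stack[-1] != CALL_FOR[ch]:
                raise ValueError(f"return {ch!r} does not match the pending call")
            stack.pop()
        elif ch in "([{":
            stack.append(ch)
        else:
            raise ValueError(f"not an edge label: {ch!r}")
    return "".join(stack)

# Assumed crash point: the execution stops partway through the full string.
aborted_prefix = "{()(){()}[{}(()"
trace = stack_trace(aborted_prefix)   # "{[("
```

Many different aborted executions share the same stack trace, which is why a stack trace alone narrows the set of possible chronologies without pinning down a single one.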

Reconstruction With Crash Site + Stack Trace
S ::= vector of call edges
Build |S| + 1 clones of global flow graph
Two types of call edge
  – (_i can match )_i
      – Stays on same layer
  – c_i are unmatched
      – Only way to next layer
      – Determined by S
[figure: layer-crossing call edges labeled c_6, c_3, c_14]

Reconstruction With Crash Site + Stack Trace
Possible histories
  – Start at α on top layer
  – End at ω on bottom layer
  – route(⟨α, 0⟩, ⟨ω, |S|⟩)
Backward, not forward
  – More deterministic
Complexity
  – O(E) work, |S| + 1 times
[figure: layer-crossing call edges labeled c_6, c_3, c_14]

Reconstruction With Crash Site + Event Trace
V ::= vector of trace nodes
Use |V| + 1 layered clones, as before
Must report event when crossing trace node
  – On each layer, knock out all trace nodes but one
      – On bottommost layer, no trace nodes at all!
  – Further restricts set of possible paths
Complexity: O(E|V|)
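
The layered-clone construction can be sketched without materializing the clones, by searching over (node, layer) pairs. The code below is an illustrative simplification, not the talk's implementation: it assumes a graph with ordinary flow edges only (no calls), assumes α is not itself a trace node, searches forward rather than backward for brevity, and uses the hypothetical name consistent_with_trace.

```python
from collections import deque

def consistent_with_trace(edges, trace_nodes, alpha, omega, V):
    """Can some path from alpha to omega report exactly the events in V?

    edges:       iterable of (x, y) flow edges
    trace_nodes: set of instrumented nodes that report an event when crossed
    V:           the observed event trace, as a list of trace nodes
    State (n, layer) means: at node n, having reported V[:layer] so far.
    """
    succs = {}
    for x, y in edges:
        succs.setdefault(x, []).append(y)
    last = len(V)                  # index of the bottommost layer
    start = (alpha, 0)
    seen = {start}
    work = deque([start])
    while work:
        n, layer = work.popleft()
        for m in succs.get(n, ()):
            if m in trace_nodes:
                # On this layer every trace node except V[layer] is knocked
                # out; on the bottommost layer all of them are.
                if layer == last or V[layer] != m:
                    continue
                state = (m, layer + 1)   # crossing it is the way down
            else:
                state = (m, layer)
            if state not in seen:
                seen.add(state)
                work.append(state)
    return (omega, last) in seen

# Hypothetical graph: t1 and t2 are instrumented; j loops back to a.
edges = [("a", "t1"), ("a", "t2"), ("t1", "j"), ("t2", "w"),
         ("j", "a"), ("j", "w")]
traced = {"t1", "t2"}
```

The state space has at most (|V| + 1) copies of each node and edge, which is consistent with the O(E|V|) bound stated on the slide.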

Reconstruction With Whatever You’ve Got Handy
Stack trace + event trace
Multiple event traces
Ambiguous traces
Incomplete event trace
  – Recent-branch registers
Program counter sampling
Finite state machine of your choosing…

Practical Considerations
Dynamic dispatch / function pointers
  – Usual static techniques (points-to, receiver-class, etc.)
  – Event tracing can help
  – Note: stack trace is never dynamic
Interactivity
  – Backward analysis is ideal: most bugs are close to crash
  – FIFO work list, demand-driven search
  – Deterministic versus non-deterministic state machines
Summarization / visualization
  – Dominator tree walk-back with progressive disclosure

Summary and Conclusions
Program analysis in an imperfect world
  – Post-crash: unique challenges / leverage points
CFL path recovery as basis for analysis
  – Efficient, demand-driven, adaptable
Future work
  – Adaptive annotation to fill in gaps
  – Leveraging multiple runs
  – Data value modeling