10/06/2015, Dr Andy Brooks. MSc Software Maintenance (MS Viðhald Hugbúnaðar), Lectures 27 & 28: Debugging with the Whyline tool


1 Debugging with the Whyline tool
http://www.cs.cmu.edu/~NatProg/whyline-java.html

2 Case Study
Reference: Debugging Reinvented: Asking and Answering Why and Why Not Questions about Program Behavior, Andrew J. Ko and Brad A. Myers, ICSE ’08, pp. 301-310, 2008. ©ACM

3 1. Introduction
When program behaviour is incorrect, software engineers must think of questions to ask about the code. Often they simply guess.
– “Is this double increment caused by a typo somewhere, a ‘2’ perhaps instead of a ‘1’?”
Studies have reported that initial guesses are wrong almost 90% of the time.
– The double increment was actually caused by faulty program logic which resulted in the incrementing method being called twice.

4 1. Introduction
Breakpoint debuggers require the software engineer to choose a line of code.
– to examine program state at a particular time
Slicing tools also require the software engineer to choose a seed variable or statement.
– to display all the code that has an influence
If the wrong variable or wrong line of code is chosen, then tool output can be irrelevant to solving the problem.
– garbage in, garbage out

5 1. Introduction
Whyline
A new kind of program understanding and debugging tool.
Whyline allows the user to choose a “why did” or “why didn’t” question about program output.
Whyline then generates an answer to the question using various program analyses.
– static and dynamic slicing, precise call graphs, “new algorithms”
– chains of events as explanations
Whyline works with Java programs that use standard Java I/O and that do not run “too long”.

6 2. An Example
Simple painting application
The user demonstrates the program behaviour they want to inquire about (1).
When the program halts, Whyline loads the trace.
Using a time controller (2), the user finds the point in time they want to ask about.
The user clicks on something of interest and questions pop up about it (3).
The user selects a question.
Whyline determines the responsible execution sequence and the user can select from a list of pop-up questions (4).
Whyline determines the instantiation event (5) and the corresponding source code is shown (6).
The call stack and locals at the time of the selected event are also shown (7).

7 Figure 1. ©ACM (slide annotations: green not blue slider used interactive debugging)

8 2. An Example
Using Whyline, time spent debugging was halved because:
– people do not have to guess search terms or understand the resulting matches
– people do not have to set breakpoints
Using Whyline, people “simply pointed to something that they knew was relevant and wrong, and let the Whyline determine the related evidence”.
Watch the Whyline videos:
http://www.cs.cmu.edu/~NatProg/movies/whyline-java-demo-web.mov
http://www.cs.cmu.edu/~NatProg/movies/whyline-java-tutorial-web.mov
http://staff.unak.is/not/andy/MScMaintenance0809/WhyLine/whyline-java-demo-web.mov
http://staff.unak.is/not/andy/MScMaintenance0809/WhyLine/whyline-java-tutorial-web.mov

9 3.1 Recording an Execution Trace
The Whyline takes a postmortem approach to debugging by capturing a trace.
A trace stores Java source files, instrumented class files, sequences of events in each thread, and other types of metadata.
Each thread has a separate trace file for its events.
Currently 55 types of events are defined in the Whyline.
Events include values after their header to help developers interpret program state.
– for an assignment event, the value assigned is included
– for an invocation event, values passed as arguments are included
(slide image label: method invocation)
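The event records described on this slide can be pictured with a small sketch. It is illustrative only and is not the Whyline's trace format: the `Event` fields, the `TraceRecorder` class, the two event kinds and the method names such as `Slider.setValue` are all assumptions made for the example, as is the use of one global counter for event IDs.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Event:
    """One recorded event: a header (id, thread, kind, target) followed by values."""
    event_id: int        # assumed: a single global counter across all threads
    thread: str
    kind: str            # hypothetical kinds: "assign" or "invoke"
    target: str          # field assigned to, or method invoked
    values: tuple = ()   # value assigned, or arguments passed


class TraceRecorder:
    """Keeps a separate event list per thread, like one trace file per thread."""

    def __init__(self):
        self._ids = count()
        self.per_thread = defaultdict(list)

    def record(self, thread, kind, target, *values):
        event = Event(next(self._ids), thread, kind, target, values)
        self.per_thread[thread].append(event)
        return event


recorder = TraceRecorder()
recorder.record("main", "invoke", "Slider.setValue", 251)        # arguments are kept
recorder.record("main", "assign", "Slider.value", 251)           # assigned value is kept
recorder.record("painter", "invoke", "Graphics.drawLine", 0, 0, 10, 10)
```

Storing the values next to each event header is what later lets a question such as "Why did field = value?" be answered from the trace alone, without re-running the program.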

10 3.1 Recording an Execution Trace
Unexecuted classes referenced by a dynamically loaded class are also saved as part of the trace to help answer “why didn’t” questions.
– This is not applied recursively as this would “likely include all known classes”.

11 3.2 Loading a trace
All source files and class files are loaded.
– used for almost every aspect of question and answering
Whyline constructs lists of output instructions which are used as the basis for generating questions.
Whyline generates a call graph from the invocations found in the class files.
Then events are loaded in order of their event IDs.
– Whyline has a “complete ordering of the events in the execution.”
“To improve the performance of question derivation and answering, the Whyline constructs lists of invocations, assignments to fields, and other types of events.”
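A minimal sketch of the loading step, assuming each per-thread trace is already sorted by a globally unique event ID. The tuple layout `(event_id, kind, target, value)` and the sample events are hypothetical. Merging the per-thread lists by ID yields one complete ordering, and a single pass then builds the per-kind lists used to speed up later lookups.

```python
import heapq
from collections import defaultdict

# Hypothetical per-thread traces: (event_id, kind, target, value), sorted by event_id.
main_thread = [
    (0, "invoke", "Slider.setValue", 251),
    (1, "assign", "Slider.value", 251),
    (4, "invoke", "Canvas.repaint", None),
]
paint_thread = [
    (2, "assign", "Pen.color", "green"),
    (3, "invoke", "Graphics.drawLine", None),
]


def load_events(*thread_traces):
    """Merge sorted per-thread traces into one total order and index events by kind."""
    ordered = list(heapq.merge(*thread_traces, key=lambda event: event[0]))
    by_kind = defaultdict(list)
    for event in ordered:
        by_kind[event[1]].append(event)
    return ordered, by_kind


ordered, by_kind = load_events(main_thread, paint_thread)
print([event[0] for event in ordered])            # [0, 1, 2, 3, 4]
print([event[2] for event in by_kind["assign"]])  # ['Slider.value', 'Pen.color']
```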

12 3.3 Creating an I/O History
From the low-level event information recorded in traces, Whyline constructs a user interface for navigating the output history.
A user can move backwards and forwards in time.
The selected input time T determines what events are visible on the screen.
(slide images: snapshots from QuickTime video)
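The role of the selected time T can be sketched as a lookup into a time-ordered list of output events. This is a simplification under stated assumptions: the event tuples are invented, event IDs stand in for time, and output is treated as cumulative (nothing is ever erased), which a real output history would have to handle.

```python
from bisect import bisect_right

# Hypothetical output events, sorted by event ID: (event_id, description).
output_events = [
    (3, "drawLine red"),
    (7, "drawLine green"),
    (12, "drawRect blue"),
]
output_ids = [event_id for event_id, _ in output_events]


def visible_at(t):
    """Return the output events that have happened at or before time t."""
    return output_events[:bisect_right(output_ids, t)]


print(visible_at(7))   # [(3, 'drawLine red'), (7, 'drawLine green')]
print(visible_at(2))   # []
```

Because the list is sorted, moving the time controller backwards or forwards only changes the cut-off index, which is one way such navigation could stay interactive.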

13 3.3 Creating an I/O History
Whyline finds fields and invocations that could have influenced output.
– “For example, the color of a rectangle might be affected by some field in an object, or by the return value of a call to some method.”
If an output instruction directly invokes output rather than simply influences it (e.g. draws a rectangle rather than sets the rectangle’s colour), Whyline marks all the potential indirect callers as output invoking.
(slide image label: tracking dependencies)
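Marking "all the potential indirect callers" can be read as backwards reachability over the call graph. The sketch below assumes that reading; the call graph, the method names and the `output_invoking` function are hypothetical, and the Whyline's own analysis may differ.

```python
from collections import defaultdict, deque

# Hypothetical static call graph: caller -> methods it may call.
call_graph = {
    "App.main": ["Canvas.repaint", "Slider.setValue"],
    "Canvas.repaint": ["Canvas.paintStroke"],
    "Canvas.paintStroke": ["Graphics.drawLine"],
    "Slider.setValue": ["Pen.setColor"],
    "Pen.setColor": [],
    "Graphics.drawLine": [],
}


def output_invoking(graph, output_methods):
    """Mark every method that can reach an output method through some chain of calls."""
    callers = defaultdict(set)
    for caller, callees in graph.items():
        for callee in callees:
            callers[callee].add(caller)

    marked = set(output_methods)
    queue = deque(output_methods)
    while queue:                      # breadth-first walk over the reversed edges
        method = queue.popleft()
        for caller in callers[method]:
            if caller not in marked:
                marked.add(caller)
                queue.append(caller)
    return marked


marked = output_invoking(call_graph, {"Graphics.drawLine"})
print(sorted(marked))
# ['App.main', 'Canvas.paintStroke', 'Canvas.repaint', 'Graphics.drawLine']
```

In this toy graph `Pen.setColor` and `Slider.setValue` are not marked: they can only influence output (the colour), which matches the slide's distinction between output-invoking and output-influencing code.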

14 3.4 Deriving questions (see Figure 3)
(1) Why did property = value? (refers to value passed to output call)

15 3.4 Deriving questions (see Figure 3)
(6) Why didn’t an instance of class C appear? (refers to instantiations of C)
“Why didn’t” questions “support questions about output that has no representative output to click on”.
Whyline has a “why didn’t” question for each familiar class that has output invoking methods (not output influencing), inherited or declared.
– “A class is familiar if user owned code either defines or references the specific class.”

16 3.4 Deriving questions (see Figure 3)
(4) Why did object get created? (refers to instantiation of object)
(5) Why didn’t method execute after time T? (refers to potential invocation instructions)

17 3.4 Deriving questions (see Figure 3)
(2) Why did field = value? (refers to assignment before T)
(3) Why didn’t field’s value change after time T? (refers to potential assignment instructions)
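Question (2) can be sketched as a search for the most recent assignment to a field before the selected time T. The assignment list, the field names and both helper functions are hypothetical, and "before T" is taken here to mean a strictly smaller event ID.

```python
# Hypothetical assignment events, sorted by event ID: (event_id, field, value).
assignments = [
    (2, "Pen.color", "blue"),
    (5, "Slider.value", 251),
    (9, "Pen.color", "green"),
    (14, "Pen.color", "red"),
]


def last_assignment_before(field_name, t):
    """Return the latest assignment to field_name with event_id < t, or None."""
    for event_id, field, value in reversed(assignments):
        if field == field_name and event_id < t:
            return (event_id, field, value)
    return None


def why_did_question(field_name, t):
    """Phrase a 'Why did field = value?' question for the selected time t."""
    event = last_assignment_before(field_name, t)
    if event is None:
        return None  # no assignment before t, so there is nothing to ask about
    return f"Why did {event[1]} = {event[2]}?"


print(why_did_question("Pen.color", 10))   # Why did Pen.color = green?
```

The question text is built directly from the identifier in the code, so a field named `wd` would produce "Why did wd = 251?".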

18 5. EVALUATION
5.1 Performance Feasibility
Performance tests were run on a 2 GHz Intel Core Duo MacBook Pro with 2 GB of RAM.
– standard OS X JVM, given a 1 GB heap
The Unix time command was used to measure time to a tenth of a second.
The case study article text says performance tests were run five times and the results averaged.
– Table 1 says tests were run 10 times and results averaged.
Execution times were measured for normal operation, for profiling (using the profiler YourKit), and for tracing using the Whyline.

19 Table 1 ©ACM
LOC calculated omitting whitespace lines.
Whyline’s tracing is slower than profiling “because it instruments more code”.
Whyline’s tracing time should improve once Whyline has been optimised.

20 Table 1 ©ACM
Compressed trace sizes compare favourably with those reported in dynamic slicing work.
Loading time is an issue.
The single biggest limiting factor is memory.
The larger traces resulted in garbage collection and virtual memory use.
– improvements in Whyline’s memory management are needed

21 5. EVALUATION
Does Whyline scale?
A minute of user interaction with ArgoUML was tested.
– 35,597 I/O events
The output history is navigable at interactive speeds.
Clicking on an event produced a menu of questions at interactive speeds.

22 5. EVALUATION
5.2 Question Coverage
Does the Whyline provide questions that a user actually wants to ask?
Nine bug reports for the applications listed in Table 1 were chosen at random.
All but one bug report had a possible corresponding Whyline question.
– one bug report was a feature request
This evaluation did not test actual Whyline usage.
– Would the user actually locate the question and would Whyline’s answer make any sense?
“In future work, we will assess this issue in greater detail.”

23 5. EVALUATION
5.3 User Study
A pilot evaluation was conducted with 9 participants having a variety of backgrounds:
– psychology, design, computer science, linguistics, food science, engineering
One participant had never seen a line of code. Another had programmed for more than 10 years.
The evaluation task was to resolve the slider bug using the Whyline.
Task performance was compared with 18 self-described Java experts who used Eclipse 2.1 to resolve the slider bug in a previous study [10].

24 5. EVALUATION
5.3 User Study
Participants received a short tutorial (1-2 minutes) on how to use the Whyline.
The blue slider’s incorrect behaviour was demonstrated to participants.
– Participants were asked to find the cause of this incorrect behaviour.
Participants were allowed to ask about the user interface but not about the task or the code.
The experimenter offered clarification if a user expressed confusion about the user interface.

25 5. EVALUATION
5.3 User Study

Times (minutes)   Minimum   Maximum   Median
Whyline           1         12        4
control group     3         38        10

Whyline participants were more than twice as fast as the Java experts (the control group).
– statistically significant difference, p < 0.05 (Wilcoxon rank sums test)
The pilot evaluation has limited external validity.
– single task
– small sample size (n=9)

26 5. EVALUATION
5.3 User Study
Novices in the pilot evaluation tended to outperform the experts in the pilot evaluation.
– Often they asked aloud “Why is the line blue?” and used Whyline directly to have the question answered.
Experts in the pilot evaluation asked the same question, but they first speculated about the reason rather than using the Whyline directly.
– e.g. “Why didn’t this slider’s event get handled?”
One expert didn’t expect that Whyline could make the connection between the slider and the color.

27 7. Limitations (of Whyline)
The Whyline tracing approach is practical only for executions lasting a few minutes.
Some bugs can only be reproduced without interference from instrumentation.
Loading traces feels “heavier” in comparison to breakpoint debugger use that has virtually no setup time.
Cryptic names used for method and field names will result in cryptic Whyline questions.
– ‘Why did wd = 251?’ rather than ‘Why did width = 251?’
Whyline helps find code related to a behaviour but does not explain how to change that behaviour.

28 8. Discussion
Whyline has no special knowledge about user interface toolkits or other APIs.
A user thinking “Why didn’t this window change?” must choose a question like “Why didn’t this JFrame’s repaint() method get called?”
“It might be helpful if one could write plug-ins for the Whyline to add special knowledge and heuristics for certain APIs, to improve the specificity of questions and answers.”

29 8. Discussion
Modern applications can run across multiple platforms and can be written using multiple languages. How can traces be captured in such an environment?
Does Whyline need to provide support for people collaborating on bug fixing?

30 Critical commentary from Andy
Whyline technology could revolutionise approaches to debugging.
The evaluation, however, was focussed on one defect in a small, stand-alone application.
– The result regarding time saved is not generalisable.
Would maintainers prefer a DORA approach to identify relevant code to reason about rather than explore the question set posed by Whyline?
Much more evaluation work needs to be done.

