Mining Jungloids to Cure Programmer Headaches Dave Mandelin, Ras BodikUC Berkeley Doug KimelmanIBM.

Slides:



Advertisements
Similar presentations
Chapter 6 Server-side Programming: Java Servlets
Advertisements

SPLGraph: Towards a Formalism for Software Product Lines Itay Maman IBM Research – Haifa Goetz Botterweck Lero – The Irish software Engineering Research.
ASSUMPTION HIERARCHY FOR A CHA CALL GRAPH CONSTRUCTION ALGORITHM JASON SAWIN & ATANAS ROUNTEV.
110/6/2014CSE Suprakash Datta datta[at]cse.yorku.ca CSE 3101: Introduction to the Design and Analysis of Algorithms.
P3 / 2004 Register Allocation. Kostis Sagonas 2 Spring 2004 Outline What is register allocation Webs Interference Graphs Graph coloring Spilling Live-Range.
Compilation 2011 Static Analysis Johnni Winther Michael I. Schwartzbach Aarhus University.
Pointer Analysis – Part I Mayur Naik Intel Research, Berkeley CS294 Lecture March 17, 2009.
Michael Alves, Patrick Dugan, Robert Daniels, Carlos Vicuna
1 Mooly Sagiv and Greta Yorsh School of Computer Science Tel-Aviv University Modern Compiler Design.
Situation Calculus for Action Descriptions We talked about STRIPS representations for actions. Another common representation is called the Situation Calculus.
TEST CASE GENERATOR Mahmoud, Nidhi, Fernando, Chris, Joe, John, and Thomas present:
Winter Compiler Construction T7 – semantic analysis part II type-checking Mooly Sagiv and Roman Manevich School of Computer Science Tel-Aviv.
Compiler Construction
Introduction To System Analysis and Design
Aki Hecht Seminar in Databases (236826) January 2009
CS 330 Programming Languages 09 / 18 / 2007 Instructor: Michael Eckmann.
1 CS1001 Lecture Overview Object Oriented Design Object Oriented Design.
Chapter 9 Introduction to Procedures Dr. Ali Can Takinacı İstanbul Technical University Faculty of Naval Architecture and Ocean Engineering İstanbul -
Stimulating reuse with an automated active code search tool Júlio Lins – André Santos (Advisor) –
CS 106 Introduction to Computer Science I 03 / 17 / 2008 Instructor: Michael Eckmann.
From C++ to C#. Web programming The course is on web programming using ASP.Net and C# The course is on web programming using ASP.Net and C# ASP.Net is.
CSE 332: C++ templates and generic programming I Motivation for Generic Programming in C++ We’ve looked at procedural programming –Reuse of code by packaging.
UNIT-V The MVC architecture and Struts Framework.
Grammars and Parsing. Sentence  Noun Verb Noun Noun  boys Noun  girls Noun  dogs Verb  like Verb  see Grammars Grammar: set of rules for generating.
Instructor: Dr. Sahar Shabanah Fall Lectures ST, 9:30 pm-11:00 pm Text book: M. T. Goodrich and R. Tamassia, “Data Structures and Algorithms in.
Reverse Engineering State Machines by Interactive Grammar Inference Neil Walkinshaw, Kirill Bogdanov, Mike Holcombe, Sarah Salahuddin.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University ICSE 2003 Java.
Managing the development and purchase of information systems (Part 1)
CISC6795: Spring Object-Oriented Programming: Polymorphism.
COP4020 Programming Languages
An Introduction to Software Architecture
JavaScript II ECT 270 Robin Burke. Outline JavaScript review Processing Syntax Events and event handling Form validation.
1 PARSEWeb: A Programmer Assistant for Reusing Open Source Code on the Web Suresh Thummalapenta and Tao Xie Department of Computer Science North Carolina.
EJB Framework.  As we know, EJB is the center of the J2EE architecture that provides a sturdy framework for building enterprise applications. The major.
Department of Computer Science A Static Program Analyzer to increase software reuse Ramakrishnan Venkitaraman and Gopal Gupta.
1 Java Inheritance. 2 Inheritance On the surface, inheritance is a code re-use issue. –we can extend code that is already written in a manageable manner.
CSE 425: Data Types II Survey of Common Types I Records –E.g., structs in C++ –If elements are named, a record is projected into its fields (e.g., via.
{ Graphite Grigory Arashkovich, Anuj Khanna, Anirban Gangopadhyay, Michael D’Egidio, Laura Willson.
Lecturer: Prof. Piero Fraternali, Teaching Assistant: Alessandro Bozzon, Advanced Web Technologies: Struts–
Arrays Module 6. Objectives Nature and purpose of an array Using arrays in Java programs Methods with array parameter Methods that return an array Array.
Major objective of this course is: Design and analysis of modern algorithms Different variants Accuracy Efficiency Comparing efficiencies Motivation thinking.
CSE 425: Data Types I Data and Data Types Data may be more abstract than their representation –E.g., integer (unbounded) vs. 64-bit int (bounded) A language.
Static Program Analyses of DSP Software Systems Ramakrishnan Venkitaraman and Gopal Gupta.
Server-side Programming The combination of –HTML –JavaScript –DOM is sometimes referred to as Dynamic HTML (DHTML) Web pages that include scripting are.
Exploiting Code Search Engines to Improve Programmer Productivity and Quality Suresh Thummalapenta Advisor: Dr. Tao Xie Department of Computer Science.
Cross Language Clone Analysis Team 2 October 13, 2010.
Weaving a Debugging Aspect into Domain-Specific Language Grammars SAC ’05 PSC Track Santa Fe, New Mexico USA March 17, 2005 Hui Wu, Jeff Gray, Marjan Mernik,
An Undergraduate Course on Software Bug Detection Tools and Techniques Eric Larson Seattle University March 3, 2006.
Scientific Debugging. Errors in Software Errors are unexpected behaviors or outputs in programs As long as software is developed by humans, it will contain.
M1G Introduction to Programming 2 5. Completing the program.
8 Chapter Eight Server-side Scripts. 8 Chapter Objectives Create dynamic Web pages that retrieve and display database data using Active Server Pages Process.
Chapter 10 Algorithmic Thinking. Learning Objectives Explain similarities and differences among algorithms, programs, and heuristic solutions List the.
STL CSSE 250 Susan Reeder. What is the STL? Standard Template Library Standard C++ Library is an extensible framework which contains components for Language.
Chapter 18 Object Database Management Systems. Outline Motivation for object database management Object-oriented principles Architectures for object database.
Recommending Adaptive Changes for Framework Evolution Barthélémy Dagenais and Martin P. Robillard ICSE08 Dec 4 th, 2008 Presented by EJ Park.
JAVA: An Introduction to Problem Solving & Programming, 6 th Ed. By Walter Savitch ISBN © 2012 Pearson Education, Inc., Upper Saddle River,
CSE 332: C++ STL iterators What is an Iterator? An iterator must be able to do 2 main things –Point to the start of a range of elements (in a container)
(c) University of Washington10-1 CSC 143 Java Errors and Exceptions Reading: Ch. 15.
Programming by Sketching Ras Bodik. 2 The Problem Problem: k-line algorithm translates to k lines of code. 30-year-old idea: Can we synthesize the.
1 Budapest University of Technology and Economics Department of Measurement and Information Systems Budapest University of Technology and Economics Fault.
Arrays Chapter 7.
The Emergent Structure of Development Tasks
Types for Programs and Proofs
Lecture 22 Complexity and Reductions
Lecture 2 of Computer Science II
Topics Introduction to File Input and Output
CISC/CMPE320 - Prof. McLeod
CSC 143 Java Errors and Exceptions.
Topics Introduction to File Input and Output
MAPO: Mining and Recommending API Usage Patterns
Presentation transcript:

Mining Jungloids to Cure Programmer Headaches Dave Mandelin, Ras BodikUC Berkeley Doug KimelmanIBM

Motivation: the price of code reuse Inherently, reusable code has complex APIs. Why? –Many classes and methods –Indirection –Many options Simple tasks often require arcane code — jungloids –Example. In Eclipse IDE, parsing a Java file into an AST –simple: a handle for the Java file (object of type IFile ) –simple: what we want (object of type CompilationUnit ) –hard: finding the parser took hours of documentation/code browsing ICompilationUnit cu = JavaCore.createCompilationUnitFrom(javaFile); CompilationUnit ASTroot = AST.parseCompilationUnit(cu, false);

First key observation Part 1: Headache task requirements can usually be described by a 1-1 query: “What code will transform a (single) object of (static) type A into a (single) object of (static) type B?” Our experiments: –12 out of 16 queries are of such single-source, single- target, static-type nature Same example: –type A: IFile, type B: CompilationUnit ICompilationUnit cu = JavaCore.createCompilationUnitFrom(javaFile); CompilationUnit ASTroot = AST.parseCompilationUnit(cu, false);

First key observation (cont’d) Part 2: Most 1-1 queries are correctly answered with 1-1 jungloids 1-1 jungloid: an expression with single-input, single- output operations: –field access; instance method calls with 0 arguments; static method and constructor calls with one argument ; array element access. Our experiments: –9 out of 12 such 1-1 queries are 1-1 jungloids –Others require operations with k inputs ICompilationUnit cu = JavaCore.createCompilationUnitFrom(javaFile); CompilationUnit ASTroot = AST.parseCompilationUnit(cu, false);

Prospector: a jungloid assistant tool Prospector: a programmer’s “search engine” –mine API implementation and sample client code –search a jungloid “database” –paste the result into programmers code User experience: –similar to code assist in Eclipse or.Net –editor cursor position specifies both target type B and context from which the source type A is drawn Soundness guarantees? –such as “does the mined jungloid do the work I intend?” –no such guarantees, of course (because the query doesn’t specify the full intention)

Demo Place the demo already here, to show that Prospector is self- explanatory (no need to explain what’s under the hood to use the tool)? –depends whether the demo exposes some limitations that you cannot explain without first explaining how Prospector works Return to the demo later, to show more complex examples? Jungloids to demo here: –IFile  ASTNode (so far, the running example; use context with size one) –IEditor  Shell (use a bigger context) –ISelection  ASTNode (needs cast) –File  FileChannel Jungloids that you are not able to demo, you can leave as non- demoed examples for later slides

Demo

Program representation The representation is defined to support 1-1 jungloid mining –A directed graph where each path is a 1-1 jungloid –Vertices: pointer types (instances and arrays) –Edges: well-typed expressions with single pointer-typed input and single pointer-typed output A small part of our representation: JavaEditorIWorkbenchPartSiteShell ISourceViewerStyledText.getSite().getShell().getTextWidget().getViewer().getShell()

Second key observation The jungloid that answers a 1-1 query “How do I get from A to B?” typically corresponds to the shortest path from A to B. –Fewer steps are fewer chances to throw an exception return semantically unrelated objects confuse the programmer JavaEditor IWorkbenchPartSite Shell.getSite().getShell() ISourceViewerStyledText.getTextWidget().getViewer().getShell() IViewPartInputProviderUserInputWizardPage String JFormattedTextField ObjectMouseMotionListener EventListenerProxy

Experiment (shortest-path jungloids) Result: –in 10 out of 10 queries, shortest path produced correct code Breakdown: 9found best code (in 3, path length = 1, but code non- trivial) 1found correct code, but the graph contains a subjectively better jungloid of equal length Conclusions: –shortest path a very good heuristic for finding correct jungloids –offering k shortest jungloids likely to find the best jungloid

The downcast problem Problem: Java code is full of downcasts –containers return Objects –type depends on configuration files or other input IStructuredSelection ssel = (IStructuredSelection) sel; ICompilationUnit cu = (ICompilationUnit) sel.getFirstElement(); CompilationUnit ast = AST.parseCompilationUnit(cu, false); CompilationUnitICompilationUnit IStructuredSelectionObject.getFirstElement() AST.parseCompilationUnit(_, false).getFirstElement() ActionContext ac = new ActionContext(startVar); char[] char_ary = ac.toString().toCharArray(); CompilationUnit resultVar = AST.parseCompilationUnit(char_ary);

The subtype mining algorithm Mining a code base –Mine sample API client code base to find valid casts –Assumption: Code base contains the scenario the user wants Goal: for A.f() declared to return object of T, find a superset of possible dynamic subtypes –Superset ensures that the correct jungloid is in the graph Idea: mine invocation sites of A.f(), find casts reached by return value Algorithm: flow insensitive, interprocedural inference – (T) e 1  T  types[e 1 ] – e 1 instanceof T  T  types[e 1 ] – types[e 1 ]  types[(e 0 ? e 1 : e 2 )] – T x = e 1  types[x]  types[e 1 ]

The big picture Prospector architecture –cast miner –shortest path searcher –representation –UI TODO (goal of slide is to use the architecture to recap how prospector works)

Summary Two key observations –many headache scenarios are searches for 1-1 jungloids –most jungloids can be found with “k-shortest paths” over a simple program representation based on declared types + static cast mining Under the hood –new program representation –cast mining –memory footprint reduction (graph clustering) Prospector status –for Java under Eclipse –to be available … Summer 2004

Future work Semantics Q: Is this jungloid semantically valid? A: Model checking Types Q: Can we mine more kinds of jungloids? A: Java 1.5 generics A: Inferring polymorphic types A: Inferring input types A: Typestates Plenty more…

Future work Graph-theoretic considerations: –breaking 2-cycles (conversion from A to B and back) –high-degree nodes may need special handling. –assign weights regarding probabilities that expressions will succeed, and use weighted SP. Generalize the downcast problem –into a more general inference of narrower types. Modeling and inferring generics/polymorphic types –legacy code k-shortest path results –ranking –clustering Dynamic techniques –finding downcasts controlled by configuration data