UW-MSR Workshop: Accelerating the Pace of Software Tools Research: Sharing Infrastructure
August 2001
Software Engineering Tools Research on Only $10 a Day
William Griswold
University of California, San Diego

2 Goals
1. How program analysis is used in software engineering, and how that impacts research
2. Issues for tool implementation, infrastructure
3. Infrastructure approaches and example uses
  – My lab's experiences, interviews with 5 others
  – Not every infrastructure out there, or IDE infras
4. Lessons learned
5. Challenges and opportunities

3 Base Assumptions
Software engineering is about coping with the complexity of software and its development
  – Scale, scope, arbitrariness of real world
Evaluation of SE tools is best done in settings that manifest these complexities
  – Experiment X involves a tool user with a need
  – Hard to bend real settings to your tool
Mature infrastructure can put more issues within reach at lower cost
  – Complete & scalable tools, suitable for more settings
Background

4 Role of Program Analysis in SE
Behavioral: find/prevent bugs; find invariants
  – PREfix, Purify, HotPath, JInsight, DejaVu, TestTube
Structural: find design anomalies, architecture
  – Lackwit, Womble, RM, Seesoft, RIGI, slicers
Evolutionary: enhance, subset, restructure
  – Restructure, StarTool, WolfPack
Discover hidden or dispersed program properties, display them in a natural form, and assist in their change

5 Analysis Methods
Dynamic
  – Trace analysis
  – Testing
Static
  – Lexical (e.g., grep, diff)
  – Syntactic
  – Data-flow analysis, abstract interpretation
  – Constraint equation solving
  – Model checking; theorem proving
Issues are remarkably similar across methods

6 Use in Iterative Analysis Cycle
1. Programmer identifies problem or task
  "Is this horrific hack the cause of my bug?"
2. Choose program source model and analysis
  "I think I'll do a slice with Sprite." [data-flow analysis]
3. Extract (and analyze) model
  [Programmer feeds code to slicer, chooses variable reference in code that has wrong value]
4. Render model (and analysis)
  [Tool highlights reached source text]
5. Reason about results, plan course of action
  "Nope, that hack didn't get highlighted…"
Steps 2-4 may be done manually, with ad hoc automation, an interactive tool, or a batch tool.
User-tool interface is rich and dynamic.
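
Steps 3 and 4 of the cycle can be sketched for a backward slice. This is a toy under strong assumptions, not Sprite's algorithm: the program is straight-line code (no branches, loops, calls, or pointers), and the `Stmt` record, `backward_slice` function, and `PROGRAM` are hypothetical names invented for the illustration.

```python
from typing import NamedTuple

class Stmt(NamedTuple):
    line: int
    defines: str        # variable assigned by this statement
    uses: frozenset     # variables read by this statement

def backward_slice(program, criterion_line, criterion_var):
    """Return the lines that may affect criterion_var just before criterion_line.

    Assumes straight-line code only (hypothetical simplification).
    """
    relevant = {criterion_var}   # variables whose values still need explaining
    in_slice = set()
    # Walk backwards over the statements that precede the criterion.
    for stmt in reversed([s for s in program if s.line < criterion_line]):
        if stmt.defines in relevant:
            in_slice.add(stmt.line)
            relevant.discard(stmt.defines)   # this definition kills earlier ones
            relevant |= stmt.uses            # now explain what it read
    return sorted(in_slice)

PROGRAM = [
    Stmt(1, "a", frozenset()),         # a = input()
    Stmt(2, "b", frozenset()),         # b = 0   (the suspected "hack")
    Stmt(3, "c", frozenset({"a"})),    # c = a * 2
    Stmt(4, "d", frozenset({"c"})),    # d = c + 1
]

print(backward_slice(PROGRAM, 5, "d"))   # lines the tool would highlight
```

Slicing on `d` selects lines 1, 3, and 4. Line 2 is left out, which is the "that hack didn't get highlighted" outcome of step 5.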

7 Interactive, Graphical, Integrated

8 The Perfect Tool User
"Your tool will solve all sorts of problems. But it'll have to analyze my entire 1 MLOC program, which doesn't compile right now, and is written in 4 languages. I want the results as fast as compilation, with an intuitive graphical display linked back to the source and integrated into our IDE. I want to save the results, and have them automatically updated as I change the program. Oh, I use Windows, but some of my colleagues use Unix. It's OK if the tool misses stuff or returns lots of data, we can post-process. We just want a net win."
For our most recent tool, the first study involved a 500 KLOC Fortran/C app developed on SGIs

9 Unique Infrastructure Challenges
Wide-spectrum needs (e.g., GUI)
  – Provide function and/or outstanding interoperability
Whole-program analysis versus interactivity
  – Demand, precompute, reuse [Harrold], modularize
Source-to-source analysis and transformation
  – Analyze, present, modify as programmer sees it
Ill-defined task space and process structure
Saving grace is programmer: intelligent, adaptive
  – Can interpret, interpolate, iterate; adjust process
  – Requires tool (and hence infrastructure) support

10 Infrastructure Spectrum
Monolithic environment
  – Generative environment (Gandalf, Synthesizer Generator), programming language (Refine)
  – Reuse model: high-level language
    – Generator (compiler) or interpreter
Component-based
  – Frameworks, toolkits (Ponder), IDE plug-in support
  – Reuse model: interface
    – Piecewise replacement and innovation
    – Subclassing (augmentation, specialization)

11 Monolithic Environments
Refine: syntactic analysis & trans env [Reasoning]
  – Powerful C-like functional language w/ lazy eval.
  – AST datatype w/ grammar and pattern language
  – Aggregate ADTs, GUI, persistence, C/Cobol targets
  – Wolfpack C function splitter took 11 KLOC (1/2 reps, 5% LISP), no pointer analysis; slow [Lakhotia]
CodeSurfer: C program slicing tool [GrammaTech]
  – Rich GUI, PDG in repository, Scheme back door
  – ~500 LOC to prototype globals model [Lakhotia]
  – Not really meant for extension, code transformation
Great for prototyping and one-shot tasks

12 Components Overview
1. Standalone components
  – Idea: Ad hoc composition, lots of choices
  – Example Component: EDG front-ends
  – Example Tools: static: Alloy [Jackson]; dynamic: Daikon [Ernst]
2. Component architectures
  – Idea: Components must conform to design rules
  – Examples: data arch: Aristotle [Harrold]; control arch: Icaria [Atkinson]
3. Analyses (tools) as components
  – Idea: Infrastructure-independent tool design
  – Example: StarTool [Hayes]

13 Standalone Components
Component generators
  – Yacc, lex, JavaCC, JLex, JJTree, ANTLR …
  – Little help for scoping, type checking (symbol tables)
Representation packages for various languages
  – Icaria (C AST), GNAT (Ada), EDG (*), …
GUI systems galore, mostly generic
  – WFC, Visual Basic, Tcl/Tk, Swing; dot, vcg
Databases and persistence frameworks
Few OTS analyses available
  – Model checkers and SAT constraint solvers

14 Edison Design Group Front-Ends
Front-ends for C/C++, Fortran, Java (new)
  – Lexing, parsing, elaborated AST, generates C
Thorough static error checking
  – Know what you get, but not robust to errors
APIs best for translation to IR
  – Simple things can be hard; white-box reuse
Precise textual mappings
  – C/C++ AST is post-processed, but columns correct
C++ front-end can't handle some features

15 Alloy Tool [Jackson]
Property checker for Alloy OO spec language
  – Takes spec and property, finds counterexamples
  – Uses SAT constraint solvers for analysis back-end
  – Spec language designed explicitly for analyzability
Front-end
  – Wrote own lexer (JLex), parser (CUP), AST
  – Eased because of analyzability
Translation to SAT formula IR
  – Aggregate is mapped to collection of scalars
  – Several stages of formula rewriting
Ad Hoc Component Example, Static Analysis
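
The "aggregate is mapped to collection of scalars" step can be illustrated with a small sketch. It assumes a binary relation over a finite set of atoms, with one boolean variable per possible tuple, numbered from 1 in the style of DIMACS variables. The function names are hypothetical and this is not Alloy's actual encoding, which involves several further rewriting stages.

```python
from itertools import product

def relation_to_vars(atoms):
    """Assign one boolean variable number per possible pair of atoms."""
    return {pair: i for i, pair in enumerate(product(atoms, repeat=2), start=1)}

def vars_to_relation(var_of, model):
    """Map a solver model (set of true variable numbers) back to relation tuples."""
    return {pair for pair, var in var_of.items() if var in model}

var_of = relation_to_vars(["A", "B"])
# Suppose a solver reported variables 2 and 3 as true:
print(vars_to_relation(var_of, {2, 3}))
```

The reverse mapping is what lets a tool report a counterexample in terms of the specification's relations rather than raw solver variables.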

16 Alloy, cont'd
Uses 3 SAT solvers, each with strengths
  – National challenge resulted in standard SAT IR
  – Allowed declarative format for hooking in a solver
Java Swing for general GUI, dot for graphs
  – Scalars are mapped back to aggregates, etc., and results are reported as counterexamples
  – Currently don't map results directly back to program
    – Expects to use variables as a way to map to source
About 20 KLOC of new code to build Alloy

17 Alloy: Lessons
Designing for analyzability a major benefit
  – Eases all aspects of front-end and translation to SAT
  – Adding 3 kinds of polymorphism added 20 KLOC!
SAT solver National Challenge a boon
  – Several good solver components
  – Standard IR eased integration
SAT solver start/stop protocol the hardest
  – Primitive form of computational steering
  – Subprocess control, capturing/interpreting output
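
The subprocess-control problem can be sketched as follows. This is a minimal illustration, not how Alloy drives its solvers: it assumes the solver reads DIMACS CNF on standard input and prints `s <status>` and `v <literals>` lines, a convention from later SAT competitions that the 2001 solvers may not have followed. `run_solver` and `FAKE_SOLVER` are hypothetical names, and the "solver" here is a stand-in Python one-liner so the sketch runs anywhere. It covers only start, stop-on-timeout, and output parsing, not interactive steering.

```python
import subprocess
import sys

def run_solver(cmd, cnf_text, timeout_s):
    """Run a solver subprocess on CNF text; return (status, set of true variables)."""
    try:
        proc = subprocess.run(cmd, input=cnf_text, capture_output=True,
                              text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left running.
        return "TIMEOUT", set()
    status, model = "UNKNOWN", set()
    for line in proc.stdout.splitlines():
        if line.startswith("s "):
            status = line[2:].strip()
        elif line.startswith("v "):
            # Keep positive literals only; 0 terminates the list.
            model |= {int(tok) for tok in line[2:].split() if int(tok) > 0}
    return status, model

# Stand-in for a real solver binary (assumption for the sketch).
FAKE_SOLVER = [sys.executable, "-c",
               "print('s SATISFIABLE'); print('v 1 -2 0')"]

print(run_solver(FAKE_SOLVER, "p cnf 2 1\n1 -2 0\n", timeout_s=5))
```

Even in this small form, most of the code deals with process lifetime and output interpretation rather than with the formula, which matches the slide's point about where the effort goes.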

18 Daikon Tool [Ernst]
Program invariant detector for C and Java
  – Instruments program at proc entries/exits, runs it
  – Infers variable value patterns at program points
Programs with test-suites have been invaluable
  – Class programs with grading suites
  – Siemens/Rothermel C programs with test-suites
Front-end the least interesting, 1/2 the work
  – Parser, symbol table, AST/IR manipulation, unparser
    – Get any two: manipulation toys with symbol table
    – Symbol table the hardest, unparser the easiest
  – Lots of choices, a few false starts
Ad Hoc Component Example, Dynamic Analysis
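
The instrument-run-infer loop can be sketched in a few lines. This is an illustration of the idea only, not Daikon's implementation or invariant grammar: the decorator stands in for source or byte-code instrumentation, only positional arguments are traced, and the three invariant templates (`x == c`, `x >= 0`, `x <= y`) are chosen for the example. All names are hypothetical.

```python
import functools
from collections import defaultdict

TRACES = defaultdict(list)   # program point -> list of {variable: value} samples

def instrument(func):
    """Record argument values at procedure entry and the return value at exit."""
    names = func.__code__.co_varnames[:func.__code__.co_argcount]

    @functools.wraps(func)
    def wrapper(*args):
        TRACES[func.__name__ + ":ENTER"].append(dict(zip(names, args)))
        result = func(*args)
        sample = dict(zip(names, args))
        sample["return"] = result
        TRACES[func.__name__ + ":EXIT"].append(sample)
        return result
    return wrapper

def infer(point):
    """Report candidate invariants that hold over every sample at a program point."""
    samples = TRACES[point]
    if not samples:
        return []
    found = []
    variables = sorted(samples[0])
    for v in variables:
        values = {s[v] for s in samples}
        if len(values) == 1:
            found.append(f"{v} == {values.pop()}")
        elif all(x >= 0 for x in values):
            found.append(f"{v} >= 0")
    for a in variables:
        for b in variables:
            if a < b and all(s[a] <= s[b] for s in samples):
                found.append(f"{a} <= {b}")
    return found

@instrument
def square(n):
    return n * n

for n in (0, 1, 2, 5):       # stand-in for a test suite
    square(n)

print(infer("square:EXIT"))
```

The reported properties are only as good as the runs that produced them, which is why programs with test suites matter so much here: `n >= 0` is reported simply because no negative input was tried.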

19 Daikon: Choosing Java Front-End
Byte-code instrumenters (JOIE, Bobby)
  – Flexible and precise insertion points
  – Loss of names complicates mapping to source
  – Byte codes generated are compiler dependent
  – Debugging voluminous instrumentation is hard
Source-level instrumentation
  – Java lacks insertability, e.g., no comma operator
  – Invalidates symbol table, etc.
  – Chose Jikes, an open source compiler (got 2 of 4)
    – Added AST manipulation good enough to unparse
New byte-code instrumenters; EDG for Java

20 Ad Hoc Components: Critique
Freedom is great, but integration is weak
  – Data bloat: replicated and unused functionality
  – Minimal support for mapping between reps
    – Data: implementation of precise mappings
    – Control: synchronize to compute only what's needed
Scalability a huge issue; data-flow information for a 1 MLOC program, highly optimized:
  – 500 MB AST
  – 500 MB BB/CFG
  – 500 MB bit-vectors
Space translates to time by stressing memory hierarchy
Component-based architecture to the rescue

21 Aristotle Infrastructure [Harrold]
Data-flow analysis and testing infra for C
Database is universal integration mechanism
  – Provides uniform, loose integration
    – Separately compiled tools can write and read DB
  – Added ProLangs framework [Ryder] at modest cost
Scalability benefits
  – Big file system overcomes space problem
  – Persistence mitigates time problem
Performance still an issue, hasn't been focus
  – Loose control integration produces reps in toto
  – DB implemented with flat files
Data-based Component Architecture

22 Icaria Infrastructure [Atkinson]
Scalable data-flow (and syntactic) infra for C
  – Hypothesis: need optimized components, control integration, and user control for good performance
Space- and time-tuned data structures
  – AST, BBs, CFG; bit-vectors semi-sparse & factored
  – Memory allocation pools, free block
  – Steensgaard pointer analysis
    – Also piggybacked with CFG build pass for locality
Event-based demand-driven architecture
  – Compute all on demand; even discard/recompute
  – Persistently store undemandable information
Control-based Component Architecture
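
The "compute all on demand; even discard/recompute" idea can be sketched as a small store of lazily built representations. This is a hypothetical illustration, not Icaria's design: the class name, its methods, and the toy "AST" and "CFG" builders are invented, and the event mechanism and persistent storage are left out.

```python
class DemandStore:
    """Compute representations on demand, cache them, and allow discard/recompute."""

    def __init__(self):
        self._builders = {}    # rep name -> (builder function, names of reps it needs)
        self._cache = {}       # (rep name, unit) -> built representation
        self.build_log = []    # records each (re)computation, for illustration

    def register(self, name, builder, deps=()):
        self._builders[name] = (builder, tuple(deps))

    def demand(self, name, unit):
        key = (name, unit)
        if key not in self._cache:
            builder, deps = self._builders[name]
            inputs = [self.demand(d, unit) for d in deps]   # demand prerequisites first
            self._cache[key] = builder(unit, *inputs)
            self.build_log.append(key)
        return self._cache[key]

    def discard(self, name, unit):
        """Free the space; a later demand will rebuild the representation."""
        self._cache.pop((name, unit), None)

store = DemandStore()
store.register("ast", lambda unit: unit.split())                       # toy "parse"
store.register("cfg", lambda unit, ast: list(zip(ast, ast[1:])), deps=("ast",))

print(store.demand("cfg", "a b c"))   # builds the AST, then the CFG
store.discard("ast", "a b c")         # AST space reclaimed; the CFG stays usable
```

Discarding the AST after the CFG is built trades recomputation time for memory, the same trade the slicer on a later slide makes when it discards its AST and CFG.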

23 Event-based Demand Architecture

24 Icaria: User Control
Declarative precision management
  – Context sensitivity (call stack modelling)
  – Pointer analysis (e.g., distinguish struct fields)
Iteration strategies
  – With tuned bit-vector stealing and reclamation
Declarative programmer input
  – ANSI/non-ANSI typing, memory allocators, …
  – Adds precision, sometimes speed-up
Termination control
  – Suspend/resume buttons, procedural hook
  – Because analysis is a means to an end (a task)
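
Termination control through a procedural hook can be sketched with a worklist algorithm that consults a caller-supplied function before each step. This is a generic illustration using graph reachability, not Icaria's interface; the `should_stop` hook and its step-count argument are assumptions made for the example.

```python
def reachable(graph, start, should_stop=lambda steps: False):
    """Worklist reachability; returns (visited nodes, whether the analysis finished)."""
    visited, worklist, steps = {start}, [start], 0
    while worklist:
        if should_stop(steps):
            return visited, False      # partial result; the caller decides what next
        node = worklist.pop()
        steps += 1
        for succ in graph.get(node, ()):
            if succ not in visited:
                visited.add(succ)
                worklist.append(succ)
    return visited, True

GRAPH = {"a": ["b"], "b": ["c"], "c": []}
print(reachable(GRAPH, "a"))                                   # runs to completion
print(reachable(GRAPH, "a", should_stop=lambda s: s >= 1))     # stopped after one step
```

Returning the partial result along with a completion flag lets the user judge whether what has been computed so far already answers the question at hand.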

25 Icaria: The Price of Performance
Must conform to architectural rules to get performance benefits
  – E.g., can't demand/discard/redemand your AST unless it meets architecture's protocol
May cascade into a lot of front-end work
  – Can buy in modularly, incrementally
    – Demand in batch
    – Don't discard
Reconsider demand strategy for new analysis
  – I.e., when to discard, what to save persistently

26 Icaria: Scenario – Java Retarget
Use existing AST or derive off of Ponder's
Rethink pointer analysis
  – Calls through function pointers mean bad CG
  – Intersect (filter) Steensgaard with language types?
    – Modular; variant works for C
Rethink 3-address code and call-graph
  – Small methods (many, deep calling contexts)
  – Allocation contexts instead of calling contexts?
    – Context sensitivity module would support
Existing analyses not likely reusable OTS

27 Icaria: Applications
Icaria supports Cawk, Sprite slicer, StarTool
  – Cawk generated by Ponder syntactic infra [Atkinson]
  – Slicer is 6 KLOC: 50% GUI, 20% equations
    – Discard AST, CFG
    – Persistently store backwards call-graph
Scalability
  – Simple Cawk scripts run at 500 KLOC/minute
  – Sliced gcc (200 KLOC) on 200MHz/200MB UltraSparc
    – 1 hour --> 1/2 minute by tuning function pointers
    – Dependent on program and slice
    – Other parameters less dramatic

28 Designing for Reusable Analyses
Approaches assume that tool is coded within infrastructure
  – Complicates migration to a new infrastructure
Genoa [Devanbu] and sharlit [Tjiang] are monolithic language/generator solutions
How to design a reusable analysis component?
  – A client of infrastructure, so incomplete
Addressed for StarTool reengineering tool
  – Only front-end infra and target lang., not Tcl/Tk GUI
Analysis Components

29 StarTool: Main View
Referenced-by relation for entity in clustered hierarchy
Views are navigable, customizable, and annotable

30 StarTool: Adapter Approach [Hayes]
Interpose an adapter [GHJV] to increase separation of analysis and infra
[Diagram: Star and Infra joined directly ("?"), versus Star, Adapter, Infra]
What adapter interface allows best retargets?
Low-level: a few small, simple operations
  – E.g., generic tree traversal ops
More responsibility in Star relieves all future adapters
Did 3 retargets, including to GNAT Ada AST [Dewar]

31 StarTool: Lessons Learned
Retargets range from 500 to 2000 LOC
  – Precise mappings to source, language complexity
Best interface assumes nothing about infra
  – In extreme, don't assume there's an AST at all
  – Means providing operations that make StarTool's implementation easy (even though there's just one)
    – E.g., iterator for all references similar to this
Metaquery operations resolve feature specifics
  – Gives adapter lots of design room, can choose best
  – More, bigger ops; mitigated by template class [GHJV]
  – Got multi-language tool using 2 levels of adapters
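
An adapter interface that "assumes nothing about infra" can be sketched as follows. This is a hypothetical illustration, not StarTool's actual interface: the analysis asks only for an iterator over references to a name, and two interchangeable adapters supply it, one backed by Python's standard `ast` module (standing in for an infrastructure such as Ponder or GNAT) and one that is purely lexical and has no AST at all.

```python
import ast
import re
from abc import ABC, abstractmethod

class InfraAdapter(ABC):
    """What the analysis needs from any front-end; says nothing about ASTs."""

    @abstractmethod
    def references_to(self, name):
        """Yield (line, column) for each reference to the named entity."""

class PythonAstAdapter(InfraAdapter):
    """Adapter over a syntactic infrastructure (here, Python's ast module)."""

    def __init__(self, source):
        self._tree = ast.parse(source)

    def references_to(self, name):
        for node in ast.walk(self._tree):
            if isinstance(node, ast.Name) and node.id == name:
                yield (node.lineno, node.col_offset)

class LexicalAdapter(InfraAdapter):
    """Adapter over a front-end with no AST: whole-word text matching."""

    def __init__(self, source):
        self._lines = source.splitlines()

    def references_to(self, name):
        pattern = re.compile(r"\b" + re.escape(name) + r"\b")
        for lineno, text in enumerate(self._lines, start=1):
            for match in pattern.finditer(text):
                yield (lineno, match.start())

def referenced_by(adapter, name):
    """The 'analysis': written once, against the adapter interface only."""
    return sorted(adapter.references_to(name))

SOURCE = "total = 0\nfor x in data:\n    total = total + x\n"
print(referenced_by(PythonAstAdapter(SOURCE), "total"))
print(referenced_by(LexicalAdapter(SOURCE), "total"))
```

Both adapters return the same positions for this input. The lexical one would also match the name inside comments and strings, which shows the design room the slide mentions: each adapter is free to choose the best implementation its infrastructure allows.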

32 Observations
Infrastructures for prototyping or scalability
  – 1000 LOC effort won't scale up, yet
  – Absolute effort is lessening, scale increasing
  – Boring stuff is still 1/2+ effort
Trend towards components
  – Span of requirements, performance, IDE integration
  – Many components are programmable, however
Interactive whole-program analysis stresses modularity (reuse) of infrastructure
  – Much reuse is white-box
Conclusion!

33 Observations, cont'd
Retargeting is expensive, defies infrastructure
  – Symbol table (scoping, typing), and base analyses
  – Language proliferation & evolution continue, slowly
  – Tool retargets lag language definition, maybe a lot
Bigger components are better [Sullivan]
  – Many small components complicate integration
  – Mitigates symbol-table issue
  – Reuse still hard, sometimes white-box
Language analyzability has big impact
  – Front-end, mappings, precise and fast analysis
  – Designers need to consider consequences

34 Open Issues
Effective infrastructures for deep analysis
  – In principle not hard
  – In practice, performance/precision tradeoffs can require significant rewrites for small change
Out of private toolbox, beyond white-box reuse
  – Fragile modularity, complexity, documentation
Robustness
  – Useful for incomplete or evolving systems
  – Complicates the analysis, results harder to interpret
Modification: beyond instrumentation & translation

35 Emerging Challenges
Integration into IDEs
  – GUI dependence, native AST; reuse across IDEs
What is a program? What is the program?
  – Multi-language programs
  – Federated applications, client-server apps
  – Trend is towards writing component glue
    – Less source code (maybe), but huge apps
    – How treat vast, numerous packages? Sans source?
    – Current tools provide/require stub code
Multi-threading is entering the mainstream

36 Opportunities
Faster computers, better OSs and compilers
  – Basic Dells can take two processors, and it works
Compatibility packages: Cygwin, VMware, Exceed
Emergence of Java, etc., for tool construction
  – Better type systems, garbage collection
  – API model, persistence, GUI, multi-threading
  – (Maybe better analyzability, too)
Infrastructure
  – Modular analyses [Ryder], incremental update
  – Visualization toolkits (e.g., SGI's MineSet)
Open source: share, improve; benchmarks

37 URLs
Refine: www.reasoning.com
CodeSurfer: www.grammatech.com
EDG: www.edg.com
Alloy: sdg.lcs.mit.edu/alloy
Daikon: cs.washington.edu/homes/mernst/daikon
Aristotle: www.cc.gatech.edu/aristotle
ProLangs: www.prolangs.rutgers.edu
Icaria, etc.: www.cs.ucsd.edu/~wgg/Software

38 Thanks!
Michael Ernst: Dynamic analysis
Daniel Jackson: Alloy
Mik Kersten: IDE integration
Mary Jean Harrold: Aristotle
Arun Lakhotia: Refine and CodeSurfer
Nicholas Mitchell: Compiler infras, EDG
John Stasko: Visualization
Michelle Strout: Compiler infrastructures
Kevin Sullivan: Mediators and components
