Introduction © Marcelo d’Amorim 2010.

Definition of Static Analysis (SA)
A technique to extract information from a computer program at compile time.

Enabling technology…
…for different SE and PL fields. In particular:
– Software Design
– Software Verification

Several Purposes
– Prove correctness, e.g., show that a program has no null dereferences
– Guide other tools, e.g., integration testing from dependence graphs
– Assist human activity, e.g., find bad smells, find code clones, report quality metrics, report code dependencies

Several Forms
– Pattern matching
– Type checking
– Partial correctness
– Symbolic execution
– Dataflow analysis (our focus)

Several Forms: By Example
Match this anti-pattern against this program:
  BAD_PRACTICE: String comparison with ==
  public static void main(String[] args) {
    if (args != null && args.length > 1 && args[0] == "option1") {…}
  }
Type check the function abstractions:
  lambda f g h. (f g) (h + 3)
  lambda f g h. f (g (h + 3))
  lambda f. f f
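The BAD_PRACTICE warning on this slide can be reproduced with a small experiment. The sketch below uses illustrative names (StringCompareDemo, sameReference, sameContent) that are not from the slides. It assumes the argument string is created at run time, which new String(...) imitates here.

```java
final class StringCompareDemo {
    // Reference comparison: true only if both operands are the same object.
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    // Content comparison: true if both strings hold the same characters.
    static boolean sameContent(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        // new String(...) forces a fresh object, as happens for run-time input.
        String arg = new String("option1");
        System.out.println(sameReference(arg, "option1")); // prints false
        System.out.println(sameContent(arg, "option1"));   // prints true
    }
}
```

A pattern matcher only needs to spot == applied to two String operands to raise this warning; it does not need to know whether the comparison happens to succeed.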

Several Forms: By Example
Generate predicate P and check assertion:
  public static void sort(int[] x) {
    …
    {P}
    assert(P => Q) // Q = x is a permutation of old-x && x is ascending
  }
Execute symbolically the method:
  public static void foo(int x) {
    if (x > 10) { … } else { ERROR! }
  }
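To make the symbolic execution of foo concrete, here is a minimal sketch. It is an assumption-laden simplification: real symbolic executors hand path conditions to a constraint solver, whereas this sketch represents the path condition on x as an inclusive interval. The names SymbolicFoo, Path, and explore are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class SymbolicFoo {
    /** One path through foo: the inclusive range of x that takes it, and whether it hits ERROR. */
    static final class Path {
        final long lo, hi;
        final boolean reachesError;

        Path(long lo, long hi, boolean reachesError) {
            this.lo = lo;
            this.hi = hi;
            this.reachesError = reachesError;
        }

        // A path is feasible if at least one value of x satisfies its condition.
        boolean feasible() {
            return lo <= hi;
        }
    }

    /** Explores both branches of "if (x > 10)" for a symbolic x constrained to [lo, hi]. */
    static List<Path> explore(long lo, long hi) {
        List<Path> paths = new ArrayList<>();
        paths.add(new Path(Math.max(lo, 11), hi, false)); // path condition: x > 10
        paths.add(new Path(lo, Math.min(hi, 10), true));  // path condition: x <= 10
        return paths;
    }

    public static void main(String[] args) {
        for (Path p : explore(Integer.MIN_VALUE, Integer.MAX_VALUE)) {
            if (p.feasible() && p.reachesError) {
                // The upper bound of the interval is a concrete witness input.
                System.out.println("ERROR reachable, e.g. with x = " + p.hi);
            }
        }
    }
}
```

With an unconstrained int input the else branch is feasible and x = 10 is a witness. If the caller guarantees x in [11, 20], the same exploration finds the error path infeasible.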

Several Forms: Dataflow analysis
Do any of the j-manipulating expressions denote compile-time constants?
*Example from Barbara Ryder’s ACACES Summer School Lecture Notes

Several Forms: Dataflow analysis
Directions of arrows denote control and data dependency, respectively!
*Example from Barbara Ryder’s ACACES Summer School Lecture Notes

No silver bullet! There are compromises, but several tools use these techniques successfully:
– Pattern matching
– Type checking
– Symbolic execution
– Dataflow analysis
– VCG and checking

Success Cases
Popular tools:
– Case 1: Lint (dataflow and pattern matching)
– Case 2: PReFIX (symbolic execution)
– Case 3: FindBugs (mostly pattern matching)
Huge market!
– Coverity
– GrammaTech
– Klocwork
– Parasoft
– Semmle

Case 1: Lint [Johnson, Bell Labs TR]
– Problem: Find common error patterns in C code, e.g., enforce strict typing rules (function calls and casts), use without def, def without use, unused functions, portability issues, etc.
– Motivation: C is weakly typed
– Proposal: Use the compiler’s (cheap) intra-procedural analysis
– Comment: Use regularly, or on a mature codebase, to avoid a flood of warnings
– See: frame.html

Case 2: PReFIX [Bush et al., SPE 2000]
– Problem: Find common errors in C code, e.g., memory misuse (null dereferences and leaks), uninitialized variables, library idioms, etc.
– Motivation: Lint-like tools report many false alarms
– Proposal: Simulate runs at compile time through symbolic execution of C programs. Use heuristics to:
  – select inter-procedural paths to visit
  – filter/sort warning reports

Case 3: FindBugs [Hovemeyer and Pugh, OOPSLA 2004]
– Problem: Programmers repeat standard errors
– Proposal: Look for code anti-patterns (error-prone code, inefficient code, etc.)
– The FindBugs tool looks for bytecode patterns

Case 3: FindBugs [Hovemeyer and Pugh, OOPSLA 2004]
Unguarded logging affects performance!
  public void visit(Code code) {
    // Reset the detector state for each method body.
    seenGuardClauseAt = Integer.MIN_VALUE;
    logBlockStart = 0;
    logBlockEnd = 0;
    super.visit(code);
  }

  public void sawOpcode(int seen) {
    // Remember where the guard call Logger.isLogging() occurs.
    if ("cbg/app/Logger".equals(classConstant) && seen == INVOKESTATIC
        && "isLogging".equals(nameConstant) && "()Z".equals(sigConstant)) {
      seenGuardClauseAt = PC;
      return;
    }
    // A branch right after the guard delimits the guarded block.
    if (seen == IFEQ
        && (PC >= seenGuardClauseAt + 3 && PC < seenGuardClauseAt + 7)) {
      logBlockStart = branchFallThrough;
      logBlockEnd = branchTarget;
    }
    // Report any call to log() that lies outside the guarded block.
    if (seen == INVOKEVIRTUAL && "log".equals(nameConstant)) {
      if (PC < logBlockStart || PC >= logBlockEnd) {
        bugReporter.reportBug(
            new BugInstance("CBG_UNPROTECTED_LOGGING", HIGH_PRIORITY)
                .addClassAndMethod(this).addSourceLine(this));
      }
    }
  }

Case 3: FindBugs [Hovemeyer and Pugh, OOPSLA 2004]
Several other query languages exist: SemmleCode [Verbaere et al., OOPSLA 2007], Design Wizard [Brunet et al., ICSE 2009], etc.

Remember
– Pattern matching
– Type checking
– Partial correctness
– Symbolic execution
– Dataflow analysis (our focus)

Soundness and Completeness
– Soundness: analysis reports no errors ⇒ there really are no errors
– Completeness: analysis reports an error ⇒ there really is an error
[Figure: ok/error diagrams for a sound analysis and for a complete analysis]
*Courtesy of Claus Brabrand

Soundness and Completeness
– Soundness: no false negatives. No errors escape. We say that a sound analysis is conservative (pessimistic).
– Completeness: no false positives.
Definitions vary from field to field. These apply in the context of verification.
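The two definitions can be checked mechanically against a ground truth. The sketch below is illustrative only: the class Verdicts, its methods, and the three-program data set are hypothetical. Each array index stands for one analyzed program, reported[i] is the analysis verdict, and actual[i] says whether the program really has an error.

```java
final class Verdicts {
    /** Sound: whenever the analysis reports no error, there really is none (no false negatives). */
    static boolean isSound(boolean[] reported, boolean[] actual) {
        for (int i = 0; i < reported.length; i++) {
            if (!reported[i] && actual[i]) {
                return false; // an error escaped
            }
        }
        return true;
    }

    /** Complete: whenever the analysis reports an error, it is a real one (no false positives). */
    static boolean isComplete(boolean[] reported, boolean[] actual) {
        for (int i = 0; i < reported.length; i++) {
            if (reported[i] && !actual[i]) {
                return false; // a false alarm
            }
        }
        return true;
    }

    public static void main(String[] args) {
        boolean[] actual = { true, false, false };      // only program 0 is really faulty
        boolean[] pessimistic = { true, true, false };  // flags a correct program too
        boolean[] optimistic = { false, false, false }; // never reports anything

        System.out.println(isSound(pessimistic, actual) + " " + isComplete(pessimistic, actual));
        System.out.println(isSound(optimistic, actual) + " " + isComplete(optimistic, actual));
    }
}
```

The pessimistic analysis is sound but incomplete, and the optimistic one is complete but unsound, which matches the trade-off described on the slide.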

Type checking Java
Sound: rejects all type-invalid programs
  void m(Thread t) {… t.remove(); }
Incomplete: rejects a few type-valid programs
  void m(Object o) { if (o instanceof String) { o.indexOf("."); } }
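The second example on this slide never fails at run time, yet the Java type checker rejects it because the static type of the variable is Object. A minimal sketch of the usual workaround follows. The class and method names are hypothetical, and the cast is the only change to the slide's code.

```java
final class TypeCheckDemo {
    // The slide's version does not compile, although o is a String inside the branch:
    //     if (o instanceof String) { o.indexOf("."); }   // compile-time error
    // The explicit cast supplies the type information the checker does not infer.
    static int indexOfDot(Object o) {
        if (o instanceof String) {
            return ((String) o).indexOf(".");
        }
        return -1; // not a String, so there is no dot to find
    }

    public static void main(String[] args) {
        System.out.println(indexOfDot("a.b")); // prints 1
        System.out.println(indexOfDot(42));    // prints -1
    }
}
```

Recent Java versions (16 and later) also accept `o instanceof String s`, which binds a typed variable in the branch. This narrows the gap without making the checker complete.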

FAQ
– My analysis is sound and reports an error! Is the error real? MAYBE NOT (assuming the analysis is incomplete)
– My analysis is sound and reports no error! Is my program correct w.r.t. that property? YES
– My analysis is complete and reports an error! Is the error it reports a real error? YES
– My type checker is conservative! Can it accept programs with type errors? NO. Can it reject type-correct programs? YES, IF INCOMPLETE

Inaccuracy
Results from the decisions the analyzer makes to cope with performance constraints and hard problems:
– Pessimistic (can result in false positives)
– Optimistic (can result in missed errors)

Reality: No Silver Bullet
[Figure: axes are optimistic inaccuracy, pessimistic inaccuracy, and complexity of property + program; labeled regions: Testing, Sound static analysis]

Reality: No Silver Bullet
[Figure: axes are optimistic inaccuracy, pessimistic inaccuracy, and complexity of property + program]
Ideal (but unrealistic) scenario: accurate results regardless of complexity.

Reality: No Silver Bullet
[Figure: axes are optimistic inaccuracy, pessimistic inaccuracy, and complexity of property + program]
Practice 1: Sacrifice soundness in favor of decidability

Reality: No Silver Bullet
[Figure: axes are optimistic inaccuracy, pessimistic inaccuracy, and complexity of property + program]
Practice 2: Sacrifice completeness in favor of scalability

In Summary…
An analysis needs to simplify (approximate) its results to deal with undecidable properties and/or large programs.

Language Features and Imprecision
Language features lead to imprecise results:
– Reflection
– Pointers
– I/O
Better precision comes at a higher cost!

Example: Reachable Definitions *Example from Barbara Ryder’s ACACES Summer School Lecture Notes:

Dataflow Analysis
Program: x = 0; do { x = x+1; } while (…); output x;
1. Control-flow graph: nodes x = 0; x = x+1; output x; with program points a, b, c, d, e
2. Transfer functions: $f_{x=0}(l) = …$ and $f_{x=x+1}(l) = …$, for $l \in L$
3. Recursive equations: $a = …$, $b = f_{x=0}(a)$, $c = b \sqcup d$, $d = f_{x=x+1}(c)$, $e = d$
4. One ”big” transfer function: $T((a,b,c,d,e)) = (…, f_{x=0}(a), b \sqcup d, f_{x=x+1}(c), d)$, over a ”big” power-lattice with $|VAR| \cdot |PP| = 1 \cdot 5 = 5$ components
5. Solve the recursive equations: iterate $T^0(\bot), T^1(\bot), T^2(\bot), T^3(\bot), T^4(\bot), T^5(\bot)$; the result is the LEAST FIXED POINT (the solution). ANOTHER FIXED POINT may exist above it.
*Courtesy of Claus Brabrand
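The five steps on this slide can be replayed in code. The sketch below is not the slide's exact instance, because the lattice and the right-hand sides of the transfer functions did not survive extraction. As assumptions, it uses a sign lattice encoded as bit sets over {negative, zero, positive}, takes the entry value a to be bottom, and treats x as a mathematical integer (no overflow). The names SignFixpoint, assignZero, increment, step, and solve are hypothetical.

```java
import java.util.Arrays;

final class SignFixpoint {
    // Assumed lattice: sets of possible signs, encoded as bits. Join is bitwise OR.
    static final int BOT = 0, NEG = 1, ZERO = 2, POS = 4;

    // Assumed transfer function for "x = 0": the result is zero whatever came in.
    static int assignZero(int in) {
        return ZERO;
    }

    // Assumed transfer function for "x = x + 1" on mathematical integers.
    static int increment(int in) {
        int out = BOT;
        if ((in & NEG) != 0) out |= NEG | ZERO; // negative + 1 is negative or zero
        if ((in & ZERO) != 0) out |= POS;       // 0 + 1 is positive
        if ((in & POS) != 0) out |= POS;        // positive + 1 is positive
        return out;
    }

    // The "big" transfer function T over the vector (a, b, c, d, e).
    static int[] step(int[] s) {
        return new int[] {
            BOT,              // a: entry value, assumed to be bottom
            assignZero(s[0]), // b = f_{x=0}(a)
            s[1] | s[3],      // c = b join d
            increment(s[2]),  // d = f_{x=x+1}(c)
            s[3]              // e = d
        };
    }

    // Iterates T from the all-bottom vector until nothing changes.
    static int[] solve() {
        int[] current = new int[5]; // all components start at BOT
        while (true) {
            int[] next = step(current);
            if (Arrays.equals(current, next)) {
                return current; // least fixed point reached
            }
            current = next;
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(solve())); // prints [0, 2, 6, 4, 4]
    }
}
```

Under these assumptions the iteration stabilizes after four applications of T, so the fifth application changes nothing. The last component, 4, says that x is positive at output x.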

Reachable Definitions in SOOT
  public class SimpleReachingDefinitions implements ReachingDefinitions {
    private HashMap<Unit, List<…>> unitToDefinitionAfter;
    private HashMap<Unit, List<…>> unitToDefinitionBefore;
    public SimpleReachingDefinitions(DirectedGraph graph) {/*WORK*/}
    public List getReachingDefinitionsAfter(Unit _unit) {
      return this.unitToDefinitionAfter.get(_unit);
    }
    public List getReachingDefinitionsBefore(Unit _unit) {
      return this.unitToDefinitionBefore.get(_unit);
    }
  }

  class SimpleReachingDefinitionsAnalysis extends ForwardFlowAnalysis {
    private FlowSet emptySet;
    public SimpleReachingDefinitionsAnalysis(DirectedGraph _graph) {/*INIT*/}
    protected void copy(FlowSet _source, FlowSet _dest) {...}
    protected void merge(FlowSet _source1, FlowSet _source2, FlowSet _dest) {...}
    protected FlowSet entryInitialFlow() {...}
    protected FlowSet newInitialFlow() {...}
    protected void flowThrough(FlowSet _source, Unit _unit, FlowSet _dest) {...}
    private void kill(FlowSet _source, Unit _unit, FlowSet _dest) {...}
    private void bdef(FlowSet _source, Unit _unit, FlowSet _dest) {...}
  }

Reachable Definitions in SOOT
Programmer specifies how to transfer information across edges of a flow graph.
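What "transferring information across edges" amounts to can be shown without the framework. The sketch below does not use Soot. It is a hand-rolled, self-contained iteration over a hard-coded three-node flow graph for the program x = 0; do { x = x+1; } while (…); output x;. The class name, the array-based graph encoding, and the definition numbering are all assumptions made for illustration. The inner loops play the roles that merge and flowThrough play in the skeleton above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class ReachingDefs {
    // Flow graph nodes: 0 is "x = 0", 1 is "x = x + 1" (loop body), 2 is "output x".
    static final int[][] PREDS = { {}, { 0, 1 }, { 1 } };
    // Definition ids: 0 is the definition "x = 0", 1 is the definition "x = x + 1".
    static final int[][] GEN = { { 0 }, { 1 }, {} };
    static final int[][] KILL = { { 1 }, { 0 }, {} };

    /** Returns, for each node, the set of definitions that may reach its entry. */
    static List<Set<Integer>> solveIn() {
        int n = PREDS.length;
        List<Set<Integer>> in = new ArrayList<>();
        List<Set<Integer>> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            in.add(new TreeSet<>());
            out.add(new TreeSet<>());
        }
        boolean changed = true;
        while (changed) { // iterate until a fixed point is reached
            changed = false;
            for (int node = 0; node < n; node++) {
                // Merge: union of the facts leaving all predecessors.
                Set<Integer> newIn = new TreeSet<>();
                for (int p : PREDS[node]) {
                    newIn.addAll(out.get(p));
                }
                // Flow through: OUT = (IN minus KILL) union GEN.
                Set<Integer> newOut = new TreeSet<>(newIn);
                for (int k : KILL[node]) {
                    newOut.remove(k);
                }
                for (int g : GEN[node]) {
                    newOut.add(g);
                }
                if (!newIn.equals(in.get(node)) || !newOut.equals(out.get(node))) {
                    in.set(node, newIn);
                    out.set(node, newOut);
                    changed = true;
                }
            }
        }
        return in;
    }

    public static void main(String[] args) {
        System.out.println(solveIn()); // prints [[], [0, 1], [1]]
    }
}
```

Both definitions reach the loop body, because it can be entered from x = 0 or from its own previous iteration. Only x = x+1 reaches output x.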

Basic terminology: dependency
– On control: dominance
– On data: def-use, use-def
PROGRAM DEPENDENCE GRAPH (PDG), from “Dynamic Program Slicing”, Agrawal and Horgan, PLDI’90

Basic terminology: dependency
On control:
– Dominance
– Post-dominance
[Figure: nodes entry, d, n, pd, exit; d dominates n and pd post-dominates n]
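Dominance can be computed with the same fixed-point style used for dataflow problems: a node d dominates n when every path from the entry to n passes through d. The sketch below is a simple iterative formulation, not an algorithm taken from the slides. It assumes node 0 is the entry and that every other node is reachable from it. The class name and the predecessor-array encoding are hypothetical.

```java
import java.util.BitSet;

final class Dominators {
    /**
     * Computes dominator sets from predecessor lists.
     * Equations: Dom(entry) = {entry}; Dom(n) = {n} plus the intersection of Dom(p) over predecessors p.
     */
    static BitSet[] dominators(int[][] preds) {
        int n = preds.length;
        BitSet[] dom = new BitSet[n];
        for (int i = 0; i < n; i++) {
            dom[i] = new BitSet(n);
            if (i == 0) {
                dom[i].set(0);    // the entry is dominated only by itself
            } else {
                dom[i].set(0, n); // start from "all nodes" and shrink
            }
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int node = 1; node < n; node++) {
                BitSet next = new BitSet(n);
                next.set(0, n);
                for (int p : preds[node]) {
                    next.and(dom[p]); // keep only nodes that dominate every predecessor
                }
                next.set(node);       // every node dominates itself
                if (!next.equals(dom[node])) {
                    dom[node] = next;
                    changed = true;
                }
            }
        }
        return dom;
    }

    public static void main(String[] args) {
        // Diamond: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3
        int[][] preds = { {}, { 0 }, { 0 }, { 1, 2 } };
        System.out.println(dominators(preds)[3]); // prints {0, 3}
    }
}
```

In the diamond, neither branch node dominates the join node, because each can be bypassed through the other branch. Post-dominance is the same computation on the reversed graph, starting from the exit.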

Dataflow analysis terminology [“A few billion LOC later”, Bessey et al., CACM 2010]
[…] checkers […] traverse program paths in a forward direction (flow-sensitive), going across function calls (inter-procedural) while keeping track of call-site-specific information (context-sensitive) and […] detect when a path is infeasible (path-sensitive).

Final Question
Why is SA not used more intensively?
– Engineer: It takes too long to run
– Theoretician: The property to check is undecidable
– Econ. 1: It is cheaper to train people
– Econ. 2: It defeats the purpose; high number of false alarms

Program analysis (dynamic, static, mixed) is promising. But one needs to learn when and how to apply it. This is one of the goals of this course.

Proofs and Decidability (1/3)
One can use the axiomatic semantics of Java to derive a predicate that holds at the exit of sort:
  public static void sort(int[] numbers) {
    for (int i = 0; i < numbers.length; i++) {
      int copyNumber = numbers[i];
      int j = i;
      while (j > 0 && copyNumber < numbers[j-1]) {
        numbers[j] = numbers[j-1];
        j--;
      }
      numbers[j] = copyNumber;
    }
  }
Such a predicate can assist the proof of:
  forall as. ascending(sort(as)) && permutation(sort(as),as)
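The property on this slide can be written down as executable predicates. The sketch below is neither a proof nor symbolic execution. It is plain bounded testing, named here as a swapped-in technique: it checks the property on every array up to a small length with values from a small range. The sort method follows the slide but works on a copy so that the input stays available. The class SortSpec and the helpers holdsUpTo and next are hypothetical.

```java
import java.util.Arrays;

final class SortSpec {
    // Insertion sort as on the slide, applied to a copy of the input.
    static int[] sort(int[] input) {
        int[] numbers = input.clone();
        for (int i = 0; i < numbers.length; i++) {
            int copyNumber = numbers[i];
            int j = i;
            while (j > 0 && copyNumber < numbers[j - 1]) {
                numbers[j] = numbers[j - 1];
                j--;
            }
            numbers[j] = copyNumber;
        }
        return numbers;
    }

    // ascending(xs): no element is greater than its successor.
    static boolean ascending(int[] xs) {
        for (int i = 1; i < xs.length; i++) {
            if (xs[i - 1] > xs[i]) return false;
        }
        return true;
    }

    // permutation(xs, ys): both arrays hold the same elements with the same multiplicities.
    static boolean permutation(int[] xs, int[] ys) {
        int[] a = xs.clone();
        int[] b = ys.clone();
        Arrays.sort(a);
        Arrays.sort(b);
        return Arrays.equals(a, b);
    }

    // Checks the property on every array of length <= maxLen with values in [0, maxValue].
    static boolean holdsUpTo(int maxLen, int maxValue) {
        for (int len = 0; len <= maxLen; len++) {
            int[] as = new int[len];
            do {
                int[] sorted = sort(as);
                if (!ascending(sorted) || !permutation(sorted, as)) return false;
            } while (next(as, maxValue));
        }
        return true;
    }

    // Advances the array like an odometer; returns false once it wraps back to all zeros.
    static boolean next(int[] as, int maxValue) {
        for (int i = 0; i < as.length; i++) {
            if (as[i] < maxValue) {
                as[i]++;
                return true;
            }
            as[i] = 0;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(holdsUpTo(4, 3)); // prints true
    }
}
```

A passing run says nothing about longer arrays or other values, which is why the forall in the property calls for a proof.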

Proofs and Decidability (2/3)
– FOL is undecidable in general
– The user needs to provide loop invariants

Proofs and Decidability (3/3)
– Note 1: Symbolic execution can show that no errors exist up to given bounds on array sizes
– Note 2: Symbolic execution is very expensive