The Quest for Minimal Program Abstractions
Mayur Naik (Georgia Tech)
with Ravi Mangal and Xin Zhang (Georgia Tech), Percy Liang (Stanford), Mooly Sagiv (Tel-Aviv Univ), Hongseok Yang (Oxford)
The Static Analysis Problem [April 2012, MIT]
A static analysis takes a program p and queries q1, q2, and decides: p ⊨ q1? p ⊨ q2?
Static Analysis: 70's to 90's
client-oblivious: one abstraction a for program p and all queries q1, q2 (p ⊨ q1? p ⊨ q2?)
"Because clients have different precision and scalability needs, future work should identify the client they are addressing ..." – M. Hind, Pointer Analysis: Haven't We Solved This Problem Yet?, 2001
Static Analysis: 00's to Present
client-driven: abstraction a chosen for program p and queries q1, q2 (p ⊨ q1? p ⊨ q2?)
– demand-driven points-to analysis: Heintze & Tardieu '01, Guyer & Lin '03, Sridharan & Bodik '06, ...
– CEGAR model checkers: SLAM, BLAST, ...
Static Analysis: 00's to Present (contd.)
client-driven: abstraction a1 for query q1, abstraction a2 for query q2 (p ⊨ q1? p ⊨ q2?)
– demand-driven points-to analysis: Heintze & Tardieu '01, Guyer & Lin '03, Sridharan & Bodik '06, ...
– CEGAR model checkers: SLAM, BLAST, ...
Our Static Analysis Setting
client-driven + parametric: abstraction a1 for query q1, abstraction a2 for query q2
– new search algorithms: testing, machine learning, ...
– new analysis questions: minimal, impossible, ...
Example 1: Predicate Abstraction (CEGAR)
Abstraction = predicates to use in predicate abstraction
Example 2: Shape Analysis (TVLA)
Abstraction = predicates to use as abstraction predicates
Example 3: Cloning-based Pointer Analysis
Abstraction = k value to use for each call site and each allocation site
Problem Statement, 1st Attempt
An efficient algorithm with:
INPUTS:
– program p and query q
– abstractions A = { a1, ..., an }
– boolean function S(p, q, a)
OUTPUT:
– a ∈ A such that S(p, q, a) = true
Orderings on A
Efficiency Partial Order:
– a1 ≤cost a2 iff the sum of a1's bits ≤ the sum of a2's bits
– S(p, q, a1) runs faster than S(p, q, a2)
Precision Partial Order:
– a1 ≤prec a2 iff a1 is pointwise ≤ a2
– S(p, q, a1) = true ⇒ S(p, q, a2) = true
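As a concrete illustration (our sketch, not from the talk), abstractions can be encoded as bit-vectors and the two partial orders checked directly; a component set to 1 means that part of the program is treated precisely:

```python
# Sketch: abstractions as bit-vectors with the two partial orders above.

def cost_le(a1, a2):
    # a1 <=cost a2 iff the sum of a1's bits is <= the sum of a2's bits,
    # so S(p, q, a1) is expected to run faster than S(p, q, a2)
    return sum(a1) <= sum(a2)

def prec_le(a1, a2):
    # a1 <=prec a2 iff a1 is pointwise <= a2; by monotonicity,
    # S(p, q, a1) = true implies S(p, q, a2) = true
    return all(x <= y for x, y in zip(a1, a2))
```

Note that ≤prec implies ≤cost, but not conversely: two abstractions with the same bit count are ≤cost-comparable yet ≤prec-incomparable.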
Final Problem Statement: Minimal Sufficient Abstraction
An efficient algorithm with:
INPUTS:
– program p and property q
– abstractions A = { a1, ..., an }
– boolean function S(p, q, a)
OUTPUT:
– a ∈ A such that S(p, q, a) = true AND ∀a' ∈ A: (a' ≤ a ∧ S(p, q, a') = true) ⇒ a' = a
Final Problem Statement (contd.)
Bit-vector illustration of a minimal sufficient abstraction: 1111 = finest (S(p, q, a) holds), 0100 = minimal sufficient, 0000 = coarsest (¬S(p, q, a)).
Why Minimality?
– Empirical lower bounds for static analysis
– Efficient to compute
– Better for user consumption: analysis imprecision facts, assumptions about missing program parts
– Better for machine learning
Why is this Hard in Practice?
– |A| is exponential in the size of p, or even infinite
– S(p, q, a) = false for most p, q, a
– A different a is minimal for different p, q
Talk Outline
– Minimal Abstraction Problem
– Two Algorithms: Abstraction Coarsening [POPL'11], Abstractions from Tests [POPL'12]
– Summary
Abstraction Coarsening [POPL'11]
For given p, q: start with the finest a, incrementally replace 1's with 0's (1111 finest → 0100 minimal → 0000 coarsest; S(p, q, a) holds above the minimal abstraction, ¬S(p, q, a) below).
Two algorithms:
– deterministic: ScanCoarsen
– randomized: ActiveCoarsen
In practice, use a combination of the two algorithms.
Algorithm ScanCoarsen
a ← (1, ..., 1)
Loop:
  Remove a component from a
  Run S(p, q, a)
  If ¬S(p, q, a) then add the component back permanently
Exploits monotonicity of ≤prec: a component whose removal causes ¬S(p, q, a) must be in the minimal abstraction ⇒ never visits a component more than once.
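The loop above can be sketched as follows, with `run_S` standing in for the black-box static analysis S(p, q, ·) (a hypothetical stand-in; the real S is a full analysis run):

```python
# Sketch of ScanCoarsen over bit-vector abstractions of n components.

def scan_coarsen(n, run_S):
    a = [1] * n                      # start from the finest abstraction
    for i in range(n):               # visit each component exactly once
        a[i] = 0                     # tentatively coarsen component i
        if not run_S(a):             # query no longer proven?
            a[i] = 1                 # i is needed: add it back permanently
    return a                         # a minimal sufficient abstraction
```

For example, if S holds exactly when components 1 and 3 are refined, the scan returns [0, 1, 0, 1] after exactly n runs of S.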
Problem with ScanCoarsen
Takes O(# components) time; # components can be > 10,000 ⇒ > 30 days!
Idea: try to remove a constant fraction of the components in each step.
Algorithm ActiveCoarsen
a ← (1, ..., 1)
Loop:
  Remove each component from a with probability 1 − α
  Run S(p, q, a)
  If ¬S(p, q, a) then add the components back
  Else remove the components permanently
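A sketch of this loop, again with `run_S` as a stand-in for the black-box analysis; the round bound and fixed seed are our additions for illustration (the real algorithm is combined with ScanCoarsen to guarantee minimality):

```python
import random

# Sketch of ActiveCoarsen: each round drops a random subset of the
# surviving components and keeps the drops only if S still proves q.

def active_coarsen(n, run_S, alpha=0.5, rounds=60, seed=0):
    rng = random.Random(seed)
    a = [1] * n
    for _ in range(rounds):
        # drop each surviving component with probability 1 - alpha
        trial = [b if b and rng.random() < alpha else 0 for b in a]
        if run_S(trial):
            a = trial    # S still holds: the removals become permanent
        # otherwise the removed components are added back (a is unchanged)
    return a
```

Any component that every sufficient abstraction needs can never be dropped, since a trial without it fails S and is discarded.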
Performance of ActiveCoarsen
Let n = total # components and s = # components in the largest minimal abstraction.
If we set the survival probability α = e^(−1/s), then ActiveCoarsen outputs a minimal abstraction in O(s log n) expected time.
Significance: s is small in practice, and the running time depends only logarithmically on the total # components.
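One way to see the bound, under the simplifying assumption that a single minimal abstraction with s components must survive every successful round:

```latex
\Pr[\text{all } s \text{ needed components survive a round}]
  = \alpha^s = \left(e^{-1/s}\right)^s = e^{-1},
```

so a constant fraction of rounds succeed. Each successful round keeps an expected α fraction of the remaining removable components, so after r successful rounds roughly \(n\,\alpha^r = n\,e^{-r/s}\) of them remain, which drops below 1 once \(r \geq s \ln n\). Hence O(s log n) rounds, each one run of S, in expectation.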
Application 1: Pointer Analysis Abstractions
Client: static datarace detector [PLDI'06]
– Pointer analysis using k-CFA with heap cloning
– Uses call-graph, may-alias, thread-escape, and may-happen-in-parallel analyses
(Table: # components (×1000) at alloc sites and call sites, and # unproven queries (dataraces, ×1000) under 0-CFA, 1-CFA, 1-obj, and 2-obj, for hedc, weblech, and lusearch.)
Experimental Results: All Queries
(Table: # components (×1000), BasicRefine (×1000), and ActiveCoarsen. K-CFA: hedc (83%) / 90 (1.0%), weblech (85%) / 157 (1.0%), lusearch (88%) / 250 (1.5%). K-obj: hedc (57%) / 37 (2.3%), weblech (68%) / 48 (1.9%), lusearch (73%) / 56 (1.9%).)
Empirical Results: Per Query (chart)
Empirical Results: Per Query, contd. (chart)
Application 2: Library Assumptions
The Problem:
– Libraries are ever more complex to analyze (e.g. native code)
– Libraries are ever-growing in size and layers
Our Solution:
– Completely ignore library code
– Each component of the abstraction = an assumption about a different library method (example: 1 = best-case, 0 = worst-case)
– Use coarsening to find a minimal assumption
– Users confirm or refute the reported assumption
Summary: Abstraction Coarsening
– Sparse abstractions suffice to prove most queries
– Sparsity yields efficient machine learning algorithms
– Minimal assumptions are a more practical application of coarsening than minimal abstractions
– Limitation: runs the static analysis as a black box
Talk Outline
– Minimal Abstraction Problem
– Two Algorithms: Abstraction Coarsening [POPL'11], Abstractions from Tests [POPL'12]
– Summary
Abstractions From Tests [POPL'12]
Pipeline: (p, q) → dynamic analysis → abstraction → static analysis → p ⊨ q? ... and minimal!
Combining Dynamic and Static Analysis
Previous work:
– Counterexamples: query is false on some input; suffices if most queries are expected to be false
– Likely invariants: a query true on some inputs is likely true on all inputs [Ernst 2001]
Our approach:
– Proofs: a query true on some inputs is likely true on all inputs, and likely for the same reason!
Example: Thread-Escape Analysis
Query: local(pc, w)? Abstraction: h1 = L, h2 = L, h3 = L, h4 = L

  // u, v, w are local variables
  // g is a global variable
  // start() spawns a new thread
  for (i = 0; i < N; i++) {
    u = new h1;
    v = new h2;
    g = new h3;
    v.f = g;
    w = new h4;
    u.f2 = w;
  pc: w.id = i;
    u.start();
  }
Example: Thread-Escape Analysis (contd.)
Same code; query: local(pc, w)? Abstraction: h1 = L, h2 = L, h3 = E, h4 = L: sufficient, but not minimal.
Example: Thread-Escape Analysis (contd.)
Same code; query: local(pc, w)? Abstraction: h1 = L, h2 = E, h3 = E, h4 = L: sufficient, and minimal!
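A deliberately simplified sketch of the trace-based idea (names and structure are ours, not the paper's API): run the program on a test, record which allocation sites produced an object observed to escape (reachable from a global or a spawned thread), label those sites E and all others L.

```python
# Hypothetical sketch: derive a thread-escape abstraction from one
# observed execution. escaped_sites = sites whose objects escaped in
# the run; all remaining sites are labeled L (tracked as local).

def abstraction_from_trace(alloc_sites, escaped_sites):
    return {h: ("E" if h in escaped_sites else "L") for h in alloc_sites}
```

On the running example, where only h3 is stored in a global, this naive version yields h1:L, h2:L, h3:E, h4:L — the sufficient but non-minimal abstraction above; the POPL'12 dynamic analysis instead labels L only the sites that must be tracked for the query, yielding the minimal one.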
Benchmarks
(Table: # classes, bytecodes (×1000), and alloc. sites (×1000), app vs. total, for hedc, weblech, lusearch, sunflow, avrora, and hsqldb.)
Precision (chart)
Running Time
            pre-process   dynamic analysis     static analysis
            time          time     # events    time (serial)
hedc        18s           6s       0.6M        38s
weblech     33s           8s       1.5M        74s
lusearch    27s           31s      11M         8m
sunflow     46s           8m       375M        74m
avrora      36s           32s      11M         41m
hsqldb      44s           35s      25M         86m
Running Time (sec.) CDFs (chart)
CDF of Number of Alloc. Sites in L (chart)
CDF of Number of Queries per Group (chart)
Summary: Abstractions from Tests
– If a query is simple, we can find why it holds by observing a few execution traces
– A methodology that uses dynamic analysis to obtain a necessary condition for proving queries
– If the static analysis succeeds, the condition is also sufficient ⇒ minimality!
– Testing is a growing trend in verification
– Limitation: needs small tests with good coverage
Talk Outline
– Minimal Abstraction Problem
– Two Algorithms: Abstraction Coarsening [POPL'11], Abstractions from Tests [POPL'12]
– Summary
Overview of Our Approaches
Approach                            Minimality?   Completeness?   Generic?
Coarsening [POPL'11]                Yes
Testing [POPL'12]                   Yes           No
Naïve Refine [POPL'11]              No            Yes
Refine+Prune [PLDI'11]              No            Yes
Backward Refine (ongoing work)      Yes           No
Provenance Refine (ongoing work)    Yes
Key Takeaways
– New questions: minimality, impossibility, ...
– New applications: lower bounds, library assumptions, ...
– New techniques: search algorithms, abstractions, ...
– New tools: meta-analysis, parallelism, ...
Thank You!
Come visit us in beautiful Atlanta!