Page 1 5/2/2007  Kestrel Technology LLC A Tutorial on Abstract Interpretation as the Theoretical Foundation of CodeHawk  Arnaud Venet Kestrel Technology.

Slides:



Advertisements
Similar presentations
Saumya Debray The University of Arizona Tucson, AZ
Advertisements

Data-Flow Analysis II CS 671 March 13, CS 671 – Spring Data-Flow Analysis Gather conservative, approximate information about what a program.
Alan Shaffer, Mikhail Auguston, Cynthia Irvine, Tim Levin The 7th OOPSLA Workshop on Domain-Specific Modeling October 21-22, 2007 Toward a Security Domain.
Compilation 2011 Static Analysis Johnni Winther Michael I. Schwartzbach Aarhus University.
Abstraction and Modular Reasoning for the Verification of Software Corina Pasareanu NASA Ames Research Center.
Data-Flow Analysis Framework Domain – What kind of solution is the analysis looking for? Ex. Variables have not yet been defined – Algorithm assigns a.
SOFTWARE TESTING. INTRODUCTION  Software Testing is the process of executing a program or system with the intent of finding errors.  It involves any.
Linear Obfuscation to Combat Symbolic Execution Zhi Wang 1, Jiang Ming 2, Chunfu Jia 1 and Debin Gao 3 1 Nankai University 2 Pennsylvania State University.
ISBN Chapter 3 Describing Syntax and Semantics.
Using Programmer-Written Compiler Extensions to Catch Security Holes Authors: Ken Ashcraft and Dawson Engler Presented by : Hong Chen CS590F 2/7/2007.
Testing Without Executing the Code Pavlina Koleva Junior QA Engineer WinCore Telerik QA Academy Telerik QA Academy.
HCSSAS Capabilities and Limitations of Static Error Detection in Software for Critical Systems S. Tucker Taft CTO, SofCheck, Inc., Burlington, MA, USA.
1 Basic abstract interpretation theory. 2 The general idea §a semantics l any definition style, from a denotational definition to a detailed interpreter.
Vertically Integrated Analysis and Transformation for Embedded Software John Regehr University of Utah.
Program analysis Mooly Sagiv html://
Program analysis Mooly Sagiv html://
Run time vs. Compile time
Programming Language Semantics Mooly SagivEran Yahav Schrirber 317Open space html://
Abstract Interpretation Part I Mooly Sagiv Textbook: Chapter 4.
1 Program Analysis Mooly Sagiv Tel Aviv University Textbook: Principles of Program Analysis.
Overview of program analysis Mooly Sagiv html://
Semantics with Applications Mooly Sagiv Schrirber html:// Textbooks:Winskel The.
1 Program Analysis Systematic Domain Design Mooly Sagiv Tel Aviv University Textbook: Principles.
1 Run time vs. Compile time The compiler must generate code to handle issues that arise at run time Representation of various data types Procedure linkage.
Describing Syntax and Semantics
School of Computer ScienceG53FSP Formal Specification1 Dr. Rong Qu Introduction to Formal Specification
Applying Dynamic Analysis to Test Corner Cases First Penka Vassileva Markova Madanlal Musuvathi.
Program Analysis Mooly Sagiv Tel Aviv University Sunday Scrieber 8 Monday Schrieber.
Overview of program analysis Mooly Sagiv html://
Composing Dataflow Analyses and Transformations Sorin Lerner (University of Washington) David Grove (IBM T.J. Watson) Craig Chambers (University of Washington)
Symbolic Path Simulation in Path-Sensitive Dataflow Analysis Hari Hampapuram Jason Yue Yang Manuvir Das Center for Software Excellence (CSE) Microsoft.
1 Tentative Schedule u Today: Theory of abstract interpretation u May 5 Procedures u May 15, Orna Grumberg u May 12 Yom Hatzamaut u May.
Quality Assurance and Testing CSE 403 Lecture 22 Slides derived from a talk by Ian King.
Handouts Software Testing and Quality Assurance Theory and Practice Chapter 5 Data Flow Testing
Scalable and Flexible Static Analysis of Flight-Critical Software Guillaume P. Brat Arnaud J. Venet Carnegie.
272: Software Engineering Fall 2012 Instructor: Tevfik Bultan Lecture 4: SMT-based Bounded Model Checking of Concurrent Software.
8/9/2005Kestrel Technology LLC Page 1 C Global Surveyor Arnaud Venet Kestrel Technology, LLC 3260 Hillview Avenue Palo Alto, CA 94304
CS527: (Advanced) Topics in Software Engineering Overview of Software Quality Assurance Tao Xie ©D. Marinov, T. Xie.
An Information Theory based Modeling of DSMLs Zekai Demirezen 1, Barrett Bryant 1, Murat M. Tanik 2 1 Department of Computer and Information Sciences,
Bridging the chasm between MDE and the world of compilation Nondini Das 1.
1 ECE 453 – CS 447 – SE 465 Software Testing & Quality Assurance Instructor Kostas Kontogiannis.
Verification and Validation Yonsei University 2 nd Semester, 2014 Sanghyun Park.
Cs3102: Theory of Computation Class 18: Proving Undecidability Spring 2010 University of Virginia David Evans.
Department of Computer Science A Static Program Analyzer to increase software reuse Ramakrishnan Venkitaraman and Gopal Gupta.
Problem Solving Techniques. Compiler n Is a computer program whose purpose is to take a description of a desired program coded in a programming language.
Summarizing the Content of Large Traces to Facilitate the Understanding of the Behaviour of a Software System Abdelwahab Hamou-Lhadj Timothy Lethbridge.
Static Program Analysis of Embedded Software Ramakrishnan Venkitaraman Graduate Student, Computer Science Advisor: Dr. Gopal Gupta.
Introduction to Problem Solving. Steps in Programming A Very Simplified Picture –Problem Definition & Analysis – High Level Strategy for a solution –Arriving.
Writing Systems Software in a Functional Language An Experience Report Iavor Diatchki, Thomas Hallgren, Mark Jones, Rebekah Leslie, Andrew Tolmach.
Static Program Analysis of Embedded Software Ramakrishnan Venkitaraman Graduate Student, Computer Science Advisor: Dr. Gopal Gupta
1. 2 Preface In the time since the 1986 edition of this book, the world of compiler design has changed significantly 3.
Xusheng Xiao North Carolina State University CSC 720 Project Presentation 1.
Information Leaks Without Memory Disclosures: Remote Side Channel Attacks on Diversified Code Jeff Seibert, Hamed Okhravi, and Eric Söderström Presented.
Grigore Rosu Founder, President and CEO Professor of Computer Science, University of Illinois
Static Techniques for V&V. Hierarchy of V&V techniques Static Analysis V&V Dynamic Techniques Model Checking Simulation Symbolic Execution Testing Informal.
CUTE: A Concolic Unit Testing Engine for C Koushik SenDarko MarinovGul Agha University of Illinois Urbana-Champaign.
Introduction to Software Analysis CS Why Take This Course? Learn methods to improve software quality – reliability, security, performance, etc.
1 Proving program termination Lecture 5 · February 4 th, 2008 TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: A.
Onlinedeeneislam.blogspot.com1 Design and Analysis of Algorithms Slide # 1 Download From
1 Iterative Program Analysis Abstract Interpretation Mooly Sagiv Tel Aviv University Textbook:
Announcements You will receive your scores back for Assignment 2 this week. You will have an opportunity to correct your code and resubmit it for partial.
MOPS: an Infrastructure for Examining Security Properties of Software Authors Hao Chen and David Wagner Appears in ACM Conference on Computer and Communications.
1 Program Analysis Too Loopy? Set the Loops Aside Eric Larson September 25, 2011 Seattle University.
Abstraction and Abstract Interpretation. Abstraction (a simplified view) Abstraction is an effective tool in verification Given a transition system, we.
SOFTWARE TESTING LECTURE 9. OBSERVATIONS ABOUT TESTING “ Testing is the process of executing a program with the intention of finding errors. ” – Myers.
Software Testing.
10.3 Details of Recursion.
runtime verification Brief Overview Grigore Rosu
All You Ever Wanted to Know About Dynamic Taint Analysis & Forward Symbolic Execution (but might have been afraid to ask) Edward J. Schwartz, Thanassis.
Software Testing and QA Theory and Practice (Chapter 5: Data Flow Testing) © Naik & Tripathy 1 Software Testing and Quality Assurance Theory and Practice.
Presentation transcript:

Page 1 5/2/2007  Kestrel Technology LLC A Tutorial on Abstract Interpretation as the Theoretical Foundation of CodeHawk  Arnaud Venet Kestrel Technology Please see Notes View for annotations.

Page 2 5/2/2007  Kestrel Technology LLC Static Analysis Code Static Analysis Tool Parameters List of defects Confirmed property violations Indeterminates (possible violations)

Page 3 5/2/2007  Kestrel Technology LLC Can we find all defects? Classic undecidability results in Computer Science apply (halting problem) We require soundness (no defects are missed) –A conservative approach is acceptable Abstract Interpretation is the enabling theory in CodeHawk –Sound –Tunably precise –Scalable –“Generatively general”

Page 4 5/2/2007  Kestrel Technology LLC Intuition SW Defects All lines in the code Abstract Interpretation

Page 5 5/2/2007  Kestrel Technology LLC Abstract Interpretation Computes an envelope of all data in the program Mathematical assurance Static analyzers based on Abstract interpretation are difficult to engineer KT’s expertise: building scalable and effective abstract interpreters

Page 6 5/2/2007  Kestrel Technology LLC Properties vs. defects An application might be free of programming language defects but not carry the desired property –resource issues (memory, execution time) –separation –range of output data Abstract Interpretation can address application-specific of properties as well as language defects

Page 7 5/2/2007  Kestrel Technology LLC Objectives Go over a detailed example –Understand how the technology works Achievements and challenges in the engineering of abstract interpreters –What it means to build an analyzer based on Abstract Interpretation

Page 8 5/2/2007  Kestrel Technology LLC Detailed example

Page 9 5/2/2007  Kestrel Technology LLC Buffer overflow for(i = 0; i < 10; i++) { if(message[i].kind == SHORT_CMD) allocate_space (channel, 1000); else allocate_space (channel, 2000); } Can we exceed the channel’s buffer capacity?

Page 10 5/2/2007  Kestrel Technology LLC Control Flow Graph i = 0 i < 10 ? i++ message[i].kind == SHORT_CMD ? allocate_space (channel, 2000) allocate_space (channel, 1000) stop

Page 11 5/2/2007  Kestrel Technology LLC i = 0 i < 10 ? i++ message[i].kind == SHORT_CMD ? allocate_space (channel, 2000) allocate_space (channel, 1000) stop i = 0 i < 10 ? i++ message[i].kind == SHORT_CMD ? allocate_space (channel, 2000) allocate_space (channel, 1000) stop Analytic model of the code i < 10 ? i = 0 i++ M = M M = M stop M = 0

Page 12 5/2/2007  Kestrel Technology LLC Analysis process intuition We mimic the execution of the program We collect all possible data values for all variables for all possible traces

Page 13 5/2/2007  Kestrel Technology LLC Analyzing the model i = 0 i < 10 ? M = M M = M M = 0 stop i++ i M

Page 14 5/2/2007  Kestrel Technology LLC Initially i = 0 i < 10 ? M = M M = M M = 0 stop i++ i M

Page 15 5/2/2007  Kestrel Technology LLC Loop initialization i = 0 i < 10 ? M = M M = M M = 0 stop i++ i M

Page 16 5/2/2007  Kestrel Technology LLC Loop entry i = 0 i < 10 ? M = M M = M M = 0 stop i++ i M

Page 17 5/2/2007  Kestrel Technology LLC Analyzing a branching (1) i = 0 i < 10 ? M = M M = M M = 0 stop i++ i M

Page 18 5/2/2007  Kestrel Technology LLC Analyzing a branching (2) i = 0 i < 10 ? M = M M = M M = 0 stop i++ i M i M ?

Page 19 5/2/2007  Kestrel Technology LLC Accumulating all possible values i = 0 i < 10 ? M = M M = M M = 0 stop i++ i M

Page 20 5/2/2007  Kestrel Technology LLC Abstraction of point clouds We want the analysis to terminate in reasonable time We need a tractable representation of point clouds in arbitrary dimensions Abstract Interpretation offers a broad choice of such representations Example: convex polyhedra –Compute the convex hull of a point cloud

Page 21 5/2/2007  Kestrel Technology LLC Analyzing a branching i = 0 i < 10 ? M = M M = M M = 0 stop i++ i M i M

Page 22 5/2/2007  Kestrel Technology LLC Convex hull i = 0 i < 10 ? M = M M = M M = 0 stop i++ i M

Page 23 5/2/2007  Kestrel Technology LLC Iterating the loop analysis i = 0 i < 10 ? M = M M = M M = 0 stop i++ i M i M

Page 24 5/2/2007  Kestrel Technology LLC Building the loop invariant i = 0 i < 10 ? M = M M = M M = 0 stop i++ i M

Page 25 5/2/2007  Kestrel Technology LLC Analyzing a branching i = 0 i < 10 ? M = M M = M M = 0 stop i++ i M

Page 26 5/2/2007  Kestrel Technology LLC Analyzing a branching i = 0 i < 10 ? M = M M = M M = 0 stop i++ i M i M

Page 27 5/2/2007  Kestrel Technology LLC Convex hull i = 0 i < 10 ? M = M M = M M = 0 stop i++ i M

Page 28 5/2/2007  Kestrel Technology LLC Building the loop invariant i = 0 i < 10 ? M = M M = M M = 0 stop i++ i M i M

Page 29 5/2/2007  Kestrel Technology LLC Keeping iterating… i = 0 i < 10 ? M = M M = M M = 0 stop i++ i M

Page 30 5/2/2007  Kestrel Technology LLC Passing to the limit We want this iterative process to end at some point We need to converge when analyzing loops After some iteration steps, we use a widening operation at loop entry to enforce convergence

Page 31 5/2/2007  Kestrel Technology LLC Widening  Let a 1, a 2, …a n, … be a sequence of polyhedra, then the sequence –w 1 = a 1 –w n+1 = w n  a n+1 is ultimately stationary The widening is a join operation i.e., a  a  b & b  a  b

Page 32 5/2/2007  Kestrel Technology LLC Widening for intervals [a, b]  [c, d] = [if c < a then -  else a, if b < d then +  else b] Example: [10, 20]  [11, 30] = [10, +  ]

Page 33 5/2/2007  Kestrel Technology LLC Widening for polyhedra We eliminate the faces of the computed convex envelope that are not stable Convergence is reached in at most N steps where N is the number of faces of the polyhedron at loop entry

Page 34 5/2/2007  Kestrel Technology LLC Widening i = 0 i < 10 ? M = M M = M M = 0 stop i++ i M i M

Page 35 5/2/2007  Kestrel Technology LLC After widening i = 0 i < 10 ? M = M M = M M = 0 stop i++ i M

Page 36 5/2/2007  Kestrel Technology LLC Detecting convergence Abstract iteration sequence –F 1 = P (initial polyhedron) –F n+1 = F n if S(F n )  F n F n  S(F n ) otherwise where S is the semantic transformer associated to the loop body Theorem: if there exists N such that F N+1  F N, then F n = F N for n > N.

Page 37 5/2/2007  Kestrel Technology LLC Convergence i = 0 i < 10 ? M = M M = M M = 0 stop i++ i M The computation has converged

Page 38 5/2/2007  Kestrel Technology LLC We are not done yet… The analyzer has just proven that 1000 * i ≤ M ≤ 2000 * i But we have lost all information about the termination condition 0 ≤ i ≤ 10 Since we have obtained an envelope of all possible values of the variables, if we run the computation again we still get such an envelope The point is that this new envelope can be smaller This refinement step is called narrowing

Page 39 5/2/2007  Kestrel Technology LLC Refinement i = 0 i < 10 ? M = M M = M M = 0 stop i++ M i i M 9

Page 40 5/2/2007  Kestrel Technology LLC Analyzing a branching i = 0 i < 10 ? M = M M = M M = 0 stop i++ i i M 9 i i M 9

Page 41 5/2/2007  Kestrel Technology LLC Convex hull i = 0 i < 10 ? M = M M = M M = 0 stop i++ i M 9

Page 42 5/2/2007  Kestrel Technology LLC Back to loop entry i = 0 i < 10 ? M = M M = M M = 0 stop i++ i M i M 10 1

Page 43 5/2/2007  Kestrel Technology LLC Narrowing i = 0 i < 10 ? M = M M = M M = 0 stop i++ i M 10 M i

Page 44 5/2/2007  Kestrel Technology LLC Refined loop invariant i = 0 i < 10 ? M = M M = M M = 0 stop i++ M i 10

Page 45 5/2/2007  Kestrel Technology LLC Invariant at loop exit i = 0 i < 10 ? M = M M = M M = 0 stop i++ M i 10 20,000 10,000 i  10

Page 46 5/2/2007  Kestrel Technology LLC [ ] Interpretation of the results 5,000 15,000 25,000 10,000 20,000 Space allocated in the buffer Buffer size: Certain buffer overflow Possible buffer overflow No buffer overflow

Page 47 5/2/2007  Kestrel Technology LLC Achievements and challenges

Page 48 5/2/2007  Kestrel Technology LLC Assured static analyzers for NASA C Global Surveyor: verified array bound compliance for NASA mission-critical software –Mars Exploration Rovers: 550K LOC –Deep Space 1: 280K LOC –Mars Path Finder: 140K LOC Pointer analysis: –International Space Station payload software (major bug found)

Page 49 5/2/2007  Kestrel Technology LLC What we observe No scalable, precise, and general-purpose abstract interpreter PolySpace: –Handles all kinds of runtime errors –Decent precision (<20% indeterminates) –Doesn’t scale (topped out at  40K LOC) Customization is the key –Specialized for a property or a class of applications –Manually crafted by experts

Page 50 5/2/2007  Kestrel Technology LLC CodeHawk™ Abstract Interpretation development platform/ static analyzer generator Automated generation of customized static analyzers –Leverage from pre-built analyzers –Directly tunable by the end-user

Page 51 5/2/2007  Kestrel Technology LLC Where CodeHawk stands Application: architecture environment input range usage protocol … Abstract Interpretation: application independent abstract model algorithms difficult to implement needs a lot of expertise CodeHawk

Page 52 5/2/2007  Kestrel Technology LLC Recent Applications Malware detection –Customized analyzers for specific kinds of malware –Naturally resistant to complex obfuscating transformations –Evaluated on DoD-supplied test case Library/Component analysis –Proof of absence of buffer overflow in OpenSSH’s dynamic buffer library Shared variables –Protection policies for shared variables –Evaluated on a code from a large DoD prime contractor

Page 53 5/2/2007  Kestrel Technology LLC Conclusions Promising and proven technology –key distinction for assurance: no false negatives –can verify application properties as well as detect defects –can be tailored for various domains (e.g., malware) Not a silver bullet..rather a bullet generator –each modeled domain offers leverage –required expertise for broad use must still be reduced