Data Modeling for Program Analysis Scott McPeak OSQ Retreat.

Presentation transcript:

Data Modeling for Program Analysis Scott McPeak OSQ Retreat

A Program Verifier
Verification assures that a program meets some specification, e.g. "no segfaults"
  – Full correctness vs. partial specs
This is undecidable, hence annotations
[Diagram: Program, Specification, Annotations; annotations supply useful facts and impose new obligations]

Verifier Architecture
[Diagram: the program, its annotations, and a hardcoded specification feed verification condition generation (semantics); the resulting predicates, which collectively imply that the program meets the spec, go to a theorem prover, which answers "proved" or "not proved"]

Verification Benefits
Potential for reducing costs of testing and debugging is enormous
  – Memory safety
  – Concurrency safety
  – Adherence to domain-specific protocols
Annotation appeal: capture "why" info
Could prove absence of certain security violations

Run Time is Too Late
Doesn't reduce testing cost
Run-time cost may be significant
  – Cumulative across different analyses
Recovery after run-time failure?
Delay between introduction of a bug and the discovery of its effect

Will Anyone Annotate?
Of course, if cost/benefit ratio is right
Benefits can be high (see "Verification Benefits")
Abstraction is key to controlling cost
  – Can re-use "why" knowledge; libraries, etc.
  – Common tasks must be easy (e.g. array of non-null elements)
  – Module-wide defaults under user control

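Development Model
[Diagram: code → compile → verifier → testing. A type error from the compiler leads to a fix; a failed proof from the verifier goes to a diagnosis assistant, whose explanation leads to a fix; wrong behavior found in testing leads to debugging...]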

Data Modeling
Program analyzer must abstract application data (otherwise it's just executing!)
Model: family of mathematical objects, and axioms which relate them
Enormous design space, little guidance
Direct impact on success of analysis

Example: Strings
Initial model: two function symbols
  – size(addr): # of allocated bytes
  – strlen(addr): least index of a 0 byte
strcpy(d, s)
  pre: size(d) > strlen(s)
  post: strlen(d) = strlen(s)
strcat(d, s)
  pre: size(d) - strlen(d) > strlen(s)
  post: strlen(d) = pre(strlen(d) + strlen(s))
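The two-symbol model can be exercised on concrete buffers. The following is a minimal sketch, assuming a buffer is represented as a Python bytearray; the helper names strcpy_pre and strcat_pre are hypothetical and only mirror the contracts on the slide, with the precondition read as "the destination has room for the characters plus the terminator".

```python
# Hypothetical executable reading of the size/strlen model.
# A buffer is a bytearray; this representation is an assumption.

def size(buf: bytearray) -> int:
    """Number of allocated bytes."""
    return len(buf)

def strlen(buf: bytearray) -> int:
    """Least index holding a 0 byte (ValueError if there is none)."""
    return buf.index(0)

def strcpy_pre(d: bytearray, s: bytearray) -> bool:
    # Room for strlen(s) characters plus the terminating 0 byte.
    return size(d) > strlen(s)

def strcat_pre(d: bytearray, s: bytearray) -> bool:
    # The unused tail of d must hold s plus the terminator.
    return size(d) - strlen(d) > strlen(s)

def strcpy(d: bytearray, s: bytearray) -> None:
    assert strcpy_pre(d, s), "strcpy precondition violated"
    n = strlen(s)
    d[: n + 1] = s[: n + 1]  # same-length slice, so size(d) is unchanged

def strcat(d: bytearray, s: bytearray) -> None:
    assert strcat_pre(d, s), "strcat precondition violated"
    start, n = strlen(d), strlen(s)
    d[start : start + n + 1] = s[: n + 1]
```

A verifier would discharge these conditions statically; the assertions here only make the contracts concrete.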

String as a Set
Add the predicate contains(addr, ch) → {T,F}
strcpy(d, s)
  post: ∀ch. contains(s, ch) ⇔ contains(d, ch)
strchr(s, ch) → r
  post: (contains(s, ch) ⇒ ∃i. r = s+i) && (¬contains(s, ch) ⇒ r = NULL)
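A small sketch of the set model, assuming strings are bytearrays and a pointer into s is represented by its offset i (with None standing in for NULL). The name strchr_post_set is hypothetical. The last lines of the usage note show how little the set model constrains the result.

```python
# Hypothetical executable reading of the "string as a set" model.

def contains(buf: bytearray, ch: int) -> bool:
    """True iff byte ch occurs before the first 0 byte."""
    return ch in buf[: buf.index(0)]

def strchr(buf: bytearray, ch: int):
    """Offset of the first occurrence of ch, or None (standing in for NULL)."""
    i = buf.find(ch, 0, buf.index(0))
    return None if i < 0 else i

def strchr_post_set(buf: bytearray, ch: int, r) -> bool:
    """Postcondition from the slide: r is some offset into s, or NULL."""
    if contains(buf, ch):
        return r is not None and 0 <= r < len(buf)
    return r is None
```

Under this model, for s = "hello" and ch = 'l', the postcondition accepts offset 0 as readily as offset 2: it says only that r points somewhere into s, not that the byte there equals ch.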

String as a Sequence
Add another symbol "[]": addr[i] → ch
strcpy(d, s)
  post: ∀i. d[i] = s[i]
strchr(s, ch) → r
  post: ((∃i. s[i]=ch) ⇒ *r=ch) && (¬(∃i. s[i]=ch) ⇒ r=NULL)
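The sequence model can be sketched the same way, again assuming bytearrays for strings and offsets for pointers (None for NULL). The name strchr_post_seq is hypothetical, and the quantifier over i is assumed to range over indexes before the terminator.

```python
# Hypothetical executable reading of the "string as a sequence" model.

def strchr_post_seq(buf: bytearray, ch: int, r) -> bool:
    """Postcondition from the slide, with *r read as buf[r]."""
    n = buf.index(0)  # strlen
    exists = any(buf[i] == ch for i in range(n))
    if exists:
        return r is not None and 0 <= r < len(buf) and buf[r] == ch
    return r is None
```

For s = "hello" and ch = 'l', this rejects offset 0 but accepts both 2 and 3: the postcondition as stated requires the result to point at a matching byte, not at the first one.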

Example: Integers
"int" is easy to model, right? Well...
Mathematical integers
Finite partition: { 1 }
32-bit 2's complement with wraparound
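A short sketch of how the candidate models disagree. Python integers serve as mathematical integers; wrap32 is a hypothetical helper for 32-bit two's complement; sign_class is an assumed example of a finite partition (negative / zero / positive), since the partition on the slide did not survive extraction.

```python
# Three hypothetical models of the C type "int".

INT_MAX = 2**31 - 1
INT_MIN = -(2**31)

def wrap32(n: int) -> int:
    """Reduce a mathematical integer to 32-bit two's complement."""
    return (n + 2**31) % 2**32 - 2**31

def sign_class(n: int) -> str:
    """Assumed finite partition: only the sign of the value is tracked."""
    if n < 0:
        return "neg"
    return "zero" if n == 0 else "pos"

# "x + 1 > x" holds for mathematical integers...
math_ok = INT_MAX + 1 > INT_MAX
# ...but fails under wraparound, where INT_MAX + 1 becomes INT_MIN.
wrap_ok = wrap32(INT_MAX + 1) > INT_MAX
```

A property proved in one model may therefore be false in another, which is one reason the choice of model affects what the analysis can conclude.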

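Example: Memory
[Diagram: mem maps top-level object addresses (&x, malloc(..)) to objects; a struct maps field offsets to fields; an array maps int indexes to elements; the intermediate objects are labeled "a" and "a.g"]
"x" = sel(mem_0, addr_x)
"a.g[3]" = sel(sel(sel(mem_0, addr_a), g), 3)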
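The hierarchical memory can be sketched with nested maps. This is an illustration only: the dictionary representation, the key names "addr_x" and "addr_a", and the stored values are all assumptions.

```python
# Hypothetical nested-map reading of the hierarchical memory model.

def sel(m, key):
    """Select the component of map m stored under key."""
    return m[key]

# mem0: top-level addresses -> objects; struct: field -> value;
# array: int index -> element. All concrete values are made up.
mem0 = {
    "addr_x": 5,
    "addr_a": {"g": {0: 10, 1: 11, 2: 12, 3: 42}},
}

x = sel(mem0, "addr_x")                         # "x"
a = sel(mem0, "addr_a")                         # "a"
a_g = sel(a, "g")                               # "a.g"
a_g_3 = sel(sel(sel(mem0, "addr_a"), "g"), 3)   # "a.g[3]"
```

Each level of the source expression corresponds to one application of sel, so the nesting of the term follows the nesting of the data.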

Pointers
Pointers are access paths
  – "&(a.g[3])" = sub(sub(sub(whole, a), g), 3)
Rules to read via pointers
  – selPtr(obj, whole) = obj
  – selPtr(obj, sub(rest, index)) = v when sel(selPtr(obj, rest), index) = v
Can also write, do pointer arithmetic, deeper indexing, e.g. "&(p->x)"
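A minimal sketch of pointers as access paths, assuming a path is a tuple of keys with the empty tuple playing the role of "whole". The read rules follow the slide; upd_ptr is a hypothetical write counterpart, since the slide only mentions that writing is possible.

```python
# Hypothetical encoding of access paths over nested maps.

WHOLE = ()  # the path that designates the entire object

def sub(path, index):
    """Extend an access path by one field name or array index."""
    return path + (index,)

def sel(m, key):
    return m[key]

def update(m, key, value):
    """Functional update: a copy of m with key mapped to value."""
    new = dict(m)
    new[key] = value
    return new

def sel_ptr(obj, path):
    """Read through a pointer: selPtr(obj, whole) = obj, otherwise
    selPtr(obj, sub(rest, index)) = sel(selPtr(obj, rest), index)."""
    if path == WHOLE:
        return obj
    *rest, index = path
    return sel(sel_ptr(obj, tuple(rest)), index)

def upd_ptr(obj, path, value):
    """Assumed write rule: rebuild the maps along the path."""
    if path == WHOLE:
        return value
    head, tail = path[0], path[1:]
    return update(obj, head, upd_ptr(sel(obj, head), tail, value))

mem0 = {"a": {"g": {3: 42}}}                 # made-up contents
p = sub(sub(sub(WHOLE, "a"), "g"), 3)        # "&(a.g[3])"
mem1 = upd_ptr(mem0, p, 7)                   # "*p = 7"
```

Because updates are functional, mem0 is left untouched and mem1 differs from it only along the path p.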

Data Structure Invariants
Classic approach: universal quantifier
  – ∀a. type(a)=Foo ⇒ a->x = a->y + 1
Field admission predicate
  – Bar *p; admission: p!=NULL;
Object state field: "ok" vs. "not ok"
  – Change a field → state:="not ok"
  – Manually certify "ok", precondition=invariant
  – ∀a. type(a)=Foo ⇒ a->state="ok"
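The object state field can be illustrated with a run-time analogue. This is only a sketch of the bookkeeping: a verifier would track the state field statically, and the class below, including the method name certify, is hypothetical.

```python
# Hypothetical run-time analogue of the "ok" / "not ok" state field.

class Foo:
    def __init__(self, y: int):
        # Bypass the hook below so construction starts in state "ok".
        object.__setattr__(self, "y", y)
        object.__setattr__(self, "x", y + 1)
        object.__setattr__(self, "state", "ok")

    def __setattr__(self, name, value):
        # Changing any field marks the object "not ok".
        object.__setattr__(self, name, value)
        object.__setattr__(self, "state", "not ok")

    def invariant(self) -> bool:
        return self.x == self.y + 1

    def certify(self) -> None:
        # Certifying "ok" has the invariant as its precondition.
        assert self.invariant(), "invariant does not hold"
        object.__setattr__(self, "state", "ok")
```

Between the first field write and the call to certify, the object is visibly "not ok", so the quantified claim about all Foo objects is replaced by a claim about their state field.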

Example: Change Sets
Globals: list of changed / list of unchanged
  – Not ideal... name sets of globals?
Hierarchical mem: changed object is easy
  – new = update(old, obj_addr, some_value)
But changed field (of many objects) is hard
Possible alternative: staged & weakened invariants; state what is still true, rather than naming what has changed
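The asymmetry between the two cases can be sketched with functional updates over nested maps. The memory contents, object names, and the helper zero_field are assumptions made for illustration.

```python
# Hypothetical illustration: one changed object vs. one changed field.

def sel(m, key):
    return m[key]

def update(m, key, value):
    """Functional update: a copy of m with key mapped to value."""
    new = dict(m)
    new[key] = value
    return new

old = {
    "o1": {"f": 1, "g": 2},
    "o2": {"f": 3, "g": 4},
    "o3": {"f": 5, "g": 6},
}

# Changed object: a single update names exactly what changed.
one_object = update(old, "o1", {"f": 0, "g": 0})

def zero_field(mem, field):
    """Changed field of many objects: one nested update per object."""
    for addr in list(mem):
        mem = update(mem, addr, update(sel(mem, addr), field, 0))
    return mem

all_f = zero_field(old, "f")
```

In the first case every other object is trivially unchanged. In the second, every top-level entry has been rewritten, so the fact that field g is preserved has to be argued object by object.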

Conclusions
Try to capture invariants implicitly, via representation choices
Be explicit about related entities: inDegree(n)=d vs. inDegree1(n, referrer)
Let user select among possible models, even to choose not to model certain fields
Try to think like a programmer