1 Checking System Rules Using System-Specific, Programmer-Written Compiler Extensions Dawson Engler Benjamin Chelf Andy Chou Seth Hallem Stanford University.

Slides:

Advertisements

Similar presentations

Detecting Bugs Using Assertions Ben Scribner. Defining the Problem  Bugs exist  Unexpected errors happen Hardware failures Loss of data Data may exist.

Advertisements

1 Chao Wang, Yu Yang*, Aarti Gupta, and Ganesh Gopalakrishnan* NEC Laboratories America, Princeton, NJ * University of Utah, Salt Lake City, UT Dynamic.

Compilation 2011 Static Analysis Johnni Winther Michael I. Schwartzbach Aarhus University.

Abstraction and Modular Reasoning for the Verification of Software Corina Pasareanu NASA Ames Research Center.

Current Techniques in Language-based Security David Walker COS 597B With slides stolen from: Steve Zdancewic University of Pennsylvania.

Annoucements  Next labs 9 and 10 are paired for everyone. So don’t miss the lab.  There is a review session for the quiz on Monday, November 4, at 8:00.

Program Representations. Representing programs Goals.

Programming Types of Testing.

Using Programmer-Written Compiler Extensions to Catch Security Holes Authors: Ken Ashcraft and Dawson Engler Presented by : Hong Chen CS590F 2/7/2007.

(Quickly) Testing the Tester via Path Coverage Alex Groce Oregon State University (formerly NASA/JPL Laboratory for Reliable Software)

PARALLEL PROGRAMMING with TRANSACTIONAL MEMORY Pratibha Kona.

Aliases in a bug finding tool Benjamin Chelf Seth Hallem June 5 th, 2002.

Establishing Local Temporal Heap Safety Properties with Applications to Compile-Time Memory Management Ran Shaham Eran Yahav Elliot Kolodner Mooly Sagiv.

CS 536 Spring Intermediate Code. Local Optimizations. Lecture 22.

4/23/09Prof. Hilfinger CS 164 Lecture 261 IL for Arrays & Local Optimizations Lecture 26 (Adapted from notes by R. Bodik and G. Necula)

Checking System Rules Using System-Specific, Programmer- Written Compiler Extensions Dawson Engler, Benjamin Chelf, Andy Chow, Seth Hallem Computer Systems.

Software Reliability Methods Sorin Lerner. Software reliability methods: issues What are the issues?

Memory Management 1 CS502 Spring 2006 Memory Management CS-502 Spring 2006.

CS-3013 & CS-502, Summer 2006 Memory Management1 CS-3013 & CS-502 Summer 2006.

Context-sensitive Analysis, II Ad-hoc syntax-directed translation, Symbol Tables, andTypes.

Multiscalar processors

1 New Architectures Need New Languages A triumph of optimism over experience! Ian Watson 3 rd July 2009.

Intermediate Code. Local Optimizations

MULTIVIE W Checking System Rules Using System-Specific, Program-Written Compiler Extensions Paper: Dawson Engler, Benjamin Chelf, Andy Chou, and Seth Hallem.

CS 330 Programming Languages 09 / 16 / 2008 Instructor: Michael Eckmann.

Describing Syntax and Semantics

CS 104 Introduction to Computer Science and Graphics Problems Software and Programming Language (2) Programming Languages 09/26/2008 Yang Song (Prepared.

Gaurav S. Kc, 1 MC: Meta-level Compilation Extending the Process of Code Compilation with Application-Specific Information.

Copyright © 2006 by The McGraw-Hill Companies, Inc. All rights reserved. McGraw-Hill Technology Education Copyright © 2006 by The McGraw-Hill Companies,

Programming. Software is made by programmers Computers need all kinds of software, from operating systems to applications People learn how to tell the.

1.3 Executing Programs. How is Computer Code Transformed into an Executable? Interpreters Compilers Hybrid systems.

Compiled by Benjamin Muganzi 3.2 Functions and Purposes of Translators Computing 9691 Paper 3 1.

Unit Testing & Defensive Programming. F-22 Raptor Fighter.

Reverse Engineering State Machines by Interactive Grammar Inference Neil Walkinshaw, Kirill Bogdanov, Mike Holcombe, Sarah Salahuddin.

Topic #10: Optimization EE 456 – Compiling Techniques Prof. Carl Sable Fall 2003.

Design-Making Projects Work (Chapter7) n Large Projects u Design often distinct from analysis or coding u Project takes weeks, months or years to create.

A Simple Method for Extracting Models from Protocol Code David Lie, Andy Chou, Dawson Engler and David Dill Computer Systems Laboratory Stanford University.

Chapter 1 Introduction Dr. Frank Lee. 1.1 Why Study Compiler? To write more efficient code in a high-level language To provide solid foundation in parsing.

Chapter 10: Compilers and Language Translation Invitation to Computer Science, Java Version, Third Edition.

Inferring and checking system rules by static analysis William R Wright.

Introduction Algorithms and Conventions The design and analysis of algorithms is the core subject matter of Computer Science. Given a problem, we want.

Richard Mancusi - CSCI 297 Static Analysis and Modeling Tools which allows further checking of software systems.

Static Code Checking: Security and Concurrency Ben Watson The George Washington University CS 297 Security and Programming Languages June 9, 2005.

From Quality Control to Quality Assurance…and Beyond Alan Page Microsoft.

MD – Object Model Domain eSales Checker Presentation Régis Elling 26 th October 2005.

Compiler Principles Fall Compiler Principles Lecture 0: Local Optimizations Roman Manevich Ben-Gurion University.

Cache Coherence Protocols 1 Cache Coherence Protocols in Shared Memory Multiprocessors Mehmet Şenvar.

Introduction to Compilers. Related Area Programming languages Machine architecture Language theory Algorithms Data structures Operating systems Software.

3.2 Semantics. 2 Semantics Attribute Grammars The Meanings of Programs: Semantics Sebesta Chapter 3.

Chapter 3 Part II Describing Syntax and Semantics.

The Software Development Process

An Undergraduate Course on Software Bug Detection Tools and Techniques Eric Larson Seattle University March 3, 2006.

Chapter 1 Introduction Major Data Structures in Compiler

Copyright © Curt Hill The IF Revisited If part 4 Style and Testing.

CS412/413 Introduction to Compilers Radu Rugina Lecture 18: Control Flow Graphs 29 Feb 02.

1 Control Flow Graphs. 2 Optimizations Code transformations to improve program –Mainly: improve execution time –Also: reduce program size Can be done.

Testing and Debugging. Testing Fundamentals  Test as you develop Easier to find bugs early rather than later Prototyping helps identify problems early.

“ A Simple Method for Extracting Models From Protocol Code ” David Lie, Andy Chou, Dawson Engler and David D. Dill Presented by Anna Zamansky.

September 1999Compaq Computer CorporationSlide 1 of 16 Verification of cache-coherence protocols with TLA+ Homayoon Akhiani, Damien Doligez, Paul Harter,

/ PSWLAB Checking System Rules Using System-Specific, Programmer-Written Compiler Extensions by D. Engler, B. Chelf, A. Chou, S. Hallem published.

Checking System Rules Using System-Specific Programmer Written Compiler Extensions Dawson Engler, Benjamin Chelf, Andy Chou and Seth Hallem Presented by:

GC Assertions: Using the Garbage Collector To Check Heap Properties Samuel Z. Guyer Tufts University Edward Aftandilian Tufts University.

Introduction to Computer Programming Concepts M. Uyguroğlu R. Uyguroğlu.

Some of the utilities associated with the development of programs. These program development tools allow users to write and construct programs that the.

©SoftMoore ConsultingSlide 1 Code Optimization. ©SoftMoore ConsultingSlide 2 Code Optimization Code generation techniques and transformations that result.

BY Al Bessey, Ken Block, Ben Chelf, Andy Chou,

Chapter 8 – Software Testing

Superscalar Processors & VLIW Processors

Background and Motivation

Presentation transcript:

1 Checking System Rules Using System-Specific, Programmer-Written Compiler Extensions Dawson Engler Benjamin Chelf Andy Chou Seth Hallem Stanford University Matthew Thornton November 9, 2005

2 Motivation Developers of systems software have “rules” to check for correctness or performance. (Do X, don’t do X, do X before Y…) Code that does not obey these “rules” will run slow, crash the system, launch the missiles… Consequently, we need a systematic way of finding as many of these bugs as we can, preferably for as little cost as possible.

3 What Will we be Talking About? What’s the problem? What’s the solution? Discuss some of the interesting details – Assertion Checking – Global Rule Enforcement – FLASH Optimizations Evaluation and conclusions Some related work/history of the paper

4 What’s the Problem? Current solutions all have trade-offs. Formal Specifications-rigorous, mathematical approach – Finds obscure bugs, but is hard to do, expensive, and don’t always mirror the actual written code. Testing-systematic approach to test the actual code – Will detect bugs, but testing a large system could require exponential/combinatorial number of test cases. It also doesn’t isolate where the bug is, just that a bug exists. Manual Inspection-peer review of the code – Peer has knowledge of whole system and semantics, but doesn’t have the diligence of a computer.

5 What’s the Problem? None of the current methods seem to give us what we’re looking for. Can the compiler check the code? – It would be nice to put the code in the compiler and have it check all of the “rules.” – Unfortunately, those “rules” are based on semantics of the system that the compiler doesn’t understand. (Lock and Unlock are valid to the compiler, but how and when they should be used isn’t.) Need some technique that merges the domain knowledge of the developer with the analysis of a compiler.

6 What’s the Solution? Meta-level compilation (MC) combines the domain knowledge of developers with analysis capabilities of a compiler. Allows programmers to write short, simple, system-specific checkers that take into account unique semantics of a system. Checkers are then added to a compiler to check during compile-time.

7 What’s the Solution? The author’s MC system uses a high-level, state- machine language called Metal. Metal extensions written by programmers are linked to a compiler (xg++) that analyzes the code as it is being compiled. – Intra and Interprocedural analysis. xg++ source.cpp Metal “rules” Warnings/Errors source.o

8 How does it work? The language is a high-level, state-machine language. Two parts of the language—pattern part and state- transition part. – Pattern language—finds “interesting” parts of code based on the extension the programmer writes. – State-transition—Based on the discovered pattern, current state, either move to a new state or raise an error. Tests are written and then added to the xg++ compiler. Xg++ includes a base library that includes some common, useful functions and types.

9 How does it work? Compiler generates the AST for the program that is being compiled. Metal extensions are compiled into a set of transitions. The xg++ compiler traverses the AST for the program in execution order in a depth-first manner, following the transition patterns, as they apply.

10 AST for an Assert Statement assert(_exos_self_inser t_pte(0, PG_P|PG_U|PG_W, PGROUNDDOWN(va), 0, NULL) == 0); assert expr == _exos_self_insert_pte int expr 0 Exprexpr ()

11 Assertion Checking Assertions should check a value, a state, etc. Assertions should not change any values or the state of the program. (ie, you should be able to take them out of the program without any change.) Example that would crash if it was removed: assert(_exos_self_insert_pte(0, PG_P|PG_U|PG_W, PGROUNDDOWN(va), 0, NULL) == 0);

12 Assertion Checking sm Assert flow_insensitive { decl { any } expr, x, y, z; decl { any_call } any_fcall; decl { any_args } args; start: { assert(expr); } ==> { mgk_expr_recurse(expr, in_assert); } ; in_assert: { any_fcall(args) } ==>{ err("function call"); } | { x = y } ==> { err("assignment"); } | { z++ } ==> { err("post-increment"); } | { z-- } ==> { err("post-decrement"); } ; } Start in_assert assert(expr) err function assignment post-increment post-decrement Other structures Variable Declarations— ”any” symbolizes a hole type, with self- explanatory names. State Transitions: Given current state, if you see the given pattern, execute what is after the ==> (Typically an error message or a state transition.)

13 Global Rule Checking Many rules apply globally across function call chains. Example: Rules that are expressed in terms of blocking functions, such as certain types of deadlock. xg++ provides mechanisms for gathering “global” data and then applying it to a xg++ extension.

14 Global Rule Checking—Checking for Deadlock “Kernel code cannot call blocking functions with interrupts disabled or while holding a spin lock. Violating this rule can lead to deadlock.” We need to include a rule that will handle this rule. – Unfortunately, when executing a rule like this, we need to know what function calls can result in a call to a blocking function. Solution: Use Global Rule Checking

15 Global Rule Checking—Checking for Deadlock Compiler’s 2 passes generate a call graph. – First pass uses a Metal extension to find those functions that potentially block, tags those functions in the resulting call graph. – Second pass links all files sent to xg++ into a large call graph, does a depth-first traversal to find all functions that have a path to a blocking function. Generates a listing of these functions. Now, we can execute a localized rule within the context of these blocking functions.

16 Global Rule Checking—Checking for Deadlock With the list of blocking functions available, a second extension is run through the program code. Rules include detecting when spin locks are enabled/disabled or when interrupts are enabled/disabled. When in the state where locks are enabled or interrupts are disabled, a blocking function cannot be called because it can cause deadlock in the Linux implementation. Clean Interrupts, Locked No Interrupts, not Locked No Interrupts, Locked err +/- Interrupts +/- lock +/- Interrupts +/- lock Function call that can block

17 New, Improved Global Rule Checking Global Rule Checking was formalized in later version of Metal. Two Passes – First Pass: Each file being compiled has a temporary AST generated for it. – Second Pass: Reads temporary files to reconstruct the ASTs for entire program, control-flow graph is generated to trace the execution through multiple files. (Functions that aren’t called are the roots of the trees.) Metal extensions are then run on the AST in depth- first manner based on the control-flow graph that is generated.

18 FLASH Optimizations Not only can you detect software bugs, it should be obvious that any types of rules can be enforced using this code, including performance-enhancing rules. Example: FLASH Hardware/Software – Code for FLASH must be fast because it implements functionality usually in hardware. – Been aggressively optimized for many years, but MC still is able to provide hundreds of optimizations, because it’s hard to manually traverse deeply nested control paths.

19 FLASH Optimizations Buffer-free optimizations – Traces send calls. Detects if a buffer is needed and if the send frees the buffer. Redundant length assignments – It can be difficult down deep nesting paths to remember if a length field for a buffer has already been set. – Metal allows for such a scan. Efficient opcode setting – Scan to see if the message header has a known opcode already there. If so, recommend XOR’ing with the desired opcode. (Reduces assembly instructions to 1.)

20 Evaluation Anecdotal evidence throughout the paper demonstrating that MC discovers a large number of bugs. – Ran tests on FLASH’s cache coherence code, as well as versions of Linux. – In both cases, the rule extensions that were run found bugs that could have potentially crashed the system. – In one case, there was a bug that was detected that would have required the tester to look through 300 lines of code, 20 if- statements, 4 else clauses, and 29 conditional compilations. The large number of bugs is magnified by the fact that the rules for finding the bugs were written in few lines of code (<100, in most cases.)

21 Evaluation continued No formal experiment done to demonstrate that their system was better than other established systems. For the performance evidence, there was no discussion of how much of a performance improvement there would have been if the compiler’s recommendations were actually executed.

22 Evaluation continued After paper was published, more data was gathered on bug discovery using Metal on Linux kernel. Available at: nux/list.php3

23 Conclusions—The Good Best of all worlds (testing, formal specs, manual inspection) Very simple to write “rules”. Discovers a large number of bugs that could potentially crash the system, even with simple rules. Problems are identified before code is even executed. Flexible solution that allows for varied checks to security, stability, and even performance.

24 Conclusions—The Bad Situations occurred when there were false positives. – Many of which were a result of not completely fleshing out the rules, in particular for more complex scenarios. – Currently coming up with ways of easily writing code to eliminate these false positives. (Heuristic algorithms to determine which bugs are the most important bugs, for example.) No discussion of the backgrounds of people who wrote the rules. – How much domain knowledge did the rule authors have? – How much programming ability did the rule authors have? – More explanation of the experimental setup would have been nice.

25 Related Work/History Won Best Paper at OSDI 2000 Based on previous work called Magik. – Much more difficult to write extensions. Several other papers written on topic. Ideas are now marketed as a company founded by Engler called Coverity.

26 Related Work/History Application-specific information in compilers— Eraser. Formal verification, strong type checkers. Extensible compilers – Ctool—Traverse the AST, look for domain-specific issues. – Meta-object protocols—Extensions written into compiler. – Aspect-Oriented Programming—Weave checks into existing code.

27 References Engler, D. et al. Checking System Rules Using System-Specific, Programmer-Written Compiler Extensions, OSDI Engler, D. Incorporating Application Semantics and Control into Compilation. IEEE Transactions on Software Engineering, May/June, Vol 25, Number 3, Hallem, Seth et al. A System and Language for Building System-Specific, Static Analyses, PLDI 2002

28 Questions? ?