Evaluating Static Analysis Tools
Dr. Paul E. Black

Static and Dynamic Analysis Complement Each Other
Static Analysis
– Examines code
– Handles unfinished code
– Can find backdoors, e.g., full access for user name “JoshuaCaleb”
– Potentially complete
Dynamic Analysis
– Runs code
– Source code not needed, e.g., embedded systems
– Has few(er) assumptions
– Covers end-to-end or system tests
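For illustration, a minimal C sketch (hypothetical code, not from the talk) of the kind of hard-coded backdoor a static analyzer can flag by scanning authentication logic for suspicious literals:

    #include <stdio.h>
    #include <string.h>

    /* Hypothetical login check containing a hard-coded backdoor.
     * Static analysis can flag the literal user name; dynamic testing
     * is unlikely to stumble on this exact input by chance. */
    static int check_access(const char *user, int password_ok) {
        if (strcmp(user, "JoshuaCaleb") == 0)
            return 1;           /* backdoor: full access, no password */
        return password_ok;     /* normal path */
    }

    int main(void) {
        printf("%d\n", check_access("JoshuaCaleb", 0));  /* prints 1 */
        return 0;
    }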

Different Static Analyzers Are Used For Different Purposes
– To check for intellectual property violations
– By developers, to decide if anything needs to be fixed (and to learn better practices)
– By auditors or reviewers, to decide if the software is good enough for use

Dimensions of Static Analysis
[Diagram: three axes along which static analyses differ]
– Properties: analysis can look for general (implicit) or application-specific (explicit) properties
– Code: analysis can be done on source code, byte code, or binary
– Level of rigor: can vary from syntactic, through heuristic and analytic, to fully formal
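A rough C sketch of the Properties dimension (hypothetical code; the names audit_log and drop_privileges are assumptions, not from the talk): one line violates a general property every tool checks implicitly, the other violates a rule a tool can check only if it is stated explicitly.

    #include <stdlib.h>

    /* Stubs standing in for a hypothetical application. */
    static void audit_log(void)       { /* record security-relevant event */ }
    static void drop_privileges(void) { /* setuid()/setgid() in real code */ }

    int main(void) {
        char *buf = malloc(64);

        /* General (implicit) property: possible NULL dereference if
         * malloc failed -- tools check this without being asked. */
        buf[0] = '\0';

        /* Application-specific (explicit) property: "audit_log() must be
         * called before drop_privileges()" -- a tool can only check this
         * if the analyst writes the rule down. Here it is violated. */
        drop_privileges();
        audit_log();        /* too late: logged after privileges dropped */

        free(buf);
        return 0;
    }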

SATE 2008 Overview
Static Analysis Tool Exposition (SATE) goals:
– Enable empirical research based on large test sets
– Encourage improvement of tools
– Speed adoption of tools by objectively demonstrating their use on real software
– NOT to choose the “best” tool
Co-funded by NIST and DHS, National Cyber Security Division
Participants: Aspect Security ASC, Checkmarx CxSuite, Flawfinder, Fortify SCA, Grammatech CodeSonar, HP DevInspect, SofCheck Inspector for Java, UMD FindBugs, Veracode SecurityReview

SATE 2008 Events
– Telecons, etc. to come up with procedures and goals
– We chose 6 C and Java programs with security implications and gave them to tool makers (15 Feb)
– Tool makers ran tools and returned reports (29 Feb)
– We analyzed reports and (tried to) find “ground truth” (15 Apr); we expected a few thousand warnings but got over 48,000
– Critique and update rounds with some tool makers (13 May)
– Everyone shared observations at a workshop (12 June)
– We released our final report and all data (30 June)

SATE 2008: There’s No Such Thing as “One Weakness”
Only 1/8 to 1/3 of weaknesses are simple. The notion breaks down when
– weakness classes are related, and
– data or control flows are intermingled.
Even “location” is nebulous.

How Weakness Classes Relate (from “Chains and Composites”, Steve Christey, MITRE)
[Diagram showing three kinds of relations among weakness classes]
– Hierarchy: Cross-Site Scripting (CWE-79) and Command Injection (CWE-77) are children of Improper Input Validation (CWE-20)
– Chains: Validate-Before-Canonicalize (CWE-180) can lead to Relative Path Traversal (CWE-23), e.g., lang = %2e./%2e./%2e/etc/passwd%00
– Composites: Symlink Following (CWE-61) combines Container Errors (CWE-216), Race Conditions (CWE-362), Predictability (CWE-340), and Permissions (CWE-275)
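A minimal C sketch of the CWE-180 chain (hypothetical code; url_decode is a simplified stand-in for a real decoder): the path is checked for “../” before percent-decoding, so the encoded form above passes validation and only becomes a traversal afterward.

    #include <stdio.h>
    #include <string.h>

    /* Simplified percent-decoder (hypothetical helper). */
    static void url_decode(const char *src, char *dst) {
        while (*src) {
            if (src[0] == '%' && src[1] && src[2]) {
                int hi = (src[1] <= '9') ? src[1] - '0' : (src[1] | 32) - 'a' + 10;
                int lo = (src[2] <= '9') ? src[2] - '0' : (src[2] | 32) - 'a' + 10;
                *dst++ = (char)(hi * 16 + lo);
                src += 3;
            } else {
                *dst++ = *src++;
            }
        }
        *dst = '\0';
    }

    int main(void) {
        const char *lang = "%2e./%2e./%2e/etc/passwd%00";
        char path[256];

        /* CWE-180: validate BEFORE canonicalize -- this check passes
         * because the raw string contains no literal "../" ... */
        if (strstr(lang, "../") != NULL) {
            puts("rejected");
            return 1;
        }

        /* ... then decoding re-introduces "../" (and a NUL byte),
         * yielding CWE-23, relative path traversal. */
        url_decode(lang, path);
        printf("opening: %s\n", path);   /* ../.././etc/passwd */
        return 0;
    }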

Intermingled Flow: 2 sources, 2 sinks, 4 paths
[Diagram: two frees (free line 1503, free line 2644) each flow to two later uses (use line 819, use line 808)]
How many weakness sites?
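A toy C version of the same situation (hypothetical code with made-up positions, not the actual lines 819/808/1503/2644): two free sites and two use sites under independent conditions give four distinct free-to-use paths, so counting one weakness per site, per pair, or per path yields different totals.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(int argc, char **argv) {
        int a = (argc > 1) ? atoi(argv[1]) : 0;
        int b = (argc > 2) ? atoi(argv[2]) : 0;
        char *p = malloc(16);
        if (p == NULL)
            return 1;
        strcpy(p, "hello");

        /* Two free sites ("sources" of the dangling pointer). */
        if (a)
            free(p);            /* free site 1 */
        else
            free(p);            /* free site 2 */

        /* Two use sites ("sinks"); since a and b are independent,
         * either free site can reach either use site: 4 paths. */
        if (b)
            p[0] = 'x';         /* use site 1: use after free */
        else
            printf("%s\n", p);  /* use site 2: use after free */

        return 0;
    }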

Other Observations
– Tools can’t catch everything: cleartext transmission, unimplemented features, improper access control, …
– Tools catch real problems: XSS, buffer overflow, cross-site request forgery; 13 of the SANS Top 25 (21 with related CWEs)
– Tools reported some 200 different kinds of weaknesses
  – Buffer errors still very frequent in C
  – Many XSS errors in Java
– “Raw” report rates vary by 3x depending on the code
– Tools are even more helpful when “tuned”
– Coding without security in mind leaves MANY weaknesses
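As an illustration of the buffer errors still frequent in C (a minimal hypothetical case, not taken from the SATE test programs), the kind of overflow most of the participating tools report:

    #include <stdio.h>
    #include <string.h>

    int main(int argc, char **argv) {
        char name[16];

        /* Classic buffer overflow: strcpy() does not check that
         * argv[1] fits in the 16-byte buffer. A bounded copy such as
         * snprintf(name, sizeof name, "%s", argv[1]) is the usual fix. */
        if (argc > 1)
            strcpy(name, argv[1]);
        else
            strcpy(name, "anonymous");

        printf("hello, %s\n", name);
        return 0;
    }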

Current Source Code Security Analyzers Have Little Overlap (from MITRE)
[Diagram of hits reported by 2, 3, 4, or all 5 tools]
– Non-overlap: hits reported by one tool and no others (84%)
– Overlap: hits reported by more than one tool (16%)

Precision & Recall Scoring All True Positives No True Positives Reports Everything Misses Everything Finds more flaws Finds mostly flaws “Better” The Perfect Tool Finds all flaws and finds only flaws from DoD

Tool A (from DoD)
[Same precision/recall plot with one point per flaw type: uninitialized variable use, null pointer dereference, improper return value use, use after free, TOCTOU, memory leak, buffer overflow, tainted data/unvalidated user input, plus an overall “all flaw types” point]

Tool B (from DoD)
[Same plot for a second tool, covering the same flaw types plus command injection and format string vulnerability]

Best Tool (from DoD)
[Same plot taking the best result per flaw type: uninitialized variable use, improper return value use, use after free, TOCTOU, memory leak, buffer overflow, tainted data/unvalidated user input, command injection, format string vulnerability, null pointer dereference]

Tools Useful in Quality “Plains”
– Tools alone are not enough to achieve the highest “peaks” of quality.
– In the “plains” of typical quality, tools can help.
– If code is adrift in a “sea” of chaos, train developers.
[Photo: Tararua mountains and the Horowhenua region, New Zealand; Swazi Apparel Limited, used with permission, www.swazi.co.nz]

Tips on Tool Evaluation
– Start with many examples covering code complexities and weaknesses
  – SAMATE Reference Dataset (SRD)
  – Many cases from MIT: Lippmann, Zitser, Leek, Kratkiewicz
– Add some of your typical code
– Look for
  – Weakness types (CWEs) reported
  – Code complexities handled
  – Traces, explanations, and other analyst support
  – Integration and machine-readable reports
  – Ability to write rules and to ignore “known good” code
– False alarm ratio (fp/tp) is a poor measure; report density (r/kLoC) is probably better (see the note below)
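In symbols (my notation; the slide only names the ratios), with fp false alarms, tp true reports, and r total reports per thousand lines of code:

    \[
    \text{false alarm ratio} = \frac{fp}{tp}
    \qquad
    \text{report density} = \frac{r}{\text{kLoC}}
    \]
    % fp/tp requires classifying every report as true or false, the
    % "ground truth" SATE found hard to establish; r/kLoC needs only a
    % count of reports, normalized by code size.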