Pixy: A Static Analysis Tool for Detecting Web Application Vulnerabilities Nenad Jovanovic, Christopher Kruegel, Engin Kirda Secure Systems Lab Vienna.

Presentation transcript:

Pixy: A Static Analysis Tool for Detecting Web Application Vulnerabilities Nenad Jovanovic, Christopher Kruegel, Engin Kirda Secure Systems Lab Vienna University of Technology Proceedings of the IEEE Symposium on Security and Privacy. (May 2006)

2008/10/2
Outline
- Introduction
- Taint-Style Vulnerabilities
- Data Flow Analysis
- Empirical Results
- Conclusions
- Comments

Introduction (1/2)
- There is an urgent need for automated vulnerability detection in Web application development.
- Existing approaches for mitigating threats to Web applications can be divided into client-side and server-side solutions.
- Server-side solutions:
  - Static approaches: scan the source code for vulnerabilities
  - Dynamic approaches: detect vulnerabilities while executing the audited program

Introduction (2/2)
Pixy
- The first open source tool for statically detecting XSS vulnerabilities in PHP 4 code by means of data flow analysis
- It can be applied to other taint-style vulnerabilities such as SQL injection or command injection

Taint-Style Vulnerabilities (1/2)
- Of all vulnerabilities in Web applications, problems caused by unchecked input are recognized as the most common
  - Attackers inject malicious data into Web applications
  - Attackers manipulate applications using malicious data
- The authors refer to this class of vulnerabilities as the tainted object propagation problem
- Referenced from “Finding security errors in Java programs with static analysis,” in Proceedings of the 14th Usenix Security Symposium, Aug. 2005

Taint-Style Vulnerabilities (2/2)
Tainted data
- Originates from potentially malicious users
- Causes security problems at vulnerable points in the program (called sensitive sinks)
- May enter the program at specific places, and can spread via assignment and similar constructs
- Can be untainted (sanitized) using a set of operations
Many important types of vulnerabilities (e.g., XSS or SQL injection) can be seen as instances of this general class of taint-style vulnerabilities.
- They differ only in the concrete values of a few parameters

Cross-Site Scripting (XSS) (1/2)
- Occurs when dynamically generated Web pages display improperly validated input
- An attacker may embed malicious JavaScript code into dynamically generated pages of trusted sites in order to:
  - hijack the user's account credentials
  - change user settings
  - steal cookies
  - insert unwanted content into the page

Cross-Site Scripting (XSS) (2/2)
- Reflected cross-site scripting attacks
- Stored cross-site scripting attacks
  - An attacker's malicious script is rendered more than once
- Example fragments: alert('Hello World'); A web page about rabbits; location.replace(' e)

Properties of XSS
- Entry points into the program
  - GET: $_GET
  - POST: $_POST
  - COOKIE: $_COOKIE
  - The set of entry points grows when “register_globals” is active
- Sanitization routines
  - htmlentities(), htmlspecialchars(), and type casts
- Sensitive sinks
  - echo()
  - print()
  - printf()…
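
The three parameter groups above (entry points, sanitization routines, sensitive sinks) can be written down as a small policy object. The following is only an illustrative sketch: `TaintPolicy`, `XSS_POLICY`, and `is_violation` are hypothetical names, not part of Pixy, and the check works on a single pre-computed fact rather than on real PHP code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaintPolicy:
    """Parameters that distinguish one taint-style vulnerability class from another."""
    sources: frozenset     # places where tainted data enters the program
    sanitizers: frozenset  # operations that untaint a value
    sinks: frozenset       # sensitive sinks


# Hypothetical XSS instance, using the names listed on the slide.
XSS_POLICY = TaintPolicy(
    sources=frozenset({"$_GET", "$_POST", "$_COOKIE"}),
    sanitizers=frozenset({"htmlentities", "htmlspecialchars"}),
    sinks=frozenset({"echo", "print", "printf"}),
)


def is_violation(policy, sink, origin, applied):
    """True if data from a source reaches a sink with no sanitizer applied."""
    return (
        sink in policy.sinks
        and origin in policy.sources
        and not (set(applied) & policy.sanitizers)
    )
```

Under this view, a policy for SQL injection would presumably reuse the same structure with different sanitizer and sink sets.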

Data Flow Analysis (1/4)
- Goal: to determine whether it is possible that tainted data reaches sensitive sinks without being properly sanitized.
  - Identify the taint value of the variables used in these sinks
- Statically compute certain information for every single program point (or for coarser units such as functions)
- PHP front-end
  - Constructs a parse tree for the PHP input file
  - The parse tree is transformed into a linearized form resembling three-address code (TAC), and kept as a control flow graph for each encountered function
- Three-address code
  - An assembly-like language
  - At most 3 operands: “x = y op z”
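
To make the three-address form concrete, here is a minimal sketch that linearizes a nested arithmetic expression into statements of the shape “x = y op z”. It borrows Python's own `ast` parser as a stand-in for the PHP front-end, which is an assumption made purely for illustration; `to_tac` and the temporary names `_t1`, `_t2` are hypothetical.

```python
import ast
import itertools


def to_tac(expression):
    """Linearize an arithmetic expression into three-address statements."""
    operators = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}
    counter = itertools.count(1)
    code = []

    def visit(node):
        # Leaves are returned as operand names; inner nodes get a temporary.
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return repr(node.value)
        if isinstance(node, ast.BinOp):
            left = visit(node.left)
            right = visit(node.right)
            temp = f"_t{next(counter)}"
            code.append(f"{temp} = {left} {operators[type(node.op)]} {right}")
            return temp
        raise ValueError("unsupported construct")

    result = visit(ast.parse(expression, mode="eval").body)
    return code, result
```

For the input `a + b * c` this yields the two statements `_t1 = b * c` and `_t2 = a + _t1`, each with at most three operands.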

Data Flow Analysis (2/4)
- Operates on the control flow graph (CFG) of a program
  - A data structure built on top of the intermediate code representation that abstracts the control flow behavior of the function being compiled
  - Node: an atomic statement of the program
  - Edge: flow of control
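
A control flow graph of this kind can be sketched with two adjacency maps. The class `CFG` and the helper `build_branch_example` are hypothetical names chosen for this illustration; the PHP statements stored in the nodes are plain strings and are never executed.

```python
from collections import defaultdict


class CFG:
    """Control flow graph: nodes are atomic statements, edges are control flow."""

    def __init__(self):
        self.statements = {}
        self.successors = defaultdict(list)
        self.predecessors = defaultdict(list)

    def add_node(self, node_id, statement):
        self.statements[node_id] = statement

    def add_edge(self, src, dst):
        self.successors[src].append(dst)
        self.predecessors[dst].append(src)


def build_branch_example():
    """CFG for: if ($c) { $v = 3; } else { $v = 4; } echo $v;"""
    cfg = CFG()
    cfg.add_node(1, "if ($c)")
    cfg.add_node(2, "$v = 3")
    cfg.add_node(3, "$v = 4")
    cfg.add_node(4, "echo $v")
    for src, dst in [(1, 2), (1, 3), (2, 4), (3, 4)]:
        cfg.add_edge(src, dst)
    return cfg
```

Node 4 has two predecessors, which is where an analysis has to combine the information arriving from both branches.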

Literal Analysis: Basics
- Purpose: to determine, for each program point, the literal that a variable or a constant can hold.
- It can improve the precision of the overall analysis by:
  - Evaluating branch conditions
  - Ignoring program paths that cannot be executed at runtime (called path pruning)
  - Resolving non-literal include statements, variable variables, variable array indices, and variable function calls (only for potential uses)
- After performing literal analysis, each CFG node is associated with information about which literal is mapped to a variable before executing that node

How Data Flow Analysis is Used to Perform Literal Analysis
- Assume a fictitious programming language with:
  - One variable (v)
  - Two literals (the integers 3 and 4)
- “skip” node: an empty instruction
- “Ω”: an unknown literal

Data Flow Analysis (3/4)
Carrier Lattice
- Information about the program is represented using values from an algebraic structure
- Every piece of information that could ever be associated with a CFG node by the analysis must be an element of the lattice in use
- Bottom element: “not visited yet” at the beginning
- Lines: the ordering between elements with regard to precision
- Least upper bound: the smallest element that is greater than or equal to both of the given elements; needed by the analysis algorithm
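
For the fictitious one-variable language with the literals 3 and 4, the carrier lattice can be sketched as a flat lattice: bottom below every literal, and Ω above them. The sentinel strings `BOTTOM` and `OMEGA` and the function names are assumptions made for this sketch.

```python
BOTTOM = "bottom"  # "not visited yet"
OMEGA = "omega"    # unknown literal, the least precise element


def lub(a, b):
    """Least upper bound of two elements of the flat literal lattice."""
    if a == BOTTOM:
        return b
    if b == BOTTOM:
        return a
    if a == b:
        return a
    return OMEGA  # two different literals: the value is no longer known


def leq(a, b):
    """Lattice ordering: a is at least as precise as b."""
    return lub(a, b) == b
```

Joining the literals 3 and 4 gives Ω, which is the situation after an if/else that assigns a different literal in each branch.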

Data Flow Analysis (4/4)
Transfer Function
- f: P → P for each node in the control flow graph
  - Input: a lattice element
  - Output: a lattice element
- Models the effect of the node on the program information
- Each CFG node is associated with such a transfer function
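
Putting the lattice and the transfer functions together, a generic worklist iteration might look as follows. This is a textbook-style sketch, not Pixy's implementation; it assumes the fictitious one-variable language, a flat lattice with the hypothetical sentinels `BOTTOM` and `OMEGA`, and an entry value of Ω (the variable is unknown at program start).

```python
from collections import deque

BOTTOM, OMEGA = "bottom", "omega"


def lub(a, b):
    if a == BOTTOM:
        return b
    if b == BOTTOM:
        return a
    return a if a == b else OMEGA


def assign(literal):
    """Transfer function for `v = literal`: the incoming value is overwritten."""
    return lambda _incoming: literal


def skip(incoming):
    """Transfer function for an empty instruction: identity."""
    return incoming


def analyze(transfer, edges, entry, entry_value=OMEGA):
    """Compute the lattice element valid *before* each node."""
    before = {node: BOTTOM for node in transfer}
    before[entry] = entry_value
    worklist = deque([entry])
    while worklist:
        node = worklist.popleft()
        out = transfer[node](before[node])
        for succ in edges.get(node, []):
            merged = lub(before[succ], out)
            if merged != before[succ]:  # information changed: revisit successor
                before[succ] = merged
                worklist.append(succ)
    return before
```

For a branch that assigns 3 on one path and 4 on the other, the join node ends up with Ω; for straight-line code `v = 3; skip`, the skip node sees the literal 3.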

Literal Analysis: Basics
Carrier Lattice Definition
- Provides mappings for all variables and constants that appear in the scanned program
- Able to describe the mapping to any possible literal (infinitely many)

Literal Analysis: Basics
Transfer Function Definition
- PHP has no explicit type declarations
- “Hidden” arrays

Four cases in order of increasing complexity, by the kind of variable being assigned:
1. Not an array element and not known as an array: strong update
2. An array, but not an array element: array tree
3. An array element without non-literal indices (may itself be an array): strong overlap

Four cases in order of increasing complexity (continued)
4. An array element with non-literal indices (may itself be an array): weak overlap
- Weak overlap algorithm: all overwrite operations are replaced by least upper bound operations
- Array elements with one or more non-literal indices are permanently mapped to Ω
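
The difference between overwriting and joining can be sketched on a flat variable-to-literal map. This simplification ignores Pixy's array trees; `strong_update`, `weak_update`, and the `OMEGA` sentinel are hypothetical names used only here.

```python
OMEGA = "omega"  # unknown literal


def lub(a, b):
    return a if a == b else OMEGA


def strong_update(state, var, value):
    """The target is known exactly, so its old value is overwritten."""
    new_state = dict(state)
    new_state[var] = value
    return new_state


def weak_update(state, candidates, value):
    """The target is one of several candidates (e.g. `$a[$i]` with unknown $i),
    so each candidate is joined with the new value instead of overwritten."""
    new_state = dict(state)
    for var in candidates:
        new_state[var] = lub(new_state.get(var, OMEGA), value)
    return new_state
```

With `$a[0]` and `$a[1]` both holding 1, a weak update with 1 keeps both literals, while a weak update with 2 drives both to Ω because the analysis cannot tell which element was written.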

Alias Analysis
- Ignoring alias relationships would prevent literal analysis from producing correct results in a number of cases.
- Without alias analysis, literal analysis cannot determine that an assignment to $a also affects its alias $b
- The literal recorded for $b would remain unchanged and therefore be incorrect
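
A small sketch of why this matters, assuming the PHP sequence `$a = 1; $b = &$a; $a = 2;`. The function `assign_literal` is a hypothetical helper that models a literal assignment with and without must-alias information.

```python
def assign_literal(state, var, value, must_aliases=()):
    """Model `var = value`; every must-alias of var receives the same literal."""
    new_state = dict(state)
    new_state[var] = value
    for alias in must_aliases:
        new_state[alias] = value
    return new_state


# State after `$a = 1; $b = &$a;`: both names refer to the same value.
state = {"$a": 1, "$b": 1}

# `$a = 2` without alias information leaves a stale literal for $b.
without_aliases = assign_literal(state, "$a", 2)

# `$a = 2` with the must-alias $b updates both entries.
with_aliases = assign_literal(state, "$a", 2, must_aliases=["$b"])
```

In the first result `$b` is still mapped to 1 although the PHP program would read 2, which is exactly the incorrect outcome the slide warns about.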

Carrier Lattice Definition
- Alias group: a group of variables referencing the same memory location
- Alias information is modeled through sets of alias group sets
  - (…): an alias group
  - {…}: an alias group set
- Must-aliases of a variable
  - “{(a,b) (c)}”: $b is a must-alias of $a
- May-aliases of a variable
  - “{(a,b) (c)} {(a,c) (b)}”: $b and $c are may-aliases of $a
- The order among lattice elements is defined as subset inclusion
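
The two slide examples can be reproduced with nested frozensets: a lattice element is a set of alias group sets, and each alias group set is a set of alias groups. The helper names below are hypothetical, and reading must-aliases as "aliased in every group set" and may-aliases as "aliased in some but not all" is this sketch's interpretation of the slide.

```python
def group(*names):
    return frozenset(names)


def aliases_in(group_set, var):
    """Variables sharing an alias group with var in one alias group set."""
    for alias_group in group_set:
        if var in alias_group:
            return set(alias_group) - {var}
    return set()


def must_aliases(element, var):
    """Aliases of var in every alias group set of the lattice element."""
    per_set = [aliases_in(group_set, var) for group_set in element]
    return set.intersection(*per_set) if per_set else set()


def may_aliases(element, var):
    """Aliases of var in some, but not all, alias group sets."""
    per_set = [aliases_in(group_set, var) for group_set in element]
    everything = set.union(*per_set) if per_set else set()
    return everything - must_aliases(element, var)


def lub(element_a, element_b):
    """With subset inclusion as the order, the least upper bound is set union."""
    return element_a | element_b


# "{(a,b) (c)}" and "{(a,c) (b)}" from the slide.
SET_1 = frozenset({group("a", "b"), group("c")})
SET_2 = frozenset({group("a", "c"), group("b")})
```

Joining the two single-set elements with `lub` models a control flow merge after which neither alias relationship is certain any more.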

- Static analysis cannot decide which path the program will take
  - This assumes that the branch condition is determined by dynamic factors
  - Examples: environment variables, user input

Transfer Function Definition
- Reference assignment
  - “$a = & $b”
- Unset node
  - The unset variable is placed in its own one-element alias group in each alias group set
- Global node
  - “global $a;” is treated like a reference assignment with the equally named variable from the global scope on the right side
- The authors only consider references to simple variables
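
One possible reading of these transfer functions, applied to a single alias group set, is sketched below. The functions `detach` and `reference_assign` are hypothetical, and the sketch assumes that every variable of interest already appears in some alias group.

```python
def detach(group_set, var):
    """Models unset: var ends up in its own one-element alias group."""
    groups = {alias_group - {var} for alias_group in group_set}
    groups.discard(frozenset())  # drop a group that became empty
    groups.add(frozenset({var}))
    return frozenset(groups)


def reference_assign(group_set, left, right):
    """Models `$left = &$right`: left leaves its old group and joins right's."""
    if left == right:
        return group_set
    result = set()
    for alias_group in detach(group_set, left):
        if right in alias_group:
            result.add(alias_group | {left})
        elif alias_group != frozenset({left}):
            result.add(alias_group)
    return frozenset(result)


def transfer(element, left, right):
    """Apply the reference assignment to every alias group set of an element."""
    return frozenset(reference_assign(gs, left, right) for gs in element)
```

Starting from `{(a) (b) (c)}`, the assignment `$a = &$b` yields `{(a,b) (c)}`, and a later unset of `$a` splits the group again.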

Literal Analysis Revisited
- Here we only consider references to simple variables
- Functions built into PHP are conservatively modeled as returning Ω, since the gain in precision from modeling them is expected to be rather small
  - The only built-in function modeled precisely is “define”

Literal Analysis Revisited
- The transfer function at the call preparation node stores the alias information for the local variables of the calling function, and resets it to its default (initial) value
- On function return (i.e., at the call return node), the alias information for the local variables of the callee is reset to its default, while the caller's locals are restored

Taint Analysis
- Purpose: to determine, for each program point, the taint value (instead of the literal) of a variable or constant.
- This makes it possible to check whether any sensitive sink in the program receives malicious data, and hence to detect vulnerabilities

Taint Analysis
Carrier Lattice Definition
- Tainted: a variable is tainted if it can hold a malicious, not yet sanitized (checked) value originating from user input
- Variables are mapped not to Ω but to the taint values tainted and untainted
  - Mapped to tainted: this variable might be tainted
  - Mapped to untainted: this variable is untainted
  - Whenever the analysis cannot determine the taint value, the variable is conservatively assumed to be tainted

Taint Analysis
Transfer Functions Definition
- Casting a tainted variable to an integer untaints it (e.g., with operators such as +, -, or (int))
- Correctly modeling built-in PHP functions can reduce the number of false positives
  - Pixy processes a specification file on startup which contains abstracted versions of some built-in functions in PHP syntax
  - “htmlentities” and “array” return $_UNTAINTED
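
The taint lattice and a few transfer rules can be sketched over a tiny expression encoding. Everything here is an assumption made for illustration: expressions are tuples such as `("var", "$x")`, the source and sanitizer sets are taken from the XSS slide, and any function without a model is conservatively treated as returning tainted data.

```python
TAINTED, UNTAINTED = "tainted", "untainted"
SOURCES = {"$_GET", "$_POST", "$_COOKIE"}
SANITIZERS = {"htmlentities", "htmlspecialchars"}


def lub(a, b):
    """Tainted dominates: a value that might be tainted is tainted."""
    return TAINTED if TAINTED in (a, b) else UNTAINTED


def taint_of(expr, state):
    kind = expr[0]
    if kind == "literal":
        return UNTAINTED
    if kind == "var":
        name = expr[1]
        if name in SOURCES:
            return TAINTED
        return state.get(name, TAINTED)  # unknown variable: assume tainted
    if kind == "cast_int":
        return UNTAINTED  # an integer cast untaints its operand
    if kind == "concat":
        return lub(taint_of(expr[1], state), taint_of(expr[2], state))
    if kind == "call":
        # Modeled sanitizers return untainted; anything else is assumed tainted.
        return UNTAINTED if expr[1] in SANITIZERS else TAINTED
    raise ValueError(f"unsupported expression kind: {kind}")


def assign(state, var, expr):
    """Transfer function for `var = expr` on a variable-to-taint mapping."""
    new_state = dict(state)
    new_state[var] = taint_of(expr, state)
    return new_state
```

A short trace: assigning `$_GET` to `$x` makes it tainted, passing it through `htmlentities` or an integer cast yields an untainted value, and concatenating it with a literal stays tainted.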

Taint Analysis
Using the Analysis Results
- Generating warnings that point the developer to possible XSS vulnerabilities at the end of the analysis is straightforward.
  - The analysis information for each sensitive sink is searched for tainted input variables
  - A warning message indicating the corresponding line is issued if such a violation is discovered
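
The reporting step can be sketched as a scan over the sink calls of a program. The input shapes are assumptions of this sketch: `sink_calls` is a list of `(line, sink, variables)` tuples and `taint_before` maps a line number to the variable-to-taint mapping computed for that program point. The warning text is invented and does not reproduce Pixy's output format.

```python
SINKS = {"echo", "print", "printf"}


def find_warnings(sink_calls, taint_before):
    """Return one warning per sensitive sink that receives a tainted variable."""
    warnings = []
    for line, sink, variables in sink_calls:
        if sink not in SINKS:
            continue
        info = taint_before.get(line, {})
        # Missing information is treated conservatively as tainted.
        tainted = [v for v in variables if info.get(v, "tainted") == "tainted"]
        if tainted:
            warnings.append(
                f"line {line}: possible XSS, {', '.join(tainted)} reaches {sink}"
            )
    return warnings
```

A sink whose arguments are all mapped to untainted produces no warning, so the quality of the earlier literal, alias, and taint results directly determines the false positive rate.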

Limitations
- Pixy does not support the object-oriented features of PHP.
  - Such constructs are treated optimistically: it is assumed that malicious data can never arise from them.
- Files included with “include” and similar keywords are not scanned automatically
  - The authors frequently observed false positives stemming from these missing file inclusions
  - These were eliminated through manual inclusion

Empirical Results

Conclusions
- A flow-sensitive, interprocedural, and context-sensitive data flow analysis for PHP, targeted at detecting taint-style vulnerabilities
- Additional literal analysis and alias analysis to improve the correctness and precision of the taint analysis
- Pixy, an open-source Java tool that implements these analysis techniques
- Experimental validation of Pixy's ability to detect unknown vulnerabilities with a low false positive rate

Comments
- The first work to perform alias analysis for an untyped, reference-based scripting language such as PHP
- Beyond the scope of the paper:
  - Recursive calls that depend on dynamic information
  - Infinite call depth for non-terminating programs
- The implementation is widely used by the public.
- Future work: automatic inclusion of “include” files