Finding Application Errors and Security Flaws Using PQL: A Program Query Language Michael Martin, Benjamin Livshits, Monica S. Lam Stanford University.

Presentation transcript:

Finding Application Errors and Security Flaws Using PQL: A Program Query Language Michael Martin, Benjamin Livshits, Monica S. Lam Stanford University OOPSLA 2005

The Problem
- Lots of bug-finding research
  - Null dereferences
  - Buffer overruns
  - Data races
- Many bugs are application-specific
  - Misuse of libraries
  - Violation of application logic

Solution: Division of Labor
- Programmer
  - Knows target program
  - Doesn't know analysis
- Program analysis specialists
  - Know analysis
  - Don't know specific bugs
- Give the programmer a usable analysis

Program Query Language
- Queries operate on program traces
  - Sequence of events representing a run
  - Refer to object instances, not variables
  - Events matched may be widely spaced
- Patterns resemble Java code
  - Like a small matching code snippet
  - No references to compiler internals

System Architecture
[Architecture diagram. Labels: Question, PQL Query, Program, PQL Engine, instrumenter, static analyzer, Instrumented Program, Static Results, Optimized Instrumented Program.]

Complementary Approaches
- Dynamic analysis: finds matches at run time
  - After a match:
    - Can execute user code
    - Can fix code by replacing instructions
- Static analysis: finds all possible matches
  - Conservative: can prove lack of match
  - Results can optimize dynamic analysis

Experimental Results
- Explored a wide range of PQL queries
  - Bad session stores (API violations)
  - SQL injections (security flaws)
  - Mismatched calls (API violations)
  - Lapsed listeners (memory leaks)
- Automatically repaired and prevented many bugs at runtime
  - Fixed memory leaks, prevented security flaws
- Runtime overhead is reasonable
  - Overhead in the 9-125% range
  - Static optimization removed 82-99% of instrumentation points
- Found 206 bugs in 6 real-life Java applications
  - Eclipse, popular web applications
  - 60,000 classes combined


Running Example: SQL Injection
- Unvalidated user input is passed to a database back-end for execution
- If SQL is embedded in the input, an attacker can take over the database
  - Unauthorized data reads
  - Unauthorized data modifications
- One of the top web security flaws

SQL Injection 1
    HttpServletRequest req = /*... */;
    java.sql.Connection conn = /*... */;
    String q = req.getParameter("QUERY");
    conn.execute(q);
Trace:
    1 CALL o1.getParameter(o2)
    2 RET  o3
    3 CALL o4.execute(o3)
    4 RET  o5

SQL Injection 2
    String read() {
        HttpServletRequest req = /*... */;
        return req.getParameter("QUERY");
    }
    /*... */
    java.sql.Connection conn = /*... */;
    conn.execute(read());
Trace:
    1 CALL read()
    2 CALL o1.getParameter(o2)
    3 RET  o3
    4 RET  o3
    5 CALL o4.execute(o3)
    6 RET  o5

Essence of Pattern the Same
Trace of SQL Injection 1:
    1 CALL o1.getParameter(o2)
    2 RET  o3
    3 CALL o4.execute(o3)
    4 RET  o5
Trace of SQL Injection 2:
    1 CALL read()
    2 CALL o1.getParameter(o2)
    3 RET  o3
    4 RET  o3
    5 CALL o4.execute(o3)
    6 RET  o5
In both traces, the object returned by getParameter is then argument 1 to execute.

Translates Directly to PQL
    query main()
    uses String x;
    matches {
        x = HttpServletRequest.getParameter(_);
        Connection.execute(x);
    }
- Query variables → heap objects
- Instructions need not be adjacent in trace
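The matching idea on this slide can be sketched in plain Java. This is only an illustration under stated assumptions, not the PQL engine: `TraceEvent` and `SimpleInjectionMatcher` are hypothetical names, objects are represented by integer ids as in the traces above, and each event is assumed to carry a single argument.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical trace event: one call with a single argument, recorded by object id. */
final class TraceEvent {
    final String method; // method name, e.g. "getParameter" or "execute"
    final int arg;       // id of the argument object
    final int result;    // id of the returned object

    TraceEvent(String method, int arg, int result) {
        this.method = method;
        this.arg = arg;
        this.result = result;
    }
}

/** Assumed, simplified matcher for: x = getParameter(_); execute(x); */
final class SimpleInjectionMatcher {
    /** Returns ids of objects returned by getParameter and later passed to execute. */
    static List<Integer> match(List<TraceEvent> trace) {
        List<Integer> boundX = new ArrayList<>();  // partial matches: x is bound, waiting for execute(x)
        List<Integer> matches = new ArrayList<>();
        for (TraceEvent e : trace) {
            if (e.method.equals("getParameter")) {
                boundX.add(e.result);              // bind query variable x to a heap object
            } else if (e.method.equals("execute") && boundX.contains(e.arg)) {
                matches.add(e.arg);                // same object reached execute: full match
            }
            // Every other event is skipped, so matched events need not be adjacent.
        }
        return matches;
    }

    public static void main(String[] args) {
        List<TraceEvent> trace = Arrays.asList(
            new TraceEvent("getParameter", 2, 3),
            new TraceEvent("unrelated", 7, 8),
            new TraceEvent("execute", 3, 5));
        System.out.println(match(trace));
    }
}
```

Because the sketch compares object ids rather than variable names, it treats the two earlier traces (with and without the intermediate read() call) identically.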

Alternation
    query main()
    uses String x;
    matches {
        x = HttpServletRequest.getParameter(_)
      | x = HttpServletRequest.getHeader(_);
        Connection.execute(x);
    }

SQL Injection 3
    HttpServletRequest req = /*... */;
    String n = req.getParameter("NAME");
    String p = req.getParameter("PASSWORD");
    conn.execute(
        "SELECT * FROM logins WHERE name=" + n + " AND passwd=" + p);
- Compiler translates string concatenation into operations on String and StringBuffer objects
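To make the compiler translation concrete, here is a small sketch that writes the concatenation both ways. `ConcatDesugar` is a hypothetical class name, and the explicit version follows the StringBuffer form described on the slide; more recent Java compilers typically use StringBuilder or an invokedynamic-based strategy instead, but the resulting string is the same.

```java
/** Illustrative only: the same query built by "+" and by explicit StringBuffer calls. */
final class ConcatDesugar {
    /** Source-level form, as written on the slide. */
    static String viaConcat(String n, String p) {
        return "SELECT * FROM logins WHERE name=" + n + " AND passwd=" + p;
    }

    /** Roughly what the slide says the compiler emits: one buffer, a chain of appends, then toString. */
    static String viaStringBuffer(String n, String p) {
        StringBuffer sb = new StringBuffer("SELECT * FROM logins WHERE name=");
        sb.append(n);               // user-controlled object flows into the buffer
        sb.append(" AND passwd=");
        sb.append(p);               // second user-controlled object
        return sb.toString();       // a new String object, distinct from n and p
    }

    public static void main(String[] args) {
        String n = "alice";
        String p = "x OR 1=1";      // attacker-chosen value
        System.out.println(viaStringBuffer(n, p));
        System.out.println(viaConcat(n, p).equals(viaStringBuffer(n, p)));
    }
}
```

The string handed to execute is a fresh object produced by toString, so a pattern that only follows the objects returned by getParameter never sees it.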

SQL Injection 3
Trace:
     1 CALL o1.getParameter(o2)
     2 RET  o3
     3 CALL o1.getParameter(o4)
     4 RET  o5
     5 CALL StringBuffer.<init>(o6)
     6 RET  o7
     7 CALL o7.append(o8)
     8 RET  o7
     9 CALL o7.append(o3)
    10 RET  o7
    11 CALL o7.append(o9)
    12 RET  o7
    13 CALL o7.append(o5)
    14 RET  o7
    15 CALL o7.toString()
    16 RET  o10
    17 CALL o11.execute(o10)
    18 RET  o12

Old Pattern Doesn't Work
     1 CALL o1.getParameter(o2)
     2 RET  o3
     3 CALL o1.getParameter(o4)
     4 RET  o5
    17 CALL o11.execute(o10)
- o10 is neither o3 nor o5, so no match

Instance of Tainted Data Problem
- User-controlled input must be trapped and validated before being passed to the database
- Objects derived from an initial input must also be considered user-controlled
- Generalizes to many security problems: cross-site scripting, path traversal, response splitting, format string attacks...
[Diagram labels: o1, o2, o3, Sensitive Component]

Pattern Must Catch Derived Strings
     1 CALL o1.getParameter(o2)
     2 RET  o3
     3 CALL o1.getParameter(o4)
     4 RET  o5
     9 CALL o7.append(o3)
    10 RET  o7
    11 CALL o7.append(o9)
    12 RET  o7
    15 CALL o7.toString()
    16 RET  o10
    17 CALL o11.execute(o10)



Derived String Query
    query derived(Object x)
    uses Object temp;
    returns Object d;
    matches {
        { temp.append(x); d := derived(temp); }
      | { temp = x.toString(); d := derived(temp); }
      | { d := x; }
    }

New Main Query
    query main()
    uses String x, final;
    matches {
        x = HttpServletRequest.getParameter(_)
      | x = HttpServletRequest.getHeader(_);
        final := derived(x);
        Connection.execute(final);
    }
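The effect of combining main() with derived() can be approximated by forward propagation over a trace. The sketch below is a simplification chosen for illustration, not the state-machine matcher the talk describes: `Call` and `DerivedTracker` are hypothetical names, objects are integer ids, `-1` stands for "no argument", and only the append and toString rules from the derived() query are modeled. The sample trace is the 18-event SQL Injection 3 trace, with CALL/RET pairs merged into single events.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical merged CALL/RET event: receiver.method(arg) returning result, all as object ids. */
final class Call {
    final String method;
    final int receiver;
    final int arg;      // -1 means the call takes no argument
    final int result;

    Call(String method, int receiver, int arg, int result) {
        this.method = method;
        this.receiver = receiver;
        this.arg = arg;
        this.result = result;
    }
}

/** Assumed simplification: tracks which objects are derived from user input. */
final class DerivedTracker {
    /** Returns the ids of derived objects that were passed to execute. */
    static List<Integer> flaggedExecuteArgs(List<Call> trace) {
        Set<Integer> derived = new HashSet<>();
        List<Integer> flagged = new ArrayList<>();
        for (Call c : trace) {
            switch (c.method) {
                case "getParameter":
                case "getHeader":
                    derived.add(c.result);                                    // source: x is user input
                    break;
                case "append":
                    if (derived.contains(c.arg)) derived.add(c.receiver);     // temp.append(x)
                    break;
                case "toString":
                    if (derived.contains(c.receiver)) derived.add(c.result);  // temp = x.toString()
                    break;
                case "execute":
                    if (derived.contains(c.arg)) flagged.add(c.arg);          // sink reached
                    break;
                default:
                    break;                                                    // unrelated event
            }
        }
        return flagged;
    }

    /** The SQL Injection 3 trace from the slides, one entry per CALL/RET pair. */
    static List<Call> slideTrace() {
        return Arrays.asList(
            new Call("getParameter", 1, 2, 3),
            new Call("getParameter", 1, 4, 5),
            new Call("<init>", 7, 6, 7),
            new Call("append", 7, 8, 7),
            new Call("append", 7, 3, 7),
            new Call("append", 7, 9, 7),
            new Call("append", 7, 5, 7),
            new Call("toString", 7, -1, 10),
            new Call("execute", 11, 10, 12));
    }

    public static void main(String[] args) {
        System.out.println(flaggedExecuteArgs(slideTrace()));
    }
}
```

On this trace the buffer o7 becomes derived at the append of o3, the string o10 becomes derived at toString, and the execute call on o10 is flagged, which is the match the old pattern missed.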

Defending Against Attacks
    query main()
    uses String x, final;
    matches {
        x = HttpServletRequest.getParameter(_)
      | x = HttpServletRequest.getHeader(_);
        final := derived(x);
    }
    replaces Connection.execute(final)
    with SQLUtil.safeExecute(x, final);
- Sanitizes user-derived input
- Dangerous data cannot reach the database
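The slides do not show what SQLUtil.safeExecute does internally. As a purely hypothetical sketch of one possible sanitizer, the code below takes the original user input and the final query string and escapes the user-controlled part by doubling single quotes. `SqlUtilSketch` and its methods are invented for illustration; in ordinary application code, parameterized queries via `java.sql.PreparedStatement` are the usual defense.

```java
/** Hypothetical stand-in for the sanitizing step; not the SQLUtil class from the talk. */
final class SqlUtilSketch {
    /** Doubles single quotes, the standard SQL way to write a quote inside a string literal. */
    static String escapeQuotes(String s) {
        return s.replace("'", "''");
    }

    /**
     * Returns finalQuery with every occurrence of the raw user input replaced by its
     * escaped form. Assumes the input appears verbatim inside the query text.
     */
    static String sanitize(String userInput, String finalQuery) {
        if (userInput.isEmpty()) {
            return finalQuery;   // nothing to escape
        }
        return finalQuery.replace(userInput, escapeQuotes(userInput));
    }

    public static void main(String[] args) {
        String x = "' OR '1'='1";                                       // attacker-chosen input
        String finalQuery = "SELECT * FROM logins WHERE name='" + x + "'";
        System.out.println(sanitize(x, finalQuery));
    }
}
```

Passing both `x` and `final` to the replacement call, as the query on the slide does, is what lets a sanitizer of this kind tell the user-controlled portion of the query apart from the application's own SQL.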

Other PQL Constructs
- Partial order
  - { x.a(), x.b(), x.c(); }
  - Match calls to a, b, and c on x in any order.
- Forbidden events
  - Example: double-lock
      x.lock(); ~x.unlock(); x.lock();
  - Single statements only
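The double-lock example reads as "a lock on x, then no unlock on x, then another lock on x". A minimal Java sketch of that check over a trace follows; `LockEvent` and `DoubleLockMatcher` are hypothetical names, and objects are again integer ids.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical trace event: a lock() or unlock() call on the object with the given id. */
final class LockEvent {
    final String method;
    final int target;

    LockEvent(String method, int target) {
        this.method = method;
        this.target = target;
    }
}

/** Assumed sketch of the pattern: x.lock(); ~x.unlock(); x.lock(); */
final class DoubleLockMatcher {
    /** Returns the ids of objects locked a second time with no unlock in between. */
    static List<Integer> doubleLocked(List<LockEvent> trace) {
        Set<Integer> held = new HashSet<>();          // objects with a lock() not yet followed by unlock()
        List<Integer> violations = new ArrayList<>();
        for (LockEvent e : trace) {
            if (e.method.equals("lock")) {
                if (!held.add(e.target)) {            // already held: the forbidden unlock never happened
                    violations.add(e.target);
                }
            } else if (e.method.equals("unlock")) {
                held.remove(e.target);                // the forbidden event occurred, so the partial match dies
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        List<LockEvent> trace = Arrays.asList(
            new LockEvent("lock", 1),
            new LockEvent("lock", 2),
            new LockEvent("unlock", 1),
            new LockEvent("lock", 1),
            new LockEvent("lock", 2));
        System.out.println(doubleLocked(trace));
    }
}
```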

Expressiveness
- Concatenation + alternation = loop-free regexp
- + Subqueries = CFG
- + Partial order = CFG + intersection
- Quantified over heap
  - Each subquery independent
  - Existentially quantified


Dynamic Matcher
- Subquery → state machine
- Call to subquery → new instance of machine
- States carry "bindings" with them
  - Query variables → heap objects
  - Bindings are acquired as the variables are referenced for the first time in a match

Query to Translate
    query main()
    uses Object x, f;
    matches {
        x = getParameter(_) | x = getHeader(_);
        f := derived(x);
        execute(f);
    }
    query derived(Object x)
    uses Object t;
    returns Object y;
    matches {
        { y := x; }
      | { t = x.toString(); y := derived(t); }
      | { t.append(x); y := derived(t); }
    }

main() Query Machine
[State-machine diagram. Transition labels: ε, *, x = getParameter(_), x = getHeader(_), f := derived(x), execute(f).]

derived() Query Machine
[State-machine diagram. Transition labels: ε, *, t = x.toString(), t.append(x), y := x, y := derived(t).]

Example Program Trace
    o1 = getHeader(o2)
    o3.append(o1)
    o3.append(o4)
    o5 = execute(o3)

main(): Top Level Match
[State-machine diagram for main(), annotated with the trace and bindings below.]
Trace so far: o1 = getHeader(o2)
Bindings on states: { }, { x=o1 }, { x=o1 }
Subquery call marker: 1

derived(): call 1
[State-machine diagram for derived(), annotated with the trace and bindings below.]
Trace so far: o1 = getHeader(o2)
Bindings on states: { x=o1 }, { x=y=o1 }

main(): Top Level Match
[State-machine diagram for main(), annotated with the trace and bindings below.]
Trace so far: o1 = getHeader(o2); o3.append(o1)
Bindings on states: { }, { x=o1 }, { x=o1 }, { x=o1, f=o1 }
Subquery call marker: 1

derived(): call 1
[State-machine diagram for derived(), annotated with the trace and bindings below.]
Trace so far: o1 = getHeader(o2); o3.append(o1)
Bindings on states: { x=o1 }, { x=o1, t=o3 }, { x=y=o1 }
Subquery call marker: 2

derived(): call 2
[State-machine diagram for derived(), annotated with the trace and bindings below.]
Trace so far: o1 = getHeader(o2); o3.append(o1)
Bindings on states: { x=o3 }, { x=y=o3 }

derived(): call 1
[State-machine diagram for derived(), annotated with the trace and bindings below.]
Trace so far: o1 = getHeader(o2); o3.append(o1)
Bindings on states: { x=o1 }, { x=o1, t=o3 }, { x=y=o1 }, { x=o1, y=t=o3 }
Subquery call marker: 2

main(): Top Level Match
[State-machine diagram for main(), annotated with the trace and bindings below.]
Trace: o1 = getHeader(o2); o3.append(o1); o3.append(o4); o5 = execute(o3)
Bindings on states: { }, { x=o1 }, { x=o1 }, { x=o1, f=o1 }, { x=o1, f=o3 }, { x=o1, f=o3 }
Subquery call marker: 1

Find Relevance Fast
- Hash map for each transition
  - Every call-instance combined
- Key on known-used variables
- All used variables known-used → one lookup per transition
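One way to picture the lookup scheme is an index from bound objects to the partial matches that are waiting on them. The sketch below is an assumption-laden simplification: `PartialMatch` and `MatchIndex` are hypothetical names, each partial match is keyed on a single bound object id, and the real engine's per-transition maps and multi-variable keys are not modeled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical partial match: the machine state reached and the object bound so far. */
final class PartialMatch {
    final String state;
    final int boundObject;

    PartialMatch(String state, int boundObject) {
        this.state = state;
        this.boundObject = boundObject;
    }
}

/** Assumed index for one transition: find the partial matches an event can advance. */
final class MatchIndex {
    private final Map<Integer, List<PartialMatch>> byObject = new HashMap<>();

    /** Registers a partial match under the object its next transition depends on. */
    void add(PartialMatch m) {
        byObject.computeIfAbsent(m.boundObject, k -> new ArrayList<>()).add(m);
    }

    /** One hash lookup per event instead of a scan over all live partial matches. */
    List<PartialMatch> relevantTo(int objectId) {
        return byObject.getOrDefault(objectId, Collections.emptyList());
    }

    public static void main(String[] args) {
        MatchIndex index = new MatchIndex();
        index.add(new PartialMatch("afterGetHeader", 1));
        index.add(new PartialMatch("afterAppend", 3));
        System.out.println(index.relevantTo(3).size());
    }
}
```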


Static Analysis
- "Can this program match this query?"
- Undecidable in general
- We give a conservative approximation
- No matches found = none possible

Static Analysis
- PQL query automatically translated to a query on pointer analysis results
- Pointer analysis is sound and context-sensitive
  - […] contexts in a good-sized application
  - Exponential space represented with BDDs
  - Analyses given in Datalog
- Whaley/Lam, PLDI 2004 (bddbddb) for details

Static Results
- Sets of objects and events that could represent a match
  OR
- Program points that could participate in a match
- No results = no match possible!


Optimizing the Dynamic Analysis
- Static results are conservative
  - So, point not in result → point never in any match
  - So, no need to instrument
- Usually more than 90% reduction
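The optimization amounts to intersecting the candidate instrumentation points with the static result. A minimal sketch, assuming program points are identified by strings (the `InstrumentationFilter` class and the point names are hypothetical):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Assumed sketch: keep only the instrumentation points the static analysis could not rule out. */
final class InstrumentationFilter {
    /** A point absent from the conservative static result can never be part of a match. */
    static List<String> pointsToInstrument(List<String> candidatePoints, Set<String> staticResult) {
        List<String> kept = new ArrayList<>();
        for (String point : candidatePoints) {
            if (staticResult.contains(point)) {
                kept.add(point);
            }
        }
        return kept;
    }

    /** Percentage of instrumentation points removed. */
    static double reductionPercent(int before, int after) {
        return before == 0 ? 0.0 : 100.0 * (before - after) / before;
    }

    public static void main(String[] args) {
        List<String> candidates = Arrays.asList("A.m:1", "A.m:7", "B.n:3", "B.n:9", "C.p:2");
        Set<String> staticResult = new HashSet<>(Arrays.asList("B.n:3"));
        List<String> kept = pointsToInstrument(candidates, staticResult);
        System.out.println(kept + " " + reductionPercent(candidates.size(), kept.size()));
    }
}
```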

Experiment Topics
- Domain of Java Web applications
  - Serialization errors
  - SQL injection
- Domain of Eclipse IDE plugins
  - API violations
  - Memory leaks

Experiment Summary
    Name            Classes   Inst Pts   Bugs
    webgoat         1,…       …          …
    personalblog    5,…       …          …
    road2hibernate  7,…       …          …
    snipsnap        10,…      …          …
    roller          16,359    0          1
    Eclipse         19,439    18,…       …
    TOTAL           59,968    19,579     206

Session Serialization Errors
- Very common bug in Web applications
- Server tries to persist non-persistent objects
  - Only manifests under heavy load
  - Hard to find with testing
- One-line query in PQL
  - HttpSession.setAttribute(_,!Serializable);
- Solvable purely statically
  - Dynamic confirmation possible
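In plain Java, the condition behind the one-line query is an `instanceof` test against `java.io.Serializable`. The sketch below shows only that test; `SessionStoreCheck` is a hypothetical helper rather than part of PQL or the servlet API, and treating null as acceptable is an assumption of this sketch.

```java
import java.io.Serializable;
import java.util.ArrayList;

/** Assumed runtime form of the check: flag values that a session could not persist. */
final class SessionStoreCheck {
    /** True when the value does not implement Serializable. */
    static boolean violates(Object value) {
        return value != null && !(value instanceof Serializable);
    }

    public static void main(String[] args) {
        System.out.println(violates("a string"));              // String implements Serializable
        System.out.println(violates(new ArrayList<String>())); // ArrayList implements Serializable
        System.out.println(violates(new Object()));            // a plain Object does not
    }
}
```

The test is shallow: an object whose class implements Serializable can still fail to serialize if one of its non-transient fields refers to something that does not.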

SQL Injection
- Our running example
- Static analysis optimizes greatly
  - 92%-99.8% reduction of instrumentation points
  - 2-3x speedup
- 4 injections, 2 exploitable
  - Blocked both exploits
- Further applications and an improved static analysis in Usenix Security '05

Full derived() Query
    query StringProp(Object * x)
    returns Object y;
    uses Object z;
    matches {
        y.append(x, ...)
      | x.getChars(_, _, y, _)
      | y.insert(_, x)
      | y.replace(_, _, x)
      | y = x.substring(...)
      | y = new java.lang.String(x)
      | y = new java.lang.StringBuffer(x)
      | y = x.toString()
      | y = x.getBytes(...)
      | y = _.copyValueOf(x)
      | y = x.concat(_)
      | y = _.concat(x)
      | y = new java.util.StringTokenizer(x)
      | y = x.nextToken()
      | y = x.next()
      | y = new java.lang.Number(x)
      | y = x.trim()
      | { z = x.split(...); y = z[]; }
      | y = x.toLowerCase(...)
      | y = x.toUpperCase(...)
      | y = _.replaceAll(_, x)
      | y = _.replaceFirst(_, x);
    }
    query derived(Object x)
    returns Object y;
    uses Object temp;
    matches {
        y := x
      | { temp = StringProp(x); y := derived(temp); }
    }

Eclipse
- IDE for Java
- Very large (tens of MB of bytecode)
  - Too large for our static analysis
- Purely interactive
  - Unoptimized dynamic overhead acceptable

Queries on Eclipse
- Paired methods
  - register/deregister
  - createWidget/destroyWidget
  - install/uninstall
  - startup/shutdown
- Lapsed listeners

Eclipse Results
- All paired-method queries were run simultaneously
  - 56 mismatches detected
- Lapsed listener query was run alone
  - 136 lapsed listeners
  - Can be automatically fixed
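The slides do not spell out how a paired-method mismatch is defined. One plausible reading, used here only as an illustration, is "an object received more opening calls than closing calls by the end of the trace". `PairEvent` and `PairedCallChecker` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical trace event: a call to one of the paired methods on the object with this id. */
final class PairEvent {
    final String method;
    final int target;

    PairEvent(String method, int target) {
        this.method = method;
        this.target = target;
    }
}

/** Assumed mismatch rule: more opening calls than closing calls on the same object. */
final class PairedCallChecker {
    /** Returns, in ascending id order, the objects left with an unmatched opening call. */
    static List<Integer> unmatched(List<PairEvent> trace, String open, String close) {
        Map<Integer, Integer> balance = new TreeMap<>();
        for (PairEvent e : trace) {
            if (e.method.equals(open)) {
                balance.merge(e.target, 1, Integer::sum);
            } else if (e.method.equals(close)) {
                balance.merge(e.target, -1, Integer::sum);
            }
        }
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<Integer, Integer> entry : balance.entrySet()) {
            if (entry.getValue() > 0) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<PairEvent> trace = Arrays.asList(
            new PairEvent("register", 1),
            new PairEvent("register", 2),
            new PairEvent("deregister", 1));
        System.out.println(unmatched(trace, "register", "deregister"));
    }
}
```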

Current Status
- Open source and hosted on SourceForge
  - Standalone dynamic implementation

Related Work
- PQL is a query language
  - JQuery
- on program traces
  - Partiqle, Dalek, ...
- observing behavior and finding bugs
  - Metal, Daikon, PREfix, Clouseau, ...
- and automatically adding code to fix them
  - AspectJ

Conclusions
- PQL: a Program Query Language
  - Match histories of sets of objects on a program trace
  - Targeting application developers
- Found many bugs
  - 206 application bugs and security flaws
  - 6 large real-life applications
- PQL provides a bridge to powerful analyses
  - Dynamic matcher
    - Point-and-shoot even for unknown applications
    - Automatically repairs program on the fly
  - Static matcher
    - Proves absence of bugs
    - Can reduce runtime overhead to production-acceptable levels