Presentation is loading. Please wait.

Presentation is loading. Please wait.

Program Analysis Using Datalog

Similar presentations


Presentation on theme: "Program Analysis Using Datalog"— Presentation transcript:

0 Why Use Datalog to Analyze Programs?
Monica Lam Stanford University Team: John Whaley, Ben Livshits, Michael Martin, Dzintars Avots, Michael Carbin, Chris Unkel

1 Program Analysis Using Datalog
Jeffrey Ullman, Principles of Database and Knowledge-Base Systems, 1989

2 Reps, 1994 bddbddb, 2005 Problem interproc. data-flow reaching def/slicing software security programmer specified pointer alias analysis Demand Driven Exhaustive Implementation ease magic set xform BDD tuning Coral Custom, BDD based Slower Faster solved open problem < 1000 lines of code 800,000 byte codes

3 Confidential information leak
Web Applications Evil Input Confidential information leak Hacker Browser Web App Database

4 Web Application Vulnerabilities
50% databases had a security breach [2002 Computer crime & security survey] 48% of all vulnerabilities Q3-Q4, 2004 Up from 39% Q1-Q2, 04 [Symantec May, 2005]

5 Top Ten Security Flaws in Web Applications [OWASP]
Unvalidated Input Broken Access Control Broken Authentication and Session Management Cross Site Scripting (XSS) Flaws Buffer Overflows Injection Flaws Improper Error Handling Insecure Storage Denial of Service Insecure Configuration Management

6 Vulnerability Alerts SecurityFocus.com, on May 16, 2005

7 Source of vulnerabilities
: JGS-Portal Multiple Cross-Site Scripting and SQL Injection Vulnerabilities : WoltLab Burning Board Verify_ Function SQL Injection Vulnerability : Version Cue Local Privilege Escalation Vulnerability : NPDS THOLD Parameter SQL Injection Vulnerability : DotNetNuke User Registration Information HTML Injection Vulnerability : Pserv completedPath Remote Buffer Overflow Vulnerability : DotNetNuke User-Agent String Application Logs HTML Injection Vulnerability : DotNetNuke Failed Logon Username Application Logs HTML Injection Vulnerability : Mozilla Suite And Firefox DOM Property Overrides Code Execution Vulnerability : Sigma ISP Manager Sigmaweb.DLL SQL Injection Vulnerability : Mozilla Suite And Firefox Multiple Script Manager Security Bypass Vulnerabilities : PServ Remote Source Code Disclosure Vulnerability : PServ Symbolic Link Information Disclosure Vulnerability : Pserv Directory Traversal Vulnerability : MetaCart E-Shop ProductsByCategory.ASP Cross-Site Scripting Vulnerability : WebAPP Apage.CGI Remote Command Execution Vulnerability : OpenBB Multiple Input Validation Vulnerabilities : PostNuke Blocks Module Directory Traversal Vulnerability : MetaCart E-Shop V-8 IntProdID Parameter Remote SQL Injection Vulnerability : MetaCart2 StrSubCatalogID Parameter Remote SQL Injection Vulnerability : Shop-Script ProductID SQL Injection Vulnerability : Shop-Script CategoryID SQL Injection Vulnerability : SWSoft Confixx Change User SQL Injection Vulnerability : PGN2WEB Buffer Overflow Vulnerability : Apache HTDigest Realm Command Line Argument Buffer Overflow Vulnerability : Squid Proxy Unspecified DNS Spoofing Vulnerability : Linux Kernel ELF Core Dump Local Buffer Overflow Vulnerability : Gaim Jabber File Request Remote Denial Of Service Vulnerability : Gaim IRC Protocol Plug-in Markup Language Injection Vulnerability : Gaim Gaim_Markup_Strip_HTML Remote Denial Of Service Vulnerability : GDK-Pixbuf BMP Image Processing Double Free Remote Denial of Service Vulnerability : Mozilla Firefox Install Method Remote Arbitrary Code Execution Vulnerability : Multiple Vendor FTP Client Side File Overwriting Vulnerability : PostgreSQL TSearch2 Design Error Vulnerability : PostgreSQL Character Set Conversion Privilege Escalation Vulnerability Source of vulnerabilities Input validation: 62% SQL injection: 26%

8 Give me Bob’s credit card #
SQL Injection Errors Hacker Browser Web App Database Give me Bob’s credit card # Delete all records

9 Happy-go-lucky SQL Query
User supplies: name, password Java program: String query = “SELECT UserID, Creditcard FROM CCRec WHERE Name = ” + name + “ AND PW = ” + password

10 Fun with SQL “ — ”: “the rest are comments” in Oracle SQL
SELECT UserID, CreditCard FROM CCRec WHERE: Name = bob AND PW = foo Name = bob— AND PW = x Name = bob or 1=1— AND PW = x Name = bob; DROP CCRec— AND PW = x

11 A Simple SQL Injection Pattern
o = req.getParameter ( ); stmt.executeQuery ( o );

12 In Practice ParameterParser.java:586
String session.ParameterParser.getRawParameter(String name) public String getRawParameter(String name) throws ParameterNotFoundException { String[] values = request.getParameterValues(name); if (values == null) { throw new ParameterNotFoundException(name + " not found"); } else if (values[0].length() == 0) { throw new ParameterNotFoundException(name + " was empty"); } return (values[0]); ParameterParser.java:570 String session.ParameterParser.getRawParameter(String name, String def) public String getRawParameter(String name, String def) { try { return getRawParameter(name); } catch (Exception e) { return def; }

13 In Practice (II) ChallengeScreen.java:194
Element lessons.ChallengeScreen.doStage2(WebSession s) String user = s.getParser().getRawParameter( USER, "" ); StringBuffer tmp = new StringBuffer(); tmp.append("SELECT cc_type, cc_number from user_data WHERE userid = '“); tmp.append(user); tmp.append("'“); query = tmp.toString(); Vector v = new Vector(); try { ResultSet results = statement3.executeQuery( query ); ...

14 PQL: Program Query Language
o = req.getParameter ( ); stmt.executeQuery ( o ); Query on the dynamic behavior based on object entities Generates a static checker and a dynamic checker

15 SQL Injection in PQL query SQLInjection()
returns object Object source, taint; uses object HttpServletRequest req, java.sql.Statement stmt; matches { source = req.getParameter (); tainted := derivedString(source); stmt.execute(tainted); } query derivedString(object Object x) returns object Object y; uses object Object temp; y := x | { temp.append(x); y := derivedString(temp); }

16 Vulnerabilities in Web Applications
Inject Parameters Hidden fields Headers Cookie poisoning Exploit SQL injection Cross-site scripting HTTP splitting Path traversal X

17 Dynamic vs. Static Pattern
o = req.getParameter ( ); stmt.executeQuery (o); Dynamically: p1 = req.getParameter ( ); stmt.executeQuery (p2); Statically: p1 and p2 point to same object? Pointer alias analysis

18 Top 4 Techniques in PQL Implementation
Drawn from 4 different fields Compiler Context-sensitive pointer analysis HW Verification BDD: binary decision diagrams Logic Programming Datalog AI Active machine learning

19 Context-Sensitive Pointer Analysis
L1: a=malloc(); a=id(a); id(x) id(x) {return x;} L2: b=malloc( ); b=id(b); context-sensitive a L1 x context-insensitive b L2 x

20 # of Contexts is exponential!

21 Recursion A A G B C D E F B C D E F G

22 Top 20 Sourceforge Java Apps
1016 1012 108 104 100

23 Costs of Context Sensitivity
Typical large program has ~1014 paths If you need 1 byte to represent a context: 256 terabytes of storage > 12 times size of Library of Congress 1GB DIMMs: $98.6 million Power: 96.4 kilowatts (128 homes) 300 GB hard disks: 939 x $250 = $234,750 Time to read sequentially: 70.8 days

24 Cloning-Based Algorithm
Whaley&Lam, PLDI 2004 (best paper award) Create a “clone” for every context Apply context-insensitive algorithm to cloned call graph Lots of redundancy in result Exploit redundancy by clever use of BDDs (binary decision diagrams)

25 Performance of BDD Algorithm
Direct implementation Does not finish even for small programs > 3000 lines of code Requires tuning for about 1 year Easy to make mistakes Mistakes found months later

26 Automatic Analysis Generation
PQL Ptr analysis in 10 lines Datalog bddbddb (BDD-based deductive database) with Active Machine Learning Thousand-lines 1 year tuning BDD code

27 Automatic Analysis Generation
PQL Datalog bddbddb (BDD-based deductive database) with Active Machine Learning BDD code

28 Flow-Insensitive Pointer Analysis
o1: p = new Object(); o2: q = new Object(); p.f = q; r = p.f; Input Tuples vPointsTo(p,o1) vPointsTo(q,o2) Store(p,f,q) Load(p,f,r) New Tuples hPointsTo(o1,f,o2) vPointsTo(r,o2) p o1 f q o2 r

29 Inference Rule in Datalog
Stores: hPointsTo(h1, f, h2) :- Store(v1, f, v2), vPointsTo(v1, h1), vPointsTo(v2, h2). v1.f = v2; v1 h1 f v2 h2

30 Inference Rules vPointsTo(v, h) :- vPointsTo0(v, h). vPointsTo(v1, h1)
Creation site :- vPointsTo0(v, h). vPointsTo(v1, h1) Assignment :- Assign(v1, v2), vPointsTo(v2, h1). hPointsTo(h1, f, h2) :- Store(v1, f, v2), vPointsTo(v1, h1), vPointsTo(v2, h2). Store vPointsTo(v2, h2) Load :- Load(v1, f, v2), vPointsTo(v1, h1), hPointsTo(h1, f, h2).

31 Pointer Alias Analysis
Specified by a few Datalog rules Creation sites Assignments Stores Loads Apply rules until they converge

32 SQL Injection Query SQLInjection: o = req.getParameter ( );
PQL: stmt.executeQuery ( o ); SQLInjection (o) :- calls(c1,b1,_, “getParameter”), ret(b1,v1),vPointsTo(c1, v1,o), Datalog: calls(c2,b2,_, “executeQuery”), actual(b2,1,v2),vPointsTo(c2,v2,o)

33 Program Analyses in Datalog
Context-sensitive Java pointer analysis C pointer analysis Escape analysis Type analysis External lock analysis Interprocedural def-use Interprocedural mod-ref Object-sensitive analysis Cartesian product algorithm

34 Automatic Analysis Generation
PQL Datalog bddbddb (BDD-based deductive database) with Active Machine Learning BDD code

35 Example: Call Graph Relation
“Call graph” expressed as a relation. Five edges: calls(A,B) calls(A,C) calls(A,D) calls(B,D) calls(C,D) A B C D

36 Call Graph Relation Relation expressed as a binary function. 00 01 10
1 Relation expressed as a binary function. A=00, B=01, C=10, D=11 A 00 01 B C 10 D 11

37 Binary Decision Diagrams
Graphical encoding of a truth table. x1 0 edge 1 edge x2 x2 x3 x3 x3 x3 x4 x4 x4 x4 x4 x4 x4 x4 1 1 1 1 1

38 Binary Decision Diagrams
Collapse redundant nodes. x1 x2 x2 x3 x3 x3 x3 x4 x4 x4 x4 x4 x4 x4 x4 1 1 1 1 1

39 Binary Decision Diagrams
Collapse redundant nodes. x1 x2 x2 x3 x3 x3 x3 x4 x4 x4 x4 x4 x4 x4 x4 1

40 Binary Decision Diagrams
Collapse redundant nodes. x1 x2 x2 x3 x3 x3 x3 x4 x4 x4 1

41 Binary Decision Diagrams
Collapse redundant nodes. x1 x2 x2 x3 x3 x3 x4 x4 x4 1

42 Binary Decision Diagrams
Eliminate unnecessary nodes. x1 x2 x2 x3 x3 x3 x4 x4 x4 1

43 Binary Decision Diagrams
Eliminate unnecessary nodes. x1 x2 x2 x3 x3 x4 1

44 Datalog  BDDs Datalog BDDs Relations Boolean functions
Relation ops: ⋈, , select, project Boolean function ops:  ,  , − , ~ Relation at a time Function at a time Semi-naïve evaluation Incrementalization Fixed-point Iterate until stable

45 Binary Decision Diagrams
Represent tiny and huge relations compactly Size depends on redundancy Similar contexts have similar numberings Variable ordering in BDDs

46 BDD Variable Order is Important!
x1x2 + x3x4 x1 x3 x4 1 x2 x1 x3 x4 1 x2 x1<x2<x3<x4 x1<x3<x2<x4

47 Variable Numbering: Active Machine Learning
Must be determined dynamically Limit trials with properties of relations Each trial may take a long time Active learning: select trials based on uncertainty Several hours Comparable to exhaustive for small apps

48 Optimizations in bddbddb
Algorithmic Clever context numbering to exploit similarities Query optimizations Magic-set transformation semi-naïve evaluation Compiler optimizations Redundancy elimination, liveness analysis BDD optimizations Active machine learning BDD library extensions and turning

49 Top 4 Techniques in PQL Compiler Context-sensitive pointer analysis
HW Verification BDD: binary decision diagrams Logic Programming Datalog AI Active machine learning

50 Benchmark 9 large, widely used applications
Blogging/bulletin board applications Used at a variety of sites Open-source Java J2EE apps Available from SourceForge.net

51 Vulnerabilities Found
SQL injection HTTP splitting Cross-site scripting Path traversal Total Header 6 4 10 Parameter 5 2 13 Cookie 1 Non-Web 3 9 11 29

52 Accuracy Benchmark Classes Context insensitive Context sensitive False
jboard 264 blueblog 306 1 webgoat 349 51 6 blojsom 428 48 2 personalblog 611 460 snipsnap 653 732 27 12 road2hibernate 867 18 pebble 889 427 roller 989 378 Total 5356 2115 41

53 Easy Context-Sensitive Analysis
PQL Datalog bddbddb (BDD-based deductive database) with Active Machine Learning Context-sensitive Analysis BDD code

54 To try out our software …with a click of a button
Bddbddb + JavaBDD + Eclipse + JDK Encapsulated as a LivePC Visit Download the LivePC player Click to subscribe “Java Program Analysis Toolset”

55 References Pointers: Whaley, Lam, PLDI 04
C pointers: Avots, Dalton, Livshits, Lam, ICSE 05 PQL: Martin, Livshits, Lam, OOPSLA 05 Java Security: Livshits, Lam, Usenix security 05


Download ppt "Program Analysis Using Datalog"

Similar presentations


Ads by Google