Presentation is loading. Please wait.

Presentation is loading. Please wait.

Why Use Datalog to Analyze Programs? Team: John Whaley, Ben Livshits, Michael Martin, Dzintars Avots, Michael Carbin, Chris Unkel.

Similar presentations


Presentation on theme: "Why Use Datalog to Analyze Programs? Team: John Whaley, Ben Livshits, Michael Martin, Dzintars Avots, Michael Carbin, Chris Unkel."— Presentation transcript:

1 Why Use Datalog to Analyze Programs? Team: John Whaley, Ben Livshits, Michael Martin, Dzintars Avots, Michael Carbin, Chris Unkel

2 Program Analysis Using Datalog Jeffrey Ullman, Principles of Database and Knowledge-Base Systems, 1989

3 Reps, 1994bddbddb, 2005 Problem interproc. data-flow reaching def/slicing software security programmer specified pointer alias analysis CoralCustom, BDD based Demand DrivenExhaustive Implementation ease magic set xform BDD tuning < 1000 lines of code800,000 byte codes Faster solved open problem Slower

4 Web Applications DatabaseWeb AppBrowser Evil Input Confidential information leak Hacker

5 Web Application Vulnerabilities 50% databases had a security breach [2002 Computer crime & security survey] 48% of all vulnerabilities Q3-Q4, 2004 Up from 39% Q1-Q2, 04 [Symantec May, 2005]

6 Top Ten Security Flaws in Web Applications [OWASP] 1.Unvalidated Input 2.Broken Access Control 3.Broken Authentication and Session Management 4.Cross Site Scripting (XSS) Flaws 5.Buffer Overflows 6.Injection Flaws 7.Improper Error Handling 8.Insecure Storage 9.Denial of Service 10.Insecure Configuration Management

7 Vulnerability Alerts SecurityFocus.com, on May 16, 2005

8 : JGS-Portal Multiple Cross-Site Scripting and SQL Injection Vulnerabilities : WoltLab Burning Board Verify_ Function SQL Injection Vulnerability : Version Cue Local Privilege Escalation Vulnerability : NPDS THOLD Parameter SQL Injection Vulnerability : DotNetNuke User Registration Information HTML Injection Vulnerability : Pserv completedPath Remote Buffer Overflow Vulnerability : DotNetNuke User-Agent String Application Logs HTML Injection Vulnerability : DotNetNuke Failed Logon Username Application Logs HTML Injection Vulnerability : Mozilla Suite And Firefox DOM Property Overrides Code Execution Vulnerability : Sigma ISP Manager Sigmaweb.DLL SQL Injection Vulnerability : Mozilla Suite And Firefox Multiple Script Manager Security Bypass Vulnerabilities : PServ Remote Source Code Disclosure Vulnerability : PServ Symbolic Link Information Disclosure Vulnerability : Pserv Directory Traversal Vulnerability : MetaCart E-Shop ProductsByCategory.ASP Cross-Site Scripting Vulnerability : WebAPP Apage.CGI Remote Command Execution Vulnerability : OpenBB Multiple Input Validation Vulnerabilities : PostNuke Blocks Module Directory Traversal Vulnerability : MetaCart E-Shop V-8 IntProdID Parameter Remote SQL Injection Vulnerability : MetaCart2 StrSubCatalogID Parameter Remote SQL Injection Vulnerability : Shop-Script ProductID SQL Injection Vulnerability : Shop-Script CategoryID SQL Injection Vulnerability : SWSoft Confixx Change User SQL Injection Vulnerability : PGN2WEB Buffer Overflow Vulnerability : Apache HTDigest Realm Command Line Argument Buffer Overflow Vulnerability : Squid Proxy Unspecified DNS Spoofing Vulnerability : Linux Kernel ELF Core Dump Local Buffer Overflow Vulnerability : Gaim Jabber File Request Remote Denial Of Service Vulnerability : Gaim IRC Protocol Plug-in Markup Language Injection Vulnerability : Gaim Gaim_Markup_Strip_HTML Remote Denial Of Service Vulnerability : GDK-Pixbuf BMP Image Processing Double Free Remote Denial of Service Vulnerability : Mozilla Firefox Install Method Remote Arbitrary Code Execution Vulnerability : Multiple Vendor FTP Client Side File Overwriting Vulnerability : PostgreSQL TSearch2 Design Error Vulnerability : PostgreSQL Character Set Conversion Privilege Escalation Vulnerability Source of vulnerabilities Input validation: 62% SQL injection: 26%

9 SQL Injection Errors DatabaseWeb AppBrowser Give me Bob s credit card # Delete all records Hacker

10 Happy-go-lucky SQL Query User supplies: name, password Java program: String query = SELECT UserID, Creditcard FROM CCRec WHERE Name = + name + AND PW = + password

11 Fun with SQL : the rest are comments in Oracle SQL SELECT UserID, CreditCard FROM CCRec WHERE: Name = bob AND PW = foo Name = bob AND PW = x Name = bob or 1=1 AND PW = x Name = bob; DROP CCRec AND PW = x

12 A Simple SQL Injection Pattern o = req.getParameter ( ); stmt.executeQuery ( o );

13 In Practice ParameterParser.java:586 String session.ParameterParser.getRawParameter(String name) public String getRawParameter(String name) throws ParameterNotFoundException { String[] values = request.getParameterValues(name); if (values == null) { throw new ParameterNotFoundException(name + " not found"); } else if (values[0].length() == 0) { throw new ParameterNotFoundException(name + " was empty"); } return (values[0]); } ParameterParser.java:570 String session.ParameterParser.getRawParameter(String name, String def) public String getRawParameter(String name, String def) { try { return getRawParameter(name); } catch (Exception e) { return def; }

14 In Practice (II) ChallengeScreen.java:194 Element lessons.ChallengeScreen.doStage2(WebSession s) String user = s.getParser().getRawParameter( USER, "" ); StringBuffer tmp = new StringBuffer(); tmp.append("SELECT cc_type, cc_number from user_data WHERE userid = '); tmp.append(user); tmp.append("'); query = tmp.toString(); Vector v = new Vector(); try { ResultSet results = statement3.executeQuery( query );...

15 PQL: Program Query Language Query on the dynamic behavior based on object entities Generates a static checker and a dynamic checker o = req.getParameter ( ); stmt.executeQuery ( o );

16 SQL Injection in PQL query SQLInjection() returns object Object source, taint; uses object HttpServletRequest req, java.sql.Statement stmt; matches { source = req.getParameter (); tainted := derivedString(source); stmt.execute(tainted); } query derivedString(object Object x) returns object Object y; uses object Object temp; matches { y := x | { temp.append(x); y := derivedString(temp); } }

17 Vulnerabilities in Web Applications Inject Parameters Hidden fields Headers Cookie poisoning Exploit SQL injection Cross-site scripting HTTP splitting Path traversal X

18 Dynamic vs. Static Pattern p 1 and p 2 point to same object? Pointer alias analysis o = req.getParameter ( ); stmt.executeQuery (o); Dynamically: p 1 = req.getParameter ( ); stmt.executeQuery (p 2 ); Statically:

19 Logic Programming Datalog HW VerificationBDD: binary decision diagramsAIActive machine learningCompilerContext-sensitive pointer analysis Top 4 Techniques in PQL Implementation Drawn from 4 different fields

20 id(x) {return x;} id(x) Context-Sensitive Pointer Analysis L1: a=malloc(); a=id(a); L2: b=malloc( ); b=id(b); a b L1 L2 context-insensitive context-sensitivex x

21 # of Contexts is exponential!

22 Recursion A G BCD EF A G BCD EF EFEF GG

23 Top 20 Sourceforge Java Apps

24 Costs of Context Sensitivity Typical large program has ~10 14 paths If you need 1 byte to represent a context: 256 terabytes of storage > 12 times size of Library of Congress 1GB DIMMs: $98.6 million Power: 96.4 kilowatts (128 homes) 300 GB hard disks: 939 x $250 = $234,750 Time to read sequentially: 70.8 days

25 Cloning-Based Algorithm Whaley&Lam, PLDI 2004 (best paper award) Create a clone for every context Apply context-insensitive algorithm to cloned call graph Lots of redundancy in result Exploit redundancy by clever use of BDDs (binary decision diagrams)

26 Performance of BDD Algorithm Direct implementation Does not finish even for small programs > 3000 lines of code Requires tuning for about 1 year Easy to make mistakes Mistakes found months later

27 Automatic Analysis Generation BDD code Thousand-lines 1 year tuning Datalog Ptr analysis in 10 lines bddbddb (BDD-based deductive database) with Active Machine Learning PQL

28 BDD code Datalog bddbddb (BDD-based deductive database) with Active Machine Learning PQL Automatic Analysis Generation

29 Flow-Insensitive Pointer Analysis o 1 : p = new Object(); o 2 : q = new Object(); p.f = q; r = p.f; Input Tuples vPointsTo(p,o 1 ) vPointsTo(q,o 2 ) Store(p,f,q) Load(p,f,r) New Tuples hPointsTo(o 1,f,o 2 ) vPointsTo(r,o 2 ) po1o1 qo2o2 f r

30 hPointsTo(h 1, f, h 2 ):- Store(v 1, f, v 2 ), vPointsTo(v 1, h 1 ), vPointsTo(v 2, h 2 ). v1v1 h1h1 v2v2 h2h2 f Inference Rule in Datalog v 1.f = v 2 ; Stores:

31 Inference Rules vPointsTo(v 1, h 1 ) :- Assign(v 1, v 2 ), vPointsTo(v 2, h 1 ). hPointsTo(h 1, f, h 2 ) :- Store(v 1, f, v 2 ), vPointsTo(v 1, h 1 ), vPointsTo(v 2, h 2 ). vPointsTo(v 2, h 2 ) :- Load(v 1, f, v 2 ), vPointsTo(v 1, h 1 ), hPointsTo(h 1, f, h 2 ). vPointsTo(v, h) :- vPointsTo 0 (v, h). Creation site Assignment Store Load

32 Pointer Alias Analysis Specified by a few Datalog rules Creation sites Assignments Stores Loads Apply rules until they converge

33 SQL Injection Query SQLInjection: PQL: Datalog: o = req.getParameter ( ); stmt.executeQuery ( o ); SQLInjection (o) :- calls(c 1,b 1,_, getParameter), ret(b 1,v 1 ),vPointsTo(c 1, v 1,o), calls(c 2,b 2,_, executeQuery), actual(b 2,1,v 2 ),vPointsTo(c 2,v 2,o)

34 33 Program Analyses in Datalog Context-sensitive Java pointer analysis C pointer analysis Escape analysis Type analysis External lock analysis Interprocedural def-use Interprocedural mod-ref Object-sensitive analysis Cartesian product algorithm

35 34 BDD code Datalog bddbddb (BDD-based deductive database) with Active Machine Learning PQL Automatic Analysis Generation

36 Example: Call Graph Relation Call graph expressed as a relation. Five edges: calls(A,B) calls(A,C) calls(A,D) calls(B,D) calls(C,D) B D C A

37 Call Graph Relation Relation expressed as a binary function. A=00, B=01, C=10, D=11 x1x1 x2x2 x3x3 x4x4 f B D C A

38 Binary Decision Diagrams Graphical encoding of a truth table. x2x2 x4x4 x3x3 x3x3 x4x4 x4x4 x4x x2x2 x4x4 x3x3 x3x3 x4x4 x4x4 x4x x1x1 0 edge 1 edge

39 Binary Decision Diagrams Collapse redundant nodes. x2x2 x4x4 x3x3 x3x3 x4x4 x4x4 x4x x2x2 x4x4 x3x3 x3x3 x4x4 x4x4 x4x x1x

40 Binary Decision Diagrams Collapse redundant nodes. x2x2 x4x4 x3x3 x3x3 x4x4 x4x4 x4x4 x2x2 x4x4 x3x3 x3x3 x4x4 x4x4 x4x4 0 x1x1 1

41 Binary Decision Diagrams Collapse redundant nodes. x2x2 x4x4 x3x3 x3x3 x2x2 x3x3 x3x3 x4x4 x4x4 0 x1x1 1

42 Binary Decision Diagrams Collapse redundant nodes. x2x2 x4x4 x3x3 x3x3 x2x2 x3x3 x4x4 x4x4 0 x1x1 1

43 Binary Decision Diagrams Eliminate unnecessary nodes. x2x2 x4x4 x3x3 x3x3 x2x2 x3x3 x4x4 x4x4 0 x1x1 1

44 Binary Decision Diagrams Eliminate unnecessary nodes. x2x2 x3x3 x2x2 x3x3 x4x4 0 x1x1 1

45 Datalog BDDs DatalogBDDs RelationsBoolean functions Relation ops:,, select, project Boolean function ops:,,, Relation at a timeFunction at a time Semi-naïve evaluationIncrementalization Fixed-pointIterate until stable

46 Binary Decision Diagrams Represent tiny and huge relations compactly Size depends on redundancy Similar contexts have similar numberings Variable ordering in BDDs

47 BDD Variable Order is Important! x1x1 x3x3 x4x4 01 x2x2 x1x1 x3x3 x4x4 01 x2x2 x3x3 x2x2 x 1 x 2 + x 3 x 4 x 1

48 Variable Numbering: Active Machine Learning Must be determined dynamically Limit trials with properties of relations Each trial may take a long time Active learning: select trials based on uncertainty Several hours Comparable to exhaustive for small apps

49 Optimizations in bddbddb Algorithmic Clever context numbering to exploit similarities Query optimizations Magic-set transformation semi-naïve evaluation Compiler optimizations Redundancy elimination, liveness analysis BDD optimizations Active machine learning BDD library extensions and turning

50 HW VerificationBDD: binary decision diagrams Logic Programming Datalog AIActive machine learningCompilerContext-sensitive pointer analysis Top 4 Techniques in PQL

51 Benchmark 9 large, widely used applications Blogging/bulletin board applications Used at a variety of sites Open-source Java J2EE apps Available from SourceForge.net

52 Vulnerabilities Found SQL injection HTTP splitting Cross-site scripting Path traversal Total Header Parameter Cookie Non-Web Total

53 Accuracy BenchmarkClasses Context insensitive Context sensitive False jboard blueblog webgoat blojsom personalblog snipsnap road2hibernate pebble roller Total

54 Easy Context-Sensitive Analysis BDD code Datalog bddbddb (BDD-based deductive database) with Active Machine Learning PQL Context-sensitive Analysis

55 To try out our software …with a click of a button Bddbddb + JavaBDD + Eclipse + JDK Encapsulated as a LivePC Visit Download the LivePC player Click to subscribe Java Program Analysis Toolset

56 References Pointers: Whaley, Lam, PLDI 04 C pointers: Avots, Dalton, Livshits, Lam, ICSE 05 PQL: Martin, Livshits, Lam, OOPSLA 05 Java Security: Livshits, Lam, Usenix security 05


Download ppt "Why Use Datalog to Analyze Programs? Team: John Whaley, Ben Livshits, Michael Martin, Dzintars Avots, Michael Carbin, Chris Unkel."

Similar presentations


Ads by Google