Program Analysis for Security John Mitchell CS 155 Spring 2012.

Program Analysis for Security John Mitchell CS 155 Spring 2012

Software bugs are serious problems Thanks: Isil and Thomas Dillig

Several different approaches Instrument code for testing – Heap memory: Purify – Perl tainting (information flow) – Java race condition checking Black-box testing – Fuzzing and penetration testing – Black-box web application security analysis Static code analysis – FindBugs, Fortify, Coverity, MS tools, … 3

Entry 1 23 4 Software Exit Behaviors Entry 1 2 4 Exit 124124134 124134 123124134 124123134 123123134 124124134... 124134 Manual testing only examines small subset of behaviors 4

Static analysis goals Bug finding – Identify code that the programmer wishes to modify or improve Correctness – Verify the absence of certain classes of errors Note: some fundamental limitations…

Outline General discussion of tools – Fundamental limitations – Basic method based on abstract states More details on one specific method – Property checkers from Engler et al., Coverity – Sample security-related results Quick tour of another tool – JavaScript confinement verifier – Proves isolation; not “just” bug finding Slides from: S. Bugrahe, A. Chou, I&T Dillig, D. Engler, …

Program Analyzers Code ReportTypeLine 1mem leak324 2buffer oflow4,353,245 3sql injection23,212 4stack oflow86,923 5dang ptr8,491 ……… 10,502info leak10,921 Program Analyzer Spec potentially reports many warnings may emit false alarms analyze large code bases false alarm

Soundness, Completeness PropertyDefinition SoundnessIf the program contains an error, the analysis will report a warning. “Sound for reporting correctness” CompletenessIf the analysis reports an error, the program will contain an error. “Complete for reporting correctness”

CompleteIncomplete Sound Unsound Reports all errors Reports no false alarms Reports all errors May report false alarms UndecidableDecidable May not report all errors May report false alarms Decidable May not report all errors Reports no false alarms

Software... Behaviors Sound Over-approximation of Behaviors False Alarm Reported Error approximation is too coarse… …yields too many false alarms Modules

entry X  0 Is Y = 0 ? X  X + 1X  X - 1 Is Y = 0 ? Is X < 0 ?exit crash yes no yes no yesno Does this program ever crash?

entry X  0 Is Y = 0 ? X  X + 1 X  X - 1 Is Y = 0 ? Is X < 0 ? exit crash yes no yes no yesno infeasible path! … program will never crash Does this program ever crash?

entry X  0 Is Y = 0 ? X  X + 1X  X - 1 Is Y = 0 ? Is X < 0 ?exit crash yes no yes no yesno X = 0 X = 1 X = 2 X = 3 non-termination! … therefore, need to approximate Try analyzing without approximating…

X  X + 1 f d in d out d out = f(d in ) X = 0 X = 1 dataflow elements transfer function dataflow equation

X  X + 1 f1 d in1 d out1 = f 1 (d in1 ) Is Y = 0 ? f2 d out2 d out1 d in2 d out1 = d in2 d out2 = f 2 (d in2 ) X = 0 X = 1

d out1 = f 1 (d in1 ) d join = d out1 ⊔ d out2 d out2 = f 2 (d in2 ) f1f1 f2f2 f3f3 d out1 d in1 d in2 d out2 d join d in3 d out3 d join = d in3 d out3 = f 3 (d in3 ) least upper bound operator Example: union of possible values What is the space of dataflow elements,  ? What is the least upper bound operator, ⊔ ?

entry X  0 Is Y = 0 ? X  X + 1X  X - 1 Is Y = 0 ? Is X < 0 ?exit crash yes no yes no yesno X = 0 X = posX = T X = negX = 0X = T Try analyzing with “signs” approximation… terminates... … but reports false alarm … therefore, need more precision lost precision X = T

X = posX = 0X = neg X =  X  negX  pos true Y = 0 Y  0 false X = T X = posX = 0X = neg X =  signs lattice Boolean formula lattice refined signs lattice

entry X  0 Is Y = 0 ? X  X + 1X  X - 1 Is Y = 0 ? Is X < 0 ?exit crash yes no yes no yesno X = 0trueX = 0Y=0X = posY=0X = neg Y0Y0 X = posY=0 X = neg Y0Y0 X = posY=0 X = posY=0X = neg Y0Y0 X = 0 Y0Y0 Try analyzing with “path-sensitive signs” approximation… terminates... … no false alarm … soundly proved never crashes no precision loss refinement

Bugs to Detect Some examples Crash Causing Defects Null pointer dereference Use after free Double free Array indexing errors Mismatched array new/delete Potential stack overrun Potential heap overrun Return pointers to local variables Logically inconsistent code Uninitialized variables Invalid use of negative values Passing large parameters by value Underallocations of dynamic data Memory leaks File handle leaks Network resource leaks Unused values Unhandled return codes Use of invalid iterators Slide credit: Andy Chou 21

Example: Check for missing optional args Prototype for open() syscall: Typical mistake: Result: file has random permissions Check: Look for oflags == O_CREAT without mode argument int open(const char *path, int oflag, /* mode_t mode */...); fd = open(“file”, O_CREAT); 22

Example: Chroot protocol checker Goal: confine process to a “jail” on the filesystem −chroot() changes filesystem root for a process Problem −chroot() itself does not change current working directory chroot() chdir(“/”) open(“../file”,…) 23 Error if open before chdir

TOCTOU Race condition between time of check and use Not applicable to all programs check(“foo”) use(“foo”) 24

Tainting checkers 25

Example Code #include void say_hello(char * name, int size) { printf("Enter your name: "); fgets(name, size, stdin); printf("Hello %s.\n", name); } int main(int argc, char *argv[]) { if (argc != 2) { printf("Error, must provide an input buffer size.\n"); exit(-1); } int size = atoi(argv[1]); char * name = (char*)malloc(size); if (name) { say_hello(name, size); free(name); } else { printf("Failed to allocate %d bytes.\n", size); } 26

atoi main exitfreemalloc printffgets say_hello Callgraph 27

atoi main exitfreemalloc printffgets say_hello Reverse Topological Sort 1 2 3 456 7 8 Idea: analyze function before you analyze caller 28

atoi main exitfreemalloc printffgets say_hello Apply Library Models 1 2 3 456 7 8 Tool has built-in summaries of library function behavior 29

atoi main exitfreemalloc printffgets say_hello Bottom Up Analysis 1 2 3 456 7 8 Analyze function using known properties of functions it calls 30

atoi main exitfreemalloc printffgets say_hello Bottom Up Analysis 1 2 3 456 7 8 Analyze function using known properties of functions it calls 31

atoi main exitfreemalloc printffgets say_hello Bottom Up Analysis 1 2 3 456 7 8 Finish analysis by analyzing all functions in the program 32

Finding Local Bugs #define SIZE 8 void set_a_b(char * a, char * b) { char * buf[SIZE]; if (a) { b = new char[5]; } else { if (a && b) { buf[SIZE] = a; return; } else { delete [] b; } *b = ‘x’; } *a = *b; } 33

char * buf[8]; if (a) b = new char [5];if (a && b) buf[8] = a;delete [] b; *b = ‘x’; END *a = *b; a!a a && b !(a && b) Control Flow Graph Represent logical structure of code in graph form 34

char * buf[8]; if (a) b = new char [5];if (a && b) buf[8] = a;delete [] b; *b = ‘x’; END *a = *b; a!a a && b !(a && b) Path Traversal Conceptually: Analyze each path through control graph separately Actually Perform some checking computation once per node; combine paths at merge nodes Conceptually Actually 35

char * buf[8]; if (a) if (a && b) delete [] b; *b = ‘x’; END *a = *b; !a !(a && b) Apply Checking Null pointersUse after freeArray overrun See how three checkers are run for this path Defined by a state diagram, with state transitions and error states Checker Assign initial state to each program var State at program point depends on state at previous point, program actions Emit error if error state reached Run Checker 36

char * buf[8]; if (a) if (a && b) delete [] b; *b = ‘x’; END *a = *b; !a !(a && b) Apply Checking Null pointersUse after freeArray overrun “buf is 8 bytes” 37

char * buf[8]; if (a) if (a && b) delete [] b; *b = ‘x’; END *a = *b; !a !(a && b) Apply Checking Null pointersUse after freeArray overrun “buf is 8 bytes” “a is null” 38

char * buf[8]; if (a) if (a && b) delete [] b; *b = ‘x’; END *a = *b; !a !(a && b) Apply Checking Null pointersUse after freeArray overrun “buf is 8 bytes” “a is null” Already knew a was null 39

char * buf[8]; if (a) if (a && b) delete [] b; *b = ‘x’; END *a = *b; !a !(a && b) Apply Checking Null pointersUse after freeArray overrun “buf is 8 bytes” “a is null” “b is deleted” 40

char * buf[8]; if (a) if (a && b) delete [] b; *b = ‘x’; END *a = *b; !a !(a && b) Apply Checking Null pointersUse after freeArray overrun “buf is 8 bytes” “a is null” “b is deleted” “b dereferenced!” 41

char * buf[8]; if (a) if (a && b) delete [] b; *b = ‘x’; END *a = *b; !a !(a && b) Apply Checking Null pointersUse after freeArray overrun “buf is 8 bytes” “a is null” “b is deleted” “b dereferenced!” No more errors reported for b 42

False Positives What is a bug? Something the user will fix. Many sources of false positives −False paths −Idioms −Execution environment assumptions −Killpaths −Conditional compilation −“third party code” −Analysis imprecision −… 43

char * buf[8]; if (a) b = new char [5];if (a && b) buf[8] = a;delete [] b; *b = ‘x’; END *a = *b; a!a a && b !(a && b) A False Path 44

char * buf[8]; if (a) if (a && b) buf[8] = a; END !a a && b False Path Pruning Integer Range Disequality Branch 45

char * buf[8]; if (a) if (a && b) buf[8] = a; END !a a && b False Path Pruning “a in [0,0]” “a == 0 is true” Integer Range Disequality Branch 46

char * buf[8]; if (a) if (a && b) buf[8] = a; END !a a && b False Path Pruning “a in [0,0]” “a == 0 is true” “a != 0” Integer Range Disequality Branch 47

char * buf[8]; if (a) if (a && b) buf[8] = a; END !a a && b False Path Pruning “a in [0,0]” “a == 0 is true” “a != 0” Impossible Integer Range Disequality Branch 48

Environment Assumptions Should the return value of malloc() be checked? int *p = malloc(sizeof(int)); *p = 42; OS Kernel: Crash machine. File server: Pause filesystem. Spreadsheet: Lose unsaved changes. Game: Annoy user. Library: ? Medical device: malloc?! Web application: 200ms downtime IP Phone: Annoy user. 49

Statistical Analysis Assume the code is usually right int *p = malloc(sizeof(int)); *p = 42; int *p = malloc(sizeof(int)); if(p) *p = 42; int *p = malloc(sizeof(int)); *p = 42; int *p = malloc(sizeof(int)); *p = 42; int *p = malloc(sizeof(int)); if(p) *p = 42; int *p = malloc(sizeof(int)); *p = 42; int *p = malloc(sizeof(int)); if(p) *p = 42; int *p = malloc(sizeof(int)); if(p) *p = 42; 3/4 deref 1/4 deref 50

Application to Security Bugs Stanford research project −Ken Ashcraft and Dawson Engler, Using Programmer-Written Compiler Extensions to Catch Security Holes, IEEE Security and Privacy 2002 −Used modified compiler to find over 100 security holes in Linux and BSD −http://www.stanford.edu/~engler/ Benefit −Capture recommended practices, known to experts, in tool available to all 51

Sanitize integers before use Linux: 125 errors, 24 false; BSD: 12 errors, 4 false array[v] while(i < v) … v.clean Use(v) v.tainted Syscall param Network packet copyin(&v, p, len) any<= v <= any memcpy(p, q, v) copyin(p,q,v) copyout(p,q,v) ERROR Warn when unchecked integers from untrusted sources reach trusting sinks

Example security holes /* 2.4.9/drivers/isdn/act2000/capi.c:actcapi_dispatch */ isdn_ctrl cmd;... while ((skb = skb_dequeue(&card->rcvq))) { msg = skb->data;... memcpy(cmd.parm.setup.phone, msg->msg.connect_ind.addr.num, msg->msg.connect_ind.addr.len - 1); Remote exploit, no checks 53

Example security holes /* 2.4.5/drivers/char/drm/i810_dma.c */ if(copy_from_user(&d, arg, sizeof(arg))) return –EFAULT; if(d.idx > dma->buf_count) return –EINVAL; buf = dma->buflist[d.idx]; Copy_from_user(buf_priv->virtual, d.address, d.used); Missed lower-bound check: 54

User-pointer inference Problem: which are the user pointers? −Hard to determine by dataflow analysis −Easy to tell if kernel believes pointer is from user! Belief inference −“*p” implies safe kernel pointer −“copyin(p)/copyout(p)” implies dangerous user ptr −Error: pointer p has both beliefs. Implementation: 2 pass checker inter-procedural: compute all tainted pointers local pass to check that they are not dereferenced 55

Results for BSD and Linux All bugs released to implementers; most serious fixed Gain control of system 18 15 3 3 Corrupt memory 43 17 2 2 Read arbitrary memory 19 14 7 7 Denial of service 17 5 0 0 Minor 28 1 0 0 Total 125 52 12 12 Linux BSD Violation Bug Fixed Bug Fixed 56

Two Problems Sandboxing Untrusted JavaScript58 Sandbox Design: ensure that sandboxed code obtains access to ANY protected resource ONLY via the API API Confinement: verify that sandboxed code cannot use the API to obtain direct access to a security-critical resource Evolution of Standardized JavaScript ECMA-262 3 rd edition (ES3) - wa s around when I started this work in 2008 ECMA-262 5 th edition (ES5) - released in Dec 2009 – has a strict mode (ES5-strict)

Sandboxing Untrusted JavaScript59 Solution to the Sandboxing and API Confinement problems for ES5-strict Solution to the Sandboxing and API Confinement problems for ES5-strict Plan: 1.Subset SES of ES5-strict 2.Sandboxing technique for untrusted SES code 3.Confinement analysis technique for SES APIs 4.Application: Yahoo! ADSafe

The language ES5-strict ES5-strict = ES3 with the following restrictions Sandboxing Untrusted JavaScript60 Restriction (relative to ES3)Rationale No delete on variable names No prototypes for scope objects No with No this coercion Safe built-in functions No.caller,.callee on arguments object No.caller,.arguments on function objects No arguments and formal parameters aliasing Lexical scoping No ambient access to Global object Closure-based encapsulation

Scope-bounded eval Sandboxing Untrusted JavaScript61 Example: eval(“function(){return x}”, x) Explicitly list free variables of s Run-time restriction: Free(Parse(s)) ⊆ {x 1,…, x n } Allows an upper bound on side-effects of executing s eval (s, x 1,…, x n ) Implemented by de-sugaring to (eval noFree (“function(x 1,…, x n ){“+ s +“}”))(x 1,…, x n )

Next few slides Sandboxing Untrusted JavaScript62 API Confinement: Verify that sandboxed code cannot use the API to obtain direct access to a security-critical resource 1.Subset SES of ES5-strict 2.Sandboxing technique for untrusted SES code 3.Confinement analysis technique for SES APIs 4.Application: Yahoo! ADSafe

API Confinement is Complex JavaScript API Confinement63 Resources, DOM f1f1 r1r1 r4r4 r3r3 r2r2 Untrusted JS Invoke r2r2 Return r 2 Access r 2 r3r3 r4r4 Side-effect r 4 u1u1 Repeat Precision-Efficiency tradeoff API Critical, e.g., window.location

Key Properties of API Implementations Code is part of the trusted computing base Small in size, relative to the application Written in a disciplined manner Developers have an incentive in keeping the code simple Sandboxing Untrusted JavaScript64 Insights : Conservative and scalable static analysis techniques can do well Can soundly establish API Confinement Can warn developers away from using complex coding patterns

Challenges & Techniques Hurdles: Forall quantification on untrusted code Analysis of eval( s, x 1,…, x n ) in general Sandboxing Untrusted JavaScript65 Techniques: Flow-Insensitive and Context-Insensitive Points-to analysis Abstract eval(s, x 1,…, x n ) by the set of all statements that can be written using free variables {x 1,…, x n } Confine(t, critical) : For all untrusted terms s in SES light,

Verifying Confine(t, critical) Sandboxing Untrusted JavaScript66 Trusted code t eval with free vars ”test”,“api” Environment (Built-ins) + + Datalog Solver (least fixed point) Inference Rules (SES semantics) Stack( “test”, l) ∧ Critical(l) ? NOT CONFINED CONFINED true false Abstraction Our decision procedure and implementation

Express Analysis in Datalog (Whaley et. al.) Program t l 1 :var y = {}; l 2 :var x = y; l 3 :x.f = y; Sandboxing Untrusted JavaScript67 Facts(t) Stack(y, l 1 ) Assign(x, y) Store(x, “f”, y) abstract Abstract programs as Datalog facts Abstract the semantics of SES as Datalog inference rules Stack(x, l) :- Assign(x, y), Stack(y, l) Heap(l, f, m) :- Store(x, f, y), Stack(x, l), Stack(y, m) Execution of program t is abstracted by the least-fixed-point of Facts(t) under the inference rules

Complete set of Predicates Sandboxing Untrusted JavaScript68 Abstracting termsAbstracting Heaps & Stacks Assign(x, y)Throw(l, x)Heap(l, x, m)Stack(x, l) Load(x, y, f)Catch(l, x)Prototype(l, m)FuncType(l) Store(x, f, y)TP(l, x)ObjType(l)ArrayType(l) Formal(l, i, x)FormalRet(l, x)NotBuiltin(l)Critical(l) Actual(x, i, z, y, l)Instance(l, x) Global(x)Annotation(x, y) Sufficient to model implicit type conversions, reflection, exceptions

Next few slides Sandboxing Untrusted JavaScript69 1.Subset SES of ES5-strict 2.Sandboxing technique for untrusted SES code 3.Confinement analysis technique for SES APIs 4.Application: Yahoo! ADSafe Implementation: We implemented the decision procedure in the form of an automated tool ENCAP built on top of Datalog engine: bddbddb available online at: http://code.google.com/p/es-lab/Encap

Application: Yahoo! ADSafe JavaScript API Confinement70 Analysis (1 st attempt) annotated the API implementation (2000 LOC) desugared it to SES and ran ENCAP obtained NOT CONFINED  culprits: ADSAFE.go and ADSAFE.lib Sandbox: JSLint Filter API: ADSAFE object ADSafe: Mechanism for safely embedding ads This was an actual exploit Analysis (2 nd attempt) fixed this bug ran ENCAP again obtained CONFINED Result: ADSafe DOM API safely confines DOM objects under the SES threat model, assuming the annotations hold

Summary General discussion of tools −Fundamental limitations −Basic method based on abstract states More details on one specific method −Property checkers from Engler et al., Coverity −Sample security-related results Quick tour of JavaScript tool −JavaScript confinement verifier −Proves isolation; not “just” bug finding Slides from: S. Bugrahe, A. Chou, I&T Dillig, D. Engler, A. Taly, …

Program Analysis for Security John Mitchell CS 155 Spring 2012.

Similar presentations

Presentation on theme: "Program Analysis for Security John Mitchell CS 155 Spring 2012."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Program Analysis for Security John Mitchell CS 155 Spring 2012.

Similar presentations

Presentation on theme: "Program Analysis for Security John Mitchell CS 155 Spring 2012."— Presentation transcript:

Similar presentations

About project

Feedback