Characterizing and Reasoning about Security Vulnerabilities Shuo Chen Center for Reliable and High-Performance Computing Coordinated Science Laboratory.

Characterizing and Reasoning about Security Vulnerabilities
Shuo Chen
Center for Reliable and High-Performance Computing, Coordinated Science Laboratory
University of Illinois at Urbana-Champaign
Preliminary Examination, May 4th, 2004
Committee Chair: Prof. Ravishankar K. Iyer
Committee: Prof. Vikram Adve, Prof. Jose Meseguer, Prof. David Nicol

Significance of Software Implementation Errors
Bugtraq: 70% of security vulnerabilities are due to implementation errors.

What I Have Done
Analyzed CERT and Bugtraq reports and the corresponding application source code.
Developed a new FSM representation that decomposes each security vulnerability into a series of elementary activities (primitive FSMs), each indicating a simple predicate.
The FSM analysis showed:
–Many vulnerabilities (~66%) are due to pointer taintedness: a user input value is used as a pointer value (which should be transparent to users).
–A significant portion of vulnerabilities (~33.6%) are due to errors in library functions or incorrect invocations of library functions.
The FSM modeling led to a formal reasoning approach to examine pointer taintedness in applications.

Formal Analysis of Pointer Taintedness
Pointer taintedness: a pointer value, including a return address, is derived directly or indirectly from user input (formally defined using equational logic).
Provides a unifying perspective for reasoning about a significant number of security vulnerabilities.
The notion of pointer taintedness enables:
–Static analysis: reasoning about the possibility of pointer taintedness by source code analysis;
–Runtime checking: inserting assertions in object code to check pointer taintedness at runtime;
–Hardware architecture-based support to detect pointer taintedness.
Current focus: extraction of security specifications of library functions based on pointer taintedness semantics.

Publications of My Research
Papers:
–J. Xu, S. Chen, Z. Kalbarczyk, R. K. Iyer. "An Experimental Study of Security Vulnerabilities Caused by Errors". DSN 2001.
–S. Chen, J. Xu, R. K. Iyer, K. Whisnant. "Modeling and Analyzing the Security Threat of Firewall Data Corruption Caused by Instruction Transient Errors". DSN 2002.
–S. Chen, Z. Kalbarczyk, J. Xu, R. K. Iyer. "A Data-Driven Finite State Machine Model for Analyzing Security Vulnerabilities". DSN 2003.
–S. Chen, K. Pattabiraman, Z. Kalbarczyk, R. K. Iyer. "Formal Reasoning of Various Categories of Widely Exploited Security Vulnerabilities Using Pointer Taintedness Semantics". IFIP Information Security Conference, 2004.
Security Vulnerability Report:
–S. Chen and J. Xu. "Bugtraq ID 6255: NULL HTTPD Heap Corruption Vulnerability". The Bugtraq List.

A Finite State Machine Approach for Analyzing Security Vulnerabilities

Overview of the Study
An analysis of security vulnerability databases (CERT and Bugtraq).
Examination of security vulnerabilities at the application source-code level.
A security vulnerability usually consists of vulnerable steps in several elementary activities. Each activity can be represented by a primitive FSM that indicates a simple predicate.
Provides a formalism for reasoning about and describing security vulnerabilities.
Usefulness of the formalism: discovery of the HTTP daemon (NULL HTTPD) heap overflow vulnerability.

Observation from Data Analysis
Vulnerability ID and Name | Assigned Category | Description in Bugtraq Report | Elementary Activity
#3163: Sendmail Signed Integer Overflow | Input validation error | A negative input integer is accepted as an array index. | Get an input integer
#5493: FreeBSD System Call Signed Integer Vulnerability | Boundary condition error | A negative value supplied for the argument allows exceeding the boundary of an array. | Use the integer as the index to an array
#3958: RSYNC Signed Array Index Remote Code Execution Vulnerability | Access validation error | A remotely supplied signed value is used as an array index, allowing the corruption of a function pointer or a return address. | Execute code referred to by a function pointer or a return address
Vulnerabilities of the same kind can be classified into different categories. Why? Each one involves multiple elementary activities, and each report's category reflects a different one of those activities.

Primitive FSM
We use a primitive FSM (pFSM) to depict an elementary activity. A pFSM specifies a predicate (SPEC) that must be guaranteed to ensure security.
[Figure: a pFSM with a SPEC Check State, an Accept State and a Reject State; transitions labeled SPEC_ACCEPT, SPEC_REJECT, IMPL_ACCEPT and IMPL_REJECT]
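The following minimal C sketch makes the four transition labels concrete. It reflects our own reading of them. SPEC is the predicate that should hold, and IMPL is the check the code actually performs. The dangerous case is IMPL_ACCEPT, in which the implementation accepts an input that SPEC rejects. The type and function names (`pfsm`, `pfsm_step`, `spec_index`, `impl_index`) and the 0..100 index predicate are hypothetical illustrations, not part of the original model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Transition labels of a primitive FSM (pFSM), as named on the slide. */
typedef enum {
    SPEC_ACCEPT,  /* predicate holds and the code accepts         */
    SPEC_REJECT,  /* predicate fails and the code rejects         */
    IMPL_ACCEPT,  /* predicate fails but the code accepts: a hole */
    IMPL_REJECT   /* predicate holds but the code rejects         */
} pfsm_transition;

/* Hypothetical pFSM: the security predicate and the check actually coded. */
typedef struct {
    bool (*spec)(int input);
    bool (*impl)(int input);
} pfsm;

/* Classify one input by comparing SPEC with IMPL. */
static pfsm_transition pfsm_step(const pfsm *m, int input) {
    assert(m != NULL && m->spec != NULL && m->impl != NULL);
    bool s = m->spec(input);
    bool i = m->impl(input);
    if (s)
        return i ? SPEC_ACCEPT : IMPL_REJECT;
    return i ? IMPL_ACCEPT : SPEC_REJECT;
}

/* Example: an index into a table with valid slots 0..100. */
static bool spec_index(int x) { return 0 <= x && x <= 100; }
static bool impl_index(int x) { return x <= 100; }   /* forgets x < 0 */
```

With these predicates, `pfsm_step` classifies -1 as IMPL_ACCEPT. The upper-bound-only check lets a negative index through, which is the pattern behind the signed-integer vulnerabilities in the table above.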

NULL HTTPD Heap Corruption Vulnerabilities (Bugtraq #5774, #6255)
Op 1: Read user input from a socket into a heap buffer
–pFSM 1: get (contentLen, input). SPEC: contentLen >= 0; contentLen < 0 is not rejected. Then calloc PostData[1024+contentLen].
–pFSM 2: copy input from the socket. SPEC: length(input) <= Size(PostData). If Size(PostData) < length(input), B->fd and B->bk are changed and the heap structure is corrupted.
Op 2: Free the buffer
–pFSM 3: the free chunk B is linked with B->fd = A and B->bk = C. When buf is freed, B->fd->bk = B->bk is executed. If B->fd and B->bk are unchanged, the free proceeds normally. If they were changed, a function pointer (pFree) is corrupted.
Op 3: Manipulate the function pointer
–pFSM 4: pFree is loaded into memory during program initialization and is executed when function free is called. If pFree is unchanged, execution is normal. If pFree was changed, the attacker's malicious code is executed.
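The heap-corruption step of pFSM 3 can be simulated with a toy memory in which addresses are array indices. The sketch below is an illustration built on several assumptions. It is not the code of NULL HTTPD or of any real allocator.

- `unlink_chunk` performs B->fd->bk = B->bk and B->bk->fd = B->fd.
- `safe_unlink_chunk` first checks that both neighbours point back at B. This is similar in spirit to the safe-unlinking check later added to glibc.
- The chunk layout, the offsets and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MEM_SIZE 64
#define FD 0  /* hypothetical offset of the forward link in a chunk header  */
#define BK 1  /* hypothetical offset of the backward link in a chunk header */

/* Toy memory: "addresses" are indices into this array. */
static int mem[MEM_SIZE];

static int  rd(int a)        { assert(a >= 0 && a < MEM_SIZE); return mem[a]; }
static void wr(int a, int v) { assert(a >= 0 && a < MEM_SIZE); mem[a] = v; }

/* Classic unlink of chunk b: B->fd->bk = B->bk; B->bk->fd = B->fd. */
static void unlink_chunk(int b) {
    int fd = rd(b + FD), bk = rd(b + BK);
    wr(fd + BK, bk);   /* if fd/bk are attacker-controlled, this is an arbitrary write */
    wr(bk + FD, fd);
}

/* Hardened variant: unlink only if both neighbours point back at b. */
static bool safe_unlink_chunk(int b) {
    int fd = rd(b + FD), bk = rd(b + BK);
    if (rd(fd + BK) != b || rd(bk + FD) != b)
        return false;  /* list is corrupted: refuse to write */
    wr(fd + BK, bk);
    wr(bk + FD, fd);
    return true;
}
```

With an intact list, the hardened unlink succeeds. Once an overflow has overwritten B's fd and bk, the hardened unlink refuses to write. The classic unlink, by contrast, stores the attacker's value into the slot that plays the role of pFree.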

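Op 1: Read User Data from a Socket to a Heap Buffer
pFSM 1: get (contentLen, input), where contentLen is an integer and input is the string to be read from a socket. SPEC: contentLen >= 0 (the code does not reject contentLen < 0). Then calloc PostData[1024+contentLen].
pFSM 2: copy input from the socket to PostData by recv() calls. SPEC: length(input) <= Size(PostData) (the code does not reject length(input) > Size(PostData)).
Code:
    // get contentLen (negative??)
    PostData = calloc(contentLen + 1024, sizeof(char)); x = 0; rc = 0;
    pPostData = PostData;
    do {
        rc = recv(sock, pPostData, 1024, 0);
        if (rc == -1) {
            closeconnect(sid, 1);
            return;
        }
        pPostData += rc;
        x += rc;
    } while ((rc == 1024) || (x < contentLen));
The loop keeps calling recv() as long as each call returns a full 1024 bytes, regardless of the space left in PostData. An input longer than PostData therefore overflows the heap buffer, and a negative contentLen makes PostData smaller than 1024 bytes.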
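The sketch below shows one way to enforce the two predicates of Op 1: contentLen >= 0, and no read past the end of PostData. It is a hypothetical rewrite, not the actual NULL HTTPD fix. The `read_fn` callback stands in for recv() so that the logic can be exercised without a socket. The names `read_post_data`, `str_src` and `str_read` are illustrative.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for recv(): copies up to len bytes into buf and
   returns the count, 0 at end of stream, or -1 on error. */
typedef ssize_t (*read_fn)(void *ctx, char *buf, size_t len);

/* Returns a NUL-terminated heap buffer, or NULL on a negative length,
   allocation failure, or read error. */
static char *read_post_data(long content_len, read_fn rd, void *ctx) {
    assert(rd != NULL);
    if (content_len < 0)                       /* predicate of pFSM 1 */
        return NULL;
    size_t size = (size_t)content_len + 1024;
    char *post = calloc(size, 1);
    if (post == NULL)
        return NULL;
    size_t x = 0;
    while (x < (size_t)content_len) {
        /* predicate of pFSM 2: never read past the buffer, keep 1 byte for NUL */
        size_t room = size - 1 - x;
        size_t want = room < 1024 ? room : 1024;
        ssize_t rc = rd(ctx, post + x, want);
        if (rc < 0) { free(post); return NULL; }
        if (rc == 0) break;                    /* peer closed early */
        x += (size_t)rc;
    }
    post[x] = '\0';
    return post;
}

/* Test double: serves bytes from an in-memory string. */
struct str_src { const char *data; size_t pos; };

static ssize_t str_read(void *ctx, char *buf, size_t len) {
    struct str_src *s = ctx;
    size_t left = strlen(s->data) - s->pos;
    size_t n = left < len ? left : len;
    memcpy(buf, s->data + s->pos, n);
    s->pos += n;
    return (ssize_t)n;
}
```

Unlike the original loop, this version stops reading once contentLen bytes have arrived, and it caps every read at the space remaining in the buffer.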

Sendmail Debugging Function Signed Integer Overflow (Bugtraq #3163)
Operation 1: Write integer i to tTvect[x]
–pFSM 1: get text strings str_x and str_i and convert them to integers i and x. SPEC: (integer represented by str_x) ≤ 2^31. A larger value is not rejected and becomes a negative x.
–pFSM 2: SPEC: 0 ≤ x ≤ 100. The code checks only x > 100, so a negative x passes and tTvect[x]=i is executed.
Operation 2: Manipulate the function pointer
–pFSM 3: load the function pointer addr_setuid. If addr_setuid is unchanged, execution is normal. If addr_setuid is changed (the function pointer is tainted), executing the code referred to by addr_setuid executes malicious code.
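The two Sendmail predicates can be checked in isolation. This sketch assumes a 32-bit debug-level index and uses hypothetical names. `parse_index` emulates two's-complement wrap-around explicitly, rather than relying on an implementation-defined cast. As a result, a string above 2^31 - 1 yields a negative x, and the upper-bound-only check accepts it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Check as modeled in pFSM 2: only the upper bound is tested. */
static bool index_ok_buggy(int32_t x) { return x <= 100; }

/* Check that enforces the full predicate 0 <= x <= 100. */
static bool index_ok_fixed(int32_t x) { return x >= 0 && x <= 100; }

/* pFSM 1: read str_x as a 32-bit pattern and reinterpret it as signed,
   wrapping values above INT32_MAX to negatives without undefined or
   implementation-defined conversions. */
static int32_t parse_index(const char *str_x) {
    assert(str_x != NULL);
    uint32_t u = (uint32_t)strtoul(str_x, NULL, 10);
    if (u <= (uint32_t)INT32_MAX)
        return (int32_t)u;
    return (int32_t)(u - 0x80000000u) - INT32_MAX - 1;   /* u - 2^32 */
}
```

For example, "4294967295" parses to -1. The buggy check accepts -1, which would make tTvect[x] write just before the array.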

Modeled Vulnerabilities
–Signed integer overflow
–Heap corruption
–Stack overflow
–Format string vulnerabilities
–File race conditions
–Some input validation vulnerabilities

Formal Reasoning of Security Vulnerabilities by Pointer Taintedness Semantics

Vulnerabilities Caused by Pointer Taintedness
Format string vulnerability
–Taints an argument pointer of functions such as printf, fprintf, sprintf and syslog.
Stack smashing
–Taints a return address.
Heap corruption
–Taints the free-chunk doubly-linked list of the heap.
Glibc globbing vulnerabilities
–User input resides in a location that is used as a pointer by the parent function of glob().

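Example of Format String Vulnerability
In vfprintf(): if (fmt points to "%n") then **ap = (character count), where fmt is the format string pointer and ap is the argument pointer.
Vulnerable code: recv(buf); printf(buf); /* should be printf("%s", buf) */
[Figure: stack layout from high to low addresses (the stack grows downward). The user-supplied buffer holds the bytes \xdd \xcc \xbb \xaa followed by %d %d %d %n. As vfprintf processes each %d, ap advances through the stack until it reaches the first word of the buffer, 0xaabbccdd. *ap is therefore a tainted value, and %n writes the character count to address 0xaabbccdd.]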
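The mechanism can be simulated without running a real exploit. The toy model works as follows:

- Memory is an array of words.
- `ap` is an index into that array.
- `%d` consumes one word and counts its decimal digits.
- `%n` treats the next word as an address and stores the count there.

`toy_vfprintf` and the memory layout are hypothetical simplifications, not glibc's implementation. For brevity, the format string is kept separate from the simulated stack.

```c
#include <assert.h>
#include <stdio.h>

#define MEM_SIZE 32
static int mem[MEM_SIZE];   /* toy memory; "addresses" are indices */

/* Toy vfprintf: 'ap' indexes the next argument word in mem[]. Returns the
   number of characters that would have been printed. */
static int toy_vfprintf(const char *fmt, int ap) {
    int count = 0;
    while (*fmt) {
        if (fmt[0] == '%' && fmt[1] == 'd') {
            assert(ap >= 0 && ap < MEM_SIZE);
            count += snprintf(NULL, 0, "%d", mem[ap++]);  /* digits printed */
            fmt += 2;
        } else if (fmt[0] == '%' && fmt[1] == 'n') {
            assert(ap >= 0 && ap < MEM_SIZE);
            int target = mem[ap++];        /* *ap: attacker-chosen "address" */
            assert(target >= 0 && target < MEM_SIZE);
            mem[target] = count;           /* **ap = character count */
            fmt += 2;
        } else {
            count++;                       /* ordinary character */
            fmt++;
        }
    }
    return count;
}
```

As a usage example, put three junk words in mem[0..2] and the "address" 20 in mem[3], where mem[3] plays the role of the first word of the user buffer. Running "%d%d%d12345%n" then stores the count into mem[20].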

Taintedness Semantics (Memory Model)
A store represents a snapshot of the memory state at a point in program execution. For each memory location, we can evaluate two properties: content and taintedness (true/false).
Operations on memory locations:
–The fetch operation Ftch(S,A) gives the content of memory address A in store S.
–The location-taintedness operation LocT(S,A) gives the taintedness of location A in store S.
Operations on expressions:
–The evaluation operation Eval(S,E) evaluates expression E in store S.
–The expression-taintedness operation ExpT(S,E) computes the taintedness of expression E in store S.
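A store can be sketched as a pair of parallel arrays, one for content and one for the taintedness bit. This minimal C illustration assumes a small, integer-addressed memory. `store`, `ftch` and `loct` are hypothetical names that mirror Ftch and LocT.

```c
#include <assert.h>
#include <stdbool.h>

#define STORE_SIZE 256

/* Snapshot of memory: every location has a content and a taintedness bit. */
typedef struct {
    int  content[STORE_SIZE];
    bool tainted[STORE_SIZE];
} store;

/* Ftch(S, A): content of address A in store S. */
static int ftch(const store *s, int a) {
    assert(a >= 0 && a < STORE_SIZE);
    return s->content[a];
}

/* LocT(S, A): taintedness of location A in store S. */
static bool loct(const store *s, int a) {
    assert(a >= 0 && a < STORE_SIZE);
    return s->tainted[a];
}
```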

Axioms of Eval and ExpT Operations
Eval(S, I) = I   // I is an integer constant
Eval(S, ^ E1) = Ftch(S, Eval(S, E1))
Eval(S, E1 + E2) = Eval(S, E1) + Eval(S, E2)
Eval(S, E1 - E2) = Eval(S, E1) - Eval(S, E2)
…
ExpT(S, I) = false
ExpT(S, ^ E1) = LocT(S, Eval(S, E1))
ExpT(S, E1 + E2) = ExpT(S, E1) or ExpT(S, E2)
ExpT(S, E1 - E2) = ExpT(S, E1) or ExpT(S, E2)
…
E.g., is the expression (^100) - 2 tainted?
ExpT(S, (^100) - 2) = ExpT(S, ^100) or ExpT(S, 2) = LocT(S, 100) or false = LocT(S, 100)
Note: ^ is the dereference operator; ^100 gives the content of location 100.
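The Eval and ExpT axioms translate almost line by line into a recursive evaluator over expression trees. The sketch below uses hypothetical names and a 256-word integer-addressed store, and covers only the constant, ^, + and - cases. It can be used to check the worked example: ExpT(S, (^100) - 2) equals LocT(S, 100).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define STORE_SIZE 256

typedef struct {
    int  content[STORE_SIZE];
    bool tainted[STORE_SIZE];
} store;

/* Expressions: integer constant I, dereference ^E1, E1 + E2, E1 - E2. */
typedef enum { E_CONST, E_DEREF, E_ADD, E_SUB } expr_kind;

typedef struct expr {
    expr_kind kind;
    int value;                      /* used by E_CONST          */
    const struct expr *lhs, *rhs;   /* E_DEREF uses lhs only    */
} expr;

static int  ftch(const store *s, int a) { assert(a >= 0 && a < STORE_SIZE); return s->content[a]; }
static bool loct(const store *s, int a) { assert(a >= 0 && a < STORE_SIZE); return s->tainted[a]; }

/* Eval(S, E) */
static int eval(const store *s, const expr *e) {
    switch (e->kind) {
    case E_CONST: return e->value;                    /* Eval(S,I) = I                   */
    case E_DEREF: return ftch(s, eval(s, e->lhs));    /* Eval(S,^E1) = Ftch(S,Eval(S,E1)) */
    case E_ADD:   return eval(s, e->lhs) + eval(s, e->rhs);
    case E_SUB:   return eval(s, e->lhs) - eval(s, e->rhs);
    }
    return 0;
}

/* ExpT(S, E) */
static bool expt(const store *s, const expr *e) {
    switch (e->kind) {
    case E_CONST: return false;                       /* ExpT(S,I) = false                */
    case E_DEREF: return loct(s, eval(s, e->lhs));    /* ExpT(S,^E1) = LocT(S,Eval(S,E1)) */
    case E_ADD:
    case E_SUB:   return expt(s, e->lhs) || expt(s, e->rhs);
    }
    return false;
}
```

The case for ^E1 is the interesting one. It looks up the taintedness of the location that the subexpression evaluates to, rather than the taintedness of the subexpression itself.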

Semantics of Language L
Extends the semantics proposed by Goguen and Malcolm.
The following operations (arithmetic/logic) are defined:
– +, -, *, /, %, !, &&, ||, !=, ==, …
The following instructions are defined:
–mov [Exp1] <- Exp2
–branch (Condition) Label
–call FuncName(Exp1, Exp2, …)
Axioms defining mov instruction semantics:
–Specify the effects of applying a mov instruction to a store.
–Allow taintedness to propagate from Exp2 to [Exp1].
Axioms defining the semantics of recv (similarly, scanf and recvfrom):
–Specify the memory locations tainted by the recv call.

Extracting Function Specifications by Theorem Prover
The C source code of a library function is automatically translated to code in Language L.
The critical instructions are indirect writes. For each mov [^ E1] <- E2, generate theorems:
a) E1 should not be tainted.
b) The mov instruction should not taint any location outside the buffer pointed to by E1.
The generated theorems are given to the ITP theorem prover, which yields a set of sufficient conditions that imply the validity of the theorems. These conditions are the security specifications of the analyzed function.
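The theorem-generation step can be sketched as a scan over Language L instructions that emits the two obligations for every indirect write. This C fragment is only an illustration. The `mov_instr` record, which flags indirect writes and stores E1 and E2 as text, is a hypothetical representation. The slide's pipeline passes the obligations to the ITP prover rather than printing them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified mov record. For mov [^ E1] <- E2 (an indirect
   write) 'indirect' is true and e1 holds E1; for a direct write
   mov [x] <- E2 it is false and e1 holds the variable x. */
typedef struct {
    bool indirect;
    const char *e1;
    const char *e2;
} mov_instr;

/* Print the two proof obligations for every indirect write and return
   how many obligations were generated. */
static int generate_theorems(const mov_instr *code, int n) {
    assert(n == 0 || code != NULL);
    int count = 0;
    for (int k = 0; k < n; k++) {
        const mov_instr *c = &code[k];
        if (!c->indirect)
            continue;                      /* only indirect writes are critical */
        printf("(a) %s is not tainted before: mov [^ %s] <- %s\n",
               c->e1, c->e1, c->e2);
        printf("(b) mov [^ %s] <- %s taints no location outside the buffer pointed to by %s\n",
               c->e1, c->e2, c->e1);
        count += 2;
    }
    return count;
}
```

Applied to the mov instructions of the Language L translation of strcpy(), the scan finds two indirect writes and therefore emits four obligations.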

Example: strcpy()
C source:
    char * strcpy (char * dst, char * src)
    {
        char * res;
    0:  res = dst;
        while (*src != 0) {
    1:      *dst = *src;
            dst++;
            src++;
        }
    2:  *dst = 0;
        return res;
    }
Translated to Language L:
    0:  mov [res] <- ^ dst
        lbl(#while#6)
        branch (^ ^ src is 0) #ex#while#6
    1:  mov [^ dst] <- ^ ^ src
        mov [dst] <- (^ dst) + 1
        mov [src] <- (^ src) + 1
        branch true #while#6
        lbl(#ex#while#6)
    2:  mov [^ dst] <- 0
        mov [ret] <- ^ res
Generated theorems (passed to the theorem prover):
a) Suppose S1 is the store before line 1; then LocT(S1, dst) = false.
b) If S0 is the store before line 0 and S2 is the store after line 1, then ∀I: LocT(S2, I) = LocT(S0, I).
c) Suppose S3 is the store before line 2; then LocT(S3, dst) = false.

Specifications Suggested by Theorem Prover
Suppose that when function strcpy() is called, the size of the destination buffer (dst) is dstsize and the length of the user input string (src) is srclen. The theorem proving approach extracts these specifications:
–srclen < dstsize (documented in the Linux man page)
–The buffers src and dst do not overlap in such a way that the buffer dst covers the NULL terminator of the src string (documented in the Linux man page)
–The buffers dst and src do not cover the function frame of strcpy (not documented)
–Initially, dst is not tainted (not documented)
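Two of these specifications can be checked at run time by a wrapper. The sketch below is a hypothetical `checked_strcpy`, not a standard function.

- It uses the strict form srclen < dstsize, matching "src_len < dst_size" on the "Position My Work" slide, because strlen excludes the terminating NUL.
- It rejects any overlap between dst and src. This is stronger than the extracted condition.

The frame and taintedness conditions cannot be checked portably in this way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* True if [a, a+alen) and [b, b+blen) share at least one byte. Comparing
   addresses of unrelated objects as integers is a common but not strictly
   portable idiom. */
static bool overlaps(const char *a, size_t alen, const char *b, size_t blen) {
    uintptr_t pa = (uintptr_t)a, pb = (uintptr_t)b;
    return pa < pb + blen && pb < pa + alen;
}

/* strcpy guarded by two extracted specifications. Returns dst on success,
   or NULL (copying nothing) if a precondition is violated. */
static char *checked_strcpy(char *dst, size_t dstsize, const char *src) {
    assert(dst != NULL && src != NULL);
    size_t srclen = strlen(src);
    if (srclen >= dstsize)                          /* need srclen < dstsize */
        return NULL;
    if (overlaps(dst, dstsize, src, srclen + 1))    /* includes src's NUL    */
        return NULL;
    return memcpy(dst, src, srclen + 1);
}
```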

Example Scenario: the destination buffer should not cover the function frame of strcpy.
    char input[240];
    void foo( )
    {
        int offset;
        char buf[200];
        scanf("%s", input);
        offset = 200 - strlen(input);
        strcpy(buf + offset, input);
    }
[Figure: stack layout from high to low addresses, with the stack growing downward. foo's frame (return address, frame pointer, offset, buf) lies above strcpy's frame (src, dst, res). buf + offset marks where the copy starts.]
Can the extracted specifications be violated in application code?
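On typical platforms, yes. When strlen(input) > 200, the unsigned expression 200 - strlen(input) wraps around, and converting the result to int usually yields a negative offset. buf + offset then points below buf, toward strcpy's frame.

The same expression also places the terminating NUL at buf[200] even for short inputs, one byte past the end. The sketch below is a hypothetical repair, not code from the original study. It reserves a byte for the terminator and rejects inputs that would make the offset negative. `safe_offset` and `place_right_aligned` are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BUF_SIZE 200

/* Offset that right-aligns input in a BUF_SIZE buffer while leaving room
   for the NUL terminator, or -1 if input does not fit. */
static long safe_offset(const char *input) {
    size_t len = strlen(input);
    if (len >= BUF_SIZE)
        return -1;                 /* would push the copy below buf */
    return (long)(BUF_SIZE - 1 - len);
}

/* buf must have at least BUF_SIZE bytes. The copy ends at buf[BUF_SIZE - 1]. */
static bool place_right_aligned(char *buf, const char *input) {
    long off = safe_offset(input);
    if (off < 0)
        return false;
    strcpy(buf + off, input);      /* writes len + 1 bytes, all inside buf */
    assert(buf[BUF_SIZE - 1] == '\0');
    return true;
}
```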

Other Examples
A simplified version of printf()
–55 lines of C code
–Four security specifications are extracted, including one indicating the format string vulnerability.
Function free() of a heap management system
–36 lines of C code
–Seven security specifications are extracted, including several indicating heap corruption vulnerabilities.
Socket read functions of Apache HTTPD and NULL HTTPD
–The Apache function is proved to be free of pointer taintedness.
–Two (known) vulnerabilities are exposed in the theorem proving process.

Summary
FSM representation: decomposes each vulnerability into multiple simple predicates (illustrated with real vulnerability examples).
A common characteristic of many predicates: their violations result in pointer taintedness.
Defined a memory model to reason about pointer taintedness.
Developed a theorem proving approach to extract security specifications from library functions.

Future Directions
Develop a VCGen (verification condition generator) to facilitate theorem proving (in progress).
Apply the pointer taintedness analysis to a substantial number of commonly used library functions to extract their security specifications.
Compiler techniques for inserting "guarding code" to check unproved properties at runtime.
Explore the possibility of building the taintedness notion into virtual machines.
Architecture support for pointer taintedness detection: a module working with the RSE (Reliability and Security Engine).

Backup Slides

Format String Vulnerability
    int vfprintf (FILE *s, const char *format, va_list ap)
    {
        char * p;
        …
        *(int *) va_arg (ap, void *) = count;
        …
    }
    int printf (const char *format, ...)
    {
        …
        count = vfprintf (stdout, format, arg);
        …
    }
    int i, j;
    int main()
    {
        char buf[100];
        *(unsigned int *)buf = (unsigned int)&i;   /* assumes a 32-bit target */
        *(buf+4) = 0;
        strcat(buf, "%d%d%d12345%n");
        printf(buf);
    }
buf = \x78 \x99 \x04 \x08 (the address of i), followed by %d %d %d '1' '2' '3' '4' '5' %n
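The count semantics that the exploit abuses can be seen with a legitimate %n argument. This small sketch (the function name is hypothetical) formats the same "%d%d%d12345%n" string with programmer-supplied arguments. The int pointed to by the last argument receives the number of characters produced so far. Some C libraries restrict %n; for example, Microsoft's CRT disables it by default.

```c
#include <assert.h>
#include <stdio.h>

/* Benign %n: the int* is supplied by the programmer, not taken from the
   stack by accident. */
static int count_before_n(void) {
    char out[32];
    int count = -1;
    int written = snprintf(out, sizeof out, "%d%d%d12345%n", 1, 2, 3, &count);
    assert(written >= 0);
    return count;   /* "123" followed by "12345": 8 characters */
}
```

In the exploit, the three %d directives consume stack words, so %n takes its pointer from the first word of buf, the address of i. The same kind of count is then written into i.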

Elementary Activity 1 of Sendmail Vulnerability
pFSM 1, Elementary Activity 1: get user input
Get strings str_x and str_i, then convert them to integers x and i.
–SPEC: (integer represented by str_x) ≤ 2^31
–Not checked by the implementation: (integer represented by str_x) > 2^31

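Elementary Activity 2 of Sendmail Vulnerability
pFSM 2, Elementary Activity 2: assign debug level
Input: integers x and i converted from str_x and str_i.
–SPEC: 0 ≤ x ≤ 100
–Implementation check: x > 100 is rejected and x ≤ 100 is accepted, so tTvect[x]=i is executed even for a negative x.
–Result: a function pointer (psetuid) is corrupted.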

Elementary Activity 3 of Sendmail Vulnerability
pFSM 3, Elementary Activity 3: manipulation of function pointer psetuid
Starting the sendmail program loads psetuid into memory.
–If psetuid is unchanged, the code referred to by psetuid is executed normally.
–If psetuid was changed (the function pointer is corrupted), executing the code referred to by psetuid executes malicious code.

Appropriateness of Dereference
A data value x is appropriate to be dereferenced if and only if one of the following conditions is true, where Y and Z are integer constants:
–x is &foo (foo is a program variable);
–x is malloc(Y);
–there exist values a, b and c that are appropriate to dereference (recursive definition) such that x = a + b - c + Z.
Theorems to prove for an indirect write mov [^ E1] <- E2:
–E1 should be appropriate to dereference.
–If E2 is not appropriate to dereference, then [^ E1] should not be appropriate to dereference.

About Equational Logic
Equational logic is a logic defined by equations. Equations are used to rewrite symbolic terms by replacing the term on the left of an equation with the term on the right. The emphasis is on executability.
Defining the natural numbers (NAT):
Operators:
–0 : a constant of NAT
–s_ : NAT -> NAT (successor operator)
–_+_ : NAT NAT -> NAT (addition operator)
Equations:
–0 + N = N
–(s M) + N = M + (s N)
Example: (s s s 0) + (s s 0) = (s s 0) + (s s s 0) = (s 0) + (s s s s 0) = 0 + (s s s s s 0) = s s s s s 0
Intuitively, this represents "3 + 2 = 5".
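Because the equations are oriented left to right, they can be executed as rewrite rules. This C sketch simplifies by encoding each numeral only as its number of s symbols, whereas a real rewriting engine works on general terms. It prints each intermediate term of the reduction. The function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Print the numeral s s ... s 0 with k successors. */
static void print_nat(int k) {
    for (int i = 0; i < k; i++)
        printf("s ");
    printf("0");
}

/* Print an operand, parenthesized unless it is the constant 0. */
static void print_operand(int k) {
    if (k == 0) {
        printf("0");
    } else {
        printf("(");
        print_nat(k);
        printf(")");
    }
}

static void print_sum(int m, int n) {
    print_operand(m);
    printf(" + ");
    print_operand(n);
    printf("\n");
}

/* Reduce (s^m 0) + (s^n 0) with (s M) + N = M + (s N) and 0 + N = N,
   printing every intermediate term. Returns the number of s in the result. */
static int rewrite_add(int m, int n) {
    assert(m >= 0 && n >= 0);
    while (m > 0) {
        print_sum(m, n);
        m--;                 /* apply (s M) + N = M + (s N) */
        n++;
    }
    print_sum(m, n);         /* term now has the form 0 + N */
    print_nat(n);            /* apply 0 + N = N             */
    printf("\n");
    return n;
}
```

rewrite_add(3, 2) prints the same chain of terms as the example on this slide and returns 5.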

Semantics of mov and recv
Axioms of the mov instruction:
–Ftch((S ; mov [E1] <- E2), X) = Eval(S, E2) if (Eval(S, E1) is X).
–Ftch((S ; mov [E1] <- E2), X) = Ftch(S, X) if not (Eval(S, E1) is X).
–LocT((S ; mov [E1] <- E2), X) = ExpT(S, E2) if (Eval(S, E1) is X).
–LocT((S ; mov [E1] <- E2), X) = LocT(S, X) if not (Eval(S, E1) is X).
Semantics of recv (similarly, scanf and recvfrom):
–LocT((S ; call recv(sock, buf, len, flag)), A) = true if Eval(S, buf) <= A and A < Eval(S, buf + len).
–LocT((S ; call recv(sock, buf, len, flag)), A) = LocT(S, A) otherwise.
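The mov and recv axioms can be read as store transformers. The sketch below applies them in place to a toy store. It assumes that the operands have already been evaluated: addr = Eval(S, E1), val = Eval(S, E2) and val_t = ExpT(S, E2). `apply_mov`, `apply_recv` and the store layout are hypothetical names. Only the taint of received bytes is modeled, not their contents.

```c
#include <assert.h>
#include <stdbool.h>

#define STORE_SIZE 256

typedef struct {
    int  content[STORE_SIZE];
    bool tainted[STORE_SIZE];
} store;

/* S ; mov [E1] <- E2: location addr takes E2's value and taintedness;
   every other location keeps its content and taintedness. */
static void apply_mov(store *s, int addr, int val, bool val_t) {
    assert(addr >= 0 && addr < STORE_SIZE);
    s->content[addr] = val;
    s->tainted[addr] = val_t;
}

/* S ; call recv(sock, buf, len, flag): each A with buf <= A < buf + len
   becomes tainted; all other locations keep their taintedness. */
static void apply_recv(store *s, int buf, int len) {
    for (int a = buf; a < buf + len; a++) {
        assert(a >= 0 && a < STORE_SIZE);
        s->tainted[a] = true;
    }
}
```

As a usage example, after apply_recv(&s, 10, 4), copying location 10 to location 50 with apply_mov propagates the taint to location 50. Overwriting location 10 with an untainted constant then clears the taint at 10 only.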

Related Work
Security modeling
–Sheyner and Wing: attack graphs
–Ortalo and Deswarte: Markov models
Static code analysis
–Buffer overflow detection: Wagner, many others
–Format string detection: CQUAL, SPLINT
–Assembly code verification: Proof-Carrying Code
–Generic (annotation based): SPLINT, Eau Claire
Taintedness analysis
–Perl runtime
–CQUAL and SPLINT: taintedness of program variables.
»A symbol gets tainted only if an explicit C statement passes a tainted value to it by assignment, argument passing or function return. There is no underlying memory model.
»Not sufficient to detect real pointer taintedness vulnerabilities.

Position My Work
[Figure: security specs of library functions applied to application code. Example specs: src_len < dst_size (strcpy); src and dst do not overlap (strcpy); do not free a stack buffer; do not double free a buffer; the first argument of printf cannot come from the user; … Existing static analysis tools check application code against such specs. My work extracts the specs from the library functions.]

Presentation Outline
A Brief Description of the FSM Approach to Modeling and Analyzing Security Vulnerabilities
Real Examples of Pointer Taintedness
Definition of Pointer Taintedness in Equational Logic
Extraction of Function Specifications by Theorem Proving
Summary and Future Directions

Extraction of Security Specs of Library Functions Using Pointer Taintedness
A formal approach to reason about potential vulnerabilities in library source code.
Reasoning is based on a hypothetical memory model in which a boolean property, taintedness, is associated with each memory location.
The semantics of pointer taintedness are defined in equational logic.
A theorem prover is employed to extract security specifications of library functions.
The security specifications extracted by the analysis:
–expose different classes of known security vulnerabilities, such as format string, heap corruption and buffer overflow vulnerabilities;
–indicate function invocation scenarios that may expose new vulnerabilities.

Observations from Data Analysis (cont.)
Exploiting a vulnerability involves multiple vulnerable operations on several objects.
Exploits must pass through multiple elementary activities, each providing an opportunity to perform a security check.
For each elementary activity, the vulnerability data and the corresponding code inspections allow us to define a predicate which, if violated, results in a security vulnerability.
