Buffer Overflow Proofing of Code Binaries By Ramya Reguramalingam Graduate Student, Computer Science Advisor: Dr. Gopal Gupta
Contents
- What is a buffer overflow?
- Buffer overflow: Security concern
- BinarySecure: An overview
- BinarySecure: Implementation
  - Metadata Phase
  - Mapping Phase
  - Modification Phase
- Advantages
- Disadvantages
- Results
Memory Organization
A running program's memory is divided into four segments:
- Stack: for function calls
- Heap: for dynamic allocation
- Code: for program code
- Data: for static and global variables
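As a rough illustration of the four segments, the sketch below prints one address from each. The helper name `print_segment_addresses` and the variable names are hypothetical, and the actual addresses and their relative order depend on the platform and compiler.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static int global_counter = 42;   /* Data segment: initialized global */

/* Hypothetical helper: prints one address from each segment.
   Returns 0 on success, -1 if the heap allocation fails. */
int print_segment_addresses(void)
{
    int local = 0;                           /* Stack: automatic variable */
    int *dynamic = malloc(sizeof *dynamic);  /* Heap: dynamic allocation */
    if (dynamic == NULL)
        return -1;
    *dynamic = local;

    /* Code: the address of the function itself */
    printf("code : 0x%" PRIxPTR "\n", (uintptr_t)print_segment_addresses);
    printf("data : 0x%" PRIxPTR "\n", (uintptr_t)(void *)&global_counter);
    printf("heap : 0x%" PRIxPTR "\n", (uintptr_t)(void *)dynamic);
    printf("stack: 0x%" PRIxPTR "\n", (uintptr_t)(void *)&local);

    free(dynamic);
    return 0;
}
```

Calling `print_segment_addresses()` from `main` prints four addresses that typically fall into four clearly separated ranges.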
Program Execution Stack
Sample Code
    void function(char *a, char *b, char *c) {
        char buffer1[8];
    }
    int main(void) {
        function("foo", "bar", "ren");
    }
[Figure: stack at the start. The address space (labelled 00 to ff ff) holds the Stack, Heap, Data and Code segments; ESP marks the top of the stack.]
Stack Organization: Before a Call
Sample Code
    void function(char *a, char *b, char *c) {
        char buffer1[8];
    }
    int main(void) {
        function("foo", "bar", "ren");
    }
[Figure: stack before a call. The parameters are on the stack in this order: Param 3 = "ren", Param 2 = "bar", Param 1 = "foo". ESP points at the top of the stack; Heap, Data & Code lie below.]
Stack Organization: After a Call
Sample Code
    void function(char *a, char *b, char *c) {
        char buffer1[8];
    }
    int main(void) {
        function("foo", "bar", "ren");
    }
[Figure: stack after a function call. From the top: Param 3 = "ren", Param 2 = "bar", Param 1 = "foo", return address, saved ebp, local variables. EBP points at the saved ebp, ESP at the end of the local variables; Heap, Data & Code lie below.]
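Buffer Overflow
Sample Code
    void function(char *str) {
        char buffer1[8];
        strcpy(buffer1, str);   /* no bounds check: writes past buffer1 */
    }
    int main(void) {
        char large_str[256];
        for (int i = 0; i < 255; i++)
            large_str[i] = 'A';
        large_str[255] = '\0';  /* terminate the string so strcpy stops */
        function(large_str);
    Label:
        return 0;
    }
[Figure: stack showing buffer overflow. From the top: large_str (size = 64), return address (originally a pointer to Label:), ebp, buffer1 (size = 2). The sizes appear to be counted in 4-byte words. strcpy writes past buffer1, so the new return address is garbage: 41 41 ...]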
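The overflow can be reproduced safely by simulating the frame in a byte array instead of smashing the real stack. This is only a sketch: the layout (8-byte buffer, 4-byte saved ebp, 4-byte return address) follows the slide, while the names `unchecked_copy` and `return_address_clobbered` are hypothetical, and the copy is capped at the frame size to keep the demonstration well defined.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simulated frame, lowest address first: buffer1, saved ebp, return address. */
enum { BUF_SIZE = 8, WORD = 4, FRAME_SIZE = BUF_SIZE + 2 * WORD };
enum { OFF_EBP = BUF_SIZE, OFF_RET = BUF_SIZE + WORD };

/* Copies like strcpy: until the terminator, ignoring the size of buffer1.
   The copy stops at FRAME_SIZE only so that the simulation stays in bounds.
   Returns the number of bytes written. */
size_t unchecked_copy(unsigned char frame[FRAME_SIZE], const char *src)
{
    size_t i = 0;
    while (src[i] != '\0' && i < FRAME_SIZE) {
        frame[i] = (unsigned char)src[i];
        i++;
    }
    return i;
}

/* Returns 1 if the return-address slot no longer holds `expected`. */
int return_address_clobbered(const unsigned char frame[FRAME_SIZE],
                             const unsigned char expected[WORD])
{
    return memcmp(frame + OFF_RET, expected, WORD) != 0;
}
```

Copying a string shorter than 8 bytes leaves the return-address slot intact. Copying a long run of 'A' characters fills the slot with 0x41 bytes, which is the "garbage 41 41" shown on the slide.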
Abusing the Buffer Overflow
- Step 1: Overwrite the return address with an address that points ‘back’ into the buffer area
- Step 2: Insert the code you wish to execute in the buffer area
- Step 3: Pad the start of the inserted code with NOP instructions
- Step 4: Eliminate any null values in the inserted code
[Figure: stack used to abuse a buffer overflow. From the top: return address, ebp, then the buffer holding NOP, mov eax,ebx, add eax,1.]
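Step 4 exists because strcpy stops copying at the first zero byte, so anything after that byte never reaches the stack. The sketch below finds the first zero byte in a byte sequence; the helper name `first_null_byte` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the index of the first zero byte in data[0..len),
   or len if the sequence contains none. */
size_t first_null_byte(const unsigned char *data, size_t len)
{
    const unsigned char *hit = memchr(data, 0, len);
    return hit ? (size_t)(hit - data) : len;
}
```

For example, the x86 encoding of `mov eax, 1` is `B8 01 00 00 00`, which contains zero bytes. A copy through strcpy would therefore cut the sequence short in the middle of that instruction.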
Buffer Overflow: Security Concern
- Some examples include Windows Server 2003, sendmail, and the Windows HTML conversion library
[Figure: percentage of buffer overflows per year as listed in CERT advisories [1]]
Buffer Overflow Solutions
- RAD: stores the return address in the RAR area. It is a gcc compiler patch, so all code has to be recompiled.
- Stackguard: inserts a ‘canary’ word to protect the return address. The ‘canary’ word can be compromised.
- Splint: allows the user to write annotations in the code that define allocated and used sizes. The user is required to write the annotations.
- Richard Wagner’s Prevention Method: a static analysis solution. It depends on code syntax and hence is not complete.
BinarySecure: An Overview
- A buffer overflow attack works by overwriting the return address.
- If return addresses are recorded in a separate area, away from the buffer overflow, they cannot be overwritten.
- So the memory organization is modified to add a new return address stack, allocated in an area opposite to the direction in which an overflowing buffer writes.
- When a function call returns, it uses the return address from this new stack.
BinarySecure: Return Address
- The return address is saved as part of the program execution stack.
- The new return address stack is allocated at the bottom of the program stack.
- This stack stays uncompromised because overflowing memory writes proceed in the opposite direction.
[Figure label: Overflow Direction]
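The idea of a separate return address stack can be sketched in C as a small data structure. This models the concept only, not the instructions BinarySecure inserts into the binary; the type `ShadowStack`, the functions `shadow_push` and `shadow_restore`, and the fixed depth of 64 are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { SHADOW_DEPTH = 64 };   /* assumed maximum call depth */

typedef struct {
    uint32_t slots[SHADOW_DEPTH];
    size_t   top;             /* number of saved return addresses */
} ShadowStack;

/* At function entry: save a copy of the return address.
   Returns 0 on success, -1 if the shadow stack is full. */
int shadow_push(ShadowStack *s, uint32_t ret_addr)
{
    if (s->top == SHADOW_DEPTH)
        return -1;
    s->slots[s->top++] = ret_addr;
    return 0;
}

/* At function exit: overwrite the frame's return slot, which may have been
   corrupted, with the saved copy. Returns 0 on success, -1 if empty. */
int shadow_restore(ShadowStack *s, uint32_t *frame_ret_slot)
{
    if (s->top == 0)
        return -1;
    *frame_ret_slot = s->slots[--s->top];
    return 0;
}
```

If an overflow changes the frame's return slot between the push and the restore, the restore puts the original address back, so the function returns to its real caller.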
BinarySecure: The Methodology
- Input: Portable Executable (PE) file
- Output: Modified PE file
- The PE file is analyzed to determine all function calls in the file.
- Code is added to the start and end of each function.
- The modification copies the return address to a new location and, on return, retrieves it from that location.
BinarySecure
BinarySecure: Specifications
These are some of the conditions that must hold:
- Code must be re-entrant
- Code should not modify the stack pointer
- Processor: Intel 80386 (x86)
- Compiler: Dev C++ compiler
- Platform: Windows
The Portable Executable Format
[Figure labels: Validation, DOS, Error Code, NT Signature, Number of Sections, e_lfanew]
BinarySecure: Metadata Phase
- The metadata phase is so called because it collects the information required for analysis.
- This includes entry points, size of code, virtual address and relative address.
- The PE Explorer software is then used to obtain the disassembled form of the PE file.
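A minimal sketch of collecting this metadata from an in-memory PE image follows. The field offsets come from the published PE/COFF layout (e_lfanew at 0x3C, the "PE\0\0" signature, NumberOfSections, SizeOfCode, AddressOfEntryPoint). The struct `PeInfo` and the function `pe_read_info` are hypothetical names, not part of BinarySecure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t e_lfanew;            /* file offset of the NT headers */
    uint16_t number_of_sections;
    uint32_t size_of_code;
    uint32_t entry_point_rva;     /* AddressOfEntryPoint */
} PeInfo;

/* Little-endian readers, independent of host byte order. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 on a bad DOS header, -2 on a bad NT header. */
int pe_read_info(const uint8_t *img, size_t len, PeInfo *out)
{
    if (len < 0x40 || img[0] != 'M' || img[1] != 'Z')
        return -1;

    uint32_t off = rd32(img + 0x3C);               /* e_lfanew */
    /* Need the signature (4), the COFF header (20) and the first
       20 bytes of the optional header. */
    if (off > len || len - off < 4 + 20 + 20)
        return -2;
    if (memcmp(img + off, "PE\0\0", 4) != 0)
        return -2;

    const uint8_t *opt = img + off + 24;           /* optional header */
    out->e_lfanew = off;
    out->number_of_sections = rd16(img + off + 6);
    out->size_of_code = rd32(opt + 4);
    out->entry_point_rva = rd32(opt + 16);
    return 0;
}
```

The two error codes correspond to the two validation steps: the DOS signature first, then the NT signature reached through e_lfanew.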
BinarySecure: Mapping Phase
- Analysis starts from the entry point to determine all the calls made by the code.
- Each call determines a function location.
- For each unique function location, the end address is determined starting from the start address.
- If instructions are added, all calls and jumps need to be changed accordingly.
Mapping Phase: Passes
- First pass: all function calls made. Result: StartAddr table.
- Second pass: end of each called function. Result: EndAddr table.
- Third pass: ‘call’ and ‘jump’ opcodes. Result: the opcodes whose operands must be modified, and the amount by which each value increases.
- The operands to modify can be relative or absolute addresses.
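The first two passes might look like the sketch below. It assumes the code has already been disassembled into an array of instructions sorted by address, since decoding x86 instruction lengths is out of scope here. It also assumes a function ends at the first `ret` at or after its start. The types `Insn` and `OpKind` and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { OP_OTHER, OP_CALL, OP_JMP, OP_RET } OpKind;

typedef struct {
    uint32_t addr;    /* address of the instruction */
    uint32_t length;  /* encoded length in bytes */
    OpKind   kind;
    int32_t  rel;     /* relative displacement for OP_CALL / OP_JMP */
} Insn;

/* x86 relative branch target: address of the next instruction
   plus the displacement. */
uint32_t branch_target(const Insn *in)
{
    return in->addr + in->length + (uint32_t)in->rel;
}

/* First pass: collect the unique call targets into `table`.
   Returns the number of entries stored (at most cap). */
size_t collect_start_addrs(const Insn *code, size_t n,
                           uint32_t *table, size_t cap)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (code[i].kind != OP_CALL)
            continue;
        uint32_t target = branch_target(&code[i]);
        size_t j = 0;
        while (j < count && table[j] != target)
            j++;
        if (j == count && count < cap)
            table[count++] = target;
    }
    return count;
}

/* Second pass: address of the first RET at or after `start`,
   or 0 if none is found. */
uint32_t find_end_addr(const Insn *code, size_t n, uint32_t start)
{
    for (size_t i = 0; i < n; i++)
        if (code[i].addr >= start && code[i].kind == OP_RET)
            return code[i].addr;
    return 0;
}
```

The third pass would walk the same array again and record every `OP_CALL` and `OP_JMP` whose source or target moves once code is inserted.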
Mapping Phase: Relative Addresses
Case: calling location > called location
    40: Called location
    60: Calling location, offset -20
[Figure label: Function End / Start Address]
Mapping Phase: Relative Addresses
Case: calling location < called location
    40: Calling location, offset +20
    60: Called location
[Figure label: Function End / Start Address]
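The arithmetic behind adjusting a relative offset can be sketched as follows. Offsets are measured from the calling location, as on the slides. The `Insertion` record and the functions `relocate` and `adjust_offset` are hypothetical. An insertion placed exactly at a call target is deliberately not counted, so the call lands on the code inserted at the function start.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t at;    /* original address where bytes are inserted */
    uint32_t size;  /* number of bytes inserted */
} Insertion;

/* New address of an original address: shifted by every insertion made at a
   strictly lower address. */
uint32_t relocate(uint32_t addr, const Insertion *ins, size_t n)
{
    uint32_t shift = 0;
    for (size_t i = 0; i < n; i++)
        if (ins[i].at < addr)
            shift += ins[i].size;
    return addr + shift;
}

/* New relative offset for a call at `caller` whose old offset was `old_off`. */
int32_t adjust_offset(uint32_t caller, int32_t old_off,
                      const Insertion *ins, size_t n)
{
    uint32_t callee = caller + (uint32_t)old_off;
    return (int32_t)(relocate(callee, ins, n) - relocate(caller, ins, n));
}
```

Assuming 16 bytes are inserted at address 50, between the two locations, a backward call from 60 to 40 changes from -20 to -36 and a forward call from 40 to 60 changes from +20 to +36. If the 16 bytes are instead inserted at 60, the start of the called function, the forward offset stays +20.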
BinarySecure: Modification Phase
- Instructions that copy the return address to the new ‘BinarySecure’ stack are written at the start of every function.
- Instructions that retrieve the return address are added at the end of every function.
- The changes to all affected opcodes are made.
- The changes to the header are committed.
- Output: Modified PE file
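Changing a call's operand in the binary amounts to rewriting the four displacement bytes that follow the opcode. The sketch below handles only the near relative call (opcode 0xE8, followed by a 32-bit little-endian displacement). The function name `patch_call_rel32` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rewrites the 32-bit displacement of the near call at code[pos].
   Returns 0 on success, -1 if pos is out of bounds or does not
   hold an 0xE8 call with a full 4-byte operand. */
int patch_call_rel32(uint8_t *code, size_t len, size_t pos, int32_t new_rel)
{
    if (pos >= len || len - pos < 5 || code[pos] != 0xE8)
        return -1;

    uint32_t v = (uint32_t)new_rel;
    for (int i = 0; i < 4; i++)
        code[pos + 1 + i] = (uint8_t)(v >> (8 * i));   /* little-endian */
    return 0;
}
```

For instance, patching `E8 EC FF FF FF` (displacement -20) to -36 yields `E8 DC FF FF FF`.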
Advantages
- Binary code is analysed, so the method can be used on third-party software where one does not have access to the source code.
- Run-time checks require modification of the source code (Splint).
- Compiler modifications are costly, and changing all available compilers is not possible (RAD, Stackguard).
- Return addresses are stored on the stack itself. Hence the overhead incurred by accessing addresses in other areas is reduced.
Disadvantages
- The stack has to store a list of return addresses, so a storage overhead equal to the depth of the flow graph is incurred.
- The code is machine dependent. However, it covers machines from 80x86 upwards, and a large number of machines fall in this category.
Results
- Correctness: code with a buffer overflow (buffTest). The code has an overflow problem; the modified exe fixes that problem.
- Performance: BubbleSort, Calendar, Math. The modified PE provides the same result as the original.
Results - Demo
References
1. Smashing the stack for fun and profit. ow.html
2. Intel manuals, Programmer’s reference manual. entation.htm
Thank You! Questions?
PE File Binary Format
PE File Disassembled
Flow Graph Of Code
Mapping Phase: Relative Addresses
Case: calling location > called location
    40: Function End / Start Address
    56: Called location
    76: Calling location, offset -20
The calling location should call 40. Hence, the offset should be -36.
Mapping Phase: Relative Addresses
Case: calling location > called location
    40: Called location
    76: Calling location, offset -20
The calling location should call 40. Hence, the offset should be -36.
[Figure label: Function End / Start Address]
Mapping Phase: Relative Addresses
Case: calling location < called location
    56: Calling location, offset +20
    76: Called location
[Figure label: Function End / Start Address]
Mapping Phase: Relative Addresses
Case: calling location < called location
    40: Calling location, offset +20
    76: Called location
The calling location should call 76. Hence, the offset should be +36.
[Figure label: Function End / Start Address]
Mapping Phase: Relative Addresses
Case: calling location < called location
    40: Calling location, offset +20
    60: Function End / Start Address
    76: Called location
We don’t change this offset because we want it to call the modified function at 60.