Analyzing and Transforming Binary Code (for Fun & Profit) Gopal Gupta R. Venkitaraman, R. Reghuramalingam The University of Texas at Dallas 11/15/2004.

The Components Marketplace
COTS component-based software engineering (now called web services) has been touted as a pathway to improved productivity. However, many obstacles must be surmounted:
- discovering that the needed component exists;
- checking that the component is compliant;
- checking that the component is secure.

Software Reuse & System Integration
But the integrated system does not work.
[Figure: Cost of Project; Companies]

Our Work
- Design of a Universal Service-Semantics Description Language (USDL) [ECOWS’05].
- Construction of automatic service discovery and service composition engines.
- Once a service/component has been downloaded, ensuring that it is compliant and safe.

Analyzing & Transforming Binaries
Most of the time, when a component or service is obtained, only its binary code is available (the source code is proprietary), so compliance and safety checks have to be done on the binary code. Our thesis: this is quite feasible. We illustrate a compliance check with an example from the DSP industry, and code securing by transforming a binary to protect it from buffer overflow attacks.

Analyzing DSP Codes: Motivation
- Facilitate software reuse in the DSP industry.
- DSP hardware manufacturers are interested in developing DSP software as COTS components so that time to market is small.
- DSP components are generally available only in binary form (no source code).
- DSP software uses low-level optimizations for efficiency; we need to ensure that these optimizations do not interfere with reusability.

Our Framework
- We develop necessary and sufficient conditions that ensure that a software binary is reusable.
- We relate these conditions to TI’s XDAIS standard.
- We show how static analysis can be used to check whether these conditions hold.
- We illustrate this through an analysis for detecting hard-coded pointers.

Conditions to Ensure Reusability
C1: The binary code should not change during execution in a way that invalidates link-time symbol resolution.
C2: The binary code should not be written in a way that requires it to be located starting from some fixed location in virtual memory.

Broadening the Conditions
C1 and C2 are hard to characterize and even harder to detect, so we broaden them to obtain conditions C3 and C4.

Framework to Ensure Reusability
C3: The binary code is re-entrant: it contains no self-modifying code and nothing that invalidates link-time symbol resolution.
C4: The binary code contains no hard-wired memory addresses: binaries should not assume that they are located at a fixed virtual memory location.

TI XDAIS Standard
The standard contains 35 rules and 15 guidelines, including six general programming rules. No tool currently exists to check for compliance; we want to build a tool to enforce software compliance with these rules.

XDAIS: General Programming Rules
1) All programs should follow the run-time conventions of TI’s C programming language.
2) Programs must be re-entrant.
3) No hard-coded data memory locations.
4) No hard-coded program memory locations.
5) Algorithms must characterize their ROM-ability.
6) No peripheral device may be accessed directly.

Advantages of Compliant Code
- System integrators can easily migrate between TI DSP chips.
- Subsystems from multiple software vendors can be integrated into a single system.
- Programs are framework-agnostic: the same program can be used efficiently in virtually any application.

XDAIS vs. Our Framework
Rule 1 is not really a programming rule, since it requires compliance with TI's definition of the C language. Rules 2 through 6 are manifestations of conditions C3 and C4 above: rules 2 and 5 correspond to condition C3, and rules 3, 4, and 6 correspond to condition C4.

Problem and Solution
Problem: detecting hard-coded addresses in programs without access to the source code.
Solution: static program analysis of assembly code.

Some Examples Showing Hard Coding

Example 1: directly hard coded

    int main(void) {
        int *p = (int *)0x8800;
        /* some code */
        *p = …;
    }

Example 2: indirectly hard coded

    int main(void) {
        int *p = (int *)0x80;
        int *q = p;
        /* some code */
        *q = …;
    }

Example 3: conditional hard coding

    #include <stdlib.h>

    int main(void) {
        int *p, val;
        p = …;
        val = …;
        if (val)
            p = (int *)0x900;
        else
            p = malloc(…);
        *p;   /* dereference */
    }

Note: we do not care if a pointer is hard coded but never dereferenced.

Static Analysis
Undecidability: it is impossible to build a tool that precisely detects hard coding in every program.
Static analysis: any analysis of a program carried out without completely executing the program.

Interest in Static Analysis
“We actually went out and bought for 30 million dollars, a company that was in the business of building static analysis tools and now we want to focus on applying these tools to large-scale software systems”
Remarks by Bill Gates, 17th Annual ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), November 2002.

Hard-Coded Addresses
Hard-coded addresses are bad programming practice: they result in non-relocatable, and therefore non-reusable, code.

Overview of Our Approach
Input: object code of the software. Output: "compliant" or "not compliant" status.
Activity diagram for our static analyzer: disassemble object code → split into functions → obtain basic blocks → obtain flow graph → static analysis → output the result.

Basic Aim of the Analysis
Find a path that traces each dereferenced pointer back to its origin. Problem: exploring paths has exponential complexity. The static analysis approximation makes it linear.

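Analyzing Source Code: Easy
In the source code of Example 2, the dereference *q starts the set of unsafe sets as {{q}}; the assignment q = p turns it into {{p}}, and p is initialized with a constant. So p is hard coded, and the program is not compliant with the standard.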

Analyzing Assembly Code Is Hard
Problems: no type information is available; there is instruction-level pipelining and parallelism.
Solution: backward analysis using abstract interpretation.

Analyzing Assembly: Hard

    A0  main:
    A0  07BD09C2  SUB.D2    SP,0x8,SP
    A4  020FA02A  MVK.S2    0x1f40,B4
    A8  023C22F6  STW.D2T2  B4,*+SP[0x1]
    AC            NOP
    B0  023C42F6  STW.D2T2  B4,*+SP[0x2]
    B4            NOP
    B8  0280A042  MVK.D2    5,B5
    BC  F6        STW.D2T2  B5,*+B4[0x0]
    C0            NOP
    C4  008C8362  BNOP.S2   B3,
    C8  07BD0942  ADD.D2    SP,0x8,SP
    CC            NOP
    D0            NOP

Set of unsafe sets: {{B4}} at the dereference *+B4[0x0], and {{ }} once B4 has been traced back to MVK.S2 0x1f40,B4. Since B4 = 0x1f40, B4 is hard coded, and the code is not compliant.

Abstract Interpretation Based Analysis
The domains from which variables draw their values, called the concrete domains, are approximated by abstract domains.

Lattice Abstraction
A lattice-based abstraction is used to determine whether a pointer is hard coded.

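Contexts
Concrete contexts are mapped to abstract contexts, and abstract contexts are mapped back to concrete contexts; these mappings are the α and γ functions used in the soundness proof.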

Phases in the Analysis
Phase 1: find the set of dereferenced pointers.
Phase 2: check the safety of the dereferenced pointers.

Building Unsafe Sets (Phase 1)
The first element is added to an unsafe set when a pointer is dereferenced: e.g., if “*Reg” appears in the disassembled code, the unsafe set is initialized to {Reg}. N dereferenced pointers yield N unsafe sets, maintained as a SOUS (set of unsafe sets).

Populating Unsafe Sets (Phase 2)
For example, if Reg = reg1 + reg2, the element “Reg” is deleted from the unsafe set and the elements “reg1” and “reg2” are inserted, so the unsafe set {Reg} becomes {reg1, reg2}.

Pointer Arithmetic
All pointer operations are abstracted during the analysis.

Handling Loops
Loops are complex: the number of iterations may not be known until run time. The analysis cycles through the loop until the unsafe set reaches a fixed point, i.e., until no new information is added to the unsafe set in successive iterations.

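Merging Information
Without merging, the analysis has exponential complexity (one unsafe set per path). Merging is mandatory when there are loops, but it causes some information loss.
[Figure: flow graph with Blocks A to E, in which Block B (then) and Block C (else) of an If (Cond) are merged]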

Extensive Compliance Checking
All cases that occur in programs are handled:
- single, double, triple, … pointers;
- global pointer variables;
- static and dynamic arrays.

Extensive Compliance Checking (continued)
- loops of all forms (e.g., for, while, …);
- function calls;
- pipelining and parallelism;
- merging information from multiple paths.

Proof: The Analysis Is Sound
Consistency of the abstraction function α and the concretization function γ is established by showing the existence of a Galois connection, that is, for every abstract value x and concrete value y: x = α(γ(x)) and y ∈ γ(α(y)).

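Analysis Results
The results table reports, for each program, the number of lines, the number of dereferenced pointers, the number of hard-coded pointers, the chain length, and the running time (ms). Programs analyzed: t_read, timer, mcbsp, figtest, m_hdrv, dat, gui_codec, codec, stress, demo.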

Sample Code

Fig. Flow Graph

Related Work
- UNO project (Bell Labs): analyzes at the source level.
- TI XDAIS standard: contains 35 rules and 15 guidelines, including six general programming rules; no tool currently exists to check for compliance.

Current Status and Future Work
A prototype implementation is done, but it is context-insensitive and intra-procedural. Future work: extend it to be context-sensitive and inter-procedural, and extend the compliance check to other rules.

So…
- Software reuse is an important issue in the industry, particularly the DSP industry.
- Checking compatibility of code with reusability standards at the assembly level is possible.
- A static analysis based technique is useful and practical.

Buffer Overflow Attack-Proofing
Sample code:

    void function(char *a, char *b, char *c) {
        char buffer1[8];
    }

    int main(void) {
        function("foo", "bar", "ren");
    }

[Figure: stack at the start, showing ESP and the Stack, Heap, Data, and Code regions, with addresses running from 00 to ff ff]

Stack Organization: Before a Call
For the sample code above, main calls function("foo", "bar", "ren"), and the parameters are pushed in reverse order before the call.
[Figure: stack before a call: Param 3 = "ren", Param 2 = "bar", Param 1 = "foo", then ESP; Heap, Data & Code shown separately]

Stack Organization: After a Call
After main calls function("foo", "bar", "ren"), the stack holds, from higher to lower addresses: Param 3 = "ren", Param 2 = "bar", Param 1 = "foo", the return address, the saved ebp, and the local variables of function (buffer1). EBP points to the saved ebp, and ESP points to the top of the local variables.

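Buffer Overflow
Sample code:

    #include <string.h>

    void function(char *str) {
        char buffer1[8];
        strcpy(buffer1, str);   /* no bounds check: copies 256 bytes into 8 */
    }

    int main(void) {
        char large_str[256];
        for (int i = 0; i < 255; i++)
            large_str[i] = 'A';
        large_str[255] = '\0';  /* terminate the string for strcpy */
        function(large_str);
    Label:
        return 0;
    }

[Figure: stack showing the buffer overflow. From higher to lower addresses: large_str (size = 64 words), the return address, the saved ebp, and buffer1 (size = 2 words). strcpy writes 'A' bytes (0x41) upward from buffer1 over the saved ebp and the return address, so the return address, which should point to Label:, becomes a garbage pointer (0x41414141).]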

Abusing the Buffer Overflow
Step 1: Overwrite the return address with an address that points back into the buffer area.
Step 2: Insert the code you wish to execute into the buffer area.
Step 3: Pad the start of the inserted code with NOP instructions.
Step 4: Eliminate any null bytes from the inserted code (strcpy stops copying at the first null byte).
[Figure: stack used to abuse the buffer overflow: return address, ebp, NOPs, then the inserted code (mov eax,ebx; add eax, 1)]

Buffer Overflow: Security Concern
The figure plots the percentage of CERT advisories each year that concern buffer overflows. Examples include Windows Server 2003, sendmail, and the Windows HTML conversion library.
[Figure: Percentage of Buffer Overflows per Year as listed by CERT [1]]

Buffer Overflow Solutions
- RAD: stores the return address in the RAR (Return Address Repository) area. It is a gcc compiler patch, so all code has to be recompiled.
- StackGuard: inserts a "canary" word to protect the return address; the canary word can be compromised.
- Splint: lets the user write annotations in the code that define allocated and used sizes; the user is required to write these annotations.
- Wagner's prevention method: a static analysis solution; depends on source code availability.

BinarySecure: An Overview
A buffer overflow attack works by overwriting the return address. If return addresses are recorded in a separate area, away from the buffer overflow, they cannot be overwritten. So we modify the memory organization to add a new auxiliary return-address stack, allocated in an area opposite to the direction of buffer writes/overflows; when a function call returns, it uses the return address from this new stack. The binary is then transformed to make it consistent with this new memory organization.

BinarySecure: Return Address
The return address is saved as part of the program execution stack. The auxiliary stack is allocated at the bottom of the program stack; it cannot be compromised by an overflow, because memory writes from buffers proceed in the opposite direction.
[Figure: memory layout with the overflow direction marked]

BinarySecure

BinarySecure: Specifications
The following conditions must hold:
- the code must be re-entrant;
- the code should not modify the stack pointer.
Processor: Intel x86. Compiler: Dev-C++. Platform: Windows.

Advantages
- The binary code is analysed, so the tool can be used on third-party software whose source code is not available.
- Other approaches require modification of the source code (Splint annotations).
- Compiler modifications (RAD, StackGuard) are costly, and changing every available compiler is not possible.
- Return addresses are stored within the stack itself, which reduces the overhead of accessing addresses in other memory areas.

Software Reuse & System Integration
Wow, it works! Select only compliant and safe software.

More Information
1. R. Venkitaraman and G. Gupta. Static Program Analysis of Embedded Executable Assembly Code. ACM CASES, September.
2. R. Venkitaraman and G. Gupta. Framework for Safe Reuse of Software Binaries. ICDCIT, December.
3. R. Venkitaraman. Framework for Safe Reuse of Software Binaries. Master's thesis, The University of Texas at Dallas, December.
4. R. Reguramalingam. BinarySecure: A Tool for Buffer Overflow Attack Proofing of Software Binaries. Master's thesis, December.
5. S. Kona, A. Bansal, L. Simon, A. Mallya, G. Gupta, T. Hite. Universal Service-Semantics Description Language. ECOWS’05.

Questions…