String Analysis for Binaries
Mihai Christodorescu, Nicholas Kidd, Wen-Han Goh
University of Wisconsin, Madison

6 September 2005, Nicholas Kidd - String Analysis for Binaries - PASTE
What is String Analysis?
Recovery of the values a string variable might take at a given program point.
    #include <stdio.h>
    int main( void ) {
        char * msg = "no msg";
        printf( "This food has %s.\n", msg );
        return 0;
    }
Output: This food has no msg.

Why Do We Need String Analysis?
We could just use the strings program:
    $ strings no_msg
    /lib/ld-linux.so.2
    libc.so.6
    ...
    no msg
    This food has %s.
    $

Why Perform String Analysis?
– Computer forensics: given an unknown program, we want to know the files it might access, the registry keys it might get and set, and the commands it might execute.
– Program verification: SQL queries, embedded scripting, …

A Complicated Example
    #include <stdlib.h>
    #include <string.h>
    int main( void ) {
        char buf[257];
        strcpy( buf, "/" );
        strcat( buf, "b" );
        strcat( buf, "i" );
        strcat( buf, "n" );
        ...
        system( buf );
    }
Running strings: / a b c d ...
Running a string analysis: /bin/ifconfig -a |
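
The contrast on this slide can be sketched in a few lines of Python. This is an illustration only, not the x86sa implementation: `literal_scan` imitates what a `strings`-style scan reports, and `evaluate` replays a hypothetical straight-line trace of `strcpy`/`strcat` calls. The `data` bytes and the `ops` list are made-up stand-ins for the first four literals of the example, and `min_len=1` is an assumption chosen to match the single-character output on the slide (GNU `strings` defaults to a minimum length of 4).

```python
import re

def literal_scan(blob, min_len=1):
    """Report runs of printable ASCII, roughly what a strings-style scan does."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, blob)]

def evaluate(ops):
    """Replay strcpy/strcat on a single buffer and return its final contents."""
    buf = ""
    for name, arg in ops:
        if name == "strcpy":
            buf = arg            # overwrite the buffer
        elif name == "strcat":
            buf += arg           # append to the buffer
        else:
            raise ValueError(f"unmodeled operation: {name}")
    return buf

# Hypothetical data section: four one-character, NUL-terminated literals.
data = b"/\x00b\x00i\x00n\x00"
# Hypothetical call trace matching the first four calls on the slide.
ops = [("strcpy", "/"), ("strcat", "b"), ("strcat", "i"), ("strcat", "n")]

print(literal_scan(data))  # the fragments, one per literal
print(evaluate(ops))       # the value that actually reaches system()
```

The scan sees only disconnected fragments, while following the data flow recovers the assembled prefix `/bin`.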

Our Contributions
– Developed a string analysis for binaries.
– Implemented x86sa, a string analyzer for Intel IA-32 binaries.
– Evaluated on both benign and malicious binaries.

Outline
– String analysis for Java.
– String analysis for x86.
– Evaluation.
– Applications & future work.

String Analysis for Java
1. Create string flowgraph.
    void main( void ) {
        String x = "/";
        x = x + "b";
        x = x + "i";
        x = x + "n";
        ...
        System.exec( x );
    }
[Flowgraph: "/", "b" → concat; "i" → concat; ...]
Christensen, Møller, Schwartzbach, "Precise Analysis of String Expressions" (SAS'03)

String Analysis for Java [2]
2. Create context-free grammar.
3. Approximate with finite automaton.
[Flowgraph: "/", "b" → concat; "i" → concat; ...]
    A1 → "/" "b"
    A2 → A1 "i"
    A3 → A2 "n"
    A4 → A3 "/"
    …
Result: "/bin/ifconfig..."
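
As a rough illustration of step 2, the following Python sketch turns a chain of concatenations into productions of the shape shown on this slide and then expands them. The helper names `chain_to_grammar` and `expand` are hypothetical, and the sketch assumes a non-recursive grammar with one production per nonterminal; it does not model step 3, the approximation by a finite automaton that is needed once loops make the grammar recursive.

```python
def chain_to_grammar(first, rest):
    """Build A1 -> first rest[0], then A(k) -> A(k-1) rest[k-1] for each later literal."""
    productions = {}
    prev = None
    for k, literal in enumerate(rest, start=1):
        name = f"A{k}"
        left = ("lit", first) if prev is None else ("nt", prev)
        productions[name] = [left, ("lit", literal)]
        prev = name
    return productions, prev

def expand(productions, symbol):
    """Expand a symbol to the single string it derives (non-recursive grammars only)."""
    kind, value = symbol
    if kind == "lit":
        return value
    return "".join(expand(productions, s) for s in productions[value])

# The first four productions from the slide: "/" "b", then "i", "n", "/".
grammar, start = chain_to_grammar("/", ["b", "i", "n", "/"])
for name, rhs in grammar.items():
    print(name, "->", rhs)
print(expand(grammar, ("nt", start)))
```

Expanding the last nonterminal yields the prefix `/bin/`, which is how the concatenation chain ends up as a single string in the result.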

From Java to x86 executables
Java .class → Flowgraph → CFG (context-free grammar) → FSA

From Java to x86 executables
Rest of this talk: bridge the syntactic and semantic gaps between Java and assembly language.
x86 executable → Flowgraph → CFG (context-free grammar) → FSA

Four Problems with Assembly
1. No types.
2. No high-level constructs.
3. No argument passing convention.
4. No Java string semantics.

Problem 1: No Types
Solution: infer types from C library functions.
Assumption #1: Strings are manipulated only using string library functions.
    char * strcat( char * dest, char * src )
Before: "dest" and "src" point to a string.
After: "eax" points to a string.

Problem 1: No Types [cont.]
Perform a backwards analysis to find the strings:
– Destination registers "kill" string type information.
– Libc string functions "gen" string type information.
– Strings at entry to CFG are constant strings or function parameters.

Problem 1: No Types [example]
    eax = _strcat( ebx, ecx );
String variables:
– after the call: { eax }
– before the call: { ebx, ecx }
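
The gen/kill rule from these slides can be sketched as a backward pass over straight-line code. This is a simplified model under stated assumptions, not the x86sa analysis: instructions are hypothetical tuples, `STRING_FUNCS` is a made-up signature table, the `mov` rule (a string in the destination implies a string in the source) is an added assumption, and control-flow joins are not handled.

```python
# Hypothetical signature table: libc functions whose arguments are all strings.
STRING_FUNCS = {"_strcat", "_strcpy"}

def transfer(instr, after):
    """Backward transfer: map the string registers after instr to those before it."""
    kind = instr[0]
    if kind == "call":
        _, dest, func, args = instr
        before = set(after) - {dest}      # the destination register kills
        if func in STRING_FUNCS:
            before |= set(args)           # a libc string function gens its arguments
        return before
    if kind == "mov":
        _, dest, src = instr
        before = set(after) - {dest}      # the destination register kills
        if dest in after:
            before.add(src)               # assumption: the copied value was a string
        return before
    raise ValueError(f"unmodeled instruction: {instr!r}")

def strings_at_entry(instrs, at_exit):
    """Run the transfer function backwards over a straight-line instruction list."""
    facts = set(at_exit)
    for instr in reversed(instrs):
        facts = transfer(instr, facts)
    return facts

call = ("call", "eax", "_strcat", ("ebx", "ecx"))
print(transfer(call, {"eax"}))                                   # ebx and ecx
print(strings_at_entry([("mov", "ebx", "esi"), call], {"eax"}))  # ecx and esi
```

For the call on the slide, the set `{ eax }` after the call becomes `{ ebx, ecx }` before it, matching the example.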

Problem 2: Function Parameters
Function parameters are not explicit in x86 machine code.
    mov   ecx, [ebp+var1]
    push  ecx
    mov   ebx, [ebp+var2]
    push  ebx
    call  _strcat
    add   esp, 8

Problem 2: Fn. Params [example]
    mov   ecx, [ebp+var1]
    push  ecx
    mov   ebx, [ebp+var2]
    push  ebx
    call  _strcat
    add   esp, 8
[Figure: stack diagram showing the stack pointer and the slots written by push ecx and push ebx.]

Problem 2: Function Parameters
    mov   ecx, [ebp+var1]
    push  ecx
    mov   ebx, [ebp+var2]
    push  ebx
    call  _strcat
    add   esp, 8
Solution: Perform a forward analysis that models the effects of x86 instructions on the stack.
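
A minimal sketch of such a forward stack model follows, in Python. It is an illustration under assumptions rather than the talk's implementation: instructions are hypothetical tuples, `ARITY` is a made-up table of known argument counts, every stack slot is assumed to be 4 bytes, and arguments are assumed to be pushed right to left as in the cdecl convention.

```python
# Hypothetical arity table for modeled library functions.
ARITY = {"_strcat": 2}

def recover_call_args(instrs):
    """Symbolically execute mov/push/call/add-esp and report arguments per call site."""
    regs = {}    # register name -> symbolic operand last moved into it
    stack = []   # symbolic stack; the last element is the top
    calls = []
    for ins in instrs:
        op = ins[0]
        if op == "mov":
            _, dst, src = ins
            regs[dst] = regs.get(src, src)
        elif op == "push":
            stack.append(regs.get(ins[1], ins[1]))
        elif op == "call":
            argc = ARITY.get(ins[1], 0)
            top = stack[max(0, len(stack) - argc):]
            calls.append((ins[1], list(reversed(top))))  # last pushed is the first argument
        elif op == "add_esp":
            slots = ins[1] // 4                          # assumption: 4-byte slots
            del stack[max(0, len(stack) - slots):]
        else:
            raise ValueError(f"unmodeled instruction: {ins!r}")
    return calls, stack

code = [
    ("mov", "ecx", "[ebp+var1]"),
    ("push", "ecx"),
    ("mov", "ebx", "[ebp+var2]"),
    ("push", "ebx"),
    ("call", "_strcat"),
    ("add_esp", 8),
]
calls, leftover = recover_call_args(code)
print(calls)      # _strcat with dest = [ebp+var2], src = [ebp+var1]
print(leftover)   # empty: the caller cleaned up both slots
```

Because the second push is on top of the stack at the call, `[ebp+var2]` is recovered as the destination and `[ebp+var1]` as the source.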

Problem 3: Unmodeled Functions
String type information and stack model may be incorrect!
Assumption #2: "_cdecl" calling convention and well-behaved functions.
Treat all function arguments and return values as strings.

Problem 4: Java vs. x86 Semantics
Java strings are immutable; x86 strings are not.
Java:
    String y, x = "x";
    y = x;
    y = y + "123";
    System.out.println(x);    => "x"
C:
    char *y;
    char x[10] = "x";
    y = x;
    y = strcat(y, "123");
    printf(x);                => "x123"
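
The difference can be reproduced in Python, whose `str` values are immutable like Java strings and whose `bytearray` is mutable like a C character buffer. This is only a model of the two snippets on the slide; the `strcat` helper is a hypothetical stand-in for the libc function and does no bounds checking.

```python
def java_like():
    """Immutable strings: concatenation builds a new object, so x is unchanged."""
    x = "x"
    y = x
    y = y + "123"
    return x

def strcat(dest, src):
    """Hypothetical stand-in for libc strcat: append in place and return dest."""
    dest += src      # bytearray += mutates the existing buffer
    return dest

def c_like():
    """Mutable buffer: y aliases x, so appending through y changes x as well."""
    x = bytearray(b"x")
    y = x
    y = strcat(y, b"123")
    return x.decode("ascii")

print(java_like())  # x
print(c_like())     # x123
```

This is why the x86 analysis needs alias information before it can reuse a flowgraph designed for immutable strings.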

Problem 4: Java vs. x86 Semantics
Solution: May-Must alias analysis.
    0x200: _strcat( ebx, ecx )
[Figure: "may alias" relations among eax, ebx, and esi around the call at 0x200.]

Problem 4: Java vs. x86 Semantics
Solution: May-Must alias analysis.
    0x200: _strcat( ebx, ecx )
[Figure: "must alias" relations among eax, ebx, and esi around the call at 0x200.]

x86sa Architecture
[Figure: architecture diagram with the components EXE, IDA Pro, Connector, WPDS++, and JSA. Three transformer groups (for libc char* functions, UTF-8 functions, and Unicode functions) each contain Stack Height Transformers, String Type Transformers, and Alias Analysis Transformers.]

Intraprocedural Analysis Summary
1. Recover callsite arguments. (stack-operation modeling)
2. Infer string types. (backward type analysis)
3. Discover aliases. (may-, must-alias forward analysis)
Generate the String Flow Graph for the Control Flow Graph.

Example 1: simple
    char * s1 = "Argc has ";
    char * s2;
    char * s3 = " arguments";
    char * s4;
    switch( argc ) {
        case 1:  s2 = "1";   break;
        case 2:  s2 = "2";   break;
        default: s2 = "> 2"; break;
    }
    s4 = malloc( strlen(s1)+strlen(s2)+strlen(s3)+1 );
    s4[0] = 0;
    strcat( strcat( strcat( s4, s1 ), s2), s3 );
    printf( "%s\n", s4 );

Example 1: String Flow Graph
[Figure: string flow graph with literal nodes "Argc has", "1", "2", "> 2", and " arguments", plus malloc, assign, and concat nodes.]
Our result:
    "Argc has 1 arguments"
    "Argc has 2 arguments"
    "Argc has > 2 arguments"
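
How the three results arise can be sketched by treating each flow-graph node as a finite set of strings: an assign node unions the values from the three switch branches, and a concat node pairs every left operand with every right operand. The Python below is a hand-built model of this example, not output from x86sa, and the `concat` helper is hypothetical.

```python
from itertools import product

def concat(*operand_sets):
    """Concat node: every combination of one string from each operand set."""
    return {"".join(parts) for parts in product(*operand_sets)}

s1 = {"Argc has "}
s2 = {"1", "2", "> 2"}    # assign node: union over the three switch branches
s3 = {" arguments"}
s4 = {""}                 # the malloc'd buffer, emptied by s4[0] = 0

result = concat(s4, s1, s2, s3)
for s in sorted(result):
    print(repr(s))
```

The set has exactly three members, one per branch of the switch.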

Example 2: cstar
    char * c = "c";
    char * s4 = malloc(101);
    s4[0] = 0;    /* start from an empty string so strcat is well defined */
    for( int i=0; i < 100 ; i++ ) {
        strcat( s4, c );
    }
    printf( "%s\n", s4 );

Example 2: String Flow Graph
[Figure: string flow graph with the nodes "c", malloc, assign, and concat.]
Our result: c*
Correct answer: c^100
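
The gap between the two answers is a sound over-approximation: the string flow graph keeps the concat cycle but not the loop bound. A small Python check makes this concrete. The `run_loop` helper is a hypothetical model of the C loop, and the regular expression stands in for the automaton the analysis reports.

```python
import re

def run_loop(iterations=100):
    """Concrete behavior of the example: append "c" once per iteration."""
    s = ""
    for _ in range(iterations):
        s += "c"
    return s

analysis_result = re.compile(r"c*")   # what the analysis reports
concrete = run_loop()                 # what the program actually prints

# Sound: the real output is in the reported language.
print(analysis_result.fullmatch(concrete) is not None)
# Imprecise: the language also contains strings the program never prints.
print(analysis_result.fullmatch("c" * 7) is not None)
```

Both checks print `True`; the first shows that no real behavior is missed, the second shows the precision lost by ignoring the loop counter.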

Example 3: Lion Worm
Code and String Flow Graph omitted.
x86sa analysis results: " /sbin/ifconfig -a|/bin/mail "

Future Work: Interprocedural Analysis
1. Inline everything and apply intraprocedural analysis.
2. "Hook" intraprocedural String Flow Graphs into a "Super String Flow Graph".
3. Polyvariant analysis over function summaries for String Flow Graphs.

Future Work: Relax Assumptions
Relax the assumptions:
– Strings can be manipulated in many ways.
– Calling conventions can vary in a program.
Value Set Analysis looks promising:
– Identifies "variables" based on usage patterns.

Future Work: More Applications
– Malicious code analysis
– Analysis of dynamic code generators:
  – Packed programs
  – Shell code generators

What to Take Away
If you have type information, AND arguments to calls, AND no aliasing, then string analysis is straightforward.
String analysis on binaries performs well and provides useful results.

From Java to x86 Binaries
Input: x86 binary file → ? → Java string analyzer input: Flowgraph → Output: finite automaton

Problem 4: Java vs. x86 Semantics
Solution: Alias analysis.
    0x100: mov eax, ecx
        λS. let A = aliases(ecx)
            (S − {(eax,*)}) ∪ ({eax} × A)
    0x200: _strcat( ebx, ecx )
        λS. let A = alias(ebx)
            let N = vars(A)
            (S ∪ (N × {200})) − {(ebx,*), (eax,*)} ∪ {(ebx,200), (eax,200)}
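
The two transformers on this slide can be read as functions over a set S of (variable, location) pairs. The Python sketch below is one possible reading, not the authors' code: it assumes that the operators lost in the transcript are set union and Cartesian product, that `(r,*)` means every pair whose variable is `r`, and that `vars(A)` means the variables pointing to any location in A. All helper names and the initial locations `"s1"` and `"s2"` are hypothetical.

```python
def aliases(S, var):
    """Locations that var may point to."""
    return {loc for (v, loc) in S if v == var}

def vars_pointing_to(S, locations):
    """Variables that may point to any of the given locations."""
    return {v for (v, loc) in S if loc in locations}

def kill(S, *variables):
    """Remove every pair (r, *) for the listed variables."""
    return {(v, loc) for (v, loc) in S if v not in variables}

def mov(S, dst, src):
    """mov dst, src: dst forgets its old targets and takes over those of src."""
    A = aliases(S, src)
    return kill(S, dst) | {(dst, loc) for loc in A}

def strcat_call(S, site, dest_reg, ret_reg="eax"):
    """strcat at `site`: aliases of the destination may now see the new string."""
    A = aliases(S, dest_reg)
    N = vars_pointing_to(S, A)
    widened = S | {(v, site) for v in N}
    return kill(widened, dest_reg, ret_reg) | {(dest_reg, site), (ret_reg, site)}

S0 = {("ecx", "s1"), ("ebx", "s2"), ("esi", "s2")}
S1 = mov(S0, "eax", "ecx")            # 0x100: mov eax, ecx
S2 = strcat_call(S1, 0x200, "ebx")    # 0x200: _strcat( ebx, ecx )
print(sorted(S2, key=str))
```

In this reading, `esi` ends up pointing to both its old location and the new one created at 0x200, because it aliased `ebx` before the call, while `ebx` and `eax` point only to the new one.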

Example 1: x86sa Analysis Results
1. "Argc has 1 arguments"
2. "Argc has 2 arguments"
3. "Argc has > 2 arguments"

cstar's x86sa analysis results
Infinitely many strings with common prefix: "c"

Transformers – String Inference
    0x200: _strcat( ebx, ecx )
String inference:
    λd. (d − {eax}) ∪ {ebx, ecx}

Transformers – Alias Analysis
    0x200: _strcat( ebx, ecx )
Case ebx ∈ must:
    let T = alias_must(ebx)
    let V = alias_may(ebx)
    λd. ⟨ must(d) − {(ebx,*), (eax,*)} ∪ {(ebx,200), (eax,200)} − ⋃_{a∈T} {(a,*)} ∪ ⋃_{a∈T} {(a,200)},
          may(d) − {(ebx,*), (eax,*)} ∪ ⋃_{a∈V} {(a,200)} ⟩

Transformers – Alias Analysis
    0x200: _strcat( ebx, ecx )
Case ebx ∈ may:
    let T = alias_must(ebx)
    let V = alias_may(ebx)
    λd. ⟨ must(d) − {(eax,*)} − ⋃_{a∈T} {(a,*)} ∪ {(ebx,200), (eax,200)},
          may(d) − {(ebx,*), (eax,*)} ∪ ⋃_{a∈T} {(a,200), (a,old(a))} ∪ ⋃_{a∈V} {(a,200)} ⟩