Presentation is loading. Please wait.

Presentation is loading. Please wait.

String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison.

Similar presentations


Presentation on theme: "String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison."— Presentation transcript:

1 String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison

2 6 September 2005Nicholas Kidd - String Analysis for Binaries - PASTE 20052 What is String Analysis? Recovery of values a string variable might take at a given program point. void main( void ) { char * msg = “no msg”; printf( “This food has %s.\n”, msg ); } Output: This food has no msg. 

3 6 September 2005Nicholas Kidd - String Analysis for Binaries - PASTE 20053 Why Do We Need String Analysis? We could just use the strings program: $ strings no_msg /lib/ld-linux.so.2 libc.so.6... no msg This food has %s. $  

4 6 September 2005Nicholas Kidd - String Analysis for Binaries - PASTE 20054 Why Perform String Analysis? Computer forensics Given an unknown program, we want to know the files it might access, the registry keys it might get and set, the commands it might execute. Program verification SQL queries, embedded scripting, …

5 6 September 2005Nicholas Kidd - String Analysis for Binaries - PASTE 20055 A Complicated Example void main( void ) { char buf[257]; strcpy( buf, “/” ); strcat( buf, “b” ); strcat( buf, “i” ); strcat( buf, “n” );... system( buf ); }  Running strings : / a b c d... Running a string analysis: /bin/ifconfig –a | /bin/mail...@...

6 6 September 2005Nicholas Kidd - String Analysis for Binaries - PASTE 20056 Our Contributions Developed a string analysis for binaries. Implemented x86sa, a string analyzer for Intel IA-32 binaries. Evaluated on both benign and malicious binaries.

7 6 September 2005Nicholas Kidd - String Analysis for Binaries - PASTE 20057 Outline  String analysis for Java.  String analysis for x86.  Evaluation.  Applications & future work.

8 6 September 2005Nicholas Kidd - String Analysis for Binaries - PASTE 20058 String Analysis for Java 1.Create string flowgraph. void main( void ) { String x = “/”; x = x + “b”; x = x + “i”; x = x + “n”;... System.exec( x ); } Christensen, Møller, Schwartzbach "Precise Analysis of String Expressions " (SAS’03) “/”“b” concat “i” concat...

9 6 September 2005Nicholas Kidd - String Analysis for Binaries - PASTE 20059 String Analysis for Java [2] 2.Create context-free grammar. 3.Approximate with finite automaton. “/”“b” concat “i” concat... A1  “/” “b” A2  A1 “i” A3  A2 “n” A4  A3 “/” … “/bin/ifconfig...”

10 6 September 2005Nicholas Kidd - String Analysis for Binaries - PASTE 200510 From Java to x86 executables FlowgraphCFGFSA Java.class

11 6 September 2005Nicholas Kidd - String Analysis for Binaries - PASTE 200511 From Java to x86 executables Rest of this talk: Bridge the syntactic and semantic gaps between Java and assembly language. x86 executable FlowgraphCFGFSA

12 6 September 2005Nicholas Kidd - String Analysis for Binaries - PASTE 200512 Outline  String analysis for Java.  String analysis for x86.  Evaluation.  Applications & future work.

13 6 September 2005Nicholas Kidd - String Analysis for Binaries - PASTE 200513 Four Problems with Assembly 1.No types. 2.No high-level constructs. 3.No argument passing convention. 4.No Java string semantics.

14 6 September 2005Nicholas Kidd - String Analysis for Binaries - PASTE 200514 Problem 1: No Types Solution: infer types from C lib. funcs. Assumption #1: Strings are manipulated only using string library functions. char * strcat( char * dest, char * src ) After:“eax” points to a string. Before:“dest” and “src” point to a string.

15 6 September 2005Nicholas Kidd - String Analysis for Binaries - PASTE 200515 Problem 1: No Types [cont.] Perform a backwards analysis to find the strings: –Destination registers “kill” string type information. –Libc string functions “gen” string type information. –Strings at entry to CFG are constant strings or function parameters.

16 6 September 2005Nicholas Kidd - String Analysis for Binaries - PASTE 200516 Problem 1: No Types [example] String variables: –after the call:{ eax } –before the call:{ ebx, ecx } eax = _strcat( ebx, ecx );

17 6 September 2005Nicholas Kidd - String Analysis for Binaries - PASTE 200517 Problem 2: Function Parameters Function parameters are not explicit in x86 machine code. movecx, [ebp+var1] pushecx movebx, [ebp+var2] pushebx call_strcat add esp, 8

18 6 September 2005Nicholas Kidd - String Analysis for Binaries - PASTE 200518 Problem 2: Fn. Params [example] movecx, [ebp+var1] pushecx movebx, [ebp+var2] pushebx call_strcat add esp, 8 Stack pointer pushecx pushebx

19 6 September 2005Nicholas Kidd - String Analysis for Binaries - PASTE 200519 movecx, [ebp+var1] pushecx movebx, [ebp+var2] pushebx call_strcat add esp, 8 Problem 2: Function Parameters Solution: Perform forwards analysis modeling x86 instructions effects on the stack.

20 6 September 2005Nicholas Kidd - String Analysis for Binaries - PASTE 200520 Problem 3: Unmodeled Functions String type information and stack model may be incorrect! Assumption #2: “_cdecl” calling convention and well behaved functions Treat all function arguments and return values as strings.

21 6 September 2005Nicholas Kidd - String Analysis for Binaries - PASTE 200521 Problem 4: Java vs. x86 Semantics Java strings are immutable, x86 strings are not. String y, x=“x”; y = x; y = y + “123”; System.out.println(x); char *y; char x[10] = “x”; y = x; y = strcat(y,”123”); printf(x); => “x”=> “x123”

22 6 September 2005Nicholas Kidd - String Analysis for Binaries - PASTE 200522 Problem 4: Java vs. x86 Semantics Solution: May-Must alias analysis. 0x200: _strcat( ebx,ecx ) eax ebx esi 20 0 eax ebx esi “May alias” relations:

23 6 September 2005Nicholas Kidd - String Analysis for Binaries - PASTE 200523 Problem 4: Java vs. x86 Semantics Solution: May-Must alias analysis. 0x200: _strcat( ebx,ecx ) eax ebx esi 20 0 eax ebx esi “Must alias” relations:

24 6 September 2005Nicholas Kidd - String Analysis for Binaries - PASTE 200524 Stack Height Transformers String Type Transformers Alias Analysis Transformers UTF-8 func’s Stack Height Transformers String Type Transformers Alias Analysis Transformers libc char* func’s Stack Height Transformers String Type Transformers Alias Analysis Transformers Unicode func’s x86sa Architecture EXE IDA Pro Connector WPDS++ JSA

25 6 September 2005Nicholas Kidd - String Analysis for Binaries - PASTE 200525 Intraprocedural Analysis Summary 1.Recover callsite arguments. (stack-operation modeling) 2.Infer string types. (backward type analysis) 3.Discover aliases. (may-, must-alias forward analysis) Generate the String Flow Graph for the Control Flow Graph.

26 6 September 2005Nicholas Kidd - String Analysis for Binaries - PASTE 200526 Outline  String analysis for Java.  String analysis for x86.  Evaluation.  Applications & future work.

27 6 September 2005Nicholas Kidd - String Analysis for Binaries - PASTE 200527 Example 1: simple char * s1 = "Argc has "; char * s2; char * s3 = " arguments"; char * s4; switch( argc ) { case 1:s2 = "1“ ; break; case 2:s2 = "2“ ; break; default:s2 = "> 2"; break; } s4 = malloc( strlen(s1)+strlen(s2)+strlen(s3)+1 ); s4[0] = 0; strcat( strcat( strcat( s4, s1 ), s2), s3 ); printf( "%s\n", s4 );

28 6 September 2005Nicholas Kidd - String Analysis for Binaries - PASTE 200528 Example 1: String Flow Graph “Argc has” “ arguments” “1” concat malloc “2”“> 2” assign Our result: "Argc has 1 arguments" "Argc has 2 arguments" "Argc has > 2 arguments"

29 6 September 2005Nicholas Kidd - String Analysis for Binaries - PASTE 200529 Example 2: cstar char * c = “c"; char * s4 = malloc(101); for( int i=0; i < 100 ; i++ ) { strcat( s4, c ); } printf( "%s\n", s4 );

30 6 September 2005Nicholas Kidd - String Analysis for Binaries - PASTE 200530 Example 2: String Flow Graph “c” concat malloc assign Our result: c* Correct answer: c 100

31 6 September 2005Nicholas Kidd - String Analysis for Binaries - PASTE 200531 Example 3: Lion Worm Code and String Flow Graph omitted. x86sa analysis results: " /sbin/ifconfig -a|/bin/mail angelz1578@usa.net "

32 6 September 2005Nicholas Kidd - String Analysis for Binaries - PASTE 200532 Future Work: Interprocedural Analysis 1.Inline everything and apply intra- procedural analysis. 2.“Hook” intraprocedural String Flow Graphs into a “Super String Flow Graph”. 3.Polyvariant analysis over function summaries for String Flow Graphs.

33 6 September 2005Nicholas Kidd - String Analysis for Binaries - PASTE 200533 Future Work: Relax Assumptions Relax the assumptions: –Strings can be manipulated in many ways. –Calling conventions can vary in a program. Value Set Analysis looks promising: –Identifies “variables” based on usage patterns.

34 6 September 2005Nicholas Kidd - String Analysis for Binaries - PASTE 200534 Future Work: More Applications Malicious code analysis Analysis of dynamic code generators: –Packed programs –Shell code generators

35 6 September 2005Nicholas Kidd - String Analysis for Binaries - PASTE 200535 What to Take Away: If you havetype information, AND arguments to calls, AND no aliasing then string analysis is straightforward. String analysis on binaries performs well and provides useful results.

36 String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison kidd { mihai, kidd, wen-han }@cs.wisc.edu

37 6 September 2005Nicholas Kidd - String Analysis for Binaries - PASTE 200537 From Java to x86 Binaries Input: x86 binary file Java string analyzer input: Flowgraph Output: finite automaton ?

38 6 September 2005Nicholas Kidd - String Analysis for Binaries - PASTE 200538 Problem 4: Java vs. x86 Semantics Solution: Alias analysis. 0x100: mov eax,ecx λS.let A = aliases(ecx) S – {(eax,*)}  ({eax}  A) 0x200: _strcat( ebx, ecx ) λS. let A = alias(ebx) let N = vars(A) (S  (N  {200})) – {(ebx,*),{(eax,*)}  {(ebx,200),(eax,200)}

39 6 September 2005Nicholas Kidd - String Analysis for Binaries - PASTE 200539 Example 1: x86sa Analysis Results 1."Argc has 1 arguments" 2."Argc has 2 arguments" 3."Argc has > 2 arguments"

40 6 September 2005Nicholas Kidd - String Analysis for Binaries - PASTE 200540 cstar’s x86sa analysis results Infinitely many strings with common prefix: “c”

41 6 September 2005Nicholas Kidd - String Analysis for Binaries - PASTE 200541 Transformers – String Inference 0x200: _strcat( ebx, ecx ) String inference λd.{ d – {eax}  {ebx,ecx} }

42 6 September 2005Nicholas Kidd - String Analysis for Binaries - PASTE 200542 Transformers – Alias Analysis 0x200: _strcat( ebx, ecx ) ebx  must let T = alias must (ebx) let V = alias may (ebx) λd.  { must(d) ­ {(ebx, *), (eax, *)}  {(ebx, 200), (eax, 200)} - a  T {(a,*)}  a  T {(a,200)}, (may(d) ­ {(ebx, *), (eax, *)}  a  V {(a,200)}) 

43 6 September 2005Nicholas Kidd - String Analysis for Binaries - PASTE 200543 Transformers – Alias Analysis 0x200: _strcat( ebx, ecx ) ebx  may let T = alias must (ebx) let V = alias may (ebx) λd.  { must(d) ­ {(eax, *)} - a  T {(a,*)}  {(ebx, 200), (eax, 200)}, {may(d) ­ {(ebx, *), (eax, *)}  a  T {(a,200),(a,old(a)}  a  V {(a,200)}} 


Download ppt "String Analysis for Binaries Mihai Christodorescu Nicholas Kidd Wen-Han Goh University of Wisconsin, Madison."

Similar presentations


Ads by Google