
1 Deriving Input Syntactic Structure From Execution
Zhiqiang Lin, Xiangyu Zhang
Purdue University
November 11th, 2008
The 16th ACM SIGSOFT International Symposium on the Foundations of Software Engineering (FSE’08)

2 Motivation -- Most software takes structured input

3 Applications -- Software Testing/Debugging
- Using input grammars to generate test cases
  - K. Hanford. Automatic Generation of Test Cases. IBM Systems Journal, 9(4), 1970.
  - P. Purdom. A Sentence Generator for Testing Parsers. BIT Numerical Mathematics, 12(3), 1972.
  - Grammar-based whitebox fuzzing [PLDI’08]
- Delta debugging
  - Reducing large failure-inducing inputs [TSE’02]
  - Hierarchical Delta Debugging (HDD) [ICSE’06]
- Execution fast forwarding
  - Reducing event logs for failure replay [FSE’06]

4 Applications -- Computer Security
- Malware, attack instances
  - Signature generation
    - Exploit (input) signature
    - Payload length, keywords, field structure, ...
- Penetration testing
  - Software vulnerabilities
  - Playing with input (fuzzing)
    - Packet Vaccine [CCS’06]
    - ShieldGen [IEEE S&P’07]
- Malware protocol replayer
  - Malware feature
  - Replay the protocol
  - Input format

5 Challenges
- Input structure exists in a machine-unfriendly form
  - Plain text (ASCII stream, e.g., a C file)
  - Binary code (protocol message stream)
- Known specification (RFC)
  - Implementations deviate from the specification
- Unknown specification
  - Malware
    - Bot
    - Botnet protocol
  - Legal software
    - SAMBA protocol (12 years of effort by the open source community)

6 Challenges
- Source code may not be available
  - Penetration testing
  - Malware analysis
  - Legal software
- The analysis must work on binaries

7 Our Contributions
- Two approaches to handle two types of parsers
  - Dynamic control dependency to handle top-down parsers
  - A new dynamic analysis to handle bottom-up parsers by identifying and analyzing the parsing stack
- Experimental results show that the proposed analyses are highly effective in producing very precise input syntax trees

8 Outline
- Motivation
- Technical Description
  - Handling inputs with a top-down parser
  - Handling inputs with a bottom-up parser
- Evaluation
- Discussion
- Related Work
- Conclusion

9 I. Top-down Parser
- Parses input in a top-down manner
Grammar:
  S → H B
  H → h N
  N → 1 | 2
  B → b B | ε
Input: h1bbε
Parse tree: S( H( h N( 1 ) ) B( b B( b B( ε ) ) ) )

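10 Implementation
/* Statement numbers in the comments are the labels used on later slides. */
void Parser() {
    char c = getchar();
    if (c == 'h') {                    /* statement 2: parses H */
        c = getchar();
        if (c == '1' || c == '2') {    /* statement 4: parses N */
            c = getchar();
        } else error();
    } else error();
    while (c == 'b') {                 /* statement 9: parses B */
        c = getchar();
        if (c == 'ε') {                /* statement 11: ε stands for the symbol that ends the input */
            break;
        }
    }
}
Grammar:
  S → H B
  H → h N
  N → 1 | 2
  B → b B | ε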
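The slide's parser is pseudocode (it compares against 'ε' and calls an undefined error()). A minimal runnable sketch of the same recognizer is shown here. It assumes the input arrives as a C string, that '$' stands in for the slide's ε end marker, and that failure is reported with a return value; the names parse and END_MARK are illustrative and do not come from the talk.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: '$' plays the role of the slide's ε end marker. */
#define END_MARK '$'

/* Returns 0 if `in` starts with h (1|2) b* END_MARK, -1 otherwise. */
int parse(const char *in)
{
    size_t i = 0;
    assert(in != NULL);

    char c = in[i++];                   /* c = getchar() */
    if (c == 'h') {                     /* H -> h N */
        c = in[i++];
        if (c == '1' || c == '2') {     /* N -> 1 | 2 */
            c = in[i++];
        } else {
            return -1;                  /* error() */
        }
    } else {
        return -1;                      /* error() */
    }
    while (c == 'b') {                  /* B -> b B */
        c = in[i++];
        if (c == END_MARK) {            /* B -> ε */
            break;
        }
    }
    /* The string terminator is never 'b', so the loop cannot read past it. */
    return c == END_MARK ? 0 : -1;
}
```

With these assumptions, parse("h1bb$") returns 0 and parse("h1bb") returns -1. Characters after the end marker are not checked.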

11 Execution Trace
Trace of Parser() on the input h1bbε (b1 and b2 denote the first and second b):
  c = getchar()                  reads h
  if (c == 'h')                  uses h
  c = getchar()                  reads 1
  if (c == '1' || c == '2')      uses 1
  c = getchar()                  reads b1
  while (c == 'b')               uses b1
  c = getchar()                  reads b2
  if (c == 'ε')                  uses b2
  while (c == 'b')               uses b2
  c = getchar()                  reads ε
  if (c == 'ε')                  uses ε
  break
Control dependency: a statement Y is control-dependent on X iff X directly determines whether Y executes.

13 Control Dependency Graph for the Execution Trace
A control dependency graph is a graph in which each node directly controls whether its child nodes execute. Indentation shows the parent-child relation; the input character used by each predicate is given in brackets.
START
  c = getchar()
  if (c == 'h')                        [h]
    c = getchar()
    if (c == '1' || c == '2')          [1]
      c = getchar()
  while (c == 'b')                     [b1]
    c = getchar()
    if (c == 'ε')                      [b2]
      while (c == 'b')                 [b2]
        c = getchar()
        if (c == 'ε')                  [ε]
          break
Parse tree for comparison: S( H( h N( 1 ) ) B( b B( b B( ε ) ) ) )
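One way to recover parent-child relations like these at run time is to keep a stack of the predicates whose outcome still governs execution. The sketch below hand-instruments a string-based version of the example parser. It illustrates the idea under assumptions and is not the authors' binary-level implementation: the helper names (enter_pred, leave_to, traced_parse), the '$' stand-in for ε, and the hand-chosen points where predicates are closed are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 64
#define END_MARK '$'   /* assumption: stands in for the slide's ε */

/* One executed predicate instance. parent == -1 means START. */
struct pred { int stmt; int input_pos; int parent; };

struct pred nodes[MAX_NODES];
int n_nodes;
static int cd_stack[MAX_NODES];
static int cd_top;

/* Record a predicate instance; its parent is the innermost open predicate. */
static void enter_pred(int stmt, int input_pos)
{
    assert(n_nodes < MAX_NODES);
    nodes[n_nodes].stmt = stmt;
    nodes[n_nodes].input_pos = input_pos;
    nodes[n_nodes].parent = cd_top > 0 ? cd_stack[cd_top - 1] : -1;
    cd_stack[cd_top++] = n_nodes++;
}

/* Close every predicate opened since the stack held `depth` entries. */
static void leave_to(int depth) { cd_top = depth; }

/* Returns 0 on accept, -1 on reject; fills nodes[] as a side effect. */
int traced_parse(const char *in)
{
    size_t i = 0;
    n_nodes = 0;
    cd_top = 0;

    char c = in[i++];
    enter_pred(2, 0);                       /* if (c == 'h') */
    if (c == 'h') {
        c = in[i++];
        enter_pred(4, (int)i - 1);          /* if (c == '1' || c == '2') */
        if (c == '1' || c == '2') {
            c = in[i++];
        } else {
            return -1;
        }
    } else {
        return -1;
    }
    leave_to(0);                            /* join point: both ifs end here */

    for (;;) {
        enter_pred(9, (int)i - 1);          /* while (c == 'b') */
        if (c != 'b') break;
        c = in[i++];
        enter_pred(11, (int)i - 1);         /* if (c == END_MARK) */
        if (c == END_MARK) break;
        /* Not closed here: because of the break, this test decides
           whether the next loop iteration runs at all. */
    }
    leave_to(0);
    return c == END_MARK ? 0 : -1;
}
```

For the input "h1bb$" this records six predicate instances whose parents are START, node 0, START, node 2, node 3, and node 4, which is the chain while, if, while, if drawn on the slide.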

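14 Eliminate Non-Data-Use Nodes
Nodes marked (eliminated) do not use input data and are dropped from the graph; the input character used by each remaining predicate is given in brackets.
START
  c = getchar()                        (eliminated)
  if (c == 'h')                        [h]
    c = getchar()                      (eliminated)
    if (c == '1' || c == '2')          [1]
      c = getchar()                    (eliminated)
  while (c == 'b')                     [b1]
    c = getchar()                      (eliminated)
    if (c == 'ε')                      [b2]
      while (c == 'b')                 [b2]
        c = getchar()                  (eliminated)
        if (c == 'ε')                  [ε]
          break                        (eliminated)
Parse tree for comparison: S( H( h N( 1 ) ) B( b B( b B( ε ) ) ) )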

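15 Add Data Use Leaf Nodes
Remaining predicate nodes and the input character each one uses:
START
  if (c == 'h')                        uses h
    if (c == '1' || c == '2')          uses 1
  while (c == 'b')                     uses b1
    if (c == 'ε')                      uses b2
      while (c == 'b')                 uses b2
        if (c == 'ε')                  uses ε
Parse tree for comparison: S( H( h N( 1 ) ) B( b B( b B( ε ) ) ) )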

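16 Add Data Use Leaf Nodes
Predicate nodes: START, if (c == 'h'), if (c == '1' || c == '2'), while (c == 'b'), if (c == 'ε'), while (c == 'b'), if (c == 'ε')
Leaf nodes (input characters): h, 1, b1, b2, ε
Parse tree for comparison: S( H( h N( 1 ) ) B( b B( b B( ε ) ) ) )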

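17 Eliminate Redundant Nodes
Nodes are labeled with their statement number and, for repeated statements, the execution instance (9_1 is the first execution of statement 9):
START
  2     if (c == 'h')                  h
  4     if (c == '1' || c == '2')      1
  9_1   while (c == 'b')               b1
  11_1  if (c == 'ε')                  b2
  9_2   while (c == 'b')               b2
  11_2  if (c == 'ε')                  ε
Identical nodes: 11_1 and 9_2 use the same input character (b2), so one of them is redundant and is eliminated.
Parse tree for comparison: S( H( h N( 1 ) ) B( b B( b B( ε ) ) ) )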
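The last two steps (attach the used input character as a leaf, then merge nodes that use the same input) can be sketched on the six predicate instances of the running example. The encoding below is hypothetical: parent_in and used_pos are hand-written from the slide's graph, '$' stands in for ε, and the bracketed output format is chosen only for illustration.

```c
#include <assert.h>
#include <string.h>

#define N_PRED 6

/* Hypothetical encoding of the predicate-only graph for input h1bb$:
   parent_in[k] is the controlling predicate (-1 = START),
   used_pos[k] is the input position tested by predicate k. */
static const int parent_in[N_PRED] = { -1, 0, -1, 2, 3, 4 };
static const int used_pos[N_PRED]  = {  0, 1,  2, 3, 3, 4 };

/* Merge a node into its parent when both test the same input position. */
static void merge_redundant(int parent[N_PRED], int alive[N_PRED])
{
    for (int k = 0; k < N_PRED; k++) {
        int p = parent[k];
        if (p >= 0 && used_pos[k] == used_pos[p]) {
            alive[k] = 0;
            for (int j = 0; j < N_PRED; j++)
                if (parent[j] == k) parent[j] = p;   /* re-attach children */
        }
    }
}

/* Append the subtree below `node` to out as "(" leaf children ")". */
static void emit(int node, const int parent[N_PRED], const int alive[N_PRED],
                 const char *input, char *out)
{
    for (int k = 0; k < N_PRED; k++) {
        if (!alive[k] || parent[k] != node) continue;
        size_t n = strlen(out);
        out[n] = '(';
        out[n + 1] = input[used_pos[k]];             /* data use leaf */
        out[n + 2] = '\0';
        emit(k, parent, alive, input, out);
        strcat(out, ")");
    }
}

/* `input` needs at least 5 characters; `out` needs at least 3 * N_PRED + 1 bytes. */
void build_tree(const char *input, char *out)
{
    int parent[N_PRED], alive[N_PRED];
    assert(input != NULL && out != NULL);
    for (int k = 0; k < N_PRED; k++) {
        parent[k] = parent_in[k];
        alive[k] = 1;
    }
    merge_redundant(parent, alive);
    out[0] = '\0';
    emit(-1, parent, alive, input, out);
}
```

Calling build_tree("h1bb$", buf) yields "(h(1))(b(b($)))", which has the same shape as the parse tree S( H( h N( 1 ) ) B( b B( b B( ε ) ) ) ).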

18 II. Bottom-up Parser
- Parses input in a bottom-up manner
  - Programming languages
  - lex/yacc
Grammar:
  S → A B
  A → a a
  B → b
Input: aab
Parse tree: S( A( a a ) B( b ) )

19 A General Bottom-up Parsing Algorithm
while (...) {
    if (stack should not be reduced) {
        stack.push(c);
        ...
    } else {                 // reduce by A → β
        stack.pop(|β|);
        stack.push(A);
    }
}
Grammar: S → A B, A → a a, B → b
Input: aab
Trace: while (...); if (stack should not be reduced); stack.push(a); while (...); if (stack should not be reduced); stack.push(a); while (...); if (stack should not be reduced); stack.pop(aa); stack.push(A); ...
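The loop above can be made concrete for the slide's three-rule grammar. The following sketch assumes a naive strategy (reduce whenever the top of the stack matches a rule body, otherwise shift), which is enough for this grammar but is not how a yacc-generated LR parser decides. The function name shift_reduce and the textual trace format are illustrative.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define STACK_MAX 32

struct rule { char lhs; const char *rhs; };

/* Grammar from the slide: A -> a a, B -> b, S -> A B. */
static const struct rule rules[] = { {'A', "aa"}, {'B', "b"}, {'S', "AB"} };

/* Returns 0 if `input` reduces to S, -1 otherwise.
   Writes the stack operations into `trace` (truncated if too small). */
int shift_reduce(const char *input, char *trace, size_t trace_size)
{
    char stack[STACK_MAX];
    size_t top = 0, pos = 0, used = 0;

    assert(trace != NULL && trace_size > 0);
    trace[0] = '\0';
    for (;;) {
        int reduced = 0;
        for (size_t r = 0; r < sizeof rules / sizeof rules[0]; r++) {
            size_t len = strlen(rules[r].rhs);
            if (top >= len &&
                memcmp(stack + top - len, rules[r].rhs, len) == 0) {
                top -= len;                         /* stack.pop(|β|) */
                stack[top++] = rules[r].lhs;        /* stack.push(A)  */
                if (used < trace_size)
                    used += (size_t)snprintf(trace + used, trace_size - used,
                                             "pop(%s) push(%c) ",
                                             rules[r].rhs, rules[r].lhs);
                reduced = 1;
                break;
            }
        }
        if (reduced) continue;
        if (input[pos] == '\0' || top == STACK_MAX) break;
        stack[top++] = input[pos];                  /* stack.push(c)  */
        if (used < trace_size)
            used += (size_t)snprintf(trace + used, trace_size - used,
                                     "push(%c) ", input[pos]);
        pos++;
    }
    return (top == 1 && stack[0] == 'S') ? 0 : -1;
}
```

For the input "aab" the recorded trace is push(a) push(a) pop(aa) push(A) push(b) pop(b) push(B) pop(AB) push(S).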

21 Tree Construction
Grammar: S → A B, A → a a, B → b
Input: aab
Stack operation trace: Push(a), Push(a), Pop(aa), Push(A), Push(b), Pop(b), Push(B), Pop(AB), Push(S)
Resulting tree: S( A( a a ) B( b ) )
- Identify the parsing stack
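Given such a trace, the tree can be rebuilt by replaying it: the symbols removed by a pop become the children of the symbol pushed right after it. The sketch below assumes the trace has already been extracted into an array of push/pop records; the struct layout, the limit of four children per node, and the names build_from_trace and render are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_NODES 32
#define MAX_CHILD 4

enum op_kind { OP_PUSH, OP_POP };

/* sym is used by OP_PUSH, count (number of popped symbols) by OP_POP. */
struct op { enum op_kind kind; char sym; int count; };

struct node { char sym; int child[MAX_CHILD]; int n_child; };

struct node tree[MAX_NODES];
int n_tree;

/* Replays the trace; returns the root node index, or -1 on a malformed trace. */
int build_from_trace(const struct op *ops, int n_ops)
{
    int stack[MAX_NODES], top = 0;
    int popped[MAX_CHILD], n_popped = 0;

    assert(ops != NULL);
    n_tree = 0;
    for (int i = 0; i < n_ops; i++) {
        if (ops[i].kind == OP_POP) {
            int cnt = ops[i].count;
            if (cnt < 0 || cnt > top || cnt > MAX_CHILD) return -1;
            top -= cnt;
            n_popped = cnt;
            for (int k = 0; k < cnt; k++) popped[k] = stack[top + k];
        } else {
            if (n_tree == MAX_NODES) return -1;
            struct node *nd = &tree[n_tree];
            nd->sym = ops[i].sym;
            nd->n_child = n_popped;          /* 0 for a plain shift */
            for (int k = 0; k < n_popped; k++) nd->child[k] = popped[k];
            n_popped = 0;
            stack[top++] = n_tree++;
        }
    }
    return top == 1 ? stack[0] : -1;
}

/* Appends the subtree at idx to out, e.g. "A(aa)". */
void render(int idx, char *out)
{
    size_t n = strlen(out);
    out[n] = tree[idx].sym;
    out[n + 1] = '\0';
    if (tree[idx].n_child > 0) {
        strcat(out, "(");
        for (int k = 0; k < tree[idx].n_child; k++)
            render(tree[idx].child[k], out);
        strcat(out, ")");
    }
}
```

Replaying the nine operations of the slide's trace and rendering the root gives S(A(aa)B(b)). Finding which memory region is the parsing stack is a separate step that this sketch does not cover.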

22 Evaluation – Top down grammar Bad?

23 Evaluation – Top down grammar

24 Evaluation – Bottom up grammar

25 Performance Overhead 5X-45X 6X-8X

26 Discussion
- Grammar categories
  - Top-down, bottom-up, any others?
- It is possible to evade the control dependency structure in a top-down parser implementation
- Individual input → multiple inputs → final grammar
- Syntactic structure → semantics

27 Related Work
- Network protocol format reverse engineering
  - Instruction semantics (comparison, loop → keyword, delimiter)
    - Polyglot [CCS’07]
    - Automatic Network Protocol Analysis [NDSS’08]
    - Tupni [CCS’08]
  - Execution context (call stack, PC)
    - AutoFormat [NDSS’08]
- Limitations
  - Part of the problem space
    - Only top-down parsers
  - Part of the problem’s essence
    - Comparison (predicate), call stack → control dependency

28 Conclusion
- Two dynamic analyses construct input structure from program execution
- No source code access or symbolic information is required
- The analyses are highly effective and produce high-quality input syntax trees

29 Thank you
Contact: {zlin,xyzhang}@cs.purdue.edu
Q & A

