Semantics-Aware Malware Detection


1 Semantics-Aware Malware Detection
Mihai Christodorescu, Somesh Jha (University of Wisconsin, Madison); Sanjit Seshia, Dawn Song, Randal Bryant (Carnegie Mellon University)

2 Malicious Code Problem
Malware is everywhere (source: Symantec Internet Security Threat Report, vol. VII). Malware comes in large families. Detection is undecidable in general: no technique detects all malware. The goal, then, is to find more powerful detection techniques. May 9, 2005 Mihai Christodorescu

3 Evasion Techniques
Obfuscations applied to malicious code make evasion very easy. Attacker’s goal: preserve (a subset of) the behavior. Means: transformations of code and data; addition of new code and data. Variants are easy to create in large numbers.

4 The Current Solution
Syntactic signatures are insufficient: they overfit and are easy to evade. Each new variant requires a new signature, so users need to update their software frequently (even hourly at times).

5 No Resilience to Obfuscations
[Figure: false negative rate for obfuscated worms. Source: “Testing Malware Detectors” (ISSTA 2004).]

6 Our Contributions
Introduce semantic signatures that combine syntactic and semantic information. Develop a prototype for malware detection using semantic signatures. Show, empirically, that one semantic signature can detect a malware family.
To improve this situation, we developed a new detection method based on semantic signatures. In contrast to syntactic signatures, semantic signatures have more detection power by being less tied to a particular malware instance. Semantic information reduces the number of false positives. To show that these new signatures can be used in practice and are not just a theoretical construct, we built a prototype for malware detection that uses semantic signatures. Finally, we designed signatures that can detect instances from the same family of malware.

7 An Example
Goal: detect any mass-mailing virus. Detect capabilities. Detect self-propagation.
Possible syntactic signature: socket() connect() “EHLO” send()
Code (adapted from MyDoom.Q): s = socket( ... ); connect( s ); ... sprintf( buf, “EHLO %s”, dnsname ); send( s, buf );

8 Variant 1: String Manipulation
Hide known constants from the virus scanner. The syntactic signature does not match.
Syntactic signature: socket() connect() “EHLO” send()
Original: s = socket( ... ); connect( s ); ... sprintf( buf, “EHLO %s”, dnsname ); send( s, buf );
Variant (adapted from MyDoom.L): s = socket( ... ); connect( s ); ... sprintf( buf, “E%s %s”, “HLO”, dnsname ); send( s, buf );
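
Why the split constant defeats the signature can be sketched in a few lines of Python. This is a minimal illustration, not the scanner discussed in the talk: it treats the program as source text and the signature as an ordered list of literal tokens, and the name `matches_syntactic_signature` is hypothetical. Real scanners work on binaries rather than source text.

```python
def matches_syntactic_signature(code: str, signature: list[str]) -> bool:
    """Return True if every signature token occurs in `code`, in order."""
    position = 0
    for token in signature:
        found = code.find(token, position)
        if found == -1:
            return False
        position = found + len(token)
    return True


# Signature from the slide: socket() connect() "EHLO" send()
SIGNATURE = ["socket(", "connect(", "EHLO", "send("]

ORIGINAL = 's = socket( ... ); connect( s ); sprintf( buf, "EHLO %s", dnsname ); send( s, buf );'
VARIANT = 's = socket( ... ); connect( s ); sprintf( buf, "E%s %s", "HLO", dnsname ); send( s, buf );'

print(matches_syntactic_signature(ORIGINAL, SIGNATURE))  # True
print(matches_syntactic_signature(VARIANT, SIGNATURE))   # False
```

Both fragments send the same bytes at run time, but the literal “EHLO” never appears in the variant, so a purely textual match fails.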

9 Variant 2: String Obfuscation
Hide known constants using simple encryption techniques (e.g., ROT13, XOR). The syntactic signature does not match.
Syntactic signature: socket() connect() “EHLO” send()
Original: s = socket( ... ); connect( s ); ... sprintf( buf, “EHLO %s”, dnsname ); send( s, buf );
Variant (adapted from MyDoom.G): s = socket( ... ); connect( s ); ... cmdbuf = rot13( “RUYB %f” ); sprintf( buf, cmdbuf, dnsname ); send( s, buf );
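
The effect of the ROT13 trick can be checked directly. The sketch below uses Python's standard `codecs` module. The helper name `rot13` mirrors the slide, and the host name is a made-up placeholder. ROT13 is its own inverse, and it maps “EHLO %s” to “RUYB %f” (and “HELO %s” to “URYB %f”).

```python
import codecs


def rot13(text: str) -> str:
    """Apply ROT13; the cipher is its own inverse and leaves non-letters alone."""
    return codecs.encode(text, "rot13")


stored_constant = rot13("EHLO %s")       # what a scanner sees in the program
runtime_format = rot13(stored_constant)  # what the program computes at run time
command = runtime_format % "mail.example.org"  # hypothetical host name

print(stored_constant)  # RUYB %f
print(command)          # EHLO mail.example.org
```

The stored constant contains no “EHLO”, yet the command sent over the socket is unchanged.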

10 Semantic Signatures
Goal of attacker: same behavior in different form.
Malware instance: s = socket( ... ); connect( s ); ... sprintf( buf, “EHLO %s”, dnsname ); send( s, buf );
Semantic signature = syntactic info + semantic info
Syntactic info: X = socket(); connect(Y); send(Z,T);
Semantic info (diagram): X, Y, Z, T, “EHLO”

11 Power of Semantic Signatures
Detect any variant that uses the same sequence of instructions.
Program: s1 = socket( ... ); connect( s2 ); send( s3, buf );
Semantic signature = syntactic info + semantic info
Syntactic info: X = socket(); connect(Y); send(Z,T);
Semantic info (diagram): X, Y, Z, T, “EHLO”

12 Semantics-Aware Detection
We have a new detection model: semantic signatures combine syntactic and semantic information. Can we build it? Does it work? Where do signatures come from?

13 A Semantics-Aware Detector
Match the syntactic constructs, then check for the semantic information.
Semantic signature = syntactic info + semantic info
Syntactic info: X = socket(); connect(Y); send(Z,T);
Semantic info (diagram): X, Y, Z, T, “EHLO”
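
The two-phase idea (match syntax first, then check semantics) can be sketched as follows. This is an illustration under stated assumptions, not the prototype: programs are modeled as lists of calls, the matcher is greedy with no backtracking, the check on the “EHLO” constant is left out, and the `same_value` oracle stands in for the decision procedures. The constraint X = Y follows the “s2 has the same value as s1” example in the talk; X = Z is assumed by analogy. All names are hypothetical.

```python
from typing import Callable, Optional

# A call is (result variable or None, function name, argument variables).
Call = tuple[Optional[str], str, tuple[str, ...]]

# Syntactic part of the signature: X = socket(); connect(Y); send(Z,T);
TEMPLATE: list[Call] = [
    ("X", "socket", ()),
    (None, "connect", ("Y",)),
    (None, "send", ("Z", "T")),
]
# Assumed semantic part: template variables that must hold the same value.
SAME_VALUE = [("X", "Y"), ("X", "Z")]


def match_syntax(program: list[Call]) -> Optional[dict[str, str]]:
    """Phase 1: find the template calls in order and bind template variables."""
    bindings: dict[str, str] = {}
    index = 0
    for result, name, args in TEMPLATE:
        # Skip unrelated statements until the next call with this name.
        while index < len(program) and program[index][1] != name:
            index += 1
        if index == len(program):
            return None
        prog_result, _, prog_args = program[index]
        if result is not None:
            if prog_result is None:
                return None
            bindings[result] = prog_result
        if len(prog_args) < len(args):
            return None
        bindings.update(zip(args, prog_args))
        index += 1
    return bindings


def detect(program: list[Call], same_value: Callable[[str, str], bool]) -> bool:
    """Phase 2: check the value predicates on the bound program variables."""
    bindings = match_syntax(program)
    if bindings is None:
        return False
    return all(same_value(bindings[a], bindings[b]) for a, b in SAME_VALUE)


# s1 = socket( ... ); connect( s2 ); send( s3, buf );
PROGRAM: list[Call] = [
    ("s1", "socket", ()),
    (None, "connect", ("s2",)),
    (None, "send", ("s3", "buf")),
]
```

With an oracle that knows s1, s2, and s3 hold the same socket, `detect(PROGRAM, oracle)` succeeds. With an oracle that only compares variable names, it fails, which is the gap the semantic check is meant to close.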

14 Checking for Semantic Info
Program:
1 s1 = socket( ... );
2 a[ i++ ] = s1;
3 s2 = a[ i - 1 ];
4 connect( s2 );
...
Semantic signature: X = socket(); connect(Y); ... plus semantic info over X, Y, Z, T, “EHLO”.
Check that “s2 has the same value as s1”, that is, check the value predicate: value(s1 after line 1) == value(s2 before line 4)

15 Checking a Value Predicate
Program:
1 s1 = socket( ... );
2 a[ i++ ] = s1;
3 s2 = a[ i - 1 ];
4 connect( s2 );
...
Value predicate: value(s1 after line 1) == value(s2 before line 4)
Equivalent condition: lines 2 and 3 are a semantic nop with respect to the value predicate.
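
One cheap way to test such a predicate is random execution: run lines 2 and 3 on many random states and compare s2 with s1. The sketch below is a hand-written model of those two lines, not output of the prototype. Function names, array size, and trial count are arbitrary choices. A run with no counterexample is evidence, not a proof.

```python
import random


def run_lines_2_and_3(s1: int, i: int, a: list[int]) -> int:
    """Execute `a[i++] = s1; s2 = a[i - 1];` and return s2."""
    a[i] = s1
    i += 1
    s2 = a[i - 1]
    return s2


def predicate_holds_on_random_states(trials: int = 1000, seed: int = 0) -> bool:
    """Return False as soon as one random state refutes value(s1) == value(s2)."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = [rng.randrange(2**32) for _ in range(8)]  # arbitrary array contents
        i = rng.randrange(len(a))                     # arbitrary valid index
        s1 = rng.randrange(2**32)                     # arbitrary socket value
        if run_lines_2_and_3(s1, i, a) != s1:
            return False  # counterexample found: the predicate is refuted
    return True  # no counterexample found; this is not a proof


print(predicate_holds_on_random_states())  # True
```

Random execution can only say “no” with certainty. A definite “yes” needs one of the stronger tools.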

16 Tools for Checking Value Preds.
Instance of the program verification problem: does program P respect property φ?
Inputs: code fragment P; expressions e1, …, ek.
Tools, from cheaper to more powerful and more costly: pattern matching, random execution, Simplify theorem prover, UCLID model checker.
Answers shown in the diagram: Yes, No, Yes, Yes.
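
A natural way to combine tools of increasing cost is to try the cheap ones first and escalate only when they cannot decide. The sketch below shows that control flow only. How the prototype actually passes verdicts between tools is an assumption here, and both checkers are toy stand-ins with hard-coded answers rather than calls to Simplify or UCLID.

```python
from typing import Callable, Optional, Sequence

Verdict = Optional[bool]  # True = yes, False = no, None = cannot decide
Checker = Callable[[str], Verdict]

KNOWN_NOPS = {"nop", "xchg eax, eax"}  # toy library of known nop fragments


def pattern_matching(fragment: str) -> Verdict:
    """Cheapest check: answer yes only for fragments in the toy library."""
    return True if fragment in KNOWN_NOPS else None


def prover_stand_in(fragment: str) -> Verdict:
    """Placeholder for an expensive tool; the answer is hard-coded for the demo."""
    return fragment == "push eax; pop eax"


def check_in_cost_order(
    fragment: str, checkers: Sequence[tuple[str, Checker]]
) -> tuple[Verdict, str]:
    """Ask each checker in turn; stop at the first definite answer."""
    for name, checker in checkers:
        verdict = checker(fragment)
        if verdict is not None:
            return verdict, name
    return None, "undecided"


CHECKERS = [
    ("pattern matching", pattern_matching),
    ("prover stand-in", prover_stand_in),
]
```

For example, `check_in_cost_order("nop", CHECKERS)` is answered by the cheap checker, while `"push eax; pop eax"` falls through to the expensive one.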

17 Evaluation of Our Prototype
Developed signatures for several families of worms. No false positives. Improved resilience to common obfuscations.

18 Evaluation of Semantic Signatures
[Diagram: the prototype detector, with a decryption signature and a mass-mailing signature, shown against Netsky.B, Netsky.C, Netsky.D, Netsky.O, Netsky.P, Netsky.T, and Netsky.W.]
McAfee uses individual signatures for each worm. Semantic signatures provide forward detection.

19 Performance
Prototype is slower than commercial anti-virus tools. Plenty of room for improvement, e.g., the disassembler accounts for 25% of the time.
Malware family    Average running time    Std. deviation
Netsky            99.57 s                 41.01 s
Beagle            56.41 s                 40.72 s

20 Evaluation: False Positive Rate
Tested the semantic signatures on 2,000 benign Windows binaries. False positive rate: 0%

21 Evaluation: Obfuscation Resilience
Different types of garbage insertion were applied to Beagle.Y to obtain more variants.
Columns: obfuscation type; semantics-aware detection, average time; semantics-aware detection, detection rate; McAfee, detection rate.
Nop insertion: 74.81 s; 100%; 75%
Stack op. insertion: … s; 25% (one rate missing)
Math op. insertion: … s; 95%; 5%
These obfuscations reflect the capabilities of obfuscation toolkits and libraries available in the wild.

22 Limitations
Limited support for equivalent code sequences, e.g., a = b * 2 versus a = b << 1.
In the semantic signature, the order of instructions is significant, e.g., a = b + 1; c = c + 1 versus c = c + 1; a = b + 1.
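
Both example pairs are behaviorally equivalent, which random testing makes easy to see. The sketch below only illustrates that equivalence on 32-bit values. It does not describe how the prototype treats such sequences, and every name in it is hypothetical. Agreement on random inputs is evidence, not a proof.

```python
import random
from typing import Callable

MASK32 = 0xFFFFFFFF  # model 32-bit registers


def agree_on_random_inputs(
    f: Callable[..., tuple], g: Callable[..., tuple],
    arity: int, trials: int = 1000, seed: int = 0,
) -> bool:
    """Return False at the first random input on which f and g differ."""
    rng = random.Random(seed)
    for _ in range(trials):
        inputs = [rng.randrange(2**32) for _ in range(arity)]
        if f(*inputs) != g(*inputs):
            return False
    return True


def times_two(b: int) -> tuple:
    return ((b * 2) & MASK32,)      # a = b * 2


def shift_left(b: int) -> tuple:
    return ((b << 1) & MASK32,)     # a = b << 1


def order_1(b: int, c: int) -> tuple:
    a = (b + 1) & MASK32            # a = b + 1
    c = (c + 1) & MASK32            # c = c + 1
    return a, c


def order_2(b: int, c: int) -> tuple:
    c = (c + 1) & MASK32            # c = c + 1
    a = (b + 1) & MASK32            # a = b + 1
    return a, c


print(agree_on_random_inputs(times_two, shift_left, arity=1))  # True
print(agree_on_random_inputs(order_1, order_2, arity=2))       # True
```

A signature that fixes one instruction form or one instruction order would miss the other member of each pair, even though the results are the same.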

23 Where do we go from here?
Up to now: syntactic signature detection. This work: semantics-aware detection. Future: equivalent code sequences, detection of self-replication, better performance.

24 Semantics-Aware Malware Detection
Mihai Christodorescu, Somesh Jha (University of Wisconsin, Madison); Sanjit Seshia, Dawn Song, Randal Bryant (Carnegie Mellon University)

25 BACKUP

26 Implementation
[Diagram components: Program, IDA Pro, IR Conversion, Template, Detector, Decision Proc., Yes/No]

27 Semantics-Aware Detection
Templates encode program semantics, independent of the program code. Detection succeeds in the presence of obfuscations: code reordering, garbage insertion, register renaming.

28 Intermediate Representation (IR)
The Intel IA-32 instruction set is complex (CISC). Example: add eax, ebx becomes eax ← eax + ebx; zflag ← ( eax == 0 ). Detection on the IR can handle a limited type of equivalent instruction replacement.
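
The point of the IR is to make an instruction's side effects explicit. A minimal sketch of such a translation for the single `add` example follows. The function name `lift_add` and the textual IR format are hypothetical, and only the zero flag from the slide is modeled; the real instruction also updates other flags.

```python
def lift_add(instruction: str) -> list[str]:
    """Translate `add dst, src` into explicit IR assignments (zero flag only)."""
    mnemonic, operands = instruction.split(None, 1)
    if mnemonic.lower() != "add":
        raise ValueError(f"unsupported instruction: {instruction!r}")
    dst, src = (operand.strip() for operand in operands.split(","))
    return [
        f"{dst} <- {dst} + {src}",   # the arithmetic effect
        f"zflag <- ({dst} == 0)",    # the implicit flag update, made explicit
    ]


print(lift_add("add eax, ebx"))
# ['eax <- eax + ebx', 'zflag <- (eax == 0)']
```

Once every instruction is spelled out this way, two different instructions with the same effects produce comparable IR.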

29 Value Predicates
Program:
1 x = socket( ... );
2 connect( x );
3 ...
4 sprintf( tmp, “E%s %s”, “HLO”, dnsname );
5 send( x, tmp );
Template:
1 s = socket( ... );
2 connect( s );
3 send( s, buf );
Conditions: value( ) = value( ); value( ) = value( ); value( ) = “EHLO ”

30 Evaluation of Semantic Signatures
Created 2 templates describing decryption and mass-mailing features.
Netsky: B, C, D, O, P, T, W
Beagle: I, J, N, O, P, R, Y
Sober: A, C, D, E, F, G, I
Measured false positive rate. Measured resilience to obfuscation.

