
1 Polygraph: Automatically Generating Signatures for Polymorphic Worms. James Newsome*, Brad Karp*†, and Dawn Song*. †Intel Research Pittsburgh, *Carnegie Mellon University.

2 Internet Worms (James Newsome, May 2005). Definition: malicious code that propagates by exploiting software. No human interaction is needed, so a worm is able to spread very quickly: Slammer scanned 90% of the Internet in 10 minutes.

3 Proposed Defense Strategy. [Diagram: "Worm Detected!"] Systems cited: Honeycomb [Kreibich2003], Autograph [Kim2004], Earlybird [Singh2004].

4 Challenge: Polymorphic Worms. Polymorphic worms minimize invariant content: the payload is encrypted and the decryption routine is obfuscated. Polymorphic tools are already available (Clet, ADMmutate). Do good signatures for polymorphic worms exist? Can we generate them automatically?

5 Good News: Still Some Invariant Content. [Diagram: layout of a worm HTTP request. Labels: GET, URL, HTTP/1.1, Random Headers, Host: Payload Part 1, Random Headers, Host: Payload Part 2, Random Headers; NOP slide, Decryption Routine, Decryption Key, Encrypted Payload, \xff\xbf.] Protocol framing: needed to make the server go down the vulnerable code path. Overwritten return address: needed to redirect execution to the worm code. Decryption routine: needed to decrypt the main payload, BUT code obfuscation can eliminate patterns here.

6 Bad News: Previous Approaches Insufficient. Previous approaches use a common substring. Longest substring: “HTTP/1.1”, 93% false positive rate. Most specific substring: “\xff\xbf”, .008% false positive rate (10 / 125,301). [Diagram: worm request layout, repeated from slide 5.]
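The percentages quoted on the slides follow directly from the match counts on the 125,301-stream evaluation trace. A minimal sketch of that arithmetic; the function name is illustrative and not from the talk:

```python
def false_positive_rate(matches: int, total_streams: int) -> float:
    """Return the share of innocuous streams a signature matches, in percent."""
    return 100.0 * matches / total_streams


TOTAL_STREAMS = 125_301  # size of the evaluation trace quoted on the slides

# Match counts quoted on the slides for three kinds of signature.
for label, matches in [("substring", 10), ("conjunction", 3), ("subsequence", 1)]:
    print(f"{label}: {false_positive_rate(matches, TOTAL_STREAMS):.4f}%")
```

Rounded to four decimal places, the three counts give the .008%, .0024%, and .0008% figures that appear in the talk.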

7 What to do? No one substring is specific enough, BUT there are multiple substrings: protocol framing, the value used to overwrite the return address, and (parts of) poorly obfuscated code. Our approach: combine the substrings.

8 Outline. Substring-based signatures insufficient. Generating signatures: perfect (noiseless) classifier case (signature classes & algorithms, evaluation); imperfect classifier case (clustering extensions, evaluation). Attacking the system. Conclusion.

9 Goals. Identify classes of signatures that can accurately describe polymorphic worms, be used to filter a high-speed network line, and be generated automatically and efficiently. Design and implement a system to automatically generate signatures of these classes.

10 Polygraph Architecture. [Diagram components: Network Tap, Flow Classifier, Suspicious Flow Pool, Innocuous Flow Pool, Signature Generator, Worm Signatures.]

12 Signature Class (I): Conjunction. A signature is a set of strings (tokens). A flow matches the signature iff it contains all tokens in the signature. O(n) time to match (n is the flow length). Generated signature: “GET” and “HTTP/1.1” and “\r\nHost:” and “\r\nHost:” and “\xff\xbf”; .0024% false positive rate (3 / 125,301). [Diagram: worm request layout, repeated from slide 5.]
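Matching a conjunction signature can be sketched in a few lines. The function name and the two sample flows are made up for illustration. Because “\r\nHost:” is listed twice in the generated signature, this sketch assumes a repeated token must occur that many times. It also scans the flow once per token, so it is not the single O(n) pass the slide refers to.

```python
from collections import Counter


def matches_conjunction(flow: bytes, tokens: list[bytes]) -> bool:
    """True iff every token occurs in the flow at least as often as it is listed."""
    required = Counter(tokens)
    # bytes.count counts non-overlapping occurrences of the token.
    return all(flow.count(tok) >= n for tok, n in required.items())


# Token list taken from the generated signature on the slide.
SIGNATURE = [b"GET", b"HTTP/1.1", b"\r\nHost:", b"\r\nHost:", b"\xff\xbf"]

# Hypothetical flows, invented for this example.
worm = b"GET /x HTTP/1.1\r\nHost: aaaa\r\nHost: bbbb\xff\xbf\r\n\r\n"
benign = b"GET /index.html HTTP/1.1\r\nHost: example.org\r\n\r\n"

print(matches_conjunction(worm, SIGNATURE))    # True
print(matches_conjunction(benign, SIGNATURE))  # False
```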

13 Generating Conjunction Signatures. Use a suffix tree to find the set of tokens that occur in every sample of the suspicious pool and are at least 2 bytes long. Generation time is linear in the total byte size of the suspicious pool. Based on a well-known string processing algorithm [Hui1992].
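To make the token criteria concrete, here is a brute-force sketch that finds the maximal substrings of at least 2 bytes present in every sample. It deliberately swaps the suffix tree for a simple scan, so it is far from linear time. It also discards any token contained in a longer token, a simplification that may differ from the authors' algorithm. The function name and samples are invented.

```python
def common_tokens(samples: list[bytes], min_len: int = 2) -> set[bytes]:
    """Maximal substrings of length >= min_len that occur in every sample."""
    base = min(samples, key=len)  # any common substring must occur in the shortest sample
    found = set()
    for i in range(len(base)):
        best = None
        j = i + min_len
        # Extend the candidate while it still occurs in every sample.
        while j <= len(base) and all(base[i:j] in s for s in samples):
            best = base[i:j]
            j += 1
        if best is not None:
            found.add(best)
    # Keep only tokens that are not contained in a longer token.
    return {t for t in found if not any(t != u and t in u for u in found)}


# Hypothetical suspicious pool with two samples.
pool = [
    b"GET /a HTTP/1.1\r\nHost: x\xff\xbf",
    b"GET /bcd HTTP/1.1\r\nHost: yy\xff\xbf!",
]
print(common_tokens(pool))
```

With only two samples the tokens are longer than the protocol framing strictly requires (for example “GET /” rather than “GET”), which is the too-few-samples effect the talk evaluates later.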

14 Signature Class (II): Token Subsequence. A signature is an ordered set of tokens. A flow matches iff it contains all the tokens in the signature, in the given order. O(n) time to match (n is the flow length). Generated signature: GET.*HTTP/1.1.*\r\nHost:.*\r\nHost:.*\xff\xbf; .0008% false positive rate (1 / 125,301). [Diagram: worm request layout, repeated from slide 5.]
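Ordered matching can be sketched as a left-to-right scan that looks for each token after the end of the previous one. The function name and flows below are illustrative assumptions. The second flow contains every token but has “\xff\xbf” too early, so it is rejected here even though an unordered check on the same tokens would accept it.

```python
def matches_subsequence(flow: bytes, tokens: list[bytes]) -> bool:
    """True iff the tokens occur in the flow in the given order, without overlapping."""
    pos = 0
    for tok in tokens:
        idx = flow.find(tok, pos)  # leftmost occurrence at or after pos
        if idx < 0:
            return False
        pos = idx + len(tok)
    return True


# Tokens of the generated signature GET.*HTTP/1.1.*\r\nHost:.*\r\nHost:.*\xff\xbf
SIGNATURE = [b"GET", b"HTTP/1.1", b"\r\nHost:", b"\r\nHost:", b"\xff\xbf"]

# Hypothetical flows, invented for this example.
worm = b"GET /x HTTP/1.1\r\nHost: aaaa\r\nHost: bbbb\xff\xbf\r\n\r\n"
reordered = b"GET /\xff\xbf HTTP/1.1\r\nHost: a\r\nHost: b\r\n\r\n"

print(matches_subsequence(worm, SIGNATURE))       # True
print(matches_subsequence(reordered, SIGNATURE))  # False
```

Taking the leftmost occurrence each time is safe: if any in-order placement of the tokens exists, the leftmost one leaves at least as much room for the remaining tokens.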

15 Generating Token Subsequence Signatures. Use dynamic programming to find the longest common token subsequence (lcseq) between 2 samples in O(n^2) time [SmithWaterman1981]. Find the lcseq of the first two samples, then iteratively find the lcseq of the intermediate result and the next sample.
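The iterative procedure can be sketched with the textbook longest-common-subsequence recurrence. This sketch assumes each sample has already been reduced to a list of tokens; it does not reproduce the byte-level alignment of the cited algorithm. Folding pairwise is a heuristic and is not guaranteed to find the longest subsequence common to all samples. The names and token lists are illustrative.

```python
from functools import reduce


def lcseq(a: list[bytes], b: list[bytes]) -> list[bytes]:
    """Longest common subsequence of two token lists, O(len(a) * len(b))."""
    m, n = len(a), len(b)
    # dp[i][j] = length of the LCS of a[i:] and b[j:]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table forward to recover one optimal subsequence.
    out, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


# Hypothetical pre-tokenized samples; "xx", "yy", "zz" stand for random content.
samples = [
    [b"GET", b"HTTP/1.1", b"Host:", b"xx", b"Host:", b"\xff\xbf"],
    [b"GET", b"yy", b"HTTP/1.1", b"Host:", b"Host:", b"\xff\xbf"],
    [b"GET", b"HTTP/1.1", b"Host:", b"Host:", b"zz", b"\xff\xbf"],
]
signature = reduce(lcseq, samples)  # lcseq of the running result and the next sample
print(signature)
```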

16 Experiment: Signature Generation. How many worm samples do we need? With too few samples the signature is too specific, which causes false negatives. Experimental setup: a 25-day port 80 trace from the lab perimeter. Innocuous pool: first 5 days (45,111 streams). Suspicious pool: the Apache exploit described earlier, with non-invariant portions filled with random bytes. Signature evaluation: false positives on the last 10 days (125,301 streams); false negatives on 1000 generated worm samples.

17 Signature Generation Results.
2 worm samples: 100% FN.
3 to 100 worm samples: Conjunction 0% FN, .0024% FP; Subseq 0% FN, .0008% FP.
Signatures shown on the slide, in order:
GET.* HTTP/1.1\r\n.*\r\nHost:.*\xee\xb7.*\xb2\x1e.*\r\nHost:.*\xef\xa3.*\x8b\xf4.*\x89\x8b.*E\xeb.*\xff\xbf
GET.* HTTP/1.1\r\n.*\r\nHost:.*\r\nHost:.*\xff\xbf

18 Also Works for Binary Protocols. Created a polymorphic version of the BIND TSIG exploit used by the Li0n worm. Single-substring signatures: 2 bytes of the return address, .001% false positives; 3-byte TSIG marker, .067% false positives. Conjunction: 0% false positives. Subsequence: 0% false positives. Evaluated using a 1 million request trace from a DNS server that serves a major university and several ccTLDs.

20 Noise in Suspicious Flow Pool. What if the classifier has false positives?
3 worm samples: GET.* HTTP/1.1\r\n.*\r\nHost:.*\r\nHost:.*\xff\xbf
3 worm samples + 1 legit GET request: GET.* HTTP/1.1\r\n.*\r\nHost:
3 worm samples + a non-HTTP request: .*

21 Our Approach: Hierarchical Clustering. Used for multiple sequence alignment in bioinformatics [Gusfield1997]. Initialization: each sample is a cluster, and each cluster has a signature matching all samples in that cluster. Greedily merge clusters, minimizing the false positive rate as measured on the innocuous pool. Stop when any further merging results in significant false positives. Output the signature of each final cluster of sufficient size.
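The greedy merge loop can be sketched as follows, under several loud assumptions: each flow is reduced to the set of tokens it contains, a cluster's signature is the intersection of its members' token sets (a conjunction signature), and `max_fp` and `min_size` are made-up parameters standing in for "significant false positives" and "sufficient size". The real system's merge criterion may differ in detail.

```python
from itertools import combinations


def fp_rate(signature: frozenset, innocuous: list[frozenset]) -> float:
    """Fraction of innocuous flows that contain every token of the signature."""
    if not innocuous:
        return 0.0
    return sum(signature <= flow for flow in innocuous) / len(innocuous)


def cluster(samples, innocuous, max_fp=0.01, min_size=2):
    """Greedily merge clusters; return signatures of sufficiently large clusters."""
    clusters = [(frozenset(s), 1) for s in samples]  # (signature, cluster size)
    while len(clusters) > 1:
        best = None
        for (i, (sig_a, n_a)), (j, (sig_b, n_b)) in combinations(enumerate(clusters), 2):
            merged = sig_a & sig_b  # tokens shared by both clusters
            rate = fp_rate(merged, innocuous)
            if best is None or rate < best[0]:
                best = (rate, i, j, merged, n_a + n_b)
        rate, i, j, merged, size = best
        if rate > max_fp:  # every remaining merge is too general, so stop
            break
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((merged, size))
    return [sig for sig, size in clusters if size >= min_size]


# Hypothetical pools: three worm samples, two noise samples, 100 innocuous flows.
worms = [
    {b"GET", b"HTTP/1.1", b"\xff\xbf", b"\xde\xad"},
    {b"GET", b"HTTP/1.1", b"\xff\xbf", b"\xde\xad"},
    {b"GET", b"HTTP/1.1", b"\xff\xbf"},
]
noise = [{b"GET", b"HTTP/1.1"}, {b"POST", b"HTTP/1.1"}]
innocuous = [frozenset({b"GET", b"HTTP/1.1"})] * 60 + [frozenset({b"POST", b"HTTP/1.1"})] * 40

signatures = cluster(worms + noise, innocuous)
print(signatures)
```

In this toy run the three worm samples merge into one cluster, while merging in either noise sample would match at least 60% of the innocuous pool, so the noise samples stay in singleton clusters and are dropped.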

22 Hierarchical Clustering. [Diagram: Worm Sample 1, Innoc Sample 1, Worm Sample 2, Innoc Sample 2, Worm Sample 3.] Merge candidate: common substrings HTTP/1.1, GET, …; high false positive rate!

24 Hierarchical Clustering. [Diagram: Worm Sample 1, Innoc Sample 1, Worm Sample 2, Innoc Sample 2, Worm Sample 3.] Merge candidate: common substrings HTTP/1.1, GET, \xff\xbf, \xde\xad; low false positive rate (but high false negative rate).

25 Hierarchical Clustering. [Diagram: Worm Sample 1, Innoc Sample 1, Worm Sample 2, Innoc Sample 2, Worm Sample 3, with a cluster drawn.] Token sets shown: HTTP/1.1, GET, \xff\xbf, \xde\xad; and HTTP/1.1, GET, \xff\xbf.

26 Clustering Evaluation (with noise). The suspicious pool consists of 5 polymorphic worm samples and a varying number of noise samples. Noise samples are chosen uniformly at random from the evaluation trace. Clustering uses the innocuous pool to estimate the false positive rate.

27 Clustering Results. Values are Fpos / Fneg; rows with several pairs list several generated signatures.
Noise   Conjunction (Fpos / Fneg)                        Subseq (Fpos / Fneg)
0%      .0024% / 0%                                      .0008% / 0%
38%     .0024% / 0%                                      .0008% / 0%
50%     .0024% / 0%                                      .0008% / 0%
80%     .0024% / 0%; .7470% / 100%                       .0008% / 0%; 1.109% / 100%
90%     .0024% / 0%; .3384% / 100%; .4150% / 100%        .0008% / 0%; .6903% / 100%; 1.716% / 100%

29 Overtraining Attacks. Conjunction and Subsequence can be tricked into overtraining. Red herring attack: include extra fixed tokens, then remove them over time. Result: have to keep generating new signatures. Coincidental pattern attack: create ‘coincidental’ patterns given a small set of worm samples. Result: more samples (50+) are needed to generate a low-false-negative signature.

30 Solution: Threshold Matching. A signature classifies a flow as a worm if enough tokens are present. Implementation: Bayes signatures. Assign each token a score based on Bayes' law. Choose the highest acceptable false positive rate. Choose the threshold that gets at most that rate in the innocuous training pool. Properties: signatures are generated and matched in linear time; not susceptible to overtraining attacks; don't need clustering; you get the false positive rate you specify; currently does not use ordering.
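Threshold selection can be sketched as follows. Assumptions in this sketch: a flow's score is the sum of the scores of the tokens it contains, a flow matches when its score is strictly above the threshold, and the acceptable false positive rate is expressed as a count of innocuous flows (`max_false_positives`, a made-up parameter). The token scores are the ones quoted for the Bayes signature elsewhere in the talk; the pool is invented.

```python
def flow_score(flow: bytes, scores: dict[bytes, float]) -> float:
    """Sum of the scores of all tokens present in the flow."""
    return sum(score for tok, score in scores.items() if tok in flow)


def choose_threshold(scores, innocuous, max_false_positives: int) -> float:
    """Lowest threshold that lets at most max_false_positives innocuous flows match."""
    ranked = sorted((flow_score(f, scores) for f in innocuous), reverse=True)
    if max_false_positives >= len(ranked):
        return float("-inf")  # the whole pool is allowed to match
    # Only flows scoring strictly above this value match: at most max_false_positives.
    return ranked[max_false_positives]


scores = {b"GET": 0.0035, b"Host:": 0.0022, b"HTTP/1.1": 0.11, b"\xff\xbf": 3.15}

# Hypothetical innocuous pool: 99 ordinary requests and one odd request.
ordinary = b"GET / HTTP/1.1\r\nHost: example.org\r\n\r\n"
odd = b"GET /\xff\xbf HTTP/1.1\r\nHost: example.org\r\n\r\n"
pool = [ordinary] * 99 + [odd]

strict = choose_threshold(scores, pool, 0)   # no innocuous flow may match
lenient = choose_threshold(scores, pool, 1)  # one innocuous flow may match
print(strict, lenient)
```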

32 Remaining False Positives. The conjunction signature has 3 false positives; 1 of these is also matched by the subsequence signature. What is causing these? Would it be so bad if 3 legitimate requests were filtered out every 10 days?

33 The Offending Request
GET /Download/GetPaper.php?paperId=XXX HTTP/1.1
…
Host: nsdi05.cs.washington.edu\r\n
…
POST /Author/UploadPaper.php HTTP/1.1\r\n
…
Host: nsdi05.cs.washington.edu\r\n
…

34 Possible Fixes. Use protocol knowledge: match at the request level instead of the TCP flow level, and require that \xff\xbf be part of the Host header. Disadvantage: needs protocol knowledge. Use distance between tokens: makes signatures more specific. Disadvantage: risks more overtraining attacks.
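The protocol-knowledge fix could look roughly like the check below, which only accepts “\xff\xbf” when it appears on a Host header line of a single request. It assumes the flow has already been split into individual HTTP requests and that headers end at the first blank line. The function name and both requests are hypothetical; in particular, the upload body is invented and is not the actual offending traffic.

```python
def marker_in_host_header(request: bytes, marker: bytes = b"\xff\xbf") -> bool:
    """True iff some Host header line of this one request contains the marker."""
    head = request.split(b"\r\n\r\n", 1)[0]  # request line and headers only
    return any(
        line.lower().startswith(b"host:") and marker in line
        for line in head.split(b"\r\n")
    )


# Hypothetical requests, invented for this example.
worm = b"GET /x HTTP/1.1\r\nHost: aaaa\r\nHost: bbbb\xff\xbf\r\n\r\n"
upload = (
    b"POST /Author/UploadPaper.php HTTP/1.1\r\n"
    b"Host: nsdi05.cs.washington.edu\r\n\r\n"
    b"binary body that happens to contain \xff\xbf"
)

print(marker_in_host_header(worm))    # True
print(marker_in_host_header(upload))  # False
```

As the slide notes, the cost of this precision is that the matcher now has to understand HTTP framing.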

35 Future Work. Defending against overtraining. Further reducing false positives: these could be reduced by learning more features (such as offsets), but this increases the risk of overtraining. Promising solution: semantic analysis. Automatically analyze how the worm exploit works and only use features that must be present. First steps in Newsome05 (NDSS); currently extending this work (Brumley-Newsome-Song).

36 Conclusions. Key observation: content variability is limited by the nature of the software vulnerability. Have shown that accurate signatures can be automatically generated for polymorphic worms, and demonstrated low false positives with real exploits on real traffic traces.

37 Thanks! Questions? Contact: jnewsome@ece.cmu.edu

39 Coincidental Pattern Attack. Conjunction & Subsequence may overtrain. Coincidental pattern attack: for non-invariant bytes, choose ‘a’ or ‘b’. Result: the suspicious pool has many substrings in common, of the form ‘aabba’, ‘babba’, … Unseen worm samples will have many of these substrings, but not every one.
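A small simulation illustrates why this works against the attacker's victim. It is only a sketch: the sample length, the 8-byte window, the pool sizes, and the seed are arbitrary assumptions, and real worm samples would embed these bytes in a larger request. Substrings shared by a small pool look like invariant tokens, but many of them disappear as more samples are added.

```python
import random


def make_sample(rng: random.Random, length: int = 200) -> bytes:
    """Non-invariant portion of a worm sample: every byte is 'a' or 'b'."""
    return bytes(rng.choices(b"ab", k=length))


def common_ngrams(samples: list[bytes], n: int = 8) -> set[bytes]:
    """All n-byte substrings that occur in every sample."""
    sets = [{s[i:i + n] for i in range(len(s) - n + 1)} for s in samples]
    return set.intersection(*sets)


rng = random.Random(0)  # fixed seed so the run is repeatable
pool = [make_sample(rng) for _ in range(50)]

coincidental = common_ngrams(pool[:3])  # what a 3-sample pool appears to share
surviving = common_ngrams(pool)         # what all 50 samples really share
print(len(coincidental), len(surviving))
```

Any substring shared by all 50 samples is also shared by the first 3, so the second count can never exceed the first. The gap between them is the set of coincidental tokens that would cause false negatives on unseen samples.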

40 Results with “Coincidental Pattern Attack”. [Plot: false negatives versus suspicious pool size.]

41 Results: Multiple Worms + Noise. Values are Fpos / Fneg; rows with several pairs list several generated signatures.
Noise   Conjunction                                      Subseq                                           Bayes
0%      .0024% / 0%                                      .0008% / 0%                                      .008% / 0%
38%     .0024% / 0%                                      .0008% / 0%                                      .008% / 0%
50%     .0024% / 0%                                      .0008% / 0%                                      .008% / 0%
80%     .0024% / 0%; .7470% / 100%                       .0008% / 0%; 1.109% / 100%                       .008% / 0%
90%     .0024% / 0%; .3384% / 100%; .4150% / 100%        .0008% / 0%; .6903% / 100%; 1.716% / 100%        10% / 100%

42 The Innocuous Pool. Used to determine how often tokens appear in legitimate traffic and to estimate signature false positive rates. Goals: representative of current traffic; does not contain worm flows. Can be generated by taking a relatively old trace and filtering out known worms and exploits.

43 Key Algorithm: Token Extraction. Need to identify useful tokens: substrings that occur in worm samples. Problem: find all substrings that occur in at least k out of n samples and are at least x bytes long. Can be solved in time linear in the total length of the samples using a suffix tree.
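The problem statement can be made concrete with a brute-force counter. This sketch enumerates every substring of every sample, so it is quadratic per sample and only suitable for tiny inputs; it stands in for the suffix-tree solution rather than reproducing it. The function name and samples are illustrative.

```python
from collections import Counter


def frequent_substrings(samples: list[bytes], k: int, min_len: int) -> set[bytes]:
    """All substrings of length >= min_len that occur in at least k samples."""
    counts = Counter()
    for s in samples:
        # A set, so each substring is counted once per sample.
        seen = {s[i:j] for i in range(len(s)) for j in range(i + min_len, len(s) + 1)}
        counts.update(seen)
    return {sub for sub, c in counts.items() if c >= k}


# Hypothetical samples: "XY" occurs in three of the four.
samples = [b"abXYcd", b"XYzz", b"qqXY", b"none"]
print(frequent_substrings(samples, k=3, min_len=2))
```

Setting k equal to the number of samples gives the "occur in every sample" criterion used for conjunction signatures, while a smaller k yields the tokens that occur in a significant number of samples.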

44 Signature Class (III): Bayes. Use a Bayes classifier in which the presence of a token is a feature. Hence, each token has a score. Generated signature: (‘GET’: .0035, ‘Host:’: .0022, ‘HTTP/1.1’: .11, ‘\xff\xbf’: 3.15), threshold = 1.99; .008% false positive rate (10 / 125,301).
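Applying the generated signature can be sketched as a score sum compared against the threshold. The scores and threshold are the ones quoted on the slide. Summing the scores of the tokens present, and using a strict comparison, are assumptions of this sketch, and the two flows are invented.

```python
# Token scores and threshold as quoted on the slide.
BAYES_SIGNATURE = {b"GET": 0.0035, b"Host:": 0.0022, b"HTTP/1.1": 0.11, b"\xff\xbf": 3.15}
THRESHOLD = 1.99


def bayes_score(flow: bytes, scores: dict[bytes, float]) -> float:
    """Sum of the scores of all tokens present in the flow."""
    return sum(score for tok, score in scores.items() if tok in flow)


def matches_bayes(flow: bytes, scores=BAYES_SIGNATURE, threshold=THRESHOLD) -> bool:
    """Classify the flow as a worm when its score exceeds the threshold."""
    return bayes_score(flow, scores) > threshold


# Hypothetical flows, invented for this example.
worm = b"GET /x HTTP/1.1\r\nHost: aaaa\r\nHost: bbbb\xff\xbf\r\n\r\n"
benign = b"GET /index.html HTTP/1.1\r\nHost: example.org\r\n\r\n"

print(matches_bayes(worm), matches_bayes(benign))  # True False
```

With these scores the protocol-framing tokens together contribute only about 0.12, so in practice a flow crosses the 1.99 threshold only if it contains “\xff\xbf”.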

45 Generating Bayes Signatures. Use a suffix tree to find tokens that occur in a significant number of samples. Determine probabilities: Pr(worm) = Pr(~worm) = .5; Pr(substring|worm) from the suspicious pool; Pr(substring|~worm) from the innocuous pool. Set a “certainty threshold” c: the signature matches a flow if the Bayes formula identifies it as more than c% likely to be a worm. Choose c that results in few (< 5) false positives in the innocuous pool.
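One way to turn these probabilities into per-token scores is a log-likelihood ratio, sketched below. The add-one smoothing, the use of only the tokens that are present, and all names and pools are assumptions of this sketch rather than details given in the talk. With equal priors and tokens treated as independent, the posterior is 1 / (1 + exp(-S)), where S is the sum of the scores of the tokens present, so a certainty threshold c corresponds to a threshold on S.

```python
import math


def token_scores(tokens, suspicious, innocuous) -> dict[bytes, float]:
    """Score each token by log(Pr(token|worm) / Pr(token|~worm)), with add-one smoothing."""
    scores = {}
    for tok in tokens:
        p_worm = (sum(tok in f for f in suspicious) + 1) / (len(suspicious) + 2)
        p_innoc = (sum(tok in f for f in innocuous) + 1) / (len(innocuous) + 2)
        scores[tok] = math.log(p_worm / p_innoc)
    return scores


def worm_probability(flow: bytes, scores: dict[bytes, float]) -> float:
    """Posterior Pr(worm | tokens present), assuming Pr(worm) = Pr(~worm) = .5."""
    total = sum(score for tok, score in scores.items() if tok in flow)
    return 1.0 / (1.0 + math.exp(-total))


# Hypothetical pools, invented for this example.
worm = b"GET /x HTTP/1.1\r\nHost: aaaa\r\nHost: bbbb\xff\xbf\r\n\r\n"
benign = b"GET /index.html HTTP/1.1\r\nHost: example.org\r\n\r\n"
scores = token_scores([b"GET", b"\xff\xbf"], [worm] * 3, [benign] * 50)

print(worm_probability(worm, scores), worm_probability(benign, scores))
```

Tokens that are as common in innocuous traffic as in worm traffic, such as “GET”, end up with scores near zero, which matches the small framing scores in the generated signature.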

46 Innocuous Pool Poisoning. Before releasing the worm, the attacker determines what the worm's signature will be and floods the Internet with innocuous requests that match it. These are eventually included in the innocuous training pool. The attacker then releases the worm. Polygraph will generate a signature for the worm, see that it causes many false positives in the innocuous pool, and reject the signature. Solution: use a relatively old trace for the innocuous pool. Drawback: hierarchical clustering generates more spurious signatures.

