Forensic Analysis of Toolkit-Generated Malicious Programs Yasmine Kandissounon TSYS School of Computer Science Columbus State University 2009 ACM Mid-Southeast.

1 Forensic Analysis of Toolkit-Generated Malicious Programs
Yasmine Kandissounon
TSYS School of Computer Science, Columbus State University
2009 ACM Mid-Southeast Conference, Gatlinburg, Tennessee, November 12-13, 2009

2 State of the Threat (Jan – Jun 2009)
Microsoft Security Intelligence Report:
– 115,854,807 infections in the first half of 2009
– 94,985,967 infections in the second half of 2008
→ An increase of about 22% (2008)
AVTest Labs:
– 15,000 to 20,000 new specimens analyzed each day (4 times as many as in 2006, 15 times as many as in 2005) (ESET)
Talented teams of programmers
Automated malware creation:
– W32.Evol, W32.Simile, W32.NGVCK, W32.VCL, etc.

3 What Does the AV Industry Need?
Automation
– (Szor 2005) The need for analysis by humans is a major bottleneck!
Ability to quickly and accurately detect new malware
– (Team Cymru, 2008) Of 1000 new samples submitted, only 37% were detected by commercial AV products!
“Good” generic signatures are badly needed
– (Kaspersky Lab 2008) Windows Explorer was flagged as malicious
– AVIEN’s HARLEY: on average, current detection rates (using generic signatures) are no better than 70%-80%

4 Our Problem: Engine Generated Malware
[Diagram labels: ENGINE; VIRUS SAMPLE; Variant 1, Variant 2, Variant 3, ..., Variant n; Network; Malware detector (In, Out); Signature Database (Virus Definitions)]
Too many signatures challenge the detector

5 Solution: Use Engine Signature
[Diagram labels: ENGINE; VIRUS SAMPLE; Variant 1, Variant 2, Variant 3, ..., Variant n; Internet; Network; Malware detector (In, Out); Engine Signature]
Use one small piece of info about the engine to detect all of the variants.

6 Malware Generation as a Hidden Markov Model
[State diagram: states NOP, MOV, PUSH, CALL, JMP, and *; edges labeled with transition probabilities]
Transition matrix = engine signature (choice of relevant instructions = 5 most frequent instructions)

        NOP   MOV   PUSH  CALL  JMP   *
NOP     0.00  0.33  0.33  0.00  0.00  0.33
MOV     0.21  0.29  0.14  0.21  0.00  0.07
PUSH    0.00  0.60  0.40  0.00  0.00  0.00
CALL    0.00  0.67  0.00  0.00  0.00  0.33
JMP     0.00  0.50  0.00  0.00  0.50  0.00
*       0.00  0.67  0.00  0.00  0.33  0.00

Instruction sequence:
MOV JNZ MOV PUSH MOV NOP MOV NOP ADD JMP MOV NOP PUSH JZ PUSH MOV CALL MOV CALL SUB MOV PUSH MOV CALL POP MOV
Take only the n most frequent instructions, for some n:
MOV * MOV PUSH MOV NOP MOV NOP * JMP MOV NOP PUSH * PUSH MOV CALL MOV CALL * MOV PUSH MOV CALL POP MOV
Transition matrix is n+1 by n+1 and represents the engine
→ Problem: Find the smallest n that will induce the best accuracy
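The construction on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `transition_matrix` is hypothetical, ties among equally frequent opcodes are broken by first occurrence (so the chosen states may differ from the slide's), and each row is normalized by the number of transitions leaving that state.

```python
from collections import Counter

def transition_matrix(opcodes, n):
    """Return (states, matrix) for a first-order chain over the n most
    frequent opcodes plus a catch-all '*' state (hypothetical helper)."""
    # Pick the n most frequent opcodes; ties fall back to first occurrence.
    relevant = [op for op, _ in Counter(opcodes).most_common(n)]
    states = relevant + ["*"]
    index = {s: i for i, s in enumerate(states)}
    # Replace every non-relevant opcode with '*'.
    reduced = [op if op in index else "*" for op in opcodes]
    # Count transitions between consecutive states.
    counts = [[0] * len(states) for _ in states]
    for a, b in zip(reduced, reduced[1:]):
        counts[index[a]][index[b]] += 1
    # Normalize each row; a state with no outgoing transition stays all zeros.
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return states, matrix

if __name__ == "__main__":
    seq = ("MOV JNZ MOV PUSH MOV NOP MOV NOP ADD JMP MOV NOP PUSH JZ PUSH "
           "MOV CALL MOV CALL SUB MOV PUSH MOV CALL POP MOV").split()
    states, matrix = transition_matrix(seq, 5)
    for state, row in zip(states, matrix):
        print(state, [round(p, 2) for p in row])
```

The result is an (n+1) by (n+1) matrix, matching the size stated on the slide.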

7 Subjects and Preparation
100 malware samples of W32.Evol and W32.Simile (metamorphic viruses)
100 malware samples generated by NGVCK
100 malware samples generated by VCL
– Source: www.vx.netlux.org
100 benign samples
– Source: sourceforge.net, download.com, installation of Windows Vista

8 Classification Method
For each sample set (family)
– Identify a training subset of size 30
– Compute the transition matrix for each trainer
– Take the average of these
– This average is the engine signature for the sample set
For each instance not used for training
– Compute the transition matrix of the instance
– Compute the Euclidean distance between the instance's matrix and each of the engine signatures generated in the above stage
– The family whose signature is closest to this instance’s transition matrix is declared to be the instance’s family. If there are ties, choose one at random.
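The two stages above can be sketched as follows. This is a minimal sketch under assumptions, not the authors' code: all function names are hypothetical, the list of relevant instructions is assumed to be fixed and shared across families so that matrices are comparable, and the Euclidean distance is taken over all matrix entries.

```python
import math
import random

def transition_matrix(opcodes, relevant):
    """Row-normalized transition matrix over `relevant` plus a '*' state."""
    states = list(relevant) + ["*"]
    index = {s: i for i, s in enumerate(states)}
    star = index["*"]
    reduced = [index.get(op, star) for op in opcodes]
    counts = [[0] * len(states) for _ in states]
    for a, b in zip(reduced, reduced[1:]):
        counts[a][b] += 1
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in counts]

def engine_signature(training_sequences, relevant):
    """Entry-wise average of the trainers' transition matrices."""
    mats = [transition_matrix(seq, relevant) for seq in training_sequences]
    size = len(relevant) + 1
    return [[sum(m[i][j] for m in mats) / len(mats) for j in range(size)]
            for i in range(size)]

def euclidean(a, b):
    """Euclidean distance between two equally sized matrices."""
    return math.sqrt(sum((x - y) ** 2
                         for ra, rb in zip(a, b)
                         for x, y in zip(ra, rb)))

def classify(opcodes, signatures, relevant, rng=random):
    """Return the family whose signature is closest; ties broken at random."""
    m = transition_matrix(opcodes, relevant)
    dists = {family: euclidean(m, sig) for family, sig in signatures.items()}
    best = min(dists.values())
    return rng.choice([f for f, d in dists.items() if d == best])
```

Here `signatures` is a dictionary mapping a family name to the output of `engine_signature` for that family's 30 trainers. Classifying an instance then costs one distance computation per family, regardless of how many trainers were used.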

9 Average Matrix Classifier (1st-Order Markov Chain)
Results:

Relevant instructions   Misclassifications
20                      5.33%
25                      7.33%
10                      8%
15                      11%

10 K-Nearest Neighbor Classification
Concept
Results
Limitations:
– Time
– Space
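The slide gives no details of the k-nearest-neighbor setup, so the following is only a generic sketch of the concept applied to transition matrices. The function names, the choice k=3, and the majority vote are assumptions, not taken from the presentation.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equally sized matrices."""
    return math.sqrt(sum((x - y) ** 2
                         for ra, rb in zip(a, b)
                         for x, y in zip(ra, rb)))

def knn_classify(matrix, training, k=3):
    """Majority vote among the k training matrices closest to `matrix`.

    `training` is a list of (family, matrix) pairs (hypothetical layout).
    """
    # Every stored training matrix is compared against the query.
    nearest = sorted(training, key=lambda item: euclidean(matrix, item[1]))[:k]
    votes = Counter(family for family, _ in nearest)
    return votes.most_common(1)[0][0]
```

The sketch makes the two listed limitations concrete: every training matrix has to be kept in memory (space), and every classification computes a distance to each of them (time), whereas an averaged signature needs one matrix and one distance per family.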

11 Discussion
Average Matrix vs. k-NN
– Time and space efficiency
– Accuracy
– Behavioral characteristics not taken into account
Relevant instructions (RIs): ideal number of RIs is in the vicinity of 20

12 Conclusion and Further Work
Conclusion
– Good accuracy (8% misclassifications)
– Small signature (11 by 11 matrix)
– Fast detection (12 min for 150 tests)
Further Work
– 2nd order
– Work with more samples
– Work with other families of malware
– Different ways of choosing the relevant instructions
– Try a different distance measure

13 References
http://www.microsoft.com/security/portal/Threat/SIR.aspx
http://www.washingtonpost.com/wp-dyn/content/article/2008/03/19/AR2008031901439.html
http://packetstormsecurity.org/mag/40hex/40HEX-10/40HEX-10.001J
http://www.research.ibm.com/antivirus/SciPapers/Tesauro/NeuralNets.html. Last retrieved April 12, 2009
M.R. Chouchane, Approximate Detection of Machine-morphed Malicious Programs. Ph.D. Dissertation (2008)
Chouchane and Lakhotia, Using Engine Signature to Detect Metamorphic Malware. WORM (2006)

14 References
Ivan Krsul and Eugene H. Spafford, Authorship Analysis: Identifying the Author of a Program. Computers & Security (1997)
Peter Szor, The Art of Computer Virus Research and Defense, Chapter 7 (2005)
Wing Wong and Mark Stamp, Hunting for Metamorphic Engines. J Comput Virol (2006)
www.vx.netlux.org, last retrieved April 12, 2009

