METAMORPHIC SOFTWARE FOR GOOD AND EVIL Wing Wong & Mark Stamp November 20, 2006.



Outline
I. Metamorphic software
   - What is it?
   - Good and evil uses
II. Metamorphic virus construction kits
III. How effective are metamorphic engines?
   - How to compare two pieces of code?
   - Similarity of viruses/normal code
IV. Can we detect metamorphic viruses?
   - Commercial virus scanners
   - HMMs and similarity index
V. Conclusion

PART I Metamorphic Software

What is Metamorphic Software?
- Software is metamorphic provided
  - All copies do the same thing
  - Internal structure differs
- Today almost all software is cloned
- “Good” metamorphic software…
  - Mitigate buffer overflow attacks
- “Bad” metamorphic software…
  - Avoid virus/worm signature detection

Metamorphic Software for Good?
- Suppose a program has a buffer overflow
- If we clone the program
  - One attack breaks every copy
  - Break once, break everywhere (BOBE)
- If instead we have metamorphic copies
  - Each copy still has a buffer overflow
  - One attack does not work against every copy
  - BOBE-resistant
  - Analogous to genetic diversity in biology
- A little metamorphism does a lot of good!

Metamorphic Software for Evil?
- Cloned virus/worm can be detected
  - Common signature on every copy
  - Detect once, detect everywhere (DODE?)
- If instead virus/worm is metamorphic
  - Each copy has different signature
  - Same detection may not work against every copy
  - Provides DODE-resistance?
  - Analogous to genetic diversity in biology
- Effective use of metamorphism here is tricky!

Crypto Analogy
- Consider WWII ciphers
- German Enigma
  - Broken by Polish and British cryptanalysts
  - Design was (mostly) known to cryptanalysts
- Japanese Purple
  - Broken by American cryptanalysts
  - Design was (mostly) unknown to cryptanalysts

Crypto Analogy
- Cryptanalysis: break a (known) cipher
- Diagnosis: determine how an unknown cipher works (from ciphertext)
- Which was the greater achievement, breaking Enigma or Purple?
  - Cryptanalysis of Enigma was harder
  - Diagnosis of Purple was harder
- Can make a reasonable case for either…

Crypto Analogy
- What does this have to do with metamorphic software?
- Suppose the good guys generate metamorphic copies of software
- Bad guys can attack individual copies
- Can bad guys attack all copies?
  - If they can diagnose our metamorphic generator, maybe
  - But that’s a diagnosis problem…

Crypto Analogy
- What about the case where bad guys write metamorphic code?
  - Metamorphic viruses, for example
- Do good guys need to solve the diagnosis problem?
  - If so, good guys are in trouble
- Not if good guys “only” need to detect the metamorphic code (not diagnose)
- Not claiming the good guys’ job is easy
- Just claiming that there is hope…

Virus Evolution
- Viruses first appeared in the 1980s
  - Fred Cohen
- Viruses must avoid signature detection
  - Virus can alter its “appearance”
- Techniques employed
  - encryption
  - polymorphic
  - metamorphic

Virus Evolution – Encryption
- Virus consists of
  - decrypting module (decryptor)
  - encrypted virus body
- A different encryption key yields a different virus body signature
- Weakness
  - decryptor can be detected

Virus Evolution – Polymorphism
- Try to hide signature of decryptor
- Can use code emulator to decrypt putative virus dynamically
- Decrypted virus body is constant
  - Once (partially) decrypted, signature detection is possible

Virus Evolution – Metamorphism
- Change virus body
- Mutation techniques:
  - permutation of subroutines
  - insertion of garbage/jump instructions
  - substitution of instructions

PART II Virus Construction Kits

Virus Construction Kits – PS-MPC
- According to Peter Szor:
  “… PS-MPC [Phalcon/Skism Mass-Produced Code generator] uses a generator that effectively works as a code-morphing engine… the viruses that PS-MPC generates are not [only] polymorphic, but their decryption routines and structures change in variants…”

Virus Construction Kits – G2
- From the documentation of G2 (Second Generation virus generator):
  “… different viruses may be generated from identical configuration files…”

Virus Construction Kits – NGVCK
- From the documentation for NGVCK (Next Generation Virus Creation Kit):
  “… all created viruses are completely different in structure and opcode… impossible to catch all variants with one or more scanstrings… nearly 100% variability of the entire code”
- Oh, really?

PART III How Effective Are Metamorphic Engines?

How We Compare Two Pieces of Code

Virus Families – Test Data
- Four generators, 45 viruses
  - 20 viruses by NGVCK
  - 10 viruses by G2
  - 10 viruses by VCL32
  - 5 viruses by MPCGEN
- 20 normal utility programs from the Cygwin bin directory

Similarity within Virus Families – Results

NGVCK Similarity to Virus Families
- NGVCK versus other viruses
  - 0% similar to G2 and MPCGEN viruses
  - 0 – 5.5% similar to VCL32 viruses (43 out of 100 comparisons have score > 0)
  - 0 – 1.2% similar to normal files (only 8 out of 400 comparisons have score > 0)

NGVCK Metamorphism/Similarity
- NGVCK
  - By far the highest degree of metamorphism of any kit tested
  - Virtually no similarity to other viruses or normal programs
  - Undetectable???

PART IV Can Metamorphic Viruses Be Detected?

Commercial Virus Scanners
- Tested three virus scanners
  - eTrust (version not given)
  - avast! antivirus version 4.7
  - AVG Anti-Virus version 7.1
- Each scanned 37 files
  - 10 NGVCK viruses
  - 10 G2 viruses
  - 10 VCL32 viruses
  - 7 MPCGEN viruses

Commercial Virus Scanners
- Results
  - eTrust and avast! each detected 17 viruses (G2 and MPCGEN)
  - AVG detected 27 viruses (G2, MPCGEN and VCL32)
  - None of the NGVCK viruses were detected by the scanners tested

Virus Detection with HMMs
- Use hidden Markov models (HMMs) to represent statistical properties of a set of metamorphic virus variants
  - Train the model on a family of metamorphic viruses
  - Use trained model to determine whether a given program is similar to the viruses the HMM represents

Virus Detection with HMMs – Data
- Data set
  - 200 NGVCK viruses (160 for training, 40 for testing)
- Comparison set
  - 40 normal exes from Cygwin
  - 25 other “non-family” viruses (G2, MPCGEN and VCL32)
- 25 HMM models generated and tested

Virus Detection with HMMs – Methodology

Virus Detection with HMMs – Results
- Detect some other viruses “for free”

Virus Detection with HMMs
- Summary of experimental results
  - All normal programs distinguished
  - VCL32 viruses had scores close to NGVCK family viruses
  - With proper threshold, 17 HMM models had 100% detection rate and 10 models had 0% false positive rate
  - No significant difference in performance between HMMs with 3 or more hidden states
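The "proper threshold" idea in the summary above can be sketched as follows. This is only an illustration: the function name `rates` and every score below are made up (the slides do not list raw scores), and a score here stands for something like a log-likelihood per opcode, where higher means "more like the family".

```python
from typing import Sequence, Tuple


def rates(family_scores: Sequence[float],
          other_scores: Sequence[float],
          threshold: float) -> Tuple[float, float]:
    """Return (detection rate, false positive rate) for one threshold.

    A program is flagged as a family member when its score is at or
    above the threshold.
    """
    detected = sum(score >= threshold for score in family_scores)
    false_positives = sum(score >= threshold for score in other_scores)
    return detected / len(family_scores), false_positives / len(other_scores)


# Hypothetical scores, not measurements from the slides.
FAMILY = [-2.1, -2.3, -2.0, -2.4]   # held-out family viruses
OTHERS = [-2.5, -6.0, -7.5, -9.1]   # -2.5 mimics a "close" non-family virus

if __name__ == "__main__":
    for threshold in (-2.2, -2.45, -2.6):
        detection, false_positive = rates(FAMILY, OTHERS, threshold)
        print(f"threshold {threshold}: detection {detection:.2f}, "
              f"false positives {false_positive:.2f}")
```

Sweeping the threshold shows the trade-off: a non-family program whose score sits close to the family scores forces a choice between missing family members and raising false alarms.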

Virus Detection with HMMs – Trained Models
- Converged probabilities in HMM matrices may give insight into the features of the represented viruses
- We observe
  - opcodes grouped into “hidden” states
  - most opcodes in one state only
- What does this mean?
  - We are not sure…

Detection via Similarity Index
- Straightforward similarity index can be used as detector
  - To determine whether a program belongs to the NGVCK virus family, compare it to any randomly chosen NGVCK virus
  - NGVCK similarity to non-NGVCK code is small
  - Can use this fact to detect metamorphic NGVCK variants
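A minimal sketch of detection by similarity follows. The transcript does not preserve how the similarity score is computed, so this uses a stand-in: Jaccard overlap of opcode trigrams. The function names, the 0.05 threshold, and the toy opcode lists are all assumptions made for illustration.

```python
from typing import List, Set, Tuple

Trigram = Tuple[str, ...]


def opcode_trigrams(opcodes: List[str]) -> Set[Trigram]:
    """Return the set of 3-opcode windows that occur in the sequence."""
    return {tuple(opcodes[i:i + 3]) for i in range(len(opcodes) - 2)}


def similarity(a: List[str], b: List[str]) -> float:
    """Jaccard overlap of trigram sets (a stand-in, not the authors' score)."""
    grams_a, grams_b = opcode_trigrams(a), opcode_trigrams(b)
    if not grams_a and not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def looks_like_family(candidate: List[str], reference: List[str],
                      threshold: float = 0.05) -> bool:
    """Flag the candidate if it is more similar to the reference than threshold."""
    return similarity(candidate, reference) > threshold


# Toy opcode sequences, invented for this example.
REFERENCE = ["push", "mov", "call", "pop", "mov", "add", "jmp", "ret"]
VARIANT = ["push", "mov", "call", "nop", "pop", "mov", "add", "ret"]
UNRELATED = ["xor", "inc", "cmp", "jne", "dec", "test", "jz", "lea"]

if __name__ == "__main__":
    print(similarity(REFERENCE, VARIANT), looks_like_family(VARIANT, REFERENCE))
    print(similarity(REFERENCE, UNRELATED), looks_like_family(UNRELATED, REFERENCE))
```

The detector needs only one reference sample and one threshold; it works whenever family members score above the threshold against the reference and everything else scores below it.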

Detection via Similarity Index
- Experiment
  - compare 105 programs to one selected NGVCK virus
- Results
  - 100% detection, 0% false positive
- Does not depend on specific NGVCK virus selected

PART V Conclusion

- Metamorphic generators vary a lot
  - NGVCK has highest metamorphism (10% similarity on average)
  - Other generators far less effective (60% similarity on average)
  - Normal files 35% similar, on average
- But, NGVCK viruses can be detected!
  - NGVCK viruses too different from other viruses and normal programs

Conclusion
- NGVCK viruses not detected by commercial scanners we tested
- Hidden Markov model (HMM) detects NGVCK (and other) viruses with high accuracy
- NGVCK viruses also detectable by similarity index

Conclusion
- All metamorphic viruses tested were detectable because
  - High similarity within family and/or
  - Too different from normal programs
- Effective use of metamorphism by virus/worm requires
  - A high degree of metamorphism and similarity to other programs
  - This is not trivial!

The Bottom Line
- Metamorphism for “good”
  - Buffer overflow mitigation, BOBE-resistance
  - A little metamorphism does a lot of good
- Metamorphism for “evil”
  - For example, try to evade virus/worm signature detection
  - Requires high degree of metamorphism and similarity to normal programs
  - Not impossible, but not easy…

The Bottom Bottom Line
- All too often in security, the advantage lies with the bad guys
- For metamorphic software, perhaps the inherent advantage lies with the good guys

References
- X. Gao, Metamorphic software for buffer overflow mitigation, MS thesis, Dept. of CS, SJSU, 2005
- P. Szor, The Art of Computer Virus Research and Defense, Addison-Wesley, 2005
- M. Stamp, Information Security: Principles and Practice, Wiley InterScience, 2005
- M. Stamp, Applied Cryptanalysis: Breaking Ciphers in the Real World, Wiley, 2007
- W. Wong, Analysis and detection of metamorphic computer viruses, MS thesis, Dept. of CS, SJSU, 2006
- W. Wong and M. Stamp, Hunting for metamorphic engines, Journal in Computer Virology, Vol. 2, No. 3, 2006, pp

Appendix Bonus Material

Hidden Markov Models (HMMs)
- state machines
- transitions between states have fixed probabilities
- each state has a probability distribution for observing a set of observation symbols
- states = features of the input data
- transition and observation probabilities = statistical properties of features
- can “train” an HMM to represent a set of data (in the form of observation sequences)
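To make the "state machine with probabilities" description concrete, here is a small sketch that stores an HMM as three tables and generates a sequence from it. The state names, symbols, and all probabilities are hypothetical and chosen only for illustration.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical two-state model; every number here is illustrative only.
INITIAL: Dict[str, float] = {"s0": 0.6, "s1": 0.4}
TRANSITION: Dict[str, Dict[str, float]] = {
    "s0": {"s0": 0.7, "s1": 0.3},
    "s1": {"s0": 0.4, "s1": 0.6},
}
EMISSION: Dict[str, Dict[str, float]] = {
    "s0": {"x": 0.5, "y": 0.4, "z": 0.1},
    "s1": {"x": 0.1, "y": 0.3, "z": 0.6},
}


def draw(dist: Dict[str, float], rng: random.Random) -> str:
    """Draw one key from a {key: probability} table."""
    keys = list(dist)
    return rng.choices(keys, weights=[dist[k] for k in keys], k=1)[0]


def sample(length: int, seed: int = 0) -> Tuple[List[str], List[str]]:
    """Generate (hidden states, observed symbols) of the given length."""
    rng = random.Random(seed)
    states: List[str] = []
    symbols: List[str] = []
    state = draw(INITIAL, rng)
    for _ in range(length):
        states.append(state)
        symbols.append(draw(EMISSION[state], rng))  # what an observer sees
        state = draw(TRANSITION[state], rng)        # hidden move to next state
    return states, symbols


if __name__ == "__main__":
    hidden, observed = sample(10)
    print("hidden:  ", " ".join(hidden))
    print("observed:", " ".join(observed))
```

Only the second list would be visible to an observer; training an HMM means estimating the three tables from sequences like it.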

HMM Example – the Occasionally Dishonest Casino
- 2 states: fair/loaded
- The switch between dice is a Markov process
- Outcomes of a roll have different probabilities in each state
- If we can only see a sequence of rolls, the state sequence is hidden
- Want to understand the underlying Markov process from the observations

HMMs – the Three Problems
1. Find the likelihood of seeing an observation sequence O given a model λ, i.e. P(O | λ)
2. Find an optimal state sequence that could have generated a sequence O
3. Find the model parameters given a sequence O
- There exist efficient algorithms to solve the three problems
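Problems 1 and 2 can be sketched on the casino example. The slides give no numbers for the casino, so the probabilities below are assumptions. Problem 1 uses the scaled forward algorithm. For Problem 2 the sketch uses the Viterbi algorithm, which returns the single most likely state path; that is one common reading of "optimal" and may not be the one the authors intend.

```python
import math
from typing import List, Sequence

# Assumed casino parameters (not from the slides). State 0 = fair, 1 = loaded.
PI = [0.5, 0.5]                            # initial state distribution
A = [[0.95, 0.05], [0.10, 0.90]]           # A[i][j] = P(next state j | state i)
B = [[1 / 6] * 6, [0.1] * 5 + [0.5]]       # B[i][k] = P(face k+1 | state i)


def log_likelihood(obs: Sequence[int], pi, a, b) -> float:
    """Problem 1: log P(O | model) via the forward algorithm with scaling."""
    n = len(pi)
    alpha = [pi[i] * b[i][obs[0]] for i in range(n)]
    scale = sum(alpha)
    alpha = [x / scale for x in alpha]
    total = math.log(scale)
    for o in obs[1:]:
        alpha = [sum(alpha[i] * a[i][j] for i in range(n)) * b[j][o]
                 for j in range(n)]
        scale = sum(alpha)                 # rescale to avoid underflow
        alpha = [x / scale for x in alpha]
        total += math.log(scale)
    return total


def viterbi(obs: Sequence[int], pi, a, b) -> List[int]:
    """Problem 2 (one interpretation): the most likely hidden state path."""
    n = len(pi)
    delta = [math.log(pi[i]) + math.log(b[i][obs[0]]) for i in range(n)]
    back: List[List[int]] = []
    for o in obs[1:]:
        prev = delta
        best = [max(range(n), key=lambda i: prev[i] + math.log(a[i][j]))
                for j in range(n)]
        delta = [prev[best[j]] + math.log(a[best[j]][j]) + math.log(b[j][o])
                 for j in range(n)]
        back.append(best)
    state = max(range(n), key=lambda i: delta[i])
    path = [state]
    for best in reversed(back):            # follow back-pointers to the start
        state = best[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    rolls = [5, 5, 5, 0, 2, 5, 5, 3, 1, 4]   # faces encoded as 0..5
    print(log_likelihood(rolls, PI, A, B))
    print(viterbi(rolls, PI, A, B))
```

Both routines run in time proportional to T·N² for T observations and N states, which is what makes scoring a long sequence practical.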

HMM

HMM Application – Determining the Properties of English Text
- Given: a large quantity of written English text
- Input: a long sequence of observations consisting of 27 symbols (the 26 lower-case letters and the word space)
- Train a model to find the most probable parameters (i.e., solve Problem 3)

HMM Application – Initial and Final Observation Probability Distributions

HMM Application – Results
- Observation probabilities converged; each letter belongs to one of the two hidden states
- The two states correspond to consonants and vowels
- Can use trained model to score any unknown sequence of letters to determine whether it corresponds to English text (i.e., Problem 1)
- Note: no a priori assumption was made
  - HMM effectively recovered the statistically significant feature inherent in English
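The training step (Problem 3) can be sketched with a plain Baum-Welch re-estimation loop over the 27-symbol alphabet. This is a minimal sketch, not the authors' code: the sample text, the near-uniform random initialisation, and the iteration count are assumptions, and a passage this short is far from "a large quantity" of text, so a clean vowel/consonant split is not guaranteed.

```python
import math
import random
import string

ALPHABET = string.ascii_lowercase + " "    # 26 letters plus word space

# Short made-up sample; the slides used a large quantity of English text.
SAMPLE = ("metamorphic software changes its internal structure while every "
          "copy does the same thing and a hidden markov model can be trained "
          "on a long sequence of letters to learn the statistics of english")


def encode(text):
    """Map text to symbol indices, turning non-letters into single spaces."""
    cleaned = "".join(c if c in string.ascii_lowercase else " "
                      for c in text.lower())
    return [ALPHABET.index(c) for c in " ".join(cleaned.split())]


def random_rows(rows, cols, rng):
    """Row-stochastic matrix that is close to, but not exactly, uniform."""
    out = []
    for _ in range(rows):
        row = [1.0 + 0.1 * rng.random() for _ in range(cols)]
        total = sum(row)
        out.append([x / total for x in row])
    return out


def baum_welch_step(obs, pi, a, b):
    """One re-estimation pass; returns new (pi, a, b) and the old model's log P(O)."""
    n, m, t_len = len(pi), len(b[0]), len(obs)
    alpha, scales = [], []
    for t, o in enumerate(obs):            # scaled forward pass
        if t == 0:
            row = [pi[i] * b[i][o] for i in range(n)]
        else:
            prev = alpha[-1]
            row = [sum(prev[i] * a[i][j] for i in range(n)) * b[j][o]
                   for j in range(n)]
        c = sum(row)
        scales.append(c)
        alpha.append([x / c for x in row])
    beta = [[1.0] * n for _ in range(t_len)]
    for t in range(t_len - 2, -1, -1):     # scaled backward pass
        nxt = obs[t + 1]
        beta[t] = [sum(a[i][j] * b[j][nxt] * beta[t + 1][j] for j in range(n))
                   / scales[t + 1] for i in range(n)]
    gamma = [[alpha[t][i] * beta[t][i] for i in range(n)] for t in range(t_len)]
    new_a = [[0.0] * n for _ in range(n)]
    for t in range(t_len - 1):             # expected transition counts
        nxt = obs[t + 1]
        for i in range(n):
            for j in range(n):
                new_a[i][j] += (alpha[t][i] * a[i][j] * b[j][nxt]
                                * beta[t + 1][j] / scales[t + 1])
    new_a = [[x / sum(row) for x in row] for row in new_a]
    new_b = [[0.0] * m for _ in range(n)]
    for t, o in enumerate(obs):            # expected emission counts
        for i in range(n):
            new_b[i][o] += gamma[t][i]
    new_b = [[x / sum(row) for x in row] for row in new_b]
    return gamma[0], new_a, new_b, sum(math.log(c) for c in scales)


def train(obs, n_states=2, iterations=50, seed=0):
    rng = random.Random(seed)
    pi = random_rows(1, n_states, rng)[0]
    a = random_rows(n_states, n_states, rng)
    b = random_rows(n_states, len(ALPHABET), rng)
    history = []
    for _ in range(iterations):
        pi, a, b, log_prob = baum_welch_step(obs, pi, a, b)
        history.append(log_prob)
    return pi, a, b, history


if __name__ == "__main__":
    pi, a, b, history = train(encode(SAMPLE))
    for state in range(2):
        letters = [ALPHABET[k] for k in range(27) if b[state][k] == max(r[k] for r in b)]
        print(f"state {state} favours: {''.join(letters)!r}")
```

The final loop lists, for each state, the symbols whose emission probability is highest in that state, which is the kind of inspection the results above describe.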

HMM Application – Results
- Probabilities can be sensibly interpreted for up to n = 12 hidden states
- Trained model could be used to detect English text, even if the text is “disguised” by, say, a simple substitution cipher or similar transformation

HMMs – The Trained Models

HMMs – Run Time of Training Process
- 5 to 38 minutes, depending on number of states N

HMMs – Run Time of Classifying Process
- Up to 0.4 milliseconds, depending on N and number of opcodes T

AVG Anti-Virus Scanning Result