Metamorphic Malware Research

Metamorphic Malware
Metamorphic software changes “shape”, but each instance has the same function. In contrast, most software is “cloned”. Virus writers use metamorphism to evade signature detection. This raises lots of interesting research problems, and we look at some of them here.

Metamorphic Research
How metamorphic are hacker-produced generators? How can metamorphic viruses be detected? What is the “ultimate” metamorphic generator? How can metamorphic code be made to “carry its own generator”? What related questions and issues arise?

Metamorphic Generators
To analyze metamorphic generators, the first problem is how to compare code. We developed a “similarity index” based on extracted opcodes. It can be represented graphically, and it also gives a numerical score.

Similarity
Suppose we want to compare two exe files, say file X and file Y. Extract the opcode sequence from each: x0, x1, …, xn and y0, y1, …, ym. Compare all 3-opcode subsequences. If a subsequence of X and a subsequence of Y agree (in any order), plot a point at the corresponding position on the axes. Filter noise with a window of length 5.

Similarity
That is, only matches of length 5 or greater are added to the score. These lengths were determined experimentally. Scores range from 0 to 1, where 0 means no match and 1 means a perfect match. This gives us both a graphical view and a score. In the graph, what does a perfect match look like? It appears as the main diagonal, or as segments parallel to it.
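The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `similarity`, the run-based noise filter, and the way the final score is computed (the fraction of positions covered by surviving matches, averaged over both files) are all assumptions made for the example. The slides state only that the score lies between 0 and 1.

```python
def similarity(x, y, n=3, min_run=5):
    """Return (score, points) for two opcode sequences (hypothetical helper).

    points are the (i, j) graph coordinates that survive noise filtering.
    """
    # An n-opcode window of x matches one of y when both hold the same
    # opcodes in any order, so each window is sorted before comparing.
    gx = [tuple(sorted(x[i:i + n])) for i in range(len(x) - n + 1)]
    gy = [tuple(sorted(y[j:j + n])) for j in range(len(y) - n + 1)]
    if not gx or not gy:
        return 0.0, []

    # Index the windows of y so every window of x is looked up directly.
    index = {}
    for j, g in enumerate(gy):
        index.setdefault(g, []).append(j)
    points = {(i, j) for i, g in enumerate(gx) for j in index.get(g, [])}

    # Noise filter: keep only diagonal runs of at least min_run points.
    kept = set()
    for (i, j) in points:
        if (i - 1, j - 1) in points:
            continue  # not the start of a diagonal run
        run = []
        while (i, j) in points:
            run.append((i, j))
            i, j = i + 1, j + 1
        if len(run) >= min_run:
            kept.update(run)

    # Assumed scoring rule: coverage of each file, averaged.
    cover_x = len({i for i, _ in kept}) / len(gx)
    cover_y = len({j for _, j in kept}) / len(gy)
    return (cover_x + cover_y) / 2, sorted(kept)


ops = ["push", "mov", "add", "sub", "call", "pop", "ret", "jmp"]
score, points = similarity(ops, ops)
print(score, points[:3])
```

Plotting `points` as a scatter graph gives the graphical view: identical files produce the main diagonal, and code that has been moved around produces segments parallel to it.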

Normal Files
Similarity graph of typical “normal” files (figure not included in transcript).

Metamorphic Generators
Similarity graph of a typical “metamorphic” generator (figure not included in transcript).

Metamorphic Generators
Similarity graph of a highly metamorphic generator (figure not included in transcript).

Metamorphic Generators
We measured the metamorphism of metamorphic generators. What did we find? Generally, they are not very metamorphic. We did find one exception: the Next Generation Virus Creation Kit (NGVCK). Can we detect NGVCK viruses?

Metamorphic Detection
We “trained” a hidden Markov model (HMM) on a set of viruses from the same “family”, using extracted opcode sequences. We then used the trained model for detection. Next, we discuss HMMs. Other techniques could be used instead, such as neural nets or data mining.

Hidden Markov Models
HMMs are a machine learning technique, widely used in speech recognition, bioinformatics, and other areas. We can train an HMM and then use the resulting trained model to score unknown data. A high score means the data matches the training data. A low score means it does not.

Hidden Markov Models
What are HMMs? Consider an example. Suppose we want to know the average annual temperature in the past. We cannot go back in time, so what can we do? Suppose we know that tree ring size is related to temperature.

Hidden Markov Models
We consider 2 possible temperatures: hot (H) and cold (C). We consider 3 tree ring sizes: small (S), medium (M), and large (L). Based on measurements, we find the probability of each ring size for each temperature (table not included in transcript).

HMM
Also, based on the historical record, we find the probabilities of moving between hot and cold years (table not included in transcript). The transitions between hot and cold years then form a Markov process of order 1. For the past, we cannot observe the temperature, but we can measure tree ring sizes.

HMMs
HMMs give us efficient algorithms to solve problems like this one: given a series of tree ring sizes, can we say anything about the temperatures?
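One way to answer this question is dynamic programming (the Viterbi algorithm), which finds the single most probable sequence of hidden temperatures for a given series of ring sizes. The sketch below is illustrative only. The probability tables from the slides are not reproduced in this transcript, so the values of `A`, `B`, and `PI` are assumed for the example, and the function name `viterbi` is likewise hypothetical. Note also that "most probable path" is only one notion of an optimal state sequence.

```python
STATES = "HC"                      # index 0 = hot, 1 = cold
SIZES = {"S": 0, "M": 1, "L": 2}   # tree ring sizes

# Assumed example values, NOT taken from the slides.
A = [[0.7, 0.3],       # hot  -> hot, cold
     [0.4, 0.6]]       # cold -> hot, cold
B = [[0.1, 0.4, 0.5],  # hot:  P(S), P(M), P(L)
     [0.7, 0.2, 0.1]]  # cold: P(S), P(M), P(L)
PI = [0.6, 0.4]        # initial P(hot), P(cold)


def viterbi(A, B, pi, obs):
    """Return (most probable state path, its joint probability)."""
    n = len(pi)
    # delta[i]: probability of the best path so far that ends in state i.
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    back = []  # back[t][j]: best predecessor of state j at step t + 1
    for o in obs[1:]:
        prev = [max(range(n), key=lambda i: delta[i] * A[i][j])
                for j in range(n)]
        delta = [delta[prev[j]] * A[prev[j]][j] * B[j][o] for j in range(n)]
        back.append(prev)

    # Trace back from the most probable final state.
    state = max(range(n), key=lambda i: delta[i])
    best_prob = delta[state]
    path = [state]
    for prev in reversed(back):
        state = prev[state]
        path.append(state)
    path.reverse()
    return path, best_prob


OBS = [SIZES[c] for c in "SMSL"]   # four years of ring measurements
path, prob = viterbi(A, B, PI, OBS)
print("".join(STATES[s] for s in path), prob)
```

For long observation sequences, the products of probabilities underflow, so a practical version would work with log probabilities instead.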

HMMs
The generic picture is like this (figure not included in transcript). Note that there is a Markov process and a series of observations.

HMMs
An HMM is denoted λ = (A, B, π), where A is the state transition matrix, B gives the probabilities of the observations depending on the state of the Markov process, and π contains the initial state probabilities. For HMMs, there are efficient algorithms to solve 3 problems, listed next.

The 3 HMM Problems
1. Given a model and observations, we can score the sequence of observations. How well does the observed data fit the model?
2. Given a model and observations, we can find the optimal state sequence. Here, we uncover the hidden states.
3. Given an observation sequence, we can train a model to best fit the data. The only assumption is the size of the A matrix.
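Problem 1 is commonly solved with the forward algorithm. The following sketch computes log P(O | λ) using per-step scaling to avoid underflow. It is a minimal example under stated assumptions: the function name `log_likelihood` and the numbers in `A`, `B`, and `PI` are illustrative and not taken from the slides. Dividing the result by the sequence length is one plausible way to compare sequences of different lengths, and is also an assumption here.

```python
import math

# Assumed two-state, three-symbol model; values are illustrative only.
A = [[0.7, 0.3],
     [0.4, 0.6]]        # state transition matrix
B = [[0.1, 0.4, 0.5],
     [0.7, 0.2, 0.1]]   # observation probabilities per state
PI = [0.6, 0.4]         # initial state probabilities


def log_likelihood(A, B, pi, obs):
    """Scaled forward algorithm: return log P(obs | model)."""
    n = len(pi)
    # alpha[i]: probability of the observations so far, ending in state i.
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    total = sum(alpha)
    loglik = math.log(total)
    alpha = [a / total for a in alpha]  # rescale so alpha sums to 1
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
        total = sum(alpha)
        loglik += math.log(total)       # accumulate the scaling factors
        alpha = [a / total for a in alpha]
    return loglik


OBS = [0, 1, 0, 2]  # symbol indices of an observation sequence
ll = log_likelihood(A, B, PI, OBS)
print(ll, ll / len(OBS))  # total and per-observation log likelihood
```

The work grows linearly with the length of the observation sequence and quadratically with the number of states, which is what makes scoring long sequences practical.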

HMM Training: English Text Example
Assuming 2 hidden states, here we show the B matrix (figure not included in transcript).

HMMs and Metamorphic Generators
So, what’s the game plan? Extract opcodes from several metamorphic viruses from the same family. Train an HMM on these opcodes (problem 3 above). Given an unknown file, score its extracted opcodes using the trained HMM (problem 1).

HMM Detection of NGVCK
The trained model works for detection. It is effective to the point of being practical.

Why Does this Work?
NGVCK viruses are highly metamorphic, but they have some common statistical properties, which the HMM extracts automatically. NGVCK code differs from normal code, so the HMM can distinguish between the two. How could one make a “better” metamorphic generator? Hold that thought.

What Next?
Can we extract opcodes (or an approximation) efficiently? Are “profile hidden Markov models” better? Can the similarity index be used for detection? Are there better ways to measure similarity? How do statistical tests compare with similarity? Can HMMs detect the “undetectable”? How do HMMs compare to other proposed methods? Can metamorphism be used for software watermarking?

Ultimate Metamorphic?
How can code evade both signature detection and HMM detection? Metamorphic code evades signature detection, but how can it also evade HMM detection? Make the code highly metamorphic and similar to normal code. Then the trained HMM will confuse the two.

Ultimate Metamorphic?
Insert dead code from normal programs (before and after figures not included in transcript).

What Now?
How can we detect the “ultimate” metamorphic generator? Remove the dead code. How can dead code be removed? Emulation can help, but… Can we “improve” the generator? Can we improve the detection? Can we say something more general?

References
Revealing introduction to HMMs
Hunting for metamorphic engines
Profile hidden Markov models
Approximate disassembly
Detecting “undetectable” metamorphic viruses
Hunting for undetectable metamorphic viruses
And lots more work in progress…