Pairwise Alignment of Metamorphic Computer Viruses Student:Scott McGhee Advisor:Dr. Mark Stamp Committee:Dr. David Taylor Dr. Teng Moh.

1 Pairwise Alignment of Metamorphic Computer Viruses Student:Scott McGhee Advisor:Dr. Mark Stamp Committee:Dr. David Taylor Dr. Teng Moh

2 Agenda Introduction – Virus Obfuscation Techniques – Existing Virus Detection Methods – Experimental Detection Using Hidden Markov Models – Proposed Approach Using Profile Hidden Markov Models – Op-code Sequences – Example Multiple Alignment – Pairwise Alignment

3 Agenda (cont’d) Creating / Scoring Alignments – Substitution Scoring – Gap Penalties – Creating a Pairwise Alignment – Creating a Multiple Alignment – Feng-Doolittle Algorithm – Sequence Preprocessing Case Studies Application Demo Conclusion

4 Introduction Viruses are becoming increasingly complicated. It is becoming easier for amateur programmers to create viruses using kits that are readily available online. Some viruses can change themselves from one generation to the next, making them difficult to detect. The goal is to explore a new approach to detecting these kinds of viruses.

5 Obfuscation Techniques Encrypted Viruses – Static decryptor, and an encrypted virus body – Key changes from one generation to the next – Weakness is the decryptor never changes Polymorphic Viruses – An encrypted virus with varying decryptors – Weakness is the virus body still never changes Metamorphic Viruses – Virus body can change – Assembly morphing engine – Virus Generators

6 Existing Virus Detection Methods Code Emulation – Simulated virtual environment – Retrieval of unencrypted form of the virus Pattern Based Scanning – Detect patterns or signatures Heuristic Analysis – Detect capabilities of an application

7 Experimental Approach: Using a Standard Hidden Markov Model Introduced in a previous student’s Master’s Writing Project. Use a set of disassembled viruses from a particular virus family to train a hidden Markov model (HMM). Use the HMM to score an arbitrary assembly. Designate a threshold such that an assembly scoring above the threshold is classified as a member of the virus family. Promising results have been shown.

8 Proposed Approach: Using a Profile Hidden Markov Model Instead of using a standard HMM the proposal is to use a profile HMM Profile HMMs will use position specific information within the sequence A profile HMM is trained using a multiple alignment This project will concentrate on the problem of creating multiple alignments for op-code sequences This approach is used in other fields which use sequence analysis

9 Op-code Sequences An application such as a virus can usually be disassembled into assembly code. Represent a virus as a sequence of op-codes. The op-codes are parsed from the assembly. Each op-code is given a representative character.
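
The encoding step on this slide can be sketched as follows. This is only an illustration: the slides do not say how characters are assigned, so the function names, the "first appearance gets the next character" rule, and the simplified line format (one instruction per line, `;` comments, labels on their own line) are all assumptions.

```python
import string

# Hypothetical alphabet: one character per distinct op-code.
ALPHABET = string.ascii_uppercase + string.digits


def extract_opcodes(asm_text):
    """Return the mnemonic of every instruction line in a disassembly listing.

    Assumes one instruction per line, ';' comments, and labels on their own line.
    """
    opcodes = []
    for line in asm_text.splitlines():
        line = line.split(";", 1)[0].strip()  # drop trailing comment
        if not line or line.endswith(":"):    # skip blank lines and labels
            continue
        opcodes.append(line.split()[0].lower())
    return opcodes


def encode(opcodes, table=None):
    """Map each op-code to a character, assigning characters on first appearance.

    Pass the same table for every virus in a family so that a given op-code
    always maps to the same character.
    """
    table = {} if table is None else table
    chars = []
    for op in opcodes:
        if op not in table:
            if len(table) >= len(ALPHABET):
                raise ValueError("more distinct op-codes than available characters")
            table[op] = ALPHABET[len(table)]
        chars.append(table[op])
    return "".join(chars), table


if __name__ == "__main__":
    listing = "start:\n  mov eax, ebx ; copy\n  push eax\n  mov ecx, 1\n  call foo\n"
    sequence, table = encode(extract_opcodes(listing))
    print(sequence, table)
```

Sharing one table across a whole virus family matters because the alignment only makes sense if the same character means the same op-code in every sequence.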

10 Example Multiple Alignment
FCDBAAE0
CDBAAEAA
CDABAEAA
CDABAEAA
FCDB1AAEA
ABAEAA
CDABAEAA
DBAAFAA
AFABPAAEA
ABAAEAA

11
FCDB-AAE0-
-CDB-AAEAA
-CDA-BAEAA
-CDA-BAEAA
FCDB1AAE-A
-A-B-A-EAA
-CDA-BAEAA
--DB-AAFAA
AFABPAAE-A
-A-B-AAEAA

12 Pairwise Alignment A special case of a multiple alignment deals with only 2 sequences A pairwise alignment can be viewed as substitutions and gap insertions
ABAA---ADD
ABCABCD--D
Substitute A with C
Insert gap size 3
Insert gap size 2

13 Creating / Scoring Alignments

14 Substitution Scoring Each possible substitution can be assigned a score and placed into a substitution matrix. Ideally the scores should be statistically correlated to the probability that the substitution would take place. Without comprehensive statistics on substitutions of op-codes in real viruses, these values can be guessed. A simple example is given here for the op-codes A, B, C, D: 10 on the diagonal (a match) and -5 off the diagonal (a mismatch).

15 Gap Penalties When inserting a gap, the score will be penalized. The penalty is usually a function of the length of the gap. Common gap penalties include – Linear Gap Penalty, each gap symbol has the same cost, so the penalty grows in proportion to the gap length – Affine Gap Penalty, opening a gap is more expensive than extending a gap. The overall score of a pairwise alignment is the sum of the substitution scores and gap penalties.
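
As a rough illustration of how substitution scores and gap penalties combine, the sketch below scores the pairwise alignment from slide 12. The match and mismatch values (10 and -5) follow the example on slide 14; the gap parameters (-8 per symbol for the linear penalty, -10 to open and -2 to extend for the affine penalty) and all function names are assumptions chosen only for this example.

```python
def linear_gap(length, per_symbol=-8):
    """Linear penalty: every gap symbol costs the same (value is an assumption)."""
    return per_symbol * length


def affine_gap(length, open_cost=-10, extend_cost=-2):
    """Affine penalty: opening costs more than extending (values are assumptions)."""
    return open_cost + extend_cost * (length - 1) if length > 0 else 0


def score_alignment(a, b, gap_fn, match=10, mismatch=-5):
    """Sum substitution scores and gap penalties over two aligned strings.

    '-' marks a gap. Consecutive gap symbols in the same sequence form one gap.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    total, run, run_side = 0, 0, None
    for ca, cb in zip(a, b):
        side = "a" if ca == "-" else "b" if cb == "-" else None
        if side is None:
            if run:                      # a gap just ended
                total += gap_fn(run)
                run = 0
            total += match if ca == cb else mismatch
        else:
            if run and side != run_side:  # gap switches to the other sequence
                total += gap_fn(run)
                run = 0
            run += 1
            run_side = side
    if run:
        total += gap_fn(run)
    return total


if __name__ == "__main__":
    x, y = "ABAA---ADD", "ABCABCD--D"
    print("linear:", score_alignment(x, y, linear_gap))
    print("affine:", score_alignment(x, y, affine_gap))
```

With these assumed values the five aligned symbol pairs contribute 35, and the two gaps (lengths 3 and 2) cost 40 under the linear penalty but only 26 under the affine one, which is why affine penalties favour a few long gaps over many short ones.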

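16 Creating a Pairwise Alignment Use Dynamic Programming. Let $\mathrm{optimum}(X_{1 \ldots m}, Y_{1 \ldots n})$ be the best score for aligning the first $m$ symbols of X with the first $n$ symbols of Y, let $g$ be the cost of one more gap symbol, and let $s(x_m, y_n)$ be the substitution score of the mth symbol in X with the nth symbol of Y:
$$\mathrm{optimum}(X_{1 \ldots m}, Y_{1 \ldots n}) = \max \begin{cases} \mathrm{optimum}(X_{1 \ldots m-1}, Y_{1 \ldots n}) + g & \text{($x_m$ aligned with a gap)} \\ \mathrm{optimum}(X_{1 \ldots m}, Y_{1 \ldots n-1}) + g & \text{($y_n$ aligned with a gap)} \\ \mathrm{optimum}(X_{1 \ldots m-1}, Y_{1 \ldots n-1}) + s(x_m, y_n) & \text{($x_m$ substituted with $y_n$)} \end{cases}$$
Can compute the optimal alignment in time O(m*n) for sequences of size m and n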
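
The dynamic program on this slide can be sketched as a standard global alignment (Needleman-Wunsch) with a linear gap penalty. This is a minimal sketch, not the project's implementation: the function name is hypothetical, the match and mismatch scores follow the example on slide 14, and the gap cost of -8 is an assumption.

```python
def align(x, y, match=10, mismatch=-5, gap=-8):
    """Globally align x and y; return (score, aligned_x, aligned_y).

    score[i][j] holds the best score for aligning x[:i] with y[:j].
    """
    m, n = len(x), len(y)
    score = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        score[i][0] = i * gap            # x[:i] against nothing but gaps
    for j in range(1, n + 1):
        score[0][j] = j * gap            # y[:j] against nothing but gaps

    def sub(i, j):
        return match if x[i - 1] == y[j - 1] else mismatch

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # substitute x_i with y_j
                score[i - 1][j] + gap,            # x_i aligned with a gap
                score[i][j - 1] + gap,            # y_j aligned with a gap
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    ax, ay = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return score[m][n], "".join(reversed(ax)), "".join(reversed(ay))


if __name__ == "__main__":
    print(align("ABAEAA", "ABAAEAA"))
```

The table has (m+1)(n+1) cells and each is filled in constant time, which gives the O(m*n) bound quoted on the slide. An affine gap penalty needs extra state per cell and is not covered by this sketch.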

17 Creating a Multiple Alignment Use a Progressive Alignment Choose 2 sequences to create a pairwise alignment using dynamic programming Progressively add sequences to this alignment – Choose a sequence in the alignment, and one not in the alignment – Create a pairwise alignment – Update the other sequences in the alignment with any new gaps that were inserted, add the new aligned sequence to the overall alignment

18 Feng-Doolittle Algorithm How do you choose the order in which you add the sequences to the MSA? If given a set of n sequences, pre-compute alignment scores between each possible pair of sequences (n choose 2 pairs) Data can be represented as a distance matrix of a fully connected graph of size n Compute a minimum spanning tree, to minimize the cost (or maximize the score)

19 Feng-Doolittle Algorithm (cont’d) Start with the highest-scoring pairwise alignment and follow the tree
       1    2    3    4    5    6    7    8    9   10
 1   ---   85   63   74   70   84   61   57   62   70
 2    85  ---   79   73   66   59   94   61   59   51
 3    63   79  ---   75   68   60   55   85   52   65
 4    74   73   75  ---  105   54   60   78   59   53
 5    70   66   68  105  ---   40   61   79   58   39
 6    84   59   60   54   40  ---   68   45   75   78
 7    61   94   55   60   61   68  ---   64   72   42
 8    57   61   85   78   79   45   64  ---   50   70
 9    62   59   52   59   58   75   72   50  ---   81
10    70   51   65   53   39   78   42   70   81  ---
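
One way to turn the score matrix on this slide into an order for adding sequences is to grow a maximum spanning tree from the best-scoring pair, Prim style. The sketch below is an assumption about how "follow the tree" could be realised, not necessarily the presenter's procedure, and the function name is hypothetical. The scores are copied from the slide.

```python
# Pairwise alignment scores from the slide (row/column k is sequence k+1).
SCORES = [
    [None, 85, 63, 74, 70, 84, 61, 57, 62, 70],
    [85, None, 79, 73, 66, 59, 94, 61, 59, 51],
    [63, 79, None, 75, 68, 60, 55, 85, 52, 65],
    [74, 73, 75, None, 105, 54, 60, 78, 59, 53],
    [70, 66, 68, 105, None, 40, 61, 79, 58, 39],
    [84, 59, 60, 54, 40, None, 68, 45, 75, 78],
    [61, 94, 55, 60, 61, 68, None, 64, 72, 42],
    [57, 61, 85, 78, 79, 45, 64, None, 50, 70],
    [62, 59, 52, 59, 58, 75, 72, 50, None, 81],
    [70, 51, 65, 53, 39, 78, 42, 70, 81, None],
]


def alignment_order(scores):
    """Return tree edges (in_tree_seq, new_seq, score) with 1-based sequence ids.

    Starts from the highest-scoring pair, then repeatedly attaches the
    outside sequence that scores highest against any sequence already added.
    """
    n = len(scores)
    s, i, j = max((scores[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    in_tree = [i, j]
    edges = [(i + 1, j + 1, s)]
    while len(in_tree) < n:
        s, u, v = max(
            (scores[u][v], u, v)
            for u in in_tree
            for v in range(n)
            if v not in in_tree
        )
        in_tree.append(v)
        edges.append((u + 1, v + 1, s))
    return edges


if __name__ == "__main__":
    for u, v, s in alignment_order(SCORES):
        print(f"align sequence {v} against sequence {u} (score {s})")
```

On this matrix the tree starts with the pair 4 and 5 (score 105), then attaches 8 to 5, 3 to 8, and 2 to 3. Each edge says which sequence already in the alignment the next sequence should be aligned against.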

20 Feng-Doolittle Algorithm (cont’d)
MSA Before New Alignment
5) CDABBAFCDB1AAEAA+CEDA+EQ+CDABABABALF4LBBAFBSBAAAAA
4) 2AABBAFCDABA+EAABCEDCDEQFCDABA+APALF4+BBA++SBAAAAA
8) ++AABA+CDB+AAEAA+CEDCDEQ+CDABPBA+ABF4+BBAFBSBMAAAA
3) A+ABBAFCDABA+EAA+CEDCDEQA++ABFBAN++F4+BBAFBTYBAAAA
New Alignment
2) A-ABNBAFCD-BAAEAABCEDA-EQ-CDABAB--BAF4NBBM-BTYBAAAA
3) A+AB-BAFCDABA+EAA+CEDCDEQA++ABFBAN++F4+BBAFBTYBAAAA
       ^ (gap introduced)
MSA After New Alignment
5) CDAB+BAFCDB1AAEAA+CEDA+EQ+CDABABABALF4LBBAFBSBAAAAA
4) 2AAB+BAFCDABA+EAABCEDCDEQFCDABA+APALF4+BBA++SBAAAAA
8) ++AA+BA+CDB+AAEAA+CEDCDEQ+CDABPBA+ABF4+BBAFBSBMAAAA
3) A+AB+BAFCDABA+EAA+CEDCDEQA++ABFBAN++F4+BBAFBTYBAAAA
2) A+ABNBAFCD+BAAEAABCEDA+EQ+CDABAB++BAF4NBBM+BTYBAAAA
       ^ (gap matched)
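
The update step shown on this slide, where a gap introduced by the new pairwise alignment is copied into every row of the existing MSA, might be sketched as below. The function name and the dictionary representation are assumptions. Following the slide, '+' marks gaps already in the MSA and '-' marks gaps created by the new pairwise alignment.

```python
def merge_pairwise_into_msa(msa, anchor_id, anchor_aligned, new_id, new_aligned,
                            old_gap="+", new_gap="-"):
    """Add a newly aligned sequence to an MSA and propagate any new gaps.

    msa            -- dict mapping sequence id to its current MSA row
    anchor_aligned -- the anchor's MSA row after the new pairwise alignment
    new_aligned    -- the new sequence as aligned against the anchor
    Wherever the pairwise alignment put a new gap into the anchor, a gap
    column is inserted into every existing row.
    """
    if anchor_aligned.replace(new_gap, "") != msa[anchor_id]:
        raise ValueError("anchor_aligned must be the MSA row plus new gaps only")
    if len(anchor_aligned) != len(new_aligned):
        raise ValueError("pairwise alignment rows must have equal length")

    rows = {seq_id: [] for seq_id in msa}
    col = 0  # current column in the old MSA
    for ch in anchor_aligned:
        if ch == new_gap:
            for seq_id in rows:          # new gap column for every old row
                rows[seq_id].append(old_gap)
        else:
            for seq_id in rows:          # copy the existing column unchanged
                rows[seq_id].append(msa[seq_id][col])
            col += 1

    merged = {seq_id: "".join(chars) for seq_id, chars in rows.items()}
    merged[new_id] = new_aligned.replace(new_gap, old_gap)
    return merged


if __name__ == "__main__":
    msa = {"a": "CDAB", "b": "C+AB"}
    # Aligning new sequence "c" against "b" introduced one gap into "b".
    print(merge_pairwise_into_msa(msa, "b", "C+A-B", "c", "CDAXB"))
```

Treating the anchor's existing '+' gaps as ordinary symbols during the pairwise alignment keeps its old columns intact, so only genuinely new gap columns have to be propagated.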

21 Sequence Preprocessing Some metamorphic viruses permute their subroutines. Permuted sequences do not align well. Undoing the permutation in each of the sequences produces a better alignment. Using subroutine matching, a permutation can be found that maximizes the scores.
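
A toy version of subroutine matching is sketched below. It is not the preprocessing used in the project: it assumes each virus has already been split into the same number of subroutines, it tries every permutation (so it only suits a handful of subroutines), and it uses `difflib.SequenceMatcher.ratio()` from the Python standard library as a stand-in similarity measure instead of a real alignment score. All function names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import permutations


def similarity(a, b):
    """Stand-in for a pairwise alignment score (an assumption of this sketch)."""
    return SequenceMatcher(None, a, b).ratio()


def best_subroutine_order(reference, other):
    """Reorder `other` so its subroutines line up with those of `reference`.

    Both arguments are lists of op-code strings, one string per subroutine.
    Returns the permutation with the highest total similarity and the
    reordered list.
    """
    if len(reference) != len(other):
        raise ValueError("this sketch assumes equal subroutine counts")
    best_perm, best_total = None, float("-inf")
    for perm in permutations(range(len(other))):
        total = sum(similarity(ref, other[k]) for ref, k in zip(reference, perm))
        if total > best_total:
            best_perm, best_total = perm, total
    return best_perm, [other[k] for k in best_perm]


if __name__ == "__main__":
    reference = ["ABAB", "CDCD", "EFEF"]
    permuted = ["EFEF", "ABAC", "CDCD"]
    perm, reordered = best_subroutine_order(reference, permuted)
    print(perm, "".join(reordered))
```

Concatenating the reordered subroutines gives a sequence whose overall layout matches the reference, which is the form the alignment step expects.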

22 Case Studies

23 Selected Viruses Next Generation Virus Creation Kit (NGVCK) – Advanced assembly morphing engine – Junk code insertion – Function reordering Virus Creation Lab Win 32 (VCL32) – No function reordering Phalcon/Skism Mass-Produced Code Generator (PS-MPC) – No function reordering

24 NGVCK Results Raw NGVCK viruses did not align well Preprocessing was required in order to create usable alignments Profile HMM was able to detect viruses with a 6.8% false-positive rate and 1% false-negative rate

25 VCL32 and PS-MPC The raw viruses both aligned well and did not require preprocessing VCL32 aligned the best The Profile HMM was able to detect both viruses with 0% false-positive and false-negative rates

26 Visual Representation of Multiple Alignments Created [Figure panels: Raw NGVCK, groups of 20; Preprocessed NGVCK, groups of 20; PS-MPC, groups of 15; VCL32, group of 10]

27 Application Demo

28 Conclusion The profile HMM works well on metamorphic viruses which do not permute subroutines Future research is needed in order to fully understand the effects of preprocessing on the profile HMM

29 Thank you Email questions to codefactor@yahoo.com

