
Department of Computer Science Yasmine Kandissounon.


1 Department of Computer Science Yasmine Kandissounon

2 The problem In an attempt to create an undetectable virus, malware writers have devised many strategies, the most recent and effective being “metamorphism”. Metamorphism is a strategy that helps a virus hide its malicious behavior and change its appearance at each generation. It has led to a profusion of viruses with which anti-virus scanners cannot keep up. The invention of virus generation kits made things worse, as they allow people with few or even no programming skills to generate a metamorphic virus in no time [1]. Most malicious programs created by virus generation kits avoid detection because the techniques used by anti-virus scanners are not effective enough to outsmart them.

3 Related Works Metamorphic malware has challenged scholars and inspired serious research. IBM researchers applied neural networks to detect boot-sector viruses [2]. They generated short byte strings (called trigrams) from a set of training examples and used them as features for virus detection. According to IBM, this technique helped detect about 75% of known boot-sector viruses, but failed to recognize programs whose malicious code was obscured. Chouchane et al. suggested a detection method using Instruction Frequency Vectors (IFVs) based on Markov Chains [3]. They computed the matrices for the IFVs of the eve (original instance) of a virus and of its variant after a number of generations, to prove or disprove that the variant was generated by a particular engine.

4 Our solution Eugene Spafford’s analysis of software authorship inspired our solution. Spafford used the same idea as forensic linguistics, which can accurately identify the author of an English text [4]. Indeed, by combining software metrics with other features such as variable naming and code indentation, Spafford showed that a program could be attributed to a specific author. His technique is even easier to use in the case of virus generation kits, given that their signature is more consistent than a human’s. Our solution consists of using Markov Chains to attribute the authorship of a virus to an engine. The extraction and study of the opcodes of a number of variants of popular generation kits showed that an opcode is independent of the one two steps before it. Hence, Markov Chains can be applied to viruses generated by kits to obtain the engine’s signature.

5 The culture We decided to work with both the Next Generation Virus Construction Kit (NGVCK) and Virus Creation Lab (VCL32). Mark Stamp from San Jose State University showed that the similarities among NGVCK variants are less than 2%, which makes it a highly metamorphic engine and thus relevant to our study [6]. Besides the fact that VCL32 variants also present an interestingly low degree of similarity, VCL32 has inspired many other virus generation kits that strive for the same metamorphic features. Sun Microsystems’ VirtualBox was used as our isolated platform. We downloaded our kits from vx.netlux.org, which is an almost complete repository of all known virus engines, constructors and simulators. This website also provides some documentation for each virus in a library.

6 [Figures: NGVCK’s graphical interface; VCL32’s graphical interface]

7 The work Using the preceding GUIs, we created 50 variants with each kit and extracted the opcodes of each variant using a small homemade Java program. The next phase in finding a common signature for the variants of each kit consists of computing a transition matrix using Markov Chains for each variant, then calculating the average matrix, which will constitute the signature for the variants of NGVCK and of VCL32 respectively.
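The averaging step can be sketched in Python as follows. This is only an illustration, not the authors' Java program: it assumes each variant's transition matrix is already available as a nested dictionary (`matrix[current][following]`), and it assumes that a transition missing from a variant counts as probability 0. The function name `average_matrix` and the two sample matrices are hypothetical.

```python
def average_matrix(matrices):
    """Element-wise mean of transition matrices given as dict-of-dict."""
    if not matrices:
        raise ValueError("need at least one matrix")
    # Union of all opcodes that appear as a row or a column in any variant.
    states = sorted(
        {s for m in matrices for s in m}
        | {t for m in matrices for row in m.values() for t in row}
    )
    count = len(matrices)
    # Assumption: a transition absent from a variant contributes 0.
    return {
        s: {t: sum(m.get(s, {}).get(t, 0.0) for m in matrices) / count
            for t in states}
        for s in states
    }


# Two hypothetical per-variant matrices over the opcodes "mov" and "push".
variant_a = {"mov": {"mov": 0.0, "push": 1.0}, "push": {"mov": 1.0, "push": 0.0}}
variant_b = {"mov": {"mov": 0.5, "push": 0.5}, "push": {"mov": 1.0, "push": 0.0}}
signature = average_matrix([variant_a, variant_b])
print(signature["mov"])
```

Because every input row sums to 1, each row of the average also sums to 1, so the signature is itself a valid transition matrix.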

8 A Markov Chain is a set of states linked by transition probabilities. Let $S = \{s_1, s_2, s_3, \ldots, s_n\}$ be a set of states. If a process is in state $s_1$, it moves to state $s_2$ with probability $p_{12}$ (called a transition probability), and so on. More generally, the probability $p^{(2)}_{ij}$ that a process with $n$ states moves from $s_i$ to $s_j$ in two steps is: $$p^{(2)}_{ij} = \sum_{k=1}^{n} p_{ik}\, p_{kj}$$ A transition matrix is a matrix that holds the transition probabilities between the states of the Markov Chain: its entry in row $i$, column $j$ is $p_{ij}$. In our case, the states are the different opcodes.
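The sum over k in the formula above is exactly the (i, j) entry of the transition matrix multiplied by itself, which gives the probability of going from one state to another in two steps. A minimal Python sketch, using a hypothetical two-state chain that is not taken from the study:

```python
def two_step(P):
    """Return P times P: entry [i][j] is the sum over k of P[i][k] * P[k][j]."""
    n = len(P)
    return [[sum(P[i][k] * P[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Hypothetical 2-state chain; each row sums to 1.
P = [[0.5, 0.5],
     [1.0, 0.0]]
P2 = two_step(P)
print(P2)
```

Each row of `P2` still sums to 1, as expected for a matrix of probabilities.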

9 For each opcode, the probabilities are taken in proportion to the opcodes that follow it. Thus, if an opcode $O_i$ occurs n times in a variant and is followed x times by an opcode $O_j$, then in our transition matrix the probability p for state (here, opcode) $O_i$ to be followed by state $O_j$ is x/n. As an example, let’s compute the transition matrix of a simple program whose opcodes are common in our variants:

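10 call push add sub jmp push add call

This yields the following transition matrix M (notice that the probabilities in each row, one row per state, have to sum to 1):

$$
M = \begin{array}{l|ccccc}
 & \text{Call} & \text{Add} & \text{Sub} & \text{Jmp} & \text{Push} \\ \hline
\text{Call} & 0 & 0 & 0 & 0 & 1 \\
\text{Add} & 1/3 & \ldots & \ldots & 0 & 0 \\
\text{Sub} & 0 & 0 & 0 & 1 & 0 \\
\text{Jmp} & 0 & 0 & 0 & 0 & 1 \\
\text{Push} & 0 & 1/2 & 0 & 0 & 1/2
\end{array}
$$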
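The x/n rule from slide 9 can be sketched in Python as follows. This is an illustrative sketch rather than the authors' Java program. It uses a hypothetical opcode sequence of its own, and it assumes that n counts only the occurrences of an opcode that have a successor, so that every non-empty row sums to 1; the function name `transition_matrix` is likewise an assumption.

```python
from collections import Counter, defaultdict
from fractions import Fraction


def transition_matrix(opcodes):
    """Estimate P(next = o_j | current = o_i) from one opcode sequence."""
    pair_counts = defaultdict(Counter)
    # Count each adjacent pair (current opcode, following opcode).
    for current, following in zip(opcodes, opcodes[1:]):
        pair_counts[current][following] += 1
    states = sorted(set(opcodes))
    matrix = {}
    for state in states:
        # Assumption: n counts only occurrences that have a successor,
        # so a row is all zeros only if the opcode never has one.
        n = sum(pair_counts[state].values())
        matrix[state] = {
            nxt: Fraction(pair_counts[state][nxt], n) if n else Fraction(0)
            for nxt in states
        }
    return matrix


# Hypothetical opcode sequence, not taken from any real variant.
sample = ["mov", "push", "call", "mov", "pop", "mov", "push", "ret"]
M = transition_matrix(sample)
for state, row in M.items():
    print(f"{state:5}", [str(p) for p in row.values()])
```

Exact fractions are used instead of floats so that rows can be checked to sum to exactly 1.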

11 Expected Impact Our solution offers accuracy as well as space and time efficiency. Using Markov Chains helps reduce the percentage of false positives. We expect to define a reasonable threshold that will help separate malicious programs from benign ones without producing many false negatives. In addition, storing only one signature for a whole set of metamorphic variants with a common origin is more space-efficient than storing a signature for each of the variants, as anti-virus companies seem to do. Finally, our solution is time-efficient: the algorithm that compares our computed signature against a potentially malicious program has linear time complexity in the size of the matrix, which is generally regarded as efficient.
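A comparison of this kind could look like the Python sketch below. The slides do not specify the distance measure or the threshold, so both are assumptions here: the sketch uses the mean absolute difference between matrix entries and a made-up threshold of 0.2. It visits each matrix entry once, so its running time is linear in the number of entries.

```python
def matrix_distance(signature, candidate):
    """Mean absolute difference over all entries of two dict-of-dict matrices."""
    states = sorted(set(signature) | set(candidate))
    if not states:
        return 0.0
    total = 0.0
    # Single pass over every (row, column) entry; missing entries count as 0.
    for s in states:
        for t in states:
            total += abs(signature.get(s, {}).get(t, 0.0)
                         - candidate.get(s, {}).get(t, 0.0))
    return total / (len(states) ** 2)


def matches_engine(signature, candidate, threshold):
    """True if the candidate's matrix lies within the threshold of the signature."""
    return matrix_distance(signature, candidate) <= threshold


# Hypothetical signature and candidate matrices over two opcodes.
engine_sig = {"mov": {"mov": 0.25, "push": 0.75}, "push": {"mov": 1.0, "push": 0.0}}
close = {"mov": {"mov": 0.5, "push": 0.5}, "push": {"mov": 1.0, "push": 0.0}}
far = {"mov": {"mov": 1.0, "push": 0.0}, "push": {"mov": 0.0, "push": 1.0}}
THRESHOLD = 0.2  # made-up value for illustration only

print(matches_engine(engine_sig, close, THRESHOLD))
print(matches_engine(engine_sig, far, THRESHOLD))
```

In practice the threshold would have to be tuned on known variants and benign programs, which is the difficulty the Limitations slide points out.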

12 Limitations Although our solution seems very appealing, it also has some downsides. One disadvantage is the very fact that the signature is the average matrix: defining a threshold to accompany the average matrix may be tricky, as it will need to be accurate enough to avoid false negatives. Also, because we have a very limited culture (50 variants each for NGVCK and VCL32), we will test the signature only on a very limited scale and can only assume that it works on a larger scale.

13 References
[1] http://packetstormsecurity.org/mag/40hex/40HEX-10/40HEX-10.001J
[2] http://www.research.ibm.com/antivirus/SciPapers/Tesauro/NeuralNets.html
[3] M.R. Chouchane, A. Walenstein, A. Lakhotia. Using Markov Chains to Filter Machine-morphed Variants of Malicious Programs.
[4] Ivan Krsul and Eugene H. Spafford. Authorship Analysis: Identifying the Author of a Program.
[5] Peter Szor. Advanced Code Evolution Techniques and Computer Virus Generation Kits.
[6] Wing Wong and Mark Stamp. Hunting for Metamorphic Engines.
[7] www.vx.netlux.org

