1 Arun Lakhotia, Professor; Andrew Walenstein, Assistant Professor. University of Louisiana at Lafayette, www.cacs.louisiana.edu/labs/SRL. AVAR 2008 (New Delhi)

2 Introduction
- Director, Software Research Lab
- Lab's focus: Malware Analysis
- Graduate-level course on Malware Analysis
- Six years of AV-related research; issues investigated: metamorphism, obfuscation
- Alumni in the AV industry: Prabhat Singh, Nitin Jyoti, Aditya Kapoor, Rachit Kumar (McAfee AVERT); Erik Uday Kumar (Authentium); Moinuddin Mohammed (Microsoft); Prashant Pathak (ex-Symantec)
- Funded by: Louisiana Governor's IT Initiative

3 Outline
- Attack of variants
- AV vulnerability: exact match
- Information Retrieval techniques: inexact match
- Adapting IR to AV: account for code permutation
- Vilo: a system using IR for AV
- Integrating Vilo into the AV infrastructure
- Self-learning AV using Vilo

4 ATTACK OF VARIANTS

5 Variants vs Family (Source: Symantec Internet Threat Report, XI)

6 Analysis of attacker strategy
- Purpose of the attack of variants: denial of service on the AV infrastructure; increase the odds of passing through
- Weakness exploited: AV systems use exact match over an extract
- Attack strategy: generate just enough variation to beat exact match
- Attacker cost: cost of generating and distributing variants

7 Analyzing attacker cost
- Payload creation is expensive, so the payload must be reused
- Thousands of variants are needed, so generation must be automated
- "General" transformers are expensive, so transformers are specialized and limited; hence packers/unpackers

8 Attacker vulnerability
- Automated transformers have limited capability; being machine generated, their output must show regular patterns
- Exploiting the attacker's vulnerability: detect patterns of similarity
- Approach: Information Retrieval (this presentation); Markov analysis (other work)

9 Information Retrieval

10 IR Basics
- Basis of Google and bioinformatics: organizing very large corpora of data
- Key idea: inexact match over the whole
- Contrast with AV: exact match over an extract

11 IR Problem — a query (keywords or a document) is matched against the document collection, and related documents are returned.

12 IR Steps
Step 1: Convert documents to vectors
1a. Define a method to identify "features" — example: k consecutive words
1b. Extract all features from all documents
1c. Count the features to make a feature vector
Example from the slide: documents such as "Have you wondered when is a rose a rose?" and "How about onions? Onion smell stinks"; the 3-word features shown include "have you wondered", "you wondered when", "wondered when rose", "when rose rose", and the resulting feature vector is [1, 1, 1, 1, 0, 0].
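To make step 1 concrete, here is a minimal sketch in Python of the k-consecutive-words featurization. The helper names (word_kgrams, feature_vector) and the tiny two-document corpus are mine, not from the slides.

```python
from collections import Counter

def word_kgrams(text, k=3):
    # 1a/1b: a feature is a run of k consecutive words
    words = text.lower().split()
    return [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]

def feature_vector(text, vocabulary, k=3):
    # 1c: count how often each vocabulary feature occurs in this document
    counts = Counter(word_kgrams(text, k))
    return [counts[f] for f in vocabulary]

doc1 = "have you wondered when is a rose a rose"
doc2 = "how about onions onion smell stinks"
vocab = sorted(set(word_kgrams(doc1) + word_kgrams(doc2)))
print(feature_vector(doc1, vocab))
print(feature_vector(doc2, vocab))
```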

13 IR Steps
Step 2: Compute feature vectors, taking into account the features in the entire corpus
Classical method: W = TF x IDF
- TF = term frequency (count of the feature in the document)
- DF = number of documents containing the feature
- IDF = inverse of DF
Example from the slide (five features):
  DF     = [5, 7, 8, 6, 3]
  IDF    = [1/5, 1/7, 1/8, 1/6, 1/3]
  TF(v1) = [1, 2, 5, 3, 0]
  w1 = TF x IDF(v1) = [1/5, 2/7, 5/8, 3/6, 0/3]
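A small sketch of step 2 using the simple IDF = 1/DF weighting shown on the slide. The two-document corpus here is invented for illustration, so its DF values do not reproduce the slide's [5, 7, 8, 6, 3] column.

```python
def tfidf_weights(tf_vectors):
    """Weight each term frequency by IDF = 1/DF, where DF is the number of
    documents in the corpus containing that feature (the slide's simple form)."""
    n = len(tf_vectors[0])
    df = [sum(1 for v in tf_vectors if v[i] > 0) for i in range(n)]
    return [[tf / df[i] if df[i] else 0.0 for i, tf in enumerate(v)]
            for v in tf_vectors]

# Two illustrative documents over a five-feature vocabulary:
corpus = [[1, 2, 5, 3, 0],
          [0, 1, 0, 2, 4]]
for w in tfidf_weights(corpus):
    print([round(x, 2) for x in w])
```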

14 IR Steps
Step 3: Compare vectors using cosine similarity: sim(w1, w2) = (w1 · w2) / (|w1| |w2|)
Example weighted vector from the slide: w1 = [0.33, 0.25, 0.66, 0.50]
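Step 3 as a short sketch. The vector w1 is the one from the slide; w2 is an invented second vector purely so there is something to compare against.

```python
import math

def cosine_similarity(w1, w2):
    # cos(theta) = (w1 . w2) / (|w1| * |w2|); 1.0 means identical direction
    dot = sum(a * b for a, b in zip(w1, w2))
    n1 = math.sqrt(sum(a * a for a in w1))
    n2 = math.sqrt(sum(b * b for b in w2))
    return dot / (n1 * n2) if n1 and n2 else 0.0

w1 = [0.33, 0.25, 0.66, 0.50]   # weighted vector from the slide
w2 = [0.20, 0.30, 0.60, 0.55]   # invented comparison vector, for illustration only
print(round(cosine_similarity(w1, w2), 3))
```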

15 IR Steps
Step 4: Document ranking using the similarity measure — a new document is compared against the collection and the matching documents are ranked by score (e.g. 0.90, 0.82, 0.76, 0.30).

16 Adapting IR for AV

17 Adapting IR for AV
Step 0: Map the program to a document — extract its sequence of operations.

Variant 1 fragment:
  l2D2: push ecx
        push 4
        pop ecx
        push ecx
  l2D7: rol edx, 8
        mov dl, al
        and dl, 3Fh
        shr eax, 6
        loop l2D7
        pop ecx
        call s319
        xchg eax, edx
        stosd
        xchg eax, edx
        inc [ebp+v4]
        cmp [ebp+v4], 12h
        jnz short l305

Variant 2 fragment (same body; registers, label targets and the rol/mov/and order differ):
  l144: push ecx
        push 4
        pop ecx
        push ecx
  l149: mov dl, al
        and dl, 3Fh
        rol edx, 8
        shr ebx, 6
        loop l149
        pop ecx
        call s52F
        xchg ebx, edx
        stosd
        xchg ebx, edx
        inc [ebp+v4]
        cmp [ebp+v4], 12h
        jnz short l18

Extracted operation sequences:
  Variant 1: push pop push rol mov and shr loop pop call xchg stosd xchg inc cmp jnz
  Variant 2: push pop push mov and rol shr loop pop call xchg stosd xchg inc cmp jnz
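A rough illustration of step 0: pulling the operation mnemonics out of a textual listing like the one above. The line format and the helper name are assumptions for this sketch; a real system would take the sequence straight from a disassembler rather than parsing text.

```python
def opcode_sequence(listing):
    """Keep only the operation mnemonics; labels, operands and comments are dropped."""
    ops = []
    for line in listing:
        line = line.split(";")[0].strip()        # strip trailing comments
        if ":" in line:                          # strip a leading "l2D7:"-style label
            line = line.split(":", 1)[1].strip()
        if line:
            ops.append(line.split()[0].lower())  # first token is the mnemonic
    return ops

fragment = ["l2D2: push ecx", "push 4", "pop ecx", "push ecx",
            "l2D7: rol edx, 8", "mov dl, al", "and dl, 3Fh",
            "shr eax, 6", "loop l2D7", "pop ecx"]
print(opcode_sequence(fragment))
# -> ['push', 'push', 'pop', 'push', 'rol', 'mov', 'and', 'shr', 'loop', 'pop']
```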

18 Adapting IR for AV
Step 1a: Defining features — the k-perm. A feature is a permutation of k consecutive operations.
Virus 1 operation letters: P P O P R M A S L O C X S X I C J
Virus 2 operation letters: P P O P M A R S L O C X S X I C J
(one letter per operation; the R M A block is reordered between the two variants)
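A sketch of the k-perm feature. The slide defines a feature as a permutation of k consecutive operations; here each window is canonicalized by sorting so that any reordering of the same k operations maps to the same feature. Sorting as the canonical form is my reading of the slide, not its exact construction.

```python
def kperm_features(ops, k=3):
    """One feature per window of k consecutive operations; sorting the window
    makes every permutation of the same k operations yield the same feature.
    (The sorting canonicalization is an assumption of this sketch.)"""
    return [tuple(sorted(ops[i:i + k])) for i in range(len(ops) - k + 1)]

# The reordered rol/mov/and block from slide 17 yields the same 3-perm:
print(kperm_features(["rol", "mov", "and"]) ==
      kperm_features(["mov", "and", "rol"]))   # True
```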

19 Adapting IR for AV
Step 1, example of 3-perms: a window of three consecutive operations slides over each virus's operation sequence (e.g. Virus 1: P P O P R M A S L O C X S X I C J), and each window becomes one feature; the slide shows the resulting windows for three virus variants.

20 Adapting IR for AV
Step 2: Construct feature vectors (4-perms)
Operation sequences: Virus 1 = P O P R M A S L, Virus 2 = P O P M A R S L, Virus 3 = M A R S L P O P

Feature:  POPR OPRM PRMA RMAS MASL POPM OPMA PMAR MARS ARSL RSLP SLPO LPOP
Virus 1:    1    1    1    1    1    0    0    0    0    0    0    0    0
Virus 2:    0    0    0    0    0    1    1    1    1    1    0    0    0
Virus 3:    0    0    0    0    0    0    0    0    1    1    1    1    1
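Putting steps 1 and 2 together for programs, a sketch that builds count vectors of canonicalized 4-operation windows for the three sequences above. The mnemonic spellings of the letter codes, the dictionary names, and the sorting canonicalization are mine.

```python
from collections import Counter

def kperm_counts(ops, k=4):
    # Count each canonicalized window of k consecutive operations
    return Counter(tuple(sorted(ops[i:i + k])) for i in range(len(ops) - k + 1))

# Slide 20's three letter sequences, spelled out
# (P = push, O = pop, R = rol, M = mov, A = and, S = shr, L = loop):
viruses = {
    "virus1": ["push", "pop", "push", "rol", "mov", "and", "shr", "loop"],
    "virus2": ["push", "pop", "push", "mov", "and", "rol", "shr", "loop"],
    "virus3": ["mov", "and", "rol", "shr", "loop", "push", "pop", "push"],
}
counts = {name: kperm_counts(ops) for name, ops in viruses.items()}
vocab = sorted(set(f for c in counts.values() for f in c))
for name in viruses:
    print(name, [counts[name][f] for f in vocab])
```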

21 Adapting IR for AV
Step 3: Compare vectors — cosine similarity (as before)
Step 4: Match the new sample

22 Vilo: System using IR for AV

23 Vilo Functional View — a new sample is submitted to Vilo, which compares it against the malware collection and returns the matching malware ranked by similarity (e.g. 0.90, 0.82, 0.76, 0.30).
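A toy end-to-end version of this functional view: index a labelled collection of operation sequences, then rank it against a new sample by cosine similarity over k-perm counts. The family names, the sample, k = 3, and the helper names are all illustrative, not Vilo's actual implementation.

```python
import math
from collections import Counter

def kperm_counts(ops, k=3):
    return Counter(tuple(sorted(ops[i:i + k])) for i in range(len(ops) - k + 1))

def cosine(c1, c2):
    dot = sum(v * c2[f] for f, v in c1.items())
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def vilo_match(sample_ops, collection, top=3):
    # Rank every labelled sample in the collection by similarity to the query
    q = kperm_counts(sample_ops)
    scored = [(cosine(q, kperm_counts(ops)), name) for name, ops in collection.items()]
    return sorted(scored, reverse=True)[:top]

collection = {
    "familyA.variant1": ["push", "pop", "push", "rol", "mov", "and", "shr", "loop"],
    "familyA.variant2": ["push", "pop", "push", "mov", "and", "rol", "shr", "loop"],
    "familyB.sample":   ["call", "test", "jz", "lea", "call", "ret"],
}
new_sample = ["push", "pop", "push", "and", "rol", "mov", "shr", "loop"]
for score, name in vilo_match(new_sample, collection):
    print(f"{score:.2f}  {name}")
```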

24 Vilo in Action: Query Match

25 Vilo: Performance
- Response time vs database size
- Search on a generic desktop: seconds
- Contrast with behavior match: minutes; graph match: minutes

26 Vilo Match Accuracy — ROC curve: true-positive rate vs false-positive rate.

27 Vilo in AV Product

28 Vilo in AV Product — AV systems are composed of classifiers; introduce Vilo as one more classifier alongside the existing AV scanner classifier.
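One way that composition could look, sketched under the assumption that the scanner's exact-signature classifier runs first and the Vilo classifier is consulted only as a fallback; the ordering, the threshold, and all names here are invented for illustration.

```python
def scan(sample_bytes, sample_ops, signatures, vilo_match, threshold=0.75):
    """Compose classifiers: exact signatures first, Vilo similarity second."""
    for name, sig in signatures.items():
        if sig in sample_bytes:
            return ("signature", name)      # exact match over an extract
    ranked = vilo_match(sample_ops)         # e.g. the ranking sketch above
    if ranked and ranked[0][0] >= threshold:
        return ("vilo", ranked[0][1])       # inexact match over the whole
    return ("unknown", None)
```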

29 Self-Learning AV Product — how to get the malware collection? Solution 1: collect the malware detected by the product itself and feed it to the Vilo classifier.

30 Self-Learning AV Product — how to get the malware collection? Solution 2: collect and learn in the cloud; the on-host Vilo classifier is backed by Vilo running in the Internet cloud.

31 Learning in the Cloud — Solution 2: collect and learn in the cloud; the classifiers on the endpoint are paired with a Vilo learner running in the Internet cloud, which collects samples and learns from them.

32 Experience with Vilo-Learning
- Vilo-in-the-cloud holds promise: it can utilize a cluster of workstations (like Google) and take advantage of increasing bandwidth and compute power
- Engineering issues to address: control growth of the database; forget samples; use "signature" feature vector(s) for a family; be "selective" about which features to use

33 Summary
- Weakness of current AV systems: exact match over an extract, exploited by creating a large number of variants
- Strength of Information Retrieval research: inexact match over the whole
- Vilo demonstrates that IR techniques hold promise
- Architecture of a self-learning AV system: integrate Vilo into existing AV systems and create a feedback mechanism to drive learning

