Arun Lakhotia, Professor
Andrew Walenstein, Assistant Professor
University of Louisiana at Lafayette
www.cacs.louisiana.edu/labs/SRL
AVAR 2008 (New Delhi)



Introduction
Director, Software Research Lab
  Lab’s focus: malware analysis
  Graduate-level course on malware analysis
Six years of AV-related research
  Issues investigated: metamorphism, obfuscation
Alumni in AV industry
  Prabhat Singh, Nitin Jyoti, Aditya Kapoor, Rachit Kumar: McAfee AVERT
  Erik Uday Kumar: Authentium
  Moinuddin Mohammed: Microsoft
  Prashant Pathak: ex-Symantec
Funded by: Louisiana Governor’s IT Initiative

Outline
Attack of variants
  AV vulnerability: exact match
Information Retrieval techniques
  Inexact match
Adapting IR to AV
  Account for code permutation
Vilo: system using IR for AV
Integrating Vilo into AV infrastructure
Self-learning AV using Vilo

ATTACK OF VARIANTS

Variants vs Family
Source: Symantec Internet Threat Report, XI

Analysis of attacker strategy
Purpose of attack of variants
  Denial of service on AV infrastructure
  Increase odds of passing through
Weakness exploited
  AV systems use exact match over extract
Attack strategy
  Generate just enough variation to beat exact match
Attacker cost
  Cost of generating and distributing variants

Analyzing attacker cost
Payload creation is expensive
  Must reuse payload
Need thousands of variants
  Must be automated
“General” transformers are expensive
  Specialized, limited transformers
  Hence packers/unpackers

Attacker vulnerability
Automated transformers
  Limited capability
  Machine generated, must have regular pattern
Exploiting attacker vulnerability
  Detect patterns of similarities
Approach
  Information Retrieval (this presentation)
  Markov Analysis (other work)

Information Retrieval

IR Basics
Basis of Google, bioinformatics
Organizing a very large corpus of data
Key idea
  Inexact match over whole
Contrast with AV
  Exact match over extract

IR Problem
Inputs to IR: a document collection and a query (keywords or a document)
Output: related documents

IR Steps
Step 1: Convert documents to vectors
  1a. Define a method to identify “features” (example: k consecutive words)
  1b. Extract all features from all documents
  1c. Count features, make feature vector
Example document: "Have you wondered / When is a rose a rose?"
Features from the example document: "Have you wondered", "You wondered when", "Wondered when rose", "When rose rose"
Other features in the corpus: "How about onions", "Onion smell stinks"
Feature vector of the example document: [1, 1, 1, 1, 0, 0]
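Step 1 can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' code: the function names are invented for this example, and the stop-word list ("is", "a") is an assumption chosen only so that the output matches the four features shown for the rose sentence. "How about onions" and "Onion smell stinks" are treated here as two further short documents.

```python
from collections import Counter

# Assumption: these stop words are dropped before features are formed.
STOP_WORDS = {"is", "a"}

def word_kgrams(text, k=3, stop_words=STOP_WORDS):
    """Step 1a/1b: return every run of k consecutive words as a feature."""
    words = [w.strip("?.,!").lower() for w in text.split()]
    words = [w for w in words if w and w not in stop_words]
    return [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]

def feature_vector(features, vocabulary):
    """Step 1c: count how often each vocabulary feature occurs."""
    counts = Counter(features)
    return [counts[f] for f in vocabulary]

docs = [
    "Have you wondered when is a rose a rose?",
    "How about onions",
    "Onion smell stinks",
]
all_features = [word_kgrams(d) for d in docs]
# Vocabulary = every feature seen in the corpus, in order of first appearance.
vocabulary = list(dict.fromkeys(f for feats in all_features for f in feats))
vector = feature_vector(all_features[0], vocabulary)
print(vector)
```

The vector has one position per feature in the whole corpus, so documents of different lengths become directly comparable.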

IR Steps
Step 2: Compute feature vectors
  Take into account features in the entire corpus
  Classical method: W = TF x IDF
    TF = term frequency
    DF = number of documents containing the feature
    IDF = inverse of DF
Example features: "You wondered when", "Wondered when rose", "When rose rose", "How about onions", "Onion smell stinks"
Example values (one column per feature):
  IDF:                 1/5   1/7   1/8   1/6   1/3
  w1 = TF x IDF(v1):   1/5   2/7   5/8   3/6   0/3
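The weighting in Step 2 can be illustrated as follows. The sketch follows the slide's definition literally, with IDF taken as the plain inverse 1/DF; many IR systems use a log-scaled IDF instead. The function name and the toy corpus of single-letter features are assumptions made for this example.

```python
from collections import Counter

def tf_idf_vectors(docs):
    """docs: list of feature lists. Returns (vocabulary, weight vectors).

    Weight = TF x IDF, where TF is the count of the feature in the
    document and IDF = 1 / DF (DF = number of documents containing it).
    """
    vocabulary = sorted({f for doc in docs for f in doc})
    # set(doc) makes each document count once per feature.
    df = Counter(f for doc in docs for f in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([tf[f] / df[f] for f in vocabulary])
    return vocabulary, vectors

# Toy corpus: each document is already a list of extracted features.
corpus = [["a", "b", "b"], ["b", "c"], ["c", "c", "c"]]
vocabulary, vectors = tf_idf_vectors(corpus)
for vec in vectors:
    print(vec)
```

A feature that appears in many documents gets a small IDF, so it contributes little to the comparison; a feature concentrated in few documents dominates.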

IR Steps
Step 3: Compare vectors
  Cosine similarity
  w1 = [0.33, 0.25, 0.66, 0.50]
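Cosine similarity is the dot product of two vectors divided by the product of their lengths. A minimal sketch, with a function name chosen for this example and a zero-vector guard that is an added assumption:

```python
import math

def cosine_similarity(u, v):
    """Return the cosine of the angle between equal-length vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: an empty vector is similar to nothing
    return dot / (norm_u * norm_v)

print(cosine_similarity([1, 2, 3], [2, 4, 6]))  # same direction
print(cosine_similarity([1, 0], [0, 1]))        # no shared features
```

Because the measure depends only on direction, a document and a longer document with the same feature proportions score as identical.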

IR Steps
Step 4: Document ranking
  Rank documents using the similarity measure
  Inputs to IR: a document collection and a new document
  Output: matching document
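Ranking then amounts to scoring every document in the collection against the new document and sorting. The sketch below uses sparse vectors stored as dictionaries; the names `rank`, `collection`, and the document labels are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {feature: weight}."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

def rank(query, collection, top_n=3):
    """Return the top_n (name, score) pairs, best match first."""
    scored = [(cosine(query, vec), name) for name, vec in collection.items()]
    # Sort by descending score; break ties by name for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(name, score) for score, name in scored[:top_n]]

collection = {
    "doc_a": {"x": 1.0, "y": 1.0},
    "doc_b": {"z": 1.0},
    "doc_c": {"x": 1.0},
}
ranking = rank({"x": 1.0}, collection)
print(ranking)
```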

Adapting IR for AV

Adapting IR for AV
Step 0: Mapping program to document
  Extract the sequence of operations

Variant 1:
  l2D2: push ecx
        push 4
        pop ecx
        push ecx
  l2D7: rol edx, 8
        mov dl, al
        and dl, 3Fh
        shr eax, 6
        loop l2D7
        pop ecx
        call s319
        xchg eax, edx
        stosd
        xchg eax, edx
        inc [ebp+v4]
        cmp [ebp+v4], 12h
        jnz short l305

Variant 2:
  l144: push ecx
        push 4
        pop ecx
        push ecx
  l149: mov dl, al
        and dl, 3Fh
        rol edx, 8
        shr ebx, 6
        loop l149
        pop ecx
        call s52F
        xchg ebx, edx
        stosd
        xchg ebx, edx
        inc [ebp+v4]
        cmp [ebp+v4], 12h
        jnz short l18

Operations, variant 1: push push pop push rol mov and shr loop pop call xchg stosd xchg inc cmp jnz
Operations, variant 2: push push pop push mov and rol shr loop pop call xchg stosd xchg inc cmp jnz
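Step 0 can be approximated by stripping labels and operands from a disassembly listing. The sketch below assumes a simple text format (optional `label:` prefix, mnemonic first, `;` comments); real disassembler output varies, and the function name is invented for this example.

```python
import re

# Assumption: a label is a word followed by a colon at the start of a line.
LABEL = re.compile(r"^\s*\w+:\s*")

def opcode_sequence(listing):
    """Return the mnemonics of a disassembly listing, in program order."""
    ops = []
    for line in listing.splitlines():
        line = line.split(";", 1)[0]       # drop trailing comments
        line = LABEL.sub("", line).strip()  # drop a leading label
        if line:
            ops.append(line.split()[0].lower())
    return ops

LISTING_1 = """
l2D2: push ecx
      push 4
      pop ecx
      push ecx
l2D7: rol edx, 8
      mov dl, al
      and dl, 3Fh
      shr eax, 6
      loop l2D7
      pop ecx
      call s319
      xchg eax, edx
      stosd
      xchg eax, edx
      inc [ebp+v4]
      cmp [ebp+v4], 12h
      jnz short l305
"""
ops = opcode_sequence(LISTING_1)
print(" ".join(ops))
```

Discarding registers, constants, and addresses is what makes the two variants look alike: they differ in `eax` versus `ebx` and in jump targets, none of which survive this mapping.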

Adapting IR for AV
Step 1a: Defining features (k-perm)
  Virus 1: P P O P R M A S L O C X S X I C J
  Virus 2: P P O P M A R S L O C X S X I C J
  Feature = permutation of k operations
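One reading of "feature = permutation of k operations" is that two windows of k consecutive operations count as the same feature whenever one is a reordering of the other. Under that assumption, a window can be canonicalized by sorting it. The sketch compares ordinary k-grams with k-perms on the two sequences above; all function names are hypothetical.

```python
from collections import Counter

def k_grams(ops, k):
    """Count windows of k consecutive operations, order preserved."""
    return Counter(tuple(ops[i:i + k]) for i in range(len(ops) - k + 1))

def k_perms(ops, k):
    """Count windows of k consecutive operations, ignoring order in the window."""
    return Counter(tuple(sorted(ops[i:i + k])) for i in range(len(ops) - k + 1))

def shared(a, b):
    """Number of feature occurrences two samples have in common."""
    return sum((a & b).values())

virus1 = list("PPOPRMASLOCXSXICJ")
virus2 = list("PPOPMARSLOCXSXICJ")

shared_grams = shared(k_grams(virus1, 3), k_grams(virus2, 3))
shared_perms = shared(k_perms(virus1, 3), k_perms(virus2, 3))
print(shared_grams, shared_perms)
```

The reordered block (R M A versus M A R) breaks every k-gram that overlaps it, while the k-perm view recovers the window whose contents are the same three operations.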

Adapting IR for AV
Step 1: Example of 3-perm
  Virus 1: P P O P R M A S L O C X S X I C J
  Diagram: Virus 1, Virus 2, and Virus 3 drawn as sequences of the blocks "P P O P", "M A R S L", "O C X S X", and "I C J", with "P O P" as the example 3-perm

Adapting IR for AV
Step 2: Construct feature vectors (4-perms)
  Sequences:
    P O P R M A S L
    P O P M A R S L
    M A R S L P O P
  Features: POPR, OPRM, PRMA, RMAS, MASL, POPM, OPMA, ARSL, RSLP, SLPO, LPOP, PMAR, MARS
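The three sequences on this slide can be turned into 4-perm count vectors as sketched below. Sorting the letters of each window is an assumed canonical form for a permutation feature, and the sample names are hypothetical. Under that assumption PMAR collapses onto PRMA and MARS onto RMAS, so the 13 distinct 4-grams become 11 distinct 4-perms.

```python
from collections import Counter

def k_perm_features(ops, k=4):
    """Count k-perms; each window is canonicalized by sorting its letters."""
    return Counter("".join(sorted(ops[i:i + k])) for i in range(len(ops) - k + 1))

samples = {"seq1": "POPRMASL", "seq2": "POPMARSL", "seq3": "MARSLPOP"}

counts = {name: k_perm_features(ops) for name, ops in samples.items()}
vocabulary = sorted({f for c in counts.values() for f in c})
vectors = {name: [c[f] for f in vocabulary] for name, c in counts.items()}

# For comparison: the number of distinct order-sensitive 4-grams.
distinct_4grams = {ops[i:i + 4] for ops in samples.values()
                   for i in range(len(ops) - 3)}

print(len(distinct_4grams), len(vocabulary))
for name, vec in vectors.items():
    print(name, vec)
```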

Adapting IR for AV
Step 3: Compare vectors
  Cosine similarity (as before)
Step 4: Match new sample
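Putting Steps 1 to 4 together, matching a new sample might look like the sketch below. It is an illustration under stated assumptions, not Vilo itself: features are sorted 4-windows, weights are TF x (1/DF), features unseen in the collection get DF = 1, and the 0.5 threshold and all names are invented for this example.

```python
import math
from collections import Counter

def k_perm_counts(ops, k=4):
    return Counter("".join(sorted(ops[i:i + k])) for i in range(len(ops) - k + 1))

def weight(counts, df):
    """TF x IDF with IDF = 1/DF; unseen features are given DF = 1 (assumption)."""
    return {f: tf / df.get(f, 1) for f, tf in counts.items()}

def cosine(u, v):
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

def best_match(sample_ops, collection, k=4, threshold=0.5):
    """Return (name, score) of the closest known sample, or (None, score)."""
    counts = {name: k_perm_counts(ops, k) for name, ops in collection.items()}
    # Iterating a Counter yields each feature once, so this counts documents.
    df = Counter(f for c in counts.values() for f in c)
    query = weight(k_perm_counts(sample_ops, k), df)
    score, name = max((cosine(query, weight(c, df)), name)
                      for name, c in counts.items())
    return (name, score) if score >= threshold else (None, score)

collection = {
    "virus1": "PPOPRMASLOCXSXICJ",
    "other": "NDNDNDNDND",  # made-up unrelated sequence
}
name, score = best_match("PPOPMARSLOCXSXICJ", collection)
print(name, round(score, 3))
```

The new sample shares 10 of its 14 windows with `virus1`, which is enough to clear the assumed threshold even though no exact signature over the reordered block would match.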

Vilo: System using IR for AV

Vilo Functional View
Inputs to Vilo: a malware collection and a new sample
Output: malware match

Vilo in Action: Query Match

Vilo: Performance
Response time vs database size
Search on generic desktop: in seconds
Contrast with:
  Behavior match: in minutes
  Graph match: in minutes

Vilo Match Accuracy
ROC curve: true positive vs false positive
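For readers unfamiliar with ROC curves: each point is the true-positive rate and false-positive rate obtained at one similarity threshold. The sketch below shows the computation only. The scores and labels are made-up toy values, not measurements of Vilo, and the function name is hypothetical.

```python
def roc_points(scores, labels, thresholds):
    """Return (false_positive_rate, true_positive_rate) per threshold.

    scores: similarity of each sample to its best match.
    labels: True if the sample really is malware.
    Assumes at least one positive and one negative label.
    """
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        points.append((fp / negatives, tp / positives))
    return points

# Toy data for illustration only.
toy_scores = [0.9, 0.7, 0.6, 0.2]
toy_labels = [True, True, False, False]
points = roc_points(toy_scores, toy_labels, thresholds=[0.8, 0.65, 0.5, 0.0])
print(points)
```

Lowering the threshold moves along the curve: more variants are caught, at the price of more clean files flagged.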

Vilo in AV Product

Vilo in AV Product
AV systems: composed of classifiers
Introduce Vilo as a classifier
Diagram labels: AV Scanner, Classifier, Vilo Classifier
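The idea of a scanner as a chain of classifiers, with a similarity-based classifier added alongside exact-match signatures, can be sketched as follows. Everything here is a stand-in: the signature table, the known sample, the byte-trigram Jaccard measure (used in place of the TF x IDF cosine match for brevity), and all names are assumptions made for illustration.

```python
from typing import Optional

# Hypothetical data for the sketch.
SIGNATURES = {b"BBBBCCCC": "Example.A"}
KNOWN_SAMPLES = {"Example.Family": b"AAAABBBBCCCCDDDD"}

def signature_classifier(sample: bytes) -> Optional[str]:
    """Exact match over an extract: fires only if the signature bytes appear."""
    for signature, name in SIGNATURES.items():
        if signature in sample:
            return name
    return None

def byte_trigrams(data: bytes) -> set:
    return {data[i:i + 3] for i in range(len(data) - 2)}

def similarity_classifier(sample: bytes, threshold: float = 0.5) -> Optional[str]:
    """Inexact match over the whole sample (Jaccard overlap of byte trigrams)."""
    grams = byte_trigrams(sample)
    for family, known in KNOWN_SAMPLES.items():
        known_grams = byte_trigrams(known)
        union = grams | known_grams
        if union and len(grams & known_grams) / len(union) >= threshold:
            return family
    return None

def scan(sample, classifiers):
    """Run classifiers in order; return (classifier_name, verdict) of the first hit."""
    for name, classify in classifiers:
        verdict = classify(sample)
        if verdict is not None:
            return name, verdict
    return None, None

PIPELINE = [("signature", signature_classifier),
            ("similarity", similarity_classifier)]

print(scan(b"AAAABBBBCCCCDDDD", PIPELINE))   # original sample
print(scan(b"AAAABBBBXCCCCDDDD", PIPELINE))  # variant that breaks the signature
```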

Self-Learning AV Product
How to get the malware collection?
Solution 1: Collect malware detected by the product.
Diagram label: Vilo Classifier
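A feedback loop of this kind can be sketched as a classifier that adds every sample it detects to its own collection. This is a toy model under stated assumptions: the class name, the Jaccard similarity over sorted 3-windows, the 0.4 threshold, and the three letter sequences are all invented for the example.

```python
def features(ops, k=3):
    """Set of k-perms (sorted k-windows) of an operation sequence."""
    return {"".join(sorted(ops[i:i + k])) for i in range(len(ops) - k + 1)}

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

class SelfLearningClassifier:
    def __init__(self, threshold=0.4):
        self.threshold = threshold
        self.collection = []  # feature sets of known malware

    def learn(self, ops):
        self.collection.append(features(ops))

    def is_malware(self, ops):
        f = features(ops)
        hit = any(jaccard(f, known) >= self.threshold for known in self.collection)
        if hit:
            self.collection.append(f)  # feedback: detections become knowledge
        return hit

V1 = "ABCDEFGH"  # known sample
V2 = "ABCDEFXY"  # variant of V1
V3 = "WZCDEFXY"  # variant of V2, too far from V1 to match directly

clf = SelfLearningClassifier()
clf.learn(V1)
print(clf.is_malware(V2), clf.is_malware(V3))
```

In this toy run V3 is only caught because V2 was detected and learned first, which is the appeal of the approach; the same loop would also retain any false positive, so a real system would presumably need a check before learning.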

Self-Learning AV Product
How to get the malware collection?
Solution 2: Collect and learn in the cloud
Diagram labels: Vilo Classifier, Internet Cloud, Vilo

Learning in the Cloud
How to get the malware collection?
Solution 2: Collect and learn in the cloud
Diagram labels: Vilo Classifier, Classifier, Internet Cloud, Vilo Learner

Experience with Vilo-Learning
Vilo-in-the-cloud holds promise
  Can utilize a cluster of workstations, like Google
  Takes advantage of increasing bandwidth and compute power
Engineering issues to address
  Control growth of database
    Forget samples
    Use “signature” feature vector(s) for family
  Be “selective” about features to use
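One possible way to keep the database small is to replace the individual samples of a family with a single averaged vector and to keep only its heaviest features. The slides do not say how the "signature" vector is built, so the centroid and the `top_n` cut-off below are assumptions, as are the function name and the toy vectors.

```python
from collections import defaultdict

def family_signature(vectors, top_n=None):
    """Average sparse {feature: weight} vectors into one family vector.

    If top_n is given, keep only the top_n heaviest features
    (ties broken alphabetically).
    """
    totals = defaultdict(float)
    for vec in vectors:
        for feature, w in vec.items():
            totals[feature] += w
    centroid = {f: w / len(vectors) for f, w in totals.items()}
    if top_n is not None:
        keep = sorted(centroid, key=lambda f: (-centroid[f], f))[:top_n]
        centroid = {f: centroid[f] for f in keep}
    return centroid

members = [{"a": 1.0, "b": 1.0}, {"a": 1.0, "c": 0.5}]
full = family_signature(members)
compact = family_signature(members, top_n=2)
print(full)
print(compact)
```

Once a family vector exists, the member vectors could be dropped ("forgotten"), trading some matching precision for a database that grows with the number of families rather than the number of variants.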

Summary
Weakness of current AV systems
  Exact match over extract
  Exploited by creating a large number of variants
Information Retrieval research strengths
  Inexact match over whole
VILO demonstrates that IR techniques have promise
Architecture of a self-learning AV system
  Integrate VILO into existing AV systems
  Create a feedback mechanism to drive learning