U.S. Army Research, Development and Engineering Command Jaime C. Acosta, Ph.D. Using the Longest Common Substring on Dynamic Traces of Malware to Automatically Identify Common Behaviors

Presentation transcript:

U.S. Army Research, Development and Engineering Command Jaime C. Acosta, Ph.D. Using the Longest Common Substring on Dynamic Traces of Malware to Automatically Identify Common Behaviors

Research Question Can we reduce redundant analysis by finding common behaviors in malware instances?

Malware Analysis Dynamic Analysis: Run the malware instance (binary) in a controlled environment –Log all events (registry, memory, sockets, etc.) –Analyze logs for malicious behavior –Find similar malware instances based on runtime behavior

Malware Analysis Event Logs: each logged event of Malware A is recorded as an event code. (Figure: Malware A trace with events Initialize network socket; Establish connection to malicious.com; Load library; Sleep.)
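The mapping from logged events to event codes can be sketched as follows. This is a minimal illustration only: the slide's actual code assignments did not survive extraction, so the table `EVENT_CODES`, the event names, and the helper `encode_trace` are all hypothetical.

```python
# Hypothetical event-to-code table. The real code assignments used in the
# talk are not recoverable, so these two-digit codes are illustrative only.
EVENT_CODES = {
    "initialize_network_socket": "01",
    "establish_connection": "02",
    "load_library": "03",
    "sleep": "04",
}


def encode_trace(events):
    """Translate a list of logged event names into a list of event codes."""
    return [EVENT_CODES[name] for name in events]


# Events listed on the slide for Malware A, in order.
malware_a_log = [
    "initialize_network_socket",
    "establish_connection",
    "load_library",
    "sleep",
]
malware_a_trace = encode_trace(malware_a_log)
```

Once a trace is a plain sequence of codes, comparing malware instances reduces to comparing sequences.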

Malware Instance Similarity Event n-grams (Rieck et al. 2010) –Find common fixed size n-grams (or sequences of events) in event logs. Example (figure: event-code traces for Malware A, Malware B, and Malware C): common 2-grams for Malware A / Malware B: 01, 02; 02, 03. Common 2-grams for Malware A / Malware C: 01, 02. Common 2-grams for Malware B / Malware C: 01, 02. Malware A / Malware B share the most 2-grams, so they are more likely to be of the same type.
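A small sketch of the fixed-size n-gram comparison described on this slide. The three traces are hypothetical (the slide's full traces were lost); they are chosen only so that the shared 2-grams match the ones the slide lists. The helper names `ngrams` and `common_ngrams` are assumptions, not taken from Rieck et al.

```python
def ngrams(trace, n):
    """Return the set of all length-n windows (as tuples) in a trace."""
    return {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}


def common_ngrams(trace_a, trace_b, n=2):
    """Return the n-grams that occur in both traces."""
    return ngrams(trace_a, n) & ngrams(trace_b, n)


# Hypothetical event-code traces, picked to reproduce the slide's 2-grams.
malware_a = ["00", "01", "02", "03"]
malware_b = ["01", "02", "03", "04"]
malware_c = ["04", "01", "02", "05"]

ab = common_ngrams(malware_a, malware_b)  # two shared 2-grams
ac = common_ngrams(malware_a, malware_c)  # one shared 2-gram
bc = common_ngrams(malware_b, malware_c)  # one shared 2-gram
```

Because the window size is fixed, a shared run of three events (01, 02, 03) only ever shows up as two separate 2-grams.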

Malware Instance Similarity Limitations for post analysis –Lose context given by varied-length sequences (figure: Malware A trace with events Initialize network socket; Establish connection to malicious.com; Load library; Sleep; …; Install a rootkit) –Lose commonalities between different types of malware (figure: event-code traces for Malware A, Malware B, and Malware C)

Approach Common Substrings Algorithm –Based on the Longest Common Substring –Finds all common event sequences of minimum (not fixed) length n between trace files in a dataset

Approach Malheur Reference Dataset –Dynamic traces of 3131 malware instances –Generated with CWSandbox –Trace size ranges from 700B to 3.4MB –Collected in August 2009

Approach Malheur Reference Dataset –Traces split into 2 sets

                                              Small Set (<100KB)    Large Set (>=100KB)
Total # malware instance trace files          2,071                 1,060
Total # events                                1,217,985             17,400,262
Total size of malware instance trace files    44 MB                 490 MB
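The split into a small and a large set can be sketched as a simple size threshold. The function name `split_by_size`, the file names, and the reading of "100KB" as 100 * 1024 bytes are assumptions; the slides do not say how the threshold was measured.

```python
# Assumption: "100KB" is taken to mean 100 * 1024 bytes.
SIZE_THRESHOLD = 100 * 1024


def split_by_size(trace_sizes, threshold=SIZE_THRESHOLD):
    """Partition trace names into (small, large) by file size in bytes."""
    small = sorted(n for n, size in trace_sizes.items() if size < threshold)
    large = sorted(n for n, size in trace_sizes.items() if size >= threshold)
    return small, large


# Hypothetical file names and sizes, spanning the 700B to 3.4MB range.
sizes = {
    "a.trace": 700,
    "b.trace": 99_999,
    "c.trace": 102_400,
    "d.trace": 3_400_000,
}
small_set, large_set = split_by_size(sizes)
```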

Approach Goal –Reduce redundant analysis, especially in larger malware. First, find common substrings within small malware traces. Next, reduce analysis workload by removing redundancies in larger malware traces.

Approach – Common Substrings Algorithm Input: Malware dynamic traces of the small set (size < 100KB), shown as event-code sequences for Malware A through Malware F. Output: Common substrings matrix with one row and one column per trace (A to F). Each filled cell holds all common substrings between a pair of malware traces; the diagonal and the mirrored half are marked X.

Approach – Common Substrings Algorithm Iteration (figure: comparison table with the events of Malware A along one axis and the events of Malware B along the other)

Approach – Common Substrings Algorithm Iteration 2 – match found, merge with upper left corner (figure: Malware A / Malware B comparison table in which the matching cell extends the entry 01 stored in its upper-left neighbor)

Approach – Common Substrings Algorithm Final Iteration (figure: completed Malware A / Malware B comparison table). We have 2 common substrings. We only keep those with minimum substring length 2.

Approach – Common Substrings Algorithm Selecting which Common Substrings to keep: We have 2 common substrings. We only keep those with minimum substring length 2. The kept substring 01,02 is stored in the Common Substrings Matrix in the cell for the pair Malware A / Malware B.
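The iteration slides describe a table in which a match is merged with its upper-left neighbor. That is the standard dynamic-programming table for the longest common substring, and a minimal sketch of it follows. The function name `common_substrings`, the example traces, and the choice to record each run once at its full length are assumptions; the slides do not spell out exactly which cell contents are collected.

```python
def common_substrings(trace_a, trace_b, min_len=2):
    """Return the set of common event runs with length >= min_len.

    Assumption: each matching run is recorded once, at its full length.
    """
    rows, cols = len(trace_a), len(trace_b)
    # table[i][j] = length of the common run ending at
    # trace_a[i - 1] and trace_b[j - 1].
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    found = set()
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if trace_a[i - 1] != trace_b[j - 1]:
                continue
            # Match found: extend the run stored in the upper-left cell.
            table[i][j] = table[i - 1][j - 1] + 1
            # The run is complete when the next diagonal pair differs
            # or either trace ends.
            run_ends = i == rows or j == cols or trace_a[i] != trace_b[j]
            if run_ends and table[i][j] >= min_len:
                found.add(tuple(trace_a[i - table[i][j]:i]))
    return found


# Hypothetical traces sharing the run 01, 02.
malware_a = ["00", "01", "02", "05"]
malware_b = ["03", "01", "02", "04"]
kept = common_substrings(malware_a, malware_b, min_len=2)
```

Unlike the fixed-size n-gram approach, `min_len` is only a lower bound, so a longer shared run is kept whole instead of being cut into windows.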

Approach – Common Substrings Algorithm Unique common substrings are merged. (Figure: the Common Substrings Matrix for traces A to F with its cells filled in; cell contents include 01,02; 02,03,04; and 03,02,20,40,35.) Small set (<100KB) common substrings after merging: 03,02,20,40,35; 03,02,02,02,03; 01,02,02; 00,02; 03,02; …
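Merging the matrix cells into one list of unique common substrings amounts to a set union. In the sketch below the matrix is a dictionary keyed by trace pair; the cell values reuse substrings shown on the slide, but which pair each one belongs to is an assumption, as are the name `merge_unique` and the longest-first ordering.

```python
def merge_unique(matrix):
    """Union the substrings of every pair cell, longest first."""
    merged = set()
    for substrings in matrix.values():
        merged |= substrings
    return sorted(merged, key=lambda sub: (-len(sub), sub))


# Illustrative matrix: the pair-to-cell placement is assumed.
matrix = {
    ("A", "B"): {("01", "02"), ("02", "03", "04")},
    ("A", "C"): {("03", "02", "20", "40", "35")},
    ("B", "C"): {("01", "02"), ("02", "03", "04")},
}
unique = merge_unique(matrix)
```

Substrings that occur in several cells are stored once, which is why the merged set can stay small even when the matrix is large.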

Approach – Common Substrings Algorithm Doesn't that take a lot of space? –Many shared common substrings –Total size of all unique common substrings was 25MB. Doesn't that take a lot of processing time? –Can be run on separate processes with multithreading –GPU
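Each pair of traces can be compared independently, which is what makes the work easy to spread across workers. The sketch below shows one way to do that with Python's standard `concurrent.futures`; it is not the author's implementation. `compare_all_pairs` is a hypothetical helper, and `shared_events` is only a stand-in for the real pairwise common-substring comparison.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def compare_all_pairs(traces, compare, max_workers=4):
    """Run compare(x, y) for every unordered pair of traces concurrently."""
    pairs = list(combinations(sorted(traces), 2))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pair: pool.submit(compare, traces[pair[0]], traces[pair[1]])
            for pair in pairs
        }
        return {pair: future.result() for pair, future in futures.items()}


def shared_events(trace_x, trace_y):
    """Stand-in comparison: the set of event codes present in both traces."""
    return set(trace_x) & set(trace_y)


# Hypothetical traces.
traces = {
    "A": ["00", "01", "02"],
    "B": ["01", "02", "03"],
    "C": ["04", "01"],
}
matrix = compare_all_pairs(traces, shared_events)
```

For CPU-bound comparisons in CPython, a process pool (`concurrent.futures.ProcessPoolExecutor`) is the usual substitute for the thread pool, since threads share one interpreter lock.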

Approach Find and remove common substrings in large set (size >= 100KB). Small set (<100KB) common substrings: 03,02,20,40,35; 03,02,02,02,03; 01,02,02; 00,02; 03,02; … In the figure, the large-set traces Malware AA, Malware BB, and Malware CC are 40%, 30%, and 50% shared. Average = 40%. This process was run several times with minimum length sizes 2 to 100.
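One plausible reading of "% shared" is the fraction of a large trace's events that fall inside an occurrence of some small-set common substring. The sketch below uses that reading, which is an assumption, as are the helper names and the large trace itself. The two substrings (03,02 and 00,02) come from the slide's list.

```python
from statistics import mean


def coverage_mask(trace, substrings):
    """Mark every event that lies inside an occurrence of a known substring."""
    covered = [False] * len(trace)
    for sub in substrings:
        sub = list(sub)
        n = len(sub)
        for start in range(len(trace) - n + 1):
            if trace[start:start + n] == sub:
                covered[start:start + n] = [True] * n
    return covered


def shared_fraction(trace, substrings):
    """Fraction of the trace covered by known common substrings."""
    if not trace:
        return 0.0
    return sum(coverage_mask(trace, substrings)) / len(trace)


def remove_shared(trace, substrings):
    """Return only the events that still need analysis."""
    mask = coverage_mask(trace, substrings)
    return [event for event, hit in zip(trace, mask) if not hit]


known = [("03", "02"), ("00", "02")]
# Hypothetical large-set trace of 10 events.
large_trace = ["03", "02", "07", "08", "09", "10", "00", "02", "11", "12"]

fraction = shared_fraction(large_trace, known)
remaining = remove_shared(large_trace, known)

# Averaging the per-trace percentages shown on the slide.
average = mean([40, 30, 50])

# Repeating the measurement for different minimum substring lengths.
by_min_len = {
    m: shared_fraction(large_trace, [s for s in known if len(s) >= m])
    for m in (2, 3)
}
```

Raising the minimum length shrinks the set of usable substrings, so the shared fraction can only stay the same or fall.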

Results Analyst's dream: Many long common substrings are shared with the larger set

Results (regions A, B, and C of the results figure) A – Not too interesting: finding common pairs of instructions is expected and will not reduce redundant analysis by much

Results B – Indicates that small traces can be analyzed first, reducing the larger set analysis by about half

Results C – Some reassurance that the dataset was reasonably diverse

Contributions –The common substring algorithm is capable of identifying similarities in dynamic traces of malware –Redundant event sequences can be identified to reduce analysis –Commonalities are not limited to short event sequences

Future Work –Use behavior templates. For example: regular expressions to identify recurring sequences (5 vs. 10 sleep events) –Develop a user interface –Optimization (GPU)
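The behavior-template idea can be illustrated with a regular expression that collapses any run of sleep events into one token, so traces with 5 and 10 sleeps compare as equal. This is a sketch of the proposal only, not an implemented feature of the work; the sleep code `04`, the `04+` token, and the helper `apply_template` are hypothetical.

```python
import re

SLEEP = "04"  # hypothetical event code for a sleep event

# One or more consecutive sleep codes in a space-separated trace.
SLEEP_RUN = re.compile(rf"\b{SLEEP}(?: {SLEEP})*\b")


def apply_template(trace):
    """Collapse every run of sleep events into the single token '04+'."""
    text = " ".join(trace)
    return SLEEP_RUN.sub(f"{SLEEP}+", text).split(" ")


# Two hypothetical traces that differ only in the number of sleeps.
five_sleeps = ["01", "02"] + [SLEEP] * 5 + ["03"]
ten_sleeps = ["01", "02"] + [SLEEP] * 10 + ["03"]

template_five = apply_template(five_sleeps)
template_ten = apply_template(ten_sleeps)
```

After templating, both traces reduce to the same sequence, so a common-substring search would treat them as the same behavior.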

Questions

Sample Common Substrings Retrieve file from server and replace system file –Load library –Connect –Download –Check if exists –Remove –Copy –Remove evidence

Dataset Reference