EFFECTIVE AND EFFICIENT MALWARE DETECTION AT THE END HOST Presentation by Clark Wachsmuth of the paper by C. Kolbitsch, P. M. Comparetti, C. Kruegel, E. Kirda, X. Zhou and X. Wang.

EFFECTIVE AND EFFICIENT MALWARE DETECTION AT THE END HOST
Presentation by Clark Wachsmuth
C. Kolbitsch, P. M. Comparetti, C. Kruegel, E. Kirda, X. Zhou and X. Wang
or: How I Learned to Stop Worrying and Love the Malware-infested Internet

THE PROBLEM (slide 2)
• Malware!
• Ineffective (and/or inefficient) detection models
• Detection can be evaded by malware authors through fairly simple means, such as polymorphism, obfuscation or system call reordering
• Resource-heavy detectors may be effective, but are not efficient enough for the average consumer computer

PAST IMPLEMENTATIONS  Network-based detection  Pros: Useful for detecting some network malware Modern malware is heavily network-bound  Cons: For network-based malware only Content sniffers thwarted by encrypted data Blending attacks make malicious data match normal data signatures 3

PAST IMPLEMENTATIONS (slide 4)
• Host-based detection
  - Pros: can observe the complete behavior of malware programs rather than being limited to a specific host resource; some pre-emptive strategies are possible
  - Cons: code obfuscation and polymorphism easily bypass methods such as byte signatures while keeping the same functionality; system call sequence-based detection is, again, easily bypassed by reordering calls or inserting unnecessary ones

PAST IMPLEMENTATIONS (slide 5)
• Static analysis-based detection
  - Pros: more effective because it focuses on malware behavior, and is therefore less hampered by obfuscation and polymorphism
  - Cons: the method itself is difficult to apply; it has its own weaknesses, such as handling metamorphic code and runtime packing; it takes a heavy toll on system resources, making it impractical for home computers

PAST IMPLEMENTATIONS (slide 6)
• Dynamic analysis-based detection
  - Pros: a less rigid focus on malware behavior allows a more general and broad way of detecting malware
  - Cons: can require special hardware for detection (data tainting); the large associated overhead makes it unusable on home computers

THE SOLUTION (slide 7)
• Effective and efficient malware detection, duh!
• But how?
• Effective:
  - Can't be duped by simple reordering or rearranging schemes
  - Doesn't rely only on known samples; can detect unknown running programs
  - No false positives
• Efficient:
  - Doesn't incur significant system resource overhead

THE PLAN (slide 8)
• In a sandboxed environment, observe different malware samples and develop fine-grained models
• Efficiently match these models against the run-time behavior of an unknown program
• If a match is found, terminate and eliminate the program

BUT HOW?! (slide 9)
• Build a behavior graph in which each node is an "interesting" system call
• Each node stores a symbolic expression (simple node) or a program "slice" (complex node) that can calculate the output of the system call
• At run time, these expressions/slices are used to detect whether that output is used as an argument of another interesting system call
• If so, an edge is created between the two nodes
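The graph described above can be pictured as a small data structure. The sketch below is a hypothetical illustration, not the authors' implementation. The class names are invented, and each node carries one function per argument that maps a parent node's output to the argument value expected at run time. A flag distinguishes symbolic expressions (simple) from program slices (complex). The system call names are illustrative Windows native API calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ArgFunction:
    """How one argument of a node is computed (hypothetical representation)."""
    parent: int                        # id of the node whose output feeds this argument
    compute: Callable[[bytes], bytes]  # maps the parent's output to the expected argument
    is_complex: bool = False           # True: program slice; False: symbolic expression


@dataclass
class Node:
    node_id: int
    syscall: str                                    # system call type
    args: Dict[str, ArgFunction] = field(default_factory=dict)


@dataclass
class BehaviorGraph:
    nodes: Dict[int, Node] = field(default_factory=dict)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def edges(self) -> List[Tuple[int, int, str]]:
        # An edge (src, dst, arg) exists wherever an argument of dst
        # is computed from the output of src.
        return [(f.parent, n.node_id, name)
                for n in self.nodes.values()
                for name, f in n.args.items()]


# Hypothetical example: a file is created whose name is derived from a registry value.
g = BehaviorGraph()
g.add_node(Node(1, "NtQueryValueKey"))
g.add_node(Node(2, "NtCreateFile",
                {"FileName": ArgFunction(parent=1, compute=lambda out: out + b".exe")}))
print(g.edges())  # [(1, 2, 'FileName')]
```

Putting the function on the consuming node keeps each dependency next to the argument it explains, so the edge list can be derived rather than stored separately.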

THE CONTROLLED ENVIRONMENT (slide 10)
• Uses Anubis (Analyzing Unknown Binaries)
• Disassembles instructions (including system calls) and keeps an instruction log
• Keeps a memory log recording where in memory each instruction reads and writes
• Each byte is tainted to detect data dependencies between system calls
• To capture control dependencies, values written within a branch are labeled with the taint of the branch's controlling instruction
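Byte-level tainting as described above can be sketched with a shadow memory that maps each address to the set of system calls its value depends on. This is a simplified, hypothetical model, not Anubis's actual implementation: it has no registers and no real instruction decoding, and the class and method names are invented for illustration.

```python
from typing import Dict, FrozenSet, Iterable

Taint = FrozenSet[int]  # ids of the system calls a byte depends on


class ShadowMemory:
    """Byte-granular taint labels, one entry per memory address."""

    def __init__(self) -> None:
        self.labels: Dict[int, Taint] = {}

    def get(self, addr: int) -> Taint:
        return self.labels.get(addr, frozenset())

    def syscall_output(self, call_id: int, addr: int, size: int) -> None:
        # Every byte written by system call `call_id` is tainted with its id.
        for a in range(addr, addr + size):
            self.labels[a] = frozenset({call_id})

    def compute(self, dst: int, srcs: Iterable[int], control: Taint = frozenset()) -> None:
        # Data dependency: dst inherits the union of its source bytes' labels.
        # Control dependency: a write inside a branch also inherits the
        # taint of the branch condition (`control`).
        self.labels[dst] = frozenset().union(*(self.get(s) for s in srcs)) | control


mem = ShadowMemory()
mem.syscall_output(call_id=7, addr=0x1000, size=2)          # two bytes returned by call #7
mem.compute(dst=0x2000, srcs=[0x1000, 0x1001])               # data dependency
mem.compute(dst=0x2001, srcs=[], control=mem.get(0x1000))    # write inside a branch on 0x1000
mem.compute(dst=0x3000, srcs=[])                             # constant: no dependency
```

If a later system call reads its argument from 0x2000 or 0x2001, its taint set {7} reveals a dependency on call #7, which is exactly the kind of edge the behavior graph records.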

THE INITIAL BEHAVIOR GRAPH (slide 11)
• Once all instructions are labeled, an initial graph is created with every system call as a node
• Edges are created when a dependency is found
• Using the logs, each system call argument is traced recursively backwards to determine how its value was produced
• The instructions found are gathered into a program slice. Tracing stops at a value that cannot be traced further (it came from outside the program), a value produced by an immediate operand, or a value coming from the initialized data segment
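The backward trace above can be sketched as a walk over the instruction log in reverse, tracking which locations still need a producer. In this hypothetical model, each log entry lists the locations it reads and writes, and an entry flagged `syscall_output` stands for a system call writing its result. The three stopping conditions from the slide map to: a system call output, an instruction with no reads (an immediate operand), and reads from a caller-supplied set of data-segment locations. Names and the pseudo-assembly in the comments are illustrative.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Insn:
    addr: int                     # address of the instruction in the binary
    reads: FrozenSet[str]         # locations read (registers or memory cells)
    writes: FrozenSet[str]        # locations written
    syscall_output: bool = False  # True if this entry is a system call writing its result


def backward_slice(log: List[Insn], arg_locs: Set[str], data_segment: Set[str]) -> List[Insn]:
    """Trace a system call argument backwards through the instruction log."""
    wanted = set(arg_locs)        # locations whose producer we still need to find
    sliced: List[Insn] = []
    for insn in reversed(log):
        defined = insn.writes & wanted
        if not defined:
            continue
        wanted -= defined
        if insn.syscall_output:
            continue              # value came from outside the program: stop this chain
        sliced.append(insn)
        # Trace the inputs further, except values from the initialized data segment.
        # An instruction with no reads (an immediate operand) ends its chain here.
        wanted |= set(insn.reads) - data_segment
    sliced.reverse()              # restore execution order
    return sliced


log = [
    Insn(0x10, frozenset(), frozenset({"buf"}), syscall_output=True),  # a call fills buf
    Insn(0x14, frozenset(), frozenset({"ecx"})),                       # mov ecx, 5
    Insn(0x18, frozenset({"eax"}), frozenset({"edx"})),                # unrelated
    Insn(0x1c, frozenset({"buf", "ecx"}), frozenset({"eax"})),         # eax = buf + ecx
    Insn(0x20, frozenset({"eax", "key"}), frozenset({"arg"})),         # arg = eax ^ key
]
s = backward_slice(log, {"arg"}, data_segment={"key"})
print([hex(i.addr) for i in s])  # ['0x14', '0x1c', '0x20']
```

The unrelated instruction at 0x18 is never included, and the system call at 0x10 marks where the argument depends on an earlier call's output.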

PROGRAM SLICES  FUNCTIONS 12  With the slice, we know how and who created the argument of the sys call  It’s not necessarily the direct program code, though (unrolled loops won’t match with different sizes)  Each line in binary that appears at least once in slice is marked and appropriate code copied to function. Non-marked lines become nops.  Stack needs fixing because stack creating code often not part of slice (uses instruction log)

SIMPLIFYING FUNCTIONS (slide 13)
• Yay! We now have a function that gives the expected output for a given input
• Some functions are quite long but compute something fairly basic
• These can be optimized into a smaller symbolic expression
• This optimization can greatly reduce overhead at the end host
• Other functions aren't so basic, so their program code is retained instead of a symbolic reduction
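The reduction above relies on symbolic execution. The sketch below swaps in a deliberately tiny stand-in. It tracks a single 32-bit input x through a straight-line list of operations and keeps the result as an affine expression a*x + b. Any operation outside that domain returns None, meaning the slice would be kept as a complex function. The operation names are hypothetical, and this is not the paper's engine.

```python
from typing import List, Optional, Tuple

MASK = 0xFFFFFFFF  # 32-bit registers


def simplify(ops: List[Tuple[str, int]]) -> Optional[Tuple[int, int]]:
    """Symbolically run ops on input x. Return (a, b) for a*x + b (mod 2**32),
    or None if an op falls outside the affine domain (keep the slice instead)."""
    a, b = 1, 0  # start with the identity expression x
    for op, c in ops:
        if op == "add":
            b = (b + c) & MASK
        elif op == "sub":
            b = (b - c) & MASK
        elif op == "mul":
            a, b = (a * c) & MASK, (b * c) & MASK
        elif op == "mov":          # overwrite with a constant
            a, b = 0, c & MASK
        else:
            return None            # e.g. xor: not affine, stays a complex function
    return a, b


def evaluate(expr: Tuple[int, int], x: int) -> int:
    a, b = expr
    return (a * x + b) & MASK


# 101 instructions that collapse to the identity expression x.
long_but_basic = [("add", 3)] * 100 + [("sub", 300)]
print(simplify(long_but_basic))  # (1, 0)
```

At the end host, evaluating `(1, 0)` costs one multiply and one add, instead of replaying 101 instructions for every system call the scanner checks.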

SCANNING END HOST (slide 14)
• The scanner monitors a running program's system calls
• It runs in user mode with administrator privileges
• It assumes programs cannot compromise the kernel
• All nodes in the behavior graph start out inactive
• When a system call is made, the scanner looks in the graph for inactive nodes of the same system call type whose parent nodes are active
• If one is found, the scanner checks all arguments computed by simple functions right away; complex functions are deferred, and assumed to hold in the meantime
• If all simple-function arguments hold, the node becomes active
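The activation rule above can be sketched as a scanner that receives each observed system call with its arguments and output. This is a hypothetical illustration, not the authors' scanner. It assumes that every parent referenced by an argument function is listed in the node's `parents`. It stores the observed output of each active node so that child nodes can evaluate their functions against it. Complex checks are only queued here and are evaluated later.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# (parent node id, function mapping the parent's output to the expected argument)
ArgSpec = Tuple[int, Callable[[bytes], bytes]]


@dataclass
class GNode:
    syscall: str
    parents: List[int] = field(default_factory=list)
    simple: Dict[str, ArgSpec] = field(default_factory=dict)     # symbolic expressions
    complex_: Dict[str, ArgSpec] = field(default_factory=dict)   # program slices


class Scanner:
    def __init__(self, graph: Dict[int, GNode]) -> None:
        self.graph = graph
        self.active: Dict[int, bytes] = {}          # active node id -> observed output
        self.deferred: Dict[int, List[tuple]] = {}  # node id -> complex checks to run later

    def on_syscall(self, syscall: str, args: Dict[str, bytes], output: bytes) -> List[int]:
        """Process one observed system call; return the ids of newly activated nodes."""
        activated = []
        for nid, node in self.graph.items():
            # Candidates: inactive nodes of the same type whose parents are all active.
            if nid in self.active or node.syscall != syscall:
                continue
            if not all(p in self.active for p in node.parents):
                continue
            # Simple functions are evaluated immediately.
            if not all(f(self.active[p]) == args.get(name)
                       for name, (p, f) in node.simple.items()):
                continue
            # Complex functions are deferred and optimistically assumed to hold.
            self.deferred[nid] = [(p, f, args.get(name))
                                  for name, (p, f) in node.complex_.items()]
            self.active[nid] = output
            activated.append(nid)
        return activated


graph = {
    1: GNode("NtQueryValueKey"),
    2: GNode("NtCreateFile", parents=[1],
             simple={"FileName": (1, lambda out: out + b".exe")}),
}
sc = Scanner(graph)
sc.on_syscall("NtQueryValueKey", {}, output=b"svchost")                # returns [1]
sc.on_syscall("NtCreateFile", {"FileName": b"svchost.exe"}, output=b"")  # returns [2]
```

A benign program that happens to call `NtCreateFile` with an unrelated name never activates node 2, because the simple function's expected value does not match.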

SCANNING END HOST (slide 15)
• When do we check the complex functions?
• When we reach an interesting node:
  - A node is interesting if it is a security-relevant system call (writing to the file system, network or registry, or starting new processes)
  - A node is also interesting if it has no outgoing edges
• If the complex functions hold, the interesting node is confirmed
• Otherwise, the node with the failing complex function becomes inactive, as does any subgraph rooted at it, and the edge being formed is discarded
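The deferred check above can be sketched as follows. This is a hypothetical illustration. The caller passes the path of active nodes leading to the interesting node, and each deferred entry holds a parent id, a stand-in "slice" function, and the argument value actually observed. On failure, the failing node and the subgraph rooted at it are removed from the active set. The set of security-relevant calls is an illustrative subset of Windows native API names, not the paper's list.

```python
from typing import Callable, Dict, List, Tuple

# A deferred complex check: (parent id, slice function, argument value actually observed)
Check = Tuple[int, Callable[[bytes], bytes], bytes]

# Illustrative subset of security-relevant Windows native calls (an assumption).
SECURITY_RELEVANT = {"NtWriteFile", "NtSetValueKey", "NtCreateProcess"}


def is_interesting(syscall: str, node: int, children: Dict[int, List[int]]) -> bool:
    # Interesting: a security-relevant call, or a node with no outgoing edges.
    return syscall in SECURITY_RELEVANT or not children.get(node)


def deactivate(root: int, active: Dict[int, bytes], children: Dict[int, List[int]]) -> None:
    # Deactivate `root` and every node in the subgraph rooted at it.
    stack, seen = [root], set()
    while stack:
        n = stack.pop()
        if n in seen:
            continue
        seen.add(n)
        active.pop(n, None)
        stack.extend(children.get(n, []))


def confirm(node: int, path: List[int], deferred: Dict[int, List[Check]],
            active: Dict[int, bytes], children: Dict[int, List[int]]) -> bool:
    """Run the deferred complex checks on `path` (the active nodes leading to the
    interesting `node`, root first). Returns True if the node is confirmed."""
    for nid in path + [node]:
        for parent, func, observed in deferred.get(nid, []):
            if func(active[parent]) != observed:
                deactivate(nid, active, children)
                return False
    return True


children = {1: [2], 2: [3]}
active = {1: b"svchost", 2: b"", 3: b""}
# Stand-in slice: node 2's argument should be node 1's output upper-cased.
ok = {2: [(1, lambda out: out.upper(), b"SVCHOST")]}
bad = {2: [(1, lambda out: out.upper(), b"OTHER")]}
confirm(3, [1, 2], ok, active, children)    # True: node 3 confirmed, program matched
confirm(3, [1, 2], bad, active, children)   # False: nodes 2 and 3 are deactivated
```

After the failed check, only node 1 remains active.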

MATCHING MALWARE (slide 16)
• If an interesting node is confirmed, the program is identified as malware
• However, if the graph contains no complex-function dependency, it is not used to detect future malware programs
• The subgraph leading to the interesting node is itself a behavior graph that captures one trait of the particular malware

DETECTION EFFECTIVENESS (slide 17)
• Generated behavior graphs for six popular malware families (Table 1)
• 100 samples of each family were selected from the database, and samples without interesting behavior were discarded
• 50 random samples were chosen from the remainder to create the behavior graphs (the training dataset)
• Not all samples could be detected, because some showed no interesting behavior and some complex functions crashed

TESTING DATA (slide 18)

IS IT EFFECTIVE? (slide 19)
• Detection worked well for some families and less well for others
• AV software is notoriously bad at classifying malware, so some of the family labels used as ground truth were wrong
• This was confirmed by manual inspection, especially for the Agent family
• Restricting the samples to 155 known variants yielded a 92% detection rate
• Restricting them to 108 unknown variants still achieved a 23% detection rate, indicating that the method can detect even some unknown variants
• This behavior-based method is more general than an AV scanner, and therefore needs fewer graphs than a scanner needs signatures

WHAT ABOUT FALSE POSITIVES? (slide 20)
• Tested on Windows XP using Internet Explorer, Firefox, Thunderbird, PuTTY and Notepad
• Yielded no false positives
• When complex functions were not checked and were simply assumed to hold, all of the above programs produced false positives
• Therefore, checking system call dependencies is at the root of this method's success

OK, BUT IS IT EFFICIENT? (slide 21)
• System setup for testing: Windows XP, single-core 1.8 GHz Pentium 4 with 1 GB RAM
• Tested using 7-Zip, Internet Explorer and Visual Studio

UMM, DID THAT SAY 40%? (slide 22)
• CPU-bound and I/O-bound tests showed low overhead
• Compiling is quite high, at 40% overhead
• Compiling makes about 5,000 system calls per second, compared to 7-Zip's 700 per second
• Compilation is the worst-case scenario
• An improved symbolic execution engine could possibly reduce the high cost of complex function evaluation (16.7%)
• Still performed well for common tasks
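A back-of-the-envelope comparison, using only the rates above, suggests why compilation stands out. It assumes, as a simplification not stated in the slides, that the scanner's cost grows with the number of system calls it has to inspect:

```latex
\frac{5000\ \text{calls/s (compiling)}}{700\ \text{calls/s (7-Zip)}} \approx 7.1
```

Under that assumption, the scanner has roughly seven times as many system calls per second to check while compiling as while running 7-Zip.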

LIMITATIONS (slide 23)
• Malware authors could use time-triggered behavior or command-and-control mechanisms to suppress malicious behavior during testing in the sandbox
• It is a reactive method that only works on malware that is already running
  - But new graphs can be deployed quickly, and it can detect some unknown variants
• Malware authors could change their algorithms, rendering the extracted program slices unusable
  - But changing algorithms is a lot of work, and this method still raises the bar considerably for malware authors

TECHNICAL CONTRIBUTIONS (slide 24)
• Developed effective models that capture detailed semantic information about a malware family
• Created a scanner that efficiently matches the behavior of an unknown, running program against these models by tracking system call dependencies
• Provided experimental evidence that the approach is feasible and usable in practice

CONCLUSION (slide 25)
• Effective? Check
  - With correctly labeled, known variants, a 92% detection rate was obtained with no false positives
• Efficient? Check
  - While compiling was a worst-case scenario, tasks common to the average end user incurred only low overhead