Automatically Generating Models for Botnet Detection Presenter: 葉倚任 Authors: Peter Wurzinger, Leyla Bilge, Thorsten Holz, Jan Goebel, Christopher Kruegel,


1 Automatically Generating Models for Botnet Detection
Presenter: 葉倚任
Authors: Peter Wurzinger, Leyla Bilge, Thorsten Holz, Jan Goebel, Christopher Kruegel, Engin Kirda
European Symposium on Research in Computer Security (ESORICS'09)

2 Outline
- Introduction
- System Overview
- Model Generation Data
- Generating Detection Models
- Evaluation
- Conclusion

3 Introduction
- Two main kinds of network-based detection systems
  - Vertical correlation techniques
    - Detect individual bots
    - Check traffic patterns, the content of C&C traffic, and bot-related activities
    - Require prior knowledge of the bot's C&C channels and propagation vectors
  - Horizontal correlation techniques
    - Detect a group of bots
    - Based on network traffic
    - Require at least two bots in the monitored network

4 Introduction (cont'd)
- Characteristic behavior of a bot
  - Receives commands from the botmaster
  - Carries out some actions in response to these commands
- This paper proposes a two-stage detection model that leverages these two characteristics
- In the experiments, the authors generated detection models for 18 different bot families
  - 16 controlled via IRC
  - One via HTTP (Kraken)
  - One via a peer-to-peer network (Storm Worm)

5 System Overview
- Input of the system
  - A collection of bot binaries
- Launch a bot in a controlled environment and record its network activity (traces)
- Identify the commands that the bot receives as well as its corresponding responses
- Translate the observations into detection models
- Output of the system
  - Detection models for different bot families

6 Detecting Procedure
- Stateful model (two-stage detection)
  1. Check whether a bot command is sent
  2. If so, check whether the response exceeds a threshold (e.g., the number of new connections opened by a host)
- Content-based specifications model commands (comparable to intrusion detection signatures)
- Network-based specifications model responses (comparable to anomaly detection)
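The two-stage check can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `DetectionModel`, `HostState`, `observe_payload`, and `check_response` are hypothetical, and requiring every listed threshold to be met is an assumption (the slides do not say how multiple features are combined).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DetectionModel:
    """Hypothetical container for one bot family's detection model."""
    tokens: List[bytes]           # command model: ordered token sequence
    thresholds: Dict[str, float]  # response model: feature name -> minimum value


@dataclass
class HostState:
    """Per-host state kept by the stateful detector."""
    command_seen: bool = False


def matches_command(payload: bytes, tokens: List[bytes]) -> bool:
    """Stage 1 test: all tokens occur in the payload, in order."""
    pos = 0
    for token in tokens:
        idx = payload.find(token, pos)
        if idx < 0:
            return False
        pos = idx + len(token)
    return True


def observe_payload(model: DetectionModel, state: HostState, payload: bytes) -> None:
    """Remember that a command matching the model was sent to this host."""
    if matches_command(payload, model.tokens):
        state.command_seen = True


def check_response(model: DetectionModel, state: HostState,
                   profile: Dict[str, float]) -> bool:
    """Stage 2 test: only evaluated once a command has been seen.

    Assumption: every threshold in the model must be reached.
    """
    if not state.command_seen:
        return False
    return all(profile.get(name, 0) >= bound
               for name, bound in model.thresholds.items())
```

Keeping the command flag per host is what makes the model stateful: a burst of new connections alone does not raise an alert unless a matching command was observed first.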

7 Model Generation Data
- Run each bot binary for a period of several days
- Locating bot responses
- Finding commands
- Extracting model generation data

8 Locating bot responses
- Assumption: bot responses lead to a change in network behavior
- Partition the network traffic into consecutive time intervals of equal length
- For each time interval, define 8 normalized features, together called the traffic profile
  [feature list not captured in this transcript]
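The partitioning step can be sketched as below. The transcript does not list the eight features or the normalization, so this sketch makes two loud assumptions: it uses only five illustrative features (total packets, distinct destination IPs, and UDP, HTTP, and SMTP packet counts, the latter four because later slides mention them), and it normalizes each feature by its maximum over the trace. The function name `traffic_profiles` and the packet tuple layout are hypothetical.

```python
def traffic_profiles(packets, interval=50.0):
    """Build one normalized profile vector per fixed-length time interval.

    packets: iterable of (timestamp_seconds, dst_ip, proto), where proto is
    one of "udp", "http", "smtp", or anything else (counted as a packet only).
    Returns a list of vectors ordered as
    [packets, distinct_ips, udp, http, smtp].
    """
    packets = sorted(packets)
    if not packets:
        return []
    start = packets[0][0]
    n = int((packets[-1][0] - start) // interval) + 1

    counts = [{"packets": 0, "udp": 0, "http": 0, "smtp": 0} for _ in range(n)]
    ips = [set() for _ in range(n)]
    for ts, dst_ip, proto in packets:
        k = int((ts - start) // interval)  # index of the interval holding ts
        counts[k]["packets"] += 1
        if proto in ("udp", "http", "smtp"):
            counts[k][proto] += 1
        ips[k].add(dst_ip)

    raw = [[c["packets"], len(s), c["udp"], c["http"], c["smtp"]]
           for c, s in zip(counts, ips)]
    # Assumed normalization: divide by the per-feature maximum (1 if all zero).
    maxima = [max(col) or 1 for col in zip(*raw)]
    return [[v / m for v, m in zip(row, maxima)] for row in raw]
```

Intervals without any packets still produce an all-zero vector, which keeps the time axis evenly spaced for the change detection that follows.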

9 Locating bot responses (cont'd)
- Convert the traffic profiles (vectors) into a time series d(t), computed over a sliding window of size ε
  [equation not captured in this transcript]
- Locate bot responses by applying the CUSUM algorithm
- ε = 5 and an interval length of 50 seconds delivered the best results in the tests
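Since the defining equation for d(t) is missing from the transcript, the sketch below assumes one plausible form: d(t) is the Euclidean distance between the mean of the ε profiles before t and the mean of the ε profiles starting at t. The CUSUM variant shown is the common one-sided form with a reset after each alarm; the `drift` and `threshold` parameters and both function names are hypothetical and are not taken from the slides.

```python
import math


def change_series(profiles, eps=5):
    """Assumed d(t): distance between the mean profile of the window
    before t and the mean profile of the window starting at t."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    d = {}
    for t in range(eps, len(profiles) - eps + 1):
        before = mean(profiles[t - eps:t])
        after = mean(profiles[t:t + eps])
        d[t] = math.dist(before, after)
    return d


def cusum_changes(d, drift, threshold):
    """One-sided CUSUM: accumulate d(t) - drift, clamp at zero, and report
    a change whenever the sum exceeds the threshold (then reset)."""
    s = 0.0
    changes = []
    for t in sorted(d):
        s = max(0.0, s + d[t] - drift)
        if s > threshold:
            changes.append(t)
            s = 0.0
    return changes
```

With ε = 5 and 50-second intervals, each window in this sketch would cover 250 seconds of traffic on either side of t.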

10 Finding bot commands
- After locating the bot responses, a small section of network traffic (a snippet) is extracted for each response
- Cluster the traffic snippets that lead to similar responses

11 Extracting model generation data
- Extract two pieces of information for the subsequent model generation step
  - A snippet
    - Contains 90 seconds of traffic
    - One interval, plus the last 30 seconds of the previous interval and the first 10 seconds of the following one
  - The average of the traffic profile vectors over the response period
    - This period is the time from the start of the current response to the next change in behavior
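The arithmetic behind the 90-second snippet and the averaged profile can be sketched as follows, assuming the 50-second interval length reported earlier (50 + 30 + 10 = 90). Which interval the snippet is centered on is not stated in the transcript, so the interval index is simply passed in; `snippet_window` and `response_profile` are hypothetical names.

```python
def snippet_window(interval_index, interval=50.0, before=30.0, after=10.0):
    """Return (start, end) in seconds of the snippet built around one
    interval: the interval itself, the tail of the previous interval,
    and the head of the following one."""
    start = interval_index * interval - before
    end = (interval_index + 1) * interval + after
    return max(0.0, start), end


def response_profile(profiles, response_start, next_change):
    """Element-wise average of the profile vectors from the start of a
    response up to (not including) the next change in behavior."""
    rows = profiles[response_start:next_change]
    return [sum(col) / len(rows) for col in zip(*rows)]
```

For example, `snippet_window(3)` covers seconds 120 to 210 of the trace, which is 90 seconds in total.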

12 Generating Detection Models
- Command model generation
- Response model generation

13 Command model generation
- The goal is to identify common elements in a particular behavior cluster
- First, apply a second clustering refinement step that groups similar network packet payloads within each behavior cluster
- Then apply the longest common subsequence algorithm to each set of similar payloads
- Generate one token sequence per set
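A minimal sketch of deriving a token sequence from two similar payloads with the longest common subsequence is shown below. It handles only a pair of payloads, and treating each run of characters that is contiguous in both payloads as one token (with a hypothetical `min_len` filter) is an assumption about how tokens are formed, not something the slides specify.

```python
def lcs_pairs(a, b):
    """Index pairs (i, j) of one longest common subsequence of a and b."""
    n, m = len(a), len(b)
    # dp[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk forward through the table to recover one optimal alignment.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def token_sequence(a, b, min_len=2):
    """Split the LCS into tokens: runs that are contiguous in both inputs.
    Works for str and bytes because it slices instead of indexing."""
    tokens, current, prev = [], a[:0], None
    for i, j in lcs_pairs(a, b):
        if prev is not None and (i, j) != (prev[0] + 1, prev[1] + 1):
            tokens.append(current)
            current = a[:0]
        current += a[i:i + 1]
        prev = (i, j)
    if current:
        tokens.append(current)
    return [t for t in tokens if len(t) >= min_len]
```

For the payloads `"scan A now"` and `"scan B now"`, the sketch yields the tokens `"scan "` and `" now"`: the parts shared by both commands, with the varying argument dropped.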

14 Response model generation
- Compute the element-wise average of the individual behavior profiles in a behavior cluster
- Require minimal bounds for certain network features
  - 1,000 for UDP packets
  - 100 for HTTP packets
  - 10 for SMTP packets
  - 20 for different IPs
- No detection model is generated if a response profile exceeds none of these thresholds
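The averaging and the minimal-bounds check can be sketched as below. The bound values are the ones listed on the slide; the feature key names, the function name `response_model`, and the choice to keep only the features that exceed their bound are assumptions made for illustration. The slides also do not state the time unit the bounds refer to.

```python
# Minimal bounds from the slide; the key names are hypothetical.
MIN_BOUNDS = {
    "udp_packets": 1000,
    "http_packets": 100,
    "smtp_packets": 10,
    "distinct_ips": 20,
}


def response_model(profiles):
    """Average the behavior profiles of one cluster element-wise and keep
    the features whose average exceeds its minimal bound.

    profiles: list of dicts mapping feature name -> unnormalized count.
    Returns None when no bound is exceeded (no model is generated).
    """
    avg = {name: sum(p[name] for p in profiles) / len(profiles)
           for name in MIN_BOUNDS}
    model = {name: value for name, value in avg.items()
             if value > MIN_BOUNDS[name]}
    return model or None
```

Returning `None` for quiet clusters mirrors the rule that a response must be pronounced enough to be distinguishable from ordinary traffic before a model is emitted.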

15 Evaluation
- Collected a set of 416 different (based on MD5 hash) bot samples
  - From Anubis
  - The collection period was more than 8 months
  - Each bot produced a traffic trace with a length of five days
- Divided into families of bots
  - 16 different IRC bot families (with 356 traffic traces)
  - One HTTP bot family (with 60 traffic traces)
  - One P2P bot family (Storm Worm, with 30 traffic traces)
- The IRC and HTTP traces sum to 416; adding the 30 Storm Worm traces gives 446 traces in total

17 Detection Capability
- Split the set of 446 network traces into training sets and test sets
- Each training set contained 25% of one bot family's traces
- This procedure was performed four times per family (four-fold cross-validation)
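The split can be sketched as follows. Note that, unlike the usual cross-validation setup, each fold here trains on one quarter of a family's traces. Using the remaining 75% of the same family as the test set is an assumption, since the slide only describes the training sets; `four_fold_splits` is a hypothetical helper.

```python
def four_fold_splits(traces, folds=4):
    """Yield (train, test) pairs for one bot family.

    Each fold trains on one quarter of the traces and (by assumption)
    tests on the remaining three quarters.
    """
    parts = [traces[i::folds] for i in range(folds)]
    for k in range(folds):
        train = parts[k]
        test = [t for i, part in enumerate(parts) if i != k for t in part]
        yield train, test
```

Applied to a family with 60 traces, every fold in this sketch trains on 15 traces and tests on 45, and each trace is used for training exactly once.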

18 Real-World Deployment
- Deployed a sensor
  - In front of the residential homes of RWTH Aachen University
  - At a Greek university network
- The total traffic was on the order of 94 billion network packets over a period of more than three months at the two sites in Europe

19 Real-World Deployment
- In the Greek network, most cases were false positives
- "BotHunter w/o Blacklist" means BotHunter without its blacklists of known DNS names and IP addresses
- The detection rate of BotHunter w/o Blacklist in the detection capability experiment drops to 39%

20 Conclusion
- This paper proposed a two-stage detection method that combines a command model and a response model
- It automatically derives signatures for the bot commands and network-level specifications for the bot responses
- It can generate models for IRC bots, HTTP bots, and even P2P bots such as Storm

