Guofei Gu, Roberto Perdisci, Junjie Zhang, and Wenke Lee


1 BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection
Guofei Gu, Roberto Perdisci, Junjie Zhang, and Wenke Lee
Presented by Hongrui Zhang

2 Overview
Introduction: botnets and how they are evolving
Current state-of-the-art botnet detection methods and their limitations
BotMiner: a novel general botnet detection framework
Implementation architecture
Experimental results
Conclusion and future work

3 Introduction
We define a botnet as: “A coordinated group of malware instances that are controlled via C&C channels.”
C&C channel: command and control protocols, ranging from IRC (Internet Relay Chat) to HTTP.
Malware instances perform activities such as scanning, spamming, binary downloading, and exploitation.

4 Botnets are evolving
Centralized to distributed (Storm worm)
Fast-flux service networks

5 Current state-of-the-art methods
Rishi: works only with the IRC protocol, so it cannot detect HTTP-based botnets.
BotSniffer: works only with centralized C&C, so it cannot detect distributed botnets.
BotHunter: relies on a specific bot infection behavior model, so it fails if a botnet changes its infection model.

6 Limitation
A botnet is characterized by both a C&C channel and malicious activities.
Some malware performs malicious activities without connecting to a C&C channel.
Normal applications (for example, ordinary file-sharing applications) may show similar communication patterns without performing any malicious activities.

7 BotMiner
A novel general botnet detection framework that is independent of the C&C protocol, structure, and infection model of botnets, and resilient to changes of C&C server addresses. It also requires no a priori knowledge of specific botnets.
It clusters similar communication activities in the communication traffic, clusters similar malicious activities in the activity traffic, and then performs cross-cluster correlation to identify the hosts that belong to a botnet.

8 BotMiner Framework
Five components:
  A-Plane Monitor
  C-Plane Monitor
  A-Plane Clustering
  C-Plane Clustering
  Cross-Plane Correlation

9 C-Plane Traffic Monitors
The C-plane monitor captures network flows and records, for each flow, the time, duration, source IP, source port, destination IP, destination port, and the number of packets and bytes transferred in each direction. This information can be obtained from router flow logs.
The authors' own tool, fcapture, is built on the Judy library and has a very low packet loss ratio on high-speed networks (300 Mbps traffic).
It generates 200 MB to 1 GB of data per day, compared with 36 GB of binary flow data from Argus.

10 A-Plane Traffic Monitors
The A-plane monitor analyzes outbound traffic passing through the monitored network and detects malicious activities performed by internal hosts. It covers the most common activities a botmaster may command:
  Scan: SCADE (Statistical sCan Anomaly Detection Engine), used as a Snort preprocessor plug-in, with modules for abnormally high scan rate and weighted failed connection rate.
  Spam: a Snort plug-in that detects anomalous numbers of DNS queries for MX (mail exchange) records from the same source IP, and of SMTP (Simple Mail Transfer Protocol) connections initiated by the same source to mail servers outside the monitored network.
  Binary downloading: BotHunter's egg download detection method.
Relying only on the A-plane monitor would generate many false positives.
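The spam check described on this slide can be illustrated with a small counting sketch. This is not the Snort plug-in itself: the event format, the `suspected_spammers` name, and both thresholds are hypothetical, and flagging a host when either counter is exceeded is an assumption, since the slide does not say how the two signals are combined.

```python
from collections import Counter

# Hypothetical thresholds; the slides do not give the real values.
MX_QUERY_LIMIT = 50     # MX lookups per source IP per observation window
SMTP_CONN_LIMIT = 20    # outbound SMTP connections per source IP per window


def suspected_spammers(events):
    """Return internal source IPs with anomalous mail-related activity.

    events: iterable of (src_ip, kind) pairs for one observation window,
    where kind is "mx_query" (DNS query for an MX record) or "smtp_out"
    (SMTP connection to a mail server outside the monitored network).
    """
    mx_queries, smtp_conns = Counter(), Counter()
    for src_ip, kind in events:
        if kind == "mx_query":
            mx_queries[src_ip] += 1
        elif kind == "smtp_out":
            smtp_conns[src_ip] += 1

    # Assumption: exceeding either limit is enough to flag the host.
    flagged = {ip for ip, n in mx_queries.items() if n > MX_QUERY_LIMIT}
    flagged |= {ip for ip, n in smtp_conns.items() if n > SMTP_CONN_LIMIT}
    return flagged
```

A fixed threshold like this is exactly the kind of detector that produces false positives on its own, for example on a legitimate internal mail relay, which is why the A-plane output is later correlated with the C-plane.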

11 C-Plane Clustering
C-plane clustering reads the logs produced by the C-plane monitor and finds clusters of machines that share similar communication patterns.

12 C-Plane Clustering
Filtering:
  F1: filter out communications that are not directed from internal hosts to external hosts.
  F2: filter out flows that are not completely established (one-way traffic).
  F3: whitelist; filter out flows whose destinations are well known as legitimate servers (the US top 100 and global top 100 most popular websites).
Aggregation: related flows are aggregated into communication flows (C-flows). All TCP/UDP flows that share the same protocol, source IP, destination IP, and destination port belong to the same C-flow.
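The filtering and aggregation steps can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Flow` record layout, the internal network range, and the whitelist entry are hypothetical, and F2 is approximated by requiring packets in both directions.

```python
from collections import defaultdict
from ipaddress import ip_address, ip_network
from typing import NamedTuple


class Flow(NamedTuple):
    start: float      # start time in seconds
    duration: float   # seconds
    proto: str        # "tcp" or "udp"
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    pkts_out: int     # packets from source to destination
    pkts_in: int      # packets from destination back to source
    bytes_out: int
    bytes_in: int


# Hypothetical values, for illustration only.
INTERNAL_NET = ip_network("10.0.0.0/8")
WHITELIST = {"93.184.216.34"}   # stand-in for well-known legitimate servers


def is_internal(ip: str) -> bool:
    return ip_address(ip) in INTERNAL_NET


def keep(flow: Flow) -> bool:
    # F1: keep only flows from an internal host to an external host.
    if not is_internal(flow.src_ip) or is_internal(flow.dst_ip):
        return False
    # F2: drop one-way traffic (assumption: no packets in one direction).
    if flow.pkts_out == 0 or flow.pkts_in == 0:
        return False
    # F3: drop flows to whitelisted destinations.
    return flow.dst_ip not in WHITELIST


def aggregate(flows):
    """Group surviving flows into C-flows keyed by
    (protocol, source IP, destination IP, destination port)."""
    cflows = defaultdict(list)
    for f in flows:
        if keep(f):
            cflows[(f.proto, f.src_ip, f.dst_ip, f.dst_port)].append(f)
    return dict(cflows)
```

Note that the source port is deliberately left out of the key, so repeated connections from one host to the same service end up in a single C-flow.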

13 C-Plane Clustering
Vector representation of C-flows: extract a number of statistical features from each C-flow and translate them into d-dimensional pattern vectors.
  FPH: the number of flows per hour
  PPF: the number of packets per flow
  BPP: the average number of bytes per packet
  BPS: the average number of bytes per second
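As a rough illustration of the four features, the sketch below turns the flows of one C-flow into four lists of samples. The tuple layout and the `cflow_samples` name are hypothetical. It also assumes that start times are measured in seconds from the beginning of the observation day and that hours with no flows count as zero, neither of which the slide specifies.

```python
from collections import Counter


def cflow_samples(flows, epoch_hours=24):
    """Compute sample lists for FPH, PPF, BPP, and BPS.

    flows: list of (start_s, duration_s, packets, bytes) tuples belonging
    to one C-flow; start_s is seconds since the start of the day.
    """
    per_hour = Counter(int(start // 3600) for start, _, _, _ in flows)
    return {
        # Flows per hour: one sample per hour of the epoch.
        "fph": [per_hour.get(h, 0) for h in range(epoch_hours)],
        # Packets per flow: one sample per flow.
        "ppf": [pkts for _, _, pkts, _ in flows],
        # Average bytes per packet within each flow.
        "bpp": [nbytes / pkts for _, _, pkts, nbytes in flows if pkts > 0],
        # Average bytes per second within each flow.
        "bps": [nbytes / dur for _, dur, _, nbytes in flows if dur > 0],
    }
```

Each list is a discrete sample distribution; these distributions, not single averages, are what gets turned into the pattern vector.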

14 C-Plane Clustering
Given the discrete sample distribution of each of these four variables, a binning technique describes each distribution as a vector of 13 elements, for a total of 52 elements per C-flow.
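The binning step can be sketched as below. The slide does not say how the bin boundaries are chosen, so this sketch simply assumes that 12 sorted breakpoints per feature are supplied from outside (for example, quantiles of a reference sample), which yields 13 bins. The function names and the normalization to relative frequencies are assumptions.

```python
from bisect import bisect_right

FEATURES = ("fph", "ppf", "bpp", "bps")


def bin_distribution(samples, breakpoints):
    """Describe a sample distribution as len(breakpoints) + 1 relative
    frequencies. breakpoints must be sorted in ascending order."""
    counts = [0] * (len(breakpoints) + 1)
    for x in samples:
        counts[bisect_right(breakpoints, x)] += 1
    total = len(samples)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


def cflow_vector(samples_by_feature, breakpoints_by_feature):
    """Concatenate the binned distributions of the four features.
    With 12 breakpoints per feature this gives 4 * 13 = 52 elements."""
    vec = []
    for name in FEATURES:
        vec.extend(bin_distribution(samples_by_feature[name],
                                    breakpoints_by_feature[name]))
    return vec
```

Fixing the breakpoints once for the whole data set keeps the vectors of different C-flows comparable element by element.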

15 C-Plane Clustering
Clustering is challenging because the dataset is often large, even for a moderate-scale network, and the dimensionality d of the feature space is also large. A two-step clustering is used:
  Coarse-grained clustering: reduce the 52 elements to 8 by computing the mean and variance of each of the four distributions, then apply the X-means clustering algorithm.
  Refined clustering: use all 52 elements on each of the relatively small clusters from the first step and apply X-means again to refine the result.
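The two-step idea can be sketched with the clustering algorithm left as a parameter. This is only an outline under stated assumptions: `cluster_fn` is a hypothetical stand-in for X-means (which is not in the Python standard library), population variance is assumed because the slide does not say which variance is used, and empty sample lists are mapped to zeros.

```python
from statistics import fmean, pvariance

FEATURES = ("fph", "ppf", "bpp", "bps")


def coarse_vector(samples_by_feature):
    """Reduce the four sample distributions to 8 numbers:
    (mean, variance) for each of FPH, PPF, BPP, and BPS."""
    vec = []
    for name in FEATURES:
        xs = samples_by_feature[name]
        vec += [fmean(xs), pvariance(xs)] if xs else [0.0, 0.0]
    return vec


def two_step_cluster(items, coarse_vec, full_vec, cluster_fn):
    """Cluster cheaply on coarse vectors, then refine each coarse
    cluster on the full vectors.

    cluster_fn(vectors) must return one cluster label per vector.
    """
    def split(members, to_vec):
        labels = cluster_fn([to_vec(m) for m in members])
        groups = {}
        for member, label in zip(members, labels):
            groups.setdefault(label, []).append(member)
        return list(groups.values())

    refined = []
    for coarse_cluster in split(items, coarse_vec):
        refined.extend(split(coarse_cluster, full_vec))
    return refined
```

The expensive 52-dimensional pass only ever runs inside one coarse cluster at a time, which is where the saving comes from.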

16 A-Plane Clustering
Two layers of clustering:
  First, cluster clients according to the types of their activities.
  Second, further cluster clients according to specific activity features:
    Scan: features such as the scanned ports or the target subnet.
    Spam: features such as highly overlapping SMTP connection destinations or similar URLs embedded in the spam content. (The authors use only the first layer.)
    Binary downloading: features such as similar binaries. (The authors capture only the first packet for clustering.)

17 A-Plane Clustering

18 Cross-Plane Correlation
The idea is to cross-check clusters in the two planes to find intersections that reinforce the evidence that a host is part of a botnet.
First, filter out the hosts that have a score below a certain detection threshold.
Then group the remaining, most suspicious hosts according to a similarity metric that takes both the C-plane and the A-plane into account.
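The slide does not give the scoring formula, so the sketch below should be read as one plausible form rather than the authors' equation. It assumes that a host's score grows with the Jaccard overlap between the A-plane clusters and C-plane clusters it belongs to, and between its A-plane clusters of different activity types. The function names and the optional per-activity weights are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap of two host sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def botnet_score(host, a_clusters, c_clusters, weight=None):
    """Score one host from its cluster memberships (assumed formula).

    a_clusters: list of (activity_type, set_of_hosts) from the A-plane.
    c_clusters: list of set_of_hosts from the C-plane.
    weight: optional dict mapping activity_type to a weight (default 1.0).
    """
    weight = weight or {}
    mine_a = [(t, hosts) for t, hosts in a_clusters if host in hosts]
    mine_c = [hosts for hosts in c_clusters if host in hosts]

    score = 0.0
    # Overlap between activity clusters of different types.
    for (t1, h1), (t2, h2) in combinations(mine_a, 2):
        if t1 != t2:
            score += weight.get(t1, 1.0) * weight.get(t2, 1.0) * jaccard(h1, h2)
    # Overlap between each activity cluster and each communication cluster.
    for t, hosts in mine_a:
        for c_hosts in mine_c:
            score += weight.get(t, 1.0) * jaccard(hosts, c_hosts)
    return score
```

Under this form a host that shows no malicious activity scores zero regardless of its communication pattern, which matches the idea that both planes must agree before a host is reported.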

19 Cross-Plane Correlation
A second equation computes the similarity between hosts, which makes it possible to apply hierarchical clustering.
The result is a dendrogram that encodes the relationships among the bots. The Davies-Bouldin (DB) validation index is used to find the best dendrogram cut.

20 Experiment setup and results
Traffic monitors were set up on the campus network of the College of Computing at Georgia Tech, and both the C-plane and A-plane monitors were run for a 10-day period in late [...].
A total of four different botnets were collected. One additional botnet came from a virtual network and one dates from 2004.

21 Experiment setup and results
The last two botnets were obtained from real-world traces.
Nugache is a TCP-based P2P bot that performs encrypted communications on port 8. The authors ran it on a VM-based honeypot and observed scanning activity to port 8 as it attempted to connect to its seeding peers. Such activity is easily detected through A-plane clustering.
Storm is well known as a spam botnet with a huge number of infected hosts. As with Nugache, a large amount of spam traffic can be observed, so A-plane clustering detects it as well.

22 Result evaluation

23 Result evaluation

24 Result evaluation
BotMiner successfully identified all the bots in 6 botnets.
There was one false negative each for Botnet-IRC-spybot and Botnet-IRC-N.
The false positive rate was very low.

25 Limitation and potential solution
Evading C-plane monitoring and clustering:
  Botnets may use a legitimate website for C&C and thus evade detection because of our whitelist filter. This can be fixed by clustering the network traffic directed to the server that the secondary URL points to.
  Botnets may intentionally manipulate their communication patterns. On its own this does not evade detection, just as P2P botnets are still clustered.
  A more advanced approach is to randomize each individual communication pattern, for example by injecting random packets into a flow.
  Botnets could also use covert channels to hide their communication.

26 Limitation and potential solution
Evading A-plane monitoring and clustering:
  Botnets may perform very stealthy malicious activities, such as scanning or sending spam very slowly (e.g., one scan per hour, or one spam message per day).
  A more advanced evasion is to differentiate the bots and avoid commanding bots in the same monitored network in the same way. To defeat such evasion, we would have to deploy distributed monitors on the Internet to cover a larger monitored space.

27 Limitation and potential solution
Evading cross-plane monitoring and clustering:
  The botmaster can evade detection by delaying tasks for an extremely long time, since our approach uses only one day’s data.
  As a solution, we may use data from multiple days and cross-check back over several days.
  This is a trade-off for both BotMiner and the botmaster. For us, it may generate more false positives or cause us to miss potential botnets. For the botmaster, it reduces the efficiency of coordinating the bot army.

28 Conclusion and future work
BotMiner shows excellent detection accuracy on various types of botnets (including IRC-based, HTTP-based, and P2P-based botnets) with a very low false positive rate on normal traffic.
New techniques to monitor and cluster the communication and activity patterns of botnets should be studied, because future botnets may use evasion techniques.
In addition, the C-flow conversion and clustering algorithms could be improved by combining different correlation techniques.
A new real-time detection system should also be developed.

29 Q&A

