Presentation is loading. Please wait.

Presentation is loading. Please wait.

2008-7-31 Guofei Gu BotMiner BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure- Independent Botnet Detection Guofei Gu 1,2,

Similar presentations


Presentation on theme: "2008-7-31 Guofei Gu BotMiner BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure- Independent Botnet Detection Guofei Gu 1,2,"— Presentation transcript:

1 Guofei Gu BotMiner BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure- Independent Botnet Detection Guofei Gu 1,2, Roberto Perdisci 3, Junjie Zhang 1, and Wenke Lee 1 1 Georgia Tech 3 Damballa, Inc. 2 Texas A&M University

2 2 Roadmap Introduction –Botnet problem –Challenges for botnet detection –Related work BotMiner –Motivation –Design –Evaluation Conclusion Roadmap

3 3 What Is a Bot/Botnet? Bot –A malware instance that runs autonomously and automatically on a compromised computer (zombie) without owners consent –Profit-driven, professionally written, widely propagated Botnet (Bot Army): network of bots controlled by criminals –Definition: A coordinated group of malware instances that are controlled by a botmaster via some C&C channel –Architecture: centralized (e.g., IRC,HTTP), distributed (e.g., P2P) –25% of Internet PCs are part of a botnet! ( - Vint Cerf) bot C&C Botmaster Introduction BotMiner Conclusion Botnet Problem Challenges for Botnet Detection Related Work

4 4 Botnets are used for … All DDoS attacks Spam Click fraud Information theft Phishing attacks Distributing other malware, e.g., spyware Introduction BotMiner Conclusion Botnet Problem Challenges for Botnet Detection Related Work

5 5 Challenges for Botnet Detection Bots are stealthy on the infected machines –We focus on a network-based solution Bot infection is usually a multi-faceted and multi- phased process –Only looking at one specific aspect likely to fail Bots are dynamically evolving –Static and signature-based approaches may not be effective Botnets can have very flexible design of C&C channels –A solution very specific to a botnet instance is not desirable Botnet Problem Challenges for Botnet Detection Related Work Introduction BotMiner Conclusion

6 6 Why Existing Techniques Not Enough? Traditional AV tools –Bots use packer, rootkit, frequent updating to easily defeat AV tools Traditional IDS/IPS –Look at only specific aspect –Do not have a big picture Honeypot –Not a good botnet detection tool Introduction BotMiner Conclusion Botnet Problem Challenges for Botnet Detection Related Work

7 7 Existing Botnet Detection Work [Binkley,Singh 2006]: IRC-based bot detection combine IRC statistics and TCP work weight Rishi [Goebel, Holz 2007]: signature-based IRC bot nickname detection [Livadas et al. 2006, Karasaridis et al. 2007]: (BBN, AT&T) network flow level detection of IRC botnets (IRC botnet) BotHunter [Gu etal Security07]: dialog correlation to detect bots based on an infection dialog model BotSniffer [Gu etal NDSS08]: spatial-temporal correlation to detect centralized botnet C&C TAMD [Yen, Reiter 2008]: traffic aggregation to detect botnets that use a centralized C&C structure Botnet Problem Challenges for Botnet Detection Related Work Introduction BotMiner Conclusion

8 8 Why BotMiner? Botnets can change their C&C content (encryption, etc.), protocols (IRC, HTTP, etc.), structures (P2P, etc.), C&C servers, infection models … Introduction BotMiner Conclusion Motivation Design Evaluation Example: Nugache, Storm, …

9 9 BotMiner: Protocol- and Structure- Independent Detection Enterprise-like Network Horizontal correlation - Bots are for long-term use - Botnet: communication and activities are coordinated/similar Introduction BotMiner Conclusion Motivation Design Evaluation Internet

10 10 Revisit the Definition of a Botnet A coordinated group of malware instances that are controlled by a botmaster via some C&C channel We need to monitor two planes –C-plane (C&C communication plane): who is talking to whom –A-plane (malicious activity plane): who is doing what Introduction BotMiner Conclusion Motivation Design Evaluation

11 11 BotMiner Architecture Introduction BotMiner Conclusion Motivation Design Evaluation

12 12 BotMiner C-plane Clustering What characterizes a communication flow (C- flow) between a local host and a remote service? – Introduction BotMiner Conclusion Motivation Design Evaluation

13 13 How to Capture Talking in What Kind of Patterns? Temporal related statistical distribution information in –BPS (bytes per second) –FPH (flow per hour) Spatial related statistical distribution information in –BPP (bytes per packet) –PPF (packet per flow) Introduction BotMiner Conclusion Motivation Design Evaluation

14 14 Two-step Clustering of C-flows Why multi-step? How? –Coarse-grained clustering Using reduced feature space: mean and variance of the distribution of FPH, PPF, BPP, BPS for each C-flow (2*4=8) Efficient clustering algorithm: X-means –Fine-grained clustering Using full feature space (13*4=52) Whats left? Introduction BotMiner Conclusion Motivation Design Evaluation

15 15 A-plane Clustering Capture activities in what kind of patterns Introduction BotMiner Conclusion Motivation Design Evaluation

16 16 Cross-plane Correlation Botnet score s(h) for every host h Similarity score between host h i and h j Hierarchical clustering AiAi AjAj Two hosts in the same A-clusters and in at least one common C-cluster are clustered together Introduction BotMiner Conclusion Motivation Design Evaluation

17 17 Evaluation Traces Introduction BotMiner Conclusion Motivation Design Evaluation

18 18 Evaluation Results: False Positives Introduction BotMiner Conclusion Motivation Design Evaluation

19 19 Evaluation Results: Detection Rate Introduction BotMiner Conclusion Motivation Design Evaluation

20 20 Summary and Future Work BotMiner –New botnet detection system based on Horizontal correlation –Independent of botnet C&C protocol and structure –Real-world evaluation shows promising results Future work –More efficient clustering, more robust features –New faster detection system using active techniques BotMiner: offline correlation, and requires a relatively long time for detection BotProbe: fast detection by observing at most one round of C&C –New real-time solution for very high speed and very large networks Introduction BotMiner Conclusion Summary & Future Work Correlation-based Botnet Detection Framework

21 21 Correlation-based Botnet Detection Framework Internet Enterprise-like Network Horizontal Correlation Vertical Correlation BotHunter (Security07) BotSniffer (NDSS08) BotMiner (Security08) Cause-Effect Correlation BotProbe Time Introduction BotMiner Conclusion Summary & Future Work Correlation-based Botnet Detection Framework

22 22 Limitation and Discussion Evading C-plane monitoring and clustering –Misuse whitelist –Manipulate communication patterns Evading A-plane monitoring and clustering –Very stealthy activity –Individualize bots communication/activity Evading cross-plane analysis –Extremely delayed task Appendix

23 23 High-Speed Packet Sampling Traffic arrives at high rates –High volume –Some analysis scales with the size of the input Possible approaches –Random packet sampling –Targeted packet sampling

24 24 Approach Idea: Bias sampling of traffic towards subpopulations based on conditions of traffic Two modules –Counting: Count statistics of each traffic flow –Sampling: Sample packets based on (1) overall target sampling rate (2) input conditions Counting Traffic stream Sampling Input conditions Instantaneous sampling probability Overall sampling rate Traffic subpopulations

25 25 Challenges How to specify subpopulations? –Solution: multi-dimensional array specification How to maintain counts for each subpopulation? –Solution: rotating array of counting Bloom filters How to derive instantaneous sampling probabilities from overall constraints? –Solution: multi-dimensional counter array, and scaling based on target rates

26 26 Specifying Subpopulations Idea: Use concatenation of header fields (tupples) as a key for a subpopulation –These keys specify a group of packets that will be counted together # base sampling rate sampling_rate = 0.01 # number of tuples tuples = 2 # number of conditions conditions = 1 # tuple definitions tuple_1 := srcip.dstip tuple_2 := srcip.srcport.dstport # condition : sampling budget tuple_1 in (30, 1] AND tuple_2 in (0, 5]: 0.5 Count groups of packets with the same source and destination IP address Count groups of packets with the same source IP, source port, and destination port

27 27 # base sampling rate sampling_rate = 0.01 # number of tuples tuples = 2 # number of conditions conditions = 1 # tuple definitions tuple_1 := srcip.dstip tuple_2 := srcip.srcport.dstport # condition : sampling budget tuple_1 in (30, inf] AND tuple_2 in (0, 5]: 0.5 Sampling Rates for Subpopulations Operator specifies –Overall sampling rate –Conditional rate within each class Flexsample computes instantaneous sampling probabilities based on this Sample one in 100 packets on average Within the 1/100 budget, half of sampled packets should come from groups satisfying this condition

28 28 Examining the Condition Biases sampling towards packets from (source IP, destination IP) pairs which –Have sent at least 30 packets –Have sent packets to at least 5 distinct ports Application: Portscan # base sampling rate sampling_rate = 0.01 # number of tuples tuples = 2 # number of conditions conditions = 1 # tuple definitions tuple_1 := srcip.dstip tuple_2 := srcip.srcport.dstport # condition : sampling budget tuple_1 in (30, inf] AND tuple_2 in (0, 5]: 0.5

29 29 Sampling Lookup Table Problem: Conditions may not be completely specified Solution: Sampling budget lookup table –Lookup table for allocating sampling budget to each class # tuple definitions tuple_1 := srcip.dstip tuple_2 := srcip.srcport.dstport # condition : sampling budget tuple_1 in (30, inf] AND tuple_2 in (0, 5]: 0.5 Deduced values Next problem: Determining which condition each packet satisfies

30 30 Counting Subpopulations Each packet belongs to a particular range in n-dimensional space Counts for each condition –Maintain counter (counting Bloom filter) for each tuple in every subcondition –Rotate counters to expunge stale values Details: 1. Number of counters 2. How often to rotate

31 31 Deriving Instantaneous Sampling Rates Problem: Traffic rates are dynamic –Relative fractions of packets in each class may change Solution: Count packets in each sampling class, and adjust probabilities to rebalance according to the lookup table –Instantaneous rate = overall rate * (target rate) / (actual rate) –Keep track of actual rate using Bloom filter array and EWMA

32 32 Example Evaluation: Portscan Parameters as above Nmap scan injected into ful one-hour trace from department network Results Setup FlexSample can capture 10x more of the portscan packets if all sampling budget is allocated to portscan class Bias can be configured

33 33 Other Applications Recovering unique conversations in sampled traffic Identifying DDoS Attacks Identifying heavy hiters, high-degree nodes, etc.

34 34 Open Challenges Specifying ranges and classes for specific applications Scaling the counter array as the number of tuples and ranges increases Simultaneously satisfying multiple objectives

35 35 Next Steps: BotMiner Integration Determine –The traffic rates that BotMiner can support for online analysis –The subpopulations that will yield the highest detection rates Evaluation on traffic traces that contain botnets of interest


Download ppt "2008-7-31 Guofei Gu BotMiner BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure- Independent Botnet Detection Guofei Gu 1,2,"

Similar presentations


Ads by Google