Presentation is loading. Please wait.

Presentation is loading. Please wait.

BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection Guofei Gu1,2, Roberto Perdisci3, Junjie Zhang1,

Similar presentations


Presentation on theme: "BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection Guofei Gu1,2, Roberto Perdisci3, Junjie Zhang1,"— Presentation transcript:

1 BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection Guofei Gu1,2, Roberto Perdisci3, Junjie Zhang1, and Wenke Lee1 1Georgia Tech Damballa, Inc. 2Texas A&M University

2 Roadmap Introduction BotMiner Conclusion Botnet problem
Challenges for botnet detection Related work BotMiner Motivation Design Evaluation Conclusion

3 What Is a Bot/Botnet? Bot
Introduction BotMiner Conclusion Botnet Problem Challenges for Botnet Detection Related Work What Is a Bot/Botnet? Bot A malware instance that runs autonomously and automatically on a compromised computer (zombie) without owner’s consent Profit-driven, professionally written, widely propagated Botnet (Bot Army): network of bots controlled by criminals Definition: “A coordinated group of malware instances that are controlled by a botmaster via some C&C channel” Architecture: centralized (e.g., IRC,HTTP), distributed (e.g., P2P) “25% of Internet PCs are part of a botnet!” ( - Vint Cerf) a remote control facility (C&C) IRC, HTTP, P2P a spreading mechanism to propagate Remote vulnerability scan, , Drive-by download, IM Botmaster bot C&C

4 Botnets are used for … All DDoS attacks Spam Click fraud
Introduction BotMiner Conclusion Botnet Problem Challenges for Botnet Detection Related Work Botnets are used for … All DDoS attacks Spam Click fraud Information theft Phishing attacks Distributing other malware, e.g., spyware 95% of is spam

5 Challenges for Botnet Detection
Introduction BotMiner Conclusion Botnet Problem Challenges for Botnet Detection Related Work Challenges for Botnet Detection Bots are stealthy on the infected machines We focus on a network-based solution Bot infection is usually a multi-faceted and multi-phased process Only looking at one specific aspect likely to fail Bots are dynamically evolving Static and signature-based approaches may not be effective Botnets can have very flexible design of C&C channels A solution very specific to a botnet instance is not desirable

6 Why Existing Techniques Not Enough?
Introduction BotMiner Conclusion Botnet Problem Challenges for Botnet Detection Related Work Why Existing Techniques Not Enough? Traditional AV tools Bots use packer, rootkit, frequent updating to easily defeat AV tools Traditional IDS/IPS Look at only specific aspect Do not have a big picture Honeypot Not a good botnet detection tool Not scalable, mostly passively waiting Bots can detect/discover honeypot/honeynet

7 Existing Botnet Detection Work
Introduction BotMiner Conclusion Botnet Problem Challenges for Botnet Detection Related Work Existing Botnet Detection Work [Binkley,Singh 2006]: IRC-based bot detection combine IRC statistics and TCP work weight Rishi [Goebel, Holz 2007]: signature-based IRC bot nickname detection [Livadas et al. 2006, Karasaridis et al. 2007]: (BBN, AT&T) network flow level detection of IRC botnets (IRC botnet) BotHunter [Gu etal Security’07]: dialog correlation to detect bots based on an infection dialog model BotSniffer [Gu etal NDSS’08]: spatial-temporal correlation to detect centralized botnet C&C TAMD [Yen, Reiter 2008]: traffic aggregation to detect botnets that use a centralized C&C structure

8 Example: Nugache, Storm, …
Introduction BotMiner Conclusion Motivation Design Evaluation Why BotMiner? Botnets can change their C&C content (encryption, etc.), protocols (IRC, HTTP, etc.), structures (P2P, etc.), C&C servers, infection models … Example: Nugache, Storm, …

9 BotMiner: Protocol- and Structure-Independent Detection
Introduction BotMiner Conclusion Motivation Design Evaluation BotMiner: Protocol- and Structure-Independent Detection Horizontal correlation - Bots are for long-term use Botnet: communication and activities are coordinated/similar Enterprise-like Network Internet

10 Revisit the Definition of a Botnet
Introduction BotMiner Conclusion Motivation Design Evaluation Revisit the Definition of a Botnet “A coordinated group of malware instances that are controlled by a botmaster via some C&C channel” We need to monitor two planes C-plane (C&C communication plane): “who is talking to whom” A-plane (malicious activity plane): “who is doing what” Color, re-draw, bigger

11 BotMiner Architecture
Introduction BotMiner Conclusion Motivation Design Evaluation BotMiner Architecture

12 BotMiner C-plane Clustering
Introduction BotMiner Conclusion Motivation Design Evaluation BotMiner C-plane Clustering What characterizes a communication flow (C-flow) between a local host and a remote service? <protocol, srcIP, dstIP, dstPort>

13 How to Capture “Talking in What Kind of Patterns”?
Introduction BotMiner Conclusion Motivation Design Evaluation How to Capture “Talking in What Kind of Patterns”? Temporal related statistical distribution information in BPS (bytes per second) FPH (flow per hour) Spatial related statistical distribution information in BPP (bytes per packet) PPF (packet per flow) Content independent

14 Two-step Clustering of C-flows
Introduction BotMiner Conclusion Motivation Design Evaluation Two-step Clustering of C-flows Why multi-step? How? Coarse-grained clustering Using reduced feature space: mean and variance of the distribution of FPH, PPF, BPP, BPS for each C-flow (2*4=8) Efficient clustering algorithm: X-means Fine-grained clustering Using full feature space (13*4=52) What’s left?

15 A-plane Clustering Capture “activities in what kind of patterns”
Introduction BotMiner Conclusion Motivation Design Evaluation A-plane Clustering Capture “activities in what kind of patterns”

16 Cross-plane Correlation
Introduction BotMiner Conclusion Motivation Design Evaluation Cross-plane Correlation Botnet score s(h) for every host h Similarity score between host hi and hj Hierarchical clustering Ai Aj Two hosts in the same A-clusters and in at least one common C-cluster are clustered together

17 Evaluation Traces BotMiner Introduction Conclusion Evaluation
Motivation Design Evaluation Evaluation Traces

18 Evaluation Results: False Positives
Introduction BotMiner Conclusion Motivation Design Evaluation Evaluation Results: False Positives

19 Evaluation Results: Detection Rate
Introduction BotMiner Conclusion Motivation Design Evaluation Evaluation Results: Detection Rate

20 Summary and Future Work
Introduction BotMiner Conclusion Summary & Future Work Correlation-based Botnet Detection Framework Summary and Future Work BotMiner New botnet detection system based on Horizontal correlation Independent of botnet C&C protocol and structure Real-world evaluation shows promising results Future work More efficient clustering, more robust features New faster detection system using active techniques BotMiner: offline correlation, and requires a relatively long time for detection BotProbe: fast detection by observing at most one round of C&C New real-time solution for very high speed and very large networks

21 Correlation-based Botnet Detection Framework
Introduction BotMiner Conclusion Summary & Future Work Correlation-based Botnet Detection Framework Correlation-based Botnet Detection Framework Vertical Correlation BotHunter (Security’07) Enterprise-like Network Horizontal Correlation BotSniffer (NDSS’08) BotMiner (Security’08) BotHunter, regardless of archtectue… BotSniffer: mainly centralized BotMiner: Time Internet Cause-Effect Correlation BotProbe

22 Limitation and Discussion
Appendix Limitation and Discussion Evading C-plane monitoring and clustering Misuse whitelist Manipulate communication patterns Evading A-plane monitoring and clustering Very stealthy activity Individualize bots’ communication/activity Evading cross-plane analysis Extremely delayed task

23 High-Speed Packet Sampling
Traffic arrives at high rates High volume Some analysis scales with the size of the input Possible approaches Random packet sampling Targeted packet sampling

24 Instantaneous sampling probability
Approach Idea: Bias sampling of traffic towards subpopulations based on conditions of traffic Two modules Counting: Count statistics of each traffic flow Sampling: Sample packets based on (1) overall target sampling rate (2) input conditions Instantaneous sampling probability Overall sampling rate Input conditions Traffic stream Traffic subpopulations Counting Sampling

25 Challenges How to specify subpopulations?
Solution: multi-dimensional array specification How to maintain counts for each subpopulation? Solution: rotating array of counting Bloom filters How to derive instantaneous sampling probabilities from overall constraints? Solution: multi-dimensional counter array, and scaling based on target rates

26 Specifying Subpopulations
Idea: Use concatenation of header fields (“tupples”) as a “key” for a subpopulation These keys specify a group of packets that will be counted together Count groups of packets with the same source and destination IP address # base sampling rate sampling_rate = 0.01 # number of tuples tuples = 2 # number of conditions conditions = 1 # tuple definitions tuple_1 := srcip.dstip tuple_2 := srcip.srcport.dstport # condition : sampling budget tuple_1 in (30, 1] AND tuple_2 in (0, 5]: 0.5 Count groups of packets with the same source IP, source port, and destination port

27 Sampling Rates for Subpopulations
Operator specifies Overall sampling rate Conditional rate within each class Flexsample computes instantaneous sampling probabilities based on this # base sampling rate sampling_rate = 0.01 # number of tuples tuples = 2 # number of conditions conditions = 1 # tuple definitions tuple_1 := srcip.dstip tuple_2 := srcip.srcport.dstport # condition : sampling budget tuple_1 in (30, inf] AND tuple_2 in (0, 5]: 0.5 Sample one in 100 packets on average Within the 1/100 “budget”, half of sampled packets should come from groups satisfying this condition

28 Examining the Condition
Biases sampling towards packets from (source IP, destination IP) pairs which Have sent at least 30 packets Have sent packets to at least 5 distinct ports Application: Portscan # base sampling rate sampling_rate = 0.01 # number of tuples tuples = 2 # number of conditions conditions = 1 # tuple definitions tuple_1 := srcip.dstip tuple_2 := srcip.srcport.dstport # condition : sampling budget tuple_1 in (30, inf] AND tuple_2 in (0, 5]: 0.5

29 Next problem: Determining which condition each packet satisfies
Sampling Lookup Table Problem: Conditions may not be completely specified Solution: Sampling budget lookup table Lookup table for allocating sampling “budget” to each class Deduced values # tuple definitions tuple_1 := srcip.dstip tuple_2 := srcip.srcport.dstport # condition : sampling budget tuple_1 in (30, inf] AND tuple_2 in (0, 5]: 0.5 Next problem: Determining which condition each packet satisfies

30 Counting Subpopulations
Each packet belongs to a particular range in n-dimensional space Counts for each condition Maintain counter (counting Bloom filter) for each tuple in every subcondition Rotate counters to expunge “stale” values Details: 1. Number of counters 2. How often to rotate

31 Deriving Instantaneous Sampling Rates
Problem: Traffic rates are dynamic Relative fractions of packets in each class may change Solution: Count packets in each sampling class, and adjust probabilities to rebalance according to the lookup table Instantaneous rate = overall rate * (target rate) / (actual rate) Keep track of actual rate using Bloom filter array and EWMA

32 Example Evaluation: Portscan
Setup Parameters as above Nmap scan injected into ful one-hour trace from department network Results FlexSample can capture 10x more of the portscan packets if all sampling budget is allocated to portscan class Bias can be configured

33 Other Applications Recovering unique “conversations” in sampled traffic Identifying DDoS Attacks Identifying heavy hiters, high-degree nodes, etc.

34 Open Challenges Specifying ranges and classes for specific applications Scaling the counter array as the number of tuples and ranges increases Simultaneously satisfying multiple objectives

35 Next Steps: BotMiner Integration
Determine The traffic rates that BotMiner can support for online analysis The subpopulations that will yield the highest detection rates Evaluation on traffic traces that contain botnets of interest


Download ppt "BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection Guofei Gu1,2, Roberto Perdisci3, Junjie Zhang1,"

Similar presentations


Ads by Google