Published by Alexandra Glass. Modified over 11 years ago.
1
Botnet and Spam Detection in High-Speed Networks
Wenke Lee and Nick Feamster
Georgia Tech
2
Overview
Problem: botnet and spam detection in high-speed networks
Common theme: examine network-level properties and build a classifier
Two systems: BotMiner and SNARE
  – Overview
  – Integration with the SMITE architecture
Current integration status and plan
3
BotMiner: Structure- and Protocol-Independent
Botnets can change their C&C content (encryption, etc.), protocols (IRC, HTTP, etc.), structures (P2P, etc.), C&C servers, and infection models …
4
Definition of a Botnet
A coordinated group of malware instances that are controlled by a botmaster via some C&C channel
  – Hosts that have similar C&C-like traffic and similar malicious activities
We need to monitor two planes
  – C-plane (C&C communication plane): who is talking to whom
  – A-plane (malicious activity plane): who is doing what
5
BotMiner Architecture
[Figure: BotMiner architecture diagram; labels: Sensors, Algorithms, Correlation]
6
BotMiner C-plane Clustering
What characterizes a communication flow (C-flow) between a local host and a remote service?
  – Temporal statistical distribution information
      E.g., BPS (bytes per second), FPH (flows per hour)
  – Spatial statistical distribution information
      E.g., BPP (bytes per packet), PPF (packets per flow)
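A minimal Python sketch of C-flow feature extraction, assuming hypothetical flow records with the fields shown below. It groups flows into C-flows keyed by (protocol, local host, remote address, remote port), which is one plausible reading of "local host to remote service". It then computes a single average for each of the four features. BotMiner works with distributions of these features, so reducing each to a mean is a simplifying assumption, not the system's exact procedure.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Flow:
    # Hypothetical flow record; field names are illustrative only.
    proto: str
    src: str          # local host
    dst: str          # remote service address
    dport: int        # remote service port
    duration: float   # seconds
    packets: int
    nbytes: int


def c_flow_features(flows, window_hours):
    """Group flows into C-flows and compute mean FPH, PPF, BPP and BPS."""
    groups = defaultdict(list)
    for f in flows:
        # One C-flow per (protocol, local host, remote address, remote port).
        groups[(f.proto, f.src, f.dst, f.dport)].append(f)

    features = {}
    for key, fs in groups.items():
        total_packets = sum(f.packets for f in fs)
        total_bytes = sum(f.nbytes for f in fs)
        # BPS is averaged only over flows with a non-zero duration.
        rates = [f.nbytes / f.duration for f in fs if f.duration > 0]
        features[key] = {
            "FPH": len(fs) / window_hours,                     # flows per hour
            "PPF": total_packets / len(fs),                    # packets per flow
            "BPP": total_bytes / max(total_packets, 1),        # bytes per packet
            "BPS": sum(rates) / len(rates) if rates else 0.0,  # bytes per second
        }
    return features
```

The resulting per-C-flow feature vectors are what a clustering step would then compare.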
7
A-plane Clustering
Capture similar activity patterns
8
Cross-plane Correlation
Botnet score s(h) for every host h
  – A host has a higher score if it is in more activity clusters, and if it is in both activity and communication clusters
  – A host with a high score is considered a bot
Similarity score between bot hosts h_i and h_j
  – Two hosts in the same A-clusters and in at least one common C-cluster are clustered together
  – Each resulting cluster is a botnet
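A hedged Python sketch of the two cross-plane steps. The scoring rule is a deliberately simple stand-in and is not BotMiner's actual formula. It counts the A-clusters a host belongs to and doubles the count if the host is also in some C-cluster. The grouping step reads "in the same A-clusters" literally as identical A-cluster membership. Hosts that also share at least one C-cluster are merged with union-find. The threshold in the usage lines is hypothetical.

```python
from itertools import combinations


def bot_scores(hosts, a_clusters, c_clusters):
    """Illustrative score: count of A-clusters containing the host,
    doubled if the host also appears in at least one C-cluster."""
    scores = {}
    for h in hosts:
        in_a = sum(1 for cluster in a_clusters if h in cluster)
        in_c = any(h in cluster for cluster in c_clusters)
        scores[h] = in_a * (2 if in_c else 1)
    return scores


def botnet_groups(bots, a_clusters, c_clusters):
    """Union hosts that have identical A-cluster memberships and share
    at least one C-cluster; each resulting group is a candidate botnet."""
    def memberships(h, clusters):
        return frozenset(i for i, cluster in enumerate(clusters) if h in cluster)

    a_of = {h: memberships(h, a_clusters) for h in bots}
    c_of = {h: memberships(h, c_clusters) for h in bots}
    parent = {h: h for h in bots}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for x, y in combinations(bots, 2):
        if a_of[x] and a_of[x] == a_of[y] and c_of[x] & c_of[y]:
            parent[find(x)] = find(y)

    groups = {}
    for h in bots:
        groups.setdefault(find(h), set()).add(h)
    return list(groups.values())


if __name__ == "__main__":
    a_clusters = [{"h1", "h2", "h3"}]           # e.g., one scanning cluster
    c_clusters = [{"h1", "h2"}, {"h3", "h4"}]   # two C&C-like traffic clusters
    scores = bot_scores(["h1", "h2", "h3", "h4"], a_clusters, c_clusters)
    bots = [h for h, s in scores.items() if s >= 2]   # hypothetical threshold
    print(botnet_groups(bots, a_clusters, c_clusters))
```

In this toy input, h1 and h2 end up in one group. h3 scans too, but its C&C-like traffic falls in a different C-cluster, so it stays on its own.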
9
SMITE Integration: BotMiner
10
Integrating BotMiner and SMITE
Sensors
  – Feature extraction for C-plane and A-plane clustering
  – C-flow temporal and statistical features
      Counting packets and connections between each pair of endpoints: bytes per second, flows per hour, bytes per packet, packets per flow
  – A-plane header and payload features
      Destination IP addresses and ports, payload bytes/strings
  – These sensors are not specific to BotMiner
11
Integrating BotMiner and SMITE
Algorithms
  – C-plane clustering
      Multi-step clustering based on statistical and temporal C-flow features
  – A-plane clustering
      Based on activity-specific similarity measures: e.g., spread of destination IP addresses and ports, Dice's coefficient of string similarity, and byte frequency or entropy of payload
  – Bot scoring and botnet clustering methods
      Scoring based on participation in C-plane and A-plane clusters
      Clustering based on common memberships in the C-plane and A-plane clusters
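A minimal Python sketch of the three A-plane similarity measures named above. The slide does not say how BotMiner tokenizes strings for Dice's coefficient, so character bigrams are an assumption here. Entropy is the standard Shannon entropy of the payload's byte distribution. The function names are hypothetical.

```python
import math
from collections import Counter


def bigrams(s):
    """Set of adjacent character pairs in s."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice_coefficient(a, b):
    """Dice's coefficient over character-bigram sets: 2|X & Y| / (|X| + |Y|)."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        # Strings shorter than two characters: fall back to exact match.
        return 1.0 if a == b else 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def byte_entropy(payload: bytes) -> float:
    """Shannon entropy of the payload's byte distribution, in bits per byte."""
    if not payload:
        return 0.0
    n = len(payload)
    return -sum(c / n * math.log2(c / n) for c in Counter(payload).values())


def destination_spread(events):
    """Distinct destination IPs and ports in a list of (dst_ip, dst_port) events."""
    return len({ip for ip, _ in events}), len({port for _, port in events})
```

For example, "night" and "nacht" share only the bigram "ht", giving a Dice coefficient of 0.25. Encrypted or compressed payloads push byte entropy toward 8 bits per byte.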
12
Integrating BotMiner and SMITE
Correlation
  – Botnet detection involves both vertical and horizontal analysis/clustering:
      Vertical: what activities a host has been involved in (bot detection)
      Horizontal: what other hosts have similar (vertical) behavior patterns (botnet detection)
  – Similar analysis can be applied to other alerts, to:
      Improve botnet detection
      Understand malicious activities and attack plans
      Measure the scale of attacks
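A hypothetical Python sketch of applying vertical and horizontal analysis to generic alerts. Vertical analysis builds, for each host, the set of alert types it has triggered. Horizontal analysis groups hosts whose profiles match, and the size of each group gives a rough measure of attack scale. Exact profile matching and the `min_types` cutoff are illustrative assumptions. Real correlation would use a similarity measure rather than equality.

```python
from collections import defaultdict


def vertical_profiles(alerts):
    """Vertical analysis: the set of alert types seen for each host.
    alerts: iterable of (host, alert_type) pairs."""
    profiles = defaultdict(set)
    for host, alert_type in alerts:
        profiles[host].add(alert_type)
    return {h: frozenset(types) for h, types in profiles.items()}


def horizontal_groups(profiles, min_types=2):
    """Horizontal analysis: group hosts whose vertical profiles are identical.
    Only hosts with at least min_types distinct alert types are considered."""
    groups = defaultdict(set)
    for host, types in profiles.items():
        if len(types) >= min_types:
            groups[types].add(host)
    return dict(groups)
```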
13
Network-Based Spam Detection
Filter email based on how it is sent, in addition to simply what is sent
Network-level properties are less malleable
  – Hosting or upstream ISP (AS number)
  – Membership in a botnet (spammer, hosting infrastructure)
  – Network location of sender and receiver
  – Set of target recipients
14
Finding the Right Features
Goal: sender reputation from a single packet header?
  – Low overhead
  – Fast classification
  – In-network
  – Perhaps more evasion-resistant
Key challenge
  – What features satisfy these properties and can distinguish spammers from legitimate senders?
15
Network-Level Features
Single-packet
  – AS of sender's IP
  – Distance to k nearest senders
  – Status of email service ports
  – Geodesic distance
  – Time of day
Single-message
  – Number of recipients
  – Length of message
Aggregate (multiple messages/recipients)
16
Sender-Receiver Geodesic Distance
90% of legitimate messages travel 2,200 miles or less
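A minimal Python sketch of computing sender-receiver geodesic distance with the haversine (great-circle) formula. It assumes latitude and longitude are already known for both endpoints. The IP-to-location lookup (e.g., a GeoIP database) is not shown. The spherical-Earth model with a mean radius of 3,958.8 miles is an approximation.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; spherical approximation


def geodesic_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points, in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))
```

The 2,200-mile figure above is an observed property of legitimate mail. This function only computes the distance and does not apply any threshold.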
17
Density of Senders in IP Space
For spammers, the k nearest senders are much closer in IP space
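A hedged Python sketch of the sender-density feature. It assumes IPv4 senders and treats each address as a 32-bit integer, so "distance in IP space" becomes the absolute numeric difference. The function name and the choice to average over the k neighbors are illustrative and not necessarily SNARE's exact definition.

```python
import ipaddress


def knn_ip_distance(sender, other_senders, k):
    """Mean numeric distance in IPv4 space from sender to its k nearest other senders."""
    s = int(ipaddress.IPv4Address(sender))
    dists = sorted(
        abs(int(ipaddress.IPv4Address(o)) - s) for o in other_senders if o != sender
    )
    nearest = dists[:k]
    # No other senders observed: treat the sender as infinitely far from its neighbors.
    return sum(nearest) / len(nearest) if nearest else float("inf")
```

A small value means many other senders sit in nearby address space, which is the pattern the slide attributes to spammers.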
18
Local Time of Day at Sender
Spammers peak at different local times of day
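This feature requires the sender's local time rather than the receiver's. The minimal Python sketch below assumes the sender's UTC offset has already been derived from its IP location (that lookup is not shown) and ignores daylight-saving changes. It converts a UTC timestamp to the sender's local hour and builds an hourly histogram.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def sender_local_hour(utc_timestamp, utc_offset_hours):
    """Hour of day (0-23) at the sender, given a fixed UTC offset for its location."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(utc_timestamp, tz).hour


def hourly_histogram(messages):
    """messages: iterable of (utc_timestamp, utc_offset_hours) pairs."""
    return Counter(sender_local_hour(ts, off) for ts, off in messages)
```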
19
Other Network-Level Features
Time of day at sender
Upstream AS of sender
Message size (and variance)
Number of recipients (and variance)
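A Python sketch of the aggregate message features. It assumes messages can be grouped by sender IP and that each arrives as a hypothetical (sender_ip, message_size, n_recipients) tuple. Population variance (`statistics.pvariance`) is used as one reasonable choice of variance.

```python
from collections import defaultdict
from statistics import mean, pvariance


def aggregate_features(messages):
    """Per-sender mean and population variance of message size and recipient count.
    messages: iterable of (sender_ip, message_size, n_recipients) tuples."""
    sizes, rcpts = defaultdict(list), defaultdict(list)
    for sender, size, n_recipients in messages:
        sizes[sender].append(size)
        rcpts[sender].append(n_recipients)
    return {
        s: {
            "size_mean": mean(sizes[s]),
            "size_var": pvariance(sizes[s]),
            "rcpt_mean": mean(rcpts[s]),
            "rcpt_var": pvariance(rcpts[s]),
        }
        for s in sizes
    }
```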
20
Combining Features: RuleFit
Put the features into the RuleFit classifier
10-fold cross-validation on one day of query logs from a large spam-filtering appliance provider
Comparable performance to Spamhaus
  – Incorporating it into the system can further reduce false positives (FPs)
Using only network-level features
Completely automated
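RuleFit itself is not reproduced here. The Python sketch below covers only the 10-fold cross-validation split used in the evaluation: the data is partitioned into 10 folds, and each fold serves once as the test set while the other nine are used for training. Shuffling with a fixed seed is an assumption for reproducibility.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists; every index is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test
```

A classifier would be trained on each `train` split and scored on the matching `test` split, with the 10 scores averaged.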
21
Benefits of Whitelisting
Whitelisting the top 50 ASes: false positives reduced to 0.14%
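A hedged Python sketch of how the false-positive rate could be measured with and without an AS whitelist. The sender-to-ASN mapping, the `predict` callable, and the ordering (whitelist checked before the classifier) are assumptions for illustration. The 0.14% figure comes from the source and is not reproduced by this toy code.

```python
def false_positive_rate(messages, predict, whitelist=frozenset()):
    """Fraction of legitimate messages flagged as spam.
    messages: iterable of (sender_asn, features, is_spam) tuples.
    predict: callable returning True when features look like spam."""
    legit = flagged = 0
    for asn, features, is_spam in messages:
        if is_spam:
            continue
        legit += 1
        # Whitelisted ASes bypass the classifier entirely.
        if asn not in whitelist and predict(features):
            flagged += 1
    return flagged / legit if legit else 0.0
```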
22
Integrating SNARE and SMITE
[Figure: SNARE architecture diagram; labels: Sensors, Algorithms/Correlation]
23
Integration with SMITE
Sensors
  – Extract network features from traffic
  – IP addresses
  – Combine with auxiliary data (routing, time, etc.)
Algorithms
  – Clustering algorithm to identify behavioral fingerprints
  – Learning algorithm to classify based on multiple features
Correlation
  – Clusters formed by aggregating sending behavior observed across multiple sensors
  – Several features also require data collected across many IP addresses
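The slide does not specify SNARE's clustering method. The hypothetical Python sketch below shows why correlating across sensors helps: each sensor sees only part of a sender's targets, and merging the views before comparing senders gives a fuller behavioral fingerprint. Using recipient domains as the fingerprint and Jaccard similarity as the comparison are both assumptions.

```python
from collections import defaultdict


def merge_sensor_views(sensor_logs):
    """sensor_logs: one iterable of (sender_ip, recipient_domain) pairs per sensor.
    Returns each sender's set of target domains across all sensors."""
    targets = defaultdict(set)
    for log in sensor_logs:
        for sender, domain in log:
            targets[sender].add(domain)
    return dict(targets)


def jaccard(a, b):
    """Overlap between two senders' target sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```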
24
SMITE Integration Challenges
Sources of labeled data
  – SNARE requires clean sources of labeled data for training
Data collection
  – SNARE's performance improves when behavior can be observed across multiple domains
25
Overall SMITE Integration
26
SMITE Integration: Current Work
Study pipeline architecture and code
Modify flow-analyzer to dump 5-tuple flow information
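The output format of SMITE's flow-analyzer is not described here. The hypothetical Python sketch below illustrates what 5-tuple aggregation involves. Packets sharing (source IP, source port, destination IP, destination port, protocol) are folded into one record with first/last timestamps and packet/byte counts, then written out as CSV. Real flow exporters also split flows on idle/active timeouts and TCP FIN/RST, which this sketch omits.

```python
import csv
import io
from collections import namedtuple

# Hypothetical packet summary; field names are illustrative.
Packet = namedtuple("Packet", "ts src sport dst dport proto length")


def aggregate_flows(packets):
    """Fold packets into 5-tuple flows: key -> [first_ts, last_ts, packets, bytes]."""
    flows = {}
    for p in packets:
        key = (p.src, p.sport, p.dst, p.dport, p.proto)
        f = flows.get(key)
        if f is None:
            flows[key] = [p.ts, p.ts, 1, p.length]
        else:
            f[0] = min(f[0], p.ts)
            f[1] = max(f[1], p.ts)
            f[2] += 1
            f[3] += p.length
    return flows


def dump_csv(flows):
    """Serialize flows as CSV text with a header row."""
    buf = io.StringIO()
    w = csv.writer(buf)
    w.writerow(["src", "sport", "dst", "dport", "proto", "first", "last", "packets", "bytes"])
    for key, (first, last, pkts, nbytes) in sorted(flows.items()):
        w.writerow([*key, first, last, pkts, nbytes])
    return buf.getvalue()
```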
27
SMITE Integration: Phase I
Modify flow-analyzer with the SMITE team to generate 5-tuple flow information (mid-March)
Spam/scan detection and flow aggregation in BotMiner; spam feature extraction in SNARE (end of March)
Clustering and correlation in BotMiner; classifier in SNARE (end of April)
28
SMITE Integration: Phase II
Evaluate the performance of BotMiner and SNARE
  – How many hours does it take to process one day of traffic, or what is the lag between an event and its detection?
Design real-time detection algorithms
  – A two-tier system: an off-line module outputs lists of suspicious hosts, and a real-time module inspects all packets of these hosts; alternatively, the off-line module outputs clusters
Design algorithms to handle asymmetric traffic
  – Cluster on each direction of traffic and cross-correlate
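A minimal Python sketch of the first two-tier option. It assumes the off-line module periodically hands over a set of suspicious host IPs and that `inspect` is some hypothetical per-packet detector. The class and method names are illustrative. Only packets to or from watched hosts reach the expensive inspection step.

```python
class TwoTierDetector:
    """Real-time tier that deep-inspects only hosts flagged by the off-line tier."""

    def __init__(self, inspect):
        self.watchlist = set()
        self.inspect = inspect  # hypothetical callable(src, dst, payload) -> result

    def update_watchlist(self, suspicious_hosts):
        # Called whenever the off-line module emits a new list of suspicious hosts.
        self.watchlist = set(suspicious_hosts)

    def on_packet(self, src, dst, payload):
        # Packets not involving a watched host are skipped (returns None).
        if src in self.watchlist or dst in self.watchlist:
            return self.inspect(src, dst, payload)
        return None
```

The design choice is to limit deep inspection, which is the costly step, to the small set of hosts the off-line analysis flags.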
29
Thank You!