Tracking Malicious Regions of the IP Address Space Dynamically.

1 Tracking Malicious Regions of the IP Address Space Dynamically

2 Introduction
Recent focus on network-level behaviour of malicious activity:
- Spam originating from different /24 prefixes [RF06]
- Spam originating from long-lived network-aware clusters [VSSHS07]
- Malicious activity originating from different /16 prefixes [CSFJWS07]
Application:
- Real-time, effective mechanism for blocking traffic at line speed
Challenges:
- Transience of individual IP addresses, e.g., due to bots and DHCP effects
- Effective clustering: many different clusterings of IP prefixes are possible
- Scaling to large volumes of data
[RF06] Ramachandran & Feamster, Understanding the Network-level Behavior of Spammers, SIGCOMM '06
[VSSHS07] Venkataraman et al., Exploiting Network Structure for Proactive Spam Mitigation, Security '07
[CSFJWS07] Collins et al., Using Uncleanliness to Predict Future Botnet Addresses, IMC '07

3 Study: Transience of Individual IPs
Measurement study: analysis of spamming IP addresses and regions
Data: enterprise mail logs over 6 months, 28 million emails
Finding: spamming IPs present on many days contribute little spam!

4 Study: Persistence of IP Prefix Clusters
Network-aware clusters [KW00]:
- Pre-defined set of IP prefixes that reflect Internet routing structure
- An IP address belongs to the cluster with the longest matching prefix
Findings:
- 90% of total spam comes from "bad" clusters present for 60+ days
- Most spam comes from "bad" clusters present for many days
[Figure legend: Bad Clusters, Bad IPs]
[KW00] Krishnamurthy & Wang, On Network-Aware Clustering of Web Clients, SIGCOMM '00
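The longest-matching-prefix rule for assigning an IP to a cluster can be sketched as follows. This is only an illustration: the `CLUSTERS` table and the `cluster_of` helper are hypothetical, and real network-aware clusters would be derived from routing data rather than hard-coded.

```python
import ipaddress

# Hypothetical cluster table (assumption); [KW00] derives clusters from routing tables.
CLUSTERS = [ipaddress.ip_network(p)
            for p in ("10.0.0.0/8", "10.1.0.0/16", "10.1.2.0/24")]

def cluster_of(ip, clusters=CLUSTERS):
    """Return the longest prefix in `clusters` that contains `ip`, or None."""
    addr = ipaddress.ip_address(ip)
    matches = [net for net in clusters if addr in net]
    # The most specific match is the one with the largest prefix length.
    return max(matches, key=lambda net: net.prefixlen, default=None)
```

A linear scan is enough for a sketch; a production system would more likely use a trie so that lookups do not depend on the number of clusters.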

5 Introduction
Recent focus on network-level behaviour of malicious activity:
- Spam originating from different /24 prefixes [RF06]
- Spam originating from long-lived network-aware clusters [VSSHS07]
- Malicious activity originating from different /16 prefixes [CSFJWS07]
Challenges:
- Transience of individual IP addresses, e.g., due to bots and DHCP effects
- Scaling to large volumes of data
- Effective clustering: many different clusterings of IP prefixes are possible
Question:
- Can we automatically compute the optimal clustering for tracking and predicting malicious activity?
[RF06] Ramachandran & Feamster, Understanding the Network-level Behavior of Spammers, SIGCOMM '06
[VSSHS07] Venkataraman et al., Exploiting Network Structure for Proactive Spam Mitigation, Security '07
[CSFJWS07] Collins et al., Using Uncleanliness to Predict Future Botnet Addresses, IMC '07

6 Problem, v.1 (naïve)
Input: IP addresses labelled normal (+) / malicious (-)
- e.g., e-mail logs with the sending mail server's IP, where SpamAssassin labels each message spam (-) or non-spam (+)
Output: tree of IP prefixes that
- is optimal for classifying IPs as normal (+) / malicious (-)
- contains no more than k leaves
Limiting the output IP tree to k leaves:
- Small output state
- Avoids overfitting
[Figure: IP prefix tree rooted at 0.0.0.0/0 with children 0.0.0.0/1 and 128.0.0.0/1, deeper nodes such as 0.0.0.0/2, 192.0.0.0/2, x.x.x.x/24 and x.x.x.x/30, down to x.x.x.x/32 (single IP addresses); leaves carry + or - labels]

7 Challenges
As stated, version 1 is easy: use dynamic programming! However:
IP tree over dynamic and evolving data:
- Data is collected and labelled over time (e.g., e-mail logs, traffic logs)
- Compromised/malicious IP prefixes may change over time
- Want to compute the updated tree without going back to past data
Low space & low overhead:
- Up to 2^32 leaves in the IP tree!
- Want the algorithm to use space near-linear in k
Adversarially generated IP addresses:
- Want guarantees as a function of the data seen & the adversary's power
Approach: design online algorithms to address all of these issues
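The slides only state that the offline version 1 can be solved by dynamic programming; they do not give the recurrence. One plausible sketch, under the assumption that a node is either a leaf labelled with its majority sign or is split into its two child prefixes, is shown below. The function name `min_mistakes`, the small `bits` address width, and the sample encoding are all illustrative choices, not taken from the presentation.

```python
from functools import lru_cache

def min_mistakes(samples, k, bits=4):
    """Fewest misclassified samples achievable by a prefix tree with at most k leaves.

    samples: iterable of (address, label), address an int below 2**bits, label +1 or -1.
    """
    samples = tuple(samples)

    @lru_cache(maxsize=None)
    def best(prefix, depth, leaves):
        # Labels of the samples whose top `depth` bits equal `prefix`.
        inside = [lab for addr, lab in samples if addr >> (bits - depth) == prefix]
        pos = sum(1 for lab in inside if lab > 0)
        as_leaf = min(pos, len(inside) - pos)  # leaf takes the majority label
        if leaves == 1 or depth == bits or as_leaf == 0:
            return as_leaf
        # Otherwise try every way of sharing the leaf budget between the children.
        split = min(
            best(prefix << 1, depth + 1, i)
            + best((prefix << 1) | 1, depth + 1, leaves - i)
            for i in range(1, leaves)
        )
        return min(as_leaf, split)

    return best(0, 0, k)
```

For example, with 2-bit addresses where 0 and 1 are malicious and 2 and 3 are normal, one leaf forces two mistakes while two leaves separate the halves perfectly. The sketch rescans the samples at every node, so it needs all past data in memory, which is exactly what the online setting is meant to avoid.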

8 Online Learning
Low space & overhead: IPs are seen one at a time
- Algorithm only needs to maintain a small internal state
Naturally incorporates the dynamic aspect of the data
Guarantees: a function of the mistakes made by the optimal offline tree
- Quantifies the cost of learning the tree online
- Data may be generated adversarially
[Figure: data (e.g., mail logs labelled +/-) supplies each IP to the algorithm, which outputs a predicted label and then receives the correct label]

9 Problem, v.2: Online Learning of Static Tree
Input: a stream of IPs labelled normal (+) / malicious (-)
Output: tree of IP prefixes that
- predicts nearly as well as the OPTIMAL offline tree with k leaves
- uses low space in the online model of learning
[Figure: IP prefix tree rooted at 0.0.0.0/0 with children 0.0.0.0/1 and 128.0.0.0/1, deeper nodes such as 0.0.0.0/2, 192.0.0.0/2, x.x.x.x/24 and x.x.x.x/30, down to x.x.x.x/32 (single IP addresses); leaves carry + or - labels]

10 Dynamic IP Prefix Tree
Malicious IP regions may change over time
New goal: predict nearly as well as the optimal changing tree
- Optimal tree may make two kinds of changes: splitting a leaf & changing a leaf's sign
- Prediction guarantee: our mistakes = O(OPTIMAL tree's cost)
Optimal tree's cost: a function of the mistakes it makes and the changes it makes
[Figure: change #1, a leaf changes sign; change #2, a leaf splits]

11 Problem: Online Learning of Dynamic Tree
Problem: compute the IP tree online to predict nearly as well as the best changing tree with k leaves, using space Õ(k)
[Figure: IP prefix tree from 0.0.0.0/0 through 0.0.0.0/1 and 128.0.0.0/1, with nodes such as x.x.x.x/24 and x.x.x.x/29, down to x.x.x.x/32 IP addresses; leaves carry + and - labels]

12 Related Problems
Machine learning:
- Predicting as well as the best pruning of a decision tree [HS97]
  Our requirements: low space, tracking a dynamic tree online, real-world implementation
- Learning decision trees in streaming models [DH00]
  Our requirement: adversarial data
Simpler problem: tree fixed; no need to "learn" a tree
Online algorithms:
- Paging algorithms
[HS97] Helmbold & Schapire, Predicting Nearly As Well As the Best Pruning of a Decision Tree, Machine Learning '97
[DH00] Domingos & Hulten, Mining High-Speed Data Streams, KDD 2000

13 Overview of our Algorithm
Given an IP address, predict:
- Trace the IP's path on the tree & flag all nodes on the path (i.e., all prefixes of the input IP in the tree)
- Label the IP by combining the weights of the flagged nodes
Given the correct label, update:
- If predicted correctly, do nothing
- If predicted incorrectly: update the flagged node weights; grow the tree by splitting a leaf (if necessary)
[Figure: IP a.b.c.d traced through the tree, with its predicted label and correct label]
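The "flag all nodes on the path" step can be sketched as below, assuming the tree is stored simply as a set of `ipaddress.IPv4Network` prefixes. That storage choice and the `flagged_prefixes` name are assumptions made for illustration; the slides do not describe the data structure.

```python
import ipaddress

def flagged_prefixes(ip, tree):
    """Return every prefix of `ip` that is a node of `tree`, root first.

    tree: a set of ipaddress.IPv4Network objects closed under taking parents.
    """
    addr = int(ipaddress.IPv4Address(ip))
    path = []
    for length in range(33):
        # Keep only the top `length` bits of the address.
        mask = (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF
        net = ipaddress.IPv4Network((addr & mask, length))
        if net not in tree:
            break  # walked past the leaf on this path
        path.append(net)
    return path
```

The returned list is the set of nodes whose weights are combined to produce the predicted label, and the last element is the leaf that would be charged with a mistake.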

14 Four Algorithmic Questions
Fixed tree structure:
- Relative importance of flagged nodes?
- Label of flagged node: positive/negative?
Changing the tree structure:
- When to grow the tree?
- How to maintain a small tree?

15 Relative Importance of Nodes
Use the sleeping experts algorithm:
- Every node is an expert
- Each flagged node is an "awake" expert
- Best expert: leaf of the optimal tree
Algorithm:
- Each node has a "relative" weight
- Predict with all experts that are awake, e.g., by weighted coin toss
- Update by redistributing weight between awake experts
[Figure: tree with node weights w0, w1, w2, w10, w15, w16, w17]
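A minimal sketch of one sleeping-experts round is given below. It follows the textbook pattern (penalize awake experts that erred, then rescale so the total weight of the awake set is unchanged); the penalty factor `beta` and the dictionary layout are assumptions, and the exact update used in the presentation may differ.

```python
import random

def predict(awake, weights, labels, rng=random):
    """Weighted coin toss: pick one awake expert by weight and return its label."""
    chosen = rng.choices(awake, weights=[weights[n] for n in awake], k=1)[0]
    return labels[chosen]

def update(awake, weights, labels, truth, beta=0.5):
    """Redistribute weight among awake experts; sleeping experts are untouched."""
    before = sum(weights[n] for n in awake)
    for n in awake:
        if labels[n] != truth:
            weights[n] *= beta  # penalize experts that predicted wrongly
    after = sum(weights[n] for n in awake)
    for n in awake:
        weights[n] *= before / after  # keep the awake total unchanged
```

Because the awake total is preserved, an expert that was asleep (a node not on the IP's path) neither gains nor loses relative to the awake set as a whole.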

16 Label of Flagged Nodes
Shifting experts problem:
- Each node has 2 internal experts: a "positive" expert (label +) and a "negative" expert (label -)
- Best expert: label of the node
Algorithm:
- Predict by weighted coin toss between the + & - experts
- Update by shifting weight from the incorrect to the correct expert
Tracking a dynamic tree:
- Automatically incorporates a leaf changing sign
[Figure: node with internal weights w+ and w-]
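The weight shift between a node's two internal experts might look like the sketch below. It assumes the two weights sum to 1 and that a fixed additive amount `eps` moves per update; both are illustrative assumptions, as is the `shift` helper itself.

```python
def shift(w_pos, w_neg, truth, eps=0.1):
    """Move up to `eps` of weight from the incorrect internal expert to the correct one.

    truth: +1 if the IP was normal, -1 if it was malicious.
    """
    if truth > 0:
        delta = min(eps, w_neg)  # never drive a weight below zero
        return w_pos + delta, w_neg - delta
    delta = min(eps, w_pos)
    return w_pos - delta, w_neg + delta
```

Starting from weights (0.9, 0.1), five consecutive malicious observations move the node to roughly (0.4, 0.6), so its majority label flips from + to -. This is how a leaf changing sign is absorbed without any structural change to the tree.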

17 Growing the Tree
Algorithm:
- Track # mistakes made by each leaf
- Split a leaf after it makes sufficiently many mistakes
Tracking a dynamic tree:
- Also incorporates changes caused by leaf splitting
[Figure: leaf annotated "# mistakes?"]
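The split rule can be sketched as a per-leaf mistake counter with a threshold. The `record_mistake` helper, the use of a set of `IPv4Network` leaves, and the explicit `threshold` parameter are assumptions for illustration; the Analysis slide sets the number of mistakes before a split to 1/ε.

```python
import ipaddress

def record_mistake(leaf, mistakes, leaves, threshold):
    """Count a mistake at `leaf`; split it into its two child prefixes at the threshold.

    Returns True if the leaf was split.
    """
    mistakes[leaf] = mistakes.get(leaf, 0) + 1
    if mistakes[leaf] < threshold or leaf.prefixlen == 32:
        return False  # not enough evidence yet, or a /32 cannot be split
    leaves.remove(leaf)
    del mistakes[leaf]
    leaves.update(leaf.subnets(prefixlen_diff=1))  # the two half-size prefixes
    return True
```

The new children start with fresh mistake counts, so a region has to keep misbehaving before it is refined further.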

18 Maintaining Small-Space Tree
Convert to a paging problem:
- Each node is a page
- Size of optimal cache: 2k
- Q: which node to discard?
Use paging algorithms & competitive analysis:
- e.g., using Flush-When-Full (FWF): start from scratch after the tree has grown to max size
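Flush-When-Full applied to the tree can be sketched in a few lines: when a new node would exceed the size limit, everything except the root is discarded. The `add_node` helper and the set-based tree are hypothetical, and the sketch ignores the node weights that a full implementation would also reset.

```python
def add_node(tree, node, max_nodes, root):
    """Insert `node`; if the tree is already full, flush it back to the root first.

    Returns True if a flush happened.
    """
    flushed = False
    if node not in tree and len(tree) >= max_nodes:
        tree.clear()      # Flush-When-Full: discard every node at once
        tree.add(root)    # restart from the trivial one-node tree
        flushed = True
    tree.add(node)
    return flushed
```

FWF is attractive here because it needs no per-node bookkeeping such as recency timestamps, at the cost of relearning the tree after each flush.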

19 Analysis
Define ε so that:
- additive change to internal node weight per mistake: ε
- multiplicative loss factor to relative node weight per mistake: 1/ε
- mistakes needed before splitting a node: 1/ε
Then:
E[# mistakes of algorithm] = (1 + ε)(mistakes of OPTIMAL) + (1/ε)(sign changes of OPTIMAL's leaves) + (log k / ε)(splits of OPTIMAL's leaves)
Space required using FWF: O(k log k / ε^2)

20 Implementation Issues
Sparse data from some IP prefixes:
- Might not see any IP addresses from some prefixes
Clusters might be too big to be meaningful:
- Include a loss function to penalize prefixes for "nothingness"
Efficient implementations for large-scale data:
- Coalesce nodes appropriately in the binary tree (effectively becomes a trie)
- Randomization calls are computationally expensive
Experimental performance: in progress

