
1 A Fast and Compact Method for Unveiling Significant Patterns in High-Speed Networks
Tian Bu¹, Jin Cao¹, Aiyou Chen¹, Patrick P. C. Lee²
¹ Bell Labs, Alcatel-Lucent
² Columbia University
May 10, 2007

2 Outline
- Motivation
  - Why heavy-key detection? What are the challenges?
- Sequential hashing scheme
  - Allows fast, memory-efficient heavy-key detection in high-speed networks
- Results of trace-driven simulation

3 Motivation
- Many anomalies in today's networks: worms, DoS attacks, flash crowds, ...
- Input: a stream of packets in (key, value) pairs
  - Key: e.g., srcIPs, flows, ...
  - Value: e.g., data volume
- Goal: identify heavy keys that cause anomalies
  - Heavy hitters: keys with massive data in one period
    - E.g., flows that violate service agreements
  - Heavy changers: keys with massive data change across two periods
    - E.g., sources that start DoS attacks

4 Challenge
- Keeping track of per-key values is infeasible
  (Figure: table of keys 1, 2, 3, ..., N with per-key counter values v1, v2, v3, ..., vN)
- Number of keys = 2^32 if we keep track of source IPs
- Number of keys = 2^104 if we keep track of 5-tuples (srcIP, dstIP, srcPort, dstPort, proto)
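As a rough back-of-the-envelope illustration of why exact per-key counting is infeasible (the 4-byte counter width is an assumption for illustration, not a figure from the slides):

```python
# Hypothetical sizing: one exact counter per possible source IP.
SRC_IP_KEYS = 2 ** 32      # key space when tracking source IPs
COUNTER_BYTES = 4          # assumed 32-bit counter per key

memory_bytes = SRC_IP_KEYS * COUNTER_BYTES
print(memory_bytes / 2 ** 30, "GiB")   # 16.0 GiB, before even considering 5-tuple keys
```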

5 Goal
Find heavy keys using a "smart" design:
- Fast per-packet update
- Fast identification of heavy keys
- Memory-efficient
- High accuracy

6 Previous Work
- Multi-stage filter [Estan & Varghese, 03]
  - Covers only heavy hitter detection, but not heavy changer detection
- Deltoids [Cormode & Muthukrishnan, 04]
  - Covers both heavy hitter and heavy changer detection, but is not memory-efficient in general
- Reversible sketch [Schweller et al., 06]
  - Space and time complexities of detection are sub-linear in the key space size

7 Our Contributions
- Derive the minimum memory requirement subject to a targeted error rate
- Propose a sequential hashing scheme that is memory-efficient and allows fast detection
- Propose an accurate estimation method to estimate the values of heavy keys
- Show via trace-driven simulation that our scheme is more accurate than the existing work

8 Minimum Memory Requirement
- How to feasibly keep track of per-key values?
- Use a hash array [Estan & Varghese, 2003]
  - M independent hash tables
  - K buckets in each table
  (Figure: hash array of M tables, each with K buckets)

9 Minimum Memory Requirement - Record step
- For each packet of key x and value v:
  - Find the bucket in Table i by hashing x: h_i(x), for i = 1, ..., M
  - Increment the counter of each such hash bucket by value v
  (Figure: a packet (key x, value v) hashed by h_1, h_2, ..., h_M into one bucket per table, each incremented by +v)
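A minimal sketch of the record step on this slide, assuming simple salted hash functions (the hash family and all names here are illustrative, not taken from the paper):

```python
import hashlib

class HashArray:
    """M independent hash tables with K counter buckets each."""

    def __init__(self, num_tables: int, num_buckets: int):
        self.M = num_tables
        self.K = num_buckets
        self.counters = [[0] * num_buckets for _ in range(num_tables)]

    def bucket(self, table: int, key: bytes) -> int:
        # Illustrative hash h_i(x): salt the key with the table index.
        digest = hashlib.sha1(bytes([table]) + key).digest()
        return int.from_bytes(digest[:4], "big") % self.K

    def record(self, key: bytes, value: int) -> None:
        # Record step: add the packet's value to one bucket in every table.
        for i in range(self.M):
            self.counters[i][self.bucket(i, key)] += value
```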

10 Minimum Memory Requirement - Detection step
- Find heavy buckets: buckets whose values (or changes) exceed the threshold
- Heavy keys: keys whose associated buckets are all heavy buckets
  (Figure: hash array of M tables with heavy buckets highlighted)
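A hedged sketch of this detection step, building on the HashArray sketch from the previous slide (function names and the threshold handling are illustrative assumptions):

```python
def find_heavy_buckets(ha: HashArray, threshold: int) -> list[set[int]]:
    # Per table, collect the buckets whose counters exceed the threshold.
    return [
        {j for j in range(ha.K) if ha.counters[i][j] > threshold}
        for i in range(ha.M)
    ]

def is_heavy_candidate(ha: HashArray, key: bytes, threshold: int) -> bool:
    # A key can only be heavy if every one of its M associated buckets is heavy.
    heavy = find_heavy_buckets(ha, threshold)
    return all(ha.bucket(i, key) in heavy[i] for i in range(ha.M))
```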

11 Minimum Memory Requirement
- Input parameters:
  - N = size of the key space
  - H = max. number of heavy keys
  - ε = error rate, Pr(a non-heavy key is treated as a heavy key)
- Objective: find all heavy keys subject to a targeted error rate ε
- Minimum memory requirement: the size of a hash array, given by M*K, is minimized when
  - K = H / ln(2)
  - M = log2(N / (εH))
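A quick numerical illustration of these two formulas (the parameter values N = 2^32, H = 100, ε = 0.01 are arbitrary assumptions, not figures from the talk):

```python
import math

N = 2 ** 32      # assumed key space: source IPs
H = 100          # assumed maximum number of heavy keys
eps = 0.01       # assumed targeted error rate

K = math.ceil(H / math.log(2))            # buckets per table
M = math.ceil(math.log2(N / (eps * H)))   # number of tables

print(K, M, K * M)   # 145 buckets, 32 tables, ~4.6K counters in total
```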

12 How to identify heavy keys?
- Challenge: the hash array is irreversible
  - Many-to-one mapping
- Solution: enumerate all keys!!
  - Computationally expensive
  (Figure: hash array with a heavy bucket highlighted)

13 Sequential Hashing Scheme
- Basic idea: smaller keys first, then larger keys
- Observation: if there are H heavy keys, then there are at most H unique sub-keys with respect to the heavy keys
- Find all possible sub-keys of the H heavy keys
  - Enumeration of a sub-key space is easier
  (Figure: sub-IP space of size 2^8 vs. the entire IP space of size 2^32, with a heavy key marked in both)
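A hedged illustration of the sub-key idea, assuming a 32-bit source-IP key split into four 8-bit words (the split granularity is an assumption for illustration; the paper's choice of word sizes may differ):

```python
def subkeys(ip: int) -> list[int]:
    """Prefixes of a 32-bit key, from the smallest sub-key to the full key."""
    return [ip >> (32 - 8 * d) for d in range(1, 5)]

# Example: the heavy key 10.0.3.7 contributes one sub-key at each length.
ip = (10 << 24) | (0 << 16) | (3 << 8) | 7
print(subkeys(ip))
# [10, 2560, 655363, 167772935]
# The sub-key spaces grow as 2^8, 2^16, 2^24, 2^32, but at most H sub-keys of
# each length belong to heavy keys, so enumeration can proceed stage by stage.
```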

14 Sequential Hashing Scheme - Record step
- Input: (key x, value v)
- Key x is split into D words w1, w2, ..., wD
- For each i, Array i records the sub-key (w1, ..., wi): hash it into one bucket per table and increment that bucket by +v
  (Figure: D hash arrays; Array i contains tables 1, ..., M_i, each with K buckets)

15 Sequential Hashing Scheme - Detection step
- Try all w1's against Array 1 → at most (1 + δ)H candidate w1's
- Extend each candidate with all w2's against Array 2 → at most (1 + δ)H candidate w1w2's
- Continue with all w3's, ..., wD's → at most (1 + δ)H candidate w1w2...wD's, checked against Array D
- δ = intermediate error rate, ε = targeted error rate
  (Figure: candidate sets filtered through the heavy buckets of Arrays 1, 2, 3, ..., D)
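A simplified sketch of this stage-by-stage search, reusing the HashArray and find_heavy_buckets sketches from slides 9 and 10 and the 8-bit word split assumed on slide 13 (the candidate bookkeeping and names are illustrative, not the paper's exact algorithm):

```python
def sequential_detect(arrays: list[HashArray], threshold: int) -> list[int]:
    """arrays[i] indexes the sub-key formed by the first i+1 8-bit words of the key."""
    candidates = [0]                      # start from the empty sub-key prefix
    for i, array in enumerate(arrays):
        heavy = find_heavy_buckets(array, threshold)
        extended = []
        for prefix in candidates:
            for w in range(256):          # try all values of the next 8-bit word
                subkey = (prefix << 8) | w
                key_bytes = subkey.to_bytes(i + 1, "big")
                # Keep the extended sub-key only if all of its buckets are heavy.
                if all(array.bucket(t, key_bytes) in heavy[t] for t in range(array.M)):
                    extended.append(subkey)
        candidates = extended             # roughly (1 + δ)H sub-keys survive each stage
    return candidates                     # full-length keys: the candidate heavy keys
```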

16 Estimation
- Goal: find the values of heavy keys
  - Rank the importance of heavy keys
  - Eliminate more non-heavy keys
- Use maximum likelihood
  - Bucket values due to non-heavy keys ~ Weibull
  - Estimation is solved by linear programming
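The paper's maximum-likelihood / linear-programming estimator is not reproduced here. Purely for intuition, a much simpler baseline is the classic bucket-minimum estimate over a key's buckets (an upper bound on the key's true value under non-negative updates; explicitly not the MLE method this slide refers to):

```python
def minimum_estimate(ha: HashArray, key: bytes) -> int:
    # Every bucket of the key also absorbs collisions from other keys, so each
    # bucket over-counts; the minimum over the M tables is the tightest such bound.
    return min(ha.counters[i][ha.bucket(i, key)] for i in range(ha.M))
```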

17 Recap
- Data stream → Record step → hash arrays (Array 1, ..., Array D)
- Hash arrays + threshold → Detection step → candidate heavy keys
- Candidate heavy keys → Estimation → heavy keys + values
  (Figure: pipeline from the data stream through the record, detection, and estimation steps)

18 Experiments
- Traces: Abilene data collected at an OC-192 link
  - 1 hour long, ~50 GB traffic
- Evaluation approach: compare our scheme and Deltoids [Cormode & Muthukrishnan, 04], both of which use the same number of counters
- Metrics:
  - False positive rate = (# of non-heavy keys treated as heavy) / (# of returned keys)
  - False negative rate = (# of heavy keys missed) / (true # of heavy keys)

19 Results - Heavy Hitter Detection
- Worst-case error rates:
  - Sequential hashing: 1.2% false +ve and 0.8% false -ve
  - Deltoids: 10.5% false +ve, 80% false -ve
  (Figure: false +ve/-ve rates of sequential hashing)

20 Results - Heavy Changer Detection
- Worst-case error rates:
  - Sequential hashing: 1.8% false +ve, 2.9% false -ve
  - Deltoids: 1.2% false +ve, 70% false -ve
  (Figure: false +ve/-ve rates of sequential hashing)

21 Summary of Results
- High accuracy of heavy-key detection while using a memory-efficient data structure
- Fast detection
  - On the order of seconds
- Accurate estimation
  - Provides more accurate estimates than least-square regression [Lee et al., 05]

22 Conclusions
- Derived the minimum memory requirement for heavy-key detection
- Proposed the sequential hashing scheme
  - Using a memory-efficient data structure
  - Allowing fast detection
  - Providing small false positives/negatives
- Proposed an accurate estimation method to reconstruct the values of heavy keys

23 Thank you

24 How to Determine H?
- H = maximum number of heavy keys
- H ≈ (total data volume) / threshold
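For intuition, with the ~50 GB trace from slide 18 and an assumed per-period threshold of 0.5 GB (an illustrative value, not from the talk), this bound gives H ≈ 50 GB / 0.5 GB = 100 heavy keys at most.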

25 Tradeoff Between Memory and Computation
- δ = intermediate error rate
- Large δ: fewer tables, more computation
- Small δ: more tables, less computation

