
1
A Fast and Compact Method for Unveiling Significant Patterns in High-Speed Networks
Tian Bu¹, Jin Cao¹, Aiyou Chen¹, Patrick P. C. Lee²
¹Bell Labs, Alcatel-Lucent  ²Columbia University
May 10, 2007

2
Outline
Motivation: Why heavy-key detection? What are the challenges?
Sequential hashing scheme: allows fast, memory-efficient heavy-key detection in high-speed networks.
Results of trace-driven simulation.

3
Motivation
Many anomalies in today’s networks: worms, DoS attacks, flash crowds, …
Input: a stream of packets in (key, value) pairs. Key: e.g., srcIPs, flows, …; value: e.g., data volume.
Goal: identify heavy keys that cause anomalies.
Heavy hitters: keys with massive data in one period, e.g., flows that violate service agreements.
Heavy changers: keys with massive data change across two periods, e.g., sources that start DoS attacks.

4
Challenge
Keeping track of per-key values is infeasible:
Number of keys = 2^32 if we keep track of source IPs.
Number of keys = 2^104 if we keep track of 5-tuples (srcIP, dstIP, srcPort, dstPort, proto).
[Figure: a table of per-key counter values v_1, v_2, …, v_N for keys 1 … N]

5
Goal
Find heavy keys using a “smart” design: fast per-packet update, fast identification of heavy keys, memory efficiency, and high accuracy.

6
Previous Work
Multi-stage filter [Estan & Varghese, 03]: covers only heavy hitter detection, but not heavy changer detection.
Deltoids [Cormode & Muthukrishnan, 04]: covers both heavy hitter and heavy changer detection, but is not memory-efficient in general.
Reversible sketch [Schweller et al., 06]: space and time complexities of detection are sub-linear in the key space size.

7
Our Contributions
Derive the minimum memory requirement subject to a targeted error rate.
Propose a sequential hashing scheme that is memory-efficient and allows fast detection.
Propose an accurate estimation method to estimate the values of heavy keys.
Show via trace-driven simulation that our scheme is more accurate than the existing work.

8
Minimum Memory Requirement
How to feasibly keep track of per-key values? Use a hash array [Estan & Varghese, 2003]: M independent hash tables, each with K buckets.
[Figure: hash array of tables 1 … M, each a column of K buckets]

9
Minimum Memory Requirement — Record step
For each packet of key x and value v: find the bucket in table i by hashing x, i.e., h_i(x), and increment the counter of each hashed bucket by v.
[Figure: a packet (key x, value v) hashed by h_1, …, h_M into one bucket per table; each bucket’s counter incremented by v]
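The record step above can be sketched in a few lines of Python. The salted-SHA-1 hash construction and the toy sizes below are illustrative assumptions, not the paper’s exact hash family:

```python
import hashlib

M, K = 4, 8  # M hash tables, K buckets each (toy sizes)
counters = [[0] * K for _ in range(M)]

def bucket(i, key):
    """h_i(x): hash key x into one of table i's K buckets (salted SHA-1, an illustrative choice)."""
    digest = hashlib.sha1(f"{i}:{key}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % K

def record(key, value):
    """Record step: add value v to key x's bucket in every one of the M tables."""
    for i in range(M):
        counters[i][bucket(i, key)] += value

record("10.0.0.1", 1500)  # one 1500-byte packet from key 10.0.0.1
record("10.0.0.1", 40)    # a 40-byte packet from the same key
```

Because every packet touches exactly one bucket per table, the per-packet update cost is M hashes and M increments, independent of the key space size.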

10
Minimum Memory Requirement — Detection step
Find heavy buckets, i.e., buckets whose values (or changes) exceed the threshold. Heavy keys are the keys all of whose associated buckets are heavy buckets.
[Figure: heavy buckets highlighted across tables 1 … M]

11
Minimum Memory Requirement
Input parameters: N = size of the key space; H = max. number of heavy keys; ε = error rate, Pr(a non-heavy key is treated as a heavy key).
Objective: find all heavy keys subject to the targeted error rate ε.
Minimum memory requirement: the size of a hash array, given by M·K, is minimized when K = H / ln(2) and M = log2(N / (εH)).
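Plugging in concrete numbers makes the formula tangible. The parameter values below (source-IP key space, H = 100, ε = 1%) are illustrative choices, not values from the paper:

```python
import math

N = 2 ** 32  # key space size: all source IPs
H = 100      # max. number of heavy keys (assumed)
eps = 0.01   # targeted error rate (assumed)

# Minimum-memory configuration of the hash array:
K = math.ceil(H / math.log(2))            # buckets per table: H / ln(2)
M = math.ceil(math.log2(N / (eps * H)))   # number of tables: log2(N / (eps*H))

total_counters = K * M  # 145 buckets x 32 tables = 4640 counters
```

A few thousand counters suffice where a naive per-key table would need 2^32 entries.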

12
How to identify heavy keys?
Challenge: the hash array is irreversible (a many-to-one mapping from keys to buckets).
Naive solution: enumerate all keys and check each against the heavy buckets — computationally expensive.
[Figure: a heavy bucket in a hash array of tables 1 … M]
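The naive enumeration can be sketched as follows: a key is reported as heavy only if every one of its M buckets is heavy. This is feasible only for the tiny key space enumerated here; for 2^32 keys the loop is prohibitive, which is what motivates sequential hashing. Hash choice and sizes are illustrative assumptions:

```python
import hashlib

M, K, threshold = 4, 64, 1000
counters = [[0] * K for _ in range(M)]

def bucket(i, key):
    """Hash key into a bucket of table i (salted SHA-1, an illustrative choice)."""
    digest = hashlib.sha1(f"{i}:{key}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % K

def record(key, value):
    for i in range(M):
        counters[i][bucket(i, key)] += value

def detect(key_space):
    """Naive detection: enumerate candidates; keep those whose M buckets are all heavy."""
    return [x for x in key_space
            if all(counters[i][bucket(i, x)] > threshold for i in range(M))]

record("heavy-src", 5000)  # a key well above the threshold
record("light-src", 10)    # a key far below it
found = detect(["heavy-src", "light-src", "absent-src"])
```

The cost of `detect` is proportional to the size of the enumerated key space, which is exactly the bottleneck the sequential hashing scheme removes.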

13
Sequential Hashing Scheme
Basic idea: smaller keys first, then larger keys.
Observation: if there are H heavy keys, then there are at most H unique sub-keys with respect to the heavy keys.
Find all possible sub-keys of the H heavy keys; enumeration of a sub-key space is easier.
[Figure: a heavy key in the entire IP space (size 2^32) maps to a sub-key in the sub-IP space (size 2^8)]
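As a concrete instance of the sub-key idea, a 32-bit source IP can be split into four 8-bit words, so each sub-key space has only 2^8 = 256 values to enumerate. The helper below is an illustrative sketch:

```python
def split_key(key, D=4, bits=8):
    """Split a (D*bits)-bit key into D sub-keys of `bits` bits each, most-significant first."""
    mask = (1 << bits) - 1
    return [(key >> (bits * (D - 1 - d))) & mask for d in range(D)]

words = split_key(0x0A000001)  # the IP 10.0.0.1 -> [10, 0, 0, 1]
```

Enumerating 4 × 256 sub-key values is vastly cheaper than enumerating 2^32 full keys.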

14
Sequential Hashing Scheme — Record step
Input: (key x, value v). The key x is divided into D words w_1, w_2, …, w_D. For each d = 1, …, D, the prefix w_1 … w_d is recorded with value v into hash array d, where array d consists of M_d tables of K buckets each.
[Figure: key x split into words w_1 … w_D; arrays 1 … D updated with successively longer prefixes, each bucket counter incremented by v]
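The sequential record step can be sketched as below: array d is updated not with the full key but with the prefix of its first d words. For simplicity every array here has the same number of tables M; the hash construction and sizes are illustrative assumptions:

```python
import hashlib

D, M, K = 4, 3, 16  # D arrays; each array has M tables of K buckets (toy sizes)
arrays = [[[0] * K for _ in range(M)] for _ in range(D)]

def split_key(key, D=4, bits=8):
    """Split a 32-bit key into D 8-bit words, most-significant first."""
    mask = (1 << bits) - 1
    return tuple((key >> (bits * (D - 1 - d))) & mask for d in range(D))

def bucket(d, i, prefix):
    """Hash a sub-key prefix into a bucket of table i of array d (salted SHA-1, illustrative)."""
    digest = hashlib.sha1(f"{d}:{i}:{prefix}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % K

def record(key, value):
    """Record step: array d receives the prefix w_1 ... w_d of the key."""
    words = split_key(key)
    for d in range(D):
        prefix = words[:d + 1]
        for i in range(M):
            arrays[d][i][bucket(d, i, prefix)] += value

record(0x0A000001, 1500)  # one packet from 10.0.0.1
```

The per-packet cost grows from M hashes to roughly D·M hashes, the price paid for making detection fast.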

15
Sequential Hashing Scheme — Detection step
Try all w_1’s against array 1 to obtain at most (1 + ε′)H candidate w_1’s. For each candidate, try all w_2’s against array 2 to obtain (1 + ε′)H candidate w_1w_2’s; then all w_3’s against array 3, and so on, until array D yields (1 + ε′)H candidate full keys w_1w_2…w_D. Here ε′ is the intermediate error rate and ε the targeted error rate.
[Figure: candidate prefixes grown one word per array, from the w_1’s through the full keys w_1w_2…w_D]
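A self-contained sketch of this detection step over a toy sequential-hash structure is given below (hash choice, sizes, and threshold are illustrative assumptions). Candidate prefixes are grown one word at a time; a prefix survives level d only if all of its buckets in array d are heavy:

```python
import hashlib

D, M, K = 4, 3, 64
threshold = 1000
arrays = [[[0] * K for _ in range(M)] for _ in range(D)]

def split_key(key, D=4, bits=8):
    mask = (1 << bits) - 1
    return tuple((key >> (bits * (D - 1 - d))) & mask for d in range(D))

def bucket(d, i, prefix):
    digest = hashlib.sha1(f"{d}:{i}:{prefix}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % K

def record(key, value):
    words = split_key(key)
    for d in range(D):
        for i in range(M):
            arrays[d][i][bucket(d, i, words[:d + 1])] += value

def detect():
    """Detection step: extend surviving prefixes by one 8-bit word per array.
    A prefix survives level d only if all M of its buckets in array d are heavy."""
    candidates = [()]
    for d in range(D):
        survivors = []
        for prefix in candidates:
            for w in range(256):  # try all values of the next sub-key word
                p = prefix + (w,)
                if all(arrays[d][i][bucket(d, i, p)] > threshold for i in range(M)):
                    survivors.append(p)
        candidates = survivors
    return candidates

record(0x0A000001, 5000)  # heavy key 10.0.0.1
record(0xC0A80001, 20)    # light key 192.168.0.1
heavy = detect()
```

Per level the work is (number of surviving candidates) × 256 bucket checks, so the total detection cost is governed by the candidate count (1 + ε′)H rather than the 2^32 key space.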

16
Estimation
Goal: find the values of heavy keys, in order to rank the importance of heavy keys and eliminate more non-heavy keys.
Use maximum likelihood: bucket values due to non-heavy keys follow a Weibull distribution, and the estimation is solved by linear programming.

17
Recap
Data stream → Record step (into hash arrays 1 … D) → Detection step (against a threshold) → candidate heavy keys → Estimation → heavy keys + values.
[Figure: pipeline diagram from data stream through record, detection, and estimation]

18
Experiments
Traces: Abilene data collected at an OC-192 link; 1 hour long, ~50 GB of traffic.
Evaluation approach: compare our scheme with Deltoids [Cormode & Muthukrishnan, 04], both using the same number of counters.
Metrics:
False positive rate = (# of non-heavy keys treated as heavy) / (# of returned keys).
False negative rate = (# of heavy keys missed) / (true # of heavy keys).

19
Results — Heavy Hitter Detection
Worst-case error rates: sequential hashing: 1.2% false positives and 0.8% false negatives; Deltoids: 10.5% false positives and 80% false negatives.
[Figure: false positive/negative rates of sequential hashing]

20
Results — Heavy Changer Detection
Worst-case error rates: sequential hashing: 1.8% false positives and 2.9% false negatives; Deltoids: 1.2% false positives and 70% false negatives.
[Figure: false positive/negative rates of sequential hashing]

21
Summary of Results
High accuracy of heavy-key detection while using a memory-efficient data structure.
Fast detection: on the order of seconds.
Accurate estimation: provides more accurate estimates than least-square regression [Lee et al., 05].

22
Conclusions
Derived the minimum memory requirement for heavy-key detection.
Proposed the sequential hashing scheme: a memory-efficient data structure allowing fast detection with small false positive/negative rates.
Proposed an accurate estimation method to reconstruct the values of heavy keys.

23
Thank you

24
How to Determine H?
H = maximum number of heavy keys: H ≈ total data volume / threshold.
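The bound follows because each heavy key must individually exceed the threshold, so at most (total volume / threshold) keys can do so. A quick check with illustrative numbers (the 0.5 GB threshold is an assumption; the ~50 GB volume matches the trace used in the experiments):

```python
total_volume = 50 * 2 ** 30   # ~50 GB of traffic, as in the Abilene trace
threshold = 0.5 * 2 ** 30     # hypothetical heavy-key threshold: 0.5 GB
H = total_volume / threshold  # at most this many keys can each exceed the threshold
```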

25
Tradeoff Between Memory and Computation
ε′ = intermediate error rate. Large ε′: fewer tables, more computation. Small ε′: more tables, less computation.
