Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications Robert Schweller 1, Zhichun Li 1, Yan Chen 1, Yan Gao 1, Ashish.


1 Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications
Robert Schweller (1), Zhichun Li (1), Yan Chen (1), Yan Gao (1), Ashish Gupta (1), Yin Zhang (2), Peter Dinda (1), Ming-Yang Kao (1), Gokhan Memik (1)
(1) Lab for Internet and Security Technology (LIST), Northwestern Univ.
(2) University of Texas at Austin

2 The Spread of Sapphire/Slammer Worms

3 Motivation (online change detection)
Online network anomaly/intrusion detection over high-speed links
– Small memory usage
– Small number of memory accesses per packet
– Scalable to large key space size
Primitives for online anomaly detection
– Heavy hitters (lots of prior work)
– Heavy changes: enabler for aggregate queries over multiple data streams
  Asymmetric routing demands spatial aggregation
  Time Series Analysis (TSA) needs temporal aggregation

4 Outline: Background on k-ary sketch; Reversible sketch problem; Modular hashing; IP mangling; Reverse hashing; Evaluation; Conclusion

5 K-ary sketch [Krishnamurthy, Sen, Zhang, Chen, 2003]
First to detect flow-level heavy changes in massive data streams at network traffic speeds
[Figure: H hash tables (rows 1 … j … H), each with K buckets (columns 0, 1, …, K-1)]

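6 k-ary sketch [Krishnamurthy, Sen, Zhang, Chen, 2003]
[Figure: H hash tables (rows 1 … j … H), each with K buckets (columns 0, 1, …, K-1); key k maps to bucket h_1(k), …, h_j(k), …, h_H(k) in the respective table]
Update (k, u): T_j[h_j(k)] += u (for all j)
Estimate v(S, k): sum of updates for key k
APIs: S = COMBINE(α, S1, β, S2), the linear combination α·S1 + β·S2 = S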
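The three operations on this slide can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `KarySketch`, the multiply-add hash functions, and the coefficient names `alpha` and `beta` are hypothetical. The estimate assumes the collision-corrected per-table value (T_j[h_j(k)] - sum/K) / (1 - 1/K) with a median across the H tables; the slide itself only says "sum of updates for key k".

```python
import random
from statistics import median


class KarySketch:
    """H hash tables with K counters each (illustrative only)."""

    def __init__(self, H=5, K=4096, seed=0):
        self.H, self.K, self.seed = H, K, seed
        rng = random.Random(seed)
        # Stand-in hash functions (assumption): one multiply-add pair per table.
        self.params = [(rng.randrange(1, 2**61, 2), rng.randrange(2**61))
                       for _ in range(H)]
        self.tables = [[0.0] * K for _ in range(H)]
        self.total = 0.0  # sum of all updates, needed by estimate()

    def bucket(self, j, key):
        a, b = self.params[j]
        return ((a * key + b) % (2**61 - 1)) % self.K

    def update(self, key, u):
        # UPDATE: T_j[h_j(k)] += u for all j
        for j in range(self.H):
            self.tables[j][self.bucket(j, key)] += u
        self.total += u

    def estimate(self, key):
        # ESTIMATE: median over tables of the collision-corrected counter.
        K = self.K
        return median(
            (self.tables[j][self.bucket(j, key)] - self.total / K) / (1 - 1 / K)
            for j in range(self.H)
        )


def combine(alpha, s1, beta, s2):
    """COMBINE: cell-wise alpha*S1 + beta*S2; both sketches must share hashes."""
    assert (s1.H, s1.K, s1.seed) == (s2.H, s2.K, s2.seed)
    out = KarySketch(s1.H, s1.K, s1.seed)
    out.tables = [[alpha * x + beta * y for x, y in zip(r1, r2)]
                  for r1, r2 in zip(s1.tables, s2.tables)]
    out.total = alpha * s1.total + beta * s2.total
    return out


# Two time intervals: key 42 grows from 100 to 1000, key 7 appears with 50.
before, after = KarySketch(), KarySketch()
before.update(42, 100)
after.update(42, 1000)
after.update(7, 50)
change = combine(1, after, -1, before)  # sketch of the per-key change
```

Because the sketch is linear, `change.estimate(42)` approximates the change of key 42 between the two intervals without either interval's per-key state being stored.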

7 Reverse Sketch Problem
Main problem
– Cannot efficiently report keys with heavy change: INFERENCE(S, t)
– Important function for anomaly detection!
Our contribution
– Determine the set of keys that have “large” estimates in a sketch

8 Reversible sketch framework
Streaming data recording: key → IP mangling → modular hashing → reversible k-ary sketch (value added to the stored value)
Heavy change detection: reversible k-ary sketch + change threshold → reverse hashing → reverse IP mangling → heavy change keys

9 Outline: Background on k-ary sketch; Reversible sketch problem; Modular hashing; IP mangling; Reverse hashing; Evaluation; Conclusion

10 Taking Intersections
Intersect A_1, A_2, A_3, A_4, A_5
H = 5, K = 2^12, #keys = 2^32 (IP addresses)
E[false positives] << 1

11 The problem with simple intersection
Each set A_i can be very large!
H = 5, K = 2^12, #keys = 2^32 (IP addresses)
|A_1| = 2^32 / 2^12 = 2^20
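The numbers on the last two slides can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes each hash function spreads keys uniformly and independently, and the variable names are made up for illustration.

```python
from fractions import Fraction

H = 5            # number of hash tables
K = 2**12        # buckets per table
N_KEYS = 2**32   # key space: IPv4 addresses

# Keys that reverse-map to one bucket of one table: |A_i| = 2^32 / 2^12.
keys_per_bucket = N_KEYS // K

# Expected number of keys that land in one chosen bucket in every one of the
# H tables purely by chance (assumes independent, uniform hashing).
expected_coincidental = Fraction(N_KEYS, K**H)

print(keys_per_bucket)               # size of each set A_i
print(float(expected_coincidental))  # far below 1
```

The intersection itself is therefore almost free of false positives; the cost lies in materializing five sets of about a million keys each.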

12 The problem with simple intersection
Each set A_i can be very large!
Solution: Modular hashing

13 Modular hashing reduces the set size
[Figure: a 32-bit key 10010100 10101011 10010101 10100011, made of 8-bit words, is hashed by h() to the 12-bit value 010 110 001 101]

14 Modular hashing reduces the set size
[Figure: each 8-bit word of the 32-bit key 10010100 10101011 10010101 10100011 is hashed separately by h_1(), h_2(), h_3(), h_4() to 010, 110, 001, 101, which are concatenated into 010110001101]
Greatly reduces size of reverse mapped sets

15 Modular hashing reduces the set size
[Figure: hash tables 1 to 5 with heavy buckets b_1, b_2, b_3, b_4, b_5]
A_1: 2^5 * 2^5 * 2^5 * 2^5
Intersection: only 32 elements per word set

16 Modular hashing reduces the set size
[Figure: hash tables 1 to 5 with heavy buckets b_1, b_2, b_3, b_4, b_5, and the intersection of A_1 and A_2]
A_1: 2^5 * 2^5 * 2^5 * 2^5
A_2: 2^5 * 2^5 * 2^5 * 2^5
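A small Python sketch of modular hashing as drawn on these slides: a 32-bit key is split into four 8-bit words, each word is hashed to 3 bits, and the four outputs are concatenated into a 12-bit bucket index. The per-word lookup tables and all function names are assumptions for illustration; the tables are built so that every 3-bit output has exactly 2^8 / 2^3 = 32 preimages, which is what keeps each reverse-mapped word set at 32 elements.

```python
import random

WORDS, WORD_BITS, OUT_BITS = 4, 8, 3  # 4 words x 3 bits = 12-bit bucket index


def make_word_tables(seed=0):
    """One lookup table per word: byte value -> 3-bit output (balanced)."""
    rng = random.Random(seed)
    tables = []
    for _ in range(WORDS):
        outputs = [v % (1 << OUT_BITS) for v in range(1 << WORD_BITS)]
        rng.shuffle(outputs)  # hypothetical stand-in for a real hash function
        tables.append(outputs)
    return tables


def modular_hash(key, tables):
    """Hash each 8-bit word separately and concatenate the 3-bit results."""
    bucket = 0
    for i in range(WORDS):
        shift = WORD_BITS * (WORDS - 1 - i)
        word = (key >> shift) & ((1 << WORD_BITS) - 1)
        bucket = (bucket << OUT_BITS) | tables[i][word]
    return bucket


def reverse_word_sets(bucket, tables):
    """For one bucket index, return the preimage set of each word."""
    sets = []
    for i in range(WORDS):
        shift = OUT_BITS * (WORDS - 1 - i)
        chunk = (bucket >> shift) & ((1 << OUT_BITS) - 1)
        sets.append({w for w, out in enumerate(tables[i]) if out == chunk})
    return sets


tables = make_word_tables()
key = 0b10010100101010111001010110100011  # the bit string from the slide
bucket = modular_hash(key, tables)
word_sets = reverse_word_sets(bucket, tables)
print([len(s) for s in word_sets])
```

The reverse-mapped set of a bucket is the product of four 32-element sets, so intersections across hash tables can be taken word by word instead of over 2^20 whole keys.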

17 Problem: Too many collisions
[Figure: the 32-bit keys 129.105.56.23, 129.105.56.28, 129.105.56.109, 129.105.56.35, 129.105.56.98, ... all map to 12-bit buckets of the form 7.4.0.*]

18 Problem: Too many collisions
[Figure: the 32-bit keys 129.105.56.23, 129.105.56.28, 129.105.56.109, 129.105.56.35, 129.105.56.98, ... all map to 12-bit buckets of the form 7.4.0.*]
Solution: IP mangling with GF (Galois Extension Field)
IP mangling: a bijective mapping function for breaking the key space continuity
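To make "bijective mapping" concrete, here is a hedged Python sketch. It does not implement the GF construction named on the slide; as a simpler stand-in it multiplies the key by an odd constant modulo 2^32, which is invertible because odd numbers have a multiplicative inverse modulo a power of two. The constant `A` and the function names are hypothetical.

```python
import ipaddress

MASK32 = (1 << 32) - 1
A = 0x9E3779B1               # arbitrary odd multiplier (hypothetical choice)
A_INV = pow(A, -1, 1 << 32)  # modular inverse; requires Python 3.8+


def mangle(key):
    """Bijective scrambling of a 32-bit key (stand-in, not the GF version)."""
    return (A * key) & MASK32


def unmangle(mangled):
    """Inverse of mangle(): recovers the original key."""
    return (A_INV * mangled) & MASK32


def to_int(dotted):
    return int(ipaddress.IPv4Address(dotted))


addrs = ["129.105.56.23", "129.105.56.28", "129.105.56.109",
         "129.105.56.35", "129.105.56.98"]
mangled = [mangle(to_int(a)) for a in addrs]
for a, m in zip(addrs, mangled):
    print(a, "->", ipaddress.IPv4Address(m))
```

Keys are mangled before modular hashing at recording time, and candidate keys recovered by reverse hashing are passed through the inverse mapping at detection time.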

19 Outline: Background on k-ary sketch; Reversible sketch problem; Modular hashing; IP mangling; Reverse hashing; Evaluation; Conclusion

20 Handling Multiple Intersections…
[Figure: hash tables 1 to 5, each with two heavy buckets]
2^H different intersections: much more difficult
Solution: Reverse hashing algorithms
– Step 1: Reverse hashing for each module
– Step 2: Infer the whole key through bucket index matching among candidates from each module

21 Reverse Hashing for Each Module
H = 5, r = 1, K = 2^12
r: tolerance level
Take the first word as an example
[Figure: hash tables 1 to 5 with the candidate sets of the first word {2,3,5}, {2,6,9,10}, {0,2,3}, {2,3,8,10}, {3,6,7,9}, and the resulting sets {2} and {2,3}. Labels: candidate set of the first word in hash table i; all possible values of the first word in the sketch]
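One way to read the tolerance level r is that a word value is kept as a candidate if it appears in the reverse-mapped sets of at least H - r hash tables. The Python sketch below works through the five sets from this slide under that reading; the function name is hypothetical and the rule is an interpretation of the slide, not a quoted definition.

```python
from collections import Counter


def word_candidates(word_sets, r):
    """Keep values present in at least H - r of the H per-table sets."""
    H = len(word_sets)
    counts = Counter(v for s in word_sets for v in set(s))
    return {v for v, c in counts.items() if c >= H - r}


# Candidate sets of the first word in hash tables 1..5 (from the slide).
first_word_sets = [
    {2, 3, 5},
    {2, 6, 9, 10},
    {0, 2, 3},
    {2, 3, 8, 10},
    {3, 6, 7, 9},
]

print(word_candidates(first_word_sets, r=1))
```

With r = 1, values 2 and 3 each appear in four of the five sets and survive, while a strict intersection (r = 0) of these five sets would be empty.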

22 Bucket Index Matrix of Candidates
H = 5, r = 1, K = 2^12
For each x in I_1, we can get B_1(x), a vector of the heavy bucket sets which x hashes to.
[Figure: hash tables 1 to 5 with heavy buckets b_11, b_12, b_21, b_22, b_31, b_32, b_41, b_42, b_51, b_52, shown for the keys 192.168.0.1, 192.123.47.62, and 192.*.*.*]
192.*.*.* hash to the red heavy buckets

23 Prefix Extension Algorithm
[Figure: candidates from I_1 and I_2 (150, 47, 236, 72, 104) with their bucket vectors B_1 and B_2 combined; a combination with more than r = 1 mismatches (*) is ignored]
Path discovery algorithm

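24 Prefix Extension Algorithm
[Figure: the surviving prefixes are extended with candidates from I_3 (182, 32, 49) using B_3, then with candidates from I_4 (75) using B_4]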
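The prefix extension step can be sketched as follows, assuming that each candidate word x in I_w carries a vector B_w(x) of H sets (one per hash table) of heavy bucket indices it is consistent with, and that a prefix is dropped once more than r of its H intersected sets are empty. The function name and the toy data (H = 3, two words) are hypothetical; only the word values 150, 47, 236, 72 are borrowed from the figure.

```python
def extend_prefixes(I, B, r):
    """I[w]: candidate values of word w; B[w][x]: list of H heavy-bucket sets."""
    # Start with every candidate of the first word as a one-word prefix.
    paths = [((x,), B[0][x]) for x in I[0]]
    for w in range(1, len(I)):
        extended = []
        for prefix, vec in paths:
            for y in I[w]:
                # Intersect bucket sets table by table.
                merged = [a & b for a, b in zip(vec, B[w][y])]
                empty = sum(1 for s in merged if not s)
                if empty <= r:  # more than r empty sets: ignore this path
                    extended.append((prefix + (y,), merged))
        paths = extended
    return [prefix for prefix, _ in paths]


# Toy data: H = 3 hash tables, keys made of two words.
I = [[150, 47], [236, 72]]
B = [
    {150: [{1}, {2}, {1}], 47: [{2}, {1}, {2}]},
    {236: [{1}, {2}, {1}], 72: [{1}, {1}, {2}]},
]

print(extend_prefixes(I, B, r=0))
print(extend_prefixes(I, B, r=1))
```

Each surviving tuple is a candidate key, built word by word, whose words agree on a heavy bucket in at least H - r hash tables.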

25 Recap
Streaming data recording: key → IP mangling → modular hashing → reversible k-ary sketch (value added to the stored value)
Heavy change detection: reversible k-ary sketch + change threshold → reverse hashing → reverse IP mangling → heavy change keys
n is the size of key space

26 Outline: Background on k-ary sketch; Reversible sketch problem; Modular hashing; IP mangling; Reverse hashing; Evaluation; Conclusion

27 Evaluation
Dataset
– A large US ISP (330M Netflow records)
– NU (19M Netflow records)
Efficient data recording (for the worst-case traffic, all 40-byte packets)
– Software: 526 Mbps on a P4 3.2 GHz PC
– Hardware: 16 Gbps on a single FPGA board
– Only a few hundred KB to a couple of MB of memory used
– Only 15 memory accesses per packet for 48-bit reversible sketches and 16 per packet for 64-bit reversible sketches
Efficient heavy change detection and key inference
– 0.34 seconds for 100 changes, 13.33 seconds for 1000 changes

28 Key Inference Accuracy
True positives and false positives of 16-bit reversible sketches for 32-bit IP addresses
[Deltoids]: S. Muthukrishnan and Graham Cormode, What's New: Finding Significant Differences in Network Data Streams. Infocom 2004

29 More Results
– Stress test with larger dataset: still accurate
– Scalable to larger key space size: similar results for 64-bit IP pairs
– Built anomaly/intrusion detection system to detect, e.g., SYN flooding and port scans [ICDCS 2006]

30 Conclusions
Proposed the first reversible sketches, which
– Record high-speed network streams online
– Detect the heavy changes and infer the keys online
– Use little memory and a small number of memory accesses per packet
– Scale to large key space sizes

31 Backup Slides

32 Related work
Compared with [deltoids]
– Better accuracy
– Better scalability to large key spaces
– Fewer memory accesses
[PCF, IMC2004]: not reversible
[Q. Zhao et al, IMC2005], [S.Venkataraman, NDSS2005]: unique fan-out (fan-in) estimation

33 [Figure panels: Modular Hashing; Optimal Hashing]

34 Reversible sketch problem
However…
– Not reversible: lack of an inference API, INFERENCE(S, t)
  Given a threshold t, report keys whose corresponding sum of updates is larger than the threshold.
  Important function for anomaly detection!
– Decouple the recording stage of sketches from the detection stage to enable efficient combine and inference.
Our contribution: an efficient algorithm for inference


36 Problem: Too many collisions
[Figure: the 32-bit keys 129.105.56.23, 129.105.56.28, 129.105.56.109, 129.105.56.35, 129.105.56.98, ... all map to 12-bit buckets of the form 7.4.0.*]
Solution: IP mangling

37 IP mangling
Use a GF (Galois Extension Field) function for attack resilience

38 [Figure panels: Modular Hashing; Modular Hashing with IP Mangling; Optimal Hashing]

39 Reverse Hashing for Each Module
H = 5, r = 1, K = 2^12
Take the first word as an example
[Figure: hash tables 1 to 5 with heavy buckets b_11, b_12, b_21, b_22, b_31, b_32, b_41, b_42, b_51, b_52. Labels: all possible values of the first word for the No. j heavy bucket in hash table i; all possible values of the first word in hash table i; all possible values of the first word in the sketch]

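40 False positive reduction by verifying against the original k-ary sketch
[Figure: a candidate key is checked with Estimate against the original k-ary sketch; its estimate of 180 exceeds the threshold of 150, so it is verified and kept in the final result]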

41 K-ary sketch [Krishnamurthy, Sen, Zhang, Chen, 2003]
First to detect flow-level heavy changes in massive data streams at network traffic speeds
APIs
– UPDATE(S, k, u): T_j[h_j(k)] += u (for all j)
– ESTIMATE(S, k): sum of updates for key k
– Linear combination: S = COMBINE(α, S_1, β, S_2), i.e. α·S_1 + β·S_2 = S

