1 Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams
Robert Schweller, Ashish Gupta, Elliot Parsons, Yan Chen
Computer Science Department, Northwestern University

2 Online Change Detection
Network anomalies are common
– Flash crowds, failures, DoS, worms, …
Online detection over data streams
Data stream: key/update pairs (k, u)
– Heavy hitters (lots of prior work)
– Heavy changes

3 k-ary sketch [Krishnamurthy, Sen, Zhang, Chen, 2003]
– First to detect flow-level heavy changes in massive data streams at network traffic speeds.
[Figure: H hash tables (rows 1, …, j, …, H), each with K buckets (columns 0, 1, …, K-1).]

4 k-ary sketch [Krishnamurthy, Sen, Zhang, Chen, 2003]
[Figure: key k is hashed by h_1(k), …, h_j(k), …, h_H(k) to one bucket in each of the H tables (rows 1, …, j, …, H; columns 0, 1, …, K-1).]
Update (k, u): T_j[h_j(k)] += u (for all j)
Estimate v(S, k): sum of updates for key k
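
The update rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The hash functions are a hypothetical stand-in (SHA-256 salted with the table index). The estimator (subtract the average bucket load, rescale, take the median across tables) is the one usually given for the k-ary sketch; the slide itself only says the estimate approximates the sum of updates for key k.

```python
import hashlib
from statistics import median


class KarySketch:
    """H hash tables with K buckets each, as drawn on the slide."""

    def __init__(self, num_tables=5, num_buckets=4096):
        self.H = num_tables
        self.K = num_buckets
        self.tables = [[0] * self.K for _ in range(self.H)]
        self.total = 0  # sum of all updates, used by the estimator

    def bucket(self, j, key):
        # Hypothetical stand-in for h_j: SHA-256 salted with the table index.
        data = bytes([j]) + key.to_bytes(4, "big")
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big") % self.K

    def update(self, key, u):
        # Update (k, u): T_j[h_j(k)] += u for all j.
        for j in range(self.H):
            self.tables[j][self.bucket(j, key)] += u
        self.total += u

    def estimate(self, key):
        # Assumed estimator: per-table corrected value, then the median.
        per_table = [
            (self.tables[j][self.bucket(j, key)] - self.total / self.K)
            / (1 - 1 / self.K)
            for j in range(self.H)
        ]
        return median(per_table)


sketch = KarySketch()
key = 0x81693817  # 129.105.56.23 as a 32-bit integer
sketch.update(key, 100)
sketch.update(key, 50)
print(sketch.estimate(key))
```

Note that the sketch stores only counters, never keys, which is why recovering the heavy keys from it is the hard part.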

6 Main problem
– Cannot efficiently report keys with heavy change
Our contribution
– Determine the set of keys that have "large" estimates in the sketch
Requires very little space
– E.g. 5 hash tables with 16K buckets = 80 KB
– Fits in high-speed memory

7 Reverse Sketch Problem
Input:
– Sketch
– Threshold
Output: set of keys that hash to heavy buckets in a majority (or all) of the hash tables
[Figure: hash tables 1 to 5 with "heavy" buckets marked.]
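
To make the problem statement concrete, here is a brute-force baseline in Python. It is only a sketch under stated assumptions: the `bucket` function is a hypothetical stand-in for the hash functions h_j, the key space is shrunk to 2^16 so that enumeration finishes quickly, and "majority" is taken to mean more than half of the tables.

```python
import hashlib


def bucket(j, key, num_buckets):
    # Hypothetical stand-in for h_j (not the hash family from the talk).
    data = bytes([j]) + key.to_bytes(4, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big") % num_buckets


def heavy_buckets(tables, threshold):
    # For each table, the bucket indices whose magnitude reaches the threshold.
    return [{b for b, v in enumerate(t) if abs(v) >= threshold} for t in tables]


def reverse_sketch_brute_force(tables, threshold, key_space, min_tables=None):
    num_tables, num_buckets = len(tables), len(tables[0])
    if min_tables is None:
        min_tables = num_tables // 2 + 1  # simple majority
    heavy = heavy_buckets(tables, threshold)
    found = set()
    for key in key_space:
        hits = sum(bucket(j, key, num_buckets) in heavy[j] for j in range(num_tables))
        if hits >= min_tables:
            found.add(key)
    return found


H, K = 5, 4096
tables = [[0] * K for _ in range(H)]
for key, u in [(1234, 500), (42, 3), (9999, 7)]:
    for j in range(H):
        tables[j][bucket(j, key, K)] += u

result = reverse_sketch_brute_force(tables, threshold=100, key_space=range(2**16))
print(result)
```

Testing every key in this way is what becomes infeasible for the full 2^32 IP address space at line speed.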

8 Outline
[Figure: two stages, streaming data recording (key, value → k-ary sketch) and heavy change detection (k-ary sketch, change threshold → heavy change keys); the figure also carries the labels "fast" and "slow".]
Improve heavy change detection
– Modular hashing
– IP mangling
– Reverse hashing algorithms

9 Taking Intersections
Intersect A_1, A_2, A_3, A_4, A_5
H = 5, K = 2^12, #keys = 2^32 (IP addresses)
E[false positives] << 1

10 The problem with simple intersection
Why is this difficult? Each set A_i can be very large!
H = 5, K = 2^12, #keys = 2^32 (IP addresses)
|A_1| = 2^32 / 2^12 = 2^20
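
The numbers on slides 9 and 10 can be checked with exact arithmetic. The sketch below assumes independent, uniform hash functions and a single heavy bucket per table; those assumptions are mine and are not stated on the slides.

```python
from fractions import Fraction

H = 5              # hash tables
K = 2**12          # buckets per table
NUM_KEYS = 2**32   # IPv4 addresses

# Keys per bucket in one table if the hash spreads keys evenly: |A_i|.
keys_per_bucket = NUM_KEYS // K

# Assumption: independent, uniform hashes and one heavy bucket per table.
# A given non-heavy key lands in the heavy bucket of all H tables with
# probability (1/K)^H.
p_all_tables = Fraction(1, K) ** H
expected_false_positives = NUM_KEYS * p_all_tables

print(keys_per_bucket == 2**20)
print(expected_false_positives == Fraction(1, 2**28))
```

So the intersection itself is almost free of false positives; the cost lies in materialising sets of about a million keys per table.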

11 The problem with simple intersection
Why is this difficult? Each set A_i can be very large!
Solution: modular hashing

12 Modular hashing reduces the set size
[Figure: the 32-bit key 10010100 10101011 10010101 10100011 (four 8-bit words) is hashed by h() to the 12-bit value 010 110 001 101.]

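13 Modular hashing reduces the set size
[Figure: the 32-bit key 10010100 10101011 10010101 10100011 is split into four 8-bit words; h_1(), h_2(), h_3(), h_4() hash one word each to 3 bits (010, 110, 001, 101), and the results are concatenated into the 12-bit value 010110001101.]
Greatly reduces the size of the reverse-mapped sets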

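14 Modular hashing reduces the set size
[Figure: the 32-bit key 10010100 10101011 10010101 10100011 is split into four 8-bit words; h_1(), h_2(), h_3(), h_4() hash one word each to 3 bits (010, 110, 001, 101), concatenated into 010110001101.]
Greatly reduces the size of the reverse-mapped sets: 2^8 / 2^3 = 2^5 per word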
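
A minimal Python sketch of modular hashing and its per-word reverse mapping follows. The word hashes are hypothetical balanced lookup tables, chosen so that every 3-bit output has exactly 2^8 / 2^3 = 32 preimages; the slides do not say how h_1 to h_4 are constructed.

```python
import random

WORD_BITS = 8   # each 32-bit key is split into four 8-bit words
OUT_BITS = 3    # each word is hashed to 3 bits
NUM_WORDS = 4


def make_word_hash(rng):
    # Hypothetical balanced table: each 3-bit output appears exactly 32 times.
    table = [v % (1 << OUT_BITS) for v in range(1 << WORD_BITS)]
    rng.shuffle(table)
    return table


def split_words(key):
    # Most significant word first.
    return [(key >> (WORD_BITS * (NUM_WORDS - 1 - i))) & 0xFF
            for i in range(NUM_WORDS)]


def modular_hash(key, word_hashes):
    # Hash each word separately and concatenate the 3-bit results into 12 bits.
    out = 0
    for word, table in zip(split_words(key), word_hashes):
        out = (out << OUT_BITS) | table[word]
    return out


def reverse_words(bucket, word_hashes):
    # For each word position, the 8-bit words that hash to that position's
    # 3-bit chunk of the bucket index.
    sets = []
    for i, table in enumerate(word_hashes):
        chunk = (bucket >> (OUT_BITS * (NUM_WORDS - 1 - i))) & ((1 << OUT_BITS) - 1)
        sets.append({w for w in range(1 << WORD_BITS) if table[w] == chunk})
    return sets


rng = random.Random(0)
word_hashes = [make_word_hash(rng) for _ in range(NUM_WORDS)]
key = 0b10010100101010111001010110100011
bucket = modular_hash(key, word_hashes)
candidates = reverse_words(bucket, word_hashes)
print([len(s) for s in candidates])
```

The reverse-mapped set for a bucket is then described by four small sets of 32 words each, rather than by an explicit list of 2^20 keys.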

15 Modular hashing reduces the set size
[Figure: hash tables 1 to 5 with heavy buckets b_1, b_2, b_3, b_4, b_5.]
A_1: 2^5 * 2^5 * 2^5 * 2^5
Intersection: only 32 elements per partition

16 Modular hashing reduces the set size
[Figure: hash tables 1 to 5 with heavy buckets b_1, b_2, b_3, b_4, b_5.]
A_1: 2^5 * 2^5 * 2^5 * 2^5
A_2: 2^5 * 2^5 * 2^5 * 2^5
Intersection: only 32 elements per partition
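
Because each A_j is a cross product of four 32-element word sets, the sets can be intersected one word position at a time. The Python sketch below illustrates this for the simple case of one heavy bucket per table. The balanced lookup tables are hypothetical, and each heavy bucket is represented directly by its four 3-bit chunks to keep the example short.

```python
import random
from itertools import product

NUM_TABLES, NUM_WORDS, WORD_BITS, OUT_BITS = 5, 4, 8, 3


def make_word_hash(rng):
    # Hypothetical balanced table: each 3-bit output has 32 preimages.
    table = [v % (1 << OUT_BITS) for v in range(1 << WORD_BITS)]
    rng.shuffle(table)
    return table


def split_words(key):
    return [(key >> (WORD_BITS * (NUM_WORDS - 1 - i))) & 0xFF
            for i in range(NUM_WORDS)]


def candidate_keys(heavy_chunks, hashes):
    """heavy_chunks[j][i] is the 3-bit chunk for word i of table j's heavy bucket."""
    per_word = []
    for i in range(NUM_WORDS):
        # Intersect, across all tables, the words that map to the heavy chunk.
        common = set(range(1 << WORD_BITS))
        for j in range(NUM_TABLES):
            common = {w for w in common if hashes[j][i][w] == heavy_chunks[j][i]}
        per_word.append(sorted(common))
    # The intersection of the A_j equals the cross product of the per-word sets.
    keys = []
    for words in product(*per_word):
        key = 0
        for w in words:
            key = (key << WORD_BITS) | w
        keys.append(key)
    return keys


rng = random.Random(1)
hashes = [[make_word_hash(rng) for _ in range(NUM_WORDS)] for _ in range(NUM_TABLES)]
heavy_key = 0x81693817  # 129.105.56.23
chunks = [[hashes[j][i][w] for i, w in enumerate(split_words(heavy_key))]
          for j in range(NUM_TABLES)]
found = candidate_keys(chunks, hashes)
print(len(found), heavy_key in found)
```

Each intersection step touches at most 32 words per position, so the work no longer depends on the 2^20 keys that a bucket represents.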

17 Handling Multiple Intersections…
[Figure: hash tables 1 to 5, each with more than one heavy bucket.]
2^H different intersections
Much more difficult: need sophisticated reverse hashing algorithms (see tech report)

18 Problem: too many collisions
[Figure: the 32-bit addresses 129.105.56.23, 129.105.56.28, 129.105.56.109, 129.105.56.35, 129.105.56.98, ... all map to 12-bit hash values of the form 7.4.0.*]

19 Problem: too many collisions
[Figure: the 32-bit addresses 129.105.56.23, 129.105.56.28, 129.105.56.109, 129.105.56.35, 129.105.56.98, ... all map to 12-bit hash values of the form 7.4.0.*]
Solution: IP mangling

20 IP-mangling

21 Invertible Modular Linear Equation
f(x) ≡ a·x mod n
To be invertible: a and n must be relatively prime
a is odd, chosen randomly
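
The mangling function and its inverse can be sketched in Python as follows. The choice n = 2^32 is an assumption made here because the keys are 32-bit IP addresses (the slide only says n); with n a power of two, any odd a is relatively prime to n. The helper names are hypothetical.

```python
import ipaddress
import random

N = 2**32  # assumed modulus: size of the IPv4 key space


def make_mangler(rng):
    # a must be relatively prime to n; with n a power of two, any odd a works.
    a = rng.randrange(1, N, 2)
    a_inv = pow(a, -1, N)  # modular inverse (Python 3.8+)
    return a, a_inv


def mangle(x, a):
    # f(x) = a * x mod n
    return (a * x) % N


def unmangle(y, a_inv):
    # Inverse mapping: x = a^{-1} * y mod n
    return (a_inv * y) % N


rng = random.Random(0)
a, a_inv = make_mangler(rng)
for text in ["129.105.56.23", "129.105.56.28", "129.105.56.109"]:
    x = int(ipaddress.IPv4Address(text))
    y = mangle(x, a)
    assert unmangle(y, a_inv) == x
    print(text, "->", ipaddress.IPv4Address(y))
```

Because the mapping is a bijection on the key space, mangled keys can be recorded in the sketch and the detected heavy keys mapped back afterwards.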

22 [Figure: comparison of modular hashing with optimal hashing.]

23 [Figure: comparison of modular hashing, modular hashing with IP mangling, and optimal hashing.]

24 Recap
Streaming data recording: key → IP mangling → modular hashing → reversible k-ary sketch (value stored)
Heavy change detection: reversible k-ary sketch, change threshold → reverse hashing → reverse IP mangling → heavy change keys

25 Evaluation
Traffic traces from Northwestern University edge router
– 5-minute intervals, average traffic 7.5 GB per interval
Compared with ground truth
6 hash tables, 4K buckets each, 192 KB memory in total
Up to 140 true heavy change keys in 1.5 seconds
– Over 95% TPP
– Less than 2% FPP
All missing changes are due to boundary effects

26 Conclusions / Future Work
Sketches: efficient summary structures
Our contribution: reversible sketches
– Efficient online detection of keys with heavy changes
Work in progress (see tech report)
– Improved reverse hashing
– Statistical guarantee on detection accuracy
More advanced applications
– Hierarchical change detection, e.g. 129.105.100.* shows a big change!

27 See tech report for more! http://list.cs.northwestern.edu
Thank you!

