1 Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams Robert Schweller Ashish Gupta Elliot Parsons Yan Chen Computer.

Presentation transcript:

1 Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams
Robert Schweller, Ashish Gupta, Elliot Parsons, Yan Chen
Computer Science Department, Northwestern University

2 Online Change Detection
Network anomalies are common
– Flash crowds, failures, DoS, worms, …
Online detection over data streams
Data stream: key/update pairs (k, u)
– Heavy hitters (lots of prior work)
– Heavy changes

3 k-ary sketch [Krishnamurthy, Sen, Zhang, Chen, 2003]
First to detect flow-level heavy changes in massive data streams at network traffic speeds.
[Figure: H hash tables (rows 1, …, j, …, H), each with K buckets (columns 0, 1, …, K-1)]

4 k-ary sketch [Krishnamurthy, Sen, Zhang, Chen, 2003]
[Figure: H hash tables (rows 1, …, j, …, H), each with K buckets (columns 0, 1, …, K-1); key k is hashed by h_1(k), …, h_j(k), …, h_H(k)]
Update (k, u): T_j[h_j(k)] += u (for all j)
Estimate v(S, k): sum of updates for key k
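
The update and estimate rules above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the class name `KarySketch`, the hash family ((a·k + b) mod P) mod K, and the seed handling are assumptions made for the example. The collision-corrected median estimator follows the usual description of the k-ary sketch and should be checked against the cited paper.

```python
import random
from statistics import median


class KarySketch:
    """H hash tables with K counters each (illustrative sketch only)."""

    _P = (1 << 61) - 1  # Mersenne prime used by the assumed hash family

    def __init__(self, H=5, K=4096, seed=0):
        rng = random.Random(seed)
        self.H, self.K = H, K
        # Assumed hash family: h_j(k) = ((a_j * k + b_j) mod P) mod K.
        self.params = [(rng.randrange(1, self._P), rng.randrange(self._P))
                       for _ in range(H)]
        self.tables = [[0] * K for _ in range(H)]
        self.total = 0  # sum of all updates seen so far

    def _h(self, j, key):
        a, b = self.params[j]
        return ((a * key + b) % self._P) % self.K

    def update(self, key, u):
        """Apply T_j[h_j(k)] += u for every table j."""
        for j in range(self.H):
            self.tables[j][self._h(j, key)] += u
        self.total += u

    def estimate(self, key):
        """Median over tables of the collision-corrected bucket value."""
        K = self.K
        per_table = [
            (self.tables[j][self._h(j, key)] - self.total / K) / (1 - 1 / K)
            for j in range(self.H)
        ]
        return median(per_table)
```

Note that `estimate` needs the key as input: the sketch can answer "how much traffic did key k see?" but cannot by itself list the keys with large values, which is the gap the reversible sketch addresses.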


6 Main problem
– Cannot efficiently report keys with heavy change
Our contribution
– Determine the set of keys that have “large” estimates in the sketch
Requires very little space:
– E.g. 5 hash tables with 16 K buckets = 80 KB
– Fits in high-speed memory

7 Reverse Sketch Problem
Input:
– Sketch
– Threshold
Output: set of keys that hash to “heavy” buckets in a majority (or all) of the hash tables

8 Outline
[Figure: pipeline diagram. Streaming data recording: key and value feed a k-ary sketch. Heavy change detection: the k-ary sketch and a change threshold yield heavy change keys. Other labels: fast, slow.]
Modular hashing, IP mangling, reverse hashing algorithms: improve heavy change detection

9 Taking Intersections
Intersect A_1, A_2, A_3, A_4, A_5
H = 5, K = 2^12, #keys = 2^32 (IP addresses)
E[false positives] << 1

10 The problem with simple intersection
Why is this difficult? Each set A_i can be very large!
H = 5, K = 2^12, #keys = 2^32 (IP addresses)
|A_1| = 2^32 / 2^12 = 2^20
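
The two figures on the intersection slides can be checked with exact arithmetic. The sketch below assumes one heavy bucket per table and independent, uniform hash functions (neither assumption is stated on the slides). The variable names are illustrative.

```python
from fractions import Fraction

H = 5              # number of hash tables (from the slide)
K = 2 ** 12        # buckets per table
N_KEYS = 2 ** 32   # key space: IPv4 addresses

# Expected size of the reverse-mapped set A_i of a single bucket.
set_size = N_KEYS // K

# Probability that an unrelated key falls into one chosen bucket in all H
# tables, under the independence and uniformity assumptions above.
p_all_tables = Fraction(1, K) ** H

# Expected number of unrelated keys that survive the H-way intersection.
expected_false_positives = N_KEYS * p_all_tables
```

Under these assumptions `set_size` is 2^20 and `expected_false_positives` is 2^32 / 2^60 = 2^-28, which matches both the E[false positives] << 1 claim and the observation that each individual set is too large to enumerate cheaply.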

11 The problem with simple intersection
Why is this difficult? Each set A_i can be very large!
Solution: modular hashing

12 Modular hashing reduces the set size
[Figure labels: 32 bits, 8 bits, h(), 12 bits]

13 Modular hashing reduces the set size
[Figure labels: 32 bits, 8 bits, h_1(), h_2(), h_3(), h_4()]
Greatly reduces size of reverse-mapped sets

14 Modular hashing reduces the set size
[Figure labels: 32 bits, 8 bits, h_1(), h_2(), h_3(), h_4()]
Greatly reduces size of reverse-mapped sets
2^8 / 2^3 = 2^5
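
A minimal Python sketch of modular hashing as the slides describe it: the 32-bit key is split into four 8-bit words, each word is hashed to 3 bits, and the four results are concatenated into a 12-bit bucket index. The function names and the use of balanced random lookup tables (each 3-bit value assigned to exactly 32 words) are assumptions made for the example. The slides do not say how h_1() to h_4() are built.

```python
import random

WORDS = 4        # a 32-bit key is split into four words
WORD_BITS = 8    # each word is 8 bits
OUT_BITS = 3     # each word hashes to 3 bits, giving a 12-bit bucket index


def make_modular_hash(seed):
    """Build one lookup table per word position (assumed construction)."""
    rng = random.Random(seed)
    tables = []
    for _ in range(WORDS):
        # Balanced table: every 3-bit value is used by exactly 2^5 words.
        values = [v for v in range(1 << OUT_BITS)
                  for _ in range(1 << (WORD_BITS - OUT_BITS))]
        rng.shuffle(values)
        tables.append(values)
    return tables


def split_words(key):
    """Split a 32-bit key into 8-bit words, most significant first."""
    return [(key >> (WORD_BITS * (WORDS - 1 - i))) & 0xFF for i in range(WORDS)]


def modular_hash(tables, key):
    """Hash each word separately and concatenate the 3-bit results."""
    bucket = 0
    for table, word in zip(tables, split_words(key)):
        bucket = (bucket << OUT_BITS) | table[word]
    return bucket


def reverse_word_sets(tables, bucket):
    """For each word position, the words that map to that part of the bucket."""
    sets = []
    for i, table in enumerate(tables):
        chunk = (bucket >> (OUT_BITS * (WORDS - 1 - i))) & ((1 << OUT_BITS) - 1)
        sets.append({w for w in range(1 << WORD_BITS) if table[w] == chunk})
    return sets
```

With balanced tables, each set returned by `reverse_word_sets` has exactly 2^8 / 2^3 = 2^5 = 32 elements, so a bucket's preimage is described by four small sets instead of one list of 2^20 keys.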

15 Modular hashing reduces the set size
[Figure: heavy buckets b_1, b_2, b_3, b_4, b_5]
A_1: 2^5 * 2^5 * 2^5 * 2^5
Intersection: only 32 elements per partition

16 Modular hashing reduces the set size
[Figure: heavy buckets b_1, b_2, b_3, b_4, b_5]
A_1: 2^5 * 2^5 * 2^5 * 2^5
A_2: 2^5 * 2^5 * 2^5 * 2^5
Intersection: only 32 elements per partition
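
The per-partition intersection can be sketched as follows for the simplest case of one heavy bucket per table, with the key required to match in all H tables. This is an illustration under assumed hash tables (balanced random lookup tables, hypothetical function names), not the reverse hashing algorithm from the tech report.

```python
import random
from itertools import product

WORDS, WORD_BITS, OUT_BITS = 4, 8, 3
H = 5  # number of hash tables


def make_tables(seed):
    """One balanced 8-bit to 3-bit lookup table per word (assumed construction)."""
    rng = random.Random(seed)
    tables = []
    for _ in range(WORDS):
        values = [v for v in range(1 << OUT_BITS)
                  for _ in range(1 << (WORD_BITS - OUT_BITS))]
        rng.shuffle(values)
        tables.append(values)
    return tables


def split_words(key):
    return [(key >> (WORD_BITS * (WORDS - 1 - i))) & 0xFF for i in range(WORDS)]


def modular_hash(tables, key):
    bucket = 0
    for table, word in zip(tables, split_words(key)):
        bucket = (bucket << OUT_BITS) | table[word]
    return bucket


def reverse_hash(hashes, heavy_buckets):
    """Keys that hash to heavy_buckets[j] under hashes[j] for every table j."""
    per_word = []
    for i in range(WORDS):
        shift = OUT_BITS * (WORDS - 1 - i)
        common = set(range(1 << WORD_BITS))
        # Intersect, word by word, sets of at most 32 candidates per table.
        for tables, bucket in zip(hashes, heavy_buckets):
            chunk = (bucket >> shift) & ((1 << OUT_BITS) - 1)
            common &= {w for w in common if tables[i][w] == chunk}
        per_word.append(sorted(common))
    # Every combination of surviving words is a key consistent with all tables.
    keys = []
    for words in product(*per_word):
        key = 0
        for w in words:
            key = (key << WORD_BITS) | w
        keys.append(key)
    return keys
```

A usage example: build `hashes = [make_tables(s) for s in range(H)]`, hash a key with each table to get its five buckets, and pass those buckets to `reverse_hash`. The original key is always among the results, and each intersection step touches only small per-word sets rather than 2^20 keys.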

17 Handling Multiple Intersections…
[Figure: two sets of heavy buckets, each labeled b_1, b_2, b_3, b_4, b_5]
2^H different intersections
Much more difficult: need sophisticated reverse hashing algorithms (see tech report)

18 Problem: Too many collisions
[Figure labels: 32 bits, 12 bits]

19 Problem: Too many collisions
[Figure labels: 32 bits, 12 bits]
Solution: IP mangling

20 IP-mangling

21 Invertible Modular Linear Equation
f(x) ≡ a·x mod n
To be invertible: a and n must be relatively prime
a is odd, chosen randomly
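
The invertibility condition can be illustrated in Python. The modulus n = 2^32 is an assumption made here because keys are 32-bit IP addresses and the slide requires a to be odd. An odd a is always relatively prime to a power of two. The function names are hypothetical, and `pow(a, -1, n)` requires Python 3.8 or later.

```python
import random

N = 1 << 32  # assumed modulus: the 32-bit IPv4 key space


def random_odd_multiplier(seed):
    """Pick a random odd multiplier a in [1, N)."""
    return random.Random(seed).randrange(1, N, 2)


def mangle(x, a):
    """f(x) = a * x mod N."""
    return (a * x) % N


def unmangle(y, a):
    """Invert f using the modular inverse of a.

    Raises ValueError when a and N are not relatively prime.
    """
    return (pow(a, -1, N) * y) % N
```

Because f is a bijection on 32-bit values when a is odd, mangling before hashing loses no information: a key recovered by reverse hashing can be mapped back to the original IP address with `unmangle`.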

22 [Figure panels: Modular Hashing; Optimal Hashing]

23 [Figure panels: Modular Hashing; Modular Hashing with IP Mangling; Optimal Hashing]

24 Recap
[Figure: two-stage pipeline. Streaming data recording: key, value, IP mangling, modular hashing, value stored, reversible k-ary sketch. Heavy change detection: reversible k-ary sketch, change threshold, reverse hashing, reverse IP mangling, heavy change keys.]

25 Evaluation
Traffic traces from Northwestern University edge router
– 5 min intervals, average traffic 7.5 GB per interval
Compared with ground truth
6 hash tables, 4K buckets each, 192 KB of memory in total
Up to 140 true heavy change keys in 1.5 seconds
– Over 95% TPP
– Less than 2% FPP
All missing changes are due to boundary effects

26 Conclusions / Future Work
Sketches: efficient summary structures
Our contribution: reversible sketches
– Efficient online detection of keys with heavy changes
Work in progress (see tech report)
Improved reverse hashing
Statistical guarantee on detection accuracy
More advanced applications:
– Hierarchical change detection, e.g. * shows a big change!

27 See tech report for more! Thank you!