Lu Tang , Qun Huang, Patrick P. C. Lee

Slides:



Advertisements
Similar presentations
New Directions in Traffic Measurement and Accounting Cristian Estan – UCSD George Varghese - UCSD Reviewed by Michela Becchi Discussion Leaders Andrew.
Advertisements

Data Streaming Algorithms for Accurate and Efficient Measurement of Traffic and Flow Matrices Qi Zhao*, Abhishek Kumar*, Jia Wang + and Jun (Jim) Xu* *College.
Fast Algorithms For Hierarchical Range Histogram Constructions
3/13/2012Data Streams: Lecture 161 CS 410/510 Data Streams Lecture 16: Data-Stream Sampling: Basic Techniques and Results Kristin Tufte, David Maier.
OpenSketch Slides courtesy of Minlan Yu 1. Management = Measurement + Control Traffic engineering – Identify large traffic aggregates, traffic changes.
A Fast and Compact Method for Unveiling Significant Patterns in High-Speed Networks Tian Bu 1, Jin Cao 1, Aiyou Chen 1, Patrick P. C. Lee 2 Bell Labs,
Fine-Grained Latency and Loss Measurements in the Presence of Reordering Myungjin Lee, Sharon Goldberg, Ramana Rao Kompella, George Varghese.
Estimating TCP Latency Approximately with Passive Measurements Sriharsha Gangam, Jaideep Chandrashekar, Ítalo Cunha, Jim Kurose.
Streaming Algorithms for Robust, Real- Time Detection of DDoS Attacks S. Ganguly, M. Garofalakis, R. Rastogi, K. Sabnani Krishan Sabnani Bell Labs Research.
1 Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams Robert Schweller Ashish Gupta Elliot Parsons Yan Chen Computer.
Beneficial Caching in Mobile Ad Hoc Networks Bin Tang, Samir Das, Himanshu Gupta Computer Science Department Stony Brook University.
Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications Robert Schweller 1, Zhichun Li 1, Yan Chen 1, Yan Gao 1, Ashish.
Computer Science Spatio-Temporal Aggregation Using Sketches Yufei Tao, George Kollios, Jeffrey Considine, Feifei Li, Dimitris Papadias Department of Computer.
What ’ s Hot and What ’ s Not: Tracking Most Frequent Items Dynamically G. Cormode and S. Muthukrishman Rutgers University ACM Principles of Database Systems.
Reverse Hashing for Sketch Based Change Detection in High Speed Networks Ashish Gupta Elliot Parsons with Robert Schweller, Theory Group Advisor: Yan Chen.
Towards a High-speed Router-based Anomaly/Intrusion Detection System (HRAID) Zhichun Li, Yan Gao, Yan Chen Northwestern.
Dream Slides Courtesy of Minlan Yu (USC) 1. Challenges in Flow-based Measurement 2 Controller Configure resources1Fetch statistics2(Re)Configure resources1.
Crossroads: A Practical Data Sketching Solution for Mining Intersection of Streams Jun Xu, Zhenglin Yu (Georgia Tech) Jia Wang, Zihui Ge, He Yan (AT&T.
Fast and Robust Worm Detection Algorithm Tian Bu Aiyou Chen Scott Vander Wiel Thomas Woo bearhsu.
1 BRICK: A Novel Exact Active Statistics Counter Architecture Nan Hua 1, Bill Lin 2, Jun (Jim) Xu 1, Haiquan (Chuck) Zhao 1 1 Georgia Institute of Technology.
Hash, Don’t Cache: Fast Packet Forwarding for Enterprise Edge Routers Minlan Yu Princeton University Joint work with Jennifer.
1 Route Table Partitioning and Load Balancing for Parallel Searching with TCAMs Department of Computer Science and Information Engineering National Cheng.
SIGCOMM 2002 New Directions in Traffic Measurement and Accounting Focusing on the Elephants, Ignoring the Mice Cristian Estan and George Varghese University.
Approximate Frequency Counts over Data Streams Loo Kin Kong 4 th Oct., 2002.
By Graham Cormode and Marios Hadjieleftheriou Presented by Ankur Agrawal ( )
Scalable and Efficient Data Streaming Algorithms for Detecting Common Content in Internet Traffic Minho Sung Networking & Telecommunications Group College.
CEDAR Counter-Estimation Decoupling for Approximate Rates Erez Tsidon (Technion, Israel) Joint work with Iddo Hanniel and Isaac Keslassy ( Technion ) 1.
CEDAR Counter-Estimation Decoupling for Approximate Rates Erez Tsidon Joint work with Iddo Hanniel and Isaac Keslassy Technion, Israel 1.
Budget-based Control for Interactive Services with Partial Execution 1 Yuxiong He, Zihao Ye, Qiang Fu, Sameh Elnikety Microsoft Research.
Authors: Haiquan (Chuck) Zhao, Hao Wang, Bill Lin, Jun (Jim) Xu Conf. : The 5th ACM/IEEE Symposium on Architectures for Networking and Communications Systems.
A Formal Analysis of Conservative Update Based Approximate Counting Gil Einziger and Roy Freidman Technion, Haifa.
1 LD-Sketch: A Distributed Sketching Design for Accurate and Scalable Anomaly Detection in Network Data Streams Qun Huang and Patrick P. C. Lee The Chinese.
Data Stream Algorithms Ke Yi Hong Kong University of Science and Technology.
The Bloom Paradox Ori Rottenstreich Joint work with Yossi Kanizo and Isaac Keslassy Technion, Israel.
Algorithms to Accelerate Multiple Regular Expressions Matching for Deep Packet Inspection Sailesh Kumar Sarang Dharmapurikar Fang Yu Patrick Crowley Jonathan.
The Bloom Paradox Ori Rottenstreich Joint work with Isaac Keslassy Technion, Israel.
We used ns-2 network simulator [5] to evaluate RED-DT and compare its performance to RED [1], FRED [2], LQD [3], and CHOKe [4]. All simulation scenarios.
1 IP Routing table compaction and sampling schemes to enhance TCAM cache performance Author: Ruirui Guo, Jose G. Delgado-Frias Publisher: Journal of Systems.
SCREAM: Sketch Resource Allocation for Software-defined Measurement Masoud Moshref, Minlan Yu, Ramesh Govindan, Amin Vahdat (CoNEXT’15)
Continuous Monitoring of Distributed Data Streams over a Time-based Sliding Window MADALGO – Center for Massive Data Algorithmics, a Center of the Danish.
Re-evaluating Measurement Algorithms in Software Omid Alipourfard, Masoud Moshref, Minlan Yu {alipourf, moshrefj,
SketchVisor: Robust Network Measurement for Software Packet Processing
Problem: Internet diagnostics and forensics
Constant Time Updates in Hierarchical Heavy Hitters
Jennifer Rexford Princeton University
A Resource-minimalist Flow Size Histogram Estimator
The Variable-Increment Counting Bloom Filter
Augmented Sketch: Faster and More Accurate Stream Processing
Query-Friendly Compression of Graph Streams
Optimal Elephant Flow Detection Presented by: Gil Einziger,
Edge computing (1) Content Distribution Networks
Qun Huang, Patrick P. C. Lee, Yungang Bao
Xiaoyang Zhang1, Yuchong Hu1, Patrick P. C. Lee2, Pan Zhou1
SCREAM: Sketch Resource Allocation for Software-defined Measurement
Elastic Sketch: Adaptive and Fast Network-wide Measurements
Approximate Frequency Counts over Data Streams
Elastic Sketch: Adaptive and Fast Network-wide Measurements
Memento: Making Sliding Windows Efficient for Heavy Hitters
Constant Time Updates in Hierarchical Heavy Hitters
By: Ran Ben Basat, Technion, Israel
Network-Wide Routing Oblivious Heavy Hitters
Heavy Hitters in Streams and Sliding Windows
By: Ran Ben Basat, Technion, Israel
EMOMA- Exact Match in One Memory Access
Author: Yi Lu, Balaji Prabhakar Publisher: INFOCOM’09
A flow aware packet sampling mechanism for high speed links
Toward Self-Driving Networks
Toward Self-Driving Networks
(Learned) Frequency Estimation Algorithms
Presentation transcript:

Lu Tang , Qun Huang, Patrick P. C. Lee MV-Sketch: A Fast and Compact Invertible Sketch for Heavy Flow Detection in Network Data Streams † ‡ † Lu Tang , Qun Huang, Patrick P. C. Lee † The Chinese University of Hong Kong ‡ Institute of Computing Technology, CAS IEEE INFOCOM 2019

Heavy Flow Detection Network traffic: a stream of packets denoted by (𝒙, 𝒗 𝒙 ) pairs 𝑥: flow key, e.g., 5-tuple, source IP address 𝑣 𝑥 : value, e.g., 1 for packet counting, payload bytes for size counting Heavy flows – abnormal patterns in network traffic Heavy hitters: flows with high traffic volume Heavy changers: flows with high change of traffic volume Detecting heavy flows in real time is critical for: anomaly detection, load balancing, traffic engineering

Challenges Fast packet processing Limited memory e.g. 10 Gb/s link: one packet every 67 ns Limited memory Programmable switches: 1-2 MB per stage [Bosshart, SIGCOMM’13] Servers: tens of MB of SRAM Per-flow tracking is expensive e.g., a maximum of 2 104 entries for 5-tuple flows

Sketches Sketches: summary data structures Count-Min [Cormode, 2005]: e.g., Count-Min [Cormode, 2005], Count Sketch [Charikar, 2002] Idea: project high-dimensional data into small subspace Count-Min [Cormode, 2005]: Update with a packet Hash flow key to one counter per row Increment each hashed counter Query a flow Return the minimum among all hashed counters + 𝑣 𝑥 + 𝑣 𝑥 + 𝑣 𝑥 Packet (𝑥, 𝑣 𝑥 ) + 𝑣 𝑥 Each element is a counter

Sketches Good: Bad: Non-invertible High accuracy with small memory Fast processing speed Bad: Non-invertible Cannot readily return all heavy flows e.g., Count-Min needs to enumerate all possible flows in entire flow key space to recover all heavy flows

Existing Invertible Sketches Use extra data structures to store candidate heavy flows e.g., Count-Min-Heap [Cormode, PODS’05], LD-Sketch [Huang, INFOCOM’14] High memory access overhead, increasing with number of heavy flows Dynamic memory allocation Apply group testing by keeping one counter per bit e.g., Deltoid [Cormode, ToN’05], Fast Sketch [Liu, INFOCOM’12] High update overhead, increasing with key length Prune flow key space via new hash schemes e.g., Reversible Sketch [Schweller, INFOCOM’06], SeqHash [Bu, INFOCOM’07]

Our Contributions MV-Sketch, a fast and compact invertible sketch for heavy flow detection in network data streams Built on Majority Voting [Boyer and Moore, 1991] Small and static memory usage High processing speed High accuracy Theoretical analysis on accuracy, space, and time complexity Experiments on real-world network traces Higher accuracy; up to 3.38× throughput gain over state-of-the-arts Match 10Gb/s line rate with SIMD instructions

Problem Formulation Perform detection at regular time intervals called epochs Input: packet stream (𝑥, 𝑣 𝑥 ) Heavy hitter: all 𝑥 with 𝑆 𝑥 >𝜑 𝑆 𝑥 : total traffic volume of flow 𝑥 in one epoch 𝜑: user-specified threshold Heavy changer: all 𝑥 with 𝐷(𝑥) >𝜑 𝐷(𝑥): total traffic change of flow 𝑥 across two epochs Problem: infer 𝑆 𝑥 and 𝐷 𝑥 in real-time with limited memory

Design Key observation: small number of large flows dominate Idea: e.g., 9% of flows account for 90% of traffic [Fang, 1999] Idea: A heavy flow has more traffic than all other flows in the same bucket with high probability Track a candidate heavy flow in each bucket via majority voting (MJRTY) [Boyer and Moore, 1991] Theorem: MJRTY ensures that the true majority vote (with over half of total vote counts) must be tracked

Design Data structure: 𝑟×𝑤 table of buckets Bucket 𝐵(𝑖,𝑗) Total sum Flow key Indicator 𝑟 rows 𝑉 𝑖,𝑗 𝐾 𝑖,𝑗 𝐶 𝑖,𝑗 𝑤 buckets

Update Insert a packet (𝑥, 𝑣 𝑥 ) into the sketch Map 𝑥 to one bucket per row Increment 𝑉 with 𝑣 𝑥 Compare 𝑥 with 𝐾 Case1: 𝐾=𝑥, increment 𝐶 with 𝑣 𝑥 Packet (1101,19) Total sum Flow key Indicator Total sum Flow key Indicator 80 1101 20 99 1101 39 Before After

Update Insert a packet (𝑥, 𝑣 𝑥 ) into the sketch Map 𝑥 to one bucket per row Increment 𝑉 with 𝑣 𝑥 Compare 𝑥 with 𝐾 Case1: 𝐾=𝑥, increment 𝐶 with 𝑣 𝑥 Case2: 𝐾≠𝑥, decrement 𝐶 with 𝑣 𝑥 Packet (1101,19) Total sum Total sum Flow key Flow key Indicator Indicator Total sum Total sum Flow key Flow key Indicator Indicator 80 75 1101 0110 20 25 99 94 1101 0110 39 6 Before Before After After

Update Insert a packet (𝑥, 𝑣 𝑥 ) into the sketch Map 𝑥 to one bucket per row Increment 𝑉 with 𝑣 𝑥 Compare 𝑥 with 𝐾 Case1: 𝐾=𝑥, increment 𝐶 with 𝑣 𝑥 Case2: 𝐾≠𝑥, decrement 𝐶 with 𝑣 𝑥 if 𝑪<𝟎, copy 𝒙 to 𝑲, 𝑪=𝒂𝒃𝒔(𝑪) Packet (1101,19) Total sum Flow key Indicator Total sum Flow key Indicator Total sum Total sum Flow key Flow key Indicator Indicator Total sum Total sum Flow key Flow key Indicator Indicator 75 0110 5 94 1101 14 80 75 1101 0110 20 25 99 94 1101 0110 39 6 Before After Before Before After After

Query Returns the estimated sum of a given flow Total sum Flow key Indicator 𝑒𝑠𝑡 1 = 90+10 2 =50 90 1101 10 𝑒𝑠𝑡 2 = 120−6 2 =57 120 0011 6 𝑒𝑠𝑡 3 = 100+2 2 =51 100 1101 2 Query flow: 1101 210 0101 90 𝑒𝑠𝑡 4 = 210−90 2 =60 𝑒𝑠𝑡(1101) = 𝐌𝐢𝐧 (50, 57, 51, 60) = 50

Identify Heavy Hitters Idea: consider keys tracked by buckets Enumerate all buckets Report 𝐾 𝑖,𝑗 as a heavy hitter if its estimation exceeds threshold 𝑉 𝑖,𝑗 >𝑡ℎ𝑟𝑒𝑠ℎ𝑜𝑙𝑑 ? 𝐵𝑢𝑐𝑘𝑒𝑡 𝐵(𝑖,𝑗) Get the sum estimation of 𝐾 𝑖,𝑗

Identify Heavy Changers Idea: use estimated maximum change Get upper and lower bounds of 𝑥 in one sketch: Upper bound U(𝑥) : estimated sum of 𝑥 Lower bound 𝐿(𝑥): check each hashed bucket 𝐵(𝑖, 𝑗) 𝐿 𝑖,𝑗 (𝑥) = 𝐶 𝑖,𝑗 , 𝑖𝑓 𝐾 𝑖,𝑗 =𝑥 0 , 𝑖𝑓 𝐾 𝑖,𝑗 ≠𝑥 , 𝐿(𝑥) = Max{ 𝐿 𝑖,𝑗 (𝑥)} Estimate maximum change: 𝐷 𝑥 =max⁡{| 𝑈 1 𝑥 − 𝐿 2 (𝑥)|,| 𝐿 1 𝑥 − 𝑈 2 (𝑥)|} MV-Sketch at 1st epoch MV-Sketch at 2nd epoch 𝑈 1 𝑥 and 𝐿 1 𝑥 𝑈 2 𝑥 and 𝐿 2 𝑥

Distributed Detection Architecture: 𝒒 > 1 monitoring nodes and a centralized controller Goal: detect heavy flows at 𝒅 < 𝒒 monitoring nodes with threshold 𝝋/𝒅 locally and report results to the controller at the end of an epoch Assumption: traffic of a flow is uniformly distributed to 𝑑 out of 𝑞 nodes Controller aggregates estimated value of each flow received from all nodes Reports heavy flows with aggregate estimates exceeding φ MV-Sketch MV-Sketch MV-Sketch Controller

Theoretical Analysis On accuracy On complexity Bounded errors Small false negative rate (almost zero false negatives in our evaluation) On complexity Space complexity: Ο (log 𝑛) , 𝑛 is flow key space Per-packet update time complexity: Ο 1

Evaluation Setup Traces: Approach: Metric: CAIDA16 1-hour trace, we focus on the first five minutes Epoch length: 1 minute Each epoch: ~29M packets, ~1M flows on average Approach: Compare with Count-Min-Heap, LD-Sketch, Deltoid, Fast Sketch Flow key: 64 bit (source/destination address pairs) Metric: Accuracy: precision, recall, relative error Speed: throughput (pkts/s)

Evaluation - Accuracy Heavy hitter detection MV-Sketch has highest accuracy Reduce relative error by 55% and 87% over LD-Sketch and Count-Min-Heap, resp.

Evaluation - Accuracy Heavy changer detection MV-Sketch has recall 1 in all cases except 64KB MV-Sketch has lower precision than CMH for small memory

Evaluation - Speed MV-Sketch achieves up to 3.38X throughput gain With SIMD, MV-Sketch’s throughput further improves by 75%

Conclusions MV-Sketch, an invertible sketch that enables fast and accurate heavy flow detection in network data streams Contributions: Propose a new sketch design for invertible sketches High accuracy with small and static memory Fast processing speed Extensions to distributed heavy flow detection Extensive experiments on real-world traces Source code: http://adslab.cse.cuhk.edu.hk/software/mvsketch

Thank you! & Questions?