SCREAM: Sketch Resource Allocation for Software-defined Measurement Masoud Moshref, Minlan Yu, Ramesh Govindan, Amin Vahdat (CoNEXT’15)

Measurement is Crucial for Network Management
Network management on multiple tenants: accounting, anomaly detection, traffic engineering, DDoS detection.
Measurement tasks: heavy hitter detection (HH), hierarchical heavy hitter detection (HHH), change detection, super source detection (SSD).
Need fine-grained visibility of network traffic.

Software Defined Measurement
[Diagram: a controller running DREAM [SIGCOMM’14] / SCREAM [CoNEXT’15] hosts Task 1 and Task 2. It configures, and collects from, the Task 1 and Task 2 counters on Switch A and Switch B.]

Our Focus: Sketch-based Measurement
Sketches: summaries of streaming data to approximately answer specific queries. E.g., a bitmap for counting unique items.
Comparison of OpenFlow counters (DREAM [SIGCOMM’14]) with sketches (SCREAM [CoNEXT’15]):
Memory: OpenFlow counters use expensive, power-hungry TCAM; sketches use cheaper SRAM.
Counters: OpenFlow counters are volume counters; sketches provide volume and connection counters.
Flows: OpenFlow counters cover selected prefixes; sketches cover all traffic, all the time.
Sketches use a cheaper memory and are more expressive.
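The bitmap mentioned on this slide can be illustrated with linear counting, a standard bitmap-based estimator of the number of distinct items. The sketch below is only an illustration of that general technique; the function name, the bitmap size, and the choice of hash are assumptions, and the slides do not say which bitmap variant SCREAM uses.

```python
import hashlib
import math


def bitmap_distinct_count(items, num_bits=1024):
    """Estimate the number of distinct items with a bitmap (linear counting).

    Hypothetical helper for illustration; num_bits is an assumed size.
    """
    bits = [False] * num_bits
    for item in items:
        digest = hashlib.blake2b(item.encode(), digest_size=8).digest()
        # Each item sets one bit; duplicates always hit the same bit.
        bits[int.from_bytes(digest, "little") % num_bits] = True
    zeros = bits.count(False)
    if zeros == 0:
        # Bitmap is saturated: the estimate is unbounded.
        return float("inf")
    # Linear counting estimate: n ~ -m * ln(fraction of bits still zero).
    return -num_bits * math.log(zeros / num_bits)
```

Because a repeated item sets the same bit again, the estimate depends only on the set of distinct items, not on how often each one appears.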

Sketch Example: Count-Min Sketch
At packet arrival: for a packet (IP, 1 Kbytes), hash the IP with h1(IP), h2(IP), h3(IP), one hash per row (d rows), and add the packet size to the indexed counter in each row: 2+1=3, 4+1=5, 1+1=2.
At query: What is the traffic size of IP? Take the row with min collision: Min(3,5,2) = 2.
Resource accuracy trade-off: provable error bound given traffic properties (e.g., skew).
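The update and query steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not SCREAM's implementation: the class name, the use of salted BLAKE2b as the per-row hash functions, and the parameter names `depth` and `width` are all assumptions made for the example.

```python
import hashlib


class CountMinSketch:
    """Minimal Count-Min sketch: `depth` rows of `width` counters each."""

    def __init__(self, depth, width):
        self.depth = depth
        self.width = width
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # One hash function per row, derived by salting with the row number.
        digest = hashlib.blake2b(
            key.encode(), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def update(self, key, amount=1):
        # At packet arrival: add the packet size to one counter in every row.
        for row in range(self.depth):
            self.rows[row][self._index(row, key)] += amount

    def query(self, key):
        # At query: the row with the fewest collisions gives the minimum,
        # which never under-estimates the true total for the key.
        return min(
            self.rows[row][self._index(row, key)] for row in range(self.depth)
        )
```

A wider sketch has fewer collisions per row and therefore a smaller over-estimate, which is the resource accuracy trade-off the slide refers to.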

Challenges: Limited Counters for Many Tasks
Many task instances: 3 types (heavy hitter, hierarchical heavy hitter, super source); different flow aggregates (Rack, App, Src/Dst/Port); 1000s of tenants.
Limited shared resources: SRAM capacity (e.g., 128 MB), shared with other functions (e.g., routing).
Too many resources to guarantee accuracy: 1 MB-32 MB per task. Less than tasks in SRAM

Goal: Many Accurate Sketch-based Measurements
Users dynamically instantiate a variety of measurement tasks.
SCREAM supports the largest number of measurement tasks while maintaining measurement accuracy.

Approach: Dynamic Resource Allocation
Resource accuracy trade-off depends on traffic.
Count-Min: provable error bound given traffic properties. Ex: skew of traffic from each IP.
The worst case uses more than 10x the counters of the average case.
Dynamic allocation for current traffic.
[Plot: required memory vs. skew.]

Opportunity: Temporal Multiplexing
Memory requirement varies over time.
Multiplex memory among tasks over time.
[Plot: required memory of Task 1 and Task 2 over time.]

Opportunity: Spatial Multiplexing
Memory requirement varies across switches.
Multiplex memory among tasks across switches.
[Plot: required memory of Task 1 and Task 2 on Switch A and Switch B.]

Key Insight
Leverage spatial and temporal multiplexing and dynamically allocate switch memory per task to achieve sufficient accuracy for many tasks.
DREAM has the same insight. SCREAM applies it to sketches.

SCREAM Contributions
1- Supports 3 sketch-based task types: heavy hitter (HH) tasks, hierarchical heavy hitter (HHH) tasks, super source (SSD) tasks.
2- Dynamic resource allocator: allocates memory among sketch-based task instances across switches while maintaining sufficient accuracy.
[Diagram: SCREAM sits beneath anomaly detection, traffic engineering and DDoS detection.]

SCREAM Iterative Workflow
A loop of three steps:
Collect & report: counters from many switches.
Estimate accuracy: produces the accuracy of each task.
Allocate resources: sets the memory size of each task.

SCREAM Iterative Workflow (example)
Estimate accuracy: Task1 accuracy < 80%.
Allocate resources: give more memory to Task1.

SCREAM Iterative Workflow (example, continued)
Collect & report: merge counters from switches.
Skew of traffic for Task2 changes.
Estimate accuracy: Task2 accuracy < 80%.
Allocate resources: give more memory to Task2.
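One round of the allocate step in this workflow might look like the following sketch. It is a deliberately simple stand-in, not the DREAM/SCREAM allocation algorithm, which these slides do not describe: the function name `reallocate`, the fixed `step` size, the `capacity` value, and the policy of taking memory from tasks above the bound are all assumptions.

```python
def reallocate(allocations, accuracies, bound=0.8, step=64, capacity=1024):
    """One allocation round over dicts keyed by task name.

    Hypothetical policy: tasks at or above the accuracy bound release one
    step of memory; tasks below the bound receive one step, with the least
    accurate task served first, as long as free memory remains.
    """
    new_allocations = dict(allocations)

    # Tasks with sufficient accuracy give back one step (but keep at least one).
    for task, accuracy in accuracies.items():
        if accuracy >= bound:
            new_allocations[task] = max(step, new_allocations[task] - step)

    free = capacity - sum(new_allocations.values())

    # Tasks with insufficient accuracy get more memory, least accurate first.
    for task in sorted(accuracies, key=accuracies.get):
        if accuracies[task] < bound and free >= step:
            new_allocations[task] += step
            free -= step

    return new_allocations


# Task1 is below the 80% bound, so it receives the memory Task2 releases.
round_result = reallocate(
    {"task1": 256, "task2": 256},
    {"task1": 0.70, "task2": 0.95},
    capacity=512,
)
```

Running this every epoch with freshly estimated accuracies gives the iterative behaviour shown on the slides: memory follows whichever task currently falls below the bound.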

SCREAM Challenges
Collect & report: network-wide task implementation using sketches.
Estimate accuracy: accuracy estimation without the ground truth.
Allocate resources: fast & stable allocation in DREAM [SIGCOMM’14].

Challenge: Merge Sketches of Different Sizes
Network-wide task: heavy hitter (HH) detection, e.g., source IPs sending > 10Mbps.
[Diagram: Switch A and Switch B each hold a sketch with d rows, but with different widths w1 and w2.]

SCREAM Solution to Merge Sketches for HH Detection
Previous work: min of sums. SCREAM: sum of mins.
Min of sums ≥ sum of mins.
Both over-approximate → smaller is more accurate.
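The difference between the two merge rules can be checked on a small example. In the sketch below, each inner list holds the d counters that one key maps to on one switch (one counter per row); the function names and the counter values are made up for illustration.

```python
def min_of_sums(per_switch_counters):
    """Add the counters row by row across switches, then take the minimum row."""
    return min(sum(row) for row in zip(*per_switch_counters))


def sum_of_mins(per_switch_counters):
    """Take the minimum counter on each switch, then add the per-switch minima."""
    return sum(min(counters) for counters in per_switch_counters)


# Counters for one source IP on Switch A and Switch B (illustrative values).
switch_a = [3, 5, 2]
switch_b = [4, 1, 6]

merged_min_of_sums = min_of_sums([switch_a, switch_b])  # min(7, 6, 8) = 6
merged_sum_of_mins = sum_of_mins([switch_a, switch_b])  # 2 + 1 = 3
```

Each per-switch minimum is already an over-estimate of the key's traffic on that switch, so the sum of the minima is still an over-estimate of the network-wide total, and it can never exceed the min of sums.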

SCREAM Solutions
Collect & report: network-wide task implementation using sketches. Merge sketches of different sizes for HH, HHH, SSD. SSD algorithm with higher and more stable accuracy.
Estimate accuracy: accuracy estimation without the ground truth.
Allocate resources: fast & stable allocation in DREAM [SIGCOMM’14].

Precision Estimation for Heavy Hitter Detection
Precision = True detected HH / Detected HHs
True detected HH = Sum(P[Detected HH is true])
Insight: relate the probability to the Error on counters of detected HHs:
P[Detected HH is true] = 1 - P[Error ≥ Estimate-Threshold]
[Figure: estimated and real sizes of a true HH and a false HH against the threshold, with Error and Estimate-Threshold marked.]

Precision Estimation Step 1: Find a Bound on The Error
Insight: relate the probability to the Error on counters of detected HHs:
P[Detected HH is true] = 1 - P[Error ≥ Estimate-Threshold]
Idea 1: use the average Error in Markov’s inequality to bound P[Error ≥ Estimate-Threshold].
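Idea 1 can be sketched as follows. Markov's inequality states that for a non-negative Error, P[Error ≥ a] ≤ E[Error] / a for any a > 0. The function below plugs that bound into the formulas on these slides. It is an illustrative sketch only: the function name is hypothetical, the average error is taken as a given input (how SCREAM derives it from the sketch is not shown here), and the result is a conservative estimate, not the exact estimator from the paper.

```python
def estimate_precision(estimates, threshold, average_error):
    """Estimate HH precision without ground truth using Markov's inequality.

    estimates: sketch estimates of the detected heavy hitters.
    average_error: assumed-known average counter error (non-negative).
    """
    if not estimates:
        raise ValueError("no detected heavy hitters")

    true_probabilities = []
    for estimate in estimates:
        gap = estimate - threshold
        if gap <= 0:
            # The bound says nothing useful; treat the item as unverified.
            true_probabilities.append(0.0)
            continue
        # Markov: P[Error >= gap] <= average_error / gap (capped at 1).
        p_error = min(1.0, average_error / gap)
        true_probabilities.append(1.0 - p_error)

    # Precision = Sum(P[Detected HH is true]) / number of detected HHs.
    return sum(true_probabilities) / len(true_probabilities)
```

For example, with a threshold of 100 and an average error of 10, an item estimated at 150 is counted as true with probability at least 0.8, while an item estimated at 105 gets no credit because the bound exceeds 1.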

Precision Estimation Step 2: Improve The Bound
Insight: in a row of Count-Min, average Error = heavy items collision + small items collision.
Counter indices of detected HHs show heavy collisions.
Idea 2: Markov’s inequality only for small items.
[Figure: a row in Count-Min; plot comparing Idea 1 and Idea 2.]

SCREAM Solutions
Collect & report: network-wide task implementation using sketches. Merge sketches of different sizes for HH, HHH, SSD. SSD algorithm with higher and more stable accuracy.
Estimate accuracy: accuracy estimation without the ground truth. Precision estimators for HH, HHH and SSD tasks.
Allocate resources: fast & stable allocation in DREAM [SIGCOMM’14].

Evaluation
Metrics:
Satisfaction of a task: fraction of the task’s lifetime with sufficient accuracy.
% of rejected tasks.
Alternatives:
OpenSketch: allocate for bounded error for worst-case traffic at task instantiation (test with different bounds).
Oracle: knows the required resource for a task in each switch in advance.
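The satisfaction metric can be computed directly from a task's per-epoch accuracy. The sketch below assumes that accuracy is sampled once per measurement epoch and that "sufficient" means at or above the accuracy bound; both the function name and those conventions are assumptions for illustration.

```python
def satisfaction(accuracy_per_epoch, bound=0.8):
    """Fraction of a task's lifetime (in epochs) with accuracy >= bound."""
    if not accuracy_per_epoch:
        return 0.0
    satisfied = sum(1 for accuracy in accuracy_per_epoch if accuracy >= bound)
    return satisfied / len(accuracy_per_epoch)


# A task that met the 80% bound in 3 of its 4 epochs.
example_satisfaction = satisfaction([0.90, 0.70, 0.85, 0.80])
```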

Evaluation Setting
Simulation for 8 switches:
256 task instances (HH, HHH, SSD, combination).
Accuracy bound = 80%.
5 min tasks arriving in 20 minutes.
2 hours CAIDA trace.

SCREAM Provides High Accuracy for More Tasks
SCREAM: high satisfaction and low reject.
OpenSketch:
Loose bound → under-provision → low satisfaction.
Tight bound → over-provision → high reject.

SCREAM’s Performance Is Close to An Oracle
SCREAM’s performance is close to an oracle’s. Its satisfaction is a bit lower because:
Iterative allocation takes time.
Accuracy estimation has error.

Other Evaluations
Accuracy estimation error: SCREAM accuracy estimation has 5% error on average.
Changing traffic skew: SCREAM supports more accurate tasks than OpenSketch.
Other accuracy metrics: tasks in SCREAM have high recall (low false negatives).

Conclusion
Measurement is crucial for SDN management in a resource-constrained environment.
Practical sketch-based SDM by dynamic memory allocation:
Implementing network-wide tasks using sketches.
Estimating accuracy for 3 types of tasks.
SCREAM is available at github.com/USC-NSL/SCREAM

Thanks! Questions?