Data Streaming Algorithms for Accurate and Efficient Measurement of Traffic and Flow Matrices Qi Zhao*, Abhishek Kumar*, Jia Wang + and Jun (Jim) Xu* *College.

Presentation transcript:

Data Streaming Algorithms for Accurate and Efficient Measurement of Traffic and Flow Matrices
Qi Zhao*, Abhishek Kumar*, Jia Wang+ and Jun (Jim) Xu*
*College of Computing, Georgia Tech
+AT&T Labs - Research

Traffic and flow matrices
Traffic matrix TM
- TM[i, j] = traffic volume from node i to node j
- Useful in: capacity planning and forecasting; routing configuration; network fault/reliability diagnosis; provisioning for SLAs
Flow matrix FM
- FM[i, j, f] = the size of flow f from node i to node j
- Useful in: computing usage patterns of ISPs; detecting flapping routes; detecting DDoS attacks

Existing approaches
Traffic matrix
- Indirect inference (holistic): link counts from SNMP, routing matrix, network model
- Direct measurement: sampling, our approach
Flow matrix
- Not well studied yet
- Straightforward approach: sampling

Data streaming algorithms
Data streaming: processing a long stream of data items in one pass, using a small working memory, in order to answer a class of queries about the stream.
Our context
- Link speeds are high (e.g., Gbps), so packets arrive at a high rate.
- A small but fast memory is used: SRAM (10 ns per access).
Challenge: how to make full use of SRAM to retain as much information pertinent to the traffic/flow matrix as possible?

Two data streaming schemes
The bitmap-based scheme
- Traffic matrix
The counter array-based scheme
- Flow matrix
- Traffic matrix

System model
[Figure: an online streaming module at each of Node i and Node j, and a data analysis module at the server]

The bitmap-based scheme
- Online streaming module
- Data analysis module

Online streaming module
The data digest is a bit array (bitmap), initially set to all 0s. It is updated upon each packet arrival. Measurement proceeds in epochs.

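Example
[Figure: for each arriving packet, the invariant packet header plus the first 8 bytes of the payload is hashed by H(.) to an index i in a bitmap with positions 0, 1, 2, ..., b-1, and the bit at position i is set to 1]
[Snoeren et al. SIGCOMM’01] show that these 28 bytes are sufficient to differentiate almost all non-identical packets.
Update: U := U - 1. If U/b < Threshold, save the bitmap and start a new epoch.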

Complexities
Computational complexity
- One hash function computation
- One write to memory
Storage complexity
- Each packet produces only a little more than one bit as its digest.
- This can be further reduced using sampling.
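The per-packet update of the bitmap scheme can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class name `BitmapDigest`, the use of SHA-256 as a stand-in for H(.), and the one-byte-per-bit storage are all assumptions made for readability. It also assumes that U counts the 0 bits remaining in the bitmap, so U is decremented only when a bit flips from 0 to 1.

```python
import hashlib


class BitmapDigest:
    """Hypothetical sketch of the bitmap online streaming module."""

    def __init__(self, b, threshold):
        self.b = b                  # bitmap size in bits
        self.threshold = threshold  # epoch ends when U/b drops below this
        self.saved = []             # bitmaps of completed epochs
        self._reset()

    def _reset(self):
        # One byte per bit for clarity; a real digest would pack bits.
        self.bits = bytearray(self.b)
        self.zeros = self.b         # U: number of 0 bits (assumed meaning)

    def _index(self, invariant_bytes):
        # SHA-256 is only a stand-in for the hash function H(.).
        digest = hashlib.sha256(invariant_bytes).digest()
        return int.from_bytes(digest[:8], "big") % self.b

    def update(self, invariant_bytes):
        """Process one packet: invariant header + first 8 payload bytes."""
        i = self._index(invariant_bytes)
        if self.bits[i] == 0:
            self.bits[i] = 1
            self.zeros -= 1
            if self.zeros / self.b < self.threshold:
                # Bitmap is full enough: save it and start a new epoch.
                self.saved.append(bytes(self.bits))
                self._reset()
```

Each call performs one hash computation and at most one memory write, which matches the complexity claims above. Identical packets hash to the same position, so retransmitted or duplicated digests do not consume additional bits.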

The bitmap-based scheme
- Online streaming module
- Data analysis module

What do we have so far (for TM[i, j])?
- BM_i, generated by the traffic at node i (T_i)
- BM_j, generated by the traffic at node j (T_j)
What we want to estimate: TM[i, j] = $|T_i \cap T_j|$

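Estimation based on BM_i and BM_j
[Whang et al. 1990] proposed a method to infer |T| from BM, i.e., $\hat{D}_T = b \ln(b / U_T)$, where b is the bitmap size and $U_T$ is the number of “0”s in BM.
$|T_i \cup T_j|$ can be inferred from the bitwise-OR of BM_i and BM_j.
An estimator of TM[i, j] is given by $\widehat{TM}[i, j] = \hat{D}_{T_i} + \hat{D}_{T_j} - \hat{D}_{T_i \cup T_j}$.
We derive the variance of the estimator.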
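The estimation step can be illustrated with a short sketch. It assumes the Whang et al. method is the standard linear-counting estimate b·ln(b/U), with b the bitmap size and U the number of 0 bits, and that the traffic matrix element is obtained by inclusion-exclusion over the two bitmaps and their bitwise-OR. The function names are hypothetical, and bitmaps are represented as plain lists of 0/1 values for simplicity.

```python
import math


def linear_count(bitmap):
    """Linear-counting estimate of the number of distinct packets hashed in."""
    b = len(bitmap)
    u = sum(1 for bit in bitmap if bit == 0)  # number of 0 bits
    if u == 0:
        raise ValueError("bitmap is full; the estimate is undefined")
    return b * math.log(b / u)


def estimate_tm(bm_i, bm_j):
    """Estimate |T_i intersect T_j| from two equal-size bitmaps."""
    if len(bm_i) != len(bm_j):
        raise ValueError("bitmaps must have the same size")
    bm_or = [x | y for x, y in zip(bm_i, bm_j)]  # digest of T_i union T_j
    # Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|
    return linear_count(bm_i) + linear_count(bm_j) - linear_count(bm_or)
```

Both nodes must use the same hash function and bitmap size for the bitwise-OR to be meaningful, since a packet seen at both nodes has to set the same bit position in each bitmap.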

Multipaging
[Figure: epoch timelines at Node i and Node j, with times t1 and t2 marked]

Eliminating the effects of clock offset and packets in transit
[Figure: epoch timelines at Node i and Node j, with interval t marked]
T1: a tight upper bound on the clock offset (e.g., 50 ms in an NTP-enabled network)
- If t < T1, then overlap(1, 2) = 1
Combining with packets in transit
T2: a tight upper bound on the packet traversal time
- If t < T1 + T2, then overlap(1, 2) = 1

Counter array based scheme
- Online streaming module
- Data analysis module

Online streaming module
The data digest is a counter array. It is updated upon each packet arrival. Measurement proceeds in epochs.

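Example
[Figure: the flow label of an arriving packet is hashed by H(.) to an index i in a counter array with positions 0, 1, 2, ..., b-1, and the counter at position i is incremented from n to n+1]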
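A minimal sketch of the counter array update, under stated assumptions: the class name `CounterArrayDigest` is hypothetical, SHA-256 stands in for H(.), and each packet adds 1 to its flow's counter (so flow size is counted in packets). Epoch handling is omitted.

```python
import hashlib


class CounterArrayDigest:
    """Hypothetical sketch of the counter array online streaming module."""

    def __init__(self, b):
        self.b = b
        self.counters = [0] * b

    def index(self, flow_label):
        # SHA-256 is only a stand-in for the hash function H(.).
        digest = hashlib.sha256(flow_label).digest()
        return int.from_bytes(digest[:8], "big") % self.b

    def update(self, flow_label):
        """Process one packet: increment the counter its flow label hashes to."""
        self.counters[self.index(flow_label)] += 1
```

Because every node uses the same hash function, packets of one flow update the same index at both the ingress and the egress node, which is what allows counter values to be matched across nodes later.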

Counter array based scheme
- Online streaming module
- Data analysis module

Principle: find a good counter-value matching between ingress nodes and egress nodes.
Challenge: hash collisions make one-to-one matching fail.
Method: iterative elephant-first matching.
Accuracy: works well for the medium-to-large flow matrix elements, due to the Zipfian nature of Internet traffic.

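Elephant-first matching
[Figure: counters at the same index K, with value a1 at Node i and a2 at Node j]
- If a1 > a2: FM[i, j, f] = a2; the counter at Node i becomes a1 - a2 and the counter at Node j becomes 0.
- If a1 <= a2: FM[i, j, f] = a1; the counter at Node i becomes 0 and the counter at Node j becomes a2 - a1.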
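One plausible reading of the iterative elephant-first rule, sketched for a single counter index K: repeatedly pair the largest remaining ingress counter with the largest remaining egress counter, credit the smaller of the two values to that node pair, and subtract it from both. The function name and the dictionary-based representation are assumptions for illustration only.

```python
def elephant_first_matching(ingress, egress):
    """Match counters at one array index across ingress and egress nodes.

    ingress, egress: dicts mapping node name -> counter value at index K.
    Returns a list of (ingress_node, egress_node, matched_size) tuples.
    """
    remaining_in = {n: v for n, v in ingress.items() if v > 0}
    remaining_out = {n: v for n, v in egress.items() if v > 0}
    matches = []
    while remaining_in and remaining_out:
        # Elephant first: take the largest counter on each side.
        i = max(remaining_in, key=remaining_in.get)
        j = max(remaining_out, key=remaining_out.get)
        matched = min(remaining_in[i], remaining_out[j])
        matches.append((i, j, matched))
        # Subtract the matched amount; drop counters that reach zero.
        for table, node in ((remaining_in, i), (remaining_out, j)):
            table[node] -= matched
            if table[node] == 0:
                del table[node]
    return matches
```

For example, with ingress counters `{"A": 100, "B": 7}` and egress counters `{"X": 90, "Y": 17}`, the sketch first assigns 90 to (A, X), then 10 to (A, Y), then 7 to (B, Y). Each iteration zeroes at least one counter, so the loop terminates.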

Evaluation
Ideally, the evaluation would require packet-level traces collected simultaneously at hundreds of ingress and egress routers in an ISP over a period of time.
We instead construct synthetic experiments based on 16 publicly available packet-level traces from NLANR.

Evaluation: traffic matrix
[Figure: two panels, "bitmap scheme" and "counter array scheme"]

Metric
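Assuming RMSRE in the following slides denotes the root mean squared relative error, it could be computed over matrix elements as sketched here. The function name and the `min_size` cutoff (which skips elements too small for a relative error to be meaningful) are hypothetical, not taken from the slides.

```python
import math


def rmsre(actual, estimated, min_size=0):
    """Root mean squared relative error over elements larger than min_size."""
    pairs = [(x, x_hat) for x, x_hat in zip(actual, estimated) if x > min_size]
    if not pairs:
        raise ValueError("no elements above min_size")
    squared = sum(((x_hat - x) / x) ** 2 for x, x_hat in pairs)
    return math.sqrt(squared / len(pairs))
```

Elements with a true value of 0 are always excluded, since the relative error is undefined for them.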

RMSRE: traffic matrix

RMSRE: flow matrix

Conclusion
- A novel data streaming algorithm that produces traffic matrix estimates much more accurate than those of existing approaches.
- Another data streaming algorithm that very accurately estimates the flow matrix, a finer-grained characterization than the traffic matrix.
- Both algorithms are designed to operate on very high-speed networks.

Thank You! Questions?