Attig 1 Automatically Inferring Patterns of Resource Consumption in Network Traffic In Proceedings of SIGCOMM 2003 Reviewed By Michael Attig


Attig 1 Automatically Inferring Patterns of Resource Consumption in Network Traffic
In Proceedings of SIGCOMM 2003
Reviewed by Michael Attig
Authors: Cristian Estan, Stefan Savage, George Varghese
Note: Portions of this presentation are from the authors' SIGCOMM presentation slides available at:

Attig 2 Organization
Motivation
Traffic Clusters
Algorithms
–Unidimensional
–Multidimensional
AutoFocus
Experimental Results
Conclusions

Attig 3 Motivation
New applications → new traffic patterns
Network admins need to know how the network is being used
–Provide better service
–Reconfigure network elements to recognize the traffic model
Usage is difficult to observe directly, so its characteristics must be inferred
Current tools use a pre-defined list of patterns to identify traffic
–This paper focuses on the standard 5-tuple
–Allows comparison to FlowScan, which analyzes Cisco NetFlow data

Attig 4 Goals
Reduce the amount of detail in reports
–Easier to read and act upon
Provide multiple dimensions to report / aggregate on
–1 dimension → lose info
–Example: aggregating only on the protocol or source port field may suggest p2p is in wide use, when only a small set of hosts is responsible
Need for automation: detect/contain worms and DoS attacks
–Network pushback needs an automated technique to distinguish malicious flows

Attig 5 How to report?
“Top ten”
–Could lose important info
Aggregate individual flows into a common category (single dimension)
–Choose the wrong dimension → lose important characteristics
Aggregate on multiple fields (multiple dimensions)
–Better
Dynamically define traffic clusters to report

Attig 6 Traffic Clusters
Traffic clusters are the multidimensional traffic aggregates identified by reports
–Dimensionality
–Detail
–Utility
A cluster is defined by a range for each field
The ranges come from natural hierarchies (e.g. the IP prefix hierarchy), which gives meaningful aggregates
Example
–Traffic aggregate: incoming web traffic for a subnet
–Traffic cluster: ( SrcIP=*, DestIP in /24, Proto=TCP, SrcPort=80, DestPort=high )
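A minimal sketch of the cluster definition above: one range per field, with a flow matching the cluster when every field falls inside its range. The class name `Cluster`, the `HIGH_PORTS` bounds, and the `192.0.2.0/24` subnet are illustrative assumptions; the slide does not name the destination subnet or define "high".

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumption: "high" is taken to mean the unprivileged port range.
HIGH_PORTS = (1024, 65535)


@dataclass(frozen=True)
class Cluster:
    """A traffic cluster: one range per field of the 5-tuple."""
    src_net: ipaddress.IPv4Network
    dst_net: ipaddress.IPv4Network
    proto: Optional[int]          # None stands for the wildcard *
    src_ports: Tuple[int, int]    # inclusive range
    dst_ports: Tuple[int, int]    # inclusive range

    def matches(self, src_ip, dst_ip, proto, src_port, dst_port) -> bool:
        """True when every field of the flow lies inside the cluster's range."""
        return (
            ipaddress.ip_address(src_ip) in self.src_net
            and ipaddress.ip_address(dst_ip) in self.dst_net
            and (self.proto is None or self.proto == proto)
            and self.src_ports[0] <= src_port <= self.src_ports[1]
            and self.dst_ports[0] <= dst_port <= self.dst_ports[1]
        )


# "Incoming web traffic for a subnet"; the subnet is a placeholder.
web_in = Cluster(
    src_net=ipaddress.ip_network("0.0.0.0/0"),     # SrcIP=*
    dst_net=ipaddress.ip_network("192.0.2.0/24"),  # hypothetical /24
    proto=6,                                       # TCP
    src_ports=(80, 80),
    dst_ports=HIGH_PORTS,
)
```

With these assumptions, `web_in.matches("198.51.100.7", "192.0.2.10", 6, 80, 50000)` holds, while the same flow over UDP does not match.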

Attig 7 Traffic Reports
A list of clusters to be presented
Restrict what to report by volume
–# bytes
–# packets
Reports could be restricted by other criteria; the authors focus on volume

Attig 8 What must be done to report?
Compute
–Identify clusters with a traffic volume above threshold H
Compress
–Remove cluster C if its traffic can be inferred from C’
Compare
–Show how traffic changes over time
Prioritize
–Sort in terms of potential level of interest
–Use unexpectedness metric

Attig 9 Algorithms
Core of the AutoFocus tool
k = number of fields
Sets for each field form a natural hierarchy
–Parent is always the smallest superset of its children
–Leaves are the individual values the field had
–Root is the set of all possible values (*)
d_i = depth of the hierarchy of the i-th of the k fields
s = T/H, where T = total traffic, H = threshold
–Think of this as the number of reports you can have
n = number of flow records in the input
m = number of distinct values of the fields in the n flow records

Attig 10 Data Source
Use simplified NetFlow records as raw data
Each record has a key that specifies exact values for all 5 fields
Each record has two counters
–One counts the packets that matched the key during the measurement interval
–One counts the number of bytes in those packets
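The record layout described above could be sketched as follows. This is not the NetFlow export format; `FlowKey` and `aggregate` are hypothetical names, and the input is assumed to be (key, packet size) pairs from a single measurement interval.

```python
from collections import defaultdict
from typing import NamedTuple


class FlowKey(NamedTuple):
    """Exact values for all 5 fields of the key."""
    src_ip: str
    dst_ip: str
    proto: int
    src_port: int
    dst_port: int


def aggregate(packets):
    """Fold (key, size_in_bytes) pairs into one record per key.

    Each record carries the two counters from the slide:
    (packet count, byte count).
    """
    table = defaultdict(lambda: [0, 0])
    for key, size in packets:
        counters = table[key]
        counters[0] += 1       # packets that matched the key
        counters[1] += size    # bytes in those packets
    return {key: tuple(counters) for key, counters in table.items()}
```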

Attig 11 Unidimensional Compute
Clusters can overlap
Bad → can have up to 1 + 20*25 = 501 reports
–H = 5% → s = 20; hierarchy depth is d = 25+1 (25 prefix lengths plus the root)
–Only consider prefix lengths from 32 down to 8
Number of reported clusters above H is at most 1 + (d-1)s
Two approaches; the first:
–When the number of sets in the hierarchy is small, apply a brute force approach: keep a count for each set and traverse all n flow records
–At the end, list the clusters that exceed H
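A small sketch of the brute force approach and of the report bound, under stated assumptions: the protocol hierarchy `PARENT` is a made-up two-level example, flows are (value, bytes) pairs, and "above H" is read as "at least H".

```python
from collections import Counter

# Hypothetical small hierarchy: every protocol's parent is the root "*".
PARENT = {"TCP": "*", "UDP": "*", "ICMP": "*", "*": None}


def ancestors_and_self(value, parent=PARENT):
    """Yield the value and every set above it, up to the root."""
    while value is not None:
        yield value
        value = parent[value]


def brute_force_compute(flows, threshold, parent=PARENT):
    """Keep one counter per set in the hierarchy, traverse all flow
    records once, then list the sets whose traffic reaches the threshold."""
    counts = Counter({node: 0 for node in parent})
    for value, nbytes in flows:
        for node in ancestors_and_self(value, parent):
            counts[node] += nbytes
    return {node: c for node, c in counts.items() if c >= threshold}


def max_reports(depth, s):
    """Upper bound 1 + (d-1)s on the number of clusters above H."""
    return 1 + (depth - 1) * s


# The slide's numbers: H = 5% gives s = 20, and d = 26.
print(max_reports(26, 20))  # 501
```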

Attig 12 Unidimensional Compute – Approach 2
First pass over the flow records:
–Build leaf nodes for the IP addresses that actually appear in the trace
–Update the counters of the leaf nodes as the trace is read
Second pass (post-order traversal of the tree):
–Determine which clusters are over threshold H
–Add each node's traffic to that of its parent
Memory required is O(md)
Running time is O(n + md)
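The two passes can be sketched as below. This is a simplification rather than the authors' implementation: instead of an explicit tree, it walks the prefix lengths from /32 up to /8, which visits children before parents just as a post-order traversal does. The function name, the (ip, bytes) input shape, and reading "over H" as "at least H" are assumptions.

```python
import ipaddress
from collections import defaultdict


def compute_clusters(flows, threshold, min_len=8):
    """Return {prefix: bytes} for every prefix whose traffic reaches threshold."""
    # Pass 1: one leaf counter per address that actually appears.
    level = defaultdict(int)
    for ip, nbytes in flows:
        level[int(ipaddress.ip_address(ip))] += nbytes

    report = {}
    # Pass 2: children before parents, from /32 up to /min_len.
    for plen in range(32, min_len - 1, -1):
        for addr, nbytes in level.items():
            if nbytes >= threshold:
                report[f"{ipaddress.ip_address(addr)}/{plen}"] = nbytes
        if plen == min_len:
            break
        # Add each node's traffic to its parent, the prefix one bit shorter.
        parents = defaultdict(int)
        for addr, nbytes in level.items():
            parents[addr & ~(1 << (32 - plen))] += nbytes
        level = parents

    # The root (*) covers everything that is left.
    total = sum(level.values())
    if total >= threshold:
        report["*"] = total
    return report
```

Because only one level is held at a time, this variant needs memory proportional to m rather than the O(md) of a full tree, at the cost of not keeping the tree around for later steps.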

Attig 13 Unidimensional report example
[Figure: prefix hierarchy and reporting threshold]

Attig 14 Unidimensional Compress
Reduce redundant info
–Example: a /30 network may have a bit more traffic than the /31 it covers, but not much more
–Reporting the /30 then adds nothing to the report
Define the compression threshold C as the amount by which a cluster's inferred traffic can be off (the authors used C=H)
Trade-off: accuracy versus reduced size
The number of clusters above the threshold in a non-redundant compressed report is at most s
Brings the reports of the previous example down to 20 instead of 501

Attig 15 Unidimensional report example
[Figure: the prefix hierarchy after compression; table columns: Source IP, Traffic]

Attig 16 Unidimensional Compress
Single traversal of the tree (post-order)
Two counters maintained in each node
–One reflecting its actual traffic
–One reflecting an estimate of its traffic based on the more specific clusters already in the report
Estimate = sum of the estimates of the children
If the difference between the actual traffic and this sum is below threshold H, ignore the node
Else report it with its exact traffic and set its estimate counter to the actual traffic
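A sketch of this compression step, assuming the same (ip, bytes) input as before and walking prefix lengths from /32 to /8 in place of an explicit post-order tree traversal. Each node carries the two counters from the slide as `[actual, estimate]`; the function name and the use of "at least H" for the report test are assumptions.

```python
import ipaddress


def compress(flows, threshold, min_len=8):
    """Report a prefix only when its traffic exceeds what the more specific
    clusters already in the report account for by at least threshold."""
    level = {}  # address -> [actual traffic, estimate]
    for ip, nbytes in flows:
        node = level.setdefault(int(ipaddress.ip_address(ip)), [0, 0])
        node[0] += nbytes

    report = {}
    for plen in range(32, min_len - 1, -1):
        for addr, node in level.items():
            actual, estimate = node
            if actual - estimate >= threshold:
                report[f"{ipaddress.ip_address(addr)}/{plen}"] = actual
                node[1] = actual  # reported exactly, so the estimate is exact
        if plen == min_len:
            break
        # A parent's counters are the sums of its children's counters.
        parents = {}
        for addr, (actual, estimate) in level.items():
            parent = parents.setdefault(addr & ~(1 << (32 - plen)), [0, 0])
            parent[0] += actual
            parent[1] += estimate
        level = parents

    # Root (*): same test against the summed estimates.
    actual = sum(node[0] for node in level.values())
    estimate = sum(node[1] for node in level.values())
    if actual - estimate >= threshold:
        report["*"] = actual
    return report
```

For flows of 60, 50 and 10 bytes from 10.0.0.1, 10.0.0.2 and 10.0.0.3 with a threshold of 100, only 10.0.0.0/30 is reported; every shorter prefix carries the same 120 bytes and is suppressed.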

Attig 17 Unidimensional Compare
How has the structure of the traffic changed?
Absolute change
–Number of bytes or packets by which clusters increase/decrease
–Only meaningful if the measurement interval is constant
Relative change
–Comparison of traffic mixes over different measurement intervals
–Must normalize for comparison
Do a pass over the trace to compute the exact traffic in both intervals
If the estimated change is lower or higher than the actual change by at least H, report the cluster (a compression step)
–Estimate is based on the clusters already added to the report

Attig 18 Unidimensional Prioritize
Unexpectedness
–Deviation from a uniform model
–Example: 50% of all traffic is web and node B receives 20% of all traffic; the web traffic received by node B is 15% of all traffic instead of the expected 50%*20%=10%, so the unexpectedness label is 15%/10%=150%
–This is used to flag interesting data in a report
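The arithmetic of the example can be written as a short function. The name `unexpectedness` and its signature are illustrative; the uniform model is taken to mean that the per-field shares multiply.

```python
def unexpectedness(cluster_share, *field_shares):
    """Ratio of a cluster's actual traffic share to the share expected
    if the fields were independent (product of the per-field shares)."""
    expected = 1.0
    for share in field_shares:
        expected *= share
    return cluster_share / expected


# Slide example: web is 50% of traffic, node B receives 20%,
# web traffic to node B is 15% of all traffic.
label = unexpectedness(0.15, 0.50, 0.20)
print(f"{label:.0%}")  # 150%
```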

Attig 19 Multidimensional Structure Example
[Figure: lattice built from two hierarchies. Source net: All traffic; US, EU; CA, NY, GB, DE. Application: Web, Mail. Annotations on the node US Web: nodes (clusters) have multiple parents; nodes (clusters) overlap.]

Attig 20 Another way to understand the multi-dim. case
[Figure labels: x hierarchy, y hierarchy, “leaf” nodes, “root” node, decreasing x-specificity, decreasing y-specificity]

Attig 21 Pruning the hierarchy
[Figure labels: x hierarchy, y hierarchy, significant 1d nodes, potentially significant 2d nodes]

Attig 22 Multidimensional Compute
Naive approach: for each cluster, examine the n flows and report the clusters above the threshold
Can’t do brute force: ~ n ∏ d_i clusters
–Evaluate all clusters generated by n flows? Too many!
Restrict the search with optimizations that prune the search space
–Consider a cluster only if all its unidimensional ancestors are above H
Solve k unidimensional problems
–Combine clusters
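A two-field sketch of the pruning idea: solve one unidimensional problem per field, then count only the combinations whose components are each above H on their own. The node names are borrowed from the slide 19 example, but the parent/child assignment in `SRC_PARENT` and `APP_PARENT`, the function names, and the (source, application, bytes) flow shape are assumptions.

```python
from collections import Counter
from itertools import product

# Assumed hierarchies (child -> parent); "*" is the root of each field.
SRC_PARENT = {"CA": "US", "NY": "US", "GB": "EU", "DE": "EU",
              "US": "*", "EU": "*", "*": None}
APP_PARENT = {"Web": "*", "Mail": "*", "*": None}


def chain(value, parent):
    """The value followed by all of its ancestors up to the root."""
    out = []
    while value is not None:
        out.append(value)
        value = parent[value]
    return out


def unidimensional(flows, field, parent, threshold):
    """Sets of one field whose traffic reaches the threshold."""
    counts = Counter()
    for flow in flows:
        for node in chain(flow[field], parent):
            counts[node] += flow[2]
    return {node for node, c in counts.items() if c >= threshold}


def multidimensional_compute(flows, threshold):
    """Count only clusters whose unidimensional ancestors are all above H."""
    big_src = unidimensional(flows, 0, SRC_PARENT, threshold)
    big_app = unidimensional(flows, 1, APP_PARENT, threshold)
    counts = Counter()
    for src, app, nbytes in flows:
        srcs = [s for s in chain(src, SRC_PARENT) if s in big_src]
        apps = [a for a in chain(app, APP_PARENT) if a in big_app]
        for cluster in product(srcs, apps):
            counts[cluster] += nbytes
    return {c: v for c, v in counts.items() if v >= threshold}
```

The pruning is safe because a cluster can never carry more traffic than any of its unidimensional ancestors, so a combination involving a below-threshold component cannot itself be above the threshold.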

Attig 23 Greedy compression algorithm
Add the estimate counters of the children along all dimensions
Set this node’s estimate to the largest of the sums
If above threshold, report

Attig 24 System: AutoFocus
[Figure labels: Packet header trace, Traffic parser, Cluster miner, Grapher, Web based GUI, categories, names]

Attig 25

Attig 26 Experience
Small network exchange point: SD-NAP
Slammer Worm
Worm traffic separated (black)
Uncharacterized spike in “other” category indicates something fishy

Attig 27 Analysis of unusual events
UCSD to UCLA route change
Sapphire/SQL Slammer worm
Site 2

Attig 28 Conclusions
Multidimensional traffic clusters using natural hierarchies describe traffic aggregates
Traffic reports using thresholding automatically identify conspicuous resource consumption at the right granularity
Compression produces compact traffic reports, and unexpectedness labels highlight non-obvious aggregates
Their prototype system, AutoFocus, provides insights into the structure of regular traffic and unexpected events