Automatically Inferring Patterns of Resource Consumption in Network Traffic. In Proceedings of SIGCOMM 2003. Reviewed by Michael Attig


1 Automatically Inferring Patterns of Resource Consumption in Network Traffic
In Proceedings of SIGCOMM 2003
Reviewed by Michael Attig
Paper: http://www.acm.org/sigs/sigcomm/sigcomm2003/papers.html#p137-estan
Authors: Cristian Estan, Stefan Savage, George Varghese
Note: Portions of this presentation are from the authors’ SIGCOMM presentation slides, available at: http://www.cs.ucsd.edu/~cestan/

2 Organization
Motivation
Traffic Clusters
Algorithms
–Unidimensional
–Multidimensional
AutoFocus
Experimental Results
Conclusions

3 Motivation
New applications → new traffic patterns
Network admins need to know how the network is being used
–Provide better service
–Reconfigure network elements to recognize the traffic model
Usage is difficult to observe directly, so its characteristics must be inferred
Current tools use a pre-defined list of patterns to identify traffic
–This paper focuses on the standard 5-tuple
–Allows comparison to FlowScan

4 Goals
Reduce the amount of detail in reports
–Easier to read and act upon
Provide multiple dimensions to report and aggregate on
–Aggregating on 1 dimension → lose information
–Example: aggregating on the protocol or source port field alone may suggest that p2p is in wide use when only a small set of hosts is responsible
Need for automation: detect and contain worms and DoS attacks
–Network pushback needs an automated technique to distinguish malicious flows

5 How to report?
“Top ten” list
–Could lose important information
Aggregate individual flows into a common category (single dimension)
–Choosing the wrong dimension → lose important characteristics
Aggregate on multiple fields (multiple dimensions)
–Better
Dynamically define the traffic clusters to report

6 Traffic Clusters
Traffic clusters are the multidimensional traffic aggregates identified by reports
–Dimensionality
–Detail
–Utility
A cluster is defined by a range for each field
The ranges come from natural hierarchies (e.g. the IP prefix hierarchy), which gives meaningful aggregates
Example
–Traffic aggregate: incoming web traffic for a subnet
–Traffic cluster: ( SrcIP=*, DestIP in 128.252.153.0/24, Proto=TCP, SrcPort=80, DestPort=high )
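
A minimal sketch of how the example cluster on this slide could be tested against a flow. The `Flow` type, the `matches` and `port_in` helpers, and the reading of "high" as ports >= 1024 are assumptions made for illustration; they are not taken from the paper or from AutoFocus.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Flow:
    """Hypothetical 5-tuple flow key."""
    src_ip: str
    dst_ip: str
    proto: str
    src_port: int
    dst_port: int


def port_in(port, spec):
    """spec is '*', 'high' (assumed >= 1024), 'low' (assumed < 1024) or an exact port."""
    if spec == "*":
        return True
    if spec == "high":
        return port >= 1024
    if spec == "low":
        return port < 1024
    return port == spec


def matches(flow, cluster):
    """A cluster is one range per field; a flow matches if every field falls in its range."""
    src_net, dst_net, proto, sport, dport = cluster
    return (ip_address(flow.src_ip) in ip_network(src_net)
            and ip_address(flow.dst_ip) in ip_network(dst_net)
            and proto in ("*", flow.proto)
            and port_in(flow.src_port, sport)
            and port_in(flow.dst_port, dport))


# The slide's cluster, with SrcIP=* written as the prefix 0.0.0.0/0.
web_in = ("0.0.0.0/0", "128.252.153.0/24", "TCP", 80, "high")

print(matches(Flow("192.0.2.10", "128.252.153.7", "TCP", 80, 51000), web_in))
print(matches(Flow("192.0.2.10", "128.252.154.7", "TCP", 80, 51000), web_in))
```

The first flow falls inside every range; the second has a destination outside the /24, so it does not belong to the cluster.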

7 Traffic Reports
A list of clusters to be presented
Restrict what to report by volume
–# bytes
–# packets
Reports could be restricted by other criteria; the authors focus on volume

8 What must be done to report?
Compute
–Identify clusters with a traffic volume above the threshold, H
Compress
–Remove cluster C if its traffic can be inferred from that of cluster C’
Compare
–Show how traffic changes over time
Prioritize
–Sort in terms of potential level of interest
–Use an unexpectedness metric

9 Algorithms
Core of the AutoFocus tool
k = number of fields
Sets for each field form a natural hierarchy
–Parent is always the smallest superset of its children
–Leaves are the individual values the field took
–Root is the set of all possible values (*)
d_i = depth of the hierarchy of the i-th of the k fields
s = T/H, where T = total traffic, H = threshold
–Think of this as the number of reports you can have
n = number of flow records in the input
m = number of distinct values of the fields in the n flow records

10 Data Source
Simplified NetFlow records are used as raw data
Each record has a key that specifies exact values for all 5 fields
Each record has two counters
–One counts the packets that matched the key during the measurement interval
–One counts the number of bytes in those packets
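
The record layout described on this slide can be sketched as a key plus two counters. `FlowKey` and `accumulate` are hypothetical names, and the input format (one `(key, size)` pair per packet) is an assumption made for illustration, not the NetFlow export format.

```python
from collections import Counter
from typing import NamedTuple


class FlowKey(NamedTuple):
    """Exact values for all 5 fields (hypothetical field names)."""
    src_ip: str
    dst_ip: str
    proto: int
    src_port: int
    dst_port: int


def accumulate(packets):
    """packets: iterable of (FlowKey, size_in_bytes) seen in one measurement interval.

    Returns two counters per key: packets matched and bytes in those packets.
    """
    pkt_count, byte_count = Counter(), Counter()
    for key, size in packets:
        pkt_count[key] += 1
        byte_count[key] += size
    return pkt_count, byte_count


key = FlowKey("192.0.2.10", "128.252.153.7", 6, 80, 51000)
pkts, nbytes = accumulate([(key, 1500), (key, 40)])
print(pkts[key], nbytes[key])
```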

11 Unidimensional Compute
Clusters can overlap
–Bad → can have up to 1 + 20*25 = 501 reports
–H = 5% → s = 20; the hierarchy depth is d = 25 + 1 = 26 (25 prefix lengths plus the root)
–Only prefixes of length 32 down to 8 are considered
The number of reported clusters above H is at most 1 + (d-1)s
Two approaches; the first:
–When the number of sets in the hierarchy is small, apply brute force: keep a count for each set and traverse all n flow records
–At the end, list the clusters that exceed H
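
A rough sketch of the brute-force approach and of the report bound, assuming source-IP prefixes of length 8 to 32 plus a root `*`. The function names, the `(ip, bytes)` input format, and the choice of "at or above H" as the comparison are assumptions made for illustration.

```python
from collections import Counter
from ipaddress import ip_network


def brute_force_clusters(flows, threshold, min_len=8, max_len=32):
    """Keep one counter per prefix each address falls in, then list those at or above H."""
    counts = Counter()
    total = 0
    for ip, nbytes in flows:
        total += nbytes
        for plen in range(min_len, max_len + 1):
            # strict=False masks the host bits, e.g. 10.0.0.9/31 -> 10.0.0.8/31
            counts[str(ip_network(f"{ip}/{plen}", strict=False))] += nbytes
    counts["*"] = total  # root of the hierarchy: all traffic
    return {cluster: v for cluster, v in counts.items() if v >= threshold}


def max_reports(depth, s):
    """Upper bound 1 + (d-1)s on the number of clusters above the threshold."""
    return 1 + (depth - 1) * s


print(max_reports(26, 20))  # the slide's case: H = 5%, so s = 20, and d = 26
result = brute_force_clusters(
    [("10.0.0.8", 160), ("10.0.0.9", 110), ("10.0.0.2", 15)], threshold=100)
print(result["10.0.0.8/31"], result["*"])
```

Every flow record updates one counter per hierarchy level, which is why this only pays off when the hierarchy is small.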

12 Unidimensional Compute – Approach 2
First pass, going through the flow records:
–Build leaf nodes for the IP addresses that actually appear in the trace
–Update the counter of the matching leaf node for each record
Second pass (post-order traversal): determine which clusters are over the threshold H
–Add each node's traffic to that of its parent
Memory required is O(md)
Running time is O(n + md)
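
The two passes can be sketched as follows. This version folds the tree level by level (children before parents) instead of walking an explicit tree in post-order, which visits nodes in a compatible order; the function name, the `(ip, bytes)` input format and the "at or above H" comparison are assumptions. The sample data are the leaf values from the unidimensional report example in this deck, with the hierarchy cut off at /28 as in that figure.

```python
from collections import defaultdict
from ipaddress import ip_address


def compute_clusters(flows, threshold, min_len=8):
    # Pass 1: one counter per address that actually appears (the leaves).
    level = defaultdict(int)
    for ip, nbytes in flows:
        level[int(ip_address(ip))] += nbytes

    # Pass 2: report nodes at or above the threshold, then add each node's
    # traffic to its parent (the prefix that is one bit shorter).
    report = {}
    for plen in range(32, min_len - 1, -1):
        for addr, nbytes in level.items():
            if nbytes >= threshold:
                report[f"{ip_address(addr)}/{plen}"] = nbytes
        if plen == min_len:
            break
        parent_mask = (0xFFFFFFFF << (33 - plen)) & 0xFFFFFFFF
        parents = defaultdict(int)
        for addr, nbytes in level.items():
            parents[addr & parent_mask] += nbytes
        level = parents
    return report


leaves = [("10.0.0.2", 15), ("10.0.0.3", 35), ("10.0.0.4", 30), ("10.0.0.5", 40),
          ("10.0.0.8", 160), ("10.0.0.9", 110), ("10.0.0.10", 35), ("10.0.0.14", 75)]
uncompressed = compute_clusters(leaves, threshold=100, min_len=28)
for cluster, nbytes in uncompressed.items():
    print(cluster, nbytes)
```

Only prefixes that contain at least one observed address are ever created, which is where the O(md) memory bound comes from.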

13 Unidimensional report example
Source IP hierarchy with the traffic of each node; threshold = 100 (nodes at or above the threshold are marked *)
/32: 10.0.0.2 = 15, 10.0.0.3 = 35, 10.0.0.4 = 30, 10.0.0.5 = 40, 10.0.0.8 = 160 *, 10.0.0.9 = 110 *, 10.0.0.10 = 35, 10.0.0.14 = 75
/31: 10.0.0.2/31 = 50, 10.0.0.4/31 = 70, 10.0.0.8/31 = 270 *, 10.0.0.10/31 = 35, 10.0.0.14/31 = 75
/30: 10.0.0.0/30 = 50, 10.0.0.4/30 = 70, 10.0.0.8/30 = 305 *, 10.0.0.12/30 = 75
/29: 10.0.0.0/29 = 120 *, 10.0.0.8/29 = 380 *
/28: 10.0.0.0/28 = 500 *

14 Unidimensional Compress
Reduce redundant information
–Example: a /30 network may have a bit more traffic than the /31 it covers, but not much; reporting the /30 adds nothing to the report
Define the compression threshold C as the amount by which a cluster's estimate can be off (the authors used C = H)
Trade-off: accuracy versus reduced report size
The number of clusters above the threshold in a non-redundant compressed report is at most s
–Brings the worst case of the earlier H = 5% example down to 20 reports instead of 501

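15 Unidimensional report example: compression
Nodes above the threshold before compression: 10.0.0.0/28 = 500, 10.0.0.0/29 = 120, 10.0.0.8/29 = 380, 10.0.0.8/30 = 305, 10.0.0.8/31 = 270, 10.0.0.8 = 160, 10.0.0.9 = 110
Compression checks shown: 305 - 270 < 100 (10.0.0.8/30 is dropped); 380 - 270 ≥ 100 (10.0.0.8/29 is kept)
Compressed report:
Source IP      Traffic
10.0.0.0/29    120
10.0.0.8/29    380
10.0.0.8       160
10.0.0.9       110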

16 Unidimensional Compress
Single traversal of the tree (post-order)
Two counters are maintained in each node
–One reflecting its actual traffic
–One reflecting an estimate of its traffic based on the more specific clusters already in the report (the sum of the estimates of its children)
If the difference between the actual traffic and the estimate is below the threshold H, do not report the node
Else report it with its exact traffic and set its estimate counter to the actual traffic
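
A sketch of the compression pass described on this slide, using two counters per node. As before, the tree is folded level by level (children before parents) instead of being walked in post-order, and the function name, input format and "difference at or above H" comparison are assumptions. The input is the leaf data from the unidimensional report example in this deck, with the hierarchy cut off at /28.

```python
from collections import defaultdict
from ipaddress import ip_address


def compress(flows, threshold, min_len=8):
    # Each node holds [actual traffic, estimate from more specific reported clusters].
    level = defaultdict(lambda: [0, 0])
    for ip, nbytes in flows:
        level[int(ip_address(ip))][0] += nbytes

    report = {}
    for plen in range(32, min_len - 1, -1):
        for addr, node in level.items():
            traffic, estimate = node
            if traffic - estimate >= threshold:
                # Not inferable from its descendants: report it and make
                # the estimate exact for the benefit of its ancestors.
                report[f"{ip_address(addr)}/{plen}"] = traffic
                node[1] = traffic
        if plen == min_len:
            break
        parent_mask = (0xFFFFFFFF << (33 - plen)) & 0xFFFFFFFF
        parents = defaultdict(lambda: [0, 0])
        for addr, (traffic, estimate) in level.items():
            parent = parents[addr & parent_mask]
            parent[0] += traffic   # actual traffic sums over children
            parent[1] += estimate  # so does the estimate
        level = parents
    return report


leaves = [("10.0.0.2", 15), ("10.0.0.3", 35), ("10.0.0.4", 30), ("10.0.0.5", 40),
          ("10.0.0.8", 160), ("10.0.0.9", 110), ("10.0.0.10", 35), ("10.0.0.14", 75)]
compressed = compress(leaves, threshold=100, min_len=28)
for cluster, nbytes in compressed.items():
    print(cluster, nbytes)
```

On this input the sketch keeps 10.0.0.8, 10.0.0.9, 10.0.0.0/29 and 10.0.0.8/29, the same four clusters as the compressed report in the example slide.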

17 Unidimensional Compare
How has the structure of the traffic changed?
Absolute change
–Number of bytes or packets by which clusters increase or decrease
–Only meaningful if the measurement interval is constant
Relative change
–Comparison of traffic mixes over different measurement intervals
–Must normalize for comparison
Do a pass over the trace to compute the exact traffic in both intervals
If the estimated change differs from the actual change by H or more, report the cluster (a compression step)
–The estimate is based on the clusters already added to the report

18 Unidimensional Prioritize
Unexpectedness
–Deviation from a uniform model
–Example: 50% of all traffic is web, and node B receives 20% of all traffic; the web traffic received by node B is 15% of all traffic instead of the expected 50% * 20% = 10%, so the unexpectedness label is 15% / 10% = 150%
–This is used to flag interesting data in a report
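
The arithmetic of the example can be written as a small helper. The function name and signature are hypothetical; the model is the one stated on the slide, where the expected share of a cluster is the product of the shares of its individual dimensions.

```python
import math


def unexpectedness(actual_share, *marginal_shares):
    """Ratio of a cluster's actual traffic share to the share expected
    if the dimensions were independent (product of the marginal shares)."""
    expected = math.prod(marginal_shares)
    return actual_share / expected


# Slide example: web is 50% of traffic, node B receives 20%,
# and web traffic to node B is 15% of all traffic.
label = unexpectedness(0.15, 0.50, 0.20)
print(f"{label:.0%}")
```

A label well above or below 100% marks a cluster whose traffic is not explained by its parents alone.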

19 Multidimensional Structure Example
[Figure: two hierarchies over the same traffic. Source net: All traffic → US, EU; US → CA, NY; EU → GB, DE. Application: All traffic → Web, Mail.]
Nodes (clusters) have multiple parents, e.g. US Web
Nodes (clusters) overlap, e.g. US Web and CA

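20 Another way to understand the multidimensional case
[Figure: grid of nodes spanned by the x hierarchy and the y hierarchy, running from the “leaf” nodes to the “root” node, with decreasing x-specificity along one axis and decreasing y-specificity along the other]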

21 Pruning the hierarchy
[Figure: grid spanned by the x hierarchy and the y hierarchy, marking the significant 1d nodes and the potentially significant 2d nodes]

22 Multidimensional Compute
For each cluster, examine the n flows and report the clusters above the threshold
Can't do brute force: ~ n ∏ d_i clusters
–Evaluate all clusters generated by the n flows? Too many!
Restrict the search with optimizations that prune the search space
–Consider a cluster only if all of its unidimensional ancestors are above H
Solve k unidimensional problems
–Combine the resulting clusters
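
A simplified sketch of the pruning idea: solve one unidimensional problem per field, then count only the multidimensional clusters whose per-field components are all significant on their own. This is an illustration under stated assumptions and not the paper's actual algorithm; the two toy hierarchies (source net and application), the traffic numbers and all function names are hypothetical.

```python
from collections import Counter
from itertools import product

REGION = {"CA": "US", "NY": "US", "GB": "EU", "DE": "EU"}


def net_ancestors(net):
    """Toy source-net hierarchy: leaf, its region, then the root '*'."""
    return [net, REGION[net], "*"]


def app_ancestors(app):
    """Toy application hierarchy: leaf, then the root '*'."""
    return [app, "*"]


def significant_1d(flows, field, ancestors, threshold):
    """Unidimensional problem: sets of one field whose traffic is at or above H."""
    counts = Counter()
    for values, nbytes in flows:
        for node in ancestors(values[field]):
            counts[node] += nbytes
    return {node for node, v in counts.items() if v >= threshold}


def multidim_clusters(flows, hierarchies, threshold):
    keep = [significant_1d(flows, i, anc, threshold)
            for i, anc in enumerate(hierarchies)]
    counts = Counter()
    for values, nbytes in flows:
        # Candidate clusters: combinations of significant 1d sets only.
        per_field = [[n for n in anc(values[i]) if n in keep[i]]
                     for i, anc in enumerate(hierarchies)]
        for cluster in product(*per_field):
            counts[cluster] += nbytes
    return {c: v for c, v in counts.items() if v >= threshold}


flows = [(("CA", "Web"), 40), (("NY", "Web"), 30),
         (("GB", "Mail"), 20), (("DE", "Web"), 10)]
clusters = multidim_clusters(flows, [net_ancestors, app_ancestors], threshold=50)
for cluster, nbytes in sorted(clusters.items()):
    print(cluster, nbytes)
```

With a threshold of 50, only US and `*` survive in the source-net dimension and only Web and `*` in the application dimension, so four candidate clusters are counted instead of every combination generated by the flows.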

23 Greedy compression algorithm
Add up the estimate counters of the children along each dimension
Set this node's estimate to the largest of the sums
If the node's traffic exceeds the estimate by the threshold or more, report it

24 System: AutoFocus
[Figure: AutoFocus components: Traffic parser, Cluster miner, Grapher, Web based GUI; other labels: Packet header trace, categories, names]

26 Experience
Small network exchange point: SD-NAP
Slammer worm
–Worm traffic is separated out (black)
–An uncharacterized spike in the “other” category indicates something fishy

27 Analysis of unusual events
UCSD to UCLA route change
Sapphire/SQL Slammer worm
[Figure label: Site 2]

28 Conclusions
Multidimensional traffic clusters using natural hierarchies describe traffic aggregates
Traffic reports using thresholding automatically identify conspicuous resource consumption at the right granularity
Compression produces compact traffic reports, and unexpectedness labels highlight non-obvious aggregates
The authors' prototype system, AutoFocus, provides insights into the structure of regular traffic and unexpected events

