
1 An Effective Similarity Metric for Application Traffic Classification
Jae Yoon Chung (1), Byungchul Park (1), Young J. Won (1), John Strassner (2), and James W. Hong (1, 2)
{dejavu94, fates, yjwon, johns, jwkhong}@postech.ac.kr
(1) Dept. of Computer Science and Engineering, POSTECH, Korea
(2) Division of IT Convergence Engineering, POSTECH, Korea
DPNM, POSTECH, NOMS 2010, April 20, 2010

2 Contents
- Introduction
- Related Work
- Research Goal
- Proposed Methodology
- Evaluation
- Conclusion and Future Work

3 Introduction
- Traffic classification for network management
  - Network planning
  - QoS management
  - Security
  - Etc.
- Diversity of today's Internet traffic
  - New types of network applications
  - Increase of P2P traffic
  - Various techniques for avoiding detection
- From document classification to traffic classification
  - Document classification in natural language processing
  - Comparing packet payload vectors is analogous to document classification

4 Related Work
- Well-known port-based classification
  - Low complexity
  - Low accuracy (approximately 50-70%)
- Signature-based classification
  - High reliability
  - Exhaustive effort required to search for signatures
  - E.g. Snort, LASER
- Behavior-based classification
  - Focuses on traffic patterns and connection behaviors
  - Questionable accuracy
  - E.g. BLINC
- Machine learning-based classification
  - Utilizes statistical information
  - High consumption of computing resources
  - E.g. SVM, Bayesian Network
- Similarity-based classification
  - Utilizes the document classification approach
  - Questionable scalability
  - E.g. flow similarity calculation [IPOM '09]

5 Summary of IPOM 2009
- Proposed a new traffic classification approach
  - Utilizes the document classification approach with Cosine similarity calculation
  - Proposes a new packet representation using the Vector Space Model
  - Proposes a flow similarity calculation methodology that compares the packets in a flow sequentially
- Methodology validated using real-world traffic on our campus backbone network
  - Cannot classify flows in an asymmetric routing environment
- No comparison of Cosine similarity with other similarity metrics
  - Cosine similarity is a common similarity metric for human-document classification
  - High variation of the similarity value according to term frequency

6 Research Goals
- Propose a new traffic classification algorithm
  - Automation of the signature generation step
  - Generate an application vector, which is an alternative to a signature, using simple vector operations
  - Make groups according to traffic type and operation within single-application traffic
  - Accurate and feasible traffic classification algorithm
  - Classify application traffic using similarity calculation
  - Solve the asymmetric routing classification problem
  - Validation using real-world network traffic to compare similarity metrics
  - Complexity analysis
- Compare three similarity metrics for traffic classification
  - Jaccard similarity: counting fragments of a signature
  - Cosine similarity: high weighting scheme for a signature
  - RBF similarity: Euclidean distance between packets

7 Proposed Methodology

8 Vector Space Modeling
- Vector Space Modeling
  - An algebraic model representing text documents as vectors
  - Widely used for document classification
  - Categorizes electronic documents based on their content (e.g. e-mail spam filtering)
- Document classification vs. traffic classification
  - Document classification: find documents among stored text documents that satisfy certain information queries
  - Traffic classification: classify network traffic according to the type of application, based on traffic information

9 Payload Vector Conversion (1/2)
- Definition of a word in a payload
  - Payload data within an i-byte sliding window
  - |Word set| = 2^(8 * sliding window size)
- Definition of a payload vector
  - A term-frequency vector, as in NLP
  - Payload vector = [w_1 w_2 … w_n]^T

10 Payload Vector Conversion (2/2)
- The word size is 2 bytes and the word set size is 2^16
  - The simplest case for representing the order of content in payloads
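
The conversion described on the two slides above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function name `payload_to_vector` is hypothetical, the window is assumed to advance one byte at a time, and the 2^16-dimensional vector is stored sparsely as a word-to-count mapping rather than as a dense array.

```python
from collections import Counter


def payload_to_vector(payload: bytes, window: int = 2) -> Counter:
    """Build a sparse term-frequency vector from a packet payload.

    Each "word" is the `window`-byte slice seen as the window slides
    over the payload. A one-byte step is an assumption of this sketch.
    """
    if window < 1:
        raise ValueError("window must be at least 1 byte")
    # Payloads shorter than the window produce no words (empty vector).
    return Counter(
        payload[i:i + window] for i in range(len(payload) - window + 1)
    )


if __name__ == "__main__":
    vec = payload_to_vector(b"GET GET")
    for word, count in sorted(vec.items()):
        print(word, count)
```

With a 2-byte window, a 7-byte payload yields 6 overlapping words, so repeated byte pairs such as `GE` and `ET` accumulate counts while the byte order inside each word is preserved.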

11 Similarity Metrics for Traffic Classification
- Jaccard similarity
  - The size of the intersection of the sample sets X and Y divided by the size of their union
- Cosine similarity
  - Compares two n-dimensional vectors X and Y by finding the cosine of the angle between them
- RBF similarity
  - Radial basis function (RBF) of the Euclidean distance between two vectors X and Y
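
As a rough illustration of the three metrics, the sketch below computes each one over sparse term-frequency vectors stored as dictionaries. The slide's formulas did not survive in the transcript, so two details are assumptions: Jaccard is taken over the sets of words that occur in each vector, and the RBF is written in the common Gaussian form exp(-d^2 / (2 * sigma^2)), where `sigma` is the parameter that has to be tuned.

```python
import math


def jaccard(x: dict, y: dict) -> float:
    """|X intersect Y| / |X union Y| over the sets of words present."""
    a, b = set(x), set(y)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(x: dict, y: dict) -> float:
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(x[w] * y[w] for w in x.keys() & y.keys())
    norm_x = math.sqrt(sum(v * v for v in x.values()))
    norm_y = math.sqrt(sum(v * v for v in y.values()))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0


def rbf(x: dict, y: dict, sigma: float = 1.0) -> float:
    """Gaussian RBF of the Euclidean distance (assumed form)."""
    sq_dist = sum(
        (x.get(w, 0) - y.get(w, 0)) ** 2 for w in x.keys() | y.keys()
    )
    return math.exp(-sq_dist / (2 * sigma ** 2))


if __name__ == "__main__":
    p = {"ab": 2, "bc": 1}
    q = {"ab": 1, "cd": 1}
    print(jaccard(p, q), cosine(p, q), rbf(p, q))
```

All three return a value between 0 and 1, with 1 for identical vectors. Jaccard ignores how often a word occurs, Cosine weights frequent words heavily, and the RBF value depends strongly on the choice of `sigma`.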

12 Application Vector Heuristics
- Application vector
  - Represents typical packets generated by target applications as the center (basis) of each cluster
- Application vector generator
  - Reads packets from the target application trace
  - Divides the packets into several types of clusters without any pre-processing
[Figure labels: Application trace, Application vector generator, Application vectors 1-3, Traffic clusters 1-2]

13 Application Vector Generation
- Unsupervised grouping within single-application traffic
  - Provides fine-grained classification
  - Classifies single-application traffic according to traffic types
[Figure labels: Application Traffic, packets 1-6, Clusters 1-2, Application vectors 1-2]
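
The slides do not spell out the grouping algorithm, so the following is only one possible sketch of the idea: a simple threshold-based incremental ("leader") clustering, swapped in here for illustration. Each packet joins the most similar existing cluster or starts a new one, and each application vector is the mean of its cluster's payload vectors. The function names, the use of Jaccard similarity, and the `threshold` value are all assumptions.

```python
from collections import Counter


def jaccard(x, y) -> float:
    """Set-based Jaccard similarity over the words present in each vector."""
    a, b = set(x), set(y)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def generate_application_vectors(payload_vectors, threshold=0.5):
    """Group payload vectors and return one mean vector per cluster.

    Hypothetical sketch: a packet joins the most similar cluster whose
    similarity is at least `threshold`; otherwise it opens a new cluster.
    """
    clusters = []  # each entry: {"sum": Counter of word counts, "count": int}
    for vec in payload_vectors:
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = jaccard(vec, cluster["sum"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"sum": Counter(vec), "count": 1})
        else:
            best["sum"].update(vec)  # element-wise addition of counts
            best["count"] += 1
    # The cluster center is the element-wise mean of its members.
    return [
        {word: total / c["count"] for word, total in c["sum"].items()}
        for c in clusters
    ]


if __name__ == "__main__":
    trace = [
        Counter({"ab": 1, "bc": 1}),
        Counter({"ab": 1, "bc": 1, "cd": 1}),
        Counter({"xy": 1, "yz": 1}),
    ]
    for app_vector in generate_application_vectors(trace):
        print(app_vector)
```

Only vector addition and division are needed to maintain the centers. The result of this particular sketch depends on the order in which packets are read and on the chosen threshold.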

14 Two-stage Traffic Classification
- Packet-level clustering
  - Classify single packets regardless of flow information
  - Compare payload vectors with application vectors by calculating a similarity value
  - Mark each packet with its application and priority
  - Allow permutation of the packet sequence
- Flow-level classification
  - Rearrange packets according to flow information
  - Ignore mis-clustered packets that are caused by protocol ambiguities (e.g. HTTP for Web vs. HTTP for P2P)

15 Two-stage Traffic Classification
[Figure: backbone traffic containing Flow 1 (packets F1 P1-P4) and Flow 2 (packets F2 P1-P4). In Stage 1 the packets are compared with Application Vectors 1-3 and placed into Clusters 1-3; in Stage 2 they are regrouped by flow, and one packet is marked as mis-clustered. Application labels shown: BitTorrent, FileGuri, Melon.]
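
The two stages can be sketched as follows. This is a hypothetical illustration rather than the authors' algorithm: Stage 1 labels every packet independently with the most similar application vector, and Stage 2 regroups packets by flow and takes a majority vote so that a few mis-clustered packets are outvoted. The majority vote, the Jaccard metric, the `threshold` value, and all function names are assumptions, and the per-packet priority mentioned on the slide is omitted.

```python
from collections import Counter, defaultdict


def jaccard(x, y) -> float:
    """Set-based Jaccard similarity over the words present in each vector."""
    a, b = set(x), set(y)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify_packets(packets, app_vectors, threshold=0.3):
    """Stage 1: label each (flow_id, vector) pair, ignoring flow context.

    Packets that match no application vector are labeled None.
    """
    labeled = []
    for flow_id, vec in packets:
        best_app, best_sim = None, threshold
        for app, app_vec in app_vectors.items():
            sim = jaccard(vec, app_vec)
            if sim >= best_sim:
                best_app, best_sim = app, sim
        labeled.append((flow_id, best_app))
    return labeled


def classify_flows(labeled_packets):
    """Stage 2: regroup packets by flow and keep the majority label."""
    votes = defaultdict(Counter)
    for flow_id, app in labeled_packets:
        if app is not None:
            votes[flow_id][app] += 1
    return {flow: c.most_common(1)[0][0] for flow, c in votes.items()}


if __name__ == "__main__":
    app_vectors = {
        "BitTorrent": {"Bi": 1, "it": 1, "tT": 1},
        "Melon": {"Me": 1, "el": 1, "lo": 1},
    }
    packets = [
        ("F1", {"Bi": 1, "it": 1}),
        ("F1", {"it": 1, "tT": 1}),
        ("F1", {"Me": 1, "el": 1}),  # ambiguous packet inside flow F1
        ("F2", {"Me": 1, "el": 1, "lo": 1}),
        ("F2", {"zz": 1}),
    ]
    print(classify_flows(classify_packets(packets, app_vectors)))
```

Because Stage 1 never looks at packet order or direction, the packets of a flow can arrive permuted, or with only one direction visible, and still be labeled. The flow-level vote then absorbs isolated mislabels.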

16 Evaluation

17 Classifying Real-world Traffic
- Fixed-port applications
  - Traffic trace captured at one of the two Internet junctions at POSTECH using an optical tap
  - Ground-truth traffic: some active flows among the application traffic, distinguished by their use of an active port number
  - Target applications: FileGuri, ClubBox, Melon, BigFile
- Untraceable-port applications
  - Traffic Measurement Agent (TMA)
  - Monitors the network interface of the host
  - Records log data (flow five-tuple, process name, packet count, etc.)
  - Target applications: eMule, BitTorrent
[Figure labels: Backbone Traffic, Target Application Traffic, Ground-truth Traffic]

18 Classification Accuracy
- Classification accuracy comparison
  - Fixed-port applications: FileGuri, ClubBox, Melon, BigFile
  - Untraceable-port applications: eMule, BitTorrent
  - Jaccard similarity: reliable, because it counts common segments
  - Cosine similarity: emphasizes common segments, so it cannot distinguish ambiguous packets
  - RBF similarity: the parameter is difficult to set, and a guideline for setting it is needed
- BitTorrent traffic on the backbone network
  - Traffic over-classification by Cosine similarity
  - High false positive rate of Cosine similarity

19 Histogram of Similarity Values

20 CDF of Distance among Payload Vectors

21 Complexity Analysis

22 Conclusion and Future Work
- Developed a new traffic classification approach
  - Applies the document classification approach to traffic classification
  - Unsupervised classification to form clusters within single-application traffic
  - Two-stage classification algorithm to solve the asymmetric routing classification problem
  - Linear time complexity
- Compared three similarity metrics
  - Provides a guideline for selecting similarity metrics for traffic classification
  - Provides soft classification that represents similarity as a numerical value ranging from 0 to 1
- Future work
  - Enhance the unsupervised classification methodology for automated signature generation
  - Extract orthogonal application vectors to improve scalability
