On the Validation of Traffic Classification Algorithms


1 On the Validation of Traffic Classification Algorithms
Géza Szabó, Dániel Orincsay, Szabolcs Malomsoky, István Szabó
Traffic Lab, Ericsson Research Hungary

2 Aim & Contents
On the Validation of Traffic Classification Algorithms, 2008-04-29, 2/17
- Aim:
  - Introduce our novel validation method, which makes it possible to measure the accuracy of traffic classification methods
- Contents:
  - Requirements – How should validation be done?
  - Related work – How is it currently done?
  - Our proposal – What have we proposed?
  - Working mechanism – How does our proposal work?
  - Validation of a state-of-the-art traffic classification method – What have we learnt from the validation?
  - Future work – What else can be done with the proposed method?

3 Requirements – How should validation be done?
- Objective of traffic classification:
  - Identify applications in passively observed traffic
- Validation of a classification method by an active test:
  - It should be independent of the classification methods
  - It should provide reference information about each packet
  - It should be deterministic
  - Feasibility: it should be possible to create large tests in a highly automated way
  - It should take place in a realistic environment

4 Related work – How is it currently done?
- Traffic classification methods:
  - Port-based classification
  - Signature-based classification
  - Connection-pattern-based classification
  - Statistics-based classification
  - Information-theory-based classification
  - Combined classification method
- Validation methods:
  - Manual validation
  - Use of another traffic classification method
- Measurement data:
  - Manually created / active measurement
  - Publicly available
  - Not publicly available
  - Online measurement
- Drawbacks:
  - Header traces only → port-based method
  - Impossible to validate by others
  - Impossible to repeat under the same conditions
  - Non-realistic environment
  - Dynamically allocated ports, proprietary protocols, encryption, keeping up to date
  - Large number of flows, simultaneous applications
  - Previously well-classified traces: just a hint
- Examples:
  - S. Sen and J. Wang: Analyzing Peer-to-Peer Traffic Across Large Networks
  - T. Karagiannis, K. Papagiannaki and M. Faloutsos: BLINC: Multilevel Traffic Classification in the Dark
  - J. Erman, M. Arlitt and A. Mahanti: Traffic Classification Using Clustering Algorithms
  - L. Bernaille et al.: Traffic Classification On The Fly
- CURRENTLY:
  - Weak and ad hoc validation
  - No reliable and widely accepted validation technique
  - No reference packet trace with well-defined content is available

5 OUR PROPOSAL

6 The proposed method for validation
- Principle:
  - Packets are collected into flows at the traffic-generating terminal
  - Flows are marked with the identifier of the application that generated the packets of the flow
- The main requirements on the realization of the method:
  - It should not degrade the performance of the terminal
  - The byte overhead of the marking should be negligible
- The preferred realization is a driver that can easily be installed on terminals
Figure: The position of the proposed driver within the terminal

7 Working mechanism
1. The driver checks whether the packet is incoming or outgoing; only outgoing packets are marked.
2. For an outgoing packet, the packet size is checked:
   - Only packets smaller than the MTU minus the size of the marking are processed further
3. Only TCP and UDP packets are processed further.
4. Based on the packet's five-tuple, the driver checks whether it already knows which application the flow belongs to.
5. If not, the driver queries the operating system.
6. The packet is marked according to the marking policy:
   - Randomly
   - Only the first
   - Leave the first
   - No mark
Figure: The working mechanism of the introduced driver
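The six steps above can be sketched as a per-packet decision function. This is a minimal Python sketch, not the authors' driver: `MarkingDriver`, `Packet` and the `query_os` callback are hypothetical names, `query_os` stands in for the operating-system lookup of step 5, the 4-byte marking size comes from slide 8, and reading "Leave the first" as "mark every packet except the first" is an assumption.

```python
import random
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, Optional, Tuple

# (src_ip, src_port, dst_ip, dst_port, ip_protocol)
FiveTuple = Tuple[str, int, str, int, int]

MARK_SIZE = 4   # bytes added by the Router Alert option (slide 8)
TCP, UDP = 6, 17


class Policy(Enum):
    RANDOM = "randomly"
    ONLY_FIRST = "only first"
    LEAVE_FIRST = "leave the first"  # assumed: mark all packets except the first
    NO_MARK = "no mark"


@dataclass
class Packet:
    outgoing: bool
    size: int          # total IP packet size in bytes
    proto: int         # IP protocol number
    five_tuple: FiveTuple


@dataclass
class MarkingDriver:
    mtu: int
    policy: Policy
    # Hypothetical stand-in for step 5: ask the OS which executable owns the flow.
    query_os: Callable[[FiveTuple], Optional[str]]
    p_random: float = 0.5  # marking probability for Policy.RANDOM (assumed)
    rng: random.Random = field(default_factory=random.Random)
    flow_app: Dict[FiveTuple, str] = field(default_factory=dict)
    flow_seen: Dict[FiveTuple, int] = field(default_factory=dict)

    def decide(self, pkt: Packet) -> Optional[str]:
        """Return the application name to mark this packet with, or None."""
        if not pkt.outgoing:                    # 1. incoming packets are not marked
            return None
        if pkt.size + MARK_SIZE > self.mtu:     # 2. marked packet must still fit the MTU
            return None
        if pkt.proto not in (TCP, UDP):         # 3. TCP and UDP only
            return None
        app = self.flow_app.get(pkt.five_tuple)     # 4. flow already known?
        if app is None:
            app = self.query_os(pkt.five_tuple)     # 5. ask the operating system
            if app is None:
                return None
            self.flow_app[pkt.five_tuple] = app
        # Position of this packet among the flow's eligible outgoing packets.
        index = self.flow_seen.get(pkt.five_tuple, 0)
        self.flow_seen[pkt.five_tuple] = index + 1
        # 6. apply the marking policy
        if self.policy is Policy.RANDOM:
            return app if self.rng.random() < self.p_random else None
        if self.policy is Policy.ONLY_FIRST:
            return app if index == 0 else None
        if self.policy is Policy.LEAVE_FIRST:
            return app if index > 0 else None
        return None  # Policy.NO_MARK
```

Keeping the flow-to-application cache in the driver means the operating system is queried only once per flow, which is in line with the requirement that the driver should not degrade terminal performance.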

8 Place of marking
- The original IP packet is extended with one option field:
  - Router Alert option field
    - Transparent both for the routers on the path and for the receiving host (according to RFC 2113 [3])
  - The first two characters of the corresponding executable file name are added
    - The packet size increases by 4 bytes
    - The total length field in the IP header is also increased by 4 bytes
    - The header checksum is recalculated
Figure: A marked packet of the BitTorrent protocol
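A minimal sketch of the marking step described above, operating on a raw IPv4 packet given as `bytes`. It appends a 4-byte Router Alert option (type 148, length 4) whose 2-byte value carries the first two characters of the executable name, then updates the total length and recomputes the header checksum. Adding an option also requires increasing the header length (IHL) field by one 32-bit word; the slide does not list this step, but the sketch performs it. The function names are hypothetical, and this is not the authors' driver code.

```python
import struct

ROUTER_ALERT = 0x94  # option type 148: copied flag set, option number 20 (RFC 2113)


def ipv4_checksum(header: bytes) -> int:
    """Internet checksum (RFC 1071): ones' complement of the ones' complement sum."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", header))
    while total >> 16:                       # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def mark_packet(packet: bytes, exe_name: str) -> bytes:
    """Return a copy of an IPv4 packet carrying a Router Alert option whose
    2-byte value holds the first two characters of exe_name."""
    ihl = packet[0] & 0x0F
    hdr_len = ihl * 4
    if hdr_len + 4 > 60:                     # IPv4 header is limited to 60 bytes
        raise ValueError("no room for another IPv4 option")
    tag = exe_name[:2].encode("ascii", "replace").ljust(2, b"\x00")
    option = bytes([ROUTER_ALERT, 4]) + tag  # type, length = 4, 2-byte value
    header = bytearray(packet[:hdr_len] + option)
    header[0] = (packet[0] & 0xF0) | (ihl + 1)            # IHL grows by one 32-bit word
    total_len = struct.unpack_from("!H", packet, 2)[0] + 4
    struct.pack_into("!H", header, 2, total_len)          # total length + 4 bytes
    struct.pack_into("!H", header, 10, 0)                 # zero checksum before recomputing
    struct.pack_into("!H", header, 10, ipv4_checksum(bytes(header)))
    return bytes(header) + packet[hdr_len:]
```

Appending the option at the end of the existing header keeps any options already present intact; the payload bytes are copied unchanged.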

9 PROOF-OF-CONCEPT

10 Reference measurement
- Available at http://pics.etl.hu/~szabog/measurement.tar
- Recorded in a separate access network
- Our driver was installed on all computers in this network
- Duration of the measurement: 43 hours
- Captured data volume: 6 GB, containing 12 million packets
- The measurement contains the traffic of the most popular applications:
  - P2P protocols: BitTorrent, eDonkey, Gnutella, DirectConnect
  - VoIP and chat applications: Skype, MSN Live
  - FTP sessions
  - Download manager
  - E-mail sending and receiving sessions
  - Web-based e-mail (e.g., Gmail)
  - SSH sessions
  - SCP sessions
  - FPS and MMORPG gaming sessions
  - Streaming: radio, video, web-based
Figure: The traffic mix of the measurement
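To use a marked trace like this one as ground truth, the marking has to be read back from each packet and propagated to its flow. A minimal sketch under stated assumptions: packets are raw IPv4 `bytes` already extracted from the capture file, only TCP and UDP are considered, and flows are keyed by a direction-independent five-tuple so that unmarked incoming packets can be looked up under the same key as the marked outgoing ones. The helper names `read_mark`, `flow_key` and `label_flows` are hypothetical.

```python
import struct
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple

ROUTER_ALERT = 0x94
TCP, UDP = 6, 17

FlowKey = Tuple[Tuple[bytes, int], Tuple[bytes, int], int]


def read_mark(packet: bytes) -> Optional[str]:
    """Return the 2-character marking from a Router Alert option, if present."""
    hdr_len = (packet[0] & 0x0F) * 4
    i = 20                                   # options start after the fixed header
    while i < hdr_len:
        opt = packet[i]
        if opt == 0:                         # End of Option List
            break
        if opt == 1:                         # No-Operation (single byte)
            i += 1
            continue
        length = packet[i + 1]
        if length < 2:                       # malformed option: stop parsing
            break
        if opt == ROUTER_ALERT and length == 4:
            return packet[i + 2:i + 4].decode("ascii", "replace")
        i += length
    return None


def flow_key(packet: bytes) -> FlowKey:
    """Direction-independent five-tuple, so replies map to the same flow."""
    hdr_len = (packet[0] & 0x0F) * 4
    # TCP and UDP headers both start with source and destination ports.
    sport, dport = struct.unpack_from("!HH", packet, hdr_len)
    a, b = (packet[12:16], sport), (packet[16:20], dport)
    return (min(a, b), max(a, b), packet[9])


def label_flows(packets: Iterable[bytes]) -> Dict[FlowKey, str]:
    """Label each flow with the most frequent marking seen on its packets."""
    votes: Dict[FlowKey, Counter] = defaultdict(Counter)
    for pkt in packets:
        if pkt[9] not in (TCP, UDP):
            continue
        mark = read_mark(pkt)
        if mark is not None:
            votes[flow_key(pkt)][mark] += 1
    return {key: counter.most_common(1)[0][0] for key, counter in votes.items()}
```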

11 Validation results (1) – Success
- The validated method is the combined traffic classification method described in [1], with the classification of VoIP applications extended with ideas from [2]
- Accurately identified:
  - E-mail
  - File transfer
  - Streaming
  - Secure channel
  - Gaming traffic
- Success is due to:
  - Well-documented protocols
  - Open standards
  - Protocols that do not change constantly
- Difficulties?
  - Encryption: but the session initiation phase is critical, and this phase can be identified accurately
  - Success: SSH, SCP
Figure: The results of the classification method [1] compared to the reference measurement
[1] G. Szabo, I. Szabo and D. Orincsay: Accurate Traffic Classification
[2] M. Perenyi and S. Molnar: Enhanced Skype Traffic Identification

12 Validation results (2) – P2P
- Difficulties:
  - Many TCP flows contain only 1-2 SYN packets, probably sent to disconnected peers
    - These packets carry no payload, so signature-based methods cannot work
    - Dynamically allocated source ports are used towards destination ports that are not well known, so port-based methods fail
    - The server search and P2P communication heuristics of [1] also fail, because there are no other successful flows to such IPs
  - Some small non-P2P flows were also misclassified into the P2P class
    - The content of the port-application database is not fully correct
    - Creating too many port-application associations easily raises the misclassification ratio
  - P2P protocols change constantly
    - New features are added to P2P clients day by day
    - A working mechanism may be typical of a particular client rather than of the protocol itself
Figure: The results of the classification method [1] compared to the reference measurement

13 Validation results (3) – Philosophy
- Traffic that is derived from other traffic:
  - E.g., DNS traffic
  - MSN uses the HTTP protocol to transmit chat messages
  - The MSN client transmits advertisements over HTTP, but this cannot be regarded as deliberate web browsing
- Hit := the classification outcome and the generating application type (the validation outcome) agree
  - E.g., chat on DirectConnect hubs was classified as chat; this could arguably be considered correct, but in this comparison it was counted as a misclassification
Figure: The results of the classification method [1] compared to the reference measurement
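The hit definition above can be expressed as a per-class hit ratio. A minimal sketch, assuming that both the reference (derived from the marking) and the classifier output are available as application-type labels per flow; counting flows rather than packets or bytes is an assumption, and `hit_ratios` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Hashable, Mapping


def hit_ratios(reference: Mapping[Hashable, str],
               classified: Mapping[Hashable, str]) -> Dict[str, float]:
    """For each reference application type, the fraction of its flows where the
    classifier's label equals the reference label (a "hit")."""
    total: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for flow, ref_type in reference.items():
        total[ref_type] += 1
        if classified.get(flow) == ref_type:   # unclassified flows count as misses
            hits[ref_type] += 1
    return {t: hits[t] / total[t] for t in total}


# A DirectConnect hub chat flow labelled "Chat" by the classifier counts as a
# miss, because the generating application is a P2P client.
print(hit_ratios({"f1": "P2P", "f2": "P2P"}, {"f1": "P2P", "f2": "Chat"}))
# {'P2P': 0.5}
```

Weighting each flow by its packet or byte count instead of 1 would give packet- or byte-based hit ratios with the same structure.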

14 Validation results (4) – VoIP: MSN, Skype
- The high VoIP hit ratio is due to the successful identification of:
  - MSN Messenger
  - Skype
- Skype is difficult to identify:
  - Same problem as in the case of P2P
  - Proprietary protocol designed to ensure secure communication
  - Characteristic feature [2]: the application sends packets at an exact 20-second interval even when there is no ongoing call
  - [1] contains a P2P identification heuristic designed to track any message with periodicity in packet sending
  - Extending [1] was therefore straightforward
- The validation showed:
  - The deficiency of the classification of Skype → simple extension of the algorithm
  - The idea of [1] has been validated, as it proved robust to extension with the recognition of new applications
  - The validation mechanism also proved useful
Figure: The results of the classification method [1] compared to the reference measurement
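The periodicity feature from [2] can be illustrated with a simple check on the packet timestamps of a flow. This is a hedged sketch, not the heuristic of [1] or [2]: the function name, the 0.5-second tolerance and the requirement of three consecutive 20-second gaps are arbitrary assumptions, and it only detects keep-alive packets that follow each other directly, as during idle periods without an ongoing call.

```python
from typing import Sequence


def has_periodic_keepalive(timestamps: Sequence[float], period: float = 20.0,
                           tolerance: float = 0.5, min_repeats: int = 3) -> bool:
    """True if the packet timestamps (seconds) contain at least min_repeats
    consecutive inter-arrival gaps within tolerance of period."""
    ts = sorted(timestamps)
    run = 0
    for prev, cur in zip(ts, ts[1:]):
        if abs((cur - prev) - period) <= tolerance:
            run += 1                       # another gap matching the period
            if run >= min_repeats:
                return True
        else:
            run = 0                        # periodicity broken, start over
    return False
```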

15 Summary
- We introduced a new active measurement method that can help in the validation of traffic classification methods
- The introduced method is a network driver:
  - It marks the outgoing packets of the clients with an application-specific marking
- With the introduced method we created a measurement and used it to validate the method presented in [1]:
  - The method has been proven to work accurately
  - Some deficiencies were found in the classification of:
    - P2P applications
    - Skype
- Benefits:
  - It is independent of the classification methods
  - It provides reference information about each packet
  - The test is deterministic
  - Feasibility: large tests can be created in a highly automated way

16 Further work
- Use the marking method at the measurement side for online traffic classification
  - Assumptions:
    - All terminals accessing an operator's network have the proposed driver installed
    - The driver is made tamper-proof to prevent users from forging the marking
  - Online clustering of the traffic into QoS classes based on the resource requirements of the generating application
  - Operators could charge users on the basis of the applications they use
- Extension of the marking with other information about the traffic-generating application
  - E.g., version number: the operator could track the security risks of an old application

17 Questions, discussion…
- Thank you very much for your kind attention!
- Contact: geza.szabo@ericsson.com
