Presentation is loading. Please wait.

Presentation is loading. Please wait.

Traffic classification and applications to traffic monitoring Marco Mellia Electronic and Telecommunication Department Politecnico di Torino

Similar presentations


Presentation on theme: "Traffic classification and applications to traffic monitoring Marco Mellia Electronic and Telecommunication Department Politecnico di Torino"— Presentation transcript:

1 Traffic classification and applications to traffic monitoring Marco Mellia Electronic and Telecommunication Department Politecnico di Torino Email:mellia@tlc.polito.it 1

2 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Traffic Classification & Measurement ? Why?  Identify normal and anomalous behavior  Characterize the network and its users  Quality of service  Filtering  … How? How?  By means of passive measurement 2

3 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 3 Scenario Traffic classifier  Deep packet inspection  Statistical methods Persistent and scalable monitoring platform  Round Robin Database (RRD)  Histograms Internal Clients Edge Router External Servers http://tstat.tlc.polito.it

4 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Tstat at a Glance

5 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Worm and Viruses? Did someone open a Christmas card? Happy new year to Windows!!

6 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Anomalies (Good!) Spammer Disappear McColo SpamNet shut off on Tuesday, November 11th, 2008 Spammer Disappear McColo SpamNet shut off on Tuesday, November 11th, 2008

7 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 New Applications – P2PTV Fiorentina 4 - Udinese 2 Inter 1 - Juventus 0

8 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Megaupload blocked 19/01/12

9 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 How to monitor traffic? All previous examples rely on the availability of a CLASSIFIER  A tool that can discriminate classes of traffic Classification: the problem of assigning a class to an observation  The set of classes is pre-defined  The output may be correct or not 9

10 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Some terminology Question: Is this a cat, a rabbit, or a dog? 10

11 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Some terminology Question: Is this a cat, a rabbit, or a dog? 11

12 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 How to compute performance? Confusion matrix  On rows we have the actual class  On columns we have the predicted class Allows to see if some confusion arises 12 Predicted class CatDogRabbit Actual class Cat530 Dog231 Rabbit0211

13 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 How to compute performance? Confusion matrix True positive  It was classified as a cat, and it was a cat 13 Predicted class CatDogRabbit Actual class Cat530 Dog231 Rabbit0211

14 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 How to compute performance? Confusion matrix False negative  It was classified NOT as a cat, but it was a cat 14 Predicted class CatDogRabbit Actual class Cat530 Dog231 Rabbit0211

15 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 How to compute performance? Confusion matrix True negative  It was classified NOT as a cat, and it was NOT a cat 15 Predicted class CatDogRabbit Actual class Cat530 Dog231 Rabbit0211

16 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 How to compute performance? Confusion matrix False positive  It was classified as a cat, but it was NOT a cat 16 Predicted class CatDogRabbit Actual class Cat530 Dog231 Rabbit0211

17 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Other metrics Accuracy: is the ratio of the sum of all True Positives to the sum of all tests, for all classes. It is biased toward the most predominant class in a data set.  Consider for example a test to identify patients that suffer from a disease that affects 10 patient over 100 tests. The classifier that always returns ``sane'' will have accuracy of 90%. 17

18 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Predicted class CatDogRabbit Actual class Cat530 Dog231 Rabbit0211 Other metrics Recall of a class: is the ratio of the True Positives and the sum of True Positives and False Negatives.  Recall(cat)=5/(5+3+0)  It is a measure of the ability of a classifier to select instances of the given class from a data set 18

19 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Other metrics Precision of a class: is the ratio of True Positives and the sum of True Positives and False Positive  Precision(cat) = 5/(5+2+0)  It is a metric that measure how precise is the classifier in labeling only samples of a given class 19 Predicted class CatDogRabbit Actual class Cat530 Dog231 Rabbit0211

20 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Traffic classification Look at the packets… Tell me what protocol and/or application generated them

21 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Port: Port: 4662/4672 Port: Payload: “bittorrent” Payload: E4/E5 Payload: RTP protocol SkypeBittorrent GtalkeMule Typical approach: Deep Packet Inspection (DPI)

22 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 The problem of traffic classification Deep Packet Inspection  Based on looking for some pre-defined payload patterns, deep in the packet Simple at L2-L4  “if ethertype == 0x0800, then there is an IP packet”  Usually done with a set of if-then-else or even switch- case Ambiguous at L7  TCP port 80 does not mean automatically “protocol HTTP”

23 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 DPI: Rule-set complexity Practical rule-sets:  Snort, as of November 2007 8536 rules, 5549 Perl Compatible Regular Expressions  OpenDPI as of February 2012 118 protocols  Tstat as of February 2012 Approx 200 classes/services 23 Deep packet inspection Regular expression matching at line rate Finite Automata based techniques =

24 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Some notes... Protocol identification… … or application verification?  Skype can use the standard HTTP protocol to exchange data  Is that traffic “Skype” or “HTTP”? Today everything is going over HTTP  Is it Facebook? Twitter? YouTube video? Or HTTP?

25 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 The question Which granularity are you interested into ?? 25

26 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Several approaches to traffic classification Content- based Packet-based (e.g., Spatscheck) Port-based (stateless) Statistical methods Host social behaviour (e.g., Faloutsos) Traffic statistics (e.g., Salgarelli, Baiocchi, Moore, Mellia) Auto-learning methods (e.g. Bayes) Preclassified bins Message- based Protocol behaviour (e.g., BinPac, SML) Traffic classification Pre-computed or auto- learning signatures Payload-based (stateful)

27 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Some references

28 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Some references Some critical overview/tutorial  T.T.T.Nguyen, G.Armitage. A survey of techniques for internet traffic classification using machine learning, Communications Surveys & Tutorials, IEEE, V.10, N.4, pp.56 - 76  H.Kim, KC Claffy, M.Fomenkov, D.Barman, M.Faloutsos, K.Lee. Internet traffic classification demystified: myths, caveats, and the best practices. In Proceedings of the 2008 ACM CoNEXT Conference (CoNEXT '08). ACM, New York, NY, USA, 2008. For this class:  D. Bonfiglio, M. Mellia, M. Meo, D. Rossi, P. Tofanelli Revealing skype traffic: when randomness plays with you ACM SIGCOMM Kyoto, JP, ISBN: 978-1-59593-71, 27 August 2007.  A. Finamore, M. Mellia, M. Meo, D. Rossi KISS: Stochastic Packet Inspection 1st TMA Workshop, Aachen, 11 May 2009.  A. Finamore, M. Mellia, M. Meo, D. Rossi, KISS: Stochastic Packet Inspection Classifier for UDP Traffic, IEEE/ACM Transactions on Networking "5", Vol. 18, pp. 1505-1015, ISSN: 1063-6692, October 2010.  G. La Mantia, D. Rossi, A. Finamore, M. Mellia, M. Meo, Stochastic Packet Inspection for TCP Traffic, IEEE ICC, Cape Town, South Africa, 23 May 2010.

29 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Port: Port: 4662/4672 Port: Payload: “bittorrent” Payload: E4/E5 Payload: RTP protocol SkypeBittorrent GtalkeMule Typical approach: Deep Packet Inspection (DPI) It fails more and more: P2P Encryption Proprietary solution Many different flavours

30 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 The Failure of DPI 11.05.2008 12:29 eMule 0.49a released 1.08.2008 20:25 eMule 0.49b released

31 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Possible Solution: Behavioral Classifier Phase 1 Feature Phase 3 Verify 1. Statistical characterization of traffic 2. Look for the behaviour of unknown traffic and assign the class that better fits it 3. Check for possible classification mistakes Phase 2 Decision Traffic (Known) (Training) (Operation)

32 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Behavioural classifiers Which statistics?  Packet size Average, std, max, min Len of first X pkts  IPG Average, std, max, min IPG of first X+1 pkts  Total size, duration, #data packets  From client, from server, from both  RTT, #concurrent connection, rtx, dups, …  TCP options, flags, signaling, … Feature selection? Which decision process?  Ad Hoc  Bayesian  Neural Networks  Decision trees  SVM  … Which training set?  Supervised techniques 32

33 The case of Skype Consider a simple example 33

34 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Our Goal Identify Skype traffic Motivations  Operators need to know what is running in their network New business models, provisioning, TE, etc.  Understand user behaviour  Traffic characterization, security  …  It’s fun

35 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Skype Overview Skype offers voice, video, chat and data transfer services over IP Closed design, proprietary solutions  P2P technology  Proprietary protocols  Encrypted communications Easy to use, difficult to reveal  It is the perfect example of DPI failure No server No well-known port … No standard No RFC State-of-the-Art Encryption/Obfuscation Mechanisms

36 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Our Goal Identify Skype traffic  Voice stream first: both E2E and SkypeOut/In streams  Possible video/chat/file transfers/signaling Constraints  Passive observation of traffic  Protocol ignorance

37 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Three Classifiers Naïve Bayes Classifier Payload Based Classifier Chi Square Classifier Skype? Traffic Flow Skype? ANDAND

38 Phase 1 – try to understand it

39 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Skype as VoIP Application Skype selects the voice codec from a list  Low bit rate: 10-32 kbps  Regular Inter-Packet-Gap (30 ms frames) Redundancy may be added to mitigate packet loss Framing may be modified from the original codec one Multiplexes different source into the same message (voice, video, chat, … )

40 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Skype Source Model Skype Message TCP/UDP IP

41 Skype Header Formats (What we guess about it) Can we design a DPI classifier?

42 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Possible Skype Messages Signaling and data messages  Use TCP, with ciphered payload Login, lookup, signaling… Data flow  Use UDP whenever possible: payload is encrypted … but some header MUST be exposed Question:  Some header MUST be exposed…  Why?!?? Impossible to exploit. Everything is ciphered

43 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Possible Skype Messages Signaling and data messages  Use TCP, with ciphered payload Login, lookup, signaling… Data flow  Use UDP whenever possible: payload is encrypted … but some header MUST be exposed Source AES Receiver AES Impossible to exploit. Everything is ciphered Unreliable

44 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Skype Source Model Skype Message TCP/UDP IP

45 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 SoM Format for E2E Messages Start of Message (SoM) of End2End messages carried by UDP has: ID : 16 bits long random identifier FUNC : 5 bits long function (multiplexing?), obfuscated in a Byte 0 8 16 24 --------------------- | ID | FUNC | ---------------------

46 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 46 Function Values 0x01 = ??Query message 0x02 = ??Query 0x0d = Data 0x07 = NAK Voice Video Chat File

47 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 PBC SoM can be used to identify Skype flows carried by UDP  5bits long signature Question:  Which is the chance that you have a false positive? Classic signature based classifier

48 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 PBC SoM can be used to identify Skype flows carried by UDP  5bits long signature Question:  Which is the chance that you have a false positive? We look for 1 string out of 32 possible strings  1/32 of false detection possible Can we improve it by checking multiple packets  Yet we can have a very high false positive rate Classic signature based classifier

49 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 PBC Any other smart way of improving accuracy?  Hint: this is UDP 49

50 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 PBC SoM can be used to identify Skype flows carried by UDP  5bits long signature Classic signature based classifier

51 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 PBC SoM can be used to identify Skype flows carried by UDP  5bits long signature IMPROVE: Identify Skype socket address at clients  The UDP port is FIXED and not random (as in TCP)  Then, look for Skype flows with the same UDP port It works  with UDP only  at edge node only Complex Cannot discriminate VOICE/VIDEO/CHAT/DATA Classic signature based classifier

52 Skype Encrypts Traffic Can we leverage this?

53 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Skype Source Model Skype Message TCP/UDP IP

54 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Randomness Classifier Skype encrypts traffic  payload looks like random Some headers are constant ( FUNC ) Apply randomness test to the payload bits  Chi-Square test: statistic test for random sequences

55 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 55 CSC Split the payload into groups Apply the test on the values assumed at each group  Each message is an observation Some groups will contain  Random bits  Mixed bits  Deterministic bits 0 8 16 24 --------------------- | ID | FUNC | ---------------------

56 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 CSC Set a threshold

57 Skype is a VoIP Application Which are the features that make it different from a bulk download?

58 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Skype Source Model Skype Message TCP/UDP IP

59 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Which features? Question: Which features would you select to differentiate a VoIP stream from a data download? 59

60 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Sample Trace Regular IPG Small/regular packets

61 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Naive Baysean Classifier Simple classifier: based on the a-priori prob, evaluate the a-posteriori prob  How similar is this flow to a Skype voice flow? What makes “VoIP” traffic different from other traffic?  Packet size, i.e., small packets (packet NBC)  Inter-Packet-Gap, i.e., small IPG (IPG NBC)

62 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Design of the behavioral classifier We consider windows of N packets In each window, we check the IPG and the payloadsize distribution We compare against the expected distribution and compute a belief  One belief for each “mode” After K windows, take a decision  Consider the most likely mode in each windows  Average over all K windows to get the average max belief  Compare it against a threshold 62

63 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Skype Naive Bayes Classifier W (k) W (k+1) W (k+2) W (k+3) W (k+4) X Y max AVG min B s (k,j) E[B s (,j) ] B  (k) E[B  ] B Packet NBC IPG NBC IPG NBC``

64 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 NBC over Time Set a threshold

65 Results

66 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Three Classifiers Naïve Bayes Classifier Payload Based Classifier Chi Square Classifier Skype? Traffic Flow Skype? ANDAND UDP benchmark dataset

67 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Scenario Testbed traces: 100% accuracy Campus LAN@polito  Simple scenario, no P2P, no VoIP Italian ISP  Stiff scenario: lot of P2P, tons of VoIP Results consider  True positive (OK): Skype, and identified  False positive (FP): Not Skype, but identified  False negative (FN): Skype, but discarded

68 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Performance Evaluation: UDP NBC + CSC Chi Square Classifier Naïve Based Classifier Payload Based Classifier

69 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 CSC Threshold Impact

70 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 NBC Threshold Impact 0 1 2 3 4 5 -30-25-20-15-10-5 0 FP [%] E2E 0 20 40 60 80 100 -30-25-20-15-10-5 0 FN [%] Minimum Belief E2E

71 Kiss: chi square stocastich classifier or Stocastic Packet Inspection Generalize it 71

72 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Phase 1 Feature Phase 3 Verify Phase 2 Decision Traffic (Known) Our Approach Statistical characterization of bits in a flow Do NOT look at the SEMANTIC and TIMING … but rather look at the protocol FORMAT Test  2

73 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 04816192432 Source PortDestination Port Sequence Number Acknowledgment Number Checksum Options Window Urgent Pointer HLEN ResvControl flag Padding Question: Which protocol is this? CONSTANT COUNTER RANDOM

74 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 74 CSC Split the payload into groups Apply the test on the values assumed at each group  Each message is an observation Some groups will contain  Random bits  Mixed bits  Deterministic bits 0 8 16 24 --------------------- | ID | FUNC | ---------------------

75 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Chunking and  2 First N payload bytes C chunks Each of b bits  2 1  2 C [], …, Vector of Statistics The provides an implicit measure of entropy or randomness  2 Observed distribution Expected distribution (uniform)

76 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Consider a chunk of 2 bits: 0 1 2 3 Random Values Deterministic Value Counter OiOi and different beaviour EiEi

77 Question: Why comparing against a UNIFORM pdf?? 77

78 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 4 bit long chunks: evolution random xxxx  2

79 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 random Deterministic 0001 4 bit long chunks: evolution  2

80 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 random deterministic mixed x000x0x00xxx 4 bit long chunks: evolution  2 Question: is it dependent on WHICH bits are fixed And WHICH are random???

81 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 KISS

82 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Question: What is this?!?! 2 byte long counter MSGL2L1LSG Most Significant Group Less Significant Group

83 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Protocol format as seen from the  2

84 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Statistical characterization of bits in a flow Decision process Test Minimum distance / maximum likelihood  2 Phase 1 Feature Phase 3 Verify Phase 2 Decision Traffic (Known) Our Approach

85 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 C-dimension space  2 1  2 C [], …, Iperspace Classification Regions Euclidean Distance Support Vector Machine  2 i  2 j Class My Point

86 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Example considering the  2

87 3° Cost-TMA PhD school – Krakow – Feb. 15 2012  2 i  2 j Centroid Center of mass Euclidean Distance Classifier

88 3° Cost-TMA PhD school – Krakow – Feb. 15 2012  2 i  2 j True Negative Are “Far” True Positives Are “Nearby” Centroid Center of mass Euclidean Distance Classifier

89 3° Cost-TMA PhD school – Krakow – Feb. 15 2012  2 i  2 j False Positives Centroid Center of mass Iper-sphere Euclidean Distance Classifier

90 3° Cost-TMA PhD school – Krakow – Feb. 15 2012  2 i  2 j Centroid Center of mass Iper-sphere False negatives Radius Euclidean Distance Classifier

91 3° Cost-TMA PhD school – Krakow – Feb. 15 2012  2 i  2 j Centroid Center of mass Iper-sphere max { True Pos. } min { False Neg. } Confidence The distance is a measure of the condifence of the decision Euclidean Distance Classifier

92 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Radius True Positive – False positive How to define the sphere radius?

93 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Space of samples (dim. C) Kernel function Space of feature (dim. ∞) Kernel functions Move point so that borders are simple Support Vector Machine

94 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Support vectors Kernel functions Move point so that borders are simple Borders are planes Simple surface! Nice math Support Vectors LibSVM Support Vector Machine

95 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Decision Distance from the border Confidence is a probability p (  class ) Kernel functions Borders are planes Simple surface! Nice math Support Vectors LibSVM Support Vector Machine

96 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Performance evaluation How accurate is all this? Our Approach Phase 1 Feature Phase 3 Verify Phase 2 Decision Traffic (Known) Statistical characterization of bits in a flow Decision process Test Minimum distance / maximum likelihood  2

97 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Per flow and per endpoint What are we going to classify?  It can be applied to both single flows  And to endpoints Question:  Do we assume to monitor ALL packets?  Do we assume to monitor since the first packet? 97

98 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Per flow and per endpoint What are we going to classify?  It can be applied to both single flows  And to endpoints Question:  Do we assume to monitor ALL packets?  Do we assume to monitor since the FIRST packet? NO!  It is robust to sampling  It can start from any point 98

99 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Per flow and per endpoint 99

100 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Real traffic traces Internet Fastweb Known + Other Training Known Traffic False Negatives Unknown traffic False Positives Trace RTP eMule DNS Oracle (DPI + Manual ) other Other Unknown Traffic 1 day long trace 20 GByte di UDP traffic

101 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Definition of false positive/negative Traffic Oracle (DPI) eMule RTP DNS Other Classifing “known” true positives false negatives KISS true negatives false positives Classifing “other” KISS

102 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Case ACase B Rtp0.080.23 Edk13.037.97 Dns6.5719.19 Case ACase B 0.000.05 0.980.54 0.122.14 Case ACase B other13.617.01 Euclidean Distance SVM Case ACase B 0.000.18 Results Known traffic (False Neg.) [%] Other (False Pos.) [%]

103 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Tuning trainset size % True positives False positives Samples per class Small training set For “known”: 70-80 Mbyte For “other”: 300 Mbyte

104 3° Cost-TMA PhD school – Krakow – Feb. 15 2012  2 packets % True positives False positives Tuning Num. of Packets for (confidence 5%) Protocols with volumes at least 70-80 pkts per flow

105 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Real traffic trace RTP errors are oracle mistakes (do not identify RTP v1) DNS errors are due to impure training set (for the oracle all port 53 is DNS traffic) EDK errors are (maybe) Xbox Live (proper training for “other”) FN are always below 3%!!!

106 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 P2P-TV applications P2P-TV applications are becoming popular They heavly rely on UDP at the transport protocol They are based on proprietary protocols They are evolving over time very quickly How to identify them?... After 6 hours, KISS give you results

107 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Putting all together Now with  9 classes  3 different networks 107

108 Another example of behavioral classifier Abacus 108

109 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Abacus: Rationale Applications are like people in a party room  Some prefer brief exchanges with many other people  Some likes long talks with few other people “Attitudes” are different across P2P applications...  Some prefer to download small pieces of data from many peers  Some prefer to download all data from almost the same peers... enough to classify them  Observe a host for a given time  Count the number of peers contacted and the number of packets exchanged which represent the attitude

110 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Abacus signature definition Consider a host X which in a fixed time- window ΔT = 5s is contacted by N=5 peers Y i for each peer Y i count the number of packets sent to X in ΔT Consider a set of bins of exponential width Divide the peers in bins according to the number of exchanged packets Normalize the bins, i.e. divide for the total number of peers N The final signature is an empirical probability distribution function In the example N=5, bins = (1, 0, 2, 2)‏ Abacus signature (0.2, 0, 0.4, 0.4)‏ X Y1Y1 Y2Y2 123-45-8... Y3Y3 Y5Y5 Y4Y4 9-16

111 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Signature comparison PPLiveTVAnts Joost SopCast

112 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Performance evaluation How accurate is all this? Our Approach Phase 1 Feature Phase 3 Verify Phase 2 Decision Traffic (Known) Statistical characterization of bits in a flow ABACUS signatures Decision process Supervised machine learning based on SVM

113 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Hyper-space is partitioned  every point is given a label  even “unknown” apps Need a way to recognize them  Define a center for each class  Define a threshold R  Calculate the distance d between the point and the center of the assigned class  If d > R mark the new point as unknown Bhattacharyya distance BD  Distance between p.d.f. Rejection criterion Training points R R Center of the class New points Labeled as “unknown” Labeled as “green”

114 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Experimental results

115 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Experimental results For “unknown traffic” the selection of the rejection threshold R is fundamental! For R~0 low TPR low FPR For R~1 high TPR high FPR For R=0.5 high TPR low FPR

116 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 The Failure of DPI

117 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Question you should ask yourself Which feature to use?  Are those “portable”?  How often should the classifier be retrained?  Can it take a decision after some packets? Open issues  Which is a good training set?  How to get a valid benchmarking set? Never trust other classifiers!  How to make it work at 100Gb/s?  How much traffic can it actually classify (coverage)?

118 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 How Much Traffic can be Classified? 118

119 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Interesting research topics And for TCP? And for HTTP?  How to get Facebook or Twitter  Over HTTPS?  When served by the same CDN? And for the application?  Is it a video seen on Facebook? Can I get rid of the training set?  Use unsupervised classifiers? 119

120 120

121 Ops, wrong key 121

122 And for TCP? 122

123 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Chunking and  2 First N payload bytes C chunks Each of b bits  2 1  2 C [], …, Vector of Statistics The provides an implicit measure of entropy or randomness  2 Observed distribution Expected distribution (uniform)

124 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Results 124

125 3° Cost-TMA PhD school – Krakow – Feb. 15 2012 Results 125


Download ppt "Traffic classification and applications to traffic monitoring Marco Mellia Electronic and Telecommunication Department Politecnico di Torino"

Similar presentations


Ads by Google