Internet Traffic Classification KISS Dario Bonfiglio, Alessandro Finamore, Marco Mellia, Michela Meo, Dario Rossi 1.


1 Internet Traffic Classification: KISS. Dario Bonfiglio, Alessandro Finamore, Marco Mellia, Michela Meo, Dario Rossi

2 Traffic Classification & Measurement
Why?
- Identify normal and anomalous behavior
- Characterize the network and its users
- Quality of service
- Filtering
- …
How?
- By means of passive measurement
- Using Tstat

3 Tstat (http://tstat.tlc.polito.it)
Traffic classifier
- Deep packet inspection
- Statistical methods
Persistent and scalable monitoring platform
- Round Robin Database (RRD)
- Histograms
(Diagram: Internal Clients, Edge Router, External Servers)

4 Tstat at a Glance

5 Worms and Viruses? Did someone open a Christmas card? Happy new year to Windows!!

6 Anomalies (Good!) Spammers disappear: McColo SpamNet shut off on Tuesday, November 11th, 2008

7 New Applications – P2PTV (plot annotations: Fiorentina 4 - Udinese 2; Inter 1 - Juventus 0)

8 Traffic classification: look at the packets… and tell me what protocol and/or application generated them

9 Typical approach: Deep Packet Inspection (DPI)
Applications shown: Skype, Bittorrent, Gtalk, eMule
Signatures shown: Port: 4662/4672; Payload: “bittorrent”; Payload: E4/E5; Payload: RTP protocol
It fails more and more:
- P2P
- Encryption
- Proprietary solutions
- Many different flavours

10 The Failure of DPI
- 11.05.2008 12:29: eMule 0.49a released
- 1.08.2008 20:25: eMule 0.49b released

11 Possible Solution: Behavioral Classifier
Pipeline: Traffic (Known) → Phase 1 Feature → Phase 2 Decision → Phase 3 Verify (diagram labels: Training, Operation)
1. Statistical characterization of traffic (given source)
2. Look at the behaviour of unknown traffic and assign the class that best fits it
3. Check for possible classification mistakes

12 Our Approach
Pipeline: Traffic (Known) → Phase 1 Feature → Phase 2 Decision → Phase 3 Verify
Phase 1: statistical characterization of bits in a flow, using the χ² test
Do NOT look at the SEMANTIC and TIMING … but rather look at the protocol FORMAT

13 Chunking and χ²
Take the first N payload bytes and split them into C chunks, each of b bits. Computing one χ² per chunk gives the vector of statistics [χ²_1, …, χ²_C].
The χ² compares the observed distribution with the expected (uniform) distribution, so it provides an implicit measure of entropy or randomness.
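The chunking step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `chi_square_vector` and the defaults `n_bytes=12` and `bits=4` are chosen for the example (the slide leaves N, C and b symbolic), and payloads shorter than N bytes are simply skipped.

```python
from typing import List


def chi_square_vector(payloads: List[bytes], n_bytes: int = 12, bits: int = 4) -> List[float]:
    """Return one chi-square statistic per b-bit chunk of the first n_bytes.

    Each payload is one sample. Defaults are assumptions for this sketch.
    """
    if bits not in (1, 2, 4, 8):
        raise ValueError("this sketch only handles chunk sizes that divide a byte")
    per_byte = 8 // bits            # chunks per payload byte
    n_chunks = n_bytes * per_byte   # C in the slide
    n_values = 1 << bits            # 2^b possible values per chunk

    # counts[c][v] = how many payloads had value v in chunk c (the O_i counters)
    counts = [[0] * n_values for _ in range(n_chunks)]
    used = 0
    for payload in payloads:
        if len(payload) < n_bytes:
            continue  # assumption: ignore payloads shorter than N bytes
        used += 1
        for c in range(n_chunks):
            byte = payload[c // per_byte]
            shift = 8 - bits * (c % per_byte + 1)   # most significant chunk first
            counts[c][(byte >> shift) & (n_values - 1)] += 1
    if used == 0:
        raise ValueError("no payload is long enough")

    expected = used / n_values      # uniform expectation E_i
    return [sum((o - expected) ** 2 / expected for o in row) for row in counts]
```

For instance, 256 payloads whose first byte is always 0x10 and whose second byte takes every value once give two large statistics (constant chunks) followed by two zeros (perfectly uniform chunks).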

14 Consider a chunk of 2 bits, with possible values 0, 1, 2, 3. The counters O_i show different behaviour for random values and for a deterministic value.
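For reference, the statistic behind the counters O_i is the standard Pearson χ² against a uniform expectation. The symbol n (number of observed samples for the chunk) is introduced here for illustration; it does not appear on the slide.

```latex
\chi^2 = \sum_{i=0}^{2^b-1} \frac{(O_i - E_i)^2}{E_i},
\qquad E_i = \frac{n}{2^b}

% Deterministic chunk (one value observed n times):
\chi^2 = n\,(2^b - 1)

% Uniformly random chunk: the expected value equals the degrees of freedom
\mathbb{E}\!\left[\chi^2\right] = 2^b - 1
```

With b = 2, a deterministic chunk therefore grows as 3n, while a random chunk stays around 3 regardless of n.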

15 4-bit long chunks, evolution of χ²: random (xxxx)

16 4-bit long chunks, evolution of χ²: deterministic (0001)

17 4-bit long chunks, evolution of χ²: random, deterministic, and mixed (x000, x0x0, 0xxx)
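The three behaviours on these slides can be reproduced with a small simulation. It is a sketch under assumptions: the helper names, the checkpoints and the seed are illustrative, and the "mixed" chunk uses the single pattern x000 (one random bit, three fixed bits).

```python
import random


def chi_square(values, bits=4):
    """Pearson chi-square of observed chunk values against a uniform expectation."""
    n_values = 1 << bits
    counts = [0] * n_values
    for v in values:
        counts[v] += 1
    expected = len(values) / n_values
    return sum((o - expected) ** 2 / expected for o in counts)


def evolution(kind, checkpoints=(100, 400, 1600), seed=0):
    """Return chi-square after each checkpoint number of samples of a 4-bit chunk."""
    rng = random.Random(seed)
    values, out = [], []
    for n in checkpoints:
        while len(values) < n:
            if kind == "random":            # xxxx
                values.append(rng.getrandbits(4))
            elif kind == "deterministic":   # 0001
                values.append(0b0001)
            elif kind == "mixed":           # x000: top bit random, rest fixed
                values.append(rng.getrandbits(1) << 3)
            else:
                raise ValueError(f"unknown kind: {kind}")
        out.append(chi_square(values))
    return out


if __name__ == "__main__":
    for kind in ("random", "deterministic", "mixed"):
        print(kind, [round(x, 1) for x in evolution(kind)])
```

The deterministic chunk grows linearly as 15n, the x000 chunk grows linearly too but never below 7n, and the random chunk stays small however many samples are added.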

18 Chi Square Classifier
Split the payload into groups. Apply the test on the groups at the flow end: each message is a sample.
Some groups will contain:
- Random bits
- Mixed bits
- Deterministic bits
Example message layout (bit offsets 0, 8, 16, 24): | ID | FUNC |

19 CSC

20 And the counter example? A 2-byte long counter is split into four groups: MSG, L2, L1, LSG (MSG = Most Significant Group, LSG = Least Significant Group)
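A quick way to see what the counter example implies is to feed an incrementing 16-bit counter through the per-group χ². The sketch below assumes 4-bit groups and a counter that increases by one per message; the function name and the `start` parameter are illustrative.

```python
def counter_group_chi_square(n_messages, start=0):
    """Chi-square of the four 4-bit groups of a 16-bit counter incremented per message."""
    names = ["MSG", "L2", "L1", "LSG"]
    counts = {name: [0] * 16 for name in names}
    for k in range(n_messages):
        counter = (start + k) & 0xFFFF
        # Split the counter into groups, most significant first.
        groups = [(counter >> shift) & 0xF for shift in (12, 8, 4, 0)]
        for name, value in zip(names, groups):
            counts[name][value] += 1
    expected = n_messages / 16
    return {
        name: sum((o - expected) ** 2 / expected for o in row)
        for name, row in counts.items()
    }


if __name__ == "__main__":
    print(counter_group_chi_square(256))
```

Over 256 messages the two least significant groups cycle through all values evenly (χ² of 0), while the two most significant groups never change and look deterministic.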

21 Protocol format as seen from the χ²

22 Our Approach
Pipeline: Traffic (Known) → Phase 1 Feature → Phase 2 Decision → Phase 3 Verify
Phase 1: statistical characterization of bits in a flow, using the χ² test
Phase 2: decision process, by minimum distance / maximum likelihood

23 C-dimension space [χ²_1, …, χ²_C]: a hyperspace
"My point" is placed in the space (shown on the χ²_i, χ²_j plane) and assigned to a class through classification regions.
Two options: Euclidean Distance, Support Vector Machine

24 Example considering the χ²

25 Euclidean Distance Classifier: in the (χ²_i, χ²_j) plane, each class is represented by its centroid (center of mass)

26 Euclidean Distance Classifier: true positives are “nearby” the centroid (center of mass); true negatives are “far”

27 Euclidean Distance Classifier: a hypersphere is drawn around the centroid (center of mass); false positives are marked

28 Euclidean Distance Classifier: the hypersphere has a given radius around the centroid (center of mass); false negatives are marked

29 Euclidean Distance Classifier: the hypersphere around the centroid (center of mass) trades off min { False Pos. } against min { False Neg. }
Confidence: the distance is a measure of the confidence of the decision
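The centroid-and-radius idea on these slides can be sketched as follows. This is a minimal illustration, not the classifier used in the talk: the function names are hypothetical, a single radius shared by all classes is an assumption, and points farther than the radius from every centroid are labelled "other".

```python
import math


def centroid(points):
    """Center of mass of a list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train(training):
    """training maps a class label to its list of chi-square vectors."""
    return {label: centroid(vectors) for label, vectors in training.items()}


def classify(model, point, radius):
    """Return (label, distance); the distance doubles as a confidence measure."""
    label, d = min(
        ((lbl, distance(c, point)) for lbl, c in model.items()),
        key=lambda pair: pair[1],
    )
    if d > radius:
        return "other", d   # outside every hypersphere
    return label, d


if __name__ == "__main__":
    model = train({"rtp": [[0, 0], [2, 0]], "dns": [[10, 10], [10, 12]]})
    print(classify(model, [1, 1], radius=3))    # close to the rtp centroid
    print(classify(model, [50, 50], radius=3))  # far from both centroids
```

A larger radius accepts more points of the class (fewer false negatives) but also more foreign points (more false positives), which is the trade-off the slide describes.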

30 How to define the sphere radius? (Plot axes: Radius; True Positive – False positive)

31 Support Vector Machine: kernel functions map the space of samples (dim. C) to the space of features (dim. ∞). Move points so that borders are simple.

32 Support Vector Machine: kernel functions move points so that borders are simple. Borders are planes: simple surface, nice math! The borders are defined by the support vectors. Tool: LibSVM.

33 Support Vector Machine: the decision is based on the distance from the border. Confidence is a probability p(class). Kernel functions; borders are planes (simple surface, nice math); support vectors; LibSVM.

34 Our Approach
Pipeline: Traffic (Known) → Phase 1 Feature → Phase 2 Decision → Phase 3 Verify
Phase 1: statistical characterization of bits in a flow, using the χ² test
Phase 2: decision process, by minimum distance / maximum likelihood
Phase 3: performance evaluation. How accurate is all this?

35 Per flow and per endpoint
What are we going to classify?
- It can be applied to single flows
- And to endpoints
It is robust to sampling
- It does not require monitoring all packets, nor the first packets

36 Real traffic traces
Trace: 1-day-long trace captured between Fastweb and the Internet, 20 GByte of UDP traffic
Oracle (DPI + manual) labels: RTP, eMule, DNS (known traffic) and other (unknown traffic)
Training uses known + other traffic
Known traffic is used to measure false negatives; other (unknown) traffic is used to measure false positives

37 Definition of false positive/negative
The oracle (DPI) labels traffic as eMule, RTP, DNS, or other; KISS then classifies it.
- Classifying “known” traffic: true positives, false negatives
- Classifying “other” traffic: true negatives, false positives
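These definitions translate directly into a small scoring routine. The sketch below assumes per-flow labels from the oracle and from the classifier as two parallel lists; the function name and the choice to count a known flow assigned to a different known class as a false negative are assumptions made for the example.

```python
def error_rates(oracle_labels, kiss_labels, known=("eMule", "RTP", "DNS")):
    """Return (false_negative_pct, false_positive_pct).

    False negatives are measured over flows the oracle marks as known;
    false positives over flows the oracle marks as other.
    """
    fn = fp = n_known = n_other = 0
    for truth, predicted in zip(oracle_labels, kiss_labels):
        if truth in known:
            n_known += 1
            if predicted != truth:
                fn += 1   # known traffic that was not recognised as its class
        else:
            n_other += 1
            if predicted in known:
                fp += 1   # other traffic mistaken for a known class
    fn_pct = 100.0 * fn / n_known if n_known else 0.0
    fp_pct = 100.0 * fp / n_other if n_other else 0.0
    return fn_pct, fp_pct


if __name__ == "__main__":
    oracle = ["RTP", "RTP", "DNS", "eMule", "other", "other", "other", "other"]
    kiss = ["RTP", "other", "DNS", "eMule", "other", "RTP", "other", "other"]
    print(error_rates(oracle, kiss))
```

In the toy lists above, one of four known flows is missed and one of four other flows is wrongly accepted.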

38 Results

Known traffic (False Neg.) [%]
         Euclidean Distance     SVM
         Case A    Case B       Case A    Case B
Rtp       0.08      0.23         0.00      0.05
Edk      13.03      7.97         0.98      0.54
Dns       6.57     19.19         0.12      2.14

Other (False Pos.) [%]
         Euclidean Distance     SVM
         Case A    Case B       Case A    Case B
other    13.61      7.01         0.00      0.18

39 Real traffic trace
- RTP errors are oracle mistakes (the oracle does not identify RTP v1)
- DNS errors are due to an impure training set (for the oracle, all port 53 traffic is DNS)
- EDK errors are (maybe) Xbox Live (proper training for “other”)
FN are always below 3%!

40 Tuning trainset size
(Plot: % true positives and false positives versus samples per class, confidence 5%)
Small training set:
- For “known”: 70-80 Mbyte
- For “other”: 300 Mbyte

41 Tuning number of packets for χ²
(Plot: % true positives and false positives versus χ² packets, confidence 5%)
Protocols with volumes of at least 70-80 pkts per flow

42 P2P-TV applications
- P2P-TV applications are becoming popular
- They heavily rely on UDP as the transport protocol
- They are based on proprietary protocols
- They are evolving very quickly over time
How to identify them? ... After 6 hours, KISS gives you results

43 The Failure of DPI

44 And for TCP?

45 Chunking and χ²
Take the first N payload bytes and split them into C chunks, each of b bits. Computing one χ² per chunk gives the vector of statistics [χ²_1, …, χ²_C].
The χ² compares the observed distribution with the expected (uniform) distribution, so it provides an implicit measure of entropy or randomness.

46 Results

47 Results

48 Pros and Cons
KISS is good because…
- Blind approach
- Completely automated
- Works with many protocols
- Works even with small training
- Statistics can start at any point
- Robust w.r.t. packet drops
- Bypasses some DPI problems
but…
- Learn (other) properly
- Needs volumes of traffic
- May require memory (for now)
- Only UDP (for now)
- Only offline (for now)

49 Papers
- D. Bonfiglio, M. Mellia, M. Meo, D. Rossi, P. Tofanelli, “Revealing skype traffic: when randomness plays with you”, ACM SIGCOMM, Kyoto, JP, August 2007
- D. Rossi, M. Mellia, M. Meo, “A Detailed Measurement of Skype Network Traffic”, 7th International Workshop on Peer-to-Peer Systems (IPTPS '08), Tampa Bay, Florida, February 2008
- D. Bonfiglio, M. Mellia, M. Meo, N. Ritacca, D. Rossi, “Tracking Down Skype Traffic”, IEEE Infocom, Phoenix, AZ, 15,17 April 2008
- D. Bonfiglio, M. Mellia, M. Meo, D. Rossi, “Detailed Analysis of Skype Traffic”, IEEE Transactions on Multimedia, Vol. 11, No. 1, pp. 117-127, ISSN: 1520-9210, January 2009
- A. Finamore, M. Mellia, M. Meo, D. Rossi, “KISS: Stochastic Packet Inspection”, 1st Traffic Monitoring and Analysis (TMA) Workshop, Aachen, 11 May 2009

50 And for TCP

