Internet Traffic Classification: KISS
Dario Bonfiglio, Alessandro Finamore, Marco Mellia, Michela Meo, Dario Rossi

Traffic Classification & Measurement
Why?
- Identify normal and anomalous behavior
- Characterize the network and its users
- Quality of service
- Filtering
- …
How?
- By means of passive measurement
- Using Tstat

Tstat
- Traffic classifier
  - Deep packet inspection
  - Statistical methods
- Persistent and scalable monitoring platform
  - Round Robin Database (RRD)
  - Histograms
[Figure: Tstat monitoring at the edge router, between internal clients and external servers]

Tstat at a Glance

Worms and Viruses? Did someone open a Christmas card? Happy new year to Windows!

Anomalies (Good!)
Spammers disappear: the McColo SpamNet was shut off on Tuesday, November 11th, 2008.

New Applications: P2P-TV
[Figure: P2P-TV traffic during the matches Fiorentina 4 - Udinese 2 and Inter 1 - Juventus 0]

Traffic Classification
Look at the packets… and tell me which protocol and/or application generated them.

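Typical approach: Deep Packet Inspection (DPI)
Examples: Skype, Bittorrent, Gtalk, eMule
- Port: e.g., eMule on ports 4662/4672
- Payload: e.g., eMule messages starting with E4/E5, Bittorrent messages carrying the string “bittorrent”, Gtalk using the RTP protocol
DPI fails more and more often, because of:
- P2P
- Encryption
- Proprietary solutions
- Many different flavours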
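To make the DPI idea concrete, the following toy rule set matches on ports and payload signatures. It is a minimal sketch, assuming the eMule ports and first-byte markers listed on the slide and a case-insensitive “bittorrent” string match; `naive_dpi` is a hypothetical helper, not Tstat's classifier.

```python
from typing import Optional

# Hypothetical toy rules built from the signatures on the slide (assumption:
# eMule uses ports 4662/4672 and payloads whose first byte is 0xE4 or 0xE5).
EMULE_PORTS = {4662, 4672}
EMULE_FIRST_BYTES = {0xE4, 0xE5}


def naive_dpi(dst_port: int, payload: bytes) -> Optional[str]:
    """Return a protocol label if a port or payload signature matches, else None."""
    if dst_port in EMULE_PORTS or (payload and payload[0] in EMULE_FIRST_BYTES):
        return "emule"
    if b"bittorrent" in payload.lower():
        return "bittorrent"
    return None  # no rule matched: encrypted or proprietary traffic slips through


print(naive_dpi(4672, b"\xe4\x00"))                  # emule
print(naive_dpi(6881, b"\x13BitTorrent protocol"))   # bittorrent
print(naive_dpi(6881, bytes([0x9A, 0x11, 0x42])))    # None
```

An encrypted or obfuscated payload on a non-standard port matches no rule, which is exactly the failure mode listed above.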

The Failure of DPI
[Figure: DPI classification over time, annotated with the releases of eMule 0.49a and eMule 0.49b]

Possible Solution: Behavioral Classifier
1. Phase 1, Feature (training on known traffic): statistical characterization of the traffic of a given source
2. Phase 2, Decision (operation): look at the behaviour of unknown traffic and assign the class that fits it best
3. Phase 3, Verify: check for possible classification mistakes

Our Approach
Phase 1, Feature: statistical characterization of the bits in a flow, by means of the χ² test
- Do NOT look at the SEMANTICS and TIMING…
- …but rather look at the protocol FORMAT

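Chunking and χ²
- Take the first N payload bytes and split them into C chunks of b bits each
- Compute one χ² per chunk, giving the vector of statistics χ² = [χ²_1, …, χ²_C]
- The χ² compares the observed distribution of chunk values with the expected (uniform) distribution, and so provides an implicit measure of entropy or randomness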
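The χ² used here is the standard Pearson test against the uniform distribution. Written out, assuming chunk c is observed in n messages and O_{c,i} counts how often it takes the value i (symbols introduced here for clarity, not taken from the slides):

```latex
\chi^2_c = \sum_{i=0}^{2^b-1} \frac{\left(O_{c,i} - E\right)^2}{E},
\qquad E = \frac{n}{2^b},
\qquad \boldsymbol{\chi}^2 = \left[\chi^2_1, \dots, \chi^2_C\right]
```

For a chunk of uniformly random bits, χ²_c fluctuates around its mean 2^b − 1 (the degrees of freedom); for a constant chunk, one count equals n and the others are 0, so χ²_c reaches its maximum n(2^b − 1).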

Consider a chunk of 2 bits taking random values, a deterministic value, or the values of a counter: the observed counts O_i, and hence the χ², behave differently in each case.
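The three 2-bit cases can be reproduced numerically. This is a minimal sketch, assuming the Pearson χ² against a uniform distribution and synthetic chunk values (the sample size and random seed are arbitrary choices):

```python
import random


def chi_square(values, bits):
    """Pearson chi-square of the observed chunk values against a uniform distribution."""
    k = 2 ** bits                  # number of possible chunk values
    expected = len(values) / k     # expected count per value under the uniform hypothesis
    observed = [0] * k
    for v in values:
        observed[v] += 1
    return sum((o - expected) ** 2 / expected for o in observed)


n = 1000
rng = random.Random(42)
random_chunk = [rng.randrange(4) for _ in range(n)]  # 2 random bits
deterministic_chunk = [2] * n                         # constant value "10"
counter_chunk = [i % 4 for i in range(n)]             # 2-bit wrapping counter

print(chi_square(random_chunk, 2))         # small compared with the deterministic case
print(chi_square(deterministic_chunk, 2))  # 3000.0 = n * (2**bits - 1)
print(chi_square(counter_chunk, 2))        # 0.0: every value seen exactly n/4 times
```

Note that the counter scores even lower than the random chunk: at the end of the flow both look "uniform", so a single χ² value alone cannot tell them apart.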

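[Figure: evolution of χ² over time for 4-bit chunks of random bits (xxxx)]

[Figure: evolution of χ² over time for deterministic 4-bit chunks]

[Figure: evolution of χ² over time for random, deterministic, and mixed 4-bit chunks (bit patterns x000, x0x0, 0xxx, where x marks a random bit)]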

Chi-Square Classifier
- Split the payload into groups
- Apply the test on the groups at the end of the flow: each message is a sample
- Some groups will contain:
  - Random bits
  - Mixed bits
  - Deterministic bits
[Figure: example message header with fields | ID | FUNC |]
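The chunking step and the per-chunk test can be sketched as follows. This is a minimal illustration, not the authors' implementation: the defaults N = 12 bytes and b = 4 bits, the zero-padding of short payloads, and the sample flow are all assumptions chosen for the example (b should divide 8·N).

```python
import random
from typing import List, Sequence


def split_chunks(payload: bytes, n_bytes: int, bits: int) -> List[int]:
    """Cut the first n_bytes of a payload into consecutive chunks of `bits` bits."""
    head = payload[:n_bytes].ljust(n_bytes, b"\x00")  # assumption: zero-pad short payloads
    value = int.from_bytes(head, "big")
    total_bits = 8 * n_bytes
    mask = (1 << bits) - 1
    return [(value >> (total_bits - (c + 1) * bits)) & mask
            for c in range(total_bits // bits)]


def chi_square_vector(messages: Sequence[bytes], n_bytes: int = 12, bits: int = 4) -> List[float]:
    """One chi-square value per chunk; every message of the flow is one sample."""
    if not messages:
        raise ValueError("need at least one message")
    k = 2 ** bits
    n_chunks = (8 * n_bytes) // bits
    counts = [[0] * k for _ in range(n_chunks)]
    for msg in messages:
        for c, chunk in enumerate(split_chunks(msg, n_bytes, bits)):
            counts[c][chunk] += 1
    expected = len(messages) / k
    return [sum((o - expected) ** 2 / expected for o in row) for row in counts]


# Hypothetical flow: a fixed 1-byte ID (0x17) followed by 11 random bytes.
rng = random.Random(1)
flow = [b"\x17" + bytes(rng.randrange(256) for _ in range(11)) for _ in range(200)]
vec = chi_square_vector(flow)
print(len(vec))               # 24 chunks of 4 bits
print(vec[0], vec[1])         # 3000.0 3000.0: the ID nibbles are deterministic
print(max(vec[2:]) < 3000.0)  # True: the random chunks stay far lower
```

The resulting vector is the flow's "fingerprint": deterministic fields such as an ID show up as large values, random fields as small ones.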

CSC (Chi-Square Classifier)

And the counter example?
A 2-byte counter, split into the groups MSG, L2, L1, LSG (from the Most Significant Group to the Least Significant Group)
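The counter case can be checked by splitting a 16-bit counter into four 4-bit groups and testing each one. This is a minimal sketch, assuming a hypothetical counter that starts at 1000 and is carried by 320 consecutive messages:

```python
def chi_square(values, bits=4):
    """Pearson chi-square of observed values against a uniform distribution."""
    k = 2 ** bits
    expected = len(values) / k
    observed = [0] * k
    for v in values:
        observed[v] += 1
    return sum((o - expected) ** 2 / expected for o in observed)


# Hypothetical 2-byte counter carried by 320 consecutive messages, starting at 1000.
counter = [(1000 + i) & 0xFFFF for i in range(320)]

# Split each 16-bit value into four 4-bit groups, from MSG down to LSG.
names = ["MSG", "L2", "L1", "LSG"]
shifts = [12, 8, 4, 0]
scores = [chi_square([(v >> s) & 0xF for v in counter]) for s in shifts]
for name, score in zip(names, scores):
    print(f"{name}: {score:.1f}")
# MSG: 4800.0   (constant nibble, looks deterministic)
# L2: 3065.6
# L1: 32.0
# LSG: 0.0      (cycles through all 16 values evenly)
```

The χ² decreases from the most to the least significant group, so the counter leaves a recognizable gradient in the vector rather than looking uniformly random.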

Protocol format as seen by the χ²

Our Approach
1. Phase 1, Feature: statistical characterization of the bits in a flow (χ² test)
2. Phase 2, Decision: decision process (minimum distance / maximum likelihood)
3. Phase 3, Verify

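Decision in the C-dimensional space of χ² = [χ²_1, …, χ²_C]
- The hyperspace is partitioned into classification regions, one per class; a point is assigned to the class of the region it falls in
- Two ways to define the regions: Euclidean distance and Support Vector Machine
[Figure: a point and the classification regions in the (χ²_i, χ²_j) plane]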

Example considering the χ²

 2 i  2 j Centroid Center of mass Euclidean Distance Classifier

 2 i  2 j True Negative Are “Far” True Positives Are “Nearby” Centroid Center of mass Euclidean Distance Classifier

 2 i  2 j False Positives Centroid Center of mass Iper-sphere Euclidean Distance Classifier

 2 i  2 j Centroid Center of mass Iper-sphere False negatives Radius Euclidean Distance Classifier

 2 i  2 j Centroid Center of mass Iper-sphere min { False Pos. } min { False Neg. } Confidence The distance is a measure of the condifence of the decision Euclidean Distance Classifier

How to define the sphere radius?
[Figure: true positives and false positives as a function of the radius]
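One way to choose the radius is to sweep candidate values and compare true-positive and false-positive rates on labeled data. This is a hedged sketch: the distances are hypothetical, and the selection criterion (maximize TP rate minus FP rate) is an assumption, not necessarily the one used by KISS.

```python
from typing import List, Tuple


def sweep_radius(known_dist: List[float], other_dist: List[float],
                 candidates: List[float]) -> List[Tuple[float, float, float]]:
    """For each candidate radius, return (radius, TP rate, FP rate)."""
    rows = []
    for r in candidates:
        tp = sum(d <= r for d in known_dist) / len(known_dist)  # class samples inside
        fp = sum(d <= r for d in other_dist) / len(other_dist)  # other samples inside
        rows.append((r, tp, fp))
    return rows


def pick_radius(rows: List[Tuple[float, float, float]]) -> float:
    """Assumed criterion: maximize TP rate minus FP rate."""
    return max(rows, key=lambda row: row[1] - row[2])[0]


# Hypothetical distances from a class centroid.
known = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
other = [8, 12, 15, 20, 25, 30, 35, 40, 45, 50]
rows = sweep_radius(known, other, [2, 5, 10, 20, 50])
for r, tp, fp in rows:
    print(f"radius={r:>2}  TP={tp:.0%}  FP={fp:.0%}")
print(pick_radius(rows))  # 10
```

A larger radius always raises both rates, so any criterion has to weigh how many false positives are acceptable for each extra true positive.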

Support Vector Machine
- Kernel functions map the space of samples (dimension C) into a space of features (dimension ∞), moving the points so that the borders become simple
- In the feature space the borders are planes: simple surfaces, nice math
- The borders are defined by the support vectors (implementation: LibSVM)
- Decision: distance from the border; the confidence is a probability, p(class)

Our Approach
1. Phase 1, Feature: statistical characterization of the bits in a flow (χ² test)
2. Phase 2, Decision: decision process (minimum distance / maximum likelihood)
3. Phase 3, Verify: performance evaluation. How accurate is all this?

Per Flow and Per Endpoint
What are we going to classify?
- KISS can be applied both to single flows and to endpoints
- It is robust to sampling: it requires neither monitoring all packets nor seeing the first packets of a flow
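The difference between the two granularities lies only in how messages are grouped before computing the χ² vector. This is a minimal sketch, assuming unidirectional UDP flows keyed by (src_ip, src_port, dst_ip, dst_port) and endpoints keyed by destination (IP, port); the `Packet` type, helpers, and addresses are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Packet(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    payload: bytes


def group_by_flow(packets: List[Packet]) -> Dict[Tuple[str, int, str, int], List[bytes]]:
    """One sample set per (src_ip, src_port, dst_ip, dst_port) flow (one direction only)."""
    groups = defaultdict(list)
    for p in packets:
        groups[(p.src_ip, p.src_port, p.dst_ip, p.dst_port)].append(p.payload)
    return dict(groups)


def group_by_endpoint(packets: List[Packet]) -> Dict[Tuple[str, int], List[bytes]]:
    """One sample set per destination endpoint (IP address, port), across all its flows."""
    groups = defaultdict(list)
    for p in packets:
        groups[(p.dst_ip, p.dst_port)].append(p.payload)
    return dict(groups)


def sample(packets: List[Packet], every: int) -> List[Packet]:
    """Deterministic 1-in-`every` packet sampling."""
    return packets[::every]


pkts = [
    Packet("10.0.0.1", 5000, "192.0.2.7", 53, b"\x12\x34"),
    Packet("10.0.0.2", 6000, "192.0.2.7", 53, b"\x56\x78"),
    Packet("10.0.0.1", 5000, "192.0.2.7", 53, b"\x9a\xbc"),
]
print(len(group_by_flow(pkts)))      # 2 flows
print(len(group_by_endpoint(pkts)))  # 1 endpoint
print(len(sample(pkts, 2)))          # 2 packets kept
```

Because each message is just one more sample for the per-chunk counts, dropping packets through sampling shrinks the sample size but does not break the test.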

Real Traffic Traces
- One-day-long trace from Fastweb: 20 GByte of UDP traffic
- An oracle (DPI + manual inspection) labels the traffic as RTP, eMule, DNS, or other
- Training set: known traffic plus other traffic
- Testing: errors on known traffic are false negatives; errors on unknown (other) traffic are false positives

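Definition of False Positives/Negatives
The oracle (DPI) labels the traffic as eMule, RTP, DNS, or Other; KISS then classifies it:
- Known traffic (eMule, RTP, DNS): correctly classified flows are true positives, the rest are false negatives
- Other traffic: flows classified as “other” are true negatives; flows classified as a known class are false positives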
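These definitions translate directly into two error rates. This is a minimal sketch, assuming labels come as (oracle, KISS) pairs and that a known flow assigned to the wrong known class also counts as a false negative (an assumption; the slide does not cover that case). The label set and sample data are hypothetical.

```python
from typing import Dict, Iterable, Tuple

KNOWN = {"emule", "rtp", "dns"}  # classes the oracle can label; everything else is "other"


def error_rates(pairs: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """pairs = (oracle_label, kiss_label). Returns FN % on known traffic, FP % on other."""
    known = known_err = other = other_err = 0
    for oracle, kiss in pairs:
        if oracle in KNOWN:
            known += 1
            known_err += kiss != oracle   # false negative: known flow not recognized
        else:
            other += 1
            other_err += kiss != "other"  # false positive: other flow given a known label
    return {
        "false_negatives_pct": 100.0 * known_err / known if known else 0.0,
        "false_positives_pct": 100.0 * other_err / other if other else 0.0,
    }


labels = [("rtp", "rtp"), ("dns", "dns"), ("emule", "other"), ("rtp", "rtp"),
          ("other", "other"), ("other", "dns")]
print(error_rates(labels))  # {'false_negatives_pct': 25.0, 'false_positives_pct': 50.0}
```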

Results
[Table: false negatives on known traffic (RTP, EDK, DNS) and false positives on other traffic, in %, for the Euclidean distance and SVM classifiers, in Case A and Case B]

Real Traffic Trace
- RTP errors are oracle mistakes (the oracle does not identify RTP v1)
- DNS errors are due to an impure training set (for the oracle, all port-53 traffic is DNS)
- EDK errors are (maybe) Xbox Live traffic (a matter of proper training for “other”)
- False negatives are always below 3%!

Tuning the Training Set Size
[Figure: % of true positives and false positives vs. samples per class (confidence 5%)]
A small training set is enough:
- For “known”: … Mbyte
- For “other”: 300 Mbyte

Tuning the Number of Packets for the χ²
[Figure: % of true positives and false positives vs. number of packets (confidence 5%)]
Protocols with volumes of at least … pkts per flow

P2P-TV Applications
- P2P-TV applications are becoming popular
- They rely heavily on UDP at the transport layer
- They are based on proprietary protocols
- They evolve very quickly over time
How to identify them? After 6 hours, KISS gives you results.

The Failure of DPI

And for TCP?


Results

Pros and Cons
KISS is good because…
- Blind approach
- Completely automated
- Works with many protocols
- Works even with a small training set
- Statistics can start at any point
- Robust w.r.t. packet drops
- Bypasses some DPI problems
…but:
- It must learn “other” properly
- It needs volumes of traffic
- It may require memory (for now)
- Only UDP (for now)
- Only offline (for now)

Papers
- D. Bonfiglio, M. Mellia, M. Meo, D. Rossi, P. Tofanelli, “Revealing Skype Traffic: When Randomness Plays with You”, ACM SIGCOMM, Kyoto, Japan, August 2007.
- D. Rossi, M. Mellia, M. Meo, “A Detailed Measurement of Skype Network Traffic”, 7th International Workshop on Peer-to-Peer Systems (IPTPS '08), Tampa Bay, Florida, February 2008.
- D. Bonfiglio, M. Mellia, M. Meo, N. Ritacca, D. Rossi, “Tracking Down Skype Traffic”, IEEE Infocom, Phoenix, AZ, 15-17 April 2008.
- D. Bonfiglio, M. Mellia, M. Meo, D. Rossi, “Detailed Analysis of Skype Traffic”, IEEE Transactions on Multimedia, Vol. 11, No. 1, January 2009.
- A. Finamore, M. Mellia, M. Meo, D. Rossi, “KISS: Stochastic Packet Inspection”, 1st Traffic Monitoring and Analysis (TMA) Workshop, Aachen, 11 May 2009.

And for TCP