KISS: Stochastic Packet Inspection for UDP Traffic Classification


KISS: Stochastic Packet Inspection for UDP Traffic Classification Dario Bonfiglio, Alessandro Finamore, Marco Mellia, Michela Meo, Dario Rossi

Traffic classification: an Internet Service Provider looks at the packets… and asks: tell me which protocol and/or application generated them.

Deep Packet Inspection (DPI): the typical approach. The Internet Service Provider inspects ports and payloads for known signatures:
- BitTorrent: port ?; payload contains “bittorrent”
- eMule: port 4662/4672; payload starts with E4/E5
- GTalk: port ?; payload follows the RTP protocol
- PPLive: port ?; payload ?
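The following is a minimal sketch of port/payload signature matching. It uses only the signatures listed on this slide. The function name `dpi_classify` and the crude RTP version check are hypothetical illustrations, not the oracle used in this work. PPLive has no signature here, so it falls through to `None`, which is exactly where DPI starts to fail.

```python
from typing import Optional

# Hypothetical signature table, built only from the examples on the slide.
EMULE_PORTS = {4662, 4672}          # ports listed on the slide for eMule
EMULE_FIRST_BYTES = {0xE4, 0xE5}    # first payload byte listed on the slide


def dpi_classify(port: int, payload: bytes) -> Optional[str]:
    """Return an application label, or None when no signature matches."""
    # BitTorrent: the payload carries the string "bittorrent" (matched case-insensitively here).
    if b"bittorrent" in payload.lower():
        return "bittorrent"
    # eMule: well-known port plus a characteristic first byte.
    if port in EMULE_PORTS and payload[:1] and payload[0] in EMULE_FIRST_BYTES:
        return "emule"
    # RTP (e.g. GTalk voice): crude check that the 2-bit version field equals 2.
    if len(payload) >= 12 and payload[0] >> 6 == 2:
        return "rtp"
    # Anything else (e.g. PPLive on this slide) stays unknown.
    return None


print(dpi_classify(4672, bytes([0xE4, 0x10])))  # emule
print(dpi_classify(8000, bytes(20)))            # None
```

Any protocol whose port is random and whose payload is encrypted or undocumented ends up in the `None` branch.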

Deep Packet Inspection (DPI) fails more and more often because of P2P, encryption, proprietary solutions, and many different protocol flavours.

Possible solution: a behavioral classifier
- Phase 1, Feature (training on known traffic): statistical characterization of the traffic generated by a given source.
- Phase 2, Decision (operation): look at the behaviour of unknown traffic and assign the class that best fits it.
- Phase 3, Verify: check for possible classification mistakes.

Phase 1: Statistical characterization (Feature). The bits of a flow are characterized statistically with a χ² test. Do NOT look at the SEMANTICS and TIMING… but rather look at the protocol FORMAT.

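Chunking and χ²: after the UDP header, take the first N payload bytes and split them into C chunks of b bits each. For each chunk c, the observed distribution of its values over a window of packets is compared with the expected (uniform) distribution:

$$\chi^2_c = \sum_{i=1}^{2^b} \frac{\left(O_c^{(i)} - E^{(i)}\right)^2}{E^{(i)}}, \qquad E^{(i)} = \frac{n}{2^b}$$

Here $O_c^{(i)}$ is the number of packets in which chunk c takes value i, and n is the number of packets observed. The result is a vector of statistics $[\chi^2_1, \ldots, \chi^2_C]$. The χ² provides an implicit measure of the entropy, or randomness, of the payload.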
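The chunking and χ² computation can be sketched in Python as follows. This is an illustrative reading of the slide, not the authors' code. It assumes the values quoted on the chi-square slide (12 payload bytes, 4 bits per chunk, so C = 24) and a window of n packets from one flow. Zero-padding of payloads shorter than N bytes is also an assumption. The names `split_chunks` and `kiss_chi2_vector` are hypothetical.

```python
from typing import List, Sequence


def split_chunks(payload: bytes, n_bytes: int = 12, b: int = 4) -> List[int]:
    """Split the first n_bytes of the UDP payload into b-bit chunks, MSB first."""
    head = payload[:n_bytes].ljust(n_bytes, b"\x00")  # assumption: zero-pad short payloads
    bits = int.from_bytes(head, "big")
    total_bits = 8 * n_bytes
    mask = (1 << b) - 1
    return [(bits >> (total_bits - b * (k + 1))) & mask for k in range(total_bits // b)]


def kiss_chi2_vector(payloads: Sequence[bytes], n_bytes: int = 12, b: int = 4) -> List[float]:
    """Return [chi2_1, ..., chi2_C] for a window of n packets of one flow."""
    if not payloads:
        raise ValueError("need at least one packet")
    if (8 * n_bytes) % b:
        raise ValueError("b must divide the number of inspected bits")
    n_values = 1 << b
    expected = len(payloads) / n_values                  # uniform expectation E = n / 2^b
    n_chunks = 8 * n_bytes // b                          # C
    counts = [[0] * n_values for _ in range(n_chunks)]   # counts[c][i] = O_c^(i)
    for p in payloads:
        for c, v in enumerate(split_chunks(p, n_bytes, b)):
            counts[c][v] += 1
    return [sum((o - expected) ** 2 / expected for o in row) for row in counts]


# Usage: 256 packets whose first byte sweeps 0..255 while the other 11 bytes stay zero.
window = [bytes([i]) + bytes(11) for i in range(256)]
vec = kiss_chi2_vector(window)
print(len(vec), vec[0], vec[1], vec[2])  # 24 0.0 0.0 3840.0
```

The two chunks of the sweeping byte look perfectly uniform (χ² = 0). The constant chunks reach the maximum n(2^b - 1) = 256 · 15 = 3840.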


Chi square statistics. [Figure: χ² of each of the 24 chunks (12 payload bytes, 4 bits per chunk) over time; chunks are labeled Deterministic, Counter, or Random.]
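The sketch below shows roughly why the figure's chunk labels separate so clearly. It computes the χ² of a single 4-bit field over n = 1600 synthetic packets. The fields are invented for illustration and are not taken from real traces: a constant nibble, the lowest and highest nibbles of a 16-bit counter starting at an arbitrary value, and a random nibble. `chi2_uniform` is a hypothetical helper.

```python
import random
from collections import Counter
from typing import Iterable


def chi2_uniform(values: Iterable[int], b: int = 4) -> float:
    """Pearson chi-square of b-bit values against a uniform distribution."""
    vals = list(values)
    expected = len(vals) / (1 << b)
    counts = Counter(vals)  # missing values count as 0
    return sum((counts[v] - expected) ** 2 / expected for v in range(1 << b))


n = 1600
rng = random.Random(0)   # fixed seed so the run is reproducible
seq0 = 0x1234            # arbitrary starting value of a 16-bit counter

fields = {
    "deterministic": [0x7] * n,                                             # constant nibble
    "counter (low nibble)": [(seq0 + k) & 0xF for k in range(n)],           # changes every packet
    "counter (high nibble)": [((seq0 + k) >> 12) & 0xF for k in range(n)],  # never changes here
    "random": [rng.getrandbits(4) for _ in range(n)],
}

for name, values in fields.items():
    print(f"{name:>22}: chi2 = {chi2_uniform(values):9.1f}")
```

In this sketch, the constant field reaches the maximum n(2^b - 1) = 24000. The random field stays near 2^b - 1 = 15, which is its expected value under the uniform hypothesis. The counter spans both extremes: its fast-changing low nibble looks perfectly uniform (χ² = 0), while its slow-changing high nibble looks deterministic.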

Protocol format as seen by the χ²: RTP, eMule, DNS. [Figure: χ² signatures of RTP, eMule, and DNS.]

Phase 2: Decision process. The χ² feature vector of each flow is assigned to a class by minimum distance / maximum likelihood.

C-dimensional space: each vector [χ²_1, …, χ²_C] is a point in a C-dimensional hyperspace divided into classification regions (class i, class j, …). To which class does my point belong? Two decision techniques are considered: Euclidean distance and Support Vector Machine (SVM).
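The Euclidean-distance option can be illustrated with a minimum-distance (nearest-centroid) classifier over χ² vectors, which fits in a few lines. Representing each class by the mean of its training vectors is an assumption made here for simplicity. The helper names and the toy 2-dimensional vectors are hypothetical. The SVM option would instead rely on a library such as scikit-learn and is not shown.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def train_centroids(training: Dict[str, List[Vector]]) -> Dict[str, List[float]]:
    """Average the training chi-square vectors of each class (including "other")."""
    centroids: Dict[str, List[float]] = {}
    for label, vectors in training.items():
        dim = len(vectors[0])
        centroids[label] = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return centroids


def classify(x: Vector, centroids: Dict[str, List[float]]) -> str:
    """Assign x to the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


# Toy 2-D example; real vectors would have C components (e.g. C = 24).
training = {
    "rtp": [[1.0, 2.0], [1.2, 1.8]],
    "dns": [[10.0, 0.5], [9.0, 0.7]],
    "other": [[5.0, 5.0]],
}
centroids = train_centroids(training)
print(classify([1.1, 2.1], centroids))  # rtp
print(classify([9.3, 0.4], centroids))  # dns
```

Training "other" as a class of its own mirrors the known + other training described for the real traces. Without it, every vector would be forced into one of the known classes.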

Example

Phase 3: Performance evaluation. How accurate is all this?

Real traffic traces
- Fastweb trace: a 1-day-long trace of Internet traffic, 20 GByte of UDP traffic.
- Known traffic: RTP, eMule, and DNS, together > 90% of the total volume, labeled by an oracle (manual DPI).
- Other: the complement of the known traffic.
- Training uses known + other traffic. Testing on known traffic measures false negatives; testing on unknown traffic measures false positives.

Definition of false positive/negative (oracle: DPI; known classes: eMule, RTP, DNS)

Oracle label | KISS classifies as “known” | KISS classifies as “other”
known (eMule, RTP, DNS) | true positive | false negative
other | false positive | true negative
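The following sketch computes the percentages reported in the results from (oracle label, KISS label) pairs. It assumes a known-class vector counts as a false negative whenever it is not assigned its own class, and an "other" vector counts as a false positive whenever it is assigned any known class. The function `error_rates` is hypothetical.

```python
from typing import Dict, List, Tuple


def error_rates(pairs: List[Tuple[str, str]], other: str = "other") -> Dict[str, float]:
    """pairs = (oracle_label, kiss_label); returns FN% per known class and FP% for "other"."""
    totals: Dict[str, int] = {}
    misses: Dict[str, int] = {}
    for oracle, kiss in pairs:
        totals[oracle] = totals.get(oracle, 0) + 1
        if kiss != oracle:  # assumption: any mismatch is an error for the oracle's class
            misses[oracle] = misses.get(oracle, 0) + 1
    rates: Dict[str, float] = {}
    for label, total in totals.items():
        key = ("FP:" if label == other else "FN:") + label
        rates[key] = 100.0 * misses.get(label, 0) / total
    return rates


pairs = [("rtp", "rtp"), ("rtp", "other"), ("dns", "dns"), ("dns", "dns"),
         ("other", "other"), ("other", "other"), ("other", "other"), ("other", "rtp")]
print(error_rates(pairs))  # {'FN:rtp': 50.0, 'FN:dns': 0.0, 'FP:other': 25.0}
```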

Results (local)

Known traffic, false negatives [%]
Class | Euclidean distance, Case A | Euclidean distance, Case B | SVM, Case A | SVM, Case B
Rtp | 0.08 | 0.23 | - | 0.05
Edk | 13.03 | 7.97 | 0.98 | 0.54
Dns | 6.57 | 19.19 | 0.12 | 2.14

Other, false positives [%]
Class | Euclidean distance, Case A | Euclidean distance, Case B | SVM, Case A | SVM, Case B
other | 13.6 | 17.01 | - | 0.18

Real traffic trace: discussion
- With SVM, FN are always below 3%.
- RTP errors are oracle mistakes (the oracle does not identify RTP v1).
- DNS errors are due to an impure training set (for the oracle, all port-53 traffic is DNS).
- EDK errors are (maybe) Xbox Live traffic (this calls for proper training of the “other” class).

P2P-TV applications
- P2P-TV applications are becoming popular.
- They rely heavily on UDP at the transport layer.
- They are based on proprietary protocols.
- They evolve very quickly over time.

Application | Tot. vectors | FN [%]
Joost | 33514 | 1.9
PPLive | 84452 | -
SopCast | 84473 | 0.1
Tvants | 27184 |

Class | Tot. vectors | FP [%]
Other | 1.2M | 0.3

Pros and Cons
KISS is good because it:
- is a blind approach
- is completely automated
- works with many protocols
- works even with a small training set
- collects statistics that can start at any point
- is robust w.r.t. packet drops
- bypasses some DPI problems
but it:
- must learn the “other” class properly
- needs volumes of traffic
- may require memory (for now)
- handles only UDP (for now)
- works only offline (for now)

Papers
- D. Bonfiglio, M. Mellia, M. Meo, D. Rossi, P. Tofanelli, “Revealing Skype traffic: when randomness plays with you”, ACM SIGCOMM Computer Communication Review, Vol. 37, No. 4, pp. 37-48, ISSN: 0146-4833, October 2007.
- D. Rossi, M. Mellia, M. Meo, “Following Skype Signaling Footsteps”, IT-NEWS (QoS-IP 2008), The Fourth International Workshop on QoS in Multiservice IP Networks, Venice, 13-15 February 2008.
- D. Rossi, M. Mellia, M. Meo, “A Detailed Measurement of Skype Network Traffic”, 7th International Workshop on Peer-to-Peer Systems (IPTPS '08), Tampa Bay, Florida, 25-26 February 2008.
- D. Bonfiglio, M. Mellia, M. Meo, N. Ritacca, D. Rossi, “Tracking Down Skype Traffic”, IEEE Infocom, Phoenix, AZ, 15-17 April 2008.
- D. Bonfiglio, A. Finamore, M. Mellia, M. Meo, D. Rossi, “KISS: Stochastic Packet Inspection for UDP Traffic Classification”, submitted to InfoCom09.