Silvio Valenti Télécom ParisTech, France 21 September 2011 Dealing with P2P traffic in modern networks: measurement, identification and control Directeur.


Dealing with P2P traffic in modern networks: measurement, identification and control
Silvio Valenti, Télécom ParisTech, France
21 September 2011
Thesis advisor: Dario Rossi

2 Outline
- Context and motivation
  - P2P applications
  - P2P traffic diffusion
- Contributions of this thesis
  1. Traffic classification
  2. Data reduction
  3. Congestion control for P2P traffic
- Summary and Conclusion
Traffic classification (detailed part)
- State of the art
- Behavioral classification for P2P traffic: Abacus
  - Methodology
- Experimental campaign
  - Dataset & metrics
  - Abacus vs KISS
  - Abacus & sampling
21/09/2011 S. Valenti, "Dealing with P2P traffic in modern networks", PhD Thesis

3 P2P applications
Client-server systems
- resources on the server
- contents on the server
- clients exploit server resources
Peer-to-peer systems
- hosts share their resources with the others
- clients talk directly to each other and collaborate
- robust, scalable, autonomous
- many services: file-sharing, VoIP, live-streaming

4 P2P timeline
[Figure: timeline of P2P systems by category. File-sharing: Napster, Gnutella, Kazaa, eMule, Bittorrent, Limewire, uTorrent 3.0. Search: Chord, Kademlia. VoIP: Skype. Live streaming: PPLive, Sopcast, Joost, TVAnts, Coolstreaming. Web based: Megaupload (2005). Music streaming: Spotify. P2P in browsers. The period covered by this PhD thesis is marked on the time axis.]

5 P2P traffic in modern networks
- High volumes: in 2009, 40-70% of total traffic (source: Ipoque, Internet studies)
- Concerns among ISPs, especially for P2P-TV [1]
- Decline in the last few years
  - Video traffic (YouTube)
  - Web hosting (MegaUpload)
- …but likely not to disappear
  - Absolute volume increases
  - Users go back to P2P [2]
  - New services still emerging

6 Content of this thesis
Goal: develop tools and protocols to help operators deal with P2P traffic
1. Traffic classification
2. Data reduction
3. Congestion control for P2P
[Figure: unknown traffic to be labeled (P2P? File-sharing? Bittorrent?), with sampling applied to the input]


8 P2P traffic classification
Problem: identify P2P traffic in the network… to better manage it
- Management: QoS, differential queuing
- Security: intrusion detection, lawful intercept
Technical challenges
- encryption, tunneling, proprietary protocols
- CPU power

9 Abacus classifier
Contribution: Abacus
- Behavioral classifier tailored for P2P-TV applications
- Later generalized to P2P in general
- Open source demo software
Features
- Behavioral approach
- Based only on flow-level data: counts of packets and bytes
- Fine-grained classification
- Robust, portable
- As accurate as a payload-based classifier


11 Data reduction and classification
Sampling
- common practice among ISPs
- reduces load, amount of data… and information!
Goal: is traffic classification possible
- with flow-level data? (Netflow)
- with flow sampling? (routing)
- with packet sampling?
Contributions: studied Abacus with Netflow data and flow sampling

12 Packet sampling
Studied the impact of packet sampling on classification
- tstat flow monitor modified to apply sampling
- different sampling policies (systematic, random…) and rates
Findings
- heavy distortion, no matter the policy
- information content of the features is less impacted
- classification is possible when sampled data is also used for training (homogeneous policy)
[Figure: flow accuracy vs. sampling step, for training on unsampled and on sampled data]
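To make the two sampling policies named on this slide concrete, here is a minimal Python sketch. The function names and the fixed seed are illustrative assumptions, not part of tstat or of the thesis tooling; the sketch only shows how a sampling step of `step` thins out the packets of a flow.

```python
import random

def systematic_sample(packets, step):
    """Systematic policy: keep every `step`-th packet, starting from the first."""
    return packets[::step]

def random_sample(packets, step, seed=0):
    """Random policy: keep each packet independently with probability 1/step.

    The seed is fixed only to make the illustration reproducible.
    """
    rng = random.Random(seed)
    return [p for p in packets if rng.random() < 1.0 / step]

# A hypothetical flow of 10 packets, sampled with step 5.
flow = list(range(10))
kept = systematic_sample(flow, 5)   # two packets survive: the 1st and the 6th
```

With either policy, per-flow packet and byte counters shrink roughly by a factor of `step`, which is the distortion the findings refer to.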


14 Congestion control for P2P
Goal: a low-priority protocol for P2P applications
Requirements
- efficient use of bandwidth
- detect congestion early
- automatically yield to other traffic (interactive, web)
Contributions
- implemented the new BitTorrent protocol (LEDBAT, or uTP)
- a delay-based, low-priority congestion control
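As an illustration of what delay-based low-priority control means, the sketch below follows the general form of the window update in the public LEDBAT specification (RFC 6817). It is a simplification, and the constants `TARGET`, `GAIN` and `MSS` are assumed values chosen for the example; it is not claimed to be the exact variant implemented or evaluated in the thesis.

```python
TARGET = 0.100   # target queuing delay in seconds (assumed value)
GAIN = 1.0       # window gain (assumed value)
MSS = 1500       # segment size in bytes (assumed value)

def ledbat_update(cwnd, base_delay, current_delay, bytes_acked):
    """Return the new congestion window (bytes) after one acknowledgment.

    base_delay:    lowest one-way delay observed (estimate of the empty-queue delay)
    current_delay: most recent one-way delay measurement
    """
    queuing_delay = current_delay - base_delay
    # Positive when the queue is below target (grow), negative above it (shrink).
    off_target = (TARGET - queuing_delay) / TARGET
    cwnd += GAIN * off_target * bytes_acked * MSS / cwnd
    return max(cwnd, MSS)   # never go below one segment
```

Because the window starts shrinking as soon as the measured queuing delay exceeds the target, the sender backs off before loss-based flows such as standard TCP fill the queue, which is how it yields to other traffic.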

15 Congestion control for P2P
Contributions
- Evaluated through measurements and simulation
- Discovered a fairness issue: latecomer advantage
- Proposed an effective solution
- Verified also analytically


17 Classifier families (1)
- Deep Packet Inspection (DPI): specific keywords in the payload (e.g. GET, MAIL FROM:, BT)
- Statistical classification: flow properties (e.g. the sequence of signed packet sizes +s1, -s2, +s3, …)
- Behavior analysis (Abacus): algorithm design

18 Taxonomy of traffic classification
Approach | Features | Granularity | Timeliness | Training | Computational cost
Deep packet inspection | Signature in payload [3] | Fine grained | First payload packet | difficult | High, access to payload
Stochastic packet inspection | Statistical properties of payload [4] | Fine grained | Online, after 80 packets (~100 ms) | easy | High, access to payload
Statistical | Flow-level properties [5] | Coarse grained | Post mortem | easy | Lightweight
Statistical | Packet-level properties [6] | Fine grained | After few packets (~5) | easy | Lightweight
Behavioral | Host-level properties [7] | Coarse grained | Post mortem | easy | Lightweight
Behavioral | Endpoint rate [8] | Fine grained | Online, after 5 s | easy | Lightweight

19 Abacus: the idea
Different kinds of people at a party
- some chat briefly with many others
- some talk at length with few others
…and different kinds of applications
- some download small pieces of video from many peers
- some download all the video from almost the same peers
Leverage this to classify traffic
1. Observe a host for a given time
2. Count the packets received from the other hosts
3. What kind of application is it? (APP1 or APP2)

20 Classification process
1. Statistical characterization of traffic (Phase 1: Signature)
2. Assign traffic to the class that best fits it (Phase 2: Decision)
   - Support Vector Machines (or another learning tool)
3. Validate the classification accuracy (Phase 3: Verify)
   - Compare with an oracle that knows the truth
[Figure: known traffic feeds the signature phase for training; in operation, signatures go through the decision phase and are then verified]


22 Abacus: signature definition
Procedure
1. Observe host X for T = 5 s
2. Count the packets received from each peer Y_i
3. Group the peers into bins of exponentially increasing size
4. Normalize over the total number of peers
5. Repeat for bytes
6. The resulting distribution is the Abacus signature
Pros
- Only lightweight operations
- No access to packet payloads
- Focus on incoming traffic: more stable throughput for video
[Figure: host X receiving from peers Y1 to Y5. Frequency distribution = [1, 1, 3, 0]; signature = [0.2, 0.2, 0.6, 0]]
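The procedure above can be sketched in a few lines of Python. This is a minimal illustration for the packet-count half of the signature only. The exact bin boundaries are an assumption (bin 0 holds peers that sent 1 packet, bin i holds peers that sent more than 2^(i-1) and at most 2^i packets), and the function name and per-peer packet counts in the example are hypothetical, chosen so that the frequency distribution matches the one on the slide.

```python
from collections import Counter

def abacus_signature(packet_sources, num_bins=4):
    """Packet-count signature of one host over one observation window.

    packet_sources: one entry per received packet, naming the sending peer.
    """
    per_peer = Counter(packet_sources)           # step 2: packets per peer
    bins = [0] * num_bins
    for count in per_peer.values():
        # Step 3 (assumed boundaries): index = ceil(log2(count)),
        # computed with integer arithmetic; the last bin absorbs the overflow.
        idx = (count - 1).bit_length()
        bins[min(idx, num_bins - 1)] += 1
    total_peers = sum(bins)
    if total_peers == 0:
        return [0.0] * num_bins
    return [b / total_peers for b in bins]       # step 4: normalize over peers

# Five peers sending 1, 2, 3, 4 and 4 packets give the frequency
# distribution [1, 1, 3, 0] with the binning assumed above.
packets = ["Y1"] + ["Y2"] * 2 + ["Y3"] * 3 + ["Y4"] * 4 + ["Y5"] * 4
signature = abacus_signature(packets)
```

The same function applied to per-peer byte totals, with a suitable bin scale, would give the byte half of the signature (step 5).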

23 Signature comparison
[Figure: Abacus signatures (pmf) over time, in steps of 5 s, for PPLive, TVAnts, Joost and Sopcast]


25 Support Vector Machines
Signatures are points in a multi-dimensional space
- complex surfaces separate the regions
SVM training phase
- starts from a set of labeled points
- the kernel maps the points into a higher-dimensional space
- simple hyperplanes separate the points there
- the support vectors identify the planes
Decision phase
- map the new sample into the higher-dimensional space
- label the point according to the region it falls into
Unknown traffic
- rejection criterion or additional class
[Figure: space of samples (dim. C), mapped by the kernel trick to the space of features; training and classification decision]
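For reference, the standard form of a kernel SVM decision for two classes is given below, written with a Gaussian kernel. The symbols are the usual textbook ones and are not taken from the slides: $\mathbf{x}_i$ are the support vectors with labels $y_i \in \{-1, +1\}$, $\alpha_i$ and $b$ are learned in training, and $\gamma$ is the kernel width parameter. A multi-class classifier such as the one needed here is typically built by combining several such binary decisions.

```latex
K(\mathbf{x}, \mathbf{x}') = \exp\!\left(-\gamma \,\lVert \mathbf{x} - \mathbf{x}' \rVert^{2}\right),
\qquad
f(\mathbf{x}) = \operatorname{sign}\!\left( \sum_{i \in \mathrm{SV}} \alpha_i \, y_i \, K(\mathbf{x}_i, \mathbf{x}) + b \right)
```

The kernel evaluates inner products in the higher-dimensional feature space without computing the mapping explicitly, which is the kernel trick named on the slide.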


27 Overview of experiments
- Dataset and metrics
- Experimental results
  - accuracy results
  - portability analysis
  - Abacus with Netflow
  - Abacus in the core


29 Dataset
Known issue: ground truth vs. representativeness
Our dataset
- Active traces from a European testbed (2008)
  - P2P-TV apps (PPLive, Sopcast, Joost, TVAnts), 40 hosts, 26 GB of data
  - Reliable ground truth
  - High heterogeneity (access network, location)
- Passive traces from an ISP and a campus (2006–2009)
  - Other P2P apps (Bittorrent, eMule, Skype) and generic traffic, ~4 GB of data
  - Ground truth obtained with DPI or GT [8]
  - Representative of a generic environment

30 Metrics
- True Positive Rate (TPR): percentage of traffic correctly classified
- Misclassified (Mis): percentage of traffic classified as the wrong application
- Other (Ot): percentage of traffic classified as unknown
Percentages are computed…
- signature-wise: related to the performance of the classification engine
- byte-wise: related to the bulk of traffic (interesting for ISPs)
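The difference between the signature-wise and the byte-wise view can be shown with a short Python sketch. The function name, the input layout (one tuple per signature) and the `"unknown"` label are assumptions made for the example.

```python
def classification_rates(samples, unknown="unknown"):
    """Compute TPR / Mis / Ot percentages, signature-wise and byte-wise.

    samples: list of (true_app, predicted_app, n_bytes), one per signature.
    Returns two dicts: (signature_wise, byte_wise).
    """
    if not samples:
        raise ValueError("no samples to evaluate")
    sig_counts = {"TPR": 0, "Mis": 0, "Ot": 0}
    byte_counts = {"TPR": 0, "Mis": 0, "Ot": 0}
    for true_app, predicted, n_bytes in samples:
        if predicted == true_app:
            key = "TPR"            # correctly classified
        elif predicted == unknown:
            key = "Ot"             # rejected as unknown
        else:
            key = "Mis"            # assigned to the wrong application
        sig_counts[key] += 1
        byte_counts[key] += n_bytes
    total_bytes = sum(byte_counts.values())
    sig = {k: 100.0 * v / len(samples) for k, v in sig_counts.items()}
    byt = {k: 100.0 * v / total_bytes if total_bytes else 0.0
           for k, v in byte_counts.items()}
    return sig, byt

# Hypothetical example: the two wrong decisions concern low-volume signatures,
# so byte-wise accuracy is much higher than signature-wise accuracy.
samples = [
    ("PPLive", "PPLive", 900),
    ("PPLive", "Joost", 50),
    ("PPLive", "unknown", 50),
    ("PPLive", "PPLive", 1000),
]
sig, byt = classification_rates(samples)
```

In this toy input half of the signatures are correct but they carry 95% of the bytes, which is why the two views are reported separately.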


32 Baseline results
[Table: classification outcome (TP, Mis, Unk) as signature % and bytes %, for PPLive, TVAnts, SopCast, Joost and unknown UDP traffic. Only the values of the unknown UDP row survive: <0.1 and >99.9]
- TP higher than 95% in terms of signatures and 98% in terms of bytes
- Misclassification affects the signatures carrying fewer bytes
- FPR for unknown traffic is 0.1%: effective rejection criterion

33 Portability
Are Abacus signatures portable across…
- Networks?
  - train on one network, test on another one
  - loss of 6% in the worst case
- Access technologies?
  - peers divided into high bandwidth (10 Mbps) and ADSL (2 Mbps)
  - ok, but train = HB has some difficulty when test = ADSL
- Channel popularity? (number of peers in the swarm)
  - second experiment with an unpopular channel
  - problems when train = popular and test = unpopular
- Time?
  - traces of P2P-TV from 2006 as test (train on 2008)
  - classification possible unless the software version changes

34 Overview of experiments
- Dataset and metrics
- Experimental results
  - baseline results
  - portability analysis
  - Abacus with Netflow
  - Abacus in the core

35 Abacus and Netflow (1)
Netflow
- de facto standard for flow monitoring
- routers export data on a flow when the flow terminates (explicitly or by timeout)
- Netflow data has a larger time granularity (minutes)
For each flow, the router exports to the collector
- source and destination IP
- source and destination port
- IP protocol
- number of packets
- number of bytes
- begin and end time
- …

36 Abacus and Netflow (2)
[Table: TP, Mis and Other percentages, byte-wise and signature-wise, for PPLive, SopCast, TVAnts, Joost, eDonkey, BitTorrent, Skype, DNS and Other; values missing]
Most significant signatures are correctly classified

37 Abacus in the core (1)
- Abacus needs all the traffic of a host (available only at the edge)
- In the core this is no longer possible, due to routing
[Figure: target host exchanging flows with Host1 to Host4; the classifier sees only part of the flows]

38 Abacus in the core (2)
Abacus signatures are normalized
- if there is no bias in the selection of peers, classification is still possible
Experiment
- randomly sampled the network flows with rates 1/2, 1/4, 1/8
- train with unsampled traffic, test with sampled traffic
Results
- Byte and signature accuracy degrade smoothly
- A test with real routing tables agrees with our experiments

39 Conclusion
- P2P has a central role in today's Internet traffic
- Operators need tools to manage such traffic
- Abacus is our contribution to traffic classification
  - behavioral classifier
  - as accurate as payload-based algorithms (byte accuracy > 98%)
  - portable (time, space)
  - robust (false alarm rate < 0.1%)
  - works with Netflow data
  - may be deployed in the core

40 Future work
1. Behavioral classification
   - test Abacus with TCP and other kinds of traffic
2. Data reduction
   - test Abacus with packet sampling
   - evaluate other smart policies
   - evaluate the portability of sampled flow records
3. Congestion control
   - evaluate LEDBAT in the real world
   - evaluate Bittorrent+LEDBAT in the real world
   - improve the LEDBAT definition

41 References
1. X. Hei, C. Liang, J. Liang, Y. Liu, and K. W. Ross. A Measurement Study of a Large-Scale P2P IPTV System. IEEE Transactions on Multimedia, Dec.
2. A. Finamore, M. Mellia, M. Meo, M. Munafo, and D. Rossi. Experiences of internet traffic monitoring with tstat. IEEE Network Magazine, Special Issue on Network Traffic Monitoring and Analysis, May.
3. V. Paxson. Bro: a system for detecting network intruders in real-time. Elsevier Comput. Netw., 31:2435–2463, December.
4. A. Finamore, M. Mellia, M. Meo, and D. Rossi. Kiss: Stochastic packet inspection classifier for udp traffic. IEEE/ACM Trans. Netw., 18(5):1505–1515.
5. M. Crotti, M. Dusi, F. Gringoli, and L. Salgarelli. Traffic classification through simple statistical fingerprinting. ACM SIGCOMM Computer Communication Review, 37(1):5–16, January.
6. A. Finamore, M. Mellia, M. Meo, and D. Rossi. Kiss: Stochastic packet inspection classifier for udp traffic. IEEE/ACM Trans. Netw., 18(5):1505–1515.
7. T. Z. J. Fu, Y. Hu, X. Shi, D.-M. Chiu, and J. C. S. Lui. PBS: Periodic Behavioral Spectrum of P2P Applications. In Proc. of PAM 09, Seoul, South Korea, Apr.
8. F. Gringoli, L. Salgarelli, M. Dusi, N. Cascarano, F. Risso, and k. c. claffy. GT: picking up the truth from the ground for internet traffic. SIGCOMM Comput. Commun. Rev. 39.

42 Publications
1. S. Valenti, D. Rossi, Fine-grained behavioral classification in the core: the issue of flow sampling, In TRaffic Analysis and Classification (TRAC) Workshop at IWCMC
2. P. Bermolen, M. Mellia, M. Meo, D. Rossi, S. Valenti, Abacus: Accurate behavioral classification of P2P-TV traffic, Elsevier Computer Networks, 55(6), April
3. S. Valenti, D. Rossi, Identifying key features for P2P traffic classification, In IEEE ICC'11, Kyoto, Japan, June
4. G. Carofiglio, L. Muscariello, D. Rossi and S. Valenti, The quest for LEDBAT fairness, In IEEE Globecom'10
5. A. Finamore, M. Mellia, M. Meo, D. Rossi and S. Valenti, Peer-to-peer traffic classification: exploiting human communication dynamics, In IEEE Globecom'10, Demo Session
6. A. Pescape, D. Rossi, D. Tammaro and S. Valenti, On the impact of sampling on traffic monitoring and analysis, In Proceedings of the 22nd International Teletraffic Congress (ITC22)
7. D. Rossi, C. Testa, S. Valenti and L. Muscariello, LEDBAT: the new BitTorrent congestion control protocol, In International Conference on Computer Communication Networks (ICCCN'10)
8. D. Rossi, S. Valenti, Fine-grained traffic classification with Netflow data, In TRaffic Analysis and Classification (TRAC) Workshop at IWCMC
9. A. Finamore, M. Meo, D. Rossi, S. Valenti, Kiss to Abacus: a comparison of P2P-TV traffic classifiers, In Traffic Measurement and Analysis (TMA) Workshop at PAM
10. D. Rossi, C. Testa, S. Valenti, Yes, we LEDBAT: Playing with the new BitTorrent congestion control algorithm, In Passive and Active Measurement (PAM)
11. D. Rossi, E. Sottile, S. Valenti and P. Veglia, Gauging the network friendliness of P2P applications, In SIGCOMM Demo Session
12. S. Valenti, D. Rossi, M. Meo, M. Mellia and P. Bermolen, Accurate and Fine-Grained Classification of P2P-TV Applications by Simply Counting Packets, In Traffic Measurement and Analysis (TMA) Workshop at IFIP Networking
13. S. Valenti, D. Rossi, M. Meo, M. Mellia and P. Bermolen, An Abacus for P2P-TV traffic classification, In IEEE INFOCOM 2009, Demo

43 Thank you for your attention!

44 Abacus and KISS
Characteristic | Abacus | KISS
Technique | Behavioral | Stochastic payload inspection
Input format | Netflow-like | Packet trace
Protocol family | P2P (especially P2P-TV) | Any
Train set size | 4000 samples | 300 samples
Time responsiveness | Deterministic (5 s) | Stochastic (80 pkts, ~2 s)
Memory occupation | 320 B | 384 B
Memory operations | 2 per pkt, 177 every 5 s | 49 per pkt, 768 every 80 pkts
CPU operations | 2 per pkt, 200 every 5 s | 24 per pkt, 1200 every 80 pkts
KISS [6] recognizes the protocol syntax
- analyzes the first payload bytes
- uses a χ²-like test to recognize fields
Comparison
- Abacus has the same accuracy
- Abacus outperforms KISS in computational cost
[Figure: first payload bytes of packets from flows F1 and F2, with fields marked as ID, deterministic, random and counter]

45 Performance metrics
Confusion matrix representation (rows: real label; columns: classification outcome)
Real label | Classified as App A | Classified as Other
App A | TP | FN
Other | FP | TN
Indexes
- TP rate (or recall) = TP / (TP + FN): recognizing the application traffic
- FP rate = FP / (FP + TN): recognizing other traffic
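A minimal Python sketch of how the two indexes follow from the confusion matrix for one application. The function name and the label lists are hypothetical; the formulas are the ones given on the slide.

```python
def confusion_rates(true_labels, predicted_labels, app):
    """Return (TP rate, FP rate) for `app` against all other traffic."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == app and pred == app:
            tp += 1      # application traffic, recognized
        elif truth == app:
            fn += 1      # application traffic, missed
        elif pred == app:
            fp += 1      # other traffic, wrongly labeled as the application
        else:
            tn += 1      # other traffic, correctly left out
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr

truth = ["A", "A", "A", "A", "B", "B", "B", "B"]
pred = ["A", "A", "A", "B", "A", "B", "B", "B"]
rates = confusion_rates(truth, pred, "A")   # 3 TP, 1 FN, 1 FP, 3 TN
```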

46 Sensitivity
Impact of classifier parameters
1. Time interval
   - shorter windows (1 s): classification is difficult
   - longer windows (10, 15, 30, 60 s): similar performance
2. Training set size
   - we used 20% of the dataset (4000 signatures per app)
   - with 300 signatures: 10% reduction for some apps
3. Training set diversity
   - 1 or 2 peers per network are enough for a robust training
4. SVM kernel and bin size
   - a Gaussian kernel is better than a linear one
   - exponential binning is more efficient than linear binning

47 Packet sampling
Studied the impact of packet sampling on classification
- Tstat exports flow-level features
- Modified to apply sampling
- Different policies (systematic, random, stratified, SYN) and rates
Findings
- heavy distortion in the measurements, no matter the policy
- information content of the metrics is less impacted
- classification is possible when sampled data is also used for training
[Figure: flow accuracy vs. sampling step, for heterogeneous and homogeneous training]

48 Rejection criterion selection
- For R ~ 0: low TPR, low FPR
- For R ~ 1: high TPR, high FPR
- For R = 0.5: high TPR, low FPR

49 Rejection criterion
The hyper-space is partitioned
- every point is given a label, even points from unknown applications
- we need a way to recognize them
Procedure
- Define a center for each class
- Define a threshold R
- d = distance between the sample and the center of its assigned class
- If d > R, mark the new point as unknown
Distance
- Bhattacharyya distance (BD), a distance between probability density functions
[Figure: training points around the center of a class with radius R; a new point inside the radius is labeled as the class (green), a new point outside is labeled as unknown]
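The rejection step can be sketched as follows. The slide only names the Bhattacharyya distance; this sketch assumes the bounded variant sqrt(1 - BC), where BC is the Bhattacharyya coefficient, because it lies in [0, 1] and therefore matches thresholds such as R = 0.5. The classical definition is -ln(BC), which is unbounded. The function names, the `"unknown"` label and the class center used in the example are hypothetical.

```python
import math

def bhattacharyya_distance(p, q):
    """Bounded Bhattacharyya-style distance between two discrete distributions.

    Assumed form: sqrt(1 - BC), with BC = sum_i sqrt(p_i * q_i).
    Returns 0 for identical distributions and 1 for disjoint ones.
    """
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return math.sqrt(max(0.0, 1.0 - bc))   # clamp guards against rounding

def apply_rejection(signature, label, centers, threshold):
    """Keep the classifier's label only if the signature is close to that class center."""
    d = bhattacharyya_distance(signature, centers[label])
    return label if d <= threshold else "unknown"

# Hypothetical class center and two signatures that the classifier labeled "PPLive".
centers = {"PPLive": [0.2, 0.2, 0.6, 0.0]}
close = apply_rejection([0.2, 0.2, 0.6, 0.0], "PPLive", centers, 0.5)   # kept
far = apply_rejection([0.0, 0.0, 0.0, 1.0], "PPLive", centers, 0.5)     # rejected
```

A small R rejects almost everything, including genuine traffic of the class, while an R close to 1 accepts almost everything, including unknown applications.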

50 Signature comparison (mean)
[Figure: mean Abacus signature (pmf per bin) for PPLive, TVAnts, Sopcast and Joost]

51 Abacus with Netflow
From packet-level to flow-level data
1. Use longer time scales (5 s -> 120 s)
   - applications may become similar, hence more difficult to identify
2. Prorate flow records over the time windows
3. Add a specific class for unknown traffic
   - this problem comes from the SVM
[Figure: time axis with windows; flow 1 falls inside one window (ok), flow 2 spans two windows (to be split)]
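Step 2 can be illustrated with a short Python sketch. It assumes that a flow record's packets are spread uniformly over the flow's lifetime and assigns to each window a share proportional to the overlap. The function name and the uniform-spread assumption are illustrative; the slide does not detail the exact prorating rule.

```python
def prorate_flow(start, end, packets, window=120.0):
    """Split a flow record's packet count over fixed-size time windows.

    start, end: flow begin and end time in seconds.
    Returns {window_index: share_of_packets}, assuming a uniform packet rate.
    """
    duration = end - start
    if duration <= 0:
        # Zero-length flow: everything belongs to the window holding `start`.
        return {int(start // window): float(packets)}
    shares = {}
    w = int(start // window)
    while w * window < end:
        overlap_start = max(start, w * window)
        overlap_end = min(end, (w + 1) * window)
        shares[w] = packets * (overlap_end - overlap_start) / duration
        w += 1
    return shares

inside = prorate_flow(10, 50, 40)      # flow within one window: no split needed
split = prorate_flow(60, 180, 120)     # flow across two windows: half in each
```

The same split applies to the byte counter of the record, after which the per-window counts can be binned into signatures as with packet-level data.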

52 Oops… wrong key!