Silvio Valenti Télécom ParisTech, France 21 September 2011 Dealing with P2P traffic in modern networks: measurement, identification and control Directeur.

1 Dealing with P2P traffic in modern networks: measurement, identification and control
Silvio Valenti, Télécom ParisTech, France
21 September 2011
Thesis advisor: Dario Rossi

2 Outline
- Context and motivation
  - P2P applications
  - P2P traffic diffusion
- Contributions of this thesis
  1. Traffic classification
  2. Data reduction
  3. Congestion control for P2P traffic
- Summary and Conclusion
Traffic classification
- State of the art
- Behavioral classification for P2P traffic: Abacus
  - Methodology
  - Experimental campaign
  - Dataset & metrics
  - Abacus vs KISS
  - Abacus & sampling
21/09/2011 S. Valenti, "Dealing with P2P traffic in modern networks", PhD Thesis

3 P2P applications
Client-server systems
- resources on the server
- contents on the server
- clients exploit server resources
Peer-to-peer systems
- hosts share their resources with the others
- clients talk directly to each other and collaborate
- robust, scalable, autonomous
- many services: file-sharing, VoIP, live-streaming

4 File-Sharing P2P timeline
[Figure: timeline from 1999 to 2011. File sharing: Napster, Gnutella, Kazaa, eMule, Limewire, Bittorrent, uTorrent 3.0. Search: Chord, Kademlia. VoIP: Skype. Live streaming: PPLive, Sopcast, Joost, TVAnts, Cool streaming. Web based: Megaupload. Music streaming: Spotify. Also marked: P2P in browsers, and the PhD thesis period (2008 to 2011).]

5 P2P traffic in modern networks
- High volumes: in 2009, 40-70% of total traffic (source: Ipoque, Internet studies 2008-2009)
- Concerns among ISPs, especially for P2P-TV [1]
- Decline in the last few years
  - Video traffic (YouTube)
  - Web hosting (MegaUpload)
- ...but unlikely to disappear
  - Absolute volume increases
  - Users go back to P2P [2]
  - New services still emerging

6 Content of this thesis
Goal: develop tools and protocols to help operators deal with P2P traffic
1. Traffic classification
2. Data reduction
3. Congestion control for P2P
[Figure labels: "P2P? File-sharing? Bittorrent?" and "Sampling"]

8 P2P traffic classification
Problem: identify P2P traffic in the network, to better manage it
- Management: QoS, differential queuing
- Security: intrusion detection, lawful intercept
Technical challenges
- encryption, tunneling, proprietary protocols
- CPU power

9 Abacus classifier
Contribution: Abacus
- Behavioral classifier tailored for P2P-TV applications
- Later generalized to P2P in general
- Open source demo software
Features
- Behavioral approach
- Based only on flow-level data: counts of packets and bytes
- Fine-grained classification
- Robust, portable
- As accurate as a payload-based classifier

11 Data reduction and classification
Sampling
- common practice among ISPs
- reduces load, amount of data...
- ...and information!
Goal: is traffic classification possible
- with flow-level data? (Netflow)
- with flow sampling? (routing)
- with packet sampling?
Contributions: studied Abacus with Netflow data and flow sampling

12 Packet sampling
Studied impact of packet sampling on classification
- tstat flow monitor modified to apply sampling
- different sampling policies (systematic, random...) and rates
Findings
- heavy distortion no matter the policy
- information content of features less impacted
- classification possible when sampled data is used for training (homogeneous policy)
[Figure: flow accuracy vs sampling step, for train = unsampled and train = sampled]

14 Congestion control for P2P
Goal: a low-priority protocol for P2P applications
Requirements
- efficient use of bandwidth
- detect congestion early
- automatically yield to other traffic (interactive, web)
Contributions: implemented new BitTorrent protocol (LEDBAT or uTP)
- delay-based low-priority congestion control
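To make "delay-based low-priority congestion control" concrete, here is a minimal sketch of a LEDBAT-style window update. It follows the controller later standardized in RFC 6817, not necessarily the code evaluated in the thesis; the constants (`TARGET`, `GAIN`, the default `mss`) and the function name `ledbat_update` are illustrative assumptions, and the floor of one segment is a simplification.

```python
TARGET = 0.100  # target queuing delay in seconds (assumed; RFC 6817 allows at most 100 ms)
GAIN = 1.0      # controller gain (assumed)


def ledbat_update(cwnd, base_delay, current_delay, bytes_acked, mss=1452):
    """Return the new congestion window (bytes) after one acknowledgement.

    base_delay is the minimum one-way delay seen so far; the difference to
    current_delay estimates the queuing delay the sender itself contributes.
    """
    queuing_delay = current_delay - base_delay
    # Positive when below the target (grow), negative when above it (shrink).
    off_target = (TARGET - queuing_delay) / TARGET
    cwnd += GAIN * off_target * bytes_acked * mss / cwnd
    # Simplification: never go below one segment.
    return max(cwnd, mss)
```

Because the window shrinks as soon as the measured queuing delay exceeds the target, the sender backs off before loss-based flows such as TCP see any packet drop, which is what lets it yield to interactive and web traffic.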

15 Congestion control for P2P
Contributions
- Evaluated through measurements and simulation
- Discovered a fairness issue: latecomer advantage
- Proposed effective solution
- Verified also analytically

17 Classifier families (1)
- Deep Packet Inspection (DPI): specific keywords (e.g. GET, MAIL FROM:, BT)
- Statistical classification: flow properties
- Behavior analysis (Abacus): algorithm design
[Figure: sequences of signed packet sizes, e.g. +s1 +s3 -s2 +s6 -s4 -s5 and +s1 -s3 -s2 +s4 -s5]

18 Taxonomy of traffic classification
| Approach | Features | Granularity | Timeliness | Training | Computational cost |
|---|---|---|---|---|---|
| Deep packet inspection | Signature in payload [3] | Fine grained | First payload packet | difficult | High, access to payload |
| Stochastic packet inspection | Statistical properties of payload [4] | Fine grained | Online, after 80 packets (~100 ms) | easy | High, access to payload |
| Statistical | Flow-level properties [5] | Coarse grained | Post mortem | easy | Lightweight |
| Statistical | Packet-level properties [6] | Fine grained | After few packets (~5) | easy | Lightweight |
| Behavioral | Host-level properties [7] | Coarse grained | Post mortem | easy | Lightweight |
| Behavioral | Endpoint rate [8] | Fine grained | Online, after 5 s | easy | Lightweight |

19 Abacus: the idea
Different kinds of people at a party
- chat briefly with many others
- talk at length with few others
...and different kinds of applications
- download small pieces of video from many peers
- download all video from almost the same peers
Leverage this to classify traffic
1. Observe a host for a given time
2. Count the packets received from the other hosts
3. What kind of application? (APP1 or APP2)

20 Classification process
1. Statistical characterization of traffic (Phase 1: signature)
2. Assign traffic to the class that best fits it (Phase 2: decision)
   - Support Vector Machines (or other learning tool)
3. Validate the classification accuracy (Phase 3: verify)
   - Compare with an oracle that knows the truth
[Figure: traffic passing through Phase 1 (signature), Phase 2 (decision) and Phase 3 (verify); labels: Known, Training, Operation]

22 Abacus: Signature definition
Procedure
1. Observe host X for T = 5 s
2. Count packets received from each peer Y_i
3. Divide peers into bins of exponential size (1, 2, 3-4, 5-8, 9-16 packets)
4. Normalize over the total number of peers
5. Repeat for bytes
6. The distribution is the Abacus signature
Pros
- Only lightweight operations
- No access to packet payloads
- Focus on incoming traffic: more stable throughput for video
Example (host X with peers Y1...Y5): frequency distribution = [1, 1, 3, 0], signature = [0.2, 0.2, 0.6]
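The procedure above can be sketched in a few lines of Python. This is an illustrative reading of the slide, not the released Abacus code: the function name `abacus_signature`, the default of five bins, and the choice to clamp very large counts into the last bin are assumptions.

```python
def abacus_signature(packets_per_peer, num_bins=5):
    """Distribution of peers over exponential bins of received packets.

    packets_per_peer holds one packet count per remote peer seen in the
    observation window. Bin 0 holds peers that sent 1 packet, bin 1 peers
    that sent 2, bin 2 peers that sent 3-4, bin 3 peers that sent 5-8, ...
    """
    counts = [0] * num_bins
    for n in packets_per_peer:
        if n <= 0:
            continue  # a peer that sent nothing is not part of the window
        # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1.
        # Assumption: counts beyond the last bin are clamped into it.
        idx = min((n - 1).bit_length(), num_bins - 1)
        counts[idx] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * num_bins
    # Normalize over the number of peers, as in step 4.
    return [c / total for c in counts]
```

With five peers sending 1, 2, 3, 4 and 3 packets, `abacus_signature([1, 2, 3, 4, 3])` returns `[0.2, 0.2, 0.6, 0.0, 0.0]`, matching the example on the slide. The byte-wise signature of step 5 would reuse the same function with byte counts and suitably larger bins.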

23 Signature comparison
[Figure: signature pmf over time (steps of 5 s) for PPLive, TVAnts, Joost and Sopcast]

25 Support Vector Machines
Signatures are points in a multi-dimensional space
- complex surfaces separate the regions
SVM training phase
- start from a set of labeled points
- the kernel maps points into a higher-dimensional space (kernel trick)
- simple hyperplanes separate the points
- support vectors identify the planes
Decision phase
- map the new sample into the higher-dimensional space
- label the point according to the region it falls into
Unknown traffic
- rejection criterion or additional class
[Figure: space of samples (dim. C), kernel trick, space of features, training, classification decision]
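As a rough illustration of the decision phase, the sketch below evaluates a two-class SVM decision function with a Gaussian kernel. The support vectors, coefficients and bias would come out of the training phase; the values used in any example call are hypothetical, and a real deployment would rely on an SVM library rather than this hand-written function.

```python
import math


def gaussian_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel: similarity that decays with squared distance."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def svm_decision(x, support_vectors, coefficients, bias=0.0, gamma=1.0):
    """Signed score of sample x; the sign selects one of the two classes.

    coefficients[i] is the trained weight of support vector i multiplied by
    its class label (+1 or -1). All of these are assumed to be given.
    """
    score = sum(c * gaussian_kernel(sv, x, gamma)
                for sv, c in zip(support_vectors, coefficients))
    return score + bias
```

The kernel never builds the higher-dimensional space explicitly: the decision only needs kernel values between the new signature and the support vectors, which keeps the per-sample cost proportional to the number of support vectors.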

27 Overview of experiments
- Dataset and metrics
- Experimental results
  - accuracy results
  - portability analysis
- Abacus with Netflow
- Abacus in the core

29 Dataset
Known issues
- Ground truth vs representativeness
Our dataset
- Active traces from European testbed (2008)
  - P2P-TV apps (PPLive, Sopcast, Joost, TVAnts), 40 hosts, 26 GB of data
  - Reliable ground truth
  - High heterogeneity (access network, location)
- Passive traces from ISP, campus (2006-2009)
  - Other P2P apps (Bittorrent, eMule, Skype) and generic traffic, ~4 GB of data
  - Ground truth with DPI or GT [8]
  - Representative of generic environment

30 Metrics
- True Positive Rate (TPR): percentage of traffic correctly classified
- Misclassified (Mis): percentage of traffic classified as the wrong application
- Other (Ot): percentage of traffic classified as unknown
Percentages are computed
- signature-wise: related to the performance of the classification engine
- byte-wise: related to the bulk of traffic (interesting for ISPs)
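The difference between signature-wise and byte-wise percentages can be shown with a short sketch. The record layout `(true_app, predicted_app, n_bytes)`, the `"unknown"` label and the function name are assumptions made for illustration.

```python
def classification_rates(samples, unknown="unknown"):
    """Return (signature_wise, byte_wise) percentages for TP, Mis and Other.

    samples is a list of (true_app, predicted_app, n_bytes) tuples, one per
    classified signature; n_bytes is the traffic volume behind that signature.
    """
    by_signature = {"tp": 0, "mis": 0, "other": 0}
    by_bytes = {"tp": 0, "mis": 0, "other": 0}
    for truth, predicted, n_bytes in samples:
        if predicted == truth:
            outcome = "tp"
        elif predicted == unknown:
            outcome = "other"
        else:
            outcome = "mis"
        by_signature[outcome] += 1      # every signature weighs the same
        by_bytes[outcome] += n_bytes    # signatures weigh by traffic volume

    def to_percent(counts):
        total = sum(counts.values())
        return {k: (100.0 * v / total if total else 0.0)
                for k, v in counts.items()}

    return to_percent(by_signature), to_percent(by_bytes)
```

When errors concentrate on signatures that carry few bytes, the byte-wise TP rate ends up higher than the signature-wise one, which is the pattern the baseline results report.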

32 Baseline results
Classification outcome
| Application | Signature TP % | Signature Mis % | Signature Unk % | Bytes TP % | Bytes Mis % | Bytes Unk % |
|---|---|---|---|---|---|---|
| PPLive | 95.42 | 2.44 | 2.14 | 98.32 | 1.54 | 0.14 |
| TVAnts | 99.84 | 0.16 | 0.00 | 99.82 | 0.17 | 0.01 |
| SopCast | 97.55 | 1.17 | 1.29 | 98.96 | 0.98 | 0.06 |
| Joost | 94.97 | 0.23 | 4.80 | 99.62 | 0.23 | 0.15 |
| Unk (UDP) | - | 0.1 | 99.9 | - | <0.1 | >99.9 |
- TP higher than 95% in terms of signatures and 98% in terms of bytes
  - Misclassification occurs for signatures carrying fewer bytes
- FPR for unknown traffic is 0.1%
  - Effective rejection criterion

33 Portability
Are Abacus signatures portable across...
- Networks?
  - train on one network, test on another one
  - loss of 6% in the worst case
- Access technologies?
  - divide peers into High Bandwidth (10 Mbps) and ADSL (2 Mbps)
  - ok; train = HB has some difficulty when test = ADSL
- Channel popularity? (number of peers in swarm)
  - second experiment with an unpopular channel
  - problems when train = popular and test = unpopular
- Time?
  - traces of P2P-TV from 2006 as test (train on 2008)
  - classification possible unless the software version changes

34 Overview of experiments
- Dataset and metrics
- Experimental results
  - baseline results
  - portability analysis
- Abacus with Netflow
- Abacus in the core

35 Abacus and Netflow (1)
Netflow
- de facto standard for flow monitoring
- routers export data on flows
- when a flow terminates (explicitly or by timeout)
Netflow data has larger time granularities (minutes)
For each flow, the Netflow router sends to the collector
- IP src, dst
- port src, dst
- IP protocol
- #packets
- #bytes
- begin, end time
- ...

36 Abacus and Netflow (2)
| Application | Bytes TP | Bytes Mis | Bytes Other | Signatures TP | Signatures Mis | Signatures Other |
|---|---|---|---|---|---|---|
| PPLive | 96 | 4 | - | 63.6 | 14.1 | 22.3 |
| SopCast | 92.9 | 7.1 | - | 54.4 | 21.3 | 24.3 |
| TVAnts | 99.4 | 0.6 | - | 49.7 | 22.2 | 28.1 |
| Joost | 99.9 | 0.1 | - | 53.2 | 24.3 | 22.5 |
| eDonkey | 98.9 | 0.2 | 0.9 | 94.4 | 0.8 | 4.8 |
| BitTorrent | 89.1 | 10.3 | 0.6 | 12.5 | 86.9 | 0.6 |
| Skype | 90.5 | 3.1 | 6.4 | 86.1 | 7.5 | 6.4 |
| DNS | 92.1 | 5.0 | 2.9 | 63.9 | 7 | 29.1 |
| Other | - | 0.2 | 99.8 | - | 12.4 | 87.6 |
Most significant signatures are correctly classified

37 Abacus in the core (1)
- Abacus needs all traffic for one host (only on the edge)
- In the core, this is no longer possible due to routing
[Figure: target host, classifier, and the flows seen from Host1, Host2, Host3, Host4]

38 Abacus in the core (2)
Abacus signatures are normalized
- if there is no bias in peer selection, classification is possible
Randomly sampled network with rate 1/2, 1/4, 1/8
- train with unsampled, test with sampled traffic
Results
- Byte and signature accuracy degrade smoothly
- Test with real routing tables agrees with our experiments

39 Conclusion
- P2P has a central role in today's Internet traffic
- Operators need tools to manage such traffic
- Abacus: our contribution to traffic classification
  - behavioral classifier
  - as accurate as payload-based algorithms (byte accuracy > 98%)
  - portable (time, space)
  - robust (low false alarm rate, < 0.1%)
  - works with Netflow data
  - may be deployed in the core

40 Future work
1. Behavioral classification
   - test Abacus with TCP and other kinds of traffic
2. Data reduction
   - test Abacus with packet sampling
   - evaluate other smart policies
   - evaluate portability of sampled flow records
3. Congestion control
   - evaluate LEDBAT in the real world
   - evaluate Bittorrent+LEDBAT in the real world
   - improve LEDBAT definition

41 References
1. X. Hei, C. Liang, J. Liang, Y. Liu, and K. W. Ross. A Measurement Study of a Large-Scale P2P IPTV System. IEEE Transactions on Multimedia, Dec. 2007.
2. A. Finamore, M. Mellia, M. Meo, M. Munafo, and D. Rossi. Experiences of internet traffic monitoring with tstat. IEEE Network Magazine, Special Issue on Network Traffic Monitoring and Analysis, May 2011.
3. V. Paxson. Bro: a system for detecting network intruders in real-time. Elsevier Comput. Netw., 31:2435–2463, December 1999.
4. A. Finamore, M. Mellia, M. Meo, and D. Rossi. Kiss: Stochastic packet inspection classifier for udp traffic. IEEE/ACM Trans. Netw., 18(5):1505–1515, 2010.
5. M. Crotti, M. Dusi, F. Gringoli, and L. Salgarelli. Traffic classification through simple statistical fingerprinting. ACM SIGCOMM Computer Communication Review, 37(1):5–16, January 2007.
6. A. Finamore, M. Mellia, M. Meo, and D. Rossi. Kiss: Stochastic packet inspection classifier for udp traffic. IEEE/ACM Trans. Netw., 18(5):1505–1515, 2010.
7. T. Z. J. Fu, Y. Hu, X. Shi, D.-M. Chiu, and J. C. S. Lui. PBS: Periodic Behavioral Spectrum of P2P Applications. In Proc. of PAM 09, Seoul, South Korea, Apr 2009.
8. F. Gringoli, L. Salgarelli, M. Dusi, N. Cascarano, F. Risso, and k. c. claffy. GT: picking up the truth from the ground for internet traffic. SIGCOMM Comput. Commun. Rev. 39, 5, 2009.

42 Publications
1. S. Valenti, D. Rossi, Fine-grained behavioral classification in the core: the issue of flow sampling, In TRaffic Analysis and Classification (TRAC) Workshop at IWCMC 2011
2. P. Bermolen, M. Mellia, M. Meo, D. Rossi, S. Valenti, Abacus: Accurate behavioral classification of P2P-TV traffic, Elsevier Computer Networks, 55(6):1394-1411, April 2011
3. S. Valenti, D. Rossi, Identifying key features for P2P traffic classification, In IEEE ICC'11, Kyoto, Japan, June 2011
4. G. Carofiglio, L. Muscariello, D. Rossi and S. Valenti, The quest for LEDBAT fairness, In IEEE Globecom'10
5. A. Finamore, M. Mellia, M. Meo, D. Rossi and S. Valenti, Peer-to-peer traffic classification: exploiting human communication dynamics, In IEEE Globecom'10, Demo Session
6. A. Pescape, D. Rossi, D. Tammaro and S. Valenti, On the impact of sampling on traffic monitoring and analysis, In Proceedings of the 22nd International Teletraffic Congress (ITC22), 2010
7. D. Rossi, C. Testa, S. Valenti and L. Muscariello, LEDBAT: the new BitTorrent congestion control protocol, In International Conference on Computer Communication Networks (ICCCN'10)
8. D. Rossi, S. Valenti, Fine-grained traffic classification with Netflow data, In TRaffic Analysis and Classification (TRAC) Workshop at IWCMC 2010
9. A. Finamore, M. Meo, D. Rossi, S. Valenti, Kiss to Abacus: a comparison of P2P-TV traffic classifiers, In Traffic Measurement and Analysis (TMA) Workshop at PAM'10
10. D. Rossi, C. Testa, S. Valenti, Yes, we LEDBAT: Playing with the new BitTorrent congestion control algorithm, In Passive and Active Measurement (PAM) 2010
11. D. Rossi, E. Sottile, S. Valenti and P. Veglia, Gauging the network friendliness of P2P applications, In SIGCOMM Demo Session
12. S. Valenti, D. Rossi, M. Meo, M. Mellia and P. Bermolen, Accurate and Fine-Grained Classification of P2P-TV Applications by Simply Counting Packets, In Traffic Measurement and Analysis (TMA) Workshop at IFIP Networking'09
13. S. Valenti, D. Rossi, M. Meo, M. Mellia and P. Bermolen, An Abacus for P2P-TV traffic classification, In IEEE INFOCOM 2009, Demo

43 Thank you for your attention!

44 Abacus and KISS
| Characteristic | Abacus | KISS |
|---|---|---|
| Technique | Behavioral | Stochastic payload inspection |
| Input format | Netflow-like | Packet trace |
| Protocol family | P2P (especially P2P-TV) | Any |
| Train set size | 4000 samples | 300 samples |
| Time responsiveness | Deterministic (5 s) | Stochastic (80 pkts, ~2 s) |
| Memory occupation | 320 B | 384 B |
| Memory operations | 2 per pkt, 177 every 5 s | 49 per pkt, 768 every 80 pkt |
| CPU operations | 2 per pkt, 200 every 5 s | 24 per pkt, 1200 every 80 pkt |
KISS [6] recognizes protocol syntax
- analyzes the first payload bytes
- uses a chi-square-like test to recognize fields
Abacus has the same accuracy
Abacus outperforms KISS in computational cost
[Figure: first payload bytes of packets from flows F1 and F2 (e.g. "F1 pkt1 cb d2 ... 02 60"), with fields marked as ID, deterministic, random and counter]

45 Performance metrics
Confusion matrix representation
| Real label | Classified as App A | Classified as Other |
|---|---|---|
| App A | TP | FN |
| Other | FP | TN |
Indexes
- TP rate (or Recall) = TP / (TP + FN): recognizing the application traffic
- FP rate = FP / (FP + TN): recognizing other traffic

46 Sensitivity
Impact of classifier parameters
1. Time interval
   - shorter windows (1 s) -> difficult
   - longer windows (10, 15, 30, 60 s) -> similar performance
2. Training set size
   - we used 20% of the dataset (4000 signatures per app)
   - with 300 signatures -> 10% reduction for some apps
3. Training set diversity
   - 1 or 2 peers per network are enough for a robust training
4. SVM kernel and bin size
   - Gaussian kernel is better than linear
   - Exponential binning is more efficient than linear binning

47 Packet sampling
Studied impact of packet sampling on classification
- Tstat exports flow-level features
- Modified to apply sampling
- Different policies and rates
Findings
- heavy distortion in the measurement, no matter the policy
- information content of metrics less impacted
- classification possible when sampled data is used for training
[Figure: flow accuracy vs sampling step for the systematic, random, stratified and SYN policies; heterogeneous vs homogeneous training]

48 Rejection criterion selection
- For R ~ 0: low TPR, low FPR
- For R ~ 1: high TPR, high FPR
- For R = 0.5: high TPR, low FPR

49 Rejection criterion
The hyper-space is partitioned
- every point is given a label
- even unknown apps
Need a way to recognize them
- Define a center for each class
- Define a threshold R
- d = distance between the sample and the center of the assigned class
- If d > R, mark the new point as unknown
Bhattacharyya distance (BD)
- distance between p.d.f.
[Figure: training points, center of the class, radius R; new points labeled as green or labeled as unknown]
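The rejection step can be sketched as follows. The slide does not give the exact form of the distance, so this sketch assumes the bounded variant sqrt(1 - BC), where BC is the Bhattacharyya coefficient (this form is often called the Hellinger distance); it ranges over [0, 1], which is consistent with thresholds such as R = 0.5. The function names, the `centers` dictionary and the `"unknown"` label are illustrative assumptions.

```python
import math


def bhattacharyya_coefficient(p, q):
    """Overlap between two discrete distributions (1.0 means identical)."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def signature_distance(p, q):
    """Assumed bounded distance in [0, 1] derived from the coefficient."""
    bc = bhattacharyya_coefficient(p, q)
    # Clamp to guard against rounding slightly above 1.0.
    return math.sqrt(max(0.0, 1.0 - bc))


def apply_rejection(signature, assigned_label, centers,
                    threshold_r=0.5, unknown="unknown"):
    """Keep the classifier's label only if the signature is close enough
    to the center of the assigned class; otherwise mark it as unknown."""
    d = signature_distance(signature, centers[assigned_label])
    return unknown if d > threshold_r else assigned_label
```

A small R rejects almost everything (low TPR, low FPR) and a large R accepts almost everything (high TPR, high FPR), which is the trade-off behind the choice of an intermediate threshold.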

50 Signature comparison (mean)
[Figure: mean signature pmf (0.0 to 0.5) per bin for PPLive, TVAnts, Sopcast and Joost]

51 Abacus with Netflow
From packet-level to flow-level data
1. Use longer time-scales (5 -> 120 s)
   - applications may become similar -> more difficult to identify
2. Prorate flow records over time windows
3. Add a specific class for unknown traffic
   - this problem comes from SVM
[Figure: time axis with windows at t = 0, 120, 240; flow 1 fits in one window (ok), flow 2 spans two windows (to split)]
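Step 2 can be illustrated with a short sketch that splits one flow record across fixed windows. It assumes the flow's packets and bytes are spread uniformly over its lifetime, which is a simplification; the function name `prorate_flow` and its arguments are hypothetical and do not correspond to any Netflow tool.

```python
def prorate_flow(start, end, packets, n_bytes, window=120.0):
    """Split a flow record over fixed-size time windows.

    Returns {window_index: (packets_share, bytes_share)}, where each share
    is proportional to the time the flow spends inside that window.
    Assumption: traffic is uniform over [start, end].
    """
    duration = end - start
    first = int(start // window)
    if duration <= 0:
        # Zero-length record: attribute everything to its window.
        return {first: (float(packets), float(n_bytes))}
    shares = {}
    w = first
    while w * window < end:
        lo = max(start, w * window)
        hi = min(end, (w + 1) * window)
        fraction = (hi - lo) / duration
        shares[w] = (packets * fraction, n_bytes * fraction)
        w += 1
    return shares
```

For instance, a record lasting from t = 60 s to t = 180 s with 100 packets and 1000 bytes is split evenly between windows 0 and 1, while a record that starts and ends inside one window is left untouched.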

52 Oops... wrong key!

