Published by Owen Lier. Modified over 9 years ago.
1
Centre de Comunicacions Avançades de Banda Ampla (CCABA)
Universitat Politècnica de Catalunya (UPC)
Identification of Network Applications based on Machine Learning Techniques
COST-TMA Meeting, Samos 2008
Valentín Carela-Español, Pere Barlet-Ros, Josep Solé-Pareta
{vcarela, pbarlet, pareta}@ac.upc.edu
2
Outline
  Scenario and objectives
  Existing solutions
    Well-known ports
    Payload based (pattern matching)
    Machine Learning
      – Supervised
      – Unsupervised
  Proposed method
  Results
  Conclusions and Future work
3
Scenario and objectives
  Scenario: SMARTxAC
    Traffic Monitoring and Analysis System for the Anella Científica
    Real-time classification
    Independent from packet contents
    High-speed link
  Objectives:
    Development of an ML technique to identify applications in SMARTxAC
    Automate the ML training phase
    Adapt our solution to Netflow
    Study how sampling affects it
5
Existing Solutions
  Well-known ports
    + Computationally lightweight
    - Very low accuracy
  Payload based (pattern matching)
    + High accuracy
    - Packet contents are required
    - Computationally expensive
    - Content encryption
    - Privacy legislation
  Consequence: not feasible solutions
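To illustrate why the well-known-ports approach is lightweight yet inaccurate, here is a minimal sketch. The port table and the function name `classify_by_port` are assumptions made for this example; they are not part of SMARTxAC.

```python
# Hypothetical, deliberately tiny port table keyed by (IP protocol, port).
# Protocol 6 is TCP and 17 is UDP.
WELL_KNOWN_PORTS = {
    (6, 80): "http",
    (6, 443): "https",
    (6, 25): "smtp",
    (17, 53): "dns",
}

def classify_by_port(proto, sport, dport):
    """Return the application registered for either port, else 'unknown'."""
    for port in (dport, sport):
        app = WELL_KNOWN_PORTS.get((proto, port))
        if app is not None:
            return app
    return "unknown"

print(classify_by_port(6, 51234, 80))    # http
print(classify_by_port(6, 51234, 6881))  # unknown
```

The lookup is a single dictionary access per port, which is why the method is cheap. It also shows the weakness: an application on a non-registered port stays unknown, and any application that runs over port 80 is labelled as http.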
6
Existing Solutions
  Machine Learning Techniques
    - Difficult training phase
    + Packet contents are not required
    + High accuracy
    + Computationally viable
  Two main possibilities:
    Supervised methods:
      + Better accuracy for expected classes
      - Need a complete pre-labeled dataset
      - Difficult to detect when retraining is needed
      - No detection of new classes
    Unsupervised methods:
      + Do not need a fully labeled dataset
      + Automatic detection of new classes
      + Better accuracy for new classes
8
Proposed method
  Supervised identification based on the C4.5 algorithm
    Developed by Ross Quinlan as an extension of ID3
    Based on the construction of a classification tree
  Training set
    Actual traffic flows
    Pairs <feature vector, application>
      Feature vector contains relevant characteristics of traffic flows
      Application is identified using L7-filter
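C4.5 grows its classification tree by repeatedly choosing the split with the best gain ratio (information gain normalised by the split information). The sketch below evaluates that criterion for one binary threshold split. It is only the core criterion, without pruning or the other refinements of C4.5, and the four flows and their labels are made up for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def gain_ratio(rows, labels, feature, threshold):
    """Gain ratio of the binary split rows[feature] <= threshold."""
    left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
    right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
    if not left or not right:
        return 0.0  # the split separates nothing
    w_left = len(left) / len(labels)
    w_right = len(right) / len(labels)
    info_gain = entropy(labels) - w_left * entropy(left) - w_right * entropy(right)
    split_info = -(w_left * math.log2(w_left) + w_right * math.log2(w_right))
    return info_gain / split_info

# Made-up training pairs <feature vector, application>.
rows = [
    {"dport": 80, "avg_out_size": 300},
    {"dport": 80, "avg_out_size": 350},
    {"dport": 6881, "avg_out_size": 1200},
    {"dport": 6882, "avg_out_size": 1100},
]
labels = ["web", "web", "p2p", "p2p"]

print(gain_ratio(rows, labels, "dport", 80))          # 1.0
print(gain_ratio(rows, labels, "avg_out_size", 300))  # lower than 1.0
```

On this toy data the split `dport <= 80` separates the two classes perfectly, so it would be preferred over the `avg_out_size <= 300` split.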
9
Machine Learning process
  1) Collection of the training set
       Representative flows of the environment to be monitored
  2) Automatic flow classification → application class
       Pattern matching using L7-filter
       It can be simplified if an artificial training set is used in 1)
  3) Feature extraction from the training flows
  4) Construction of a C4.5 classification tree
       E.g. using Weka
  5) Deployment of the tree obtained in 4) in the monitoring system
  6) Retraining of the system
       Starting from phase 1)
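Between steps 3) and 4), the labelled feature vectors have to reach the tree builder. Assuming Weka is used, one option is to write them in Weka's ARFF text format. The helper `to_arff`, the relation name and the two sample flows below are hypothetical and serve only as a sketch.

```python
def to_arff(relation, feature_names, rows, labels):
    """Serialise numeric feature vectors plus a nominal class as ARFF text."""
    classes = sorted(set(labels))
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {name} numeric" for name in feature_names]
    lines.append("@attribute class {" + ",".join(classes) + "}")
    lines += ["", "@data"]
    for row, label in zip(rows, labels):
        values = [str(row[name]) for name in feature_names]
        lines.append(",".join(values + [label]))
    return "\n".join(lines) + "\n"

# Made-up flows; in the process above the labels would come from L7-filter.
features = ["dport", "bytes_out"]
rows = [
    {"dport": 80, "bytes_out": 1200},
    {"dport": 6881, "bytes_out": 90000},
]
labels = ["web", "p2p"]

print(to_arff("training_flows", features, rows, labels))
```

The resulting file can then be loaded in Weka, whose J48 classifier is its implementation of C4.5.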
11
Accuracy
12
Netflow Accuracy
13
Accuracy
14
Features Accuracy
  · Best Normal Feature Subset: dport, bytes_out, avg_out_size, sport, avg_in_size, push_in
  · Best Netflow Feature Subset: dport, bytes, push
15
How does sampling affect accuracy?
17
Conclusions and Future Work
  Machine learning techniques are a good solution to identify applications
  Identification in sampled scenarios is still an open problem
  Future work:
    Find a more accurate automatic system to label the dataset
    Build early decision trees to identify the flow as soon as possible
    Find features that achieve higher accuracy and are more resilient to sampling
    Test with traces from other networks to check the generality of the solution
18
Thank you for your attention
Questions?
19
SMARTxAC
  SMARTxAC: Traffic Monitoring and Analysis System for the Anella Científica
    Operational since July 2003
    Developed under a CESCA-UPC collaboration agreement
    Tailor-made traffic monitoring system for the Anella Científica
  Main objectives
    Low-cost platform
    Continuous monitoring of high-speed links without packet loss
    Detection of network anomalies and irregular usage
    Multi-user system: network operators and institutions
  Measurement of two full-duplex 10GigE links
    Connection between Anella Científica and RedIRIS
    Current load: > 5 Gbps / > 300 Kpps
20
Features
  Requirements
    Real-time extraction
    Independence from packet contents
  Feature examples (total: 25)
    Packets and bytes per flow
    Flow duration
    min/avg/max packet size
    min/avg/max TCP window size
    min/avg/max packet interarrival time
    Packets with flags PUSH, URG, DF, … set
    Average increase of IPID
    OS estimation (source and destination)
    Also ports and protocols (but not in the traditional way)
    …
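Several of the features listed above can be computed from packet headers and timestamps alone, which is what makes them independent from packet contents. The sketch below derives a handful of them for one flow. The `Packet` type, the feature names and the three sample packets are assumptions for this example, and the flow is assumed to contain at least one packet.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    timestamp: float  # capture time in seconds
    size: int         # packet length in bytes
    push: bool        # True if the TCP PUSH flag is set

def flow_features(packets):
    """Per-flow features from header fields only (no payload inspection)."""
    sizes = [p.size for p in packets]
    times = sorted(p.timestamp for p in packets)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "packets": len(packets),
        "bytes": sum(sizes),
        "duration": times[-1] - times[0],
        "min_size": min(sizes),
        "avg_size": sum(sizes) / len(sizes),
        "max_size": max(sizes),
        "avg_iat": sum(gaps) / len(gaps) if gaps else 0.0,  # interarrival time
        "push": sum(p.push for p in packets),               # packets with PUSH
    }

# Made-up flow of three packets.
flow = [
    Packet(0.0, 60, False),
    Packet(0.5, 1500, True),
    Packet(1.5, 40, False),
]
print(flow_features(flow))
```

For real-time extraction, the same values could be maintained as running counters updated per packet instead of being recomputed from a stored packet list.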
21
Netflow Features
  Requirements
    Available in the Netflow traces (version 5)
      – Unidirectional flows
  Feature examples (total: 15)
    Packets and bytes per flow
    Flow duration
    Average packet size
    Average packet interarrival time
    Flows with flags PUSH, URG, SYN, FIN, RST, ACK set
    Type of service
    Also ports and protocols (but not in the traditional way)
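A Netflow version 5 record carries only per-flow aggregates, so per-packet features reduce to averages and the TCP flags are available only as the cumulative OR over the whole flow. The sketch below derives the features listed above from one record. It assumes the record has already been parsed into a dictionary keyed by the usual v5 field names (`dPkts`, `dOctets`, `First`, `Last`, `tcp_flags`, `tos`); the output feature names and the sample record are made up.

```python
# Standard TCP flag bit masks.
TCP_FLAGS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04,
             "PUSH": 0x08, "ACK": 0x10, "URG": 0x20}

def netflow_features(record):
    """Derive flow features from a parsed Netflow v5 record (a dict)."""
    packets = record["dPkts"]
    duration_ms = record["Last"] - record["First"]  # SysUptime values in ms
    feats = {
        "packets": packets,
        "bytes": record["dOctets"],
        "duration_ms": duration_ms,
        "avg_size": record["dOctets"] / packets,
        "avg_iat_ms": duration_ms / (packets - 1) if packets > 1 else 0.0,
        "tos": record["tos"],
    }
    # tcp_flags is the OR of the flags of all packets in the flow, so each
    # feature only says whether at least one packet had the flag set.
    for name, bit in TCP_FLAGS.items():
        feats[name.lower()] = bool(record["tcp_flags"] & bit)
    return feats

# Made-up record: 0x1B = FIN | SYN | PUSH | ACK.
record = {"dPkts": 10, "dOctets": 5000, "First": 1000, "Last": 1900,
          "tcp_flags": 0x1B, "tos": 0}
print(netflow_features(record))
```

Because v5 flows are unidirectional, features such as `bytes_out` or `push_in` from the full feature set would require pairing the two records of a connection, which this sketch does not attempt.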