
Intrusion detection and identification based on Supelec TCPdump data and KDD1999. Sylvain GOMBAULT and Wei WANG, Département Réseaux, Sécurité et Multimédia.



2 Intrusion detection and identification based on Supelec TCPdump data and KDD1999. Sylvain GOMBAULT and Wei WANG, Département Réseaux, Sécurité et Multimédia, École Nationale Supérieure des Télécommunications de Bretagne, France

3 Intrusion detection and identification based on Supelec data and KDD1999. DADDi Reunion, Rennes, October 11, 2007. GET/ENST Bretagne
Outline
- Deep analysis of the KDD99 transformation and database
- Intrusion detection using Supelec TCPdump data
- Building multiple behavioral models for network intrusion identification (Monam 2007)
- kNN-based intrusion detection and identification
- PCA-based intrusion detection and identification
- Conclusion and future work

4 Data transformation and explicit approach
Transformation function:
- Choice of relevant attributes
- Definition of the properties to be satisfied (a "rich" function)
Two steps after transformation of the raw data:
- Model construction by learning from labeled data
- Detection phase: data to be classified (analyzed)
[Figure: transformation of raw traffic; example decision tree with nodes service (domain_u, http, private, time, auth) and Protocol_type (tcp, udp), leaves normal, DOS and Probe]
Duration | Service | Protocol | Class
230 s | http | tcp | normal
0 s | private | udp | DOS

5 Transformation function
Data considered: network traffic.
To feed the classification tool from the raw traffic, a transformation function $T: R \rightarrow I$ is used, where R is the set of raw traffic and I is the set of structured items.
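
A minimal Python sketch of a transformation function T: R → I, assuming the raw traffic has already been parsed into per-packet records. The `Packet` and `Item` types, the port-to-service table `SERVICES`, and the rule that maps unknown ports to `private` are illustrative assumptions, not the KDD99 or BRO implementation; only three of the 41 attributes (duration, service, protocol) are produced.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Packet:
    """Hypothetical parsed packet; real input would come from a tcpdump trace."""
    ts: float      # timestamp in seconds
    src: str
    dst: str
    sport: int
    dport: int
    proto: str     # "tcp", "udp" or "icmp"

@dataclass(frozen=True)
class Item:
    """Structured item in I (a tiny subset of the 41 attributes)."""
    duration: float
    service: str
    protocol: str

# Assumed port-to-service table (illustrative subset only).
SERVICES = {80: "http", 53: "domain_u", 113: "auth", 37: "time"}

def transform(packets):
    """Sketch of T: R -> I. Packets sharing a 5-tuple form one connection.
    Only one direction is keyed here; a real tool merges both directions."""
    conns = defaultdict(list)
    for p in packets:
        conns[(p.src, p.dst, p.sport, p.dport, p.proto)].append(p)
    items = []
    for (_, _, _, dport, proto), pkts in conns.items():
        times = [p.ts for p in pkts]
        service = SERVICES.get(dport, "private")  # unknown ports -> "private"
        items.append(Item(max(times) - min(times), service, proto))
    return items
```

With two http packets 230 s apart and one packet to an unlisted UDP port, this yields the two example rows of slide 4 (230 s, http, tcp) and (0 s, private, udp).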

6 Analysis of the KDD99 database (1)
Learning base: 4 connections (two pairs) have the same 41 attributes within each pair, but different labels:
0,icmp,ecr_i,SF,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,1,1,1.00,0.00,1.00,0.00,0.00,0.00,0.00,0.00,ipsweep.148774
0,icmp,ecr_i,SF,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,1,1,1.00,0.00,1.00,0.00,0.00,0.00,0.00,0.00,portsweep.345836
0,icmp,tim_i,SF,564,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,2,2,1.00,0.00,1.00,0.00,0.00,0.00,0.00,0.00,normal.143855
0,icmp,tim_i,SF,564,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,2,2,1.00,0.00,1.00,0.00,0.00,0.00,0.00,0.00,pod.345952
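
Such conflicts can be found mechanically by grouping records on their first 41 fields. A Python sketch, assuming the KDD99 text format (comma-separated, label in the last field, label ending with "."); `conflicting_records` and `count_affected` are hypothetical helper names.

```python
from collections import defaultdict

def conflicting_records(lines, n_attrs=41):
    """Group KDD99-style CSV records by their first n_attrs fields and return
    each attribute vector that appears with more than one label."""
    labels = defaultdict(set)
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) <= n_attrs:
            continue  # skip blank or malformed lines
        attrs = tuple(fields[:n_attrs])
        labels[attrs].add(fields[n_attrs].rstrip("."))  # "ipsweep." -> "ipsweep"
    return {a: ls for a, ls in labels.items() if len(ls) > 1}

def count_affected(lines, conflicts, n_attrs=41):
    """Number of records whose attribute vector is one of the conflicting ones."""
    return sum(1 for line in lines
               if tuple(line.strip().split(",")[:n_attrs]) in conflicts)
```

`len(conflicting_records(...))` gives the number of distinct conflicting attribute vectors, and `count_affected` the number of connections involved, the two kinds of figures reported for the test base.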

7 Analysis of the KDD99 database (2)
Test base (corrected file):
- 71 distinct attribute vectors appear with different labels; in total, 71,503 connections (22.99% of the total) have the same attributes as connections with a different label.
- 3 ipsweep (Probing) attack connections have the same attributes as smurf (DoS) attack connections (56,608 connections).
- These 3 Probing attacks (0.07%) cannot be correctly classified (they are classified as DoS attacks instead).

8 Analysis of the KDD99 database
Test base (corrected file):
- 7,563 connections of the snmpgetattack attack (97.7% of that attack) have the same attributes as normal connections.
- The remaining 2.3% of snmpgetattack connections have attributes similar to, but not identical with, those of normal connections.
- Therefore 7,563 R2L attacks (46.72% of all R2L attacks) cannot be detected (they are classified as normal).

9 Improvements to C4.5 (for KDD99)

10 Supelec TCPdump data
Supelec TCPdump (raw traffic) → BRO is used to construct the attributes: the tcpdump traffic is transformed into 41 attributes.
[Figure: transformation of raw traffic by BRO; same example as slide 4: decision tree on service (domain_u, http, private, time, auth) and Protocol_type (tcp, udp) with leaves normal, DOS and Probe, and records (230 s, http, tcp, normal) and (0 s, private, udp, DOS)]

11 Supelec TCPdump data (continued)
Transformation of the TCPdump traffic into 41 attributes, using BRO. The attributes fall into 4 categories:
- General connection data (network and transport level): service, protocol type (TCP, UDP or ICMP), …
- Application-layer attributes: number of file creations, number of shells, …
- Statistical attributes over the connections in the 2 seconds preceding the current connection
- Statistical attributes over the last 100 connections
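
The two statistical categories can be illustrated with a simplified Python sketch: for each connection, count the connections to the same destination host or the same service inside a 2-second window, and among the last 100 connections. The output keys reuse KDD99 attribute names (`count`, `srv_count`, `dst_host_count`, `dst_host_srv_count`), but the exact BRO/KDD99 definitions (including the rate attributes) are not reproduced; counting the current connection itself is an assumption of this sketch.

```python
from collections import deque
from dataclasses import dataclass

@dataclass(frozen=True)
class Conn:
    ts: float      # connection start time in seconds
    dst: str       # destination host
    service: str

def window_features(conns, window_s=2.0, last_n=100):
    """Simplified time-window and connection-window counts.
    `conns` must be sorted by ts; the current connection is counted."""
    feats = []
    recent = deque()                  # connections within the last window_s seconds
    history = deque(maxlen=last_n)    # the last last_n connections
    for c in conns:
        recent.append(c)
        while c.ts - recent[0].ts > window_s:
            recent.popleft()          # drop connections older than the window
        history.append(c)
        feats.append({
            "count": sum(1 for r in recent if r.dst == c.dst),
            "srv_count": sum(1 for r in recent if r.service == c.service),
            "dst_host_count": sum(1 for h in history if h.dst == c.dst),
            "dst_host_srv_count": sum(1 for h in history
                                      if h.dst == c.dst and h.service == c.service),
        })
    return feats
```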

12 Learning and test data sets
Learning base (from KDD99): ~5 million connections (10%, i.e. 494,021 connections of the KDD99 learning set, are used)
- 4 attack classes + normal traffic: Probing (4), DoS (6), U2R (4), R2L (9)
Test base (from Supelec):
- Normal: files 0-29 of the 101 tcpdump files (30 Gb), 4,652,059 connections (TCP: 1,173,654; UDP: 3,254,160; ICMP: 224,245); normal data only
- Attack: 10 connections (Cross-http, write-http, login-http, execute-http)

13 Results with decision trees (C4.5)
The C4.5 algorithm introduced by Quinlan, with some modifications (construction process and classification process).
Classification of the Supelec test data (rows: true class; columns: assigned class, in %):
Data | Normal | Probing | DoS | U2R | R2L | New
Normal (4,652,059) | 72.3 | 12.9 | 14.6 | 0 | 0 | 0.2
Attack (10) | 100 | 0 | 0 | 0 | 0 | 0
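
C4.5 chooses each split by the gain ratio (information gain divided by split information). The slide does not describe the authors' modifications, so this Python sketch shows only the standard criterion for discrete attributes; continuous attributes such as duration, which C4.5 handles with thresholds, are not covered.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, labels, attr):
    """Gain ratio of splitting `rows` (dicts) on the discrete attribute `attr`."""
    n = len(labels)
    parts = {}
    for row, y in zip(rows, labels):
        parts.setdefault(row[attr], []).append(y)  # one partition per attribute value
    remainder = sum(len(p) / n * entropy(p) for p in parts.values())
    gain = entropy(labels) - remainder
    split_info = -sum(len(p) / n * math.log2(len(p) / n) for p in parts.values())
    return gain / split_info if split_info > 0 else 0.0  # no split -> no gain
```

An attribute that separates the classes perfectly into two equal halves gets a gain ratio of 1.0, while an attribute with a single value gets 0.0.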

14 Intrusion detection and identification based on KDD99 data
- Building the normal model from normal data, for intrusion detection
- Building an individual model for each attack from the corresponding attack data, for intrusion identification

15 The general intrusion detection and identification model

16 kNN-based intrusion detection
Building the normal behavioral model:
- Calculate the Euclidean distance between each test vector t and each vector $x_i$ in the training data set: $d(t, x_i) = \sqrt{\sum_{j=1}^{m} (t_j - x_{ij})^2}$, where m is the dimensionality of the vectors.
- Sort the distances and choose the k nearest neighbors.
- Average the k smallest distances to obtain the anomaly index.
Detection:
- If the anomaly index of a test vector t is above a threshold, t is classified as abnormal; otherwise it is considered normal.
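
A NumPy sketch of the kNN anomaly index, assuming the 41 attributes have already been encoded as numeric vectors (symbolic attributes such as protocol or service need an encoding, and scaling is usually advisable; neither step is specified on the slide). The threshold is a free parameter.

```python
import numpy as np

def knn_anomaly_index(train, t, k=5):
    """Mean Euclidean distance from test vector t (shape (m,)) to its
    k nearest vectors in `train` (shape (n, m))."""
    d = np.sqrt(((train - t) ** 2).sum(axis=1))  # distance to every training vector
    return np.sort(d)[:k].mean()                 # average of the k smallest distances

def is_abnormal(train, t, threshold, k=5):
    """Detection rule: abnormal if the anomaly index exceeds the threshold."""
    return knn_anomaly_index(train, t, k) > threshold
```

Every test vector is compared with all n training vectors, which is why the testing stage is expensive for kNN.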

17 kNN-based intrusion identification
Define the normal data set $D_0$ and the individual attack data sets $D_1, \dots, D_p$ (one per attack type).
Identification:
For each test vector t do
  Calculate the distance d(t, x) for every x in each training set $D_i$;
  Find the k smallest distances as the k nearest neighbors;
  If more than half of the k nearest neighbors belong to a specific type i then
    t is identified as type i
  Else if the number of nearest neighbors belonging to one type is greater than that of every other type then
    t is identified as that type
  Else
    t is identified as a new attack
  End If
End For
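
The identification procedure can be sketched in Python as below. `train_sets` is a hypothetical dictionary from class name (normal or an attack type) to an array of numeric training vectors; a tie between the most frequent classes falls through to the last branch and is reported as a new attack.

```python
import numpy as np
from collections import Counter

def knn_identify(train_sets, t, k=5):
    """train_sets: class name -> (n_i, m) array. Returns a class name or "new attack"."""
    names, dists = [], []
    for name, X in train_sets.items():
        d = np.sqrt(((X - t) ** 2).sum(axis=1))  # Euclidean distances to this set
        names.extend([name] * len(d))
        dists.extend(d)
    nearest = np.argsort(dists)[:k]              # indices of the k nearest neighbours
    votes = Counter(names[i] for i in nearest).most_common()
    top_name, top_count = votes[0]
    if top_count > k / 2:                        # more than half of the k neighbours
        return top_name
    if len(votes) == 1 or top_count > votes[1][1]:  # strictly more than any other type
        return top_name
    return "new attack"                          # no type dominates
```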

18 PCA methods for intrusion detection
Principal Component Analysis (PCA):
- A dimension-reduction technique for data analysis and compression
- Defines a new coordinate system to represent the original large data set
- The axes are the eigenvectors associated with the few largest eigenvalues, which reduces the dimensionality without sacrificing much of the valuable information in the data set
- Has been applied in face recognition, text categorization, etc.
[Figure: original coordinates vs. new (principal-component) coordinates]

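19 PCA-based normal model building for intrusion detection
- Training data (attribute matrix): $X = [x_1, x_2, \dots, x_n]$, with each $x_i \in \mathbb{R}^m$
- Mean vector: $\mu = \frac{1}{n} \sum_{i=1}^{n} x_i$
- Mean-adjusted matrix: $\Phi = [x_1 - \mu, x_2 - \mu, \dots, x_n - \mu]$
- Covariance matrix: $C = \frac{1}{n} \Phi \Phi^{T}$
- Eigenvalue-eigenvector pairs of C: $(\lambda_1, u_1), \dots, (\lambda_m, u_m)$, with $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_m$
- The k eigenvectors associated with the k largest eigenvalues form $U = [u_1, \dots, u_k]$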

20 Intrusion detection based on the PCA model
- Test data: t
- Projection coefficients (principal components), using the mean vector μ of the training data: $y = U^{T} (t - \mu)$
- Reconstruction: $U y$
- Anomaly/identification index: $\varepsilon = \lVert (t - \mu) - U y \rVert$
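
The model building and the anomaly index above fit in a short NumPy sketch. Rows of `X` are connections (the transpose of the column layout used on the model-building slide), `q` is the number of retained components (written k on that slide), and the attributes are assumed to be numerically encoded; the function names are illustrative.

```python
import numpy as np

def fit_pca(X, q):
    """X: (n, m) normal training data, one connection per row.
    Returns the mean vector and U, the (m, q) matrix of leading eigenvectors."""
    mu = X.mean(axis=0)
    Phi = X - mu                          # mean-adjusted data
    C = Phi.T @ Phi / X.shape[0]          # covariance matrix (m x m)
    _, eigvecs = np.linalg.eigh(C)        # eigenvalues in ascending order for symmetric C
    U = eigvecs[:, ::-1][:, :q]           # eigenvectors of the q largest eigenvalues
    return mu, U

def anomaly_index(t, mu, U):
    """Distance between t - mu and its reconstruction U y."""
    phi = t - mu
    y = U.T @ phi                         # projection coefficients
    return np.linalg.norm(phi - U @ y)

def detect(t, mu, U, threshold):
    """True if t is considered abnormal."""
    return anomaly_index(t, mu, U) > threshold
```

Training cost is dominated by the eigendecomposition, while testing only needs one projection and one reconstruction per vector.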

21 PCA-based intrusion detection and intrusion identification
Intrusion detection:
- Given a new data vector t, if its anomaly index ε is above a threshold, the test vector is considered abnormal; otherwise it is classified as normal.
Intrusion identification:
- Calculate the Euclidean distance $\varepsilon_i$ between the test vector and its reconstruction onto each subspace formed by the normal data and by each individual type of attack, and take the minimum $\varepsilon_i$ as the identification index.
- If $\varepsilon_i$ is below the predefined threshold $\theta_i$ of that type of attack, the vector is identified as this type of attack; otherwise it is identified as a new attack.
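
For identification, one PCA subspace is fitted per class (normal and each attack type) and the smallest reconstruction error decides. A NumPy sketch under the same assumptions (numeric attributes, rows as connections); the per-class thresholds θ_i are inputs chosen by the user, and `pca_identify` is a hypothetical name.

```python
import numpy as np

def fit_pca(X, q):
    """Mean and q leading covariance eigenvectors of X (rows = connections)."""
    mu = X.mean(axis=0)
    Phi = X - mu
    _, vecs = np.linalg.eigh(Phi.T @ Phi / len(X))
    return mu, vecs[:, ::-1][:, :q]

def residual(t, mu, U):
    """epsilon_i: distance between t and its reconstruction in one subspace."""
    phi = t - mu
    return np.linalg.norm(phi - U @ (U.T @ phi))

def pca_identify(models, thresholds, t):
    """models: class name -> (mu, U); thresholds: class name -> theta_i."""
    eps = {name: residual(t, mu, U) for name, (mu, U) in models.items()}
    best = min(eps, key=eps.get)                  # minimum epsilon_i
    return best if eps[best] < thresholds[best] else "new attack"
```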

22 Learning and test data sets for intrusion identification
Data description: 41 attributes + class name, in text format.
Data for intrusion detection (KDD99 learning base):
- Learning data: 7,000 randomly selected connections
- Test data: 4 attack classes + normal traffic
  - Normal data: 10,000 randomly selected normal connections
  - Attack data: all the other attack connections, i.e. 391,458 DoS attacks, 1,126 R2L attacks, 52 U2R attacks and 4,107 Probe attacks
Data for intrusion identification (KDD99 learning base):
- Learning data: 7,000 randomly selected normal network connections, plus the first 2,000 back, 10,000 neptune, 200 pod, 20,000 smurf, 800 teardrop, 40 guess_passwd, 900 warezclient, 1,000 ipsweep, 900 portsweep, 1,200 satan, 200 nmap, 15 warezmaster and 25 buffer_overflow attack connections
- Test data: all the other network connections of these attack types are used for identification
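
The identification split can be expressed as a short Python sketch, assuming the records are (features, label) pairs in file order with KDD99 label names without the trailing dot. The quotas are those listed on the slide; the random seed and the name `split_for_identification` are illustrative.

```python
import random

# Training quotas from the slide: the first N connections of each attack type.
QUOTAS = {"back": 2000, "neptune": 10000, "pod": 200, "smurf": 20000,
          "teardrop": 800, "guess_passwd": 40, "warezclient": 900,
          "ipsweep": 1000, "portsweep": 900, "satan": 1200, "nmap": 200,
          "warezmaster": 15, "buffer_overflow": 25}

def split_for_identification(records, n_normal=7000, seed=0):
    """records: list of (features, label) in file order. Returns (train, test)."""
    rng = random.Random(seed)
    normal = [r for r in records if r[1] == "normal"]
    train = rng.sample(normal, min(n_normal, len(normal)))  # random normal subset
    test, seen = [], {}
    for r in records:
        label = r[1]
        if label in QUOTAS:                                  # only the listed attack types
            seen[label] = seen.get(label, 0) + 1
            (train if seen[label] <= QUOTAS[label] else test).append(r)
    return train, test
```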

23 Intrusion detection: results of PCA and kNN on KDD99 data (DR: detection rate, FPR: false positive rate, in %)
Method | Overall DR | Overall FPR | DoS DR | DoS FPR | R2L DR | R2L FPR | U2R DR | U2R FPR | Probe DR | Probe FPR
kNN (k=5) | 84.3 | 2.9 | 87.1 | 2.9 | 37.6 | 1.6 | 75 | 4.1 | 56.4 | 18.6
PCA | 98.8 | 0.4 | 99.2 | 0.2 | 94.5 | 4 | 88.5 | 0.6 | 80.7 | 4
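
The slide does not define DR and FPR. Assuming the usual definitions (DR: share of attack connections flagged as abnormal; FPR: share of normal connections flagged as abnormal), both can be computed from true and predicted labels with this Python sketch.

```python
def detection_metrics(y_true, y_pred):
    """y_true, y_pred: sequences of booleans, True = attack / flagged abnormal.
    Returns (DR, FPR) in percent."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # attacks detected
    fn = sum(1 for t, p in pairs if t and not p)      # attacks missed
    fp = sum(1 for t, p in pairs if not t and p)      # normal flagged as attack
    tn = sum(1 for t, p in pairs if not t and not p)  # normal accepted
    dr = 100.0 * tp / (tp + fn) if tp + fn else 0.0
    fpr = 100.0 * fp / (fp + tn) if fp + tn else 0.0
    return dr, fpr
```

Sweeping the detection threshold and recomputing these two numbers traces the trade-off between detection rate and false alarms.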

24 Intrusion identification: results of PCA and kNN on KDD99 data (identification rate, %)
Attack type | Attack category | kNN k=5 | kNN k=7 | kNN k=9 | PCA
guess_passwd | R2L | 92.3 | 92.3 | 92.3 | 92.3
warezclient | R2L | 100 | 100 | 100 | 57.5
warezmaster | R2L | 80 | 80 | 80 | 100
back | DoS | 98.5 | 99.5 | 98 | 100
neptune | DoS | 99.8 | 99.8 | 97.7 | 95.3
pod | DoS | 100 | 96.9 | 100 | 95.3
smurf | DoS | 100 | 100 | 100 | 80.5
teardrop | DoS | 97.7 | 99.4 | 97.8 | 100
buffer_overflow | U2R | 80 | 80 | 80 | 60
ipsweep | Probe | 97.6 | 99.2 | 97.6 | 6.1
nmap | Probe | 12.9 | 12.9 | 12.9 | 67.1
portsweep | Probe | 100 | 100 | 100 | 0
satan | Probe | 88.2 | 88.2 | 88.2 | 91.5

25 Comparison of the kNN and PCA methods
kNN:
- No need for training
- Suitable for a dynamic environment
- Requires heavy computation in the testing stage: each test vector is compared with every training vector, a cost proportional to m × n (m: dimensionality of the vectors; n: number of samples)
PCA:
- Needs considerable computation for training
- Lightweight in the testing stage: the cost grows with p (number of different attack types) and q (number of principal components), not with the number of training samples
- Suitable for detection on massive data sets

26 Conclusion
- The KDD99 transformation function does not extract enough information from the raw data for anomaly detection: using the 41 attributes, only 72% of the Supelec normal data are correctly classified as normal.
- kNN and PCA achieve good detection and identification results on KDD99 data.
- PCA can process massive data sets.
- The identification process needs attack data sets, which are sometimes difficult to obtain.
- The 41 attributes might be reduced for lightweight detection while maintaining detection accuracy; in future work we will use optimization methods to select the key attributes.
- Early and fast detection of network attacks is important: detecting attacks without waiting for the connection to finish is our future work.

27 Thank you for your attention! Questions?
