Intrusion Detection Using Neural Networks and Support Vector Machine
IEEE WCCI IJCNN 2002: World Congress on Computational Intelligence, International Joint Conference on Neural Networks
Srinivas Mukkamala, Guadalupe Janoski, Andrew Sung
Dept. of CS, New Mexico Institute of Mining and Technology
Outline
- Approaches to intrusion detection using neural networks and support vector machines
- DARPA dataset
- Neural Networks
- Support Vector Machines
- Experiments
- Conclusion and Comments
Approaches
- Key ideas: discover useful patterns or features that describe user behavior on a system, and use the set of relevant features to build classifiers that can recognize anomalies and known intrusions
- Neural networks and support vector machines are trained on normal user activity and attack patterns
- Significant deviations from normal behavior are flagged as attacks
DARPA Data for Intrusion Detection
- DARPA (Defense Advanced Research Projects Agency): an agency of the US Department of Defense responsible for developing new technology for use by the military
- Benchmark from a KDD (Knowledge Discovery and Data Mining) competition, built on DARPA data
- Attacks fall into four main categories:
  - DOS: denial of service
  - R2L: unauthorized access from a remote machine
  - U2R: unauthorized access to local superuser (root) privileges
  - Probing: surveillance and other probing
Features: http://kdd.ics.uci.edu/databases/kddcup99/task.html
Neural Networks
- Neuron (神經): receives signals and produces an output signal
- Dendrites (樹突): gather incoming signals
- Soma (中心): combines the signals and decides whether to trigger
- Axon (軸突): carries the output signal
Divide and Conquer
- A single neuron thresholds a line in the plane: w1 X1 + w2 X2 − θ = 0 (inputs X1, X2 are weighted by w1, w2, summed, and passed through the activation to produce the output)
- One line cannot separate all four regions A, B, C, D, so two neurons N1 and N2 each draw a line, and a third neuron N3 combines their ±1 outputs to classify the regions (see the sketch below)
[Figure: perceptron diagram (INPUT, WEIGHT, ACTIVATION, OUTPUT) and the N1/N2/N3 truth tables over regions A-D]
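A minimal Python sketch of this idea. The exact weights in the slide's table are not fully recoverable, so the values below are illustrative: N1 and N2 each threshold a line, and N3 ANDs their outputs to pick out a band that no single line can separate.

```python
import numpy as np

def neuron(x, w, theta):
    # A threshold unit: fires +1 when w1*x1 + w2*x2 - theta > 0, else -1.
    return 1 if np.dot(w, x) - theta > 0 else -1

def n3(x):
    # N1 and N2 each threshold a line in the plane (illustrative weights).
    h1 = neuron(x, np.array([1.0, 1.0]), 0.5)     # line x1 + x2 = 0.5
    h2 = neuron(x, np.array([-1.0, -1.0]), -1.5)  # line x1 + x2 = 1.5, flipped side
    # N3 fires only when both N1 and N2 fire (an AND of half-planes),
    # selecting the band 0.5 < x1 + x2 < 1.5.
    return neuron(np.array([h1, h2]), np.array([1.0, 1.0]), 1.5)

print(n3(np.array([0.5, 0.5])))  # +1: inside the band
print(n3(np.array([2.0, 2.0])))  # -1: outside
```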
Feed Forward Neural Network (FFNN)
- Layers 1, 2, ..., L: (1) decide the architecture, (2) determine the weights automatically
- Activation: hyperbolic tangent, tanh(S) = (e^S − e^−S) / (e^S + e^−S)
- In general, neuron Nj in layer l computes the cumulated signal S_j^(l) = Σ_i w_ij^(l) x_i^(l−1) and outputs x_j^(l) = tanh(S_j^(l))
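A small sketch of one forward pass in Python, following the slide's notation; the convention that each weight matrix carries a first row for the bias input x_0 = 1 is an assumption of the sketch.

```python
import numpy as np

def forward(x, weights):
    """One FFNN forward pass; weights[l] maps layer l to layer l+1 and
    includes a first row for the bias input x_0 = 1."""
    activations = [x]
    for W in weights:                     # layers l = 1, ..., L
        x = np.concatenate([[1.0], x])    # prepend bias x_0^(l-1) = 1
        S = x @ W                         # cumulated signal S_j^(l)
        x = np.tanh(S)                    # activated output x_j^(l) = tanh(S_j^(l))
        activations.append(x)
    return activations                    # all x^(l), needed later for backprop
```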
Training a Neural Network
- Training data: (x_n, y_n); g(x) is the classifier composed of the weights w
- Error function E(w) measures how far g deviates from the training data
- How to minimize E(w)? Stochastic Gradient Descent (SGD):
  - w starts as small random values
  - for T iterations: w_new ← w_old − η · ∇_w(E_n), where η is the learning rate
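The update rule as a Python sketch. E_n and its gradient depend on the model, so grad_En here is a caller-supplied function, an assumption made to keep the sketch generic.

```python
import numpy as np

def sgd(grad_En, N, dim, eta=0.1, T=1000, seed=0):
    """grad_En(w, n) returns the gradient of the error on example n."""
    rng = np.random.default_rng(seed)
    w = 0.01 * rng.normal(size=dim)      # w is random small values at the beginning
    for _ in range(T):                   # for T iterations
        n = rng.integers(N)              # pick one training example at random
        w = w - eta * grad_En(w, n)      # w_new <- w_old - eta * grad_w(E_n)
    return w
```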
Back Propagation Algorithm
- Forward: for l = 1, 2, ..., L, compute S_j^(l) and x_j^(l)
- Backward: for l = L, L−1, ..., 1, compute δ_i^(l)
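A sketch of the backward pass, paired with the forward() sketch above. A squared-error output is an assumption; tanh everywhere lets the deltas reuse the stored activations, since tanh'(S) = 1 − tanh(S)² = 1 − x².

```python
import numpy as np

def backward(activations, weights, target):
    """Compute dE/dw_ij^(l) = x_i^(l-1) * delta_j^(l) for every layer."""
    x_out = activations[-1]
    delta = 2 * (x_out - target) * (1 - x_out ** 2)     # delta^(L), squared error
    grads = [None] * len(weights)
    for l in range(len(weights) - 1, -1, -1):           # l = L, L-1, ..., 1
        x_in = np.concatenate([[1.0], activations[l]])  # layer input incl. bias
        grads[l] = np.outer(x_in, delta)                # gradient for w^(l)
        if l > 0:                                       # propagate delta back,
            delta = (weights[l][1:] @ delta) * (1 - activations[l] ** 2)  # skip bias row
    return grads
```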
Feed Forward NNet: Summary
- Consists of layers 1, 2, ..., L; weight w_ij^(l) connects neuron i in layer (l−1) to neuron j in layer l
- Each neuron forms the cumulated signal S_j^(l) and the activated output x_j^(l); the activation is often tanh
- Minimize E(w) to determine the weights automatically via SGD (Stochastic Gradient Descent):
  - w starts as small random values
  - for T iterations: w_new ← w_old − η · ∇_w(E_n)
  - forward: compute S_j^(l) and x_j^(l); backward: compute δ_i^(l)
  - stop when the desired error rate is met
Support Vector Machine
- A supervised learning method
- Known as the maximum-margin classifier: find the max-margin separating hyperplane
SVM – hard margin
- Separating hyperplane: ⟨w, x⟩ − θ = 0
- Maximize the margin: argmax_{w,θ} 2/∥w∥ subject to y_n(⟨w, x_n⟩ − θ) ≥ 1
- Equivalently: argmin_{w,θ} (1/2)⟨w, w⟩ subject to y_n(⟨w, x_n⟩ − θ) ≥ 1
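A one-step derivation, not spelled out on the slide, of why the margin equals 2/∥w∥, which is what makes the two formulations equivalent:

```latex
% Take x_+ and x_- on the two supporting hyperplanes
% <w, x_+> - theta = +1 and <w, x_-> - theta = -1.
% Their separation along the unit normal w/||w|| is the margin:
\[
\frac{\langle w,\, x_+ - x_- \rangle}{\lVert w \rVert}
  = \frac{(+1) - (-1)}{\lVert w \rVert}
  = \frac{2}{\lVert w \rVert},
\]
% so maximizing 2/||w|| is the same as minimizing (1/2)<w, w>.
```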
Quadratic programming
- Standard QP form: argmin_v (1/2) Σ_i Σ_j a_ij v_i v_j + Σ_i b_i v_i subject to Σ_i r_ki v_i ≥ q_k; solved as V* = quadprog(A, b, R, q)
- Let V = [θ, w1, w2, ..., wD] and adapt the hard-margin problem for quadratic programming:
  - objective: (1/2) Σ_{d=1}^{D} w_d²
  - constraints: (−y_n) θ + Σ_{d=1}^{D} y_n (x_n)_d w_d ≥ 1
- Find A, b, R, q and put them into the QP solver
Adaptation
- V = [θ, w1, w2, ..., wD] = [v_0, v_1, v_2, ..., v_D]
- A, of size (1+D)×(1+D): a_00 = 0; a_0j = 0 and a_i0 = 0 for i, j ≠ 0; for i, j ≠ 0, a_ij = 1 if i = j, else 0
- b, of size (1+D)×1: b_i = 0 for all i
- R, of size N×(1+D): r_n0 = −y_n; r_nd = y_n (x_n)_d for d > 0
- q, of size N×1: q_n = 1
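A Python sketch of this construction. The slide's quadprog(A, b, R, q) is MATLAB-style; cvxopt is assumed here as the solver, and since cvxopt minimizes (1/2)v'Pv + q'v subject to Gv ≤ h, the ≥ constraint Rv ≥ q is passed as (−R)v ≤ −q.

```python
import numpy as np
from cvxopt import matrix, solvers

def hard_margin_svm(X, y):
    """Solve the slide's hard-margin QP for V = [theta, w_1, ..., w_D].
    X is N x D, y is a float array of +/-1 labels."""
    N, D = X.shape
    A = np.zeros((1 + D, 1 + D))                  # quadratic term: (1/2) sum_d w_d^2
    A[1:, 1:] = np.eye(D)                         # theta (v_0) has no quadratic cost
    b = np.zeros(1 + D)                           # no linear term
    R = np.hstack([-y[:, None], y[:, None] * X])  # r_n0 = -y_n, r_nd = y_n (x_n)_d
    q = np.ones(N)                                # q_n = 1
    # If the solver objects to the singular A, add a tiny ridge to its diagonal.
    sol = solvers.qp(matrix(A), matrix(b), matrix(-R), matrix(-q))
    v = np.array(sol['x']).ravel()
    return v[1:], v[0]                            # w*, theta*
```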
SVM – soft margin
- Allow some training errors via slack variables ξ_n
- Tradeoff parameter c:
  - large c: thinner margin, errors penalized heavily
  - small c: thicker margin, errors tolerated
- argmin_{w,θ} (1/2)⟨w, w⟩ + c Σ_n ξ_n subject to y_n(⟨w, x_n⟩ − θ) ≥ 1 − ξ_n and ξ_n ≥ 0
Adaptation
- V = [θ, w1, w2, ..., wD, ξ1, ξ2, ..., ξN]
- A: (1+D+N)×(1+D+N); b: (1+D+N)×1
- R: (2N)×(1+D+N); q: (2N)×1 (N margin constraints plus N constraints ξ_n ≥ 0)
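Extending the previous sketch to the soft-margin case (same assumed cvxopt-style solver), the matrices grow exactly as the sizes on this slide indicate:

```python
import numpy as np

def soft_margin_qp(X, y, c):
    """Build the QP matrices for V = [theta, w_1..w_D, xi_1..xi_N]."""
    N, D = X.shape
    n = 1 + D + N
    A = np.zeros((n, n))                          # (1+D+N) x (1+D+N)
    A[1:1 + D, 1:1 + D] = np.eye(D)               # quadratic cost on w only
    b = np.concatenate([np.zeros(1 + D), c * np.ones(N)])  # linear cost: c * sum_n xi_n
    # First N rows: y_n(<w, x_n> - theta) + xi_n >= 1, the margin with slack.
    R_margin = np.hstack([-y[:, None], y[:, None] * X, np.eye(N)])
    # Last N rows: xi_n >= 0.
    R_slack = np.hstack([np.zeros((N, 1 + D)), np.eye(N)])
    R = np.vstack([R_margin, R_slack])            # (2N) x (1+D+N)
    q = np.concatenate([np.ones(N), np.zeros(N)]) # (2N) x 1
    return A, b, R, q
```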
Primal form and Dual form
- Primal: argmin_{w,θ} (1/2)⟨w, w⟩ + c Σ_n ξ_n subject to y_n(⟨w, x_n⟩ − θ) ≥ 1 − ξ_n, ξ_n ≥ 0; variables: 1+D+N, constraints: 2N
- Dual: argmin_α (1/2) Σ_n Σ_m α_n y_n α_m y_m ⟨x_n, x_m⟩ − Σ_n α_n subject to 0 ≤ α_n ≤ C and Σ_n y_n α_n = 0; variables: N, constraints: 2N+1
Dual form SVM
- Find the optimal α*, then use α* to solve for w* and θ
- α_n = 0: x_n is classified correctly or lies on the margin
- 0 < α_n < C: x_n lies on the margin (free support vector)
- α_n = C: x_n is classified wrongly or lies on the margin
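A small sketch of the second step, assuming a linear kernel: w* follows from the stationarity condition w = Σ_n α_n y_n x_n, and any free support vector (0 < α_n < C), sitting exactly on the margin, pins down θ.

```python
import numpy as np

def recover_primal(alpha, X, y, C, tol=1e-8):
    """Recover w* and theta from the optimal dual alphas (linear kernel)."""
    w = (alpha * y) @ X                              # w* = sum_n alpha_n y_n x_n
    free = np.flatnonzero((alpha > tol) & (alpha < C - tol))
    s = free[0]                                      # any free SV lies on the margin
    theta = X[s] @ w - y[s]                          # from y_s(<w, x_s> - theta) = 1
    return w, theta
```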
Nonlinear SVM
- Nonlinear mapping Φ: X → Φ(X), e.g. {(x)_1, (x)_2} ∈ R² → {1, (x)_1, (x)_2, (x)_1², (x)_2², (x)_1(x)_2} ∈ R⁶
- In the dual: argmin_α (1/2) Σ_n Σ_m α_n y_n α_m y_m ⟨Φ(x_n), Φ(x_m)⟩ − Σ_n α_n subject to 0 ≤ α_n ≤ C, Σ_n y_n α_n = 0
- Need the kernel trick: replace ⟨Φ(x_n), Φ(x_m)⟩ with a kernel computed directly, e.g. (1 + ⟨x_n, x_m⟩)²
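A sketch of the kernel trick for the degree-2 polynomial kernel on the slide: (1 + ⟨x_n, x_m⟩)² equals ⟨Φ(x_n), Φ(x_m)⟩ for the quadratic mapping shown (up to constant scaling of the cross terms), without ever forming Φ explicitly.

```python
import numpy as np

def poly2_kernel(X1, X2):
    """Degree-2 polynomial kernel K(x_n, x_m) = (1 + <x_n, x_m>)^2,
    computed for all pairs of rows at once."""
    return (1.0 + X1 @ X2.T) ** 2

# Gram matrix that replaces <x_n, x_m> in the dual objective:
# K = poly2_kernel(X, X)
```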
Experiments
- Pre-processing: automated parsers process the raw TCP/IP dump data into machine-readable form
- Training: 7312 training records (different types of attacks and normal data), each with 41 features
- Testing: 6980 testing records to evaluate the classifiers
- Support Vector Machine: RBF kernel, C = 1000, 204 support vectors (29 free); accuracy 99.5%; time spent 17.77 sec
- Neural Network: 3-layer 41-40-40-1 FFNN, scaled conjugate gradient, desired error rate = 0.001; accuracy 99.25%; time spent 18 min
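A minimal sketch of the reported SVM configuration (RBF kernel, C = 1000) using scikit-learn, which is an assumption: the paper used its own SVM tooling. Random placeholder data stands in for the 41-feature KDD records and their labels.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.normal(size=(7312, 41))      # 7312 training records, 41 features
y_train = rng.integers(0, 2, size=7312)    # 1 = attack, 0 = normal (placeholder)
X_test = rng.normal(size=(6980, 41))       # 6980 testing records
y_test = rng.integers(0, 2, size=6980)

clf = SVC(kernel="rbf", C=1000)            # kernel and C as reported on the slide
clf.fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))
print("support vectors:", clf.n_support_.sum())
```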
Conclusion and Comments
- Speed: SVM training time is significantly shorter
- SVMs avoid the "curse of dimensionality" via the max-margin formulation
- Accuracy: both classifiers achieve high accuracy
- SVMs only perform binary classification, while IDS requires multi-class identification
- How to determine the features?