Evolutionary Neural Networks


1 Evolutionary Neural Networks

2 Backgrounds: Why NN + EC?
- "Evolving brains": biological neural networks compete and evolve, which is the way intelligence was created
- Global search
- Adaptation to dynamic environments without human intervention
- Architecture evolution
[Figure: a search landscape showing the optimal solution, a local maximum, the initial weights, and population samples]

3 General Framework of EANN
Backgrounds: the general framework of EANN [X. Yao]
[Figure: Yao's general framework of evolutionary artificial neural networks]

4 Evolution of Connection Weights
Backgrounds: Evolution of Connection Weights
- Encode each individual neural network's connection weights into a chromosome
- Calculate the error function and determine each individual's fitness
- Reproduce children based on a selection criterion
- Apply genetic operators
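The weight-evolution loop above can be sketched roughly as follows; the tiny fixed 2-2-1 network, the XOR-style task, and all parameter values are illustrative assumptions, not the method of any particular paper.

```python
import random

def forward(weights, x):
    # Fixed 2-2-1 topology: weights = [input->hidden (4), hidden biases (2),
    # hidden->output (2), output bias (1)] -> 9 numbers per chromosome.
    h = [max(0.0, weights[2*i] * x[0] + weights[2*i+1] * x[1] + weights[4+i])
         for i in range(2)]                       # ReLU hidden layer
    return weights[6] * h[0] + weights[7] * h[1] + weights[8]

def fitness(weights, data):
    # Negative squared error: higher is fitter.
    return -sum((forward(weights, x) - y) ** 2 for x, y in data)

def evolve(data, pop_size=20, generations=100, seed=0):
    rng = random.Random(seed)
    # Encode each network's weights directly as a real-valued chromosome.
    pop = [[rng.uniform(-1, 1) for _ in range(9)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda w: fitness(w, data), reverse=True)
        parents = pop[:pop_size // 2]             # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, 9)             # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(9)                  # Gaussian mutation
            child[i] += rng.gauss(0, 0.2)
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda w: fitness(w, data))

data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
best = evolve(data)
```

Because the top half of the population survives unchanged, the best fitness never decreases across generations.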

5 Representation
Backgrounds: Binary representation
- Weights are represented by binary digits, e.g. 8 bits can represent connection weights between -127 and +127
- Limitation on representation precision: too few bits, and some numbers cannot be approximated; too many bits, and training may be prolonged
- To overcome the limits of binary representation, some proposed using real numbers, i.e. one real number per connection weight
- Standard genetic operators such as crossover are then not directly applicable
- However, some argue that evolutionary computation is possible with only mutation
- Fogel, Fogel and Porto (1990) adopted a single genetic operator: Gaussian random mutation
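Mutation-only evolution in the Fogel style might look like this sketch; the (1+1) acceptance scheme, the toy loss, and the step size are illustrative assumptions.

```python
import random

def gaussian_mutate(weights, sigma=0.1, rng=random):
    # Perturb every real-valued connection weight with Gaussian noise;
    # no crossover is used, as in Fogel, Fogel and Porto.
    return [w + rng.gauss(0.0, sigma) for w in weights]

def evolve(loss, weights, steps=200, seed=0):
    # (1+1) evolution: keep the child only if it is no worse than the parent.
    rng = random.Random(seed)
    for _ in range(steps):
        child = gaussian_mutate(weights, rng=rng)
        if loss(child) <= loss(weights):
            weights = child
    return weights

loss = lambda w: sum(x * x for x in w)   # toy loss: distance from the origin
result = evolve(loss, [1.0, -2.0, 0.5])
```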

6 Evolution of Architectures
Backgrounds: Evolution of Architectures
- Encode each individual neural network's architecture into a chromosome
- Train each neural network with a predetermined learning rule
- Calculate the error function and determine each individual's fitness
- Reproduce children based on a selection criterion
- Apply genetic operators

7 Backgrounds: Direct Encoding
- All information is represented by binary strings, i.e. each connection and node is specified by some binary bits
- An N-by-N matrix C can represent the connectivity among N nodes, where C(i, j) = 1 if there is a connection from node i to node j and C(i, j) = 0 otherwise
- Does not scale well, since a large NN needs a big matrix to represent it

8 Backgrounds: Indirect Encoding
- Only the most important parameters or features of an architecture are represented; other details are left to the learning process to decide
- e.g. specify the number of hidden nodes and let the learning process decide how they are connected (e.g. fully connected)
- More biologically plausible: according to discoveries in neuroscience, it is impossible for the genetic information encoded in humans to specify the whole nervous system directly

9 Evolution of Learning Rules
Backgrounds: Evolution of Learning Rules
- Decode each individual into a learning rule
- Construct a neural network (either pre-determined or random) and train it with the decoded learning rule; evolving the learning rule means adapting the learning function itself, so the connection weights are updated by an adaptive rule
- Calculate the error function and determine each individual's fitness
- Reproduce children based on a selection criterion
- Apply genetic operators

10 Two Case Studies
- Evolving an intrusion detector
- Evolving a classifier for DNA microarray data

11 Evolutionary Learning Program’s Behavior In Neural Networks for Anomaly Detection
Hi~ I am Kyung-Joong Kim from Yonsei University in South Korea. In this talk, I'll introduce the idea of applying evolutionary neural networks to anomaly detection. Learning a program's behavior with machine learning techniques based on system call audit data is effective for detecting intrusions. Among these techniques, neural networks are known for their good performance in learning system call sequences. However, they suffer from very long training times because there is no formal solution for determining a suitable network structure. In this talk, I present a novel intrusion detection technique based on evolutionary neural networks that learn the structure and connection weights simultaneously. Experimental results against the 1999 DARPA IDEVAL data are promising.

12 Motivation (1)
- Attacker's strategy: leading to malfunctions by exploiting a program's bugs, showing behavior different from the normal one
- Anomaly detection: learning a program's normal behavior from audit data and classifying programs whose behavior deviates from it as intrusions; adopted in many host-based intrusion detection systems
- System audit data and machine learning techniques: Basic Security Module (BSM); rule-based learning, neural networks and HMMs
Usually, an intruder exploits known or unknown bugs that lead to malfunctions, and the program's behavior differs from the normal one during the attack. Anomaly detection is based on the idea that the intrusion detection system flags behavior that deviates from the normal behavior modeled with machine learning techniques. The main point of the approach is how to learn normal behavior, and there is much research using various machine learning techniques. The basic source for learning is system audit data, which the Basic Security Module provides. Rule-based learning, neural networks and HMMs have previously been used to learn a program's behavior.

13 Motivation (2)
- Machine learning methods such as neural networks (NN) and HMMs are effective for intrusion detection based on a program's behavior
- The architecture of the classifier is the most important thing in classification; searching for an appropriate architecture for the problem is crucial
- NN: the number of hidden neurons and the connection information; HMM: the number of states and the connection information
- Traditional methods: trial and error; training 90 neural networks [Ghosh99] took too much time because the audit data are so large
Especially, in previous research, neural networks showed performance superior to other techniques. However, learning normal behavior takes a very long time due to the huge amount of audit data and the computationally intensive learning algorithm. Moreover, to apply neural networks to real-world problems successfully, it is very important to determine the network topology and the number of hidden nodes appropriate for the given problem, because performance hinges on the network structure. Unfortunately, there is no formal solution, and typically the network structure is designed by repeated trial-and-error cycles. An evolutionary neural network that learns weights and topology simultaneously was developed to solve this problem: optimizing architectures as well as connection weights.

14 Related Works
- S. Forrest (1998, 1999): first intrusion detection by learning a program's behavior; HMM performed better than other methods
- J. Stolfo (1997): rule-based learning (RIPPER)
- N. Ye (2001): probabilistic methods, including a decision tree, a chi-square multivariate test and a first-order Markov chain model (1998 IDEVAL data)
- Ghosh (1999, 2000): multi-layer perceptrons and an Elman neural network; the Elman network performed best (1999 IDEVAL data)
- Vemuri (2003): kNN and SVM (1998 IDEVAL data)
Let's draw some insight from the previous works. Forrest first introduced learning a program's behavior and solved the problem using a hidden Markov model. Afterwards, many researchers such as Stolfo, Ye, Ghosh and Vemuri used probabilistic, neural and rule-based methods. Ghosh, who showed the best performance on the public benchmark data, trained 90 neural networks in total for each program, with 10 to 60 hidden nodes, which required much training time. Similarly, HMMs suffer from the difficulty of determining the number of states. To the best of my knowledge, there is no prior work on evolutionary neural networks for learning a program's behavior.

15 The Proposed Method: Architecture
System call audit data and evolutionary neural networks
The main idea of our method is learning each program's behavior using an evolutionary neural network. We use system call-level audit data; the preprocessor monitors the execution of specified programs and generates system call sequences for each program. The GA modeler builds normal behavior profiles using evolutionary neural networks, one neural network per program. Unknown data are fed to the corresponding neural network, and if the evaluation value exceeds the predefined threshold, an alarm is raised.

16 Normal Behavior Modeling
Evolutionary neural networks
- Simultaneously learn weights and architectures using a genetic algorithm
- Partial training: back-propagation algorithm
- Representation: matrix
- Rank-based selection, crossover and mutation operators
- Fitness evaluation: recognition rate on training data (mixing real normal sequences and artificial intrusive sequences)
- Generate neural networks with optimal architectures for learning a program's behavior
The anomaly detector uses only attack-free data in the training phase, but to train a supervised learner like a neural network, data labeled as attacks are also needed. For this reason, we generated artificial random system call sequences and used them as intrusive data.

17 ENN (Evolutionary Neural Network) Algorithm
Let me introduce the details of the evolutionary neural network we use. It is based on the simple genetic algorithm, but its genetic operators and representation are specialized. The back-propagation algorithm is used to accelerate evolution.

18 Representation
[Figure: a matrix-based genome over input node I1, hidden nodes H1-H3 and output node O1, holding connectivity bits and weight values such as 0.4, 0.5, 0.1, 0.7 and 0.2, from which the neural network is generated]
This is an example of an evolutionary neural network. It adopts a matrix-based representation. Though there are many alternatives for representing a neural network in a genetic algorithm, this one is simple and straightforward to implement. In this representation, we fix the number of input nodes, the maximum number of hidden nodes, and the number of output nodes. Half of the matrix encodes the connectivity (topology) of the network and the other half the connection weights.
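The matrix representation can be sketched as follows. The exact layout in the paper may differ; here, as an assumption, the upper triangle stores connectivity bits and the lower triangle the corresponding weights, and the node names match the slide's example.

```python
nodes = ["I1", "H1", "H2", "H3", "O1"]
n = len(nodes)

# genome[i][j] for i < j: 1 if node i connects to node j, else 0
# genome[j][i] for i < j: the weight of that connection (assumed layout)
genome = [[0.0] * n for _ in range(n)]

def set_connection(genome, i, j, w):
    genome[i][j] = 1        # connectivity bit (upper triangle)
    genome[j][i] = w        # weight (lower triangle)

set_connection(genome, 0, 1, 0.4)   # I1 -> H1
set_connection(genome, 1, 4, 0.7)   # H1 -> O1

def decode(genome):
    # List every enabled connection with its weight.
    return [(nodes[i], nodes[j], genome[j][i])
            for i in range(n) for j in range(i + 1, n) if genome[i][j] == 1]
```

One matrix thus carries both the architecture and the weights, so a single genome fully determines a network.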

19 Crossover (1)
[Figure: two parent matrices over nodes I1, H1-H3 and O1, with their connectivity and weight entries, are recombined by crossover to produce offspring matrices]

20 Crossover (2)
[Figure: a second example of the crossover operator]
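One plausible way to cross two matrix-encoded parents is to exchange whole rows at a random cut point, so each child inherits every node's outgoing connections intact; the operator actually used in the paper may differ, and the 2x2 matrices are illustrative.

```python
import random

def crossover(parent_a, parent_b, rng=random):
    # Exchange whole rows of the two parent matrices at a random cut point.
    cut = rng.randrange(1, len(parent_a))
    child1 = [row[:] for row in parent_a[:cut]] + [row[:] for row in parent_b[cut:]]
    child2 = [row[:] for row in parent_b[:cut]] + [row[:] for row in parent_a[cut:]]
    return child1, child2

a = [[0, 1], [0, 0]]
b = [[0, 0], [1, 0]]
c1, c2 = crossover(a, b, random.Random(0))
```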

21 Mutation
[Figure: mutation on the matrix representation; one operator adds a connection, inserting a new weight entry, and the other deletes an existing connection from the matrix]
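The add/delete mutation shown above might be sketched like this; treating a nonzero matrix entry as "connected" is an illustrative simplification of the paper's layout.

```python
import random

def mutate(genome, rng=random):
    # Pick a random off-diagonal entry of the connectivity/weight matrix:
    # delete the connection if it exists, otherwise add one with a small
    # random weight.
    n = len(genome)
    i, j = rng.randrange(n), rng.randrange(n)
    while i == j:
        i, j = rng.randrange(n), rng.randrange(n)
    if genome[i][j] == 0:
        genome[i][j] = rng.uniform(-1.0, 1.0)   # add connection
    else:
        genome[i][j] = 0                        # delete connection
    return genome

g = mutate([[0.0, 0.0], [0.0, 0.0]], random.Random(1))
```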

22 Anomaly Detection (1)
- 280 system calls in the BSM audit data
- 45 frequently occurring calls (indexed 0-44); all remaining calls indexed as 45
- 10 input nodes, 15 hidden nodes (the maximum number of hidden nodes), 2 output nodes
- Input values normalized between 0 and 1
- Output nodes: normal and anomaly
The frequent calls: exit, fcntl, ioctl, munmap, fork, rename, pipe, seteuid, creat, mkdir, setuid, putmsg, unlink, fchdir, utime, getmsg, chown, open - read, setgid, auditon, access, open - write, mmap, memcntl, stat, open - write,creat, audit, sysinfo, lstat, open - write,trunc, setgroups, close, readlink, open - write,creat,trunc, setpgrp, getaudit, execve, open - read,write, chdir, pathconf, vfork, open - read,write,creat
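The input encoding above might be sketched like this; the call-to-index idea and the 10-input window come from the slide, while the excerpted index table and division by 45 for normalization are assumptions.

```python
# 45 frequent BSM system calls get indices 0-44; every other call maps to 45.
FREQUENT_CALLS = ["exit", "fork", "open - read", "close", "execve"]  # excerpt
CALL_INDEX = {name: i for i, name in enumerate(FREQUENT_CALLS)}
RARE_INDEX = 45

def encode_window(calls, window=10):
    # Map a window of system calls to the network's 10 input values,
    # normalized into [0, 1] by dividing by the largest index.
    idx = [CALL_INDEX.get(c, RARE_INDEX) for c in calls[:window]]
    idx += [RARE_INDEX] * (window - len(idx))    # pad short windows
    return [i / 45 for i in idx]

x = encode_window(["fork", "execve", "unknown_call"])
```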

23 Anomaly Detection (2)
- The evaluation value rises sharply when an intrusion occurs
- Detecting locally continuous anomalous sequences is important, so previous values are taken into account
- Output values are normalized so the same threshold applies to all neural networks: normalized value = (output - m) / d, where m is the average output value on the training data and d its standard deviation
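A sketch of this evaluation step, assuming exponential smoothing over previous outputs and z-score normalization with the training mean m and standard deviation d; the exact combination rule, smoothing factor and threshold in the paper may differ, and the sample outputs are made up.

```python
def normalize(output, m, d):
    # Z-score so one threshold applies to every per-program network.
    return (output - m) / d

def evaluate(outputs, m, d, alpha=0.7, threshold=2.0):
    # Carry a fraction of the previous score forward so that locally
    # continuous anomalous sequences push the value over the threshold.
    score, alarms = 0.0, []
    for o in outputs:
        score = alpha * score + (1 - alpha) * normalize(o, m, d)
        alarms.append(score > threshold)
    return alarms

alarms = evaluate([0.1, 0.1, 0.9, 0.95, 0.97], m=0.1, d=0.05)
```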

24 Experimental Design
- 1999 DARPA IDEVAL data provided by MIT Lincoln Lab
- Attack types: Denial of Service, Probe, Remote-to-Local (R2L), User-to-Root (U2R)
- Main focus: detection of U2R attacks, which leave traces in the audit data
- Monitoring the behavior of programs with SETUID privilege, the main target of U2R attacks: at, rsh, sendmail, deallocate, atq, su, utmp_update, list_devices, atm, uptime, accton, ffbconfig, chkey, w, xlock, ptree, crontab, yppasswd, ff.core, pwait, eject, volcheck, kcms_configure, ssh, fdformat, ct, kcms_calibrate, sulogin, login, nispasswd, mkcookie, admintool, newgrp, top, allocate, passwd, quota, mkdevalloc, whodo, ps, ufsdump, mkdevmaps, pt_chmod, rcp, ufsrestore, ping, rlogin, rdist, exrecover, sacadm

25 Experimental Design (2)
- 1999 IDEVAL: audit data for 5 weeks; weeks 1 and 3 (attack free) are the training data, weeks 4-5 the test data
- The test data include 11 attacks in total across 4 types of U2R
- Genetic algorithm settings: population size 20, crossover rate 0.3, mutation rate 0.08, maximum generation 100; the best individual of the last generation is used

Name       Description                                             Times
eject      exploiting a buffer overflow in the 'eject' program     2
ffbconfig  exploiting a buffer overflow in the 'ffbconfig' program 2
fdformat   exploiting a buffer overflow in the 'fdformat' program  3
ps         race condition attack in the 'ps' program               4

26 Evolution Results
Fitness converges to 0.8 near 100 generations.

27 Learning Time
Environment: Intel Pentium Xeon 2.4 GHz dual processor, 1 GB RAM, Solaris 9 operating system
Data: login program, 1905 sequences in total
Parameters: learning for 5000 epochs; average of 10 runs

Type  Hidden Nodes  Running Time (sec)
MLP   10            235.5
MLP   15            263.4
MLP   20            454.2
MLP   25            482
MLP   30            603.6
MLP   35            700
MLP   40            853.6
MLP   50            1216
MLP   60            1615
ENN   -             4460

28 Effectiveness of Evolutionary NN for IDS
Detection rates:
- 100% detection rate with 0.7 false alarms per day
- The Elman NN, which previously showed the best performance on the 1999 IDEVAL data: 100% detection rate with 3 false alarms per day

29 Results Analysis - Architecture of NN
The best individual for learning the behavior of the ps program: effective for system call sequences and more complex than a general MLP.

30 Comparison of Architectures
Comparison of the number of connections between an ENN evolved for 100 generations on the ps program data and an MLP: they have a similar number of connections, but the ENN has different types of connections and a more sophisticated architecture.

MLP (FROM╲TO)   Input  Hidden  Output
Input             -     150      -
Hidden            -      -       30

ENN (FROM╲TO)   Input  Hidden  Output
Input             -      86     15
Hidden            -      67     19

31 Evolving Artificial Neural Networks for DNA Microarray Analysis

32 Motivation
- Colon cancer: second only to lung cancer as a cause of cancer-related mortality in Western countries
- The development of microarray technology has supplied a large volume of data to many fields; it has been applied to the prediction and diagnosis of cancer and is expected to help us predict and diagnose cancer accurately
- Proposed method: feature selection + evolutionary neural network (ENN); the ENN places no restriction on the architecture (design without human prior knowledge)

33 What is a Microarray?
Microarray technology enables the simultaneous analysis of thousands of DNA sequences for genetic and genomic research and for diagnostics.
Two major techniques:
- Hybridization methods: cDNA microarray, oligonucleotide microarray
- Sequencing methods: SAGE

34 Acquiring Gene Expression Data

35 Machine Learning for DNA Microarray

36 Related Works

Authors         Feature                        Classifier                 Accuracy (%)
Furey et al.    Signal-to-noise ratio          SVM                        90.3
Li et al.       Genetic algorithm              KNN                        94.1
Ben-Dor et al.  All genes, TNoM score          Nearest neighbor           80.6
                                               SVM with quadratic kernel  74.2
                                               AdaBoost                   72.6
Nguyen et al.   Principal component analysis   -                          87.1
                Partial least square           Logistic discriminant      93.5
                                               Quadratic discriminant     91.9

37 Overview

38 Colon Cancer Dataset
Alon's data:
- The colon dataset consists of 62 samples of colon epithelial cells taken from colon-cancer patients
- 40 of the 62 samples are colon cancer samples; the remaining 22 are normal
- Each sample contains 2000 gene expression levels
- Samples were taken from tumors and from normal healthy parts of the colons of the same patients, and measured using high-density oligonucleotide arrays
- Training data: 31 of 62; test data: 31 of 62

39 Experimental Setup
- Feature size: 30
- Parameters of the genetic algorithm: population size 20, maximum generation number 200, crossover rate 0.3, mutation rate 0.1
- Fitness function: recognition rate on validation data
- Learning rate of BP: 0.1

40 Performance Comparison

41 Sensitivity/Specificity
Cost comparison: classifying a cancer patient as normal is more costly than classifying a normal person as a cancer patient.

EANN (Actual ╲ Predicted)   0 (Normal)   1 (Cancer)
0 (Normal)                      9            2
1 (Cancer)                      0           20

42 Architecture Analysis
- Whole architecture
- From input to hidden neurons

43 Architecture Analysis (2)
The input-to-output relationship is useful for analysis:
- Input to output
- Hidden neuron to output neuron
- Hidden neuron to hidden neuron

