Real-valued negative selection algorithms Zhou Ji 11-2-2005.


1 Real-valued negative selection algorithms Zhou Ji

2 Outline
- Background
- Variations of real-valued negative selection algorithms
- More details through an example: V-detector
- Demonstration

3 Background: AIS
- AIS (Artificial Immune Systems): only about 10 years of history
  - Negative selection (development of T cells)
  - Immune network theory (how B cells and antibodies interact with each other)
  - Clonal selection (how a pool of B cells, especially memory cells, is developed)
  - New inspirations from immunology: danger theory, germinal center, etc.
- Negative selection algorithms
  - The earliest and most widely used AIS.

4 Biological metaphor of negative selection
How T cells mature in the thymus:
- The cells are diversified.
- Those that recognize self are eliminated.
- The rest are used to recognize nonself.

5 Background: the idea of negative selection algorithms (NSA)
- The problem to deal with: anomaly detection (or one-class classification)
- Detector set
  - Random generation: maintains diversity
  - Censoring: eliminates candidates that match self samples
- The concept of feature space and detectors

6 Background: outline of a typical NSA
- Generation of the detector set
- Anomaly detection (classification of incoming data items)
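The two phases in the outline can be sketched as follows. This is a minimal illustration, not the presenter's implementation: it assumes real-valued points in the unit hypercube, Euclidean distance, and constant-size detectors, and the names `generate_detectors`, `is_anomaly`, and `max_tries` are hypothetical.

```python
import math
import random


def generate_detectors(self_samples, n_detectors, radius, dim, rng, max_tries=100_000):
    """Random generation plus censoring: keep candidates that match no self sample."""
    detectors = []
    tries = 0
    while len(detectors) < n_detectors and tries < max_tries:
        tries += 1
        # Random generation: a candidate is a random point in [0, 1]^dim (assumption).
        candidate = [rng.random() for _ in range(dim)]
        # Censoring: discard the candidate if it lies within `radius` of any self sample.
        if all(math.dist(candidate, s) > radius for s in self_samples):
            detectors.append(candidate)
    return detectors


def is_anomaly(item, detectors, radius):
    """An incoming item is classified as nonself if any detector matches it."""
    return any(math.dist(item, d) <= radius for d in detectors)


rng = random.Random(0)
# Hypothetical self region: a small cluster of training points around (0.5, 0.5).
self_samples = [
    [0.5 + rng.uniform(-0.05, 0.05), 0.5 + rng.uniform(-0.05, 0.05)]
    for _ in range(50)
]
detectors = generate_detectors(self_samples, n_detectors=200, radius=0.1, dim=2, rng=rng)
```

By construction, no detector matches a training sample, so every self sample used in training is classified as normal. How well unseen normal points are treated depends on the matching threshold.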

7 Background: family of NSA
Types of work on NSA:
- Applications: solving real-world problems by using a typical version or adapting it for specific applications
- Improving NSA with new detector schemes and generation methods, and analyzing existing methods. These works are specific to a data representation, mostly binary.
- Establishing a framework for binary representation that includes various matching rules; discussing the uniqueness and usefulness of NSA; introducing new concepts.
What defines a negative selection algorithm?
- Representation in negative space
- One-class learning
- Usage of a detector set

8 Data representation in NSA
- Different representations vs. different search spaces
- Various representations:
  - Binary
  - String over a finite alphabet: no fundamental difference from binary
  - Real-valued vector
  - Hybrid
- Different distance measures
- Data representation is not the only factor that makes a scheme different

9 Real-valued NSA
- Why is real-valued NSA different from binary NSA?
  - Hard to analyze: simple combinatorics does not work
  - Necessary and proper for many real applications: binary representation may decouple the relation between feature space and representation
- Is categorization based on data representation a good way to understand and develop NSA?

10 Major issues in NSA
- Number of detectors
  - Affects the efficiency of generation and detection
- Detector coverage
  - Affects the accuracy of detection
- Generation mechanisms
  - Affect the efficiency of generation and the quality of the resulting detectors
- Matching rules: generalization
  - How to interpret the training data
  - Depends on the feature space and representation scheme
- Issues that are not NSA specific
  - Difficulty of one-class classification
  - Curse of dimensionality

11 Variations of real-valued NSA
- Rectangular detectors generated with GA
- Circular detectors that move and change size
- MILA (multilevel immune learning algorithm)

12 Rectangular detectors + GA
- Rectangular detectors: rules of value range
- Generated by a typical genetic algorithm
By Gonzalez, Dasgupta
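A rectangular detector read as a "rule of value range" can be sketched as one interval per feature. This is only an illustration of the matching step; the genetic algorithm that evolves the intervals is not shown, and the name `rectangular_match` and the interval encoding are assumptions.

```python
def rectangular_match(sample, detector):
    """Return True if every feature of `sample` lies inside the detector's range.

    `detector` is a list of (low, high) pairs, one per feature (assumed encoding).
    """
    return all(low <= value <= high for value, (low, high) in zip(sample, detector))


# Hypothetical rule: "feature 0 in [0.0, 0.5] and feature 1 in [0.7, 1.0] is nonself".
detector = [(0.0, 0.5), (0.7, 1.0)]
inside = rectangular_match([0.2, 0.8], detector)
outside = rectangular_match([0.6, 0.8], detector)
```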

13 Circular detectors (hypersphere)
- From constant size to variable size
- Moving after initial generation:
  - Reduce overlap
  - Artificial annealing
By Dasgupta, KrishnaKumar et al.
By Dasgupta, Gonzalez

14 MILA
- Multilevel: to capture local patterns and global patterns
- Negative selection + positive selection
- Euclidean distance on sub-space
For example, suppose that a self string is and the window size is chosen as 3, then the self peptide strings can be,, and so on by randomly picking up the attribute at some positions.

15 V-detector
- V-detector is a new negative selection algorithm.
- It brings together a series of related works to develop a more efficient and more reliable algorithm.
- It has its own process to generate detectors and determine coverage.

16 V-detector's major features
- Variable-sized detectors
- Statistical confidence in detector coverage
- Boundary-aware algorithm
- Extensibility

17 In real-valued representation, a detector can be visualized as a hypersphere. Candidate 1: thrown away; candidate 2: made a detector. Match or not match?

18 Variable-sized detectors in the V-detector method are maximized detectors
- Traditional detectors: constant size
- V-detector: maximized size
Unanswered question: what is the self space?

19 Why is the idea of variable-sized detectors novel?
- The rationale of constant size: a uniform matching threshold
- Detectors of variable size exist in some negative selection algorithms as a different mechanism
  - Allowing multiple or evolving sizes to optimize the coverage: limited by the concern of overlap
  - Variable size as part of the random properties of detectors/candidates
- V-detector uses variable-sized detectors to maximize the coverage with a limited number of detectors
  - Size is decided by the training data
  - Large nonself regions are covered easily
  - Small detectors cover holes
  - Overlap is not an issue in V-detector
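One way to read "size is decided by the training data" is that a candidate's radius grows until it touches the self region. The sketch below assumes Euclidean distance and takes the radius as the distance to the nearest self sample minus the self radius; the function names are hypothetical and this is an illustration rather than the presenter's code.

```python
import math


def make_detector(candidate, self_samples, self_radius):
    """Return (center, radius) for a maximized detector, or None if the candidate is self.

    Assumption: radius = distance to the nearest self sample minus the self radius.
    """
    nearest = min(math.dist(candidate, s) for s in self_samples)
    radius = nearest - self_radius
    if radius <= 0:
        return None  # the candidate falls inside the self region and is thrown away
    return (candidate, radius)


def is_nonself(item, detectors):
    """An item is flagged when it falls inside any detector's hypersphere."""
    return any(math.dist(item, center) <= radius for center, radius in detectors)


# A candidate far from the self samples becomes one large detector.
big = make_detector([0.0, 0.0], [[1.0, 0.0], [0.0, 2.0]], self_radius=0.1)
# A candidate inside the self radius of a sample is discarded.
rejected = make_detector([0.95, 0.0], [[1.0, 0.0]], self_radius=0.1)
```

A candidate in a wide nonself region gets a large radius, while one squeezed between self samples gets a small radius, which is how small detectors end up covering holes.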

20 Statistical estimate of detector coverage
- Existing works: estimate the necessary number of detectors; there is no direct relationship between the estimate and the actual detector set obtained.
- Novelty of V-detector:
  - Evaluates the coverage of the actual detector set
  - Statistical inference is used as an integrated component of the detector generation algorithm, not to estimate the coverage of a finished detector set.

21 Basic idea leading to the new estimation mechanism
- Random points are taken as detector candidates. The probability that a random point falls in a covered region (inside some existing detector) reflects the portion that is covered, similar to the idea of Monte Carlo integration.
- Proportion of covered nonself space = probability that a sample point is a covered point. (Points in the self region are not counted.)
- When more nonself space has been covered, it becomes less likely that a sample point is an uncovered one. In other words, we need to try more random points to find an uncovered one, that is, one that can be used to make a detector.
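The Monte Carlo idea can be sketched as a plain sampling loop. The code assumes points drawn uniformly from the unit hypercube, detectors stored as (center, radius) pairs, and self membership decided by a self radius around each training sample; `estimate_coverage` is a hypothetical name.

```python
import math
import random


def estimate_coverage(detectors, self_samples, self_radius, n_points, rng, dim=2):
    """Estimate the proportion of nonself space covered by `detectors`."""
    covered = 0
    counted = 0
    while counted < n_points:
        point = [rng.random() for _ in range(dim)]
        # Points that fall in the self region are not counted.
        if any(math.dist(point, s) <= self_radius for s in self_samples):
            continue
        counted += 1
        if any(math.dist(point, center) <= radius for center, radius in detectors):
            covered += 1
    return covered / n_points


rng = random.Random(0)
self_samples = [[0.5, 0.5]]
# One hypothetical detector in a corner of the unit square.
detectors = [([0.1, 0.1], 0.2)]
coverage = estimate_coverage(detectors, self_samples, 0.1, 2000, rng)
```

The loop assumes the self region does not fill the whole space; otherwise no point would ever be counted.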

22 Statistics involved
- Central limit theorem: the sample statistic follows a normal distribution
- Using a sample statistic to estimate a population parameter
  - In our application, use the proportion of covered random points to estimate the actual proportion of covered area
Figure: proportion axis from 0 to 1

23 Statistical inference
- Point estimate versus confidence interval
- Estimate with confidence interval versus hypothesis testing
- A proportion close to 100% invalidates the assumption behind the central limit theorem: the distribution is not normal.
- Purpose of terminating the detector generation

24 Hypothesis testing
- Identify the null hypothesis and the alternative hypothesis.
  - Type I error: falsely rejecting the null hypothesis
  - Type II error: falsely accepting the null hypothesis
  - The null hypothesis is the statement that we'd rather take as true if there is not strong enough evidence showing otherwise. In other words, we consider a Type I error more costly.
  - In terms of coverage estimation, we consider falsely claiming adequate coverage to be more costly. So the null hypothesis is: the current coverage is below the target coverage.
- Choose the significance level: the maximum probability we are willing to accept of making a Type I error.
- Collect a sample and compute its statistic, in this case the proportion.
- Calculate the z score from the proportion and compare it with the critical value z_α.
- If z is larger, we can reject the null hypothesis and claim adequate coverage with confidence.
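The test procedure can be sketched as a standard one-sided z test for a proportion. This is an assumed textbook formulation rather than the presenter's exact computation: the null hypothesis is that coverage is at most the target, the z score uses the normal approximation, and the critical value comes from the standard normal distribution. The name `coverage_is_adequate` is hypothetical.

```python
import math
from statistics import NormalDist


def coverage_is_adequate(n_sampled, n_covered, target_coverage, alpha):
    """One-sided z test: reject 'coverage <= target' at significance level alpha."""
    p0 = target_coverage  # must lie strictly between 0 and 1
    p_hat = n_covered / n_sampled  # sample proportion of covered points
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n_sampled)
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # critical value for the chosen level
    return z > z_alpha


# 990 of 1000 sampled nonself points covered, target 90%, significance level 5%.
adequate = coverage_is_adequate(1000, 990, target_coverage=0.9, alpha=0.05)
# 910 of 1000 is above the target, but not by enough to reject the null hypothesis.
not_yet = coverage_is_adequate(1000, 910, target_coverage=0.9, alpha=0.05)
```

The normal approximation is only reasonable when the sample is large enough that both n·p0 and n·(1 − p0) are well above a handful of points, which becomes harder to satisfy as the target approaches 100%.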

25 Boundary-aware algorithm versus point-wise interpretation
- A new concept in negative selection algorithms
- Previous works on NSA
  - The matching threshold is used as the mechanism to control the extent of generalization
  - However, each self sample is used individually. The continuous area represented by a group of samples is not captured (point-wise interpretation).
Figure labels:
- More specificity: relatively more aggressive in detecting anomalies
- More generalization: the real boundary is extended
- Desired interpretation: the area represented by the group of points

26 Boundary-aware: using the training points as a collection
- Boundary-aware algorithm
- A clustering mechanism, though represented in negative space
- The training data are used as a collection instead of individually.
- Positive selection cannot do the same thing

27 V-detector is more than a real-valued negative selection algorithm
- V-detector can be implemented for any data representation and distance measure.
  - Usually, negative selection algorithms were designed with a specific data representation and distance measure.
- The features we just introduced are not limited by representation scheme or generation mechanism (as long as we have a distance measure and a threshold to decide matching).

28-30 Contribution: V-detector algorithm with confidence in detector coverage

31 V-detector's advantages
- Efficiency:
  - Fewer detectors
  - Fast generation
- Coverage confidence
- Extensibility, simplicity

32 Experiments
- A large pool of synthetic data (2-D real space) is used to understand V-detector's behavior
  - More detailed analysis of the influence of various parameters is planned as future work
- Real-world data
  - Confirm it works well enough to detect real-world anomalies
  - Compare with methods dealing with similar problems
- Demonstration
  - What actual training data and detectors look like
  - Basic UI and visualization of the V-detector implementation

33 Parameters to evaluate its performance
- Detection rate
- False alarm rate
- Number of detectors
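The first two measures can be computed from labeled test data. The sketch below assumes the usual definitions, which the slide does not spell out: detection rate is the fraction of truly anomalous items that are flagged, and false alarm rate is the fraction of truly normal items that are flagged. The function name is hypothetical.

```python
def detection_and_false_alarm_rates(is_anomalous, flagged):
    """Return (detection rate, false alarm rate) from parallel lists of booleans.

    Assumes the test data contain at least one anomalous and one normal item.
    """
    tp = fn = fp = tn = 0
    for actual, predicted in zip(is_anomalous, flagged):
        if actual and predicted:
            tp += 1  # anomaly correctly detected
        elif actual:
            fn += 1  # anomaly missed
        elif predicted:
            fp += 1  # normal item flagged: a false alarm
        else:
            tn += 1  # normal item correctly passed
    return tp / (tp + fn), fp / (fp + tn)


rates = detection_and_false_alarm_rates(
    [True, True, False, False],
    [True, False, True, False],
)
```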

34 Control parameters and algorithm variations
- Self radius: key parameter
- Target coverage
- Significance level (of hypothesis testing)
- Boundary-aware versus point-wise
- Hypothesis testing versus naïve estimate
- Reuse random points versus minimum detector set (to be implemented)

35 Data's influence on performance
- Specific shape
  - Intuitively, corners will affect the results.
- Number of training points
  - Major influence

36 Experiments on 2-D synthetic data
Figure panels: training points (1000); test data (1000 points) and the real shape we try to learn

37 Detector sets generated
Figure panels: trained with 1000 points; trained with 100 points

38 Synthetic data (intersection and pentagram): compare naïve estimate and hypothesis testing
Figure panels: intersection shape; pentagram

39 Synthetic data: results for different shapes of self region

40 Synthetic data (ring): compare boundary-aware and point-wise
Figure panels: detection rate; false alarm rate

41 Synthetic data (cross-shaped self): balance of errors

42 Real-world data
- Biomedical data
- Pollution data
- Ball bearing: preprocessed time series data
- Others: Iris data, gene data, India Telugu

43 Results of biomedical data
Table: columns are training data, algorithm, detection rate (mean, SD), false alarm rate (mean, SD), and number of detectors (mean, SD). Rows compare MILA and NSA (at two values of r) for each of three training-data settings, starting with 100% training.

44 Results of air pollution data
Figure panels: detection rate and false alarm rate; number of detectors

45 Ball bearing's structure and damage
Figure: damaged cage

46 Ball bearing data
- Raw data: time series of acceleration measurements
- Preprocessing (from time domain to representation space for detection)
  1. FFT (Fast Fourier Transform) with Hanning windowing: window size
  2. Statistical moments: up to 5th order
Figure: example of raw data (new bearings, first 1000 points)
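The two preprocessing routes can be sketched for a single segment of the time series. Several details here are assumptions, since the slide does not give them: the segment length, the number of coefficients kept (`n_coeffs`), using coefficient magnitudes as features, and taking the mean plus central moments of orders 2 to 5 as the "moments up to 5th order". A direct DFT stands in for the FFT to keep the example dependency-free; it gives the same coefficients, only more slowly.

```python
import cmath
import math


def hann_window(n):
    """Hann (Hanning) window of length n, for n >= 2."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def windowed_spectrum(segment, n_coeffs):
    """Magnitudes of the first n_coeffs DFT coefficients of a Hann-windowed segment."""
    n = len(segment)
    x = [value * w for value, w in zip(segment, hann_window(n))]
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n_coeffs)
    ]


def statistical_moments(segment, max_order=5):
    """Mean followed by central moments of orders 2..max_order (assumed definition)."""
    n = len(segment)
    mean = sum(segment) / n
    features = [mean]
    for order in range(2, max_order + 1):
        features.append(sum((value - mean) ** order for value in segment) / n)
    return features


# Hypothetical segment: a pure tone at 4 cycles per 64-sample window.
segment = [math.sin(2 * math.pi * 4 * t / 64) for t in range(64)]
fft_features = windowed_spectrum(segment, n_coeffs=10)
moment_features = statistical_moments(segment)
```

Either feature vector can then be used as a point in the representation space in which detectors are generated.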

47 Ball bearing data: results
Two tables, one for data preprocessed with FFT and one for data preprocessed with statistical moments. Columns in each: ball bearing condition, total number of data points, number of detected anomalies, percentage detected. Rows (ball bearing conditions):
- New bearing (normal)
- Outer race completely broken
- Broken cage with one loose element
- Damaged cage, four loose elements
- No evident damage; badly worn

48 Contribution: ball bearing experiments with two different preprocessing techniques

49 Results of Iris data
Table: columns are detection rate and false alarm rate. Rows compare MILA, NSA (single level), and V-detector for each of six settings: Setosa 100%, Setosa 50%, Versicolor 100%, Versicolor 50%, Virginica 100%, Virginica 50%.

50 Iris data: number of detectors
Table: columns are mean, max, min, and SD. Rows: Setosa 100%, Setosa 50%, Versicolor 100%, Versicolor 50%, Virginica 100%, Virginica 50%.

51 Conclusions
- Real-valued NSA has unique advantages and difficulties.
- A good NSA should not be limited by differences in data representation
- A killer application is needed to support the necessity of NSA, as with many other soft computing paradigms
- Compare with other methods. In the case of NSA, other one-class classification methods, e.g. one-class SVM
- A good representation scheme and distance measure play a very important role in performance, more important than algorithm variations in many cases.

52 References
- S. Forrest, A. S. Perelson, L. Allen, and R. Cherukuri. Self-nonself discrimination in a computer. In Proc. of the IEEE Symposium on Research in Security and Privacy, IEEE Computer Society Press, Los Alamitos, CA, pp. 202–212,
- D. Dasgupta and F. Gonzalez. An Immunity-Based Technique to Characterize Intrusions in Computer Networks. IEEE Transactions on Evolutionary Computation, Volume 6, Issue 3, Page(s): , June,
- F. Gonzalez, D. Dasgupta and L.F. Nino. A Randomized Real-Valued Negative Selection Algorithm. In the proceedings of the 2nd International Conference on Artificial Immune Systems, UK, September 1-3,
- D. Dasgupta, S. Yu and N.S. Majumdar. MILA - Multilevel Immune Learning Algorithm. In the proceedings of the Genetic and Evolutionary Computation Conference (GECCO), Chicago, July
- Dasgupta, Ji, Gonzalez. Artificial immune system (AIS) research in the last five years. CEC 2003
- Ji, Dasgupta. Augmented negative selection algorithm with variable-coverage detectors. CEC 2004
- D. Dasgupta, K. KrishnaKumar, D. Wong, M. Berry. Negative Selection Algorithm for Aircraft Fault Detection. 3rd International Conference on Artificial Immune Systems, Catania, Sicily (Italy), September
- Ji, Dasgupta. Real-valued negative selection algorithm with variable-sized detectors. GECCO 2004
- Simon M. Garrett. How do we evaluate artificial immune systems? Evolutionary Computation, 13(2):145–178,
- Ji, Dasgupta. Estimating the detector coverage in a negative selection algorithm. GECCO 2005
- Ji. A boundary-aware negative selection algorithm. ASC 2005
- Ji, Dasgupta. Revisiting negative selection algorithms. Submitted to the Evolutionary Computation Journal
- Ji, Dasgupta. An efficient negative selection algorithm of probably adequate coverage. Submitted to SMC

53 Questions? Thank you!

54 What is a matching rule?
- It defines when a sample and a detector are considered matching.
- The matching rule plays an important role in a negative selection algorithm. It largely depends on the data representation.
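A matching rule can be sketched as a distance measure plus a threshold, which also shows how the rule changes with the data representation. The Euclidean and Hamming distances below are illustrative choices, not rules named on the slide, and `matches` and `hamming` are hypothetical names.

```python
import math


def matches(sample, detector, threshold, distance):
    """A sample and a detector match when their distance is within the threshold."""
    return distance(sample, detector) <= threshold


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


# Real-valued representation: Euclidean distance between vectors.
real_match = matches([0.0, 0.0], [3.0, 4.0], threshold=5.0, distance=math.dist)
# Binary representation: Hamming distance between bit strings.
binary_match = matches("10110", "10011", threshold=2, distance=hamming)
```

Because the rule only needs a distance and a threshold, the same detection code can be reused across representations by swapping the `distance` argument.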

55 Experiments and Results
- Synthetic data
  - 2D. Training data are randomly chosen from the normal region.
- Fisher's Iris data
  - One of the three types is considered normal.
- Biomedical data
  - Abnormal data are the medical measurements of patients who are disease carriers.
- Air pollution data
  - Abnormal data are made by artificially altering the normal air measurements
- Ball bearings
  - Measurement: time series data with preprocessing, 30D and 5D

56 Synthetic data - Cross-shaped self space Shape of self region and example detector coverage (a) Actual self space (b) self radius = 0.05 (c) self radius = 0.1

57 Synthetic data - Cross-shaped self space: results
Figure panels: detection rate and false alarm rate; number of detectors

58 Synthetic data - Ring-shaped self space Shape of self region and example detector coverage (a) Actual self space (b) self radius = 0.05 (c) self radius = 0.1

59 Synthetic data - Ring-shaped self space: results
Figure panels: detection rate and false alarm rate; number of detectors

60 Iris data: Virginica as normal, 50% of points used to train
Figure panels: detection rate and false alarm rate; number of detectors

61 Biomedical data
- Blood measurements for a group of 209 patients
- Each patient has four different types of measurement
- 75 patients are carriers of a rare genetic disorder. The others are normal.

62 Biomedical data
Figure panels: detection rate and false alarm rate; number of detectors

63 Air pollution data
- 60 original records in total.
- Each consists of 16 different measurements concerning air pollution.
- All the real data are considered normal.
- More data are made artificially:
  1. Decide the normal range of each of the 16 measurements
  2. Randomly choose a real record
  3. Change three randomly chosen measurements within a larger-than-normal range
  4. If some of the changed measurements are out of range, the record is considered abnormal; otherwise it is considered normal
- In total, 1000 records, including the original 60, are used as test data. The original 60 are used as training data.
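The four generation steps can be sketched as follows. Two details are assumptions, since the slide does not give them: the normal range of a measurement is taken as the minimum and maximum observed in the real records, and the "larger than normal" range widens that interval by a fraction `widen` on each side. The function names are hypothetical.

```python
import random


def normal_ranges(records):
    """Step 1: normal range of each measurement, taken here as the observed min and max."""
    return [(min(column), max(column)) for column in zip(*records)]


def make_record(records, ranges, rng, n_changed=3, widen=0.5):
    """Steps 2 to 4: perturb a real record and label the result."""
    record = list(rng.choice(records))  # step 2: randomly choose a real record
    abnormal = False
    for i in rng.sample(range(len(record)), n_changed):  # step 3: pick three measurements
        low, high = ranges[i]
        margin = widen * (high - low)
        record[i] = rng.uniform(low - margin, high + margin)  # larger-than-normal range
        if not low <= record[i] <= high:
            abnormal = True  # step 4: any changed value out of range marks the record
    return record, abnormal


rng = random.Random(0)
# Stand-in for the 60 real records of 16 measurements each (random values, for illustration).
real_records = [[rng.random() for _ in range(16)] for _ in range(60)]
ranges = normal_ranges(real_records)
generated = [make_record(real_records, ranges, rng) for _ in range(940)]
```

Generating 940 records and adding the 60 originals gives a 1000-record test set, matching the counts on the slide.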

64 Example of data (FFT of new bearings) --- first 3 coefficients of the first 100 points

65 Example of data (statistical moments of new bearings) --- moments up to 3rd order of the first 100 points

66 How much one sample tells

67 Samples may be on boundary

68 In terms of detectors

69 Comparing three methods Constant-sized detectors V-detector New algorithm Self radius = 0.05

70 Comparing three methods Constant-sized detectors V-detector New algorithm Self radius = 0.1

71 Back to the presentation

