What is a matching rule? A matching rule specifies when a sample and a detector are considered to match. It plays an important role in the negative selection algorithm, and it largely depends on the data representation.
In real-valued representation, a detector can be visualized as a hypersphere. (Figure: candidate 1 is thrown away; candidate 2 is made a detector. Match or not match?)
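For real-valued data, the hypersphere picture suggests a Euclidean-distance matching rule. The Python sketch below illustrates that reading under stated assumptions; it is not the author's code. A sample matches a detector when its distance to the detector's center is strictly less than the detector's radius, and a candidate is thrown away when it lies within the self radius of any self sample. The names `matches` and `is_self_reactive` are hypothetical, and the strict inequality is an assumed convention.

```python
import math
from typing import Sequence

Point = Sequence[float]


def matches(sample: Point, center: Point, radius: float) -> bool:
    """Assumed hypersphere matching rule: the sample matches the detector
    if its Euclidean distance to the center is strictly below the radius."""
    return math.dist(sample, center) < radius


def is_self_reactive(candidate: Point, self_samples: Sequence[Point],
                     self_radius: float) -> bool:
    """A candidate is thrown away if it lies within self_radius of any
    self sample, i.e. it would match normal data."""
    return any(math.dist(candidate, s) < self_radius for s in self_samples)


if __name__ == "__main__":
    self_samples = [(0.2, 0.2), (0.25, 0.3)]
    print(is_self_reactive((0.22, 0.21), self_samples, 0.05))  # True: thrown away
    print(is_self_reactive((0.8, 0.8), self_samples, 0.05))    # False: may become a detector
    print(matches((0.85, 0.8), (0.8, 0.8), 0.1))               # True: inside the detector
```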
Main idea of V-detector. By allowing detectors to have variable properties, V-detector enhances the negative selection algorithm in several respects: fewer, larger detectors suffice to cover the non-self region, saving time and space; small detectors cover holes better; coverage is estimated while the detector set is being generated; and the shapes of detectors, or even the types of matching rules, can be made variable too.
Main concept of negative selection and V-detector (figure panels: constant-sized detectors; variable-sized detectors)
Outline of the algorithm (generation of variable-sized detector set)
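The generation of a variable-sized detector set can be sketched roughly as follows. This is a minimal Python illustration under assumptions, not the published algorithm verbatim: candidates are drawn uniformly from the unit hypercube; a candidate already inside an existing detector counts toward the coverage estimate; otherwise its radius is set to the distance to the nearest self sample minus the self radius, and it is thrown away if that radius is not positive. The stopping rule (stop after about 1/(1 - c0) consecutive covered candidates, where c0 is the target coverage) and the names `generate_v_detectors`, `target_coverage` and `max_detectors` are assumptions for illustration. `self_samples` must be non-empty.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Detector = Tuple[Point, float]  # (center, radius)


def generate_v_detectors(self_samples: Sequence[Point], self_radius: float,
                         target_coverage: float, dim: int,
                         max_detectors: int = 1000,
                         rng: Optional[random.Random] = None) -> List[Detector]:
    """Sketch of variable-sized detector generation in [0, 1]^dim."""
    if not 0.0 < target_coverage < 1.0:
        raise ValueError("target_coverage must be in (0, 1)")
    rng = rng or random.Random()
    detectors: List[Detector] = []
    # Assumed stopping rule: if about 1 / (1 - c0) consecutive random
    # candidates all fall inside existing detectors, the non-self region is
    # taken to be covered to roughly the target coverage c0.
    needed = math.ceil(1.0 / (1.0 - target_coverage))
    consecutive_covered = 0
    while len(detectors) < max_detectors:
        x = tuple(rng.random() for _ in range(dim))
        if any(math.dist(x, c) < r for c, r in detectors):
            consecutive_covered += 1
            if consecutive_covered >= needed:
                break  # estimated coverage reached
            continue
        # Variable size: the radius reaches up to the self region of the
        # nearest self sample (distance to that sample minus the self radius).
        radius = min(math.dist(x, s) for s in self_samples) - self_radius
        if radius <= 0.0:
            continue  # candidate lies inside the self region: thrown away
        detectors.append((x, radius))
        consecutive_covered = 0  # assumed: restart the count after a new detector
    return detectors
```

Under this sketch, the self radius and the target coverage are the only tuning knobs, which matches the two control parameters named in the summary; `max_detectors` is only a safety cap.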
Detector set generation algorithm (figure panels: constant-sized detectors; variable-sized detectors)
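For contrast, a constant-sized detector set can be sketched as below. This Python sketch assumes one common formulation, not necessarily the one in the figure: every detector has the same radius, a random candidate is accepted only if its hypersphere stays clear of every self sample's hypersphere, and generation stops once a preset number of detectors exists (or a hypothetical `max_tries` budget runs out). All names are illustrative.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Detector = Tuple[Point, float]  # (center, radius)


def generate_constant_detectors(self_samples: Sequence[Point], self_radius: float,
                                detector_radius: float, num_detectors: int,
                                dim: int, max_tries: int = 100_000,
                                rng: Optional[random.Random] = None) -> List[Detector]:
    """Sketch of constant-sized detector generation in [0, 1]^dim."""
    rng = rng or random.Random()
    detectors: List[Detector] = []
    for _ in range(max_tries):
        if len(detectors) >= num_detectors:
            break  # stopping criterion: a preset number of detectors
        x = tuple(rng.random() for _ in range(dim))
        # Assumed acceptance test: no self sample within self_radius + detector_radius.
        if all(math.dist(x, s) >= self_radius + detector_radius for s in self_samples):
            detectors.append((x, detector_radius))
    return detectors
```

The key difference from the variable-sized sketch is the stopping criterion: here the user must guess how many detectors are enough, whereas the variable-sized version stops on a coverage estimate.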
Screenshots of the software (panels: message view; visualization of data points and detectors)
Experiments and results. Synthetic data (2D): training data are randomly chosen from the normal region. Fisher's Iris data: one of the three types is considered normal. Biomedical data: abnormal data are the medical measurements of disease-carrier patients. Air pollution data: abnormal data are made by artificially altering the normal air measurements. Ball bearings: time-series measurements with preprocessing, giving 30D and 5D data.
Synthetic data: cross-shaped self space. Figure: shape of the self region and example detector coverage: (a) actual self space; (b) self radius = 0.05; (c) self radius = 0.1.
Synthetic data: cross-shaped self space, results (figure panels: detection rate and false alarm rate; number of detectors)
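The two performance measures plotted in these results can be computed as sketched below, assuming the standard definitions: detection rate = TP / (TP + FN) over abnormal test samples, and false alarm rate = FP / (FP + TN) over normal test samples. The sketch also assumes a test sample is flagged abnormal when it matches any detector under the Euclidean hypersphere rule. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Sequence[float]
Detector = Tuple[Point, float]  # (center, radius)


def is_flagged(sample: Point, detectors: Sequence[Detector]) -> bool:
    """A sample is reported as abnormal if it lies inside any detector."""
    return any(math.dist(sample, c) < r for c, r in detectors)


def detection_and_false_alarm(flagged: Sequence[bool],
                              abnormal: Sequence[bool]) -> Tuple[float, float]:
    """Return (detection rate, false alarm rate) from predictions and ground truth."""
    if len(flagged) != len(abnormal):
        raise ValueError("flagged and abnormal must have the same length")
    pairs = list(zip(flagged, abnormal))
    tp = sum(1 for f, a in pairs if f and a)          # abnormal, detected
    fn = sum(1 for f, a in pairs if not f and a)      # abnormal, missed
    fp = sum(1 for f, a in pairs if f and not a)      # normal, wrongly flagged
    tn = sum(1 for f, a in pairs if not f and not a)  # normal, correctly ignored
    detection_rate = tp / (tp + fn) if tp + fn else 0.0
    false_alarm_rate = fp / (fp + tn) if fp + tn else 0.0
    return detection_rate, false_alarm_rate


if __name__ == "__main__":
    print(detection_and_false_alarm([True, True, False, False, True],
                                    [True, False, True, False, True]))
    # (0.6666666666666666, 0.5)
```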
Iris data: comparison with other methods. Number of detectors:

Normal class (share of points used for training)   Mean     Max   Min   SD
Setosa (100%)                                        20       42     5    7.87
Setosa (50%)                                         16.44    33     5    5.63
Versicolor (100%)                                   153.24   255    72   38.8
Versicolor (50%)                                    110.08   184    60   22.61
Virginica (100%)                                    218.36   443    78   66.11
Virginica (50%)                                     108.12   203    46   30.74
Iris data: Virginica as normal, 50% of points used for training (figure panels: detection rate and false alarm rate; number of detectors)
Biomedical data: blood measurements for a group of 209 patients. Each patient has four different types of measurement. 75 patients are carriers of a rare genetic disorder; the others are normal.
Biomedical data (figure panels: detection rate and false alarm rate; number of detectors)
Air pollution data: 60 original records in total, each consisting of 16 different measurements concerning air pollution. All the real data are considered normal. Additional data are generated artificially: 1. Determine the normal range of each of the 16 measurements. 2. Randomly choose a real record. 3. Change three randomly chosen measurements, drawing the new values from a range larger than the normal one. 4. If any of the changed measurements falls outside its normal range, the record is considered abnormal; otherwise it is considered normal. In total, 1000 records, including the original 60, are used as test data; the original 60 are used as training data.
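The four generation steps can be sketched in Python as below. Two details are not given on the slide and are assumptions here: the normal range of a measurement is taken as the [min, max] of that measurement over the real records, and the "larger than normal" range extends the normal range on each side by a hypothetical fraction `widen` of its width. Function names are illustrative.

```python
import random
from typing import List, Optional, Sequence, Tuple

Record = List[float]


def normal_ranges(records: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Step 1 (assumed): normal range = [min, max] of each measurement over the real records."""
    return [(min(col), max(col)) for col in zip(*records)]


def make_artificial_record(records: Sequence[Sequence[float]],
                           ranges: Sequence[Tuple[float, float]],
                           n_changed: int = 3, widen: float = 0.5,
                           rng: Optional[random.Random] = None) -> Tuple[Record, bool]:
    """Steps 2-4: copy a random real record, alter n_changed measurements, label it."""
    rng = rng or random.Random()
    record = list(rng.choice(records))                   # step 2
    abnormal = False
    for j in rng.sample(range(len(record)), n_changed):  # step 3
        lo, hi = ranges[j]
        margin = widen * (hi - lo)  # hypothetical widening of the normal range
        record[j] = rng.uniform(lo - margin, hi + margin)
        if not lo <= record[j] <= hi:                    # step 4
            abnormal = True
    return record, abnormal


def build_test_set(records: Sequence[Sequence[float]], total: int = 1000,
                   rng: Optional[random.Random] = None) -> List[Tuple[Record, bool]]:
    """Original records (labeled normal) plus artificial ones, `total` records in all."""
    rng = rng or random.Random()
    ranges = normal_ranges(records)
    test = [(list(r), False) for r in records]
    while len(test) < total:
        test.append(make_artificial_record(records, ranges, rng=rng))
    return test
```

Because unchanged measurements come from a real record, they always lie inside their normal range, so the label depends only on the three altered values.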
Air pollution data (figure panels: detection rate and false alarm rate; number of detectors)
Ball bearing data. Raw data: time series of acceleration measurements. Preprocessing (from the time domain to the representation space used for detection): 1. FFT (Fast Fourier Transform) with Hanning windowing, window size 30. 2. Statistical moments up to the 5th order.
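The two preprocessing routes can be sketched in dependency-free Python. Several details are assumptions, not taken from the slide: the series is cut into non-overlapping windows of 30 samples; the FFT route keeps the magnitudes of all 30 coefficients of the Hanning-windowed segment (giving 30D points); and the moment route uses the mean plus central moments of order 2 to 5 (giving 5D points). A direct DFT stands in for the FFT; both compute the same coefficients, the FFT just does it faster.

```python
import cmath
import math
from typing import List, Sequence

WINDOW = 30  # window size from the slide


def hanning(n: int) -> List[float]:
    """Hanning window, same formula as numpy.hanning: 0.5 - 0.5*cos(2*pi*k/(n-1))."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def dft_magnitudes(x: Sequence[float]) -> List[float]:
    """Magnitudes of the discrete Fourier transform (direct O(n^2) evaluation)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]


def windows(signal: Sequence[float], size: int = WINDOW,
            step: int = WINDOW) -> List[Sequence[float]]:
    """Split the series into windows; non-overlapping (step == size) is assumed."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]


def fft_features(signal: Sequence[float]) -> List[List[float]]:
    """One 30-dimensional point per window: DFT magnitudes of the windowed segment."""
    w = hanning(WINDOW)
    return [dft_magnitudes([v * wk for v, wk in zip(seg, w)]) for seg in windows(signal)]


def moment_features(signal: Sequence[float], max_order: int = 5) -> List[List[float]]:
    """One 5-dimensional point per window: mean plus central moments 2..max_order."""
    points = []
    for seg in windows(signal):
        mean = sum(seg) / len(seg)
        central = [sum((v - mean) ** k for v in seg) / len(seg)
                   for k in range(2, max_order + 1)]
        points.append([mean] + central)
    return points
```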
Example of data (raw data of new bearings): first 1000 points
Example of data (FFT of new bearings): first 3 coefficients of the first 100 points
Example of data (statistical moments of new bearings): moments up to 3rd order of the first 100 points
Ball bearing structure and damage (figure: damaged cage)
Ball bearing data: results

Preprocessed with FFT:
Ball bearing condition                Total data points   Detected anomalies   Percentage detected
New bearing (normal)                  2739                0                    0%
Outer race completely broken          2241                2182                 97.37%
Broken cage with one loose element    2988                577                  19.31%
Damaged cage, four loose elements     2988                337                  11.28%
No evident damage; badly worn         2988                209                  6.99%

Preprocessed with statistical moments:
Ball bearing condition                Total data points   Detected anomalies   Percentage detected
New bearing (normal)                  2651                0                    0%
Outer race completely broken          2169                1674                 77.18%
Broken cage with one loose element    2892                14                   0.48%
Damaged cage, four loose elements     2892                0                    0%
No evident damage; badly worn         2892                0                    0%
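The "Percentage detected" column follows directly from the two count columns: detected anomalies divided by total data points, times 100. The short Python check below recomputes it from the counts in the ball bearing results and confirms the reported values; for new (normal) bearings this percentage is in effect the false alarm rate. The row tuples and `percent_detected` are illustrative names only.

```python
# (condition, total data points, detected anomalies, reported percentage)
FFT_ROWS = [
    ("New bearing (normal)", 2739, 0, 0.0),
    ("Outer race completely broken", 2241, 2182, 97.37),
    ("Broken cage with one loose element", 2988, 577, 19.31),
    ("Damaged cage, four loose elements", 2988, 337, 11.28),
    ("No evident damage; badly worn", 2988, 209, 6.99),
]
MOMENT_ROWS = [
    ("New bearing (normal)", 2651, 0, 0.0),
    ("Outer race completely broken", 2169, 1674, 77.18),
    ("Broken cage with one loose element", 2892, 14, 0.48),
    ("Damaged cage, four loose elements", 2892, 0, 0.0),
    ("No evident damage; badly worn", 2892, 0, 0.0),
]


def percent_detected(detected: int, total: int) -> float:
    """Share of data points flagged as anomalies, in percent, rounded to 2 decimals."""
    return round(100.0 * detected / total, 2)


# Every reported percentage matches the recomputed value.
for name, total, detected, reported in FFT_ROWS + MOMENT_ROWS:
    assert percent_detected(detected, total) == reported, name
```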
New development of this work: a new algorithm to generate variable-sized detectors. Purpose: reduce possible false negatives at the boundary of the self region. Why the issue exists: some self samples may be very close to the boundary. Main idea: differentiate between internal self samples and boundary self samples. Solution: combine the advantages of the algorithms described previously for generating variable-sized and constant-sized detectors.
Summary 1. V-detector uses fewer detectors to obtain similar coverage. 2. Smaller detectors are more acceptable if the total number of detectors is largely controlled. 3. Estimating coverage is superior to using a fixed number of detectors. 4. V-detector deals better with high-dimensional data, including time series. 5. Self radius and estimated coverage are the two control parameters in V-detector. 6. Variable size, variable shape, variable matching rules, or other variable properties of detectors provide encouraging opportunities to enhance the negative selection mechanism.