Estimating the detector coverage in a negative selection algorithm
Zhou Ji, St. Jude Children's Research Hospital
Dipankar Dasgupta, The University of Memphis
GECCO 2005, 7/27/2005
Outline
- Quick review of negative selection algorithms
- Description of the algorithm (mainly the new strategy for dealing with detector coverage)
- Summary
Review of negative selection algorithms
- Biological metaphor: T cells and the thymus
- Major steps:
  1. Generate detector candidates randomly
  2. Eliminate those that recognize self samples
- Major elements of a negative selection algorithm:
  - Data/detector representation
  - Matching rule ***
  - Detector generation algorithm
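The two major steps above can be sketched in a few lines. This is only an illustration under assumptions that are not in the slides: points are real-valued vectors in the unit hypercube, the matching rule is Euclidean distance within a hypothetical `self_radius`, and the function name `generate_detectors` is made up for this sketch.

```python
import math
import random


def generate_detectors(self_samples, n_detectors, self_radius,
                       dim=2, rng=None, max_tries=100_000):
    """Generate-and-censor sketch: keep random candidates that match no self sample."""
    rng = rng or random.Random()
    detectors = []
    tries = 0
    while len(detectors) < n_detectors and tries < max_tries:
        tries += 1
        # Step 1: generate a detector candidate at random.
        candidate = tuple(rng.random() for _ in range(dim))
        # Step 2: eliminate it if it recognizes (lies too close to) any self sample.
        if all(math.dist(candidate, s) > self_radius for s in self_samples):
            detectors.append(candidate)
    return detectors
```

Note that this classic form needs `n_detectors` to be chosen in advance, which is exactly the choice the rest of the talk tries to avoid.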
Review of negative selection algorithms
- It is in fact a family of algorithms
  - Different detector representations
  - Different detector generation mechanisms
  - The original generation mechanism is not always used in algorithms that are claimed (and accepted) to be negative selection algorithms.
- Main characteristics:
  - Representing the target concept using detectors in the negative space
  - Learning from one-class training data
- Detector coverage versus number of detectors
  - Coverage: the proportion of the nonself area that is covered by detectors
Main idea in this paper
- Stop generating detectors when the coverage is sufficient, instead of using a pre-chosen number of detectors
- Some earlier works used similar statistical tools to estimate the necessary number of detectors. This is a totally different approach, though it appears similar because the mathematics is similar.
How to deal with detector coverage?
Different possible approaches:
- Decide the necessary number before generation.
- Generate enough detectors *** (The real concern is the coverage, not the number.)
- Estimate afterwards, e.g. from the actual detection rate.
Goal from the statistical point of view
- Estimate the population parameter (coverage) from a sample statistic (the proportion of sampled points that are covered, i.e. an estimated probability)
- Point estimate versus confidence interval
- Two types of statistical inference: estimation versus hypothesis testing
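To make the point-estimate versus confidence-interval distinction concrete, here is a minimal sketch that uses the standard normal-approximation interval for a proportion. It assumes `n` random nonself points were sampled and `hits` of them fell inside some detector. The function name and the default `z=1.96` (roughly a 95% interval) are choices made for this illustration, not taken from the talk.

```python
import math


def coverage_interval(hits, n, z=1.96):
    """Return (point estimate, lower bound, upper bound) for the coverage."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = hits / n  # point estimate: sample proportion of covered points
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    # Clamp to [0, 1] because a proportion cannot leave that range.
    return p_hat, max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)
```

For example, 90 covered points out of 100 give a point estimate of 0.9 with an approximate 95% interval of about 0.84 to 0.96.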
Statistical basics used in this method
- Central limit theorem
  - The sample mean approximately follows a normal distribution
- Hypothesis testing
  - Testing a hypothesis (e.g. the coverage is enough) instead of estimating the value of the parameter (e.g. the coverage)
  - Null hypothesis: assumed true unless evidence shows otherwise
  - Type I error and type II error: cost
    - Type I: falsely rejecting the null hypothesis; the more costly error
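One common way to write such a test is the one-sided z-test for a proportion, sketched below. The symbols are assumptions introduced for this illustration: p is the true coverage, p_0 the target coverage, n the number of random nonself points sampled, x the number of those points covered by a detector, and z_α the upper-α quantile of the standard normal distribution. The null hypothesis is assumed here to be "the coverage is not yet enough", so that a type I error means stopping too early.

```latex
\begin{align*}
H_0 &: p \le p_0 \qquad \text{versus} \qquad H_1 : p > p_0 \\
z &= \frac{x - n p_0}{\sqrt{n p_0 (1 - p_0)}} \\
\text{reject } H_0 &\text{ if } z > z_\alpha
\end{align*}
```

For example, with n = 100, p_0 = 0.9 and x = 96, z = (96 - 90) / sqrt(100 × 0.9 × 0.1) = 6 / 3 = 2, which exceeds z_0.05 ≈ 1.645, so at the 5% significance level the coverage would be judged sufficient.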
Control parameters involved
- Self threshold
  - What and how much do we know about the training data?
- Significance level for hypothesis testing
- Target coverage
Issue of integration
- Re-use the random points obtained during hypothesis testing
- The coverage should not change during hypothesis testing
- Require a minimum sample size to ensure that the hypothesis test is valid
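The three integration points above might fit together as in the following sketch. It is not the authors' implementation, and several details are assumptions made for illustration: the unit-hypercube data space, the variable detector radius (distance to the nearest self sample minus `self_radius`), the rule-of-thumb sample-size check, the one-sided z-test with "coverage not yet enough" as the null hypothesis, and all function and parameter names.

```python
import math
import random
from statistics import NormalDist


def generate_until_covered(self_samples, self_radius, target_coverage=0.9,
                           alpha=0.05, sample_size=100, dim=2,
                           max_rounds=50, rng=None):
    """Add detectors round by round until a z-test says coverage is enough."""
    rng = rng or random.Random()
    p0 = target_coverage
    # Assumed rule of thumb for the normal approximation to be reasonable.
    if sample_size * p0 <= 5 or sample_size * (1 - p0) <= 5:
        raise ValueError("sample_size too small for the normal approximation")
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    detectors = []  # list of (center, radius) pairs
    for _ in range(max_rounds):
        hits, tested, candidates = 0, 0, []
        while tested < sample_size:
            x = tuple(rng.random() for _ in range(dim))
            d_self = min(math.dist(x, s) for s in self_samples)
            if d_self <= self_radius:
                continue  # a self point is not part of the nonself sample
            tested += 1
            if any(math.dist(x, c) < r for c, r in detectors):
                hits += 1
            else:
                # Re-use the uncovered test point as a future detector.
                candidates.append((x, d_self - self_radius))
        z = (hits - sample_size * p0) / math.sqrt(sample_size * p0 * (1 - p0))
        if z > z_alpha:
            return detectors, True  # coverage judged sufficient; stop
        # Detectors are added only between tests, so the coverage stays
        # fixed while one sample is being collected.
        detectors.extend(candidates)
    return detectors, False
```

Calling `generate_until_covered([(0.5, 0.5)], 0.1)` returns the detector list together with a flag saying whether the test passed before `max_rounds` ran out. The sketch assumes the self region does not fill the whole space; otherwise the inner loop would never find a nonself point.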
Summary
- A new negative selection algorithm is designed.
- Estimation of detector coverage with a certain confidence is integrated with the detector generation algorithm.
- The same strategy is extensible to different data representations, distance measures, or detector generation mechanisms.