
1 Applications of one-class classification -- searching for comparable applications for negative selection algorithms

2 Background Purpose: looking for real-world applications that demonstrate the usage of V-detector (a negative selection algorithm). One-class classification problem: different from conventional classification in that only information about one of the classes (the target class) is available. Original application: anomaly (outlier) detection.

3 One-class Classification Basic concept of classification: a classifier is a function which outputs a class label for each input object; it usually cannot be constructed from known rules. In pattern recognition or machine learning, a classifier (a function) is inferred from a set of training examples. Usually, the type of function is chosen beforehand and only its parameters are to be determined: linear classifiers, mixtures of Gaussians, neural networks, support vector classifiers.
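A minimal sketch of this idea, assuming scikit-learn and a made-up two-feature toy dataset (none of this is from the slides): the function family is fixed in advance and its parameters are fit to labeled examples.

```python
# Sketch: inferring a classifier from labeled training examples.
# Data and labels are invented for illustration; scikit-learn assumed.
from sklearn.svm import SVC

# Two features per object, two classes (0 and 1).
X_train = [[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]]
y_train = [0, 0, 1, 1]

clf = SVC(kernel="linear")   # the function family is chosen beforehand
clf.fit(X_train, y_train)    # its parameters are determined from the examples

print(clf.predict([[1.2, 1.9], [5.5, 8.5]]))  # expected: [0 1]
```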

4 One-class Classification Basic concept of classification: assumptions: continuity, enough information (amount of samples, limited noise), etc. Multi-class classification can be decomposed into two-class classifications.

5 One-class classification Same problems as conventional classification: definition of errors; atypical training data; measuring the complexity of a solution; the curse of dimensionality; the generalization of the method.

6 A conventional and a one-class classifier applied to an example dataset containing apples and pears, represented by 2 features per object. The solid line is the conventional classifier which distinguishes between the apples and pears, while the dashed line describes the dataset. This description can identify the outlier apple in the lower right corner, while the classifier will just classify it as a pear.
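To make the contrast concrete, a hedged sketch with synthetic fruit-like data (scikit-learn assumed; this is not the figure's actual data): the two-class model is forced to label the outlier, while a one-class model trained on the same points can reject it.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
apples = rng.normal([2.0, 2.0], 0.3, size=(50, 2))
pears = rng.normal([4.0, 4.0], 0.3, size=(50, 2))
X = np.vstack([apples, pears])
y = np.array([0] * 50 + [1] * 50)

two_class = LogisticRegression().fit(X, y)             # the "solid line"
one_class = OneClassSVM(gamma="auto", nu=0.05).fit(X)  # the "dashed line"

outlier = np.array([[6.0, 0.5]])       # like the odd apple in the corner
print(two_class.predict(outlier))      # forced into class 0 or 1
print(one_class.predict(outlier))      # -1 flags it as outside the description
```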

7 One-class classification Additional problems: most conventional classifiers assume more or less balanced data. It is hard to decide, on the basis of one class, how tightly the boundary should fit in each direction around the data. It is hard to find which features should be used for the best separation. It is impossible to estimate the false positive rate. The curse of dimensionality is more prominent. Extra constraints apply: a closed boundary, etc.

8 Various techniques Artificial outlier generation: some methods require near-target objects to be generated. Density methods: directly estimating the density of the target objects; some require a density estimate over the complete feature space, and a typical (representative) sample is assumed. Reconstruction methods: based on prior knowledge. Boundary methods: require a well-defined distance.
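A minimal sketch of a density method under simple assumptions (a single Gaussian fitted to synthetic target data, with the rejection threshold set at the 5th percentile of the training densities; the numbers are illustrative, not from any cited method):

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
targets = rng.normal(0.0, 1.0, size=(200, 2))    # target-class samples only

# Fit a single Gaussian density to the target class.
density = multivariate_normal(mean=targets.mean(axis=0),
                              cov=np.cov(targets, rowvar=False))

# Reject anything whose density falls below the 5th percentile of
# the densities seen on the training data itself.
threshold = np.percentile(density.pdf(targets), 5)

def is_target(x):
    """Accept x only if its estimated density exceeds the threshold."""
    return density.pdf(x) >= threshold

print(is_target([0.1, -0.2]))  # near the bulk of the data -> True
print(is_target([5.0, 5.0]))   # far away -> False
```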

9 Application 1: texture classification Problem: classification of texture images, e.g. polished granite (or ceramic) tiles that are widely used as construction elements. Polished granite tiles are usually inspected by a human expert using a chosen master tile as the reference; such inspection is subjective and qualitative. A one-class classifier is suitable: outliers cannot be used to train any method. Recent development: a quasi-statistical representation of binary images used as a feature space for texture image classification.

10 Based on the CCR feature space (coordinated clusters representation). Outline of the method: given a master texture image of a class, estimate statistics of the CCR histogram; use parameters of the statistics to define a closed decision boundary.

11 Master images

12 CCR feature space A binary image intensity: S = {s(l,m)}, where l = 1, 2, …, L and m = 1, 2, …, M. A rectangular window W = I × J. Scan all over the image in one-pixel steps using that window. The number of all possible states of the window is 2^W. The coordinated clusters representation consists of a histogram H_k^(I,J)(b), where k is the index of the image, (I,J) indicates the size of the window, and b = 1, 2, …, 2^W.

13 When a histogram is normalized, it is considered as a probability distribution function of occurrence: F^(I,J)(b) = (1/A) H^(I,J)(b), where A = (L−I+1) × (M−J+1). The histogram H contains all the information about the n-point correlation moments of the image if and only if the separation vectors between the n pixels fit within the scanning window. In general, the higher the order of the statistics, the more structural information is available. There is a structural correspondence between a gray-level image and its thresholded counterpart. Provided that the binary image keeps enough structural information about the primary gray-level image to be classified, the CCR of a binary image is highly suitable for recognition and classification of gray-level texture images.
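A sketch of computing the CCR distribution from these definitions (a reconstruction, not the authors' code; the window state is encoded as an integer b by reading the I×J binary pixels as bits):

```python
import numpy as np

def ccr_distribution(S, I, J):
    """CCR distribution of binary image S using an I x J scanning window."""
    L, M = S.shape
    A = (L - I + 1) * (M - J + 1)           # number of window positions
    H = np.zeros(2 ** (I * J))              # one bin per window state b
    weights = 2 ** np.arange(I * J)         # read the window's pixels as bits
    for l in range(L - I + 1):
        for m in range(M - J + 1):
            b = int(S[l:l + I, m:m + J].ravel() @ weights)
            H[b] += 1
    return H / A                            # F(b) = (1/A) * H(b)

binary_image = (np.random.default_rng(2).random((32, 32)) > 0.5).astype(int)
F = ccr_distribution(binary_image, 3, 3)    # 2^9 = 512 possible window states
print(F.sum())                              # normalized: sums to 1.0
```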

14 Framework of classification Training phase: take a set of gray-level images from each texture class; threshold each image; calculate the CCR distribution function. Recognition phase: the input test image is thresholded, its CCR distribution is computed and compared with the prototypes, and the image is assigned to the class of best match. One-class classification: define the limits of feature variations and establish the criterion.

15 Thresholding (binarization) Needed because the CCR is defined for binary images. The fuzzy C-means clustering method is used.
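A minimal fuzzy C-means binarization sketch (two intensity clusters; the fuzzifier m = 2 and the iteration count are illustrative assumptions, since the slide does not give these details):

```python
import numpy as np

def fcm_binarize(image, n_iter=50, m=2.0):
    """Binarize a gray-level image with two-cluster fuzzy C-means."""
    x = image.astype(float).ravel()
    centers = np.array([x.min(), x.max()])          # initial cluster centers
    for _ in range(n_iter):
        d = np.abs(x[:, None] - centers[None, :]) + 1e-12  # pixel-center distances
        u = d ** (-2.0 / (m - 1.0))                 # fuzzy memberships
        u /= u.sum(axis=1, keepdims=True)
        centers = (u ** m).T @ x / (u ** m).sum(axis=0)    # weighted center update
    labels = np.abs(x[:, None] - centers).argmin(axis=1)   # hard assignment
    bright = centers.argmax()                       # brighter cluster becomes 1
    return (labels == bright).astype(int).reshape(image.shape)

gray = np.random.default_rng(3).integers(0, 256, (16, 16))
print(fcm_binarize(gray))
```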

16 Training phase Assuming Q images of a class are available, a random set of P subimages is sampled from each; if only one image is available, Q independent random sets are sampled. Five measurements are calculated from the distribution functions F: 1. F̄: mass center of the subimages (not a single value, still a function or histogram). 2. D̄: mean of distances (the mean distance within a set). 3. σ̄: mean of standard deviations (the standard deviation of distances within a set). 4. D̄_c: mean distance from the q-th sample center to the center of all samples. 5. σ_c²: variance of the set centers with regard to the center of all samples.

17 Criterion A test image is accepted as belonging to the class if d(F_test, F̄) < D̄ + C·σ̄, where d is the distance in the CCR feature space and C is a tolerance constant.
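A hedged sketch of the training statistics and the decision rule as reconstructed from slides 16 and 17 (Euclidean distance and the variable names are assumptions; the paper's exact distance measure and constants may differ):

```python
import numpy as np

def train(sets):
    """sets: Q arrays, each of shape (P, n_bins), holding CCR distributions."""
    set_centers = [s.mean(axis=0) for s in sets]
    F_bar = np.mean(set_centers, axis=0)           # 1. overall mass center
    dists = [np.linalg.norm(s - c, axis=1) for s, c in zip(sets, set_centers)]
    D_bar = np.mean([d.mean() for d in dists])     # 2. mean within-set distance
    sigma_bar = np.mean([d.std() for d in dists])  # 3. mean within-set std
    return F_bar, D_bar, sigma_bar

def accept(F_test, F_bar, D_bar, sigma_bar, C=10):
    # Slide 17's rule, read as: d(F_test, F_bar) < D_bar + C * sigma_bar
    return np.linalg.norm(F_test - F_bar) < D_bar + C * sigma_bar

rng = np.random.default_rng(4)
sets = [rng.dirichlet(np.ones(16), size=8) for _ in range(5)]  # Q=5 sets, P=8
F_bar, D_bar, sigma_bar = train(sets)
print(accept(sets[0][0], F_bar, D_bar, sigma_bar))  # a training sample: True
```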

18 Results C should be in the range 1, 2, …, 20, based on the observation that σ̄ is approximately ten times smaller than D̄. 8 master images (training data) plus 16 testing images are used (128×128). For C = 1 or 2, only the master images are recognized; more images are recognized for larger C. For C < 19, there is no misclassification. The proper C depends on the size of the subimages (32, 24, and 64 are discussed).

19 Application 2: authorship Problem: authorship verification. Different from the standard text categorization problem: it is not realistic to train with negative samples. Differences from other one-class classification problems: 1. Negative samples are not lacking – it is hard to choose samples that represent the entire negative class. 2. The object texts are long – we can chunk them into multiple samples, giving a set instead of a single instance.

20 New idea: depth of difference between two sets. Test the rate of degradation of accuracy as the best features are iteratively dropped.

21 Standard method Choose a feature set: frequencies of function words, syntactic structures, parts-of-speech n-grams, complexity and richness measures, syntactic and orthographic idiosyncrasies. Note: very different from text categorization by topic. Having constructed feature vectors, use a learning algorithm to construct a distinguishing model (this part is similar to categorization by topic); linear separators are believed to work well. Assessment: k-fold cross-validation or bootstrapping.
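An illustrative sketch of the feature step (the function-word list and the two text chunks are made up, far smaller than what the studies use; scikit-learn assumed):

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# A tiny made-up function-word list; real studies use hundreds of features.
FUNCTION_WORDS = ["the", "of", "and", "to", "in", "that", "it", "was"]

chunks = ["it was the best of times it was the worst of times",
          "in the beginning of that long road to the sea"]
labels = [0, 1]                        # chunks from author A vs. author B

counts = CountVectorizer(vocabulary=FUNCTION_WORDS).fit_transform(chunks)
X = counts.toarray().astype(float)
X /= X.sum(axis=1, keepdims=True)      # raw counts -> relative frequencies

clf = LinearSVC().fit(X, labels)       # a linear separator, per the slide
print(clf.predict(X))                  # -> [0 1]
```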

22 One-class scenario Naïve approach: chunk the two works to generate two sufficiently large sets, then test whether cross-validation can distinguish them with high accuracy. This failed in experiments: different works are just different enough to tell apart. New approach: unmasking. In the naïve approach, a small number of features do all the work; they are likely to stem from thematic differences, differences in genre or purpose, chronological shifts of style, or deliberate attempts to mask identity. Unmasking: remove the features that are most useful for distinguishing the sets. Hypothesis: if the works are by the same author, the difference will be reflected in only a relatively small number of features, so a sudden degradation in accuracy indicates the same author.
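A hedged sketch of unmasking (the number of features dropped per round and the cross-validation setup are illustrative choices, not the paper's exact protocol; the demo data is synthetic):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

def unmasking_curve(X, y, rounds=10, drop_per_round=3):
    """Cross-validated accuracy as the strongest features are removed."""
    active = np.arange(X.shape[1])          # indices of surviving features
    accuracies = []
    for _ in range(rounds):
        clf = LinearSVC().fit(X[:, active], y)
        accuracies.append(cross_val_score(clf, X[:, active], y, cv=5).mean())
        # drop the features carrying the largest absolute weights
        top = np.argsort(np.abs(clf.coef_[0]))[-drop_per_round:]
        active = np.delete(active, top)
    return accuracies

rng = np.random.default_rng(5)
X = rng.normal(size=(40, 30))
y = np.array([0] * 20 + [1] * 20)
X[y == 1, :3] += 1.5          # only a few features separate the two "works"
print(unmasking_curve(X, y))  # same-author case: accuracy collapses quickly
```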

23

24 Results Corpus: 19th-century English literature. Baseline: one-class SVM. Extension: using negative samples to eliminate false positives. Solution to a literary mystery: the case of the bashful rabbi.

25 Bibliography D. M. J. Tax, One-class classification, PhD thesis, Delft University of Technology, 2001. D. M. J. Tax, Data description toolbox: a Matlab toolbox for data description, outlier and novelty detection. M. Koppel and J. Schler, Authorship verification as a one-class classification problem, in Proceedings of the 21st International Conference on Machine Learning, 2004. R. E. Sanchez-Yanez et al., One-class texture classifier in the CCR feature space, Pattern Recognition Letters, 24, 2003. R. E. Sanchez-Yanez et al., A framework for texture classification using the coordinated clusters representation, Pattern Recognition Letters, 24, 2003.

