
Feature Extraction for Outlier Detection in High-Dimensional Spaces — Hoang Vu Nguyen, Vivekanand Gopalkrishnan



Presentation on theme: "Feature Extraction for Outlier Detection in High-Dimensional Spaces — Hoang Vu Nguyen, Vivekanand Gopalkrishnan" — Presentation transcript:


2 Feature Extraction for Outlier Detection in High-Dimensional Spaces
Hoang Vu Nguyen, Vivekanand Gopalkrishnan

3 Motivation
Outlier detection techniques:
- Compute distances between points in the full feature space
- Suffer from the curse of dimensionality
- Solution: feature extraction
Feature extraction techniques:
- Do not consider class imbalance
- Hence are not suitable for asymmetric classification (and outlier detection!)

4 Overview
DROUT: Dimensionality Reduction/Feature Extraction for OUTlier Detection
- Extracts features for the detection process
- Designed to be integrated with existing outlier detectors
Pipeline: the training set is fed to DROUT, which produces the features; the testing set is then projected onto these features and passed to the detector, which reports the outliers.

5 Background
Training set:
- Normal class ω_m: cardinality N_m, mean vector μ_m, covariance matrix Σ_m
- Anomaly class ω_a: cardinality N_a, mean vector μ_a, covariance matrix Σ_a
- N_m >> N_a; total number of points: N_t = N_m + N_a
Scatter matrices:
- Within-class: Σ_w = (N_m/N_t)·Σ_m + (N_a/N_t)·Σ_a
- Between-class: Σ_b = (N_m/N_t)·(μ_m − μ_t)(μ_m − μ_t)^T + (N_a/N_t)·(μ_a − μ_t)(μ_a − μ_t)^T
- Total: Σ_t = Σ_w + Σ_b
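The three scatter matrices above can be computed directly from the two classes. A minimal NumPy sketch (the function name and the use of biased, 1/N covariance estimates are my assumptions, not from the slides):

```python
import numpy as np

def scatter_matrices(X_m, X_a):
    """Within-, between-, and total scatter matrices for a normal class
    X_m and an anomaly class X_a (rows are data points)."""
    N_m, N_a = len(X_m), len(X_a)
    N_t = N_m + N_a
    mu_m, mu_a = X_m.mean(axis=0), X_a.mean(axis=0)
    mu_t = (N_m * mu_m + N_a * mu_a) / N_t          # overall mean

    cov = lambda X, mu: (X - mu).T @ (X - mu) / len(X)
    S_m, S_a = cov(X_m, mu_m), cov(X_a, mu_a)

    S_w = (N_m / N_t) * S_m + (N_a / N_t) * S_a      # within-class
    S_b = (N_m / N_t) * np.outer(mu_m - mu_t, mu_m - mu_t) \
        + (N_a / N_t) * np.outer(mu_a - mu_t, mu_a - mu_t)  # between-class
    S_t = S_w + S_b                                  # total scatter
    return S_w, S_b, S_t
```

With these definitions the identity Σ_t = Σ_w + Σ_b holds exactly: Σ_t equals the covariance of the pooled training set about μ_t.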

6 Background (cont.)
Eigenspace of a scatter matrix Σ (spanned by its eigenvectors):
- Consists of 3 subspaces: principal, noise, and null
- Solving the eigenvalue problem yields d eigenvalues v_1 ≥ v_2 ≥ … ≥ v_d
- The noise and null subspaces are caused by noise and, mainly, by insufficient training data
- Existing methods discard the noise and null subspaces → loss of information
- Jiang et al. 2008: regularize all 3 subspaces before performing feature extraction
[Figure: plot of eigenvalues, partitioned into principal (P, indices 1…m), noise (N, m+1…r), and null (Ø, r+1…d) regions]

7 DROUT Approach
Weight-adjusted within-class scatter matrix:
- Σ_w = (N_m/N_t)·Σ_m + (N_a/N_t)·Σ_a, and since N_m >> N_a, Σ_a is far less reliable than Σ_m
- Weighting Σ_m and Σ_a by (N_m/N_t) and (N_a/N_t) means that, when feature extraction is done on Σ_w (using PCA, etc.), dimensions (eigenvectors) determined mainly by small eigenvalues of Σ_m are unexpectedly removed → the extracted dimensions are not really relevant for the asymmetric classification task
  [Xudong Jiang: Asymmetric principal component and discriminant analyses for pattern classification. IEEE Trans. Pattern Anal. Mach. Intell., 31(5), 2009]
Solution:
- Σ_w = w_m·Σ_m + w_a·Σ_a, with w_m < w_a and w_m + w_a = 1 → more suitable for asymmetric classification
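The weight-adjusted combination is a one-liner; a hedged sketch assuming NumPy, the slide's constraint w_m + w_a = 1, and an illustrative helper name (the deck's experiments use w_a = 0.9):

```python
import numpy as np

def adjusted_within_scatter(S_m, S_a, w_a=0.9):
    """Weight-adjusted within-class scatter: the minority (anomaly)
    covariance S_a gets the larger weight w_a so its directions are not
    drowned out by the majority class. Requires 0.5 < w_a < 1 so that
    w_m < w_a, as on the slide."""
    w_m = 1.0 - w_a
    assert w_m < w_a, "slide requires w_m < w_a"
    return w_m * S_m + w_a * S_a
```

This replaces the cardinality-based weights (N_m/N_t, N_a/N_t), which would make the adjusted matrix almost identical to Σ_m when N_m >> N_a.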

8 DROUT Approach (cont.)
Which matrix to regularize first?
- Goal: extract features that minimize the within-class variances and maximize the between-class variances
- Within-class variances are estimated from limited training data → the small estimated variances tend to be unstable and cause overfitting
- Therefore, proceed by regularizing the 3 subspaces of the weight-adjusted within-class scatter matrix

9 DROUT Approach (cont.)
Subspace decomposition:
- Solve the eigenvalue problem on the (weight-adjusted) Σ_w to obtain eigenvectors {e_1, e_2, …, e_d} with corresponding eigenvalues v_1 ≥ v_2 ≥ … ≥ v_d
- Identify m:
  v_med = median_{i ≤ r} {v_i}
  v_{m+1} = max_{i ≤ r} {v_i : v_i < 2·v_med − v_r}
[Figure: plot of eigenvalues, partitioned into principal (1…m), noise (m+1…r), and null (r+1…d) regions]
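The median rule for locating m can be sketched as follows; this is a 0-based NumPy reading of the slide's 1-based formulas, and the function name is illustrative:

```python
import numpy as np

def find_m(v, r):
    """Split point between the principal and noise subspaces.

    v: eigenvalues sorted in descending order (v[0] >= v[1] >= ...).
    r: rank of the scatter matrix (number of reliable eigenvalues).
    Returns m, the size of the principal subspace: v_{m+1} is defined on
    the slide as the largest eigenvalue below 2*v_med - v_r, so m counts
    the leading eigenvalues at or above that threshold.
    """
    v = np.asarray(v, dtype=float)
    v_med = np.median(v[:r])
    thresh = 2.0 * v_med - v[r - 1]   # v_r in the slide's 1-based notation
    return int(np.sum(v[:r] >= thresh))
```

For example, with v = [10, 8, 6, 1, 0.9, 0.8, 0.1] and r = 6, the median is 3.5, the threshold is 6.2, and m = 2.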

10 DROUT Approach (cont.)
Subspace regularization:
- a = v_1·v_m·(m − 1)/(v_1 − v_m)
- b = (m·v_m − v_1)/(v_1 − v_m)
- Regularized eigenvalues:
  i ≤ m: x_i = v_i
  m < i ≤ r: x_i = a/(i + b)
  r < i ≤ d: x_i = a/(r + 1 + b)
- A = [e_i·w_i]_{1 ≤ i ≤ d}, where w_i = 1/sqrt(x_i)
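These formulas can be turned into a short routine. A sketch under the assumptions m ≥ 2 and v_1 > v_m (so a and b are well defined), keeping the slide's 1-based index i:

```python
import numpy as np

def regularize_eigvals(v, m, r):
    """Regularized spectrum x_1..x_d: the principal eigenvalues are kept,
    the noise region follows the fitted a/(i+b) model, and the null
    region is flattened to a/(r+1+b)."""
    v = np.asarray(v, dtype=float)
    d = len(v)
    a = v[0] * v[m - 1] * (m - 1) / (v[0] - v[m - 1])
    b = (m * v[m - 1] - v[0]) / (v[0] - v[m - 1])
    i = np.arange(1, d + 1)                       # 1-based index, as on the slide
    x = np.where(i <= m, v,
        np.where(i <= r, a / (i + b), a / (r + 1 + b)))
    return x   # the whitening weights are then w_i = 1 / sqrt(x_i)
```

The constants are chosen so the model interpolates the spectrum's endpoints: a/(1 + b) = v_1 and a/(m + b) = v_m, so the regularized curve joins the kept principal eigenvalues smoothly.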

11 DROUT Approach (cont.)
Feature extraction after regularization:
- Transform each data point p as p′ = A^T·p
- Form the new (weight-adjusted) total scatter matrix in the transformed space (cf. the Background slide) and solve the eigenvalue problem on it
- B = matrix of the c resulting eigenvectors with the largest eigenvalues
- Feature extraction is performed only after regularization → limits the loss of information
  [Xudong Jiang, Bappaditya Mandal, and Alex ChiChung Kot: Eigenfeature regularization and extraction in face recognition. IEEE Trans. Pattern Anal. Mach. Intell., 30(3):383–394, 2008]
- Transform matrix: M = A·B

12 DROUT Approach (cont.)
Summary:
1. Let Σ_w = w_m·Σ_m + w_a·Σ_a
2. Compute A from Σ_w
3. Transform the training set using A
4. Compute the new total scatter matrix Σ_t
5. Compute B by solving the eigenvalue problem on Σ_t
6. Set M = A·B
7. Use M to transform the testing set
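The steps above can be put together in one compact end-to-end sketch. Hedged: to keep it short, the spectrum regularization is simplified here to inverse-square-root whitening with a small floor (rather than the full median-rule model of the earlier slides), the rebuilt total scatter is the plain pooled scatter, and the function name and defaults (w_a = 0.9, c = d/2, matching the experiments' settings) are illustrative:

```python
import numpy as np

def drout_transform(X_m, X_a, w_a=0.9, c=None):
    """Compute the DROUT-style transform matrix M = A @ B from a normal
    training sample X_m and an anomaly sample X_a (rows are points).
    Apply to test points as rows: p_new = p @ M."""
    mu_m, mu_a = X_m.mean(axis=0), X_a.mean(axis=0)
    cov = lambda X, mu: (X - mu).T @ (X - mu) / len(X)

    # Step 1: weight-adjusted within-class scatter
    S_w = (1.0 - w_a) * cov(X_m, mu_m) + w_a * cov(X_a, mu_a)

    # Step 2: A from the eigen-decomposition of S_w
    v, E = np.linalg.eigh(S_w)                 # eigh returns ascending order
    v, E = v[::-1], E[:, ::-1]                 # make descending
    x = np.maximum(v, 1e-8)                    # simplified regularization (floor)
    A = E / np.sqrt(x)                         # column i is e_i / sqrt(x_i)

    # Steps 3-5: transform the training set, rebuild S_t, extract B
    Y = np.vstack([X_m @ A, X_a @ A])
    S_t = cov(Y, Y.mean(axis=0))
    vt, Et = np.linalg.eigh(S_t)
    order = np.argsort(vt)[::-1]
    c = c or X_m.shape[1] // 2
    B = Et[:, order[:c]]                       # top-c eigenvectors of S_t

    # Step 6: combined transform
    return A @ B
```

A detector (e.g. a distance-based one) then runs on the c-dimensional points `X_test @ M` instead of the full d-dimensional space.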

13 Related Work
APCDA [Xudong Jiang: Asymmetric principal component and discriminant analyses for pattern classification. IEEE Trans. Pattern Anal. Mach. Intell., 31(5), 2009]
- Uses weight-adjusted scatter matrices for feature extraction
- Discards the noise and null subspaces → loss of information
ERE [Xudong Jiang, Bappaditya Mandal, and Alex ChiChung Kot: Eigenfeature regularization and extraction in face recognition. IEEE Trans. Pattern Anal. Mach. Intell., 30(3):383–394, 2008]
- Performs regularization before feature extraction
- Ignores class imbalance → not suitable for outlier detection
ACP [David Lindgren and Per Spangeus: A novel feature extraction algorithm for asymmetric classification. IEEE Sensors Journal, 4(5):643–650, 2004]
- Considers neither the noise/null subspaces nor class imbalance

14 Outlier Detection with DROUT
Detectors:
- ORCA [Stephen D. Bay and Mark Schwabacher: Mining distance-based outliers in near linear time with randomization and a simple pruning rule. In KDD, pages 29–38, 2003]
- BSOUT [George Kollios, Dimitrios Gunopulos, Nick Koudas, and Stefan Berchtold: Efficient biased sampling for approximate clustering and outlier detection in large data sets. IEEE Trans. Knowl. Data Eng., 15(5):1170–1187, 2003]

15 Outlier Detection with DROUT (cont.)
Datasets:
- KDD Cup 1999: normal class (60593 records) vs. U2R class (246 records); d = 34 (7 categorical attributes excluded); training set: 1000 normal vs. 50 anomalous records
- Ann-thyroid 1: class 3 vs. class 1; d = 21; training set: 450 normal vs. 50 anomalous records
- Ann-thyroid 2: class 3 vs. class 2; d = 21; training set: 450 normal vs. 50 anomalous records
Parameter settings:
- w_m = 0.1 and w_a = 0.9
- Number of extracted features c ≤ d/2

16 Results
[Result charts not included in the transcript]

17 Results (cont.)
[Result charts not included in the transcript]

18 Conclusion
Summary of contributions:
- Explored the effect of feature extraction on outlier detection
- A novel framework for ensemble outlier detection
- Results on real datasets with two detection methods are promising
Future work:
- More experiments on larger datasets
- Examine other approaches to dimensionality reduction

19 Last words…

