Outlier Processing via L1-Principal Subspaces


1 Outlier Processing via L1-Principal Subspaces
Shubham Chamadia and Dimitris A. Pados
Department of Electrical Engineering, The State University of New York at Buffalo, NY 14260
{shubhamc,

2 Outline
- Outlier detection classification
- Limitations of L1-PCA
- The proposed outlier processing scheme
- Simulation results
- Conclusion

3 What is an outlier?
- [Hawkins, 1980] An observation that "deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism."
- [Jarrell, 1994] A point that is far outside the norm for a variable or population.
- [Moore and McCabe, 1999] An observation that lies outside the overall pattern of a distribution.

4 Outlier detection classification
- Supervised: training data with labeled normal and abnormal classes. The classes are highly unbalanced, and labeling is often done manually.
- Semi-supervised: labeled samples are available from only one class; a test instance that does not fit that class is declared to belong to the other.
- Unsupervised: most preferred, since it makes no assumption about the availability of labeled training samples. Assumption: normal instances are far more frequent than abnormal ones.
We focus on the unsupervised technique.

5 Unsupervised detection
Principal Component Analysis (PCA) overview
[Figure: 2-D data cloud and its principal directions; axes X1, X2]
- The most valuable tool for reducing the dimensionality of data.
- Finds new directions that preserve most of the samples' information and maximize the data variance.
Problem formulation: given a real-valued data matrix $\mathbf{X} \in \mathbb{R}^{D \times N}$, find the orthonormal basis $\mathbf{Q}$ that maximizes the $L_2$ norm of the projection of $\mathbf{X}$ onto $\mathbf{Q}$:
$$\mathbf{Q}_{L_2} = \arg\max_{\mathbf{Q} \in \mathbb{R}^{D \times P},\; \mathbf{Q}^T \mathbf{Q} = \mathbf{I}_P} \left\| \mathbf{Q}^T \mathbf{X} \right\|_2^2$$
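For concreteness, a minimal numpy sketch of this L2-PCA computation via the SVD; the centering step is my assumption (consistent with the variance-maximization view), and the function name is illustrative:

```python
import numpy as np

def l2_pca(X, P=2):
    """L2 principal components of a D x N data matrix X (samples as columns).

    The P leading left singular vectors of the centered data matrix
    solve the projection-maximization problem on the slide.
    """
    Xc = X - X.mean(axis=1, keepdims=True)        # center the samples
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :P]                               # D x P orthonormal basis Q
```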

6 Unsupervised detection
Genesis of L1-PCA
[Figure: 2-D data with an extreme outlier pulling the principal direction; axes X1, X2]
- L2-PCA is sensitive to extreme outliers.
- The nearest preferred solution, often referred to as L1-norm PCA, is robust to outliers: it maximizes $\|\mathbf{Q}^T \mathbf{X}\|_1$ instead of the $L_2$ projection norm.
Q. To what extent is L1-PCA robust?

7 Motivation for outlier processing
Setup
- Generate a 2-D data matrix.
- Add 4 outlier points (see the sketch below).
Observation: this exposes a limitation of L2-PCA and, to some extent, a limitation of L1-PCA.
There is an urgent need for outlier removal in conjunction with robust L1-PCA.
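A small illustration of this limitation, reusing l2_pca from the sketch above; the data values are invented, since the slide does not give the exact setup:

```python
rng = np.random.default_rng(1)

# Nominal 2-D data stretched along the x1-axis, plus 4 extreme outliers.
X_clean = np.diag([5.0, 0.5]) @ rng.standard_normal((2, 100))
outliers = np.array([[0.0, 1.0, -1.0, 0.5],
                     [40.0, 45.0, 42.0, 48.0]])
X = np.hstack([X_clean, outliers])

q_clean = l2_pca(X_clean, P=1)[:, 0]   # points along x1
q_dirty = l2_pca(X, P=1)[:, 0]         # dragged toward the outliers
print("angle w/o outliers: ", np.degrees(np.arctan2(q_clean[1], q_clean[0])))
print("angle with outliers:", np.degrees(np.arctan2(q_dirty[1], q_dirty[0])))
```

With only 4 corrupted points out of 104, the L2 principal direction swings from roughly the x1-axis toward the outliers; L1-PCA resists this pull but, as the previous slide asked, only to an extent.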

8 Proposed outlier processing via L1-PCs
Step 1 of 4: For the data matrix $\mathbf{X}$, obtain the $P$ $L_1$-principal components $\mathbf{Q}_{L_1}$.
Ways to calculate $\mathbf{Q}_{L_1}$:
- If the sample support is small, use the optimal L1-PCA algorithm [1].
- For medium to large sample support, use the iterative suboptimal algorithm [2].
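The slides do not spell out algorithm [2]; as an illustrative stand-in, here is a Kwak-style fixed-point iteration for the first L1 principal component (P = 1), a minimal sketch rather than the authors' exact routine:

```python
def l1_pca_first_pc(X, n_iter=200, seed=0):
    """Fixed-point (sign-flipping) iteration maximizing ||X^T w||_1
    over unit-norm w, for a D x N data matrix X."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(X.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        s = np.sign(X.T @ w)                      # per-sample polarities
        s[s == 0] = 1.0                           # break ties consistently
        w_new = X @ s / np.linalg.norm(X @ s)     # re-aligned direction
        if np.allclose(w_new, w):
            break                                 # reached a fixed point
        w = w_new
    return w
```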

9 Proposed outlier processing via L1-PCs
Step 2 of 4: Evaluate a reliability weight for each sample $\mathbf{x}_n$ of $\mathbf{X}$:
$$e_n = \left\| \mathbf{x}_n - \mathbf{Q}_{L_1} \mathbf{Q}_{L_1}^T \mathbf{x}_n \right\|_2, \qquad w_n = \frac{e_n}{\sum_{m=1}^{N} e_m}$$
- $e_n$ is often known as the rank-$P$ reconstruction error of the sample.
- $w_n$ expresses the relative (normalized) degree of outlierness.
Ideally, a corrupted sample lies far from the $L_1$-principal directions, so a high reconstruction error ⇒ a high weight value.
[Figure: outlier far from the principal direction; axes X1, X2]
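In code, with the notation reconstructed above (Q holds the P L1-PCs as orthonormal columns):

```python
def reliability_weights(X, Q):
    """Normalized rank-P reconstruction errors: samples far from
    span(Q) receive large weights."""
    residual = X - Q @ (Q.T @ X)                  # D x N reconstruction error
    err = np.linalg.norm(residual, axis=0)        # e_n per sample
    return err / err.sum()                        # w_n, summing to 1
```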

10 Proposed outlier processing via L1-PCs
Step 3 of 4: Outlier detection and removal.
Case 1: prior knowledge of the number of corrupted samples (say $p$).
- Discard the $p$ highest-weighted samples.
- Such a priori knowledge is often impractical.
Case 2: no prior knowledge of the corruption level (see the sketch below).
- Run ($K=2$)-means clustering over the scalar reliability-weight vector.
- Extract the sample indices belonging to the higher-mean cluster (the potential outlier cluster).
- Discard the samples in that index set.
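A minimal 2-means over the scalar weight vector; initializing the two centroids at the extreme weights is my choice, not specified on the slide:

```python
def detect_outliers(weights, n_iter=100):
    """(K=2)-means on scalar reliability weights; returns indices of
    the higher-mean cluster (the potential outliers)."""
    w = np.asarray(weights, dtype=float)
    lo, hi = w.min(), w.max()                     # initial centroids
    if np.isclose(lo, hi):
        return np.array([], dtype=int)            # all weights equal
    for _ in range(n_iter):
        is_hi = np.abs(w - hi) < np.abs(w - lo)   # assignment step
        new_lo, new_hi = w[~is_hi].mean(), w[is_hi].mean()
        if np.isclose(new_lo, lo) and np.isclose(new_hi, hi):
            break                                 # centroids converged
        lo, hi = new_lo, new_hi                   # update step
    return np.flatnonzero(is_hi)
```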

11 Proposed outlier processing via L1-PCs
Step 4 of 4: Recalculate the $L_1$-principal components over the outlier-processed data matrix.
The recomputed components are expected to reveal a deeper insight into the given data matrix than those computed over the corrupted data.

12 Proposed outlier processing via L1-PCs
Summary: Step 1 of 4 → Step 2 of 4 → Step 3 of 4 → Step 4 of 4.
The overall scheme is denoted by L1 + L1.
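Composing the sketches from Steps 1-3 into the full L1 + L1 pipeline (restricted to the first PC for brevity):

```python
def l1_plus_l1(X):
    """End-to-end sketch of the four-step scheme."""
    Q = l1_pca_first_pc(X)[:, None]               # Step 1: L1-PC of corrupted data
    w = reliability_weights(X, Q)                 # Step 2: reliability weights
    flagged = detect_outliers(w)                  # Step 3: 2-means detection
    X_clean = np.delete(X, flagged, axis=1)       #         ... and removal
    return l1_pca_first_pc(X_clean)[:, None]      # Step 4: recompute L1-PC
```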

13 Simulation result 1 of 3
Experiment 1: Data dimensionality reduction
Setup
- Generate a data matrix $\mathbf{X} \in \mathbb{R}^{D \times N}$ with i.i.d. Gaussian entries.
- Corrupt a certain percentage of the samples by an outlier vector with i.i.d. entries.
- Obtain $P = 2$ principal components.
Metric: average representation error (ARE),
$$\mathrm{ARE} = \frac{1}{N} \sum_{n=1}^{N} \left\| \mathbf{x}_n - \mathbf{Q} \mathbf{Q}^T \mathbf{x}_n \right\|_2$$
where $\mathbf{Q}$ is the PC matrix computed over the corrupted data and $\mathbf{x}_n$ are the clean samples.
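A hedged reading of the ARE metric in code, assuming the error is measured on the clean samples under the PCs obtained from the corrupted data, as reconstructed above:

```python
def average_representation_error(X_clean, Q):
    """Mean rank-P representation error of the clean samples under
    the basis Q computed from the corrupted data."""
    residual = X_clean - Q @ (Q.T @ X_clean)
    return np.linalg.norm(residual, axis=0).mean()
```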

14 Simulation result 2 of 3
Experiment 2: Direction-of-Arrival (DoA) estimation
Setup
- Uniform linear antenna array with D = 7 elements (A1, ..., A7), recording N = 30 complex observations.
- Received signal model: $\mathbf{x}_n = a\, b_n\, \mathbf{s}(\theta) + \mathbf{n}_n$, where
  - $a$: received signal amplitude,
  - $b_n$: Bernoulli equiprobable bits,
  - $\mathbf{n}_n$: additive white complex Gaussian noise vector,
  - $\mathbf{s}(\theta)$: array response vector, with the signal of interest at angle $\theta$.

15 Simulation result 2 of 3
Experiment 2: Direction-of-Arrival (DoA) estimation
Setup, continued: adding jammer corruption (see the sketch below).
- Number of corrupted samples: 3 (out of 30).
- Number of jammers (at random locations) corrupting the samples: 3.
- Jammer SNR = 10 dB.
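A hypothetical generator for this setup; the half-wavelength element spacing, random jammer phases, and unit noise variance are my assumptions, not given on the slides:

```python
def ula_response(theta_deg, D=7):
    """Array response of a D-element ULA with half-wavelength spacing."""
    theta = np.deg2rad(theta_deg)
    return np.exp(1j * np.pi * np.arange(D) * np.sin(theta))

def make_snapshots(theta_sig, N=30, D=7, amp=1.0, noise_var=1.0,
                   n_corrupt=3, n_jam=3, jam_snr_db=10.0, seed=0):
    """N snapshots of a BPSK source at theta_sig in AWGN, with
    n_corrupt snapshots each hit by n_jam jammers at random angles."""
    rng = np.random.default_rng(seed)
    bits = rng.choice([-1.0, 1.0], size=N)        # Bernoulli equiprobable bits
    noise = np.sqrt(noise_var / 2) * (rng.standard_normal((D, N))
                                      + 1j * rng.standard_normal((D, N)))
    X = amp * np.outer(ula_response(theta_sig, D), bits) + noise
    jam_amp = np.sqrt(10.0 ** (jam_snr_db / 10.0) * noise_var)
    for n in rng.choice(N, size=n_corrupt, replace=False):
        for theta_j in rng.uniform(-90.0, 90.0, size=n_jam):
            phase = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi))
            X[:, n] += jam_amp * phase * ula_response(theta_j, D)
    return X
```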

16 Simulation result 2 of 3
Experiment 2: Direction-of-Arrival (DoA) estimation
Setup, continued: MUSIC-type DoA spectrum function
$$P(\theta) = \frac{1}{\left\| \left( \mathbf{I}_D - \mathbf{Q}\mathbf{Q}^H \right) \mathbf{s}(\theta) \right\|_2^2}$$
where $\mathbf{Q}$ is the principal-component matrix computed over the corrupted data.
[Figure: DoA spectrum with peaks at the jammer angles and the signal angle]

17 Simulation result 2 of 3
Experiment 2: Direction-of-Arrival (DoA) estimation
Setup, continued: performance metric over the MUSIC-type spectrum above.
Root mean square error (RMSE):
$$\mathrm{RMSE} = \sqrt{\frac{1}{M} \sum_{m=1}^{M} \left( \hat{\theta}_m - \theta \right)^2}$$
where $\hat{\theta}_m$ is the estimated angle at realization $m$ of $M$.
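Sketches of the spectrum and the metric, reusing ula_response from the generator above; Q is any orthonormal PC matrix (complex-valued here), and the spectrum form follows the reconstruction on the previous slide:

```python
def music_spectrum(Q, thetas_deg, D=7):
    """MUSIC-type pseudo-spectrum: peaks where s(theta) is close
    to the subspace spanned by the columns of Q."""
    proj = np.eye(D) - Q @ Q.conj().T             # noise-subspace projector
    return np.array([1.0 / np.linalg.norm(proj @ ula_response(th, D)) ** 2
                     for th in thetas_deg])

def rmse(theta_hat, theta_true):
    """Root mean square error of angle estimates over realizations."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    return np.sqrt(np.mean((theta_hat - theta_true) ** 2))
```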

18 Simulation result 3 of 3
Experiment 3: Robust image fusion
Setup
- Original image: N = 10 grayscale copies (256 × 256), labeled P1, ..., P10.
- Adding independent noise:
  - Add AWGN with σ² = 50 to each copy.
  - Add 40% salt-and-pepper noise to 8 (out of 10) images.
- An extreme outlier: append a baboon image (P0).

19 Simulation result 3 of 3
Experiment 3: Robust image fusion
Processing (see the sketch below)
- Partition each image into restoration blocks of size 32 × 32.
- Total restoration blocks: (256/32) × (256/32) = 64.
- Apply the proposed outlier scheme over each set of vectorized blocks.
[Figure: grid of restoration blocks across the image stack P0, ..., P10]
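A sketch of the block partitioning; treating each image's vectorized 32 × 32 block as one column of the data matrix fed to the outlier scheme is my reading of the slide:

```python
def restoration_blocks(images, block=32):
    """Split a K x H x W image stack into per-block data matrices
    (block*block x K), one column per image, yielded with the
    block's top-left coordinate."""
    K, H, W = images.shape
    for r in range(0, H, block):                  # 256/32 = 8 rows of blocks
        for c in range(0, W, block):              # 8 x 8 = 64 blocks total
            patch = images[:, r:r + block, c:c + block]
            yield (r, c), patch.reshape(K, -1).T
```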

20 Simulation result 3 of 3
Experiment 3: Robust image fusion
[Figure: fusion results for the original, noisy (AWGN), salt-and-pepper, and baboon-outlier images]

21 Outlier processing via L1-PCs
Conclusion:
- With the advent of big data, robust outlier-detection techniques are required.
- Conventional subspaces have limitations in the presence of extreme outliers.
- The proposed scheme needs only low-complexity clustering over scalar reliability weights.
- Simulations demonstrate the need for such a robust outlier-removal scheme.
