Feature extraction
1. Introduction
2. T-test
3. Signal-to-Noise Ratio (SNR)
4. Linear Correlation Coefficient (LCC)
5. Principal Component Analysis (PCA)
6. Linear Discriminant Analysis (LDA)

Feature reduction vs. feature selection
– Feature reduction: all original features are used; the transformed features are linear combinations of the original features.
– Feature selection: only a subset of the original features is used.

Why feature reduction?
– Most machine learning and data mining techniques may not be effective for high-dimensional data (the curse of dimensionality): query accuracy and efficiency degrade rapidly as the dimension increases.
– The intrinsic dimension may be small. For example, the number of genes responsible for a certain type of disease may be small.

Why feature reduction?
– Visualization: projection of high-dimensional data onto 2-D or 3-D.
– Data compression: efficient storage and retrieval.
– Noise removal: positive effect on query accuracy.

Applications of feature reduction
– Face recognition
– Handwritten digit recognition
– Text mining
– Image retrieval
– Microarray data analysis
– Protein classification

High-dimensional data in bioinformatics: gene expression data; gene expression pattern images.

High-dimensional data in computer vision: face images; handwritten digits.

Principal Component Analysis (PCA) – Motivation. Example: given 53 blood and urine measurements (features) from 65 people, how can we visualize the measurements?

PCA – Data Visualization. Matrix format (65 x 53): 65 instances (people) by 53 features. It is difficult to see the correlations between the features this way.

PCA - Data Visualization Spectral format (53 pictures, one for each feature) Difficult to see the correlations between the features...

PCA – Data Visualization. Bi-variate and tri-variate plots: how can we visualize the other variables? It is difficult to see in 4- or higher-dimensional spaces.

Data Visualization
– Is there a representation better than the coordinate axes?
– Is it really necessary to show all 53 dimensions? What if there are strong correlations between the features?
– How could we find the smallest subspace of the 53-D space that keeps the most information about the original data?
A solution: Principal Component Analysis.

Principal Component Analysis
Orthogonal projection of the data onto a lower-dimensional linear space that
– maximizes the variance of the projected data (purple line in the figure), and
– minimizes the mean squared distance between the data points and their projections (sum of the blue lines).

Principal Components Analysis – Idea
– Given data points in a d-dimensional space, project them into a lower-dimensional space while preserving as much information as possible; e.g., find the best planar approximation to 3-D data, or the best 12-D approximation to much higher-dimensional data.
– In particular, choose the projection that minimizes the squared error in reconstructing the original data.

The Principal Components
Vectors originating from the center of mass of the data.
Principal component #1 points in the direction of the largest variance.
Each subsequent principal component
– is orthogonal to the previous ones, and
– points in the direction of the largest variance of the residual subspace.

2D Gaussian dataset

1st PCA axis

2nd PCA axis

PCA algorithm I (sequential)
Given the centered data {x_1, ..., x_m}, compute the principal vectors one at a time:
1st PCA vector: w_1 = argmax_{||w||=1} (1/m) Σ_{i=1}^m (w^T x_i)^2  (maximize the variance of the projection of x)
k-th PCA vector: w_k = argmax_{||w||=1} (1/m) Σ_{i=1}^m [w^T (x_i − Σ_{j=1}^{k−1} w_j w_j^T x_i)]^2  (maximize the variance of the projection in the residual subspace)
PCA reconstruction from the first two components: x' = w_1 (w_1^T x) + w_2 (w_2^T x).
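The following is a minimal NumPy sketch of this sequential view; the function name, the power-iteration loop, and the iteration count are illustrative choices, not from the slides. Each principal vector is taken as the dominant direction of the residual data, which is then deflated before the next vector is computed.

import numpy as np

def sequential_pca(X, k, n_iter=200):
    # X: d x m matrix of centered data; columns are samples.
    d, m = X.shape
    R = X.copy()                          # residual data
    W = np.zeros((d, k))
    for j in range(k):
        C = R @ R.T / m                   # covariance of the residual subspace
        w = np.random.randn(d)
        w /= np.linalg.norm(w)
        for _ in range(n_iter):           # power iteration -> dominant eigenvector of C
            w = C @ w
            w /= np.linalg.norm(w)
        W[:, j] = w
        R = R - np.outer(w, w @ R)        # deflation: x_i <- x_i - w (w^T x_i)
    return W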

PCA algorithm II (sample covariance matrix)
Given data {x_1, ..., x_m}, compute the sample covariance matrix
Σ = (1/m) Σ_{i=1}^m (x_i − x̄)(x_i − x̄)^T,  where x̄ = (1/m) Σ_{i=1}^m x_i.
The PCA basis vectors are the eigenvectors of Σ; the larger the eigenvalue, the more important the eigenvector.

PCA algorithm II (pseudocode)
PCA(X, k): return the top k eigenvalues/eigenvectors
% X = N x m data matrix; each data point x_i is a column vector, i = 1..m
X ← subtract the mean x̄ from each column vector x_i in X
Σ ← X X^T  % covariance matrix of X
{λ_i, u_i}, i = 1..N ← eigenvalues/eigenvectors of Σ, sorted so that λ_1 ≥ λ_2 ≥ ... ≥ λ_N
Return {λ_i, u_i}, i = 1..k  % the top k principal components
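A hedged NumPy translation of this pseudocode (the function and variable names are illustrative):

import numpy as np

def pca_eig(X, k):
    # X: N x m data matrix; each column x_i is a data point, as in the pseudocode above.
    Xc = X - X.mean(axis=1, keepdims=True)        # subtract the mean from each column
    Sigma = Xc @ Xc.T / Xc.shape[1]               # sample covariance matrix (N x N)
    evals, evecs = np.linalg.eigh(Sigma)          # symmetric eigendecomposition (ascending order)
    order = np.argsort(evals)[::-1]               # sort so that lambda_1 >= lambda_2 >= ...
    return evals[order[:k]], evecs[:, order[:k]]  # top-k eigenvalues and eigenvectors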

PCA algorithm III (SVD of the data matrix)
Singular Value Decomposition of the centered data matrix X (features x samples): X = U S V^T.
The leading singular values and vectors carry the significant structure of the data; the trailing ones mostly capture noise.

PCA algorithm III
– Columns of U: the principal vectors {u^(1), ..., u^(k)}; they are orthogonal and have unit norm, so U^T U = I. The data can be reconstructed using linear combinations of {u^(1), ..., u^(k)}.
– Matrix S: diagonal; shows the importance of each eigenvector.
– Columns of V^T: the coefficients for reconstructing the samples.
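A small sketch of the SVD route, under the same features x samples convention (names are illustrative):

import numpy as np

def pca_svd(X, k):
    # X: features x samples data matrix.
    mean = X.mean(axis=1, keepdims=True)
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return U[:, :k], S[:k], Vt[:k, :], mean       # principal vectors, singular values, coefficients, mean

# Reconstruction from the top-k components:
# U_k, S_k, Vt_k, mean = pca_svd(X, k)
# X_approx = U_k @ np.diag(S_k) @ Vt_k + mean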

PCA – Input and Output
Input: an n x k data matrix with k original variables x_1, x_2, ..., x_k.
Output: a k x k transfer matrix W = [a_ij] (one row per component) and a k x 1 vector of eigenvalues λ_1, ..., λ_k.

PCA – usage (k-to-k transfer)
From the k original variables x_1, x_2, ..., x_k, produce k new variables y_1, y_2, ..., y_k:
y_1 = a_11 x_1 + a_12 x_2 + ... + a_1k x_k
y_2 = a_21 x_1 + a_22 x_2 + ... + a_2k x_k
...
y_k = a_k1 x_1 + a_k2 x_2 + ... + a_kk x_k
such that the y_i's are uncorrelated (orthogonal), y_1 explains as much as possible of the original variance in the data set, y_2 explains as much as possible of the remaining variance, and so on. The y_i's are the Principal Components.
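As a quick numerical check of the "uncorrelated" property, the sketch below (with made-up toy data) builds the transform from the eigenvectors of the sample covariance and verifies that the covariance of the new variables y is approximately diagonal:

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 5))  # n x k correlated toy data
Xc = X - X.mean(axis=0)
evals, A = np.linalg.eigh(np.cov(Xc, rowvar=False))              # columns of A play the role of the a_i
A = A[:, np.argsort(evals)[::-1]]                                # reorder so y_1 carries the most variance
Y = Xc @ A                                                       # y_i = a_i1 x_1 + ... + a_ik x_k
print(np.round(np.cov(Y, rowvar=False), 6))                      # approximately diagonal: the y_i are uncorrelated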

PCA – usage
– Direct transfer: (n x k data) x (k x k W^T) = n x k transformed data.
– Dimension reduction: (n x k data) x (k x t W'^T) = n x t transformed data.

PCA – usage (dimension reduction)
From the k original variables x_1, x_2, ..., x_k, produce only t new variables y_1, y_2, ..., y_t, using the first t rows of the matrix W:
y_1 = a_11 x_1 + a_12 x_2 + ... + a_1k x_k
y_2 = a_21 x_1 + a_22 x_2 + ... + a_2k x_k
...
y_t = a_t1 x_1 + a_t2 x_2 + ... + a_tk x_k
such that the y_i's are uncorrelated (orthogonal), y_1 explains as much as possible of the original variance in the data set, y_2 explains as much as possible of the remaining variance, and so on.
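A minimal sketch of this dimension-reduction usage, keeping only the top t components (the function name is illustrative):

import numpy as np

def reduce_dim(X, t):
    # X: n x k data matrix, rows are samples; returns the n x t projection onto the top-t components.
    Xc = X - X.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(evals)[::-1]
    W_t = evecs[:, order[:t]]            # k x t: the t most important eigenvectors (first t rows of W, transposed)
    return Xc @ W_t                      # n x t transformed data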

PCA – summary so far
PCA rotates the multivariate data set into a new configuration which is easier to interpret. Purposes:
– simplify the data
– look at relationships between variables
– look at patterns of units

Example of PCA on Iris Data

Application of PCA to image compression

Original image: divide the original 372 x 492 image into patches; each patch is an instance containing 12 x 12 pixels on a grid. View each patch as a 144-D vector.
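A hedged sketch of this patch-based compression (the patch size and the number p of retained components are parameters; the blocking/unblocking reshapes are an implementation choice, not from the slides): cut the image into 12 x 12 patches, run PCA on the 144-D patch vectors, and reconstruct each patch from its top-p components.

import numpy as np

def compress_patches(img, patch=12, p=16):
    # img: 2-D grayscale array whose side lengths are multiples of `patch`.
    h, w = img.shape
    # Cut into non-overlapping patch x patch blocks and flatten each block into a row vector.
    blocks = (img.reshape(h // patch, patch, w // patch, patch)
                 .transpose(0, 2, 1, 3).reshape(-1, patch * patch))
    mean = blocks.mean(axis=0)
    Xc = blocks - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    comps = Vt[:p]                                   # top-p principal directions (p x 144)
    recon = (Xc @ comps.T) @ comps + mean            # project onto p components and reconstruct
    # Reassemble the reconstructed patches into an image.
    return (recon.reshape(h // patch, w // patch, patch, patch)
                  .transpose(0, 2, 1, 3).reshape(h, w))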

L2 error vs. PCA dimension

PCA compression: 144D → 60D

PCA compression: 144D → 16D

16 most important eigenvectors

PCA compression: 144D → 6D

6 most important eigenvectors

PCA compression: 144D → 3D

3 most important eigenvectors

PCA compression: 144D → 1D

60 most important eigenvectors – they look like the discrete cosine basis functions used by JPEG!

2D Discrete Cosine Basis

PCA for image compression: reconstructions with p = 1, 2, 4, 8, 16, 32, 64, 100 principal components, compared with the original image.

In-Class Practice
Data: Iris.txt (Neucom format) and your own data (if applicable).
Methods: PCA, LDA, SNR.
Software: Neucom v0.919.
Steps: Visualization -> PCA; Visualization -> LDA; Data Analysis -> SNR.

Linear Discriminant Analysis
First applied by M. Barnard at the suggestion of R. A. Fisher (1936), Fisher linear discriminant analysis (FLDA) performs:
– Dimension reduction: finds linear combinations of the features X = X_1, ..., X_d with large ratios of between-group to within-group sums of squares (the discriminant variables).
– Classification: predicts the class of an observation X as the class whose mean vector is closest to X in terms of the discriminant variables.

Is PCA a good criterion for classification? In PCA, the data variation alone determines the projection direction. What's missing? Class information.

What is a good projection? Similarly, what is a good criterion? One that separates the different classes. (Figure: in one projection the two classes overlap; in the other, the two classes are separated.)

What class information may be useful? Between-class distance – Distance between the centroids of different classes

What class information may be useful?
– Between-class distance: distance between the centroids of different classes.
– Within-class distance: accumulated distance of each instance to the centroid of its class.

Linear discriminant analysis (LDA) finds the most discriminant projection by maximizing the between-class distance and minimizing the within-class distance.

Notations: the data matrix contains training data drawn from k different classes, labeled 1, 2, ..., k.

Notations
– Between-class scatter: S_b = Σ_{j=1}^k n_j (c_j − c)(c_j − c)^T, where c_j is the centroid of class j (with n_j instances) and c is the global centroid.
– Within-class scatter: S_w = Σ_{j=1}^k Σ_{x in class j} (x − c_j)(x − c_j)^T.
Properties: the between-class distance equals the trace of the between-class scatter (i.e., the sum of its diagonal elements); the within-class distance equals the trace of the within-class scatter.

Discriminant criterion
In mathematical formulation, the criterion trades off the between-class scatter matrix S_b against the within-class scatter matrix S_w; for a transformation G, one common form is to maximize trace((G^T S_w G)^{-1} (G^T S_b G)). The optimal transformation is given by solving a generalized eigenvalue problem, S_b g = λ S_w g.
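A minimal NumPy sketch of this criterion using the scatter-matrix definitions from the Notations slide (the function name and the small ridge term added to keep S_w invertible are illustrative choices):

import numpy as np

def lda_fit(X, y, t):
    # X: n x d data, y: n class labels; returns a d x t projection matrix.
    # t is typically at most (number of classes - 1).
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    Sb = np.zeros((d, d))
    Sw = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sb += len(Xc) * np.outer(mc - mean_all, mc - mean_all)   # between-class scatter
        Sw += (Xc - mc).T @ (Xc - mc)                            # within-class scatter
    # Generalized eigenvalue problem Sb g = lambda Sw g, solved via Sw^{-1} Sb (with a small ridge).
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw + 1e-6 * np.eye(d), Sb))
    order = np.argsort(evals.real)[::-1]
    return evecs[:, order[:t]].real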

Graphical view of classification: the d x n training data and a d x 1 test data point h are projected by the d x (K-1) transformation into the reduced space; the test point is then classified by finding its nearest neighbor or nearest centroid among the projected training points.
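A sketch of this classification step in the reduced space, assuming the lda_fit function from the previous sketch; it uses the nearest-centroid rule:

import numpy as np

def lda_predict(X_train, y_train, X_test, t):
    G = lda_fit(X_train, y_train, t)                 # d x t transformation
    Z_train, Z_test = X_train @ G, X_test @ G        # project into the reduced space
    classes = np.unique(y_train)
    centroids = np.array([Z_train[y_train == c].mean(axis=0) for c in classes])
    # Nearest-centroid rule: assign each test point to the class with the closest centroid.
    dists = np.linalg.norm(Z_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]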

LDA Computation

LDA – Input and Output
Input: an n x k data matrix with k original variables x_1, x_2, ..., x_k.
Output: a k x k transfer matrix W = [a_ij] and a k x 1 vector of eigenvalues λ_1, ..., λ_k.

LDA – usage
– Direct transfer: (n x k data) x (k x k W^T) = n x k transformed data.
– Dimension reduction: (n x k data) x (k x t W'^T) = n x t transformed data.

LDA principle: maximize between-class distances and minimize within-class distances.

An Example: Fisher's Iris Data
Table 1: Linear discriminant analysis confusion table – actual group (Setosa, Versicolor, Virginica) vs. predicted group, with the number of observations in each cell and the apparent error rate (APER).

LDA on Iris Data

In-Class Practice
Data: Iris.txt (Neucom format) and your own data (if applicable).
Methods: PCA, LDA, SNR.
Software: Neucom v0.919.
Steps: Visualization -> PCA; Visualization -> LDA; Data Analysis -> SNR.