Principal Component Analysis

Presentation transcript:

Principal Component Analysis Jieping Ye Department of Computer Science and Engineering Arizona State University http://www.public.asu.edu/~jye02

Outline of lecture What is feature reduction? Why feature reduction? Feature reduction algorithms Principal Component Analysis (PCA) Nonlinear PCA using Kernels

What is feature reduction? Feature reduction refers to the mapping of the original high-dimensional data onto a lower-dimensional space. The criterion for feature reduction can differ with the problem setting: in the unsupervised setting, minimize the information loss; in the supervised setting, maximize the class discrimination. Given a set of data points $x_1, \dots, x_n$ of $p$ variables, compute the linear transformation (projection) $G \in \mathbb{R}^{p \times d}$ that maps each $x \in \mathbb{R}^p$ to $y = G^\top x \in \mathbb{R}^d$, with $d < p$.

What is feature reduction? The linear transformation takes the original data $x \in \mathbb{R}^p$ to the reduced data $y = G^\top x \in \mathbb{R}^d$.
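
As a hedged illustration of this projection (not code from the lecture; the array names, sizes, and random data below are assumptions), a minimal NumPy sketch:

```python
import numpy as np

# Hypothetical sizes: n samples, p original variables, d reduced variables.
n, p, d = 100, 50, 3
rng = np.random.default_rng(0)

X = rng.normal(size=(n, p))                    # rows are data points x_i in R^p
G = np.linalg.qr(rng.normal(size=(p, d)))[0]   # some p x d matrix with orthonormal columns

Y = X @ G                                      # reduced data: row i is y_i = G^T x_i in R^d
print(Y.shape)                                 # (100, 3)
```

PCA, described below, is one particular way of choosing G.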

High-dimensional data: gene expression, face images, handwritten digits.

Outline of lecture What is feature reduction? Why feature reduction? Feature reduction algorithms Principal Component Analysis Nonlinear PCA using Kernels

Why feature reduction? Most machine learning and data mining techniques may not be effective for high-dimensional data because of the curse of dimensionality: query accuracy and efficiency degrade rapidly as the dimension increases. Moreover, the intrinsic dimension may be small; for example, the number of genes responsible for a certain type of disease may be small.

Why feature reduction? Visualization: projection of high-dimensional data onto 2D or 3D. Data compression: efficient storage and retrieval. Noise removal: positive effect on query accuracy.

Applications of feature reduction: face recognition, handwritten digit recognition, text mining, image retrieval, microarray data analysis, and protein classification.

Outline of lecture What is feature reduction? Why feature reduction? Feature reduction algorithms Principal Component Analysis Nonlinear PCA using Kernels

Feature reduction algorithms. Unsupervised: Latent Semantic Indexing (LSI, i.e., truncated SVD), Independent Component Analysis (ICA), Principal Component Analysis (PCA), Canonical Correlation Analysis (CCA). Supervised: Linear Discriminant Analysis (LDA). Semi-supervised: a research topic.

Outline of lecture What is feature reduction? Why feature reduction? Feature reduction algorithms Principal Component Analysis Nonlinear PCA using Kernels

What is Principal Component Analysis? Principal component analysis (PCA) reduces the dimensionality of a data set by finding a new set of variables, smaller than the original set, that retains most of the sample's information; it is useful for the compression and classification of data. By information we mean the variation present in the sample, given by the correlations between the original variables. The new variables, called principal components (PCs), are uncorrelated and are ordered by the fraction of the total information each retains.

Geometric picture of principal components (PCs): the 1st PC is a minimum-distance fit to a line in X space; the 2nd PC is a minimum-distance fit to a line in the plane perpendicular to the 1st PC. The PCs are a series of linear least-squares fits to the sample, each orthogonal to all the previous ones.

Algebraic definition of PCs. Given a sample of n observations $x_1, \dots, x_n$ on a vector of p variables, define the first principal component of the sample by the linear transformation $z_1 = a_1^\top x = \sum_{j=1}^{p} a_{j1} x_j$, where the vector $a_1 = (a_{11}, \dots, a_{p1})^\top$ is chosen such that $\operatorname{var}(z_1)$ is maximum (subject to the normalization $a_1^\top a_1 = 1$).

Algebraic derivation of PCs. To find $a_1$, first note that $\operatorname{var}(z_1) = \mathbb{E}\big[(z_1 - \bar z_1)^2\big] = a_1^\top S a_1$, where $S = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar x)(x_i - \bar x)^\top$ is the covariance matrix and $\bar x$ is the sample mean. In the following, we assume the data is centered, i.e., $\bar x = 0$.

Algebraic derivation of PCs. Assume the data is centered ($\bar x = 0$) and form the matrix $X = [x_1, x_2, \dots, x_n] \in \mathbb{R}^{p \times n}$; then $S = \frac{1}{n} X X^\top$. Obtain the eigenvectors of S by computing the SVD of X: writing $X = U \Sigma V^\top$ gives $S = \frac{1}{n} U \Sigma^2 U^\top$, so the columns of U are the eigenvectors of S.
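
A minimal NumPy sketch of this step (illustrative only; the data layout with columns as centered data points, and the variable names, are assumptions):

```python
import numpy as np

def pca_eigenvectors(X):
    """X: (p, n) array whose columns are centered data points."""
    n = X.shape[1]
    U, sing_vals, _ = np.linalg.svd(X, full_matrices=False)
    eigvals = sing_vals**2 / n        # eigenvalues of S = (1/n) X X^T
    return U, eigvals                 # columns of U are eigenvectors of S

# Sanity check against a direct eigendecomposition of S.
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 200))
X = X - X.mean(axis=1, keepdims=True)          # center the data
U, eigvals = pca_eigenvectors(X)
S = X @ X.T / X.shape[1]
print(np.allclose(np.sort(np.linalg.eigvalsh(S)), np.sort(eigvals)))   # True
```

Using the SVD of X avoids forming S explicitly, which is convenient when p is large.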

Algebraic derivation of PCs. To find the $a_1$ that maximizes $a_1^\top S a_1$ subject to $a_1^\top a_1 = 1$, let $\lambda$ be a Lagrange multiplier and maximize $a_1^\top S a_1 - \lambda (a_1^\top a_1 - 1)$. Setting the derivative with respect to $a_1$ to zero gives $S a_1 = \lambda a_1$, so $a_1$ is an eigenvector of S; since $\operatorname{var}(z_1) = a_1^\top S a_1 = \lambda$, $a_1$ is therefore the eigenvector corresponding to the largest eigenvalue $\lambda_1$.

Algebraic derivation of PCs. To find the next coefficient vector $a_2$ maximizing $a_2^\top S a_2$, require $z_2 = a_2^\top x$ to be uncorrelated with $z_1$ and impose $a_2^\top a_2 = 1$. First note that $\operatorname{cov}(z_2, z_1) = a_2^\top S a_1 = \lambda_1 a_2^\top a_1$, so uncorrelatedness amounts to $a_2^\top a_1 = 0$; then let $\lambda$ and $\phi$ be Lagrange multipliers, and maximize $a_2^\top S a_2 - \lambda (a_2^\top a_2 - 1) - \phi\, a_2^\top a_1$.

Algebraic derivation of PCs. Setting the derivative with respect to $a_2$ to zero gives $S a_2 - \lambda a_2 - \phi a_1 = 0$. Premultiplying by $a_1^\top$ and using $a_1^\top a_2 = 0$ and $a_1^\top S a_2 = 0$ shows that $\phi = 0$, so $S a_2 = \lambda a_2$ and $a_2$ is again an eigenvector of S.

Algebraic derivation of PCs. We find that $a_2$ is also an eigenvector of S, whose eigenvalue $\lambda_2$ is the second largest. In general, the kth largest eigenvalue $\lambda_k$ of S is the variance of the kth PC $z_k = a_k^\top x$, and the kth PC retains the kth greatest fraction of the variation in the sample.
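
In standard notation (a summary, not the slide's original equations), the result and the fraction of variation retained by the first d PCs are:

$$ S a_k = \lambda_k a_k, \qquad \operatorname{var}(z_k) = a_k^\top S a_k = \lambda_k, \qquad \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0, $$

$$ \text{fraction of total variation retained by the first } d \text{ PCs} \;=\; \frac{\sum_{k=1}^{d} \lambda_k}{\sum_{k=1}^{p} \lambda_k}. $$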

Algebraic derivation of PCs. Main steps for computing PCs: form the covariance matrix S; compute its eigenvectors $a_1, a_2, \dots, a_p$; use the first d eigenvectors $a_1, \dots, a_d$ to form the d PCs. The transformation G is given by $G = [a_1, a_2, \dots, a_d] \in \mathbb{R}^{p \times d}$, and the reduced data are $y = G^\top x$.
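
A minimal sketch of these main steps (illustrative; the row-major data layout and the names are assumptions):

```python
import numpy as np

def pca(X, d):
    """PCA via the main steps above.

    X : (n, p) array, rows are observations.
    d : number of principal components to keep.
    Returns the projection matrix G (p x d) and the reduced data Y.
    """
    Xc = X - X.mean(axis=0)                    # center the data
    S = Xc.T @ Xc / Xc.shape[0]                # covariance matrix S (p x p)
    eigvals, eigvecs = np.linalg.eigh(S)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]          # sort descending
    G = eigvecs[:, order[:d]]                  # first d eigenvectors as columns
    return G, Xc @ G                           # reduced data: y_i = G^T x_i

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10)) @ rng.normal(size=(10, 10))   # correlated toy data
G, Y = pca(X, d=2)
print(G.shape, Y.shape)    # (10, 2) (200, 2)
```

Here np.linalg.eigh is used because S is symmetric; the eigenvectors could equally be obtained from the SVD of the centered data matrix, as above.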

Optimality property of PCA. Dimension reduction maps the original data $x \in \mathbb{R}^p$ to $y = G^\top x \in \mathbb{R}^d$; reconstruction maps back to $\hat x = G y = G G^\top x \in \mathbb{R}^p$.

Optimality property of PCA. Main theoretical result: the matrix G consisting of the first d eigenvectors of the covariance matrix S solves the min problem $\min_{G \in \mathbb{R}^{p \times d},\, G^\top G = I_d} \sum_{i=1}^{n} \| x_i - G G^\top x_i \|^2$ (the reconstruction error). The PCA projection minimizes the reconstruction error among all linear projections of size d.
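
The claim can be checked numerically; a self-contained sketch on toy data (all names and data below are assumptions) compares the PCA projection with a random orthonormal projection:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10)) @ rng.normal(size=(10, 10))   # correlated toy data
Xc = X - X.mean(axis=0)
d = 2

# PCA projection: first d eigenvectors of the covariance matrix.
S = Xc.T @ Xc / Xc.shape[0]
eigvals, eigvecs = np.linalg.eigh(S)
G_pca = eigvecs[:, np.argsort(eigvals)[::-1][:d]]

# A random rank-d orthonormal projection for comparison.
G_rand = np.linalg.qr(rng.normal(size=(Xc.shape[1], d)))[0]

def reconstruction_error(Xc, G):
    """sum_i ||x_i - G G^T x_i||^2 over the centered rows of Xc."""
    return np.sum((Xc - Xc @ G @ G.T) ** 2)

print(reconstruction_error(Xc, G_pca) <= reconstruction_error(Xc, G_rand))  # True
```

Any other choice of G with orthonormal columns gives a reconstruction error at least as large as the PCA projection's.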

Applications of PCA Eigenfaces for recognition. Turk and Pentland. 1991. Principal Component Analysis for clustering gene expression data. Yeung and Ruzzo. 2001. Probabilistic Disease Classification of Expression-Dependent Proteomic Data from Mass Spectrometry of Human Serum. Lilien. 2003.

PCA for image compression: the slide shows reconstructions of the original image using d = 1, 2, 4, 8, 16, 32, 64, and 100 principal components.
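
A hedged sketch of the underlying computation (an assumption of this example: the rows of a grayscale image are treated as data points, and `img` is a stand-in array, not the slide's image):

```python
import numpy as np

def compress_image(img, d):
    """Rank-d PCA approximation of a 2-D grayscale image (rows as data points)."""
    mean = img.mean(axis=0)
    Xc = img - mean                                # center each column
    S = Xc.T @ Xc / Xc.shape[0]                    # covariance of the columns
    eigvals, eigvecs = np.linalg.eigh(S)
    G = eigvecs[:, np.argsort(eigvals)[::-1][:d]]  # first d eigenvectors
    return (Xc @ G) @ G.T + mean                   # reconstructed image

# Usage (hypothetical): compare reconstructions for increasing d.
img = np.random.default_rng(0).random((100, 100))  # stand-in for a real image
for d in (1, 2, 4, 8, 16, 32, 64, 100):
    approx = compress_image(img, d)
    print(d, np.linalg.norm(img - approx))
```

As d grows toward the full dimension, the reconstruction approaches the original image.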

Outline of lecture What is feature reduction? Why feature reduction? Feature reduction algorithms Principal Component Analysis Nonlinear PCA using Kernels

Motivation: when the data has a nonlinear structure, linear projections will not detect the pattern.

Nonlinear PCA using Kernels. Traditional PCA applies a linear transformation, which may not be effective for nonlinear data. Solution: apply a nonlinear transformation that maps the data to a potentially very high-dimensional feature space. Computational efficiency: apply the kernel trick, which requires that PCA can be rewritten in terms of dot products. More on kernels later.

Nonlinear PCA using Kernels. Rewrite PCA in terms of dot products. The covariance matrix S can be written as $S = \frac{1}{n} \sum_{i=1}^{n} x_i x_i^\top$. Let v be an eigenvector of S corresponding to a nonzero eigenvalue $\lambda$; then $S v = \lambda v$ gives $v = \frac{1}{n\lambda} \sum_{i=1}^{n} (x_i^\top v)\, x_i$, so the eigenvectors of S lie in the space spanned by all data points.

Nonlinear PCA using Kernels. The covariance matrix can be written in matrix form: $S = \frac{1}{n} X X^\top$, where $X = [x_1, \dots, x_n]$. Any benefits?

Nonlinear PCA using Kernels. Next consider the feature space, mapping each $x_i$ to $\phi(x_i)$ and writing $\Phi = [\phi(x_1), \dots, \phi(x_n)]$: the covariance matrix becomes $S^\phi = \frac{1}{n} \Phi \Phi^\top$. The (i, j)-th entry of $\Phi^\top \Phi$ is $\phi(x_i)^\top \phi(x_j)$. Apply the kernel trick: $K_{ij} = k(x_i, x_j) = \phi(x_i)^\top \phi(x_j)$; K is called the kernel matrix.
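
A minimal kernel-PCA sketch under assumptions not stated on the slide: a Gaussian (RBF) kernel with bandwidth `sigma`, and centering of K in feature space (a standard step in Schölkopf et al.):

```python
import numpy as np

def rbf_kernel_matrix(X, sigma=1.0):
    """K[i, j] = k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 sigma^2)); rows of X are points."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-d2 / (2 * sigma**2))

def kernel_pca_coefficients(K, d):
    """Return the coefficient vectors alpha of the first d kernel PCs."""
    n = K.shape[0]
    one = np.ones((n, n)) / n
    Kc = K - one @ K - K @ one + one @ K @ one      # center K in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1][:d]
    alphas = eigvecs[:, order]
    # Normalize so each feature-space eigenvector v = sum_i alpha_i phi(x_i) has unit
    # length: ||v||^2 = alpha^T Kc alpha = lambda for a unit-norm eigenvector alpha.
    alphas = alphas / np.sqrt(np.maximum(eigvals[order], 1e-12))
    return alphas
```

Each column of `alphas` gives the expansion coefficients of one feature-space eigenvector $v = \sum_i \alpha_i \phi(x_i)$.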

Nonlinear PCA using Kernels. Projection of a test point x onto v: with $v = \sum_{i=1}^{n} \alpha_i\, \phi(x_i)$, the projection is $\phi(x)^\top v = \sum_{i=1}^{n} \alpha_i\, \phi(x)^\top \phi(x_i) = \sum_{i=1}^{n} \alpha_i\, k(x, x_i)$. The explicit mapping $\phi$ is not required here.
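
Continuing the sketch above (the helper names and the omission of test-point centering are simplifying assumptions), the projection uses only kernel evaluations:

```python
import numpy as np

def rbf(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y)**2) / (2 * sigma**2))

def project(x_test, X_train, alphas, sigma=1.0):
    """Project a test point onto the kernel PCs: y_k = sum_i alpha[i, k] * k(x_test, x_i).

    alphas : (n, d) coefficients of the feature-space eigenvectors v_k = sum_i alpha[i, k] phi(x_i),
    e.g. as returned by the kernel_pca_coefficients sketch above.
    (Centering of the test kernel values is omitted here for brevity.)
    """
    k_vec = np.array([rbf(x_test, x_i, sigma) for x_i in X_train])   # only kernel evaluations
    return alphas.T @ k_vec                                          # d projections; no explicit phi
```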

References: Principal Component Analysis, I. T. Jolliffe. Kernel Principal Component Analysis, B. Schölkopf et al. Geometric Methods for Feature Extraction and Dimensional Reduction, C. J. C. Burges.