
SVD and PCA COS 323

Dimensionality Reduction
Map points in high-dimensional space to a lower number of dimensions.
Preserve structure: pairwise distances, etc.
Useful for further processing:
– Less computation, fewer parameters
– Easier to understand, visualize

PCA
Principal Components Analysis (PCA): approximating a high-dimensional data set with a lower-dimensional linear subspace.
[Figure: scatter of data points shown against the original axes, with the first and second principal components drawn through the point cloud]

SVD and PCA
Take the SVD of the data matrix, with points as rows (see the sketch below):
– Subtract out the mean first (“whitening”)
– The columns of V_k are the principal components
– The value of w_i gives the importance of each component
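A minimal NumPy sketch of this recipe (the function and variable names are illustrative, not from the course code):

```python
import numpy as np

def pca_via_svd(X, k):
    """PCA of an (n points x d dims) data matrix via the SVD."""
    mean = X.mean(axis=0)
    Xc = X - mean                                # subtract out the mean
    U, w, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                          # rows of Vt = columns of V = principal components
    importance = w[:k]                           # singular values w_i: importance of each component
    scores = Xc @ components.T                   # coordinates of each point in the k-dim subspace
    return mean, components, importance, scores
```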

PCA on Faces: “Eigenfaces”
[Figure: the average face, the first principal component, and several other components. For all images except the average, “gray” = 0, “white” > 0, “black” < 0.]

Uses of PCA
Compression: each new image can be approximated by projection onto the first few principal components.
Recognition: for a new image, project onto the first few principal components, then match feature vectors.
(Both uses are sketched below.)
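A rough sketch of both uses, assuming the mean and components returned by the pca_via_svd helper above (the helper names and the nearest-neighbor distance metric are illustrative):

```python
import numpy as np

def project(x, mean, components):
    """Feature vector: coefficients of x on the first few principal components."""
    return components @ (x - mean)

def reconstruct(coeffs, mean, components):
    """Compression: approximate x from its low-dimensional coefficients."""
    return mean + components.T @ coeffs

def recognize(x, gallery_coeffs, mean, components):
    """Recognition: match the new feature vector against stored ones (nearest neighbor)."""
    f = project(x, mean, components)
    dists = np.linalg.norm(gallery_coeffs - f, axis=1)
    return int(np.argmin(dists))                 # index of the best-matching gallery image
```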

PCA for Relighting
Images under different illumination. [Matusik & McMillan]

PCA for Relighting
Images under different illumination. Most variation is captured by the first 5 principal components – can re-illuminate by combining only a few images. [Matusik & McMillan]

PCA for DNA Microarrays
Measure gene activation under different conditions. [Troyanskaya]

PCA for DNA Microarrays
PCA shows patterns of correlated activation.
– Genes with the same pattern might have similar function. [Wall et al.]

Multidimensional Scaling
In some experiments, we can only measure similarity or dissimilarity.
– e.g., is the response to two stimuli similar or different?
– Frequent in psychophysical experiments, preference surveys, etc.
Want to recover absolute positions in k-dimensional space.

Multidimensional Scaling
Example: given pairwise distances between cities, we want to recover their locations. [Pellacini et al.]

Euclidean MDS
Formally, say we have an n × n matrix D of squared distances, D_ij = (x_i − x_j)·(x_i − x_j) = ||x_i − x_j||².
We want to recover an n × d matrix X of positions in d-dimensional space.

Euclidean MDS
Observe that ||x_i − x_j||² = x_i·x_i − 2 x_i·x_j + x_j·x_j, so the squared distances are determined by the pairwise inner products.
Strategy: convert the matrix D of squared distances into the matrix B of inner products x_i·x_j.
– “Centered” distance matrix
– B = X X^T

Euclidean MDS
Centering:
– Sum of row i of D (= sum of column i, since D is symmetric): Σ_j D_ij = n (x_i·x_i) − 2 x_i·(Σ_j x_j) + Σ_j (x_j·x_j)
– Sum of all entries in D: Σ_i Σ_j D_ij = 2n Σ_i (x_i·x_i) − 2 (Σ_i x_i)·(Σ_j x_j)

Euclidean MDS
Choose Σ_i x_i = 0.
– The solution will then have its average position at the origin.
– With this choice, the row sums reduce to n (x_i·x_i) + Σ_j (x_j·x_j) and the total sum to 2n Σ_i (x_i·x_i), which gives
  B_ij = x_i·x_j = −½ [ D_ij − (1/n) Σ_k D_ik − (1/n) Σ_k D_kj + (1/n²) Σ_k Σ_l D_kl ]
So, to get B (sketched in code below):
– compute row (or column) sums
– compute the sum of sums
– apply the bracketed expression above to each entry of D
– divide by −2
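A compact NumPy sketch of this double-centering step (the function name is illustrative; D is assumed to hold squared distances):

```python
import numpy as np

def double_center(D):
    """Convert squared distances D into inner products B = X X^T:
    subtract row and column means, add back the grand mean, divide by -2."""
    row = D.mean(axis=1, keepdims=True)          # (1/n) * row sums
    col = D.mean(axis=0, keepdims=True)          # (1/n) * column sums
    grand = D.mean()                             # (1/n^2) * sum of sums
    return -0.5 * (D - row - col + grand)
```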

Euclidean MDS
We now have B and want to factor it into X X^T.
If X is n × d, B must have rank d.
Take the SVD and set all but the top d singular values to 0:
– eliminate the corresponding columns of U and V
– this gives B_d = U_d W_d V_d^T
– B is square and symmetric, so U = V
– take X = U_d times the square root of W_d
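Continuing the sketch, recovering coordinates from B (this assumes the distances really are Euclidean, so B is positive semidefinite):

```python
import numpy as np

def mds_embed(B, d):
    """Factor B ~ X X^T: keep the top-d singular values/vectors and take
    X = U_d * sqrt(W_d). B is symmetric, so U and V coincide here."""
    U, w, Vt = np.linalg.svd(B)
    return U[:, :d] * np.sqrt(w[:d])

# Putting the pieces together for the city example (d = 2):
# X = mds_embed(double_center(D), d=2)
```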

Multidimensional Scaling
Result (d = 2): [Pellacini et al.]

Multidimensional Scaling
Caveat: the actual axes and center are not necessarily what you want (and can’t be recovered!).
This is “classical” or “Euclidean” MDS [Torgerson 52].
– The distance matrix is assumed to contain actual Euclidean distances.
More sophisticated versions are available:
– “Non-metric MDS”: not Euclidean distance, sometimes just inequalities
– “Weighted MDS”: accounts for observer bias

Computation
The SVD is very closely related to eigenvalue/eigenvector computation:
– the right singular vectors are the eigenvectors of A^T A, and the squared singular values are its eigenvalues
– in practice, a similar class of methods is used, but operating on A directly
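A quick numerical sanity check of that relationship (purely illustrative):

```python
import numpy as np

A = np.random.randn(6, 4)
U, w, Vt = np.linalg.svd(A, full_matrices=False)
evals, evecs = np.linalg.eigh(A.T @ A)           # eigendecomposition of A^T A

# Singular values of A are the square roots of the eigenvalues of A^T A.
print(np.allclose(np.sort(w**2), np.sort(evals)))   # True
```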

Methods for Eigenvalue Computation
Simplest: the power method.
– Begin with an arbitrary vector x_0
– Compute x_{i+1} = A x_i
– Normalize
– Iterate
This converges to the eigenvector with the largest-magnitude eigenvalue!
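A small sketch of power iteration (assuming the largest-magnitude eigenvalue is unique; the simple convergence test below also assumes it is positive, which holds for matrices like A^T A):

```python
import numpy as np

def power_method(A, num_iters=1000, tol=1e-10):
    """Power iteration: repeatedly apply A and normalize."""
    x = np.random.randn(A.shape[0])              # arbitrary starting vector x_0
    for _ in range(num_iters):
        x_new = A @ x                            # x_{i+1} = A x_i
        x_new /= np.linalg.norm(x_new)           # normalize
        done = np.linalg.norm(x_new - x) < tol
        x = x_new
        if done:
            break
    eigenvalue = x @ (A @ x)                     # Rayleigh quotient (x is unit length)
    return eigenvalue, x
```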

Power Method
Why it works: write the starting vector in the eigenvector basis, x_0 = c_1 e_1 + c_2 e_2 + … + c_n e_n, with |λ_1| > |λ_2| ≥ …
Then A^k x_0 = c_1 λ_1^k e_1 + c_2 λ_2^k e_2 + … + c_n λ_n^k e_n, and each term with i > 1 shrinks relative to the first by a factor of (λ_i / λ_1)^k.
As this is repeated (and the vector renormalized), the coefficient of e_1 approaches 1.

Power Method II
To find the smallest-magnitude eigenvalue, the process is similar:
– Begin with an arbitrary vector x_0
– Solve A x_{i+1} = x_i
– Normalize
– Iterate
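The corresponding sketch (in practice one would factor A once, e.g. with an LU decomposition, instead of calling solve every iteration):

```python
import numpy as np

def inverse_power_method(A, num_iters=1000, tol=1e-10):
    """Inverse iteration: converges toward the eigenvector whose eigenvalue
    has the smallest magnitude (A must be invertible)."""
    x = np.random.randn(A.shape[0])              # arbitrary starting vector x_0
    for _ in range(num_iters):
        x_new = np.linalg.solve(A, x)            # solve A x_{i+1} = x_i
        x_new /= np.linalg.norm(x_new)           # normalize
        done = np.linalg.norm(x_new - x) < tol
        x = x_new
        if done:
            break
    return x @ (A @ x), x                        # eigenvalue (Rayleigh quotient) and eigenvector
```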

Deflation
Once we have found an eigenvector e_1 with eigenvalue λ_1, we can compute the matrix A − λ_1 e_1 e_1^T.
This makes the eigenvalue of e_1 equal to 0, but has no effect on the other eigenvectors/eigenvalues (for a symmetric matrix, whose eigenvectors are orthogonal).
In principle, we could find all eigenvectors this way.
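A sketch of that idea for a symmetric matrix, reusing the power_method function from the earlier sketch:

```python
import numpy as np

def top_eigenpairs(A, k):
    """Find the k largest-magnitude eigenpairs of a symmetric matrix by
    alternating power iteration with deflation: A <- A - lambda_1 e_1 e_1^T."""
    A = A.astype(float).copy()                   # work on a copy; don't modify the caller's matrix
    pairs = []
    for _ in range(k):
        lam, e = power_method(A)                 # from the power-method sketch above
        pairs.append((lam, e))
        A -= lam * np.outer(e, e)                # deflate: zero out this eigenvalue
    return pairs
```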

Other Eigenvector Computation Methods
The power method is OK for a few eigenvalues, but slow and sensitive to roundoff error.
Modern methods for eigendecomposition/SVD use a sequence of similarity transformations to reduce the matrix to diagonal form, then read off the eigenvalues.