Principal Component Analysis (PCA) Networks (§ 5.8)

PCA: a statistical procedure
–Reduce the dimensionality of input vectors: there are too many features, and some of them are dependent on others
–Extract important (new) features of the data that are functions of the original features
–Minimize information loss in the process
–This is done by forming new, interesting features
 as linear combinations of the original features (a first-order approximation)
 new features are required to be linearly independent (to avoid redundancy)
 new features are desired to differ from each other as much as possible (maximum variability)

Linear Algebra

Two vectors x and y are said to be orthogonal to each other if their inner product is zero: x · y = x1 y1 + … + xn yn = 0.

A set of vectors v1, …, vk of dimension n are said to be linearly independent of each other if there does not exist a set of real numbers a1, …, ak, not all zero, such that a1 v1 + … + ak vk = 0; otherwise, these vectors are linearly dependent and each one can be expressed as a linear combination of the others.
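A minimal NumPy sketch of these two checks (not part of the original slides; the example vectors are arbitrary): orthogonality is tested with the dot product, and linear (in)dependence with the rank of the matrix whose columns are the vectors.

```python
import numpy as np

# Two example vectors (arbitrary choices for illustration).
x = np.array([1.0, 2.0, 0.0])
y = np.array([-2.0, 1.0, 5.0])

# Orthogonality: x and y are orthogonal iff their inner product is 0.
print("x . y =", np.dot(x, y))             # 0.0 -> orthogonal

# Linear independence: stack the vectors as columns; they are linearly
# independent iff the rank of the matrix equals the number of vectors.
V = np.column_stack([x, y, x + y])         # third vector is a linear combination
print("rank =", np.linalg.matrix_rank(V))  # 2 < 3 -> linearly dependent
```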

A nonzero vector x is an eigenvector of matrix A if there exists a constant λ such that Ax = λx
–λ is called an eigenvalue of A (w.r.t. x)
–A matrix A may have more than one eigenvector, each with its own eigenvalue
–Eigenvectors of a matrix corresponding to distinct eigenvalues are linearly independent of each other

Matrix B is called the inverse matrix of matrix A if AB = I
–I is the identity matrix
–Denote B as A^-1
–Not every matrix has an inverse (e.g., when one of its rows/columns can be expressed as a linear combination of the other rows/columns)

Every matrix A has a unique pseudo-inverse A*, which satisfies the following properties:
AA*A = A;   A*AA* = A*;   A*A = (A*A)^T;   AA* = (AA*)^T
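The following hedged NumPy sketch illustrates these definitions on small made-up matrices: it computes eigenpairs with np.linalg.eig, checks the inverse, and verifies that np.linalg.pinv satisfies the four pseudo-inverse properties listed above.

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 3.0]])

# Eigenpairs: each column V[:, i] satisfies A @ V[:, i] = vals[i] * V[:, i].
vals, V = np.linalg.eig(A)
print(vals)                                    # [2. 3.]

# Inverse: A @ inv(A) = I (only square, full-rank matrices have one).
print(np.allclose(A @ np.linalg.inv(A), np.eye(2)))

# Pseudo-inverse of a non-square matrix, checked against the four properties.
M = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 2.0]])
Mp = np.linalg.pinv(M)
print(np.allclose(M @ Mp @ M, M),
      np.allclose(Mp @ M @ Mp, Mp),
      np.allclose((Mp @ M).T, Mp @ M),
      np.allclose((M @ Mp).T, M @ Mp))
```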

Example of PCA: 3-dim x is transformed to 2-dim y
–y (2-d feature vector) = W (2 x 3 transformation matrix) · x (3-d feature vector)
–If the rows of W have unit length and are orthogonal (e.g., with w1 = (a, b, c) and w2 = (p, q, r), w1 · w2 = ap + bq + cr = 0), then W W^T = I and W^T is a pseudo-inverse of W
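A small sketch of this example, with made-up values for the orthonormal rows of W and assuming the 2 x 3 / 3-d setup described on the slide: it confirms that W W^T = I and that W^T coincides with the pseudo-inverse of W.

```python
import numpy as np

# A 2 x 3 transformation matrix with orthonormal rows (made-up example values).
W = np.array([[1.0, 0.0,           0.0],
              [0.0, 1/np.sqrt(2),  1/np.sqrt(2)]])

x = np.array([3.0, 1.0, 2.0])     # 3-d feature vector
y = W @ x                         # 2-d feature vector
x_back = W.T @ y                  # "opposite" transformation

print(np.allclose(W @ W.T, np.eye(2)))        # orthonormal rows -> W W^T = I
print(np.allclose(W.T, np.linalg.pinv(W)))    # W^T is the pseudo-inverse of W
print(y, x_back)                              # x_back generally differs from x
```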

Generalization
–Transform n-dim x to m-dim y (m < n); the transformation matrix W is an m x n matrix
–Transformation: y = Wx
–Opposite transformation: x’ = W^T y = W^T Wx
–If W minimizes “information loss” in the transformation, then ||x – x’|| = ||x – W^T Wx|| should also be minimized
–If W^T is the pseudo-inverse of W, then x’ = x: a perfect transformation (no information loss)

How to find such a W for a given set of input vectors
–Let T = {x1, …, xk} be a set of input vectors
–Make them zero-mean vectors by subtracting the mean vector (∑ xi) / k from each xi
–Compute the correlation matrix S(T) of these zero-mean vectors, which is an n x n matrix (the book calls it the covariance-variance matrix)
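A short sketch of these two preprocessing steps under the slide's conventions (one sample per row); the data values are made up, and the 1/k normalization is an assumption, since some texts use 1/(k−1).

```python
import numpy as np

# T: k sample vectors of dimension n, one per row (made-up data for illustration).
T = np.array([[2.0, 0.0, 1.0],
              [1.0, 1.0, 3.0],
              [0.0, 2.0, 2.0],
              [1.0, 1.0, 2.0]])
k, n = T.shape

mean = T.sum(axis=0) / k          # mean vector (sum of x_i) / k
Z = T - mean                      # zero-mean vectors

# Correlation (covariance) matrix of the zero-mean data: an n x n matrix.
S = (Z.T @ Z) / k                 # 1/k normalization assumed here
print(S.shape)                    # (3, 3)
```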

–Find the m eigenvectors w1, …, wm of S(T) corresponding to the m largest eigenvalues λ1, …, λm
–w1, …, wm are the first m principal components of T
–W, the matrix whose rows are w1, …, wm, is the transformation matrix we are looking for
–The m new features extracted by the transformation with W are linearly independent and have maximum variability
–This is based on the following mathematical result: among all sets of m orthonormal projection directions, the eigenvectors of the symmetric matrix S(T) associated with its m largest eigenvalues maximize the variance of the projected data y = Wx
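Putting the steps together, a hedged statistical-PCA sketch (function and variable names are mine, not from the slides): it builds S(T), diagonalizes the symmetric matrix with np.linalg.eigh, and stacks the eigenvectors of the m largest eigenvalues as the rows of W.

```python
import numpy as np

def pca_transform_matrix(T, m):
    """Return the m x n transformation matrix W built from the top-m
    principal components of the rows of T (statistical PCA, for comparison)."""
    k, n = T.shape
    Z = T - T.mean(axis=0)                    # zero-mean samples
    S = (Z.T @ Z) / k                         # n x n correlation matrix
    vals, vecs = np.linalg.eigh(S)            # eigh: S is symmetric
    order = np.argsort(vals)[::-1][:m]        # indices of the m largest eigenvalues
    W = vecs[:, order].T                      # rows of W = top-m eigenvectors
    return W

T = np.random.default_rng(0).normal(size=(50, 5))
W = pca_transform_matrix(T, m=2)
Y = (T - T.mean(axis=0)) @ W.T                # new 2-d features y = W x
print(W.shape, Y.shape)                       # (2, 5) (50, 2)
```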

Example

PCA network architecture
–Input: vector x of n-dim
–Output: vector y of m-dim
–W: transformation matrix, y = Wx and x = W^T y
–Train W so that it can transform a sample input vector xl from n-dim to an m-dim output vector yl
–The transformation should minimize information loss: find W which minimizes
 ∑l ||xl – xl’|| = ∑l ||xl – W^T W xl|| = ∑l ||xl – W^T yl||
 where xl’ is the “opposite” transformation of yl = W xl via W^T
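To make the training objective concrete, a sketch (with my own helper names, not from the slides) that evaluates ∑l ||xl − W^T W xl|| for a candidate W; on the same toy data, the W built from the top principal components gives a much smaller value than a random W.

```python
import numpy as np

def reconstruction_error(X, W):
    """Sum over samples of ||x - W^T W x||, the quantity the network minimizes."""
    X_rec = (X @ W.T) @ W                    # row-wise x' = W^T W x
    return float(np.linalg.norm(X - X_rec, axis=1).sum())

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4)) @ np.diag([3.0, 2.0, 0.5, 0.1])   # anisotropic toy data
X = X - X.mean(axis=0)

W_rand = rng.normal(size=(2, 4))             # an arbitrary (untrained) 2 x 4 matrix
print(reconstruction_error(X, W_rand))

S = (X.T @ X) / len(X)                       # correlation matrix of the zero-mean data
vals, vecs = np.linalg.eigh(S)
W_pca = vecs[:, np.argsort(vals)[::-1][:2]].T
print(reconstruction_error(X, W_pca))        # typically much smaller than for W_rand
```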

Training W for the PCA net
–Unsupervised learning: the update depends only on the input samples xl
–Error driven: ΔW depends on ||xl – xl’|| = ||xl – W^T W xl||
–Start with randomly selected weights, then change W according to
 W(t+1) = W(t) + η (yl xl^T – Kl W(t))
 This is only one of a number of suggestions for Kl; Williams suggested Kl = yl yl^T
–With this choice the weight update rule becomes
 ΔW = η yl (xl – W^T yl)^T = η yl (xl – xl’)^T
 i.e., the column vector yl times the row vector of the transformation error (xl – xl’)^T
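A hedged sketch of this training loop, assuming the Williams-style update ΔW = η yl (xl − W^T yl)^T reconstructed above (the exact constants and stopping rule on the original slide are not recoverable from the transcript). With m = 1 the learned row converges, up to sign, toward the first principal component of the data.

```python
import numpy as np

def train_pca_net(X, m, eta=0.01, epochs=50, seed=0):
    """Unsupervised PCA-net training sketch using
    delta_W = eta * y (x - W^T y)^T  (Williams-style rule, as reconstructed above)."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    W = rng.normal(scale=0.1, size=(m, n))    # start with random weights
    for _ in range(epochs):
        for x in X:
            y = W @ x                         # m-dim output (column vector)
            err = x - W.T @ y                 # transformation error x - x'
            W += eta * np.outer(y, err)       # column vector times row vector
        eta *= 0.99                           # gradually reduce eta to stabilize
    return W

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.3])
X = X - X.mean(axis=0)
W = train_pca_net(X, m=1)
print(W / np.linalg.norm(W))   # approx. the 1st principal component (up to sign)
```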

Example (same sample inputs as in the previous example)
–Snapshots of the weight vector after x3, after x4, after x5, and after the second epoch, eventually converging to the 1st principal component

Notes
–The PCA net approximates the principal components (error may exist)
–It obtains the PCs by learning, without using statistical methods
–Stabilization is forced by gradually reducing the learning rate η
–Some suggestions to improve the learning results: instead of using the identity function for the output y = Wx, use a non-linear function S, i.e., y = S(Wx), and then try to minimize the reconstruction error ||xl – W^T S(W xl)||
 If S is differentiable, use a gradient descent approach
 For example, S can be a monotonically increasing odd function, S(–x) = –S(x) (e.g., S(x) = x^3)
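As an illustration of the non-linear variant, a sketch with S(x) = x^3 that takes gradient-descent steps on ||x − W^T S(Wx)||^2. The gradient expression and the simple backtracking on the step size are my own choices for this sketch, not taken from the slides.

```python
import numpy as np

def S(u):  return u ** 3            # monotonically increasing odd function
def dS(u): return 3 * u ** 2        # its derivative

def loss(W, x):
    return float(np.sum((x - W.T @ S(W @ x)) ** 2))

def grad(W, x):
    # dL/dW = -2 [ y e^T + (dS(u) * (W e)) x^T ], with u = Wx, y = S(u), e = x - W^T y
    # (own derivation for this sketch, not from the slides)
    u = W @ x
    y = S(u)
    e = x - W.T @ y
    return -2.0 * (np.outer(y, e) + np.outer(dS(u) * (W @ e), x))

rng = np.random.default_rng(3)
W = rng.normal(scale=0.5, size=(2, 4))
x = rng.normal(size=4)

eta = 1e-2
for _ in range(200):
    step = W - eta * grad(W, x)
    if loss(step, x) <= loss(W, x):   # simple backtracking keeps the descent stable
        W = step
    else:
        eta *= 0.5
print(loss(W, x))                     # smaller than the initial reconstruction error
```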