Principal Component Analysis


Principal Component Analysis: Step-by-Step Walk-Through (Paul Biliniski)

Purpose: Find patterns in data with many dimensions. Reducing the number of dimensions makes analysis easier. Note that the eigendecomposition at the heart of PCA operates on the covariance matrix, which is always square (the data matrix itself need not be).

Mathematical Concepts
Measures of spread in 1 dimension:
- Standard deviation: the spread of the data around the mean
- Variance: the squared standard deviation
Measure of spread in 2 dimensions:
- Covariance: the variance between two data sets, showing whether they change at similar rates; the sign is important (positive means they tend to increase together)
- Covariance matrix: the matrix of covariances between every pair of data sets
Eigenvector: a vector whose direction is unchanged by a linear transformation; the transformation only scales it
Eigenvalue: the amount by which the corresponding eigenvector is scaled
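The slides contain no code, but the eigenvector/eigenvalue definition is easy to check numerically. A minimal numpy sketch, using an illustrative 2x2 matrix rather than the slide data:

```python
import numpy as np

# Illustrative symmetric 2x2 matrix (not the slide data).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigvals, eigvecs = np.linalg.eig(A)   # columns of eigvecs are the eigenvectors

for lam, v in zip(eigvals, eigvecs.T):
    # A @ v equals lam * v: the eigenvector's direction is unchanged,
    # it is only scaled by the eigenvalue.
    assert np.allclose(A @ v, lam * v)
    print(lam, v)
```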

Step 1: Data, Subtract Means
Find the mean of each component of the data set, then subtract that mean from every value of that component.
Mean of Height: 173.9366667
Mean of OFC: 57.59333333

Height (cm)   OFC (cm)
161.1         56.1
179.8         57.5
186.3         60.1
163.9         56.6
190.0         59.8
179.9         58.0
177.9         59.3
195.0         59.9

Step 1: Data Graphed. Plotting the two variables against each other shows a clear linear relationship.

Step 1: Subtract Means

Height (cm), centered   OFC (cm), centered
-12.83666667            -1.665789474
5.863333333             -0.265789474
12.36333333             2.334210526
-10.03666667            -1.165789474
16.06333333             2.034210526
5.963333333             0.234210526
3.963333333             1.534210526
21.06333333             2.134210526
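As a minimal numpy sketch of Step 1 (not from the original slides): the eight rows listed above are centered by subtracting the column means. The slide's quoted means do not match these eight rows, which suggests the full data set contained more measurements, so this sketch illustrates the procedure rather than reproducing the slide output exactly.

```python
import numpy as np

# The eight (height, OFC) pairs listed above.
data = np.array([
    [161.1, 56.1],
    [179.8, 57.5],
    [186.3, 60.1],
    [163.9, 56.6],
    [190.0, 59.8],
    [179.9, 58.0],
    [177.9, 59.3],
    [195.0, 59.9],
])

means = data.mean(axis=0)   # one mean per column: [mean height, mean OFC]
centered = data - means     # Step 1: subtract each variable's mean
```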

Step 2: Covariance Matrix
Calculate the covariance matrix of the centered data. The diagonal entries are the variances of the two data sets; the off-diagonal entries are their covariance (the matrix is symmetric). A positive covariance tells us that as data set 1 increases, data set 2 also increases, which the graph confirms.

    C = [ 104.901023    15.36128736 ]
        [ 15.36128736   2.791678161 ]
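Continuing the sketch above (again, not from the original slides), the covariance matrix can be computed in one call; the variable names are illustrative:

```python
import numpy as np

# Step 2: covariance matrix of the mean-centered data from the previous sketch.
# rowvar=False tells numpy that variables are in columns and samples in rows.
cov = np.cov(centered, rowvar=False)
print(cov)   # 2x2: variances on the diagonal, the covariance off the diagonal
```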

Step 3: Calculate Eigens
An eigenvector x of a matrix A satisfies: A times x equals the eigenvalue λ times x, i.e. Ax = λx.
The eigenvalues are found from det(A - λI) = 0: the determinant of the original matrix with λ subtracted from each diagonal entry, set equal to zero.
For a 2x2 matrix this gives a quadratic in λ, which can be solved with the quadratic formula.

Step 3: Calculate Eigens
So for our situation, we use the 2x2 covariance matrix:

    C = [ 104.901023    15.36128736 ]
        [ 15.36128736   2.791678161 ]

The characteristic equation is:

    det [ 104.901023 - λ   15.36128736      ] = 0
        [ 15.36128736      2.791678161 - λ  ]

that is, (104.901023 - λ)(2.791678161 - λ) - 15.36128736² = 0.
Solving the quadratic gives eigenvalues of approximately 107.16 and 0.54.
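A hedged numpy sketch of this calculation, using the covariance values quoted on the slide (the variable names are mine, not the slide's):

```python
import numpy as np

# The slide's covariance values.
a, b, d = 104.901023, 15.36128736, 2.791678161
C = np.array([[a, b],
              [b, d]])

# Characteristic equation of a symmetric 2x2 matrix:
#   lambda^2 - (a + d)*lambda + (a*d - b^2) = 0
trace, det = a + d, a * d - b * b
disc = np.sqrt(trace ** 2 - 4 * det)
lam_big, lam_small = (trace + disc) / 2, (trace - disc) / 2
print(lam_big, lam_small)        # roughly 107.16 and 0.53 (the slide rounds to 0.54)

print(np.linalg.eigvalsh(C))     # cross-check with numpy's symmetric eigensolver
```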

Step 3: Calculate Eigens
With the eigenvalues solved, we can now find the eigenvectors, which span the null space of (C - λI). Substitute one eigenvalue at a time and bring the matrix toward row echelon form; first use λ = 0.54:

    [ 104.901023 - 0.54   15.36128736        ] [X]     [ 104.36   15.36 ] [X]   [ 0 ]
    [ 15.36128736         2.791678161 - 0.54 ] [Y]  ≈  [ 15.36    2.25  ] [Y] = [ 0 ]

To reach row echelon form, scale the second row so its first entry matches the 104.36 in the first row (multiply the second row by about 6.8), then subtract the first row. Because the matrix is singular at an eigenvalue, this leaves an approximately zero second row.

Step 3: Calculate Eigens
Row 1 stays the same, and Row 2 becomes (approximately) all zeros:

    [ 104.36   15.36 ] [X]   [ 0 ]
    [ ~0       ~0    ] [Y] = [ 0 ]

The first row gives 104.36·X + 15.36·Y = 0, so Y ≈ -6.8·X; equivalently, setting Y = 1 gives X ≈ -0.146. The eigenvector for the eigenvalue 0.54 is therefore proportional to [-0.146, 1]. Applying the same technique to the eigenvalue of about 107 gives an eigenvector proportional to [1, 0.146].
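A small numpy sketch of the same null-space idea, assuming the slide's covariance matrix and eigenvalues; the helper name eigvec_2x2 is illustrative, not from the slides:

```python
import numpy as np

# For a (nearly) singular 2x2 matrix (C - lam*I) with first row [p, q],
# the vector [q, -p] is mapped to approximately zero, so it is an eigenvector of C.
C = np.array([[104.901023, 15.36128736],
              [15.36128736, 2.791678161]])

def eigvec_2x2(C, lam):
    p, q = C[0, 0] - lam, C[0, 1]
    v = np.array([q, -p])                  # (C - lam*np.eye(2)) @ v is ~[0, 0]
    return v / v[np.argmax(np.abs(v))]     # rescale so the largest-magnitude entry is 1

print(eigvec_2x2(C, 107.16))   # ~[1.0, 0.146], direction for the large eigenvalue
print(eigvec_2x2(C, 0.54))     # ~[-0.146, 1.0], direction for the small eigenvalue
```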

Step 3: Calculate Eigens
Now let's see how the eigenvectors look on the graph of the mean-subtracted data: they look like lines of best fit, which is reasonable.

Step 3: Calculate Eigens
The eigenvector with the largest eigenvalue is the principal component of the data set. Arrange the two eigenvectors into a matrix so you can transform the data; here the first row is the eigenvector associated with the 107 eigenvalue, the bigger one:

    W = [  1      0.146 ]
        [ -0.146  1     ]

Now, we get back to the data.

Step 4: Transform Data
Write each data point as a 2-row, 1-column matrix (X value on top of the Y value) and multiply it by the eigenvector matrix. For the first point:

    [  1      0.146 ] [ -12.83666667 ]   [ -13.05469333 ]
    [ -0.146  1     ] [ -1.493333333 ] = [  0.38082     ]

Continue this for EVERY set of points.
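A numpy sketch of this step, with W naming the eigenvector matrix from Step 3 (an illustrative name, not from the slides; the vectors are left unnormalized, as on the slide):

```python
import numpy as np

# Rows of W are the eigenvectors, principal direction first.
W = np.array([[ 1.0,   0.146],
              [-0.146, 1.0  ]])

point = np.array([-12.83666667, -1.493333333])   # first mean-centered data point
print(W @ point)                                 # ~[-13.0547, 0.3808], as on the slide

# Applied to a whole (n_samples, 2) array of centered data in one go:
# transformed = centered @ W.T
```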

Step 4: Transform Data This is what the data should look like after this eigenvector multiplication step.

Step 5: Define Noise
One of the new axes is treated as noise that we assume arises from sampling. Choose one; in this case the Y (second-component) values. For the first point, the transformed pair (-13.05469333, 0.38082) is reduced to just the principal-component value -13.05469333.
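A two-line sketch of this step in numpy (names are illustrative):

```python
import numpy as np

# Step 5: keep the principal-component score, zero out the minor one.
scores = np.array([-13.05469333, 0.38082])   # first transformed point from Step 4
noise_free = np.array([scores[0], 0.0])      # the second axis is treated as noise
```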

Step 6: Getting Back the Data
Use the noise-free values to compute the new points: multiply each noise-free score by the transpose of the eigenvector matrix (undoing the Step 4 rotation), and repeat for all of your points. Then add the means from Step 1 back to the data. For the first point:

    [ 1      -0.146 ] [ -13.05469333 ]   [ -13.05469333 ]
    [ 0.146   1     ] [  0           ] = [ -1.905985227 ]

    -13.05469333 + 173.9366667 = 160.8819733
    -1.905985227 + 57.59333333 = 55.68734811
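A numpy sketch of the reconstruction, assuming the matrix W and the means from the earlier steps (W.T approximately undoes the Step 4 multiplication because the slide's eigenvectors are close to unit length; with normalized eigenvectors the inversion would be exact):

```python
import numpy as np

W = np.array([[ 1.0,   0.146],
              [-0.146, 1.0  ]])
means = np.array([173.9366667, 57.59333333])     # slide means for height and OFC

noise_free = np.array([-13.05469333, 0.0])       # principal-component score only
reconstructed = W.T @ noise_free + means         # rotate back, then re-add the means
print(reconstructed)                             # ~[160.88, 55.69], as on the slide
```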

Step 7: Victory
Plot the new data points: they now lie on a single line, with the noise removed. This line is the new axis against which you can plot another component. Keep adding variables into each component until there is no longer a linear relationship; that shows which components account for the variation in your data.
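For reference, a compact end-to-end sketch of the whole procedure in numpy; the function name pca_denoise and the use of the eight listed rows are illustrative assumptions, not part of the slides:

```python
import numpy as np

def pca_denoise(data, n_keep=1):
    """Project data onto its top n_keep principal components and back,
    returning the 'noise-free' reconstruction on the original axes.
    data is an (n_samples, n_features) array."""
    means = data.mean(axis=0)
    centered = data - means
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)        # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_keep]    # largest eigenvalues first
    W = eigvecs[:, order]                         # columns = kept unit eigenvectors
    scores = centered @ W                         # Step 4: transform
    return scores @ W.T + means                   # Steps 5-6: drop the rest, rebuild

# Example with the eight (height, OFC) pairs listed in Step 1.
data = np.array([[161.1, 56.1], [179.8, 57.5], [186.3, 60.1], [163.9, 56.6],
                 [190.0, 59.8], [179.9, 58.0], [177.9, 59.3], [195.0, 59.9]])
print(pca_denoise(data))
```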