Outline
What is the covariance matrix?
An example
Properties of the covariance matrix
Spectral decomposition
Principal Component Analysis
Covariance Matrix
The covariance matrix captures the variance and linear correlation in multivariate (multidimensional) data. If the data is an N x D matrix, the covariance matrix is a D x D square matrix. Think of N as the number of data instances (rows) and D as the number of attributes (columns).
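Covariance Formula
Let Data be an N x D matrix X with column means $\mu_1, \ldots, \mu_D$. Cov(Data) is the D x D matrix whose (j, k) entry is the sample covariance of columns j and k:
$$\mathrm{Cov}(X)_{jk} = \frac{1}{N-1} \sum_{i=1}^{N} (x_{ij} - \mu_j)(x_{ik} - \mu_k)$$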
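The formula can be checked numerically. The sketch below assumes NumPy and the 1/(N-1) normalization; the helper `covariance_matrix` and the 5 x 3 data set are hypothetical, made up for illustration.

```python
import numpy as np

def covariance_matrix(data: np.ndarray) -> np.ndarray:
    """Hypothetical helper: sample covariance of an N x D matrix (rows = instances).

    Uses the 1/(N-1) normalization, which is also the default of np.cov.
    """
    n = data.shape[0]
    centered = data - data.mean(axis=0)      # subtract each column's mean
    return centered.T @ centered / (n - 1)   # D x D result

# Hypothetical data: N = 5 instances, D = 3 attributes.
data = np.array([
    [2.0, 1.0, 0.5],
    [3.0, 2.0, 0.1],
    [4.0, 2.5, 0.9],
    [5.0, 4.0, 0.3],
    [6.0, 4.5, 0.7],
])
cov = covariance_matrix(data)
print(cov.shape)  # (3, 3)
# rowvar=False tells np.cov that the columns are the variables.
print(np.allclose(cov, np.cov(data, rowvar=False)))  # True
```

The diagonal of the result holds the variance of each attribute; the off-diagonal entries hold the pairwise covariances.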
Moral: covariance captures only linear relationships.
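A quick illustration of this point, as a sketch assuming NumPy: y below is completely determined by x, yet because the relationship is nonlinear and x is symmetric around 0, their covariance is (numerically) zero.

```python
import numpy as np

# x is symmetric around 0; y is a deterministic but nonlinear function of x.
x = np.linspace(-1.0, 1.0, 201)
y = x ** 2

cov_xy = np.cov(x, y)[0, 1]   # off-diagonal entry of the 2 x 2 matrix = Cov(x, y)
print(abs(cov_xy) < 1e-12)    # True: no linear relationship is detected
```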
Dimensionality Reduction
If you work in “data analytics”, it is common these days to be handed a data set with lots of variables (dimensions). The information in these variables is often redundant: there are only a few sources of genuine information. Question: how can we identify these sources automatically?
Hidden Sources of Variance
[Figure: observed variables X1, X2, X3, X4 in the data, driven by hidden sources H1 and H2]
Model: hidden sources are linear combinations of the original variables.
Hidden Sources
If each variable provided different (non-redundant) information, the covariance matrix between the variables would be diagonal; that is, the non-zero entries would appear only on the diagonal. In particular, if $H_i$ and $H_j$ are independent, then $E[(H_i - \mu_i)(H_j - \mu_j)] = 0$.
Hidden Sources
So the question is what the hidden sources should be. It turns out that the “best” hidden sources (mutually uncorrelated, and capturing as much variance as possible) are obtained from the eigenvectors of the covariance matrix. If A is a d x d matrix, then $(\lambda, x)$ is an eigenvalue-eigenvector pair if $Ax = \lambda x$.
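To see the eigenvector idea at work, the sketch below (assuming NumPy) generates hypothetical data in which four observed variables are driven by two hidden sources, checks the pair condition $Cu = \lambda u$, and shows that projecting the centered data onto the eigenvectors yields new variables whose covariance matrix is diagonal. The mixing weights and noise level are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 4 observed variables driven by 2 hidden sources plus a little noise.
n = 1000
hidden = rng.normal(size=(n, 2))                 # H1, H2
mixing = np.array([[1.0, 0.9, 0.0, 0.1],
                   [0.0, 0.1, 1.0, 0.8]])        # how each source feeds X1..X4
X = hidden @ mixing + 0.05 * rng.normal(size=(n, 4))

C = np.cov(X, rowvar=False)                      # 4 x 4, clearly not diagonal
eigvals, eigvecs = np.linalg.eigh(C)             # eigh is meant for symmetric matrices

# Each column u of eigvecs satisfies C u = lambda u.
for lam, u in zip(eigvals, eigvecs.T):
    assert np.allclose(C @ u, lam * u)

# New variables: projections of the centered data onto the eigenvectors.
scores = (X - X.mean(axis=0)) @ eigvecs
C_new = np.cov(scores, rowvar=False)
print(np.round(C_new, 6))                        # diagonal: the new variables are uncorrelated
print(np.round(eigvals, 3))                      # two large eigenvalues, two near zero
```

The two large eigenvalues correspond to the two hidden sources; the two near-zero ones reflect only the added noise.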
Explanation
We have two axes, X1 and X2. We want to project the data along the direction of maximum variance.
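A brute-force check of this idea in two dimensions, as a sketch assuming NumPy and a hypothetical correlated Gaussian data set: scan many unit directions, measure the variance of the projected data, and compare the best direction with the top eigenvector of the covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical correlated 2-D data on axes X1, X2, then mean-centered.
X = rng.multivariate_normal(mean=[0, 0], cov=[[3.0, 1.5], [1.5, 1.0]], size=1000)
X = X - X.mean(axis=0)

# Brute force: variance of the projection X a for unit vectors a at many angles.
angles = np.linspace(0, np.pi, 1801)
directions = np.stack([np.cos(angles), np.sin(angles)], axis=1)   # each row is a unit vector
proj_var = ((X @ directions.T) ** 2).sum(axis=0) / (len(X) - 1)
best = directions[np.argmax(proj_var)]

# Closed form: the eigenvector with the largest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
top = eigvecs[:, -1]                     # eigh returns eigenvalues in ascending order

print(best, top)                         # same direction, possibly with opposite sign
print(proj_var.max(), eigvals[-1])       # maximum projected variance vs. largest eigenvalue
```

The scan only approximates the optimum to the resolution of the angle grid, whereas the eigenvector gives it exactly.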
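Covariance Matrix Properties
The covariance matrix is symmetric and has non-negative eigenvalues, $0 \le \lambda_1 \le \lambda_2 \le \cdots \le \lambda_d$, with corresponding eigenvectors $u_1, u_2, \ldots, u_d$. Because the matrix is symmetric, these eigenvectors can be chosen to be orthonormal.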
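These properties lead to the spectral decomposition $C = \sum_i \lambda_i u_i u_i^T$. The sketch below, assuming NumPy and hypothetical random data, checks each property and then rebuilds the covariance matrix from its eigenpairs.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))   # hypothetical correlated data
C = np.cov(X, rowvar=False)

print(np.allclose(C, C.T))                   # symmetric
eigvals, U = np.linalg.eigh(C)               # ascending: lambda_1 <= ... <= lambda_d
print(np.all(eigvals >= -1e-12))             # non-negative (up to rounding)
print(np.allclose(U.T @ U, np.eye(4)))       # eigenvectors u_1..u_d are orthonormal

# Spectral decomposition: C = sum_i lambda_i u_i u_i^T
C_rebuilt = sum(lam * np.outer(u, u) for lam, u in zip(eigvals, U.T))
print(np.allclose(C, C_rebuilt))             # True
```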
Principal Component Analysis
Closely related to Singular Value Decomposition (SVD) and Latent Semantic Indexing (LSI). A technique for data reduction: essentially, reduce the number of columns while losing minimal information. It can also be thought of as lossy compression.
Motivation
The bulk of data has a time component, for example retail transactions and stock prices. Such a data set can be organized as an N x M table, e.g., N customers and the price of the calls they made on each of 365 days (M = 365). Typically M << N.
Objective
Compress the data matrix X into Xc such that the compression ratio is high and the average error between the original and the compressed matrix is low. N could be on the order of millions and M on the order of hundreds.
Example database
[Table: daily sales per customer; columns Wed 7/10, Thu 7/11, Fri 7/12, Sat 7/13, Sun 7/14; rows are customers ABC, DEF, GHI, KLM, Smith, John, Tom]
Decision Support Queries
What was the amount of sales to GHI on July 11?
Find the total sales to business customers for the week ending July 12.
Intuition behind SVD
[Figure: customers plotted as 2-D points on axes x and y, with rotated axes x' and y']
Customers are 2-D points.
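SVD Definition
An N x M matrix X of rank r can be expressed as
$$X = U \Lambda V^T,$$
where U is an N x r matrix with orthonormal columns, V is an M x r matrix with orthonormal columns, and Lambda ($\Lambda$) is a diagonal r x r matrix. Its diagonal entries, the singular values, are the square roots of the nonzero eigenvalues of $X^T X$.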
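The definition can be verified with NumPy's thin SVD. This is a sketch on a hypothetical random 8 x 5 matrix; because such a matrix has full rank, r = min(N, M) = 5 here.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(8, 5))             # hypothetical N x M matrix (N=8, M=5), rank r = 5

# Thin SVD: U is N x r, s holds the r diagonal entries of Lambda, Vt is V^T (r x M).
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Lam = np.diag(s)

print(np.allclose(X, U @ Lam @ Vt))            # X = U Lambda V^T
print(np.allclose(U.T @ U, np.eye(len(s))))    # orthonormal columns of U
print(np.allclose(Vt @ Vt.T, np.eye(len(s))))  # orthonormal columns of V
# Squared singular values equal the eigenvalues of X^T X (both sorted descending here).
print(np.allclose(s ** 2, np.linalg.eigvalsh(X.T @ X)[::-1]))
```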
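SVD Definition
More importantly, X can be written as a sum of r rank-one matrices,
$$X = \sum_{i=1}^{r} \lambda_i u_i v_i^T, \qquad \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_r > 0,$$
where the $u_i$ (length N) are orthonormal, the $v_i$ (length M) are orthonormal, and the singular values $\lambda_i$ are in decreasing order. Keeping only the first k < r terms gives the compressed approximation
$$X \approx X_k = \sum_{i=1}^{k} \lambda_i u_i v_i^T.$$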
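The truncated sum ties back to the compression objective: only the k columns $u_i$, the k values $\lambda_i$ and the k columns $v_i$ need to be stored, that is, k(N + M + 1) numbers instead of N x M. The sketch below assumes NumPy, a hypothetical helper `rank_k_approx`, and a hypothetical customer-by-day matrix built from three daily patterns plus noise; it reports the compression ratio and the root-mean-square error, and answers a single-cell query directly from the factors.

```python
import numpy as np

def rank_k_approx(X, k):
    """Hypothetical helper: keep the k largest singular triplets of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

rng = np.random.default_rng(4)
N, M, k = 1000, 30, 3
# Hypothetical customer x day matrix: a few daily "patterns" per customer plus noise.
X = rng.normal(size=(N, k)) @ rng.normal(size=(k, M)) + 0.01 * rng.normal(size=(N, M))

Uk, sk, Vtk = rank_k_approx(X, k)
Xk = (Uk * sk) @ Vtk            # Uk * sk scales column i of Uk by lambda_i

stored = k * (N + M + 1)        # numbers kept: the u_i, the lambda_i, and the v_i
ratio = (N * M) / stored
rmse = np.sqrt(np.mean((X - Xk) ** 2))
print(round(ratio, 2), rmse)    # compression ratio and average (RMS) error

# A single cell (customer i, day j) can be answered from the factors alone.
i, j = 42, 7
cell = (Uk[i] * sk) @ Vtk[:, j]
print(np.isclose(cell, Xk[i, j]))  # True
```

With N in the millions and M in the hundreds, the same storage formula gives far higher ratios for small k; the error depends on how much of the data's variance the discarded terms carry.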
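Explanation
Let X be a mean-centered N x d matrix and let a be a d x 1 vector. The projection of X onto a is given by Xa, and we want to maximize the variance of Xa, which (because X is mean-centered) is proportional to $a^T X^T X a$. The constraint is $a^T a = 1$; without it the variance could be made arbitrarily large. It can be shown that a is given by a solution of the equation $(X^T X - \lambda I)a = 0$. In other words, a is an eigenvector of $X^T X$, which equals the covariance matrix up to a constant factor, and $\lambda$ is the corresponding eigenvalue. Since then $a^T X^T X a = \lambda$, the maximum variance is reached at the eigenvector with the largest eigenvalue.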
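One standard way to fill in the "it can be shown" step is a Lagrange multiplier argument; this is a common textbook route and not necessarily the derivation the original lecture used.

```latex
% Maximize the (unnormalized) variance of Xa subject to unit length.
\max_{a} \; a^T X^T X a \quad \text{subject to} \quad a^T a = 1

% Lagrangian with multiplier \lambda:
\mathcal{L}(a, \lambda) = a^T X^T X a - \lambda \, (a^T a - 1)

% Setting the gradient with respect to a to zero:
\nabla_a \mathcal{L} = 2 X^T X a - 2 \lambda a = 0
\quad \Longrightarrow \quad (X^T X - \lambda I)\, a = 0

% At such a stationary point the objective equals the eigenvalue:
a^T X^T X a = \lambda \, a^T a = \lambda
```

The stationary points are therefore the eigenvectors, and the objective value at each one is its eigenvalue, so the largest eigenvalue gives the maximum.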