# The Terms that You Have to Know!


The Terms that You Have to Know!

- Basis, linear independence, orthogonality
- Column space, row space, rank
- Linear combination
- Linear transformation
- Inner product
- Eigenvalue, eigenvector
- Projection

Least Squares Problem

- The normal equation for the LS problem: $A^T A x = A^T b$
- LS problem: finding the projection of $b$ onto the column space of $A$
- The projection matrix: $P = A (A^T A)^{-1} A^T$, where $A$ is a matrix with full column rank
- If $A$ has orthonormal columns, then the LS problem becomes easy: $A^T A = I$, so $x = A^T b$
- Think of an orthonormal axis system
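The points above can be checked numerically. This is a minimal NumPy sketch using a made-up $4 \times 2$ matrix `A` with full column rank and a hypothetical target `b`: it solves the normal equation, forms the projection matrix, and verifies that the residual is orthogonal to the column space.

```python
import numpy as np

# Hypothetical full-column-rank matrix A (4x2) and target vector b.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0, 4.0])

# Normal equation: A^T A x = A^T b
x = np.linalg.solve(A.T @ A, A.T @ b)

# Projection matrix P = A (A^T A)^{-1} A^T; p = P b is the projection of b
# onto the column space of A.
P = A @ np.linalg.inv(A.T @ A) @ A.T
p = P @ b

# The residual b - p is orthogonal to every column of A.
print(np.allclose(A.T @ (b - p), 0))  # True
```

The same `x` comes out of `np.linalg.lstsq(A, b)`, which is the numerically preferred route in practice (it avoids forming $A^T A$ explicitly).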

Matrix Factorization

- LU-Factorization: $A = LU$. Very useful for solving systems of linear equations; some row exchanges may be required, giving $PA = LU$
- QR-Factorization: $A = QR$. Every matrix $A$ with linearly independent columns can be factored into $A = QR$, where the columns of $Q$ are orthonormal and $R$ is upper triangular and invertible. When $A$ is square and all matrices are square, $Q$ becomes an orthogonal matrix ($Q^T Q = Q Q^T = I$)
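Both factorizations are one call away in standard numerical libraries. A small sketch on a made-up $3 \times 3$ matrix, assuming SciPy is available for the pivoted LU (NumPy alone covers QR):

```python
import numpy as np
from scipy.linalg import lu

# Hypothetical square matrix for illustration.
A = np.array([[2.0, 1.0, 1.0],
              [4.0, 3.0, 3.0],
              [8.0, 7.0, 9.0]])

# LU with partial pivoting: scipy returns P, L, U with A = P L U.
P, L, U = lu(A)
print(np.allclose(A, P @ L @ U))  # True

# QR: A = Q R with orthonormal columns in Q and upper triangular R.
Q, R = np.linalg.qr(A)
print(np.allclose(A, Q @ R))             # True
print(np.allclose(Q.T @ Q, np.eye(3)))   # True
```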

QR Factorization Simplifies the Least Squares Problem

- The normal equation for the LS problem: $A^T A x = A^T b$. With $A = QR$ this becomes $R^T R x = R^T Q^T b$, i.e. $R x = Q^T b$ (since $R$ is invertible)
- LS problem: finding the projection of $b$ onto the column space of $A$
- Note: the orthogonal matrix $Q$ spans the column space of matrix $A$
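A short sketch of the QR route, reusing the same hypothetical `A` and `b` as above: solve the triangular system $Rx = Q^T b$ and confirm it agrees with the reference least-squares solver.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0, 4.0])

Q, R = np.linalg.qr(A)           # reduced QR: Q is 4x2, R is 2x2
x = np.linalg.solve(R, Q.T @ b)  # solve the triangular system R x = Q^T b

# Agrees with the standard least-squares solution.
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x, x_ref))  # True
```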

Motivation for Computing QR of the Term-by-Doc Matrix

- The basis vectors of the column space of $A$ can be used to describe the semantic content of the corresponding text collection
- Let $\theta_j$ be the angle between a query $q$ and the document vector $a_j$. With $A = QR$, each document vector is $a_j = Q r_j$ ($r_j$ the $j$-th column of $R$), so the angle can be computed from $Q$ and $R$ directly
- That means we can keep $Q$ and $R$ instead of $A$
- QR can also be applied to dimension reduction

Recall Matrix Notations

- Random vector $x = [X_1, X_2, \ldots, X_n]^T$, where each $X_i$ is a random variable describing the value of the $i$-th attribute
- Expectation: $E[x] = \mu$; covariance: $E[(x - \mu)(x - \mu)^T] = \Sigma$
- Expectation of a projection ($w^T$: $1 \times n$, $x$: $n \times 1$): $E[w^T x] = E[\sum_i w_i X_i] = \sum_i w_i E[X_i] = w^T E[x] = w^T \mu$
- Variance of a projection: $\mathrm{Var}(w^T x) = E[(w^T x - w^T \mu)^2] = E[w^T (x - \mu)(x - \mu)^T w] = w^T E[(x - \mu)(x - \mu)^T] w = w^T \Sigma w$
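The identity $\mathrm{Var}(w^T x) = w^T \Sigma w$ can be verified on sample data. A sketch with synthetic correlated samples (the mixing matrix and `w` are arbitrary choices for illustration); with matching degrees-of-freedom conventions the identity holds exactly for the sample covariance:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))            # 1000 samples of a 3-dim random vector
X = X @ np.array([[2.0, 0.5, 0.0],
                  [0.0, 1.0, 0.3],
                  [0.0, 0.0, 0.7]])       # introduce correlations

w = np.array([0.5, -1.0, 2.0])            # arbitrary projection direction

Sigma = np.cov(X, rowvar=False)           # sample covariance (estimator of Sigma)
z = X @ w                                 # projections w^T x for every sample

# Sample variance of the projections equals w^T Sigma w.
print(np.allclose(np.var(z, ddof=1), w @ Sigma @ w))  # True
```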

Principal Components Analysis (PCA)

- Does not use the output information (unsupervised)
- Find a mapping from the inputs in the original $n$-dimensional space to a new $k$-dimensional space ($k < n$) such that when $x$ is projected there, information loss is minimized
- The projection of $x$ on the direction of $w$ is $z = w^T x$
- Find $w$ such that $\mathrm{Var}(z)$ is maximized (after the projection, the differences between the sample points become most apparent)
- For a unique solution, require $\|w\| = 1$

The 1st Principal Component

- Maximize $\mathrm{Var}(z) = w_1^T \Sigma w_1$ subject to $\|w_1\| = 1$: maximize $w_1^T \Sigma w_1 - \alpha (w_1^T w_1 - 1)$
- Taking the derivative w.r.t. $w_1$ and setting it to 0, we have $\Sigma w_1 = \alpha w_1$
- That is, $w_1$ is an eigenvector of $\Sigma$ and $\alpha$ the corresponding eigenvalue
- Also, $\mathrm{Var}(z) = w_1^T \Sigma w_1 = \alpha w_1^T w_1 = \alpha$, so we choose the largest eigenvalue for $\mathrm{Var}(z)$ to be maximum
- The 1st principal component is the eigenvector of the covariance matrix of the input sample with the largest eigenvalue, $\lambda_1 = \alpha$

The 2nd Principal Component

- Maximize $\mathrm{Var}(z_2) = w_2^T \Sigma w_2$ subject to $\|w_2\| = 1$ and $w_2$ orthogonal to $w_1$: maximize $w_2^T \Sigma w_2 - \alpha (w_2^T w_2 - 1) - \beta (w_2^T w_1 - 0)$
- Taking the derivative w.r.t. $w_2$ and setting it to 0, we have $2 \Sigma w_2 - 2 \alpha w_2 - \beta w_1 = 0$
- Premultiplying by $w_1^T$: $2 w_1^T \Sigma w_2 - 2 \alpha w_1^T w_2 - \beta w_1^T w_1 = 0$
- Note that $w_1^T w_2 = 0$, and $w_1^T \Sigma w_2$ is a scalar, equal to its transpose $w_2^T \Sigma w_1 = \lambda_1 w_2^T w_1 = 0$; therefore $\beta = 0$
- And we have $\Sigma w_2 = \alpha w_2$: that is, $w_2$ is the eigenvector of $\Sigma$ with the second largest eigenvalue $\lambda_2 = \alpha$, and so on
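The derivation above says the principal components are the eigenvectors of the covariance matrix, sorted by eigenvalue. A minimal sketch on synthetic 2-D data (the mixing matrix is an arbitrary choice): compute the eigendecomposition of the sample covariance and check that the two directions are orthogonal and that the projected variance equals the eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 1.0],
                                          [0.0, 0.5]])  # correlated samples

S = np.cov(X, rowvar=False)        # sample covariance (estimator of Sigma)
vals, vecs = np.linalg.eigh(S)     # symmetric eigendecomposition, ascending
order = np.argsort(vals)[::-1]     # sort eigenvalues descending
vals, vecs = vals[order], vecs[:, order]

w1, w2 = vecs[:, 0], vecs[:, 1]    # 1st and 2nd principal components

print(np.isclose(w1 @ w2, 0.0))                      # True: orthogonal
print(np.isclose(np.var(X @ w1, ddof=1), vals[0]))   # True: Var(z1) = lambda_1
```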

Recall from Linear Algebra

- Theorem: for a real symmetric matrix, eigenvectors associated with different eigenvalues are orthogonal to each other
- Theorem: a real symmetric matrix $A$ can be transformed into a diagonal matrix by $P^{-1} A P = D$, where $P$ has the eigenvectors of $A$ as its columns

Recall from Linear Algebra (cont.)

- Def: positive definite bilinear form: $f(x, x) > 0$ for all $x \neq 0$. E.g. for $f(x, y) = x^T A y$: if $x^T A x > 0$ for all $x \neq 0$, the $n \times n$ matrix $A$ is called a positive definite matrix
- Def: positive semidefinite bilinear form: $f(x, x) \geq 0$ for all $x$. E.g. if $x^T A x \geq 0$ for all $x$, then $A$ is called a positive semidefinite matrix
- Theorem: matrix $A$ is positive definite if and only if all the eigenvalues of $A$ are positive

What PCA Does

- $z = W^T (x - m)$ (just like $z_1 = w_1^T x$, $z_2 = w_2^T x$, ...): an $\mathbb{R}^n \to \mathbb{R}^k$ transformation, where the $k$ columns of $W$ are the $k$ leading eigenvectors of $S$ (the estimator of $\Sigma$), and $m$ is the sample mean
- Note: if $k = n$, then $W W^T = W^T W = I$, so $W^{-1} = W^T$; if $k < n$, only $W^T W = I_{k \times k}$ holds
- The transformation centers the data at the origin and rotates the axes to those eigenvectors; the variances over the new dimensions are equal to the eigenvalues
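The full transformation can be sketched end-to-end. Synthetic 4-D data (mixing matrix and offset are arbitrary) is centered and projected onto the $k = 2$ leading eigenvectors; the transformed data is then centered at the origin with uncorrelated coordinates whose variances are the eigenvalues, as the slide states.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4)) @ rng.normal(size=(4, 4)) + 5.0  # shifted, correlated

m = X.mean(axis=0)                   # sample mean
S = np.cov(X, rowvar=False)          # sample covariance
vals, vecs = np.linalg.eigh(S)
order = np.argsort(vals)[::-1]
k = 2
W = vecs[:, order[:k]]               # k leading eigenvectors as columns

Z = (X - m) @ W                      # z = W^T (x - m) for every sample

# Centered at the origin, axes decorrelated, variances = eigenvalues.
print(np.allclose(Z.mean(axis=0), 0))                          # True
print(np.allclose(np.cov(Z, rowvar=False),
                  np.diag(vals[order[:k]])))                   # True
```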

Singular Value Decomposition (SVD)

- $A = U \Sigma V^T$
- The columns of $U$ are eigenvectors of $A A^T$ and the columns of $V$ are eigenvectors of $A^T A$
- The nonzero singular values (diagonal entries of $\Sigma$) are the square roots of the nonzero eigenvalues of both $A A^T$ and $A^T A$
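These relations are easy to verify numerically. A sketch on a random $5 \times 3$ matrix: compare the singular values of $A$ against the eigenvalues of $A^T A$, and check that each right singular vector satisfies $(A^T A) v = \sigma^2 v$.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(5, 3))          # arbitrary matrix for illustration

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Singular values = square roots of the eigenvalues of A^T A.
eigvals = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
print(np.allclose(s, np.sqrt(eigvals)))  # True

# Rows of Vt (columns of V) are eigenvectors of A^T A: (A^T A) v = sigma^2 v.
for sigma, v in zip(s, Vt):
    print(np.allclose(A.T @ A @ v, sigma**2 * v))  # True each time
```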


Latent Semantic Indexing (LSI)

- Basic idea: exploit the correlation between words and documents
- Two words are correlated when they co-occur many times
- Two documents are correlated when they share many words

Latent Semantic Indexing (LSI)

- Computation: using singular value decomposition (SVD), truncated to a concept space of dimension $m$, where $m$ is the number of concepts/topics
- The left singular vectors give the representation of the concepts in term space; the right singular vectors give the representation of the concepts in document space

SVD: Example ($m = 2$)

SVD: Eigenvalues

- Determining $m$ is usually difficult

SVD: Orthogonality

- $u_1 \cdot u_2 = 0$
- $v_1 \cdot v_2 = 0$

SVD: Properties

- $\mathrm{rank}(S)$: the maximum number of row (or column) vectors of matrix $S$ that are linearly independent
- SVD produces the best low-rank approximation: in the example, $\mathrm{rank}(X) = 9$ while the truncated $X'$ has $\mathrm{rank}(X') = 2$
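The low-rank property can be sketched concretely. Keeping only the top $k$ singular triplets yields the best rank-$k$ approximation in the Frobenius norm (Eckart-Young), with the error given exactly by the discarded singular values; the $9 \times 12$ matrix here is random stand-in data, not the slide's example.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(9, 12))        # hypothetical term-by-document matrix

U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 2
Xk = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]   # best rank-2 approximation of X

print(np.linalg.matrix_rank(Xk))          # 2

# Frobenius error = sqrt of the sum of the discarded squared singular values.
err = np.linalg.norm(X - Xk, 'fro')
print(np.isclose(err, np.sqrt(np.sum(s[k:] ** 2))))  # True
```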

SVD: Visualization

SVD tries to preserve the Euclidean distances between document vectors.

