
CpSc 881: Machine Learning PCA and MDS

2 Copyright Notice Most slides in this presentation are adapted from the textbook slides and various other sources. The copyright belongs to the original authors. Thanks!

3 Background: Covariance Example variables: X = Temperature, Y = Humidity (the paired data table is omitted here). Covariance measures how X and Y vary together: cov(X,Y) = (1/(n-1)) * sum_i (x_i - mean(X)) (y_i - mean(Y)). cov(X,Y) = 0: X and Y are uncorrelated (no linear relationship); cov(X,Y) > 0: they tend to move in the same direction; cov(X,Y) < 0: they tend to move in opposite directions.

4 Background: Covariance Matrix Contains the covariance values between all possible pairs of dimensions (= attributes). Example for three attributes (x, y, z): C = [ cov(x,x) cov(x,y) cov(x,z) ; cov(y,x) cov(y,y) cov(y,z) ; cov(z,x) cov(z,y) cov(z,z) ]. The matrix is symmetric, and its diagonal holds the variances.
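As an illustration (the data below is made up, not from the slides), the covariance matrix of a small three-attribute dataset can be computed with NumPy; passing rowvar=False tells np.cov that the columns are the variables.

    import numpy as np

    # Hypothetical data: 5 observations of three attributes (x, y, z)
    X = np.array([[2.5, 2.4, 0.5],
                  [0.5, 0.7, 1.9],
                  [2.2, 2.9, 0.8],
                  [1.9, 2.2, 1.1],
                  [3.1, 3.0, 0.3]])

    C = np.cov(X, rowvar=False)    # 3 x 3 sample covariance matrix
    print(C)                       # C[i, j] = cov(attribute i, attribute j)
    print(np.allclose(C, C.T))     # True: covariance matrices are symmetric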

5 Background: Eigenvalues and Eigenvectors An eigenvector e of C with eigenvalue λ satisfies C e = λ e. How to calculate e and λ: compute det(C - λI), which yields a polynomial of degree n; the roots of det(C - λI) = 0 are the eigenvalues. For details, see any linear algebra text such as Elementary Linear Algebra by Howard Anton (John Wiley & Sons), or use a math package such as MATLAB.
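In practice the eigenvalues and eigenvectors are computed numerically rather than by solving det(C - λI) = 0 by hand; a minimal NumPy sketch (the matrix below is arbitrary illustrative data):

    import numpy as np

    C = np.array([[0.9, 0.75],
                  [0.75, 0.7]])         # small symmetric (covariance-like) matrix

    eigvals, eigvecs = np.linalg.eigh(C)  # eigh: for symmetric matrices, ascending eigenvalues
    e, lam = eigvecs[:, -1], eigvals[-1]  # pair with the largest eigenvalue
    print(np.allclose(C @ e, lam * e))    # True: C e = lambda e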

6 An Example Data table with columns X1, X2 and the mean-centered columns X1', X2' (the table values are omitted). Mean1 = 24.1, Mean2 = 53.8.

7 Covariance Matrix C = (values omitted). Using MATLAB, we find the eigenvectors and eigenvalues: e1 = (-0.98, 0.21) with λ1 = 51.8; e2 = (0.21, 0.98) with λ2 = 560.2. Since λ2 is much larger than λ1, the second eigenvector is the more important direction!

8 Principal Component Analysis (PCA) Used for visualization of complex data. Principal Component Analysis projects the data onto the subspace with the most variance; it was developed to capture as much of the variation in the data as possible. Generic features of principal components: they are summary variables; linear combinations of the original variables; uncorrelated with each other; and capture as much of the original variance as possible.

9 PCA Algorithm 1. X <- create the N x d data matrix, with one row vector x_n per data point. 2. Subtract the mean x̄ from each row vector x_n in X. 3. Σ <- covariance matrix of X. 4. Find the eigenvectors and eigenvalues of Σ. 5. PCs <- the M eigenvectors with the largest eigenvalues.
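A minimal NumPy sketch of this algorithm (function and variable names are mine, not from the slides):

    import numpy as np

    def pca(X, M):
        """Project an N x d data matrix X onto its top-M principal components."""
        X_centered = X - X.mean(axis=0)          # step 2: subtract the mean
        cov = np.cov(X_centered, rowvar=False)   # step 3: d x d covariance matrix
        eigvals, eigvecs = np.linalg.eigh(cov)   # step 4: eigenvalues (ascending) and eigenvectors
        order = np.argsort(eigvals)[::-1]        # re-sort in decreasing order
        components = eigvecs[:, order[:M]]       # step 5: top-M eigenvectors (d x M)
        scores = X_centered @ components         # N x M projected coordinates
        return scores, components, eigvals[order]

    # Toy usage
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    scores, components, eigvals = pca(X, M=2)
    print(scores.shape)                          # (100, 2)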

10 Principal components 1st principal component (PC1): the direction along which there is the greatest variation. 2nd principal component (PC2): the direction with the maximum variation left in the data, orthogonal to the direction (i.e. vector) of PC1. 3rd principal component (PC3): the direction with the maximal variation left in the data, orthogonal to the plane of PC1 and PC2 (rarely used); etc.

11 Geometric Rationale of PCA The objective of PCA is to rigidly rotate the axes of this p-dimensional space to new positions (principal axes) with the following properties: they are ordered such that principal axis 1 has the highest variance, axis 2 has the next highest variance, ..., and axis p has the lowest variance; and the covariance between each pair of principal axes is zero (the principal axes are uncorrelated).

12 Example: 3 dimensions => 2 dimensions

13 PCA on all Genes Leukemia data, precursor B and T Plot of 34 patients, 8973 dimensions (genes) reduced to 2

14 How many components? Check the distribution of the eigenvalues. Take enough eigenvectors to cover the desired percentage of the variance.
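One common way to check the distribution of eigenvalues is the cumulative explained-variance ratio; this sketch picks the smallest number of components reaching a chosen threshold (the 90% default is an illustrative choice, not from the slides):

    import numpy as np

    def n_components_for_variance(eigvals, threshold=0.90):
        """Smallest number of top eigenvalues whose share of the total variance >= threshold."""
        eigvals = np.sort(eigvals)[::-1]                 # decreasing order
        explained = np.cumsum(eigvals) / eigvals.sum()   # cumulative fraction of variance
        return int(np.searchsorted(explained, threshold) + 1)

    print(n_components_for_variance(np.array([560.2, 51.8, 10.3, 2.1])))   # 2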

15 Problems and limitations What if the data is very high dimensional? E.g., images (d >= 10^4). Problem: the covariance matrix Σ has size d x d; d = 10^4 gives |Σ| = 10^8 entries. Solution: Singular Value Decomposition (SVD)! Efficient algorithms are available (e.g., in MATLAB), and some implementations find just the top N eigenvectors.
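A hedged sketch of that workaround: take the SVD of the centered data matrix directly instead of forming the d x d covariance matrix; the right singular vectors are the principal directions, and the covariance eigenvalues are the squared singular values divided by N - 1.

    import numpy as np

    def pca_via_svd(X, M):
        """PCA of an N x d matrix without ever forming the d x d covariance matrix."""
        X_centered = X - X.mean(axis=0)
        U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)  # thin SVD
        components = Vt[:M].T                       # top-M principal directions (d x M)
        eigvals = s[:M] ** 2 / (X.shape[0] - 1)     # corresponding covariance eigenvalues
        scores = X_centered @ components            # projected coordinates (N x M)
        return scores, components, eigvals

    rng = np.random.default_rng(1)
    X = rng.normal(size=(50, 200))                  # many more features than samples
    print(pca_via_svd(X, M=3)[0].shape)             # (50, 3)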

17 Singular Value Decomposition Problem: #1: Find concepts in text #2: Reduce dimensionality

18 SVD - Definition A[n x m] = U[n x r] Λ[r x r] (V[m x r])^T A: n x m matrix (e.g., n documents, m terms) U: n x r matrix (n documents, r concepts) Λ: r x r diagonal matrix (strength of each 'concept') (r: rank of the matrix) V: m x r matrix (m terms, r concepts)

19 SVD - Properties THEOREM [Press+92]: it is always possible to decompose a matrix A into A = U Λ V^T, where U, Λ, V are unique (*); U, V are column-orthonormal (i.e., their columns are unit vectors, orthogonal to each other): U^T U = I; V^T V = I (I: identity matrix); Λ: the singular values are positive and sorted in decreasing order.
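These properties are easy to check numerically; a small NumPy sketch on an arbitrary test matrix:

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.normal(size=(6, 4))                     # an n x m example matrix

    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    V = Vt.T

    print(np.allclose(U.T @ U, np.eye(4)))          # columns of U are orthonormal
    print(np.allclose(V.T @ V, np.eye(4)))          # columns of V are orthonormal
    print(np.all(s >= 0), np.all(np.diff(s) <= 0))  # singular values >= 0, decreasing
    print(np.allclose(U @ np.diag(s) @ Vt, A))      # A = U Lambda V^T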

20 SVD - Properties 'Spectral decomposition' of the matrix: A = λ1 u1 v1^T + λ2 u2 v2^T + ..., i.e., a sum of rank-1 matrices, one per singular value λi with its left/right singular vectors ui, vi.

21 SVD - Interpretation 'documents', 'terms' and 'concepts': U: document-to-concept similarity matrix V: term-to-concept similarity matrix Λ: its diagonal elements give the 'strength' of each concept Projection: best axis to project on ('best' = minimum sum of squares of projection errors)

22 SVD - Example A = U Λ V^T: a document-term matrix whose columns correspond to the terms data, inf., retrieval, brain, lung and whose rows are documents from a CS group and an MD group (matrix values omitted).

23 SVD - Example The same decomposition, annotated: U is the doc-to-concept similarity matrix, with a CS-concept and an MD-concept (matrix values omitted).

24 SVD - Example The same decomposition, annotated: the diagonal of Λ gives the 'strength' of the CS-concept (matrix values omitted).

25 SVD - Example The same decomposition, annotated: V is the term-to-concept similarity matrix, with the CS-concept highlighted (matrix values omitted).

26 SVD – Dimensionality reduction Q: how exactly is dimensionality reduction done? A: set the smallest singular values to zero (matrix illustration omitted; see the sketch after slide 28).

27 SVD - Dimensionality reduction The matrices are shown with the smallest singular values removed (figure omitted).

28 SVD - Dimensionality reduction The result is a lower-rank approximation A ≈ U_k Λ_k V_k^T (figure omitted).
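A sketch of the reduction on slides 26-28: drop (zero out) the smallest singular values and keep the best rank-k approximation of A (illustrative code, not from the slides):

    import numpy as np

    def rank_k_approximation(A, k):
        """Best rank-k least-squares approximation of A via truncated SVD."""
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        # Keep only the k largest singular values; the rest are set to zero (dropped)
        return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

    rng = np.random.default_rng(3)
    A = rng.normal(size=(8, 5))
    A2 = rank_k_approximation(A, k=2)
    print(A2.shape, np.linalg.matrix_rank(A2))      # (8, 5) 2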

29 Multidimensional Scaling Procedures Similar in spirit to PCA, but it takes a dissimilarity (distance) matrix as input.

30 Multidimensional Scaling Procedures The purpose of multidimensional scaling (MDS) is to map the distances between points in a high dimensional space into a lower dimensional space without too much loss of information.

31 Math MDS seeks values z_1, ..., z_N in R^k that minimize the so-called stress function S(z) = sum over pairs i < i' of (d_ii' - ||z_i - z_i'||)^2. This is known as least squares or classical multidimensional scaling. A gradient descent algorithm is used to minimize S. A related non-linear variant is Sammon's (1969) mapping, which minimizes the stress function S_Sammon(z) = sum over pairs i < i' of (d_ii' - ||z_i - z_i'||)^2 / d_ii'.
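An illustrative sketch (names and learning-rate choices are mine, not from the slides) of the least-squares version: minimize the stress by plain gradient descent on the embedded coordinates.

    import numpy as np

    def mds_gradient_descent(D, k=2, n_iter=2000, lr=0.005, seed=0):
        """Minimize S(z) = sum_{i<i'} (d_ii' - ||z_i - z_i'||)^2 by gradient descent."""
        N = D.shape[0]
        rng = np.random.default_rng(seed)
        Z = rng.normal(size=(N, k))                   # random starting configuration
        for _ in range(n_iter):
            diff = Z[:, None, :] - Z[None, :, :]      # z_i - z_j, shape (N, N, k)
            dist = np.linalg.norm(diff, axis=-1)      # embedded distances ||z_i - z_j||
            np.fill_diagonal(dist, 1.0)               # avoid division by zero (diff is 0 there)
            ratio = (D - dist) / dist                 # residual scaled by distance
            grad = -2.0 * (ratio[:, :, None] * diff).sum(axis=1)
            Z -= lr * grad
        return Z

    # Toy usage: 5 points on a line; their mutual distances should be recovered (up to rotation)
    D = np.abs(np.subtract.outer(np.arange(5.0), np.arange(5.0)))
    Z = mds_gradient_descent(D, k=2)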

32 We use MDS to visualize the dissimilarities between objects. The procedures are very exploratory and their interpretations are as much art as they are science.

33 Examples The “points” that are represented in multidimensional space can be just about anything. These objects might be people, in which case MDS can identify clusters of people who are “close” versus “distant” in some real or psychological sense.

34 Multidimensional Scaling Procedures As long as the “distance” between the objects can be assessed in some fashion, MDS can be used to find the lowest dimensional space that still adequately captures the distances between objects. Once the number of dimensions is identified, a further challenge is identifying the meaning of those dimensions. Basic data representation in MDS is a dissimilarity matrix that shows the distance between every possible pair of objects. The goal of MDS is to faithfully represent these distances with the lowest possible dimensional space.

35 Multidimensional Scaling Procedures The mathematics behind MDS can be daunting to understand. Two types: classical (metric) multidimensional scaling and non-metric scaling. Example: Distances between cities on the globe
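For the classical (metric) variant there is a closed-form solution: double-center the squared distance matrix and take its top eigenvectors. A sketch with a made-up distance matrix (not the slides' city data):

    import numpy as np

    def classical_mds(D, k=2):
        """Classical (metric) MDS: embed a symmetric distance matrix D into k dimensions."""
        N = D.shape[0]
        J = np.eye(N) - np.ones((N, N)) / N          # centering matrix
        B = -0.5 * J @ (D ** 2) @ J                  # double-centered squared distances
        eigvals, eigvecs = np.linalg.eigh(B)         # ascending order
        idx = np.argsort(eigvals)[::-1][:k]          # indices of the top-k eigenvalues
        scale = np.sqrt(np.maximum(eigvals[idx], 0.0))
        return eigvecs[:, idx] * scale               # N x k coordinates

    # Hypothetical symmetric distance matrix for 4 objects
    D = np.array([[0.0, 2.0, 3.0, 4.0],
                  [2.0, 0.0, 2.5, 3.5],
                  [3.0, 2.5, 0.0, 1.5],
                  [4.0, 3.5, 1.5, 0.0]])
    print(classical_mds(D, k=2).shape)               # (4, 2)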

36 Multidimensional Scaling Procedures This table (omitted here) lists the distances between European cities. A multidimensional scaling of these data should be able to recover the two dimensions (North-South x East-West) that we know must underlie the spatial relations among the cities.

37 Multidimensional Scaling Procedures MDS begins by restricting the dimension of the space and then seeking an arrangement of the objects in that restricted space that minimizes the difference between the distances in that space compared to the actual distances.

38 Multidimensional Scaling Procedures Appropriate number of dimensions are identified… Objects can be plotted in the multidimensional space… Determine what objects cluster together and why they might cluster together. The latter issue concerns the meaning of the dimensions and often requires additional information.

39 Multidimensional Scaling Procedures In the cities data, the meaning is quite clear. The dimensions refer to the North-South x East-West surface area across which the cities are dispersed. We would expect MDS to faithfully recreate the map relations among the cities.

40 Multidimensional Scaling Procedures This arrangement (figure omitted) provides the best fit for a one-dimensional model. How good is the fit? We use a statistic called "stress" to judge the goodness-of-fit.

41 Multidimensional Scaling Procedures Smaller stress values indicate better fit. Some rules of thumb for the degree of fit: Stress 0.20 = Poor; 0.10 = Fair; 0.05 = Good; 0.02 = Excellent.
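The stress statistic behind these rules of thumb is commonly taken to be Kruskal's stress-1 (an assumption here, since the slides do not give the formula); a small sketch of the computation:

    import numpy as np

    def kruskal_stress(D, Z):
        """Kruskal's stress-1 between target distances D and an embedding Z (rows = points)."""
        diff = Z[:, None, :] - Z[None, :, :]
        dist = np.linalg.norm(diff, axis=-1)          # distances in the model space
        iu = np.triu_indices_from(D, k=1)             # count each pair once
        num = np.sum((D[iu] - dist[iu]) ** 2)
        den = np.sum(dist[iu] ** 2)
        return float(np.sqrt(num / den))

    # e.g. kruskal_stress(D, classical_mds(D, k=2)) should be small for a well-fitting model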

42 Multidimensional Scaling Procedures The stress for the one-dimensional model of the cities data is 0.31, clearly a poor fit. The poor fit can also be seen in a plot of the actual distances versus the distances in the one-dimensional model, known as a Shepard plot. In a well-fitting model, the points lie along a line sloping upward to the right, showing a one-to-one correspondence between distances in the model space and actual distances. That is clearly not evident here (plot omitted).

43 Multidimensional Scaling Procedures A two-dimensional model fits very well. The stress value is also quite small (0.00902), indicating an exceptional fit. Of course, this is no great surprise for these data.

44 Multidimensional Scaling Procedures There is hardly any room for a three-dimensional model to improve matters. The stress is 0.00918, indicating that a third dimension does not help at all.

45 MDS Example: Clusters among Prostate Samples