Published by Brad Huskins. Modified over 3 years ago.

1
CHAPTER 6: Dimensionality Reduction
Author: Christoph Eick
The material is mostly based on the Shlens PCA Tutorial http://www2.cs.uh.edu/~ceick/ML/pca.pdf and, to a lesser extent, on material in the Alpaydin book.

2
Why Reduce Dimensionality?
1. Reduces time complexity: less computation
2. Reduces space complexity: fewer parameters
3. Saves the cost of acquiring the feature
4. Simpler models are more robust
5. Easier to interpret; simpler explanation
6. Data visualization (structure, groups, outliers, etc.) if plotted in 2 or 3 dimensions
Ch. Eick: Dimensionality Reduction

3
Feature Selection/Extraction/Construction
Feature selection: Choosing k

4
Key Ideas of Dimensionality Reduction
Given a dataset X, find a low-dimensional linear projection.
Two possible formulations:
1. The variance in the low-dimensional space is maximized
2. The average projection cost is minimized
Both are equivalent.
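One way to see why the two formulations coincide is the short derivation below. It is a sketch under assumptions the slide does not state: the data are centered ($E[x] = 0$), the projection is onto a single unit vector $w$, and the "projection cost" is the squared distance between $x$ and its projection $(w^T x)\,w$.

```latex
\begin{align*}
\lVert x - (w^T x)\,w \rVert^2
  &= \lVert x \rVert^2 - 2\,(w^T x)^2 + (w^T x)^2\, w^T w \\
  &= \lVert x \rVert^2 - (w^T x)^2
     && \text{since } w^T w = 1 \\[4pt]
E\!\left[\lVert x - (w^T x)\,w \rVert^2\right]
  &= E\!\left[\lVert x \rVert^2\right] - \mathrm{Var}(w^T x)
     && \text{since } E[w^T x] = 0
\end{align*}
```

The first term does not depend on $w$, so minimizing the average projection cost is the same as maximizing the variance of the projected data.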

5
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
Principal Components Analysis (PCA)
Find a low-dimensional space such that when x is projected there, information loss is minimized.
The projection of x on the direction of w is: $z = w^T x$
Find w such that Var(z) is maximized:
$\mathrm{Var}(z) = \mathrm{Var}(w^T x) = E[(w^T x - w^T \mu)^2]$
$= E[(w^T x - w^T \mu)(w^T x - w^T \mu)]$
$= E[w^T (x - \mu)(x - \mu)^T w]$
$= w^T E[(x - \mu)(x - \mu)^T]\, w = w^T \Sigma w$
where $\mathrm{Var}(x) = E[(x - \mu)(x - \mu)^T] = \Sigma$
Question: Why does PCA maximize and not minimize the variance in z?
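The identity Var(z) = w^T Σ w can be checked numerically. The sketch below is only an illustration: it uses the sample covariance (dividing by n-1) in place of the expectation, and the data points and the direction w are made up for the check.

```python
def mean(values):
    return sum(values) / len(values)

def covariance_matrix(rows):
    """Sample covariance (divides by n-1); each row is one d-dimensional example."""
    n, d = len(rows), len(rows[0])
    mu = [mean([r[j] for r in rows]) for j in range(d)]
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]

def projected_variance(rows, w):
    """Variance of z = w^T x computed directly from the projected values."""
    z = [sum(wi * xi for wi, xi in zip(w, r)) for r in rows]
    m = mean(z)
    return sum((v - m) ** 2 for v in z) / (len(z) - 1)

def quadratic_form(w, sigma):
    """w^T Sigma w."""
    d = len(w)
    return sum(w[i] * sigma[i][j] * w[j] for i in range(d) for j in range(d))

# Hypothetical 2-d data and a unit-length direction, chosen only for illustration.
data = [[2.0, 1.0], [3.0, 4.0], [5.0, 4.5], [7.0, 8.0], [8.0, 7.5]]
w = [0.6, 0.8]

var_direct = projected_variance(data, w)
var_formula = quadratic_form(w, covariance_matrix(data))
print(var_direct, var_formula)  # the two values agree up to rounding
```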

6
Clarifications
Assume the dataset x is d-dimensional with n examples and we want to reduce it to a k-dimensional dataset z:
x is a d×n matrix (n examples, each with d attribute values)
$w^T$ is a k×d matrix (k eigenvectors, each with d entries)
$z = w^T x$ is a k×n matrix (you take scalar products of the elements in x with w, obtaining a k-dimensional dataset)
Remarks:
w contains the k eigenvectors of the covariance matrix of x with the highest eigenvalues: $\Sigma w_i = \lambda_i w_i$
k is usually chosen based on the variance captured/largeness of the first k eigenvalues.
http://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors
Corrected on 2/24/2011

7
Example
$z = w^T x$, where x is d×n, $w^T$ is k×d, and z is k×n (you take scalar products of the elements in x with w, obtaining a k-dimensional dataset)
Example: 4d dataset which contains 5 examples

x (4×5, one example per column):
 1    2    0    1    4
 0    0    2   -1    2
 1    1    1    1    1
 0   -1    0    3    0

w^T (2×4):
 0.5  0.5 -1.0  0.5
 1.0  0.0  1.0  1.0

(z1, z2) := (0.5*a1 + 0.5*a2 - a3 + 0.5*a4, a1 + a3 + a4)
Corrected on 2/24/2011
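The projection in this example can be worked through in a few lines. The sketch below assumes the flattened numbers on the slide are read as a 4×5 matrix x (one example per column) and a 2×4 matrix w^T with rows (0.5, 0.5, -1.0, 0.5) and (1.0, 0.0, 1.0, 1.0); that reading of the layout is an assumption.

```python
# x: 4 attributes (rows) by 5 examples (columns), as read from the slide.
x = [
    [1, 2, 0, 1, 4],
    [0, 0, 2, -1, 2],
    [1, 1, 1, 1, 1],
    [0, -1, 0, 3, 0],
]

# w^T: 2 projection directions (rows), each with 4 entries.
w_t = [
    [0.5, 0.5, -1.0, 0.5],
    [1.0, 0.0, 1.0, 1.0],
]

def matmul(a, b):
    """Plain matrix product of a (k x d) and b (d x n)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

z = matmul(w_t, x)  # 2 x 5: each column is one example in the reduced space
for row in z:
    print(row)
# [-0.5, -0.5, 0.0, 0.5, 2.0]
# [2.0, 2.0, 1.0, 5.0, 5.0]
```

For instance, the first example (1, 0, 1, 0) maps to (0.5*1 + 0.5*0 - 1 + 0.5*0, 1 + 1 + 0) = (-0.5, 2).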

8
Shlens Tutorial on PCA
PCA is among the most valuable results of applied linear algebra (another is PageRank).
“The goal of PCA is to compute the most meaningful basis to re-express a noisy dataset. The hope is that the new basis will filter out noise and reveal hidden structure”.
The goal of PCA is deciphering “garbled” data, referring to rotation, redundancy, and noise.
PCA is a non-parametric method; there is no way to incorporate preferences and other choices.

9
Computing Principal Components as Eigenvectors of the Covariance Matrix
1. Normalize x by subtracting from each attribute value its mean, obtaining y.
2. Compute $\frac{1}{n-1}\, y y^T$, the covariance matrix of x.
3. Diagonalize it, obtaining a set of eigenvectors e (as rows) with $e\, \Sigma\, e^T = \Lambda$, where $\Lambda$ is the diagonal matrix of the $\lambda_i$ ($\lambda_i$ is the eigenvalue of the i-th eigenvector).
4. Select how many and which eigenvectors in e to keep, obtaining w (based on variance expressed/largeness of eigenvalues and possibly other criteria).
5. Create your transformed dataset $z = w^T x$.
Remark: Symmetric matrices are always orthogonally diagonalizable; see the proof on page 11 of the Shlens paper!
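The five steps can be sketched in plain Python as follows. Several choices here are assumptions, not taken from the slides: examples are stored as rows (n×d) rather than columns, the diagonalization step is replaced by power iteration with deflation (a simple stand-in that suits small symmetric matrices with well-separated eigenvalues), step 5 projects the centered data y, and the data values are illustrative only.

```python
import math

def center(rows):
    """Step 1: subtract each attribute's mean (each row is one example)."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - mu[j] for j in range(d)] for r in rows]

def covariance(centered):
    """Step 2: 1/(n-1) * y y^T, written for row-wise examples."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_eigenpair(sigma, iterations=500):
    """Power iteration: dominant eigenvalue and unit eigenvector of sigma."""
    d = len(sigma)
    v = [1.0 / (i + 1) for i in range(d)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        v = [sum(sigma[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
    lam = sum(v[i] * sigma[i][j] * v[j] for i in range(d) for j in range(d))
    return lam, v

def pca(rows, k):
    """Steps 1-5; k must not exceed the rank of the covariance matrix."""
    y = center(rows)
    sigma = covariance(y)
    d = len(sigma)
    eigenvalues, components = [], []
    for _ in range(k):  # steps 3-4: keep the k leading eigenvectors
        lam, v = top_eigenpair(sigma)
        eigenvalues.append(lam)
        components.append(v)
        # Deflation: remove the component just found before the next search.
        sigma = [[sigma[i][j] - lam * v[i] * v[j] for j in range(d)]
                 for i in range(d)]
    # Step 5: project every centered example onto the kept eigenvectors.
    z = [[sum(w[j] * r[j] for j in range(d)) for w in components] for r in y]
    return eigenvalues, components, z

# Illustrative 2-d dataset with 10 examples.
data = [[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
        [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]]
eigenvalues, components, z = pca(data, 2)
print(eigenvalues)
print(components)
```

In practice a library eigen-solver or SVD routine would normally take the place of the power iteration; it is used here only to keep the sketch dependency-free.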

10
Textbook’s PCA Version
Maximize Var(z) subject to $\lVert w \rVert = 1$:
$\Sigma w_1 = \alpha w_1$, that is, $w_1$ is an eigenvector of $\Sigma$.
Choose the one with the largest eigenvalue for Var(z) to be maximal.
Second principal component: maximize $\mathrm{Var}(z_2)$, s.t. $\lVert w_2 \rVert = 1$ and $w_2$ orthogonal to $w_1$:
$\Sigma w_2 = \alpha w_2$, that is, $w_2$ is another eigenvector of $\Sigma$, and so on.
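The eigenvector condition follows from a standard Lagrange-multiplier argument, sketched below. It assumes the constraint is written as $w_1^T w_1 = 1$ with multiplier $\alpha$, matching the symbol used on the slide.

```latex
\begin{align*}
\max_{w_1}\;& w_1^T \Sigma\, w_1 - \alpha\,(w_1^T w_1 - 1) \\
\frac{\partial}{\partial w_1}:\;& 2\,\Sigma\, w_1 - 2\,\alpha\, w_1 = 0
  \;\Longrightarrow\; \Sigma\, w_1 = \alpha\, w_1 \\
\mathrm{Var}(z_1) &= w_1^T \Sigma\, w_1 = \alpha\, w_1^T w_1 = \alpha
\end{align*}
```

Since the variance of the projection equals the eigenvalue $\alpha$, the variance is largest for the eigenvector with the largest eigenvalue.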

11
What PCA does
$z = W^T (x - m)$, where the columns of W are the eigenvectors of $\Sigma$ and m is the sample mean.
Centers the data at the origin and rotates the axes.
http://www.youtube.com/watch?v=BfTMmoDFXyE
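The "center, then rotate" reading can be illustrated in two dimensions, where the rotation angle has a closed form. This is a sketch under assumptions: it covers only the 2-d case, it uses the closed-form angle θ = ½·atan2(2σ_xy, σ_xx − σ_yy) instead of a general eigen-solver, and the points are made up.

```python
import math

def center_and_rotate(points):
    """Apply z = W^T (x - m) for 2-d points; returns (z, theta)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(p[0] - mx, p[1] - my) for p in points]  # x - m

    # Entries of the 2x2 sample covariance matrix.
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)

    # Angle of the first principal axis; W has columns (c, s) and (-s, c).
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    z = [(c * x + s * y, -s * x + c * y) for x, y in centered]  # W^T (x - m)
    return z, theta

# Hypothetical points, for illustration only.
points = [(2.0, 1.0), (3.0, 4.0), (5.0, 4.5), (7.0, 8.0), (8.0, 7.5)]
z, theta = center_and_rotate(points)

n = len(z)
var_z1 = sum(a * a for a, _ in z) / (n - 1)
var_z2 = sum(b * b for _, b in z) / (n - 1)
cov_z12 = sum(a * b for a, b in z) / (n - 1)
print(var_z1, var_z2, cov_z12)  # cov_z12 is zero up to rounding
```

After the transformation the data have zero mean and the two new coordinates are uncorrelated, with the first coordinate carrying the larger variance.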

12
How to choose k?
Proportion of Variance (PoV) explained, when the $\lambda_i$ are sorted in descending order: $\mathrm{PoV} = \frac{\lambda_1 + \lambda_2 + \dots + \lambda_k}{\lambda_1 + \lambda_2 + \dots + \lambda_k + \dots + \lambda_d}$
Typically, stop at PoV > 0.9.
Scree graph plots of PoV vs. k; stop at the “elbow”.
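A small sketch of the PoV rule, assuming PoV for a given k is the sum of the k largest eigenvalues divided by the sum of all eigenvalues. The eigenvalues below are hypothetical, and `choose_k` is an illustrative helper name, not something defined in the slides.

```python
def proportion_of_variance(eigenvalues):
    """Cumulative PoV for k = 1..d, with eigenvalues sorted in descending order."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running, pov = 0.0, []
    for lam in ordered:
        running += lam
        pov.append(running / total)
    return pov

def choose_k(eigenvalues, threshold=0.9):
    """Smallest k whose PoV exceeds the threshold (all d if none does)."""
    for k, value in enumerate(proportion_of_variance(eigenvalues), start=1):
        if value > threshold:
            return k
    return len(eigenvalues)

# Hypothetical eigenvalues, for illustration only.
example_eigenvalues = [4.0, 2.0, 1.0, 0.5, 0.5]
pov = proportion_of_variance(example_eigenvalues)
k = choose_k(example_eigenvalues)
print(pov)  # [0.5, 0.75, 0.875, 0.9375, 1.0]
print(k)    # 4
```

Plotting `pov` against k gives the scree-style curve mentioned on the slide; the threshold of 0.9 is the slide's rule of thumb, not a fixed requirement.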


14
All Roads Lead to Rome!
SVD, maximizing variance, and the eigenvectors of the covariance matrix all arrive at the same destination (Rome): the principal components.
http://voices.yahoo.com/historical-origin-all-roads-lead-rome-5443183.html?cat=37
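The link between the SVD road and the eigenvector road can be written out briefly. The sketch assumes $y$ is the centered $d \times n$ data matrix and writes its singular value decomposition as $y = U S V^T$; the letter $S$ is used for the singular values only to avoid a clash with the covariance matrix $\Sigma$.

```latex
\begin{align*}
y &= U S V^T, \qquad U^T U = I,\quad V^T V = I \\
\Sigma = \frac{1}{n-1}\, y\, y^T
  &= \frac{1}{n-1}\, U S V^T V S^T U^T
   = U \left( \frac{S S^T}{n-1} \right) U^T
\end{align*}
```

So the columns of $U$ are eigenvectors of the covariance matrix, with eigenvalues $\lambda_i = s_i^2 / (n-1)$, where $s_i$ is the $i$-th singular value.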

15
Visualizing Numbers after applying PCA

16
Multidimensional Scaling
Given pairwise distances between N points, $d_{ij}$, i,j = 1,...,N, place the points on a low-dimensional map s.t. the distances are preserved.
$z = g(x \mid \theta)$; find $\theta$ that minimizes the Sammon stress.
L1-Norm: http://en.wikipedia.org/wiki/Taxicab_geometry
Lq-Norm: http://en.wikipedia.org/wiki/Lp_space
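The slide's stress formula did not survive extraction, so the sketch below assumes one common normalized form: the sum over pairs of (map distance − original distance)² / (original distance)². It evaluates the stress of a given mapping and does not search for θ. Each pair is counted once, all points are assumed distinct, and the same Lq distance is used in both spaces; all of these are assumptions.

```python
def lq_distance(a, b, q=2.0):
    """Lq (Minkowski) distance; q=1 is the taxicab norm, q=2 is Euclidean."""
    return sum(abs(x - y) ** q for x, y in zip(a, b)) ** (1.0 / q)

def sammon_stress(originals, mapped, q=2.0):
    """Normalized squared mismatch between original and map distances."""
    stress = 0.0
    n = len(originals)
    for r in range(n):
        for s in range(r + 1, n):  # each unordered pair once
            d_x = lq_distance(originals[r], originals[s], q)  # must be > 0
            d_z = lq_distance(mapped[r], mapped[s], q)
            stress += (d_z - d_x) ** 2 / d_x ** 2
    return stress

# Hypothetical points in 2-d and two candidate 2-d "maps" of them.
originals = [(0.0, 0.0), (3.0, 4.0), (6.0, 0.0)]
same_map = [(0.0, 0.0), (3.0, 4.0), (6.0, 0.0)]       # distances preserved
doubled_map = [(0.0, 0.0), (6.0, 8.0), (12.0, 0.0)]   # every distance doubled

stress_same = sammon_stress(originals, same_map)
stress_doubled = sammon_stress(originals, doubled_map)
print(stress_same, stress_doubled)
```

A map that preserves every pairwise distance gives zero stress. Doubling every distance contributes (2d − d)²/d² = 1 per pair, so three pairs give a stress of 3.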

17
Map of Europe by MDS
Map from CIA – The World Factbook: http://www.cia.gov/
http://forrest.psych.unc.edu/teaching/p208a/mds/mds.html
